rk4 algorithm sent to you chuck

Development directions, tasks, and features being actively implemented or pursued by the development team.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

klauss wrote:
chuck_starchaser wrote:Integration of RK4 physics
  • Finishing RK4 physics:
  • Finishing interpolation:
  • Finishing scheduler:
That should happen before integration (ie: before branching), IMO.
I'll get to work on my parts of it immediately.
I'd go for a hybrid: ie, do the new scheduler (with full collisions every physics frame), but without the octree structure, and rather use the current sorted list structure. We'll see if it poses a performance issue once it's done, and if we need to go all the way to the octree or we can leave it for later.
K.
That behavior will have to be retained, it seems to be a core gameplay feature for privateer-like mods. We'll need a form of scripted motion I guess.
I guess we could create a new type of AF for realistic physics (if not realistic roid concentrations). OR... If we wanted to eventually replace all current AF's by a new type, we could just make it configurable.
Do we rewrite gravity?, or just re-enable the old code for it?
Rewrite - it's the EASIEST way, rk4 should handle it transparently.
Cool!
In fact, I keep thinking that gravity should be separate from unit forces, torques and mass loss. I think we should have a /gravity folder, and rk4step could call it separately and combine it with the force it gets from unit code; rather than complicate unit code with it.
In fact, maybe gravitational attractors should be in a separate array from the RigidBody array, as it needs to be traversed very often, and they are very sparse items.
There's no cross-platform way of obtaining finer resolution time - period. All possible (reasonably portable) ways have been coded in lin_time.h/cpp, to my knowledge.

PS: the new soundsystem thus uses sound tracks as time source for lipsync with movies, system time isn't accurate enough. But using sound tracks as time source for physics wouldn't be viable IMHO.
ROFL
"At the end of the beep, the time is 7829010284 milliseconds 932 microseconds; this is a recording..." :D
Maybe I should design a little circuit board with a USB connector and PIC microcontroller that can be queried for
an arbitrary, say 48-bit, "time" in microseconds; and either get filthy rich on it, or push it as open source hardware.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

Well, you could plug a (slightly more expensive) gps tracker and read its gps-derived time, which is... well... VERY accurate.

(it's the time on the GPS satellites, which is a highly stable atomic clock)
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

OT:
That would be a useful product on its own, but gps costs $.
What I'm talking about would be RIDICULOUSLY cheap:
A circuit board about the size of a thumb nail, a USB connector, and an 8-pin microcontroller, which bought in tape and reel rolls of 5000 cost less than a dollar.
Its product is not standard time, but just time... FineTimeUSB(TM)
ALL real-time games could use this, and some other applications too, that don't need to know absolute time, but need some kind of reference with better resolution than a millisecond.
Standard time could be accomodated in software: with low absolute accuracy (+/- a couple of seconds) but with improved resolution down to microseconds. Essentially, the software would be embedded in your clock app (in the task bar) and would correlate standard time to usb-time at 1 second intervals. If an application then asks for standard time it could get an expanded packet with date, hours, minutes, seconds, milliseconds and microseconds.
But in any case, what games desperately need is a high resolution time base to query; and this is what this gadget would provide.
/OT

Coded my own fixed array allocator, as I couldn't find an appropriate one anywhere.

Code: Select all

#include <string>
#include <memory>
#include <cassert>

/*
 * Couldn't find a really fast array allocator, so I'm coding my own.
 * This allocator is fixed size element AND fixed size array.
 * It is also NOT thread safe by any stretch of the imagination.
 * Instead of a pointer, it returns an index into its array, on alloc.
 * You must provide it with a calculated size such that that it is
 * size_you_need + sizeof(ndx_t) rounded up to a 2^n, -sizeof(ndx_t).
* ASz must NOT be a power of two. Fibonacci's are best --e.g. 6765
 */
 
struct msg
{
    std::string s;
    msg( char const * cs ) : s(cs) {}
};

#ifndef _NDEBUG
  #define assert_or_throw( cond, cstr ) do{ assert( ((cond)) ||! cstr ); }while(0)
#else
  #define assert_or_throw( cond, cstr ) do{ if( !((cond)) ) throw( msg(cstr) ); }while(0)
#endif

//not sure how to implement prefetch, yet
#define prefetch_clean( x ) do{}while(0)
#define prefetch_dirty( x ) do{}while(0)

template < size_t TSz, size_t ASz >
class ArrayAllocator
{
public:
    typedef unsigned int ndx_t;
private:
    struct ArrayElement
    {
        char mem_[TSz];
        ndx_t next_free_;
    }
    array_[ASz];
    ndx_t first_free_;
    ndx_t count_;
    //non-copiable:
    ArrayAllocator( ArrayAllocator const & );
    void operator=( ArrayAllocator const & );
public:
    void erase()
    {
        for( ndx_t i = ASz; i !=0; --i )
            array_[i-1].next_free_ = i;
        array_[ASz-1].next_free_ = 0;
        first_free_ = 0;
        count_ = 0;
    }
    ArrayAllocator()
    {
        erase();
    }
    ndx_t get_count() const { return count_; }
    bool is_free( ndx_t ndx ) const { return array_[ndx].next_free_ != 0; }
    void * operator[]( ndx_t ndx ){ return reinterpret_cast<void*>( array_+ndx ); }
    ndx_t allocate()
    {
        prefetch_dirty( array_[first_free_].next_free_ );
        ndx_t ret = first_free_;
        assert_or_throw( ret < ASz-1, "ArrayAllocator full" );
        first_free_ = array_[ret].next_free_;
        ++count_;
        array_[ret].next_free_ = 0;
        return ret;
    }
    void deallocate( ndx_t which )
    {
        assert_or_throw( which < ASz-1, "ArrayAllocator: deallocating index beyond range" );
        assert_or_throw( array_[which].next_free_ == 0, "ArrayAllocator: deallocating index that was already free" );
        ndx_t temp = first_free_;
        --count_;
        first_free_ = which;
        array_[which].next_free_ = temp;
    }
    //this iterator is a joke; intentionally; super-simple: it always starts at zero and
    //only has pre-increment operator which returns the index of the next array element in use,
    //at each ++. If you ++ and there are no more used elements, it simply throws msg("").
    //To be used like... iterator i; try{ while(1){ do_some(myarray[++i]); } } catch(...){} 
    class iterator
    {
        ArrayAllocator const & aa_;
        ndx_t ndx_;
        ndx_t cnt_;
    public:
        iterator( ArrayAllocator const & aa ) : aa_(aa), ndx_(0), cnt_(0) {}
        void rewind(){ ndx_ = 0; cnt_ = 0; }
        ndx_t operator++()
        {
            assert_or_throw( cnt_ < aa_.get_count(), "" ); //throws msg("") when done.
            while( aa_.is_free(++ndx_) );
            ++cnt_;
            prefetch_clean( aa_[ndx+128/TSz] );
            return ndx_;
        }
    };
    std::auto_ptr<iterator> get_iterator(){ return new iterator( *this ); }
};
7 lines of code to allocate.
6 lines of code to deallocate.
Find an allocator out there that can match that :)
Compiles, too...
Errol_Summerlin
Merchant
Merchant
Posts: 46
Joined: Thu Mar 11, 2010 4:10 am

Re: rk4 algorithm sent to you chuck

Post by Errol_Summerlin »

--and I'm having a wild idea about that, that maybe rk4step() could compute rotation first, update the interpolation structure, and pass a pointer to it as an argument to forces, torques and mass loss...
One of us is not understanding. In rk4, to get, for example, the position at the intermediate state, I compute intermediate_position=x0+v0*dt..an euler step. It is just 1 multiplication and 1 addition.

To interpolate the position at the intermediate state linearly given x1 and x0, you compute intermediate_position=x0+(x1-x0)*dt/deltaT. This is also just 1 multiplication and 1 addition if you pre-compute (x1-x0)/deltaT.

This is the same for rotations. The linear interpolation is just as expensive computationally as the calculation for the intermediate state in rk4(which remember, is just an euler step). Computing the intermediate position via interpolation does not save you any time computationally.

Once the intermediate state is calculated, rk4 asks the system to tell it what the force, torque and mass loss are at the intermediate (time,position, orientation, mass, and any other state variables that may be relevant to those quantities.

To reiterate, an euler step and a linear interpolation are the same computational cost in terms of computing the intermediate state in rk4. The reason for using interpolation is to avoid calling getForceTorqueandMassLoss for every frame...not to avoid computing the position at every frame. That always has to be done anyway. With the proposed interpolation of rotations, you are still calling getForceTorqueandMassLoss just as often, you are using just as much computation to calculate the intermediate states via interpolation as you would to calculate the, and you are still performing just as many quaternion multiplications (regardless of whether you use an euler step or an interpolation, you still have to do the rotation business)

Since the intermediate values are already being calculated the simplest way they can(an euler step), the only way to actually save on computation time is to just not calculate the intermediate value for some variables whose variation over the interval is deemed unimportant to the getForceTorqueandMassLoss function. One could suggest that, with sufficiently small time steps mass variation over an interval could be small enough to be negligible. One could also argue that holding orientation constant over the interval would produce errors that are not perceived by the player. The latter of these avoid 4 out of 5 quaternion rotations by not computing intermediate orientations at all.
Okay, if I'm understanding correctly, you had no plan to obtain interpolated vectors from me, when you call my forces, torques and mass loss function. Correct? I always assumed I was expected to come up with them, somehow.
Correct, the values used to call getForceTorqueandMassLoss are the exact values of the state for which rk4 needs those quantities.
I think we should pass rotation and rotation interpolation data to the callee, so that I can give you accurate force vectors (for your rk4 translational computations).
As I said before, this just costs accuracy and doesn't actually save any computation. If you interpolate in getForceTorqueandMassLoss or I do an euler step in the rk4helper, the computational cost is the same. However, rk4 EXPECTS euler step results. The weights for each of the intermediate state derivative values are based on the assumption of an euler step in computing the intermediate states. If I were to use a midpoint step instead of an euler step, I may need different weights for the different interpolated derivatives.
This I don't get. A planet has no torque forces acting on it, or thrusters; therefore doing Euler or RK4 for rotation, in my mind, would give us the exact same result, no?
That was a bad example. My point was that interpolations are where lots and lots of quaternion rotations are, and eliminating 4 from the physics step doesn't really save you much. It saves you especially little for planets. And it damage gameplay and believability for players.

You claim you are proposing an "euler" step for rotation, but I have no idea what you mean by that. I have tried to articulate why that is a misnomer and why the pseudocode you presented (involving interpolating the intermediate states) doesn't actually save you any computation (in my previous post and again in this one). I have also tried to explain the options for saving computation by reducing the number of quaternion rotations, but that is NOT an euler step for rotations.

The only way to get rid of most of the quaternion rotations in the physics step (what I THINK you are concerned about, though at this point I am not so sure) is to not compute them. In each rk4helper call, just return the orientation as it came in and use that constant orientation to compute getForceTorqueandMassLoss. Then, at the end, you compute orientation the same way it is currently computed, as a weighted average of the angular_velocities at the intermediate states. This only involves one quaternion multiplication per physics step as opposed to 5. However, I should point out that this won't help with a more cumbersome problem...that of moment of inertia matrix operations.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Errol_Summerlin wrote:
--and I'm having a wild idea about that, that maybe rk4step() could compute rotation first, update the interpolation structure, and pass a pointer to it as an argument to forces, torques and mass loss...
One of us is not understanding. In rk4, to get, for example, the position at the intermediate state, I compute intermediate_position=x0+v0*dt..an euler step. It is just 1 multiplication and 1 addition.
First off, Errol, I want to apologize for this discussion taking so long; so many long posts. One of us is not understanding, indeed; and I'd bet it's me, given your much greater knowledge of physics and math; but then again my argument seems so clear to me I can't help another part of me believing the opposite. So, let's say I'm 50-50 on my bets; or maybe 75-75, as there's a high chance we're both not understanding --at least each other. But I'm sure you want me to understand, rather than drop the subject; and I want to understand, too.
I already got the idea that those "funny half steps" and whatnot in rk4 are Euler steps, and that they are computationally cheap as far as their Euler-ness is concerned. So, possibly our problem is semantic. I'm looking at the amount of code in helper(), which is called 3 times from rk4step() (lines 198, 199 and 200), and which in turn calls getForceTorqueandMassLoss() (line 132). Besides this call, helper() contains a total of 16 lines of computational code; --heavy code, where variables are as likely to be vectors as scalars. Not complaining; I just wanted to say that when I talk about "half steps" I mean "funny computations that take time", whereas maybe you have a more narrow definition of the term, involving "just one multiplication and one addition". There's also another call of getForceTorqueandMassLoss() from rk4Step() itself (line 233), and some more heavy code preceding it.
That's what I'm looking at, in general. I never meant to imply that the half-steps were anything other than Euler steps.

What I meant to say is that when you want a force vector from me, 4 times per rk4step (one in rk4 step, plus once per helper() call, times three calls) you expect vectors that are true for those positions. No? I take your current place-holder getForce...() as example, where the force line reads:

Code: Select all

    FTML.Force   = -1.0*state.mass*state.position/pow( position.length(), 3.0 ); //gravity like force pulling object toward (0,0,0)
and where state.position is the position AT the half-step or whatever position you call the function from, right?
This gives you a "gravity" vector true from that position.
But now, if the unit is rotating AND thrusting, presumably you'd similarly want true thrust vectors (or something as close as possible to true thrust vectors) AT such positions and times. No?

If so, to give you true (or close to true) thrust vectors, I need to be able to interpolate or extrapolate or somehow estimate my ship's orientation at those locations and times.

Unless there is something VERY fundamental I'm missing here. Am I getting the general idea right, about rk4? What it seems to me it's doing is sort of "super-sampling" the physics step, like testing what it gets from a full Euler step, versus what it gets from two consecutive Euler half-steps and applying some magical theorem to combine these results in the right ratios to get a much more accurate estimate, right? And these "super-samplings" happen at "future times", like by 1 and 0.5 dt's and whatnot, right? And so at each of those samplings you need a force vector as would be true at that position and time?

If so, to give you a true vector I need to estimate how much my ship has rotated during those 1 dt and 0.5 dt intervals.
And to do such an estimation I need either times and rotational speeds, to do my own Euler on my ship's orientation; or else times and two orientations (e.g. quaternions) 1 dt apart, to interpolate between.
And I was saying that the change in orientation ***for a ship, with thrusters*** is probably very small over a dt; so I could do a cheap, quaternion interpolation, for example; but you're the math wiz.

(((NOTE: I *almost* wrote that all I might need are two forward vectors to interpolate between, because it's "just a thrusting vector" I need to compute; but then I realized I could be thrusting down, or to the left, say; rather than thrusting forward; so no: I would need a full quaternion of orientation.)))

The other, COMPLETELY SEPARATE issue, is rotation simulation, which is what you seem to be discussing below...
To interpolate the position at the intermediate state linearly given x1 and x0, you compute intermediate_position=x0+(x1-x0)*dt/deltaT. This is also just 1 multiplication and 1 addition if you pre-compute (x1-x0)/deltaT.

This is the same for rotations. The linear interpolation is just as expensive computationally as the calculation for the intermediate state in rk4(which remember, is just an euler step). Computing the intermediate position via interpolation does not save you any time computationally.
I never meant to imply it would; and I never suggested interpolation for the intermediate position for rotation simulation. I suggested it for computing thrusting vector, in Unit code; a completely separate issue.
Once the intermediate state is calculated, rk4 asks the system to tell it what the force, torque and mass loss are at the intermediate (time,position, orientation, mass, and any other state variables that may be relevant to those quantities.
Exactly. And I only mentioned interpolation of rotations in connection to computing force, in Unit code; NOT in rk4step.
To reiterate, an euler step and a linear interpolation are the same computational cost in terms of computing the intermediate state in rk4.
I never doubted it. Interpolation is definitely even a little more expensive than a Euler step.
The reason for using interpolation is to avoid calling getForceTorqueandMassLoss for every frame...not to avoid computing the position at every frame.
It never even crossed my mind to use interpolations for position anywhere in rk4.
That always has to be done anyway. With the proposed interpolation of rotations, you are still calling getForceTorqueandMassLoss just as often, you are using just as much computation to calculate the intermediate states via interpolation as you would to calculate the, and you are still performing just as many quaternion multiplications (regardless of whether you use an euler step or an interpolation, you still have to do the rotation business)
Again, I never meant to suggest any kind of interpolations as a substitute for calling getForceTorqueandMassLoss(). What I was suggesting is that IN the getForceTorqueandMassLoss() function (which has yet to be written), I need to somehow interpolate orientations, so as to be able to give you true force (thrust) vectors; but that unless you give me some interpolation data, I cannot perform such interpolations.
So, for the sake of an intellectual exercise: rk4step code could shift a bit, such that the code lines pertaining to rotation could go first, at the top of the function, and come up with a "next" quaternion of orientation. Then, when you call getForce....() you give me the previous and next quaternions, and a t; and so I can do a quaternion interpolation at time t, and then compute my thrusting vector based on that, and return. Then you got a PrettyGood(TM)_force_vector from me. I'm trying to be nice :D

But now you could ask "How are we going to compute rotation early in the rk4step() function, when we need to call helper() first and all that?"
If that's your question, my answer continues to be that I would remove all code related to rotation from helper(), and do just a simple Euler simulation of rotation early in rk4, and leave it at that. Why? Because I don't think accuracy of rotation simulation is important at all; but that the accuracy of the thrusting direction consequences of (however inaccurate) rotations are much more important.
Since the intermediate values are already being calculated the simplest way they can(an euler step), the only way to actually save on computation time is to just not calculate the intermediate value for some variables whose variation over the interval is deemed unimportant to the getForceTorqueandMassLoss function.
I'd like to change your sentence to "Since the intermediate values are already being calculated the simplest way they can(an euler step), the only way to actually save on computation time is to just not calculate the intermediate value for some variables whose variation over the interval is deemed unimportant." (Period.)
Rotation is unimportant. (AND expensive.)
In fact, rotation is important IN getForce...(), as far as thrusting direction agreeing with rotation; but not in absolute terms.
One could suggest that, with sufficiently small time steps mass variation over an interval could be small enough to be negligible. One could also argue that holding orientation constant over the interval would produce errors that are not perceived by the player. The latter of these avoid 4 out of 5 quaternion rotations by not computing intermediate orientations at all.
Not sure what you mean by "holding orientation constant over the interval". What I've been suggesting, but I don't think I ever did clearly enough, is that at the top of rk4 we could do a simple Euler step for orientation:

Code: Select all

next_rotational_speed = last_rotational_speed + rotational_accel*dt;
next_orientation_quat = last_orientation_quat * quat(0.5*(last_rotational_speed+next_rotational_speed)*dt);
DONE!
No half steps. No helper() code for rotation.

THEN, when you call my getForce() function, you give me those quats:
Force = getForce( position, etceteras, last_orientation_quat, next_orientation_quat, t );
so that I can interpolate between those orientations and give you properly oriented force vectors.
Okay, if I'm understanding correctly, you had no plan to obtain interpolated vectors from me, when you call my forces, torques and mass loss function. Correct? I always assumed I was expected to come up with them, somehow.
Correct, the values used to call getForceTorqueandMassLoss are the exact values of the state for which rk4 needs those quantities.
Occasionally we DO understand each other :D so there's light at the end of the tunnel...
I think we should pass rotation and rotation interpolation data to the callee, so that I can give you accurate force vectors (for your rk4 translational computations).
As I said before, this just costs accuracy and doesn't actually save any computation.
And here understanding drops to zero, again :(
I was suggesting that precisely to INCREASE accuracy; NOT to save computations.
If you interpolate in getForceTorqueandMassLoss or I do an euler step in the rk4helper, the computational cost is the same. However, rk4 EXPECTS euler step results. The weights for each of the intermediate state derivative values are based on the assumption of an euler step in computing the intermediate states. If I were to use a midpoint step instead of an euler step, I may need different weights for the different interpolated derivatives.
Here I'm lost; but in case I'm understanding something, I never suggested changing ANYTHING about the way you compute RK4.
I was simply saying, and I don't know how to say it more clearly, that the code INSIDE getForce...() needs to interpolate orientation, to be able to give you back thrusting forces that reflect the fact that the ship has changed orientations at each call, as they are spread in time by multiples of 0.5*dt.
This I don't get. A planet has no torque forces acting on it, or thrusters; therefore doing Euler or RK4 for rotation, in my mind, would give us the exact same result, no?
That was a bad example.
Okay, but do give me a good one, then. I can't fathom any.
My point was that interpolations are where lots and lots of quaternion rotations are, and eliminating 4 from the physics step doesn't really save you much. It saves you especially little for planets. And it damage gameplay and believability for players.
Ditto; I need a good example of this damage.
And it's not clear to me that we need such heavy code in the interpolations as we have now in rk4step() and helper(). I'm hopeful we can optimize the code to avoid conversions back and forth between vectors of rotation, angle and axis, and quaternion representations.
You claim you are proposing an "euler" step for rotation, but I have no idea what you mean by that.
In pseudo-code:

Code: Select all

void RigidBody::rk4Step( double dt ) //non-reentrant; see comment about static variables
{
    ...............................................
    next_rotational_speed =
      last_rotational_speed + rotational_accel*dt; // <<<<<<<<<<<<<<<<< NEW
    next_orientation_quat =
      last_orientation_quat * quat(0.5*(last_rotational_speed+next_rotational_speed)*dt); // <<<<<<<<< NEW
    ...............................................
    helper( *this, 0.5*dt, d2, last_orientation_quat, next_orientation_quat ); // <<<< ADDED PARAMS
    helper( d2, 0.5*dt, d3, last_orientation_quat, next_orientation_quat ); // <<<< ADDED PARAMS
    helper( d3, dt, d4, last_orientation_quat, next_orientation_quat ); // <<<< ADDED PARAMS
    ...............................................
}
Then helper() would pass these two quaternions to getForce...() so that I can interpolate orientation at t.
But helper() would NOT do any computations related to rotation and orientation. Orientation is done, complete, finito, by the time helper() is called for a first time.
I have tried to articulate why that is a misnomer and why the pseudocode you presented (involving interpolating the intermediate states) doesn't actually save you any computation (in my previous post and again in this one). I have also tried to explain the options for saving computation by reducing the number of quaternion rotations, but that is NOT an euler step for rotations.
But I think you've all along been mixing up ***where*** I suggest to interpolate, and ***where*** to use Euler.
My interpolation suggestion has always been for getForceTorqueandMassLoss(), --i.e. Unit code; NOT for rk4 code.
And my Euler suggestion was simply to simplify rotational simulation, --since accuracy for it is irrelevant--, to a single Euler step.
The only way to get rid of most of the quaternion rotations in the physics step (what I THINK you are concerned about, though at this point I am not so sure) is to not compute them.
EXACTLY!
In each rk4helper call, just return the orientation as it came in
Better than that: Don't touch it. Don't even look at it.
and use that constant orientation to compute getForceTorqueandMassLoss.
NO! This is what I'm trying to say IS important in terms of accuracy. getForceTorqueandMassLoss() does need to use orientation interpolations to correctly orient thrust. But these interpolations should be much lighter than the ones you currently do in helper(). Interpolating between two quaternions that differ little boils down to a linear interpolation of the terms and renormalizing.
Then, at the end, you compute orientation the same way it is currently computed, as a weighted average of the angular_velocities at the intermediate states. This only involves one quaternion multiplication per physics step as opposed to 5. However, I should point out that this won't help with a more cumbersome problem...that of moment of inertia matrix operations.
Hmmm...
How about we 'cross that bridge when we get there'? I've no idea about it; but at least I'm sure we are only going to do it once per physics step; NOT four times...

EDIT:
And if we still don't understand each other, I will ask Klauss to enlighten us; he usually understands all sides in any argument; but has been notably silent in this one.

EDIT2:
Update: Class bit_array is now updated to be configurable to any number of bits (but fixed at compile time), and it compiles.
Last edited by chuck_starchaser on Sat Apr 17, 2010 7:03 pm, edited 1 time in total.
log0

Re: rk4 algorithm sent to you chuck

Post by log0 »

Maybe you need a:

Code: Select all

getLocalForceTorqueandMassLoss()
So there is no need to pass the interpolated position/orientation in the rk4step(). (You will still have to calculate them in the rk4 code of course.)
Errol_Summerlin
Merchant
Merchant
Posts: 46
Joined: Thu Mar 11, 2010 4:10 am

Re: rk4 algorithm sent to you chuck

Post by Errol_Summerlin »

If so, to give you true (or close to true) thrust vectors, I need to be able to interpolate or extrapolate or somehow estimate my ship's orientation at those locations and times.
In the original code I wrote, when rk4 calls getForceTorqueandMassLoss, the argument for that should have been a state. Rk4 does not need anything interpolated because getForceTorqueandMassLoss's argument INCLUDES the orientation at the intermediate state. It is already calculated. All it needs is the force acting on the ship at THAT particular at that particular location and, at that particular orientation as provided by the state. So I don't see where interpolation or extrapolation is needed.
If so, to give you a true vector I need to estimate how much my ship has rotated during those 1 dt and 0.5 dt intervals.
And to do such an estimation I need either times and rotational speeds, to do my own Euler on my ship's orientation; or else times and two orientations (e.g. quaternions) 1 dt apart, to interpolate between.
Or, the argument to the function includes the orientation at the intermediate time as computed by the intermediate euler step and you have no need to compute anything except what direction the thrust is in given the ship's orientation at that intermediate time.
I'd like to change your sentence to "Since the intermediate values are already being calculated the simplest way they can(an euler step), the only way to actually save on computation time is to just not calculate the intermediate value for some variables whose variation over the interval is deemed unimportant." (Period.)
Rotation is unimportant. (AND expensive.)
In fact, rotation is important IN getForce...(), as far as thrusting direction agreeing with rotation; but not in absolute terms.
This is EXACTLY why I compute the orientation at the intermediate states and feed it (as part of the state) to the getForceTorqueandMassLoss function. I didn't want you to have to do any interpolations or extrapolation in getForceTorqueandMassLoss. My assumption was that you would have access to the thrust vector relative to the orientation of the ship. Then, when I give a different orientation as the argument to getForceTorqueandMassLoss, it is just a rotation to compute the direction of the thrust vector in the new orientation. I have already given you, as an argument to getForce, exactly the orientation that I want to compute the force for. No interpolations needed.

In principle, the force acting on an object can be written as F(x,v,a,orientation,a_v,a_a,m, etc...). My intention was to feed all the possible values that might be important into the function F(which is getForceTorqueandMassLoss).
DONE!
No half steps. No helper() code for rotation.

THEN, when you call my getForce() function, you give me those quats:
Force = getForce( position, etceteras, last_orientation_quat, next_orientation_quat, t );
so that I can interpolate between those orientations and give you properly oriented force vectors.
But the interpolations you intend to do in getForce are as computationally expensive as the simple euler rotation I do before each getForce call.
I was simply saying, and I don't know how to say it more clearly, that the code INSIDE getForce...() needs to interpolate orientation, to be able to give you back thrusting forces that reflect the fact that the ship has changed orientations at each call, as they are spread in time by multiples of 0.5*dt.
I likewise, can't figure out how to say it more clearly, the code INSIDE rk4 has already calculated the orientation at the intermediate step using a simple euler step(same cost or cheaper than interpolation regardless of where you do it). getForce does not need to interpolate anything. The orientation(which should be used to compute the direction of the thrust vector) is an argument to the function each time it is called.
But helper() would NOT do any computations related to rotation and orientation. Orientation is done, complete, finito, by the time helper() is called for a first time.
Great, so you cut out the euler step for each of the 3 intermediate state calculations and added an interpolation to the getForce function (which is called 3 times). We at least agreed that linear interpolation is as expensive as the euler step, so you haven't saved any computation, but now, the value used for the getForce function is not the value computed by an euler half-step like it is supposed to be. The value of the orientation is instead being computed as an interpolation over a whole-euler step. This is less accurate, AND, it doesn't feedback like it is supposed to. You see that both of the first two helper functions both have .5dt time steps. Under your pseudocode, the orientation at the first helper's getForce call will be identical to the one at the second getForce call, because the interpolated orientation will be the same. However, with normal rk4, the first helper returns a usual euler half-step, but then the velocity at the end of this intermediate state becomes the velocity for the second helper step, and you get a different orientation. Getting the thrust vector for these different orientations allows rk4 to compute how the thrust varies with orientation. Don't ask how, its complicated. You won't find a decent derivation anywhere online. You literally have to go back to the original paper to find it, and even that is unspeakably unclear.

At this point, if this doesn't clear it up, I think we need to just start over starting with the original rk4...going step by step with the proposed changes.

P.S. I couldn't get to the code this weekend. I had to go to New York, and well, I will just say that I really hate New York.
I am going to try tomorrow to write a "speedy" version of rk4 addressing the concerns you have raised.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

My deepest apologies. :oops:

I completely forgot that I had orientation available to getForceTorqueandMassLoss().
Totally forgot.
And the reason I confused myself was that I pictured so strongly a future rk4Step() without rotation computations in helper(), that my mind simply erased the current state of the code from its memory.
Yes, you're absolutely right.

Still, I think my solution is more code-efficient. Why? Because the Euler step you do inside the helper() function includes all of the following code:

Code: Select all

    output_deltas.angular_velocity = angular_velocity + input_deltas.angular_acceleration*dt;
    angle = input_deltas.angular_velocity.length() * dt;
    axis = normalize( input_deltas.angular_velocity );
    angular_delta_quat = quat< double >( angle, axis );
    output_deltas.orientation  = orientation;
    output_deltas.orientation *= angular_delta_quat;
And that is HEAVY to do three times. Count the number of instructions, if you don't believe me. It's HUGE.

With my solution, we'd do it only once, at the beginning of rk4Step (IOW, do just a single Euler step).
And then interpolation in getForce...() is as simple as,

Code: Select all

orientation = lerp( t/dt, next_orientation, previous_orientation );
orientation.normalize();
where

Code: Select all

quat lerp( double x, quat const & a, quat const & b )
{
    return quat( b*x + a*(1.0-x) );
}
And then we probably don't even need to interpolate, because we have the previous and next quaternions
of orientation, and the only other orientation we need, afaik, is the mid-point; right?
So, in fact, forget about interpolation in getForce...(); just do it in the call to it, if/when necessary.

With me so far?

However, to calculate the single Euler step for rotation at the beginning of rk4Step, you'll need torque from me, as well as mass loss.
So, to do this we need to split the getForceTorqueandMassLoss() into two functions:

Code: Select all

getTorqueandMassLoss( time ); //no Force
No arguments: I know both the torque and the mass loss without a need for state. So you'd call this first.
Then you'd do the single Euler step for rotation.
And then you'd do the calls of helper(), but where helper doesn't mess with rotations except to pass a rotation to the other function split it calls, namely,

Code: Select all

getForce( position, time, orientation );
Where orientation is either the previous quaternion of orientation, or the next, or the average of the two, normalized.

Do you see my point now?

Yes, doing Euler instead of RK4 for rotation is a compromise of precision, but precision of rotation is MUCH less important than precision of position simulation; --and MUCH more expensive.


Here's the current helper() function (scroll down):

Code: Select all

inline void RigidBody::helper( RigidBody const & input_deltas, double dt, RigidBody & output_deltas )
{
    //performs a basic EulerStep and returns the state after a time dt
    static float angle;
    static vec3< double > axis;
    static quat< double > angular_delta_quat;
    static ForceTorqueandMassLoss FTdm_dt;
    output_deltas.position = position + input_deltas.velocity*dt;
    output_deltas.velocity = velocity + input_deltas.acceleration*dt;
    output_deltas.angular_velocity = angular_velocity + input_deltas.angular_acceleration*dt;
    angle = input_deltas.angular_velocity.length() * dt;
    axis = normalize( input_deltas.angular_velocity );
    angular_delta_quat = quat< double >( angle, axis );
    output_deltas.orientation  = orientation;
    output_deltas.orientation *= angular_delta_quat;
    output_deltas.mass = mass+input_deltas.dm_dt*dt;
    output_deltas.inverse_mass = 1.0/output_deltas.mass;
    output_deltas.tensor_of_inertia = tensor_of_inertia
                                               * output_deltas.mass
                                               * inverse_mass;
    output_deltas.time = time + dt;
    getForceTorqueandMassLoss( output_deltas, FTdm_dt );
    output_deltas.acceleration = FTdm_dt.Force*output_deltas.inverse_mass;
    output_deltas.angular_acceleration = FTdm_dt.Torque/output_deltas.tensor_of_inertia;
    output_deltas.dm_dt = FTdm_dt.dm_dt;
}
And here's what the new helper() function would look like:

Code: Select all

inline void RigidBody::helper( RigidBody const & input_deltas, double dt, RigidBody & output_deltas, quat rotation )
{
    static ForceTorqueandMassLoss FTdm_dt;
    output_deltas.position = position + input_deltas.velocity*dt;
    output_deltas.velocity = velocity + input_deltas.acceleration*dt;
    output_deltas.mass = mass+input_deltas.dm_dt*dt;
    output_deltas.inverse_mass = 1.0/output_deltas.mass;
    output_deltas.time = time + dt;
    getForce( output_deltas, FTdm_dt, rotation );
    output_deltas.acceleration = FTdm_dt.Force*output_deltas.inverse_mass;
    output_deltas.angular_acceleration = FTdm_dt.Torque/output_deltas.tensor_of_inertia;
    output_deltas.dm_dt = FTdm_dt.dm_dt;
}
Less than half as many lines of code, and more importantly, the lines of code removed were the heaviest code, with roots, sine and cosine and gazillions of multiplications and additions.

And now, this is what the new rk4Step might look like:

Code: Select all

void RigidBody::rk4Step( double dt ) //non-reentrant; see comment about static variables
{
    OutputState & ostate = toggler ? *o0 : *o1;
    static RigidBody d2, d3, d4;
    static TorqueandMassLoss Tdm_dt;
    static Force F;
    static float angle;
    static float inv_dt;
    static float temp;
    static vec3< double > axis;
    static quat< double > angular_delta_quat;
    static vec3< double > rk4_angular_velocity;
    static quat< double > old_orientation;
    assert( not_zero(dt) ||! "rk4Step passed a dt of zero in the argument" );
    Tdm_dt = getTorqueandMassLoss();
    angular_acceleration = Tdm_dt.torque * inv_tensor_of_inertia;
    angular_velocity += angular_acceleration*dt;
    angle = angular_velocity.length() * dt;
    axis = normalize( angular_velocity );
    angular_delta_quat = quat< double >( angle, axis );
    old_orientation = orientation;
    orientation *= angular_delta_quat;
    orientation.normalize();
    helper( *this, 0.5*dt, d2, (old_orientation+orientation).normalize() );
    helper( d2, 0.5*dt, d3, orientation );
    helper( d3, dt, d4, (2.0*orientation-old_orientation).normalize()? );
    inv_dt = 1.0/dt;
    temp = inv_dt*inv_dt;
    ostate.X0 = position;
    ostate.V0 = velocity;
    //C1=(-1*(2*dt*v0 + dt*v1 + 3*x0 - 3*x1))/Power(dt,2)
    //C2=(2*(dt*v0 + dt*v1 + 2*x0 - 2*x1))/Power(dt,3)
    ostate.C1 = position*3.0 + velocity*(dt*2.0);
    ostate.C2 = position*2.0 + velocity*dt;
    position  += ( ( 1.0/6.0 )*dt*( velocity+2.0*( d2.velocity+d3.velocity )+d4.velocity ) );
    velocity  += ( (1.0/6.0)*dt*( acceleration+2.0*( d2.acceleration+d3.acceleration )+d4.acceleration ) );
    ostate.C1 -= position*3.0;
    ostate.C2 -= position*2.0;
    ostate.C1 += velocity*dt;
    ostate.C2 += velocity*dt;
    ostate.C1 *= (-1.0*temp);
    ostate.C2 *= (2.0*temp*inv_dt);
    mass  += ( (1.0/6.0)*dt*( dm_dt+2.0*( d2.dm_dt+d3.dm_dt )+d4.dm_dt ) );
    tensor_of_inertia *= inverse_mass;
    inverse_mass = 1.0/mass;
    tensor_of_inertia *= mass;
    time += dt;
...or something along the lines.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

chuck_starchaser wrote:7 lines of code to allocate.
6 lines of code to deallocate.
Find an allocator out there that can match that :)
Compiles, too...
I can write one for you :p
(that uses half as much memory)

I wrote array allocators many many times - I can dictate them if you want ;)

I wrote two for work, a refcount allocator for python, several other allocators for other projects, and each time I get it to compile to... um... 5 assembly instructions I believe.

Anyway, I'd make it a template. The compiler optimizes template code more aggresively (besides if it's a template it can know the type of objects being handled). So I'd get rid of the char array, in favor of a T[] array.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

@Errol: if I'm following the discussion right, you'd be computing a thrust vector for each intermediate step. I see a potential performance issue here if that is coded "generically" - ie: calling some kind of Unit function, or accessing some fields within the unit.

In short, doing that would hurt data flow patterns and make it all run a lot slower. I don't believe thrust vector relative to the unit's local coordinate system would change within a time step by much. In fact, I believe we can consider it constant, or at most linearly interpolated between initial and next state. Perhaps you should, then, add thrust vector to the input state structure (and have Unit write there instead of RK reading it).

Then getForceAndTorqueAndWhatnot() would be greatly simplified - am I right?
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

klauss wrote:
chuck_starchaser wrote:7 lines of code to allocate.
6 lines of code to deallocate.
Find an allocator out there that can match that :)
Compiles, too...
I can write one for you :p
(that uses half as much memory)

I wrote array allocators many many times - I can dictate them if you want ;)
Hahaha; unfortunately my cell phone is disconnected.
Apparently they want payment from me!
But you got me curious.
I wrote two for work, a refcount allocator for python, several other allocators for other projects, and each time I get it to compile to... um... 5 assembly instructions I believe.
VERY curious.
Anyway, I'd make it a template. The compiler optimizes template code more aggresively (besides if it's a template it can know the type of objects being handled). So I'd get rid of the char array, in favor of a T[] array.
It IS a template; but the two template arguments are size_t, for element size and array size.
I just didn't know how to write template metaprogramming code to allocate extra size to T to align the structure to a POT.
But maybe I could just use STATIC_ASSERT to make sure T is POT-2 size (-2 because I need 2 bytes for index of next free location).

I never wrote an array allocator before, so it could definitely be that there's some trick I failed to re-invent.
klauss wrote:@Errol: if I'm following the discussion right, you'd be computing a thrust vector for each intermediate step. I see a potential performance issue here if that is coded "generically" - ie: calling some kind of Unit function, or accessing some fields within the unit.
Sorry for replying to a message directed to Errol.
That's correct; that IS Errol's plan, and I agree with it. See below.
In short, doing that would hurt data flow patterns and make it all run a lot slower.
Not so bad if we prefetch unit data together with physics data; but in fact, what I'm doing is I have a class phys_object that inherits Errol's RigidBody class, and adds to it such things as thrusting and other forces, distance (to the player), an "agility" number, for prioritizing; and will probably include collision data too. So, all of this will be in the same array-allocated structure.
I hear you screaming. :)
Thing is, right now RigidBody is a thoroughly inconvenient size of 280 bytes, and will need another 140 for the tensor of inertia and inverse tensor of inertia. That's 420 bytes; so there's 90 bytes unused to the next POT-2 (510), so might as well.
So, we actually won't need to touch Unit at all, probably; but we'll have a whopping 8 cache lines to raise per rigid body, plus another for the extrapolation structure.

((Yeah, I know, earlier I spoke of RigidBody fitting in a cache line or two. Just a mistake, I multiplied the sum of scalars by 4 bytes, instead of 8; --everything is a double.))
I don't believe thrust vector relative to the unit's local coordinate system would change within a time step by much. In fact, I believe we can consider it constant, or at most linearly interpolated between initial and next state. Perhaps you should, then, add thrust vector to the input state structure (and have Unit write there instead of RK reading it).
There's no "input state structure", right now, though you could think of the thrust-vector-to-be in phys_object as an input structure. I think it's a given that thrusting will be assumed constant across a physics step, though, frankly, I never thought about it.
In any case, the issue is rotating that thrust vector as the ship rotates across the rk4 super-sampling mini-steps.
Errol is computing orientation for each of those mini-Euler steps, which he says are a simple multiplication and addition, but they boil down to tons of multiplications and additions, plus two square roots and sine and cosine calls.
My suggestion is to do all that only once per physics step --i.e. use Euler for rotation, and produce rotations of thrust vector at the rk4 mini-steps by much simpler quaternion interpolations (12 multiplications, 9 additions, 1 square root).

But I suspect I'm missing something...

Ahhhh, I get it, I think:

You're saying we should rotate the thrust vector directly, rather than compute ship rotation and then get thrusting direction at each call of getForce...(). Is that it? Good idea!
So, getForce...() should be called only once, really; its data stored temporarily, and then manipulated as needed inside rk4step(). Well, not quite: rk4 has no knowledge of gravitational attractors; it HAS to ask for a resultant force, every time. But yes, there should be a void getReady() function that tells Unit to prepare a thrust vector, and what getForce...() should pass as argument is a differential (rotation, rather than orientation) quaternion to apply to that thrust vector, in subsequent sub-steps.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

chuck_starchaser wrote:Ahhhh, I get it, I think:

You're saying we should rotate the thrust vector directly, rather than compute ship rotation and then get thrusting direction at each call of getForce...(). Is that it? Good idea!
So, getForce...() should be called only once, really; its data stored temporarily, and then manipulated as needed inside rk4step(). Well, not quite: rk4 has no knowledge of gravitational attractors; it HAS to ask for a resultant force, every time. But yes, there should be a void getReady() function that tells Unit to prepare a thrust vector, and what getForce...() should pass as argument is a differential (rotation, rather than orientation) quaternion to apply to that thrust vector, in subsequent sub-steps.
Pretty much.

About the allocator, metaprogramming is easy (in theory): just write a template as if it were a function, with a constant value() getter:

Code: Select all

template<int SIZE> class meta_log
{
    static int value()
    {
        if (SIZE <= 1)
            return 0;
        else
            return 1 + meta_log<SIZE/2>::value();
    }
}
Isn't that fun?
(I won't guarantee it compiles - metaprogramming is known for its ability to kill compilers ;) )

So I'd stay away from template metaprogramming (because it will be a compatibility problem). Then perhaps having the byte array is needed... if you REALLY want to pad to POT - which I don't think you need to. You just have to align to cache line boundaries, so you could use the trick:

Code: Select all

struct Entry {
    struct CacheLine {
        char pad[64];
    }
    union {
        T data;
        CacheLine padding[(sizeof(data)+sizeof(CacheLine)-1) / sizeof(CacheLine)];
    }
}
Having unions is a bit of a pain in the ass, though. So you could just have an assert that your (sizeof(T) % SIZEOF_CACHE_LINE) == 0, and pad manually in T ;)

(boost has static asserts that fail at compile time)


Edit:

It iched so bad that I tried ;)

Code: Select all

#include <stdio.h>

template<int SIZE> class meta_log
{
public:
    static int value()
    {
        if (SIZE <= 1)
            return 0;
        else
            return 1 + meta_log<SIZE/2>::value();
    }
};

struct TestStruct {
    double a,b,c,d,e,f,g;
};

int main()
{
    printf("%d",meta_log<sizeof(TestStruct)>::value());
}
Resulting assembly:

Code: Select all

Dump of assembler code for function main:
0x00000000004005c0 <main+0>:	sub    $0x8,%rsp
0x00000000004005c4 <main+4>:	mov    $0x5,%esi
0x00000000004005c9 <main+9>:	mov    $0x4006cc,%edi
0x00000000004005ce <main+14>:	xor    %eax,%eax
0x00000000004005d0 <main+16>:	callq  0x400498 <printf@plt>
0x00000000004005d5 <main+21>:	xor    %eax,%eax
0x00000000004005d7 <main+23>:	add    $0x8,%rsp
0x00000000004005db <main+27>:	retq   
End of assembler dump.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Can't make it work.
char[size param] wants a constant value; doesn't like those static x<y>::value() functions.
This is out of my league, Klauss; and since you've done array allocators before, how about
you just give me working code for the allocator?
Basically, this is what I have, but now it's broken:

Code: Select all

template < typename T, size_t ASz >
class ArrayAllocator
{
    template< size_t SIZE >
    struct meta_log
    {
        static size_t value()
        {
            if (SIZE <= 1)
                return 0;
            else
                return 1 + meta_log< SIZE/2 >::value();
        }
    };
    template< typename U >
    struct meta_size
    {
        typedef char[(1 << (meta_log< sizeof(U)+1 >::value()+1)-2] out_t;
    };
public:
    typedef phys_obj_id_t ndx_t;
private:
    template< typename X >
    struct ArrayElement
    {
        X x;
        ndx_t next_free_;
    };
    ArrayElement< meta_size<T>::out_t > array_[ASz];
    ndx_t first_free_;
    ndx_t count_;
    //non-copiable:
    ArrayAllocator( ArrayAllocator const & );
    void operator=( ArrayAllocator const & );
public:
    void erase()
    {
        for( ndx_t i = ASz; i !=0; --i )
            array_[i-1].next_free_ = i;
        array_[ASz-1].next_free_ = 0;
        first_free_ = 0;
        count_ = 0;
    }
    ArrayAllocator()
    {
        erase();
    }
    ndx_t get_count() const { return count_; }
    bool is_free( ndx_t ndx ) const { return array_[ndx].next_free_ != 0; }
    void * operator[]( ndx_t ndx ){ return reinterpret_cast<void*>( array_+ndx ); }
    ndx_t allocate()
    {
        prefetch_dirty( array_[first_free_].next_free_ );
        ndx_t ret = first_free_;
        assert_or_throw( ret < ASz-1, "ArrayAllocator full" );
        first_free_ = array_[ret].next_free_;
        ++count_;
        array_[ret].next_free_ = 0;
        return ret;
    }
    void deallocate( ndx_t which )
    {
        assert_or_throw( which < ASz-1, "ArrayAllocator: deallocating index beyond range" );
        assert_or_throw( array_[which].next_free_ == 0, "ArrayAllocator: deallocating index that was already free" );
        ndx_t temp = first_free_;
        --count_;
        first_free_ = which;
        array_[which].next_free_ = temp;
    }
    //this iterator is a joke; intentionally; super-simple: it always starts at zero and
    //only has pre-increment operator which returns the index of the next array element in use,
    //at each ++. If you ++ and there are no more used elements, it simply throws msg("").
    //To be used like... iterator i; try{ while(1){ do_some(myarray[++i]); } } catch(...){} 
    class iterator
    {
        ArrayAllocator const & aa_;
        ndx_t ndx_;
        ndx_t cnt_;
    public:
        iterator( ArrayAllocator const & aa ) : aa_(aa), ndx_(0), cnt_(0) {}
        void rewind(){ ndx_ = 0; cnt_ = 0; }
        ndx_t operator++()
        {
            assert_or_throw( cnt_ < aa_.get_count(), "" ); //throws msg("") when done.
            while( aa_.is_free(++ndx_) );
            ++cnt_;
            prefetch_clean( aa_[ndx+128/TSz] );
            return ndx_;
        }
    };
    std::auto_ptr<iterator> get_iterator(){ return new iterator( *this ); }
};
If T is, say, 280 bytes, it has to pad to 510 for the inner structure, then add a uint16_t index to next.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

Again, does it have to be POT? Why not just align to cache line boundaries?
What's the rationale?
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Sorry; that's what I meant, really. Final size should be 64*N - 2
Well, some machines have 128 cache lines...
Bah, those are Intel, anyways.
So... 64.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Another thing, Klauss:
I'm not sure how graphics frame time is computed, but I was thinking that IF there is a function in OpenGL that you call to get an estimated next_frame_time, in milliseconds, I was thinking of filtering it, to get finer resolution, sort of like,

Code: Select all

double next_frame_time()
{
    static size_t previous_time_ms = ::GL_next_time();
    static double filtered_time = double( ::GL_next_time() );
    static double filtered_delta = 22.2;
    size_t time_ms = ::GL_next_time();
    size_t temp1 = time_ms - previous_time_ms;
    double temp2 = double(temp1) - filtered_delta;
    filtered_delta += 0.1 * temp2;
    if( std::abs( filtered_delta ) >= 1.0 )
    {
        if( temp2 > 0.0 )
            filtered_delta += 0.5*(temp2-1.0);
        else
            filtered_delta += 0.5*(temp2+1.0);
    }
    filtered_time += filtered_delta;
    previous_time_ms = time_ms;
    return filtered_time;
}
Or something along the lines. Probably it should return a fixed point numeric type, rather than a double; just
trying to simplify.
Should respond quickly when frame-rate changes quickly, but smooth out millisecond jitter.

But I was now thinking, what we really care for is the time of the next screen refresh AFTER the next frame time.
Why?
Because that represents, for example, 16 mS jitter, at 60 Hz refresh rate; which makes 1 mS jitter small potato.

So, question is, is next refresh time AFTER next graphics frame time... Turing-computable? :mrgreen:

EDIT:
Don't miss my previous post, which is now at the bottom of the last page.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

First, GL doesn't tell you such a thing. I'm not even sure how to define "next frame time".

So... how would you define "next frame time"?
The time it will take to render all the commands you will send? :roll:

I know what you're trying to do. Avoid jitter. But... I don't think it's even as remotely easy as you seem to think. Filtering rendered times won't get you anything, you have to minimize the difference between the time you're rendering within your virtual universe (ie: the time your graphics rendering is depicting) vs. the actual display time of the depiction.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

I didn't tink it would be easy; that's why I asked if it was "Turing-computable". I don't think it is, really. You'd have to know how long the commands will take to execute, and then delay to the (asynchronous) next screen refresh.
It was a shot.
Looks to me now like we have a problem, and that the whole industry has a problem, and nobody's yet saying the emperor has no clothes on, but, millisecond jitter would be pretty bad, if it was the main problem, but there's another problem that's much rougher and coarser, an uncertainty in the order of 15-50 milliseconds: what every real-time game needs to know in advance is next *display* time, hopefully in microseconds, and there should be a way for the gpu to commit to this time, once it has promised it, such as by delaying vsynch an extra refresh cycle. THAT''s what the industry needs; the only thing that will make games look choppy but continuous, like movies, which they don't.
And for us it means not only that millisecond jitter is not an issue, but probably that we won't need to linearly interpolate betwen 32Hz cubic interpolations. Not sure; just throwing in the idea. The uncertainty about when the frame we are computing will actually be on the screen is so great it dwarfs such issues, seems to me.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

Well, to even be able to achieve that, you'd have to:
  • Guarantee your scenes won't be overly complex (requiring a drop in FPS)
  • Render a future state in advance (perhaps by simulating a bit ahead and accepting input-reaction lag as a side effect)
  • Delay display of such state to the precise predefined display time
Thing is, a lot of things can collude to prevent you from achieving any of those goals:
  • You can inadvertantly surpass GPU resources and enter its "sweet spot" (where its performance drops radically).
  • You can have jitter in your time source, defeating the whole thing.
  • You can have problems simulating the physics, having the same consequence as having too complex scenes graphically: lost frames.
  • The user can specifically ask for more detail than his hardware can handle.
In order to avoid the introduction of incorrectly timed pictures, you'd have to verify just before swapping that you can actually meet your deadline, and bail out (and skip the offending frame) if you do miss it. You'd trade skipped frames vs incorrect timings. I really, really don't know what's worse. I've experience incorrectly timed frames (happens a lot when, say, your computer starts swapping), and it's indeed annoying. But I don't know if skipping frames will be less annoying.

You're facing a really hard problem, because to approach a solution you have to make compromises, difficult compromises. Not because implementing any solution is difficult, I doubt any solution is hard to implement.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Yep; that's what I suspected.
Anyways, the conclusion for me is that under the best of circumstances (constant gpu load) there will be time jitters
of approximate magnitude = min( 1/refresh_rate, 1/FPS ), typically 13 to 17 milliseconds --say 15 milliseconds, for a
figure to keep in mind--; so it's pointless to filter time to get rid of millisecond jitters.
Now, perhaps it doesn't make it pointless to interpolate between 32 Hz cubic interpolations; I'm not sure.
No; it certainly doesn't; that would add 30 milliseconds to the 15.
What it DOES put into question is the value of doing cubic interpolations.
Well, no; a unit that gets simulated once every four seconds, say, absolutely needs cubic interpolations.
But a unit being simulated at 16 Hz could be linearly interpolated...
Nah; the 32 hz interpolation comes right at the middle, where it counts the most.

Okay, never mind; we can just stay the course. All that's new is that filtering time is not warranted.
Errol_Summerlin
Merchant
Merchant
Posts: 46
Joined: Thu Mar 11, 2010 4:10 am

Re: rk4 algorithm sent to you chuck

Post by Errol_Summerlin »

With me so far?
kind of... I will explain later

First though, I just figured out another source of confusion. lerp, as it is written there, is not a mathematical linear interpolation. It is a non-linear interpolation. It provides the right values at the end points, but angular velocity over the interval is not constant. It varies over the interval. Exactly how it varies is pretty complicated and depends on the exact rotation.

First, I think the code

Code: Select all

b*x + a*(1.0-x)
should have a and b swapped to make it consistent with the variable labels.
It seems like, at x=0, the result should be previous_orientation(b), and at x=1, the result should be next_orientation(a).

Code: Select all

a*x + b*(1.0-x)
provides those results at the end points.

Now, about the difference between lerp and a "REAL" interpolation. Say, for example, that b=1,0,0,0 and a=0,0,1,0...a 180 degree rotation about the y-axis

at x=1/3, the object should be rotated 60 degrees from its original position. At this point, a REAL linear interpolation at constant angular velocity over the interval gets intermediate_quaternion=(.866,0,.5,0) [.866=sqrt(3)/2]

For the lerp, you get (2/3,0,1/3,0)/sqrt(5/9)=(.8944,0,.447,0). This is only a 53 degree rotation. By the time you get to x=1, the lerp has caught up to the "REAL" interpolation, but pretty much everywhere else, it is off a little.

When you said a linear interpolation before, I thought you were talking about a REAL linear interpolation(like that which I proposed to Klauss) that IS just as computationally expensive as the euler step I was performing. This is why I thought you were speaking non-sense. It is once again us speaking different languages.

So, now that this is cleared up, I finally see where you can save computation by immediately computing the final orientation as separate euler step that may as well be outside rk4 altogether and using a lerp to get the orientation argument for the getForce function.

However, there are still several problems:

1) You didn't address my point made near the end of my last post. The intermediate steps are not simply sampling different times. The helper function is called twice with the same dt/2 time step starting from the initial position BOTH times. The difference between the two steps is that the first one uses the derivatives at the initial position, and the second step uses the derivatives at the position and time it calculated in the first helper call. In your algorithm, both of the first two helper functions will use the same value of orientation in the getForce function whereas the true rk4 would provide different values of orientation. This will have the effect of making orientation independent of angular velocity and torque as far as rk4 is concerned because, despite having a different angular velocity in each of the calls to helper (*this and d2 will have different angular_velocities), the resulting orientation at the half time step was the same both times. In addition the final helper step is NOT supposed to be the final orientation. The derivatives being used to calculate it are obviously not the right derivatives. That is intentional as it is just supposed be one component of the averaging process. I am having trouble articulating this point. I guess what I am trying to say is that this is messy and I am not sure how it will work.

2) lerp is a BAD interpolation. It could be industry standard for all I know, but it is BAD. It creates anomalous variations in angular velocity that have nothing to with actual torques on the object. It may work ok over an interpolation step. In fact, for computational speed reasons, it probably HAS to be the choice for interpolation, but I am very dubious as to how it will behave in a physics step where the force is dependent upon the lerped orientation.

3) You have the orientation arguments to the helper function as additions of quaternions. Additions of quaternions are NOT rotations. They can approximate rotations in the small angle limit, but not all physics steps will have small rotations over the interval. So, you have to select carefully only objects that are guaranteed not to spin too much over a physics step. And, given that a spacecraft can still get hit by something and exceed the governor setting (until the governor can slow the rotation back down). I can't see even player ships abiding by this limit ALL the time. Even assuming that the lerp result is good enough, the argument to your helper functions don't seem right. Unless I am missing something(very possible given how hard it is for me to follow "optimized" code), both of the first two helper functions should have (orientation+old_orientation).normalize() as their argument. And the last should be just orientation.

4) The two big computational problems as you see them are the construction of the quaternion from the angle and the axis which involves a sin and a cos and a sqrt and the actual rotation which involves a quaternion multiplications which is 12 adds and 16 multiplications. While this does eliminate those cumbersome calculations, it does it in a rather convoluted way that results in unpredictable responses(due to the feedback) with significant deterioration of rotational accuracy(due to the simple euler step over the whole time step) and even more significant deterioration of the thrust vector accuracy(due to lerp).

I think the best solution is to leave rk4 as is but not compute the orientation in the helper function.
First, delete these 5 lines

Code: Select all

angle = input_deltas.angular_velocity.length() * dt;
    axis = normalize( input_deltas.angular_velocity );
    angular_delta_quat = quat< double >( angle, axis );
    output_deltas.orientation  = orientation;
    output_deltas.orientation *= angular_delta_quat;
Then add this line of code

Code: Select all

angular_delta=angular_velocity*dt;
Then, add angular_delta to the arguments to getForceTorqueandMassLoss()

Then, in getForceTorqueandMassLoss(), you will have the angular_delta vector(which contains both the magnitude and the axis of the rotation) and the original orientation(as part of the state).

Then, getForceTorqueandMassLoss() has all the information needed to compute the "exact" orientation (computationally expensive, equivalent to rk4 as it is now), completely ignore the angular_delta(assuming orientation constant over a time step, computationally cheap), or do an approximation to a quaternion rotation(small angle approximation or quaternion addition instead of multiplication). This gets those heavy computations out of rk4 and allows the getForce function to make a determination about the importance of the rotational element's importance in calculating the force.

See, rk4 doesn't actually care about the orientation for any purpose other than the call to getForceTorqueandMassLoss(). Only the derivatives at the advanced state are used in future helper calls.

The above addresses your major concerns regarding computational cost, but doesn't pigeon hole the algorithm into the lerping hack which may not work for all objects. It defers the orientation computation to getForce so it only gets done if you REALLY need to do it. So, for things that thrust, most of the time, they will have a governor, and you can use the small angle approximation, and for things that don't thurst, you can ignore the angular_delta because the Force is independent of the object's orientation. But, if something is thrusting and is spinning fast(perhaps due to a collision), you may have to full monty and actually compute the quaternion multiplication.

Finally, the above modification is simpler to explain and simpler in terms of code complexity. It is also easier to foresee the consequences of various approximations that may be used. Even completely ignoring the angular_delta won't break rk4. It might produce a delayed response to rotation while thrusting if the physics step is too large, but it may, as you argued, produce no effect the player can notice.

What is even better is that we dumped the heavy code, and the simulation is still 4th order for rotation. The only place where additional error will creep in is if an object where force depends on orientation uses an insufficiently accurate orientation determination.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

(In my lerp example a was previous, b next.)
(Yeah, I though your two half steps were consecutive. So, what's the third one?)

Can't say I'm sure I understand every detail of the new proposal, but it sounds marvelous; let's do it just so.
Errol_Summerlin
Merchant
Merchant
Posts: 46
Joined: Thu Mar 11, 2010 4:10 am

Re: rk4 algorithm sent to you chuck

Post by Errol_Summerlin »

(In my lerp example a was previous, b next.)

Code: Select all

orientation = lerp( t/dt, next_orientation, previous_orientation );
orientation.normalize();
quat lerp( double x, quat const & a, quat const & b )
{
    return quat( b*x + a*(1.0-x) );
}
lerp is called in the first line of code with next_orientation as the second parameter. The second parameter to the function is a, which should be the previous orientation according to the equation you used. Thats what I meant by "inconsistent", but its not really important.
Yeah, I though your two half steps were consecutive. So, what's the third one?
Its complicated, and I can't say I fully understand why it all works out. This is why I am loathe to substantially change rk4. If you change an algorithm you don't fully understand, the results will likely not be what you anticipated. The easiest way to see what is going on is to step through an rk4 step in a particular scenario. Imagine the object orbiting at r0=(1,0,0) with v0=(0,1,0) and a0=-r0/r0^3=(-1,0,0). Lets try and rk4 step of dt=1 just for simplicity again.

So, we do the first euler half-step. r1=r0+v0*dt/2=(1,0.5,0), v1=v0+a0*dt/2=(-0.5,1,0), a1=-r1/r1^3=(-0.7155,-0.3578,0)

Then, we do the next euler half-step r2=r0+v1*dt/2=(0.75,0.5,0), v2=v0+a1*dt/2=(-0.358,0.8211,0), a2=-r2/r2^3=(-1.024,-0.6827,0)

The, we do the last full euler step r3=r0+v2*dt=(0.642,0.8211,0), v3=v0+a2*dt=(-1.024,0.3173,0), a3=(-0.56698,-0.7252,0)

Now, we compute rfinal=r0+(v0+2*v1+2*v2+v3)*dt/6= (1,0,0)+(-0.4567,0.8266,0)=(0.5433,0.8266,0)

The exact solution is (0.5403,0.8415,0). That is less than 2% error on a time step that goes 16% of the way around the orbit. This is the equivalent of taking a 2 month time step for the earth.

vfinal=v0+(a0+2*a1+2*a2+a3)*dt/6=(-0.841,0.5323,0)

The exact solution is (0.8415,0.5403,0). Again, very good for such a large time step.

But the point of this post isn't to espouse the virtues of rk4. It is to try to explain what rk4 is doing exactly. So, the first steps misses outside the circle. The position only changes due to the initial velocity(which neglects the acceleration and has no x-component). It finds out that the velocity there has an x component. So, it knows that it underestimated the x-component of the velocity. Then it goes back and does the same step using that velocity at the end of the half time step instead. The idea is that this will over-estimate the x-component of the velocity by the same amount the original underestimated and miss inside the circle. When these two steps, the dominant steps, are averaged at the end, the average acceleration will come out about right. I don't exactly have a good reason why the final step chooses to use the step that missed too close as opposed to the one that missed too far, but I imagine that either one produces a bit of error in the higher order terms, but that somehow, the extra iteration makes the one that missed too close a little better. This is a bit of handwavium, sadly, but the only way to get more detailed is to do the analysis done at this link.

http://www.artcompsci.org/kali/vol/vol- ... 04_ok.html

The rk4 algorithm discussed there is the same as ours, except, they are able to make a simplification because their second order differential equation is set up such that acceleration is only a function of position. Ours needs to be a bit more general. However, you can, in principle, do the same mathematical analysis they do on the algorithm we have. The actual analysis is near the bottom of that page "An attempt at an Analytical Proof", but you may have to read a bit above that to catch the thread of the conversation. The "attempt" is good enough for me, but you can continue reading for a more rigorous proof.

I will do the best 'I' can do here. If you write out the steps of the algorithm and write v0,v1,v2, and v3 in terms of v0 and the accelerations, you get rfinal=(a0+a1+a2)*dt^2/6+v0*dt+r0.

a1 and a2, in the example above are functions of r and can be written as a taylor series a1=a0+fd0*deltaR1+sd0*deltaR1^2/2 where fd0 and sd0 are the first and second derivatives of a with respect to r evaluated at deltaR=0.

runge-kutta computes a1 for a given deltaR1 and a2 for a different deltaR2. Using these two equations, and the known values of a1,a2, a0, deltaR1, and deltaR2, you can solve for fd0 and sd0.

We can now replace a1 with a0+... and the same for a2. We also know that deltaR1=r1-r0 and deltaR2=r2-r0. After doing all those substitutions, you get rfinal=r0+v0*dt+a0*dt^2/2 like you expect, but you also have several higher order terms. The first one is fd0*v0*dt^3/6. fd0*v0=(da/dr)*(dr/dt)=da/dt.

The next term is (a0*fd0+sd0*v0^2)*dt^4/24 This needs to be the second derivative of a with respect to time.

a full derivative can be written as Dj/Dt=dj/dt+dj/dr*dr/dt where D is a complete derivative and d is a partial derivative. Since j=fd0*v0, dj/dt=fd0*a0, dj/dr=sd0*v0 and dr/dt=v0. So, this term is correct as well.

Essentially, by sampling the acceleration at those two midpoints, we were able to get acceleration to second order which gives us position to 4th order. You may be asking yourself where the weighting comes in at this point. The particular weighting is how the coefficients come out right. The 1/6 and the 1/24 in each of the last two terms are the coefficients of a taylor expansions 3! and 4! respectively. If the weighting was different, the coefficients would not come out right. Say, for example, you had (v0+v1+v2+v3)/4 instead of (v0+2*(v1+v2)+v3)/6.

In that case, the coefficients for the last 3 terms would be 1/2, 3/16, and you would get a different coefficient for each of the partial terms in the last term. This would result in systematic errors.

It also may not be clear what happens for functions where a is dependent on more than one variable (such as this application). The answer is that THIS is really why rk4 is so stable. See, rk4 doesn't actually solve the equations for a1 and a2. These are only solved to provide proof that the coefficients are correct in the end (and to determine the correct coefficients in the first place). a1 and a2 each are KNOWN to be equivalent to their taylor expansion and correct up to some order. When you compute a1 in a REAL program, a1=a0+Da/Dt*dt+D(Da/Dt)/Dt*dt^2/2 where Da/Dt=da/dt+da/dr*dr/dt+da/dv*dv/dt. . . and so on for every variable. In other words, everything is already included in the value of a1 computed. This is part of the reason rk4 is so computationally heavy, but also the primary reason for its stability.

Finally, to answer your actual question, about the 3rd helper step. As you can see, it doesn't appear in the equations for r, but it does appear in the equations for v, where essentially, the same thing is done. Only, now, there are 3 equations (a1,a2, and a3), and you can deduce from that 3 terms of the taylor expansion for a giving you 4th order in v as well. One could, in principle, leave out this last helper step to save some time and only have velocity to the 3rd order, but the coefficients would get messed up and I would have to re-derive runge-kutta in that new circumstance...a task that, while interesting, would likely take a great deal of time.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Errol_Summerlin wrote:lerp is called in the first line of code with next_orientation as the second parameter. The second parameter to the function is a, which should be the previous orientation according to the equation you used. Thats what I meant by "inconsistent", but its not really important.
Ooops. I have a consistent problem with consistency; probably a pair of axons crossed the wrong way around...
Yeah, I though your two half steps were consecutive. So, what's the third one?
Its complicated, and I can't say I fully understand why it all works out. This is why I am loathe to substantially change rk4. If you change an algorithm you don't fully understand, the results will likely not be what you anticipated. The easiest way to see what is going on is to step through an rk4 step in a particular scenario. Imagine the object orbiting at r0=(1,0,0) with v0=(0,1,0) and a0=-r0/r0^3=(-1,0,0). Lets try and rk4 step of dt=1 just for simplicity again.

So, we do the first euler half-step. r1=r0+v0*dt/2=(1,0.5,0), v1=v0+a0*dt/2=(-0.5,1,0), a1=-r1/r1^3=(-0.7155,-0.3578,0)

Then, we do the next euler half-step r2=r0+v1*dt/2=(0.75,0.5,0), v2=v0+a1*dt/2=(-0.358,0.8211,0), a2=-r2/r2^3=(-1.024,-0.6827,0)

The, we do the last full euler step r3=r0+v2*dt=(0.642,0.8211,0), v3=v0+a2*dt=(-1.024,0.3173,0), a3=(-0.56698,-0.7252,0)

Now, we compute rfinal=r0+(v0+2*v1+2*v2+v3)*dt/6= (1,0,0)+(-0.4567,0.8266,0)=(0.5433,0.8266,0)

The exact solution is (0.5403,0.8415,0). That is less than 2% error on a time step that goes 16% of the way around the orbit. This is the equivalent of taking a 2 month time step for the earth.

vfinal=v0+(a0+2*a1+2*a2+a3)*dt/6=(-0.841,0.5323,0)

The exact solution is (0.8415,0.5403,0). Again, very good for such a large time step.

But the point of this post isn't to espouse the virtues of rk4. It is to try to explain what rk4 is doing exactly.
And VERY well explained. It finally got past the bone layer, this time.
So, the first steps misses outside the circle. The position only changes due to the initial velocity(which neglects the acceleration and has no x-component). It finds out that the velocity there has an x component. So, it knows that it underestimated the x-component of the velocity. Then it goes back and does the same step using that velocity at the end of the half time step instead. The idea is that this will over-estimate the x-component of the velocity by the same amount the original underestimated and miss inside the circle. When these two steps, the dominant steps, are averaged at the end, the average acceleration will come out about right. I don't exactly have a good reason why the final step chooses to use the step that missed too close as opposed to the one that missed too far,
Makes sense to me: If you extend the close miss it would definitely be closer to the circle than extending the far miss. Chances are the close miss error will decrease if you double the deltas; little chance of that with the far miss deltas.
This is a bit of handwavium, sadly, but the only way to get more detailed is to do the analysis done at this link.

http://www.artcompsci.org/kali/vol/vol- ... 04_ok.html

The rk4 algorithm discussed there is the same as ours, except, they are able to make a simplification because their second order differential equation is set up such that acceleration is only a function of position. Ours needs to be a bit more general. However, you can, in principle, do the same mathematical analysis they do on the algorithm we have. The actual analysis is near the bottom of that page "An attempt at an Analytical Proof", but you may have to read a bit above that to catch the thread of the conversation. The "attempt" is good enough for me, but you can continue reading for a more rigorous proof.
Gotta go right now; I'll have a look later; but I probably won't understand a thing.
......
All way over my head.


There's another issue with optimizations we're going to have to tackle, sooner or later...
Even if we limited gravitational attractors to a dozen per system, we still have an o(n^2) problem, which, frankly, I'm not so scholarly about optimizations to care much what the o is, unless it bites me in the ass; but I think its teeth are sharp in this case. For each object we simulate we've got to compute forces; gravity among them; but gravity implies going through those 12 attractors, computing distances squared, which involves... 3 subtractions, 3 squarings (muls), and two additions (no square root, thank God). Then we divide the mass by this d^2, and multiply by the gravitational constant.
So that's a grand total of
12*5= 60 addition/subtraction
12*4 = 48 multiplications
and
12 divisions
Then we vectorially add those forces, which is 11 adds x 3 dimensions = 33

Grand total for 12 attractors: 93 adds, 48 muls, 12 divides. And we have to do that 3 times per rk4 step, so,

279 adds, 144 muls, 36 divides... !!! per rigid body, per rk4 step, at ONLY 12 attractors... All double precision. :roll:

We talked about this before, and about several possible solutions: spheres of influence, turning space into a vector grid...

I just came up with an idea that I think is MUCH more elegant: Using a separate scheduler for gravity computations.

I'll explain it in a second, but let me say first why I think this is more "elegant":
Gravity is a weak force, and yet the one we need computed most accurately; but besides being weak, it changes very little from one place to another, given the vast distances involved in space.
However, increasing step sizes for planets and stations ensures that gravitation force vector deltas are not too insignificant, and helps us reduce computation load, which now we compensate for, precision-wise by using RK4. So far so good. But what about ships, notably small fighters in a dogfight? They may be simulated 16 times a second due to proximity to the player, and gravity force vector recomputed each time, 279 adds, 144 muls, 36 divides. Is this warranted? Yes and no: 16 Hz simulation may well be warranted, but NOT 16 Hz gravity.
Now, assume our original problem of drift when trying to dock at a station. I suggested before that the station could be harmonized with the player's simulation; --i.e.: simulating the station at 16 Hz-- as the player approaches.
Let me now suggest almost the opposite: Harmonize the player to the station. If the station is simulated every 8 seconds, say, do so with the player.
But no; we can't do that, as the player may thrust and steer and expect quick responses.
Well, the problem here is precisely that thrusting and steering forces, and gravity forces, would have vastly different step size needs; and we are mixing them together and trying to find ONE step size that fits both. THAT is the problem.
So the elegant solution is to address different animals differently.

A more detailed explanation:

This may be false, but assume that Gravity force computations would be done ahead of the rest of the physics, using the same scheduler code, but different scheduler data (object bit masks) from the other physics. So, we still have 16Hz, 8Hz, 4Hz, etcetera bins. Gravity force simulation has different priority criteria: Typically, given your little fighter is orbiting a planet at the same altitude as some big station, your frequency bin will be the same. Step size would be decided by some ad-hoc (but hopefully good) estimation of precision loss per unit of game time --however we might define that-- so that if precision loss is too little, frequency decreases; and if too high, it increases (by powers of two, of course). So, upon entering a system we could start everything at 1-second steps for gravity, and let the objects find their own best frequency bins; or we could start them all randomized, for that matter (probably better than overloading a single bin); or, we could use some rough initial best fit approximation.

But now, what stops us from using a similar algorithm for the rest of the forces? If you step on the gas, your simulation frequency jumps; if you don't, there is nothing to simulate... --except gravity; but that is now scheduled separately... or, is it?

We got a problem: When we simulate rigid body physics, we need to take all forces into account, not just thrusting or colliding, and not just gravity.
When I say we don't need to "simulate gravity" as often as other forces, what I mean is that we don't need to fully compute gravity force vectors (based on planet distance details) as often as we need to for other forces, but rk4step() needs the resultant of all forces as often as it needs to run.

So, the truer description of the solution I am proposing is a dual scheduling system (dual bit masks), for "gravity" and for "other", but a common resulting call of rk4step(). It would work like this: The gravity force scheduler moves the flag bit for a unit to a gravity frequency bin it deems ideal based on precision considerations. Similarly, the "other" force scheduler also places a flag bit for the unit, based on force applications, distance to the player, and whatever criteria, into a "other" frequency bin it deems most appropriate.
Now, units are simulated whenever either of those bits are set, for the current frequency bin being processed. But IF the flag bit is for gravity, rk4step will a) use the gravity step size, b) include a full gravity computation, and c) write gravity force results only, to a cubic interpolation structure.
Whereas IF the flag bit is an "other" flag bit, it will a) use the "other" step size b) skip gravity computations (distance to planets and all that), but compute an interpolated value for gravity vector, instead; and c) it will write normal rk4step() output to cubic interpolation structures, and not touch the gravity force structure.

Opinions?
Post Reply