rk4 algorithm sent to you chuck

Development directions, tasks, and features being actively implemented or pursued by the development team.
Errol_Summerlin
Merchant
Merchant
Posts: 46
Joined: Thu Mar 11, 2010 4:10 am

Re: rk4 algorithm sent to you chuck

Post by Errol_Summerlin »

Maybe we should just standardize on vectors of rotation for everything, and kiss quaternions good-bye. The point of using quaternions was that they were supposedly easy to interpolate; but by now we know better.
Well, here is the thing. I don't exactly understand why people use unit quaternions when the same information can be contained in a vector. I think it has to do with this though: A vector has all the information you need for the rotation, but there is no vector operation that will apply the rotations correctly. You have to use a rotation matrix. Consider the un-rotated default position of an object.

In quaternion notation, that is (1,0,0,0). In vector notation, it is (0,0,0). Now say you want to rotate Pi radians about the x-axis. In quaternion notation, this rotation is (0,1,0,0). In vector notation, this would be (Pi, 0, 0). This seems ok so far. We have the axis and the angle through which we rotated. Now however, consider an additional rotation about the y-axis of Pi radians. The combination of these two rotations is equivalent to a single rotation of Pi radians about the z-axis. Just visualize rolling a plane(airplane) upside down, and then pitching it over 180 degrees. It is now facing the opposite direction from where it started, but it is facing upright again. Just like if it had yawed 180 degrees.

You can see this by doing quaternion multiplication of (0,1,0,0)*(0,0,1,0)=(0,0,0,-1). However, if we do the same thing we did last time with the vector, we get (Pi,Pi,0). This is a rotation, but it does not take the object where it is supposed to be. Cumulative rotations don't work with the vector operations. This is why you need the rotation matrices.

You can "store" orientation as a 3 element vector, but to actually perform the rotation, you need to either construct quaternions of both the current orientation and the rotation, or construct a rotation matrix that can act on the 3-element vector.

According to wiki-pedia, quaternions are preferred because they don't suffer from something called gimbal lock that can happen with rotations. However, I have not been able to find any place on the internet that explains wtf gimbal lock is.

Likewise, you can compute the delta_theta_vector as a vector, but to actually apply it as a rotation, you need to construct a quaternion from it. This is why angular_velocity and angular_acceleration can be (and should be) vectors instead of quaternions. They are instantaneous derivatives. They aren't an actual finite rotation.

Sadly, quaternions are necessary for rotations to work correctly, but I currently can only do angular displacement calculations using vectors. So, unless I can figure out how to do it with quaternions(not sure its even possible), it is inevitable that we need to switch between the two... calculating the 3-element vector representing the rotation and then turning that into quaternion that we apply to the current orientation.
First we need to answer how many gravitational bodies we intend to simulate,
This is a HUGE question. The I we just do objects large enough to have assumed a mostly spherical shape, we already have hundreds of objects. If we start doing anything large enough to impact your space ship in a collision, then we basically need everything larger than a medicine ball. That is a number I wouldn't even like to guess at, but if I had to, I would say at least millions, perhaps billions of objects. You have the rings of the gas giants, the asteroid belt, the ort cloud (though it may be too far out for us to care about), comets.

I am actually starting to wonder if it might make more sense to treat each planetary system as its own "system". We would obviously need some complicated sector grid. With differentially rotating sectors within the system such that planets always stay in the same grid as they go around the sun. Aside from the problem of gridding, this would allow resources to be focused entirely on the system the player is in allowing us to model it in detail. This would mean a "loading" screen in-system, but if we are smart, the loading screens will be far away from anything of interest, and players can use jump-gates for in-system travel between planets and, in most cases, never see the loading screen as it is covered up by a fancy wormhole graphic...just like I presume inter-system travel loading screens are handled.
Can't find it.
1st option: linear interpolation. rk4 needs to give you the rotation <angle,axis>that gets you from the current state to the future state. If you just have the current state <angle,axis> and future state <angle,axis>, you could, in principle extract the rotation that got you there, but it is computationally expensive and complicated. It is more efficient to just carry it around in the state. So, the rotation <angle,axis> object is given. The value of angle is just the average angular velocity (av) times the time step(ts). So, I write interp_angle=(angle/ts)*t where t is the elapsed time since current state. You can see that when t=ts, we are at the future state and the value of angle is what it should be. Then, I construct the interp_rot_quaternion=quaternion(cos(interp_angle/2),axis*sin(interp_angle/x)). I then say interp_quaternion=interp_rot_quaternion*current_state_orientation_quaternion. This gives you the orientation at a time in between the current state and future state. And, you are guaranteed to end up where you are supposed to at t=ts.

It isn't pseudo-code, but that is the algorithm.
The gravitation of asteroids is too feeble to have any effect perceivable to the player.
Well, yes and no. Ceres is a large asteroid with .27m/s^2 acceleration at its surface, but there are few of those that exert significant gravity at any large distance. That is how I sized the spheres of influence.

a=GM/r^2--> r=sqrt(GM)/sqrt(a)...in my sphere of influence calculations before, I took my 'a' threshold to be the sun's gravitational pull at 100 AU which is around 6*10^-7 m/s^2. If you make the gravitational threshold higher, you will need to make exceptions for the sun and probably some of the gas giants with lots of moons to make sure that these objects are influencing the objects that are bound to them gravitationally. Otherwise, pluto just goes floating off into space. I also chose the low threshold so that even if something is traveling very fast, it will likely get picked up by the sphere of influence well before the gravitational pull is noticeable to the player.

Also, without micro-gravity, it will be impossible to keep the asteroid field from dispersing. The asteroids will all have small random velocities in the radial direction. In reality, the gravitational pull of the asteroids in the belt pulls back any errant asteroids that start to wander out. Without this, we will need to create some sort of artificial acceleration that keeps these objects in a belt.

While the effect of gravity on the player may be negligible. The cumulative effects of gravity on objects in the system(or the lack thereof IS perceivable to the player). If the asteroid belt disperses, or some of Jupiter's moons going flying off...or Saturn's rings disperse, these things will be noticed.

Another thing is that pluto holds charon in orbit via micro-gravity. It may be appropriate to have each object in the system have a list of "attractors", but have them be a fixed list for most objects. Only players would have dynamic lists. Objects like asteroids that could be moved from their normal orbits by player intervention may also become partially dynamic once they have been removed with their fixed list continuing to influence them all the time, but having the option to add other influences temporarily if it wanders into the sphere of influence of another object.

Damn! Tensor of inertia is a matrix, not a vector. Couldn't it be expressed more succinctly as three quantities?, --namely, the moment of inertia around each axis? I would envision us reading just 3 quantities from units.csv, then expanding that to a matrix, if necessary (in RigidBody's constructor); and maybe the inverse tensor of inertia as well, if it speeds up calculations.
It is not as simple as a moment of inertia about each axis. I can't really explain it that well. It is best seen through demonstration, but first, we need to calculate the moi tensor of an actual non-symmetrical object. Let me try and find a suitable object and get back to you. You will definitely want to carry around the inverse tensor of inertia. In fact, that is really the only thing that we need. We don't care about the actual moment of inertia. We are always doing torque*inverse_moi. Maybe for purposes of computing changes in the moment of inertia tensor, we will need it, but the computational expense of inverting matrices is such that I would definitely recommend carrying around the inverse.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

Errol_Summerlin wrote: According to wiki-pedia, quaternions are preferred because they don't suffer from something called gimbal lock that can happen with rotations. However, I have not been able to find any place on the internet that explains wtf gimbal lock is.
Perhaps this?
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

@Klauss: Alright, can't do it now; I have to leave --and won't be able to do it tonight either--; but to pre-answer your memory concerns,

Code: Select all

class RigidBody
{
    friend int main(); //temporary
    vec3< double > acceleration;
    vec3< double > angular_acceleration;
    vec3< double > velocity;
    vec3< double > angular_velocity;
    vec3< double > position;
    quat< double > orientation; //orientation in 3D space represented as a quat<double>
    vec3< double > tensor_of_inertia;
    double mass;
    double inverse_mass;
    double dm_dt; //rate of mass gain (<0 for loss)
    double time; //current time
public:
    struct ForceTorqueandMassLoss
    {
        vec3< double > Force;
        vec3< double > Torque;
        double dm_dt; //rate of mass gain ( <0 for loss)
    };
    struct OutputState
    {
        vec3<double> X0, V0, C1, C2;
        quat<double> rX0, rV0, rC1, rC2;
    };
    .................
As you can see, the memory foot-print for RigidBody is 128 bytes, right now; but there's going to be a matrix to come, for tensor of inertia, and perhaps a second matrix for precomputed inverse tensor of inertia. There's also OutputState, a struct of which there are two instances, but only one is written to at each step, based on a toggling boolean. Its size is 128 bytes.
So, all in all, 6 (64-byte) cache lines per RigidBody step.
I'm not sure what memory bandwidth is, these days, for ddr3 main memory; but I don't think it's an issue, because memory bandwidth can be hidden by prefetching. I think the real consideration, if any, is L1 reads and writes to/from registers; which I also lack numbers for, but I imagine it would be far, FAR less than the 2.7 microseconds it currently takes an rk4step to run on my machine. I can't remember if loading an int from L1 into a register took 1 clock or 2; and a double is like 2 ints, but the pipe to L1 is quad-word anyway, but it probably is far sub-nanosecond-ish, so worst case, at 1 nanosecond per quad word, for a 2 GHz cpu, we got 96 nanoseconds for register loading total, say 100; so +=0.1= 2.8 microseconds versus 2.7, assuming, as you do, that my test keeps everything in registers, which I doubt completely (not enough registers by a long shot).
Errol_Summerlin wrote:
Maybe we should just standardize on vectors of rotation for everything, and kiss quaternions good-bye. The point of using quaternions was that they were supposedly easy to interpolate; but by now we know better.
Well, here is the thing. I don't exactly understand why people use unit quaternions when the same information can be contained in a vector. I think it has to do with this though: A vector has all the information you need for the rotation, but there is no vector operation that will apply the rotations correctly. You have to use a rotation matrix.
..........................................
You can "store" orientation as a 3 element vector, but to actually perform the rotation, you need to either construct quaternions of both the current orientation and the rotation, or construct a rotation matrix that can act on the 3-element vector.
Ouch.
Yeah, I feared this was going to be the case.
Well, this problem is in your hands; you're the math wiz here. The idea is to do this with the least flops per rk4step. And not all flops are equal, btw; trig ops take a lot more cycles than muls and adds; and sqrt somewhere in-between I think. And the problem, the way I see it, is that not only we need a quaternion conversion for integrating rotation, but also for the computations of rC1 and rC2 !!!
And I'm not sure what the rotational interpolation formula, rX0+rV0*t+rC1*t^2+rC2*t^3 even looks like, in terms of implementation...
What's rV0? A vector? A quaternion? And how do you multiply it by t?
I fear we may have to make do with linear interpolation for rotations.
According to wiki-pedia, quaternions are preferred because they don't suffer from something called gimbal lock that can happen with rotations. However, I have not been able to find any place on the internet that explains wtf gimbal lock is.
Yeah, I remember reading that. No idea.
First we need to answer how many gravitational bodies we intend to simulate,
This is a HUGE question. The I we just do objects large enough to have assumed a mostly spherical shape, we already have hundreds of objects. If we start doing anything large enough to impact your space ship in a collision, then we basically need everything larger than a medicine ball. That is a number I wouldn't even like to guess at, but if I had to, I would say at least millions, perhaps billions of objects. You have the rings of the gas giants, the asteroid belt, the ort cloud (though it may be too far out for us to care about), comets.
But, as I was saying in a previous post, the relative movement of ice chunks in a planetary ring are far too invisibly slow to warrant simulating them. Which is why I don't understand why Klauss keeps talking about thousands of units. For vegastrike, a planetary ring should be a single unit with a special shader attached to it that generates (gpu instancing) random ice chunks in fixed positions IF/WHEN you get close enough to it to see them. (And uses the gpu, rather than the cpu, for collisions). Nothing more. No need to simulate nothing, I don't think.
I am actually starting to wonder if it might make more sense to treat each planetary system as its own "system". We would obviously need some complicated sector grid. With differentially rotating sectors within the system such that planets always stay in the same grid as they go around the sun. Aside from the problem of gridding, this would allow resources to be focused entirely on the system the player is in allowing us to model it in detail. This would mean a "loading" screen in-system, but if we are smart, the loading screens will be far away from anything of interest, and players can use jump-gates for in-system travel between planets and, in most cases, never see the loading screen as it is covered up by a fancy wormhole graphic...just like I presume inter-system travel loading screens are handled.
Hahaha, it's come up before --intra-system jump-gates. In any case, this is not a game, but rather a game-engine that has to support mods that may or may not require intra-system (or inter-system) jumps and whatnot; so we can't address the problem by changing the name of the game. But as far as having each planet be its own system, in mathematical and programmatic terms, will come de-facto when we implement multiple, hierarchic coordinate systems. But that's a MAJOR undertaking.
Can't find it.
1st option: linear interpolation. rk4 needs to give you the rotation <angle,axis>that gets you from the current state to the future state. If you just have the current state <angle,axis> and future state <angle,axis>, you could, in principle extract the rotation that got you there, but it is computationally expensive and complicated. It is more efficient to just carry it around in the state. So, the rotation <angle,axis> object is given. The value of angle is just the average angular velocity (av) times the time step(ts). So, I write interp_angle=(angle/ts)*t where t is the elapsed time since current state. You can see that when t=ts, we are at the future state and the value of angle is what it should be. Then, I construct the interp_rot_quaternion=quaternion(cos(interp_angle/2),axis*sin(interp_angle/x)). I then say interp_quaternion=interp_rot_quaternion*current_state_orientation_quaternion. This gives you the orientation at a time in between the current state and future state. And, you are guaranteed to end up where you are supposed to at t=ts.

It isn't pseudo-code, but that is the algorithm.
Okay, let me digest that and I'll get back to you.
The gravitation of asteroids is too feeble to have any effect perceivable to the player.
Well, yes and no. Ceres is a large asteroid with .27m/s^2 acceleration at its surface, but there are few of those that exert significant gravity at any large distance. That is how I sized the spheres of influence.

a=GM/r^2--> r=sqrt(GM)/sqrt(a)...in my sphere of influence calculations before, I took my 'a' threshold to be the sun's gravitational pull at 100 AU which is around 6*10^-7 m/s^2. If you make the gravitational threshold higher, you will need to make exceptions for the sun and probably some of the gas giants with lots of moons to make sure that these objects are influencing the objects that are bound to them gravitationally. Otherwise, pluto just goes floating off into space. I also chose the low threshold so that even if something is traveling very fast, it will likely get picked up by the sphere of influence well before the gravitational pull is noticeable to the player.
Gottcha.
In this sense, every object is potentially a gravitational attractor. I guess the difference is that some of them --e.g. Ceres and all asteroids in general-- ONLY have gravitational effects IF the player gets close to them, but need not participate in an N-body paradigm for the whole system.
So, I think we need an attribute for objects that fall under this paradigm, so that they are never considered for the lists of attractors of other gravitational objects, EXCEPT the player's.
Also, without micro-gravity, it will be impossible to keep the asteroid field from dispersing. The asteroids will all have small random velocities in the radial direction. In reality, the gravitational pull of the asteroids in the belt pulls back any errant asteroids that start to wander out. Without this, we will need to create some sort of artificial acceleration that keeps these objects in a belt.
Yes; but ditto: This dispersal would take years. This is not part of the player's observational time-frame; so I would insist that the best way to tackle fields of rocks or ice-chunks is by fixed location, gpu "instancing" and other tricks. But then again, I'm not sure why Klauss speaks of thousands of objects, and he's almost always right, so I'll wait before making final solution claims.
Another thing is that pluto holds charon in orbit via micro-gravity. It may be appropriate to have each object in the system have a list of "attractors", but have them be a fixed list for most objects. Only players would have dynamic lists. Objects like asteroids that could be moved from their normal orbits by player intervention may also become partially dynamic once they have been removed with their fixed list continuing to influence them all the time, but having the option to add other influences temporarily if it wanders into the sphere of influence of another object.
Indeed!!!
Each object should have a precomputed, persistent list of unconditional attractors, and another list of potential attractors. These two lists would make short work of updating current sphere of influence lists.
Damn! Tensor of inertia is a matrix, not a vector. Couldn't it be expressed more succinctly as three quantities?, --namely, the moment of inertia around each axis? I would envision us reading just 3 quantities from units.csv, then expanding that to a matrix, if necessary (in RigidBody's constructor); and maybe the inverse tensor of inertia as well, if it speeds up calculations.
It is not as simple as a moment of inertia about each axis. I can't really explain it that well. It is best seen through demonstration, but first, we need to calculate the moi tensor of an actual non-symmetrical object. Let me try and find a suitable object and get back to you. You will definitely want to carry around the inverse tensor of inertia. In fact, that is really the only thing that we need. We don't care about the actual moment of inertia. We are always doing torque*inverse_moi. Maybe for purposes of computing changes in the moment of inertia tensor, we will need it, but the computational expense of inverting matrices is such that I would definitely recommend carrying around the inverse.
I guess we will need both. The non-inverted tensor of inertia will be needed to calculate forces during collisions.

What I consider the main thing for this post is my proposal to do away with cubic interpolation for rotation. Linear interpolation should suffice.

Furthermore, I'd like to propose we use Euler steps for rotation, instead of rk4. Why? Because, as I mentioned in earlier posts, the main reason we want the precision of rk4 is for docking at stations; and, more generally, wherever the consequences of cumulative errors would be perceived by the player, specially concerning gravity and inertial trajectories. But gravity has no (direct) effect on rotations, and is not (directly) affected by them. Rotations are just there; and changes in rotations are caused by thrusting and collisions, neither of which is accurately known to the player, and therefore the player has no way to judge the accuracy of their effects.
Secondly, whereas using Euler for mass was incompatible with using rk4 for positions and velocities, given that the latter are influenced by mass, the same argument does not apply to rotations, as these have no effect on velocity/position integrations (except through the mediation of thrusters, but ditto: the player has no way to judge the accuracy of thrusting effects).
Thirdly, conversions between vector of rotation, angle and axis, and quaternions, makes rotational integration the heaviest code in all of rk4Step() by a long shot, while being an order of magnitude less relevant, probably, to the player's experience, vis a vis translational integration.

So, I think it's a no-brainer where optimization efforts should focus.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

chuck_starchaser wrote:@Klauss: Alright, can't do it now; I have to leave --and won't be able to do it tonight either--; but to pre-answer your memory concerns,

Code: Select all

class RigidBody
{
    friend int main(); //temporary
    vec3< double > acceleration;
    vec3< double > angular_acceleration;
    vec3< double > velocity;
    vec3< double > angular_velocity;
    vec3< double > position;
    quat< double > orientation; //orientation in 3D space represented as a quat<double>
    vec3< double > tensor_of_inertia;
    double mass;
    double inverse_mass;
    double dm_dt; //rate of mass gain (<0 for loss)
    double time; //current time
public:
    struct ForceTorqueandMassLoss
    {
        vec3< double > Force;
        vec3< double > Torque;
        double dm_dt; //rate of mass gain ( <0 for loss)
    };
    struct OutputState
    {
        vec3<double> X0, V0, C1, C2;
        quat<double> rX0, rV0, rC1, rC2;
    };
    .................
As you can see, the memory foot-print for RigidBody is 128 bytes, right now; but there's going to be a matrix to come, for tensor of inertia, and perhaps a second matrix for precomputed inverse tensor of inertia. There's also OutputState, a struct of which there are two instances, but only one is written to at each step, based on a toggling boolean. Its size is 128 bytes.
So, all in all, 6 (64-byte) cache lines per RigidBody step.
I'm not sure what memory bandwidth is, these days, for ddr3 main memory; but I don't think it's an issue, because memory bandwidth can be hidden by prefetching. I think the real consideration, if any, is L1 reads and writes to/from registers; which I also lack numbers for, but I imagine it would be far, FAR less than the 2.7 microseconds it currently takes an rk4step to run on my machine. I can't remember if loading an int from L1 into a register took 1 clock or 2; and a double is like 2 ints, but the pipe to L1 is quad-word anyway, but it probably is far sub-nanosecond-ish, so worst case, at 1 nanosecond per quad word, for a 2 GHz cpu, we got 96 nanoseconds for register loading total, say 100; so +=0.1= 2.8 microseconds versus 2.7, assuming, as you do, that my test keeps everything in registers, which I doubt completely (not enough registers by a long shot).
Ehm... no... memory bandwidth cannot be hidden by any mean whatsoever other than by fitting the entire working set inside the cache - and since the CPU has to manage graphics and AI at the same time, I highly doubt that will happen.

From the 384 bytes per entity estimate, 5000 units would use circa 2MB, which means you can achieve 4000FPS tops because of memory bandwidth alone on a modern yet not state of the art memory system (8GB/s). 2MB will fit on a clean L3, but since the L3 will be pressured by other processes, I don't believe it will happen.

Still, 4000FPS are waaaay more than enough.

But then again, the memory bus will be pressured by other systems just as well. So I'd rather concentrate on the bandwidth consumed by physics, which would be 2MB * 30FPS (assuming the highest priority slot runs at 30FPS, which seems more than enough). That's 60MB/s, which is quite low.

A working set of 2MB, however, is too big - most L3s won't fit the entire stucture, and that could hurt other subsystems. Since the structure will also have data required for the graphical system (interpolation has to read state), I'd suggest vectorizing the Output structures and storing them in a separate buffer, so other systems can access that information (the information they care about) in a tightly packed array, and thus cache pressure gets reduced considerably.

So... that's all theory. Having a simple benchmark helps in that it allows you to confirm that theory, and measure real performance - even though the benchmark will have insane FPS ratings, you have to consider the fact that the CPU & memory will be pressured by other subsystems, like sound, graphics, AI, etc... even the OS.

PS: Oh... btw... x86_64 has 64 64-bit registers (IIRC). Take that ;) - ain't that impressive? You can't address them all (only 16), but they're accessible to the hardware through register renaming (for dependency resolution).

PS2: Add to that the 8 MMX (or 8 FP) 64-bit and 16 XMM 128-bit registers... there's a lot of room for a clever compiler. I doubt gcc is that clever though ;)
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
Errol_Summerlin
Merchant
Merchant
Posts: 46
Joined: Thu Mar 11, 2010 4:10 am

Re: rk4 algorithm sent to you chuck

Post by Errol_Summerlin »

But, as I was saying in a previous post, the relative movement of ice chunks in a planetary ring are far too invisibly slow to warrant simulating them. Which is why I don't understand why Klauss keeps talking about thousands of units. For vegastrike, a planetary ring should be a single unit with a special shader attached to it that generates (gpu instancing) random ice chunks in fixed positions IF/WHEN you get close enough to it to see them. (And uses the gpu, rather than the cpu, for collisions). Nothing more. No need to simulate nothing, I don't think.
So, you are saying we treat the object like a ring at large distances and only refine it into smaller chunks when a player gets close. That is all well and good, but what happens if a player has a collision with a chunk and knocks it out of the ring. Does it then become an object on its own?
What I consider the main thing for this post is my proposal to do away with cubic interpolation for rotation. Linear interpolation should suffice.
I agree in principle. Lets do linear for now. I will eventually write a cubic interpolator for rotations just because i think it should be available as a tool, but since players are always simulated at high priority, linear interpolation should work fine for them, and players wont notice linear interpolation of other objects. For the most part, every planetary body will have a very nearly constant angular velocity anyway. Only objects experiencing large torques might require anything more complicated than a linear interpolation.
Furthermore, I'd like to propose we use Euler steps for rotation, instead of rk4.
Here is the problem with that. In order to do Euler step for rotation, you have to consider the orientation of the object a constant during the rk4 step for the linear stuff. If you don't consider it constant over the rk4 step, you are doing all the work to compute it using rk4 and then using an euler step at the end. This isn't so bad except you have to consider a rapidly spinning spacecraft that decides to thrust. Now, because it is spinning so fast, the net force on the object ought to be zero because it is thrusting in all directions at once. However, with an euler step for rotation, you consider the orientation to be constant over the time step. So, you get 1/16th second of thrust in a particular direction. Then you update the orientation with an euler step and get another little kick...and another and another. Instead of just continuing to spin in place like it should. The object will spiral outward in an ever widening circle. This effect is analogous to the orbital instability seen with gravitation. If the forces acting on an object were not intimately connected the orientation of the object, then this would be fine, but since thrust is a large fraction of the forces on an object, I worry it will create problems.

This is similar to the problem with making mass an euler step and doing everything else in rk4. If you start taking elements out of rk4, you have to assume they are constant over the time interval. With large amounts of propellant being beneficial to all spacecraft (as it increases thrust to mass ratio up to around 4 times the mass of your ship), the assumption that mass is constant over the interval will reduce the accuracy of thrust.

Rk4 is so useful because it can feel its way through multi-variable space...where the acceleration depends on more than just the position of the object. The higher order terms included in rk4 are da/dt and d(da/dt)/dt. And quite frankly, the spinning of the space ship during thrust and the loss of mass over an interval are the biggest contributors to these higher order terms. Radial gradients in gravitational force are fairly small so long as you are outiside the surface of the planet. This is why I still don't think that the accuracy of the Euler step was the reason you had the gravitational jitter. For one thing, the jitter should have been consistently in one direction if it were due to euler.

Another problem with this is that it require writing an entirely different computational step that is a hybrid rk4/euler step. I *think* I could do this, but I can't vouch for stability.

Also, I wouldn't say the quaternion stuff is that much more computationally expensive. It is identical to the linear calculations except that calculating the new position is not simply x0+deltax. It is a quaternion multiplication, yes, but that is only 16 muls and 16 adds(less if you optimize a little) out of hundreds of multiplications and additions in the rk4 algorithm itself. I still maintain that the function that grabs the force is what we really need to focus on optimizing, and that is unaffected by how we use that information.

My thinking is that we should make sure we do it correctly first. We get rid of the "jitter", make sure all the physics looks right and then, if we are having issues with computation time, we can start looking for ways to optimize the code and remove unnecessary physics calculations one at a time. Then, if we introduce an artifact or something else that breaks immersion, we know what optimization caused the artifact and we can find another way to optimize that doesn't cause the artifact. Right now, we are speculating about what needs to be optimized without actually having the game in front of us. I am all about figuring out what to cut for the sake of optimization, but lets wait and see what needs to be optimized first and wait till we actual have a program to test it on where we can see the effects of the changes we make instead of trying to compute them apriori. I can always write the hybrid_step function later and have objects call that instead.

@klauss....I have read that before. It doesn't compute :)
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

Errol_Summerlin wrote:My thinking is that we should make sure we do it correctly first. We get rid of the "jitter", make sure all the physics looks right and then, if we are having issues with computation time, we can start looking for ways to optimize the code and remove unnecessary physics calculations one at a time. Then, if we introduce an artifact or something else that breaks immersion, we know what optimization caused the artifact and we can find another way to optimize that doesn't cause the artifact. Right now, we are speculating about what needs to be optimized without actually having the game in front of us. I am all about figuring out what to cut for the sake of optimization, but lets wait and see what needs to be optimized first and wait till we actual have a program to test it on where we can see the effects of the changes we make instead of trying to compute them apriori. I can always write the hybrid_step function later and have objects call that instead.

@klauss....I have read that before. It doesn't compute :)
I'm actually going the other way. I think perhaps performance is ok as it is: rk4 may be too heavy to perform for every object @30FPS, but if you combine rk4 at a bigger timestep for most units and cubic interpolation @30FPS and linear interpolation at graphics frame rates (ie: top speed), then you get enough performance.

Linear/cubic interpolation only needs a smaller working set (less than 64 bytes per unit perhaps), and if separated, I could bet performance will be more than enough.

Ie: I'm optimistic and pushing for a benchmark to more scientifically substantiate that optimism.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

klauss wrote:Ehm... no... memory bandwidth cannot be hidden by any mean whatsoever other than by fitting the entire working set inside the cache - and since the CPU has to manage graphics and AI at the same time, I highly doubt that will happen.
Why you say this? Memory latency can most definitely be hidden by precacheing, specially in a case like this where the weight of computations far exceeds the latencies involved.
As for managing graphics "at the same time", this is happening in one of two ways: on a separate core, in which case L1 is separate; or in a separate thread, in the same core, and most OS kernels won't force any granularity smaller than double digit milliseconds per thread, so as far as we are concerned, cache conflicts between threads are not an issue, unless you're talking about hyperthreading.
From the 384 bytes per entity estimate, 5000 units...
Klauss, there's only 42 units to simulate. Where do you come up with these magic numbers?
...would use circa 2MB, which means you can achieve 4000FPS tops because of memory bandwidth alone on a modern yet not state of the art memory system (8GB/s). 2MB will fit on a clean L3, but since the L3 will be pressured by other processes, I don't believe it will happen.

Still, 4000FPS are waaaay more than enough.

But then again, the memory bus will be pressured by other systems just as well. So I'd rather concentrate on the bandwidth consumed by physics, which would be 2MB * 30FPS (assuming the highest priority slot runs at 30FPS, which seems more than enough). That's 60MB/s, which is quite low.

A working set of 2MB, however, is too big - most L3s won't fit the entire stucture, and that could hurt other subsystems.
Ditto, I don't know where you get the requirement of "the entire structure" having to be in cache. My plan is to have one prefetching iterator running two RigidBody objects ahead, prefetching RigidBodies to be processed; and a second prefetching iterator running one object ahead, prefetching any attractors whose forces we will need. By the time the physics loop gets to a new unit, all needed info will be in L1, tweedling thumbs and bored to death.
Since the structure will also have data required for the graphical system (interpolation has to read state), I'd suggest vectorizing the Output structures and storing them in a separate buffer, so other systems can access that information (the information they care about) in a tightly packed array, and thus cache pressure gets reduced considerably.
Done. That's what OutputState is.
So... that's all theory. Having a simple benchmark helps in that it allows you to confirm that theory, and measure real performance - even though the benchmark will have insane FPS ratings, you have to consider the fact that the CPU & memory will be pressured by other subsystems, like sound, graphics, AI, etc... even the OS.
That's why I'd integrate before benchmarking. But anyhow, we got numbers already. About 3 uS per rk4step without considering computation of forces; and 0.1 uS per interpolation.
PS: Oh... btw... x86_64 has 64 64-bit registers (IIRC). Take that ;) - ain't that impressive? You can't address them all (only 16), but they're accessible to the hardware through register renaming (for dependency resolution).
Right! I forgot the renaming bit; I was thinking 16. Good point.
PS2: Add to that the 8 MMX (or 8 FP) 64-bit and 16 XMM 128-bit registers... there's a lot of room for a clever compiler. I doubt gcc is that clever though ;)
Definitely. Well, as I was saying to Errol, once all this is working and in-game, I intend to do the final optimization: rewrite in Assembler, using SSE2 instructions and registers. I got really good at MMX and 3DNow!, back in the days...
(Good doesn't begin to describe it. Heck, I can't write C++ code that compiles and runs the first time. Can you? But I could write long 3DNow! Assembler routines that compiled AND worked flawlessl; nothing at all to fix; --and it didn't happen just once, but many times.)

Errol_Summerlin wrote:
But, as I was saying in a previous post, the relative movement of ice chunks in a planetary ring are far too invisibly slow to warrant simulating them. Which is why I don't understand why Klauss keeps talking about thousands of units. For vegastrike, a planetary ring should be a single unit with a special shader attached to it that generates (gpu instancing) random ice chunks in fixed positions IF/WHEN you get close enough to it to see them. (And uses the gpu, rather than the cpu, for collisions). Nothing more. No need to simulate nothing, I don't think.
So, you are saying we treat the object like a ring at large distances and only refine it into smaller chunks when a player gets close. That is all well and good, but what happens if a player has a collision with a chunk and knocks it out of the ring. Does it then become an object on its own?
Good question, and the answer is probably yes, but irrelevant to the matter at hand, that being physics simulation. The crux of it that the lowliest of Saturn's rings has probably trillions of ice chunks, but we cannot simulate billions of them, nor millions, for that matter; so what does thousands do for us? We will HAVE to come up with really filthy tricks, no matter, and if they work for trillions, they will work for thousands as well. IOW, I think we'd never need to simulate more than 42 attractors at a time, and such a job would never fall on an ice chunk in a ring. Secondly, if you decide to fly slowly right into a Saturnian ring, dodging ice chunks, probably we will have to use gpu instancing for most of them, but turn a few into units momentarily if you hit them, or something along the lines. But probably those ice chunks are more like snow balls than like solid ice, and we could just make them burst and disperse upon colliding.
Furthermore, I'd like to propose we use Euler steps for rotation, instead of rk4.
Here is the problem with that. In order to do Euler step for rotation, you have to consider the orientation of the object a constant during the rk4 step for the linear stuff. If you don't consider it constant over the rk4 step, you are doing all the work to compute it using rk4 and then using an euler step at the end. This isn't so bad except you have to consider a rapidly spinning spacecraft that decides to thrust. Now, because it is spinning so fast, the net force on the object ought to be zero because it is thrusting in all directions at once. However, with an euler step for rotation, you consider the orientation to be constant over the time step. So, you get 1/16th second of thrust in a particular direction. Then you update the orientation with an euler step and get another little kick...and another and another. Instead of just continuing to spin in place like it should. The object will spiral outward in an ever widening circle. This effect is analogous to the orbital instability seen with gravitation. If the forces acting on an object were not intimately connected the orientation of the object, then this would be fine, but since thrust is a large fraction of the forces on an object, I worry it will create problems.

This is similar to the problem with making mass an euler step and doing everything else in rk4. If you start taking elements out of rk4, you have to assume they are constant over the time interval. With large amounts of propellant being beneficial to all spacecraft (as it increases thrust to mass ratio up to around 4 times the mass of your ship), the assumption that mass is constant over the interval will reduce the accuracy of thrust.

Rk4 is so useful because it can feel its way through multi-variable space...where the acceleration depends on more than just the position of the object. The higher order terms included in rk4 are da/dt and d(da/dt)/dt. And quite frankly, the spinning of the space ship during thrust and the loss of mass over an interval are the biggest contributors to these higher order terms. Radial gradients in gravitational force are fairly small so long as you are outiside the surface of the planet. This is why I still don't think that the accuracy of the Euler step was the reason you had the gravitational jitter. For one thing, the jitter should have been consistently in one direction if it were due to euler.

Another problem with this is that it require writing an entirely different computational step that is a hybrid rk4/euler step. I *think* I could do this, but I can't vouch for stability.

Also, I wouldn't say the quaternion stuff is that much more computationally expensive. It is identical to the linear calculations except that calculating the new position is not simply x0+deltax. It is a quaternion multiplication, yes, but that is only 16 muls and 16 adds(less if you optimize a little) out of hundreds of multiplications and additions in the rk4 algorithm itself. I still maintain that the function that grabs the force is what we really need to focus on optimizing, and that is unaffected by how we use that information.

My thinking is that we should make sure we do it correctly first. We get rid of the "jitter", make sure all the physics looks right and then, if we are having issues with computation time, we can start looking for ways to optimize the code and remove unnecessary physics calculations one at a time. Then, if we introduce an artifact or something else that breaks immersion, we know what optimization caused the artifact and we can find another way to optimize that doesn't cause the artifact. Right now, we are speculating about what needs to be optimized without actually having the game in front of us. I am all about figuring out what to cut for the sake of optimization, but lets wait and see what needs to be optimized first and wait till we actual have a program to test it on where we can see the effects of the changes we make instead of trying to compute them apriori. I can always write the hybrid_step function later and have objects call that instead.
Errol, you didn't quote my arguments following the first sentence, which already address the issues you raise. In fact, I mentioned that the only problem would be with thrusting BUT that the player has no accurate knowledge or intuition of thrusting, so it is irrelevant to the player's experience. Furthermore, anything that's close enough to the player for the player to appreciate is also simulated faster than objects farther away. So, I insist: We CAN use Euler for simulating rotation without noticeable adverse consequences and for huge performance gains.
klauss wrote:I'm actually going the other way. I think perhaps performance is ok as it is: rk4 may be too heavy to perform for every object @30FPS, but if you combine rk4 at a bigger timestep for most units and cubic interpolation @30FPS and linear interpolation at graphics frame rates (ie: top speed), then you get enough performance.
Sounds like a good plan.
So, basically, we got a 32Hz pulse coming into the physics scheduler, which it allocates to 16Hz, 8Hz, 4Hz, etcetera simulation bins; but BEFORE it goes to simulate a bin, it performs cubic interpolations for all units. Gravitation then uses interpolated positions, which is advantage number one. Advantage number two is with collisions, which then don't have to interpolate a select list before testing. Advantage number three is graphics, which can then interpolate linearly between two 32FPS frames.
Linear/cubic interpolation only needs a smaller working set (less than 64 bytes per unit perhaps), and if separated, I could bet performance will be more than enough.
64 bytes... plus rotational interpolation data, which should be less than 64 bytes if we go linear for it.
An issue to consider, tho, is that if we want to do linear interpolations between cubic interpolations, we will have to either maintan two interpolated states, or compute position AND velocity, which increases the cubic interpolation's cpu load.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

chuck_starchaser wrote:
klauss wrote:Ehm... no... memory bandwidth cannot be hidden by any mean whatsoever other than by fitting the entire working set inside the cache - and since the CPU has to manage graphics and AI at the same time, I highly doubt that will happen.
Why you say this? Memory latency can most definitely be hidden by precacheing, specially in a case like this where the weight of computations far exceeds the latencies involved.
Latencies you can hide - but if your working set is bigger than the cache, then prefetching doesn't help: the system will starve, because the memory bus can't feed data as fast as the processor can consume it.

But perhaps you're saying that the processor can't consume data that fast? That may be with rk4... but not with interpolation.
chuck_starchaser wrote:As for managing graphics "at the same time", this is happening in one of two ways: on a separate core, in which case L1 is separate; or in a separate thread, in the same core, and most OS kernels won't force any granularity smaller than double digit milliseconds per thread, so as far as we are concerned, cache conflicts between threads are not an issue, unless you're talking about hyperthreading.
It doesn't matter, L3 is shared in every architecture I've known to have it - L2 just as well in many. L1 can't fit a 2MB buffer.
chuck_starchaser wrote:
From the 384 bytes per entity estimate, 5000 units...
Klauss, there's only 42 units to simulate. Where do you come up with these magic numbers?
You're perhaps thinking PU, or VS really really early in the game. If you play long enough, VS (UTCS) will start to be really overpopulated. It isn't uncommon to have close to 1k units per system (counting dropped cargo containers - which are units), and VS simulates (not sure if in full) a couple systems.

chuck_starchaser wrote:
Since the structure will also have data required for the graphical system (interpolation has to read state), I'd suggest vectorizing the Output structures and storing them in a separate buffer, so other systems can access that information (the information they care about) in a tightly packed array, and thus cache pressure gets reduced considerably.
Done. That's what OutputState is.
Oh... I thought it was embedded in the bigger state structure.
Cool then :D
chuck_starchaser wrote:
So... that's all theory. Having a simple benchmark helps in that it allows you to confirm that theory, and measure real performance - even though the benchmark will have insane FPS ratings, you have to consider the fact that the CPU & memory will be pressured by other subsystems, like sound, graphics, AI, etc... even the OS.
That's why I'd integrate before benchmarking. But anyhow, we got numbers already. About 3 uS per rk4step without considering computation of forces; and 0.1 uS per interpolation.
In fact, I thought this limited benchmarking I was proposing was easier than the discussion we've been having. I'm not saying discussing is a waste of time, don't misunderstand it, but I thought it was trivial (make a struct into an array, loop, time, print).
chuck_starchaser wrote:So, basically, we got a 32Hz pulse coming into the physics scheduler, which it allocates to 16Hz, 8Hz, 4Hz, etcetera simulation bins; but BEFORE it goes to simulate a bin, it performs cubic interpolations for all units. Gravitation then uses interpolated positions, which is advantage number one. Advantage number two is with collisions, which then don't have to interpolate a select list before testing. Advantage number three is graphics, which can then interpolate linearly between two 32FPS frames.
Linear/cubic interpolation only needs a smaller working set (less than 64 bytes per unit perhaps), and if separated, I could bet performance will be more than enough.
64 bytes... plus rotational interpolation data, which should be less than 64 bytes if we go linear for it.
An issue to consider, tho, is that if we want to do linear interpolations between cubic interpolations, we will have to either maintan two interpolated states, or compute position AND velocity, which increases the cubic interpolation's cpu load.
Hm... good point... but an interpolated velocity will be useful in many places anyway.
(graphics use the velocity - it does, really)
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Ehm... Sorry, no; I hadn't vectorized the OutputState structs.

Okay, consider all this done; but not tonight; I really need to catch up with my RL job work (I'll probably stay up all night designing this printed circuit board for a motor drive :(). Tomorrow night, probably.
Errol_Summerlin
Merchant
Merchant
Posts: 46
Joined: Thu Mar 11, 2010 4:10 am

Re: rk4 algorithm sent to you chuck

Post by Errol_Summerlin »

So, I insist: We CAN use Euler for simulating rotation without noticeable adverse consequences and for huge performance gains.
The scenario I suggested of a spinning spacecraft thrusting was designed specifically to address your argument that you claimed I neglected. My intuition tells me that thrusting while I am spinning very fast should produce no net force, but with an euler simulation of rotation, this will not occur. I just think that if you make rotations euler step, you will need to simulate euler rotation steps more rapidly than the rk4 linear motion steps to avoid jerkiness in the physics steps. In addition, you will damage the accuracy of the rk4 step. True, in most circumstances, players won't notice, but is still a degradation in the accuracy of the results for a questionable benefit to run-time since you may need to take hybrid steps more often to avoid problems with euler steps on the rotations. Because the benefit of the proposed idea is questionable, I am advocating a wait and see policy with regards to optimizations. Rk4 as written, is neat and tidy and has minimal possibility of artifacts and conflicts. Hybrid methods, in my experience, get messy with both coding and results. You get one of those codes that you have to constantly tinker with to deal with the unexpected consequences of the hybrid optimizations.

However, its your baby. At this point you have done like 90% of the actual work and I am just consulting. But, if you insist on NOT using rk4 for rotations, at least use the midpoint method for rotations. The primary thing acting to rotate a body will be a constant torque, which translates into a constant angular acceleration. Using delta_theta=a_v*t+.5*a_a*t^2 will be FAR more accurate given the importance of that third term and its additional cost is quite small since you don't even need an additional call to getForceTorqueandMassLoss.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

(Just taking a break from the work I'm doing for work.)
Errol_Summerlin wrote:
So, I insist: We CAN use Euler for simulating rotation without noticeable adverse consequences and for huge performance gains.
The scenario I suggested of a spinning spacecraft thrusting was designed specifically to address your argument that you claimed I neglected. My intuition tells me that thrusting while I am spinning very fast should produce no net force, but with an euler simulation of rotation, this will not occur. I just think that if you make rotations euler step, you will need to simulate euler rotation steps more rapidly than the rk4 linear motion steps to avoid jerkiness in the physics steps. In addition, you will damage the accuracy of the rk4 step.
Very true, but for me it's a question of degree: The player's own ship will be getting maximum speed of simulation, always. That's 16 physics steps per second. How fast could your ship spin? I think about 90 degrees per second is the maximum governor limit. So, that's 64 steps in a full circle. Which means that using Euler it will be as if your thrusters were not mounted straight but 360/128=3 degrees off. Hmm... That IS a bit off. But anyhow, if your ship is spinning fast, the furthest thing from your mind is whether the thrusting seems to lag your heading a tiny weenie bit.
Furthermore, this won't happen :D Tell you why: The position rk4 will call the forces and mass loss function several times, right? When I have to compute thrusting, at each call, I will have to interpolate rotations; --linearly, admittedly, but interpolated thrustings, anyhow. So you will get a nice shower of thrust vectors that more or less will cover the rk4 step size. So it won't be the 3 degrees error I calculated above; it will be maybe 0.3 degrees error. Do correct me if I'm wrong, tho.

Now, suppose we're talking about a missile that got hit by a bolt and sent spinning really fast, like one rotation per second, and it's in the 8Hz simulation bin: Now we only have 8 steps per rotation and so the error from using Euler for rotation is huge.
Well, what's the consequence of that? That instead of making tight loops it spirals out? That's cause for celebration. Fireworks! :D
True, in most circumstances, players won't notice, but is still a degradation in the accuracy of the results for a questionable benefit to run-time since you may need to take hybrid steps more often to avoid problems with euler steps on the rotations.
I'm not sure what you refer to by "hybrid steps"; but what I'm saying is "to Hell with accuracy", in this case. Accuracy here only matters in as far as it benefits gameplay and believability.
Because the benefit of the proposed idea is questionable, I am advocating a wait and see policy with regards to optimizations.
Certainly; but keep in mind that in a game engine, optimizations are a must. It's not the same paradigm as in other types of applications where one part of the program may be a high optimization priority and the rest doesn't matter. In a game engine you're either optimizing for speed, or you're optimizing for size, or you're optimizing for conciseness of data representation, etceteras; or else you are optimizing for accuracy, or believability, or for eye-candy, or ear-candy, versus speed or size; but you're always "optimizing". It's the game-engine way of life; and it's not a question of whether a given optimization makes the game run faster *now*, but a question of whether it would benefit the game in any circumstances. With 100 units in a system, full rk4 at full speed may be no different from the present Euler, in terms of speed. With 5000 units, as Klauss proposes, it's a whole different story. Furthermore, right now probably the AI takes more time than all of physics AND collisions put together; but this will change.
Ultimately, you optimize because even if optimizing makes no difference now, it simply improves the engine's capabilities. Without optimization, the limit for the number of units above which frame-rate drops, say, is 666. After optimization the limit becomes 777? Good enough, even if we only ever have 555 units, in a given game. Asking whether to optimize, in a game engine, is like asking whether there is a real need to make a racing car faster... It wouldn't be a game engine if it wouldn't need to be faster.
Nobody is questioning whether accuracy is beneficial. Of course it is; and I'm a big fan of it. But the benefit of accuracy is asymptotic to how much of it can be perceived. Thus, a value_of_accuracy ~= arctan( K*accuracy )/K, sort of thing; where K is the million dollar question.
And the billion dollar question is value/cost, of course.
Now, we know that K is a lot higher for rotational accuracy than it is for position and velocity.
And we also know that the cost of rotational accuracy is much higher than that of positional accuracy, as it involves trig functions and square roots. So, value/cost is MUCH greater for positional simulation than for rotational simulation. Q.E.D.
Rk4 as written, is neat and tidy and has minimal possibility of artifacts and conflicts. Hybrid methods, in my experience, get messy with both coding and results. You get one of those codes that you have to constantly tinker with to deal with the unexpected consequences of the hybrid optimizations.
Again, I'm not sure what the hybrid methods are, though they sound awful; but I do know I had no intention to use them :D.
The performance benefit of simplifying the simulation of rotations is HUGE, Errol. Have a look at the code: Converting from a vector of rotation to a quaternion requires a length and a normalized vector, to obtain angle and axis, first of all, which involves a square root. Then the quaternion's constructor has to compute a sine and a cosine. Sine, cosine and square root are multi-cycle instructions. They obtain values by successive approximation; using Taylor series, or something along the lines... They are sloooowwwww instructions. Specially at double precision. And we're doing that not only in rk4Step(), but at each call of helper(). Going Euler for rotations gets rid of this code at least across helper() calls, which is 3 times out of 4. That would be a quadrupling of speed, if rotation was originally 100% of physics, which it isn't; but I'm sure it's well over 75% of it, in terms of cpu load. Roughly, I would expect at least a doubling of the overall physics performance, from this optimization. At least! At a cost of (perceivable) degradation of less than 1% (subjectively assessed).
However, its your baby. At this point you have done like 90% of the actual work and I am just consulting.
Far from it; this is entirely your baby; I'm just tutoring it :). What I've done is 100% optimizations; could not touch the algorithm if I wanted to, as it goes way over my head. What happens is that you are dreaming about its accuracy only; I'm dreaming half about accuracy and half about speed. ;-)
But, if you insist on NOT using rk4 for rotations, ...
I'm not nearly as hard headed as I may sound. I just hold my ground as long as I think I'm right; that's all; but I do listen, and consider other points of view.
What I will do is use a #ifdef USE_RK4_FOR_ROTATION ... current code ... #else ... Euler/mid-point ... #endif.
So your code won't be lost, and we can make performance and game-play comparisons.
... at least use the midpoint method for rotations.
What would the code look like?
The primary thing acting to rotate a body will be a constant torque, which translates into a constant angular acceleration. Using delta_theta=a_v*t+.5*a_a*t^2 will be FAR more accurate given the importance of that third term and its additional cost is quite small since you don't even need an additional call to getForceTorqueandMassLoss.
Sounds good; but only you know how to code that. By the way, the mid-point routine I commented it out a while ago, since it wasn't being used; and it probably needs updating as per the most recent changes; which I can do; but I'm just letting you know in case you try to uncomment it and the program breaks.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

chuck_starchaser wrote:Sine, cosine and square root are multi-cycle instructions. They obtain values by successive approximation; using Taylor series, or something along the lines... They are sloooowwwww instructions. Specially at double precision.
You can use cosine tables... I'm not sure how precision of the sin/cos affects the simulation though, and cosine tables are less precise (big time).

There's a much faster half-precision sin/cos in SSE if you do the tylor series by hand, but this requires quite a bit of assembly programming.

In any case I wouldn't consider this a big problem... I agree with Errol, first get the rk4 clean and simple, then optimize it. Optimization takes a lot of trial and error, so it's better to have a clean code base from which to optimize, and real benchmarks where to test the optimization steps.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
Errol_Summerlin
Merchant
Merchant
Posts: 46
Joined: Thu Mar 11, 2010 4:10 am

Re: rk4 algorithm sent to you chuck

Post by Errol_Summerlin »

Furthermore, this won't happen Tell you why: The position rk4 will call the forces and mass loss function several times, right? When I have to compute thrusting, at each call, I will have to interpolate rotations; --linearly, admittedly, but interpolated thrustings, anyhow. So you will get a nice shower of thrust vectors that more or less will cover the rk4 step size. ... Do correct me if I'm wrong, tho.
Well, if I understand what you are suggesting, then yeah, that won't work. It brings back all the computational expense or the rk4 step with none of the benefits. I will try to explain why below.

This is the PHYSICS step...you don't have an advanced state to interpolate to yet. IF you advance the rotation variables to the next step first, then you can go back and use the interpolated values for the rotations to compute the orientation at other times for purposes of advancing the other variables, but that doesn't actually save you anything. Calculating the orientation at an intermediate step in rk4 is done by a simple euler step. If you look at the rk4helper routine, you will see that all of the intermediate steps are just euler steps. This is just as expensive as interpolating the rotation linearly and you still have to query for force, torque and mass loss just as often. Both interpolating and the euler step involve a quaternion multiplication (the most computationally cumbersome part of the rotations). So, you have given up accuracy of simulation for 0 gain in run-time. Also, with a linear interpolation, the derivatives of the rotation variables will not be accurate. These are what rk4 really cares about anyway

IF you treat the orientation as constant during all the intermediate rk4 step, then you can avoid most(4 out of 5) of those cumbersome quaternion rotations, but every time you go back to query the system for the force, you will be feeding it the old orientation (since you chose not to waste the resources computing the new orientation at the intermediate steps and just calculate the change in orientation once at the end of the step). This MAY work ok, but I can't be sure. I have never tried something like that. It would certainly work if the force was independent of orientation (like a pure gravity simulation), but since that isn't the case, you will see some effect.

That being said, I don't think you can justify cutting out 4 quaternion multiplications per physics step when every single interpolated orientation already involves a quaternion multiplication. This can be dozens of extra quaternion multiplications. The much larger problem, IMO, is how to do rotations more efficiently. I am working on something to this effect. Since, for any linear interpolation, we know that the axis of rotation is constant over that time interval,(If we did cubic interpolation, the axis of rotation might change and this would make efficiency more complicated). Knowing that the axis of rotation is constant may allow us to pre-calculate some quantities once per physics step instead of doing it at each interpolation.

Consider rotation quaternion (a1,b1,c1,d1) acting on current orientation quaternion (a2,b2,c2,d2)
The first term of the result is a1a2-b1b2-c1c2-d1d2. But we know that (-b1b2-c1c2-d1*d2)=unit(b1,c1,d1)*(b2,c2,d2)*sin(alpha/2) where alpha is the angle through which we are rotating (goes up with time). So, we can calculate tempvalue=unit(b1,c1,d1)*(b2,c2,d2) once for the entire interpolation, and the first term of the resulting vector is then just a1*a2-sin(alpha/2)*tempvalue. Since we needed to compute sin(alpha/2) anyway(since we are using angle,axis representation to avoid losing the number of rotations per time step), this just saves a bunch of computation at interpolation time. Similar tricks can be used on the other terms too I think, but I haven't finished.
Certainly; but keep in mind that in a game engine, optimizations are a must.
Yes, but shouldn't we have a game engine before we start optimizing? That way we can see what impact (if any) our optimizations have on gameplay and believability.
. Have a look at the code: Converting from a vector of rotation to a quaternion requires a length and a normalized vector, to obtain angle and axis, first of all, which involves a square root. Then the quaternion's constructor has to compute a sine and a cosine. Sine, cosine and square root are multi-cycle instructions. They obtain values by successive approximation; using Taylor series, or something along the lines... They are sloooowwwww instructions. Specially at double precision. And we're doing that not only in rk4Step(), but at each call of helper(). Going Euler for rotations gets rid of this code at least across helper() calls, which is 3 times out of 4. That would be a quadrupling of speed, if rotation was originally 100% of physics, which it isn't; but I'm sure it's well over 75% of it, in terms of cpu load. Roughly, I would expect at least a doubling of the overall physics performance, from this optimization. At least! At a cost of (perceivable) degradation of less than 1% (subjectively assessed).
Well, you can neglect the normalization. It doesn't need to be done often. It only needs to be done to keep the quaternion from slowly wandering away from unit length due to machine precision issues. The sin and cos functions are unavoidable if you want a "real" rotation. However, let me point out several things.

First, if you have a governor that makes rotations only 3 degrees per physics step, you don't need to call sin or cos. You can instead make a function smallanglesin(double angle) that simply returns angle and smallanglecos(double angle) which returns 1-(angle^2)/2. For 3 degrees or less in a physics time step, these functions are accurate to within 4 parts in 10000. I did not suggest this before because you seemed to be concerned about graphical objects making multiple rotations in a single physics step. For large rotations > 10 degrees per physics step, these approximations break down.

I don't know how the math library calculates sin and cos, but to get double precision, they probably only have to follow the taylor series for three or 4 terms for small value of the angle. For the largest possible value of the argument, Pi, it takes about 5 terms of the expansion to see significant convergence and probably another 5 to get it to double precision. These things are written to use the least number of terms they can to achieve the desired precision

My point is that you need to make up your mind about whether or not high spin rates are something to be concerned about. First, you said we need to be concerned about them in case an object makes multiple rotations during a physics step. Then you said that it is only 3 degrees per physics step as justification for eliminating spin from the thrust direction considerations. If we guarantee small angles, we can do all kinds of optimization.

Also, going "euler" for rotations is not really accurate. The calculations for rotations are already euler. What you are suggesting doing is holding orientation constant over a physics time step for purposes of calculating all the intermediate states. The final orientation delta could still be calculated with an rk4_step because the rotational velocity at the advanced steps would still be calculated. However, in each of the intermediate state calls to getForceTorqueandMassLoss, the original orientation at the beginning of the time step would be used instead. The part you are saying is the actual changing of the orientation(the construction of the rotation quaternion where sin and cos come in and the quaternion multiplications) in each of the helper steps would be eliminated. However, while you CAN eliminate these steps here for some gain in performace, it will not be NEARLY as large as you think. First, as I said before, these orientation calculations have to be done at EVERY interpolated state in order to actually rotate the object. This will be a large chunk of the run time. Second, the getForceTorqueandMassLoss function is still called 5 times in your proposed hybrid step. This too is a big chunk of the run time in an actual game. It isn't too expensive in the little mock up program I wrote with a single object going around in circles, but it will be much more expensive in game. All told, I expect that the actual rotations inside the physics step, while they may be a good chunk of the computational expense of the rk4_physics step, they are not a big chunk of all the things that need to be done in a physics step. Now, I could be wrong about the computational gains, but I think there are enough things going on, that is not clear which one of them is the bottleneck. And we shouldn't give up ANYTHING in the rk4_step if it isn't the slow part of the code.

Its like a circuit with 3 resistors in series of 1 ohm, 100 ohms, and 10000 ohms. Decreasing the resistance of the 1 ohm resistor doesn't really change the overall resistance much. To me, the quaternion rotations in rk4, interpolations, and calls to getForceTorqueandMassLoss are 3 resistors of comparable size. It is unclear to me which is the biggest and it is likely that in different circumstances, different resistors will be the largest. However, if the quaternion rotations in rk4 are a small fraction of the overall computational expense (for example, a planet with large physics steps that still needs to be interpolated frequently), then the gain you gain very little in performance, but you increase code complexity and decrease gameplay and believability. This, to me, makes the optimization not a CLEAR optimization. It is only an optimization in certain cases. This is why I oppose it at THIS juncture. That is not to say we shouldn't try, but I want to be able to see the in-game performance increase.
What would the code look like?
If you are going to do a hybrid, I should write it. I think I understand what you are getting at and it doesn't require eliminating the rk4_step for computing the new orientation at the end. It just requires holding orientation constant throughout all the helper steps to avoid the expensive sin, cos and quaternion multiplications functions.

I will see if I can do it this weekend. I will make it a 4th potential algorithm to use for stepping through the physics. That way we have a clean rk4 in case the performance gain is deemed not worth the loss of precision.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Let me address a later paragraph first:
Errol_Summerlin wrote:My point is that you need to make up your mind about whether or not high spin rates are something to be concerned about. First, you said we need to be concerned about them in case an object makes multiple rotations during a physics step. Then you said that it is only 3 degrees per physics step as justification for eliminating spin from the thrust direction considerations. If we guarantee small angles, we can do all kinds of optimization.
This is the thing: The only fast spinners I would expect are chunks of metal and and other debris when a ship explodes. Specially capital ships: Once you put enough torpedoes into them that you get a first internal explosion, I've been fancying for years to have a series or chain of explosions, and tons of debris spewing out, each piece spinning at a different rate, around some random axis. Note that, at least theoretically, these debris would need NO physics simulation at all, except one first step that sets up the interpolation struct data. Why NO physics? Because there are no forces on them, except gravity, of course, so, say one physics step every 8 seconds. But some could be spinning faster than a full rotation per second. So, this is all inter/extrapolation work.
When I said it's only 3 degrees, I was referring to the player's ship, which is physics-simulated at maximum speed, unconditionally.
In other words: The problem of thrusting while rotating doesn't mingle with fast rotation rates, and fast rotation rates don't mingle with thrusting.
So, as far as thrusting vector interpolations are concerned, you CAN interpolate quaternions, or substitute angle for sine(angle), etceteras.
But for rotation simulation, we need the ability to interpolate multi-cycle-per-step rates.
In other words, when you call my function for forces, torque and mass loss, I could use filthy tricks for thrust vector interpolations, IF I had the data, of course... --and I'm having a wild idea about that, that maybe rk4step() could compute rotation first, update the interpolation structure, and pass a pointer to it as an argument to forces, torques and mass loss...
EDIT:
More like this:
  • Call a get_mass_loss() function.
  • Call a get_torque() function.
  • Compute rotation (using Euler + Mid-point or whatever)
  • Update OutputState as far as rotation and rotation interpolation is concerned
  • Call get_forces( OutputState * ) function (with output state as argument so I can interpolate rotation for thrust vector)
  • Compute translation (RK4)
  • Update OutputState as far as translation interpolation is concerned
Furthermore, this won't happen Tell you why: The position rk4 will call the forces and mass loss function several times, right? When I have to compute thrusting, at each call, I will have to interpolate rotations; --linearly, admittedly, but interpolated thrustings, anyhow. So you will get a nice shower of thrust vectors that more or less will cover the rk4 step size. ... Do correct me if I'm wrong, tho.
Well, if I understand what you are suggesting, then yeah, that won't work. It brings back all the computational expense or the rk4 step with none of the benefits. I will try to explain why below.

This is the PHYSICS step...you don't have an advanced state to interpolate to yet. IF you advance the rotation variables to the next step first, then you can go back and use the interpolated values for the rotations to compute the orientation at other times for purposes of advancing the other variables, but that doesn't actually save you anything. Calculating the orientation at an intermediate step in rk4 is done by a simple euler step. If you look at the rk4helper routine, you will see that all of the intermediate steps are just euler steps. This is just as expensive as interpolating the rotation linearly and you still have to query for force, torque and mass loss just as often. Both interpolating and the euler step involve a quaternion multiplication (the most computationally cumbersome part of the rotations). So, you have given up accuracy of simulation for 0 gain in run-time. Also, with a linear interpolation, the derivatives of the rotation variables will not be accurate. These are what rk4 really cares about anyway

IF you treat the orientation as constant during all the intermediate rk4 step, then you can avoid most(4 out of 5) of those cumbersome quaternion rotations, but every time you go back to query the system for the force, you will be feeding it the old orientation (since you chose not to waste the resources computing the new orientation at the intermediate steps and just calculate the change in orientation once at the end of the step). This MAY work ok, but I can't be sure. I have never tried something like that. It would certainly work if the force was independent of orientation (like a pure gravity simulation), but since that isn't the case, you will see some effect.

That being said, I don't think you can justify cutting out 4 quaternion multiplications per physics step when every single interpolated orientation already involves a quaternion multiplication. This can be dozens of extra quaternion multiplications. The much larger problem, IMO, is how to do rotations more efficiently. I am working on something to this effect. Since, for any linear interpolation, we know that the axis of rotation is constant over that time interval,(If we did cubic interpolation, the axis of rotation might change and this would make efficiency more complicated). Knowing that the axis of rotation is constant may allow us to pre-calculate some quantities once per physics step instead of doing it at each interpolation.

Consider rotation quaternion (a1,b1,c1,d1) acting on current orientation quaternion (a2,b2,c2,d2)
The first term of the result is a1a2-b1b2-c1c2-d1d2. But we know that (-b1b2-c1c2-d1*d2)=unit(b1,c1,d1)*(b2,c2,d2)*sin(alpha/2) where alpha is the angle through which we are rotating (goes up with time). So, we can calculate tempvalue=unit(b1,c1,d1)*(b2,c2,d2) once for the entire interpolation, and the first term of the resulting vector is then just a1*a2-sin(alpha/2)*tempvalue. Since we needed to compute sin(alpha/2) anyway(since we are using angle,axis representation to avoid losing the number of rotations per time step), this just saves a bunch of computation at interpolation time. Similar tricks can be used on the other terms too I think, but I haven't finished.
Okay, if I'm understanding correctly, you had no plan to obtain interpolated vectors from me, when you call my forces, torques and mass loss function. Correct? I always assumed I was expected to come up with them, somehow.
Okay, let me say first that I consider positional accuracy maybe two, maybe three, orders of magnitude more important that rotational accuracy. That is, if a ship should have turned 59 degrees since the previous physics step, but it turned 57 degrees, instead, ... frankly I don't think that matters at all. Can't see how it could matter. What I think matters 1 to 1.5 orders of magnitude more, is that given it turned 57 degrees, acceleration from thrusting should be pretty close to the average of all those vectors across the 57 degree arc.
In other words, I think that interpolating rotations for the sake of those force, torque and mass loss calls is MUCH more important than simulating rotations accurately.
But as I mentioned above, these interpolations are small angle interpolations, given that ships are slower rotating than pieces of debris, AND get simulated more often (AND pieces of debris don't have thrusters).
So, I would continue to advocate Euler for rotations (possibly with mid-step), and using axis and angle, to make it possible for the interpolator to interpolate multiple rotations; and for the calls to forces, torque and mass loss, I think we should pass rotation and rotation interpolation data to the callee, so that I can give you accurate force vectors (for your rk4 translational computations).

Well, you can neglect the normalization. It doesn't need to be done often. It only needs to be done to keep the quaternion from slowly wandering away from unit length due to machine precision issues.
Actually, I think I missed that square root. I was referring to the square root needed to convert vector of rotation to axis and angle, which you do before converting to a quaternion.
The sin and cos functions are unavoidable if you want a "real" rotation. However, let me point out several things....
Understood.
Its like a circuit with 3 resistors in series of 1 ohm, 100 ohms, and 10000 ohms. Decreasing the resistance of the 1 ohm resistor doesn't really change the overall resistance much. To me, the quaternion rotations in rk4, interpolations, and calls to getForceTorqueandMassLoss are 3 resistors of comparable size. It is unclear to me which is the biggest and it is likely that in different circumstances, different resistors will be the largest.
Yep, it's not clear to me either; but at least I think we are beginning to understand each other. I hope my statements above clarify my thinking for you a bit more.
Basically, I think there are three areas of accuracy that have vastly different "player value", and different requirements:
  • Rotational physics step: Lowest player value for accuracy, but must produce interpolation data that works for faster spinning than the simulation rate.
  • Rotational thrusting vector interpolations at each call of forces, torques and mass loss: Medium player value for accuracy, but does NOT need to support faster spinning rate than physics simulation rate; --and in fact the angle per step would be small, most of the time.
  • Translational physics step: Highest player value for accuracy, specially with regards to inertia and gravity (orbital trajectories).
However, if the quaternion rotations in rk4 are a small fraction of the overall computational expense (for example, a planet with large physics steps that still needs to be interpolated frequently), then the gain you gain very little in performance, but you increase code complexity and decrease gameplay and believability. This, to me, makes the optimization not a CLEAR optimization. It is only an optimization in certain cases. This is why I oppose it at THIS juncture. That is not to say we shouldn't try, but I want to be able to see the in-game performance increase.
This I don't get. A planet has no torque forces acting on it, or thrusters; therefore doing Euler or RK4 for rotation, in my mind, would give us the exact same result, no?
What would the code look like?
If you are going to do a hybrid, I should write it. I think I understand what you are getting at and it doesn't require eliminating the rk4_step for computing the new orientation at the end. It just requires holding orientation constant throughout all the helper steps to avoid the expensive sin, cos and quaternion multiplications functions.

I will see if I can do it this weekend. I will make it a 4th potential algorithm to use for stepping through the physics. That way we have a clean rk4 in case the performance gain is deemed not worth the loss of precision.
Cool!

Update:
I've made all the changes needed for testing the simulation over a big array of RigidBodies, and vectorized the OutputState structs and all that, and I got it to compile and link, but when I try to run it I get a segmentation fault. What's the command for the debugger? Can't remember what it was. Tried db, dbg, debug... Nothing exists.
charlieg
Elite Mercenary
Elite Mercenary
Posts: 1329
Joined: Thu Mar 27, 2003 11:51 pm
Location: Manchester, UK
Contact:

Re: rk4 algorithm sent to you chuck

Post by charlieg »

gdb (short for gnu debugger)?
Free Gamer - free software games compendium and commentary!
FreeGameDev forum - open source game development community
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

charlieg wrote:gdb (short for gnu debugger)?
That was it; thanks!
So, found that problem with backtrace; now it runs.
But now I get nothing but NAN's in standard output. Probably a problem of initialization...
(I wrote a proper constructor for RigidBody, but I'm not sure I wrote it properly.)
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Okay, Klauss, here is some hard data:
I'm putting my total physics iterations constant, at 10 million; and I can vary the size of the array, but the
size divides the total iterations to obtain the actual number of passes.
So, in a nutshell, no matter what size the array is, the total number of rk4 steps is 10 million.
And the number of interpolations performed is 10 per physics step, first column; 1 second column.
The order of the operations is, well, here is the test code:

Code: Select all

    #define TOTAL_PHYSICS_ITERS 10000000
    #define NUM_OBJECTS 5000
    #define NUM_ITERATIONS (TOTAL_PHYSICS_ITERS/NUM_OBJECTS)
    #define NUM_ITER_PRINT (NUM_ITERATIONS/10)
    #define DT 0.01
    #define NUM_INTERPS_PER_STEP 10
    RigidBody objects[NUM_OBJECTS];
    RigidBody::OutputState a[NUM_OBJECTS];
    RigidBody::OutputState b[NUM_OBJECTS];
    .................. < initializations code > ..........................
    for( size_t i = 0; i < NUM_ITERATIONS; ++i )
    {
        for( size_t objn = 0; objn < NUM_OBJECTS; ++objn )
        {
            objects[objn].rk4Step( dt );
        }
        for( size_t n = 0; n < NUM_INTERPS_PER_STEP; ++n )
        {
            for( size_t objn = 0; objn < NUM_OBJECTS; ++objn )
               ipos += objects[objn].get_interpolated_position( double(n)*(DT/NUM_INTERPS_PER_STEP) );
        }
        for( size_t objn = 0; objn < NUM_OBJECTS; ++objn )
        {
            objects[objn].AdvanceState();
        }
        ..................... < printing code > ...................
    }
}
Okay, so here's the data (array size is NUM_OBJECTS, which I vary and recompile):

Code: Select all

array size     time (seconds)
array size   10interp 1interp
  9999        83        29
  5000        54        27
  2000        47        27
  1000        32        26
   500        30        24
   200        27        23
   100        26        23
    50        26        22
    20        26        22
    10        26        22
So, it looks like you were right, as usual; when the interpolation structures don't fit in cache we have a problem; and even though we don't have explicit precacheing, we do have implicit, so precacheing is active but not helping.
However, our REAL interpolation will interpolate rotation, as well as position (right now it doesn't); and it will also compute velocity (which right now it doesn't), so I think that in the end it WILL benefit from precacheing (implicit or explicit), if only by being slower... :(
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Ehmm...
I'm thinking:
Using 8000 units as a maximum base-line:
Assuming 500 are in the 16 Hz bin, 500 in the 8 Hz bin, 500 in 4 Hz ..... and 500 in the 1/4096 Hz bin.
Average is 1000 units simulated at 16 Hz; so, 16,000 sim steps per second, total.
But if we interpolate them all at 32 Hz, that's 8000 x 32 = 256,000 interpolations per second.
Now, 256,000 / 16,000 = 16 interpolations per physics step, average.
So, I'm going to measure time at 8000 array size, and 16 interpolations...

72 seconds.

That's 7.2 microseconds per 1 physics step plus 16 interpolations.

16,000 x 7.2 = 115,200 microseconds = 115.2 milliseconds = 0.1152 seconds of cpu time per second...

That's 11.52% of CPU time!!!!! :shock:
Someone was suggesting we don't optimize? :P

EDIT:
Committed r26.
Compressed:
http://wcjunction.com/temp_images/RK4.tar.gz
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Was just in chat with Klauss, and for the record:
  • He thought the results were very encouraging; --obtained a figure of 277 FPS at 8000 units, if all the engine did was phyisics and interpolations
  • I reminded him that presently we only write position-related interpolation data in rk4, and our interpolator reads that and interpolates position only; but that eventually,
    • rk4step will write rotation interpolation data in addition to position, and
    • the interpolation routine will read the added data and interpolate rotation as well as position,
    • plus velocity,
    • plus rotational speed
  • But that then the interpolation code will benefit more from precacheing than it does now
  • But forgot to mention to him the complexity of the getForceTorqueAndMassLoss() function, and the frequency with which it will be called (I think four times per rk4 physics step)
  • Klauss mentioned in passing that it is important to avoid power-of-two arrays, due to the possibility of engaging accesses to cache that are right on the associative stride, which is very architecture-dependent, but always a power of two. The formula, he believes, for stride, is the size of the cache divided by the number of associative ways. Thus, a 2 Meg L3 cache that is 8-ways associative would have a stride of 256 K bytes, or 4096 cache lines (for 64-byte cache line size).
  • He thought a good policy might be to pick prime numbers.
  • I thought a better policy might be to pick Fibbonacci numbers; reason being that prime numbers can add to powers of two; but Fibs only add to Fibs.
In any case, I think the results are indeed encouraging; but we will DEFINITELY have to work hard on optimizations, and then SIMD-fy the heck out of all the forces, physics and interpolation code.

Klauss also pointed out that IF the SIMD-fication of the physics and/or interpolation code favors computing position for a large number of objects, then velocity for all, then rotation for all... that a better way of precacheing would be to take entire blocks of units to process (non-power of two blocks), and prefetch the next block while processing the current block. The two blocks must fit in cache (of targeted cache level, I suppose).
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

@chuck: perfectly accurate recount - and correct supposition at the end ;)

I'd vote for integration as soon as the code is deemed ready and the current release is made.

(passing question: what happen to the current release? what is there left to do?)
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

klauss wrote:@chuck: perfectly accurate recount - and correct supposition at the end ;)
Hehehe, the planets must be aligned ;-)
I'd vote for integration as soon as the code is deemed ready and the current release is made.
YEY! Let's do it.
My biggest worry is the surgery on Unit.
A lesser worry but still worrisome is getForceTorqueAndMassLoss(), which should be trivial to write, but less than trivial to optimize, and highly polymorphic. Not sure how it should be implemented. I hate virtual functions. Probably a good candidate for Visitor Pattern knock knock.
I got the toughest part of the scheduler coded (the bit array of flags, and the prefetching iterator), but I haven't even tried to compile anything yet.
(passing question: what happen to the current release? what is there left to do?)
Passing question indeed.
I'm not sure; and I sympathize with the "release often" motto; but on the other hand, developers are human, and releasing is a chore, not fun; this is fun. But since you ask, I believe where things were sitting was on the reported problems in Windows in PU. I suggested there be some windows testing done with UTCS, but that hasn't happened. And for my part I've been procrastinating on finishing up the reorganization of shaders into folders and all that.

EDIT:
I'm going to try and put my thoughts together on integration into a list of tasks, and put it into a new post.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Integration of RK4 physics
  • What CAN be left out, for now:
    • SIMD-fication and some other tactical optimizations (but NOT strategic optimizations)
    • Tensor of inertia (use moments, with a magical scaling factor, since they are all wrong in units.csv) ... Or ... Should we bite the bullet and fix units.csv (at least as far as moment of inertia is concerned)?
    • Off-line code to compute tensor of inertia for all units --and update units.csv-- can be tackled later
    • Dynamic up-keep of list of gravitational attractors per-object. Just make planets the only attractors, and they all attract all the time
  • Finishing RK4 physics:
    • Writing of orientation, rotational velocity, rC1 and rC2 to OutputState (Errol)
    • Ensuring that rotational data in OutputState is high spin rate compatible (Errol)
    • Splitting getForceTorqueAndMassLoss() into three functions (CS and Errol)
    • Coming up with an efficient way to compute Force with extrapolated vectors. Perhaps getForce() could pass the orientation? (Orientation interpolations for this function can assume angles are small.) (CS and Errol)
  • Finishing interpolation:
    • Make it interpolate rotations ( must work for rot freq > sim freq ) (Errol)
    • Make it interpolate linear velocity and rotational velocity (CS)
    • Write linear interpolator of cubic interpolations (CS)
  • Finishing scheduler:
    • Bit Array and iterator classes are coded but have never been compiled. (CS)
    • The scheduler class itself is only barely started but should be trivial (CS)
    • Prioritization for assignment of units to frequency bins has to be written (CS)
    • In the physics scheduler I have a call-back function to provide it with a 32Hz pulse; but I don't know where it comes from.
    • What do we do with collisions? Do we try to integrate physics to the old collisions?, or code the new collision scheduler?
  • Unit surgery:
    • Need to separate all physics and collision data and functions from Unit, but it will probably involve frequent testing
    • Refer to: http://vegastrike.sourceforge.net/forum ... 27&t=15331
    • First we need to find a good, fast, non-thread-safe Array Allocator to use for RigidBody instances.
    • Question: Do we move ALL physics and collision crap from Unit to RigidBody?, or leave RigidBody as is?
    • RK4 will often call a function getForceTorqueAndMassLoss() (calls it not just once per physics step but 4x per...
    • Suggest we split it into getMassLoss(), getTorque(), getForce( interp& ). Anyhow, they have to be written.
    • Actually, first thing to do is organize header inclusions for unit_generic.h/.cpp, I think, to reduce build time at each testing break.
  • Integration:
    • Linear interpolator of cubic interpolations -> Graphics
    • Array allocator of RigidBodies <-> Unit
    • Array allocator of RigidBodies <-> Physics+Collision
    • Gravity...
  • Uncertainties:
    • What do we do with collisions? Do we try to integrate physics to the old collisions?, or code the new collision scheduler?
    • In the physics scheduler I have a call-back function to provide it with a 32Hz pulse; but I don't know where it comes from.
    • What happens with Asteroids? Asteroids currently use un-real physics that makes them rotate around nothing for no reason.
    • Do we rewrite gravity?, or just re-enable the old code for it?
    • Is there a cross-platform way to obtain system time at finer resolution than one millisecond? Perhaps from the gpu?
  • Expected problem areas:
    • Initial state of RigidBody, OutputState, and other interpolation structures.
    • Avoiding pile-up's in the collisions scheduler, IF we decide to code it now.
    • Code to deal with collisions, going "back in time" to re-initialize the physics per bounce, etceteras.
    • Torque generation by collisions; I have NO IDEA how we're going to do this.
    • Unit surgery, in general.
    • Interfacing with AI, autopilot, FlyByWire, ...
  • Removal of dead code: SIMULATION_ATOM, old physics, old collisions, (gravity, IF we rewrite it).
  • Windows compiling and testing.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

This will be BIG - it should be done on a branch.

Say, /branches/rk4_physics
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: rk4 algorithm sent to you chuck

Post by chuck_starchaser »

Okay, I'll get a branch started. Just want to make sure trunk compiles for me first; right now it doesn't; looks like I need ffmpeg; and went to fetch it but decided to do Ubuntu updates first; --hadn't done them in more than a week.
In the meantime, maybe you could prioritize the task list above; I planned to write a procedural task list, but couldn't, as there are too many things I know nothing about, as per the "uncertainties" and other questions sprinkled all over, such that I'm too unsure about task dependencies, and about how to divide up the work so that integration can proceed by parts.
What I can start doing is moving code into the new branch-to-be into appropriately named folders.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: rk4 algorithm sent to you chuck

Post by klauss »

chuck_starchaser wrote:Integration of RK4 physics
  • Finishing RK4 physics:
  • Finishing interpolation:
  • Finishing scheduler:
That should happen before integration (ie: before branching), IMO.
chuck_starchaser wrote:Integration of RK4 physics
  • Uncertainties:
    • What do we do with collisions? Do we try to integrate physics to the old collisions?, or code the new collision scheduler?
I'd go for a hybrid: ie, do the new scheduler (with full collisions every physics frame), but without the octree structure, and rather use the current sorted list structure. We'll see if it poses a performance issue once it's done, and if we need to go all the way to the octree or we can leave it for later.
chuck_starchaser wrote:Integration of RK4 physics
  • Uncertainties:
    • What happens with Asteroids? Asteroids currently use un-real physics that makes them rotate around nothing for no reason.
That behavior will have to be retained, it seems to be a core gameplay feature for privateer-like mods. We'll need a form of scripted motion I guess.
chuck_starchaser wrote:Integration of RK4 physics
  • Uncertainties:
    • Do we rewrite gravity?, or just re-enable the old code for it?
    • Is there a cross-platform way to obtain system time at finer resolution than one millisecond? Perhaps from the gpu?
Rewrite - it's the EASIEST way, rk4 should handle it transparently.

There's no cross-platform way of obtaining finer resolution time - period. All possible (reasonably portable) ways have been coded in lin_time.h/cpp, to my knowledge.

PS: the new soundsystem thus uses sound tracks as time source for lipsync with movies, system time isn't accurate enough. But using sound tracks as time source for physics wouldn't be viable IMHO.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
Post Reply