Beyond 4.4

Post by **ace123** » Sun Jul 29, 2007 9:37 pm

Yeah... that would be nice.
Unfortunately it would be extremely hard to do anything in that other thread.

The only thing I can see for making this work is to optionally take a screenshot, save it as a texture, and then go in an independent rendering loop displaying that animation.
That way the other thread can touch universe stuff and textures and units all it wants without interference.

Speaking of which, I have a feeling that the music loading thread (Muzak::loaderThread), in its dealings with strings, could be causing memory problems. I'm not entirely sure though.

Actually, now that I think about it, why not hook into bootstrap_draw() and have it render a frame of the animation in a single thread while loading meshes?

At worst, we can do a loading screen with almost no additional coding/debugging by making the dynamic universe python script bring up the loading screen and display messages as it is loading.
I believe this is already implemented with the VS.showSplashScreen(), VS.hideSplashScreen(), and VS.showSplashMessage(string text) python calls. They just aren't being used in the correct places in the python script yet.

Post by **safemode** » Sun Jul 29, 2007 9:58 pm

The wormhole thread was an afterthought. My main concern is the other stuff I was talking about.

But what I was thinking the wormhole thread could do is render a generic, non-specific animation, like a first-person view of entering and traveling through it, then bounce back out of the thread for entering into the new system.

This is just a cosmetic thing though. I'm much more interested in throwing around the other ideas. Especially the unit deletion, since it seems to be the easiest to implement... but the unit creation thread would be the most beneficial to in-game performance.

Post by **safemode** » Sun Jul 29, 2007 10:30 pm

Basically the "loading screen" would just be the wormhole travel animation set on loop until the new system is loaded.

For unit deletion we would use a global delete queue and a couple of global locks. The locks keep the main vegastrike code out of the queue while the thread is reading and popping from it, and keep the thread out while the main program is pushing onto it. These locks would only be held as long as it takes to assign a pointer or pop from the queue... no time at all. The thread would wake up every X amount of time, process the queue, and sleep the rest of the time until the application exits.
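A rough sketch of that delete queue, written with the standard C++ thread library rather than raw pthreads. Everything here is an assumption for illustration: `DeleteQueue`, the stand-in `Unit` and the `g_destroyed` counter are hypothetical names, not vegastrike code, and it uses one lock instead of a couple. The worker swaps the whole queue out while holding the lock and runs the destructors after releasing it, so the lock is only ever held for a push or a swap.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

std::atomic<int> g_destroyed{0};  // hypothetical counter, only here to observe the result

// Hypothetical stand-in for the engine's Unit class.
struct Unit {
    ~Unit() { ++g_destroyed; }
};

class DeleteQueue {
public:
    explicit DeleteQueue(std::chrono::milliseconds period)
        : period_(period), worker_([this] { run(); }) {}

    ~DeleteQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();  // the worker drains whatever is left before exiting
    }

    // Called by the main program: the lock is held only for the push.
    void push(std::unique_ptr<Unit> unit) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(unit));
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopping_) {
            wake_.wait_for(lock, period_);  // sleep for up to one period
            drain(lock);
        }
        drain(lock);  // final pass at shutdown
    }

    // Swap the queue out under the lock, destroy the units outside of it.
    void drain(std::unique_lock<std::mutex>& lock) {
        std::queue<std::unique_ptr<Unit>> batch;
        batch.swap(pending_);
        lock.unlock();
        while (!batch.empty()) batch.pop();  // Unit destructors run here
        lock.lock();
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::unique_ptr<Unit>> pending_;
    bool stopping_ = false;
    std::chrono::milliseconds period_;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

Pushing 100 units and letting the `DeleteQueue` go out of scope should leave `g_destroyed` at 100. Note that this only works for destructors that are safe to run off the main thread.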

Unit creation is another thing altogether. I'm thinking of a way to spawn a thread each time a unit is asked to be created and having the main vegastrike code loop at a slow (but not unplayably slow) rate between physics sim and sleeping until the unit is created. Hopefully this would result in no lag on unit creation while maintaining serialization so things don't start happening out of order.
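The creation idea could look something like the sketch below, using `std::async` and a future instead of a hand-rolled thread. `buildUnit`, `createWhileSimulating` and the stand-in `Unit` are hypothetical names made up for the example; the 10 ms wait is an arbitrary placeholder for "slow but not unplayably slow".

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <memory>
#include <string>

// Hypothetical stand-in for the engine's Unit class.
struct Unit {
    std::string name;
};

// Stand-in for the expensive part (mesh loading, collision tree building...).
std::unique_ptr<Unit> buildUnit(const std::string& name) {
    return std::unique_ptr<Unit>(new Unit{name});
}

// Builds the unit on another thread while the caller keeps stepping physics.
// The caller does not continue past this function until the unit exists,
// so game events stay serialized.
template <typename StepFn>
std::unique_ptr<Unit> createWhileSimulating(const std::string& name, StepFn physicsStep) {
    std::future<std::unique_ptr<Unit>> pending =
        std::async(std::launch::async, buildUnit, name);
    while (pending.wait_for(std::chrono::milliseconds(10)) != std::future_status::ready) {
        physicsStep();  // keep the simulation ticking while we wait
    }
    return pending.get();
}
```

A call such as `createWhileSimulating("llama", [&] { ++frames; })` returns the finished unit and leaves `frames` holding however many physics steps ran in the meantime.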

Post by **AzureSky** » Tue Jul 31, 2007 12:19 am

I was thinking some about making Unit smaller, and was wondering whether storing the meshes would benefit any from the Flyweight design pattern. That way, if you had 10 instances of Unit X flying around, you would only have one Mesh X in memory, instead of 10 of them.

Would this be worth pursuing?
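A minimal sketch of what Flyweight would mean here, assuming nothing about how the engine stores meshes today. `MeshCache`, `Mesh` and `Unit` are hypothetical stand-ins: the mesh is the shared (intrinsic) state, the position is the per-instance (extrinsic) state.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical heavyweight geometry, shared by every unit of the same type.
struct Mesh {
    std::string name;
    std::vector<float> vertices;
};

class MeshCache {
public:
    // Returns the shared mesh, loading it only if no live copy exists.
    std::shared_ptr<const Mesh> get(const std::string& name) {
        std::weak_ptr<const Mesh>& slot = cache_[name];
        std::shared_ptr<const Mesh> mesh = slot.lock();
        if (!mesh) {
            // Stand-in for reading the mesh file from disk.
            mesh = std::make_shared<Mesh>(Mesh{name, std::vector<float>(3000, 0.0f)});
            slot = mesh;
            ++loads_;
        }
        return mesh;
    }

    int loads() const { return loads_; }

private:
    std::map<std::string, std::weak_ptr<const Mesh>> cache_;
    int loads_ = 0;
};

// Hypothetical unit: one pointer to shared data plus its own position.
struct Unit {
    std::shared_ptr<const Mesh> mesh;
    float x = 0.0f, y = 0.0f, z = 0.0f;
};
```

With this, ten units asking the cache for "X" trigger a single load and all point at the same `Mesh`. The `weak_ptr` in the cache lets the mesh be freed once the last unit using it is gone.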

Post by **safemode** » Tue Jul 31, 2007 1:25 am

sharing memory would definitely be a plus.

Whatever it takes to reduce the load of creating and destroying a unit, since that seems to be the greatest cause of in-game stuttering.

Between physical changes to the unit class to reduce load, both memory and processor, and pursuing threading the creation and deletion of Units, things should look a lot nicer in gameplay post-4.4.

Post by **charlieg** » Tue Jul 31, 2007 10:52 am

> AzureSky wrote: I was thinking some about making Unit smaller, and was wondering whether storing the meshes would benefit any from the Flyweight design pattern. That way, if you had 10 instances of Unit X flying around, you would only have one Mesh X in memory, instead of 10 of them.
>
> Would this be worth pursuing?

I can't believe it's not that way already...

Post by **AzureSky** » Tue Jul 31, 2007 2:27 pm

> charlieg wrote:
> > AzureSky wrote: I was thinking some about making Unit smaller, and was wondering whether storing the meshes would benefit any from the Flyweight design pattern. That way, if you had 10 instances of Unit X flying around, you would only have one Mesh X in memory, instead of 10 of them.
> >
> > Would this be worth pursuing?
>
> I can't believe it's not that way already...

It may be actually; I haven't looked into the code enough to know. If it is, it's not very explicit about it.

Post by **Halleck** » Wed Aug 01, 2007 12:22 am

I'm only just beginning to fully grok what it is you're talking about here but it sounds great. Stuttering on creating units (and the ludicrous load time for the default mission) is one of the things that bugs me the most when playing VS, and anything that can be done to reduce the overhead for this would be awesome. I hope you guys can find a way to make it work.

Post by **safemode** » Wed Aug 01, 2007 1:23 am

I'm going to probably make a commit to my branch that implements some basic threading across the ProcessDeleteQueue function. Very rough draftish. I want to see if anyone notices any improvement. Remember, the big stutter is caused by creation, not deletion.

Post by **safemode** » Wed Aug 01, 2007 3:52 am

Well, threading the delete queue has hit a snag. It seems that the destructor for the Mesh class updates some gllist, and that does not like being in a thread at all. It's strange, I'm not sure why it has a problem; it'll crash even if I create the thread and join it right after (no parallel code executed that isn't already being done). Very strange.

Post by **safemode** » Wed Aug 01, 2007 4:11 am

In fact, everything related to GL, which covers most of the interesting operations, causes a segfault when I execute it in a thread... regardless of whether anything is processed in parallel or not.

The following code shouldn't change the execution path of the program at all.
pthread_create()
pthread_join()

So I don't know why it causes crashes with GL. I'll have to look into this further.
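One possible explanation, not confirmed for this particular crash: an OpenGL context is current on only one thread at a time, and a freshly created thread has no current context, so GL calls made from it are not valid even if nothing else is running in parallel. If that is the cause, one way around it is to let the worker thread hand GL cleanup back to the thread that owns the context. The sketch below assumes that approach; `MainThreadTasks` is a hypothetical helper, not something that exists in vegastrike.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Collects work that must run on the thread that owns the GL context.
// Assumption: the object is constructed on that thread.
class MainThreadTasks {
public:
    MainThreadTasks() : owner_(std::this_thread::get_id()) {}

    // Safe to call from any thread, e.g. from a destructor on a worker.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(task));
    }

    // Call once per frame from the owning thread. Returns how many tasks ran.
    std::size_t runPending() {
        assert(std::this_thread::get_id() == owner_);
        std::vector<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(tasks_);
        }
        for (auto& task : batch) task();  // GL calls would happen here
        return batch.size();
    }

private:
    std::thread::id owner_;
    std::mutex mutex_;
    std::vector<std::function<void()>> tasks_;
};
```

A threaded destructor would then free its CPU-side memory directly and `post` a small lambda that does the display list cleanup (for example a `glDeleteLists` call), which the render loop picks up on its next `runPending()`.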

Post by **safemode** » Sun Aug 05, 2007 1:53 am

This post is in response to the previous posts about threading unit creation and deletion to work around a stuttering problem during gameplay.

In an effort to track down what is causing these delays in play whenever a unit is inserted into the universe, I've been profiling vegastrike in a mission where all it does is constantly, every frame, insert units into the game. This should (if the issue is indeed related to units being inserted into the game) magnify the problem function and make it readily apparent what the bad guy is.

Unfortunately, profiling alone won't give you latency information. The worst-case latency per frame of all the functions would be the ideal information for figuring out which function is responsible for causing these stutters.

In any case, in profiling vegastrike I found that my UnitIterator::advance function was somewhat slow, and became progressively worse when the unit list size became extremely large. This differed from the old UnitCollection, which didn't suffer from this exponential increase in execution time per call. After a couple of days' work, I've fixed my advance function. The new UnitCollection now performs faster than the old UnitCollection did, about 14% faster on average (see the std::list thread).

This leaves a couple of other functions to blame. The ParticlePoint::Draw method is the biggest time consumer. Commenting it out, however, doesn't fix the stuttering problem. Another big spender is csCdBBox::BuildBBoxTree, but this only executes once for each different unit being loaded. I haven't checked for certain that the stutter still exists after all 140 or so calls to it, so this is still suspect, and currently our best one.

The profile can be retrieved from my website. It includes the commenting out of ParticlePoint::Draw. http://signal-lost.homeip.net/files/vs_ ... ile.txt.gz

The latest svn includes my changes up to the current revision of my branch. I wanted to get the speedups of advance out there before everyone started profiling and telling me my advance function is to blame. There are definitely still areas that can be even more streamlined, but now it's faster than the old one, so maybe now I can stop obsessing over it and track down why ParticlePoint::Draw is so inefficient and what's wrong with the BuildBBoxTree function.

So, what does this mean for post-4.4, you may be asking? Well, hopefully it means that post-4.4 you should expect to not have a stutter whenever units are inserted into the system. You can also expect less overhead from the collection class and iterators than before, allowing more units to be held in the collection without a detrimental effect on the performance of the list. This makes way for the goal of simulating over 10,000 units in game.

Post by **chuck_starchaser** » Sun Aug 05, 2007 3:12 am

Just some quick thoughts: I think AzureSky was talking about something called "instancing". I don't know much about it, except that it's a gpu feature allowing the gpu to keep one copy of a mesh in video ram but place it at multiple locations in a scene. Supposed to be much faster as well as more memory efficient.

This is probably a stupid question, but the problem with unit creation stutters, could it not be due to the time required to load the textures from disk and send them to the videocard? Is the profiling you're doing using dozens of different ship types using big textures and competing for video ram real estate with backgrounds and cockpits and HUDs and planet textures, or just adding thousands of the same type of ship all sharing the same texture?

I was thinking the other day, memory is like hundreds of times slower than modern cpu's. Wouldn't it be more efficient if the engine forced the use of jpg instead of png for any non-alpha textures? (Like, if( !has_alpha() && !is_jpg() ) throw umoron(); ) Then the files would be 10 to 20 times smaller, reducing disk and memory access time, more than compensating for the extra cpu work of decompressing the image, and reducing download size. Another function could then re-compress the image using some dds compatible format before sending it to the videocard. Too many times I've seen megabyte png textures that should have been 50k jpg's. Heck, many of the backgrounds are bmp's!

Was also thinking about file access. It's been a while since I looked at the VS code much, but I seem to remember that file access was immediate. There are ways in windows, and I'm sure in *nix also, to prefetch files from disk. If you prefetch them a 20th of a second before you need them, by the time you need them they'd be in memory already.

Just my 2c.

Post by **safemode** » Sun Aug 05, 2007 4:00 am

I profiled using gcc's profiling code. I used the total_war_python.mission mission to test, which loads up _all_ the meshes. Hence the 141 total calls.

A lot of time spent grabbing something from the filesystem would show up in the profile. What wouldn't show up is something in python land outside of the C++ function calls. Also, callback functions from gpu land dealing with GL may cause high-latency blocks. Other callback functions outside of vegastrike may also not show up in the profile.

After a while in unix, all the filesystem-related things dealing with VS (if you have the ram, and I do) are cached in ram anyway. So hdd latency shouldn't be an issue. GL-related code can be at fault. Python code could be at fault. Some normally fine function could have a moment of high latency whenever a unit is inserted, but this tiny moment could occur infrequently enough to not overshoot other functions in the profile, and if it's called a lot and most of those calls take a short amount of time, the average per call will be low. Basically, I just need a way to determine which functions take the longest at any one instance. The profile gives averages, I want maximums. The function that is at fault should have a max runtime of around 1 second (way too long). It may execute in a fraction of that most of the time, but all it takes is 1 time every blue moon in game to be too much. If anyone knows of a profiler that can give me maximums rather than just averages... chime in.
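Failing a profiler that reports maximums, hand instrumentation is one option. The sketch below is a hypothetical helper, not part of vegastrike or of any profiler mentioned here: a scoped timer placed at the top of a suspect function records call count, total time and worst single call.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

struct LatencyStats {
    long long calls = 0;
    double totalMs = 0.0;
    double maxMs = 0.0;
    double averageMs() const { return calls ? totalMs / calls : 0.0; }
};

class LatencyTable {
public:
    void record(const std::string& name, double ms) {
        LatencyStats& s = stats_[name];
        ++s.calls;
        s.totalMs += ms;
        s.maxMs = std::max(s.maxMs, ms);
    }
    const LatencyStats& get(const std::string& name) { return stats_[name]; }

private:
    std::map<std::string, LatencyStats> stats_;
};

// RAII timer: construct at the top of a function, it records on scope exit.
class ScopedTimer {
public:
    ScopedTimer(LatencyTable& table, std::string name)
        : table_(table), name_(std::move(name)),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start_;
        table_.record(name_, elapsed.count());
    }

private:
    LatencyTable& table_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

To see why the average hides the culprit: 999 calls of 1 ms plus a single 1000 ms call average out to just under 2 ms, while `maxMs` reports the full second. The table is not thread-safe as written.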

Post by **chuck_starchaser** » Sun Aug 05, 2007 5:13 am

I used AMD's profiler, and it has more features than my head could get around. I don't think it had max latency reporting in statistical sampling mode, which was the only mode I used; but it had many other modes, like for profiling specific functions and stuff. It integrated well with Vstudio; not sure in gcc... Let me see...
Yes; Code Analyst for Linux:
http://developer.amd.com/calinux.jsp

Post by **Halleck** » Sun Aug 05, 2007 12:05 pm

This is a bit tangential, but as to chuck's png vs. jpg question:
I rather loathe the idea of storing all images as jpegs since I don't like converting from quantized to lossless formats...
Perhaps this is a bit superstitious for extremely high quality (90-100%+) jpegs, but I still prefer png's because it's easier on the mind in situations when the image has to be tweaked and re-saved in any way.

If pngs really are dragging the engine down, perhaps we could have a branch or a repository for source images stored in a lossless format, and convert them to jpg for trunk/release. Then I'd be a little less nervous.

Post by **chuck_starchaser** » Sun Aug 05, 2007 2:35 pm

Well, once I decided to find out exactly how much degradation was there in jpg, so I made a simple experiment: I blended, in Gimp, a jpg with the original, in difference mode, for various images and compression qualities. In every case, the difference was a black screen at first, and became multicolored dust after multiplying that difference many times.
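The same experiment can be expressed in a few lines, assuming both images are already decoded into equally sized 8-bit buffers. `amplifiedDifference` is a hypothetical helper name: it takes the per-channel absolute difference (what a difference blend mode computes) and multiplies it by a gain, clamping at 255.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Per-channel |a - b| * gain, clamped to the 8-bit range.
// Assumes a and b hold the same number of samples.
std::vector<std::uint8_t> amplifiedDifference(const std::vector<std::uint8_t>& a,
                                              const std::vector<std::uint8_t>& b,
                                              int gain) {
    std::vector<std::uint8_t> out(a.size(), 0);
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        int diff = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i])) * gain;
        out[i] = static_cast<std::uint8_t>(diff > 255 ? 255 : diff);
    }
    return out;
}
```

With a gain of 1, samples that differ by one or two levels stay at 1 or 2 out of 255, which looks black on screen; only a large gain turns them into visible dust.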

Jpeg is good.
It's pretty much like using ogg for audio.
At the price of a subtle loss, you get a huge compression ratio.

But certainly there should be a repository for originals, as you suggest; --and for audio as well we should have all the wav originals safely stored. The losses in jpg AND ogg compression, although subtle, are un-recoverable. Any future work or tweaking of the images or the sounds *requires* access to the un-compressed originals.

Post by **safemode** » Sun Aug 05, 2007 3:07 pm

OpenGL has functions for both loading and reading back compressed textures. Meaning, we can generate compressed textures of all our images on first run, and then load the compressed images for subsequent runs (or we can do that prior to a release and have vegastrike only look for compressed textures).

Then opengl will use the compressed textures natively.

Is this not a possible solution or are we already using compressed textures and that has nothing to do with all the image data that we're loading from png files?

Post by **chuck_starchaser** » Sun Aug 05, 2007 3:57 pm

Well, png does have compression; it's just the wrong kind of compression in many cases. Png and jpg compression are completely different animals. Let me explain:
Suppose you create a texture that is a 50% gray, solid fill. Compress that with png and you get a tiny file, like 10k, say. Now, simply add 1 bit of random noise to this grey tone. Not enough to even see the difference, and try compressing it with png... Even at maximum compression, now the image takes megabytes.
PNG was meant to replace GIF, and it's similar to gif in functionality. It works well for text, banners, and anything using solid colors; but if you put in noise and continuous color gradients, PNG chokes.
JPG makes judgement calls, instead. If we go back to the half tone gray image with noise, for example, jpg will look at it and say "this noise isn't even visible, and there's no pattern to it, so the hell with it"; and it will probably give you still a gray image with like a bit of noise, but the noise might not be exactly the same.

For a numerical example, I've just been working on scaling Privateer images, and the ratio between png and jpg file sizes is consistently 20:1, even though I use 90% quality for jpg.
On the other hand, if you have a captured screenful of text, and you compare file sizes between png and jpg for that, you'd be surprised to find png achieving better compression than jpg by like a 10:1 ratio. So, like I said, anything using solid colors is stuff for png; but anything having rust, grime, scratches and baked-in ambient occlusion shadows is definitely NOT for png.
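To illustrate why noise hurts a lossless compressor, here is a toy demonstration. Loud assumption: it uses plain run-length encoding, which is far cruder than the DEFLATE scheme PNG really uses, so only the direction of the effect carries over, not the actual numbers. `rleSize`, `flatGray` and `withOneBitNoise` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Size in bytes of a simple (count, value) run-length encoding, runs capped at 255.
std::size_t rleSize(const std::vector<std::uint8_t>& pixels) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < pixels.size();) {
        std::size_t run = 1;
        while (i + run < pixels.size() && pixels[i + run] == pixels[i] && run < 255) ++run;
        bytes += 2;  // one byte for the count, one for the value
        i += run;
    }
    return bytes;
}

// A solid 50% gray image, one byte per pixel.
std::vector<std::uint8_t> flatGray(std::size_t count) {
    return std::vector<std::uint8_t>(count, 128);
}

// Flips the lowest bit of each pixel at random: invisible to the eye.
std::vector<std::uint8_t> withOneBitNoise(std::vector<std::uint8_t> pixels, unsigned seed) {
    std::mt19937 rng(seed);
    for (auto& p : pixels) p = static_cast<std::uint8_t>(p ^ (rng() & 1u));
    return pixels;
}
```

For a 256x256 flat gray image the encoding is a few hundred bytes; after adding the one-bit noise the runs collapse to a pixel or two each and the encoded size ends up around the size of the raw data.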

Now, the type of (dds file) compression that gpu's natively deal with is yet a third kind of compression animal, which is lossy in a different kind of way from jpg. It's much more lossy than jpg, in fact, AND it doesn't achieve nearly as good compression. So what's the point of it? The point of it is that it is FAST. The GPU can decompress it on the fly, so the textures can stay compressed IN video memory, effectively doubling or tripling apparent video memory space. But it would be *terrible* as a type of compression to put all the textures into, in the game download, because the (dds) texture files would probably be bigger (less compression) than png's.
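For reference, the block formats commonly stored in dds files are fixed-rate, so the in-memory size can be computed up front: DXT1 packs each 4x4 pixel block into 8 bytes and DXT5 into 16 bytes. The helper names below are made up for the example, and the figures cover the base level only (no mipmaps).

```cpp
#include <cassert>
#include <cstddef>

// Bytes per 4x4 pixel block for the two most common S3TC/DXT formats.
constexpr std::size_t kDxt1BytesPerBlock = 8;   // RGB, optional 1-bit alpha
constexpr std::size_t kDxt5BytesPerBlock = 16;  // RGB plus interpolated alpha

// Size of one mip level; dimensions are rounded up to whole 4x4 blocks.
constexpr std::size_t dxtBytes(std::size_t width, std::size_t height,
                               std::size_t bytesPerBlock) {
    return ((width + 3) / 4) * ((height + 3) / 4) * bytesPerBlock;
}

// Size of the same level stored uncompressed.
constexpr std::size_t rawBytes(std::size_t width, std::size_t height,
                               std::size_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}
```

For a 1024x1024 texture that works out to 512 KiB in DXT1 against 3 MiB as raw RGB, and 1 MiB in DXT5 against 4 MiB as raw RGBA. Unlike png or jpg, the size does not depend on the image content at all.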

Besides, dds compression can be tweaked to achieve some pleasant compromise between lossiness and compression ratio, which could be part of the settings, like "detail"; but which opportunity would be lost if the textures were dds-pre-compressed.

So, what I would suggest is what I did suggest before: Forcing the use of jpg for non-alpha channel-requiring textures, in the engine; and have dds compression immediately following jpg de-compression.

One might argue about the cpu load of decompressing from jpg and recompressing to dds on the fly; but I believe the savings in disk- *AND memory-* -access times will far outweigh the extra processing; especially if the routine declares a static memory scratchpad for temporary storage of the uncompressed image, avoiding allocation and deallocation at each call.
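The static scratchpad part is simple enough to sketch. `scratchpad` is a hypothetical function name; the buffer only ever grows, so after the largest texture has been seen once, no further allocation happens.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a reusable buffer of at least `bytes` bytes.
// The buffer lives for the whole run and is only reallocated when it must grow.
// Not thread-safe as written: one caller at a time.
std::vector<unsigned char>& scratchpad(std::size_t bytes) {
    static std::vector<unsigned char> buffer;
    if (buffer.size() < bytes) buffer.resize(bytes);
    return buffer;
}
```

A decoder would ask for `scratchpad(width * height * 4)`, decompress the jpg into it, and hand the pointer to the dds compressor. If texture loading ever moved to several threads, making the buffer `thread_local` would be one way to keep this safe.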

Post by **safemode** » Sun Aug 05, 2007 4:19 pm

I made a new thread to handle this topic of fixing the stuttering problem. So for further discussion of compression to save disk space and such to reduce IO latency, take it there. Gonna try and pull back to the topic of code to change/clean etc. in vegastrike.

Post by **safemode** » Thu Aug 16, 2007 1:42 pm

Definite:
OK, so post-4.4 (5.0 now) is going to get dds support.
It's going to get the std::list unitcollection.
It's going to get a user option to enable caching of compressed textures (for those that aren't already a dds file).

maybe:
Unit class refactoring (gonna try and keep the api the same). Direct access to data members (not using a function) will stop. This will facilitate compartmentalizing.

C++ AI. This is a big one. We want to get the ai code optimized in C++ and still allow python to describe some tunable modifiers to personalize the AI. Additionally, the ai should be able to learn as the game progresses and factions befriend or make enemies of one another. This will facilitate another new feature: inter-faction diplomacy.

Diplomacy. Though mostly implemented in python, the infrastructure to allow inter-faction deals needs to be implemented. Pacts, communication, bargaining. It sounds really complicated, but we don't need to implement all of it in python in VS. That could be for other mods. I'm really interested in being able to communicate in ship to friendly factions as if it was a broadcast to every ship of the faction, for instances when you need help in an emergency or want to coordinate a broad-scale attack. The individual ships would decide to respond, but it would be all at once. Very cool. And if we want to add the rest of the features to VS's game, we would add the python and add menus on bases etc. I think that would add a very valuable and worthwhile feature to VS, since it is mostly a game based around interacting with dozens of factions.

Faction creation. Perhaps a python thing, not sure though if some C++ would need to be changed. We should be given the option to create a faction on campaign creation. Other computer ships could opt to join the faction as it gains power or diplomatic deals are made or simply bought. The computer will learn a new ai attitude based on your gameplay and this faction could gain control and conquer other factions. You could steal pirate bases for instance, or displace a faction as the commanding force in a given system. The nav computer should display controlling factions alongside system names, or in the sector map, display by color the systems a faction controls.

This opens up a missing section of a privateering, universe-exploring game like VS. You can go it alone, or you can become the commander leading his army, you can become the tycoon, you can become the politician, all without changing the in-ship space combat/sim gameplay of the game. It by no means will be like a civ game, it would be much more exciting. And all of that is going to need the C++ backend.

That's all my initiative. I don't know what's on the other devs' agendas (that's the purpose of this thread).

I wish I was good at graphical programming:
I want to get a book and read up on opengl so I can dive into the gldrv and gfx files and fix slow spots and maybe pretty up others. For instance, the particle code is extremely slow (the single slowest function in game is drawing particles) and it may be due to the gl code. We aren't utilizing hardware features as much as we could be, I'm sure. Effects for beam weapons, explosions and damage are missing.

I don't think I'll really get into that though, the other stuff is more important to me.

Post by **Halleck** » Thu Aug 16, 2007 3:57 pm

This all sounds good.

Also, what about offloading some of the dynamic universe code to C++?

Right now it seems to me like that is taking up a lot of time in Python. Takes VS nearly three minutes to start a new game on my system, not that much better if I'm loading one.

I see in my stdout that it spends loads of time "generating capital" (ships) and then launching them.

Post by **chuck_starchaser** » Thu Aug 16, 2007 4:26 pm

I really like the idea of faction creation; might come really handy in PU and future WC-related mods, where, to map all factions statically might make a huge list. Factions could split along the story line, or merge, or go missing, or be encountered late in a game plot.

Along the same lines, another feature that would be even more useful, and even urgent, would be dynamic system and jump point creation. Case in point: in the original Privateer game, there were jump points that didn't exist until someone gave you their coordinates. But in VS PR, those jump points appear there, and can even be jumped, before their coordinates are given to you. Another way to solve the problem would be to have a mod switch that makes jump points invisible and/or un-jumpable until they are listed in your nav computer's map.

Post by **safemode** » Fri Aug 17, 2007 12:27 am

While that's probably a strictly python thing (location of objects), I don't like the idea of something in space appearing out of nowhere one minute when it wasn't there before, if it's something that should have been there all along.

Better would be to just not have objects show up on the map unless you have bought the maps or you're within range of the object. This means we can have jump points not charted on any maps, and you would either have to stumble upon them or be given their coordinates, and since space is so big, the chances of you stumbling upon them are slim to none. I really don't like the idea that you have mapped everything of interest in a system without having to buy any maps or be within range. That should fix your problem.
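The rule boils down to a one-line predicate. The sketch below assumes a per-object "charted" flag and a single sensor range; `NavObject`, `Vec3` and `showsOnMap` are hypothetical names, not existing vegastrike types.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    double x, y, z;
};

// Hypothetical map entry. `charted` becomes true once the player has bought
// a map listing the object or has been given its coordinates.
struct NavObject {
    Vec3 position;
    bool charted;
};

double distanceBetween(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// An object is listed on the map if it is charted or currently in sensor range.
bool showsOnMap(const NavObject& obj, const Vec3& player, double sensorRange) {
    return obj.charted || distanceBetween(obj.position, player) <= sensorRange;
}
```

An uncharted jump point 1000 units away stays off the map with a sensor range of 100, and appears either when the player flies within range or when a mission script flips `charted`. Whether an object that was once in range should stay on the map afterwards is a design choice this sketch leaves open.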

As for dynamic universe in C++... it may or may not be feasible to do that. Right now I think AI is easier and more cpu intensive, and thus will give us a bigger bang for our time.

Post by **chuck_starchaser** » Fri Aug 17, 2007 4:50 am

> safemode wrote: while that's probably a strictly python thing (location of objects) I dont like the idea of something in space appearing out of no where one minute but wasn't there prior when it's something that should have been there all along.

True, but it's not really a matter of choice that much; it's the way the original games in the WC/Privateer line work. Before your nav computer knows a jump point is there, you can't see it. It doesn't exist. There's actually no contradiction, because in WC jump points require highly specialized equipment to detect. They aren't visible. The balls of blue light that represent them are supposedly HUD projections by your nav computer. The only inconsistency, really, is the fact that the luminous spheres are clipped by the windows; they shouldn't be; they should look like an overlay that ignores window frames. Right now the problem is that since the VS engine doesn't allow making them invisible, players run into these jump points and they jump them out of curiosity and do things that break the game's plot. But don't worry about it, if this is a python thing, then someone who knows python will have to address it.