the ancient stuttering problem

Development directions, tasks, and features being actively implemented or pursued by the development team.
Post Reply
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

the ancient stuttering problem

Post by safemode »

I have a couple of threads, "post 4.4" and "std::list", that have both turned into a kind of search for the cause of the stuttering problem that apparently shows up in-game when a unit is brought into the system.

Four very likely explanations could account for this.

1. Textures are too big: they take too long to load the first time, or caching isn't working correctly. The solution would be to use compressed textures and fix caching if it's broken.

2. Scheduling of collision and other intensive tasks is occurring in a way that piles them up in a single physics frame, rather than spreading them out nice and uniformly. This leads to a sporadic pause in the physics frame. The solution would be to create a new scheduling routine that guarantees no pile-ups.

3. Python has a routine somewhere that, in certain situations (loading a unit into the system), runs too long and blocks the game. That problem has yet to be explored.

4. Any callbacks that may exist, in OpenGL land or elsewhere, may be blocking sporadically for some reason. This also has yet to be explored.


#2 is looking more and more suspect, because the pauses occurred in my test of total_war_python even after all the textures had been loaded and cached. In between the pauses, the game responded smoothly... too smoothly for a game with thousands and thousands of units being rendered on screen.

My idea, besides just changing how things are scheduled, would be to make the scheduling routine time-aware of certain function calls: if it's reaching the 1/60th-of-a-second mark, it reschedules the remaining work for the next frame and moves on. The next frame, it processes what it didn't process before, again watching for that 1/60th-of-a-second barrier, and it does this over and over as needed. We could check the time only every Nth function call to reduce the overhead of calling the timing function ... details to be determined later :)
I call that a self-pre-empting scheduler.
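Something like this rough sketch is what I have in mind -- not actual engine code, just the idea, with physics tasks stood in by plain function pointers and gettimeofday() used for the timing:

Code: Select all

#include <deque>
#include <sys/time.h>    // gettimeofday()

typedef void (*task_fn_t)();               // stand-in for whatever a physics "task" really is
static std::deque< task_fn_t > task_queue;

static double now_seconds()
{
    timeval tv;
    gettimeofday( &tv, 0 );
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

// Run queued tasks until the 1/60 s budget is used up; whatever is left stays in
// the queue and runs next frame.  The clock is only sampled every Nth task to keep
// the overhead of the timing call down.
void run_physics_frame()
{
    const double budget      = 1.0 / 60.0;
    const int    check_every = 8;          // the "N", to be tuned
    const double start       = now_seconds();
    int count = 0;
    while( !task_queue.empty() )
    {
        task_fn_t task = task_queue.front();
        task_queue.pop_front();
        task();
        if( ++count % check_every == 0 && now_seconds() - start >= budget )
            break;                         // pre-empt ourselves; the rest runs next frame
    }
}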
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: the ancient stuttering problem

Post by chuck_starchaser »

Good summary, safemode. Personally, I lean a bit more toward suspecting #1 rather than #2 as the *main* cause of stuttering; though I'm sure #2 needs to be addressed too. The reason I suspect #1 is that if #2 were the main problem, the stutters should be more random; but in my experience, at least, they show up pretty consistently at the moment of unit creation. (((One closely related problem that also needs addressing is that of textures and LODs. Basically, if a ship is at the other end of the system and would occupy a fraction of a pixel on screen, we still load all of its textures into video memory, I believe. Terrible waste. Maybe the engine could force per-LOD textures?)))
safemode wrote: 4. Any callbacks that may exist, in opengl land or other, may be blocking for some reason sporadically. This also, has yet to be explored.
This could also go hand in hand with #1: Thrashing of textures between video and system memory could be causing such "callbacks" (or simply freezing the gpu's responses to the driver?).
#2 is looking more and more suspect, because the pauses occured in my test of total_war_python even after all the textures had been loaded and cached.
Cached where? Video memory or system memory? I strongly suspect that video memory is not nearly big enough to store all the needed textures. Remember that, at least presently, textures residing in video memory are un-compressed. If your average texture for a ship is 1k by 1k, and you have diffuse, specular, glow and damage maps, that's, let's see: 1 byte for each of R, G, B and A = 4 bytes. Times 1k squared is 4 megs. Times 4 textures, it's a whopping 16 megs. With a 256 meg videocard you've got enough room for 16 ship types, without counting background and planetary textures, without counting cockpit and HUD textures, and without counting the meshes. All of which means that once the video ram fills up, any new texture that is needed is going to cause swapping of least-recently-used textures in video ram back to system memory to make room for the new one; plus some extra time for memory management to defrag the newly freed-up space.
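Just to put that back-of-the-envelope math in one place (my rough numbers, ignoring mipmaps, and assuming DXT5 as the compressed case for comparison):

Code: Select all

#include <cstdio>

int main()
{
    const long w = 1024, h = 1024;
    const long maps_per_ship = 4;            // diffuse, specular, glow, damage
    const long uncompressed  = w * h * 4;    // RGBA8: 4 bytes per texel
    const long dxt5          = w * h;        // DXT5/S3TC: 16 bytes per 4x4 block = 1 byte per texel

    std::printf( "per ship, uncompressed: %ld MB\n", uncompressed * maps_per_ship >> 20 ); // 16 MB
    std::printf( "per ship, DXT5:         %ld MB\n", dxt5 * maps_per_ship >> 20 );         //  4 MB
    std::printf( "ships in 256 MB, uncompressed: %ld\n", (256L << 20) / (uncompressed * maps_per_ship) ); // 16
    std::printf( "ships in 256 MB, DXT5:         %ld\n", (256L << 20) / (dxt5 * maps_per_ship) );         // 64
    return 0;
}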
My idea, besides just changing how the things are scheduled would be to make the scheduling routine time aware of certain function calls, and if it's reaching the point of 1/60th of a second, to reschedule remaining things for the next frame and move on. The next frame, it'll process what it didn't process before, and keep track of hitting that 1/60th of a second barrier. It does this over and over as needed. We could do it every Nth function call to remove some overhead of calling the timing function ... details to be determined later :)
I call that a self-pre-empting scheduler.
Sounds like a plan.
My last suggestion would be to find out who the culprit is, first. Your idea could be applied to textures and unit loading in general just as well: e.g. only allow the loading of two new unit types per frame, and queue up any further new unit type requests for successive frames (but immediately issue asynchronous file loading commands to the disk drive, so that by the next frame the files are already loaded and it's just a matter of sending them to the video card).
Last edited by chuck_starchaser on Sun Aug 05, 2007 4:41 pm, edited 1 time in total.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

Assuming it was graphics loading that was the culprit.

You're suggesting we create two copies of every image: one in a lossless format stored in a separate branch, and the other in a lossy, more compressed format left in the data4.x dir. This saves disk I/O, and thus reduces I/O waits and perhaps fixes the stuttering.


What I had tried to suggest before regarding image loading was to use a native OpenGL format as our lossy format, skipping jpg altogether. It's a win-win in cpu and memory land. We don't have to decompress and recompress anything; the images are already in a format the gpu can understand (let it deal with decompression), so already we are miles ahead in the cpu-overhead department compared to using png or jpg files. In the memory department, yes, they are bigger on the filesystem than jpg's, but a jpg has to be loaded into ram, decompressed for use with OpenGL, optionally recompressed to an OpenGL-native format, and that has to be cached in ram too. Using the native compressed format from the start, we only have to cache that one file, so we are miles ahead in the memory department.

Just pick a quality level that compromises well with the compression ratio and use anti-aliasing or something to hide any artifacts. The idea is to put as much processing as possible on the gpu rather than the cpu, and to minimize how much memory bouncing is required to do things. I think my idea would accomplish both goals much more fully than moving to yet another format that we'd need to massage into something OpenGL likes.
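For reference, this is roughly what "let the gpu deal with it" looks like at the OpenGL level, assuming the data on disk is already DXT5/S3TC blocks (needs the EXT_texture_compression_s3tc extension, and on older drivers the entry point has to be fetched as glCompressedTexImage2DARB; error checking omitted):

Code: Select all

#include <GL/gl.h>
#include <GL/glext.h>    // GL_COMPRESSED_RGBA_S3TC_DXT5_EXT

// Upload one mip level of pre-compressed DXT5 data exactly as it came off disk.
// No cpu-side decompression or recompression; the driver/gpu consumes the blocks directly.
void upload_dxt5_level( GLuint tex, GLint level, GLsizei width, GLsizei height,
                        const void* blocks )
{
    GLsizei size = ( ( width + 3 ) / 4 ) * ( ( height + 3 ) / 4 ) * 16;  // 16 bytes per 4x4 block
    glBindTexture( GL_TEXTURE_2D, tex );
    glCompressedTexImage2D( GL_TEXTURE_2D, level,
                            GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,
                            width, height, 0, size, blocks );
}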
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

I think what we currently do is process and cache, at game load, as many images as the user has allowed us to use. So the majority of images are in ram, which is extremely fast (much faster than anything between cpu and gpu). But like we've mentioned above, png's are huge and take up far too much space to all be cached in ram, and uncompressed images take up far too much space to keep many in the video card's memory. So we do need to make a change here. The question is: do we move to a lossy format that needs to be post-processed on load every time it's read from disk (even with jpg, we may not be able to cache all images in ram at once), or do we use a native compression format that lets us load an image and use it without any post-processing, ever?

In both cases, you'd be storing the compressed native OpenGL format in ram, not the jpg, so in both cases, once all the post-processing is done, the same amount of memory is used.
But every time you have to hit the filesystem, the jpg would have to be read into ram, decompressed, recompressed to the OpenGL-native format, and the jpg memory cleared; some other texture would have to be thrown out of ram to make room, and the process would repeat as needed.
Using the native compressed format directly would just be "read into ram, throw out another texture to make room, repeat." Which is faster, has less memory overhead, and makes more sense?

What I'm getting at is that in both cases, we'll be moving to a compressed texture format to maximize how many textures can stay in video ram at the same time. Like you said, 16 ships is way too few, and that's being generous. If we could double or triple that, it would be worth far more than the increased time it takes to read an extra few hundred KB from disk. I really don't think the added cpu and memory overhead of dealing with jpgs is worth it when we could just use the native compressed format.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Post by chuck_starchaser »

safemode wrote:Assuming it was graphics loading that was the culprit.

You're suggesting to create two copies of every image, once in a lossless format stored in a separate branch, and the other in a lossy more compressed format left in the data4.x dir. This saves disk io. and thus reduces io waits and perhaps fixes the stuttering.
Exactly.
What i had tried to suggest before regarding image loading was to utilize a native OpenGL format as our lossy format, skipping jpg altogether. It's a win win on cpu and memory land. We dont have to decompress and recompress anything, the images are already in a format the gpu can understand (let it deal with decompression) so already we are miles ahead in the cpu overhead dept compared to utilizing pngs or jpg files. In the memory dept, yes they are bigger on filesystem than jpg's, but a jpg has to be loaded into ram, decompressed to use with opengl, optionally recompressed to an opengl native format and that has to be cached in ram too. With using the native compressed format from the start, we only have to cache that file once, so we are miles ahead in the memory dept.
It would be better than what we're doing now, for sure; but the compression ratio of dds formats is laughable: You'll still find the texture files taking up megabytes. No comparison at all with the quality and compression ratio of jpg. We're talking about a 20:1 difference here, not just small change...
Just pick a quality level that compromises well with the compression ratio and start using anti-aliasing or something to hide any artifacts. The idea is to put as much processing as possible on the gpu and not on the cpu and to minimize how much memory bouncing is required to do things. I think my idea would accomplish both goals much more fully than moving to yet another format that we'll need to play with to get to something openGL likes.
Well, the code for decompressing jpg is already in the engine, and if the code for compressing to a gpu-compatible format can be found somewhere as open source, it wouldn't take long to implement. My argument for this solution is based on the following:
a) JPG gives you MUCH higher compression than DDS, AND at much higher quality.
b) Losses for DDS compression are much higher, and should really be user-selectable, rather than burnt into the files.
c) Most of the time spent loading a texture and sending it to the videocard is probably spent in memory access, even after disk caching. Memory is very slow: typically on the order of microseconds of latency per new cache line. A 3 GHz cpu, thanks to parallel execution and pipelining, executes multiple instructions per clock cycle; call it 10 to be conservative. So at 3 GHz we get 30 instructions per nanosecond. That's 30 THOUSAND instructions' worth of time lost whenever the cpu has to wait 1 microsecond for a piece of memory that wasn't in L1 or L2 cache. So I think I'm even understating it when I say that the savings in memory access time would far outweigh the extra processing of decompressing jpg and re-compressing to dds on the fly.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

chuck_starchaser wrote:
safemode wrote:Assuming it was g

Well, the code for decompressing jpg is already in the engine, and if the code for compressing to a gpu-compatible format can be found somewhere as open source, it wouldn't take long to implement. My argument for this solution is based on that,
If we didn't use jpg, we could remove that decompressor, and the same for png, saving cpu and some memory footprint.
a) JPG gives you MUCH higher compression than DDS, AND at much higher quality.
Then you are opting for uncompressed textures in gpu land.
b) Losses for DDS compression are much higher, and should really be user-selectable, rather than burnt into the files.
This is a good point, though I would think there are ways to accomplish this with the native compression format too. And on this point, you're suggesting that we use the native compression format in gpu land, use jpg as the "highest" quality level, and let the compressed native formats serve as the low-quality, low-memory option for computers with fewer resources. The trade-offs compared to the current way of doing things get very complicated.
c) Most of the time in loading a texture and sending it to the videocard is probably spent in memory access, even after disk cacheing. Memory is very slow. Typically in the order of microseconds latency per new cache line. A 3 GHz cpu, due to parallel execution and pipelining, executes dozens of instructions per clock cycle. Think of it as 10 instruction to be conservative. So at 3 GHz we got 30 instruction per nanosecond. That's 30 THOUSAND instructions equivalent time loss whenever the cpu has to wait 1 microsecond for a piece of memory that wasn't in L1 or L2 cache. So I think I'm even understating it when I say that the savings in memory access time would far outweight the extra processing in decompressing jpg and re-compressing to dds on the fly.
Using your own argument that memory access is slow: you're adding steps to my suggestion that all have to occur in memory. They add copies in memory and add processor cycles to boot. You are going to be creating a compressed texture eventually anyway, to get the user-selectable quality levels you want. The only things you save by using jpg are the filesystem transfer time for the difference in file size between the native compressed format and jpg, plus a better "high quality" mode with uncompressed textures for those who want the best possible picture. The only drawbacks my suggestion has are that my top quality level won't be as high as yours and the hdd will have to transfer more per file, but my suggestion completely kills yours in every other aspect. How much do filesystem transfers matter once they've already been started, and how often do they actually occur in game? That is a good question, since Linux and the game will cache as much as possible in ram. At what point does the on-disk file size become the overpowering factor? I'm sure there is a magic file size below which the difference wouldn't matter, but above which it does.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Post by chuck_starchaser »

safemode wrote: Using your argument that memory access is slow, you're adding steps to my suggestion that all have to occur in memory. They add copies in memory and add processor cycles to boot. You are going to be creating a compressed texture eventually to get the effect you stated in your step that you want user selectable quality levels of textures. The only thing you save by using jpg is filesystem transferrate time for the difference in filesize between native compressed and jpg and you give a better "high quality" mode for uncompressed texture use for those who want the best possible picture.
No; let me use a hypothetical example. Say we have a 1k by 1k texture. Uncompressed, that's 3 megs without an alpha channel. Compressed with png you might get down to 1 meg: lossless, but still very big and slow. Compressed with dds you might also end up with about 500 k to 1 meg, depending on how lossy you make the compression: fast for the gpu and simple to code, but of lower quality, still slow to load into memory AND to read from memory, and with the losses burnt into the file rather than user-settable. The same image compressed with jpg at 90% quality might be only 50k in size, with excellent quality (indistinguishable from the original to most eyes), and much faster to load from disk AND to read from memory. Its only drawback is that it has to be decompressed, as the gpu can't decompress jpg. So if we want to send the image to the videocard in a gpu-compatible compression format, we have to re-compress after decompressing from jpg. This is what I propose.

Now, what you say is true: my suggestion requires memory for intermediate storage of the image in uncompressed format. However, this scratchpad memory doesn't necessarily incur the memory-access-time gotchas I was talking about. Why? Because, as I was suggesting, the routine that loads jpg and re-compresses to dds could declare a 2k by 2k pixel memory plane as a static variable and re-use it. Since this routine is called multiple times in short bursts, this area of memory is likely to stay cached in L2 after the first call. Or we could add SSE prefetch instructions to the loops to make sure it is. That would be the best solution: while your decompression routine is writing to one row of pixels, the next row is being raised to L2 cache (if it isn't already there) in the background, in parallel. With nicely placed prefetches, and if we add asynchronous preloads from disk as well, the whole decompress-and-recompress latency could be brought down to microseconds. The slowest part would then be the reading of jpg data from memory, but, being jpg, it's about 1/20th as much data as reading a png or dds.

So, to summarize: with your solution you'd have large dds textures being called from disk and put into memory, incurring long memory access times. With my solution you'd have much smaller files being allocated into memory, with much reduced memory latency costs; and the routine that decompresses from jpg and recompresses to dds would use a memory scratchpad for storing the un-compressed texture temporarily; but, being static, this scratchpad would incur no per-call allocation latencies, and it would enjoy a higher cache hit ratio due to the burstiness of the calls, when multiple textures are loaded consecutively, even without using SSE prefetch instructions.
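To make the prefetch idea a bit more concrete, here's a rough sketch -- not engine code; decode_four_rows() and compress_four_rows_to_dxt() are stand-ins for the real jpeg and DXT routines -- of the kind of loop I mean, with a small static scratchpad and prefetches issued ahead of the work (the same trick can be applied to the destination rows):

Code: Select all

#include <xmmintrin.h>   // _mm_prefetch / _MM_HINT_T0
#include <cstddef>
#include <vector>

// Hypothetical helpers, stand-ins for libjpeg plus a DXT block compressor.
// DXT works on 4x4 texel blocks, so we hand the compressor four rows at a time.
void decode_four_rows( const unsigned char* jpg_stream, int first_row,
                       unsigned char* rgba_out, int width );
void compress_four_rows_to_dxt( const unsigned char* rgba_in, int first_row,
                                int width, unsigned char* dxt_out );

void jpg_to_dxt( const unsigned char* jpg_stream, size_t jpg_size,
                 int width, int height, unsigned char* dxt_out )
{
    // Static scratchpad: four uncompressed rows, reused across calls, so it
    // tends to stay warm in cache during bursts of texture loads.
    static std::vector<unsigned char> rows_rgba;
    rows_rgba.resize( width * 4 * 4 );                   // 4 rows of RGBA

    const char* src = reinterpret_cast<const char*>( jpg_stream );
    for( int row = 0; row < height; row += 4 )
    {
        // Crude prefetch: start pulling the next stretch of compressed input
        // into cache while the cpu chews on the current rows.
        size_t ahead = ( size_t( row ) * jpg_size ) / size_t( height ) + 256;
        if( ahead < jpg_size )
            _mm_prefetch( src + ahead, _MM_HINT_T0 );

        decode_four_rows( jpg_stream, row, &rows_rgba[0], width );
        compress_four_rows_to_dxt( &rows_rgba[0], row, width, dxt_out );
    }
}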
The only drawback my suggestion has is that my top level quality wont be as high as yours and the hdd will have to transfer more per file, but my suggestion completely kills yours in every other aspect.
Not so: The size of the download would still be as huge as using png's; reading dds from disk to memory and from memory to send to the videocard would still be 10 or 20 times longer than using jpg.
How much does filesystem transfers matter once they've already been started and how often do they actually occur in game ? That is a good question, since linux and the game will cache as much as possible to ram.
Disk data flow, once it has started, is faster than memory speed, I believe. Seek time is another story: on the order of milliseconds. So that would favor your argument. However, with asynchronous preloading of files, disk latency would become a non-issue either way, and the main bottleneck then is memory latency. Having files sit in memory in jpg format reduces memory access latencies by the same ratio as the difference between the compression ratios: 10 to 20 times. So jpg wins hands down.
To what effect does the filesystem file size become the overpowering factor ? I'm sure there is a magic filesize where the difference in filesize wouldn't matter below a certain number, but above does.
Well, with asynchronous disk preloading, there'd be not much difference at all in access time, regardless of file sizes. But file size is still an issue for memory access time AND for the size of the download. If we replaced many of those png's and bmp's with jpg, I'm pretty sure we could reduce the size of the Vega Strike download by 50% or more.
And just so we don't forget, jpg quality is MUCH higher than what dds compression achieves. So you get smaller AND much better image files.

And the extra work of decompressing and re-compressing, I re-submit, would be tiny by comparison to memory access latency; and far more than compensated for, by reduction of the latter.

But if you still dislike my proposal of decompressing and re-compressing on the fly; I might suggest an alternative: Have the game engine decompress and re-compress all images at once the first time it runs; or make that part of the installer. It would easily double or triple the game installation's disk footprint, and it would kill the memory access time advantages, so in my opinion it wouldn't be as fast as doing the decompression and recompression on the fly, but at least it would still enjoy the benefit of reduced download size. But I really don't like this solution, as it's an exercise in futility, IMO: Doing the decompression on the fly would be faster, due to reduced memory latencies; so there'd be no justification whatsoever, no benefit whatsoever, to be obtained for the price of increasing the installation size. In fact, you'd increase the game's disk space only to slow it down.

Since this thread is about the "stuttering problem", my original proposal was a way I believe would address it. Download size and installation footprint are secondary and off topic. My whole point was reducing latencies, and I believe that decompressing from jpg and recompressing to dds on the fly would be the way to go to reduce the stutter problem, if indeed it is due to texture loading and/or memory thrashing in the videocard.

EDIT:

The slowness of memory is nothing to sneeze at: We're talking hundreds of times slower than the cpu. In the old days, the optimizer's paradigm was "What values could we keep in memory so that they don't have to be re-computed?" The new paradigm is more like "What could we possibly re-compute from data already in cache, so we don't have to access it from off-cache memory?"

Anything that achieves any kind of in-memory compression is a winner, pretty much regardless of how much computation it takes to de-compress.

By that same token, the way to optimize a program for speed is to optimize the inner loops for speed, and the other 95% of the code for size, as the benefit of increased cache coherence and the resulting reduction of amortized memory latencies outweighs all other concerns.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

How much L2 cache do you think you have? Because I find it very hard to understand how you're keeping a 3MB section of memory in L2 cache when most processors don't have 3MB of L2 cache.

jpg also doesn't have alpha, so you'll still be left having to use another format for those textures requiring an alpha channel.

For jpg or png (if using compression for vram's sake), you'll need 3 copies of each texture and all the cpu required to get between the three, plus hdd I/O. Add to that, every time a texture has to be bumped out of ram because we've hit our texture memory limit, we have to do it all over again. So you have to add up the time that whole sequence of events takes over the course of a game. I think you are underestimating how long it takes.

Now you're banking on all of that still being faster than loading a file that is likely less than half a MB anyway. I say it's not. At those sizes, the difference in reading the two files into ram would be nothing, especially compared to what you're proposing.

I really don't know why you're stuck on the memory read speed issue. You have to pay it no matter which method you choose, because in the end we're going to have to compress the textures so that more of them stay in vram (a situation which is much, much faster than having to read from system ram).

All we're arguing about here is the on-disk storage format. You're saying jpg yields smaller files, which means less time to read from disk, versus a dds file, which will be somewhat larger and take more time to load than the jpg plus all the processing required to get back to dds in ram.

I really don't think hdd I/O is our limiting factor in the stuttering problem. We simply don't read from the hdd often enough to cause this problem every time a unit is loaded. What happens much more frequently is traffic between vram and system ram, and that's due to not being able to hold all the textures in vram. Compressing the texture data will make that happen much less frequently. So we should definitely do that, no matter what format the texture data is in on disk.

Really, there is no reason to limit ourselves to one texture format; we already have jpg and png textures. Adding support for dds textures on disk, plus keeping all texture images in ram compressed, shouldn't mean we can't mix and match as needed.


In the end, I really don't think this is causing the stuttering problem. At least not in the sense of it being a ram-speed-limited or filesystem-I/O-limited issue.

The guessing game is getting old though. It's time to get some good latency profiling going on.

Edit: plus, if you look at the units, most of them are jpg textures already. Perhaps keeping them uncompressed in system ram is our real problem. Fixing that to use compressed textures should not preclude also offering the option of using dds files directly on disk, especially for low-quality textures.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Post by chuck_starchaser »

safemode wrote:How much L2 cache do you think you have? Because i find it very hard to understand how you're keeping a 3MB section of memory in L2 cache when most processors dont have 3MB of L2 cache.
Yep. You're right; I wasn't thinking. This is an argument for the use of prefetch instructions, tho.
jpg also doesn't have alpha, so you'll still be left having to use another format for those textures requiring an alpha channel.
Yes; I said that from the beginning, remember? "If no alpha and is not jpg, throw()".
for jpg, or png (if using compression for vram's sake), you'll need 3 copies of each texture
Why 3?
and all the cpu required to get between the three + hdd io.
But that's my whole point: "all the cpu" adds to exactly zero cost, because it is hidden by the memory latency; it executes in parallel with memory fetching to cache.
Add to that, every time a texture has to be bumped out of ram due to reaching our limit of texture memory we have to do it all over again.
Do what again? Something that's quicker than the alternative?
So you have to add up the time it takes you to do all that
Again, "all that" is faster than loading a larger texture.
over the course of a simulated game. I think you are underestimating the amount of time that sequence of events takes.
I think you're underestimating the time it takes to read 20 times as much memory.
Now You're banking that doing all that is still faster than loading a file that is likely less than half a MB anyway.
A file that is less than 50 k is 10 times faster to read from memory than a file that is less than 500 k; yes :)
I say it's not.
I guess we'll have to agree to disagree, then.
At those sizes, the difference in reading the two files to ram would be nothing.
Reading the files to ram is not the issue; from ram is.
I really dont know why you're stuck on the memory read speed issue.
I'm stuck on it because I've read enough about it to be convinced. Read any modern document about *practical* code optimization and the biggest issue of all is cache coherence. In fact, just about ALL other code optimizations explicitly assume all data and instructions are in L1 cache. But in a typical, poorly optimized app, the cpu spends most of its time twiddling its thumbs waiting for memory to be fetched into cache. That's why Intel came up with the idea of hyperthreading. They thought that with two front ends, the cpu would be better utilized, because when one thread is waiting for something from memory, another thread could execute. It was short-sighted only in the sense that the threads were competing for the same cache resources, so the benefits were lower than the theoretical models predicted, which assumed the threads were small pieces of code and small data, with high amounts of computation per data unit.
You have to do that no matter what method you choose because in the end, we're going to have to compress the textures so that more will stay on the vram, a situation which is much much faster than having to read from system ram).
Agreed.
All we're arguing about here is hdd storage format. You're saying jpg yields smaller files which means less time to read from disk vs using a dds file which will be somewhat larger and take more time to load than jpg
The storage format is also the format in which the file arrives from disk into the file cache in ddr ram. If it's 20 times smaller, it will take 20 times less memory, which means 20 times fewer cache lines needing to be raised, which means 20 times fewer cache misses, which results in roughly an 18x or 19x speedup in reading that file from memory. The time it takes a 3 GHz cpu to decompress from jpg and re-compress to dds, if you add up all the instruction cycles required, is, I'd bet any amount of money, less than 10% of the time it takes to read the data from memory in the first place. So the reduction of memory reading time by a factor of 20 far outweighs the processing involved. On the other hand, as you said, the decompressed data doesn't fit in cache. So the only way to get full-speed processing is to use prefetch. But if we use prefetch, we could use it on the dds file directly. So maybe you're right, and the best solution would be storage in dds, with prefetch used during the transfer. Unfortunately, prefetch is wasted if there's nothing for the cpu to do with the data already fetched... Prefetching is a way to parallelize data fetching and processing, but if there's no processing of the data, there's nothing to gain from it. So I come back to my original position: if for no other reason, reading the files in jpg format would be better because the decompression work comes for free, hidden by the memory latency.
I really dont think the hdd IO is our limiting factor with the stuttering problem.
I never thought it should be. I think it is, but only because of the use of immediate access calls, as opposed to asynchronous requests with callbacks when the file is ready. Immediate access functions don't even return until the whole process of file fetching is complete, which takes milliseconds.
We simply dont read from hdd that often to cause this problem everytime a unit is loaded.
It's true that the reads aren't frequent; but you get bursts of them. One ship usually requires a mesh and four textures.
What happens much more frequently occurs between the vram and system ram, and that's due to not being able to hold all the textures in vram. Compressing the texture data will allow this to happen much less frequently. So we should definitely do that, no matter what format the texture data is in on disk.
Here we agree 100%.
Really there is no reason to limit to one texture format, we already have jpg and png textures. Adding support for dds textures on disk + making all texture image in ram compressed shouldn't mean we still can't mix and match as needed.
Very true. Removing support for BMP might be beneficial, though :)
In the end, i really dont think this is causing the stuttering problem. At least not in the sense that it is a ram speed limiting issue or filesystem io limiting issue.
Well, I have to agree on this; I do believe the main problem is video ram memory thrashing.
The guessing game is getting old though. It's time to get some good latency profiling going on.
True
Edit: plus, if you look at the units, most of them are jpg textures already. Perhaps the uncompressed aspect in system ram is our real problem. Fixing that to use compressed textures should not negate also giving the option of using dds files directly on disk, especially for low quality textures.
Yep.
BTW, using dds textures directly from storage has been done already; it's not a new invention. And there are reasons why the whole world isn't doing it, which we've already discussed: quality, size, etc.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

chuck_starchaser wrote:
safemode wrote:

Since this thread is about the "stuttering problem", my original proposal was a way I believe would address it. Download size and installation footprint are secondary and off topic. My whole point was reducing latencies, and I believe that decompressing from jpg and recompressing to dds on the fly would be the way to go to reduce the stutter problem, if indeed it is due to texture loading and/or memory thrashing in the videocard.
What you're saying is that it's not about the disk footprint, when in fact that is exactly what you're arguing: that it's the on-disk file size that is responsible for the long load times into ram. Smaller files == faster loading (which is what you're arguing). So installation footprint is on-topic as far as you're concerned.

Your whole point specifically relies on the idea that the filesystem read is our limiting factor. Because your method requires reading a jpg into ram, reading the uncompressed texture back out of ram (regardless of where the scratchpad happens to reside, it has to be read by the compressor function, or do you plan on putting everything into L2?), and writing the generated compressed texture back to ram. So obviously, everything after reading from the disk is slower than just using dds files for textures.

That being said, like I mentioned before, we should allow various on-disk formats, including dds; there is no reason to exclude any. What this thread has so far produced is the need to use compressed textures in-game, because dealing only with uncompressed textures obviously isn't very efficient.

EDIT:

The slowness of memory is nothing to sneeze at: We're talking hundreds of times slower than the cpu. In the old days, the optimizer's paradigm was "What values could we keep in memory so that they don't have to be re-computed?" The new paradigm is more like "What could we possibly re-compute from data already in cache, so we don't have to access it from off-cache memory?"

Anything that achieves any kind of in-memory compression is a winner, pretty much regardless of how much computation it takes to de-compress.

By that same token, the way to optimize a program for speed, is to optimize the inner loops for speed, and the other 95% of the code for size; as the benefit of increased cache coherence and resulting reduction of amortized memory latencies overweights all other concerns.
Indeed, but you're limited by what actually fits in cache for anything to benefit from always residing there. Textures can't. Even compressed textures weighing in at only a few hundred KB won't be held in L2 cache. Plus you have all kinds of other things running that are going to pop into the L2 cache (notably the kernel), so you can't rely on executing out of L2 all the time, or even some of the time.


The paradigm you quoted is good, assuming your program is the only one running, is guaranteed to be in cache most or all of the time, and actually fits inside the cache.
Vega Strike isn't the only program running, it isn't small enough to fit inside L2 itself, and certainly the textures, compressed or uncompressed, aren't small enough to stay in L2 alongside everything else that wants to stay in L2.

Say we have a 300kB png:

jpg/png path:
read from disk (read 300kB from disk, write 300kB to ram) -> uncompress in ram (read 300kB compressed data, write 2.2MB uncompressed data) -> recompress in ram (read 2.2MB uncompressed data, write new 268kB compressed data) -> write to vram (read 268kB compressed data in ram, write 268kB compressed data to vram).

jpg format would only reduce the filesystem load from a 300kB read to a 54kB read. That's the only real difference between using png or jpg for texture storage, unless their decoders require vastly different amounts of cpu time.

dds path:
read from disk (read 268kB from disk, write 268kB to ram) -> write to vram (read 268kB compressed data in ram, write 268kB compressed data to vram).

Your method pushes about 4.7MB _more_ data back and forth through ram for every average-sized texture. You may still end up being faster, though, if the disk I/O is really that bad at transferring that extra 200kB to ram. But if you had your way with prefetching data from the hdd, then you've just killed your only real benefit, because then hdd latency isn't an issue.


Really, what I would like to see is something like what Crystal Space does for textures. They support various on-disk formats, including dds. They only use the high-quality png and jpg files for highly detailed objects in the game, in high-quality mode. Everything else uses dds files, either generated at runtime or already existing on disk. I think we can get away with making 90% of the textures we use dds files. The cockpit should be high quality (only one will be loaded into vram at a time anyway), and, when the user asks for it, bases and planets should be high quality.
Crystal Space has come to the conclusion that the on-disk size difference between jpg and dds doesn't outweigh the added processing and memory requirements of converting to dds in ram: http://www.crystalspace3d.org/main/Effi ... s_Tutorial
Read the heading "Texture file format".

So finally this is what we should do (in my mind anyway):

Basically, anything that doesn't look too bad as a dds should be a dds file on disk; everything else should be jpg on disk. PNG should be kept only as a source, in a developers-only branch, for generating the dds and jpg files. Compressed vs. uncompressed textures are determined via configuration settings: high quality uses the uncompressed versions of some textures (those that would be obvious to the user), and all other modes compress increasingly visible textures to dds. On top of that, we add a prefetching file-I/O function or thread that initiates the read before the data is actually needed, while the code is executing other commands.

That last part is the most annoying part. How do we create a function that reads data from disk before we know we need it, if, at the moment we read in a unit, we need the data right then? That is to say, if Python requests to load unit XYZ, how do we make a prefetch that gets that data before the next command, which will undoubtedly be one that operates on the data it just asked to load?
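For the "thread" variant, something like the following is what I imagine -- purely illustrative, names made up, plain pthreads since that's what we have. The catch remains that request_prefetch() only helps if we can call it at least a frame before the loader actually opens the file, e.g. the moment python decides to spawn the unit rather than when the mesh loader asks for the texture:

Code: Select all

#include <pthread.h>
#include <cstdio>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Game thread pushes filenames it expects to need soon; the worker thread does
// the blocking disk reads and parks the bytes in an in-memory cache.
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
static std::queue<std::string> requests;
static std::map<std::string, std::vector<char> > cache;

void request_prefetch( const std::string& name )
{
    pthread_mutex_lock( &mtx );
    requests.push( name );
    pthread_cond_signal( &cv );
    pthread_mutex_unlock( &mtx );
}

static void* prefetch_worker( void* )
{
    for(;;)
    {
        pthread_mutex_lock( &mtx );
        while( requests.empty() )
            pthread_cond_wait( &cv, &mtx );
        std::string name = requests.front();
        requests.pop();
        pthread_mutex_unlock( &mtx );

        std::vector<char> data;                      // blocking read, off the main thread
        if( FILE* f = std::fopen( name.c_str(), "rb" ) )
        {
            std::fseek( f, 0, SEEK_END );
            long len = std::ftell( f );
            std::fseek( f, 0, SEEK_SET );
            if( len > 0 )
            {
                data.resize( len );
                std::fread( &data[0], 1, data.size(), f );
            }
            std::fclose( f );
        }
        pthread_mutex_lock( &mtx );
        cache[name].swap( data );                    // loader checks this cache before hitting disk
        pthread_mutex_unlock( &mtx );
    }
    return 0;
}

void start_prefetch_thread()
{
    pthread_t tid;
    pthread_create( &tid, 0, prefetch_worker, 0 );
}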
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

Your argument in the post previous to mine, about being 20 times faster because it's 20 times smaller, is only true the very first time the file is read to be compressed to dds. After that, you add 4.7MB of ram reads/writes, for a 50k jpg, to get to the dds end product.

The only benefit jpg has over dds is the initial filesystem read; everything else involved with jpg is slower.

As for all the processor usage being hidden by the ram latency: guess what your uncompressed texture has to be written to? Ram. Guess what it needs to be read from to be compressed to dds? Ram. Then guess what the dds finally has to be written to as cached storage? Ram. So it's most definitely not hidden by the initial ram latency of writing the on-disk file to ram, since you have to hit ram half a dozen times while processing the jpg/png. You'll be lucky if it's mostly hidden by the hdd I/O wait of reading the extra 200kB.


And my comment about banking on all of that still being faster than loading a file 10 times the size refers to the fact that it doesn't take 10 times as long to load a file that is 10 times the size when we're dealing with small files like these. A good deal of that time is spent in hdd seek time and in the filesystem overhead of starting the retrieval. That overhead obviously matters less the larger the file.


Reading from disk into ram _IS_ the issue you've been talking about. We both agree that compressed textures are the way to go, so regardless of the on-disk format, everything is going to be the same in ram. So reading from ram is no different whether the texture is a jpg on disk or a dds on disk, because it's all dds in ram. If you think ram speed is slow, you should believe hdd speed is monumentally slow; that alone should be why you are advocating jpg as a texture format with runtime conversion to dds in the ram texture cache.

The real argument here is whether the added memory and cpu overhead of converting a jpg to dds every time it has to be read from disk outweighs the benefit of reading less data from disk for each texture (roughly 5.5 times less data: 270K -> 52K). You are saying it does; I'm saying it most certainly won't once the textures are prefetched. There is no other argument being made here. Once the jpg is read from disk, you convert it to dds and it has the same performance from then on as just reading in the dds file. It's the disk-to-dds execution path that we're arguing over.

But I changed my tune when I suggested we allow all these formats: convert anything that lends itself to dds into dds on disk, keep jpg for all the high-quality-capable textures, and keep the png sources in a different branch that developers use to generate the jpg and dds files. Now we're just arguing over whether the hdd is slow enough to hide the added cpu/ram cost of converting jpg to dds. If the hdd is prefetching, it doesn't matter; if it isn't prefetching, that's the argument. You say the bigger read is slower than the conversion; Crystal Space says it's not. I side with Crystal Space for small files.

And with that said, perhaps uncompressed tarballs of the textures and meshes for each unit should be made, so that only one call to the filesystem is needed to load up a unit, rather than half a dozen. The same amount of data is loaded, just with fewer calls, so it should be much faster overall.
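Since a plain (uncompressed) tar is just 512-byte headers followed by the file data, we could even pull members straight out of the archive sitting in ram, without untarring anything. A rough sketch, assuming the whole tarball has already been read into memory:

Code: Select all

#include <cstddef>
#include <cstdlib>
#include <cstring>

// Locate one member inside an uncompressed ustar archive already sitting in ram.
// Returns a pointer to the member's data and fills out_size, or NULL if absent.
const char* find_in_tar( const char* tar, size_t tar_size,
                         const char* member, size_t* out_size )
{
    size_t off = 0;
    while( off + 512 <= tar_size && tar[off] != '\0' )      // zeroed name block = end of archive
    {
        const char* header = tar + off;
        size_t size = std::strtoul( header + 124, 0, 8 );   // size field: octal ASCII at offset 124
        if( std::strncmp( header, member, 100 ) == 0 )      // name field: first 100 bytes
        {
            *out_size = size;
            return header + 512;                            // member data follows its header
        }
        off += 512 + ( ( size + 511 ) / 512 ) * 512;        // data is padded out to 512-byte blocks
    }
    return 0;
}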
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Post by chuck_starchaser »

safemode wrote:Your whole point specifically relies on the idea that it's the filesystem read that is our limiting factor.
No; I said that reading the data from ram was the issue, after it has been loaded from disk, assuming we'd use asynchronous preloading.
But I'm not going to argue the point any more; it's obvious I won't be able to win you over to my point of view. And it's my fault, for not having thought it through more carefully before posting. Your accounting of memory use, and the fact that the data doesn't fit in cache, is correct. If I could start over, I'd still say the best way to go would be to load jpg and decompress and compress to dds on the fly; but I would change one crucial detail: instead of a 4-megabyte chunk of memory for temporary storage of the un-compressed image, I'd use a much smaller scratchpad and de-compress and re-compress a small chunk at a time. But I know I've lost the initial argument and I'm changing positions. In any case, I agree that, first things first, in-memory compression is the most important issue, so I'll drop the other one.
crystal space has come to the conclusion that the on disk size difference from jpg to dds doesn't outweight the added processing and memory requirements of converting to dds in ram. http://www.crystalspace3d.org/main/Effi ... s_Tutorial
read the heading texture file format.
True, that's what they say. Do I agree with them? No. But like I said, I'm dropping the issue; texture compression is more urgent anyways.
So finally this is what we should do (in my mind anyway):

basically, anything that doesn't look that bad as a dds, should be a dds file on disk. Everything else should be jpg on disk. PNG should be kept as a source for creating the dds and jpg files by developers only. compressed vs uncompressed textures are determined via configuration settings, high quality uses the uncompressed versions of some textures (those that would be obvious to the user) all other modes compress increasingly visible textures to dds. Added onto that is a prefetching file IO function or thread that will initiate the reading process prior to actually needing the data while the code is executing other commands.
Excellent plan.
That last part is the most annoying part. How do we create a function that reads data from the disk before we know we need it
Exactly; we cannot order a file from disk before we know we need it. But the problem can be simplified by turning it around: once we know we need a texture, how long could we wait for it? We ARE waiting for it anyway: the time it takes the disk drive to a) seek to the track, b) wait for the disk to rotate to the start of the file, and c) transfer the data... all of which takes 10 to 15 milliseconds. The question is not really whether or not to wait for it, but whether we're going to stare at it while we wait or do something useful with that precious time. So if we accept that we cannot have it all right now, we could simply order the file now, but wait until the next frame to pick it up.
if when we read in a unit we want to load, we need the data then? That is to say, if python requests to load unit XYZ, how do we make a prefetch that gets that data before the next command which would undoubtedly be one that operates on that data that it asked to load?
I know zilch about Python. Is there no Wait() or Sleep() function we can call after ordering a file, so that execution of that Python code stalls until the next frame revives it?
Otherwise, rather than use a Python native function, we could wrap a c++ function into Python that issues an asynchronous file fetch, and then blocks. This would in fact block the Python thread. I imagine Python is running on a separate thread?
If not, there must be a way to limit how much Python code we run per frame? Whatever method we use, we need to apply it after any file-fetching call.
Or write a C++ function like...

Code: Select all

#include <cstdio>
#include <queue>
#include <string>
#include <utility>

void asynchronous_fetch( const std::string& filename ); // placeholder for the real async I/O call

// A pending job: the file we ordered and the callback to hand it to.
typedef void (*callback_fn_t)( const std::string& filename );
typedef std::pair< std::string, callback_fn_t > file_fetch_job_t;
typedef std::queue< file_fetch_job_t > file_fetch_queue_t;
static file_fetch_queue_t the_queue;

FILE* LoadFile( const std::string& filename, callback_fn_t callback_fn = NULL )
{
  if( callback_fn )
  {
    asynchronous_fetch( filename );                              // order it now...
    the_queue.push( file_fetch_job_t( filename, callback_fn ) ); // ...deliver it next frame
    return NULL;
  }
  else
  {
    return std::fopen( filename.c_str(), "r+" );                 // old blocking path
  }
}
And a function we call once per frame, at the start of each frame:

Code: Select all

void deliver_a_file()
{
    if( !the_queue.empty() )
    {
        file_fetch_job_t ffj = the_queue.front(); // std::queue::pop() returns void,
        the_queue.pop();                          // so grab the front element first
        ffj.second( ffj.first );                  // hand the filename to its callback
    }
}
Sorry, that's rather poetic pseudocode, but hopefully it conveys my idea.
Last edited by chuck_starchaser on Mon Aug 06, 2007 1:22 am, edited 1 time in total.
loki1950
The Shepherd
Posts: 5841
Joined: Fri May 13, 2005 8:37 pm
Location: Ottawa
Contact:

Post by loki1950 »

Just a quick comment: OGRE makes extensive use of libzip, as all of its resource files are zipped into one bundle. How will that affect this debate :?:

Enjoy the Choice :)
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

Sorry for making such rapid posts; I don't usually have the opportunity to spend time at the computer over the weekend.



To summarize, for post-4.4 Vega Strike needs:

1. Filesystem texture prefetching: some method of reading textures from disk into ram before the data is needed.

2. A procedure to compress any non-dds textures read in from the filesystem to dds, depending on the user's configuration preferences.

3. Remove all png textures from the data4.x branch and put them in a new branch created to keep the "originals," so that more compressed, lossy versions can be shipped to users.

4. Make any small or rarely visible textures in the game dds on disk. Anything we don't care to make user-definable quality-wise, and that looks fine as dds, should be dds.

5. jpg files will be used for any texture whose quality the user can adjust to tune memory usage. This will likely be the very large textures and the cockpit textures. Anything not converted to dds by the procedure that consults the user's quality options will stay in ram as uncompressed images. When the user-defined texture space fills up, textures not in use will have to be dumped, and a required texture will have to be re-decompressed from the on-disk jpg file.



So, if half of the textures are left as jpg and the other half are already on disk as dds from the initial installation, then the user has the option of greatly affecting the quality, the amount of space used in ram, the cpu required to load textures from disk, and the cpu required to swap those large textures between vram and system ram.

Each time the quality option is changed, the next time you load VS it should process any jpg files that now fall under the "lower quality allowed" option and create on-disk dds files from those jpg's. The game then uses these dds files whenever it has to read from disk, so long as the quality option doesn't change. The lower the quality setting, the more of those jpg files get converted to dds.
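The per-texture decision itself could be as dumb as a threshold against the user's quality setting; something like this (every name here is invented, just to illustrate the flow):

Code: Select all

#include <string>

// Everything below is invented for illustration; none of it exists in the engine.
bool file_exists( const std::string& path );                                      // hypothetical
bool convert_jpg_to_dds_file( const std::string& jpg, const std::string& dds );   // hypothetical

struct TextureEntry
{
    std::string jpg_path;        // shipped high-quality source
    std::string dds_cache_path;  // generated on first run at the current quality level
    int         keep_hq_above;   // quality settings above this keep the jpg path
};

// Decide which on-disk file the loader should open for this texture.
const std::string& pick_on_disk_file( TextureEntry& t, int user_quality )
{
    if( user_quality > t.keep_hq_above )
        return t.jpg_path;                        // stays high quality, decompressed at load time
    if( !file_exists( t.dds_cache_path ) )
        convert_jpg_to_dds_file( t.jpg_path, t.dds_cache_path );   // one-time conversion
    return t.dds_cache_path;                      // read straight into the compressed cache
}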
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Post by chuck_starchaser »

Sorry, I spent a long time editing my last post; I guess I forgot I was editing it and thought I was writing it for the first time. Anyway, it looks like there's no easy way of postponing the use of disk files. The problem is that the code that uses the file has to change, no matter how you look at it. The only way to use the code as-is would be to do actual predictive prefetches. Maybe that wouldn't be too difficult? Not sure. The problem is that if we need to read files in order to find out what files we need, those first files would have to be prefetched even earlier...
Damn programming; things are so easy to think; so hard to implement.
safemode wrote:And with that being said, perhaps uncompressed tarballs of the textures and meshes for a unit should be made, so that only one call to the filesystem needs to be made to load up a unit, rather than half a dozen. The same amount of data is loaded, just less calls, so it should be much faster overall
Great idea!

@Loki: That sounds good. Actually, tarball... that doesn't imply compression, does it? Not sure... Generally, I'd trust whatever the Ogre guys do over whatever the Crystal Space guys say. The Ogre people are pretty aware of performance issues; heck, they have tons of SSE optimizations in their later releases. Anyway, I'd say grouping files into compressed tarballs would be best. Well, compressed images won't compress any further, but bfxm's and other files would. Now, putting ALL the files into one huge compressed file, I'm not sure; it depends on whether the compression used is localized. If you need to access data elsewhere in the file to extract a given file, it could be detrimental. But if decompression is fairly localized, then one huge compressed file might be good, provided the OS can swap out the parts it hasn't been using to the swap file/partition and still pull stuff out of the chunks that remain in memory.
Last edited by chuck_starchaser on Mon Aug 06, 2007 2:27 am, edited 1 time in total.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

If we can't create an automagic prefetching function to load units before we need them, then we'll just have to live with caching all the unit textures at initial runtime, up to the limit of the user-defined memory allowance. With dds used for most of the textures, that could be very viable even in high-quality mode. As it is now, VS only uses 3/4 of the 1GB of ram I allow it. And by moving to jpg for everything that requires high quality, we should reduce disk access by a lot.

For that which has to be re-read from disk during gameplay, we'll have to live with it.


But for the reasons I just mentioned, I really doubt disk access or even memory access is the reason the game lags on new unit creation. I think it has to be either a runaway function in C++, extremely inefficient Python code, or a scheduling problem in the simulation.

I still lean towards scheduling. Fixing the texture issue is still very worthwhile, but I think the pausing may be caused by a unit usurping scheduling priority out of the clean random order prior to its insertion into the system.

I'd really like to see the scheduling routine handle time directly:
execute whatever it has in the queue until its timer is >= 1/60th of a second, then move along and reschedule the remaining queued work for another frame. Repeat and continue.


By not letting anything break the 60fps barrier by much, we should avoid the massive pauses we're currently experiencing.
The trick would be to call the time function only every N calls during the frame, so that we're not calling it on every single function call.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

chuck_starchaser wrote:S

That sounds good. Actually, Tarball... That doesn't imply compression, does it? Not sure... Generally, I'd trust whatever the Ogre guys do than whatever Chrystal Space guys say. The Ogre people are pretty aware of performance issues. Heck, they have tons of SSE optimizations in their later releases. But anyways, I'd say grouping files into compressed tarballs would be best. Well, compressed images won't compress any further, but bfxm's and other files would. Now, having ALL files into one huge compressed file I'm not sure; depends whether the compression used is localized. If you need to access data elsewhere in the file to get a file from it it could be detrimental. But if decompression is fairly localized, then, one huge compressed file might be good if the OS can swap parts of it it's not been using to the swap file/partition and still be able to pull stuff out of the chunks still in memory, that'd be good.
I'm not sure I would compress the tarballs; you probably wouldn't see much size saving for the cpu cost. Tarballs themselves don't have any compression; tar just calls gzip or bzip2 to compress the archive. Accessing files inside the tar without extracting them should be possible in ram, so we won't have to duplicate any memory.

My concern with using a single file for an entire unit is that if we reach our limit in ram and have to dump something not in use to free up space, we'd have to dump the entire unit, even if it was only to free up a hundred KB. Then if we load that unit again, we have to reload the entire unit file... just to get what could have been a single hundred-KB file. Not sure how often that would occur, though.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

Hash tables are a huge CPU eater in-game. They also happen to be used to hold every texture in the game. Coincidence that we see texture loading and hash tables using lots of CPU around the time the pauses occur? Perhaps not.


aux_texture.cpp/.h are in need of inspection. This is also the area where we can figure out how to slip compression in.

Edit:

It appears that the current quality control works by manually removing color information from the textures. This is great, because it means that when we implement the new quality control (increasing/decreasing the number of textures that get compressed) we should see drastic improvements in speed at all quality levels, as well as better quality at all of them.
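One way to spell that kind of quality control would be to pick the GL internal format per quality level at upload time; if we hand glTexImage2D a compressed internal format, the driver does the DXT compression for us. Sketch only; the quality thresholds and the decide_internal_format name are invented, and it assumes GL_EXT_texture_compression_s3tc is present:

Code: Select all

#include <GL/gl.h>
#include <GL/glext.h>

GLenum decide_internal_format( int quality /* 0 = lowest .. 2 = highest */, bool has_alpha )
{
    if( quality < 2 )                       // low/medium: let the driver compress on upload
        return has_alpha ? GL_COMPRESSED_RGBA_S3TC_DXT5_EXT
                         : GL_COMPRESSED_RGB_S3TC_DXT1_EXT;
    return has_alpha ? GL_RGBA8 : GL_RGB8;  // highest quality: uncompressed
}

// used at upload time, e.g.:
//   glTexImage2D( GL_TEXTURE_2D, 0, decide_internal_format( q, alpha ),
//                 w, h, 0, alpha ? GL_RGBA : GL_RGB, GL_UNSIGNED_BYTE, pixels );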

The BuildBBox function is called the first time a unit gets loaded (this happens about 141 times total). I've yet to run a simulated mission that loads all the units without also loading an absolutely enormous number of them. If I could get a mission to load all the units, but only one of each, and then slowly load more, I could see if it pauses again; and since the game wouldn't be hampered by trying to simulate tens of thousands of units, I'd know whether the pauses were still being caused by the same problem we see in normal missions. Right now, with my total_war mission, it's not clear if the pauses I still see with all the units loaded are caused by the same issue as at the beginning, or by having to simulate over 10,000 units.
Last edited by safemode on Mon Aug 06, 2007 3:04 am, edited 1 time in total.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Post by chuck_starchaser »

I'm thinking, one way of deferring use of files until the next frame could be achieved relatively easily by using the boost::function library
http://www.boost.org/doc/html/function.html
to turn sequential lists of commands into std::lists of conditionally deferred calls...

Or just change the code. Typically, if we know a function is going to be called once per frame, for the simplest case,
say you have a function

Code: Select all

void x()
{
  a();
  b();
  c();
  FILE *d = fopen( "foo.txt", "r+" );  // blocking read right in the middle of the frame
  e( d );
}
That could be changed into

Code: Select all

void x()
{
  static std::string deferred_file;
  if( !deferred_file.empty() ) e( deferred_file );  // consume last frame's file
  a();
  b();
  c();
  asynchronous_load( "foo.txt", "r+" );  // hypothetical non-blocking fetch
  deferred_file = "foo.txt";             // consumed on the next call, one frame later
}
Now, if the file could be called more than (or less than) once per frame, we'd have to queue file loads, and store frame numbers with them to make sure we don't pull a call before at least one or two frames have passed.
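Something like this, say, with boost::function and boost::bind; the e() here is the string-taking one from the second snippet, and current_frame is a placeholder for whatever frame counter we already keep:

Code: Select all

#include <queue>
#include <string>
#include <boost/function.hpp>
#include <boost/bind.hpp>

void e( const std::string &filename );   // the consumer from the snippet above
extern unsigned current_frame;           // placeholder for the engine's frame counter

struct DeferredCall
{
    unsigned ready_frame;                // earliest frame we may run it on
    boost::function<void()> call;
};

static std::queue<DeferredCall> deferred;

void defer_file_use( const std::string &filename, unsigned frames_ahead )
{
    DeferredCall d;
    d.ready_frame = current_frame + frames_ahead;
    d.call = boost::bind( &e, filename );
    deferred.push( d );
}

void run_due_calls()                     // called once per frame
{
    while( !deferred.empty() && deferred.front().ready_frame <= current_frame )
    {
        deferred.front().call();
        deferred.pop();
    }
}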
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

You still run into the issue of not knowing what file to load prior to actually needing it. Being able to defer a function call is nice; not knowing what arguments to give that function until you actually need the data is another issue.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Post by chuck_starchaser »

safemode wrote:My concern with using single files for an entire unit would be if we reach our limit in ram and have to dump something not in use to free up space, we'd have to dump the entire unit, even if it was only to free up a hundred KB. Then if we load that unit again, we have to reload the entire unit file ...just to get what could have been only a hundred KB file. Not sure if that occurs often though.
No, the OS does the dumping for you, and it dumps stuff to the swap file on a per-page, least-recently-used basis. It doesn't even know what a file is; it just swaps out pages. And there's nothing you need to --or can-- do from the code side; in fact, you don't even know when the memory limit is reached. To your app it looks like virtually endless memory, ask and ye shall get... When the memory AND the swap file fill up, you get a bad_alloc and it's game over. No warning lights.
safemode wrote:You still run into the issue of not knowing what file to load prior to actually needing it. Being able to defer a function call is nice, not knowing what arguments to give that function until you actually need the data is another issue.
Another way would be using co-routines. I had a CUJ magazine issue that showed a template way of implementing co-routines in C++.
Yeah, the problem is, okay, servers implement asynchronous io all the time; but for them it's easy, because they got like gazillions of threads running, so if a thread needs to wait for io it just blocks, and another thread gets control. But with a single threaded program we gain nothing from blocking.
This is harder than I thought.
Yeah, if we could predictively prefetch files that'd be a much easier way.
No boost::prophecy needed; just basically duplication of the code that WILL require files, but generating asynchronous fetches instead, and with all code not related to file loading removed, executing a frame ahead of time :D

EDIT: By the way, I can make vegastrike stutter at will; just by pressing "I". So, it shouldn't be too hard to believe that simple io could be the culprit.
Last edited by chuck_starchaser on Mon Aug 06, 2007 4:33 am, edited 1 time in total.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

chuck_starchaser wrote:
safemode wrote:My concern with using single files for an entire unit would be if we reach our limit in ram and have to dump something not in use to free up space, we'd have to dump the entire unit, even if it was only to free up a hundred KB. Then if we load that unit again, we have to reload the entire unit file ...just to get what could have been only a hundred KB file. Not sure if that occurs often though.
No, the OS does the dumping for you; and it dumps stuff to the swap file on a per least recently used memory page. It doesn't even know what a file is; it just swaps out pages. And there's nothing you need to --or can-- do from the code side, in fact; you don't even know when the memory limit is reached. To your app it looks like virtually endless memory, ask and ye shall get... When the memory AND the swap file fill up, you get a bad_alloc and it's game over. No warning lights.
You're forgetting that we allocate a user-specified amount of memory for texture loading. We maintain this memory internally, not the OS. We have to cull textures when it fills up; we disallow ourselves from using more than that specified amount of RAM for textures.
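(For reference, the cull logic amounts to something like this; the type names are invented for the sketch, it's not the actual texture manager:)

Code: Select all

#include <list>
#include <string>
#include <cstddef>

struct CachedTexture { std::string name; size_t bytes; /* GLuint handle, ... */ };

static std::list<CachedTexture> lru;              // front = most recently used
static size_t cached_bytes = 0;
static size_t texture_budget = 256 * 1024 * 1024; // the user-specified limit

void touch( std::list<CachedTexture>::iterator it )  // mark as recently used
{
    lru.splice( lru.begin(), lru, it );
}

void enforce_budget()                             // cull oldest-first until under budget
{
    while( cached_bytes > texture_budget && !lru.empty() )
    {
        cached_bytes -= lru.back().bytes;         // free the GL handle / system copy here
        lru.pop_back();
    }
}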

chuck_starchaser wrote:Another way would be using co-routines. I had a CUJ magazine issue that showed a template way of implementing co-routines in C++.
Yeah, the problem is, okay, servers implement asynchronous io all the time; but for them it's easy, because they got like gazillions of threads running, so if a thread needs to wait for io it just blocks, and another thread gets control. But with a single threaded program we gain nothing from blocking.
This is harder than I thought.
Yeah, if we could predictively prefetch files that'd be a much easier way.
No prophecy needed; just basically duplication of the code that WILL require files, but generating asynchronous fetches instead, and with all code not related to file loading removed, executing a frame ahead of time :D
The thing is, we create units dependent on the current universe. If we create units by buffering that function so that we can prefetch, then the units we actually insert into the system will be late: they'll be created based on the state the universe was in N frames ago. This could lead to a lot of weird behavior with missions.

The alternative to prefetching data is finding some way to thread the read call and, at the same time, safely do other work the frame needed to do anyway that doesn't depend on that texture data. If we can do that, then we cancel out (or greatly reduce the already-reduced) time it takes to read the file, and we're golden.

So the read blocks in its own thread while we do something else the frame had to do anyway, making the time it takes to run the frame the same as if it didn't have to read the file.
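Roughly this, using boost::thread; the helper names are invented placeholders:

Code: Select all

#include <vector>
#include <string>
#include <fstream>
#include <iterator>
#include <boost/thread.hpp>
#include <boost/bind.hpp>

void do_other_frame_work_not_needing_the_texture();     // placeholder
void upload_texture( const std::vector<char> &bytes );  // placeholder for the GL upload

static void read_whole_file( std::string path, std::vector<char> *out )
{
    std::ifstream in( path.c_str(), std::ios::binary );
    out->assign( std::istreambuf_iterator<char>( in ),
                 std::istreambuf_iterator<char>() );
}

void create_unit_for_frame( const std::string &texture_path )
{
    std::vector<char> texture_bytes;
    boost::thread reader( boost::bind( &read_whole_file, texture_path, &texture_bytes ) );

    do_other_frame_work_not_needing_the_texture();  // the read blocks in its own thread meanwhile

    reader.join();                                   // by now the read has (hopefully) finished
    upload_texture( texture_bytes );
}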
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Post by chuck_starchaser »

safemode wrote:You're forgetting that we allocate a user specified amount of memory for texture loading. We internally maintain this memory. Not the OS. We have to cull textures when this memory is filled up. We disallow ourselves from using more than that specified amount of ram for textures.
Ah, didn't know.
safemode wrote:The thing is, we create units dependent on the current universe. If we create units via buffering that function, so that we can prefetch, then the units we actually insert into the system will be late. They'll be created due to situations that were in the Universe N frames ago. This could lead to a lot of weird behavior with missions.
Not if N=1 ;-) One frame should be plenty of time, for one io op at least.
But besides, the universe doesn't depend on textures. The ship unit could be created immediately and be fully functional; the delay would be just graphical: A delay of one frame for the mesh and textures to appear (be sent to the videocard).
Last edited by chuck_starchaser on Mon Aug 06, 2007 7:10 am, edited 4 times in total.
safemode
Developer
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Post by safemode »

Sometime during the week, we should hammer out a really good outline of what we want to see in post-4.4 vegastrike regarding textures and prefetching file io. By then we should know for sure what vegastrike is already doing, and how.

I want to start putting up outlines of specific changes to code up on the wiki.

So far we have three things going on: texture procedures, file prefetching, and scheduling. It'd be really helpful to have these outlined in pseudo-code, with rationale, prior to jumping into the actual code for post-4.4.

Then we can make those things the main goals that should be accomplished before the next release.
chuck_starchaser
Elite
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Post by chuck_starchaser »

True. Well, I'm Johnny Come Lately, here; I've no idea what the plans are.

I was looking for some kind of asynchronous i/o api, and so far google's let me down this time. I was hoping for the simplest thing, which perhaps doesn't exist: a simple file_open( s ) function that returns immediately but gets the ball rolling; I can't find anything like it. So I'm thinking, if worse comes to worst, we can create a function like

Code: Select all

void open_and_close( std::string s )   // takes a copy, so the caller's buffer can go away
{
    FILE *f = fopen( s.c_str(), "r" );
    if( !f ) return;
    char dummy;
    fread( &dummy, 1, 1, f );          // touch the file so the OS pulls it into its cache
    fclose( f );
}
And a global function like

Code: Select all

void asynchronous_file_open( const std::string &fn )
{
    // spawn a worker that just touches the file; don't join here,
    // or we'd block the frame waiting for the read to finish
    boost::thread thrd( boost::bind( &open_and_close, fn ) );
}   // thrd goes out of scope; the worker keeps running detached
So a thread is created that does absolutely nothing except block for a while while the OS fetches the file, then closes it; but the beauty of it is that the file is now cached. The OS won't evict it just because it's been closed. So we can then open it in immediate mode, as usual, but without the delay.
So then, in unit, we'd have something like

Code: Select all

class unit
{
    class create_graphical_functor
    {
        std::list< std::string > *pflist_; //pointer to a list of texture filenames
    public:
        void operator()() //fcall operator sends textures to the videocard
        {
            for( std::list< std::string >::iterator it = pflist_->begin();
                 it != pflist_->end(); ++it )
            {
                FILE *f = fopen( it->c_str(), "rb" ); //should come straight out of the OS cache now
                gl_this_or_that...
                fclose( f );
                ................
            }
            delete pflist_;
        }
        //ctor:
        explicit create_graphical_functor( std::list< std::string > *pflist )
        : pflist_(pflist)
        {
        }
    };
    static std::queue< create_graphical_functor > the_queue;
    static void process_queue(); //pops and runs the functors, a frame or two after create()
public:
    void create( xml_t xml )
    {
        std::list< std::string > *pflist = new std::list< std::string >;
        extract_file_list_from_xml( xml, pflist );
        for( std::list< std::string >::iterator it = pflist->begin();
             it != pflist->end(); ++it )
        {
            asynchronous_file_open( *it ); //warm the OS cache in a background thread
        }
        initialize_all_other_unit_stuff();
        the_queue.push( create_graphical_functor( pflist ) );
    }
};
Tell me you don't like that. :D
Actually, I'm not sure whether fopen/fread in the created thread merely block that thread or lock up the whole process >:^0
But there ought to be io functions that only block the calling thread, which we could use instead of fopen/fread, I'm sure.
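EDIT: Actually, POSIX does define the kind of asynchronous i/o api I was googling for: aio_read() from <aio.h> starts the read and returns immediately, and you poll it with aio_error() until it stops returning EINPROGRESS. Rough, untested sketch, error handling omitted:

Code: Select all

#include <aio.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

static char buffer[4096];
static struct aiocb cb;

void start_async_read( const char *path )
{
    int fd = open( path, O_RDONLY );
    std::memset( &cb, 0, sizeof(cb) );
    cb.aio_fildes = fd;
    cb.aio_buf    = buffer;
    cb.aio_nbytes = sizeof(buffer);
    cb.aio_offset = 0;
    aio_read( &cb );                  // returns immediately; the read proceeds in the background
}

bool read_finished()
{
    return aio_error( &cb ) != EINPROGRESS;
}

ssize_t read_result()                 // bytes read, once finished; remember to close(cb.aio_fildes)
{
    return aio_return( &cb );
}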
Post Reply