safemode wrote:
Using your argument that memory access is slow, you're adding steps to my suggestion that all have to occur in memory. They add copies in memory and add processor cycles to boot. You are going to be creating a compressed texture eventually anyway, to get the effect you stated in your step, that you want user-selectable quality levels of textures. The only thing you save by using jpg is filesystem transfer time for the difference in file size between native compressed and jpg, and you give a better "high quality" mode for uncompressed texture use for those who want the best possible picture.
No; let me use a hypothetical example. Say we have a 1k by 1k texture. Uncompressed, that's 3 megs without an alpha channel. Compressed with png you might get it down to 1 meg: lossless, but still very big and slow. Compressed with dds you might end up with about 500 k to 1 meg, depending on how lossy you make the compression: fast for the gpu and simple to code, but of lower quality, still slow to load into memory AND to read from memory, and with the losses burnt into the file rather than user-settable. The same image compressed with jpg at 90% quality might be only 50 k in size, with excellent quality (indistinguishable from the original to most eyes), and much faster to load from disk AND to read from memory. Its only drawback is that it has to be decompressed, as the gpu can't decompress jpg. So if we want to send the image to the videocard in a gpu-compatible compression format, we have to re-compress after decompressing from jpg. That is what I propose.
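Just to show where those numbers come from, here's the back-of-envelope arithmetic written out as a little program; the png and jpg ratios are ballpark figures I'm assuming, not measurements:

#include <cstdio>

int main()
{
    const double uncompressed = 1024.0 * 1024.0 * 3.0;   // RGB, 3 bytes per pixel
    const double dxt1         = 1024.0 * 1024.0 * 0.5;   // DXT1 dds: fixed 4 bits per pixel
    const double png          = uncompressed / 3.0;      // ~3:1 lossless (assumed ratio)
    const double jpg90        = uncompressed / 60.0;     // ~60:1 at 90% quality (assumed ratio)

    std::printf("uncompressed:   %4.0f kB\n", uncompressed / 1024.0);  // ~3072 kB
    std::printf("dds (DXT1):     %4.0f kB\n", dxt1 / 1024.0);          // ~512 kB
    std::printf("png (est.):     %4.0f kB\n", png / 1024.0);           // ~1024 kB
    std::printf("jpg 90%% (est.): %3.0f kB\n", jpg90 / 1024.0);        // ~51 kB
    return 0;
}

The exact ratios vary per texture, of course; it's the orders of magnitude that matter here.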
Now, what you say is true: my suggestion requires memory for intermediate storage of the image in uncompressed format. However, this scratchpad memory doesn't necessarily incur the memory access time gotchas I was talking about. Why? Because, as I was suggesting, the routine that loads jpg and re-compresses to dds could declare a 2k by 2k pixel memory plane as a static variable and re-use it. Since this routine is called multiple times in short bursts, this area of memory is likely to stay cached in L2 after the first call. Or we could add SSE prefetch instructions to the loops to make sure it is; that would be the best solution. That way, while your decompression routine is writing to one row of pixels, the next row is being pulled into L2 cache (if it isn't already there) in the background, in parallel. With nicely placed prefetches, and with asynchronous preloads from disk as well, the whole decompress-and-recompress latency could be brought down to microseconds. The slowest part would then be reading the jpg data from memory, but, being jpg, it's about 1/20th as much data as reading a png or dds.
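To make the idea concrete, here's the kind of loop I have in mind. It's only a rough sketch, not actual Vega Strike code: the 2k-by-2k RGBA plane, the 64-byte cache line, the function name, and the memset standing in for the real jpg row decode are all assumptions for illustration.

#include <xmmintrin.h>   // _mm_prefetch (SSE)
#include <cstddef>
#include <cstring>

static const std::size_t kMaxDim = 2048;                  // the 2k x 2k scratch plane
static unsigned char scratch[kMaxDim * kMaxDim * 4];      // RGBA; static, so no per-call allocation

// Decode one jpg into the static scratch plane, one row of pixels at a time.
void decode_into_scratch(std::size_t width, std::size_t height)
{
    if (width > kMaxDim || height > kMaxDim)
        return;                                           // image too big for the scratch plane

    const std::size_t row_bytes = width * 4;
    for (std::size_t y = 0; y < height; ++y) {
        unsigned char* row = scratch + y * row_bytes;

        // While this row is being written, hint the *next* row up into cache.
        if (y + 1 < height)
            for (std::size_t off = 0; off < row_bytes; off += 64)   // 64-byte cache lines
                _mm_prefetch(reinterpret_cast<const char*>(row + row_bytes + off),
                             _MM_HINT_T0);

        std::memset(row, 0, row_bytes);   // placeholder: real code would decode a jpg row here
    }
    // 'scratch' now holds the uncompressed image, ready to be re-compressed
    // to dds and sent to the videocard.
}

The prefetch distance of one row is just a guess; it would need tuning against the actual decode speed.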
So, to summarize: with your solution you'd have large dds textures being pulled from disk into memory, incurring long access times from memory. With my solution you'd have much smaller files to load into memory, with much reduced memory latency costs; and the routine that decompresses from jpg and recompresses to dds would use a scratchpad area of memory to store the uncompressed texture temporarily. But, being a static variable, this scratchpad would incur no per-call allocation latencies, and it would enjoy a higher cache hit ratio thanks to the burstiness of the calls when multiple textures are loaded consecutively, even without using SSE prefetch instructions.
The only drawback my suggestion has is that my top-level quality won't be as high as yours and the hdd will have to transfer more per file, but my suggestion completely kills yours in every other aspect.
Not so: the size of the download would still be as huge as using png's; and reading dds from disk into memory, and from memory when sending to the videocard, would still take 10 to 20 times longer than with jpg.
How much do filesystem transfers matter once they've already been started, and how often do they actually occur in-game? That is a good question, since linux and the game will cache as much as possible in ram.
Disk data flow, once the transfer has started, is quite fast, I believe; so that would favor your argument. Seek time is another story: on the order of milliseconds. However, with asynchronous preloading of files, disk latency becomes a non-issue either way, and the main bottleneck is then memory latency. Having files sit in memory in jpg format reduces memory access latencies by the same ratio as the difference between the compression ratios: 10 to 20 times. So jpg wins hands down.
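For illustration, asynchronous preloading can be as simple as kicking off the read on another thread and collecting the bytes when the texture is actually needed. This is just a sketch using std::async; the file path and the read-the-whole-file-at-once approach are examples, not what the engine actually does:

#include <future>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

static std::vector<char> read_file(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

int main()
{
    // Start the disk read in the background; the millisecond-scale seek and the
    // transfer happen while the game keeps running, so their latency is hidden.
    std::future<std::vector<char> > pending =
        std::async(std::launch::async, read_file, std::string("textures/hull.jpg"));

    // ... keep rendering / doing other work here ...

    std::vector<char> jpg_bytes = pending.get();   // ready (or nearly so) by now
    // next: decompress jpg_bytes, re-compress to dds, hand the result to the gpu
    return 0;
}

In a real engine you'd want a small pool of worker threads rather than one future per texture, but the principle is the same.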
At what point does file size on the filesystem become the overpowering factor? I'm sure there is a magic file size below which the difference wouldn't matter, but above which it does.
Well, with asynchronous disk preloading, there'd be hardly any difference in access time, regardless of file sizes. But file size is still an issue for memory access time AND for the size of the download. If we replaced many of those png's and bmp's with jpg, I'm pretty sure we could reduce the size of the vegastrike download by 50% or more.
And just so we don't forget, jpg quality is MUCH higher than dds compression achieves. So you get smaller AND much better image files.
And the extra work of decompressing and re-compressing, I re-submit, would be tiny compared to memory access latency, and would be far more than compensated for by the reduction of the latter.
But if you still dislike my proposal of decompressing and re-compressing on the fly, I might suggest an alternative: have the game engine decompress and re-compress all images in one pass the first time it runs, or make that part of the installer. It would easily double or triple the game installation's disk footprint, and it would kill the memory access time advantages, so in my opinion it wouldn't be as fast as doing the decompression and recompression on the fly; but at least it would still enjoy the benefit of a reduced download size. I really don't like this solution, though; it's an exercise in futility, IMO: doing the decompression on the fly would be faster, due to reduced memory latencies, so there'd be no benefit whatsoever to be gained for the price of the larger installation. In fact, you'd be increasing the game's disk footprint only to slow it down.
Since this thread is about the "stuttering problem", my original proposal was a way I believe would address it. Download size and installation footprint are secondary and off topic. My whole point was reducing latencies, and I believe that decompressing from jpg and recompressing to dds on the fly would be the way to go to reduce the stutter problem, if indeed it is due to texture loading and/or memory thrashing in the videocard.
EDIT:
The slowness of memory is nothing to sneeze at: We're talking hundreds of times slower than the cpu. In the old days, the optimizer's paradigm was "What values could we keep in memory so that they don't have to be re-computed?" The new paradigm is more like "What could we possibly re-compute from data already in cache, so we don't have to access it from off-cache memory?"
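A toy example of the difference, just to illustrate (nothing from the actual codebase): the old way keeps a big precomputed table that is guaranteed to fall out of cache; the new way just recomputes the value from what's already in a register.

#include <cmath>

// Old paradigm: keep values in memory so they never have to be recomputed.
// A million precomputed sines is 4 MB of floats -- it will not stay in cache,
// so most lookups pay a full memory-latency round trip.
static float sine_table[1 << 20];        // assume it's filled once at startup

float sin_lookup(unsigned i)
{
    return sine_table[i & ((1u << 20) - 1)];
}

// New paradigm: recompute from data already in registers/cache.  A few dozen
// cycles of math is cheaper than a few hundred cycles of memory latency.
float sin_compute(float angle)
{
    return std::sin(angle);
}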
Anything that achieves any kind of in-memory compression is a winner, pretty much regardless of how much computation it takes to de-compress.
By that same token, the way to optimize a program for speed is to optimize the inner loops for speed and the other 95% of the code for size, as the benefit of increased cache coherence and the resulting reduction in amortized memory latency outweighs all other concerns.