feature freeze time

Development directions, tasks, and features being actively implemented or pursued by the development team.
klauss
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: feature freeze time

Post by klauss »

I think the pimpl idiom is not so much to accelerate compilation (which it incidentally does), but to reduce dependencies among headers that shouldn't be there.

If the implementation of a class needs, say, ffmpeg, there's no need to force an ffmpeg dependency on all users of that class, if the interface doesn't need it. In essence, if the interface uses generic or non-ffmpeg types.

Since private member fields are part of the header (and the binary representation of objects), the pimpl idiom lets you effectively differentiate private, implementation-only fields from interface fields, making the interface really portable. If you build the pimpl'ed version of a class against a different ffmpeg version, the interface will still be binary compatible, which is a LOT to say.

That, IMO, is the goal of the pimpl idiom. That it speeds up builds is a side effect.
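A minimal sketch of what klauss describes (all names hypothetical; shown as a single file for brevity, with comments marking what would live in the header vs. the cpp; std::unique_ptr postdates this thread, and the codebase of the time would use a raw pointer or boost::scoped_ptr):

```cpp
#include <memory>

// --- widget.h --------------------------------------------------------
// The interface exposes no ffmpeg types, so clients never need ffmpeg
// headers; only the forward-declared Impl changes when ffmpeg does,
// and the object layout (one pointer) stays binary compatible.
class Widget {
public:
    Widget();
    ~Widget();                   // must be defined where Impl is complete
    int framesDecoded() const;
private:
    struct Impl;                 // forward declaration only
    std::unique_ptr<Impl> pimpl;
};

// --- widget.cpp ------------------------------------------------------
// Only this translation unit would #include the ffmpeg headers.
struct Widget::Impl {
    int frameCount = 0;          // stand-in for real ffmpeg state
};

Widget::Widget() : pimpl(new Impl) {}
Widget::~Widget() = default;
int Widget::framesDecoded() const { return pimpl->frameCount; }
```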
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: feature freeze time

Post by chuck_starchaser »

@safemode: About 20 minutes, I'd say. Next time I'll time it.
The problem is probably about 1/3 too many headers included unnecessarily in the cpp files, but
about 2/3 too many headers coming in via the .h files including other .h files, which
themselves include other .h files... I know that include guards keep a .h file from being included twice; but that's not
what I was referring to.
@klauss: I know; but I'm personally craving the side-effects, right now :)

EDIT:
Another idiom we should be using and we're not is nameless namespaces for private things in cpp files:

Code:

#include "foo.h"

namespace {

struct fooImpl
{
    // ...
};

} // anonymous namespace

This way, fooImpl won't be exported by the linker.
safemode
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Re: feature freeze time

Post by safemode »

If we do a header check, it has to be done from the roots of the project first, or attempting to manually hunt down unused includes at the top end is going to be useless.

But increasing the complexity of the code, from a readability standpoint, for the sake of compilation speed alone isn't worth it. And since we already use boost, wouldn't the other benefits of the pimpl idiom be covered by just using a smart pointer from boost?

In either case, I don't think compile times are at a level where anything needs to be considered beyond basic header checking and continuing the work of knocking out bugs and doing basic optimization of the code.
Ed Sweetman endorses this message.
safemode
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Re: feature freeze time

Post by safemode »

Also, there's gotta be a tool out there that lets you hand it a source file and get back a tree of its includes, letting you identify any loops, along with any includes that aren't actually used by the file that includes them. Or at least I wish there was.
Ed Sweetman endorses this message.
klauss
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: feature freeze time

Post by klauss »

gcc does that; I don't remember the arguments, but it prints out a list of the include files required by each cpp file.

It's used by automake to do dependency analysis.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
safemode
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Re: feature freeze time

Post by safemode »

The problem with using gcc is that it will process the preprocessor directives, hiding includes that aren't actually pulled into a given source dependency tree.

So I'm guessing that this feature can hide includes that are interdependent on various macros being defined, allowing some headers to mask their includes if the calling file has some define set... etc.

Aside from that, it may not tell us about header includes that aren't used, since its purpose is to report the headers that are used. I'll have to double-check that, though.
Ed Sweetman endorses this message.
chuck_starchaser
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: feature freeze time

Post by chuck_starchaser »

-H

We should add it to the cmake configuration.
It doesn't tell you whether the includes are needed, though; only that they are included.
safemode
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Re: feature freeze time

Post by safemode »

chuck_starchaser wrote:-H

We should add it to the cmake configuration.
It doesn't tell you whether the includes are needed, though; only that they are included.
First, before we worry about header inclusion checking... we need to decide if it's OK to include a header that, say, includes iostream, and not include iostream in the cpp file (assuming it's needed in both). Or to include a header that includes other headers, while not including those other headers directly. (opcode does this type of crap, which I didn't change because there was no time to.)

I say it's bad. If I have a file (header or source) and I comment out a header include, then I should lose only the stuff defined within that specific header. Yet we often have system headers specifically designed to let you include one header to get the functionality of an entire library consisting of many headers, even if you don't require stuff from half of them. So if it's acceptable there, why would we condemn it locally? And if we don't condemn it, how do we decide what clean organization of headers is, other than just getting rid of the ones we don't use in source files?

edit: Additionally, how do we plan on identifying data in a source file that depends on x.h if we also depend on y.h, and y.h depends on x.h? If we comment out x.h in the source, we still get it, because y.h includes it. If we get a tree of includes, we can see what's happening, but only if we know beforehand that we should be missing definitions, which requires knowing where all our variables come from in the first place, which means we wouldn't need any of this crap and could do it all in our heads.

The tool has to process the source, identify where every function and data member comes from in addition to the include tree, and flag headers that are required by a given file but not included in that file (they get included a hop or more away).
Ed Sweetman endorses this message.
chuck_starchaser
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: feature freeze time

Post by chuck_starchaser »

The standard rules are like this:
If foo.h needs x.h and y.h, and y.h includes x.h, then including y.h is enough.

Why?

Because the cardinal rule for headers is that they should only include another header if
clients will need that other header to be able to use it.
If that rule is assumed adhered to, then, the fact that y.h includes x.h is not because it
needs x.h, but because the author decided that foo.h needs x.h to be able to use y.h.
So it includes x.h SO THAT foo.h doesn't have to include x.h separately.

If the author of y.h does not think that a foo.h including a y.h would need an x.h, then
he or she would NOT include x.h, and would find a way to NOT include it, such as by
using a pimpl or something.

So, we just have to make sure that the cardinal rule is adhered to in all the sources; no
exceptions:
A header file should only include another header file on behalf of its users, when it can
be demonstrated that the users will need it.
Example: matrix.h could include vector.h, because users would have no use for matrices
without vectors to use them with. (Probably a bad example.)
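Chuck's matrix/vector case, as a compilable sketch (all names hypothetical; shown as a single file, with comments marking the two headers):

```cpp
// --- vector3.h -------------------------------------------------------
struct Vector3 { double x, y, z; };

// --- matrix3.h -------------------------------------------------------
// matrix3.h would #include "vector3.h" on behalf of its users: a client
// multiplying matrices has no use for them without vectors, so by the
// cardinal rule the include belongs here, not in every client.
struct Matrix3 {
    double m[3][3];
    Vector3 mul(const Vector3 &v) const {
        return Vector3{m[0][0] * v.x + m[0][1] * v.y + m[0][2] * v.z,
                       m[1][0] * v.x + m[1][1] * v.y + m[1][2] * v.z,
                       m[2][0] * v.x + m[2][1] * v.y + m[2][2] * v.z};
    }
};
```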
klauss
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: feature freeze time

Post by klauss »

Also, system headers whose whole purpose is to include a bunch of other (hopefully related) include files are OK, if you ask me.

But that's their whole purpose.

BTW: a way to break an unneeded dependency is to use pointers or references along with forward class/struct declarations.
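A sketch of that technique (hypothetical names): because the class only refers to `Texture` through a pointer, the header can get away with a forward declaration and no `#include`:

```cpp
// --- renderer.h ------------------------------------------------------
class Texture;       // forward declaration: no texture.h needed here

class Renderer {
public:
    void bind(Texture *t) { current = t; }   // pointers to incomplete types are fine
    Texture *bound() const { return current; }
private:
    Texture *current = nullptr;
    // Texture byValue;  // would NOT compile without the full definition
};

// --- renderer.cpp ----------------------------------------------------
// Only code that actually uses Texture's members would #include "texture.h".
// A trivial definition stands in for it here so the sketch is complete:
class Texture {};
```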
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
chuck_starchaser
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: feature freeze time

Post by chuck_starchaser »

I've also seen mention of #include <some_stl_header_fwd>, which gives you only declarations instead of definitions, like forward references. Those might be useful, if they truly exist...
In foo.h
#include <memory_fwd>

In foo.cpp
#include <memory>
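For the record, the one such header the standard library actually provides is `<iosfwd>`, which declares (but doesn't define) the stream classes; there is no standard `<memory_fwd>`. A sketch of the pattern (the trivial `logTo` is hypothetical):

```cpp
// --- logger.h --------------------------------------------------------
#include <iosfwd>                // declares std::ostream; no definitions

void logTo(std::ostream &out);   // a reference to an incomplete type is fine

// --- logger.cpp ------------------------------------------------------
#include <ostream>               // the full definition, only here
#include <sstream>               // only so the sketch can be exercised below

void logTo(std::ostream &out) { out << "hello\n"; }
```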
klauss
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: feature freeze time

Post by klauss »

safemode wrote:How long does compilation take for you? For me it's probably about 10 minutes. and my machine is years old.
My machine at home:

$ time make -j2
...
real 17m39.868s
user 31m41.639s
sys 1m20.697s

That's hyper threading, no real dual core here.
Hyper threading reduces build time by about 10%, in case you're wondering.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
safemode
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Re: feature freeze time

Post by safemode »

I timed mine with -O2 -mtune=athlon64 -mfpmath=sse -msse2 -mmx and the rest of the regular RELEASE flags.
It's a dual core, and that reduces compile time by just about half.

Real time was just under 9 minutes, with make -j2.

This is compiling everything in the VS code tree. It's marginally less when compiling _just_ vegastrike.

My binary before stripping is something like 8-9MB; after stripping, about 7MB for vegastrike. Vegaserver is roughly half the size, before and after stripping.

If you're compiling with debugging symbols, you're _significantly_ increasing the amount of IO to disk, and this can greatly increase compile time (and reduce parallel building).


edit: Also, I noticed some issues in cmake with trying to modify cflags (you can't, for the builtins we define)... So I'm going to introduce a few changes to the cmake file that will allow us to have a "release-like" mode that is unique, so that we can modify its flags and play around as needed. Basically, builtin modes are FORCED to the flags we write in the CMakeLists.txt file, but this FORCE disables user-side changes at config time. That's fine for the builtin types, though, since we can ask "did you build as Release?", and if they say "yes" we know exactly what their build flags were. But if we want to test flags, we need a self-made mode, like Maintainer is, but set up to mimic Release. So I'm going to introduce Rel, which is just like Release, only it lets us modify the flags at config time.

Like I mentioned in another thread, I also plan on adding a prompt to cmake to allow the user to select their CPU arch (with possibly auto-detection too) so more optimized gcc flags can be used.
Ed Sweetman endorses this message.
safemode
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Re: feature freeze time

Post by safemode »

klauss wrote:
safemode wrote:How long does compilation take for you? For me it's probably about 10 minutes. and my machine is years old.
My machine at home:

$ time make -j2
...
real 17m39.868s
user 31m41.639s
sys 1m20.697s

That's hyper threading, no real dual core here.
Hyper threading reduces build time by about 10%, in case you're wondering.

How do you figure 10%? 17m is 46% less time than 32m; I think you are underestimating your hyperthreading :)

Real time is how long the process actually took; user and sys are how much CPU time the process consumed. In the case of SMP, each execution unit is timed and all the times are added up. So those are roughly the times it would have taken if you had no SMP enabled.
Ed Sweetman endorses this message.
klauss
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: feature freeze time

Post by klauss »

I thought I had a neat explanation.
Now I don't :oops:

I guess I'll re-think it.
Or try with -j1 ;)
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
safemode
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Re: feature freeze time

Post by safemode »

klauss wrote:Nope, because with hyper threading, the two execution units share processor time.
So I do 37/2, and that's the time it would have taken without HT (granted, that's a grossly gross approximation, I would have to actually disable HT and try - that, or re-build with -j1 - I might try that later on).
Try it with -j1; should be interesting. That's an awful lot of congestion to be wasting 35% of its time waiting to execute. Makes me wonder if you wouldn't have been better off with -j1, allowing the kernel to utilize the other execution unit to play with IO and do its thing.


edit: Also, I'm interested in chuck's times. They must be larger than yours, with all his interest in lowering compile times. BTW, do you have swap enabled, and is it active when you are compiling and linking VS? Are you timing cmake or autoconf generated builds? 64 or 32 bit? And what's the size of the resultant bins (only vegastrike and vegaserver I'm concerned with)? Template compilation, and compiling C++ in general, can tend to use a lot of RAM, so fast and plentiful RAM is extremely important to compile times. Hitting swap is not acceptable.

I'm guessing klauss is using a ~2GHz Intel P4 with hyperthreading (cuz he told me HT)... what about you, chuck?
Ed Sweetman endorses this message.
klauss
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: feature freeze time

Post by klauss »

safemode wrote:i'm guessing klauss is using a ~2Ghz intel P4 with hyperthreading (cuz he told me ht) .. what about you chuck?
Close. 2.8Ghz P4 with HT, 2GB RAM. RAM is plenty indeed and there's no IO bottleneck. SWAP is stuck at ~0 all the time.

I did the test against -j1 once, and HT helped about that much (~10%) - but I'll do it again.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
safemode
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Re: feature freeze time

Post by safemode »

klauss wrote:
safemode wrote:i'm guessing klauss is using a ~2Ghz intel P4 with hyperthreading (cuz he told me ht) .. what about you chuck?
Close. 2.8Ghz P4 with HT, 2GB RAM. RAM is plenty indeed and there's no IO bottleneck. SWAP is stuck at ~0 all the time.

I did the test against -j1 once, and HT helped about that much (~10%) - but I'll do it again.

Hrm... we have the same RAM. Debugging symbols? (I don't have swap, but like you mentioned, swap isn't a factor with 2GB and most of it not being used by other crap when compiling VS.) ./autoconf or cmake?

My main concern about your compile time is why you are so close to double my time when I'm running on a 2GHz setup. You should be much closer to only 50% slower. I'm starting to bet it's disk-IO-bound slowness hitting you. I compile on a RAID0 SATA setup (2 disks). And I don't compile in debugging symbols (your bin with debugging can be upwards of almost 100MB before stripping, and every object will be drastically larger, greatly increasing disk IO wait during compile).
Ed Sweetman endorses this message.
klauss
Elite
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: feature freeze time

Post by klauss »

safemode wrote:./autoconf or cmake. ?
Autoconf
safemode wrote:my main concern for your compile time is why you are so close to double my time when i'm running on a 2Ghz setup.
You have a true dual core.
BTW, Pentium 4's aren't dual core. Except perhaps the Pentium-D, but those shipped with higher clocks. So if you have a real dual core at 2Ghz, you're not using NetBurst, you're probably using the Core microarchitecture (Pentium Dual-Core). In fact, 2Ghz was the clock of the average Core chip, IIRC.

So, your CPU is based on the Core architecture, which has several benefits. One is a bigger cache, another is a shorter pipeline, and lower TDP.

NetBurst was substantially better than P3s due to their faster memory, but had many, many shortcomings.

Mine is a Northwood, which is about 3 years older than yours.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
safemode
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Re: feature freeze time

Post by safemode »

klauss wrote:
safemode wrote:./autoconf or cmake. ?
Autoconf
Try cmake; cmake build times are faster than autoconf's. It also organizes the objects differently, so it may produce less HDD overhead and/or less memory overhead. Though you can probably wait until I get home and commit a couple of cmake tweaks.
safemode wrote:my main concern for your compile time is why you are so close to double my time when i'm running on a 2Ghz setup.
You have a true dual core.
BTW, Pentium 4's aren't dual core. Except perhaps the Pentium-D, but those shipped with higher clocks. So if you have a real dual core at 2Ghz, you're not using NetBurst, you're probably using the Core microarchitecture (Pentium Dual-Core). In fact, 2Ghz was the clock of the average Core chip, IIRC.

So, your CPU is based on the Core architecture, which has several benefits. One is a bigger cache, another is a shorter pipeline, and lower TDP.

NetBurst was substantially better than P3s due to their faster memory, but had many, many shortcomings.

Mine is a Northwood, which is about 3 years older than yours.
You missed my -mtune... it's an Athlon64 X2 (first gen). Yours still might be older, but not by much. I have an integrated memory controller, so I have more memory bandwidth than you do, but CPU-wise I would figure we'd be close to each other; yet your HT-enabled compile is about half the speed of my dual core. I suppose caching could play a part. In certain situations I see > 2x single-core performance when doing the same type of task in parallel; i.e., encoding in lame or oggenc in parallel yields > 2x the performance of running one at a time.

Also good to note (and I can't believe I forgot it) is whether you're compiling against the internal boost or using system boost. System boost yields significantly faster compiles.

In the end, you may simply be on target for your platform. Your distro is likely not compiled for your CPU arch as closely as mine is, and that may explain certain performance benefits I get to see: everything in my distro is compiled for x86_64 (which baselines very closely to many amd64-specific optimizations), while the best you can probably hope for in P4 land is i686.
Ed Sweetman endorses this message.
klauss
Elite
Posts: 7243
Joined: Mon Apr 18, 2005 2:40 pm
Location: LS87, Buenos Aires, República Argentina

Re: feature freeze time

Post by klauss »

safemode wrote:you missed my -mtune .. It's an athlon64 X2, (first gen)
:shock: Totally. In fact, I thought I had read you had a P4 too - go figure.

AMD was substantially better at FP math at that time, and I know gcc does a lot of FP math (don't ask what it uses it for; no clue whatsoever, but profilings I read once in a review revealed that).

Caches were also a lot better - not larger, but better. Especially in multicore land.
safemode wrote:Also good to note (and i cant believe i forgot it) is if you're compiling against the internal boost or using system boost. System boost yields significantly faster compiles.
System boost, python2.6.

The config line is something like: ./configure --enable-release --disable-debug --with-boost=system --with-python=2.6
safemode wrote:In the end, you may be simply on target with your platform, your distro is likely not compiled for your cpu arch as closely as mine is, so that may explain certain performance benefits i get to see because everything in my dist is compiled for x86_64 (which baselines very closely to many specific amd64 optimizations), while the best you can probably hope for in P4 land is i686.
x86_64 is an overall better arch, both in AMD and intel. It has a lot more registers, it has better conditional instructions (limited form of predication), it has more flexible memory addressing modes. Aside from the fact that it consumes twice as much memory in pointers, it's better everywhere else.
Oíd mortales, el grito sagrado...
Call me "Menes, lord of Cats"
Wing Commander Universe
safemode
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Re: feature freeze time

Post by safemode »

Setting the python version should be unnecessary. Your boost was compiled against a certain python already, so that has to be the one used; letting the user mix and match should be denied, so we should ignore this option when using system boost. For a given boost we decide to use, we are given the version of python to use (which has to be installed, since they have that boost installed).

The with-python version option is only useful if you are compiling an in-tree boost.
Ed Sweetman endorses this message.
safemode
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania
Contact:

Re: feature freeze time

Post by safemode »

I have functional CPU arch selection code set up and working for everything from P4 (pre-Prescott) to the latest, and AMD K8 to the latest... user-selectable, no autodetect yet. I'll commit the changes when I get back from job #2.
Ed Sweetman endorses this message.
croxis
Star Pilot
Posts: 4
Joined: Wed Sep 07, 2005 8:28 am
Contact:

Re: feature freeze time

Post by croxis »

In Linux I mount a tmpfs onto the temp build directory. I haven't done any testing, but my compile times seem to be much faster writing to RAM instead of a hard drive.
chuck_starchaser
Elite
Posts: 8014
Joined: Fri Sep 05, 2003 4:03 am
Location: Montreal
Contact:

Re: feature freeze time

Post by chuck_starchaser »

Hell, yeah, a ram-disk should help.
There must be a way to set one up in Ubuntu?

So, for the record, what did I say it took me to compile? 20 minutes? I was 2.5% off: 19.5 minutes, about...
This machine is getting old, though: an ancient (single core) Athlon 64 3000+, and two gigs of DDR1.
(Maybe we should hack GCC to use CUDA :lol: :lol: :lol: :lol: )