strook wrote:Maybe (I haven't tried it yet) you would maximize the speedup on, for example, a machine with 2 cores with -j3, because the processor often executes no-ops when the pipeline is not full, due to data dependencies (code that must be executed in serial order).
Well, not true.
To take advantage of that (you're talking about hyperthreading), your processor actually has to support hyperthreading (Core 2s don't, for instance), and furthermore your tasks have to be taxing different execution units in the CPU, which is rare in compile workloads.
Compilation is usually memory-bound, and even with 4 cores, a memory-bound process with -j4 won't perform at 4x. AMD processors cannot use the entire bandwidth of the memory bus from one core, so yes, -j4 may work nicely on a 4-core CPU, but -j5 will probably just increase cache pressure and be counterproductive. On Intel processors, however, single-core memory bandwidth is much better, and 2 cores can easily saturate the memory bus of a 4-core chip. So -j4 would be hard-pressed to outperform -j3, say.
The whole thing is rather complex, so a good rule of thumb is -jN with N = number of cores. Going over that is dubious, and a smaller number is probably worth a try.
So... with things as complex as these, it's better not to speculate and to benchmark instead. I know, for instance, that HT alone shaves about 5-10% off build time with -j2 on a P4 I have. So it's kind of worth it. And -j4 on an i5 works wonders. -j8 on an i7 is the same: HT on the i7 is a whole lot better than on the P4, but still, the i7 is already suffering from cache pressure with -j8, so I usually use -j6 there.
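If you want to benchmark it yourself, something like this rough Python sketch would do. It assumes the project builds with plain `make` and has a `make clean` target; the function names (`candidate_jobs`, `time_build`, `benchmark`) and the range of -j values tried (about half the cores up to cores + 1) are just my choices for illustration, nothing standard.

```python
import os
import subprocess
import time


def candidate_jobs(cores):
    """Return the -j values to try: from about half the cores up to cores + 1."""
    low = max(1, cores // 2)
    return list(range(low, cores + 2))


def time_build(jobs, build_cmd=("make",), clean_cmd=("make", "clean"),
               run=subprocess.run):
    """Clean the tree, then return the wall-clock seconds of one build at -j<jobs>."""
    # Clean first so every run compiles the same amount of code.
    run(list(clean_cmd), check=True, stdout=subprocess.DEVNULL)
    start = time.perf_counter()
    run(list(build_cmd) + [f"-j{jobs}"], check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def benchmark(cores=None, **kwargs):
    """Map each candidate -j value to its build time in seconds."""
    # os.cpu_count() reports logical CPUs, so hyperthreads count as cores here.
    cores = cores or os.cpu_count() or 1
    return {jobs: time_build(jobs, **kwargs) for jobs in candidate_jobs(cores)}
```

Call `benchmark()` from the top of the source tree and compare the times. The first run also warms the disk cache, so it's worth running the whole thing two or three times before trusting the numbers.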