Free optimizations for building VS
-
- Developer
- Posts: 2150
- Joined: Mon Apr 23, 2007 1:17 am
- Location: Pennsylvania
I wanted to post this where lots of people who build their own VS will see it, even if they aren't interested in dev talk.
In GCC, the only way to auto-generate SIMD instructions is to use -ftree-vectorize together with either -maltivec or -msse/-msse2 etc. The -msse/-msse2 flags also require -mfpmath=sse, I believe. Some of this is on by default in Debian's amd64 releases, though not all of it.
I'd be interested if people would run make clean, then configure with --enable-release="-O2 -ftree-vectorize", adding -maltivec on (PowerPC) Macs or -msse/-msse2 -mfpmath=sse on x86 as appropriate.
This is not to measure performance increases, but to see whether it breaks anything or causes segfaults.
The idea is that configure could set such an option after detecting the processor. But I only want to do that if it doesn't cause weird crap to happen like -O3 does.
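For readers wondering what -ftree-vectorize actually looks for, here is a hedged sketch of the kind of loop it can turn into SSE or AltiVec instructions. `saxpy` is a hypothetical helper, not VS code, and whether GCC really vectorizes it depends on the compiler version and the flags used.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from the VS source tree.
// Computes y[i] = a * x[i] + y[i]. The loop has a known trip count, walks
// contiguous memory with unit stride, and no iteration depends on another:
// the shape GCC's vectorizer looks for. With -ftree-vectorize plus -msse2
// (x86) or -maltivec (PowerPC), GCC may process four floats per instruction.
// __restrict__ is a GCC extension promising that x and y do not overlap,
// so the compiler does not need a runtime aliasing check.
void saxpy(std::size_t n, float a, const float* __restrict__ x,
           float* __restrict__ y) {
    // Precondition: buffers must be valid whenever there is work to do.
    assert(n == 0 || (x != nullptr && y != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

To check whether GCC actually vectorized such a loop, the GCC 4.x series accepted -ftree-vectorizer-verbose=N; newer releases use -fopt-info-vec instead.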
Ed Sweetman endorses this message.
-
- Explorer
- Posts: 8
- Joined: Tue Mar 11, 2008 2:28 am
- Location: UK
Hi
Recompiling now, configured with this:
Code:
./configure --enable-stencil-buffer --enable-flags="-O2 \
-march=athlon-xp -mfpmath=sse -msse \
-ftree-vectorize -fomit-frame-pointer -pipe"
Will post back when I've used the binary a bit more.
EDIT:
Segfaulted for the first time in a while. I'm not sure whether it was related to compiling with those flags or to running out of fuel; the afterburner seems to use a lot of fuel in svn r12066.
Maybe I should compile without -fomit-frame-pointer to get a backtrace?
-
- ISO Party Member
- Posts: 410
- Joined: Tue Jun 26, 2007 7:15 pm
Code:
src/gldrv/gl_init.cpp: In function ‘void GFXInit(int, char**)’:
src/gldrv/gl_init.cpp:440: warning: deprecated conversion from string constant to ‘char*’
src/gldrv/gl_init.cpp:440: warning: deprecated conversion from string constant to ‘char*’
src/gldrv/gl_init.cpp:453: error: expected primary-expression before ‘<<’ token
src/gldrv/gl_init.cpp:453: error: expected primary-expression before ‘<<’ token
src/gldrv/gl_init.cpp:453: error: expected primary-expression before ‘<<’ token
src/gldrv/gl_init.cpp:453: error: expected primary-expression before ‘<’ token
src/gldrv/gl_init.cpp:453: error: expected primary-expression before ‘.’ token
src/gldrv/gl_init.cpp:454: error: expected primary-expression before ‘==’ token
src/gldrv/gl_init.cpp:454: error: expected primary-expression before ‘==’ token
src/gldrv/gl_init.cpp:454: error: expected primary-expression before ‘=’ token
src/gldrv/gl_init.cpp:455: error: ‘struct gl_options_t’ has no member named ‘smooth_lines’
src/gldrv/gl_init.cpp:456: error: ‘struct gl_options_t’ has no member named ‘smooth_points’
src/gldrv/gl_init.cpp:459: error: expected primary-expression before ‘>>’ token
src/gldrv/gl_init.cpp:459: error: expected primary-expression before ‘>>’ token
src/gldrv/gl_init.cpp:459: error: expected primary-expression before ‘>>’ token
src/gldrv/gl_init.cpp:459: error: expected primary-expression before ‘>’ token
src/gldrv/gl_init.cpp:459: error: expected primary-expression before ‘.’ token
src/gldrv/gl_init.cpp:460: error: expected `;' before ‘gl_options’
make[1]: *** [src/gldrv/gl_init.o] Error 1
make[1]: Leaving directory `/home/proteus/VegaStrike/vegastrike'
make: *** [all] Error 2
-
- Lead Network Developer
- Posts: 2560
- Joined: Sun Jan 12, 2003 9:13 am
- Location: Palo Alto CA
-
- ISO Party Member
- Posts: 410
- Joined: Tue Jun 26, 2007 7:15 pm
Oh... that's probably from my reversion of Breakable's earlier patch. Reverting that patch is necessary to make VS playable on machines with Intel graphics chipsets; otherwise VS keeps smoothing lines even when smoothing is disabled. However, as far as I know the patch still hasn't been reverted in SVN.
-
- Lead Network Developer
- Posts: 2560
- Joined: Sun Jan 12, 2003 9:13 am
- Location: Palo Alto CA
Miramor, can you try applying this patch to a clean SVN checkout?
I think Breakable's patch did some of the right things... but clearly it left smoothing enabled somewhere, which is bad.
I added extra, conservative glDisable calls in the places where I thought they might be needed (and fixed one place that called glEnable with a boolean argument).
If this doesn't fix it, I'll try taking out more of his patchset until it does work :-p
The Intel MacBook doesn't have this problem as far as I can tell...
Code:
Index: src/gldrv/gl_state.cpp
===================================================================
--- src/gldrv/gl_state.cpp (revision 12088)
+++ src/gldrv/gl_state.cpp (working copy)
@@ -219,14 +219,8 @@
glDisable(GL_CULL_FACE);
break;
case SMOOTH:
- if(gl_options.smooth_lines)
- {
- glDisable(GL_LINE_SMOOTH);
- }
- if(gl_options.smooth_points)
- {
- glDisable (GL_POINT_SMOOTH);
- }
+ glDisable(GL_LINE_SMOOTH);
+ glDisable (GL_POINT_SMOOTH);
break;
case STENCIL:
glDisable(GL_STENCIL);
Index: src/gfx/particle.cpp
===================================================================
--- src/gfx/particle.cpp (revision 12088)
+++ src/gfx/particle.cpp (working copy)
@@ -103,8 +103,12 @@
static float psiz=XMLSupport::parse_float (vs_config->getVariable ("graphics","sparkesize","1.5"));
GFXPointSize(psiz);
- static bool psmooth=XMLSupport::parse_bool (vs_config->getVariable ("graphics","sparkesmooth","false"));
- glEnable(psmooth);
+
+ static bool psmooth=XMLSupport::parse_bool (vs_config->getVariable ("graphics","sparkesmooth","false"));
+ if (psmooth && gl_options.smooth_points) {
+ glEnable(GL_POINT_SMOOTH);
+ }
+
#else
GFXEnable(TEXTURE0);
GFXDisable(TEXTURE1);
@@ -154,10 +158,7 @@
}
GFXEnd();
#ifdef USE_POINTS
- if(gl_options.smooth_points)
- {
- glDisable (GL_POINT_SMOOTH);
- }
+ glDisable (GL_POINT_SMOOTH);
GFXPointSize(1);
#else
GFXDisable(DEPTHWRITE);
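Why the old `glEnable(psmooth)` line in particle.cpp was a bug: glEnable takes a capability enum such as GL_POINT_SMOOTH, not an on/off flag. Because a bool converts implicitly to GLenum, the call compiled but passed 0 or 1; neither names a valid capability, so OpenGL should simply raise GL_INVALID_ENUM and leave point smoothing alone. The sketch below models this without OpenGL headers; `capabilityFromOldCall` and `capabilityFromPatchedCall` are hypothetical helpers that only compute which value would reach glEnable.

```cpp
#include <cassert>

// Stand-ins so this sketch compiles without OpenGL headers. 0x0B10 is the
// value <GL/gl.h> gives GL_POINT_SMOOTH.
using GLenum = unsigned int;
constexpr GLenum kPointSmooth = 0x0B10;
// Hypothetical marker meaning "leave point smoothing off".
constexpr GLenum kNoCapability = 0;

// What the old line effectively passed: glEnable(psmooth) compiled only
// because bool converts implicitly to GLenum, so the call received 0 or 1
// instead of GL_POINT_SMOOTH.
GLenum capabilityFromOldCall(bool psmooth) {
    return static_cast<GLenum>(psmooth);
}

// Mirrors the patched logic: request GL_POINT_SMOOTH only when the
// sparkesmooth config option AND gl_options.smooth_points both allow it.
GLenum capabilityFromPatchedCall(bool psmooth, bool smoothPointsAllowed) {
    return (psmooth && smoothPointsAllowed) ? kPointSmooth : kNoCapability;
}
```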
-
- ISO Party Member
- Posts: 410
- Joined: Tue Jun 26, 2007 7:15 pm
ace123: the patch fails to apply.
Code:
$ patch -p0 <nosmooth.patch
patching file src/gldrv/gl_state.cpp
Hunk #1 FAILED at 219.
1 out of 1 hunk FAILED -- saving rejects to file src/gldrv/gl_state.cpp.rej
patching file src/gfx/particle.cpp
Hunk #1 FAILED at 103.
Hunk #2 FAILED at 158.
2 out of 2 hunks FAILED -- saving rejects to file src/gfx/particle.cpp.rej
-
- Lead Network Developer
- Posts: 2560
- Joined: Sun Jan 12, 2003 9:13 am
- Location: Palo Alto CA
-
- Developer
- Posts: 2150
- Joined: Mon Apr 23, 2007 1:17 am
- Location: Pennsylvania
Try with and without the patch?
I would not rely on loading times to decide which one helps. Loading depends on the video driver and the disk cache. Instead, focus on in-game behavior: for example, viewing Atlantis just after launch, or flying very close to a base and seeing how the game behaves.
-
- Developer
- Posts: 2150
- Joined: Mon Apr 23, 2007 1:17 am
- Location: Pennsylvania
I didn't use the patch posted in the thread.
I'm running on an Athlon 64 X2 (not AM2), in 64-bit mode, using gcc (GCC) 4.2.3 (Debian 4.2.3-2).
My configure arguments were:
./configure --with-boost=1.33 --with-python-version=2.5 --enable-nvidia-cg --enable-stencil-buffer --enable-release=2 --enable-flags="-ftree-vectorize -msse -msse2 -mfpmath=sse -mmmx"
I'm using the internal Boost because, for some reason, Debian's Boost 1.34 makes the game run slower. Not sure why.
With the configuration shown, I did not experience any instability or any change in functionality compared to simply using release=2. I profiled the code and found that a lot of functions took significantly less time.
-
- Lead Network Developer
- Posts: 2560
- Joined: Sun Jan 12, 2003 9:13 am
- Location: Palo Alto CA
-
- Developer
- Posts: 2150
- Joined: Mon Apr 23, 2007 1:17 am
- Location: Pennsylvania
I'm running some further tests on just what gets vectorized and what doesn't.
If all goes well, I'll add a CPU-detection routine to configure and set up an option to enable aggressive optimizations.
The aggressive optimizations will do a little more than auto-vectorization, though not a whole lot more; I don't want to go overboard.
Mostly the extra arguments will relate to alignment and -march. Most distributions set these for GCC when the GCC package is built for a single architecture.
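One possible shape for the CPU-detection step, as a hedged sketch: map detected x86 features to the flags discussed in this thread. `simdFlagsFor` and `detectSimdFlags` are hypothetical names; `__builtin_cpu_supports` is a real GCC builtin but only arrived in GCC 4.8, newer than the 4.2.3 used here, so an actual configure script of that era would more likely parse /proc/cpuinfo.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turn detected x86 features into the flags discussed
// in this thread. Kept separate from detection so the mapping can be
// checked on any machine.
std::string simdFlagsFor(bool hasSse, bool hasSse2) {
    if (!hasSse) {
        return "";  // no SSE at all: keep the default flags
    }
    std::string flags = "-ftree-vectorize -msse";
    if (hasSse2) {
        flags += " -msse2";
    }
    // With SSE1 only, -mfpmath=sse covers float math; double stays on x87.
    flags += " -mfpmath=sse";
    return flags;
}

// Runtime detection with GCC's __builtin_cpu_supports (GCC 4.8+, Clang too).
// A configure script could compile and run a small program like this.
std::string detectSimdFlags() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return simdFlagsFor(__builtin_cpu_supports("sse") != 0,
                        __builtin_cpu_supports("sse2") != 0);
#else
    return "";  // non-x86 (e.g. PowerPC needing -maltivec) is not handled here
#endif
}
```

Keeping the mapping pure makes it testable, but note that flags chosen from the build machine's CPU can produce binaries that fail on older processors, which matters for packaged builds.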
-
- Developer
- Posts: 2150
- Joined: Mon Apr 23, 2007 1:17 am
- Location: Pennsylvania
It appears that vectorization is slowing VS down. I'm not sure why, but profiling shows that the vectorized VS was slower than the non-vectorized build in many cases, by 40-50%. That sucks. I don't know why it performs so much worse. My guess is alignment issues, or that the vectorized loops are not on the hot code paths while the vectorized standard-header code has misalignment issues with our own code.
Who knows.
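To make the alignment guess concrete, a hedged sketch: SSE registers are 16 bytes wide, and when GCC cannot prove that data is 16-byte aligned it typically uses unaligned loads or peels loop iterations until it reaches a boundary. `Vec4` and `addAligned` are hypothetical, not VS code. `alignas` needs C++11 and `__builtin_assume_aligned` needs GCC 4.7 or newer; with GCC 4.2, `__attribute__((aligned(16)))` would be the way to request the alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// SSE registers hold 16 bytes (four floats).
constexpr std::size_t kSseAlign = 16;

// Hypothetical type: four floats forced onto a 16-byte boundary.
struct alignas(kSseAlign) Vec4 {
    float v[4];
};
static_assert(alignof(Vec4) == kSseAlign, "Vec4 must be 16-byte aligned");

inline bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// a[i] += b[i]. __builtin_assume_aligned promises the compiler 16-byte
// alignment so the vectorizer can skip its alignment prologue. The promise
// must actually hold, hence the assert.
void addAligned(float* a, const float* b, std::size_t n) {
    assert(isAligned(a, kSseAlign) && isAligned(b, kSseAlign));
    float* pa = static_cast<float*>(__builtin_assume_aligned(a, kSseAlign));
    const float* pb =
        static_cast<const float*>(__builtin_assume_aligned(b, kSseAlign));
    for (std::size_t i = 0; i < n; ++i) {
        pa[i] += pb[i];
    }
}
```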
./configure --enable-profile --with-boost=1.33 --with-python-version=2.5 --enable-nvidia-cg --enable-stencil-buffer --enable-release=2 --enable-flags="-march=k8 -m64 -ffast-math -fsingle-precision-constant -funroll-loops --param max-unroll-times=4 -funsafe-loop-optimizations -fgcse-sm -fgcse-las -maccumulate-outgoing-args"
This performed rather well. Some flags are redundant with the defaults.
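Of the flags above, -fsingle-precision-constant is the easiest to illustrate. A hedged sketch with hypothetical helper names: in standard C++ an unsuffixed literal such as 0.5 is a double, so multiplying a float by it promotes the float to double and converts the result back, which costs extra conversion instructions. GCC's -fsingle-precision-constant asks the compiler to treat such constants as single precision instead.

```cpp
#include <cassert>

// Under the standard C++ rules (no -fsingle-precision-constant), an
// unsuffixed literal such as 0.5 is a double, so the float operand is
// promoted and the multiply is done in double precision.
inline double scaleWithDoubleLiteral(float x) { return x * 0.5; }

// With the 'f' suffix the whole expression stays in single precision.
// -fsingle-precision-constant treats unsuffixed floating-point constants as
// single precision, which is roughly what writing the suffix achieves.
inline float scaleWithFloatLiteral(float x) { return x * 0.5f; }
```

Writing the `f` suffix by hand gets the same effect portably, without depending on a compiler-specific flag.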
-
- Explorer
- Posts: 8
- Joined: Tue Mar 11, 2008 2:28 am
- Location: UK
-
- ISO Party Member
- Posts: 445
- Joined: Tue Feb 11, 2003 8:04 am