free optimizations for building vs

safemode
Developer
Posts: 2150
Joined: Mon Apr 23, 2007 1:17 am
Location: Pennsylvania

Post by safemode »

I wanted to post this where lots of people who build their own VS read but maybe aren't interested in dev talk.

In GCC, the only way to auto-generate SIMD instructions is to use -ftree-vectorize together with either -maltivec or -msse/-msse2, etc. I believe -msse/-msse2 also requires -mfpmath=sse. Some of this is on by default in Debian's amd64 releases, but not all of it.

I'd be interested if people would make clean, then configure with --enable-release="-O2 -ftree-vectorize" and add the -maltivec for mac or -msse/-msse2 -mfpmath=sse as appropriate for x86.

This is not to measure perf increases, but to see if this breaks anything or causes segfaults.
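
For anyone unsure what kind of code -ftree-vectorize targets, here is a minimal sketch. It is not taken from the VS source, and the name `scale_add` is made up for illustration. A plain counted loop over contiguous floats is the sort of thing GCC can turn into SIMD instructions when the flags above are given.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example, not from the Vega Strike source.
// Adds k * in[i] to out[i] for every index both vectors share.
// A simple counted loop over contiguous floats like this one is a
// candidate for GCC's tree vectorizer (e.g. -O2 -ftree-vectorize -msse2).
void scale_add(std::vector<float> &out, const std::vector<float> &in, float k)
{
    const std::size_t n = out.size() < in.size() ? out.size() : in.size();
    float *o = out.data();
    const float *src = in.data();
    for (std::size_t idx = 0; idx < n; ++idx) {
        o[idx] += k * src[idx];
    }
}
```

Whether a given loop actually gets vectorized depends on the compiler version and flags. GCC can report its decisions: the 4.x series has -ftree-vectorizer-verbose=N, and later releases use -fopt-info-vec.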



The idea is that such an option could be set by configure detecting the processor. But I only want to do that if it doesn't cause weird crap to happen like -O3 does.
Ed Sweetman endorses this message.
prolix
Explorer
Posts: 8
Joined: Tue Mar 11, 2008 2:28 am
Location: UK

Post by prolix »

Hi

Recompiling now, configured with this:

Code:

./configure --enable-stencil-buffer --enable-flags="-O2 \
	-march=athlon-xp -mfpmath=sse -msse \
	-ftree-vectorize -fomit-frame-pointer -pipe"
Will post back when I've used the binary a bit more.

EDIT:
Segfaulted for first time in a while. Not sure if it was related to compiling with those flags or running out of fuel. Afterburner seems to take up a lot of fuel in svn r12066.
Maybe I should compile without -fomit-frame-pointer to get a backtrace?
Miramor
ISO Party Member
Posts: 410
Joined: Tue Jun 26, 2007 7:15 pm

Post by Miramor »

safemode: I'll try the optimizations you posted and see what I get.

Post by Miramor »

Code:

src/gldrv/gl_init.cpp: In function ‘void GFXInit(int, char**)’:
src/gldrv/gl_init.cpp:440: warning: deprecated conversion from string constant to ‘char*’
src/gldrv/gl_init.cpp:440: warning: deprecated conversion from string constant to ‘char*’
src/gldrv/gl_init.cpp:453: error: expected primary-expression before ‘<<’ token
src/gldrv/gl_init.cpp:453: error: expected primary-expression before ‘<<’ token
src/gldrv/gl_init.cpp:453: error: expected primary-expression before ‘<<’ token
src/gldrv/gl_init.cpp:453: error: expected primary-expression before ‘<’ token
src/gldrv/gl_init.cpp:453: error: expected primary-expression before ‘.’ token
src/gldrv/gl_init.cpp:454: error: expected primary-expression before ‘==’ token
src/gldrv/gl_init.cpp:454: error: expected primary-expression before ‘==’ token
src/gldrv/gl_init.cpp:454: error: expected primary-expression before ‘=’ token
src/gldrv/gl_init.cpp:455: error: ‘struct gl_options_t’ has no member named ‘smooth_lines’
src/gldrv/gl_init.cpp:456: error: ‘struct gl_options_t’ has no member named ‘smooth_points’
src/gldrv/gl_init.cpp:459: error: expected primary-expression before ‘>>’ token
src/gldrv/gl_init.cpp:459: error: expected primary-expression before ‘>>’ token
src/gldrv/gl_init.cpp:459: error: expected primary-expression before ‘>>’ token
src/gldrv/gl_init.cpp:459: error: expected primary-expression before ‘>’ token
src/gldrv/gl_init.cpp:459: error: expected primary-expression before ‘.’ token
src/gldrv/gl_init.cpp:460: error: expected `;' before ‘gl_options’
make[1]: *** [src/gldrv/gl_init.o] Error 1
make[1]: Leaving directory `/home/proteus/VegaStrike/vegastrike'
make: *** [all] Error 2
ace123
Lead Network Developer
Posts: 2560
Joined: Sun Jan 12, 2003 9:13 am
Location: Palo Alto CA

Post by ace123 »

There was a SVN conflict in your gl_init.cpp file because of some modifications you may have made.

Open up the file in a text editor and search for "<<<<<". Then pick the half of the conflict you want to keep.

If you aren't sure, do a "svn revert src/gldrv/gl_init.cpp" and try 'make' again.
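
The compiler errors above ("expected primary-expression before '<<' token") are what leftover conflict markers look like to g++. As a rough sketch, a check for such markers could look like the following. The helper name `has_conflict_markers` is hypothetical, and it only looks for the 7-character opening and closing markers at the start of a line.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if any line of `text` begins with a
// 7-character conflict marker ("<<<<<<<" or ">>>>>>>").
// The "=======" separator is deliberately not checked, since a row of
// equals signs can appear in ordinary files.
bool has_conflict_markers(const std::string &text)
{
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.compare(0, 7, "<<<<<<<") == 0 ||
            line.compare(0, 7, ">>>>>>>") == 0) {
            return true;
        }
    }
    return false;
}
```

Ordinary stream code such as `std::cout << x << y;` does not trip the check, because the markers must be at the start of a line and seven characters long.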

Post by Miramor »

Oh... that's probably from my reversion of Breakable's earlier patch. Reverting that patch is necessary to make VS playable on machines with Intel graphics chipsets, otherwise VS will keep smoothing lines even when smoothing is disabled; however, the patch still hasn't been reverted in SVN as far as I know.

Post by ace123 »

Miramor, can you try applying this patch to a clean SVN?

Code:

Index: src/gldrv/gl_state.cpp
===================================================================
--- src/gldrv/gl_state.cpp	(revision 12088)
+++ src/gldrv/gl_state.cpp	(working copy)
@@ -219,14 +219,8 @@
 	  glDisable(GL_CULL_FACE);
 	  break;
 	case SMOOTH:
-		if(gl_options.smooth_lines)
-		{
-			glDisable(GL_LINE_SMOOTH);
-		}
-		if(gl_options.smooth_points)
-		{
-			glDisable (GL_POINT_SMOOTH);
-		}
+		glDisable(GL_LINE_SMOOTH);
+		glDisable (GL_POINT_SMOOTH);
 		break;
     case STENCIL:
         glDisable(GL_STENCIL);
Index: src/gfx/particle.cpp
===================================================================
--- src/gfx/particle.cpp	(revision 12088)
+++ src/gfx/particle.cpp	(working copy)
@@ -103,8 +103,12 @@
   static float psiz=XMLSupport::parse_float (vs_config->getVariable ("graphics","sparkesize","1.5"));
   
   GFXPointSize(psiz);
-  static bool psmooth=XMLSupport::parse_bool (vs_config->getVariable ("graphics","sparkesmooth","false"));  
-  glEnable(psmooth);
+  
+  static bool psmooth=XMLSupport::parse_bool (vs_config->getVariable ("graphics","sparkesmooth","false"));
+  if (psmooth && gl_options.smooth_points) {
+    glEnable(GL_POINT_SMOOTH);
+  }
+  
 #else
   GFXEnable(TEXTURE0);
   GFXDisable(TEXTURE1);
@@ -154,10 +158,7 @@
   }
   GFXEnd();
 #ifdef USE_POINTS  
-  if(gl_options.smooth_points)
-  {
-	  glDisable (GL_POINT_SMOOTH);
-  }
+  glDisable (GL_POINT_SMOOTH);
   GFXPointSize(1);
 #else
   GFXDisable(DEPTHWRITE);
I think Breakable's patch did some of the right things... but clearly it left it enabled somewhere which is bad.

I added extra conservative glDisable calls in the places where I thought they might be needed (and fixed one place that called glEnable with a boolean argument).
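
To illustrate why passing a bool to glEnable is a bug, here is a small sketch that runs without OpenGL headers. Everything in it (`mockEnable`, `enableOld`, `enablePatched`, `enabled_log`) is a stand-in invented for the example; only the numeric value 0x0B10 for GL_POINT_SMOOTH comes from gl.h.

```cpp
#include <vector>

typedef unsigned int GLenumSketch;               // stand-in for GLenum
const GLenumSketch POINT_SMOOTH_SKETCH = 0x0B10; // value of GL_POINT_SMOOTH in gl.h

// Records every capability passed to the mock, in call order.
std::vector<GLenumSketch> enabled_log;
void mockEnable(GLenumSketch cap) { enabled_log.push_back(cap); }

// Old form, like glEnable(psmooth): the bool silently converts to 0 or 1,
// so the call never names GL_POINT_SMOOTH at all.
void enableOld(bool psmooth) { mockEnable(psmooth); }

// Patched form: only enable when both settings agree, and pass the
// real capability constant.
void enablePatched(bool psmooth, bool smooth_points)
{
    if (psmooth && smooth_points) {
        mockEnable(POINT_SMOOTH_SKETCH);
    }
}
```

With a real driver, a call like glEnable(1) should just raise GL_INVALID_ENUM and enable nothing, so the old line was most likely a no-op rather than the cause of the unwanted smoothing.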

If this doesn't fix it, I'll try to take out more of his patchset until it does work :-p

The Intel Macbook doesn't have this problem as far as I can tell...

Post by Miramor »

ace123: the patch fails to apply.

Code:

$ patch -p0 <nosmooth.patch
patching file src/gldrv/gl_state.cpp
Hunk #1 FAILED at 219.
1 out of 1 hunk FAILED -- saving rejects to file src/gldrv/gl_state.cpp.rej
patching file src/gfx/particle.cpp
Hunk #1 FAILED at 103.
Hunk #2 FAILED at 158.
2 out of 2 hunks FAILED -- saving rejects to file src/gfx/particle.cpp.rej

Post by ace123 »

This is to a clean (from SVN) version of particle.cpp and gl_state.cpp, right?

I think the forum broke the spacing.
I'm attaching a patch file.

Post by Miramor »

Yep, clean... I'll see how the attached version works.

Edit: yeah that worked. Compiling now.

Post by Miramor »

Hmm, these optimizations seem to slow down VegaStrike's loading. Weird.

Post by safemode »

try with and without patch?


I would not rely on loading times to judge what helps and what doesn't. Loading depends on the video driver and the disk cache. Instead, focus on in-game behavior: for example, view Atlantis just after launch, or fly very close to a base and see how the game behaves.

Post by safemode »

I didn't use the patch posted in the thread.

I'm running on an Athlon 64 X2 (not AM2), in 64-bit mode, using gcc (GCC) 4.2.3 (Debian 4.2.3-2).

my arguments were
./configure -with-boost=1.33 --with-python-version=2.5 --enable-nvidia-cg --enable-stencil-buffer --enable-release=2 --enable-flags="-ftree-vectorize -msse -msse2 -mfpmath=sse -mmmx"


I'm using the internal boost because, for some reason, using Debian's 1.34 version of boost makes the game run slower. Not sure why.

With the configuration shown, I did not experience any instability or any change in functionality compared with simply using release=2. I profiled the code and found that a lot of functions took significantly less time.

Post by ace123 »

I have not heard back, but I can only assume that my patch didn't make things any worse... But I would like to know if it solved the problem.

I will commit it to SVN.

Post by safemode »

I'm running some further tests on just what gets vectorized and what doesn't.


If all goes well, I'll add a CPU-detecting routine to the configure file and set up an option to enable aggressive opts.

Aggressive opts will do a little more than just auto-vectorization, though not a whole lot more. I don't want to go overboard.

Mostly the extra arguments will be related to alignment and arch. Most distros set these for gcc when the gcc package is geared to a single arch.
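
The configure-side detection described here is not shown in the thread. As a complementary sketch, the C++ below reports which SIMD set the compiler was actually told to target, using GCC's predefined macros `__ALTIVEC__`, `__SSE2__` and `__SSE__`. The function name `compiled_simd_target` is hypothetical; a tester could print its result to confirm that the flags passed to configure really reached the compiler.

```cpp
#include <string>

// Hypothetical helper: names the SIMD instruction set selected at compile
// time. The macros are predefined by GCC according to -maltivec, -msse,
// -msse2 (or the defaults of the chosen -march).
std::string compiled_simd_target()
{
#if defined(__ALTIVEC__)
    return "altivec";
#elif defined(__SSE2__)
    return "sse2";
#elif defined(__SSE__)
    return "sse";
#else
    return "none";
#endif
}
```

This only reflects the build flags, not what the CPU running the binary supports.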

Post by safemode »

It appears that vectorization is slowing VS down. I'm not sure why, but profiling shows that the vectorized VS was slower than the non-vectorized build, in many cases by 40-50%. That sucks. I don't know why it's performing so much worse. My guess is alignment issues, or that the loops being vectorized are not in fast code paths, and that the standard header code that gets vectorized has misalignment issues with our own code.
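
The alignment guess is only a guess, but it is cheap to test on suspect buffers. SSE's aligned loads and stores need 16-byte-aligned addresses, and the compiler has to fall back to slower unaligned accesses or extra prologue code when it cannot prove that. A minimal sketch, with the hypothetical helper `is_aligned16`:

```cpp
#include <cstdint>

// Hypothetical helper: true if the address is a multiple of 16 bytes,
// which is what aligned SSE loads and stores require.
bool is_aligned16(const void *p)
{
    return (reinterpret_cast<std::uintptr_t>(p) % 16) == 0;
}
```

For example, a buffer declared as `alignas(16) float buf[8];` passes the check at `buf`, while `buf + 1` (4 bytes further on) does not.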

who knows.
./configure --enable-profile -with-boost=1.33 --with-python-version=2.5 --enable-nvidia-cg --enable-stencil-buffer --enable-release=2 --enable-flags="-march=k8 -m64 -ffast-math -fsingle-precision-constant -funroll-loops --param max-unroll-times=4 -funsafe-loop-optimizations -fgcse-sm -fgcse-las -maccumulate-outgoing-args"


This performed rather well. Some flags are redundant with the defaults.
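
One caveat on -ffast-math in that list: it lets GCC reorder floating-point arithmetic as if it were associative, which it is not. The sketch below (function names `sum_left` and `sum_right` are made up) shows two orderings of the same sum that give different answers under normal IEEE rules; with -ffast-math the compiler is free to pick either.

```cpp
// Hypothetical example: the same three values added in two orders.
// Without -ffast-math the compiler must keep the written order.
double sum_left(double a, double b, double c)
{
    return (a + b) + c; // cancels a and b first, then adds c
}

double sum_right(double a, double b, double c)
{
    return a + (b + c); // c can be swallowed by a much larger b
}
```

With a = 1e30, b = -1e30 and c = 1.0, the first form gives 1.0 and the second gives 0.0 when built without -ffast-math. Whether that matters for VS depends on where such sums occur, which the thread does not say.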

Post by prolix »

The auto-vectorized code is way slow, so I removed those flags. I think using 387 is faster than sse on athlon-xp, so I removed the -mfpmath=sse flag too. Seems a lot smoother now.
Not that I'm an expert at compiler settings, but I'm curious which gcc people use. I'm using Gentoo's gcc-4.1.2.
energyman76b
ISO Party Member
Posts: 445
Joined: Tue Feb 11, 2003 8:04 am

Post by energyman76b »

If you don't use an amd64 system, don't use -mfpmath=sse. If you use an amd64 system, you don't need -mfpmath=sse because it is the default if you set the right march.
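
To see which floating-point unit a given build actually uses, GCC's predefined macro `__SSE_MATH__` can be checked; it is defined when scalar float math goes through SSE. A minimal sketch, with the hypothetical function name `float_math_unit`:

```cpp
#include <string>

// Hypothetical helper: reports which unit scalar float math was compiled
// for. __SSE_MATH__ is predefined by GCC when -mfpmath=sse is in effect
// (explicitly, or as the default on x86-64).
std::string float_math_unit()
{
#if defined(__SSE_MATH__)
    return "sse";
#elif defined(__i386__) || defined(__x86_64__)
    return "387";
#else
    return "other";
#endif
}
```

On a typical amd64 build this should report "sse" without any extra flags, and on a 32-bit x86 build it should report "387" unless -mfpmath=sse was passed along with -msse.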