CineMut shader family - Opaque

Post by **chuck_starchaser** » Wed Jun 18, 2008 5:04 am

Want to work together on it, Klauss?

Here's my starting file:

//NEW SHADER (high end)
//samplers
uniform samplerCube cubeMap;
uniform sampler2D diffuseMap;
uniform sampler2D specMap;
uniform sampler2D glowMap;
uniform sampler2D normalMap;
uniform sampler2D damageMap;
uniform sampler2D detailMap;
//other uniforms
uniform vec4 cloaking; //why is cloaking a vec4, anyways?
uniform vec4 damage; //why is damage a vec4, anyways?
//how about uniform vec4 uniforms4; //.r=cloak, .g=damage, .b=?, .a=?
//envColor won't be needed, since we're fetching it from the envmap
//etceteras

//main
void main()
{
	///VARIABLE DECLARATIONS
	//vector variables
	vec4 temp4; //all-purpose vec4 variable
	vec3 eye_vec3;
	vec3 vnormal_vec3;
	vec3 normal_vec3;
	vec3 tangent_vec3;
	vec3 cotangent_vec3; //"binormal" ;-)
	vec3 reflection_vec3;
	vec3 light0_vec3;
	vec3 light1_vec3;
	//material color variables
	vec3 diff_mat3;
	vec3 spec_mat3;
	vec3 glow_mat3;
	//inferent light variables
	vec3 light0_il3;
	vec3 light1_il3;
	vec3 ambient_il3; //to be fetched from envmap via normal
	vec3 specular_il3; //to be fetched from envmap via reflect vector
	//afferent light variables
	vec3 diff_contrib_al3;
	vec3 spec_contrib_al3;
	vec3 glow_contrib_al3;
	vec3 amb_contrib_al3;
	//scalar factors and coefficients
	float ao_glow_fac1; //plain ambient occlusion factor, used for ambient contribution
	float ao_spec_fac1; //squared ambient occlusion, used for specular modulation
	float ao_diff_fac1; //square root of ambient occlusion, used for diffuse
	float is_dielectric_mat1; //0.0 = metal; 1.0 = dielectric
	float alpha_mat1;
	float gloss_mat1; //A.K.A. "shininess"
	float limited_gloss_fac1; //smooth-limited gloss to use for spotlights
	///MOSTLY TEXTURE FETCHES (start with spec, as we'll need shininess at the earliest time):
	//read specular into temp4, then spec_mat gets .rgb, and gloss_mat gets .a^2 (assume gamma=0.5)
	//read normalmap into temp4, then .rgb goes to U, .a goes to V
	//read glow texture into temp4, then .rgb^2 goes to glow_mat, and .a goes to ao_glo_fac
	//read diffuse into temp4, then rgb goes to diff_mat, a goes to alpha
	//read damage texture into temp4, then .rgb goes to damg_mat3, .a goes to is_dielectric
	//read detail texture into temp4; then .rgb modulates diffuse/spec, .a modulates shininess
	///VECTOR COMPUTATIONS (MOSTLY, some shininess post-processing interspersed):
	//mesh-derived vector computations (vnormal, tangent, cotangent)
	//soft penumbra stuff
	//self-shadow stuff
	//normalmapping-derived vector computations (normal (using tangent))
	//shininess to env map LOD
	//reflection vector computation
	//compute smooth shininess limit for spotlights
	//compute spot brightness factor from limited shininess
	//read env LOD (normal); .rgb^2 goes to ambient_il3 (assume gamma=0.5)
	//read env LOD (reflect); .rgb^2 goes to specular_il3 (assume gamma=0.5)
	//FRESNEL STUFF
	//other gammas of ambient occlusion
	ao_diff_fac1 = sqrt( ao_glow_fac1 );
	ao_spec_fac1 = ao_glow_fac1 * ao_glow_fac1;
	//AMBIENT CONTRIBUTION
	//DIFFUSE LIGHT0
	//DIFFUSE LIGHT1
	//SPEC-SPOTLIGHT LIGHT 0
	//SPEC-SPOTLIGHT LIGHT 1
	//ENVIRONMENT MAPPING
	//GLOW
	//CLOAK
	//WRITE
}

I'm already trying to optimize code by not putting lines with dependencies too close to each other; so that's why you won't see much of a relation between one code line and the next; it's intentional.
Anyways, Klauss, if you want to start throwing in some code, be my guest

First part is the hardest for me; I don't even know what the gl_TexCoord[BLAH].xyz things do, like where the numbers come from.
Needless to say, I'm not familiar with cubemap stuff; and I hope there's a textureCubeLOD() function...

And where do other interpolants come from, like attenuation?
I guess I could look it up but gotta go to bed; working early tomorrow.

EDIT:
Phewww... Yes, it exists:

Code: Select all

vec4 textureCubeLod(samplerCube s, vec3 coord , float lod);

http://www.opengl.org/sdk/docs/tutorial ... turing.php

Post by **chuck_starchaser** » Wed Jun 18, 2008 2:07 pm

Klauss, why are we using texture2DLod()? The plain texture2D() has an optional bias parameter...

Code: Select all

vec4 textureCube(samplerCube s, vec3 coord [, float bias])
vec4 textureCubeLod(samplerCube s, vec3 coord , float lod)

It would seem to me, using bias would be more correct: The sampler already computes an LOD level that supposedly should be exactly appropriate for the given envmap size and screen resolution to represent a material of the highest shininess. Shouldn't that be our starting point from which to bias towards larger blurring radii?
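
For reference, here is a minimal sketch of the two calls side by side. It assumes a GLSL 1.10/1.20 fragment shader, where `textureCubeLod` is formally available only through the `GL_ARB_shader_texture_lod` extension (some drivers accept it regardless). The helper names `env_biased` and `env_explicit`, and the use of `gl_TexCoord[1]` as a lookup direction, are placeholders for illustration and are not taken from the engine.

```glsl
#extension GL_ARB_shader_texture_lod : enable

uniform samplerCube cubeMap;

// Bias: added on top of the LOD the hardware derives from screen-space derivatives.
vec3 env_biased(vec3 dir, float blur_bias)
{
    return textureCube(cubeMap, dir, blur_bias).rgb;
}

// Explicit LOD: replaces the hardware-derived LOD entirely.
vec3 env_explicit(vec3 dir, float blur_lod)
{
    return textureCubeLod(cubeMap, dir, blur_lod).rgb;
}

void main()
{
    // Assumption: some direction interpolant lives in texcoord 1.
    vec3 dir = normalize(gl_TexCoord[1].xyz);
    // Swap in env_biased(dir, 3.0) to compare the two approaches visually.
    gl_FragColor = vec4(env_explicit(dir, 3.0), 1.0);
}
```

The practical difference is that the biased lookup still varies with distance and viewing angle, because the bias is relative to the automatic LOD, while the explicit lookup always lands on the same mip level for a given shininess.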

Post by **klauss** » Wed Jun 18, 2008 2:29 pm

You'd think, but you'll notice that using the bias introduces lots of artifacts. At least I couldn't do it artifactless with bias alone.

Post by **chuck_starchaser** » Wed Jun 18, 2008 2:58 pm

Alright, never mind, then; just checking.
EDIT:
Just curious: what kind of artifacts? I mean, the spheremaps we're using presently are artifact factories. Maybe we should try bias again, after we move to cubemaps?

Post by **chuck_starchaser** » Thu Jun 19, 2008 2:30 am

Klauss, been moving stuff over from your modified high end fp to the new shader. There's stuff I don't understand at all, in the lighting function:
float NdotL = clamp( dot(normal,light), 0.0, VNdotL );
float RdotL = clamp( dot(reflection,light), 0.0, VNdotL );
Why clamp these to VNdotL, and not 1.0? Isn't self-shadow enough?
Especially RdotL; I think that's wrong. It would only allow bumps to have shadows, but not highlights.

Post by **chuck_starchaser** » Thu Jun 19, 2008 3:57 am

Okay, here's the work in progress:

Code: Select all

//NEW SHADER (high end)
uniform int light_enabled[gl_MaxLights];
uniform int max_light_enabled;
//samplers
uniform samplerCube cubeMap;
uniform sampler2D diffMap;   //1-bit alpha in alpha, for alpha-testing only
uniform sampler2D specMap;   //sqrt(shininess) in alpha
uniform sampler2D glowMap;   //ambient occlusion in alpha
uniform sampler2D normMap;   //U in .rgb; V in alpha
uniform sampler2D damgMap;   //"is_dielectric" in 1-bit alpha
uniform sampler2D detailMap; //.rgb adds to diffuse, subtracts from spec; alpha mods shininess
//other uniforms
uniform vec4 cloakdmg; //.rg=cloak, .ba=damage
//envColor won't be needed, since we're fetching it from the envmap

//NOTE: Since the term "binormal" has been rightly deprecated, I use "cotangent" instead :)

vec3 fastnormalize( vec3 v ) //cheaper than normalize(), but only accurate when v is already close to unit length
{
	//first-order approximation of 1/sqrt(x) around x = 1.0, where x = dot(v,v)
	//("input" is a reserved word in GLSL, hence the parameter name "v")
	float tmp = dot( v, v );
	tmp = 1.5 - (0.5*tmp);
	return tmp * v;
}
vec3 norm_decode( vec4 enc ) //"enc" rather than the reserved word "input"
{
	//The LaGrande normalmap noodle does away with the z-term for the normal by encoding U and V
	//as 0.5*tan( angle ), where angle is arcsin( U ) or arcsin( V ), respectively. To fit that
	//into a 0-1 range, we multiply by 0.5 once again, and add 0.5.
	//To reverse the encoding, we first subtract 0.5, then multiply by four, fill the z term with
	//1.0, and normalize. But multiplying by four is not needed if we fill the z term with
	//0.25 instead, *then* normalize:
	vec3 result;
	result.x = 0.3333*(enc.r+enc.g+enc.b) - 0.5;
	result.y = enc.a - 0.5;
	result.z = 0.25;
	return normalize( result ); //can't use fastnormalize() here
}
vec3 imatmul(vec3 tan, vec3 cotan, vec3 norm, vec3 light)
{
    return light.xxx*tan + light.yyy*cotan + light.zzz*norm;
}
float soft_NdotL( float NdotL ) //for soft penumbras
{
	float s = 1.0 - (NdotL*NdotL); //s is 1.0 at penumbra point, falls slowly
	s *= s; //falls faster
	s *= s; //falls much faster either way from the penumbra point
	s *= NdotL; //s is now zero at penumbra but has tiny +/- wavelets to the sides
	return clamp( 0.98*NdotL + 0.02 - s, 0.0, 1.0 ); //we shrink NdotL by 2%, shift
   //it up by 2%, and subtract the s wavelet to flatten the penumbra area
}
float selfshadow(float sNdotL) //use of soft NdotL should be most correct
{
  float s = clamp(1.0 - sNdotL, 0.0, 1.0);
  s *= s;
  s *= s;
  s *= s;
  return clamp(1.0 - s, 0.0, 1.0);
}
float specularNormalizeFactor( float limited_shininess )
{
   return pow(1.7/(1.0+limited_shininess/10.0),-1.7);
}
void lightingLight
   (
   in vec3 light, in vec3 normal, in vec3 vnormal, in vec3 reflection,
   in vec3 lightDiffuse, in float lightAtt, in float ltd_gloss,
   inout vec3 diff_acc, inout vec3 spec_acc
   )
{
   float NdotL = clamp( dot(normal,light), 0.0, 1.0 );
   float sNdotL = soft_NdotL( dot(vnormal,light) );
   float RdotL = clamp( dot(reflection,light), 0.0, 1.0 );
   float selfshadow = selfshadow( sNdotL );
   float spec = pow( RdotL, ltd_gloss );
   diff_acc += ( NdotL * lightDiffuse.rgb * lightAtt * selfshadow );
   spec_acc += ( spec * lightDiffuse.rgb * lightAtt * selfshadow );
}

#define lighting(name, lightno_gl, lightno_tex) \
void name( \
   in vec3 normal, in vec3 vnormal, in  vec3 reflection, \
   in float limited_gloss, \
   inout vec3 diff_acc, inout vec3 spec_acc) \
{ \
   lightingLight( \
      normalize(gl_TexCoord[lightno_tex].xyz), \
      normal, vnormal, reflection, \
      gl_FrontLightProduct[lightno_gl].diffuse.rgb, \
      gl_TexCoord[lightno_tex].w, \
      limited_gloss, \
      diff_acc, spec_acc); \
}

lighting(lite0, 0, 5)
lighting(lite1, 1, 6)

void main()
{
	///VARIABLE DECLARATIONS
	//vector variables
	vec4 temp4; //all-purpose vec4 temporary
	vec3 eye_vec3;
	vec3 vnormal_vec3;
	vec3 normal_vec3;
	vec3 tangent_vec3;
	vec3 cotangent_vec3; //"binormal" ;-)
	vec3 reflect_vec3;
	vec3 light0_vec3;
	vec3 light1_vec3;
	//material color variables
	vec3 diff_mat3;
	vec3 damg_mat3;
	vec3 spec_mat3;
	vec3 glow_mat3;
	//inferent light variables
	vec3 light0_il3;
	vec3 light1_il3;
	vec3 ambient_il3; //to be fetched from envmap via normal
	vec3 specular_il3; //to be fetched from envmap via reflect vector
	vec3 spec_light_acc3; //specular light accumulator
	vec3 diff_light_acc3; //diffuse light accumulator
	//afferent light variables
	vec3 diff_contrib_al3;
	vec3 spec_contrib_al3;
	vec3 glow_contrib_al3;
	vec3 amb_contrib_al3;
	//scalar factors and coefficients
	float ao_glow_fac1; //plain ambient occlusion factor, used for ambient contribution
	float ao_spec_fac1; //squared ambient occlusion, used for specular modulation
	float ao_diff_fac1; //square root of ambient occlusion, used for diffuse
	float is_dielectric_mat1; //0.0 = metal; 1.0 = dielectric
	float alpha_mat1;
	float gloss_mat1; //A.K.A. "shininess"
	float limited_gloss_fac1; //smooth-limited gloss to use for spotlights
	//interpolated mesh data fetches
	vec2 texcoords2 = gl_TexCoord[0].xy;
	vnormal_vec3 = fastnormalize( gl_TexCoord[1].xyz );
	tangent_vec3 = gl_TexCoord[2].xyz;
	cotangent_vec3 = gl_TexCoord[3].xyz;
	///MOSTLY TEXTURE FETCHES (start with spec, as we'll need shininess at the earliest time):
	//read specular into temp4, then spec_mat gets .rgb, and gloss_mat gets .a^2 (assume gamma=0.5)
	temp4 = texture2D(specMap,texcoords2).rgba;
	spec_mat3 = temp4.rgb; gloss_mat1 = temp4.a;
	//read normalmap into temp4, then .rgb goes to U, .a goes to V (tangent space just for now)
	temp4 = texture2D(normMap,texcoords2).rgba;
	normal_vec3 = norm_decode( temp4 );
	//read glow texture into temp4, then .rgb^2 goes to glow_mat, and .a goes to ao_glo_fac
	temp4 = texture2D(glowMap,texcoords2).rgba;
	glow_mat3 = temp4.rgb * temp4.rgb; ao_glow_fac1 = temp4.a; 
	//read diffuse into temp4, then rgb goes to diff_mat, a goes to alpha
	temp4 = texture2D(diffMap,texcoords2).rgba;
	diff_mat3 = temp4.rgb; alpha_mat1 = temp4.a;
	//read damage texture into temp4, then .rgb goes to damg_mat3, .a goes to is_dielectric
	temp4 = texture2D(damgMap,texcoords2).rgba;
	damg_mat3 = temp4.rgb; is_dielectric_mat1 = temp4.a;
	//read detail texture into temp4; then .rgb modulates diffuse/spec, .a modulates shininess
	temp4 = texture2D(detailMap,16.0*texcoords2);
	temp4 -= vec4( 0.5 ); temp4 *= 0.12345;
	diff_mat3 -= temp4.rgb; spec_mat3 += temp4.rgb; gloss_mat1 -= temp4.a;
	///OTHER PRE-PER-LIGHT COMPUTATIONS
	//normalmapping-derived vector computations (normal (using tangent))
    normal_vec3 = fastnormalize(imatmul(tangent_vec3,cotangent_vec3,vnormal_vec3,normal_vec3));
	//reflection vector computation
	reflect_vec3 = -reflect( eye_vec3, normal_vec3 );
	//compute smooth shininess limit for spotlights
    limited_gloss_fac1 = limited_shininessMap(shininess,specmap);
    //initialize accumulators
    diff_light_acc3 = spec_light_acc3 = vec3( 0.0 );
    //and might as well compute the shininess adjusted specularity
    float some = specularNormalizeFactor( limited_gloss_fac1 );
	//and might as well compute other gammas of ambient occlusion
	ao_diff_fac1 = sqrt( ao_glow_fac1 );
	ao_spec_fac1 = ao_glow_fac1 * ao_glow_fac1;
	///PER-LIGHT COMPUTATIONS
    if( light_enabled[0] != 0 )
     lite0(normal_vec3,vnormal_vec3,reflect_vec3,limited_gloss_fac1,diff_light_acc3,spec_light_acc3);
    if( light_enabled[1] != 0 )
     lite1(normal_vec3,vnormal_vec3,reflect_vec3,limited_gloss_fac1,diff_light_acc3,spec_light_acc3);
    //process accumulations
    diff_contrib_al3 = diff_light_acc3 * diff_mat3 * ao_diff_fac1;
    spec_contrib_al3 = spec_light_acc3 * some * spec_mat3 * ao_spec_fac1;
	//FRESNEL STUFF
	///ENVIRONMENT MAPPING
	//shininess to env map LOD
	//read env LOD (normal); .rgb^2 goes to ambient_il3 (assume gamma=0.5)
	//read env LOD (reflect); .rgb^2 goes to specular_il3 (assume gamma=0.5)
	//AMBIENT CONTRIBUTION
	//GLOW
	//CLOAK
	//WRITE
}

Could you please check it, Klauss? I've made a major cleanup of the lightingLight function and macro and whatnot. I pulled the specularNormalize stuff out of it, since it's a material property that can multiply the accumulated specular light; --doesn't need to be done per-light. There were inputs that weren't being used. I replaced the specmap and material shininess inputs by a pre-computed limited shininess input, so that the limited shininess is computed only once. And I fixed those clamps by VNdotL; --see my previous post.

I stopped at the fresnel stuff, for now.
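
The hoisting of the specular normalization out of the per-light function can be sketched as follows. This is only an illustration under stated assumptions: the inputs in `main()` are constant placeholders standing in for values the real shader derives from textures and lights, and `spec_light0`/`spec_light1` are hypothetical names for the per-light specular terms.

```glsl
float specularNormalizeFactor(float ltd_gloss)
{
    // Same formula as in the shader posted above.
    return pow(1.7 / (1.0 + ltd_gloss / 10.0), -1.7);
}

void main()
{
    // Placeholder inputs (assumptions, not engine values).
    vec3  spec_mat    = vec3(0.5);
    float ltd_gloss   = 20.0;
    vec3  spec_light0 = vec3(0.2);
    vec3  spec_light1 = vec3(0.1);

    // Per-light work: accumulate only what actually depends on the light.
    vec3 spec_acc = vec3(0.0);
    spec_acc += spec_light0;
    spec_acc += spec_light1;

    // Material-only factor: evaluated once, applied to the accumulated sum.
    vec3 spec_contrib = spec_acc * specularNormalizeFactor(ltd_gloss) * spec_mat;
    gl_FragColor = vec4(spec_contrib, 1.0);
}
```

Because the factor depends only on the material, multiplying the sum once gives the same result as multiplying each light's term, with one `pow()` instead of one per light.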

Post by **klauss** » Thu Jun 19, 2008 2:46 pm

chuck_starchaser wrote:Klauss, been moving stuff over from your modified high end fp to the new shader. There's stuff I don't understand at all, in the lighting function:
float NdotL = clamp( dot(normal,light), 0.0, VNdotL );
float RdotL = clamp( dot(reflection,light), 0.0, VNdotL );
Why clamp these to VNdotL, and not 1.0? Isn't self-shadow enough?
Especially RdotL; I think that's wrong. It would only allow bumps to have shadows, but not highlights.

The idea about clamping to VNdotL is that VNdotL specifies the overall surface geometry and NdotL only smaller features. Take a look, VNdotL is prescaled (4x IIRC), so in reality it's like clamp( dot(normal,light), 0.0, 4*dot(vnormal, light) ). The idea is that NdotL could never be wildly above VNdotL, because when VNdotL approaches 0 bigger surface elements block light. It's a form of self-shadowing I found quite superior in visuals to the usual (ie: multiplying by 4*VNdotL).
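
A minimal sketch of the two variants being compared, assuming the texcoord layout used in the shader posted earlier in the thread (slot 1 for the vertex normal, slot 5 for light 0). The function names are made up for illustration, and clamping the cap itself to [0,1] is an added safeguard, not something stated in the thread.

```glsl
// Cap the bump-mapped N.L by the prescaled geometric N.L.
float ndotl_capped(vec3 normal, vec3 vnormal, vec3 light)
{
    // Keep the cap inside [0,1]: clamp() is undefined when its min exceeds its max.
    float cap = clamp(4.0 * dot(vnormal, light), 0.0, 1.0);
    return clamp(dot(normal, light), 0.0, cap);
}

// The "usual" alternative: multiply by the prescaled geometric N.L.
float ndotl_scaled(vec3 normal, vec3 vnormal, vec3 light)
{
    float shadow = clamp(4.0 * dot(vnormal, light), 0.0, 1.0);
    return clamp(dot(normal, light), 0.0, 1.0) * shadow;
}

void main()
{
    vec3 vnormal = normalize(gl_TexCoord[1].xyz); // interpolated vertex normal
    vec3 light   = normalize(gl_TexCoord[5].xyz); // direction to light 0
    vec3 normal  = vnormal; // placeholder: a real shader would use the normal-mapped normal
    gl_FragColor = vec4(vec3(ndotl_capped(normal, vnormal, light)), 1.0);
}
```

For example, with a geometric N.L of 0.1 the cap is 0.4. A bump-mapped N.L of 0.3 passes through the capped version unchanged, while the multiplied version darkens it to 0.12.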

I'll try to take a look at your shader but I've been having major trouble with shaders lately. Remember the utter slowness on some cards (nVidia BTW) when approaching large stations, asteroid belts, and such? I tracked it: it's the vertex shader. nVidia's compiler doesn't like the macro magic I did to make ATI run in hardware. But ATI doesn't like for loops that make nVidia happy. I'm at an impasse. I don't know what to do, except add a vssetup option "nVidia vs ATI".

I say this because your shaders are one huge basic block. Chuck... one huge basic block may be good in microcode, but reading it is painful! And it's exactly what nVidia doesn't like! Shader compilers can optimize a helluva lot, you can add a function or two, they will be inlined if possible, and a loop or two, it will be unrolled if possible (at least nVidia does it right). You can use nvShaderPerf (great tool) to print out the generated assembly if you wish.

But if doing that will break ATI... then it's something we must resolve first don't you think? The very very bad news is that the P3 in which I had the ATI R9800 installed just became kaputt... HD doesn't work right, and I doubt it's the HD (it's new), I think it's the IDE controller, which would mean the mobo, so... bad bad news - I won't be able to easily check this out. BTW: do you know of any good mobo manufacturer? I'm getting disenchanted with ASUS, and I know of no decent manufacturer except perhaps intel (though I'm not ecstatic about them) or something with nVidia chipsets.

Post by **safemode** » Thu Jun 19, 2008 2:56 pm

Get away from VIA-based chipsets. Nvidia chipsets are awesome.

For now the idea of having to select your video card maker isn't bad. In the next incarnation of vssetup, this could be autodetected via GL driver data.

Doesn't seem like we'll be able to avoid having an Nvidia-specific shader and an ATI-specific one.

Post by **chuck_starchaser** » Thu Jun 19, 2008 2:58 pm

klauss wrote:The idea about clamping to VNdotL is that VNdotL specifies the overall surface geometry and NdotL only smaller features. Take a look, VNdotL is prescaled (4x IIRC), so in reality it's like clamp( dot(normal,light), 0.0, 4*dot(vnormal, light) ). The idea is that NdotL could never be wildly above VNdotL, because when VNdotL approaches 0 bigger surface elements block light. It's a form of self-shadowing I found quite superior in visuals to the usual (ie: multiplying by 4*VNdotL).

Ok, I get it now. I guess I confused myself by removing the 4.0*... first, and *then* looking at it.

I'll put it back, but I think I'll use 4.0*sNdotL so that bump highlights don't suddenly disappear towards the penumbra.

klauss wrote:I'll try to take a look at your shader but I've been having major trouble with shaders lately. Remember the utter slowness on some cards (nVidia BTW) when approaching large stations, asteroid belts, and such? I tracked it: it's the vertex shader. nVidia's compiler doesn't like the macro magic I did to make ATI run in hardware. But ATI doesn't like for loops that make nVidia happy. I'm at an impasse. I don't know what to do, except add a vssetup option "nVidia vs ATI".

Ouch!

klauss wrote:I say this because your shaders are one huge basic block. Chuck... one huge basic block may be good in microcode, but reading it is painful!

Not true. You're probably looking at the first post, which had almost no code; mostly just comments to be followed by code later. If you look at my last post, I'm using as many functions as you are.

klauss wrote:And it's exactly what nVidia doesn't like! Shader compilers can optimize a helluva lot, you can add a function or two, they will be inlined if possible; and a loop or two, it will be unrolled if possible (at least nVidia does it right).

Indeed. Not just "if possible"; they WILL be. It's a long, straight pipeline; isn't it?

klauss wrote:You can use nvShaderPerf (great tool) to print out the generated assembly if you wish.

Ok, thanks; I'll look for it.

safemode wrote:For now the idea of having to select your video card maker isn't bad. In the next incarnation of vssetup, this could be autodetected via GL driver data.

Doesn't seem like we'll be able to avoid having an Nvidia-specific shader and an ATI-specific one.

Autodetection would be the way to go, I would think. Vssetup is already pretty rich in options, and getting richer...
Maintaining this constantly growing set of shaders is going to become a nightmare, pretty soon. Shaders multiply faster than rabbits, it seems. Maybe we need code to put them together from smaller fragments; like this macros vs for loops thing; perhaps this dichotomy could be isolated and inserted into the shader appropriately at run-time?
I'm thinking, we could use the current /programs folder for shader fragments, and then the engine could assemble shaders from those fragments and put them in a .vegastrike/programs folder.

Post by **klauss** » Thu Jun 19, 2008 3:27 pm

chuck_starchaser wrote:Not true. You're probably looking at the first post. If you look at my last post, I'm using as many functions as you are.

Ok, yes, but the very same thing I do is wrong for nVidia (using macros instead of functions, unroll by hand instead of using for loops).

chuck_starchaser wrote:Indeed. Not just "if possible"; they WILL be. It's a long, straight pipeline; isn't it?

Not in shader model 3, they can actually loop and branch.

Post by **chuck_starchaser** » Thu Jun 19, 2008 3:44 pm

klauss wrote:
chuck_starchaser wrote:Not true. You're probably looking at the first post. If you look at my last post, I'm using as many functions as you are.
Ok, yes, but the very same thing I do is wrong for nVidia (using macros instead of functions, unroll by hand instead of using for loops).

I thought of replacing the macro with a for loop, but I actually didn't know how to write it and wasn't sure if it was the right thing. Could you show me how to do the for loop version of the lighting? Just the for statement; I can fill in the rest. Could we have some kind of #if #else #endif for now to differentiate ATI from nVidia code? Are there preprocessor conditionals in GLSL?
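
GLSL does have a C-like preprocessor (`#define`, `#ifdef`, `#if`, `#else`, `#endif`), so both variants can live in one file. The sketch below is one possible arrangement, not the engine's actual code: `USE_LIGHT_LOOP` is a hypothetical define that the engine would prepend to the source for drivers that prefer loops, and `lightingLight` is cut down to a diffuse-only stand-in. Note that indexing `gl_TexCoord` with a loop variable requires redeclaring the array with an explicit size, and the vertex shader would have to agree with that size.

```glsl
// Assumption: the engine prepends "#define USE_LIGHT_LOOP" when loops are preferred.
uniform int light_enabled[gl_MaxLights];

#ifdef USE_LIGHT_LOOP
// Non-constant indices need an explicitly sized redeclaration of gl_TexCoord.
varying vec4 gl_TexCoord[7];
#endif

// Diffuse-only stand-in for the full lighting function.
void lightingLight(in vec3 light, in vec3 normal, in vec3 lightDiffuse,
                   in float lightAtt, inout vec3 diff_acc)
{
    diff_acc += clamp(dot(normal, light), 0.0, 1.0) * lightDiffuse * lightAtt;
}

void main()
{
    vec3 normal = normalize(gl_TexCoord[1].xyz);
    vec3 diff_acc = vec3(0.0);
#ifdef USE_LIGHT_LOOP
    // Loop version: light i reads its vector from texcoord 5+i.
    for (int i = 0; i < 2; ++i)
    {
        if (light_enabled[i] != 0)
            lightingLight(normalize(gl_TexCoord[5 + i].xyz), normal,
                          gl_FrontLightProduct[i].diffuse.rgb,
                          gl_TexCoord[5 + i].w, diff_acc);
    }
#else
    // Hand-unrolled version: constant indices only.
    if (light_enabled[0] != 0)
        lightingLight(normalize(gl_TexCoord[5].xyz), normal,
                      gl_FrontLightProduct[0].diffuse.rgb,
                      gl_TexCoord[5].w, diff_acc);
    if (light_enabled[1] != 0)
        lightingLight(normalize(gl_TexCoord[6].xyz), normal,
                      gl_FrontLightProduct[1].diffuse.rgb,
                      gl_TexCoord[6].w, diff_acc);
#endif
    gl_FragColor = vec4(diff_acc, 1.0);
}
```

Keeping the shared function outside the conditional means only the dispatch differs between vendors, which is also roughly what a fragment-assembly scheme would isolate.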

chuck_starchaser wrote:Indeed. Not just "if possible"; they WILL be. It's a long, straight pipeline; isn't it?
Not in shader model 3, they can actually loop and branch.

Can't get that through my head. How is that even possible? Are you sure they are not inlined/unrolled? You have a link?

EDIT:
By the way,

klauss wrote:Remember the utter slowness on some cards (nVidia BTW) when approaching large stations, asteroid belts, and such? I tracked it: it's the vertex shader.

CONGRATULATIONS!

Post by **safemode** » Thu Jun 19, 2008 3:49 pm

Something tells me that the shader differences between nvidia and ati won't allow a modular system where we pick and choose pieces and put together a complete shader.

We can have vssetup determine what shader set to use (nvidia or ati) automatically; then the engine can determine what shader to use based on framerate, like we do now: periodically we poll the framerate, and if it's below a threshold, we drop to the next lower shader.

Thus, the only shader option we need to expose is a "max" shader. This would be the shader we start out from.

If you can figure out how to do modular shading that wouldn't be more complicated than just having a separate full shader, that would be cool to outline here.

Post by **chuck_starchaser** » Thu Jun 19, 2008 5:47 pm

Damn! I'd completely missed this paragraph.

klauss wrote:But if doing that will break ATI... then it's something we must resolve first don't you think? The very very bad news is that the P3 in which I had the ATI R9800 installed just became kaputt... HD doesn't work right, and I doubt it's the HD (it's new), I think it's the IDE controller, which would mean the mobo, so... bad bad news - I won't be able to easily check this out. BTW: do you know of any good mobo manufacturer? I'm getting disenchanted with ASUS, and I know of no decent manufacturer except perhaps intel (though I'm not ecstatic about them) or something with nVidia chipsets.

I became disenchanted with ASUS years ago. My current mobo (getting a bit oldish now, but rock-solid) is an Abit KN8. Socket 939 Athlon64 mobo, with nVidia chipset.
One thing I look for in mobos, since my bad experiences with ASUS, is the chipset fan. In the KN8 it is replaceable. In the last ASUS mobo I had, the chipset fan was glued to the northbridge AND riveted to the board; and it failed in less than six months. I've sworn to never buy anything ASUS since then.

Post by **chuck_starchaser** » Thu Jun 19, 2008 9:21 pm

Now with nvShaderPerf I've managed to get rid of a lot of typos and such; the shader is compiling, at least; though it's not finished yet.

Code: Select all

//NEW SHADER (high end)
uniform int light_enabled[gl_MaxLights];
uniform int max_light_enabled;
//samplers
uniform samplerCube cubeMap;
uniform sampler2D diffMap;   //1-bit alpha in alpha, for alpha-testing only
uniform sampler2D specMap;   //sqrt(shininess) in alpha
uniform sampler2D glowMap;   //ambient occlusion in alpha
uniform sampler2D normMap;   //U in .rgb; V in alpha
uniform sampler2D damgMap;   //"is_dielectric" in 1-bit alpha
uniform sampler2D detailMap; //.rgb adds to diffuse, subtracts from spec; alpha mods shininess
//other uniforms
uniform vec4 cloakdmg; //.rg=cloak, .ba=damage
//envColor won't be needed, since we're fetching it from the envmap

//NOTE: Since the term "binormal" has been rightly deprecated, I use "cotangent" instead :)

vec3 fastnormalize( vec3 v ) //cheaper than normalize(), but only accurate when v is already close to unit length
{
	//first-order approximation of 1/sqrt(x) around x = 1.0, where x = dot(v,v)
	//("input" is a reserved word in GLSL, hence the parameter name "v")
	float tmp = dot( v, v );
	tmp = 1.5 - (0.5*tmp);
	return tmp * v;
}
vec3 norm_decode( vec4 enc ) //"enc" rather than the reserved word "input"
{
	//The LaGrande normalmap noodle does away with the z-term for the normal by encoding U and V
	//as 0.5*tan( angle ), where angle is arcsin( U ) or arcsin( V ), respectively. To fit that
	//into a 0-1 range, we multiply by 0.5 once again, and add 0.5.
	//To reverse the encoding, we first subtract 0.5, then multiply by four, fill the z term with
	//1.0, and normalize. But multiplying by four is not needed if we fill the z term with
	//0.25 instead, *then* normalize:
	vec3 result;
	result.x = 0.3333*(enc.r+enc.g+enc.b) - 0.5;
	result.y = enc.a - 0.5;
	result.z = 0.25;
	return normalize( result ); //can't use fastnormalize() here
}
vec3 imatmul(vec3 tan, vec3 cotan, vec3 norm, vec3 light)
{
    return light.xxx*tan + light.yyy*cotan + light.zzz*norm;
}
float limited_shininess( float shine )
{
  float limit = 50.0; //50^2 is 2500. 2500*0.001 = 2.5 --enough risk of saturation!
  return (shine*limit)/(shine+limit);
}
float soft_NdotL( float NdotL ) //for soft penumbras
{
   float s = 1.0 - (NdotL*NdotL); //s is 1.0 at penumbra point, falls slowly
   s *= s; //falls faster
   s *= s; //falls much faster either way from the penumbra point
   s *= NdotL; //s is now zero at penumbra but has tiny +/- wavelets to the sides
   return clamp( 0.98*NdotL + 0.02 - s, 0.0, 1.0 ); //we shrink NdotL by 2%, shift
   //it up by 2%, and subtract the s wavelet to flatten the penumbra area
}
float selfshadow(float sNdotL) //use of soft NdotL should be most correct
{
  float s = clamp(1.0 - sNdotL, 0.0, 1.0);
  s *= s;
  s *= s;
  s *= s;
  return clamp(1.0 - s, 0.0, 1.0);
}
float specularNormalizeFactor( float limited_shininess )
{
   return pow(1.7/(1.0+limited_shininess/10.0),-1.7);
}
void lightingLight
   (
   in vec3 light, in vec3 normal, in vec3 vnormal, in vec3 reflection,
   in vec3 lightDiffuse, in float lightAtt, in float ltd_gloss,
   inout vec3 diff_acc, inout vec3 spec_acc
   )
{
   float NdotL = clamp( dot(normal,light), 0.0, 1.0 );
   float sNdotL = soft_NdotL( dot(vnormal,light) );
   float RdotL = clamp( dot(reflection,light), 0.0, 1.0 );
   float selfshadow = selfshadow( sNdotL );
   float spec = pow( RdotL, ltd_gloss );
   diff_acc += ( NdotL * lightDiffuse.rgb * lightAtt * selfshadow );
   spec_acc += ( spec * lightDiffuse.rgb * lightAtt * selfshadow );
}

#define lighting(name, lightno_gl, lightno_tex) \
void name( \
   in vec3 normal, in vec3 vnormal, in  vec3 reflection, \
   in float limited_gloss, \
   inout vec3 diff_acc, inout vec3 spec_acc) \
{ \
   lightingLight( \
      normalize(gl_TexCoord[lightno_tex].xyz), \
      normal, vnormal, reflection, \
      gl_FrontLightProduct[lightno_gl].diffuse.rgb, \
      gl_TexCoord[lightno_tex].w, \
      limited_gloss, \
      diff_acc, spec_acc); \
}

lighting(lite0, 0, 5)
lighting(lite1, 1, 6)

void main()
{
	///VARIABLE DECLARATIONS
	//vector variables
	vec4 temp4; //all-purpose vec4 temporary
	vec3 eye_vec3;
	vec3 vnormal_vec3;
	vec3 normal_vec3;
	vec3 tangent_vec3;
	vec3 cotangent_vec3; //"binormal" ;-)
	vec3 reflect_vec3;
	vec3 light0_vec3;
	vec3 light1_vec3;
	//material color variables
	vec3 diff_mat3;
	vec3 damg_mat3;
	vec3 spec_mat3;
	vec3 glow_mat3;
	//inferent light variables
	vec3 light0_il3;
	vec3 light1_il3;
	vec3 ambient_il3; //to be fetched from envmap via normal
	vec3 specular_il3; //to be fetched from envmap via reflect vector
	vec3 spec_light_acc3; //specular light accumulator
	vec3 diff_light_acc3; //diffuse light accumulator
	//afferent light variables
	vec3 diff_contrib_al3;
	vec3 spec_contrib_al3;
	vec3 glow_contrib_al3;
	vec3 amb_contrib_al3;
	//scalar factors and coefficients
	float ao_glow_fac1; //plain ambient occlusion factor, used for ambient contribution
	float ao_spec_fac1; //squared ambient occlusion, used for specular modulation
	float ao_diff_fac1; //square root of ambient occlusion, used for diffuse
	float is_dielectric_mat1; //0.0 = metal; 1.0 = dielectric
	float alpha_mat1;
	float gloss_mat1; //A.K.A. "shininess"
	float limited_gloss_fac1; //smooth-limited gloss to use for spotlights
	//interpolated mesh data fetches
	vec2 texcoords2 = gl_TexCoord[0].xy;
	vnormal_vec3 = fastnormalize( gl_TexCoord[1].xyz );
	tangent_vec3 = gl_TexCoord[2].xyz;
	cotangent_vec3 = gl_TexCoord[3].xyz;
	///MOSTLY TEXTURE FETCHES (start with spec, as we'll need shininess at the earliest time):
	//read specular into temp4, then spec_mat gets .rgb, and gloss_mat gets .a^2 (assume gamma=0.5)
	temp4 = texture2D(specMap,texcoords2).rgba;
	spec_mat3 = temp4.rgb; gloss_mat1 = clamp( 255.0 * temp4.a * temp4.a, 1.0, 255.0 );
	//read normalmap into temp4, then .rgb goes to U, .a goes to V (tangent space just for now)
	temp4 = texture2D(normMap,texcoords2).rgba;
	normal_vec3 = norm_decode( temp4 );
	//read glow texture into temp4, then .rgb^2 goes to glow_mat, and .a goes to ao_glo_fac
	temp4 = texture2D(glowMap,texcoords2).rgba;
	glow_mat3 = temp4.rgb * temp4.rgb; ao_glow_fac1 = temp4.a; 
	//read diffuse into temp4, then rgb goes to diff_mat, a goes to alpha
	temp4 = texture2D(diffMap,texcoords2).rgba;
	diff_mat3 = temp4.rgb; alpha_mat1 = temp4.a;
	//read damage texture into temp4, then .rgb goes to damg_mat3, .a goes to is_dielectric
	temp4 = texture2D(damgMap,texcoords2).rgba;
	damg_mat3 = temp4.rgb; is_dielectric_mat1 = temp4.a;
	//read detail texture into temp4; then .rgb modulates diffuse/spec, .a modulates shininess
	temp4 = texture2D(detailMap,16.0*texcoords2);
	temp4 -= vec4( 0.5 ); temp4 *= 0.12345;
	diff_mat3 -= temp4.rgb; spec_mat3 += temp4.rgb; gloss_mat1 -= temp4.a;
	///OTHER PRE-PER-LIGHT COMPUTATIONS
	//normalmapping-derived vector computations (normal (using tangent))
    normal_vec3 = fastnormalize(imatmul(tangent_vec3,cotangent_vec3,vnormal_vec3,normal_vec3));
	//reflection vector computation
	//NOTE: eye_vec3 is declared above but not assigned yet; it has to be filled in before this point
	reflect_vec3 = -reflect( eye_vec3, normal_vec3 );
	//compute smooth shininess limit for spotlights
    limited_gloss_fac1 = limited_shininess( gloss_mat1 );
    //initialize accumulators
    diff_light_acc3 = spec_light_acc3 = vec3( 0.0 );
    //and might as well compute the shininess adjusted specularity
    float some = specularNormalizeFactor( limited_gloss_fac1 );
	//and might as well compute other gammas of ambient occlusion
	ao_diff_fac1 = sqrt( ao_glow_fac1 );
	ao_spec_fac1 = ao_glow_fac1 * ao_glow_fac1;
	///PER-LIGHT COMPUTATIONS
    if( light_enabled[0] != 0 )
     lite0(normal_vec3,vnormal_vec3,reflect_vec3,limited_gloss_fac1,diff_light_acc3,spec_light_acc3);
    if( light_enabled[1] != 0 )
     lite1(normal_vec3,vnormal_vec3,reflect_vec3,limited_gloss_fac1,diff_light_acc3,spec_light_acc3);
    //we will process the accumulators later, to give the above loops time to finish
	///FRESNEL STUFF
	//Fresnel evaluates to a coefficient that will be used to blend white specular specularity with
	//the diffuse AND specular contributions, in the case of dielectrics. For non-dielectrics, fresnel
	//will be zero. Shininess for fresnel specularity is always maxed out. Specularity and shininess
	//specified through textures, with a dielectric material, will constitute a "third layer" for the
	//material, and will allow representation of metallized paints.
	//Fresnel:
	float fresnel_alpha = 1.0 - clamp( dot( eye_vec3, normal_vec3 ), 0.0, 1.0 );
	fresnel_alpha *= fresnel_alpha;
	fresnel_alpha = clamp( 0.0625 + ( 0.9375 * fresnel_alpha ), 0.0625, 1.0 );
	fresnel_alpha *= is_dielectric_mat1;
	///ENVIRONMENT MAPPING
	//shininess to env map LOD
	//read env LOD (normal); .rgb^2 goes to ambient_il3 (assume gamma=0.5)
	//read env LOD (reflect); .rgb^2 goes to specular_il3 (assume gamma=0.5)
	//AMBIENT CONTRIBUTION
	//GLOW
    //process accumulations
    diff_contrib_al3 = diff_light_acc3 * diff_mat3 * ao_diff_fac1;
    spec_contrib_al3 = spec_light_acc3 * some * spec_mat3 * ao_spec_fac1;
	//CLOAK
	//WRITE
}

Post by **chuck_starchaser** » Fri Jun 20, 2008 2:13 am

Klauss, I'm getting rid of xmesh shininess awareness, as well as this...

    return min(7.0, max(0.0,7.0-log2(shininess+1.0))+3.0*(1.0+envColor.a));

...insanity. I know you explained to me the reasons for having all these parameters that add to and multiply with the shininess, but it's INSANE. No artist will ever grasp such a convoluted mess. I have shininess encoded with gamma = 0.5, so we can span 255 and still get good resolution at the low end, where it really matters. Besides, fresnel reflectivity won't even use textureCubeLod; it will use plain textureCube. (Yes, I separated the reads. If you indicate that a material is dielectric, it only means that the top layer is dielectric, and that's usually a glossy layer; fresnel reflection is always white, and its reflectivity is immutable, coming straight from the fresnel formula. But if you specify a specular color other than black, it means there's specularity to the stuff underneath the top layer, and you might also specify a shininess for it, e.g. metallized car paint.) Anyways, I got,

float shininess2Lod(float shininess) 
{ 
    return clamp( 7.0 - log2( shininess + 1.0 ), 0.0, 7.0 );
}
float limited_shininess( float shine )
{
    float limit = 50.0; //50^2 is 2500. 2500*0.001 = 2.5 --enough risk of saturation!
    return (shine*limit)/(shine+limit);
}
float specularNormalizeFactor( float limited_shininess )
{
    return pow(1.7/(1.0+limited_shininess/10.0),-1.7);
}
vec3 ambientMapping( in vec3 normal )
{
    vec4 result = textureCubeLod( cubeMap, normal, 7.7 );
	return result.rgb * result.a;
}
vec3 envMapping( in vec3 reflection )
{
	vec4 result = textureCube( cubeMap, reflection );
	return result.rgb * result.a;
}
vec3 envMappingLOD( in vec3 reflection, in float LoD )
{
    vec4 result = textureCubeLod( cubeMap, reflection, LoD );
	return result.rgb * result.a;
}

Or... do you think I should encode shininess logarithmically?
I'm thinking, we could save some instructions that way, because we're taking the log of the shininess for LOD purposes, anyways...
And it seems to me, intuitively, that it would be the best encoding.
No need to clamp, either, because,
pow( 255.0, incoming_shininess )
is by definition 1 when the input is zero and 255 when the input is 1.0.
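
To see what the two encodings do with the alpha byte, here is a quick Python sketch (not shader code). It assumes the gamma = 0.5 decode is 255 * alpha^2 clamped to [1, 255], and that the logarithmic decode is pow( 255.0, alpha ) as proposed above; the names decode_gamma and decode_log are made up for the comparison.

```python
import math

def decode_gamma(alpha):
    # gamma = 0.5 encoding: shininess = 255 * alpha^2, clamped to [1, 255]
    return min(255.0, max(1.0, 255.0 * alpha * alpha))

def decode_log(alpha):
    # logarithmic encoding: shininess = 255^alpha, already in [1, 255] without a clamp
    return math.pow(255.0, alpha)

# Print the decoded shininess for a few byte values of the alpha channel.
for byte in (0, 8, 16, 64, 128, 192, 255):
    a = byte / 255.0
    print(byte, round(decode_gamma(a), 3), round(decode_log(a), 3))
```

With the gamma encoding, byte values 0 through 15 all land on the clamp at 1.0, while the logarithmic encoding grows by a constant factor of roughly 2.2% per byte step across the whole range.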

BTW, I'm multiplying by alpha in the returns of those envmap functions with a view to possibly using the alpha channel to increase precision.
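
As for limited_shininess() in the listing above: the formula shine*limit/(shine+limit) is a soft ceiling. It tracks the input for small values and bends towards the limit without ever reaching it. A minimal Python sketch of the same expression, with the limit exposed as a parameter purely for illustration:

```python
def limited_shininess(shine, limit=50.0):
    # Same expression as the shader function: close to shine when shine << limit,
    # approaches (but never reaches) limit as shine grows.
    return (shine * limit) / (shine + limit)

# Tabulate the soft limit over the 1..255 shininess range.
for shine in (1.0, 10.0, 50.0, 100.0, 255.0):
    print(shine, round(limited_shininess(shine), 2))
```

So a texture shininess of 255 ends up as roughly 41.8 for the spotlights, and an input equal to the limit comes out at exactly half of it.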

Post by **chuck_starchaser** » Fri Jun 20, 2008 3:34 am

Alright, Klauss, here's a full update. ALMOST finished. I forgot to do the damage stuff...

//NEW SHADER (high end)
uniform int light_enabled[gl_MaxLights];
uniform int max_light_enabled;
//samplers
uniform samplerCube cubeMap;
uniform sampler2D diffMap;   //1-bit alpha in alpha, for alpha-testing only
uniform sampler2D specMap;   //sqrt(shininess) in alpha
uniform sampler2D glowMap;   //ambient occlusion in alpha
uniform sampler2D normMap;   //U in .rgb; V in alpha
uniform sampler2D damgMap;   //"is_dielectric" in 1-bit alpha
uniform sampler2D detailMap; //.rgb adds to diffuse, subtracts from spec; alpha mods shininess
//other uniforms
uniform vec4 cloakdmg; //.rg=cloak, .ba=damage
//envColor won't be needed, since we're fetching it from the envmap

//NOTE: Since the term "binormal" has been rightly deprecated, I use "cotangent" instead :)

vec3 fastnormalize( vec3 v ) //less accurate than normalize() but should use fewer instructions
{
	//only a good approximation when length(v) is already close to 1.0
	float tmp = dot( v, v );
	tmp = 1.5 - (0.5*tmp);
	return tmp * v;
}
vec3 norm_decode( vec4 nm )
{
	//The LaGrande normalmap noodle does away with the z-term for the normal by encoding U and V
	//as 0.5*tan( angle ), where angle is arcsin( U ) or arcsin( V ), respectively. To fit that
	//into a 0-1 range, we multiply by 0.5 once again, and add 0.5.
	//To reverse the encoding, we first subtract 0.5, then multiply by four, fill the z term with
	//1.0, and normalize. But multiplying by four is not needed if we fill the z term with
	//0.25 instead; *then* normalize:
	vec3 result;
	result.x = 0.3333*(nm.r+nm.g+nm.b) - 0.5;
	result.y = nm.a - 0.5;
	result.z = 0.25;
	return normalize( result ); //can't use fastnormalize() here
}
vec3 imatmul(vec3 tan, vec3 cotan, vec3 norm, vec3 light)
{
	return light.xxx*tan + light.yyy*cotan + light.zzz*norm;
}
float shininess2Lod(float shininess) 
{ 
	return clamp( 7.0 - log2( shininess + 1.0 ), 0.0, 7.0 );
}
float limited_shininess( float shine )
{
	float limit = 50.0; //50^2 is 2500. 2500*0.001 = 2.5 --enough risk of saturation!
	return (shine*limit)/(shine+limit);
}
float specularNormalizeFactor( float limited_shininess )
{
	return pow(1.7/(1.0+limited_shininess/10.0),-1.7);
}
vec3 ambientMapping( in vec3 normal )
{
	vec4 result = textureCubeLod( cubeMap, normal, 7.7 );
	return result.rgb * result.a;
}
vec3 envMapping( in vec3 reflection )
{
	vec4 result = textureCube( cubeMap, reflection );
	return result.rgb * result.a;
}
vec3 envMappingLOD( in vec3 reflection, in float LoD )
{
	vec4 result = textureCubeLod( cubeMap, reflection, LoD );
	return result.rgb * result.a;
}
float soft_NdotL( float NdotL ) //for soft penumbras
{
	float s = 1.0 - (NdotL*NdotL); //s is 1.0 at penumbra point, falls slowly
	s *= s; //falls faster
	s *= s; //falls much faster either way from the penumbra point
	s *= NdotL; //s is now zero at penumbra but has tiny +/- wavelets to the sides
	return clamp( 0.98*NdotL + 0.02 - s, 0.0, 1.0 ); //we shrink NdotL by 2%, shift
	//it up by 2%, and subtract the s wavelet to flatten the penumbra area
}
float selfshadow(float sNdotL) //use of soft NdotL should be most correct
{
	float s = clamp(1.0 - sNdotL, 0.0, 1.0);
	s *= s;
	s *= s;
	s *= s;
	return clamp(1.0 - s, 0.0, 1.0);
}
void lightingLight
   (
   in vec3 light, in vec3 normal, in vec3 vnormal, in vec3 reflection,
   in vec3 lightDiffuse, in float lightAtt, in float ltd_gloss,
   inout vec3 diff_acc, inout vec3 spec_acc
   )
{
	float NdotL = clamp( dot(normal,light), 0.0, 1.0 );
	float sNdotL = soft_NdotL( dot(vnormal,light) );
	float RdotL = clamp( dot(reflection,light), 0.0, 1.0 );
	float selfshadow = selfshadow( sNdotL );
	float spec = pow( RdotL, ltd_gloss );
	diff_acc += ( NdotL * lightDiffuse.rgb * lightAtt * selfshadow );
	//NOTE: spec is not multiplied in below, so the compiler drops the pow() above as dead code
	spec_acc += ( lightDiffuse.rgb * lightAtt * selfshadow );
}

#define lighting(name, lightno_gl, lightno_tex) \
void name( \
   in vec3 normal, in vec3 vnormal, in  vec3 reflection, \
   in float limited_gloss, \
   inout vec3 diff_acc, inout vec3 spec_acc) \
{ \
	lightingLight( \
	  normalize(gl_TexCoord[lightno_tex].xyz), \
	  normal, vnormal, reflection, \
	  gl_FrontLightProduct[lightno_gl].diffuse.rgb, \
	  gl_TexCoord[lightno_tex].w, \
	  limited_gloss, \
	  diff_acc, spec_acc); \
}

lighting(lite0, 0, 5)
lighting(lite1, 1, 6)

void main()
{
	///VARIABLE DECLARATIONS
	//vector variables
	vec4 temp4; //all-purpose vec4 temporary
	vec3 eye_vec3;
	vec3 vnormal_vec3;
	vec3 normal_vec3;
	vec3 tangent_vec3;
	vec3 cotangent_vec3; //"binormal" ;-)
	vec3 reflect_vec3;
	vec3 light0_vec3;
	vec3 light1_vec3;
	//material color variables
	vec3 diff_mat3;
	vec3 damg_mat3;
	vec3 spec_mat3;
	vec3 glow_mat3;
	//inferent light variables
	vec3 light0_il3;
	vec3 light1_il3;
	vec3 ambient_il3; //to be fetched from envmap via normal
	vec3 specular_il3; //to be fetched from envmap via reflect vector
	vec3 spec_light_acc3; //specular light accumulator
	vec3 diff_light_acc3; //diffuse light accumulator
	//afferent light variables
	vec3 diff_contrib_al3;
	vec3 spec_contrib_al3;
	vec3 envm_contrib_al3;
	vec3 frsn_contrib_al3;
	vec3 glow_contrib_al3;
	vec3 amb_contrib_al3;
	//accumulator:
	vec4 result4;
	//scalar factors and coefficients
	float ao_glow_fac1; //plain ambient occlusion factor, used for ambient contribution
	float ao_spec_fac1; //squared ambient occlusion, used for specular modulation
	float ao_diff_fac1; //square root of ambient occlusion, used for diffuse
	float is_dielectric_mat1; //0.0 = metal; 1.0 = dielectric
	float gloss_mat1; //A.K.A. "shininess"
	float limited_gloss_fac1; //smooth-limited gloss to use for spotlights
	//interpolated mesh data fetches
	vec2 texcoords2 = gl_TexCoord[0].xy;
	vnormal_vec3 = fastnormalize( gl_TexCoord[1].xyz );
	tangent_vec3 = gl_TexCoord[2].xyz;
	cotangent_vec3 = gl_TexCoord[3].xyz;
	///MOSTLY TEXTURE FETCHES (start with spec, as we'll need shininess at the earliest time):
	//read specular into temp4, then spec_mat gets .rgb, and gloss_mat gets .a^2 (assume gamma=0.5)
	temp4 = texture2D(specMap,texcoords2).rgba;
	spec_mat3 = temp4.rgb; gloss_mat1 = clamp( 255.0 * temp4.a * temp4.a, 1.0, 255.0 );
	//read normalmap into temp4, then .rgb goes to U, .a goes to V (tangent space just for now)
	temp4 = texture2D(normMap,texcoords2).rgba;
	normal_vec3 = norm_decode( temp4 );
	//read glow texture into temp4, then .rgb^2 goes to glow_mat, and .a goes to ao_glow_fac
	temp4 = texture2D(glowMap,texcoords2).rgba;
	glow_mat3 = temp4.rgb * temp4.rgb; ao_glow_fac1 = temp4.a; 
	//read diffuse into temp4, then rgb goes to diff_mat, a goes to alpha
	temp4 = texture2D(diffMap,texcoords2).rgba;
	diff_mat3 = temp4.rgb; result4.a = temp4.a;
	//read damage texture into temp4, then .rgb goes to damg_mat3, .a goes to is_dielectric
	temp4 = texture2D(damgMap,texcoords2).rgba;
	damg_mat3 = temp4.rgb; is_dielectric_mat1 = temp4.a;
	//read detail texture into temp4; then .rgb modulates diffuse/spec, .a modulates shininess
	temp4 = texture2D(detailMap,16.0*texcoords2);
	temp4 -= vec4( 0.5 ); temp4 *= 0.12345;
	diff_mat3 -= temp4.rgb; spec_mat3 += temp4.rgb; gloss_mat1 -= temp4.a;
	///OTHER PRE-PER-LIGHT COMPUTATIONS
	//normalmapping-derived vector computations (normal (using tangent))
	normal_vec3 = fastnormalize(imatmul(tangent_vec3,cotangent_vec3,vnormal_vec3,normal_vec3));
	//reflection vector computation
	reflect_vec3 = -reflect( eye_vec3, normal_vec3 );
	//compute smooth shininess limit for spotlights
	limited_gloss_fac1 = limited_shininess( gloss_mat1 );
	//initialize accumulators
	diff_light_acc3 = spec_light_acc3 = vec3( 0.0 );
	//and might as well compute the shininess adjusted specularity
	float spec_gloss_adj = specularNormalizeFactor( limited_gloss_fac1 );
	//and might as well compute other gammas of ambient occlusion
	ao_diff_fac1 = sqrt( ao_glow_fac1 );
	ao_spec_fac1 = ao_glow_fac1 * ao_glow_fac1;
	///PER-LIGHT COMPUTATIONS
	if( light_enabled[0] != 0 )
	 lite0(normal_vec3,vnormal_vec3,reflect_vec3,limited_gloss_fac1,diff_light_acc3,spec_light_acc3);
	if( light_enabled[1] != 0 )
	 lite1(normal_vec3,vnormal_vec3,reflect_vec3,limited_gloss_fac1,diff_light_acc3,spec_light_acc3);
	//we will process the accumulators later, to give the above loops time to finish
	//AMBIENT CONTRIBUTION
	//assume the environment cube map is encoded with gamma = 0.5 (but keep it in the family ;-)
	amb_contrib_al3 = ambientMapping( normal_vec3 );
	frsn_contrib_al3 = envMapping( reflect_vec3 ); //fresnel env mapping, while we're at it...
	amb_contrib_al3 *= ( amb_contrib_al3 * ao_glow_fac1 * diff_mat3 );
	//now we can multiply material albedos by the flavors of ambient occlusion
	spec_mat3 *= ao_spec_fac1;
	diff_mat3 *= ao_diff_fac1;
	///FRESNEL STUFF begins
	//Fresnel evaluates to a coefficient that will be used to blend white specular specularity with
	//the diffuse AND specular contributions, in the case of dielectrics. For non-dielectrics, fresnel
	//will be zero. Shininess for fresnel specularity is always maxed out. Specularity and shininess
	//specified through textures, with a dielectric material, will constitute a "third layer" for the
	//material, and will allow representation of metallized paints.
	float fresnel_alpha = 1.0 - clamp( dot( eye_vec3, normal_vec3 ), 0.0, 1.0 );
	fresnel_alpha *= fresnel_alpha;
	fresnel_alpha = clamp( 0.0625 + ( 0.9375 * fresnel_alpha ), 0.0625, 1.0 );
	fresnel_alpha *= is_dielectric_mat1;
	float fresnel_beta = 1.0 - fresnel_alpha; // ;-)
	///ENVIRONMENT MAPPING
	//shininess to env map LOD
	//read env LOD (reflect); .rgb^2 goes to specular_il3 (assume gamma=0.5)
	//assume the environment cube map is encoded with gamma = 0.5 (but keep it in the family ;-)
	envm_contrib_al3 = envMappingLOD( reflect_vec3, shininess2Lod( gloss_mat1 ) );
	envm_contrib_al3 *= envm_contrib_al3;
	//FRESNEL STUFF continues; now we apply it:
	//essentially, total specular contribution is
	//specular_material * (1-fresnel) * LODenv + fresnel * env (fresnel shininess always maxed out)
	frsn_contrib_al3 *= fresnel_alpha; //don't multiply fresnel contrib by material spec; think...
	diff_mat3 *= fresnel_beta;
	envm_contrib_al3 *= ( fresnel_beta * spec_mat3 );
	//diffuse contribution also gets multiplied by 1-fresnel
	diff_contrib_al3 = diff_light_acc3 * diff_mat3 * ao_diff_fac1 * fresnel_beta;
	//specular contribution is a bit of a hard question. Theoretically it should be multiplied by
	//1-fresnel, but then we should add fresnel reflection of lights to the lighting loop, which
	//would be expensive. Furthermore, specular spotlights are already gloss-limited to account for
	//non-point-light sources; and this would apply to fresnel reflectivity. In summary, forget it.
	//what the spec contribution needs to be multiplied by is the specular gloss adjustment
	spec_contrib_al3 = spec_light_acc3 * spec_mat3 * ao_spec_fac1 * spec_gloss_adj;
	//GLOW (we got it, already, in glow_mat3)
	//process accumulations
	result4.rgb = amb_contrib_al3 + diff_contrib_al3 + frsn_contrib_al3 + spec_contrib_al3 + glow_mat3;
	//ALPHA and CLOAK
	result4.rgb *= result4.a;
	result4 *= cloakdmg.rrrg;
	//WRITE
	gl_FragColor = result4;
}

Now, I ran NVShaderPerf on it, and the results seem kind of suspicious...
How could it possibly compile to so few assembler instructions?
Here's the output:

!!ARBfp1.0
OPTION NV_fragment_program2;
# cgc version 2.0.0012, build date Jan 30 2008
# command line args: -profile fp40 -oglsl
# source file: new_shader.fp
#vendor NVIDIA Corporation
#version 2.0.0.12
#profile fp40
#program main
#semantic light_enabled
#semantic max_light_enabled
#semantic cubeMap
#semantic diffMap
#semantic specMap
#semantic glowMap
#semantic normMap
#semantic damgMap
#semantic detailMap
#semantic cloakdmg
#semantic gl_FrontLightProduct : state.lightprod.front
#var int light_enabled[0] :  : c[0] : -1 : 1
#var int light_enabled[1] :  : c[1] : -1 : 1
#var int light_enabled[2] :  :  : -1 : 0
#var int light_enabled[3] :  :  : -1 : 0
#var int light_enabled[4] :  :  : -1 : 0
#var int light_enabled[5] :  :  : -1 : 0
#var int light_enabled[6] :  :  : -1 : 0
#var int light_enabled[7] :  :  : -1 : 0
#var int max_light_enabled :  :  : -1 : 0
#var samplerCUBE cubeMap :  : texunit 6 : -1 : 1
#var sampler2D diffMap :  : texunit 3 : -1 : 1
#var sampler2D specMap :  : texunit 0 : -1 : 1
#var sampler2D glowMap :  : texunit 2 : -1 : 1
#var sampler2D normMap :  : texunit 1 : -1 : 1
#var sampler2D damgMap :  : texunit 4 : -1 : 1
#var sampler2D detailMap :  : texunit 5 : -1 : 1
#var float4 cloakdmg :  : c[2] : -1 : 1
#var float4 gl_FrontLightProduct[0].ambient : state.lightprod[0].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[0].diffuse : state.lightprod[0].front.diffuse : c[3] : -1 : 1
#var float4 gl_FrontLightProduct[0].specular : state.lightprod[0].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[1].ambient : state.lightprod[1].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[1].diffuse : state.lightprod[1].front.diffuse : c[4] : -1 : 1
#var float4 gl_FrontLightProduct[1].specular : state.lightprod[1].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[2].ambient : state.lightprod[2].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[2].diffuse : state.lightprod[2].front.diffuse :  : -1 : 0
#var float4 gl_FrontLightProduct[2].specular : state.lightprod[2].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[3].ambient : state.lightprod[3].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[3].diffuse : state.lightprod[3].front.diffuse :  : -1 : 0
#var float4 gl_FrontLightProduct[3].specular : state.lightprod[3].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[4].ambient : state.lightprod[4].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[4].diffuse : state.lightprod[4].front.diffuse :  : -1 : 0
#var float4 gl_FrontLightProduct[4].specular : state.lightprod[4].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[5].ambient : state.lightprod[5].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[5].diffuse : state.lightprod[5].front.diffuse :  : -1 : 0
#var float4 gl_FrontLightProduct[5].specular : state.lightprod[5].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[6].ambient : state.lightprod[6].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[6].diffuse : state.lightprod[6].front.diffuse :  : -1 : 0
#var float4 gl_FrontLightProduct[6].specular : state.lightprod[6].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[7].ambient : state.lightprod[7].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[7].diffuse : state.lightprod[7].front.diffuse :  : -1 : 0
#var float4 gl_FrontLightProduct[7].specular : state.lightprod[7].front.specular :  : -1 : 0
#var float4 gl_TexCoord[0] : $vin.TEX0 : TEX0 : -1 : 1
#var float4 gl_TexCoord[1] : $vin.TEX1 : TEX1 : -1 : 1
#var float4 gl_TexCoord[2] : $vin.TEX2 : TEX2 : -1 : 1
#var float4 gl_TexCoord[3] : $vin.TEX3 : TEX3 : -1 : 1
#var float4 gl_TexCoord[4] :  :  : -1 : 0
#var float4 gl_TexCoord[5] : $vin.TEX5 : TEX5 : -1 : 1
#var float4 gl_TexCoord[6] : $vin.TEX6 : TEX6 : -1 : 1
#var float4 gl_TexCoord[7] :  :  : -1 : 0
#var float4 gl_FragColor : $vout.COLOR : COL : -1 : 1
#const c[5] = 0.5 1.5 0.98000002 1
#const c[6] = 0.02 0 16 0.12345
#const c[7] = 255 50 5 0
#const c[8] = 1.7 -1.7 0.25 0.33329999
#const c[9] = 2 0.9375 0.0625 7.6999998
PARAM c[10] = { program.local[0..2],
		state.lightprod[0].front.diffuse,
		state.lightprod[1].front.diffuse,
		{ 0.5, 1.5, 0.98000002, 1 },
		{ 0.02, 0, 16, 0.12345 },
		{ 255, 50, 5, 0 },
		{ 1.7, -1.7, 0.25, 0.33329999 },
		{ 2, 0.9375, 0.0625, 7.6999998 } };
TEMP R0;
TEMP R1;
TEMP R2;
TEMP R3;
TEMP R4;
TEMP R5;
TEMP R6;
TEMP R7;
TEMP R8;
TEMP RC;
TEMP HC;
OUTPUT oCol = result.color;
TEX   R0, fragment.texcoord[0], texture[1], 2D;
ADDR  R0.x, R0, R0.y;
DP3R  R0.y, fragment.texcoord[5], fragment.texcoord[5];
RSQR  R0.y, R0.y;
MULR  R1.xyz, R0.y, fragment.texcoord[5];
MOVR  R6.zw, c[5].xyxw;
ADDR  R0.x, R0.z, R0;
MADR  R2.x, R0, c[8].w, -R6.z;
ADDR  R2.y, R0.w, -c[5].x;
MOVR  R2.z, c[8];
DP3R  R0.x, R2, R2;
RSQR  R1.w, R0.x;
MULR  R2.xyz, R1.w, R2;
MULR  R3.xyz, R2.y, fragment.texcoord[3];
DP3R  R0.x, fragment.texcoord[1], fragment.texcoord[1];
MADR  R0.x, -R0, c[5], c[5].y;
MULR  R0.xyz, R0.x, fragment.texcoord[1];
DP3R  R0.w, R0, R1;
MADR  R3.xyz, R2.x, fragment.texcoord[2], R3;
MADR  R2.xyz, R0, R2.z, R3;
MADR  R1.w, -R0, R0, c[5];
MULR  R1.w, R1, R1;
MULR  R1.w, R1, R1;
MULR  R1.w, R0, R1;
MADR  R0.w, R0, c[5].z, -R1;
DP3R  R2.w, R2, R2;
MADR  R1.w, -R2, c[5].x, c[5].y;
MULR  R3.xyz, R1.w, R2;
DP3R_SAT R1.x, R1, R3;
ADDR_SAT R0.w, R0, c[6].x;
ADDR_SAT R0.w, -R0, c[5];
MULR  R0.w, R0, R0;
MULR  R1.xyz, R1.x, c[3];
MULR  R0.w, R0, R0;
MADR_SAT R0.w, -R0, R0, c[5];
MOVR  R4.xyz, c[6].y;
MOVXC RC.x, c[0];
MULR  R1.xyz, fragment.texcoord[5].w, R1;
MULR  R4.xyz(NE.x), R0.w, R1;
MOVR  R6.xyz, c[6].y;
MULR  R1.xyz, fragment.texcoord[5].w, c[3];
MULR  R6.xyz(NE.x), R1, R0.w;
DP3R  R0.w, fragment.texcoord[6], fragment.texcoord[6];
RSQR  R0.w, R0.w;
MULR  R1.xyz, R0.w, fragment.texcoord[6];
DP3R  R0.x, R0, R1;
MADR  R0.y, -R0.x, R0.x, c[5].w;
MULR  R0.y, R0, R0;
MULR  R0.y, R0, R0;
MULR  R0.y, R0.x, R0;
MADR  R0.x, R0, c[5].z, -R0.y;
DP3R_SAT R0.y, R1, R3;
ADDR_SAT R0.x, R0, c[6];
ADDR_SAT R0.x, -R0, c[5].w;
MULR  R0.w, R0.x, R0.x;
MULR  R0.xyz, R0.y, c[4];
MULR  R0.w, R0, R0;
MULR  R1.xy, fragment.texcoord[0], c[6].z;
MADR_SAT R0.w, -R0, R0, c[5];
MULR  R0.xyz, fragment.texcoord[6].w, R0;
MOVXC RC.x, c[1];
MADR  R4.xyz(NE.x), R0.w, R0, R4;
MULR  R0.xyz, fragment.texcoord[6].w, c[4];
MADR  R6.xyz(NE.x), R0, R0.w, R6;
DP3R_SAT R0.x, R3, R5;
ADDR  R0.x, -R0, c[5].w;
MULR  R0.x, R0, R0;
MADR  R0.x, R0, c[9].y, c[9].z;
TEX   R0.w, fragment.texcoord[0], texture[4], 2D;
MAXR_SAT R0.x, R0, c[9].z;
MULR  R5.w, R0.x, R0;
TEX   R0, fragment.texcoord[0], texture[2], 2D;
RSQR  R2.x, R0.w;
RCPR  R4.w, R2.x;
TEX   R1, R1, texture[5], 2D;
ADDR  R2, R1, -c[5].x;
MULR  R2, R2, c[6].w;
TEX   R1, fragment.texcoord[0], texture[3], 2D;
ADDR  R1.xyz, -R2, R1;
ADDR  R3.w, -R5, c[5];
MULR  R7.xyz, R1, R4.w;
MULR  R7.xyz, R7, R3.w;
MULR  R4.xyz, R4, R7;
MULR  R4.xyz, R4, R4.w;
MULR  R7.xyz, R4, R3.w;
MOVR  R3.w, c[9];
TXL   R4, R3, texture[6], CUBE;
MULR  R4.xyz, R4, R4.w;
MULR  R8.xyz, R0.w, R4;
DP3R  R3.w, R3, R5;
MULR  R3.xyz, R3, R3.w;
MULR  R1.xyz, R8, R1;
MADR  R1.xyz, R4, R1, R7;
TEX   R4, fragment.texcoord[0], texture[0], 2D;
MADR  R3.xyz, -R3, c[9].x, R5;
TEX   R3, -R3, texture[6], CUBE;
MULR  R3.xyz, R3, R3.w;
MULR  R0.w, R0, R0;
ADDR  R2.xyz, R4, R2;
MULR  R2.xyz, R2, R0.w;
MULR  R2.xyz, R6, R2;
MULR  R2.xyz, R2, R0.w;
MULR  R0.w, R4, R4;
MULR  R0.w, R0, c[7].x;
MINR  R0.w, R0, c[7].x;
MAXR  R0.w, R0, c[5];
ADDR  R0.w, -R2, R0;
ADDR  R2.w, R0, c[7].y;
RCPR  R2.w, R2.w;
MULR  R0.w, R0, R2;
MADR  R0.w, R0, c[7].z, R6;
RCPR  R0.w, R0.w;
MULR  R0.w, R0, c[8].x;
POWR  R0.w, R0.w, c[8].y;
MADR  R1.xyz, R3, R5.w, R1;
MADR  R1.xyz, R2, R0.w, R1;
MADR  R0.xyz, R0, R0, R1;
MULR  R0.xyz, R0, R1.w;
MOVR  R0.w, R1;
MULR  oCol, R0, c[2].xxxy;
END
# 120 instructions, 9 R-regs, 0 H-regs

The stdout said...

NVShaderPerf : version 2.0, build date Jun 11 2008, 19:15:47
Copyright (C) 2002-2008, NVIDIA Corporation
=====================================================================
Performance analysis of new_shader.fp
Fragment Performance Setup: Driver 174.74, GPU G70, Flags 0x0
Results 61 cycles, 7 r regs, 157,377,056 pixels/s

I hope that doesn't mean 61 passes.

157,377,056 pixels/s / 60 fps = 2,622,951 pixels per frame
sqrt(2,622,951) ≈ 1619.55
So we could be at 1600 x 1280 and get 70 fps?!!
Maybe the lightingLight code is not getting instanced. How does it know which lights are enabled?
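
The back-of-the-envelope numbers above can be reproduced with a few lines of Python. This is only a sketch under generous assumptions: every screen pixel is shaded exactly once per frame unless an overdraw factor says otherwise, and the fragment shader is the only bottleneck. The helper names and the overdraw parameter are made up for illustration; only the two NVShaderPerf figures come from the output above.

```python
import math

PIXELS_PER_SECOND = 157_377_056  # throughput reported by NVShaderPerf
CYCLES_PER_PIXEL = 61            # cycle count reported by NVShaderPerf

def pixels_per_frame(fps):
    # Pixel budget per frame at a given frame rate.
    return PIXELS_PER_SECOND / fps

def fps_at(width, height, overdraw=1.0):
    # Frame rate if width*height*overdraw pixels are shaded per frame.
    return PIXELS_PER_SECOND / (width * height * overdraw)

budget = pixels_per_frame(60)
print(round(budget), round(math.sqrt(budget), 2))
print(round(fps_at(1600, 1280), 1), round(fps_at(1600, 1280, overdraw=2.0), 1))
# Product of the two reported figures, in billions of cycles per second.
print(round(PIXELS_PER_SECOND * CYCLES_PER_PIXEL / 1e9, 2))
```

At 1600 x 1280 with no overdraw this gives about 76.8 fps, and half that with every pixel shaded twice. The last line comes out at about 9.6 billion cycles per second, which suggests the 61 is a per-pixel cycle count spread across the GPU's parallel pixel pipelines rather than a number of passes, though that reading is an inference from the arithmetic, not something the tool output states.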

Other than that, unless you post to stop me, I think I'm going to go ahead and go logarithmic instead of encoding shininess with gamma 0.5. It will save instructions. I bet even the function that modulates specular by shininess will be simplified. No need for clamps. And I'm sure the grey scale values for the shininess of the various materials will be a lot more intuitive.

EDIT:
LOL

I commented out the conditionals for the lights, and

# 116 instructions, 9 R-regs, 0 H-regs

so I guess it was assuming both lights were enabled.
I'm utterly astonished how tight the assembly is. Mshaft has a thing or two to learn from nVidia...
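
A side note on fastnormalize() from the listing: scaling by 1.5 - 0.5*dot(v,v) is a single Newton-Raphson step towards 1/sqrt(dot(v,v)) starting from a guess of 1.0 (equivalently, the first-order Taylor expansion of 1/sqrt(x) around x = 1). The Python sketch below just measures how far the result is from unit length for inputs of various lengths; the helper length() is only there for the measurement.

```python
import math

def fastnormalize(v):
    # Same arithmetic as the shader function: v * (1.5 - 0.5 * dot(v, v))
    d = sum(c * c for c in v)
    scale = 1.5 - 0.5 * d
    return tuple(scale * c for c in v)

def length(v):
    return math.sqrt(sum(c * c for c in v))

# Length of the "normalized" result for inputs of decreasing length.
for n in (1.0, 0.95, 0.9, 0.7, 0.5):
    print(n, round(length(fastnormalize((n, 0.0, 0.0))), 4))
```

At an input length of 0.9 the result is still about 1.5% short of unit length, and at 0.5 it only reaches about 0.69, so the shortcut is only a fit for vectors that are nearly normalized already, such as interpolated vertex normals.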

Post by **chuck_starchaser** » Fri Jun 20, 2008 5:18 am

ALL DONE!

Damage is now included, and I changed the shininess representation to logarithmic.
128 instructions, exactly
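
For comparison, here is a small Python sketch of what the switch does to the env map LOD selection. It puts the earlier clamp( 7.0 - log2( shininess + 1.0 ), 0.0, 7.0 ) next to the new 0.5 + 7.0 * ( 1.0 - alpha ) from shininess2Lod() in the listing that follows, using shininess = pow( 255.0, alpha ) to relate the two. The Python function names are made up for the comparison.

```python
import math

def lod_from_shininess(shininess):
    # Earlier version: log2 of the decoded shininess, clamped to the mip range.
    return min(7.0, max(0.0, 7.0 - math.log2(shininess + 1.0)))

def lod_from_alpha(alpha):
    # New version: alpha is already log255(shininess), so a multiply-add is enough.
    return 0.5 + 7.0 * (1.0 - alpha)

# Compare both mappings over the alpha range.
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    shininess = math.pow(255.0, alpha)
    print(alpha, round(shininess, 2),
          round(lod_from_shininess(shininess), 2), lod_from_alpha(alpha))
```

The two curves are not identical: the linear version runs from 7.5 down to 0.5, where the clamped log2 version runs from 6.0 down to 0.0, so the new mapping picks a somewhat blurrier mip level throughout.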

//NEW SHADER (high end)
uniform int light_enabled[gl_MaxLights];
uniform int max_light_enabled;
//samplers
uniform samplerCube cubeMap;
uniform sampler2D diffMap;   //1-bit alpha in alpha, for alpha-testing only
uniform sampler2D specMap;   //log255(shininess) in alpha
uniform sampler2D glowMap;   //ambient occlusion in alpha
uniform sampler2D normMap;   //U in .rgb; V in alpha
uniform sampler2D damgMap;   //"is_dielectric" in 1-bit alpha
uniform sampler2D detailMap; //.rgb adds to diffuse, subtracts from spec; alpha mods shininess
//other uniforms
uniform vec4 cloakdmg; //.rg=cloak, .ba=damage
//envColor won't be needed, since we're fetching it from the envmap

//NOTE: Since the term "binormal" has been rightly deprecated, I use "cotangent" instead :)

vec3 lerp( in float f, in vec3 a, in vec3 b)
{
    return (1.0-f)*a + f*b;
}
vec3 fastnormalize( in vec3 v ) //less accurate than normalize() but should use fewer instructions
{
    //only a good approximation when length(v) is already close to 1.0
    float tmp = dot( v, v );
    tmp = 1.5 - (0.5*tmp);
    return tmp * v;
}
vec3 norm_decode( in vec4 nm )
{
    //The LaGrande normalmap noodle does away with the z-term for the normal by encoding U and V
    //as 0.5*tan( angle ), where angle is arcsin( U ) or arcsin( V ), respectively. To fit that
    //into a 0-1 range, we multiply by 0.5 once again, and add 0.5.
    //To reverse the encoding, we first subtract 0.5, then multiply by four, fill the z term with
    //1.0, and normalize. But multiplying by four is not needed if we fill the z term with
    //0.25 instead; *then* normalize:
    vec3 result;
    result.x = 0.3333*(nm.r+nm.g+nm.b) - 0.5;
    result.y = nm.a - 0.5;
    result.z = 0.25;
    return normalize( result ); //can't use fastnormalize() here
}
vec3 imatmul( in vec3 tan, in vec3 cotan, in vec3 norm, in vec3 light )
{
    return light.xxx*tan + light.yyy*cotan + light.zzz*norm;
}
float shininess2Lod( in float alphashininess ) 
{ 
    //return clamp( 7.0 - log2( shininess + 1.0 ), 0.0, 7.0 );
    return 0.5 + 7.0 * ( 1.0 - alphashininess );
}
float alpha2shininess( in float alpha )
{
    return pow( 255.0, alpha ); //means that alpha is log255( shininess )
}
float limited_shininess( in float shine )
{
    float limit = 50.0; //50^2 is 2500. 2500*0.001 = 2.5 --enough risk of saturation!
    return (shine*limit)/(shine+limit);
}
float specularNormalizeFactor( in float limited_shininess )
{
    return pow(1.7/(1.0+limited_shininess/10.0),-1.7);
}
vec3 ambientMapping( in vec3 normal )
{
    vec4 result = textureCubeLod( cubeMap, normal, 7.7 );
    return result.rgb * result.a;
}
vec3 envMapping( in vec3 reflection )
{
    vec4 result = textureCube( cubeMap, reflection );
    return result.rgb * result.a;
}
vec3 envMappingLOD( in vec3 reflection, in float LoD )
{
    vec4 result = textureCubeLod( cubeMap, reflection, LoD );
    return result.rgb * result.a;
}
float soft_NdotL( in float NdotL ) //for soft penumbras
{
    float s = 1.0 - (NdotL*NdotL); //s is 1.0 at penumbra point, falls slowly
    s *= s; //falls faster
    s *= s; //falls much faster either way from the penumbra point
    s *= NdotL; //s is now zero at penumbra but has tiny +/- wavelets to the sides
    return clamp( 0.98*NdotL + 0.02 - s, 0.0, 1.0 ); //we shrink NdotL by 2%, shift
    //it up by 2%, and subtract the s wavelet to flatten the penumbra area
}
float selfshadow( in float sNdotL ) //use of soft NdotL should be most correct
{
    float s = clamp(1.0 - sNdotL, 0.0, 1.0);
    s *= s;
    s *= s;
    s *= s;
    return clamp(1.0 - s, 0.0, 1.0);
}
void lightingLight
   (
   in vec3 light, in vec3 normal, in vec3 vnormal, in vec3 reflection,
   in vec3 lightDiffuse, in float lightAtt, in float ltd_gloss,
   inout vec3 diff_acc, inout vec3 spec_acc
   )
{
    float NdotL = clamp( dot(normal,light), 0.0, 1.0 );
    float sNdotL = soft_NdotL( dot(vnormal,light) );
    float RdotL = clamp( dot(reflection,light), 0.0, 1.0 );
    float selfshadow = selfshadow( sNdotL );
    float spec = pow( RdotL, ltd_gloss );
    diff_acc += ( NdotL * lightDiffuse.rgb * lightAtt * selfshadow );
    //NOTE: spec is not multiplied in below, so the compiler drops the pow() above as dead code
    spec_acc += ( lightDiffuse.rgb * lightAtt * selfshadow );
}

#define lighting(name, lightno_gl, lightno_tex) \
void name( \
   in vec3 normal, in vec3 vnormal, in  vec3 reflection, \
   in float limited_gloss, \
   inout vec3 diff_acc, inout vec3 spec_acc) \
{ \
    lightingLight( \
      normalize(gl_TexCoord[lightno_tex].xyz), \
      normal, vnormal, reflection, \
      gl_FrontLightProduct[lightno_gl].diffuse.rgb, \
      gl_TexCoord[lightno_tex].w, \
      limited_gloss, \
      diff_acc, spec_acc); \
}

lighting(lite0, 0, 5)
lighting(lite1, 1, 6)

void main()
{
    ///VARIABLE DECLARATIONS
    //vector variables
    vec4 temp4; //all-purpose vec4 temporary
    vec3 eye_vec3;
    vec3 vnormal_vec3;
    vec3 normal_vec3;
    vec3 tangent_vec3;
    vec3 cotangent_vec3; //"binormal" ;-)
    vec3 reflect_vec3;
    vec3 light0_vec3;
    vec3 light1_vec3;
    //material color variables
    vec3 diff_mat3;
    vec3 damg_mat3;
    vec3 spec_mat3;
    vec3 glow_mat3;
    //inferent light variables
    vec3 light0_il3;
    vec3 light1_il3;
    vec3 ambient_il3; //to be fetched from envmap via normal
    vec3 specular_il3; //to be fetched from envmap via reflect vector
    vec3 spec_light_acc3; //specular light accumulator
    vec3 diff_light_acc3; //diffuse light accumulator
    //afferent light variables
    vec3 diff_contrib_al3;
    vec3 spec_contrib_al3;
    vec3 envm_contrib_al3;
    vec3 frsn_contrib_al3;
    vec3 glow_contrib_al3;
    vec3 amb_contrib_al3;
    //accumulator:
    vec4 result4;
    //scalar factors and coefficients
    float ao_glow_fac1; //plain ambient occlusion factor, used for ambient contribution
    float ao_spec_fac1; //squared ambient occlusion, used for specular modulation
    float ao_diff_fac1; //square root of ambient occlusion, used for diffuse
    float is_dielectric_mat1; //0.0 = metal; 1.0 = dielectric
    float gloss_mat1; //A.K.A. "shininess"
    float limited_gloss_fac1; //smooth-limited gloss to use for spotlights
    //interpolated mesh data fetches
    vec2 texcoords2 = gl_TexCoord[0].xy;
    vnormal_vec3 = fastnormalize( gl_TexCoord[1].xyz );
    tangent_vec3 = gl_TexCoord[2].xyz;
    cotangent_vec3 = gl_TexCoord[3].xyz;
    ///MOSTLY TEXTURE FETCHES (start with spec, as we'll need shininess at the earliest time):
    //read specular into temp4, then spec_mat gets .rgb, and gloss_mat gets 255^.a (logarithmic encoding)
    temp4 = texture2D(specMap,texcoords2).rgba;
    spec_mat3 = temp4.rgb;
    // gloss_mat1 = clamp( 255.0 * temp4.a * temp4.a, 1.0, 255.0 );
    gloss_mat1 = alpha2shininess( temp4.a );
    float gloss_LoD1 = shininess2Lod( temp4.a );
    //read normalmap into temp4, then .rgb goes to U, .a goes to V (tangent space just for now)
    temp4 = texture2D(normMap,texcoords2).rgba;
    normal_vec3 = norm_decode( temp4 );
    //read glow texture into temp4, then .rgb^2 goes to glow_mat, and .a goes to ao_glow_fac
    temp4 = texture2D(glowMap,texcoords2).rgba;
    glow_mat3 = temp4.rgb * temp4.rgb; ao_glow_fac1 = temp4.a; 
    //read diffuse into temp4, then rgb goes to diff_mat, a goes to alpha
    temp4 = texture2D(diffMap,texcoords2).rgba;
    diff_mat3 = temp4.rgb; result4.a = temp4.a;
    //read damage texture into temp4, then .rgb goes to damg_mat3, .a goes to is_dielectric
    temp4 = texture2D(damgMap,texcoords2).rgba;
    damg_mat3 = temp4.rgb; is_dielectric_mat1 = temp4.a;
    //blend damage back into the diffuse
    diff_mat3 = lerp( cloakdmg.b, diff_mat3, damg_mat3 );
    //we need a darkening color to limit specularity as a function of damage, also
    vec3 darkenin3;
    temp4.rgb = vec3( 1.0 );
    darkenin3 = vec3( 0.333 * dot( damg_mat3, damg_mat3 ) );
    darkenin3 = lerp( cloakdmg.b, temp4.rgb, darkenin3 );
    //and darken specular material by it
    spec_mat3 *= darkenin3;
    //read detail texture into temp4; then .rgb modulates diffuse/spec, .a modulates shininess
    temp4 = texture2D(detailMap,16.0*texcoords2);
    temp4 -= vec4( 0.5 ); temp4 *= 0.12345;
    diff_mat3 -= temp4.rgb; spec_mat3 += temp4.rgb; gloss_mat1 -= temp4.a;
    ///OTHER PRE-PER-LIGHT COMPUTATIONS
    //normalmapping-derived vector computations (normal (using tangent))
    normal_vec3 = fastnormalize(imatmul(tangent_vec3,cotangent_vec3,vnormal_vec3,normal_vec3));
    //reflection vector computation
    reflect_vec3 = -reflect( eye_vec3, normal_vec3 );
    //compute smooth shininess limit for spotlights
    limited_gloss_fac1 = limited_shininess( gloss_mat1 );
    //initialize accumulators
    diff_light_acc3 = spec_light_acc3 = vec3( 0.0 );
    //and might as well compute the shininess adjusted specularity
    float spec_gloss_adj = specularNormalizeFactor( limited_gloss_fac1 );
    //and might as well compute other gammas of ambient occlusion
    ao_diff_fac1 = sqrt( ao_glow_fac1 );
    ao_spec_fac1 = ao_glow_fac1 * ao_glow_fac1;
    ///PER-LIGHT COMPUTATIONS
    if( light_enabled[0] != 0 )
     lite0(normal_vec3,vnormal_vec3,reflect_vec3,limited_gloss_fac1,diff_light_acc3,spec_light_acc3);
    if( light_enabled[1] != 0 )
     lite1(normal_vec3,vnormal_vec3,reflect_vec3,limited_gloss_fac1,diff_light_acc3,spec_light_acc3);
    //we will process the accumulators later, to give the above loops time to finish
    //AMBIENT CONTRIBUTION
    //assume the environment cube map is encoded with gamma = 0.5 (but keep it in the family ;-)
    amb_contrib_al3 = ambientMapping( normal_vec3 );
    frsn_contrib_al3 = envMapping( reflect_vec3 ); //fresnel env mapping, while we're at it...
    amb_contrib_al3 *= ( amb_contrib_al3 * ao_glow_fac1 * diff_mat3 );
    //now we can multiply material albedos by the flavors of ambient occlusion
    spec_mat3 *= ao_spec_fac1;
    diff_mat3 *= ao_diff_fac1;
    ///FRESNEL STUFF begins
    //Fresnel evaluates to a coefficient that will be used to blend white specular specularity with
    //the diffuse AND specular contributions, in the case of dielectrics. For non-dielectrics, fresnel
    //will be zero. Shininess for fresnel specularity is always maxed out. Specularity and shininess
    //specified through textures, with a dielectric material, will constitute a "third layer" for the
    //material, and will allow representation of metallized paints.
    float fresnel_alpha = 1.0 - clamp( dot( eye_vec3, normal_vec3 ), 0.0, 1.0 );
    fresnel_alpha *= fresnel_alpha;
    fresnel_alpha = clamp( 0.0625 + ( 0.9375 * fresnel_alpha ), 0.0625, 1.0 );
    fresnel_alpha *= ( is_dielectric_mat1 * (1.0-cloakdmg.b) ); //cloakdmg.b is the damage
    float fresnel_beta = 1.0 - fresnel_alpha; // ;-)
    ///ENVIRONMENT MAPPING
    //shininess to env map LOD
    //read env LOD (reflect); .rgb^2 goes to specular_il3 (assume gamma=0.5)
    //assume the environment cube map is encoded with gamma = 0.5 (but keep it in the family ;-)
    envm_contrib_al3 = envMappingLOD( reflect_vec3, gloss_LoD1 );
    envm_contrib_al3 *= envm_contrib_al3;
    //FRESNEL STUFF continues; now we apply it:
    //essentially, total specular contribution is
    //specular_material * (1-fresnel) * LODenv + fresnel * env (fresnel shininess always maxed out)
    frsn_contrib_al3 *= fresnel_alpha; //don't multiply fresnel contrib by material spec; think...
    diff_mat3 *= fresnel_beta;
    envm_contrib_al3 *= ( fresnel_beta * spec_mat3 );
    //diffuse contribution also gets multiplied by 1-fresnel
    diff_contrib_al3 = diff_light_acc3 * diff_mat3 * ao_diff_fac1 * fresnel_beta;
    //specular contribution is a bit of a hard question. Theoretically it should be multiplied by
    //1-fresnel, but then we should add fresnel reflection of lights to the lighting loop, which
    //would be expensive. Furthermore, specular spotlights are already gloss-limited to account for
    //non-point-light sources; and this would apply to fresnel reflectivity. In summary, forget it.
    //what the spec contribution needs to be multiplied by is the specular gloss adjustment; AND
    //faded down by damage
    spec_contrib_al3 = spec_light_acc3 * spec_mat3 * ao_spec_fac1 * spec_gloss_adj;
    //GLOW (we got it, already, in glow_mat3; well, not quite; we want to darken it by damage)
    glow_mat3 *= ( 0.5 * (2.0-cloakdmg.b) );
    //process accumulations
    result4.rgb = amb_contrib_al3 + diff_contrib_al3 + frsn_contrib_al3 + spec_contrib_al3 + glow_mat3;
    //ALPHA and CLOAK
    result4.rgb *= result4.a;
    result4 *= cloakdmg.rrrg;
    //WRITE
    gl_FragColor = result4;
}
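
To get a feel for the Fresnel blend computed in the shader above, here is a minimal Python sketch (not part of the shader) that mirrors the `fresnel_alpha` arithmetic for a single pixel. The function name and its parameters are stand-ins chosen for illustration: `cos_theta` plays the role of `dot( eye_vec3, normal_vec3 )`, `is_dielectric` of `is_dielectric_mat1`, and `damage` of `cloakdmg.b`.

```python
def fresnel_alpha(cos_theta, is_dielectric=1.0, damage=0.0):
    """Mirror of the shader's fresnel_alpha arithmetic for one pixel."""
    def clamp(x, lo, hi):
        return max(lo, min(hi, x))

    a = 1.0 - clamp(cos_theta, 0.0, 1.0)          # 0 head-on, 1 at grazing angles
    a *= a                                        # squared falloff, as in the shader
    a = clamp(0.0625 + 0.9375 * a, 0.0625, 1.0)   # never below the 1/16 floor
    return a * is_dielectric * (1.0 - damage)     # metals and damaged paint lose it


if __name__ == "__main__":
    for cos_theta in (1.0, 0.5, 0.0):
        alpha = fresnel_alpha(cos_theta)
        beta = 1.0 - alpha                        # fresnel_beta in the shader
        print(cos_theta, alpha, beta)
```

Head-on, an undamaged dielectric gets the 0.0625 floor; at grazing angles the coefficient reaches 1.0; a metal or a fully damaged surface gets 0, so all of the weight goes to `fresnel_beta`.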

NVShaderPerf:

Code: Select all

!!ARBfp1.0
OPTION NV_fragment_program2;
# cgc version 2.0.0012, build date Jan 30 2008
# command line args: -profile fp40 -oglsl
# source file: new_shader.fp
#vendor NVIDIA Corporation
#version 2.0.0.12
#profile fp40
#program main
#semantic light_enabled
#semantic max_light_enabled
#semantic cubeMap
#semantic diffMap
#semantic specMap
#semantic glowMap
#semantic normMap
#semantic damgMap
#semantic detailMap
#semantic cloakdmg
#semantic gl_FrontLightProduct : state.lightprod.front
#var int light_enabled[0] :  : c[0] : -1 : 1
#var int light_enabled[1] :  : c[1] : -1 : 1
#var int light_enabled[2] :  :  : -1 : 0
#var int light_enabled[3] :  :  : -1 : 0
#var int light_enabled[4] :  :  : -1 : 0
#var int light_enabled[5] :  :  : -1 : 0
#var int light_enabled[6] :  :  : -1 : 0
#var int light_enabled[7] :  :  : -1 : 0
#var int max_light_enabled :  :  : -1 : 0
#var samplerCUBE cubeMap :  : texunit 6 : -1 : 1
#var sampler2D diffMap :  : texunit 3 : -1 : 1
#var sampler2D specMap :  : texunit 0 : -1 : 1
#var sampler2D glowMap :  : texunit 2 : -1 : 1
#var sampler2D normMap :  : texunit 1 : -1 : 1
#var sampler2D damgMap :  : texunit 4 : -1 : 1
#var sampler2D detailMap :  : texunit 5 : -1 : 1
#var float4 cloakdmg :  : c[2] : -1 : 1
#var float4 gl_FrontLightProduct[0].ambient : state.lightprod[0].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[0].diffuse : state.lightprod[0].front.diffuse : c[3] : -1 : 1
#var float4 gl_FrontLightProduct[0].specular : state.lightprod[0].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[1].ambient : state.lightprod[1].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[1].diffuse : state.lightprod[1].front.diffuse : c[4] : -1 : 1
#var float4 gl_FrontLightProduct[1].specular : state.lightprod[1].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[2].ambient : state.lightprod[2].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[2].diffuse : state.lightprod[2].front.diffuse :  : -1 : 0
#var float4 gl_FrontLightProduct[2].specular : state.lightprod[2].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[3].ambient : state.lightprod[3].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[3].diffuse : state.lightprod[3].front.diffuse :  : -1 : 0
#var float4 gl_FrontLightProduct[3].specular : state.lightprod[3].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[4].ambient : state.lightprod[4].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[4].diffuse : state.lightprod[4].front.diffuse :  : -1 : 0
#var float4 gl_FrontLightProduct[4].specular : state.lightprod[4].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[5].ambient : state.lightprod[5].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[5].diffuse : state.lightprod[5].front.diffuse :  : -1 : 0
#var float4 gl_FrontLightProduct[5].specular : state.lightprod[5].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[6].ambient : state.lightprod[6].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[6].diffuse : state.lightprod[6].front.diffuse :  : -1 : 0
#var float4 gl_FrontLightProduct[6].specular : state.lightprod[6].front.specular :  : -1 : 0
#var float4 gl_FrontLightProduct[7].ambient : state.lightprod[7].front.ambient :  : -1 : 0
#var float4 gl_FrontLightProduct[7].diffuse : state.lightprod[7].front.diffuse :  : -1 : 0
#var float4 gl_FrontLightProduct[7].specular : state.lightprod[7].front.specular :  : -1 : 0
#var float4 gl_FragColor : $vout.COLOR : COL : -1 : 1
#var float4 gl_TexCoord[0] : $vin.TEX0 : TEX0 : -1 : 1
#var float4 gl_TexCoord[1] : $vin.TEX1 : TEX1 : -1 : 1
#var float4 gl_TexCoord[2] : $vin.TEX2 : TEX2 : -1 : 1
#var float4 gl_TexCoord[3] : $vin.TEX3 : TEX3 : -1 : 1
#var float4 gl_TexCoord[4] :  :  : -1 : 0
#var float4 gl_TexCoord[5] : $vin.TEX5 : TEX5 : -1 : 1
#var float4 gl_TexCoord[6] : $vin.TEX6 : TEX6 : -1 : 1
#var float4 gl_TexCoord[7] :  :  : -1 : 0
#const c[5] = 2 0.5 1.5 0.98000002
#const c[6] = 1 0.02 0 0.333
#const c[7] = 16 0.12345 255 50
#const c[8] = 5 0 1.7 -1.7
#const c[9] = 0.25 0.33329999 0.9375 0.0625
#const c[10] = 7.6999998
PARAM c[11] = { program.local[0..2],
		state.lightprod[0].front.diffuse,
		state.lightprod[1].front.diffuse,
		{ 2, 0.5, 1.5, 0.98000002 },
		{ 1, 0.02, 0, 0.333 },
		{ 16, 0.12345, 255, 50 },
		{ 5, 0, 1.7, -1.7 },
		{ 0.25, 0.33329999, 0.9375, 0.0625 },
		{ 7.6999998 } };
TEMP R0;
TEMP R1;
TEMP R2;
TEMP R3;
TEMP R4;
TEMP R5;
TEMP R6;
TEMP R7;
TEMP R8;
TEMP R9;
TEMP RC;
TEMP HC;
OUTPUT oCol = result.color;
TEX   R0, fragment.texcoord[0], texture[1], 2D;
ADDR  R0.x, R0, R0.y;
DP3R  R0.y, fragment.texcoord[5], fragment.texcoord[5];
RSQR  R0.y, R0.y;
MULR  R1.xyz, R0.y, fragment.texcoord[5];
TEX   R7, fragment.texcoord[0], texture[3], 2D;
MOVR  R5.xw, c[5].yyzx;
ADDR  R0.x, R0.z, R0;
MADR  R2.x, R0, c[9].y, -R5;
ADDR  R2.y, R0.w, -c[5];
MOVR  R2.z, c[9].x;
DP3R  R0.x, R2, R2;
RSQR  R1.w, R0.x;
MULR  R2.xyz, R1.w, R2;
MULR  R3.xyz, R2.y, fragment.texcoord[3];
DP3R  R0.x, fragment.texcoord[1], fragment.texcoord[1];
MADR  R0.x, -R0, c[5].y, c[5].z;
MULR  R0.xyz, R0.x, fragment.texcoord[1];
DP3R  R0.w, R0, R1;
MADR  R3.xyz, R2.x, fragment.texcoord[2], R3;
MADR  R2.xyz, R0, R2.z, R3;
MADR  R1.w, -R0, R0, c[6].x;
MULR  R1.w, R1, R1;
MULR  R1.w, R1, R1;
MULR  R1.w, R0, R1;
MADR  R0.w, R0, c[5], -R1;
DP3R  R2.w, R2, R2;
MADR  R1.w, -R2, c[5].y, c[5].z;
MULR  R2.xyz, R1.w, R2;
DP3R_SAT R1.x, R1, R2;
ADDR_SAT R0.w, R0, c[6].y;
ADDR_SAT R0.w, -R0, c[6].x;
MULR  R0.w, R0, R0;
MULR  R1.xyz, R1.x, c[3];
MULR  R0.w, R0, R0;
TEX   R3, fragment.texcoord[0], texture[4], 2D;
MADR_SAT R0.w, -R0, R0, c[6].x;
MOVR  R6.xyz, c[6].z;
MOVXC RC.x, c[0];
MULR  R1.xyz, fragment.texcoord[5].w, R1;
MULR  R6.xyz(NE.x), R0.w, R1;
MOVR  R5.xyz, c[6].z;
MULR  R1.xyz, fragment.texcoord[5].w, c[3];
MULR  R5.xyz(NE.x), R1, R0.w;
DP3R  R0.w, fragment.texcoord[6], fragment.texcoord[6];
RSQR  R0.w, R0.w;
MULR  R1.xyz, R0.w, fragment.texcoord[6];
DP3R  R0.x, R0, R1;
MADR  R0.y, -R0.x, R0.x, c[6].x;
MULR  R0.y, R0, R0;
MULR  R0.y, R0, R0;
MULR  R0.y, R0.x, R0;
MADR  R0.x, R0, c[5].w, -R0.y;
DP3R_SAT R0.y, R1, R2;
ADDR_SAT R0.x, R0, c[6].y;
ADDR_SAT R0.x, -R0, c[6];
MULR  R0.w, R0.x, R0.x;
MULR  R0.xyz, R0.y, c[4];
MULR  R0.w, R0, R0;
MULR  R1.xy, fragment.texcoord[0], c[7].x;
TEX   R1, R1, texture[5], 2D;
ADDR  R1, R1, -c[5].y;
MULR  R1, R1, c[7].y;
MADR_SAT R0.w, -R0, R0, c[6].x;
MULR  R0.xyz, fragment.texcoord[6].w, R0;
MOVXC RC.x, c[1];
MADR  R6.xyz(NE.x), R0.w, R0, R6;
MULR  R0.xyz, fragment.texcoord[6].w, c[4];
MADR  R5.xyz(NE.x), R0, R0.w, R5;
DP3R_SAT R0.x, R2, R4;
ADDR  R0.x, -R0, c[6];
MULR  R0.x, R0, R0;
MADR  R0.x, R0, c[9].z, c[9].w;
MADR  R0.y, R3.w, -c[2].z, R3.w;
MAXR_SAT R0.x, R0, c[9].w;
MULR  R3.w, R0.x, R0.y;
MADR  R0.xyz, R7, -c[2].z, R7;
MADR  R7.xyz, R3, c[2].z, R0;
DP3R  R3.x, R3, R3;
TEX   R0, fragment.texcoord[0], texture[2], 2D;
RSQR  R4.w, R0.w;
ADDR  R2.w, -R3, c[6].x;
ADDR  R7.xyz, -R1, R7;
RCPR  R4.w, R4.w;
MULR  R8.xyz, R7, R4.w;
MULR  R8.xyz, R8, R2.w;
MULR  R6.xyz, R6, R8;
MULR  R6.xyz, R6, R4.w;
MULR  R8.xyz, R6, R2.w;
MOVR  R2.w, c[10].x;
TXL   R6, R2, texture[6], CUBE;
MULR  R6.xyz, R6, R6.w;
MULR  R9.xyz, R0.w, R6;
MULR  R7.xyz, R9, R7;
MADR  R7.xyz, R6, R7, R8;
MULR  R0, R0, R0;
TEX   R6, fragment.texcoord[0], texture[0], 2D;
MULR  R3.x, R3, c[2];
MOVR  R2.w, c[2].x;
MADR  R2.w, R3.x, c[6], -R2;
MADR  R3.xyz, R2.w, R6, R6;
DP3R  R2.w, R2, R4;
MULR  R2.xyz, R2, R2.w;
ADDR  R1.xyz, R3, R1;
MULR  R1.xyz, R0.w, R1;
MULR  R1.xyz, R5, R1;
MULR  R1.xyz, R0.w, R1;
POWR  R0.w, c[7].z, R6.w;
ADDR  R1.w, -R1, R0;
MADR  R2.xyz, -R2, c[5].x, R4;
TEX   R2, -R2, texture[6], CUBE;
MULR  R2.xyz, R2, R2.w;
ADDR  R0.w, R1, c[7];
RCPR  R2.w, R0.w;
MADR  R2.xyz, R2, R3.w, R7;
MOVR  R0.w, c[6].x;
MULR  R1.w, R1, R2;
MADR  R0.w, R1, c[8].x, R0;
RCPR  R0.w, R0.w;
MULR  R0.w, R0, c[8].z;
POWR  R0.w, R0.w, c[8].w;
MADR  R1.xyz, R1, R0.w, R2;
ADDR  R0.w, R5, -c[2].z;
MULR  R0.xyz, R0.w, R0;
MADR  R0.xyz, R0, c[5].y, R1;
MULR  R0.xyz, R0, R7.w;
MOVR  R0.w, R7;
MULR  oCol, R0, c[2].xxxy;
END
# 128 instructions, 10 R-regs, 0 H-regs

NVShaderPerf std out:

Code: Select all

NVShaderPerf : version 2.0, build date Jun 11 2008, 19:15:47
Copyright (C) 2002-2008, NVIDIA Corporation
=====================================================================
Performance analysis of new_shader.fp
Fragment Performance Setup: Driver 174.74, GPU G70, Flags 0x0
Results 73 cycles, 8 r regs, 131,506,848 pixels/s

131,506,848 pixels/s / 60 FPS = 2,191,780.8 pixels per frame.
sqrt( 2,191,780.8 pixels ) = 1480.47
About 1480 x 1480.
So at 1280 x 1024 (1,310,720 pixels) we'd get about 100 fps, if every pixel is shaded exactly once. And that's based on a G70 series GPU. And we've got it all...
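
The frame-budget arithmetic above can be rechecked with a few lines of Python. This is only a rough sketch: the throughput constant is the NVShaderPerf figure quoted above, while the function names and the `overdraw` parameter (average number of times each screen pixel gets shaded) are my own assumptions, not something NVShaderPerf reports.

```python
import math

# NVShaderPerf estimate for the G70, taken from the output quoted above
PIXELS_PER_SECOND = 131_506_848


def pixels_per_frame(fps):
    """Shaded-pixel budget per frame at a target frame rate."""
    return PIXELS_PER_SECOND / fps


def max_fps(width, height, overdraw=1.0):
    """Upper bound on frame rate if the whole screen runs this shader.

    overdraw is a hypothetical average shading count per screen pixel.
    """
    return PIXELS_PER_SECOND / (width * height * overdraw)


if __name__ == "__main__":
    budget = pixels_per_frame(60)
    print("pixels per frame at 60 fps:", budget)
    print("equivalent square side:", math.sqrt(budget))
    print("fps bound at 1280x1024:", max_fps(1280, 1024))
    print("same, with 1.5x overdraw:", max_fps(1280, 1024, overdraw=1.5))
```

Any real scene will land below these bounds, since vertex work, texture bandwidth and overdraw all eat into the budget.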

Klauss, give it a good read when you get a chance; I'll start working on the glass and engine exhaust shaders.

Post by **chuck_starchaser** » Fri Jun 20, 2008 1:53 pm

By the way, Klauss, I took out the microshadowing stuff from the ambient mapping routine because it's not necessary; that'll be in the ambient occlusion texture already. I have a LaGrande noodle to do just that: It takes ambient occlusion (or any light baking) and the bumpmap as inputs, and it outputs a modified ambient occlusion or lightbaking that includes bumpmap modulations.

FYI, the algorithm I came up with makes an assumption that is incorrect, strictly speaking, but statistically valid most of the time: that a gradient in illumination intensity implies bulk directionality of the light source. It computes an ad-hoc vector for the light source based on the rate of change of illumination on the surface. Then it computes a surface vector based on the bumpmap data (all of this in tangent space, of course), and it modulates the light baking based on the dot product of those two vectors. Works like a charm.
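
For readers who want to see the idea in code, here is a minimal pure-Python sketch of that kind of modulation. It is not the actual noodle: the function names, the two gain parameters, the finite-difference gradients, and the choice to divide by the flat-surface response (so that a flat bumpmap leaves the baking unchanged) are all assumptions made for illustration. Images are plain lists of rows of floats in 0..1.

```python
import math


def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def _gradient(img, x, y):
    """Finite-difference gradient; neighbours are clamped at the image border."""
    h, w = len(img), len(img[0])
    x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
    y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
    return (img[y][x1] - img[y][x0]) * 0.5, (img[y1][x] - img[y0][x]) * 0.5


def modulate_bake(bake, height, light_gain=4.0, bump_gain=4.0):
    """Return the light baking modulated by the bumpmap (tangent space)."""
    out = []
    for y in range(len(bake)):
        row = []
        for x in range(len(bake[0])):
            # ad-hoc light vector: assumed to lean toward rising illumination
            gx, gy = _gradient(bake, x, y)
            light = _normalize((light_gain * gx, light_gain * gy, 1.0))
            # surface vector from the height field slope
            hx, hy = _gradient(height, x, y)
            normal = _normalize((-bump_gain * hx, -bump_gain * hy, 1.0))
            n_dot_l = sum(n * l for n, l in zip(normal, light))
            flat_dot_l = light[2]  # response of an unbumped surface, (0,0,1).light
            value = bake[y][x] * n_dot_l / flat_dot_l
            row.append(min(1.0, max(0.0, value)))
        out.append(row)
    return out
```

Bumps whose slopes face the direction of rising illumination get brighter and slopes facing away get darker, which is the behavior described above.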

EDIT:
Note also I unified cloak and damage into a single vec4 input: cloakdmg.
Is that okay?
I don't know much about cloak OR damage. Not sure why cloak is implemented by multiplying by .rrrg; or why it's a vec4 in the first place.

New idea:
If it's okay, I'd make this vec4 variable carry whatever cloak carried in r and g in its r and g; then blue would carry damage, and alpha could carry (1-damage); --having this last thing would save 3 shader instructions, I think.

Post by **pyramid** » Fri Jun 20, 2008 6:53 pm

I have freshly compiled from svn at home (openSUSE + nv 8600GT + highend shader from chuck + max settings). I cannot see ANY diffuse textures on models that have multiple textures. The exception is planets with a diffuse texture only, which show up all right.

Here you can see the blink lights on the Llama, while the Llama is completely transparent:

And here's the star fortress seen from top. You can see the glow texture but neither diffuse nor specular (not sure if it has one):

I know the shader is not ready yet. So just a heads up since it might be different with the previously compiled exe from klauss. Not sure however, so maybe I'm just blabbering in which case just ignore it.

Post by **charlieg** » Fri Jun 20, 2008 9:49 pm

Chuck, you just invented a cloak device!

Post by **chuck_starchaser** » Fri Jun 20, 2008 10:12 pm

charlieg wrote: Chuck, you just invented a cloak device!

Klauss did!

I've no idea, Pyramid. I never wrote any shaders to work with the techniques branch yet. I suppose the techniques branch should default to trunk behavior when a technique is not specified in xmesh, but the techniques branch binary crashed on me so I've never been able to verify that it does. Or have I? Now I can't remember if I got around the crashing problems or not... I think not.

This sounds like a code problem, more than a shader problem, tho:
If planets having ONLY diffuse, show a diffuse, but stations having several textures show only the glow, it means that textures are somehow obliterating one another, so only the last one getting loaded shows up.

Post by **pyramid** » Fri Jun 20, 2008 10:48 pm

I was wondering if this is worth posting over at the techniques thread, but then again, most of the models seem to do just fine with the standard shaders in trunk. Maybe I'll just post this problem over there and hope klauss knows what's up.

Post by **chuck_starchaser** » Fri Jun 20, 2008 11:12 pm

pyramid wrote: I was wondering if this is worth posting over at the techniques thread, but then again, most of the models seem to do just fine with the standard shaders in trunk. Maybe I'll just post this problem over there and hope klauss knows what's up.

Certainly; this is a techniques issue.

Post by **klauss** » Fri Jun 20, 2008 11:36 pm

Can't really comment on all of it, at work now, I just thought I'd skim the forum. Now... I wouldn't get rid of shininess... it seems wrong. I like the fact that shininess is specified accurately by the material, and the texture only modulates it relatively speaking. Having to specify a very very specific value in shininess is madness, IMO, for the artist. Especially if we'll have to deal with compression artifacts and loss of precision and the like.

Of course people can do whatever they like on their mod. If they have their dataset irreparably messed up, use a shader that ignores what's messed up, ok. But I wouldn't make it official, no way.

About the ultimate cloak: I bet it's a shader problem. It may be returning the wrong alpha value or something like that.

Post by **chuck_starchaser** » Sat Jun 21, 2008 2:05 am

klauss wrote: Can't really comment on all of it, at work now, I just thought I'd skim the forum. Now... I wouldn't get rid of shininess... it seems wrong. I like the fact that shininess is specified accurately by the material, and the texture only modulates it relatively speaking. Having to specify a very very specific value in shininess is madness, IMO, for the artist. Especially if we'll have to deal with compression artifacts and loss of precision and the like.

Klauss, it's the other way around: What an artist wants, and needs, is dependable tools and techniques and materials. What we need, specifically for texturing ships and stations, is a library of materials. And I intend to create just such a library for Blender. Hundreds of named materials, eventually; and maybe a tool to produce various types of paints with customizable colors. For such a library of materials, we need to know that 0x55 shininess will always look the same, consistently, rather than depend on this or that other thing.

Now, did you see my post about encoding shininess logarithmically?
Do you realize how precise that will be?
Remember we got 8-bits. It's in the alpha channel of spec.
Assume a range of shininess from 1.0 to 256.0:
What do you think the 256th root of 256 is?
1.0218971486541166782344801347833
That's 2.2% increments all across the range!!!
In music, that's less than a quarter-tone.

1.0
1.0218971486541166782344801347833
1.0442737824274138403219664787399
1.0671404006768236181695211209928
1.0905077326652576592070106557607
1.1143867425958925363088129569196
...
224.80027652778233383835655928004
229.72276160039771986274601499236
234.7530350603958353263466482673
239.89345716611838737023986559417
245.14643985883485361569424259787
250.51444789445123443434524820824
256.0
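
Here is a small Python sketch of the logarithmic encoding behind that list, in case anyone wants to reproduce the numbers. The function names are made up for illustration and this is not the shader's `alpha2shininess`; it just assumes shininess = 256^(k/256) for an alpha byte k.

```python
import math


def alpha_to_shininess(alpha_byte, max_shininess=256.0, steps=256):
    """Decode an alpha byte into shininess on a logarithmic scale."""
    return max_shininess ** (alpha_byte / steps)


def shininess_to_alpha(shininess, max_shininess=256.0, steps=256):
    """Encode a shininess value as the nearest alpha byte (0..255)."""
    k = round(steps * math.log(shininess) / math.log(max_shininess))
    return max(0, min(255, k))


if __name__ == "__main__":
    for k in (0, 1, 2, 3, 254, 255):
        print(k, alpha_to_shininess(k))
    # every step multiplies shininess by the same factor, about 1.0219
    print(alpha_to_shininess(101) / alpha_to_shininess(100))
```

One detail to keep in mind: an 8-bit channel only holds 0 to 255, so with an exponent of k/256 the top code decodes to about 250.5 rather than 256. Dividing by 255 instead would make byte 255 land exactly on 256, at the cost of a very slightly larger step.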

klauss wrote: Of course people can do whatever they like on their mod. If they have their dataset irreparably messed up, use a shader that ignores what's messed up, ok. But I wouldn't make it official, no way.

What's really messed up is having an incomprehensibly convoluted interface.

Without a simple and consistent interface, Vegastrike will continue to attract and then disillusion artists. Everything goes well while modeling, and up to applying textures in the modeling tool and getting renders; then they disappear when they find they can't deal with mesher and all the inscrutable parameters and complications of xmesh. Like a totally misleading keyword "reflective" buried somewhere in there, causing a large and totally arbitrary amount of shininess to be added. That is messed up; --hopefully not irreparably... Too much useless crap, like all those color specifications that have to either be all zero or all one for them to work as one would expect, some multiplicative, some additive, no way to know, and if you're told once, no way to remember anyway. And alpha values meaning special things. That is messed up.

Ask yourself: Is this engine for artists or for programmers?
Because it's the modders and artists, data-side people, that usually use game engines to create games. But this engine reads as if it were written for programmers. Developers even speak of Python as being "data side".
I happen to be something in-between, but I tell you most artists won't understand shininess if you explain it to them 10^40 times. (Hell; well over 90% of them don't understand what the specular texture is for, or how to produce one; so they follow the 'blind leading the blind', fallacious advice making the rounds of desaturating the diffuse...)

But you can show an artist a few dozen shininesses and he'll catch on.
But, do you expect artists to understand about a shininess number in xmesh that multiplies the value coming from the texture, and a "reflective" parameter adding an arbitrary value to it....?
Get real, Klauss.
"Multiplying shininess", to an artist, sounds like "verticalizing sweetness" or "taking the arc-coshine of desperation"... meaningless to them.

There should be NOTHING in xmesh, except blending mode and shader. Even the texture names could be standardized and hard-coded.
We should be dropping napalm all over xmesh, rather than have to struggle to get rid of such huge amounts of nonsense one tiny bit at a time.
I'd even say xmesh bears 80% of the blame for the Vegastrike engine not being more popular.

But if you insist on that shininess parameter being used, I say fine, 2 more instructions... I don't really care. But the LaGrande tool set and material libraries will expect that number to be 256.0, now and to the end of time.
So, just wasted instructions, wasted bytes in memory, and a risk of accidentally messing up things, --for NOTHING in return.

Ultimately, the look of a model in-game should, as much as possible, depend exclusively on the textures. That's what artists would expect, and quite rightfully so. Some parameters in xmesh are unavoidable, of course, --like blending mode and technique. But anything avoidable in xmesh should be avoided with tenacious determination. I imagine that the original specification for xmesh was concocted by programmers who don't understand artists at all; and who thought that "the more bells and whistles, the better". Well, nothing could be further from the truth. Xmesh is a heavy burden on artists, as it is, precisely because of all the bells and whistles.

Well, at least to whatever extent I may be representative, my texturing work is all fun until I'm approaching the time of having to export .obj's, and use mesher, and edit the xmesh file.... Then I begin to experience cold sweats of paralyzing fear and dread... (correctly) anticipating I will have to repeat the process a dozen times, --with so many things that can go wrong, and do.

EDIT:
Question:
Assuming we use, say, 1024 as our standard cube-map size, is there a way to mathematically determine what shininess level such resolution corresponds to? Because I think *that*, rather than an arbitrary 256.0, should be the maximum level of shininess representable.