The New Demo Pages

Introduction
This demo is related to "Very Basic vp", except that it's about fragment programs and it's even simpler.
Emulating the fixed pipeline in a fp takes very little. It's only when you try to emulate the texture combiners, NVIDIA's register combiners or texture shaders, or ATI's fragment shaders that it becomes a little more complex.

I will be using GL_ARB_fragment_program and GL_NV_fragment_program, so if you don't have an NVIDIA-based card, you only get to run the ARB version.

It's worth noting that the ARB version was approved on Sept 18, 2002 and was first introduced on the Radeon 9700. The NV version appeared before that in the 40.xx drivers, but could only be used with emulation activated, since the CineFX architecture (NV30) was not yet public.
With emulation, it will of course run incredibly slowly (< 1 FPS).

The Source + Explanations
Like I said in "Very Basic vp", both specs are quite similar. What's more interesting is that neither extension adds new functions for fp. There are just a few tokens for querying the video card's abilities, such as MAX_PROGRAM_ALU_INSTRUCTIONS, MAX_PROGRAM_TEX_INSTRUCTIONS, MAX_PROGRAM_TEX_INDIRECTIONS, MAX_PROGRAM_NATIVE_ALU_INSTRUCTIONS, and others, plus the fragment equivalent of the GL_VERTEX_PROGRAM target: GL_FRAGMENT_PROGRAM!
Oddly, in this case GL_FRAGMENT_PROGRAM_ARB != GL_FRAGMENT_PROGRAM_NV.

So you have to use the same API as you did for vp.
Quick example:

//ARB
GLuint ARBFragmentProgramID[10];
GLint errorPos, isNative;
glGenProgramsARB(1, &ARBFragmentProgramID[0]);
glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, ARBFragmentProgramID[0]);
//cBuffer holds the program text, length is its size in bytes
glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB, length, cBuffer);
//errorPos stays at -1 when the program string was accepted
glGetIntegerv(GL_PROGRAM_ERROR_POSITION_ARB, &errorPos);
glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB, &isNative);
if((errorPos==-1)&&(isNative==1)) { /* it's working */ }

//NV
GLuint NVFragmentProgramID[10];
GLint errorPos;
glGenProgramsNV(1, &NVFragmentProgramID[0]);
glBindProgramNV(GL_FRAGMENT_PROGRAM_NV, NVFragmentProgramID[0]);
glLoadProgramNV(GL_FRAGMENT_PROGRAM_NV, NVFragmentProgramID[0], length, cBuffer);
//errorPos stays at -1 when the program string was accepted
glGetIntegerv(GL_PROGRAM_ERROR_POSITION_NV, &errorPos);
if(errorPos==-1) { /* it's working */ }
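
When errorPos is not -1, it is a byte offset into the program string, which is not very friendly when the program spans many lines. Here is a minimal sketch of a helper that turns that offset into a line and column. The names ProgramErrorLocation and locate_program_error are my own and are not part of either extension. I am also assuming the offset may equal the string length when the error is only noticed at the end of the program, so the helper clamps it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: where in the program text the error sits. */
typedef struct {
    int line;    /* 1-based line of the offending byte */
    int column;  /* 1-based column within that line */
} ProgramErrorLocation;

/* Convert the byte offset read from GL_PROGRAM_ERROR_POSITION_ARB (or _NV)
   into a line/column pair. Call it only when errorPos is not -1. */
static ProgramErrorLocation locate_program_error(const char *source,
                                                 size_t length,
                                                 int errorPos)
{
    ProgramErrorLocation loc = { 1, 1 };
    size_t end;
    size_t i;

    assert(source != NULL);
    assert(errorPos >= 0);

    /* Assumption: clamp in case the offset points at the end of the string. */
    end = ((size_t)errorPos < length) ? (size_t)errorPos : length;

    /* Walk every byte before the error and count the newlines passed. */
    for (i = 0; i < end; ++i) {
        if (source[i] == '\n') {
            ++loc.line;
            loc.column = 1;
        } else {
            ++loc.column;
        }
    }
    return loc;
}
```

You would call it right after the glGetIntegerv above, passing cBuffer, length and errorPos, and print the result next to your own error message.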

With NV, it is possible to control the precision of each instruction, while with ARB there isn't much you can do except provide a rather vague hint through a program option (ARB_precision_hint_nicest or ARB_precision_hint_fastest). I don't know exactly what that means when running on, say, a Radeon 9700/9800 or GeForce FX 5800/5900. The minimum precision required by the ARB spec, and in GL in general, is 1 part in 10^5, which works out to about 17 bits.
In NV, you have a choice between 32, 16 or 12 bit precision. All you have to do is append the letter "R", "H" or "X" to the instruction mnemonic. If you don't, then the destination register determines the precision (typically 32 bit). Some instructions don't take the precision suffixes (TEX/TXP/TXD).
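
To see where the 17 bits come from: you need the smallest b with 2^b >= 10^5. Since 2^16 = 65536 is too small and 2^17 = 131072 is enough, b is 17. The same arithmetic as a tiny sketch in C (bits_for_resolution is just an illustrative name of mine):

```c
#include <assert.h>

/* Smallest number of bits b such that 2^b >= steps, i.e. enough bits to
   resolve one part in `steps`. Meant for small values such as 100000;
   it does not guard against overflow for huge inputs. */
static int bits_for_resolution(unsigned long steps)
{
    int bits = 0;
    unsigned long reach = 1;   /* always equal to 2^bits */

    assert(steps > 0);
    while (reach < steps) {
        reach *= 2;
        ++bits;
    }
    return bits;
}
```

Calling bits_for_resolution(100000UL) gives 17.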

So what can be said about the instructions themselves? Mostly, the instructions present in ARB_vp are also present in ARB_fp. LOG, EXP and ARL are gone, but SIN, COS, SCS, LRP, CMP, TEX, TXP, TXB and KIL are added.
In NV, the number of instructions is huge. There must be close to 200, if not more. This is because an instruction can carry an "R", "H", "X", "C" and "_SAT" suffix.
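
To get a feel for how the suffixes multiply, here is a rough sketch that spells out every variant of one base mnemonic. It assumes the suffixes are written in the order precision ("R", "H" or "X"), then "C", then "_SAT", and that the opcode accepts all of them, which is not true for every opcode (see TEX/TXP/TXD above). The function name and the buffer sizes are my own choices. An opcode that takes everything has 4 x 2 x 2 = 16 spellings, so a dozen or so such opcodes already get you near 200.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NV_MAX_VARIANTS 16   /* 4 precisions x 2 (C) x 2 (_SAT) */
#define NV_MNEMONIC_LEN 16   /* room for the base, "XC_SAT" and the NUL */

/* Fill `out` with every suffixed spelling of `base` and return how many
   were written. Assumes `base` accepts every suffix. */
static int nv_mnemonic_variants(const char *base,
                                char out[NV_MAX_VARIANTS][NV_MNEMONIC_LEN])
{
    static const char *precision[] = { "", "R", "H", "X" };
    static const char *cc[]        = { "", "C" };
    static const char *sat[]       = { "", "_SAT" };
    int count = 0;
    int p, c, s;

    assert(base != NULL);
    assert(strlen(base) + 6 < NV_MNEMONIC_LEN);   /* longest suffix is 6 chars */

    for (p = 0; p < 4; ++p) {
        for (c = 0; c < 2; ++c) {
            for (s = 0; s < 2; ++s) {
                snprintf(out[count], NV_MNEMONIC_LEN, "%s%s%s%s",
                         base, precision[p], cc[c], sat[s]);
                ++count;
            }
        }
    }
    return count;
}
```

For "ADD" this produces ADD, ADD_SAT, ADDC, ADDC_SAT, ADDR, and so on up to ADDXC_SAT.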

In ARB, temp registers are undefined at the beginning of each fragment program invocation.
In NV, temp registers are initialized to zero. So are local parameters.

In both, the standard fragment attributes are present, except for the stencil and auxiliary buffers. A future version will probably remedy this, and I think new hardware might be needed as well.

In ARB, the minimum hw resources an implementation must offer are 10 attributes, 24 parameters, 4 texture indirections, 48 ALU instructions, 24 texture instructions, and 16 temporaries.
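
If you want your ARB program to run everywhere, one simple approach is to keep it inside these minimums. Below is a small sketch of that check. The struct and function names are hypothetical. On a real card you would fill the limit struct with glGetProgramivARB queries (GL_MAX_PROGRAM_ATTRIBS_ARB, GL_MAX_PROGRAM_PARAMETERS_ARB, GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB, GL_MAX_PROGRAM_ALU_INSTRUCTIONS_ARB, GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB, GL_MAX_PROGRAM_TEMPORARIES_ARB) instead of the constants used here.

```c
#include <assert.h>

/* Resource counts, either what a program needs or what the hw offers. */
typedef struct {
    int attributes;
    int parameters;
    int texIndirections;
    int aluInstructions;
    int texInstructions;
    int temporaries;
} FragmentProgramResources;

/* The ARB minimums quoted in the text above. */
static const FragmentProgramResources ARB_FP_MINIMUM = { 10, 24, 4, 48, 24, 16 };

/* Returns 1 when every count in `need` is within `limit`, otherwise 0. */
static int fits_within(const FragmentProgramResources *need,
                       const FragmentProgramResources *limit)
{
    assert(need != 0 && limit != 0);
    return need->attributes      <= limit->attributes
        && need->parameters      <= limit->parameters
        && need->texIndirections <= limit->texIndirections
        && need->aluInstructions <= limit->aluInstructions
        && need->texInstructions <= limit->texInstructions
        && need->temporaries     <= limit->temporaries;
}
```

A program needing 49 ALU instructions would fail this test even if everything else is tiny, so it might not load on a card that only offers the minimum.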
In NV, the temporaries are R0 through R31, which are 32 bit floats (s.e8.m23).
There are also H0 through H63, which are 16 bit floats (s.e5.m10).
There are 2 pseudo registers: RC and HC.
There is one condition code register called CC. Each of its components holds one of GT, EQ, LT or UN.
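
The s.e5.m10 notation means 1 sign bit, 5 exponent bits and 10 mantissa bits. To make that concrete, here is a sketch that expands such a 16 bit value into an ordinary 32 bit float (s.e8.m23) on the CPU. I am assuming the usual IEEE-style reading (exponent bias of 15, denormals, Inf/NaN for an all-ones exponent). How the hardware treats the special cases is not something I can confirm here, and half_to_float is my own name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Expand a 16 bit s.e5.m10 value into a 32 bit s.e8.m23 float. */
static float half_to_float(uint16_t h)
{
    uint32_t sign     = ((uint32_t)h >> 15) & 0x1u;
    uint32_t exponent = ((uint32_t)h >> 10) & 0x1Fu;
    uint32_t mantissa = (uint32_t)h & 0x3FFu;
    uint32_t bits;
    float result;

    if (exponent == 0x1Fu) {
        /* All-ones exponent: Inf (mantissa 0) or NaN. */
        bits = (sign << 31) | 0x7F800000u | (mantissa << 13);
    } else if (exponent != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = (sign << 31) | ((exponent + 112u) << 23) | (mantissa << 13);
    } else if (mantissa != 0) {
        /* Denormal: shift until the implicit leading 1 appears. */
        uint32_t shift = 0;
        while ((mantissa & 0x400u) == 0) {
            mantissa <<= 1;
            ++shift;
        }
        mantissa &= 0x3FFu;   /* drop the now-implicit leading 1 */
        bits = (sign << 31) | ((113u - shift) << 23) | (mantissa << 13);
    } else {
        /* Signed zero. */
        bits = sign << 31;
    }

    /* Reinterpret the bit pattern as a float without aliasing problems. */
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

For example, 0x3C00 decodes to 1.0 and 0xC000 to -2.0. The largest finite value, 0x7BFF, is 65504, which shows how little range an H register has compared to an R register.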

There are five different types of program parameters:
- embedded scalar constants
- embedded vector constants
- named constants
- named local parameters
- numbered local parameters

NV added this type of local program parameter declaration :
DECLARE color = {1,0,0,1};

NV allows constants to be defined using the DEFINE instruction.
DEFINE pi = 3.1415926535; (scalar)
DEFINE color = {0.2, 0.5, 0.8, 1.0}; (vector)

The KIL instruction is interesting since it can be used to stop processing of a fragment. I heard that the KIL instruction might not be "honored" on certain GPUs. This only means a loss of performance but the output will look alright.

One of the most interesting things about fp is that the old notion of a texture coordinate set being tied to the texture bound on the same texture unit is gone.
You can now use whichever texcoord set you want with whichever texture unit, and you can access whichever texture target you want (1D, 2D, 3D, cubemap, rectangle).
This means the texture target hierarchy and the texture enables are ignored.
The only catch is that you can reference only one texture target for a particular texture unit.

Example:
TEX texel, iTex0, texture[0], 2D;
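
For context, here is a sketch of a complete ARB program built around that TEX line, stored as a C string so it can be handed to glProgramStringARB. It fetches from the 2D texture on unit 0 with texcoord set 0 and modulates the result with the primary color. The names iTex0, iColor, oColor, texel, kModulateProgram and modulate_program_length are just illustrative. This is not necessarily what the demo source does.

```c
#include <assert.h>
#include <string.h>

/* A minimal ARB fragment program: texture * primary color. */
static const char kModulateProgram[] =
    "!!ARBfp1.0\n"
    "ATTRIB iTex0  = fragment.texcoord[0];\n"   /* texcoord set 0 */
    "ATTRIB iColor = fragment.color;\n"         /* interpolated primary color */
    "OUTPUT oColor = result.color;\n"
    "TEMP texel;\n"
    "TEX texel, iTex0, texture[0], 2D;\n"       /* 2D target on texture unit 0 */
    "MUL oColor, texel, iColor;\n"
    "END\n";

/* Length in bytes without the terminating NUL, which is what the `length`
   argument of glProgramStringARB expects. */
static int modulate_program_length(void)
{
    return (int)strlen(kModulateProgram);
}
```

To sample with a different texcoord set, you would only change the index in fragment.texcoord[0]. The texture unit in texture[0] can stay the same.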

In NV, there is something called "fragment combiner programs", or FCP. In the script, instead of starting with !!FP1.0, you start with !!FCP1.0.
Besides color and depth, you can pass 4 texture coordinates on for further processing after your FCP, while in ARB there is only the color and depth output.

Conclusion
There is a lot to these extensions, and this page covers only a fraction of what there is to know. I recommend going through the 5000-line specification files to understand them better.

A few more demos will appear making better use of these extensions.
That's it for this very basic fp!

Download source + exe