 
 Thanks to 
 
by Silicon Graphics Inc.
www.opengl.org
A Few ScreenShots
Source + Executable for Depth Of Field Effect (52.5 KB)
Normalization cubemap generated with glhBuildNormalizationCubeMap (417 KB, 6 PNG files)
Introduction
This page contains a library called "glh", which stands for Graphics Library Helper.
It is similar to OpenGL's GLU but adds extra functions and optimized versions of existing ones.
The optimizations are written in assembly for the x86 architecture.
Note that not all parts are done in assembly.
glhlib is still freeware and still contains the very fast image scaling function glhScaleImage_asm386,
the function that started this project.
Log
Saturday, May 1, 2004
This is an announcement.
Since it would be nice to have some features similar to what D3DX offers, I have decided to go in this direction.
The next version will be 1.51 and will feature a reader/writer for DDS files with support for many formats (if not all), support for
2D, 3D and Cubemap textures.
The new functions may look something like this :
glhReadFile_DDS(const char *pfilePath, GLint dataAlignment, GLint *width, GLint *height, GLint *depth, GLint *format, GLint *textureType, GLint desiredFormat, GLenum type,  void *pixels);
glhWriteFile_DDS(const char *pfilePath, GLint dataAlignment, GLint width, GLint height, GLint depth, GLint format, GLenum type,  void *pixels);
I intend to support the following formats :
B2_G3_R3 = 8 bit
B2_G3_R3_A8 = 16 bit
B5_G5_R5_A1 = 16 bit
B5_G5_R5_X1 = 16 bit
B4_G4_R4_A4 = 16 bit
B4_G4_R4_X4 = 16 bit
B5_G6_R5 = 16 bit
BGR8 = 24 bit
RGB8 = 24 bit
BGRA8 = 32 bit
RGBA8 = 32 bit
BGRX8 = 32 bit
RGBX8 = 32 bit
R10_G10_B10_A2 = 32 bit
B10_G10_R10_A2 = 32 bit
B16_G16_R16_A16 = 64 bit
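To give an idea of what handling the packed 16 bit formats involves, here is a minimal sketch that expands one B5_G6_R5 pixel to 8 bits per channel. It is not glh code: the name unpack_b5g6r5 is hypothetical, and the bit layout (blue in the lowest 5 bits, red in the highest 5) is an assumption based on the usual 565 layout in DDS files.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of glh: expand one 16-bit pixel to 8-bit RGB.
   Assumed layout: bits 0-4 blue, bits 5-10 green, bits 11-15 red. */
void unpack_b5g6r5(uint16_t pixel, uint8_t rgb[3])
{
    unsigned b5 = pixel & 0x1Fu;
    unsigned g6 = (pixel >> 5) & 0x3Fu;
    unsigned r5 = (pixel >> 11) & 0x1Fu;

    assert(rgb != 0);

    /* Replicate the high bits into the low bits so that the maximum
       5-bit or 6-bit value maps to 255 and zero stays zero. */
    rgb[0] = (uint8_t)((r5 << 3) | (r5 >> 2));
    rgb[1] = (uint8_t)((g6 << 2) | (g6 >> 4));
    rgb[2] = (uint8_t)((b5 << 3) | (b5 >> 2));
}
```

Bit replication is used instead of a plain shift so that a fully saturated channel comes out as 255 rather than 248 or 252.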
Monday, Nov 24, 2003
Version 1.50 is ready for download.
This update contains some functions that take advantage of SSE instructions.
Intel created these instructions and released them with the launch of the Pentium 3.
Other CPU makers, like AMD, have also added these instructions to their architectures.
It's time to add them to glhlib. Future additions will have an SSE version as well when possible.
In the header file, Block 16 has been added.
The new functions are (See the header file for more details): 
NOTE: These functions are for mass processing.
glhProjectFLOAT_2 (Similar to gluProject but uses float and vertices are 4D)
glhProjectFLOAT_3 (Just like glhProjectFLOAT_2, but instead of 4D data, it takes 3D data)
glhUnProjectFLOAT_2 (Similar to gluUnProject but uses float and is intended for mass processing)
glhUnProjectFLOAT_3 (Just like glhUnProjectFLOAT_2, but instead of 4D data, it takes 3D data)
glhProjectFLOAT_SSE_Aligned_2 (Just like glhProjectFLOAT_2, but uses SSE and some x86_fpu)
glhProjectFLOAT_SSE_Aligned_WarmCache_2 (Do I need to explain?)
glhProjectFLOAT_SSE_Unaligned_2 (Just like glhProjectFLOAT_SSE_Aligned_2, but data need not be 16 byte aligned)
glhUnProjectFLOAT_SSE_Aligned_2 (The unproject version)
glhUnProjectFLOAT_SSE_Aligned_WarmCache_2 (The unproject version)
glhUnProjectFLOAT_SSE_Unaligned_2 (The unproject version)
glhMultiplyMatrixByVector4by4FLOAT_1 (Does a mult_matrix with 4D vectors)
glhMultiplyMatrixByVector4by4FLOAT_2 (Does a mult_matrix with 3D vectors)
glhMultiplyMatrixByVector4by4FLOAT_SSE_Aligned_1 (Does a mult_matrix with 4D vectors with SSE)
glhMultiplyMatrixByVector4by4FLOAT_SSE_Aligned_WarmCache_1 (Does a mult_matrix with 4D vectors with SSE)
glhDoesProcessorSupportMMX (Obvious)
glhDoesProcessorSupportSSE (Obvious)
glhDoesOSSupportSSE (Obvious)
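For readers who want to see what "similar to gluProject but for mass processing" means, here is a plain C sketch of the math, without SSE. It is only an illustration: project_batch and its parameter list are hypothetical and do not match the glh signatures (see the header file for those). It assumes column-major matrices, as OpenGL uses, and 4D input vertices.

```c
#include <assert.h>
#include <stddef.h>

/* out = m * v for a column-major 4x4 matrix (OpenGL memory layout). */
static void mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
    int r;
    for (r = 0; r < 4; ++r) {
        out[r] = m[r] * v[0] + m[4 + r] * v[1] + m[8 + r] * v[2] + m[12 + r] * v[3];
    }
}

/* Hypothetical batch projection, not the glh signature.
   obj: count vertices of 4 floats each (x, y, z, w).
   win: count results of 3 floats each (window x, window y, depth in [0, 1]).
   Returns the number of vertices projected. Vertices whose clip-space w is
   zero are skipped and their output is left untouched. */
size_t project_batch(const float *obj, size_t count,
                     const float modelview[16], const float projection[16],
                     const int viewport[4], float *win)
{
    size_t i, done = 0;

    assert(obj && modelview && projection && viewport && win);

    for (i = 0; i < count; ++i) {
        float eye[4], clip[4], inv_w;
        float *out = win + 3 * i;

        mat4_mul_vec4(modelview, obj + 4 * i, eye);
        mat4_mul_vec4(projection, eye, clip);
        if (clip[3] == 0.0f) {
            continue;
        }
        inv_w = 1.0f / clip[3];

        /* Perspective divide, then map [-1, 1] to the viewport and to [0, 1]. */
        out[0] = viewport[0] + viewport[2] * (clip[0] * inv_w * 0.5f + 0.5f);
        out[1] = viewport[1] + viewport[3] * (clip[1] * inv_w * 0.5f + 0.5f);
        out[2] = clip[2] * inv_w * 0.5f + 0.5f;
        ++done;
    }
    return done;
}
```

Passing the whole vertex array in one call is what makes this suitable for mass processing: the matrices are loaded once and the loop body is the part an SSE version would vectorize.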
Here is one benchmark :
(Processing 1.6 MB of vertices) 
glhProjectFLOAT_2 vs. glhProjectFLOAT_SSE_Aligned_2
glhProjectFLOAT_SSE_Aligned_2 is ~1.3 times faster
glhProjectFLOAT_2 vs. glhProjectFLOAT_SSE_Aligned_WarmCache_2
glhProjectFLOAT_SSE_Aligned_WarmCache_2 is ~1.8 times faster
glhProjectFLOAT_2 vs. glhProjectFLOAT_SSE_Unaligned_2
glhProjectFLOAT_SSE_Unaligned_2 is ~1.2 times faster
(Processing 4.0 MB of vertices) 
glhMultiplyMatrixByVector4by4FLOAT_1 vs. glhMultiplyMatrixByVector4by4FLOAT_SSE_Aligned_1
glhMultiplyMatrixByVector4by4FLOAT_SSE_Aligned_1 is ~1.6 times faster
glhMultiplyMatrixByVector4by4FLOAT_1 vs. glhMultiplyMatrixByVector4by4FLOAT_SSE_Aligned_WarmCache_1
glhMultiplyMatrixByVector4by4FLOAT_SSE_Aligned_WarmCache_1 is ~1.7 times faster
Conclusion
Theoretically, since SSE allows us to process 4 numbers at a time, it should
improve the performance by a factor of 4 over standard x86_fpu.
In one artificial test, I attained 3.5 times the performance.
In the above glh functions the maximum is probably 2x since there is some overhead,
though perhaps there is room for improvement.
The glh functions that prefetch data into L1 and L2 cache and try to avoid cache pollution
are slightly faster.
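As an illustration of processing 4 numbers at a time, here is a small sketch of a matrix by 4D vector multiply written with SSE intrinsics. It is not the glh assembly: transform_vec4_batch is a hypothetical name, the matrix is assumed to be column-major, and unaligned loads are used so the data need not be 16 byte aligned. A scalar fallback is included for compilers that do not target SSE.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

/* Hypothetical sketch, not a glh function: out[i] = m * in[i] for count
   4D vectors. m is a column-major 4x4 matrix. */
void transform_vec4_batch(const float m[16], const float *in, float *out, size_t count)
{
    size_t i;
#if SKETCH_HAVE_SSE
    /* Each register holds one matrix column (4 floats). */
    __m128 c0, c1, c2, c3;

    assert(m && in && out);
    c0 = _mm_loadu_ps(m + 0);
    c1 = _mm_loadu_ps(m + 4);
    c2 = _mm_loadu_ps(m + 8);
    c3 = _mm_loadu_ps(m + 12);

    for (i = 0; i < count; ++i) {
        const float *v = in + 4 * i;
        /* result = c0*x + c1*y + c2*z + c3*w, four lanes at once. */
        __m128 r = _mm_mul_ps(c0, _mm_set1_ps(v[0]));
        r = _mm_add_ps(r, _mm_mul_ps(c1, _mm_set1_ps(v[1])));
        r = _mm_add_ps(r, _mm_mul_ps(c2, _mm_set1_ps(v[2])));
        r = _mm_add_ps(r, _mm_mul_ps(c3, _mm_set1_ps(v[3])));
        _mm_storeu_ps(out + 4 * i, r);
    }
#else
    assert(m && in && out);
    for (i = 0; i < count; ++i) {
        const float *v = in + 4 * i;
        float r[4];
        int k;
        for (k = 0; k < 4; ++k) {
            r[k] = m[k] * v[0] + m[4 + k] * v[1] + m[8 + k] * v[2] + m[12 + k] * v[3];
        }
        for (k = 0; k < 4; ++k) {
            out[4 * i + k] = r[k];
        }
    }
#endif
}
```

With 16 byte aligned data, _mm_load_ps and _mm_store_ps could replace the unaligned variants, which is presumably the kind of difference the Aligned and Unaligned function names refer to.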
In the future, I might release functions for sorting, intersection testing, CSG operations
and image processing that use SSE.
Friday, Sept 12, 2003
Version 1.41 is ready for download.
In version 1.40 (or another version), I had changed the functions:
- glhUnProjectFLOAT_1
- glhUnProjectDOUBLE_1
in order to optimize them, but unfortunately I had placed the wrong function
in there, so the result being calculated was wrong. Now both functions give
correct results, comparable to gluUnProject which is present in the GLU library.
Also, I have removed my personal versions of gl.h, glext.h, glu.h and wglext.h from the download, so the file
is now smaller: 59.1 KB, while before it was over 100 KB.
Tuesday, June 24, 2003
Version 1.40 is ready for download.
There is a set of functions added and some changes. Check out the header file to learn about the functions and their usage.
Added support for data alignment == 4 in glhScaleImage_asm386, thus any
functions using it in this library also support it.
Made small change to glhScaleImage_asm386 to improve performance for
both 24 and 32 bpp images.
Corrected bug in glhScaleImage_asm386 that caused data corruption on
some color channels for 32 bit images (data alignment == 1).
Corrected bug in glhScaleImage_asm386_MMX that caused data corruption on
some color channels for 32 bit images (data alignment == 1). (Do not use this function)
Corrected bug in glhScaleImage2_asm386 that caused data corruption on
some color channels for 32 bit images (data alignment == 1). (Do not use this function)
Improved glhBuild2DMipmaps so that it wouldn't modify the supplied data
as it created the mipmaps. Also supports data alignment == 4 due to update made
to glhScaleImage_asm386.
Fixed small bug that caused it to reject data format GL_BGR, GL_BGRA,
and GL_ABGR.
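The data alignment mentioned in these notes is the usual OpenGL pixel store alignment: each image row starts at a multiple of that many bytes. A minimal sketch of the row size calculation follows. The helper name row_stride_bytes is hypothetical and is not a glh function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of glh: number of bytes from the start of
   one image row to the start of the next, for a given data alignment. */
size_t row_stride_bytes(size_t width, size_t bytes_per_pixel, size_t alignment)
{
    size_t raw = width * bytes_per_pixel;

    /* OpenGL accepts alignments of 1, 2, 4 and 8. */
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);

    /* Round the raw row size up to the next multiple of the alignment. */
    return (raw + alignment - 1) / alignment * alignment;
}
```

For example, a 24 bit image that is 5 pixels wide has 15 bytes of pixel data per row. With alignment 1 the rows are packed tightly, while with alignment 4 each row is padded to 16 bytes.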
glhRenderWith_DepthOfField_SceneAntialiased_FLOAT
has been renamed to
glhRender_DOF_SceneAA_FLOAT
to avoid a compiler bug in Visual C++ 6.
The problem is that VC++ produces a faulty
lib file, which causes problems with the VC++ linker.
Really weird, since other functions seem fine.
Section 12 has a complete set of matrix functions for doing calculations in software.
Section 13 :
glhBuildCubeMapMipmaps (Has the same benefits as glhBuild2DMipmaps)
glhLowerPowerOfTwo2
glhHigherPowerOfTwo2
Decided not to add glhScaleImage3D_asm386 and glhBuild3DMipmaps.
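Power of two helpers like the two listed in Section 13 are typically needed before building mipmaps for hardware that requires power of two texture sizes. The sketch below shows one plausible meaning of "lower" and "higher". The names are hypothetical, and how the actual glh functions treat a value that is already a power of two is not stated here, so check the header file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: largest power of two that is <= n (n must be > 0). */
uint32_t lower_power_of_two(uint32_t n)
{
    uint32_t p = 1;
    assert(n > 0);
    while (p <= n / 2) {
        p *= 2;
    }
    return p;
}

/* Hypothetical sketch: smallest power of two that is >= n.
   n must be in 1..2^31 so that the result fits in 32 bits. */
uint32_t higher_power_of_two(uint32_t n)
{
    uint32_t p = 1;
    assert(n > 0 && n <= 0x80000000u);
    while (p < n) {
        p *= 2;
    }
    return p;
}
```

In this sketch a value that is already a power of two is returned unchanged by both functions.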
Section 14 :
glhFrustumd
glhFrustumf
glhOrthod
glhOrthof
glhMergedFrustumd
glhMergedFrustumf
glhMergedPerspectived
glhMergedPerspectivef
glhFrustumInfiniteFarPlaned
glhFrustumInfiniteFarPlanef
glhPerspectiveInfiniteFarPlaned
glhPerspectiveInfiniteFarPlanef
glhLookAtd
glhLookAtf
glhIsMatrixRotationMatrixd
glhIsMatrixRotationMatrixf
glhExtractAnglesFromRotationMatrixd2
glhExtractAnglesFromRotationMatrixf2
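To show what the frustum functions in Section 14 compute, here is a sketch of the standard glFrustum matrix and of its infinite far plane variant, which is the same matrix in the limit as the far distance goes to infinity. The function names and the choice to overwrite the output matrix are assumptions made for this example. The glh versions may behave differently, for instance by multiplying into an existing matrix.

```c
#include <assert.h>

/* Hypothetical sketch: write the glFrustum projection matrix into m
   (column-major, 16 floats). */
void frustum_matrix(float m[16], float left, float right, float bottom,
                    float top, float znear, float zfar)
{
    int i;
    assert(m && right != left && top != bottom && zfar != znear);
    for (i = 0; i < 16; ++i) {
        m[i] = 0.0f;
    }
    m[0]  = 2.0f * znear / (right - left);
    m[5]  = 2.0f * znear / (top - bottom);
    m[8]  = (right + left) / (right - left);
    m[9]  = (top + bottom) / (top - bottom);
    m[10] = -(zfar + znear) / (zfar - znear);
    m[11] = -1.0f;
    m[14] = -2.0f * zfar * znear / (zfar - znear);
}

/* Hypothetical sketch: the same matrix with the far plane at infinity.
   As zfar grows, m[10] tends to -1 and m[14] tends to -2 * znear. */
void frustum_infinite_far_matrix(float m[16], float left, float right,
                                 float bottom, float top, float znear)
{
    int i;
    assert(m && right != left && top != bottom);
    for (i = 0; i < 16; ++i) {
        m[i] = 0.0f;
    }
    m[0]  = 2.0f * znear / (right - left);
    m[5]  = 2.0f * znear / (top - bottom);
    m[8]  = (right + left) / (right - left);
    m[9]  = (top + bottom) / (top - bottom);
    m[10] = -1.0f;
    m[11] = -1.0f;
    m[14] = -2.0f * znear;
}
```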
Section 15 :
glhBuild2DNormalMipmaps
glhBuildCubeMapNormalMipmaps
glhBuildNormalizationCubeMap
glhBuildNormalizationCubeMap_FLOAT
Tuesday, August 20, 2002
Version 1.30 is up for download. I made some important changes to
glh. As usual, there is a set of new functions added, but nothing seriously
big. Look at Block 11 and 12 in the header file.
Sunday, Feb 3, 2002
A better release. Added the glhGetString function.
 
Saturday, July 28, 2001
A first release of the library and its source
code, which only contains an optimized version of gluScaleImage
called glhScaleImage_asm386.
I have been able to attain drastic performance
gains compared to the gluScaleImage present in glu32.dll.
Here is one benchmark :
Beginning benchmarking of gluScaleImage:
1024 x 1024 (24 bit) --> 400 x 400 (24 bit)
time = 1502.00012 milliseconds
Beginning benchmarking of glhScaleImage_asm386 with point filtering:
1024 x 1024 (24 bit) --> 400 x 400 (24 bit)
time = 19.99996 milliseconds
Beginning benchmarking of glhScaleImage_asm386 with linear filtering:
1024 x 1024 (24 bit) --> 400 x 400 (24 bit)
time = 120.00000 milliseconds
First / Second = 75.10016
First / Third = 12.51667
Third / Second = 6.00001
With point filtering, the algorithm is 75 times
faster than gluScaleImage. With linear filtering, it
is 12 times faster. Very impressive numbers,
I'd say. The catch is that the image's alignment must be 1, the image
must be 24 or 32 bit (GL_RGB or GL_RGB8 or GL_RGBA or GL_RGBA8), and
the buffers must be of type GLubyte (unsigned char). That's what most
people use (as do I), so that's what I optimized for.
Just remember that your mileage may vary,
that the algorithm may have bugs, and that the results it generates
may not match those of the original gluScaleImage.
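For anyone unsure what point filtering means here, the following plain C sketch scales an image by picking the nearest source pixel for every destination pixel. It is not the assembly code from the library and makes no claim to match its output. The name scale_image_point is hypothetical, and the sketch assumes tightly packed rows (alignment 1) of GLubyte data, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not glhScaleImage_asm386: scale an image with point
   (nearest pixel) filtering. Rows are tightly packed, bytes_per_pixel is
   3 for 24 bit RGB or 4 for 32 bit RGBA. */
void scale_image_point(const unsigned char *src, size_t src_w, size_t src_h,
                       unsigned char *dst, size_t dst_w, size_t dst_h,
                       size_t bytes_per_pixel)
{
    size_t x, y;

    assert(src && dst);
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0 && bytes_per_pixel > 0);

    for (y = 0; y < dst_h; ++y) {
        /* Map the destination row back to a source row. */
        size_t sy = y * src_h / dst_h;
        for (x = 0; x < dst_w; ++x) {
            size_t sx = x * src_w / dst_w;
            memcpy(dst + (y * dst_w + x) * bytes_per_pixel,
                   src + (sy * src_w + sx) * bytes_per_pixel,
                   bytes_per_pixel);
        }
    }
}
```

Linear filtering reads and blends several source pixels for each destination pixel instead of copying one, which is consistent with it being slower than point filtering in the benchmark.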
Links
| Home Page | My Home Page, the root of everything on this server. | 
| The GLU Library | The GLU library for OpenGL. Download the latest version! | 
* OpenGL(R) is a registered trademark of
Silicon Graphics, Inc.
This page is http://www.oocities.org/vmelkon/glhlibrary.html
This page is http://ee.1asphost.com/vmelkon/glhlibrary.html
Graphics Library Helper aka glh
Copyright (C) 2001-2005 Vrej M. All Rights Reserved.