Thanks to
by Silicon Graphics Inc.
www.opengl.org
A Few ScreenShots
Source + Executable for Depth Of Field Effect (52.5 KB)
Normalization cubemap generated with glhBuildNormalizationCubeMap (417 KB, 6 PNG files)
Introduction
This page contains a library called "glh",
which stands for Graphics Library Helper. It is similar to OpenGL's
GLU but contains extra functions and optimized functions. The
optimizations are written in assembly for the x86 architecture.
Note that not all parts are written in assembly.
glhlib is still freeware and still contains the very fast image scaling function glhScaleImage_asm386,
the function that started this project.
Log
Saturday, May 1, 2004
This is an announcement.
Since it would be nice to have some features similar to what D3DX offers, I have decided to go in this direction.
The next version will be 1.51 and will feature a reader/writer for DDS files with support for many formats (if not all) and for
2D, 3D and cubemap textures.
The new functions may look something like this:
glhReadFile_DDS(const char *pfilePath, GLint dataAlignment, GLint *width, GLint *height, GLint *depth, GLint *format, GLint *textureType, GLint desiredFormat, GLenum type, void *pixels);
glhWriteFile_DDS(const char *pfilePath, GLint dataAlignment, GLint width, GLint height, GLint depth, GLint format, GLenum type, void *pixels);
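The DDS container is a published Microsoft format: a file begins with the four bytes "DDS " followed by a 124-byte header of little-endian 32-bit fields. As a rough sketch of the first step any DDS reader has to take, the helper below checks that prefix and pulls out the dimensions. The names dds_peek_dims and DdsDims are made up for illustration and are not part of glhlib.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type, for illustration only. */
typedef struct {
    uint32_t width, height, depth;
} DdsDims;

/* Read a 32-bit little-endian value regardless of host byte order. */
static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if buf starts with a plausible DDS header, else 0.
   Note: the depth field is only meaningful when the header flags mark the
   file as a volume texture; this sketch does not check the flags. */
int dds_peek_dims(const unsigned char *buf, size_t len, DdsDims *out)
{
    if (len < 128) return 0;                    /* 4-byte magic + 124-byte header */
    if (memcmp(buf, "DDS ", 4) != 0) return 0;  /* magic */
    if (read_u32_le(buf + 4) != 124) return 0;  /* dwSize */
    out->height = read_u32_le(buf + 12);        /* dwHeight */
    out->width  = read_u32_le(buf + 16);        /* dwWidth */
    out->depth  = read_u32_le(buf + 24);        /* dwDepth */
    return 1;
}
```

A full reader would go on to inspect the pixel format block in the header to decide which of the layouts it is dealing with.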
I intend to support the following formats:
B2_G3_R3 = 8 bit
B2_G3_R3_A8 = 16 bit
B5_G5_R5_A1 = 16 bit
B5_G5_R5_X1 = 16 bit
B4_G4_R4_A4 = 16 bit
B4_G4_R4_X4 = 16 bit
B5_G6_R5 = 16 bit
BGR8 = 24 bit
RGB8 = 24 bit
BGRA8 = 32 bit
RGBA8 = 32 bit
BGRX8 = 32 bit
RGBX8 = 32 bit
R10_G10_B10_A2 = 32 bit
B10_G10_R10_A2 = 32 bit
B16_G16_R16_A16 = 64 bit
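Since the bit depth varies from 8 to 64 bits per pixel and the proposed functions take a dataAlignment argument, the size of one row of pixels depends on both. A minimal sketch of that arithmetic follows, assuming rows are padded up to a multiple of the alignment in the same way OpenGL's unpack alignment works. The name row_size_bytes is hypothetical.

```c
#include <stddef.h>

/* Bytes occupied by one row of 'width' pixels at 'bitsPerPixel',
   rounded up to a multiple of 'alignment' bytes (alignment >= 1). */
size_t row_size_bytes(size_t width, size_t bitsPerPixel, size_t alignment)
{
    size_t raw = (width * bitsPerPixel + 7) / 8;            /* round bits up to bytes */
    return (raw + alignment - 1) / alignment * alignment;   /* pad to alignment */
}
```

For example, a 3 pixel wide BGR8 row needs 9 bytes with alignment 1 but 12 bytes with alignment 4.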
Monday, Nov 24, 2003
Version 1.50 is ready for download.
This update contains some functions that take advantage of SSE instructions.
Intel introduced these instructions with the launch of the Pentium III.
Other CPU makers, like AMD, have also added these instructions to their architectures.
It's time to add them to glhlib. Future additions will have an SSE version as well when possible.
In the header file, Block 16 has been added.
The new functions are (See the header file for more details):
NOTE: These functions are for mass processing.
glhProjectFLOAT_2 (Similar to gluProject but uses float and vertices are 4D)
glhProjectFLOAT_3 (Just like glhProjectFLOAT_2, but instead of 4D data, it takes 3D data)
glhUnProjectFLOAT_2 (Similar to gluUnProject but uses float and is intended for mass processing)
glhUnProjectFLOAT_3 (Just like glhUnProjectFLOAT_2, but instead of 4D data, it takes 3D data)
glhProjectFLOAT_SSE_Aligned_2 (Just like glhProjectFLOAT_2, but uses SSE and some x86_fpu)
glhProjectFLOAT_SSE_Aligned_WarmCache_2 (Do I need to explain?)
glhProjectFLOAT_SSE_Unaligned_2 (Just like glhProjectFLOAT_SSE_Aligned_2, but data need not be 16 byte aligned)
glhUnProjectFLOAT_SSE_Aligned_2 (The unproject version)
glhUnProjectFLOAT_SSE_Aligned_WarmCache_2 (The unproject version)
glhUnProjectFLOAT_SSE_Unaligned_2 (The unproject version)
glhMultiplyMatrixByVector4by4FLOAT_1 (Does a mult_matrix with 4D vectors)
glhMultiplyMatrixByVector4by4FLOAT_2 (Does a mult_matrix with 3D vectors)
glhMultiplyMatrixByVector4by4FLOAT_SSE_Aligned_1 (Does a mult_matrix with 4D vectors with SSE)
glhMultiplyMatrixByVector4by4FLOAT_SSE_Aligned_WarmCache_1 (Does a mult_matrix with 4D vectors with SSE)
glhDoesProcessorSupportMMX (Obvious)
glhDoesProcessorSupportSSE (Obvious)
glhDoesOSSupportSSE (Obvious)
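For readers unfamiliar with what gluProject computes, here is a plain C sketch of the same math using floats and processing many 3D points in one call. It is not the glhlib implementation and does not use its signatures: project_points_f and its parameter layout are assumptions made for illustration. Matrices are column-major, as in OpenGL.

```c
/* out = m * v, with m a column-major 4x4 matrix (OpenGL layout). */
static void mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
    for (int r = 0; r < 4; ++r)
        out[r] = m[r] * v[0] + m[4 + r] * v[1] + m[8 + r] * v[2] + m[12 + r] * v[3];
}

/* Hypothetical batch projection: objXYZ holds 'count' packed x,y,z triples,
   winXYZ receives 'count' window-space triples. Returns 0 if any point
   lands on w == 0, otherwise 1. */
int project_points_f(const float modelview[16], const float projection[16],
                     const int viewport[4],
                     const float *objXYZ, float *winXYZ, int count)
{
    for (int i = 0; i < count; ++i) {
        float in[4] = { objXYZ[3 * i], objXYZ[3 * i + 1], objXYZ[3 * i + 2], 1.0f };
        float eye[4], clip[4];

        mat4_mul_vec4(modelview, in, eye);     /* object -> eye space  */
        mat4_mul_vec4(projection, eye, clip);  /* eye    -> clip space */
        if (clip[3] == 0.0f)
            return 0;

        float invW = 1.0f / clip[3];           /* perspective divide */
        float ndcX = clip[0] * invW, ndcY = clip[1] * invW, ndcZ = clip[2] * invW;

        /* Map [-1, 1] to the viewport rectangle and to depth range [0, 1]. */
        winXYZ[3 * i]     = (float)viewport[0] + (ndcX * 0.5f + 0.5f) * (float)viewport[2];
        winXYZ[3 * i + 1] = (float)viewport[1] + (ndcY * 0.5f + 0.5f) * (float)viewport[3];
        winXYZ[3 * i + 2] = ndcZ * 0.5f + 0.5f;
    }
    return 1;
}
```

With identity matrices and a 640 x 480 viewport at the origin, the object-space origin maps to the window centre (320, 240) at depth 0.5.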
Here is one benchmark:
(Processing 1.6 MB of vertices)
glhProjectFLOAT_2 vs. glhProjectFLOAT_SSE_Aligned_2
glhProjectFLOAT_SSE_Aligned_2 is ~1.3 times faster
glhProjectFLOAT_2 vs. glhProjectFLOAT_SSE_Aligned_WarmCache_2
glhProjectFLOAT_SSE_Aligned_WarmCache_2 is ~1.8 times faster
glhProjectFLOAT_2 vs. glhProjectFLOAT_SSE_Unaligned_2
glhProjectFLOAT_SSE_Unaligned_2 is ~1.2 times faster
(Processing 4.0 MB of vertices)
glhMultiplyMatrixByVector4by4FLOAT_1 vs. glhMultiplyMatrixByVector4by4FLOAT_SSE_Aligned_1
glhMultiplyMatrixByVector4by4FLOAT_SSE_Aligned_1 is ~1.6 times faster
glhMultiplyMatrixByVector4by4FLOAT_1 vs. glhMultiplyMatrixByVector4by4FLOAT_SSE_Aligned_WarmCache_1
glhMultiplyMatrixByVector4by4FLOAT_SSE_Aligned_WarmCache_1 is ~1.7 times faster
Conclusion
Theoretically, since SSE allows us to process 4 numbers at a time, it should
improve performance by a factor of 4 over the standard x86 FPU.
In one artificial test, I attained 3.5 times the performance.
In the glh functions above, the maximum is probably 2x since there is some overhead,
though there may be room for improvement.
The glh functions that prefetch data into the L1 and L2 caches and try to avoid cache pollution
are slightly faster.
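To illustrate what "4 numbers at a time" looks like in practice, here is a small sketch of a matrix by 4D vector multiply written with compiler SSE intrinsics rather than hand-written assembly. It is not the glhlib code. The function name transform_vec4_array is hypothetical, the matrix is assumed to be column-major, and a scalar fallback is used when the compiler does not define __SSE__.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* out[i] = m * in[i] for 'count' packed 4D vectors; m is column-major. */
void transform_vec4_array(const float m[16], const float *in, float *out, size_t count)
{
#if defined(__SSE__)
    /* Each register holds one matrix column (4 floats). Unaligned loads are
       used so the data need not be 16-byte aligned; _mm_load_ps would
       require that alignment. */
    __m128 c0 = _mm_loadu_ps(m);
    __m128 c1 = _mm_loadu_ps(m + 4);
    __m128 c2 = _mm_loadu_ps(m + 8);
    __m128 c3 = _mm_loadu_ps(m + 12);
    for (size_t i = 0; i < count; ++i) {
        const float *v = in + 4 * i;
        /* result = c0*x + c1*y + c2*z + c3*w, four lanes per instruction */
        __m128 r = _mm_mul_ps(c0, _mm_set1_ps(v[0]));
        r = _mm_add_ps(r, _mm_mul_ps(c1, _mm_set1_ps(v[1])));
        r = _mm_add_ps(r, _mm_mul_ps(c2, _mm_set1_ps(v[2])));
        r = _mm_add_ps(r, _mm_mul_ps(c3, _mm_set1_ps(v[3])));
        _mm_storeu_ps(out + 4 * i, r);
    }
#else
    /* Scalar fallback with the same result. */
    for (size_t i = 0; i < count; ++i) {
        const float *v = in + 4 * i;
        for (int r = 0; r < 4; ++r)
            out[4 * i + r] = m[r] * v[0] + m[4 + r] * v[1] + m[8 + r] * v[2] + m[12 + r] * v[3];
    }
#endif
}
```

The four multiplies and three adds per vector replace sixteen scalar multiplies and twelve scalar adds, but loads, stores and the broadcasts add overhead, which is one reason measured gains stay below the theoretical factor of 4.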
In the future, I might release sorting, intersection testing, CSG and image
processing functions that use SSE.
Friday, Sept 12, 2003
Version 1.41 is ready for download.
In version 1.40 (or another version), I had changed the functions:
- glhUnProjectFLOAT_1
- glhUnProjectDOUBLE_1
in order to optimize them, but unfortunately I had put the wrong code
in them, so the results they calculated were wrong. Now both functions give
correct results, comparable to gluUnProject from the GLU library.
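As a reminder of what an unproject has to compute, the sketch below maps a window-space point back to object space. It is not the glhlib code: unproject_point_f is a hypothetical name, and to keep the example short it assumes the caller has already inverted the product projection * modelview (gluUnProject computes that inverse itself). Matrices are column-major.

```c
/* Hypothetical helper. invPM must hold the inverse of (projection * modelview).
   win is x, y in pixels and z in [0, 1]. Returns 0 if the result has w == 0. */
int unproject_point_f(const float invPM[16], const int viewport[4],
                      const float win[3], float obj[3])
{
    float ndc[4], o[4];

    /* Undo the viewport and depth-range mapping: back to [-1, 1]. */
    ndc[0] = (win[0] - (float)viewport[0]) / (float)viewport[2] * 2.0f - 1.0f;
    ndc[1] = (win[1] - (float)viewport[1]) / (float)viewport[3] * 2.0f - 1.0f;
    ndc[2] = win[2] * 2.0f - 1.0f;
    ndc[3] = 1.0f;

    /* o = invPM * ndc */
    for (int r = 0; r < 4; ++r)
        o[r] = invPM[r] * ndc[0] + invPM[4 + r] * ndc[1]
             + invPM[8 + r] * ndc[2] + invPM[12 + r] * ndc[3];

    if (o[3] == 0.0f)
        return 0;

    /* Undo the perspective divide. */
    obj[0] = o[0] / o[3];
    obj[1] = o[1] / o[3];
    obj[2] = o[2] / o[3];
    return 1;
}
```

A quick sanity check for any unproject routine is to project a point and confirm that unprojecting the result returns the original coordinates.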
Also, I have removed my personal copies of gl.h, glext.h, glu.h and wglext.h from the download, so the file
is now 59.1 KB, down from over 100 KB.
Tuesday, June 24, 2003
Version 1.40 is ready for download.
There is a set of functions added and some changes. Check out the header file to learn about the functions and their usage.
Added support for data alignment == 4 in glhScaleImage_asm386, thus any
functions using it in this library also support it.
Made small change to glhScaleImage_asm386 to improve performance for
both 24 and 32 bpp images.
Corrected bug in glhScaleImage_asm386 that caused data corruption on
some color channels for 32 bit images (data alignment == 1).
Corrected bug in glhScaleImage_asm386_MMX that caused data corruption on
some color channels for 32 bit images (data alignment == 1). (Do not use this function)
Corrected bug in glhScaleImage2_asm386 that caused data corruption on
some color channels for 32 bit images (data alignment == 1). (Do not use this function)
Improved glhBuild2DMipmaps so that it wouldn't modify the supplied data
as it created the mipmaps. Also supports data alignment == 4 due to update made
to glhScaleImage_asm386.
Fixed a small bug that caused it to reject the data formats GL_BGR, GL_BGRA,
and GL_ABGR.
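One common way to build a mipmap chain without touching the caller's data is to read each level through a const pointer and write the next, half-sized level into a separate buffer. The sketch below does that for one step with a 2x2 box filter. It is an illustration under stated assumptions (tightly packed rows, even width and height), not the algorithm glhBuild2DMipmaps uses, and downsample_half is a hypothetical name.

```c
#include <stddef.h>

/* Halve a tightly packed image of 'channels' bytes per pixel with a 2x2 box
   filter. src is never written; dst must hold (w/2)*(h/2)*channels bytes.
   Assumes w and h are even and at least 2. */
void downsample_half(const unsigned char *src, int w, int h, int channels,
                     unsigned char *dst)
{
    int dw = w / 2, dh = h / 2;
    size_t rowBytes = (size_t)w * (size_t)channels;

    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            for (int c = 0; c < channels; ++c) {
                /* Top-left texel of the 2x2 source block for this channel. */
                const unsigned char *p =
                    src + (size_t)(2 * y) * rowBytes + (size_t)(2 * x) * channels + c;
                unsigned sum = p[0] + p[channels] + p[rowBytes] + p[rowBytes + channels];
                /* +2 rounds to nearest instead of truncating. */
                dst[((size_t)y * dw + x) * channels + c] = (unsigned char)((sum + 2) / 4);
            }
        }
    }
}
```

Calling this repeatedly, each time feeding the previous output in as the new source, yields the successive levels down to 1 x 1 for power-of-two images.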
glhRenderWith_DepthOfField_SceneAntialiased_FLOAT
has been renamed to
glhRender_DOF_SceneAA_FLOAT
to avoid a compiler bug in Visual C++ 6.
The problem is that VC++ produces a faulty
lib file, which causes problems with the VC++ linker.
Really weird, since other functions seem fine.
Section 12 has a complete set of matrix functions for doing calculations in software.
Section 13 :
glhBuildCubeMapMipmaps (Has the same benefits as glhBuild2DMipmaps)
glhLowerPowerOfTwo2
glhHigherPowerOfTwo2
Decided not to add glhScaleImage3D_asm386 and glhBuild3DMipmaps.
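Power-of-two helpers are typically used to pick a legal texture size before scaling an image. The sketch below shows one plausible meaning of "lower" and "higher" power of two. The exact semantics of glhLowerPowerOfTwo2 and glhHigherPowerOfTwo2 are in the header file, so treat lower_pow2 and higher_pow2 as hypothetical stand-ins.

```c
#include <limits.h>

/* Largest power of two that is <= n. Returns 0 for n == 0. */
unsigned lower_pow2(unsigned n)
{
    unsigned p = 1;
    if (n == 0)
        return 0;
    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Smallest power of two that is >= n. Returns 1 for n == 0.
   For n above the largest representable power of two, that largest
   power is returned instead of overflowing. */
unsigned higher_pow2(unsigned n)
{
    unsigned p = 1;
    while (p < n && p <= UINT_MAX / 2)
        p *= 2;
    return p;
}
```

For a 400 pixel wide image these give 256 and 512 respectively. A width that is already a power of two is returned unchanged by both.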
Section 14 :
glhFrustumd
glhFrustumf
glhOrthod
glhOrthof
glhMergedFrustumd
glhMergedFrustumf
glhMergedPerspectived
glhMergedPerspectivef
glhFrustumInfiniteFarPlaned
glhFrustumInfiniteFarPlanef
glhPerspectiveInfiniteFarPlaned
glhPerspectiveInfiniteFarPlanef
glhLookAtd
glhLookAtf
glhIsMatrixRotationMatrixd
glhIsMatrixRotationMatrixf
glhExtractAnglesFromRotationMatrixd2
glhExtractAnglesFromRotationMatrixf2
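For context on the frustum functions, the sketch below builds the standard perspective matrix that glFrustum produces, plus a variant with the far plane pushed to infinity. The infinite version follows from taking the limit of the two depth terms as the far distance grows without bound. The names frustum_f and frustum_infinite_far_f are hypothetical and the glh functions may differ in signature and in whether they multiply into an existing matrix. Matrices are column-major.

```c
/* Fill m with the matrix glFrustum(l, r, b, t, n, f) would generate. */
void frustum_f(float m[16], float l, float r, float b, float t, float n, float f)
{
    for (int i = 0; i < 16; ++i)
        m[i] = 0.0f;
    m[0]  = 2.0f * n / (r - l);
    m[5]  = 2.0f * n / (t - b);
    m[8]  = (r + l) / (r - l);
    m[9]  = (t + b) / (t - b);
    m[10] = -(f + n) / (f - n);
    m[11] = -1.0f;
    m[14] = -2.0f * f * n / (f - n);
}

/* Same matrix in the limit f -> infinity:
   -(f + n)/(f - n) -> -1 and -2fn/(f - n) -> -2n. */
void frustum_infinite_far_f(float m[16], float l, float r, float b, float t, float n)
{
    frustum_f(m, l, r, b, t, n, 2.0f * n);  /* any valid f; depth terms are overwritten */
    m[10] = -1.0f;
    m[14] = -2.0f * n;
}
```

An infinite far plane is mainly useful for techniques such as stencil shadow volumes, where geometry is extruded to infinity and must not be clipped by the far plane.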
Section 15 :
glhBuild2DNormalMipmaps
glhBuildCubeMapNormalMipmaps
glhBuildNormalizationCubeMap
glhBuildNormalizationCubeMap_FLOAT
Tuesday, August 20, 2002
Version 1.30 is up for download. I made some important changes to
glh. As usual, a set of new functions has been added, but nothing seriously
big. Look at Blocks 11 and 12 in the header file.
Sunday, Feb 3, 2002
A better release. Added the glhGetString function.
Saturday, July 28, 2001
A first release of the library and its source
code, which only contains an optimized version of gluScaleImage
called glhScaleImage_asm386.
I have been able to attain drastic performance
gains compared to the gluScaleImage present in glu32.dll.
Here is one benchmark:
Beginning benchmarking of gluScaleImage:
1024 x 1024 (24 bit) --> 400 x 400 (24 bit)
time = 1502.00012 milliseconds
Beginning benchmarking of glhScaleImage_asm386 with point filtering:
1024 x 1024 (24 bit) --> 400 x 400 (24 bit)
time = 19.99996 milliseconds
Beginning benchmarking of glhScaleImage_asm386 with linear filtering:
1024 x 1024 (24 bit) --> 400 x 400 (24 bit)
time = 120.00000 milliseconds
First / Second = 75.10016
First / Third = 12.51667
Third / Second = 6.00001
By using point filtering, the algorithm is 75 times
faster than gluScaleImage. By using linear filtering, the algorithm
is 12 times faster than gluScaleImage. Very impressive numbers,
I'd say. The catch is that the image's alignment must be 1, the image
must be 24 or 32 bit (GL_RGB, GL_RGB8, GL_RGBA or GL_RGBA8), and
the buffers must be of type GLubyte (unsigned char). That's what most
people use (as do I), so that's what I optimized for.
Just remember that your mileage may vary,
that the algorithm may have bugs, and that the results it generates
may not match those of the original gluScaleImage.
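For readers who want to see what point filtering means, here is a plain C sketch of nearest-neighbour scaling under the same restrictions (alignment 1, unsigned char buffers, 3 or 4 bytes per pixel). It is not the assembly routine from glhlib and makes no claim about its speed. The name scale_image_point and its parameter order are made up for illustration.

```c
#include <stddef.h>

/* Nearest-neighbour resize of a tightly packed image (alignment 1) with
   'channels' bytes per pixel. Each destination pixel copies the source
   pixel whose position corresponds to it; no blending is done. */
void scale_image_point(const unsigned char *src, int sw, int sh,
                       unsigned char *dst, int dw, int dh, int channels)
{
    for (int y = 0; y < dh; ++y) {
        /* Source row for this destination row (integer math, rounds down). */
        int sy = (int)((long long)y * sh / dh);
        for (int x = 0; x < dw; ++x) {
            int sx = (int)((long long)x * sw / dw);
            const unsigned char *s = src + ((size_t)sy * sw + sx) * channels;
            unsigned char *d = dst + ((size_t)y * dw + x) * channels;
            for (int c = 0; c < channels; ++c)
                d[c] = s[c];
        }
    }
}
```

Linear filtering instead blends the neighbouring source pixels for each destination pixel, which costs more per pixel and is consistent with the roughly 6x gap between the two glh timings above.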
Links
Home Page | My Home Page, the root of everything on this server. |
The GLU Library | The GLU library for OpenGL. Download the latest version! |
* OpenGL(R) is a registered trademark of
Silicon Graphics, Inc.
This page is http://www.oocities.org/vmelkon/glhlibrary.html
This page is http://ee.1asphost.com/vmelkon/glhlibrary.html
Graphics Library Helper aka glh
Copyright (C) 2001-2005 Vrej M. All Rights Reserved.