P3D Version 1.7
                         3D rendering engine
                          by Calin Andrian
                        
                        
            http://www.geocities.com/SiliconValley/Garage/9955/

        Allegro is a game programming library by Shawn Hargreaves.
            http://www.talula.demon.uk/


DISCLAIMER
#include     /* you know the drill */


INTRO

Main features - covered below in this doc:
- scene rendering package, with scanline-oriented z-sorting, very fast
- z-buffered rendering package
- 4 more rendering styles, *TRANS.


FILES IN THIS ZIP

p3d.txt         This file
makefile        Obvious
libp3d.a        The library
p3d.h           Public defines
p3dpriv.h       Internal defines
scene3d.c       3D rendering engine
poly3d.c        3D polygon3d(), with their helpers. It replaces Allegro's
                file, covering z-buffered rendering also.
scanline.s      Replaces Allegro's file. Some of the routines have speed
                improvements over the original.
mmxline.s       Replaces Allegro's MMX part of scanline.s. Improved routines.
zbufline.s      Z-buffered renderers.
perf3d.c        Small program to measure 3d performance when rendering to
                a memory bitmap.
test.c          Added some menus and tests


INSTALLING

0.   Prerequisites: DJGPP, ALLEGRO 3.1 installed. It may work with earlier
     versions of Allegro, but I didn't try.
     NOTE: You will need binutils (i.e. GNU assembler) version 2.8.1 or
     higher in order to compile the MMX support.
1.   Unzip all files in a dedicated directory (I use /djgpp/allegro/p3d).
2.A. If you are in a hurry, type "make install" and the pre-built library
     libp3d.a and header p3d.h will go to the "lib" and "include" directory
     of your djgpp installation.
2.B. If you want to compile it yourself, type "make clean" to get rid of
     the precompiled library, followed by "make". This will build everything
     (including the new test.exe) and install the library and the header
     to their places. Test.exe will stay put.
       Compiling the package requires files from the Allegro package:
     - from allegro/src. These are part of the Allegro sources.
     - from allegro/obj/djgpp. These are created when you compile Allegro.
     The makefile finds them presuming "allegro" is a subdir of the djgpp
     base directory (ex: c:\djgpp\allegro). If this is not true, you should
     edit the makefile.
3.   Include "p3d.h" after "allegro.h" in the source files that have to
     do with 3D. Link your programs with libp3d.a before liballeg.a.
     Example:     "gcc -o foo.exe foo.o   -lp3d -lalleg"


SCENE RENDERING INTRO

There are 4 important and simple enough approaches to removing hidden
objects (or surfaces):

1. Painter's algorithm - you draw far things first, so that when you
   draw near things you overwrite them. It leads to the depth-sort
   algorithm, which is pretty simple. It has some drawbacks: it can't
   solve some kind of ambiguities without breaking some polygons in
   smaller ones; it draws _everything_ - if you have a lot of polygons
   in your line of sight you draw a _lot_; the complexity of the sorting
   problem grows fast with the number of polygons.

2. Z-buffering - every pixel is recorded along with its z coordinate.
   If the new pixel is farther away, it will be skipped. Andrei Ellman
   has written a z-buffering package for Allegro, called AllegroPak.
   Pros: you don't make any polygon sorting - you just draw everything
   and the low-level routines take care that you see what should be
   visible; it's the only algorithm that directly solves penetrating
   shapes; the complexity of the problem grows linear with the number
   of polygons. Cons: the low-level routines are all more complex, and
   therefore slower; you still draw (or at least process every pixel
   individually) all the pixels of all the polygons. This is covered
   by P3D.

3. Scan-line algorithms - along each scanline in your screen, you keep
   track of what polygons you are "in" and which is the nearest. This
   status changes only where the scanline crosses some polygon edge.
   So you have to juggle an edge list and a polygon list. And you
   have to sort the edges for each scanline (this can be countered by
   keeping the order of the previous scanline - it won't change much).
   The BIG advantage is that you write each pixel only once. If you
   have a lot of overlapping polygons you can get incredibile speeds
   compared to any of the previous algorithms. This is covered by the
   scene_* routines in P3D.

4. Area subdivision - the true "divide and conquer" strategy. Too complex
   to tell you about it here. The nicest from these 4, from the
   programmer's point of view - it's juicy, it's challenging. But it
   expects efficient area drawing low-level routines. Because Allegro
   (due to video memory banking) is scan-line oriented, the performance
   expectations are too low to even try.


SCANLINE SORTING

The scene rendering has approx the following steps:
1. Initialize the scene (set the clip area, clear the bitmap, blit a
   background, etc)
2. Call scene_start()
3. Transform all your points to camera space
4. Clip polygons
5. Project with persp_project()
6. "Draw" polygons with scene_polygon3d() and/or scene_polygon3d_f().
   This doesn't do any actual drawing, only initializes tables.
7. Render all the polygons defined in step 3 to the bitmap with
   scene_render().
8. Overlay some non-3D graphics.
9. Show the bitmap (blit it to screen, flip the page, etc).


The package uses a scanline-oriented algorithm (see above). For each
horizontal line in the viewport an x-sorted edge list is used to keep
track of what polygons are "in" and which is the nearest.
 Vertical coherency is used - the edge list for a scanline is sorted
starting from the previous one - it won't change much.
 With ver 1.2 horizontal coherency is used - a dinamic z-sorted list
is used for polygons that overlap.
 I've tried global coherency - when you determine the visibility relation
between two polygons, store it for use in the next scanlines. I gave up
because there is no speed to be gained - the current bottleneck is
somewhere else.
The scene rendering routines use the same low-level asm routines as
normal polygon3d().


Here are the functions:

int scene_start(BITMAP *bmp, int nedge, int npoly);
   Initializes a scene.
   The bitmap bmp is the bitmap you will eventually render on.

   Nedge and npoly are your estimates of how many edges and how many
   polygons you will render (you cannot get over the limit specified
   here). If you use same values in succesive calls, the space will be
   reused (no new malloc()). If you call with nedge=0 or npoly=0, the
   allocated space is freed.
   The memory allocated is a little less than 150 * (nedge + npoly) bytes.

   Returns zero on success, negative numbers if allocations fail.

int scene_polygon3d(int type, BITMAP *texture, int vc, V3D *vtx[]);
int scene_polygon3d_f(int type, BITMAP *texture, int vc, V3D_f *vtx[]);
   Puts a polygon in the rendering list. Nothing is really rendered at
   this moment. Should be called between scene_start() and scene_render().

   Arguments are the same as for polygon3d(), except the bitmap is missing.
   It will be used the one passed to scene_start().

   Unlike polygon3d(), the polygon may be concave or self-intersecting.
   Shapes that penetrate one another may look OK, but they are not
   really handled by this code.
   Note that the texture is stored as a pointer only, and you should
   keep the actual bitmap until scene_render(), where it is used.

   Since the FLAT style is implemented with the low-level hline() funtion,
   the FLAT style is subject to DRAW_MODEs. All these modes are valid.
   Along with the polygon, this mode will be stored for the rendering moment,
   and also all the other realted variables (color_map pointer, pattern
   pointer, anchor, blender values).

   The settings of cpu_mmx and cpu_3dnow global variables on entry in this
   routine affect the choice of low-level asm routine that will be used by
   scene_render() for this polygon.

   Returns zero on success, negative numbers if there is no space in
   the edge or polygon tables, positive if it won't be rendered for
   lack of rendering routine.

void scene_render();
   Renders all the specified scene_polygon3d()'s on the bitmap passed
   to scene_start(). Rendering is done one scanline at a time, with
   no pixel being processed more than once.
   Note that between scene_start() and scene_render() you shouldn't
   change the clip rectangle of the destination bitmap. For speed
   reasons, you should set the clip rectangle to the minimum.
   Note also that all the textures passed to scene_polygon3d() are
   stored as pointers only and actually used in scene_render().

extern float scene_gap;
   This number (default value = 100.0) controls the behaviour of the
   z-sorting algorithm. When and edge is very close to another's
   polygon plane, there is an interval of uncertainty inside which
   you cannot tell which object is visible (which z is is smaller).
   This is due to numerical cummulative errors for edges that
   have undergone a lot of transformations and interpolations.

   The default value means that if the 1/z values (in projected space)
   differ by only 1/100 (one percent), they are considered to be
   equal and the x-slopes of the planes are used to find out which
   plane is getting closer when we move to the right.

   Larger values means narrower margins, and increasing the chance of
   missing true adjacent edges/planes. Smaller values means larger
   margins, and increasing the chance of mistaking close polygons for
   adjacent ones. The value of 100 is close to the optimum. However,
   the optimum shifts slightly with resolution, and may be
   application-dependent. It's for you to fine-tune. Mail me if you
   have a better solution.


SCENE RENDERING PITFALLS

The plane's equation is computed from the first 3 vertices of the
polygon. Make sure that they are not colinear, or you will probably
get FPU exceptions.

Unlike polygon3d(), scene_polygon3d() requires valid z coordinates
for all vertices, regardless of rendering style (unlike polygon3d(),
which only uses z coordinate for *PTEX*).

Don't build the colors for GRGB styles with makecol24(), as this routine
builds a hardware-dependent color. GRGB requires a hardware-independent
color with the RGB in this order. Use something like:
    color = (r << 16) | (g << 8) | b
with r, g, b between 0 and 255.

All polygons passed to scene_polygon3d() have to be persp_project()'ed.

After scene_render() the mode is reset to SOLID.


SCENE RENDERING PERFORMANCE

You can play with test.exe and see what happens with simple rendering
styles (FLAT), medium (GRGB) and very CPU-intensive (PTEX_LIT). You
will see that the slow video card, the performance loss due to
truecolor mode, the performance penalty of tough (but sometimes
unavoidable) rendering styles, are all compensated by the fact that
the algorithm doesn't process a pixel twice.
There is another source of speed: the video banks are changed only
once per scanline, not once per scanline per polygon. With VESA 1 for
example this may be important.

On the other hand, using a lot of *MASK* polygons drastically reduces
performance, because when a MASKed polygon is the first in line of
sight, the polygons underneath have to be drawn too. Same applies to
FLAT polygons drawn with DRAW_MODE_TRANS.

Example:
A Cyrix PR166 + S3 Trio renders 186 normal PTEX_LITs per second, in
32 bit mode. This means that you can hope (at the strange resolution
of 512 x 256 that test.exe uses for testing) for 4 frames per second
with 50 triangles per frame. But the scene test shows 717 triangles
per second at 50 triangles per scene (and frame). So you may even
get to 14 frames per second !
On the other hand, the same machine renders 1813 FLATs per second
in 8 bit mode. Scene rendering shows around 1500 no matter how many
triangles there are in a frame. You lose a little because of the
extra processing in sorting and testing edges, compared to straight
FLAT polygons which draw _very_ fast.


Z-BUFFERED RENDERING

I added the z-buffered rendering as an extension of the normal
POLYTYPE_* rendering styles. Just OR the POLYTYPE with the value
POLYTYPE_ZBUF, and the normal polygon3d(), polygon3d_f(), quad(), etc
will render z-buffered polygons. Example:
  polygon3d(POLYTYPE_ATEX | POLYTYPE_ZBUF, tex, vc, vtx);

Of course, the z coordinates have to be valid regardless of rendering
style.

At the beginning of every frame you have to clear the z-buffer. This is
done with:

zbuf_start(BITMAP *bmp, float clip_z);
   This routine clears the zbuffer associated with the destination bitmap
   bmp. Bmp is the bitmap where you will render all the polygons. If the
   z-buffer doesn't exist, it is created. If bmp is NULL, the z-buffer is
   destroyed. The zbuffer is filled with the value of clip_z. This is 1/z
   of the far clipping polygon (z=0.0 means no clipping).
   If you want to change the size of the destination bitmap, you have to
   destroy the current z-buffer and create a new one.

The z-buffer is a 32 bit/pixel bitmap, pointed to by
BITMAP *zbuf_bmp;
If you are worried about clearing time and/or the size of the z-buffer,
you should pass zbuf_start() the smallest possible sub-bitmap, and use
this also as the destination for your rendering.

Z-buffered rendering works also within the scene renderer. It may be
helpful when you have a few intersecting polygons, but most of the
polygons may be safely rendered by the normal scanline sorting algo.
Same as before: just OR the POLYTYPE with POLYTYPE_ZBUF. Also, you
have to clear the z-buffer at the start of the frame. Example:

scene_start(screen, 4000, 1000);
if (some_polys_are_zbuf) zbuf_start(screen, 0.0);
while (polygons) {
   ....
   if (this_poly_is_zbuf) type |= POLYTYPE_ZBUF;
   scene_polygon3d(type, tex, vc, vtx);
}
scene_render();

If you don't need z-buffered rendering and you want to reduce your exe
size, you can comment out the USE_ZBUF #define in p3d.h.

Notes on z-buf renderers:
1. Unlike the normal FLAT renderers, the z-buffered ones don't use
the hline() routine. Therefore DRAW_MODE has no effect.
2. The *LIT* routines work the traditional way - through the set of
blender routines.
3. All the z-buf routines are much slower than their normal counterparts
(all use the FPU to interpolate and test 1/z values).
4. No MMX versions. I doubt MMX could help speeding them up. Even 3DNow
is not very useful.


TRANS RENDERING STYLES

I defined 4 more rendering modes:
POLYTYPE_ATEX_TRANS,
POLYTYPE_PTEX_TRANS,
POLYTYPE_ATEX_MASK_TRANS,
POLYTYPE_PTEX_MASK_TRANS.
They render translucent textures, as you expected. All the general
rules for drawing translucent things apply.
They have a major limitation: these modes only work with memory bitmaps
or linear frame buffers (not with banked frame buffers). Don't even try,
they do no checking and your program will die horribly (or at least draw
wrong things).


CHANGES

From 1.6 to 1.7:
- More speed improvements. Totally rewritten 3DNow routines.
- Full support for all DRAW_MODEs.
- Minor bug fixes in poly3d.c.
- Added perf3d.exe for simple performance measurement.
- Split scanline.s in two files (moved the MMX and 3DNow routines in
  mmxline.s).

From 1.5 to 1.6:
- Added z-buffered rendering.
- Improved some rendering routines in scanline.s.

From 1.4 to 1.5:
- Stripped it down, as most of it has gone in the official Allegro
distribuition.

From 1.3 to 1.4:
- Corrected a major problem with the scene rendering: I was only taking
into account the closest polygon at all times. But this meant that you
don't see the polygon in the back through the holes of a MASKed polygon
in the front. In fact the fix made the code more elegant.

From 1.2 to 1.3:
- Changed the FLAT modes to use hline().
- Added back 3DNow! routines.
- Split the distribution in more files.
- Automatically check for MMX-capable assembler.

From 1.1 to 1.2:
- Removed the "parallel" argument to scene_start(). There was no real
support for parallel projection. Some other things (too many) had to
be modified too and I quit. Only perspective is supported. Sorry for
the oscillations - sometimes I'm in open loop.
- Rearranged scene_render() for better performance. Now it takes
advantage of horizontal coherency (if the polygons don't penetrate each
other their z-order doesn't change - so overlapping polygons are kept
in a z-sorted list).
- Solved a strange bug related to z-sorting adjacent polygons. It was
due to numerical approximations (you have to leave an uncertainty zone
around zero when you compare two values that have undergone a lot of
transformations).
- Added 3d clipping.

From 1.0 to 1.1:
- Added the "parallel" argument to scene_start(). Previously the 
parallel (!) mode was used. It had no bad effects for perspective,
unless some shapes got too close to the viewing point.
- Re-packed the whole thing. Now it generates an independent library
and doesn't change any of the original Allegro sources.


TODO
- First on the list: KNI routines.
- Make perspective correct ALPHA interpolation in PTEX renderers. Already
tried it once, but I wasn't happy with the speed.
- Add a new rendering style: PGRGB (GRGB with perspective - correct
interpolation). I discovered that the warping effects are very strange
when the polygon is 3D clipped (when extra vertices are inserted by
the clipping routine). I'm afraid though that it could be very sloooow.


THANKS
To DJ Delorie for DJGPP
To Shawn Hargreaves for Allegro
To all the Allegroids on the mailing list for their interest


DOWNLOAD FROM

My page at
http://www.geocities.com/SiliconValley/Garage/9955/



Try it and tell me what you think,
Calin Andrian



1999 Feb 24, 19:09


    Source: geocities.com/siliconvalley/garage/9955

               ( geocities.com/siliconvalley/garage)                   ( geocities.com/siliconvalley)