Code Gems Part 3
This text comes from IMPHOBIA Issue X - June 1995
Welcome and greetings! Prepare for another bunch of coding tips... By the
way, if You have some nice tricks but You don't feel inspired enough to
write Code Gems part 5, please send 'em to me. I'm running out of ideas,
but I'm sure there are a couple of tricks left :-) I'd like to say a big-big
thanks to my enthusiastic friends who helped me finish this
article: Perla, Nicke, Deity, George, Stinyo, Rodrigo, G.O.D., #coders and
everyone I forgot...
* Correction for the previous part *
In the last issue I wrote that TASM doesn't support the LOOP instruction
using ECX instead of CX in 16-bit code. Well, I was wrong, sorry about
that. Of course it can do that. (I finally managed to glance at an
original TASM manual :-) There are two instruction aliases called LOOPW
and LOOPD. The first one always uses CX as the counter (regardless of the
size of the current code segment), the other always uses ECX. They can be
combined with the conditions too, as LOOPWE, LOOPDNE, and so on. JECXZ is
also available.
* Calculating the absolute value of AX *
This wonderful 'gem' was developed by Laertis / Nemesis.
cwd
xor ax,dx
sub ax,dx
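Why the three instructions work: CWD fills DX with copies of AX's sign bit, so for a negative AX the XOR inverts every bit and subtracting DX (that is, -1) adds one, which together is two's-complement negation; for a non-negative AX both steps do nothing. The C model below (the name `abs16` is only for illustration) mimics the 16-bit registers. Note that 8000h comes back as 8000h, since it has no positive counterpart.

```c
#include <assert.h>
#include <stdint.h>

/* C model of: cwd / xor ax,dx / sub ax,dx (all arithmetic modulo 2^16) */
static uint16_t abs16(uint16_t ax)
{
    uint16_t dx = (ax & 0x8000u) ? 0xFFFFu : 0x0000u; /* cwd: DX = sign of AX */
    ax ^= dx;                                          /* invert bits if negative */
    ax = (uint16_t)(ax - dx);                          /* minus 0FFFFh = plus 1 */
    return ax;
}
```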
* Short Compare, part II *
Checking if a register contains 8000h (or 80h or 80000000h):
neg register
jo it_was_8000
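NEG sets the overflow flag only for 8000h, the one value whose negation
doesn't fit, and 8000h negated is 8000h again, so the register won't be
changed if it was 8000h :-) Any other value is negated, though, so NEG it
back if you still need it.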
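NEG computes 0 - x and, like any subtraction, sets OF when the signed result doesn't fit. A rough C model (the names `neg16` and `of` are hypothetical) shows that with a zero minuend the usual overflow rule reduces to "operand and result both negative", which holds for 8000h alone:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* C model of NEG r16. OF follows the usual rule for a - b:
   overflow if a and b have different signs and the result's sign differs from a.
   With a = 0 this reduces to: b and the result are both negative. */
static uint16_t neg16(uint16_t x, bool *of)
{
    uint16_t r = (uint16_t)(0u - x);
    *of = ((x & r) & 0x8000u) != 0;
    return r;
}
```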
* Multi-Segment STOS/MOVS *
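In flat real mode it's possible to use block moves spanning more than one
64k segment: ECX for counting and ESI / EDI for addressing. The trick is the
67h address-size prefix, which makes the string instruction use the 32-bit
registers in 16-bit code: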
ESTOSD macro
db 67h
stosd
endm
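For example, this code clears four megabytes of memory (100000h
doublewords) starting at ES:200000h, assuming ES has a 4 GB limit. Note
that STOSD stores through EDI, not ESI:
xor eax,eax
mov ecx,100000h
mov edi,200000h
rep estosd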
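Why the prefix matters: without it, REP STOSD in 16-bit code counts with CX and stores through DI, so the example above would see CX = 0 (the low word of 100000h) and store nothing at all. The C model below, with the hypothetical names `rep_stosd` and `stosd_state`, tracks only the register updates (direction flag clear) for both address sizes; no memory is simulated.

```c
#include <assert.h>
#include <stdint.h>

/* Register state after a REP STOSD (DF = 0). */
typedef struct {
    uint32_t ecx, edi, bytes_stored;
} stosd_state;

/* addr32 != 0: 67h prefix, ECX counts and EDI addresses.
   addr32 == 0: plain 16-bit addressing, only CX and DI take part.
   (bytes_stored is only meaningful for counts below 40000000h.) */
static stosd_state rep_stosd(uint32_t ecx, uint32_t edi, int addr32)
{
    stosd_state s;
    if (addr32) {
        s.bytes_stored = ecx * 4u;
        s.edi = edi + ecx * 4u;
        s.ecx = 0;
    } else {
        uint16_t cx = (uint16_t)ecx;
        uint16_t di = (uint16_t)(edi + (uint32_t)cx * 4u); /* DI wraps at 64k */
        s.bytes_stored = (uint32_t)cx * 4u;
        s.ecx = ecx & 0xFFFF0000u;             /* CX counted down to 0 */
        s.edi = (edi & 0xFFFF0000u) | di;      /* upper half of EDI untouched */
    }
    return s;
}
```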
* Pixel drawing in protected mode *
Here comes a 'routine' which sets a pixel to the given value in 256-color
mode:
(parameters: EAX = X coordinate, EBX = Y coordinate, CL = color)
add eax,table[ebx*4]
mov [eax],cl
The only difference from the real-mode method is that TABLE doesn't contain
the values 0, 320, 640, etc. Because protected-mode offsets are relative to
the base of DS, it contains (0A0000h - base of DS), (0A0000h - base of DS +
320), (0A0000h - base of DS + 640), ... There's another version which
doesn't change EAX:
mov edx,table[ebx*4]
mov [eax+edx],cl
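A sketch of how such a table could be filled, in C for readability; `build_row_table`, `pixel_offset` and the 320x200 size (mode 13h) are assumptions. It relies on 32-bit wraparound: if the base of DS lies above 0A0000h the entries become "negative" offsets, which only works if DS has a 4 GB limit.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_W 320u
#define SCREEN_H 200u

/* Entry y = DS-relative offset of row y of the frame buffer at linear 0A0000h.
   Unsigned arithmetic wraps modulo 2^32, just like the 32-bit ADD in the asm. */
static void build_row_table(uint32_t table[SCREEN_H], uint32_t ds_base)
{
    for (uint32_t y = 0; y < SCREEN_H; y++)
        table[y] = 0xA0000u - ds_base + y * SCREEN_W;
}

/* Model of "add eax,table[ebx*4]": x + table[y] is the offset used by mov [eax],cl. */
static uint32_t pixel_offset(const uint32_t table[SCREEN_H], uint32_t x, uint32_t y)
{
    return x + table[y];
}
```

Adding the DS base back to the result (modulo 2^32) gives the linear address 0A0000h + y*320 + x.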
* Simple recursive calls *
Sometimes we have to call one subroutine many times like this:
mov cx,4
call waitraster
loop $-3
But this needs a register as a loop counter ;-) There's another way:
call waitraster4
...
waitraster4:
call waitraster2
waitraster2:
call waitraster
waitraster:
mov dx,03dah
...
ret
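Each entry label calls the next one and then falls through into it, so every extra level doubles the number of runs. C has no fall-through between functions, so in this rough model the fall-through is written as a second call; the counter `runs` stands in for the real retrace wait, and `waitraster8` is a hypothetical extra level.

```c
#include <assert.h>

static int runs; /* how many times the "real" routine executed */

static void waitraster(void)  { runs++; }                      /* stand-in body */
static void waitraster2(void) { waitraster();  waitraster();  } /* call + fall-through */
static void waitraster4(void) { waitraster2(); waitraster2(); } /* call + fall-through */
static void waitraster8(void) { waitraster4(); waitraster4(); } /* one more level */
```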
Well, this is not really exciting, but it works :-) Now a more useful
example: loading instrument data into the AdLib card.
;Load AdLib instrument. Inputs:
;ds:si: register values (5 words, in
;       register order 0e0h, 20h, 40h,
;       60h, 80h; lower byte: data for
;       operator 1, higher byte: data
;       for operator 2)
;al: operator offset of the channel
;    (0,1,2,8,9,0ah,10h,11h,12h)
;adlib_address_delay and adlib_data_delay
;are not shown; they must preserve AX, DX.
loadinstr:
mov dx,388h
add al,0e0h
call double_load
sub al,0c6h
call double_double
add al,1ah
double_double:
call double_load
add al,1ah
double_load:
call final_load
final_load:
mov ah,[si]
inc si
out dx,al
call adlib_address_delay
xchg al,ah
inc dx            ;data port (389h)
out dx,al
dec dx            ;back to the address port (388h)
call adlib_data_delay
mov al,ah
add al,3
ret
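Following the call/fall-through chain by hand is error-prone, so here is a C model that logs (register, data) pairs instead of doing OUTs; the structure and function names are hypothetical, and fall-throughs are again written as extra calls. For a channel offset o it writes registers 0E0h+o, 0E3h+o, 20h+o, 23h+o, and so on up to 83h+o.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t reg, data; } adlib_write;

typedef struct {
    const uint8_t *si;      /* ds:si, instrument bytes */
    uint8_t al;             /* current register index  */
    adlib_write log[10];    /* writes instead of OUTs  */
    size_t n;
} adlib_model;

static void final_load(adlib_model *m)
{
    m->log[m->n].reg = m->al;          /* out 388h,al (register index) */
    m->log[m->n].data = *m->si++;      /* out 389h,[si] (data)          */
    m->n++;
    m->al = (uint8_t)(m->al + 3);      /* operator 2 is 3 registers higher */
}

static void double_load(adlib_model *m) { final_load(m); final_load(m); }

static void double_double(adlib_model *m)
{
    double_load(m);
    m->al = (uint8_t)(m->al + 0x1A);
    double_load(m);                    /* fall-through */
}

static void loadinstr(adlib_model *m, const uint8_t instr[10], uint8_t op_offset)
{
    m->si = instr;
    m->n = 0;
    m->al = (uint8_t)(op_offset + 0xE0);
    double_load(m);
    m->al = (uint8_t)(m->al - 0xC6);
    double_double(m);
    m->al = (uint8_t)(m->al + 0x1A);
    double_double(m);                  /* fall-through */
}
```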
* Hardware scroll with one page *
First a few words about vertical hardware scrolling. The 'standard'
scroll requires at least two pages. At the beginning the first page is
visible, and it's black. Then the screen scrolls up one row, so the first
row of the second page appears at the bottom. Now this row is copied to the
first row of the first page (which is now invisible). This continues
until the second page is entirely visible; at this point the two pages
are identical. Now the first page is displayed again and the whole process
starts over. The problem is the memory requirement, which is too big: with
this method a 640*480 scroll is impossible, since one page occupies more
than 128k of video memory, so two pages don't fit into 256k.
But why do we need two pages? Because video memory is not 'circular'.
If it were, then after scrolling the screen up by one row, the first row of
video memory, which was at the top of the screen, would appear at the bottom.
With this kind of video memory we could do a smooth vertical scroll with
a single page: at the beginning the screen is black. Wait for a vertical
retrace, change the first row, and shift the screen up by one row so that
the just-modified row appears at the bottom. Perfect, eh? The question is
how we can make 'circular' memory...
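The single-page idea can be checked on a toy 'circular' memory of a few rows; in this C sketch the names and the 4-row size are made up. Screen line `line` shows memory row `(top + line) mod ROWS`, so after advancing `top` the row that was just overwritten ends up on the last screen line.

```c
#include <assert.h>
#include <stdint.h>

#define ROWS 4u  /* toy screen/memory height, made up for the example */

/* Memory row shown on screen line `line` when the display start is row `top`
   and video memory wraps after ROWS rows. */
static unsigned visible_row(unsigned top, unsigned line)
{
    return (top + line) % ROWS;
}

/* One scroll step: overwrite the row currently at the top (during retrace),
   then move the display start down one row; that row is now at the bottom. */
static unsigned scroll_step(uint8_t mem[ROWS], unsigned top, uint8_t new_row)
{
    mem[top] = new_row;
    return (top + 1) % ROWS;
}
```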
It's a well-known fact that there's a certain problem with hardware
scrolling on TSENG cards: every second page contains some 'noise' instead of
the scroll we expect to see... The cause is the 'memory display start'
register (3d4/0c,0d), which works a bit differently than on other cards.
On other cards only the first 256k of video memory is ever displayed on the
screen, even if the memory display start register (MDSR) is set close to
the end of the 256k. These cards handle the 256k as a circular buffer, but
the TSENG boards don't:
.----------.   <- screen ->   .----------.
|video     |   <-  MDSR  ->   |video     |
|memory    |                  |memory    |
|          |                  |          |
|     3ffff|                  |     3ffff|
|----------|                  |----------|
|00000     |       TSENG ->   |40000     |
|wraps     |                  |continues |
|          |   <- VGA         |          |
`----------'                  `----------'
So what we can do is 'emulate' the standard VGA circular buffer with the
LINE COMPARE REGISTER (LCR, 3d4/18h). The function of this register is
pretty simple: when the scanline counter reaches this value, the display
address wraps to 0, the beginning of video memory:
                .---------.
      MDSR ->   |video    |
                |memory   |
                |         |
  line          |    ?????|
  compare ->    |---------|
  register      |00000    |
                |wraps    |
                |         |
                `---------'
The *big* advantage is that it's possible to emulate circular video memory
shorter than 256k! It should work on all VGA cards. The most elegant way
is to add LCR-updating code to the routine that modifies MDSR. This way
existing 'standard' scrollers can be fixed for TSENG cards too. Remember,
the line compare register is 10-bit: bit 8 is bit 4 of 3d4/7 (Overflow)
and bit 9 is bit 6 of 3d4/9 (Maximum Scan Line).
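Splitting the 10-bit value over the three CRTC registers is easy to get wrong, so here is a C sketch of the bit shuffling; `split_line_compare` and `lcr_regs` are hypothetical names, and the actual port I/O (index to 3D4h, data to 3D5h) is left out. Registers 07h and 09h hold other fields too, so only bit 4 and bit 6 are replaced (read-modify-write).

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r18, r07, r09; } lcr_regs;

/* lc: 10-bit line compare value; old07/old09: current contents of CRTC 07h/09h. */
static lcr_regs split_line_compare(uint16_t lc, uint8_t old07, uint8_t old09)
{
    lcr_regs r;
    r.r18 = (uint8_t)(lc & 0xFFu);                                     /* bits 0-7 */
    r.r07 = (uint8_t)((old07 & ~0x10u) | (((lc >> 8) & 1u) << 4));     /* bit 8 */
    r.r09 = (uint8_t)((old09 & ~0x40u) | (((lc >> 9) & 1u) << 6));     /* bit 9 */
    return r;
}
```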
* Gouraud shading - 2 instructions/pixel *
The main goal of this example is not really to show G-shading with two
instructions ;-) It's rather an example of 'how to pull down the
upper words of 32-bit registers without shifting'. There's often a need
to calculate with fixed-point numbers: a doubleword's upper word is
the whole part, the lower word is the fractional part. The problem is that
the upper words of the 32-bit registers are hard to reach. For
example, after ADD EAX,EBX, how do you get EAX's upper word? No (quick) way :-(
The idea behind the trick is to swap the upper and lower words, and to use
ADC instead of ADD:
; EAX & EBX are fixed-point numbers
ror eax,16
ror ebx,16
clc             ;ROR changes CF, clear it
cycle:
...
adc eax,ebx
stosw
...
loop cycle
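The whole part of the fixed-point numbers will be in the lower words :-)
It's very important to preserve the Carry flag between the ADCs for a
correct result: it carries the overflow of the fractional part into the
whole part (one iteration later). Don't put anything in the loop that
changes CF; STOSW and LOOP leave it alone.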
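A C model of the swapped layout, with 16.16 values and hypothetical helper names. It also shows the lag: the carry out of the fractional part lands in CF and only reaches the whole part with the next ADC, so as long as the whole part doesn't overflow its 16 bits, low word plus CF equals the true whole part. For example, 1.5 + 0.75 first shows a whole part of 1 with CF set.

```c
#include <assert.h>
#include <stdint.h>

/* ror reg,16: swap the upper and lower words */
static uint32_t ror16(uint32_t x) { return (x >> 16) | (x << 16); }

/* adc eax,ebx: 32-bit add with carry in and carry out */
static uint32_t adc32(uint32_t a, uint32_t b, unsigned *cf)
{
    uint64_t sum = (uint64_t)a + b + *cf;
    *cf = (unsigned)(sum >> 32);
    return (uint32_t)sum;
}
```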
Now the Gouraud shading. The following piece of code is only a horizontal
shaded line drawer, not the whole polygon filler. Colors are expected
to be fixed-point numbers stored as doublewords, with an 8-bit whole part
in the highest byte (this value will appear on the screen) and a 24-bit
fractional part. (24 bits may seem a lot, but it's surely more accurate
than 8 bits ;-)
;In: eax: end color
;    ebx: start color
;    ecx: line length (1..320)
;    es:edi: destination
;!!! 32-bit PM version !!!
gou_line:
sub eax,ebx
;Fill edx with the carry flag
;(edx:eax = signed 33-bit difference)
rcr eax,1
cdq
rcl eax,1
idiv ecx
;Pull down the upper parts of dwords
rol eax,8
rol ebx,8
xchg ebx,eax
;Calculate the address of the entry
;point in the linearized code
;(3 bytes per stosb/adc pair)
neg ecx
lea ecx,[ecx*2+ecx+320*3+offset gou_linearized]
clc             ;NEG set CF, clear it before the first ADC
jmp ecx
gou_linearized:
rept 320
stosb
adc eax,ebx
endm
ret
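To check the arithmetic without a PM setup, here is a C model of gou_line (the name `gou_line_model` is hypothetical). It computes the step the way the asm does, as a signed 33-bit difference divided by the length with truncation, rotates both values so the color byte sits in the low byte, and then runs the stosb/adc pairs, assuming CF is clear on entry.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rol8(uint32_t x) { return (x << 8) | (x >> 24); }

/* end/start: 8.24 fixed-point colors (color index in the top byte). */
static void gou_line_model(uint8_t *dst, uint32_t end, uint32_t start, uint32_t len)
{
    assert(len >= 1 && len <= 320);
    /* sub eax,ebx / rcr / cdq / rcl: edx:eax = signed 33-bit difference */
    int64_t diff = (int64_t)end - (int64_t)start;
    /* idiv ecx: truncates toward zero like C; the quotient must fit in 32 bits */
    int64_t q = diff / (int64_t)len;
    assert(q >= INT32_MIN && q <= INT32_MAX);
    uint32_t eax = rol8(start);            /* color index now in AL */
    uint32_t ebx = rol8((uint32_t)q);      /* step in the same rotated layout */
    unsigned cf = 0;                       /* assumes CF = 0 on entry */
    for (uint32_t i = 0; i < len; i++) {
        dst[i] = (uint8_t)eax;                     /* stosb */
        uint64_t sum = (uint64_t)eax + ebx + cf;   /* adc eax,ebx */
        cf = (unsigned)(sum >> 32);
        eax = (uint32_t)sum;
    }
}
```

Note that the last pixel gets start + (len - 1) * step, so the end color itself is not drawn.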
Variations: If You want to use it in real mode, then You have to modify the
linearized-code entry point calculation, because in 16-bit code a
stosb/adc pair is four bytes long (ADC EAX,EBX needs a 66h prefix there):
neg cx
shl cx,2
add cx,320*4+offset gou_linearized
clc             ;clear CF before the first ADC
jmp cx
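The two entry point formulas can be checked with a bit of modular arithmetic. In this C sketch `pm_entry` and `rm_entry` (hypothetical names) repeat the instruction sequences step by step; both land at base + (320 - n) * pair length, so exactly the last n pairs execute.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit: neg ecx / lea ecx,[ecx*2+ecx+320*3+base], modulo 2^32 */
static uint32_t pm_entry(uint32_t n, uint32_t base)
{
    uint32_t ecx = 0u - n;                    /* neg ecx */
    return ecx * 2u + ecx + 320u * 3u + base;
}

/* 16-bit: neg cx / shl cx,2 / add cx,320*4+base, modulo 2^16 */
static uint16_t rm_entry(uint16_t n, uint16_t base)
{
    uint16_t cx = (uint16_t)(0u - n);         /* neg cx */
    cx = (uint16_t)(cx << 2);                 /* shl cx,2 */
    return (uint16_t)(cx + 320u * 4u + base); /* add cx,... */
}
```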
486-optimization fans may think of using indexed MOVs in the linearized code
instead of stosb :-) In this case take care to set up the linearized code
correctly, because 'mov [edi+0],al', 'mov [edi+1],al' and 'mov [edi+200h],al'
have different lengths (typically no displacement, an 8-bit displacement and
a 32-bit displacement), so a plain rept won't produce equal-length
instructions.
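One possible way to handle the varying lengths is an entry-offset table instead of a multiplication. The C sketch below is a hypothetical illustration, not the article's method: it assumes pairs of `mov [edi+k],al` / `adc eax,ebx` for k = 0..319 in 32-bit code, that the assembler encodes +0 without a displacement, and that `adc eax,ebx` takes 2 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define PAIRS 320

/* Encoded length of "mov [edi+disp],al" in 32-bit code: opcode 88h + ModRM,
   plus no displacement (disp == 0), a disp8 (-128..127) or a disp32. */
static unsigned mov_edi_al_len(int32_t disp)
{
    if (disp == 0) return 2;
    if (disp >= -128 && disp <= 127) return 3;
    return 6;
}

/* entry[k] = byte offset of pair k from the start of the linearized block. */
static void build_entry_table(uint32_t entry[PAIRS])
{
    uint32_t off = 0;
    for (int32_t k = 0; k < PAIRS; k++) {
        entry[k] = off;
        off += mov_edi_al_len(k) + 2u;   /* mov + adc eax,ebx */
    }
}
```

In this scheme a line of n pixels would jump to entry[320 - n], with EDI pointing 320 - n bytes before the destination so that the first store hits it.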