Code Gems Part 5
This text comes from IMPHOBIA Issue XI - December 1995
It's the third time that Code Gems appears in Imphobia! I hope you'll
find some useful info here ;-) If you have any suggestions, questions or
ideas, don't hesitate to contact me:
Ervin / Abaddon
Ervin Toth
48/A, Kiss Janos street
1126 Budapest
HUNGARY
+36-1-201-9563
ervin@sch.bme.hu
* LEA strikes back *
Most of us love the LEA instruction because it's so versatile. One of its
possible uses is multiplication:
LEA EBX,[EBX+EBX*4]
Is it the fastest way in real mode when the upper word of EBX doesn't
matter? No. The machine code of this instruction is 66,67,8d,1c,9b. As you
can see, it contains TWO prefixes, 66 and 67: both the operand-size and the
address-size prefix. What happens if the first one is missing, as in
LEA BX,[EBX+EBX*4] (67,8d,1c,9b)?
- the upper word of EBX isn't changed
- instruction is shorter by one byte
- TASM understands this form too :-)
- ...and it's faster!
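To see what dropping the 66h prefix does, here's a small Python model. It's an illustration, not an exact emulator, and the function names are made up. Because of the 67h prefix the address is still computed from 32-bit registers, but only its low word lands in BX.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def lea_ebx_x5(ebx):
    """LEA EBX,[EBX+EBX*4] (66,67,8d,1c,9b): all 32 bits of EBX change."""
    return (ebx + ebx * 4) & MASK32

def lea_bx_x5(ebx):
    """LEA BX,[EBX+EBX*4] (67,8d,1c,9b): the address is still computed
    from 32-bit registers, but only its low word goes into BX."""
    ea = (ebx + ebx * 4) & MASK32
    return (ebx & 0xFFFF0000) | (ea & MASK16)

LEA_EBX = bytes([0x66, 0x67, 0x8D, 0x1C, 0x9B])
LEA_BX = bytes([0x67, 0x8D, 0x1C, 0x9B])

ebx = 0x12340007
print(hex(lea_ebx_x5(ebx)))        # 0x5b040023
print(hex(lea_bx_x5(ebx)))         # 0x12340023
print(len(LEA_EBX), len(LEA_BX))   # 5 4
```

The low word of the result is the same either way, since it only depends on the low word of EBX.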
Let's take a look from the other side:
LEA EAX,[SI]
This does the same as
MOVZX EAX,SI !!!
And it's quicker in both 16- and 32-bit code, and in 16-bit code it's a byte
shorter too (in 32-bit code both take three bytes)... What a pity that only
a few register combinations fit between the brackets: [BX], [BP], [SI],
[DI], [BX/BP+SI/DI+displacement]. In 16-bit code there's a similar trick:
LEA with a 16-bit 'immediate' (really a direct address). For example,
LEA EAX,[1234h]
which also clears the upper word of EAX, but is one byte shorter than
MOV EAX,1234h.
By the way, Intel processors don't support the immediate LEA with a 32-bit
operand. And, hell, TASM doesn't understand the immediate LEA, so it has to
be hard-coded every time.
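A Python sketch of LEA with 16-bit addressing into a 32-bit register (the helper names are hypothetical), plus the byte counts in 16-bit code for the instruction pairs above. The effective address wraps at 64K and is then zero-extended, which is why LEA EAX,[SI] behaves like MOVZX EAX,SI. The encodings were assembled by hand from the opcode tables, so check them against your assembler's listing.

```python
def lea32_a16(base=0, index=0, disp=0):
    """LEA r32,[base+index+disp] with 16-bit addressing: the effective
    address wraps around at 64K, then gets zero-extended to 32 bits."""
    return (base + index + disp) & 0xFFFF

def movzx32(r16):
    """MOVZX r32,r16: zero-extend a 16-bit register."""
    return r16 & 0xFFFF

# Hand-assembled encodings for 16-bit code (66h = operand-size prefix).
ENCODINGS = {
    "LEA EAX,[SI]":    bytes([0x66, 0x8D, 0x04]),
    "MOVZX EAX,SI":    bytes([0x66, 0x0F, 0xB7, 0xC6]),
    "LEA EAX,[1234h]": bytes([0x66, 0x8D, 0x06, 0x34, 0x12]),
    "MOV EAX,1234h":   bytes([0x66, 0xB8, 0x34, 0x12, 0x00, 0x00]),
}

si = 0xBEEF
print(hex(lea32_a16(index=si)), hex(movzx32(si)))   # 0xbeef 0xbeef
print(hex(lea32_a16(disp=0x1234)))                   # 0x1234
for name, code in ENCODINGS.items():
    print(f"{name:16} {len(code)} bytes: {code.hex(' ')}")
```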
* Aligned fill *
Back to the stone age: filled vectors. Probably the dirtiest solution for
filling a horizontal line (instead of a rep stosb) is this:
test cl,1
je _one
stosb
_one:
test cl,2
je _two
stosw
_two:
shr cx,2
rep stosd
Generally it wastes a lot of time: a doubleword written to memory may take
some extra cycles if it isn't aligned on a dword boundary. Writing ONE
doubleword MISALIGNED may take as much time as writing TWO doublewords
ALIGNED! So here's a horizontal line filler which writes everything
completely aligned, without any conditional jumps (EAX = color in all four
bytes, CX = number of bytes to fill, ES:DI -> target, direction flag clear):
mov bx,cx ; Save CX
xor cx,cx ; Put 1 into CX if BX
test bx,bx ; isn't 0, else leave
setne cl ; it zero
and cx,di ; Keep CX = 1 if DI is
sub bx,cx ; odd, else clear it,
; and adjust BX
rep stosb ; Fill one byte if DI
; was odd
cmp bx,2 ; Put 2 into CX if we
setnb cl ; need to fill two or
add cx,cx ; more bytes, else 0
and cx,di ; Clear CX if DI is on a
sub bx,cx ; dword boundary, else
; keep it & adjust BX
shr cx,1 ; Fill one word (if CX
rep stosw ; isn't 0)
mov cx,bx ; Put the number of
shr cx,2 ; remaining dwords into
rep stosd ; CX and fill them
and bx,3 ; Fill the remaining
mov cx,bx ; 0..3 bytes: CF gets
shr cx,1 ; the odd byte, CX the
rep stosw ; words (STOSW leaves
adc cx,cx ; flags alone), then
rep stosb ; CX = 0 + 0 + CF
Is it really faster than a rep stosb? Not always: only when enough bytes
have to be filled, around 10 or more. And of course it can be even quicker
with conditional jumps. But without them it's so nice, eh...?
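The branchless filler can be checked with a Python model. The names are hypothetical, a bytearray stands in for the ES segment, and each store records its address and size. The model replays the register juggling step by step and, for comparison, also runs the 'dirty' version from the beginning of this section.

```python
class EsDi:
    """Hypothetical stand-in for ES:DI with REP STOSB/STOSW/STOSD
    (direction flag clear). Records every store as (address, size)."""
    def __init__(self, mem, di, color):
        self.mem, self.di, self.color, self.stores = mem, di, color, []

    def rep_stos(self, cx, size):
        for _ in range(cx):
            self.mem[self.di:self.di + size] = bytes([self.color]) * size
            self.stores.append((self.di, size))
            self.di += size

def dirty_fill(mem, di, count, color):
    es = EsDi(mem, di, color)
    es.rep_stos(count & 1, 1)          # test cl,1 / stosb
    es.rep_stos((count >> 1) & 1, 2)   # test cl,2 / stosw
    es.rep_stos(count >> 2, 4)         # shr cx,2 / rep stosd
    return es.stores

def aligned_fill(mem, di, count, color):
    es = EsDi(mem, di, color)
    bx = count                 # mov bx,cx
    cx = int(bx != 0)          # xor cx,cx / test bx,bx / setne cl
    cx &= es.di                # and cx,di: stays 1 only if DI is odd
    bx -= cx                   # sub bx,cx
    es.rep_stos(cx, 1)         # rep stosb
    cx = int(bx >= 2)          # cmp bx,2 / setnb cl (CX was 0)
    cx += cx                   # add cx,cx
    cx &= es.di                # and cx,di: 2 only if DI = 2 mod 4
    bx -= cx                   # sub bx,cx
    es.rep_stos(cx >> 1, 2)    # shr cx,1 / rep stosw
    es.rep_stos(bx >> 2, 4)    # mov cx,bx / shr cx,2 / rep stosd
    bx &= 3                    # and bx,3
    carry = bx & 1             # mov cx,bx / shr cx,1: bit 0 -> CF
    es.rep_stos(bx >> 1, 2)    # rep stosw (leaves flags alone)
    es.rep_stos(carry, 1)      # adc cx,cx gives CX = CF / rep stosb
    return es.stores

def misaligned(stores):
    """Number of word/dword stores not on their natural boundary."""
    return sum(1 for addr, size in stores if addr % size)

print(aligned_fill(bytearray(64), 5, 13, 0xAA))
# [(5, 1), (6, 2), (8, 4), (12, 4), (16, 2)]
print(misaligned(dirty_fill(bytearray(64), 5, 13, 0xAA)))   # 3
```

With DI = 5 and 13 bytes, the dirty version writes its three dwords at 6, 10 and 14, all misaligned. The aligned version never misaligns a store, whatever DI and the count are.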
* Use ES *
Are you bored of writing
MOVS WORD PTR ES:[DI],ES:[SI]
when you want to override the source
segment register? You can use
SEGES
MOVSW
instead... SEGDS, SEGES, etc. are built into TASM; they're assembled as
DS:, ES:, etc. segment-override prefixes.
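Here's what SEGES MOVSW boils down to, as a Python sketch. The directive just emits the 26h prefix byte, and on MOVSW an override only changes where the source comes from; the destination is always ES:DI. The toy 'assembler' and the movsw() model are hypothetical illustrations, not TASM internals.

```python
# Prefix bytes emitted by TASM's segment-override directives.
PREFIX = {"SEGES": 0x26, "SEGCS": 0x2E, "SEGSS": 0x36, "SEGDS": 0x3E}
OPCODE = {"MOVSW": 0xA5}

def assemble(*mnemonics):
    """Toy one-byte-per-mnemonic 'assembler' (hypothetical)."""
    return bytes(PREFIX.get(m, OPCODE.get(m)) for m in mnemonics)

def movsw(mem, regs, src_seg="DS"):
    """One real-mode MOVSW with DF = 0: word at src_seg:SI -> ES:DI.
    The override only swaps the source segment; ES:DI is fixed."""
    src = (regs[src_seg] << 4) + regs["SI"]
    dst = (regs["ES"] << 4) + regs["DI"]
    mem[dst:dst + 2] = mem[src:src + 2]
    regs["SI"] = (regs["SI"] + 2) & 0xFFFF
    regs["DI"] = (regs["DI"] + 2) & 0xFFFF

print(assemble("SEGES", "MOVSW").hex(" "))   # 26 a5

mem = bytearray(0x1000)
regs = {"DS": 0x010, "ES": 0x020, "SI": 0x0000, "DI": 0x0010}
mem[0x200:0x202] = b"\x34\x12"        # word at ES:0000
movsw(mem, regs, src_seg="ES")        # SEGES / MOVSW
print(mem[0x210:0x212].hex(" "))      # 34 12
```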
* Introducing a new video resolution *
In 256-color modes, using two screen pages was always a big pain. We had to
deal with chain-4 mode, VESA or SVGA registers. Basically, on a 'standard'
VGA card in 'chunky' mode there's no way to reach the card's memory above
64k. So what can we do with the usable 64k? If we want two screen pages in
it, each page gets 32k; which resolution fits into that? Well, 256*128
(= 32768 bytes) is a possible choice, and with it we can handle two pages...
Here's how to initialize this mode:
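mov ax,13h ; Start from standard
int 10h ; mode 13h
mov cx,11 ; 11 CRTC registers
mov dx,03d4h ; CRTC index port
mov si,offset tweaker ; DS:SI -> table
rep outsw ; Low byte: index,
; high byte: value
radix 16 ; The table is in hex
tweaker dw 0e11,6300,3f01,4002 ; CR11h first: unlock CR0..7
dw 8603,5004,0d07,5810
dw 0ff12,2013,0915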
After this there's a small window in the middle of the screen, 256*128
pixels in size. Pixel drawing is quite easy, because every line is exactly
256 bytes long: the offset is y*256+x, with y in the high byte and x in the
low byte :-) This resolution is fully laptop-incompatible, and some weird
monitors probably won't accept it (though that never happened in my tests),
just as some VGA cards won't love the rep outsw (Hi Jmagic ;-)
This mode was used in Technomancer's UnderWater demo and Fish intro.
(Thanks for detecting the bug, pal!)
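Here's how the tweaker table and the 256-byte lines work out, sketched in Python with made-up names. REP OUTSW sends each word to 3D4h/3D5h, with the index in the low byte and the value in the high byte. As far as I can read the table, three registers set the geometry:

- CR01h = 3Fh gives 64 character clocks (256 pixels).
- CR13h = 20h gives 256 bytes per line.
- CR12h = FFh gives 256 scan lines, shown as 128 rows because mode 13h doubles every line.

So a pixel's offset is just y*256+x, and page 1 starts at 8000h.

```python
TWEAKER = [0x0E11, 0x6300, 0x3F01, 0x4002, 0x8603, 0x5004,
           0x0D07, 0x5810, 0xFF12, 0x2013, 0x0915]

def crtc_writes(table):
    """REP OUTSW to 3D4h: low byte -> index port 3D4h, high byte -> 3D5h."""
    return [(w & 0xFF, w >> 8) for w in table]

WIDTH, HEIGHT = 256, 128
PAGE_SIZE = WIDTH * HEIGHT        # 32768 bytes, so two pages fill 64K

def pixel_offset(page, x, y):
    """Offset in segment A000h: y in the high byte, x in the low byte,
    plus 8000h for the second page."""
    return page * PAGE_SIZE + (y << 8) + x

for index, value in crtc_writes(TWEAKER):
    print(f"CR{index:02X}h = {value:02X}h")
print(hex(pixel_offset(1, 255, 127)))   # 0xffff
```

CR11h comes first in the table for a reason: writing 0Eh clears its bit 7, the write-protect bit for CR00h..CR07h, so the horizontal registers that follow actually take effect.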
The next topic was written by LEM.
* Fast way to clear the screen *
(by using fewer colors in 256-color mode)
Let's say you use 32 colors, the first 32, and put black into the remaining
registers. In the next frame, put black into the first 32 registers, restore
the 32-color palette in the next 32 registers and draw with those... and so
on. The problem: after 8 frames you'll reach the end of the palette and go
back to the first 32 colors (if you use 1 color per frame, you'll reach the
end of the palette after 256 frames, of course), so you'll see the old crap
you left there 8 frames earlier...
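The bank rotation in Python, with hypothetical helper names. In frame f, only bank f mod 8 of the palette holds the 32 real colors. Everything drawn with another bank stays invisible until its bank comes round again.

```python
COLORS_PER_FRAME = 32
BANKS = 256 // COLORS_PER_FRAME          # 8 banks, so an 8-frame cycle

def visible_bank(frame):
    """Bank whose 32 registers hold the real palette in this frame;
    all other 224 registers are black."""
    return frame % BANKS

def draw_index(frame, color):
    """Palette register to draw logical color 0..31 with in this frame."""
    return visible_bank(frame) * COLORS_PER_FRAME + color

print([draw_index(f, 5) for f in range(10)])
# [5, 37, 69, 101, 133, 165, 197, 229, 5, 37]
```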
Hehehe, DID YOU REALLY THINK THAT WAS ALL???
You can actually clear the screen with that technique!
Let's say you want to draw points in 1 color (it works for 1 -> xx colors!!!).
You take 200 colors for the trick, which leaves 56 colors; use them for a
logo, for example one of 64*200 (any size will do, this just shows it
works), and put it somewhere on the screen (let's say on the left).
Now you've got 200 colors for the points, so you can display points for 200
frames before getting garbage on the screen, RIGHT?
WRONG!!! You can display points forever without garbage!
frame 0:   col 0 on, clear line 0
frame 1:   col 1 on, col 0 black, clear line 1
...
frame 199: col 199 on, col 198 black, clear line 199
Go back to color 0: you've actually cleared 200 lines in 200 frames!
Remember, color 0 was used only during the 1st frame, so when you use it
again there won't be any color 0 left on the screen (there will be colors
1 -> 199, but they're all black).
Of course you can use 2, 10 or x colors per frame (and then clear the screen
in 200/x frames; the problem is that you have to set/reset a lot of colors
every frame) AND keep some colors for a nice smooth picture (of course you
CANNOT draw on it with the tricky technique).
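A quick Python simulation of the 1-color version. It's a sketch under a few assumptions:

- The drawing area is shrunk to 16 pixels wide.
- The points follow a made-up pattern.
- Some index (here 255) stays black all the time and is used for the cleared lines.

The palette itself isn't simulated. The code just treats the frame's color as the only lit one and checks that no pixel of that color is left over from an earlier frame.

```python
LINES = 200          # one trick color per screen line (colors 0..199)
WIDTH = 16           # drawing area shrunk for the simulation
BLACK = 255          # assumption: an index that stays black all the time

def first_garbage_frame(frames, clear=True):
    """Draw one point per frame in that frame's color and return the first
    frame where a pixel of the lit color is left over from an earlier
    frame, or None if that never happens."""
    fb = [[BLACK] * WIDTH for _ in range(LINES)]     # color indices
    born = [[-1] * WIDTH for _ in range(LINES)]      # frame that drew it
    for f in range(frames):
        lit = f % LINES                  # col f on, col f-1 black
        if clear:                        # clear line f mod 200 first
            fb[lit] = [BLACK] * WIDTH
            born[lit] = [-1] * WIDTH
        x, y = f % WIDTH, (37 * f) % LINES   # hypothetical point pattern
        fb[y][x], born[y][x] = lit, f
        if any(fb[r][c] == lit and born[r][c] != f
               for r in range(LINES) for c in range(WIDTH)):
            return f
    return None

print(first_garbage_frame(400))               # None
print(first_garbage_frame(400, clear=False))  # 200
```

Without clearing, the point drawn in frame 0 shows up again in frame 200, when color 0 is lit again. With one line cleared per frame, every line has been wiped at least once before its old color comes back.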
* Uncle Ben's wisdoms of the month *
Remember, a NEG + DEC pair equals a NOT. And DEC/INC don't change the carry
flag. You may need it some day...
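A brute-force check in Python, using 16-bit registers and hypothetical helper names. It also shows what the carry flag holds afterwards. NEG sets CF unless the operand was 0, and DEC leaves CF alone, so after the pair CF = 1 exactly when the original value was nonzero (NOT itself touches no flags).

```python
MASK = 0xFFFF                      # 16-bit registers

def neg(x):
    """NEG: two's complement; CF = 1 unless the operand was 0."""
    return (-x) & MASK, int(x != 0)

def dec(x, cf):
    """DEC: subtract 1; the carry flag passes through unchanged."""
    return (x - 1) & MASK, cf

def neg_dec(x):
    return dec(*neg(x))

# NEG + DEC gives NOT for every 16-bit value...
assert all(neg_dec(x)[0] == ~x & MASK for x in range(0x10000))
# ...and leaves CF = 1 exactly when the value was nonzero.
print(neg_dec(0), neg_dec(5))   # (65535, 0) (65530, 1)
```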
* Intel processor bug??? *
I've faced (again...) an interesting problem while developing a 32-bit
interface. Let's say we don't need interrupts 0..7, so there are no interrupt
gates in the memory where the IDT starts. The first valid interrupt gate is
No. 8, which is at the IDT's base+40h. The problem is that when an interrupt
occurs in 32-bit protected mode and the interrupt handler's code is at the
beginning of the IDT (in those unused 40h bytes), the processor shuts down;
when the handler's code is somewhere else, everything is okay. Do you have
any ideas...?
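For reference, here's the arithmetic from the text as a Python sketch. A 32-bit interrupt gate is 8 bytes, so gate 8 sits at IDT base + 8*8 = 40h. The packing follows the standard 386 interrupt-gate layout:

- offset 15..0
- selector
- a zero byte
- the type/attribute byte (P, DPL, type 0Eh)
- offset 31..16

The function names are made up, and the sketch doesn't explain the shutdown.

```python
import struct

GATE_SIZE = 8                      # bytes per 32-bit IDT entry

def gate_address(idt_base, vector):
    return idt_base + vector * GATE_SIZE

def int_gate32(offset, selector, dpl=0):
    """Pack a present 32-bit interrupt gate: offset 15..0, selector,
    a zero byte, type/attribute byte (P=1, DPL, type 0Eh), offset 31..16."""
    attr = 0x80 | (dpl << 5) | 0x0E
    return struct.pack("<HHBBH", offset & 0xFFFF, selector, 0, attr,
                       offset >> 16)

print(hex(gate_address(0, 8)))                  # 0x40
print(int_gate32(0x12345678, 0x0008).hex(" "))  # 78 56 08 00 00 8e 34 12
```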
* Imphobia Coder Compo! *
This article has been removed because Imphobia #12 has already been released.