Code Gems Part 5

This text comes from IMPHOBIA Issue XI - December 1995


It's the third time that Code Gems appears in Imphobia! I hope you'll find some useful info here ;-) If You have any suggestions, questions or ideas, don't hesitate to contact me :
           Ervin / Abaddon
              Ervin Toth
       48/A, Kiss Janos street
            1126 Budapest
               HUNGARY
            +36-1-201-9563
           ervin@sch.bme.hu


* LEA strikes back *

Most of us love the LEA instruction because of its wide usability. One of the possible uses is when it's used to multiply:
         LEA EBX,[EBX+EBX*4]
Is it the fastest way in real mode when the upper word of EBX isn't needed? No. The machine code of this instruction is 66,67,8d,1c,9b. As You can see, it contains TWO prefixes, 66 and 67: the operand-size and the address-size prefix. What happens if the first one is missing, like in
  LEA BX,[EBX+EBX*4] (67,8d,1c,9b) ?

- the upper word of EBX isn't changed
- instruction is shorter by one byte
- TASM understands this form too :-)
- ...and it's faster!
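
For those who like to check such things on paper first, here is a minimal Python sketch of the two forms. The function names are made up for this illustration and the byte strings are just the encodings quoted above; it only models the arithmetic, not the timing.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def lea_ebx_times5(ebx):
    """Model of LEA EBX,[EBX+EBX*4]: the whole register becomes 5*EBX."""
    return (ebx + ebx * 4) & MASK32

def lea_bx_times5(ebx):
    """Model of LEA BX,[EBX+EBX*4]: only BX is written, the upper word stays."""
    address = (ebx + ebx * 4) & MASK32
    return (ebx & 0xFFFF0000) | (address & MASK16)

# Encodings as quoted in the text (hypothetical constant names).
ENC_LEA_EBX = bytes([0x66, 0x67, 0x8D, 0x1C, 0x9B])
ENC_LEA_BX = bytes([0x67, 0x8D, 0x1C, 0x9B])

if __name__ == "__main__":
    value = 0x12340003
    print(hex(lea_ebx_times5(value)), hex(lea_bx_times5(value)))
    print(len(ENC_LEA_EBX) - len(ENC_LEA_BX), "byte saved")
```

The lower word is the same in both cases, because the low 16 bits of 5*EBX depend only on BX.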

Let's take a look from the other side:
             LEA EAX,[SI]
This does the same as
           MOVZX EAX,SI !!!
And it's one byte shorter in 16-bit code (in 32-bit code both forms take three bytes) and quicker in both... What a pity that only a few register combinations fit between the brackets: [BX], [BP], [SI], [DI], [BX/BP+SI/DI+immediate]. In 16-bit code there's a similar trick, LEA with a 16-bit immediate. For example,
           LEA EAX,[1234h]
which clears the upper word of EAX too but is shorter than
            MOV EAX,1234h.
By the way, the Intel processors don't support the immediate LEA with 32-bit operand. And hell, TASM doesn't understand the immediate LEA, it must be hardcoded each time.
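
A small Python sketch of the zero-extending LEA, for the curious. The byte sequences below are my own hand-assembled encodings for 16-bit code, assuming the standard opcode tables (they are not taken from the article), and the names are hypothetical. The last two entries also show what would have to be hardcoded when the assembler refuses the immediate form.

```python
def lea_eax_si(esi):
    """Model of LEA EAX,[SI]: the 16-bit address is zero-extended into EAX."""
    return esi & 0xFFFF

def movzx_eax_si(esi):
    """Model of MOVZX EAX,SI."""
    return esi & 0xFFFF

# Assumed encodings in a 16-bit code segment (hand-assembled, please verify).
ENCODINGS_16BIT = {
    "LEA EAX,[SI]":    bytes([0x66, 0x8D, 0x04]),
    "MOVZX EAX,SI":    bytes([0x66, 0x0F, 0xB7, 0xC6]),
    "LEA EAX,[1234h]": bytes([0x66, 0x8D, 0x06, 0x34, 0x12]),
    "MOV EAX,1234h":   bytes([0x66, 0xB8, 0x34, 0x12, 0x00, 0x00]),
}

if __name__ == "__main__":
    for name, code in ENCODINGS_16BIT.items():
        print(f"{name:16} {len(code)} bytes  {code.hex(' ')}")
```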

* Aligned fill *

Back to the stone age: filled vectors. Probably the dirtiest solution for filling a horizontal line (instead of a rep stosb) is this:
        test    cl,1
        je      _one
        stosb
_one:
        test    cl,2
        je      _two
        stosw
_two:
        shr     cx,2
        rep     stosd
Generally it wastes a lot of time: a doubleword written to memory may take some extra cycles if it isn't aligned on a dword boundary. Writing ONE doubleword MISALIGNED may take as much time as writing TWO doublewords ALIGNED! So here follows a horizontal line filler which writes everything completely aligned without any conditional jumps (eax = color byte repeated four times, cx = number of bytes to fill, es:di -> target):
mov     bx,cx   ; Save CX

xor     cx,cx   ; Put 1 to CX if it
test    bx,bx   ; wasn't 0, else leave
setne   cl      ; it zero

and     cx,di   ; Leave CX 1 if DI is
sub     bx,cx   ; odd, else clear it
                ; and adjust BX
rep     stosb   ; Fill one byte if DI
                ; was odd
cmp     bx,2    ; Put 2 to CX if we
setnb   cl      ; need to fill two or
add     cx,cx   ; more bytes, else 0

and     cx,di   ; Clear CX if DI is on
sub     bx,cx   ; dword boundary, else
                ; leave it & adjust BX
shr     cx,1    ; Fill one word (if CX
rep     stosw   ; isn't 0)

mov     cx,bx   ; Put the number of
shr     cx,2    ; remaining bytes to
rep     stosd   ; CX and fill dwords

and     bx,3    ; Fill the rest
mov     cx,bx
shr     cx,1
rep     stosw
adc     cx,cx
rep     stosb
Is it really faster than a rep stosb? Not always. Only when enough bytes have to be filled, from around 10 upwards. And of course it can be even quicker with conditional jumps. But without those it's so nice, eh...?
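
If You don't trust the register juggling, here is a rough Python model of both fillers. It follows the two listings step by step and records every store, so one can count the misaligned ones. It assumes the fill never wraps around the end of the segment; all helper names are invented for this sketch, and it says nothing about real cycle counts.

```python
def _store(mem, writes, di, size, reps, color):
    """Emulate REP STOSB/STOSW/STOSD: 'reps' stores of 'size' bytes each."""
    for _ in range(reps):
        mem[di:di + size] = bytes([color]) * size
        writes.append((di, size))
        di += size
    return di

def naive_fill(mem, di, count, color):
    """The 'dirty' version: one byte, one word, then dwords wherever DI is."""
    writes = []
    di = _store(mem, writes, di, 1, count & 1, color)
    di = _store(mem, writes, di, 2, (count >> 1) & 1, color)
    _store(mem, writes, di, 4, count >> 2, color)
    return writes

def aligned_fill(mem, di, count, color):
    """The jump-free version: align DI first, then dwords, then the rest."""
    writes = []
    bx = count
    cx = (1 if bx != 0 else 0) & di      # setne cl / and cx,di
    bx -= cx
    di = _store(mem, writes, di, 1, cx, color)
    cx = (2 if bx >= 2 else 0) & di      # setnb cl / add cx,cx / and cx,di
    bx -= cx
    di = _store(mem, writes, di, 2, cx >> 1, color)
    di = _store(mem, writes, di, 4, bx >> 2, color)
    bx &= 3                              # fill the rest
    di = _store(mem, writes, di, 2, bx >> 1, color)
    _store(mem, writes, di, 1, bx & 1, color)
    return writes

def check(fill, di, count, color=7, size=64):
    """Return (memory filled correctly?, number of misaligned stores)."""
    mem = bytearray(size)
    writes = fill(mem, di, count, color)
    expected = bytearray(size)
    expected[di:di + count] = bytes([color]) * count
    return mem == expected, sum(1 for addr, n in writes if addr % n)

if __name__ == "__main__":
    print("naive  :", check(naive_fill, 1, 13))
    print("aligned:", check(aligned_fill, 1, 13))
```

For 13 bytes starting at an odd address the naive version does all three of its dword stores misaligned, while the aligned one does none.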

* Use ES *

Are You bored of writing
    MOVS WORD PTR ES:[DI],ES:[SI]
when you want to override the source segment register? You can use
SEGES
MOVSW
instead... SEGDS, SEGES, etc. are built-in macros in TASM; they are assembled as the DS:, ES:, etc. segment override prefixes.

* Introducing a new video resolution *

In 256-color modes using two screen pages was always a big pain. We had to deal with chain-4 mode or VESA or SVGA registers. Basically, on a 'standard' VGA card in 'chunky' mode there's no way to reach the card's memory above 64k. So what can we do with the usable 64k? If we want to use it for two screen pages, one page eats 32k. Which resolution is that enough for? Well, 256*128 is a possible choice (256*128 = 32768 bytes). With this we can handle two pages... How to initialize this mode:
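        mov        ax,13h              ; start from standard mode 13h
        int        10h
        mov        cx,11               ; 11 index/value words follow
        mov        dx,03d4h            ; CRTC index port
        mov        si,offset tweaker   ; DS:SI -> register table
        rep        outsw               ; low byte = index, high = value

radix        16                        ; numbers below are hexadecimal
tweaker dw        0e11,6300,3f01,4002
        dw        8603,5004,0d07,5810
        dw        0ff12,2013,0915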
After this there's a small window in the middle of the screen with 256*128 pixel dimensions. Pixel drawing will be quite easy because of the horizontal length :-) This resolution is fully laptop-incompatible and some weird monitors probably won't accept it (though that never happened in my tests). Some VGA cards won't love the rep outsw either (Hi Jmagic ;-) This mode was used in Technomancer's UnderWater demo and Fish intro. (Thanks for detecting the bug, pal!)
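
To see what the table does and why pixel drawing gets easy, here is a hedged Python sketch. It splits each word the way OUTSW sends it (low byte to the index port, high byte to the data port) and computes pixel offsets for a 256-byte-wide page. Putting the second page at offset 8000h is my assumption, as are the helper names and the usual mode 13h scaling factors used in the last two lines.

```python
# The tweaker table from the listing above, written as hex words.
TWEAKER = [0x0E11, 0x6300, 0x3F01, 0x4002,
           0x8603, 0x5004, 0x0D07, 0x5810,
           0xFF12, 0x2013, 0x0915]

WIDTH, HEIGHT = 256, 128
PAGE_SIZE = WIDTH * HEIGHT          # 32768 bytes, so two pages fit in 64k

def decode(words):
    """Split each word into (CRTC index, value): low byte first, then high."""
    return [(w & 0xFF, w >> 8) for w in words]

def pixel_offset(x, y, page=0):
    """Offset of a pixel; a 256-pixel line turns y*width into a plain shift.
    Assumes page 1 starts right after page 0 (at 8000h)."""
    return page * PAGE_SIZE + (y << 8) + x

if __name__ == "__main__":
    regs = dict(decode(TWEAKER))
    for index, value in sorted(regs.items()):
        print(f"CRTC[{index:02X}h] = {value:02X}h")
    # Assuming the usual mode 13h scaling: 4 pixels per character clock
    # and 8 bytes per unit of the offset register (13h).
    print("visible width :", (regs[0x01] + 1) * 4)
    print("bytes per line:", regs[0x13] * 8)
```

With a 256-byte line the offset is simply y in the high byte and x in the low byte, so no multiplication is needed.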

The next topic was written by LEM

* Fast way to clear the screen *
(by using less colors in 256c mode)

Let's say You use 32 colors: the first 32 palette registers hold them and all the remaining registers are set to black. In the next frame put black into the first 32 registers and restore the 32-color palette in the next 32 registers, and so on... The problem: after 8 frames you'll reach the end of the palette and go back to the first 32 colors (if You use 1 color per frame you'll reach the end of the palette after 256 frames, of course), so you'll see the old crap you left 8 frames earlier...

Hehehe, DID YOU REALLY THINK THAT WAS ALL???

You can actually clear the screen with that technique!

Let's say You want to draw points in 1 color (it works for 1 -> xx colors!!!). You take 200 colors for the trick, which leaves 56 colors; use them for a logo for example, with a size of 64*200 (any size, this is just to show you it works), and put it somewhere on the screen (let's say on the left).

Now You've got 200 colors left, so you can display points during 200 frames before having garbage on the screen, RIGHT?

WRONG!!! You can display points forever without garbage!
frame   0: col   0 on,    clear line 0
frame   1: col   1 on,
           col   0 black, clear line 1
...
frame 199: col 199 on,
           col 198 black, clr line 199
Go back to color 0, so now You actually have cleared 200 lines in 200 frames! Remember, color 0 was used only during the 1st frame, so when you're going to use it again, there won't be any color 0 on the screen anymore (there will be colors 1 -> 199, but they're all black).

Of course you can use 2, 10, x colors per frame (and clear the screen in 200/x frames; the problem is that you have to set/reset a lot of colors every frame) AND keep some colors for a nice smoothed picture (of course You CANNOT draw with the tricky technique on it).
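
The trick can be tried out in a few lines of Python before touching the VGA. This is only a sketch under my own assumptions: one screen line per trick color, lines cleared with a palette entry that always stays black (index 255 here), the line cleared before its color is switched on, and random points as the "effect". None of the names come from the original.

```python
import random

BACKGROUND = 255   # assumed: a palette entry that always stays black

def simulate(frames, trick_colors=200, width=8, points=20,
             clear_lines=True, seed=1):
    """Return True if no point from an older frame ever becomes visible."""
    height = trick_colors                  # one screen line per trick color
    rng = random.Random(seed)
    screen = [[BACKGROUND] * width for _ in range(height)]
    drawn_in = [[-1] * width for _ in range(height)]
    lit = [False] * 256                    # True = entry shows a real color

    for frame in range(frames):
        color = frame % trick_colors
        if clear_lines:
            screen[color] = [BACKGROUND] * width      # clear ONE line
        lit[(color - 1) % trick_colors] = False       # previous color black
        lit[color] = True                             # current color on
        for _ in range(points):                       # draw this frame
            x, y = rng.randrange(width), rng.randrange(height)
            screen[y][x] = color
            drawn_in[y][x] = frame
        for y in range(height):                       # look for garbage
            for x in range(width):
                if lit[screen[y][x]] and drawn_in[y][x] != frame:
                    return False
    return True

if __name__ == "__main__":
    print("with line clearing   :", simulate(600))
    print("without line clearing:", simulate(600, clear_lines=False))
```

Without the one-line-per-frame clearing, old points show up again as soon as the palette wraps around; with it, every line has been wiped once by the time its color comes back.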

* Uncle Ben's wisdoms of the month *

Remember, a NEG + DEC pair equals a NOT. And the DEC/INC doesn't change the carry flag. You may need it some day...
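
A quick Python check of the first claim over all 16-bit values, for the doubters (the helper names are invented; the flag behaviour of DEC/INC is not modelled here):

```python
MASK16 = 0xFFFF

def neg16(x):
    """Model of NEG on a 16-bit register (two's complement negate)."""
    return (-x) & MASK16

def dec16(x):
    """Model of DEC on a 16-bit register."""
    return (x - 1) & MASK16

def not16(x):
    """Model of NOT on a 16-bit register."""
    return x ^ MASK16

if __name__ == "__main__":
    same = all(dec16(neg16(x)) == not16(x) for x in range(0x10000))
    print("NEG + DEC == NOT for every 16-bit value:", same)
```

It follows from -x = NOT(x) + 1 in two's complement, so -x - 1 = NOT(x).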

* Intel processor bug??? *

I've faced (again...) an interesting problem during the development of a 32-bit interface. Let's say we don't need interrupts 0..7, so there are no interrupt gates in the memory where the IDT starts. The first valid int gate is No. 8, which is at the IDT's base+40h (8 gates of 8 bytes each). The problem is that when an interrupt occurs in 32-bit protected mode and the interrupt handler's code is at the beginning of the IDT (in those unused 40h bytes), the processor shuts down, but when the int handler's code is somewhere else, everything is okay. Do you have any ideas...?

* Imphobia Coder Compo! *

This article has been removed because Imphobia #12 has already been released.