Code Gems Part 4
This text comes from IMPHOBIA Issue X - June 1995
* Demand paging on SVGA boards in PM *
(This is going to be deep protected mode system coding, be prepared...)
It would be very nice to 'map' the video memory to the linear address
space so we could reach it as a one megabyte long array. Some cards
support it, the rest do not: on these cards only the 'bank switching'
routines give access to the entire video memory. Our goal is to reduce
the number of bank switches as much as possible. Several techniques have
been developed, but many of them have a big problem: the routine which
determines whether a bank switch is necessary must run a great many times.
The next method solves this problem. It maps a 1MB long memory area to
the video memory on any SVGA card, and a bank switch will occur only if
necessary. It works in protected and flat virtual mode only, NOT in (flat) real mode.
Essentially it's a kind of 'virtual memory' technique based on PAGING.
Let's set up the 4k page table, reserving one megabyte above the highest
physical memory address (let's say from 800000h to 8fffffh), and map it
to a0000h in 64k steps:
Linear address|Physical addr|Comment
--------------+-------------+---------
0- fff | 0- fff|This is
1000- 1fff | 1000- 1fff|the very
2000- 2fff | 2000- 2fff|normal
... | |mapping,
7ff000-7fffff |7ff000-7fffff|no diff.
--------------+-------------+---------
800000-800fff | a0000- a0fff|800000-
801000-801fff | a1000- a1fff|80ffff:
... | |mapped to
80f000-80ffff | af000- affff|a0000-
| |affff
--------------+-------------+---------
810000-810fff | a0000- a0fff|810000-
811000-811fff | a1000- a1fff|81ffff:
... | |mapped to
81f000-81ffff | af000- affff|a0000-
| |affff
--------------+-------------+---------
... | |
--------------+-------------+---------
8f0000-8f0fff | a0000- a0fff|8f0000-
... | |8fffff:
8ff000-8fffff | af000- affff|mapped to
| |a0000-
| |affff
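As a rough illustration of the table above, the mapping can be modelled in C. This is only a sketch of the address arithmetic, not real page-table setup code: the names WINDOW_BASE, window_phys and fill_window_ptes are made up for this example, and the 800000h base is the "let's say" value from the text.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BASE  0x800000u  /* assumed start of the 1MB linear window */
#define WINDOW_PAGES 256u       /* 1MB / 4k pages */
#define VIDEO_BASE   0xA0000u   /* physical start of the 64k VGA aperture */
#define PAGE_SIZE    0x1000u

/* Physical address that a linear address inside the window maps to:
   the offset inside the window wraps around every 64k. */
static uint32_t window_phys(uint32_t linear)
{
    assert(linear >= WINDOW_BASE &&
           linear < WINDOW_BASE + WINDOW_PAGES * PAGE_SIZE);
    return VIDEO_BASE + ((linear - WINDOW_BASE) & 0xFFFFu);
}

/* Fill the 256 page-table entries that cover the window.
   Only the frame address (bits 12..31) is set; all flag bits stay 0. */
static void fill_window_ptes(uint32_t ptes[WINDOW_PAGES])
{
    for (uint32_t i = 0; i < WINDOW_PAGES; i++)
        ptes[i] = window_phys(WINDOW_BASE + i * PAGE_SIZE) & ~(PAGE_SIZE - 1u);
}
```

Entries 0-15 point to a0000-af000, entries 16-31 point to the same frames again, and so on up to entry 255. A real setup would also have to OR in the flag bits (present, writable) and hook the table into the page directory.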
Great. From 8 to 9 megabytes we can address the a0000 - affff segment
sixteen times. Now comes the TRICK. Mark all 4k pages between
810000-8fffff as 'NOT PRESENT' and the pages in 800000-80ffff as
'PRESENT', and hook interrupt 0e (the 'page fault' exception). If a page
fault occurs in this area, it means that a bank switch is needed: mark
the pages of the new bank as 'PRESENT' and the old ones as 'NOT PRESENT',
do the bank switch, and return from the exception.
The fault handler looks like this:
push eax edx
; Get page fault address (the CPU
; stores it in CR2)
mov eax,cr2
; Subtract starting address
sub eax,800000h
; Put the bank number in AL
shr eax,16
; SVGA bankswitch
mov dx,svga_switch_port
out dx,al
; Mark pages present/absent
; (not too difficult to do :-)
; and flush the TLB, e.g. by
; reloading CR3
; Bye
pop edx eax
; Drop the error code pushed by the CPU
add esp,4
iretd
This example assumes that the video memory can be browsed in 64k banks
and the bank-switch is simple :-)
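The step the listing skips ("mark pages present/absent") might look roughly like the following, modelled in C rather than written as handler code. It is a sketch under assumptions: ptes[] stands for the 256 page-table entries covering the window, the handler is assumed to remember the old bank somewhere, and fault_bank and remap_bank are names invented here.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BASE    0x800000u  /* assumed start of the 1MB linear window */
#define WINDOW_SIZE    0x100000u
#define PAGES_PER_BANK 16u        /* 64k bank / 4k page */
#define PTE_PRESENT    0x1u       /* bit 0 of an x86 page-table entry */

/* Bank number for a faulting linear address inside the window
   (the same SUB / SHR 16 arithmetic as in the handler). */
static unsigned fault_bank(uint32_t fault_addr)
{
    assert(fault_addr - WINDOW_BASE < WINDOW_SIZE);
    return (fault_addr - WINDOW_BASE) >> 16;
}

/* Move the PRESENT bit from the pages of old_bank to those of new_bank. */
static void remap_bank(uint32_t ptes[256], unsigned old_bank, unsigned new_bank)
{
    for (unsigned i = 0; i < PAGES_PER_BANK; i++) {
        ptes[old_bank * PAGES_PER_BANK + i] &= ~PTE_PRESENT;
        ptes[new_bank * PAGES_PER_BANK + i] |= PTE_PRESENT;
    }
    /* On real hardware the TLB must be flushed at this point (e.g. by
       reloading CR3), or stale translations for the old bank stay usable. */
}
```

Only 32 entries change per fault, so the bookkeeping is cheap compared with the exception itself.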
Now let's observe the problems...
1. Paging must be enabled. This causes some slowdown. But paging can be
disabled when no video-operations are in use.
2. EMM compatibility. No problem with VCPI, the only difference is that
paging may not be disabled at virtual-mode callbacks. The big
problem is DPMI, which doesn't allow modifying the page table.
3. Reading the video memory. Some SVGA cards have separate read and write
bank registers, but that makes no difference here: a page is either
present or absent, for reads and writes alike, so both directions share
the one mapped bank. A MOVS whose source and destination lie in different
banks will therefore (occasionally) switch banks TWICE in a single
instruction! This means that reading the video memory should be avoided
as far as possible.
4. Words and doublewords written across bank boundaries. This is the
roughest problem. Unless special code is inserted to handle this case, the
system will fall into an infinite cycle of exceptions :-( Let's take a
look at an example: a STOSD writes to 80fffe. It causes a page fault since
the pages from 810000 are not present. The fault handler enables the pages
between 810000-81ffff, disables 800000-80ffff, and returns to the
instruction which caused the exception, but then a new fault is generated
immediately because the address 80fffe is now in an absent page...
Of course this can be avoided by
a) writing bytes only, or
b) writing words to even addresses and dwords to dword boundaries.
If neither of these conditions can be satisfied, the fault handler must
decode the instruction which caused the exception and emulate it. This can
be pretty simple if only one or two kinds of instructions access the
desired area. Of course it's enough to check only those instructions which
caused a page fault at an offset above fffc within a bank:
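mov eax,cr2
cmp ax,0fffch
ja possible_bank_override
normal_fault:
...
iretd
; Check the instruction which caused
; the page fault (STOSD's code is AB)
possible_bank_override:
; The faulting EIP sits just above the
; error code: [esp+4] if nothing else
; was pushed, [esp+12] after a
; 'push eax edx' (flat model assumed)
mov edx,[esp+4]
cmp byte ptr[edx],0abh
jne normal_fault
; Now emulate a STOSD
...
iretd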
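To make the boundary case from problem 4 more concrete, here is a small C model of what "emulate a STOSD" could mean. It is a sketch under stated assumptions, not the handler itself: the card is simulated by an array, select_bank stands in for the OUT to the bank register, and all names (vram, straddles_bank, emulate_store32) are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BASE 0x800000u   /* assumed start of the 1MB linear window */
#define BANK_SIZE   0x10000u

/* Simulated card: 1MB of video memory behind a 64k banked aperture. */
static uint8_t  vram[16 * BANK_SIZE];
static unsigned current_bank;
static unsigned bank_switches;

static void select_bank(unsigned bank)  /* stands in for the OUT to the card */
{
    if (bank != current_bank) {
        current_bank = bank;
        bank_switches++;
    }
}

/* True if an access of `size` bytes at `linear` crosses a 64k bank boundary. */
static int straddles_bank(uint32_t linear, unsigned size)
{
    return ((linear - WINDOW_BASE) & (BANK_SIZE - 1u)) + size > BANK_SIZE;
}

/* Emulate a dword store byte by byte, switching banks where needed. */
static void emulate_store32(uint32_t linear, uint32_t value)
{
    assert(linear >= WINDOW_BASE && linear - WINDOW_BASE <= sizeof vram - 4);
    for (unsigned i = 0; i < 4; i++) {
        uint32_t off = linear - WINDOW_BASE + i;
        select_bank(off / BANK_SIZE);
        vram[current_bank * BANK_SIZE + off % BANK_SIZE] =
            (uint8_t)(value >> (8 * i));
    }
}
```

For a store to 80fffe the first two bytes land at the end of bank 0 and the last two at the start of bank 1, with a single bank switch in between. A real handler would in addition have to adjust EDI (and ECX under a REP prefix) and move the saved EIP past the instruction before the IRETD.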
Ok. This is the end until the next issue. If you have questions, ideas,
or anything else, feel free to contact me:
Ervin / AbaddoN
Ervin Toth
48/A, Kiss Janos street
1126 Budapest
HUNGARY
(+36)-1-201-9563
ervin@sch.bme.hu