Interrupt Essentials

What is it really?

Welcome

Hi! Welcome to the eighth chapter of this series. You're about to learn the heartbeat of assembly language: interrupts. Once you master them, you can do a lot of interesting things. Interrupts seem to give most people quite a headache. Let's find out why.

Introduction to Interrupts

You have probably heard assembly gurus use the word "interrupt". What is it? In the first chapter, I dropped a hint: an interrupt is much like a procedure provided by the system (mostly the BIOS, the operating system, or drivers). You invoke it, and it does something useful for you. Let's look at a snippet of our first program from the first chapter:

     :
   mov   ax, 4c00h
   int   21h
     :

Here you see how to invoke an interrupt. These two lines actually request the operating system to terminate the program.

Interrupt calls can do far more interesting things, though. Notice that an interrupt is called using the int instruction followed by a number, which in this example is 21h. Invoking a different number gives a different result. This number is referred to as the interrupt number.

The interrupt number alone is not enough. An interrupt behaves differently depending on which service number you request. In this example, the service number is placed in AH, which is 4Ch. (Remember that AX=4c00h means AH=4Ch and AL=00h.) Service numbers are usually placed in AH, and sub-service numbers usually go in AL. So, in this case, we're calling interrupt 21h, service 4Ch, sub-service 00h. (Strictly speaking, for service 4Ch the value in AL is the program's return code, which a batch file can test with ERRORLEVEL; 00h conventionally means success.)
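To see that AX=4c00h really is just AH=4Ch and AL=00h loaded together, here is a minimal sketch that loads the two halves separately. It assumes the same TASM Ideal-mode .COM skeleton used in the programs later in this chapter, and it should behave exactly like the two-line snippet above: the program ends at once with return code 0.

```asm
ideal
p286n
model tiny

codeseg
   org 100h
start:
   mov  ah, 4ch      ; service number in AH: 4Ch = terminate program
   mov  al, 00h      ; AL = 00h; for this service it is the return code
   int  21h          ; same effect as: mov ax, 4c00h / int 21h
end
```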

This interrupt mechanism works much like a phone number. Think of the interrupt number as the area code, the service number as the phone number, and the sub-service number as the extension. Calling a different number gets you a different response.

Now the problem is: how can we know which number performs which service? That's a good question. If you forgot a phone number, what would you do? You'd open the phone book, right? It's the same here, except that our "phone book" is called the interrupt list (a list of known services), which you can get here.

Unlike phone numbers, which phone companies assign, interrupt numbers for programs, drivers, and services are regulated by virtually no one. So you'll find that interrupt numbers sometimes conflict with each other, which causes software incompatibility. However, there are some interrupt numbers you can rely on, for example 10h for BIOS video services, 21h for operating system (DOS) services, and a few others. As you discover how to unleash the power of interrupts, avoid conflicting or dubious interrupt numbers. But don't be too discouraged either. Maybe just follow along with this course first before striking out on your own.

Hardware and Software Interrupts

People often distinguish between hardware and software interrupts. The only difference is who triggers the interrupt: if hardware triggers it, it's a hardware interrupt; if software triggers it, it's a software interrupt. The one we used just now is a software interrupt, because our program invoked it. Programs rarely invoke hardware interrupts themselves. Instead, we usually install a "listener" routine, so that the next time the hardware raises the interrupt, we can intercept it and take the appropriate action. Hardware interrupts fall into two categories: those issued by the processor itself, and those issued by other hardware such as sound cards, network cards, and so on. The second category is known as interrupt requests (IRQs).

This lesson primarily focuses on software interrupts. Hardware interrupts will be discussed later.

Since interrupt numbers are limited, IBM first tried to regulate them. Interrupt numbers 00h to 0Fh are assigned to hardware interrupts. The AT and later computers have additional hardware interrupts, assigned to numbers 70h to 77h. Most of the rest, from 10h to 0FFh, are for software interrupts. Yes, interrupt numbers only go up to 0FFh (255), so there are just 256 of them. IBM also designated the interrupt numbers from 00h through 2Fh, and these became the de facto standard for all PCs. Since the numbers 30h to 0FFh were marked as "free", many drivers, resident programs, and others raced to claim them. Even worse, the original interrupt numbers 28h to 2Fh, which were marked as "reserved", were taken too. So the only reliable numbers are 00h to 27h. Of course, some "survivors" managed to emerge as new standards, such as Novell drivers at 2Ah and 7Ah, the Microsoft mouse driver at 33h, the EMS driver at 67h, the HIMEM/XMS driver at 2Fh, and so on.

What about service numbers? There's chaos there too. Some drivers even "coveted" the "pristine" de facto interrupt numbers like 10h and 21h, trying to avoid conflicts with the original services by picking peculiar service numbers. However, as the system evolves, newer versions may claim those service numbers as well. Thus, upgrading (either the OS or the computer itself) may sometimes cause a clash.

How can we cope with this? Again, stick with the "common" numbers. How do we know which ones those are? Try looking here, or just enjoy this series, or read more assembly programs to see which numbers they use.

Input and Output to Screen

Reading input or printing output on the screen requires invoking interrupts, which is why we haven't produced any screen output so far. I tried to cover this earlier, but realized it needed a lot of background to be interesting. Alright, let's try some "Hello world!" stuff. Please note that I'm not going to write MASM variants from now on. After these chapters, you should be able to make the minor changes needed to feed this TASM version into MASM, so I'll write only the TASM version.

ideal
p286n
model tiny

codeseg
   org 100h
   jmp start

   message db 'Hello World!$'
start:
   mov dx, offset message
   mov ah, 09
   int 21h

   mov ax, 4c00h
   int 21h
end

Focus on the first three lines after the start label. You'll recognize that we are invoking interrupt 21h, service 09h. As I mentioned earlier, interrupt 21h is reserved for operating system calls, which in this case are DOS calls. If you look up service 09h of interrupt 21h in the interrupt list, it says that it prints a string. Here is what the interrupt list says:

INT 21 - DOS - PRINT STRING
        AH = 09h
        DS:DX = address of string terminated by "$"
Note:  Break checked, and INT 23h called if pressed

The interrupt list tells us something else too: it requires the DS:DX register pair to point to the message. When a register pair is required and the first register is a segment register (CS, DS, ES, or SS), it usually denotes a pointer (a segment:offset address). For now, you don't have to worry about setting DS, since in our memory model it is taken care of automatically (well, sort of). I'll explain this later, once you have more background. The only thing we have to worry about is setting DX. So, before calling the interrupt service, we load DX with the offset of the message.

As for the note, don't worry about whether Break is checked (that is, whether pressing Ctrl+Break can interrupt the program). Just ignore it for now; it only matters once you get deeper into system programming.

Compile it and run it! Ahh... finally, "Hello World!" appears on the screen. To move to a new line after the text, add the carriage return (13) and line feed (10) characters before the '$' by changing the message declaration to:

   message db 'Hello World!',13,10,'$'

Some assembly gurus will say, "Hey, int 21h service 09h is obsolete!". Yup, that's true. Don't worry about it. It's still there for backward compatibility. So, it shouldn't matter. We just want our hello world, don't we? :-)

Getting input is a little different. If we look in the interrupt list, interrupt 21h (again), service 0Ah offers a way to read input from the keyboard. Here is what the interrupt list says:

INT 21 - DOS - BUFFERED KEYBOARD INPUT
        AH = 0Ah
        DS:DX = address of buffer
Note: first byte of buffer must contain maximum length on entry,
      second byte contains actual length of previous line which may
        be recalled with the DOS line-editing commands
      on return the second byte contains actual length, third and
        subsequent bytes contain the input line

This service also requires DS:DX to point to a buffer, this time one that will hold the input. However, the buffer structure is a bit different. DOS requires the first byte to specify the maximum number of characters the buffer can hold, the second byte is filled in by DOS with the number of characters the user actually typed, and the rest holds the actual text. Here is how I arrange it: the maximum is 80, stored in maxlen. Right after it, I reserve a byte to hold the actual length. Then comes the message, which is reserved as 80 bytes (i.e. 80 dup (?)). Note that the maximum includes the carriage return (13) that DOS stores after the text when the user presses Enter, so the user can type at most 79 characters; the actual length does not count that carriage return.

ideal
p286n
model tiny

codeseg
   org 100h
   jmp start

   maxlen     db 80
   actual_len db ?
   message    db 80 dup (?)

start:
   mov dx, offset maxlen
   mov ah, 0ah
   int 21h

   mov ax, 4c00h
   int 21h
end

Since DOS requires the maximum length to be the first byte, we point DX to the offset of maxlen instead of to message.
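The program above reads the line but never shows it. As a sketch of one way to use the result (the newline variable and the idea of reusing the buffer for output are my own additions), the version below overwrites the carriage return that DOS stores right after the text with '$', so the same buffer can be printed with service 09h. If the user happens to type a '$', the echoed text will stop there.

```asm
ideal
p286n
model tiny

codeseg
   org 100h
   jmp start

   maxlen     db 80          ; buffer capacity, including the final CR
   actual_len db ?           ; set by DOS: characters typed, CR not counted
   message    db 80 dup (?)  ; the typed characters, followed by CR (13)
   newline    db 13,10,'$'   ; hypothetical helper string: CR LF

start:
   mov  dx, offset maxlen    ; DS:DX -> start of the buffer structure
   mov  ah, 0ah              ; int 21h/0Ah: buffered keyboard input
   int  21h

   ; Replace the CR that follows the text with '$' so that
   ; int 21h/09h knows where the string ends.
   mov  bl, [actual_len]
   mov  bh, 0                ; BX = number of characters typed
   add  bx, offset message   ; BX -> byte right after the last character
   mov  al, '$'
   mov  [bx], al

   mov  dx, offset newline   ; move to the next line first
   mov  ah, 09h
   int  21h

   mov  dx, offset message   ; then echo the typed text back
   mov  ah, 09h
   int  21h

   mov  ax, 4c00h
   int  21h
end
```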

Output: A Better Version

If you examine the output service we discussed earlier, it has a significant disadvantage: the string must end with a '$' terminator, which also means the string itself can never contain a '$' character. One way to avoid that is to output characters one by one in a loop that stops when it reads the character 0. ASCII code 0 (called NUL) isn't a printable character, so it is commonly used to terminate strings. Strings terminated this way are called zero-terminated ASCII strings, better known as ASCIIZ. This is the string format used in C/C++. Pascal uses a different string format: the first byte stores the string length, followed by the string itself. However, this has its own disadvantage: the maximum string length is limited to 255. So, for now, we use ASCIIZ.
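To see the '$' problem for yourself, here is a small variation of the Hello World program (the text 'Price: $5 only' is just an illustrative example). Since service 09h stops at the first '$' it finds, only "Price: " should be printed; the rest of the text, including the line break, never appears.

```asm
ideal
p286n
model tiny

codeseg
   org 100h
   jmp start

   ; The first '$' ends the string for service 09h, so only "Price: "
   ; is printed and everything after it is ignored.
   message db 'Price: $5 only',13,10,'$'
start:
   mov  dx, offset message
   mov  ah, 09h
   int  21h

   mov  ax, 4c00h
   int  21h
end
```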

Now, how can we print an ASCIIZ string? First, we need to know how to print one character on the screen, since we're not using int 21h/09h anymore. Aha, we found it! It is (again) interrupt 21h, service 06h. Let's see what the interrupt list says:

INT 21 - DOS - DIRECT CONSOLE I/O CHARACTER OUTPUT
        AH = 06h
        DL = character <> FFh

What does it require? Only DL, which holds the character to print, and DL must not be 0FFh (that value makes the service read a character instead, as we'll see later). No problem, so our problem is solved. Before we write the assembly version, it is always better to sketch out the algorithm in pseudocode first. Let's say the message is stored in the message variable:

  BX = offset of message;
  do
     DL = character pointed by BX;
     BX = BX + 1;
     if DL is not 0 then
         AH = 6;
         int 21;
     end
  while DL is not 0

OK, the algorithm is set. Let's translate it into assembly, starting from the standard skeleton of the previous program:

ideal
p286n
model tiny

codeseg
   org 100h
   jmp start

   message db 'Hello World!',13,10,0
start:
   mov  bx, offset message

myloop:
   mov  dl, [bx]  ; --> means fetch a character pointed by BX to DL.
   inc  bx        ; --> BX = BX + 1
   cmp  dl, 0     ; --> Is DL zero?
   je   quit      ; --> If yes, then quit

   mov  ah, 06    ; --> Otherwise, do the print service and repeat
   int  21h
   jmp  myloop

quit:

   mov  ax, 4c00h
   int  21h
end

It's pretty straightforward. Note the mov dl, [bx]. When you surround a register with square brackets like this, [bx], BX is treated as a pointer rather than a value; the square brackets denote dereferencing. Because DL is an 8-bit register, exactly one byte is fetched. This requires a good understanding of pointers, which I won't re-explain here. When we write mov bx, offset message, the assembler loads BX with the offset (address) of the message variable, which is just an integer that acts as a pointer. When it is dereferenced (with square brackets), we're no longer talking about the integer but about the value it points to. So, in the first iteration, DL will contain the letter 'H'. The next line, inc bx, increments the pointer, so that the next time it is dereferenced it points to the next character, 'e'. This process loops until it reaches 0, the end of the string.
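Dereferencing works the same way for any data layout. As a contrast to ASCIIZ, here is a sketch that prints a Pascal-style string (length in the first byte, as described earlier). It reads the length through [bx] and then counts down in CL instead of looking for a terminator. The name pmessage is just an illustrative choice.

```asm
ideal
p286n
model tiny

codeseg
   org 100h
   jmp start

   ; Length-prefixed (Pascal-style) string: first byte = number of characters.
   ; 'Hello World!' has 12 characters; with CR and LF that makes 14.
   pmessage db 14,'Hello World!',13,10
start:
   mov  bx, offset pmessage
   mov  cl, [bx]     ; CL = length, read through the pointer
   inc  bx           ; BX now points at the first character

myloop:
   cmp  cl, 0        ; any characters left?
   je   quit         ; no: we're done
   mov  dl, [bx]     ; fetch the character pointed by BX
   inc  bx           ; advance the pointer
   mov  ah, 06       ; print the character in DL
   int  21h
   dec  cl           ; one character fewer to go
   jmp  myloop

quit:
   mov  ax, 4c00h
   int  21h
end
```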

How about input? I'll leave that as your homework. As a big hint, here is the excerpt from the interrupt list:

INT 21 - DOS - DIRECT CONSOLE I/O CHARACTER INPUT
        AH = 06h
        DL = 0FFh
Return: ZF set   = no character
        ZF clear = character received
        AL = character
Notes: Character is echoed to STDOUT if received. Break are NOT checked

We still invoke int 21h/06h, but this time with DL equal to 0FFh. On return, AL contains the character. One thing you can't ignore, though: the service returns immediately, and if ZF is set, no key was pressed yet and AL holds no character, so you have to keep calling it until ZF is clear. We're doing this in a loop too, but this time we stop when the user presses Enter, right? Here is a hint: the Enter key produces code 13. How can we put a character back through a dereferenced pointer like the one above? Here is how: mov [bx], al. Aha! The rest should be pretty easy. Go get it! ;-)
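To get you started without spoiling the whole exercise, here is a sketch that reads a single key with int 21h/06h and prints it back. The important part is the jz: while ZF is set, no key is waiting, so the program simply asks again. The variable name key is my own. Depending on the echo behaviour noted in the excerpt, you may see the character once or twice.

```asm
ideal
p286n
model tiny

codeseg
   org 100h
   jmp start

   key db 0                  ; hypothetical variable: the key pressed ends up here

start:
waitkey:
   mov  ah, 06
   mov  dl, 0ffh             ; DL = 0FFh selects input, not output
   int  21h
   jz   waitkey              ; ZF set: no key waiting yet, so poll again
   mov  [key], al            ; ZF clear: AL holds the character

   mov  dl, [key]            ; print it back with the output service
   mov  ah, 06
   int  21h

   mov  ax, 4c00h
   int  21h
end
```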

Number To String

The output routines we've discussed so far can only output strings. How can we output numbers? Well, we have to convert the numbers to strings first. For example, say we'd like to output the contents of the variable n, an unsigned 16-bit integer. We'll need a temporary variable to hold the string form of n. Since an unsigned 16-bit integer ranges from 0 to 65535, we only need to reserve 6 bytes for it: five for the digits and one for the zero terminator. Alright, so what does the pseudocode look like? Let's look at the excerpt below:

  BX = offset of message;
  DX:AX = n
  AX = DX:AX / 10000, remainder in DX

  ; Since n is at most 65535, AX is between 0 and 6 after this division, so the
  ; digit we want is entirely in AL. We then need to convert AL into an ASCII
  ; digit '0' through '9'. Fortunately, the conversion is easy: ASCII '0' is
  ; 30h, '1' is 31h, and so on. Thus, converting AL to its corresponding ASCII
  ; character just takes adding 30h.

  AL = AL + 30h
  [BX] = AL      ; store the digit
  BX = BX + 1    ; increment pointer

  ; Repeat the division
  AX = DX       ; because we deal only with the remainder now.
  DX = 0

  AX = DX:AX / 1000, remainder in DX

  AL = AL + 30h
  [BX] = AL      ; store the digit
  BX = BX + 1    ; increment pointer

  ; Repeat the division
  AX = DX       ; because we deal only with the remainder now.
  DX = 0

  AX = DX:AX / 100, remainder in DX

  AL = AL + 30h
  [BX] = AL      ; store the digit
  BX = BX + 1    ; increment pointer

  ; Repeat the division
  AX = DX       ; because we deal only with the remainder now.
  DX = 0

  AX = DX:AX / 10, remainder in DX

  AL = AL + 30h
  [BX] = AL      ; store the digit
  BX = BX + 1    ; increment pointer

  ; At this point, DX holds the remainder, which is between 0 and 9. So we can
  ; convert DL to ASCII directly and store it at [BX]. (Why DL? Because when DX
  ; is 0..9, DH is always 0, and DL holds the whole value 0..9.)

  DL = DL + 30h
  [BX] = DL

Whoa! That's pretty long! Well, there is a shorter way using a loop; this long version is just for you to learn from. As you might have guessed, we divide starting from the highest possible digit: the ten thousands, then the thousands, down to the hundreds and the tens, and then the last digit. This is quite straightforward. Let's examine the assembly code below:

ideal
p286n
model tiny

codeseg
   org 100h
   jmp start

   n       dw 12345
   message db 6 dup (0)
start:
   mov  bx, offset message

   mov  ax, [n]
   sub  dx, dx       ; DX:AX = n
   mov  cx, 10000    ; Divide it by 10000
   div  cx           ; result in AX,  remainder in DX

   add  al, 30h
   mov  [bx], al
   inc  bx
   mov  ax, dx
   sub  dx, dx
   mov  cx, 1000     ; Divide it by 1000
   div  cx

   add  al, 30h
   mov  [bx], al
   inc  bx
   mov  ax, dx
   sub  dx, dx
   mov  cx, 100      ; Divide it by 100
   div  cx

   add  al, 30h
   mov  [bx], al
   inc  bx
   mov  ax, dx
   sub  dx, dx
   mov  cx, 10       ; Divide it by 10
   div  cx

   add  al, 30h      ; AL = tens digit
   mov  [bx], al
   inc  bx

   add  dl, 30h      ; DL = units digit (the remainder)
   mov  [bx], dl


   ; The rest are the same:

   mov  bx, offset message

myloop:
   mov  dl, [bx]  ; --> means fetch a character pointed by BX to DL.
   inc  bx        ; --> BX = BX + 1
   cmp  dl, 0     ; --> Is DL zero?
   je   quit      ; --> If yes, then quit

   mov  ah, 06    ; --> Otherwise, do the print service and repeat
   int  21h
   jmp  myloop

quit:

   mov  ax, 4c00h
   int  21h
end

Hmm, that's cumbersome, isn't it? Notice that we don't need to store the zero terminator at the end, because the temporary buffer is initialized to all zeros. Now, there is one problem: if you replace the number 12345 with 123, it will display 00123. Why? :-) I'll leave this for you to answer. You should be able to fix it; it's simple. Hint: you need to compare AL before storing the digit.
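The five nearly identical blocks in the program above can be folded into a loop. This sketch keeps the same idea (divide by 10000, 1000, 100, 10, and finally 1) but reads the divisors from a table; the table name divisors and the label digitloop are my own. It still prints leading zeros, so the exercise above applies to it too.

```asm
ideal
p286n
model tiny

codeseg
   org 100h
   jmp start

   n        dw 12345
   divisors dw 10000, 1000, 100, 10, 1   ; place values, highest first
   message  db 6 dup (0)                 ; five digits + zero terminator
start:
   mov  ax, [n]               ; AX = value still to be converted
   mov  si, offset divisors   ; SI walks through the divisor table
   mov  bx, offset message    ; BX walks through the output buffer

digitloop:
   sub  dx, dx                ; DX:AX = remaining value
   mov  cx, [si]              ; CX = current place value
   div  cx                    ; AX = digit (0..9), DX = remainder
   add  al, 30h               ; convert the digit to ASCII
   mov  [bx], al
   inc  bx
   mov  ax, dx                ; carry on with the remainder
   add  si, 2                 ; next divisor (each dw entry is 2 bytes)
   cmp  cx, 1                 ; did we just divide by 1?
   jne  digitloop             ; if not, produce the next digit

   ; Print the ASCIIZ result exactly as before
   mov  bx, offset message

myloop:
   mov  dl, [bx]
   inc  bx
   cmp  dl, 0
   je   quit
   mov  ah, 06
   int  21h
   jmp  myloop

quit:
   mov  ax, 4c00h
   int  21h
end
```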

Closing

OK, I think that's all for now. I'd like to thank Victor Forsyuk, who created a wonderful Norton Guides database from which the interrupt list excerpts in this tutorial are quoted.

I hope you're not discouraged. This chapter may be a little steep, but my advice is simple: practice, practice, practice. I'm sure you'll be able to grasp these concepts. See you next time.
