Variables in Assembly

Variable Basics and MOV Command

Welcome

Hi! Welcome to the second chapter of this series. This time, I'm going to talk about variables in assembly. This is quite significant, since they behave differently from variables in high-level programming languages like C/C++, Pascal, or Java. I hope that after following this lesson, you'll be able to use assembly variables to their full extent. Let's begin!

Before starting, let me remind you that assembly language (in TASM or MASM) is NOT case-sensitive. That means "this", "This", and "tHiS" are considered the same. I forget whether I mentioned this in the last chapter or not. Well, you can always tell the assembler to be case-sensitive if you'd like to. That involves setting some command-line switches, which will be discussed later.

Moreover, remember that comments in assembly begin with a semicolon (;). Everything after a semicolon until the end of the line is ignored. So, it's like a double slash (//) in C++ or an apostrophe (') in BASIC.

I give examples in TASM ideal syntax most of the time. I assume that you have followed the previous chapter concerning the format. I won't give each example in both MASM and TASM formats, since they are largely the same. The subtle differences in MASM syntax will be mentioned.

Variables Declaration

If you recall our first program from the last chapter, our ideal syntax (TASM based) looks like this:

ideal
p286n
model tiny

codeseg
   org 100h
   jmp start

   ; your data and subroutine here (this is a comment)

start:
   mov ax, 4c00h
   int 21h
end

Notice where you'd put your variable declarations: right after the jmp start statement. Well, you can actually place your declarations anywhere inside the program. However, for now, let's just place them there. Placing them after the start label can be disastrous if you do not handle it carefully, because the processor would try to execute your data as if it were instructions.

OK, now, there are 3 types of variable declarations in assembly that you need to know for now: db, dw, and dd. The db directive declares one-byte variables. Likewise, dw and dd are for word (2 bytes) and double-word (4 bytes) variables. Of course there are more variable types available, but all of them deal only with numeric variables and will be discussed later, after you have grasped the core concepts of assembly language.

How can we declare variables then? The declaration syntax is as follows:

var_name   db   value

That's simple. Replace var_name with your variable name and value with the initial value of var_name. Declaring word and double-word variables is similar; you just change the db into dw or dd. For example:

ideal
p286n
model tiny

codeseg
   org 100h
   jmp start

   score db 100
   year  dw 2001
   money dd 1000000

start:
   mov ax, 4c00h
   int 21h
end

Ah, that's straightforward. Of course you can also give those variables binary or hexadecimal values. Binary values need the letter 'b' appended to the number. Likewise, hexadecimal values need an 'h' at the end. If a hexadecimal number starts with a letter (i.e. A, B, C, D, E, or F), you also need to add a zero in front of it, so that the assembler does not mistake it for a name. For example:

   :
   bits  db 101001b
   var2  dw 4567h
   var3  dw 0BABEh
   :
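
If you want to double-check what such a literal is worth, here is a small sketch in Python, used purely as a calculator on the side (it is not part of the assembly program). The helper name tasm_literal is made up for this illustration, and it only understands the three forms shown above: a trailing b, a trailing h, or plain decimal.

```python
def tasm_literal(text):
    """Convert a literal such as '101001b', '0BABEh', or '100' to an int.

    Hypothetical helper for illustration; it covers only these three forms.
    """
    text = text.strip().lower()
    if text.endswith("h"):
        return int(text[:-1], 16)   # hexadecimal, e.g. 0BABEh
    if text.endswith("b"):
        return int(text[:-1], 2)    # binary, e.g. 101001b
    return int(text, 10)            # plain decimal


for literal in ("101001b", "4567h", "0BABEh", "100"):
    print(literal, "=", tasm_literal(literal))
```

The h suffix is checked first, so a hex value that happens to end in the digit B (such as 0ABh) is not mistaken for binary.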

Variable Limits and Negative Values

How about the limits of those variable types then? See the table below:

Declaration	Acronym	Length	Limit
db	define byte	1 byte	0-255
dw	define word	2 bytes	0-65535
dd	define doubleword	4 bytes	0-4294967295

"Ah, I see that db, dw, and dd are for positive values only. How can I define negative ones?" Well, you can assign negative values to the variables, too. However, the assembler will convert them to a corresponding positive value (a scheme known as two's complement). For example, if you assign -1 to a db variable, the assembler stores it as 255. "How can that be? It will certainly confuse my calculations." Nope. In fact, additions and subtractions on the converted values give the same results as if they had never been converted. Trust me. ;-) The main thing you need to beware of is when you want to print the contents of that variable to the screen and have to distinguish the negative values from the positive ones.

To distinguish negatives from positives, programmers usually divide the variable range into two equal halves. For bytes, a value between 0 and 127 is considered positive, and the rest (128-255) are considered negative. The same scheme applies to words (0-32767 positive, 32768-65535 negative) and double-words. It's not hard at all; you just remember which variables are meant to hold negative values and treat them accordingly. You may find it cumbersome at first, though.
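
The split into a positive half and a negative half can be sketched in a few lines of Python (again only as a side calculation). The function name as_signed is an assumption made for this example; bits is 8 for db, 16 for dw, and 32 for dd.

```python
def as_signed(value, bits):
    """Interpret an unsigned value of the given width as a signed number."""
    half = 1 << (bits - 1)              # 128 for bytes, 32768 for words
    if value >= half:                   # upper half of the range: negative
        return value - (1 << bits)
    return value                        # lower half: positive as-is


print(as_signed(127, 8))      # 127
print(as_signed(128, 8))      # -128
print(as_signed(255, 8))      # -1
print(as_signed(65535, 16))   # -1
```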

Now, the next question is how we can find the corresponding positive value for each negative number. Before we start, remember that 1 byte equals 8 bits. So 2 bytes are 16 bits, and 4 bytes are 32 bits. I assume that you are able to convert a decimal number to binary and vice versa. I also assume that you're capable of doing binary addition.

To find the corresponding positive value, you first ignore the negative sign, then convert that number into binary. Remember the variable type you are in. If it is a byte, the resulting binary number must have 8 digits. Likewise, a word must have 16 digits and a double-word must have 32 digits. If the result has fewer digits than that, pad it on the left with zeroes. Then, flip all digits in the binary number (i.e. from 0 to 1 or from 1 to 0). After that, increase that binary number by one. Convert the result back to decimal. Voila! That's the corresponding positive value.

For example, suppose you want to convert -5, stored as a byte, to its corresponding positive value. Ignore the negative sign and convert 5 to binary. It's 101, right? Since we're dealing with bytes, we must have 8 digits. The result 101 is just 3 digits, so we must pad it with zeroes. Therefore, we now have 00000101. Then, we flip the digits from 0 to 1 or 1 to 0. So, we now have 11111010. The next step is to increase that number by 1: 11111010 + 1 = 11111011. Then we convert this number back to decimal: 251. Ta da! So, -5 is 251 in positive representation.
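
The pad, flip, and add-one recipe can be checked with a short Python sketch. This is only a model of the arithmetic, not something the assembler runs; the function name negative_to_unsigned is invented for this example.

```python
def negative_to_unsigned(value, bits):
    """Follow the pad / flip / add-one recipe for a negative number.

    Returns the padded binary digits, the flipped digits, and the final
    positive value, so each step can be compared with the text.
    """
    magnitude = format(abs(value), "b").zfill(bits)    # ignore sign, pad with zeroes
    flipped = "".join("1" if d == "0" else "0" for d in magnitude)
    result = (int(flipped, 2) + 1) % (1 << bits)       # add one, stay within the width
    return magnitude, flipped, result


magnitude, flipped, result = negative_to_unsigned(-5, 8)
print(magnitude)   # 00000101
print(flipped)     # 11111010
print(result)      # 251
```

Calling it with bits set to 16 or 32 gives the word and double-word versions of the same value.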

Hmm, if you find this calculation cumbersome... uh... Well, you have to live with it if you'd like to learn assembly. Moreover, you need to be familiar with hexadecimal numbers too. You will need to learn some conversions and do some arithmetic between decimal, binary, and hexadecimal. If you find it awkward, you can always use a calculator. If you do it over and over again, you'll probably end up doing the calculations in your head quickly (and amaze your friends ;-).

Moving Around Values

Still remember what registers are? Registers can be treated as variables that reside in the CPU chip, right? We have a handful of those registers: AX, BX, CX, DX, SI, DI, and so on. If you need a review, revisit the earlier chapters.

"If we have some registers built in, why on earth do we need to declare other variables then?" Hmm, there are a couple of reasons. The first is that of course we need more variables than just about twelve. The second is that some registers can't be used for storing values at all, for example CS and IP, as they are critical to running the program. That limits our freedom to about 6 or 7 registers, which is not adequate for most of our needs.

However, if you need to do some calculations or commands involving the variables, in most cases you'll have to load the variable values into registers. Loading values into registers and storing values from registers back to variables is done with the mov command.

The syntax of the mov command is mov a,b which means assign b to a (i.e. a := b). So, in our first program, we saw the command mov ax, 4c00h. That means ax := 4c00h: we give the register AX the hexadecimal value 4c00. Note that we CAN'T do mov 4c00h, ax. ;-)

Similarly, you can load the variables to a register or store them back. You can even transfer values between registers. Let's look at the example below:

    :
    :  ; (the usual preamble)
   jmp start

   our_var dw 10

start:
   mov bx, [our_var]
   mov cx, bx
   mov [our_var], cx

   mov ax, 4c00h
   int 21h
end

Ah, that's pretty straightforward. The first statement (i.e. mov bx, [our_var]) loads the our_var variable into the BX register. The second statement (mov cx, bx) transfers that value from BX to CX. The third statement (mov [our_var], cx) saves the value from the CX register back to the our_var variable. Note the square brackets when you deal with variables. In MASM, those square brackets are not needed. However, the square brackets are good for distinguishing the variable from its address so that later on, when we deal with pointers, we are not confused.

When we deal with byte variables (i.e. db), we need to use byte registers (e.g. AL, AH, BL, BH, and so on) to do our bidding. AX, BX, CX, DX, and so on are word registers. How about moving double-word variables around then? You can use double-word registers, which are available on 80386 processors or better (use p386n instead of p286n to enable them). The double-word registers include EAX, EBX, ECX, EDX, and so on. Or, you can use a workaround, which is discussed later in this chapter. Better to stay away from 80386 instructions for now. I'll explain them in later chapters.

We can also assign a constant directly to a variable with the mov instruction. This works on every x86 processor, including the 8086. See below:

    :
   mov [word ptr our_var], 1
    :

Notice the word ptr modifier. It tells the assembler how many bytes the constant should occupy, since a bare constant such as 1 has no size of its own. Since our_var is a word variable, we use the word ptr modifier. Likewise, a byte variable uses byte ptr and a double-word variable uses dword ptr. Note that moving a constant to a dword ptr variable requires an 80386 processor or better (and p386n instead of p286n).

Caveats in MOVs

There are caveats in using the mov command. You CANNOT use mov [var1], [var2]. In other words, the mov command cannot transfer values between two variables directly. So, how can we get around this? Use a register.

Suppose both var1 and var2 are word variables. We can use any word register (AX, BX, CX, DX, and so on) to do the transfer. Suppose we use AX. Thus, mov [var1], [var2] must be transformed into:

mov ax, [var2]
mov [var1],ax

That's the way. Why do we do this? Well, don't ask me... ask the designers of the processor. ;-) It has nothing to do with variable types: the x86 instruction format gives mov room for only one memory operand, so the other operand has to be a register or a constant.

Another thing to keep in mind is that variables in assembly are treated differently from those in high-level programming languages (Pascal, C/C++, Java, etc.). The assembler actually treats a variable as a label that has a memory address (RAM in most cases) associated with it. Moreover, the assembly language is later assembled into machine code, and at that point the information about variable names and their types is LOST. The processor does not check types. It simply doesn't care. The assembler itself does remember whether you declared a label with db, dw, or dd, and TASM and MASM will complain about a size mismatch if you move a byte variable into a word register. You can override that with a ptr modifier, though, and then it's legal. That particular mov will read or write 2 bytes instead of 1. Likewise, using a byte register with byte ptr on a word variable transfers only one byte instead of two.

This fact confuses a lot of people learning assembler, especially if they have no low level view at all. However, this is the benefit of assembler. You can tweak it around, use it or abuse it. ;-) See the example below:

    :
    :  ; (the usual preamble)
   jmp start

   var1 db 1
   var2 db 2
   var3 dw 305h

start:
   mov ax, [word ptr var1]  ; ax now equals 0201h (i.e. 2*256+1)
   mov ax, [var3]           ; ax now equals 0305h
   mov ax, [word ptr var2]  ; ax now equals 0502h
   mov al, [byte ptr var3]  ; al now equals 05h
   :
   :

OK, I know it's confusing. It becomes less murky when we look at how the assembler stores those variables in memory. Suppose var1 gets stored at memory address 100h. Variables declared next to each other are also placed next to each other in memory. So, the memory diagram will look like this:

Address	Value
100h	01h
101h	02h
102h	05h
103h	03h

Clearly, address 100h contains var1. Address 101h contains var2. Addresses 102h-103h contain var3. Notice the way Intel processors store a word value: the least significant byte first, then the most significant byte (this byte order is called little-endian). The value 305h is broken into two parts: 03h as the most significant byte and 05h as the least significant byte. So, 05h gets stored first, then 03h. Note: memory contents are better expressed in hexadecimal (or hex for short). If you have a decimal number, convert it to hex first. Every two hex digits are worth one byte.
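
To see the same layout without an assembler at hand, here is a minimal Python sketch that packs the three values the way db, db, and dw would. It only models the byte order; the starting address 100h is the same assumption as in the table above.

```python
import struct

# "<" means little-endian with no padding,
# B is one byte (like db), H is a two-byte word (like dw).
memory = struct.pack("<BBH", 1, 2, 0x305)   # var1, var2, var3

base = 0x100   # assumed address of var1, as in the table
for offset, byte in enumerate(memory):
    print(f"{base + offset:03X}h  {byte:02X}h")
```

The word 305h comes out as 05h followed by 03h, matching addresses 102h and 103h in the table.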

Now, how can we deal with the mov then? OK, let's look into this fragment of the code:

   mov ax, [word ptr var1]  ; ax now equals 0201h (i.e. 2*256+1)
   mov ax, [var3]           ; ax now equals 0305h
   mov ax, [word ptr var2]  ; ax now equals 0502h
   mov al, [byte ptr var3]  ; al now equals 05h

Remember that AX is a word register, AL is a byte register, and one word equals two bytes. The first instruction reads two bytes starting from var1. So, it reads 01h and 02h and combines them into AX. The result is 0201h, not 0102h, because the byte at the lower address becomes the low byte. The second instruction reads two bytes starting from var3. So, it reads 05h and then 03h. The result is 0305h. Notice that the bytes are always swapped on reading too. The third instruction works likewise: AX reads two bytes from var2 at address 101h, 02h first, then 05h. The result is 0502h. In the last one, AL reads only one byte at the start of var3, which gives us only 05h.
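
The four reads can be replayed in Python as a sanity check. This is a toy model of memory, not real DOS code; read_word and read_byte are names invented for this sketch, and BASE is the assumed address of var1.

```python
memory = bytes([0x01, 0x02, 0x05, 0x03])   # var1, var2, var3 (low byte, high byte)
BASE = 0x100                               # assumed address of var1


def read_word(address):
    """Read two bytes starting at address, low byte first (like a word mov)."""
    i = address - BASE
    return int.from_bytes(memory[i:i + 2], "little")


def read_byte(address):
    """Read a single byte at address (like a byte mov)."""
    return memory[address - BASE]


print(f"{read_word(0x100):04X}h")   # word at var1 -> 0201h
print(f"{read_word(0x102):04X}h")   # word at var3 -> 0305h
print(f"{read_word(0x101):04X}h")   # word at var2 -> 0502h
print(f"{read_byte(0x102):02X}h")   # byte at var3 -> 05h
```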

That's the way the variables work.

Oh, No! More Gotchas in MOVs

You NEED to know this in order to understand the double-word mov workaround. So, this is important. ;-) Recall that variables in assembly are treated as addresses. You can even view them as pointers. Suppose we have this example:

    :
    :  ; (the usual preamble)
   jmp start

   var1 db 1
   var2 db 2
   var3 dw 305h

start:
   mov ax, [word ptr var1+1]
   :
   :

Yes, it's similar to the one we used above. Note the new +1 thing. What does it mean? Recall the memory table from the previous section: var1 has the address 100h. So the operand [var1+1] refers to address 101h (because 100h + 1 = 101h), and the mov loads 2 bytes from there into AX. Analogously, [var1+2] refers to address 102h, and so on. The assembler will not stop you from writing a large constant like [var1+1000]; the resulting address only has to fit in 16 bits. However, the instruction then reads whatever happens to be stored at that address, which is probably not your data. ;-)

Now about the double-word workaround. Double-word variables are stored similarly (i.e. lowest byte first, flipped like the word variables). Suppose we have this variable:

  my_var dd 1234BABEh

Let's say that this variable gets the address 100h. The memory map is as follows:

Address	Value
100h	0BEh
101h	0BAh
102h	34h
103h	12h

See, it's flipped in the same way. On an 80286 processor or below, we are not able to use double-word registers to transfer values to and from double-word variables. However, using the workaround I mentioned earlier, it is possible to do that. The way we usually do it is as follows:

   mov ax, [word ptr my_var]    ; low word  (word ptr overrides the dd size)
   mov dx, [word ptr my_var+2]  ; high word

Ah... Did you notice that? ;-) Now, the register pair DX:AX contains the double-word value. Let's see how it works. Since my_var has the address 100h, AX gets the 2 bytes read from there: 0BEh and 0BAh. Now AX has the value 0BABEh. Then DX reads 2 bytes from address 100h+2 (i.e. 102h). So, 34h and 12h are read, and DX gets the value 1234h. Therefore, the pair DX:AX holds the original double-word value: 1234BABEh. Writing to double-word variables is done similarly.
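
Here is the same split modeled in Python, as a rough check of the arithmetic rather than actual register code. The variable names ax and dx simply stand in for the two registers.

```python
import struct

memory = struct.pack("<I", 0x1234BABE)       # how dd 1234BABEh is laid out
print(" ".join(f"{b:02X}" for b in memory))  # BE BA 34 12

ax = int.from_bytes(memory[0:2], "little")   # the two bytes at my_var
dx = int.from_bytes(memory[2:4], "little")   # the two bytes at my_var+2
value = (dx << 16) | ax                      # what the DX:AX pair represents

print(f"AX={ax:04X}h  DX={dx:04X}h  DX:AX={value:08X}h")
```

DX supplies the upper 16 bits and AX the lower 16 bits, which is why the pair is written DX:AX.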

Impacts on Registers

Recall that the word register AX consists of AH and AL. Modifying either AH or AL will modify the contents of AX. Likewise, modifying AX will modify AH and AL. The same applies to the other word registers that have two byte registers (i.e. BX, CX, and DX). Take a look at this example:

   mov  ax, 1234h   ; AX = 1234h ==>  AH = 12h, AL = 34h
   mov  al, 56h     ; AX = 1256h ==>  AH = 12h, AL = 56h
   mov  ah, 99h     ; AX = 9956h ==>  AH = 99h, AL = 56h

Ah. So, when I assign a value to AX and AX is important to me, I must not modify either AL or AH. That's true. The same goes for BX (with its BL and BH), CX (CL and CH), and DX (DL and DH). To refresh your memory, please take a look at the picture below:

Please keep in mind this behavior when you do programs in assembly.
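
If it helps, the relation between a word register and its two halves can be modeled with a bit of Python. The helpers set_low and set_high are hypothetical names for this sketch; they mimic what writing to AL and AH does to AX.

```python
def set_low(word, byte):
    """Replace the low byte of a 16-bit value (like writing AL)."""
    return (word & 0xFF00) | (byte & 0xFF)


def set_high(word, byte):
    """Replace the high byte of a 16-bit value (like writing AH)."""
    return (word & 0x00FF) | ((byte & 0xFF) << 8)


ax = 0x1234                # mov ax, 1234h
ax = set_low(ax, 0x56)     # mov al, 56h
print(f"{ax:04X}h")        # 1256h
ax = set_high(ax, 0x99)    # mov ah, 99h
print(f"{ax:04X}h")        # 9956h
```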

Question Marks On Variables

If you are not certain about the initial value of a variable, or you're just too lazy to specify one, you can give a question mark ("?") instead. For example:

   another_var dw ?

Well, this is usually useful when the variable sits at an absolute memory address where it is unsafe to assume an initial value. Or maybe it is a requirement when combining assembly programs with high-level programming languages (e.g. Pascal or C/C++).

[Not Exactly] String Variables

You can define string variables in assembly, as follows:

  message db "Hello World!$"

String variables are required to be stored as db variables. The string is surrounded by quotes, either single or double; it's up to you. If you begin a string with a double quote, you have to close it with another double quote. If it starts with a single quote, it must finish with a single quote as well. This is neat: if you plan to use a single quote in the middle of your string, you can enclose the string in double quotes, and vice versa.

Why do we have to end our string with a dollar sign ("$")? Well, some of the ye olde DOS services require us to do so. However, other systems may require you to end it with a zero byte (ASCII code 0) instead:

  message db "Hello World!",0

How strange! Yes, we've got to live with these peculiarities. So, before writing your string, make sure you know where it will go. If it is fed to DOS (to print it), we have to end it with a dollar sign ("$"). If we use newer Windows services or C libraries, we have to end it with a zero byte. I will tell you which one to use each time. Don't worry. Even though you may be using Windows right now, most of the time you'll still need to invoke the old DOS services. Windows still provides those severely outdated things for compatibility.

How are string variables stored then? It is similar to normal variables, except that each character of the string is converted to its corresponding ASCII code. Uh oh, need to memorize them. ;-) You can find an ASCII table somewhere on the net. There are plenty of resources.

Another thing to remember about string variables is that the ASCII codes are NOT flipped the way the bytes of a word are. Each character is its own byte, stored in the order you typed it. So, suppose the message variable above is stored at address 100h. The memory contents are as follows:

Address	Value	ASCII Code
100h	H	48h
101h	e	65h
102h	l	6ch
103h	l	6ch
104h	o	6fh
105h	<space>	20h
106h	W	57h
107h	o	6fh
108h	r	72h
109h	l	6ch
10Ah	d	64h
10Bh	!	21h
10Ch	$	24h

Hmm... The string is stored contiguously.
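
Rather than memorizing the codes, you can let a few lines of Python list them. This is just a lookup aid on the side; the address 100h is the same assumption as in the table above.

```python
message = b"Hello World!$"   # same characters as the db string above

base = 0x100                 # assumed address of message
for offset, code in enumerate(message):
    # address, the character itself, and its ASCII code in hex
    print(f"{base + offset:03X}h  {chr(code)!r:<5} {code:02X}h")
```

Each character takes exactly one byte, and the bytes appear in the same order as the characters in the source line.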

Multi-valued Variables

"Hey, that's strange. You said that db variables contain one byte." Well, I should say: not exactly. Defining a variable with db means that each value is stored as a byte. However, there is no restriction on how many values we can define for each variable name. ;-) See the example below:

   multivar db 12h, 34h, 56h, 78h, 00h, 11h, 22h, 00h

That's a legal variable definition. The memory map is as follows (assume that multivar is at address 100h):

Address	Value
100h	12h
101h	34h
102h	56h
103h	78h
104h	00h
105h	11h
106h	22h
107h	00h

Ah, so multi-valued variables are stored contiguously. No flipping, right? Hmm... not quite. Check this out:

   multivar2 dw 1234h, 5678h, 0011h, 2200h

Its corresponding memory map (assume multivar2 is at address 100h):

Address	Value
100h	34h
101h	12h
102h	78h
103h	56h
104h	11h
105h	00h
106h	00h
107h	22h

Hmm... the flipping is done word-wise. It's because we declared the variable with define word (dw), so the bytes within each word are flipped. Analogously, declaring variables as bytes (db) causes no flipping, since a single byte has nothing to flip. Declaring variables as double-words (dd) flips them double-word-wise (i.e. four bytes flipped at a time).
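
To compare the three cases side by side, here is a hedged Python sketch that packs the same eight bytes' worth of data as bytes, as words, and as double-words. The dd values 12345678h and 00112200h are my own regrouping of the numbers above, chosen only for illustration.

```python
import struct


def hexdump(data):
    """Show bytes in memory order, lowest address first."""
    return " ".join(f"{b:02X}" for b in data)


as_bytes = struct.pack("<8B", 0x12, 0x34, 0x56, 0x78, 0x00, 0x11, 0x22, 0x00)  # db
as_words = struct.pack("<4H", 0x1234, 0x5678, 0x0011, 0x2200)                  # dw
as_dwords = struct.pack("<2I", 0x12345678, 0x00112200)                         # dd

print("db:", hexdump(as_bytes))    # 12 34 56 78 00 11 22 00
print("dw:", hexdump(as_words))    # 34 12 78 56 11 00 00 22
print("dd:", hexdump(as_dwords))   # 78 56 34 12 00 22 11 00
```

The values stay in their declared order in all three cases; only the bytes inside each value are reversed.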

Well, you can consider a multi-valued variable as an array. In fact, high-level programming languages translate array definitions into this. Of course you can use the plussed mov instructions to access the array. However, that is not very practical, since each offset is a fixed constant. The proper way to access a multi-valued array will be discussed later in this lesson. Just stay tuned.

Using dup

Another way to declare a multi-valued variable is to use the dup operator. See the example below:

   my_array db 5 dup (00h)

That example above is similar to:

   my_array db 00h, 00h, 00h, 00h, 00h

Oh, so dup is kind of a shortcut (or, more precisely, an operator) for defining variables with repeated values. Of course you can define something like this:

   bar_array db 10 dup (?)

Closing

I know these new variable concepts may overwhelm you for a moment, but fret not. You'll get used to them. One suggestion is to draw the memory table like I did for those examples. This will clarify most situations. If you are in doubt, you're always welcome to review the preliminary chapter and chapters 1 and 2.

So far, I haven't even explained a program that says "Hello World!" All I've done is explain the crucial concepts of assembly language, to enable you to write some basic assembly programs. With these key concepts in mind, you'll hopefully program in assembly more gracefully.

OK, I think that's all for now. See you next time.
