MOVS, CMPS, LODS, SCAS, and STOS
Hi! Welcome to the thirteenth chapter of this series. I hope you are comfortable with the array concepts we discussed in the last chapter. Now I'm going to explain the basic string instructions. As you may have noticed, a string in assembly is basically an array of bytes, so the string instructions apply to arrays as well. This turns out to be an invaluable concept.
There are five basic string instructions, also known as the "five brothers". Of course, these instructions can be "emulated" with mov, cmp, loop and jmp. However, the five brothers are more compact and, combined with the rep prefixes, typically faster than the equivalent hand-written loop, since they are built into the processor. OK, straight to the stuff.
LES DI and LDS SI
String instructions typically use the DS:SI pair to denote the source string and the ES:DI pair to denote the destination string. In the "tiny" memory model, we don't really need to set DS and ES, right? So the only thing we care about is setting SI and DI to the source and destination offsets respectively. However, you may find the instructions les di, [somestringvar] and lds si, [otherstringvar] in some programs. These load both ES and DI, or both DS and SI, in one go from a 4-byte far pointer (offset word first, then segment word) stored in the variable. So you may think of each as a "combo" instruction.
After setting the source and/or destination register pairs, you may want to specify how the string instruction is performed: should it go backward or forward? This may seem a bit strange, but the processor can run these instructions in both directions.
Determining which way to go involves setting the direction flag. Intel x86 assembly has two instructions for this: cld ("clear direction flag") and std ("set direction flag"). Clearing the direction flag makes the string instructions run forward (SI and DI increase after each element). Setting it makes them run in reverse (SI and DI decrease). Since we typically want the string instructions to go forward, we almost always put a cld instruction after setting the register pairs.
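To make the "combo" idea concrete, here is a small sketch in TASM ideal-mode syntax. The variable name destptr is hypothetical; it is assumed to be a 4-byte variable holding a far pointer, with the offset in the low word and the segment in the high word.

```asm
; destptr is a hypothetical dword variable holding a far pointer
; (low word = offset, high word = segment).

        les   di, [destptr]               ; one instruction loads both ES and DI

; The same effect with two separate loads:
        mov   di, [word ptr destptr]      ; low word  -> DI (offset)
        mov   es, [word ptr destptr + 2]  ; high word -> ES (segment)
```

lds si, [...] works the same way for the DS:SI pair. Keep in mind that after an lds, DS no longer points to your own data segment, so save and restore it if you need your variables afterwards.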
The instruction movs copies the source string into the destination (yes, copy, not move). It has two variants: movsb and movsw. movsb ("move string byte") copies one byte at a time, whereas movsw ("move string word") copies two bytes at a time.
Since we usually want to copy several bytes at once, movs instructions are run in batches using the rep prefix. The number of repetitions is given by the CX register. See the example below:
    :
    lds   si, [src]
    les   di, [dest]
    cld
    mov   cx, 100
    rep   movsb
    :
This example copies 100 bytes from src to dest. If you replace movsb with movsw, you copy 200 bytes instead. If you remove the rep prefix, the CX register has no effect: you copy a single byte (with movsb) or 2 bytes (with movsw).
Assembly gurus use this instruction a lot, because arrays can be copied in exactly the same way. You can also use it to emulate C/C++'s strcpy.
The instruction cmps compares two strings. It also has two variants: cmpsb and cmpsw. cmpsb compares one byte at a time and cmpsw compares two bytes at a time. Usually cmpsb is the one we use more. Let's look at the example below:
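A common refinement, sketched here under the assumption that DS:SI and ES:DI are already set up, is to copy most of the data with movsw and finish with a single movsb when the byte count is odd. The constant NBYTES is a made-up name for this illustration.

```asm
; NBYTES is a hypothetical constant: the number of bytes to copy.
; DS:SI and ES:DI are assumed to point to the source and destination.
        cld
        mov   cx, NBYTES
        shr   cx, 1         ; CX = number of whole words, CF = 1 if a byte is left over
        rep   movsw         ; copy the words (movs does not change the flags)
        adc   cx, cx        ; CX is 0 after rep, so CX becomes CF (0 or 1)
        rep   movsb         ; copy the leftover byte, if any
```

The trick relies on two facts: rep leaves CX at zero when it finishes, and movs leaves the carry flag from the shr untouched.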
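One array situation where the backward direction matters is copying between overlapping regions. The sketch below shifts the first 9 bytes of a hypothetical 10-byte array buf up by one position, assuming DS and ES both point to its segment. Copying forward would overwrite bytes before they are read, so it starts from the end with std.

```asm
; buf is a hypothetical 10-byte array; DS = ES is assumed.
; Goal: buf[1..9] = old buf[0..8], leaving buf[0] free for a new element.
        mov   si, offset buf + 8    ; last byte to read
        mov   di, offset buf + 9    ; last byte to write
        mov   cx, 9
        std                         ; run backward: SI and DI decrease
        rep   movsb
        cld                         ; put the flag back to the usual forward setting
```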
    :
    lds   si, [src]
    les   di, [dest]
    cld
    mov   cx, 100
    repe  cmpsb          ; repeat while equal (a plain rep assembles to the same prefix here)
    jne   @@mismatch
@@match:
    :
    :
@@mismatch:
    dec   si
    dec   di
    :
With cmps, the rep prefix acts as repe ("repeat while equal"): the comparison stops at the first pair that differs or when CX reaches zero. Afterwards, the zero flag is set if every compared pair was equal and cleared if a mismatch was found. Thus, after a rep cmpsb (or repe cmpsb) you typically do a jne @@somelabel to detect mismatches.
If there is a mismatch, SI and DI point one byte past the mismatch position, so you need to decrement them by one, as in the example above.
If you replace the prefix with repne, the comparison instead continues while the elements differ and stops at the first pair that is equal. In other words, it checks that the two strings differ at every position. repne is seldom used in conjunction with cmpsb, though.
C/C++ users: Why don't you use this for doing strcmp?
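As a starting point, here is a rough sketch of a fixed-length comparison in the style of C's memcmp rather than a full strcmp (it does not stop at a terminating zero). The label name and the result convention in AX are assumptions for illustration; src and dest are far-pointer variables as in the example above.

```asm
; Compare 100 bytes. Result in AX: 0 = equal, 1 = src larger, -1 = src smaller.
        lds   si, [src]
        les   di, [dest]
        cld
        mov   cx, 100
        repe  cmpsb             ; stop at the first pair that differs
        mov   ax, 0             ; mov does not change the flags
        je    @@done            ; all bytes equal: AX = 0
        mov   ax, 1
        ja    @@done            ; byte at DS:SI was larger (unsigned): AX = 1
        mov   ax, -1            ; otherwise it was smaller: AX = -1
@@done:
```

cmps sets the flags as if it had subtracted the ES:DI element from the DS:SI element, which is why ja means "source larger". At a mismatch, CX holds the number of pairs left unchecked, so 99 - CX is the 0-based index of the differing byte.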
The instruction scas scans a string pointed to by ES:DI, so this time DS:SI is not used. It is typically used to search for a particular character in a string. As with the other string instructions, scas has two variants: scasb and scasw. With scasb, the string at ES:DI is searched for the occurrence of the element held in the register AL, whereas with scasw the element to search for is in AX. Look at the following example:
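    :
    les   di, [msg]
    mov   al, 65         ; --> 65 is the ASCII code for capital A.
    cld
    mov   cx, 1000       ; --> search within 1000 bytes
    repne scasb          ; --> keep scanning while the byte is NOT equal to AL
    je    @@found
@@notfound:
    :
    :
@@found:
    dec   di             ; --> If we found it, DI always points 1 byte further, just like in cmps
    :
As with the cmps instruction, we must check with either je or jne to see whether the character was really found. Note the repne prefix: we want to keep scanning while the bytes do not match AL. A plain rep is encoded as repe for scas and would stop at the first byte that differs from AL instead.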
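scasb can also run the other way round, skipping over a run of one character instead of searching for it. The sketch below, which reuses the msg pointer and the 1000-byte limit from the example, uses repe to step past leading spaces. The label name is made up for illustration.

```asm
; Skip leading spaces in the string at ES:DI (at most 1000 bytes).
        les   di, [msg]
        mov   al, ' '
        cld
        mov   cx, 1000
        repe  scasb             ; keep scanning while the byte equals AL
        je    @@allspaces       ; ZF still set: every byte scanned was a space
        dec   di                ; ES:DI -> first byte that is not a space
        ; ... work with the text starting at ES:DI ...
@@allspaces:
```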
Let's look at the following procedure. This procedure is used to calculate the string length (C/C++: strlen, Pascal: length).
; -- String length, result in AX
proc strlen strpointer: dword
    push  es
    push  di
    push  si
    push  cx
    les   di, [strpointer]
    mov   si, di         ; SI = start offset
    sub   al, al         ; AL = 0, the terminator to look for
    cld
    mov   cx, 10000      ; --> Scanning within the first 10000 bytes
    repne scasb          ; scan until a zero byte is found
    je    @@found
    mov   ax, -1         ; --> When we can't find it, return -1
    jmp   @@quit
@@found:
    mov   ax, di         ; DI is one past the terminator
    sub   ax, si
    dec   ax             ; AX = length, not counting the terminator
@@quit:
    pop   cx
    pop   si
    pop   di
    pop   es
    ret
endp
Well, in building strcpy, you'll need this function. To invoke this function, do call strlen, @data, offset mystr (TASM) or invoke strlen, seg data, offset mystr (MASM).
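Here is one possible sketch of such a strcpy, written in the same style as strlen above. To stay self-contained it does the zero-byte scan inline rather than calling strlen. The parameter names and the 10000-byte limit are assumptions for illustration, and the destination buffer is assumed to be large enough.

```asm
; -- Copy a zero-terminated string from srcptr to destptr (terminator included)
proc strcpy destptr: dword, srcptr: dword
        push  ds
        push  es
        push  si
        push  di
        push  cx
        push  ax
        les   di, [srcptr]      ; scas needs the string in ES:DI
        sub   al, al            ; AL = 0, the terminator
        cld
        mov   cx, 10000         ; look within the first 10000 bytes
        repne scasb
        jne   @@quit            ; no terminator found: copy nothing
        mov   cx, di            ; DI is one past the terminator
        lds   si, [srcptr]      ; DS:SI -> source
        sub   cx, si            ; CX = length + 1 (terminator included)
        les   di, [destptr]     ; ES:DI -> destination
        rep   movsb
@@quit:
        pop   ax
        pop   cx
        pop   di
        pop   si
        pop   es
        pop   ds
        ret
endp
```

The parameters can still be read after lds has changed DS because they live on the stack and are addressed through BP, which uses SS by default.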
The stos instruction fills the string pointed to by the ES:DI pair with the value in the accumulator, so it is great for initializing arrays (usually with zeroes). As with the other brothers, it has two variants: stosb and stosw. With stosb, each byte stored at ES:DI is whatever AL contains; with stosw, each word stored comes from AX instead of AL.
Look at the following example:
    :
    les   di, [myarray]
    sub   ax, ax         ; --> AX = 0
    cld
    mov   cx, 100
    rep   stosw
    :
This excerpt initializes 200 bytes of myarray to 0.
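The fill value does not have to be zero, and stos also works without rep. As a small sketch, assuming a hypothetical 41-byte buffer named linebuf and DS = ES, the code below builds a line of 40 dashes ending in '$', the terminator expected by the DOS print-string function (int 21h, AH = 09h).

```asm
; linebuf is a hypothetical 41-byte buffer; DS = ES is assumed.
        mov   di, offset linebuf
        mov   al, '-'
        mov   cx, 40
        cld
        rep   stosb             ; store 40 dashes
        mov   al, '$'
        stosb                   ; single store without rep: the terminator
```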
The lods instruction loads a chunk (either a byte or a word) from the string pointed to by DS:SI into the accumulator. As always, it has two variants: lodsb and lodsw. Unlike the other brothers, lods is almost never combined with the rep prefix. Why? Because we are usually interested in fetching a byte (or a word) at a time and then examining it. If we used rep lodsb or rep lodsw, the value in the accumulator would be overwritten on every repetition, leaving only the last element. Thus, the rep prefix makes little sense here.
The lods instruction can actually be replaced by a normal mov followed by an inc. However, lods is shorter (lodsb is a single one-byte instruction), and I think it is faster too, at least on the older processors. Look at the following example:
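A typical fetch-and-examine loop might look like the sketch below, which converts a zero-terminated string to upper case in place. It assumes mystr is a far-pointer variable and ASCII text; the label names are made up for illustration.

```asm
; Convert the zero-terminated string at mystr to upper case, in place.
        lds   si, [mystr]
        push  ds
        pop   es
        mov   di, si            ; ES:DI = DS:SI, so stosb writes back in place
        cld
@@next:
        lodsb                   ; AL = next character, SI advances
        or    al, al
        jz    @@done            ; zero terminator: stop
        cmp   al, 'a'
        jb    @@store
        cmp   al, 'z'
        ja    @@store
        sub   al, 32            ; 'a'..'z' -> 'A'..'Z'
@@store:
        stosb                   ; write the (possibly changed) character back
        jmp   @@next
@@done:
```

Because each lodsb is followed by exactly one stosb, SI and DI stay in step and every character is written back to the position it was read from.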
    :
    lds   si, [mystr]
    cld
    lodsb                ; now AL contains the first byte pointed by DS:SI
    :
The excerpt above is actually equivalent to:
    :
    lds   si, [mystr]
    mov   al, [si]
    inc   si             ; now AL contains the first byte pointed by DS:SI
    :
The other advantage of lods is that it can go backward or forward depending on the direction flag. The processor takes care of this automatically (which may be handy when you'd like to reverse a string, for example). With the manual approach, you have to keep track of this yourself.
After a blitz introduction to the five brothers, you probably feel a little overwhelmed. Let me summarize it for you:
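As a rough sketch of the string-reversal idea, the loop below reads the source backward with lodsb and writes the destination forward with stosb, flipping the direction flag between the two. The names src, dest and LEN are hypothetical; DS = ES is assumed, LEN must be greater than zero, and the two buffers must not overlap.

```asm
; Copy LEN bytes from src to dest in reverse order.
        mov   si, offset src + LEN - 1  ; last byte of the source
        mov   di, offset dest           ; first byte of the destination
        mov   cx, LEN
@@next:
        std
        lodsb                   ; read backward: SI decreases
        cld
        stosb                   ; write forward: DI increases
        loop  @@next
```

The loop ends with the direction flag cleared, which is the setting most other code expects.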
- String instructions use the DS:SI pair, the ES:DI pair, or both. These register pairs can be set using the lds si and les di instructions.
- movs and cmps need both. scas and stos need only the ES:DI pair. lods needs only the DS:SI pair.
- String instructions use the direction flag to determine the direction of the operation. Clearing the flag with cld makes the operations run forward; setting it with std makes them run backward.
- The string instructions always have two variants: one with the letter 'b' appended and one with the letter 'w' appended, which signify byte-sized and word-sized operation respectively.
- The byte variants work one byte at a time, whereas the word variants work two bytes at a time.
- The string instructions that use the accumulator (either AX or AL) are scas, stos, and lods.
- In the byte variants, the accumulator means AL, whereas in the word variants it means AX.
- All of these instructions are usually prefixed by rep or one of its variants (repe, repne), except lods.
- Every instruction that uses rep or one of its variants needs CX set as the counter.
- After a repeated cmps or scas, the zero flag tells whether a match or a mismatch was found, so it must be checked. The check can be done using the je or jne instructions.
Whew! A pretty long chapter. OK, I think that's all for now. By the way, 80386 processors or better include one more variant, formed by appending the letter 'd'. So we have movsd, and so on. The operation is done 4 bytes at a time, so each repetition handles twice as much data as the w-variant and four times as much as the b-variant. Wonderful, isn't it? If a d-variant instruction needs the accumulator, it means EAX, the extended version of AX. I'll explain more of this in the second lesson.
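As a quick taste, here is a minimal sketch of the d-variants in TASM syntax, assuming DS:SI and ES:DI are already set up as in the earlier examples. P386 is the TASM directive that enables 80386 instructions (MASM uses .386 instead).

```asm
        P386                    ; allow 80386 instructions
; DS:SI and ES:DI are assumed to be set up already.
        cld
        mov   cx, 25
        rep   movsd             ; 25 doublewords = 100 bytes copied

        sub   eax, eax          ; EAX = 0
        mov   cx, 50
        rep   stosd             ; 50 doublewords = 200 bytes set to 0
```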
See you next time.
Roby Joehanes © 2001