Tutorial - Assembly Introduction

Assembly language

(This tutorial is aimed at people who want to use assembly in C++ to speed things up a bit.)

Welcome to this tutorial on assembly language. First, let me explain what assembly language actually is. I hope you are already familiar with some programming language (for example C++ or Visual Basic), because this will make things easier to understand. Anyway, think of assembly as a readable version of machine language. This also means that every instruction in assembly corresponds to one instruction in machine language. It is important to know that the computer only understands binary instructions, in other words, 1's and 0's. You have to keep this in mind to understand a lot of things in every programming language, most importantly when it comes to calculations.
Before looking at the instructions of the assembly language, I have to show you how data is stored in memory. Think of the memory as a street with many houses. Each of these houses can store a certain amount of data, but to access this data, we need the house number first. This may sound easy, but the house numbers aren't written with our usual decimal digits 0-9. They are written with the digits 0-F (0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F). This system is called hexadecimal, and instead of 10 digits it has 16. This might make addressing a bit difficult at first, because we are used to our decimal system. Then there is the binary system, which I mentioned earlier. It only has 2 digits, 0 and 1.
You might be asking yourself why we use hexadecimal instead of decimal. The answer is this: since the computer only understands 0's and 1's, converting a hexadecimal number into binary is a lot easier than converting a decimal number into binary. I will explain why in a minute. First I have to tell you how information is actually stored at an address in memory. The information is stored as a binary number, usually with a range from 00000000 (0) to 11111111 (255). This means each memory "spot" can hold a value between 0 and 255 (which is why 255 is such a common number in the computer world). These values are called integers. Here is a list of the unsigned integer types (unsigned means they have no sign, so they can never be negative):

Storage Type	Range (Low to High)	Highest Value as a Power of 2
Unsigned Byte	0 to 255	2^8 - 1
Unsigned Word	0 to 65,535	2^16 - 1
Unsigned Doubleword	0 to 4,294,967,295	2^32 - 1
Unsigned Quadword	0 to 18,446,744,073,709,551,615	2^64 - 1
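
The ranges in this table can be checked from C++. The following is only a small sketch: the helper name max_unsigned is made up for this example, and it assumes that the fixed-width types from <cstdint> correspond to the byte, word, doubleword and quadword sizes listed above.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: largest value an unsigned integer of `bits` bits can hold (2^bits - 1).
constexpr std::uint64_t max_unsigned(unsigned bits) {
    // Shifting a 64-bit value by 64 is undefined, so that case is handled separately.
    return bits >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << bits) - 1;
}

// Compile-time checks against the limits the compiler reports for each size.
static_assert(max_unsigned(8)  == std::numeric_limits<std::uint8_t>::max(),  "byte");
static_assert(max_unsigned(16) == std::numeric_limits<std::uint16_t>::max(), "word");
static_assert(max_unsigned(32) == std::numeric_limits<std::uint32_t>::max(), "doubleword");
static_assert(max_unsigned(64) == std::numeric_limits<std::uint64_t>::max(), "quadword");
```

If the file compiles, every static_assert held, so max_unsigned(8) is 255 and max_unsigned(16) is 65,535, just like in the table.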

Signed integers use the leftmost bit as a sign: 0 for positive and 1 for negative (for example, 00000010 is a positive 2 and 11111110 is a negative 2). You might be asking yourself why 11111110 is -2. This is because the computer counts like this: ...00000001 (1), 00000000 (0), 11111111 (-1), 11111110 (-2), 11111101 (-3)... This means that 10000000 (-128) is the smallest number and 01111111 (127) is the biggest number a signed byte can hold. To get the negative of a number, invert all of its bits (for example, 00001111 becomes 11110000) and then add 00000001 (for example, 11110000 + 00000001 = 11110001, which is -15). Signed words, DWords and QWords are stored the same way.
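
The "invert the bits, then add 1" rule is easy to try out in C++. This is just a sketch for a single byte. The function names negate_bits and as_signed are invented for this example and are not part of any library.

```cpp
#include <cstdint>

// Negate an 8-bit value the way the text describes: flip every bit, then add 1.
constexpr std::uint8_t negate_bits(std::uint8_t value) {
    return static_cast<std::uint8_t>(static_cast<std::uint8_t>(~value) + 1u);
}

// Read an 8-bit pattern as a signed number: patterns with the top bit set are negative.
constexpr int as_signed(std::uint8_t bits) {
    return bits < 128 ? bits : static_cast<int>(bits) - 256;
}
```

For example, negate_bits(0x02) gives 0xFE (11111110), and as_signed(0xFE) gives -2.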
It is important to know that each memory "spot" can only store one byte. This means that when storing a word you are actually storing 2 bytes, when storing a DWord you are actually storing 4 bytes, and a QWord is 8 bytes. Earlier I mentioned that converting a hexadecimal number to binary is easier than converting a decimal number. There is a simple reason for this: 16 is a power of 2 and 10 is not. (I doubt this has helped you yet.) Since 16 = 2^4, every hexadecimal digit stands for exactly 4 bits, and since a byte has 8 bits (2^8 = 256 possible values), exactly 2 hexadecimal digits fit into it (16 x 16 = 256). With 2 decimal digits we could only write the values 0 to 99, so we would actually be losing space.
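
To see that one hexadecimal digit really covers 4 bits, here is a small sketch that prints a byte both ways. The function names byte_to_hex and byte_to_binary are made up for this example.

```cpp
#include <cstdint>
#include <string>

// Render one byte as two hexadecimal digits: each digit covers exactly 4 bits.
std::string byte_to_hex(std::uint8_t value) {
    const char digits[] = "0123456789ABCDEF";
    std::string out;
    out += digits[value >> 4];    // upper 4 bits -> first hex digit
    out += digits[value & 0x0F];  // lower 4 bits -> second hex digit
    return out;
}

// Render one byte as eight binary digits, most significant bit first.
std::string byte_to_binary(std::uint8_t value) {
    std::string out;
    for (int bit = 7; bit >= 0; --bit) {
        out += ((value >> bit) & 1) ? '1' : '0';
    }
    return out;
}
```

For the byte 0xA5, byte_to_hex gives "A5" and byte_to_binary gives "10100101". The digit A is 1010 and the digit 5 is 0101, so each hex digit can be translated on its own.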
Now let's get back to the "street" ;). I mentioned that every house has a number, or as we would call it now, an address. These addresses have a form like this: FFFF:FFFF. The 4 digits in front of the ":" are called the segment, and the other 4 are called the offset within this segment. A segment is a certain part of the memory, and every offset inside it is one of the "spots" I have been referring to. At each offset we can store one byte, and the offsets in one segment run from 0000 to FFFF. These amounts can vary, but the idea is the same.
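
The text above does not say how a segment and an offset are combined into one address. As an assumption for this sketch, take 16-bit real-mode x86, where the segment is multiplied by 16 and the offset is added. The names SegmentedAddress and to_linear are invented for this example, and protected-mode systems handle segments differently.

```cpp
#include <cstdint>

// Hypothetical pair of the two 4-digit hexadecimal parts of an address like FFFF:FFFF.
struct SegmentedAddress {
    std::uint16_t segment;
    std::uint16_t offset;
};

// Assumption: real-mode rule, linear address = segment * 16 + offset.
constexpr std::uint32_t to_linear(SegmentedAddress address) {
    return (static_cast<std::uint32_t>(address.segment) << 4) + address.offset;
}
```

Under that assumption, 1234:0010 ends up at linear address 12350 (hexadecimal).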
Now I think you know how the computer stores data in memory. This will help us with the next topic: registers. There are a lot of registers. A register is something like a variable, except that it can be accessed faster, and registers are also used to hold the results of instructions. For example, when multiplying 2 numbers, the result is stored in a register (always in the same one). It is advisable not to change the contents of some registers, because this might result in a crash. Like variables, registers can store a certain amount of data. There are general-purpose registers, which are primarily used for arithmetic and data movement. Each of these registers can be addressed as either a single 32-bit value or a 16-bit value, and some 16-bit registers can be addressed as 2 8-bit values. Take the EAX register, for example, which is 32 bits wide. Its lower 16 bits are named AX, and AX consists of an upper 8 bits (AH) and a lower 8 bits (AL). When talking about bits, this is what is meant: 8-bit means that the storage capacity is one byte, or 2^8 different values, and so on. Here is a list of the most used registers.

Register Name	Purpose
EAX	Automatically used for multiplications and divisions. Often called the extended accumulator.
ECX	Automatically used as the counter for loops (the LOOP instruction decreases ECX until it reaches 0).
ESP	Used to address memory on the stack (a system memory structure). It shouldn't be changed. Also known as the extended stack pointer.
ESI, EDI	Used by high-speed memory transfer instructions. Also called the extended source index and extended destination index.
EBP	Used by high-level languages to reference function parameters and local variables. Should only be used if you know what you are doing. Also known as the extended frame pointer.
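
The way EAX, AX, AH and AL overlap can be imitated in plain C++ with shifts and masks. This is only a model of the idea, using an ordinary 32-bit number in place of the real register. The function names are made up for this example.

```cpp
#include <cstdint>

// AX is the lower 16 bits of EAX.
constexpr std::uint16_t ax_of(std::uint32_t eax) {
    return static_cast<std::uint16_t>(eax & 0xFFFFu);
}

// AH is the upper 8 bits of AX.
constexpr std::uint8_t ah_of(std::uint32_t eax) {
    return static_cast<std::uint8_t>((eax >> 8) & 0xFFu);
}

// AL is the lower 8 bits of AX.
constexpr std::uint8_t al_of(std::uint32_t eax) {
    return static_cast<std::uint8_t>(eax & 0xFFu);
}

// Writing AL replaces only the lowest 8 bits; the rest of EAX stays as it was.
constexpr std::uint32_t with_al(std::uint32_t eax, std::uint8_t al) {
    return (eax & 0xFFFFFF00u) | al;
}
```

With EAX = 12345678 (hexadecimal), AX is 5678, AH is 56 and AL is 78. Writing FF into AL turns EAX into 123456FF.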

This is more or less all the "need-to-know" stuff before starting to learn assembly, and it is vital that you understand it a bit. ;)
One thing you shouldn't do is write a whole program in pure assembly. It might be fast and stuff, but it is extremely hard to debug. That's why I will show you how to use assembly in Visual C++ (the syntax might vary with other compilers). This is the reason why I hoped you know some C++. Anyway, to use assembly in C++, just type
__asm
{
Statements
}
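
Here is a small sketch of such a block inside a real function. The function name add_values is made up for this example. It assumes a 32-bit x86 build with Visual C++, because Visual C++ does not accept __asm blocks when building for x64. On every other compiler the preprocessor picks the plain C++ line instead, so the sketch still builds.

```cpp
// Hypothetical example: adds two ints, using inline assembly where it is available.
int add_values(int a, int b) {
    int result = 0;
#if defined(_MSC_VER) && defined(_M_IX86)
    __asm
    {
        mov eax, a        // load the C++ variable a into EAX
        add eax, b        // EAX = EAX + b
        mov result, eax   // store EAX back into the C++ variable result
    }
#else
    result = a + b;       // portable fallback with the same effect
#endif
    return result;
}
```

Calling add_values(2, 3) returns 5 on either path.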

You can use all the variables you declared in C++ inside the assembly block and manipulate them. They keep the same names. Anyway, here are some basic instructions:

Instruction	Description	Usage	Example
Mov	Copies a value into a certain location. Destination and source can be registers, variables or (for the source) a constant, but they are not allowed to both be variables (memory locations) at the same time.	Mov [Destination], [Source]	Mov Eax, value1
Inc	Increases the variable/register by 1.	Inc [Reg/Var]	Inc EAX
Dec	Decreases the variable/register by 1.	Dec [Reg/Var]	Dec AX
Add	Source and destination are added together and the result is stored in the destination. They cannot both be variables (memory locations).	Add [Destination], [Source]	Add Eax, 2
Sub	Source is subtracted from the destination and the result is stored in the destination. They cannot both be variables (memory locations).	Sub [Destination], [Source]	Sub Var1, Eax
Mul	Multiplies AL, AX or EAX (depending on the operand size) by the operand. The result is stored in AX, DX:AX or EDX:EAX. The operand cannot be a constant.	Mul [Var/Reg]	Mul Var1
Div	Divides AX, DX:AX or EDX:EAX (depending on the operand size) by the operand. The quotient is stored in AL, AX or EAX and the remainder in AH, DX or EDX. The operand cannot be a constant.	Div [Var/Reg]	Div Var1
Push	Stores a value onto the stack, last in first out. 8-bit registers such as AL cannot be pushed.	Push [Value/Var/Reg]	Push EAX
Pop	Retrieves the last pushed value.	Pop [Var/Reg]	Pop EAX
Pusha	Pushes all general-purpose registers onto the stack.	Pusha	Pusha
Popa	Pops all general-purpose registers that Pusha pushed.	Popa	Popa
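
Mul, Div, Push and Pop do part of their work in registers or on the stack without you naming them. The sketch below imitates the 32-bit forms in plain C++ so you can see where the results end up. It is only a model: the names Regs, mul32, div32 and push_then_pop are invented for this example, and an ordinary std::vector stands in for the real stack.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the two registers that 32-bit Mul and Div use implicitly.
struct Regs {
    std::uint32_t eax;
    std::uint32_t edx;
};

// Mul operand: EDX:EAX = EAX * operand (the 64-bit product spans two registers).
Regs mul32(Regs r, std::uint32_t operand) {
    const std::uint64_t product = static_cast<std::uint64_t>(r.eax) * operand;
    r.eax = static_cast<std::uint32_t>(product);        // low 32 bits
    r.edx = static_cast<std::uint32_t>(product >> 32);  // high 32 bits
    return r;
}

// Div operand: EAX = EDX:EAX / operand, EDX = EDX:EAX % operand.
// The real instruction faults on a zero divisor or a quotient that does not fit
// into EAX; this sketch simply expects the caller to avoid both.
Regs div32(Regs r, std::uint32_t operand) {
    const std::uint64_t dividend = (static_cast<std::uint64_t>(r.edx) << 32) | r.eax;
    r.eax = static_cast<std::uint32_t>(dividend / operand);  // quotient
    r.edx = static_cast<std::uint32_t>(dividend % operand);  // remainder
    return r;
}

// Push every value in order, then Pop until the stack is empty.
// The result lists the values in the order they come back off the stack.
std::vector<std::uint32_t> push_then_pop(const std::vector<std::uint32_t>& values) {
    std::vector<std::uint32_t> stack;
    for (std::uint32_t value : values) {
        stack.push_back(value);          // Push
    }
    std::vector<std::uint32_t> popped;
    while (!stack.empty()) {
        popped.push_back(stack.back());  // Pop returns the most recent Push
        stack.pop_back();
    }
    return popped;
}
```

Dividing 7 by 2 with div32 leaves the quotient 3 in eax and the remainder 1 in edx. Pushing 1, 2, 3 and popping three times gives back 3, 2, 1.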

These are all the basics you need to know to start programming in assembly language, although all this is nothing compared to the real power of assembly. With assembly you can do all sorts of stuff, like directly accessing hardware with interrupts, or accessing the hard drive at high speed.

Well, as you noticed, this part of the tutorial doesn't actually explain how the statements work, but that is the aim of the second tutorial, which isn't totally finished yet.