Memory       machine     assembly        comments
location     language    language
(HEX)
21AA:0100    B409        MOV AH,09       ;display string of characters
21AA:0102    BA1701      MOV DX,0117     ;point to string
21AA:0105    CD21        INT 21          ;do it
21AA:0107    B401        MOV AH,01       ;keyboard input function
21AA:0109    CD21        INT 21          ;do it
21AA:010B    B44C        MOV AH,4C       ;exit function
21AA:010D    2C30        SUB AL,30       ;convert to number
21AA:010F    7EF6        JLE 0107        ;jump to 107 if < "1"
21AA:0111    3C09        CMP AL,09       ;compare to "9"
21AA:0113    7FF2        JG 0107         ;jump to 107 if greater
21AA:0115    CD21        INT 21          ;do exit

Memory dump
Location     HEX                                                Ascii
21AA:0100    B4 09 BA 17 01 CD 21 B4-01 CD 21 B4 4C 2C 30 7E    ......!...!.L,0~
21AA:0110    F6 3C 09 7F F2 CD 21 45-6E 74 65 72 20 6F 70 74    .<....!Enter opt
21AA:0120    69 6F 6E 3A 20 24 C1 C4-B4 00 B3 20 20 20 20 20    ion: $.....

Above is an example of an assembly language program to get a keypress from 1 to 9 from the keyboard. The program is written by the programmer as shown under the columns "assembly language" and "comments". He then has an assembler read and assemble the program into machine language. An assembler is a computer program which can interpret an assembly language file and convert it to machine language complete with memory locations. Usually the assembly language program would be named something like OPT1TO9.ASM and the output from the assembler would be a file like OPT1TO9.COM or OPT1TO9.EXE. When the program is loaded into memory, the data in memory would be as shown in the table labeled "Memory dump" above.

The memory location 21AA:0100 is actually determined by multiplying the number 21AA (HEX) by 16 (decimal) and adding 100 (HEX). That is because the chip used in IBM-PC compatible computers loads programs at the beginning of 16-byte paragraphs of memory. Memory locations are referred to as the offset in bytes from the beginning of the paragraph.
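
As a rough illustration of the address arithmetic just described, here is a minimal sketch in Python. Python is not part of the toolchain discussed in this text, and the function names physical_address and parse_address are made up for the example. It only performs the multiply-by-16-and-add step for an address written the way the listing shows it.

```python
def physical_address(segment, offset):
    """Combine a segment (paragraph number) and an offset into one address."""
    # Each paragraph is 16 bytes, so the segment is multiplied by 16
    # before the offset is added.
    return segment * 16 + offset


def parse_address(text):
    """Split a 'SSSS:OOOO' string (both parts in HEX) into two integers."""
    segment_text, offset_text = text.split(":")
    return int(segment_text, 16), int(offset_text, 16)


segment, offset = parse_address("21AA:0100")
print(format(physical_address(segment, offset), "05X"))  # prints 21BA0
```

So the first byte of the program, shown at 21AA:0100, sits 21BA0 (HEX) bytes from the start of memory.
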
Programs generally begin at offset 100H because the first 100H bytes (256 in decimal) are reserved for information needed by the computer and are called the Program Segment Prefix (PSP). The assembler actually only determines the offset memory locations; the segment (paragraph) location is determined by the DOS (Disk Operating System) as the program is loaded from disk.

To write an assembly language program the programmer must know the mnemonics for the particular chip and assembler he is using. Mnemonics are the language understood by the assembler. For example, the mnemonic "MOV AH,09" means to move the hexadecimal number 9 into the AH register of the chip. The comments in a program are for the programmer's own use and are ignored by the assembler.

BIOS, DOS, and Interrupts

Because computers manufactured by different companies and the components they are made from may vary, each PC compatible has a series of machine language mini programs built into its ROM (Read Only Memory). These programs, actually subroutines, are called the BIOS (Basic Input/Output System). They handle all interaction between the ALU and the I/O devices built into the system. These routines are used by setting certain memory registers in the ALU chip and calling an interrupt. The memory registers include AX, BX, CX, and DX as well as others. Each memory register is a set of two byte-wide locations within the ALU chip. The AX register, for example, is made up of the AL (low order) byte and the AH (high order) byte.

There can be as many as 256 interrupts available to carry out various functions. For instance, interrupt 10H (16 decimal) provides functions to control the video display, interrupt 16H provides keyboard access functions, and interrupt 21H provides a plethora of MS-DOS functions. AH is set with the function number before calling the interrupt. For instance, function 9 in interrupt 21H is a function to display a string of characters on the video.
If this function is called, it is assumed that the DX register will point to the memory location where the string of characters begins and that the string will be terminated by a 24H character, which is a $. This is illustrated in the first three instructions in the assembly language program above.

The disk operating system is the computer program which is loaded into memory upon initial startup of the system and which remains in memory, at least in part, during the entire time the system is powered up. It provides access through the BIOS to a number of commands to load programs from disk, execute programs, and provide various disk management and other I/O functions. The DOS internal commands include:

DIR      CLS      TYPE     COPY     COPY +   DATE     TIME     DEL
ERASE    REN      VER      VOL      RMDIR    CHDIR    MKDIR    CTTY
PATH     PROMPT   SET      VERIFY   EXIT     CALL     ECHO     FOR
GOTO     IF       PAUSE    REM      SHIFT

Besides these, the DOS can execute any other EXE, COM, or BAT program if the program's name is entered at the DOS prompt.

Higher Level Languages

Although assembly language provides the most speed and the most efficient use of memory of any programming language, it is also the most difficult in which to program. This is because it is the most primitive language next to the manual system of simply calculating the bytes to put into memory for a program and setting the memory locations. For ease of programming a number of higher level languages have been developed over the years. These include BASIC, C, and Pascal. Although higher level languages are easier to understand and use, they result in larger and slower programs than does assembly language. The most efficient higher level language in these terms is C, but it is also more difficult to use than some other languages. Let us discuss BASIC as a commonly available example of higher level languages.
Source code written in the BASIC language for the above program might look like this:

10 PRINT "Enter option: ";
20 INPUT X$
30 IF X$<"1" OR X$>"9" THEN GOTO 20
40 END

This very small program takes up 74 bytes in interpreted BASIC, but only 38 bytes in machine language, and will operate much faster in machine language. Since BASIC source code can either be interpreted or be compiled into machine language, this program could be used in either of two ways. If it were compiled by a BASIC compiler the output would be a machine language EXE program which would take more memory space, 2400 bytes, but would run much faster than if interpreted. To use an interpreter, the BASIC language interpreter would have to be loaded into memory as a program to be run. It would then load, interpret, and execute the above program from the source code.

Another type of program which can be executed by the DOS is a batch program. This is simply a series of DOS internal commands and EXE, COM, or BAT files which can be listed and run in sequence. An example batch program would be:

@ECHO OFF
SCREEN cyan black
ECHO ╒════════╡ CHOOSE A PROGRAM ╞══════════════╕
ECHO │                                          │
ECHO │   [1] - Interest calculations            │
ECHO │   [2] - Play Chess                       │
ECHO │   [3] - Return to DOS                    │
ECHO │                                          │
ECHO ╘══════════════════════════════════════════╛
OPT1TO9
ECHO Key Accepted - One moment please ...
IF ERRORLEVEL 3 GOTO END
IF ERRORLEVEL 2 GOTO B
IF ERRORLEVEL 1 GOTO A
:A
BASIC INTEREST
GOTO END
:B
CHESS
GOTO END
:END

In the MS-DOS world some of the extensions at the end of file names have standardized meanings as follows:

EXE - An executable program which can be run from DOS.
COM - A command program which can be run from DOS.
BAT - A batch file of DOS commands which can be executed in sequence from DOS.
BAS - A BASIC program which can be compiled or interpreted.
ASM - An assembly language program which can be assembled.
C   - Source code for a C program which can be compiled.
PAS - Source code for a Pascal program which can be compiled.
TXT - An ASCII file.
DOC - An ASCII file.

So all that is necessary to make a computer do whatever you want is to know the syntax of one or several of the above languages and to plan, write, and debug the program.
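
As an aid to following the menu batch program shown earlier, here is a small sketch in Python of how its IF ERRORLEVEL tests select a label. Python is used only for illustration, and the function names are hypothetical. The sketch assumes that OPT1TO9 ends with the converted key value (the key code minus 30 HEX) in AL, as in the assembly listing, and that this value becomes the exit code which the batch file tests. It relies on the DOS rule that IF ERRORLEVEL n is true when the exit code is n or greater, which is why the tests are written from the highest number down.

```python
def errorlevel_at_least(exit_code, level):
    """Mimic the batch test IF ERRORLEVEL level (true when exit_code >= level)."""
    return exit_code >= level


def menu_branch(exit_code):
    """Return the label the menu batch program reaches for a given exit code."""
    if errorlevel_at_least(exit_code, 3):
        return "END"
    if errorlevel_at_least(exit_code, 2):
        return "B"
    if errorlevel_at_least(exit_code, 1):
        return "A"
    # No test matched, so execution falls through to the next line, which is :A.
    return "A"


for key in "1239":
    exit_code = ord(key) - 0x30  # same conversion as SUB AL,30
    print(key, menu_branch(exit_code))
# prints:
# 1 A
# 2 B
# 3 END
# 9 END
```

Note that any key from 3 through 9 ends up at :END, because the first test is satisfied by every exit code of 3 or more.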