The Implementation Of A Special Language Interpreter

(A Thesis Presented to The Faculty of the Department of Applied Science The College of William and Mary in Virginia - August 1975)
by James Thomas Lee, Jr. 07/05/97

II. THE INTERPRETER IN DESIGN {5 words}

A. The First Pass {4,215 words}

Although the Interpreter involves execution of each instruction as read, some instructions may require branching to another part of the program, or it may involve the execution of a programmed logical loop. If these types of programming options are to be available to the user, then the Interpreter must have certain vital information about the user's program for ready reference. Such information might include the address of a statement which is to be branched to or the location of pertinent input data. To make ready this information for the Interpreter, a two-pass technique is employed. The Interpreter, in the first pass, will read and store the user's source program, will set up tables of names and address pointers, and will maintain significant counters. As a result, when a branch to another part of the program is requested, the address of the next instruction will be readily available without requiring a clumsy and time-consuming program search. In addition, when a variable is used, its location, data type, and current value will be on hand and easily accessible. Paying close attention to the construction of the tables in the first pass will result in a smoother flowing execution phase in the second pass.

Tables to be maintained and developed in the first pass are a source text listing table, a variable reference table, a table of statement numbers, a table of the first character addresses of each instruction, a table of character string lengths, and a table of data values (see Figure 2). Each table and a description of how it is to be maintained will follow.

1. PERM (Permanent Program)--This table will store the source text of the user's program. However, the text is not stored exactly as read. From the very beginning, the Interpreter is trying to set a level of simplicity which will result in a systematic and efficient program execution. To explain, in most higher-level languages, the instruction set is composed of meaningful words which are easy for the user to remember when writing his program, but which are often of. varying character string length and not easy for the Interpreter to recognize during its execution attempt. In this Interpreter, the problem is solved in the first pass and never revisited. First, a routine called PREPROC reviews each instruction to identify the key words, or reserved words. As in ALGOL, the Interpreter responds to a reserved word list. Briefly, they are TERMINATE, REAL, INTEGER, STRING, BOOLEAN, FILE, SCRATCH, CARD, ENTER, DATA, WRITE, MAT, GOTO, ABS, IF, THEN, FOR, TO, WHILE, BACK, PUNCH, SIN, LN, FACT, COS, ASIN, ACOS, ATAN, TAN EXP, LOG, AND, OR, EOR, CNVRT, and SQRT. Observation will reveal that none of the above words begin with the same first.two characters [2]. Therefore, after PREPROC has found all the reserved words on a line, it rewrites the line and replaces the reserved word with the two-letter code preceded by a single quote. The single quote is included to alert the Interpreter that a reserved word follows and not just a two-digit variable name. Of note, the single quote should never appear to the Interpreter at any other time except in a character string; however any character string will be enclosed between double quotes and protected from a possible misinterpretation by the Interpreter. In addition to establishing a simple instruction recognition procedure, the above method of program storage can help reduce the amount of core required for program storage. Final preparation for storage is to add a semicolon to end the instruction. This actually has a two-fold purpose. First, the user can extend his statement from one card to another without concern. In many higher level languages, like FORTRAN, the user must put a special character in a particular column on his card to have multi-card instructions, and this can be bothersome. Secondly, the semicolon assists the Interpreter in locating the end of an instruction. Of course, the Interpreter can continue to read the user's program until the next instruction is found or until the Interpreter thinks it has found the next instruction, but this would require more extensive and really unnecessary programming. Also, as in the case with the single quote, the semicolon should not in any way cause horrendous problems for the Interpreter by way of misinterpretation. Now that the instruction has been prepared for storage, a few examples follow of the instruction as the Interpreter sees it:


			INPUT			IN STORAGE
			REAL X, Y, Z		'REX, Y, Z;
			A=(B+TAN(X))+75		A=(B+'TA(X))+75;
			TERMINATE		'TE;

As stated above, the problem of instruction recognition is resolved in the first pass. This is possible through a special routine, DETRMN (Determine Instruction Type), which has been developed as a follow-up to the above program preparation. Thus, whenever a single quote is found, the next two characters are read, and a call is made to DETRMN. When the instruction type has been determined, the routine returns the first word address of the routine which will execute the instruction.

---------------------------------------------------------------------------------------------------------------------------

Figure 2. The First Pass of the Interpreter

---------------------------------------------------------------------------------------------------------------------------


					START
					  |
				CARD CONTAINING INSTRUCTION IS READ
				AND STORED IN INST
					  |
				MODIFIED TEXT STORED IN IPERM
					  |
				CHECK FOR STATEMENT NUMBER.
				IF FOUND, MAKE TABLES ENTRY
					  |
				CHECK STATEMENT TYPE
					  |
DATA	--------------------------------OTHER---------------------------DECLARATION
STATEMENT								STATEMENT
	|								|
EVALUATE AND CONVERT 							FOR DECLARED VARIABLES.
ENTRY TO PROPER-DATA 							MAKE TABLES ENTRY.
TYPE AND STORE IN							|
DATATAB									IF STRING, MAKE
	|								CHARACTER LENGTH 
	|								ENTRY IN STLGTAB
	|_______________________________________________________________|
					  |
				IF MORE CARDS, GO TO START.
				OTHERWISE, CONTINUE BELOW.
					  |
				CHECK FOR DATA DUMP OPTION.
				IF REQUESTED, DUMP THE TABLES
				DEVELOPED IN THE FIRST PASS.
					  |
				CHECK LAST CARD FOR TERMINATE.
				IF NOT FOUND, ABORT.
					  |
				SET UP PI AND STAR AS
				DECLARED VARIABLES.
					  |
				CONTINUE TO THE SECOND PASS

---------------------------------------------------------------------------------------------------------------------------

2. TABLES--When the Interpreter has knowledge of the statement type being used, the next problem to be encountered is the variable name. When the Interpreter reads a variable name in the user's program, a number of questions are asked. First of all, was the variable properly declared by a Declaration Statement (Declaration Statements will be discussed below)? If not, then the Interpreter does not recognize it, and no analysis or calculation can take place. A second question asked by the Interpreter is, What is the variable's data type? This is important because if the type is string variable, then the Interpreter is dealing with variable length constants. If the type is real number, then the Interpreter must be ready to deal with floating point notation; that is, to make floating point conversions and calculations. If the data type is boolean, then the Interpreter must use a special routine for boolean expressions (BOOLSTOR). Only if the type is integer does the Interpreter do nothing special to prepare. Another question to be answered is, Are arrays being used? If they are, the Interpreter is then tasked with determining the displacement between the first word of the array in storage and the desired element of the array. When you consider the variable word lengths used by character strings and real numbers, then you can begin to imagine the difficulty that the Interpreter must face. A final problem or question, as if the Interpreter needs any more, is, Where can the data variable's current value be found in storage? To provide the Interpreter with the answers to these questions is a table called TABLES; however, before any entries can be made, the Interpreter must find one of the following Declaration Statements:

  • (a) REAL X, Y, Z,...

    This instruction enables the user to specify a variable whose value will be a real number; that is, it will contain a decimal point. In addition, large numbers using scientific notation are to be expressed and declared as real numbers.

  • (b) INTEGER X, Y, Z,. . .
  • This instruction enables the user to specify a variable whose value is an integer; that is, it does not contain a decimal point. As a word of precaution though, the Interpreter only reserves one word of core for an integer, so the maximum value allowed in storage is 77717777 (octal) or 16777215 (decimal). However this value is further reduced by the WRITE routine-which uses the left-most BCD character as a sign bit. Therefore, the restriction imposed upon an integer X is that it must be in the range -9999999 (decimal) < X < 9999999 (decimal). Any integer not within the above range must be declared as a real number.

  • (c) STRING (N) X
  • This instruction allows the user to declare character strings. Since variable length character strings (up to 150 characters) are permitted, the N in the syntactic above is to indicate the number of characters in the string. If N is left blank, the character string length will be assumed to be four (4) characters. Of note, a String Declaration Statement only allows one string variable to be declared per statement.

  • (d) BOOLEAN X, Y, Z,...
  • This instruction enables the user to specify his logical variables. These variables will always assume the values of either true or false. On input, the user may enter his logical values as "F," "T," "FALSE," or "TRUE," and the Interpreter will always print on output "TRUE" or "FALSE," whichever is applicable.

    NOTE: To avoid repetition with each statement above, comment on 'X, Y, Z,...' has been saved until now. Any number of variables, so long as the overall instruction length does not exceed 160 characters, are allowed in an instruction. Each variable name may be up to four (4) characters in length, and any variable name may be used as an array name providing standard array notation has been used; that is, ARA (NI, N2, N3,...) up to six dimensions.

    When the Declaration Statement has been discovered, the Interpreter begins a review of the line to retain certain significant data. First, it looks at the variable name and compares it with the current list of variables already declared to ensure that the name has not been previously used, and then, if this check is satisfied, the new name is stored in TABLES. Next, since the data type can be taken directly from the instruction type, the Interpreter stores the type in TABLES. This information is stored in a two-digit notation (IN--Integer, RE--Real, ST--String, BO--Boolean). If an array is being declared, the Interpreter stores the relative character address of the array's dimensioning specifications in PERM for later use by a routine which will locate the absolute storage location of a given array element (ARRAYLOC--Array Element Locator). This relative character address is stored along with the data type because it can be stored in the other two character positions of the variable's value in the data tables. A pointer to the data tables is kept, and as new variables and arrays are declared, the pointer increments through the data table to the next available storage location. Before the pointer is incremented, though its value is stored in TABLES as the storage location of the variables declared. The actual format of entries into TABLES is as follows:

    XXXX PPTT AAAA

    where XXXX is the variable name, PP is a pointer to the user's program in PERM if the entry is an array declaration, TT is the two-digit indicator, and AAAA is the relative displacement of the variable's storage location from the first word of the data tables.

    If the variable is a string variable, then the Interpreter must store some additional data. As stated earlier, a string character may be up to 150 characters in length which requires special preparation for storage. So that the Interpreter will not become confused with varying string lengths, a special table, STLGTAB (String Length Table), is kept, and the table entry will be in a two-word format as follows:

    XXXX OLLL

    where XXXX is the string variable name and LLL is the field length.

    Another situation requiring special consideration is the handling of statement numbers. A user will normally use statement numbers to mark entry points into his program. Eventually these statement numbers will become operands of conditional and unconditional branch statements; thus, they will play a large part in the execution of the user's program. When the Interpreter encounters these statement numbers and the operand of a branching instruction, it must be able to determine if the statement number has been used as an entry point, and if it hasl then where is that entry point. The questions asked here are similar to those asked when the Interpreter found variable names. Thus, the storage of the information is closely related. For this reason, when the Interpreter reads one of these entry points in the first pass, an entry is made into TABLES, and that entry is in the following format:

    NNNN bbbb AAAA

    where NNNN represents the statement number (of note, the statement number cannot exceed 999 (decimal). This restriction exists because the Interpreter expects a colon to follow the statement number in the user's program, and the colon is stored as part of the number to serve as a delimiter), the bbbb indicates blanks because a statement number has no datatype and the space is not needed (of note, the word of core is sacrificed here to maintain the consistency of the table structure), and the AAAA is the relative address of the statement number in the uset's program as stored in PERM.

    In discussions of variables and statement numbers, concern has been expressed about knowing that the variable or statement number had not been previously declared. The error is obvious; if a variable or statement number has been declared two or more times, the Interpreter will.not know which storage location to store a value, or it may not know which entry point to go to on a branching operation. To check each Declaration Statement for duplicate variables and each statement number for repetition is a special routine called TABSCAN (Table Scan). If the routine finds the error, a message is sent to the user (see Section IV, para 23). Although TABSCAN is instrumental in determining duplicate variables and repeated statement numbers, it is also useful to retrieve the specific information which was retained on the variables and statement numbers. The routine makes a TABLES search to locate the name or number and return with the stored information of that name or number.

    In the discussion above, note was made of the different problems which the Interpreter must face and resolve for successful processing of variable names and statement numbers. With the specially constructed TABLES and the routine TABSCAN, the task of the Interpreter in this area is no longer so difficult.

    3. STMTTAB (Statement Table)--This table contains the relative address as stored in PERM of each instruction of the user's program, except the first, which the Interpreter already knows to be zero. This table is significant during the execution cycle, because it directs the Interpreter through each instruction of the program.

    4. DATATAB (Data Tables)--This table holds the current value for each data variable which is declared in the user's program. Entries are made either through a DATA Statement, which is processed in the first pass, or by an Assignment Statement, which is processed in the second pass and will be discussed as part of the second pass discussion (see Section IIB, para 2). The DATA Statement is of the form:

    DATA NAME, VALUE, NAME, VALUE,...

    where NAME is the variable name and it can be an array, and VALUE is the value assigned to the variable (for character strings, including Boolean values, VALUE must be enclosed in double quotes).

    When the Interpreter finds a DATA Statement, the first task is to determine the variable used, if it exists, and if it does, then where does it live? All these questions can be answered using TABSCAN. Use of TABSCAN will return the data type and the storage location of the variable, but if the routine is unable to find the variable, then an error is printed (see Section IV, para 5).

    Once the Interpreter knows the type of data that it is working with, it can make the necessary conversions to get the data stored. Since the storage area has already been assigned by the Interpreter, according to data type, the Interpreter is concerned only with the proper conversion and storage, which is no problem; thus, with the data type returned by TABSCAN, the Interpreter chooses one of the following conversion techniques:

  • (a) Boolean Variables--Boolean variables take on values of either TRUE or FALSE. These values are entered into the computer in BCD format, and stored in BCD format, so no conversion is necessary. They are stored as read.
  • (b) String Character Variables--Integers are read into the computer in BCD format, but since the computer performs calculations in binary arithmetic, it will be necessary to convert the integer BCD value to an octal (binary) number. The Interpreter has two routines which perform the desired conversion. The first, BCDOCT, is designed to handle small integers known to be fewer than four digit. This routine is simple and efficient for quick conversions. For data input, though, it is not known whether the value will be large or small, so the Interpreter assumes that it will be larger than four digits. For the large BCD-to-octal conversions, the routine BCDOCTL will convert numbers up to 12 digits in length. The limitation on BCDOCTL might warn the user that if he expects an integer larger than 12 digits, he should declare it a real number and use scientific notation for input to avoid an error.
  • BCDOCTL converts large and small BCD numbers to octal in three successive calls to BCDOCT. Initially, the BCD integer is stored in a three-word storage area, NUMA, left justified and zero filled to the-end of the storage area, and a counter is kept for the number of BCD digits in the number (call it n). On the first call to BCDOCT, the first four characters of NUMA from the left are converted to octal, multiplied by 100,000,000 and stored in a temporary storage area. On the next call, the second four characters are converted by BCDOCT, multiplied by 100,000 and added to the above number in the temporary storage area. On the last call, the last four numbers of NUMA are converted and added to the sum in the storage area. All of this multiplication and addition has created a very large number. To bring it back into the correct value, the number is divided by 10** (12-n). This division will yield the final solution ready to be stored.

  • (c) Real Number Variables--Real numbers are exactly like the BCD integers discussed above except that they normally have a decimal point. It is not necessary for a floating point number to have a decimal point, but the notation assumes it. Essentially, floating point notation is a scientific notation for binary numbers. Instead of measuring the number as a power of ten, floating point expresses the decimal number as its binary equivalent value raised to a power of two. The routine which performs the conversion is BCDFLPT. First, the number is converted as though it did not contain a decimal, and then it is divided by the power of ten which brings the decimal back in place. To understand better how this works, recall the 12-digit storage area, NUMA, which was used for BCDOCTL. As then, the number is stored in NUMA, and a counter keeps the number of BCD digits in the number. However, another counter is required for BCDFLPT which counts the number of digits to the right of the decimal point (of note though, the decimal point is not stored in NUMA). When the number is in NUMA, it can be converted to octal just as before, and then BCDFLPT converts this simple integer to its floating point integer equivalent. Finally the floating point value is'divided by ten raised to the power m, where m is the number of digits to the right of the decimal. The division is floating point division; hence, the answer is in floating point notation and ready to be stored (see Figure 3).
  • ---------------------------------------------------------------------------------------------------------------------------

    Figure 3. Above is an example of the conversion routines, BCDOCTL and BCDFLPT, in operation. Assume the value 104.5312500 is the object for floating point conversion. For the conversion back to BCD, see Figure 6.

    ---------------------------------------------------------------------------------------------------------------------------

    Step 1.  The Interpreter first reads the value into a buffet, NUMA, ignoring the 
                decimal but counting the digits.  After being read, the buffer and counters
                will contain the following:
    
    		NUMA = 1045 3125 0000  with
    			NUMCOUNT = 0012	(octal) count for total digits.
    			DECOUNT  = 0007	(octal) count of digits to the right of
    						the decimal.
    
    
    Step 2.  Next, through three successive calls to BCDOCT, the value in NUMA is converted
                to octal (ignoring the decimal).  Then the number is further reduced by 
                division.
    
    		1.  First call: 1045 (decimal)			2. Second call: 3125 (decimal)
    		    converted to 2025 (octal).            	   converted to 6064 (octal).
    		    575360400 (octal) for 100000000          	   23420 (octal) for 10000
    	 	       x 2025 					  x 6064
                    -------------				       ---------	
          		1412453672400 (octal)			       167127500 (octal)
    
    Note:  The third call to BCDOCT returns the value zero.  The two values are then added 
              together.
    
    				1412453672400 (octal)
    				  + 167127500 (octal)
                       	        -------------
    				1412643022100 (octal)
    
    
    Step 3. When the calls have been completed to BCDOCT, control returns to BCDOCTL.  The above 
               sum is divided by 10**(12-n) where n=iO (decimal).
    
    			1412643022100 divided by 144  equals 7623431620 (octal)
    
    
    Step 4. Now the number is ready to be converted to floating point, which the floating point
              value of 7623431620 is 20367623 43162000.  Division will be by the floating point
              value for 10**m, where m=7.  This value is 20304611 32000000.  The floating point
              division yields the final value:
    
    					20076421 00000000.
    
    

    ---------------------------------------------------------------------------------------------------------------------------

    A problem alluded to earlier but not discussed was locating the elements of an array. Now that data has been converted and is ready to be stored, we have to tell the Interpreter where to store it. In most cases, the Interpreter can use TABSCAN to retrieve the storage location of the variable and store the value there, but in the case of arrays, special calculations must determine the relative displacement between the first element of the array and the desired element. This calculation is done by the routine ARRAYLOC (Array Locator) using the formula:

    Relative Location of Element = c * { SUM [x(i) * PROD n(j) ]}, i => 1 to d; j=i+1 to 6

    where d is the dimension of the array (limited to six dimensions because of the Interpreter's limit to 12K words of core), x(i) is the ith parameter of the array element, n(j) is the jth parameter of the declared array as specified in the Declaration Statement, and c is a multiplicative constant which indicates the number of words of core used by each element of the array. The multiple is automatically set to one for integers and boolean variables, two for real numbers, and the value in STLGTAB divided by four (plus one for any remainder) for string characters.

    Once the Interpreter has read all the user's program and stored the required information for the tables discussed in this section, there are only a few administrative tasks and then, the Interpreter is ready for the second pass. The first administrative function is to check to see if the user has requested a data dump. It is possible, by using jump switch 6 [3] on the computer console (see Section III, para 4), to obtain a data dump of all the tables which should be developed in the first pass. The dump allows the user to ensure that his tables have been set up correctly before he continues to the second pass. More discussion will be given on the data dumps in Section III. Next, the Interpreter must check to see if the last card is terminate. If it is not, the program must be abnormally terminated by the Interpreter, or the Interpreter will not be able to terminate itself properly during the execution phase.

    One final operation must occur before the Interpreter can move on to the second pass. As an added capability of the Interpreter, instructions which retrieve data directly from the CDC 3100 system's data files are provided. When trig functions are used, the value PI is very frequently desired. By the development of the Interpreter, the user is required to first declare PI as a variable (real), and then set it equal to 3.l4l592654; however, the requirement for PI can be and is anticipated by the Interpreter. Before leaving the first pass, the Interpreter adds the variable name PI just as though it had been declared, and literally follows the same routine as other variables which are formally declared. In addition, the constant value stated above is placed in its appropriate location in the DATATAB. When using data from the computer's data base, the same situation again arises. Without intervention by the Interpreter, the user would be required to declare an eighty character string variable; however, since the Interpreter knows of the requirement, it simulates the declaration of a string variable (80 characters in length) and gives it the name STAR. When a line of data is retrieved from the computer's data base, it is stored in STAR.

    Now that all the tables have been completed, the Interpreter is ready to move to the second pass (see Figure 4).

    B. The Second Pass {882 words}

    In the second pass, the Interpreter is ready to begin execution of the user's program. To enable a sequential execution of the program, the Interpreter has a program counter, STMTNUM, which actually contains the first character address of the current instruction, and a cyclic routine which directs the Interpreter as STMTNUM steps through the program.

    Most program counters are incremented by one after each instruction because the counter is an instruction counter, but since the Interpreter must read each instruction character by character to interpret and execute it, and since the program is stored character by character, then the program counter will be a character counter. Now the reader might recall the development of the table STMTTAB (see Section IIA, para 3), in the first pass which stored the relative character address of the first character of each instruction. This table is used to update STMTNUM after the execution of each instruction until the completion of the program. It should be noted that completion of the program will be indicated by the TERMINATE Statement, not the end of the STMTTAB table, although both events should occur at the same time. In some cases, though, the user might end his program with a TERMINATE Statement which is correct and was checked for toward the end of the first pass, but due to some program bug, such as an infinite loop, his program may not be able to terminate. To eliminate this problem until the user can correct his program, the Interpreter will recognize an emergency abort which can be enacted by the user by depressing jump switch 2 on the CDC 3100 computer console. The abort will completely terminate the execution of the Interpreter and return control to the ECDAPS.

    Once the first character address of the next instruction has been loaded, the Interpreter begins a character by character search of the instruction looking for either a single quote or an equals sign. The single quote (see Section IIA, para 1) will alert the Interpreter that an instruction type will follow and can be determined from the next two characters. If the Interpreter finds the single quote, then these next two characters will be read, the instruction type will be determined by using the routine DETRMN, and control will be passed to the routine which evaluates and executes the instruction. On the other hand, if the Interpreter finds the equals sign, then it knows that an expression is being calculated instead of an instruction being executed, so it will be necessary to use a special routine to process the instruction. However, before the expression can be evaluated, the Interpreter must learn the final storage area for the results of the expression. This information can be found in the argument on the left side of the expression. The Interpreter has a special routine, EXPRHOME (Expression Home), which will evaluate the left side of the expression and advise the Interpreter of the place to store the final solution. Now to return to the above discussion, a special routine must be used to evaluate an expression, but because the evaluation of the Boolean expression differs so greatly from any other expression, the Interpreter has two routines for solving expressions. To execute Boolean expressions is BOOLSTOR, and to evaluate normal expressions is EXPRESN. As the reader might guess, the expression type returned by EXPRHOME will direct the Interpreter to the correct routine to be used.

    ---------------------------------------------------------------------------------------------------------------------------

    Figure 4. Above is a sample program to demonstrate the development of the first pass tables.

    ---------------------------------------------------------------------------------------------------------------------------

    User's Program:
    	INTEGER A,B(7),C					STMTTAB	 14  (octal)
    	REAL D,E(2,2)							 30
    	STRING F,G,H							 41
    	STRING (5) 1							 52
    	BOOLEAN J,K							 61
    	STRING (13) L(3,3,3)						100
    	DATA A,7,E(1,2),3.5,F,t'CAT,"I,"LIVIM,"				203
    		J,"T,"L(1,2,1),"ABCDEFGHIJKLM"
    	TERMINATE
    
    			 00	 01	 02	 03	 04	 05	 06	 07
    PERM	0000		'INA	,B(7	),C;	'RED	,E(2	,2);	'STF	,G,H	
    	0010		;'ST	(5)I	;'BO	J,K;	'ST(	13)L	(3,3	,3);
    	0020		'DAA	,7,E	(1,2	),3.	5,F,	"CAT	",I,	"LIV
    	0030		ER," 	J,"T	,"L(	1,2,	1),"	ABCD	EFGH	IJKL	
    	0040		M";'	TE;
    
    TABLES   NAME DATATYPE        LOCATION           	STLGTAB    NAME  LENGTH
      	  A	   IN		  0  (octal)			     F	    4 (octal)
    	  B	 07IN		  1				     G	    4
    	  C	   IN	 	 10				     H	    4
    	  D	   RE	 	 11				     I	    5
    	  E	 OCRE		 13				     L	   15
    	  F	   ST		 23
    	  G	   ST		 24
    	  H	   ST		 25
    	  I	   ST		 26
    	  J	   BO		 30
    	  K	   BO		 31
    	  L	 OZST		 32
    
    DATATAB			 00	 01	 02	 03	 04	 05	 06	 07
     	0000		0007	0000	0000	0000	0000	0000	0000	0000
         	0010		0000	0000	0000	0000	0000	2000*	0000	0000
    	0020		0000	0000	0000	CATO	0000	0000	LIVE	ROOO
    	0030		T000	0000	0000	0000	0000	0000	0000	0000
    	0040		0000	0000	0000	0000	0000	0000	ABCD	EFGH
    	0050		IJKL	M000	0000	0000	0000	0000	0000	0000
    
    *The value in octal locations 0015 and 0016 is 20027000 00000000.  The obvious discrepancy exists because this is the only value in the table which is in octal format.  All others are in BCD.  The value, being floating point, must be printed in octal or it will have no ,meaning to the reader.
    

    ---------------------------------------------------------------------------------------------------------------------------

    At this point, it should be evident that the Interpreter deals with only two broad types of instructions: (1) Executable instructions, and (2) Expressions (see Figure 5). Below will follow a more thorough inquiry into the mechanics of each of these statement types.

    I. Executable instructions {63 words}

    Executable instructions can be further broken down into instructions which give directions to the Interpreter about program execution and into instructions which cause the Interpreter to make an Input/Output request of the CDC 3100 computer system. Discussion of these instructions will be offered for the instructions which direct the Interpreter in the program execution followed by the I/O request instructions.

    (A) Instructions which affect the program execution {1,779 words}

    1. The TERMINATE Statement--This instruction is probably the most important to the Interpreter because it indicates that the job is done. Without this instruction at the end of the program, the program will not execute, and a diagnostic message will be printed (see Section IV, para 6). Above it was noted that the user has the option to get a data dump after the first pass. At the end of the program execution, a similar option also exists. The final chore of the Interpreter after it reads the TERMINATE Statement is to provide a listing of the source program and if requested, the data dump. If before the source listing begins, the user decides to request the full data dump, he may do so by depressing jump switch 1 on the computer console. On the other hand, if the user decides not to even get the source listing, in the case of a long program, then once the listing begins, he may depress jump switch 2 on the computer console, and the listing will be aborted (for more discussion of data dumps and dump options, see Section III). Upon completion of the printout the Interpreter will print a message to the user to indicate a normal termination of the program.

    2. The GOTO Statement--This statement allows the user to skip a section of his program which he may not wish to execute in the normal instruction sequence. The format of the statement is GOTO N where N is an integer number (less than 999). When the Interpreter reads N, it uses TABSCAN to determine the character address associated ,with the statement number (see Section IIA, para 2), which should have been stored in the first pass, and then a reverse search through STMTTAB to locate the address and determine the new pointer to the table. The entire procedure is simply to modify STMTNUM. Obviously, when a jump is to be made, the Interpreter is not only concerned with properly executing the jump, but also with ensuring that the pointer to STMTTAB, which is STMTNUM, is updated so that each statement executed hereafter is the correct instruction. Once the pointer to STMTTAB is updated, the Interpreter is ready to continue to the next instruction.

    ---------------------------------------------------------------------------------------------------------------------------

    Figure 5. The Instruction Set capability for the Second Pass of the Interpreter

    ---------------------------------------------------------------------------------------------------------------------------

    I.  Executable Instructions
    
    	A.  Instructions which affect the program execution
    		 1.  TERMINATE
    		 2.  GOTO N
    		 3.  IF (X) THEN Y
    		 4.  FOR I=n to N
    		 5.  BACK
    		 6.  WHILE (I.op-N)
    
    	B.  Input/Output Request Instructions
    		 1.  N:MAT (Xl, X2,
    		 2.  WRITE n, (Xl, X2,
    		 3.  PUNCH n, (XI, X2,
    		 4.  CARD n, (Xl, X2,
    		 5.  ENTER n (Xl, X2,
    		 6.  FILE, (list of parameters)
    
    II.  Expressions
    
    	A.  Normal-Expressions (functions)
    		 1.  EXP (X)
    		 2.  LN (X)
    		 3.  LOG (X)
    		 4.  SQRT (X)
    		 5.  FACT (X)
    		 6.  ABS (X)
    		 7.  SIN (X)
    		 8.  COS (X)
    		 9.  TAN (X)
    		10.  ASIN (X)
    		11.  ACOS (X)
    		12.  ATAN (X)
    		13.  CNVRT string name (Nl:N2)
    
    	B.  Logical Expres sions (log ical functions)
    		 1.  (x) AND (y)
    		 2.  (x) OR (y)
    		 3.  (x) EOR (y)
    

    ---------------------------------------------------------------------------------------------------------------------------

    3. The IF ... THEN Statement--This statement gives the user the capability to execute an instruction (the THEN segment) providing a preset condition holds (the IF segment), and to skip the execution of the instruction if the condition does not hold. The format of this instruction is

    IF (X) THEN Y

    where X is any logical expression containing two operands separated by a logical operators, and Y is any executable instruction. The expression (X), for any two operands, x and y, can make the following logical comparisons:

    
    			(x.EQ.y)			x equal to y
    			(x.NE.Y)			x not equal to y
    			(x.GE.y)			x greater than or equal to y
    			(x.LE.y)			x less than or equal to y
    			(x.GT.y)			x greater than y
    			(x.LT.y)			x less than y
    
    

    Of note is that the logical operators shown above, like FORTRAN, are flanked on both sides by periods.

    To actually make the logical comparisons, a special routine, LEXPRCHK, has.been developed. This routine makes use of the normal expression routine, EXPRESN (to be discussed later), to return a single value of TRUE or FALSE to the Interpreter. Even though EXPRESN is used, Boolean comparisons are possible because of a specially designed portion of the two routines (see Section IIB, introduction). LEXPRCHK, in solving the expressions, actually makes two calls to EXPRESN, first to solve x and then y of the above examples. As noted earlier, an expression requires a storage location for its final value and a data type so that the final answer will be in the correct data format. LEXPRCHK makes up two sets of temporary storage locations and uses the data types of the variables in each of the subexpressions, x and y, to set up.the calls to EXPRESN. When both calls have been made, a check,, using the logical operator, is made to determine if the value of the logical expression is TRUE or FALSE. The Interpreter uses this final value to either execute the THEN portion of the statement (if the value is TRUE), or to skip to the next instruction (if the value is FALSE). Of final note LEXPRCHK, in comparing character strings and boolean variables, will only check for 'equal to' or 'not equal to.' In some higher level languages, such as ALGOL, the user can arrange alphabetic constants and strings in alphabetic order by using logical operators, such as 'less than or equal to' and 'greater than or equal to.' This language cannot permit such comparisons, though because the BCD characters in the CDC 3100 Computer for A and Z are such that LEXPRCHK would inconsistently assign the following values:

    
    			(A.LT.B)                	 TRUE (correct)
    			(A.LT.Z)                	 FALSE (incorrect)
    
    

    Therefore, any other logical operator except 'EQ' and 'NE' for character strings and boolean variables are illegal and an error will be printed (see Section IV, para 24).

    4. The 'FOR I=n TO N' Statement--This statement functions similarly to the FORTRAN DO Loop Statement. It gives the user the capability to cycle through specified statements until the number of cycles, or loops is equivalent to the limit entered by the user. The format of the statement is as shown,, where I is any declared integer variable name (note that this variable must be declared using an Integer Declaration Statement) which acts as the indice or loop counter, n is the initial value assigned to I (the value should also be an integer; however, if n is a real number, since I was declared an integer, it will be rounded to the nearest integer), and N is the limit of 1; thus placing a limit on the number of loops to be taken. Unlike FORTRAN, though, the user cannot specify the value by which I will be incremented with each loop; the counter, I, will automatically be incremented by one with each loop.

    The Interpreter will allow up to six nested loops; that is, only six loops may be used at one time. The organization of this statement by the Interpreter is not very involved and relatively uncomplicated. The Interpreter maintains a table which contains the relative address (relative to DATATAB) of the counter's storage location (determined by TABSCAN), a table which contains the final value, or limit, of the counter, and a table which contains the character address of the first statement following the 'FOR' Statement. The construction of these tables enables efficient processing and execution of the 'FOR' Statement. Finally, the 'FOR' Statement requires another statement which acts as a delimiter to the loop. This additional statement will indicate to the Interpreter that the end of the loop has been found, and the Interpreter must either execute the instructions of the loop again, or close the loop and continue to the next instruction. The statement referred to here is the RACK statement.

    5. The 'BACK' Statement--This statement, as stated above, signals the conclusion of a set of instructions. This conclusion will be either to a group of instructions involved in a 'FOR' Statement loop or a 'WHILE' Statement loop. The format of the Statement is as shown above. The statement has no formal parameters.

    When used with the 'FOR' Statement, the 'BACK' Statement will cause the Interpreter to examine the tables developed by the 'FOR' Statement and do one of the following things.

  • (a) If the value of the counter, 1, has not yet reached its limit, the Interpreter will retrieve the relative character address of the statement following the 'FOR' Statement and transfer it to the instruction which begins the loop again. Just prior to transferring, though, the counter, it will be incremented and restored.
  • (b) If the value of the counter, I, is equal to its limit, then the Interpreter will close the loop, by decrementing a counter to the tables, and continue to the next executable instruction.
  • (c) If the value of the counter, I, exceeds its limit, then the user must have incorrectly used the 'FOR' Statement. Most likely, such an error would be initializing the value of I, n, to be a value larger than the limit, N. In such a case, an error message will be printed for the user (see Section IV, para 18).
  • When used with the 'WHILE' Statement (to be discussed later) many of the above actions will take place. In particular, though, since the variable in the 'WHILE' Statement is not automatically incremented with each loop, the 'BACK' Statement must also check to see if the variable is converging to the limit. If not, the computer verges on an infinite loop, and the program must be aborted. To coordinate this, the Interpreter uses the computers internal clock to limit the loop execution time. If the program must be aborted, an error message will be printed to the user (see Section IV, para 26).

    6. The 'WHILE' Statement--The 'WHILE' Statement, borrowed from ALGOL, is very much like the 'FOR' Statement, except that the variable I is not restricted to being an integer, and the user is not required to initialize the loop counter. The format for the 'WHILE' Statement is

    WHILE (I.op.N)

    where as seen with the FOR Statement, I is the variable which is compared with N, by the logical operator, op, to determine if the set of instructions in the nest will be executed. Obviously, if the condition (I.op.N) is TRUE, the instructions will be executed; otherwise, execution will skip to the first statement following the WHILE loop. As with the FOR Statement, the set of instructions for this statement will need the 'BACK' Statement As a delimiter (see Section IV, para 27).

    (B) Input/Output Request Instructions {3,941 words}

    Before moving into the specific I/O instructions and their formats, a few words will be offered .to introduce the Interpreter's I/O System.

    In storage, many values are not in a desirable or understandable format for the user; thus, conversion and formatting of data for output becomes important. If floating point notation has been used, it must be converted to its real number equivalent (in BCD) before printing; if the number is octal, it must be converted to decimal since most people do not work with octal values; and if it is in string notation, the data is stored in the correct format, but it must receive proper handling for printing according to its specified string length. To do the above important tasks are two routines for data conversions and three routines for data formatting. First we will discuss the data conversions.

    For real number values stored in floating point notation, the routine FLTDEC (Floating Point to Decimal Conversion) is used. FLTDEC actually performs the conversion in two parts: first, it converts the floating point number to its integer value (BCD) without the fraction and stores it; then it finishes the conversion by processing the decimal portion which is preceded by a decimal and stored behind the BCD integer. The BCD integer is arrived at by unfloating the real number into an octal integer and fraction; the fraction is placed in temporary storage. The integer then is converted to BCD by dividing it by the largest possible power of ten (such that the resulting quotient be greater than one but less than ten), where the quotient represents the leftmost significant digit, and from there by continually dividing the remainder of the previous division by the next smallest power of ten, to produce each additional significant digit (from left to right), until the divisor becomes one. When the divisor is one, the integer conversion is complete. At this point, the fraction is retrieved from its storage area, and converted to a decimal fraction by continually multiplying the fraction by ten (decimal) until the desired accuracy (seven places) is achieved. Each multiplication produces one significant digit (from left to right), where the significant digit will be found to the left of the decimal. As the reader can see, the routine is complicated, particularly when dealing with negative numbers and decimal numbers (less than abs(l)); however, without it, the Interpreter cannot provide the user with a meaningful answer.

    Another such routine within the Interpreter is FLTBCD (Floating Point to BCD Conversion), which converts octal integers to BCD decimal integers. From the above paragraph, it should be noted that the Interpreter was tasked with converting the floating point number to an integer (decimal) and a decimal (fraction) number; thus, conversion of integers is identical to real number conversions except the fraction is excluded. So the integer conversion will use the same technique and the same routine, FLTOCT, initially, but if there is any fraction, it will be rounded off to the nearest integer (see Figure 6).

    ---------------------------------------------------------------------------------------------------------------------------

    Figure 6. Above is an illustration of the FLTDEC and FLTBCD routines which prepare data for the user's eyes.

    ---------------------------------------------------------------------------------------------------------------------------

    Note:  Initially, it will be assumed that the value 20076421 00000000 is already in storage waiting to be converted to BCD format for printout purposes.
    
    	Step 1. Unfloating the floating point number yields:  150.42.
    	Retain the 150; temporarily store the 42.
    
    	Step 2. Next the number is converted by digit through division.  The number is divided by the largest power of ten which produces a quotient greater than one but less than ten.  The single digit quotient is a significant digit in the conversion and is saved, while the power of ten is reduced one place as a divisor for the remainder of the previous division.  Continue dividing in this manner until the power of ten is zero.
    
    	a).  Dividing by 144 (octal for decimal 100) yields:
    			150 divided by 144 yields 1, remainder 4
    
    	b).  Dividing by 12 (octal for decimal ten) yields:
    			4 divided by 12 yields 0, remainder 4
    
    	c).  Dividing by 1 (octal for decimal one) yields:
    			4 divided by 1 yields 4, remainder 0
    
    	Step 3. Next place the three digits from the calculations above in order to form 104.  At this point, FLTBCD, for integers, rounds off the fraction (.42 (octal)), and returns the BCD integer value 105.
    
    	Step 4. Multiply .42 (octal) by 12 (octal for decimal ten) seven consecutive times as below to determine the decimal fraction.
    
    	1. .42	2. .24 	3. .10 	4. .20 	5. .40 	6. .00 	7. .00
    	    12	    12	    12	    12	    12	    12	    12
    	  ----	  ----	  ----	  ----	  ----	  ----	  ----
    	  5.24	  3.10	  1.20	  2.40	  5.00	  0.00	  0.00
    
    Step 5. The final step is to place the leftmost digits from each multiplication above with the 104 to arrive at the BCD value
    				104.5312500
    

    ---------------------------------------------------------------------------------------------------------------------------

    Now that the data has been converted to the correct data structure for printing, the user must be concerned about the output format. Otherwise, there will be times when the computed format will appear messy and inadequate to the user. For example, the Interpreter in converting real numbers to BCD format, uses the standard seven digits on each side of the decimal. If the user is working in dollars and cents, he will most likely only want two digits to the right of the decimal; another case is for small numbers, the Interpreter fills out the value with leading and trailing zeros, which the user may or may not want in his printout. Therefore, because at times the user will not be satisfied with the computers format, and at other times, he will want to use the computer's format, the Interpreter provides the option of either formatting or not.

    Whether format statements are used or not, the Interpreter begins its analysis in the same way. First, the Interpreter reads the statement to look for a statement number. If it finds one, the number is stored for later use. If not, a flag is set to indicate that no format statement is being used. Then the Interpreter transfers to a routine DATAREAD, which finishes reading the I/O Instruction to identify which data variables are required for the printout. DATAREAD creates a table, IOREQ, to store the data type, the storage location of the data item to be processed, and when a string variable, the string length. In the case of arrays, the absolute address is calculated and stored, rather than storing the array's first element word address. Once the table is established for the line to be printed, the Interpreter moves to another routine, MATEVAL, which actually constructs the line for output. Before this second routine gets started though, it checks for the statement number, which, if it exists, would have been stored earlier. If there is a statement number, then the line cannot be constructed until the MAT Statement is evaluated so that guidance will be available for MATEVAL. To perform this evaluation, the routine MATSTA has been developed. MATSTA, first of all, determines the location of the MAT Statement (using TABSCAN, see Section IIA, para 2); then, it makes a character by character replica of the desired output line, using only characters (X--for blank R--for integer portion of real number, D--for decimal of the real number, F--for fraction of the real number, C--for character strings, and I-for each integer digit) instead of the actual values. Later, MATEVAL will use this replica to determine the exact location to store blanks, numbers (also paying attention to the size of the number in digits to be printed), and character strings. Once the replica line has been constructed, MATEVAL uses the table constructed by DATAREAD to retrieve each data item listed; it uses the conversion routine discussed above to put the data in the correct format; and finally, it overwtites the replica line according to the placing of the format characters on the line. It should be noted that the Interpreter does not actually overwrite the replica line, but writes in parallel to another buffer area of the same size. Overwriting the replica line would lead to problems because the nature of the procedure described above implies that the Interpreter must scan the replica line in search for X's, R's, D's, F's, Clg and I's. Assuming that a character string is part of the output, if a second line were not used for the data, the Interpreter would be hard pressed to separate actual data items (characters) from the format notes. The difficulty is not worth the extra core saved by using only one buffer. Thus this problem is avoided by storing the character string label in the second buffer (the buffer which is actually printed). Since the label was to appear in this location in the second buffer anyway, no harm is done, and much confusion is avoided. In accordance with the above discussion of format statements, the format of the MAT Statement is introduced below:

    n: MAT (Xl,X2,...)

    where n is the statement number followed by a colon, and (X1, X2...) is made up of any data formats using the cases as illustrated below:

  • (a) Fw.d--where w is the field length (including the decimal), and d is the number of characters to the right of the decimal.
  • (b) Iw--where w is the number of digits in the integer to be printed.
  • (c) Cw--where w is the number of characters in a string to be printed.
  • (d) nx--where n specifies a certain number of blanks to be printed.
  • Of course, as noted earlier, F, I, C, and X must appear as part of the field length information to identify the data type for MATEVAL.

    Next to be considered is the case when a format statement has not been used. If no statement number is indicated in the I/O Instruction,, then the Interpreter is saved some analysis, but it does not escape totally free from the spell of formats. Although not officially entered by the user, when no format statement is used, the Interpreter must still supply an output format. In this instance, the data to be printed can only be taken directly from storage and dumpdd; thus, the concept of a standard format is begun.

    Even though no format statement exists, as noted earlier, the Interpreter will still go through DATAREAD to set up the table of data types, storage locations, and string lengths. From DATAREAD, the Interpreter goes to MATEVAL, but since there is no format number, the Interpreter will not use MATSTA. Since no direction is provided from MATSTA, it is necessary to keep a column counter, or tab counter, to keep abreast of where each data value is to be stored. As these values are stored, the tab counter will increment accordingly to move down the line as the Interpreter moves through the table which was produced by DATAREAD. Obviously, some form of spacing must be used. The Interpreter automatically assigns four words of storage to real numbers and to integers. Since the format of these data items is unadjusted, they will be printed in long form; that is, integers will be seven digits (with leading zeros), and real numbers will be fifteen digits (with leading and trailing zeros). Character strings will be stored in their entirety, and the spacing will be a skip to the end of the current word and then skip one full word. The character string skipping is irregular because the Interpreter does not readily know where the address of the next word of buffer storage is. Therefore, it must divide a character by four, round the quotient, and add one word to ensure adequate spacing. (For examples of the I/O System, see Appendix A).

    Now that the user has seen how the I/O structuring operates for output, discussion can turn to the actual I/O Instructions which make use of the above concepts:

    1. The 'WRITE' Statement--This instruction provides the user with the capability to make I/O data requests of the printer. As was seen above, the option to use or not to use format statements exists. The format of the WRITE Statement is:

    WRITE n,(Xl, X2....

    where n is the optional statement number of the format statement used, and (xi, x2,...) are the data variable names for the quantities to be printed (the parentheses are optional). The data variables can be any type; they can also be in array format. The routine DATAREAD, as discussed above, will properly handle the data type, data location, the string length where applicable, and store the information in a table.

    2. The 'PUNCH' Statement--The PUNCH instruction is identical to the WRITE Statement except that the output peripheral is a card punch instead of a line printer. The format for this instruction is:

    PUNCH n, (X1, X2,...)

    where the above symbols are as represented in the WRITE Statement.

    As above, where it was necessary to prepare, through conversion routines and formatting routines, the data in storage for the user's eyes, now it is necessary to consider the reciprocal operation; that is, preparing input data for the computer's eyes. The process is not to differ greatly from the above method of converting the output data; however, just by the fact that the data is moving in a different direction means that additional routines are necessary. For example, the Interpreter cannot very well convert BCD quantities to binary (octal) or floating point values by using the same routine which converts from floating point and binary to BCD. In the same manner, instead of using MATEVAL which stores information to a line in its final format ready for print, a different routine will be required which will take the data from the line as read into the computer, convert it to its proper format for computer use, and store it in its proper storage location in DATATAB. Therefore, it is necessary to determine what parts of the current 1/0 system can be salvaged for inputs and what parts must be developed.

    As stated above, different routines will be needed to convert the data structures recognized by the computer. Fortunately, these routines have already been developed for use in the DATA Statement (see Section IIA, para 4). They are BCDFLPT, which converts the BCD real number into its floating point equivalent, and BCDOCTL, which converts a BCD integer value into its octal equivalent. Therefore, it is only necessary to read the line of data and determine what value is to be converted, use the above routines to convert it, and then,to store the data. First let us consider the storage of the data. As was discussed earlier, to store data which is in BCD format, the Interpreter must have a data type and data storage location. Nov the reader might recall that the Interpreter constructed a table, using the routine DATAREAD, which contained precisely the needed information for data storage. DATAREAD was introduced as a routine which stored the type, storage location, and character count of string variable for output conversions, but the same routine will now be used to store the data type, storage location, and string length for input. The obvious difference will be in the next step where different conversion routines will be used de ending upon whether the operation is input or output. Thus the real change to be made will be in the routine which causes the data conversion. To carry the thought one more step, another difference is the final destination of the data storage; one case will cause data to be stored to a buffer for printing or punching while the other case will cause the data to be stored to the computer's data tables for future processing. In the output mode, the routine which converted and stored the data was MATEVAL. Thus, for the input operation MATEVAL must be the routine to be replaced, or actually, to be redone to reverse the direction of the data. The replacement routine will be MATINPT, which will follow DATAREAD, and use the table established by it. In addition the system will use the routine MATSTA, which, as the reader will recall, constructs a replica line of the format line, if used. Then these three routines together will get data from the input queue of the computer to the correct storage location in the data tables. In cases where no format is used, the Interpreter will just read the information and recognize commas as data delimiters.

    Now that the user has seen how the I/O is done for input, it is time to turn to the instructions which will cause the above concepts to be used.

    3. The 'CARD' Statement--This instruction will give the user the capability to enter data with cards. The structure of this statement is:

    CARD n, (Xl, X2....

    where n is the optional format statement number, and (XI,X2,...) are the data variable names which indicate the storage location for the input data. Of note, the parentheses above are optional in all of the Input Routines as they were in the Output Routines.

    4. The 'ENTER' Statement--This instruction provides the user with the capability to enter data into the computer from the console typewriter. The format of this instruction is:

    ENTER n, (Xl, X2,...)

    where the symbols used are identical with the ones used in the CARD Statement.

    Thus far, the Interpreter has dealt with computer input and output in terms of its peripheral equipment, such as the printer, card reader, card punch, and the typewriter. However, there is one more I/O instruction to be considered. A major feature of the Interpreter is that the user will have access to the data base in the computer. To make the data retrieval, it is necessary to use two routines of the ECDAPS Software System; one is QUERY, which when supplied with a list of parameters for data retrieval options, will search the data base and bring the desired data into a buffer of the computer. Output from QUERY to the Interpreter can call up the data line by line for the user. All subsequent calls for a line of data is coordinated by the ECDAPS routine DATA, which updates the pointer to the data buffer. After each line of data is read, it is necessary to update the pointer to the next line.

    Normally the Interpreter acts as the middle man between the user and the ECDAPS Software System. Thus, a routine must be developed to enable the user to define his data needs of the system and to state those needs to the two ECDAPS routines to be used. To better understand what such a routine must look like, it will be necessary to understand a few things about the data base of the computer. Since the data is classified military information, it will not be possible to discuss in great detail the purpose or design of the data. However, it is important for the reader to understand that the computer is a data gathering center. The data come from various military installations and is entered directly into the computer over the teletypewriter circuits discussed in Section IA. Since a number of installations are involved, the data entry has an identification field to distinguish between the stations sending. Therefore one parameter for the user's data request will be to examine only data from a certain station. Another parameter will be time. Each data entry has a time on it to indicate when the report is written. Now, although they cannot be discussed, the data can be entered into the computer in several different formats, each format for a different task, and it is necessary to allow data retrieval by the type of report desired. Thus, with these parameters, the instruction can now be developed.

    5. The 'FILE' Statement--The FILE Statement provides the user with the capability to retrieve data from the main system's data base. The format of the instruction is:

  • (a) FILE,NSTAR-TGT,TMODAHRNN
  • (b) FILE,NECNUM,TMODAHRM
  • (c) FILE,CNlN2 ... N8,VlV2 ... V8,TMODAHRM
  • where in cases (a) and (b) above, N is a type of report indicator. N may have the value R (reports) or A (abstracts). STARTGT is the station identifier and ECNUM is a data identifier (because of the classification, these cannot be further explained). TMODAHRMN is a time parameter; the T must appear to alert the Interpreter that the time input is requested. MODAHRM is to indicate the month, day, hour, and minute as a start time for the data to be retrieved. In case (c), the user has still another option for recalling the system's data. Because the data is in the form of an 80-column string (see Section IIA, closing remarks), the data can be requested based on stated column values. The columns to be compared will be NlN2 ... N8 (note no comma between column numbers), up to 8, and the values to be compared against for the respective columns will be VlV2 ... V8 (again no comma). Of note, the values specified by V(i) should be only one-digit even though they are implied to be two-digits. With this capability, the user can request to see all lines of data in the computer with a 3 in column 27 by using the instruction FILE,C27,3.

    So that the reader will not be confused by use of this instruction, an explanation must be given. From above, it seems that the instruction will load an entire buffer of data into an area for the user to use. This is not the case. The buffer has been loaded by QUERY for the Interpreter to use, but due to the limitation on storage for the Interpreter, it is not possible for the Interpreter to load this data for the user. Instead, the FILE instruction causes a single line of data to be loaded. For more data, it is necessary to loop through the instruction each time a new line is required. Now, an obvious way for the user to override the Interpreter's intent here to conserve storage is to declare an 80-column string array, read a line of data and store it in each successive element of the array. However, when a line is called up, it is automatically stored in an area set aside by the Interpreter (STAR), and can be operated on from that special location. The desirable method to use the FILE instruction capability is to read a line of data, operate on it, and return for another line. In this way, the user can still use the data without tying up the Interpreter's storage area. To loop the FILE instruction as recommended, the Interpreter has developed a special technique to continue reading data until the end of file is reached. The loop should look like this:

    
    		WHILE (ST.AR.NE. "T")
    		FILE,(followed here by requested parameters)
    			.
    			.
    			.
    		BACK
    			.
    			.
    			.
    		TERMINATE
    
    

    As can be seen, the Interpreter performs the loading operation with the WHILE Statement. Once the last line of data has been found, the Interpreter enters the letter 'T' in the first character position of STAR.

    6. The 'SCRATCH' Statement--The simple function of this command is to void the current file of data, which had been earlier retrieved, when the user is ready to fetch new data. After this statement, the user can reinitialize his QUERY parameters. Then, when he specifies the new parameters in another 'FILE' Statement, the new buffer of data can be loaded. Now that the I/O instructions have been discussed it is time to look at the instructions and functions which revolve around the Interpreter's expressions. Since the computer is filled with calculations, the solving of expressions is highly important.

    II. Expressions {188 words}

    Developing a method to solve expressions is extremely complex because expressions can be simple involving no variables, or cumbersome involving variables, functions, or just a combination of these. They may be long or short, with parentheses or without. In each case, one routine must be able to accommodate the problem. The first task for the Interpreter to handle when confronted by an expression is to determine the expression's home; that is, once a final solution has been calculated, the Interpreter must know where to store it and in which data form. Thus as was stated in the introduction to the Second Pass, a routine, EXPRHOME, will read the name of the variable, including arrays, on the left side of the expression, and determine and store the location of the variable and its data type.

    Once the location and type are known, it is necessary to tackle the right side of the equation; however, because the calculation of expressions are handled differently in the case of Boolean variables, it is necessary to have two techniques to solve expressions. Thus, discussion of each of these routines will follow.

    (a) Normal Expressions {2,626 words}

    The routine to solve normal expressions, EXPRESN, is divided into two main parts; one, EXPRESN, has the capability to solve expressions which have both real numbers and integers contained within it, and two, EXPRESN can process string variable assignment statements.

    First, let us consider the string variable assignments. A string variable may be assigned a value either by giving it the value of a string constant which is enclosed in double quotes, or by setting it equal to another string variable. Therefore, if the Interpreter, in scanning the expression, finds a double quote, it knows immediately to store the characters following as the string's value. In the other case, if the string variable is set equal to another string variable, the Interpreter simply copies the characters from the first location in core to the new location. As is readily seen, this first type of expression can be easily handled by the Interpreter. The solving of this string variable problem is accomplished primarily by allowing the Interpreter to utilize programming designed for the second type of normal expression.

    A major feature of the Interpreter in dealing with character strings is the capability given to the user to work with partial strings. This means that the user may print just part of the string variable, or in an assignment, he may give it the value of only part of another string. The feature exists in ALGOL W [4]; it cannot be found in FORTRAN or most other scientific languages. The notation adopted by the Interpreter to indicate a partial string is

    STR (Nl.N2)

    where STR is the string variable name, Nl signals to the Interpreter to skip the first Nl characters, and beginning with Nl+l, use the next N2 characters.

    In the second type of normal expression, the Interpreter is much more involved with the expression. In the previous type, the Interpreter was concerned with relocating data from the expression or from a location in storage to the location of the variable being assigned the value. In this problem, though, the Interpreter deals with arithmetic expressions which may be nested in parentheses and which may contain many variable names and arrays. In either case, the .Interpreter must break the expression down into simple, easily solvable components of the original problem. In undertaking this task the Interpreter first scans the line in search for each data element (variables, constants, and functions). As each element is discovered, it is converted into its floating point value; that is, variables are retrieved from storage, and if required, are converted and constants are all assumed to be real numbers and are automatically converted to floating point. When these floating point numbers are arrived at, they are stored in a table called VALU. Since a floating point number requires two words of core, the relative address of these in VALU are 0, 2, 4, .., which in BCD will be 00, 02, 04,..., respectively. As the values are being stored in VALU, the Interpreter is rewriting the expression by replacing each value by its relative address in VALU, but without changing the symbols and other notation of the expression. Below is an example of how this process works:

    
    		DEMO   (3.0-+ARA(2,3))*7-VAR**EXP
    					where ARA(2,3)	=	2
    					and VAR		=	4
    					and EXP	 	=	6
    
    From rewriting the expression as described above, the result is
    
    	=(O+2)*4-6^8           		where VALU 		20026000 00000000
    						+2		20024000 00000000
    						+4		20037000 00000000
    						+6		20034000 00000000
    						+8		20036000 00000000
    
    

    Note that the symbol for exponentiation, **, is rewritten as a ^. This is deliberate. From close observation of the rewritten expression above, the reader will note that the line can now be reread, and the Interpreter will with each character load the address of a value, or a parenthesis, or an arithmetic operator. With the new structure, the Interpreter has surpassed the problem of character names and lengths as well as variable and value string length. Next, higher level languages deal with the execution priority sequence. Usually they will execute everything in parentheses first, then they will execute each operation on the line according to the operator priority. Thus the Interpreter will execute everything in parentheses. It will accomplish the task by reading and storing the expression. Whenever a begin parenthesis is encountered, the Interpreter forgets what it has read, and from the point of the new parenthesis it begins to read and store the remainder of the expression. Whenever an end parenthesis is encountered, the Interpreter solves whatever portion of the expression has been stored at that point. When the expression within the parenthesis has been calculated, the final value determined will be stored in VALU in the lowest relative address of this mini-expression, and the expression will be rewritten, only this time, with one less level of parentheses. When the Interpreter does not find any parentheses, it will run into the semicolon stored at the end of the instruction by PREPROC, and then it will solve the expression which has been read. When more than one operator is in an expression, priority is given in the following order: Within the example given above, the final solution would be derived as follows:

    
    		=(O+2)*4-6^8		(0+2) is evaluated and stored in 0
    		=0*4-6^8		6^8 is evaluated and stored in 6
    		=0*4-6			0*4 is evaluated and stored in 0
    		=0-6			0-6 is evaluated and stored in 0
    		=0			Final value will always be in zero
    
    

    From the breakdown given above, the two points are worth noting; first, the final will always be in relative address 0 in VALU, and secondly, the Interpreter does not require four passes through the line to solve such a trivial problem. The Interpreter requires one pass to rid the expression of the parentheses, and one pass to solve the problem.

    One part of expression solving which has been ignored until now is function processing. When the Interpreter comes across a function request such as TAN, SQRT, etc., it must come up with a quick evaluation of that expression before it can continue to the actual mechanics of expression solving which were discussed above. Thus before moving on, it is necessary to jump back to discuss the functions which come with the Interpreter, and how they work.

    1. EXP (X)--This function solves the problem, e**X, where e=2.718281828. The Interpreter arrives at the value by a method of series expansions through the first sixty expansions [5]. The equation is

    EXP (X) = (x**0)/(0!) + (x**1)/(1!) + (x**2)/(2!) + ... + (x**59)/(59!)

    The expansion is carried so far, because when dealing with real numbers, the Interpreter calculates the solution to seven decimal places. In problems where X is extremely large or small, the seven place accuracy can be obtained only by going through sixty expansions.

    2. LN (X)--This function determines the natural logarithm of X. Initially the Interpreter tried using series expansion for this problem, but too many equations were necessary to cover the different cases for the value of X. Finally, a Newton-Raphson Iterative type equation was accepted. The equation is

    
    		LN(x) = Y(I+l)
    				where Y (I+l) = y (I) + [X-EXP(Y(I))] / EXP(Y(I)), X.LE.0,
    				and Y(I+1) - Y(I) .LT. 0.0000000001
    
    

    Note that in solving the natural log of X, the Interpreter makes use of the Exponential function discussed above. In many of the function calculations, the reader will notice that the Interpreter often relies on already developed, tried, and tested software to develop new software. Above it was mentioned in the discussion of the exponential function that sixty iterations were required to attain accuracy for certain X. Another instance where high accuracy is required in the exponential is with the natural log function since it is dependent upon the former function. In many cases while testing the natural log method of calculation, it was noted that an inaccurate value for EXP(Y(I)) prevented the Newton-Raphson formula from converging rapidly enough. Thus, it was necessary to extend the series until it provided the accuracy to deal with the varying values of Y(I).

    The tolerance factor used in all iterative type solutions is as shown above, .0000000001. Such a small tolerance affords a high factor of accuracy for all calculations. It should be brought out that whereas the Interpreter only prints the solution to seven decimal digits, often the final output will be varied by the lower order values. In use of floating point notation, the Interpreter can express up to twelve digits of a number. These digits may be to the right or left of the decimal; however, more often than not, the Interpreter will be calculating numbers to the ninth and tenth decimal place. Therefore, in discussing the accuracy of calculations, many times the Interpreter is dealing with the lower order values which may not be printed, but obviously affect the final solution.

    3. LOG (X)--This routine will calculate the common logarithm of X. The method is very simple using the identity

    LOG (X) = (2.302485093) * LN(X), X.GE.0

    Again the reader can see where the preciseness all the way back to EXP (X) will be required just to calculate the common logarithm of X.

    4. SQRT (X)--This routine will calculate the square root of X. As might be expected, the solution is determined by using the famous Newton-Raphson method of iteration. However, as the reader has been told, the Interpreter, whenever possible, will make use of already developed technology for continued developments, so the decision had :to be made as to whether to use the exponentiation of X to the one-half power. The problem was quickly resolved by the examination of the two techniques. Whereas in every example tested (see Appendix B), the solutions were identical, the Newton-Raphson method proved itself quicker in reaching the final solution. It was interesting and reassuring to note that each method, even though completely different software was used for each, produced the identical value to the seventh decimal place. The Newton-Raphson iteration formula for the square root is

    
    		SQRT (X) = Y(I+l)
    				where Y(I+l) = Y(I) + [X - Y(I)**2] / (2*(Y(I))),
    				amd X.GE.0, and Y(I+l)-Y(I)<.0000000001
    
    

    5. FACT (X)--This function will calculate the factorial of X. The routine begins with X, decrements by one, and multiplies the two to arrive at the product. Then, it continues to decrement the value of X by one, multiplying the new number by the previous product, which yields a new product, and continues through the cycle until X becomes equal to one. Then the final value has been found. Of note, X must be a positive integer or an error will result (see Section IV, para 13). The equation is

    FACT (X) = (X)(X-1)(X-2) . . . (1), where X is a positive integer.

    6. ABS (X)--This routine determines the absolute value of X. The determination is made according to the following relationships

                         
    			   		X=X	for X positive or zero
    		ABS (X) equals  			or
    			   		X=-X	for X negative
    
    

    7. SIN (X)--This routine will calculate the sine of X by using the series expansion method [8]. The formula is

    SIN (X) = (x**1)/(1!) - (x**3)/(3!) + (x**5)/(5!) - (x**7)/(7!) + (x**9)/(9!) - (x**11)/(11!)

    Note that the series is alternating in sign; also note that the series is expanded only to the sixth expansion. The series is short because it provides a quick and highly accurate convergence to the solution.

    An additional capability to the user which is not immediately obvious is that the function has radian or degree input available. If the user wishes to exercise the radian input option, he can precede the parentheses with an R; otherwise degree input will be assumed.

    8. COS(X)--This routine will calculate the cosine value of X. The series expansion technique is also used with cosine, and the formula is

    COS (X) = (x**0)/(0!) - (x**2)/(2!) + (x**4)/(4!) - (x**6)/(6!) + (x**8)/(8!) - (x**10)/(10!)

    Note here also that the series has an alternating sign, and it too is expanded only to the 6th element. Although it should not need mention; the cosine function has the same option for radian and degree input as the sine function.

    9. TAN (X)--This routine will calculate the tangent of X. For this value, it is simple for the Interpreter to determine the sine and cosine values of X and use the following relationship

    TAN (X) = SIN(X) / COS(X)

    Of note, the tangent function has the same restrictions and advantages as it constituents sine and cosine.

    10. ASIN (X)--This routine calculates the arcsine, or inverse sine, of X. The value is calculated using the N6wton-Raphson iterative technique. The formula is

    
    		ASIN (X) = Y(I+l)
    				where Y(I+l) = Y(I) + [X - SIN(Y(I))] / COS(Y(I))],
    				and X.GE.-1, X.LE.1 and Y(I+l)-Y(I)<.0000000001
    
    

    11. ACOS (X)--This routine calculates the value of the arccosine, or the inverse cosine, of X, again with the ASIN function, by using the Newton-Raphson method of iteratively converging to the solution. The formula is

    
    		ACOS (X) = Y(I+l)
    				where Y(I+l) = Y(I) + [COS(Y(I)) - X] / SIN(Y(I))],
    				and X.GE.-1, X.LE.1 and Y(I+l)-Y(I)<.0000000001
    
    

    12. ATAN (X)--This routine will determine the acrtangent or the inverse tangent of X. The calculation is performed by a direct formula using

    ATAN(X) = ASIN [(X**2)/(1 + X**2)] ** 1/2

    Although not mentioned with the inverse trigonometric functions, the user can get his answer either in radians, by using the R preceding the parentheses in the instruction, or by degrees by leaving the blank as discussed with the trigonometric functions.

    13. CNVRT(STR(Nl:N2))--This routine is much simpler than it appears. When data is retrieved from the computer's data base, it is given to the user in string character format (see Sedtion IIB, para lb5); however, the user may often wish to make calculations from the line of data, such as calculation of time differences, or to perform other numerical comparisons. Before these calculations are possible, the data must be converted from string constant format (BCD) to integer format (binary). Obviously, since the Interpreter has already solved this problem in data input processing with BCDOCTL, it will not be difficult to perform the conversion for the user. In the format above, STR represents the string variable name (in the case of the computer's data base, the name would be STAR), Nl directs the Interpreter to skip the first Nl characters of the string, and N2 is the number of characters to be converted. The conversion will be done as one value in length of N2 characters up to twelve digits. As the reader may recall, the limit to twelve digits was imposed by BCDOCTL (see Section IIA, para 4d).

    Now that each of the functions has been discussed, the emphasis appropriately turns to the use of these functions. In each of the above functions, the operand variable X was used, but there was never any mention of what the Interpreter expected in X. To make the execution of functions more systematic and smoother, the Interpreter uses a special routine PARENCHK, which looks at the argument in the function and returns the value which is to be operated on. The only restriction imposed by the routine is that the argument cannot be a complex expression, and it cannot contain any arrays. The above statement says that complex expressions cannot be used, so obviously we should define what constitutes a complex expression. Any expression which requires the use of parentheses is considered by the Interpreter to be complex. Thus, the restrictions are really saying that the Interpreter cannot handle any data item which contains a parenthesis; thus, an array cannot be used because of the array elements in parentheses. Once the routine has returned the value, in floating point form for computations, the function can derive the value of its desired purpose, and the routine to solve expressions can once again commence execution of the expressions.

    (b) Logical Expressions {626 words}

    As stated above, it is necessary to have two expression routines. In solving normal expressions, the Interpreter is concerned with numerical calculations, in the case of the integer and real number expressions, and with character string manipulations. The reader may be questioning the need for an additional routine for Boolean expressions. The distinction is not drawn because of the data types--that is binary arithmetic as opposed to alphabetic strings. The difference in the two expressions is the operator. The Boolean expression is using logical operators, distussed in the 'IF ... THEN' Statement (see Section IIB, para la3), whereas the normal expression is using normal arithmetic operators. In the case of the character string expressions, there are no operators used. Either the string variable is equal to a string constant enclosed in double quotes, or it is equal to another string variable. To evaluate the Boolean expression, the Interpreter has a routine, BOOLSTOR, which can handle simple and complex expressions; that is, Iogical expressions which may or may not be linked by logical conjunctions. The logical conjunctions which link two logical expressions are:

    1. The 'AND' Logical Connector--For two subexpressions x and y, the logical AND connector will cause the followin actions.

    
    				true  =>  if and only if (x) and (y) is true
    	(x) AND (y) equals		or
    				false =>  otherwise
    
    

    2. The 'OR' Logical connector--For two subexpressions, x and y, the logical OR will cause the following action:

    
    				true  =>  if either (x) or (y) is true
    	(x) OR (y)  equals		or
    				false =>  otherwise
    
    

    3. The 'EOR' Logical Connector--For two subexpressions x and y, the expression is evaluated as follows:

    
    				true  =>  if either (x) or (y), but not both, is true
    	(x) EOR (y) equals		or
    				false =>  otherwise
    
    

    The reader should note that the expressions cited above, x and y, are the same type of expressions as used in the 'WHILE' and 'IF' Statements. As such, x and y will be expressions using logical operators. As may be recalled, these operators are GE, GT, LE, LT, NE, EQ, and when used they must be flanked by periods, and the entire subexpression must be enclosed in parentheses (see Appendix C).

    Missing from the above list of logical connectives is the 'NOT' Operator. This operator has been omitted because the user may easily set up the inverse of a Boolean character in the following manner.

    
    				BOOLEAN A, B
    					.
    					.
    					.
    				A= (B.  NE.  "T")
    					.
    					.
    					.
    				TERMINATE.
    
    

    In the above example, if B is equal to true, then (B.NE."T") will be false, and A will be false; conversely, if B is equal to false, then (B.NE."T") will be true, and A will be true. Thus even without the inverse Boolean operator, the user can still derive the same effect.

    This completes the execution stage of the Interpreter. At this point, the user should have a good idea of the working of the Interpreter, as well as the language itself. Now, the remaining pages will be devoted to the Error Diagnostics and the Data Dump System.

    ENDNOTES

    2. Lee, John A. N., "The Anatomy of a Compiler" (New York, D. Van Nostrand Company, 1974), p. 22.

    3. Control Data Corporation, CDC 3100 COMPUTER SYSTEM REFERENCE MANUAL (St. Paul, Control Data Corporation, 1965),p. 7-30.

    4. Sites, Richard L., "ALGOL W Reference Manual," (Stanford University, 1972), p. 40.

    5. National Book of Standards, Handbook of Mathematical FunctionsWith Formulas, Graphs, and Mathematical Tables, ed. by Milton Abramowitz and Irene A. Stegun (Washington, D. C.: U. S. Government Printing Office, 1964), p. 69.

    6. Ibid., p. 68.

    7. Thomas, George B., Jr., Calculus and Analytical Geometry, (Reading, Massachusetts, Addison Wesley, 1972). p. 463.

    8. Ibid., p. 822.

    9. Ibid., p. 823.

    Section III. The Data Dump

    Back To The Table Of Contents

    Back To TLEE's Home Page

    Send email to: tlee6040@aol.com