Chapter 4

Data-types, variables etc…

Of memory and data-types,

and whether variables are English words,

computer words or quite simply,

storage spaces !

Data is basically of three types: alphanumeric, whole numbers and decimal numbers. In C & C++, these basic types are represented through keywords: char for characters (letters, digits and symbols), int for whole numbers, float for ‘short’ decimals and double for ‘long’ decimals. C++ adds bool for true/false; C had no such type until C99 introduced _Bool. But since the interaction of data, in programming, is between an educated entity, the human being, and an illiterate alien, the computer, translation of this data into an understandable binary equivalent becomes necessary. Hence even everyday items like an arithmetic operator (+, -, *, %) or a date or a time need to be represented through some data-type. The range of an int depends on the compiler. On a 16-bit DOS compiler an int occupies 2 bytes and holds values from –32768 to 32767. For values outside that range we need to qualify the variable as long, so that it can hold values from -2147483648 to 2147483647, as discussed earlier. Similarly, a double type variable is capable of holding larger values, with more digits of precision, than the float type.

Since, as we have discussed earlier, the computer needs to be told about every form of input that is given to it, almost every form of input is some form of data or the other. The keys that are pressed have to be communicated to the computer somehow, so that it can work out which key has been pressed and take the appropriate action. The memory in the computer, again, has to be described by your program in a form that the computer understands. So, this too becomes some form of a data-type.

The keyboard & the mouse

The keyboard is an auxiliary device and as such need not be classified as a data-type, but the keys that are pressed need to be. Each key on the keyboard, when pressed, generates an integer number called a scan code, which is unique for each key. Thus, the computer is able to understand which key has been pressed by its unique scan code. As computer programs are based on input and output, we need to understand the standard input devices, the keyboard and the mouse, thoroughly. But even if the key pressed generates a unique scan code, how will we use this scan code in our program? Obviously by trapping it in some int type variable. And by what means will we obtain it? Either through a function that we write ourselves or through a pre-defined function like getch( ), getche( ) or int86( ). Every function has a return type, and those which return nothing are of void return type. The function getch( ), declared in conio.h on DOS compilers, waits for a key and returns an integer. For an ordinary key this integer is the character code of the key (65 for ‘A’). For a special key such as an arrow or a function key, getch( ) on most DOS compilers returns 0 first, and a second call returns the scan code. getche( ) works the same way but also echoes the character on the screen. int86( ) is a low-level function, declared in dos.h, that calls a software interrupt.

Interrupt

Interrupts, in computer terminology, are signals that make the CPU set aside what it is doing, run a small routine that attends to the request, and then resume where it left off. Because these routines provide services to your program, interrupts are also understood as services in C terminology. The mouse is serviced through an interrupt; the keyboard is serviced through an interrupt; the system timer is serviced through an interrupt, and so on. You must have understood by now that interrupts are actually slicing the CPU’s time among many tasks, so that each one gets attention quickly enough to seem continuous. Simulation is the keyword to understand most of the magic of the computer world. Everything is simulated to seem something. Thus, a cursor, if it blinks, does not actually blink but only simulates blinking. What actually happens is that a line or a small box is drawn and erased a few times every second. But the line or the small box needs to be drawn, right? And if drawing the cursor shape took 1 second, erasing it took half a second and making it reappear took another half a second, the cursor would take 2 seconds to blink once, which is abominably slow in the computer world. Hence, time-slicing, time-sharing and other means of sharing the CPU’s time and resources were conceived, and interrupts are the mechanism on which they rest. Each interrupt service is denoted by an integer number, usually written in hex. In a DOS program these services are requested through the int86( ) function, which takes three arguments: the interrupt number and two sets of CPU register values, one carrying the request in and one carrying the result out. Low-level functions of this kind are fast because they work just one level above the assembly language. The pre-defined functions are written with execution speed in mind. So, when you define a function of your own, make sure that your function does not roam about too much in useless processing, sticks to the minimum task and returns immediately to the calling function.

Execution speed of functions

Keypress. The magic word. In Windows programming everything is event-driven, but in DOS there is no such provision. Hence, we have to determine through our own logic when a key has been pressed and which key it was. Since our program will depend, from start to finish, on user input, either through the keyboard or through the mouse, the whole input process should ideally be placed within a "while" or a "do while" loop.

As far as the mouse is concerned, the task of determining the input type is far easier because there are only three possibilities: a button press, a button release and a mouse move. And only 3 buttons, at times just 1 or 2 (!), to be handled. And just as keys have scan codes, buttons too return values when pressed. With the mouse, however, everything is bit masked. The state of all the buttons comes back in a single integer in which each button has a bit of its own. The computer’s alphabet is 0 and 1, and all data inside it is an arrangement of these 2 digits, which is called the bit pattern. Hence, to trap the mouse action, the bits returned by the mouse are evaluated (and-ed) with another value, the mask, to determine what action to take. The operator used is the bitwise and (&), a close relative of the logical and (&&) which we have discussed in the earlier chapter.

File

Keys, buttons and now, files. Files are another form of storage space, but unlike variables they are a permanent form of storage. A file is used to store a large set of information, in a permanent form, on a hard disk or on any type of disk media. Let us consider the basic operations and elements of "files" and "file manipulation". A file, if you are familiar with a Windows file, has a name, a size and the time of creation attached to it. There is one other element, which is the mode. The mode of a file can be understood as the access specifier for the file. Let us understand the various operations that may be performed on a file. A file may be 1. Opened 2. Read 3. Written on to 4. Deleted 5. Renamed 6. Closed 7. Copied 8. Edited or appended to. For you, the human being, the file "autoexec.bat" can be identified by its name, but for the computer an identifier has to be supplied, which is in the form of a number again, a unique number for each file that is open. The next step after providing an identifier is to tell the computer what kind of access the user seeks on the file. If the user seeks an "open file" action on a given file name, the computer must first check whether the file exists at all and, if it does not, inform the user about it. So, to achieve the above action the computer needs two types of information: the filename and the access mode. The first is supplied in the form of a string (i.e. through a character array) like "filename.ext", and the second as an integer defined by the file organization paradigm of the OS, usually written as a named constant. (A constant can be understood as a variable whose value does not change once initialized.) The access mode, that is "read-only", "write-only" or "read-write", is chosen each time the file is opened. The attributes of a file, such as "read-only", "hidden", "archive" or "system", are a separate matter: they are assigned to the file at the time of its creation and stay with it on the disk.

And since a file is a container for large sets of data, it may be cumbersome to use the basic data-types, which have limited storage capacity, when transferring data to and from files. Hence, a concept called "buffer" is used in computer programming, which solves this problem in file I/O. A buffer is a temporary storage space in memory, large enough to hold a whole block of data at a time. DOS keeps disk buffers of its own as well, and their number is set by the BUFFERS line in the "config.sys" of the DOS OS. Another element in file handling is the handle of the file. In DOS, and hence in C & C++ programs written for it, a file that the human being identifies by its name is identified by a unique number called "a handle", which is returned when the file is opened or created. This handle is subsequently used in the program to access and manipulate data in the file.

Memory

So, another medium of storage is memory. Memory plays a highly integral role in data-storage. Memory is used for storing variable values as well as other elements of programming like a function, a function’s parameters and even the compiled instructions of your C or C++ source code. On a DOS machine, the memory used by a program is divided into segments: a code segment (CS), a stack segment (SS), a data segment (DS) and a "heap". The code segment stores, as you can guess, the code of your program (on a 16-bit DOS compiler, a program that packs many thousands of lines of code into one single file may outgrow its segment, and the compiler will then complain that the program is too large to fit into memory); the stack segment is where function parameters, return addresses and local variables are pushed and popped; the data segment is where the global and static variables actually reside; and the heap is the pool from which memory is handed out while the program runs. But now we are faced with a real, tough task. That is, even if memory is used as a storage medium, how do you actually use it or, in other words, how does your program interact with it? This is how. Memory is identified by unique addresses, which are assigned to entities like a variable or an object, and this address is used to refer to the memory location where a variable, its value or a function resides. For example, if "result" is a variable of type "int" (declared just like int x and int y), we refer to its value through the identifier "result", as in cout << result or printf("%d", result), but the computer knows nothing of this English word, "result". It only understands the convention of assigning a memory address to a variable when it is first declared in your program. This applies to any and every type of variable. As also to another type of variable called a "pointer", a variable which is all about memory, memory references, memory locations and memory addresses only.

Pointer

Pointers hold a special place in C & C++. There are pointers and there are pointers to pointers and there are pointers to pointers to pointers and then there are even more pointers to pointers to pointers to pointers! It is not as silly as it sounds. In fact, pointers are very interesting once you sit down with them. Before proceeding further, it becomes necessary, of course, to define them. A pointer holds the address of the memory location where a value resides. Although many a student considers pointers difficult, tiresome and better left alone due to their complex concept, it is not exactly so. Pointers are a very flexible form of storage because, more often than not, you may need to change the capacity of a variable, and pointers allow you to allocate memory for variables dynamically. Once you begin to use pointers, your handling of memory and even your programming technique will change for the better. All the same, a pointer is a tool to be used cautiously: a pointer that holds a wrong address can corrupt data, and the whole system may crash!

A pointer can point to any basic data type. A pointer variable is declared with an "*" as a prefix to an identifier, as in int *p. The pointer itself has a fixed size, just enough to hold an address, but the memory it points to can be allotted dynamically. That is, its capacity can vary according to the size of the data. A pointer can point to an object, to an array, to a structure or to a string variable and, of course, a pointer can point to a function, too. The pointer to a function, as a concept, lives on in the "delegate", a key concept in the .NET architecture. A pointer is essentially used when the size of data may vary at run-time. Typical examples of such data are text, an image or a buffer. A pointer is also used in complex data structures, which we shall discuss soon, like stacks, queues, binary trees or linked lists. We must take a break here from pointers and consider the above statement deeply. Why are there complex data structures if they are complex? It would seem logical to ignore them and go for the simpler data structures. But then, are there simpler data structures? There are. In fact, it is out of these simple data structures that the complex data structures are formed. When a problem becomes complex, simple data structures cave in, and hence the complex data structures are used. We shall see how. But before that we need to understand the simple data structures, their need, their advantages and their limitations. The first in line of many simple data structures is a structure.

Structure

A structure is a group of related data members/variables/items. These variables are called members of the structure, and they can be accessed only through a variable of the structure type, which is declared just like a variable of type int. If you are familiar with a database, structures will not be very difficult to understand. If you are not, think of a record: all the related data items are grouped together to form a structure. That is, everything a school knows about one student can be maintained in a single structure, since the data is related, and one such record is kept for every student of every class. A structure’s purpose can be defined as a collection of related data. For example, if a school needs to automate its operations, a structure called "studentsdetails" will easily serve the purpose. The structure shall be defined, in this case, as follows

struct studentsdetails {
    char name[50];       /* a string needs an array of char, not a single char */
    int  rollno;
    char address[100];
    int  classno;        /* "class" is a reserved word in C++ */
    char section;
};

And once such a structure has been defined, to access the members of the structure (name, rollno, address, classno & section) a structure variable needs to be declared as below.

struct studentsdetails s;

where ‘s’ is the structure variable. The members of the structure are assigned values as follows. The string members are filled with strcpy( ), declared in string.h, because an array cannot be assigned with "=".

strcpy(s.name, "William Bell");
s.rollno = 21;
strcpy(s.address, "123, High street, Bangkok");
s.classno = 9;
s.section = 'A';

Data access through a structure becomes very easy. Because of its nature, a structure is used more often than not for building complex data structures. Similar to a structure, there is another construct called a ‘union’.

Union

A union is the same as a structure in definition, syntax and access, except that it is defined with the keyword union. There is one major difference between a union and a structure: all the members of a union share the same storage, so only one member of the union can contain a value at a given time, whereas a structure provides separate storage for each of its members. The amount of storage space for the union is based upon the size of the largest member of the union.

union myunion{

int a;

char str[100];

float salary;

}union_var;

where union_var is the name of the union variable. At least 100 bytes of storage space will be allotted to the above union, that being the size of str, its largest member. Before going on to the complex data structures, let us quickly get to know another data type called the array.

Array

Arrays are known as derived data types chiefly because they are built from the basic data types. Arrays are stored in contiguous memory locations, and hence any value in an array can be reached directly through its index, which makes retrieval fast. But an array also has a limitation in that it can be of a fixed size only. This is because an array’s size has to be defined at the time of its declaration and cannot be modified dynamically. An array can be one-dimensional, two-dimensional or multi-dimensional. If you are familiar with matrices, a two-dimensional array is just that, a combination of rows and columns. An array, if it is declared as int arr[100], is understood to be an array of type int, with its size being 100. Such an array starts with element 0 and ends at element 99, so arr[99] is the 100th and last element. An array can be of any basic data-type and can even be an array of pointers. An array can also be an array of objects. Each element of an array stores one value of the array’s type: one number in an int array, one character in a char array. As you must have understood by now, the simpler data structures are for handling basic I/O operations in a computer program. But real-life problems are not tailor-made to suit your convenience. Real-life problems can be numerous, varied and unpredictable. The makers of C have provided the building blocks for all kinds of data structures, to overcome any real-life problem regardless of its complexity. You can understand this provision like a city-planner making a master plan and leaving the architects to fill in the gaps with any type of house. The implementation depends on the programmer’s skills.