DATA STRUCTOR

DATA STRUCTOR

STACKS
A stack is used to provide temporary storage space for values. It is defined as a data structure which operates on a first in, last out basis. Its uses a single pointer (index) to keep track of the information in the stack.

The basic operations associated with a stack are,

insert (push) an item onto the stack
remove (pop) an item from the stack

The following diagram shows an empty stack of four locations. It looks just like an array, and it has an index pointer pointing to the beginning of the stack (called the TOP of the stack).

     +--------+

     |        |   <------- Stack Pointer

     +--------+

     |        |

     +--------+

     |        |

     +--------+

     |        |

     +--------+

Pushing items onto the stack
The stack pointer is considered to be pointing to a free (empty) slot in the stack. A push operation thus involves copying data into the empty slot of the stack, then adjusting the stack pointer to point to the next free slot.

   Module Push

        stack[stack_pointer] = data;

        stack_pointer = stack_pointer + 1;

   Module End

Lets push the value 45 onto the stack. [Note: The stack could hold any data type, not necessarily decimal numbers. We have used numbers for simplicity]. The stack now looks like

     +--------+

     |   45   |

     +--------+

     |        |   <------- Stack Pointer

     +--------+

     |        |

     +--------+

     |        |

     +--------+

Note how the stack pointer has been adjusted to point to the next free location in the stack. [Note: for the time being we are ignoring certain problems. We will address these shortly!!!].

Removing items from the stack
To remove an item, first decrement (subtract 1 from) the stack pointer, then extract the data from that position in the stack.

   Module Remove

        stack_pointer = stack_pointer - 1;

        data = stack[stack_pointer];

   Module End

Time now to address the problems of the above implementation
There are a number of problems associated with the above routines for pushing and removing items.

the push module does not check to see if the stack is full
the remove module does not check to see if the stack is empty

There are a number of solutions to this problem. We shall present a simplified solution. We do not argue that it is the best, just that it is one of a possible number of solutions.

   Comment: Assume that the array elements begin at 1

   Comment: Assume that the maximum elements of the stack is MAX

   Var: stack[MAX] : Integer;

   Module Initialize

        stack_pointer = 1;

   Module End

   Module Push

        if stack_pointer >= MAX then

           return error

        else begin

           stack[stack_pointer] = data;

           stack_pointer = stack_pointer + 1;

end

   Module End

   Module Remove

        if stack_pointer <= 1 then

           return error

        else begin

           stack_pointer = stack_pointer - 1;

           data = stack[stack_pointer];

end

   Module End

QUEUES
Everybody has experience with queues as they are common place. Queues occur in cafeterias, petrol stations, shopping centres, anyplace where many people (customers) line up for few resources (cashier's, salespeople, petrol pumps etc).

The purpose of a queue is to provide some form of buffering. Typical uses of queues in computer systems are,

process management
buffer between the fast computer and a slow printer

A queue is defined as a data structure which holds a series of items to be processed on a first in first out basis (though some queues can be sorted in priority). The operations associated with a queue are,

initialize the queue
insert an item in the queue
remove an item from the queue
count the number of empty slots in the queue
count the number of items in the queue

The following diagram shows an empty queue. It is identified as a series of ten locations, and two pointers named front and rear. These two pointers keep track of where the front and rear of the queue is.

          1   2   3   4   5   6   7   8   9  10

        +---+---+---+---+---+---+---+---+---+---+

        |   |   |   |   |   |   |   |   |   |   |

        +---+---+---+---+---+---+---+---+---+---+

        +---+

        | 1 | Front

        +---+

        +---+

        | 10| Rear

        +---+

The front pointer is used to delete items, and the rear pointer insert items. Inserting two items called A and B will rearrange the queue to look like,

          1   2   3   4   5   6   7   8   9  10

        +---+---+---+---+---+---+---+---+---+---+

        | A | B |   |   |   |   |   |   |   |   |

        +---+---+---+---+---+---+---+---+---+---+

        +---+

        | 1 | Front

        +---+

        +---+

        | 2 | Rear

        +---+

The pseudo-code for the various queue operations follows.

        module initialize

               count = 0

               head = 1

               rear = size of queue

        end module initialize

        module insert

               if count = size of queue

                       queue is full and do not insert

               else

               begin

                       increment count

                       increment rear

                       if rear > size of queue

                               rear = 1

                       endif

                       insert data at queue[rear]

               endif

        end module insert

        module remove

               if count = 0

                       queue is empty and do not remove

               else

               begin

                       data = queue[front]

                       decrement count

                       increment front

                       if front > size of queue

                               front = 1

                       endif

               endif

        end module remove

        module count

               return count

        end module count

        module free_space

               return queue.size - count

        end module free_space

The implementation of this is left to the student as a programming exercise.

LINKED LISTS
Data structures can be arranged in memory in a variety of different ways. Each method has advantages and disadvantages. Arrays for example seem easy to use, where each element is stored sequentially in memory.

This type of approach is not efficient in larger computer systems, where a number of users share main memory. In such circumstances, there may not be enough contiguous main memory left to hold a sequentially stored data structure like an array (but there could be enough if all the small free blocks were moved into one large block).

One way to overcome this is to link all elements of data together. Data is arranged into records, and a special item is added to each record called a pointer (a link field), which contains a link to the next record etc. Renaming each record as a node and we have a complex data structure called a linked list.

A linked list is a series of nodes connected together via pointers. Pictorially, it looks like,

         Node 1          +---+ Node 2        +---+ Node n

        +--------+--+    |  +--------+--+    |  +--------+--+

        |        | ------+  |        | ------+  |        |  |

        +--------+--+       +--------+--+       +--------+--+

           Data   Link        Data    Lnk         Data    Lnk

In Pascal, a node definition looks like,

        type    ptr = ^node;

               node =  record

                       data : string[20];

                       next : ptr;

               end;

The following Pascal program declares three nodes, inserts data into each of them, then using a pointer, cycles through the chain from node to node printing out the contents of each node. The value 0 is always stored in the last node of a chain, to indicate the end of the list.

        program linked_list;

        type    ptr = ^node;

               node = record

                       data : string[20];

                       next : ptr;

                       end;

var

               node1, node2, node3 : node;

               nodeptr : ptr;

        begin

               node1.data := 'Welcome ';

               node1.next := @node2;

               node2.data := 'to ';

               node2.next := @node3;

               node3.data := 'Linked Lists.';

               node3.next := nil;

               nodeptr := @node1;

               while nodeptr <> nil do

               begin

                       write( nodeptr^.data );

                       nodeptr := nodeptr^.next;

               end;

                writeln;

        end.

Linked lists are used in a wide variety of applications.

directory structures
operating systems for process management
data management in data base systems.

An advantage of storing data using a linked list is that the key sequence can be altered by changing the pointers, not by moving data nodes. This means sorting data is quick, but searching is slower as you must follow the chain from node to node.

HASHING
Hashing is a technique used to improve data access times. Rather than sorting the data according to a key, it computes the location of the data from the key. In simple terms, it computes an index value from the key of the desired record. At that index value, the record is stored/retrieved.

If we had an array of 1000 elements, we would expect our hash function (the method used to calculate the index value) to evenly distribute the keys over this range. In practice this is difficult to achieve. It is possible that two different search keys might generate the same hash value (ie, index), and we call this a collision.

The size of the table (array) is generally a prime number, as this has the least probability of creating collisions.

The following code segment defines an array of records in Pascal which will be used to demonstrate hashing.

        const   size = 511;

        type    data = record

                       name : string[20];

                       age : integer;

               end;

               hashtable = array[1..size] of data;

Next, lets initialize the table with zero's, so this makes it easy to see if information is already stored at a particular element (important when inserting data later).

        procedure initialize ( var Table : hashtable );

        var i : integer;

        begin

               for i := 1 to size do

               begin

                       Table[i].name := '                    ';

                       Table[i].age := 0;

               end;

        end;

The procedure insert will take the hash value and the data and insert it into the table. If the position is already occupied (ie, a collision occurs) the table will be searched from that point onwards till an empty slot occurs.

        procedure insert ( Position : integer; Element : data ; var Table : hashtable );

        begin

               while Table[Position].name[1] <> ' ' do

                       Position := (Position + 1) mod size;

               Table[Position] := Element;

        end;

The procedure hash will generate a hash value from a given data record. In deciding this, it must generate a value between 1 and 511. Obviously, we cannot use the age field of the record, this will generate too many collisions. The best method is to add up the value of all the characters in the persons name, then extract nine bits from it (2^9 = 512) to generate the index value.

        procedure hash ( var Position : integer ; Element : data ; var Table : hashtable );

        var i : integer;

        begin

               i := 1;

               position := 0;

               while i < 20 do

                       position := position + ord(Element.name[i]);

               position := position mod 511;

        end;

The remaining procedures to search and delete are left as a programming exercise for the student.

INDEXED SEQUENTIAL
This method seeks to make a direct access file appear like a sequence file. The sequencing is achieved via an index table, which holds the keys of the records and their position within the file.

A program reads the index sequentially, and when it locates the key it desires, it reads the record from the indicated position.

The advantages of this method are,

the records need not be in key sequence
the records may be processed sequentially or directly

FILE MERGING
This is the process of merging two or more input files into a single output file (which are normally all sequential in nature). The files to be merged are first sorted using the same key sequence, which is preserved in the output file.

A pseudo-code example for a 2-way merge is shown below.

        program two_way merge

        begin

               read 1st record of infile1, name as rec1

               read 1st record of infile2, name as rec2

               while rec1 or rec2 is not an end-of-record

               begin

                       if rec1.key < rec2.key

                       begin

                               write rec1 to outfile

                               read new rec1 from  infile1

end

                       else

                       begin

                               write rec2 to outfile

                               read new rec2 from infile2

end

                       endif

               endwhile

               write end-of-record to outfile

        end program two_way merge