http://www.oocities.org/SiliconValley/Peaks/8778/TAU_advprog.html

1 Introduction

1.1 Common information

1.2 What is an algorithm?

We all know algorithms in out everyday life. For example: the description how to make a soup (soup recipe) is an algorithm:

  1. to take a volume
  2. to fill it with water
  3. to put it on the fire
  4. ...

If we recall how a computer is built then we know that there is a processor inside. The only thing the processor can do is to perform very simple instruction, like:

  1. get a number from memory A
  2. get another number from memory B
  3. add B to A
  4. put the result to memory C
  5. ...

This is also an algorithm. The only difference between human and computer algorithm is that an algorithm for computer must be absolutely precise.

1.3 What is a program?

A program is an algorithm that may be executed by a processor. (Of course, this algorithm may be complicated and consist of many more simple algorithms.) The only language a processor understands is a sequence of 0 and 1, so called "binary code". Some pieces of this binary code are commands for a processor, some are data the commands deal with.

It is clear that to write such a binary code for more or less not trivial programs is very not handy for men. So there were created various languages that are still enough formal to be easily converted to the binary code but much more readable (and therefore much easier to write) for humans. Of course, for each such a language there must be some translator that translates a text written in the language to the binary code. Such a translator is called compiler.

One of such languages is C, which we will use in this course. Like all other programming languages C has commands and data.

1.4 Data structures

Let's define so called "data model". A data model means a data object and set of operations defined on such data objects. For example: let's take a data model representing the well-known integer numbers. A data object here is an integer number, operations that we can make on such data objects are usual +, -, *, /, etc.
In C such a data model is called data type. C has built-in simple data types. The main of them are:
integer and float data types (all of the data types possible in C will be recalled later). For example:

int a, b;
float pi, r, s;

a = 5;
b = 3;
a = a + b;
pi = 3.14159;
r = 2.5;
s = pi * r * r;

Also C allows to build more complicated data types using these built-in simple types. Such data types are called struct in C. For example:

struct personal_data
{
	int age;
	int no_of_children;
};

struct personal_data michael;
michael.age = 25;
michael. no_of_children = 2;

C gives us only one built-in operation for such a new data type: access to its members (with operators . and ->).
If we want to define more operations for some specific data type we usually write a function. For example:

int is_adult( struct personal_data person)
{
	if( person.age >= 18)
		return( 1);
	else
		return( 0);
}

It is very regular situation when we need manipulate with not one data object but a collection of them. C has only one built-in construction to build data collection: an array. For example:

int points_for_exercises[5];
points_for_exercises[3] = 5;

Please note: array here is not the data collection, it is a tool to build the data collection.

Again: C has only one built-in operation for arrays: access to an element (a data object) of array (with operator []).

Various data collections may have different features. The main features that should be mentioned are:

Usually if we use some data collection we perform various operations on it, for example:

So such a data collection is also a "data model".

If we have a specific implementation of a data collection then we will call it data structure. E.g. array is the simplest data structure. In fact, using arrays we can build data collection with any features we need.

1.5 Why do we need various data structures?

So we have the reasonable question: why do we need various data structures? The main problem is an efficiency of various operations performed on a data structure.

What is the efficiency of an operation? In fact each operation is an algorithm. The efficiency of an algorithm is Algorithm running time. To understand this let's take array data structure and see 2 examples:

Example 1.1. To print in reverse order file containing exactly 10 lines (each line is not longer than 128 chars - just not to use dynamic memory allocation for now):

#include <stdio.h>

void main()
{
	char buf[10][128];
	int i;

	/* to collect all lines */
	for( i = 0; i < 10; i++)
		gets( buf[i]);
	/* to print what is collected */
	/* don't forget that indexes in C are 0-based */
	for( i = 9; i >= 0; i--)
		puts( buf[i]);
}

Example 1.2. The same as in example 1.1. But to avoid printing the same line twice (i.e. lines with the same contents).

#include <stdio.h>
#include <string.h>

void main()
{
	char buf[10][128];
	int i, j, last, the_same;

	/* to collect unique lines only */
	last = 0;
	for( i = 0; i < 10; i++)
	{
		gets( buf[last]);
		the_same = 0;
		for( j = 0; j < last - 1; j++)
			if( strcmp( buf[j], buf[last]) == 0)
			{
				the_same = 1;
				break;
			}
		if( the_same != 0)
			last++ ;
	}
	/* to print what is collected */
	for( i = last - 1; i >= 0; i--)
		puts( buf[i]);
}

The second example is much more complicated, but what is much more important for us: it takes much more time (see the part of the insertion to array). The difference is that it adds uniqueness in example 1.1 and an array does not fit well for this. What does "much more time" mean? Let's look into insertion part of example 1.1: if we wrote it for number of lines N, then we could see that for each line there is a fixed number of operations executed. We will say that running time of the insertion of N lines is O(N). Let's look into example 1.2: now there is for each line a number of operations dependent on number of lines (N). I.e. to insert each line we must check each line that has been inserted already (max is N-1), so to insert each line is O(N), so insertion of N lines is O(N*N) or O(N2). How the O() may be treated? E.g. if a program takes 100 sec for N=10 and it is O(N), then we can be sure that it will take about 1000 sec for N=100; but if the program is O(N2) then it will take about 10000 sec for N=100.

Well, there are other data structures more efficient for various operations, etc. The more important and usually used of them are covered by this course, for example:

Of course, we will learn the operations (versus algorithms) on these data structures, e.g.

And when we will look the algorithms we will learn some main approaches in the algorithm implementations, e.g. what is iterative and recursive algorithms, etc.

1.6 Reminding the terms used

Algorithm: description of sequence of actions.
Program: an algorithm that may be understood and performed by computer (processor).
Programming language: a language allowing writing of programs.
Data model: data object + set of operations on such data objects.
In C: data object = data type. There are basic types (int, float) only. The struct is used to build new types.
Note: struct is NOT a data type, it is a tool to build data types. Functions should be written to define operations for new data type.
Data collection: data model where data object is a collection of other data objects.
In C: the only built-in data collection (array) with the only operation defined (access to an element) - almost always insufficient, so array usually is used as a tool to build other data collections (defining operations as functions). Another tool is use of dynamic memory allocation and pointers (will be shown later).
The main problem is operations (algorithms) on a data collection. Implementation of data collection must be chosen based on this.
Algorithm running time (algorithm efficiency): a convention how to express the dependency of the time required for an algorithm on size of data collection. The convention used is O().

1.7 Basic programming techniques

This is a more practical issue: how to organize a program to deal with it more handy and easier. We are working under MS Windows in some IDE (integrated development environment): MSVC or BC. Each program there is a project, i.e. each program is independent on other programs (actually it is possible to create more complicated projects, but this is out of our scope in this course). Also it is a commonplace practice to place closely related functions and data (and only them) into one source file. To give access to such functions / data programmers use headers files (we recall them more detailed later). So for example: if we are building some program using a list then the recommended structure of the project (in the simplest case) is:

What's going on when you build this project?

First, the source files are compiled to object files (do you remember what is a compiler?), then these object files are linked to the result executable file:

list.c -> list.obj \
                    > main.exe
main.c -> main.obj /

In such a way we have more or less abstract implementation of the list data structure (in files list.c and list.h) and a program that uses this data structure (in file main.c). If tomorrow we want to write another program that uses this data structure then the only thing we have to do is to write another main.c.

This example is the simplest case of the modularity approach. We separated some functionality (the list data structure) in separated source files, so they became usable by many programs. The next step of modularity is to create library, i.e. to take such source files implementing some functionality and to compile them, i.e. to create a library. For our example we can create a library list.lib containing the file list.obj.
Now if want to use the list in some project then we don't need to copy the source files but just to link the library, i.e. our new project will be:

new_main.c -> main.obj \
                        > new_main.exe
              list.lib /