http://www.oocities.org/SiliconValley/Peaks/8778/TAU_advprog.html
1 Introduction
1.1 Common information
- Lecturer's name, phone number,
email.
- Assistant's name, phone number, email.
- The name of the course: "Advanced Programming"
- The plan of the course
- The main book of the course: Alfred V. Aho & Jeffrey
D. Ullman: Foundation of Computer Science. C Edition, Computer
Science Press 1995
- The course structure:
- --- up to 6 exercises (each one is 2 points max)
- --- mid-term ("magen") (20 points)
- --- examine (the rest of points)
1.2 What is an algorithm?
We all know algorithms in out everyday life. For example: the
description how to make a soup (soup recipe) is an algorithm:
- to take a volume
- to fill it with water
- to put it on the fire
- ...
If we recall how a computer is built then we know that there is
a processor inside. The only thing the processor can do is to perform
very simple instruction, like:
- get a number from memory A
- get another number from memory B
- add B to A
- put the result to memory C
- ...
This is also an algorithm. The only difference between human and
computer algorithm is that an algorithm for computer must be
absolutely precise.
1.3 What is a program?
A program is an algorithm that may be executed by a processor. (Of
course, this algorithm may be complicated and consist of many more
simple algorithms.) The only language a processor understands is a
sequence of 0 and 1, so called "binary code". Some pieces of this
binary code are commands for a processor, some are data the commands
deal with.
It is clear that to write such a binary code for more or less not
trivial programs is very not handy for men. So there were created
various languages that are still enough formal to be easily converted
to the binary code but much more readable (and therefore much easier
to write) for humans. Of course, for each such a language there must
be some translator that translates a text written in the language to
the binary code. Such a translator is called compiler.
One of such languages is C, which we will use in this course. Like
all other programming languages C has commands and data.
1.4 Data structures
Let's define so called "data model". A data model means a data object
and set of operations defined on such data objects. For example: let's
take a data model representing the well-known integer numbers. A data
object here is an integer number, operations that we can make on such
data objects are usual +, -, *,
/, etc.
In C such a data model is called data type. C has built-in simple
data types. The main of them are:
integer and float data types (all of the data types possible in C
will be recalled later). For example:
int a, b;
float pi, r, s;
a = 5;
b = 3;
a = a + b;
pi = 3.14159;
r = 2.5;
s = pi * r * r;
Also C allows to build more complicated data types using these
built-in simple types. Such data types are called struct in C. For
example:
struct personal_data
{
int age;
int no_of_children;
};
struct personal_data michael;
michael.age = 25;
michael. no_of_children = 2;
C gives us only one built-in operation for such a new data type:
access to its members (with operators . and
->).
If we want to define more operations for some specific data type
we usually write a function. For example:
int is_adult( struct personal_data person)
{
if( person.age >= 18)
return( 1);
else
return( 0);
}
It is very regular situation when we need manipulate with not one
data object but a collection of them. C has only one built-in
construction to build data collection: an array. For example:
int points_for_exercises[5];
points_for_exercises[3] = 5;
Please note: array here is not the data collection, it is a tool to
build the data collection.
Again: C has only one built-in operation for arrays: access to an
element (a data object) of array (with operator []).
Various data collections may have different features. The main
features that should be mentioned are:
- data objects in collections are ordered / not ordered;
- each data object is unique / not unique (per collection);
- the collection has pre-known size / unknown size.
Usually if we use some data collection we perform various operations
on it, for example:
- insertion of a data object
- deletion of a data object
- search of a data object
- sort of the collection
- union of 2 collections
- intersection of 2 collections
- difference of 2 collections
- ...
So such a data collection is also a "data model".
If we have a specific implementation of a data collection then we
will call it data structure. E.g. array is the simplest data
structure. In fact, using arrays we can build data collection with
any features we need.
1.5 Why do we need various data structures?
So we have the reasonable question: why do we need various data
structures? The main problem is an efficiency of various operations
performed on a data structure.
What is the efficiency of an operation? In fact each operation is
an algorithm. The efficiency of an algorithm is Algorithm running
time. To understand this let's take array data structure and
see 2 examples:
Example 1.1. To print in reverse order file containing exactly 10
lines (each line is not longer than 128 chars - just not to use
dynamic memory allocation for now):
#include <stdio.h>
void main()
{
char buf[10][128];
int i;
/* to collect all lines */
for( i = 0; i < 10; i++)
gets( buf[i]);
/* to print what is collected */
/* don't forget that indexes in C are 0-based */
for( i = 9; i >= 0; i--)
puts( buf[i]);
}
Example 1.2. The same as in example 1.1. But to avoid printing the
same line twice (i.e. lines with the same contents).
#include <stdio.h>
#include <string.h>
void main()
{
char buf[10][128];
int i, j, last, the_same;
/* to collect unique lines only */
last = 0;
for( i = 0; i < 10; i++)
{
gets( buf[last]);
the_same = 0;
for( j = 0; j < last - 1; j++)
if( strcmp( buf[j], buf[last]) == 0)
{
the_same = 1;
break;
}
if( the_same != 0)
last++ ;
}
/* to print what is collected */
for( i = last - 1; i >= 0; i--)
puts( buf[i]);
}
The second example is much more complicated, but what is much more
important for us: it takes much more time (see the part of the
insertion to array). The difference is that it adds uniqueness in
example 1.1 and an array does not fit well for this. What does
"much more time" mean? Let's look into insertion part of example
1.1: if we wrote it for number of lines N, then we could see that
for each line there is a fixed number of operations executed. We
will say that running time of the insertion of N lines is O(N). Let's
look into example 1.2: now there is for each line a number of
operations dependent on number of lines (N). I.e. to insert each line
we must check each line that has been inserted already (max is N-1),
so to insert each line is O(N), so insertion of N lines is O(N*N)
or O(N2). How the O() may be treated? E.g. if a program
takes 100 sec for N=10 and it is O(N), then we can be sure that it
will take about 1000 sec for N=100; but if the program is
O(N2) then it will take about 10000 sec for N=100.
Well, there are other data structures more efficient for various
operations, etc. The more important and usually used of them are
covered by this course, for example:
- list
- queue and stack
- trees
- graphs
Of course, we will learn the operations (versus algorithms) on these
data structures, e.g.
- sorting algorithms
- searching algorithms
- etc.
And when we will look the algorithms we will learn some main
approaches in the algorithm implementations, e.g. what is iterative
and recursive algorithms, etc.
1.6 Reminding the terms used
Algorithm: description of sequence of actions.
Program: an algorithm that may be understood and performed
by computer (processor).
Programming language: a language allowing writing of programs.
Data model: data object + set of operations on such data
objects.
In C: data object = data type. There are basic types (int, float)
only. The struct is used to build new types.
Note: struct is NOT a data type, it is a tool to build data
types. Functions should be written to define operations for new data
type.
Data collection: data model where data object is a collection
of other data objects.
In C: the only built-in data collection (array) with the only
operation defined (access to an element) - almost always insufficient,
so array usually is used as a tool to build other data collections
(defining operations as functions). Another tool is use of dynamic
memory allocation and pointers (will be shown later).
The main problem is operations (algorithms) on a data
collection. Implementation of data collection must be chosen based
on this.
Algorithm running time (algorithm efficiency): a convention
how to express the dependency of the time required for an algorithm
on size of data collection. The convention used is O().
1.7 Basic programming techniques
This is a more practical issue: how to organize a program to deal
with it more handy and easier. We are working under MS Windows in
some IDE (integrated development environment): MSVC or BC. Each
program there is a project, i.e. each program is independent
on other programs (actually it is possible to create more
complicated projects, but this is out of our scope in this
course). Also it is a commonplace practice to place closely related
functions and data (and only them) into one source file. To give
access to such functions / data programmers use headers files
(we recall them more detailed later). So for example: if we are
building some program using a list then the recommended structure of
the project (in the simplest case) is:
- list implementation file (list.c) - there are all functions
implementing the list functionality
- list definitions file (list.h) - there are prototypes
(i.e. declarations) of the functions from list.c
- code using list functionality (main.c) - this is the program
using the list (i.e. the functions prototyped in list.h).
What's going on when you build this project?
First, the source files are compiled to object files (do you
remember what is a compiler?), then these object files are linked
to the result executable file:
list.c -> list.obj \
> main.exe
main.c -> main.obj /
In such a way we have more or less abstract implementation of the
list data structure (in files list.c and list.h) and a program that
uses this data structure (in file main.c). If tomorrow we want to
write another program that uses this data structure then the only
thing we have to do is to write another main.c.
This example is the simplest case of the modularity approach.
We separated some functionality (the list data structure) in
separated source files, so they became usable by many programs. The
next step of modularity is to create library, i.e. to take such
source files implementing some functionality and to compile them,
i.e. to create a library. For our example we can create a library
list.lib containing the file list.obj.
Now if want to use the list in some project then we don't need to
copy the source files but just to link the library, i.e. our new
project will be:
new_main.c -> main.obj \
> new_main.exe
list.lib /