http://www.oocities.org/SiliconValley/Peaks/8778/TAU_advprog.html
3 Linked List
Let's recall what a data model is:
data model = data object + set of operations
Well, we will discuss the "list" data object with various operations
on it, for example: insertion into a list, deletion from a list,
search in a list, etc.
What is a list? A list is a finite sequence of zero or more
elements. We can have lists of integers, lists of real numbers,
lists of structures, lists of lists of integers, and so on. E.g.
this is a list of integers:
( 1, 2, 3, 5, 2, 6, 4, 1 )
The length of a list is the number of occurrences of elements on the
list. Note: this is the number of "positions", not of distinct
elements, so in our example the length of the list is 8, although
there are only 6 distinct elements.
A list may have various features, e.g. it may be ordered or
non-ordered, it may contain unique elements (each element can occur
in the list only once) or non-unique elements, etc.
We can implement lists with various data structures, e.g. arrays.
But arrays become very inefficient if our lists tend to grow - each
time we reach the end of the array we have to allocate a new array
and copy all the elements into it. So lists are usually represented
by linked cells (nodes) like in the example that was used before.
Such a data structure is called a "linked list".
We will use the following structure to represent a node of the list
(e.g. for list of integers):
typedef struct list_node
{
    int val;
    struct list_node* next;
} NODE;
Note: A working example of a linked list implementation is
provided:
http://www.oocities.org/SiliconValley/Peaks/8778/TAU_advprog-IntList.html
.
You can use it freely for any purposes (including the use as a
base for your exercises).
3.1 The simplest operations on list. Iterative and recursive algorithms.
Let's look at 2 very basic operations on a linked list, namely
search in a list and deletion from a list. (An example
implementation of insertion into such a list was given before.
Various insertions will be discussed a little bit later.)
Let's write a function that checks whether an integer number is
contained in a list of integers:
int lookup( NODE* list, int x)
{
    NODE* n;
    for( n = list; n != NULL; n = n->next)
        if( n->val == x)
            return( 1);
    return( 0);
}
(Let's recall what the running time of an algorithm is. Our "lookup"
algorithm takes O(N) time.)
Now let's introduce a very interesting and useful thing: recursive
algorithms. What does "recursive algorithm" mean? It is an
algorithm that calls itself, usually to perform the same actions
for some subtask, i.e. a reduced version of the original task.
Many programming languages (including C) allow functions to call
themselves. This allows us to implement recursive algorithms
easily.
Let's take our "lookup" algorithm. You can see that after each
iteration of the loop we perform the same search, but on the tail
of the list, i.e. on a sublist. So we can write this function
recursively, e.g.:
int lookup( NODE* list, int x)
{
    if( list == NULL)
        return( 0);
    else if( list->val == x)
        return( 1);
    else
        return lookup( list->next, x);
}
The running time of this function is still O(N).
OK, let's implement another basic operation on a list: deletion.
Let's assume that our deletion function must remove from the list
ALL nodes with a given value. The main difference here is that this
operation may change the structure of our list (i.e. delete some
nodes from it); such a change may affect the first node as well,
therefore as a result we may get a changed pointer to the beginning
of our list (to the 1st node of the list). So the function must both
accept this pointer as one of its parameters and return its possibly
changed value.
This may be solved in 2 ways: 1. returning the new value, or
2. passing a pointer to the variable being changed - in our case a
pointer to the pointer to the 1st node.
The iterative implementations may be:
NODE* remove( NODE* list, int x)
{
    NODE* n;
    NODE* next_n;
    NODE* prev_n = NULL;
    for( n = list; n != NULL; )
    {
        next_n = n->next;
        if( n->val == x)
        {
            free( n); /* if the nodes were malloc'ed */
            if( prev_n == NULL)
                /* if it IS the 1st node */
                list = next_n;
            else
                /* if it IS NOT the 1st node */
                prev_n->next = next_n;
        }
        else
            /* if a node has been deleted then the previous
             * node remains the same, so we advance it
             * only if nothing was deleted */
            prev_n = n;
        n = next_n;
    }
    return( list);
}
Or using "pointer to pointer":
void remove( NODE** list, int x)
{
    NODE** n_ptr;
    NODE* next_n;
    for( n_ptr = list; *n_ptr != NULL; )
    {
        if( (*n_ptr)->val == x)
        {
            next_n = (*n_ptr)->next;
            free( *n_ptr); /* if the nodes were malloc'ed */
            *n_ptr = next_n;
        }
        else
            n_ptr = &( (*n_ptr)->next);
    }
}
Also we can write this function recursively, e.g.:
void remove( NODE** n_ptr, int x)
{
    NODE* next_n;
    if( *n_ptr == NULL)
        return;
    if( (*n_ptr)->val == x)
    {
        next_n = (*n_ptr)->next;
        free( *n_ptr); /* if the nodes were malloc'ed */
        *n_ptr = next_n;
    }
    else
        remove( &( (*n_ptr)->next), x);
}
The running time of all these implementations is O(N).
3.2 Running time of insertion into a non-ordered list; insertion
into an ordered list.
Let's recall the example of insertion into a list. There was the
following algorithm:
void insert( NODE** list, int x)
{
    NODE** node;
    NODE** new_node;
    /* if our list is empty - that's all */
    if( *list == NULL)
        new_node = list;
    else
    {
        /* let's find the last element of the list */
        for( node = list; (*node) != NULL;
             node = &( (*node)->next))
            ;
        new_node = node;
    }
    *new_node = malloc( sizeof( NODE));
    (*new_node)->val = x;
    (*new_node)->next = NULL;
}
What does this algorithm do? First, it always adds the new node to
the end of the list. Second, it does not check whether such an
element already exists. So we have here a non-ordered, non-unique
list. The running time of this algorithm is O(N).
If the order of elements in the list is really not important then we
can dramatically reduce the running time by adding new nodes to the
beginning of the list instead of its end:
void insert( NODE** list, int x)
{
    NODE* next_n;
    next_n = *list;
    *list = malloc( sizeof( NODE));
    (*list)->val = x;
    (*list)->next = next_n;
}
This function takes O(1) time!
But if we want to have an ordered or a unique list then we have to
take our first "insert" and add the appropriate checks to it. For
example, insertion into a unique list may look like this:
void insert( NODE** list, int x)
{
    NODE** n;
    NODE** new_node;
    NODE* next_n;
    /* if our list is empty - that's all */
    if( *list == NULL)
        new_node = list;
    else
    {
        /* let's find the last element of the list */
        for( n = list; (*n) != NULL;
             n = &( (*n)->next))
            if( (*n)->val == x)
                return; /* x already exists there */
        new_node = n;
    }
    next_n = *new_node;
    *new_node = malloc( sizeof( NODE));
    (*new_node)->val = x;
    (*new_node)->next = next_n;
}
And insertion into an ordered list may be written like this (it is
very similar to the previous example):
void insert( NODE** list, int x)
{
    NODE** n;
    NODE** new_node;
    NODE* next_n;
    /* if our list is empty - that's all */
    if( *list == NULL)
        new_node = list;
    else
    {
        /* let's find the place where to insert */
        for( n = list; (*n) != NULL;
             n = &( (*n)->next))
            if( (*n)->val >= x)
                break; /* we reached the place */
                       /* where to insert */
        new_node = n;
    }
    next_n = *new_node;
    *new_node = malloc( sizeof( NODE));
    (*new_node)->val = x;
    (*new_node)->next = next_n;
}
It is easy to see that:
- In most cases the implementation of the insertion operation
defines the features of the linked list (ordered, unique, etc.)
- In most cases the running time of the insertion operation is O(N)
for linked lists.
3.3 Selection sorting.
It is a frequent situation that we have a non-ordered list while we
need it ordered ("list" here means the "list" data model, not a
linked list, so it may be implemented as an array, a linked list or
however else). So we have to sort the list. There are many different
sorting algorithms. Let's discuss some of them. We will start with
the "selection sort" algorithm - one of the simplest sorting
algorithms.
3.3.1 Abstract of the algorithm.
Suppose we have a list of N integers that we want to sort.
In the first iteration we will find ("select") the first cell
containing the smallest element in the whole list and exchange it
with the first cell of the list. In the second iteration we will
find the smallest element in the list excepting the first cell and
exchange it with the second cell of the list. And so on.
In such a way, all the time our list consists of 2 parts: one
already sorted and one not sorted yet. At the very beginning the
first part is empty while the second part is the whole list. On
each iteration the sorted part grows by 1 cell and the second part
shrinks by 1 cell. We continue until the second part becomes empty.
3.3.2 Use of selection sort for an array of integers.
Let's see how "selection sort" may be implemented for an array of
integers (the array here is an implementation of a list of
integers).
3.3.2.1 Iterative version.
void SelectionSort( int A[], int size)
{
    int i, j, small, temp;
    for( i = 0; i < size - 1; i ++)
    {
        /* set small to the index of the first */
        /* occurrence of the smallest element */
        /* remaining in the unsorted part */
        small = i;
        for( j = i + 1; j < size; j ++)
            if( A[j] < A[small])
                small = j;
        /* now we can exchange A[small] and A[i] */
        temp = A[small];
        A[small] = A[i];
        A[i] = temp;
    }
}
Here we have the outer loop going through (almost) all elements of
the array. For each iteration of the outer loop we perform the inner
loop that goes through the rest of the array, so it makes about N/2
iterations on average. So the running time of this algorithm is
O(N*N/2), i.e. O(N^2).
3.3.2.2 Recursive version.
Now let's see how we can write a recursive version of this
algorithm:
void SelectionSort( int A[], int i, int size)
{
    int j, small, temp;
    if( i == size - 1)
        return;
    /* find the first position of the smallest element */
    small = i;
    for( j = i + 1; j < size; j ++)
        if( A[j] < A[small])
            small = j;
    /* exchange A[small] and A[i] */
    temp = A[small];
    A[small] = A[i];
    A[i] = temp;
    /* and perform the same for the rest of the array */
    SelectionSort( A, i + 1, size);
}
This function performs only one step of the algorithm: it finds the
first occurrence of the smallest element and exchanges it with the
first cell of the unsorted part. After this we know that the 1st
element is in its place, so we call the same procedure for the rest
of the array.
3.4 Algorithm paradigm "Divide-and-Conquer".
One of the common ways to solve a problem is to try to break it into
(somehow simpler) subproblems, then solve the subproblems and
combine their solutions into a solution of the whole problem. The
term "Divide-and-Conquer" is used for this technique. In particular,
if the subproblems are similar to the original problem, then we may
use the same method to solve the subproblems (e.g. recursively).
There are 2 requirements that must be satisfied:
- The subproblems must be simpler than the original problem.
- After a finite number of subdivisions, we must get a problem that
can be solved without further dividing it.
So we have the following general steps for this approach:
- Divide the task into subtasks
- Solve the "small" problem for each "small" subtask
- Merge the results of the subtasks
It's clear that if we want to apply such a method to some particular
problem then we must have 3 different "subalgorithms":
- A method for division.
- A method for solving the "simplest" subtask.
- A method for merging.
For the "Divide-and-Conquer" approach to be effective, these 3
"subalgorithms" must together be more efficient than a general
algorithm for the problem as a whole.
We will discuss an example of powerful use of this technique: another
sorting algorithm called "Merge Sort".
3.5 Merge sorting.
The main idea of the "Divide-and-Conquer" approach is implemented
very straightforwardly here. First we split our list into some
number (say 2) of sublists of equal length, then in turn we split
each of them again until we reach lists with only 1 cell. Then we
build our list back from these "elementary" sublists, but already
ordered. So the main procedure may look like this:
NODE* MergeSort( NODE* list)
{
    NODE* second_list;
    if( list == NULL)
        return( NULL);
    if( list->next == NULL)
        return( list);
    /* we have at least 2 cells in the list */
    second_list = split( list);
    /* BTW, the original list "lost" half of its cells - */
    /* they moved to the second list */
    return( merge( MergeSort( list), MergeSort( second_list)));
}
3.5.1 Dividing a list into sub-lists.
Let's divide our list (on each step) into 2 equal lists (or almost
equal, if the original list has odd length) by moving all
even-numbered cells to a newly created second list. We choose such
a division because it may be easily implemented recursively:
NODE* split( NODE* list)
{
    NODE* second_cell;
    if( list == NULL)
        return( NULL);
    if( list->next == NULL)
        return( NULL);
    /* we have at least 2 cells */
    second_cell = list->next;
    list->next = second_cell->next;
    second_cell->next = split( second_cell->next);
    return( second_cell);
}
3.5.2 Merging of ordered lists.
Now we have to write a function for merging 2 ordered lists:
NODE* merge( NODE* list1, NODE* list2)
{
    if( list1 == NULL)
        return( list2);
    if( list2 == NULL)
        return( list1);
    if( list1->val <= list2->val)
    {
        list1->next = merge( list1->next, list2);
        return( list1);
    }
    else
    {
        list2->next = merge( list1, list2->next);
        return( list2);
    }
}
Please note that this function may be useful standalone, if you have
2 already ordered lists and want to build from them 1 common list.
Please note: one of the skills we have to master is understanding
programs written by somebody else. So if these procedures do not
look too clear - don't be afraid. Take the functions, insert
printing in various places and see how they work. They are not too
complicated.
3.6 Quick Sort
Of the sorting algorithms we have seen so far, Selection Sort has
running time O(N^2) while Merge Sort has running time O(N log2 N).
There is another algorithm whose average running time is also
O(N log2 N), but which is usually the fastest in practice. This
algorithm is called "Quick Sort". There are many various
modifications of the Quick Sort but all of them use the same
idea. This idea will be discussed now.
Suppose we have the following list:
[ 3 5 2 8 1 5 2 ]
Let's choose some value that is more or less the median of the
values in the list (please pay attention: this is one of the main
problems of the Quick Sort algorithm). Suppose we know the interval
of our values and choose the value 4.5. Let's divide our list into
2 parts: the left part should contain only elements less than or
equal to our chosen value (4.5) while the right part should contain
only elements greater than 4.5. How will we do it? Let's start from
the leftmost element of the list and find the first number greater
than 4.5 - this is the first number that must be moved to the right
part:
[ 3 5 2 8 1 5 2 ]
    ^
Now let's find the last (rightmost) element less than or equal to 4.5:
[ 3 5 2 8 1 5 2 ]
              ^
Now we can exchange these 2 elements:
[ 3 2 2 8 1 5 5 ]
    ^         ^
Now we repeat this procedure, but this time we can start from the
positions where the exchanged elements were found; we repeat such
exchanges until our "pointers" to the elements meet. I.e. the
sequence of the list changes is:
[ 3 2 2 8 1 5 5 ]
    ^         ^
[ 3 2 2 8 1 5 5 ]
        ^ ^
[ 3 2 2 1 8 5 5 ]
        ^ ^
Now we can split our list into two parts:
[ 3 2 2 1 ] [ 8 5 5 ]
Let's forget (temporarily) about the bigger part and perform the
same for the smaller one (the chosen value will be 6.5 now - such a
choice is much easier after the first split, because while
exchanging elements we passed over the whole list, so we know the
maximal and minimal values of each part, left and right):
[ 8 5 5 ]
  ^   ^
[ 5 5 8 ]
  ^   ^
[ 5 5 8 ]
    ^ ^
Well, to be pedantic, we have to repeat the splitting again (using
the value 6.5), but now we will not exchange any elements, so we get
the sorted lists:
[ 5 5 ] [ 8 ]
The only thing we have to do is to merge them back into:
[ 5 5 8 ]
and to recall the part we forgot:
[ 3 2 2 1 ]
Well, now we can do the same for it:
[ 3 2 2 1 ] - choose 1.5
  ^     ^
[ 1 2 2 3 ]
  ^     ^
[ 1 2 2 3 ]
  ^ ^
[ 1 ] [ 2 2 3 ] - choose 2.5
          ^ ^
[ 1 ] [ 2 2 ] [ 3 ]
And after merging everything back we get our list sorted:
[ 1 2 2 3 5 5 8 ]
OK: when the chosen values are close to the medians, each split
divides a part into two halves of roughly equal size, so the depth
of the splitting is not more than about log2 N. On each level of the
splitting we pass over at most N elements in total, so the overall
running time is O(N log2 N).
The main problem here is the choice of the initial "median" value.
This is really important, because if the value is "bad" then we can
get the same behaviour as in "Selection Sort" (e.g. if we always
split the list into a first part with 1 leftmost element and a
second part containing all the other elements). Most modifications
of "Quick Sort" differ in how they solve this problem. One solution
(the first one, suggested by the algorithm's author) is to take the
value of the first element: if the list is really random then this
number is neither better nor worse than any other. Another solution
is to take several elements at the beginning of the list and
calculate their average value. And so on. The best solution may be
chosen only if you know something about the nature of your list.
3.7 Other sorting algorithms
We discussed the main sorting algorithms: "Selection Sort", "Merge
Sort" and "Quick Sort". In fact there are many more such algorithms,
each of which fits certain circumstances better. For example, there
are cases when you cannot load all your data into memory, so there
are algorithms for so-called "external sorting". And so on, and so
on. If you face some specific problem requiring sorting and you feel
that the "trivial" algorithms discussed here do not fit your needs
then you should refer to an appropriate book or other source.