
3 Linked List

Let's recall what a data model is:

data model = data object + set of operations
Well, we will discuss the "list" data object with various operations on it, for example: insertion into a list, deletion from a list, search in a list, etc.

What is a list? A list is a finite sequence of zero or more elements. We can have lists of integers, lists of real numbers, lists of structures, lists of lists of integers, and so on. E.g. this is a list of integers:

( 1, 2, 3, 5, 2, 6, 4, 1 )
The length of a list is the number of occurrences of elements in the list. Note: this is the number of "positions", not distinct elements, so in our example the length of the list is 8, although there are only 6 different elements.

A list may have various features, e.g. it may be ordered or non-ordered, it may contain unique elements (each element can occur in the list only once) or non-unique ones, etc.

We can implement lists by various data structures, e.g. arrays. But arrays become very inefficient if our lists tend to grow - each time we reach the end of the array we have to allocate a new array and copy all the elements into it. So usually lists are represented by linked cells (nodes) like the example that was used before. Such a data structure is called a "linked list".

We will use the following structure to represent a node of the list (e.g. for list of integers):

typedef struct list_node
{
	int val;
	struct list_node* next;
} NODE;
Note: A working example of a linked list implementation is provided: http://www.oocities.org/SiliconValley/Peaks/8778/TAU_advprog-IntList.html .
You can use it freely for any purpose (including using it as a base for your exercises).

3.1 The simplest operations on a list. Iterative and recursive algorithms.

Let's see 2 very basic operations on a linked list, namely search in a list and deletion from a list. (An example implementation of insertion into such a list was given before. Various kinds of insertion will be discussed a little bit later.)

Let's write a function that checks whether an integer number is contained in a list of integers:

int lookup( NODE* list, int x)
{
	NODE* n;
	for( n = list; n != NULL; n = n->next)
		if( n->val == x)
			return( 1);
	return( 0);
}
(Let's recall what the running time of an algorithm is. Our "lookup" algorithm takes O(N) time.)

Now let's introduce a very interesting and useful thing: recursive algorithms. What does "recursive algorithm" mean? It is an algorithm that calls itself, usually to perform the same actions for some subtask, i.e. a reduced version of the original task.

Many programming languages (including C) allow functions to call themselves. This allows us to implement recursive algorithms easily.

Let's take our "lookup" algorithm. You can see that after each iteration of the loop we perform the same search but for the tail of the list, i.e. for a sublist. So we can write this function recursively, e.g.:

int lookup( NODE* list, int x)
{
	if( list == NULL)
		return( 0);
	else if( list->val == x)
		return( 1);
	else
		return lookup( list->next, x);
}
The running time of this function is still O(N).

OK, let's implement another basic operation on a list: deletion. Let's assume that our deletion function must remove from a list ALL nodes with some value. The main difference here is that this operation may change the structure of our list (i.e. delete some nodes from it). Such a change may affect the first node as well, so as a result we may get a changed pointer to the beginning of our list (to the 1st node of the list). Therefore the function must both accept this pointer as one of its parameters and return its possibly changed value.
This may be solved in 2 ways: 1. returning the new value, or 2. passing a pointer to the variable being changed - in our case a pointer to the pointer to the 1st node.

The iterative implementations may be:

NODE* remove( NODE* list, int x)
{
	NODE* n;
	NODE* next_n;
	NODE* prev_n = NULL;

	for( n = list; n != NULL; )
	{
		next_n = n->next;
		if( n->val == x)
		{
			free( n); /* if the nodes were malloc'ed */
			if( prev_n == NULL)
				/* if it IS the 1-ST node */
				list = next_n;
			else
				/* if it IS NOT the 1-ST node */
				prev_n->next = next_n;
		}
		else
			/* if a node has been deleted then the previous
			 * node remains the same, so we advance it
			 * only if nothing was deleted */
			prev_n = n;
		n = next_n;
	}
	return( list);
}
Or using "pointer to pointer":
void remove( NODE** list, int x)
{
	NODE** n_ptr;
	NODE* next_n;

	for( n_ptr = list; *n_ptr != NULL; )
	{
		if( (*n_ptr)->val == x)
		{
			next_n = (*n_ptr)->next;
			free( *n_ptr); /* if the nodes were malloc'ed */
			*n_ptr = next_n;
		}
		else
			n_ptr = &( (*n_ptr)->next);
	}
}
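
Note the difference in how these 2 versions are called (a small illustration; both lines assume a variable NODE* list holding the head of the list):

	list = remove( list, x);	/* 1st version: assign the */
					/* returned head pointer */
	remove( &list, x);		/* 2nd version: pass the address */
					/* of the head pointer */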
We can also write this function recursively, e.g.:

void remove( NODE** n_ptr, int x)
{
	NODE* next_n;

	if( *n_ptr == NULL)
		return;
	if( (*n_ptr)->val == x)
	{
		next_n = (*n_ptr)->next;
		free( *n_ptr); /* if the nodes were malloc'ed */
		*n_ptr = next_n;
		/* the node that moved into *n_ptr may hold
		 * the value x too, so check it as well */
		remove( n_ptr, x);
	}
	else
		remove( &( (*n_ptr)->next), x);
}
The running time of all these implementations is O(N).

3.2 Running time of insertion into a non-ordered list; insertion into an ordered list.

Let's recall the example of insertion into a list. There was the following algorithm:

void insert( NODE** list, int x)
{
	NODE** node;
	NODE** new_node;

	/* if our list is empty, the new node will become the 1st one */
	if( *list == NULL)
		new_node = list;
	else
	{
		/* let's find the last element of the list */
		for( node = list; (*node) != NULL;
				node = &( (*node)->next))
			;
		new_node = node;
	}
	*new_node = malloc( sizeof( NODE));
	(*new_node)->val = x;
	(*new_node)->next = NULL;
}
What does this algorithm do? First, it always adds the new node to the end of the list. Second, it does not check whether such an element already exists. So we have here a non-ordered, non-unique list. The running time of this algorithm is O(N).

If the order of elements in the list is really not important, then we can dramatically reduce the running time by adding new nodes to the beginning of the list instead of its end:

void insert( NODE** list, int x)
{
	NODE* next_n;

	next_n = *list;
	*list = malloc( sizeof( NODE));
	(*list)->val = x;
	(*list)->next = next_n;
}
This function takes O(1) time!
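
As a quick sanity check we can combine this "insert" with our "lookup". Below is a minimal test driver (a sketch: main() and the print_list() helper are not part of the lecture code and are added only for illustration):

#include <stdio.h>
#include <stdlib.h>

/* assumes the NODE type and the insert() and lookup()
 * functions shown above */
void print_list( NODE* list)
{
	NODE* n;
	for( n = list; n != NULL; n = n->next)
		printf( "%d ", n->val);
	printf( "\n");
}

int main( void)
{
	NODE* list = NULL;
	int i;

	for( i = 1; i <= 5; i ++)
		insert( &list, i);
	print_list( list);	/* prints: 5 4 3 2 1 */
	printf( "%d %d\n", lookup( list, 3), lookup( list, 9));
				/* prints: 1 0 */
	return( 0);
}

Note how the insertion order is reversed in the output - this is the price we pay for the O(1) insertion to the beginning.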

But if we want to have either an ordered or a unique list, then we have to take our first "insert" and add the appropriate checks to it. For example, insertion into a unique list may look like this:

void insert( NODE** list, int x)
{
	NODE** n;
	NODE** new_node;
	NODE* next_n;

	/* if our list is empty - that's all */
	if( *list == NULL)
		new_node = list;
	else
	{
		/* let's find the last element of the list, checking */
		/* on the way whether x is already there */
		for( n = list; (*n) != NULL;
				n = &( (*n)->next))
			if( (*n)->val == x)
				return;	/* x already exists there */
		new_node = n;
	}
	next_n = *new_node;
	*new_node = malloc( sizeof( NODE));
	(*new_node)->val = x;
	(*new_node)->next = next_n;
}
And insertion into an ordered list may be written like this (very similar to the previous example):

void insert( NODE** list, int x)
{
	NODE** n;
	NODE** new_node;
	NODE* next_n;

	/* if our list is empty - that's all */
	if( *list == NULL)
		new_node = list;
	else
	{
		/* let's find the place where to insert */
		for( n = list; (*n) != NULL;
				n = &( (*n)->next))
			if( (*n)->val >= x)
				break;	/* we reached the place */
					/* where to insert */
		new_node = n;
	}
	next_n = *new_node;
	*new_node = malloc( sizeof( NODE));
	(*new_node)->val = x;
	(*new_node)->next = next_n;
}
It is easy to see that:
  1. In most cases the implementation of the insertion operation defines the features of a linked list (ordered, unique, etc.)
  2. In most cases the running time of the insertion operation is O(N) for linked lists.
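
For instance, repeated calls of the ordered "insert" build a sorted list directly (a small sketch reusing the functions above; print_list() is the helper from the earlier example):

	int data[] = { 3, 1, 4, 1, 5 };
	NODE* list = NULL;
	int i;

	for( i = 0; i < 5; i ++)
		insert( &list, data[i]);	/* the ordered version */
	print_list( list);			/* prints: 1 1 3 4 5 */

Note that N such insertions take O(N²) time in total - we will meet the same bound in the next section.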

3.3 Selection sorting.

It often happens that we have an unordered list while we need it ordered ("list" here means the "list" data model, not necessarily a linked list, so it may be implemented as an array, a linked list or whatever else). So we have to sort the list. There are many different sorting algorithms. Let's discuss some of them. We will start from the "selection sort" algorithm - one of the simplest sorting algorithms.

3.3.1 Abstract of the algorithm.

Suppose we have a list of N integers that we want to sort.

In the first iteration we will find ("select") the first cell containing the smallest element in the whole list and exchange it with the first cell of the list. In the second iteration we will find the smallest element in the list excluding the first cell and exchange it with the second cell of the list. And so on.

In this way our list always consists of 2 parts: already sorted and not sorted yet. At the very beginning the first part is empty while the second part is the list itself. On each iteration the sorted part grows by 1 cell and the second part shrinks by 1 cell. We continue until the second part becomes empty.

3.3.2 Use of selection sort for an array of integers.

Let's see how "selection sort" may be implemented for an array of integers (the array here is an implementation of a list of integers).

3.3.2.1 Iterative version.

void SelectionSort( int A[], int size)
{
	int i, j, small, temp;
	for( i = 0; i < size - 1; i ++)
	{
		/* set small to the index of the first */
		/* occurrence of the smallest element */
		/* remaining in the unsorted part */
		small = i;
		for( j = i + 1; j < size; j ++)
			if( A[j] < A[small])
				small = j;

		/* now we can exchange A[small] and A[i] */
		temp = A[small];
		A[small] = A[i];
		A[i] = temp;
	}
}
Here we have the outer loop going through (almost) all elements of the array. For each iteration of the outer loop we perform the inner loop that goes through the rest of the array, so it makes about N/2 iterations on average. So the running time of this algorithm is O(N*N/2), i.e. O(N²).

3.3.2.2 Recursive version.

Now let's see how we can write a recursive version of this algorithm:

void SelectionSort( int A[], int i, int size)
{
	int j, small, temp;
	if( i >= size - 1) /* nothing left to sort */
		return;
	/* find the first position of the smallest element */
	small = i;
	for( j = i + 1; j < size; j ++)
		if( A[j] < A[small])
			small = j;

	/* exchange A[small] and A[i] */
	temp = A[small];
	A[small] = A[i];
	A[i] = temp;

	/* and perform the same for the rest of the array */
	SelectionSort( A, i + 1, size);
}
This function performs only one step of the algorithm, namely it finds the first occurrence of the smallest element and exchanges it with the first cell. After this we know that the 1st element is in its final place, so we call the same procedure for the rest of the array.
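
The initial call simply passes i = 0, e.g. (a small usage example; the array contents are arbitrary):

	int A[] = { 3, 5, 2, 6, 1 };

	SelectionSort( A, 0, 5);
	/* now A is { 1, 2, 3, 5, 6 } */

The running time is the same O(N²) as for the iterative version: the recursion depth is N-1 and each call scans the remaining unsorted part.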

3.4 Algorithm paradigm "Divide-and-Conquer".

One of the common ways to solve a problem is to try to break it into (somehow simpler) subproblems, then solve the subproblems and combine their solutions into a solution of the whole problem. The term "Divide-and-Conquer" is used for this technique. In particular, if the subproblems are similar to the original problem, then we may use the same method to solve the subproblems (e.g. recursively).

There are 2 requirements that must be satisfied:
  1. The subproblems must be simpler than the original problem.
  2. After a finite number of subdivisions, we must get a problem that can be solved without further division.
So we have the following general steps for this approach:
  1. Divide the task into subtasks
  2. Solve the "small" problem for each "small" subtask
  3. Merge the results of the subtasks
It's clear that if we want to apply such a method to some particular problem then we must have 3 different "subalgorithms":
  1. A method for division.
  2. A method for solving the "simplest" subtask.
  3. A method for merging.
For the "Divide-and-Conquer" approach to be effective, these 3 "subalgorithms" must be more effective than a general algorithm for the problem as a whole.

We will discuss an example of powerful use of this technique: another sorting algorithm called "Merge Sort".

3.5 Merge sorting.

The main idea of the "Divide-and-Conquer" approach is implemented very straightforwardly here. First we split our list into some number (say 2) of sublists of equal length, then in turn we split each of them again, until we reach lists with only 1 cell. Then we build our list back from these "elementary" sublists, but already ordered. So the main procedure may look like this:

NODE* MergeSort( NODE* list)
{
	NODE* second_list;

	if( list == NULL)
		return( NULL);
	if( list->next == NULL)
		return( list);

	/* we have at least 2 cells in list */
	second_list = split( list);

	/* BTW, the original list "lost" half of its cells - */
	/* they moved to second list */
	return( merge( MergeSort( list), MergeSort( second_list)));
}

3.5.1 Dividing a list into sub-lists.

Let's divide our list (on each step) into 2 equal lists (or almost equal, if the original list has odd length) by moving all even-numbered cells to a newly created second list. We choose such a division because it may be easily implemented recursively:

NODE* split( NODE* list)
{
	NODE* second_cell;

	if( list == NULL)
		return( NULL);
	if( list->next == NULL)
		return( NULL);

	/* we have at least 2 cells */
	second_cell = list->next;
	list->next = second_cell->next;
	second_cell->next = split( second_cell->next);
	return( second_cell);
}
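
For example, applied to the list ( 1, 2, 3, 4, 5 ) this function leaves ( 1, 3, 5 ) in the original list and returns the new list ( 2, 4 ).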

3.5.2 Merging of ordered lists.

Now we have to write a function that merges 2 ordered lists:

NODE* merge( NODE* list1, NODE* list2)
{
	if( list1 == NULL)
		return( list2);
	if( list2 == NULL)
		return( list1);

	if( list1->val <= list2->val)
	{
		list1->next = merge( list1->next, list2);
		return( list1);
	}
	else
	{
		list2->next = merge( list1, list2->next);
		return( list2);
	}
}
Please note that this function may be useful standalone, if you have 2 already ordered lists and want to build 1 common list from them (e.g. merging ( 1, 3, 5 ) and ( 2, 4 ) yields ( 1, 2, 3, 4, 5 )).

Please note: one of the skills we have to master is understanding programs written by somebody else. So if these procedures do not look clear enough - don't be afraid. Take the functions, insert printing in various places and see how they work. They are not too complicated.
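
For example, a small test driver for watching the sort at work may look like this (a sketch: it reuses the O(1) "insert" and the print_list() helper from above, which are assumed here):

#include <stdio.h>
#include <stdlib.h>

/* assumes NODE, insert(), print_list() and the MergeSort(),
 * split() and merge() functions from above */
int main( void)
{
	NODE* list = NULL;
	int data[] = { 3, 5, 2, 8, 1, 5, 2 };
	int i;

	for( i = 6; i >= 0; i --)	/* insert() prepends, so we */
		insert( &list, data[i]);/* pass the values in reverse */
	print_list( list);	/* prints: 3 5 2 8 1 5 2 */
	list = MergeSort( list);
	print_list( list);	/* prints: 1 2 2 3 5 5 8 */
	return( 0);
}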

3.6 Quick Sort

Let's compare the running times of the algorithms we have seen: "Selection Sort" takes O(N²) time, while "Merge Sort" takes O(N·log₂N) time. There is another algorithm with O(N·log₂N) average running time, called "Quick Sort"; in practice it is usually the fastest of the three, although its worst case is O(N²). There are many various modifications of the Quick Sort but all of them use the same idea. This idea will be discussed now.

Suppose we have the following list:

[ 3 5 2 8 1 5 2 ]
Let's choose some value that is more or less the median of the values in the list (please pay attention: this is one of the main problems of the Quick Sort algorithm). Suppose we know the interval of our values and choose the value 4.5. Let's divide our list into 2 parts: the left part should contain only elements less than or equal to our chosen value (4.5) while the right part should contain only elements greater than 4.5. How will we do it? Let's start from the leftmost element of the list and find the first number greater than 4.5 - this is the first number that must be moved to the right part:

[ 3 5 2 8 1 5 2 ]
    ^
Now let's find the last (rightmost) element less than or equal to 4.5:

[ 3 5 2 8 1 5 2 ]
              ^
Now we can exchange these 2 elements:

[ 3 2 2 8 1 5 5 ]
    ^         ^
Now we will repeat this procedure, but we can start from the positions where we found the previously exchanged elements. We will repeat such exchanges until our "pointers" to the elements meet. I.e. the sequence of the list changes is:

[ 3 2 2 8 1 5 5 ]
    ^         ^
[ 3 2 2 8 1 5 5 ]
        ^ ^
[ 3 2 2 1 8 5 5 ]
        ^ ^
Now we can split our list into two parts:

[ 3 2 2 1 ] [ 8 5 5 ]
Let's forget (temporarily) about the bigger part and perform the same for the smaller one (the chosen value will be 6.5 now - choosing it is much easier after the first splitting, because we passed over the whole list during the exchanges, so we know the maximal and minimal values of each part, left and right):

[ 8 5 5]
  ^   ^
[ 5 5 8]
  ^   ^
[ 5 5 8]
    ^ ^
Well, to be pedantic, we have to repeat the splitting for the resulting parts again, but now no elements will be exchanged, so we will get the sorted lists:

[ 5 5 ] [ 8 ]
The only thing we have to do is to merge them back into:

[ 5 5 8 ]
and to recall the part we forgot:

[ 3 2 2 1 ]
Well, now we can do the same for it:

[ 3 2 2 1 ] - choose 1.5
  ^     ^
[ 1 2 2 3 ]
  ^     ^
[ 1 2 2 3 ]
  ^ ^
[ 1 ] [ 2 2 3 ] - choose 2.5
          ^ ^
[ 1 ] [ 2 2 ] [ 3 ]
And after the merges we get our list sorted:
[ 1 2 2 3 5 5 8 ]
OK, if the chosen values really split the lists into roughly equal halves, then the depth of such splitting is not more than about log₂N, and on each level of splitting we pass over not more than the entire list, so the overall running time is O(N·log₂N).

The main problem here is the choice of the initial "median" value. This is really important, because if the value is "bad" then we can get the same sequence of actions as in "Selection Sort" (e.g. if we always split the list into a first part with the 1 leftmost element and a second part containing all the other elements). Most modifications of the "Quick Sort" differ in how they solve this problem. One solution (the first one, suggested by the algorithm's author) is to take the value of the first element: if the list is really random then this number is neither better nor worse than any other. Another solution is to take several elements from the beginning of the list and calculate their average value. And so on. The best solution may be chosen only if you know something about the nature of your list.
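
To make the scheme concrete, here is one possible implementation for an array of integers (a sketch only, taking the value of the first element as the pivot - the first solution mentioned above; the function name and parameters are chosen just for illustration):

void QuickSort( int A[], int lo, int hi)
{
	int i, j, pivot, temp;

	/* 1 element (or none) - nothing to sort */
	if( lo >= hi)
		return;

	/* take the value of the first element as the pivot */
	pivot = A[lo];
	i = lo - 1;
	j = hi + 1;
	for( ;;)
	{
		do	/* from the left: find an element >= pivot */
			i ++;
		while( A[i] < pivot);
		do	/* from the right: find an element <= pivot */
			j --;
		while( A[j] > pivot);
		if( i >= j)	/* the "pointers" met - stop */
			break;
		temp = A[i];	/* exchange the 2 elements, */
		A[i] = A[j];	/* as in the walkthrough above */
		A[j] = temp;
	}

	/* now split and sort the 2 parts */
	QuickSort( A, lo, j);
	QuickSort( A, j + 1, hi);
}

The initial call is QuickSort( A, 0, size - 1). Note that here the pivot is an existing element rather than a value "between" the elements (like 4.5 above), but the partitioning idea is exactly the same.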

3.7 Other sorting algorithms

We discussed the main sorting algorithms: "Selection Sort", "Merge Sort" and "Quick Sort". In fact there are many such algorithms, and each of them fits certain circumstances better than the others. For example, there are cases when you cannot load all your data into memory, so there are algorithms of so-called "external sorting". And so on, and so on. If you face some specific problem requiring sorting and you feel that the algorithms discussed here do not fit your needs, then refer to an appropriate book or other source.

3.8 Exercise 3