Computer Science 101

with:  Erik Oosterwal






A Comparison of Sorting Algorithms


There are a multitude of sorting algorithms, each with its own set of pros and cons.  Certainly one of the first sorting routines that every first year Computer Science student learns is the Bubble Sort.  Back when I was in school during the '80s, the Heap Sort was touted as the greatest and fastest algorithm.  Now it seems the Quick Sort is the most favored among those who deal with sorting routines.

I was curious whether I could come up with an algorithm faster than the Quick Sort, but so far (Nov-01-2005) I have been unsuccessful.  However, I did stumble across a nice algorithm which requires no recursive function calls or stacks, and time-wise it falls within the same order of magnitude as the Quick Sort.  For all shuffled data samples that I have tested, my version of the comb sort, which I call the O-Sort (short for Oosterwal Sort), requires only slightly more than double the time the Quick Sort needs to sort an array of integers, and only half as long as the Shell-Metzner Sort I found listed in an old Creative Computing magazine from the mid '70s.  Comparisons of the various sorting routines mentioned, along with code in Visual Basic, are found later in this article.

In order to test and compare the efficiency of the sorting routines I created an array of size Length, filled that array with integers from 1 to Length, then shuffled the array.  The following code shows the array filling and shuffling process:



    Dim IntArray() As Integer
    Dim Length As Integer, I As Integer, J As Integer, Temp As Integer

    Length = CInt(Text1.Text)  ' Value taken from the Text1 text box
    ReDim IntArray(Length)
    
    ' Fill the array with integers from 1 to Length
    For I = 1 To Length
        IntArray(I) = I
    Next
        
    ' Swap the value of each cell in the array with some
    ' other random cell.
    For I = 1 To Length Step 1
        J = Int(Rnd * Length) + 1
        Temp = IntArray(I)
        IntArray(I) = IntArray(J)
        IntArray(J) = Temp
    Next


Next, to test the time each routine takes to sort the array, I used the Timer function in Visual Basic before and after the routine and took the difference.  The Timer function returns the number of seconds elapsed since midnight, so to be fair these runs all need to finish without crossing midnight.  As a reference point, I also wrapped the same pair of statements around the shuffling loop to see how long it takes to randomize the data.  Here is what the timer code looks like:



    Dim IntArray() As Integer, I As Integer, Length As Integer
    Dim Time1 As Single, Time2 As Single

    Time1 = Timer
    
'<<<<    Sorting routine goes here.    >>>>
    
    Time2 = Timer
    
    I = 1
    While (I <= Length)
        If IntArray(I) <> I Then
            MsgBox ("O-Sort failed")
            I = Length
        End If
        I = I + 1
    Wend
    
    MsgBox ("Sorting Time = " + Str((Time2 - Time1) * 1000))

You'll notice that right after the second timer statement there is a While loop.  This loop verifies that the sorting algorithm actually worked and put the shuffled data back where it belongs.  It isn't necessary to multiply the difference by 1000, as I do in the MsgBox statement, but when the computer tells me .388576259465E-2 I get confused and wonder whether that's larger or smaller than .681283634983E-3, especially when the output font is small.  I prefer looking at numbers to the left of the decimal point.
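The same scaffolding is easy to reproduce today.  Here is a minimal Python sketch of it (Python rather than Visual Basic purely for compactness; `time.perf_counter` is monotonic, so it sidesteps the midnight issue, and `random.shuffle` stands in for the swap loop shown earlier):

```python
# Minimal Python sketch of the fill / shuffle / time / verify scaffolding.
import random
import time

def time_and_verify(sort_routine, length):
    # Fill the array with 1..length, then shuffle it.
    data = list(range(1, length + 1))
    random.shuffle(data)

    t1 = time.perf_counter()      # monotonic, so no midnight wrap-around
    sort_routine(data)
    t2 = time.perf_counter()

    # Verify the sort actually worked: cell i must hold the value i.
    for i, value in enumerate(data, start=1):
        if value != i:
            raise AssertionError("sort failed at cell %d" % i)
    return (t2 - t1) * 1000       # report milliseconds, as in the article

# Example: time Python's built-in sort on 10,000 elements.
elapsed_ms = time_and_verify(lambda a: a.sort(), 10000)
```

Any of the routines below can be dropped in as `sort_routine` to reproduce the comparisons on modern hardware.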


The Bubble Sort

The first sorting routine is the infamous Bubble Sort.  It's simple to program, it's simple to see how it works, and it's simply too slow for any useful purpose.  Don't try to convince me that its small program size warrants its use even for small arrays, because my O-Sort is just as small and runs circles around it performance-wise.



' Bubble Sort

    Dim I As Integer, J As Integer, IntArray() As Integer, Temp As Integer	' Declare Variables

    For I = 1 To Length - 1			' I indexes the lower cell
        For J = I + 1 To Length			' J indexes the higher cell
            If IntArray(I) > IntArray(J) Then	' Compare Cells
                Temp = IntArray(I)		' \
                IntArray(I) = IntArray(J)	' | Swap Cells
                IntArray(J) = Temp		' /
            End If
        Next
    Next


It's a tried and true old friend but we'll say nothing more about the Bubble Sort.


*** Insert Shell Sort information here... ***



' Shell Sort

    Dim I As Integer, J As Integer, Temp As Integer, Increment As Integer

    Increment = 3
    
    While (Increment > 0)
        For I = 1 + Increment To Length
            Temp = IntArray(I)
            For J = I - Increment To 1 Step -Increment
                If (IntArray(J) > Temp) Then
                    IntArray(J + Increment) = IntArray(J)
                Else
                    Exit For
                End If
            Next
            IntArray(J + Increment) = Temp
        Next
        Increment = Increment - 1
    Wend



The Shell-Metzner Sort *

The next sort we'll look at is the Shell-Metzner Sort.  In an old copy of Creative Computing I still have from the '70s, I found an article very coincidentally entitled A Comparison of Sorts by John P. Grillo of the Department of Computer Information Systems at West Texas State University.  In that article John describes and compares the Bubble Sort with the Delayed Replacement Sort (also known as the Selective Replacement Sort, or Selection Sort) and the Shell-Metzner Sort.  The article states that the Shell-Metzner sort is an adaptation of the Shell sort by Marlene Metzner.  Here is how I converted the BASIC listing and flow chart provided in the magazine to Visual Basic:



' Shell-Metzner Sort

    Dim I As Integer, J As Integer, K As Integer, L As Integer, M As Integer	' Declare Variables
    Dim IntArray() As Integer, Temp As Integer, Length As Integer

    M = Length						' Set initial step size to the size of the array

    Do
        M = Int(M / 2)					' Step size decreases by half each time
        If M = 0 Then Exit Do
        K = Length - M					' K is the upper limit for J
        J = 1						' J is the starting point for the lower valued cell
        
        Do
            I = J					' I is supposed to point to the smaller value
            
            Do
                L = I + M				' L is supposed to point to the larger value
                If IntArray(I) > IntArray(L) Then	' Compare Cells
                    Temp = IntArray(I)			' \
                    IntArray(I) = IntArray(L)		' | Swap Cells
                    IntArray(L) = Temp			' /
                    I = I - M				' Decrease smaller cell index by step size
                Else
                    Exit Do
                End If
            Loop While (I >= 1)				' If I got decreased too much, reset the pointers
            J = J + 1					' Increase the lower limit of I
        Loop While (J <= K)
    Loop

The Shell-Metzner Sort uses five indices to decide which cells to swap.  While the Shell Sort listed above starts with an increment, or step size, of 3 and makes only three passes, the work within those passes grows roughly quadratically with the array size; the Metzner version starts with a step size equal to half the length of the array and halves it each time, so its work grows at a much slower rate.  All this adds up to a huge increase in sorting speed.

* I recently found out via the NIST website that the algorithm given above was incorrectly called the Shell-Metzner sort by John Grillo.  On that page, Marlene Metzner gives a nice little history behind the misidentification and credits Don Shell as the originator.
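The halving schedule is the heart of the difference.  A quick Python sketch (Python only for brevity) lists the step sizes the Shell-Metzner loop works through for an array of, say, 100 elements (the length here is my choice for illustration):

```python
# Step sizes produced by the Shell-Metzner halving schedule.
def metzner_gaps(length):
    gaps = []
    m = length
    while True:
        m = m // 2          # step size decreases by half each time
        if m == 0:
            break
        gaps.append(m)
    return gaps

print(metzner_gaps(100))    # [50, 25, 12, 6, 3, 1]
```

Only about log2(Length) passes are needed, versus the fixed 3, 2, 1 schedule of the Shell Sort listing above.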


*** Insert Heap Sort information here... ***

*** Insert Quick Sort information here... ***



' Quick Sort

    Dim IntArray() As Long, I As Long, J As Long, Temp As Long
    Dim LeftStack(32) As Long, RightStack(32) As Long, LeftIndex As Long, RightIndex As Long
    Dim StackPointer As Long, IntValue As Long

    LeftIndex = 1
    RightIndex = Length
        
    StackPointer = 1
    LeftStack(StackPointer) = LeftIndex
    RightStack(StackPointer) = RightIndex
    
    Do
        If RightIndex > LeftIndex Then
            IntValue = IntArray(RightIndex)
            I = LeftIndex - 1
            J = RightIndex
            ' find the pivot item
            Do
                Do: I = I + 1: Loop Until IntArray(I) >= IntValue
                Do: J = J - 1: Loop Until J = LeftIndex Or IntArray(J) <= IntValue
                Temp = IntArray(I)
                IntArray(I) = IntArray(J)
                IntArray(J) = Temp
            Loop Until J <= I
            ' swap found items
            Temp = IntArray(J)
            IntArray(J) = IntArray(I)
            IntArray(I) = IntArray(RightIndex)
            IntArray(RightIndex) = Temp
            ' push the pairs of pointers that differ most on the stack
            StackPointer = StackPointer + 1
            If (I - LeftIndex) > (RightIndex - I) Then
                LeftStack(StackPointer) = LeftIndex
                RightStack(StackPointer) = I - 1
                LeftIndex = I + 1
            Else
                LeftStack(StackPointer) = I + 1
                RightStack(StackPointer) = RightIndex
                RightIndex = I - 1
            End If
        Else
            'pop a new pair of pointers off the stack
            LeftIndex = LeftStack(StackPointer)
            RightIndex = RightStack(StackPointer)
            StackPointer = StackPointer - 1
            If StackPointer = 0 Then Exit Do
        End If
    Loop




The O-Sort

I developed the O-Sort independently during the spring of 2004 and discovered in the fall of 2005 that the very sorting algorithm I thought I had developed had already been described by Stephen Lacey and Richard Box in 1991 in Byte Magazine (A fast, easy sort, Byte, 16(4):315-ff, April 1991).  The O-Sort falls under the category of Diminishing Increment Sorts.

The basic principle behind the O-Sort is to compare cells in the first half of the array with cells in the second half, swapping them if the contents of the lower cell are larger.  This is accomplished by setting the initial step size to Length/2-1, so the first pass compares cells a half-array apart.  After each pass the step size is decreased and the comparisons start over at cell 1, each cell being compared with the cell Step cells beyond it.  The process repeats until the step size reaches 0.  This boils down to a modification of the Bubble Sort, and in practice the first version of the O-Sort generally took only about half as long as the Bubble Sort.  That still-long sort time was due to my decreasing the step size by only 1 after each pass.

With those disappointing results I started wondering if I could increase the performance by using a larger decrement value for the step size.  The problem with using a larger decrement value is that the routine ends up skipping too many cells in later iterations and you end up with an array that is mostly, but not completely, sorted.  I then changed the step decrement statement to decrease the step size by 10%  (Step = Step * 0.9).  This worked much better so I tried decreasing by 20% (Step = Step * 0.8) which failed.  I then tried some different values between 0.8 and 0.9 and discovered that 0.85 works most of the time, but fails occasionally, while 0.86 appears to work every time.  Version 2 can be seen here:



' O-Sort  (Oosterwal Sort)  Version 2

    Dim I As Integer, StepSize As Integer, Temp As Integer	' 'Step' is a reserved word, so StepSize

    StepSize = Int(Length / 2 - 1)			' Set initial step size
    
    While (StepSize > 0)
        
        For I = 1 To Length - StepSize			' Set the range of the lower search cells
            If IntArray(I) > IntArray(I + StepSize) Then	' Compare cells
                Temp = IntArray(I)			' \
                IntArray(I) = IntArray(I + StepSize)	' | Swap Cells
                IntArray(I + StepSize) = Temp		' /
            End If
        Next
            
        StepSize = Int(StepSize * 0.86)			' Decrement the step size
        
    Wend

Its simplicity is beautiful and its speed is impressive: sort times are only about twice those of the Quick Sort routine listed earlier on this page.  Using only 2 loops and 1 comparison statement, this routine is a model of simplicity.

As fast as it is, though, I wondered if I could improve on it even more.  I convinced myself that using an integer value for the step size was causing a small delay, and that I needed a way to decrease the step size more smoothly.  My fondness for the Fibonacci sequence got me thinking that I could use its principles to decrease the step size exponentially and still keep it smooth.  This required two variables to track the decrements: the first a real number (related to phi) times the previous step size, and the second an integer approximating that real value.  My initial guess of 1/phi (0.618...) was way off: 0.618 is much too small and the steps decrease too rapidly.  Thinking I was doomed to numbers in the 0.86 range, I tried 0.85 for my 'phi' value and was pleasantly surprised to see that it worked.  I then tried a series of smaller values and ended up with 0.78 being stable and 0.77 producing arrays that were not completely sorted.  I'm sure I could play with that value some more and find one closer to 0.77 that works every time, but at this point the additional gain in speed would not be large.

O-Sort Version 3 is listed below.  You will notice the two variables named SngPhi and SngFib and understand that those names are left over from my hope that a real Fibonacci sequence would have worked.



' O-Sort   (Oosterwal Sort)  Version 3

    Dim I As Integer, StepSize As Integer, Temp As Integer	' 'Step' is a reserved word, so StepSize
    Dim SngPhi As Single, SngFib As Single

    SngPhi = 0.78					' Define phi value
    SngFib = Length * SngPhi				' Set initial real step size
    StepSize = Int(SngFib)				' Set initial integer step size
    
    While (StepSize > 0)
        
        For I = 1 To Length - StepSize			' Range of lower cell
            If IntArray(I) > IntArray(I + StepSize) Then	' Compare cells
                Temp = IntArray(I)			' \
                IntArray(I) = IntArray(I + StepSize)	' | Swap Cells
                IntArray(I + StepSize) = Temp		' /
            End If
        Next
            
        SngFib = SngFib * SngPhi			' Decrease the real step size
        StepSize = Int(SngFib)				' Set the integer step value
        
    Wend

While using a real variable to track the step size adds a few statements over Version 2, the routine is still beautifully simple, and the speedup over Version 2 is noticeable.  Version 3 of the O-Sort falls within the same order of magnitude of time as the Quick Sort, requiring less than twice the time, and with only 2 loops and 1 comparison statement it remains a model of simplicity.

During some manual runs of this algorithm, I've noticed that the smallest element moves to the first cell fairly quickly and the largest element moves to the last cell at about the same rate.  For the next version of this routine I will try to determine how quickly the first and last cells are filled with their proper elements, so I can reduce the number of cells that need to be examined during later iterations.  This should speed the routine up even more for large array sizes.

Raw Data and Comparison Graphs

Some early raw data (all times in milliseconds; each line lists the times from repeated runs of the same routine):

10,000 Elements of 100% Shuffled Data

    Bubble Sort:    33890, 33558
    Shell Sort:     5550, 5609, 5601, 5441, 5718
    Shell-Metzner:  441, 378, 386, 390, 378
    O-Sort v.2:     390, 332, 332, 328, 332, 332, 328, 382
    O-Sort v.3:     281, 269, 222, 281, 218
    Quick Sort:     160, 160, 160, 171, 160, 160, 160, 109


100,000 Elements of 100% Shuffled Data

    Shell-Metzner:  6812, 6648, 7031, 6808, 6968
    O-Sort v.2:     5382, 5378, 5437, 5378, 5378, 5437
    O-Sort v.3:     3902, 3949, 3949, 3957, 3949
    Quick Sort:     2027, 2140, 2089, 2203, 2140, 2078
    Shuffle:        167, 171, 171, 160, 160


As you can see in the second table, the time it takes to shuffle the array is only about an order of magnitude smaller than the Quick Sort's time.  While the routine I use for shuffling the data could be made faster by using a step increment larger than 1 in the loop, it demonstrates the theoretical lower limit on the time needed to sort the data: if you had a magical sort routine that knew exactly where each element was supposed to go, it would take about as long to sort the data as it does to randomize it.  There would be 0 comparisons and somewhere between N/2 and N-1 swaps, where N is the number of elements in the array.
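That swap count can be made precise: the minimum number of swaps needed to sort a permutation equals N minus the number of cycles in it, which is never more than N-1, and equals N/2 exactly when the permutation is made entirely of 2-cycles.  A short Python check:

```python
# Minimum swaps to sort a permutation of 1..N = N minus its cycle count.
def min_swaps_to_sort(perm):
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j] - 1    # values are 1..n, indices 0..n-1
    return n - cycles

print(min_swaps_to_sort([2, 1, 4, 3]))   # two 2-cycles -> 2 swaps (N/2)
print(min_swaps_to_sort([2, 3, 4, 1]))   # one 4-cycle  -> 3 swaps (N-1)
```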


The Quick sort routine is known to have a defect when the data to be sorted is already mostly sorted.  To compare the efficiency of the routines with 'almost sorted' data, I revised the shuffling routine shown at the beginning of this article so that the FOR loop used a step size of 10 for one set of runs (10% shuffled) and a step size of 20 for a second set of runs (5% shuffled).  The differences are quite telling:

100,000 Elements of 10% Shuffled Data

    Shell-Metzner:  5320, 5378, 5542, 5546, 5761
    O-Sort v.3:     3839, 3847, 3839, 3898, 3839
    Quick Sort:     3792, 4230, 3898, 3957, 4167
    Shuffle:        0, 0, 0, 0, 0



1,000,000 Elements of 10% Shuffled Data

    O-Sort v.3:     47402, 47410, 47289, 47402, 47339
    Quick Sort:     45089, 51531, 51250, 49257, 48671
    Shuffle:        218, 222, 160, 160, 218



10,000 Elements of 5% Shuffled Data

    O-Sort v.3:     222, 218, 281, 269, 218
    Quick Sort:     441, 550, 488, 500, 441
    Shuffle:        0, 0, 0, 0, 0




100,000 Elements of 5% Shuffled Data

    O-Sort v.3:     3789, 3789, 3726, 3789, 3789
    Quick Sort:     5156, 6371, 7250, 5941, 6417
    Shuffle:        0, 0, 58, 0, 0




As can be seen in the tables, for data that is nearly sorted, the O-Sort takes less time as the data becomes more nearly sorted, while the Quick Sort takes much longer.
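The partial shuffle used for these runs can be sketched as follows (Python for brevity; stepping the swap loop k cells at a time disturbs roughly 1/k of the positions, so k=10 gives the "10% shuffled" data and k=20 the "5% shuffled" data):

```python
# The 'partially shuffled' test data: the swap loop from the start of
# the article, but stepping through the array k cells at a time.
import random

def partial_shuffle(a, k):
    n = len(a)
    for i in range(0, n, k):           # visit every k-th cell
        j = random.randrange(n)        # pick a random partner cell
        a[i], a[j] = a[j], a[i]        # swap them
    return a

data = partial_shuffle(list(range(1, 101)), 10)
```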


*** Insert graphs showing relative speeds of the sorts described in this article here... ***





Discuss computer algorithms and other computer science topics at the Computer Algorithms blog page.

All code and original algorithms are © Erik Oosterwal - 1987-2008
