by Erik Oosterwal
' Shuffle routine
Dim IntArray() As Integer
Dim Length As Integer, I As Integer
Dim J As Integer, Temp As Integer

Length = CInt(Text1.Text)        ' Value taken from the text box
ReDim IntArray(Length)

' Fill the array with integers from 1 to Length
For I = 1 To Length
    IntArray(I) = I
Next

' Swap the value of each cell in the array with some
' other random cell.
For I = 1 To Length
    J = Int(Rnd * Length) + 1
    Temp = IntArray(I)
    IntArray(I) = IntArray(J)
    IntArray(J) = Temp
Next
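For readers who want to experiment outside Visual Basic, here is a rough Python equivalent of the fill-and-shuffle routine (my translation, using 0-based lists; `make_shuffled_array` is my own name for it):

```python
import random

def make_shuffled_array(length):
    """Fill a list with 1..length, then swap each cell with some other
    random cell, mirroring the VB shuffle (0-based instead of 1-based)."""
    arr = list(range(1, length + 1))
    for i in range(length):
        j = random.randrange(length)   # pick a random partner cell
        arr[i], arr[j] = arr[j], arr[i]
    return arr
```

Swapping every cell with a random partner is not a perfectly uniform shuffle in the Fisher-Yates sense, but it scrambles the data well enough for these timing tests.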
Next, to test the time it takes for each routine to sort the array, I used
the Timer function in Visual Basic before and after the routine, then found
the difference. The Timer function returns the number of seconds elapsed
since midnight, so in order to be fair, these routines all need to complete
within one 24-hour period. As a reference point, I also used the same set
of commands around the shuffling loop in the Quick Sort routine to see how
long it takes to randomize the data. Here is what the timer code looks
like:
Dim I As Integer, Length As Integer
Dim IntArray() As Integer
Dim Time1 As Single, Time2 As Single

Time1 = Timer
'<<<< Sorting routine goes here. >>>>
Time2 = Timer

' Verify that the array really is sorted
I = 1
While (I <= Length)
    If IntArray(I) <> I Then
        MsgBox ("O-Sort failed")
        I = Length
    End If
    I = I + 1
Wend

MsgBox ("Sorting Time = " + Str((Time2 - Time1) * 1000))
You'll notice that right after the second timer statement there is a While
loop. This loop is used to verify that the sorting algorithm actually
worked properly and sorted the shuffled data back where it's supposed to
be. It isn't necessary to multiply the difference by 1000, as I do in the
MsgBox statement, but when the computer tells me .388576259465E-2 I get
confused about whether that's larger or smaller than .681283634983E-3,
especially when the output font is small. I prefer looking at numbers
to the left of the decimal point.
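The same timer-and-verify harness can be sketched in Python (my translation; `time_sort` and its error message are my own, not from the article):

```python
import time

def time_sort(sort_fn, arr):
    """Time a sorting routine, then verify that cell i holds i+1
    (0-based), mirroring the Timer calls and the While verification
    loop above.  Returns the elapsed time in milliseconds."""
    t1 = time.perf_counter()
    sort_fn(arr)
    t2 = time.perf_counter()
    for i, value in enumerate(arr):
        if value != i + 1:
            raise AssertionError("sort failed at index %d" % i)
    return (t2 - t1) * 1000   # scale to milliseconds, as in the MsgBox
```

Using `perf_counter` also sidesteps the midnight rollover that VB's Timer function suffers from.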
The Bubble Sort
The first sorting routine is the infamous Bubble Sort. It's simple
to program, it's simple to see how it works, and it's simply too slow for
any useful purpose. Don't try to convince me that its small program
size warrants its use even for small arrays, because my O-Sort is just as
small and runs circles around it performance-wise.
' Bubble Sort
Dim I As Integer, J As Integer, Temp As Integer
Dim IntArray() As Integer, Length As Integer

For I = 1 To Length - 1                      ' The lower cell
    For J = I + 1 To Length                  ' The higher cell
        If IntArray(I) > IntArray(J) Then    ' Compare cells
            Temp = IntArray(I)               ' \
            IntArray(I) = IntArray(J)        ' | Swap cells
            IntArray(J) = Temp               ' /
        End If
    Next
Next
It's a tried and true old friend but we'll say nothing more about the Bubble
Sort.
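For comparison runs in another language, the same exchange sort looks like this in Python (my translation, not from the article):

```python
def bubble_sort(arr):
    """Exchange sort as in the VB listing: compare each cell with every
    later cell and swap whenever the pair is out of order."""
    n = len(arr)
    for i in range(n - 1):            # the lower cell
        for j in range(i + 1, n):     # the higher cell
            if arr[i] > arr[j]:
                arr[i], arr[j] = arr[j], arr[i]
```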
*** Insert Shell Sort information here... ***
' Shell Sort
Increment = 3
While (Increment > 0)
    For I = 1 + Increment To Length
        Temp = IntArray(I)
        For J = I - Increment To 1 Step -Increment
            If (IntArray(J) > Temp) Then
                IntArray(J + Increment) = IntArray(J)
            Else
                Exit For
            End If
        Next
        IntArray(J + Increment) = Temp
    Next
    Increment = Increment - 1
Wend
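A rough Python equivalent of the listing above, with its fixed increment sequence of 3, 2, 1 (my translation, not from the article):

```python
def shell_sort_321(arr):
    """Shell sort with the fixed increments 3, 2, 1 used in the VB
    listing: a gapped insertion sort for each increment in turn."""
    for increment in (3, 2, 1):
        for i in range(increment, len(arr)):
            temp = arr[i]
            j = i - increment
            while j >= 0 and arr[j] > temp:
                arr[j + increment] = arr[j]   # shift the larger value up
                j -= increment
            arr[j + increment] = temp         # drop temp into the gap
```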
The Shell-Metzner Sort *
The next sort we'll look at is the Shell-Metzner Sort. In an old copy
of Creative Computing I still have from the '70s, I found an article very
coincidentally entitled "A Comparison of Sorts" by John P. Grillo of the
Department of Computer Information Systems at West Texas State University.
In that article John describes
and compares the Bubble Sort with the Delayed Replacement Sort (also known
as the Selective Replacement Sort, or Selection Sort) and the Shell-Metzner
Sort. The article states that the Shell-Metzner sort is an adaptation
of the Shell sort by Marlene Metzner. Here is how I converted the BASIC
listing and flow chart provided in the magazine to Visual Basic:
' Shell-Metzner Sort
Dim I As Integer, J As Integer, K As Integer, L As Integer, M As Integer
Dim IntArray() As Integer, Temp As Integer, Length As Integer

M = Length                      ' Set initial step size to the size of the array
Do
    M = Int(M / 2)              ' Step size decreases by half each time
    If M = 0 Then Exit Do
    K = Length - M              ' K is the upper limit for J
    J = 1                       ' J is the starting point for the lower valued cell
    Do
        I = J                   ' I points to the lower valued cell
        Do
            L = I + M           ' L points to the higher valued cell
            If IntArray(I) > IntArray(L) Then    ' Compare cells
                Temp = IntArray(I)               ' \
                IntArray(I) = IntArray(L)        ' | Swap cells
                IntArray(L) = Temp               ' /
                I = I - M       ' Decrease the lower cell index by the step size
            Else
                Exit Do
            End If
        Loop While (I >= 1)     ' Stop backing up once I falls below the first cell
        J = J + 1               ' Increase the lower limit of I
    Loop While (J <= K)
Loop
The Shell-Metzner Sort uses five indices to keep track of which cells to
compare and swap. While the Shell Sort above starts with an increment,
or step size, of 3, the Metzner version starts with a step size equal to
half the length of the array, and while the Shell sort makes only three
passes through the array, each with a number of comparisons that grows
quadratically with the size of the array, the Metzner version's halving
step sizes make each pass far cheaper. All this adds up to a huge
increase in sorting speed.
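Here is a rough Python rendering of the same halving-step algorithm (my translation, 0-based; the variable roles match the comments in the VB listing):

```python
def shell_metzner_sort(arr):
    """Shell sort with step sizes that halve each pass, translated from
    the VB Shell-Metzner listing (0-based indices)."""
    n = len(arr)
    m = n
    while True:
        m //= 2                   # step size decreases by half each time
        if m == 0:
            break
        for j in range(n - m):    # starting point for the lower cell
            i = j
            while i >= 0:
                l = i + m         # the higher cell, one step above i
                if arr[i] > arr[l]:
                    arr[i], arr[l] = arr[l], arr[i]
                    i -= m        # follow the swapped value downward
                else:
                    break
```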
* I recently found out via the
NIST website that
the algorithm given above was incorrectly called the Shell-Metzner sort by
John Grillo. On that page, Marlene gives a nice little history behind
the misidentification and attributes Don Shell as the originator.
*** Insert Heap Sort information here... ***
*** Insert Quick Sort information here... ***
' Quick Sort
Dim IntArray() As Long, I As Long, J As Long, Temp As Long
Dim LeftStack(32) As Long, RightStack(32) As Long
Dim LeftIndex As Long, RightIndex As Long
Dim StackPointer As Long, IntValue As Long

LeftIndex = 1
RightIndex = Length
StackPointer = 1
LeftStack(StackPointer) = LeftIndex
RightStack(StackPointer) = RightIndex
Do
    If RightIndex > LeftIndex Then
        IntValue = IntArray(RightIndex)    ' The pivot item
        I = LeftIndex - 1
        J = RightIndex
        ' Partition the range around the pivot
        Do
            Do: I = I + 1: Loop Until IntArray(I) >= IntValue
            Do: J = J - 1: Loop Until J = LeftIndex Or IntArray(J) <= IntValue
            Temp = IntArray(I)
            IntArray(I) = IntArray(J)
            IntArray(J) = Temp
        Loop Until J <= I
        ' Undo the last swap and put the pivot in place
        Temp = IntArray(J)
        IntArray(J) = IntArray(I)
        IntArray(I) = IntArray(RightIndex)
        IntArray(RightIndex) = Temp
        ' Push the pair of pointers that differ most on the stack
        StackPointer = StackPointer + 1
        If (I - LeftIndex) > (RightIndex - I) Then
            LeftStack(StackPointer) = LeftIndex
            RightStack(StackPointer) = I - 1
            LeftIndex = I + 1
        Else
            LeftStack(StackPointer) = I + 1
            RightStack(StackPointer) = RightIndex
            RightIndex = I - 1
        End If
    Else
        ' Pop a new pair of pointers off the stack
        LeftIndex = LeftStack(StackPointer)
        RightIndex = RightStack(StackPointer)
        StackPointer = StackPointer - 1
        If StackPointer = 0 Then Exit Do
    End If
Loop
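The same stack-based Quick Sort can be sketched in Python (my translation, 0-based; I test for the crossed pointers before swapping, so the three-way fix-up after the partition loop reduces to a single swap that puts the pivot in place):

```python
def quick_sort_iterative(arr):
    """Iterative quicksort: an explicit stack of (left, right) ranges,
    using the last element of each range as the pivot value."""
    stack = [(0, len(arr) - 1)]
    while stack:
        left, right = stack.pop()
        if right <= left:
            continue
        pivot = arr[right]        # pivot value, also the scan sentinel
        i, j = left - 1, right
        while True:
            i += 1
            while arr[i] < pivot:
                i += 1
            j -= 1
            while j > left and arr[j] > pivot:
                j -= 1
            if j <= i:            # pointers crossed: partition is done
                break
            arr[i], arr[j] = arr[j], arr[i]
        arr[i], arr[right] = arr[right], arr[i]   # pivot into place
        stack.append((left, i - 1))               # sort both subranges
        stack.append((i + 1, right))
```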
The O-Sort
I developed the O-Sort independently during the spring of 2004 and discovered
in the fall of 2005 that the very sorting algorithm I thought I had developed
had already been described by Stephen Lacey and Richard Box in 1991 in Byte
Magazine (A fast, easy sort, Byte, 16(4):315-ff, April 1991). The
O-Sort falls under the category of Diminishing Increment Sorts.
The basic principle behind the O-Sort is to compare cells in the first half
of the array with cells in the second half of the array, swapping them if
the contents in the first half are larger than in the second half. This
is accomplished by setting the initial step size to Length/2-1. The
first set of passes compares 1/2 of the array. After each iteration
the step size is decreased and the cells are compared starting at cell 1,
again, and comparing each cell with the cell that is Step cells away from
it. The process is repeated until the step size is 0. This
boils down to a modification of the Bubble Sort, but in practice the first
version of the O-Sort generally took about half as long as the Bubble
Sort. This still-long sort time was due to my decreasing the step size
by only 1 after each iteration.
With those disappointing results I started wondering if I could increase
the performance by using a larger decrement value for the step size. The
problem with using a larger decrement value is that the routine ends up skipping
too many cells in later iterations and you end up with an array that is mostly,
but not completely, sorted. I then changed the step decrement statement
to decrease the step size by 10% (Step = Step * 0.9). This worked
much better so I tried decreasing by 20% (Step = Step * 0.8) which failed.
I then tried some different values between 0.8 and 0.9 and discovered
that 0.85 works most of the time, but fails occasionally, while 0.86 appears
to work every time. Version 2 can be seen here:
' O-Sort (Oosterwal Sort) Version 2
Step = Int(Length / 2 - 1)                       ' Set initial step size
While (Step > 0)
    For I = 1 To Length - Step                   ' Range of the lower cell
        If IntArray(I) > IntArray(I + Step) Then ' Compare cells
            Temp = IntArray(I)                   ' \
            IntArray(I) = IntArray(I + Step)     ' | Swap cells
            IntArray(I + Step) = Temp            ' /
        End If
    Next
    Step = Int(Step * 0.86)                      ' Decrement the step size
Wend
Its simplicity is beautiful and its speed is impressive: the sort times
are only about twice those of the Quick Sort routine listed earlier on
this page. Using only two loops and one comparison statement, this
routine is a model of simplicity.
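A Python sketch of Version 2 (my translation). One caveat from my own testing: for very small arrays the initial step of Length/2 - 1 can already be 0, and the single final pass at step 1 does not repair every remaining inversion, so this sketch, like the original, is only reliable in practice on the larger arrays used in the timings:

```python
def o_sort_v2(arr):
    """O-Sort version 2: compare cells a fixed step apart, swap when out
    of order, and shrink the step by the factor 0.86 after each pass."""
    n = len(arr)
    step = n // 2 - 1                    # initial step size
    while step > 0:
        for i in range(n - step):        # the lower cell of each pair
            if arr[i] > arr[i + step]:
                arr[i], arr[i + step] = arr[i + step], arr[i]
        step = int(step * 0.86)          # decrement the step size
```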
As fast as it is, though, I wondered if I could improve on it even more.
I convinced myself that using an integer value for step size was causing
a small delay and I needed a method of decreasing the step size more smoothly.
My fondness for the Fibonacci sequence got me thinking that I could
use its principles in decreasing the step size in an exponential fashion
and still make it smooth. This new development required two variables
to keep track of the decrements--the first would be a real number (related
to phi) times the length of the previous step, and the second would be an
integer value approximating the real step size value. My initial guess of
using phi (0.618...) was way off. The value 0.618 is much too small
and the steps decrease too rapidly. Thinking that I was doomed to be
limited to numbers in the 0.86 range I tried 0.85 for my 'phi' value and
was pleasantly surprised to see that it worked. I then tried a number
of increasingly smaller values and ended up with 0.78 being stable and 0.77
producing arrays that were not completely sorted. I'm sure I could
play with that value some more and find one closer to 0.77 that works all
the time, but at this point the additional increase in speed will not be
very large.
O-Sort Version 3 is listed below. You will notice the two variables
named SngPhi and SngFib and understand that those names are left over from
my hope that a real Fibonacci sequence would have worked.
' O-Sort (Oosterwal Sort) Version 3
SngPhi = 0.78                                    ' Define phi value
SngFib = Length * SngPhi                         ' Set initial real step size
Step = Int(SngFib)                               ' Set initial integer step size
While (Step > 0)
    For I = 1 To Length - Step                   ' Range of the lower cell
        If IntArray(I) > IntArray(I + Step) Then ' Compare cells
            Temp = IntArray(I)                   ' \
            IntArray(I) = IntArray(I + Step)     ' | Swap cells
            IntArray(I + Step) = Temp            ' /
        End If
    Next
    SngFib = SngFib * SngPhi                     ' Decrease the real step size
    Step = Int(SngFib)                           ' Set the integer step value
Wend
While using a real variable to keep track of the step size makes this
version slightly longer than Version 2, its simplicity is still beautiful,
and the increase in speed over Version 2 is noticeable. Version 3 of the
O-Sort runs in the same order of magnitude of time as the Quick Sort,
requiring less than twice the time, and with only two loops and one
comparison statement the routine is still a model of simplicity.
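Version 3 in Python (my translation; the same small-array caveat as Version 2 applies, since the final pass at step 1 is not repeated until no swaps occur):

```python
def o_sort_v3(arr, phi=0.78):
    """O-Sort version 3: track the step size as a real number, shrink it
    by the factor phi each pass, and use its integer part as the gap."""
    n = len(arr)
    sng_fib = n * phi           # real-valued step size
    step = int(sng_fib)         # integer step actually used
    while step > 0:
        for i in range(n - step):
            if arr[i] > arr[i + step]:
                arr[i], arr[i + step] = arr[i + step], arr[i]
        sng_fib *= phi          # decrease the real step size
        step = int(sng_fib)
```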
During some manual runs of this algorithm, I've discovered that the smallest
element runs to the first cell fairly quickly and the largest element runs
to the last cell at about the same rate. For my next version of this
routine I will figure out if it can be determined how quickly the first and
last cells are filled by their proper elements so I can reduce the number
of cells that need to be looked at during later iterations. This should
speed up the routine even more for large array sizes.
Raw Data and Comparison Graphs
Some early raw data (times in milliseconds; multiple entries are repeated runs):

Bubble Sort:    33890, 33558
Shell Sort:     5550
Shell-Metzner:  441
O-Sort v.2:     390, 332, 332, 328, 332, 332, 328, 382
O-Sort v.3:     281
Quick Sort:     160, 160, 160, 171, 160, 160, 160, 109
A second set of runs, with the shuffle time included for reference (milliseconds):

Shell-Metzner:  6812
O-Sort v.2:     5382, 5378, 5437, 5378, 5378, 5437
O-Sort v.3:     3902
Quick Sort:     2027, 2140, 2089, 2203, 2140, 2078
Shuffle:        167
As you can see in the second table, the time it takes to shuffle the array
is only about an order of magnitude smaller than the Quick Sort time. While the routine
I use for shuffling the data could be made faster by using a step increment
larger than 1 in the loop, it demonstrates the theoretical lower time limit
needed to sort the data. What that means is if you had a magical sort
routine that knew exactly where each element was supposed to go, it would
take about the same time to sort it as it does to randomize it. There
would be 0 comparisons done and somewhere between N/2 and N-1 swaps, where
N is the number of elements in the array.
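The swap count for that magical routine can be made concrete: the minimum number of swaps needed to sort a permutation is N minus the number of cycles in it, and when no element is already in its proper cell every cycle has at least two elements, which is exactly how the count lands between N/2 and N-1. A small sketch (my addition; the function name is my own):

```python
def min_swaps_to_sort(perm):
    """Minimum swaps needed to sort a permutation of 1..n: n minus the
    number of cycles.  With no element already in place, every cycle has
    at least two elements, so the answer lies between n/2 and n-1."""
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:
                seen[i] = True
                i = perm[i] - 1   # follow the cycle to the next cell
    return n - cycles
```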
The Quick sort routine is known to have a defect when the data to be sorted
is already mostly sorted. To compare the efficiency of the routines
with 'almost sorted' data, I revised the shuffling routine shown at the beginning
of this article so that the FOR loop used a step size of 10 for one set of
runs (10% shuffled) and a step size of 20 for a second set of runs (5% shuffled).
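Assuming the only change to the shuffle is the FOR loop's step size, the revised routine looks like this in Python (my sketch, with `partial_shuffle` as my own name for it):

```python
import random

def partial_shuffle(length, step):
    """Build a sorted list of 1..length, then swap only every step-th
    cell with a random cell: step=10 disturbs roughly 10% of the cells,
    step=20 roughly 5%."""
    arr = list(range(1, length + 1))
    for i in range(0, length, step):
        j = random.randrange(length)
        arr[i], arr[j] = arr[j], arr[i]
    return arr
```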
The differences are quite telling:
Shell-Metzner:  5320
O-Sort v.3:     3839
Quick Sort:     3792
Shuffle:        0

O-Sort v.3:     47402
Quick Sort:     45089
Shuffle:        218

O-Sort v.3:     222
Quick Sort:     441
Shuffle:        0

O-Sort v.3:     3789
Quick Sort:     5156
Shuffle:        0
As can be seen in the tables, the O-Sort takes less time as the data
becomes more nearly sorted, but the Quick Sort takes much longer.
*** Insert graphs showing relative speeds of the sorts described in this
article here... ***