This applet lets you observe the dynamic
operation of several basic sort algorithms. The implementations,
explanations, and display technique are all taken from Algorithms
in C++, Third Edition (Sedgewick, 1998).
Generate a data set by clicking one of the data set
buttons. This generates a data set with the appropriate
characteristics. Each data set is an array of 512 values in the
range 0..511. Data sets have no duplicate entries, except in the
Gaussian, Limited, and Equal distributions.
Run one or more sort algorithms on the data set to see how
the algorithm works. Each execution of a sort runs on the
same data set until you generate a new one.
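One plausible way to build the "Random" data set described above is to shuffle a permutation of 0..511, which is evenly distributed and duplicate-free by construction. This is a sketch, not the applet's actual generator; the function name is illustrative.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Build an n-element data set holding a random permutation of 0..n-1:
// evenly distributed, no duplicates. (Hypothetical helper; the applet's
// own generator may work differently.)
std::vector<int> makeRandomDataSet(int n = 512) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 0);            // fill with 0, 1, ..., n-1
    std::mt19937 rng(std::random_device{}());    // seeded Mersenne Twister
    std::shuffle(a.begin(), a.end(), rng);       // random permutation
    return a;
}
```

The "Ordered" and "Reverse" sets would simply skip the shuffle (or reverse the result), and "Equal" would fill the array with one value.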
Data Set Types
Click "Random" to get an evenly distributed random data set
with no duplications.
Click "Ordered" to get a data set in ascending order with no
duplications.
Click "Reverse" to get a data set in descending order with no
duplications.
Click "Gaussian" to get a Gaussian distribution with
duplicate elements.
Click "Limited" to get an evenly distributed random data set
with only 10 distinct values.
Click "NearOrdered" to get a data set in nearly ascending
order with no duplications.
Click "NearReverse" to get a data set in nearly descending
order with no duplications.
Click "Equal" to get a data set where all the values are the
same.
Sort Algorithms
Click "Bubble" to run the bubble sort algorithm.
Click "Insert" to run the insertion sort algorithm.
Click "Select" to run the selection sort algorithm.
Click "Shell" to run the shellsort algorithm.
Click "Heap" to run the heapsort algorithm.
Click "Quick" to run the quicksort algorithm.
Click "Med3Quick" to run the median-of-3 quicksort
algorithm.
Click "ThreeQuick" to run the three-way partitioning
quicksort algorithm.
Click "Merge" to run the mergesort algorithm.
Click "BUMerge" to run the bottom-up mergesort
algorithm.
The display is a Cartesian system where the x axis is the
array index 'i' and the y axis is the value of the array at
index 'i'. When a data set is fully sorted, all values lie on
the diagonal, i == a[i] (unless there were duplicates). The more
spread out the points are, the farther the array is from being
sorted. For example, the value 1 may start out at a[511] and
will end up at a[1] when the array is fully sorted. It takes a
little thought to figure out how to interpret the display, but
once grokked, it becomes clear.
The display shows a number of 'Ops' which is an approximation
of the total number of comparisons and item swaps that occurred
during the sort. This value corresponds roughly to the complexity
of the algorithm expressed in big-O notation. It doesn't reflect
the implementation details that also contribute to algorithm
performance. Different data sets will yield different performance
for a given algorithm.
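The 'Ops' figure can be approximated by routing every comparison and swap through counting helpers. The sketch below instruments bubble sort this way; the helper names are illustrative, not the applet's own code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical instrumentation: each comparison and each swap bumps a
// shared counter, approximating the applet's 'Ops' display.
long ops = 0;

bool less(int x, int y) { ++ops; return x < y; }

void swapCount(std::vector<int>& a, std::size_t i, std::size_t j) {
    ++ops;
    std::swap(a[i], a[j]);
}

// Bubble sort with every compare and swap counted.
void bubbleSortCounted(std::vector<int>& a) {
    for (std::size_t i = a.size(); i > 1; --i)
        for (std::size_t j = 1; j < i; ++j)
            if (less(a[j], a[j - 1]))
                swapCount(a, j - 1, j);
}
```

On a reverse-ordered array of N elements this counts N(N-1)/2 compares plus the same number of swaps, matching the O(N**2) behavior described below.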
An important characteristic of sort algorithms that has a
large effect on their efficiency is how far elements are moved in
a swap. Bubble sort moves elements only a short distance and the
array crawls into sorted order. Better algorithms move elements
farther in a single operation so that the array seems to jump
into sorted order. The display technique used here really shows
this effect.
This applet redraws the array at various key points in the
algorithm, with a time delay inserted to make it run slowly
enough to watch. Because the loop/recursion structure of the
various algorithms differs in where the repaints occur, and
because the repaint and time delay dominate the applet's running
time, the wall-clock time to completion of a particular algorithm
doesn't reflect its relative performance. Use the 'Ops' figure
for comparisons.
For a comprehensive overview of sorting algorithms and
algorithm complexity see the Sedgewick book. The implementations
used here are the basic ones from his book. There are plenty of
optimizations that can be done to improve these algorithms.
Basic Algorithm Characteristics
The analysis of sort algorithms is complex and has a lot of
variables, but in general the algorithms used here should come
out something like this:
Bubble sort is O(N**2). The number of Ops should come out
<= 512 * 512 = 262144
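A textbook bubble sort looks like the sketch below: repeated sweeps swap adjacent out-of-order pairs, so each pass bubbles the largest remaining element to the end. This is the standard algorithm, not necessarily the applet's exact code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bubble sort: after each outer pass the largest element of the
// unsorted prefix has bubbled into place, so the prefix shrinks by one.
void bubbleSort(std::vector<int>& a) {
    for (std::size_t i = a.size(); i > 1; --i)
        for (std::size_t j = 1; j < i; ++j)
            if (a[j] < a[j - 1])
                std::swap(a[j - 1], a[j]);
}
```

Note that each swap moves an element only one position, which is why the display crawls rather than jumps into sorted order.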
Insertion sort is O(N**2), but it can be O(N) for nearly
sorted data.
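The near-linear behavior on nearly sorted data is visible in a standard insertion sort sketch: the inner shift loop exits immediately whenever the next element is already in place.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort: insert a[i] into the sorted prefix a[0..i-1] by
// shifting larger elements one slot to the right. On already-sorted
// input the while loop never runs, giving O(N) total work.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int v = a[i];                      // element to insert
        std::size_t j = i;
        while (j > 0 && a[j - 1] > v) {    // shift larger elements right
            a[j] = a[j - 1];
            --j;
        }
        a[j] = v;
    }
}
```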
Selection sort is O(N**2) as well, but it only performs N-1
swaps.
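The N-1 swap bound comes from the structure of the algorithm: each outer iteration performs at most one swap, after scanning for the minimum. A standard sketch:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Selection sort: find the minimum of the unsorted suffix and swap it
// into place. The scan costs O(N**2) compares, but there is at most
// one swap per outer iteration, so at most N-1 swaps in total.
void selectionSort(std::vector<int>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t min = i;
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[j] < a[min]) min = j;
        if (min != i) std::swap(a[i], a[min]);
    }
}
```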
Quicksort is O(N lg N) on average but can degenerate to
about (N**2)/2 operations in the worst case (try the ordered data
set on quicksort). Quicksort is recursive and can need a lot of
stack space in that degenerate case.
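A basic quicksort in the style of Sedgewick's book partitions around the rightmost element, which is exactly why ordered input is the worst case: every partition is maximally lopsided. This is a sketch of the standard scheme, not the applet's literal code.

```cpp
#include <utility>
#include <vector>

// Partition a[l..r] around the pivot a[r]; returns the pivot's final
// index. Scanning from both ends, out-of-place pairs are swapped.
int partition(std::vector<int>& a, int l, int r) {
    int v = a[r];                     // rightmost element as pivot
    int i = l - 1, j = r;
    for (;;) {
        while (a[++i] < v) {}         // a[r] == v stops this scan
        while (v < a[--j]) if (j == l) break;
        if (i >= j) break;
        std::swap(a[i], a[j]);
    }
    std::swap(a[i], a[r]);
    return i;
}

// Recursive quicksort over a[l..r]. On sorted input each partition
// peels off one element, giving ~N**2/2 compares and deep recursion.
void quickSort(std::vector<int>& a, int l, int r) {
    if (r <= l) return;
    int i = partition(a, l, r);
    quickSort(a, l, i - 1);
    quickSort(a, i + 1, r);
}
```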
Median-of-3 quicksort makes the worst case much less likely,
since it does a better job of picking a pivot element on which to
partition the data set.
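The median-of-3 idea can be sketched as: order the first, middle, and last elements, then partition around their median. Sedgewick's actual code differs in detail, but the effect is the same, and ordered input is no longer a degenerate case because the middle element is then the exact median.

```cpp
#include <utility>
#include <vector>

// Median-of-3 quicksort sketch: the pivot is the median of a[l], a[m],
// a[r], parked at a[r] before a standard two-way partition.
void med3QuickSort(std::vector<int>& a, int l, int r) {
    if (r <= l) return;
    int m = l + (r - l) / 2;
    if (a[m] < a[l]) std::swap(a[l], a[m]);   // now a[l] <= a[m]
    if (a[r] < a[l]) std::swap(a[l], a[r]);   // now a[l] <= a[r]
    if (a[r] < a[m]) std::swap(a[m], a[r]);   // now a[m] is the median
    std::swap(a[m], a[r]);                    // park the pivot at a[r]
    int v = a[r], i = l - 1, j = r;
    for (;;) {                                // standard partition
        while (a[++i] < v) {}
        while (v < a[--j]) if (j == l) break;
        if (i >= j) break;
        std::swap(a[i], a[j]);
    }
    std::swap(a[i], a[r]);
    med3QuickSort(a, l, i - 1);
    med3QuickSort(a, i + 1, r);
}
```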
Three-way partitioning quicksort improves the behavior of
quicksort when there are a lot of items equal to the pivot.
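The sketch below shows the simple Dijkstra-style three-way partition, which splits the range into elements less than, equal to, and greater than the pivot so that equal keys are never recursed on. The applet's own code may use a different three-way scheme, but the effect on data sets like "Limited" and "Equal" is the same.

```cpp
#include <utility>
#include <vector>

// Three-way quicksort: partition a[l..r] into < v, == v, > v bands.
// Elements equal to the pivot land in their final position at once,
// so runs of duplicates cost nothing extra.
void threeWayQuickSort(std::vector<int>& a, int l, int r) {
    if (r <= l) return;
    int lt = l, gt = r, i = l + 1;
    int v = a[l];                              // pivot
    while (i <= gt) {
        if (a[i] < v)      std::swap(a[lt++], a[i++]);
        else if (a[i] > v) std::swap(a[i], a[gt--]);
        else               ++i;                // equal: leave in the band
    }
    threeWayQuickSort(a, l, lt - 1);           // strictly less
    threeWayQuickSort(a, gt + 1, r);           // strictly greater
}
```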
Shell sort (named for its inventor, Donald Shell) is less
than O(N**(4/3)) for this implementation. Shell sort is iterative
and doesn't require much extra memory.
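Shell sort is essentially insertion sort run over a shrinking sequence of gaps, so early passes move elements long distances cheaply. A sketch using the common 1, 4, 13, 40, ... increment sequence (the applet may use a different sequence, which changes the exact bound):

```cpp
#include <utility>
#include <vector>

// Shellsort: gapped insertion sort with decreasing gaps h. Each pass
// leaves the array h-sorted; the final pass (h == 1) is plain
// insertion sort on nearly ordered data.
void shellSort(std::vector<int>& a) {
    int n = (int)a.size();
    int h = 1;
    while (h < n / 3) h = 3 * h + 1;           // 1, 4, 13, 40, ...
    for (; h >= 1; h /= 3)
        for (int i = h; i < n; ++i)
            for (int j = i; j >= h && a[j] < a[j - h]; j -= h)
                std::swap(a[j], a[j - h]);
}
```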
Merge sort is O(N lg N) for all data sets, so while it is
slower than the best case for quicksort, it doesn't have
degenerate cases. It needs additional storage equal to the size
of the input array and it is recursive so it needs stack
space.
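The auxiliary-array requirement is visible directly in a standard top-down mergesort sketch: the merge step copies the range into a second array of the same size before merging back.

```cpp
#include <vector>

// Merge the sorted halves a[l..m] and a[m+1..r] using the auxiliary
// array; this extra storage is why mergesort needs O(N) memory.
void merge(std::vector<int>& a, std::vector<int>& aux, int l, int m, int r) {
    for (int k = l; k <= r; ++k) aux[k] = a[k];
    int i = l, j = m + 1;
    for (int k = l; k <= r; ++k) {
        if (i > m)               a[k] = aux[j++];  // left half exhausted
        else if (j > r)          a[k] = aux[i++];  // right half exhausted
        else if (aux[j] < aux[i]) a[k] = aux[j++];
        else                      a[k] = aux[i++];
    }
}

// Top-down (recursive) mergesort over a[l..r].
void mergeSortRec(std::vector<int>& a, std::vector<int>& aux, int l, int r) {
    if (r <= l) return;
    int m = l + (r - l) / 2;
    mergeSortRec(a, aux, l, m);
    mergeSortRec(a, aux, m + 1, r);
    merge(a, aux, l, m, r);
}

void mergeSort(std::vector<int>& a) {
    std::vector<int> aux(a.size());
    if (!a.empty()) mergeSortRec(a, aux, 0, (int)a.size() - 1);
}
```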
Bottom-up merge sort is a non-recursive variant which does
the same work in a different order.
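The bottom-up variant replaces the recursion with two loops: merge runs of length 1, then 2, then 4, and so on until one run covers the array. A self-contained sketch:

```cpp
#include <algorithm>
#include <vector>

// Merge sorted runs a[lo..mid] and a[mid+1..hi] via the auxiliary array.
void mergeRuns(std::vector<int>& a, std::vector<int>& aux,
               int lo, int mid, int hi) {
    for (int k = lo; k <= hi; ++k) aux[k] = a[k];
    int i = lo, j = mid + 1;
    for (int k = lo; k <= hi; ++k) {
        if (i > mid)               a[k] = aux[j++];
        else if (j > hi)           a[k] = aux[i++];
        else if (aux[j] < aux[i])  a[k] = aux[j++];
        else                       a[k] = aux[i++];
    }
}

// Bottom-up mergesort: no recursion, so no stack growth; the run
// length doubles each pass until the whole array is one sorted run.
void bottomUpMergeSort(std::vector<int>& a) {
    int n = (int)a.size();
    std::vector<int> aux(n);
    for (int len = 1; len < n; len *= 2)
        for (int lo = 0; lo < n - len; lo += 2 * len)
            mergeRuns(a, aux, lo, lo + len - 1,
                      std::min(lo + 2 * len - 1, n - 1));
}
```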
Heap sort is guaranteed to be O(N lg N), doesn't degenerate
like quicksort and doesn't use extra memory like mergesort, but
its implementation performs more operations per step, so on
average it's not as fast as quicksort.
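A standard sink-based heapsort sketch shows both properties at once: it sorts in place (no auxiliary array), and every element removed from the heap takes a full sink pass, which is the extra per-operation work mentioned above.

```cpp
#include <utility>
#include <vector>

// Restore the max-heap property by sinking the node at 1-based index k
// down through a[0..n-1]; children of k are at 2k and 2k+1 (1-based).
void sink(std::vector<int>& a, int k, int n) {
    while (2 * k <= n) {
        int j = 2 * k;
        if (j < n && a[j - 1] < a[j]) ++j;    // pick the larger child
        if (!(a[k - 1] < a[j - 1])) break;    // heap order restored
        std::swap(a[k - 1], a[j - 1]);
        k = j;
    }
}

// Heapsort: build a max-heap in place, then repeatedly swap the max
// to the end of the shrinking heap and sink the new root.
void heapSort(std::vector<int>& a) {
    int n = (int)a.size();
    for (int k = n / 2; k >= 1; --k) sink(a, k, n);
    while (n > 1) {
        std::swap(a[0], a[n - 1]);            // max goes to final slot
        sink(a, 1, --n);
    }
}
```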
This page and applet were originally written by David M. Howard, October 1998. They
were extended (with twice as many sorts and data distributions) by
Brian Howard (no
relation) in September 2002.