# Sorting Algorithms

This applet lets you observe the dynamic operation of several basic sort algorithms. The implementations, explanations, and display technique are all taken from *Algorithms in C++, Third Edition* (Sedgewick, 1998).

• Generate a data set by clicking one of the data set buttons; each button generates a data set with the corresponding characteristics. Each data set is an array of 512 values in the range 0..511. Data sets have no duplicate entries, except for the Gaussian, Limited, and Equal distributions.
• Run one or more sort algorithms on the data set to see how each algorithm works. Each execution of a sort runs on the same data set until you generate a new one.

## Data Set Types

• Click "Random" to get an evenly distributed random data set with no duplications.
• Click "Ordered" to get a data set in ascending order with no duplications.
• Click "Reverse" to get a data set in descending order with no duplications.
• Click "Gaussian" to get a Gaussian distribution with duplicate elements.
• Click "Limited" to get an evenly distributed random data set with only 10 distinct values.
• Click "NearOrdered" to get a data set in nearly ascending order with no duplications.
• Click "NearReverse" to get a data set in nearly descending order with no duplications.
• Click "Equal" to get a data set where all the values are the same.

## Sort Algorithms

• Click "Bubble" to run the bubble sort algorithm.
• Click "Insert" to run the insertion sort algorithm.
• Click "Select" to run the selection sort algorithm.
• Click "Shell" to run the shellsort algorithm.
• Click "Heap" to run the heapsort algorithm.
• Click "Quick" to run the quicksort algorithm.
• Click "Med3Quick" to run the median-of-3 quicksort algorithm.
• Click "ThreeQuick" to run the three-way partitioning quicksort algorithm.
• Click "Merge" to run the mergesort algorithm.
• Click "BUMerge" to run the bottom-up mergesort algorithm.

• The display is a Cartesian plot where the x axis is the array index 'i' and the y axis is the value of the array at that index, a[i]. When a data set is fully sorted, all points lie on the diagonal, i == a[i] (unless there were duplicates). The more spread out the points are, the farther the array is from being sorted. For example, the value 1 may start out at some arbitrary position in the array, and it will end up at a[1] when the array is fully sorted. It takes a little thought to figure out how to interpret the display, but once grokked, it becomes clear.
• The display shows a number of 'Ops', which is an approximation of the total number of comparisons and item swaps that occurred during the sort (see the sketch after this list for one way such a counter can be maintained). This value corresponds roughly to the complexity of the algorithm expressed in big-O notation; it doesn't reflect the implementation details that also contribute to algorithm performance. Different data sets will yield different performance for a given algorithm.
• An important characteristic of sort algorithms that has a large effect on their efficiency is how far elements are moved in a swap. Bubble sort moves elements only a short distance and the array crawls into sorted order. Better algorithms move elements farther in a single operation so that the array seems to jump into sorted order. The display technique used here really shows this effect.
• The applet redraws the array at key points in the algorithm, with a time delay inserted to make it run slowly enough to watch. Because the loop/recursion structure of the various algorithms differs, the repaints occur at different points; and because the repaints and time delays dominate the running time of the applet, the wall-clock time to completion of a particular algorithm doesn't reflect its relative performance. Use the 'Ops' figure for comparisons.
• For a comprehensive overview of sorting algorithms and algorithm complexity see the Sedgewick book. The implementations used here are the basic ones from his book. There are plenty of optimizations that can be done to improve these algorithms.
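
As a concrete illustration of how an 'Ops' counter like the one described above can be tallied, here is a minimal sketch in C++ (matching the language of the Sedgewick book). This is not the applet's source; the `OpCounter` type and the instrumented selection sort are illustrative assumptions:

```cpp
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

// Hypothetical counter in the spirit of the applet's 'Ops' figure:
// it tallies comparisons and item swaps, ignoring loop bookkeeping
// and other implementation details, as described above.
struct OpCounter {
    long compares = 0;
    long swaps = 0;
    long total() const { return compares + swaps; }
};

// Selection sort instrumented with the counter. Any of the applet's
// algorithms could be instrumented the same way.
void selectionSort(std::vector<int>& a, OpCounter& ops) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t min = i;
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            ++ops.compares;
            if (a[j] < a[min]) min = j;
        }
        if (min != i) {
            ++ops.swaps;              // at most N-1 swaps in total
            std::swap(a[i], a[min]);
        }
    }
}

int main() {
    std::vector<int> a = {5, 3, 8, 1, 9, 2, 7, 0, 6, 4};
    OpCounter ops;
    selectionSort(a, ops);
    std::cout << "Ops: " << ops.total() << "\n";  // compares dominate: ~N^2/2
}
```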
## Basic Algorithm Characteristics

The analysis of sort algorithms is complex and has a lot of variables, but in general the algorithms used here should come out something like this:

• Bubble sort is O(N**2); the number of Ops should come out <= 512 * 512 = 262144.
• Insertion sort is also O(N**2), but it can approach O(N) for nearly sorted data (see the first sketch after this list).
• Selection sort is O(N**2) as well, but it only performs N-1 swaps.
• Quicksort is O(N lg N) on the average but can degenerate to (N**2)/2 in the worst case (try the ordered data set on quicksort). Quicksort is recursive and needs a lot of stack space.
• Median-of-3 quicksort makes the worst case much less likely, since it does a better job of picking the pivot element on which to partition the data set (see the second sketch after this list).
• Three-way partitioning quicksort improves the behavior of quicksort when there are a lot of items equal to the pivot.
• Shellsort (named for its inventor, Donald Shell) runs in less than O(N**(4/3)) time for this implementation. Shellsort is iterative and doesn't require much extra memory.
• Merge sort is O(N lg N) for all data sets, so while it is slower than the best case for quicksort, it doesn't have degenerate cases. It needs additional storage equal to the size of the input array and it is recursive so it needs stack space.
• Bottom-up merge sort is a non-recursive variant that does the same work in a different order (see the third sketch after this list).
• Heap sort is guaranteed to be O(N lg N); it doesn't degenerate like quicksort and doesn't use extra memory like merge sort, but its implementation performs more operations, so on average it's not as good as quicksort.
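
To make the near-O(N) behavior of insertion sort concrete, here is a minimal C++ sketch (an illustrative reconstruction, not the applet's source):

```cpp
#include <cstddef>
#include <vector>

// Basic insertion sort. For each element, slide it left past any
// larger neighbors. On nearly sorted input almost no sliding occurs,
// so the cost approaches O(N); on reversed input every element slides
// the full distance, giving the O(N**2) worst case.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int v = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > v) {  // comparison counted as an Op
            a[j] = a[j - 1];             // half-exchange (move, not full swap)
            --j;
        }
        a[j] = v;
    }
}
```

On the "NearOrdered" data set the inner while loop almost never iterates, so the cost stays close to N comparisons; on "Reverse" every element slides all the way left.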
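Here is a sketch of median-of-3 pivot selection in the same spirit (again an illustrative reconstruction, not the applet's source). Sorting the first, middle, and last elements and partitioning on their median means an already-ordered data set no longer drives quicksort into its degenerate case:

```cpp
#include <utility>
#include <vector>

// Partition a[lo..hi] around a pivot chosen as the median of the
// first, middle, and last elements. Choosing the median makes the
// worst case (e.g., already-ordered input) far less likely than
// always taking an end element as the pivot.
int medianOfThreePartition(std::vector<int>& a, int lo, int hi) {
    int mid = lo + (hi - lo) / 2;
    // Sort a[lo], a[mid], a[hi] so the median ends up at a[mid];
    // a[lo] and a[hi] then also act as sentinels for the scans below.
    if (a[mid] < a[lo]) std::swap(a[mid], a[lo]);
    if (a[hi]  < a[lo]) std::swap(a[hi],  a[lo]);
    if (a[hi]  < a[mid]) std::swap(a[hi], a[mid]);
    std::swap(a[mid], a[hi - 1]);       // stash the pivot next to the end
    int pivot = a[hi - 1];
    int i = lo, j = hi - 1;
    for (;;) {                          // standard two-pointer partition
        while (a[++i] < pivot) {}
        while (pivot < a[--j]) {}
        if (i >= j) break;
        std::swap(a[i], a[j]);
    }
    std::swap(a[i], a[hi - 1]);         // restore pivot to its final place
    return i;
}

void quicksortM3(std::vector<int>& a, int lo, int hi) {
    if (hi - lo < 2) {                  // ranges of size <= 2: finish trivially
        if (hi > lo && a[hi] < a[lo]) std::swap(a[lo], a[hi]);
        return;
    }
    int p = medianOfThreePartition(a, lo, hi);
    quicksortM3(a, lo, p - 1);
    quicksortM3(a, p + 1, hi);
}
```

On the "Ordered" data set this pivot is the exact median of the range, so the partitions stay balanced instead of degenerating to O(N**2).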
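Finally, a sketch of the bottom-up mergesort pass structure (illustrative; it uses the standard library's `std::merge` rather than a hand-written merge):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Bottom-up mergesort: instead of recursing top-down, merge runs of
// width 1, then 2, then 4, ... until a single run covers the array.
// Like the recursive version it is O(N lg N) for all inputs and needs
// an auxiliary buffer the size of the input.
void bottomUpMergeSort(std::vector<int>& a) {
    std::vector<int> aux(a.size());
    for (std::size_t width = 1; width < a.size(); width *= 2) {
        for (std::size_t lo = 0; lo + width < a.size(); lo += 2 * width) {
            std::size_t mid = lo + width;
            std::size_t hi = std::min(lo + 2 * width, a.size());
            std::merge(a.begin() + lo, a.begin() + mid,   // left run
                       a.begin() + mid, a.begin() + hi,   // right run
                       aux.begin() + lo);                 // merged output
            std::copy(aux.begin() + lo, aux.begin() + hi, a.begin() + lo);
        }
    }
}
```

Each pass doubles the run width, so there are about lg N passes over N elements: the same O(N lg N) work as the recursive version, scheduled iteratively.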

## Source Code

This page and applet were originally written by David M. Howard, October 1998. They were extended (with twice as many sorts and data distributions) by Brian Howard (no relation) in September 2002.