1. Sorting

We have seen that we have needed to sort data in a couple of places:

  • we needed to be able to sort genomes when we evolved Taylor Swift
  • we needed to sort objects based on their depth (z value) to draw background items first
  • we need it in many other places in computing!

Sorting a set of items is a good place to dive into the idea of an algorithm.

1.1 Algorithms

An algorithm is a well-defined, complete, step-by-step description that performs a process in a finite amount of time and space.

Before, we have used a straightforward algorithm that is so stupid, it doesn't have a name. Let's call it squared_sort for a reason that will become obvious shortly.

The Squared sort algorithm, stated very abstractly:

Compare all combinations of values at positions i and j:
    Swap where item at i is greater than the item at j

The Squared sort algorithm, stated with more details about an implementation:

Starting at the first element and going to second-to-last item, call this i:
   Starting at i, and going to last item, call this j:
       if the item at location i is greater than the item at location j:
           Swap the items at i and j
       end if
   end j loop
end i loop

An algorithm is not so specific that it requires that it be written in a computer language. However, it is specific enough that it can be written in a programming language.

squared_sort looks like this in Java:

void squared_sort() {
    for (int i = 0; i < array.length - 1; i++) {
        for (int j = i; j < array.length; j++) {
            comparisons++;
            if (array[i] > array[j]) {
                swap(i, j);
            }
        }
    }
}

void swap(int i, int j) {
    swaps++;
    int tmp = array[i];
    array[i] = array[j];
    array[j] = tmp;
}

As we discussed in class, this sorting algorithm will compare all pairwise combinations of items, swapping the elements of an array if the first (array[i]) is bigger than the second (array[j]).

Things to note:

  1. the array operated on must be a global
  2. we keep track of the comparisons made, and the number of swapped items through global variables comparisons, and swaps
  3. i runs from 0 to length - 1
  4. j runs from i to length
  5. reported times are in milliseconds, ms. There are 1,000 ms in a second

About how many comparisons and swaps will be performed on an array of 10 items? On 1000 items?

Stop here and think.

Ok, now let's experiment!

1.1.1 Squared sort

Putting everything together to create an array of 10 random numbers, and sort them using squared_sort.

Shows the time, comparisons, and swaps for 10 random elements.

In [3]:
int [] array;
int comparisons = 0;
int swaps = 0;

void squared_sort() {
    for (int i = 0; i < array.length - 1; i++) {
        for (int j = i; j < array.length; j++) {
            comparisons++;
            if (array[i] > array[j]) {
                swap(i, j);
            }
        }
    }
}

void swap(int i, int j) {
    swaps++;
    int tmp = array[i];
    array[i] = array[j];
    array[j] = tmp;
}

void print_array() {
    for (int i = 0; i < array.length; i++) {
        print("" + array[i] + ", ");
    }
    println("");
}

void setup() {
    println("-----------------------------------------------------------------");

    array = new int [10];
    for (int i = 0; i < array.length; i++) {
        array[i] = int(random(array.length));
    }

    comparisons = 0;
    swaps = 0;
    println("Before sorting:");
    if (array.length < 100)
        print_array();

    int start = millis();
    squared_sort();
    int stop = millis();
    println("Squared sort took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);
    if (array.length < 100)
        print_array();
}
Sketch #1:

Sketch #1 state: Loading...

You should see output that is similar to:

Before sorting:
6, 3, 9, 5, 4, 3, 1, 8, 5, 5, 
Squared took: 0 ms, 54 compares, 22 swaps on random 10
1, 3, 3, 4, 5, 5, 5, 6, 8, 9, 

Does that match your expectations?

If we look at the algorithm (or code) we see that we run through all of the items in the array, and for each of those, we basically run through the items in the array again! You'll study these ideas much more in Data Structures, but as a rough approximation, we see that we'd make about $n ^ 2$ comparisons (where n is the number of items in an array). That would be about 100 in this example. (Actually, we see that since j starts at i, it is really $n^2/2$ which is closer to what we actually see, 54).

Do the following experiments:

How does it do on 100 items?

How does it do on 1000 items?

How does it do on 10,000 items?

This $n ^ 2$ is bad.

Sketch on paper what the graph looks like for squared-sort

1.1.2 Bubble sort

Bubble sort is the name given to a simple, and somewhat interesting, sorting algorithm.

The Bubblesort algorithm in abstract terms:

While not done:
    Run through all items in a set, i:
        if item i is greater than item i + 1:
            Swap items i and i + 1
    If no swaps were made on last sweep, you are done

The Bubblesort algorithm, with more implementation details:

Let stop be the length of the array
While stop is not at the beginning:
    Starting at the first element and going to stop - 1, call this i:
       Let new_stop be the beginning position
       if the item at location i is greater than the item at location i + 1:
           swap the items at i and i + 1
           Save i as new_stop
       end if
    Let stop be new_stop
end while

And written in Java:

void bubble_sort() {
    int stop = array.length;
    while (stop != 0) {
        last_swap = 0;
        for (int i = 0; i < stop - 1; i++) {
            comparisons++;
            if (array[i] > array[i + 1]) {
                swap(i, i + 1);
                last_swap = i + 1;
            }
        }
        stop = last_swap;
    }
}

How will this compare to Squared sort?

Let's try it!

In [5]:
int [] array;
int comparisons = 0;
int swaps = 0;

void bubble_sort() {
    int stop = array.length;
    while (stop != 0) {
        int last_swap = 0;
        for (int i = 0; i < stop - 1; i++) {
            comparisons++;
            if (array[i] > array[i + 1]) {
                swap(i, i + 1);
                last_swap = i + 1;
            }
        }
        stop = last_swap;
    }
}

void swap(int i, int j) {
    swaps++;
    int tmp = array[i];
    array[i] = array[j];
    array[j] = tmp;
}

void print_array() {
    for (int i = 0; i < array.length; i++) {
        print("" + array[i] + ", ");
    }
    println("");
}

void setup() {
    println("-----------------------------------------------------------------");

    array = new int [10];
    for (int i = 0; i < array.length; i++) {
        array[i] = int(random(array.length));
    }

    comparisons = 0;
    swaps = 0;
    println("Before sorting:");
    if (array.length < 100)
        print_array();

    int start = millis();
    bubble_sort();
    int stop = millis();
    println("Bubble sort took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);
    if (array.length < 100)
        print_array();
}
Sketch #2:

Sketch #2 state: Loading...

You probably see results, such as:

Before sorting:
2, 4, 5, 5, 9, 6, 7, 8, 0, 9, 
Bubble sort took: 0 ms, 37 compares, 11 swaps on random 10
0, 2, 4, 5, 5, 6, 7, 8, 9, 9, 

Try Bubble sort on 10, 100, 1000, and 10,000 array sizes*

Describe how bubble sort differs in behavior compared to squared sort. Draw on the graph Bubble sort results.

1.1.2.1 Bubble sort is best, at something!

A big difference between bubble_sort and all other sorts is its behavior on an already sorted array.

Try it!

Do an experiment where you call bubble sort a second time. You can do the same for squared sort, but it will always give the same number of comparisons for a given $n$.

1.1.3 Quicksort

The final algorithm that we will look at today is Quicksort. Quicksort represents a class of algorithms in computer science that capture the motto "divide and conquer".

The Quicksort algorithm, at 30,000 feet (abstract view):

Pick a pivot value
Break set of items into two parts:
    first part is all less than the pivot value
    second part is all greater than the pivot value
Sort the first part
Sort the second part

The Quicksort algorithm, with more implementation details:

Quicksort on start and end:
    if start less than end:
        Partition from start to end around p:
        Quicksort on start and p - 1
        Quicksort on p and end

The partition picks a point,

Partition from start to end:
   Pick a pivot position, p, and pivot value
   Partition array into:
       those less than pivot value
       pivot value
       those less than pivot value
   return partition

In Java, the main quicksort function looks like:

void quicksort(int lo, int hi) {
    if (lo < hi) {
        int p = partition(lo, hi);
        quicksort(lo, p - 1);
        quicksort(p + 1, hi);
    }
}

How will this differ from squared sort and bubble sort?

In [9]:
int [] array;
int comparisons = 0;
int swaps = 0;

void quicksort(int lo, int hi) {
    if (lo < hi) {
        int p = partition(lo, hi);
        quicksort(lo, p - 1);
        quicksort(p + 1, hi);
    }
}

int partition(int lo, int hi) {
    int pivotIndex = choosePivot(lo, hi);
    int pivotValue = array[pivotIndex];
    // put the chosen pivot at array[hi]
    swap(pivotIndex, hi);
    int storeIndex = lo;
    // Compare remaining array elements against pivotValue = array[hi]
    for (int i = lo; i <= hi - 1; i++)  {
        comparisons++;
        if (array[i] < pivotValue) {
             swap(i, storeIndex);
             storeIndex = storeIndex + 1;
        }
    }
    swap(storeIndex, hi);  // Move pivot to its final place
    return storeIndex;
}

int choosePivot(int lo, int hi) {
    return ceil((hi + lo) / 2);
}

void swap(int i, int j) {
    swaps++;
    int tmp = array[i];
    array[i] = array[j];
    array[j] = tmp;
}

void print_array() {
    for (int i = 0; i < array.length; i++) {
        print("" + array[i] + ", ");
    }
    println("");
}

void setup() {
    println("-----------------------------------------------------------------");

    array = new int [10];
    for (int i = 0; i < array.length; i++) {
        array[i] = int(random(array.length));
    }

    comparisons = 0;
    swaps = 0;
    println("Before sorting:");
    if (array.length < 100)
        print_array();

    int start = millis();
    quicksort(0, array.length - 1);
    int stop = millis();
    println("Quicksort took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);
    if (array.length < 100)
        print_array();
}
Sketch #3:

Sketch #3 state: Loading...

You'll see something similar to:

Before sorting:
1, 9, 6, 8, 5, 9, 7, 7, 5, 0, 
Quicksort took: 0 ms, 32 compares, 26 swaps on random 10
0, 5, 6, 5, 1, 7, 7, 8, 9, 9, 

Try Quicksort on 10, 100, 1000, and 10,000 array sizes*

Describe how quicksort differs in behavior compared to bubble and squared sort.

Create a plot with all of the following lines:

For different sizes of n (10, 100, 1,000, 10,000):

  1. Squared sort, on random data and on sorted data
  2. Bubble sort, on random data and on sorted data
  3. Quicksort, on random data and on sorted data

Here is some code that might help:

In [16]:
int [] array;
int comparisons = 0;
int swaps = 0;

void swap(int i, int j) {
    swaps++;
    int tmp = array[i];
    array[i] = array[j];
    array[j] = tmp;
}

void squared_sort() {
    for (int i = 0; i < array.length - 1; i++) {
        for (int j = i; j < array.length; j++) {
            comparisons++;
            if (array[i] > array[j]) {
                swap(i, j);
            }
        }
    }
}

void bubble_sort() {
    int stop = array.length;
    while (stop != 0) {
        int last_swap = 0;
        for (int i = 0; i < stop - 1; i++) {
            comparisons++;
            if (array[i] > array[i + 1]) {
                swap(i, i + 1);
                last_swap = i + 1;
            }
        }
        stop = last_swap;
    }
}

void quicksort(int lo, int hi) {
    if (lo < hi) {
        int p = partition(lo, hi);
        quicksort(lo, p - 1);
        quicksort(p + 1, hi);
    }
}

int partition(int lo, int hi) {
    int pivotIndex = choosePivot(lo, hi);
    int pivotValue = array[pivotIndex];
    // put the chosen pivot at array[hi]
    swap(pivotIndex, hi);
    int storeIndex = lo;
    // Compare remaining array elements against pivotValue = array[hi]
    for (int i = lo; i <= hi - 1; i++)  {
        comparisons++;
        if (array[i] < pivotValue) {
             swap(i, storeIndex);
             storeIndex = storeIndex + 1;
        }
    }
    swap(storeIndex, hi);  // Move pivot to its final place
    return storeIndex;
}

int choosePivot(int lo, int hi) {
    return ceil((hi + lo) / 2);
}

void print_array() {
    for (int i = 0; i < array.length; i++) {
        print("" + array[i] + ", ");
    }
    println("");
}

void setup() {
    println("=================================================================");
    for (int s = 5; s < 11; s++) { 
        println("-----------------------------------------------------------------");
        int[] orig = new int [int(pow(2, s))];
        array = new int [int(pow(2, s))];
        // make a copy, to compare across all
        for (int i = 0; i < array.length; i++) {
            orig[i] = int(random(array.length));
        }

        // reset array:
        for (int i = 0; i < array.length; i++) {
            array[i] = orig[i];
        }
        comparisons = 0;
        swaps = 0;
        int start = millis();
        squared_sort();
        int stop = millis();
        println("Squared took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);

        // reset:
        comparisons = 0;
        swaps = 0;
        start = millis();
        squared_sort();
        stop = millis();
        println("Squared took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on sorted " + array.length);

        // reset array:
        for (int i = 0; i < array.length; i++) {
            array[i] = orig[i];
        }
        comparisons = 0;
        swaps = 0;
        start = millis();
        bubble_sort();
        stop = millis();
        println("Bubble took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);

        // reset:
        comparisons = 0;
        swaps = 0;
        start = millis();
        bubble_sort();
        stop = millis();
        println("Bubble took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on sorted " + array.length);

        // reset array:
        for (int i = 0; i < array.length; i++) {
            array[i] = orig[i];
        }
        comparisons = 0;
        swaps = 0;
        start = millis();
        quicksort(0, array.length - 1);
        stop = millis();
        println("Quicksort took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on random " + array.length);

        // reset:
        comparisons = 0;
        swaps = 0;
        start = millis();
        quicksort(0, array.length - 1);
        stop = millis();
        println("Quicksort took: " + (stop - start) + " ms, " + comparisons + " compares, " + swaps + " swaps on sorted " + array.length);
    }
}
Sketch #5:

Sketch #5 state: Loading...