MCS 275 Spring 2021
Emily Dumas
Let's look at the functions $n$, $n \log(n)$, and $n^2$ as $n$ grows.
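A few sample values make the gap concrete (an illustrative script, not from the slides):

```python
from math import log2

# Tabulate n, n*log2(n), and n**2 to compare growth rates.
for n in [10, 100, 1000, 10**6]:
    print(f"{n:>8} {round(n * log2(n)):>12} {n**2:>14}")
```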
We say `L` is partitioned if there is an element (the pivot) so that:

- every element before the pivot is less than the pivot, and
- every element after the pivot is greater than or equal to the pivot.

`partition(L,start,end)` should move around elements of `L` between indices `start` and `end` to achieve this, returning the pivot position.
`quicksort`:

- Input: list `L` and indices `start` and `end`.
- Goal: reorder elements of `L` so that `L[start:end]` is sorted.
1. If `end-start` is less than or equal to 1, return immediately.
2. Call `partition(L,start,end)` to partition the list, letting `m` be the final location of the pivot.
3. Call `quicksort(L,start,m)` and `quicksort(L,m+1,end)` to sort the parts of the list on either side of the pivot.

We will always use `L[end-1]` as the pivot (though there are other common choices).
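These steps translate directly into Python; a minimal sketch, assuming the `partition` routine described below:

```python
def quicksort(L, start, end):
    """Sort L[start:end] in place."""
    if end - start <= 1:
        return  # a list of length 0 or 1 is already sorted
    # Partition L[start:end]; m is the final position of the pivot.
    m = partition(L, start, end)
    # Sort the parts of the list on either side of the pivot.
    quicksort(L, start, m)
    quicksort(L, m + 1, end)
```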
There is an algorithm for `partition` based on the idea of using swaps to move all elements smaller than the pivot into a contiguous block at the beginning. It makes a single pass through the list. Let's implement `partition` (with last-element pivot) in Python.
`partition`:

- Input: list `L` and indices `start` and `end`.
- Goal: take `L[end-1]` as a pivot, and reorder elements of `L` to partition `L[start:end]` accordingly.
1. Set `pivot = L[end-1]`.
2. Set `dst = start`.
3. For `src` from `start` to `end-1`: if `L[src] < pivot`, swap `L[src]` and `L[dst]`, then increase `dst` by 1.
4. Swap `L[end-1]` and `L[dst]` to put the pivot in its proper place.
5. Return `dst`.
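One direct translation of these steps into Python (a sketch matching the spec above):

```python
def partition(L, start, end):
    """Partition L[start:end] around the pivot L[end-1].

    Returns the final index of the pivot."""
    pivot = L[end - 1]
    dst = start  # destination for the next element smaller than the pivot
    for src in range(start, end - 1):
        if L[src] < pivot:
            # Grow the block of small elements at the front.
            L[src], L[dst] = L[dst], L[src]
            dst += 1
    # Put the pivot just after the block of smaller elements.
    L[end - 1], L[dst] = L[dst], L[end - 1]
    return dst
```

For example:

```python
L = [5, 2, 8, 1, 9, 3]
quicksort(L, 0, len(L))
print(L)  # [1, 2, 3, 5, 8, 9]
```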
Popular choices for the pivot:

- the last element, `L[end-1]` (used in lecture today)
- the first element, `L[start]`
- a random element of `L[start:end]`
- the middle element, `L[(start+end)//2]`
- the median of `L[start:end]` (more complicated to find!)

Knowing something about your starting data may guide the choice of partition strategy (or even the choice to use something other than quicksort). Almost-sorted data is a common special case where first or last pivots are bad; see the random-pivot sketch below.
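One way to implement the random-pivot choice is to move a randomly chosen element into the last position and reuse the last-element `partition` above (the name `partition_random` is ours, not from lecture):

```python
import random

def partition_random(L, start, end):
    # Move a randomly chosen element into the pivot position, then
    # reuse the last-element partition scheme.
    p = random.randrange(start, end)
    L[p], L[end - 1] = L[end - 1], L[p]
    return partition(L, start, end)
```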
Theorem: If you measure the time cost of quicksort in any of these terms (number of comparisons, number of swaps, or number of function calls), then the cost to sort a list of length $n$ is less than $C n^2$, for some constant $C$. But if you average over all possible orders of the input data, the result is less than $C n \log(n)$.
What if we ask our version of `quicksort` to sort a list that is already sorted? The last element is always the largest, so every call to `partition` puts all the other elements on one side of the pivot. Recursion depth is $n$ (whereas if the pivot is always the median it would be $\approx \log_2 n$). The number of comparisons is $\approx C n^2$. Very slow!
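We can see this with the sketches above: on an already-sorted list the recursion depth grows like $n$ and quickly exceeds Python's default recursion limit of 1000 (an illustrative demo, not from lecture):

```python
import random

data = list(range(2000))  # already sorted: worst case for a last-element pivot
try:
    quicksort(data, 0, len(data))
except RecursionError:
    print("recursion depth grew like n and hit Python's limit")

random.shuffle(data)           # a typical random order
quicksort(data, 0, len(data))  # completes easily: depth is about log2(n)
```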
A sort is called stable if items that compare as equal stay in the same relative order after sorting.
This could be important if the items are more complex objects we want to sort by one attribute (e.g. sort alphabetized employee records by hiring year).
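Python's built-in sort is stable, so the employee-record example can be done in a single stable pass over alphabetized data (the sample records are made up):

```python
# Records of (name, hiring year), already alphabetized by name.
records = [("Adams", 2017), ("Baker", 2015), ("Chen", 2015), ("Diaz", 2017)]

# A stable sort by year keeps each year's names in alphabetical order.
records.sort(key=lambda r: r[1])
print(records)
# [('Baker', 2015), ('Chen', 2015), ('Adams', 2017), ('Diaz', 2017)]
```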
As we implemented them:
Algorithm | Time (worst) | Time (average) | Stable? | Space |
---|---|---|---|---|
Mergesort | $C n \log(n)$ | $C n\log(n)$ | Yes | $C n$ |
Quicksort | $C n^2$ | $C n\log(n)$ | No | $C$ |
Insertion | $C n^2$ | $C n^2$ | Yes | $C$ |
Bubble | $C n^2$ | $C n^2$ | Yes | $C$ |
(Every time $C$ is used, it represents a different constant.)
Mergesort is rarely a bad choice. It is stable and sorts in $C n \log(n)$ time. Nearly sorted input is not a pathological case. Its main weakness is its use of memory proportional to the input size.
Heapsort, which we'll discuss later, has $C n \log(n)$ running time and uses constant space, but it is not stable.
There are stable comparison sorts with $C n \log(n)$ running time and constant space (best in every category!) though they are more complex.
If swaps and comparisons have very different costs, it may be important to select an algorithm that minimizes one of them. Python's `list.sort` assumes that comparisons are expensive, and uses Timsort.
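One practical consequence: the `key` parameter of `list.sort` computes a comparison key once per element, so an expensive computation is not repeated at every comparison:

```python
words = ["banana", "Apple", "cherry"]

# str.casefold is called once per element (n times), not once per
# comparison; the sort then compares the precomputed keys.
words.sort(key=str.casefold)
print(words)  # ['Apple', 'banana', 'cherry']
```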
Algorithms that take time proportional to $n^2$ are a big source of real-world trouble. They are often fast enough in small-scale tests to go unnoticed as a problem, yet slow enough on large inputs to overwhelm even the fastest computers.