A document from MCS 275 Spring 2024, instructor Emily Dumas. You can also get the notebook file.

MCS 275 Spring 2024 Worksheet 6 Solutions

  • Course instructor: Emily Dumas
  • Contributors to this document: Johnny Joyce, Kylash Viswanathan, Patrick Ward

Topics

The main topic of this worksheet is sorting, and the recursive algorithms for sorting we covered in lectures 12 and 13.

Resources

These things might be helpful while working on the problems. Remember that for worksheets, we don't strictly limit what resources you can consult, so these are only suggestions.

1. Merge sorted stacks

Recall that a stack is a data structure that mimics a physical stack of items, where the only available operations are to remove the top item ("pop") or add a new item to the top ("push").

In Python, you can simulate a stack using a list by limiting yourself to only calling the methods

  • .pop() for the stack pop operation, and
  • .append(x) for the stack push operation In this way, the end of the list becomes the top of the stack.

In mergesort, the main step was to create a function that can merge two sorted lists. We made a version of this that uses indexing by integers. However, the algorithm for merging two sorted lists only ever needs to look at the "next" item from each of the lists, meaning it can also be implemented using stacks.

Make a function merge_sorted_stacks(A,B,S) that takes two stacks A and B whose elements are in sorted order, with the top of each stack being the smallest element, and an initially empty stack S. The function should merge the two stacks into a single reverse-sorted stack S. It can destroy A and B as it does so.

Remember, A, B, and S will actually be Python list objects. The only thing that makes them stacks is that you won't use any methods or operations on them except .pop() and .append(x).

For example, merge_sorted_stacks should function as follows:

In [3]:
# Example with numbers
# A list of numbers is a sorted stack if it is in descending order
# meaning the top of stack (last element of the list) is the smallest.
A = [5,3,1]
B = [6,4,3,2,0]
S = []
merge_sorted_stacks(A,B,S)
S  # will be a reverse sorted stack: top=last element will be largest element
Out[3]:
[0, 1, 2, 3, 3, 4, 5, 6]
In [4]:
# Example with strings
# A list of strings is a sorted stack if it is in reverse alphabetical order
# meaning the top of stack (last element of the list) is the earliest in 
# the Python string order
S = []
merge_sorted_stacks(
    ["zebra","kangaroo","aardvark"],
    ["newt","asp"],
    S)
S
Out[4]:
['aardvark', 'asp', 'kangaroo', 'newt', 'zebra']

Solution

In [11]:
def merge_sorted_stacks(A,B,S=[]):
    """Takes two sorted stacks A and B. Returns merged sorted stack."""
    # A_next and B_next should represent the next item to be taken off the stack
    A_next = A.pop()
    B_next = B.pop()
    
    # Take items from A and B until one list is empty
    while True:
        
        if A_next <= B_next:
            S.append(A_next)
            if len(A) == 0: # If there's nothing left in A, stop the loop
                S.append(B_next) # There's still an item from B that we need to append
                break
            else:
                A_next = A.pop()
            
        else:
            S.append(B_next)
            if len(B) == 0:
                S.append(A_next) # There's still an item from A that we need to append
                break
            else:
                B_next = B.pop()
            
    # After the loop has finished, one of the two lists may still have items in it
    # So take all the remaining items and put them in S.
    while len(A) != 0:
        A_next = A.pop()
        S.append(A_next)
    while len(B) != 0:
        B_next = B.pop()
        S.append(B_next)
    
    return S

2. Quicksort with other pivot strategies

The quicksort implementation we discussed in lecture uses the last element of the list as a pivot. Let's explore other options.

Make a new version of quicksort that has an optional argument pivot_strategy with default value "last". Implement these other behaviors for the following values:

  • "first" - Always use the first element as the pivot
  • "middle" - Use an element as close as possible to the middle of the list as the pivot
  • "random" - Select the pivot position at random

(Of course, quicksort is always operating on the part of the list between start and end, so all instances of "first", "last", etc., are understood relative to that part. For example, "first" means index start.)

Test your modified version of quicksort to confirm that it works properly with each strategy.

Don't forget that the pivot_strategy argument needs to be propagated to the recursive calls!

Solution

In [2]:
def partition(L, start, end, verbose=False, pivot_strategy="last"):
    """
    Partition L[start:end] using the
    last element (L[end-1]) as the pivot.
    Returns the position of the pivot.
    """
    # Choose the index of the pivot
    if pivot_strategy == "last":
        pivot_index = end - 1
    elif pivot_strategy == "first":
        pivot_index = start
    elif pivot_strategy == "middle":
        pivot_index = (start + end) // 2
    elif pivot_strategy == "random":
        pivot_index = random.choice(range(start,end-1))
    
    # Put the pivot at the end
    L[pivot_index], L[end - 1] = L[end - 1], L[pivot_index]

    # Define the pivot. Now the rest of the algorithm can proceed without any changes
    pivot = L[end - 1]
        
    dst = start
    for src in range(start, end):  # challenge Q: Why can't we use range(start,end-1)
        if L[src] < pivot:
            # swap L[src], L[dst]
            L[src], L[dst] = L[dst], L[src]
            dst += 1

    # put the pivot into its final place
    L[end - 1], L[dst] = L[dst], L[end - 1]
    if verbose:
        print("Partitioned into:", L[start:dst], " ", L[dst], " ", L[dst + 1 : end])
    return dst

def quicksort(L, start=0, end=None, verbose=False, pivot_strategy="last"):
    """
    Sort L[start:end] in place using
    quicksort.  Modifies L, returns nothing.
    """
    if end == None:
        # default to end of L
        end = len(L)
    # Are we done yet?!
    if end - start <= 1:
        return
    # Not done yet, need to partition L
    m = partition(L, start, end, verbose, pivot_strategy)
    # m is now the pivot position
    quicksort(L, start, m, verbose, pivot_strategy)  # orange (less than pivot)
    quicksort(L, m + 1, end, verbose, pivot_strategy)  # purple (>= pivot)

3. Pathological test data generator

We discussed in class that quicksort with last element pivots behaves poorly (quadratic time) on sorted input. I mentioned that "nearly sorted input" also creates problems. But the phrase "nearly sorted" is not precise, and could have many interpretations.

Write the following functions to generate lists that are far from randomly ordered, for testing purposes:

  • nearly_sorted1(n,frac=0.1) - Returns a list of length n. That list should contain consecutive integers, except that int(frac*n) of them are replaced with random values chosen from the range -2*n to 3*n. That is, fraction frac of the positions are chosen randomly from a wider range, but the rest are linearly increasing.
  • nearly_sorted2(n) - Returns a list of length n. The difference between any entry and the one before it is a number randomly chosen from these options: [-1,0,1,2,3,4,5]. Since most of these values are positive, the list "mostly increases" and so could be considered "nearly sorted". Unlike nearly_sorted1, this one doesn't produce lists where an entry is very far from the ones on either side of it.
  • nearly_sorted3(n,k=3) - Returns a list of length n obtained from list(range(n)) as follows: First, that list is broken into k pieces of similar size. Then, those pieces are reassembled in a random order. For example, if n=10 and k=2 you might get a return value like [5,6,7,8,9,0,1,2,3,4] because pieces [0,1,2,3,4] and [5,6,7,8,9] were reassembled in the opposite order. Since the resulting list is likely to have long sublists that are in increasing order, it could be considered "nearly sorted".

Solution

In [3]:
import random

def nearly_sorted1(n, frac=0.1):
    """Generates list of integers 0 to n-1. Replaces n*frac with random ints"""
    L = list(range(n))
    to_replace = random.sample(L, int(frac*n)) # Randomly choose n*frac items to replace
    for i in to_replace:
        L[i] = random.randint(-2*n, 3*n)
    return L

print(nearly_sorted1(10))
[0, 1, 2, -16, 4, 5, 6, 7, 8, 9]
In [4]:
def nearly_sorted2(n):
    """List of n items starting at 0. Each successive element randomly differs from the
    last by one of: -1,0,1,2,3,4,5"""
    L = [0]
    for i in range(n-1):
        L.append( L[-1] + random.choice((-1,0,1,2,3,4,5)))
    return L

print(nearly_sorted2(10))
[0, 3, 4, 3, 3, 4, 4, 9, 12, 13]
In [5]:
def nearly_sorted3(n, k=3):
    """Takes list with numbers 0 to n-1. Breaks into k pieces, shuffles, then re-assembles"""
    L = list(range(n))
    
    # Choose k-1 indices in ascending order where we'll "split" the list.
    # Don't choose index 0 or -1 because this will result in one of our pieces
    # being empty (i.e. L[0:0] is empty and L[-1:-1] is empty)
    indices = sorted(random.sample(L[1:-1], k-1))
    
    pieces = []
    
    # First piece is from index 0 up to our first randomly-chosen index
    pieces.append(L[:indices[0]])
    
    # Middle pieces
    for i in range(1, k-1):
        pieces.append( L[indices[i-1]:indices[i]] )
        
    # Last piece is from last randomly-chosen index onwards
    pieces.append( L[indices[-1]:])
    
    random.shuffle(pieces)
    
    
    # Put our pieces back into one list
    L = []
    for piece in pieces:
        L.extend(piece)
    return L


print(nearly_sorted3(10, k=3))
[1, 2, 3, 4, 5, 6, 7, 0, 8, 9]

4. Quicksort stress test

Do timing studies of quicksort with different pivot strategies on the types of nearly sorted lists generated in the previous problem, for n=1000, n=10_000, and n=100_000. Which ones show a clear difference relative to a randomly shuffled list? (In class we noticed this was the case for last-element pivots and comparing precisely sorted and randomly shuffled input.)

Is there a difference between the strategies? Between the types of nearly sorted input?

Solution

This may take a long time to run!

In [11]:
import time
import sys
import tabulate
from tabulate import tabulate #tabulate is a package which prints tables

sys.setrecursionlimit(100_000) #change how many recursions are allowed

pivot_strategies = ["last", "first", "middle", "random"]
lengths = [1000, 10_000, 100_000]

#create a dictionary for each length
len1000 = {strategy:[strategy] for strategy in pivot_strategies}
len10000 = {strategy:[strategy] for strategy in pivot_strategies}
len100000 = {strategy:[strategy] for strategy in pivot_strategies}
lengthdicts = [len1000,len10000,len100000]

#create a dictionary of the length dictionaries
length_dict = dict(zip(lengths,lengthdicts))
nearly_sorted_funcs = [nearly_sorted1, nearly_sorted2, nearly_sorted3]

for func in nearly_sorted_funcs:
    for length in lengths:
        L = func(length) #create a list of the given length with the given generator 
        for strategy in pivot_strategies:
            # Because quicksort operates in-place, we need to make a copy,
            # otherwise the list will already be sorted for the next run
            L_copy = L.copy()
            a = time.time()
            quicksort(L_copy, pivot_strategy = strategy)
            b = time.time()
            length_dict[length][strategy].append(b-a) #add the timing to the appropriate list



                
In [12]:
#create tables            
table1000 = [length_dict[1000][i] for i in pivot_strategies]
table10000 = [length_dict[10000][i] for i in pivot_strategies]
table100000 = [length_dict[100000][i] for i in pivot_strategies]

#display tables    
print("Length = 1,000")
print(tabulate(table1000,headers=["Strategy","nearly sorted 1", "nearly sorted 2", "nearly sorted 3"]))
print("-"*50)
print("Length = 10,000")
print(tabulate(table10000,headers=["Strategy","nearly sorted 1", "nearly sorted 2", "nearly sorted 3"]))
print("-"*50)
print("Length = 100,000")
print(tabulate(table100000,headers=["Strategy","nearly sorted 1", "nearly sorted 2", "nearly sorted 3"]))
Length = 1,000
Strategy      nearly sorted 1    nearly sorted 2    nearly sorted 3
----------  -----------------  -----------------  -----------------
last               0.00299907        0.0349188           0.03901
first              0.00500798        0.0202539           0.0220072
middle             0.00299072        0.000750065         0.00193596
random             0.00100088        0.00219584          0.00105667
--------------------------------------------------
Length = 10,000
Strategy      nearly sorted 1    nearly sorted 2    nearly sorted 3
----------  -----------------  -----------------  -----------------
last                0.0800476          3.50105            4.39366
first               0.0640016          1.7971             3.77291
middle              0.0159976          0.0129473          0.0149899
random              0.0170054          0.019053           0.0170023
--------------------------------------------------
Length = 100,000
Strategy      nearly sorted 1    nearly sorted 2    nearly sorted 3
----------  -----------------  -----------------  -----------------
last                 0.550946         414.372             52.9173
first                0.846028         247.995             46.1735
middle               0.332998           0.168922           0.288047
random               0.279958           0.231125           0.223828

Revision history

  • 2024-02-15 Initial publication