Here are links to two CSV files you should download:
Each one contains timing data for three sorting algorithms:
mergesort
we wrote in classquicksort
we wrote in classEach row of file timing-random.csv
shows the time taken by each algorithm to sort a list that is randomly shuffled. The first column of the file, n
, is the length of the list for that test.
The file timing-nearlysorted.csv
is similar, but for those tests the list used is in "nearly sorted order".
Write a Python script that does the following:
timing-plot-random.png
and timing-plot-nearlysorted.png
.In each plot there should be a title, axes labels, a legend, and the data for each algorithm should appear in a different color. Use dots as markers, and do not have the data points connected by lines.
Choose limits on the y
axis so that if one of the tests takes much longer than all the others, it doesn't prevent the other two from being seen in detail. (It's OK if that means the long-running one goes off the top of the plot.)
Put your code in hwk10prob2.py
. Upload the code and both images timing-plot-random.png
and timing-plot-nearlysorted.png
to gradescope.
import matplotlib.pyplot as plt
import numpy as np
import csv
import collections
def load_dataset(fn):
"Load columns into a dictionary mapping column names to arrays"
d = collections.defaultdict(list)
with open(fn,"r") as fp:
rdr = csv.DictReader(fp)
for row in rdr:
for col in row:
if col == "n":
coltype = int
else:
coltype = float
d[col].append( coltype(row[col]) )
for col in d:
d[col] = np.array(d[col])
return dict(d) # convert defaultdict to regular dict
rand_data = load_dataset("timing-random.csv")
ns_data = load_dataset("timing-nearlysorted.csv")
plt.figure(figsize=(8,5),dpi=100)
plt.plot(rand_data["n"],rand_data["quicksort"],marker=".",linestyle="none",label="quicksort")
plt.plot(rand_data["n"],rand_data["mergesort"],marker=".",linestyle="none",label="mergesort")
plt.plot(rand_data["n"],rand_data["list.sort"],marker=".",linestyle="none",label="list.sort")
plt.title("Sort timing, random input")
plt.xlabel("n")
plt.ylabel("time (s)")
plt.ylim(0,0.028)
plt.savefig("timing-plot-random.png")
plt.show()
plt.figure(figsize=(8,5),dpi=100)
plt.plot(ns_data["n"],ns_data["quicksort"],marker=".",linestyle="none",label="quicksort")
plt.plot(ns_data["n"],ns_data["mergesort"],marker=".",linestyle="none",label="mergesort")
plt.plot(ns_data["n"],ns_data["list.sort"],marker=".",linestyle="none",label="list.sort")
plt.title("Sort timing, nearly sorted input")
plt.xlabel("n")
plt.ylabel("time (s)")
plt.ylim(0,0.03)
plt.savefig("timing-plot-nearlysorted.png")
plt.show()
Work in hwk10prob3.py
.
Write a Python script to generate a plot as similar to the one below as you can manage. Try to do it with as few for
loops as you can manage, favoring numpy
features instead. (The code I used to generate it has no loops.)
Hint: You should use plt.scatter
.
Put your code in hwk10prob3.py
. Upload the code and a PNG image of your plot to gradescope.
Based on the idea that np.meshgrid
returns two matrices; one increases along rows, the other along columns. With a small position shift, these give the right behavior to make the pink and blue dot grids.
plt.figure(figsize=(8,8),dpi=100)
x = np.arange(10)
y = np.arange(10)
xx,yy = np.meshgrid(x,y)
plt.scatter(xx.ravel(),yy.ravel(),s=50*(1+xx.ravel()),c="#70A0FF")
plt.scatter((xx[:-1,:-1] + 0.5).ravel(),(yy[:-1,:-1] + 0.5).ravel(),s=50*(1+yy[:-1,:-1].ravel()),c="#FFAABB")
plt.xlim(-0.5,9.5)
plt.ylim(-0.5,9.5)
plt.show()