The main topics of this worksheet are:
These things might be helpful while working on the problems. Remember that for worksheets, we don't strictly limit what resources you can consult, so these are only suggestions.
The most useful files from the sample code repository are:
Here is a function that takes two strings and returns the set of characters that appear in both strings.
def common_chars(s1,s2):
"""Return a set of all characters that are present in both
strings `s1` and `s2`."""
common = set()
for c1 in s1:
if c1 in s2:
common.add(c1)
return common
It works. Here's a simple example:
common_chars("mathematics","computer science")
However, this function is actually needlessly slow. Here's an example that generates two strings that each have 50,000
characters, runs common_chars
on them, and prints the total time.
import random
import time
s1 = ''.join([ random.choice(["edfghijklmnopqrstuvwxyzzzzzzzzzzzzzzzz"]) for _ in range(50000) ])
s2 = ''.join([ random.choice(["abcedfghijklmnopqrstuvw"]) for _ in range(50000) ]) + 'z'
t_start = time.time()
both = common_chars(s1,s2)
t_end = time.time()
print("Common characters:")
print(both)
print("\nRunning time: {:.2f} seconds".format(t_end-t_start))
If you try this yourself, you might get a slightly different time, but it will probably take more than 10 seconds.
First, what is going on here? It should be possible to compare millions of characters for equality per second, and there are only 100,000 characters you need to look at, right?
Second, can you fix it? (It is possible to make this function shorter, clearer, and so that it takes less than 0.05 seconds.)
Recall that we have two classes for building trees:
Node
, for generic binary treesBST
, a subclass for binary search treesConsider the following binary tree, made of Node
objects:
[True]
/ \
["Alice"] [False]
/ \
[5.8] [5.5]
If we wanted to save the tree to a file, JSON would be a natural choice as it allows nesting of data structures. The tree above might be saved as:
{
"class": "Node",
"tree": {
"key": true,
"left": {
"key": "Alice",
"left": {
"key": 5.8,
"left": null,
"right": null
},
"right": {
"key": 5.5,
"left": null,
"right": null
}
},
"right": {
"key": false,
"left": null,
"right": null
}
}
}
Here, the top-level JSON object has just two keys: "class"
indicates what kind of tree it is (BST or Node), and "tree"
maps to a hierarchy of objects that represent the nodes of the tree.
The same general approach could be applied to binary search trees, too. This BST:
[6]
[5] [14]
[4] [10] [16]
Could be saved as:
{
"class": "BST",
"tree": {
"key": 6,
"left": {
"key": 5,
"left": {
"key": 4,
"left": null,
"right": null
},
"right": null
},
"right": {
"key": 14,
"left": {
"key": 10,
"left": null,
"right": null
},
"right": {
"key": 16,
"left": null,
"right": null
}
}
}
}
Add a method save(fp)
to class Node
in trees.py
that will save a tree as a JSON object, writing it to an open file object fp
.
Then, add a function to the module trees.py
called load(fp)
that takes an open file object, reads a JSON object from it, and then builds and returns the corresponding tree. The return type should be either Node
or BST
depending on what is found in the JSON file.
Suggestions / hints:
Node
objectsNode
called as_dict_tree()
that handles this naturally recursive operation. Then you can handle the top-level object creation (with "class"
and "tree"
keys) in the method save
itself.{"Node": Node, "BST": BST}
can be used to map names of classes to actual classes. That allows you to build a Node
or BST
object depending on a string, e.g.node_classes = {"Node": Node, "BST": BST}
k = "Node"
A = node_classes[k](key=5)
# Now A is a Node object
k = "BST"
B = node_classes[k](key=11)
# Now B is a BST object
When you're done, the following code should build a BST, save it to a file, and load it back again.
from trees import load, BST
from treevis import treeprint
T = BST()
T.insert(8)
T.insert(12)
T.insert(2)
T.insert(3)
with open("tree.json","w",encoding="UTF-8") as fp:
T.save(fp)
with open("tree.json","r",encoding="UTF-8") as fp:
W = load(fp)
print("Tree that was saved:")
treevis.treeprint(T)
print("Tree that was loaded:")
treevis.treeprint(W)
Write a program to merge any number of CSV files, so that each row in any of the input files becomes a row in the output file. All of the input CSV files will have header rows. If the CSV files have the same columns, this is of course easy. But you should also handle the general case, where some columns may exist in multiple files, and others may be unique to a single file. The output file should contain one column for each distinct column name that appears in any of the input files.
Arrange it so your program csvmerge.py
accepts all the input filenames as command line arguments. The last command line argument is the name of the output file that should be created.
For example, you might use a command like
python3 csvmerge.py a.csv b.csv c.csv out.csv
with a.csv
containing:
name,age,favorite
Melissa,52,vanilla
Jonah,24,strawberry
and b.csv
containing:
name,major
Josefina,falconry
David,phrenology
and c.csv
containing:
age,major
5,bubbles
11,chess
In which case the program should create out.csv
containing:
name,age,favorite,major
Melissa,52,vanilla,
Jonah,24,strawberry,
Josefina,,,falconry
David,,,phrenology
,5,,bubbles
,11,,chess
This is an elaboration of a comment from lecture. It doesn't ask you to do much except experiment with an interesting construction.
The following one-line function lets you create hierarchies of dictionaries in Python using defaultdict:
# Adapted from https://gist.github.com/hrldcpr/2012250 by Harold Cooper
from collections import defaultdict
def autohierarchy():
return defaultdict(autohierarchy)
Here's an example of how you can use it. Basically, you just attempt to access keys to create them! You can even access a key of a nested dictionary, and all the surrounding dictionaries will be created.
tasks = autohierarchy()
print("Adding stuff")
tasks["home"]["maintenance"]["demagnetize rain gutters"]
tasks["home"]["maintenance"]["replace missing front door"]
tasks["home"]["cleaning"]["load dishwasher"]
tasks["home"]["cleaning"]["run dishwasher"]
tasks["home"]["cleaning"]["empty dishwasher"]
tasks["school"]["mcs 275"]["project 3"]
print("Ok, added the stuff")
Now if you print one of these objects, it looks ugly. But if you convert it to JSON, it looks nice.
import json
print("Ugly, but has all the stuff we intended:")
print(tasks)
print()
print("Nicer:")
print(json.dumps(tasks, indent=4))
Can you figure out how and why this works?