We've covered a lot of small topics lately; this worksheet includes exercises on:
- set and defaultdict
- CSV and JSON
- Pillow
These things might be helpful while working on the problems. Remember that for worksheets, we don't strictly limit what resources you can consult, so these are only suggestions.
If one of these problems ends up taking a lot of time, it might make sense to skip to another one or move on to another aspect of the problem so that you spend some time on each topic (set/defaultdict, CSV/JSON, Pillow) during lab.
Recall that we have two classes for building trees:
- Node, for generic binary trees
- BST, a subclass for binary search trees

Consider the following binary tree, made of Node objects:
          <True>
         /      \
  <"Alice">    <False>
   /      \
 <5.8>    <5.5>
If we wanted to save the tree to a file, JSON would be a natural choice, as it allows nesting of data structures. You can't write an arbitrary object (like Node or BST) to a JSON file, but the tree above might be saved to a JSON file by creating a nested dictionary such as
{
    "class": "Node",
    "tree": {
        "key": True,
        "left": {
            "key": "Alice",
            "left": {
                "key": 5.8,
                "left": None,
                "right": None
            },
            "right": {
                "key": 5.5,
                "left": None,
                "right": None
            }
        },
        "right": {
            "key": False,
            "left": None,
            "right": None
        }
    }
}
and then writing that to a file.
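For instance, the standard-library json module can write such a dictionary to a file. The sketch below uses an abbreviated dictionary (the name d is just for illustration); the full nested dictionary above works the same way. Note that json.dump translates Python's True, False, and None into JSON's true, false, and null.

```python
import json

# Abbreviated version of the nested dictionary ("d" is an illustrative
# name); a full tree dictionary would be written the same way.
d = {"class": "Node", "tree": {"key": True, "left": None, "right": None}}

with open("tree.json", "w", encoding="UTF-8") as fp:
    json.dump(d, fp, indent=4)
```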
That would result in a JSON file with two top-level keys: "class" indicates what kind of tree it is (BST or Node), and "tree" maps to a hierarchy of objects that represent the nodes of the tree.
The same general approach could be applied to binary search trees, too. This BST:

        <6>
       /   \
     <5>   <14>
     /     /  \
   <4>  <10>  <16>

could be saved to a JSON file as the dictionary
{
    "class": "BST",
    "tree": {
        "key": 6,
        "left": {
            "key": 5,
            "left": {
                "key": 4,
                "left": None,
                "right": None
            },
            "right": None
        },
        "right": {
            "key": 14,
            "left": {
                "key": 10,
                "left": None,
                "right": None
            },
            "right": {
                "key": 16,
                "left": None,
                "right": None
            }
        }
    }
}
Add a method save(fp) to class Node in trees.py that will save a tree as JSON, writing it to an open file object fp.

Then, add a function to the module trees.py called load(fp) that takes an open file object, reads a JSON object from it, and then builds and returns the corresponding tree. The return type should be either Node or BST depending on what is found in the JSON file.
Suggestions / hints:
- At first, ignore the BST case and build a version that works for Node objects. In fact, ignore the BST case entirely unless you have time to add that at the end.
- Consider giving Node a method called as_dict_tree() that handles this naturally recursive operation. Then you can handle the top-level object creation (with "class" and "tree" keys) in the method save itself.
- A dictionary like {"Node": Node, "BST": BST} can be used to map names of classes to actual classes. That allows you to build a Node or BST object depending on a string, e.g.

node_classes = {"Node": Node, "BST": BST}
k = "Node"
A = node_classes[k](key=5)
# Now A is a Node object
k = "BST"
B = node_classes[k](key=11)
# Now B is a BST object
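Putting those hints together, here is one possible sketch, not the only reasonable design. It assumes nodes store their data in attributes named key, left, and right and accept key as a constructor argument; the minimal Node and BST classes below are stand-ins for the real ones in trees.py, and the helper name _tree_from_dict is chosen here for illustration.

```python
import json

class Node:
    """Minimal stand-in for the course's Node class (the real trees.py
    version would be used instead)."""
    def __init__(self, key=None, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

    def as_dict_tree(self):
        # Naturally recursive: each node becomes a dict, each child
        # becomes a nested dict (or None for a missing child).
        return {
            "key": self.key,
            "left": self.left.as_dict_tree() if self.left is not None else None,
            "right": self.right.as_dict_tree() if self.right is not None else None,
        }

    def save(self, fp):
        # type(self).__name__ is "Node" or "BST", so the subclass is
        # recorded automatically.
        json.dump({"class": type(self).__name__,
                   "tree": self.as_dict_tree()}, fp)

class BST(Node):
    """Minimal stand-in subclass."""

node_classes = {"Node": Node, "BST": BST}

def _tree_from_dict(d, cls):
    """Rebuild a tree of class `cls` from a nested dictionary."""
    if d is None:
        return None
    node = cls(key=d["key"])
    node.left = _tree_from_dict(d["left"], cls)
    node.right = _tree_from_dict(d["right"], cls)
    return node

def load(fp):
    data = json.load(fp)
    return _tree_from_dict(data["tree"], node_classes[data["class"]])
```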
When you're done, the following code should build a BST, save it to a file, and load it back again.

from trees import load, BST
from treevis import treeprint

T = BST()
T.insert(8)
T.insert(12)
T.insert(2)
T.insert(3)

with open("tree.json", "w", encoding="UTF-8") as fp:
    T.save(fp)

with open("tree.json", "r", encoding="UTF-8") as fp:
    W = load(fp)

print("Tree that was saved:")
treeprint(T)
print("Tree that was loaded:")
treeprint(W)
I recommend working on this problem in a Python notebook.
Imagine you're a developer working on a project where many pixel art images will be incorporated into an application (e.g. a game, a GUI program, etc.). A few of these images have already been drawn by various artists, and those artists were free to choose any colors they liked when doing so. Here are links to those images:
But now there is a plan to standardize on a palette of a few colors, and to draw all remaining images using only that color palette.
Your task is to analyze the existing images and generate a list of the distinct colors that are used in them. Ideally, the list should be ordered from most-frequently used colors (in terms of number of pixels) to least-frequently used. However, if sorting by frequency proves to be too complicated, it's acceptable to simply produce a list of distinct colors in arbitrary order.
Make a program that does this, and which outputs the list in two ways:
- A CSV file called palette.csv that has three columns, named red, green, and blue, and whose rows contain the distinct colors appearing in the sample images. Its contents might look like this, for example:

red,green,blue
18,210,194
241,231,108
...

- An image file called palette.png that is created by the program, in which 32 of the colors are shown in a horizontal strip, with each one represented by a 16x64 (width x height) pixel rectangle filled with that color. (Thus the image has dimensions 512x64.) Here's a sample of what it might look like, but this sample doesn't show the colors listed above nor colors that necessarily appear in the sample images.
Do these images actually have any colors in common, or is this simply a matter of concatenating the color lists from the three images?
This problem provides a natural opportunity to use several things:
- defaultdict to track how many pixels have each color, OR
- set to track the set of distinct colors (without tracking how often they are used)
- the csv module to handle writing the palette to a CSV file
- Pillow (the PIL module), to manage loading of the sample images and saving of palette.png
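As a sketch of the counting and CSV portions only: the pixel list below is hypothetical placeholder data, and in the real program the pixels would come from Pillow instead (for example via Image.open(filename).convert("RGB").getdata()). The function names are chosen here for illustration.

```python
import csv
from collections import defaultdict

def color_counts(pixels):
    """Map each (red, green, blue) tuple to the number of pixels
    that have that color."""
    counts = defaultdict(int)
    for rgb in pixels:
        counts[rgb] += 1
    return counts

def write_palette_csv(counts, fp):
    """Write the distinct colors to an open file object as CSV,
    most frequent color first."""
    writer = csv.writer(fp)
    writer.writerow(["red", "green", "blue"])
    for (r, g, b) in sorted(counts, key=counts.get, reverse=True):
        writer.writerow([r, g, b])

# Hypothetical pixel data; the real program would gather this from the
# sample images with Pillow.
pixels = [(18, 210, 194)] * 3 + [(241, 231, 108)] * 2
counts = color_counts(pixels)
```

When writing to an actual file, open it with newline="" as the csv module's documentation recommends.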
Here is a function that takes two strings and returns the set of characters that appear in both strings.
def common_chars(s1, s2):
    """Return a set of all characters that are present in both
    strings `s1` and `s2`."""
    common = set()
    for c1 in s1:
        if c1 in s2:
            common.add(c1)
    return common
It works. Here's a simple example:
common_chars("mathematics","computer science")
However, this function is actually needlessly slow. Here's an example that generates two strings that each have 50,000
characters, runs common_chars
on them, and prints the total time.
import random
import time

# Pick 50,000 random characters for each string; 'z' is common in s1
# but occurs in s2 only once, at the very end.
s1 = ''.join([random.choice("edfghijklmnopqrstuvwxyzzzzzzzzzzzzzzzz") for _ in range(50000)])
s2 = ''.join([random.choice("abcedfghijklmnopqrstuvw") for _ in range(50000)]) + 'z'
t_start = time.time()
both = common_chars(s1,s2)
t_end = time.time()
print("Common characters:")
print(both)
print("\nRunning time: {:.2f} seconds".format(t_end-t_start))
If you try this yourself, you might get a slightly different time, but it will probably take more than 10 seconds.
First, what is going on here? It should be possible to compare millions of characters for equality per second, and there are only 100,000 characters you need to look at, right?
Second, can you fix it? (It is possible to make this function shorter, clearer, and so that it takes less than 0.05 seconds.)
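One way to think about it: the expression c1 in s2 scans s2 character by character, so the loop performs on the order of len(s1) x len(s2) comparisons, and the frequent 'z' characters in s1 force nearly full scans of s2 every time. One possible fix (keeping the same name and docstring as the original) is to convert both strings to sets, so each membership check becomes a hash lookup:

```python
def common_chars(s1, s2):
    """Return a set of all characters that are present in both
    strings `s1` and `s2`."""
    # set() reduces each string to its distinct characters in linear
    # time, and & intersects the two sets with fast hash lookups.
    return set(s1) & set(s2)
```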