Lecture 18

Traversals; set & defaultdict

MCS 275 Spring 2024
Emily Dumas

Reminders and announcements:

  • Homework 6 due tomorrow
  • Project 2 due Friday
  • Project 2 autograder opens today
  • I will try to grade Project 1 by end of day Wednesday

IntegerSet variants

  • IntegerSetBase - non-functional base class; declares interface, does type checking, just needs a storage system to be added
  • IntegerSet - uses BST
  • IntegerSetUL - uses unsorted list
  • IntegerSetSL - uses sorted list

IntegerSet timing

integerset.py has been updated with a script that times adding 20,000 integers and testing them for membership.

Walking a tree

Back to discussing binary trees (BST or not).

To systematically visit every node is to traverse or walk the tree.

Natural to do recursively; think of it as three tasks:

  • Visit a node
  • Traverse its left subtree
  • Traverse its right subtree

Named traversals

Result depends on how you order the tasks.

preorder means: Node, Left, Right
postorder means: Left, Right, Node
inorder means: Left, Node, Right

Note: They all visit left child before right child.
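
The three orderings can be sketched as recursive functions. This is a minimal illustration, not the course's implementation: the Node class and the function names here are hypothetical, and each function returns a list of keys rather than doing any other kind of "visit".

```python
class Node:
    """Hypothetical minimal binary tree node (assumed for this sketch)."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def preorder(node):
    """Return keys in node, left, right order."""
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)


def postorder(node):
    """Return keys in left, right, node order."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.key]


def inorder(node):
    """Return keys in left, node, right order."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


# Example tree:
#        5
#       / \
#      3   8
#     / \
#    1   4
tree = Node(5, Node(3, Node(1), Node(4)), Node(8))
print(preorder(tree))   # [5, 3, 1, 4, 8]
print(postorder(tree))  # [1, 4, 3, 8, 5]
print(inorder(tree))    # [1, 3, 4, 5, 8]
```

Only the position of `[node.key]` changes between the three functions.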

Preorder traversal

node, left, right

Typical use: Make a copy of the tree.

Insert the keys into an empty BST in this order to recreate the original tree.
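
A small sketch of this copying idea, assuming a hypothetical Node class and a simple unbalanced BST insert (neither is the course's code). The helper same_tree is also an assumption, added only to check the result.

```python
class Node:
    """Hypothetical minimal BST node (assumed for this sketch)."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the BST rooted at root; return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored


def preorder(node):
    """Return keys in node, left, right order."""
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)


def same_tree(a, b):
    """True if the two trees have the same shape and the same keys."""
    if a is None or b is None:
        return a is b
    return (a.key == b.key
            and same_tree(a.left, b.left)
            and same_tree(a.right, b.right))


original = None
for k in [5, 3, 8, 1, 4, 9]:
    original = insert(original, k)

# Rebuild by inserting the keys in preorder into an empty BST
copy = None
for k in preorder(original):
    copy = insert(copy, k)

print(preorder(original))         # [5, 3, 1, 4, 8, 9]
print(same_tree(original, copy))  # True
```

This works because preorder lists every node before its descendants, so each key is inserted after its parent is already in place.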

Postorder traversal

left, right, node

Typical use: Delete the tree.

If you delete keys in postorder, then you will only ever be removing nodes without children.

Inorder traversal

left, node, right

Typical use: Turn a BST into a sorted list of keys.
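
As an illustration, here is a sketch that sorts a list by inserting its items into a BST and reading them back inorder. The Node class, insert function, and generator-style inorder are hypothetical stand-ins, not the course's classes.

```python
class Node:
    """Hypothetical minimal BST node (assumed for this sketch)."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the BST rooted at root; return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def inorder(node):
    """Generate keys in left, node, right order."""
    if node is not None:
        yield from inorder(node.left)
        yield node.key
        yield from inorder(node.right)


root = None
for k in [7, 2, 9, 4, 1]:
    root = insert(root, k)

print(list(inorder(root)))  # [1, 2, 4, 7, 9]
```

In a BST everything in the left subtree is smaller than the node and everything in the right subtree is larger, which is why left, node, right produces increasing order.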

Uniquely describing a tree

Many different binary trees can have the same inorder traversal.

Many different binary trees can have the same preorder traversal.

And yet:

Theorem: A binary tree T with distinct keys is uniquely determined by its inorder and preorder traversals.
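
One way to see why the theorem is plausible is a sketch of the reconstruction, assuming distinct keys. The function name build and the nested-tuple representation (key, left, right) are choices made for this example only.

```python
def build(preorder, inorder):
    """Rebuild a binary tree from its preorder and inorder key lists.

    Returns nested tuples (key, left, right), or None for an empty tree.
    Assumes all keys are distinct.
    """
    if not preorder:
        return None
    root = preorder[0]       # preorder always lists the root first
    i = inorder.index(root)  # keys before position i lie in the left subtree
    left = build(preorder[1:1 + i], inorder[:i])
    right = build(preorder[1 + i:], inorder[i + 1:])
    return (root, left, right)


tree = build([5, 3, 1, 4, 8], [1, 3, 4, 5, 8])
print(tree)
# (5, (3, (1, None, None), (4, None, None)), (8, None, None))
```

The preorder list identifies the root, and the inorder list then says which keys belong to the left and right subtrees, so every step of the recursion is forced.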

Strengths of BSTs

  • BSTs make a lot of data accessible in a few "hops" from the root.
  • They are a good choice for mutable data structures involving search operations.
  • Deletion of a node is an important feature we didn't implement (take MCS 360!), but it can also be done efficiently.
  • Unbalanced trees are less efficient: in the worst case a search visits every node, like searching a list.

MCS 360 usually covers rebalancing operations.

Set

Python's built-in type set represents an unordered collection of distinct objects.

You can put an object in a set if (and only if) it's allowed as a key of a dict. For built-in types that usually just means immutable.

Allowed: bool, int, float, str, tuple (provided every element of the tuple is itself allowed)

Not allowed: list, set, dict
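
A quick sketch of what happens when a disallowed object is added. The sample values are arbitrary; the point is that Python raises TypeError for unhashable objects.

```python
S = set()
S.add((1, 2))    # tuple of ints: allowed
S.add("hello")   # str: allowed

try:
    S.add([1, 2])    # list is mutable, so it is rejected
except TypeError as e:
    print("TypeError:", e)

try:
    S.add((1, [2]))  # a tuple containing a list is rejected too
except TypeError as e:
    print("TypeError:", e)

print(len(S))  # 2
```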

Set usage

S = { 4, 8, 15, 16, 23, 42 } # Set literal
S = set()  # New empty set
S.add(5)   # S is {5}
S.add(10)  # S is {5,10}
8 in S   # False
5 in S   # True
S.discard(1)  # Does nothing
S.remove(1)   # Raises KeyError
S.remove(5)   # Now S is {10}
S.pop()  # Remove and return one element (unclear which!)
for x in S:  # sets are iterable (but no control over order)
    print(x)

Set operations

Binary operations returning new sets:


S | S2  # Evaluates to union of sets
S & S2  # Evaluates to intersection of sets
S.union(iterable)        # Like | but allows any iterable
S.intersection(iterable) # Like & but allows any iterable
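
A short sketch of these operations on sample data (the values are arbitrary). Note that the operators require both sides to be sets, while the methods accept any iterable.

```python
S = {1, 2, 3, 4}
S2 = {3, 4, 5}

u = S | S2                      # the set {1, 2, 3, 4, 5}
i = S & S2                      # the set {3, 4}
u2 = S.union([4, 5, 6])         # list argument is fine: {1, 2, 3, 4, 5, 6}
i2 = S.intersection(range(3))   # range(3) is 0, 1, 2, so the result is {1, 2}

try:
    S | [4, 5, 6]               # operator form does not accept a list
except TypeError:
    print("| needs a set on both sides")

print(S)  # S itself is unchanged: these operations return new sets
```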

Set mutations

Operations that modify a set S based on contents of another collection.


# add elements of iterable to S
S.update(iterable)

# remove anything from S that is NOT in the iterable
S.intersection_update(iterable)

# remove anything from S that is in the iterable
S.difference_update(iterable)
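
The same three mutations on sample data, as a sketch (the values are arbitrary). Each call changes S in place and returns None.

```python
S = {1, 2, 3, 4, 5, 6}

S.update([6, 7])                      # S is now {1, 2, 3, 4, 5, 6, 7}
S.intersection_update(range(2, 10))   # drops 1: S is now {2, 3, 4, 5, 6, 7}
S.difference_update([3, 4, 100])      # drops 3 and 4: S is now {2, 5, 6, 7}

print(sorted(S))  # [2, 5, 6, 7]
```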

More about set

set has lots of other features that are described in the documentation.

Python's set is basically a dictionary without values.

For membership tests on large collections, it is much faster than using a list.

A set is appropriate whenever order is not important and items cannot appear multiple times.
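
To get a feel for the speed difference, here is a rough timing sketch using the standard timeit module. The collection size and repeat count are arbitrary choices, and the actual numbers depend on the machine, so none are shown.

```python
import timeit

n = 20_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # last element: the slowest case for a list search

# Time 200 membership tests against each container
list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: target in as_set, number=200)

print(f"list: {list_time:.6f} s   set: {set_time:.6f} s")
```

A list membership test scans elements one by one, while a set uses hashing to jump nearly directly to the right place.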

Histogram

You want to know how many times each character appears in a string.

hist = dict()
for c in s:
    hist[c] += 1

This won't work. Why?
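
A sketch of what goes wrong, using a hypothetical sample string. The statement `hist[c] += 1` must read `hist[c]` before adding 1, and a plain dict raises KeyError when the key is missing. One plain-dict workaround using `get` is also shown.

```python
s = "banana"  # hypothetical sample string

hist = dict()
try:
    for c in s:
        hist[c] += 1  # reads hist[c] first, which fails for a new key
except KeyError as e:
    print("KeyError for key", e)

# One workaround with a plain dict: supply 0 when the key is missing
hist = dict()
for c in s:
    hist[c] = hist.get(c, 0) + 1

print(hist)  # {'b': 1, 'a': 3, 'n': 2}
```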

defaultdict

The standard library module collections contains a class defaultdict that works like a dictionary, but if a key is requested that doesn't exist, it creates that key and assigns it a default value.

import collections
hist = collections.defaultdict(int)
for c in s:
    hist[c] += 1

This works!

The defaultdict constructor takes a function default_factory as its first argument.

default_factory is called to make default values for keys when needed.

Common examples with built-in factories:

defaultdict(list)  # default value [] as returned by list()
defaultdict(int)   # default value 0, as returned by int()
defaultdict(float) # default value 0.0, as returned by float()
defaultdict(str)   # default value "", as returned by str()
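
A sketch of two other common patterns, with made-up sample data: grouping items with defaultdict(list), and passing a custom zero-argument function as the factory. It also shows that merely looking up a missing key creates it.

```python
from collections import defaultdict

# Group words by first letter
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)  # a missing key starts out as []

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Reading a missing key is enough to create it
print("z" in by_letter)  # False
by_letter["z"]
print("z" in by_letter)  # True

# Any function that can be called with no arguments works as a factory
d = defaultdict(lambda: "unknown")
print(d["x"])  # unknown
```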

Revision history

  • 2022-03-02 Finalization of the 2022 lecture this was based on
  • 2024-02-19 Initial publication