A document from MCS 260 Fall 2021, instructor Emily Dumas. You can also get the notebook file.

MCS 260 Fall 2021 Homework 7 Solutions

  • Course instructor: Emily Dumas
  • Solutions prepared by: Johnny Joyce and Emily Dumas

Instructions:

  • Complete the problems below, which ask you to write Python scripts.
  • Upload your Python code directly to Gradescope, i.e. upload the .py files containing your work. (If you upload a screenshot or other file format, you won't get credit.)

Deadline

This homework assignment must be submitted in Gradescope by 10am CST on Tuesday, October 5, 2021.

Topic

This homework focuses on JSON and CSV files, as well as the data structures stack and queue.

Collaboration

Collaboration is prohibited, and you may only access resources (books, online, etc.) listed below.

Resources you may consult

The course materials you may refer to for this homework are:

Point distribution

This homework assignment has 2 problems, numbered 2 and 3. The grading breakdown is:

Points Item
2 Autograder
4 Problem 2
4 Problem 3
10 Total

What to do if you're stuck

Ask your instructor or TA a question by email, in office hours, or on Discord.

( 1. There's no problem 1 )

Gradescope will show the results of the automated syntax check of all submitted files as the score for problem 1.

2. Multiple delimiter matching

Modify the sample program parentheses.py we developed in class to create a new program called hwk7prob2.py that knows about both brackets [,] and parentheses (,) and can check for proper matching in an expression that may contain both. Brackets can match brackets, and parentheses can match parentheses, but a bracket cannot match a parenthesis. There's no restriction on which grouping symbol can be used inside the other, so all of these expressions are valid:

  • [(1+2) - (3+4)]
  • ([1 + (2-3)]+4)
  • [5 * [6 - 7]]

Notice that there are three kinds of errors that the program needs to be able to report:

  1. Too many left delimiters: some brackets or parentheses remain open at the end of the expression, e.g. [(1+2)-3 or ([5+6]
  2. Too many right delimiters: A ] or ) appears when there isn't anything to match it with, e.g. (1+2)] or ((1-2)+4))
  3. Delimiter type mismatch: A ] would match an earlier (, or a ) would match an earlier [, e.g. [1+2) or ((5+6)-7]

The first two errors are also present in the example parentheses.py, but the third type is new to this modified version of the program and you need to add code to detect and report it.

Hint: The basic structure of the program won't change very much, but you will need to change what is stored on the stack. The original parentheses.py stores a position (integer), and that's no longer enough information. I suggest you make it so that instead of pushing an integer onto the stack, the new program pushes a list like ["(",7] or ["[",5] that indicates both the delimiter type (first element) and the position in the string (second element). This will, of course, change the way you access the position when you pop something off the stack. Another option is to push a dictionary like {"delimiter": "(", "position": 7}.
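
As a small illustration of the hint, here is a sketch of pushing and popping both kinds of stack entry. It is not taken from parentheses.py; the variable names stack, delim, pos, and entry are chosen here only for illustration.

```python
# A stack of still-open delimiters; each entry records type and position
stack = []

# Push list entries of the form [delimiter, position]
stack.append(["(", 7])
stack.append(["[", 9])

# pop() returns the most recently pushed entry, which can be
# unpacked into two variables in one step
delim, pos = stack.pop()
print(delim, pos)

# The dictionary alternative stores the same information under named keys
stack.append({"delimiter": "[", "position": 12})
entry = stack.pop()
print(entry["delimiter"], entry["position"])
```

With lists, the position is reached by unpacking or by index 1; with dictionaries, it is reached through the key "position". Either way, code that previously used the popped value directly as an integer has to be updated.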

VERY IMPORTANT: For full credit, you must not leave any comments or docstrings carried over from parentheses.py that are inaccurate because of your changes. Whenever you edit a program, you need to make sure you edit any comments that are affected by your changes!

Solution

The hint offers two suggestions for how the change can be done, so we'll present solutions that use each of them.

Solution 1: Push lists

The solution below is a direct modification of the sample program parentheses.py, and not all that much has actually changed. If you want to see just the differences between this and the original program, you can view that at this site: https://www.diffchecker.com/l2RU87mt. This website uses green to highlight lines which have been modified/added to the original.

In [10]:
"""Detect matching of parentheses and brackets in an expression"""
# MCS 260 Fall 2021 Lecture 17

# Read expression (which should include parentheses/brackets)
s = input("Expression: ")

# We'll use a stack to keep track of all the "(" or "["
# that haven't been matched with ")" or "]" yet.  Every
# new opening delimiter we see gets pushed, and every closing
# delimiter we see closes whatever is at the top of the stack.
currently_open = []

# We want both the characters of s and their positions
# so we use enumerate()
for i,c in enumerate(s):
    # c is character from s
    # i is the position (0-based index) of that character in s
    if c == "(" or c == "[":
        # New left paren/bracket opened; push it
        currently_open.append([c,i])
    elif c == ")" or c == "]":
        # Right delim closes whatever left delim is at the
        # top of the stack.  But we need to make sure the 
        # stack is nonempty before trying to pop.
        try:
            # i0 and c0 are the corresponding i and c 
            # for the opening paren/bracket
            c0, i0 = currently_open.pop()
            
            if (c0 == "(" and c == "]") or (c0=="[" and c == ")"):
                print("Error:")
                print(s)
                print(" "*i0 + "^ has mismatched delimiter types")
                print("First delimiter is " + c0)
                print("Second delimiter is " + c)
                exit()
            else:
                print("Matching delimiters found: " + s[i0:i+1])

        except IndexError:
            # Error because there was no opening delim on the
            # stack to match the closing delimiter
            print("Error:")
            print(s)
            print(" "*i + "^ does not match any preceding ( or [")
            exit()
    
# are there any delimiters open?
# If so, it means that there is a ( or [ with no match
if len(currently_open) > 0:
    # Each stack entry is a list [delimiter, position], so we
    # unpack it to get the position of the unmatched delimiter
    c0, i0 = currently_open.pop()
    print("Error:")
    print(s)
    print(" "*i0 + "^ is not matched by any following ) or ]")
else:
    print("Delimiters matched successfully.")

# Examples of what we expect the error messages to look like:

# (1 + ((2+3) - 5
#      ^ is not matched by any following ) or ]

# ( 1 + (3-4))) + 5
#             ^ does not match any preceding ( or [
Expression: [(1+2) - (3+4)]
Matching delimiters found: (1+2)
Matching delimiters found: (3+4)
Matching delimiters found: [(1+2) - (3+4)]
Delimiters matched successfully.
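
Because the script reads from input() and prints its results, it is awkward to try many expressions at once. The following sketch wraps the same stack logic in a function and runs it on examples from the problem statement. The function name check_delimiters and the strings it returns are hypothetical, chosen for this illustration; they are not part of the assignment or of parentheses.py.

```python
def check_delimiters(s):
    """Classify the delimiter matching in s as "ok", "unclosed left",
    "unmatched right", or "type mismatch"."""
    pairs = {")": "(", "]": "["}
    currently_open = []  # stack of [delimiter, position] lists
    for i, c in enumerate(s):
        if c in "([":
            # New left delimiter; push its type and position
            currently_open.append([c, i])
        elif c in ")]":
            if len(currently_open) == 0:
                # Nothing on the stack for this right delimiter to close
                return "unmatched right"
            c0, i0 = currently_open.pop()
            if c0 != pairs[c]:
                # A bracket tried to close a parenthesis or vice versa
                return "type mismatch"
    if len(currently_open) > 0:
        return "unclosed left"
    return "ok"

for expr in ["[(1+2) - (3+4)]", "[(1+2)-3", "(1+2)]", "[1+2)"]:
    print(expr, "->", check_delimiters(expr))
```

Returning a value instead of calling exit() is a design choice made only so that several expressions can be checked in one run.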

Solution 2: Push dictionaries

Where the previous solution used lists to keep track of both position and delimiter type, this one uses a dictionary. It also demonstrates an alternative way to record what the matching pairs of delimiters are, and it prints a slightly fancier error message.

This solution involves changing more lines of parentheses.py.

In [ ]:
"""Detect matching of parentheses and brackets in an expression"""
# Based on `parentheses.py` from MCS 260 Lecture 17 and 18

s = input("Expression: ")

# left delimiters that haven't been matched yet
currently_open = []

# record which left delimiters match right ones
matching_left = {
    ")": "(",
    "]": "["
}
# This could be handled in other ways, but having a dictionary
# makes the conditionals a bit shorter

for i,c in enumerate(s):
    if c in ["(","["]:
        # New left delimiter; push it
        currently_open.append( {"delimiter": c, "position": i} )
    elif c in [")","]"]:
        try:
            just_closed = currently_open.pop()
        except IndexError:
            print("Error:")
            print(s)
            print(" "*i + "^ does not match any preceding delimiter")
            exit()
        if just_closed["delimiter"] != matching_left[c]:
            # Tried to match a bracket and a parenthesis.
            # We print a fancy error message here, but any solution
            # that detected this condition is acceptable.
            print("Error:")
            print(s)
            j = just_closed["position"]
            print(" "*j + "^" + "-"*(i-1-j) + "^ tried to match bracket and parenthesis")
            exit()
    
# are there any delimiters still open?
if len(currently_open) > 0:
    never_closed = currently_open.pop()
    print("Error:")
    print(s)
    print(" "*never_closed["position"] + "^ is not matched by anything")
else:
    print("Parentheses and brackets matched successfully.")

3. Chemical element CSV to JSON

Suppose I have a CSV file called elements.csv containing a list of chemical elements that looks like this:

number,abbreviation,name
11,Na,Sodium
80,Hg,Mercury

Write a program hwk7prob3.py that will read this file and write a JSON file elements.json in the following format:

[ {"number": 11, "abbreviation": "Na", "name": "Sodium"},
  {"number": 80, "abbreviation": "Hg", "name": "Mercury"} ]

Notice how the output JSON file contains a list of objects, and each object has keys that match the three column names from the CSV file. Also, the atomic number field is an integer in the JSON output (whereas it will be a string when you read it using the csv module).
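
To see why the conversion matters, here is a minimal sketch that reads one sample row and compares the JSON produced with and without int(). It uses io.StringIO as a stand-in for elements.csv, which is an assumption made here so the example needs no file on disk; the variable names are also illustrative.

```python
import csv
import io
import json

# Stand-in for elements.csv (assumption: same columns as the example above)
sample = io.StringIO("number,abbreviation,name\n11,Na,Sodium\n")
rows = list(csv.reader(sample))

first = rows[1]              # rows[0] is the header row
print(type(first[0]))        # the csv module gives back strings

as_string = json.dumps({"number": first[0]})
as_integer = json.dumps({"number": int(first[0])})
print(as_string)             # the number is quoted in the JSON text
print(as_integer)            # the number is unquoted, i.e. a JSON integer
```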

You can use the CSV example above as the content of a file elements.csv for testing purposes, but your program needs to be able to handle any CSV file listing chemical elements in that format (with three columns, "number", "abbreviation", and "name", appearing in that order).

Your program can assume that elements.csv is in the current working directory when it is run.

Restricted methods note: For full credit your answer needs to use the csv module to read the file elements.csv.

Solution

There are several ways to do it. We begin with the most straightforward:

Solution 1: csv.reader and a boolean

In [9]:
import csv
import json

# Open the CSV file and initialize the reader
infile = open("elements.csv", "r", encoding="UTF-8", newline="")
reader = csv.reader(infile)

# Get the output JSON file ready
outfile = open("elements.json", "w", encoding="UTF-8")

# Initialize a list to put our JSON data into
jsondata = []

# Is this the first iteration where we are looking at the header row?
on_header = True

# Iterate over each line in the CSV file
for element in reader:
    if on_header:
        # Record that we're done reading the header row
        on_header = False
    else:
        # This is a regular row, not the header
        # Make a dictionary for this row and add it to jsondata
        elementjson = {"number": int(element[0]),
                       "abbreviation": element[1],
                       "name": element[2]}
        jsondata.append(elementjson)

# Write the JSON data to the output file
json.dump(jsondata, outfile)
    
# Close the files
infile.close()
outfile.close()

Commentary

Needing to carry around a boolean variable to decide whether we've read the header row is a bit clumsy, but it is the inevitable result of two characteristics of the program:

  1. We chose to use csv.reader, which treats every row the same way (i.e. doesn't know the difference between header and regular rows)
  2. The csv.reader object is something we only use as the container in a for loop, which thus visits each row (including the header)

Each of the other solutions we offer addresses this clumsiness by reconsidering one of these characteristics.

Solution 2: DictReader

The csv module can read data directly into dictionaries, which are nearly ready for writing to the JSON file. Unlike csv.reader, the csv.DictReader object automatically reads the header row and iteration over it only gives us the actual data. We need only convert the atomic number from a string to an integer.

In [9]:
import csv
import json

# Open the CSV file and initialize the reader
infile = open("elements.csv", "r", encoding="UTF-8", newline="")
reader = csv.DictReader(infile)

# Get the output JSON file ready
outfile = open("elements.json", "w", encoding="UTF-8")

# Initialize a list to put our JSON data into
jsondata = []

# Iterate over each line in the CSV file
for element in reader:
    # Now element is a dictionary containing one row of the CSV file
    # using headers as keys.  It's almost what we want, but `number` will
    # be a string in this dictionary.  We convert it to an integer.
    element["number"] = int(element["number"])
    jsondata.append(element)

# Write the JSON data to the output file
json.dump(jsondata, outfile)
    
# Close the files
infile.close()
outfile.close()
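
To inspect what csv.DictReader yields without needing a file on disk, the sketch below feeds it the sample data from the problem statement through io.StringIO. That substitution is an assumption made for convenience (the homework program reads elements.csv), and the names sample and raw_rows are illustrative.

```python
import csv
import io
import json

# Stand-in for elements.csv containing the sample data
sample = io.StringIO("number,abbreviation,name\n11,Na,Sodium\n80,Hg,Mercury\n")
reader = csv.DictReader(sample)

# Each row is a dictionary keyed by the header names; all values are strings
raw_rows = list(reader)
print(reader.fieldnames)
print(raw_rows[0])

# Convert the atomic number to an integer, as Solution 2 does
jsondata = []
for element in raw_rows:
    element["number"] = int(element["number"])
    jsondata.append(element)

print(json.dumps(jsondata))
```

Note that the header row never appears in raw_rows; it is consumed by the DictReader and made available as reader.fieldnames.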

Solution 3: csv.reader and next()

This solution uses csv.reader but handles the issue of skipping the header row with something we haven't learned in class yet: the built-in function next(). If called with a csv.reader object as its argument, next() reads one row and returns it, so calling it and ignoring the return value discards that row. There was of course no expectation you'd use this function in your homework solution, but since csv.reader was the most common approach students took, it seems natural to take this opportunity to demonstrate a built-in function that can be used to clean it up a bit.

In [ ]:
import csv
import json

# Open the CSV file and initialize the reader
infile = open("elements.csv", "r", encoding="UTF-8", newline="")
reader = csv.reader(infile)

# Get the output JSON file ready
outfile = open("elements.json", "w", encoding="UTF-8")

# Skip the first row, which contains the header
next(reader)

# Initialize a list to put our JSON data into
jsondata = []

# Iterate over each line in the CSV file
for element in reader:
    # Make a dictionary for this row and add it to jsondata
    elementjson = {"number": int(element[0]),
                   "abbreviation": element[1],
                   "name": element[2]}
    jsondata.append(elementjson)

# Write the JSON data to the output file
json.dump(jsondata, outfile)
    
# Close the files
infile.close()
outfile.close()
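
The effect of next() on a csv.reader can be seen in isolation with the sketch below. It again uses io.StringIO as a stand-in for elements.csv, which is an assumption made so the example runs without any files; the names header and remaining are illustrative.

```python
import csv
import io

# Stand-in for elements.csv containing the sample data
sample = io.StringIO("number,abbreviation,name\n11,Na,Sodium\n80,Hg,Mercury\n")
reader = csv.reader(sample)

# next() consumes the first row and returns it as a list of strings
header = next(reader)
print(header)

# Iterating afterwards only visits the rows that have not been consumed yet
remaining = list(reader)
print(remaining)
```

In Solution 3 the return value of next(reader) is simply ignored, since the header names are written directly into the dictionary keys.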

Revision history

  • 2021-10-13 Initial upload of solutions