This worksheet focuses on JSON and CSV files, as well as the data structures stack and queue.
As I announced in lecture, this and future worksheets will have a first problem that is treated differently:
The main course materials to refer to for this worksheet are:
json or csv in the name, and the sample program spillreport.py.
(Lecture videos are not linked on worksheets, but are also useful to review while working on worksheets. Video links can be found in the course Blackboard site.)
In Lecture 17, we wrote a program parentheses.py to examine mathematical expressions like ((1+2)+3) and check that it is possible to match the parentheses up in pairs. It uses a stack to keep track of all the unmatched left parentheses while scanning through the string.
Modify this program (saving the new one as parentheses2.py) so that while reading an expression, it prints a message every time it encounters a ")" that shows which "(" matches it. The output should look like this:
Expression: (((1+2)/7) + (9/(2+3)))
Detected matching pair of parentheses enclosing: (1+2)
Detected matching pair of parentheses enclosing: ((1+2)/7)
Detected matching pair of parentheses enclosing: (2+3)
Detected matching pair of parentheses enclosing: (9/(2+3))
Detected matching pair of parentheses enclosing: (((1+2)/7) + (9/(2+3)))
Parentheses matched successfully.
# MCS 260 Fall 20201 Worksheet 7 Problem 1
# Modification of `parentheses.py`
"""
Checks parenthesis matching of an expression, also showing the part
inside each pair of matching parentheses.
"""
s = input("Expression: ")
# stack to track the positions of the opening parentheses
opening = []
for i,c in enumerate(s):
    # c is character from s
    # i is the position (0-based index) of that character in s
    if c == "(":
        # New left paren opened; push its position
        opening.append(i)
    elif c == ")":
        try:
            # !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
            # THIS IS THE PART THAT'S NEW
            # Parenthesized subexpression begins at the position
            # on the top of stack, ends at current position (i)
            opening_pos = opening.pop()
            closing_pos = i
            subexpr = s[opening_pos:closing_pos+1] # inclusive slice
            print("Detected matching pair of parentheses enclosing:", subexpr)
            # !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        except IndexError:
            # Error because there was no "(" on the
            # stack to match this ")"
            print("Error:")
            print(s)
            print(" "*i + "^ does not match any preceding (")
            exit()
# Check whether any parentheses remain open, no change from parentheses.py
if len(opening) > 0:
    print("Error:")
    print(s)
    print(" "*opening.pop() + "^ is not matched by any following )")
else:
    print("Parentheses matched successfully.")
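The same stack technique can also be packaged as a function, which makes it easier to test without typing input. The following is only a sketch, not the lecture's program: the function name matched_subexpressions and the choice to raise ValueError on a mismatch are assumptions made for illustration.

```python
def matched_subexpressions(s):
    """Return the parenthesized subexpressions of s, in the order they close.

    Raises ValueError if the parentheses in s cannot be matched in pairs.
    (Hypothetical helper, not part of parentheses.py.)
    """
    opening = []  # stack of positions of unmatched "("
    found = []
    for i, c in enumerate(s):
        if c == "(":
            opening.append(i)
        elif c == ")":
            if not opening:
                raise ValueError("')' at position {} has no matching '('".format(i))
            # Top of the stack is the "(" that matches this ")"
            found.append(s[opening.pop():i + 1])
    if opening:
        raise ValueError("'(' at position {} is never closed".format(opening[-1]))
    return found


for sub in matched_subexpressions("(((1+2)/7) + (9/(2+3)))"):
    print("Detected matching pair of parentheses enclosing:", sub)
```

Because the innermost "(" is always on top of the stack, inner subexpressions are reported before the ones that contain them.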
From 2010 to 2012, the National Geographic Channel produced a weekly hour-long TV series titled "Python Hunters". While you might imagine this was a show about computer programming, it was actually about snakes.
A JSON file containing a list of episodes can be found at the link below:
https://api.tvmaze.com/shows/25376/episodes
(This is not an official reference, but a user-generated catalog from a web site called TVMaze. They make such datasets available for anyone to use, for free, as long as they give proper attribution.)
You're going to need this data file in this problem, so you should visit the link and save it somewhere (e.g. as pythonhunter.json).
The JSON object in this file is a list, and each item in the list is a dictionary describing an episode. All of the dictionaries have the same keys, which are characteristics of an episode.
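To see what "a list of dictionaries" looks like once loaded, here is a small sketch using json.loads on an inline string. The two-episode excerpt is an assumption for illustration: it keeps only four keys, while the real file has more episodes and more keys per episode.

```python
import json

# Hypothetical excerpt in the same shape as the episode file:
# a JSON list whose items are objects with identical keys.
sample_text = """
[
  {"season": 1, "number": 1, "name": "The Perfect Storm", "airdate": "2010-07-12"},
  {"season": 1, "number": 2, "name": "The Big Freeze", "airdate": "2010-07-19"}
]
"""

episodes = json.loads(sample_text)  # a Python list of dictionaries
print(type(episodes).__name__, len(episodes))
for ep in episodes:
    # Each item is a dict, so fields are accessed by key
    print(ep["season"], ep["number"], ep["name"])
```

For a saved file, json.load(fileobj) returns the same kind of structure from an open file object.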
The same information could be presented in table form and stored in a CSV file. For example, if the only characteristics of an episode we care about are the season, episode number, name, and air date, then the episodes could be listed in a CSV file that would look like this:
season,number,name,airdate
1,1,The Perfect Storm,2010-07-12
1,2,The Big Freeze,2010-07-19
Write a Python program episodes_to_csv.py that reads the JSON file containing the episode list and creates a CSV file in the format described above.
Use Python's json module to read the JSON file, and use the csv module to write the CSV file. CSV is simple enough that you could probably write the data in this format manually (i.e. with file.write()), but the point of this problem is to get some practice using the csv module.
# MCS 260 Fall 20201 Worksheet 7 Problem 2
"""
Read episodes.json in the current working directory, which should contain the JSON
from https://api.tvmaze.com/shows/25376/episodes and write a table with episode
data to episodes.csv.
"""
import json
import csv
# Read JSON of episode data
infile = open("episodes.json","r",encoding="UTF-8")
episode_data = json.load(infile)
infile.close()
# Write CSV of episode data (selected fields)
output_columns = ["season","number","name","airdate"]
outfile = open("episodes.csv","w",encoding="UTF-8",newline="")
writer = csv.writer(outfile)
writer.writerow(output_columns)
for episode in episode_data:
    # Select the data we want about this episode, make a list of values, and write to
    # CSV. We use a list comprehension to do this in a single line.
    writer.writerow([ episode[k] for k in output_columns ])
outfile.close()
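An alternative worth knowing about is csv.DictWriter, which writes dictionaries directly and can be told to skip keys that are not wanted. The sketch below is a variation, not the intended solution: the function name episodes_json_to_csv is hypothetical, and the "runtime" key in the sample data is an assumed extra field included only to show that it gets dropped. In-memory files stand in for real ones so the example runs without downloading anything.

```python
import csv
import io
import json

def episodes_json_to_csv(infile, outfile,
                         columns=("season", "number", "name", "airdate")):
    """Read a JSON list of episode dicts from infile, write chosen columns to outfile."""
    episodes = json.load(infile)
    # extrasaction="ignore" drops any dictionary keys not listed in fieldnames
    writer = csv.DictWriter(outfile, fieldnames=list(columns), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(episodes)

# In-memory stand-ins for the input and output files
src = io.StringIO(
    '[{"season": 1, "number": 1, "name": "The Perfect Storm",'
    ' "airdate": "2010-07-12", "runtime": 60}]'
)
dst = io.StringIO(newline="")
episodes_json_to_csv(src, dst)
print(dst.getvalue())
```

With real files, the same function could be called inside a with statement that opens episodes.json for reading and episodes.csv for writing with newline="", so both files are closed automatically.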
Write a program csvtotaler.py that expects two command line arguments:
sys.argv[1] is the input filename, an existing CSV file
sys.argv[2] is the output filename, which will be created by this program (also CSV)
The CSV file specified as the first argument is expected to contain a header row, followed by columns of integer values. There can be any number of columns, and any number of rows, but all the values are expected to be integers. Here's a sample of what it might look like:
Received,Shipped,Damaged,Returned
291,155,3,8
408,120,5,0
355,109,0,3
The program should read this and write a new CSV file that has all the same columns, as well as a new column called "Rowtype" that appears before the others. Each row of data from the input file should be copied here, with the value "Data" in the "Rowtype" column. Then, the sum of each column should be computed and these columnwise totals written to a row at the bottom whose "Rowtype" is "Total". Thus, for the CSV shown above, the output file would contain:
Rowtype,Received,Shipped,Damaged,Returned
Data,291,155,3,8
Data,408,120,5,0
Data,355,109,0,3
Total,1054,384,8,11
# MCS 260 Fall 20201 Worksheet 7 Problem 3
"""
Total columns in a CSV file, and write original data and totals to a new CSV file.
"""
import sys
import csv
infn = sys.argv[1]
outfn = sys.argv[2]
infile = open(infn,"r",encoding="UTF-8",newline="")
outfile = open(outfn,"w",encoding="UTF-8",newline="")
reader = csv.reader(infile)
writer = csv.writer(outfile)
header = True # Are we working on the header row right now?
for row in reader:
    if header:
        header = False
        writer.writerow(["Rowtype"] + row)
        totals = [ 0 for x in row ] # a list of zeros, as many as there are columns
    else:
        # Convert each value to an integer and add it to the running total of that column
        for i,v in enumerate(row):
            totals[i] = totals[i] + int(v)
        # Write this row of data
        writer.writerow(["Data"] + row)
# Done reading, so close the input file
infile.close()
# Write the row with the totals to the output file
writer.writerow(["Total"] + totals)
outfile.close()
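The totaling logic can be separated from the file handling, which makes it possible to check it against the sample table from the problem statement. This is a sketch under that assumption; the function name add_totals is hypothetical, and it works on rows that have already been read (for example with list(csv.reader(infile))).

```python
def add_totals(rows):
    """Given CSV rows as lists of strings (header row first), return the
    output rows: a "Rowtype" column is added and a "Total" row appended."""
    header, data = rows[0], rows[1:]
    out = [["Rowtype"] + header]
    totals = [0] * len(header)  # one running total per column
    for row in data:
        for i, v in enumerate(row):
            totals[i] += int(v)
        out.append(["Data"] + row)
    out.append(["Total"] + totals)
    return out

# Sample table from the problem statement
sample = [
    ["Received", "Shipped", "Damaged", "Returned"],
    ["291", "155", "3", "8"],
    ["408", "120", "5", "0"],
    ["355", "109", "0", "3"],
]
for row in add_totals(sample):
    print(",".join(str(v) for v in row))
```

The returned list can be passed to writer.writerows() to produce the output file.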
Download and save this JSON file containing data about current members of the US House of Representatives.
https://www.govtrack.us/api/v2/role?current=true&role_type=representative&limit=438
Now, write a program that reads this and prints the twitter ids of all the representatives from Pennsylvania.
(Do that by exploring the structure of the JSON file yourself, either by loading it into a variable in the REPL or opening the file in an editor and looking around. The point is to practice understanding the layout of information in a JSON file and turning it into code that can access and filter it in Python.)
# MCS 260 Fall 20201 Worksheet 7 Problem 4
"""
Read JSON data about US House of Rep members from house.json and print info about PA representative
twitter handles to the terminal.
"""
import json
fobj = open("house.json","r",encoding="UTF-8")
housedata = json.load(fobj)
fobj.close()
# housedata is now a dictionary, with the main info about representatives
# in a list that is associated with key "objects"
# List of data on reps from PA
PAreps = [ x for x in housedata["objects"] if x["state"]=="PA"]
# Print the ones that have twitter ids
for rep in PAreps:
    tid=rep["person"]["twitterid"]
    if tid:
        print(tid)
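The filtering step can be tried without the downloaded file by using a tiny stand-in dictionary with the same nesting. In this sketch the function name twitter_ids_for_state and all sample values (including the handles) are made up for illustration; only the key names "objects", "state", "person", and "twitterid" come from the solution above.

```python
def twitter_ids_for_state(housedata, state):
    """Return the non-empty twitter ids of representatives from the given state."""
    ids = []
    for role in housedata["objects"]:
        if role.get("state") != state:
            continue
        # .get avoids a KeyError if a record lacks the expected keys
        tid = role.get("person", {}).get("twitterid")
        if tid:
            ids.append(tid)
    return ids

# Hypothetical stand-in for the contents of house.json
sample = {
    "objects": [
        {"state": "PA", "person": {"twitterid": "RepExampleOne"}},
        {"state": "PA", "person": {"twitterid": None}},
        {"state": "IL", "person": {"twitterid": "RepExampleTwo"}},
    ]
}
print(twitter_ids_for_state(sample, "PA"))
```

Records whose twitter id is missing or null are skipped, matching the `if tid:` check in the solution.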