A document from MCS 275 Spring 2023, instructor Emily Dumas. You can also get the notebook file.

MCS 275 Spring 2023 Homework 12 Solutions

  • Course Instructor: Emily Dumas
  • Contributors to this document: Johnny Joyce

Instructions:

  • Complete the problems below, which ask you to write Python scripts.

Deadline

This homework assignment must be submitted in Gradescope by Noon central time on Tuesday April 11, 2023.

Collaboration

Collaboration is prohibited, and you may only access resources (books, online, etc.) listed below.

Content

This homework is about SQLite.

Resources you may consult

Most relevant:

Less likely to be relevant, but also allowed:

Point distribution

This homework assignment has 2 problems, numbered 2 and 3. The grading breakdown is:

Points Item
3 Autograder
6 Problem 2
6 Problem 3
15 Total

The part marked "autograder" reflects points assigned to your submission based on some simple automated checks for Python syntax, etc. The result of these checks is shown immediately after you submit.

What to do if you're stuck

Ask your instructor or TA a question by email, in office hours, or on discord.

Problem 2: Power plant info script

The work for this problem should go in a file called hwk12prob2.py.

What to know and prepare

If you don't already have it, download the powerplants SQLite database from Worksheet 12 and save it in a place you can find:

As a reminder, the only table is called powerplants and the columns present in that table are described at

Below you'll find a basic Python script that opens that database and prints the total number of columns. I'm providing it so you focus more on queries than on Python script details. The script requires powerplants.sqlite to be in the same directory.

In [ ]:
"Count rows in the powerplants table"
import sqlite3

# Open connection
con = sqlite3.connect("powerplants.sqlite")
# Run query
res = con.execute("SELECT COUNT(*) FROM powerplants;")

# Get one result row; there is also a .fetchall() method to get
# a list of rows, or you can iterate over result rows in a for loop.
first_result_row = res.fetchone()

# Now first_result_row is a tuple like (30000,)
# We want its first entry
numrows = first_result_row[0]

# Close connection
con.close()

print("The number of rows in table `powerplants` is {}.".format(numrows))

What to do

Compose appropriate SQL queries and modify this Python script so that it prints answers to all of the following questions:

  1. How many power plants in the database are located in Uganda?
  2. What primary fuels are used by power plants in Jamaica? (List them without repeats)
  3. Among power plants in the database for which total output in 2017 and 2018 are both available, what fraction (in percent) had greater total output in 2018?
  4. What are the names of the top five power plants in India in terms of total power output in the year 2016?
  5. Determine which of these companies own power plants in at least two different countries:

    • Biffa
    • EDF
    • Eskom
    • IPP
    • PacifiCorp

    For any that own plants in two or more countries, print the company name.

Note: I don't mean you should run the queries yourself and then copy the answers into print() calls. Instead, the Python script you submit should execute the queries each time it is run.

Solution

In [3]:
import sqlite3

# Open connection
con = sqlite3.connect("powerplants.sqlite")

# 1
query1 = con.execute("SELECT COUNT(*) FROM powerplants WHERE country = 'Uganda';")
num_uganda,  = query1.fetchone() # Use comma to unpack tuple of length 1
print("Uganda has {} power plants listed in the database".format(num_uganda))


# 2
query2 = con.execute("SELECT DISTINCT primary_fuel FROM powerplants WHERE country = 'Jamaica';")
fuels = [row[0] for row in query2]
fuels = ", ".join(fuels)
print("Jamaica uses the following fuels: {}".format(fuels))

# 3
query3_numerator = con.execute("""SELECT COUNT(*) FROM powerplants 
                                    WHERE (output_gwh_2017 IS NOT NULL AND output_gwh_2018 IS NOT NULL 
                                    AND output_gwh_2017 > output_gwh_2018);""")
query3_denominator = con.execute("""SELECT COUNT(*) FROM powerplants 
                                    WHERE (output_gwh_2017 IS NOT NULL AND output_gwh_2018 IS NOT NULL);""")
numerator = query3_numerator.fetchone()[0]
denominator = query3_denominator.fetchone()[0]
print("""Among power plants that have 2017 and 2018 data available, {:.2f}% had greater output in 2017 than 2018""".format(100*numerator/denominator))

# 4
query4 = con.execute("SELECT name FROM powerplants WHERE country='India' ORDER BY output_gwh_2016 DESC LIMIT 5;")
print("The top 5 power plants in India by output in 2016 are:")
for i, row in enumerate(query4):
    print("#{}: {}".format(i+1, row[0]))
    
# 5
# Use a GROUP BY .. HAVING statement to only select owners present in at least 2 countries
query5 = con.execute("""SELECT DISTINCT owner FROM powerplants 
                        WHERE owner IN ('Biffa', 'EDF', 'Eskom', 'IPP', 'PacifiCorp')
                        GROUP BY owner HAVING COUNT(DISTINCT country) >= 2;""")
companies = [row[0] for row in query5]
companies = ", ".join(companies)
print("The following companies have powerplants in at least two different countries: {}".format(companies))



# Close connection
con.close()
Uganda has 11 power plants listed in the database
Jamaica uses the following fuels: Gas, Solar, Oil, Hydro, Wind
Among power plants that have 2017 and 2018 data available, 48.65% had greater output in 2017 than 2018
The top 5 power plants in India by output in 2016 are:
#1: VINDH_CHAL STPS
#2: MUNDRA TPP
#3: SASAN UMPP
#4: MUNDRA UMPP
#5: SIPAT STPS
The following companies have powerplants in at least two different countries: EDF, IPP

Problem 3: Utility to help using todo.py for demos

The work for this problem should go in a file called hwk12prob3.py.

What to know and prepare

This problem requires you to be familiar with the "todo list" sample application from the course sample code repository, which uses a SQLite database. It can be found at:

Download it, save it somewhere, try it out in the terminal, and take a look at the code. That script will make its own database the first time you run it. Try commands like:

python3 todo.py add Submit homework 12
python3 todo.py add Finish project 4
python3 todo.py list
python3 todo.py done 1
python3 todo.py list
python3 todo.py listall

As you see, this script stores a task list in a database table, and lets you add, remove, or modify items in that list by running the script with appropriate command line arguments.

For reference, here's the table schema it uses in todo.sqlite:

CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        description TEXT,
        completed INTEGER DEFAULT 0
    );

What to do

Imagine you're planning to use todo.py as part of a lecture demonstration. It might not be ideal to start with an empty database, and it might also be helpful to have a way to mark every task as not completed (undone) as a sort of "reset" to allow you to once again show how to mark a few of them completed (done).

Write a script hwk12prob3.py that expects one command line argument. The behavior should be:

  1. If the command line argument is anything other than reset, sampledata, or cleardone, print an error message and exit.
  2. Open database todo.sqlite.
  3. If a table called tasks doesn't already exist, create a table with the columns id, description, completed as described above.
  4. If the command line argument is reset, delete all rows from the table tasks
  5. If the command line argument is sampledata, add 10 rows into the database with distinct descriptions. Have all of them start out with completed set to 0. Don't set the id column values explicitly, so if you run this twice, 10 more rows will be added.
  6. If the command line argument if cleardone, set completed to 0 in every row

This could therefore be used as a companion to todo.py for demonstrations, e.g.

  • Before the first demo, run python3 hwk12prob3.py reset and then python3 hwk12prob3.py sampledata so you have a known set of sample tasks (none of them done) in the database.
  • Before the next demo, reset done/undone markers with python3 hwk12prob3.py cleardone

Note: The script hwk12prob3.py does not need to duplicate any features of todo.py and should not be written by modifying todo.py. Instead, the two scripts are meant to be used together, accessing the same database but doing different things.

Solution

In [ ]:
import sys
import sqlite3
import datetime

if sys.argv[1] not in ["reset", "sampledata", "cleardone"]:
    print("Error: Only the following commands are accepted: reset, sampledata, cleardone")
    exit()
    
con = sqlite3.connect("todo.sqlite")
          
# Initial table setup
con.execute("""CREATE TABLE IF NOT EXIST tasks (id INTEGER PRIMARY KEY,
                                                description TEXT,
                                                completed INTEGER DEFAULT 0);""")

# Delete all entries
if sys.argv[1] == "reset":
    con.execute("DELETE * FROM tasks;")

# Insert 10 new rows
elif sys.argv[1] == "sampledata":
    now = datetime.datetime.now() # This line is not required - but it makes our example descriptions unique
    for i in range(10):
        # In your homework submissions, your task descriptions may vary - you can put anything you like!
        description = "Finish example task {} generated on {}".format(i+1, now)
        # In the line below, `description` is in a tuple with only 1 entry because we need to
        # use a list/tuple that contains all values that we want to insert into `?` placeholders
        con.execute("INSERT INTO tasks (description) VALUES (?);", (description,))
    
# Set all tasks as completed
elif sys.argv[1] == "cleardone":
    con.execute("UPDATE tasks SET completed = 1;")

con.commit()
con.close()

Revision history

  • 2023-04-06 Initial publication