A document from MCS 275 Spring 2022, instructor Emily Dumas. You can also get the notebook file.

MCS 275 Spring 2022 Worksheet 12 Solutions

  • Course instructor: Emily Dumas
  • Solutions prepared by: Jennifer Vaccaro, Johnny Joyce

Topics

This worksheet focuses on SQLite databases.

Resources

These things might be helpful while working on the problems. Remember that for worksheets, we don't strictly limit what resources you can consult, so these are only suggestions.

The main references for these topics are:

Setup steps

You'll need to complete these before you can work on the worksheet.

Quiz 12 will also be impossible to work on until you've successfully completed these steps.

Get the SQLite command line shell

There's nothing to install if you just want to use SQLite with Python, as the module sqlite3 is already part of the Python standard library.

However, if you just want to run one SQL query and show the results, a fairly long Python program is needed. It's much easier if you can install the SQLite REPL (or "command line shell") that lets you run queries directly from a prompt in your terminal. This worksheet assumes you have the command line shell for SQLite.

Lectures 30 and 32 discussed how to install this. Here is a quick video showing the steps to install in Windows 10:

Here are more detailed written installation instructions by platform.

  • If you use Windows, you need to install it yourself. The installation doesn't look like the graphical ones you may be used to, with a window and buttons guiding you through the steps. Instead you download a zip file and extract it. The whole thing is relatively quick, but the steps below are described in some detail, so the written instructions are a bit long. (Consider just watching the 1-minute video and following along instead.)

    1. Go to https://www.sqlite.org/download.html.
    2. Scroll down to the heading Precompiled Binaries for Windows
    3. In that section, click on the link whose name begins with sqlite-tools-win32-x86-. The description next to the link should begin: A bundle of command-line tools for managing SQLite database files
    4. The download should either begin immediately, or you'll be asked if you want to download the file. If prompted, save it somewhere you know how to find in the file explorer.
    5. Find the ZIP file you just downloaded and open it in the file explorer (either by clicking the filename where the web browser shows a recent download, or by opening the file explorer and navigating to the place where you saved the ZIP file).
    6. This should open a new explorer window in which you browse the contents of the ZIP file. It contains a folder with a name like sqlite-tools-win32-x86-3380200. Double click to enter that folder.
    7. You should now see a list of three files, named

      • sqldiff.exe
      • sqlite3.exe
      • sqlite3_analyzer.exe

      However, the .exe may be missing if explorer is configured to hide file extensions (the default).

    8. Of these, sqlite3.exe is the only one you want. Drag that file to the desktop to extract a copy of it.
    9. You should now see a sqlite3 icon on the desktop. Don't click it; we'll work in the terminal instead.
    10. Open a terminal (Windows Terminal or PowerShell) and change directory to the desktop (e.g. with cd Desktop or cd C:\Users\myusername\Desktop or cd C:\Users\myusername\OneDrive\Desktop, depding on where PowerShell opens and whether you use OneDrive desktop backup)
    11. Test running the command line shell by entering this command in powershell: .\sqlite3.exe
    12. Now SQLite should display a startup message and prompt. Success!
    13. Exit the SQLite command line shell with .exit to return to PowerShell
    14. In the future, you will probably want to run the SQLite command line shell from whatever directory contains your database file. To do that, open PowerShell, go to the directory you want to work in, and then enter the full path to the sqlite3.exe file, e.g. C:\Users\myusername\Desktop\sqlite3.exe.
      • If your username contains spaces, this won't work. You will need to instead surround the command with single quotation marks and precede it with an ampersand and space, e.g.
        & 'C:\Users\My Username Has Spaces\Desktop\sqlite3.exe'
  • If you use Linux or MacOS, SQLite's command line shell is almost always pre-installed. Type sqlite3 in a terminal and press enter. Success (meaning it is already installed) looks something like this:

    sqlite3
    SQLite version 3.31.1 2020-01-27 19:55:54
    Enter ".help" for usage hints.
    Connected to a transient in-memory database.
    Use ".open FILENAME" to reopen on a persistent database.
    sqlite>

    (at which point you'd want to exit using command .exit) Failure (meaning it is not already installed) looks something like this:

    $ sqlite3
    sqlite3: command not found

    If you use Linux or MacOS and SQLite's command line shell is not already installed, contact your TA or instructor for help.

Know how to tell which REPL is open, and how to exit SQLite

Any time you have your terminal open, there are three situations you may find yourself in:

  1. The terminal itself is waiting for a command from you. If this is the case, you'll see a prompt like
    PS C:\Users\ddumas\Desktop>
    or
    $
  2. The Python REPL is running, and waiting for a Python statement from you. If this is the case, you'll see a prompt

    >>>

    and you can quit back to the terminal using the command exit().

  3. The SQLite command line shell is running, and is waiting for a SQL command from you. If this is the case, you'll see a prompt

    sqlite>

    and you can quit back to the terminal using the command .exit

To summarize

When running prompt looks like exit with and then
Terminal (Windows) PS C:\Users> exit window closes
Terminal (Mac/Linux) $ exit window closes
Python >>> exit() back to terminal
SQLite sqlite> .exit back to terminal

Get the star database

Questions 1 uses the HYG star database discussed in Lecture 32. You

So you'll need to download this and put the file in a place where you can find it.

The only table in this database is called stars, and the columns present in that table are described at

Confirm you can open the star database in the SQLite command line shell

  • Open your terminal
  • Change directory to the one that contains the file hyg_data.sqlite
  • Run the SQLite command line shell, giving hyg_data.sqlite as the first command line argument, e.g. a typical Windows command for PowerShell would be
     C:\Users\myusername\Desktop\sqlite3.exe hyg_data.sqlite
    and a typical Linux or MacOS terminal command would be
     sqlite3 hyg_data.sqlite
    After pressing enter, you should see that SQLite is running and waiting for a command with the prompt
     sqlite>
  • KEY STEP: We need to check that SQLite actually opened the star database file, because if SQLite didn't find it, then the command we ran will actually create an empty file with that name. To check that we actually opened an existing database, run a command that uses the data we expect to find there, e.g.
      SELECT COUNT(*) FROM stars;
    If everything is working, the output should look like this:
      sqlite> SELECT COUNT(*) FROM stars;
      119614
      sqlite>
    If you instead see this:
      sqlite> SELECT COUNT(*) FROM stars;
      Error: no such table: stars
      sqlite>
    then it means you ran sqlite3 successfully, but in a directory that didn't contain the star database. That directory will now contain an empty file named hyg_data.sqlite, which you should probably find and remove to prevent yourself from later confusing it with the actual database file.

1. Basic star queries

Use SQL queries run against the HYG star database to answer these questions. (You should consider both the query and its return value as part of the answer.)

1A. Dim stars

What are the ids, color indices (ci), and names (proper) of the five dimmest stars in the database (as measured by magnitude)? Hints:

  • Most stars in the database don't have proper names, so it's possible those fields will be blank.
  • Magnitude is an inverted scale: Large postive numbers indicate very dim stars.

1B. Unnamed and blue

Find the positions (as (ra,dec) coordinates) of the ten "most blue" stars that don't have proper names in the database. (The column ci is proportional to how blue a star is, so large values in that column mean very blue.)

Solution

1A.

Version with SQL command only:

SQL Command:

  • SELECT id, ci, proper FROM stars ORDER BY mag DESC LIMIT 5;

Output (notice that these stars do not have names or color indices in the database):

118947|| 119182|| 118972|| 118747|| 118781||

Version with full Python code:

In [2]:
import sqlite3
con = sqlite3.connect("hyg_data.sqlite")

# 1A: ids, color indices, and proper names of the 5 lowest-brightness stars
# mag is inverse to brightness, so get the stars with the highest mag (i.e. use DESC)
c = con.execute("SELECT id, ci, proper FROM stars ORDER BY mag DESC LIMIT 5;")
print("Part A: five lowest-brightness stars")
for s in c:
    print(s)
    
con.close() # No need to commit first, since we didn't modify the database.
Part A: five lowest-brightness stars
(118947, None, None)
(119182, None, None)
(118972, None, None)
(118747, None, None)
(118781, None, None)

1B.

Version with SQL command only:

SQL Command:

  • SELECT ra,dec FROM stars WHERE proper IS NULL ORDER BY ci DESC LIMIT 10;

Output:

18.53891|36.99879 8.944483|19.849147 5.720509|-46.453841 15.825364|39.571637 12.788448|4.144822 21.920492|50.497146 15.466949|-75.929604 9.92392|-41.586884 10.372171|-47.972659 19.953475|44.261121

Version with full Python code:

In [3]:
import sqlite3
con = sqlite3.connect("hyg_data.sqlite")

# Get positions of stars with no names which have the highest color index (i.e. most blue)
c = con.execute("SELECT ra,dec FROM stars WHERE proper IS NULL ORDER BY ci DESC LIMIT 10;") 

print("\nPart B: ten unnamed stars in order of blueness")
for star in c:
    print(star)

con.close() # No need to commit first, since we didn't modify the database.
Part B: ten unnamed stars in order of blueness
(18.53891, 36.99879)
(8.944483, 19.849147)
(5.720509, -46.453841)
(15.825364, 39.571637)
(12.788448, 4.144822)
(21.920492, 50.497146)
(15.466949, -75.929604)
(9.92392, -41.586884)
(10.372171, -47.972659)
(19.953475, 44.261121)

2. Programming joke utility

Expressions in SQL

There is a new SQLite feature (or more precisely, a feature of the SQL dialect that SQLite uses for queries) that you'll need to use in this problem.

In many places where we've used column names in our queries, you can also use expressions that apply arithmetic operators and other functions to the values in the columns. For example, if a database of MCS 275 grades has columns called project3pct and project4pct, then this query would return the email addresses of students whose grades on those two projects differed by more than 10 percent:

SELECT email FROM mcs275roster WHERE ABS(project3pct - project4pct) > 10;

You can also use expressions like this in the requested list of output columns. For example, this query would get the average of project 3 and 4 percentages for all students, listed in alphabetical order by last name.

SELECT lastname, firstname, 0.5*(project3pct + project4pct) FROM mcs275roster ORDER BY lastname;

Such expressions can also be used after ORDER BY to make a custom sort.

You can find lists of built-in functions in SQLite in the documentation:

The task

Write a program that stores, delivers, and ranks programming jokes using a SQLite database. It should support three operations:

  • Add a joke: The user provides a string and the program adds it to the joke table. In addition to the joke itself, the joke table has columns to store the number of times a joke has been told, and how many times it was successful in amusing the user.
  • Tell a joke: A random joke is selected and displayed. The user is asked if they found it amusing. Their answer is recorded by updating the relevant counts (times told, times successful) for that row.
  • Best jokes list: Among jokes that have succeeded at least once, display the top 5 ranked by success rate (ratio of times successful to times told).

The program should create the database and table it needs if they don't already exist. Otherwise, it should open and use the existing database.

The three functions should be selected using command line arguments. The first command line argument is always the command---one of add, tell, or best. If the command is add, then a second command line argument is required, which is the joke itself. If the command is tell, no other arguments are required but the user is prompted for their approval/disapproval of the joke through keyboard input.

Hints:

  1. Even without you asking to do so, SQLite will add an extra column to your table called rowid that has a distinct integer value for each row. This is a primary key. It won't return this column unless you ask for it explicitly (e.g. SELECT * FROM ... won't show it, but SELECT rowid FROM ... or SELECT rowid,* FROM ... will. Having a unique id for each row is helpful so you can retrieve a row, and then apply an operation to the same row later.
  2. SQLite doesn't have a feature to select a random row directly, but it does have a built-in function RANDOM() that will return a random number for each row, and you order by that.
  3. In a SQLite query, if columns A and B both have type integer then A/B computes the integer division of A by B. In contrast, 1.0*A/B would give the true quotient because the multiplication by 1.0 converts A to a float (or REAL, in SQLite terminology).

Save the program as jokedb.py.

Here is a sample session of what using it should look like. (These are from a linux terminal session, where the terminal prompt is "$". In Windows PowerShell, the prompt will look a bit different.)

$ python3 jokedb.py tell
ERROR: No jokes in database.

[... omitted: several jokes are added ...]

$ python3 jokedb.py tell
There are 10 types of people in the world: Those who understand binary, and those who don't.

Were you amused by this? (Y/N)y

$ python3 jokedb.py tell
The two hardest things in programming are naming things, cache invalidation, and off-by-one errors.

Were you amused by this? (Y/N)n

$ python3 jokedb.py add "Most people agree that there were no widely-used high-level programming languages before FORTRAN.  Unfortunately, there is no agreement on whether this makes FORTRAN the 1st such language, or the 0th."

$ python3 jokedb.py tell
After learning Python, my kids stopped saying 'I won't take out the garbage!'.  Instead, they say 'take_out_garbage() is deprecated in v2.0'.

Were you amused by this? (Y/N)y

$ python3 jokedb.py best
-------------------------------------------------------
#1 with 100% success rate after 8 tellings:
Knock knock.
Race condition.
Who's there?
-------------------------------------------------------
#2 with 71% success rate after 7 tellings:
There are 10 types of people in the world: Those who understand binary, and those who don't.
-------------------------------------------------------
#3 with 67% success rate after 6 tellings:
A software testing engineer walks into a bar and orders a refrigerator, -1 glasses of water, and INT_MAX+1 cans of soda.
-------------------------------------------------------
#4 with 60% success rate after 5 tellings:
After learning Python, my kids stopped saying 'I won't take out the garbage!'.  Instead, they say 'take_out_garbage() is deprecated in v2.0'.
-------------------------------------------------------
#5 with 50% success rate after 4 tellings:
The two hardest things in programming are naming things, cache invalidation, and off-by-one errors.

$

Solution

In [ ]:
import sys
import sqlite3

def add(con, joke):
    """If db table exists, add a joke. Otherwise, create a table and add the joke."""
    
    # If the table does not exist yet, create it
    con.execute("CREATE TABLE IF NOT EXISTS jokes (joke TEXT, tells INT,success INT)")
    
    # By now, table must exist. 
    # We haven't told the joke yet, so other vals set to 0
    con.execute("INSERT INTO jokes VALUES (?,?,?);",(joke,0,0))
    
    con.commit()

def tell(con):
    """Tells a random joke from the database"""
    
    # Select a random joke and its corresponding row data
    c = con.execute("SELECT rowid,* FROM jokes ORDER BY RANDOM() LIMIT 1")
    
    # Unpack the values and print out the joke
    rowid, joke, tells, success = c.fetchone()
    print(joke)
    tells += 1
    
    # While loop to account for invalid user input
    while True: 
        rank = input("Was this a good joke? (Y/N)")
        if rank.upper() == "Y":
            # It's a good joke!
            success += 1
            break
        elif rank.upper() == "N":
            # It's a bad joke :-(
            success += 0
            break
        else:
            print("Invalid input, please try again.")
            
    # Update the tells and success values for the joke we told (identified by rowid)
    con.execute("UPDATE jokes SET tells=?,success=? WHERE rowid==?;",(tells,success,rowid))
    con.commit()

def best(con):
    """Prints out the top 5 jokes, based on ratings"""
    c = con.execute("SELECT * FROM jokes WHERE tells>0 ORDER BY -1.0*success/tells LIMIT 5")
    for i,joke in enumerate(c):
        best_print(i+1,joke[0],joke[1],joke[2])

def best_print(num, joke, tells, successes):
    """Prints joke data with a specific format"""
    print("-------------------------")
    if tells == 0:
        print("#{} with no tellings yet:")
    else:
        print("#{} with {}% success rate after {} tellings:".format(num,int(100*successes/tells),tells,joke))
    print(joke)

if __name__=="__main__":
    cmd = sys.argv[1]
    con = sqlite3.connect("jokes.sqlite")

    # Check the command value, and call the appropriate function
    if cmd == "add":
        joke = sys.argv[2]
        add(con,joke)
    elif cmd == "tell":
        tell(con)
    elif cmd == "best":
        best(con)
    else:
        print("Command not understood. Please use 'add', 'tell', or 'best'.")
    
    # Close the connection (any changes were commited within the function)
    con.close()

3. More complex star queries

COUNT

This is another SQLite feature to know about (which will be helpful in this problem).

When you make a SELECT query, if you only want to know how many rows would be returned, and not the actual data, you can ask for

COUNT(*)

in place of the list of columns you would otherwise include. For example,

SELECT COUNT(*) FROM mcs275roster WHERE project4pct >= 90;

might return the number of students who scored 90 percent or higher on project 4.

3A. Siriusly bright

While mag is a column indicating the apparent brightness of the star as seen from earth, this is not the same as the amount of light a star emits. Some stars seem dim only because they are very far from the earth. To account for this, the column absmag (absolute magnitude) indicates the brightness of the star if it were observed from a certain standard distance (about 32 light-years). Thus absmag is a measure of the light energy output of the star. Both mag and absmag use a scale where smaller numbers correspond to more light energy.

The brightest star in the night sky is Sirius. But how many stars in this database actually emit more light than Sirius (and appear dimmer due to being farther away)?

Of those, what fraction are less blue than Sirius?

3B. Solar twin

What named star the database has energy output closest to that of the sun?

Hints:

  • If two stars have nearly the same energy output, then the difference of their absolute magnitudes is close to zero (though it might be positive or negative)
  • "Close to zero" is the same as having small absolute value

Solution

3A.

Version with SQL commands only:

First find the absmag of Sirius. SQL command:

  • SELECT absmag FROM stars WHERE proper IS "Sirius";

Output:

1.454

This tells us we need to query for stars with absmag < 1.454 (we use < here because lower values of absolute magnitude correspond to more light energy). SQL command:

  • SELECT COUNT(*) FROM stars WHERE absmag<1.454

Output:

58873

So there are 58873 stars in the database which emit more light than Sirius.

We can follow a similar procedure to find out what fraction of these are less blue than Sirius. See the full Python version below for this.

Version with full Python code:

In [1]:
import sqlite3
con = sqlite3.connect("hyg_data.sqlite")

# Get Sirius's absmag and ci
c = con.execute("SELECT absmag, ci FROM stars WHERE proper IS 'Sirius';")
sirius_absmag, sirius_ci = c.fetchone()

# Find number of stars which are brighter than Sirius
c1 = con.execute("SELECT COUNT(*) FROM stars WHERE absmag<?;", (sirius_absmag,))
count_brighter, = c1.fetchone() # Comma here accounts for fact that fetchone() returns list
print("\n3A: Number of stars emitting more light than Sirius: {}".format(count_brighter))

c2 = con.execute("SELECT COUNT(*) FROM stars WHERE absmag<? AND ci<?;", (sirius_absmag,sirius_ci))
count_brighter_lessblue, = c2.fetchone()
print("3A: Ratio brighter and less blue than Sirius: {}".format(count_brighter_lessblue/count_brighter))

con.close() # No need to commit first, since we didn't modify the database.
3A: Number of stars emitting more light than Sirius: 58873
3A: Ratio brighter and less blue than Sirius: 0.12042871944694512

3B.

Version with SQL commands only:

Similarly to part A, we need the Sun's absolute magnitude. SQL command:

  • SELECT absmag FROM stars WHERE proper IS 'Sol';

Output:

4.85

Then we query which stars have the closest value of absmag to 4.85 (except the Sun itself). SQL command:

  • SELECT proper FROM stars WHERE proper is not NULL and proper != 'Sol' ORDER BY ABS(absmag-4.85) LIMIT 1;

Output:

Rigil Kentaurus

Version with full Python code:

In [4]:
import sqlite3
con = sqlite3.connect("hyg_data.sqlite")

# Get the sun's absmag
c = con.execute("SELECT absmag FROM stars WHERE proper IS 'Sol';")
sun_absmag = c.fetchone()[0]

# Exclude the Sun itself from these results by using `proper != 'Sol'`
c1 = con.execute("SELECT proper FROM stars WHERE proper is not NULL and proper != 'Sol' ORDER BY ABS(absmag-?) LIMIT 1;", (sun_absmag,))
print("3B: Star with closest light output to that of the Sun:", c1.fetchone()[0])

con.close() # No need to commit first, since we didn't modify the database.
3B: Star with closest light output to that of the Sun: Rigil Kentaurus