A document from MCS 275 Spring 2021, instructor Emily Dumas. You can also get the notebook file.

Worksheet 12 Solutions

MCS 275 Spring 2021 - Emily Dumas

Solutions by Jennifer Vaccaro

Topics

This worksheet focuses on SQLite databases.

The main references for these topics are:

Instructions

  • To complete this worksheet you'll need to have matplotlib installed, or
  • Problem 1 is handled differently than the others:
    • Tuesday discussion students: Problem 1 will be presented as an example at the start of discussion
    • Thursday discussion students: Please complete Problem 1 before discussion and bring your solution
  • For the other problems:
    • Work on these problems in discussion.

Star database

Questions 1 and 3 use the HYG star database in SQLite format available at:

So you'll need to download this, unzip it, and put the hyg_data.sqlite file in a place where you can find it.

The only table in this database is called stars, and the columns present in that table are described at

1. Basic star queries

Use SQL queries run against the HYG star database to answer these questions. (You should consider both the query and its return value as part of the answer.)

1A. Dim stars

What are the ids and proper names of the five dimmest stars in the database (as measured by magnitude)?

1B. Unnamed and blue

Find the positions (as (ra,dec) coordinates) of the ten "most blue" stars that don't have proper names in the database. (The column ci is proportional to how blue a star is.)

In [3]:
# MCS 275 Week 12 Problem 1
# J Vaccaro
# Work incollaboration with Tuesday and Thursday discussions. 
# Thanks to Melanie for pointing out that mag is inverse from brightness, respectively, while higher ci=more blue

import os
DATADIR = "hyg_data"
DBFILE = os.path.join(DATADIR,"hyg_data.sqlite")

import sqlite3
con = sqlite3.connect(DBFILE)

# 1A ids and proper names of the 5 lowest-brightness stars, but mag is inverse to brightness
c = con.execute("SELECT id,proper FROM stars ORDER BY mag DESC LIMIT 5")
print("Part A: five lowest-brightness stars")
for s in c:
    print(s)

# 1B 10 unnamed and blue stars (WOBL order)
# Only selects the positions
c = con.execute("SELECT ra,dec FROM stars WHERE proper IS NULL AND ci!='' ORDER BY ci DESC LIMIT 10") 

## Selects the positions, name, and blueness for checking work
#c = con.execute("SELECT ra,dec,proper,ci FROM stars WHERE proper IS NULL AND ci!='' ORDER BY ci DESC LIMIT 10") 

print("\nPart B: ten unnamed stars in order of blueness")
for s in c:
    print(s)

# No need to commit first, since we didn't modify the database.
con.close()
Part A: five lowest-brightness stars
(118947, None)
(119182, None)
(118972, None)
(118747, None)
(118781, None)

Part B: ten unnamed stars in order of blueness
(18.53891, 36.99879)
(8.944483, 19.849147)
(5.720509, -46.453841)
(15.825364, 39.571637)
(12.788448, 4.144822)
(21.920492, 50.497146)
(15.466949, -75.929604)
(9.92392, -41.586884)
(10.372171, -47.972659)
(19.953475, 44.261121)

2. Programming joke utility

Expressions in SQL

There is a new SQLite feature (or more precisely, a feature of the SQL dialect that SQLite uses for queries) that you'll need to use in this problem.

In many places where we've used column names in our queries, you can also use expressions that apply arithmetic operators and other functions to the values in the columns. For example, if a database of MCS 275 grades has columns called project3pct and project4pct, then this query would return the email addresses of students whose grades on those two projects differed by more than 10 percent:

SELECT email FROM mcs275roster WHERE ABS(project3pct - project4pct) > 10;

You can also use expressions like this in the requested list of output columns. For example, this query would get the average of project 3 and 4 percentages for all students, listed in alphabetical order by last name.

SELECT lastname, firstname, 0.5*(project3pct + project4pct) FROM mcs275roster ORDER BY lastname;

Such expressions can also be used after ORDER BY to make a custom sort.

You can find lists of built-in functions in SQLite in the documentation:

The task

Write a program that stores, delivers, and ranks programming jokes using a SQLite database. It should support three operations:

  • Add a joke: The user provides a string and the program adds it to the joke table. In addition to the joke itself, the joke table has columns to store the number of times a joke has been told, and how many times it was successful in amusing the user.
  • Tell a joke: A random joke is selected and displayed. The user is asked if they found it amusing. Their answer is recorded by updating the relevant counts (times told, times successful) for that row.
  • Best jokes list: Among jokes that have succeeded at least once, display the top 5 ranked by success rate (ratio of times successful to times told).

The program should create the database and table it needs if they don't already exist. Otherwise, it should open and use the existing database.

The three functions should be selected using command line arguments. The first command line argument is always the command---one of add, tell, or best. If the command is add, then a second command line argument is required, which is the joke itself. If the command is tell, no other arguments are required but the user is prompted for their approval/disapproval of the joke through keyboard input.

Hints:

  1. Even without you asking to do so, SQLite will add an extra column to your table called rowid that has a distinct integer value for each row. This is a primary key. It won't return this column unless you ask for it explicitly (e.g. SELECT * FROM ... won't show it, but SELECT rowid FROM ... or SELECT rowid,* FROM ... will. Having a unique id for each row is helpful so you can retrieve a row, and then apply an operation to the same row later.
  2. SQLite doesn't have a feature to select a random row directly, but it does have a built-in function RANDOM() that will return a random number for each row, and you order by that.
  3. In a SQLite query, if columns A and B both have type integer then A/B computes the integer division of A by B. In contrast, 1.0*A/B would give the true quotient because the multiplication by 1.0 converts A to a float (or REAL, in SQLite terminology).

Save the program as jokedb.py.

Here is a sample session of what using it should look like. (These are from a linux terminal session, where the terminal prompt is "$". In Windows PowerShell, the prompt will look a bit different.)

$ python3 jokedb.py tell
ERROR: No jokes in database.

[... omitted: several jokes are added ...]

$ python3 jokedb.py tell
There are 10 types of people in the world: Those who understand binary, and those who don't.

Were you amused by this? (Y/N)y

$ python3 jokedb.py tell
The two hardest things in programming are naming things, cache invalidation, and off-by-one errors.

Were you amused by this? (Y/N)n

$ python3 jokedb.py add "Most people agree that there were no widely-used high-level programming languages before FORTRAN.  Unfortunately, there is no agreement on whether this makes FORTRAN the 1st such language, or the 0th."

$ python3 jokedb.py tell
After learning Python, my kids stopped saying 'I won't take out the garbage!'.  Instead, they say 'take_out_garbage() is deprecated in v2.0'.

Were you amused by this? (Y/N)y

$ python3 jokedb.py best
-------------------------------------------------------
#1 with 100% success rate after 8 tellings:
Knock knock.
Race condition.
Who's there?
-------------------------------------------------------
#2 with 71% success rate after 7 tellings:
There are 10 types of people in the world: Those who understand binary, and those who don't.
-------------------------------------------------------
#3 with 67% success rate after 6 tellings:
A software testing engineer walks into a bar and orders a refrigerator, -1 glasses of water, and INT_MAX+1 cans of soda.
-------------------------------------------------------
#4 with 60% success rate after 5 tellings:
After learning Python, my kids stopped saying 'I won't take out the garbage!'.  Instead, they say 'take_out_garbage() is deprecated in v2.0'.
-------------------------------------------------------
#5 with 50% success rate after 4 tellings:
The two hardest things in programming are naming things, cache invalidation, and off-by-one errors.

$
In [ ]:
# MCS 275 Week 12 Problem 2
# J Vaccaro
# I completed this work myself, in accordance with the syllabus.
"""Program for adding and telling jokes from a database jokes.sqlite"""

import sys
import sqlite3

def add(con, joke):
    """If db table exists, add a joke. Otherwise, create a table and add the joke."""
    con.execute("CREATE TABLE IF NOT EXISTS jokes (joke TEXT, tells INT,success INT)")
    con.execute("INSERT INTO jokes VALUES (?,?,?);",(joke,0,0)) # We haven't told the joke yet, so other vals set to 0
    con.commit()

def tell(con):
    """Tells a random joke from the database"""
    # Select a random joke, and its corresponding row data
    c = con.execute("SELECT rowid,* FROM jokes ORDER BY RANDOM() LIMIT 1")
    # Unpack the values and print out the joke
    rowid,joke,tells,success = c.fetchone()
    print(joke)

    # Now, update the variables for joke data
    tells += 1
    # While loop to account for invalid user input
    while True: 
        rank = input("Was this a good joke? (Y/N)")
        if rank.upper() == "Y":
            # It's a good joke!
            success += 1
            break
        elif rank.upper() == "N":
            # It's a bad joke :-(
            success += 0
            break
        else:
            print("Invalid input, please try again.")
    # Update the tells and success values for the joke we told (identified by rowid)
    con.execute("UPDATE jokes SET tells=?,success=? WHERE rowid==?;",(tells,success,rowid))
    con.commit()

def best(con):
    """Prints out the top 5 jokes, based on ratings"""
    c = con.execute("SELECT * FROM jokes WHERE tells>0 ORDER BY -1.0*success/tells LIMIT 5")
    for i,joke in enumerate(c):
        best_print(i+1,joke[0],joke[1],joke[2])

def best_print(num, joke, tells, successes):
    """Prints joke data with a specific format"""
    print("-------------------------")
    if tells == 0:
        print("#{} with no tellings yet:")
    else:
        print("#{} with {}% success rate after {} tellings:".format(num,int(100*successes/tells),tells,joke))
    print(joke)

if __name__=="__main__":
    cmd = sys.argv[1]
    con = sqlite3.connect("jokes.sqlite")

    # Check the command value, and call the appropriate function
    if cmd == "add":
        joke = sys.argv[2]
        add(con,joke)
    elif cmd == "tell":
        tell(con)
    elif cmd == "best":
        best(con)
    
    # Close the connection (any changes were commited within the function)
    con.close()

3. More complex star queries

COUNT

This is another SQLite feature to know about (which will be helpful in this problem).

When you make a SELECT query, if you only want to know how many rows would be returned, and not the actual data, you can ask for

COUNT(*)

in place of the list of columns you would otherwise include. For example,

SELECT COUNT(*) FROM mcs275roster WHERE project4pct >= 90;

might return the number of students who scored 90 percent or higher on project 4.

3A. Siriusly bright

While mag is a column indicating the apparent brightness of the star as seen from earth, this is not the same as the amount of light a star emits. Some stars seem dim only because they are very far from the earth. To account for this, the column absmag (absolute magnitude) indicates the brightness of the star if it were observed from a certain standard distance (about 32 light-years). Thus absmag is a measure of the light energy output of the star. Both mag and absmag use a scale where smaller numbers correspond to more light energy.

The brightest star in the night sky is Sirius. But how many stars in this database actually emit more light than Sirius (and appear dimmer due to being farther away)?

Of those, what fraction are less blue than Sirius?

3B. Solar twin

What named star the database has energy output closest to that of the sun?

3C. Extras

Only work on these if you've finished everything else. Solutions will not be provided.

  • Suppose I see a star and want to know which one it is. Given a position (ra,dec), an approximate color (e.g. redder than polaris, bluer than polaris, or unsure), and a viewing method (naked eye, which can see magnitudes up to about 3 near Chicago, or small telecope, which can see magnitudes up to about 8), write a Python script that will identify likely candidates from the database and report the top matches.
  • Stars are moving. The database contains velocity vectors (in x, y, z components) relative to a certain coordinate system where the sun is not moving, but only for some of the stars. Which star in the database is moving fastest, as measured by the usual length $\| (vx,vy,vz) \| = \sqrt{vx^2 + vy^2 + vz^2}$? How far down the list ranked by velocity (highest first) do you need to go to find a star with a name? To find a star with magitude 3 or less?
In [5]:
# MCS 275 Week 12 Problem 3
# J Vaccaro
# I completed this work myself, in accordance with the syllabus.

import os
DATADIR = "hyg_data"
DBFILE = os.path.join(DATADIR,"hyg_data.sqlite")

import sqlite3
con = sqlite3.connect(DBFILE)

# First, get the sun and sirius's mag/absmag
c = con.execute("SELECT proper,mag,absmag,ci FROM stars ORDER BY mag LIMIT 2;")
print("30: Get the Sun and Sirius's mag/absmag")
# Brightest is the sun, 2nd brightest is Sirius
sun_name,sun_mag,sun_absmag,sun_blue = c.fetchone()
print(sun_name,sun_mag,sun_absmag,sun_blue)
sirius_name,sirius_mag,sirius_absmag,sirius_blue = c.fetchone()
print(sirius_name,sirius_mag,sirius_absmag,sirius_blue)

# 3A Compare with Sirius 
c1 = con.execute("SELECT COUNT(*) FROM stars WHERE absmag<? ;",(sirius_absmag,))
count_brighter, = c1.fetchone()
print("\n3A: Num with greater absmag than Sirius:",count_brighter)
c2 = con.execute("SELECT COUNT(*) FROM stars WHERE absmag<? AND ci<?;",(sirius_absmag,sirius_blue))
count_brighter_lessblue, = c2.fetchone()
print("3A: Ratio brighter and less blue than Sirius: {}".format(count_brighter_lessblue/count_brighter))

# 3B Compare with Sun
c3 = con.execute("SELECT proper,absmag FROM stars WHERE proper IS NOT NULL AND proper!=? ORDER BY ABS(absmag-?) LIMIT 1;",(sun_name,sun_absmag))

print("\n3B: Most similar light emission to the sun:")
print(*c3.fetchone())

# Close down the connection.
con.close()
30: Get the Sun and Sirius's mag/absmag
Sol -26.7 4.85 0.656
Sirius -1.44 1.454 0.009

3A: Num with greater absmag than Sirius: 58873
3A: Ratio brighter and less blue than Sirius: 0.12042871944694512

3B: Most similar light emission to the sun:
Rigil Kentaurus 4.379
In [ ]: