This worksheet focuses on SQLite databases.
These things might be helpful while working on the problems. Remember that for worksheets, we don't strictly limit what resources you can consult, so these are only suggestions.
The main references for these topics are:
You'll need to complete these before you can work on the worksheet.
Quiz 12 will also be impossible to work on until you've successfully completed these steps.
There's nothing to install if you just want to use SQLite with Python, as the module sqlite3
is already part of the Python standard library.
However, if you just want to run one SQL query and show the results, a fairly long Python program is needed. It's much easier if you can install the SQLite REPL (or "command line shell") that lets you run queries directly from a prompt in your terminal. This worksheet assumes you have the command line shell for SQLite.
Lectures 30 and 32 discussed how to install this. Here is a quick video showing the steps to install in Windows 10:
Here are more detailed written installation instructions by platform.
If you use Windows, you need to install it yourself. The installation doesn't look like the graphical ones you may be used to, with a window and buttons guiding you through the steps. Instead you download a zip file and extract it. The whole thing is relatively quick, but the steps below are described in some detail, so the written instructions are a bit long. (Consider just watching the 1-minute video and following along instead.)
sqlite-tools-win32-x86-
. The description next to the link should begin: A bundle of command-line tools for managing SQLite database filessqlite-tools-win32-x86-3380200
. Double click to enter that folder.You should now see a list of three files, named
sqldiff.exe
sqlite3.exe
sqlite3_analyzer.exe
However, the .exe
may be missing if explorer is configured to hide file extensions (the default).
sqlite3.exe
is the only one you want. Drag that file to the desktop to extract a copy of it.sqlite3
icon on the desktop. Don't click it; we'll work in the terminal instead.cd Desktop
or cd C:\Users\myusername\Desktop
or cd C:\Users\myusername\OneDrive\Desktop
, depding on where PowerShell opens and whether you use OneDrive desktop backup).\sqlite3.exe
.exit
to return to PowerShellsqlite3.exe
file, e.g. C:\Users\myusername\Desktop\sqlite3.exe
.& 'C:\Users\My Username Has Spaces\Desktop\sqlite3.exe'
If you use Linux or MacOS, SQLite's command line shell is almost always pre-installed. Type sqlite3
in a terminal and press enter. Success (meaning it is already installed) looks something like this:
sqlite3
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite>
(at which point you'd want to exit using command .exit
)
Failure (meaning it is not already installed) looks something like this:
$ sqlite3
sqlite3: command not found
If you use Linux or MacOS and SQLite's command line shell is not already installed, contact your TA or instructor for help.
Any time you have your terminal open, there are three situations you may find yourself in:
PS C:\Users\ddumas\Desktop>
or
$
The Python REPL is running, and waiting for a Python statement from you. If this is the case, you'll see a prompt
>>>
and you can quit back to the terminal using the command exit()
.
The SQLite command line shell is running, and is waiting for a SQL command from you. If this is the case, you'll see a prompt
sqlite>
and you can quit back to the terminal using the command .exit
To summarize
When running | prompt looks like | exit with | and then |
---|---|---|---|
Terminal (Windows) | PS C:\Users> |
exit |
window closes |
Terminal (Mac/Linux) | $ |
exit |
window closes |
Python | >>> |
exit() |
back to terminal |
SQLite | sqlite> |
.exit |
back to terminal |
Questions 1 uses the HYG star database discussed in Lecture 32. You
So you'll need to download this and put the file in a place where you can find it.
The only table in this database is called stars
, and the columns present in that table are described at
hyg_data.sqlite
hyg_data.sqlite
as the first command line argument, e.g. a typical Windows command for PowerShell would be
C:\Users\myusername\Desktop\sqlite3.exe hyg_data.sqlite
and a typical Linux or MacOS terminal command would be
sqlite3 hyg_data.sqlite
After pressing enter, you should see that SQLite is running and waiting for a command with the prompt
sqlite>
SELECT COUNT(*) FROM stars;
If everything is working, the output should look like this:
sqlite> SELECT COUNT(*) FROM stars;
119614
sqlite>
If you instead see this:
sqlite> SELECT COUNT(*) FROM stars;
Error: no such table: stars
sqlite>
then it means you ran sqlite3
successfully, but in a directory that didn't contain the star database. That directory will now contain an empty file named hyg_data.sqlite
, which you should probably find and remove to prevent yourself from later confusing it with the actual database file.Use SQL queries run against the HYG star database to answer these questions. (You should consider both the query and its return value as part of the answer.)
What are the id
s, color indices (ci
), and names (proper
) of the five dimmest stars in the database (as measured by mag
nitude)? Hints:
Find the positions (as (ra,dec)
coordinates) of the ten "most blue" stars that don't have proper names in the database. (The column ci
is proportional to how blue a star is, so large values in that column mean very blue.)
There is a new SQLite feature (or more precisely, a feature of the SQL dialect that SQLite uses for queries) that you'll need to use in this problem.
In many places where we've used column names in our queries, you can also use expressions that apply arithmetic operators and other functions to the values in the columns. For example, if a database of MCS 275 grades has columns called project3pct
and project4pct
, then this query would return the email addresses of students whose grades on those two projects differed by more than 10 percent:
SELECT email FROM mcs275roster WHERE ABS(project3pct - project4pct) > 10;
You can also use expressions like this in the requested list of output columns. For example, this query would get the average of project 3 and 4 percentages for all students, listed in alphabetical order by last name.
SELECT lastname, firstname, 0.5*(project3pct + project4pct) FROM mcs275roster ORDER BY lastname;
Such expressions can also be used after ORDER BY
to make a custom sort.
You can find lists of built-in functions in SQLite in the documentation:
Write a program that stores, delivers, and ranks programming jokes using a SQLite database. It should support three operations:
The program should create the database and table it needs if they don't already exist. Otherwise, it should open and use the existing database.
The three functions should be selected using command line arguments. The first command line argument is always the command---one of add
, tell
, or best
. If the command is add
, then a second command line argument is required, which is the joke itself. If the command is tell
, no other arguments are required but the user is prompted for their approval/disapproval of the joke through keyboard input.
Hints:
rowid
that has a distinct integer value for each row. This is a primary key. It won't return this column unless you ask for it explicitly (e.g. SELECT * FROM ...
won't show it, but SELECT rowid FROM ...
or SELECT rowid,* FROM ...
will. Having a unique id for each row is helpful so you can retrieve a row, and then apply an operation to the same row later.RANDOM()
that will return a random number for each row, and you order by that.A
and B
both have type integer then A/B
computes the integer division of A
by B
. In contrast, 1.0*A/B
would give the true quotient because the multiplication by 1.0 converts A
to a float (or REAL
, in SQLite terminology).Save the program as jokedb.py
.
Here is a sample session of what using it should look like. (These are from a linux terminal session, where the terminal prompt is "$". In Windows PowerShell, the prompt will look a bit different.)
$ python3 jokedb.py tell
ERROR: No jokes in database.
[... omitted: several jokes are added ...]
$ python3 jokedb.py tell
There are 10 types of people in the world: Those who understand binary, and those who don't.
Were you amused by this? (Y/N)y
$ python3 jokedb.py tell
The two hardest things in programming are naming things, cache invalidation, and off-by-one errors.
Were you amused by this? (Y/N)n
$ python3 jokedb.py add "Most people agree that there were no widely-used high-level programming languages before FORTRAN. Unfortunately, there is no agreement on whether this makes FORTRAN the 1st such language, or the 0th."
$ python3 jokedb.py tell
After learning Python, my kids stopped saying 'I won't take out the garbage!'. Instead, they say 'take_out_garbage() is deprecated in v2.0'.
Were you amused by this? (Y/N)y
$ python3 jokedb.py best
-------------------------------------------------------
#1 with 100% success rate after 8 tellings:
Knock knock.
Race condition.
Who's there?
-------------------------------------------------------
#2 with 71% success rate after 7 tellings:
There are 10 types of people in the world: Those who understand binary, and those who don't.
-------------------------------------------------------
#3 with 67% success rate after 6 tellings:
A software testing engineer walks into a bar and orders a refrigerator, -1 glasses of water, and INT_MAX+1 cans of soda.
-------------------------------------------------------
#4 with 60% success rate after 5 tellings:
After learning Python, my kids stopped saying 'I won't take out the garbage!'. Instead, they say 'take_out_garbage() is deprecated in v2.0'.
-------------------------------------------------------
#5 with 50% success rate after 4 tellings:
The two hardest things in programming are naming things, cache invalidation, and off-by-one errors.
$
This is another SQLite feature to know about (which will be helpful in this problem).
When you make a SELECT query, if you only want to know how many rows would be returned, and not the actual data, you can ask for
COUNT(*)
in place of the list of columns you would otherwise include. For example,
SELECT COUNT(*) FROM mcs275roster WHERE project4pct >= 90;
might return the number of students who scored 90 percent or higher on project 4.
While mag
is a column indicating the apparent brightness of the star as seen from earth, this is not the same as the amount of light a star emits. Some stars seem dim only because they are very far from the earth. To account for this, the column absmag
(absolute magnitude) indicates the brightness of the star if it were observed from a certain standard distance (about 32 light-years). Thus absmag
is a measure of the light energy output of the star. Both mag
and absmag
use a scale where smaller numbers correspond to more light energy.
The brightest star in the night sky is Sirius. But how many stars in this database actually emit more light than Sirius (and appear dimmer due to being farther away)?
Of those, what fraction are less blue than Sirius?
What named star the database has energy output closest to that of the sun?
Hints: