MCS 260 Fall 2021
Emily Dumas
Layer | Example task |
---|---|
Application | Retrieve http://example.com/ |
Transport | Transmit GET / to 93.184.216.34 |
Network | Deliver this packet to 93.184.216.34 |
Link | Send this ethernet frame to the router |
Physical | Change voltages on these wires... |
We'll discuss making Application-level network requests in Python.
We focus specifically on retrieving data (documents, etc.) from a Uniform Resource Locator or URL.
The urllib module in Python supports this. It is primarily focused on HTTP, HTTPS, and local files.
HTTP allows many types of requests (called methods), for example GET, POST, PUT, and DELETE.
Today we'll only use GET.
A response consists of a numeric status code, some headers (key: value pairs, one per line), then a payload.
E.g. if you GET a web page, the HTML will be in the payload.
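To make the three parts concrete, here is a minimal sketch that splits a raw response into status code, headers, and payload. The response text is made up and abbreviated for illustration, and split_response is a hypothetical helper; in practice urllib does this parsing for you.

```python
# A made-up, abbreviated HTTP response written out as raw text.
# Lines end with "\r\n", and a blank line separates headers from payload.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Content-Length: 28\r\n"
    "\r\n"
    "<html><body>Hi</body></html>"
)


def split_response(raw):
    """Split a raw HTTP response into (status_code, headers, payload)."""
    # Everything before the first blank line is the status line plus headers.
    head, payload = raw.split("\r\n\r\n", 1)
    lines = head.split("\r\n")
    # The status line looks like "HTTP/1.1 200 OK"; the code is the second word.
    status_code = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        key, value = line.split(":", 1)
        headers[key.strip()] = value.strip()
    return status_code, headers, payload


status, headers, payload = split_response(RAW_RESPONSE)
print(status)
print(headers)
print(payload)
```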
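There are lots of codes; the first digit gives the category:
2xx - success
3xx - redirection
4xx - client error (something is wrong with the request)
5xx - server error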
Response to GET http://example.com/
Import urllib.request to get the most convenient functions for loading URLs.
Call urllib.request.urlopen(url) to open the URL url using GET. It returns a response object.
Response objects behave like read-only files, and should be closed with .close().
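If a 4xx or 5xx response is received, or if contacting the host fails, a urllib.error.URLError exception is raised. (For 4xx and 5xx responses the exception is a urllib.error.HTTPError, which is a subclass of URLError.)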
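The pieces above might be combined as in the following sketch. The helper name fetch and the 10-second timeout are assumptions chosen for illustration. A with statement is used so the response is closed automatically, and the demo uses a data: URL (which carries its payload inside the URL itself) so it runs without a network connection.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return the payload of url as bytes, or None if the request fails."""
    try:
        # The with statement closes the response for us, as it would a file.
        with urllib.request.urlopen(url, timeout=timeout) as res:
            return res.read()
    except urllib.error.URLError as err:
        # HTTPError (4xx/5xx) is a subclass of URLError, so it lands here too.
        print("Request failed:", err)
        return None


# Offline demo: the payload "hello" is embedded in the data: URL.
payload = fetch("data:text/plain;charset=utf-8,hello")
print(payload)
```

With a network connection, fetch("http://example.com/") would be called the same way.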
An HTTP response object res has:
res.status - the status code
res.geturl() - returns the final URL (maybe not the one requested, if redirection was used)
res.read() - returns the payload as a bytes object
res.headers - dict-like object storing the HTTP headers (not the HTML header!)
res.headers.get_content_charset() - returns the payload encoding, if known
Often the payload is meant to be a string, but you will always receive it as bytes.
To recover that string from the bytes object returned by res.read(), you need to call the .decode(...) method, e.g.
enc = res.headers.get_content_charset() # probably "UTF-8"
response_string = res.read().decode(enc) # bytes -> str
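A slightly more defensive version of the same idea is sketched below. The helper name fetch_text and the choice of "utf-8" as a fallback are assumptions: get_content_charset() returns None when the headers name no encoding, and decode(None) would fail. The demo again uses a data: URL so that it runs offline.

```python
import urllib.request


def fetch_text(url, fallback_encoding="utf-8"):
    """Return (final_url, text) for url, decoding the payload to a str."""
    with urllib.request.urlopen(url) as res:
        # get_content_charset() returns None when no charset is given,
        # so fall back to an assumed default in that case.
        enc = res.headers.get_content_charset() or fallback_encoding
        return res.geturl(), res.read().decode(enc)


# Offline demo: %C3%A9 is the UTF-8 encoding of "é", percent-escaped.
demo_url = "data:text/plain;charset=utf-8,caf%C3%A9"
final_url, text = fetch_text(demo_url)
print(final_url)
print(text)
```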
An application programming interface or API is a structured way for computer programs to talk to each other.
APIs often use the network, and often use HTTP.
Some are available freely to anyone.
urllib.request.urlopen is a great way to fetch data from HTTP APIs.
Example for today: A free dice rolling JSON API* by Steve Brazier at roll.diceapi.com.
Examples:
http://roll.diceapi.com/json/d6 - roll one six-sided die
http://roll.diceapi.com/json/3d6 - roll three six-sided dice
http://roll.diceapi.com/json/4d12 - roll four twelve-sided dice
* This API could disappear at any moment. It worked on November 9, 2021.
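A sketch of how such an API could be used from Python follows. The helpers dice_url and fetch_json are hypothetical names, and the layout of the JSON the dice API returns is not assumed here: the code only parses whatever JSON arrives. Because the API may be gone, the demo parses a made-up JSON payload from a data: URL rather than contacting the server.

```python
import json
import urllib.parse
import urllib.request


def dice_url(count, sides):
    """Build a URL such as http://roll.diceapi.com/json/3d6."""
    # The examples above write a single die as "d6" rather than "1d6".
    prefix = "" if count == 1 else str(count)
    return "http://roll.diceapi.com/json/{}d{}".format(prefix, sides)


def fetch_json(url):
    """GET url and parse the payload as JSON."""
    with urllib.request.urlopen(url) as res:
        enc = res.headers.get_content_charset() or "utf-8"
        return json.loads(res.read().decode(enc))


print(dice_url(1, 6))
print(dice_url(3, 6))

# Offline demo with a made-up payload (NOT the dice API's real format).
sample_url = "data:application/json," + urllib.parse.quote(
    json.dumps({"example": [1, 2, 3]})
)
print(fetch_json(sample_url))
```

If the API is still online, fetch_json(dice_url(3, 6)) would return its parsed reply, which can be printed to see what keys it contains.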
HTTP GET requests can send an associative array of parameters. For example, to send the dictionary {"name":"David","apple":"McIntosh"} to http://example.com/ the URL would be
http://example.com/?name=David&apple=McIntosh
The parameter list begins with ? and has & between name=value pairs. It gets tricky when values or names have spaces, but urllib.parse.urlencode can convert a dictionary to a suitable string.
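As a small illustration, the following sketch encodes the dictionary from the example, then a second dictionary whose values (made up for this demo) contain spaces and an ampersand.

```python
import urllib.parse

# The dictionary from the example above.
params = {"name": "David", "apple": "McIntosh"}
query = urllib.parse.urlencode(params)
url = "http://example.com/?" + query
print(url)  # http://example.com/?name=David&apple=McIntosh

# Made-up values with spaces and "&": spaces become "+", "&" becomes "%26".
tricky = {"name": "David S", "apple": "Granny Smith & Fuji"}
tricky_query = urllib.parse.urlencode(tricky)
print(tricky_query)  # name=David+S&apple=Granny+Smith+%26+Fuji
```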
The domain cat-fact.herokuapp.com hosts an API* created by CS undergrad student Alex Wohlbruck for retrieving facts about cats (and other animals). E.g.
https://cat-fact.herokuapp.com/facts/random?amount=2 - two random facts about cats
https://cat-fact.herokuapp.com/facts/random?animal_type=dog&amount=1 - one random fact about dogs
* This API could disappear at any moment. It worked on November 10, 2020.
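The two example URLs can be generated with urllib.parse.urlencode, as in this sketch. The helper name random_facts_url is hypothetical; only the parameter names amount and animal_type come from the examples above. Nothing is fetched here, since the API may no longer exist.

```python
import urllib.parse

FACTS_URL = "https://cat-fact.herokuapp.com/facts/random"


def random_facts_url(amount=1, animal_type=None):
    """Build a URL for the random facts endpoint shown above."""
    params = {}
    # Leave animal_type out entirely when it is not given,
    # as in the first example URL.
    if animal_type is not None:
        params["animal_type"] = animal_type
    params["amount"] = amount
    return FACTS_URL + "?" + urllib.parse.urlencode(params)


print(random_facts_url(2))
print(random_facts_url(1, "dog"))
```

The resulting string could then be passed to urllib.request.urlopen, and the payload decoded and parsed with json.loads.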