Lecture 20

CSV and JSON

MCS 275 Spring 2023
Emily Dumas

Lecture 20: CSV and JSON

Reminders and announcements:

  • Project 3 coming soon.

Install Pillow

To prepare for Friday's lecture, you need the Python package Pillow. Most people can install it with

python3 -m pip install pillow

Or substitute the correct interpreter name for your platform.

If you have trouble, check the install instructions and let us know if you don't find a solution there.

Reading and writing data files

CSV

The 90% correct one-line summary:

A way to store a spreadsheet in a text file.

CSV

Comma separated values. A text file format like:


    State,Capital,Population
    Kentucky,Frankfort,25527
    South Dakota,Pierre,13646

Column headings in the first row (usually).

Untyped. Up to reader to figure out string/float/int/etc.


        District,Fin-Sub,Chrgbl Fin No,PO Name,Unit Name,Property Address,County,City,ST,ZIP Code,Property Status,Ownership,FDB ID (All),AMS Locale Key (All),FDB Facility Type (All),FDB Facility Subtype (All),Building Ownership Description,Land Desc,Space Certified Indicator,Bldg Occu Date,Int Sq Ft
        Greater Boston,431120-G01,431120,BARRINGTON,MAIN OFFICE,200 MIDDLE HWY,BRISTOL,BARRINGTON,RI,02806-9998,Active,Owned,1354095,V25837,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",Yes,6/1/1974,"6,769"
        Greater Boston,432360-G01,432360,COVENTRY,MAIN OFFICE,1550 NOOSENECK HILL RD,KENT,COVENTRY,RI,02816-9998,Active,Owned,1359450,V25859,Post Office,Administrative Post Office (APO),"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,7/1/1990,"13,764"
        Greater Boston,434480-G01,434480,HARRISVILLE,MAIN OFFICE,131 HARRISVILLE MAIN ST,PROVIDENCE,HARRISVILLE,RI,02830-9998,Active,Owned,1366338,V25891,Post Office,Administrative Post Office (APO),"USPS Building, Transferred","USPS Land, Transferred",No,12/1/1951,"1,413"
        Greater Boston,436020-G01,436020,NEWPORT,MAIN OFFICE,320 THAMES ST STE 1,NEWPORT,NEWPORT,RI,02840-9998,Active,Owned,1375017,V25919,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",Yes,8/1/1917,"35,637"
        Greater Boston,436090-G02,436090,NORTH KINGSTOWN,MAIN OFFICE,7715 POST RD,WASHINGTON,NORTH KINGSTOWN,RI,02852-9998,Active,Owned,1375354,V25921,Post Office,Administrative Post Office (APO),"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",Yes,10/1/1997,"32,629"
        Greater Boston,436580-G02,436580,PASCOAG,MAIN OFFICE,35 BRIDGE WAY,PROVIDENCE,PASCOAG,RI,02859-3132,Active,Owned,1376912,V25928,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",No,6/1/1953,"3,290"
        Greater Boston,436723-G01,436723,PAWTUCKET,CUMBERLAND BR.,2055 DIAMOND HILL RD,PROVIDENCE,CUMBERLAND,RI,02864-9998,Active,Owned,1434572,V25862,Post Office,Branch,"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,3/1/1985,"9,452"
        Greater Boston,436720-G03,436720,PAWTUCKET,DARLINGTON,30 MONTICELLO RD,PROVIDENCE,PAWTUCKET,RI,02861-3810,Active,Owned,1360314,V25864,Post Office,Station,"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",Yes,,"16,137"
        Greater Boston,436720-G01,436720,PAWTUCKET,MAIN OFFICE,40 MONTGOMERY ST,PROVIDENCE,PAWTUCKET,RI,02860-9998,Active,Owned,1377003,V25929,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",Yes,9/1/1933,"49,535"
        Greater Boston,436720-G01,436720,PAWTUCKET,MAIN OFFICE,40 MONTGOMERY ST,PROVIDENCE,PAWTUCKET,RI,02860-9998,Active,Owned,1434718,V28273,Post Office,Finance Station - No Delivery,"USPS Building, Transferred","USPS Land, Transferred",Yes,9/1/1933,"49,535"
        Greater Boston,436860-G01,436860,PORTSMOUTH,MAIN OFFICE,95 CHASE RD,NEWPORT,PORTSMOUTH,RI,02871-9998,Active,Owned,1378183,V25933,Post Office,Main Post Office,"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,8/1/1990,"12,048"
        Greater Boston,437140-G07,437140,PROVIDENCE,CORLISS PK. STA & VMF,55 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-9722,Active,Owned,1434581,V25907,Post Office,Carrier Annex (ANX),Unknown,Unknown,Yes,7/1/1980,"39,677"
        Greater Boston,437140-G07,437140,PROVIDENCE,CORLISS PK. STA & VMF,55 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-9722,Active,Owned,1434685,V27719,Vehicle Maintenance,Vehicle Maintenance Facility (VMF),Unknown,Unknown,Yes,7/1/1980,"39,677"
        Greater Boston,437178-G01,437178,PROVIDENCE,EAST PROVIDENCE BR.,17 GROVE AVE,PROVIDENCE,EAST PROVIDENCE,RI,02914-4506,Active,Owned,1434641,V27459,Post Office,Finance Branch,"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,6/1/1971,"6,085"
        Greater Boston,437166-G01,437166,PROVIDENCE,JOHNSTON BRANCH,1530 ATWOOD AVE,PROVIDENCE,JOHNSTON,RI,02919-9998,Active,Owned,1434577,V25901,Post Office,Branch,"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",Yes,,"17,034"
        Greater Boston,437170-G01,437170,PROVIDENCE,OLNEYVILLE STA,100 HARTFORD AVE,PROVIDENCE,PROVIDENCE,RI,02909-9998,Active,Owned,1376101,V25927,Post Office,Station,"USPS Building, Transferred","USPS Land, Transferred",Yes,6/1/1967,"10,733"
        Greater Boston,437141-G08,437141,PROVIDENCE,P&DC,24 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-2477,Active,Owned,1434586,V25936,Post Office,Main Post Office,"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",No,1/1/1961,"384,113"
        Greater Boston,437141-G08,437141,PROVIDENCE,P&DC,24 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-2477,Active,Owned,1441076,V25935,Mail Processing,Processing and Distribution Center/Facility (PDC/PDF),"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",No,1/1/1961,"384,113"
        Greater Boston,437141-G08,437141,PROVIDENCE,P&DC,24 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-2477,Active,Owned,1444535,8938,Administrative Office,Postal Inspection Service Field Division (USPIS),"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",No,1/1/1961,"384,113"
        Greater Boston,438260-G07,438260,WAKEFIELD,MAIN OFFICE,551 KINGSTOWN RD,WASHINGTON,WAKEFIELD,RI,02879-9998,Active,Owned,1386177,V25967,Post Office,Main Post Office,"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",Yes,11/1/1997,"31,358"
        Greater Boston,438260-G01,438260,WAKEFIELD,NARRAGANSETT BR.,15 MEMORIAL SQ,WASHINGTON,NARRAGANSETT,RI,02882-3391,Active,Owned,1434584,V25916,Post Office,Finance Branch,"USPS Building, Transferred","USPS Land, Transferred",Yes,6/1/1916,"6,408"
        Greater Boston,438540-G01,438540,WARREN,MAIN OFFICE,53 CHILD ST,BRISTOL,WARREN,RI,02885-9998,Active,Owned,1386428,V25968,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",Yes,8/1/1931,"9,484"
        Greater Boston,438680-G01,438680,WARWICK,APPONAUG STATION,3205 POST RD,KENT,WARWICK,RI,02886-7157,Active,Owned,1353208,V27496,Post Office,Finance Station,"USPS Building, Transferred","USPS Land, Transferred",Yes,10/1/1941,"6,097"
        Greater Boston,438680-G03,438680,WARWICK,POST OFFICE ANNEX-STORAGE,3205 POST RD REAR,KENT,WARWICK,RI,02886-7141,Active,Owned,,,,,"USPS Building, Prev. Leased","USPS Land, Prev. Leased",Yes,10/1/1957,"10,131"
        Greater Boston,439380-G02,439380,WEST WARWICK,MAIN OFFICE,100 WASHINGTON ST,KENT,WEST WARWICK,RI,02893-9998,Active,Owned,1386937,V25977,Post Office,Main Post Office,"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,,"17,482"
        Greater Boston,439100-G01,439100,WESTERLY,HIGH STREET STATION,5 HIGH ST,WASHINGTON,WESTERLY,RI,02891-1878,Active,Owned,1366914,V25892,Post Office,Station,"USPS Building, Transferred","USPS Land, Transferred",No,6/1/1914,"8,448"
        Greater Boston,439100-G02,439100,WESTERLY,MAIN OFFICE,110 TOM HARVEY RD,WASHINGTON,WESTERLY,RI,02891-9998,Active,Owned,1387262,V25978,Post Office,Administrative Post Office (APO),"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,12/1/1988,"14,236"
        Greater Boston,439800-G02,439800,WOONSOCKET,MAIN OFFICE,127 SOCIAL ST STE 1,PROVIDENCE,WOONSOCKET,RI,02895-9998,Active,Owned,1388287,V25983,Post Office,Administrative Post Office (APO),"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",No,3/1/1976,"34,476"
        Greater Boston,439800-G02,439800,WOONSOCKET,MAIN OFFICE,127 SOCIAL ST STE 1,PROVIDENCE,WOONSOCKET,RI,02895-9998,Active,Owned,1448921,11464,Administrative Office,Office of the Inspector General Office (OIG),"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",No,3/1/1976,"34,476"
    

Source: USPS

Reading CSV


        with open("datafile.csv","r",newline="",encoding="UTF-8") as fp:
            rdr = csv.DictReader(fp)
            rownum = 1
            for row in rdr:  # reader objects are iterable (ONCE!)
                # row is a dict like {"State": "Kentucky", ...}
                print("Row",rownum)
                rownum += 1
                for colname in row:
                    print("{}: {}".format(colname,row[colname]))
    

Writing CSV


        with open("courses.csv","w",newline="",encoding="UTF-8") as fp:
            w = csv.DictWriter(fp, fieldnames=["course","instructor"])
            # Write the column headers
            w.writeheader()
            # Now write the rows of data
            w.writerow({"course": "MCS 260",
                        "instructor": "Dumas"})
            w.writerow({"course": "MCS 275",
                        "instructor": "Dumas"}
        

Writing CSV

        
            with open("courses.csv","w",newline="",encoding="UTF-8") as fp:
                w = csv.writer(fp)
                # Write the column headers
                w.writerow(["course","instructor"])
                # Now write the rows of data
                w.writerow(["MCS 260","Dumas"])
                w.writerow(["MCS 275","Dumas"])
        

JSON

The 90% correct one-line summary:

A way to store a dict in a file.

JSON

JSON stands for JavaScript object notation. It is a text-based format for typed hierarchical data.


                    {
                        "title": "Fighting robotic wasps",
                        "authors": [
                          "Paolo Cortázar"
                        ],
                        "year": 2426,
                        "tags": [
                          "nonfiction",
                          "self-help"
                        ],
                        "credits": {
                          "editor": "Fayez Okoye-Sarkis",
                          "cover design": "Teresa Duarte"
                        },
                        "checked out": true,
                        "avg star rating": 4.89
                      }
    

                    {
                        "newsFeedItemList": [
                          {
                            "title": "Illinois Ranks #2 State in the Nation for Corporate Investment ",
                            "type": "Press Release",
                            "date": "Wednesday, March 01",
                            "year": "2023",
                            "description": "CHICAGO — Site Selection Magazine – an international industry-leading business publication -- released its annual corporate expansion and relocation rankings, naming Illinois 2nd in the nation for corporate projects and Chicago the Top Metro for the 10th year in a row.",
                            "thumbnail": "https://www2.illinois.gov/IISNewsImages/rollupimages/GOV/govlogo.jpg",
                            "url": "https://www.illinois.gov/news/press-release.26121.html",
                            "altText": ""
                          },
                          {
                            "title": "State of Illinois Launches Coalition in Effort to Eliminate Viral Hepatitis in Illinois ",
                            "type": "Press Release",
                            "date": "Tuesday, February 28",
                            "year": "2023",
                            "description": "CHICAGO – The Illinois Department of Public Health (IDPH) launched a new effort today to eliminate viral hepatitis in Illinois.",
                            "thumbnail": "https://www2.illinois.gov/IISNewsImages/rollupimages/DPH/IDPHlogo.jpg",
                            "url": "https://www.illinois.gov/news/press-release.26117.html",
                            "altText": ""
                          },
                          {
                            "title": "ILLINOIS LT. GOV. JULIANA STRATTON TESTIFIES AT U.S. SENATE HEARING ON THE EQUAL RIGHTS AMENDMENT",
                            "type": "Press Release",
                            "date": "Tuesday, February 28",
                            "year": "2023",
                            "description": "WASHINGTON – The journey to equality must push forward.",
                            "thumbnail": "https://www2.illinois.gov/IISNewsImages/rollupimages/LtGov/LtGovLogo.jpg",
                            "url": "https://www.illinois.gov/news/press-release.26119.html",
                            "altText": ""
                          }
                        ]
                      }
    

Source: illinois.gov home page

JSON value types

  • string — must use double quotes.
  • number — float, int, other? Up to reader.
  • boolean — lower case names true, false.
  • null — like Python None.
  • array — like Python list. Brackets and commas.
  • object — like Python dict. Curly braces, colons, and commas. Keys must be strings.

Reading JSON


        with open("in.json","r",encoding="UTF-8") as fp:
            val = json.load(fp) # read from file
        
        # OR if you have a string
        val = json.loads(s)
    

The object returned can be hard to use if you don't have documentation for the layout of the file. But since it has keys and values, it is at least explorable.

Writing JSON


        val = { 
                "temperature": 451.3,
                "primes": [2,3,5,7,11],
                "awesome": True,
                "starter": "charmander"
              }
        with open("out.json","w",encoding="UTF-8") as fp:
            json.dump(val,fp) # save exactly one object to file
        
        # OR if you just want the JSON as a string
        s = json.dumps(val)
    

Key JSON features

  • Does not require data to be tabular.
  • Has excellent standardization and cross-language support.
  • Most HTTP APIs (e.g. data portals) return JSON.
  • Semi-readable and semi-writeable for humans.

Conversion table for Python → JSON

  • dictobject
  • list or tuplearray
  • int or floatnumber
  • boolboolean
  • Nonenull

References

Revision history

  • 2022-03-04 Last year's lecture on this topic finalized
  • 2023-03-01 Updated for 2023