Lecture 19

CSV and JSON

MCS 275 Spring 2024
Emily Dumas

View as:   Presentation   ·   PDF-exportable  ·   Printable

Lecture 19: CSV and JSON

Reminders and announcements:

  • Project 1 grading underway
  • Project 2 due Friday 11:59pm
  • Next week is "quiet" (no projects pending)
  • Project 3 announcement likely between Feb 29 and Mar 4, is due March 15

Install Pillow

In recent work, the Python image package "pillow" was recommended.

Starting on Friday it will be required. Install with

python3 -m pip install pillow

Or substitute the correct interpreter name for your platform.

If you have trouble, check the install instructions and let us know if you don't find a solution there.

Working with data in files

CSV

The 90% correct one-line summary:

A way to store a spreadsheet in a text file.

CSV

Comma separated values. A text file format like:


    State,Capital,Population
    Kentucky,Frankfort,25527
    South Dakota,Pierre,13646

Column headings in the first row (usually).

Untyped. You can write various types, but no type data is embedded in the output.


        District,Fin-Sub,Chrgbl Fin No,PO Name,Unit Name,Property Address,County,City,ST,ZIP Code,Property Status,Ownership,FDB ID (All),AMS Locale Key (All),FDB Facility Type (All),FDB Facility Subtype (All),Building Ownership Description,Land Desc,Space Certified Indicator,Bldg Occu Date,Int Sq Ft
        Greater Boston,431120-G01,431120,BARRINGTON,MAIN OFFICE,200 MIDDLE HWY,BRISTOL,BARRINGTON,RI,02806-9998,Active,Owned,1354095,V25837,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",Yes,6/1/1974,"6,769"
        Greater Boston,432360-G01,432360,COVENTRY,MAIN OFFICE,1550 NOOSENECK HILL RD,KENT,COVENTRY,RI,02816-9998,Active,Owned,1359450,V25859,Post Office,Administrative Post Office (APO),"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,7/1/1990,"13,764"
        Greater Boston,434480-G01,434480,HARRISVILLE,MAIN OFFICE,131 HARRISVILLE MAIN ST,PROVIDENCE,HARRISVILLE,RI,02830-9998,Active,Owned,1366338,V25891,Post Office,Administrative Post Office (APO),"USPS Building, Transferred","USPS Land, Transferred",No,12/1/1951,"1,413"
        Greater Boston,436020-G01,436020,NEWPORT,MAIN OFFICE,320 THAMES ST STE 1,NEWPORT,NEWPORT,RI,02840-9998,Active,Owned,1375017,V25919,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",Yes,8/1/1917,"35,637"
        Greater Boston,436090-G02,436090,NORTH KINGSTOWN,MAIN OFFICE,7715 POST RD,WASHINGTON,NORTH KINGSTOWN,RI,02852-9998,Active,Owned,1375354,V25921,Post Office,Administrative Post Office (APO),"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",Yes,10/1/1997,"32,629"
        Greater Boston,436580-G02,436580,PASCOAG,MAIN OFFICE,35 BRIDGE WAY,PROVIDENCE,PASCOAG,RI,02859-3132,Active,Owned,1376912,V25928,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",No,6/1/1953,"3,290"
        Greater Boston,436723-G01,436723,PAWTUCKET,CUMBERLAND BR.,2055 DIAMOND HILL RD,PROVIDENCE,CUMBERLAND,RI,02864-9998,Active,Owned,1434572,V25862,Post Office,Branch,"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,3/1/1985,"9,452"
        Greater Boston,436720-G03,436720,PAWTUCKET,DARLINGTON,30 MONTICELLO RD,PROVIDENCE,PAWTUCKET,RI,02861-3810,Active,Owned,1360314,V25864,Post Office,Station,"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",Yes,,"16,137"
        Greater Boston,436720-G01,436720,PAWTUCKET,MAIN OFFICE,40 MONTGOMERY ST,PROVIDENCE,PAWTUCKET,RI,02860-9998,Active,Owned,1377003,V25929,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",Yes,9/1/1933,"49,535"
        Greater Boston,436720-G01,436720,PAWTUCKET,MAIN OFFICE,40 MONTGOMERY ST,PROVIDENCE,PAWTUCKET,RI,02860-9998,Active,Owned,1434718,V28273,Post Office,Finance Station - No Delivery,"USPS Building, Transferred","USPS Land, Transferred",Yes,9/1/1933,"49,535"
        Greater Boston,436860-G01,436860,PORTSMOUTH,MAIN OFFICE,95 CHASE RD,NEWPORT,PORTSMOUTH,RI,02871-9998,Active,Owned,1378183,V25933,Post Office,Main Post Office,"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,8/1/1990,"12,048"
        Greater Boston,437140-G07,437140,PROVIDENCE,CORLISS PK. STA & VMF,55 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-9722,Active,Owned,1434581,V25907,Post Office,Carrier Annex (ANX),Unknown,Unknown,Yes,7/1/1980,"39,677"
        Greater Boston,437140-G07,437140,PROVIDENCE,CORLISS PK. STA & VMF,55 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-9722,Active,Owned,1434685,V27719,Vehicle Maintenance,Vehicle Maintenance Facility (VMF),Unknown,Unknown,Yes,7/1/1980,"39,677"
        Greater Boston,437178-G01,437178,PROVIDENCE,EAST PROVIDENCE BR.,17 GROVE AVE,PROVIDENCE,EAST PROVIDENCE,RI,02914-4506,Active,Owned,1434641,V27459,Post Office,Finance Branch,"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,6/1/1971,"6,085"
        Greater Boston,437166-G01,437166,PROVIDENCE,JOHNSTON BRANCH,1530 ATWOOD AVE,PROVIDENCE,JOHNSTON,RI,02919-9998,Active,Owned,1434577,V25901,Post Office,Branch,"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",Yes,,"17,034"
        Greater Boston,437170-G01,437170,PROVIDENCE,OLNEYVILLE STA,100 HARTFORD AVE,PROVIDENCE,PROVIDENCE,RI,02909-9998,Active,Owned,1376101,V25927,Post Office,Station,"USPS Building, Transferred","USPS Land, Transferred",Yes,6/1/1967,"10,733"
        Greater Boston,437141-G08,437141,PROVIDENCE,P&DC,24 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-2477,Active,Owned,1434586,V25936,Post Office,Main Post Office,"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",No,1/1/1961,"384,113"
        Greater Boston,437141-G08,437141,PROVIDENCE,P&DC,24 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-2477,Active,Owned,1441076,V25935,Mail Processing,Processing and Distribution Center/Facility (PDC/PDF),"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",No,1/1/1961,"384,113"
        Greater Boston,437141-G08,437141,PROVIDENCE,P&DC,24 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-2477,Active,Owned,1444535,8938,Administrative Office,Postal Inspection Service Field Division (USPIS),"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",No,1/1/1961,"384,113"
        Greater Boston,438260-G07,438260,WAKEFIELD,MAIN OFFICE,551 KINGSTOWN RD,WASHINGTON,WAKEFIELD,RI,02879-9998,Active,Owned,1386177,V25967,Post Office,Main Post Office,"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",Yes,11/1/1997,"31,358"
        Greater Boston,438260-G01,438260,WAKEFIELD,NARRAGANSETT BR.,15 MEMORIAL SQ,WASHINGTON,NARRAGANSETT,RI,02882-3391,Active,Owned,1434584,V25916,Post Office,Finance Branch,"USPS Building, Transferred","USPS Land, Transferred",Yes,6/1/1916,"6,408"
        Greater Boston,438540-G01,438540,WARREN,MAIN OFFICE,53 CHILD ST,BRISTOL,WARREN,RI,02885-9998,Active,Owned,1386428,V25968,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",Yes,8/1/1931,"9,484"
        Greater Boston,438680-G01,438680,WARWICK,APPONAUG STATION,3205 POST RD,KENT,WARWICK,RI,02886-7157,Active,Owned,1353208,V27496,Post Office,Finance Station,"USPS Building, Transferred","USPS Land, Transferred",Yes,10/1/1941,"6,097"
        Greater Boston,438680-G03,438680,WARWICK,POST OFFICE ANNEX-STORAGE,3205 POST RD REAR,KENT,WARWICK,RI,02886-7141,Active,Owned,,,,,"USPS Building, Prev. Leased","USPS Land, Prev. Leased",Yes,10/1/1957,"10,131"
        Greater Boston,439380-G02,439380,WEST WARWICK,MAIN OFFICE,100 WASHINGTON ST,KENT,WEST WARWICK,RI,02893-9998,Active,Owned,1386937,V25977,Post Office,Main Post Office,"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,,"17,482"
        Greater Boston,439100-G01,439100,WESTERLY,HIGH STREET STATION,5 HIGH ST,WASHINGTON,WESTERLY,RI,02891-1878,Active,Owned,1366914,V25892,Post Office,Station,"USPS Building, Transferred","USPS Land, Transferred",No,6/1/1914,"8,448"
        Greater Boston,439100-G02,439100,WESTERLY,MAIN OFFICE,110 TOM HARVEY RD,WASHINGTON,WESTERLY,RI,02891-9998,Active,Owned,1387262,V25978,Post Office,Administrative Post Office (APO),"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,12/1/1988,"14,236"
        Greater Boston,439800-G02,439800,WOONSOCKET,MAIN OFFICE,127 SOCIAL ST STE 1,PROVIDENCE,WOONSOCKET,RI,02895-9998,Active,Owned,1388287,V25983,Post Office,Administrative Post Office (APO),"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",No,3/1/1976,"34,476"
        Greater Boston,439800-G02,439800,WOONSOCKET,MAIN OFFICE,127 SOCIAL ST STE 1,PROVIDENCE,WOONSOCKET,RI,02895-9998,Active,Owned,1448921,11464,Administrative Office,Office of the Inspector General Office (OIG),"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",No,3/1/1976,"34,476"
    

Source: USPS

Reading CSV

Options

  • Read lines of text and split manually (not recommended)
  • csv module (basic, in standard library)
  • pandas module (advanced, heavy, requires installation)

Reading CSV


            import csv
            with open("datafile.csv","r",newline="",encoding="UTF-8") as fp:
            rdr = csv.DictReader(fp)
            for row in rdr:  # reader objects are iterable (ONCE!)
                print("Row:")
                for colname in row:
                    print("{}: {}".format(colname,row[colname]))
    

Writing CSV


        with open("courses.csv","w",newline="",encoding="UTF-8") as fp:
            w = csv.DictWriter(fp, fieldnames=["course","instructor"])
            # Write the column headers
            w.writeheader()
            # Now write the rows of data
            w.writerow({"course": "MCS 260",
                        "instructor": "Dumas"})
            w.writerow({"course": "MCS 275",
                        "instructor": "Dumas"})
        

Writing CSV

        
            with open("courses.csv","w",newline="",encoding="UTF-8") as fp:
                w = csv.writer(fp)
                # Write the column headers
                w.writerow(["course","instructor"])
                # Now write the rows of data
                w.writerow(["MCS 260","Dumas"])
                w.writerow(["MCS 275","Dumas"])
        

JSON

The 90% correct one-line summary:

A way to store a dict in a file.

JSON

JSON stands for JavaScript object notation. It is a text-based format for typed hierarchical data.


                    {
                        "title": "How to get your AI-powered toaster to like you",
                        "authors": [
                          "Amelia T. Abraham"
                        ],
                        "year": 2039,
                        "tags": [
                          "nonfiction",
                          "opinion"
                        ],
                        "credits": {
                          "editor": "Martina Liselotte",
                          "cover design": "Imelda Ágata"
                        },
                        "checked out": true,
                        "avg star rating": 4.89
                    }
    

                    {
                        "newsFeedItemList": [
                          {
                            "title": "Illinois Department of Labor, Illinois Coalition for Immigrant and Refugee Rights Partner on Labor Rights Outreach Initiative",
                            "type": "Press Release",
                            "date": "Tuesday, February 20",
                            "year": "2024",
                            "description": "CHICAGO – The Illinois Department of Labor (IDOL) and the Illinois Coalition for Immigrant and Refugee Rights (ICIRR) are partnering on an education and outreach program to raise awareness of workplace protections that apply to all workers in Illinois, with a particular focus on labor exploitation and child labor.",
                            "thumbnail": "https://www2.illinois.gov/IISNewsImages/RollupImages/DOL/IDOLLogo.jpg",
                            "url": "https://www.illinois.gov/news/press-release.29647.html",
                            "altText": ""
                          },
                          {
                            "title": "ILLINOIS STATE POLICE METROPOLITAN ENFORCEMENT GROUPS AND DRUG TASK FORCES SUCCESSES IN 2023",
                            "type": "Press Release",
                            "date": "Tuesday, February 20",
                            "year": "2024",
                            "description": "SPRINGFIELD – The Illinois State Police (ISP), working with law enforcement and community partners, continues to decrease the supply of illegal guns and drugs devastating neighborhoods across the state.",
                            "thumbnail": "https://www2.illinois.gov/IISNewsImages/rollupimages/ISP/ISPlogo.jpg",
                            "url": "https://www.illinois.gov/news/press-release.29648.html",
                            "altText": ""
                          },
                          {
                            "title": "2023 Tazewell County Final Multiplier Announced",
                            "type": "Press Release",
                            "date": "Tuesday, February 20",
                            "year": "2024",
                            "description": "SPRINGFIELD, IL, - Tazewell County has been issued a final property assessment equalization factor of 1.0000, according to David Harris, Director of the Illinois Department of Revenue.",
                            "thumbnail": "https://www2.illinois.gov/IISNewsImages/rollupimages/DOR/DORlogosm.gif",
                            "url": "https://www.illinois.gov/news/press-release.29649.html",
                            "altText": ""
                          }
                        ]
                      }
    

Source: illinois.gov home page

Fetch URLs

curl is a convenient command-line tool to fetch data from the web.

JSON value types

  • string — must use double quotes.
  • number — float, int, other? Up to reader.
  • boolean — lower case names true, false.
  • null — like Python None.
  • array — like Python list. Brackets and commas.
  • object — like Python dict. Curly braces, colons, and commas. Keys must be strings.

Reading JSON


        with open("in.json","r",encoding="UTF-8") as fp:
            val = json.load(fp) # read from file
        
        # OR if you have a string
        val = json.loads(s)
    

The object returned can be hard to use if you don't have documentation for the layout of the file. But since it has keys and values, it is at least explorable.

Writing JSON


        val = { 
                "temperature": 451.3,
                "primes": [2,3,5,7,11],
                "awesome": True,
                "starter": "charmander"
              }
        with open("out.json","w",encoding="UTF-8") as fp:
            json.dump(val,fp) # save exactly one object to file
        
        # OR if you just want the JSON as a string
        s = json.dumps(val)
    

Key JSON features

  • Does not require data to be tabular.
  • Has excellent standardization and cross-language support.
  • Most HTTP APIs (e.g. data portals) return JSON.
  • Semi-readable and semi-writeable for humans.

Conversion table for Python → JSON

  • dictobject
  • list or tuplearray
  • int or floatnumber
  • boolboolean
  • Nonenull

References

Revision history

  • 2023-03-01 Finalization of the 2023 lecture this was based on
  • 2024-02-21 Initial publication