Lecture 16

CSV

MCS 260 Fall 2021
David Dumas

Reminders

  • Read the project 1 solution and ask questions!
  • Project 2 due 6pm central time Fri Oct 8
  • Project 2 autograder opens Mon Oct 4
  • Change from worksheet 7 onwards: Problem 1 is for whole-lab discussion.
  • CSV

    CSV is a format for storing tabular data in a text file.

    Basic format: Each line contains some values, separated by commas. There is no universally accepted way to indicate types (e.g. string vs int).

    
            fullname,midterm,final,hwk_avg
            Maureen Singh,82.0,91.5,94.0
            Yousuf Shaw,78.0,89.0,96.5
        

    Often, the first line contains column headers.

    CSV vs JSON

    • JSON - for arbitrary data structures, especially hierarchies. Verbose if many dictionaries have the same keys. Typed.
      • Mostly for talking to other programs.
    • CSV - for tabular data, i.e. text representation of a spreadsheet. Untyped.
      • Mostly for interacting with spreadsheets and databases.

    Reading CSV

    csv.reader(f) returns an iterable that gives the rows one by one, as lists of values.

    Use the return value in a for loop to process the file row by row.

    Hit it with list() if you need the whole list a once (rare).

    Important: When opening to read/write CSV you need to give open() an extra argument newline="".

    Reading CSV (cont'd)

    If the CSV file has has headers, a better option is csv.DictReader(f) which yields rows as dictionaries, using column headers as keys.

    csv.DictReader does not return the header row.

    Writing CSV

    csv.writer(f) takes a file object and returns a writer object, which has a useful method:

    • .writerow(L) — Write the items in iterable L to a row in the file.

    Note: When opening to read/write CSV you need to give open() an extra argument newline="".

    Writing CSV (cont'd)

    csv.DictWriter(f,fieldnames=L) specifies an iterable L of field names, and returns a writer object that expects rows as dictionaries. Useful methods:

    • .writeheader() — Write the field names to a header row.
    • .writerow(d) — Write the values from dictionary d to a line of the output file (but only the ones corresponding to keys that are field names).

    Other features

    The CSV reader and writer functions can use a separator other than a comma, e.g. specify delimiter="\t" to read or write tab separated values (TSV).

    Some CSV files put values in quotes so that the separator character can appear in the value, e.g.

    
    fullname,occupation
    Octavia Spencer,"actor,author"
    "Bond, James Bond","spy"
    

    The csv module supports this convention.

    References

    Revision history

    • 2021-09-29 Initial publication