MCS 260 Fall 2021
Emily Dumas
A file is a named, ordered collection of data, usually in persistent storage (disks, flash, etc.).
Files are stored in a hierarchy of directories.
The OS provides functions that programs can use to access files, handling the lower-level details itself.
The OS keeps track of a file offset, the place in the file where the next read or write operation will happen. This can also be moved with an operation called seek.
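A minimal sketch of the file offset in action, using Python's file methods tell() and seek(). The scratch file name offset_demo.txt is made up for this illustration. In text mode the value returned by tell() should be treated as an opaque bookmark that is only meant to be handed back to seek().

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "offset_demo.txt")

fout = open(path, "w", encoding="UTF-8")
fout.write("abcdefghij")
fout.close()

fin = open(path, "r", encoding="UTF-8")
first = fin.read(3)      # reads "abc"; the offset moves past those characters
bookmark = fin.tell()    # remember the current offset
rest = fin.read()        # reads from the offset to the end of the file
fin.seek(bookmark)       # move the offset back to the remembered position
rest_again = fin.read()  # the same characters are read a second time
fin.close()

print(first, rest, rest_again)
```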
Two ways to access files: as text (a sequence of characters) or as binary (a sequence of bytes).
We'll only cover text format file operations for now.
An encoding is a way of turning a sequence of Unicode characters into a sequence of bytes.
UTF-8, ISO-8859-1, and CP-1252 are examples of encodings.
E.g. translating "Adiós" into bytes:
0x41 0x64 0x69 0xc3 0xb3 0x73 in UTF-8
0x41 0x64 0x69 0xf3 0x73 in ISO-8859-1
We will use UTF-8 exclusively.
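The byte sequences in this example can be checked from Python itself. This is a small sketch using the built-in str.encode() and bytes.decode() methods; the variable names are only for illustration.

```python
s = "Adi\u00f3s"  # the five-character word from the example; \u00f3 is the accented o

utf8_bytes = s.encode("UTF-8")
latin1_bytes = s.encode("ISO-8859-1")

# Show each byte in hexadecimal
print([hex(b) for b in utf8_bytes])    # six bytes: the accented letter takes two
print([hex(b) for b in latin1_bytes])  # five bytes: one per character

# Decoding with the same encoding recovers the original string
roundtrip = utf8_bytes.decode("UTF-8")
print(roundtrip == s)
```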
open(filename,[mode],[encoding=...]) opens a file and returns an object representing it.
Methods of the file object allow you to read or write.
"""Write a string to a file"""
fout = open("out.txt","w",encoding="UTF-8") # w means write
fout.write("Hello world")
fout.close() # Done with this file (OS does cleanup)
fin = open("out.txt","r",encoding="UTF-8") # r means read only (default)
s = fin.read() # Get entire file contents
fin.close()
print("Contents of file:",s)
The files you read/write this way can have any name you like; they don't need to end in ".txt".
Often you want to process one line at a time. File objects are iterable, giving the lines. E.g. nl.py
"""Number the lines of a file specified on command line"""
import sys
fin = open(sys.argv[1],"r",encoding="UTF-8")
n = 0
for line in fin:
    n = n+1
    print(n,line,end="") # line usually has \n at the end
fin.close()
Sample output:
$ python nl.py nl.py
1 """Number the lines of a file specified on command line"""
2 import sys
...
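To see why nl.py passes end="" to print(), it may help to look at the raw strings the loop produces. The sketch below writes a small scratch file (the name lines_demo.txt is made up) and collects each line exactly as iteration returns it.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")

fout = open(path, "w", encoding="UTF-8")
fout.write("first\nsecond\nthird")  # the last line has no newline after it
fout.close()

fin = open(path, "r", encoding="UTF-8")
lines = []
for line in fin:
    lines.append(line)  # keep the line exactly as the file object gives it
fin.close()

print(lines)
```

Every line except possibly the last one ends in "\n", which is why the comment in nl.py says "usually".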
Important: file.write() is not like print(). It doesn't add a newline, and it accepts only a single string, not multiple arguments.
pet_type = "ducks"
print("I have",21,pet_type) # OK
fout.write("I have",21,pet_type) # FAILS
You must prepare a single string to write. The usual way is to use str.format():
pet_type = "ducks"
fout.write("I have {} {}\n".format(21,pet_type)) # ok
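A self-contained sketch of both behaviors: the failing multi-argument call and the working str.format() version. The file name pets_demo.txt and the list pets are made up for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "pets_demo.txt")

pets = [(21, "ducks"), (3, "cats")]  # made-up sample data

fout = open(path, "w", encoding="UTF-8")

# Passing several arguments to write() raises TypeError
failed = False
try:
    fout.write("I have", 21, "ducks")
except TypeError:
    failed = True

# Build one string per line, including the newline, and write that
for count, pet_type in pets:
    fout.write("I have {} {}\n".format(count, pet_type))
fout.close()

fin = open(path, "r", encoding="UTF-8")
contents = fin.read()
fin.close()
print(contents, end="")
```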
str.format() has many features to create a string based on a template and some values. In the string, placeholders ({} or {...}) are replaced by arguments of str.format().
>>> "{1} taught {0}".format("MCS 260","Dumas") # give indices
'Dumas taught MCS 260'
>>> for x in range(98,101):
...     print("{:4}".format(x)) # specify width
...
  98
  99
 100
>>> "{:04}".format(42) # pad to width with zeros
'0042'
>>> "{:8.2f}".format(42) # f = float, width 8, 2 digits after .
'   42.00'
>>> "{:8x}".format(42) # x = hex int, width 8
'      2a'
>>> "{:8d}".format(42) # d = decimal int, width 8
'      42'
>>> "{:.2f}".format(13+2j) # f allows complex; no total width
'13.00+2.00j'
The general placeholder syntax is {w:ot} where w specifies which argument, o is a set of options, and t is the type.
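A short sketch combining all three parts of a placeholder in one template. The values and variable names are arbitrary and chosen only for illustration.

```python
# {0:8.2f} -> argument 0, options "8.2" (width 8, 2 digits after .), type f
# {1:04d}  -> argument 1, options "04" (width 4, pad with zeros), type d
a = "{0:8.2f}|{1:04d}".format(3.14159, 7)
print(repr(a))

# Indices let the arguments appear in any order in the output
b = "{1:4d}{0:4d}".format(1, 2)
print(repr(b))
```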