MCS 260 Fall 2021
Emily Dumas
Often you can solve a problem either with recursion or with loops (an iterative solution). Why use recursion?
Pros:
Unclear:
Cons:
Recall that a backslash \ in a string starts an escape sequence in Python.
You can disable escape sequences by putting the letter r immediately before the quotation mark(s). This is known as a raw string. In a raw string, a single \ represents the \ character.
However, a raw string cannot end with a single \ (more precisely, with an odd number of backslashes).
>>> print("C:\\Users\\ddumas\n(home)")
C:\Users\ddumas
(home)
>>> print(r"C:\\Users\\ddumas\n(home)")
C:\\Users\\ddumas\n(home)
>>> print(r"C:\Users\ddumas")
C:\Users\ddumas
>>>
Today we'll learn about the module re in Python, which supports a text searching language known as regular expressions or regexes.
Some of its key functions include:
Regexes are a mini programming language for specifying patterns of text.
Dialects of regex are supported in many programming languages. We'll cover the Python dialect.
Simplest usage: Find and replace a substring.
import re
s = "Avocado is usually considered a vegetable."
print(re.sub("vegetable","fruit",s))
re.sub(pattern, replacement, string)
The first argument of re.sub is a pattern.
Unless it contains characters with special meaning in a regex pattern, the pattern just matches substrings equal to the pattern.
"vegetable" matches the string "vegetable"
"foo" matches the string "foo"
. - matches any character except newline
\s - matches any whitespace character
\d - matches a decimal digit
\w - matches a "word character" (a-z, A-Z, 0-9, _)
+ - previous item must repeat 1 or more times
* - previous item must repeat 0 or more times
? - previous item must repeat 0 or 1 times
{n} - previous item must appear n times
Replace any price in whole dollars (written like $2 or $1999) with the string -PRICE-.
Note: $ is a special character. To match a dollar sign, put \$ in the pattern.
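One possible solution to the price exercise above is sketched here. The sample sentence is made up for illustration, and the pattern is written as a raw string so that the backslashes reach the regex engine unchanged.

```python
import re

# Hypothetical sample text, not from the lecture
s = "The lamp costs $25 and the sofa costs $1999."

# \$ matches a literal dollar sign; \d+ matches one or more decimal digits
result = re.sub(r"\$\d+", "-PRICE-", s)
print(result)  # The lamp costs -PRICE- and the sofa costs -PRICE-.
```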
re.match(pattern,string) - does string begin with a match to pattern? Return a match object or None.
re.search(pattern,string) - does string contain a match to the pattern? Return a match object or None.
re.finditer(pattern,string) - return an iterable yielding all the non-overlapping matches as match objects.
Most regex functions return match objects that contain info about a part of the string matching the expression.
A match object has a method .group() that returns the full text of the match.
The methods .start() and .end() return the indices where the match begins and ends in the string. As with slices, .end() is the index just past the last matched character.
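The following sketch contrasts re.match, re.search, and re.finditer on one string and reads the results through the match object methods. The sample text is an assumption chosen for illustration.

```python
import re

text = "Order 66 shipped on day 12"  # hypothetical sample text
pattern = r"\d+"                      # one or more decimal digits

# re.match only succeeds if a match begins at index 0
at_start = re.match(pattern, text)
print(at_start)  # None, because the text starts with a letter

# re.search finds the first match anywhere in the string
first = re.search(pattern, text)
print(first.group(), first.start(), first.end())  # 66 6 8

# re.finditer yields every non-overlapping match as a match object
found = [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]
print(found)  # [('66', 6, 8), ('12', 24, 26)]
```

Slicing the string with the reported indices, as in text[first.start():first.end()], gives back the same text as first.group().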
A part of a pattern in parentheses is a group. A group is treated as a unit for operators like +, *, ?.
e.g. pattern (ha)+ means one or more repetitions of ha.
It matches ha or haha or hahaha but does not match Haha or h or hah.
In contrast, ha+ means the letter h followed by one or more repetitions of a, e.g. haaaaaaa
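The difference between (ha)+ and ha+ can be checked with a short script. This sketch assumes "matches" means the whole string fits the pattern, so it uses re.fullmatch (not covered above); with re.search, (ha)+ would still find the ha inside Haha.

```python
import re

candidates = ["ha", "haha", "hahaha", "Haha", "h", "hah", "haaaaaaa"]
results = {}
for s in candidates:
    # fullmatch requires the entire string to match the pattern
    grouped = re.fullmatch(r"(ha)+", s) is not None   # repeat the unit "ha"
    ungrouped = re.fullmatch(r"ha+", s) is not None   # "h", then repeated "a"
    results[s] = (grouped, ungrouped)
    print(s, grouped, ungrouped)
```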
Matched groups are available as .group(1), .group(2), etc., with the 1-based number referring to the order of left parentheses in the pattern.
Group 0 always refers to the text matched by the entire pattern.
e.g. pattern My name is (\w+). will capture the name (not containing spaces!) in group 1.
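A small sketch of group capture, using a made-up sentence and the pattern My name is (\w+) without anything after the group:

```python
import re

# Hypothetical sample sentence
m = re.search(r"My name is (\w+)", "Hello! My name is Ada Lovelace.")

print(m.group(0))  # My name is Ada   (the entire match)
print(m.group(1))  # Ada              (only the part matched by the group)
```

Because \w does not match a space, the capture stops after the first word of the name.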
Find all of the phone numbers in a string that are written in the format 319-555-1012, and split each one into area code (e.g. 319), exchange (e.g. 555), and line number (e.g. 1012).
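One way to approach the phone number exercise is sketched below, combining re.finditer with three groups. The sample text and variable names are assumptions for illustration. As a simplification, the pattern does not check what surrounds the number, so it would also match inside a longer run of digits.

```python
import re

# Hypothetical sample text
text = "Call 319-555-1012 or 312-555-0199; fax: 800-555-0100."

# Three groups: 3 digits, 3 digits, 4 digits, separated by hyphens
pattern = r"(\d{3})-(\d{3})-(\d{4})"

numbers = []
for m in re.finditer(pattern, text):
    area, exchange, line = m.group(1), m.group(2), m.group(3)
    numbers.append((area, exchange, line))
    print(m.group(0), "->", area, exchange, line)
```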
re module is good as a reference.
print are lacking parentheses. Otherwise, the code should work.