MCS 260 Fall 2020
Emily Dumas
Often can solve a problem with recursion or with loops (an iterative solution). Why use recursion?
Pros:
Unclear:
Cons:
Today we'll learn about the module re in Python, which supports a text searching language known as regular expressions or regexes.
Some of its key functions include:
Regexes are a mini programming language for specifying patterns of text. Dialects of regex are supported in many programming languages. We'll cover the Python dialect.
Simplest usage: Find and replace a substring.
import re
s = "Avocado is usually considered a vegetable."
print(re.sub("vegetable","fruit",s))
re.sub(pattern, replacement, string)
The first argument of re.sub is a pattern.
Unless it contains characters with special meaning in a regex pattern, the pattern just matches substrings equal to the pattern.
"vegetable" matches the string "vegetable""foo" matches the string "foo"Recall that backslash \ in a string starts an escape sequence in Python, and \\ represents a single backslash character in the string. If your string contains a lot of backslashes, you may want to disable escape sequences.
You can do so by putting the letter r immediately before the quotation mark(s). This is known as a raw string. In a raw string, a single \ represents the \ character.
. — matches any character except newline\s — matches any whitespace character\d — matches a decimal digit+ — previous item must repeat 1 or more times* — previous item must repeat 0 or more times? — previous item must repeat 0 or 1 times{n} — previous item must appear n timesReplace any price in whole dollars (written like $2 or $1999) with the string -PRICE-.
Note: $ is a special character. To match a dollar sign, use \$.
What if you don't want to replace a regex, just find it?
re.match(pattern,string) — does string begin with a match to pattern? Return a match object or None.re.search(pattern,string) — does string contain a match to the pattern? Return a match object or None.re.findall(pattern,string) — return a list of all non-overlapping matches as strings.If a match is found, then the match object has a method .group() that returns the full text of the match.
.start() and .end() return the indices where the match begins and ends in the string.
A part of a pattern in parentheses is a group. A group is treated as a unit for operators like +,*,?.
e.g. pattern (ha)+ matches ha or haha or hahaha but does not match Haha or h or hah.
Matched groups are available from the match object using .group(1), .group(2), etc..
Find all of the phone numbers in a string that are written in the format 319-555-1012, and split each one into area code (e.g. 319), exchange (e.g. 555), and line number (e.g. 1012).
print are lacking parentheses. Otherwise, the code should work.re module is good as a reference, not ideal to learn from.