Python Regular Expressions: Easy text processing
Regular Expression
A way of identifying certain string patterns.
Formally, a RE is:
  a letter or lambda (the empty string)
  RE1 RE2 (concatenation of two REs)
  (RE or RE)
  (RE)*
Why do you think they’re called Regular Expressions?
Python regex
Use the re module: import re
The special characters:
  . ^ $ * + ? { } [ ] \ | ( )
We'll learn them one at a time…
Character classes
  [abc] means a or b or c
  [a-c] is the same thing
  [a-z] = any lowercase letter
  [^579] = any character except 5, 7, or 9
For strings, use |: Shannon|Duvall
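The character-class rules above can be checked with a short sketch. The helper name matches_whole is hypothetical, and re.fullmatch is used here (not in the slides) so that each pattern has to cover the entire test string.

```python
import re

def matches_whole(pattern, text):
    # Hypothetical helper: True when the pattern matches all of text.
    return re.fullmatch(pattern, text) is not None

# [abc] and [a-c] accept the same single characters.
print(matches_whole(r"[abc]", "b"))    # True
print(matches_whole(r"[a-c]", "b"))    # True
print(matches_whole(r"[a-c]", "d"))    # False

# [^579] accepts any single character other than 5, 7, or 9.
print(matches_whole(r"[^579]", "4"))   # True
print(matches_whole(r"[^579]", "7"))   # False

# | chooses between whole strings.
print(matches_whole(r"Shannon|Duvall", "Duvall"))  # True
print(matches_whole(r"Shannon|Duvall", "Lynn"))    # False
```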
Metacharacters
  \d  any digit, [0-9]
  \D  any non-digit, [^0-9]
  \s  any whitespace character (tabs, returns, and so forth)
  \S  any non-whitespace character
  \w  any alphanumeric character (or underscore)
  \W  any non-alphanumeric character
  \b  any word boundary
  .   anything except newline
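A small sketch of what each metacharacter picks out. The sample text is made up for illustration, and re.findall (covered later in these slides) is used only to list the matches.

```python
import re

text = "Room 42,\tfloor 7"   # illustrative sample, not from the slides

digits = re.findall(r"\d", text)       # ['4', '2', '7']
spaces = re.findall(r"\s", text)       # [' ', '\t', ' ']
words = re.findall(r"\w+", text)       # ['Room', '42', 'floor', '7']
non_words = re.findall(r"\W", text)    # [' ', ',', '\t', ' ']

# \b matches only at the edge of a word, so "cat" inside
# "concatenate" is not found.
whole_word = re.search(r"\bcat\b", "concatenate")   # None

# . matches any character except the newline.
dots = re.findall(r".", "a\nb")        # ['a', 'b']

print(digits, spaces, words, non_words, whole_word, dots)
```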
Repeat
  * means 0 or more: ma*d matches md, mad, and maaaaad
  + means 1 or more: ma+d matches mad and maaaaad but not md
  ? means 0 or 1: ma?d matches md and mad only
  {x,y} means between x and y repetitions: ma{1,3}d matches mad, maad, and maaad
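One way to confirm the repetition rules is to run each pattern over the same candidate strings. This is a sketch: the names candidates, patterns and results are illustrative, and re.fullmatch is an assumption used so that the whole string must match.

```python
import re

candidates = ["md", "mad", "maad", "maaad", "maaaaad"]
patterns = [r"ma*d", r"ma+d", r"ma?d", r"ma{1,3}d"]

# For each pattern, keep the candidates it matches in full.
results = {
    p: [s for s in candidates if re.fullmatch(p, s)]
    for p in patterns
}

for pattern, matched in results.items():
    print(pattern, matched)
# ma*d      ['md', 'mad', 'maad', 'maaad', 'maaaaad']
# ma+d      ['mad', 'maad', 'maaad', 'maaaaad']
# ma?d      ['md', 'mad']
# ma{1,3}d  ['mad', 'maad', 'maaad']
```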
Repeating groups
  [ab]* matches a, b, bbb
  (ab)* matches ab, abab, ababab
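The difference between repeating a character class and repeating a group shows up when the same strings are tried against both. A minimal sketch, with the helper name full being hypothetical:

```python
import re

def full(pattern, text):
    # Hypothetical helper: True when the pattern matches all of text.
    return re.fullmatch(pattern, text) is not None

# [ab]* repeats "a or b", so any mix of the two letters is fine.
print(full(r"[ab]*", "bbb"))    # True
print(full(r"[ab]*", "abba"))   # True

# (ab)* repeats the two-letter unit "ab" as a whole.
print(full(r"(ab)*", "abab"))   # True
print(full(r"(ab)*", "abba"))   # False
print(full(r"(ab)*", "bbb"))    # False

# Because * allows zero repetitions, both also match the empty string.
print(full(r"[ab]*", ""), full(r"(ab)*", ""))   # True True
```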
More metacharacters
  ^ outside of a character class, means the beginning of a line
  $ matches the end of a line
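A sketch of the anchors on a two-line string. One detail worth knowing in Python: by default ^ and $ anchor to the whole string, and they apply to every line only when the re.MULTILINE flag is passed. The sample text is made up for illustration.

```python
import re

text = "one two\nthree four"   # illustrative two-line sample

# Default: ^ is the start of the string, $ is the end of the string.
first_default = re.findall(r"^\w+", text)                # ['one']
last_default = re.findall(r"\w+$", text)                 # ['four']

# With re.MULTILINE, ^ and $ also match at each line break.
first_multi = re.findall(r"^\w+", text, re.MULTILINE)    # ['one', 'three']
last_multi = re.findall(r"\w+$", text, re.MULTILINE)     # ['two', 'four']

print(first_default, last_default, first_multi, last_multi)
```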
What can I do with them?
Search
  re.search(pattern, string, <flags>)
  pattern is the regex
  string is what you are searching in
  flags are special modifiers, optional
This returns either None (false) or a Match object.
When specifying the regex, use r to denote a "raw string".
Search Example
  import re
  line = "Cats are smarter than dogs"
  if re.search(r'.*are.*than.*', line):
      print("yes")
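The sketch below looks a little closer at what re.search hands back, at an optional flag, and at why the r prefix matters. The variable names and the choice of re.IGNORECASE as the example flag are illustrative.

```python
import re

line = "Cats are smarter than dogs"

hit = re.search(r"smarter", line)    # a Match object
miss = re.search(r"faster", line)    # None

if hit:
    # The Match object records what matched and where.
    print(hit.group(0), hit.start(), hit.end())   # smarter 9 16
print(miss)                                       # None

# A flag changes how the pattern is applied; here case is ignored.
insensitive = re.search(r"cats", line, re.IGNORECASE)
print(insensitive.group(0))                       # Cats

# Without r, "\b" is a single backspace character; with r it is the
# two characters backslash and b, which the regex engine reads as a
# word boundary.
print(len("\b"), len(r"\b"))                      # 1 2
```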
Groups
Using () in a regex creates a group that can be referenced later.
The string that matches the entire regex is said to be group 0.
Other groups are numbered, starting at 1.
Grouping example
  import re
  m = re.search(r'(\w+) (\w+)', "Shannon Lynn Duvall")
  m.group(0)   # 'Shannon Lynn'
  m.group(1)   # 'Shannon'
  m.group(2)   # 'Lynn'
Grouping Example
Would it match?
  m = re.search(r'(\w+) \1', "Shannon Shannon")
Space taken out:
  m = re.search(r'(\w+)\1', "Shannon Shannon")
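Running both versions shows how the backreference \1 behaves. This sketch only prints what the two searches find; the variable names are illustrative.

```python
import re

text = "Shannon Shannon"

# \1 must repeat exactly what group 1 captured, after one space.
with_space = re.search(r"(\w+) \1", text)
print(with_space.group(0))   # Shannon Shannon
print(with_space.group(1))   # Shannon

# With the space removed, the repeat has to follow immediately.
# The whole word cannot do that, but a single "n" can: the double
# n in "Shannon" is a one-letter group followed by itself.
no_space = re.search(r"(\w+)\1", text)
print(no_space.group(0))     # nn
print(no_space.group(1))     # n
print(no_space.span())       # (3, 5)
```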
Nested groups
Group numbers go from the outside in. Count the opening parentheses from left to right.
  m = re.search(r'(a(b)c)d', "abcd")
  m.group(0)   # 'abcd'
  m.group(1)   # 'abc'
  m.group(2)   # 'b'
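The counting rule can be tried on a slightly larger pattern. This is an illustrative sketch reusing the name from the earlier grouping slide; m.groups() returns groups 1 and up as a tuple.

```python
import re

# Opening parentheses, left to right:
#   1: ((\w+) (\w+))   2: first (\w+)   3: second (\w+)   4: last (\w+)
m = re.search(r"((\w+) (\w+)) (\w+)", "Shannon Lynn Duvall")

print(m.group(0))   # Shannon Lynn Duvall
print(m.group(1))   # Shannon Lynn
print(m.group(2))   # Shannon
print(m.group(3))   # Lynn
print(m.group(4))   # Duvall
print(m.groups())   # ('Shannon Lynn', 'Shannon', 'Lynn', 'Duvall')
```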
sub: search and replace
  re.sub(regex, putIn, string, flags=<flags>)
  (pass flags by keyword: the fourth positional argument of re.sub is count)
  phone = "1-800-555-9090"
  newPhone = re.sub(r'\D', "", phone)
What is newPhone?
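Running the substitution answers the question. The second call is an extra illustration, not from the slides: the replacement string can refer back to groups with \1 and \2.

```python
import re

phone = "1-800-555-9090"

# Every non-digit is replaced with the empty string.
new_phone = re.sub(r"\D", "", phone)
print(new_phone)   # 18005559090

# Groups captured by the pattern can be reused in the replacement.
dotted = re.sub(r"(\d{3})-(\d{4})$", r"\1.\2", phone)
print(dotted)      # 1-800-555.9090
```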
findall
Search for all matches and return them as a list.
  song = "12 drummers drumming, 11 pipers piping, 10 lords a leaping"
  nums = re.findall(r'\d+', song)
nums is now ['12', '11', '10']
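When the pattern contains groups, findall returns the groups instead of the whole match: a list of strings for one group, a list of tuples for several. A short sketch on the same song (the name pairs is illustrative):

```python
import re

song = "12 drummers drumming, 11 pipers piping, 10 lords a leaping"

# No groups: each list item is the whole match.
nums = re.findall(r"\d+", song)
print(nums)    # ['12', '11', '10']

# Two groups: each list item is a (group 1, group 2) tuple.
pairs = re.findall(r"(\d+) (\w+)", song)
print(pairs)   # [('12', 'drummers'), ('11', 'pipers'), ('10', 'lords')]
```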
split
Split a string, using the regex matches as the delimiters.
  verses = re.split(r'\d+', song)
verses is:
  ['', ' drummers drumming, ', ' pipers piping, ', ' lords a leaping']
split with groups
Sometimes you want the delimiter to show up in the list.
Use a group. The group will be returned in the list.
  verses = re.split(r'(\d+)', song)
verses is:
  ['', '12', ' drummers drumming, ', '11', ' pipers piping, ', '10', ' lords a leaping']
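Since the grouped split alternates text and delimiters, the two can be paired back up with slicing. This is one possible sketch; the names parts, counts, texts and gifts are illustrative.

```python
import re

song = "12 drummers drumming, 11 pipers piping, 10 lords a leaping"

parts = re.split(r"(\d+)", song)
# parts[0] is the (empty) text before the first number; after that
# the list alternates delimiter, text, delimiter, text, ...
counts = parts[1::2]   # the captured numbers
texts = parts[2::2]    # the text following each number

# Pair each number with its verse, trimming spaces and commas.
gifts = [(int(n), t.strip(" ,")) for n, t in zip(counts, texts)]
print(gifts)
# [(12, 'drummers drumming'), (11, 'pipers piping'), (10, 'lords a leaping')]
```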
Examples
You have a string that represents a poker hand:
  a, k, q, j for ace, king, queen, jack
  1-9 for numbers 1-9
  0 for 10
How would you:
  Make sure a string is a valid hand?
  Check for a pair of sevens?
  Check for any pair?
  Check for 3 of a kind?
  Check for a full house?
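One possible set of answers, offered as a sketch and not as the intended solution. It assumes a hand is exactly five characters with no suits, and it reads "pair" and "3 of a kind" as "at least two" and "at least three" of a rank. All function names are hypothetical, and the full-house check sorts the hand first (a plain-Python step, not a regex feature) so that equal cards sit next to each other.

```python
import re

def is_valid_hand(hand):
    # Exactly five characters, each one a legal card symbol.
    return re.fullmatch(r"[akqj0-9]{5}", hand) is not None

def has_pair_of_sevens(hand):
    # Two sevens with anything (or nothing) between them.
    return re.search(r"7.*7", hand) is not None

def has_any_pair(hand):
    # Capture one card, then find the same card again later.
    return re.search(r"(.).*\1", hand) is not None

def has_three_of_a_kind(hand):
    # The captured card must appear two more times.
    return re.search(r"(.).*\1.*\1", hand) is not None

def is_full_house(hand):
    # After sorting, a full house reads XXXYY or XXYYY.
    ordered = "".join(sorted(hand))
    shape = re.fullmatch(r"(.)\1\1(.)\2|(.)\3(.)\4\4", ordered)
    # The set check rules out five identical characters.
    return shape is not None and len(set(ordered)) == 2

print(is_valid_hand("a7k70"), is_valid_hand("a7k7x"))   # True False
print(has_pair_of_sevens("a7k70"))                      # True
print(has_any_pair("akqj0"))                            # False
print(has_three_of_a_kind("7a7k7"))                     # True
print(is_full_house("a7a7a"), is_full_house("77aak"))   # True False
```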