python regular expressions easy text processing. regular expression a way of identifying certain...

Python Regular Expressions: Easy text processing

Regular Expression
A way of identifying certain string patterns. Formally, a RE is:

- a letter, or lambda (the empty string)
- RE1 RE2 (the concatenation of two REs)
- (RE or RE) (matches either RE)
- (RE)* (zero or more repetitions of an RE)
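The formal constructs map directly onto Python syntax, where "or" is written |. A minimal sketch: `matches` is a hypothetical helper of our own, built on re.fullmatch so that the whole string, not just part of it, must match the pattern.

```python
import re

def matches(pattern, text):
    """Return True if the whole of text matches pattern (our own helper)."""
    return re.fullmatch(pattern, text) is not None

# lambda: the empty pattern matches only the empty string
print(matches(r"", ""))             # True
# RE1 RE2: concatenation, "a" followed by "b"
print(matches(r"ab", "ab"))         # True
# (RE or RE): alternation, either "a" or "b"
print(matches(r"a|b", "b"))         # True
# (RE)*: zero or more copies of the group
print(matches(r"(ab)*", ""))        # True
print(matches(r"(ab)*", "ababab"))  # True
print(matches(r"(ab)*", "aba"))     # False
```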

Why do you think they’re called Regular Expressions?

Python regex
Use the re module:
    import re
The special characters:
    . ^ $ * + ? { } [ ] \ | ( )
We’ll learn them one at a time…
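To match one of these characters literally, escape it with a backslash. A short sketch with made-up sample strings; re.escape is the standard re function that inserts the backslashes for you.

```python
import re

# Unescaped, "." matches any character, so "4.50" also matches "4x50".
loose = re.search(r"4.50", "4x50")
# Escaped, "\." matches only a literal dot and "\$" only a dollar sign.
strict = re.search(r"\$4\.50", "4x50")
found = re.search(r"\$4\.50", "Total: $4.50")

print(loose is not None)   # True
print(strict is None)      # True
print(found.group())       # $4.50

# re.escape backslash-escapes the special characters in a literal string.
print(re.escape("$4.50"))  # \$4\.50
```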

Character classes
[abc] means a or b or c
[a-c] is the same thing
[a-z] means any lowercase letter
[^579] means any character except 5, 7, or 9
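Each class matches exactly one character. This sketch borrows re.findall (covered later) to list which characters of a made-up sample string each class accepts.

```python
import re

sample = "a1b2c3d5z7q9"

abc = re.findall(r"[abc]", sample)       # ['a', 'b', 'c']
a_to_c = re.findall(r"[a-c]", sample)    # ['a', 'b', 'c']
lower = re.findall(r"[a-z]", sample)     # ['a', 'b', 'c', 'd', 'z', 'q']
not_579 = re.findall(r"[^579]", sample)  # ['a', '1', 'b', '2', 'c', '3', 'd', 'z', 'q']

print(abc, a_to_c, lower, not_579)
```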

For strings, use |: Shannon|Duvall matches either Shannon or Duvall
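| has the lowest precedence of the operators, so Shannon|Duvall means the whole word Shannon or the whole word Duvall. A small sketch with sample names chosen for illustration:

```python
import re

# Alternation: the pattern accepts either whole word.
names = ["Shannon", "Duvall", "Lynn"]
accepted = [n for n in names if re.fullmatch(r"Shannon|Duvall", n)]
print(accepted)        # ['Shannon', 'Duvall']

# re.search returns the leftmost match in a longer string.
first = re.search(r"Shannon|Duvall", "Lynn Duvall Shannon")
print(first.group())   # Duvall
```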

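Metacharacters
\d any digit, [0-9]
\D any non-digit, [^0-9]
\s any whitespace character (space, tab, newline, carriage return, and so forth)
\S any non-whitespace character
\w any alphanumeric character or underscore
\W any character not matched by \w
\b a word boundary (matches a position, not a character)
. anything except a newline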
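A sketch of the metacharacters on a made-up line of text, again using re.findall to collect every match. The \w+ pattern uses +, which is introduced on the next slide.

```python
import re

line = "Room 42, floor 3\tnext_door"

digits = re.findall(r"\d", line)   # ['4', '2', '3']
spaces = re.findall(r"\s", line)   # [' ', ' ', ' ', '\t']
# \w+ (one or more word characters) pulls out words;
# the underscore counts as a word character.
words = re.findall(r"\w+", line)   # ['Room', '42', 'floor', '3', 'next_door']
# \b makes "floor" match only as a whole word, not inside "floors".
whole = re.findall(r"\bfloor\b", "floor floors")  # ['floor']
# . matches any character except a newline.
dots = re.findall(r"a.b", "a-b a\nb")             # ['a-b']

print(digits, spaces, words, whole, dots)
```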

Repeat
* means 0 or more
    ma*d matches md, mad, and maaaaad
+ means 1 or more
    ma+d matches mad and maaaaad but not md
? means 0 or 1
    ma?d matches md and mad only
{x,y} means between x and y repetitions
    ma{1,3}d matches mad, maad, and maaad
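A sketch that checks each quantifier against the words from the examples, using re.fullmatch so a word counts only if the pattern covers all of it.

```python
import re

words = ["md", "mad", "maad", "maaad", "maaaaad"]
patterns = [r"ma*d", r"ma+d", r"ma?d", r"ma{1,3}d"]

# For each pattern, keep the words it matches in full.
results = {p: [w for w in words if re.fullmatch(p, w)] for p in patterns}
for p, hits in results.items():
    print(p, hits)

# Expected contents of results:
#   ma*d      ['md', 'mad', 'maad', 'maaad', 'maaaaad']
#   ma+d      ['mad', 'maad', 'maaad', 'maaaaad']
#   ma?d      ['md', 'mad']
#   ma{1,3}d  ['mad', 'maad', 'maaad']
```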

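Repeating groups
[ab]* matches any run of a's and b's, such as a, b, bbb, or abba
(ab)* repeats the whole group ab, matching ab, abab, ababab
Both also match the empty string (zero repetitions).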
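The difference shows up on a string like abba. A quick sketch, again with re.fullmatch:

```python
import re

# A class under * repeats any mix of its characters...
class_star = bool(re.fullmatch(r"[ab]*", "abba"))    # True
# ...while a group under * repeats the whole group as a unit.
group_star = bool(re.fullmatch(r"(ab)*", "abba"))    # False
group_ok = bool(re.fullmatch(r"(ab)*", "ababab"))    # True

print(class_star, group_star, group_ok)
```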

More metacharacters
^ outside of a character class means the beginning of the string (or of each line, with the re.MULTILINE flag)
$ matches the end of the string (or of each line, with re.MULTILINE)
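A sketch on a made-up three-line string showing how re.MULTILINE changes what ^ and $ anchor to:

```python
import re

text = "cat\ndog\ncatfish"

start_plain = re.findall(r"^cat", text)                # ['cat']
start_multi = re.findall(r"^cat", text, re.MULTILINE)  # ['cat', 'cat']
end_plain = re.findall(r"\w+$", text)                  # ['catfish']
end_multi = re.findall(r"\w+$", text, re.MULTILINE)    # ['cat', 'dog', 'catfish']

print(start_plain, start_multi, end_plain, end_multi)
```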

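What can I do with them?
Search
    re.search(pattern, string, <flags>)
- pattern is the regex
- string is what you are searching in
- flags are special modifiers, and are optional
This returns either None (false) or a Match object (true) describing the first match.
When specifying the regex, put r in front of the string literal to denote a “raw string”, so that backslashes are passed to the regex engine unchanged.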
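Why the r matters: in an ordinary Python string literal, \b is the backspace character, so the regex engine never sees a word-boundary escape. A small sketch:

```python
import re

# In an ordinary string literal, "\b" is the backspace character (\x08),
# so this pattern looks for backspaces, not word boundaries.
plain = re.search("\bcat\b", "a cat sat")
# With r"...", the backslash is kept and \b reaches re as a word boundary.
raw = re.search(r"\bcat\b", "a cat sat")

print(plain)                  # None
print(raw.group())            # cat
print(raw.span())             # (2, 5)
print(len("\b"), len(r"\b"))  # 1 2
```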

Search Example
    import re
    line = "Cats are smarter than dogs"
    if re.search(r".*are.*than.*", line):
        print("yes")
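A Match object records what matched and where. This sketch searches the same line, then uses the re.IGNORECASE flag to show a flag changing the result:

```python
import re

line = "Cats are smarter than dogs"

m = re.search(r"smarter", line)
if m:                            # a Match object is truthy; None is falsy
    print(m.group())             # smarter
    print(m.start(), m.end())    # 9 16

# Flags modify matching; re.IGNORECASE makes the search case-insensitive.
print(re.search(r"cats", line) is None)                 # True
print(re.search(r"cats", line, re.IGNORECASE).group())  # Cats
```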

Groups
Using () in a regex creates a group that can be referenced later. The string that matches the entire regex is said to be group 0. Other groups are numbered, starting at 1.

Grouping example
    >>> import re
    >>> m = re.search(r'(\w+) (\w+)', "Shannon Lynn Duvall")
    >>> m.group(0)
    'Shannon Lynn'
    >>> m.group(1)
    'Shannon'
    >>> m.group(2)
    'Lynn'

Grouping Example
In a pattern, \1 refers back to whatever text group 1 matched. Would it match?
    m = re.search(r'(\w+) \1', "Shannon Shannon")
Space taken out:
    m = re.search(r'(\w+)\1', "Shannon Shannon")
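Running both searches answers the question. With the space, \1 must repeat the first word, so the whole string matches. Without the space, the engine looks for any run of word characters immediately followed by itself, and the first one it finds is the double n inside Shannon:

```python
import re

with_space = re.search(r'(\w+) \1', "Shannon Shannon")
print(with_space.group(0))   # Shannon Shannon
print(with_space.group(1))   # Shannon

no_space = re.search(r'(\w+)\1', "Shannon Shannon")
print(no_space.group(0))     # nn
print(no_space.group(1))     # n
```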

Nested groups
Group numbers go from outside to inside: count the opening parentheses from left to right.
    >>> m = re.search(r'(a(b)c)d', 'abcd')
    >>> m.group(0)
    'abcd'
    >>> m.group(1)
    'abc'
    >>> m.group(2)
    'b'

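sub: search and replace
    re.sub(regex, putIn, string, count=0, flags=0)
Every match of regex in string is replaced with putIn. Give flags by keyword (flags=...), because the fourth positional argument is count, not flags.
    phone = "1-800-555-9090"
    newPhone = re.sub(r'\D', "", phone)
What is newPhone?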
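Running the slide's code answers the question: every non-digit is replaced by the empty string. The sketch also passes flags and count by keyword; the cat/dog strings are made-up examples.

```python
import re

phone = "1-800-555-9090"
newPhone = re.sub(r"\D", "", phone)   # each non-digit replaced by ""
print(newPhone)     # 18005559090

# Pass flags by keyword; a fourth positional argument would be taken as count.
pets = re.sub(r"cat", "dog", "Cat cat CAT", flags=re.IGNORECASE)
print(pets)         # dog dog dog

# count limits how many replacements are made.
first_only = re.sub(r"cat", "dog", "cat cat cat", count=1)
print(first_only)   # dog cat cat
```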

findall
Search for all matches and return them as a list.
    song = "12 drummers drumming, 11 pipers piping, 10 lords a leaping"
    nums = re.findall(r'\d+', song)
nums is now ['12', '11', '10']

split
Split a string, using the matches of a regex as the delimiters.
    verses = re.split(r'\d+', song)
verses is ['', ' drummers drumming, ', ' pipers piping, ', ' lords a leaping']
The first element is '' because song begins with a delimiter.

split with groups
Sometimes you want the delimiters to show up in the list. Use a group: the text matched by the group is included in the returned list.
    verses = re.split(r'(\d+)', song)
verses is ['', '12', ' drummers drumming, ', '11', ' pipers piping, ', '10', ' lords a leaping']
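The findall and split examples above can be checked in one short script. The final list comprehension, which tidies the verses, is an extra illustration rather than part of the slides.

```python
import re

song = "12 drummers drumming, 11 pipers piping, 10 lords a leaping"

nums = re.findall(r'\d+', song)              # ['12', '11', '10']
verses = re.split(r'\d+', song)              # digits removed
verses_with_nums = re.split(r'(\d+)', song)  # digits kept, via the group

print(nums)
print(verses)
print(verses_with_nums)

# Drop the empty first piece and trim surrounding spaces and commas.
clean = [v.strip(" ,") for v in verses if v]
print(clean)   # ['drummers drumming', 'pipers piping', 'lords a leaping']
```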

Examples
You have a string that represents a poker hand:
- a, k, q, j for ace, king, queen, jack
- 1-9 for numbers 1-9
- 0 for 10
How would you:
- Make sure a string is a valid hand?
- Check for a pair of sevens?
- Check for any pair?
- Check for 3 of a kind?
- Check for a full house?
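One possible set of answers, as a sketch under stated assumptions: a hand is exactly five card characters with no suits, in any order, and the function names are hypothetical. The pair and three-of-a-kind checks use backreferences; the full-house check sorts the hand first so that equal cards end up next to each other.

```python
import re

# Assumption: a hand is exactly five characters from the encoding above
# (a, k, q, j, 1-9, and 0 for 10), with no suits, in any order.
VALID_HAND = re.compile(r"[akqj0-9]{5}")

def is_valid_hand(hand):
    return VALID_HAND.fullmatch(hand) is not None

def has_pair_of_sevens(hand):
    # Two 7s anywhere in the hand.
    return re.search(r"7.*7", hand) is not None

def has_pair(hand):
    # Some card, then the same card again later (backreference \1).
    return re.search(r"(.).*\1", hand) is not None

def has_three_of_a_kind(hand):
    # The same card three times.
    return re.search(r"(.).*\1.*\1", hand) is not None

def is_full_house(hand):
    # Sorted, a full house looks like xxxyy or xxyyy with x != y;
    # the lookaheads (?!\1) and (?!\3) stop both runs being the same card.
    s = "".join(sorted(hand))
    return re.fullmatch(r"(?:(.)\1\1(?!\1)(.)\2|(.)\3(?!\3)(.)\4\4)", s) is not None

print(is_valid_hand("akqj0"), is_full_house("k7k7k"))
```

Note that has_pair also returns True for three or four of a kind, since those hands contain a pair.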
