regular expression presentation for the hub
Presentation I did for the helpdesk of my alma mater.
Regular Expressions
Ben Simpson - <3 HUB
● Working with web technologies for 10 years
● Former HUB supervisor
● Tour de jobs: http://tinyurl.com/kmsns38
● Graduated from CSU with a BAS in Technology Management (2013)
● Husband and proud father
● Presenter on regular expressions!
Introductions
What Is a Regular Expression?
Pattern matching
What Could I Do With a RegExp?
● Searching
● Syntax highlighting
● Data validation
● Sanitization
● Data queries / extraction
● Many other tasks that require matching a pattern
RegExps Won’t Let You Time Travel
Brain Teaser
Which of the following is a valid telephone number?
1. 678 466 4000
2. (678) 466-4000
3. 1234
4. domain\user
5. 1 (800) 1234 567
How did you know?
Depends on who you ask...
We Pattern Match Every Day
● Telephone numbers follow a pattern that we recognize
● This pattern has rules (3-digit area code, 7-digit number, numeric only)
● There are often many variations to a pattern (optional intl code)
Literal Characters
String: The cat in the hat
RegExp: /at/
The cat in the hat
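A quick way to try the literal /at/ match in JavaScript is sketched below. The `g` (global) flag is an addition not shown on the slide; it is what makes `match()` return every occurrence instead of only the first.

```javascript
const text = "The cat in the hat";

// Without the g flag, match() returns only the first occurrence,
// along with the position (index) where it was found.
const first = text.match(/at/);
console.log(first[0], first.index); // at 5

// With the g flag, match() returns every occurrence as an array of strings.
const all = text.match(/at/g);
console.log(all); // [ 'at', 'at' ]
```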
Regular Expressions in JavaScript
var haystack = "The cat in the hat";
var needle = new RegExp(/cat/);
haystack.match(needle); // truthy: an array whose first element is "cat"

needle = new RegExp(/dog/);
haystack.match(needle); // falsy: null, because "dog" never appears
Well, that wasn’t so bad.
The best is yet to come!
Special Characters (Metacharacters)
● \ - escape character
● ^ - beginning of line (inside brackets it negates the set instead)
● $ - end of line
● . - wildcard (any single character except a newline)
● | - alternation ("or")
● ? - zero or one
● * - zero or more
● + - one or more
● () - grouping
● [] - character set
● {} - repetition
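To make the list above concrete, here is a small sketch pairing each metacharacter with a sample string it matches, plus two counterexamples. The sample strings are illustrative choices, not taken from the slides.

```javascript
// Each pair is [pattern, sample]; every pattern below matches its sample.
const examples = [
  [/\./, "clayton.edu"],   // \  escapes "." so it means a literal dot
  [/^The/, "The cat"],     // ^  anchors to the beginning of the line
  [/hat$/, "in the hat"],  // $  anchors to the end of the line
  [/c.t/, "cot"],          // .  any single character (except newline)
  [/cat|dog/, "hotdog"],   // |  either alternative
  [/colou?r/, "color"],    // ?  zero or one "u"
  [/ab*c/, "ac"],          // *  zero or more "b"
  [/ab+c/, "abbbc"],       // +  one or more "b"
  [/(ha)+/, "hahaha"],     // () groups "ha" so + repeats the pair
  [/gr[ae]y/, "grey"],     // [] one character from the set
  [/\d{3}/, "678"],        // {} exactly three repetitions of a digit
];

for (const [pattern, sample] of examples) {
  console.log(pattern, pattern.test(sample)); // every line ends in true
}

// Counterexamples: the same metacharacters rejecting a sample.
console.log(/^The/.test("See The cat")); // false: "The" is not at the start
console.log(/colou?r/.test("colouur"));  // false: ? allows at most one "u"
```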
Demonstration of Special Characters
String: ...To log in to your email use the username: "[email protected]" with a password "password123"...
RegExp: /username: "(.*)" .* password "(.*)"/
Results:
1. [email protected]
2. password123
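Here is a sketch of pulling the two captured groups out with `exec()`. The address is a placeholder (`user@example.com`), since the one on the slide is not recoverable. The pattern includes the colon that follows `username` in the sample string, and it uses lazy groups `(.*?)` rather than `(.*)`. This is a small change, assumed here, so that each group stops at the first closing quote instead of the last one.

```javascript
// Placeholder message; the real address from the slide is not recoverable.
const message =
  '...To log in to your email use the username: "user@example.com" ' +
  'with a password "password123"...';

// Lazy (.*?) groups stop at the first closing quote instead of the last one.
const pattern = /username: "(.*?)" .* password "(.*?)"/;
const result = pattern.exec(message); // null if there is no match

if (result !== null) {
  const [, username, password] = result; // index 0 holds the whole match
  console.log(username); // user@example.com
  console.log(password); // password123
}
```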
Shorthand Character Classes
● \d - digit [0-9]
● \w - word character [A-Za-z0-9_]
● \s - whitespace
● \D - non-digit [^\d]
● \W - non-word character [^\w]
● \S - non-whitespace [^\s]
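The sketch below runs each shorthand class against one sample string, chosen here for illustration. The `g` flag makes `match()` and `replace()` act on every occurrence.

```javascript
const sample = "Room 101, ext_7";

const digits = sample.match(/\d+/g);         // \d: runs of digits
const words = sample.match(/\w+/g);          // \w: letters, digits, underscore
const spaces = sample.match(/\s/g);          // \s: whitespace characters
const digitsOnly = sample.replace(/\D/g, ""); // \D: strip every non-digit
const nonWord = sample.match(/\W/g);         // \W: punctuation and spaces
const chunks = sample.match(/\S+/g);         // \S: runs of non-whitespace

console.log(digits);        // [ '101', '7' ]
console.log(words);         // [ 'Room', '101', 'ext_7' ]
console.log(spaces.length); // 2
console.log(digitsOnly);    // 1017
console.log(nonWord);       // [ ' ', ',', ' ' ]
console.log(chunks);        // [ 'Room', '101,', 'ext_7' ]
```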
Wait a Second!
You said this was easy
Thinking about a Telephone Pattern
● Optional international code
● 3-digit area code
● 7-digit number
● Optional extension
● What about alpha phrases? (e.g. 678 466-HELP)
● How long are international codes? (e.g. 358 for Finland)
● Are parentheses optional?
● Is spacing optional?
● Country-specific formats (e.g. France 06 87 71 23 45)
Regular Expression - Telephone #
String: 678 466 4357
RegExp: \d{3} \d{3} \d{4}
String: (678) 466-4357
RegExp: \(\d{3}\) \d{3}-\d{4}
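The two single-format patterns can be checked in JavaScript as sketched below. The ^ and $ anchors are an assumption added here so the whole string must match; without them, a pattern would also match a phone number buried inside longer text.

```javascript
// Slide patterns, wrapped in ^...$ so the entire string must match.
const spaced = /^\d{3} \d{3} \d{4}$/;
const formatted = /^\(\d{3}\) \d{3}-\d{4}$/;

console.log(spaced.test("678 466 4357"));      // true
console.log(spaced.test("(678) 466-4357"));    // false
console.log(formatted.test("(678) 466-4357")); // true
console.log(formatted.test("678 466 4357"));   // false
```

Each pattern accepts exactly one format, which is why the next step combines them.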
Telephone # - Two Variations
String: 678 466 4357 | (678) 466-4357
RegExp: \(?\d{3}\)? \d{3}[\s-]\d{4}
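A sketch of the two-variation pattern, again with assumed ^...$ anchors. The extra samples are illustrative. They show one format it still rejects and one malformed string it accepts, because the two optional parentheses are independent of each other.

```javascript
const twoWays = /^\(?\d{3}\)? \d{3}[\s-]\d{4}$/;

const samples = ["678 466 4357", "(678) 466-4357", "678-466-4357", "(678 466 4357"];
for (const s of samples) {
  console.log(s, twoWays.test(s));
}
// 678 466 4357 true
// (678) 466-4357 true
// 678-466-4357 false   (a space is required after the area code)
// (678 466 4357 true   (an unbalanced "(" slips through)
```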
Telephone # - Three Variations
String: 678 466 4357 | (678) 466-4357 | 1 (678) 466-4357
RegExp: \d*\s?\(?\d{3}\)? \d{3}[\s-]?\d{4}
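The three-variation pattern can be tried the same way (anchors assumed, as before). The last sample is an illustration of how `\d*` accepts a leading "country code" of any length, which is one way a pattern quietly becomes too permissive.

```javascript
// Slide pattern for three variations, anchored with ^...$.
const threeWays = /^\d*\s?\(?\d{3}\)? \d{3}[\s-]?\d{4}$/;

const phoneSamples = [
  "678 466 4357",
  "(678) 466-4357",
  "1 (678) 466-4357",
  "12345 678 466 4357", // \d* accepts a leading number of any length
];
for (const s of phoneSamples) {
  console.log(s, threeWays.test(s)); // all four print true
}
```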
That Escalated Quickly
Surprisingly Difficult
● Seemingly simple patterns can become very complex.
● It’s best to work with data that is consistent, or regular, in how it follows its patterns
● If the data is too dirty, a regular expression won’t be much help
When RegExps Go Bad
● Websites that don’t accept special characters in email addresses, URLs, telephone numbers, etc.
● RegExps may be too restrictive
● They don’t take into account all variations of a pattern
● Longer expressions are difficult to grok
In a Nutshell
“Some people, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.”
-Jamie Zawinski
Brain Teaser
Which of the following is a valid email address?
1. [email protected]
2. [email protected]
3. ben+email
4. http://www.clayton.edu
5. abc."defghi"[email protected]
Thinking about an Email Address
● Has a local part (e.g. [email protected])
● Has a domain part (e.g. thehub@clayton.edu)
● Has an @ symbol in the middle
● Do we need to support special characters?
● Can we verify based on minimum / maximum length?
Best to Keep It Simple!
String: [email protected]
RegExp: .*@.*
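Here is a sketch of the "keep it simple" check in JavaScript, run against a few strings from the brain teaser. Because `.*` can match nothing, /.*@.*/ amounts to "contains an @". The stricter variant at the end is an assumption, not from the slides. It requires at least one non-space, non-@ character on each side of a single @.

```javascript
// The slide's deliberately loose check: "anything, an @, anything".
const looseEmail = /.*@.*/;

console.log(looseEmail.test("thehub@clayton.edu"));     // true
console.log(looseEmail.test("ben+email"));              // false: no @
console.log(looseEmail.test("http://www.clayton.edu")); // false: no @
console.log(looseEmail.test("@"));                      // true: .* may match nothing

// Assumed stricter variant: something on each side of exactly one @, no spaces.
const stricterEmail = /^[^\s@]+@[^\s@]+$/;
console.log(stricterEmail.test("@"));                  // false
console.log(stricterEmail.test("thehub@clayton.edu")); // true
```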
Yeah, but isn’t there an official email RegExp that takes all the patterns into account? Yes...
RFC 5322 - The Email RegExp
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
Maybe this instead?
(╯°□°)╯︵ ┻━┻)
(Let me put that back for you)
┬─┬ ノ( ゜-゜ノ)
Brain Teaser
Which is a valid zipcode?
1. 30022
2. 30022-7155
3. 300131
4. -7155
5. AB123XY
Thinking About a Zipcode
● Digits only
● 5 digits mandatory, plus an optional 4-digit code
● The 4-digit code follows a hyphen (e.g. 30022-7155)
● Do other countries use zip codes?
● Pattern is easier because there is less variation (Thank USPS!)
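The rules above (five digits, optionally a hyphen plus four more) fit in one short pattern. This is a sketch for US ZIP codes only, run against the brain-teaser candidates.

```javascript
// Five digits, optionally followed by a hyphen and four more digits (ZIP+4).
const zip = /^\d{5}(-\d{4})?$/;

const zipCandidates = ["30022", "30022-7155", "300131", "-7155", "AB123XY"];
for (const c of zipCandidates) {
  console.log(c, zip.test(c));
}
// 30022 true
// 30022-7155 true
// 300131 false
// -7155 false
// AB123XY false
```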
Brain Teaser
Which is a valid URL?
1. http://www.clayton.edu
2. www.clayton.edu
3. clayton.edu
4. thehub.clayton.edu
5. ben:[email protected]:80/foo?bar=baz#qux
Thinking about a URL
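One way to think through the URL teaser is to write each optional part as its own optional group. This is a simplified sketch built on assumptions, not a standards-complete URL pattern. It allows an optional http(s) scheme, an optional user:password@, a dotted host name, an optional :port, and an optional path, query, or fragment. The credentials in the last candidate (`ben:secret`) are a hypothetical stand-in for the obscured item 5.

```javascript
// Assumed, simplified pattern: [scheme][user:password@]host[:port][path/query/fragment]
const url =
  /^(?:https?:\/\/)?(?:[^\s:@\/]+(?::[^\s@\/]*)?@)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?::\d+)?(?:[\/?#]\S*)?$/i;

const urlCandidates = [
  "http://www.clayton.edu",
  "www.clayton.edu",
  "clayton.edu",
  "thehub.clayton.edu",
  "ben:secret@www.clayton.edu:80/foo?bar=baz#qux", // hypothetical credentials
];
for (const u of urlCandidates) {
  console.log(u, url.test(u)); // all five print true
}

console.log(url.test("ben+email"));      // false: no dotted host name
console.log(url.test("http://clayton")); // false: no top-level domain
```

Whether items 2 to 4 "count" as URLs is the same judgment call as the phone and email teasers. This sketch chooses to accept them.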
Extra Credit
● IP address
● HTML tag contents
● Validating a password against requirements
● Dates
● Times
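As a starting point for the IP address exercise, the octet sub-pattern inside the RFC 5322 expression above, (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?), can be reused to match IPv4 addresses. This is a sketch. It covers IPv4 only and, like the original sub-pattern, allows leading zeros such as "010".

```javascript
// Octet pattern borrowed from the RFC 5322 expression: values 0 through 255.
const octet = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
// Three "octet." groups followed by a final octet, anchored at both ends.
const ipv4 = new RegExp(`^(?:${octet}\\.){3}${octet}$`);

console.log(ipv4.test("192.168.0.1"));     // true
console.log(ipv4.test("255.255.255.255")); // true
console.log(ipv4.test("256.1.1.1"));       // false: 256 is out of range
console.log(ipv4.test("1.2.3"));           // false: only three octets
```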