regular expression presentation for the hub

Regular Expressions Ben Simpson - <3 HUB

Upload: thehoagie

Post on 03-Jul-2015

Category:

Technology


DESCRIPTION

Presentation I did for the helpdesk of my alma mater

TRANSCRIPT

Page 1: Regular expression presentation for the HUB

Regular Expressions
Ben Simpson - <3 HUB

Page 2: Regular expression presentation for the HUB

● Working with web technologies for 10 years
● Former HUB supervisor
● Tour de jobs: http://tinyurl.com/kmsns38
● Graduated from CSU with a BAS in Technology Management 2013
● Husband and proud father
● Presenter on regular expressions!

Introductions

Page 3: Regular expression presentation for the HUB

What Is a Regular Expression?

Pattern matching

Page 4: Regular expression presentation for the HUB

What Could I Do With a RegExp?

● Searching
● Syntax highlighting
● Data validation
● Sanitization
● Data queries / extraction
● Many tasks that require matching a pattern

Page 5: Regular expression presentation for the HUB

RegExps Won’t Let You Time Travel

Page 6: Regular expression presentation for the HUB

Brain Teaser

Which of the following is a valid telephone number?
1. 678 466 4000
2. (678) 466-4000
3. 1234
4. domain\\user
5. 1 (800) 1234 567

Page 7: Regular expression presentation for the HUB

How did you know?
Depends on who you ask...

Page 8: Regular expression presentation for the HUB

We Pattern Match Every Day

● Telephone numbers follow a pattern that we recognize

● This pattern has rules (3-digit area code, 7-digit number, numeric only)

● There are often many variations to a pattern (optional intl code)

Page 9: Regular expression presentation for the HUB

Literal Characters

String: The cat in the hat
RegExp: /at/

Matches (shown in brackets): The c[at] in the h[at]

Page 10: Regular expression presentation for the HUB

Regular Expressions in Javascript

var haystack = "The cat in the hat";
var needle = /cat/; // regex literal; new RegExp(/cat/) builds the same pattern
haystack.match(needle); // truthy: returns a match array

needle = /dog/;
haystack.match(needle); // falsy: returns null
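
A small sketch building on the slide above, not part of the original deck: `test` gives a plain boolean where `match` gives an array or null, and the `g` flag (an addition here) is what makes a pattern report every occurrence, such as both "at"s in the literal-character example. The variable names are illustrative.

```javascript
var haystack = "The cat in the hat";

// test() returns a plain boolean instead of a match array or null.
var hasCat = /cat/.test(haystack);
var hasDog = /dog/.test(haystack);

// Without the g flag, match() reports only the first occurrence
// and exposes its position as .index.
var first = haystack.match(/at/);

// With the g flag, match() returns every occurrence as an array of strings.
var all = haystack.match(/at/g);

console.log(hasCat, hasDog);
console.log(first.index);
console.log(all);
```

If only a yes/no answer is needed, `test` is usually the clearer choice.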

Page 11: Regular expression presentation for the HUB

Well that wasn’t so badThe best is yet to come!

Page 12: Regular expression presentation for the HUB

Special Characters (Metacharacters)

● \ - escape character

● ^ - beginning of line (not inside brackets)

● $ - end of line

● . - wildcard (any single character)

● | - alternation (or)

● ? - zero or one

● * - zero or more

● + - one or more

● () - grouping

● [] - character set

● {} - repetition
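
To make the list concrete, here is a hedged sketch that pairs each metacharacter with one string it should match and one it should not. The sample patterns and strings are made up for illustration and do not come from the slides.

```javascript
// Each entry: a pattern, a string it should match, and a string it should not.
var examples = [
  { re: /^cat/,     yes: "cat nap", no: "the cat"  }, // ^ anchors at the start
  { re: /hat$/,     yes: "the hat", no: "hat rack" }, // $ anchors at the end
  { re: /c.t/,      yes: "cut",     no: "ct"       }, // . is any one character
  { re: /cat|dog/,  yes: "hot dog", no: "bird"     }, // | matches either side
  { re: /colou?r/,  yes: "color",   no: "colouur"  }, // ? zero or one
  { re: /^ab*c$/,   yes: "ac",      no: "adc"      }, // * zero or more
  { re: /^ab+c$/,   yes: "abbc",    no: "ac"       }, // + one or more
  { re: /^(ha)+$/,  yes: "hahaha",  no: "hah"      }, // () groups a sub-pattern
  { re: /^[ch]at$/, yes: "hat",     no: "bat"      }, // [] one character from a set
  { re: /^\d{3}$/,  yes: "678",     no: "67"       }, // {} exact repetition count
  { re: /1\.5/,     yes: "1.5",     no: "1x5"      }  // \ makes the dot literal
];

// true for an entry when the pattern accepts "yes" and rejects "no".
var results = examples.map(function (e) {
  return e.re.test(e.yes) && !e.re.test(e.no);
});

console.log(results);
```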

Page 13: Regular expression presentation for the HUB
Page 14: Regular expression presentation for the HUB

Demonstration of Special Characters

String: ...To login to your email use the username: "[email protected]" with a password "password123"...

RegExp: /username: "(.*)" .* password "(.*)"/
Results:
1. [email protected]
2. password123
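
A minimal sketch of how the demonstration might look in Javascript. The address `user@example.com` is a placeholder, not the one from the slide, and the pattern includes the colon after `username` so that it lines up with the sample text.

```javascript
// Placeholder address; the address shown on the slide is not reproduced here.
var text = '...To login to your email use the username: "user@example.com" ' +
           'with a password "password123"...';

// Each (.*) is a capture group; match() exposes the groups at index 1, 2, ...
var pattern = /username: "(.*)" .* password "(.*)"/;
var found = text.match(pattern);

var username = found[1];
var password = found[2];

console.log(username);
console.log(password);
```

Note that `.*` is greedy: it grabs as much as it can, so on text containing more quoted values the groups could capture more than intended. A narrower class such as `[^"]*` is a common alternative.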

Page 15: Regular expression presentation for the HUB

Shorthand Character Classes

● \d - digit [0-9]
● \w - word character [A-Za-z0-9_]
● \s - whitespace

● \D - non-digit [^\d]
● \W - non-word character [^\w]
● \S - non-whitespace [^\s]
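
A quick illustration of the shorthand classes, using a made-up sample string. The lowercase forms match a class of characters and the uppercase forms match everything outside it.

```javascript
var sample = "Room 42, floor 3";

// \d+ finds runs of digits.
var digits = sample.match(/\d+/g);

// \w+ finds runs of word characters (letters, digits, underscore).
var words = sample.match(/\w+/g);

// \s matches whitespace; replacing it with "" strips the spaces.
var noSpaces = sample.replace(/\s/g, "");

// \D matches anything that is not a digit; removing it keeps only the digits.
var onlyDigits = sample.replace(/\D/g, "");

console.log(digits, words, noSpaces, onlyDigits);
```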

Page 16: Regular expression presentation for the HUB

Wait a Second!
You said this was easy

Page 17: Regular expression presentation for the HUB

Thinking about a Telephone Pattern
● Optional international code
● 3 digit area code
● 7 digit number
● Optional extension
● What about alpha phrases? (e.g. 678 466-HELP)
● What is the length of intl codes? (e.g. 358 for Finland)
● Are parentheses optional?
● Is spacing optional?
● Country specific formats (e.g. France 06 87 71 23 45)

Page 18: Regular expression presentation for the HUB

Regular Expression - Telephone #

String: 678 466 4357
RegExp: \d{3} \d{3} \d{4}

String: (678) 466-4357
RegExp: \(\d{3}\) \d{3}-\d{4}

Page 19: Regular expression presentation for the HUB

Telephone # - Two Variations

Strings:
678 466 4357
(678) 466-4357
RegExp: \(?\d{3}\)? \d{3}[\s-]\d{4}

Page 20: Regular expression presentation for the HUB

Telephone # - Three Variations

Strings:
678 466 4357
(678) 466-4357
1 (678) 466-4357
RegExp: \d*\s?\(?\d{3}\)? \d{3}[\s-]?\d{4}
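
As a rough check, the three-variation pattern can be run against the sample strings. The `^` and `$` anchors are an addition here, assuming the goal is to validate a whole input rather than find a number inside longer text. The extra samples are borrowed from the brain teaser, plus one malformed string made up for this sketch.

```javascript
// The slide's pattern, anchored so the entire string has to match.
var phone = /^\d*\s?\(?\d{3}\)? \d{3}[\s-]?\d{4}$/;

var samples = [
  "678 466 4357",      // plain
  "(678) 466-4357",    // parentheses and hyphen
  "1 (678) 466-4357",  // leading country code
  "1234",              // too short
  "1 (800) 1234 567",  // digits grouped the wrong way
  "(678 466 4357"      // unbalanced parenthesis (made-up sample)
];

var verdicts = samples.map(function (s) {
  return phone.test(s);
});

samples.forEach(function (s, i) {
  console.log(s + " -> " + verdicts[i]);
});
```

The last sample shows how loose the pattern still is: each parenthesis is optional on its own, so an unbalanced one passes.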

Page 21: Regular expression presentation for the HUB

That Escalated Quickly

Page 22: Regular expression presentation for the HUB

Surprisingly Difficult

● Seemingly simple patterns can become very complex.

● It's best to work against data that is consistent, or regular, in its implementation of patterns

● If the data is too dirty, a regular expression won’t be much help

Page 23: Regular expression presentation for the HUB

When RegExps Go Bad

● Websites that don’t accept special characters in email addresses, URLs, telephone numbers, etc.
● May be RegExps that are too restrictive
● Doesn’t take into account all variations of a pattern
● Longer expressions are difficult to grok

Page 24: Regular expression presentation for the HUB
Page 25: Regular expression presentation for the HUB

In a Nutshell

“Some people, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.”

-Jamie Zawinski

Page 27: Regular expression presentation for the HUB

Thinking about Email Address

● Has a local part (e.g. the "thehub" in thehub@clayton.edu)
● Has a domain part (e.g. the "clayton.edu" in thehub@clayton.edu)
● Has an @ symbol in the middle
● Do we need to support special characters?
● Can we verify based on minimum / maximum length?

Page 28: Regular expression presentation for the HUB

Best to Keep It Simple!

String: [email protected]

RegExp: .*@.*
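
A hedged sketch of what "simple" can look like in practice. The slide's pattern is very permissive (a lone "@" passes), so a slightly stricter variant is shown next to it. That variant is an assumption for illustration, not something from the deck, and `user@example.com` is a placeholder address.

```javascript
// The slide's pattern: anything, an @, anything (including nothing at all).
var loose = /.*@.*/;

// Assumed stricter variant: at least one non-space, non-@ character on each
// side of exactly one @.
var simple = /^[^\s@]+@[^\s@]+$/;

var looseAcceptsBareAt = loose.test("@");

var checks = {
  "user@example.com": simple.test("user@example.com"),
  "@": simple.test("@"),
  "no-at-sign": simple.test("no-at-sign"),
  "a@b@c": simple.test("a@b@c")
};

console.log(looseAcceptsBareAt);
console.log(checks);
```

Neither pattern proves that an address exists; they only catch obvious typos.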

Yeah, but isn’t there an official email RegExp that takes all the patterns into account? Yes...

Page 29: Regular expression presentation for the HUB

RFC 5322 - The Email RegExp

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)* | "(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f] | \\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@ (?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])? | \[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]: (?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f] | \\[\x01-\x09\x0b\x0c\x0e-\x7f])+) \])

Page 30: Regular expression presentation for the HUB

Maybe this instead?

(╯°□°)╯︵ ┻━┻)

Page 31: Regular expression presentation for the HUB

(Let me put that back for you)

┬─┬ ノ( ゜-゜ノ)

Page 32: Regular expression presentation for the HUB

Brain Teaser

Which is a valid zipcode?
1. 30022
2. 30022-7155
3. 300131
4. -7155
5. AB123XY

Page 33: Regular expression presentation for the HUB

Thinking About a Zipcode

● Digits only
● 5 digits mandatory plus optional 4 digit code
● 4 digit code preceded by a hyphen
● Do other countries use zip codes?
● Pattern is easier because there is less variation (Thank USPS!)
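
The rules above translate almost directly into a pattern. This is one possible sketch, assuming US zip codes only, checked against the five brain-teaser candidates.

```javascript
// Five digits, optionally followed by a hyphen and four more digits.
var zip = /^\d{5}(-\d{4})?$/;

var candidates = ["30022", "30022-7155", "300131", "-7155", "AB123XY"];

var valid = candidates.filter(function (c) {
  return zip.test(c);
});

console.log(valid);
```

The anchors matter here: without `^` and `$`, the six-digit candidate would pass because it contains five digits in a row.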

Page 34: Regular expression presentation for the HUB

Brain Teaser

Which is a valid URL?
1. http://www.clayton.edu
2. www.clayton.edu
3. clayton.edu
4. thehub.clayton.edu
5. ben:[email protected]:80/foo?bar=baz#qux

Page 35: Regular expression presentation for the HUB

Thinking about a URL

Page 37: Regular expression presentation for the HUB

Extra Credit

● IP address
● HTML Tag contents
● Validating a password against requirements
● Dates
● Times
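
One possible starting point for the first item, offered as a sketch rather than a model answer. It reuses the 0-255 octet sub-pattern that appears inside the RFC 5322 expression shown earlier and assumes dotted IPv4 addresses only. The names `octet` and `ipv4` are illustrative.

```javascript
// One octet, 0 to 255, borrowed from the bracketed-IP part of the email RegExp.
var octet = "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

// Four octets separated by literal dots, anchored to the whole string.
var ipv4 = new RegExp("^" + octet + "(?:\\." + octet + "){3}$");

var ipChecks = {
  "192.168.0.1": ipv4.test("192.168.0.1"),
  "256.1.1.1": ipv4.test("256.1.1.1"),
  "10.0.0": ipv4.test("10.0.0"),
  "1.2.3.4.5": ipv4.test("1.2.3.4.5")
};

console.log(ipChecks);
```

Building the pattern from a string keeps the repeated octet readable, at the cost of doubling the backslash in `\\.`.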