Regular Expressions in JavaScript and the Command Line
DESCRIPTION
A 5-minute introduction to regular expressions in JavaScript and the UNIX command line.
TRANSCRIPT
Regular Expressions
Make complex data searches easy(-ish)!
Mandi Grant, @wygrant, 9/9/2014
Regular expressions are for finding text that conforms to a particular pattern.
Short Examples
[bB]acon  matches “Bacon” and “bacon” but not “BACON”
\d{5}  matches 5 digits in a row
\b\.[a-z]{3}\b  matches ‘.’ followed by 3 letters (.com)
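The three short examples can be tried directly in JavaScript. This is a minimal sketch: the variable names and the sample strings are made up for illustration, and \d is written with its backslash.

```javascript
// The three short example patterns as JavaScript regex literals.
const bacon = /[bB]acon/;
const fiveDigits = /\d{5}/;
const dotThreeLetters = /\b\.[a-z]{3}\b/;

console.log(bacon.test("Bacon"));           // true
console.log(bacon.test("bacon"));           // true
console.log(bacon.test("BACON"));           // false
console.log(fiveDigits.test("zip 90210"));  // true
console.log(fiveDigits.test("1234"));       // false (only 4 digits)
console.log("example.com".match(dotThreeLetters)[0]); // ".com"
```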
Let’s do something practical.
Example 1:
Validating a Social Security Number
nnn-nn-nnnn
In JavaScript, you can assign a regex to a var:
var regex = /^\d{3}-\d{2}-\d{4}$/;
JS Example
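A minimal sketch of the validation, assuming the pattern is anchored at both ends with ^ and $ so that the whole input has to be nnn-nn-nnnn. The helper name isValidSsnFormat is hypothetical, and the check covers the format only, not whether the number was ever issued.

```javascript
// Anchored pattern: ^ and $ force the entire string to be nnn-nn-nnnn.
var regex = /^\d{3}-\d{2}-\d{4}$/;

// Hypothetical helper name, used only for this sketch.
function isValidSsnFormat(input) {
  return regex.test(input);
}

console.log(isValidSsnFormat("123-45-6789"));   // true
console.log(isValidSsnFormat("123456789"));     // false (dashes missing)
console.log(isValidSsnFormat("x123-45-6789"));  // false (extra leading character)
```

Without the leading ^, the last input would pass, because the pattern would only need to match the end of the string.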
Let’s do a harder one!
Identify .txt files that contain at least one e-mail address.
Example 2:
Searching through Files
e-mail addresses are basically:
*@*.*
[a-zA-Z0-9.]\+@
[a-zA-Z0-9.] means “match any one character between a-z, A-Z, and 0-9, and also allow ‘.’ ”
\+ means “accept one or more of those things”
@ means “must include this symbol ‘@’ ”
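The same piece can be tried in JavaScript, with one syntax difference: JavaScript writes “one or more” as a plain +, while \+ is the spelling for grep's basic syntax. A small sketch, using a made-up address:

```javascript
// JavaScript spelling of [a-zA-Z0-9.]\+@ (plain + instead of \+).
const localPartAndAt = /[a-zA-Z0-9.]+@/;

// The address below is a made-up example.
console.log("mandi.grant@example.com".match(localPartAndAt)[0]); // "mandi.grant@"
console.log(localPartAndAt.test("@example.com"));     // false (nothing before the @)
console.log(localPartAndAt.test("no at sign here"));  // false
```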
Okay, what do we do with that?
Grep to the rescue!
grep is built in to Unix.
Add arguments like -l to limit results to filenames, and > output.txt to redirect results to a file.
Our complete regular expression:
$ grep '[a-zA-Z0-9.]\+@[a-zA-Z0-9.]\+\.[a-zA-Z0-9]\{2,20\}' files/*
$ grep  run in the command line
[a-zA-Z0-9.]\+  look for one or more of these characters
@  must include “@”
[a-zA-Z0-9.]\+  again for the domain
\.  must include “.”
[a-zA-Z0-9]\{2,20\}  the TLD, between 2 and 20 characters long (notice there is no “.” in the TLD character set!)
files/*  search files/*
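For comparison, here is a rough JavaScript version of the same search. It is a sketch under stated assumptions: the helper filesContainingEmail and the sample file names and contents are hypothetical, and the file text is passed in as an object rather than read from disk. In JavaScript, + and {2,20} are written without backslashes.

```javascript
// JavaScript spelling of the grep pattern: + and {2,20} are not escaped here.
const emailLike = /[a-zA-Z0-9.]+@[a-zA-Z0-9.]+\.[a-zA-Z0-9]{2,20}/;

// Hypothetical helper: takes { filename: text } and returns the names whose
// text contains at least one match, similar in spirit to grep -l.
function filesContainingEmail(contentsByName) {
  return Object.keys(contentsByName).filter(function (name) {
    return emailLike.test(contentsByName[name]);
  });
}

// Made-up sample data standing in for files/*.
const sample = {
  "a.txt": "Contact: someone@example.com",
  "b.txt": "No address in this file.",
  "c.txt": "Broken address: someone@example"
};

console.log(filesContainingEmail(sample)); // [ 'a.txt' ]
```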
Practical Uses for RegEx
● Search 50k files for phone numbers
o Famous real-life problem Amazon faced in 2003
● Change thousands of art asset filepaths
o We almost made some junior dev do this by hand...
● Remove HTML tags, scripts in form data
o No l337 hax 4 u
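As an illustration of the filepath use case, the sketch below rewrites a path prefix with String.prototype.replace. The directory names and the helper updateAssetPath are invented for the example; a real migration would substitute its own prefixes.

```javascript
// Hypothetical migration: move every asset under "art/old/" to "assets/v2/".
// The ^ anchor limits the change to paths that start with the old prefix.
const oldPrefix = /^art\/old\//;

// Hypothetical helper name, used only for this sketch.
function updateAssetPath(path) {
  return path.replace(oldPrefix, "assets/v2/");
}

console.log(updateAssetPath("art/old/hero.png"));     // "assets/v2/hero.png"
console.log(updateAssetPath("sound/old/theme.ogg"));  // "sound/old/theme.ogg" (unchanged)
```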
RegEx Gotchas
1. Data must fit a pattern
2. Simplicity vs accuracy
3. Be careful with “optional”
4. Syntax varies by environment
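Two of these gotchas are easy to see in a few lines of JavaScript. The sketch below shows one possible reading of gotcha 3 (a pattern made entirely of optional pieces also matches nothing at all) and a concrete case of gotcha 4 (grep's \+ means something different in JavaScript). The patterns and variable names are made up for illustration.

```javascript
// Gotcha 3 (one reading): when every piece is optional, the empty string matches too.
const tooOptional = /^\d*-?\d*$/;
console.log(tooOptional.test("555-1234")); // true
console.log(tooOptional.test(""));         // true, probably not what was intended

// Gotcha 4: grep's basic syntax writes "one or more" as \+, but in a
// JavaScript regex \+ is a literal plus sign.
const grepStyle = /[0-9]\+/;
const jsStyle = /[0-9]+/;
console.log(grepStyle.test("12345")); // false (no literal "+" after a digit)
console.log(grepStyle.test("1+"));    // true
console.log(jsStyle.test("12345"));   // true
```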
Try it out! regexr.com
Regular Expressions:
...that match this very specific format: [a-zA-Z0-9.]\+@[a-zA-Z0-9.]\+\.[a-zA-Z0-9]\{2,20\}