Regular Expressions in JavaScript and Command Line

Regular Expressions Make complex data searches easy(-ish)! Mandi Grant @wygrant 9/9/2014

Upload: mandi-grant

Post on 28-Jun-2015


DESCRIPTION

A 5-minute introduction to regular expressions in JavaScript and the UNIX command line.

TRANSCRIPT

Page 1: Regular Expressions in JavaScript and Command Line

Regular Expressions

Make complex data searches easy(-ish)!

Mandi Grant @wygrant 9/9/2014

Page 2: Regular Expressions in JavaScript and Command Line

Regular expressions are for finding text that conforms to a particular pattern.

Page 3: Regular Expressions in JavaScript and Command Line

Short Examples

[bB]acon
“Bacon” and “bacon” but not “BACON”

\d{5}
5 digits in a row

\b\.[a-z]{3}\b
‘.’ followed by 3 letters (.com)
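As a quick sanity check, the three patterns can be tried in JavaScript with the built-in test method. This is only a minimal sketch: the sample strings are made up for illustration, and the second pattern is written as \d{5} on the assumption that the slide meant the digit class.

```javascript
// The three sample patterns, written as JavaScript regex literals.
const bacon = /[bB]acon/;
const fiveDigits = /\d{5}/;
const dotThreeLetters = /\b\.[a-z]{3}\b/;

// Hypothetical sample strings, not from the talk.
const baconResults = ["Bacon", "bacon", "BACON"].map((s) => bacon.test(s));
const zipResult = fiveDigits.test("Portland, OR 97201");
const domainResult = dotThreeLetters.test("example.com");

console.log(baconResults);            // [ true, true, false ]
console.log(zipResult, domainResult); // true true
```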

Page 4: Regular Expressions in JavaScript and Command Line

Let’s do something practical.

Page 5: Regular Expressions in JavaScript and Command Line

Example 1:

Validating a Social Security Number

nnn-nn-nnnn

Page 6: Regular Expressions in JavaScript and Command Line

In JavaScript, you can assign a regex to a var:

var regex = /^\d{3}-\d{2}-\d{4}$/;

Page 7: Regular Expressions in JavaScript and Command Line

JS Example
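The slide's code is not in the transcript, so here is one way the validation might look. It assumes the pattern is anchored at both ends (^ as well as $) so that extra leading characters are rejected; isValidSsnFormat is a hypothetical helper name. It checks the nnn-nn-nnnn shape only, not whether the number was ever issued.

```javascript
// Anchored at both ends so the whole string must look like nnn-nn-nnnn.
const ssnPattern = /^\d{3}-\d{2}-\d{4}$/;

// Hypothetical helper name, not from the talk.
function isValidSsnFormat(input) {
  return ssnPattern.test(input);
}

console.log(isValidSsnFormat("123-45-6789"));  // true
console.log(isValidSsnFormat("123-45-67890")); // false: too many digits
console.log(isValidSsnFormat("x123-45-6789")); // false: stray leading character
```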

Page 8: Regular Expressions in JavaScript and Command Line

Let’s do a harder one!

Page 9: Regular Expressions in JavaScript and Command Line

Example 2:

Searching through Files

Identify .txt files that contain at least one e-mail address.

Page 10: Regular Expressions in JavaScript and Command Line

e-mail addresses are basically:

*@*.*

Page 11: Regular Expressions in JavaScript and Command Line

[a-zA-Z0-9.]\+@

[a-zA-Z0-9.] means “find any character between a-z, A-Z, and 0-9, and also allow ‘.’ ”

\+ means “accept one or more of those things”

@ means “must include this symbol ‘@’ ”
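For comparison, the same piece in JavaScript uses a plain + (the backslash form \+ belongs to grep's basic syntax). A small sketch with made-up input strings:

```javascript
// Same idea as the grep fragment above, in JavaScript syntax.
const localPart = /[a-zA-Z0-9.]+@/;

// Hypothetical input, not from the talk.
const match = "contact jane.doe@example.com today".match(localPart);

console.log(match[0]);                          // "jane.doe@"
console.log(localPart.test("no at-sign here")); // false
```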

Page 12: Regular Expressions in JavaScript and Command Line

Okay, what do we do with that?

Page 13: Regular Expressions in JavaScript and Command Line

Grep to the rescue!

$ grep is built in to Unix

Add arguments like -l to limit results to filenames, and > output.txt to redirect results to a file.

Page 14: Regular Expressions in JavaScript and Command Line

Our complete regular expression:

$ grep '[a-zA-Z0-9.]\+@[a-zA-Z0-9.]\+\.[a-zA-Z0-9]\{2,20\}' files/*

$ grep
Run in command line

[a-zA-Z0-9.]\+
Look for any quantity of these characters

@
Must include “@”

[a-zA-Z0-9.]\+
Again for the domain

\.
Must include “.”

[a-zA-Z0-9]\{2,20\}
TLD: between 2 and 20 characters long (notice there is no “.” in the TLD character set!)

files/*
Search files/*
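The same search can be sketched in JavaScript under Node.js, roughly mirroring grep with -l. This is an illustration under assumptions, not the talk's code: filesWithEmail is a hypothetical helper, it reads each file fully into memory, and it looks only at .txt files directly inside the given directory. Note that the JavaScript pattern drops the backslashes before + and the braces.

```javascript
const fs = require("fs");
const path = require("path");

// JavaScript spelling of the grep pattern: + and {2,20} need no backslashes.
const emailPattern = /[a-zA-Z0-9.]+@[a-zA-Z0-9.]+\.[a-zA-Z0-9]{2,20}/;

// Hypothetical helper: names of .txt files in dir that contain a match.
function filesWithEmail(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".txt"))
    .filter((name) =>
      emailPattern.test(fs.readFileSync(path.join(dir, name), "utf8"))
    )
    .sort();
}

// Only run the search if a files/ directory exists next to the script.
if (fs.existsSync("files")) {
  console.log(filesWithEmail("files"));
}
```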

Page 15: Regular Expressions in JavaScript and Command Line

Practical Uses for RegEx

● Search 50k files for phone numbers
  o Famous real-life problem Amazon faced in 2003
● Change thousands of art asset filepaths
  o We almost made some junior dev do this by hand...
● Remove HTML tags, scripts in form data
  o No l337 hax 4 u
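The filepath case might look something like this in JavaScript, using a capture group with String.prototype.replace. The old and new directory layouts here are invented for illustration, and migratePath is a hypothetical helper name.

```javascript
// Hypothetical old layout: assets/art/<anything>.png or .jpg
const oldPath = /^assets\/art\/(.+)\.(png|jpg)$/;

// Rewrites matching paths into a made-up new layout; others pass through.
function migratePath(p) {
  return p.replace(oldPath, "static/images/$1.$2");
}

console.log(migratePath("assets/art/hero/idle_01.png")); // "static/images/hero/idle_01.png"
console.log(migratePath("assets/audio/theme.ogg"));      // unchanged
```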

Page 16: Regular Expressions in JavaScript and Command Line

RegEx Gotchas

1. Data must fit a pattern
2. Simplicity vs accuracy
3. Be careful with “optional”
4. Syntax varies by environment
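One possible reading of the “optional” gotcha, sketched in JavaScript: marking each dash in the earlier Social Security pattern as optional also lets half-dashed input through. The patterns and sample strings below are assumptions made for illustration, not taken from the talk.

```javascript
// Each dash is independently optional, which is looser than it looks.
const looseSsn = /^\d{3}-?\d{2}-?\d{4}$/;

// Stricter alternative: either both dashes or none at all.
const strictSsn = /^(?:\d{3}-\d{2}-\d{4}|\d{9})$/;

console.log(looseSsn.test("123-45-6789")); // true
console.log(looseSsn.test("123456789"));   // true
console.log(looseSsn.test("123-456789"));  // true, probably not intended
console.log(strictSsn.test("123-456789")); // false
```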

Page 17: Regular Expressions in JavaScript and Command Line

Try it out!

regexr.com

Page 18: Regular Expressions in JavaScript and Command Line

...that match this very specific format:

[a-zA-Z0-9.]\+@[a-zA-Z0-9.]\+\.[a-zA-Z0-9]\{2,20\}

Regular Expressions: