Transcript
Page 1: Regular Expressions: JavaScript And Beyond

Regular Expressions: JavaScript And Beyond

Max Shirshin
Frontend Team Lead

deltamethod

Introduction

Types of regular expressions

• POSIX (BRE, ERE)

• PCRE = Perl-Compatible Regular Expressions

From the JavaScript language specification:

"The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language".

JS syntax (overview only)

var re = /^foo/;

// boolean
re.test('string');

// null or Array
re.exec('string');
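As a rough illustration of how the two methods differ, the sketch below runs both against sample strings. The inputs 'foobar' and 'barfoo' are made up for this example and do not come from the slides.

```javascript
var re = /^foo/;

// test() only answers yes or no.
var isMatch = re.test('foobar');

// exec() returns an Array-like result with the matched text and its position...
var found = re.exec('foobar');

// ...or null when nothing matches (here ^ anchors the pattern to the start).
var missing = re.exec('barfoo');

console.log(isMatch);      // true
console.log(found[0]);     // "foo"
console.log(found.index);  // 0
console.log(missing);      // null
```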

Regular expressions consist of...

● Tokens
  — common characters
  — special characters (metacharacters)

● Operations
  — quantification
  — enumeration
  — grouping

Tokens and metacharacters

Any character

/./.test('foo'); // true

/./.test('\r\n') // false

What you need instead:

/[\s\S]/ for JavaScript, or
/./s (works in Perl/PCRE, not in JS)
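A small sketch of the difference, using a made-up two-line string. The last pattern is an assumption about the runtime: ECMAScript 2018 added an s (dotAll) flag after these slides were written, so it only works in engines that implement that edition.

```javascript
var text = 'line one\nline two';

// The dot stops at the line break, so only the first line is matched.
var dotOnly = /.+/.exec(text)[0];

// [\s\S] matches every character, line terminators included.
var everything = /[\s\S]+/.exec(text)[0];

// Assumption: an ES2018+ engine. The constructor form is used so that
// older engines fail at run time rather than at parse time.
var dotAll = new RegExp('.+', 's').exec(text)[0];

console.log(dotOnly);              // "line one"
console.log(everything === text);  // true
console.log(dotAll === text);      // true
```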

String boundaries

>>> /^something$/.test('something')
true

>>> /^something$/.test('something\nbad')
false

>>> /^something$/m.test('something\nbad')
true
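One way the m flag tends to be used is to pick out whole lines from a multi-line string. The sketch below assumes a hypothetical key=value text; the variable names are illustrative only.

```javascript
var config = 'name=demo\nmode=fast\nmode=slow';

// Without m, ^ and $ only match at the very start and end of the string.
var withoutM = /^mode=\w+$/.exec(config);

// With m, ^ and $ also match around each line break; g collects every line.
var perLine = config.match(/^mode=\w+$/gm);

console.log(withoutM);  // null
console.log(perLine);   // [ 'mode=fast', 'mode=slow' ]
```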

Word boundaries

>>> /\ba/.test('alabama')
true

>>> /a\b/.test('alabama')
true

>>> /a\b/.test('naïve')
true

not a word boundary:
/\Ba/.test('alabama');

Character classes

Whitespace

/\s/ (inverted version: /\S/)

FF:
\t \n \v \f \r \u0020 \u00a0 \u1680 \u180e \u2000 \u2001 \u2002 \u2003 \u2004 \u2005 \u2006 \u2007 \u2008 \u2009 \u200a \u2028 \u2029 \u202f \u205f \u3000

Chrome, IE 9:
as in FF plus \ufeff

IE 7, 8 :-(
only: \t \n \v \f \r \u0020

Alphanumeric characters

/\d/ ~ digits from 0 to 9

/\w/ ~ Latin letters, digits, underscore
Does not work for Cyrillic, Greek etc.

Inverted forms:
/\D/ ~ anything but digits
/\W/ ~ anything but alphanumeric characters and underscore
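The limitation of \w can be checked directly. In the sketch below, the sample words are made up, and the last pattern is an assumption about the runtime: Unicode property escapes (\p{...} with the u flag) were added in ECMAScript 2018, after these slides were written.

```javascript
var latin = /^\w+$/.test('hello_42');

// Cyrillic letters are outside \w, so the same pattern fails.
var cyrillic = /^\w+$/.test('привет');

// Assumption: an ES2018+ engine. \p{L} is any letter, \p{N} any number.
var anyScript = new RegExp('^[\\p{L}\\p{N}_]+$', 'u').test('привет');

console.log(latin);      // true
console.log(cyrillic);   // false
console.log(anyScript);  // true
```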

Custom character classes

Example:
/[abc123]/

Metacharacters and ranges supported:
/[A-F\d]/

More than one range is okay:
/[a-cG-M0-7]/

IMPORTANT: ranges come from Unicode, not from national alphabets!

"dot" means just dot!
/[.]/.test('anything') // false

adding \ ] -
/[\\\]-]/
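The warning about Unicode ranges can be illustrated with the Russian alphabet, chosen here only as an example. The lowercase letters а to я occupy U+0430 to U+044F, while the letter ё sits at U+0451, outside that range. Escapes are used in the sketch to make the code points explicit.

```javascript
// Range а-я, written with escapes: U+0430 through U+044F.
var rangeOnly = /^[\u0430-\u044f]+$/;

// Same range with ё (U+0451) added explicitly.
var rangePlusYo = /^[\u0430-\u044f\u0451]+$/;

// The word "ёлка": ё (U+0451), л (U+043B), к (U+043A), а (U+0430).
var word = '\u0451\u043b\u043a\u0430';

console.log(rangeOnly.test(word));    // false
console.log(rangePlusYo.test(word));  // true
```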

Inverted character classes

anything except a, b, c:
/[^abc]/

^ as a character:
/[abc^]/

/[^]/ matches ANY character;
could be a nice alternative to /[\s\S]/

Chrome, FF:
>>> /([^])/.exec('a');
['a', 'a']

IE:
>>> /([^])/.exec('a');
['a', '']

IE:
>>> /([\s\S])/.exec('a');
['a', 'a']

Quantifiers

Zero or more, one or more

/bo*/.test('b') // true

/.*/.test('') // true

/bo+/.test('b') // false

Zero or one

/colou?r/.test('color');
/colou?r/.test('colour');

How many?

/bo{7}/ exactly 7

/bo{2,5}/ from 2 to 5 (in {x,y}, x must not exceed y)

/bo{5,}/ 5 or more

This does not work in JS:
/b{,5}/.test('bbbbb')
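To see why the {,5} form fails, it may help to check what it actually matches. In a pattern without the u flag, JavaScript treats the braces as ordinary characters. The explicit {0,5} form below is one possible substitute; the anchors are added only for this sketch.

```javascript
var literalBraces = /b{,5}/;

// Not a quantifier: the pattern looks for the text "b{,5}" itself.
var asQuantifier = literalBraces.test('bbbbb');
var asLiteral = literalBraces.test('b{,5}');

// Writing the lower bound explicitly gives "up to five".
var upToFive = /^b{0,5}$/;

console.log(asQuantifier);             // false
console.log(asLiteral);                // true
console.log(upToFive.test('bbbbb'));   // true
console.log(upToFive.test('bbbbbb'));  // false
```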

Greedy quantifiers

var r = /a+/.exec('aaaaa');

>>> r[0]
"aaaaa"

Lazy quantifiers

var r = /a+?/.exec('aaaaa');
>>> r[0]
"a"

r = /a*?/.exec('aaaaa');
>>> r[0]
""
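The practical difference shows up when a pattern has a closing delimiter. The sketch below uses a made-up markup string; it is only meant to contrast the two modes, not to suggest parsing HTML with regular expressions.

```javascript
var html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the end and backtracks to the LAST closing tag.
var greedy = /<b>.*<\/b>/.exec(html)[0];

// Lazy: .*? grows one character at a time and stops at the FIRST closing tag.
var lazy = /<b>.*?<\/b>/.exec(html)[0];

console.log(greedy);  // "<b>one</b> and <b>two</b>"
console.log(lazy);    // "<b>one</b>"
```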

Groups

capturing
/(boo)/.test("boo");

non-capturing
/(?:boo)/.test("boo");

Grouping and the RegExp constructor

var result = /(bo)o+(b)/.exec('the booooob');

>>> RegExp.$1
"bo"
>>> RegExp.$2
"b"
>>> RegExp.$9
""
>>> RegExp.$10
undefined
>>> RegExp.$0
undefined
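The same captures are also available on the array that exec() returns, which avoids relying on the shared RegExp.$n properties. A minimal sketch with the slide's own pattern and input:

```javascript
var result = /(bo)o+(b)/.exec('the booooob');

// Index 0 is the whole match; indexes 1..n are the capturing groups.
console.log(result[0]);      // "booooob"
console.log(result[1]);      // "bo"
console.log(result[2]);      // "b"

// The array has exactly one slot per group plus the whole match.
console.log(result.length);  // 3
console.log(result[9]);      // undefined

// The position of the match in the input is stored alongside.
console.log(result.index);   // 4
```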

Numbering of capturing groups

/((foo) (b(a)r))/

$1  ((foo) (b(a)r))  foo bar
$2  (foo)            foo
$3  (b(a)r)          bar
$4  (a)              a
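Groups are numbered by the position of their opening parenthesis, left to right. A quick check with the slide's pattern against the string 'foo bar':

```javascript
var m = /((foo) (b(a)r))/.exec('foo bar');

// Drop the whole match at index 0 to leave just the numbered groups.
var groups = m.slice(1);

console.log(groups);  // [ 'foo bar', 'foo', 'bar', 'a' ]
```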

Lookahead

var r = /best(?= match)/.exec('best match');

>>> !!r
true

>>> r[0]
"best"

>>> /best(?! match)/.test('best match')
false
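Because a lookahead checks the text without consuming it, it is handy for conditions on what follows. The two sketches below use made-up inputs: one extracts a number only when a unit follows, the other inserts thousands separators at positions followed by groups of three digits.

```javascript
// Match digits only if " USD" comes next; the unit is not part of the match.
var amount = /\d+(?= USD)/.exec('price: 100 USD')[0];

// At every non-boundary position followed by a multiple of three digits
// (and then no further digit), insert a comma. Nothing is consumed.
var grouped = '1234567'.replace(/\B(?=(\d{3})+(?!\d))/g, ',');

console.log(amount);   // "100"
console.log(grouped);  // "1,234,567"
```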

Lookbehind

NOT supported in JavaScript at all

/(?<=text)match/ positive lookbehind

/(?<!text)match/ negative lookbehind
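Where lookbehind is unavailable, a common workaround is to match the prefix and pull the wanted part out with a capturing group. The sketch below shows that, with a made-up input. The second pattern is an assumption about the runtime: ECMAScript 2018 added lookbehind after these slides were written, so it only works in engines that implement that edition.

```javascript
var text = 'cost: $42';

// Workaround: match the dollar sign, but read only the captured digits.
var viaGroup = /\$(\d+)/.exec(text)[1];

// Assumption: an ES2018+ engine. The constructor form is used so that
// older engines fail at run time rather than at parse time.
var viaLookbehind = new RegExp('(?<=\\$)\\d+').exec(text)[0];

console.log(viaGroup);       // "42"
console.log(viaLookbehind);  // "42"
```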

Enumerations

Logical "or"

/red|green|blue light/
/(red|green|blue) light/

>>> /var a(;|$)/.test('var a')
true

Backreferences

/(red|green) apple is \1/.test('red apple is red') // true

/(red|green) apple is \1/.test('green apple is green') // true
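A backreference repeats whatever text the group actually matched, not the group's pattern. One typical use, sketched here with a made-up sentence, is finding a quoted string whose closing quote must be the same character as the opening one.

```javascript
var sentence = 'say "it\'s fine" now';

// Group 1 captures the opening quote; \1 demands the same character again.
// The apostrophe inside the text does not end the match.
var quoted = /(['"])(.*?)\1/.exec(sentence);

console.log(quoted[1]);  // the double quote character
console.log(quoted[2]);  // "it's fine"

// A mismatched pair fails, because \1 is not "any quote".
console.log(/^(['"]).*\1$/.test('"mixed\''));  // false
```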

Alternative character representations

Representing a character

\x09 === \t (not Unicode but ASCII/ANSI)
\u20AC === € (in Unicode)

backslash takes away special character meaning:

/\(\)/.test('()') // true
/\\n/.test('\\n') // true

...or vice versa!
/\f/.test('f') // false!

Flags

Regular expression flags

g i m s x y

g: global match
i: ignore case
m: multiline matching for ^ and $

JavaScript does NOT provide support for:
s: string as single line
x: extend pattern

Mozilla-only, non-standard:
y: sticky
Match only from the .lastIndex index (a regexp instance property). Thus, ^ can match at a predefined position.
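Both g and y work through the .lastIndex property of the regexp instance. The sketch below uses made-up strings. The sticky part is an assumption about the runtime: the y flag was standardized in ECMAScript 2015, after these slides were written, so it is created through the constructor here.

```javascript
var text = 'foo boo';
var finder = /o/g;
var positions = [];
var m;

// With g, each exec() call resumes from finder.lastIndex.
while ((m = finder.exec(text)) !== null) {
    positions.push(m.index);
}

// Assumption: an ES2015+ engine that supports the y flag.
var sticky = new RegExp('foo', 'y');

// Sticky matching succeeds only if the match starts exactly at lastIndex.
sticky.lastIndex = 4;
var stickyHit = sticky.test('bar foo');

sticky.lastIndex = 0;
var stickyMiss = sticky.test('bar foo');

console.log(positions);   // [ 1, 2, 5, 6 ]
console.log(stickyHit);   // true
console.log(stickyMiss);  // false
```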

Alternative syntax for flags

/(?i)foo/
/(?i-m)bar$/
/(?i-sm).x$/
/(?i)foo(?-i)bar/

Some implementations do NOT support flag switching on-the-go.

In JS, flags are set for the whole regexp instance and you can't change them.

RegExp in JavaScript

Methods

RegExp instances:

/regexp/.exec('string')
    null or array ['whole match', $1, $2, ...]
/regexp/.test('string')
    false or true

String instances:

'str'.match(/regexp/)
'str'.match('\\w{1,3}')
    - same as /regexp/.exec if no 'g' flag used;
    - array of all matches if 'g' flag used (internal capturing groups ignored)
'str'.search(/regexp/)
'str'.search('\\w{1,3}')
    first match index, or -1
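The effect of the g flag on match() is easy to miss, so here is a short sketch with a made-up input string showing both modes next to search().

```javascript
var text = 'a1 b2 c3';

// No g flag: behaves like exec(), with capturing groups and an index.
var single = text.match(/([a-z])(\d)/);

// With g: every whole match is returned, and the groups are dropped.
var all = text.match(/([a-z])(\d)/g);

// search() reports only the position of the first match, or -1.
var firstDigit = text.search(/\d/);
var noMatch = text.search(/x/);

console.log(single[0], single[1], single[2]);  // a1 a 1
console.log(all);                              // [ 'a1', 'b2', 'c3' ]
console.log(firstDigit);                       // 1
console.log(noMatch);                          // -1
```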

String instances:

'str'.replace(/old/, 'new');

WARNING: special magic supported in the replacement string:
    $$ inserts a dollar sign "$"
    $& substring that matches the regexp
    $` substring before $&
    $' substring after $&
    $1, $2, $3 etc.: string that matches n-th capturing group

'str'.replace(/(r)(e)gexp/g,
    function(matched, $1, $2, offset, sourceString) {
        // what should replace the matched part on this iteration?
        return 'replacement';
    });
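A few concrete replacements may make the special sequences easier to remember. The inputs below are made up for this sketch.

```javascript
// $1 and $2 refer to the capturing groups, so they can be reordered.
var swapped = 'John Smith'.replace(/(\w+) (\w+)/, '$2, $1');

// $$ produces one literal dollar sign, and $& is the matched text.
var priced = '5'.replace(/\d+/, '$$$&');

// A function receives the match and returns the replacement;
// with g it is called once per match.
var doubled = '1 2 3'.replace(/\d/g, function (matched) {
    return String(Number(matched) * 2);
});

console.log(swapped);  // "Smith, John"
console.log(priced);   // "$5"
console.log(doubled);  // "2 4 6"
```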

RegExp injection

// BAD CODE
var re = new RegExp('^' + userInput + '$');
// ...
var userInput = '[abc]'; // oops!

// GOOD, DO IT AT HOME
RegExp.escape = function(text) {
    return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
};

var re = new RegExp('^' + RegExp.escape(userInput) + '$');
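To make the problem visible, the sketch below builds both regexps from the same input and compares what they accept. It uses a standalone helper named escapeRegExp, a hypothetical name chosen for this example so that the built-in RegExp object is left untouched; the character class is the one from the slide.

```javascript
// Hypothetical helper: same escaping as on the slide, as a plain function.
function escapeRegExp(text) {
    return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
}

var userInput = '[abc]';

// Unescaped, the input is read as a character class: one of a, b or c.
var unsafe = new RegExp('^' + userInput + '$');

// Escaped, the brackets are literal, so only the exact text matches.
var safe = new RegExp('^' + escapeRegExp(userInput) + '$');

console.log(unsafe.test('a'));      // true
console.log(unsafe.test('[abc]'));  // false
console.log(safe.test('a'));        // false
console.log(safe.test('[abc]'));    // true
```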

Recommended reading

Online, just google it:
MDN Guide on Regular Expressions

The Book:
Mastering Regular Expressions
O'Reilly Media

Thank you!

