Regular Expressions: JavaScript and Beyond
Post on 03-Jul-2015
Max Shirshin, Frontend Team Lead
deltamethod
Introduction
Types of regular expressions
• POSIX (BRE, ERE)
• PCRE = Perl-Compatible Regular Expressions
From the JavaScript language specification:
"The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language."
JS syntax (overview only)
var re = /^foo/;
// boolean
re.test('string');
// null or Array
re.exec('string');
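A small sketch of how the two calls differ in practice. The sample strings are made up for illustration and are not from the talk: test() only answers yes or no, while exec() returns the match details or null.

```javascript
// Sample inputs are illustrative, not taken from the slides.
const re = /^foo/;

const yes = re.test('foobar');   // true: the string starts with "foo"
const no = re.test('barfoo');    // false: "foo" is not at the start

const hit = re.exec('foobar');   // array with extra properties such as .index
const miss = re.exec('barfoo');  // null when nothing matches

console.log(yes, no);            // true false
console.log(hit[0], hit.index);  // foo 0
console.log(miss);               // null
```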
Regular expressions consist of...
● Tokens
  - common characters
  - special characters (metacharacters)
● Operations
  - quantification
  - enumeration
  - grouping
Tokens and metacharacters
Any character
/./.test('foo'); // true
/./.test('\r\n'); // false
What you need instead:
/[\s\S]/ for JavaScript, or
/./s (works in Perl/PCRE; in JS only since ES2018)
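To make the difference visible, here is a minimal sketch on a two-line string (the sample text is an assumption chosen for illustration):

```javascript
const text = 'line one\nline two';

// "." stops at the line terminator, so only the first line is matched.
const dotOnly = /.+/.exec(text)[0];

// [\s\S] means "whitespace or non-whitespace", i.e. any character at all.
const anyChar = /[\s\S]+/.exec(text)[0];

console.log(JSON.stringify(dotOnly)); // "line one"
console.log(JSON.stringify(anyChar)); // "line one\nline two"
```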
String boundaries
>>> /^something$/.test('something')
true
>>> /^something$/.test('something\nbad')
false
>>> /^something$/m.test('something\nbad')
true
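The same checks as a runnable sketch, plus one illustrative extra that is not on the slides: combining the m and g flags to pick up the first word of every line.

```javascript
const input = 'something\nbad';

// Without "m", ^ and $ match only at the very start and end of the input.
const wholeString = /^something$/.test(input); // false

// With "m", they also match right after and right before a line break.
const anyLine = /^something$/m.test(input);    // true

// "g" + "m": one match per line start.
const firstWords = input.match(/^\w+/gm);      // ['something', 'bad']

console.log(wholeString, anyLine, firstWords);
```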
Word boundaries
>>> /\ba/.test('alabama')
true
>>> /a\b/.test('alabama')
true
>>> /a\b/.test('naïve')
true
(\w covers only ASCII letters, digits and underscore, so "ï" counts as a non-word character)
Not a word boundary:
>>> /\Ba/.test('alabama')
true
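A short sketch of the consequences, assuming "ï" is stored as the single code point U+00EF (written as an escape here to keep the example encoding-independent):

```javascript
const word = 'na\u00efve'; // "naïve"

// \w is ASCII-only, so "ï" splits the word into two "words".
const pieces = word.match(/\b\w+\b/g);         // ['na', 've']

// \B is the opposite assertion: a position that is NOT a word boundary.
// Every "a" except the leading one gets replaced.
const innerA = 'alabama'.replace(/\Ba/g, 'A'); // 'alAbAmA'

console.log(pieces, innerA);
```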
Character classes
Whitespace
/\s/ (inverted version: /\S/)
FF: \t \n \v \f \r \u0020 \u00a0 \u1680 \u180e \u2000 \u2001 \u2002 \u2003 \u2004 \u2005 \u2006 \u2007 \u2008 \u2009 \u200a \u2028 \u2029 \u202f \u205f \u3000
Chrome, IE 9: as in FF plus \ufeff
IE 7, 8 :-( only: \t \n \v \f \r \u0020
Alphanumeric characters
/\d/ ~ digits from 0 to 9
/\w/ ~ Latin letters, digits, underscore
Does not work for Cyrillic, Greek etc.
Inverted forms:
/\D/ ~ anything but digits
/\W/ ~ anything but Latin letters, digits and underscore
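A quick sketch of the limitation. The last line is an assumption beyond the talk: it relies on ES2018 Unicode property escapes, which need the u flag and an engine that supports them.

```javascript
// "привет" written with escapes to keep the example encoding-independent.
const privet = '\u043f\u0440\u0438\u0432\u0435\u0442';

const latin = /^\w+$/.test('hello_42'); // true
const cyrillic = /^\w+$/.test(privet);  // false: \w is ASCII-only

// Assumption: ES2018 property escapes are available (requires the "u" flag).
const anyScript = /^[\p{L}\p{N}_]+$/u.test(privet); // true

console.log(latin, cyrillic, anyScript);
```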
Custom character classes
Example: /[abc123]/
Metacharacters and ranges supported: /[A-F\d]/
More than one range is okay: /[a-cG-M0-7]/
IMPORTANT: ranges come from Unicode, not from national alphabets!
"dot" means just dot!
/[.]/.test('anything') // false
Adding \ ] - to a class: /[\\\]-]/
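Two illustrations of the "ranges come from Unicode" warning, followed by a check of the escaped class. The Cyrillic and A-z cases are examples chosen for this sketch, not taken from the slides.

```javascript
// а-я is U+0430..U+044F; "ё" is U+0451 and falls outside that range.
const yolka = '\u0451\u043b\u043a\u0430';             // "ёлка"
const withoutYo = /^[\u0430-\u044f]+$/.test(yolka);   // false
const withYo = /^[\u0430-\u044f\u0451]+$/.test(yolka); // true

// A-z spans U+0041..U+007A, which also covers [ \ ] ^ _ ` between Z and a.
const underscoreInAz = /[A-z]/.test('_');             // true

// The escaped class matches a backslash, "]" or "-", and nothing else.
const special = /[\\\]-]/;
const specialHits = ['\\', ']', '-', 'a'].map(function (ch) {
  return special.test(ch);
});                                                   // [true, true, true, false]

console.log(withoutYo, withYo, underscoreInAz, specialHits);
```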
Inverted character classes
Anything except a, b, c: /[^abc]/
^ as a character: /[abc^]/
/[^]/ matches ANY character; could be a nice alternative to /[\s\S]/
Chrome, FF:
>>> /([^])/.exec('a');
['a', 'a']
IE:
>>> /([^])/.exec('a');
['a', '']
IE:
>>> /([\s\S])/.exec('a');
['a', 'a']
Quantifiers
Zero or more, one or more
/bo*/.test('b') // true
/.*/.test('') // true
/bo+/.test('b') // false
Zero or one
/colou?r/.test('color'); // true
/colou?r/.test('colour'); // true
How many?
/bo{7}/ exactly 7
/bo{2,5}/ from 2 to 5 (in {x,y}, x must not exceed y)
/bo{5,}/ 5 or more
This does not work in JS: /b{,5}/.test('bbbbb') // false, {,5} is read as literal text
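The counts can be checked with anchored patterns. This is a sketch: the anchors and sample strings are additions for illustration, and the {,5} behaviour shown is that of patterns without the u flag.

```javascript
const exactly7 = /^bo{7}$/.test('b' + 'o'.repeat(7)); // true

const twoToFive = /^bo{2,5}$/;
const tooFew = twoToFive.test('bo');                  // false (1 "o")
const inRange = twoToFive.test('boo');                // true  (2 "o")
const tooMany = twoToFive.test('b' + 'o'.repeat(6));  // false (6 "o")

// Without a lower bound the braces are not a quantifier at all:
const noLowerBound = /b{,5}/;
const fiveBs = noLowerBound.test('bbbbb');            // false
const literal = noLowerBound.test('b{,5}');           // true

// Writing the zero explicitly gives the intended "up to 5".
const zeroToFive = /^b{0,5}$/.test('bbbbb');          // true

console.log(exactly7, tooFew, inRange, tooMany, fiveBs, literal, zeroToFive);
```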
Greedy quantifiers
var r = /a+/.exec('aaaaa');
>>> r[0]
"aaaaa"
Lazy quantifiers
var r = /a+?/.exec('aaaaa');
>>> r[0]
"a"
r = /a*?/.exec('aaaaa');
>>> r[0]
""
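Where the difference usually bites is with delimiters. A minimal sketch on a made-up HTML fragment (not from the talk):

```javascript
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ runs to the end and backtracks only to the LAST ">".
const greedy = /<.+>/.exec(html)[0]; // the whole string

// Lazy: .+? stops at the FIRST ">" it can.
const lazy = /<.+?>/.exec(html)[0];  // '<b>'

console.log(greedy);
console.log(lazy);
```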
Groups
Capturing:
/(boo)/.test("boo");
Non-capturing:
/(?:boo)/.test("boo");
Grouping and the RegExp constructor
var result = /(bo)o+(b)/.exec('the booooob');
>>> RegExp.$1
"bo"
>>> RegExp.$2
"b"
>>> RegExp.$9
""
>>> RegExp.$10
undefined
>>> RegExp.$0
undefined
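The same captures are also available on the array returned by exec(), which avoids the global RegExp.$n properties (these are legacy static properties). A sketch using the same pattern:

```javascript
const result = /(bo)o+(b)/.exec('the booooob');

const whole = result[0];      // 'booooob'
const first = result[1];      // 'bo'
const second = result[2];     // 'b'
const position = result.index; // 4, where the match starts
const missing = result[9];    // undefined: there is no ninth group

console.log(whole, first, second, position, missing);
```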
Numbering of capturing groups
/((foo) (b(a)r))/
Groups are numbered by their opening parenthesis, left to right:
$1  ((foo) (b(a)r))  →  foo bar
$2  (foo)  →  foo
$3  (b(a)r)  →  bar
$4  (a)  →  a
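The numbering can be verified directly. The second pattern is an illustrative variation: turning the outer group into a non-capturing one shifts every number down by one.

```javascript
const nested = /((foo) (b(a)r))/.exec('foo bar');
const groups = nested.slice(1);        // ['foo bar', 'foo', 'bar', 'a']

// (?: ... ) groups without capturing, so it does not take a number.
const flatter = /(?:(foo) (b(a)r))/.exec('foo bar');
const fewerGroups = flatter.slice(1);  // ['foo', 'bar', 'a']

console.log(groups, fewerGroups);
```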
Lookahead
var r = /best(?= match)/.exec('best match');
>>> !!r
true
>>> r[0]
"best"
>>> /best(?! match)/.test('best match')
false
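Because a lookahead checks the text without consuming it, it is handy for "match X only when followed by Y". Two sketches with made-up inputs; the second is a widely used idiom for inserting thousands separators, not something from the slides.

```javascript
// The digits are matched; " USD" is only checked, not included.
const priced = /\d+(?= USD)/.exec('price: 100 USD');
const amount = priced[0];                                        // '100'

// Insert a comma at every inner position followed by a multiple of three
// digits and then no further digit.
const grouped = '1234567'.replace(/\B(?=(\d{3})+(?!\d))/g, ','); // '1,234,567'

console.log(amount, grouped);
```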
Lookbehind
Not supported in JavaScript when this talk was given (added in ES2018)
/(?<=text)match/ positive lookbehind
/(?<!text)match/ negative lookbehind
Enumerations
Logical "or"
/red|green|blue light/
/(red|green|blue) light/
>>> /var a(;|$)/.test('var a')
true
Backreferences
>>> /(red|green) apple is \1/.test('red apple is red')
true
>>> /(red|green) apple is \1/.test('green apple is green')
true
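A common practical use of \1 is spotting accidentally doubled words. A sketch on a made-up sentence:

```javascript
const sentence = 'this is is a test';

// \1 must repeat exactly what group 1 captured.
const doubled = /\b(\w+) \1\b/.exec(sentence);
const found = doubled[0];     // 'is is'
const at = doubled.index;     // 5

// In the replacement string, $1 refers to the same group.
const cleaned = sentence.replace(/\b(\w+) \1\b/g, '$1'); // 'this is a test'

console.log(found, at, cleaned);
```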
Alternative character representations
Representing a character
\x09 === \t (not Unicode but ASCII/ANSI)
\u20AC === € (in Unicode)
Backslash takes away special character meaning:
/\(\)/.test('()') // true
/\\n/.test('\\n') // true
...or vice versa!
/\f/.test('f') // false!
Flags
Regular expression flags
g i m s x y
g: global match
i: ignore case
m: multiline matching for ^ and $
s (string as single line) and x (extended pattern): JavaScript did NOT support these when this talk was given; s was added in ES2018, x is still not supported.
y (sticky): Mozilla-only and non-standard at the time, standardized in ES2015. Matches only from the .lastIndex index (a regexp instance property), so the match is anchored at a predefined position.
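Both g and y make a regexp instance stateful through .lastIndex, which is easy to trip over when the same instance is reused. A sketch, assuming an engine that supports the y flag (ES2015 or later):

```javascript
// With "g", each successful test() moves lastIndex past the match.
const globalRe = /o/g;
const first = globalRe.test('foo');   // true,  lastIndex is now 2
const second = globalRe.test('foo');  // true,  lastIndex is now 3
const third = globalRe.test('foo');   // false, lastIndex resets to 0

// With "y", the match must start exactly at lastIndex.
const stickyRe = /foo/y;
stickyRe.lastIndex = 3;
const atThree = stickyRe.test('barfoo'); // true: "foo" starts at index 3
stickyRe.lastIndex = 2;
const atTwo = stickyRe.test('barfoo');   // false: index 2 is "r"

console.log(first, second, third, atThree, atTwo);
```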
Alternative syntax for flags
/(?i)foo/
/(?i-m)bar$/
/(?i-sm).x$/
/(?i)foo(?-i)bar/
Some implementations do NOT support flag switching on-the-go.
In JS, flags are set for the whole regexp instance and you can't change them.
RegExp in JavaScript
Methods
RegExp instances:
/regexp/.exec('string')
null or array ['whole match', $1, $2, ...]
/regexp/.test('string')
false or true
String instances:
'str'.match(/regexp/)
'str'.match('\\w{1,3}')
- same as /regexp/.exec if no 'g' flag used;
- array of all matches if 'g' flag used (internal capturing groups ignored)
'str'.search(/regexp/)
'str'.search('\\w{1,3}')
first match index, or -1
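A sketch of how the g flag changes what match() returns, with an exec() loop as one way to keep the capturing groups for every match. The sample string and variable names are illustrative.

```javascript
const text = 'a1 b2 c3';

// No "g": same shape as exec(), groups included.
const single = text.match(/([a-z])(\d)/);  // ['a1', 'a', '1'], index 0

// With "g": all whole matches, groups dropped.
const all = text.match(/([a-z])(\d)/g);    // ['a1', 'b2', 'c3']

const firstDigitAt = text.search(/\d/);    // 1
const noMatchAt = text.search(/x/);        // -1

// Looping exec() on a "g" regexp yields groups for each match in turn.
const pairRe = /([a-z])(\d)/g;
const pairs = [];
let m;
while ((m = pairRe.exec(text)) !== null) {
  pairs.push(m[1] + '=' + m[2]);
}
// pairs is ['a=1', 'b=2', 'c=3']

console.log(single, all, firstDigitAt, noMatchAt, pairs);
```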
String instances:
'str'.replace(/old/, 'new');
WARNING: special magic supported in the replacement string:
$$ inserts a dollar sign "$"
$& substring that matches the regexp
$` substring before $&
$' substring after $&
$1, $2, $3 etc.: string that matches n-th capturing group
'str'.replace(/(r)(e)gexp/g, function(matched, $1, $2, offset, sourceString) {
  // what should replace the matched part on this iteration?
  return 'replacement';
});
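A few of the replacement patterns in action, plus the callback form. The inputs are made up, and the callback's parameter names (g1, g2) are arbitrary.

```javascript
// $1, $2: swap two captured words.
const swapped = 'John Smith'.replace(/(\w+) (\w+)/, '$2, $1'); // 'Smith, John'

// $&: wrap the matched text.
const wrapped = 'abc'.replace(/b/, '[$&]');                     // 'a[b]c'

// $$ produces a literal dollar sign, then $& adds the match.
const dollars = 'price: 5'.replace(/\d+/, '$$$&');              // 'price: $5'

// Callback form: called once per match, return value is the replacement.
const viaCallback = 'regexp and regexp'.replace(
  /(r)(e)gexp/g,
  function (matched, g1, g2, offset) {
    return g1.toUpperCase() + g2 + '@' + offset;
  }
);                                                              // 'Re@0 and Re@11'

console.log(swapped, wrapped, dollars, viaCallback);
```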
RegExp injection
// BAD CODE
var re = new RegExp('^' + userInput + '$');
// ...
var userInput = '[abc]'; // oops!
// GOOD, DO IT AT HOME
RegExp.escape = function(text) {
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
};
var re = new RegExp('^' + RegExp.escape(userInput) + '$');
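To see what actually goes wrong, here is a sketch comparing the two versions. It uses a standalone helper named escapeRegExp (a hypothetical name for this example) with the same character class as above, rather than attaching a property to RegExp.

```javascript
// Same escaping rule as on the slide, as a plain function.
function escapeRegExp(text) {
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
}

const userInput = '[abc]';

// Unescaped: the input is interpreted as a character class.
const unsafe = new RegExp('^' + userInput + '$');
const unsafeLiteral = unsafe.test('[abc]'); // false
const unsafeSingle = unsafe.test('a');      // true (!)

// Escaped: the input is matched literally.
const safe = new RegExp('^' + escapeRegExp(userInput) + '$');
const safeLiteral = safe.test('[abc]');     // true
const safeSingle = safe.test('a');          // false

console.log(unsafeLiteral, unsafeSingle, safeLiteral, safeSingle);
```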
Recommended reading
Online, just google it: MDN Guide on Regular Expressions
The Book: Mastering Regular Expressions, O'Reilly Media
Thank you!