CS346 Regular Expressions 1
Pattern Matching
Regular Expression
CS346 Regular Expressions 2
Pattern Matching
JavaScript provides two ways to do pattern matching:
1. Using RegExp objects
2. Using methods on String objects
The regular expressions (RE) are the same in both cases, and their syntax is largely the same as in Perl
CS346 Regular Expressions 3
Simple patterns
Two categories of characters in patterns:
a. normal characters (match themselves)
b. metacharacters (can have special meanings in patterns--do not match themselves)
\ | ( ) [ ] { } ^ $ * + ? .
- A metacharacter is treated as a normal character if it is backslashed
- period (.) is a special metacharacter - it matches any character except newline
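A minimal sketch of the difference between a bare period and a backslashed one. The sample strings and variable names are assumptions chosen for illustration, not taken from the course examples.

```javascript
// Unescaped period: a metacharacter that matches any character except newline
var loose = /3.14/;
// Backslashed period: treated as a normal character, matches only "."
var strict = /3\.14/;

var looseHit = loose.test("3x14");    // true: "." matches "x"
var strictHit = strict.test("3x14");  // false: "x" is not a literal period
var literalHit = strict.test("3.14"); // true
```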
CS346 Regular Expressions 4
create RegExp objects
var varname = /reg_ex_pattern/flags;
Simplest example: exact match
To match an occurrence of "our" in a string containing your, our, sour, four, pour:
var toMatch = /our/;
CS346 Regular Expressions 5
1. Matching in RegExp objects
Use the test() method of a RegExp object to test a string for pattern matches.
test() returns a Boolean that indicates whether or not the specified pattern exists within the searched string.
This is the most commonly used method for validation.
Format: regexp.test( string_to_be_tested )
var toMatch = /our/;
var result = toMatch.test("pour"); // Boolean result: true
Example: 16-0-checkName.html
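As a rough illustration of test(), the sketch below runs the /our/ pattern over the words from the earlier slide. The extra word "oar" is an assumption added to show a failing case.

```javascript
var toMatch = /our/;
// "oar" is not in the slide's word list; it is added here as a non-matching case
var words = ["your", "our", "sour", "four", "pour", "oar"];

// test() returns true when the pattern occurs anywhere in the word
var results = words.map(function (word) {
  return toMatch.test(word);
});
// results: [true, true, true, true, true, false]
```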
CS346 Regular Expressions 6
Pattern Modifiers (Adding flags)
Flag(s)  Purpose
i        Makes the match case insensitive: /oak/i matches "OAK" and "Oak"
g        Performs a global match (all matches, not just the first)
ig       Makes the match both case insensitive and global
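The effect of the flags can be sketched as follows. The sample text is an assumption, and match() (covered on a later slide) is used here only to make the difference between the flags visible.

```javascript
var text = "Oak, oak and OAK";

var noFlag = /oak/.test("OAK");      // false: the match is case sensitive
var withI = /oak/i.test("OAK");      // true: i ignores case

var globalOnly = text.match(/oak/g);   // ["oak"]: all matches, exact case only
var globalAndI = text.match(/oak/ig);  // ["Oak", "oak", "OAK"]
```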
CS346 Regular Expressions 7
2. Matching in Strings
search() method
Returns the position in the specified string of the first match of the RE pattern (position is relative to zero); returns -1 if it fails
var str = "Gluckenheimer";
var position = str.search(/n/); /* position is now 6 */
match() method
Compares a RE and a string to see whether they match
replace() method
Finds out if a RE matches a string and then replaces the matched substring with a new string
CS346 Regular Expressions 8
search() method
Format: string.search(reg-exp)
Searches the string for the first match to the given regular expression and returns an integer that indicates the position in the string (zero-indexed). If no match is found, the method returns -1.
Similar to the indexOf() method, but it takes a regular expression instead of a plain string.
Example: To find the location of the first absolute link within an HTML document (the pattern is not anchored with ^ and $, because the link can appear anywhere in the document):
pos = htmlString.search(/<a href\s*=\s*"http:\/\//i);
if (pos != -1) {
  alert('First absolute link found at position ' + pos + '.');
} else {
  alert('Absolute links not found');
}
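A small sketch comparing search() with indexOf(). The HTML string below is made up for illustration; the regular expression form finds the link regardless of letter case, while indexOf() needs the exact text.

```javascript
// Hypothetical document: a relative link first, then an absolute link in upper case
var htmlString = '<p><a href="page2.html">Next</a> <A HREF="http://example.com/">Site</A></p>';
var absoluteLink = /<a href\s*=\s*"http:\/\//i;

var pos = htmlString.search(absoluteLink);               // position of the absolute link
var viaIndexOf = htmlString.indexOf('<A HREF="http://'); // same position, exact text required
var missing = "<p>No links here</p>".search(absoluteLink); // -1: no match
```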
CS346 Regular Expressions 9
match() method
Format: string.match( regular_expression )
With the g flag, returns an array of all the matching substrings found in the given string. If no matches are found, match() returns null.
Example: To check the proper format for a phone number entered by a user, with the form (XXX) XXX-XXXX:
function checkPhone( phone ) {
  var phoneRegex = /^\(\d\d\d\) \d\d\d-\d\d\d\d$/;
  if ( !phone.match( phoneRegex ) ) {
    alert( 'Please enter a valid phone number' );
    return false;
  }
  return true;
}
CS346 Regular Expressions 10
replace() method
Format: string.replace( reg_exp, new_string )
Properties: returns a new string in which matches to a given regular expression are replaced with some new string. The original string is not changed.
Example: To replace every newline character (\n) with a <br /> tag:
comment = document.forms[0].comments.value;
/* assumes that the HTML form is the first one present in the document, and it has a field named "comments" */
comment = comment.replace( /\n/g, "<br />" );
The same replacement written as a function:
function formatField( fieldValue ) {
  return fieldValue.replace( /\n/g, "<br />" );
}
The function accepts any string as a parameter, and returns the new string with all of the newline characters replaced by <br /> tags.
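A short usage sketch of formatField(), run outside the browser. The sample comment text is an assumption; it also shows what happens when the g flag is left off.

```javascript
function formatField(fieldValue) {
  return fieldValue.replace(/\n/g, "<br />");
}

var comment = "line one\nline two\nline three"; // made-up sample input
var formatted = formatField(comment);
// formatted: "line one<br />line two<br />line three"

// Without the g flag only the first newline is replaced
var firstOnly = comment.replace(/\n/, "<br />");
// firstOnly: "line one<br />line two\nline three"
```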
CS346 Regular Expressions 11
Character classes – [ ]
Sequence of characters in brackets defines a set of characters, any one of which matches
e.g. [abcd]
Dashes used to specify spans of characters in a class
e.g. [a-z]
A caret at the left end of a class definition means the opposite
e.g. [^0-9]
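The three kinds of class can be tried out as below. The test strings are assumptions picked so that each class has one matching and one non-matching case.

```javascript
var anyOfAbcd = /[abcd]/;  // any one of a, b, c, d
var lowercase = /[a-z]/;   // any one lowercase letter
var nonDigit = /[^0-9]/;   // any one character that is not a digit

var r1 = anyOfAbcd.test("xyz");   // false: none of a, b, c, d present
var r2 = anyOfAbcd.test("back");  // true
var r3 = lowercase.test("ABC");   // false
var r4 = lowercase.test("aBC");   // true
var r5 = nonDigit.test("2024");   // false: every character is a digit
var r6 = nonDigit.test("20a4");   // true: "a" is not a digit
```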
CS346 Regular Expressions 12
Character class abbreviations
Abbreviation  Equiv. Pattern   Matches
\d            [0-9]            a digit
\D            [^0-9]           not a digit
\w            [A-Za-z_0-9]     a word char.
\W            [^A-Za-z_0-9]    not a word char.
\s            [ \r\t\n\f]      a whitespace char.
\S            [^ \r\t\n\f]     not a whitespace char.
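A quick sketch of the abbreviations applied to one sample string (the string is an assumption). Note that the underscore counts as a word character.

```javascript
var sample = "Room 42, Floor_3";

var digits = sample.match(/\d/g);   // ["4", "2", "3"]
var nonWord = sample.match(/\W/g);  // [" ", ",", " "]: "_" is a word character
var spaces = sample.match(/\s/g);   // [" ", " "]
```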
CS346 Regular Expressions 13
From Chapter 25 of text - Perl
Symbol  Matches                        Symbol  Matches
^       Beginning of line              \d      Digit (i.e., 0 to 9)
$       End of line                    \D      Nondigit
\b      Word boundary                  \s      Whitespace
\B      Nonword boundary               \S      Nonwhitespace
\w      Word (alphanumeric) character  \n      Newline
\W      Nonword character              \t      Tab
Fig. 25.9 Some of Perl's metacharacters.
Note the difference between the usage of ^ here (beginning of line) and in a character class (negation)
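The word boundary \b also works in JavaScript. As a sketch, it can restrict the earlier /our/ example to the whole word; the sample sentences are assumptions.

```javascript
var anywhere = /our/;       // matches "our" inside longer words too
var wholeWord = /\bour\b/;  // matches only when "our" stands as a word

var insideWord = anywhere.test("pour it out");         // true
var notWhole = wholeWord.test("pour it out");          // false: "our" is part of "pour"
var isWhole = wholeWord.test("this is our house");     // true
```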
CS346 Regular Expressions 14
Quantifiers
Quantifiers in braces - Repetitions
Quantifier  Meaning
{n}         exactly n repetitions
{m,}        at least m repetitions
{min,max}   at least min but at most max repetitions allowed
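The brace quantifiers can be sketched as follows. The patterns are anchored with ^ and $ so that the whole string must fit; the phone pattern is a shorter rewrite of the (XXX) XXX-XXXX check, and the sample inputs are assumptions.

```javascript
var exactlyThree = /^\d{3}$/;   // exactly 3 digits
var atLeastTwo = /^\d{2,}$/;    // 2 or more digits
var twoToFour = /^\d{2,4}$/;    // between 2 and 4 digits

var q1 = exactlyThree.test("123");   // true
var q2 = exactlyThree.test("1234");  // false
var q3 = atLeastTwo.test("1");       // false
var q4 = atLeastTwo.test("12345");   // true
var q5 = twoToFour.test("123");      // true
var q6 = twoToFour.test("12345");    // false

// Same meaning as /^\(\d\d\d\) \d\d\d-\d\d\d\d$/, written with braces
var phoneRegex = /^\(\d{3}\) \d{3}-\d{4}$/;
var phoneOk = phoneRegex.test("(555) 123-4567"); // true
```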
CS346 Regular Expressions 15
Some other common Quantifiers
*  zero or more repetitions, e.g., \d* means zero or more digits
+  one or more repetitions, e.g., \d+ means one or more digits
?  zero or one, e.g., \d? means zero or one digit
.  exactly one character except the newline character, e.g., /.l/ matches "al" or "@l" but not "\nl" nor "l" alone
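A brief sketch of these symbols in action. The patterns are anchored so the whole string must match, and the inputs are assumptions chosen to show the boundary cases.

```javascript
var zeroOrMore = /^\d*$/.test("");    // true: zero digits is allowed
var oneOrMore = /^\d+$/.test("");     // false: at least one digit is required
var optionalOne = /^\d?$/.test("7");  // true: one digit
var tooMany = /^\d?$/.test("77");     // false: more than one digit

var dotLetter = /.l/.test("al");      // true
var dotSymbol = /.l/.test("@l");      // true
var dotNewline = /.l/.test("\nl");    // false: "." does not match newline
var dotNothing = /.l/.test("l");      // false: "." needs one character before "l"
```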
CS346 Regular Expressions 16
Anchors
The pattern can be forced to match only at the left end with ^; at the end with $
e.g., /^Lee/ matches "Lee Ann" but not "Mary Lee Ann"
/Lee Ann$/ matches "Mary Lee Ann", but not "Mary Lee Ann is nice"
The anchor operators (^ and $) do not match characters in the string--they match positions, at the beginning or end
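The anchor examples above can be checked directly. The last pattern, which combines both anchors so that the whole string must match, is an added illustration.

```javascript
var atStart = /^Lee/;
var atEnd = /Lee Ann$/;
var wholeString = /^Lee Ann$/; // both anchors: nothing may come before or after

var a1 = atStart.test("Lee Ann");             // true
var a2 = atStart.test("Mary Lee Ann");        // false
var a3 = atEnd.test("Mary Lee Ann");          // true
var a4 = atEnd.test("Mary Lee Ann is nice");  // false
var a5 = wholeString.test("Lee Ann");         // true
var a6 = wholeString.test("Mary Lee Ann");    // false
```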
CS346 Regular Expressions 17
Examples
test() method of RegExp
See 16-1checkURL.html
See 16-2validEmail.html
search() method of String
See 16-3check_phone.html
CS346 Regular Expressions 18
replace() method
replace(RE_pattern, string)
Finds a substring that matches the pattern and replaces it with the string; the g modifier is applicable
var str = "Some rabbits are rabid";
str = str.replace(/rab/g, "tim");
str is now "Some timbits are timid"
Substrings matched by parenthesized parts of the pattern can be referenced in the replacement string as $1, $2, etc.; $& refers to the whole match (here "rab", both times)
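A sketch of replace() with and without parenthesized parts. The name-swapping example is an assumption added to show $1 and $2 in the replacement string; note that replace() returns a new string and leaves the original untouched.

```javascript
var str = "Some rabbits are rabid";
var replaced = str.replace(/rab/g, "tim");
// replaced: "Some timbits are timid"; str itself is unchanged

// Hypothetical example: swap "Last, First" into "First Last"
var name = "Lee, Ann";
var swapped = name.replace(/(\w+), (\w+)/, "$2 $1");
// swapped: "Ann Lee" ($1 is "Lee", $2 is "Ann")
```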
CS346 Regular Expressions 19
match(pattern)
Most general pattern-matching method
Returns an array of results of the pattern-matching operation
With the g modifier, returns an array of the substrings that matched
Without the g modifier, the first element of the returned array has the matched substring, and the other elements have the values of $1, … obtained from the parenthesized parts of the pattern
var str = "My 3 kings beat your 2 aces";
var matches = str.match(/[ab]/g);
- matches is set to ["b", "a", "a"]
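To make the two modes concrete, here is a sketch using the same sentence. The patterns other than /[ab]/g are assumptions for illustration; the last line shows the null result when nothing matches.

```javascript
var str = "My 3 kings beat your 2 aces";

// With g: every matching substring
var allDigits = str.match(/\d/g);      // ["3", "2"]

// Without g: the match, then the parenthesized parts
var first = str.match(/(\d) (\w+)/);   // ["3 kings", "3", "kings"]
var whereFound = first.index;          // 3: position of the match in str

// No match at all: null, not an empty array
var none = str.match(/z/);             // null
```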
CS346 Regular Expressions 20
match(pattern) example
16-4matchExample.html
var str = "Having a take-home exam that takes 3 hours to complete is better than a 1-hour in-class exam";
var matches = str.match( /\d/g );
matches is set to ["3", "1"]
CS346 Regular Expressions 21
Parentheses in RE
Example: 16-5complexMatchEx.html
var str = "I have 118 credits; but I need 120 to graduate";
matches = str.match(/(\d+)([^\d]+)(\d+)/);
document.write(matches, "<br />");
1st element of matches is the match, 2nd is the value of $1, 3rd element $2, 4th element $3 etc.
matches array, as written to the page: 118 credits; but I need 120,118, credits; but I need ,120
matches[0] (match with RE): "118 credits; but I need 120"
matches[1] ($1): "118"
matches[2] ($2): " credits; but I need "
matches[3] ($3): "120"
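One possible use of the parenthesized parts, sketched without document.write so that it also runs outside a browser. The variable names have, need and remaining are assumptions; the captured values are strings and are converted with Number() before the subtraction.

```javascript
var str = "I have 118 credits; but I need 120 to graduate";
var matches = str.match(/(\d+)([^\d]+)(\d+)/);

var have = Number(matches[1]);   // 118, from $1
var need = Number(matches[3]);   // 120, from $3
var remaining = need - have;     // 2 credits still to earn
```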
CS346 Regular Expressions 22
Alternate patterns
Use the alternation operator |
Example: 16-6matchAlternatives.html
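A minimal sketch of alternation; the patterns and test words are assumptions and are not taken from 16-6matchAlternatives.html. Parentheses limit how far the | reaches.

```javascript
var pet = /cat|dog/;          // either "cat" or "dog", anywhere in the string
var color = /^gr(a|e)y$/;     // "gray" or "grey": | applies only inside the parentheses

var p1 = pet.test("hotdog");  // true: contains "dog"
var p2 = pet.test("bird");    // false
var c1 = color.test("gray");  // true
var c2 = color.test("grey");  // true
var c3 = color.test("groy");  // false
```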
CS346 Regular Expressions 23
split(parameter) of String
splits a string into substrings based on a pattern
":" and /:/ both work as the parameter
Example: 16-7splitEx.html
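A short sketch of split() with a string and with a pattern. The sample strings are assumptions; the last case shows something only a pattern can do, splitting on more than one kind of separator.

```javascript
var record = "root:x:0:0"; // made-up colon-separated record

var byString = record.split(":");  // ["root", "x", "0", "0"]
var byPattern = record.split(/:/); // same result

// A pattern can accept several separators and swallow surrounding spaces
var mixed = "a, b;c".split(/\s*[,;]\s*/); // ["a", "b", "c"]
```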
CS346 Regular Expressions 24
Program Structure
Example 16-3check_phone.html
Limitations?
How can you make it more flexible?
Can you generalize it for checking multiple fields?
CS346 Regular Expressions 25
Uniform Program Structure for multiple tests
regex_name.test( string_to_be_tested ) to test each field
if test() returns false, compile an error message
See 16-8Structure.html
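One way such a uniform structure might look is sketched below. It is not the content of 16-8Structure.html: the field names, patterns, messages and the validate() function are all hypothetical. Each field gets one rule, every rule is checked with test(), and the error messages are collected in one place.

```javascript
// Hypothetical rules: one pattern and one error message per field
var rules = [
  { field: "phone", pattern: /^\(\d{3}\) \d{3}-\d{4}$/, message: "Phone must look like (XXX) XXX-XXXX" },
  { field: "zip", pattern: /^\d{5}$/, message: "ZIP code must be 5 digits" }
];

// Tests every field and returns the list of error messages (empty if all pass)
function validate(values) {
  var errors = [];
  rules.forEach(function (rule) {
    if (!rule.pattern.test(values[rule.field])) {
      errors.push(rule.message);
    }
  });
  return errors;
}

var errors = validate({ phone: "(555) 123-4567", zip: "1234" });
// errors: ["ZIP code must be 5 digits"]
```

Adding another field then only means adding another entry to rules; the testing loop stays the same.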
Examples of curly braces { }
16-9-curly_braces.html
CS346 Regular Expressions 26
CS346 Regular Expressions 27
Table – Regular Expression Codes
See “Regular Expression Codes.doc”