Regular Expressions and Their Usages in Web User Inputs By Tom Xian

Upload: chandra

Post on 17-Nov-2015


  • Regular Expressions and Their Usages in Web User Inputs

    By Tom Xian

  • Points of Regular Expressions

    What is a regular expression?
    A pattern that describes a set of text strings.

    Examples:
    Phone number: 408-376-6280
        \d{3}-\d{3}-\d{4} or (\d{3}-){2}\d{4}
    Email: [email protected]
        \w+@\w+(\.\w+)*
    IP address: 192.169.0.33
        ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
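The three example patterns above can be tried out with a minimal Java sketch, assuming java.util.regex (Java 1.4 or later). The class name FirstPatternsDemo, the helper isWhole, and the sample address someone@example.com are illustrative and not from the slides. The IP pattern is written with the outer group's opening parenthesis spelled out.

```java
import java.util.regex.Pattern;

// Illustrative sketch: the class, field, and method names are hypothetical.
final class FirstPatternsDemo {
    static final Pattern PHONE = Pattern.compile("(\\d{3}-){2}\\d{4}");
    static final Pattern EMAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)*");

    // One number from 0 to 255, as written on the slide.
    private static final String OCTET = "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
    static final Pattern IPV4 = Pattern.compile("(" + OCTET + "\\.){3}" + OCTET);

    // True only when the whole input matches the pattern.
    static boolean isWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWhole(PHONE, "408-376-6280"));        // true
        System.out.println(isWhole(EMAIL, "someone@example.com")); // true
        System.out.println(isWhole(IPV4, "192.169.0.33"));         // true
        System.out.println(isWhole(IPV4, "192.169.0.333"));        // false
    }
}
```

In Java source, every backslash in a pattern has to be doubled inside the string literal.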

  • Points of Regular Expressions

    Supported by many programming languages, including Java, C++, Perl, Python, JavaScript, and .NET.
    Build string patterns quickly and precisely.
    Excellent for validating user input in Web/HTML.
    Support Unicode.

  • Pattern Matching

    Simply put, a string matches a pattern (expression) when at least one of its substrings matches it.
    "The cat captured a mouse yesterday" contains the pattern cap.
    Gr[ae]y will match Gray or Grey, but not Graey or Graay.
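A short Java sketch of the difference between finding a pattern somewhere in a string and matching the whole string. It assumes java.util.regex; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative sketch: names are not from the slides.
final class SubstringMatchDemo {
    // True when some substring of text matches the regex.
    static boolean contains(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // True only when the entire text matches the regex.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        String sentence = "The cat captured a mouse yesterday";
        System.out.println(contains("cap", sentence));        // true
        System.out.println(matchesWhole("cap", sentence));    // false
        System.out.println(matchesWhole("Gr[ae]y", "Gray"));  // true
        System.out.println(matchesWhole("Gr[ae]y", "Grey"));  // true
        System.out.println(matchesWhole("Gr[ae]y", "Graey")); // false
    }
}
```

For validating user input, the whole-string form is usually the one wanted, since a substring match would accept extra characters around the valid part.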

  • Regex Engine Internals

    A regex engine is a piece of software that matches a regular expression against a text string.
    A regex-directed engine always returns the leftmost match.
    Example: He captured a catfish for his cat.
    When cat is applied as the expression, the first match is the cat inside catfish, not the final word cat.
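The leftmost-match behaviour can be observed with Matcher.find(), which reports where each match starts. This is a small sketch assuming java.util.regex; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: names are not from the slides.
final class LeftmostMatchDemo {
    // Index of the first (leftmost) match, or -1 if there is none.
    static int firstMatchStart(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.start() : -1;
    }

    // Number of non-overlapping matches, scanning left to right.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "He captured a catfish for his cat.";
        System.out.println(firstMatchStart("cat", text)); // 14, the start of "catfish"
        System.out.println(text.indexOf("catfish"));      // 14
        System.out.println(countMatches("cat", text));    // 2
    }
}
```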

  • Character Sets and Meta Characters

    Brackets: [abc] represents any one of the characters inside the brackets.
    Meta characters inside brackets: -, \, ^, and ]
    Use - inside the brackets for a range:
        [0-9], any single digit
        [a-zA-Z], any letter
    Use ^ at the start of the brackets to negate the set:
        [^0-9], any character except a digit
    \w, a word character: [a-zA-Z0-9_]
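A brief Java sketch of the bracket rules above, assuming java.util.regex. The class and helper names are hypothetical, and the [a\-z] case is an added illustration of escaping a meta character inside brackets.

```java
import java.util.regex.Pattern;

// Illustrative sketch: names are not from the slides.
final class CharacterSetDemo {
    // True when the whole input (here a single character) matches the set.
    static boolean isOne(String set, String ch) {
        return Pattern.matches(set, ch);
    }

    public static void main(String[] args) {
        System.out.println(isOne("[abc]", "b"));    // true
        System.out.println(isOne("[abc]", "d"));    // false
        System.out.println(isOne("[0-9]", "7"));    // true
        System.out.println(isOne("[a-zA-Z]", "Q")); // true
        System.out.println(isOne("[^0-9]", "x"));   // true
        System.out.println(isOne("[^0-9]", "5"));   // false
        System.out.println(isOne("\\w", "_"));      // true
        // An escaped hyphen is a literal character, not a range.
        System.out.println(isOne("[a\\-z]", "-"));  // true
        System.out.println(isOne("[a\\-z]", "m"));  // false
    }
}
```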

  • Character Sets and Meta Characters

    The famous ?, +, and *
    ? makes the item before it optional.
        Colou?r, for Colour or Color
        (021-)?32174568, for 021-32174568 or 32174568
    + requires at least one occurrence of the item before it.
        1+, for 1, 11, 111, ...
    * allows zero or more occurrences of the item before it.
        A*, for the empty string, A, AA, AAA, ...
    \W, a non-word character: [^a-zA-Z0-9_]
    Dot (.) represents any single character except the new-line character (\n on the Unix family, \r\n on Windows): [^\n] or [^\r\n]
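The quantifiers and the dot can be checked with a short Java sketch, assuming java.util.regex. The names are hypothetical, and the Pattern.DOTALL case is an addition showing how Java lets the dot match a new-line when asked to.

```java
import java.util.regex.Pattern;

// Illustrative sketch: names are not from the slides.
final class QuantifierDemo {
    static boolean whole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(whole("Colou?r", "Color"));            // true
        System.out.println(whole("Colou?r", "Colour"));           // true
        System.out.println(whole("Colou?r", "Colouur"));          // false
        System.out.println(whole("(021-)?32174568", "32174568")); // true
        System.out.println(whole("1+", ""));                      // false
        System.out.println(whole("A*", ""));                      // true
        System.out.println(whole("a.c", "abc"));                  // true
        System.out.println(whole("a.c", "a\nc"));                 // false
        // With DOTALL, the dot also matches line terminators.
        System.out.println(
            Pattern.compile("a.c", Pattern.DOTALL).matcher("a\nc").matches()); // true
    }
}
```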

  • Character Sets and Meta Characters

    Anchors: ^ and $
    ^ matches at the beginning of the string.
        ^a matches a, ab, or aaa
    $ matches at the end, right after the last character in the string.
        x$ matches the word relax, not boxes
    Word boundaries: \b
        \beBay\b matches eBay as a single word, not eBays
    Alternation: |
        (live|die)
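Anchors and word boundaries only show their effect when searching inside a longer string, so this Java sketch uses Matcher.find(). It assumes java.util.regex; the names and the sample sentence are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative sketch: names are not from the slides.
final class AnchorDemo {
    // True when the regex matches somewhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^a", "ab"));                     // true
        System.out.println(found("^a", "ba"));                     // false
        System.out.println(found("x$", "relax"));                  // true
        System.out.println(found("x$", "boxes"));                  // false
        System.out.println(found("\\beBay\\b", "I sell on eBay.")); // true
        System.out.println(found("\\beBay\\b", "eBays"));          // false
        System.out.println(Pattern.matches("(live|die)", "die"));  // true
    }
}
```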

  • Character Sets and Meta Characters

    Repetition {n,m}, besides + and *
        d{3}, for ddd
        d{1,3}, for d, dd, or ddd, not dddd
        {0,} is the same as *
        {1,} is the same as +
    Example: telephone number with the pattern ddd-dddd or ddddddd
        (\d{3}-\d{4})|\d{7}
    Example: email of the form someone@somewhere
        \w+@\w+
    More generically:
        ([a-zA-Z0-9]+)(\.[a-zA-Z0-9]+)*@([a-zA-Z0-9]+)(\.[a-zA-Z0-9]+)*
        \w+(\.\w+)*@\w+(\.\w+)*
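A small Java sketch of the {n,m} forms and the two example patterns, assuming java.util.regex. The class name, constants, and sample inputs are hypothetical; the email pattern is written without spaces around the @ sign.

```java
import java.util.regex.Pattern;

// Illustrative sketch: names and sample inputs are not from the slides.
final class RepetitionDemo {
    static final String PHONE = "(\\d{3}-\\d{4})|\\d{7}";
    static final String EMAIL = "\\w+(\\.\\w+)*@\\w+(\\.\\w+)*";

    static boolean whole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(whole("d{1,3}", "ddd"));   // true
        System.out.println(whole("d{1,3}", "dddd"));  // false
        System.out.println(whole("d{0,}", ""));       // true, same as d*
        System.out.println(whole(PHONE, "376-6280")); // true
        System.out.println(whole(PHONE, "3766280"));  // true
        System.out.println(whole(PHONE, "37-66280")); // false
        System.out.println(whole(EMAIL, "first.last@mail.example.com")); // true
        System.out.println(whole(EMAIL, "first.last@"));                 // false
    }
}
```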

  • Predefined Character Classes

    .     Any character (may or may not match line terminators)
    \d    A digit: [0-9]
    \D    A non-digit: [^0-9]
    \s    A whitespace character: [ \t\n\x0B\f\r]
    \S    A non-whitespace character: [^\s]
    \w    A word character: [a-zA-Z_0-9]
    \W    A non-word character: [^\w]
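The predefined classes are handy for cleaning input before validating it. The sketch below uses String.replaceAll and String.split, which accept regular expressions (Java 1.4 or later). The class, method names, and sample strings are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch: names and sample inputs are not from the slides.
final class PredefinedClassDemo {
    // Drops every non-digit (\D).
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace (\s+) after trimming the ends.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // Drops every non-word character (\W).
    static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("408-376-6280"));             // 4083766280
        System.out.println(Arrays.toString(words("  a  b\tc\n")));  // [a, b, c]
        System.out.println(stripNonWord("user_name-01!"));          // user_name01
    }
}
```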

  • Using Back-reference

    In Perl, the text matched by a parenthesized group can be reused later in the same expression with \1 through \9.
    Example: ([a-c])x\1x\1 will match axaxa, bxbxb, and cxcxc.
    Here \1 stands for whatever the first group, ([a-c]), matched.
    ([a-z])x([0-9])y\1y\2 will match ax0yay0, but not ax0yby3.
        \1 repeats the text matched by ([a-z])
        \2 repeats the text matched by ([0-9])
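Java's java.util.regex supports the same \1 to \9 syntax, so the behaviour can be checked with a short sketch. The class and method names are hypothetical. The last two checks show that a back-reference repeats the captured text itself, not the sub-pattern.

```java
import java.util.regex.Pattern;

// Illustrative sketch: names are not from the slides.
final class BackReferenceDemo {
    static boolean whole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String one = "([a-c])x\\1x\\1";
        System.out.println(whole(one, "axaxa")); // true
        System.out.println(whole(one, "cxcxc")); // true
        System.out.println(whole(one, "axbxc")); // false: \1 must repeat "a"

        String two = "([a-z])x([0-9])y\\1y\\2";
        System.out.println(whole(two, "ax0yay0")); // true
        System.out.println(whole(two, "ax0yby3")); // false: "b" and "3" differ from the captures
    }
}
```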

  • Case Studies

    Someone's English name
    Definition: first name (middle name) last name (Sr.|Jr.)
    Example: George W. Bush
        [A-Z]\w*([ \t]+([A-Z]\w*|[A-Z]\.))?[ \t]+[A-Z]\w*([ \t]+(Sr\.|Jr\.))?

    Credit card number
    Definition: dddd-dddd-dddd-dddd or 16 digits
    Expression: ((\d{4}-){3}\d{4})|(\d{16})
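A Java sketch of both case studies, assuming java.util.regex. The names and sample inputs are hypothetical. The name pattern here is a variant in which the whitespace before the optional middle name and before the optional suffix belongs to the optional group, so that a plain "First Last" with a single space is accepted; treat it as one possible reading of the definition rather than the only one.

```java
import java.util.regex.Pattern;

// Illustrative sketch: names and sample inputs are not from the slides.
final class NameAndCardDemo {
    // Variant: separators are grouped with the optional parts they precede.
    static final Pattern NAME = Pattern.compile(
        "[A-Z]\\w*([ \\t]+([A-Z]\\w*|[A-Z]\\.))?[ \\t]+[A-Z]\\w*([ \\t]+(Sr\\.|Jr\\.))?");

    static final Pattern CARD = Pattern.compile("((\\d{4}-){3}\\d{4})|(\\d{16})");

    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    static boolean isCard(String s) {
        return CARD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isName("George W. Bush"));      // true
        System.out.println(isName("George Bush"));         // true
        System.out.println(isName("George Bush Jr."));     // true
        System.out.println(isName("george bush"));         // false
        System.out.println(isCard("1234-5678-9012-3456")); // true
        System.out.println(isCard("1234567890123456"));    // true
        System.out.println(isCard("1234-56789012-3456"));  // false
    }
}
```

The card pattern checks the shape only; it does not verify that the number is a valid card number.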

  • Case Studies

    Birth date
    Definition: (m)m/(d)d/yyyy
    Example: 02/29/1964
    Month: 1, 2, ..., 9, 10, 11, 12, 01, 02, etc.
        1[0-2]|0?[1-9]
    Day: 1, 2, ..., 9, 10, 11, ..., 31, 01, 02, ...
        [1-9]|0?[1-9]|1[0-9]|2[0-9]|3[01]
        which simplifies to 0?[1-9]|[12][0-9]|3[01]
    Practical birth year, for someone born after BC:
        000\d|00\d\d|0\d\d\d|1\d\d\d|200[0-5]
        which simplifies to 0\d{3}|1\d{3}|200[0-5]
        and then to [01]\d{3}|200[0-5]
    Final version:

    (1[0-2]|0?[1-9])/(0?[1-9]|[12][0-9]|3[01])/([01]\d{3}|200[0-5])
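The final pattern checks the shape of a date, not whether the day exists in that month. The Java sketch below pairs the pattern with a calendar check. It assumes Java 8 or later for java.time.LocalDate, which is newer than the 1.4 API the slides target; the class and method names are hypothetical.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: names are not from the slides.
final class BirthDateDemo {
    static final Pattern BIRTH_DATE = Pattern.compile(
        "(1[0-2]|0?[1-9])/(0?[1-9]|[12][0-9]|3[01])/([01]\\d{3}|200[0-5])");

    // True when the input has the (m)m/(d)d/yyyy shape accepted by the pattern.
    static boolean hasDateShape(String s) {
        return BIRTH_DATE.matcher(s).matches();
    }

    // True when the input has the right shape and names a real calendar day.
    static boolean isRealDate(String s) {
        Matcher m = BIRTH_DATE.matcher(s);
        if (!m.matches()) {
            return false;
        }
        int month = Integer.parseInt(m.group(1));
        int day = Integer.parseInt(m.group(2));
        int year = Integer.parseInt(m.group(3));
        try {
            LocalDate.of(year, month, day); // throws for days such as February 31
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasDateShape("02/29/1964")); // true
        System.out.println(isRealDate("02/29/1964"));   // true, 1964 is a leap year
        System.out.println(hasDateShape("02/31/1990")); // true, the shape is fine
        System.out.println(isRealDate("02/31/1990"));   // false
        System.out.println(hasDateShape("13/01/1990")); // false
        System.out.println(hasDateShape("12/31/2006")); // false, the year is above 2005
    }
}
```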

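  • Case Studies

    IPv4 address in dot notation
    Definition: 0-255.0-255.0-255.0-255
    Example: 192.168.0.24
    0-255: \d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5]
    Overall:
        (\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\.(\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])){3}
    Using a back-reference:
        (\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\.\1){3}
        This shorter form only matches addresses whose four parts are identical, such as 10.10.10.10, because \1 repeats the captured text rather than the pattern.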
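A Java sketch of the IPv4 case study, assuming java.util.regex. Building the pattern from a string constant is one way to avoid typing the 0-255 sub-pattern four times. The names are hypothetical, and the 0-255 sub-pattern is wrapped in its own group so that each dot applies to a whole number. The back-reference form is included to show what it accepts.

```java
import java.util.regex.Pattern;

// Illustrative sketch: names are not from the slides.
final class Ipv4Demo {
    // One number from 0 to 255, grouped so it can be reused as a unit.
    private static final String OCTET = "(\\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])";

    static final Pattern IPV4 = Pattern.compile(OCTET + "(\\." + OCTET + "){3}");

    // Back-reference form: \1 repeats the text of the first number.
    static final Pattern SAME_OCTETS = Pattern.compile(OCTET + "(\\.\\1){3}");

    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.0.24")); // true
        System.out.println(isIpv4("256.1.1.1"));    // false
        System.out.println(isIpv4("1.2.3"));        // false
        System.out.println(SAME_OCTETS.matcher("10.10.10.10").matches());  // true
        System.out.println(SAME_OCTETS.matcher("192.168.0.24").matches()); // false
    }
}
```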

  • Case Studies

    URL
    Definition: protocol://host_string[:port]/URI
    The port has a value range from 1 to 65535.
    Example: http://cgi5.ebay.com/ws/isapi.dll?sellitem
    Protocol: http|ftp|rmi|t3|https
    Host_string: \w+(\.\w+)*
    Port number: 6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5]|[1-5]?\d{1,4}
    URI: (/[a-zA-Z_0-9\?\.%&=@]+)*
    Overall:

    (http|ftp|rmi|t3|https)://\w+(\.\w+)*(:(6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5]|[1-5]?\d{1,4}))?(/[a-zA-Z_0-9\?\.%&=@]+)*
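A Java sketch that assembles the URL pattern from its parts, assuming java.util.regex. The names and the example.com inputs are hypothetical. In this sketch the port alternatives are wrapped in one group after the colon, and the 65000 to 65535 range is covered by 65[0-4]\d{2}, 655[0-2]\d, and 6553[0-5]. The last alternative, [1-5]?\d{1,4}, still lets 0 and leading zeros through, so a stricter check may be wanted in practice.

```java
import java.util.regex.Pattern;

// Illustrative sketch: names and sample inputs are not from the slides.
final class UrlDemo {
    static final String PROTOCOL = "(http|ftp|rmi|t3|https)";
    static final String HOST = "\\w+(\\.\\w+)*";
    static final String PORT =
        "(6[0-4]\\d{3}|65[0-4]\\d{2}|655[0-2]\\d|6553[0-5]|[1-5]?\\d{1,4})";
    static final String PATH = "(/[a-zA-Z_0-9\\?\\.%&=@]+)*";

    // protocol://host_string[:port]/URI
    static final Pattern URL =
        Pattern.compile(PROTOCOL + "://" + HOST + "(:" + PORT + ")?" + PATH);

    static boolean isUrl(String s) {
        return URL.matcher(s).matches();
    }

    static boolean isPort(String s) {
        return Pattern.matches(PORT, s);
    }

    public static void main(String[] args) {
        System.out.println(isUrl("http://cgi5.ebay.com/ws/isapi.dll?sellitem")); // true
        System.out.println(isUrl("https://example.com:8080/index"));             // true
        System.out.println(isUrl("http://example.com:65536"));                   // false
        System.out.println(isUrl("gopher://example.com"));                       // false
        System.out.println(isPort("65100"));                                     // true
        System.out.println(isPort("65535"));                                     // true
    }
}
```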

  • Java API

    http://java.sun.com/j2se/1.4.2/docs/api/index.html
    java.util.regex.Pattern
    java.util.regex.Matcher

  • Sample Java Code using v1.4

    import java.util.regex.*;

    public class regExp {
        // Define the patterns.
        // Phone: ddddddd or ddd-ddd-dddd
        private final static Pattern phonePattern =
            Pattern.compile("\\d{7}|(\\d{3}-){2}\\d{4}");
        // Email: someone@somewhere, [email protected] ...
        private final static Pattern emailPattern =
            Pattern.compile("\\w+(\\.\\w+)*@\\w+(\\.\\w+)*");

        // Returns true when the whole string is a phone number.
        static boolean isPhone(String testString) {
            if (testString == null)
                return false;
            Matcher m = phonePattern.matcher(testString);
            return m.matches();
        }

        // Returns true when the whole string matches the given pattern.
        static boolean isMatchedPattern(Pattern pat, String testString) {
            if (pat == null || testString == null)
                return false;
            Matcher m = pat.matcher(testString);
            return m.matches();
        }

        public static void main(String[] args) {
            if (args.length == 0) {
                System.out.println("No arg.");
            }
            else if (args.length == 1) {
                // args[0] is the test string, checked against both patterns
                if (isPhone(args[0]))
                    System.out.println("matched phone number pattern");
                else
                    System.out.println("not matched phone number pattern");

                if (isMatchedPattern(emailPattern, args[0]))
                    System.out.println("matched email pattern");
                else
                    System.out.println("not matched email pattern");
            }
            else if (args.length == 2) {
                // args[0] is pattern, args[1] is test string
                if (Pattern.matches(args[0], args[1]))
                    System.out.println(args[1] + " is matched pattern," + args[0]);
                else
                    System.out.println(args[1] + " is not matched pattern," + args[0]);
            }
        }
    }