java regular expression part i

https://www.facebook.com/Oxus20 [email protected] Java Regular Expression PART I » String Manipulation » Matching / Validating » Extracting / Capturing » Modifying / Substitution Abdul Rahman Sherzad

Upload: oxus-20

Post on 27-Jun-2015


Category:

Education



DESCRIPTION

Regular Expressions (Regex) are powerful and convenient for string manipulation, e.g. matching and validation, extracting and capturing, modifying and substitution. This presentation covers Regular Expressions with real-world examples and demos.

TRANSCRIPT

Page 1: Java Regular Expression PART I

https://www.facebook.com/Oxus20

[email protected] Java

Regular Expression

PART I

» String Manipulation

» Matching / Validating

» Extracting / Capturing

» Modifying / Substitution

Abdul Rahman Sherzad

Page 2: Java Regular Expression PART I

Agenda

» What is Regular Expression

» Regular Expression Syntax

˃ Character Classes

˃ Quantifiers

˃ Meta Characters.

» Basic Expression Example

» Basic Grouping Example

» Matching / Validating

» Extracting/Capturing

» Modifying/Substitution


Page 3: Java Regular Expression PART I

What are Regular Expressions?

» Regular Expressions are a language of string patterns built into most modern programming languages, including Perl, PHP, .NET, and Java (1.4 onward).

» A regular expression defines a search pattern for strings.

» Regular expressions can be used to search, edit and manipulate text.

» The abbreviation for Regular Expression is Regex.

Page 4: Java Regular Expression PART I

Regular Expression Syntax

» Regular Expressions, by definition, are string patterns that describe text.

» These descriptions can then be used in many ways, such as searching, validating, extracting and replacing text.

» The basic language constructs include
˃ Character Classes
˃ Quantifiers
˃ Meta Characters

Page 5: Java Regular Expression PART I

Character Classes

Character Class   Explanation and Alternatives
.                 Matches any character (may or may not match line terminators)
\d                Matches a digit; an alternative for [0-9]
\D                Matches a non-digit character; an alternative for [^0-9]
\s                Matches a whitespace character; an alternative for [ \t\n\x0B\f\r]
\S                Matches a non-whitespace character; an alternative for [^\s]
\w                Matches a word character; an alternative for [a-zA-Z_0-9]
\W                Matches a non-word character; an alternative for [^\w]

NOTE: inside a Java string literal, each backslash "\" must itself be escaped, i.e. the regex \d is written as "\\d".
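The character classes above can be tried one at a time with String.matches, which succeeds only when the whole string matches. This is a minimal sketch; the class name CharacterClassDemo, the helper isSingle and the sample inputs are made up for illustration and are not part of the original slides.

```java
class CharacterClassDemo {

    // True when the whole input is exactly one character matched by the given class.
    static boolean isSingle(String characterClass, String input) {
        return input.matches(characterClass);
    }

    public static void main(String[] args) {
        // Every backslash is doubled because it appears inside a Java string literal.
        System.out.println(isSingle("\\d", "7")); // true:  "7" is a digit
        System.out.println(isSingle("\\D", "7")); // false: "7" is not a non-digit
        System.out.println(isSingle("\\s", " ")); // true:  a space is whitespace
        System.out.println(isSingle("\\w", "_")); // true:  underscore is a word character
        System.out.println(isSingle("\\W", "_")); // false: underscore is not a non-word character
        System.out.println(isSingle(".", "x"));   // true:  "." matches any single character
    }
}
```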

Page 6: Java Regular Expression PART I

Quantifiers

Quantifier   Explanation and Alternatives
*            Match zero or more times; an alternative for {0,}
+            Match one or more times; an alternative for {1,}
?            Match zero or one time; an alternative for {0,1}
{n}          Match exactly n times
{n,}         Match at least n times
{n,m}        Match at least n but not more than m times

Quantifiers specify how many times the preceding part of a pattern must match or repeat.
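As a rough illustration of the quantifier table, the sketch below applies each quantifier to the single letter "a". The class name QuantifierDemo and the helper matches are hypothetical names chosen for this example only.

```java
class QuantifierDemo {

    // True when the whole input matches the regex (String.matches semantics).
    static boolean matches(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("a*", ""));          // true:  zero occurrences are allowed
        System.out.println(matches("a+", ""));          // false: at least one "a" is required
        System.out.println(matches("a+", "aaa"));       // true
        System.out.println(matches("a?", "aa"));        // false: at most one "a" is allowed
        System.out.println(matches("a{3}", "aaa"));     // true:  exactly three
        System.out.println(matches("a{3,}", "aa"));     // false: fewer than three
        System.out.println(matches("a{2,4}", "aaaa"));  // true:  between two and four
        System.out.println(matches("a{2,4}", "aaaaa")); // false: more than four
    }
}
```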

Page 7: Java Regular Expression PART I

Meta Characters

Meta Character   Explanation
\                Escape the next meta-character (it becomes a normal / literal character)
^                Match the beginning of the line
.                Match any character (except newline)
$                Match the end of the line (or before newline at the end)
|                Alternation ('or' statement)
()               Grouping
[]               Custom character class

Meta-characters are used to group, divide, and perform special operations in patterns.
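The following sketch exercises each meta-character from the table once. The class name MetaCharacterDemo and the helper found are assumptions made for illustration; found uses Matcher.find(), which searches inside the text, whereas String.matches must match the whole string.

```java
import java.util.regex.Pattern;

class MetaCharacterDemo {

    // True when the regex matches somewhere inside the text (a search, not a whole-string match).
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println("3x14".matches("3.14"));     // true:  "." matches any character
        System.out.println("3x14".matches("3\\.14"));   // false: "\." matches only a literal dot
        System.out.println("dog".matches("cat|dog"));   // true:  alternation
        System.out.println("b".matches("[abc]"));       // true:  custom character class
        System.out.println("haha".matches("(ha)+"));    // true:  the group repeats as a unit
        System.out.println(found("^study", "I study")); // false: "study" is not at the beginning
        System.out.println(found("study$", "I study")); // true:  "study" is at the end
    }
}
```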

Page 8: Java Regular Expression PART I

Basic Expression: Example I

» A string that contains no meta-characters is itself a Regular Expression that matches only that exact text.

» For example, the string "I study English" is a regular expression that matches exactly the string "I study English" and nothing else.

» What if we want to match other subjects that we study? We can replace the word English with a character class expression that matches any subject. Example on next slide …
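A small sketch of literal matching follows. It also shows what happens when the text contains a meta-character such as "+", and how Pattern.quote can be used to treat such text literally. The class name LiteralPatternDemo and the helper equalsLiterally are hypothetical and not from the slides.

```java
import java.util.regex.Pattern;

class LiteralPatternDemo {

    // Treats the given text as a literal by quoting any meta-characters it contains.
    static boolean equalsLiterally(String input, String literal) {
        return input.matches(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        System.out.println("I study English".matches("I study English")); // true
        System.out.println("I study Math".matches("I study English"));    // false
        // "+" is a quantifier here, so the pattern means "one or more 1s, then a 1".
        System.out.println("1+1".matches("1+1"));                         // false
        System.out.println("111".matches("1+1"));                         // true
        System.out.println(equalsLiterally("1+1", "1+1"));                // true
    }
}
```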

Page 9: Java Regular Expression PART I

Basic Expression: Example II

"I study \\w+"

» The pattern "I study \\w+" uses both a character class and a quantifier.

» The character class "\w" says match a word character.

» The quantifier "+" says match one or more times.

» Now the pattern "I study \\w+" will match any single word in place of "English", e.g. "I study Programming", "I study Math", "I study Database", etc.

Page 10: Java Regular Expression PART I

Example II Demo

public class RegexBasicExampleII {
    public static void main(String[] args) {
        System.out.println("I study English".matches("I study \\w+")); // true
        System.out.println("I study Programming".matches("I study \\w+")); // true
        System.out.println("I study JAVA".matches("I study \\w+")); // true
        System.out.println("I study: JAVA".matches("I study \\w+")); // false
    }
}

Page 11: Java Regular Expression PART I

Example II Demo (Alternative)

public class RegexBasicExampleII {
    public static void main(String[] args) {
        System.out.println("I study English".matches("I study [a-zA-Z_0-9]+")); // true
        System.out.println("I study Programming".matches("I study [a-zA-Z_0-9]+")); // true
        System.out.println("I study JAVA".matches("I study [a-zA-Z_0-9]+")); // true
        System.out.println("I study: JAVA".matches("I study [a-zA-Z_0-9]+")); // false
    }
}

Page 12: Java Regular Expression PART I

Basic Expression: Example III

» But the pattern "I study \\w+" will not match "I study: English": the pattern expects a space right after "study", so the ":" character makes the match fail.

» If we want the expression to handle this situation, we need to make a small change as follows:

» "I study:? \\w+"

» Now the pattern "I study:? \\w+" will match "I study Programming" and also "I study: Programming".

Page 13: Java Regular Expression PART I

Example III Demo

public class RegexBasicExampleIII {
    public static void main(String[] args) {
        System.out.println("I study English".matches("I study:? \\w+")); // true
        System.out.println("I study Programming".matches("I study:? \\w+")); // true
        System.out.println("I study JAVA".matches("I study:? \\w+")); // true
        System.out.println("I study: JAVA".matches("I study:? \\w+")); // true
    }
}

Page 14: Java Regular Expression PART I

Basic Expression: Example IV

» The pattern "I study \\w+" also matches neither "i study English" nor "I Study English", because matching is case sensitive by default: the lowercase "i" is not equal to the uppercase "I" (and "S" is not equal to "s"), so the match fails.

» If we want the expression to ignore case, we need to make a small change as follows:

» "(?i)I study \\w+"

» Now the pattern "(?i)I study \\w+" will match both "I STUDY JAVA" and "i StUdY JAVA".

Page 15: Java Regular Expression PART I

Example IV Demo

public class RegexBasicExampleIV {
    public static void main(String[] args) {
        System.out.println("I study English".matches("(?i)I study \\w+")); // true
        System.out.println("i STUDY English".matches("(?i)I study \\w+")); // true
        System.out.println("I study JAVA".matches("(?i)I study \\w+")); // true
        System.out.println("i StUdY JAVA".matches("(?i)I study \\w+")); // true
    }
}
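As an alternative to embedding (?i) in the pattern text, the same effect can be obtained with the Pattern.CASE_INSENSITIVE flag when compiling the pattern. This is a sketch, not the approach used in the slides; the class name CaseInsensitiveFlagDemo and the method isStudySentence are hypothetical.

```java
import java.util.regex.Pattern;

class CaseInsensitiveFlagDemo {

    // Compiled once; the flag plays the same role as the embedded (?i) prefix.
    private static final Pattern STUDY =
            Pattern.compile("I study \\w+", Pattern.CASE_INSENSITIVE);

    // True when the whole input has the form "I study <word>", ignoring case.
    static boolean isStudySentence(String input) {
        return STUDY.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isStudySentence("I study English")); // true
        System.out.println(isStudySentence("i StUdY JAVA"));    // true
        System.out.println(isStudySentence("I study: JAVA"));   // false: ":" is not allowed by this pattern
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which String.matches does internally.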

Page 16: Java Regular Expression PART I

Regular Expression Basic Grouping

» An important feature of Regular Expressions is the ability to group sections of a pattern and provide alternate matches.

» The following two meta-characters are core parts of flexible Regular Expressions
˃ | Alternation ('or' statement)
˃ () Grouping

» Suppose we know exactly which subjects we are studying, and we want to match only those subjects and nothing else. The pattern is:

» "I study (Java|English|Programming|Math|Islamic|HTML)"

Page 17: Java Regular Expression PART I

Regular Expression Basic Grouping

» "I study (Java|English|Programming|Math|Islamic|HTML)"

» The new expression will now match the beginning of the string "I study", and then any one of the subjects in the group, which are separated by the alternation character "|"; any one of the following would be a match:
˃ Java
˃ English
˃ Programming
˃ Math
˃ Islamic
˃ HTML

Page 18: Java Regular Expression PART I

Basic Grouping Demo I (Case Sensitive)

public class BasicGroupingDemoI {
    public static void main(String[] args) {
        String pattern = "I study (Java|English|Programming|Math|Islamic|HTML)";
        System.out.println("I study English".matches(pattern)); // true
        System.out.println("I study Programming".matches(pattern)); // true
        System.out.println("I study Islamic".matches(pattern)); // true
        // english with lowercase letter "e" is not in our group
        System.out.println("I study english".matches(pattern)); // false
        // CSS is not in our group
        System.out.println("I study CSS".matches(pattern)); // false
    }
}

Page 19: Java Regular Expression PART I

Basic Grouping Demo I (Case Insensitive)

public class BasicGroupingDemoI {
    public static void main(String[] args) {
        String pattern = "(?i)I study (Java|English|Programming|Math|Islamic|HTML)";
        System.out.println("I study English".matches(pattern)); // true
        System.out.println("I study Programming".matches(pattern)); // true
        System.out.println("I study Islamic".matches(pattern)); // true
        System.out.println("I study english".matches(pattern)); // true
        // CSS is not in our group
        System.out.println("I study CSS".matches(pattern)); // false
    }
}
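Besides restricting what matches, the parentheses also capture the text they matched, so the subject can be read back with Matcher.group(1). The sketch below assumes a hypothetical class SubjectExtractor with a helper subjectOf; neither appears in the original slides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubjectExtractor {

    private static final Pattern STUDY =
            Pattern.compile("I study (Java|English|Programming|Math|Islamic|HTML)");

    // Returns the subject captured by group 1, or null when the sentence does not match.
    static String subjectOf(String sentence) {
        Matcher m = STUDY.matcher(sentence);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(subjectOf("I study Math")); // Math
        System.out.println(subjectOf("I study CSS"));  // null: CSS is not in the group
    }
}
```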

Page 20: Java Regular Expression PART I

Matching / Validating

» Regular Expressions make it possible to test whether text matches a certain pattern; the result is a Boolean value (true if the pattern matches, false otherwise).

» This can be used to validate user input such as
˃ Phone Numbers
˃ Social Security Numbers (SSN)
˃ Email Addresses
˃ Web Form Input Data
˃ and much more.

» Suppose the goal is to validate an SSN: if the whole input string matches the SSN pattern, then the string is a correctly formatted SSN.

Page 21: Java Regular Expression PART I

SSN Match and Validation

public class SSNMatchAndValidate {
    public static void main(String[] args) {
        String pattern = "^(\\d{3}-?\\d{2}-?\\d{4})$";
        String[] input = new String[5];
        input[0] = "123-45-6789";
        input[1] = "9876-5-4321";
        input[2] = "987-650-4321";
        input[3] = "987-65-4321 ";
        input[4] = "321-54-9876";
        for (int i = 0; i < input.length; i++) {
            if (input[i].matches(pattern)) {
                System.out.println("Found correct SSN: " + input[i]);
            }
        }
    }
}

OUTPUT:
Found correct SSN: 123-45-6789
Found correct SSN: 321-54-9876

Page 22: Java Regular Expression PART I

SSN Match and Validation Detail

"^(\\d{3}-?\\d{2}-?\\d{4})$" // 123-45-6789

Regular Expression   Meaning
^                    match the beginning of the line
()                   group everything within the parentheses as group 1
\d{3}                match exactly 3 digits
-?                   optionally match a dash
\d{2}                match exactly 2 digits
-?                   optionally match a dash
\d{4}                match exactly 4 digits
$                    match the end of the line
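The validation above can also be packaged as a reusable method. The sketch below is one possible arrangement, assuming a hypothetical class SsnValidator; it omits ^ and $ because Matcher.matches(), like String.matches(), already requires the whole input to match.

```java
import java.util.regex.Pattern;

class SsnValidator {

    // Three digits, optional dash, two digits, optional dash, four digits.
    private static final Pattern SSN = Pattern.compile("\\d{3}-?\\d{2}-?\\d{4}");

    // True when the whole input has the SSN format; null is treated as invalid.
    static boolean isValid(String input) {
        return input != null && SSN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("123-45-6789"));  // true
        System.out.println(isValid("123456789"));    // true:  both dashes are optional
        System.out.println(isValid("987-65-4321 ")); // false: trailing space
        System.out.println(isValid("9876-5-4321"));  // false: wrong digit grouping
    }
}
```

Note that this checks the format only; it does not tell whether the number was actually issued.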

Page 23: Java Regular Expression PART I

SSN Match and Validation (Alternative)

public class SSNMatchAndValidateII {
    public static void main(String[] args) {
        String pattern = "^([0-9]{3}-?[0-9]{2}-?[0-9]{4})$";
        String[] input = new String[5];
        input[0] = "123-45-6789";
        input[1] = "9876-5-4321";
        input[2] = "987-650-4321";
        input[3] = "987-65-4321 ";
        input[4] = "321-54-9876";
        for (int i = 0; i < input.length; i++) {
            if (input[i].matches(pattern)) {
                System.out.println("Found correct SSN: " + input[i]);
            }
        }
    }
}

OUTPUT:
Found correct SSN: 123-45-6789
Found correct SSN: 321-54-9876

Page 24: Java Regular Expression PART I

SSN Match and Validation Detail

"^([0-9]{3}-?[0-9]{2}-?[0-9]{4})$" // 123-45-6789

Regular Expression   Meaning
^                    match the beginning of the line
()                   group everything within the parentheses as group 1
[0-9]{3}             match exactly 3 digits
-?                   optionally match a dash
[0-9]{2}             match exactly 2 digits
-?                   optionally match a dash
[0-9]{4}             match exactly 4 digits
$                    match the end of the line

Page 25: Java Regular Expression PART I

Extracting / Capturing

» Capturing groups are an extremely useful feature of Regular Expression matching: they allow us to query the Matcher to find out which part of the string matched a particular part of the regular expression.

» Suppose you have a large, complex body of text (containing an unknown number of numbers) and you would like to extract all the numbers.

» The next slide demonstrates the example.

Page 26: Java Regular Expression PART I

Extracting / Capturing Numbers

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractingNumbers {
    public static void main(String[] args) {
        String text = "Abdul Rahman Sherzad with university ID of 20120 is trying to "
                + "demonstrate the power of Regular Expression for OXUS20 members.";
        Pattern p = Pattern.compile("\\d+");
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

OUTPUT:
20120
20
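To pick out one specific number rather than every run of digits, a capturing group can be placed around just the part of interest. The sketch below uses a named group, which assumes Java 7 or later; the class name ExtractUniversityId, the method idIn and the group name "id" are hypothetical choices for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractUniversityId {

    // The named group "id" captures only the digits that follow "ID of ".
    private static final Pattern ID = Pattern.compile("ID of (?<id>\\d+)");

    // Returns the captured ID, or null when the text contains no "ID of <digits>".
    static String idIn(String text) {
        Matcher m = ID.matcher(text);
        return m.find() ? m.group("id") : null;
    }

    public static void main(String[] args) {
        String text = "Abdul Rahman Sherzad with university ID of 20120 is trying to "
                + "demonstrate the power of Regular Expression for OXUS20 members.";
        System.out.println(idIn(text)); // 20120 (the "20" in OXUS20 is ignored)
    }
}
```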

Page 27: Java Regular Expression PART I

Extracting / Capturing Explanation

» Import the needed classes

import java.util.regex.Matcher;
import java.util.regex.Pattern;

» First, you must compile the pattern

Pattern p = Pattern.compile("\\d+");

» Next, create a matcher for a target text by calling matcher() on your pattern

Matcher m = p.matcher(text);

» NOTES
˃ Neither Pattern nor Matcher has a public constructor;
+ use the static method Pattern.compile(String regex) to create Pattern instances
+ use the instance method matcher(CharSequence input) of a Pattern to create Matcher instances.
˃ The matcher contains information about both the pattern and the target text.

Page 28: Java Regular Expression PART I

Extracting / Capturing Explanation

» m.find()

˃ returns true if the pattern matches any part of the text string
˃ If called again, m.find() will start searching from where the previous match ended
˃ m.find() will return true for as many matches as there are in the string; after that, it will return false
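The way successive m.find() calls walk through the text can be made visible by recording m.start() and m.end() for each match. This is a minimal sketch; the class name FindPositionsDemo, the method describeMatches and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindPositionsDemo {

    // Collects every match as "text@start-end"; end is the index just after the match.
    static List<String> describeMatches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) { // each call resumes after the previous match
            found.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("\\d+", "ID 20120, room 7")); // [20120@3-8, 7@15-16]
        System.out.println(describeMatches("\\d+", "no digits here"));   // []
    }
}
```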

Page 29: Java Regular Expression PART I

Extract / Capture Emails

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractEmails {
    public static void main(String[] args) {
        String text = "Abdul Rahman Sherzad [email protected] on OXUS20 [email protected]";
        String pattern = "[A-Za-z0-9-_]+(\\.[A-Za-z0-9-_]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

OUTPUT:
[email protected]
[email protected]

Page 30: Java Regular Expression PART I

Modifying / Substitution

» Values in a String can be replaced with new values.

» For example, you could replace every ID that follows the text 'StudentID=' with a mask to hide the original ID.

» This can be a useful method of filtering sensitive information.

» The next slide demonstrates the example.

Page 31: Java Regular Expression PART I

Mask Sensitive Information

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Substitutions {
    public static void main(String[] args) {
        String text = "Three student with StudentID=20120, StudentID=20121 and finally StudentID=20122.";
        // Group 1 is the "StudentID=" label, group 2 is the numeric ID.
        Pattern p = Pattern.compile("(StudentID=)([0-9]+)");
        Matcher m = p.matcher(text);
        StringBuffer result = new StringBuffer();
        while (m.find()) {
            System.out.println("Masking: " + m.group(2));
            // Copy the text before the match, then the label followed by the mask.
            m.appendReplacement(result, m.group(1) + "***masked***");
        }
        // Copy whatever follows the last match.
        m.appendTail(result);
        System.out.println(result);
    }
}

Page 32: Java Regular Expression PART I

Mask Sensitive Information (OUTPUT)

» Masking: 20120

» Masking: 20121

» Masking: 20122

» Three student with StudentID=***masked***, StudentID=***masked*** and finally StudentID=***masked***.
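When the individual IDs do not need to be printed, the same masking can be sketched in a single call with String.replaceAll and a group reference in the replacement text. This is an alternative to the appendReplacement loop, not the method used in the slides; the class name MaskWithReplaceAll and the method mask are hypothetical.

```java
class MaskWithReplaceAll {

    // Replaces the digits after every "StudentID=" with a fixed mask.
    static String mask(String text) {
        // In the replacement, $1 refers to capturing group 1, i.e. the "StudentID=" label.
        return text.replaceAll("(StudentID=)[0-9]+", "$1***masked***");
    }

    public static void main(String[] args) {
        String text = "Three student with StudentID=20120, StudentID=20121 and finally StudentID=20122.";
        System.out.println(mask(text));
    }
}
```

The loop with appendReplacement remains the better fit when each match needs extra processing, such as logging the value being masked.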

Page 33: Java Regular Expression PART I

Conclusion

» Regular Expressions are not easy to use at first
˃ They are mostly punctuation, not words
˃ It takes practice to learn to put them together correctly.

» Regular Expressions form a sub-language
˃ It has a different syntax than Java.
˃ It requires new thought patterns
˃ Regular Expressions are not part of the Java syntax itself; you create Patterns and Matchers, or use String methods such as matches().

» Regular Expressions are powerful and convenient to use for string manipulation
˃ They are worth learning!

Page 34: Java Regular Expression PART I

END
