GREP. What's grep? Grep is a popular Unix program that supports a special programming language for...

GREP

Upload: florence-hoover

Post on 05-Jan-2016



TRANSCRIPT

Page 1: GREP. Whats Grep? Grep is a popular unix program that supports a special programming language for doing regular expressions The grammar in use for software

GREP

Page 2

What's Grep?

Grep is a popular Unix program that supports a special pattern language for matching text: regular expressions

The grammar used by most regular-expression software is based on grep's; Perl extends it further.

Page 3

Compiles

[Diagram labels: ANY; Regular Expression; Search String]

The engine parses your search string

and produces a state machine

Page 4

Searches

[Diagram label: FALSE]

The input is sent into the state machine

Conceptually, one character (shape/letter) at a time

Page 5

[Diagram label: TRUE]

Found: the state machine object changes state (in this example, it is set to true)

The user checks the machine's state when it finishes running
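
The compile, search, and check steps can be sketched in Python. This is only an illustration: Python's `re` module is a stand-in for the engine on the slides, the helper name `found` is made up, and `re` is a backtracking matcher rather than a literal true/false state machine, though the workflow is the same.

```python
import re

# Compile: the engine parses the search string into a reusable matcher object.
pattern = re.compile(r"john")


def found(text: str) -> bool:
    """Send the input through the compiled matcher and report its final state."""
    return pattern.search(text) is not None


print(found("john was here"))    # True
print(found("nobody was here"))  # False
```

Compiling once and reusing the object avoids re-parsing the pattern for every input.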

Page 6

Grep Expressions

The “grep” language is for doing regular expressions in text processing

“Grep pattern” is another name for what is called a “regular expression”

Page 7

Grep Expressions

A string of text to match, with special characters

“john.*”

would return True on a search of: “john was here”
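
A quick way to try this example, assuming Python's `re` module as the engine (the second input string is made up for contrast):

```python
import re

# "john" matches letter by letter; ".*" then takes whatever follows, possibly nothing.
match = re.search(r"john.*", "john was here")
no_match = re.search(r"john.*", "joan was here")

print(match is not None)     # True
print(match.group(0))        # john was here
print(no_match is not None)  # False
```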

Page 8

Grep Expressions

“.*\.txt”

.* is anything (.) of any length (*)

\. is literally a . (the \ before it means the next character is literal, that is, not special)

txt is just letter matching

This would pick out the txt files

It's similar to what you see in Windows, but it's not the same: it's more powerful than the simple “wildcards” (*) you often see.
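
A small sketch of this filter in Python, with made-up file names. It uses `fullmatch` so the whole name has to fit the pattern; that choice is an assumption, since the slide does not say how the pattern is anchored.

```python
import re

# Hypothetical file names, for illustration only.
filenames = ["notes.txt", "photo.jpg", "todo.txt", "txt", "archive.txt.gz"]

pattern = re.compile(r".*\.txt")

# fullmatch: the entire name must be "anything, then a literal .txt".
txt_files = [name for name in filenames if pattern.fullmatch(name)]
print(txt_files)  # ['notes.txt', 'todo.txt']

# search only needs the pattern to occur somewhere, so this name also hits.
print(pattern.search("archive.txt.gz") is not None)  # True
```

To get the same strictness with a plain search, end the pattern with `$`.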

Page 9

Special Chars

. = any single character

^ = beginning of a line

$ = end of line

\w = a word character (letter, digit, or underscore)

\d = a decimal digit (0-9)

Page 10

\ = escape char

Backslash \ (leans to the left)

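The most popular escape character

Uses:

sneak past illegal characters (treat a special character as a plain one, like \.)

make secret code characters (give a plain character a special meaning, like \d or \n)

Data encodings almost always have one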
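
Both uses of the backslash can be seen side by side. A minimal sketch using Python's `re` module, with made-up sample text:

```python
import re

text = "a.b axb"

# Unescaped, "." is special and matches any single character.
any_char = re.findall(r"a.b", text)
# Escaped, "\." is only a literal period.
literal_dot = re.findall(r"a\.b", text)

# The other direction: "d" is a plain letter, "\d" is a code for a digit.
plain_d = re.findall(r"d", "d4")
digit = re.findall(r"\d", "d4")

print(any_char)     # ['a.b', 'axb']
print(literal_dot)  # ['a.b']
print(plain_d)      # ['d']
print(digit)        # ['4']
```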

Page 11

Examples

... = three of ANYTHING (any three characters)

\d\d\d = three digits (decimal numbers)

remember: the \ is the escape code

\w\w\w = three word characters (letters, digits, or underscore; no symbols)

good: abc, a34

bad: ab!
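
These patterns can be checked quickly in Python. The helper `fits` is a made-up name; it uses `re.fullmatch` so the whole text must fit the pattern. Note that \w accepts digits as well as letters.

```python
import re


def fits(pattern: str, text: str) -> bool:
    """True when the entire text fits the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


print(fits(r"...", "a!3"))     # True: any three characters
print(fits(r"\d\d\d", "123"))  # True
print(fits(r"\d\d\d", "12a"))  # False: "a" is not a digit
print(fits(r"\w\w\w", "abc"))  # True
print(fits(r"\w\w\w", "a34"))  # True: digits are word characters too
print(fits(r"\w\w\w", "ab!"))  # False: "!" is a symbol
```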

Page 12

Approach

Searching for “john” or “joan”

What is the difference between them?

jo_n (only the third letter differs)

What symbol works in the blank?

jo\wn

jo.n
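
Both candidates find “john” and “joan”, but they are not equally strict. A short comparison in Python, where the extra test names are made up:

```python
import re

names = ["john", "joan", "jo n", "jon"]

# For each candidate pattern, collect the names that fit it completely.
results = {
    pattern: [name for name in names if re.fullmatch(pattern, name)]
    for pattern in (r"jo\wn", r"jo.n")
}

print(results[r"jo\wn"])  # ['john', 'joan']
print(results[r"jo.n"])   # ['john', 'joan', 'jo n']
```

The dot accepts a space in the third position, while \w does not.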

Page 13

Special Chars

\D = non-digits (anything that is not 0-9)

\W = non-word characters

\s = white space

\S = non white space

\n = new line (return/enter key)

\t = tab

Page 14

\s\s\s = three whitespace characters

tabs, spaces, possibly newlines

\D\s\W = non-digit, whitespace, non-word

Examples:

x !, ! !, = #, A <tab> ?
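
A quick check of \D\s\W in Python, with made-up samples. The third character must be a non-word character, so a digit in that position does not match.

```python
import re

pattern = re.compile(r"\D\s\W")

samples = ["! !", "x !", "A\t?", "x 4", "4 !"]
results = {s: pattern.fullmatch(s) is not None for s in samples}

for sample, ok in results.items():
    print(repr(sample), ok)
# '! !' True
# 'x !' True
# 'A\t?' True
# 'x 4' False   (4 is a word character, so \W rejects it)
# '4 !' False   (4 is a digit, so \D rejects it)

# Three whitespace characters: a space, a tab, and a newline.
three_spaces = re.fullmatch(r"\s\s\s", " \t\n") is not None
print(three_spaces)  # True
```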

Page 15

Quantity Chars

* = 0 or more

? = 0 or 1

+ = 1 or more

[abc] = any one of the chars inside the []

[^abc] = any one char NOT inside the []

[a-zA-Z] = ranges of chars

Page 16

Examples

X+ = 1 or more X

XXX

[XYZ] = any 1 of these chars

X, Y, Z

[XYZxyz]+ = 1 or more of any of these chars

y, XYz, zYZZyX, ZZzzzzz
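
The quantity characters and brackets can be exercised with a small table of cases. A sketch in Python, where the inputs that fail are made up for contrast:

```python
import re

# (pattern, text) pairs; each text must fit its pattern completely to count.
cases = [
    (r"X+", "XXX"),
    (r"X+", ""),
    (r"X*", ""),
    (r"X?", "XX"),
    (r"[XYZ]", "Y"),
    (r"[XYZ]", "XY"),
    (r"[XYZxyz]+", "zYZZyX"),
    (r"[XYZxyz]+", "XYa"),
]

results = [re.fullmatch(pattern, text) is not None for pattern, text in cases]

for (pattern, text), ok in zip(cases, results):
    print(pattern, repr(text), ok)
# X+ 'XXX' True
# X+ '' False          (+ needs at least one X)
# X* '' True           (* accepts zero)
# X? 'XX' False        (? allows at most one)
# [XYZ] 'Y' True
# [XYZ] 'XY' False     (brackets stand for a single character)
# [XYZxyz]+ 'zYZZyX' True
# [XYZxyz]+ 'XYa' False
```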

Page 17

EXAMPLES

[a-zA-Z0-9] = any single letter or digit, but no spaces or symbols

\.?$ = maybe ends with a .

remember: $ is end of line

.* = 0 to ∞ of any character

[^abc]* = 0 to ∞ of anything but lowercase a, b, or c
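
A few of these in action, sketched in Python with made-up sample text. The word “here” is put in front of \.?$ only to give the pattern something visible to anchor on.

```python
import re

# \.?$ : an optional period right before the end of the line.
with_dot = re.search(r"here\.?$", "john was here.") is not None
without_dot = re.search(r"here\.?$", "john was here") is not None
not_at_end = re.search(r"here\.?$", "here we go") is not None
print(with_dot, without_dot, not_at_end)  # True True False

# [^abc]* : the run of characters from the start that are not a, b, or c.
prefix = re.match(r"[^abc]*", "xyz-b-xyz").group(0)
print(prefix)  # xyz-

# [a-zA-Z0-9]+ : runs of letters and digits, broken at spaces and symbols.
words = re.findall(r"[a-zA-Z0-9]+", "room 101, floor B!")
print(words)  # ['room', '101', 'floor', 'B']
```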

Page 18

Problems

Unicode vs ASCII

The regular-expression language is older than Unicode

Many new engines support Unicode

Minor extensions to the language are needed for full Unicode support
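
As one example of an engine that handles both, Python 3's `re` module treats text patterns as Unicode-aware by default and offers the `re.ASCII` flag to fall back to the older behavior. Other engines may differ, so this is only a sketch:

```python
import re

word = "caf\u00e9"  # "café", written with an escape for the accented e

# Default for text patterns in Python 3: \w knows about Unicode letters.
unicode_match = re.fullmatch(r"\w+", word)

# re.ASCII restricts \w to a-z, A-Z, 0-9, and underscore.
ascii_match = re.fullmatch(r"\w+", word, re.ASCII)

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False
```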

Page 19

Options

RegExp engines typically have options

ignoreCase

saves you from writing [Aa] for each letter

global

keeps searching after each match until the end of the input (useful for replace); by default, it stops at the 1st match
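
The option names on this slide look like the ones used by JavaScript-style engines. A rough equivalent in Python, which is an assumption about how to map them: `re.IGNORECASE` plays the role of ignoreCase, and since Python has no global flag, the `count` argument of `re.sub` is used to show first-match versus every-match replacement.

```python
import re

text = "Grep, grep and GREP"

# ignoreCase: one flag instead of writing [Gg][Rr][Ee][Pp].
case_sensitive = re.findall(r"grep", text)
case_insensitive = re.findall(r"grep", text, re.IGNORECASE)
print(case_sensitive)    # ['grep']
print(case_insensitive)  # ['Grep', 'grep', 'GREP']

# Stop at the 1st match vs. keep going to the end of the input ("global").
first_only = re.sub(r"grep", "sed", text, count=1, flags=re.IGNORECASE)
every_match = re.sub(r"grep", "sed", text, flags=re.IGNORECASE)
print(first_only)   # sed, grep and GREP
print(every_match)  # sed, sed and sed
```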

Page 20

Options

multiline

Most engines break the input up into lines:

at the end of a line, the engine resets for the next line

Multiline makes the engine treat the input as a whole and ignore line endings (unless you use ^ or $, which refer to the beginning and end of lines)
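
How ^ behaves on multi-line input can be seen with Python's `re.MULTILINE` flag. This is a sketch of one engine's behavior, with made-up input; other engines may treat the option differently.

```python
import re

text = "first line\nsecond line\nthird line"

# Without the flag, ^ matches only at the very start of the input.
start_only = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches right after every line ending.
each_line = re.findall(r"^\w+", text, re.MULTILINE)

print(start_only)  # ['first']
print(each_line)   # ['first', 'second', 'third']
```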

Page 21

/Common Use/

/string/ is similar to “quotes” on strings

If you use a “string” instead, you must escape each backslash:

/\d\d/ (2-digit match, written as a pattern)

vs

“\\d\\d” (the same 2-digit match, written as a string)
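
Python has no /pattern/ literal, but its raw strings play a similar role, so the same contrast can be sketched there (the sample text is made up):

```python
import re

raw_pattern = r"\d\d"       # raw string: backslashes are left alone, much like /\d\d/
escaped_pattern = "\\d\\d"  # ordinary string: each backslash must itself be escaped

# Both spellings produce the same four characters: \d\d
print(raw_pattern == escaped_pattern)  # True

raw_hits = re.findall(raw_pattern, "room 42")
escaped_hits = re.findall(escaped_pattern, "room 42")
print(raw_hits)      # ['42']
print(escaped_hits)  # ['42']
```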