![Page 1: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/1.jpg)
7 Searching and Regular Expressions (Regex)
Mauro Jaskelioff
![Page 2: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/2.jpg)
Introduction
• Shell metacharacters
  – What are they?
  – Why they are not the same as regular expressions!
• More about regular expressions
  – Searching file contents using:
    • grep
    • egrep
    • fgrep
![Page 3: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/3.jpg)
Shell Metacharacters
![Page 4: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/4.jpg)
Shell Metacharacters
• Special characters are characters that have some meaning to the shell
• Also known as metacharacters
• They are interpreted by the shell for expansion unless they are quoted or escaped (more on this later)
• E.g.: $ file ../*
  (gives the file type for all files in the directory one level up)
![Page 5: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/5.jpg)
Filename Expansion
• The * metacharacter matches multiple files. It means any string of zero or more characters. E.g.:
  – *.txt matches any filename ending in .txt
  – myfile.* matches all files with a prefix of myfile. and any suffix
  – *.* matches files with any prefix and suffix separated by a dot
  – * matches all files (except hidden files, whose names begin with a dot)
  – UST/* matches all files in the UST directory
  – .* matches all hidden files
  – *ology matches all filenames with ology at the end (or a filename of just ology ☺)
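The patterns above can be tried safely in a scratch directory. The following is a minimal sketch; the file names are invented for illustration, and `echo` is used only to show what the shell passes to the command.

```shell
# Work in a throwaway directory so the patterns only see files created here.
# All file names below are made up for illustration.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch notes.txt todo.txt myfile.c myfile.txt biology .hidden

txt_files=$(echo *.txt)     # names ending in .txt
myfile_any=$(echo myfile.*) # names starting with "myfile."
ology=$(echo *ology)        # names ending in "ology"
all_files=$(echo *)         # every name that does not start with a dot

echo "$txt_files"    # myfile.txt notes.txt todo.txt
echo "$myfile_any"   # myfile.c myfile.txt
echo "$ology"        # biology
echo "$all_files"    # biology myfile.c myfile.txt notes.txt todo.txt

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$scratch"
```

Note that .hidden does not appear in the output of `echo *`: a leading dot has to be matched explicitly.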
![Page 6: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/6.jpg)
Filename Expansion (2)
• The previous example:
  $ file ../*
1. The shell expands the metacharacters in the command line:
  $ file ../file1 ../file2 ../file3
2. The command is executed.
• Commands don’t interpret shell metacharacters
• The interpretation is done by the shell
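One way to see that the expansion happens before the command runs is to count the arguments a command receives. This is a small sketch; `count_args` is a hypothetical helper defined here, not a standard command.

```shell
# count_args is a hypothetical helper: it prints how many arguments it was given.
count_args() { echo "$#"; }

scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch file1 file2 file3

unquoted=$(count_args *)    # the shell replaces * with file1 file2 file3 first
quoted=$(count_args '*')    # quoted: the command receives the single character *

echo "unquoted: $unquoted, quoted: $quoted"   # unquoted: 3, quoted: 1

cd - >/dev/null && rm -rf "$scratch"
```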
![Page 7: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/7.jpg)
Other Filename Metacharacters
• ? matches any single character
• [abc…] matches any of the enclosed characters. A hyphen can be used to specify a range, e.g. a-z
• [!abc…] matches any character not enclosed
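A short sketch of these three metacharacters, again in a scratch directory with invented file names:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a1 a2 b1 c1 ab1   # illustrative file names

one_char=$(echo ?1)     # exactly one character, then 1
a_or_b=$(echo [ab]?)    # a or b, then exactly one character
not_a=$(echo [!a]1)     # any single character except a, then 1
range=$(echo [a-b]1)    # one character in the range a to b, then 1

echo "$one_char"   # a1 b1 c1
echo "$a_or_b"     # a1 a2 b1
echo "$not_a"      # b1 c1
echo "$range"      # a1 b1

cd - >/dev/null && rm -rf "$scratch"
```

The file ab1 is never listed because ? and [ ] each stand for exactly one character.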
![Page 8: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/8.jpg)
Command substitution
• The shell also supports substituting the output of a command
  $ ls -l `cat filenames`
• The command should be enclosed in backquotes (`)
[zlizmj@unnc-cslinux ~]$ cat filenames
temp
temp2
[zlizmj@unnc-cslinux ~]$ ls -l `cat filenames`
-rw-r--r-- 1 zlizmj Domain U 6 Mar 21 03:00 temp
-rw-r--r-- 1 zlizmj Domain U 567 Mar 30 11:14 temp2
[zlizmj@unnc-cslinux ~]$ ls -l temp temp2
-rw-r--r-- 1 zlizmj Domain U 6 Mar 21 03:00 temp
-rw-r--r-- 1 zlizmj Domain U 567 Mar 30 11:14 temp2
[zlizmj@unnc-cslinux ~]$
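The session above can be reproduced with the sketch below. It also shows the `$( )` form, which POSIX shells accept as an equivalent of backquotes; that form is not used on the slide and is included here only for comparison.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch temp temp2
printf 'temp\ntemp2\n' > filenames   # one file name per line

old_style=`cat filenames`    # backquote form, as on the slide
new_style=$(cat filenames)   # $( ) form, same result

# The shell runs cat first, then passes its output to ls as arguments.
listed=$(ls `cat filenames`)
ls -l `cat filenames`

cd - >/dev/null && rm -rf "$scratch"
```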
![Page 9: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/9.jpg)
Avoiding Shell Expansion
• What happens if we actually want to pass a metacharacter to the command? (i.e. we don’t want the shell to interpret it as a metacharacter)
• For example, we may have a file named temp*
• The character needs to be quoted or escaped
  – We can quote an argument with single quotes (') or with double quotes (")
  – We escape characters with the backslash character (\)
![Page 10: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/10.jpg)
Single or Double Quotes?
• " "
  – everything between " and " is taken literally, except for:
    • $ - variable substitution will occur
    • ` - command substitution will occur
    • \ - escapes a following $, `, " or \
    • " - marks the end of the double quote
  – ' doesn’t have special meaning
• ' '
  – everything between ' and ' is taken literally; the next ' always ends the string.
  – You cannot embed another ' within such a quoted string, not even with a backslash: close the string, write \', and reopen it
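The difference between the two kinds of quotes can be checked with a few assignments. This is a sketch; the variable `name` and the strings are invented for illustration.

```shell
name=Mauro   # hypothetical variable used only for this demonstration

# Double quotes: $ and ` are still interpreted.
double="Hello $name, today is `echo Monday`"

# Single quotes: everything is literal, including $ and `.
single='Hello $name, today is `echo Monday`'

# A single quote has no special meaning inside double quotes.
mixed="I'm $name"

# Inside single quotes a ' cannot be escaped: close, add \', reopen.
embedded='I'\''m Mauro'

printf '%s\n' "$double" "$single" "$mixed" "$embedded"
# Hello Mauro, today is Monday
# Hello $name, today is `echo Monday`
# I'm Mauro
# I'm Mauro
```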
![Page 11: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/11.jpg)
Escaping a Character
• The character following a backslash \ is taken literally.
  $ echo I\'m Mauro
  I'm Mauro
  $
• Use \ within " " to escape ", $, ` and \ when necessary. Within ' ' the backslash is itself taken literally, so nothing can be escaped there.
• How to escape \?
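A backslash is escaped with another backslash. The sketch below uses `printf '%s'` rather than `echo`, because some implementations of `echo` interpret backslashes themselves, which would blur what the shell is doing.

```shell
# Unquoted: \\ becomes a single backslash.
one=$(printf '%s' \\)

# Inside double quotes, \ escapes \, $ and ".
in_double=$(printf '%s' "\\ \$ \"")

# Inside single quotes, both backslashes are kept literally.
in_single=$(printf '%s' '\\')

printf '%s\n' "$one" "$in_double" "$in_single"
# \
# \ $ "
# \\
```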
![Page 12: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/12.jpg)
Regular Expressions
![Page 13: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/13.jpg)
Regular Expressions
• Also called regex
• For describing a set of strings using a pattern
  – Follows a set of rules
  – Used for finding occurrences of strings in files
• Contain normal characters mixed with special characters (called metacharacters)
• These metacharacters are NOT the same as shell metacharacters which are used for filename expansion!
![Page 14: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/14.jpg)
Regular Expressions
• Regular Expressions must be put inside quotes, otherwise the shell will interpret metacharacters for filename expansion
• E.g.:
  – grep '[Ff]red' myfile.txt
  – Searches the file myfile.txt for lines containing either Fred or fred
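What goes wrong without the quotes can be seen when a file in the current directory happens to match the pattern. In this sketch the file named fred and the contents of myfile.txt are invented for illustration.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch fred                                   # a file whose name matches [Ff]red
printf 'Fred was here\nfred too\n' > myfile.txt

unquoted=$(echo [Ff]red)     # the shell expands the pattern to the file name
quoted=$(echo '[Ff]red')     # the pattern reaches the command unchanged
matches=$(grep -c '[Ff]red' myfile.txt)      # count of matching lines

echo "$unquoted"   # fred
echo "$quoted"     # [Ff]red
echo "$matches"    # 2

cd - >/dev/null && rm -rf "$scratch"
```

With the unquoted form, grep would be handed the fixed string fred and would miss the line containing Fred.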
![Page 15: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/15.jpg)
Fixed Patterns vs. Regular Expressions
• To search a file for the word computer:
  – grep computer myfile.txt
  – Will only match the string computer, exactly as spelled
  – A fixed pattern, not a regular expression
• Supposing we want to find occurrences (including potential misspellings) of:
  – computer, computor, Computer, Computor, Computers, and so on…
  – grep '[cC]omput[eo]rs*' myfile.txt
  – Uses a regular expression
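The two searches can be compared on a small sample file. This is a sketch; the sample lines are invented, and `grep -c` is used to print the number of matching lines instead of the lines themselves.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf '%s\n' 'My computer is old' 'Computors are fun' \
              'Two computers' 'A commuter train' > myfile.txt

# Fixed pattern: only lines containing the exact string "computer".
fixed_count=$(grep -c computer myfile.txt)

# Regular expression: also accepts the capital C and the "or" spelling.
regex_count=$(grep -c '[cC]omput[eo]rs*' myfile.txt)

echo "fixed: $fixed_count, regex: $regex_count"   # fixed: 2, regex: 3

cd - >/dev/null && rm -rf "$scratch"
```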
![Page 16: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/16.jpg)
Three versions of grep
• grep: supports the most common metacharacters.
• egrep: (extended grep) supports an extended set of metacharacters. It’s more expressive but may be slower. Equivalent to grep -E.
• fgrep: (fast grep) doesn’t support metacharacters. It’s less expressive but faster. Equivalent to grep -F.
![Page 17: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/17.jpg)
Regex Metacharacters
.    Matches any single character except newline. E.g. c.t matches cat, cbt, cct …
[ ]  Matches one character between [ and ]. E.g. [abc] matches a, b or c
-    Indicates a range inside [ ]. E.g. [a-z] matches any one character from a to z
*    Matches zero or more occurrences of the preceding character. E.g. 12* matches 1, 12, 122, 1222 …
+    Matches one or more occurrences of the preceding character. NOTE: for use with egrep. E.g. 12+ matches 12, 122, 1222 …
?    Matches zero or one occurrence of the preceding character. NOTE: for use with egrep. E.g. 12? matches 1 and 12
\    Treats the next character literally. E.g. \* will match the character * and NOT the metacharacter *
^    Matches the start of the line. E.g. ^Fred will match only lines that have the word Fred at the start of the line
$    Matches the end of the line. E.g. Fred$ will match only lines that have the word Fred at the end of the line
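Several of these metacharacters can be tried against one sample file. The sketch below uses invented sample lines and `grep -c` to count the matching lines for each pattern.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf '%s\n' 'Fred is here' 'Say hi to Fred' cat cot ct 1 122 'a*b' > sample.txt

start=$(grep -c '^Fred' sample.txt)   # Fred at the start of the line
end=$(grep -c 'Fred$' sample.txt)     # Fred at the end of the line
dot=$(grep -c 'c.t' sample.txt)       # cat and cot, but not ct
star=$(grep -c '12*' sample.txt)      # a 1 followed by zero or more 2s
literal=$(grep -c 'a\*b' sample.txt)  # a, a literal *, then b
digit=$(grep -c '[0-9]' sample.txt)   # any line containing a digit

echo "$start $end $dot $star $literal $digit"   # 1 1 2 2 1 2

cd - >/dev/null && rm -rf "$scratch"
```

grep reports a line if the pattern matches anywhere in it, which is why 12* counts both the line 1 and the line 122.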
![Page 18: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/18.jpg)
grep Revisited
• Used to search a file for a pattern
• (remember STDIN, STDOUT, etc. are also treated as files in UNIX)
• cat myfile.txt | grep "chocolate"
• who | grep zlizmj
• grep 'pingu' penguinNames.txt
• grep '[Ww]ib*le' wobble.txt
![Page 19: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/19.jpg)
egrep
• Extended grep.
• Slower but greater functionality
• Includes additional metacharacters, e.g.:
  – + matches one or more of its preceding character.
    • E.g. abc+ means abc, abcc, abccc, …
  – ? matches zero or one of its preceding character.
    • E.g. abc? means ab or abc
  – | an alternative.
    • E.g. A|B means A or B
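The + and ? metacharacters can be sketched as follows. The example uses `grep -E`, the POSIX spelling of egrep, and an invented sample file; the `-x` option restricts a match to the whole line.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf '%s\n' ab abc abcc abd > letters.txt   # illustrative sample lines

plus=$(grep -E -c 'abc+' letters.txt)          # ab followed by one or more c
optional=$(grep -E -x -c 'abc?' letters.txt)   # whole line is ab or abc
anywhere=$(grep -E -c 'abc?' letters.txt)      # without -x, any line containing ab

echo "$plus $optional $anywhere"   # 2 2 4

cd - >/dev/null && rm -rf "$scratch"
```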
![Page 20: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/20.jpg)
egrep Example
• egrep '(bio|geo)logy' subjects.txt
  – will search the file subjects.txt for all lines that contain the words biology or geology
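A runnable version of this example, assuming a subjects.txt with four invented entries and using `grep -E` as the POSIX equivalent of egrep:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf '%s\n' biology geology zoology physics > subjects.txt

# The parentheses group the alternatives; logy must follow either one.
matches=$(grep -E '(bio|geo)logy' subjects.txt)

echo "$matches"
# biology
# geology

cd - >/dev/null && rm -rf "$scratch"
```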
![Page 21: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/21.jpg)
fgrep
• Fast grep
• Does not use regular expressions
  – Used for matching an exact string, not a pattern
  – $, *, [, ^, |, (, ), and \ are interpreted literally
  – (but still have special meaning to the shell)
  – Enclose entire string in quotes
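The effect of literal matching shows up when the search string contains regex metacharacters. This sketch uses `grep -F`, the POSIX equivalent of fgrep, on an invented sample file, and compares it with plain grep.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf '%s\n' 'cost is $5.00' 'cost is 5000' 'a*b' 'aab' > samples.txt

# Fixed string: * is an ordinary character, so only the line a*b matches.
fixed_star=$(grep -F -c 'a*b' samples.txt)
# Regex: a*b means zero or more a followed by b, so aab matches too.
regex_star=$(grep -c 'a*b' samples.txt)

# Fixed string: $ and . are literal. Single quotes keep the shell away from $.
fixed_price=$(grep -F -c '$5.00' samples.txt)
# Regex: . matches any character, so 5000 matches as well.
regex_price=$(grep -c '5.00' samples.txt)

echo "$fixed_star $regex_star $fixed_price $regex_price"   # 1 2 1 2

cd - >/dev/null && rm -rf "$scratch"
```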
![Page 22: 7 Searching and Regular Expressions (Regex) Mauro Jaskelioff](https://reader030.vdocument.in/reader030/viewer/2022033106/56649d015503460f949d3413/html5/thumbnails/22.jpg)
Summary
• The shell performs filename expansion and command substitution.
• Shell metacharacters are not the same as regular expressions!
• Regular expressions allow us to search for a pattern in a file
• Commands used for searching:
  – grep
  – egrep
  – fgrep (does not use regular expressions)