nerd talk: regexes

Regexes: It's magic!

Uploaded by: luisa-hugerth

Posted on 05-Jul-2015


TRANSCRIPT

Page 1: Nerd talk: regexes

Regexes: It's magic!

Page 2

“Some people, when confronted with a problem, think 'I know, I'll use regular expressions!'

Now they have two problems.” (Jamie Zawinski)

Page 3

Page 4

Page 5

Page 6

Perl-style regex: It's magic done right!

Page 7

Metacharacters

^ beginning of the line

$ end of the line

. any single character (except newline)

\ escape: changes the meaning of the next character

/^....G..AA$/
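To try the pattern, here is a minimal sketch using Python's standard re module (Python is the language the talk turns to later). The sequences are made up for illustration; note that . really does accept anything, digits included.

```python
import re

# ^ and $ anchor the pattern to the whole string; each . stands for exactly
# one character of any kind, so this accepts 9-character strings with
# G in position 5 and AA in positions 8-9.
pattern = re.compile(r"^....G..AA$")

# Hypothetical example strings, not data from the talk.
sequences = ["ACTGGTCAA", "TTTTGCCAA", "ACTGATCAA", "ACTGGTCAAA", "1234G56AA"]

matches = [s for s in sequences if pattern.search(s)]
print(matches)  # ['ACTGGTCAA', 'TTTTGCCAA', '1234G56AA']
```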

Page 8

Escaped characters

\s whitespace

\S non-whitespace

\w word character (letter, digit or underscore)

\d digit

\. a literal dot

\\ a literal backslash

/^\w\w\w\wG\w\wAA$/

/^\d\d\\\d\d\\\d\d\d\d$/
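A sketch of the two patterns with Python's re module, using made-up test strings. Raw strings (r"...") pass backslashes to the regex engine untouched, so the \\ in the pattern below reaches the engine as-is and matches one literal backslash.

```python
import re

# \w is any "word" character: letter, digit or underscore.
seq_pattern = re.compile(r"^\w\w\w\wG\w\wAA$")

# \d is a digit; \\ matches one literal backslash, so this accepts
# dates written like 05\07\2015.
date_pattern = re.compile(r"^\d\d\\\d\d\\\d\d\d\d$")

print(bool(seq_pattern.search("1234G56AA")))     # True: digits count as \w
print(bool(seq_pattern.search("AC GGTCAA")))     # False: a space is not \w
print(bool(date_pattern.search(r"05\07\2015")))  # True
print(bool(date_pattern.search("05/07/2015")))   # False: forward slashes
```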

Page 9

Repetition

? 0 or 1 time

* 0 or more times

+ 1 or more times

*? ungreedy * (matches as few as possible)

+? ungreedy +

{m} exactly m times

{m,n} m up to n times

{m,n}? ungreedy {m,n}

/^\w{4}G\w{2}AA$/

/^\d{1,2}\\\d{1,2}\\\d{2,4}$/
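Greedy versus ungreedy repetition, and the {m,n} date pattern, sketched with Python's re module. The sequence is made up and simply contains two GAATTC motifs.

```python
import re

seq = "GAATTCCGAATTC"  # made-up sequence with two GAATTC motifs

# Greedy .* grabs as much as possible, so one match spans both motifs.
greedy = re.findall(r"GA.*TC", seq)
# Ungreedy .*? stops at the first TC that completes a match.
lazy = re.findall(r"GA.*?TC", seq)
print(greedy)  # ['GAATTCCGAATTC']
print(lazy)    # ['GAATTC', 'GAATTC']

# {m,n} bounds the repetition: 1-2 digit day and month, 2-4 digit year.
date = re.compile(r"^\d{1,2}\\\d{1,2}\\\d{2,4}$")
print(bool(date.search(r"5\7\15")))     # True
print(bool(date.search(r"5\7\20150")))  # False: five-digit year

# Python only reads {m,n} as a quantifier without spaces;
# "a{1, 2}" is treated as the literal text "a{1, 2}".
print(bool(re.fullmatch(r"a{1,2}", "aa")))   # True
print(bool(re.fullmatch(r"a{1, 2}", "aa")))  # False
```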

Page 10

Grouping

[ABC] any one of these characters

(AB|BC|CA) any one of these alternatives

(THIS!) capture: save whatever this part matched

[A-Za-z0-9] ranges of characters

/^[ACTG]{4}G[ACTG]{2}AA$/

/^(0?[1-9]|[12]\d|3[01])\\(0?[1-9]|1[0-2])\\(\d{2}|\d{4})$/
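A Python sketch of alternation and capturing groups with the re module. The date pattern here is a variant assumed for illustration: its day and month alternatives are written so that 00 is rejected, and m.groups() returns the three captured parts.

```python
import re

# Alternatives in (...|...) list the allowed day, month and year forms;
# the parentheses also capture each part for later use.
date = re.compile(r"^(0?[1-9]|[12]\d|3[01])\\(0?[1-9]|1[0-2])\\(\d{2}|\d{4})$")

m = date.search(r"05\07\2015")
if m:
    day, month, year = m.groups()
    print(day, month, year)  # 05 07 2015

print(bool(date.search(r"32\07\2015")))  # False: no day 32
print(bool(date.search(r"05\13\2015")))  # False: no month 13
print(bool(date.search(r"00\07\2015")))  # False: no day 00

# A character class restricts letters: [ACTG] accepts DNA only, unlike \w.
dna = re.compile(r"^[ACTG]{4}G[ACTG]{2}AA$")
print(bool(dna.search("ACTGGTCAA")))  # True
print(bool(dna.search("1234G56AA")))  # False
```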

Page 11

OVERKILL: regex golf, as worked through by Peter Norvig for xkcd 1313

http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb

Page 12

In Python (sigh...)
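The Python code from this slide is not in the transcript. As a stand-in, here is a minimal sketch of the re functions most scripts rely on (search, match, findall, sub, compile), run on a made-up line of text.

```python
import re

line = "read_42 ACTGGTCAA length=9"  # made-up example line

# re.search scans the whole string; re.match only tries at the start.
print(re.search(r"\d+", line).group())  # 42
print(re.match(r"\d+", line))           # None: the line starts with a letter

# re.findall returns every non-overlapping match.
print(re.findall(r"[ACGT]{4,}", line))  # ['ACTGGTCAA']

# re.sub replaces matches; \1 in the replacement refers to group 1.
print(re.sub(r"length=(\d+)", r"len:\1", line))  # read_42 ACTGGTCAA len:9

# Compile once when the same pattern is reused many times.
number = re.compile(r"\d+")
print(number.findall(line))             # ['42', '9']
```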

Page 13

E.g.: finding files

Page 14

E.g.: finding files

ls -la | grep -- '->' | grep -v 'bubo' | grep -v 'Daniel'
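A rough Python counterpart, assuming the aim is to list symbolic links (the lines that ls -la marks with ->) while skipping anything mentioning bubo or Daniel. find_links and EXCLUDE are hypothetical names. Unlike the pipeline, this checks only link names and targets, not the owner and group columns that ls -la also prints.

```python
import os
import re

# Hypothetical exclusion pattern mirroring the two grep -v filters.
EXCLUDE = re.compile(r"bubo|Daniel")

def find_links(directory="."):
    """Return sorted 'name -> target' strings for symlinks not matching EXCLUDE."""
    results = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_symlink():
                line = f"{entry.name} -> {os.readlink(entry.path)}"
                if not EXCLUDE.search(line):
                    results.append(line)
    return sorted(results)

print(find_links("."))
```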

Page 15

E.g.: demultiplexing FASTQ

1. Barcode

2. Primer

3. Random nucleotides

grep -P '1:N:0:ACTGGTT' -A3 --no-group-separator multiplex_R1.fastq | grep -P '^[ACTGN]{4}CCC[ACGT]T[GC]AGATA' -A2 -B1 --no-group-separator > deplexed_R1.fq
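The same two filters, sketched in Python under the assumption of a well-formed FASTQ file with exactly four lines per record. demultiplex and the sample reads are hypothetical. Because the sequence pattern is anchored with ^, read3 (primer present, but without four leading random bases) is rejected.

```python
import re

BARCODE = "1:N:0:ACTGGTT"  # no metacharacters, so a substring test equals grep here
# Four random bases, then the degenerate primer, anchored at the read start.
READ_START = re.compile(r"^[ACTGN]{4}CCC[ACGT]T[GC]AGATA")

def demultiplex(lines):
    """Yield (header, seq, plus, qual) records that pass both filters."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if BARCODE in header and READ_START.search(seq):
            yield header, seq, plus, qual

# Made-up records: only read1 has both the barcode and the read start.
sample = [
    "@read1 1:N:0:ACTGGTT", "ACGTCCCATCAGATAGGG", "+", "IIIIIIIIIIIIIIIIII",
    "@read2 1:N:0:GGGGGGG", "ACGTCCCATCAGATAGGG", "+", "IIIIIIIIIIIIIIIIII",
    "@read3 1:N:0:ACTGGTT", "CCCATCAGATAGGGGGGG", "+", "IIIIIIIIIIIIIIIIII",
]
kept = [record[0] for record in demultiplex(sample)]
print(kept)  # ['@read1 1:N:0:ACTGGTT']
```

Working record by record has one advantage over line-based grep: a quality line can never be mistaken for a header or sequence line just because its characters happen to match a pattern.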

Page 16

E.g.: paper figures!

From the subset of unique sequences that span the entire region under study, how many unique sequences are matched by each primer combination?
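One way such a count could be computed, sketched in Python: keep the unique sequences, then for every forward/reverse primer pair count those containing both, forward first. The primer names and patterns are hypothetical (F1 reuses the primer pattern from the demultiplexing example), and the reverse primers are assumed to be written as they appear on the read, i.e. already reverse-complemented.

```python
import re
from itertools import product

# Hypothetical primers; degenerate positions are written as character classes.
FORWARD = {"F1": r"CCC[ACGT]T[GC]AGATA", "F2": r"GTG[CT]CAG"}
REVERSE = {"R1": r"ATTAGA[AT]ACC", "R2": r"GGACTAC"}

def count_matches(sequences):
    """Count unique sequences holding each forward primer followed by each reverse primer."""
    unique = set(sequences)
    counts = {}
    for (f_name, f_pat), (r_name, r_pat) in product(FORWARD.items(), REVERSE.items()):
        pair = re.compile(f_pat + ".*" + r_pat)
        counts[(f_name, r_name)] = sum(1 for s in unique if pair.search(s))
    return counts

# Made-up reads: the first appears twice but is counted once.
reads = [
    "CCCATCAGATAGGGGATTAGATACC",
    "CCCATCAGATAGGGGATTAGATACC",
    "GTGTCAGAAAAGGACTAC",
    "CCCATCAGATATTTT",
]
print(count_matches(reads))
# {('F1', 'R1'): 1, ('F1', 'R2'): 0, ('F2', 'R1'): 0, ('F2', 'R2'): 1}
```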

Page 17


Page 18

Sed: find & replace

“Are you gonna talk about vim regexes?”

“Sed regexes are weird”

My workaround: use ranges instead of shortcuts such as \d

[0-9][A-Z][a-z][A-Za-z]

E.g.:

“Oh noes, Americans don't know how to separate decimals!”

sed 's/\./,/g' hisfile.tab > myfile.tab

“Oh noes, this bloody file was edited in Windows!”

sed 's/\r$//' theirfile.tab > decentfile.tab

“Oh noes, Casava 1.6 has a slash in it!”

sed 's,/1, 1:N:0:NNNNNN,' oldfile.fq > newfile.fq
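The three fixes, redone with Python's re.sub as a sketch. The example strings are made up, and the read name is just an old-style Illumina header used for illustration. The comments show why the dot must be escaped and why \r should be deleted rather than turned into \n. Unlike the sed version, the /1 pattern here is anchored with $.

```python
import re

# 1. Decimal points to commas. The dot must be escaped: a bare "."
#    matches any character, so every character would become a comma.
decimals = re.sub(r"\.", ",", "3.14\t2.5")  # '3,14\t2,5'
everything = re.sub(r".", ",", "3.14")      # ',,,,'

# 2. Windows (CRLF) line endings: drop the \r before each \n.
#    Turning \r into \n instead leaves an empty line after every row.
crlf = "id\tvalue\r\nx\t1\r\n"
unix = re.sub(r"\r\n", "\n", crlf)          # 'id\tvalue\nx\t1\n'
doubled = re.sub(r"\r", "\n", crlf)         # 'id\tvalue\n\nx\t1\n\n'

# 3. Old-style read name ending in /1, rewritten in the newer style.
#    $ makes sure only a trailing /1 is replaced.
old = "@HWUSI-EAS100R:6:73:941:1973#0/1"
new = re.sub(r"/1$", " 1:N:0:NNNNNN", old)  # '@HWUSI-EAS100R:6:73:941:1973#0 1:N:0:NNNNNN'
```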

Page 19

Other neat stuff

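grep (-c: count matching lines)

sort (-n: numeric, -r: reverse, -k: sort by field, -t: field separator)

uniq -c (count repeated adjacent lines, so sort first)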
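For comparison, a Python sketch of grep -c and of the common sort | uniq -c | sort -nr tally, run on made-up header lines. uniq -c only merges adjacent duplicates, which is why the shell version sorts first; collections.Counter needs no sorting.

```python
import re
from collections import Counter

# Made-up FASTQ-style header lines for illustration.
lines = [
    "@r1 1:N:0:ACTGGTT",
    "@r2 1:N:0:GGCATTA",
    "@r3 1:N:0:ACTGGTT",
    "@r4 1:N:0:ACTGGTT",
]

# Like grep -c: the number of lines that match the pattern.
n_matching = sum(1 for line in lines if re.search(r"ACTGGTT$", line))

# Like sort | uniq -c | sort -nr: tally each barcode, most frequent first.
barcodes = Counter(line.rsplit(":", 1)[1] for line in lines)
ranked = barcodes.most_common()

print(n_matching)  # 3
print(ranked)      # [('ACTGGTT', 3), ('GGCATTA', 1)]
```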

Page 20

LMGTFY:

sed: http://www.tutorialspoint.com/unix/unix-regular-expressions.htm

grep: http://linux.about.com/od/commands/l/blcmdl1_grep.htm

Perl: http://www.cs.tut.fi/~jkorpela/perl/regexp.html

Python: http://docs.python.org/2/howto/regex.html

Vim: http://vimregex.com/

Page 21

sed 's/fear of regex/love of regex/g'