expressive pattern matching with logol application to the...

33
Expressive pattern matching with LOGOL Application to the modelling of -1 Ribosomal Frameshift events X XXY YY Catherine Belleannée - Dyliss team, Rennes 1 University X XXY YYZ Olivier Sallou - GenOuest plateform, Rennes 1 University Jacques Nicolas - Dyliss team, Inria 1 LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Upload: others

Post on 06-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Expressive pattern matching with LOGOL

Application to the modelling of -1 Ribosomal Frameshift events

X XXY YY

Catherine Belleannée - Dyliss team, Rennes 1 University

X  XXY  YYZ 

Olivier Sallou - GenOuest plateform, Rennes 1 UniversityJacques Nicolas - Dyliss team, Inria

1LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 2: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

What is Logol? gDNA

A new tool for pattern matching on RNAproteinsproteins

motifmotif

• attccggtctacc• ctttgtcacg

attccggtctacc• ctttgtcacg

• taggctggcttcggatt• tcggcattggattcgga• cggatcgattcttttac

taggctggcttcggatt• tcggcattggattcgga cggatcgattcttttac

t h i thsequences matches in the sequences

pattern Model

2LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 3: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Why a new tool?Why a new tool?

Towards more expressive patternsb d tifbeyond motifs TAT-[A|T]-T-xxx-AATTCCCtowards real biological models

Logol language

X  XXY  YYZ 

While remaining practicable Logol toolaccept real sequences (e g full genomes)- accept real sequences (e.g. full genomes)

- in reasonable time

3LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 4: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

O tliOutline

1 Logol language1. Logol language- Foundations- Some elements

2. Logol toolAvailability- Availability

- Design of a pattern- Specifications of the tool

3. An example : modelling « -1 frameshifting sites »

4. Conclusion

4LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 5: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

1. Logol language

5LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 6: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Foundations of the language

• Make the structure of motifs explicit -> Grammatical modelsDescribe «the language of gene » cf David Searls

g g

Describe the language of gene cf David Searls

• with an accurate level of grammar -> String Variable Grammars (SVG)String Variables : X X direct copy atcgttatgtatgttatgaString Variables : X…X direct copy atcgttatgtatgttatga

X…~X reverse complement atcgttatgtatataacga

SVG : beyond context-free grammars: « middly context sensitive »regular grammars : motif (TAT-[A|T]-T-xxx-AATTCCC)

P i l /t l i St i V i bl

context-free grammars : + palindrome (stem-loops)SVG : + copy, repeat

• Previous languages/tools using String VariablesPatscan[Dsouza&al, 97] , Patsearch[Pesole&al,00] limited expressivityGenlang[Dong&Searls, 94], Stan[Nicolas&al, 05] or no more maintained

6LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

-> Logol : in the lineage of Genlang

Page 7: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Some elements of the language 1/3

• A first grammar: looking for « aaaaa » anywhere in the sequence

g g

mod1()==*>SEQ1 mod1()==>"aaaaa" actaaaaatggaaaaagta

I t t h 2 t i t h ($) i d l ($$)• Inexact matches: 2 counters -> mismatch ($) , indel ($$)

-> mod1()==>"aaaaa":{$[0,1]} actagaaatgga cost=1 mismatch-> mod1()==>"aaaaa":{$$[0,1]} actaaaacatgga distance=1 insert> mod1() > aaaaa :{$$[0,1]} actaaaacatgga distance 1 insert

• String Variables : looking for 2 copies of a string (X1) separated by a gap (.*)

mod1()==>X1:{#[5,8]}, .* , X1 actatcaatggatcaagta

• Morphisms: to convert a string into another string personal morphisms allowed

"wc" :Watson Crick complement, "-" : reverse string, "wobble" :wobble cplt-"wc" : reverse complement

tt tt t t

7LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

mod1()==>"-wc" "aacccc" acttggggttggatcaagta

Page 8: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Some elements of the language 2/3• A constraint approachConstraints : begin (@), end (@@), content (?), length (#),

t ($) di t ($$) iti ( )

g g

cost ($), distance ($$), composition(%)

mod1()==> X1:{% "cg":70} X1 must contain at least 70% of ‘c’ and ‘g‘

V i bl d i• Variables can denote instances Instance= string + components->Mark an instance (_VARNAME) and reuse it (?VARNAME, $VARNAME...)

The second string must exactly match the previous instance actaaTaaaaTaactacctThe second string must exactly match the previous instance actaaTaaaaTaactacctmod1()==>"aaaaa":{$[0,1], _SAVE1}, ?SAVE1

‘acgt’ must be located at least 50 nt further than ‘aaaaa’d1() >" " { SAVE1} * " t" {@[@SAVE1+50 @SAVE1+100]}mod1()==>"aaaaa":{_SAVE1}, *., "acgt":{@[@SAVE1+50, @SAVE1+100]}

Looking for 3 strings, successively deriving from each other actaaaaaaaaTaaCatamod1()==>X1:{#[5,8],_S1}, ?S1:{_S2}:{$[1,1]}, ?S2:{$[1,1]}

Looking for a stem-loop, with sizes of : stem in [5,11], loop in [1,9],stem strands linked by Watson-Crick pairing, 2 mismatch + 1 indel allowed in the stem

8LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

mod1()==>STEM5:{#[5,11],_S5},.*:{#[1,9]}, -"wc" ?S5:{$[0,2],$$[0,1]}

Page 9: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Some elements of the language 3/3

• Repeats: looking for "acgt" repeated between 0 and 5 times. The instances may be separated by a Spacer between 0 to 2 nt

g g

may be separated by a Spacer between 0 to 2 nt

mod1()==>repeat("acgt",[0,2])+[0,5] actacgtggacgtcacgtccta

• Negative contain constraints (!): looking for a string with length between 2 and 5 which is not "ag" mod1()==> !"ag":{#[2,5]}

P t constraints on se eral strings• Put constraints on several stringsVIEW: constraints on consecutive segments

The total size of X1::X2::X3 must be between 8 and 20(X1 {#[1 10]} X2 {#[1 10]} X3 {#[1 10]}) {#[8 20]}(X1:{#[1,10]}, X2:{#[1,10]}, X3:{#[1,10]}): {#[8,20]}

Control panel: constraints on non consecutive segments

• Superposition of complementary models: Multiple model ’points of view’Superposition of complementary models: Multiple model points of view-> The sequence must match all the models

Note: parameters may be transferred from one model to another one

9LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

mod1(VAR1).mod2(VAR1,VAR2).mod3()==*>SEQ1

Page 10: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

2. Logol tool

10LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 11: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Availability

On the web : Logol can be used on the GenOuest web site (with restrictions) http://webapps genouest org/LogolDesigner/

y

restrictions) http://webapps.genouest.org/LogolDesigner/

Via Linux command-line on GenOuest plateform with a GenOuestaccountaccount

Download on your own computer NEW! NEW! NEW!Logol software is free and open source, under CeCILL licenseg pIt includes a Linux command line tool and a graphical designer

Logol is a fully maintained tool (development manager: Olivier SALLOU)g y ( p g )

Main logol page:http://logol.genouest.org/web/app.php/logol

11LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 12: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Design of a patterng p

mod4()==>(("aaa")|("ccc")|("uuu")|("ggg")) Grammatical model

text file mymodel.lgg

mod4() (( aaa )|( ccc )|( uuu )|( ggg ))mod2()==>mod4(),(("aaa")|("uuu")),! "g":{#[1,1]}…

Graphical model Graphical modelwith a graphical designermymodel.lgd

http://webapps.genouest.org/LogolDesigner/

12LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 13: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Specifications of the tool

• Input : a Logol model (graphical or grammatical model)a Fasta sequence

p

a Fasta sequence

• Runs on a computer or a grid (Linux)Configurable to support multi-core architectures and to use g ppmultiple nodes to parallelize treatments when possible.Sequences may be split for more parallelization

• Output: a compressed XML file contains all matches of the model• Output: a compressed XML file, contains all matches of the modelWith the details of each match (position of each word, size, number of errors compared to model…)Possibility to convert it to Fasta (sequence only) or GFF outputy ( q y) p

Main pipeline- a Java program transforms the model file into a Prolog programa Java program transforms the model file into a Prolog program- the Prolog program parses the sequence (to find the instances of the model)

it uses -> a Prolog library (with predicates to operate morphisms, % calculus…)

13LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

g y ( p p p , )-> a suffix array indexation (with “Vmatch” or home product “Cassiopee”)

Page 14: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

3 . An example : modelling 1 frameshifting sitesmodelling « -1 frameshifting sites »

14LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 15: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Programmed -1 ribosomal frameshifting

A translational recoding strategy RNA d t di ti t t i

g g

one mRNA may produce two distinct proteins

Ribosome may switches from the translation of the standard ORF (in the 0-frame) t l i ORF (i th 1 f )to an overlapping ORF (in the -1 frame)

slippery site There, the ribosome may slip of 1 nucleotide to the left

start0 …….....//……….. stop0.………………........... stop-1 p

standard protein (from the 0-frame)

alternative protein-> beginning : built from the 0 frame-> end : built from the 1 frame

15LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

-> end : built from the -1 frame

Page 16: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Typical frameshifting site

Located in messenger RNA

yp g

’- slippery motif

heptamer X XXY YYZ

Stem2.3’

3’Stem2 5’- spacer

- stimulatory structuremainly H-type pseudo-knot

Stem2.5

y yp p(two overlapping stem-loops)

Stem1.5’ Stem1.3’

5’ X  XXY  YYZ  NNNNNNNAUG

slippery motif pseudo‐knotspacerstart

16LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 17: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Modelling slippery motifg pp yHeptamer XXXYYYZ with X: any nucleotide, Y: ‘a’ or ‘u’

Z: not a ’g’

Graphical model

Grammatical model

mod4()==>(("aaa")|("ccc")|("uuu")|("ggg"))                      Alternative

17LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

() (( )|( )|( )|( ggg ))mod2()==>mod4(),(("aaa")|("uuu")),! "g":{#[1,1]}             Negation

Page 18: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Modelling spacer g p

Grammatical model

Spacer : a string that contains from 1 to10 nucleotides

Grammatical model

mod5()==>SPACER1:{#[1,10]}             String variable with a size constraint

18LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 19: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Modelling pseudo-knot 1/3 A first modelg pTwo overlapping stem-loops ,

with - sizes of : stem1 in [4,16], loop1 in [1,5 ],t 2 i [3 8] l 2 i [0 4] l 3 i [4 40]stem2 in [3, 8], loop2 in [0,4], loop3 in [4,40]

- stem strands linked by Watson-crick pairing, - 4 mismatches allowed in stem1, 2 allowed in stem2

L3L1

Graphical model

L2 L3

Grammatical model

STEM15:{#[4 16] S15} *:{#[1 5]} STEM25:{#[3 8] S25}STEM15:{#[4,16],_S15},.*:{#[1,5]},STEM25:{#[3,8],_S25},.*:{#[0,4]},-"wc" ?S15 :{$[0,4]},.*:{#[4,40]},-"wc" ?S25 :{$[0,2]}

Logol operators: String variable, size constraint, cost constraint, morphism wc

BUT: test on data > to much false positives The model is not selective enough

19LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

BUT: test on data=> to much false positives. The model is not selective enough

Page 20: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Modelling pseudo-knot 2/3 g pA more realistic model at a glance

20LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 21: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Modeling pseudo-knot 3/3 g pLogol components of a more realistic model

• Precise error and constraint handlingprohibit mismatches at stem extremities :a stem is divided in 3 parts :< first nt, main part of the stem, last nt>and accept wobble pairing (g-u) (wobble morphism def)p p g (g ) ( p )

• Check global statistical constraints Allows describing stem stability A majority of ‘gc’ pairing is requiredA majority of gc pairing is required-> counting ‘gc’ in a stand (at least 50% of ‘g’ and ‘c’ on <A5,STEM5’,Z5>)use contain constraint on adjacent strings (i.e. on a view)

d3() >(A5 {#[1 1] SA5} STEM15 {#[2 14] S15}mod3()==>(A5:{#[1,1],_SA5}, STEM15:{#[2,14],_S15},Z5:{#[1,1],_SZ5}) : {%"gc":50}

> counting ‘c’ in both stands (at least 25% of ‘c’ on <A5 STEM5’ Z5> +-> counting c in both stands (at least 25% of c on <A5,STEM5 ,Z5> + <Z3,STEM3’,A3>) because ‘g’ may be involved in a (weak) wobble pairing

use contain constraint on non adjacent strings (stem5’ + stem3’)

21LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 22: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Modelling frame alignments 1/3 Specificationsg g

• 2 alternatives translations- - X XXY YYZ- - X

start stop0

start

XXY YYZ

All translation in 0 frame

Shifting in 1 frame

Superposition of 3 mandatory models

stop-1 Shifting in -1 frame

Superposition of 3 mandatory models

SlipperySite model : a start, then a slippery site in the -1 frame

ORF0 model : from the same start, a stop is needed in the 0 frame

ORFminus model : from the same start , a stop is needed in the -1 frame

Between start and stop intermediate codons should not contain a stop

22LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Between start and stop, intermediate codons should not contain a stop codon

Page 23: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Modelling frame alignments 2/3 In Logolg g g

• Superposition of the 3 models-> using multiple models in the Logol main ‘rule’ using multiple models in the Logol main rule

• Alignment on the same start-> mark variable START and reuse its position (@START)-> transfer parameters between models (the variable START)

position constraint = @STARTp @

23LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

save the variable START

Page 24: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Modelling frame alignments 3/3 In Logolg g g

• A nonstop codon : a string of size 3nt, which is not a stop> stop is defined as a model by an alternative (uga | uag | uaa)-> stop is defined as a model by an alternative (uga | uag | uaa)

-> then use a view with size constraint (size =3) and negative containconstraint (≠ stop)

> N t d l=> Nonstop model• Set of successive nonstop codons-> consecutive repeat (from 5 to 300 times) of the nonstop model

nonstopstop

24LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 25: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Test

• Test on 30 positive sequences30 « validated 1 frameshift » sequences30 « validated -1 frameshift » sequencesFrom the database Recode2 http://recode.ucc.ie/Size: from 5 000 nt to 30 000 nt Hits: about 3 sites and 100 hits per sequence including the good one

X  XXY  YYZ 

Hits: about 3 sites and 100 hits per sequence, including the good onewith an additional post-filter ordering the hits according to stem quality, the official hit is ranked 1st for 20 sequences

Time (on Intel X5550, 144Go RAM): 1mn 30s for the biggest sequence Time (on Intel X5550, 144Go RAM): 1mn 30s for the biggest sequenceimmediate response on dedicated KnotInFrame site

• Test on Bacillus Subtilis complete genome• Test on Bacillus Subtilis complete genomeReference: Str 168 NC_000964.3Size : 4 215 606 nt Hits : about 7 000 hits (with a looser model) Hits : about 7 000 hits (with a looser model) Time (on Intel X5550, 144Go RAM): 2 hoursno response on KnotInFrame site

25LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 26: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

4. Conclusion

26LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 27: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Logol:• A general purpose modelling language, for every type of sequences

-> with quite important expressivity-> practicable and maintained

• A new birth for String Variable Languages

• An ongoing project : g g p j- still in progress- your returns are expected to go on improving the

language and the toollanguage and the tool

A single adress :http://logol genouest org/web/app php/logolhttp://logol.genouest.org/web/app.php/logol

Test it!27

LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 28: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

5. Supplementary data

28LOGOL C. Belleannée, O. Sallou, J. Nicolas Jobim 3 juillet 2012

Page 29: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Job sequence

Grammar interpreter(convert grammar to prolog)

Job Managersplit sequence if multi core

Sequence Parser(Execute prolog program)

VmatchOr(convert grammar to prolog) split sequence if multi core ( p g p g )

1 1 1 1 1 N

Or Cassiopee

Logol grammar file N

Result file1 sequence per jobP ll l j b if l t ti l if l l

1

Parallel jobs if on cluster or sequential if local

Merged results

Fasta sequencesMulti Sequence Manager

Merged results

Page 30: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Grammar analysisGrammar analysisTh l i J d t L l• The analyzer is a Java program used to parse a Logolgrammar file or model (graphical) and transform it in a Prolog programg p g

• The Prolog program uses a Prolog library that contains predicates to match expressions (fixed content, 

hi )morphisms,…)• Logic programming: generated program tries to match all expressions up to a final match else it goesall expressions up to a final match, else it goes backward to test the next possibility.

• The analyzer extracts a maximum of information for a yvariable to limit the range of search (max size, base content defined later on in the grammar,….)

Page 31: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Execution flowExecution flow• Left to right file reading but accepts variable use before being determined (• Left to right file reading but accepts  variable use before being determined ( 

_R1,”acgt”:{?R1} )• Parse the sequence file by position and try to match an expression (content with 

error, constraints on a group of variable,…). When all expressions are matched, record it as a result.

• A match is recorded as an object. This object records all the information of the match all along the parsing: Object = [pattern1, pattern2,…]

– patternX is itself an object holding match information (position errors ) but also sub– patternX is itself an object holding  match information (position, errors,…) but also sub patterns if applicable (models, repeats,…).

• In case of gap (not a position+1):– Calls Vmatch (suffix array) or cassiopee when content is known (possibly with errors)

V t h i f t ffi h t l ith di t / t• Vmatch is a performant suffix array search tool with distance/error support• Cassiopee is a basic Ruby library with distance/error/ambiguity support, but not optimized for large 

sequences.– Continue by position with all possibilities when looking for content not defined at this time

• Optional final step: filtering to keep optimal results only• Optional final step: filtering to keep optimal results only.

Page 32: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

Z ifi it f L l i bl Zoom on a specificity of Logol variables

X:: $[0,1], X:: $[0,1] tandem repeatp

-> It accepts ATTA with X= ‘‘AA ’’ (or X= ‘‘TT’’ ). This is impossible with Genlang (X, X:: $1 ). So, Logol allows to make the distinction between entity (abstract) and instance (concrete)

Control pannel: To put constraints on non adjacent stringsp p j g

Counting ‘c’ in both stands (at least 25% of ‘c’ on <A5,STEM5’,Z5> + <Z3,STEM3’,A3>) because ‘g’ may be involved in a (weak) wobble pairing

use contain constraint on non adjacent strings (stem5’ + stem3’)

controls:{ %"c"[mod3.SA5,mod3.S15,mod3.SZ5,mod3.SZ3,mod3.S13,mod3.SA3]>=25}

Jobim - 3 juillet 2012 LOGOL C.Belleannée O.Sallou J.Nicolas 32

Page 33: Expressive pattern matching with LOGOL Application to the …jobim2012.inria.fr/sources/slides/s3.pdf · 2012-07-19 · Expressive pattern matching with LOGOL Application to the modelling

#définition du morphisme wobble A final grammatical model for -1 Frameshiftingdef:{ / without frame alignment /morphism(wcw a t)morphism(wcw,a,t) morphism(wcw,t,a) morphism(wcw,c,g) morphism(wcw,g,c) morphism(wcw,g,t) morphism(wcw,t,g) } #définition des contrôles#définition des contrôlescontrols:{ % "c"[mod3.SA5,mod3.STEM15,mod3.SZ5,mod3.SZ3,mod3.STEM13,mod3.SA3]>=25 % "c"[mod3.SX5,mod3.STEM25,mod3.SY5,mod3.SY3,mod3.STEM23,mod3.SX3]>=25 }} #définition du modèle Logolmod4()==>(("aaa")|("ccc")|("ttt")|("ggg")) mod2()==>mod4() (("aaa")|("ttt")) ! "g":{#[1 1]}mod2() mod4(),(( aaa )|( ttt )),! g :{#[1,1]} mod3()==>(A5:{#[1,1],_SA5},S15:{#[2,14],_STEM15},Z5:{#[1,1],_SZ5}):{%"gc":50},L1:{#[1,5]},(X5:{#[1,1],_SX5},S25:{#[1,6],_STEM25},Y5:{#[1,1],_SY5}):{%"gc":50},L11 {#[0 4]}L11:{#[0,4]},-"wcw" Z3:{?SZ5,_SZ3},-"wc" ?STEM15:{_STEM13}:{p$[0,34]},-"wcw" A3:{?SA5,_SA3},L2:{#[4,40]},-"wcw" Y3:{?SY5, SY3},-"wc" ?STEM25:{ STEM23}:{p$[0,34]},-"wcw" X3:{?SX5, SX3}

33

wcw Y3:{?SY5,_SY3}, wc ?STEM25:{_STEM23}:{p$[0,34]}, wcw X3:{?SX5,_SX3} mod1()==>CC1:{#[2,2]},mod2(),SPACER2:{#[1,10]},mod3()mod1()==*>SEQ1