Transcript
Page 1: Exploring Protein Sequences

Exploring Protein Sequences

Tutorial 5

Page 2: Exploring Protein Sequences

Exploring Protein Sequences

• Multiple alignment
  – ClustalW
• Motif discovery
  – MEME
  – JASPAR

Page 3: Exploring Protein Sequences

• More than two sequences
  – DNA
  – Protein
• Evolutionary relation
  – Homology
  – Phylogenetic tree
  – Detect motif

Multiple Sequence Alignment

[Figure: example DNA sequence shown with gaps (GTCGTAGTCG-GC-TCGACGTC-TAG-...) and without gaps, next to a tree with node labels A, B, D]

Page 4: Exploring Protein Sequences

• Dynamic Programming
  – Optimal alignment
  – Exponential in the number of sequences
• Progressive
  – Efficient
  – Heuristic
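The trade-off above can be made concrete with a minimal sketch. It assumes a simple match/mismatch score and a linear gap penalty (the values are illustrative, not ClustalW defaults). `global_alignment_score` is the classic pairwise dynamic program, and `dp_cells` counts the table cells a full dynamic program would need for several sequences, which is where the exponential cost comes from.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global (Needleman-Wunsch) score of two sequences, linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def dp_cells(lengths):
    """Cells in a full DP table: the product of (length + 1) over all sequences."""
    cells = 1
    for n in lengths:
        cells *= n + 1
    return cells


print(global_alignment_score("ACGT", "AGT"))
print(dp_cells([100, 100]), dp_cells([100] * 5))
```

Two sequences of length 100 need about ten thousand cells; five such sequences already need about ten billion, which is why progressive heuristics are used in practice.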

Multiple Sequence Alignment

[Figure: the gapped alignment example from page 3, repeated]

Page 5: Exploring Protein Sequences

ClustalW

“CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J. D. Thompson et al., Nucleic Acids Research, 1994

Page 6: Exploring Protein Sequences

• Progressive
  – At each step align two existing alignments or sequences
  – Gaps present in older alignments remain fixed

ClustalW

GTCGTAGTCG-GC-TGTC-TAG-CGAGCGTGC-GAAG-AG-GCG-GCCGTCG-CG-TCGT

GTCGTAGTCGGCTCGACGTCTAGCGAGCGTGATGCGAAGAGGCGAGCGCCGTCGCGTCGTAAC

Page 7: Exploring Protein Sequences

ClustalW - Input

Scoring matrix

Gap scoring

Input sequences

Page 8: Exploring Protein Sequences

ClustalW - Output

Page 9: Exploring Protein Sequences

ClustalW - Output

Input sequences

Pairwise alignment scores

Building alignment

Final score

Page 10: Exploring Protein Sequences

ClustalW - Output

Page 11: Exploring Protein Sequences

ClustalW - Output

Sequence names Sequence positions

Match strength in decreasing order: * : .
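A rough sketch of how such a marker line could be derived from aligned rows follows. The "strong" and "weak" residue groups are written from memory of the ClustalW documentation and should be treated as an assumption, as should the choice to leave gap-containing columns blank. The example rows are the six motif instances shown later in these slides.

```python
# Residue groups as recalled from the ClustalW documentation (assumption, verify before use).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(rows):
    """Return one marker per column: '*' identical, ':' strong group, '.' weak group."""
    marks = []
    for column in zip(*rows):
        residues = set(column)
        if "-" in residues:
            marks.append(" ")  # assumption: columns with gaps get no marker
        elif len(residues) == 1:
            marks.append("*")
        elif any(residues <= set(group) for group in STRONG):
            marks.append(":")
        elif any(residues <= set(group) for group in WEAK):
            marks.append(".")
        else:
            marks.append(" ")
    return "".join(marks)


rows = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
        "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
print(conservation_line(rows))
```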

Page 12: Exploring Protein Sequences

http://www.megasoftware.net/

Page 13: Exploring Protein Sequences

Can we find motifs using multiple sequence alignment?

     1    2    3    4    5    6    7    8    9    10
A    0    0    0    0    0    0.5  1/6  1/3  0    0
D    0    0.5  1/3  0    0    1/6  5/6  1/6  0    1/6
E    0    0    2/3  1    0    0    0    0    1    5/6
G    0    1/6  0    0    1    1/3  0    0    0    0
H    0    1/6  0    0    0    0    0    0    0    0
N    0    1/6  0    0    0    0    0    0    0    0
Y    1    0    0    0    0    0    0    0.5  0    0

  1 3 5 7 9
..YDEEGGDAEE..
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
  * :**   *:

Motif

A widespread pattern with biological significance
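The frequency table on this slide can be recomputed directly from the six aligned motif instances. The sketch below uses exact fractions so the values can be compared with the table entry by entry; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction

SITES = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
         "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]


def frequency_matrix(rows):
    """Map each residue to its per-column frequency across the aligned rows."""
    n = len(rows)
    columns = [Counter(col) for col in zip(*rows)]
    alphabet = sorted(set("".join(rows)))
    # Counter returns 0 for residues absent from a column.
    return {res: [Fraction(col[res], n) for col in columns] for res in alphabet}


matrix = frequency_matrix(SITES)
for res, freqs in matrix.items():
    print(res, " ".join(str(f) for f in freqs))
```

Every column sums to 1, which is a quick sanity check for any table of this kind.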

Page 14: Exploring Protein Sequences

Can we find motifs using multiple sequence alignment?

YES! NO

Page 15: Exploring Protein Sequences

MEME – Multiple EM for Motif finding

• http://meme.sdsc.edu/
• Motif discovery from unaligned sequences
  – Genomic or protein sequences
• Flexible model of motif presence (a motif can be absent in some sequences or appear several times in one sequence)

Page 16: Exploring Protein Sequences

MEME - Input

Email address

Multiple input sequences

How many times in each sequence?

How many motifs?

How many sites?

Range of motif lengths

Page 17: Exploring Protein Sequences

MEME - Output

Motif length

Number of times

Like BLAST

Page 18: Exploring Protein Sequences

MEME - Output

Probability * 10

‘a’=10, ‘:’=0
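As a small illustration of this compact display, the sketch below maps a probability to one character: the probability times ten, with 'a' standing for 10 and ':' for 0. The rounding rule is an assumption; MEME's own output may round differently.

```python
def pspm_char(p):
    """Encode a probability as a single character on a 0..10 scale."""
    level = round(p * 10)  # assumption: nearest integer
    if level >= 10:
        return "a"
    if level <= 0:
        return ":"
    return str(level)


# One matrix row encoded position by position.
print("".join(pspm_char(p) for p in [1.0, 0.0, 0.5, 0.8, 0.0]))
```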

Page 19: Exploring Protein Sequences

MEME - Output

Low uncertainty = High information content
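The relation between uncertainty and information content can be sketched per motif column as the maximum possible entropy minus the observed entropy. This is the textbook definition with a uniform background and no small-sample correction, both of which are assumptions; MEME's reported values may be computed differently.

```python
import math


def information_content(column_probs, alphabet_size=20):
    """Bits of information in one column: log2(alphabet size) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column_probs if p > 0)
    return math.log2(alphabet_size) - entropy


conserved = information_content([1.0])                 # a single residue, no uncertainty
mixed = information_content([0.5, 1 / 6, 1 / 6, 1 / 6])  # four residues compete
print(conserved, mixed)
```

A fully conserved protein column carries log2(20) bits, about 4.32, and the value drops as more residues share the column.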

Page 20: Exploring Protein Sequences

MEME - Output

Multilevel Consensus
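A multilevel consensus can be sketched as follows: for every column, keep the residues whose probability reaches a threshold, order them by decreasing probability, and print them level by level. The threshold of 0.2 is an assumption about MEME's convention, and the input is the set of six motif instances used earlier in these slides.

```python
from collections import Counter

SITES = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
         "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]


def column_probabilities(rows):
    """One dict per column mapping residue to its observed frequency."""
    n = len(rows)
    return [{res: count / n for res, count in Counter(col).items()}
            for col in zip(*rows)]


def multilevel_consensus(columns, threshold=0.2):
    """Rows of the consensus: most frequent residues first, then runners-up."""
    ranked = [sorted((res for res, p in col.items() if p >= threshold),
                     key=lambda res: (-col[res], res))
              for col in columns]
    depth = max((len(r) for r in ranked), default=0)
    return ["".join(r[level] if level < len(r) else " " for r in ranked)
            for level in range(depth)]


levels = multilevel_consensus(column_probabilities(SITES))
for line in levels:
    print(line)
```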

Page 21: Exploring Protein Sequences

Sequence names

Reverse complement (genomic input only)

Position in sequence

Strength of match

Motif within sequence

MEME - Output

Page 22: Exploring Protein Sequences

Overall strength of motif matches

sequence lengths

Motif instance

MEME - Output

‘-’=Other strand

Page 23: Exploring Protein Sequences

MAST

• Searches for motifs (one or more) in sequence databases:
  – Like BLAST, but with motifs as input
  – Similar to iterations of PSI-BLAST
• Profile defines strength of match
  – Multiple motif matches per sequence
  – Combined E value for all motifs
• MEME uses MAST to summarize results:
  – Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.

Page 24: Exploring Protein Sequences

JASPAR

• Profiles
  – Transcription factor binding sites
  – Multicellular eukaryotes
  – Derived from published collections of experiments
• Open data access

Page 25: Exploring Protein Sequences

JASPAR

• Profiles
  – Modeled as matrices
  – Can be converted into PSSM for scanning genomic sequences

     1    2    3    4    5    6    7    8    9    10
A    0    0    0    0    0    0.5  1/6  1/3  0    0
D    0    0.5  1/3  0    0    1/6  5/6  1/6  0    1/6
E    0    0    2/3  1    0    0    0    0    1    5/6
G    0    1/6  0    0    1    1/3  0    0    0    0
H    0    1/6  0    0    0    0    0    0    0    0
N    0    1/6  0    0    0    0    0    0    0    0
Y    1    0    0    0    0    0    0    0.5  0    0
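The conversion of a profile into a PSSM and its use for scanning can be sketched as below. The count matrix is hypothetical (it is not a JASPAR entry), and the uniform background of 0.25 and the pseudocount of 1 are illustrative assumptions. Each PSSM entry is the log2 ratio of the smoothed column frequency to the background frequency, and a window's score is the sum of its entries.

```python
import math

# Hypothetical counts for a 4-column DNA motif; not taken from JASPAR.
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 10, 0, 0],
    "T": [1, 0, 1, 8],
}


def to_pssm(counts, background=0.25, pseudocount=1.0):
    """Turn a count matrix into per-column log-odds scores."""
    width = len(next(iter(counts.values())))
    pssm = []
    for j in range(width):
        total = sum(counts[b][j] for b in counts) + pseudocount * len(counts)
        pssm.append({b: math.log2((counts[b][j] + pseudocount) / total / background)
                     for b in counts})
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; returns (start index, score) pairs."""
    width = len(pssm)
    return [(i, sum(col[sequence[i + j]] for j, col in enumerate(pssm)))
            for i in range(len(sequence) - width + 1)]


hits = scan("TTAGCTAA", to_pssm(COUNTS))
best = max(hits, key=lambda hit: hit[1])
print(best)
```

The pseudocount keeps unobserved bases from producing a score of minus infinity, at the cost of slightly flattening the profile.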

Page 26: Exploring Protein Sequences

Search profile

http://jaspar.cgb.ki.se/

Page 27: Exploring Protein Sequences

http://jaspar.cgb.ki.se/

