Page 1: FAMILIAS DE PROTEINAS - ubio.bioinfo.cnio.esubio.bioinfo.cnio.es/Cursos/cursoVerano2008/documentos/arojas_su… · Homology: common ancestor. May have different function. Do not forget

Curso de verano UCM 2008 - Ana Rojas, CNIO - JUL-2008

PROTEIN FAMILIES


What is a sequence? . . . a string of characters…

Amino acids: ACDEFGHIKLMNPQRSTVWY

Nucleotides: A: adenine, C: cytosine, T: thymine, G: guanine

MMITRWLFSTNHKDIGTLYMIFGAWAGMVGTALSLLIRAELSQPGALLGDDQIYNVIV

GTGATAATCACTCGTTGACTATTCTCAACCAACCACAAAGATATTGGTACCCTATACATGATTTTCGGGGCCTGAGCTGGAATAGTTGGAACCGCTCTAAGCCTACTTATTCGAGCCGAACTCAGCCAACCTGGAGCTCTCCTA

The User Guide

“Real” players

Translation of the message (after transcription to RNA)

Genetic code: the triplet AGG encodes R (Arg); codon = amino acid

DNA

Protein

{ATGC}

{4^3} = 64 possible triplets

Modified from F. Abascal
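The codon-to-amino-acid mapping above can be sketched in a few lines. This is a minimal illustration that assumes the standard genetic code; the slide does not state which code its example sequences use, so the function is not guaranteed to reproduce them. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (assumption: the standard table; other tables differ in a few codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Degeneracy: several codons map to the same amino acid.
arginine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "R")
print(translate("AGG"), arginine_codons)
```

The table has 4^3 = 64 entries for 20 amino acids plus stop, which is why the code is degenerate.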


We want to understand Proteins

Reminder: the genetic code is “degenerate” (several codons encode the same amino acid), leaving room for change!


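[Figure: general structure of an amino acid: a central carbon bonded to H, an amino group (NH2), a carboxyl group (CO2H) and a side chain R]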

. . . And Proteins are built from amino-acids


. . . And amino acids have spatial constraints

Peptide bonds are rigid and planar

[Figure: schema of a peptide bond: a chain of three residues with side chains R1, R2 and R3, running from the N-terminus (NH2) to the C-terminus (COOH), with the peptide bonds marked]


1D (sequence): TTCCPSIVAR SNFNVCRLPG TPEAICATYT GCIIIPGATC PGDYAN

Secondary structure: EE SSHHHH HHHHHHHTTT HHHHHHHH S EE SSS GGG

3D

. . . And spatial constraints give a particular 3D shape

Sequences also give structural information (Structures)


We’ve seen so far:

We compare sequences

Goal: find the “most likely” alignment (we’ll never be sure) that reflects the changes.

RPE_YEAST  6 IAPSIL----ASDFANLGCECHKVINAGADWLHIDVMDGHFVPNITLGQP 51
             ||.|:|    ..|...|    .:.:..|...:|.|||| |||.|.::...
RPE_MYCPN 10 IAFSLLPLLHQFDRKLL----EQFFADGLRLIHYDVMD-HFVDNTVFQGE 54
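As a small illustration of what "comparing" means here, the sketch below counts identical positions in the RPE_YEAST / RPE_MYCPN alignment shown on this slide. The function name `alignment_identity` and the choice to skip gapped columns are assumptions made for the example, not a convention stated in the course.

```python
def alignment_identity(row_a, row_b, gap="-"):
    """Return (identical, aligned) column counts for two aligned rows.

    Columns where either row has a gap are skipped (an assumption; other
    definitions of percent identity divide by the full alignment length).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = aligned = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1
    return identical, aligned


RPE_YEAST = "IAPSIL----ASDFANLGCECHKVINAGADWLHIDVMDGHFVPNITLGQP"
RPE_MYCPN = "IAFSLLPLLHQFDRKLL----EQFFADGLRLIHYDVMD-HFVDNTVFQGE"

identical, aligned = alignment_identity(RPE_YEAST, RPE_MYCPN)
print(f"{identical} identical residues over {aligned} aligned columns")
```

The identical columns are the ones marked with "|" in the alignment.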


What is homology?


• Homologue: the same organ under every variety of form and function (true or essential correspondence).

• Analogy: superficial or misleading similarity.

Owen’s definition of homology

Richard Owen, 1843

Remember: everything is about homology.


Analogy: same function but different origin.

Homology: common ancestor. May have different function.

Do not forget the underlying concept!


HOMOLOGY IMPLIES A COMMON EVOLUTIONARY ORIGIN

Remember: everything is about homology.


Similarity ≠ Homology

Similarity: mathematical concept.

Homology: biological concept.

Remember: everything is about homology.


Sequence analyses

Transference of function by homology?

Modified from F. Abascal

But all is a matter of definition:

How do we define function?

How do we transfer the function?


Multiple alignments give more information than pair-wise

- Consensus sequences:

AGTVATVSC
AGTSATHAC
IGRCARGSC
IGEMARLAC
IGDYARWSC
.........
IGTVARVSC <= Consensus

- Regular expressions or patterns (to screen motifs):

ALRDFATHDDDF
SMTAEATHDSI
ECDQAATHEAS

A-T-H-[DE]

- Profiles & HMM profiles
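A consensus like the one on this slide can be derived by taking the most frequent residue in each column. The sketch below is one simple way to do it; the tie-breaking rule (the first residue seen in the column wins) is an assumption chosen for the example, and `consensus` is a hypothetical helper name.

```python
def consensus(alignment):
    """Most frequent residue per column; ties go to the first residue in the column."""
    # max() returns the first maximal element, which gives the tie-break above.
    return "".join(max(column, key=column.count) for column in zip(*alignment))


ALIGNMENT = [
    "AGTVATVSC",
    "AGTSATHAC",
    "IGRCARGSC",
    "IGEMARLAC",
    "IGDYARWSC",
]

print(consensus(ALIGNMENT))
```

Columns in which every residue differs carry no real majority, which is one reason patterns and profiles are more informative than a single consensus string.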


Regular expressions

• Any residue: x
• Ambiguity: [A,B] means A or B...; {A,B..} means any residue except A and B.
• Repeat: A(2,4) means A-A or A-A-A or A-A-A-A
• N-terminal: <, C-terminal: >

Example: [AC]-x-V-x(4)-{E,D}.

[Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
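The pattern syntax above maps almost one-to-one onto ordinary regular expressions. The converter below is a rough sketch covering only the elements listed on this slide (x, [..], {..}, repeats, < and >); it is not a complete PROSITE parser, and `prosite_to_regex` is a hypothetical name.

```python
import re

# One pattern element: optional '<', a core (x, a residue, [..] or {..}),
# an optional repeat "(n)" or "(n,m)", and an optional '>'.
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z,]+\]|\{[A-Z,]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Convert a slide-style pattern such as [AC]-x-V-x(4)-{E,D} to a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            regex = "."                                   # any residue
        elif core.startswith("["):
            regex = "[" + core[1:-1].replace(",", "") + "]"    # one of
        elif core.startswith("{"):
            regex = "[^" + core[1:-1].replace(",", "") + "]"   # none of
        else:
            regex = core                                  # literal residue
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + regex + ("$" if end else ""))
    return "".join(parts)


motif = re.compile(prosite_to_regex("A-T-H-[DE]"))
hits = [bool(motif.search(s)) for s in ("ALRDFATHDDDF", "SMTAEATHDSI", "ECDQAATHEAS")]
print(prosite_to_regex("[AC]-x-V-x(4)-{E,D}."), hits)
```

Commas inside brackets are accepted because the slide writes `{E,D}`; they are simply dropped during conversion.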


Profiles (or PSSMs): like a substitution matrix (e.g. BLOSUM), but position-specific.

Multiple alignment:

F K L L S H C L L V
F K A F G Q T M F Q
Y P I V G Q E L L G
F P V V K E A I L K
F K V L A A V I A D
L E F I S E C I I Q
F K L L G N V L V C

Profile (rows: amino acids; columns: alignment positions 1 to 10):

      1    2    3    4    5    6    7    8    9   10
A   -18  -10   -1   -8    8   -3    3  -10   -2   -8
C   -22  -33  -18  -18  -22  -26   22  -24  -19   -7
D   -35    0  -32  -33   -7    6  -17  -34  -31    0
E   -27   15  -25  -26   -9   23   -9  -24  -23   -1
F    60  -30   12   14  -26  -29  -15    4   12  -29
G   -30  -20  -28  -32   28  -14  -23  -33  -27   -5
H   -13  -12  -25  -25  -16   14  -22  -22  -23  -10
I     3  -27   21   25  -29  -23   -8   33   19  -23
K   -26   25  -25  -27   -6    4  -15  -27  -26    0
L    14  -28   19   27  -27  -20   -9   33   26  -21
M     3  -15   10   14  -17  -10   -9   25   12  -11
N   -22   -6  -24  -27    1    8  -15  -24  -24   -4
P   -30   24  -26  -28  -14  -10  -22  -24  -26  -18
Q   -32    5  -25  -26   -9   24  -16  -17  -23    7
R   -18    9  -22  -22  -10    0  -18  -23  -22   -4
S   -22   -8  -16  -21   11    2   -1  -24  -19   -4
T   -10  -10   -6   -7   -5   -8    2  -10   -7  -11
V     0  -25   22   25  -19  -26    6   19   16  -16
W     9  -25  -18  -19  -25  -27  -34  -20  -17  -28
Y    34  -18   -1    1  -23  -12  -19    0    0  -18
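To make the idea of a position-specific matrix concrete, here is a minimal sketch that builds a log-odds profile from the seven-sequence alignment on this slide and scores a candidate segment with it. It assumes a uniform background and a simple pseudocount, so its numbers will not match the matrix shown above, which was presumably built with a substitution-matrix-based scheme. `build_pssm` and `score_segment` are hypothetical names.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

ALIGNMENT = [
    "FKLLSHCLLV",
    "FKAFGQTMFQ",
    "YPIVGQELLG",
    "FPVVKEAILK",
    "FKVLAAVIAD",
    "LEFISECIIQ",
    "FKLLGNVLVC",
]


def build_pssm(alignment, pseudocount=1.0):
    """One dict of log2-odds scores per column (uniform background assumed)."""
    background = 1.0 / len(ALPHABET)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def score_segment(pssm, segment):
    """Sum the position-specific scores of a segment as long as the profile."""
    if len(segment) != len(pssm):
        raise ValueError("segment and profile must have the same length")
    return sum(column[aa] for column, aa in zip(pssm, segment))


pssm = build_pssm(ALIGNMENT)
print(score_segment(pssm, "FKLLSHCLLV"), score_segment(pssm, "WWWWWWWWWW"))
```

Unlike a single substitution matrix, the same residue scores differently at each position: F is strongly favoured in column 1 but not in column 2.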


Alignment of 5 sequences

3 consensus columns

• m: a match state; each has 20 residue emission probabilities (black bars).
• i: an insertion state, also with 20 emission probabilities.
• d: delete states, “mute” states with NO emission probabilities.
• b: begin; e: end.
• Arrows are transition probabilities.

Positions are NOT independent: profile HMMs


Do not forget that pair-wise alignments are still useful

Global versus local

Global: seeks the best alignment over the whole length of the sequences. Appropriate only if the proteins have the same composition!

Local: seeks the maximum-scoring fragments. Here we find the domain shuffling issue: pieces of sequences are shuffled along evolutionary history.
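The difference between the two modes can be seen in a small dynamic-programming sketch. The function below computes only the optimal score, with a simple linear gap penalty and illustrative match/mismatch values; these parameters and the name `alignment_score` are assumptions for the example, not the scoring scheme of any particular program.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal alignment score: global (whole sequences) or local (best fragment)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global mode: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # local mode: a fragment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    return best if local else score[-1][-1]


domain = "MKVLAAGIW"
multidomain = "PPPPP" + domain + "QQQQQ"  # the same domain embedded in a longer protein
print(alignment_score(domain, multidomain, local=True),
      alignment_score(domain, multidomain, local=False))
```

When a shared domain sits inside a longer protein, the local score captures the common fragment while the global score is dragged down by the unmatched flanks.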


Evolution Models


A long time ago…

ACCGTACGGTTAA

ACGGTACGGTTAA  ACCGTCCGGTTAA  ACCGT-CGGTTAA  CCCGTACGGTTAA  ACCCGTACGGTTAA

time

A general evolution model: random change + natural selection


A long time ago…

ACCGTACGGTTAA

ACGGTACGGTTAA  ACCGTCCGGTTAA  ACCGT-CGGTTAA  CCCGTACGGTTAA  ACCCGTACGGTTAA

time

ACCG-CCGGTTAA  ACCCTCCGGTTAA  ACCGTCCGGTTCCCAA  TCCGTCCGGTTAA  ACCGTCCGCTTAA

Model: random change + natural selection


A long time ago…

ACCGTACGGTTAA

ACGGTACGGTTAA  ACCGTCCGGTTAA  ACCGT-CGGTTAA  CCCGTACGGTTAA  ACCCGTACGGTTAA

time

ACCG-CCGGTTAA  ACCCTCCGGTTAA  ACCGTCCGGTTCCCAA  TCCGTCCGGTTAA  ACCGTCCGCTTAA

× n species

ACCTCTAGTTAA

ACCGTTCCGAA

ACCGTCCGGTTGA

GGAGTACGGTTAA

ACCTGCAATTA

ACCGTACGGTTATA

ACCGTCGTAA

ACCGTACCCCGGTTAA

GCCGTACCGTGGTCCA

CCGTCCCGTTAA

AACCGTACGGTTAA

Model : random change + natural selection
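The "random change + natural selection" model can be mimicked with a toy simulation: random point substitutions, insertions and deletions are proposed, and a selection filter decides which variants survive. Everything below is an illustrative assumption (uniform mutation choice, a caller-supplied viability test, hypothetical function names); it is not a realistic evolutionary model.

```python
import random

BASES = "ACGT"


def substitute(seq, rng):
    """Replace one random position with a different base."""
    pos = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]


def insert(seq, rng):
    """Insert one random base at a random position."""
    pos = rng.randrange(len(seq) + 1)
    return seq[:pos] + rng.choice(BASES) + seq[pos:]


def delete(seq, rng):
    """Delete one random position (sequences of length 1 are left alone)."""
    if len(seq) <= 1:
        return seq
    pos = rng.randrange(len(seq))
    return seq[:pos] + seq[pos + 1:]


def evolve(ancestor, generations, rng, is_viable=lambda seq: True):
    """Random change, then selection: a mutant replaces its parent only if viable."""
    seq = ancestor
    for _ in range(generations):
        candidate = rng.choice([substitute, insert, delete])(seq, rng)
        if is_viable(candidate):
            seq = candidate
    return seq


ANCESTOR = "ACCGTACGGTTAA"
rng = random.Random(0)  # fixed seed so that runs are repeatable
descendant = evolve(ANCESTOR, 5, rng, is_viable=lambda s: s.endswith("TAA"))
print(ANCESTOR, "->", descendant)
```

Here the selection filter keeps only variants that still end in TAA, a stand-in for any functional constraint that makes some positions change less than others.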


How do proteins evolve? A hypothetical case:

[Figure: evolutionary scheme of a hypothetical protein. Event labels: duplication; time & mutation; mutation; speciation; pseudogene => LOST. Function labels: F(x); specificity; glucose hydrolysis; ribose hydrolysis.]

Starting sequence: A T F Y A G C D E L

Sequences shown along the scheme: A L F Y A G C E E L; A S Y Y A G C D E I; A T F Y A G C D E L; A S Y Y A G G D E I

Resulting groups:

*A S Y Y A G G D E I
A S Y Y A G G D E I
A T Y Y D G G D E I
A T Y L A G G D E I
A S R L A G G D E I
A S Y Y A G G D E I

*A L F Y A G C E E L
A L F Y A G C E E L
A I F R A G C E E T
A I F R A G C E E L
A V F Y A G C E E L


Back to protein evolution: gene duplication?


Homologs: orthologs and paralogs.

Orthologs: genes that share the last common ancestor and whose divergence is due to speciation.

The same genes in different species.

Paralogs: genes that, because of a duplication, no longer share the last ancestor. They frequently have different functions.

Image taken from a presentation by Manuel José Gómez (CAB)


Superfamily: a group of proteins with a common origin.

Family / Subfamily: a group of proteins with a common function (subjective hierarchy).

ATP/GTP-binding proteins (superfamily)

ras family

GTP-binding proteins

elongation factors

ATP-binding proteins

rab (H. sapiens)

rab (M. musculus)

rab (C. elegans)

ras (H. sapiens)

ras (M. musculus)

ras (C. elegans)

ras2 (H. sapiens)

ras subfamily

rab subfamily

Two ways of representing it

ras

rab

Model : random change + natural selection + gene duplication

(From Federico Abascal)


Homologs: orthologs and paralogs.

rab (H. sapiens)

rab (M. musculus)

rab (C. elegans)

ras (H. sapiens)

ras (M. musculus)

ras (C. elegans)

ras2 (H. sapiens)

ras subfamily: a group of orthologs and in-paralogs.

rab subfamily: a group of orthologs.

The two subfamilies are paralogous to each other.

In-paralogs: recent duplication.


Why do we compare sequences of . . .

. . . proteins?

1- To learn the function of proteins:
- general function.
- important residues, e.g. active sites.
2- To determine in which species a protein is present.
3- To predict the 3D structure of proteins.
4- To predict functional specificity.

. . . DNA?

- To search for genes:
  - ESTs.
  - genomic DNA.
- For population genetics studies (SNPs).
- To compare non-coding sequences.

Related sessions: (Genomics) (Phylogeny) (Structure)

Modified from F. Abascal


What are sequences telling us?

• A single sequence:

ADGHLSCETRDLWYALDSOPRL

NOTHING! It is uninformative in evolutionary terms. … But the physical and chemical properties of amino acids CAN AID in secondary structure prediction (Structures).

• Two sequences:

A L F Y A G C E E L
A I F R A G C E E T

Very little: we could find clear orthologs, or detect gene duplication.

• Many sequences:

A S Y Y A G G D E I
A T Y Y D G G D E I
A T Y L A G G D E I
A S R L A G G D E I
A S Y Y A G G D E I

A L F Y A G C E E L
A I F R A G C E E T
A I F R A G C E E L
A V F Y A G C E E L

Maybe A LOT!
- We can analyse trends in the alignments: evolutionary information.
- We can use this information to increase the complexity of our searches.

PROTEIN FAMILIES are analysed via multiple alignments.


The methods: sequence comparison.

How? What for?

- Pair-wise:
  - align 2 sequences
  - BLAST search against databases.
  What for: • CLEAR ORTHOLOGS

- Several proteins:
  - multiple alignments (ClustalW, T-COFFEE, probabilistic).
  What for: • Detect paralogs and highly diverged sequences • Evolutionary history

- With motifs, profiles and HMMs:
  - profiles: PSI-BLAST.
  - some DBs: PROSITE, Pfam, InterPro
  What for: • Detect partial sequences • Go beyond the twilight zone


How to identify orthologs?


There may be many paralogs, many of them similar to each other:

Which one is the real ortholog?


The BEST bidirectional hit method (BBHs)

Genomes X and Y

Ancestral genome 0

Ancestral genome 1

No deletion


Gene A2 is deleted: the method still works, though!

The BEST bidirectional hit method (BBHs)


Gene A2 is deleted in genome X and gene A1 is deleted in genome Y: the method DOES NOT work

The BEST bidirectional hit method (BBHs)
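The three scenarios on these slides (no deletion, one deletion, complementary deletions) can be reproduced with a short sketch of the bidirectional best hit idea. The similarity scores and gene names below are made up for illustration; in practice they would come from something like BLAST searches between the two genomes. `bidirectional_best_hits` and `keep` are hypothetical helpers.

```python
def bidirectional_best_hits(scores):
    """scores maps (gene_in_X, gene_in_Y) -> similarity; returns reciprocal best pairs."""
    best_for_x, best_for_y = {}, {}
    for (gene_x, gene_y), value in scores.items():
        if gene_x not in best_for_x or value > scores[(gene_x, best_for_x[gene_x])]:
            best_for_x[gene_x] = gene_y
        if gene_y not in best_for_y or value > scores[(best_for_y[gene_y], gene_y)]:
            best_for_y[gene_y] = gene_x
    return {(x, y) for x, y in best_for_x.items() if best_for_y[y] == x}


def keep(scores, genes_x, genes_y):
    """Restrict the score table to the genes still present in each genome."""
    return {pair: v for pair, v in scores.items()
            if pair[0] in genes_x and pair[1] in genes_y}


# Hypothetical scores: A1/A2 are paralogs from an old duplication, so
# orthologs (same number) are more similar than paralogs (different number).
SCORES = {
    ("A1_X", "A1_Y"): 90, ("A1_X", "A2_Y"): 60,
    ("A2_X", "A1_Y"): 60, ("A2_X", "A2_Y"): 90,
}

no_deletion = bidirectional_best_hits(SCORES)
one_deletion = bidirectional_best_hits(keep(SCORES, {"A1_X"}, {"A1_Y", "A2_Y"}))
complementary = bidirectional_best_hits(keep(SCORES, {"A1_X"}, {"A2_Y"}))
print(no_deletion, one_deletion, complementary)
```

In the last case the only surviving genes are each other's best hit, so the method reports the paralogs A1_X and A2_Y as if they were orthologs.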


Automatic methods: COGs

• Best BH for each protein within genomes.
• In-paralogs fusion (the most similar one)
• Graph building using triangles
• Fusion of triangles sharing 2 vertices
• Grouping

PROBLEMS: DOMAIN SHUFFLING (Next classes)
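The triangle steps in the list above can be sketched as a small graph exercise: find triangles of mutual best hits, then merge triangles that share two vertices. This is a simplified illustration only. It ignores the in-paralog step and does not check that the three genes of a triangle come from different genomes; the gene names and helper functions are hypothetical.

```python
from itertools import combinations


def find_triangles(edges):
    """All triples of genes in which every pair is linked by a best-hit edge."""
    nodes = sorted({gene for edge in edges for gene in edge})
    return [
        frozenset(triple)
        for triple in combinations(nodes, 3)
        if all(frozenset(pair) in edges for pair in combinations(triple, 2))
    ]


def merge_triangles(triangles):
    """Group triangles that share two vertices (an edge) into clusters of genes."""
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) >= 2:
            parent[find(i)] = find(j)

    clusters = {}
    for i, triangle in enumerate(triangles):
        clusters.setdefault(find(i), set()).update(triangle)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical best-hit edges between genes of genomes a, b, c and d.
EDGES = {
    frozenset(pair) for pair in [
        ("a1", "b1"), ("a1", "c1"), ("b1", "c1"),   # triangle a1-b1-c1
        ("b1", "d1"), ("c1", "d1"),                 # triangle b1-c1-d1
        ("a2", "b2"), ("a2", "c2"), ("b2", "c2"),   # triangle a2-b2-c2
    ]
}

clusters = merge_triangles(find_triangles(EDGES))
print(clusters)
```

The two triangles that share the b1-c1 edge fuse into one group, while the third triangle stays separate.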


We use multiple alignments to analyse families


A reminder: multiple alignments

Multiple alignment is an NP-hard problem.

Problem: the computational cost of an exact alignment grows exponentially with the number of sequences:
• Aligning 2 sequences (lengths N and M) costs N×M.
• Aligning 3 sequences (lengths N, M and L) costs N×M×L.

E.g.: if aligning 2 sequences of 300 aa takes 1 sec, aligning 3 will take 300 secs... and aligning 10 will take 300^8 secs (more than the age of the universe).

The solution comes from heuristics (ClustalW, Muscle, T-Coffee).
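The arithmetic behind this estimate is easy to check. The sketch below assumes, as the slide does, that cost is proportional to the product of the sequence lengths and that two sequences of 300 residues take one second; the age of the universe is taken as roughly 13.8 billion years, and `exact_alignment_seconds` is a hypothetical name.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
UNIVERSE_AGE_SECONDS = 13.8e9 * SECONDS_PER_YEAR  # rough figure, an assumption


def exact_alignment_seconds(n_sequences, length=300, pairwise_seconds=1):
    """Time for exact dynamic programming over n sequences of equal length.

    Cost scales as length ** n_sequences; two sequences define the unit of
    time, so each extra sequence multiplies the time by `length`.
    """
    return pairwise_seconds * length ** (n_sequences - 2)


for n in (2, 3, 10):
    seconds = exact_alignment_seconds(n)
    print(n, seconds, seconds / UNIVERSE_AGE_SECONDS)
```

For ten sequences the estimate is 300^8, about 6.6 × 10^19 seconds, which is over a hundred times the assumed age of the universe.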


Information extracted from multiple sequence alignments

conserved

tree-determinants

correlated mutations

What can we get from multiple alignments?

Different trends give different information: types of residues
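Two of these residue types can be told apart with a very simple rule, sketched below on the two hypothetical families used earlier in the talk: a column is "conserved" if every sequence has the same residue, and a "tree-determinant" if each subfamily is internally invariant but the subfamilies differ from each other. Real methods are more tolerant of noise; this strict rule and the name `classify_columns` are assumptions for illustration.

```python
def classify_columns(subfamilies):
    """Return (conserved, tree_determinants) as lists of 0-based column indices."""
    conserved, tree_determinants = [], []
    length = len(subfamilies[0][0])
    for col in range(length):
        residues = [{seq[col] for seq in family} for family in subfamilies]
        if any(len(r) != 1 for r in residues):
            continue  # variable inside at least one subfamily
        distinct = {next(iter(r)) for r in residues}
        if len(distinct) == 1:
            conserved.append(col)
        elif len(distinct) == len(subfamilies):
            tree_determinants.append(col)
    return conserved, tree_determinants


RIBOSE_LIKE = ["ASYYAGGDEI", "ATYYDGGDEI", "ATYLAGGDEI", "ASRLAGGDEI", "ASYYAGGDEI"]
GLUCOSE_LIKE = ["ALFYAGCEEL", "AIFRAGCEET", "AIFRAGCEEL", "AVFYAGCEEL"]

conserved, tree_determinants = classify_columns([RIBOSE_LIKE, GLUCOSE_LIKE])
print("conserved:", conserved, "tree-determinants:", tree_determinants)
```

In this toy case the G/C and D/E columns separate the two groups, which is the kind of position one would inspect when looking for specificity residues.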


Motifs

Small conserved regions

Tend to correlate with functional characteristics:

- active sites.

- ligand binding sites.

- etc.

Can we get anything else?


Let's get real!


What can we get from multiple alignments? REAL examples.

Examples of functional specificity:

• The acetyltransferases
• Chemokine receptors


Acetyl transferases


Carnitine transferases catalyse the exchange of acyl groups between carnitine and CoA (fatty acid metabolism). They are grouped according to their acyl-CoA specificity:

CPTs (carnitine palmitoyltransferases): active towards long chains. Mitochondrial beta-oxidation.

COTs (carnitine octanoyltransferases): active towards medium chains. Peroxisomes: mediate transport to mitochondria.

CrATs (carnitine acetyltransferases): active towards short-chain acyl-CoAs. Reversible conversion of acetyl-CoA and carnitine to acetylcarnitine and free CoA.


Acetyl transferases

[Figure (F.G. Hegardt): the family members L-CPT I, M-CPT I, COT, CPT II, CrAT and ChAT, annotated by substrate (long chain acyl-CoA, medium chain acyl-CoA, short chain acyl-CoA; carnitine or choline) and by regulation (malonyl-CoA regulated or malonyl-CoA insensitive)]


Information extracted from multiple sequence alignments

conserved

tree-determinants

correlated mutations

Acetyl transferases


Correlated Mutations

SINGLE MUTATION => DECREASED STABILITY => SECOND COMPENSATORY MUTATION => "RESTORED" STABILITY

Pazos et al., J. Mol. Biol., 1997

Acetyl transferases


Information extracted from multiple sequence alignments

tree-determinants

Acetyl transferases


Malonyl-CoA regulation: Met vs. Ser

Carnitine-Choline: Thr/Glu/Thr vs. Val/Asp/Asn

Short vs. Long substrate: Gly vs. Met

Acetyl transferases


Predicting functional specificity

Identifying dimerisation residues in chemokine receptors.


GPCR: ligand binding

[Figure: GPCR ligands and signalling. Ligands: small molecules (amino acids, amines, nucleosides, peptides, etc.); pheromones and odorants; Ca2+; light; proteins (TSH, LH, FSH, ILs, CKs, etc.). Other labels: TRANSDUCTION; EFFECTOR (enzyme, channels); intracellular messenger; INTERNALIZATION; arrestin.]


GPCR: dimerization

GPCRs dimerize:

NT: Ca2+ sensing receptor

CT: GABAB receptor

TM IV: B-adrenergic

[Figure labels: N and C termini of each receptor; inhibitor]


GPCR: the issue

The two main events here are:

• Binding specificity.

• Dimerization/oligomerization.

Then, we have two aims:

• Can we predict the signals and distinguish them at the sequence level?

• Which residues are involved in dimerisation?


Selected group: Chemokines

Why?: they are known to dimerize!

GPCR: the chemokines.


Chemokines: biological functions

CHEMOKINES

• Wound healing
• Th1/Th2 development
• Angiogenesis
• Tumor metastasis
• Cell recruitment
• Inflammation
• Organogenesis
• Lymphoid trafficking

Rossi & Zlotnik. Annu. Rev. Immunol. (2000). 18:217-242.

Rogers D. Vanderbilt University (1950s)


Chemokines: signaling

[Figure: chemokine receptor signalling. Outcomes: ADHESION, CHEMOTAXIS, POLARIZATION, INTERNALIZATION, GENE EXPRESSION. Pathways: Gi dependent and Gi independent. Components: JAK, STAT, SOCS.]

Thelen (2001)

Mellado et al. (2001)


Chemokine receptors are in an equilibrium between several conformations: monomers, homodimers and heterodimers

Chemokines: conformations.


• Existing methods to detect important residues:

GPCR: methods

Hannenhalli & Russell. JMB (2000). 306:61-76.


Our strategy

TEST CASE: CHEMOKINES, known to dimerise.

Steps:

1.- Alignment selection.

2.- Tree determinant searching.

3.- Selecting regions.

4.- Mapping and rough model generation based on rhodopsin (to visually represent the results).


(http://www.gpcr.org/7M/)

• Clustering: to obtain a representative alignment containing groups:CCR1-9, CXCR3-5, and IL8A-B (total 61).

• Different levels of redundancy were tested (75-100%). A redundancy level of 95% was selected as a compromise between the number of sequences and the reduction of alignment bias.

• Realignment using T-COFFEE with secondary structure predictions taking into account the rhodopsin model.

TEST CASE: CHEMOKINES

Alignment selection


TREE DETERMINANT SEARCHING

• Level entropy method
• Mutational behaviour method (MB)
• Sequence Space Automated Method (FASS)

Basics: homodimerization specificity serves to avoid promiscuous dimerisation between homologous sequences!

Dimerization-focused strategy: obtaining the best subfamily division (as many subfamily groups as possible).

Finding residues


What is a tree determinant?

Finding residues


•What methods do we use to predict functional sites?

MB method.

S-method

PCA

Finding residues


An example:

Sequence Space: overview

Casari, G. et al. Nat. Struct. Biol. (1995). 2:171-178.


Finding residues

{Carro et al, NAR, 2006}


Residues obtained by Sequence-Space family division.

Tree-determinants: Clustering results

CKR1/3

IL8A/B

CKR6/11/9/7

CKR5/2

CKR8/4

CKR10


CXCR3/5

CXCR4

CCR1/3/V

IL8A/B

CCR6/7/9/11

CCR2/5

CCR4/8

CCR10

Sequence Space: Clustering results


Region selection and then, residue selection (not necessarily the TD’s)

[Figure legend: solvent accessible; buried; S-method; both S-method & FASS]

Visualizing interface regions


Bioinformatics: Conclusions

• The automated version is able to detect the functional signal.

• The dimerization signal still needs extensive human supervision.

• Not all the obtained pairs were tested, so functional signals could very well be dimerization/oligomerization ones.

• … and experimental validation of certain pairs confirmed the predictive power of this approach.


[Figure: histograms of cell number vs. fluorescence intensity for L1-2 Mut-CCR5 and L1-2 Wt-CCR5 cells, stained with CCL5-biot and with CCR5-03 (anti-CCR5 staining); and a binding curve of % bound 125I-CCL5 vs. CCL5 concentration (0.01 to 1000 nM) for CCR5wt (Kd 0.87 nM) and CCR5mut (Kd 1.33 nM). The mutant is CCR5I52V/V150A.]

Similar CCL5 binding

CCR5I52V/V150A and CCR5 show similar membrane expression and ligand binding


Many thanks to:

Luis Sanchez-Pulido, CNB.
Fede Abascal, CNB.
Manuel Gomez, CAB.
Juan Carlos Sanchez, CNB.
M. Mellado, DIO-CNB.