proteins and their 3 d structure - bioinformatics laboratory · sms and protein dossier –drug...

Goran Neshich http://www.cbi.cnptia.embrapa.br Proteins and their 3 D Structure Goran Neshich Embrapa Informática Agropecuária Cidade Universitária - UNICAMP Campinas, SP Structural BioInformatics Laboratory: SBI

Upload: truongdien

Post on 08-May-2018


Page 1: Proteins and their 3 D Structure - Bioinformatics Laboratory · SMS and Protein Dossier –Drug Target DB. Goran Neshich Structural Bioinformatics 1. Sequence similarity search 2

Goran Neshich http://www.cbi.cnptia.embrapa.br

Proteins and their 3 D Structure

Goran Neshich

Embrapa Informática Agropecuária

Cidade Universitária - UNICAMP

Campinas, SP

Structural BioInformatics Laboratory: SBI

Page 2

•Gene Annotation

•Gene Comparison

•Structure Descriptors

•Function Descriptors

•Gene Expression Networks

•Proteomics

Sequence: Blast (lexical level)

Structure: STING (syntactic level)

Function: SMS (semantic level)

Role: Microarray, image analysis (pragmatic level)

Page 3

Bringing Genome Into Three Dimensions

Old protein map

Parallels that help us to see the problem better

Page 4

Structure/function descriptors in JPD

Page 5

Data/information deluge and flavors of Bioinformatics

Page 6

Data library – 2003 (February)

23,950,735 nucleotide sequences, 37,486,732,136 bp

112 published complete genomes

590 genomes being sequenced

830,525 protein sequences

20,417 protein structures

5,300 Plasmodium falciparum genes, 23,000,000 bp

35,000 genes in Homo sapiens, 3,164,000,000 bp

27,936 genes in Xylella fastidiosa, 2,519,802 bases, 2,775 proteins

10,000,000 publications in PubMed/MEDLINE

Page 7

Data library – 2003 (October)

29,189,427 nucleotide sequences (~40 × 10^9 bp)

Published complete genomes:

Viruses: 1,421; Archaea: 16; Bacteria: 135;

Eukaryotes: 9 + 4 vertebrates + 7 plants

590 genomes being sequenced

1,139,154 protein sequences

22,700 protein structures (PDB)

480 genes in Mycoplasma genitalium: 580,000 bp

35,000 genes in Homo sapiens (3.164 × 10^9 bp)

27,936 genes in Xylella fastidiosa, 2,519,802 bases

>10,000,000 publications in PubMed/MEDLINE

Page 8

The “high throughputs...”

Page 9

Structural Bioinformatics

Parallels that help us to see the problem better

(Figure: the number 1956 written in different numeral systems: Ancient Chinese, Hindu, Babylonian, Egyptian, Maya, Roman (MCMLVI) and modern Arabic (1956).)

Page 10

Where do we work?

•Structure descriptors

•Annotation

•Genome sequencing

•Structural genomics

•Protein-ligand interaction (matching DB)

•Mutational and dynamic studies

•Docking

•Structural DB

•Structure-function

•Book of life

•Search for new effectors: drug discovery

SMS and Protein Dossier – Drug Target DB

Page 11

Final goal: complement Genome Track

•Small molecules database: fingerprint

•Local PDB files: fingerprint

•Complete genome sequence

•Homology modeling

•Protein/ligand interaction (matching DB)

•Mutational and dynamic studies

•Docking

•Protein-binding site 2-D information (for search)

•2D contour map surface matching

•Ligand-binding site 2-D information (for search)

SMS and Protein Dossier – Drug Target DB

Page 12

Structural Bioinformatics

1. Sequence similarity search

2. Sequence alignments

3. Structure alignment

4. Secondary structure prediction

5. Structure modeling (homology modeling)

6. Structure prediction (threading)

7. Characterization of structure

8. Relationship: sequence-structure-function

9. Function modifiers

10. Compiling the list of pairs: structure and its function modifier

Page 13

Structural Bioinformatics

Parallels that help us to see the problem better

Sequence similarity search

Sequence alignment

Page 14

Structural Bioinformatics

Parallels that help us to see the problem better

AKWHGGAFWPPH

WAAGAHWPHAQD

Page 15

Bringing Genome Into

Three Dimensions

How well can function be inherited from similar sequences?

Functional Genomics Milestone:

From sequence to function: desires and problems

Page 16

Data/information deluge and flavors of Bioinformatics

Page 17

High-throughput methods help increase the resolution of the picture:

1. Genomic sequencing
2. Protein crystallization
3. Synchrotron crystallography
4. NMR
5. Mass spectrometry
6. Mutagenesis experiments
7. Screening
8. Chemical synthesis

Page 18

High-throughput methods help increase the resolution of the picture:

1. What do we get?
2. A big puzzle with a great many pieces!

Page 19

Next Step in Genomics

• Transcriptomics involves large-scale analysis of messenger RNAs (molecules that are transcribed from active genes) to follow when, where, and under what conditions genes are expressed.

• Proteomics, the study of protein expression and function, can bring researchers closer than gene expression studies to what is actually happening in the cell.

• Structural genomics initiatives are being launched worldwide to generate the 3-D structures of one or more proteins from each protein family, thus offering clues to function and biological targets for drug design.

• Knockout studies are one experimental method for understanding the function of DNA sequences and the proteins they encode. Researchers inactivate genes in living organisms and monitor any changes that could reveal the function of specific genes.

• Comparative genomics, analyzing DNA sequence patterns of humans and well-studied model organisms side by side, has become one of the most powerful strategies for identifying human genes and interpreting their function.

Page 20

Structural Bioinformatics

Parallels that help us to see the problem better

From gene to functional protein

Page 21

Structural Bioinformatics

Page 22

Page 23

Structural Bioinformatics

Page 24

Structural Bioinformatics

Page 25

Structural Bioinformatics

Page 26

Structural Bioinformatics

Page 27

Structural Bioinformatics

Parallels that help us to see the problem better

Sequence alignment

Page 28

Scoring Matrices

   T  G  A  C
T  1  0  0  0
G  0  1  0  0
A  0  0  1  0
C  0  0  0  1

For DNA/RNA: match = 1, mismatch = 0.

Instead of simply marking matches and mismatches with dots, we may use a “scoring matrix”. The “dot plot” is then converted into a grid of numbers, and the best alignment corresponds to the diagonal with the greatest summed value.
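Below is a minimal Python sketch of the idea above. It fills the numeric dot plot and sums every diagonal; without gaps, the diagonal with the highest sum marks the best alignment offset. It assumes the DNA scores above (match = 1, mismatch = 0). The function names are illustrative, and the example pair is the one from the “Simple alignment” slide.

```python
def score_grid(seq1, seq2, match=1, mismatch=0):
    """Numeric 'dot plot': grid[i][j] is the score of seq1[i] against seq2[j]."""
    return [[match if a == b else mismatch for b in seq2] for a in seq1]


def best_diagonal(seq1, seq2, match=1, mismatch=0):
    """Return (offset, total) for the diagonal with the greatest summed score.

    offset = i - j, i.e. seq2[0] sits under seq1[offset] (0-based).
    """
    grid = score_grid(seq1, seq2, match, mismatch)
    best = None
    for offset in range(-(len(seq2) - 1), len(seq1)):
        # Sum the cells on this diagonal that fall inside the grid.
        total = sum(
            grid[j + offset][j]
            for j in range(len(seq2))
            if 0 <= j + offset < len(seq1)
        )
        if best is None or total > best[1]:
            best = (offset, total)
    return best


long_seq = "CGCTTCGGACGAAATCGCATCAGCATACGATCGCATGCCGGGCGGGATAAC"
short_seq = "CGAAATCGCATCAGCATACGATCGCATGC"
print(best_diagonal(long_seq, short_seq))  # (9, 29)
```

An offset of 9 means the short sequence starts under position 9 (counting from 0) of the long one, where all 29 of its bases match.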

Page 29

Amino acid scoring matrix (BLOSUM50):

    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
A   5  -2  -1  -2  -1  -1  -1   0  -2  -1  -2  -1  -1  -3  -1   1   0  -3  -2   0
R  -2   7  -1  -2  -4   1   0  -3   0  -4  -3   3  -2  -3  -3  -1  -1  -3  -1  -3
N  -1  -1   7   2  -2   0   0   0   1  -3  -4   0  -2  -4  -2   1   0  -4  -2  -3
D  -2  -2   2   8  -4   0   2  -1  -1  -4  -4  -1  -4  -5  -1   0  -1  -5  -3  -4
C  -1  -4  -2  -4  13  -3  -3  -3  -3  -2  -2  -3  -2  -2  -4  -1  -1  -5  -3  -4
Q  -1   1   0   0  -3   7   2  -2   1  -3  -2   2   0  -4  -1   0  -1  -1  -1  -3
E  -1   0   0   2  -3   2   6  -3   0  -4  -3   1  -2  -3  -1  -1  -1  -3  -2  -3
G   0  -3   0  -1  -3  -2  -3   8  -2  -4  -4  -2  -3  -4  -2   0  -2  -3  -3  -4
H  -2   0   1  -1  -3   1   0  -2  10  -4  -3   0  -1  -1  -2  -1  -2  -3   2  -4
I  -1  -4  -3  -4  -2  -3  -4  -4  -4   5   2  -3   2   0  -3  -3  -1  -3  -1   4
L  -2  -3  -4  -4  -2  -2  -3  -4  -3   2   5  -3   3   1  -4  -3  -1  -2  -1   1
K  -1   3   0  -1  -3   2   1  -2   0  -3  -3   6  -2  -4  -1   0  -1  -3  -2  -3
M  -1  -2  -2  -4  -2   0  -2  -3  -1   2   3  -2   7   0  -3  -2  -1  -1   0   1
F  -3  -3  -4  -5  -2  -4  -3  -4  -1   0   1  -4   0   8  -4  -3  -2   1   4  -1
P  -1  -3  -2  -1  -4  -1  -1  -2  -2  -3  -4  -1  -3  -4  10  -1  -1  -4  -3  -3
S   1  -1   1   0  -1   0  -1   0  -1  -3  -3   0  -2  -3  -1   5   2  -4  -2  -2
T   0  -1   0  -1  -1  -1  -1  -2  -2  -1  -1  -1  -1  -2  -1   2   5  -3  -2   0
W  -3  -3  -4  -5  -5  -1  -3  -3  -3  -3  -2  -3  -1   1  -4  -4  -3  15   2  -3
Y  -2  -1  -2  -3  -3  -1  -2  -3   2  -1  -1  -2   0   4  -3  -2  -2   2   8  -1
V   0  -3  -3  -4  -1  -3  -3  -4  -4   4   1  -3   1  -1  -3  -2   0  -3  -1   5

Page 30

Dot plot with scores

Scoring two protein sequences against each other with the matrix produces a “score dot plot”, from which the optimal alignment can be calculated.

    H   E   A   G   A   W   G   H   E   E
P  -2  -1  -1  -2  -1  -4  -2  -2  -1  -1
A  -2  -1   5   0   5  -3   0  -2  -1  -1
W  -3  -3  -3  -3  -3  15  -3  -3  -3  -3
H  10   0  -2  -2  -2  -3  -2  10   0   0
E   0   6  -1  -3  -1  -3  -3   0   6   6
A  -2  -1   5   0   5  -3   0  -2  -1  -1
E   0   6  -1  -3  -1  -3  -3   0   6   6
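The optimal alignment can be computed from such a score grid by dynamic programming. The sketch below is a textbook Needleman-Wunsch global alignment, not necessarily the method used in the author's tools. Its scores are the values from the amino acid matrix above (BLOSUM50), restricted to the six residues involved. The linear gap penalty of -8 per gap position is an assumption for illustration, and the names are hypothetical.

```python
# BLOSUM50 scores for the six residues in this example (copied from the matrix above).
B50 = {
    "A": {"A": 5, "E": -1, "G": 0, "H": -2, "P": -1, "W": -3},
    "E": {"A": -1, "E": 6, "G": -3, "H": 0, "P": -1, "W": -3},
    "G": {"A": 0, "E": -3, "G": 8, "H": -2, "P": -2, "W": -3},
    "H": {"A": -2, "E": 0, "G": -2, "H": 10, "P": -2, "W": -3},
    "P": {"A": -1, "E": -1, "G": -2, "H": -2, "P": 10, "W": -4},
    "W": {"A": -3, "E": -3, "G": -3, "H": -3, "P": -4, "W": 15},
}


def needleman_wunsch(x, y, sub, gap=-8):
    """Global alignment with a linear gap penalty; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sub[x[i - 1]][y[j - 1]],  # align x_i with y_j
                F[i - 1][j] + gap,                          # x_i against a gap
                F[i][j - 1] + gap,                          # y_j against a gap
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub[x[i - 1]][y[j - 1]]:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


score, top, bottom = needleman_wunsch("HEAGAWGHEE", "PAWHEAE", B50)
print(score)  # 1
print(top)
print(bottom)
```

When several alignments share the best score, the traceback returns only one of them. Here it prefers diagonal moves.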

Page 31

Simple alignment

Graphical presentation of alignment

(Figure panels: the short sequence is slid along the long one at different offsets, with identical positions marked “|”. In the final panel all 29 positions match:)

CGCTTCGGACGAAATCGCATCAGCATACGATCGCATGCCGGGCGGGATAAC
         |||||||||||||||||||||||||||||
         CGAAATCGCATCAGCATACGATCGCATGC

Page 32

Alignment with “gaps”

Simple alignment does not always work well. (Figure panels: a variant of the short sequence, carrying one extra C, is slid along the long one. The two best gap-free offsets are:)

CGCTTCGGACGAAATCGCATCAGCATACGATCGCATGCCGGGCGGGATAAC
           ||         ||||||||||||||||
        CGAAATCGCATCACGCATACGATCGCATGC

CGCTTCGGACGAAATCGCATCAGCATACGATCGCATGCCGGGCGGGATAAC
         |||||||||||||                |
         CGAAATCGCATCACGCATACGATCGCATGC

In many cases where two sequences do not coincide (align) perfectly, it is necessary to introduce “gaps”:

CGCTTCGGACGAAATCGCATCA-GCATACGATCGCATGCCGGGCGGGATAAC
         ||||||||||||| ||||||||||||||||
         CGAAATCGCATCACGCATACGATCGCATGC
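Allowing gaps turns the search into a dynamic-programming problem. The sketch below computes a Smith-Waterman local alignment score for the gapped example above. The slide defines only match and mismatch scores, so the values used here (match +1, mismatch -1, gap -1 per position) are assumptions chosen for illustration. The function name is hypothetical.

```python
def smith_waterman(x, y, match=1, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty (score only, two-row DP)."""
    prev = [0] * (len(y) + 1)
    best = 0
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            # Local alignment: a cell never drops below 0 (start a fresh alignment).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


long_seq = "CGCTTCGGACGAAATCGCATCAGCATACGATCGCATGCCGGGCGGGATAAC"
short_seq = "CGAAATCGCATCACGCATACGATCGCATGC"  # variant with one extra C
print(smith_waterman(long_seq, short_seq))  # 28
```

The best score, 28, corresponds to the gapped alignment shown on the slide: 29 matching bases minus a single one-base gap.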

Page 33

Structural Bioinformatics

Parallels that help us to see the problem better

Structure elements

Page 34

STING Millennium Suite:

Analysing structure of proteins and their complexes -

What do we know about structure and its relationship with function?

What are the building blocks of microfactories, better known as PROTEINS?

What is the structural hierarchy in proteins?

Page 35

STING Millennium Suite:

Analysing structure of proteins and their complexes -

Secondary structure elements:

Helix

Turn

Sheet

Coil

Page 36

STING Millennium Suite:

Analysing structure of proteins and their complexes -

Peptide bond and other types of “intimate” amino acid contacts

Page 37

STING Millennium Suite:

Analysing structure of proteins and their complexes -

Page 38

STING Millennium Suite:

Analysing structure of proteins and their complexes -

“Proper” structural parameters: dihedral angles and Ramachandran plot
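A Ramachandran plot shows two backbone dihedral angles per residue. Phi is defined by atoms C(i-1), N, CA, C of residue i, and psi by N, CA, C of residue i and N(i+1). Below is a minimal standard-library sketch of the dihedral computation for four points; the function names are illustrative.

```python
import math


def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], for the chain p0-p1-p2-p3."""
    b0 = _sub(p0, p1)  # bond p1 -> p0
    b1 = _sub(p2, p1)  # central bond p1 -> p2
    b2 = _sub(p3, p2)  # bond p2 -> p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[k] - _dot(b0, b1) * b1[k] for k in range(3)]
    w = [b2[k] - _dot(b2, b1) * b1[k] for k in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    angle = math.degrees(math.atan2(y, x))
    return 180.0 if angle == -180.0 else angle


# Toy geometry: a planar "trans" arrangement of four points.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # 180.0
```

For a real residue, phi = dihedral(C_prev, N, CA, C) and psi = dihedral(N, CA, C, N_next). These coordinate names are hypothetical; the values would be read from a structure file.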

Page 39

STING Millennium Suite:

Analysing structure of proteins and their complexes

Types of “intimate” amino acid contacts: Hydrogen Bonds
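The slides do not state which geometric criteria STING uses to detect hydrogen bonds. The sketch below therefore uses illustrative cutoffs of a kind common in structure analysis: donor-acceptor distance at most 3.5 Å and a donor-H···acceptor angle of at least 120°. Treat both thresholds, and the function name, as assumptions.

```python
import math


def is_hydrogen_bond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=120.0):
    """Simplified geometric H-bond test on 3-D coordinates (in Å).

    Assumed criteria: donor-acceptor distance <= max_dist and the
    angle D-H...A, measured at the hydrogen, >= min_angle degrees.
    """
    if math.dist(donor, acceptor) > max_dist:
        return False
    hd = [d - h for d, h in zip(donor, hydrogen)]     # vector H -> D
    ha = [a - h for a, h in zip(acceptor, hydrogen)]  # vector H -> A
    cos_t = sum(p * q for p, q in zip(hd, ha)) / (math.hypot(*hd) * math.hypot(*ha))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
    return angle >= min_angle


# Toy geometry: D-H 1.0 Å, H...A 1.9 Å, all on one line (angle 180°).
print(is_hydrogen_bond((0, 0, 0), (1, 0, 0), (2.9, 0, 0)))  # True
```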

Page 40

Diamond STING Suite:

Analysing structure of proteins and their complexes

Types of “intimate” amino acid contacts: Hydrogen Bonds

Page 41

STING Millennium Suite:

Analysing structure of proteins and their complexes

“Proper” structural parameters: dihedral angles and Ramachandran plot

Page 42

Alpha Helix

Analysing structure of proteins and their complexes - SMS way

Page 43

Ramachandran Plot

Page 44

Collagen Helix

Analysing structure of proteins and their complexes - SMS way

Page 45

C. Pleated sheet structure: 1. antiparallel

Extended sheet - antiparallel

Analysing structure of proteins and their complexes - SMS way

Page 46

Extended sheet - parallel

Analysing structure of proteins and their complexes - SMS way

Page 47

Beta Turn

Analysing structure of proteins and their complexes - SMS way

Page 48

Type II turn

Analysing structure of proteins and their complexes - SMS way

Page 49

Structural Bioinformatics

Parallels that help us to see the problem better

Protein types

Page 50

1. myoglobin

2. flavodoxin

3. immunoglobulin IgG: domain CH2

Analysing structure of proteins and their complexes - SMS way

Page 51

Structural Proteins

A. α-Keratin: α-helix, superhelix, 1. protofilament (scale labels: 3 nm, 10 nm, 1.5 nm)

Collagen: 1. triple helix; 2. typical sequence; 3. triple helix (view from above)

C. Silk fiber: 1. 3-D representation; 2. front view

Page 52

Globular Proteins

C. Tertiary structure: 1. monomer (cartoon); 2. monomer (van der Waals representation)

D. Quaternary structure: 1. dimer; 2. Zn2+ hexamer complex

Page 53

Membrane protein secondary structure prediction

(Figure labels: integral membrane proteins; cytoplasmic side; external protein; phospholipid; glycoprotein; glycolipid; extracellular side of the cell.)

Page 54

Table 2. Hydrophobicity scales of Kyte & Doolittle (1982) (K-D) and of Goldman, Engelman & Steitz (Engelman et al., 1986) (GES).

Residue   K-D    GES
Ile       4.5    3.1
Val       4.2    2.6
Leu       3.8    2.8
Phe       2.8    3.7
Cys       2.5    2.0
Met       1.9    3.4
Ala       1.8    1.6
Gly      -0.4    1.0
Thr      -0.7    1.2
Ser      -0.8    0.6
Trp      -0.9    1.9
Tyr      -1.3   -0.7
Pro      -1.6   -0.2
His      -3.2   -3.0
Asp      -3.5   -9.2
Glu      -3.5   -8.2
Asn      -3.5   -4.8
Gln      -3.5   -4.1
Lys      -3.9   -8.8
Arg      -4.5  -12.3
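Hydropathy scales such as K-D are usually averaged over a sliding window along the sequence. A commonly cited heuristic for flagging a possible transmembrane helix is a 19-residue window whose mean K-D value exceeds about 1.6. The sketch below assumes those parameters and uses the K-D column of Table 2 (Tyr takes the published Kyte-Doolittle value, -1.3). The function names and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values by one-letter code (Tyr = -1.3).
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "D": -3.5, "E": -3.5, "N": -3.5, "Q": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(seq, window=19):
    """Mean K-D value of every complete window along seq (list index = window start)."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start indices of windows whose mean hydropathy exceeds the threshold."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


# Hypothetical sequence: a 19-residue leucine stretch flanked by aspartates.
seq = "D" * 10 + "L" * 19 + "D" * 10
print(candidate_tm_windows(seq))  # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```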

Page 55

Structural Bioinformatics

Parallels that help us to see the problem better

Structure modelling

Page 56

(Pie chart: sequence-based fold recognition 50%; full threading methods 15%; as yet unobserved folds 20%; probably non-globular protein 15%.)

Figure 5. Hypothetical applicability of different categories of fold-recognition methods to the open reading frames (ORFs) of small bacterial genomes. At present, sequence-based fold recognition (e.g. GenTHREADER) is successful for around 50% of the ORFs. Structures of a further 15% of ORFs can probably be assigned by full threading methods such as THREADER. The remaining 35% cannot currently be recognized, either because the fold has not yet been observed or because the ORF encodes a non-globular protein (e.g. a transmembrane protein).

Page 57

(Pie chart segments: PDB match region; transmembrane or low-complexity region; unannotated regions.)

Pie chart of structural assignments to the proteome of the bacterium Mycoplasma genitalium. Almost half of the amino acids (49%) in the Mycoplasma genitalium proteins have a structural annotation. In this case, the structural annotation was taken from the SUPERFAMILY database (version 1.59, September 2002), described in Section 11.3.2. Roughly one fifth of the proteome is predicted by the relevant computer programs to be a transmembrane helix or low-complexity region. The remaining 30% of the proteome is unassigned.

Page 58

Page 59

Structural Bioinformatics

Parallels that help us to see the problem better

Structure alignment

Page 60

Page 61

Structural Bioinformatics

Parallels that help us to see the problem better

Function modifiers: drugs

Page 62

Molecular Geometry and 3D Matching

•PDB format

•Molecular surface definitions

•Pockets and cavities

•Fingerprints

•Matching

•Docking
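PDB-format coordinate files use fixed columns. In ATOM and HETATM records, the atom name occupies columns 13-16, the residue name 18-20, the chain identifier 22 and the residue number 23-26. The x, y and z coordinates sit in columns 31-38, 39-46 and 47-54. Below is a minimal parser sketch; the function name and dictionary keys are illustrative and not part of any library.

```python
def parse_atom_record(line):
    """Parse one ATOM/HETATM line of a PDB-format file using its fixed columns."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")
    # Python slices are 0-based and end-exclusive: columns 13-16 -> line[12:16].
    return {
        "record": record,
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "alt_loc": line[16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21].strip(),
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


# Build a sample line column by column (hypothetical atom: N of MET 1, chain A).
line = ("ATOM  " + f"{1:>5}" + " " + " N  " + " " + "MET" + " " + "A"
        + f"{1:>4}" + " " + "   "
        + f"{38.198:8.3f}" + f"{19.582:8.3f}" + f"{28.998:8.3f}")
atom = parse_atom_record(line)
print(atom["res_name"], atom["chain"], atom["res_seq"], atom["name"], atom["xyz"])
```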

Page 63

Final goal: complement Genome Track

•Small molecules database: fingerprint

•Local PDB files: fingerprint

•Complete genome sequence

•Homology modeling

•Protein/ligand interaction (matching DB)

•Mutational and dynamic studies

•Docking

•Protein-binding site 2-D information (for search)

•2D contour map surface matching

•Ligand-binding site 2-D information (for search)

SMS and Protein Dossier – Drug Target DB

Page 64

Hundreds of targets

millions of compounds

Now…..

Page 65

Structural Bioinformatics

Page 66

Structural Bioinformatics

Page 67

Structural Bioinformatics

Page 68

Structural Bioinformatics

Page 69

Leaving the Surface: Under the Hood...

Page 70

Structural Bioinformatics

Intermediate sequence: problem solved!

AKWHGGAFWPPH

WAAGAHWPHAQD

ARWHGGWPHAQE
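The three sequences above illustrate the intermediate-sequence idea: two sequences with no visible similarity can be linked through a third sequence that resembles each of them. As a simplification, the sketch below counts identical residues position by position, with no gaps; real intermediate-sequence searches rely on proper alignment tools (e.g. BLAST). The function name is illustrative.

```python
def identity(a, b):
    """Identical residues when two equal-length sequences are compared position by position."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


seq1 = "AKWHGGAFWPPH"
seq2 = "WAAGAHWPHAQD"
bridge = "ARWHGGWPHAQE"  # the intermediate sequence

print(identity(seq1, seq2), identity(seq1, bridge), identity(bridge, seq2))  # 0 5 5
```

The first half of the intermediate sequence (A?WHGG) matches the first sequence, and its second half (WPHAQ) matches the second. This is how the intermediate connects two sequences that share no identical positions directly.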