Transcript
Page 1: A community-driven annotation platform for structural genomics Workshop on the Biological Annotation of Novel Proteins, March 7-8, 2008 Biomedical theme:

A community-driven annotation platform for structural genomics

Workshop on the Biological Annotation of Novel Proteins, March 7-8, 2008

Biomedical theme: Central Machinery of Life - proteins conserved in all kingdoms of life
Biological theme: Complete coverage of Thermotoga maritima

Adam Godzik and the JCSG Bioinformatics Team

Page 2:

Science is all about communication

• Since the late XIX century, the dominant way of communicating scientific results has been through peer-reviewed manuscripts
• Pro
– Peer review ensures quality
– Enforces a “publishable unit”, which decreases noise in the “communication space”
– Authorship rules ensure proper distribution of credit in a system that is well integrated with the system of promotions and evaluations
• Con
– Significant time lag and additional costs
– Enforces a “publishable unit”: results below the threshold are lost
– Not scalable with high-throughput data production

Page 3:

Increasingly, it’s not the only game in town

• Databases and automated annotation protocols
– pro: fast, machine searchable, scalable
– con: difficult to ensure quality and assign credit; puts the burden of expertise on the user
• Wikipedia
– pro: harnesses the power of the community, scalable
– con: unreliable, difficult to ensure quality and assign credit

Page 4:

Can we have the best of all worlds?

Peer-reviewed manuscript
Automated database annotation
Wikipedia entry
Fast, accurate, scalable

Page 5:

Structure determination in PSI centers is done on a semi-automated assembly line

[Figure: pipeline stages: target selection, cloning, expression, purification, crystallization, imaging, harvesting, xtal mounting, xtal screening, data collection, phasing, tracing, struc. refinement, struc. validation, annotation, publication, PDB]

• Joint Center for Structural Genomics
• One of four large-scale (production) centers of PSI-2
• ~600 structures deposited in the PDB
• Sustained pace of ~15 PDB depositions per month

Page 6:

…and the pace of structure determination far outstrips the pace of our publications…

[Figure labels: PSI-1, PSI-2]

Structure collage: PSI
Publication statistics: http://olenka.med.virginia.edu/psi/

Page 7:

Why?

• Speed: 2-3 months from target selection to structure, 2-3 structures per week in each center
• Assembly line process, no time to develop a “special relationship” with each protein
• Structures are not associated with ongoing biochemical and biological research
• Targets selected based on novelty, no expertise available anywhere, difficult to reach a “publishable unit”

Page 8:

We are not alone …

• Bacterial genomes
– 1995-2000: every newly sequenced genome led to a Nature/Science publication
– with 500+ genomes, an increasing percentage of them never become the focus of a specific publication
– Community-based annotation efforts become the best source of information (SEED)

Page 9:

WEB 2.0 is reshaping how we share information:

Communities of globally distributed peers (Networks) built around rich, collaborative environments.

Wikipedia, Citizendium, Scholarpedia, Google Knols

OpenWetware, Wikiomics, WikiProteins, TOPSAN

Page 10:

How can we tap into an ultimate research tool?

• Search engines are becoming serious research tools

• Google indexes research papers, books, Wikipedia pages

• Semi-natural language searches

Page 11:

Structural coverage of many genomes (here T. maritima) approaches completeness

[Figure: % covered in PDB (y-axis, 0 to 80) by year (x-axis, 1980 to 2007), with three curves: sequence identity > 30%, BLAST e-value < 0.001, FFAS score < -9.5]

~73% of feasible targets
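
The three coverage criteria in the figure legend can be read as simple cutoffs on the best PDB match found for each protein. The sketch below is only an illustration under stated assumptions: the `BestPdbHit` record, the function names, and the example values are hypothetical and are not taken from the JCSG pipeline; only the three thresholds come from the slide.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Cutoffs quoted in the figure legend.
MIN_SEQ_IDENTITY_PCT = 30.0
MAX_BLAST_EVALUE = 0.001
MAX_FFAS_SCORE = -9.5  # FFAS scores are negative; more negative means a stronger match


@dataclass
class BestPdbHit:
    """Best match of one genome protein against the PDB (hypothetical record)."""
    seq_identity_pct: Optional[float] = None
    blast_evalue: Optional[float] = None
    ffas_score: Optional[float] = None


def is_covered(hit: BestPdbHit, criterion: str) -> bool:
    """Return True if the hit passes the named cutoff; a missing value never passes."""
    if criterion == "identity":
        return hit.seq_identity_pct is not None and hit.seq_identity_pct > MIN_SEQ_IDENTITY_PCT
    if criterion == "blast":
        return hit.blast_evalue is not None and hit.blast_evalue < MAX_BLAST_EVALUE
    if criterion == "ffas":
        return hit.ffas_score is not None and hit.ffas_score < MAX_FFAS_SCORE
    raise ValueError(f"unknown criterion: {criterion}")


def percent_covered(hits: Iterable[BestPdbHit], criterion: str) -> float:
    """Percentage of proteins whose best PDB hit passes the given criterion."""
    hits = list(hits)
    if not hits:
        return 0.0
    return 100.0 * sum(is_covered(h, criterion) for h in hits) / len(hits)


# Made-up example values, for illustration only.
example_hits = [
    BestPdbHit(seq_identity_pct=45.0, blast_evalue=1e-30, ffas_score=-60.0),
    BestPdbHit(seq_identity_pct=22.0, blast_evalue=1e-5, ffas_score=-25.0),
    BestPdbHit(ffas_score=-12.0),
    BestPdbHit(),
]
for name in ("identity", "blast", "ffas"):
    print(name, percent_covered(example_hits, name))
```

In the example, each criterion is more permissive than the one before it, matching the order of the legend: a profile-based FFAS match can flag a protein as covered even when no significant BLAST hit exists.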

Page 12:

http://research.calit2.net/metagenomics/thermotoga

Which brings attention from broader communities


Page 13:

We can utilize other information

• Metabolic reconstruction of T. maritima was done in collaboration with the UCSD Systems Biology Lab (Bernhard Palsson)
• The model is consistent with all the published experimental data on TM (see Ines Thiele poster)
• First-generation model covers 479 genes (1398 are not in the model) and 492 metabolites
– Proteins coded by 113 of these genes have been solved (71 at JCSG, 28 at other PSI centers)
– 320 have been modeled
– We know at least the approximate structure of ALL the proteins in the reconstruction
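
To put the counts above in proportion, they can be converted to percentages. This is a small arithmetic sketch, not part of the original analysis; the variable names are illustrative, and it assumes that the genes in the model and the genes not in the model together make up the full gene set considered.

```python
# Counts quoted on the slide.
genes_in_model = 479
genes_not_in_model = 1398
solved_in_model = 113
modeled_in_model = 320


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Assumption: the two gene counts partition the gene set under consideration.
total_genes = genes_in_model + genes_not_in_model

print("genes considered:", total_genes)
print("share of genes in the model (%):", percent(genes_in_model, total_genes))
print("model genes with a solved structure (%):", percent(solved_in_model, genes_in_model))
print("model genes with a modeled structure (%):", percent(modeled_in_model, genes_in_model))
```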

Page 14:

And bring it together to help make sense of the structures and see them in the full context

Page 15:

All available information about a protein on one page

Page 16:
Page 17:

We try to combine automated, database driven annotations with expert curated input.

Annotation:
• Feeds from public databases
• Expert-curated information
Content management:
• Wiki-style editing (WYSIWYG editor)
• Page-level access control
• Structured fields + free text
• Instant publication
• Always open for comments and edits
Quality control & authorship:
• Encourage community collaboration
• JCSG scientists & invited peers
• Many authors - no contribution too small
• Lead authors (editors) in charge of releases
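
The content-management and authorship features listed above can be pictured as a single page record. The following is a minimal sketch under loud assumptions: the `AnnotationPage` class, its field names, and its methods are hypothetical and do not describe the actual TOPSAN implementation. It only illustrates how database feeds, structured fields, free text, page-level access control, multiple authors, and lead-author releases might fit together.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotationPage:
    """Hypothetical wiki-style annotation page for one protein."""
    protein_id: str
    lead_authors: set                                      # editors in charge of releases
    allowed_editors: set                                   # page-level access control
    database_feeds: dict = field(default_factory=dict)     # automated annotations
    structured_fields: dict = field(default_factory=dict)  # curated, machine-readable
    free_text: str = ""                                    # expert commentary
    authors: list = field(default_factory=list)            # every contributor is credited
    released: bool = False

    def edit(self, user: str, text: str) -> None:
        """Append free text if the user may edit this page, and credit the user."""
        if user not in self.allowed_editors and user not in self.lead_authors:
            raise PermissionError(f"{user} may not edit {self.protein_id}")
        self.free_text += text
        if user not in self.authors:
            self.authors.append(user)

    def release(self, user: str) -> None:
        """Only a lead author may mark the page as released."""
        if user not in self.lead_authors:
            raise PermissionError(f"{user} is not a lead author of {self.protein_id}")
        self.released = True


# Illustrative usage with made-up names.
page = AnnotationPage(
    protein_id="example-protein",
    lead_authors={"editor"},
    allowed_editors={"invited_peer"},
)
page.edit("invited_peer", "Possible active site near the conserved loop. ")
page.edit("editor", "Fold comparison added. ")
page.release("editor")
print(page.authors, page.released)
```

Keeping the author list separate from the lead-author set reflects the "many authors, lead authors in charge of releases" split on the slide: small contributions are credited without giving every contributor control over releases.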

Page 18:

TOPSAN content:

Page 19:

Protein Groups

Page 20:

Browsing Options

Page 21:

JCSG: Structures / Structure Notes / TOPSAN

278 of 593 structures have an annotation on TOPSAN

Page 22:

Members of the biological community can utilize PSI structures only when they are aware of them

Functionally well-characterized enzyme that is also a new fold.
PDB ID: 3C8W
TargetDB: 376561

Page 23:

TOPSAN access statistics - Jan to Mar 08

1143 visits from the rest of the world.

Page 24:

Google sent us the largest number of visitors.

Page 25:

UCSD & Burnham, Bioinformatics Core
John Wooley, Adam Godzik, Lukasz Jaroszewski, Slawomir Grzechnik, Sri Krishna Subramanian, Andrew Morse, Tamara Astakhova, Lian Duan, Piotr Kozbial, Dana Weekes, Natasha Sefcovic, Josie Alaoen

GNF & TSRI, Crystallomics Core
Scott Lesley, Mark Knuth, Heath Klock, Marc Deller, Dennis Carlton, Polat Abdubek, Sanjay Agarwalla, Connie Chen, Thomas Clayton, Dustin Ernst, Julie Feuerhelm, Regina Gorski, Anna Grzechnik, Joanna C. Hale, Thamara Janaratne, Hope Johnson, Sachin Kale, Daniel McMullan, Edward Nigoghossian, Amanda Nopakun, Linda Okach, Jessica Paulsen, Christina Puckett, Sebastian Sudek, Jessica Canseco

Scientific Advisory Board
• Sir Tom Blundell, Univ. Cambridge
• Homme Hellinga, Duke University Medical Center
• James Naismith, The Scottish Structural Proteomics facility, Univ. St. Andrews
• James Paulson, Consortium for Functional Glycomics, The Scripps Research Institute
• Robert Stroud, Center for Structure of Membrane Proteins, Membrane Protein Expression Center, UC San Francisco
• Soichi Wakatsuki, Photon Factory, KEK, Japan
• James Wells, UC San Francisco
• Todd Yeates, UCLA-DOE, Inst. for Genomics and Proteomics

TSRI, NMR Core
Kurt Wüthrich, Reto Horst, Margaret Johnson, Amaranth Chatterjee, Michael Geralt, Wojtek Augustyniak, Jin-Kyu Rhee, Biswaranjan Mohanty, Bill Pedrini, Pedro Serrano

TSRI Administrative Core
Ian Wilson, Marc Elsliger, Gye Won Han, David Marciano, Henry Tien, Xiaoping Dai, Lisa van Veen

Stanford / SSRL, Structure Determination Core
Keith Hodgson, Ashley Deacon, Mitchell Miller, Hsiu-Ju (Jessica) Chiu, Debanu Das, Kevin Jin, Abhinav Kumar, Winnie Lam, Silvya Oommachen, Christopher Rife, Scott Talafuse, Christine Trame, Qingping Xu, Henry van den Bedem, Ronald Reyes

The JCSG is supported by the NIH Protein Structure Initiative grant U54 GM074898 from the National Institute of General Medical Sciences (www.nigms.nih.gov).

Page 26:

Thermotoga browser acknowledgments

• Co-PI of the project - Andrei Osterman (the biochemistry side, specific examples)
• The JCSG team - for all the structures, focus on Thermotoga and CML
• Bernhard Palsson group and Ines Thiele for work with Thermotoga reconstruction and model simulations
• The JCMM team for structure modeling
• Krzysztof Ginalski and bioinfor server team for assistance with “borderline” predictions
• Ying Zhang (JCMM) - finalizing the metabolic reconstruction, network and fold distribution analysis
• Dana Weekes (JCSG) - first pass on the Thermotoga metabolic reconstruction, TM TOPSAN pages
• Craig Shepherd (JCMM) - network visualization
• Zhanwen Li (JCMM) - modeling and fold assignments

