CAFA poster presented at CSHL Genome Informatics 2013

Understanding protein function is a key component to understanding life at a molecular level. It is also important for understanding and treating human disease, since many conditions arise as a consequence of the loss or gain of protein function.

To understand and improve our ability to computationally annotate proteins, we are holding a series of multi-year challenges for the developers of function annotation programs. The rationale is that challenging and assessing these programs will lead to a better understanding of their predictive ability and to its improvement. The first Critical Assessment of Function Annotation (CAFA 1) was held over 2010-2011, involved 23 research groups, and assessed the performance of 54 algorithms. CAFA 1 was structured as a time challenge: proteins that had no experimentally validated function annotation were presented to the methods, which predicted their function. Over the course of 10 months, some of these proteins gained experimental validation, and those were used as the final benchmark to assess program performance.
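The time-challenge design described above can be sketched in a few lines of Python. This is a minimal illustration rather than the CAFA pipeline: the function name `build_benchmark`, the dictionary layout, and the protein IDs and GO term IDs are all hypothetical placeholders.

```python
def build_benchmark(targets, annotations_t0, annotations_t1):
    """Select targets that gained experimental annotation between t0 and t1.

    annotations_t0 / annotations_t1 map a protein ID to the set of
    experimentally validated terms known at submission time (t0) and at
    assessment time (t1). Both layouts are assumptions for this sketch.
    """
    benchmark = {}
    for protein in targets:
        if annotations_t0.get(protein):
            continue  # already annotated at t0, so not a valid target
        gained = annotations_t1.get(protein, set())
        if gained:
            benchmark[protein] = set(gained)  # terms gained during the challenge
    return benchmark


# Hypothetical example data
targets = ["P1", "P2", "P3"]
annotations_t0 = {"P3": {"GO:0003723"}}  # P3 was already annotated at t0
annotations_t1 = {"P1": {"GO:0004654"}, "P3": {"GO:0003723"}}

benchmark = build_benchmark(targets, annotations_t0, annotations_t1)
print(benchmark)
```

In this toy example only P1 enters the benchmark: P2 never gained an annotation, and P3 was already annotated when the targets were released.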

Here we review CAFA 1, and introduce CAFA 2, which is taking place 2013-2014.

(a) Domain architecture of the human PNPT1 gene according to the Pfam classification. For each domain, the number of different leaf terms associated with any protein in the Swiss-Prot database containing this domain is shown. (b) Molecular Function terms (six of which are leaves) associated with the human PNPT1 gene in Swiss-Prot as of December 2011. Colored circles represent the predicted terms for three representative methods as well as two baseline methods. The prediction threshold for each method was selected to correspond to the point in the precision-recall space that provides the maximum F-measure. J (blue), Jones-UCL; O (magenta), Team Orengo; d (navy blue), dcGO; B (green), BLAST; N (brown), Naive. Dashed lines indicate the presence of other terms between the source and destination nodes.
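Choosing the threshold that maximizes the F-measure can be illustrated with a small sweep. The sketch below is a simplification that assumes a single protein with a score per predicted term; the full assessment may aggregate precision and recall over many targets. The name `best_threshold`, the term labels, and the scores are hypothetical.

```python
def best_threshold(scores, true_terms, thresholds):
    """Return (threshold, F) for the threshold with the highest F-measure.

    scores maps a term to a confidence in [0, 1]; a term counts as
    predicted when its score is >= the threshold.
    """
    best = (None, 0.0)
    for t in thresholds:
        predicted = {term for term, s in scores.items() if s >= t}
        tp = len(predicted & true_terms)
        if tp == 0:
            continue  # F is zero or undefined without true positives
        pr = tp / len(predicted)   # precision at this threshold
        rc = tp / len(true_terms)  # recall at this threshold
        f = 2 * pr * rc / (pr + rc)
        if f > best[1]:
            best = (t, f)
    return best


# Hypothetical scores for three terms, two of which are correct
scores = {"A": 0.9, "B": 0.6, "C": 0.3}
true_terms = {"A", "B"}
threshold, f_max = best_threshold(scores, true_terms, [0.25, 0.5, 0.75])
print(threshold, f_max)
```

Here the middle threshold wins: at 0.25 the incorrect term C is also predicted (lower precision), and at 0.75 the correct term B is lost (lower recall).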

Precision: pr = TP/(TP+FP)
Recall: rc = TP/(TP+FN)

MFO

BPO

Case Study: hPNPase

Predictions on Human and Mouse

The CAFA Experiment: Generating Targets

F1 = 2 × (pr × rc)/(pr + rc)
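As a worked example, precision (pr = TP/(TP+FP)), recall (rc = TP/(TP+FN)), and their harmonic mean F1 = 2 × (pr × rc)/(pr + rc) can be computed from raw counts. The function name and the convention of returning 0.0 when a denominator is zero are assumptions of this sketch, not something the poster specifies.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    pr = tp / (tp + fp) if tp + fp else 0.0  # assumed convention: 0.0 when nothing is predicted
    rc = tp / (tp + fn) if tp + fn else 0.0  # assumed convention: 0.0 when there are no true terms
    f1 = 2 * pr * rc / (pr + rc) if pr + rc else 0.0
    return pr, rc, f1


# 3 correct predictions, 1 wrong prediction, 2 missed terms
pr, rc, f1 = precision_recall_f1(tp=3, fp=1, fn=2)
print(pr, rc, f1)
```

With these counts, precision is 3/4, recall is 3/5, and F1 is 2/3.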

Assessing Method Performance

New in CAFA 2

Human Phenotype Ontology

Cellular Component Ontology

Reassessing CAFA 1 methods

Database Bias

There is extensive bias in experimentally validated annotations in UniProt-GOA. The bias is contributed by high-throughput (HT) experiments.

Many HT experimental annotations create redundancies

Engaging more communities

[Diagram labels: CAFA; Computer Scientists; Experimental Biologists; Biocurators; Computational Biologists. Contribution labels: Algorithms, Assessment methods; Targets; Targets & Ontologies.]

Critical Assessment of Function Annotations: Lessons Learned and the Road Ahead

Iddo Friedberg [1,*], Wyatt T Clark [2,3], Alexandra M Schnoes [4], Patricia C Babbitt [4], Sean D Mooney [5] and Predrag Radivojac [2]

Introduction

Participating Methods

A circle represents the sum total of articles annotating each organism. Each colored arc is composed of all the proteins in a single article. A line is drawn between any two points on the circle if the proteins they represent have 100% sequence identity. A black line is drawn if they are annotated with different ontologies (for example, in one article the protein is annotated with the MFO, and in another article with the BPO); a red line is drawn if they are annotated with the same ontology. Example: S. pombe is described by two articles, one with few proteins (light arc at the bottom) and one with many (dark arc encompassing most of the circle). Many of the same proteins are annotated by both articles.
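The linking rule in this caption can be sketched as follows. This is only an illustration of the rule, not the code used to draw the figure: the record layout, the IDs, and the sequences are hypothetical, and exact string equality stands in for 100% sequence identity.

```python
from itertools import combinations

# Hypothetical records: (article ID, protein ID, sequence, ontology)
records = [
    ("article_1", "prot_a", "MKTAYIAK", "MFO"),
    ("article_2", "prot_b", "MKTAYIAK", "BPO"),
    ("article_2", "prot_c", "MSLLQ", "MFO"),
    ("article_3", "prot_d", "MSLLQ", "MFO"),
]


def redundancy_links(records):
    """Link every pair of identical sequences; red = same ontology, black = different."""
    links = []
    for a, b in combinations(records, 2):
        if a[2] != b[2]:
            continue  # sequences differ, so no line is drawn
        color = "red" if a[3] == b[3] else "black"
        links.append((a[1], b[1], color))
    return links


links = redundancy_links(records)
print(links)
```

In the toy data, prot_a and prot_b share a sequence but are annotated with different ontologies (black line), while prot_c and prot_d share both sequence and ontology (red line).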

References and more information

CAFA: Radivojac et al (2013) Nature Methods doi:10.1038/nmeth.2340

http://BioFunctionPrediction.org

Database Bias: Schnoes et al (2013) PLoS Computational Biology doi:10.1371/journal.pcbi.1003063

Steering Committee

Patricia Babbitt
Steven Brenner
Christine Orengo
Burkhard Rost

Organizing Committee

Iddo Friedberg
Michal Linial
Mark Wass
Sean D Mooney
Predrag Radivojac

Data Wrangler

Tal Ronen Oron

CAFA 2 Assessor

Anna Tramontano

Author Affiliations

1. Miami University, Oxford, OH
2. Indiana University, Bloomington, IN
3. Yale University, New Haven, CT
4. University of California San Francisco, CA
5. Buck Institute for Research on Aging, CA

* [email protected]