Trends in Annotation of Genomic Data

One million monkeys with typewriters Annotations of the Genomic Data Deluge Genome Informatics Alliance Portland, 28/29 March 2012 Dr. Frank Schacherer, CTO, BIOBASE GmbH [email protected]

Upload: biobase

Post on 03-Dec-2014


DESCRIPTION

BIOBASE, the leader in data annotation and curation for genomics, took part in the Genome Informatics Alliance 2012: Logistics meeting in Oregon, and had an opportunity to present on trends in annotation of genomic data.

TRANSCRIPT

Page 1: Trends in Annotation of Genomic Data

One million monkeys with typewriters

Annotations of the Genomic Data Deluge

Genome Informatics Alliance

Portland, 28/29 March 2012

Dr. Frank Schacherer, CTO, BIOBASE GmbH

[email protected]

Page 2: Trends in Annotation of Genomic Data

Disclaimer: no actual monkeys involved

In 2003 the Arts Council for England paid £2,000 for a real-life test of the theorem involving six Sulawesi crested macaques, but the trial was abandoned after a month.

The monkeys produced five pages of text, mainly composed of the letter S, but failed to type anything close to a word of English, broke the computer and used the keyboard as a lavatory.

http://www.telegraph.co.uk/technology/news/8789894/Monkeys-at-typewriters-close-to-reproducing-Shakespeare.html

ATCGTTGATTTACCGGTA

CGCGCGTAAATACAATTGC

TGGCATCGTT

Page 3: Trends in Annotation of Genomic Data

Agenda

• What annotation do we need?

• How can we get it?

Page 4: Trends in Annotation of Genomic Data

A deluge of data

• deluge (plural deluges)

– A great flood or rain. The deluge continued for hours, drenching the land and slowing traffic to a halt.

– An overwhelming amount of something. The rock concert was a deluge of sound.

Page 5: Trends in Annotation of Genomic Data

Media perception

Cost of Gene Sequencing Falls, Raising Hopes for Medical Advances

7 March 2012

Soon, $1,000 Will Map Your Genes

10 Jan 2012

'Personalized Medicine' Hits a Bump / March 2012

Health Affairs 2009

Science 2011

The Power Of Digitizing Human Beings

17 Feb 2012

Page 6: Trends in Annotation of Genomic Data

Life cycle of data annotation

Derive, Analyze, Publish, Curate

Understand, Map

Annotate, Rank

Page 7: Trends in Annotation of Genomic Data

How to predict mutation effects

• Overlap with other data

– dbSNP, 1000 Genomes

– Relatives and controls

• Algorithmically

– Frameshift, nonsense, stop gain/loss, non-synonymous changes (SIFT, PolyPhen, ...)

• Based on annotation

– Known functional regions (active sites, binding sites, ...)

• Directly known effects

– HGMD

Bioinformatics, Vol. 26 no. 16 2010, pages 2069; 10.1093/bioinformatics/btq330
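
The "algorithmically" tier on this slide can be illustrated with a minimal Python sketch. It assumes the reference and altered coding sequences are already available as in-frame strings, and it only separates the coarse classes named above (frameshift, stop gain, stop loss, synonymous, non-synonymous). The function names `translate` and `classify_coding_change` are hypothetical, and the sketch does not attempt the damage scoring that tools such as SIFT or PolyPhen perform on non-synonymous changes.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence up to and including the first stop ('*')."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def classify_coding_change(ref_cds: str, alt_cds: str) -> str:
    """Coarsely classify the effect of a change on a coding sequence.

    Hypothetical helper for illustration: both inputs are assumed to be
    full coding sequences starting in frame.
    """
    ref_cds, alt_cds = ref_cds.upper(), alt_cds.upper()
    length_change = len(alt_cds) - len(ref_cds)
    if length_change % 3 != 0:
        return "frameshift"
    if length_change != 0:
        return "in-frame indel"

    ref_protein, alt_protein = translate(ref_cds), translate(alt_cds)
    if alt_protein == ref_protein:
        return "synonymous"
    if alt_protein.endswith("*") and len(alt_protein) < len(ref_protein):
        return "stop gain (nonsense)"
    if ref_protein.endswith("*") and not alt_protein.endswith("*"):
        return "stop loss"
    return "non-synonymous"


if __name__ == "__main__":
    reference = "ATGGCCTGGTAA"  # toy sequence: Met-Ala-Trp-stop
    for altered in ("ATGGCTTGGTAA", "ATGACCTGGTAA", "ATGGCCTGATAA",
                    "ATGGCCTGGTAC", "ATGGCTGGTAA"):
        print(altered, classify_coding_change(reference, altered))
```

In a real pipeline this step would come after the overlap checks (dbSNP, 1000 Genomes, relatives and controls) and before lookups against curated sources such as HGMD; the toy sequence above is made up for the example.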

Page 8: Trends in Annotation of Genomic Data

Associating Genotype with Phenotype

http://www.gen2phen.org/

Page 9: Trends in Annotation of Genomic Data

What data do we need for clinical application

http://www.cdc.gov/genomics/gtesting/ACCE/index.htm

ACCE takes its name from the four main criteria for evaluating a genetic test: analytic validity, clinical validity, clinical utility, and associated ethical, legal and social implications.

Centers for Disease Control and Prevention, Office of Public Health Genomics (OPHG)

Page 10: Trends in Annotation of Genomic Data

Data from: Howard P. Levy, MD, PhD Johns Hopkins University

Ideal Annotation for clinical use?

• Variants

– Pathogenic, Uncertain, Benign

– Severities, if known

– Ethnicities/Frequencies

– Number of cases

– Symptoms

– In conjunction with other mutations

• Evidences

– Not weighted equally

– Risks of incorrect classification not equal between genes
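
One way to picture the fields listed above is as a record type. The following Python sketch is only an illustration under stated assumptions: the class and field names (`VariantAnnotation`, `Evidence`, `margin`, and so on) are hypothetical and not taken from any BIOBASE product or standard, and the simple weighted-sum rule in `classify` stands in for whatever expert process a curator would actually follow. The `margin` parameter is an assumed way to express that the risk of a wrong call differs between genes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Classification(Enum):
    PATHOGENIC = "pathogenic"
    UNCERTAIN = "uncertain"
    BENIGN = "benign"


@dataclass
class Evidence:
    source: str               # e.g. a publication or lab report identifier
    supports: Classification  # which call this piece of evidence favours
    weight: float = 1.0       # evidence is not weighted equally


@dataclass
class VariantAnnotation:
    variant: str                          # variant name, e.g. HGVS-style
    gene: str
    severity: Optional[str] = None        # only if known
    frequencies: dict[str, float] = field(default_factory=dict)  # per population
    case_count: int = 0
    symptoms: list[str] = field(default_factory=list)
    co_occurring_variants: list[str] = field(default_factory=list)
    evidence: list[Evidence] = field(default_factory=list)

    def support(self) -> dict[Classification, float]:
        """Sum evidence weights per classification."""
        totals = {c: 0.0 for c in Classification}
        for item in self.evidence:
            totals[item.supports] += item.weight
        return totals

    def classify(self, margin: float = 1.0) -> Classification:
        """Call pathogenic or benign only when one side leads by `margin`.

        A larger margin can be used for genes where a wrong call is
        costlier (an assumption made for this sketch).
        """
        totals = self.support()
        lead = totals[Classification.PATHOGENIC] - totals[Classification.BENIGN]
        if lead >= margin:
            return Classification.PATHOGENIC
        if -lead >= margin:
            return Classification.BENIGN
        return Classification.UNCERTAIN


if __name__ == "__main__":
    # Placeholder values only; not real annotation data.
    record = VariantAnnotation(
        variant="c.123A>G",
        gene="GENE1",
        case_count=3,
        evidence=[
            Evidence("report-1", Classification.PATHOGENIC, weight=2.0),
            Evidence("report-2", Classification.BENIGN, weight=0.5),
        ],
    )
    print(record.classify())            # lenient margin
    print(record.classify(margin=2.0))  # stricter margin for a high-risk gene
```

Keeping the evidence items separate from the final call means the classification can be recomputed when new reports arrive, which matches the slide's point that evidence and risk are not uniform.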

N=12

4 Testing (Clinical Validity, Who/When, Methods, Interpretation, Cost)

4 Management, Clinical Significance, Implications

3 Actionability, Clinical Utility

3 Clinical manifestations (Pathophysiology, Phenotype, Prognosis, Severity, Penetrance, Pleiotropy)

2 Frequency (especially indicate most common variants)

2 Inheritance and de novo mutation rate

2 Evidence-based

1 Clinical Decision Support in EHR

Data from: Elaine Lyon, Ph.D., FACMG University of Utah & ARUP Laboratories

Page 11: Trends in Annotation of Genomic Data

Who provides annotation?

MD/Geneticist

Patient

Payor

Curator

Test Lab

Anybody

Researcher

Computer

Page 12: Trends in Annotation of Genomic Data

Surveys & Patient Self-annotation

Knaus, William A.: Building a Genome Enabled Electronic Medical Record

Nature Biotechnology, Volume 29, Number 5, May 2011

Patients with serious diseases may experiment with drugs that have not received regulatory approval. Online patient communities structured around quantitative outcome data have the potential to provide an observational environment to monitor such drug usage and its consequences. Here we describe an analysis of data reported on the website PatientsLikeMe by patients with amyotrophic lateral sclerosis (ALS) who experimented with lithium carbonate treatment.

Page 13: Trends in Annotation of Genomic Data

DNA Variant Databases

Data, except for HGMD and DMuDB courtesy of P. Willems, Mutabase

Page 14: Trends in Annotation of Genomic Data

Data federation

Page 15: Trends in Annotation of Genomic Data

Testing Lab data

The Diagnostic Mutation Database (DMuDB) is a unique repository of high-quality variant data collected from accredited clinical genetic testing laboratories in the UK National Health Service (NHS). It provides a safe and secure way for variant data to be shared within and between laboratories in order to support safer, more consistent diagnoses. The database was established in order to address the lack of data-sharing or publication in the genetic testing community.

DMuDB is used regularly by genetic scientists:

• to check a new variant against existing reported variants from other laboratories

• to check for co-reported variants

• as a part of regular re-assessment of unclassified variants

• via the Universal Browser as part of complex searches covering multiple databases

www.ngrl.org.uk/Manchester

A safe and secure route for sharing variant data

Page 16: Trends in Annotation of Genomic Data

LSDBs (Locus Specific Databases)

http://www.hgvs.org/dblist/glsdb.html

Page 17: Trends in Annotation of Genomic Data

Crowdsourcing genome annotation

Page 18: Trends in Annotation of Genomic Data

Crowdsourcing reality

“The future of biocuration: To thrive, the field that links biologists and their data urgently needs structure, recognition and support.” Nature, Vol 455, 2008

…biological databases can be curated by a diffuse network of volunteers? This is certainly not the case and at the core of every successful wiki database are a group of dedicated experts who do the bulk of the data curation.

Page 19: Trends in Annotation of Genomic Data

Database curation

Page 20: Trends in Annotation of Genomic Data

Data Annotation Professionals

• Clear incentives

• Background in life sciences (MSc/PhD)

• Curation is sole focus of work

• Knowledge of standards, databases, formats, specialized tools

Huge volumes of primary data are currently archived in numerous open-access databases, and with new generation technologies becoming more common in laboratories, large datasets will become even more prevalent than today. The lasting archiving, accurate curation, efficient analysis and precise interpretation of all of these data are a challenge. Collectively, database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data.

Page 21: Trends in Annotation of Genomic Data

HGMD

Page 22: Trends in Annotation of Genomic Data

HGMD - comprehensive disease-causing germline

Page 23: Trends in Annotation of Genomic Data

Cleaning up the literature

Charts from: Jonathan S. Berg, U North Carolina, Chapel Hill

Page 24: Trends in Annotation of Genomic Data

Applying annotation

Page 25: Trends in Annotation of Genomic Data

Conclusions on annotation

• Clinical-grade annotation may be the most important task ahead

• NGS itself contributes to generating evidence

• Many different sources and ways of annotation exist

• Human, specialist annotation remains essential (monkeys notwithstanding)

Page 26: Trends in Annotation of Genomic Data

www.biobase-international.com

[email protected]

Functional Analysis

Human Mutation & Variant Analysis

Gene Regulation Analysis

Thank you!

• BIOBASE Employees all around the world

• David Cooper, University of Cardiff

• Andrew Devereux, NGRL

• Patrick Willems, MutaBase

• Johan den Dunnen, HVP & Leiden University Medical Center

• Anthony J. Brookes, GEN2PHEN & University of Leicester

• Samir K. Brahmachari, OSDD