Posted on 18-Jan-2016
Sharing Genetic Variation Data via EMBL-EBI: The European Variation Archive
Gary Saunders, PhD
www.ebi.ac.uk/eva
Agenda
• Overview of European Variation Archive (EVA)
• EVA model of data sharing
• Summary of how we share genetic variation data
• Merging open-access variation datasets
European Variation Archive – EVA (Eva)
• Curated genetic variation data sharing & analysis platform
• All types of variation:
• SNVs, MNVs, small indels and structural variation
• Germ line, somatic, within/cross population, potentially between species
Single portal for open-access variation data
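The variant classes listed above can be told apart, roughly, from the REF and ALT alleles of a VCF record. The Python sketch below illustrates one such heuristic; the function name `classify_variant` is hypothetical, and the 50 bp cut-off for calling a length change "structural" is an assumed convention, not a rule stated in these slides or used by EVA.

```python
# Hypothetical helper for illustration only; not part of any EVA API.
SV_MIN_LENGTH = 50  # assumption: common size convention, not taken from the slides


def classify_variant(ref: str, alt: str) -> str:
    """Roughly classify a single REF/ALT pair from a VCF record."""
    # Symbolic alleles (<DEL>, <DUP>, ...) and breakend notation ([ or ]) mark structural variants
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SV"
    # A large length difference between REF and ALT is also treated as structural
    if abs(len(ref) - len(alt)) >= SV_MIN_LENGTH:
        return "SV"
    # Equal-length alleles: one base is an SNV, several adjacent bases an MNV
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    # Otherwise a small insertion or deletion
    return "indel"
```

For example, `classify_variant("A", "G")` gives `"SNV"`, `classify_variant("A", "AT")` gives `"indel"` and `classify_variant("A", "<DEL>")` gives `"SV"`. Real pipelines also handle multi-allelic ALT fields, the `*` allele and normalisation, which this sketch ignores.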
EVA Data Sharing Model
[Diagram: submitter's sample(s), methodology and genome, archived at EBI in EVA; linked to publication, collaborators and wider study data; stable POA; credit for reuse]
EVA: Study Browser
• Core EVA functionality: portal to open-access genetic variation project data (VCF files)
EVA: Study Browser – project pages
EVA: Study Browser – assessing data quality
Submission to EVA
• Minimal or data-rich submissions are accepted
• Collaborative process
• Submitter recognition
• Hold date
• Links to runs / experiments / analyses
• Accession number in 48 hours
• EVA has a dynamic study loading pipeline
• Online documentation
• eva-helpdesk@ebi.ac.uk
Rate of Submission to EVA
[Chart: total and non-human submissions, March 2014 to October 2015; axis label "1 billion"]
Merging Open-Access Datasets
[Diagram: data submitters share data]
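The slides do not describe how EVA merges datasets. As an illustration only, one simple way to merge open-access variant lists is to key each record on (chromosome, position, REF, ALT) and record which studies report it. The function `merge_studies` and the study IDs in the usage note are hypothetical; the sketch also assumes inputs are already normalised (left-aligned, trimmed indels, consistent chromosome names such as `1` vs `chr1`), since otherwise the same variant can appear under different keys.

```python
from collections import defaultdict


def merge_studies(studies):
    """Merge variant lists from several studies.

    studies: dict mapping a study ID to an iterable of (chrom, pos, ref, alt) tuples.
    Returns a dict mapping each variant key to the sorted list of study IDs reporting it.
    """
    merged = defaultdict(set)
    for study_id, variants in studies.items():
        for chrom, pos, ref, alt in variants:
            # Assumption: records are already normalised; only case and POS type are unified here
            key = (chrom, int(pos), ref.upper(), alt.upper())
            merged[key].add(study_id)
    # Sets become sorted lists so the output is deterministic
    return {key: sorted(ids) for key, ids in merged.items()}
```

With `{"study_A": [("1", 100, "A", "G")], "study_B": [("1", 100, "a", "g")]}` as input, both records collapse to the key `("1", 100, "A", "G")`, reported by `["study_A", "study_B"]`.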
Conclusion
European Variation Archive: www.ebi.ac.uk/eva
• Open-access genetic variation archive
• Curated resource
• All types of variants
• All species
• Simplified submission system
Funding
EVA: Justin Paschall, Ignacio Medina Castello, Gary Saunders, Cristina Yenyxe Gonzalez, Jag Kandasamy, Ilkka Lappalainen
EGA: Jeff Almeida-King, Vasudev Kumanduri, Saif Ur-Rehman, Tom Smith
Acknowledgments
Ensembl Variation: Fiona Cunningham, Sarah Hunt, William McLaren, Anja Thormann, Laurent Gil
ENA: Rasko Leinonen, Rajesh Radhakrishnan, Daniel Vaughan
@ebivariation
www.ebi.ac.uk/eva
Case-study: deCODE
• Variation data from 2000 Icelanders
• VCF files
• Novel samples and metadata, custom reference genome
• Hold until publication
Variant Call Format (VCF): The Community Standard
• Most VCF validation tools do not truly conform to the specification:
• Of the ~250 human VCFs loaded into EVA, fewer than 10% were truly valid on the first pass
• (EVA has made publicly available a comprehensive C++ VCF validator that raises errors and warnings)
Most publicly available VCFs are not truly valid
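The slides do not list which rules the EVA validator enforces. To illustrate the kind of check involved, the hypothetical Python function `check_vcf_lines` below tests only a handful of rules from the VCF specification: a leading `##fileformat=VCFv` line, a tab-separated `#CHROM` header with the eight fixed columns, at least eight fields per data line, an integer POS and a REF made of A, C, G, T, N. It is a minimal sketch, not the EVA C++ validator and not a substitute for it.

```python
import re

# Assumption: only a small subset of VCF specification rules is checked here.
REQUIRED_HEADER = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
REF_PATTERN = re.compile(r"[ACGTN]+", re.IGNORECASE)
POS_PATTERN = re.compile(r"[0-9]+")


def check_vcf_lines(lines):
    """Return a list of error messages for a few basic VCF rules."""
    errors = []
    # The first line must declare the file format version
    if not lines or not lines[0].startswith("##fileformat=VCFv"):
        errors.append("line 1: missing ##fileformat line")
    header_seen = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##"):
            continue  # meta-information line; its content is not checked here
        if line.startswith("#"):
            header_seen = True
            # Header columns must be tab-separated and start with the eight fixed names
            if line.split("\t")[:8] != REQUIRED_HEADER:
                errors.append(f"line {number}: header must begin with tab-separated "
                              "#CHROM POS ID REF ALT QUAL FILTER INFO")
            continue
        if not header_seen:
            errors.append(f"line {number}: data line appears before the #CHROM header")
        fields = line.split("\t")
        if len(fields) < 8:
            errors.append(f"line {number}: expected >= 8 tab-separated fields, got {len(fields)}")
            continue
        if not POS_PATTERN.fullmatch(fields[1]):
            errors.append(f"line {number}: POS {fields[1]!r} is not an integer")
        if not REF_PATTERN.fullmatch(fields[3]):
            errors.append(f"line {number}: REF {fields[3]!r} must contain only A, C, G, T, N")
    return errors
```

A well-formed three-line file returns an empty list, while a file whose header uses spaces instead of tabs is flagged on its header line. Lines should be passed without a trailing empty string, since a blank line is reported as having too few fields.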
Sharing Genetic Variation Data: Problems
• Data accuracy
• Metadata
• Links to associated data
• Credit to data generator(s) for reuse