the challenge - data science · – 134, combining protein and genome annotation for interpretation...

The Challenge


Post on 25-Sep-2020


TRANSCRIPT

Page 1: The challenge - Data Science · – 134, Combining Protein and Genome Annotation for Interpretation of Genomic Variants, Peter McGarvey – 135, Interoperability of NURSA, PharmGKB,

The Challenge

Page 2:

BD2K-SDC Sustainability

Page 3:

Sustainability - Agenda

• Introduction: Allen Dearry
• Results From Metrics RFI: Izumi Hinkson
• Panel Presentations and Discussion
– Susan Gregurick, Moderator
– David Giaretta: The Role of Trustworthy Digital Repositories in Sustainability
– George Alter: The Sloan Stewardship Gap Project
– Melissa Landrum: Archiving Interpretations of Variants in ClinVar
– Cathy Wu: Interoperability, Sustainability, and Impact: A UniProt Case Study
• Posters
– 134, Combining Protein and Genome Annotation for Interpretation of Genomic Variants, Peter McGarvey
– 135, Interoperability of NURSA, PharmGKB, dkNET, and DataMed, Neil McKenna
• Notes
– https://docs.google.com/document/d/1EaYSuOeR7BJnmjzAAoS8iuS86UuS-3eK-MesX6-9_Eg/edit (hotlink on program book p6)

Page 4:

RFI: Metrics to Assess the Value of Biomedical Digital Repositories

Izumi Hinkson
AAAS S&T Policy Fellow
NCI CBIIT

BD2K & SDC Sustainability Working Group
November 30, 2016

Page 5:

Goals of RFI

• Solicit input from stakeholders
• Foundation for long-term sustainability
o Enable repository owners to prioritize repository management
o Support decisions made by funding agencies
o Support diverse domains of science
o Communication between stakeholders

Page 6:

Distribution of Response Themes

• Use & Users: 24% (n=65)
• Quality & Impact: 26% (n=71)
• Quality of Service: 16% (n=42)
• Governance: 11% (n=29)
• Infrastructure: 6% (n=16)
• Surveys & Case Studies: 9% (n=24)
• Other considerations: 8% (n=21)
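The percentages on this slide appear to be each theme's share of all coded responses. A minimal sketch of that arithmetic follows, assuming the seven counts sum to the full set of coded responses. The theme-to-count pairing is taken from the per-theme slides later in the deck; Infrastructure is assigned n=16 by elimination, since 16% (n=42) already belongs to Quality of Service. The names `theme_counts` and `theme_shares` are hypothetical, introduced only for illustration.

```python
# Counts per theme as read from the slides. Pairing Infrastructure with n=16
# is an inference by elimination, not a value stated on its own slide.
theme_counts = {
    "Use & Users": 65,
    "Quality & Impact": 71,
    "Quality of Service": 42,
    "Governance": 29,
    "Infrastructure": 16,
    "Surveys & Case Studies": 24,
    "Other considerations": 21,
}


def theme_shares(counts):
    """Return each theme's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {theme: round(100 * n / total) for theme, n in counts.items()}


total_mentions = sum(theme_counts.values())
shares = theme_shares(theme_counts)

if __name__ == "__main__":
    for theme, pct in shares.items():
        print(f"{theme}: {pct}% (n={theme_counts[theme]})")
    print(f"Total coded responses: {total_mentions}")
```

Under this assumption the rounded shares match the seven percentages printed on the slide, which supports reading them as fractions of one common total.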

Page 7:

Examples of metrics

Use & Users: 24% (n=65)

• Number of downloads
• Size of user community
• International reach
• Cautions
o User counts vs coverage
o Bias of utilization statistics

Page 8:

Examples of metrics

Quality & Impact: 26% (n=71)

• Number of publications, citations, grants, and patents
• Data and metadata standards
• Educational tools and protocols
• Altmetrics
• Cautions
o Data vs repository value
o Context for quantitative metrics

Page 9:

Examples of metrics

Quality of Service: 16% (n=42)

• Expertise of staff
• Uptime and response time
• Regularly scheduled updates and maintenance
• Help desk and FAQs
• Tutorials, webinars, and training
• Cautions
o Downtime context

Page 10:

Examples of metrics

Governance: 11% (n=29)

• Scientific Advisory Board
• Legal, regulatory, and contractual framework
• Documentation
• Terms of use
• Licensing
• Encryption and security
• Lifecycle management plan
• Funding

Page 11:

Examples of metrics

Infrastructure: 6% (n=16)

• Infrastructure funding
• Technology architecture
• Hardware and software
• Maintenance
• Sustainability
• Office space

Page 12:

Examples of metrics

Surveys & Case Studies: 9% (n=24)

• User experience surveys
• Stakeholder interviews
• Availability of alternatives
• Counterfactuals
• External audits
• Cautions
o Testimonials may be biased
o Validity of counterfactuals

Page 13:

Examples of metrics

Other Considerations: 8% (n=21)

• Terminology
• Indicators vs metrics
• Research-cycle-specific metrics
• Tracking of missing, uncertain, contradictory, or retracted data
• “Diversity in data complexity calls for different metrics”
• Existing metrics and assessment resources (e.g., ISO 16363, DSA-WDS)

Page 14:

Where to find more information?

• Executive Summary, mid-December
o datascience.nih.gov/bd2k
• For more information, email:
o [email protected]

Page 15:

Data Science at NIH

https://datascience.nih.gov/adds
[email protected]
@NIH_BD2K
#BD2K, #BigData

Page 16:

Framing Questions for Sustainability Session at the 2016 BD2K AHM (also on Google Doc notes page)

1. In thinking about data preservation, scientific quality and impact, what do you find are the most important elements, or indicators, that promote data quality and ensure data impact?

2. As data integration and database cross integration become commonplace, how will this affect attribution and adherence to the FAIR principles*? For example, are there best practices for using format standards or for correctly referencing data identifiers that best support cross-linking data & repositories from one source/provider to another?

3. For data repositories that deal with clinical data and information, are there particular issues or challenges with adhering to the FAIR principles and performance indicators that could impact success?

4. What is the role for certifications of biomedical data repositories and trusted digital repositories? Will this differ between different biomedical and scientific domains?

5. What role does the scientific research community play in planning for and evaluating indicators for data preservation?

*FAIR principles are Findable, Accessible, Interoperable and Reusable.
