Leonore Reiser and Lisa Harper

Plantae Webinar, May 30, 2018

What does it mean to be FAIR? Why is it so important?

How to make your published work more FAIR

Planning your data management strategy

Stay to the end to complete a survey about future webinars

What’s in it for YOU?

We all benefit from data sharing.

More citations of YOUR work, increasing your visibility in the research community.

Easily comply with journal and funding requirements.

Less time spent fulfilling requests for data.

ACGATTGAAGAGAGACTTAAAGTGGTGGAATAAGCACATTTTTGGAGATATTTTTAAAATCCTCCGATTG

GCAGAAGTTGAAGCTGAACAAAGAGAATTGAATTTCCAACAGAATCCCTCAGCAGCTAATAGAGAATTGA

TGCATAAGGCTTATGCCAAACTTAACCGGCAGTTAAGTATTGAAGAACTTTTTTGGCAACAAAAGTCGGG

TGTCAAATGGTTAGTGGAGGGGGAACGCAACACCAAATTTTTTCATATGAGGATGCGTAAAAAAAGAATG

AGAAATCACATCTTCCGGATTCAGGATCAGGAAGGGAATGTGCTTGAAGAACCTCATTTAATCCAAAACT

CGGGTGTTGAATTCTTTCAAAACTTGCTGAAGGCAGAACAATGTGACATCTCCAGGTTTGATCCTTCTAT

TACTCCACGAATTATCTCCACCACTGATAATGAATTCTTGTGTGCAACCCCATCGTTACAGGAAGTGAAA

GAGGCAGTATTTAACATTAATAAAGATAGTGTCGCTGGGCCTGACGGTTTCTCATCCTTGTTTTACCAAC

ACTGCTGGGACATAATCAAGCAAGACCTTTTTGAAGCAGTGCTTGATTTTTTCAAGGGGAGCCCGCTACC

ACGTGGCATTACCTCCACAACGCTTGTCTTGTTACCTAAAACTCAGAATGTCAGCCAATGGAGTGAATTT

CGGCCCATTAGTTTATGCACTGTCTTAAACAAGATAGTAACTAAACTTTTGGCCAACCGGCTATCCAAAA

TTCTCCCATCCATCATCTCAGAAAACCAAAGTGGCTTCGTTAATGGAAGGCTTATAAGTGACAATATCTT

GCTTGCACAGGAGCTGGTTGATAAGATTAATGCAAGATCAAGGGGAGGTAATGTGGTCCTAAAACTTGAT

ATGGCAAAAGCTTATGACCGTCTGAATTGGGAATTTCTTTATCTTATGATGGAGCAGTTTGGTTTTAATG

CACTTTGGATAAACATGATTAAGGCCTGCATCTCCAACTGTTGGTTTTCATTACTCATCAATGGATCCTT

AGTGGGCTATTTCAAATCCGAGAGGGGACTGAGACAGGGCGATTCTATTTCCCCTTCGCTTTTTATCTTG

GCTGCAGAATATTTATCAAGGGGACTCAATCAGTTATTCAGCCGCTACAATTCTTTACATTACTTATCTG

GATGTTCCATGTCTGTGAGTCACCTTGCTTTTGCCGATGATATTGTAATTTTTACTAATGGTTGCCACTC

AGCCTTGCAGAAGATCTTGGTCTTCTTACAGGAATATGAACAGGTATCGGGGCAACAGGTTAATCATCAA

What types of Data are we talking about?

• We used to publish all the data we needed to prove a hypothesis within a publication

• But things have changed:

• Some data is now too large for inclusion in a publication

• Data can now be computationally analyzed, so it must be machine readable

Zhang et al, Plant Cell, 2018. doi.org/10.1105/tpc.17.00791

Photos of specimens

Data that can be included in Publications

Data OK in primary publication

Data goes in appropriate, stable, long term repository

Guo et al, Plant Cell, 2018. doi.org/10.1105/tpc.17.00842

Gel images, charts and graphs

Caseys et al, Plant Cell 2018. doi.org/10.1105/tpc.18.00278

Model cartoons

Kumar, et al, Plant Physiology 2018, doi.org/10.1104/pp.18.00263

SHORT lists of primers in text format (not PDF)

Data OK in primary publication

Data goes in appropriate, stable, long term repository

Data that can be included in Publications

Data OK in primary publication

Data goes in appropriate, stable, long term repository

• Genome Assemblies

• RNAseq/ChIPseq/OtherSeq

• QTL data (bi-parental or GWAS)/ SNPs/INDELs

• Other Genome Diversity data

• Proteomics

• Metabolomics

• Ionomics

• Etc.

Data TOO BIG for Publications

These Data Types need ADDITIONAL attention to be FAIR

Data TOO BIG for Publications

Two examples of big data in publications:

1. A paper reports a new genome sequence assembly: that genome sequence MUST be made available, and should be submitted to GenBank Genome.

2. A paper uses RNAseq to show that expression of a gene of interest is altered under a certain condition. Only a subset needs to be shown in the paper, but ALL the RNAseq data is valuable. Publish the paper AND publish the RNAseq data! You get TWO publications instead of one!

Credit: Melissa Haendel

Wilkinson, et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. 10.1038/sdata.2016.18. https://www.nature.com/articles/sdata201618

• Findable means data is human and machine readable and attached to persistent identifiers

• Accessible means data can be found and retrieved by humans and machines using standard formats

• Interoperable means data can be exchanged and used between systems

• Reusable means data can be used by others

How to Make Your Published Data FAIR

• Use standard formats

• Supply complete metadata

• Embrace Ontologies

• Use persistent and unambiguous identifiers

• Put your data in a long term stable repository

• Cite, share freely and encourage others

CHROM  POS    REF  ALT  Line 1  Line 2
1      12345  A    C    A       A
3      67891  C    T    H       C
10     23456  G    T    T       U

CHROM  POS    REF  ALT  Line 1  Line 2
Gm01   12345  A    C    0/0     0/0
Gm03   67891  C    T    0/1     0/0
Gm10   23456  G    T    1/1     ./.

CHROM  POS    REF  ALT  Line 1  Line 2
Chr01  12345  A    C    AA      AA
Chr03  67891  C    T    C/T     CC
Chr10  23456  G    T    TT      NN

ALL MEAN THE SAME! BUT ARE NOT THE SAME

Use Standard formats: SNP example

SNP (Single Nucleotide Polymorphism) data: a base, a chromosome number and genome position, a reference to the genome assembly used, and the genotypes of the lines tested.

VCF (Variant Call Format) is the STANDARD

Use the file format STANDARD for your data type
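
The three genotype tables on this slide can be reconciled in a few lines of code. The following is a minimal sketch, not a validated converter: the function name `to_vcf_gt` is hypothetical, and the encodings it accepts (single bases for homozygous calls, `H` for heterozygous, `U`/`N`/`NN` for missing, and two-letter or slash-separated allele pairs) are assumptions read off the slide's examples rather than any published convention.

```python
def to_vcf_gt(call, ref, alt):
    """Convert an ad hoc genotype call into a VCF-style GT string.

    The accepted input encodings are assumptions based on the slide's
    examples, not a standard.
    """
    code = call.strip().upper().replace("/", "")
    if code in ("", "U", "N", "NN", ".", ".."):
        return "./."                      # unknown or missing call
    if code == "H":
        return "0/1"                      # heterozygous shorthand
    if len(code) == 1:
        code = code * 2                   # a single base is read as homozygous
    if len(code) != 2:
        raise ValueError(f"unrecognized genotype call: {call!r}")
    allele_index = {ref.upper(): "0", alt.upper(): "1"}
    try:
        alleles = sorted(allele_index[base] for base in code)
    except KeyError:
        raise ValueError(f"{call!r} contains a base that is neither REF nor ALT")
    return "/".join(alleles)


# Second row of the slide's tables: REF=C, ALT=T
print(to_vcf_gt("H", "C", "T"))    # heterozygous shorthand -> 0/1
print(to_vcf_gt("C/T", "C", "T"))  # explicit allele pair   -> 0/1
```

Both ad hoc encodings collapse to the same `GT` value, which is the point of the slide: once everyone writes VCF, nobody has to guess what `H` or `U` meant.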

DOI: 10.3389/fpls.2017.01812

Use Standard formats: Data in images is NOT accessible

Data in PDF (image) format is not findable or accessible.

Leave tabular data in tables

If you use EXCEL, look out for data corruption and hidden Microsoft characters that impede parsing

Ziemann et al., 2016

10.1186/s13059-016-1044-7

Use Standard formats: Beware of Excel

Fig. 1: Prevalence of gene name errors in Supplementary Excel files

Percentage of papers with gene lists affected

Increase in supplementary files with gene name errors per year
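
A quick automated check can catch the most common symptom before a supplementary file is submitted: gene symbols that a spreadsheet has silently turned into dates (for example `SEPT2` becoming `2-Sep`). The sketch below is illustrative only. The function name `suspicious_gene_names`, the date patterns, and the list of hidden characters are assumptions, and it will not catch every kind of corruption.

```python
import re

# Date-like strings that spreadsheets tend to produce from gene symbols,
# e.g. "2-Sep" or "1-Mar-2018". The exact formats are an assumption.
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
_DATE_LIKE = re.compile(rf"^\d{{1,2}}-({_MONTHS})(-\d{{2,4}})?$", re.IGNORECASE)

# Invisible characters that often survive an export and break parsers:
# byte order mark, non-breaking space, zero-width space, carriage return.
_HIDDEN = re.compile(r"[\ufeff\u00a0\u200b\r]")


def suspicious_gene_names(names):
    """Return the entries that look date-converted or carry hidden characters."""
    flagged = []
    for name in names:
        if _DATE_LIKE.match(name.strip()) or _HIDDEN.search(name):
            flagged.append(name)
    return flagged


# Hypothetical gene column read from an exported spreadsheet
column = ["SEPT2", "2-Sep", "1-Mar", "geneX\u00a0"]
print(suspicious_gene_names(column))
```

Running a check like this on a plain-text export (CSV or TSV) of the gene list is usually easier than inspecting the spreadsheet by eye.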

How to Make Your Published Data FAIR

• Use standard formats

• Supply complete metadata

• Embrace Ontologies

• Use persistent and unambiguous identifiers

• Put your data in a long term stable repository

• Cite, share freely and encourage others

Metadata is data about the data, and allows understanding of the data

Phenotype (Data): Plant is 170 cm tall

Metadata:

Species = xxx

Germplasm = xxx

Field location = xxx

Environment = xxx

Measurement method = xxx

Supply Complete Metadata
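
One way to keep a measurement and its metadata together is to store them as a single structured record and check it for gaps before sharing. This is a minimal sketch under stated assumptions: the field names mirror the slide's list, the values are made-up placeholders, and the helper `missing_metadata` is hypothetical rather than part of any metadata standard.

```python
import json

# Field names follow the slide's list; treating all of them as required
# is an assumption made for this sketch.
REQUIRED_FIELDS = (
    "species",
    "germplasm",
    "field_location",
    "environment",
    "measurement_method",
)


def missing_metadata(metadata):
    """Return the required fields that are absent or blank."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(metadata.get(field, "")).strip()
    ]


# Placeholder values for illustration only
observation = {
    "trait": "plant height",
    "value": 170,
    "unit": "cm",
    "metadata": {
        "species": "example species",
        "germplasm": "example line",
        "field_location": "example field",
        "environment": "",  # left blank on purpose to show the check
        "measurement_method": "ruler, soil surface to top of plant",
    },
}

print(json.dumps(observation, indent=2))
print("Missing:", missing_metadata(observation["metadata"]))
```

Without the metadata block, "170" is just a number; with it, another researcher (or a program) can tell what was measured, on what, where, and how.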

• Write your Materials and Methods as if you wanted someone else to be able to reproduce your work.

• Be accurate and complete about your bench and field work; include samples/stocks/lines used, accession numbers, sources of materials, exact measuring techniques etc.

• Be AS accurate and complete about your computational pipelines. Include the raw data files you created and their versions. If you use reference data (e.g., a sequence assembly), include the version number, download date, and download source.

• Include names of software applications, versions, platforms and sources. If you use CyVerse, use their metadata reporting tools.

Supply Complete Metadata

Supply Complete Metadata: Example

Pretty Good

Pretty Good, but lots of metadata in free text

Supply Complete Metadata

Not so good

But NONE of those are really great…

597 Possible Attributes

At least 50 Attributes

At least 100 Attributes

Budget TIME to provide Metadata

The metadata in public databases is often confusing and very incomplete

A test case with Zea mays RNAseq data reveals a high proportion of missing, misleading or incomplete metadata.

Bhandary, et al, Plant Science 2018. Raising orphans from a metadata morass: A researcher's guide to re-use of public ’omics data. https://doi.org/10.1016/j.plantsci.2017.10.014

• Established: Genomic Standards Consortium (http://gensc.org)

• Minimal Information about Any Sequence

• Emerging

• Minimal Information about a Plant Phenotyping Experiment (MIAPPE)

Metadata Standards for Various Data Types

Ask For Help from Database People

How to Make Your Published Data FAIR

• Use standard formats

• Supply complete and deep metadata

• Embrace Ontologies

• Use persistent and unambiguous identifiers

• Put your data in a long term stable repository

• Cite, share freely and encourage others

Cell

Same word,

different meanings

Different words, same concept

Eggplant

Aubergine

Melongene

Credit: Monica Munoz Torres

An Ontology is:

A set of precisely defined terms in a logical hierarchy, in which the relationships between terms can be understood by computers

Ontologies: Hierarchy of terms and explicit relationships among terms

Plant Ontology (PO) terms shown:

Leaf PO:0025034

Vascular leaf PO:0009025

Adult vascular leaf PO:0020103

Flag leaf PO:0020103

Leaf sheath PO:0020104

Ligule PO:0020105
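
To make "relationships that computers can understand" concrete, here is a toy sketch of how such a hierarchy might be stored and traversed. The term IDs and labels are the ones listed on this slide, but the `is_a` and `part_of` links are simplified assumptions for illustration and should not be taken as the actual Plant Ontology structure, where a term can have several parents. The `TERMS` table and the `ancestors` helper are hypothetical.

```python
# term id -> (label, link to parent), where the link is (relation, parent id).
# The links below are illustrative assumptions, not copied from the
# Plant Ontology itself.
TERMS = {
    "PO:0025034": ("leaf", None),
    "PO:0009025": ("vascular leaf", ("is_a", "PO:0025034")),
    "PO:0020103": ("flag leaf", ("is_a", "PO:0009025")),
    "PO:0020104": ("leaf sheath", ("part_of", "PO:0009025")),
    "PO:0020105": ("ligule", ("part_of", "PO:0009025")),
}


def ancestors(term_id):
    """Walk up the hierarchy, returning (relation, parent label) pairs."""
    path = []
    link = TERMS[term_id][1]
    while link is not None:
        relation, parent_id = link
        path.append((relation, TERMS[parent_id][0]))
        link = TERMS[parent_id][1]
    return path


# A dataset annotated with the ligule term can be found by a search for "leaf"
for relation, label in ancestors("PO:0020105"):
    print(relation, label)
```

Because the links are explicit, a query for data about leaves can also return data annotated with more specific terms, which free-text labels cannot do reliably.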

Embracing ontologies

• Ontologies provide a POWERFUL, MACHINE READABLE utility to ensure we are all speaking the same language

• Examples of ontologies:

• Gene Function = Gene Ontology (GO)

• Plant Anatomy and Development = Plant Ontology (PO)

• Phenotypes = Phenotype and Trait Ontology (PATO)

• …..many many others

• Find and use existing ontologies:

• http://bioportal.bioontology.org/ (711 ontologies)

• https://www.ebi.ac.uk/ols/index (208 ontologies)

• Planteome (http://planteome.org)

• Ask Questions!

Using Ontologies in Metadata

Questions?

How to Make Your Published Data FAIR

• Use standard formats

• Supply complete and deep metadata

• Embrace Ontologies

• Use persistent and unambiguous identifiers

• Put your data in a long term stable repository

• Cite, share freely and encourage others

Use persistent, unambiguous identifiers

Example: Gene names

GOOD!

Identifiers also resolve confusion over species

Is this Arabidopsis? Maize? Tomato?

DOI:10/24/pp.17.00021

One gene, many names

GOOD

OK

(history)

One name, many genes

A ‘gene’ may have many named sequences
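
The naming problems on the last few slides can be illustrated with a small lookup table. Everything in this sketch is made up for illustration: the identifiers `GENE_ID_0001` and `GENE_ID_0002`, the gene names, and the `resolve` helper are hypothetical and do not come from any real database.

```python
# Stable identifier -> names used for that gene in the literature.
# All identifiers and names here are invented for illustration.
SYNONYMS = {
    "GENE_ID_0001": {"abc1", "xyz3"},  # one gene, many names
    "GENE_ID_0002": {"abc1"},          # the same name used for another gene
}


def resolve(name):
    """Return every stable identifier that a free-text gene name could mean."""
    wanted = name.strip().lower()
    return sorted(
        gene_id
        for gene_id, names in SYNONYMS.items()
        if wanted in {n.lower() for n in names}
    )


print(resolve("xyz3"))  # a single match: unambiguous
print(resolve("ABC1"))  # two matches: the name alone is ambiguous
```

A name can map to several genes, but a stable identifier maps to exactly one, which is why papers should report the identifier alongside the name.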

Community Standards and Nomenclature Resources

How to Make Your Published Data FAIR

• Use standard formats

• Supply complete and deep metadata

• Embrace Ontologies

• Use persistent and unambiguous identifiers

• Put your data in a long term stable repository

• Cite, share freely and encourage others

Put your data in a stable public repository

Large international repositories for many data types for all species. ALL sequence data goes here

Large but specialized databases serving many species

Soybase

Specialized databases serving specific communities

Which repository?

ALL types of sequence data: NCBI, DDBJ, or EMBL

SNP data: European Variation Archive (EVA, https://www.ebi.ac.uk/eva/). (NCBI’s dbSNP will only process human SNPs)

RNAseq: GEO (https://www.ncbi.nlm.nih.gov/geo/), ArrayExpress (https://www.ebi.ac.uk/arrayexpress/)

Re3data- searching repositories

https://www.re3data.org/

FAIRsharing- searching repositories

https://fairsharing.org/

FAIRsharing- searching metadata standards

https://fairsharing.org/

What if there is no specialized database? Or no recommendations from journals?

You should get a Digital Object Identifier (DOI)

http://datadryad.org

** Curated, metadata

https://zenodo.org/

https://figshare.com/

https://datashare.ucsf.edu/stash

And institutional repositories

But please, don’t forget to actually complete your submission*

*And you never have to spend time fielding requests or transferring huge data files again

Data Management Planning

What external sources of data will I be using?

Where will it come from?

Are there restrictions on reuse?

What types of data will I be generating?

What sort of metadata do I need to collect?

How will I structure and store my data?

Are there existing data handling standards and tools?

Where will the data reside when my project is done?

Is there a repository that can handle my data?

What metadata and files do I need to provide and how?

If I plan to host the data myself on a website/database, how will I maintain it, and for how long? What happens then?

Under what terms will others be able to reuse my data?

If I want to, how will I be able to track how my data is reused?

How to Make Your Published Data FAIR

• Use standard formats

• Supply complete and deep metadata

• Embrace Ontologies

• Use persistent and unambiguous identifiers

• Put your data in a long term stable repository

• Cite, share freely and encourage others

Cite, share freely and encourage others to be FAIR

Include searchable and citable identifiers for your data in your papers

Release your data with clearly defined terms of use

e.g. Creative Commons (CC) CC-0, CC-BY

(https://creativecommons.org)

If you do not specify terms of use, restrictions may be implied, limiting reuse

Cite all of your data sources

Enhances reproducibility….. and also shows value to funders!

When reviewing papers check them for FAIRness
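
Identifiers are only citable if they are written in a form that resolves. As a small illustration, the sketch below pulls a DOI out of a free-text citation and turns it into a resolvable `https://doi.org/` link. The function name `doi_url` and the regular expression are assumptions for this sketch; the pattern covers common modern DOIs, not every DOI ever issued.

```python
import re

# Matches the common "10.<registrant>/<suffix>" shape of a DOI.
# This is a pragmatic approximation, not the full DOI syntax.
_DOI = re.compile(r"10\.\d{4,9}/\S+")


def doi_url(text):
    """Extract a DOI from free text and return its https://doi.org/ link."""
    match = _DOI.search(text)
    if match is None:
        raise ValueError(f"no DOI found in {text!r}")
    # Drop sentence punctuation that often trails a DOI in running text
    return "https://doi.org/" + match.group(0).rstrip(".,;)")


# Citation strings written the way they appear on earlier slides
print(doi_url("Zhang et al, Plant Cell, 2018. doi.org/10.1105/tpc.17.00791"))
print(doi_url("10.1186/s13059-016-1044-7"))
```

A mistyped identifier such as `10/24/pp.17.00021` raises a `ValueError` here, which is a useful reminder to check every identifier in a manuscript before submission.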

Good data practices benefit everyone (and help you get funded)

NSF considers the Data Management Plan (DMP) to be an integral part of all full proposals (http://www.nsf.gov/bfa/dias/policy/dmp.jsp), that will be “considered under Intellectual Merit or Broader Impacts or both, as appropriate for the scientific community of relevance” (PAPPG, pg. II-212).

BIO recognizes that different research communities may have their own data management practices and standards; that these norms will change over time; and, that lifecycles of usefulness will vary for different data types. As such, it is essential for scientific communities to guide needed standards development, and to shape expectations for sharing or archiving.

https://www.nsf.gov/bio/pubs/BIODMP102015.pdf

Thank you!

Please complete the survey

AgBioData