NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Metadata for Managing Scientific Research Data. NISO/DCMI Webinar: August 22, 2012. Jane Greenberg, Professor and Director of the SILS Metadata Research Center.

Page 1: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Metadata for Managing Scientific Research Data

NISO/DCMI Webinar: August 22, 2012

Jane Greenberg, Professor and Director of the SILS Metadata Research Center

Page 2: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Overview
▪ Why should we care?
▪ What is data?
▪ What is metadata’s role w.r.t. data?
▪ Selected metadata standards
▪ Challenges, opportunities, and jumping in
▪ Concluding comments
▪ Q&A

Page 3: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Why should we care?

BIG stuff
▪ Digital data deluge (Hey & Trefethen, 2003)
▪ Big data (New York Times)
▪ The fourth paradigm (Jim Gray, 2007)

Just as important
▪ The long tail (Heidorn, 2008)
▪ CODATA/Data-at-Risk Task Group
▪ Scholarly communications, data citation

Technological affordances for improving and advancing science

Page 4: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Cultural shift toward data sharing

▪ National and international policies
– US NSF and NIH [1, 2]
– OECD (Organisation for Economic Co-operation and Development) [3]
– INSPIRE (Infrastructure for Spatial Information in the European Community), EU Commission [4]
– UK Medical Research Council [5]

Dryad “enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies.” (http://datadryad.org/)

Page 5: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Overview
▪ Why should we care?
▪ What is data?
▪ What is metadata’s role w.r.t. data?
▪ Selected metadata standards
▪ Challenges, opportunities, and jumping in
▪ Concluding comments
▪ Q&A

Page 6: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Data
▪ No single agreed upon definition
▪ One person’s data is another person’s information
▪ Data often implies the “raw” stuff lacking context
– Scholarly context, written assessment
▪ “Essence of science” (Greenberg, et al, 2009)
▪ What is science?
– The Archaeology Data Service (ADS), archaeologydataservice.ac.uk

Page 7: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Data: I know it when I see it

By example: Traditional observations, numbers, and measures stored in spreadsheets and databases, fossils, phylogenetic trees, and herbarium samples (White, 2008)

Other disciplines
▪ Bioinformatics: Gene expressions, DNA transcription to RNA translation
▪ Geology, agriculture, surveillance, and historical manuscript research: Hyperspectral remote sensing

File types in the Dryad Repository (email w/R. Scherle, July 2012)

Quantity  Type
3162      Plain Text
476       Microsoft Excel
308       Adobe Portable Document Format
302       Comma-separated values
252       Nexus
153       Microsoft Excel OpenXML
108       Microsoft Word
80        Zip file
62        JPEG image
45        Microsoft Word OpenXML
40        Extensible Markup Language
35        Hypertext Markup Language
21        Rich Text Format
16        FASTA sequence file
15        Tag Image File Format
14        Postscript Files
2         Video Quicktime
2         Mathematica Notebook
1         Microsoft Powerpoint

Page 8: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Overview
▪ Why should we care?
▪ What is data?
▪ What is metadata’s role w.r.t. data?
▪ Selected metadata standards
▪ Challenges, opportunities, and jumping in
▪ Concluding comments
▪ Q&A

Page 9: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Metadata defined
… data about data
… information about data

▪ “Metadata or ‘data about data’ describes the content, quality, condition, and other characteristics of data.” (FGDC Metadata WG, 1998)

▪ Structured information about an object (data) that facilitates functions associated with the object. (Greenberg, 2002, 2003, 2009)

Page 10: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Typical functions
▪ Discover
▪ Manage
▪ Control rights
▪ Identify versions
▪ Certify authenticity
▪ Indicate status
▪ Mark content structure
▪ Situate geospatially
▪ Describe processes

Page 11: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Overview
▪ Why should we care?
▪ What is data?
▪ What is metadata’s role w.r.t. data?
▪ Selected metadata standards
▪ Challenges, opportunities, and jumping in
▪ Concluding comments
▪ Q&A

Page 12: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Metadata for Scientific Research Data

It gets messy really quickly

Page 13: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Metadata for Scientific Research Data

Descriptive
– General to granular

▪ Value (addressing a topic, “aboutness”)
– Topical (ontologies, subject heading lists/thesauri, taxonomies)

▪ Named entities
– Name authority files (people, organizations, geographical jurisdictions, structures, and events)

▪ Geo-spatial (coordinates)

▪ Temporal data (ISO 8601/W3CDTF, or …)
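As a small illustration of the temporal bullet, the sketch below shows how a collection timestamp might be serialized in ISO 8601/W3CDTF form using Python's standard library. The sample date and the variable names are invented for illustration and do not come from the webinar.

```python
from datetime import date, datetime, timezone

# Hypothetical observation time, chosen only for illustration.
observed = datetime(2012, 8, 22, 13, 0, 0, tzinfo=timezone.utc)

# Full date-time with an explicit UTC offset.
timestamp = observed.isoformat()

# Date-only granularity, useful when the time of day is unknown.
day = observed.date().isoformat()

# Reading a date-only string back into a date object.
parsed = date.fromisoformat(day)

print(timestamp)
print(day)
```

Recording the offset explicitly avoids ambiguity when data from several sites are combined; the date-only form is the safer choice when the time of day was never captured.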

Page 14: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Given the messiness…

“I cannot tell you exactly what metadata standards, vocabularies, etc. to use…”

Page 15: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Examining metadata schemes

Metadata Objectives and principles, Domain, and Architectural Layout (MODAL) framework (Greenberg, 2005; Willis, et al, JASIST 2012)

Objectives and principles
• Objectives
• Principles

Domains
• Discipline
• Genre
• Format

Architectural layout
• Structural design
• Extent
• Granularity

Page 16: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Simple schemes [6]

Objectives and principles
• Interoperability
• Easy to generate, lower barrier to produce

Domains
• Multi-disciplinary
• Any genre or format

Architectural layout
• Primarily flat
• Minimal with means to extend
• General (not granular)

Examples
• Dublin Core Metadata Element Set (DCMES) ver. 1.1
• US MARC bibliographic format: need training; primarily flat; extensible
• DataCite: primarily flat
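The "primarily flat" layout of a simple scheme can be sketched in a few lines of Python. The example below builds a minimal record with elements from the Dublin Core Metadata Element Set namespace; the title and DOI are taken from footnote [7], while the `record` wrapper element and the choice of `Dataset` as the type value are assumptions made for illustration, not a copy of the actual Dryad record.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# "record" is a hypothetical wrapper element, not part of DCMES.
record = ET.Element("record")

fields = [
    ("title", "Data from: Divergence time estimation using fossils as "
              "terminal taxa and the origins of Lissamphibia"),
    ("identifier", "doi:10.5061/dryad.8120"),
    ("type", "Dataset"),  # assumed value, for illustration only
]

# Flat layout: every property is a direct child, with no nesting.
for name, value in fields:
    element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
    element.text = value

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because every property sits at the same level, a record like this is easy to generate and to exchange, at the cost of the granularity that the more complex schemes provide.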

Page 17: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Dublin Core Application Profile-Dryad [7]

Page 18: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

DataCite example, ver.2.2 [8] National Institute for Environmental Studies and Center for Climate System Research Japan

Page 19: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

US MARC bibliographic format: World Ocean Circulation Experiment global data (Moss Landing Marine Labs and the Monterey Bay Aquarium Research Institute Library) [9]

Page 20: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Simple/moderate schemes

Objectives and principles
• Interoperability balanced w/specific needs
• Generation requires more expertise

Domains
• Greater domain focus
• Genre diversity within a domain

Architectural layout
• Primarily flat
• Extensibility via connecting to other schemes
• Slightly more granular

Examples
• Darwin Core
• Access to Biological Collections Data (ABCD): not as flat
• Ecological Metadata Language
• DCMI Terms: graph approach

Page 21: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Wieczorek, et al. (2012). Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715.

Page 22: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

<?xml version='1.0' encoding='UTF-8'?>
<DataSets xmlns='http://www.tdwg.org/schemas/abcd/2.06'>
  <DataSet>
    <TechnicalContacts>
      <TechnicalContact>
        <Name>Gerd Müller</Name>
        <Email>[email protected]</Email>
      </TechnicalContact>
    </TechnicalContacts>
    <ContentContacts>
      <ContentContact>
        <Name>A Another</Name>
        <Email>[email protected]</Email>
      </ContentContact>
    </ContentContacts>
    <Metadata>
      <Description>
        <Representation language='en'>
          <Title>PonTaurus collection</Title>
        </Representation>
      </Description>
      <RevisionData>
        <DateModified>2001-03-01T00:00:00</DateModified>
      </RevisionData>
    </Metadata>
    <Units>
      <Unit>
        <SourceInstitutionID>BGBM</SourceInstitutionID>
        <SourceID>PonTaurus</SourceID>
        <UnitID>1136</UnitID>
      </Unit>
    </Units>
  </DataSet>
</DataSets>

Access to Biological Collections Data (ABCD) (A minimum record)
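To show what "not as flat" means in practice, here is a minimal sketch of reading the minimum ABCD record with Python's standard library. The XML is a trimmed copy of the slide's record (contacts and the XML declaration are left out to keep it short); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the minimum ABCD record shown on the slide.
ABCD_XML = """<DataSets xmlns='http://www.tdwg.org/schemas/abcd/2.06'>
  <DataSet>
    <Metadata>
      <Description>
        <Representation language='en'>
          <Title>PonTaurus collection</Title>
        </Representation>
      </Description>
      <RevisionData>
        <DateModified>2001-03-01T00:00:00</DateModified>
      </RevisionData>
    </Metadata>
    <Units>
      <Unit>
        <SourceInstitutionID>BGBM</SourceInstitutionID>
        <SourceID>PonTaurus</SourceID>
        <UnitID>1136</UnitID>
      </Unit>
    </Units>
  </DataSet>
</DataSets>"""

# Every element lives in the ABCD 2.06 namespace, so lookups need a prefix map.
ns = {"abcd": "http://www.tdwg.org/schemas/abcd/2.06"}
root = ET.fromstring(ABCD_XML)

# The title sits four levels below the DataSet element.
title = root.findtext(".//abcd:Title", namespaces=ns)

# Each unit is identified by the institution, source, and unit ID together.
units = [
    (
        unit.findtext("abcd:SourceInstitutionID", namespaces=ns),
        unit.findtext("abcd:SourceID", namespaces=ns),
        unit.findtext("abcd:UnitID", namespaces=ns),
    )
    for unit in root.iterfind(".//abcd:Unit", ns)
]

print(title)
print(units)
```

The nesting that makes the record harder to write by hand is also what lets a consumer pull out the dataset description and the individual units separately.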

Page 23: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Properties in the /terms/ namespace

abstract, accessRights, accrualMethod, accrualPeriodicity, accrualPolicy, alternative, audience, available, bibliographicCitation, conformsTo, contributor, coverage, created, creator, date, dateAccepted, dateCopyrighted, dateSubmitted, description,

educationLevel, extent, format, hasFormat, hasPart, hasVersion, identifier, instructionalMethod, isFormatOf, isPartOf, isReferencedBy, isReplacedBy, isRequiredBy, issued, isVersionOf, language, license, mediator, medium,

modified, provenance, publisher, references, relation, replaces, requires, rights, rightsHolder, source, spatial, subject, tableOfContents, temporal, title, type, valid
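The "graph approach" noted for DCMI Terms can be sketched as turning a simple record into subject-predicate-object statements, with each property name expanded to its URI in the /terms/ namespace. In the Python sketch below, the function name `to_triples`, the subset of terms checked, and the example.org identifiers are all assumptions made for illustration.

```python
# Namespace URI for properties in the DCMI /terms/ namespace.
DCTERMS = "http://purl.org/dc/terms/"

# A subset of the property names listed above, enough for the example.
KNOWN_TERMS = {
    "title", "creator", "created", "identifier",
    "isPartOf", "spatial", "temporal", "license",
}


def to_triples(subject, record):
    """Expand a {property: value} record into (subject, predicate, object) triples.

    Hypothetical helper: rejects property names outside KNOWN_TERMS so that
    typos do not silently become new predicates.
    """
    unknown = sorted(set(record) - KNOWN_TERMS)
    if unknown:
        raise ValueError(f"not in the checked subset of DCMI Terms: {unknown}")
    return [(subject, DCTERMS + name, value) for name, value in record.items()]


# Invented record; example.org identifiers are placeholders.
triples = to_triples(
    "http://example.org/dataset/1",
    {
        "title": "Example dataset",
        "isPartOf": "http://example.org/collection/A",
        "temporal": "2012-08",
    },
)

for triple in triples:
    print(triple)
```

Values that are themselves URIs, such as the `isPartOf` target here, are what allow one description to connect to another.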

Page 24: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Complex schemes

Objectives and principles
• Interoperability level
• Generation requires greater expertise

Domains
• Genre focus
• Format variation

Architectural layout
• Hierarchical
• Extensive
• Granular

Content Standard for Digital Geospatial Metadata (CSDGM)/FGDC
1. Identification Information (M)
2. Data Quality Information
3. Spatial Data Organization Information
4. Spatial Reference Information
5. Entity and Attribute Information
6. Distribution Information
7. Metadata Reference Information (M)

Data Documentation Initiative (DDI)
1. Concept
2. Collecting
3. Processing Archiving
4. Distribution Archiving
5. Discovery
6. Analysis
7. Repurposing
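One practical consequence of the CSDGM section list is that a record can be checked for its mandatory parts before it is shared. The Python sketch below reads the slide's "(M)" as "mandatory"; the function name and the way sections are represented (by number) are assumptions for illustration, not part of the FGDC standard.

```python
# Section number -> (section title, marked "(M)" on the slide).
# False only means the slide does not mark the section with "(M)".
CSDGM_SECTIONS = {
    1: ("Identification Information", True),
    2: ("Data Quality Information", False),
    3: ("Spatial Data Organization Information", False),
    4: ("Spatial Reference Information", False),
    5: ("Entity and Attribute Information", False),
    6: ("Distribution Information", False),
    7: ("Metadata Reference Information", True),
}


def missing_mandatory(present):
    """Return titles of the "(M)" sections absent from a record.

    `present` is the set of section numbers the record contains.
    """
    return [
        title
        for number, (title, mandatory) in sorted(CSDGM_SECTIONS.items())
        if mandatory and number not in present
    ]


# Hypothetical record with identification, quality, and distribution sections.
print(missing_mandatory({1, 2, 6}))
```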

Page 25: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Summary for descriptive schemes

▪ Simple: Interoperable, easy to generate/low barrier, generally multidisciplinary, genre/format agnostic, primarily flat, general (not granular), 15-25 properties

▪ Simple/moderate: Interoperability balanced w/specific needs, generation requires more expertise, greater domain focus, extensible via connecting to other schemes, more granular, more properties

▪ Complex: Interoperability level, generation requires expertise, genre focus/format variation, hierarchical, granular, and extensive (100+ properties)

Page 26: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Value schemes (addressing a topic, “aboutness”)

▪ Topical (ontologies, subject heading lists/thesauri, taxonomies)
– Example: DDI Vocabularies
  • Analysis Unit
  • Character Set
  • Commonality Type Coded
  • Lifecycle Event Type
  • Response Unit
  • Software Package
  • Summary Statistic Type
  • Time Method

▪ Named entities (people, organizations, geographical jurisdictions, structures, and events)
– LC Authorities
– Virtual International Authority File (VIAF)
– Open Researcher and Contributor ID (ORCID)
– Gazetteers
– Getty Thesaurus of Geographic Names

▪ Geo-spatial coordinates
– ISO 19111

▪ Temporal data
– Dates: ISO 8601/W3CDTF
– Periods

▪ Code lists
– MIME type
– Language
– Geo.
– Etc.

Page 27: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Overview
▪ Why should we care?
▪ What is data?
▪ What is metadata’s role w.r.t. data?
▪ Selected metadata standards
▪ Challenges, opportunities, and jumping in
▪ Concluding comments
▪ Q&A

Page 28: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Challenges and opportunities

▪ Stop here

Challenge: Workflow/when to generate the metadata?
Opportunity: Educate scientists early (Qin, 2009). Integrate into social setting w/Center for Embedded Networked Sensing (CENS) (Borgman, Mayernik, etc., 2009-current; Mayernik’s dissertation, 2011)

Challenge: Methods for generating metadata (labor intensive)
Opportunity: Use automatic techniques as much as possible, leverage human expertise (Dryad, DataONE Excel project)

Challenge: Too many standards. Which one do I use?
Opportunity: Don’t panic, join communities, look for examples. (If you can’t find them?)

Challenge: Do I need to implement my metadata as linked data?
Opportunity: No. Explore and develop a best practice. Pursue a two-pronged approach (Greenberg, et al, 2009)
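The advice to automate where possible and save human expertise for the rest can be illustrated with a minimal Python sketch. It harvests file-level properties that need no human input and leaves descriptive fields empty for a person to fill in. The function name, the field names, and the sample file are invented for illustration; this is not how Dryad or DataONE implement it.

```python
import hashlib
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Collect file-level metadata automatically; leave the rest to a human."""
    stat = os.stat(path)
    with open(path, "rb") as handle:
        checksum = hashlib.sha256(handle.read()).hexdigest()
    return {
        # Generated automatically from the file itself.
        "identifier": os.path.basename(path),
        "format": mimetypes.guess_type(path)[0],
        "extent": stat.st_size,  # size in bytes
        "modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "checksum_sha256": checksum,
        # Requires human expertise, so left empty here.
        "title": None,
        "description": None,
    }


# Demonstration on a throwaway file with invented contents.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "readings.txt")
    with open(sample, "wb") as handle:
        handle.write(b"site,temp\nA,21.5\n")
    metadata = describe_file(sample)

print(metadata)
```

Splitting the record this way keeps the labor-intensive part small: a scientist only reviews the generated values and supplies the title and description.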

Page 29: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Jumping in…

1. DCMI/NISO Seminars !!

2. DCMI Science and Metadata Community (http://wiki.dublincore.org/index.php/DCMI_Science_And_Metadata)

3. Digital Curation Centre (DCC) (http://www.dcc.ac.uk/)

4. The Research Data Management Training, or MANTRA project (http://datalib.edina.ac.uk/mantra/)

5. DataONE workshops and tutorials (www.dataone.org/)

Page 30: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Overview
▪ Why should we care?
▪ What is data?
▪ What is metadata’s role w.r.t. data?
▪ Selected metadata standards
▪ Challenges, opportunities, and jumping in
▪ Concluding comments
▪ Q&A

Page 31: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Concluding comments
▪ Standards are guidelines; no police
– Aim for reasonable quality
▪ KISS: Keep it simple, stupid
– What’s vital; what will aid reuse?
▪ Help to move the practice forward
– Share what you learn
▪ Nothing new/it’s all new
– Data documentation since ancient times
– SILOS; let’s break them down (Willis, et al, 2012)
– Greater connectivity than ever
– Cross-disciplinary approaches for problem solving

Page 32: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Overview
▪ Why should we care?
▪ What is data?
▪ What is metadata’s role w.r.t. data?
▪ Selected metadata standards
▪ Challenges, opportunities, and jumping in
▪ Concluding comments
▪ Q&A

Page 33: NISO/DCMI Webinar: Metadata for Managing Scientific Research Data

Footnotes

[1] NSF Data Sharing Policy: http://www.nsf.gov/bfa/dias/policy/dmp.jsp.

[2] NIH Data Sharing Policy: http://grants.nih.gov/grants/policy/data_sharing/.

[3] ORGANISATION FOR ECONOMIC CO-OPERATION AND DEVELOPMENT/Data and Metadata Reporting and Presentation Handbook: http://www.oecd.org/std/37671574.pdf.

[4] INSPIRE (Infrastructure for Spatial Information in the European Community): http://inspire.ec.europa.eu/index.cfm/pageid/48. The directive was released 15 May 2007 and will be implemented in various stages, with full implementation required by 2019; it aims to create a European Union (EU) spatial data infrastructure.

[5] UK Medical Research Council: http://www.mrc.ac.uk/Ourresearch/Ethicsresearchguidance/datasharing/index.html.

[6] The DCMI Glossary (scroll down for “schema” entry): http://dublincore.org/documents/usageguide/glossary.shtml#schema.

[7] Dublin Core Example: Data from: Divergence time estimation using fossils as terminal taxa and the origins of Lissamphibia (Dryad repository): http://datadryad.org/resource/doi:10.5061/dryad.8120?show=full.

[8] National Institute for Environmental Studies and Center for Climate System Research Japan—animation data (DataCite): http://schema.datacite.org/meta/kernel-2.2/example/datacite-metadata-sample-v2.2.xml.

[9] US MARC bibliographic format: World Ocean Circulation Experiment global data (Moss Landing Marine Labs and the Monterey Bay Aquarium Research Institute Library): http://mlml.kohalibrary.com/cgi-bin/koha/opac-detail.pl?biblionumber=9282.