DDI-RDF Discovery Vocabulary
A Metadata Vocabulary for Documenting Research and Survey Data

Linked Data on the Web (LDOW 2013), 14.05.2013

Thomas Bosch, GESIS, Germany, [email protected]
Richard Cyganiak, DERI, Ireland, [email protected]
Arofan Gregory, Open Data Foundation, USA, agregory@opendatafoundation.org
Joachim Wackerow, GESIS, Germany, [email protected]




Outline

• What is DDI?
• Motivation
• Relationships to Vocabularies
• DDI-RDF Discovery Vocabulary
• Conceptual Model


What is DDI?

• DDI (Data Documentation Initiative)
  • DDI is an established international standard for the documentation and management of data from the social, behavioral, and economic sciences
• DDI is a data model for describing statistical data
  • data collected for research and official statistics
• The DDI Alliance
  • International consortium of > 35 member institutions
  • produces and maintains the DDI


What is DDI?

• DDI supports the entire research data lifecycle
• Secondary analysis results can be reproduced


What is DDI?

• DDI focuses on the documentation of microdata
• DDI also supports aggregated data
• DDI-C (Codebook)
  • general information about a study
  • data dictionary
• DDI-L (Lifecycle)
  • description of more complex multi-wave studies
  • throughout the data lifecycle


What is DDI?

• Structured, high-quality metadata enable secondary analysis without the need to contact the primary researcher
• DDI enables the reuse of metadata of existing studies for designing new studies
• DDI is currently specified using XML Schemas
  • XML Schemas are organized in multiple modules corresponding to the individual stages of the research data lifecycle
  • XML Schemas comprise over 800 XML elements


Motivation for the DDI Community

• publish microdata (data sets representing microdata)
• increase visibility of microdata
• increase use of microdata
• discover microdata
• enable inferencing on microdata
• harmonize microdata (make microdata comparable)
• RDF tools can process DDI-RDF


Motivation for the LD Community

• an ontology describing the statistical domain is now available
• publish microdata
• publish metadata on microdata
• metadata about already published but under-documented microdata can be published
• RDF tools can process DDI-RDF
• to link microdata to other microdata
  • making the data and the results of research (e.g. publications) more closely connected


Relationships to Vocabularies

• DCMI Metadata Terms
  • are used for citation purposes
• Simple Knowledge Organization System (SKOS)
  • is used for creating hierarchies of concepts similar to thesauri and classification systems
• SKOS Extension (XKOS)
  • a vocabulary which extends SKOS to allow for a more complete description of formal statistical classifications
  • planned for publication in 2013 by the DDI Alliance
  • reference: https://github.com/linked-statistics/xkos
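The concept hierarchies that SKOS provides can be illustrated with a minimal sketch. It models only skos:broader links as a plain Python dictionary; the ex: concepts are made up for illustration and do not come from any published classification, and each concept is assumed to have at most one broader concept.

```python
# Hypothetical concepts: each key has exactly one skos:broader concept (an assumption).
BROADER = {
    "ex:Unemployed": "ex:LabourForceStatus",
    "ex:LongTermUnemployed": "ex:Unemployed",
    "ex:Employed": "ex:LabourForceStatus",
}


def narrower_index(broader):
    """Invert the skos:broader links to obtain skos:narrower links."""
    index = {}
    for child, parent in broader.items():
        index.setdefault(parent, []).append(child)
    return {parent: sorted(children) for parent, children in index.items()}


def ancestors(concept, broader):
    """Follow skos:broader upwards, nearest ancestor first.

    This corresponds to skos:broaderTransitive; skos:broader itself
    is not declared transitive in SKOS.
    """
    chain = []
    seen = {concept}
    while concept in broader:
        concept = broader[concept]
        if concept in seen:  # guard against accidental cycles
            break
        seen.add(concept)
        chain.append(concept)
    return chain


if __name__ == "__main__":
    print(ancestors("ex:LongTermUnemployed", BROADER))
    print(narrower_index(BROADER)["ex:LabourForceStatus"])
```

A real deployment would more likely hold these links in an RDF store; the dictionary is used here only to keep the sketch self-contained.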


Relationships to Vocabularies

• Data Catalog Vocabulary (DCAT)
  • W3C standard for describing catalogs of data sets
  • disco:LogicalDataSet ⊑ dcat:Dataset
  • disco:DataFile ⊑ dcat:Distribution
• RDF Data Cube Vocabulary
  • W3C standard for representing data cubes, i.e. multidimensional aggregate data
  • disco:aggregation (disco:LogicalDataSet, qb:DataSet)
  • disco:inputVariable (qb:DataSet, disco:Variable)
  • reference: http://www.w3.org/TR/vocab-data-cube/
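Reading ⊑ as rdfs:subClassOf, the two DCAT statements mean that any tool that understands DCAT can treat DDI-RDF data sets and data files as DCAT data sets and distributions. The following is a minimal sketch of that inference, using plain tuples instead of an RDF library; the ex: resources are hypothetical.

```python
RDFS_SUBCLASS = "rdfs:subClassOf"
RDF_TYPE = "rdf:type"

# Subclass axioms from the slide (the symbol ⊑ read as rdfs:subClassOf).
AXIOMS = {
    ("disco:LogicalDataSet", RDFS_SUBCLASS, "dcat:Dataset"),
    ("disco:DataFile", RDFS_SUBCLASS, "dcat:Distribution"),
}

# Hypothetical instance data; the ex: identifiers are made up.
DATA = {
    ("ex:survey2012", RDF_TYPE, "disco:LogicalDataSet"),
    ("ex:survey2012.csv", RDF_TYPE, "disco:DataFile"),
}


def infer_types(data, axioms):
    """Add the rdf:type triples implied by rdfs:subClassOf until nothing changes."""
    superclasses = {}
    for s, p, o in axioms:
        if p == RDFS_SUBCLASS:
            superclasses.setdefault(s, set()).add(o)
    inferred = set(data)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for parent in superclasses.get(o, ()):
                triple = (s, RDF_TYPE, parent)
                if triple not in inferred:
                    inferred.add(triple)
                    changed = True
    return inferred


def instances_of(triples, cls):
    """Return all subjects typed with the given class, sorted."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)


if __name__ == "__main__":
    closed = infer_types(DATA, AXIOMS)
    print(instances_of(closed, "dcat:Dataset"))
    print(instances_of(closed, "dcat:Distribution"))
```

The fixpoint loop is more than two axioms need, but it also covers longer subclass chains if further axioms are added.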


DDI-RDF Discovery Vocabulary

• contains only a small subset of DDI-XML + additional axioms
• The conceptual model is derived from use cases which are typical in the statistical community
• Statistical domain experts have formulated these use cases, which are seen as most significant for solving frequent problems
• makes it possible to publish and discover microdata and metadata about microdata (research and survey data) in the Web of Linked Data


DDI-RDF Discovery Vocabulary

• Availability of (meta)data
  • Microdata may be available (typically as CSV files)
  • In most cases, metadata about microdata is NOT available
• contains major types of metadata of DDI-C and DDI-L
• Mappings from DDI-XML to DDI-RDF
• No straightforward mapping from DDI-RDF to DDI-XML
  • enables better support for the LD community
  • partly no corresponding constructs in DDI-XML
• 26 experts from the statistics and the Linked Data community of 12 different countries have contributed


Conceptual Model

[Figure: UML class diagram "class overview"; names in parentheses are the superclass labels shown in the diagram]

• Classes: Variable, Question, Instrument, Questionnaire, LogicalDataSet (dcat:Dataset), AnalysisUnit (skos:Concept), Universe (skos:Concept), Study, StudyGroup, «union»
• Associations: product, inGroup, variable, universe, containsVariable, question, analysisUnit
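A minimal sketch of how the association names in the class overview might support discovery. The transcript does not preserve which class sits at which end of each association, so the directions used here (a study has data sets as products, a data set contains variables, a variable points to a question) are assumptions, as are the ex: resources and their labels. The disco: prefix and the property names skos:prefLabel and questionText are taken from the slides.

```python
# Hypothetical metadata as (subject, predicate, object) tuples.
# Property directions are assumed, not confirmed by the transcript.
TRIPLES = [
    ("ex:study1", "rdf:type", "disco:Study"),
    ("ex:study1", "disco:product", "ex:dataset1"),
    ("ex:dataset1", "rdf:type", "disco:LogicalDataSet"),
    ("ex:dataset1", "disco:containsVariable", "ex:age"),
    ("ex:dataset1", "disco:containsVariable", "ex:income"),
    ("ex:age", "skos:prefLabel", "Age of respondent"),
    ("ex:income", "skos:prefLabel", "Monthly household income"),
    ("ex:income", "disco:question", "ex:q7"),
    ("ex:q7", "disco:questionText", "What is your monthly household income?"),
]


def objects(triples, subject, predicate):
    """Return all objects of the given subject and predicate, in input order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def find_variables(triples, study, keyword):
    """Return (variable, label) pairs of a study whose label contains the keyword."""
    hits = []
    for dataset in objects(triples, study, "disco:product"):
        for variable in objects(triples, dataset, "disco:containsVariable"):
            for label in objects(triples, variable, "skos:prefLabel"):
                if keyword.lower() in label.lower():
                    hits.append((variable, label))
    return hits


if __name__ == "__main__":
    for variable, label in find_variables(TRIPLES, "ex:study1", "income"):
        print(variable, label, objects(TRIPLES, variable, "disco:question"))
```

In practice the same traversal would be expressed as a SPARQL query over published DDI-RDF; the list of tuples only stands in for a triple store.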


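[Figure: UML class diagram "class study-universe"; names in parentheses are the superclass labels shown in the diagram]

• Study: owl:versionInfo
• StudyGroup
• Universe (skos:Concept): skos:definition : rdf:langString
• AnalysisUnit (skos:Concept)
• «union»: dcterms:abstract : rdf:langString, dcterms:alternative : rdf:langString, dcterms:available : xsd:dateTime, dcterms:title : rdf:langString, purpose : rdf:langString, subtitle : rdf:langString
• Associations (with the multiplicities printed at their two ends): analysisUnit (0..*, 0..1), universe (0..*, 1..*), inGroup (0..*, 0..1)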


[Figure: UML class diagram "class variable"]

• Variable: dcterms:description : rdf:langString, skos:notation : rdfs:Literal, skos:prefLabel : rdf:langString
• VariableDefinition: dcterms:description : rdf:langString, skos:prefLabel : rdf:langString
• skos:Concept: skos:definition : rdf:langString, skos:notation : rdfs:Literal, skos:prefLabel : rdf:langString
• Representation
• Associations (with the multiplicities printed at their two ends): skos:narrower (0..*, 0..*), skos:broader (0..*, 0..*), representation (0..*, 0..*), concept (0..*, 1), representation (0..*, 1), basedOn (0..*, 0..1), concept (0..*, 1)


[Figure: UML class diagram "class representation"]

• skos:Concept: skos:definition : rdf:langString, skos:notation : rdfs:Literal, skos:prefLabel : rdf:langString
• «union»
• rdfs:Datatype
• skos:ConceptScheme
• Representation
• Associations (with the multiplicities printed at their two ends): skos:hasTopConcept (0..*, 0..*), skos:inScheme (0..*, 0..*), skos:narrower (0..*, 0..*), skos:broader (0..*, 0..*)


[Figure: UML class diagram "class overview-data-set"; names in parentheses are the superclass labels shown in the diagram]

• LogicalDataSet (dcat:Dataset): dcterms:title : rdf:langString, isPublic : xsd:boolean
• DataFile (dcat:Distribution, dcterms:Dataset): caseQuantity : xsd:nonNegativeInteger, dcterms:description : rdf:langString, owl:versionInfo : string
• DescriptiveStatistics
• CategoryStatistics: cumulativePercentage : xsd:decimal, frequency : xsd:nonNegativeInteger, percentage : xsd:decimal, weightedCumulativePercentage : xsd:decimal, weightedFrequency : xsd:nonNegativeInteger, weightedPercentage : xsd:decimal
• SummaryStatistics: invalidCases : xsd:nonNegativeInteger, maximum : xsd:decimal, mean : xsd:decimal, median : xsd:decimal, minimum : xsd:decimal, mode : xsd:decimal, standardDeviation : xsd:decimal, validCases : xsd:nonNegativeInteger, weightedInvalidCases : xsd:nonNegativeInteger, weightedMean : xsd:decimal, weightedMedian : xsd:decimal, weightedMode : xsd:decimal, weightedValidCases : xsd:nonNegativeInteger
• Associations (with the multiplicities printed at their two ends): statisticsDataFile (0..*, 0..*), dataFile (0..*, 0..*)
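To make the unweighted SummaryStatistics and CategoryStatistics attributes concrete, the sketch below computes them for one variable from a list of raw values. The dictionary keys reuse the attribute names from the diagram. Several choices are assumptions rather than part of the vocabulary: missing values are encoded as None and counted as invalid cases, percentages are taken over valid cases only, and standardDeviation is computed as the sample standard deviation.

```python
import statistics
from collections import Counter


def summary_statistics(values):
    """Unweighted summary statistics; needs at least two valid (non-None) values."""
    valid = [v for v in values if v is not None]
    return {
        "validCases": len(valid),
        "invalidCases": len(values) - len(valid),
        "minimum": min(valid),
        "maximum": max(valid),
        "mean": statistics.mean(valid),
        "median": statistics.median(valid),
        "mode": statistics.mode(valid),
        # Assumption: sample standard deviation (divides by n - 1).
        "standardDeviation": statistics.stdev(valid),
    }


def category_statistics(values):
    """Frequency, percentage and cumulative percentage per code, over valid cases."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for code in sorted(counts):
        percentage = 100.0 * counts[code] / len(valid)
        cumulative += percentage
        rows.append({
            "code": code,
            "frequency": counts[code],
            "percentage": percentage,
            "cumulativePercentage": cumulative,
        })
    return rows


if __name__ == "__main__":
    responses = [1, 2, 2, 3, None]  # hypothetical coded answers, None = missing
    print(summary_statistics(responses))
    for row in category_statistics(responses):
        print(row)
```

The weighted variants (weightedMean, weightedFrequency and so on) would additionally need a case weight per value and are left out of this sketch.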


[Figure: UML class diagram "class Data Collection"]

• Question: questionText : rdf:langString, skos:prefLabel : rdf:langString
• Questionnaire
• Instrument: dcterms:description : rdf:langString, skos:prefLabel : rdf:langString
• foaf:Document
• Representation
• Associations (with the multiplicities printed at their two ends): externalDocumentation (0..*, 0..*), question (0..*, 1..*), responseDomain (0..*, 1..*)


Thank you for your attention…

• Unofficial draft [planned as specification by DDI Alliance by 2013]
  http://rdf-vocabulary.ddialliance.org/discovery
• Specification (current state) on GitHub repository
  https://github.com/linked-statistics/disco-spec
• Scenarios for the DDI-RDF Discovery Vocabulary [in preparation]
  http://dx.doi.org/10.3886/DDISemanticWeb02

Thomas Bosch
GESIS - Leibniz Institute for the Social Sciences

[email protected]

https://github.com/boschthomas/PhD


Acknowledgements

26 experts from the statistical community and the Linked Data community, coming from 12 different countries, contributed to this work. They participated in the events listed below.

• 1st workshop on 'Semantic Statistics for Social, Behavioural, and Economic Sciences: Leveraging the DDI Model for the Linked Data Web' at Schloss Dagstuhl - Leibniz Center for Informatics, Germany in September 2011
• Working meeting in the course of the 3rd Annual European DDI Users Group Meeting (EDDI11) in Gothenburg, Sweden in December 2011
• 2nd workshop on 'Semantic Statistics for Social, Behavioural, and Economic Sciences: Leveraging the DDI Model for the Linked Data Web' at Schloss Dagstuhl - Leibniz Center for Informatics, Germany in October 2012
• Working meeting at GESIS - Leibniz Institute for the Social Sciences in Mannheim, Germany in February 2013