2014.12 - let's disco (eddi 2014)


Upload: thomas-bosch

Post on 09-Jul-2015


DESCRIPTION

Let's Disco

TRANSCRIPT

Page 1: 2014.12 - Let's Disco (EDDI 2014)
Page 2: 2014.12 - Let's Disco (EDDI 2014)

Slideshare: http://de.slideshare.net/boschthomas

Page 3: 2014.12 - Let's Disco (EDDI 2014)

Questions? Please don't hesitate!

Page 4: 2014.12 - Let's Disco (EDDI 2014)
Page 5: 2014.12 - Let's Disco (EDDI 2014)

As big as DDI-L?

Page 6: 2014.12 - Let's Disco (EDDI 2014)
Page 8: 2014.12 - Let's Disco (EDDI 2014)

Triple Store: http://multiweb.gesis.org/openrdf-workbench/repositories/discotest/summary

Page 9: 2014.12 - Let's Disco (EDDI 2014)

Why Disco?

Page 10: 2014.12 - Let's Disco (EDDI 2014)

Why DDI as Linked Data?

Page 11: 2014.12 - Let's Disco (EDDI 2014)

Use Case

Page 12: 2014.12 - Let's Disco (EDDI 2014)

Where to search for data?

Page 13: 2014.12 - Let's Disco (EDDI 2014)
Page 14: 2014.12 - Let's Disco (EDDI 2014)

Which microdata exist for specific metadata?

Page 15: 2014.12 - Let's Disco (EDDI 2014)
Page 16: 2014.12 - Let's Disco (EDDI 2014)

Which datasets are associated with the microdata?

Page 17: 2014.12 - Let's Disco (EDDI 2014)
Page 18: 2014.12 - Let's Disco (EDDI 2014)

Which aggregated data exist for specific metadata?

Page 19: 2014.12 - Let's Disco (EDDI 2014)
Page 20: 2014.12 - Let's Disco (EDDI 2014)

Which datasets are associated with aggregated data?

Page 21: 2014.12 - Let's Disco (EDDI 2014)
Page 22: 2014.12 - Let's Disco (EDDI 2014)

From which microdata datasets is the aggregated dataset derived?

Page 23: 2014.12 - Let's Disco (EDDI 2014)
Page 24: 2014.12 - Let's Disco (EDDI 2014)

Which summary statistics does a variable have?

Page 25: 2014.12 - Let's Disco (EDDI 2014)
Page 26: 2014.12 - Let's Disco (EDDI 2014)

Which category statistics does a variable representation have?

Page 27: 2014.12 - Let's Disco (EDDI 2014)
Page 28: 2014.12 - Let's Disco (EDDI 2014)

Which microdata datasets are created by the research institute 'GESIS'?

Page 29: 2014.12 - Let's Disco (EDDI 2014)
Page 30: 2014.12 - Let's Disco (EDDI 2014)

Overview

Page 31: 2014.12 - Let's Disco (EDDI 2014)
Page 32: 2014.12 - Let's Disco (EDDI 2014)

Series and Studies

Page 33: 2014.12 - Let's Disco (EDDI 2014)
Page 34: 2014.12 - Let's Disco (EDDI 2014)
Page 35: 2014.12 - Let's Disco (EDDI 2014)

Series

Series title: CIS

Page 36: 2014.12 - Let's Disco (EDDI 2014)

Study

Study title: EU-LFS 1991

Page 37: 2014.12 - Let's Disco (EDDI 2014)

Agents

Page 38: 2014.12 - Let's Disco (EDDI 2014)
Page 39: 2014.12 - Let's Disco (EDDI 2014)

Identification and Versioning

Page 40: 2014.12 - Let's Disco (EDDI 2014)
Page 41: 2014.12 - Let's Disco (EDDI 2014)

ddi:study
    a disco:Study ;
    dcterms:title "National Population and Housing Census, 1980"@en ;
    adms:identifier [
        a adms:Identifier ;
        skos:notation "us:ddi:us.mpc:ARG_1980_PHC_v01_A_IPUMS:1" ;
        adms:schemaAgency "DDI Alliance"@en
    ] .

Page 42: 2014.12 - Let's Disco (EDDI 2014)

ddi:study
    a disco:Study ;
    dcterms:creator [
        rdfs:label "Minnesota Population Center"@en ;
        skos:notation "MPC" ;
        adms:identifier [
            a adms:Identifier ;
            skos:notation "us.mpc" ;
            adms:schemaAgency "DDI Alliance"@en
        ]
    ] .
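The identifiers used throughout this deck (for example urn:ddi:de.gesis:study_EU-SILC-2005:0.1) appear to follow a colon-separated layout of agency, object ID, and version. The following is a minimal Python sketch of splitting such an identifier, assuming exactly that five-part layout. The names `DdiUrn` and `parse_ddi_urn` are hypothetical and not part of Disco or of any DDI tooling.

```python
from typing import NamedTuple


class DdiUrn(NamedTuple):
    """Parts of an identifier of the assumed form urn:ddi:<agency>:<id>:<version>."""
    agency: str
    identifier: str
    version: str


def parse_ddi_urn(urn: str) -> DdiUrn:
    """Split a DDI-style URN into agency, identifier and version.

    Assumption: the URN has exactly five colon-separated parts and
    starts with 'urn:ddi'. Anything else is rejected.
    """
    parts = urn.split(":")
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "ddi":
        raise ValueError(f"not a urn:ddi:<agency>:<id>:<version> identifier: {urn!r}")
    return DdiUrn(agency=parts[2], identifier=parts[3], version=parts[4])


if __name__ == "__main__":
    print(parse_ddi_urn("urn:ddi:de.gesis:study_EU-SILC-2005:0.1"))
```

Keeping the version as a string avoids losing information such as a trailing zero in "0.10".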

Page 43: 2014.12 - Let's Disco (EDDI 2014)

Coverage

Page 44: 2014.12 - Let's Disco (EDDI 2014)
Page 45: 2014.12 - Let's Disco (EDDI 2014)
Page 46: 2014.12 - Let's Disco (EDDI 2014)
Page 47: 2014.12 - Let's Disco (EDDI 2014)

Spatial Coverage

Page 48: 2014.12 - Let's Disco (EDDI 2014)

<urn:ddi:de.gesis:study_EU-SILC-2005:0.1>
    a disco:Study ;
    dcterms:title "EU-SILC 2005"@en ;
    skos:prefLabel "2005"@en ;
    dcterms:spatial
        <http://sws.geonames.org/2782113> ,
        ... ,
        :AllCountriesOfStudy ;

Page 49: 2014.12 - Let's Disco (EDDI 2014)

:AllCountriesOfStudy
    a dcterms:Location , missy:Country ;
    rdfs:label "all countries of study" ;
    missy:code "" .

Page 50: 2014.12 - Let's Disco (EDDI 2014)

Countries

Study: EU-LFS 2004

Page 51: 2014.12 - Let's Disco (EDDI 2014)

Temporal Coverage

Page 52: 2014.12 - Let's Disco (EDDI 2014)

<urn:ddi:de.gesis:study_EU-SILC-2005:0.1>
    a disco:Study ;
    dcterms:title "EU-SILC 2005"@en ;
    skos:prefLabel "2005"@en ;
    dcterms:temporal
        <urn:ddi:de.gesis:0ba9b4f3-ec22-4471-8ffa-a38e8ada187a:0.1> ;

Page 53: 2014.12 - Let's Disco (EDDI 2014)

<urn:ddi:de.gesis:0ba9b4f3-ec22-4471-8ffa-a38e8ada187a:0.1>
    a disco:PeriodOfTime ;
    dcterms:date "Jan 1, 2005 12:00:00 AM"^^xsd:date .

Page 54: 2014.12 - Let's Disco (EDDI 2014)

Year

Study title:

'Structure of Earnings Survey – 2006'

Page 55: 2014.12 - Let's Disco (EDDI 2014)

Topical Coverage

Page 56: 2014.12 - Let's Disco (EDDI 2014)
Page 57: 2014.12 - Let's Disco (EDDI 2014)

missy:PB100
    a disco:Variable ;
    dcterms:subject [
        a skos:Concept ;
        skos:notation "Quarter of the personal interview"@en
    ] .

Page 58: 2014.12 - Let's Disco (EDDI 2014)

Topical Coverage

Variable ID:

<urn:ddi:de.gesis:variable_EU-SILC-2010-panel-p-data-2010_rev2-PB020:0.1>

Variable name:

'PB020'

Page 59: 2014.12 - Let's Disco (EDDI 2014)

Thematic Classification

Page 60: 2014.12 - Let's Disco (EDDI 2014)
Page 61: 2014.12 - Let's Disco (EDDI 2014)
Page 62: 2014.12 - Let's Disco (EDDI 2014)

:thematicClassification
    a skos:ConceptScheme ;
    skos:hasTopConcept
        :concept1 ,
        :concept2 ,
        :concept3 .

Series-Level

Page 63: 2014.12 - Let's Disco (EDDI 2014)

:superConcept
    a skos:Concept ;
    skos:notation "Demographic background"@en ;
    skos:narrower
        :subConcept1 ,
        :subConcept2 .

Narrower Concepts

Page 64: 2014.12 - Let's Disco (EDDI 2014)

Direct Broader Concepts

Concept:

'Country'@en

Page 65: 2014.12 - Let's Disco (EDDI 2014)

All (Direct + Indirect) Broader Concepts

Concept:

'Country'@en

Page 66: 2014.12 - Let's Disco (EDDI 2014)

Direct Narrower Concepts

Concept: "Type of cooperation"@en

Page 67: 2014.12 - Let's Disco (EDDI 2014)

All (Direct + Indirect) Narrower Concepts

Concept: "Type of cooperation"@en

Page 68: 2014.12 - Let's Disco (EDDI 2014)

Top Concepts

Series: EU-SILC

Thematic Classification:

<urn:ddi:de.gesis:7bb54a91-4b26-4f6e-a1b7-be48cb58be24:0.1>

Page 69: 2014.12 - Let's Disco (EDDI 2014)

2-Level Concepts

Series: EU-SILC

Thematic Classification:

<urn:ddi:de.gesis:7bb54a91-4b26-4f6e-a1b7-be48cb58be24:0.1>

Page 70: 2014.12 - Let's Disco (EDDI 2014)

Lowest-Level Concepts (Leaf Concepts)

Series: EU-SILC

Thematic Classification:

<urn:ddi:de.gesis:7bb54a91-4b26-4f6e-a1b7-be48cb58be24:0.1>
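The slides above distinguish direct narrower concepts, all (direct plus indirect) narrower concepts, and leaf concepts of a thematic classification. The following is a small Python sketch of how those three views relate, using a plain dictionary of skos:narrower edges instead of a triple store. `:superConcept`, `:subConcept1` and `:subConcept2` come from the slide; `:leafA` is a hypothetical extra level added only to make the indirect case visible.

```python
from typing import Dict, List, Set

# skos:narrower edges: parent concept -> direct narrower concepts.
# ':leafA' is a hypothetical concept, not taken from the slides.
NARROWER: Dict[str, List[str]] = {
    ":superConcept": [":subConcept1", ":subConcept2"],
    ":subConcept1": [":leafA"],
}


def direct_narrower(concept: str) -> List[str]:
    """Concepts one skos:narrower step below the given concept."""
    return list(NARROWER.get(concept, []))


def all_narrower(concept: str) -> Set[str]:
    """Direct and indirect narrower concepts (transitive closure)."""
    seen: Set[str] = set()
    stack = list(NARROWER.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(NARROWER.get(current, []))
    return seen


def leaf_concepts(root: str) -> Set[str]:
    """Narrower concepts of root that have no narrower concepts themselves."""
    return {c for c in all_narrower(root) if not NARROWER.get(c)}


if __name__ == "__main__":
    print(direct_narrower(":superConcept"))
    print(sorted(all_narrower(":superConcept")))
    print(sorted(leaf_concepts(":superConcept")))
```

The `seen` set also guards against cycles, which a well-formed classification should not contain but real data sometimes does.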

Page 71: 2014.12 - Let's Disco (EDDI 2014)

Data Sets and Data Files

Page 72: 2014.12 - Let's Disco (EDDI 2014)
Page 73: 2014.12 - Let's Disco (EDDI 2014)

Data Sets

Page 74: 2014.12 - Let's Disco (EDDI 2014)
Page 75: 2014.12 - Let's Disco (EDDI 2014)
Page 76: 2014.12 - Let's Disco (EDDI 2014)

All Data Sets (IDs)

Study: EU-SILC 2010

Page 77: 2014.12 - Let's Disco (EDDI 2014)

Data Files

Page 78: 2014.12 - Let's Disco (EDDI 2014)
Page 79: 2014.12 - Let's Disco (EDDI 2014)

:dataFile
    a disco:Datafile ;
    dcterms:identifier "ARG1900-P-H.dat" ;
    dcterms:description "Person records"@en ;
    disco:caseQuantity 2667714 ;
    dcterms:format "ascii" ;
    dcterms:provenance "Minnesota Population Center"@en ;
    owl:versionInfo "Version 1.0, IPUMS sample"@en .

Page 80: 2014.12 - Let's Disco (EDDI 2014)

:dataFile
    a disco:Datafile ;
    dcterms:spatial [
        a dcterms:Location ;
        rdfs:label "Argentina, national coverage"@en
    ] ;
    dcterms:temporal :periodOfTime .

Page 81: 2014.12 - Let's Disco (EDDI 2014)

Controlled Vocabularies

Page 82: 2014.12 - Let's Disco (EDDI 2014)

Variables

Page 83: 2014.12 - Let's Disco (EDDI 2014)
Page 84: 2014.12 - Let's Disco (EDDI 2014)
Page 85: 2014.12 - Let's Disco (EDDI 2014)

ddi:AR80A401
    a disco:Variable ;
    skos:notation "AR80A401" ;
    skos:prefLabel "Sex"@en ;
    disco:basedOn ddi:SexVD ;
    disco:question ddi:QuestionGender .

Page 86: 2014.12 - Let's Disco (EDDI 2014)

ddi:SexVD
    a disco:RepresentedVariable ;
    disco:universe ddi:UniversePerson ;
    disco:representation ddi:SexRepr ;
    disco:concept ddi:IpumsC1 ;
    skos:prefLabel "Sex"@en ;
    dcterms:description "Sex data element"@en .
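The two snippets above link a variable to its represented variable (disco:basedOn), which in turn carries the universe, representation, and concept. The following is a minimal Python sketch of following that chain over an in-memory list of triples; it stands in for a query against a triple store and is not how the deck's queries were run. The helper names `objects` and `concepts_of_variable` are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Triples copied from the two slides above (prefixed names kept as plain strings).
TRIPLES: List[Triple] = [
    ("ddi:AR80A401", "rdf:type", "disco:Variable"),
    ("ddi:AR80A401", "disco:basedOn", "ddi:SexVD"),
    ("ddi:AR80A401", "disco:question", "ddi:QuestionGender"),
    ("ddi:SexVD", "rdf:type", "disco:RepresentedVariable"),
    ("ddi:SexVD", "disco:universe", "ddi:UniversePerson"),
    ("ddi:SexVD", "disco:representation", "ddi:SexRepr"),
    ("ddi:SexVD", "disco:concept", "ddi:IpumsC1"),
]


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def concepts_of_variable(triples: List[Triple], variable: str) -> List[str]:
    """Concepts reached via variable -> disco:basedOn -> disco:concept."""
    found: List[str] = []
    for represented in objects(triples, variable, "disco:basedOn"):
        found.extend(objects(triples, represented, "disco:concept"))
    return found


if __name__ == "__main__":
    print(concepts_of_variable(TRIPLES, "ddi:AR80A401"))
```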

Page 87: 2014.12 - Let's Disco (EDDI 2014)

missy:PB100
    a disco:Variable ;
    skos:notation "PB100" ;
    skos:prefLabel "Quarter of the personal interview"@en ;
    disco:concept :concept ;
    disco:question :question .

Page 88: 2014.12 - Let's Disco (EDDI 2014)

Variables (Names + Labels)

Data Set: EU-SILC 2010 cross-sec p-data

Data Set (ID): <urn:ddi:de.gesis:logicalDataSet_EU-SILC-2010-cross-sec-p-data-2010_rev3:0.1>

Page 89: 2014.12 - Let's Disco (EDDI 2014)

Variable Concept

Data Set: EU-SILC 2010 cross-sec p-data

Data Set (ID): <urn:ddi:de.gesis:logicalDataSet_EU-SILC-2010-cross-sec-p-data-2010_rev3:0.1>

Variable Name: PB010

Page 90: 2014.12 - Let's Disco (EDDI 2014)

Study

Data Set

Variable label

Variable name: 'AGE'

Page 91: 2014.12 - Let's Disco (EDDI 2014)

Topical Coverage

Variable name:

B21

Study:

"Structure of Earnings Survey - 2006"@en

Page 92: 2014.12 - Let's Disco (EDDI 2014)

Variables having no concepts

Study title:

EU-SILC 2006

Page 93: 2014.12 - Let's Disco (EDDI 2014)

Variable Representation

Page 94: 2014.12 - Let's Disco (EDDI 2014)
Page 95: 2014.12 - Let's Disco (EDDI 2014)
Page 96: 2014.12 - Let's Disco (EDDI 2014)
Page 97: 2014.12 - Let's Disco (EDDI 2014)

Valid Codes and Categories

missy:1
    a skos:Concept ;
    skos:notation "1" ;
    skos:prefLabel "January, February, March" ;
    disco:isValid true .

Page 98: 2014.12 - Let's Disco (EDDI 2014)

Invalid Codes and Categories

missy:Missing
    a skos:Concept ;
    skos:notation "M" ;
    skos:prefLabel "Missing" ;
    disco:isValid false .

Page 99: 2014.12 - Let's Disco (EDDI 2014)

Variable - Variable Representation

missy:PB100
    a disco:Variable , missy:Variable ;
    skos:notation 'PB100' ;
    disco:representation :representationPB100 .

Page 100: 2014.12 - Let's Disco (EDDI 2014)

Variable Representation

:representationPB100
    a disco:Representation , skos:OrderedCollection ;
    skos:memberList (
        missy:1
        missy:2
        missy:3
        missy:4
        missy:Missing ) .
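The representation above is an ordered list of codes, and each code carries a disco:isValid flag. The following short Python sketch separates valid from invalid codes while preserving the list order. The validity of missy:3 and missy:4 is not shown on the slides, so they are deliberately left out of the lookup and treated as unknown. `codes_by_validity` is a hypothetical helper name.

```python
from typing import Dict, List

# Order as given by skos:memberList on the slide.
MEMBER_LIST: List[str] = ["missy:1", "missy:2", "missy:3", "missy:4", "missy:Missing"]

# disco:isValid values that appear in the deck; missy:3 and missy:4 are
# not shown there, so no value is assumed for them.
IS_VALID: Dict[str, bool] = {
    "missy:1": True,
    "missy:2": True,
    "missy:Missing": False,
}


def codes_by_validity(members: List[str], is_valid: Dict[str, bool], wanted: bool) -> List[str]:
    """Codes whose known validity equals `wanted`, in member-list order.

    Codes without a recorded validity are skipped rather than guessed.
    """
    return [code for code in members if is_valid.get(code) is wanted]


if __name__ == "__main__":
    print(codes_by_validity(MEMBER_LIST, IS_VALID, True))
    print(codes_by_validity(MEMBER_LIST, IS_VALID, False))
```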

Page 101: 2014.12 - Let's Disco (EDDI 2014)

Variable Representation

Codes and Categories

Variable:

missy:PB100

Page 102: 2014.12 - Let's Disco (EDDI 2014)

Descriptive Statistics

Page 103: 2014.12 - Let's Disco (EDDI 2014)
Page 104: 2014.12 - Let's Disco (EDDI 2014)

Summary Statistics

Page 105: 2014.12 - Let's Disco (EDDI 2014)
Page 106: 2014.12 - Let's Disco (EDDI 2014)
Page 107: 2014.12 - Let's Disco (EDDI 2014)

missy:Minimum
    a disco:SummaryStatistics , missy:SummaryStatistics ;
    disco:statisticsVariable missy:PB100 ;
    missy:country :AllCountriesOfStudy ;
    disco:summaryStatisticType ddicv-sumstats:Minimum ;
    rdf:value "1" .

Page 108: 2014.12 - Let's Disco (EDDI 2014)

Spatial Coverage of Study

:AllCountriesOfStudy
    a dcterms:Location , missy:Country ;
    rdfs:label "all countries of study" ;
    missy:code "" .

Page 109: 2014.12 - Let's Disco (EDDI 2014)
Page 110: 2014.12 - Let's Disco (EDDI 2014)

missy:Minimum
    a disco:SummaryStatistics , missy:SummaryStatistics ;
    disco:statisticsVariable missy:PB100 ;
    missy:country <http://sws.geonames.org/2921044> ;
    disco:summaryStatisticType ddicv-sumstats:Minimum ;
    rdf:value "1" .

Page 111: 2014.12 - Let's Disco (EDDI 2014)

missy:Maximum
    a disco:SummaryStatistics , missy:SummaryStatistics ;
    disco:statisticsVariable missy:PB100 ;
    missy:country :AllCountriesOfStudy ;
    disco:summaryStatisticType ddicv-sumstats:Maximum ;
    rdf:value "4" .

Page 112: 2014.12 - Let's Disco (EDDI 2014)

missy:Mean
    a disco:SummaryStatistics , missy:SummaryStatistics ;
    disco:statisticsVariable missy:PB100 ;
    missy:country :AllCountriesOfStudy ;
    disco:summaryStatisticType ddicv-sumstats:ArithmeticMean ;
    rdf:value "2.17" .

Page 113: 2014.12 - Let's Disco (EDDI 2014)

missy:StandardDeviation
    a disco:SummaryStatistics , missy:SummaryStatistics ;
    disco:statisticsVariable missy:PB100 ;
    missy:country :AllCountriesOfStudy ;
    disco:summaryStatisticType ddicv-sumstats:StandardDeviation ;
    rdf:value "0.9061" .

Page 114: 2014.12 - Let's Disco (EDDI 2014)

missy:ValidCases
    a disco:SummaryStatistics , missy:SummaryStatistics ;
    disco:statisticsVariable missy:PB100 ;
    missy:country :AllCountriesOfStudy ;
    disco:summaryStatisticType ddicv-sumstats:ValidCases ;
    rdf:value "470950" .

Page 115: 2014.12 - Let's Disco (EDDI 2014)

missy:PercentOfValidCases
    a disco:SummaryStatistics , missy:SummaryStatistics ;
    disco:statisticsVariable missy:PB100 ;
    missy:country :AllCountriesOfStudy ;
    disco:summaryStatisticType ddicv-sumstats:PercentOfValidCases ;
    rdf:value "99.1" .

Page 116: 2014.12 - Let's Disco (EDDI 2014)

missy:InvalidCases
    a disco:SummaryStatistics , missy:SummaryStatistics ;
    disco:statisticsVariable missy:PB100 ;
    missy:country :AllCountriesOfStudy ;
    disco:summaryStatisticType ddicv-sumstats:InvalidCases ;
    rdf:value "4195" .

Page 117: 2014.12 - Let's Disco (EDDI 2014)

missy:PercentOfInvalidCases
    a disco:SummaryStatistics , missy:SummaryStatistics ;
    disco:statisticsVariable missy:PB100 ;
    missy:country :AllCountriesOfStudy ;
    disco:summaryStatisticType ddicv-sumstats:PercentOfInvalidCases ;
    rdf:value "0.9" .
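The percentage values on these slides can be cross-checked against the case counts. The following is a small Python sketch, assuming the percentages are taken over the sum of valid and invalid cases and rounded to one decimal place. That assumption is inferred from the numbers, not stated in the deck.

```python
# Case counts for missy:PB100 (all countries of study), copied from the slides.
VALID_CASES = 470950
INVALID_CASES = 4195


def percent(part: int, total: int) -> float:
    """Share of `part` in `total` as a percentage, rounded to one decimal."""
    return round(100.0 * part / total, 1)


total_cases = VALID_CASES + INVALID_CASES
percent_valid = percent(VALID_CASES, total_cases)
percent_invalid = percent(INVALID_CASES, total_cases)

if __name__ == "__main__":
    print(total_cases, percent_valid, percent_invalid)
```

Under that assumption the results agree with the rdf:value literals "99.1" and "0.9" shown above.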

Page 118: 2014.12 - Let's Disco (EDDI 2014)

Variable Pointing to Summary Statistics

missy:PB100
    a disco:Variable , missy:Variable ;
    missy:summaryStatistics (
        missy:Minimum
        missy:Minimum_DE
        missy:Maximum
        missy:Maximum_DE
        … ) .

Page 119: 2014.12 - Let's Disco (EDDI 2014)

Summary Statistics: Minimum

Variable:

missy:PB100

Spatial Coverage:

all countries of study

Page 120: 2014.12 - Let's Disco (EDDI 2014)

Summary Statistics: Valid Cases

Variable:

missy:PB100

Spatial Coverage: DE

Page 121: 2014.12 - Let's Disco (EDDI 2014)

Category Statistics

Page 122: 2014.12 - Let's Disco (EDDI 2014)
Page 123: 2014.12 - Let's Disco (EDDI 2014)

Valid Codes and Categories

missy:2
    a skos:Concept , missy:Concept ;
    skos:notation "2" ;
    skos:prefLabel "April, May, June" ;
    disco:isValid true ;
    missy:categoryStatistics (
        missy:CS_2_AllCountries
        missy:CS_2_DE ) .

Page 124: 2014.12 - Let's Disco (EDDI 2014)

Invalid Codes and Categories

missy:Missing
    a skos:Concept , missy:Concept ;
    skos:notation "M" ;
    skos:prefLabel "Missing" ;
    disco:isValid false ;
    missy:categoryStatistics (
        missy:CS_M_AllCountries ) .

Page 125: 2014.12 - Let's Disco (EDDI 2014)

Valid Cases

missy:CS_2_AllCountries
    a disco:CategoryStatistics , missy:CategoryStatistics ;
    disco:statisticsCategory missy:2 ;
    missy:country :AllCountriesOfStudy ;
    disco:frequency 243708 ;
    disco:percentage 51.3 ;
    disco:cumulativePercentage 51.7 ;
    disco:computationBase "valid" .

Page 126: 2014.12 - Let's Disco (EDDI 2014)

Invalid Cases

missy:CS_M_AllCountries
    a disco:CategoryStatistics , missy:CategoryStatistics ;
    disco:statisticsCategory missy:Missing ;
    missy:country :AllCountriesOfStudy ;
    disco:frequency 4195 ;
    disco:percentage 0.9 ;
    disco:computationBase "invalid" .
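The category figures above can be related to the case counts from the summary statistics slides (470950 valid cases for missy:PB100 across all countries). The Python sketch below computes each frequency as a share of two different bases. It is only an observation about the published numbers, not a statement of how the authors computed disco:percentage or disco:cumulativePercentage.

```python
# Frequencies from the category statistics slides (all countries of study).
FREQ_CODE_2 = 243708      # missy:2, "April, May, June"
FREQ_MISSING = 4195       # missy:Missing
# Valid case count from the summary statistics slides for the same variable.
VALID_CASES = 470950


def share(frequency: int, base: int) -> float:
    """Frequency as a percentage of `base`, rounded to one decimal."""
    return round(100.0 * frequency / base, 1)


all_cases = VALID_CASES + FREQ_MISSING

code_2_of_all = share(FREQ_CODE_2, all_cases)      # base: valid + invalid cases
code_2_of_valid = share(FREQ_CODE_2, VALID_CASES)  # base: valid cases only
missing_of_all = share(FREQ_MISSING, all_cases)

if __name__ == "__main__":
    print(code_2_of_all, code_2_of_valid, missing_of_all)
```

With these inputs the share of all cases matches the 51.3 and 0.9 given as disco:percentage, and the share of valid cases matches the 51.7 given for missy:2.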

Page 127: 2014.12 - Let's Disco (EDDI 2014)

Category Statistics: Frequency (Invalid Cases)

Variable: missy:PB100

Spatial Coverage: All Countries of Study

Code: missy:Missing

Page 128: 2014.12 - Let's Disco (EDDI 2014)

Category Statistics: Cumulative Percentage (Valid Cases)

Variable: missy:PB100

Spatial Coverage: DE <http://sws.geonames.org/2921044>

Code: missy:2

Category Label: 'April, May, June'

Page 129: 2014.12 - Let's Disco (EDDI 2014)
Page 130: 2014.12 - Let's Disco (EDDI 2014)

Data Collection

Page 131: 2014.12 - Let's Disco (EDDI 2014)
Page 132: 2014.12 - Let's Disco (EDDI 2014)

:variableYearOfBirth
    a disco:Variable ;
    skos:notation "RB080" ;
    skos:prefLabel "Year of birth"@en ;
    dcterms:subject :concept ;
    disco:question :questionYearOfBirth .

Page 133: 2014.12 - Let's Disco (EDDI 2014)

:questionYearOfBirth
    a disco:Question ;
    disco:questionText "What is your date of birth?"@en .

Page 134: 2014.12 - Let's Disco (EDDI 2014)

Variable Question

Data Set: EU-SILC 2010 cross-sec p-data

Data Set (ID): <urn:ddi:de.gesis:logicalDataSet_EU-SILC-2010-cross-sec-p-data-2010_rev3:0.1>

Variable Name: PB010

Page 135: 2014.12 - Let's Disco (EDDI 2014)

Variables

Series:

EU-SILC 2005

Question text:

'What is your date of birth?'@en

Page 136: 2014.12 - Let's Disco (EDDI 2014)

Relationships to other Vocabularies

Page 137: 2014.12 - Let's Disco (EDDI 2014)

PHDD

Page 138: 2014.12 - Let's Disco (EDDI 2014)

Mapping DDI-XML to Disco

Page 139: 2014.12 - Let's Disco (EDDI 2014)

DDI 4

Page 140: 2014.12 - Let's Disco (EDDI 2014)

DDI 4

• Model-driven further development of DDI

• The model generates multiple representations (OWL, XSD, Java, RDB, …)

• Functional views are published step by step

Page 141: 2014.12 - Let's Disco (EDDI 2014)

Disco + DDI 4

Page 142: 2014.12 - Let's Disco (EDDI 2014)

Do Not Wait for DDI 4!

• Own functional view for Disco

• Mapping: Disco to DDI 4 (OWL representation)

• Easy migration

Page 143: 2014.12 - Let's Disco (EDDI 2014)

Let's Disco Now!

Page 144: 2014.12 - Let's Disco (EDDI 2014)
Page 145: 2014.12 - Let's Disco (EDDI 2014)

Acknowledgements

26 experts from the statistical community and the Linked Data community, coming from 12 different countries, contributed to this work. They participated in the events listed below.

• 1st workshop on 'Semantic Statistics for Social, Behavioural, and Economic Sciences: Leveraging the DDI Model for the Linked Data Web' at Schloss Dagstuhl - Leibniz Center for Informatics, Germany, in September 2011

• Working meeting in the course of the 3rd Annual European DDI Users Group Meeting (EDDI11) in Gothenburg, Sweden, in December 2011

• 2nd workshop on 'Semantic Statistics for Social, Behavioural, and Economic Sciences: Leveraging the DDI Model for the Linked Data Web' at Schloss Dagstuhl - Leibniz Center for Informatics, Germany, in October 2012

• Working meeting at GESIS - Leibniz Institute for the Social Sciences in Mannheim, Germany, in February 2013