Upload: xiomara-boatman

Post on 14-Dec-2015


Why, what was the idea?

1. Create a data infrastructure
2. Data + the knowledge products that are produced on the basis of data

a) Efficient access to large volumes of data
b) Promote comparative analysis
c) Support dissemination of knowledge
d) Support the idea that knowledge has to be empirically based
e) Create an infrastructure that may grow by its own force

How

• A distributed model: data stored and maintained locally, with modern technology substituting for central institutions
• One common entry point, a portal
• One common metadata standard, which we were supposed to contribute to
• One technical solution
• One common multilingual thesaurus

More hows

• A requirement was that the user communities participated, allowed themselves to be activated and invested some resources
a) Develop a classification of resources
b) Use the common metadata standard

Give better semantics / ontology
Help solve some language issues
Produce more heterogeneous data
Produce better quality of data
Give better administration of data

Resource promotion and integration

• Tools for publishing and finding data
• Guidelines for publishing and finding data
• Access control
• And there should be room for others; we could go beyond CESSDA

The Portal

• Metadata is all about communication
• A set of tools + an idea: data is the core that facilitates a “conversation”
• Technology, functionality
• Multilingual thesaurus
• Metadata standard

Activity in numbers

• 10 000 man-hours
• 40+ persons
• 41 deliverables
• 3 workshops
• 7 meetings
• 15 presentations
• 33 teleconferences
• The portal contains:
– 3 000 studies
– 500 000 objects

Economic situation Year 3

Partner    Total budget      Spent   Remaining
NSD             750 478    659 073      91 405
UKDA            374 044    296 557      77 487
DDA             183 868    158 777      25 091
FSD             125 188    113 751      11 437
NESSTAR         532 688    423 504     109 184
EKKE             58 570     56 210       2 360
SIDOS           156 829
ZA               27 844     22 984       4 860
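The Remaining column is the total budget minus the amount spent. A minimal Python sketch that re-derives it from the figures in the table is given here. The names `budget` and `remaining` are illustrative only, and SIDOS is left out because the source gives a single figure for it.

```python
# Figures copied from the Year 3 table: partner -> (total budget, spent).
# SIDOS is omitted because the source lists only one figure for it.
budget = {
    "NSD": (750_478, 659_073),
    "UKDA": (374_044, 296_557),
    "DDA": (183_868, 158_777),
    "FSD": (125_188, 113_751),
    "NESSTAR": (532_688, 423_504),
    "EKKE": (58_570, 56_210),
    "ZA": (27_844, 22_984),
}


def remaining(total: int, spent: int) -> int:
    """Remaining budget is what was allocated minus what was spent."""
    return total - spent


remaining_by_partner = {
    partner: remaining(total, spent) for partner, (total, spent) in budget.items()
}

for partner, rest in remaining_by_partner.items():
    print(f"{partner:<8} {rest:>8}")
```

Each computed value agrees with the Remaining column as listed.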

EC contribution

Total EC funding   1 291 000 €
- Received           975 000 €
= Remaining          316 000 €

List of deliverables

D1.1 - Project Initiation Document
D3.1 - Functional Specification and Design - M3
D5.1 - Guidelines Thesaurus construction & translation
D1.2 - Quality Assurance Plan
D2.1 - User Analysis Report - M6
D3.2 - MADIERA Prototype - M6
D7.1 - Dissemination Plan - M6
D1.3 - Periodic Progress Report (6-month) - M7
D2.2 - Usability test - MADIERA Prototype - M8
D3.3 - MADIERA Beta Version 1 - M15
D3.3a - MADIERA Beta Version 2 - M17
D3.3b - MADIERA Publisher Beta Version B - M17
D4.1 - Recommendation - Geo-referencing system
D6.1 - Guidelines - Content provision & access control
D2.3 - Usability test - MADIERA Beta version -
D1.4 - Periodic Progress Report (12-month) - M14
D4.2 - Methodology identification comparable elements
D3.4 - MADIERA Version 1.0 - M23
D4.3 - Naming and identification recommendation
D5.2 - Report on adm mechanisms for thesaurus maintenance - M18
D6.2 - User guides and training packs for content provision - M18
D6.3 - First version of hyper-linked information space demonstrator - M23
D6.4 - Data archive content provision workshop -
D6.5 - Workshop on content metadata (CDG/DDI)
D7.2 - On-going dissemination events
D7.3 - User guides and training packs - M23
D8.2 - Workshops for non-archive data providers -
D2.4 - Usability test - MADIERA Version 1 - M24
D1.5 - Periodic Progress Report (18-month) - M19
D5.3 - Extended multilingual thesauri - M24
D6.6 - Hyperlinked information-space demonstrator version 2 - M24
D1.6 - Periodic Progress Report (24-month) - M26
D4.4 - Package of revised recommendations - M27
D5.4 - Evaluation Workshops - M30
D1.7 - Periodic Progress Report (30-month) - M31
D1.8 - Third annual report - M38
D2.5 - Final usability test report - M38
D3.5 - MADIERA Version 1.1 - M38
D5.5 - Additional thesaurus hierarchies - M38
D8.3 - Technological Implementation Plan - M41
D1.8 - Final Report - M41

The Portal

We have data identified at 3 levels: Study, Variable group and Variable

                        Study   Variable group   Variable
Free text search          X           X             X
CESSDA Classification     X
ELSST 1                   X
ELSST 2                   X           X             X
Archives                  X
NUTS                      X

The Portal

• The free-text search lets the user specify a completely free search term. If you search for “sausage”, you will presently get 1 hit, at variable level. This term (sausage) does not seem to be in ELSST (yet).

• If you search for “radio”, you get 12 951 hits. “Radio” is a word used in many languages (all languages with data on the servers).

• If you search for “fjernsyn”, you get 2 911 hits. “Fjernsyn” is the Norwegian word for television.

• If we expand the word “fjernsyn” to its equivalents in other languages, we get 10 311 hits. Such an expansion checks against ELSST and picks up the translations.

• Common to all: searching in free text may give hits at all three levels of data. When browsing, some terms (keywords) are automatically translated back for the user.

• The CESSDA classification is a controlled vocabulary used for the DDI element topcClas, which is at study level: <codeBook> <stdyDscr> <stdyInfo> <subject> <topcClas>. If this term is used systematically, we can set up a catalogue structure. A study could then typically be published in more than one catalogue.

• ELSST1 is finer-grained than the CESSDA classification. It gives the impression of an alphabetically sorted list of keywords, and it gives easy access to translations and to the systematic structure with synonyms and related terms. But it works at study level: <codeBook> <stdyDscr> <stdyInfo> <subject> <keyword>.
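The expansion of “fjernsyn” through the thesaurus can be sketched in a few lines of Python. This is only an illustration of the idea: the `THESAURUS` dictionary, the sample texts and the function names are invented here, and the portal's actual lookup against ELSST is not described in these slides.

```python
# Hypothetical miniature thesaurus: concept -> {language code: term}.
# The real ELSST content and the portal's lookup mechanism are not shown
# in the slides; this only illustrates translation-based query expansion.
THESAURUS = {
    "television": {
        "en": "television",
        "no": "fjernsyn",
        "de": "Fernsehen",
    },
}


def expand(term: str) -> set[str]:
    """Return the term plus its translations if the thesaurus knows it."""
    needle = term.casefold()
    for translations in THESAURUS.values():
        if needle in {t.casefold() for t in translations.values()}:
            return set(translations.values())
    return {term}  # unknown terms (such as "sausage") are searched as typed


def search(texts: list[str], term: str, expand_term: bool = False) -> list[str]:
    """Free-text match, optionally expanded through the thesaurus."""
    terms = expand(term) if expand_term else {term}
    lowered = {t.casefold() for t in terms}
    return [text for text in texts if any(t in text.casefold() for t in lowered)]


# Invented sample labels standing in for study and variable texts.
texts = [
    "Hvor ofte ser du på fjernsyn?",
    "Hours spent watching television",
    "Radio listening habits",
]
plain_hits = search(texts, "fjernsyn")
expanded_hits = search(texts, "fjernsyn", expand_term=True)
```

With the sample texts, the plain search matches only the Norwegian label, while the expanded search also picks up the English one, which mirrors the jump in hits reported for the expanded query.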

The Portal


• ELSST2 matches on a few key text fields (title, abstract, keywords, subject, etc.). The most important thing about the “etc.” is that it searches DDI elements at three different levels: study, variable group (name) and variable (label, text, concept).

• Archives lists the servers under the portal; for every server, the studies are listed in alphabetical order.

• The NUTS list gives units at different levels of NUTS. The search could use coordinates inserted in geoBndBox. I don’t know how this is done (which DDI elements are used).
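The study-level element paths quoted for the CESSDA classification and the ELSST keywords can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a simplified fragment: it has no namespace, and the element values are invented for illustration, so it is not a complete or validated DDI file.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment following the paths quoted in the
# slides; the element values are invented for illustration.
SAMPLE = """
<codeBook>
  <stdyDscr>
    <stdyInfo>
      <subject>
        <keyword>television</keyword>
        <keyword>radio</keyword>
        <topcClas>Media</topcClas>
      </subject>
    </stdyInfo>
  </stdyDscr>
</codeBook>
"""

root = ET.fromstring(SAMPLE.strip())  # root is the <codeBook> element

# Both elements live under the same study-level <subject> node.
SUBJECT = "stdyDscr/stdyInfo/subject"
topics = [e.text for e in root.findall(f"{SUBJECT}/topcClas")]
keywords = [e.text for e in root.findall(f"{SUBJECT}/keyword")]

print("topcClas:", topics)
print("keyword: ", keywords)
```

A study carrying several topcClas values would appear in several catalogues, which is the catalogue structure mentioned for the CESSDA classification.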

Functionality: Geo-Cartography

Finding data by geography

Europe: a mixture of political, administrative and statistical units

Code, Name, Coordinates

Problem: Publish

Functionality: Comparability

Functionality: Naming Conventions

Objective: For a user to be able to update (metadata)

1. Add to metadata of a study
2. Use could also lead to changes, corrections, updates

Distinguish between two components of an identification:

Identifier (static) – version code (dynamic)

Elements that we identify consist of data and metadata

Elements could also be a complex mixture of instances that make up a study

And studies could be part of series
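The split between a static identifier and a dynamic version code can be sketched as a small immutable record. This is an assumption-laden illustration: the class name `ElementId`, the integer version and the `identifier:version` text form are all invented here, and the slides do not say how the project encodes either component.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ElementId:
    """Hypothetical identification of a data or metadata element."""

    identifier: str   # static: never changes once assigned
    version: int = 1  # dynamic: incremented on each change, correction or update

    def updated(self) -> "ElementId":
        """Return the identification of the next version of the same element."""
        return replace(self, version=self.version + 1)

    def __str__(self) -> str:
        # The "identifier:version" form is an assumption made for this sketch.
        return f"{self.identifier}:{self.version}"


study = ElementId("study-0001")
revised = study.updated()
print(study, "->", revised)
```

Because the record is frozen, an update yields a new value and leaves the earlier version addressable, while the identifier part stays the same across versions.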

Functionality: Naming Conventions

Series
  Study
    Instance 1, Instance 2
      Data
      Metadata

All this described as a complex set of modules

Data from data producers
Metadata from archives

DDI 3.0

ID   Module                    Simple   Complex   P/L
W    Wrapper                   1..1     1..1      L
A    Archive                   1..1     1..1      L
G    Group                     0..0     1..n      P
C    Concept                   1..1     1..n      P
DC   Data Collection           1..n     1..n      P
I    Instrumentation           1..n     1..n      P
LC   Logical Data Structure    1..n     1..n      P
PS   Physical Data Structure   1..n     1..n      L
PI   Physical Instance         1..n     1..n      L
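The Simple and Complex columns read as cardinalities: how many modules of each kind a simple or a complex instance may contain. Under that reading, a check of module counts against the table could look like the Python sketch below. The function names are invented for illustration, and the P/L column is not used because the slides do not explain it.

```python
# Cardinalities copied from the table: module ID -> (simple, complex).
CARDINALITY = {
    "W": ("1..1", "1..1"),
    "A": ("1..1", "1..1"),
    "G": ("0..0", "1..n"),
    "C": ("1..1", "1..n"),
    "DC": ("1..n", "1..n"),
    "I": ("1..n", "1..n"),
    "LC": ("1..n", "1..n"),
    "PS": ("1..n", "1..n"),
    "PI": ("1..n", "1..n"),
}


def allowed(count: int, cardinality: str) -> bool:
    """Check a count against a 'low..high' range, where high may be 'n'."""
    low, high = cardinality.split("..")
    if count < int(low):
        return False
    return high == "n" or count <= int(high)


def violations(counts: dict[str, int], complex_instance: bool) -> list[str]:
    """Return the module IDs whose counts break the chosen column's limits."""
    column = 1 if complex_instance else 0
    return [
        module
        for module, cardinality in CARDINALITY.items()
        if not allowed(counts.get(module, 0), cardinality[column])
    ]


# One module of each kind except Group, which a simple instance must not have.
simple_counts = {"W": 1, "A": 1, "C": 1, "DC": 1, "I": 1, "LC": 1, "PS": 1, "PI": 1}
print(violations(simple_counts, complex_instance=False))
print(violations(simple_counts, complex_instance=True))
```

The same set of counts passes as a simple instance but fails as a complex one, since the table requires at least one Group module there.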