DwB Discovery Portal: A New CESSDA Portal for European Research Data Discovery
John Shepherdson - UKDA; Pascal Heus - Metadata Technology; Ørnulf Risnes - NSD
Overview
• Supports DwB goal “equal and easy access to official microdata for the European Research Area”
Ø provides “more coherent system for resource discovery of official statistics”
Ø demonstrates ability to ingest metadata from multiple sources, via multiple protocols
Scope of the portal work in WP12
• Content-wise
Ø Metadata from NSIs + Archives
• Technical
Ø Build prototype/beta
Ø Sound, future-proof methods, architecture, components
Ø Standards-based
Ø Extensible
Ø Easy to hand over to ‘sustainability’ body
Functional aspects of the portal
• Research data discovery (obviously)
• Provider portal, QA
• Platform for additional services
Metadata ingest
• Metadata gets...
Ø harvested
Ø made ready for QA
Ø transformed into canonical model
Ø indexed
Ø exposed
Canonical metadata model
• Harmonisation
Ø DDI-C, DDI-L, MISSY, CIMES, etc.
• Builds on DISCO
Ø DDI discovery RDF
Metadata ingest dependencies

Step                                          Source/standard
1) Harvesting                                 specific
2) Produce harvesting report                  specific
3) Conversion to Raw-RDF                      agnostic
4) Produce conversion report                  agnostic
5) Harmonization                              agnostic
6) Produce harmonization report               agnostic
7) Loading                                    agnostic
8) Produce loading report                     agnostic
9) Indexing                                   agnostic
10) Produce indexing report                   agnostic
11+) Discovery, other downstream processes    agnostic
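The step/report pairing in the table above can be sketched as a chain of stages, each followed by a small report. This is only an illustration under stated assumptions: the stage functions (`harvest`, `to_raw_rdf`, `harmonize`), the `StageReport` fields and the record layout are hypothetical stand-ins, not the DwB tools themselves.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
# (stage name, is the stage source/standard specific?, stage function)
Stage = Tuple[str, bool, Callable[[List[Record]], List[Record]]]


@dataclass
class StageReport:
    stage: str
    source_specific: bool
    records_in: int
    records_out: int


def run_pipeline(records: List[Record], stages: List[Stage]):
    """Run each stage in order and produce one report per stage."""
    reports = []
    for name, source_specific, step in stages:
        result = step(records)
        reports.append(StageReport(name, source_specific, len(records), len(result)))
        records = result
    return records, reports


# Hypothetical stage implementations, for illustration only.
def harvest(records):
    # Source-specific: keep only records that carry an identifier.
    return [r for r in records if r.get("id")]


def to_raw_rdf(records):
    # Source-agnostic from here on: every record has the same shape.
    return [{"subject": r["id"], "title": r.get("title", "")} for r in records]


def harmonize(records):
    return [{**r, "title": r["title"].strip()} for r in records]


STAGES = [
    ("harvesting", True, harvest),
    ("conversion to raw RDF", False, to_raw_rdf),
    ("harmonization", False, harmonize),
]

final, reports = run_pipeline(
    [{"id": "s1", "title": " Labour Force Survey "}, {"title": "no identifier"}],
    STAGES,
)
for report in reports:
    print(report)
```

Only the first stage needs to know about the source; everything after the conversion works on one record shape, which is the point of the "specific" versus "agnostic" column.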
Platform for services
• DwB search portal is just a front-end application
• Machine-actionable interfaces for most functions (REST)
Search Portal (alpha)
• Powered by Solr
• Facets
Ø producer, geography, date, data type …
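A faceted search of this kind can be expressed as a Solr select request. The sketch below only builds the request URL. `q`, `rows`, `wt`, `facet` and `facet.field` are standard Solr query parameters, while the host, the core name `dwb` and the field names are assumptions made for illustration, since the slides do not show the actual schema.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_fields):
    """Build a Solr select URL that asks for facet counts on the given fields."""
    params = [("q", text), ("rows", "10"), ("wt", "json"), ("facet", "true")]
    # Solr accepts one facet.field parameter per field to facet on.
    params += [("facet.field", name) for name in facet_fields]
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and field names.
url = build_facet_query(
    "http://localhost:8983/solr/dwb/select",
    "labour force",
    ["producer", "geography", "data_type"],
)
print(url)
```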
Search Portal (alpha)
• Suggestions / autocomplete
Search Portal (alpha)
• ‘Did you mean?’ functionality
Sprint in Colchester, May 2014
• Use Jenkins CI tool to:
Ø Harvest Nesstar metadata
§ any public instance
Ø Load DDI XML into BaseX
Ø Convert DDI XML to raw DwB-RDF
Ø Harmonize DwB-RDF
§ Simple
Sprint in Colchester, May 2014
• Integrated Jenkins with Git to
Ø Build Nesstarvester and BasexSync tools automatically
Ø Update harmonization scripts automatically
• Identified mechanism to detect metadata language
Ø So the language tag can be checked for correctness
• Produced Solr schema
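The slides do not say which language-detection mechanism was identified. As a rough illustration of the idea of checking a declared language tag against the text, here is a toy stopword-counting sketch. The stopword lists, `guess_language` and `tag_matches` are all hypothetical and far too small for real use.

```python
# Tiny, hand-picked stopword lists (an assumption for illustration only).
STOPWORDS = {
    "en": {"the", "and", "of", "in", "for", "to"},
    "de": {"der", "die", "und", "das", "für", "von"},
    "fr": {"le", "la", "et", "les", "des", "pour"},
}


def guess_language(text):
    """Return the language whose stopwords occur most often, or None if none occur."""
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def tag_matches(text, declared_tag):
    """True if the declared tag agrees with the guess, or if no guess is possible."""
    guessed = guess_language(text)
    # Compare only the primary subtag, so "de-DE" is treated as "de".
    return guessed is None or guessed == declared_tag.lower().split("-")[0]


title = "Erhebung der Einkommen und die Lebensbedingungen"
print(guess_language(title), tag_matches(title, "en"))
```

A mismatch like the one in the example is the kind of finding that could go into a QA report for the metadata provider.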
Jenkins Dashboard
Jenkins Job Details
Jenkins Job Details
Metadata harmonization
• Standard level - sources based on the various metadata standards
• Version level - within a standard, the use of different versions (e.g. DDI 1.2.2, 2.5, 3.x)
• Template/flavour level - the use of elements of the standard for different purposes; presence/absence of optional elements
Ø driven by institutional practices, templates, or software tooling
Typical Console Output (Captured)
What next?
• Perform provider/format-specific transformations
• Apply DwB-specific adjustments (identifiers, system metadata, etc.)
• Apply DwB harmonizers (map metadata into DwB standard facets/CV etc.)
• Load harmonized DwB-RDF into Virtuoso RDF database
• Index DwB-RDF with Solr
What next?
• Produce various ingestion / QA reports
• Propagate deletes for surveys that have been dropped
• Synchronize various metadata files to repository
Ø For ‘before and after’ comparisons/provider feedback
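One simple way to propagate deletes is to compare the identifiers seen in the previous harvest with those in the current one and remove whatever has disappeared. The sketch below assumes that approach. The function names are hypothetical, and the in-memory dictionary only stands in for the downstream stores (RDF database, search index).

```python
def find_dropped(previous_ids, current_ids):
    """Identifiers present in the previous harvest but missing from the current one."""
    return set(previous_ids) - set(current_ids)


def propagate_deletes(index, previous_ids, current_ids):
    """Remove dropped surveys from a downstream store and return what was removed."""
    dropped = find_dropped(previous_ids, current_ids)
    for survey_id in dropped:
        index.pop(survey_id, None)  # tolerate entries that are already gone
    return sorted(dropped)


# 'index' is a plain dict standing in for the real stores.
index = {"s1": "Survey one", "s2": "Survey two", "s3": "Survey three"}
removed = propagate_deletes(index, ["s1", "s2", "s3"], ["s1", "s3"])
print(removed, sorted(index))
```

Returning the list of removed identifiers makes it easy to include them in an ingestion report.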
Any Questions?