Towards a Persistent Identifier Infrastructure for European e-Research
Daan Broeder
CLARIN / MPG
2008 CNRI Handle System Workshop
Content
• Domain & Scope
• Organizational embedding
• Further requirements
• Services for e-research with PIDs
Domain & Scope
• Reliable references & citations of web-accessible resources
• Language resource domain
– Audio & video recordings, pictures, primary texts, annotations
– Lexica, grammar descriptions, …
– Concepts in terminology registries and ontologies
– …
• Number of resources very big, dependent on how you approach the granularity issue
• References and citations
– Embedded in (web) documents
– In data structures
– In DBs
– …
CLARIN Common Language Resources and Technology Infrastructure
The CLARIN project is a large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily useable. As one of its goals CLARIN will create a federation of LR repositories and aims to create a unified resource registry using persistent identifiers.
CLARIN Common Language Resources and Technology Infrastructure
• Preparatory phase 2008-2011 (Construction phase 2011-2020)
• European dimension (ICT FP7)
– 112 members from 35 countries
– Prep. phase funded with 4.2 M€
• National dimension:
– Funding until now 6.5 M€, more to come
– …
DAM-LR Distributed Access Management for Language Resources
• Small (4 partners) European project aimed at federation building in the LR repository domain, 2005-2007
• Unified metadata catalogue
• Identity federation using Shibboleth
• Single resource identifier system for all “published” resources using the Handle System
Developed special tools
• Mover
– Updates Handle DB + catalogue
– Updates metadata XML files*
• Restore operations
– Recreate the Handle DB (and others) from scratch
Lessons learned
– Federation technology is not for all organizations
DAM-LR HS infrastructure
[Diagram: the Lund, MPI and INL archives, each holding resources (R). Handle servers are labelled primary 10050, primary 1839 and primary 10032, with secondary servers (sec.) for the same three prefixes.]
User benefits
MPG Max Planck Society
• Proposal within the MPG to support an MPG-wide PID registration service based on the HS.
• Run by MPG computing center GWDG
• Will also give support for non-MPG German scientific organizations and (hopefully) CLARIN.
Requirements
• (Political) Independence: European GHR mirror & proxy + no single point of failure
• Wide(r) acceptance of PID scheme
• Support for object part addressing, from ISO TC37/SC4 CITER work.
• Support for (secure) management of resource copies
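CLARIN PID Infrastructure
[Diagram: a proxy and a GHR mirror; the MPG/CLARIN@GWDG PID registration service; the MPI archive (Class A) and a further archive (Class C), each holding resources (R); handle servers labelled primary 1839, primary 1111, sec. 1839 and sec. …; example handles 1839/R1 and 1111/R5.]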
PID Scheme
• Difficult to gain acceptance
– Without PID syntax being “official”
– W3C seems to have problems with anything else but HTTP (see recent XRI events)
• Can the HS user community help?
• Possibly only acceptance via urlified handles: http://hdl.handle.net/1039/R5
• Perhaps follow ARK for elegance:
– http://hdl.handle.net/hdl:/1039/R5
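The "urlified" form is the proxy address followed by the handle itself. A minimal sketch of that conversion, assuming the proxy address shown on the slide; the function name `urlify_handle` is hypothetical and not part of any Handle System software:

```python
from urllib.parse import quote

PROXY = "http://hdl.handle.net/"  # proxy address as shown on the slide


def urlify_handle(handle: str, proxy: str = PROXY) -> str:
    """Turn a handle such as '1039/R5' or 'hdl:1039/R5' into a proxy URL."""
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {handle!r}")
    # Percent-encode characters that are not safe in a URL path.
    return proxy + quote(prefix) + "/" + quote(suffix, safe="/")


print(urlify_handle("1039/R5"))
```

A document can then carry an ordinary HTTP link while the handle stays the persistent part of the reference.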
PIDs & Resource Parts
[Diagram: object A with parts x, y and z]
• Wasteful to issue a PID for each part (think of 100k entries in a lexicon). So use part identifiers.
• Resolver can make an adequate translation “A#z” -> “objectA?part=z”. This requires enough flexibility from the resolver to accommodate the object server.
• The syntax of “z” should be standard for the specific data type. Borrow from existing fragment identifier syntax standards.
Separate PIDs per part: 1839/A, 1839/x, 1839/y, 1839/z
One PID plus part identifiers: 1839/A, with 1839/A#x, 1839/A#y, 1839/A#z
PID resolver -> object server:
1839/A#z -> http://oserver/objectA?part=z
1839/A -> http://oserver/objectA
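The resolver translation described on this slide can be sketched as follows. This is an illustration only: the lookup table, the function name `resolve` and the `part` query parameter convention are assumptions based on the slide's example, not the behaviour of an existing resolver.

```python
from urllib.parse import urlencode

# Hypothetical resolver table: PID -> object URL (values from the slide).
OBJECT_URLS = {"1839/A": "http://oserver/objectA"}


def resolve(pid: str, table: dict[str, str] = OBJECT_URLS) -> str:
    """Resolve 'prefix/suffix' or 'prefix/suffix#part' to an object-server URL."""
    base, sep, part = pid.partition("#")
    url = table[base]  # KeyError if the object itself has no PID
    if not sep:
        return url
    # Assumed convention: the object server takes the part as a query parameter.
    return f"{url}?{urlencode({'part': part})}"


print(resolve("1839/A"))
print(resolve("1839/A#z"))
```

Only the object needs a registered PID; the part identifier is passed through to the object server, which has to understand it.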
Resource duplicates
[Diagram: the Lund archive (primary 10050) holds resource R; the MPI archive (primary 1839) holds a copy. Handle 10050/R -> http://lund/lund_url and -> http://mpi/mpi_url. Further labels: MPI manager, move, LHS access monitor.]
• What if MPI moves the resource copy?
• MPI should have write access to the Lund handle record
• This would enable changing the Lund URL record too!
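One possible reading of the "LHS access monitor" label is a check that lets MPI change only its own value in the Lund handle record. The sketch below illustrates that idea under this assumption; the `owner` field, the class names and the URL `http://mpi/new_mpi_url` are hypothetical and do not describe the Handle System's actual permission model.

```python
from dataclasses import dataclass, field


@dataclass
class HandleValue:
    type: str
    data: str
    owner: str  # hypothetical: the organization allowed to change this value


@dataclass
class HandleRecord:
    handle: str
    values: dict[int, HandleValue] = field(default_factory=dict)


def update_value(record: HandleRecord, index: int, new_data: str, requester: str) -> None:
    """Change one value of a record, but only if the requester owns that value."""
    value = record.values[index]
    if value.owner != requester:
        raise PermissionError(
            f"{requester} may not change value {index} of {record.handle}"
        )
    value.data = new_data


record = HandleRecord("10050/R", {
    1: HandleValue("URL", "http://lund/lund_url", owner="Lund"),
    2: HandleValue("URL", "http://mpi/mpi_url", owner="MPI"),
})
# MPI moves its copy and updates only its own value (hypothetical new URL).
update_value(record, 2, "http://mpi/new_mpi_url", requester="MPI")
```

With such a check in place, an attempt by MPI to change value 1 raises `PermissionError`, so the Lund URL stays under Lund's control.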
Resource duplicates
[Diagram: the Lund archive (primary 10050) holds resource R; the MPI archive (primary 1839) holds a copy. 10050/R -> http://lund/lund_url and -> hdl:1839/Rcpy; 1839/Rcpy -> http://mpi/mpi_url. Further labels: MPI manager, move.]
Indirect handles*
• TYPE = URL
– IE plugin: OK
– HS proxy: not OK
• TYPE = HS_ALIAS (problem*)
– IE plugin: OK
– HS proxy: OK
• Status of the 1839/Rcpy handle?
– Use in documents?
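The indirection on this slide, where the Lund record points to an MPI handle that MPI manages itself, can be sketched with an in-memory table. The table and the function `resolve_all` are hypothetical; this illustrates the idea only and does not reproduce how the IE plugin or the HS proxy treat URL or HS_ALIAS values.

```python
# Hypothetical in-memory handle table using the values from the slide.
HANDLES = {
    "10050/R": ["http://lund/lund_url", "hdl:1839/Rcpy"],
    "1839/Rcpy": ["http://mpi/mpi_url"],
}


def resolve_all(handle: str, table: dict[str, list[str]] = HANDLES,
                max_depth: int = 5) -> list[str]:
    """Return every URL reachable from a handle, following 'hdl:' references."""
    urls: list[str] = []
    seen: set[str] = set()

    def walk(h: str, depth: int) -> None:
        if h in seen or depth > max_depth:
            return  # guard against reference loops and overly long chains
        seen.add(h)
        for value in table.get(h, []):
            if value.startswith("hdl:"):
                walk(value[len("hdl:"):], depth + 1)
            else:
                urls.append(value)

    walk(handle, 0)
    return urls


print(resolve_all("10050/R"))
```

When MPI moves its copy it only has to update 1839/Rcpy; the Lund record is never touched.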
Possible Added PID Services
• Establishing resource authenticity
• Resource Collection Registration
• Resource Citation Information
• Lost Resource Detective
• …
Collection Registration Service
• Much scientific work depends on seemingly “accidental” distributed collections of material that have no independent embodiment.
• Needs to be citable with one single PID
– Encode the collection’s resource URIs directly in a handle record
– Attach a link to a map of the collection’s URIs
• Compare recent “Aggregation Map” concept from ORE
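The two options for registering a collection under a single PID can be compared in a small data-structure sketch. Everything here is hypothetical: the class, the field names, the PIDs `1839/C1` and `1839/C2` and the member URIs are illustrative and not taken from an existing service.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class CollectionRecord:
    pid: str
    member_uris: tuple[str, ...] = ()  # option 1: URIs stored in the record
    map_uri: Optional[str] = None      # option 2: link to an external map


def members(record: CollectionRecord,
            fetch_map: Callable[[str], Iterable[str]]) -> list[str]:
    """List a collection's member URIs, whichever way they were registered."""
    if record.member_uris:
        return list(record.member_uris)
    if record.map_uri is not None:
        # fetch_map is supplied by the caller, e.g. an HTTP fetch plus parsing.
        return list(fetch_map(record.map_uri))
    return []


inline = CollectionRecord(
    "1839/C1",
    member_uris=("http://oserver/objectA", "http://oserver/objectB"),
)
linked = CollectionRecord("1839/C2", map_uri="http://oserver/collection-map")
```

Storing the URIs in the record keeps resolution to a single lookup, while the linked map scales better for large collections at the cost of a second fetch.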
Citation Information Service
• (Collections of) resources need to be cited in documents. Acknowledgement & credit are also important for primary scientific data, e.g. “Dutch Spoken Corpus, © Institute for Dutch Lexicography, ….”
• Make this citation information part of the metadata associated with the PID.
Establishing Provenance
• If by accident the handle <-> URI mapping was not properly maintained, special metadata could be available from the handle record to establish the resource's location or find a copy.
– URI history, repository, depositor, …
• Labor intensive
• Only for a limited number of resources unless there is a pattern
Lost Resource Detective
The End
DAM-LR HS infrastructure
Integration
• It should be an optional extension
• Make sure HS is not a single point of failure (SPF)
• IMDI/LAT software also functions without HS
Issue handles for objects
• Only for local resources
Need special tools
• Mover
– Updates Handle DB + catalogue
– Updates IMDI XML files*
• Restore operations
– Recreate the Handle DB (and others) from scratch
[Diagram: LAT web apps, catalogue, Handle DB, LHS, mover, IMDI harvester and sync; example entries "MPI1001# mpi_url" and "1839/087-D mpi_url".]