Unified Digital Format Registry (UDFR) Stakeholder Meeting Library of Congress Washington, DC April 13, 14, 2011

Upload: anthony-blake

Post on 17-Dec-2015

Unified Digital Format Registry (UDFR)

Stakeholder Meeting

Library of Congress
Washington, DC
April 13, 14, 2011

Welcome!

Stephen Abrams, Associate director
Lisa Colvin, UDFR project manager
Alex Genadinik, UDFR project developer

University of California Curation Center

Bibliothèque nationale de France
Data Conservancy / Johns Hopkins U
DataONE / UC Santa Barbara
Deutsche Nationalbibliothek
Ex Libris
Family Search
Florida Center for Library Automation
GDFR / Harvard University
Georgia Institute of Technology
Government Printing Office [US]
Koninklijke Bibliotheek
Library of Congress
Los Alamos National Laboratory
National Archives [UK]
National Archives [US]
National Library of New Zealand
New York University
Open Planets Foundation / Nationaal Archief
Tessella
University of Pennsylvania
Virginia Institute of Technology

Objectives

The desired outcomes of this stakeholder meeting are:

• Agreement on the scoping of functional and non-functional requirements

• Agreement on the data modeling process and ontology

• Agreement on key technology decisions

• Agreement on project plan and schedule

• Groundwork for the administrative and technical continuity of UDFR as an ongoing service

Key questions

• What subset (or superset) of PRONOM and GDFR functionality and data modeling should be supported?

• Is there a useful distinction between format “facts” and “policies”?

• What are the criteria for contributor eligibility?

• To what level of technical review should/will contributed information be subject, and by whom? Are new contributions immediately visible in an unreviewed state?

• What is the appropriate granularity of provenance and review?

• Should UDFR identifiers be transparent or opaque?

• Should UDFR support static or dynamic inheritance of properties?

• Must there be an explicit grant of license by content contributors?

• What is the proper replication model: master/slave(s) or peer-to-peer?

• Should UDFR support classes of information that are not replicated?

• What are the criteria for node eligibility?

• What is the ongoing relationship between PRONOM and UDFR?

Agenda

Time Topic

09:00 – 09:20 Welcome and introductions

09:20 – 09:30 Review of objectives and agenda

09:30 – 10:00 Project background

10:00 – 10:30 Use cases and functional requirements

10:30 – 11:00 Break

11:00 – 11:30 Functional requirements (continued)

11:30 – 12:30 Data modeling and ontology

12:30 – 13:30 Lunch

13:30 – 14:30 Data modeling and ontology (continued)

14:30 – 15:00 Technical architecture

15:00 – 15:30 Break

15:30 – 16:30 Technical platform decisions

16:30 – 17:00 Questions and discussion

17:00 Adjourn

Agenda

Time Topic

09:00 – 09:30 Project schedule

09:30 – 10:15 Initial population of UDFR

10:15 – 10:45 Community building

10:45 – 11:15 Break

11:15 – 12:30 Community building (continued)

12:30 – 13:00 Follow-up planning

17:00 Adjourn

Project background

• Why worry about formats?

Information preservation

Bit preservation

• Since formatted digital assets are inherently mediated by technology, they are particularly susceptible to disruptive technological change

Format: a set of syntactic and semantic rules for mapping between an information model and a serialized bit stream
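The definition above can be made concrete with a toy example. This is only a sketch under stated assumptions: the `Image` model, the `TOY1` signature, and the byte layout are invented for illustration and do not describe any real format or any UDFR record.

```python
import struct
from dataclasses import dataclass


@dataclass
class Image:
    """Hypothetical information model: a tiny greyscale image."""
    width: int
    height: int
    pixels: bytes  # one byte per pixel, row-major


MAGIC = b"TOY1"  # made-up signature, not a real format


def serialize(img: Image) -> bytes:
    """Syntactic rules: signature, big-endian 16-bit width and height, then pixels."""
    return MAGIC + struct.pack(">HH", img.width, img.height) + img.pixels


def parse(data: bytes) -> Image:
    """Inverse mapping from the bit stream back to the information model.

    The semantic rule assumed here is that each pixel byte is a grey level.
    """
    if data[:4] != MAGIC:
        raise ValueError("not a TOY1 stream")
    width, height = struct.unpack(">HH", data[4:8])
    pixels = data[8:]
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match header")
    return Image(width, height, pixels)
```

Without knowledge of these rules the serialized bytes are still preservable as bits, but they can no longer be mapped back to the image they represent.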

Project background

• PRONOM http://www.nationalarchives.gov.uk/PRONOM/Default.aspx

• Global Digital Format Registry (GDFR) http://www.gdfr.info/

• Unified Digital Format Registry (UDFR) http://www.udfr.org/

– “The Unified Digital Format Registry (UDFR) will provide a reliable, sustainable and publicly accessible knowledge base of file format information”

– Fully open source implementation that “unifies” the function and data holdings of PRONOM and GDFR

UDFR project

1 year, 2+ FTE, funded by the Library of Congress

• Features

– Use cases and functional requirements developed by the stakeholder community over the past two years

– Support for linked data and semantic web

– Support for a distributed network of independent but interoperable UDFR nodes

• Deliverables

– Working, documented, single-node registry system, initially populated with an export from PRONOM, GDFR, and other appropriate sources

– BSD license

Community building

How can we ensure the administrative and technical continuity of the UDFR once the LC-funded work is completed?

• Policy and strategic planning

• Operation of the initial registry node

• Recruitment of additional nodes

• Technical maintenance and enhancement of the code base

• Content contribution

• Review of contributed information

Policy and strategic planning

What is the lightest weight governance structure that is effective?

• Continue as an ad hoc group or develop a more formal organization?

• Operate as loose consortium under an MOU

• Look for an administrative umbrella under an existing organization

Operational considerations

CDL is prepared to provide an operational home for the initial production node on an interim basis

• Any long-term commitment may require some (minimal) level of cost recovery

Additional replication nodes

• Eligibility requirements?

• Minimal/maximal number desired?

Technical maintenance and enhancement

• Manage source code in a public code repository

• Enhancement planning and prioritization

– Call for community-wide evaluation at 6/12 months of production operation

• Eligibility for contributors? Committers?

Content contribution

• Contributor eligibility

– Are contributors recruited or self-selected?

• What can we do to encourage contribution?

– Engagement by institution and discipline

Technical review

• Reviewer eligibility

– Are reviewers recruited or self-nominated?

• Single or multiple levels of scrutiny?

• Standard criteria for evaluation

– What is the appropriate level of due diligence?

Follow-up planning

Next steps

• Ongoing project work with early prototype releases

• Production release (single node) in January 2012

• Governance, policy, and planning structure

• Solicitation of replication nodes

• Solicitation of content contribution

• 6/12 month evaluation

Key questions

• What subset (or superset) of PRONOM and GDFR functionality and data modeling should be supported?

• Is there a useful distinction between format “facts” and “policies”?

– Priority for “facts”; support for “policies” as time permits.

• What are the criteria for contributor eligibility?

– No criteria, but user account required (i.e. no anonymous contribution).

• To what level of technical review should/will contributed information be subject, and by whom? Are new contributions immediately visible in an unreviewed state?

– Opportunity (but not a requirement) for review. Strong provenance will be maintained, as well as explicit tagging indicating the level of review.

• What is the appropriate granularity of provenance and review?

– Individual assertion.

… answered?!
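One way to picture the answers above (account required, optional review, provenance and review tagging on each individual assertion) is the following sketch. It is an assumption-laden illustration, not the UDFR data model: the class names, field names, and the two review levels are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ReviewLevel(Enum):
    """Hypothetical review tags; the slides do not enumerate actual levels."""
    UNREVIEWED = "unreviewed"
    REVIEWED = "reviewed"


@dataclass
class Assertion:
    """A single statement about a format, carrying its own provenance."""
    subject: str      # identifier of the format being described
    predicate: str    # property name (hypothetical vocabulary)
    value: str
    contributor: str  # user account; anonymous contribution is rejected
    asserted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    review: ReviewLevel = ReviewLevel.UNREVIEWED
    reviewer: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.contributor:
            raise ValueError("a user account is required to contribute")


def mark_reviewed(assertion: Assertion, reviewer: str) -> Assertion:
    """Record an optional review on one assertion without touching others."""
    assertion.review = ReviewLevel.REVIEWED
    assertion.reviewer = reviewer
    return assertion
```

Because provenance and review state live on each assertion, a new contribution can be visible immediately while still being clearly tagged as unreviewed.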

Key questions

• Should UDFR identifiers be transparent or opaque?

– Opaque, and without a node identifier component (to avoid the co-reference problem).

• Should UDFR support static or dynamic inheritance of properties?

– Not clear if inheritance is a feature of the model, the query system, or the UI.

• Must there be an explicit grant of license by content contributors?

– Yes, ideally using CC0.

• What is the proper replication model: master/slave(s) or peer-to-peer?

– Master/slave(s), but replication is not the highest immediate priority. However, nothing in the design or implementation of the registry should preclude adding support for replication in the future.

… answered?!
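The identifier decision above (opaque, with no node component) might look like the sketch below. The namespace URL and the use of random UUIDs are assumptions made for illustration; the slides do not specify a minting scheme.

```python
import uuid

# Hypothetical namespace; not an actual UDFR URL pattern.
BASE = "http://example.org/udfr/"


def mint_identifier() -> str:
    """Return an opaque identifier.

    The suffix is a random UUID in hex form, so it encodes neither the
    format's name nor the node that minted it.
    """
    return BASE + uuid.uuid4().hex
```

Keeping the node out of the identifier means the same format record carries the same name wherever it is held, which is the co-reference concern the slide mentions.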

Key questions

• Should UDFR support classes of information that are not replicated?

– Need to deal gracefully with legally encumbered information. In a master/slave configuration, data entered at a slave node would remain local.

• What are the criteria for node eligibility?

– With no consensus on the immediate need for replication, this question does not require an immediate answer. Some identified criteria include: geographic dispersion and high-availability operation.

• What is the ongoing relationship between PRONOM and UDFR?

– Continued close consultation and collaboration.

… answered?!
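The master/slave answer above, in which data entered at a slave node remains local, can be sketched as follows. The `Node` class and its methods are hypothetical, and the lookup order (local entries first) is an assumption that the slides do not address.

```python
class Node:
    """Minimal in-memory registry node; a sketch, not the UDFR design."""

    def __init__(self, name, master=None):
        self.name = name
        self.master = master   # None means this node is the master
        self.replicated = {}   # entries that originate at the master
        self.local = {}        # entries that never leave this node

    def add(self, key, value):
        """Entries added at the master replicate; others stay local."""
        if self.master is None:
            self.replicated[key] = value
        else:
            self.local[key] = value

    def sync(self):
        """Copy the master's replicated entries; local data is untouched."""
        if self.master is not None:
            self.replicated = dict(self.master.replicated)

    def lookup(self, key):
        """Assumed policy: consult local entries first, then replicated ones."""
        if key in self.local:
            return self.local[key]
        return self.replicated.get(key)
```

For example, a note that is legally encumbered could be added at a replica node: it is visible there after `sync()`, alongside the master's entries, but never appears at the master or at other replicas.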

Thank you!

• http://www.udfr.org/

• Safe travels!