Unified Digital Format Registry (UDFR)
Stakeholder Meeting
Library of Congress
Washington, DC
April 13–14, 2011
Welcome!
Stephen Abrams, Associate director
Lisa Colvin, UDFR project manager
Alex Genadinik, UDFR project developer
University of California Curation Center
Bibliothèque nationale de France
Data Conservancy / Johns Hopkins University
DataONE / UC Santa Barbara
Deutsche Nationalbibliothek
Ex Libris
Family Search
Florida Center for Library Automation
GDFR / Harvard University
Georgia Institute of Technology
Government Printing Office [US]
Koninklijke Bibliotheek
Library of Congress
Los Alamos National Laboratory
National Archives [UK]
National Archives [US]
National Library of New Zealand
New York University
Open Planets Foundation / Nationaal Archief
Tessella
University of Pennsylvania
Virginia Institute of Technology
Objectives
The desired outcomes of this stakeholder meeting are:
• Agreement on the scoping of functional and non-functional requirements
• Agreement on the data modeling process and ontology
• Agreement on key technology decisions
• Agreement on project plan and schedule
• Groundwork for the administrative and technical continuity of UDFR as an ongoing service
Key questions
• What subset (or superset) of PRONOM and GDFR functionality and data modeling should be supported?
• Is there a useful distinction between format “facts” and “policies”?
• What are the criteria for contributor eligibility?
• To what level of technical review should/will contributed information be subject, and by whom? Are new contributions immediately visible in an unreviewed state?
• What is the appropriate granularity of provenance and review?
• Should UDFR identifiers be transparent or opaque?
• Should UDFR support static or dynamic inheritance of properties?
• Must there be an explicit grant of license by content contributors?
• What is the proper replication model: master/slave(s) or peer-to-peer?
• Should UDFR support classes of information that is not replicated?
• What are the criteria for node eligibility?
• What is the ongoing relationship between PRONOM and UDFR?
Agenda (day 1)
Time Topic
09:00 – 09:20 Welcome and introductions
09:20 – 09:30 Review of objectives and agenda
09:30 – 10:00 Project background
10:00 – 10:30 Use cases and functional requirements
10:30 – 11:00 Break
11:00 – 11:30 Functional requirements (continued)
11:30 – 12:30 Data modeling and ontology
12:30 – 13:30 Lunch
13:30 – 14:30 Data modeling and ontology (continued)
14:30 – 15:00 Technical architecture
15:00 – 15:30 Break
15:30 – 16:30 Technical platform decisions
16:30 – 17:00 Questions and discussion
17:00 Adjourn
Agenda (day 2)
Time Topic
09:00 – 09:30 Project schedule
09:30 – 10:15 Initial population of UDFR
10:15 – 10:45 Community building
10:45 – 11:15 Break
11:15 – 12:30 Community building (continued)
12:30 – 13:00 Follow-up planning
17:00 Adjourn
Project background
• Why worry about formats?
– Information preservation
– Bit preservation
• Since formatted digital assets are inherently mediated by technology, they are particularly susceptible to disruptive technological change
• Format: a set of syntactic and semantic rules for mapping between an information model and a serialized bit stream
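The definition of a format given here can be made concrete with a toy example. The sketch below is illustrative only: the `Point` record, the `LAYOUT` string, and the millimetre convention are hypothetical and are not taken from UDFR, PRONOM, GDFR, or any real format. It shows one information model, one set of syntactic and semantic rules, and the mapping in both directions.

```python
import struct
from dataclasses import dataclass

# Hypothetical toy format, invented for illustration only.
# Syntactic rule: two big-endian signed 32-bit integers, x then y (8 bytes).
# Semantic rule: the integers are coordinates in whole millimetres.
LAYOUT = ">ii"


@dataclass(frozen=True)
class Point:
    """The information model: a position on a plane."""
    x_mm: int
    y_mm: int


def serialize(point: Point) -> bytes:
    """Map the information model to a serialized bit stream."""
    return struct.pack(LAYOUT, point.x_mm, point.y_mm)


def parse(data: bytes) -> Point:
    """Map a serialized bit stream back to the information model."""
    x_mm, y_mm = struct.unpack(LAYOUT, data)
    return Point(x_mm, y_mm)
```

In these terms, bit preservation keeps the eight bytes intact, while information preservation also depends on still knowing the rules captured in `LAYOUT` and the unit convention.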
Project background
• PRONOM
  http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
• Global Digital Format Registry (GDFR)
  http://www.gdfr.info/
• Unified Digital Format Registry (UDFR)
  http://www.udfr.org/
– “The Unified Digital Format Registry (UDFR) will provide a reliable, sustainable and publicly accessible knowledge base of file format information”
– Fully open source implementation that “unifies” the function and data holdings of PRONOM and GDFR
UDFR project
1 year, 2+ FTE, funded by the Library of Congress
• Features
– Use cases and functional requirements developed by the stakeholder community over the past two years
– Support for linked data and semantic web
– Support for a distributed network of independent but interoperable UDFR nodes
• Deliverables
– Working, documented, single-node registry system, initially populated with an export from PRONOM, GDFR, and other appropriate sources
– BSD license
Community building
How can we ensure the administrative and technical continuity of the UDFR once the LC-funded work is completed?
• Policy and strategic planning
• Operation of the initial registry node
• Recruitment of additional nodes
• Technical maintenance and enhancement of the code base
• Content contribution
• Review of contributed information
Policy and strategic planning
What is the lightest-weight governance structure that is effective?
• Continue as an ad hoc group or develop a more formal organization?
• Operate as a loose consortium under an MOU
• Look for an administrative umbrella under an existing organization
Operational considerations
CDL is prepared to provide an operational home for the initial production node on an interim basis
• Any long-term commitment may require some (minimal) level of cost recovery
Additional replication nodes
• Eligibility requirements?
• Minimal/maximal number desired?
Technical maintenance and enhancement
• Manage source code in a public code repository
• Enhancement planning and prioritization
– Call for community-wide evaluation at 6/12 months of production operation
• Eligibility for contributors? Committers?
Content contribution
• Contributor eligibility
– Are contributors recruited or self-selected?
• What can we do to encourage contribution?
– Engagement by institution and discipline
Technical review
• Reviewer eligibility
– Are reviewers recruited or self-nominated?
• Single or multiple levels of scrutiny?
• Standard criteria for evaluation
– What is the appropriate level of due diligence?
Follow-up planning
Next steps
• Ongoing project work with early prototype releases
• Production release (single node) in January 2012
• Governance, policy, and planning structure
• Solicitation of replication nodes
• Solicitation of content contribution
• 6/12 month evaluation
Key questions … answered?!
• What subset (or superset) of PRONOM and GDFR functionality and data modeling should be supported?
• Is there a useful distinction between format “facts” and “policies”?
– Priority for “facts”; support for “policies” as time permits.
• What are the criteria for contributor eligibility?
– No criteria, but user account required (i.e. no anonymous contribution).
• To what level of technical review should/will contributed information be subject, and by whom? Are new contributions immediately visible in an unreviewed state?
– Opportunity (but not a requirement) for review. Strong provenance will be maintained, as well as explicit tagging indicating the level of review.
• What is the appropriate granularity of provenance and review?
– Individual assertion.
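One way to read the answers on this slide is that every individual assertion carries its own contributor, timestamp, and review tag. The sketch below is a minimal illustration under that reading, not the UDFR data model: the `Assertion` class, its field names, and the two `ReviewLevel` values are all hypothetical, since the slides do not define the actual review levels.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ReviewLevel(Enum):
    # Hypothetical tags; the slides do not enumerate the real levels.
    UNREVIEWED = "unreviewed"
    REVIEWED = "reviewed"


@dataclass
class Assertion:
    """One statement about a format, carrying its own provenance."""
    subject: str       # identifier of the format being described
    prop: str          # property being asserted
    value: str
    contributor: str   # account name; anonymous contribution is rejected
    asserted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    review: ReviewLevel = ReviewLevel.UNREVIEWED
    reviewer: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.contributor.strip():
            raise ValueError("a user account is required to contribute")

    def mark_reviewed(self, reviewer: str) -> None:
        """Record an optional review without altering the original provenance."""
        self.review = ReviewLevel.REVIEWED
        self.reviewer = reviewer
```

A new assertion is usable immediately in its unreviewed state, and consumers can filter on `review` if they only want vetted information.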
Key questions … answered?!
• Should UDFR identifiers be transparent or opaque?
– Opaque, and without a node identifier component (to avoid the co-reference problem).
• Should UDFR support static or dynamic inheritance of properties?
– Not clear if inheritance is a feature of the model, the query system, or the UI.
• Must there be an explicit grant of license by content contributors?
– Yes, ideally using CC0.
• What is the proper replication model: master/slave(s) or peer-to-peer?
– Master/slave(s), but replication is not the highest immediate priority. However, nothing in the design or implementation of the registry should preclude adding support for replication in the future.
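The identifier decision can be illustrated with a short sketch. It assumes a random UUID as the opaque part and uses a made-up `BASE` prefix; neither is specified in the slides, and the real UDFR identifier scheme may differ. The point is only that the identifier encodes neither the format's name nor the node that minted it, so a record copied to another node keeps the same identifier rather than acquiring a second one that would later need reconciling.

```python
import uuid

# Hypothetical URI prefix, chosen for illustration; not a real UDFR namespace.
BASE = "http://example.org/udfr/"


def mint_identifier() -> str:
    """Return an opaque identifier with no format name and no node component."""
    return BASE + uuid.uuid4().hex
```

A transparent alternative would embed meaning in the identifier itself, for example a node name or a format name and version, which ties the identifier to facts that can change.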
Key questions … answered?!
• Should UDFR support classes of information that is not replicated?
– Need to deal gracefully with legally encumbered information. In a master/slave configuration, data entered at a slave node would remain local.
• What are the criteria for node eligibility?
– With no consensus on the immediate need for replication, this question does not require an immediate answer. Some identified criteria include: geographic dispersion and high-availability operation.
• What is the ongoing relationship between PRONOM and UDFR?
– Continued close consultation and collaboration.
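The replication behaviour described on this slide can be sketched as a small in-memory model. Everything here is hypothetical: the `Master` and `Slave` classes, their method names, and the lookup order in `get` are assumptions made for illustration, not part of any UDFR design. The sketch shows only the one property stated above: writes at the master propagate outward, while data entered at a slave node stays on that node.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.replicated = {}  # records that originate at the master
        self.local = {}       # records entered here; never sent elsewhere


class Slave(Node):
    def put_local(self, key, value):
        """Store a record that must remain on this node only."""
        self.local[key] = value

    def get(self, key):
        # Assumption: replicated records are consulted first, then local ones.
        if key in self.replicated:
            return self.replicated[key]
        return self.local.get(key)


class Master(Node):
    def __init__(self, name):
        super().__init__(name)
        self.slaves = []

    def attach(self, slave):
        """Register a slave and give it a full copy of the current records."""
        self.slaves.append(slave)
        slave.replicated = dict(self.replicated)

    def put(self, key, value):
        """Store a record and push it to every attached slave."""
        self.replicated[key] = value
        for slave in self.slaves:
            slave.replicated[key] = value
```

Because replication flows in one direction only, legally encumbered information entered with `put_local` never reaches the master or any other node.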