British Library Seminar: SharedCanvas (September 2011)
DESCRIPTION
A detailed introduction to the technology behind SharedCanvas, and the data model for digital facsimiles of medieval manuscripts.
TRANSCRIPT
Introduction to SharedCanvas British Library, 7th of September 2011, London, England
Introduction to SharedCanvas: Linked Data for Facsimile Display and Annotation
Robert Sanderson, Los Alamos National Laboratory
Benjamin Albritton, Stanford University
http://www.shared-canvas.org/
This research is funded, in part, by the Andrew W. Mellon Foundation
Overview
• Quick Motivation
• Technology Background:
  • RDF and Linked Data
  • Object Reuse and Exchange (OAI-ORE)
  • Open Annotation (OAC)
• SharedCanvas:
  • Requirements
  • Model by Example
• Making it Real:
  • DMS Tech Group
  • Implementations and Demos
Motivation
Digital surrogates enable remote research
• Improve preservation of original, and digital preservation of surrogate
• Promotes collaboration via shared annotations and descriptions
A collaborative future:
• Rich landscape of interconnected repositories, with seamless user interfaces
• Improve efficiency and usability through open, shared development
BNF f.fr 113, folio 1 recto
Requirements
To Realize this Future:
• Need a standardized input format to digital facsimile presentation systems, to allow interoperability between and across repositories
Architectural Requirements:
• Ability to model primarily textual items, where the individual physical instance is an important cultural object
• Alignment of multiple Images, Texts, Commentary and other Content resources per folio
• The Content, and Services that act upon it, are distributed between institutions, and around the web
Naïve Approach: Transcribe Images Directly
But how to align multiple images, pages without images, fragments… ?!
Canvas Paradigm
A Canvas is an empty space in which to build up a display
• HTML5, SVG, PDF, … even PowerPoint!
• Can "paint" many different resources, including text, images and audio, on to a Canvas
We can use a Canvas to represent a folio of a manuscript.
Distributed nature is fundamental in the requirements
• Painting resources, commentary and collaboration
• Idea: Use Annotations to do all of those
• Annotations can target the Canvas instead of individual Images
Annotations to Paint Text/Image to Canvas
Technology: RDF and Linked Data
Current technology of choice: XML
• XML files can't be built in a distributed, collaborative way.
• XML's tree structure is insufficient
RDF (Resource Description Framework) is a Graph model
• W3C Standard: http://www.w3.org/TR/rdf-primer/
• A single, global graph of interconnected resources
• More Powerful … like the web
• More Complex … like the web
Linked Data is RDF with some constraints
• More web friendly
• Much support from Industry, Academia and Government sectors
• "Semantic Web" done right!
Technology: RDF and Linked Data
Primitives:
• Resource: Something of Interest
• Predicate: Typed, directed Relationship
• Literal: Data (string, integer, etc)
• Triple: ( Resource, Predicate, Literal/Resource )
Resource:
• Can be digital, physical or conceptual
• eg: An image file, an elephant, or "redness"
Predicate:
• Can be Resource to Resource (relationship)
  • X isPartOf Y
• Or Resource to Literal (property)
  • X title "Froissart's Chronicles"
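The primitives above can be sketched in a few lines of Python. This is only an illustration, not part of any RDF library: the class names and the example.org URIs are made up for the sketch, and the two predicates are borrowed from Dublin Core Terms.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """Something of interest, identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """Plain data such as a string or an integer."""
    value: Union[str, int, float]


@dataclass(frozen=True)
class Triple:
    """( Resource, Predicate, Literal/Resource ); the predicate is itself a Resource."""
    subject: Resource
    predicate: Resource
    obj: Union[Resource, Literal]


# Hypothetical example resources (example.org URIs are placeholders).
folio = Resource("http://example.org/ms/folio-1r")
manuscript = Resource("http://example.org/ms")

# Predicates reused from the Dublin Core Terms namespace.
is_part_of = Resource("http://purl.org/dc/terms/isPartOf")
title = Resource("http://purl.org/dc/terms/title")

graph = [
    Triple(folio, is_part_of, manuscript),                        # relationship
    Triple(manuscript, title, Literal("Froissart's Chronicles")),  # property
]


def is_relationship(triple: Triple) -> bool:
    """True for Resource-to-Resource triples, False for Resource-to-Literal ones."""
    return isinstance(triple.obj, Resource)
```

Here `is_relationship(graph[0])` is true and `is_relationship(graph[1])` is false, which mirrors the relationship/property distinction on the slide.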
Technology: RDF Skittles
Circle = Resource, Arrow = Predicate, Oval = Literal, Rectangle = Class
Technology: RDF and Linked Data
Namespaces:
• Interoperability comes from reusing Ontologies (namespaces) of predicates and resources
• eg Dublin Core, Open Annotation, SharedCanvas…
Can define (multiple) Classes for resources
• Person, Image, Annotation, Canvas, …
• Class is just another resource referenced with rdf:type predicate
  • X rdf:type Class
All Resources and Predicates are identified by URIs
• Linked Data recommends resolvable HTTP URIs
All statements are globally true, not just within the current document
Technology: RDF and Linked Data
Serializations:
• XML: ugly (though recommended as default)
• Turtle: much easier to read, but needs special parser
• JSON: many competing formats, no standard yet
XML:
  <dms:TranscriptionAnnotation rdf:about="urn:uuid:e7db526a…">
    <oac:hasBody rdf:resource="http://anno.lanl.gov/m804/Line-f1r-37"/>
    <oac:hasTarget rdf:resource="http://anno.lanl.gov/m804/View-f1r#xywh=696,1319,565,44"/>
  </dms:TranscriptionAnnotation>
Turtle:
  <urn:uuid:e7db526a…> a dms:TranscriptionAnnotation ;
    oac:hasBody ex:Line-f1r-37 ;
    oac:hasTarget <http://anno.lanl.gov/m804/View-f1r#xywh=696,1319,565,44> .
(The target is written as a full IRI because a '#' cannot appear in a Turtle prefixed name.)
ORE: Aggregations of Web Resources
http://www.openarchives.org/ore/
Aggregation: An abstract collection of resources, with an identity
Resource Map: A document that describes the Aggregation in RDF
AR-1 and AR-2 can be any web resource
ORE: Aggregations
Aggregations may aggregate other Aggregations, but each must have its own Resource Map
ORE: Aggregations
Aggregations do not have a default order for the Aggregated Resources.
Order can be imposed by RDF Lists.
List/Aggregations
• How do those 'next' links actually work using an rdf:List?
• Verbose in full, but serializations have shortcuts to make this less ugly!
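To make the 'next' links concrete, here is a minimal Python sketch of how an ordered sequence expands into rdf:first / rdf:rest triples and how a client walks them back into order. The blank node labels (`_:n0`, `_:n1`, …) and the function names are assumptions made for this sketch; only the three RDF terms come from the RDF vocabulary.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def rdf_list_triples(items, prefix="_:n"):
    """Expand an ordered Python list into (head, triples) using rdf:first/rdf:rest.

    Each list cell is a blank node; node labels are hypothetical.
    """
    if not items:
        return NIL, []
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        nxt = nodes[i + 1] if i + 1 < len(items) else NIL
        triples.append((nodes[i], FIRST, item))  # the member at this position
        triples.append((nodes[i], REST, nxt))    # the 'next' link
    return nodes[0], triples


def walk_rdf_list(head, triples):
    """Follow rdf:rest links from head until rdf:nil, collecting rdf:first values."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    ordered, node = [], head
    while node != NIL:
        ordered.append(first[node])
        node = rest[node]
    return ordered
```

Three folios therefore cost six triples in full, which is the verbosity the slide refers to; Turtle's `( a b c )` shortcut hides exactly this expansion.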
Technology: Open Annotation
• http://www.openannotation.org/
• Focus on interoperable sharing of annotations
  • Web-centric and open, not locked down silos
  • Create, consume and interact in different environments
  • Build from a simple model for simple cases, to more detailed for complex scholarly annotation requirements
• Status: Beta, with 9 ongoing funded experiments to inform 1.0
• Hardest part: Define what an Annotation is!
  • "Aboutness" is key to distinguish from general metadata
A document that describes how one resource is about one or more other resources, or part thereof.
Basic Model
The basic model has three resources:
• Annotation (an RDF document)
  • Default: RDF/XML but others via Content Negotiation
• Body (the ‘comment’ of the annotation)
• Target (the resource the Body is ‘about’)
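As a rough sketch of the three-resource model, the Python below writes an Annotation as plain (subject, predicate, object) tuples and then asks which Bodies are 'about' a given Target. The `oac:hasBody` and `oac:hasTarget` predicates are the ones used in this talk; the helper names and the example.org URIs are assumptions for illustration only.

```python
def make_annotation(anno_uri, body_uri, target_uri):
    """Return the triples for one basic Annotation: a type, a Body and a Target."""
    return [
        (anno_uri, "rdf:type", "oac:Annotation"),
        (anno_uri, "oac:hasBody", body_uri),
        (anno_uri, "oac:hasTarget", target_uri),
    ]


def bodies_about(triples, target_uri):
    """Find every Body whose Annotation targets the given resource."""
    annos = {s for s, p, o in triples if p == "oac:hasTarget" and o == target_uri}
    return [o for s, p, o in triples if p == "oac:hasBody" and s in annos]


# Two hypothetical annotations on the same target, one on a different target.
store = (
    make_annotation("http://example.org/anno/1",
                    "http://example.org/comment/1",
                    "http://example.org/canvas/f1r")
    + make_annotation("http://example.org/anno/2",
                      "http://example.org/comment/2",
                      "http://example.org/canvas/f1r")
    + make_annotation("http://example.org/anno/3",
                      "http://example.org/comment/3",
                      "http://example.org/canvas/f1v")
)
```

Because the Annotation is the only thing tying Body to Target, the Body and Target themselves can live anywhere on the web and need no knowledge of each other.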
Basic Model Example
Additional Relationships and Properties
Any of the resources can have additional information attached, such as creator, date of creation, title, etc.
Additional Properties Example
Annotation Types
There can be further types of Annotation, such as a Reply. Example: Replies are Annotations on Annotations.
Annotation Types Example
Inline Information
It is important to be able to have content contained within the Annotation document for Client Autonomy:
• Clients may be unable to mint new URIs for every resource
• Clients may wish to transmit only a single document
• Third parties can generate new URIs if the client does not
The W3C has a Content in RDF specification:
• http://www.w3.org/TR/Content-in-RDF10/
Inline Information: Body
• We introduce a resource identified by a non-resolvable URI, such as a UUID URN, as the Body.
• We then embed the data within the Annotation document using the 'chars' property from the Content in RDF ontology.
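A minimal Python sketch of the inline Body pattern follows. It mints a UUID URN for the Body and attaches the text with `cnt:chars`. Giving the Annotation itself a UUID URN, typing the Body as `cnt:ContentAsText`, and the function name are assumptions of this sketch rather than requirements stated on the slide.

```python
import uuid


def inline_body_annotation(text, target_uri):
    """Build triples for an Annotation whose Body text travels inside the document.

    Returns (annotation_uri, body_uri, triples). Both URIs are freshly minted
    UUID URNs, so the client never needs a web server to publish the Body.
    """
    anno_uri = f"urn:uuid:{uuid.uuid4()}"
    body_uri = f"urn:uuid:{uuid.uuid4()}"  # non-resolvable identifier for the Body
    triples = [
        (anno_uri, "rdf:type", "oac:Annotation"),
        (anno_uri, "oac:hasBody", body_uri),
        (anno_uri, "oac:hasTarget", target_uri),
        (body_uri, "rdf:type", "cnt:ContentAsText"),  # assumed typing for the sketch
        (body_uri, "cnt:chars", text),                # the embedded content itself
    ]
    return anno_uri, body_uri, triples
```

A third party that later stores the comment at a resolvable URI can publish a new Annotation pointing at it, without the original client being involved.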
Inline Body Example
Multiple Targets
There are many use cases for multiple targets for an Annotation:
• Comparison of two or more resources
• Making a statement that applies to all of the resources
• Making a statement about multiple parts of a resource
The OAC Data Model allows for multiple targets by simply having more than one hasTarget relationship.
Multiple Targets Example
Segments of Resources
Most annotations are about part of a resource
Different segments for different media types:
• Text: paragraph, arbitrary span of words
• Image: rectangular or arbitrary shaped area
• Audio: start and end time points, track name/number
• Video: area and time points
• Other: slice of a data set, volume in a 3D object, …
Segments of Resources
Web Architecture Segmentation:
• A URI with a Fragment identifies part of the resource
• Media-specific fragment identifiers; eg XPointer for XML
• W3C Media Fragments URI specification for simple segments of media: http://www.w3.org/TR/media-frags/
We introduce a method of constraining resources:
• Introduce an approach for arbitrarily complex segments that cannot be expressed using Fragments
• Can be applied to Body or Target resource
Segments of Resources: Fragment URIs
URI Fragments are a syntax for creating subsidiary URIs that identify part of the main resource
The syntax is defined per media type
• X/HTML: The named anchor or identified element
  • http://www.example.net/foo.html#namedSection
• XML: An XPointer to the element(s)
  • http://www.example.net/foo.xml#xpointer(/a/b/c)
• PDF: Many options, the two most relevant operations:
  • http://www.example.net/foo.pdf#page=2&viewrect=20,80,50,60
• Plain Text: Either by character position or line position:
  • http://www.example.net/foo.txt#char=0,10
  • http://www.example.net/foo.txt#line=1,5
• …
Segments of Resources: Media Fragments
Media Fragments allow anyone to create URIs that identify part of an image, audio or video resource.
The most common case is for rectangular areas of images:
• http://www.example.org/image.jpg#xywh=50,100,640,480
Link to the full resource as well, for all Fragment URIs
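A client has to split such a URI into the full resource and the rectangle before it can draw anything. The following Python sketch does that for the `xywh` dimension only, accepting the optional `pixel:` and `percent:` unit prefixes from the Media Fragments specification. The function name and return shape are assumptions, and fragments that combine several dimensions (for example `t=` with `xywh=`) are deliberately not handled.

```python
import re
from urllib.parse import urldefrag

# xywh=x,y,w,h with an optional unit prefix; the default unit is pixels.
_XYWH = re.compile(r"^xywh=(?:(pixel|percent):)?(\d+),(\d+),(\d+),(\d+)$")


def parse_xywh(uri):
    """Split a Media Fragment URI into (full_resource_uri, unit, (x, y, w, h)).

    Returns None when the fragment is missing or is not a plain xywh fragment.
    """
    full_uri, fragment = urldefrag(uri)
    match = _XYWH.match(fragment)
    if match is None:
        return None
    unit = match.group(1) or "pixel"
    x, y, w, h = (int(g) for g in match.groups()[1:])
    return full_uri, unit, (x, y, w, h)
```

For the slide's example, `parse_xywh("http://www.example.org/image.jpg#xywh=50,100,640,480")` gives the full image URI, the unit `"pixel"` and the box `(50, 100, 640, 480)`; the full URI is what the link back to the whole resource would point at.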
Media Fragments Example
Complex Constraints
Fragment URIs are not always possible
• Introduce a Constraint that describes the segment of interest
• And a ConstrainedTarget that identifies the segment of interest
• Constraints are entire resources, so can be more expressive
• Constraints may also describe 'contextual' information
Constraint Example
RDF Constraints
Instead of having the information in an external document, it could be within the RDF of the Annotation document.
• We can attach information to the Constraint node
• Or use the Content in RDF specification to include what would have been in the external document
RDF Constraint Example
Constrained Body
The Body may also be constrained in the same way as Targets
Annotation Protocols
Unlike previous systems, Open Annotation does not mandate a protocol.
No reliance on a client/server combination gives the client autonomy.
Instead we promote a publish/subscribe methodology, where annotations may be stored and consumed from anywhere.
Publish/Subscribe Method
[Diagram, built up over three slides: publish, subscribe, consume]
Other Open Annotation Topics
Some other aspects of Open Annotation:
• Dealing with resources that change over time
  • http://arxiv.org/abs/1003.2643
  • http://www.slideshare.net/azaroth42/making-web-annotations-persistent-over-time
• Precedence when using multiple Constraints:
  • http://www.openannotation.org/spec/beta/precedence.html
• Machine Annotations, when the body is structured data intended for machine consumption
  • In the beta spec directly: http://www.openannotation.org/spec/beta/#DM_Structured
BREAK
(Funny?) (Medieval) Picture of a Cat from the Web! http://romantoes.blogspot.com/2009/05/medievalist-cat-came-back.html
Motivating Questions
Many implicit assumptions:
• What is a Manuscript?
• What is its relation to a facsimile?
• What is the relation of a transcription of a facsimile to the original object?
What does this mean for digital tools?
• How do we rethink digital facsimiles in a shared, distributed, global space?
• How do we enable collaboration and encourage engagement?
Ms MurF: 10.5076/e-codices-kba-0003
Motivation
Digital surrogates enable remote research
• Improve preservation of original, and digital preservation of surrogate
• Promotes collaboration via shared annotations and descriptions
A collaborative future:
• Rich landscape of interconnected repositories, with seamless user interfaces
• Improve efficiency and usability through open, shared development
BNF f.fr 113, folio 1 recto
Baseline Requirements
To Realize this Future:
• Need a standardized input format to digital facsimile presentation systems, to allow interoperability between and across repositories
Architectural Requirements:
• Ability to model primarily textual items, where the individual physical instance is an important cultural object
• Alignment of multiple Images, Texts, Commentary and other Content resources per folio
• The Content, and Services that act upon it, are distributed between institutions, and around the web
Domain Requirements
Working at physical item level provides unique challenges!
1. Only parts of pages may be digitized
• Only illuminations digitized
• Fragments of pages
• Multiple fragments per image
Cod. Sang. 1394: 10.5076/e-codices-csg-1394
Domain Requirements
2. Page may not be digitized at all
• Not "interesting" enough
• Digitization destructive
• Page no longer exists
• Page only hypothetical
This page intentionally, but unfortunately, left blank
Domain Requirements
3. Non-rectangular pages
• Fashionable heart shaped manuscripts
• Fragments
• Pages with foldouts
Facsimile of BNF Rothschild 2973 http://www.omifacsimiles.com/brochures/montchen.html
Domain Requirements
4. Alignment of multiple images of same object
• Multi-spectral imaging
• Multiple resolutions
• Image tiling
• Microfilm vs photograph
• Multiple digitizations
Archimedes Palimpsest Multi-Spectral Images http://www.archimedespalimpsest.org/
Domain Requirements
5. Multiple page orders over time
• Rebinding
• Scholarly disagreement on reconstruction
6. Different pages of the manuscript held by different institutions
Cod. Sang. 730: 10.5076/e-codices-csg-0730a
Domain Requirements
7. Transcription of:
• Text
• Music
  • Musical Notation
  • Performance
• Diagrams
Reusing existing resources, such as TEI, where possible
8. Transcriptions both created and stored in a distributed way, with competing versions
Parker CCC 008, f1r
Naïve Approach: Transcribe Images Directly
But how to align multiple images, pages without images, fragments… ?!
Canvas Paradigm
A Canvas is an empty space in which to build up a display
• HTML5, SVG, PDF, … even PowerPoint!
• Can "paint" many different resources, including text, images and audio, on to a Canvas
We can use a Canvas to represent a folio of a manuscript.
Distributed nature is fundamental in the requirements
• Painting resources, commentary and collaboration
• Idea: Use Annotations to do all of those
• Annotations can target the Canvas instead of individual Images
Canvas to Page Relationship
The Canvas's top left and bottom right corners correspond to the corners of a rectangular box around the folio
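One practical consequence is that a region given in Canvas coordinates must be rescaled for each image painted onto that Canvas. The Python sketch below assumes the simplest case, where an image covers the whole Canvas, so a single scale factor per axis is enough. The function name and the sizes used are hypothetical, and results are rounded to whole pixels.

```python
def canvas_to_image(region, canvas_size, image_size):
    """Map an (x, y, w, h) region from Canvas coordinates to image pixels.

    Assumes the image is painted onto the whole Canvas, so the Canvas's
    top-left and bottom-right corners coincide with the image's corners.
    """
    x, y, w, h = region
    canvas_w, canvas_h = canvas_size
    image_w, image_h = image_size
    scale_x = image_w / canvas_w
    scale_y = image_h / canvas_h
    return (round(x * scale_x), round(y * scale_y),
            round(w * scale_x), round(h * scale_y))


# A transcription line located on a 1000 x 2000 Canvas, looked up in a
# 2000 x 4000 pixel digitization of the same folio (sizes are made up).
line_on_canvas = (696, 1319, 565, 44)
line_in_image = canvas_to_image(line_on_canvas, (1000, 2000), (2000, 4000))
```

The same Canvas region can be mapped into any number of digitizations of the folio, which is what lets annotations target the Canvas rather than one particular image.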
OAC Annotations to Paint Images
We can paint the canvas by annotating it with resources.
OAC Annotations to Paint Text
Transcription: Morgan 804
Transcription: Morgan 804
Fragments: Cod Sang 1394
Musical Manuscripts: Parker CCC 008
Missing Pages: Parker CCC 286
Repeated Zones: Frauenfeld Y 112
List/Aggregations for Ordering
Rebinding: BNF f.fr. 113-116
Discovery: Aggregations
Those Annotations could be anywhere on the web!
• Need to be able to discover them!
Introduce a discovery layer of sets of Annotations.
• Currently by type of Annotation, and then by Folio
  • eg: All ImageAnnotations, All text annotations for f1r
• Other divisions possible, just for discovery!
Need a meta discovery layer to find the lists!
• Introduce a "Manifest" resource:
  • List of all of the resources known for the facsimile
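The two discovery layers can be pictured with a small Python sketch: a Manifest that lists the known annotation lists, each described by annotation type and, optionally, folio. The dictionary layout, the type labels and every URI here are invented for illustration; the real model expresses this in RDF.

```python
# Hypothetical Manifest: one entry per list of Annotations known for the facsimile.
MANIFEST = {
    "uri": "http://example.org/ms1/manifest",
    "lists": [
        {"uri": "http://example.org/ms1/images", "type": "ImageAnnotation", "folio": None},
        {"uri": "http://example.org/ms1/text-f1r", "type": "TextAnnotation", "folio": "f1r"},
        {"uri": "http://example.org/ms1/text-f1v", "type": "TextAnnotation", "folio": "f1v"},
    ],
}


def discover_lists(manifest, anno_type, folio=None):
    """Return URIs of the annotation lists a client should fetch.

    A list whose folio is None covers the whole facsimile; calling with
    folio=None asks for every list of the given type.
    """
    return [
        entry["uri"]
        for entry in manifest["lists"]
        if entry["type"] == anno_type
        and (folio is None or entry["folio"] is None or entry["folio"] == folio)
    ]
```

A viewer showing folio 1 recto would call `discover_lists(MANIFEST, "TextAnnotation", "f1r")` and fetch only that one list, instead of crawling the web for annotations.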
SharedCanvas: Data Model
Digital Manuscript Interoperability for Tools and Repositories
Overview:
The Andrew W. Mellon Foundation has funded numerous manuscript digitization projects over several decades
All had in common:
• Inability to share data across silos to satisfy scholarly use
• Inability to leverage existing infrastructure
• No sustainability model for data or access
Goal: Interoperability between repositories and tools
Defining Interoperability
• Break down silos
• Separate data from applications
• Share data models and programming interfaces
• Enable interactions at the tool and repository level
Designing Modular Repositories and Tools
[Diagram. Components: Image Data (Canonical), Metadata (Canonical), Image Viewer, Discovery, Annotation, Transcription, Image Analysis, Tool X? Groupings: Repository, Repository User Interface, 3rd-Party Tools]
Service-based Discovery and Delivery Interactions
• Four primitives currently supported:
  o Discovery
    - New Name?
    - http://dms-dev.stanford.edu/
  o Image Viewing
    - Independent zpr viewer
  o Annotation
    - Digital Mappaemundi
  o Transcription
    - T-PEN
Rendering Implementation
Rendering:
• Design considerations:
  • Easy to reuse and extend, no* server side code
  • Consume model directly from RDF
  • Use existing, well-understood, documented libraries
• Pure JavaScript (Rob)
  • jQuery
  • RDF extension for jQuery
  • Audio Player extension
  • iOS Touch support extension
  • RaphaelJS for SVG (jQuery SVG not as easy, common)
* Except one minimal reflection script to avoid XSS/CORS issues
Rendering Implementation
Process:
• Fetch Manifest, Sequence, plus Lists of Annotations, via AJAX
• Populate menus from Manifest and Sequence
• Fetch any further resources needed (TEI and SVG)
• Generate one or more canvases based on browser size
• Turn Annotation RDF/XML or n3 into JSON object for ease
• Process XPointer, Media Fragments into local structures
• Render annotations using HTML, or SVG if required, once all needed resources have been obtained
• Retrieve commentary annotations, both public (pastebin) and personal (blogger), and render
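The "RDF into JSON object" step can be illustrated with a short sketch. The demos themselves are written in JavaScript; the Python below is not their code, only one plausible way to group already-parsed triples by subject and predicate so a renderer can look things up directly. The function name and the sample data are assumptions.

```python
import json
from collections import defaultdict


def triples_to_json(triples):
    """Group (subject, predicate, object) triples into {subject: {predicate: [objects]}}.

    Objects are kept in a list because a predicate such as oac:hasTarget
    may legitimately occur more than once for the same subject.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        grouped[subject][predicate].append(obj)
    # Convert to plain dicts so the result serializes cleanly with json.dumps.
    return {subject: dict(preds) for subject, preds in grouped.items()}


# Hypothetical parsed annotation with one body and two targets.
parsed = [
    ("urn:uuid:anno-1", "rdf:type", "oac:Annotation"),
    ("urn:uuid:anno-1", "oac:hasBody", "http://example.org/line/37"),
    ("urn:uuid:anno-1", "oac:hasTarget", "http://example.org/canvas/f1r#xywh=696,1319,565,44"),
    ("urn:uuid:anno-1", "oac:hasTarget", "http://example.org/canvas/f1v"),
]
as_json = json.dumps(triples_to_json(parsed), indent=2)
```

With this shape, finding everything an annotation targets is a single dictionary lookup rather than a scan of the whole graph.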
Rendering Implementation
Demos!
• Morgan 804 (transcription as string, detail images)
  • http://www.shared-canvas.org/impl/demo1/
• Worlde's Blisce (audio, TEI transcription)
  • http://www.shared-canvas.org/impl/demo2/
• Selected Walters Museum Manuscripts (ranges, pan/zoom)
  • http://www.shared-canvas.org/impl/demo4/
• Archimedes Palimpsest (multi images, rotation, TEI transcription)
  • http://www.shared-canvas.org/impl/demo5/
Future Work
• Refine model based on community feedback, please!
• Improve implementations:
  • Ease of creation for new canvases and sequences
  • Improve User Interfaces (integrate zoom/pan, persistence)
  • High end technical aspects (zones)
  • Annotation filtering (spam will be an issue)
• Increase the community and adoption!
• Non Manuscript Use Cases:
  • Scientific Papers, Theses/Dissertations
    • http://www.shared-canvas.org/impl/demo3/ & …/demo3b/
  • Digitized Newspapers
  • …
Summary
Distributed Canvas paradigm provides a coherent solution to modeling the layout of medieval manuscripts
• Annotation, and Collaboration, at the heart of the model
• Distribution across repositories for images, text, commentary
• Granular accuracy, from full resource to non-rectangular segment
• Multiple page orders and Discovery via Aggregations
SharedCanvas brings the humanist's primary research objects to their desktop in a powerful, extensible and interoperable fashion
Thank You
Robert Sanderson, @azaroth42
Ben Albritton
Web: http://www.shared-canvas.org/
Paper: http://arxiv.org/abs/1104.2925
Slides: http://slidesha.re/XXXXX
Acknowledgements
DMSTech Group: http://dmstech.group.stanford.edu/
Open Annotation Collaboration: http://www.openannotation.org/