Interoperable Digitised Content

“Discover, search, extract, link, associate, and view digitised content”

Les Carr

Topics covered …

The brief
Observations
Challenges
On EAD (encoded archival description)
Opportunities

The Brief …

Background:
  There have been many major digitisation projects producing millions of image files
  These are often hosted only on an organisation’s web site with a customised user interface

Libraries would like to provide broader access, for example:
  Content from separate libraries to be compared, e.g. newspapers published in different countries on the same date
  Content to be found, independent of location, that refers to the same event from multiple content sources (e.g. start of the First World War)
  Images to be located that are taken of the same object/place over time from different sources
  Discover references to people, names and places in documents held by separate libraries

We would like to explore the possibility of using Linked Data technologies to facilitate these use cases.
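The first use case (newspapers from different libraries on the same date) can be sketched with plain Python sets standing in for RDF graphs. This is only an illustration under stated assumptions: the library URIs and the sample issues are invented, and `dcterms:date` / `dcterms:spatial` are one possible vocabulary choice, not something the brief prescribes.

```python
from collections import defaultdict

# Dublin Core Terms namespace (a real vocabulary); the example.org URIs are invented.
DCT = "http://purl.org/dc/terms/"

# Triples as each hypothetical library might publish them: (subject, predicate, object).
library_a = {
    ("http://library-a.example.org/issue/1001", DCT + "date", "1914-08-04"),
    ("http://library-a.example.org/issue/1001", DCT + "spatial", "United Kingdom"),
    ("http://library-a.example.org/issue/1002", DCT + "date", "1914-08-05"),
    ("http://library-a.example.org/issue/1002", DCT + "spatial", "United Kingdom"),
}
library_b = {
    ("http://library-b.example.org/zeitung/77", DCT + "date", "1914-08-04"),
    ("http://library-b.example.org/zeitung/77", DCT + "spatial", "Germany"),
}


def issues_by_date(*graphs):
    """Merge the graphs (set union) and group issue URIs by publication date."""
    merged = set().union(*graphs)
    by_date = defaultdict(set)
    for subject, predicate, obj in merged:
        if predicate == DCT + "date":
            by_date[obj].add(subject)
    return by_date


def comparable_issues(*graphs):
    """Return only the dates on which more than one issue was published."""
    return {d: sorted(s) for d, s in issues_by_date(*graphs).items() if len(s) > 1}
```

Calling `comparable_issues(library_a, library_b)` returns the dates shared by issues from both sources. Merging is a simple union only because both libraries use the same property for the date, which is where shared vocabularies matter.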

Observations …

Each digitisation project has tended to end up with its own silo.

The BBC is transforming and at least partly becoming a library – there is significant web access to previously broadcast content

An organisation needs to adopt a form of persistent identifier for each asset (as per 1st principle) & then publish (2nd principle)

A German funding agency mandates that there are persistent identifiers for digitised images

It's not clear what the benefits of RDF are, but it seems to be a fast and easy way of publishing

It is good practice to separate (or distinguish between) content and the way to present it – RDF can help with this

Helpful to have availability information about the item in RDF (e.g. whether it is online or not)

Publish in JSON as well as RDF
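Several of these observations (a persistent identifier per asset, availability information, publishing in both RDF and JSON) can be combined in one small sketch. Everything named here is an assumption for illustration: the base URI, the `availableOnline` property and the sample asset are hypothetical, and the JSON output is only loosely modelled on JSON-LD.

```python
import json

BASE = "http://data.library.example.org/asset/"   # hypothetical persistent base URI
DCT = "http://purl.org/dc/terms/"                 # Dublin Core Terms (real vocabulary)
EX = "http://data.library.example.org/terms/"     # hypothetical local vocabulary


def mint_uri(local_id):
    """Derive the persistent URI from a stable local identifier, never from a file path."""
    return BASE + local_id


def describe(local_id, title, online):
    """Return the description as (subject, predicate, literal) triples."""
    uri = mint_uri(local_id)
    return [
        (uri, DCT + "title", title),
        (uri, EX + "availableOnline", "true" if online else "false"),
    ]


def to_ntriples(triples):
    """Serialise triples with single-line plain string literals as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        # Escape backslashes first, then double quotes.
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines)


def to_json(triples):
    """Serialise the same triples as a JSON object keyed by predicate, with the URI as @id."""
    doc = {"@id": triples[0][0]}
    for _, p, o in triples:
        doc[p] = o
    return json.dumps(doc, indent=2)
```

Both serialisations are generated from the same list of triples, so the content stays separate from any one way of presenting it.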

Challenges …

Often a single organisation may have multiple separate silos of digitised content
  There may be inconsistencies in many ways, such as the way they are described, searched and navigated, and the rights may differ

Most organisations have an internal challenge to link their existing silos - in addition to the obvious external challenge in linking with other organisations

Rights:
  Out of copyright does not mean there are no restrictions
  Metadata rights are not necessarily the same as content rights
  Different organisations publish in different ways or modes
  Deals with rights owners may be “time limited”, but once info is published into the public domain it's not easy to retract it
  How do we record rights in RDF?
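One possible answer to the last question, offered as a sketch and not an agreed model: attach separate licence statements to the content and to its metadata record, and record any time limit as a date. `dcterms:license` is a real Dublin Core property; the `describes` and `licenceExpires` properties, the URIs and the dates below are hypothetical.

```python
from datetime import date

DCT = "http://purl.org/dc/terms/"               # Dublin Core Terms (real vocabulary)
EX = "http://data.library.example.org/terms/"   # hypothetical local vocabulary

IMAGE = "http://data.library.example.org/asset/img-000123"     # the digitised content
RECORD = "http://data.library.example.org/record/img-000123"   # its metadata record

triples = [
    # Content and metadata carry separate licence statements.
    (IMAGE, DCT + "license", "http://rights.example.org/licensed-by-agreement"),
    (RECORD, DCT + "license", "http://creativecommons.org/publicdomain/zero/1.0/"),
    (RECORD, EX + "describes", IMAGE),
    # Hypothetical property: last day the agreement with the rights owner applies.
    (IMAGE, EX + "licenceExpires", "2030-12-31"),
]


def may_publish(subject, on_day, graph):
    """True if the subject has a licence and any recorded expiry date has not passed."""
    has_licence = any(s == subject and p == DCT + "license" for s, p, _ in graph)
    expiries = [date.fromisoformat(o) for s, p, o in graph
                if s == subject and p == EX + "licenceExpires"]
    return has_licence and all(on_day <= expiry for expiry in expiries)
```

With this data, `may_publish(IMAGE, date(2031, 1, 1), triples)` is false while the metadata record remains publishable. Recording the expiry only tells the publisher when to stop; it does not solve the retraction problem for copies already harvested by others.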

Challenges …

Some organisations have a wide range of types of digitised content: such as letters, sheet music, pictures, documents, newspapers, books, pamphlets and also born digital content

The British Museum (& others) measures contact time on its web site – if it publishes data then other sites may grab the user and so its own site traffic may fall significantly

What are measures of success for a library if it exposes its data?

What is the future role for a library in the “supply chain”? As an illustration, users may no longer go to a Mapping Agency, so how does it demonstrate that it provides a valued service?

How do you demonstrate to funders that you are successful in a world of linked data?

Challenges …

If we convert several existing silos into RDF how might these be combined to produce a coherent user experience?

How might we create an RDF representation of METS? Typically each METS schema would depend on a model of the content that is being described, and these are different

Some digitisation projects are externally commercially funded and “publication” may compete with the funder’s business model

On EAD (encoded archival description)

EAD describes in a rich manner the content in an archive

Can RDF cope with the levels, relationships and rich structure in EAD?

Can we create a “one fits all” model of EAD in RDF?

Some organisations share DC even though they retain MARC, so maybe some RDF might be shared about an EAD archive while a full EAD is retained.

Some organisations have many (e.g. ~3000) rich EAD silos – how might these be “cross walked”?

How do we share large scale resources in archives & manuscripts?

(Let's not forget TEI – text encoding is often applied to manuscripts … )
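To make the question about levels and structure concrete, here is a minimal sketch of sharing a thin RDF view of an EAD hierarchy while the full EAD is retained. It assumes a simplified, namespace-free EAD fragment using the `archdesc`, `dsc`, `c01`/`c02`, `did` and `unittitle` elements; the URIs, the `level` property and the sample finding aid are invented, and `dcterms:hasPart` is just one candidate for expressing the parent/child relationship.

```python
import xml.etree.ElementTree as ET

DCT = "http://purl.org/dc/terms/"                 # Dublin Core Terms (real vocabulary)
EX = "http://data.archive.example.org/terms/"     # hypothetical local vocabulary
BASE = "http://data.archive.example.org/unit/"    # hypothetical URI base

SAMPLE = """
<ead>
  <archdesc level="fonds">
    <did><unittitle>Papers of an example family</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters, 1914</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

# EAD components are either unnumbered <c> or numbered <c01> to <c12>.
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}


def walk(element, path, triples):
    """Emit triples for one EAD unit, then recurse into its child components."""
    uri = BASE + path
    triples.append((uri, DCT + "title", element.findtext("did/unittitle", default="")))
    triples.append((uri, EX + "level", element.get("level", "")))
    # Components sit directly under the unit, or under <dsc> for the top-level <archdesc>.
    container = element.find("dsc") if element.find("dsc") is not None else element
    children = [child for child in container if child.tag in COMPONENT_TAGS]
    for index, child in enumerate(children, start=1):
        child_path = f"{path}/{index}"
        triples.append((uri, DCT + "hasPart", BASE + child_path))
        walk(child, child_path, triples)
    return triples


def ead_to_triples(xml_text, root_id):
    """Convert the <archdesc> tree of an EAD document into a flat list of triples."""
    archdesc = ET.fromstring(xml_text).find("archdesc")
    return walk(archdesc, root_id, [])
```

`ead_to_triples(SAMPLE, "fonds1")` yields a title, a level and a `hasPart` link for each unit. The sketch keeps only titles and levels, so most of the richness of EAD is deliberately left behind in the source file.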

Opportunities …

Start with some simple collaboration …

While there may be distinct differences by organisation or area, ontologies can be adopted in several common areas:
  Person
  Time and time frames
  Location and changing names over time
  Events

Authority control can form a sound basis for cross referencing across collections and organisations – we could collaborate on shared published authority files

Establish a set of common content models, such as for newspapers and other content types
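Cross referencing through a shared authority file can be sketched as follows. The assumptions are loud ones: the authority URIs and local record URIs are invented, and `owl:sameAs` is used only because it is widely known. It is a strong identity claim, and a weaker link such as `skos:exactMatch` may be more appropriate in practice.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"   # real OWL property
AUTH = "http://authority.example.org/person/"          # hypothetical shared authority file

# Each hypothetical library links its own person records to the shared authority URI.
library_a = [
    ("http://library-a.example.org/person/p17", OWL_SAME_AS, AUTH + "0001"),
    ("http://library-a.example.org/person/p18", OWL_SAME_AS, AUTH + "0002"),
]
library_b = [
    ("http://library-b.example.org/agent/883", OWL_SAME_AS, AUTH + "0001"),
]


def cross_references(*graphs):
    """Group local records by authority URI, keeping entries found in more than one record."""
    by_authority = defaultdict(set)
    for graph in graphs:
        for subject, predicate, obj in graph:
            if predicate == OWL_SAME_AS:
                by_authority[obj].add(subject)
    return {auth: sorted(local) for auth, local in by_authority.items() if len(local) > 1}
```

`cross_references(library_a, library_b)` returns the authority entries that connect records across collections. Neither library needs to know about the other, only about the shared authority file.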

Opportunities …

Libraries may be able to adapt the BBC approach in mixing in-house and external resources

There is interest in exploring the use of EAD and RDF

From the brief:
  Content from separate libraries is accessed (e.g. newspapers published in different countries on the same date)
  Content to be found, independent of location, that refers to the same event from multiple content sources (e.g. start of the First World War)
  Images to be located that are taken of the same object/place over time from different sources
  Discover references to people, names and places in documents held by separate libraries
