Linked Library Data

Modeling Metadata for the [Semantic] Web

Presented 2010-11-19
Columbia University Digital Library Seminar Series
Corey A Harper

2010-11-19 Harper - Linked Library Data - Columbia University 2

Topical Overview

• Semantic Web Intro
• Linked Open Data
  – Graphs: Entity – Attribute – Value
  – A Few Examples
• Library Data

Topical Overview (cont)

• Linked Library Data
  – SKOS and Authority Control
  – FRBR and Bibliographic Data
  – National Libraries
• Resource Description and Access (RDA)
• Dublin Core Metadata Initiative

Semantic Web

• TBL’s original vision
  – “Weaving the Web” – 1999
• Then: Focus on Machine Reasoning
  – Scientific American Article
• Now: Focus on things & links
  – Reasoning becoming lower level

Semantic Web

• Originally:
  – Metadata standard built on XML
  – Metadata about “Web” things
• Eventually:
  – Metadata about all things
  – Metadata about relationships between things

Semantic Web Terminology

• Resource: Any thing
• Class: Abstraction of a type of thing
• Individual: An instance of a class
• Property: An attribute of an individual
• Ontology: A domain specific collection of classes and properties
• Statement/Triple:
  – A Resource (subject) - Nodes
  – A Property (predicate) - Arcs
  – A Value (object) - Nodes

Semantic Web Terminology

• Graphs: Representations of statements about resources
• Nodes: The Subjects and Objects in a Graph
• Arcs: The Predicates in a Graph
• Literals: “Objects” represented as strings (constant values) rather than things (URI References)
• Domains and Ranges: Constraints on Nodes
• For Example…
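
The terminology above can be sketched in a few lines of Python. This is only an illustration, not a real RDF library: the `URIRef`, `Literal` and `Triple` types are defined here for the purpose, and the example.org URIs are placeholders. The two Dublin Core property URIs are the only real identifiers.

```python
from typing import NamedTuple, Union


class URIRef(str):
    """A thing identified by a URI reference."""


class Literal(str):
    """An object given as a constant string value rather than a URI."""


class Triple(NamedTuple):
    subject: URIRef
    predicate: URIRef
    obj: Union[URIRef, Literal]


# Placeholder example.org URIs, used only for illustration.
BOOK = URIRef("http://example.org/book/weaving-the-web")
AUTHOR = URIRef("http://example.org/person/tim-berners-lee")
DC_TITLE = URIRef("http://purl.org/dc/terms/title")
DC_CREATOR = URIRef("http://purl.org/dc/terms/creator")

graph = [
    Triple(BOOK, DC_TITLE, Literal("Weaving the Web")),  # object is a literal
    Triple(BOOK, DC_CREATOR, AUTHOR),                    # object is another thing
]


def nodes(triples):
    """Nodes are the subjects and objects of the statements."""
    return {t.subject for t in triples} | {t.obj for t in triples}


def arcs(triples):
    """Arcs are the predicates that connect the nodes."""
    return {t.predicate for t in triples}


def literals(triples):
    """Objects that are constant values rather than URI references."""
    return {t.obj for t in triples if isinstance(t.obj, Literal)}
```

Because the creator is a URI rather than a string, further statements can take `AUTHOR` as their subject, which is what turns a list of statements into a graph.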

RDF

• Resource Description Framework
• Formally Begun in 1999
• Ideas from 1995
• Finalized in 2004
• Frighteningly complex at times…
  – “Directed Labeled Graphs”

SemWeb Value Proposition

• Formally Modeled (Meta) Data
• Formal Semantics Declaration
• Increased Granularity compared to record-based Metadata
• Improved Interoperability

“The vast bulk of data to be on the Semantic Web is already sitting in databases … all that is needed [is] to write an adapter to convert a particular format into RDF and all the content in that format is available.”

-Tim Berners-Lee in an interview with the Consortium Standards Bulletin
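
A minimal sketch of the kind of adapter the quote describes, assuming a flat record with a handful of fields. The record, the `BASE` URI and the one-to-one mapping of field names onto Dublin Core terms are all assumptions made for illustration; a real adapter would need a considered mapping for each source format.

```python
# Hypothetical flat record, standing in for a row from an existing database.
record = {
    "id": "b1001",
    "title": "Weaving the Web",
    "creator": "Berners-Lee, Tim",
    "date": "1999",
}

# Assumed base URI for minting identifiers (example.org is a placeholder).
BASE = "http://example.org/bib/"
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def record_to_ntriples(rec):
    """Turn one record into one N-Triples statement per field."""
    subject = f"<{BASE}{rec['id']}>"
    lines = []
    for field, value in rec.items():
        if field == "id":
            continue  # the id becomes the subject URI, not a statement
        lines.append(f'{subject} <{DCTERMS}{field}> "{escape_literal(value)}" .')
    return lines


for line in record_to_ntriples(record):
    print(line)
```

Each field becomes its own statement, which is the increase in granularity over record-based metadata mentioned earlier.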

Linked Open Data

• Use URIs as names for things
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs. so that they can discover more things.

http://www.w3.org/DesignIssues/LinkedData.html
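
Looking up an HTTP URI in practice usually means an ordinary GET request with an `Accept` header asking for an RDF serialization. The sketch below only prepares such a request with Python's standard library and does not send it. The DBpedia URI and the choice of `application/rdf+xml` are illustrative assumptions; which media types a server honours varies.

```python
from urllib.request import Request


def build_lookup(uri, media_type="application/rdf+xml"):
    """Prepare (but do not send) an HTTP GET that asks for RDF about a URI."""
    return Request(uri, headers={"Accept": media_type})


# Illustrative URI; any HTTP URI that names a thing would do.
request = build_lookup("http://dbpedia.org/resource/Columbia_University")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))

# To actually dereference the URI (requires network access):
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       rdf = response.read()
```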

Linked Data Cloud

• Automated generation
  – Comprehensive Knowledge Archive Network (CKAN)
  – Vocabulary of Interlinked Datasets (voiD)
  – Basically, catalog your metadata!
• Recent criticism: data quality

Data in the Cloud

• Hubs in the May 2008 Version:
  – FOAF
  – DBPedia
  – Geonames
  – MusicBrainz
• Myriad Sources coming online:
  – Thomson Reuters
  – New York Times
  – British Broadcasting Corporation
  – Google and Facebook
  – More and More Library Data

DBpedia

• Structured Wikipedia Data
• Genres, Influences, External Links
• Multi-lingual / Multi-script labels
• Rich Semantics
• Many linkages to other datasets

DBpedia

• 3.4 Million “things” described
• Ontology based on “infoboxes”
  – 1.5 million things classified
• Approx. 50,000 “Properties”
  – Approx. 1,200 defined in ontology
• Brief Example

Domain Modeling

• Starting from application / goal / function

“To guide and evaluate our designs, we need objective criteria that are founded on the purpose of the resulting artifact, rather than based on a priori notions of naturalness or Truth.” – Gruber, 1993

• Does this apply to Libraries? FRBRer?

DBPedia Model

• Partial basis in data entry conventions
• Infoboxes and Infobox Templates
• Metadata Entry Format
• Partial source of Ontology
  – Class Structure
  – Vocabulary Design

DBpedia

• 3.4 Million “things” described
• Ontology based on “infoboxes”
  – 1.5 million things classified
  – http://wiki.dbpedia.org/Ontology
• Approx. 50,000 “Properties”
  – Approx. 1,200 defined in ontology

More Examples

• British Broadcasting Corporation
  – Programmes, Music, Wildlife
• Google Refine
• Data.gov and data.gov.uk
• NY Times

What *things* are in our data???

…Library data is extremely complicated

Bibliographic Data

• Rich stores of MARC, MODS, &c.
• Robust Controlled Vocabularies
  – Subject Heading lists
  – Code lists
  – Thesauri
• Emerging data model in FR*

Bibliographic Vocabs

• Bibliographic Ontology
  – Zotero, Omeka, EPrints and Others
• FRBR – unofficial
  – And now Official (Thank you IFLA!)
• ISBD

Library Authority Data

“Include links to other URIs. so that they can discover more things.”

Short of providing and linking to URIs, this *is* authority data.

This is what our authority files are for.

Library Controlled Vocabularies: Benefits

• Reputation - Trusted Tradition
• Mature - Time tested and carefully developed
• General & Comprehensive - Cover large knowledge spaces

SKOS

• Simple Knowledge Organization System

• Properties and Classes for describing Controlled Vocabulary

[Diagram labels: RDF Page, skos:primaryTopic, skos:person]

LCSH in Dublin Core

• Encoding Scheme for DC Subject
• No easy way to draw on equivalent terms and cross-references
• Abstract Model, RDF and SKOS could enable applications to make use of the whole vocabulary
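
As a rough sketch of what "making use of the whole vocabulary" could look like, the snippet below models two concepts with the SKOS properties `prefLabel`, `altLabel` and `broader`, then uses them to expand a search term. The concept URIs, labels and the broader relationship are invented placeholders, not actual LCSH data, and the dictionary is a stand-in for a real RDF store.

```python
# A tiny, hypothetical slice of a subject vocabulary modeled with SKOS-style
# properties. The URIs and labels are placeholders, not real LCSH entries.
concepts = {
    "http://example.org/subjects/c1": {
        "prefLabel": "World Wide Web",
        "altLabel": ["WWW", "W3"],
        "broader": ["http://example.org/subjects/c2"],
    },
    "http://example.org/subjects/c2": {
        "prefLabel": "Hypertext systems",
        "altLabel": [],
        "broader": [],
    },
}


def find_concept(label):
    """Return the URI of the concept whose preferred or alternate label matches."""
    wanted = label.casefold()
    for uri, concept in concepts.items():
        labels = [concept["prefLabel"], *concept["altLabel"]]
        if any(wanted == candidate.casefold() for candidate in labels):
            return uri
    return None


def expand(label):
    """Collect the equivalent and broader labels that a search could also use."""
    uri = find_concept(label)
    if uri is None:
        return []
    concept = concepts[uri]
    terms = [concept["prefLabel"], *concept["altLabel"]]
    terms += [concepts[b]["prefLabel"] for b in concept["broader"]]
    return terms


print(expand("WWW"))
```

A plain encoding-scheme string carries none of these relationships, which is the limitation the second bullet points at.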

LCSH as a Web Service!

• Uses principles of linked data
• lcsh.info -> id.loc.gov
• People noticed when taken down
• Links to French Subject Headings
• URIs for Literal String lookup
• http://id.loc.gov/authorities/label/World Wide Web
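
A small sketch of building such a label lookup URI from a heading string. The base path is the one shown on the slide; percent-encoding the spaces with `urllib.parse.quote` is an assumption about how a client would normally form the request, and the helper name is made up for this example.

```python
from urllib.parse import quote

# Base path as given on the slide.
LABEL_BASE = "http://id.loc.gov/authorities/label/"


def label_lookup_uri(label):
    """Percent-encode a heading string and append it to the base path."""
    return LABEL_BASE + quote(label)


print(label_lookup_uri("World Wide Web"))
```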

Other Vocabularies

• Thesaurus for Economics
• French Subject Headings
• Swedish Subject Headings
• IconClass (not on web yet)
• OCLC Terminology Services
• Dewey Decimal Classification
• Virtual International Authority File

Linked Library Data

• VIAF, LCSH, MARC Codes
• Open Library, XC, Kuali OLE
• Library of Congress, OCLC
• Hungarian, German, British, Swedish National Libraries
• Formalized Efforts: W3C, IFLA & RDA

Kungliga Biblioteket

Image courtesy of Martin Malmsten: http://blog.libris.kb.se/semweb/?p=7

National Széchényi Library

“Our RDFDC, FOAF and SKOS statements are linked together. Our name authority is matched with the DBPedia name files and URI aliases are handled as owl:sameAs statements.” -Adam Horvath
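
To illustrate what handling URI aliases as owl:sameAs statements makes possible, here is a sketch that groups URIs linked by sameAs into sets naming the same thing. All URIs are placeholders, and the grouping function is a generic union-find written for this example rather than anything the library describes using.

```python
# Hypothetical owl:sameAs statements linking a local name authority URI to
# URIs for the same person in other datasets. All URIs are placeholders.
same_as = [
    ("http://example.org/authority/n123", "http://example.net/resource/Some_Author"),
    ("http://example.net/resource/Some_Author", "http://example.com/names/42"),
    ("http://example.org/authority/n999", "http://example.net/resource/Other_Author"),
]


def alias_groups(pairs):
    """Group URIs connected by sameAs links (symmetric and transitive)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


for group in alias_groups(same_as):
    print(sorted(group))
```

Once aliases are grouped, statements made about any one URI in a group can be read as statements about the same person.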

W3C LLD XG

• “Incubator Group”
• Membership:
  – Researchers, Consultants, Librarians
  – National Libraries: Germany, France, LoC, Sweden
  – OCLC & IFLA

W3C LLD XG Goals

• Collecting, Curating and Clustering over 50 Use Cases

• Mining use cases for functional requirements and design patterns

• Recommendations to W3C
  – Should lead to Working Groups

RDA Development

• RDA elements, roles and vocabularies have been provisionally registered

• IFLA FRBRer and ISBD elements and vocabularies have been officially registered

• Discussions about long term maintenance of both RDA and the vocabularies

• Effort to create multi-language RDA Vocabularies

RDA Slides Adapted from Diane Hillmann

RDA Elements Listing

334!

RDA Elements Listing

334!

Base material

Detail: Base Material

Detail: Base Material

URI

RDA Base Material Vocabulary

RDA WEMI Relationships

Detail: RDA WEMI Relationship

Metadata Registries

• Formerly NSDL Registry
  – Now “Open Metadata Registry”
  – Managing Vocabularies
  – Providing Vocabulary Services
• DCMI Registry Community
• DCMI Architecture Forum

DCMI and the Semantic Web

• Collaboration from the start
• Libraries (esp. OCLC) were at the table
• Perception of DCMI as DCMES
  – DCMI = Metadata Vocab / Framework
  – DCMES = Metadata Record Format

DCMI and the Semantic Web

• Every example above had dcterms
• DCMI as Research Institute and Metadata Think Tank
  – Modeling Work
  – Metadata Registries
  – Application Profiles
  – Description Set Profiles
  – Singapore Framework

Changing Role of DCMI

• Mike Bergman at DC2010:
  – Reference Metadata
  – Reference Concepts
  – Mapping Predicates
    • “Mappings should be approximate”
  – Usage Guidelines
• Complement to W3C Standards

Why Does This Matter?

• Our descriptions no longer stand alone!
• Connect our data with the rest of the WEB
• Allow others to reuse more easily
  – FOAF
  – DBPedia
  – Geonames
  – MusicBrainz
  – New York Times
  – Thomson Reuters
  – Government Data - data.gov
  – British Broadcasting Corporation

Conclusions

• Distributed bibliographic control environment
  – Linking Data
  – Focus on identification over description

• “In short, by treating values as non-literal resources and assigning URIs to them we give ourselves (and others) the hooks on which to hang further descriptions.” - Andy Powell
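
The "hooks" idea in the quote can be sketched by describing the same book twice, once with a literal creator and once with a creator that has its own URI. The URIs, the prefixed property names written as plain strings, and the `describe` helper are all simplifications assumed for this illustration.

```python
# Hypothetical description of a book; the URIs are placeholders.
BOOK = "http://example.org/book/1"
PERSON = "http://example.org/person/7"

# Literal value: the creator is only a string, so nothing more can be said about it.
literal_style = [
    (BOOK, "dcterms:creator", '"Berners-Lee, Tim"'),
]

# Non-literal value: the creator is a resource with its own URI, which other
# statements (ours or anyone else's) can take as their subject.
uri_style = [
    (BOOK, "dcterms:creator", PERSON),
    (PERSON, "foaf:name", '"Tim Berners-Lee"'),
    (PERSON, "owl:sameAs", "http://example.net/resource/Tim_Berners-Lee"),
]


def describe(triples, subject):
    """Return every (predicate, object) pair stated about a subject."""
    return [(p, o) for s, p, o in triples if s == subject]


print(describe(literal_style, '"Berners-Lee, Tim"'))  # no further description
print(describe(uri_style, PERSON))                    # name and sameAs link
```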

Endless possibilities

• This barely scratches the surface
• The Giant Global Graph!!
• With more soundly modeled bibliographic and authority data…
  – Terminology Services
  – Context sensitive interfaces
  – Customized Exhibits
  – Mashups
  – Web Services
  – User Profiling
  – Collaboration tools

Continuing Challenges

• Emerging Technology
• Design Patterns
• Complexity (httpRange-14)
• Existing Technical Infrastructure
• Bootstrapping
• Business Cases

More Information

• W3C LLD XG:
  – http://www.w3.org/2005/Incubator/lld/wiki/Main_Page
• ALA LLD Interest Group:
  – http://kcoyle.net/lld-ala.html
• IFLA Semantic Web SIG
  – https://wiki.d-nb.de/x/vA10Ag

Thanks!

[email protected]

Questions?