data designed for discovery
![Page 1: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/1.jpg)
LITA National Forum 2015
Data Designed for Discovery
Roy Tennant
OCLC Research
![Page 2: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/2.jpg)
Cataloging Unchained!
![Page 3: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/3.jpg)
THE PAST AND PRESENT SITUATION
![Page 4: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/4.jpg)
The Canonical Entity (of the past)
![Page 5: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/5.jpg)
The Canonical Entity (of the present)
![Page 6: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/6.jpg)
• A collection of statements…
• Taken from the piece itself…
• Sometimes “enhanced” with inferred parentheticals (e.g., [1975])…
• Or additional statements not on the piece (e.g., subject headings)
• Where punctuation, which may or may not be present, is used (inconsistently) for structure
• Mostly uncontrolled text strings that are only loosely connected to anything else
The Classic Bib Record
MARC is machine readable,
NOT machine understandable!
![Page 7: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/7.jpg)
THE PROBLEM
![Page 8: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/8.jpg)
• Identification Problems:
  – “The Hamlet Problem” (titles aren’t enough)
  – “The Wang/Li/Zhang Problem” (names aren’t enough)
• Linkage Problems:
  – “The Web Problem” (text strings aren’t enough, you need links)
  – “The Language Problem” (the right translation for a given user in a way they can understand)
• Quality Problems:
  – “The Legacy Problem” (strings are not controlled terms; often, they cannot be turned into them)
Actually, A Number of Problems
![Page 9: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/9.jpg)
![Page 10: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/10.jpg)
![Page 11: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/11.jpg)
First, define ALL
THE THINGS
![Page 12: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/12.jpg)
entity /ˈɛntɪti/ noun
a thing with distinct and independent existence.

relationship /rɪˈleɪʃ(ə)nʃɪp/ noun
the way in which two or more people or things are connected.
![Page 13: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/13.jpg)
Record
Title: "War and Peace"
Author: "Leo Tolstoy 1828-1910"
ISBN: 0307266931

Entity (http://worldcat.org/entity/work/id/115206288)
Type: Work
Name: "War and Peace"
Author: http://worldcat.org/entity/person/id/1234

Entity (http://worldcat.org/entity/person/id/1234)
Type: Person
Name: "Leo Tolstoy"
Born: 1828
Died: 1910
Birthplace: http://worldcat.org/entity/place/id/8976

Entity (http://worldcat.org/entity/place/id/8976)
Type: Place
Name: "Yasnaya Polyana"
SameAs: http://geonames.org/468686
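As a rough illustration of what the move from a record to linked entities buys, the sketch below stores the three entities from this slide in a plain Python dictionary keyed by URI and follows the links from the work to the author's birthplace. The `ENTITIES` store, the lower-case property names, and the `follow` helper are assumptions made for this example; they are not an OCLC API.

```python
# Hypothetical in-memory entity store keyed by URI; values mirror the slide.
ENTITIES = {
    "http://worldcat.org/entity/work/id/115206288": {
        "type": "Work",
        "name": "War and Peace",
        "author": "http://worldcat.org/entity/person/id/1234",
    },
    "http://worldcat.org/entity/person/id/1234": {
        "type": "Person",
        "name": "Leo Tolstoy",
        "born": 1828,
        "died": 1910,
        "birthplace": "http://worldcat.org/entity/place/id/8976",
    },
    "http://worldcat.org/entity/place/id/8976": {
        "type": "Place",
        "name": "Yasnaya Polyana",
        "sameAs": "http://geonames.org/468686",
    },
}


def follow(uri, *properties):
    """Walk a chain of properties, dereferencing each URI found in the store."""
    entity = ENTITIES[uri]
    for prop in properties:
        value = entity[prop]
        # A value that is itself a key in the store is a link to another entity;
        # anything else (a name, a year, an external URI) is returned as-is.
        entity = ENTITIES[value] if value in ENTITIES else value
    return entity


work_uri = "http://worldcat.org/entity/work/id/115206288"
birthplace = follow(work_uri, "author", "birthplace")
print(birthplace["name"], birthplace["sameAs"])
```

With the flat record, the author is only the string "Leo Tolstoy 1828-1910"; with entities, the same question ("where was the author of this work born?") is answered by following two links.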
![Page 14: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/14.jpg)
Entities of Initial Focus
Entity types: person, place, object, concept, organization, work
Initial focus: work, person
![Page 15: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/15.jpg)
Relationships between Entities are established
Entity types: person, place, object, concept, organization, work
Relationships: subject, item, availability, author
![Page 16: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/16.jpg)
Using authoritative sources whenever possible
![Page 17: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/17.jpg)
And linking to other authoritative data sources
![Page 18: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/18.jpg)
“Shredding”
![Page 19: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/19.jpg)
From Records to Entities: Works
![Page 20: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/20.jpg)
![Page 21: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/21.jpg)
![Page 22: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/22.jpg)
![Page 23: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/23.jpg)
![Page 24: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/24.jpg)
LCSH = Mixed entities
Headings before conversion
600 10 Bennet, Elizabeth (Fictitious character) $v Fiction.
600 10 Darcy, Fitzwilliam (Fictitious character) $v Fiction.
650 0 Gentry $z England $v Fiction.
650 0 Social classes $z England $v Fiction.
650 0 Young women $v Fiction.
650 0 Mate selection $v Fiction.
650 0 Courtship $v Fiction.
650 0 Sisters $v Fiction.
651 0 England $x Social life and customs $y 19th century $v Fiction.
Person/Character
Genre
Place
Topic
Time
![Page 25: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/25.jpg)
FAST = Each entity in separate field
Entity/Facet
Time Period: 1800 - 1899
Person: Bennet, Elizabeth (Fictitious character); Darcy, Fitzwilliam (Fictitious character)
Topic: Courtship; Gentry; Manners and customs; Mate selection; Sisters; Social classes; Young women
Place: England
Form/Genre: Fiction
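The step from mixed LCSH strings to one facet per entity type can be approximated by splitting each heading on its subfield codes. The sketch below is only a first pass under stated assumptions: the tag-to-facet and subfield-to-facet tables are guesses made for this example, and it does none of the vocabulary mapping that the real FAST conversion performs (which is why it yields "Social life and customs" and "19th century" rather than the "Manners and customs" and "1800 - 1899" shown on the slide).

```python
import re
from collections import defaultdict

# Assumed mappings for this sketch: the MARC tag decides the facet of the
# main heading, and the subfield code decides the facet of each subdivision.
TAG_FACET = {"600": "Person", "650": "Topic", "651": "Place"}
SUBFIELD_FACET = {"x": "Topic", "y": "Time Period", "z": "Place", "v": "Form/Genre"}

HEADINGS = [
    "600 10 Bennet, Elizabeth (Fictitious character) $v Fiction.",
    "600 10 Darcy, Fitzwilliam (Fictitious character) $v Fiction.",
    "650 0 Gentry $z England $v Fiction.",
    "650 0 Social classes $z England $v Fiction.",
    "650 0 Young women $v Fiction.",
    "650 0 Mate selection $v Fiction.",
    "650 0 Courtship $v Fiction.",
    "650 0 Sisters $v Fiction.",
    "651 0 England $x Social life and customs $y 19th century $v Fiction.",
]


def facet_headings(lines):
    """Group heading parts by facet, removing duplicates across headings."""
    facets = defaultdict(set)
    for line in lines:
        # "650 0 Gentry $z England" -> tag, indicators, remaining heading text
        tag, _indicators, body = line.split(" ", 2)
        # Splitting on "$x" style codes keeps the codes in the result list:
        # [main term, code, term, code, term, ...]
        parts = re.split(r"\s*\$(\w)\s*", body.strip())
        facets[TAG_FACET[tag]].add(parts[0].rstrip("."))
        for code, term in zip(parts[1::2], parts[2::2]):
            facets[SUBFIELD_FACET[code]].add(term.rstrip("."))
    return {facet: sorted(terms) for facet, terms in facets.items()}


facets = facet_headings(HEADINGS)
for facet, terms in facets.items():
    print(f"{facet}: {'; '.join(terms)}")
```

Note how "Fiction" and "England", repeated across many headings, collapse to a single value in their own facet once each part is treated as a separate entity.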
![Page 26: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/26.jpg)
Creating linked data assertions from record-oriented data
MARC Record
Enhanced WorldCat
MARC Record
MARC Records
• FRBR Clustering
• String matching with controlled vocabularies
• Addition of standard identifiers
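To make the clustering idea concrete, here is a deliberately naive sketch that groups records into works by a normalized author/title key. It is not the FRBR clustering used for WorldCat, which is considerably more involved; the `normalize` rules and the three sample records are assumptions invented for the example.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def cluster_by_work(records):
    """Group record ids under a (normalized author, normalized title) key."""
    clusters = defaultdict(list)
    for record in records:
        key = (normalize(record["author"]), normalize(record["title"]))
        clusters[key].append(record["id"])
    return dict(clusters)


# Hypothetical records that differ only in case and trailing punctuation.
records = [
    {"id": "r1", "author": "Tolstoy, Leo", "title": "War and Peace"},
    {"id": "r2", "author": "TOLSTOY, LEO.", "title": "War and peace /"},
    {"id": "r3", "author": "Austen, Jane", "title": "Pride and Prejudice"},
]

clusters = cluster_by_work(records)
for key, ids in clusters.items():
    print(key, ids)
```

A key like this breaks down exactly where the "Hamlet" and "Wang/Li/Zhang" problems begin, which is why string matching against controlled vocabularies and standard identifiers are listed alongside clustering.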
![Page 27: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/27.jpg)
OCLC Production Services
External OCLC Research Systems
Internal OCLC Research Resources
Enhanced WorldCat
WORKS
Kindred Works
Classify
Identities
FictionFinder
Cookbook Finder
LCSH
FAST
VIAF
GMGPC
GSAFD
GTT
DDC
LCTGM
MeSH
Linked Data Entities
![Page 28: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/28.jpg)
Creating linked data assertions from record-oriented data
MARC Record
Enhanced WorldCat
MARC Record
Persons
Organizations
Places
Concepts
Events
Works
MARC Records RDF Entities
• FRBR Clustering
• String matching with controlled vocabularies
• Addition of standard identifiers
![Page 29: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/29.jpg)
Creating linked data assertions from record-oriented data
MARC Record
Enhanced WorldCat
MARC Record
Persons
Organizations
Places
Concepts
Events
Works
MARC Records RDF Entities Triples
• FRBR Clustering
• String matching with controlled vocabularies
• Addition of standard identifiers
Subject Predicate Object (one row per assertion)
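The last step in this pipeline, from entity descriptions to triples, amounts to emitting one subject-predicate-object statement per property value. A minimal sketch follows, assuming entities are held as Python dictionaries; the predicate names are placeholders for this example, not terms from a real vocabulary such as Schema.org.

```python
def to_triples(uri, description):
    """Turn one entity description into (subject, predicate, object) tuples."""
    triples = []
    for predicate, value in description.items():
        # A multi-valued property becomes one triple per value.
        values = value if isinstance(value, list) else [value]
        for obj in values:
            triples.append((uri, predicate, obj))
    return triples


person_uri = "http://worldcat.org/entity/person/id/1234"
person = {
    "type": "Person",
    "name": "Leo Tolstoy",
    "born": 1828,
    "died": 1910,
    "birthplace": "http://worldcat.org/entity/place/id/8976",
}

triples = to_triples(person_uri, person)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

Each tuple is an independent assertion, so statements from different sources about the same URI can be pooled, compared, or scored without reference to the record they came from.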
![Page 30: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/30.jpg)
A series of recent Google Research papers describe the use of probabilistic models and machine learning to assess the truth of statements made by multiple sources.
• Li, X., Dong, X. L., Lyons, K., Meng, W., Srivastava, D. (2013). Truth Finding on the Deep Web: Is the Problem Solved?
• Dong, X. L., Gabrilovich, E., Heitz, G., Horn, W., Murphy, K., Sun, S., Zhang, W. (2013). From Data Fusion to Knowledge Fusion.
• Dong, X. L., Murphy, K., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., ... & Zhang, W. (2014). Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion.
• Dong, X. L., Gabrilovich, E., Murphy, K., Dang, V., Horn, W., … & Zhang, W. (2015). Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources.
Estimating “Truthiness” of assertions
![Page 31: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/31.jpg)
Data Sources
Extraction
Knowledge Triples
Scored Triples
Fusion
Knowledge Vault
Enhanced WorldCat
VIAF
FAST
Knowledge Vault data flow
Extractor
Extractor
Extractor
Fusers
Collective
Etc.
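The data flow above ends in scored triples. As a much-simplified stand-in for the learned fusion models in the Knowledge Vault papers, the sketch below combines per-source confidences with a noisy-OR rule, which treats the sources as independent. The source names come from the diagram, but the triples and confidence values are invented for illustration.

```python
from collections import defaultdict
from math import prod

# Hypothetical extractions: (source, triple, extractor confidence).
EXTRACTIONS = [
    ("Enhanced WorldCat", ("person/1234", "born", 1828), 0.6),
    ("VIAF", ("person/1234", "born", 1828), 0.5),
    ("FAST", ("person/1234", "born", 1829), 0.3),
]


def score_triples(extractions):
    """Fuse the confidences of identical triples into one score per triple."""
    confidences = defaultdict(list)
    for _source, triple, confidence in extractions:
        confidences[triple].append(confidence)
    # Noisy-OR: a triple is wrong only if every supporting extraction is wrong.
    return {
        triple: 1 - prod(1 - c for c in values)
        for triple, values in confidences.items()
    }


scores = score_triples(EXTRACTIONS)
for triple, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{score:.2f}", triple)
```

Two moderately confident sources that agree outrank a single dissenting one, which is the intuition behind estimating the "truthiness" of conflicting assertions.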
![Page 32: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/32.jpg)
OK…BUT SO WHAT?
![Page 33: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/33.jpg)
Improving Discovery
Mockup
![Page 34: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/34.jpg)
![Page 35: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/35.jpg)
![Page 36: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/36.jpg)
![Page 37: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/37.jpg)
![Page 38: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/38.jpg)
![Page 39: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/39.jpg)
Solving the Hamlet Problem!
![Page 40: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/40.jpg)
![Page 41: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/41.jpg)
Embedding Authority Control
Solving the Wang/Li/Zhang Problem!
![Page 42: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/42.jpg)
Title: Journey to the West
Language: English
Translator: Anthony C. Yu
Date: 1977
IsTranslationOf:

Title: Journey to the West
Language: English
Translator: W. J. F. Jenner
Date: 1982-1984
IsTranslationOf:

Title: 西遊記
Language: Chinese
Author: 吳承恩
Created: 1592
HasTranslation:

Title: Tay du ky binh khao
Language: Vietnamese
Translator: Phan Quan
Date: 1980
IsTranslationOf:

Title: 西遊記
Language: Japanese
Translator: 中野美代子
Date: 1986
IsTranslationOf:

Title: Pilgerfahrt
Language: German
Translator: Georgette Boner
Date: 1983
IsTranslationOf:
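Once translations are linked to the original work, choosing what to show a given user becomes a simple lookup. The sketch below uses the data from this slide; the `"orig"` identifier, the field names, and the `editions_for` helper are assumptions for the example, since the slide leaves the link targets blank.

```python
# The original work, with a hypothetical local identifier.
ORIGINAL = {"id": "orig", "title": "西遊記", "language": "Chinese",
            "author": "吳承恩", "created": "1592"}

# Each translation points back to the original via "is_translation_of".
TRANSLATIONS = [
    {"title": "Journey to the West", "language": "English",
     "translator": "Anthony C. Yu", "date": "1977", "is_translation_of": "orig"},
    {"title": "Journey to the West", "language": "English",
     "translator": "W. J. F. Jenner", "date": "1982-1984", "is_translation_of": "orig"},
    {"title": "Tay du ky binh khao", "language": "Vietnamese",
     "translator": "Phan Quan", "date": "1980", "is_translation_of": "orig"},
    {"title": "西遊記", "language": "Japanese",
     "translator": "中野美代子", "date": "1986", "is_translation_of": "orig"},
    {"title": "Pilgerfahrt", "language": "German",
     "translator": "Georgette Boner", "date": "1983", "is_translation_of": "orig"},
]


def editions_for(language):
    """Return translations in the user's language, or the original if none exist."""
    matches = [
        t for t in TRANSLATIONS
        if t["is_translation_of"] == ORIGINAL["id"] and t["language"] == language
    ]
    return matches or [ORIGINAL]


for edition in editions_for("English"):
    print(edition["title"], "translated by", edition["translator"])
```

Note that the Japanese translation shares its title string with the Chinese original, so only the language and the translation link tell the two apart.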
![Page 43: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/43.jpg)
Mockup
![Page 44: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/44.jpg)
Mockup
Solving the Language Problem!
![Page 45: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/45.jpg)
Exposing the Quality Problem
![Page 46: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/46.jpg)
• Requires a new kind of thinking: “sets of assertions about something” NOT a “record”
• Requires much more than simply translating a record from MARC to a new format
• There are things we can do now to make MARC “linked data ready”
• Quality is a pursuit, not an achievable goal
It’s Not the Record, It’s the Linkage
![Page 47: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/47.jpg)
Why OCLC wants to manage entities
Connectivity
– Entity recognition helps to connect library data to the networked environment/web
  • e.g., Schema.org, BIBFRAME, etc.
Efficiency & Quality
– Facilitates efficient creation of quality metadata
Creativity
– Opens the door to creative reuse of library data
![Page 48: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/48.jpg)
• Person Entity Lookup service
• Using EntityJS as one way for humans to interact with, evaluate, and potentially correct linked data
• Others as identified…
Piloting New Services
Photo by drivethrucafe, https://www.flickr.com/photos/128758398@N07/, CC BY-SA 2.0
![Page 49: Data Designed for Discovery](https://reader035.vdocument.in/reader035/viewer/2022070521/58f04bd21a28ab19638b45e7/html5/thumbnails/49.jpg)
Thank you!
Roy [email protected]/roytennant/
©2015 OCLC This work is licensed under a Creative Commons Attribution 4.0 International License. Suggested attribution: This work uses content from “Data Designed for Discovery” © OCLC, used under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0/.