multilingualism ifla 2014 08

IFLA - Lyon, France 19 August 2014 Janifer Gatenby Multilingualism in WorldCat and VIAF Working with Karen Smith-Yoshimura, Robert Bremer, Eric Childress, Jean Godby, Richard Greene, JD Shipengrover, Gail Thornburg, Jenny Toves, Diane Vizine Goetz, Shenghui Wang, Jay Weitz

Upload: janifer-gatenby

Post on 14-Dec-2014

DESCRIPTION

OCLC's 3 overlapping projects aim to generate true multi-lingual displays and to generate translation records for sharing via VIAF.

TRANSCRIPT

Page 1: Multilingualism ifla 2014 08

IFLA - Lyon, France 19 August 2014

Janifer Gatenby

Multilingualism in WorldCat and VIAF

Working with Karen Smith-Yoshimura, Robert Bremer, Eric Childress, Jean Godby, Richard Greene, JD Shipengrover, Gail Thornburg, Jenny Toves, Diane Vizine Goetz, Shenghui Wang, Jay Weitz

Page 2: Multilingualism ifla 2014 08

WorldCat Today

• Resources in nearly all languages

• Contributed by more than 20,000 libraries worldwide

• More than half the database is for works not in English

Languages

English, German, French, Spanish, Chinese, Dutch, Japanese, Russian, Arabic, 469 others

Page 3: Multilingualism ifla 2014 08

• Bibliographic Records
– Hybrid records
– Parallel records

• Clustered at Work level (FRBR)

WorldCat Today

Page 4: Multilingualism ifla 2014 08

Existing Architecture

[Diagram. Labels: Authors; Subj; Classif; Holdings; Bibliographic record; Work cluster; Content cluster; Manifestation cluster]

Page 5: Multilingualism ifla 2014 08

Complementary Initiatives

• Work Level Record

• GLIMIR: Manifestation & Content Clusters

• Multi-lingual Bibliographic Structure

Page 6: Multilingualism ifla 2014 08

Objective: Work Level Record

Create a consolidated metadata summary for the content of a work

Page 7: Multilingualism ifla 2014 08

Work Level Record

http://www.oclc.org/research/activities/workrecs.html

Coming Q1 2015

Page 8: Multilingualism ifla 2014 08

GLIMIR: Objective

Create better work presentations

Page 9: Multilingualism ifla 2014 08

• The Content Cluster
– Enables better work record displays by reducing the number of lines that display for large works
– Enables a choice of format and presents the formats that could be acceptable substitutes
– Consolidates holdings for identical content

• The Manifestation Cluster is important
– Consolidates holdings at manifestation level
– In the short term allows the record catalogued in the language of the interface to be chosen for display
– Reduces apparent duplication
– Allows a more accurate count of the number of manifestations in WorldCat (as opposed to the number of records)

GLIMIR

Users like C

Cataloguers & scholars like C
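The two roles of the clusters described above (consolidating holdings and choosing the record catalogued in the interface language) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BibRecord` fields, the cluster keys and the sample data are hypothetical and do not reflect OCLC's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BibRecord:
    record_id: str
    manifestation_id: str      # hypothetical manifestation cluster key
    content_id: str            # hypothetical content cluster key
    cataloguing_language: str  # language the record was catalogued in
    holdings: int


def consolidate_holdings(records, level):
    """Sum holdings per cluster; level is 'manifestation_id' or 'content_id'."""
    totals = defaultdict(int)
    for rec in records:
        totals[getattr(rec, level)] += rec.holdings
    return dict(totals)


def record_for_interface(records, manifestation_id, interface_language):
    """Prefer the record catalogued in the interface language.

    Falls back to the first record of the cluster when none matches.
    """
    cluster = [r for r in records if r.manifestation_id == manifestation_id]
    if not cluster:
        return None
    for rec in cluster:
        if rec.cataloguing_language == interface_language:
            return rec
    return cluster[0]


# Hypothetical sample: three records, two manifestations, one content cluster.
records = [
    BibRecord("r1", "m1", "c1", "eng", 40),
    BibRecord("r2", "m1", "c1", "fre", 12),
    BibRecord("r3", "m2", "c1", "ger", 5),
]

by_manifestation = consolidate_holdings(records, "manifestation_id")
by_content = consolidate_holdings(records, "content_id")
```

Counting the keys of `by_manifestation` gives the number of manifestations (two here) rather than the number of records (three), which is the distinction the last bullet draws.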

Page 10: Multilingualism ifla 2014 08

Manifestation Clustering

So far 103 million records processed (about 30%)

Page 11: Multilingualism ifla 2014 08

Manifestation Cluster Opened

Page 12: Multilingualism ifla 2014 08

SRU Search: Loti Pêcheur d’islande (Work ID 21536567)

Level            Records   Holdings
Work             18        148
Content          14        143
Manifestation    7         115

Page 13: Multilingualism ifla 2014 08

Objective: Improve displays; surface translations

Multilingual Bibliographic Structure Project

Page 14: Multilingualism ifla 2014 08

• Creates true multi-lingual displays
– At work and manifestation levels
– Using all available data instead of “most appropriate record”
– Generates data

• Corrects many of the 28 million records coded “und” (language undetermined)

• Better control and linking of translations

• Input to refinement of work clusters

• Smarter data storage

Multilingual Bibliographic Structure Project

Page 15: Multilingualism ifla 2014 08

• WorldCat.org selects the most appropriate record to show to a user as representative of the work, in the short result list and beyond

• The end result will not be very satisfactory from a multi-lingual viewpoint… here’s why

“Most appropriate” questioned

Page 16: Multilingualism ifla 2014 08

Which record is better to present to a German speaker?

Page 17: Multilingualism ifla 2014 08

Incomplete Swedish Record

Page 18: Multilingualism ifla 2014 08

Hybrid record

Page 19: Multilingualism ifla 2014 08

Build the display from all available data

Most appropriate display

Page 20: Multilingualism ifla 2014 08

• Work level data, mined from all associated bibliographic records, will be displayed, supplemented with expression / manifestation level data as the user drills through from the short to the fuller versions of the metadata.

Multilingual Bibliographic Structure Project

The end user interface will show works and manifestations, not bibliographic records; the cataloguing client will also show bibliographic records
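Building a display from all available data, rather than from one "most appropriate" record, can be illustrated with a small merge routine. The sketch below is an assumption-laden example, not the project's implementation: the record layout (field name mapped to values keyed by language code), the field names and the sample values are all hypothetical.

```python
def build_work_display(records, preferred_languages):
    """Merge language-tagged fields from every record in a work cluster.

    records: list of dicts mapping field name -> {language code: value}.
    preferred_languages: language codes in descending order of preference.
    Returns field name -> (language code, value).
    """
    merged = {}
    for record in records:
        for field, by_language in record.items():
            slot = merged.setdefault(field, {})
            for lang, value in by_language.items():
                slot.setdefault(lang, value)  # first value seen wins

    display = {}
    for field, by_language in merged.items():
        if not by_language:
            continue
        for lang in preferred_languages:
            if lang in by_language:
                display[field] = (lang, by_language[lang])
                break
        else:
            # No preferred language available: fall back deterministically.
            lang = sorted(by_language)[0]
            display[field] = (lang, by_language[lang])
    return display


# Hypothetical cluster: neither record alone serves a German speaker well.
cluster = [
    {"title": {"fre": "Pêcheur d'Islande"}, "summary": {"eng": "English summary"}},
    {"summary": {"ger": "Deutsche Zusammenfassung"}, "subjects": {"ger": "Fischer"}},
]

display = build_work_display(cluster, ["ger", "eng"])
```

Here the German summary and subject come from the second record while the title comes from the first, so the display draws on both records instead of picking one of them.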

Page 21: Multilingualism ifla 2014 08

Proposed new architecture

[Diagram. Labels: Work; Manif (eng, fre) with Holding; Authors, Subj, Classif, Notes, Contents, each tagged eng / fre / ger / jpn; Translations (Language of work)]

Page 22: Multilingualism ifla 2014 08

• Language tagging of elements, particularly
– Summaries (MARC 21 field 520)
– Subject headings

• Display in script preferred by the user if data is available

• Improve translated interfaces

• Show consolidated holdings as appropriate

Important principles

Page 23: Multilingualism ifla 2014 08
Page 24: Multilingualism ifla 2014 08
Page 25: Multilingualism ifla 2014 08
Page 26: Multilingualism ifla 2014 08
Page 27: Multilingualism ifla 2014 08

Surfacing the “cream”

Translations

Page 28: Multilingualism ifla 2014 08

• The cream of the world’s cultural and knowledge heritage is shared by being translated

• WorldCat contains many rich cataloguing records for these translations

Great works are translated

GOAL: Data mine the really good records to improve clustering, presentation, authority records and linked data

Page 29: Multilingualism ifla 2014 08

Ιλιάδα / The Iliad

紅樓夢 / Dream of the Red Chamber

Война и миръ / War and Peace

ঘরে বাইরে / The Home and the World

સત્યના પ્રયોગો અથવા આત્મકથા / The Story of My Experiments with Truth [Gandhi autobiography]

源氏物語 / The Tale of Genji

דער בעל-תשובה / The Penitent

زقاق المدق / Midaq Alley

Page 30: Multilingualism ifla 2014 08

Leo Tolstoy: 32 languages

Homer: 28 languages

Rabindranath Tagore: 21 languages

Isaac Bashevis Singer: 17 languages

Najib Mahfuz: 12 languages

Cao Xueqin: 9 languages

Mahatma Gandhi: 7 languages

Murasaki Shikibu: 7 languages

Translations

Page 31: Multilingualism ifla 2014 08

• Inconsistencies cause work clusters to be incomplete, resulting in less than optimal search results
– Titles without subtitles
– Missing or different forms of uniform title
– Inverted title
– Different coding of original and translated information

Improving work clustering

Generated uniform title authority records will overcome most of these differences without needing to edit individual records
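To make the inconsistencies listed above concrete, the following sketch derives a matching key that ignores subtitles, inversion, case and diacritics, and then looks titles up in a small authority table. It is a minimal illustration under assumptions: the normalisation rules, the `authority` table and the registered variants are hypothetical and are not the rules used for WorldCat work clustering.

```python
import re
import unicodedata


def clustering_key(title):
    """Reduce a title to a key that tolerates common cataloguing variation."""
    # Drop the subtitle: keep only the text before the first colon.
    main = re.split(r"\s*:\s+", title, maxsplit=1)[0].strip()
    # Undo inversion such as "Iliad, The" -> "The Iliad".
    match = re.fullmatch(r"(.+),\s*(the|a|an)", main, flags=re.IGNORECASE)
    if match:
        main = f"{match.group(2)} {match.group(1)}"
    # Fold case and strip diacritics.
    folded = unicodedata.normalize("NFKD", main.casefold())
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    # Remove a leading English article, punctuation and extra whitespace.
    folded = re.sub(r"^(the|a|an)\s+", "", folded)
    folded = re.sub(r"[^\w\s]", " ", folded)
    return " ".join(folded.split())


authority = {}  # clustering key -> uniform title (hypothetical authority file)


def register(uniform_title, variants=()):
    """Record a uniform title together with its known variant forms."""
    for form in (uniform_title, *variants):
        authority[clustering_key(form)] = uniform_title


def uniform_title_for(title):
    """Return the uniform title for a record title, or None if unknown."""
    return authority.get(clustering_key(title))


register("Iliad", variants=["Ilias"])
```

With this table, `uniform_title_for("The Iliad : a new translation")` and `uniform_title_for("Iliad, The")` resolve to the same uniform title, so the records can join one cluster without being edited individually.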

Page 32: Multilingualism ifla 2014 08

Addition of xR records to VIAF

Before

After

Page 33: Multilingualism ifla 2014 08

UNESCO Translation Database

Page 34: Multilingualism ifla 2014 08
Page 35: Multilingualism ifla 2014 08

XR VIAF Record

VIAF ID for Author

Translated title

Translator

Page 36: Multilingualism ifla 2014 08
Page 37: Multilingualism ifla 2014 08
Page 38: Multilingualism ifla 2014 08
Page 39: Multilingualism ifla 2014 08

VIAF Linked Data

New Information

Page 40: Multilingualism ifla 2014 08

Title: Journey to the West
Language: English
Translator: Anthony C. Yu
Date: 1977
IsTranslationOf:

Title: Journey to the West
Language: English
Translator: W. J. F. Jenner
Date: 1982-1984
IsTranslationOf:

Title: 西遊記
Language: Chinese
Author: 吳承恩
Created: 1592
HasTranslation:

Title: Tay du ky binh khao
Language: Vietnamese
Translator: Phan Quan
Date: 1980
IsTranslationOf:

Title: 西遊記
Language: Japanese
Translator: 中野美代子
Date: 1986
IsTranslationOf:

Title: Monkeys Pilgerfahrt
Language: German
Translator: Georgette Boner
Date: 1983
IsTranslationOf:
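The IsTranslationOf / HasTranslation links on this slide can be modelled as a one-way pointer from each translation to the original, with the inverse derived on demand. The sketch below uses the slide's own example data; the `Expression` class, its field names and the short identifiers ("zh", "en-yu" and so on) are hypothetical and are not VIAF's record structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expression:
    title: str
    language: str
    contributor: str                         # author of the original, else translator
    date: str
    is_translation_of: Optional[str] = None  # hypothetical id of the original


expressions = {
    "zh": Expression("西遊記", "Chinese", "吳承恩", "1592"),
    "en-yu": Expression("Journey to the West", "English", "Anthony C. Yu", "1977", "zh"),
    "en-jenner": Expression("Journey to the West", "English", "W. J. F. Jenner", "1982-1984", "zh"),
    "vi": Expression("Tay du ky binh khao", "Vietnamese", "Phan Quan", "1980", "zh"),
    "ja": Expression("西遊記", "Japanese", "中野美代子", "1986", "zh"),
    "de": Expression("Monkeys Pilgerfahrt", "German", "Georgette Boner", "1983", "zh"),
}


def has_translation(all_expressions, original_id):
    """Derive the inverse link: ids of every expression translated from original_id."""
    return sorted(
        key for key, expr in all_expressions.items()
        if expr.is_translation_of == original_id
    )


def languages_translated_into(all_expressions, original_id):
    """Distinct target languages for one original."""
    return {
        all_expressions[key].language
        for key in has_translation(all_expressions, original_id)
    }
```

Storing only the IsTranslationOf pointer keeps each translation record self-contained; the HasTranslation list for the original is computed rather than maintained by hand.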

Page 41: Multilingualism ifla 2014 08

# Original Work (in Chinese)
<http://worldcat.org/entity/work/id/1215997>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
    schema:inLanguage "zh" ;
    schema:name "靈山"@zh .

# Translated Work (in English)
<http://worldcat.org/entity/work/id/145209748>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
    [new]:translator <http://viaf.org/viaf/81663420> ; # "Lee, Mabel"
    schema:inLanguage "en" ;
    schema:name "Soul Mountain"@en ;
    [new]:translationOfWork <http://worldcat.org/entity/work/id/1215997> .

Markup for the Semantic Web

Page 42: Multilingualism ifla 2014 08

Understanding information sharing across cultures

• What percentage of non-English works are translations of English works, and vice-versa?
• Which authors are translated the most?
• Which works have been translated into the most languages?
• Which countries translate the most English works, the most non-English works?
• Which countries translate a new work the fastest?
Etc.

http://www.oclc.org/research/activities/multilingual-bib-structure.html
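Once translation links exist as data, questions like those above reduce to simple aggregations. The sketch below shows two of them; the record layout (dicts with `author`, `work`, `original_language` and `language` keys) is a hypothetical one, and the rows are a tiny sample built from examples elsewhere in these slides, not real WorldCat statistics.

```python
from collections import defaultdict


def languages_per_key(translations, key):
    """Rank authors or works by the number of distinct target languages."""
    langs = defaultdict(set)
    for row in translations:
        langs[row[key]].add(row["language"])
    return sorted(
        ((name, len(found)) for name, found in langs.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


def share_translated_from(translations, source_language):
    """Fraction of translation records whose original is in source_language."""
    if not translations:
        return 0.0
    hits = sum(1 for row in translations if row["original_language"] == source_language)
    return hits / len(translations)


# Hypothetical rows: one per translation record.
translations = [
    {"author": "吳承恩", "work": "西遊記", "original_language": "chi", "language": "eng"},
    {"author": "吳承恩", "work": "西遊記", "original_language": "chi", "language": "eng"},
    {"author": "吳承恩", "work": "西遊記", "original_language": "chi", "language": "vie"},
    {"author": "吳承恩", "work": "西遊記", "original_language": "chi", "language": "jpn"},
    {"author": "吳承恩", "work": "西遊記", "original_language": "chi", "language": "ger"},
    {"author": "Gao, Xingjian", "work": "靈山", "original_language": "chi", "language": "eng"},
]

most_translated_works = languages_per_key(translations, "work")
most_translated_authors = languages_per_key(translations, "author")
```

Two English translations of the same work count once, because the ranking uses distinct languages rather than record counts.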

Page 43: Multilingualism ifla 2014 08

Where are we now?

Clustering
• Work clusters done; ongoing refinement
• GLIMIR clustering done for all [simple] text
– 103 million records have GLIMIR IDs
• Working on collected works

Displays
• Working on VIAF expression displays
• Work level displays in WorldCat.org ++

Data Mining for translations

Page 44: Multilingualism ifla 2014 08

Explore. Share. Magnify.

Janifer Gatenby

EMEA Program Manager Metadata

[email protected]