Multilingualism, IFLA 2014-08
Description
OCLC's three overlapping projects aim to generate true multilingual displays and to generate translation records for sharing via VIAF.
Transcript
IFLA - Lyon, France 19 August 2014
Janifer Gatenby
Multilingualism in WorldCat and VIAF
Working with Karen Smith-Yoshimura, Robert Bremer, Eric Childress, Jean Godby, Richard Greene, JD Shipengrover, Gail Thornburg, Jenny Toves, Diane Vizine-Goetz, Shenghui Wang, Jay Weitz
WorldCat Today
• Resources in nearly all languages
• Contributed by more than 20,000 libraries worldwide
• More than half the database is for works not in English
Languages: English, German, French, Spanish, Chinese, Dutch, Japanese, Russian, Arabic, and 469 others
• Bibliographic records
  – Hybrid records
  – Parallel records
• Clustered at work level (FRBR)
WorldCat Today
Existing Architecture
[Figure: each bibliographic record carries its own authors, subjects, classification and holdings; records are grouped into a work cluster, content clusters and manifestation clusters]
Complementary Initiatives
• Work Level Record
• GLIMIR: Manifestation & Content Clusters
• Multilingual Bibliographic Structure
Objective: Work Level Record
Create a consolidated metadata summary for the content of a work
Work Level Record: http://www.oclc.org/research/activities/workrecs.html
Coming Q1 2015
GLIMIR: Objective
Create better work presentations
• The Content Cluster
  – Enables better work record displays by reducing the number of lines that display for large works
  – Enables a choice of format and presents the formats that could be acceptable substitutes
  – Consolidates holdings for identical content
• The Manifestation Cluster is important
  – Consolidates holdings at manifestation level
  – In the short term, allows the record catalogued in the language of the interface to be chosen for display
  – Reduces apparent duplication
  – Allows a more accurate count of the number of manifestations in WorldCat (as opposed to the number of records)
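The two cluster levels described above can be illustrated with a minimal Python sketch: grouping records by a content key and by a manifestation key consolidates their holdings, and within a manifestation cluster the display can prefer the record catalogued in the interface language. The `Record` fields, the cluster identifiers, the toy data and the fallback rule (most holdings wins) are all hypothetical; this is not GLIMIR's actual clustering or selection logic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A bibliographic record with hypothetical, precomputed cluster keys."""
    record_id: str
    content_id: str         # content cluster: same content, any format
    manifestation_id: str   # manifestation cluster: same publication
    cataloguing_lang: str   # language the record was catalogued in (e.g. "eng", "fre")
    holdings: int           # number of libraries attached to this record


def consolidate(records, level):
    """Return {cluster_id: (record_count, total_holdings)} for one cluster level."""
    totals = defaultdict(lambda: [0, 0])
    for r in records:
        key = getattr(r, level)
        totals[key][0] += 1
        totals[key][1] += r.holdings
    return {key: tuple(value) for key, value in totals.items()}


def pick_for_display(records, manifestation_id, interface_lang):
    """Prefer the record catalogued in the interface language; else the most-held one."""
    members = [r for r in records if r.manifestation_id == manifestation_id]
    if not members:
        return None
    same_lang = [r for r in members if r.cataloguing_lang == interface_lang]
    return max(same_lang or members, key=lambda r: r.holdings)


# Toy data (hypothetical): r1 and r2 describe the same publication,
# catalogued once in English and once in French.
records = [
    Record("r1", "C1", "M1", "eng", 10),
    Record("r2", "C1", "M1", "fre", 5),
    Record("r3", "C1", "M2", "eng", 3),   # e.g. an e-book of the same content
    Record("r4", "C2", "M3", "eng", 2),   # a different text of the same work
]

print(consolidate(records, "content_id"))        # {'C1': (3, 18), 'C2': (1, 2)}
print(consolidate(records, "manifestation_id"))  # {'M1': (2, 15), 'M2': (1, 3), 'M3': (1, 2)}
print(pick_for_display(records, "M1", "fre").record_id)  # r2
print(pick_for_display(records, "M1", "ger").record_id)  # r1 (fallback: most holdings)
```

Counting the manifestation clusters, `len(consolidate(records, "manifestation_id"))`, gives 3 manifestations for 4 records, which is the distinction between manifestations and records drawn in the last bullet.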
GLIMIR
Users like C
Cataloguers & scholars like C
Manifestation Clustering
So far 103 million records processed (about 30%)
Manifestation Cluster Opened
SRU search: Loti, Pêcheur d'Islande (Work ID 21536567)
Cluster          Records   Holdings
Work             18        148
Content          14        143
Manifestation    7         115
Objective: Improve displays; surface translations
Multilingual Bibliographic Structure Project
• Creates true multilingual displays
  – At work and manifestation levels
  – Using all available data instead of the "most appropriate record"
• Generates data
  – Corrects many of the 28 million records coded "und" (undetermined language)
  – Better control and linking of translations
  – Input to refinement of work clusters
  – Smarter data storage
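The slides do not say how records coded "und" are corrected. One plausible first signal, offered here only as a hypothetical heuristic, is the dominant script of a record's title, which Python's standard `unicodedata` module can report from each character's Unicode name. A script narrows the candidate languages but does not determine one, so this sketch produces evidence for a correction, not a language code.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most frequent script among the letters of `text`, or None.

    The script is taken from the first word of each letter's Unicode name,
    e.g. "GREEK SMALL LETTER ALPHA" -> "GREEK", "CJK UNIFIED IDEOGRAPH-7D05" -> "CJK".
    Non-letters (digits, spaces, punctuation) are ignored.
    """
    scripts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    )
    if not scripts:
        return None
    return scripts.most_common(1)[0][0]


# Titles taken from the translation examples in this presentation.
for title in ["Ιλιάδα", "Война и миръ", "紅樓夢", "زقاق المدق", "The Iliad"]:
    print(title, dominant_script(title))
```

The limits show quickly: the Yiddish title דער בעל-תשובה comes back as HEBREW, and 源氏物語 (Japanese) as CJK, the same as Chinese titles, so any real correction would need further evidence from the record.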
Multilingual Bibliographic Structure Project
• WorldCat.org selects the most appropriate record to show to a user as representative of the work in the short result list and beyond
• The result is not very satisfactory from a multilingual viewpoint. Here's why.
“Most appropriate” questioned
Which record is better to present to a German speaker?
[Figure: an incomplete Swedish record compared with a hybrid record]
Build the display from all available data
Most appropriate display
• Work-level data, mined from all associated bibliographic records, will be displayed, supplemented with expression- and manifestation-level data as the user drills down from the short to the fuller versions of the metadata.
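Building the display from all records in a work cluster, rather than choosing one record, might look like the following minimal sketch. It assumes, hypothetically, that each record is a dict with a `"lang"` key for its language of cataloguing plus optional display fields, and that for each field a value from a record in the user's language wins, otherwise the most frequent value across the cluster. These field names and the selection rule are illustrative, not OCLC's design.

```python
from collections import Counter


def build_work_display(records, user_lang, fields=("title", "summary", "subject")):
    """Assemble one display from every record in a work cluster.

    For each field, values from records catalogued in `user_lang` win;
    otherwise the most frequent value across all records is used
    (ties go to the value seen first).
    """
    display = {}
    for field in fields:
        candidates = [(r["lang"], r[field]) for r in records if r.get(field)]
        preferred = [value for lang, value in candidates if lang == user_lang]
        pool = preferred or [value for _, value in candidates]
        if pool:
            display[field] = Counter(pool).most_common(1)[0][0]
    return display


# Toy cluster (hypothetical values): two incomplete German records
# and one fuller English record.
cluster = [
    {"lang": "ger", "title": "Titel (ger)"},
    {"lang": "eng", "title": "Title (eng)", "summary": "Summary (eng)", "subject": "Subject (eng)"},
    {"lang": "ger", "title": "Titel (ger)", "subject": "Schlagwort (ger)"},
]
print(build_work_display(cluster, "ger"))
```

For a German speaker this yields the German title and subject, and fills the missing summary from the English record instead of showing an incomplete German record on its own.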
Multilingual Bibliographic Structure Project
The end-user interface will show works and manifestations, not bibliographic records; the cataloguing client will also show bibliographic records.
Proposed new architecture
[Figure: a single work record holding authors, subjects, classification, notes and contents tagged by language (eng, fre, ger, jpn), linked to English and French manifestations with their holdings, and to translations grouped by language of work]
• Language tagging of elements, particularly
  – Summaries (MARC 21 field 520)
  – Subject headings
• Display in the script preferred by the user if data is available
• Improve translated interfaces
• Show consolidated holdings as appropriate
Important principles
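Once elements carry language tags, choosing the value to show in the user's preferred language and script is a small lookup. This sketch assumes BCP 47-style tags such as `zh-Hant` (Chinese, traditional script) and `zh-Latn` (romanized Chinese); the slides only say "language tagging", so the tag format, the three-step fallback order and the sample data are assumptions.

```python
def pick_value(tagged, preferences):
    """Choose one value from {language_tag: value} for a user's preferred tags.

    Tags follow BCP 47 style (an assumption), e.g. "zh-Hant" or "zh-Latn".
    Matching is case-insensitive.
    """
    by_tag = {tag.lower(): value for tag, value in tagged.items()}
    prefs = [p.lower() for p in preferences]
    # 1. Exact language-and-script match, in the user's order of preference.
    for pref in prefs:
        if pref in by_tag:
            return by_tag[pref]
    # 2. Same language in another script (e.g. romanized data for a "zh-Hans" reader).
    for pref in prefs:
        primary = pref.split("-")[0]
        for tag, value in by_tag.items():
            if tag.split("-")[0] == primary:
                return value
    # 3. Whatever is available, in stored order.
    return next(iter(by_tag.values()), None)


title = {"zh-Hant": "西遊記", "zh-Latn": "Xi you ji", "en": "Journey to the West"}
print(pick_value(title, ["zh-Latn", "en"]))  # Xi you ji
print(pick_value(title, ["en-GB"]))          # Journey to the West
print(pick_value(title, ["de"]))             # 西遊記 (no match; first stored value)
```

The same function applies to any tagged element, such as a summary or a subject heading, which is why tagging individual elements rather than whole records matters for hybrid records.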
Surfacing the “cream”
Translations
• The cream of the world’s cultural and knowledge heritage is shared by being translated
• WorldCat contains many rich cataloguing records for these translations
Great works are translated
GOAL: Data-mine the really good records to improve clustering, presentation, authority records and linked data
Ιλιάδα / The Iliad
紅樓夢 / Dream of the Red Chamber
Война и миръ / War and Peace
ঘরে বাইরে / The Home and the World
સત્યના પ્રયોગો અથવા આત્મકથા / The Story of My Experiments with Truth [Gandhi autobiography]
源氏物語 / The Tale of Genji
דער בעל-תשובה / The Penitent
زقاق المدق / Midaq Alley
Leo Tolstoy: 32 languages
Homer: 28 languages
Rabindranath Tagore: 21 languages
Isaac Bashevis Singer: 17 languages
Najib Mahfuz: 12 languages
Cao Xueqin: 9 languages
Mahatma Gandhi: 7 languages
Murasaki Shikibu: 7 languages
Translations
• Inconsistencies cause work clusters to be incomplete, resulting in less-than-optimal search results
  – Titles without subtitles
  – Missing or different forms of uniform title
  – Inverted titles
  – Different coding of original and translated information
Improving work clustering
Generated uniform title authority records will overcome most of these differences without needing to edit individual records
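The slides do not describe how the uniform title authority records are generated. As a hypothetical sketch of the idea, titles can be reduced to a normalized key that ignores subtitles, inverted initial articles, case and diacritics, and each (author, key) group can then receive one generated uniform title, here simply its most frequent form. The article list, the key rules, the `author-N` identifiers and the "most frequent form" choice are all assumptions; the fourth inconsistency (different coding of original and translated information) is not handled.

```python
import unicodedata
from collections import Counter, defaultdict

# Illustrative subset of initial articles; a real list would be much longer.
ARTICLES = {"the", "a", "an", "le", "la", "les", "der", "die", "das"}


def title_key(title):
    """Normalize a title so common variants compare equal (a heuristic)."""
    main = title.split(":")[0].strip()               # ignore any subtitle
    head, sep, tail = main.rpartition(",")
    if sep and tail.strip().casefold() in ARTICLES:  # "Iliad, The" -> "The Iliad"
        main = f"{tail.strip()} {head.strip()}"
    folded = unicodedata.normalize("NFKD", main)     # split letters from accents
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch)).casefold()
    words = folded.split()
    if words and words[0] in ARTICLES:               # drop a leading article
        words = words[1:]
    return " ".join(words)


def generate_uniform_titles(records):
    """Map (author_id, title_key) to one generated uniform title.

    `records` is an iterable of (author_id, title) pairs; the most frequent
    title form (without subtitle) in each group is chosen.
    """
    forms = defaultdict(Counter)
    for author_id, title in records:
        forms[(author_id, title_key(title))][title.split(":")[0].strip()] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in forms.items()}


# Hypothetical author IDs; titles show the variant types listed above.
records = [
    ("author-1", "The Iliad"),
    ("author-1", "Iliad, The"),
    ("author-1", "The Iliad: a new verse translation"),
    ("author-2", "Pêcheur d'Islande"),
    ("author-2", "Pecheur d'Islande"),
]
print(generate_uniform_titles(records))
```

All three Iliad variants fall into one group, so the individual records can stay as they are and still cluster together through the generated title.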
Addition of xR records to VIAF
[Figure: VIAF before and after adding xR records; data from the UNESCO Translation Database builds an xR VIAF record carrying the VIAF ID for the author, the translated title and the translator]
VIAF Linked Data: New Information
• Title: Journey to the West; Language: English; Translator: Anthony C. Yu; Date: 1977; IsTranslationOf:
• Title: Journey to the West; Language: English; Translator: W. J. F. Jenner; Date: 1982-1984; IsTranslationOf:
• Title: 西遊記; Language: Chinese; Author: 吳承恩; Created: 1592; HasTranslation:
• Title: Tay du ky binh khao; Language: Vietnamese; Translator: Phan Quan; Date: 1980; IsTranslationOf:
• Title: 西遊記; Language: Japanese; Translator: 中野美代子; Date: 1986; IsTranslationOf:
• Title: Monkeys Pilgerfahrt; Language: German; Translator: Georgette Boner; Date: 1983; IsTranslationOf:
@prefix schema: <http://schema.org/> .
# "new:" marks the proposed translation properties shown as [new] on the slide;
# the namespace URI below is a placeholder, not a published vocabulary.
@prefix new: <http://example.org/new#> .

# Original work (in Chinese)
<http://worldcat.org/entity/work/id/1215997>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ;  # "Gao, Xingjian"
    schema:inLanguage "zh" ;
    schema:name "靈山"@zh .

# Translated work (in English)
<http://worldcat.org/entity/work/id/145209748>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ;  # "Gao, Xingjian"
    new:translator <http://viaf.org/viaf/81663420> ;   # "Lee, Mabel"
    schema:inLanguage "en" ;
    schema:name "Soul Mountain"@en ;
    new:translationOfWork <http://worldcat.org/entity/work/id/1215997> .
Markup for the Semantic Web
Understanding information sharing across cultures
• What percentage of non-English works are translations of English works, and vice-versa?
• Which authors are translated the most?
• Which works have been translated into the most languages?
• Which countries translate the most English works, and the most non-English works?
• Which countries translate a new work the fastest?
• Etc.
http://www.oclc.org/research/activities/multilingual-bib-structure.html
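Once translation links such as IsTranslationOf are available, questions like these become simple aggregations. This minimal sketch uses the Journey to the West and Soul Mountain data from the VIAF and linked-data slides (taking 1982 as the first year of the Jenner translation, dated 1982-1984). Keying works by original title is a simplification; real data would use work identifiers such as the WorldCat work URIs, and country-level questions would need data not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Translation:
    original: str        # the work this is a translation of (IsTranslationOf)
    language: str        # language of the translation
    translator: str
    year: Optional[int]  # first year of publication, if known


def languages_per_work(translations):
    """Rank original works by the number of distinct translation languages."""
    langs = defaultdict(set)
    for t in translations:
        langs[t.original].add(t.language)
    return sorted(((work, len(ls)) for work, ls in langs.items()),
                  key=lambda pair: (-pair[1], pair[0]))


def years_to_first_translation(translations, original, created):
    """Years between an original's creation and its first translation, per language."""
    first = {}
    for t in translations:
        if t.original == original and t.year is not None:
            first[t.language] = min(t.year, first.get(t.language, t.year))
    return {lang: year - created for lang, year in first.items()}


data = [
    Translation("西遊記", "English", "Anthony C. Yu", 1977),
    Translation("西遊記", "English", "W. J. F. Jenner", 1982),
    Translation("西遊記", "Vietnamese", "Phan Quan", 1980),
    Translation("西遊記", "Japanese", "中野美代子", 1986),
    Translation("西遊記", "German", "Georgette Boner", 1983),
    Translation("靈山", "English", "Lee, Mabel", None),  # date not given on the slides
]
print(languages_per_work(data))  # [('西遊記', 4), ('靈山', 1)]
print(years_to_first_translation(data, "西遊記", 1592))
# {'English': 385, 'Vietnamese': 388, 'Japanese': 394, 'German': 391}
```

The same grouping, run over all xR records in VIAF, would rank authors and works by translation reach; the per-language lag is one possible measure of how quickly a work travels.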
Where are we now?
Clustering
• Work clusters done; ongoing refinement
• GLIMIR clustering done for all [simple] text
  – 103 million records have GLIMIR IDs
• Working on collected works
Displays
• Working on VIAF expression displays
• Work-level displays in WorldCat.org ++
Data mining for translations
Explore. Share. Magnify.
Janifer Gatenby, EMEA Program Manager, Metadata