Tools & Technologies for enhancing access to Audiovisual - the Singapore Journey
Dr Phang Lai Tee ([email protected]) National Archives of Singapore
AMIA Conference 20 Nov 2015 Curated Stream
} A “Little Red Dot” …
Greetings from Singapore
} Island city-state
} Population: 5.5 million
} Multi-racial community: Chinese 74%, Malay 14%, Indian 9%, other ethnicities 3%
} Area: 710 sq km
} Government: Parliamentary Democracy
} We celebrated our Golden Jubilee
} We mourned the passing of our founding Prime Minister
Who we are
Introduction to the National Archives of Singapore (NAS)
} 1968: Established by Act of Parliament
} Aug 1993: Came under National Heritage Board (NHB)
} 1996: Audio Visual Archives Division formally set up
} Nov 2012: Transferred to National Library Board (NLB)
Conveniently located in Singapore’s Civic District
(1 Canning Rise)
Archives in the library
New opportunities, new challenges
} Content is king
} Increased digitisation funding
} Robust IT infrastructure for resource hungry AV
} Experienced in improving search-ability of content
} Dared to innovate & try new technologies
} How to be visible in a sea of books (enhanced discovery?)
} Pressure to widen access
} Branding of archives
} Archival principles…
Enhancing access step by step
Treasure Trove of AV Content
} Recommendation of Advisory Council on Culture and the Arts chaired by then 2nd Deputy Prime Minister Ong Teng Cheong in 1989
} Strengthen the national heritage collection in all media to cover sound-and-moving images
} Over 100,000 AV recordings covering 60 years of broadcasting history of Singapore
} AV recordings capturing defining moments and key government initiatives in Singapore’s 50 years of independence
} Sound recordings documenting recording history of Singapore and the region from 1903 to 1970s
New look (2013)
Expose the archives - Findable
} Make each record Google findable with permanent URL
} Curate easy access pages of topical interests
Radio Talks on ‘The Battle for Merger’, 13 Sep - 9 Oct 1961
Archivist Pick of the Week
Search beyond the Archives – Expandable } OneSearch, Many Sources
} Data harmonization and linkages across different descriptive frameworks and systems for the benefit of users
Avoiding pitfalls
} Beware of the mapping
} ISAD-G, MARC, Dublin Core
} Creator/publisher, transferring agency/source of acquisition
} One date vs. many dates
} Know your collection well and the differences in descriptions and definitions
} Mapping alone may not be adequate
Anchored on ISAD-G
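The mapping pitfalls above can be illustrated with a minimal sketch of a crosswalk onto a common, ISAD(G)-style record. The target field names, the `CROSSWALK` table and the `to_common` helper are assumptions made for illustration only and do not describe the schema actually used by NAS or NLB. Source fields with no agreed equivalent (here the MARC publisher subfield) are kept aside rather than forced into a target field, and dates are always held as a list so that "one date" and "many dates" sources fit the same shape.

```python
# Hypothetical crosswalk tables: source field -> common (ISAD(G)-style) field.
# The target names are illustrative, not an actual NAS/NLB schema.
CROSSWALK = {
    "dublin_core": {
        "title": "title",
        "creator": "name_of_creator",
        "date": "dates",
    },
    "marc": {
        "245a": "title",            # title statement
        "100a": "name_of_creator",  # main entry, personal name
        "260c": "dates",            # date of publication
        # "260b" (publisher) is deliberately left unmapped: a publisher is
        # not necessarily the creator or the transferring agency.
    },
}


def to_common(record, scheme):
    """Map one source record (a dict) onto the common record shape."""
    mapping = CROSSWALK[scheme]
    common = {"title": None, "name_of_creator": None, "dates": [], "unmapped": {}}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            # Keep what cannot be mapped so an archivist can review it.
            common["unmapped"][field] = value
        elif target == "dates":
            # One date or many dates: always store a list.
            common["dates"].extend(value if isinstance(value, list) else [value])
        else:
            common[target] = value
    return common


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    print(to_common({"245a": "Sample talk", "260b": "Example Publisher", "260c": "1961"}, "marc"))
    print(to_common({"title": "Sample talk", "date": ["1961-09-13", "1961-10-09"]}, "dublin_core"))
```

Keeping an `unmapped` bucket is one way to act on "mapping alone may not be adequate": the differences stay visible instead of being silently merged.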
Enhance findability of non-textual content - voice to text transcription
} 6,000 hours of broadcasts and speeches done
} Useful guide for writing synopses, minimises need to make notes when listening to audio, reduces time taken by 25% (for those with good accuracy)
} Problem with names and non-English words
} Sarong became sorrow, Blakang Mati became Locomotiv
} Saudara Joko Senyoto became John Paulson
} Dr Goh Keng Swee became…
} Accuracy highly dependent on clarity of recording and speaker’s accent; can be improved through training
} There are portions that can only be understood by listening to the audio repeatedly
} Not suitable for broadcasts with multiple languages, certain series
http://www.nas.gov.sg/archivesonline/oral_history_interviews/record-details/df04a824-115d-11e3-83d5-0050568939ad?keywords=nair&keywords-type=all
Using text analytics to automatically identify related content
Text tokenised; tokens parsed and weighted (TF/IDF)
Weighted tokens similarity computed
Similarity = 0.295
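The pipeline above (tokenise, weight by TF/IDF, compute similarity) can be sketched in plain Python as follows. This is a minimal illustration and not the implementation used in the project: the tokeniser, the smoothed IDF formula and the choice of cosine similarity as the comparison measure are assumptions, and the sample texts are made up.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {token: weight} vector per document."""
    tokenised = [tokenise(d) for d in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency
            # (the smoothing is an assumption of this sketch).
            tok: (count / len(toks)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up sample descriptions, for illustration only.
    docs = [
        "radio talk on merger broadcast in 1961",
        "speech on merger and the referendum",
        "oral history interview about kampong life",
    ]
    vecs = tfidf_vectors(docs)
    print(round(cosine_similarity(vecs[0], vecs[1]), 3))
    print(round(cosine_similarity(vecs[0], vecs[2]), 3))
```

Records whose similarity score exceeds a chosen threshold could then be offered as related content; the threshold itself would need tuning against the collection.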
Expandable - Mahout
Using clustering to handle large datasets
Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters)
Mahout K-Means Clustering with Cosine Distance
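To make the idea concrete, here is a minimal plain-Python sketch of k-means clustering with cosine distance. It does not use or reproduce Mahout's API. Seeding the centroids with the first k points, the fixed iteration cap and the toy two-dimensional vectors are assumptions chosen to keep the example small and deterministic.

```python
import math


def normalise(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def cosine_distance(a, b):
    """Cosine distance between two unit-length vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def kmeans_cosine(vectors, k, iterations=20):
    """Assign each vector to one of k clusters; returns a list of labels."""
    points = [normalise(v) for v in vectors]
    # Assumption: seed the centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by cosine distance.
        new_labels = [
            min(range(k), key=lambda c: cosine_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: mean of the members, re-normalised to unit length.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalise(mean)
    return labels


if __name__ == "__main__":
    # Toy vectors standing in for TF/IDF-weighted records.
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(kmeans_cosine(toy, k=2))
```

Clustering first means that related-content comparisons only need to run within a cluster rather than across every pair of records, which is what makes the approach practical for large datasets.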
Examples of results within the same database
Examples of results across different databases
I can’t put everything online!
Copyrights ($$$)
Restrictions by depositors & rights protection
Behind the scenes
At the public front
AV holdings size: 140,000 recordings
[Chart: No. of Recordings by financial year, FY 11 to FY 14. Series: No. of Recordings Digitised; No. of New Recordings (or Metadata) Uploaded Online; No. of Page Views on Recordings. Vertical axis: 0 to 140,000. Data labels as extracted (series and year assignment not recoverable): 3,120; 2,443; 12,864; 19,200; 4,308; 8,316; 6,442; 5,186; 18,951; 26,637; 35,100; 125,931.]
Huge rise in public interest for AV recordings
Total no. of recordings (or metadata) online: 96,209
In the pipeline…
Expandable - Project by NLB
} Use machine translation technology & KOS (Knowledge Organisation System) names database to translate non-English content/local personality names to English
} Apply text-mining & keyword classification to recommend related library & archives content across languages
In the pipeline
} Extend in-premises access to the libraries
} Image analytics
} Linked data (by NLB)
} Crowdsourcing for home movies?
http://www.jts2016.org/
Acknowledgements: Technology & Innovation, NLB
Oral History Centre, NAS email: [email protected]