Islandora Webinar: Highlighting CUHK Chinese Digital Collections


Upload: erin-tripp

Post on 19-Jan-2017


TRANSCRIPT

Page 1: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

ABOUT US

• Louisa Lam

• Head of Research Support & Digital Initiatives, a new team set up in July 2015

• Previously Head of the Information Technology & Planning Team, responsible for all library systems and infrastructure issues and for digitization projects

• Jeff Liu

• Digital Services Librarian since July 2015

• Previously a Systems Librarian managing the ILS, EZproxy, website development, technical support for all e-resources and application systems, plus digitization

• No deep understanding/prior experience in metadata, MARC, MODS, Solr, Fedora which are all core components of Islandora

Page 14: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

ABOUT CUHK

• Established in 1963

• A comprehensive research university with a 137.3-hectare campus overlooking Tolo Harbor at Shatin, New Territories

• Comprises 9 colleges, 8 faculties and a Graduate School

• 18,698 undergraduates and postgraduates, 7,157 teaching staff

Page 15: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

ABOUT CUHK LIBRARY

• Comprised of 7 libraries:

• University Library

• Lee Quo Wei Law Library

• Chung Chi College Elizabeth Luce Moore Library

• Architecture Library

• New Asia College Ch’ien Mu Library

• United College Wu Chung Multimedia Library

• Li Ping Medical Library

Page 16: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

OVERVIEW OF CUHK DIGITAL COLLECTIONS

Page 17: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

OVERVIEW OF CUHK DIGITAL COLLECTIONS

• 333,263 objects in the system currently

• ~98% are book / manuscript images, ~11,000 records are ETDs

• There are also images and photos

• 95% are in Chinese

• Over 3 million image objects not yet migrated – some require special handling (Hong Kong Literature Database, Rulan Chao Pian Music Collection)

• Continuous development of new collections

Page 18: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

MOVING INTO ISLANDORA

Before 2012, over 5 million digital objects were stored in a Tamino XML Database with no user interface

Difficulties:

• Time-consuming to develop a new interface, new schema and new workflow for each new collection

• Every upload and every metadata update required a request to the technical team

• Not flexible enough for fast growing collection and large-scale implementation

• Staff mindset not yet ready for open source collaborative program development

Page 19: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

MOVING INTO ISLANDORA

Interim Solution (2013 – 2016):

• Develop a new portal using Drupal CMS to provide a standard interface

Long-term Solution (2014 - ):

• Look for a change of mindset and an alternative system

• Re-organize the staff / team structure before re-organizing the content

• Open source instead of a proprietary system

• Standardization instead of isolated development

Page 20: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

PROJECT TIMELINE

Time Frame | Action

Summer 2014 | Identified Islandora

Aug – Oct 2014 | Local installation by our own technical team; identified the need for more support

Oct 2014 | Contracted with Discovery Garden

Nov 2014 – Mar 2015 | Installation, theming and customization of the CUHK instance by Discovery Garden

Jun 2015 | Ingested around 100 Daoist books into the repository, without a deep understanding of Islandora and structured metadata, to meet an urgent request from a faculty (turned out to be a very good trial-and-error exercise)

Jul 2015 | New team set up, dedicated to digitization and digital repository development (1 Digital Services Librarian and 1 web programmer)

Aug 2015 – Jan 2016 | Worked with the Digital Initiatives Group to tweak the theme to match the new Library website, developed on Drupal by the same web programmer

Page 21: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

PROJECT TIMELINE

Time Frame | Action

Aug 2015 - | Studied and implemented the metadata standard, XSLT, user interface and functionalities, after learning a lot from the Islandora Conference in Aug 2015

Mid Oct 2015 - | Started the re-ingestion of stitch-bound classic Chinese books into the platform as the basis for the Daoist Texts Collection, after experimenting with the system for months; developed a new workflow for migration; developed tools and bug fixes to prepare the migration of legacy collections

12 Feb 2016 | Soft launch of the Repository

17 Mar 2016 | Official launch of the Repository with more collections migrated; the Digital Scholarship Lab opened on the same day

Mar 2016 - | Continuing to migrate legacy collections into the Repository

Page 22: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

CHINESE RARE BOOK DIGITAL COLLECTION

• First collection to migrate to the new portal built on Drupal CMS

• Using Drupal views for search and retrieval. E-books are linked to an external e-book reader (in Flash player format)

• Problems:

• Flash Player is being phased out

• A large backlog cannot be handled by the system and servers

• Before Islandora

Page 23: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

CHINESE RARE BOOK DIGITAL COLLECTION

• Book Solution Pack widely used

• All objects are on a single platform for searching and viewing

• Book metadata is exported from the Innovative ILS and converted with the Library of Congress's MARCXML to MODS XSLT, with a few local changes including flipping MARC tag 880 (PinYin for Chinese characters) and adding local note and TOC fields

• With Islandora

Page 24: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

CHINESE RARE BOOK DIGITAL COLLECTION

• Customized Internet Archive Book Reader page progression for displaying and flipping classic Chinese books correctly (different from western books)

• All book objects in zip files are ingested by a Drush command (the page-progression parameter is entered during ingestion)

• With Islandora

Page 25: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

CHINESE RARE BOOK DIGITAL COLLECTION

• Customized sorting for the display of Chinese titles, using the proper titleInfo and the numeric value of partNumber from MODS (otherwise, with default sorting, v. 2 is shown after v. 19 rather than after v. 1)

• With Islandora
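The volume-ordering problem above can be illustrated with a minimal sketch. It is not the code used in the CUHK repository; the record layout and the `part_number_key` helper are hypothetical, and only the idea of sorting on the numeric value of partNumber comes from the slide.

```python
import re

def part_number_key(record):
    """Sort by title first, then by the numeric value found in partNumber."""
    match = re.search(r"\d+", record["partNumber"])
    number = int(match.group()) if match else 0
    return (record["title"], number)

# Hypothetical records standing in for MODS titleInfo/partNumber values.
volumes = [{"title": "道藏", "partNumber": f"v. {n}"} for n in (1, 19, 2, 10)]

# Default string sorting compares character by character.
default_order = sorted(volumes, key=lambda r: (r["title"], r["partNumber"]))
# Numeric sorting compares the volume numbers as integers.
numeric_order = sorted(volumes, key=part_number_key)

print([r["partNumber"] for r in default_order])
print([r["partNumber"] for r in numeric_order])
```

With string comparison "v. 19" sorts before "v. 2" because "1" precedes "2"; converting the number to an integer restores the expected volume order.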

Page 26: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

CHINESE RARE BOOK DIGITAL COLLECTION

• Using the OpenSeadragon viewer, the image can be displayed clearly on the huge digital display wall at the new Digital Scholarship Lab

• The wall is built from twelve 55-inch high-resolution LED TV screens, giving an extremely high resolution of 24,883,200 pixels (7,680 x 3,240 pixels)

• With Islandora
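The pixel count on the slide can be checked with simple arithmetic. The sketch below assumes, which the slide does not state, that the twelve screens are Full HD (1,920 x 1,080) panels arranged in a 4 x 3 grid; that assumption is consistent with the 7,680 x 3,240 figure given.

```python
# Assumption: each screen is a Full HD panel (not stated on the slide).
PANEL_WIDTH, PANEL_HEIGHT = 1920, 1080
# Assumption: twelve screens arranged as 4 columns by 3 rows.
COLUMNS, ROWS = 4, 3

wall_width = PANEL_WIDTH * COLUMNS
wall_height = PANEL_HEIGHT * ROWS
total_pixels = wall_width * wall_height

print(wall_width, wall_height, total_pixels)
```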

Page 27: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

OTHER CHINESE RARE BOOKS COLLECTIONS

• As of today, over 320,000 page objects from 1,800 rare books have been ingested into the system

• The Chinese Rare Books Digital Collection provides a concrete experience for developing a feasible migration strategy for other rare books collections

• Book objects are ingested according to their subject area and theme

• The Daoist Texts Collection was set up with book titles selected by the Department of Culture and Religion Studies and the Centre for Studies of Daoist Culture

• A new Chinese Medicine Collection would be created

• The migration will take more than a year. Priority will be given to recently digitized items that are not yet accessible, followed by B&W images digitized in the last 10 years.

Page 28: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

ELECTRONIC THESES & DISSERTATIONS COLLECTION

• Before Islandora:

• Launched in 2014 with more than 10,000 ETD records

• One of the very high-use digital collections

• Search and retrieval system built on the existing Tamino XML database

• Our portal on Drupal CMS using iframe to display the search and browse pages

• PDF files are housed at another server

• Full-text search is not enabled by default because of system limitations

• Each data load goes through MARC→Excel→XML

• Lack of OAI-PMH function for harvesting

Page 29: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

ELECTRONIC THESES & DISSERTATIONS COLLECTION

• With Islandora

• The collection had been on the locally developed platform for less than 2 years, but we decided to migrate it to Islandora because of its significance, popularity and well-structured records

• Relies on the Islandora Scholar Solution Pack for display and the Islandora Solr Facet Pages module for browsing

• All ingested data comes from the Innovative ILS and is converted to XML for ingestion

• Challenge: display of non-English characters in the Islandora Solr Facet Pages, due to a limitation of Solr facet prefix

• Preparing the data for ingestion took more than 2 weeks, but the migration itself was so straightforward that it took less than 3 working days

• This collection will be launched in May 2016
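For context on the facet-prefix challenge: Solr's `facet.prefix` parameter restricts facet values to terms starting with a given string, which suits A to Z browse pages. The slide does not spell out the exact limitation, so the sketch below only shows how such a request might be built for a Chinese prefix. The field name `author_facet` and the `facet_page_params` helper are hypothetical.

```python
from urllib.parse import urlencode, parse_qs

def facet_page_params(field, prefix):
    """Build the query string for a facet-only Solr request on one prefix."""
    return urlencode({
        "q": "*:*",
        "rows": 0,              # only facet counts are wanted, no documents
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix,
    })

# An English browse page would use a letter; a Chinese one needs a character.
latin_query = facet_page_params("author_facet", "C")
chinese_query = facet_page_params("author_facet", "陳")

print(latin_query)
print(chinese_query)
```

`urlencode` percent-encodes the Chinese character as UTF-8, so the prefix survives the round trip. The harder question is which prefixes to offer at all: a fixed A to Z list leaves out values that begin with Chinese characters.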

Page 30: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

Oracle Bones Collection

• Before Islandora:

• A small but important collection

• Contains images of oracle bones that date back 3,500 years (the physical bones are kept at our main library)

• Previously used an HTML webpage to display the digital images; later changed to Drupal views/nodes to store and display them in our portal

Page 31: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

Oracle Bones Collection

• With Islandora

• An expert from Academia Sinica, Taiwan, provides proper metadata on each physical bone

• The first project to share metadata at ArchivesSpace

• Using OpenRefine to massage the metadata to follow the MODS schema for Islandora

• OpenSeadragon viewer to display and magnify the images clearly for users

• Future: as some 3D oracle bone images are available, we may explore how they can be displayed in Islandora
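The team used OpenRefine for the metadata clean-up. As an illustration of the target format only, the sketch below turns one cleaned row into a small MODS record with Python's standard library. The row's column names, the identifier value and the `row_to_mods` helper are hypothetical; only the MODS namespace and element names come from the MODS schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def row_to_mods(row):
    """Build a minimal MODS element from one cleaned metadata row."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = row["title"]
    ET.SubElement(mods, f"{{{MODS_NS}}}identifier", type="local").text = row["id"]
    if row.get("note"):
        ET.SubElement(mods, f"{{{MODS_NS}}}note").text = row["note"]
    return mods

# Hypothetical row, as it might look after clean-up.
row = {"id": "ob-0001", "title": "Oracle bone 1", "note": "Example note"}
xml_text = ET.tostring(row_to_mods(row), encoding="unicode")
print(xml_text)
```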

Page 32: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

Sheng XuanHuai Archive

• This is a collaboration project with Arts Museum of CUHK

• ~10,000 letters/manuscripts of Mr. Sheng, who was a very influential entrepreneur in the late Qing Dynasty

• Vertically transcribed Chinese text will be displayed side-by-side with the page image

• A transcription viewer is not new in Islandora, but a vertical display that fits the traditional writing direction of Chinese characters does enrich this collection

• This collection will be launched around Dec 2016

Page 33: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

SEARCH OF CHINESE CHARACTERS

• Enable cross search of Traditional Chinese, Simplified Chinese and Variant Chinese characters (TSVCC)

E.g. 台灣 (Taiwan) U+53F0 vs 臺灣 (Taiwan) U+81FA

• Discovery Garden has helped to apply Hong Kong's TSVCC mapping table to Solr

• Now conducting tests to improve precise word and phrase search for Chinese characters
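The idea behind cross search can be sketched as normalizing both the query and the indexed text to one canonical form before comparing them. This is an illustration only: the two-entry table below is hypothetical and stands in for the much larger Hong Kong TSVCC mapping table, and the real matching happens inside Solr, not in Python.

```python
# Hypothetical, tiny mapping table; the real TSVCC table is far larger.
VARIANT_TO_CANONICAL = {
    "台": "臺",  # variant U+53F0 -> U+81FA (the example from the slide)
    "湾": "灣",  # simplified form -> traditional form
}
NORMALIZE = str.maketrans(VARIANT_TO_CANONICAL)

def normalize(text):
    """Map every known variant character to its canonical form."""
    return text.translate(NORMALIZE)

def matches(query, indexed_text):
    """Compare query and indexed text after normalizing both sides."""
    return normalize(query) in normalize(indexed_text)

print(normalize("台湾"))
print(matches("台灣", "臺灣文學"))
print(matches("台湾", "臺灣文學"))
```

Because both sides pass through the same mapping, a search typed in any of the three forms finds records written in any other form.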

Page 34: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

LESSONS LEARNED

• Steep learning curve

• Support from the community is essential for continuous development

• The migration of metadata is a real challenge

• Two sources – ILS and ArchivesSpace, with different structures and levels of detail

• The development of a standard and automatic method will save much time in further massaging the data in Islandora

• Much time spent on developing a single workflow for the team so as to save time and effort in migration – Islandora provides the capability to handle a single workflow for different collections and media types in a single and standard platform

• Dedication, focus and concentration help to execute the project!

• An important component for the whole suite of service to support the university's Research and Digital Scholarship activities

Page 35: Islandora Webinar:  Highlighting CUHK Chinese Digital Collections

THANK YOU

Repository URL: http://repository.lib.cuhk.edu.hk/en

Contact:[email protected]@lib.cuhk.edu.hk
