moving electronic theses from etd-db to eprints: the best of both worlds betsy coles technical...

65
Moving Electronic Theses from ETD-db to EPrints: The Best of Both Worlds Betsy Coles Technical Manager, Digital Library Systems California Institute of Technology [email protected] Katherine Johnson CODA Coordinator and Metadata Librarian California Institute of Technology [email protected] CNI Spring 2010 Membership Meeting, Baltimore, MD April 12, 2010

Upload: annabel-anthony

Post on 28-Dec-2015

215 views

Category:

Documents


0 download

TRANSCRIPT

Moving Electronic Theses from

ETD-db to EPrints: The Best of Both Worlds

Betsy ColesTechnical Manager, Digital Library Systems

California Institute of [email protected]

Katherine JohnsonCODA Coordinator and Metadata Librarian

California Institute of [email protected]

CNI Spring 2010 Membership Meeting, Baltimore, MDApril 12, 2010

The BackgroundWhere We Came From

Background: CODAThe Caltech Collection of Digital Archives

(CODA) grew out of a longstanding commitment to scholarly communication and open access

Since its inception CODA has been integral to the Caltech Library’s missionNot a separate department or functionMany staff involved, from all levelsNo special funding – support is from general

operating budget

Background: CODAFirst repository: Computer Science Technical

Reports, using EPrints software from the University of Southampton (April 2001)

Thesis archive, using ETD-db software from Virginia Tech starting (July 2001)

In 2010: largest CODA repository is CaltechAUTHORS, with almost 15,000 non-thesis items

Background: Why ETD-db for Theses?

In 2001, there were not many options!

ETD-db was designed specifically for thesesThesis-specific metadataSupport for thesis workflow and approval processSupport for withholding theses, or parts of theses,

pending journal publication or patent application“Notifications” allowing staff to communicate with

authors via email

Background: ETD EvolutionVoluntary ETD submission for the first year

(2001/2002)

Mandatory submission for PhD students after July 2002

Electronic thesis is now the version of record and we are committed to its preservation

But we still keep and bind a paper copy

Background: From Paper to Bits

Retrospective conversion of older theses began in 2002

Library staff handle both scanning and submitting

No dedicated staff – it’s a “spare time” activity, and many staff participate

We are currently more than halfway through the backlog of pre-2002 theses

Background: Numb3rsNew theses: we add between 180 and 250 per

year

Retrospective conversion: we add about 500 per year

As of April 20105,518 electronic theses in collection1,433 of these were born digital

Background: Numb3rsUsage is high; most users come from Google

In March 201019,000+ website visits from 14,000+ unique usersMore than 22,000 document file downloads

(.pdf, .doc, and .ps)More than 4,300 supplemental file downloads

(video, data files, software, etc)

The ProblemWhat’s Right, What’s Wrong?

Problem: Multiple PlatformsAs of 2008, we were running ETD-db for theses

and EPrints for everything elseDuplication of effort for software maintenance and

system administrationStaff had to learn two systemsUsers had to search two interfaces

Problem: Resource CrunchIn 2008, we were operating with reduced staff

and limited resources (like everyone else)

Since we had no dedicated repository staff or funding, efficiency was crucial

Problem: Need for FlexibilityBy 2008, ETD-db software was aging

In contrast, the EPrints platform is in active development; EPrints 3 includes:Plugin architecture offering easy customization

and extension to support new features and protocols

Active development team and contributing user community worldwide

Much more ….

The GoalsThe Road to Oz

Goals: One PlatformWe wanted all our repositories on one platform

Greater efficiency would allow more forward development

Goals: WorkflowNeed to improve instructions and

documentation for students submitting theses and staff processing theses

Better understanding of thesis workflow

No disturbance of delicate and hard-won relationships with other campus organizations involved with theses (Graduate Office, academic departments)

Need to modernize our process for submitting theses to Proquest/UMI

Goals: TechnicalRetain useful features of ETD-db

Thesis-specific metadata fields and search capabilities (committee, major/minor options, etc.)

Ability to communicate with authors, via email, from within the system interface

Special limited-access categories for thesis materials (restricted, withheld), at the file level as well as the record level

Support for the complex thesis-approval workflow

Goals: TechnicalAdd brand-new features

New metadata elements including advisor(s), major and minor field, funders, additional dates, references, internal notes, and others

Ability to store and identify related documents (copyright permissions, signed thesis forms, etc.) in a hidden part of the record

Expanded ability to track theses through the complex approval process

Additional automatically generated emails

The PlanMigrating Theses to Eprints 3

The Plan: Outsourced Elements

EPrints Services in Southampton would createMetadata conversion scripts and metadata and

data migration scriptsNew email trigger function in EPrintsNew complex data structure for degree-granting

departmentsNew functionality to accommodate “hidden”

documents, e.g. permissions letters, signed thesis forms

The Plan: In-House ElementsCaltech Library Services staff would do

Metadata analysis and modificationsAnalysis and customization of the user interfaceCustomization of the system “workflow” – the

movement of theses through the stages of approval

Migration of persistent URLS within the Caltech Library’s locally developed PURL system

The Plan: In-House ElementsLocal staff would also

Customize and localize screen text and help textCreate a new web guide for student submittersWrite documentation for library staff using the new

system

The Plan: Timeline6 month timespan allocated for migration

(March though August 2009)

Informal scheduling process: timeline and tasks list maintained on the library wiki

Schedule did slip by one month, but project was complete by the beginning of the academic year (Sept. 2009)

The ProcessOn the Road

Process: PeopleInitial task group

the coordinator for the ETD-db repositorythe programmer/system administrator for the

digital repositoriesone subject liaison librarian with extensive CODA

involvementThe Metadata Group support staff person who

processes submitted theses

Process: PeopleGroup expanded gradually as project

progressed:Other subject librarians reviewed progress,

especially interface issuesStaff were asked to test specific featuresFinal testing phase was open to all library staffFeedback was received from a wide range of staff,

from the University Librarian to circulation desk staff

Process: What & WhereEPrints Services staff in Southampton were able

to fit their work into our timeline. Code was often delivered ahead of schedule

Integration of contracted code was done at Caltech

Testing of system components, migration process, and user interface was also done locally

Process: TestingWe used a “staged” process to test the

conversion and migrationMultiple rounds of testing the migration process,

with a larger number of records each time and a larger number of testers

Final test was a full “dress rehearsal” of the complete migration process.

We wish we had had time for formal usability testing, but we didn’t

Process: ArrivalThe actual migration was done on a weekend

Thesis submission was unavailable for 48 hours, but public search interface was up

Actual conversion process took about 8 hours

The remainder of the weekend was devoted to reviewing and testing the results

CaltechTHESIS was open for business Monday morning, as planned

The OutcomeAre We There Yet?

Outcome: Immediate Feedback

First thesis was submitted by a student within hours

We emailed the submitter:

“Congratulations!  You are the first student to have deposited his thesis into the new CaltechTHESIS database.  Would you mind giving us some feedback on your experience?  Ease of use, problems encountered, confusion?”

Outcome: Immediate Feedback

The student’s reply:

“Thanks! I was wondering when this had changed, realized it must have been recent. I found the submission quite easy, it took me only a couple of minutes for the whole process.”

Outcome: The View from HereOur experience in the months since we “went

live” with CaltechTHESIS has confirmed our initial impression that we now have a modern, flexible system that provides a better user experience and smoother process for both students and library staff.

Outcome: SpecificsGoals met – we now have

A better and more easily supported technical system

More efficient thesis processing by library staffA better user experience for

Students submitting thesesLibrary staff processing thesesSearchers worldwide

Outcome: On the HorizonNo, we’re not “done.” To-do list:

Data cleanup remaining from the conversion (fairly minor), and filling in the new metadata fields added as part of the migration project

Export plugin for Proquest/UMI’s XML metadata format (currently being tested)

ETD-MS format for OAI harvesting (in process)User interface tweaks and improvements ( a

never-ending task!)

Outcome: On the HorizonMore “to-do’s”

Upgrade to recently-released EPrints v. 3.2Upgrade to new faster hardware and Red Hat 5

64-bit Linux operating system Implement available EPrints add-ons:

IRSTATS statistics moduleDROID/PRESERV plugin to support preservation

status monitoring

Outcome: On the HorizonEven more “to-do’s”

Perhaps most important: complete the task of documenting the migration in technical terms and uploading migration scripts into the EPrints wiki, so that others may make use of what we’ve done.

Lessons LearnedThe Rear-view Mirror

Lesson: Best of Both Worlds?We avoided the “buy vs. build” dilemma by

contracting out specific parts of the migration development work to experts, while using our own resources where our local skill set could be put to best use and where local control was crucial to success.

Lesson: Best of Both Worlds?We now have, in EPrints 3, a single, full-featured

repository platform for all of our institutional materials, but we haven’t lost any of the valuable functionality of the older system.

The FutureWe look forward to beginning our second

decade of institutional repository management with a strong and flexible foundation.

Links CaltechTHESIS – http://thesis.library.caltech.edu

CaltechAUTHORS – http://authors.library.caltech.edu

CODA – http://library.caltech.edu/digital

Thesis workflow planning document: http://library.caltech.edu/etd/System_Independent_Thesis_Workflow.pdf

Web guide for student submitters –http://libguides.caltech.edu/theses

This Presentation: http://resolver.caltech.edu/CaltechLIB:2010.001

More Links EPrints software – http://software.eprints.org

EPrints Services – http://www.eprints.org/services/

ETD-db software – http://scholar.lib.vt.edu/ETD-db/developer/index.shtml

Screenshots