Focus on Your Content, Not on Ingesting Your Content

Terry Brady, Applications Programmer Analyst

Georgetown University Library

twb27@georgetown.edu

https://github.com/organizations/Georgetown-University-Libraries

Goals of our Repository Managers

Create new collections

Grow collections

Accurately describe collection contents

Showcase our repository content

Our story: Using simple tools to facilitate these goals

Imagine that you have content to load into your repository

Scenario: One Item to Add to DSpace

One Item to Add: Item Submission

Click through 7 item submission screens, authoring metadata as you go

Scenario: Three Items to Add to DSpace

Three Items to Add: Item Submission

Click through 3 x 7 = 21 item submission screens, authoring metadata as you go

50 Items

Scenario: 50 newspaper issues to add to DSpace (very similar metadata)

50 Items to Add: Individual Item Submission is impractical

Next Option: DSpace Bulk Ingest Process

DSpace Bulk Ingest

50 Items

Ingest Folder

Media File

Thumbnail (optional)

Contents File

Metadata File

License File (optional)
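
The folder layout above matches what DSpace calls the Simple Archive Format: one folder per item, holding the media file, a `contents` file, and a `dublin_core.xml` file. As a minimal sketch, the `contents` file could be generated like this. The file names are hypothetical, and the `THUMBNAIL` and `LICENSE` bundle names are an assumption about a typical DSpace setup rather than something stated on the slide.

```python
from typing import Optional


def build_contents(media_file: str,
                   thumbnail: Optional[str] = None,
                   license_file: Optional[str] = None) -> str:
    """Return the text of a `contents` file for one ingest folder.

    Each line names one file in the folder, followed by a tab and the
    bundle the file should be loaded into.
    """
    lines = [f"{media_file}\tbundle:ORIGINAL"]
    if thumbnail:
        # Optional custom thumbnail (bundle name is an assumption).
        lines.append(f"{thumbnail}\tbundle:THUMBNAIL")
    if license_file:
        # Optional license file (bundle name is an assumption).
        lines.append(f"{license_file}\tbundle:LICENSE")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names for one newspaper issue.
    print(build_contents("issue01.pdf", thumbnail="issue01.jpg"), end="")
```

Writing this file by hand for each of 50 items is the kind of repetitive step the rest of the talk sets out to automate.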

Bulk Ingest: Build a Metadata Spreadsheet

50 Items

Bulk Ingest: Build Ingest Folders

50 Items

Bulk Ingest: For Each Item, Copy Item to Folder

50 Items

.PDF

Bulk Ingest: For Each Item, Create a Unique Contents File

50 Items .TXT

.PDF

Bulk Ingest: For Each Item, Create a Dublin Core File

50 Items

.PDF

.TXT

.XML
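
For reference, the per-item Dublin Core file is a small XML document with one `dcvalue` element per metadata value. The sketch below builds one with Python's standard library. The function name and the sample values are hypothetical; the `dublin_core` / `dcvalue` structure follows the DSpace Simple Archive Format.

```python
import xml.etree.ElementTree as ET


def build_dublin_core(fields):
    """Build dublin_core.xml text from (element, qualifier, value) tuples.

    A qualifier of None is written as "none", which is how unqualified
    fields are expressed in this file format.
    """
    root = ET.Element("dublin_core")
    for element, qualifier, value in fields:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical metadata for one newspaper issue.
    print(build_dublin_core([
        ("title", None, "Campus Newspaper, Issue 1"),
        ("date", "created", "1950-01-01"),
    ]))
```

Because ElementTree escapes characters such as `&` and `<` in values, generating the file is less error-prone than typing the XML by hand.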

Bulk Ingest: Initiate Import from a Terminal Window

50 Items .TXT

.PDF

.XML
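
The terminal step is a call to the DSpace item importer. The sketch below only assembles the command line and prints it; it does not run anything. The flags follow the documented form of `dspace import`, while the install path, e-mail address, collection handle, and folder paths are hypothetical placeholders.

```python
import shlex


def build_import_command(dspace_bin, eperson, collection, source_dir, mapfile):
    """Return the argument list for a DSpace batch import of new items."""
    return [
        dspace_bin,
        "import",
        "--add",                       # add new items (not replace or delete)
        f"--eperson={eperson}",        # account that will own the submission
        f"--collection={collection}",  # target collection handle or ID
        f"--source={source_dir}",      # directory holding the ingest folders
        f"--mapfile={mapfile}",        # records which folder became which item
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    cmd = build_import_command(
        "/dspace/bin/dspace",
        "repo-manager@example.edu",
        "123456789/42",
        "/ingest/newspapers",
        "/ingest/newspapers.map",
    )
    print(shlex.join(cmd))
```

The map file matters later: it is what a tool would need in order to undo or replace a batch that was loaded with mistakes.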

Bulk Ingest: For Each Item, Create a Dublin Core File

50 Items .TXT

.PDF

.XML

What if you make a mistake?

What if you need to refine the metadata?

The Challenge: Want to grow the collections

But, the ingest process is daunting

The conversation focused on HOW to ingest the content, rather than on the content itself

Our Approach

Our Approach: Empower Content Owners

• Automate the tedious tasks

• Make metadata entry the focus of the effort

• Hide the command line from content owners

Our Approach: Simple Tools

Work around the tedious steps

Without constructing a complex workflow

Our Tools

• File Analyzer

o Desktop Application for File System Traversal

• DSpace QC Tools

o Web application for Batch Process Submission

Both of these tools are available on GitHub

• Georgetown-University-Libraries

File Analyzer: Desktop Application for File Processing

What we need

50 Items

Step 1: Automatically Generate an Ingest Inventory based on existing files

50 Items
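
To illustrate Step 1, here is a minimal sketch of building an inventory from the files that already exist. This is not the File Analyzer's own code (that tool is a Java desktop application); the column names and the choice to seed the title from the file name are assumptions made for the example.

```python
import csv
import io
from pathlib import Path

# Hypothetical inventory columns: one ingest folder per file plus two
# metadata fields for the content owner to fill in.
COLUMNS = ["folder", "file", "dc.title", "dc.date.created"]


def scan_pdfs(directory):
    """List the PDF file names found directly inside `directory`."""
    return [p.name for p in Path(directory).glob("*.pdf")]


def build_inventory(filenames):
    """Create one inventory row per file, sorted by file name."""
    rows = []
    for name in sorted(filenames):
        stem = Path(name).stem
        rows.append({
            "folder": stem,          # ingest folder named after the file
            "file": name,
            "dc.title": stem,        # placeholder title to be edited later
            "dc.date.created": "",   # left blank for the content owner
        })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text that opens in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(build_inventory(["issue02.pdf", "issue01.pdf"])), end="")
```

The content owner starts from a spreadsheet that already has one correct row per file, so the remaining work is describing the content.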

Export the Generated Inventory

Step 2: Edit the Ingest Inventory as a Spreadsheet

Step 3: Generate the Ingest Folders from the Inventory Spreadsheet

• Generate Contents File

• Generate Dublin Core Metadata File

• Include custom thumbnails if applicable

Create Ingest Folders

• An error message will appear if files are missing (or misspelled)

• Process can be rerun if the metadata spreadsheet needs to change
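
The missing-file check described above can be sketched as a pass over the inventory rows before any folder is written. The row layout (a `file` column) and the message wording are assumptions for illustration; the File Analyzer's actual report format is not shown in the slides.

```python
def check_inventory(rows, available_files):
    """Return one message per inventory row whose file cannot be found.

    `rows` is a list of dicts with a "file" key (hypothetical layout);
    `available_files` is the list of file names actually on disk.
    """
    available = set(available_files)
    problems = []
    # Spreadsheet row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(rows, start=2):
        name = row["file"].strip()
        if name not in available:
            problems.append(f"row {row_number}: file not found: {name}")
    return problems


if __name__ == "__main__":
    rows = [{"file": "issue01.pdf"}, {"file": "issue02.pfd"}]  # second name is misspelled
    for message in check_inventory(rows, ["issue01.pdf", "issue02.pdf"]):
        print(message)
```

Because the check has no side effects, it can be rerun after every spreadsheet edit until it reports nothing.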

Ingest Folder Creation Report

Step 4: Validate Ingest Folders

• Identify Missing Files

• Required Metadata

• Validate Files

o Contents

o Dublin Core
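
A minimal sketch of the checks listed above, applied to a single ingest folder. The choice of `dc.title` as the only required field and the wording of the messages are assumptions; the real validation rules live in the File Analyzer.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical required metadata: (element, qualifier) pairs.
REQUIRED = [("title", "none")]


def validate_item_folder(folder, required=REQUIRED):
    """Return a list of problems found in one ingest folder."""
    folder = Path(folder)
    problems = []

    # Contents file: must exist, and every file it lists must be present.
    contents = folder / "contents"
    if not contents.is_file():
        problems.append("missing contents file")
    else:
        for line in contents.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            name = line.split("\t")[0].strip()
            if not (folder / name).is_file():
                problems.append(f"listed file not found: {name}")

    # Dublin Core file: must exist and be well-formed XML.
    dublin_core = folder / "dublin_core.xml"
    if not dublin_core.is_file():
        problems.append("missing dublin_core.xml")
        return problems
    try:
        root = ET.parse(str(dublin_core)).getroot()
    except ET.ParseError as exc:
        problems.append(f"dublin_core.xml is not well-formed: {exc}")
        return problems

    # Required metadata: each required field needs a non-empty value.
    present = {
        (v.get("element"), v.get("qualifier") or "none")
        for v in root.iter("dcvalue")
        if (v.text or "").strip()
    }
    for element, qualifier in required:
        if (element, qualifier) not in present:
            problems.append(f"missing required metadata: dc.{element}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        item = Path(tmp) / "item_001"
        item.mkdir()
        (item / "contents").write_text("issue01.pdf\tbundle:ORIGINAL\n", encoding="utf-8")
        (item / "dublin_core.xml").write_text("<dublin_core/>", encoding="utf-8")
        for problem in validate_item_folder(item):
            print(problem)
```

Running the validation on the desktop, before anything is copied to the server, keeps mistakes cheap to fix.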

Validation Status Report

Step 5: Move Ingest Folders to Server and Initiate Bulk Ingest

Web Tools for Batch Process Submission

Web Tools, Tutorials co-located with tools

Collection

Folder Location

Processes run by Bulk Ingest

• import

• filter-media [collection]

• update-discovery-index

• oai-import

• stats-util

Content is visible, searchable, and thumbnails are present!
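
The list above is an ordered pipeline: each process depends on the one before it. The sketch below shows one way to chain such steps and stop at the first failure. The step names come from the slide; how each step is actually invoked (commands, flags, the collection argument) is not shown in the talk, so the runner is passed in as a hypothetical callable.

```python
from typing import Callable, List, Tuple

# Step names as listed on the slide, in the order they run.
STEPS = [
    "import",
    "filter-media",
    "update-discovery-index",
    "oai-import",
    "stats-util",
]


def run_pipeline(steps: List[str],
                 run_step: Callable[[str], int]) -> List[Tuple[str, int]]:
    """Run each step in order and stop after the first non-zero exit code.

    `run_step` is a hypothetical hook that launches one step and returns
    its exit code; a real version might wrap subprocess.run.
    """
    results = []
    for step in steps:
        exit_code = run_step(step)
        results.append((step, exit_code))
        if exit_code != 0:
            break  # later steps would work on incomplete data
    return results


if __name__ == "__main__":
    # Dry run: pretend every step succeeds.
    for step, code in run_pipeline(STEPS, lambda step: 0):
        print(f"{step}: exit code {code}")
```

Bundling the steps behind one web form is what lets a content owner get from ingest folders to visible, searchable items without touching the command line.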

Results

Empowered Librarians

Iterative metadata refinement

At the right point of the workflow

Significant growth in repository content

Decreasing IT involvement

Rapid development of support tools

Derived Tools

Generate Ingest Folders for ProQuest ETDs

Filter Media

Ingest ETDs from ProQuest

ProQuest ETD Ingest Rule

Filter Media Tool for Items Submitted One by One

Collection

Filter Media Tasks

Re-index?

Benefits

Companion tools easy to learn

Users are very comfortable with them

De-mystify DSpace-specifics

Users trained other users!

Other Tools Created

Automation

• Undo Bulk Ingest

• Update Metadata

• Move Community/Collection

Reporting

• Data Quality Reports

• Statistics Reports

More Tools (time permitting)

Data Quality Reports

• Items with multiple media files

• Non-PDF Document Items

• Items missing a Thumbnail

• "Non-standard" Media Types

• Items modified in the last 30 days

• Items with Embargo

• Items missing a metadata field

• Item metadata containing a URL
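
Several of the reports above reduce to simple rules applied to every item. The sketch below runs four of them over in-memory records. The `Item` structure, the handles, and the rule details are hypothetical; the actual DSpace QC Tools are PHP code that reads from the repository.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    """Hypothetical, simplified view of one repository item."""
    handle: str
    media_files: List[str]
    has_thumbnail: bool
    metadata: Dict[str, str] = field(default_factory=dict)


URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def quality_report(items: List[Item]) -> Dict[str, List[str]]:
    """Group item handles under each data quality rule they trigger."""
    report = {
        "multiple_media_files": [],
        "non_pdf_document": [],
        "missing_thumbnail": [],
        "metadata_contains_url": [],
    }
    for item in items:
        if len(item.media_files) > 1:
            report["multiple_media_files"].append(item.handle)
        if any(not name.lower().endswith(".pdf") for name in item.media_files):
            report["non_pdf_document"].append(item.handle)
        if not item.has_thumbnail:
            report["missing_thumbnail"].append(item.handle)
        if any(URL_PATTERN.search(value) for value in item.metadata.values()):
            report["metadata_contains_url"].append(item.handle)
    return report


if __name__ == "__main__":
    items = [
        Item("demo/1", ["issue01.pdf"], True, {"dc.title": "Issue 1"}),
        Item("demo/2", ["notes.docx", "scan.pdf"], False,
             {"dc.description": "See http://example.org/more"}),
    ]
    for rule, handles in quality_report(items).items():
        print(rule, handles)
```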

Collection QC Report

Item QC Report

Usage Statistics Reports

• Not confident in the out-of-the-box reports

• Wanted to understand underlying data

• Filter Stats

o On campus

o Within the library
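
One plausible way to implement the on-campus and within-the-library filters is to classify each request by the network its IP address falls in. The sketch below assumes that approach. The network ranges are reserved documentation addresses used as placeholders, not Georgetown's real networks, and the talk does not say how the actual filter is implemented.

```python
import ipaddress
from collections import Counter

# Placeholder networks (reserved documentation ranges); a real deployment
# would substitute the institution's own address blocks.
LIBRARY_NETWORKS = [ipaddress.ip_network("192.0.2.0/26")]
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def classify(ip: str) -> str:
    """Label a request as library, campus, or off-campus by its IP address."""
    address = ipaddress.ip_address(ip)
    # Check the narrower library range first, since it sits inside campus.
    if any(address in network for network in LIBRARY_NETWORKS):
        return "library"
    if any(address in network for network in CAMPUS_NETWORKS):
        return "campus"
    return "off-campus"


def summarize(ips):
    """Count usage events per location label."""
    return Counter(classify(ip) for ip in ips)


if __name__ == "__main__":
    print(summarize(["192.0.2.10", "192.0.2.200", "198.51.100.7"]))
```

Separating internal traffic from outside use is one way to judge how far staff activity inflates the raw view counts.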

Try it yourself

GitHub: Georgetown-University-Libraries

• File Analyzer & Metadata Harvester

o Just need a Java Compiler

o Contains several utilities for digitization workflows

o Links to tutorials

• DSpace QC Tools

o PHP Code

o Sample code, not ready to run

o Links to tutorials

Please let me know how these work for you!

Terry Brady, Applications Programmer Analyst

Georgetown University Library

twb27@georgetown.edu

https://github.com/organizations/Georgetown-University-Libraries