Enabling Interaction and Quality in a Distributed Data DRIS. D. Scott Brandt, Associate Dean for Research; Michael Witt, Senior Research Systems Administrator; Purdue University Libraries. CRIS 2006, Bergen, Norway, May 11, 2006.


Page 1: Enabling Interaction and Quality in a Distributed Data DRIS

Enabling Interaction and Quality in a Distributed Data DRIS

D. Scott Brandt, Associate Dean for Research

Michael Witt, Senior Research Systems Administrator

Purdue University Libraries

CRIS 2006, Bergen, Norway, May 11, 2006

Page 2: Enabling Interaction and Quality in a Distributed Data DRIS

Background: Purdue

Purdue University

Nine Colleges: Agriculture, Consumer & Family Sciences, Education, Engineering, Liberal Arts, Management, Pharmacy/Nursing/Health Sciences, Technology, Vet Medicine

73 Departments, several cross-disciplinary: e.g. Agricultural & Biological Engineering

Page 3: Enabling Interaction and Quality in a Distributed Data DRIS

Purdue University Libraries

2004 initiative for Librarians (faculty) to collaborate with other faculty across campus: apply library science knowledge and expertise to various research data problems (collect, organize, describe, curate, archive, disseminate data/information)

Page 4: Enabling Interaction and Quality in a Distributed Data DRIS

Strategic directions

University: “interdisciplinary and collaborative endeavors grounded in the strengths of academic disciplines”

Libraries: Libraries faculty are integrated into campus research agenda

Page 5: Enabling Interaction and Quality in a Distributed Data DRIS

Areas of research collaboration

Agronomy
Biology
Cancer Center
Center for the Environment
Chemical Engineering
Chemistry
Cyber Center
Discovery Learning Center
Earth & Atmospheric Science
English
IT at Purdue
Mechanical Engineering Technology
Regenstrief Center

Page 6: Enabling Interaction and Quality in a Distributed Data DRIS

Current areas of participation

E. coli K-12 Model Organism Resource, NIH proposal (B. Wanner, Biology, PI; D. Scott Brandt, Libraries, Co-PI): create archival process for curated database, assist in applying ontologies for data representation and annotation

An Expert System Multimedia Tutorial for Locating Technical Information, Purdue University TLT Digital Content grant (Megan Sapp, PI; Amy Van Epps and Michael Fosmire, Co-PIs; with Bruce Harding, Mechanical Engineering Technology): develop tutorial for MET102 course in using and applying standards

URL-based Search Interface to the Distributed Institutional Repository, Purdue University Graduate School (Michael Witt, Libraries, PI; Darcy Bullock, Civil Engineering, Co-PI): develop toolkit to deploy customized searching of dissertations by school, advisor, etc.

AquaEcon Web Library: An Electronic Resource on Economics-Related Literature on Aquaculture, NOAA (K. Quagrainie, Agricultural Economics, PI; Hal Kirkwood, Libraries, Co-PI): build and populate database

Page 7: Enabling Interaction and Quality in a Distributed Data DRIS

Progression towards CRIS

Institutional repository (IR)
Distributed institutional repository (DIR)
Interactions related to DIR leading to CRIS-like applications
Leverage DIR for DRIS/CRIS

Page 8: Enabling Interaction and Quality in a Distributed Data DRIS

Distributed Institutional Repository

[Diagram labels: Applications; Metadata Repository; OAI Service Provider; OAI Data Providers; e-prints; archival collections; grid resources; native databases; data archive]

Page 9: Enabling Interaction and Quality in a Distributed Data DRIS

A systems-based approach to Libraries supporting research: linear

Inputs (CRIS): A current research information system links people engaged in research with funding and other resources such as interdisciplinary collaborators

Experimentation (data repositories): A repository of well-described data resulting from research processes is preserved and shared for repurposing

Outputs (document repositories): Journal article pre-prints, post-prints, conference and working papers, dissertations and other e-prints represent research outputs in a document repository

Page 10: Enabling Interaction and Quality in a Distributed Data DRIS

A systems-based approach to Libraries supporting research: cyclical

[Diagram: a cycle connecting CRIS, data repository, and e-print repository]

Page 11: Enabling Interaction and Quality in a Distributed Data DRIS

An example application: SRU

Linking to electronic theses and dissertations (ETD)

URL-based search interface to DIR running as a web service

$16,000 Strategic Development Initiative award for fellowship and server
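
A URL-based search of this kind can be sketched as below. This is only an illustration, assuming an SRU 1.1 endpoint: the base URL, the CQL index names (`dc.contributor`, `dc.publisher`) and the search terms are hypothetical and are not taken from the Purdue service. The request parameters (`operation`, `version`, `query`, `maximumRecords`) are the standard SRU searchRetrieve parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: the slides do not give the address of the real service.
BASE_URL = "http://example.org/sru"


def build_sru_url(cql_query, max_records=10, base_url=BASE_URL):
    """Return an SRU searchRetrieve URL for the given CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    # urlencode percent-encodes the CQL query so it is safe inside a URL.
    return base_url + "?" + urlencode(params)


# Hypothetical query: dissertations by advisor and school (index names assumed).
url = build_sru_url(
    'dc.contributor = "Example Advisor" and dc.publisher = "Example School"'
)
print(url)
```

Because the whole search is carried in the URL, a department page could link straight to a prebuilt query such as this one without running any search software of its own.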

Page 12: Enabling Interaction and Quality in a Distributed Data DRIS

Getting to the datasets: SRB

The Storage Resource Broker
Developed by the San Diego Supercomputer Center
Uniform access to heterogeneous, distributed storage
Metadata catalog (MCAT) and preservation functionality
TeraGrid, collaboration with Information Technology at Purdue and Rosen Center for Advanced Computing

Page 13: Enabling Interaction and Quality in a Distributed Data DRIS

An example systems interaction

OAISRB: provides an OAI-PMH interface to the SRB to expose metadata from resources on a data grid to OAI service providers

[Diagram labels: Harvester; HTTP; XML; Apache Tomcat Server; OAI-PMH Interface (OAICat); OAISRB; SRB Client (Jargon); MCAT (SRB); Data grid]
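
From the harvester's side, the interaction amounts to an HTTP request carrying an OAI-PMH verb and an XML response. The sketch below illustrates that exchange under stated assumptions: the base URL is hypothetical, and the response is a hand-written sample record rather than actual OAISRB output. No network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Hand-written sample response (an assumption, not real harvested data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example:1</identifier>
        <datestamp>2006-05-11</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample data set</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract the identifier and Dublin Core titles from each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        identifier = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [t.text for t in rec.iter(f"{{{DC_NS}}}title")]
        records.append({"identifier": identifier, "titles": titles})
    return records


# Hypothetical base URL for illustration only.
request_url = list_records_url("http://example.org/OAISRB/OAIHandler")
harvested = parse_records(SAMPLE_RESPONSE)
print(request_url)
print(harvested)
```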

Page 14: Enabling Interaction and Quality in a Distributed Data DRIS

Sample OAISRB config

#### OAI Handler Base URL Format
OAIHandler.baseURL=http://128.210.126.231:8080/OAISRB/OAIHandler
#### SRB Connection Parameters
SRB.HOST=orion.sdsc.edu
SRB.PORT=7620
SRB.USERNAME=mwitt
SRB.PASSWORD=nyah
SRB.HOMEDIRECTORY=/dspace/home/mwitt.purdue
SRB.MDASDOMAINNAME=purdue
SRB.DEFAULTSTORAGERESOURCE=dspace-fs1
SRB.MCATZONE=dspace
#### SRB Collection Count and SRB Collection Names
SRB.root=/TGzone/home/lars.itap
SRB.maxcollections=1
SRB.collection1=LARSDATA
#### Custom Parameters for SRB GRID
SRBRecordFactory.repositoryIdentifier=mwitt.purdue
Display.MaxListSize=50
#### Custom Identify response values
Identify.repositoryName=SRB Data Grid
Identify.adminEmail=mailto:[email protected]=2000-01-01T00:00:00Z
Identify.deletedRecord=no
#### Crosswalk (in this example, FGDC-to-unqualified Dublin Core)
DC.Identifier=title
DC.Description=purpose
DC.Title=title
DC.Format=File Format
DC.Creator=address
DC.Subject=metprof
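
The crosswalk lines at the end of the config can be read as "Dublin Core element = source field name". The sketch below shows one way such a mapping might be applied. It is not the OAISRB implementation: the reading of the mapping direction is an assumption, and the source record values are invented for illustration.

```python
# Crosswalk section copied from the sample config.
CROSSWALK_CONFIG = """\
#### Crosswalk (in this example, FGDC-to-unqualified Dublin Core)
DC.Identifier=title
DC.Description=purpose
DC.Title=title
DC.Format=File Format
DC.Creator=address
DC.Subject=metprof
"""


def parse_crosswalk(config_text, prefix="DC."):
    """Return {DC element: source field name} from properties-style text."""
    mapping = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.startswith(prefix):
            mapping[key[len(prefix):]] = value.strip()
    return mapping


def apply_crosswalk(source_record, mapping):
    """Copy each mapped source field into its Dublin Core element."""
    return {
        dc_element: source_record[source_field]
        for dc_element, source_field in mapping.items()
        if source_field in source_record
    }


# Invented source record; field names follow the config, values are placeholders.
fgdc_record = {
    "title": "Example water quality data set",
    "purpose": "Illustrative record only",
    "File Format": "CSV",
    "address": "Example contact address",
    "metprof": "Example metadata profile",
}

mapping = parse_crosswalk(CROSSWALK_CONFIG)
dc_record = apply_crosswalk(fgdc_record, mapping)
print(dc_record)
```

In this config both `DC.Identifier` and `DC.Title` draw on the same source field, so the resulting Dublin Core record carries the title twice and has no separate identifier.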

Page 15: Enabling Interaction and Quality in a Distributed Data DRIS

Metadata research

Metadata librarian worked for four months analyzing metadata needs and processes for several data sets

Results included DC descriptions, enhanced with thesaurus headings, and a basic crosswalk

Also: metadata descriptions from scratch are too manually intensive…

Page 16: Enabling Interaction and Quality in a Distributed Data DRIS

Metadata: Water Quality

A flat file with only “system” metadata
Began with Dublin Core
Enhanced subjects with thesaurus from NAL (US National Agricultural Library)
Looked at DIF (Directory Interchange Format)
Looked at crosswalk with FGDC (Federal Geographic Data Committee) format

Page 19: Enabling Interaction and Quality in a Distributed Data DRIS

Next steps: Metadata

Articulate metadata workflow to embed metadata into the process
Review automating all data
Determine how/where to validate and automate descriptive metadata
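
One place validation could sit is just before records are exposed for harvesting. The sketch below is a minimal illustration only: the list of required Dublin Core elements is an assumption, not a rule stated in the presentation, and the sample record is invented.

```python
# Assumed minimum set of elements; a real workflow would define its own.
REQUIRED_ELEMENTS = ("Title", "Creator", "Description", "Subject")


def validate_record(record, required=REQUIRED_ELEMENTS):
    """Return a list of problems found in a Dublin Core style record."""
    problems = []
    for element in required:
        value = record.get(element)
        if value is None:
            problems.append(f"missing {element}")
        elif not str(value).strip():
            problems.append(f"empty {element}")
    return problems


# Invented record with one blank and one absent element.
sample = {"Title": "Example data set", "Creator": "  ", "Subject": "water quality"}
print(validate_record(sample))
```

A check of this kind is cheap to run on every record, which leaves manual review for the cases it flags.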

Page 20: Enabling Interaction and Quality in a Distributed Data DRIS

Conclusions and Questions

Use existing, native metadata whenever possible
Automate and periodically assess processes to ensure quality
Diminishing returns: we settled on discovery and collection-level metadata
Crosswalks are useful but can truncate or distort the original meaning
The importance of interactions, among people and systems
How do we implement CRIS/CWIS/DRIS in our environment?
What is the role of the Libraries in such?

Page 21: Enabling Interaction and Quality in a Distributed Data DRIS

Michael Witt [email protected]

D. Scott Brandt [email protected]

Takk (thank you)