Data Library Services in the Data Stewardship Lifecycle
Posted on 16-Apr-2017
Charles (Chuck) Humphrey, University of Alberta
Outline
• Canada as a case study for data library services: a twenty-year experiment
• Lessons learned from Canada
• General observations about forces shaping data library services
• Data and other digital collections
• The data “continuum of access” in collection development
• Data reference and technical services
• Planning levels of service for data libraries
• Applying a data stewardship lifecycle model
The Canadian experience
Timeline, 1970 to 2010 (figure):
• 1971 Census: introduction of public use data products in digital format
• 1981 Census: a set of data products cost about $12,000
• 1986 Census: the cost of the data products exceeded $200,000
• 1989: the CARL Census Data Consortium was formed
• Label on the timeline: The “Modern” Census Era
• 1989 is a benchmark year in the development of data library services in Canada. These services arose in response to Statistics Canada’s new pricing policy, mandated by the Conservative government then in power.
Data Library Context in 1989
• 8 data libraries: 3 in the west, 5 in the east
• 3 in libraries, 2 in academic computing centres, 2 in research centres, 1 library hybrid
The closest data library to my service was 1,200 km away.
Data Library Context in 2009
• 75 data library services: 25 in the west, 50 in the east
• All 75 are located in libraries
Changes between 1989 and 1998
• 1989: CARL Census Data Consortium
• 1991: CARL General Social Survey Microdata Consortium
• 1992: Annual COPPUL Data Service Training Workshops
• 1993: COPPUL/ICPSR Federation
• 1994: CARL Data Consortium for the 1991 Census
• 1994: Ontario-Quebec/ICPSR Federation
• 1996: Data Liberation Initiative pilot launched
• 1997: DLI Regional Training Workshops
Changes between 1999 and 2009
• 2000: Research Data Centre Network
• 2001-2002: National Data Archive Consultation
• 2004: DLI Train the Trainers Workshop
• 2005: Consultation on Access to Scientific Research Data
• 2007: Canadian Digital Information Strategy
• 2008: Research Data Strategy Working Group
General lessons from Canada
• Collections were a driving force behind libraries introducing data library services.
• Institutions working through cooperative arrangements helped introduce data as a library resource.
• Collection development at the local level was largely driven by the general availability of data.
• Training has been an ongoing factor in the continued participation of libraries in data collection initiatives.
• Peer-to-peer training has been an effective method in DLI, using the general rule that as you learn, you teach others.
• Training has allowed for differences in data needs and data cultures across institutions and regions in the country.
• Training opportunities have been continuous through annual regional workshops. DLI workshops have become an expectation.
• IASSIST conferences provide an immersion in data services and should be viewed as a training opportunity.
• National consultations and international pressures have made data a well-discussed topic in this decade but have failed to make data a political priority.
• While everyone seems to be talking about data, few are actually doing anything to address concerns about data access and preservation.
• Part of the inability to mobilise a collective response to data access and preservation in Canada is the absence of a forum for data people to plan and coordinate work together. The Research Data Strategy Working Group is a first attempt at this.
Collection development
• Data collections are part of a growing number of digital collections being managed in today’s libraries.
• Libraries face buying or leasing these collections, producing their own through digitisation projects, or serving as stewards for collections that are entrusted to them.
• In Canada, data collections have tended to be leased. Most data licenses require that the producer’s data be destroyed once a lease is terminated. One result is that these leases have become long-term commitments by libraries.
• With leased data, the role of a data librarian becomes one of managing the contractual relationship between data producers and her or his local institution.
• Collection development consists of choosing data producers that have data corresponding to patron needs on campus. Often these omnibus collections have a mix of data that will support a variety of research interests. One can characterise these as “collections of access.”
• One strategy in Canada has been to select data collections that support a continuum of access to products.
Continuum of access (figure), from open to restricted:
• Open access (free access): published statistics; Web access
• Conditional access (fees for access): anonymised data and aggregate databases; Web and in-person access
• Restricted access (expensive access): confidential data; data enclave access
Collection preservation
• Unlike many other countries that have national data archives, Canada lacks an institution that provides stewardship for the long-term preservation of and access to data.
• This institutional gap in Canada is now being addressed by proposals to establish trusted data repositories in some universities. The goal is to build a network of repositories nationally to preserve data collections.
• Data libraries, to a limited extent, have helped fill the gap in the absence of a national data archive.
Data reference
• Data reference depends on the level of service being supported, which can include:
• locating data that has been requested by title,
• finding data to support a line of research enquiry,
• interpreting data documentation,
• extracting subsets of data and providing the data in a format directly usable by a patron,
• merging and manipulating data files to produce new data products,
• providing advice to researchers throughout a project on metadata and data management.
Technical services
• Metadata for data collections should include (i) a general item description and (ii) a detailed content description that documents the data for machine processing as well as human understanding.
• A general item description in MARC format is typically produced for online catalogues and may be generated by the data producer or local bibliographic services.
• The detailed content metadata is generated by the data producer and can be delivered in a variety of formats and in conventions that often are not based on standards.
• The computing support will depend on the level of service being provided, just as collection development and data reference services depend on a defined level of service.
• Web 2.0 services for data delivery are tempered by the license agreements with data producers. Typically, institutions are required to use IDs and passwords to access data holdings.
• As federated authentication systems become more widely shared across institutions, institutions will be able to share the physical storage of data, and redundant storage of data collections will lessen.
Planning levels of service
• The importance of levels of service has been mentioned repeatedly in the context of data collections, data reference and technical services.
• What are the options for levels of service? How does one go about planning for levels of service? What co-dependencies must be included in data service plans?
• These questions can be addressed using a new framework based on the data stewardship lifecycle.
A framework for planning data services
• The concept of data stewardship in combination with a lifecycle model of data provides a useful tool for planning data library services.
• Data stewardship identifies the roles and responsibilities of all individuals and groups engaged in the production, access and preservation of data throughout its lifecycle.
• A data service plan should clearly state the roles and responsibilities identified with the level of service to be supported.
• The lifecycle of data is a representation of the various stages through which data flow from production to use to preservation to new uses.
• Each stage consists of a set of related activities that culminate in a significant product, which is then passed to a subsequent stage.
• By linking together a series of stages in logical sequence, the processes of data production and use are described.
Lifecycle of data
• As with any project management operation, the views of a project vary depending on the granularity at which activities are described.
• Similarly, the stages in the data’s lifecycle can be aggregated or disaggregated into larger or smaller groupings, depending on the viewpoint one desires.
• Keep these points in mind while examining a couple of lifecycle representations.
• The first model is the widely circulated DCC curation lifecycle.
http://www.dcc.ac.uk/lifecycle-model/
This table lists changes to the stages in the DCC model, re-aggregating activities in the lifecycle to create a data library viewpoint.
DCC stage(s) → Data library stage
• create or receive → data production
• appraise and select → data dissemination
• ingest, store, access and use → data repository, data discovery
• transform → data repurposing
Data lifecycle for data libraries (figure): Data Production, Data Dissemination, Data Repository, Data Discovery, Data Repurposing
Data production
• Stewardship role:
• Responsible for the terms of data use specified in the license of the data producer;
• Serve as lifecycle advisor to local data producers.
• Potential data services activities:
• Help local project managers develop data plans incorporating a lifecycle perspective;
• Provide researchers who are collecting data on human subjects with support statements for their ethics approval applications;
• Provide researchers with support statements on data management in grant applications;
• Provide feedback to data producers about data in demand on campus;
• Provide data producers with usage statistics on their data;
• Assist with literature and data searches in the study design stage;
• Consult with local data producers on metadata standards for data documentation;
• Organise training on the DDI metadata standard;
• Provide data preservation services throughout the data production stage.
Data dissemination
• Stewardship role:
• Responsible for communicating the terms of the license to patrons;
• Ensure the data products that are delivered are complete, documented and machine readable;
• Ensure the appropriate level of security is maintained for the data.
• Potential data services activities:
• Monitor the release dates of data from producers;
• Acquire data and metadata from data producers;
• Prepare catalogue records for data titles;
• Develop and maintain local access to data, providing formats appropriate for local needs;
• Support infrastructure that provides online access to data;
• Provide data dissemination services for researchers on campus;
• Provide data anonymisation services for human-subject data collected on campus;
• Coordinate the deposit of local research data with a data archive or repository.
Data repository
• Stewardship role:
• Responsible for the data collection in the local repository;
• Responsible for the service plan and operation of the local repository;
• Responsible for metadata practices in the local repository.
• Potential data services activities:
• Prepare and implement a data collection development plan;
• Acquire and ingest data from producers;
• Ensure data product authenticity;
• Appraise, select and ingest data originating on your campus into the repository;
• Provide services to support the use of data, including help with data extractions, reformatting and subsetting;
• Manage the collection of data and metadata, including refreshing digital media and migrating data to new digital media;
• Coordinate activities with the local Institutional Repository;
• Achieve and maintain “trusted” repository status.
Data discovery
• Stewardship role:
• Responsible for providing each patron with a comprehensive data reference interview or, when appropriate, for making an informed referral.
• Potential data services activities:
• Provide reference services to assist patrons in their search for data;
• Produce and maintain metadata services to help find data (may involve metadata production, loading records into the local OPAC, supporting a Nesstar or Dataverse service, etc.);
• Conduct data literacy training for library colleagues and establish the grounds for informed referrals to data services;
• Conduct data literacy training for patrons;
• Promote data citation practices on your campus as part of the ethics of academic integrity;
• Contribute to the development of new tools to exploit metadata.
Data repurposing
• Stewardship role:
• Responsible for ensuring permissions are in place for repurposing data;
• Ensure the deposit of new data in a repository.
• Potential data services activities:
• Help patrons mine metadata to discover data for repurposing;
• Provide technical support to help patrons merge data from multiple sources for new purposes;
• Participate in projects seeking new ways of exploiting metadata;
• Integrate the production of metadata into data repurposing practices;
• Engage in tools development to support data mining and data visualisation.
Across all stages of the lifecycle
• Work with data producers on campus to establish roles and responsibilities for the long-term access and preservation of data by clarifying stewardship roles.
• Work to ensure that information gaps do not occur throughout the lifecycle, such as files getting stranded on hard drives.
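The stewardship roles above can be read as a planning checklist. The Python sketch below is an illustration only, not part of the original presentation. It assumes the five stages form a cycle in the order the stage slides appear (production, dissemination, repository, discovery, repurposing, then back to production), paraphrases each stage's stewardship role, and uses a hypothetical `service_plan` function to list the stages a chosen level of service leaves uncovered. Those uncovered stages are where referrals or partner arrangements would be needed to keep gaps from opening in the lifecycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    stewardship_role: str  # paraphrased from the stage slides


# Assumed order: the sequence of the stage slides. Repurposing feeds
# new data back into production, which closes the cycle.
LIFECYCLE = (
    Stage("Data Production", "terms of use in producer licenses; lifecycle advice to local producers"),
    Stage("Data Dissemination", "communicate license terms; deliver complete, documented, secure data"),
    Stage("Data Repository", "local collection, service plan and metadata practices"),
    Stage("Data Discovery", "data reference interview or informed referral"),
    Stage("Data Repurposing", "permissions for reuse; deposit of new data"),
)


def next_stage(name: str) -> Stage:
    """Return the stage that receives the output of stage `name` (wraps around)."""
    names = [s.name for s in LIFECYCLE]
    return LIFECYCLE[(names.index(name) + 1) % len(LIFECYCLE)]


def service_plan(supported: set[str]) -> dict[str, list[str]]:
    """Hypothetical helper: split the lifecycle into stages a service covers
    and stages left uncovered, preserving lifecycle order in both lists."""
    unknown = supported - {s.name for s in LIFECYCLE}
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    covered = [s.name for s in LIFECYCLE if s.name in supported]
    gaps = [s.name for s in LIFECYCLE if s.name not in supported]
    return {"covered": covered, "gaps": gaps}


if __name__ == "__main__":
    plan = service_plan({"Data Dissemination", "Data Discovery"})
    print(plan["gaps"])
    print(next_stage("Data Repurposing").name)
```

For a starting service that supports only dissemination and discovery, the reported gaps are production, repository and repurposing, which are the stages where a plan would have to name who else takes stewardship responsibility.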
Start a service appropriate for today
• A data library does not have to begin with a large mandate but can be tailored to the level of support that can be maintained and staffed today.
• Having said this, plan for expansion! The services offered by a data library have a way of generating new expectations that will require further resources, spanning staff, training and infrastructure.
• The returns on the service will surprise you!