Providing access to your data

Robert R. Downs, PhD
Socioeconomic Data and Applications Center (SEDAC)
Center for International Earth Science Information Network (CIESIN)
Columbia University

December 6, 2011

Copyright 2011. The Trustees of Columbia University in the City of New York

Module Template: Subtitle; Version 1.0, Reviewed 9/15/11

Providing access to your data - overview

• Determining your audience
• Rights
• Access Mechanisms
• Tracking Data Usage
• Handling Sensitive Data

Why determine your audience?

• Determining the audience for data products or services influences how the resources are developed and managed to foster discovery, accessibility, understandability, and usability by that audience.

• Producers, distributors, stewards, and sponsors of data need to know what audiences are being served.
  • Producers can develop resources for use by the audience.
  • Distributors can package, promote, and disseminate resources to the audience.
  • Stewards can manage the data to meet current and prospective audience needs and assess its value to the audience.
  • Sponsors can support data collection, development, and distribution for the audience.

Determine audience throughout data lifecycle

• Audience may be defined by user demographics, such as discipline (oceanography, geology, etc.) or level of expertise (graduate students, experienced climate scientists, etc.).

• Audience may be defined by the application and uses of data products and services, such as scientific research, education, policy-making, and planning.

• Determination of the intended audience should inform initial data development and dissemination plans.

• Evaluation of past and current audiences should inform enhancements for data products and related services and can be used to identify the impacts of the resources.

Methods for determining current audience

• Metrics for access and support (describe in privacy policy)
  • Web metrics (initial and repeat visitors’ page visits, downloads)
  • Contact information provided during requests for assistance

• Registration
  • Request contact information from recipients of data

• User surveys (with approval of ethics board)
  • Administer questionnaires to visitors of data webpage

• Citations – provide a recommended citation for data
  • References in published scientific literature (articles, books, etc.)
  • References in gray literature (presentations, reports, blogs, etc.)
  • References in popular media (magazines, newspapers, radio, television, etc.)
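
The web metrics named above (page visits, downloads, initial and repeat visitors) can be sketched in a few lines of Python. This is a minimal illustration, not a real log parser: the record layout `(visitor_id, page, was_download)`, the page paths, and the `summarize` function are all assumptions made for the example. A real deployment would read server logs and would need to follow the privacy policy mentioned above.

```python
from collections import Counter

# Hypothetical access records: (visitor_id, page, was_download).
# Real web logs look different; visitor_id stands in for an anonymized identifier.
records = [
    ("v1", "/data/popgrid", False),
    ("v1", "/data/popgrid", True),
    ("v2", "/data/popgrid", False),
    ("v1", "/data/landuse", False),
    ("v3", "/data/landuse", True),
]


def summarize(records):
    """Count page visits, downloads per page, and initial vs. repeat visitors."""
    visits_per_visitor = Counter(visitor for visitor, _, _ in records)
    downloads_per_page = Counter(page for _, page, is_dl in records if is_dl)
    # A visitor with more than one recorded visit is treated as a repeat visitor.
    repeat = sum(1 for count in visits_per_visitor.values() if count > 1)
    return {
        "page_visits": len(records),
        "unique_visitors": len(visits_per_visitor),
        "repeat_visitors": repeat,
        "initial_only_visitors": len(visits_per_visitor) - repeat,
        "downloads_per_page": dict(downloads_per_page),
    }


summary = summarize(records)
print(summary)
```

Counting by an anonymized identifier rather than by name keeps the metric useful for audience analysis without collecting contact details.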

Rights

• Intellectual property rights for the use of data are decided by the producer (or owner, if different).

• Licenses or permissions may allow others to use the data.

• A description of the rights for the use of data should be packaged with the data so that potential users can determine what they are allowed to do with the data.

• Applying non-restrictive licenses and open data practices can allow more uses of the data.

• Rights also should be described for any documents, software code, or scripts that are needed to use the data.

• Rights laws are not the same in every country.
  • In the US, copyright applies automatically to original works, but not to facts.

Relevance of rights to data management

• Distributors, stewards, and end-users of data need to receive rights for using data.
  • Distributors need rights to modify and disseminate the data and to allow others to distribute the data.
  • Stewards need to acquire rights to maintain, modify, and store the data.
  • End-users need to acquire rights to use the data.

• Recipients of rights need to know what rights they have been granted for using the data so that they may exercise their rights and are aware of any restrictions.
  • Simple language should be used to communicate rights assigned to the user.
  • If clear descriptions of the rights are packaged with the data, then users can easily learn about their rights and requests for rights will be limited.
  • Machine-readable rights descriptions can be harvested for distribution in catalogs.
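
As one possible reading of "machine-readable rights descriptions", the sketch below stores a rights statement as JSON alongside a plain-language summary, then shows how a catalog harvester might pull out the fields it needs. The field names, the data set, and the `catalog_entry` function are hypothetical and do not follow any particular metadata standard.

```python
import json

# Hypothetical rights record packaged with a data set.
# Field names are illustrative only, not a metadata standard.
rights = {
    "dataset": "example-dataset",
    "rights_holder": "Example University",
    "license": "CC BY 3.0",
    "license_url": "http://creativecommons.org/licenses/by/3.0/",
    "plain_language": "You may use, share, and adapt these data if you cite the source.",
    "restrictions": [],
}

# Serialize the record as it would be stored next to the data files.
packaged = json.dumps(rights, indent=2)


def catalog_entry(rights_text):
    """Harvest the fields a catalog would display from a packaged rights record."""
    record = json.loads(rights_text)
    return {
        "dataset": record["dataset"],
        "license": record["license"],
        "restricted": bool(record["restrictions"]),
    }


entry = catalog_entry(packaged)
print(entry)
```

Keeping the plain-language summary in the same record as the license identifier serves both human readers and harvesters from a single source.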

Choices for assigning rights: examples

• Public Domain – not copyrighted, permission is not required
  • Works created by some government agencies may be in the public domain.
  • Works in the public domain can be marked to identify them.
  • Rights can be waived by applying the Creative Commons Zero (CC0) public domain dedication.

• CreativeCommons.org licenses for copyright holders to allow use, dissemination, derivation, integration
  • CC BY: requires attribution (cite the source).
  • CC BY-ND: requires attribution, no derivatives allowed.
  • CC BY-NC: requires attribution, no commercial use allowed.
  • CC BY-SA: requires attribution, share alike using same license.

• Apache.org license for copyright holders of scripts and computer code
  • Applied to code to grant a “worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute” the work (Apache License 2.0, 2004).
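
The Creative Commons options listed above differ along a few yes/no conditions, which can be captured in a small lookup table. The sketch below is a simplification for illustration only: the `Terms` class and `use_permitted` function are hypothetical names, and the actual license texts remain the authority on what is allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terms:
    attribution_required: bool
    derivatives_allowed: bool
    commercial_allowed: bool
    share_alike: bool


# Simplified summary of the options on the slide; not a substitute for the license texts.
LICENSES = {
    "CC0": Terms(False, True, True, False),
    "CC BY": Terms(True, True, True, False),
    "CC BY-ND": Terms(True, False, True, False),
    "CC BY-NC": Terms(True, True, False, False),
    "CC BY-SA": Terms(True, True, True, True),
}


def use_permitted(license_name, *, derivative=False, commercial=False):
    """Return True if the intended use fits the simplified terms above."""
    terms = LICENSES[license_name]
    if derivative and not terms.derivatives_allowed:
        return False
    if commercial and not terms.commercial_allowed:
        return False
    return True


print(use_permitted("CC BY-ND", derivative=True))
print(use_permitted("CC BY-NC", commercial=True))
print(use_permitted("CC BY-SA", derivative=True, commercial=True))
```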

Access Mechanisms

• Access providers
  • Single or multiple distributors (version control, commitment, cited source)

• Producer provides access
  • Producer’s website, project website, blog, email, portable media

• Distributor provides access
  • Distributor’s website: organized by collections, projects, or producers

• Provide access to packaged data
  • Data package includes documentation (any information users need to know)

• Access options
  • Free or fee, discovery, user capabilities, user services (help), registration

• Long-term access
  • Organizational commitment, infrastructure, continuing management
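
One way to make "packaged data" concrete is to check that documentation travels with the data and to record a checksum for each file so recipients can verify what they received. The sketch below works on in-memory contents to stay self-contained. The file names, the `REQUIRED_DOCS` convention, and both functions are assumptions for illustration.

```python
import hashlib

# Hypothetical package contents held in memory: file name -> bytes.
package = {
    "data/observations.csv": b"site,year,value\nA,2010,3.2\n",
    "README.txt": b"Describes variables, units, methods, and recommended citation.\n",
    "LICENSE.txt": b"Rights statement for the data.\n",
}

# Assumed convention: every package carries a readme and a rights statement.
REQUIRED_DOCS = ("README.txt", "LICENSE.txt")


def build_manifest(files):
    """Map each file name to the SHA-256 checksum of its contents."""
    return {name: hashlib.sha256(content).hexdigest()
            for name, content in sorted(files.items())}


def missing_docs(files, required=REQUIRED_DOCS):
    """List required documentation files that are absent from the package."""
    return [name for name in required if name not in files]


manifest = build_manifest(package)
print(missing_docs(package))
for name, checksum in manifest.items():
    print(checksum[:12], name)
```

When several distributors carry the same package, a shared manifest also gives a simple way to confirm they are serving the same version.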

Tracking data usage

• Scientific data can be used for various purposes in addition to those initially planned by the investigators who originally collected the data.
  • New products, such as maps, can be created and distributed.
  • Data may be integrated with other data to create new data.
  • Studies may be published about the use of the data.
  • Policies, plans, and reports may result from using the data.

• Tracking the use of scientific data identifies how the data have been used, can reveal benefits and impacts of their use, and can justify previous and future data efforts.

Why track data usage?

• Data producers, distributors, stewards, and sponsors need to know how the data have been used.
  • Producers can determine whether the data are useful to others.
  • Distributors will learn which communities the data are serving.
  • Stewards can document the data with descriptions of prior use.
  • Sponsors can justify the costs of the effort and demonstrate the potential value of sponsoring future efforts.

• Awareness of how data have been used may identify opportunities for new data collection and distribution efforts.

• Tracking data usage can justify the effort of collecting data and of developing and distributing data products or services.

• Evidence of beneficial data use demonstrates accomplishments for those who contributed to the effort.

Describing tracked usage

• Use may be described in terms of users, such as discipline (oceanography, geology, etc.) or level of expertise (graduate students, experienced climate scientists, etc.).

• Uses of the data can be categorized by purposes, such as scientific research, education, policy-making, and planning.

• Identifying uses of similar data should inform initial data development and dissemination plans.

• Evaluation of past and current uses should inform enhancements for data products and related services and can be used to identify the impacts of the resources.

Methods for tracking data usage

• Metrics for access and support (describe in privacy policy)
  • Web metrics (initial and repeat visitors’ page visits, downloads)
  • Contact information provided during requests for assistance

• Registration
  • Request contact information from recipients of data

• User surveys (with approval of ethics board)
  • Administer questionnaires to visitors of data webpage

• Citations – provide a recommended citation for data and subscribe to alerts from publishers
  • References in published scientific literature (articles, books, etc.)
  • References in gray literature (presentations, reports, blogs, etc.)
  • References in popular media (magazines, newspapers, radio, television, etc.)
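
A recommended citation is easier to track when it is generated consistently. The sketch below assembles a citation string from its parts and then scans a list of references for mentions of the data set title. The citation layout, the example values, and both function names are hypothetical, and real citation tracking would rely on publisher alerts and citation indexes rather than simple string matching.

```python
def recommended_citation(authors, year, title, version, distributor, locator):
    """Assemble a citation string from the parts a user needs to cite the data."""
    return f"{authors}. {year}. {title}, Version {version}. {distributor}. {locator}"


def mentions_dataset(reference_text, title):
    """Case-insensitive check for the data set title in a reference."""
    return title.casefold() in reference_text.casefold()


# All values below are placeholders for illustration.
TITLE = "Example Gridded Data Set"
citation = recommended_citation(
    authors="Example, A., and B. Sample",
    year=2011,
    title=TITLE,
    version="1.0",
    distributor="Example Data Center",
    locator="https://example.org/data/example-gridded",
)

references = [
    "We mapped exposure using the example gridded data set (Version 1.0).",
    "An unrelated report on a different topic.",
]
tracked = [ref for ref in references if mentions_dataset(ref, TITLE)]

print(citation)
print(len(tracked))
```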

Sensitive data

• Some data are sensitive and may require special handling.

• Data may be designated as sensitive for various reasons.

• The designation of data as sensitive may be permanent or temporary.

• Sensitive data must be identified and properly handled to ensure that any sensitive information is not compromised.
  • Data managers must ensure that sensitive data are adequately protected so that they are not accessed or modified by unauthorized persons and are not lost or misused.
  • Data users must ensure that their handling or use does not violate any restrictions that have been placed on sensitive data that they use.

Why are sensitive data a concern?

• Data may be designated as sensitive to protect individuals and other life forms, property, programs, or other interests.
  • Laws, regulations, or policies may designate data as sensitive.
  • Licenses and agreements may specify requirements to protect data.
  • Security: data about locations or property that cannot be disclosed.
  • Privacy and civil rights: personal information or information identifying individuals.
  • Environmental information about locations or inhabitants that need to be protected.
  • Proprietary information about new developments or technologies.

• Violating restrictions that have been placed on sensitive data is unethical and could result in civil or criminal penalties, such as fines, loss of privileges, or incarceration.

How are sensitive data handled?

• Access and use
  • Sensitive data must be protected to limit who may have access or how they are used.
  • Authorization for access may be limited to specific individuals or specific purposes.
  • Sensitive data may require additional security, monitoring, and loss prevention.
  • Safeguards may need to limit modifications or derivations of sensitive data.
  • Restrictions may limit whether sensitive data may be used in products or services.

• Dissemination and copies
  • Restrictions may limit who is authorized to distribute the data, how the data may be distributed, and whether copies of the data may be created.
  • Distribution of sensitive data may be limited to a specific location or service.
  • Distribution may be limited to a specific time period, which may be in the future.
  • Distribution of data and derived products or associated services may be limited.
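
The limits described above (specific individuals, specific purposes, a specific time period) can be expressed as a simple policy check. This is a minimal sketch under stated assumptions: `AccessPolicy`, its fields, and `may_access` are hypothetical names, and a production system would add authentication, logging, and monitoring.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AccessPolicy:
    authorized_users: FrozenSet[str]
    permitted_purposes: FrozenSet[str]
    available_from: Optional[date] = None   # distribution may start in the future
    available_until: Optional[date] = None  # or end after a set period


def may_access(policy, user, purpose, on_date):
    """Apply the individual, purpose, and time-period limits in turn."""
    if user not in policy.authorized_users:
        return False
    if purpose not in policy.permitted_purposes:
        return False
    if policy.available_from is not None and on_date < policy.available_from:
        return False
    if policy.available_until is not None and on_date > policy.available_until:
        return False
    return True


# Hypothetical policy: one authorized analyst, research use only, released in 2012.
policy = AccessPolicy(
    authorized_users=frozenset({"analyst_a"}),
    permitted_purposes=frozenset({"research"}),
    available_from=date(2012, 1, 1),
)

print(may_access(policy, "analyst_a", "research", date(2012, 6, 1)))
print(may_access(policy, "analyst_a", "research", date(2011, 12, 6)))
```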

Methods for handling sensitive data

• Identify data that must be designated as sensitive.

• Designate data that are sensitive and label them as sensitive.

• Store sensitive data in secure facilities.

• Transport sensitive data using secure means.

• Include with sensitive data a description of the conditions for handling them, along with acceptable uses.

• Establish infrastructure and procedures for storing, managing, disseminating, accessing, copying, transmitting, and using sensitive data.

• Train staff on how to protect sensitive data from unauthorized use.

Methods of providing access to sensitive data

• Limit access to authorized individuals or roles
  • Enable access within a facility only accessible by authorized persons.
  • Limit access to authorized individuals with password-protection.
  • Establish a data enclave without capabilities to copy or transmit data.

• Provide options for authorizing access to sensitive data
  • Require evidence of approval or expertise.
  • Require users to sign a confidentiality, non-disclosure, or data use agreement.
  • Require approval of data protection plan from potential users.
  • Request an exception or waiver of restrictions under certain conditions.
  • Request clearance for specific individuals to access sensitive data.
  • Request declassification of sensitive data from relevant authority.

• Modify the data to protect sensitive portions
  • Change aspects of data that are sensitive (recode, generalize)
  • Remove or obscure the portions of data that are sensitive (redact)
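
The three modifications named above (recode, generalize, redact) might look like the following for a single record. The record layout, field names, and helper functions are hypothetical, and the chosen precision and age bands are arbitrary. Coarsening values this way reduces detail but does not by itself guarantee confidentiality, so any real release should follow the applicable disclosure review.

```python
def generalize_coordinate(value, decimals=1):
    """Generalize a coordinate by reducing its precision."""
    return round(value, decimals)


def recode_age(age, width=10):
    """Recode an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def redact(record, sensitive_fields):
    """Obscure the named fields and leave the others unchanged."""
    return {key: ("[REDACTED]" if key in sensitive_fields else value)
            for key, value in record.items()}


def protect(record):
    """Apply redaction, generalization, and recoding to one record."""
    out = redact(record, {"name"})
    out["lat"] = generalize_coordinate(out["lat"])
    out["lon"] = generalize_coordinate(out["lon"])
    out["age"] = recode_age(out["age"])
    return out


# Hypothetical record for illustration.
original = {"name": "Jane Doe", "age": 37, "lat": 40.8075, "lon": -73.9626}
protected = protect(original)
print(protected)
```

The original record is left untouched, so the protected copy can be disseminated while the source stays in secure storage.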

Providing access to sensitive data

• Review or establish organizational policies on sensitive data.
• Identify conditions under which sensitive data may be disseminated.
• Select a dissemination method that does not violate restrictions.
• Create a data dissemination plan for sensitive data.
• Request approval from authorities for data dissemination plan.
• Disseminate data in accordance with data dissemination plan.
• Monitor data dissemination to ensure compliance with all restrictions on sensitive data.
• Cease dissemination if restrictions on sensitive data are violated and immediately report any violations to authorities.

References and Resources

• Apache Software Foundation: Licenses. Available at http://www.apache.org/licenses/
• BSD License. Available at http://www.freebsd.org/copyright/license.html
• Chen, R. S. and Downs, R. R. 2010. Evaluating the Use and Impact of Scientific Data. Assessing the Usage and Value of Scholarly and Scientific Output: An Overview of Traditional and Emerging Metrics. Available at http://info.nfais.org/info/ChenDownsNov10.pdf
• Creative Commons. Available at http://creativecommons.org/
• Best Practices for Sharing Sensitive Environmental Geospatial Data. Available at http://www.geoconnections.org/publications/Key_documents/Sensitive_Env_Geo_Data_Guide_EN_v1.pdf
• Computer Security Act of 1987. Available at http://csrc.nist.gov/groups/SMA/ispab/documents/csa_87.txt
• Confidentiality Issues in Geospatial Data Applications. Available at http://sedac.ciesin.columbia.edu/confidentiality/

• Dealing With Sensitive Data. Available at http://www.dcc.ac.uk/events/research-data-management-forum/rdmf4-dealing-sensitive-data
• Downs, R. R., Schumacher, J., Chen, R. S., and Lenhardt, W. C. 2008. Assessing the Scientific Benefits of Interdisciplinary Use of Social Science Data through Citation Analysis. International Association for Social Science Information Services & Technology (IASSIST) 2008 Conference. Palo Alto, CA. May 28, 2008. Available at https://www.stanford.edu/group/ADS/cgi-bin/drupal/system/files/DataCitationIASSIST20080528.ppt
• ICPSR – Restricted Data. Available at http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/restricted/
• Guidelines for Providing Appropriate Access to Geospatial Data in Response to Security Concerns. Available at http://www.fgdc.gov/policyandplanning/Access%20Guidelines.pdf
• GNU General Public Licenses (GPL and LGPL). Available at http://www.gnu.org
• Open Data Foundation. Available at http://www.opendatafoundation.org/
• United States Copyright Office. Available at http://www.copyright.gov