TRANSCRIPT
Enhancing Access to Data in Scholarly Research
Changing Cultures, Building Standards
Linda Beebe, Senior Director, PsycINFO
ICSTI Annual Meeting 2012
And authors and publishers did:
◦ Text (extended methodology sections, bibliographies, survey results, derivations. . .)
◦ Tables and figures
◦ Multimedia
◦ Gene sequences, protein structures, chemical compounds, structures, 3-D images
◦ Computer programs: algorithms, code, executables
◦ Datasets, and raw research data
Technology allowed us to add almost anything outside the article. . .
No standards
Very different cultures and practices from one discipline to another
Inconsistent identifiers
Poor metadata
Lack of discovery tools
Abuse of readers and reviewers
We had rapid, unplanned growth.
Business Policies & Practices cover selecting, editing, hosting, assuring discoverability, referencing, packaging, maintaining links, providing context, and preserving.
Technical Recommendations emphasize metadata, persistent identifiers, preservation, packaging and exchange.
Bi-directional linking using DOIs, emphasis on persistent linking reliability.
Flexibility and simplicity to support either a simple approach or the most detailed and granular metadata.
Clear definitions of metadata elements.
Attention to preservation and migration, including saving of objects along the migration chain.
NISO-NFAIS Recommended Practices
Nearing Final Publication
The following 2 slides from Howard Ratner are a good reminder of the growth.
Borrowed with permission from his talk at the December 2011 STM Innovations meeting.
Ideas generated by the STM Future Lab Committee.
Today the buzz is around raw data.
Important Topic #1: API Platforms*
New Access to Content
NEW ACCESS TO CONTENT
*API platforms for third-party developers available at Elsevier, Springer, NPG, IEEE (search)
Getting ready for launch: IoPP, T&F, CABI
Many more expected to follow
Curiosity driven R&D
GRANULARITY OF CONTENT
SEMANTICS
LET THE OUTSIDE WORLD IN
OUR CONTENT YOUR WAY
CREATE CROSS-PUBLISHER STANDARDS
Common metadata
Full text formats
HTML5
API PLATFORMS
XHTML
THIRD PARTY APPS
App store
LINKED DATA
LINKED OPEN DATA
RDF
MOBILE PRODUCTIVITY
MULTI-DEVICE PRODUCTIVITY
Seamlessly linked platforms
M-commerce MOBILE
Transmedia items
Voice Activation
Important Topic #2: Research Data
New Presentations for Re-use
RESEARCH DATA
DATA OBJECTS ARE FIRST CLASS RESEARCH OBJECTS
MAKE DATA INTERACTIVE
share the actual workflow of the researcher?
graphics represent data sets; how to open them up?
ACTIONABLE DATA
DATA CREATION
What formats do users want?
COMMON STANDARDS
AUTHORING TOOLS
how to treat supplemental files to journals?
Guidelines for:
- Reuse and sharing
- Incentives and barriers
- Editorial policies
Discoverability of data
BIG DATA
Deep Linking
REPOSITORIES
DATACITE
Bibliographic tools
User behaviour
Mendeley
CiteSeer
ColWiz
ReadCUBE
Data journal
“Hard sciences” such as Physics and Chemistry—long history of handling supplemental material and requiring access to data.
Disciplines that study human subjects (psychology, sociology, health sciences)—far less likely to have such practices.
There is growing interest in standards and other support for data deposits and access.
Different Cultures & Practices
Study of Matter
AAAS: must deposit in approved repository.
ACS: must submit data and deposit.
AGU: must deposit data in approved repository.
ASPB: must submit to journal.
Study of Humans
APA: to date only expected to supply for verification.
APS: no requirements
ASA: no requirements posted
AAA: no requirements posted
The Divide on Data Deposits
In the “softer” sciences, increased quantities of data are scattered on laptops, in file drawers, on the web—all in danger of being lost, even thrown away.
Question: how do we preserve these data and make them available for further research?
Actually, there are many questions
What constitutes data?
What must the author do to it?
Who will maintain it?
What about confidentiality?
How does one cite data?
. . . And many more
Webster's: factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation.
Chaim Zins (2006)—statistical observations and other recordings or collections of evidence
NSF—any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor data streams, video, audio, algorithms, software, models and simulations, images, etc.
Altman & King—systematic compilation of measurements for machine reading; must be systematically organized and described
What constitutes data?
Report on Integration of Data and Publications, October 17, 2011. Susan Reilly, Wouter Schallier, Sabine Schrimpf, Eefke Smit, and Max Wilkinson. Retrieved 10/11/2012 from http://www.stm-assoc.org/2011_12_5_ODE_Report_On_Integration_of_Data_and_Publications.pdf
Replication standard: sufficient information to enable a third party to replicate with no additional information from the author (King 1995).
So authors must:
Provide clear metadata.
Code consistently and list coding instructions.
Explain how data were used.
Provide all raw data.
Organize data in a way that can be used by others.
Making data available requires a different workflow and more work, but makes for a better scientist.
What must an author do?
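One way to picture "code consistently and list coding instructions" is a small machine-readable codebook that travels with the raw data. The sketch below is illustrative only: the variable names, the codes, and the `check_against_codebook` helper are assumptions made for this example, not something taken from King (1995) or from any repository's requirements.

```python
# Hypothetical codebook: each variable lists its allowed codes and their meaning.
CODEBOOK = {
    "gender": {1: "female", 2: "male", 9: "not reported"},
    "condition": {0: "control", 1: "treatment"},
}


def check_against_codebook(rows, codebook):
    """Return (row_index, variable, value) for every value the codebook does not define."""
    problems = []
    for index, row in enumerate(rows):
        for variable, allowed in codebook.items():
            value = row.get(variable)  # a missing variable shows up as None
            if value not in allowed:
                problems.append((index, variable, value))
    return problems


rows = [
    {"gender": 1, "condition": 0},
    {"gender": 3, "condition": 1},  # 3 is not defined in the codebook
]
print(check_against_codebook(rows, CODEBOOK))
```

A third party who receives both the rows and the codebook can run the same check and interpret every code without asking the author.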
Natural sciences, many options such as Crystallography, ChemStar, ChemSpider, PubChem, PANGAEA.
Life Sciences: Protein Data Bank is now one entity with data from former banks in the US, Europe, and Japan. Also Dryad, National Biological Information Infrastructure.
Not so many options in Social Sciences.
Who will maintain the data?
Inter-university Consortium for Political and Social Research (ICPSR)
U of Michigan
Data deposit and management
Publication-Related Archive quickly available, but ICPSR does not process.
Institute for Quantitative Social Science (IQSS) Dataverse Network
Harvard
Maintains dataverses (individual repositories).
Delivers formal persistent citations.
Two options in Social Sciences
IQSS Dataverse Network terms and conditions (paraphrased):
Agree not to use materials to obtain information that could ID subjects in any way, produce links that could ID them, or do anything that could constitute invasion of privacy or breach of confidentiality.
Also, will not download or use in any way prohibited by applicable law.
And will always include the bibliographic citation for the data in any publication that references the data.
What about confidentiality and attribution?
Like any citation, it must contain basic elements that identify the dataset as unique: Title, Author, Date, Version, Persistent Identifier.
DataCite, the organization that manages DOIs for data, recommends: Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier.
Example: Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127‐797. Geological Institute, University of Tokyo. http://dx.doi.org/10.1594/PANGAEA.726855
And how does one cite data?
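The citation pattern on this slide can be sketched as a small formatting function. This is a minimal illustration, assuming that Version and ResourceType are optional and are simply skipped when absent (as in the PANGAEA example); the function name `format_data_citation` and its parameters are made up for this sketch and are not part of any DataCite tool.

```python
def format_data_citation(creators, year, title, publisher, identifier,
                         version=None, resource_type=None):
    """Build 'Creator (Year): Title. Version. Publisher. ResourceType. Identifier'."""
    # Optional elements (version, resource type) are skipped when not supplied.
    middle = [title, version, publisher, resource_type]
    body = " ".join(f"{element}." for element in middle if element)
    return f"{'; '.join(creators)} ({year}): {body} {identifier}"


citation = format_data_citation(
    creators=["Irino, T", "Tada, R"],
    year=2009,
    title="Chemical and mineral compositions of sediments from ODP Site 127-797",
    publisher="Geological Institute, University of Tokyo",
    identifier="http://dx.doi.org/10.1594/PANGAEA.726855",
)
print(citation)
```

Keeping the persistent identifier as the last element means the citation always ends in something a reader can resolve.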
How do we know the data have not changed?
Altman & King (2007) advocated the Universal Numeric Fingerprint (UNF): a short fixed-length string of numbers and characters.
Example: UNF: 3: ZNQRI1405389xOBffg?== in which the 3 is the version number and the suffix is the fingerprint. If the fingerprint changes, the set is a new version of the data.
Another issue—fixity. . .
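The idea behind a fixity fingerprint can be shown with a simplified sketch. This is not the UNF algorithm itself: the normalization rules, the choice of SHA-256, and the 16-byte truncation below are assumptions picked for illustration. The point is only that the fingerprint depends on the data values, not on how a file happens to format them.

```python
import base64
import hashlib


def normalize(value, digits=7):
    """Render a value in a canonical form so formatting differences do not matter."""
    if isinstance(value, (int, float)):
        # Fixed number of significant digits, e.g. 1 and 1.0 both become '1.000000e+00'.
        return format(float(value), f".{digits - 1}e")
    return str(value).strip()


def fingerprint(rows, digits=7):
    """Hash the normalized values and return a short fixed-length string."""
    digest = hashlib.sha256()
    for row in rows:
        for value in row:
            digest.update(normalize(value, digits).encode("utf-8"))
            digest.update(b"\x00")  # field separator
        digest.update(b"\n")  # row separator
    return base64.b64encode(digest.digest()[:16]).decode("ascii")


original = [[1, "a"], [2.5, "b"]]
reformatted = [[1.0, " a"], [2.5, "b "]]  # same data, different formatting
edited = [[1, "a"], [2.6, "b"]]           # one value changed

print(fingerprint(original) == fingerprint(reformatted))
print(fingerprint(original) == fingerprint(edited))
```

If the fingerprint stored with a citation no longer matches the one computed from the file in hand, the reader knows the data are a different version.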
Just like citing other sources of information—encourages findability, credits the creator, makes any impact trackable.
Promotes more and better science, as it enables reuse and verification of data.
Rewards the data producer—may encourage others to deposit data.
The importance of citations. . .
DataCite—very international with members around the world (CDL & Purdue US members, Microsoft & ICPSR associates)
CODATA: International Council for Science, Committee on Data for Science & Technology
International Association for Social Science Information Services & Technology
Data-PASS: Data Preservation Alliance for the Social Sciences, membership organization of archives and research centers to date
Some Advocates for a Culture of Data Citation
Linkability and Citability of Research Data
Responsibilities for researchers, data archives, publishers
Co-responsibility for bi-directional linking between datasets and publications using persistent identifiers
Support for data reuse
Issued in June 2012
Joined by CrossRef in July
Joint STM-DataCite Statement
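Bi-directional linking means each side records the other's persistent identifier. A minimal sketch of a consistency check follows, assuming each party keeps a simple mapping of its own outgoing links; the DOIs are placeholders invented for illustration, and `one_way_links` is a hypothetical helper, not part of any CrossRef or DataCite service.

```python
def one_way_links(pub_to_data, data_to_pubs):
    """Return sorted (source, target) pairs where the target does not link back."""
    missing = set()
    for pub, datasets in pub_to_data.items():
        for dataset in datasets:
            if pub not in data_to_pubs.get(dataset, ()):
                missing.add((pub, dataset))
    for dataset, pubs in data_to_pubs.items():
        for pub in pubs:
            if dataset not in pub_to_data.get(pub, ()):
                missing.add((dataset, pub))
    return sorted(missing)


# Placeholder DOIs, invented for illustration only.
pub_to_data = {"10.9999/article.1": ["10.9999/dataset.a", "10.9999/dataset.b"]}
data_to_pubs = {
    "10.9999/dataset.a": ["10.9999/article.1"],
    "10.9999/dataset.b": [],  # the archive has not recorded the article yet
}
print(one_way_links(pub_to_data, data_to_pubs))
```

Here the publisher links to both datasets, but only one archive record links back, so the check reports the pair that still needs the archive's half of the link.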
Researcher
Institution
Funder
Publisher
Data Manager
Need for collaboration among all major participants in the Research Cycle
Funder mandates for data sharing plans encourage new thinking from some disciplines.
Connection with the publications is needed.
FundRef: new initiative within CrossRef
Collaboration between publishers and funders to make connections between grants and resulting publications
Pilot for publishers to create and submit standard metadata with funder name and grant number.
Working group includes several publishers and funders. http://www.crossref.org/fundref/index.html
Funder/Research/Publishing Connections
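Once publications carry a funder name and grant number in a standard form, connecting grants to their resulting publications becomes a simple lookup. The sketch below only illustrates that idea: the field names `doi`, `funder_name`, and `grant_number`, the sample values, and the `publications_by_grant` helper are assumptions for this example and do not reproduce the FundRef or CrossRef metadata schema.

```python
from collections import defaultdict

# Field names and values are illustrative; they are not the FundRef or CrossRef schema.
records = [
    {"doi": "10.9999/article.1", "funder_name": "Funder A", "grant_number": "G-001"},
    {"doi": "10.9999/article.2", "funder_name": "Funder A", "grant_number": "G-001"},
    {"doi": "10.9999/article.3", "funder_name": "Funder B", "grant_number": "X-17"},
]


def publications_by_grant(records):
    """Index publication DOIs by (funder name, grant number)."""
    index = defaultdict(list)
    for record in records:
        key = (record["funder_name"], record["grant_number"])
        index[key].append(record["doi"])
    return dict(index)


print(publications_by_grant(records)[("Funder A", "G-001")])
```

The lookup only works if funder names are recorded consistently, which is why standard metadata matters more than the code.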
Established to solve the name ambiguity problem in scholarly communications by creating a registry of persistent unique identifiers for individual researchers.
Provides an open and transparent linking mechanism between ORCID, other identifiers, and research objects—pubs, grants, patents, etc.
Governed by a board representing all stakeholders.
Launching this month. http://about.orcid.org/
ORCID Another Example
Designed to facilitate information exchange about research and scholarship.
Funded by NIH, National Center for Research Resources
Initially, 7 academic institutions, but growing
APA has instance: www.vivo.apa.org
Semantic web of information to support interconnectedness and trust support maintenance of research data.
And VIVO still another . . .
Data standards, changing cultures, and new infrastructures will help us avoid the tumult we’ve experienced with supplemental materials.
Past expectation--psychologists do not withhold data and will share for verification of results.
New expectation—authors must agree to share data.
In Psychology—a new model for data sharing at APA
Psychologists worried:
Potential nefarious uses: unscrupulous people could twist the data or hector the author who was trying to help.
Well-intentioned but inept secondary analysis—they might get it wrong!
Loss of potential publications for self—I haven’t written all my articles from this data!
But most common fear—loss of academic credit for what may be years of data collection.
Sharing broadly a rare event in the past
Archives of Scientific Psychology is very different from other APA journals in 4 regards:
Authors must submit data to APA or approved repository.
Journal is electronic only.
It is an open access/author pays model.
Authors must submit two methods sections: 1 scientific and 1 in lay language.
A radical change for psychology. . .
Authors sign a Collaboration Agreement specifying that others may reuse their data.
Researchers who wish to reuse the data must sign a Collaboration Agreement stating:
1. They will not do anything to reveal identity of subjects.
2. They will not engage in “gotcha” publishing (run analyses to prove author wrong and publish the results).
3. They will offer the original data collector co-authorship.
Data Collaboration Most Radical Aspect
Change the paradigm for use and reuse of data in psychological research by assuring full attribution and credit for the original creator of the data.
Contribute to the culture of transparency and prevention of fraud in science.
Maintain APA’s high standards for peer-reviewed literature and contributions to science.
APA’s Goals
The jury is still out—but manuscripts are coming in.
As all the participants in the scholarly communications process work to enhance access to data, there undoubtedly will be more revolutionary changes.
Linda Beebe, Senior Director, PsycINFO
American Psychological Association
www.apa.org/pubs/index.aspx
Thanks for Listening!
By building standards, we can change cultures.