The Persistence of Error (2011 CrossRef Annual Meeting)
Phil Davis, Ph.D.
The Persistence of Error: A Study of Retracted Articles on the Internet and in Personal Libraries
2011 CrossRef Annual Member Meeting
November 15, 2011
An Elegant Solution to a Poorly Understood Problem
2 November 21, 2011
What We Know
• Number of retractions small but increasing (Wager & Williams, 2011; Steen, 2011)
• Retracted articles continue to be cited as valid studies (Budd et al., 1998, 2011; Redman et al., 2008)
• Journal publishers are inconsistent in alerting readers: 41% of articles watermarked, 32% contain no notification anywhere (Steen, 2011)
• Most publishers allow some form of self-archiving (SHERPA/RoMEO; Morris, 2009)
• Authors often ignore publisher policy (Davis & Connolly, 2007)
• Journal articles are likely to be found on non-publisher websites (Wren, 2005)
What We Assume
• Reaching readers is a communication problem that is not being solved by publishers and indexers alone.
• There is more than one access conduit to the scholarly literature
• Proliferation of article versions
• Scholars hoard articles in personal libraries
• Article status is static unless stated otherwise
• Because retractions are few, readers have little incentive to search for updates (high cost, low return)
What We Don’t Know
• Extent of proliferation of retracted papers on the public internet (out of the control of the publisher)
• Where do they exist, and in which version(s)?
• What exists in readers' personal libraries?
What We Did
1. Searched the public Internet for copies of retracted papers, excluding the published version on the publisher's website
2. Created an API that searched the Mendeley database for retracted articles
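Step 1 amounts to filtering search hits so that the publisher's own copy is set aside. The slides do not describe the actual search procedure, so this is only a minimal sketch assuming each hit is a URL and that the publisher's site can be identified by one hostname (plus its subdomains); the function name `non_publisher_copies` and all example URLs are hypothetical.

```python
from urllib.parse import urlparse


def non_publisher_copies(hit_urls, publisher_host):
    """Return the search hits that are NOT hosted on the publisher's site.

    Hypothetical helper: assumes the publisher's copy is identified by a
    single hostname, and treats any subdomain of it as the publisher too.
    """
    publisher_host = publisher_host.lower()
    kept = []
    for url in hit_urls:
        host = (urlparse(url).hostname or "").lower()
        # Skip the publisher's own host and its subdomains (e.g. www.).
        if host == publisher_host or host.endswith("." + publisher_host):
            continue
        kept.append(url)
    return kept


hits = [
    "https://www.example-journal.org/article/123.pdf",
    "https://archive.example.edu/papers/123.pdf",
    "https://lab.example.net/pubs/123.pdf",
]
print(non_publisher_copies(hits, "example-journal.org"))
```

Everything that survives the filter (repositories, course pages, author sites, aggregators) is a copy outside the publisher's control.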
PMC (no notice on page view or PDF)
PMC (notice but not on PDF)
Advance publication
Final manuscript on publisher’s site
Author manuscript in library repository
Published version in repository
Reviewer manuscript in repository
Author website
Classes
Hospital Labs
Journal clubs
Medical schools
University Research Institutes
Advocacy
Commercial websites
Author, medical business
Aggregation sites
Entire issue
Clearinghouses
Public Copies on the Web
[Figure: retracted articles by year (x-axis: Year, 1973-2010; y-axis: Retracted articles, 0-180), each year split into articles with no public copies and articles for which public copies were found]
Summary of Web Study
• 1,779 retracted articles from PubMed (1973-2010)
• 308 (12%) publicly accessible copies (excluding the published version on the journal website)
• 29 could be found in more than one location (max 5)
• 90% of copies were the published version; 9% final manuscripts; 1% other
• 41% in PMC; 28% on educational sites; 7% on commercial sites
• 24% of copies carried a retraction notice (5% excluding the PMC page view)
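The two notice figures differ only in whether a notice shown on the PMC article page counts, even when the PDF itself carries none (compare the "PMC (notice but not on PDF)" example). A minimal sketch of that calculation, assuming each copy is recorded with a location label and two hypothetical flags, `notice_on_pdf` and `notice_on_page`; the four toy records are illustrative, not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    location: str          # hypothetical labels, e.g. "PMC", "educational"
    notice_on_pdf: bool    # retraction notice inside the file itself
    notice_on_page: bool   # notice on the surrounding web page


def notice_rate(copies, count_pmc_page_view=True):
    """Share of copies that carry a retraction notice.

    With count_pmc_page_view=False, a notice that appears only on a PMC
    page view (and not in the PDF) is not counted.
    """
    if not copies:
        return 0.0
    flagged = 0
    for c in copies:
        page_counts = c.notice_on_page and (count_pmc_page_view or c.location != "PMC")
        if c.notice_on_pdf or page_counts:
            flagged += 1
    return flagged / len(copies)


copies = [
    Copy("PMC", notice_on_pdf=False, notice_on_page=True),
    Copy("PMC", notice_on_pdf=False, notice_on_page=False),
    Copy("educational", notice_on_pdf=True, notice_on_page=False),
    Copy("commercial", notice_on_pdf=False, notice_on_page=False),
]
print(notice_rate(copies))                             # 0.5
print(notice_rate(copies, count_pmc_page_view=False))  # 0.25
```

A reader who downloads the PDF from PMC leaves the page-view notice behind, which is why the stricter figure is the more relevant one for copies that end up in personal libraries.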
A window into what is on computers
Mendeley API
Our API: http://www.fireisborn.org/retract/
Results from Mendeley
• 75% (1,340 of 1,779 records) could be found in Mendeley (mean readers = 3.4, max = 133)
• Caveat: we cannot confirm that these readers hold the PDF
• "Readers" are concentrated in top journals
• High-readership articles were more than 3x as likely to be found on public (non-repository) websites (OR 3.28, CI 2.33-4.61, p<.0001)
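An odds ratio with an interval like the one above can be computed from a 2x2 table of readership (high vs. low) against availability on public websites. The slide does not say how its estimate was obtained (a logistic regression would also yield one), so this is only a sketch of the standard Wald interval on the log odds ratio, under that assumption. The counts are hypothetical and do not reproduce the reported 3.28.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

                        on public web   not on public web
    high readership          a                 b
    low readership           c                 d

    OR = (a*d) / (b*c); SE(log OR) = sqrt(1/a + 1/b + 1/c + 1/d).
    z=1.96 gives an approximate 95% interval. All cells must be > 0.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_lo = math.exp(math.log(or_) - z * se)
    ci_hi = math.exp(math.log(or_) + z * se)
    return or_, ci_lo, ci_hi


# Hypothetical counts, for illustration only.
print(odds_ratio_wald(40, 60, 20, 120))
```

The interval is symmetric on the log scale, which is why a reported interval such as 2.33-4.61 sits asymmetrically around its point estimate.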
Implications
• The problem of persistence cannot be controlled by copyright. Publishers lack control over copies of their articles
• Increased access comes with a versioning problem
• Essential problem: How do you reach readers when a Version of Record is no longer a Version of Record?
Solutions
Because 90% of public copies are the publisher's version, a CrossMark logo on them would be seen by future readers
Caveats:
• Reader is still responsible for initiating the verification check
• Authors often write directly from bibliographic software
• Doesn’t prevent reuse/recycling of citations
• Doesn’t automatically update older PDFs (without symbol)
• Institutional self-archiving mandates may increase author manuscripts
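CrossMark ties a logo on the article to its DOI, so a reader who clicks it sees the article's current status rather than whatever the copy in hand says. The sketch below models that idea with an in-memory dictionary standing in for the publisher's status record; `STATUS_REGISTRY`, `ArticleCopy`, `check_status`, and the DOIs are hypothetical and do not represent the actual CrossMark service or any real API. It also mirrors two caveats: a copy without the logo offers no entry point, and nothing happens unless the reader starts the check.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the publisher-maintained status record that a
# CrossMark-style check would consult. Not the real CrossMark service.
STATUS_REGISTRY = {
    "10.0000/hyp.001": "retracted",
    "10.0000/hyp.002": "current",
}


@dataclass
class ArticleCopy:
    doi: str
    has_crossmark_logo: bool  # False for older PDFs that predate the logo


def check_status(copy: ArticleCopy, registry=STATUS_REGISTRY) -> Optional[str]:
    """Return the article's current status as recorded by the publisher.

    Returns None when the copy carries no logo, i.e. the reader has no
    built-in way to start the check. The check runs only when called:
    the reader still has to initiate it.
    """
    if not copy.has_crossmark_logo:
        return None
    return registry.get(copy.doi.lower(), "unknown")


print(check_status(ArticleCopy("10.0000/hyp.001", True)))   # retracted
print(check_status(ArticleCopy("10.0000/hyp.001", False)))  # None
```

The status lives with the publisher, not in the file, which is what lets an already-downloaded publisher-version PDF reflect a later retraction.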
1. Before Reading
2. Before Writing
3. Before Publication
Tripartite Solution
1. Before Reading
2. Before Writing
3. Before Publication
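The three checkpoints differ mainly in what gets checked: one article before reading, a personal library before writing, and a manuscript's reference list before publication. This is a minimal sketch under the assumption that a set of retracted DOIs is available (here a hypothetical hard-coded set; in practice it would come from a status service). `normalize_doi` and `flag_retracted` are hypothetical helpers. DOIs are normalized because reference managers may store them with a resolver prefix or in a different case, and DOI names are case-insensitive.

```python
def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip a common resolver or 'doi:' prefix."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def flag_retracted(dois, retracted):
    """Return the entries of `dois` whose normalized DOI is in `retracted`."""
    retracted_norm = {normalize_doi(d) for d in retracted}
    return [d for d in dois if normalize_doi(d) in retracted_norm]


RETRACTED = {"10.0000/hyp.001"}  # hypothetical

# 1. Before reading: a single article.
print(flag_retracted(["10.0000/HYP.001"], RETRACTED))
# 2. Before writing: every item in a personal library export.
library = ["https://doi.org/10.0000/hyp.001", "10.0000/hyp.002"]
print(flag_retracted(library, RETRACTED))
# 3. Before publication: the manuscript's reference list.
references = ["doi:10.0000/hyp.002", "10.0000/hyp.003"]
print(flag_retracted(references, RETRACTED))
```

The same check serves all three stages; what changes is who runs it (reader, author, publisher) and over how many records.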