Download vs. Citation vs. Readership Data: The Case of an Information Systems Journal
DESCRIPTION
Presentation of the paper with Christian Schlögl, Juan Gorraiz, Christian Gumpenberger, and Kris Jack at ISSI 2013
TRANSCRIPT
Funded by the Competence Centers Programme (Kompetenzzentrenprogramm)
ISSI 2013 – Altmetrics 2 – 15 July 2013
know-center.tugraz.at
Download vs. Citation vs. Readership Data: The Case of an Information Systems Journal (RiP)*
Christian Schlögl, Juan Gorraiz, Christian Gumpenberger, Kris Jack, Peter Kraker
* Research in Progress
© Know-Center 2011
2
www.know-center.at
Introduction
Many studies have compared download and citation data (Moed 2005, Bollen & Van De Sompel 2008, Schlögl & Gorraiz 2011)
Possible sources for download data
Repositories/preprint archives
Open access journals
E-journals
Recently, online reference systems have received a lot of attention as a possible source for altmetrics
A few studies have compared readership and citation data (Bar-Ilan 2012, Li and Thelwall 2012, Kraker et al. 2012)
In this study, we compare citations, downloads, and readership for the Journal of Strategic Information Systems
Research Questions
Are the most cited articles also the most downloaded ones, and the ones found most frequently in user libraries of the collaborative reference management system Mendeley?
Do citations, downloads, and readership have different obsolescence characteristics at publication level?
Are there other features in which citation, download and readership data differ?
Data
The Journal of Strategic Information Systems (JoSIS)
“The Journal of Strategic Information Systems focuses on the management, business and organizational issues associated with the introduction and utilization of information systems as a strategic tool, and considers these issues in a global context.” http://www.journals.elsevier.com/the-journal-of-strategic-information-systems/
Period of analysis: 2002-2011; 321 documents
Data sources:
ScienceDirect (SD): monthly download data (PDF & HTML)
Scopus: monthly citation data
Mendeley: monthly additions to user libraries (full length articles)
Mendeley
Online reference management system
Organizing personal research library
Creating user profile
Reading and annotating of PDFs
Forming private and public groups
Sharing of references/PDFs
Crowdsourced Mendeley research catalog
2.5 m users
428 m user documents
~75 m unique articles
http://www.mendeley.com/research-papers/
Methodology
Preprocessing
Matching documents between ScienceDirect and Scopus
No unique key shared by SD and Scopus
Different document types in SD and Scopus
Matching via title, journal, vol/issue, page
Matching documents between Scopus and Mendeley via title (Levenshtein ratio 1/15.83) – found all but 5
Descriptive statistics
Document types, publication dates, downloads, readers
Correlation analysis
Downloads vs. cites, readers vs. cites, downloads vs. readers
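The title-based matching step can be illustrated with a minimal sketch. It assumes a normalized Levenshtein similarity (1 minus edit distance divided by the longer title's length) and a cut-off of 0.9; both are assumptions for illustration, as is every function name below. The slide's own ratio and threshold are not reproduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(title: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(title.lower().split())


def similarity(a: str, b: str) -> float:
    """1.0 for identical titles, 0.0 for titles with nothing in common."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(title, candidates, threshold=0.9):
    """Most similar candidate title, or None if none reaches the threshold.

    The threshold value is an assumption, not the one used in the study.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: similarity(title, c))
    return best if similarity(title, best) >= threshold else None
```

A document for which `best_match` returns `None` would count as unmatched, analogous to the five documents not found in Mendeley.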
Results: Downloads per document type
Full length articles (FLAs) are the most downloaded document type (94.1% of all downloads)
All other documents are downloaded at a considerably lower level
Document type          n   % docs   % downloads   Downloads per doc (relative)
Announcement           5     1.6%          0.4%    5.9
Book review            4     1.2%          0.3%    5.5
Contents list         29     9.0%          0.4%    1.0
Editorial Board       29     9.0%          0.6%    1.5
Editorial             49    15.3%          3.3%    4.6
Erratum                1     0.3%          0.1%    5.7
Full length article  181    56.4%         94.1%   35.4
Index                 12     3.7%          0.2%    1.3
Miscellaneous          9     2.8%          0.2%    1.8
Publishers note        2     0.6%          0.2%    7.0
All                  321     100%          100%
Source: ScienceDirect; n=321
Results: Print publication delay
FLAs are published online more than 1.5 months before print publication on average.
Document type          n   Online date - print publication date (mean days)
Announcement           5    -13.2
Book review            4    -40.5
Contents list         29     12.9
Editorial Board       29     12.9
Editorial             49      9.0
Erratum                1   -145.0
Full length article  181    -49.8
Index                 12     -4.9
Miscellaneous          9     32.9
Publishers note        2    -13.0
All                  321    -24.9
Source: ScienceDirect; n=321
Results: Downloads per publication year (relative values)
Download maximum in many cases 1 year after publication
Most downloads in a single year for FLAs published in 2011
Columns 2002 to 2011: download year (DL-year); PY: publication year
PY      n  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011    all  DL/FLA
2002   13   1.0   2.3   1.7   1.3   1.2   1.4   2.4   2.8   2.8   2.7   19.6    7.4x
2003   21   0.0   1.3   2.2   1.0   1.0   0.9   1.5   1.3   1.5   1.1   11.9    2.8x
2004   17     -     -   1.7   2.6   2.1   2.2   2.4   2.7   2.9   2.3   18.9    5.5x
2005   18     -     -     -   1.7   2.3   1.8   2.0   2.4   2.6   2.2   15.0    4.1x
2006   14     -     -     -   0.2   2.4   2.1   1.8   2.1   2.0   2.0   12.5    4.4x
2007   18     -     -     -     -   0.0   2.7   3.6   3.4   3.5   2.9   16.1    4.4x
2008   16     -     -     -     -     -   0.0   2.9   3.5   3.0   2.4   11.8    3.6x
2009   14     -     -     -     -     -     -     -   3.1   4.0   3.1   10.2    3.6x
2010   21     -     -     -     -     -     -     -     -   3.9   4.4    8.3    2.0x
2011   29     -     -     -     -     -     -     -     -   0.3   5.6    5.9    1.0x
all   181   1.0   3.7   5.6   6.8   8.9  11.1  16.6  21.4  26.4  29.0  130.4
Source: ScienceDirect; FLA only (n=181)
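The DL/FLA column appears to be consistent with the following reading: total (relative) downloads of a publication-year cohort divided by its number of FLAs, expressed as a multiple of the 2011 cohort. This is an interpretation of the table, not something the slide states, and the function name is made up for this sketch.

```python
# publication year -> (number of FLAs, relative downloads summed over all
# download years), copied from the "n" and "all" columns of the table
cohorts = {
    2002: (13, 19.6), 2003: (21, 11.9), 2004: (17, 18.9), 2005: (18, 15.0),
    2006: (14, 12.5), 2007: (18, 16.1), 2008: (16, 11.8), 2009: (14, 10.2),
    2010: (21, 8.3), 2011: (29, 5.9),
}


def downloads_per_fla_ratio(cohorts, reference_year=2011):
    """Downloads per FLA of each cohort, relative to the reference cohort."""
    ref_n, ref_downloads = cohorts[reference_year]
    reference = ref_downloads / ref_n
    return {
        year: round((downloads / n) / reference, 1)
        for year, (n, downloads) in cohorts.items()
    }


ratios = downloads_per_fla_ratio(cohorts)
for year in sorted(ratios):
    print(year, f"{ratios[year]}x")
```

Because the inputs are the rounded table values, a recomputed ratio can differ from the slide in the last digit (the 2010 cohort comes out at 1.9x here instead of 2.0x).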
Results: Citations per document type
Different document types in Scopus and ScienceDirect (FLA ≈ articles + conference papers + reviews)
Ca. 25% of all documents not cited (primarily editorials, conference papers and recent publications)
Doc type           No. docs   % uncited   Cites   Cites per doc type
Article                 151         15%    2563   14.8
Conference paper         13         69%       8    0.4
Editorial                33         79%      13    0.2
Review                   18          6%     383   20.2
All                     215         27%    2967   10.9
Source: Scopus; n=215
Results: Citations per publication year
Only a few documents are cited in their publication year; the citation maximum is reached several years after publication
In contrast, downloads reach their maximum in the year of publication or one year later
Columns 2002 to 2011: citation year
Pubyear    n  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011    all  Cites per doc
2002      13     2    19    38    69    88   105   158   165   194   199   1037  79.8
2003      14     -     1     6    21    27    39    35    41    40    39    249  17.8
2004      17     -     -     0    15    40    56    74    78    88   107    458  26.9
2005      19     -     -     -     0    16    46    78    76    93    99    408  21.5
2006      14     -     -     -     1     2    14    31    31    53    49    181  12.9
2007      18     -     -     -     -     -     1    31    74    92    85    283  15.7
2008      15     -     -     -     -     -     -     3    30    69    83    185  12.3
2009      14     -     -     -     -     -     -     -     3    34    57     94   6.7
2010      18     -     -     -     -     -     -     -     -     5    40     45   2.5
2011       8     -     -     -     -     -     -     -     -     -    14     14   1.8
all      150     2    20    44   106   173   261   410   498   668   772   2954
Source: Scopus; Document types: articles, reviews, conference papers; only cited documents (n=150)
Table annotations: Special Issue on “Trust in the Digital Economy”; Special Issue with conference papers
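The per-cohort figures in the citation table can be recomputed with a short sketch: cites per document as total citations divided by n, and the lag between publication year and the year with the most citations. Only the first four rows are copied in to keep it short; the data layout and function name are choices made for this example.

```python
# publication year -> (n cited docs, first citation year in the row,
#                      citations per citation year), copied from the table
rows = {
    2002: (13, 2002, [2, 19, 38, 69, 88, 105, 158, 165, 194, 199]),
    2003: (14, 2003, [1, 6, 21, 27, 39, 35, 41, 40, 39]),
    2004: (17, 2004, [0, 15, 40, 56, 74, 78, 88, 107]),
    2005: (19, 2005, [0, 16, 46, 78, 76, 93, 99]),
}


def summarize(rows):
    """Cites per document and lag (in years) until the citation peak."""
    summary = {}
    for pub_year, (n, first_year, counts) in rows.items():
        total = sum(counts)
        peak_year = first_year + counts.index(max(counts))
        summary[pub_year] = {
            "cites_per_doc": round(total / n, 1),
            "peak_lag_years": peak_year - pub_year,
        }
    return summary


summary = summarize(rows)
for pub_year, stats in summary.items():
    print(pub_year, stats)
```

Several cohorts peak in 2011, the last year observed, so the lag computed this way is a lower bound rather than a final value.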
Results: Readers per print publication year
Mendeley's relative youth (est. 2008) and the strong growth of its user base since then (now 2.5 million users) make obsolescence analyses difficult; weighting by user/document growth is needed.
Columns 2008 to 2012: readership year
Pubyear    n  2008  2009  2010  2011  2012 (to July)    all  Readers per doc
2002      13     7    30   126   245             183    591  45.5
2003      21     1    29    58   108             145    341  17.1
2004      17    11    36   107   158             165    477  28.1
2005      18     2    31    79   141             151    404  23.8
2006      14     6    39    88   128             148    409  29.2
2007      18     4    45   129   222             209    609  35.8
2008      16     7    36    99   182             164    488  32.5
2009      14     0    27   111   127             150    415  29.6
2010      21     0     0    84   238             191    513  24.4
2011      29     0     0     4   208             282    494  17.6
all      181    38   273   885  1757            1852   4741
Source: Mendeley; FLA only (n=181)
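One way to picture the weighting mentioned above is to divide each year's reader additions by an index of the user base in that year. The sketch below uses the 2002 row of the table; the user-base index is purely hypothetical (placeholder numbers, not Mendeley statistics), and the function name is invented for this example.

```python
def growth_adjusted(readers_by_year, user_base_index):
    """Reader additions per unit of user-base index, year by year."""
    return {
        year: readers / user_base_index[year]
        for year, readers in readers_by_year.items()
    }


# Reader additions for FLAs published in 2002 (from the table; the partial
# year 2012 is left out because it only covers the months to July).
readers_2002 = {2008: 7, 2009: 30, 2010: 126, 2011: 245}

# HYPOTHETICAL index of the Mendeley user base (2008 = 1.0).
# These values are placeholders for illustration only.
hypothetical_user_base = {2008: 1.0, 2009: 4.0, 2010: 10.0, 2011: 20.0}

adjusted = growth_adjusted(readers_2002, hypothetical_user_base)
for year in sorted(adjusted):
    print(year, round(adjusted[year], 2))
```

With real user or document counts per year in place of the placeholder index, a flat or falling adjusted series would point to ageing of the articles rather than to growth of the platform.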
Results: Downloads vs. readers vs. cites (only FLAs)
Moderate to high correlation (Spearman) between downloads and readers (0.73) and between downloads and citations (0.77)
Moderate correlation between citations and readers (r=0.51)
[Figure: three scatter plots]
downloads vs. readers (x: downloads, y: readers): r=0.73, n=181
downloads vs. cites (x: downloads, y: cites): r=0.77, n=151
readers vs. cites (x: readership, y: cites): r=0.51, n=151
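For reference, a Spearman rank correlation can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below is a plain-Python illustration with invented sample data, not the study's per-article counts; in practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example values for five articles (not data from the study).
downloads = [120, 340, 560, 800, 950]
readers = [5, 6, 7, 8, 7]
print(round(spearman(downloads, readers), 2))
```

Using ranks instead of raw counts makes the measure robust to the strongly skewed distributions that are typical of download and citation data.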
Results: Readership structure of Mendeley articles
2/3 of readership counts come from students
Researchers + Post Docs + Profs ≈ 1/4 of all readership counts
[Pie chart: readership counts by Mendeley user category]
Slice labels: 32%, 7%, 19%, 6%, 5%, 5%, 1%, 5%, 3%, 3%, 5%, 3%, 4%, 1%, 0%
Legend: Student (PhD), Student (doctoral), Student (MA), Student (postgr.), Student (BA), Lecturer, Sen. Lecturer, Researcher (academic), Researcher (non-academic), Post Doc, Assist. Prof., Assoc. Prof., Prof., other, Librarian
Source: Mendeley; doc type: FLA; n=4741
Conclusions
Comparison of different measures not always easy
Different obsolescence characteristics of downloads and cites (readership to be determined)
Moderate to high correlation between downloads and cites
Moderate correlation between cites and readership data
For representative usage measures, we need to understand their characteristics on a large scale
To fully understand usage and impact of an article, it will be important to have many complementary measures with transparent biases
On the one hand, we need open bibliometric data; on the other hand, we need a better understanding of the research process
Thank you very much for your attention!
Christian Schlögl, Juan Gorraiz, Christian Gumpenberger, Kris Jack, Peter Kraker