Challenges in altmetric data collection: what are the differences among different altmetric providers/aggregators? Zohreh Zahedi, Martin Fenner & Rodrigo Costas {z.zahedi.2;rcostas}@cwts.leidenuniv.nl [email protected] CWTS, Leiden University & Datacite.org 2:AM Conference, 7 October 2015, Amsterdam Science Park


Page 1: Challenges in altmetric data collection: what are the differences among different altmetric providers/aggregators? Zohreh Zahedi, Martin Fenner & Rodrigo

Challenges in altmetric data collection: what are the differences among different altmetric providers/aggregators?

Zohreh Zahedi, Martin Fenner & Rodrigo Costas

{z.zahedi.2;rcostas}@cwts.leidenuniv.nl [email protected]

CWTS, Leiden University & Datacite.org

2:AM Conference, 7 October 2015, Amsterdam Science Park

Page 2:

1:AM altmetrics project funding awarded!

Our proposal: How consistent are altmetrics data providers?

Zohreh Zahedi, Martin Fenner & Rodrigo Costas

Supported by Thomson Reuters

https://altmetricsconf.wordpress.com/2014/12/17/1am-altmetrics-project-funding-awarded/

Page 3:

Data provider: source of metrics

Data aggregators: aggregate and offer/report metrics

Sources compared:
• Mendeley.com (Mendeley API)
• Altmetric.com (dump file)
• Lagotto (open source application)

Common sources across different providers/aggregators

Consistency: reporting the same count as the source itself when the metric is collected at the same date/time for the same DOI
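
The definition of consistency above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function name `consistency_rate`, the dictionary layout (DOI mapped to count) and the DOIs and counts are all hypothetical and do not come from the study. It computes the share of overlapping DOIs for which an aggregator reports exactly the count held by the source.

```python
from typing import Dict


def consistency_rate(source_counts: Dict[str, int],
                     aggregator_counts: Dict[str, int]) -> float:
    """Share of overlapping DOIs where the aggregator matches the source exactly.

    Both inputs map a DOI to a count collected at the same date/time.
    The name and data layout are assumptions made for this sketch.
    """
    shared = source_counts.keys() & aggregator_counts.keys()
    if not shared:
        return 0.0
    same = sum(1 for doi in shared
               if source_counts[doi] == aggregator_counts[doi])
    return same / len(shared)


# Hypothetical DOIs and counts, for illustration only.
source = {"10.1000/a": 12, "10.1000/b": 3, "10.1000/c": 0}
aggregator = {"10.1000/a": 12, "10.1000/b": 5, "10.1000/d": 7}

rate = consistency_rate(source, aggregator)
print(f"Consistent on {rate:.0%} of overlapping DOIs")
```

Only DOIs present on both sides are compared, which mirrors the idea of restricting the comparison to overlapping DOIs.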

Page 4:

Consistency of altmetrics data among different providers/aggregators is essential!

Page 5:

Research question:

How consistent are altmetrics providers/aggregators in reporting the same metrics for the same set of DOIs when the date/time of data extraction is controlled?

• What are the differences?
• What are the reasons?

Page 6:

Data problems: previous study

Inconsistencies (Zahedi, Fenner & Costas, 2014)

Page 7:

Data:

A random sample of 30,000 DOIs from the year 2013 was selected:

• CrossRef (15,000 DOIs)
• WoS (15,000 DOIs)
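
A sampling step of this kind could be sketched as follows. The slides do not say how the sample was actually drawn, so everything here is an assumption: the function name `draw_sample`, the fixed seed, and the toy DOI pools are hypothetical, and the sketch does not address DOIs that might appear in both CrossRef and WoS.

```python
import random
from typing import List


def draw_sample(crossref_dois: List[str], wos_dois: List[str],
                per_source: int = 15_000, seed: int = 2013) -> List[str]:
    """Draw an equal-sized random sample, without replacement, from each pool.

    A fixed seed keeps the sample reproducible. The seed value and the
    function name are assumptions for this sketch, not the study's procedure.
    """
    rng = random.Random(seed)
    return rng.sample(crossref_dois, per_source) + rng.sample(wos_dois, per_source)


# Toy pools with made-up DOIs; the real pools would hold 2013 publications.
crossref_pool = [f"10.1000/cr{i}" for i in range(100)]
wos_pool = [f"10.1000/wos{i}" for i in range(100)]

sample = draw_sample(crossref_pool, wos_pool, per_source=5)
print(len(sample), "DOIs sampled")
```

Fixing the seed means the same DOI list can be regenerated later, which helps when the same set has to be sent to several providers.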

Page 8:

Data collection was carried out at the same date/time for all sources: 23 July 2015, starting at 2 PM CEST

Page 9:

Coverage of DOIs:

• Altmetric.com: 23%
• Lagotto: 68%
• Mendeley.com: 69%

Common metrics (for overlapping DOIs): Mendeley readers, Facebook, Twitter, CiteULike, Reddit

[Bar chart: counts per metric (scale 0 to 80,000) for Lagotto, Altmetric.com and Mendeley.com]

Page 10:

Result: Mendeley readerships

Consistency:
• Overall, both aggregators were similar compared to last year
• Mendeley has improved its API

Differences (frequency of updates):
• Lagotto: by default, metrics from Mendeley are collected every day
• Altmetric.com: not updated in real time (time lag); Mendeley counts are reported only for documents with at least one other metric (articles that have only Mendeley counts and no other metrics are discarded)
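
The effect of the inclusion rule described for Altmetric.com can be illustrated with a short sketch. It models the rule only as this slide states it and is not Altmetric.com's actual code; the function name `reported_mendeley_counts`, the record layout and the DOIs and counts are hypothetical.

```python
from typing import Dict


def reported_mendeley_counts(records: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Keep Mendeley counts only for DOIs with at least one other nonzero metric.

    `records` maps a DOI to a dict of metric name -> count. This layout
    and the rule as coded are assumptions based on the slide text.
    """
    reported = {}
    for doi, metrics in records.items():
        has_other_metric = any(
            count > 0 for name, count in metrics.items() if name != "mendeley"
        )
        if has_other_metric:
            reported[doi] = metrics.get("mendeley", 0)
    return reported


# Made-up records, for illustration only.
records = {
    "10.1000/a": {"mendeley": 10, "twitter": 2},
    "10.1000/b": {"mendeley": 4, "twitter": 0},  # Mendeley only: dropped
}

reported = reported_mendeley_counts(records)
print(reported)
```

Under such a rule, a DOI with readers in Mendeley but no other activity would be missing from the aggregator's data even though the source holds a count for it.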

Page 11:

Result: Facebook

Consistency:
• Exactly the same result as last year: very different

Differences (different ways of collecting & reporting):
• Lagotto: aggregates all FB counts (shares + likes + posts + comments); searches for DOIs via the FB API
• Altmetric.com: reports FB public posts only; tracks links to find DOIs/URLs
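
A small worked example shows why the two reporting approaches cannot agree. This is a sketch under stated assumptions: the `FacebookActivity` fields, both function names and the numbers are hypothetical and only mirror the two counting rules described on this slide, not either tool's implementation.

```python
from dataclasses import dataclass


@dataclass
class FacebookActivity:
    """Hypothetical Facebook activity for one DOI (all values made up)."""
    shares: int
    likes: int
    posts: int         # all posts
    comments: int
    public_posts: int  # the publicly visible posts


def aggregated_count(activity: FacebookActivity) -> int:
    """Aggregated approach: shares + likes + posts + comments."""
    return activity.shares + activity.likes + activity.posts + activity.comments


def public_posts_count(activity: FacebookActivity) -> int:
    """Public-posts-only approach."""
    return activity.public_posts


activity = FacebookActivity(shares=4, likes=10, posts=3, comments=2, public_posts=1)
print(aggregated_count(activity), public_posts_count(activity))
```

For the same DOI at the same moment, the first rule gives 19 and the second gives 1, so the two figures measure different things rather than the same thing with error.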

Page 12:

Result: Twitter

Consistency:
• Exactly the same result as last year: very different

Differences (using different APIs):
• Lagotto: uses the very limited Twitter public API (limited number of tweets per DOI)
• Altmetric.com: uses GNIP to get Twitter data; captures everything mentioning a whitelist of domains and then resolves links to papers

Page 13:

Result: other sources

• Reddit: huge differences
  Lagotto Reddit counts = posts + comments
  Altmetric.com Reddit counts = posts only (comments excluded)
• CiteULike: some differences
• Wikipedia: not analyzed yet

Page 14:

What are the possible reasons for inconsistency?

• Different methodologies/approaches in collecting & processing metrics
• Different identifiers (DOI, PMID, arXiv ID)
• Differences in reporting metrics (aggregated vs. raw scores; public vs. private posts)
• Accessibility issues (resolving DOIs, cookie problems, access denials) that differ across publishers
• Different update schedules: possible time lags in data collection, or updating issues

Page 15:

What are the challenges?

There is a need for both:
• Best practices
• Guidelines and standards

Page 16:

NISO ‘altmetrics data quality’ working group: Code of Conduct

NISO has initiated this group to develop a draft code of conduct for the collection, processing, dissemination and reuse of altmetric data, which can help solve many of these data problems.

Page 18:

Thanks for your attention!