Challenges in altmetric data collection: what are the differences among different
altmetric providers/aggregators?
Zohreh Zahedi, Martin Fenner & Rodrigo Costas
{z.zahedi.2;rcostas}@cwts.leidenuniv.nl [email protected]
CWTS, Leiden University & Datacite.org
2:AM Conference, 7 October 2015, Amsterdam Science Park
1:AM altmetrics project funding awarded!
Our proposal: How consistent are altmetrics data providers?
Zohreh Zahedi, Martin Fenner & Rodrigo Costas
Supported by Thomson Reuters
https://altmetricsconf.wordpress.com/2014/12/17/1am-altmetrics-project-funding-awarded/
Data provider: the source of the metrics
Data aggregator: aggregates and offers/reports metrics
Sources studied:
• Mendeley.com (Mendeley API)
• Altmetric.com (dump file)
• Lagotto (open source application)
Common sources across different providers/aggregators
Consistency: reporting the same number as the source itself when the metric is collected at the same date/time for the same DOI
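The definition of consistency above can be sketched as a simple per-DOI check. This is only an illustration: the function name, the aggregator labels and the counts are hypothetical and do not come from the study's data.

```python
from typing import Dict, Optional


def is_consistent(source_count: int,
                  reported: Dict[str, Optional[int]]) -> Dict[str, bool]:
    """For one DOI, flag each aggregator whose count equals the source's count.

    A missing value (None) is treated as inconsistent; that choice is an
    assumption of this sketch, not something stated in the slides.
    """
    return {name: count == source_count for name, count in reported.items()}


# Hypothetical counts for a single DOI, all collected at the same date/time.
source = 12
reported = {"aggregator_a": 12, "aggregator_b": 9, "aggregator_c": None}
result = is_consistent(source, reported)
print(result)
```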
Consistency of altmetrics data among different providers/aggregators is essential!
Research question:
How consistent are altmetrics providers/aggregators in reporting the same metrics for the same set of DOIs, controlling for the date/time of data extraction?
• What are the differences?
• What are the reasons?
Inconsistencies (Zahedi, Fenner & Costas, 2014):
Data problems: previous study
Data:
A random sample of 30,000 DOIs from 2013 was selected:
• CrossRef (15,000 DOIs)
• WoS (15,000 DOIs)
Data collection was done at the same date/time, on July 23, 2015, starting at 2 PM CEST
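A minimal sketch of how an equal-sized random sample from two sources could be drawn. The pools, the DOI strings and the seed are made up for illustration, and the slides do not say how the actual sampling was implemented or whether overlap between the two sources was handled.

```python
import random


def sample_dois(crossref_pool, wos_pool, per_source=15_000, seed=0):
    """Draw `per_source` DOIs at random, without replacement, from each pool."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    crossref_sample = rng.sample(crossref_pool, per_source)
    wos_sample = rng.sample(wos_pool, per_source)
    return crossref_sample, wos_sample


# Tiny hypothetical pools; the study sampled 15,000 DOIs per source.
crossref_pool = [f"10.1234/cr.{i}" for i in range(10)]
wos_pool = [f"10.5678/wos.{i}" for i in range(10)]
cr, wos = sample_dois(crossref_pool, wos_pool, per_source=3)
print(len(cr) + len(wos))
```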
Coverage of DOIs:
• Altmetric.com: 23%
• Lagotto: 68%
• Mendeley.com: 69%
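Coverage here can be read as the share of sampled DOIs for which a provider returns any metrics. The sketch below assumes that reading; the function name and the example DOIs are hypothetical.

```python
def coverage(sample_dois, dois_with_metrics):
    """Percentage of sampled DOIs for which the provider returned metrics."""
    sample = set(sample_dois)
    found = sample & set(dois_with_metrics)  # ignore DOIs outside the sample
    return 100.0 * len(found) / len(sample)


sample = ["10.1/a", "10.1/b", "10.1/c", "10.1/d"]
returned = ["10.1/a", "10.9/not-in-sample"]
pct = coverage(sample, returned)
print(f"{pct:.0f}%")
```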
Common metrics (for overlapping DOIs):
• Mendeley readers
• CiteULike
[Bar chart: Lagotto, Altmetric.com and Mendeley.com; axis from 0 to 80,000]
Result: Mendeley readerships
Consistency:
• Overall, both aggregators were similar compared to last year
• Mendeley has improved its API
Differences (frequency of updates):
• Lagotto: by default, metrics from Mendeley are collected every day
• Altmetric.com: not updated in real time (time lag); Mendeley counts are reported only for documents with at least one other metric (articles that have only Mendeley counts and no other metrics are discarded)
Result: Facebook
Consistency:
• Exactly the same result as last year: very different
Differences (different ways of collecting and reporting):
• Lagotto: aggregates all FB counts (shares + likes + posts + comments); searches for DOIs via the FB API
• Altmetric.com: reports FB public posts only; tracks links to find DOIs/URLs
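To make the effect of the two reporting rules concrete, the sketch below applies both to the same hypothetical activity record. The data class, its field names and the numbers are illustrative assumptions; they are not the Facebook API or either aggregator's actual code.

```python
from dataclasses import dataclass


@dataclass
class FacebookActivity:
    """Hypothetical Facebook activity for one DOI."""
    shares: int
    likes: int
    posts: int          # all posts
    comments: int
    public_posts: int   # the subset of posts that are public


def aggregated_count(a: FacebookActivity) -> int:
    """Aggregated rule: shares + likes + posts + comments."""
    return a.shares + a.likes + a.posts + a.comments


def public_posts_count(a: FacebookActivity) -> int:
    """Public-posts-only rule."""
    return a.public_posts


activity = FacebookActivity(shares=5, likes=20, posts=3, comments=7, public_posts=2)
print(aggregated_count(activity), public_posts_count(activity))
```

The same underlying activity yields two very different numbers, which is why the two Facebook counts cannot be compared directly.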
Result: Twitter
Consistency:
• Exactly the same result as last year: very different
Differences (using different APIs):
• Lagotto: uses the very limited Twitter public API (limited number of tweets per DOI)
• Altmetric.com: uses GNIP to get Twitter data; captures everything mentioning a whitelist of domains and then resolves links to papers
Result: other sources
• Reddit: huge differences
  Lagotto Reddit counts = posts + comments
  Altmetric.com Reddit counts = posts without comments
• CiteULike: some differences
• Wikipedia: not analyzed yet
What are the possible reasons for inconsistency?
• Different methodologies/approaches in collecting and processing metrics
• Different identifiers (DOI, PMID, arXiv ID)
• Differences in reporting metrics (aggregated vs. raw scores; public vs. private posts)
• Accessibility issues (resolving DOIs, cookie problems, access denied) that differ across publishers
• Different update schedules: possible time lags in data collection or updating issues
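One narrow part of the identifier issue is that the same DOI can be written in several forms (with a resolver URL, with a "doi:" prefix, in different letter case). A minimal sketch of normalizing DOI strings before comparing counts follows. It relies on DOIs being case-insensitive; the function name and the example DOI are hypothetical, and mapping between DOI, PMID and arXiv ID is out of scope here.

```python
import re

# Common ways a DOI is prefixed in the wild: resolver URLs and "doi:".
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Strip whitespace and known prefixes, then lowercase the DOI."""
    return _PREFIX.sub("", raw.strip()).lower()


variants = [
    "https://doi.org/10.1234/ABC.5",
    "http://dx.doi.org/10.1234/Abc.5",
    " DOI:10.1234/abc.5 ",
]
normalized = {normalize_doi(v) for v in variants}
print(normalized)
```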
What are the challenges?
There is a need for both:
• Best practices
• Guidelines and standards
NISO ‘altmetrics data quality’ working group: Code of Conduct
NISO has initiated this group to develop a draft code of conduct for the collection, processing, dissemination and reuse of altmetric data, which can help solve many data quality problems.
Thanks for your attention!