The search for alternative metrics for taxonomy
Daphne Duin & Peter van den Besselaar
VU University Amsterdam, Org Science & Network Institute
Altmetrics for research evaluation
• Quality (impact, relevance, originality) is assessed by relevant audiences
– Scholarly
– Non-scholarly, such as economic, professional, policy, general public
• Metrics based on communication
– Scholarly
  • Publications and citations (not very useful: this does not work in all fields), topics, networks
  • Science on the web: altmetrics -> new communication media
– Non-scholarly
  • Indicators for societal impact: new metrics needed, based on communication with societal audiences, e.g. website visits/activity
14-6-2011 AtlMetrics@webscience 2
Test
The case
• Scratchpads are biodiversity research communities on the Web – a research infrastructure
• Platform for collaboration, data sharing, publishing
– Tagging
– Data analysis
– Blogs
– Collaborative writing
Scratchpads
• Started 2007
– Today > 200 communities and > 3000 registered users; numbers go up every week
• EU funded
• Evaluation of research infrastructures
– Scholarly use
– Societal benefits
-> Horlings, Van den Besselaar, review, forthcoming
Questions
• Can we use web data to identify the relevant audiences of the Scratchpads infrastructure?
• Can we use web data to study if and how often the sites are used to “produce” content?
The Data
We used:
• Google Analytics reports for all Scratchpads, compared with the report for one specific site
• Period: Oct 1, 2010 - March 31, 2011
• Web server log data for 1 day (24 h) to see what people are doing: the CMS uses standardised action paths (/add, /edit, /delete, /comment, /content)
Results – the audiences I
1 Oct 2010- 31 March 2011
• 9212 unique Service Providers came to the Scratchpads domain (no bots); “Average time on site” > 4 seconds
• Of which 6896 telecom/internet companies (ISPs)
• Of which 2316 identifiable user organizations = non-ISPs (25%)
• Clustered in 8 categories
Audiences - all sites
2316 unique Service Providers; > 200 community sites; 1 Oct 10 – April 11
Categories
• Research/Education = universities, laboratories, science museums, botanical gardens, libraries, schools, colleges
• Government = national/state/local departments in agriculture, pest control, food security, forest management, environment, energy, transport
• Companies = food, pharmaceuticals, mining, energy companies, pest control products, accountancy, consultancy, biotechnology
• Non-profit = conservation, environmental agencies, societies
• Health = hospitals, health services, health/medical research
• Art/culture/media = art museums, art academies, broadcasting and media services, publishing companies
• Travel = hotels, airlines, stations (wifi?)
• Other = church
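The slides do not say how Service Providers were assigned to these categories. As a rough illustration only, a keyword-based first pass could look like the sketch below. The rule list, the keywords, the "Unclassified" fallback and the provider names are all hypothetical and are not taken from the study.

```python
# Hypothetical keyword rules; the slides do not describe the actual procedure.
# Rules are checked in order and the first match wins.
RULES = [
    ("ISP", ("telecom", "broadband", "cable", "internet service")),
    ("Research/Education", ("university", "college", "school", "botanic")),
    ("Government", ("ministry", "department of", "state of")),
    ("Health", ("hospital", "health")),
]


def categorize(provider_name):
    """Return the first category whose keywords occur in the provider name."""
    name = provider_name.lower()
    for category, keywords in RULES:
        if any(keyword in name for keyword in keywords):
            return category
    return "Unclassified"  # left for manual inspection


# Made-up provider names, for illustration only.
examples = [
    "Example Telecom Ltd",
    "University of Example",
    "Example City Hospital",
    "Example Holdings",
]
labels = {name: categorize(name) for name in examples}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Names that match no rule stay "Unclassified" so that they can be checked by hand rather than forced into a category.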
Scholarly use
• Quick test:
– Scholarly journals in the field
– Authors
– Corporate addresses
– Are these organizations also Scratchpad users?
– Yes, but by no means exclusively
Results – the audiences II
Same for 1 specific site
• 276 unique Service Providers
• 201 are internet/telecom companies (ISPs)
• 75 are identifiable user organizations (non-ISPs)
• 2 categories
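The share of identifiable user organizations can be checked directly from the counts on the two "Results – the audiences" slides. The short sketch below only redoes that arithmetic; the function name is ours.

```python
def non_isp_share(total_providers, isp_providers):
    """Return the number of non-ISP providers and their share of the total."""
    non_isps = total_providers - isp_providers
    return non_isps, non_isps / total_providers


# Counts as reported on the slides.
all_orgs, all_share = non_isp_share(9212, 6896)   # all Scratchpads
site_orgs, site_share = non_isp_share(276, 201)   # one specific site

print(f"All sites: {all_orgs} non-ISPs ({all_share:.1%})")   # 2316 non-ISPs (25.1%)
print(f"One site:  {site_orgs} non-ISPs ({site_share:.1%})")  # 75 non-ISPs (27.2%)
```

In both cases roughly a quarter of the Service Providers can be traced to a user organization. The rest arrive through ISPs.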
Results – 1 site MicroOrg.info
75 unique Service Providers, commercial ones excluded
Using the Web in science, to do what?
Web data to study if and how often the sites are used to “produce” content
• Web server log data for February 1, 2011 (24h)
http://citesbulbs.myspecies.info/node/add/image COMBINED WITH POST /node/add/image HTTP/1.1
• /add; /edit; /delete; /comment; /content
• 1148 “producing” actions
• Feb 1, 2011: 1270 visits registered in Google Analytics
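Counting "producing" actions of this kind can be sketched as follows. This is a minimal illustration, not the authors' script. It assumes Apache-style log lines in which the request appears as a quoted string such as "POST /node/add/image HTTP/1.1", and it counts a request only when it is a POST and one of the five path segments occurs in its URL. The sample lines are invented.

```python
import re
from collections import Counter

# Path segments listed on the slide as markers of content "production".
ACTIONS = ("add", "edit", "delete", "comment", "content")

# Matches the quoted request part of a log line,
# e.g. "POST /node/add/image HTTP/1.1" (assumed log layout).
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def count_producing_actions(lines):
    """Count POST requests per action segment found in the request path."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None or match.group("method") != "POST":
            continue
        # Drop any query string, then split the path into its segments.
        segments = match.group("path").split("?")[0].strip("/").split("/")
        for action in ACTIONS:
            if action in segments:
                counts[action] += 1
                break  # count each request once
    return counts


# Invented sample lines, not taken from the real Scratchpads logs.
sample = [
    '10.0.0.1 - - [01/Feb/2011:10:00:00 +0000] "POST /node/add/image HTTP/1.1" 302 0',
    '10.0.0.2 - - [01/Feb/2011:10:00:05 +0000] "GET /node/12 HTTP/1.1" 200 5120',
    '10.0.0.1 - - [01/Feb/2011:10:01:00 +0000] "POST /node/12/edit HTTP/1.1" 302 0',
    '10.0.0.3 - - [01/Feb/2011:10:02:00 +0000] "POST /comment/reply/12 HTTP/1.1" 302 0',
]
counts = count_producing_actions(sample)
print(dict(counts), "total:", sum(counts.values()))
```

Matching on whole path segments rather than substrings avoids counting a page whose name merely contains "add" or "edit".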
So what does this tell us? I
Analysis of Service Providers...
• Reveals interesting insights on part of the audiences coming to the Scratchpads
Such as...
• Audiences from different educational levels; from diverse scientific disciplines
• and other professionals, organizations
• However: 75% through ISPs
So what does this tell us? II
Working on the web, to do what?
• Scratchpad web server log data can be used to identify if and how often content “producing” activities occur
• If combined with numbers on consuming actions this gives a much more comprehensive view of the use of an e-science infrastructure
Discussion
• The data sets used are rich, and the analyses discussed here are only a start
• Further research should tell us more about:
- interdisciplinary and different educational interest of the sites
- which sites or content attract what type of audience (ISP)
- division between producing and consuming actions
• Issues:
– Division of ISPs versus user organizations
– What types of organizations/companies have their own ISPs and which do not; why, and what are the trends?
– Trends in tele-working and use of wifi in the work environment
– Percentage of bots in Internet traffic
Acknowledgments
We thank the following people for helping us gather and analyze the web data for Altmetrics11:
Simon Rycroft and the rest of the Scratchpad team at the Natural History Museum London
Laura Hollink – TU Delft