Attribution and Credit: beyond print & citations
Johan Bollen (jbollen@indiana.edu)
Indiana University
School of Informatics and Computing
Center for Complex Networks and System Research
Berkeley, CA - 2011
Acknowledgements:
Herbert Van de Sompel (LANL), Marko A. Rodriguez (LANL), Ryan Chute (LANL),
Lyudmila L. Balakireva (LANL), Aric Hagberg (LANL), Luis Bettencourt (LANL)
Research supported by the NSF and the Andrew W. Mellon Foundation.
Impact evaluation from citation data.
IF is part of Journal Citation Reports (JCR)
JCR = citation graph
2005 journal citation network:
• approximately 8,560 journals
• 6,370,234 weighted citation edges
Impact Factor = mean 2-year citation rate
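[Figure: journal citation network with nodes ranging from low influence to high influence; Impact Factor diagram: citations from all journals (2003) to journal x (2001, 2002)]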
Citation data:
• Gold standard of scholarly evaluation
• Citation = scholarly influence
• Extracted from published materials
• Main bibliometric data source for scholarly evaluation
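As a rough illustration of the Impact Factor as a mean 2-year citation rate: the citations received in a given year by the items a journal published in the two preceding years, divided by the number of those items. The function name and all counts below are invented for illustration and are not taken from the JCR.

```python
def impact_factor(citations_to_prev_two_years: int, items_prev_two_years: int) -> float:
    """Mean number of citations received in year Y by items published in Y-1 and Y-2."""
    if items_prev_two_years <= 0:
        raise ValueError("journal published no citable items in the two-year window")
    return citations_to_prev_two_years / items_prev_two_years


# Hypothetical journal x: 120 items published in 2001 and 2002 combined,
# cited 300 times by all journals during 2003.
if_2003 = impact_factor(citations_to_prev_two_years=300, items_prev_two_years=120)
print(f"2003 Impact Factor: {if_2003:.2f}")
```

Because the measure is a plain mean over a short window, it says nothing about where in the citation network those citations come from, which is what the network-based metrics later in the talk try to capture.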
Clearly something’s wrong
Digital era: digital scholars
Technology has changed
Scholarly communication has changed
Habits and minds have changed
Scholars have changed
But scholarly review and assessment?
Still stuck in the paper era:
Peer review, print, citations, journals…
Party like it’s 1899!
The scientific process: the importance of early indicators
(Egghe & Rousseau, 2000; Wouters, 1997), (Brody, Harnad, & Carr, 2006)
Citation: final products
• Publication delays
• Focus on publications
• Focus on authors
Usage data
• Scale, cf. Elsevier downloads (>1B) vs. WoS citations (650M)
• Immediate, early stages
• Variety of resources and actors
The Promise of Usage Data
Challenges
1. Representativeness:
• Recorded for particular service
• Recorded for particular user community
• Challenge: large-scale aggregation of representative data
2. Attribution and credit:
• Citation is explicit, intentional expression of influence
• Usage is behavioral, implicit measurement of “attention”
• Challenge: turn clickstreams into attribution, credit, and influence
3. Community acceptance:
• Unproven in terms of metrics, services, etc.
• Lack of applications and community services
• Challenge: creation of framework for community services
This presentation: overview of our work in (1), (2) and (3)
Presentation structure
1. MESUR’s work on usage data aggregation
2. Turning clickstreams into credit and attribution
3. Community services
4. Discussion
Timeline and development
• 2006-2010 (Los Alamos National Laboratory and Indiana University):
  o MESUR project: funded by Andrew W. Mellon Foundation
  o Models of scholarly communication
  o Large-scale aggregation of usage data
  o Large-scale survey of usage-based metrics
• 2010-present (Indiana University):
  o MESUR 2.0: funded by NSF and Andrew W. Mellon Foundation
  o Work on development of community-supported, open and sustainable framework with NISO (Todd Carpenter)
MESUR data flow
http://www.mesur.org/schemas/2007-01/mesur/
Modeling the scholarly communication process: the MESUR ontology.
Based on the OntologyX5 framework developed by Rights.com
Examples:
• An author (Agent) publishes (Context:Event) an article (Document)
• A user (Agent) uses (Context:Event) a journal (Document)
Marko A. Rodriguez, Johan Bollen and Herbert Van de Sompel. A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage. In Proceedings of the Joint Conference on Digital Libraries 2007, Vancouver, June 2007.
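The Agent / Context:Event / Document pattern in the examples above can be sketched in a few lines. This is only an illustration of the pattern: the class names mirror the slide, but the fields, identifiers and dates are assumptions and do not reproduce the actual MESUR ontology schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Agent:
    name: str


@dataclass(frozen=True)
class Document:
    identifier: str
    kind: str  # e.g. "article" or "journal"


@dataclass(frozen=True)
class Event:
    """A Context:Event that ties an Agent to a Document at a point in time."""
    agent: Agent
    action: str  # e.g. "publishes" or "uses"
    document: Document
    when: datetime


def describe(event: Event) -> str:
    """Render the event as an 'agent action document' statement."""
    return f"{event.agent.name} {event.action} {event.document.kind} {event.document.identifier}"


# Hypothetical instances of the two examples on the slide.
publication = Event(Agent("author-1"), "publishes", Document("doc-42", "article"), datetime(2007, 6, 1))
usage = Event(Agent("user-7"), "uses", Document("journal-3", "journal"), datetime(2007, 6, 2))
print(describe(publication))
print(describe(usage))
```

Treating publication and usage as the same kind of event is what lets citation data and usage data sit in one model.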
MESUR usage DB
2006-2010: Collaborating publishers, aggregators and institutional consortia:
• BMC, Blackwell, UC, CSU (23), EBSCO, ELSEVIER, EMERALD, INGENTA, JSTOR, LANL, MIMAS/ZETOC, THOMSON, UPENN (9), UTEXAS
• Scale:
  o > 1,000,000,000 usage events, and growing…
  o > 50M articles, approximately 100,000 serials
• Period: 2002-2010…
From usage to clickstreams
Minimal requirements for all usage data
• Unique usage events (article level)
• Fields: unique session ID, date/time, unique document ID and/or metadata, request type
• Note difference with usage statistics
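A minimal sketch of a usage event carrying the fields listed above, and of what is lost when events are collapsed into usage statistics. The record layout, field names and sample values are assumptions made for illustration, not the MESUR storage format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UsageEvent:
    session_id: str      # unique session ID
    timestamp: datetime  # date/time of the request
    document_id: str     # unique document ID
    request_type: str    # e.g. "abstract" or "full-text"


# Hypothetical article-level events.
events = [
    UsageEvent("s1", datetime(2008, 1, 5, 10, 0), "doc-A", "abstract"),
    UsageEvent("s1", datetime(2008, 1, 5, 10, 2), "doc-B", "full-text"),
    UsageEvent("s2", datetime(2008, 1, 5, 11, 0), "doc-A", "full-text"),
]

# Usage statistics: per-document request counts.
# The session IDs and the order of requests are no longer recoverable from these.
statistics = Counter(event.document_id for event in events)
print(statistics)
```

The counts are enough for rankings, but only the event records say which documents were requested together in one session.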
How to generate a usage network.
Same session ~ document relatedness
• Same session, same user: common interest
• Frequency of co-occurrence = degree of relationship
• Normalized: conditional probability (P(A|B) != P(B|A)), directional
• Clickstream, breadcrumbs, paths, flow…
Usage data is on article level:
• Works for journals and articles
• In fact, anything for which usage was recorded
Note: not something we invented: association rule learning in data mining.
Beer and diapers!
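The session co-occurrence idea can be sketched as follows, assuming each session has already been reduced to the set of documents requested in it. The sessions are invented, and the estimator (co-occurrence count divided by the count of the conditioning document) is one straightforward reading of "normalized: conditional probability", not necessarily the exact weighting used in MESUR.

```python
from collections import Counter
from itertools import permutations

# Hypothetical sessions: the set of documents requested in each session.
sessions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
]

doc_count = Counter()   # number of sessions containing each document
pair_count = Counter()  # number of sessions containing both documents of an ordered pair

for docs in sessions:
    doc_count.update(docs)
    pair_count.update(permutations(sorted(docs), 2))


def conditional_probability(a: str, b: str) -> float:
    """Estimate P(a | b): the share of sessions containing b that also contain a."""
    return pair_count[(a, b)] / doc_count[b]


# Directed, weighted usage network: edge b -> a carries weight P(a | b).
edges = {(b, a): conditional_probability(a, b) for (a, b) in pair_count}

print(conditional_probability("C", "A"))  # P(C | A)
print(conditional_probability("A", "C"))  # P(A | C)
```

In this toy data every session with C also contains A, while only two of the three sessions with A contain C, so the two edge weights differ. That asymmetry is what makes the network directional.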
Johan Bollen, Herbert Van de Sompel, Aric Hagberg, Luis Bettencourt, Ryan Chute, Marko A. Rodriguez, Lyudmila Balakireva. Clickstream data yields high-resolution maps of science. PLoS One, February 2009.
Network science for impact metrics.
Betweenness centrality
σij: number of geodesics between vi and vj
PageRank
PR(vi): PageRank of node vi
O(vj): out-degree of journal vj
N: number of nodes in network
L: damping factor
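The formulas themselves did not survive in this transcript. For orientation, the standard textbook definitions that match the symbols listed on the slide are given below; they are an assumption about what the slide showed, not a transcription. The symbol σ for geodesic counts is the conventional choice, and L is used for the damping factor as on the slide.

```latex
% Betweenness centrality of node v_k: \sigma_{ij} is the number of geodesics
% between v_i and v_j, and \sigma_{ij}(v_k) the number of those passing through v_k.
C_B(v_k) = \sum_{i \neq j \neq k} \frac{\sigma_{ij}(v_k)}{\sigma_{ij}}

% PageRank of node v_i with damping factor L, N nodes in the network, and
% out-degree O(v_j); the sum runs over the nodes v_j that link to v_i.
PR(v_i) = \frac{1 - L}{N} + L \sum_{v_j \to v_i} \frac{PR(v_j)}{O(v_j)}
```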
Set of metrics calculated on MESUR data set
The MESUR Metrics Map
[Figure: map of impact metrics with clusters labeled rate metrics, total cites, PageRank(s), betweenness, and usage metrics]
Johan Bollen, Herbert Van de Sompel, Aric Hagberg and Ryan Chute. A Principal Component Analysis of 39 Scientific Impact Measures. PLoS ONE, June 2009. URL: http://dx.plos.org/10.1371/journal.pone.0006022.
MESUR Mapping and ranking services
“
Registration is now open for "Scholarly Evaluation Metrics: Opportunities and Challenges", a one-day NSF-funded workshop that will take place in the Renaissance Washington DC Hotel on Wednesday, December 16th 2009. Participation in this workshop is limited to 50 people. Registration is free at http://informatics.indiana.edu/scholmet09/registration.html.
The topic of the workshop is the future of scholarly assessment approaches, including organizational, infrastructural, and community issues. The overall goal is to identify requirements for novel assessment approaches, several of which have been proposed in recent years, to become acceptable to community stakeholders including scholars, academic and research institutions, and funding agencies. The impressive group of speakers and panelists for the workshop includes representatives from each of these constituencies.
Further details are available at http://informatics.indiana.edu/scholmet09/announcement.html
Workshop organizers: Johan Bollen, Herbert Van de Sompel and Ying Ding
”
Seeking increasing community involvement
Planning process underway to establish sustainable, open, community-supported infrastructure.
New support from the Andrew W. Mellon Foundation to figure it all out.
Close collaboration with NISO (Todd Carpenter).
Moving towards community involvement
Logistics:
• Data aggregation
• Normalization
• Data-related services
• Data management
Science:
• Metrics
• Analysis
• Prediction
Services:
• Ranking
• Assessment
• Mapping
= More than sum of parts:
• Each component supports the other
• Various business and funding models
• Generate added value on all levels
Can fundamentally change scholarly communication
- Microsoft/MSR: http://academic.research.microsoft.com/
- Altmetrics: http://altmetrics.org/
- Mendeley-based analytics
- Publisher-driven initiatives: Elsevier’s SciVal, mapping of science
- Google Scholar
- Katy Börner at Indiana University: Science of Science Cyberinfrastructure
New, interesting initiatives
Some relevant publications.
Johan Bollen, Herbert Van de Sompel, Aric Hagberg, Luis Bettencourt, Ryan Chute, Marko A. Rodriguez, Lyudmila Balakireva. Clickstream data yields high-resolution maps of science. PLoS One, March 2009 (In Press)
Johan Bollen, Herbert Van de Sompel, Aric Hagberg, Ryan Chute. A principal component analysis of 39 scientific impact measures. arXiv.org/abs/0902.2183
Johan Bollen, Marko A. Rodriguez, and Herbert Van de Sompel. Journal status. Scientometrics, 69(3), December 2006 (arxiv.org:cs.DL/0601030)
Johan Bollen, Herbert Van de Sompel, and Marko A. Rodriguez. Towards usage-based impact metrics: first results from the MESUR project. In Proceedings of the Joint Conference on Digital Libraries, Pittsburgh, June 2008
Marko A. Rodriguez, Johan Bollen and Herbert Van de Sompel. A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage, In Proceedings of the Joint Conference on Digital Libraries, Vancouver, June 2007
Johan Bollen and Herbert Van de Sompel. Usage Impact Factor: the effects of sample characteristics on usage-based impact metrics. (cs.DL/0610154)
Johan Bollen and Herbert Van de Sompel. An architecture for the aggregation and analysis of scholarly usage data. In Joint Conference on Digital Libraries (JCDL2006), pages 298-307, June 2006.
Johan Bollen and Herbert Van de Sompel. Mapping the structure of science through usage. Scientometrics, 69(2), 2006.
Johan Bollen, Herbert Van de Sompel, Joan Smith, and Rick Luce. Toward alternative metrics of journal impact: a comparison of download and citation data. Information Processing and Management, 41(6):1419-1440, 2005.
Local aggregation of usage data: linking servers
[Figure: a link resolver connecting full-text DBs, Ehost EJS, Google, ISI, British Library, A&I services, publisher sites, and ingenta]
• Linking servers can record activities across multiple OpenURL-enabled information sources of a specific digital library environment
• Linking server logs are representative of the activities of a particular user population
• Global scholarly information space compliant with linking servers
• Allows recording of clickstream data: other methods of log aggregation cannot connect “same user, different system” streams
Global aggregation of usage data
[Figure: usage logs from the log repositories of three link resolvers feed a log DB of aggregated logs, yielding aggregated usage data]
• Aggregation of linking server logs leads to data set representative of large sample of scholarly community
• Global really means different samples of scholarly community
• Can be fine-tuned for local communities
• Possibility of truly global coverage
bX project: standards-based aggregation of usage data
[Figure: a log harvester collects OpenURL ContextObjects (CO) from the log repositories of three link resolvers into a log DB of aggregated logs (aggregated usage data)]
Usage log aggregation via OAI-PMH
Log Repository properties:
• OAI-PMH metadata record:
  o linking server event log for specific document in specific session
  o expressed using OpenURL XML ContextObject Format
• OAI-PMH identifier: UUID for event
• OAI-PMH datestamp: datetime the event was added to the Log Repository
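Because each event carries an OAI-PMH datestamp, a harvester can ask a log repository only for the events added since its last visit. The sketch below builds such a ListRecords request. The verb and the `metadataPrefix` and `from` arguments are part of OAI-PMH, but the base URL and the prefix value `ctxo` are placeholders invented here, not values taken from the bX project.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str, since: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request, optionally limited to records added since a datestamp."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since is not None:
        params["from"] = since  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


# Hypothetical log repository endpoint and metadata prefix.
url = list_records_url("http://logrepo.example.org/oai", "ctxo", since="2005-06-01T00:00:00Z")
print(url)
```

Repeating this request per repository, each time with the datestamp of the previous harvest, is one way the aggregated log DB could be kept current without re-fetching old events.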
bX project: OpenURL ContextObject to represent usage data
• Referent: identifier, metadata
• Requester: user or user proxy (IP, session, …)
• ServiceType
• Resolver: identifier of linking server
• Event information: event datetime, globally unique event ID

<?xml version="1.0" encoding="UTF-8"?>
<ctx:context-object timestamp="2005-06-01T10:22:33Z" … identifier="urn:UUID:58f202ac-22cf-11d1-b12d-002035b29062" …>
  …
  <ctx:referent>
    <ctx:identifier>info:pmid/12572533</ctx:identifier>
    <ctx:metadata-by-val>
      <ctx:format>info:ofi/fmt:xml:xsd:journal</ctx:format>
      <ctx:metadata>
        <jou:journal xmlns:jou="info:ofi/fmt:xml:xsd:journal">
          …
          <jou:atitle>Toward alternative metrics of journal impact…
          <jou:jtitle>Information Processing and manage…</jou:jtitle>
          …
  </ctx:referent>
  …
  <ctx:requester>
    <ctx:identifier>urn:ip:63.236.2.100</ctx:identifier>
  </ctx:requester>
  …
  <ctx:service-type> … <full-text>yes</full-text> … </ctx:service-type>
  …
  Resolver…
  Referrer…
  ….
</ctx:context-object>
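To show how such a record could be turned back into a usage event, here is a minimal parsing sketch using Python's standard library. The sample document is a cut-down, well-formed stand-in for the elided record on the slide, and the namespace URI bound to `ctx` is an assumption, since the slide does not show the declaration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the ctx prefix; the slide elides the declaration.
NS = {"ctx": "info:ofi/fmt:xml:xsd:ctx"}

# Simplified stand-in for the record on the slide (metadata and service type omitted).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<ctx:context-object xmlns:ctx="info:ofi/fmt:xml:xsd:ctx"
    timestamp="2005-06-01T10:22:33Z"
    identifier="urn:UUID:58f202ac-22cf-11d1-b12d-002035b29062">
  <ctx:referent>
    <ctx:identifier>info:pmid/12572533</ctx:identifier>
  </ctx:referent>
  <ctx:requester>
    <ctx:identifier>urn:ip:63.236.2.100</ctx:identifier>
  </ctx:requester>
</ctx:context-object>"""


def parse_event(xml_text: str) -> dict:
    """Extract event ID, datetime, referent and requester from a ContextObject."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        "event_id": root.get("identifier"),
        "timestamp": root.get("timestamp"),
        "referent": root.findtext("ctx:referent/ctx:identifier", namespaces=NS),
        "requester": root.findtext("ctx:requester/ctx:identifier", namespaces=NS),
    }


event = parse_event(SAMPLE)
print(event)
```

The four extracted values correspond to the event information, referent and requester components listed on the slide; a fuller parser would also read the service type and resolver.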
Implications for structural analysis of usage data
• Sequence preservation allows:
  o Reconstruction of user behavior
  o Usage graphs!
• Statistics do not allow this type of analysis BUT are useful for:
  o validating results
  o rankings
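A small sketch of what sequence preservation makes possible: ordering each session's events by time reconstructs the user's path, and consecutive requests become directed edges of a usage graph. The events are invented, and counting consecutive pairs is just one simple way to build such a graph.

```python
from collections import Counter, defaultdict

# Hypothetical events, deliberately out of order: (session_id, ISO timestamp, document_id).
events = [
    ("s1", "2008-01-05T10:02:00Z", "B"),
    ("s1", "2008-01-05T10:00:00Z", "A"),
    ("s2", "2008-01-05T11:00:00Z", "A"),
    ("s1", "2008-01-05T10:05:00Z", "C"),
    ("s2", "2008-01-05T11:03:00Z", "B"),
]

# Reconstruct each session's path by sorting on (session, timestamp).
# Timestamps in this uniform ISO format sort correctly as plain strings.
paths = defaultdict(list)
for session_id, timestamp, document_id in sorted(events, key=lambda e: (e[0], e[1])):
    paths[session_id].append(document_id)

# Usage graph: count transitions between consecutive requests in a session.
transitions = Counter()
for path in paths.values():
    transitions.update(zip(path, path[1:]))

print(dict(paths))
print(transitions)
```

Per-document statistics for the same data would report only how often A, B and C were requested, with no way to recover either the paths or the edges.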