DSpace 1.6 usage statistics: How does it work?

Ben Bosman - @mire

Posted on 06-Jul-2020

Outline

1 - Introduction

2 - Technical Overview

3 - User Interface Additions

4 - Advanced Use Cases

5 - High Load Installations

6 - Content & Usage Analysis Add-on Module

1 - Introduction

Community Survey

- Highest rated feature request
- 36% of requests in survey

@mire Contribution to DSpace 1.6

- Core of @mire’s Content & Usage Analysis
- Logging usage events in search index
- Querying usage events to provide statistics
  - On-the-fly queries instead of predefined reports
- For XMLUI & JSPUI

1 - Introduction

In-house storage of usage data:

- No dependency on external services (availability, long-term support, ...)
- No privacy issues
- Create your own backups

Storing original usage events:

- No limitations on views of the data
- View usage data in detail
- Full history available
- Not only aggregated (e.g. per month)

2 - Technical Overview

Usage events:

- Community homepage visits
- Collection homepage visits
- Item visits
- Bitstream downloads

Data per usage event:

- Timestamp
- IP address
- Location: continent, country, city
- and much more ...

2 - Technical Overview

Usage event logging

Apache Solr

- Open source enterprise search platform from the Apache Lucene project
- New web application added to DSpace

Performance

- Fast logging in search index
- Can easily be deployed on a separate server
- Advanced solutions for fast querying based on caching

2 - Technical Overview

How to store statistics data

Define the fields to store:

    dspace/solr/statistics/conf/schema.xml

Actual storage:

    org.dspace.statistics.SolrLogger.post()
    doc.addField("fieldname", fieldvalue);

2 - Technical Overview

    <field name="type" type="integer" indexed="true" stored="true" required="true" />
    <field name="id" type="integer" indexed="true" stored="true" required="true" />
    <field name="ip" type="string" indexed="true" stored="true" required="false" />
    <field name="time" type="date" indexed="true" stored="true" required="true" />
    <field name="epersonid" type="integer" indexed="true" stored="true" required="false" />
    <field name="country" type="string" indexed="true" stored="true" required="false" />
    <field name="city" type="string" indexed="true" stored="true" required="false" />
    <field name="owningComm" type="integer" indexed="true" stored="true" required="false" multiValued="true" />
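To make the schema excerpt concrete, here is a minimal Java sketch of one usage event as a plain field map, with a check for the fields the schema marks required="true". The class UsageEventSketch and its methods are hypothetical and use only the Java standard library; the real SolrLogger builds a Solr document through doc.addField(...) instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Solr document; not part of DSpace or SolrJ.
class UsageEventSketch {
    // Fields marked required="true" in the schema excerpt.
    static final List<String> REQUIRED = List.of("type", "id", "time");

    // Builds the field map for one usage event; optional values may be null and are skipped.
    static Map<String, Object> build(int type, int id, String time, String ip, String country) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("type", type);
        doc.put("id", id);
        doc.put("time", time);
        if (ip != null) doc.put("ip", ip);
        if (country != null) doc.put("country", country);
        return doc;
    }

    // Returns the names of required fields that are absent from the map.
    static List<String> missingRequired(Map<String, Object> doc) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED) {
            if (!doc.containsKey(field)) missing.add(field);
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = build(2, 123, "2010-03-01T10:15:30Z", "192.0.2.1", null);
        System.out.println(doc + " missing=" + missingRequired(doc));
    }
}
```

Optional fields are simply left out when unknown, which mirrors required="false" in the schema.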

2 - Technical Overview

Display of statistics data

Depends on the user interface:

- StatisticsTransformer for XMLUI
- DisplayStatisticsServlet for JSPUI

2 - Technical Overview

Display Statistics - XMLUI

- StatisticsTransformer generates the DRI with the statistics information
- Uses multiple StatisticsDisplay objects:
  - StatisticsListing for a two column table
  - StatisticsTable for a multiple column table

2 - Technical Overview

Display Statistics - JSPUI

- DisplayStatisticsServlet uses StatisticsBean objects to store the information
- display-statistics.jsp builds tables from StatisticsBean objects

3 - User Interface Additions

3 small changes in detail

- Create repository wide statistics
- Specify a timespan
- Separate downloads and page visits

3 - User Interface Additions

Create repository wide statistics

Add a Transformer to create overview of e.g. top 3 items.

    // Listing (two column table) of visit statistics for the given DSpace object
    StatisticsListing statListing = new StatisticsListing(new StatisticsDataVisits(dso));
    statListing.setTitle("Top 3 items");
    statListing.setId("list-top-items");
    // Dataset axis: the top 3 items
    DatasetDSpaceObjectGenerator dsoAxis = new DatasetDSpaceObjectGenerator();
    dsoAxis.addDsoChild(Constants.ITEM, 3, false, -1);
    statListing.addDatasetGenerator(dsoAxis);
    // Add the listing to the page division
    addDisplayListing(division, statListing);

3 - User Interface Additions

Specify a timespan

Limit the statistics shown on, e.g., the item display page to a specific timespan.

    StatisticsSolrDateFilter dateFilter = new StatisticsSolrDateFilter();
    dateFilter.setStartDate(startDate);
    dateFilter.setEndDate(endDate);
    dateFilter.setTypeStr("month");
    statListing.addFilter(dateFilter);

Determine timespan in User Interface
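As a rough illustration of what a timespan restriction amounts to on the Solr side, the sketch below builds a range filter on the "time" field for one calendar month. MonthFilterSketch is a hypothetical helper that uses only the Java standard library; how StatisticsSolrDateFilter actually composes its query is not shown in these slides, so the exact query string is an assumption.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; not part of DSpace.
class MonthFilterSketch {
    // Solr date fields use the ISO-8601 form with a trailing "Z" (UTC).
    private static final DateTimeFormatter SOLR_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");

    // Returns an inclusive range filter from the first to the last second of the month.
    static String monthFilter(YearMonth month) {
        String start = month.atDay(1).atStartOfDay().format(SOLR_DATE);
        String end = month.atEndOfMonth().atTime(23, 59, 59).format(SOLR_DATE);
        return "time:[" + start + " TO " + end + "]";
    }

    public static void main(String[] args) {
        System.out.println(monthFilter(YearMonth.of(2010, 3)));
    }
}
```

A start and end date picked in the user interface could be turned into a filter of the same shape.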

3 - User Interface Additions

Separate downloads and page visits

Split up item views and file downloads in item display statistics

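    // Listing of visit statistics for the item (dso)
    StatisticsListing statListing = new StatisticsListing(new StatisticsDataVisits(dso));
    // Dataset axis: up to 10 bitstreams, so file downloads are listed separately from item views
    DatasetDSpaceObjectGenerator dsoAxis = new DatasetDSpaceObjectGenerator();
    dsoAxis.addDsoChild(Constants.BITSTREAM, 10, false, -1);
    statListing.addDatasetGenerator(dsoAxis);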
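The split relies on the "type" field stored with each usage event. A minimal sketch of the idea, independent of DSpace: count events per type code. The class ViewsVsDownloadsSketch is hypothetical, and the type codes (0 for bitstreams, 2 for items) are assumed to match org.dspace.core.Constants.

```java
import java.util.List;

// Hypothetical illustration; not part of DSpace.
class ViewsVsDownloadsSketch {
    // Assumed to match org.dspace.core.Constants.BITSTREAM and Constants.ITEM.
    static final int BITSTREAM = 0;
    static final int ITEM = 2;

    // Counts how many logged event types equal the wanted type.
    static long count(List<Integer> eventTypes, int wanted) {
        return eventTypes.stream().filter(t -> t == wanted).count();
    }

    public static void main(String[] args) {
        List<Integer> types = List.of(ITEM, BITSTREAM, BITSTREAM, ITEM, ITEM);
        System.out.println("item views: " + count(types, ITEM));
        System.out.println("file downloads: " + count(types, BITSTREAM));
    }
}
```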

4 - Advanced Use Cases

Extend the data being stored

Harvest usage data

Store popular searches

Recommendations

4 - Advanced Use Cases

Extend the data being stored

Referrer

- Store referrer to visualize incoming links from other websites, and internal navigation
- Referrer URL retrieved from the browser

4 - Advanced Use Cases

Extend the data being stored

Fields to be stored are defined in dspace/solr/statistics/conf/schema.xml

<field name="referrer" type="string" indexed="true" stored="true" required="false" />

4 - Advanced Use Cases

Extend the data being stored

Storage is handled by the post() method in the SolrLogger class:

    // The HTTP request header is spelled "Referer"; the Solr field keeps the name "referrer"
    doc1.addField("referrer", request.getHeader("referer"));

Extend the user interface to use this new data
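One way the stored referrer could feed the user interface is by separating incoming links from internal navigation. The sketch below is an assumption about how that might be done, not code from DSpace: ReferrerSketch and the host name repository.example.org are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration; not part of DSpace.
class ReferrerSketch {
    enum Kind { NONE, INTERNAL, EXTERNAL }

    // Classifies a referrer URL relative to the repository's own host name.
    static Kind classify(String referrer, String repositoryHost) {
        if (referrer == null || referrer.isEmpty()) return Kind.NONE;
        try {
            String host = new URI(referrer).getHost();
            if (host == null) return Kind.NONE;
            return host.equalsIgnoreCase(repositoryHost) ? Kind.INTERNAL : Kind.EXTERNAL;
        } catch (URISyntaxException e) {
            // Malformed header values are treated as "no referrer".
            return Kind.NONE;
        }
    }

    public static void main(String[] args) {
        String host = "repository.example.org";
        System.out.println(classify("http://repository.example.org/handle/123/456", host));
        System.out.println(classify("http://www.example.com/search?q=dspace", host));
    }
}
```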

4 - Advanced Use Cases

Harvest Usage Data

Goals

- Mining usage data from partner institutions
- Compare usage data amongst different institutions
- Construct cross-institution recommendations based on usage data

Example set-ups

- NEEO
- PIRUS

4 - Advanced Use Cases

Store popular searches

Goals

- Display top searches within your repository
- Display relevant additional search terms to be included or excluded
- Rank search results based on usage by other visitors

Requirements

- Store executed search terms
- Store relation amongst search terms
- Store relation between search terms and opened items
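Once executed search terms are stored, a "top searches" list is a count per term followed by a sort. The sketch below shows that step on an in-memory list; TopSearchesSketch is hypothetical, and the normalisation (trim plus lower case) is an assumption, not something the slides specify.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration; not part of DSpace.
class TopSearchesSketch {
    // Returns the n most frequent search terms, most frequent first, ties broken alphabetically.
    static List<String> top(List<String> queries, int n) {
        Map<String, Long> counts = queries.stream()
                .map(q -> q.trim().toLowerCase())
                .collect(Collectors.groupingBy(q -> q, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> log = List.of("DSpace", "solr", "dspace ", "statistics", "solr", "dspace");
        System.out.println(top(log, 2));
    }
}
```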

4 - Advanced Use Cases

http://getcloudlet.com/

4 - Advanced Use Cases

Recommendations

Build recommendations solution based on the concept of ‘users who visited this item, also visited …’
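The "also visited" concept can be sketched as a co-visitation count: for every visitor who opened a given item, tally the other items that visitor opened. CoVisitSketch is a hypothetical, in-memory illustration; identifying a visitor by a string key (for example an IP address) is an assumption.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration; not part of DSpace.
class CoVisitSketch {
    // visitsByVisitor maps a visitor key to the set of item ids that visitor opened.
    // Returns, for each other item, how many visitors of itemId also opened it.
    static Map<Integer, Integer> alsoVisited(Map<String, Set<Integer>> visitsByVisitor, int itemId) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Set<Integer> items : visitsByVisitor.values()) {
            if (!items.contains(itemId)) continue;
            for (int other : items) {
                if (other != itemId) counts.merge(other, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> visits = Map.of(
                "a", Set.of(1, 2, 3),
                "b", Set.of(1, 2),
                "c", Set.of(2, 4));
        System.out.println(alsoVisited(visits, 1));
    }
}
```

Sorting the resulting counts in descending order would give the recommendation list for the item.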

5 - High Load Installations

High load installations

- Generate large amounts of usage data
- Slower query execution
- Response time increases
- Request time-outs if response time is too long

5 - High Load Installations

Solution

Optimization using Solr server features

- Autocommit system
- Query warmup system

5 - High Load Installations: Autocommit

Autocommit feature optimizes storage of usage events

Out of the box, synchronous commits of usage events are used

The autocommit feature enables asynchronous commits of these events

5 - High Load Installations: Autocommit

Remove solr.commit() from SolrLogger

Add the AutoCommit code to the solrconfig.xml:

    <autoCommit>
        <maxDocs>10000</maxDocs>
        <maxTime>900000</maxTime>
    </autoCommit>
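The effect of the two thresholds can be illustrated with a small buffer that commits once a document count (maxDocs) or an age in milliseconds (maxTime) is reached. AutoCommitSketch is a hypothetical model, not Solr's implementation; in particular it only checks the age when a new document arrives, whereas Solr commits on its own schedule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the autocommit idea; Solr implements this internally.
class AutoCommitSketch {
    private final int maxDocs;
    private final long maxTimeMillis;
    private final List<String> pending = new ArrayList<>();
    private long oldestPendingAt = -1;
    int commits = 0;

    AutoCommitSketch(int maxDocs, long maxTimeMillis) {
        this.maxDocs = maxDocs;
        this.maxTimeMillis = maxTimeMillis;
    }

    // Buffers one document and commits when either threshold is reached.
    void add(String doc, long nowMillis) {
        if (pending.isEmpty()) oldestPendingAt = nowMillis;
        pending.add(doc);
        if (pending.size() >= maxDocs || nowMillis - oldestPendingAt >= maxTimeMillis) {
            commit();
        }
    }

    // Makes all buffered documents visible in one step.
    void commit() {
        pending.clear();
        commits++;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        // Same thresholds as the solrconfig.xml fragment: 10000 documents or 900000 ms.
        AutoCommitSketch buffer = new AutoCommitSketch(10000, 900000);
        buffer.add("usage event", System.currentTimeMillis());
        System.out.println("pending=" + buffer.pendingCount() + " commits=" + buffer.commits);
    }
}
```

Logging a usage event then costs only an append, and the expensive commit is shared by many events.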

5 - High Load Installations: Query warmup

Query warmup is used to optimize the query execution time

Preheated queries are cached by the Solr server based on the filter query

Queries are being warmed up for the current month

Query warmup required at:

- Server startup
- End of each month

5 - High Load Installations: Query warmup

Server startup:

<listener event="firstSearcher" class="solr.QuerySenderListener">

End of month:

Execute all expected queries for next month
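Why warming up helps can be shown with a toy cache keyed by the filter query string: running the expected queries ahead of time means later requests are answered from the cache. FilterCacheSketch is a hypothetical model that uses only the Java standard library; Solr's own caches are configured in solrconfig.xml and behave differently in detail.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a cache keyed by filter query; not Solr code.
class FilterCacheSketch {
    private final Map<String, Long> cache = new HashMap<>();
    private final Function<String, Long> slowCount;
    int slowCalls = 0;

    FilterCacheSketch(Function<String, Long> slowCount) {
        this.slowCount = slowCount;
    }

    // Returns the cached result for a filter query, computing it only on a cache miss.
    long count(String filterQuery) {
        return cache.computeIfAbsent(filterQuery, fq -> {
            slowCalls++;
            return slowCount.apply(fq);
        });
    }

    // Warm-up: run the expected queries ahead of time so later requests hit the cache.
    void warmUp(Iterable<String> expectedFilterQueries) {
        for (String fq : expectedFilterQueries) {
            count(fq);
        }
    }

    public static void main(String[] args) {
        FilterCacheSketch cache = new FilterCacheSketch(fq -> 42L);
        String nextMonth = "time:[2010-04-01T00:00:00Z TO 2010-04-30T23:59:59Z]";
        cache.warmUp(List.of(nextMonth));
        cache.count(nextMonth); // served from the cache, no slow call
        System.out.println("slow calls: " + cache.slowCalls);
    }
}
```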

5 - High Load Installations

More detailed information about how to improve your performance

Performance improvements tips page:

- Coming soon at http://atmire.com/statistics_performance.php
- Register at http://atmire.com/contact.php to be notified when the explanation has been completed
- Or email [email protected]

6 - Content and Usage Analysis Module

CUA module

- Designed and developed by @mire
- Module core contributed to DSpace 1.6
- Same data logging
- Improved interface

6 - Content and Usage Analysis Module

Statlets

- Configurable display in repository
- Determine the displayed data
- Separate configuration for item, collection, community, repository homepage

Graphs

- Generate various types of graphs, and integrate them in the display pages

6 - Content and Usage Analysis Module

Administrator interface

Wide range of reports

- Created instantaneously in the web interface
- Configure type of report to be requested
- Fast access to results to verify the configuration
- Generate data in a few clicks

View report as:

- Data table
- Downloadable spreadsheet
- Various graph types

6 - Content and Usage Analysis Module

Content analysis

Visualize repository growth

- Display number of records per year
- Compare growth amongst various communities

Visualize distribution

- Display number of records per type, language, …
