analogist/ezpaarse: analysing locally gathered logfiles to determine users’ accesses to subscribed...


Upload: liber-europe

Post on 23-Jun-2015


DESCRIPTION

AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources (Thomas Jouneau, Université de Lorraine, France). This presentation was one of the 10 most highly ranked at LIBER's Annual Conference 2014 in Riga, Latvia. Learn more: www.libereurope.eu

TRANSCRIPT

Page 1: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources

LIBER 2014 - RIGA - 3/07/2014

ANALOGIST/EZPAARSE : ANALYSING LOCALLY GATHERED LOGFILES TO DETERMINE USERS’ ACCESSES TO SUBSCRIBED E-RESOURCES

http://ezpaarse.couperin.org

http://analogist.couperin.org

Page 2: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


Presentation Outline

1- The Context : A Need for Evaluation
2- Gathering Local Data
3- Parsers and Analyses
4- AnalogIST and ezPAARSE
5- Results and Visualization
6- Project Organization

Page 3: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


1 The Context : A Need for Evaluation

Page 4: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


1. The Context : A need for evaluation

The Scientific and Technical Information Market

About some well-known facts

5,000 to 10,000 publishers / 23,000 e-journals

$25 billion global revenue in 2012, increasing 4-5% per year

The 4 biggest publishers account for half the market

For 10 years, the price of most journals has increased by 3% to 5% per year

5,500,000 researchers, increasing 3.5% per year

1.5 billion articles downloaded per year by 10M users

We need to assess and evaluate the use of these e-resources

Page 5: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


1. The Context : A need for evaluation

What we’ve currently got

Publisher-provided statistics…

… are not available

… are available and COUNTER-compliant

… are available but not COUNTER-compliant

1st limitation : Vendors are the only source → We need to assess these numbers

2nd limitation : Only a partial view, no comparison possible → We need to complete the figures

3rd limitation : These numbers offer mere quantification → We need to qualify them

A possible solution : → locally gathered usage quantification

Page 6: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


2 Gathering Usage Data Locally

Page 7: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources



2. Gathering usage data locally

The reverse proxy

Page 8: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources



2. Gathering usage data locally with a reverse proxy

Where ezPAARSE comes into play

Page 9: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


3 Parsers and Analyses

Page 10: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


3. Parsers and analyses

Example of a URL structure

http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271664&_user=4046427&_pii=S0001457512000747&_check=y&_origin=browse&_zone=rslt_list_item&_coverDate=2012-07-31&wchp=dGLbVlt-zSkWb&md5=f5d8d157ccda6d597cb466af123dbff3/1-s2.0-S0001457512000747-main.pdf

Page 11: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


3. Parsers and analyses

Example of a URL structure

ISSN & type of the downloaded file

http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271664&_user=4046427&_pii=S0001457512000747&_check=y&_origin=browse&_zone=rslt_list_item&_coverDate=2012-07-31&wchp=dGLbVlt-zSkWb&md5=f5d8d157ccda6d597cb466af123dbff3/1-s2.0-S0001457512000747-main.pdf

Page 12: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


3. Parsers and analyses

Example of a URL structure

http://www.sciencedirect.com/science/journal/00014575

ISSN : 00014575. By manually trying the URL, we find an HTML table of contents.

Page 13: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


3. Parsers and analyses

Example of a URL structure

http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009

We know it’s a PDF, but we only get a publisher-specific identifier : we need a correspondence table, the Publisher Knowledge Base (ideally a KBART file).
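The knowledge-base lookup described for the Cairn URL can be sketched as follows. This is a minimal illustration in Python, not ezPAARSE's own code: the column names follow the KBART convention, but the knowledge-base row, its placeholder ISSN values, the function names, and the assumption that the token before the first underscore of ID_ARTICLE identifies the title are all hypothetical.

```python
import csv
import io
from urllib.parse import urlparse, parse_qs

# Hypothetical one-row knowledge base in KBART-style tab-separated form.
# The title and ISSN values are placeholders, not real identifiers.
KBART_TEXT = (
    "publication_title\tprint_identifier\tonline_identifier\ttitle_id\n"
    "Placeholder journal title\t0000-0000\t0000-0001\tRFG\n"
)

def load_knowledge_base(text):
    """Index KBART rows by their title_id column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["title_id"]: row for row in reader}

def resolve_cairn(url, knowledge_base):
    """Return the knowledge-base row for a Cairn load_pdf URL, or None."""
    ids = parse_qs(urlparse(url).query).get("ID_ARTICLE")
    if not ids:
        return None
    # Assumption: the token before the first underscore identifies the title.
    title_id = ids[0].split("_")[0]
    return knowledge_base.get(title_id)

kb = load_knowledge_base(KBART_TEXT)
row = resolve_cairn(
    "http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009", kb
)
```

A URL whose identifier is missing from the knowledge base simply yields None here, which is the case where the knowledge base would need to be completed by hand.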

Page 14: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


3. Parsers and analyses

Parse the URL

http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271664&_user=4046427&_pii=S0001457512000747&_check=y&_origin=browse&_zone=rslt_list_item&_coverDate=2012-07-31&wchp=dGLbVlt-zSkWb&md5=f5d8d157ccda6d597cb466af123dbff3/1-s2.0-S0001457512000747-main.pdf

/_pii=S([0-9]{0,7}[0-9X])/i
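As an illustration of the parsing step, the regular expression shown for the ScienceDirect URL can be applied as in the sketch below. It is written in Python rather than in the slide's regex-literal syntax, and it uses the stricter quantifier {7} (exactly seven digits plus a check character) where the slide has {0,7}. The function name, the output field names and the rule "a URL ending in .pdf is a PDF article" are assumptions made for the example.

```python
import re

# Eight ISSN characters right after "_pii=S": seven digits, then a digit or X.
PII_ISSN = re.compile(r"_pii=S([0-9]{7}[0-9X])", re.IGNORECASE)

def parse_sciencedirect(url):
    """Extract the ISSN and the file type from a ScienceDirect URL."""
    fields = {}
    match = PII_ISSN.search(url)
    if match:
        raw = match.group(1).upper()
        # Present the ISSN in its usual NNNN-NNNN form.
        fields["print_identifier"] = raw[:4] + "-" + raw[4:]
    # Assumption: a URL ending in ".pdf" is the PDF of an article.
    if url.lower().endswith(".pdf"):
        fields["mime"] = "PDF"
        fields["rtype"] = "ARTICLE"
    return fields

URL = (
    "http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271664"
    "&_user=4046427&_pii=S0001457512000747&_check=y&_origin=browse"
    "&_zone=rslt_list_item&_coverDate=2012-07-31&wchp=dGLbVlt-zSkWb"
    "&md5=f5d8d157ccda6d597cb466af123dbff3/1-s2.0-S0001457512000747-main.pdf"
)

print(parse_sciencedirect(URL))
```

For the URL above, the captured group is 00014575, the same digits that appear in the journal's table-of-contents URL.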

Page 15: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


3. Parsers and analyses

What do we count?

Serials : Articles (ARTICLE) ; Abstract (ABS) ; Table of contents (TOC) ; Reference (REF) ; Article preview, e.g. the “Look inside” function of SpringerLink (PREVIEW) ; Article in basket/personal folder (BOOKMARK)

E-books : Book by title (BOOK) ; Chapter, section (BOOK_SECTION) ; Book series (BOOKSERIE) ; Manuals, handbooks (HANDBOOK)

Law databases : Law encyclopedia (ENCYCLOPEDIES) ; Law memento (FORMULES) ; Law manual (BROCHES) ; Law codes (CODES)

Inst. repositories : PHD_THESIS ; MD_THESIS ; MASTER_THESIS

- The availability of these items depends on the elements present in the URL
- The Law databases currently covered are only French ones

Page 16: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


3. Parsers and analyses

Platforms covered

Each platform has its own URL structure...

...we need one parser for each
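The "one parser per platform" idea can be pictured as a registry that routes each URL to the parser for its domain. The sketch below is a hypothetical Python illustration, not the way ezPAARSE itself is organised: the registry, the function names and the output field names are assumptions.

```python
import re
from urllib.parse import urlparse, parse_qs

def sciencedirect_parser(url):
    """Hypothetical parser: pull the ISSN out of the _pii parameter."""
    m = re.search(r"_pii=S([0-9]{7}[0-9X])", url, re.IGNORECASE)
    if not m:
        return {}
    raw = m.group(1).upper()
    return {"print_identifier": raw[:4] + "-" + raw[4:]}

def cairn_parser(url):
    """Hypothetical parser: keep the publisher-specific article identifier."""
    ids = parse_qs(urlparse(url).query).get("ID_ARTICLE")
    return {"unitid": ids[0]} if ids else {}

# One entry per platform; a new platform means a new parser and a new entry.
PARSERS = {
    "sciencedirect.com": sciencedirect_parser,
    "cairn.info": cairn_parser,
}

def dispatch(url):
    """Run the parser registered for the URL's domain; None if uncovered."""
    host = urlparse(url).hostname or ""
    for domain, parser in PARSERS.items():
        if host == domain or host.endswith("." + domain):
            return parser(url)
    return None
```

A URL from a platform that has no registered parser returns None, which distinguishes "platform not covered" from "covered, but nothing recognised in this URL" (an empty dictionary).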

Page 17: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


3. Parsers and analyses

Some limitations apply

- Opaque URLs (session ids, encryption…). Example : the former Springer platform, http://www.springerlink.com/content/j5q872410p510m63/fulltext.pdf

- Publisher IDs that need to be linked to a knowledge base or a reference file. Example : Cairn, http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009

- Knowledge bases that have to be edited manually

Page 18: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


4 AnalogIST and ezPAARSE

Page 19: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


4. AnalogIST and ezPAARSE

● AnalogIST : the wiki portal

Analyse des Logs de l'IST = Analysing the logs of Scientific and Technical Information

→ The place where we gather the platform analyses and synchronise the new parsers with the local installations

http://analogist.couperin.org

● ezPAARSE : the software

ez : easy / PAARSE : Progiciel d'Analyse des Accès aux RessourceS Electroniques = Software for Analysing the Accesses to Online Resources

● as a local installation

● as an online service (SaaS)

Free (libre) software, multi-platform

http://ezpaarse.couperin.org

Page 20: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


4. AnalogIST and ezPAARSE

[Diagram : local installations (Univ 1, Univ 2, ...) and AnalogIST (global installation + collaborative space)]

Page 21: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


4. AnalogIST and ezPAARSE

Through a web form

With the command line (cURL)

[Note on slide : to be updated with the new English form]

Use the web form to create the command line that suits your needs.

Page 22: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


5. ezPAARSE : Using the Results

Example of an ezPAARSE output

Text file (CSV format)

KBART fields

geoip fields

Deduplicate consultation events : COUNTER recommendation
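The COUNTER-style deduplication of consultation events mentioned on this slide can be sketched roughly as below. This is a simplified Python illustration under stated assumptions, not ezPAARSE's actual algorithm: the event field names (login, unitid, mime, datetime) are hypothetical, and the windows of 10 seconds for HTML and 30 seconds for PDF follow the double-click rule of the COUNTER Code of Practice. The sketch keeps the first request of a burst, which is a simplification.

```python
from datetime import datetime, timedelta

# Double-click windows taken from the COUNTER recommendation.
WINDOWS = {"HTML": timedelta(seconds=10), "PDF": timedelta(seconds=30)}

def deduplicate(events):
    """Drop repeated requests for the same item by the same user.

    Each event is a dict with the hypothetical keys 'login', 'unitid',
    'mime' and 'datetime'. A request is dropped when it follows the
    previous request for the same (user, item, format) within the window.
    """
    last_seen = {}
    kept = []
    for event in sorted(events, key=lambda e: e["datetime"]):
        key = (event["login"], event["unitid"], event["mime"])
        window = WINDOWS.get(event["mime"])
        previous = last_seen.get(key)
        last_seen[key] = event["datetime"]
        if (window is not None and previous is not None
                and event["datetime"] - previous <= window):
            continue  # counted as a double click, not a new consultation
        kept.append(event)
    return kept

# Made-up sample events for illustration only.
t0 = datetime(2014, 7, 3, 10, 0, 0)
sample = [
    {"login": "alice", "unitid": "X", "mime": "PDF", "datetime": t0},
    {"login": "alice", "unitid": "X", "mime": "PDF", "datetime": t0 + timedelta(seconds=20)},
    {"login": "alice", "unitid": "X", "mime": "PDF", "datetime": t0 + timedelta(seconds=60)},
    {"login": "alice", "unitid": "X", "mime": "HTML", "datetime": t0 + timedelta(seconds=5)},
]
unique_events = deduplicate(sample)
```

In the sample, the second PDF request falls within 30 seconds of the first and is dropped; the third PDF request and the HTML request are kept.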

Page 23: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


5 ezPAARSE : Using the Results

Page 24: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


5. ezPAARSE : using the results

(Libre/MS) Office rendering macros

Page 25: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


5. ezPAARSE : using the results

Exploiting the Results with

Page 26: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


5. ezPAARSE : using the results

Who (student, researcher, staff) consults what? (UL)

Distribution of consultations of paid content (books, journals, law references…) by user type at the Université de Lorraine

Page 27: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


5. ezPAARSE : using the results

Consultations by research unit (UL)

Consultations of articles from Jan 2014 to May 2014 by research units at the Université de Lorraine

Page 28: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


5. ezPAARSE : using the results

Consultations by teaching unit (UL)

Consultations of articles by teaching unit or faculty at the Université de Lorraine

Page 29: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


5. ezPAARSE : using the results

Geolocation of consultations (CNRS)

Page 30: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


5. ezPAARSE : using the results

Detection of an anomaly (CNRS)

The consultation peak corresponds to abuse of an e-resource. Detecting it makes it possible to react promptly to the incident.
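One simple way to spot such a consultation peak is to compare each day's count with a typical day. The Python sketch below is only an illustration of that idea, not the method used at CNRS: the function name, the median baseline, the factor of 5 and the daily counts are all assumptions, and the figures are made up.

```python
from statistics import median

def find_peaks(daily_counts, factor=5.0):
    """Return the days whose count exceeds `factor` times the median.

    The median is used as the baseline because, unlike the mean, it is
    barely affected by the peak itself.
    """
    if not daily_counts:
        return []
    baseline = median(daily_counts.values())
    return sorted(day for day, n in daily_counts.items() if n > factor * baseline)

# Made-up daily consultation counts for illustration only.
counts = {
    "2014-03-01": 100,
    "2014-03-02": 120,
    "2014-03-03": 90,
    "2014-03-04": 110,
    "2014-03-05": 105,
    "2014-03-06": 2000,
    "2014-03-07": 95,
}
peaks = find_peaks(counts)
```

With these figures the median is 105, so only the day with 2000 consultations exceeds the threshold of 525.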

Page 31: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


6 Project Organization

Page 32: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


6. Project organization : the method

SCRUM : An agile development method


PRODUCT VISION

Page 33: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


6. Project organization : the team

Page 34: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


In conclusion

● ezPAARSE is free and open source

● Simple use and testing

● State of the art technologies

● Feel free to test

● send us log samples

● give us feedback !

Page 35: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


Any Questions?

http://ezpaarse.couperin.org

http://analogist.couperin.org

https://twitter.com/ezpaarse

[Tag cloud with appropriate terms]

Page 36: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


http://analogist.couperin.org/platforms/analyse-helper/start

The rest is automatically processed

dokuwiki syntax generated

Page 37: AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources


More features : exploiting the results with geolocation