counter point: making the most of imperfect data
TRANSCRIPT
COUNTER Point: Making the Most of Imperfect Data
cc: amphalon - https://www.flickr.com/photos/72427312@N00
Jeannie Gartenschlaeger-Castro, Lindsay Cronk
4/4/2016
Introduction: Who we are, What we do
Two Different eResource Perspectives
• Jeannie = Systems-Side
• Lindsay = Service-Side
Disclaimer: I studied international relations and studio art.
Photo by David Bygott - Creative Commons Attribution-NonCommercial-ShareAlike License https://www.flickr.com/photos/86666094@N00 Created with Haiku Deck
Statistical modeling is the application of a set of assumptions to data, typically paired data.
Photo by Biblioteca General Antonio Machado - Creative Commons Attribution License https://www.flickr.com/photos/37667416@N04 Created with Haiku Deck
All COUNTER Reports are Time Series Data-Sets.
• Continuous time interval
• Successive measurements
• Equal spacing/time between data points
• Single measures within the report period
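The properties listed above can be checked before any modeling. The following is a minimal Python sketch (the talk itself used R); the counts are made up and the helper names are hypothetical. It assumes one total per calendar month, keyed by the first day of the month, and checks only that the months are consecutive.

```python
from datetime import date

# Made-up monthly totals standing in for one platform's JR1 counts (not real data).
monthly_requests = {
    date(2010, 1, 1): 120,
    date(2010, 2, 1): 135,
    date(2010, 3, 1): 160,
    date(2010, 4, 1): 150,
}


def month_index(d):
    """Map a date to a running month number so spacing can be compared."""
    return d.year * 12 + (d.month - 1)


def is_regular_monthly_series(observations):
    """Return True when the observed months are consecutive with no gaps."""
    months = sorted(month_index(d) for d in observations)
    return all(later - earlier == 1 for earlier, later in zip(months, months[1:]))


print(is_regular_monthly_series(monthly_requests))
```

Using a dictionary keyed by month also enforces the "single measure per period" property, since a month cannot appear twice.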
Decomposition of Time Series
• Segments time series
• Estimates based on predictability
• Wold’s theorem/decomposition: every (covariance-stationary) time series can be decomposed into a pair of uncorrelated processes, one deterministic and one moving-average based
– Imagine usage in two components, one trend oriented (COUNTER reporting periods) and one irregular (faculty recommendations/libguides/external drivers)
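To make the two-component picture concrete, here is a small Python sketch that splits a monthly series into a smooth trend and whatever is left over. It uses a classical centered moving average, which is a simpler stand-in and not the Wold decomposition itself; the function names and the 12-month period are assumptions for illustration, and the sample numbers are made up.

```python
def centered_moving_average(values, period=12):
    """Centered moving average for an even period (a 2 x period average).

    Returns a list as long as `values`, with None where the window does not fit.
    """
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]  # period + 1 points
        # The two end points get half weight so the window stays centered.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[i] = total / period
    return trend


def split_trend_and_remainder(values, period=12):
    """Return (trend, remainder), where remainder = observed - trend."""
    trend = centered_moving_average(values, period)
    remainder = [None if t is None else v - t for v, t in zip(values, trend)]
    return trend, remainder


# Made-up usage: a rising baseline plus a bump every 12th month.
usage = [100 + 5 * m + (40 if m % 12 == 9 else 0) for m in range(36)]
trend, remainder = split_trend_and_remainder(usage)
print(remainder[9])
```

The remainder here still contains any seasonal pattern along with the irregular part; separating those further would need an extra step.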
Exponential Smoothing
• Smooths time series data
• Dampens high-frequency noise/outliers
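Simple exponential smoothing can be written in a few lines. This is a hedged Python sketch of the basic (single) form only; the smoothing weight alpha = 0.3 is an arbitrary assumption, the sample counts are made up, and the slides do not say which variant or parameters were used in R.

```python
def exponential_smoothing(values, alpha=0.3):
    """Simple exponential smoothing.

    Each smoothed point is alpha * (new observation) plus
    (1 - alpha) * (previous smoothed point). The first point is kept as-is.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not values:
        return []
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up monthly counts with one outlier month.
raw = [100, 100, 500, 100, 100]
print(exponential_smoothing(raw))
```

A smaller alpha smooths more heavily and reacts more slowly to real changes; a larger alpha tracks the raw data more closely.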
About the Approach
• Plays to COUNTER’s strengths
• Addresses reporting weaknesses
• Relatively straightforward analysis
• Opportunity to test predictive analysis
• Powerful visualizations
Context and Culture
cc: Misenus1 - https://www.flickr.com/photos/44075517@N00
Statistical Modeling in the Library
cc: Boston Public Library - https://www.flickr.com/photos/24029425@N06
Choosing Resources for Pilot
• Needed 4 year+ usage history for reverse predictive analysis
• Larger numbers make analysis easier (went aggregate)
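One way to read "reverse predictive analysis" is as a backtest: hold out the most recent year, forecast it from the earlier years, and compare against what actually happened. The Python sketch below assumes that reading. It uses a seasonal naive forecast (repeat the last observed year) purely as a simple stand-in, not as the method used in the pilot; all names and numbers are hypothetical.

```python
def seasonal_naive_forecast(history, period=12):
    """Forecast the next `period` values by repeating the last observed cycle."""
    if len(history) < period:
        raise ValueError("need at least one full cycle of history")
    return list(history[-period:])


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error as a percentage of the actual values.

    Assumes no actual value is zero.
    """
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


def backtest_last_year(series, period=12):
    """Hold out the final cycle, forecast it from the rest, and score the forecast."""
    train, holdout = series[:-period], series[-period:]
    forecast = seasonal_naive_forecast(train, period)
    return forecast, mean_absolute_percentage_error(holdout, forecast)


# Made-up four years of monthly usage: same yearly shape, final year 10% higher.
one_year = [90, 110, 130, 120, 100, 60, 50, 55, 125, 140, 135, 80]
series = one_year * 3 + [round(v * 1.1) for v in one_year]
forecast, error_pct = backtest_last_year(series)
print(round(error_pct, 1))
```

With four years of history, three years remain for fitting after one is held out, which is one reason a longer usage history makes this kind of check more meaningful.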
Getting Started
JR1/DB1 – 2010 to 2013
• 4 JR datasets (Elsevier, Wiley, Highwire, and Cambridge)
• 4 DB datasets (Ebsco and ProQuest, separate sets for sessions and searches)
Applications
• Excel – Data collection/clean-up
• R – Data analysis
• Tableau – Data visualization
Excel
R
Learn About R
Resources/Tutorials I like:
• R for Beginners: https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
• Quick R: http://www.statmethods.net/
• Using R for Time Series Analysis: http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/src/timeseries.html
• R Time Series Quick Fix: http://www.stat.pitt.edu/stoffer/tsa3/R_toot.htm
• Ryan Womack’s excellent video series: https://www.youtube.com/watch?v=QHsmAM6nktY
Tableau
Findings and Next Steps
Trends, Implications, and Plans
cc: DirectDish - https://www.flickr.com/photos/13800911@N08
Usage is consistent across vendor platforms.
Usage trends manifest across vendor platforms.
Usage can be predicted.
What is a good search to session ratio?
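Before asking what a "good" ratio is, it helps to compute one consistently. This Python sketch assumes the ratio means searches divided by sessions, both taken from DB1-style totals for the same period; the database names and counts are invented for illustration, and no threshold for "good" is implied.

```python
def searches_per_session(searches, sessions):
    """Return searches divided by sessions, or None when there were no sessions."""
    if sessions == 0:
        return None
    return searches / sessions


# Hypothetical DB1-style yearly totals: (searches, sessions) per database.
db1_totals = {
    "Database A": (12000, 4000),
    "Database B": (900, 1500),
    "Database C": (0, 0),
}

ratios = {name: searches_per_session(s, n) for name, (s, n) in db1_totals.items()}

# List databases from lowest to highest ratio, skipping those with no sessions.
ranked = sorted(
    (item for item in ratios.items() if item[1] is not None),
    key=lambda item: item[1],
)
for name, ratio in ranked:
    print(name, round(ratio, 2))
```

Sorting from lowest to highest puts the databases most in need of a closer look at the top of the list.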
Moving Forward
• Going micro with big platforms
• Heuristic examination of databases with low search to session ratios
• Developing trend reports for CMC/selectors
Thank you!
cc: USFWS Pacific - https://www.flickr.com/photos/52133016@N08
Questions
cc: Maëlick - https://www.flickr.com/photos/113604805@N04
Keep in touch
cc: tasslehoff84 - https://www.flickr.com/photos/23284841@N00
Jeannie [email protected]
Lindsay [email protected] @linds_bot