SUSHI: A Beginner’s Guide to NISO’s Standardized Usage Statistics Harvesting Initiative

DESCRIPTION

SUSHI: A beginner’s guide to NISO’s Standardized Usage Statistics Harvesting Initiative. Breakout Sessions: Group B, UKSG Conference and Exhibition, Torquay, April 7-9, 2008. Oliver Pesch, EBSCO Information Services, [email protected].

TRANSCRIPT

  • SUSHI: A beginner’s guide to NISO’s Standardized Usage Statistics Harvesting Initiative
    Breakout Sessions: Group B
    UKSG Conference and Exhibition, Torquay, April 7-9, 2008
    Oliver Pesch, EBSCO Information Services, [email protected]

  • Overview
    Background on usage statistics
      Why librarians collect them
      Timeline of standards
      Progression of improvements
    COUNTER
    SUSHI
      What it is
      How it works
    SUSHI and COUNTER: why they are important
      To libraries
      To publishers


  • Why do librarians collect usage statistics?
    Because they must
      Government and funding bodies may require them
      E.g. ARL statistics
    To inform renewal decisions
      Overall use
      Cost-per-use
    Support cancellation decisions
    Generally manage e-resources and the tools and programs that support them


  • Timeline for usage-related standards efforts (events in the order shown on the slides)
    Online collections and their use grow
    ICOLC Guidelines for Usage Data
    ICOLC Guidelines: Release 2
    Project COUNTER formed
    COUNTER Code of Practice Release 1
    ERMI committee formed
    First commercial ERM released
    ERM Usage Consolidation Module
    SUSHI committee formed
    COUNTER Code of Practice Release 2
    SUSHI released as draft standard
    SUSHI certified by ANSI as Z39.93


  • Usage Statistics
    Usage data importance grows with e-collections
      Collection management
      Budget management
    Credibility and consistency (addressed by COUNTER)
      Different vendors using different terminology
      Inconsistencies in processing lead to over-counting
      Formatting differences make comparison challenging
    Consolidation and meaningful reporting (addressed by usage consolidation tools in the ERM)
      Many vendors and reports to process
      Collection-level views needed
    Retrieving and processing (addressed by SUSHI)
      Obtaining reports is time consuming
      Formatting and other adjustments still needed


  • COUNTER
    Goals
    Codes of practice
    Audit
    Coming in release 3

    http://www.projectcounter.org/

  • Why COUNTER?
    Goal: credible, compatible, consistent publisher/vendor-generated statistics for the global information community
    Libraries and consortia need online usage statistics
      To assess the value of different online products/services
      To support collection development
      To plan infrastructure
    Publishers need online usage statistics
      To experiment with new pricing models
      To assess the relative importance of the different channels by which information reaches the market
      To provide editorial support
      To plan infrastructure

  • COUNTER Codes of Practice
    Definitions of terms used
    Specifications for Usage Reports
      What they should include
      What they should look like
      How and when they should be delivered
    Data processing guidelines
    Auditing
    Compliance

  • COUNTER: current Codes of Practice
    1) Journals and databases
      Release 1 Code of Practice launched January 2003
      Release 2 published April 2005, replacing Release 1 in January 2006
      Now a widely adopted standard by publishers and librarians
        Almost 100 vendors now compliant
        10,000+ journals now covered
        Librarians use it in collection development decisions
        Publishers use it in marketing to prove value

  • Journal and Database Code of Practice: Reports
    Journal Report 1: Full text article requests by month and journal
    Journal Report 2: Turnaways by month and journal
    Database Report 1: Total searches and sessions by month and database
    Database Report 2: Turnaways by month and database
    Database Report 3: Searches and sessions by month and service

  • COUNTER: current Codes of Practice
    2) Books and reference works
      Release 1 Code of Practice launched March 2006
      10 vendors now compliant
      Relevant usage metrics less clear than for journals
      Different issues than for journals
        Direct comparisons between books less relevant
        Understanding how different categories of book are used is more relevant

  • Books and Reference Works: Reports
    Book Report 1: Number of successful requests by month and title
    Book Report 2: Number of successful section requests by month and title
    Book Report 3: Turnaways by month and title
    Book Report 4: Turnaways by month and service
    Book Report 5: Total searches and sessions by month and title
    Book Report 6: Total searches and sessions by month and service

  • Specific Formats

  • Explicit report layout consistent

  • Credibility: COUNTER Audit
    Independent audit required within 18 months of compliance, and annually thereafter
    Audit is online, using scripts provided in the Code of Practice
    Auditor can be:
      Any Chartered Accountant
      Another COUNTER-approved auditor
    ABCE is the first COUNTER-approved auditor
      Industry-owned
      Not-for-profit
      Independent and impartial
      Part of ABC (Audit Bureau of Circulations)
      Providing website traffic audits for over 150 companies and certifying over 1400 domains
      Has successfully completed test audits on COUNTER usage reports

  • Coming soon
    Release 3 of the Journals and Databases Code of Practice
    Key features
      Consortium reports
      Sets expectations for handling of:
        Federated searching
        Internet robots and archives like LOCKSS
        Browser prefetching
      Reports must be available in XML format
      Revised COUNTER XML Schema
      SUSHI support becomes a requirement for compliance


  • SUSHI: Objectives
    COUNTER provides an excellent model and rules for counting usage statistics
    Libraries needed:
      A more efficient data exchange model
        Current model is file-by-file spreadsheet download
        Background query and response model is more efficient and scalable

  • SUSHI: What it is and isn’t
    What it is:
      A web-services model for requesting data
        Replaces the user’s need to download files from the vendor’s website
      A request for data where the response includes COUNTER data
        Using COUNTER’s schema
    What it isn’t:
      A model for counting usage statistics
      A usage consolidation application

  • SUSHI: COUNTER Reports
    Usage Reports
      Journal Report 1: Full text article requests by month and journal
      Journal Report 2: Turnaways by month and journal
      Database Report 1: Total searches and sessions by month and database
      Database Report 2: Turnaways by month and database
      Database Report 3: Searches and sessions by month and service

  • Web Services: the chosen approach for SUSHI
    Web services combine the best aspects of component-based development and the Web.
      Commercially accepted
      Widely supported (W3C)
      Secure

    but first some definitions

  • Definitions
    XML Schema (XSD): A language for describing the structure and constraining the contents of XML documents. (reactivity.com glossary)


  • Definitions
    Web Services: Open, standard (XML, SOAP, etc.) based Web applications that interact with other web applications for the purpose of exchanging data. (lucent.com)

  • Definitions
    Simple Object Access Protocol (SOAP): A lightweight XML-based protocol used for invoking web services and exchanging structured data and type information on the Web. (oracle.com)

  • Definitions
    Web Services Description Language (WSDL): An XML format published for describing Web services. (wikipedia.org)

  • Web Services: An example
    System A provides online information about companies. System B provides real-time stock quotations. Using Web Services, System A can integrate real-time stock quotes into the company information they provide.

  • [Diagram sequence: System A (online company data) and System B (real-time stock quotes web service), connected via the Internet. The slide captions follow in order.]
    System A sends the stock symbol to System B.
    System B returns the quote. All of this happens in milliseconds.
    Messages are formatted in XML, and the protocol used to communicate is SOAP (Simple Object Access Protocol).
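The stock-quote exchange above can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the original slides: the `urn:example:stockquote` namespace and the `GetQuote`, `Symbol`, `GetQuoteResponse` and `Price` element names are hypothetical, and no real quote service is contacted. The SOAP 1.1 envelope namespace is the standard one.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
QUOTE_NS = "urn:example:stockquote"  # hypothetical namespace for System B's service


def build_quote_request(symbol: str) -> str:
    """System A: wrap the stock symbol in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{QUOTE_NS}}}GetQuote")
    ET.SubElement(request, f"{{{QUOTE_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")


def parse_quote_response(xml_text: str) -> float:
    """System A: pull the quote out of the SOAP envelope System B returned."""
    root = ET.fromstring(xml_text)
    path = (
        f"{{{SOAP_NS}}}Body/"
        f"{{{QUOTE_NS}}}GetQuoteResponse/"
        f"{{{QUOTE_NS}}}Price"
    )
    price = root.find(path)
    if price is None or price.text is None:
        raise ValueError("response does not contain a Price element")
    return float(price.text)


# A canned reply standing in for System B (made-up value).
SAMPLE_RESPONSE = (
    f'<soap:Envelope xmlns:soap="{SOAP_NS}" xmlns:q="{QUOTE_NS}">'
    "<soap:Body><q:GetQuoteResponse><q:Price>12.50</q:Price>"
    "</q:GetQuoteResponse></soap:Body></soap:Envelope>"
)

if __name__ == "__main__":
    print(build_quote_request("ACME"))
    print(parse_quote_response(SAMPLE_RESPONSE))
```

In a real deployment the request string would be sent over HTTP to the service's endpoint, and the element names would come from the service's WSDL rather than being chosen by hand.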

  • SUSHI: The Exchange
    Report Request
    Report Response

  • SUSHI: Architecture
    The next series of slides graphically shows a SUSHI transaction
      Library’s ERM system requests a usage report
      SUSHI client makes the request
      SUSHI server processes request
      SUSHI server prepares COUNTER report
      SUSHI server packages and returns response
      SUSHI client processes COUNTER report

  • [Diagram sequence: the Library (ERM system and SUSHI client) and the Content Provider (SUSHI server web service with access to usage data), connected via the Internet. The slide captions follow in order.]
    The Library’s ERM and Content Provider’s systems are both connected to the internet.
    The SUSHI client is software that runs on the library’s server, usually associated with an ERM system.
    The SUSHI server is software that runs on the Content Provider’s server, and has access to the usage data.
    When the ERM system wants a COUNTER report, it sends a request to the SUSHI client, which prepares the request.
    The SUSHI request is sent to the Content Provider. The request specifies the report and the library the report is for.
    The SUSHI server reads the request, then processes the usage data.
    The SUSHI server creates the requested COUNTER report in XML format.
    A response message is prepared according to the SUSHI XML schema.
    The COUNTER report (XML) is added to the Response as its payload. The response is sent to the client.
    The SUSHI client processes the response and extracts the COUNTER report.
    The extracted COUNTER report is passed to the ERM system for further processing.
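The transaction walked through above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the toolkit code mentioned later in the talk: the element names (`ReportRequest`, `CustomerID`, `ReportName`, `ReportResponse`, `Report`) are chosen for readability and should be checked against the published SUSHI schema, the COUNTER namespace is a placeholder, and the "server" here is just a local function rather than a web service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SUSHI_NS = "http://www.niso.org/schemas/sushi"  # schema location given in the slides
COUNTER_NS = "urn:example:counter"  # placeholder, not the real COUNTER namespace


def qname(namespace: str, tag: str) -> str:
    """Build an ElementTree-style qualified name."""
    return f"{{{namespace}}}{tag}"


def build_report_request(customer_id: str, report_name: str) -> str:
    """Client side: the request names the report and the library it is for."""
    envelope = ET.Element(qname(SOAP_NS, "Envelope"))
    body = ET.SubElement(envelope, qname(SOAP_NS, "Body"))
    request = ET.SubElement(body, qname(SUSHI_NS, "ReportRequest"))
    ET.SubElement(request, qname(SUSHI_NS, "CustomerID")).text = customer_id
    ET.SubElement(request, qname(SUSHI_NS, "ReportName")).text = report_name
    return ET.tostring(envelope, encoding="unicode")


def build_report_response(counter_report: ET.Element) -> str:
    """Server side: add the COUNTER report to the response as its payload."""
    envelope = ET.Element(qname(SOAP_NS, "Envelope"))
    body = ET.SubElement(envelope, qname(SOAP_NS, "Body"))
    response = ET.SubElement(body, qname(SUSHI_NS, "ReportResponse"))
    response.append(counter_report)
    return ET.tostring(envelope, encoding="unicode")


def extract_counter_report(response_xml: str) -> ET.Element:
    """Client side: extract the COUNTER report to hand to the ERM system."""
    root = ET.fromstring(response_xml)
    path = "/".join(
        [
            qname(SOAP_NS, "Body"),
            qname(SUSHI_NS, "ReportResponse"),
            qname(COUNTER_NS, "Report"),
        ]
    )
    report = root.find(path)
    if report is None:
        raise ValueError("response does not carry a COUNTER report payload")
    return report


if __name__ == "__main__":
    print(build_report_request("library-123", "JR1"))  # hypothetical identifiers
    payload = ET.Element(qname(COUNTER_NS, "Report"), {"Name": "JR1"})
    extracted = extract_counter_report(build_report_response(payload))
    print(extracted.get("Name"))
```

The point of the sketch is the division of labour: the SUSHI messages only carry the request and the wrapper, while the usage data itself travels as a COUNTER report inside the response.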


  • Why COUNTER and SUSHI are important
    For libraries and publishers
      Usage statistics are being used to inform decisions
      They need to be consistent, credible and comparable
      And easy to obtain

    SUSHI

  • More thoughts on usage statistics
    Usage statistics
      should enlighten rather than obscure
      should be practical
      are only part of the story
      should be used in context
      should be reliable

  • COUNTER and SUSHI
    Questions and answers

  • SUSHI: What effect will release 3 of the COUNTER Code of Practice have on SUSHI?
    Rapid adoption of SUSHI due to it being a COUNTER compliance requirement
    New COUNTER schema will allow all COUNTER reports to be delivered through SUSHI using one schema
    Additional reports will help consortia

  • SUSHI: How many vendors are compliant with COUNTER codes of practice?
    Almost 100 vendors/products are compliant with the Journals and Databases COP
    10 vendors are compliant with Books and Reference Works
    See http://www.projectcounter.org/compliantvendors.html

  • SUSHI: Where do I find the standard and more information about it?
    NISO web site for SUSHI: http://www.niso.org/ (select Standards and search for Z39.93)
    SUSHI Schemas: http://www.niso.org/schemas/sushi

  • SUSHI: What help is there for developers?
    Toolkits for .NET (courtesy EBSCO) and JAVA (courtesy Swets) available on the NISO web site
    Recorded webinars on the NISO web site
    Developer email list
      Contact either Oliver Pesch [[email protected]] or Adam Chandler [[email protected]] to be added

  • SUSHI: How big a project is it to create a SUSHI Server?
    If COUNTER data is available, and
    developers are familiar with implementing web services in .NET or JAVA; then,
    the project is relatively small (weeks not months)

  • Thank you!

    Oliver Pesch, [email protected]

    One simple reason librarians collect usage data is that they are obliged to. Organizations like ARL require their members to submit detailed statistics on the library, the library collection and its use. Other agencies and funding bodies demand the same. The big challenge with collecting statistics for online resources is that the library does not control the collection and, depending on the resource, the majority of the use can come from users who do not pass through the library (physically or virtually).

    Beyond "because they must," usage statistics are used as one input into renewal decisions. In the case of databases, the number of searches performed on a given database, compared to others, is an indicator of the usefulness of that database. Similarly, journals with low use or high cost per use may undergo additional scrutiny. Many librarians work to maximize their budgets; therefore, they may cancel journals or databases that are no longer as relevant and use that money for other materials.

    Where the library budget is under pressure, usage statistics become a tool for isolating those materials that could be considered for cancellation. Note that just because one journal or database does not have the same level of use as another does not mean it is less valuable. The librarian will also take into consideration the discipline and programs the resource is supporting.

    Low use may be a result of the database being hard to find on the library web page, and so can prompt action to better highlight and promote it. Usage can then be measured over time to rate the effectiveness of the change.

    This screen shot is courtesy of Innovative Interfaces, Inc. and shows one of the reports from their Usage Consolidation module. Note the cost-per-use column: a simple calculation of price paid divided by the number of full-text downloads from the journal.

    Online collections continue to grow and become a significant part of the library’s collection. As a result, libraries need to measure usage to control these resources, which are taking a growing percentage of their budget. For a variety of reasons, not all vendors provide usage statistics. And when they do, the reporting is not consistent from one vendor to the next.
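The cost-per-use figure mentioned in the notes above (price paid divided by the number of full-text downloads) is easy to sketch. The function name and the sample figures below are illustrative only; returning `None` for a title with no recorded use is one possible convention, not something the talk prescribes.

```python
from typing import Optional


def cost_per_use(price_paid: float, full_text_downloads: int) -> Optional[float]:
    """Price paid divided by full-text downloads.

    Returns None when there were no downloads, since the ratio is undefined.
    """
    if full_text_downloads <= 0:
        return None
    return price_paid / full_text_downloads


if __name__ == "__main__":
    # Hypothetical journal: 1200.00 paid, 400 full-text downloads in the year.
    print(cost_per_use(1200.0, 400))  # 3.0
    # A title with no recorded use has no meaningful cost per use.
    print(cost_per_use(950.0, 0))  # None
```

As the notes caution, a high cost per use is a prompt for scrutiny rather than a verdict; discipline and program support still have to be weighed.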

    The problem becomes critical for consortia who want the statistics to evaluate the effectiveness of their purchases.

    The International Coalition of Library Consortia (ICOLC) became the first to address this growing problem. In 1999 they created their guidelines for reporting usage data. They normalized the terminology and set expectations as to what elements a vendor was to report on. This was significant because, for the first time, vendors were given a yardstick to be measured against.

    Two years later came the second release of the guidelines, which included refinements to the first. The ICOLC guidelines helped; however, inconsistencies in counting and formatting continued to be a problem. Something beyond a set of definitions was needed.

    Publishers, librarians and aggregators teamed together to solve the problem and formed COUNTER; the goal was to create a code of practice that would lead to consistent, comparable and credible usage statistics.

    Release 1 of the COUNTER Code of Practice was published within a year. It clarified terminology, identified specific reports that were needed, addressed common problems with web logs and double-clicking, and specified the format and methods of delivery for the reports.

    In 2002 the E-Resource Management Initiative (ERMI) was formed under the sponsorship of the Digital Library Federation. This was an outcome of the research published the prior year by Tim Jewell. Tim was investigating the growing challenge of managing e-resources. He discovered that many libraries were developing their own solutions. The goal of ERMI was to come up with some standard approaches and guidelines for managing e-resources, and as a result the committee published its report in 2004, which included functional specifications, a data dictionary and an entity relationship diagram.

    The ERMI work became the blueprint for commercial ERM systems. In 2004 Innovative Interfaces released the first commercial ERM. The ERM is intended to offer the library a single place to store and access all information about their e-resources.

    The reason we are talking about ERM systems is that, as the single place to store all information about accessing and administering e-resources, the natural extension to this system was to incorporate usage data.

    As a result Innovative, I believe, was the first to attempt to add a usage consolidation module. Their goal was to leverage the work of COUNTER so that they could load the full-text usage information in a standard format comparable across vendors.

    Unfortunately there were variations in how the code of practice was being applied, and it was a lot of work to gather reports, so something else had to be done.

    By mid 2005, it was clear that a method of automatically harvesting usage data was needed, and thus SUSHI was born. SUSHI stands for the Standardized Usage Statistics Harvesting Initiative; we will get into more about SUSHI in a minute.

    Shortly after SUSHI was created, COUNTER updated its code of practice for journals and databases. They addressed some of the issues that were uncovered by early usage consolidation work by becoming much more specific with the formats and introducing some additional elements, such as Publisher. They also introduced the notion of an audit to verify compliance.

    SUSHI was released as a draft standard some 14 months after the committee was formed, and it was approved by the NISO members and officially became NISO Z39.93 by the end of last year.

    The simple fact is that as online content takes an increasingly large role in the library collection, so grows the need to measure usage for collection management (weeding) and budget management.

    With the need for usage data, and the fact that the usage is not gathered by the library, the first major problem surfaced. Those vendors that were providing usage (and many did not) were not consistent in terminology, formatting and even the basic techniques for counting. A number of standards initiatives, like ICOLC, NISO and others, contributed to solving these problems, but it was COUNTER that really made the difference.

    Usage data is gathered at the vendor sites. The library must retrieve and process it to create meaningful reports. It was not until COUNTER came into play that library application vendors and service providers saw that they could create systems that would effectively consolidate the reports. Around this time the ERMs were beginning to appear on the market, and thus the foundation was there to support such consolidation.

    Now that we have the standard in place to get somewhat uniform stats from vendors, and the place to load them, the next problem is uncovered. The time it takes to retrieve reports is significant and the process convoluted. And even with COUNTER, some manual tweaking is often needed. Out of this challenge came SUSHI, which we will discuss later.

    COUNTER, or Counting Online Usage of Networked Electronic Resources

    Formed in 2002, Project COUNTER is a non-profit organization that was formed with the participation of publishers, librarians and aggregators. This collaboration was key to the success of the group. While many publishers had been providing usage statistics for their products for years, they were counting different things in different ways.

    Project COUNTER has led the standardization of usage statistics for electronic resources and focuses on how things are counted and how they are reported.

    The ultimate goal for this standardization can be summed up in three Cs: usage reports should be consistent, they should be credible, and they should be comparable across products.

    I’ll get into examples in the next few minutes and then conclude with some caveats.

    Here is an example of a Journal Report. I have highlighted the required metadata in the top left-hand corner that needs to precede the report. This includes when the report was run.

    You will notice that there is a distinction between the Publisher and the Platform. This is not a concern for most publishers, but is a concern for an aggregator like EBSCOhost or ProQuest. Services like HighWire, Ingenta and Metapress also host multiple publishers on a single platform.

    The other distinction is that yearly totals come in three flavors: a simple yearly total for all full-text requests, and two additional columns that divide up HTML requests from PDF requests. The rationale for this was that some publishers provide multiple format versions of the same article and that readers tend to browse an HTML version before downloading the PDF version of the same article. Without breaking these two formats down, it would have been difficult to compare the usage of a publisher that provided both versions with another that provided only, say, PDF.
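The three yearly totals described in the notes above can be sketched as a small roll-up over monthly counts. The record layout, field names and sample figures here are assumptions made for illustration and are not the COUNTER report format itself; the overall total is kept as its own count rather than derived from HTML plus PDF.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MonthlyRequests:
    """One month of full-text requests for a single journal (illustrative layout)."""

    month: str
    full_text: int  # all full-text requests, regardless of format
    html: int  # requests for the HTML version
    pdf: int  # requests for the PDF version


def yearly_totals(months: List[MonthlyRequests]) -> Dict[str, int]:
    """Roll monthly counts up into the three yearly totals."""
    return {
        "total": sum(m.full_text for m in months),
        "html": sum(m.html for m in months),
        "pdf": sum(m.pdf for m in months),
    }


if __name__ == "__main__":
    # Made-up figures for two months of one journal.
    sample = [
        MonthlyRequests("Jan-2008", full_text=10, html=6, pdf=4),
        MonthlyRequests("Feb-2008", full_text=15, html=9, pdf=6),
    ]
    print(yearly_totals(sample))  # {'total': 25, 'html': 15, 'pdf': 10}
```

Keeping the HTML and PDF columns separate is what makes it possible to compare a publisher that offers both formats with one that offers only PDF.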

    The consistency goal is achieved by creating very explicit standards for how reports are presented. Not much is left to interpretation or imagination in the latest COUNTER release. All cells in the report are defined and described, so there is no question about what goes into them. The main impetus for this change was the development of electronic management systems for libraries that could ingest these reports.

    Web services are widely used: NISO has a Web Services committee; Amazon uses web services to integrate book buying into other sites; thousands of others do likewise.

    SOAP (Simple Object Access Protocol): SOAP Version 1.2 is a lightweight protocol intended for exchanging structured information in a decentralized, distributed environment. SOAP is a lightweight XML-based protocol used for invoking web services and exchanging structured data and type information on the Web. (Oracle)

    Web Service: Open standard (XML, SOAP, etc.) based Web applications that interact with other web applications for the purpose of exchanging data. (lucent)

    XML Schema: XML Schema is a language for describing the structure and constraining the contents of XML documents. (reactivity.com glossary)