An XML Log Standard and Tool for Digital Library Logging Analysis
Marcos André Gonçalves, Ming Luo, Rao Shen, Mir Farooq Ali, and Edward A. Fox
Virginia Tech
Outline
Motivation
Related Work
Problems with existing DL logs
The Digital Library Standardized Log Format
  DL log standard design
  DL log format structure
DL log tool and its implementation
Conclusions and future work
Motivation Log analysis
Source of information about: how patrons really use DL services; how systems behave while supporting user information-seeking activities. Examples: patterns
Used to: evaluate; enhance services; help design user interfaces; allocate resources better
Common practice in the web setting Supported by web servers, proxy caching
Motivation (cont.) DLs differ from the web
DL collections are explicitly organized, described, managed, and preserved; users have more specific tasks and needs; digital objects and collections are more structured
DL logging should offer much richer information and opportunities. Tradeoff: user privacy
Current DL logs: differences in formats and recorded information
Problems: lack of interoperability; no reuse of analysis tools; limited comparability of log analysis results
Related Work Web Servers (Common Log Format)
Focused on browsing, stateless
bbn-cache-3.cisco.com - - [22/Oct/1998:00:20:21 -0400] "GET /~harley/courses.html HTTP/1.0" 200 1734
bbn-cache-3.cisco.com - - [22/Oct/1998:00:20:22 -0400] "GET /~harley/clip_art/word_icon.gif HTTP/1.0" 200 1050
www4.e-softinc.com - - [22/Oct/1998:00:20:27 -0400] "HEAD / HTTP/1.0" 200 0
user-38ldbam.dialup.mindspring.com - - [22/Oct/1998:00:20:48 -0400] "GET /~lhuang/junior/capehatteras.html HTTP/1.0" 200 328
user-38ldbam.dialup.mindspring.com - - [22/Oct/1998:00:20:48 -0400] "GET /~lhuang/junior/PB2panforringed.mirror.gif HTTP/1.0" 200 20222
eger-dl01.agria.hu - - [22/Oct/1998:00:20:51 -0400] "GET /~tjohnson/pinouts/ HTTP/1.0" 200 26994
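The Common Log Format entries above carry only seven fields per request, which is why they say little about searches or sessions. A minimal Java sketch of reading one entry follows; the class name ClfParser and the field names in the map are illustrative assumptions, not part of the presentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits one Common Log Format line into named fields.
class ClfParser {
    // Layout: host ident authuser [timestamp] "request" status bytes
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-)$");

    /** Returns the fields in log order, or null if the line is not valid CLF. */
    static Map<String, String> parse(String line) {
        Matcher m = CLF.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("host", m.group(1));
        fields.put("ident", m.group(2));
        fields.put("authuser", m.group(3));
        fields.put("timestamp", m.group(4));
        fields.put("request", m.group(5));
        fields.put("status", m.group(6));
        fields.put("bytes", m.group(7));
        return fields;
    }

    public static void main(String[] args) {
        String line = "bbn-cache-3.cisco.com - - [22/Oct/1998:00:20:21 -0400] "
            + "\"GET /~harley/courses.html HTTP/1.0\" 200 1734";
        System.out.println(parse(line));
    }
}
```

Nothing in the parsed record identifies a session or a query, which matches the "stateless" remark on the slide.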
Related Work (cont.) DL- Greenstone
/fast-cgi-bin/niupepalibrary
(a) its-www1.massey.ac.nz
(b) [Thu Dec 07 23:47:00 NZDT 2000]
(c) (a=p, b=0, bcp=, beu=, c=niupepa, cc=, ccp=0, ccs=0, cl=, cm=, cq2=, d=, e=, er=, f=0, fc=1, gc=0, gg=text, gt=0, h=, h2=, hl=1, hp=, il=l, j=, j2=, k=1, ky=, l=en, m=50, n=, n2=, o=20, p=home, pw=, q=, q2=, r=1, s=0, sp=frameset, t=1, ua=, uan=, ug=, uma=listusers, umc=, umnpw1=, umnpw2=, umpw=, umug=, umun=, umus=, un=, us=invalid, v=0, w=w, x=0, z=130.123.128.4-950647871)
(d) "Mozilla/4.08 [en] (Win95; I ;Nav)"
Related Work (cont.) Search Engine - OpenText
Mon Sep 28 17:48:42 1998 ----- Starting Search -----
Mon Sep 28 17:48:42 1998 {Transaction Begin}
Mon Sep 28 17:48:42 1998 {RankMode Relevance1}
Mon Sep 28 17:48:42 1998 "Bacillus thuringiensis "
Mon Sep 28 17:48:42 1998 P0 = "Bacillus thuringiensis "
Mon Sep 28 17:48:42 1998 R = (*D including (*P0))
Mon Sep 28 17:48:42 1998 R = (((*R rankedby *P0)))
Mon Sep 28 17:48:42 1998 S = (subset.1.10 (*R))
Mon Sep 28 17:48:42 1998 SL0 = (region "OTSummary" within.1 (*S))
Mon Sep 28 17:48:42 1998 (*SL0 within.1 ( subset.1.1 *S ))
Mon Sep 28 17:48:42 1998 (*SL0 within.1 ( subset.2.1 *S ))
Mon Sep 28 17:48:42 1998 {Transaction End}
Related Work (cont.) Problems with existing DL logs
Incompatibility; incompleteness; complexity of analysis; lack of organization; ambiguity; inflexibility; verboseness
The Digital Library Standardized Log Format
  Comprehensive
  Reflective of the actual DL system behavior
  Easily readable
  Precise
  Flexible enough to accommodate varying systems
  Succinct enough to be implemented
  Concern: user privacy
The Digital Library Standardized Log Format- Design (cont.) Capture high level user and system behaviors
Hierarchical organization Encapsulated in transactions
Interactions between the users and the system or among the system components
Log format designed to record a number of different kinds of transactions
Examples:
1. Login to the system
2. Submission of search query
3. Browsing a result list
4. Recording of a user failure
The Digital Library Standardized Log Format- Design (cont.)
Design Reflective of DL behavior Based on the 5S formal theory
Unifying, mathematical theory to formally describe the semantics of DL components
Guidance for how to organize the log structure
The Digital Library Standardized Log Format- Design (cont.)
5S | Definition | Use in Log Design
Streams | Represent static and dynamic multimedia content | Temporal events, types of digital objects
Structures | Labeled directed graphs; provide organization within the DL | Structured documents and metadata; structured searches, collection, metadata catalog; hypertext, classification scheme
Spaces | Sets, properties and operations on those sets | Retrieval mode, presentation information
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement | Organization of the user and system actions into transactions, statements, events and actions; DL services as sets of scenarios
Societies | Sets of communities and relationships among them | User information
The Digital Library Standardized Log Format (cont.)
Specification: collection of an extensive, flat set of attributes
[Figure: attribute labels: query, event, registering, transaction, session, error, browse, action, timestamp, machine information, help, search, update, sorting rule, catalog, collection, result cutoff, response]
The Digital Library Standardized Log Format - Specification
Organization in a structured, logical way
XML and XML Schema
  Standard syntax
  Guarantees quality and correctness
  Rich set of basic types helps standardization
  Abundance of XML parsers helps construction of analysis tools
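As a hedged illustration of the point about XML parsers, the sketch below uses the JDK's built-in DOM parser to pull one field out of a log fragment. The class name LogFieldReader and its method are assumptions made for this example; the element names follow the log samples in this presentation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical analysis helper, not part of the presented tool.
class LogFieldReader {
    /** Returns the trimmed text of the first element named tag, or null if none exists. */
    static String firstText(String xml, String tag) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            // Malformed XML or parser configuration problems end up here.
            throw new IllegalArgumentException("Could not parse log fragment", e);
        }
    }

    public static void main(String[] args) {
        String fragment = "<Search><Collection>Dirline</Collection>"
            + "<QueryString>low back pain</QueryString></Search>";
        System.out.println(firstText(fragment, "QueryString")); // prints: low back pain
    }
}
```

A real analysis tool would more likely stream large logs with SAX or StAX than load them into a DOM tree; DOM is used here only because it keeps the example short.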
The Digital Library Standardized Log Format - Structure Top Level Hierarchy
Log
Log Entry
Transaction
SessionId
MachineInfo
TimeStamp
Statement
. . . . . .
The Digital Library Standardized Log Format – Structure (cont.)
Decomposition of statement into different types
AdmInfo
Statement
SessionInfo
Event
ErrorInfo
HelpInfo
RegisterInfo
Event: Action, StatusInfo
Action: Search, Browse, StoreSysInfo, Update
The Digital Library Standardized Log Format – Structure (cont.)
Decomposition of event
The Digital Library Standardized Log Format – Structure (cont.)
Search Attributes
Search
QueryString
TimeFrame
PresentationInfo
SearchBy
Format, NumberOfResults, SortBy, CutOff
Collection
Catalog
DL Log Tool and Implementation
Java classes
  XMLLogData: stores data
  XMLLogManager: methods to read and write log information according to the format
  Synchronized reads and writes: avoid conflicts and inconsistencies
Middleware for plugging the DL log tool into a target system
  Events based on the target system architecture and implementation
Implemented in the MARIAN DL system
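The slides name the classes XMLLogData and XMLLogManager but do not show their fields or signatures. The sketch below is therefore only an assumption about how synchronized reads and writes could look: the fields, the in-memory list, and the method bodies are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical manager: one lock serializes all reads and writes.
class XMLLogManager {
    private final List<String> entries = new ArrayList<>();
    private int nextTransactionId = 1;

    /** Appends one transaction and returns its ID; concurrent writers cannot interleave. */
    synchronized int writeLogEntry(XMLLogData data) {
        int id = nextTransactionId++;
        // No XML escaping here; a real implementation would need it.
        entries.add("<Transaction ID=\"" + id + "\">"
            + "<SessionId>" + data.sessionId + "</SessionId>"
            + "<Statement>" + data.statementXml + "</Statement>"
            + "</Transaction>");
        return id;
    }

    /** Returns a snapshot copy, so readers never see the list while it is being modified. */
    synchronized List<String> getLogData() {
        return new ArrayList<>(entries);
    }

    public static void main(String[] args) {
        XMLLogManager manager = new XMLLogManager();
        manager.writeLogEntry(new XMLLogData("987654usr3", "<Event/>"));
        System.out.println(manager.getLogData());
    }
}

// Hypothetical data holder; the real XMLLogData fields are not shown in the slides.
class XMLLogData {
    final String sessionId;
    final String statementXml;

    XMLLogData(String sessionId, String statementXml) {
        this.sessionId = sessionId;
        this.statementXml = statementXml;
    }
}
```

Guarding both methods with the same monitor is the simplest way to avoid lost or half-written entries; a production logger would probably write to a file and might use a queue instead of a single lock.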
DL Log Tool and Implementation (cont.): the MARIAN DL system
[Figure: MARIAN architecture. Layers and components: Database Layer; Search Layer; User Interaction Layer; Data Analysis, Collection Builders & Loading Tools; Webgate. Labeled functions: semantic networks persistent storage; generalized inverted index interfaces; DL information networks characterization, indexing and loading; tailored DL infrastructure generation; database management API; searcher community; semantic network management API; fusion modules; distributed client communication; structured logging; customization and personalization; query history; multilingual support.]
DL Log Tool and Implementation (cont.)
[Figure: logging data flow. Labels: DL patron; user event; MARIAN User Layer; system event; log middleware; XMLLogManager with writeLogEntry(parameters); storelogData(parameters); XMLLogData; getLogData(parameters); logData; analysis tool; analysis request; result; DL analyst; connections c1 and c2.]
DL Log Tool and Implementation (cont.) Example 1: Login to the system
<Transaction ID = "3452">
  <SessionId> 987654usr3 </SessionId>
  <SessionInfo>
    <SessionStart> Start </SessionStart>
    <LoginInfo>
      <UserId> mhabib </UserId>
    </LoginInfo>
  </SessionInfo>
  <TimeStamp> 2002-05-31T20:10:55.000-05:00 </TimeStamp>
  <MachineInfo>
    <IPAddress> 128.173.244.56 </IPAddress>
    <Port> 8000 </Port>
  </MachineInfo>
</Transaction>
DL Log Tool and Implementation
...
<Event>
  <Action>
    <Search>
      <Collection>Dirline</Collection>
      <ObjectType>CommunityRecord</ObjectType>
      <SearchBy>SearchByAnyParts</SearchBy>
      <SearchType>NonPersistant</SearchType>
      <QueryString>low back pain</QueryString>
      <TimeFrame>
        <StartTime>2002-05-31T20:11:07.000-05:00</StartTime>
        <EndTime>2002-05-31T20:11:09.000-05:00</EndTime>
      </TimeFrame>
      <PresentationInfo>
        <Format>List</Format>
        <SortBy>ByRank</SortBy>
        <NumberOfResults>217</NumberOfResults>
        <Cutoff>20</Cutoff>
      </PresentationInfo>
...
Example 2: query all Dirline records about “low back pain”
DL Log Tool and Implementation
<Transaction ID = "3456">
<SessionId > 987654usr3 </SessionId>
...
<Statement>
<Event>
<Action>
<Browse>
<DocID> 5114 </DocID>
<DocName>University of Washington School of
Medicine Multidisciplinary Pain Center (UWPC)
</DocName>
...
Example 3: Browse an item of the ranked list returned as an answer for the previous search
In conclusion
Analysis of current DL log formats
  Need for standardization, common practices, interoperable tools
Designed an XML-based log format standard for DL logging analysis
  Captures a rich, detailed set of system and user behaviors
Implemented the format in a log component tool
  Connected to the MARIAN DL system
Future Work
Build suite of components for evaluation
Use log format and tools to evaluate several projects
  Networked Digital Library of Theses and Dissertations (NDLTD)
  CITIDEL
  Broadening the scope of use to other NSDL projects
Extend and use log tool with other DL systems and architectures
Consider user privacy issues
Explore info for personalization
Future work
Crosswalks to other standards (e.g. CLF)
  “Not yet other standard”
More challenges
  Distributed logs
  Large settings
  Investigate compression issues to deal with XML verboseness
Promote discussions: Listserv: [email protected]
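On the compression item in the future work above: one simple way to gauge how much XML verboseness costs is to gzip a batch of entries and compare sizes. The Java sketch below uses only the JDK's java.util.zip classes; the class name LogCompression and the sample entry are assumptions for illustration, and the presentation does not say which compression approach the authors had in mind.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for experimenting with log compression.
class LogCompression {
    /** Compresses the UTF-8 bytes of the given XML text with gzip. */
    static byte[] gzip(String xml) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores the original XML text from gzip-compressed bytes. */
    static String gunzip(byte[] compressed) {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative entry; repeated tags are what makes XML logs compress well.
        String entry = "<Transaction><SessionId>987654usr3</SessionId>"
            + "<Statement><Event><Action><Browse><DocID>5114</DocID>"
            + "</Browse></Action></Event></Statement></Transaction>\n";
        String log = entry.repeat(200);
        byte[] packed = gzip(log);
        System.out.println(log.getBytes(StandardCharsets.UTF_8).length
            + " bytes raw, " + packed.length + " bytes gzipped");
    }
}
```

Whole-file gzip gives up random access to individual transactions, so it is a baseline for measurement rather than a recommendation.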