Posted on 20-Jan-2015
Characterizing Machine Agent Behavior through SPARQL Query Mining
Aravindan Raghuveer
Yahoo! Inc., Bangalore
aravindr@yahoo-inc.com
Yahoo! Confidential
Introduction: LOD Users
The LOD cloud has two types of users:
- Humans (browsers).
- Programs / machine agents.
Introduction: LOD Access Methods
The data on the LOD cloud can be accessed in multiple ways.
For this work, we categorize them into two buckets:
- SPARQL: a powerful declarative graph query language.
- Non-SPARQL: direct linked data requests.
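A minimal sketch of how log records might be split into the two buckets, assuming each record's request URL is available and that SPARQL queries arrive as HTTP GET requests carrying a `query` parameter (as in the W3C SPARQL 1.1 Protocol). The helper names and sample URLs are hypothetical, not taken from the paper.

```python
from urllib.parse import parse_qs, urlsplit


def is_sparql_request(request_url: str) -> bool:
    """True if the request carries a SPARQL query in its `query` parameter.

    Assumption: every other request is a direct (non-SPARQL) linked data request.
    """
    params = parse_qs(urlsplit(request_url).query)
    return "query" in params


def sparql_share(request_urls) -> float:
    """Fraction of logged requests that are SPARQL queries."""
    urls = list(request_urls)
    if not urls:
        return 0.0
    return sum(is_sparql_request(u) for u in urls) / len(urls)


# Hypothetical log excerpt: two SPARQL queries, two direct resource lookups.
log = [
    "/sparql?query=SELECT+*+WHERE+%7B%3Fs+%3Fp+%3Fo%7D+LIMIT+10",
    "/resource/Bangalore",
    "/page/Bangalore",
    "/sparql?query=ASK+%7B%3Fs+%3Fp+%3Fo%7D&format=json",
]
print(sparql_share(log))  # 0.5
```

Applied to a full server log, a share computed this way corresponds to a "% SPARQL" style column; the real USEWOD logs may also contain POST requests, which this sketch ignores.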
Motivation: User Behavior Understanding
A deep understanding of client behavior can help build “better” serving systems.
Better:
- Secure
- Scalable
- Available
Prior Work:
- Moller et al., WebSci 2010
- Picalausa et al., SWIM 2011
- Kirchberg et al., USEWOD 2011
- Mario et al., USEWOD 2011
Summarizing...
                Human Users    Machine Agents
Non-SPARQL
SPARQL                         This paper’s focus
What is this paper about?
Mining of the USEWOD query log dataset to identify:
- Two Trends in Machine Agent Querying
- Two Patterns in Machine Agent Querying
The USEWOD dataset
Query logs of servers hosting part of the LOD cloud data.
Dataset   Type                    # records (million)   % SPARQL
bio2rdf   Life sciences           ~0.2                  100%
lgd       Geo                     ~1.9                  100%
SWDF      Conference              ~16.7                 43.38%
dbpedia   Structured Wikipedia    ~36.2                 46.9%
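Multiplying each record count by its SPARQL share gives a rough number of SPARQL records per dataset. This is only back-of-the-envelope arithmetic on the table values; the dbpedia figure of roughly 17 million is in line with the 17 million queries mentioned under Trend-2, though the slides do not spell out that connection.

```python
# (records in millions, SPARQL share) per dataset, copied from the table above.
USEWOD = {
    "bio2rdf": (0.2, 1.00),
    "lgd": (1.9, 1.00),
    "SWDF": (16.7, 0.4338),
    "dbpedia": (36.2, 0.469),
}


def approx_sparql_records(table):
    """Approximate SPARQL records (millions) = records x SPARQL share, rounded to 0.1."""
    return {name: round(records_m * share, 1) for name, (records_m, share) in table.items()}


print(approx_sparql_records(USEWOD))
# {'bio2rdf': 0.2, 'lgd': 1.9, 'SWDF': 7.2, 'dbpedia': 17.0}
```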
Part-1: Two Trends in Machine Agent Querying
The Theme
“What are the overarching trends for SPARQL queries?”
Trend-1: SPARQL is here to stay!
[Figure: SPARQL query volume, with panels for SWDF and dbpedia]
Take-away: SPARQL query volume is significant (0.1–1 million).
Trend-2: SPARQL is heavily used by machine agents.
Took the user agents from 17 million SPARQL queries on dbpedia and...
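The rest of this slide is cut off in the transcript. As an illustration only, here is one way user-agent strings could be bucketed into machine agents and browsers. The keyword lists are a hypothetical heuristic, not the paper's classification; a real study would need a curated user-agent list.

```python
from collections import Counter

# Hypothetical keyword heuristics; neither list is exhaustive or authoritative.
MACHINE_MARKERS = ("java", "python", "curl", "wget", "libwww", "bot",
                   "crawler", "spider", "jena", "rdflib")
BROWSER_MARKERS = ("firefox", "chrome", "safari", "msie", "opera")


def classify_user_agent(ua: str) -> str:
    """Return 'machine', 'browser', or 'unknown' for a user-agent string."""
    ua_lower = (ua or "").lower()
    if not ua_lower or ua_lower == "-":
        return "unknown"  # empty or missing user-agent field
    # Check machine markers first: crawlers often also claim to be "Mozilla".
    if any(m in ua_lower for m in MACHINE_MARKERS):
        return "machine"
    if any(m in ua_lower for m in BROWSER_MARKERS):
        return "browser"
    return "unknown"


def tally(user_agents) -> Counter:
    """Count user agents per bucket."""
    return Counter(classify_user_agent(ua) for ua in user_agents)


sample = [
    "Java/1.6.0_26",
    "Mozilla/5.0 (Windows NT 6.1; rv:34.0) Gecko/20100101 Firefox/34.0",
    "curl/7.35.0",
    "-",
]
counts = tally(sample)
print(counts["machine"], counts["browser"], counts["unknown"])  # 2 1 1
```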
Part-2: Two Patterns in Machine Agent Querying
The Theme
“Looking at SPARQL query logs, can we reason about the program that generated the queries?”
Salient Aspects of the Proposed Query Mining Techniques
- Move from per-query analysis to query-session analysis.
- Move from query analysis to query-result analysis.
Pattern-1: Loops in Programs
Take-away
• Through per-user, temporal mining of the logs, we discover patterns that are caused by loops in programs.
• Significant support in all 4 datasets.
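Per-user Temporal Mining
[Figure: Original Logs, with queries from User-1 to User-4 interleaved over TIME; User-level Session Analysis, with the same queries regrouped per user so that a Loop becomes visible]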
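The per-user, temporal regrouping in the figure can be sketched as follows. This assumes each log record has already been reduced to a user key, a timestamp, and a query string. The `LogRecord` type, the user key, and the 30-minute idle cut-off used to split sessions are all hypothetical choices, not the paper's exact definitions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class LogRecord(NamedTuple):
    user: str         # hypothetical user key, e.g. client IP + user agent
    timestamp: float  # seconds since the epoch
    query: str


def per_user_sessions(
    records: Iterable[LogRecord], max_gap: float = 1800.0
) -> Dict[str, List[List[str]]]:
    """Regroup an interleaved log into per-user, time-ordered sessions.

    A new session starts whenever a user is idle for more than `max_gap`
    seconds (an assumed 30-minute cut-off).
    """
    by_user: Dict[str, List[LogRecord]] = defaultdict(list)
    for rec in records:
        by_user[rec.user].append(rec)

    sessions: Dict[str, List[List[str]]] = {}
    for user, recs in by_user.items():
        recs.sort(key=lambda r: r.timestamp)
        user_sessions = [[recs[0].query]]
        for prev, cur in zip(recs, recs[1:]):
            if cur.timestamp - prev.timestamp > max_gap:
                user_sessions.append([])  # idle too long: open a new session
            user_sessions[-1].append(cur.query)
        sessions[user] = user_sessions
    return sessions


log = [
    LogRecord("user-1", 0.0, "q1"),
    LogRecord("user-2", 1.0, "q2"),
    LogRecord("user-1", 2.0, "q3"),
    LogRecord("user-1", 5000.0, "q4"),
]
print(per_user_sessions(log))
# {'user-1': [['q1', 'q3'], ['q4']], 'user-2': [['q2']]}
```

Loops are then searched for within each user's sessions rather than in the interleaved log, where queries from other users would break up the repetition.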
Intra Pattern Loop
Successive queries from the same user use the same “template”.
Example: Two successive queries:
SELECT * WHERE { <http://bio2rdf.org/dr:D00332> <http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }
SELECT * WHERE { <http://bio2rdf.org/dr:D00333> <http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }
Only the subject (D00332 vs. D00333) varies.
Detecting Intra Pattern Loop
We convert a query to its canonical form by replacing variables, URIs, and literals with “keywords”.
SELECT * WHERE { <http://bio2rdf.org/dr:D00332> <http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }
Canonical Form of the previous queries: SELECT * WHERE { _URI_ _URI_ _URI_ }
Queries generated by the same template will have the same canonical form.
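A minimal regex-based sketch of canonicalization. Only `_URI_` appears in the slides; the `_VAR_`, `_LIT_`, and `_NUM_` keywords and the regex rules are assumptions made for illustration. A real implementation would more likely walk a parsed SPARQL query than rely on regexes.

```python
import re

# Order matters: literals and full IRIs are replaced first so that their
# contents (colons, digits, '?') are not matched by the later rules.
_RULES = [
    (re.compile(r'"(?:[^"\\]|\\.)*"'), "_LIT_"),          # double-quoted literals
    (re.compile(r"'(?:[^'\\]|\\.)*'"), "_LIT_"),          # single-quoted literals
    (re.compile(r"<[^<>\s]*>"), "_URI_"),                 # full IRIs
    (re.compile(r"\b[A-Za-z][\w-]*:[\w-]*"), "_URI_"),    # prefixed names, e.g. dr:D00332
    (re.compile(r"[?$][A-Za-z_]\w*"), "_VAR_"),           # variables
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "_NUM_"),          # numbers, e.g. LIMIT/OFFSET values
]


def canonical_form(query: str) -> str:
    """Replace concrete terms with keywords and normalize whitespace."""
    for pattern, keyword in _RULES:
        query = pattern.sub(keyword, query)
    return " ".join(query.split())


q1 = ("SELECT * WHERE { <http://bio2rdf.org/dr:D00332> "
      "<http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }")
q2 = ("SELECT * WHERE { <http://bio2rdf.org/dr:D00333> "
      "<http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }")
print(canonical_form(q1))  # SELECT * WHERE { _URI_ _URI_ _URI_ }
print(canonical_form(q1) == canonical_form(q2))  # True
```

Because numbers are also replaced, queries that differ only in their LIMIT/OFFSET values map to the same canonical form.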
Salient Aspects of Intra Pattern Loops
- Iteration over a dictionary of (categorical) values.
- Iteration over a numerical range (for example, the LIMIT and OFFSET parameters of SPARQL queries).
- Multiple levels of nested loops with the same intra-pattern loop.
The paper defines 4 parameters to quantify the above.
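Given a user's session as a sequence of canonical forms, intra-pattern loops show up as runs of identical templates. The sketch below finds such runs and, for the numerical-range case, checks whether OFFSET values advance by a constant step. The function names, the minimum run length, and the constant-step test are illustrative assumptions; the paper's 4 quantifying parameters are not reproduced here.

```python
import re
from itertools import groupby
from typing import Hashable, List, Sequence, Tuple


def intra_pattern_loops(
    templates: Sequence[Hashable], min_len: int = 2
) -> List[Tuple[Hashable, int, int]]:
    """Return (template, start_index, run_length) for each maximal run of
    consecutive queries that share the same template (canonical form)."""
    loops = []
    pos = 0
    for template, group in groupby(templates):
        n = sum(1 for _ in group)
        if n >= min_len:
            loops.append((template, pos, n))
        pos += n
    return loops


_OFFSET = re.compile(r"\bOFFSET\s+(\d+)", re.IGNORECASE)


def iterates_numeric_range(queries: Sequence[str]) -> bool:
    """True if every query has an OFFSET and the values grow by a constant positive step."""
    offsets = []
    for q in queries:
        m = _OFFSET.search(q)
        if m is None:
            return False
        offsets.append(int(m.group(1)))
    steps = {b - a for a, b in zip(offsets, offsets[1:])}
    return len(steps) == 1 and steps.pop() > 0


print(intra_pattern_loops(["A", "A", "A", "B", "C", "C"]))
# [('A', 0, 3), ('C', 4, 2)]
paged = [f"SELECT ?s WHERE {{ ?s ?p ?o }} LIMIT 100 OFFSET {k}" for k in (0, 100, 200)]
print(iterates_numeric_range(paged))  # True
```

A run whose varying values are not numeric (such as the D00332, D00333 subjects) would correspond to iteration over a dictionary of values.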
Inter Pattern Loops
We also found loops that iterate over a set of patterns:
P1, P2, P3, P1, P2, P3, P1, P2, P3
These loops are typically used when the output of the first query is passed as a parameter to the second query.
(Examples are in the paper.)
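An inter-pattern loop such as P1, P2, P3, P1, P2, P3, ... can be found by looking for the shortest cycle of two or more templates that reproduces the sequence. This sketch assumes the session has already been reduced to template labels and checks the whole sequence at once. Locating cycles inside a longer, noisier session would require scanning windows, which is omitted. The function name and the `min_repeats` parameter are hypothetical.

```python
from typing import Hashable, List, Optional, Sequence


def inter_pattern_cycle(
    templates: Sequence[Hashable], min_repeats: int = 2
) -> Optional[List[Hashable]]:
    """Return the shortest cycle of at least 2 distinct templates that,
    repeated at least `min_repeats` times, reproduces the whole sequence
    (a trailing partial cycle is allowed). Return None if there is none."""
    n = len(templates)
    for period in range(2, n // min_repeats + 1):
        cycle = list(templates[:period])
        if len(set(cycle)) < 2:
            continue  # a single repeated template is an intra-pattern loop
        if all(templates[i] == templates[i - period] for i in range(period, n)):
            return cycle
    return None


print(inter_pattern_cycle(["P1", "P2", "P3"] * 3))  # ['P1', 'P2', 'P3']
```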
Results
[Figure: support for loops in the bio2rdf, lgd, SWDF, and dbpedia datasets; panel values 86%, 32%, 40%, and 16%]
Take-away: Significant support for loops!
Pattern-2: Querying for dbpedia Linkage
Take-away:
• By executing each query and analyzing its results, we find that a portion of queries “look” for dbpedia links.
• Results:
- 20 months of SWDF queries: on average, 8% look for dbpedia URLs.
- 2 days' worth of lgd queries: 26.5% look for dbpedia URLs.
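A sketch of the result-analysis step, assuming each query has already been re-executed and its result stored in the W3C SPARQL 1.1 Query Results JSON format. Counting a query as “looking” for dbpedia whenever any URI in its result points into dbpedia.org is a simplification for illustration. It is not necessarily the paper's exact criterion, which the summary ties to owl:sameAs links.

```python
from typing import Iterable
from urllib.parse import urlsplit


def links_to_dbpedia(result_json: dict) -> bool:
    """True if any URI binding in a SPARQL JSON result points into dbpedia.org."""
    # ASK results carry "boolean" instead of "results"; .get() handles both.
    for binding in result_json.get("results", {}).get("bindings", []):
        for term in binding.values():
            if term.get("type") == "uri":
                host = urlsplit(term.get("value", "")).hostname or ""
                if host == "dbpedia.org" or host.endswith(".dbpedia.org"):
                    return True
    return False


def dbpedia_lookup_share(results: Iterable[dict]) -> float:
    """Fraction of query results that contain at least one dbpedia URI."""
    results = list(results)
    if not results:
        return 0.0
    return sum(links_to_dbpedia(r) for r in results) / len(results)


with_link = {
    "head": {"vars": ["same"]},
    "results": {"bindings": [
        {"same": {"type": "uri", "value": "http://dbpedia.org/resource/Bangalore"}}
    ]},
}
without_link = {
    "head": {"vars": ["name"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "value": "Bangalore"}}
    ]},
}
print(dbpedia_lookup_share([with_link, without_link]))  # 0.5
```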
Summary & Conclusions
We proposed 2 new ways of SPARQL query mining:
- Session view.
- Analyzing results in addition to the query.
We showed that machine agents look for dbpedia links using the owl:sameAs annotation.
Influence on system design:
- Can we pre-fetch the elements of a loop beforehand?
- Prioritize dbpedia attributes for caching.
Influence on log collection & analysis:
- Stratified random sampling to remove the effect of loops.
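One way stratified random sampling might damp the effect of loops is to treat each (user, template) pair as a stratum and draw at most a few queries from each. The record layout, stratum definition, and `per_stratum` parameter below are assumptions for illustration, not a procedure taken from the paper.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Record = Tuple[Hashable, Hashable, str]  # assumed layout: (user, template, query)


def stratified_sample(
    records: Sequence[Record], per_stratum: int = 1, seed: int = 0
) -> List[Record]:
    """Draw up to `per_stratum` records at random from every (user, template)
    stratum, so that a long loop contributes at most `per_stratum` records."""
    strata = defaultdict(list)
    for rec in records:
        strata[(rec[0], rec[1])].append(rec)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample: List[Record] = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# A 100-query loop from user-1 and a single query from user-2.
records = [("user-1", "T1", f"q{i}") for i in range(100)] + [("user-2", "T2", "q-other")]
print(len(stratified_sample(records)))  # 2
```

Without stratification, a uniform sample of this log would be dominated by the single looping program.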
Thank you:
- For the great data!
- For the great feedback & comments!
- For listening!
The famous LOD Cloud...
7 billion triples and counting!!