Similarity-Based Recommendation of OLAP Sessions
1 Similarity-Based Recommendation of OLAP Sessions, Julien Aligon
by Julien Aligon
Arnaud Giacometti & Patrick Marcel (supervisors)
Marie-Aude Aufaure & Olivier Teste (reviewers)
Chedy Raïssi & Stefano Rizzi (examiners)
Université François Rabelais Tours, Laboratoire d’Informatique
BDTLN Team
December 13th 2013
Contents
1. What is a proper recommendation?
2. Defining Similarities for OLAP Sessions
3. The SROS Recommender System
4. Assessing the quality of the recommender system
5. Conclusion & Perspectives
1. What is a proper recommendation ?
Recommender Systems (E-commerce)
Aleksander
Here are the movies that I rated:
Per qualche dollaro in piu (1)
Reservoir Dogs (0.7)
Requiem for a Dream (0.7)
Django Unchained (0.7)
Recommendation?
Three approaches:
• Content-based
• Collaborative Filtering
• Hybrid
Best approaches
Quality measures already proposed
Recommendations in a Database Context
Aleksander Multi-user context
Query:
• Declarative language
• Over a schema
LOG
Analysis Sessions
How to recommend queries or sessions?
• In a multi-user context
• Leveraging the schema
Collaborative Filtering approach
Using query expression to be efficient
Modelling Multidimensional Data
One cube with hierarchies, and a set of measures:
Income, PropInsr, PerWt, CostGas, CostWtr, CostElec
Query: fragment-based
• Group-by set: City, AllRaces, Year, AllOccs, Sex
• Selection set: Sex=Female, City=NewYork
• Measure set: AvgIncome, CostGas
Session: a sequence of queries (Query 1, Query 2, Query 3)
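The fragment-based model above can be written down as a small data structure. This is a minimal sketch, not the author's implementation: the class name `Query`, its field names and the `fragments` helper are assumptions made for illustration; only the fragment values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A query seen as three sets of fragments (hypothetical field names)."""
    group_by: frozenset   # one level per hierarchy
    selection: frozenset  # selection predicates
    measures: frozenset   # aggregated measures

    def fragments(self):
        """All fragments of the query, regardless of their kind."""
        return self.group_by | self.selection | self.measures


# The example query shown on the slide.
query1 = Query(
    group_by=frozenset({"City", "AllRaces", "Year", "AllOccs", "Sex"}),
    selection=frozenset({"Sex=Female", "City=NewYork"}),
    measures=frozenset({"AvgIncome", "CostGas"}),
)

# A session is simply an ordered sequence of queries.
session = [query1]
```

Keeping the three sets separate makes it possible to compare queries component by component later on.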
SnipSuggest ([Khoussainova et al, 2010]) approach
Aleksander
SELECT title, genre FROM Movies WHERE director = ‘Quentin Tarantino’ AND
LOG
FROM Movies WHERE director=‘Quentin Tarantino’ WHERE genre=‘Western’
FROM Movies WHERE director=‘Quentin Tarantino’ WHERE genre=‘Thriller’
SELECT title WHERE director=‘Quentin Tarantino’ WHERE year>1995
Conf:1/5
Conf:4/5
Conf:4/5
TOP-K
Association rules
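The idea of ranking candidate fragments by rule confidence can be illustrated as follows. This is a simplified sketch and not SnipSuggest's actual algorithm: the function `suggest_fragments` and the toy log are assumptions, built so that the confidences match the 4/5 and 1/5 values of the slide.

```python
from collections import Counter


def suggest_fragments(partial, log, k=3):
    """Rank fragments f by the confidence of the rule partial -> f.

    confidence = (#log queries containing partial and f)
                 / (#log queries containing partial)
    """
    partial = frozenset(partial)
    supporting = [q for q in log if partial <= q]
    if not supporting:
        return []
    counts = Counter(f for q in supporting for f in q - partial)
    # Highest confidence first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(frag, n / len(supporting)) for frag, n in ranked[:k]]


# Toy log (made up for illustration), each query as a set of fragments.
base = {"FROM Movies", "WHERE director='Quentin Tarantino'"}
log = [
    frozenset(base | {"WHERE genre='Thriller'", "WHERE year>1995"}),
    frozenset(base | {"WHERE genre='Thriller'", "WHERE year>1995"}),
    frozenset(base | {"WHERE genre='Thriller'", "WHERE year>1995"}),
    frozenset(base | {"WHERE genre='Thriller'"}),
    frozenset(base | {"WHERE genre='Western'", "WHERE year>1995"}),
]
top = suggest_fragments(base, log, k=3)
```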
PROMISE [Sapia, 2000] & [Aufaure et al., 2013] approaches
[Figure: LOG, clusters of queries (c1), Aleksander, current query, query]
[Giacometti et al., 2009] & [Negre, 2009] approaches
Generic Framework:
• Session 1 vs. Session 2: Edit Distance
• Query 1 vs. Query 2: Hausdorff Distance
• Position in a cube vs. position in a cube: Distance in Hierarchy
Current Session
Log
Query Recommendation in Databases & Data Warehouses
Reference | Collaborative? | Query Model | Output | Technique
SnipSuggest [Khoussainova et al, 2010] | Yes | Query Expression | Set of query fragments | Stochastic
[Sapia, 2000] & [Aufaure et al., 2013] | Yes | Query Expression | A query | Similarity-based, Stochastic
[Giacometti et al., 2009], [Negre, 2009] | Yes | Query Answer | A query | Similarity-based
QueRIE [Eirinaki et al., 2013] | Yes | Query Expression | Set of queries | Similarity-based
[Giacometti et al., 2011] | Yes | Query Answer | Set of queries | Stochastic
[Jerbi et al., 2009] | No | Query Answer | Query fragment | Preference-based
Icube [Sarawagi and Sathe, 2000] | No | Query Answer | Set of tuples | Stochastic
SROS | Yes | Query Expression | Sequence of queries | Similarity-based
Query Recommendation in Databases & Data Warehouses
A Desirable Recommendation:
• How to propose an informative recommendation to the user?
  Proposal: Recommend a sequence of queries
• How to find log sessions close to the current session?
  Proposal: Define a two-level similarity measure between sessions
• How to be consistent with the context of the current session?
  Proposal: Adapt the recommendation to the context of the current session
• How to provide a relevant recommendation to the user?
  Proposal: Define quality criteria assessing the recommendation
2. Defining Similarities for OLAP Sessions
Requirements for Similarity Measures between OLAP Sessions
Intuitively:
• The order of queries is relevant
• Recent queries are more relevant than older queries
How to define a similarity measure for sequences?
Classically: Dice, Edit Distance, soft-TFIDF, Subsequence Alignment
• How to include the comparison between the sequence elements?
A two-level approach based on similarities between elements
Classically: Cosine similarity, Hamming Distance, Hausdorff Distance
No proposal for a measure using query expression!
Similarity Measure between Queries
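Query 1
• Group-by set: City, AllRaces, Year, AllOccs, Sex
• Selection set: Sex=Female, City=New-York
• Measure set: AvgIncome, CostGas
Query 2
• Group-by set: City, AllRaces, AllYear, AllOccs, Sex
• Selection set: Sex=Female, Region=West
• Measure set: AvgIncome, MaxCostElec
[Aligon et al., KAIS, 2013]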
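One simple way to compare two such queries is a weighted combination of per-component similarities. The sketch below is a simplification made for illustration and should not be read as the measure of [Aligon et al., KAIS, 2013]: it uses a plain Jaccard coefficient on each fragment set and ignores hierarchy distances, and the names `jaccard`, `query_similarity` and the equal weights are assumptions.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def query_similarity(q1, q2, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the similarities of the three fragment sets."""
    parts = ("group_by", "selection", "measures")
    return sum(w * jaccard(q1[p], q2[p]) for w, p in zip(weights, parts))


# The two queries of the slide, as dictionaries of fragment sets.
query1 = {
    "group_by": {"City", "AllRaces", "Year", "AllOccs", "Sex"},
    "selection": {"Sex=Female", "City=New-York"},
    "measures": {"AvgIncome", "CostGas"},
}
query2 = {
    "group_by": {"City", "AllRaces", "AllYear", "AllOccs", "Sex"},
    "selection": {"Sex=Female", "Region=West"},
    "measures": {"AvgIncome", "MaxCostElec"},
}
similarity = query_similarity(query1, query2)
```

With equal weights the three component scores are 4/6, 1/3 and 1/3, so the overall similarity is 4/9. A structure-aware measure would also credit Year and AllYear for belonging to the same hierarchy, which plain Jaccard cannot do.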
Similarity Measure between Sessions
Different classical approaches extended to OLAP context:
• Dice Coefficient
[Aligon et al., KAIS, 2013]
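A possible reading of the Dice extension is sketched below: instead of requiring identical queries, two queries are considered to match when their similarity reaches a threshold. The function name `soft_dice`, the threshold value and the matching rule are assumptions, not the published definition.

```python
def soft_dice(s1, s2, sim, threshold=0.7):
    """Dice-like coefficient where queries match if sim >= threshold."""
    if not s1 and not s2:
        return 1.0
    # Queries of each session that have at least one close query in the other.
    m1 = sum(1 for q in s1 if any(sim(q, r) >= threshold for r in s2))
    m2 = sum(1 for r in s2 if any(sim(q, r) >= threshold for q in s1))
    return (m1 + m2) / (len(s1) + len(s2))


def exact(a, b):
    """Degenerate query similarity: 1.0 for identical queries, else 0.0."""
    return 1.0 if a == b else 0.0


score = soft_dice(["a", "b", "c"], ["b", "c", "d", "e"], exact)
```

With the exact-match similarity and no repeated queries, this reduces to the classical Dice coefficient 2|A∩B| / (|A| + |B|). Note that the order of the queries plays no role here.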
Similarity Measure between Sessions
Different classical approaches extended to OLAP context:
• soft-TF-IDF
[Aligon et al., KAIS, 2013]
[Figure: TF-IDF scores computed over the log]
Similarity Measure between Sessions
Different classical approaches extended to OLAP context:
• Levenshtein Distance
[Aligon et al., KAIS, 2013]
Based on a cost matrix that computes, between two sessions, the minimal number of:
• Insertions (I)
• Deletions (D)
• Substitutions (S)
[Figure: two sessions aligned, with steps labelled (I), (S), (D) and matching]
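The cost matrix can be sketched with the classical dynamic program, applied here to sequences of queries rather than to characters. This is the textbook algorithm with unit costs and exact query equality; how the OLAP extension weighs substitutions between similar queries is not shown on the slide and is not modelled here.

```python
def edit_distance(s1, s2):
    """Minimal number of insertions, deletions and substitutions."""
    n, m = len(s1), len(s2)
    # cost[i][j] = distance between the first i items of s1 and first j of s2
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i  # i deletions
    for j in range(m + 1):
        cost[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if s1[i - 1] == s2[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j] + 1,                 # deletion (D)
                cost[i][j - 1] + 1,                 # insertion (I)
                cost[i - 1][j - 1] + substitution,  # substitution (S) or match
            )
    return cost[n][m]


distance = edit_distance(["q1", "q2", "q3"], ["q1", "q3"])
```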
Similarity Measure between Sessions
Different classical approaches extended to OLAP context:
• SubSequence Alignment
[Aligon et al., KAIS, 2013]
Finds the optimal local alignment; the result is a trade-off between the cost of gaps and the cost of mismatches
Perfect alignment -> Maximal similarity
Good alignment -> Good similarity
GAP
Very bad alignment -> Very low similarity
No alignment -> Minimal similarity
Extension of Subsequence Alignment in the OLAP context
• Time-discounting function: matches on recent queries weigh more than matches on older queries
• Gap penalty: variable score ensuring few gaps and long alignments
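A Smith-Waterman style local alignment with these two ingredients could look like the sketch below. It is an illustration under stated assumptions, not the published measure: the match score is the query similarity minus a threshold, the gap penalty is kept constant for brevity (the slide calls for a variable one), and the hyperbolic time discount controlled by `decay` is a hypothetical choice.

```python
def local_alignment(s1, s2, sim, threshold=0.5, gap=0.3, decay=0.0):
    """Best local alignment score between two sessions.

    Pairs of queries score sim - threshold (positive for close queries,
    negative otherwise), discounted when they lie far from the session ends.
    """
    n, m = len(s1), len(s2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed discount: 1.0 for the last queries, smaller for older ones.
            weight = 1.0 / (1.0 + decay * ((n - i) + (m - j)))
            match = (sim(s1[i - 1], s2[j - 1]) - threshold) * weight
            score[i][j] = max(
                0.0,                          # start a new local alignment
                score[i - 1][j - 1] + match,  # align the two queries
                score[i - 1][j] - gap,        # gap in s2
                score[i][j - 1] - gap,        # gap in s1
            )
            best = max(best, score[i][j])
    return best


def exact(a, b):
    """Degenerate query similarity: 1.0 for identical queries, else 0.0."""
    return 1.0 if a == b else 0.0


late = local_alignment("xyab", "pqab", exact, decay=1.0)
early = local_alignment("abxy", "abpq", exact, decay=1.0)
```

With a positive `decay`, two sessions that agree on their last queries (`late`) score higher than two sessions that agree only on their first queries (`early`), which reflects the requirement that recent queries matter more.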
Assessed with Subjective Tests
[eBISS 2011 Summer School]
Questionnaire opinions on:
• Query similarities
• Session similarities
• 41 students & researchers from the eBISS 2011 summer school
• They gave their similarity degrees (Low, Fair, Good, High) between a current query and different queries
• They gave their similarity degrees (Low, Fair, Good, High) between a current session and different sessions
Assessed with Subjective Tests
Test | % User Agreement | Edit-Distance | Dice | TF-IDF | SubSequence Alignment
Test 1 | 51% | 51% | 29% | 51% | 51%
Test 2 | 43% | 33% | 9% | 39% | 43%
Test 3 | 51% | 41% | 4% | 51% | 51%
Test 4 | 36% | 19% | 26% | 35% | 35%
Test 5 | 38% | 33% | 13% | 33% | 33%
Assessed with Objective Tests
Log | Edit Distance | Dice | TF-IDF | SubSequence Alignment
- | 1.79 | 1.57 | 1.51 | 5.23
- | 1.46 | 1.52 | 1.31 | 3.21
- | 1.39 | 1.16 | 1.39 | 2.32
- | 1.44 | 1.23 | 1.32 | 2.15
- | 1.08 | 1.57 | 1.42 | 0.78
Average | 1.40 | 1.35 | 1.35 | 2.51
3. SROS System
Composed of three phases:
• Selection: Select a set of possible futures
• Ranking: Rank the futures based on frequent similarities
• Tailoring: Adapt the best future to the current session
SROS System
[Figure: phase 1, Selection. Aleksander's current session is matched against the log; each former close session (log session) provides a future.]
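The Selection phase can be sketched as follows. This is a simplified illustration, not the SROS implementation: it assumes that a log session is close when at least `min_matches` of its queries match a query of the current session, and that its future is the part following the last matched query. The function name, the `same` predicate and `min_matches` are hypothetical.

```python
def select_futures(current, log, same, min_matches=1):
    """Return the futures of the log sessions close to the current session."""
    futures = []
    for session in log:
        last = -1     # position of the last query matched to the current session
        matches = 0
        for i, query in enumerate(session):
            if any(same(query, c) for c in current):
                last = i
                matches += 1
        # Keep the session only if it is close and something follows the match.
        if matches >= min_matches and last + 1 < len(session):
            futures.append(session[last + 1:])
    return futures


current = ["q1", "q2"]
log = [
    ["q1", "q2", "q3", "q4"],  # close, future is q3, q4
    ["q9", "q8"],              # not close
    ["q2", "q5"],              # close, future is q5
    ["q7", "q1"],              # close, but nothing follows the match
]
futures = select_futures(current, log, same=lambda a, b: a == b)
```

In practice `same` would be replaced by a thresholded query similarity, and closeness would be decided by the session similarity rather than by counting matches.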
[Figure: phase 2, Ranking. The futures taken from the former close sessions are ranked (1, 2, 3, 4).]
[Figure: phase 3, Tailoring. Association rules adapt the best-ranked future to the current session, giving the adapted recommendation.
Rules of Type 1, example: Year=2005 -> Year=2002. Current session: Year=2002; AvgIncome. Log session: Year=2005; AllCities, Region, State, City; AvgCostGas, AvgCostWtr, AvgIncome.
Rules of Type 2, example: Year=2002 -> AvgIncome. Recommendation: Year=2002; State, City; AvgIncome.]
4. Assessing the quality of the recommender system
Quality Measures
Different quality criteria:
• Novelty
  [Figure: the recommendation compared with the LOG]
• Adaptation
  [Figure: a current-session query and a recommended query; one has AvgCostElec; AllCities, Year, AllOccs, Race, AllSexes; Year=2002, the other has AvgIncome; AllCities, Year, AllOccs, Race, Sex; Year=2005]
• Accuracy & Coverage
  [Figure: the recommendations compared with the expected recommendations]
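Accuracy and coverage can be illustrated with the usual precision, recall and F-measure, computed between the recommended queries and the expected ones. This is a sketch under assumptions, not the measures defined in the thesis: queries are compared by exact equality, and coverage is taken to be the share of current sessions for which a non-empty recommendation is produced.

```python
def accuracy(recommended, expected):
    """Precision, recall and F-measure of a recommendation."""
    rec, exp = set(recommended), set(expected)
    hits = len(rec & exp)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(exp) if exp else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure


def coverage(recommendations):
    """Share of current sessions that received a non-empty recommendation."""
    if not recommendations:
        return 0.0
    return sum(1 for r in recommendations if r) / len(recommendations)


precision, recall, f_measure = accuracy(["a", "b", "c"], ["b", "c", "d", "e"])
covered = coverage([["q1"], [], ["q2", "q3"], []])
```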
Quality Measures
An effective recommender system finds a trade-off between:
• Novelty: the recommendation provides new, informative fragments from the log
• Adaptation: the recommendation stays consistent with the current session
Behavior Generation
[Figure: two session templates.
Explorative template: IQ, SQ and random queries (RQ); labels "Shortest OLAP Path" and "One random operation".
Goal-oriented template: IQ, FQ and random queries (RQ).]
Legend: IQ: Initial Query; FQ: Final Query; SQ: Surprising Query; RQ: Random Query
Synthetic log:
• including Explorative and Goal-Oriented templates, randomly chosen
• composed of 200 sessions (a total of 2950 queries)
Real Logs
Questionnaires performed by Master’s students:
• 40 students from the University of Bologna and the University of Tours
• They devised OLAP sessions answering questions of different complexities
• Filtering is essential (145 sessions after filtering)
Effectiveness Test Principle
[Figure: Log; Current session; Expected Recommendation]
• Conducted with the following logs:
  • The synthetic log
  • The logs devised by the students
• N-fold cross validation
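The test principle can be sketched as a cross-validation loop in which each held-out session is cut in two. This is an illustration under assumptions: the helper names are hypothetical, folds are built by simple striding, and the cut position that separates the current session from the expected recommendation is left to the caller.

```python
def n_fold_splits(sessions, n):
    """Yield (training log, test sessions) pairs for N-fold cross validation."""
    folds = [sessions[i::n] for i in range(n)]
    for i in range(n):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


def split_session(session, cut):
    """Cut a test session into (current session, expected recommendation)."""
    return session[:cut], session[cut:]


# Toy log of 6 sessions with 4 queries each (identifiers made up).
sessions = [[f"q{i}{j}" for j in range(4)] for i in range(6)]
splits = list(n_fold_splits(sessions, 3))
current, expected = split_session(sessions[0], 2)
```

For each fold, the recommender would be run on every `current` prefix with `train` as its log, and its output compared with the corresponding `expected` suffix.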
Accuracy Result
[Figure: accuracy on the synthetic log]
[Figure: accuracy on the student log]
Novelty Result
[Figure: novelty measure]
Adaptation Result
[Figure: adaptation measure]
5. Conclusion & Perspectives
Conclusion
• Requirements for recommendation and similarity measures
• Definitions of query and session similarities for OLAP
  • Assessed with subjective and objective tests
  • Query similarity based on the structure
  • Session similarity extending Subsequence Alignment
• Proposal of a similarity-based recommender system of OLAP sessions based on three phases:
  • Selection
  • Ranking
  • Tailoring
• The recommender system is assessed in terms of effectiveness:
  • Quality measure proposals
  • The recommendations:
    • are well adapted to the context of the current session
    • preserve the logic of the log session to provide new information to the user
    • are very accurate, for very different contexts of log density
A tool for Session Design Assistance
How to remedy the cold-start problem?
• Solution: exploring the former sessions to initiate the first queries of a current session
• Problem: how to navigate between the sessions?
  Proposal: organizing sessions in a hierarchical structure and defining browsing operators
• Problem: how to represent groups of sessions?
  Proposal: using summarization techniques to reduce the number of queries and sessions, but also to design a representative session ([Aligon & Marcel, EDA’2012], [Aligon et al., PersDB’2012])
A benchmark of OLAP sessions
Development of a platform to assess the quality of an analytical session over a cube:
• Making it possible to measure the effectiveness of user-centric approaches
• Finding a more precise definition of an OLAP session
Adaptation of the Recommender System in other contexts
• Adaptation to the Data-Mining context, where sessions can be considered as sequences of complex tasks
  • This requires adapting the session similarity
• On the Web, sequences can be analysis sequences over social networks:
  • Taking into account the relationships between users
  • Considering a similarity measure between users to define user profiles