
Similarity-Based Recommendation of OLAP Sessions

by Julien Aligon

Arnaud Giacometti & Patrick Marcel (supervisors)

Marie-Aude Aufaure & Olivier Teste (reviewers)
Chedy Raïssi & Stefano Rizzi (examiners)

Université François Rabelais Tours
Laboratoire d’Informatique

BDTLN Team

December 13th, 2013


Contents

1. What is a proper recommendation?

2. Defining Similarities for OLAP Sessions

3. The SROS Recommender System

4. Assessing the quality of the recommender system

5. Conclusion & Perspectives

1. What is a proper recommendation?


Recommender Systems (E-commerce)

Aleksander

Here are the movies that I rated:

Per qualche dollaro in piu (1)

Reservoir Dogs (0.7)

Requiem for a Dream (0.7)

Django Unchained (0.7)

Recommendation ?

Three approaches :

• Content-based

• Collaborative Filtering

• Hybrid

Best approaches

Quality measures already proposed


Recommendations in a Database Context

[Figure: Aleksander in a multi-user context; queries, log, analysis sessions]

Query:
• Declarative language
• Over a schema

How to recommend queries or sessions?
• In a multi-user context
• Leveraging the schema

Collaborative filtering approach, using the query expression to be efficient


Modelling Multidimensional Data

One cube with hierarchies, and a set of measures:

Measures: Income, PropInsr, PerWt, CostGas, CostWtr, CostElec

Query: fragment-based
• Group-by set: City, AllRaces, Year, AllOccs, Sex
• Selection set: Sex=Female, City=NewYork
• Measure set: AvgIncome, CostGas

Session: a sequence of queries (Query 1, Query 2, Query 3)
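The fragment-based model above can be sketched in a few lines of Python. This is only an illustration: the class name `OlapQuery`, its field names and the `Session` alias are assumptions made for this sketch, not names used in the presentation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class OlapQuery:
    """A query described by three sets of fragments (hypothetical names)."""
    group_by: FrozenSet[str]
    selections: FrozenSet[str]
    measures: FrozenSet[str]

    def fragments(self) -> FrozenSet[str]:
        # All fragments of the query, regardless of their kind.
        return self.group_by | self.selections | self.measures


# A session is an ordered sequence of queries.
Session = List[OlapQuery]

# The example query from the slide.
query1 = OlapQuery(
    group_by=frozenset({"City", "AllRaces", "Year", "AllOccs", "Sex"}),
    selections=frozenset({"Sex=Female", "City=NewYork"}),
    measures=frozenset({"AvgIncome", "CostGas"}),
)
session: Session = [query1]
```

Frozen sets keep queries hashable, which is convenient when sessions are later compared or counted.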


SnipSuggest ([Khoussainova et al, 2010]) approach

Aleksander

Current (partial) query:
SELECT title, genre FROM Movies WHERE director = ‘Quentin Tarantino’ AND

LOG (query fragments):
• FROM Movies, WHERE director=‘Quentin Tarantino’, WHERE genre=‘Western’
• FROM Movies, WHERE director=‘Quentin Tarantino’, WHERE genre=‘Thriller’
• SELECT title, WHERE director=‘Quentin Tarantino’, WHERE year>1995

Candidate fragments are scored with association rules (confidences shown: 1/5, 4/5, 4/5) and the top-k are suggested.
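The idea of scoring candidate fragments by confidence can be sketched as follows. This is a simplified illustration, not SnipSuggest's actual implementation: the function name `suggest_fragments` and the representation of queries as plain sets of fragment strings are assumptions for this sketch.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple


def suggest_fragments(
    current: FrozenSet[str],
    log: List[FrozenSet[str]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top-k fragments with their confidence given `current`.

    confidence(f) = (# log queries containing current and f)
                    / (# log queries containing current)
    """
    # Log queries that contain every fragment of the current query.
    support = [q for q in log if current <= q]
    if not support:
        return []
    # Count fragments that co-occur with the current ones.
    counts = Counter(f for q in support for f in q - current)
    # Highest count first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(frag, n / len(support)) for frag, n in ranked[:k]]
```

Called with the fragments already typed by the user, it returns the fragments that most often complete them in the log.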


PROMISE [Sapia, 2000] & [Aufaure et al., 2013] approaches

[Figure: the log is grouped into clusters of queries (c1); Aleksander's current query is related to a cluster to obtain a recommended query]


[Giacometti et al., 2009] & [Negre, 2009] approaches

Generic framework:
• Between sessions (Session 1, Session 2): edit distance
• Between queries (Query 1, Query 2): Hausdorff distance
• Between positions in a cube: distance in hierarchy

[Figure: the current session is compared with the sessions of the log]

Query Recommendation in Databases & Data Warehouses

Reference | Collaborative? | Query model | Output | Technique
SnipSuggest [Khoussainova et al., 2010] | Yes | Query expression | Set of query fragments | Stochastic
[Sapia, 2000] & [Aufaure et al., 2013] | Yes | Query expression | A query | Similarity-based, stochastic
[Giacometti et al., 2009], [Negre, 2009] | Yes | Query answer | A query | Similarity-based
QueRIE [Eirinaki et al., 2013] | Yes | Query expression | Set of queries | Similarity-based
[Giacometti et al., 2011] | Yes | Query answer | Set of queries | Stochastic
[Jerbi et al., 2009] | No | Query answer | Query fragment | Preference-based
Icube [Sarawagi & Sathe, 2000] | No | Query answer | Set of tuples | Stochastic
SROS | Yes | Query expression | Sequence of queries | Similarity-based


Query Recommendation in Databases & Data Warehouses

A desirable recommendation:

• How to propose an informative recommendation to the user?
  Proposal: recommend a sequence of queries

• How to find log sessions close to the current session?
  Proposal: define a two-level similarity measure between sessions

• How to be consistent with the context of the current session?
  Proposal: adapt the recommendation to the context of the current session

• How to provide a relevant recommendation to the user?
  Proposal: define quality criteria assessing the recommendation

2. Defining Similarities for OLAP Sessions


Requirements for Similarity Measures between OLAP Sessions

Intuitively:
• The order of queries is relevant
• Recent queries are more relevant than older queries

How to define a similarity measure for sequences?
Classically: Dice, edit distance, soft-TF-IDF, subsequence alignment

How to include the comparison between the sequence elements?
A two-level approach based on similarities between elements
Classically: cosine similarity, Hamming distance, Hausdorff distance

No proposal for a measure using the query expression!


Similarity Measure between Queries

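Query 1
• Group-by set: City, AllRaces, Year, AllOccs, Sex
• Selection set: Sex=Female, City=New-York
• Measure set: AvgIncome, CostGas

Query 2
• Group-by set: City, AllRaces, AllYear, AllOccs, Sex
• Selection set: Sex=Female, Region=West
• Measure set: AvgIncome, MaxCostElec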

[Aligon et al., KAIS, 2013]
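To give a feel for a structure-based comparison of the two queries, here is a deliberately simplified stand-in. It is not the measure of [Aligon et al., KAIS, 2013], which the slide does not spell out: it assumes a weighted average of Jaccard similarities over the three fragment sets, and the function names and equal weights are hypothetical.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def query_similarity(
    q1: Dict[str, Set[str]],
    q2: Dict[str, Set[str]],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Weighted average of per-component Jaccard similarities (assumed form)."""
    parts = ("group_by", "selections", "measures")
    return sum(w * jaccard(q1[p], q2[p]) for w, p in zip(weights, parts))


query1 = {
    "group_by": {"City", "AllRaces", "Year", "AllOccs", "Sex"},
    "selections": {"Sex=Female", "City=New-York"},
    "measures": {"AvgIncome", "CostGas"},
}
query2 = {
    "group_by": {"City", "AllRaces", "AllYear", "AllOccs", "Sex"},
    "selections": {"Sex=Female", "Region=West"},
    "measures": {"AvgIncome", "MaxCostElec"},
}
similarity = query_similarity(query1, query2)
```

A plain Jaccard treats Year and AllYear as unrelated; a hierarchy-aware comparison would instead account for their closeness within the same dimension.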


Similarity Measure between Sessions

Different classical approaches extended to OLAP context:

• Dice Coefficient

[Aligon et al., KAIS, 2013]
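One way to extend the Dice coefficient to sessions, sketched here under assumptions, is to count a query as shared when some query of the other session is similar enough. The function name `soft_dice`, the `sim` callback and the threshold value are hypothetical, and this is not necessarily the exact extension used in the cited paper.

```python
from typing import Callable, Sequence, TypeVar

Q = TypeVar("Q")


def soft_dice(
    s1: Sequence[Q],
    s2: Sequence[Q],
    sim: Callable[[Q, Q], float],
    threshold: float = 0.7,
) -> float:
    """Dice coefficient where queries match if sim(q, p) >= threshold."""
    if not s1 and not s2:
        return 1.0
    # Queries of s1 that have a close enough counterpart in s2, and vice versa.
    matched1 = sum(1 for q in s1 if any(sim(q, p) >= threshold for p in s2))
    matched2 = sum(1 for p in s2 if any(sim(q, p) >= threshold for q in s1))
    return (matched1 + matched2) / (len(s1) + len(s2))
```

With an exact-equality `sim` and duplicate-free sessions this reduces to the classical 2|A∩B| / (|A| + |B|). Note that the order of queries is ignored.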



Similarity Measure between Sessions

Different classical approaches extended to OLAP context:

• soft-TF-IDF

[Aligon et al., KAIS, 2013]

[Figure: TF-IDF scores computed over the log]


Similarity Measure between Sessions

Different classical approaches extended to OLAP context:

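• Levenshtein Distance

[Aligon et al., KAIS, 2013]

Based on a cost matrix that computes, between two sessions, the minimal number of:
• Insertions (I)
• Deletions (D)
• Substitutions (S)

[Figure: two sessions aligned query by query, each step labelled insertion (I), substitution (S), deletion (D) or matching]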
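The cost-matrix computation can be sketched as the standard Levenshtein dynamic program applied to sequences of queries. The `same` predicate is an assumption of this sketch: it decides when two queries count as matching, for example exact equality or a query similarity above some threshold.

```python
from typing import Callable, Sequence, TypeVar

Q = TypeVar("Q")


def session_edit_distance(
    s1: Sequence[Q],
    s2: Sequence[Q],
    same: Callable[[Q, Q], bool] = lambda a, b: a == b,
) -> int:
    """Minimal number of insertions, deletions and substitutions
    turning session s1 into session s2."""
    # prev[j] = distance between the first i-1 queries of s1
    # and the first j queries of s2.
    prev = list(range(len(s2) + 1))
    for i, q1 in enumerate(s1, start=1):
        curr = [i]
        for j, q2 in enumerate(s2, start=1):
            cost = 0 if same(q1, q2) else 1
            curr.append(min(
                prev[j] + 1,         # deletion of q1
                curr[j - 1] + 1,     # insertion of q2
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]
```

Only two rows of the matrix are kept, so memory is linear in the length of the second session.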


Similarity Measure between Sessions

Different classical approaches extended to OLAP context:

• SubSequence Alignment

[Aligon et al., KAIS, 2013]

Find the optimal local alignment, whose score is a trade-off between the costs of gaps and mismatches

Perfect alignment -> Maximal similarity

Good alignment -> Good similarity

GAP

Very bad alignment -> Very low similarity

No alignment -> Minimal similarity


Extension of Subsequence Alignment in the OLAP context

Time-discounting function

Gap penalty: a variable score ensuring few gaps and long alignments
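A Smith-Waterman-style sketch of such an alignment is given below. Several choices are assumptions made only for illustration: the exponential time discount, the similarity threshold that separates matches from mismatches, and a constant gap penalty (the slide describes a variable one). The parameter names are hypothetical.

```python
import math
from typing import Callable, Sequence, TypeVar

Q = TypeVar("Q")


def local_alignment_score(
    s1: Sequence[Q],
    s2: Sequence[Q],
    sim: Callable[[Q, Q], float],
    threshold: float = 0.5,
    gap_penalty: float = 0.3,
    decay: float = 0.2,
) -> float:
    """Best local alignment score between two sessions."""
    n, m = len(s1), len(s2)
    # score[i][j] = best score of an alignment ending at s1[i-1], s2[j-1].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Time discount: pairs close to the end of both sessions weigh more.
            age = (n - i) + (m - j)
            weight = math.exp(-decay * age)
            # Positive for similar queries, negative for dissimilar ones.
            match = (sim(s1[i - 1], s2[j - 1]) - threshold) * weight
            score[i][j] = max(
                0.0,                              # start a new alignment
                score[i - 1][j - 1] + match,      # align the two queries
                score[i - 1][j] - gap_penalty,    # gap in s2
                score[i][j - 1] - gap_penalty,    # gap in s1
            )
            best = max(best, score[i][j])
    return best
```

The raw score is not normalized; dividing by the score of a session aligned with itself would be one way to map it to [0, 1].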


Assessed with Subjective Tests

[ebiss 2011 Summer School]

Questionnaire opinions on query similarities and session similarities:

• 41 students & researchers from the ebiss’2011 summer school
• They gave their similarity degrees (Low, Fair, Good, High) between a current query and different queries
• They gave their similarity degrees (Low, Fair, Good, High) between a current session and different sessions


Test | % User Agreement | Edit Distance | Dice | TF-IDF | Subsequence Alignment
Test 1 | 51% | 51% | 29% | 51% | 51%
Test 2 | 43% | 33% | 9% | 39% | 43%
Test 3 | 51% | 41% | 4% | 51% | 51%
Test 4 | 36% | 19% | 26% | 35% | 35%
Test 5 | 38% | 33% | 13% | 33% | 33%


Assessed with Objective Tests

Log | Edit Distance | Dice | TF-IDF | Subsequence Alignment
 | 1.79 | 1.57 | 1.51 | 5.23
 | 1.46 | 1.52 | 1.31 | 3.21
 | 1.39 | 1.16 | 1.39 | 2.32
 | 1.44 | 1.23 | 1.32 | 2.15
 | 1.08 | 1.57 | 1.42 | 0.78
Average | 1.40 | 1.35 | 1.35 | 2.51

3. The SROS Recommender System


SROS System

Composed of three phases:

• Selection: select a set of possible futures

• Ranking: rank the futures from frequent similarities

• Tailoring: adapt the best future to the current session
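The first two phases can be sketched as follows. This is an interpretation under stated assumptions, not the SROS implementation: `align` is a hypothetical callback returning a similarity score together with the index of the last aligned query in the log session, a future is taken to be what follows that index, and ranking by average similarity to the other futures is one possible reading of "frequent similarities".

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Q = TypeVar("Q")


def select_futures(
    current: Sequence[Q],
    log: Sequence[Sequence[Q]],
    align: Callable[[Sequence[Q], Sequence[Q]], Tuple[float, int]],
    min_score: float,
) -> List[List[Q]]:
    """Selection: keep the tail of every log session close to `current`."""
    futures = []
    for past in log:
        # `end` is the index in `past` of the last aligned query.
        score, end = align(current, past)
        if score >= min_score and end + 1 < len(past):
            futures.append(list(past[end + 1:]))
    return futures


def rank_futures(
    futures: List[List[Q]],
    session_sim: Callable[[Sequence[Q], Sequence[Q]], float],
) -> List[List[Q]]:
    """Ranking: order futures by average similarity to the other futures."""
    def centrality(i: int) -> float:
        others = [session_sim(futures[i], f)
                  for j, f in enumerate(futures) if j != i]
        return sum(others) / len(others) if others else 0.0

    order = sorted(range(len(futures)), key=centrality, reverse=True)
    return [futures[i] for i in order]
```

The top-ranked future is then handed to the tailoring phase, which adapts it to the current session.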

SROS System: Selection

[Figure: 1. Selection. Former sessions of the log that are close to Aleksander's current session are retrieved; for each such log session, a future is extracted.]

SROS System: Ranking

[Figure: 2. Ranking. The futures obtained from the former close sessions are ranked (1, 2, 3, 4).]

SROS System: Tailoring

[Figure: 3. Tailoring with association rules of type 1 (fragments Year=2005 and Year=2002). The best-ranked future, taken from a log session using Year=2005, is adapted to the current session, which uses Year=2002 and AvgIncome, to produce the adapted recommendation.]

[Figure: 3. Tailoring with association rules of type 2 (fragments Year=2002 and AvgIncome). The recommendation is further adapted with fragments of the current session.]
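A possible reading of the tailoring step is sketched below, with loud assumptions: rules of type 1 are read as "replace a fragment of the log session by a fragment of the current session" (Year=2005 by Year=2002), and rules of type 2 as "when a fragment is present, add a fragment that accompanies it in the current session" (Year=2002 with AvgIncome). Queries are flattened to sets of fragment strings, and the function and parameter names are hypothetical.

```python
from typing import Dict, FrozenSet, List


def tailor(
    future: List[FrozenSet[str]],
    substitutions: Dict[str, str],
    additions: Dict[str, str],
) -> List[FrozenSet[str]]:
    """Adapt each query of a future with two kinds of fragment rules."""
    tailored = []
    for query in future:
        # Type 1 (assumed): swap log-session fragments for current-session ones.
        fragments = {substitutions.get(f, f) for f in query}
        # Type 2 (assumed): add fragments that go with a fragment now present.
        for premise, extra in additions.items():
            if premise in fragments:
                fragments.add(extra)
        tailored.append(frozenset(fragments))
    return tailored


future = [frozenset({"Year=2005", "AvgCostGas", "State"})]
recommendation = tailor(
    future,
    substitutions={"Year=2005": "Year=2002"},
    additions={"Year=2002": "AvgIncome"},
)
```

How the rules are mined from the sessions is not covered by this sketch.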

4. Assessing the quality of the recommender system

Quality Measures

Different quality criteria:

• Novelty
  [Figure: the recommendation compared with the log]

• Adaptation
  [Figure: a current session and a recommendation compared fragment by fragment: (AvgCostElec, AllCities, Year, AllOccs, Race, AllSexes, Year=2002) versus (AvgIncome, AllCities, Year, AllOccs, Race, Sex, Year=2005)]

• Accuracy & Coverage
  [Figure: the recommendations compared with the expected recommendations]
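Accuracy compares the recommendations with the expected recommendations. One common way to quantify such a comparison, assumed here rather than taken from the slides, is precision, recall and their harmonic mean. The `matches` predicate is hypothetical and could be an exact test or a query-similarity threshold.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Q = TypeVar("Q")


def precision_recall_f1(
    recommended: Sequence[Q],
    expected: Sequence[Q],
    matches: Callable[[Q, Q], bool],
) -> Tuple[float, float, float]:
    """Precision, recall and F1 of recommended queries against expected ones."""
    # Recommended queries that correspond to some expected query.
    hit_rec = sum(1 for r in recommended if any(matches(r, e) for e in expected))
    # Expected queries that are covered by some recommended query.
    hit_exp = sum(1 for e in expected if any(matches(r, e) for r in recommended))
    precision = hit_rec / len(recommended) if recommended else 0.0
    recall = hit_exp / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Coverage could then be estimated as the share of current sessions for which a non-empty recommendation is produced, again as an assumption.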

An effective recommender system finds a trade-off between:

• Novelty: the recommendation provides new, informative fragments from the log

• Adaptation: the recommendation preserves consistency with the current session


Behavior Generation

IQ: Initial Query, FQ: Final Query, SQ: Surprising Query, RQ: Random Query

[Figure: two session templates built from these queries, linked by shortest OLAP paths and single random operations: the explorative template (IQ, SQ, RQs) and the goal-oriented template (IQ, FQ, RQs)]

Synthetic log:

• Includes explorative and goal-oriented templates, randomly chosen

• Composed of 200 sessions (2950 queries in total)


Real Logs

Questionnaires completed by Master’s students:

• 40 students from the University of Bologna and the University of Tours

• They devised OLAP sessions answering questions of different complexity

• Filtering is essential (145 sessions remain after filtering)


Effectiveness Test Principle

[Figure: log; a session split into a current session and an expected recommendation]

• Conducted with the following logs:
  - The synthetic log
  - The logs devised by the students

• N-fold cross-validation
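The test principle can be sketched as below: the sessions are partitioned into N folds, each fold in turn provides test sessions while the rest plays the role of the log, and each test session is cut into a current session and an expected recommendation. The function names and the choice of the cut position are assumptions of this sketch.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

Q = TypeVar("Q")
Session = Sequence[Q]


def n_fold_splits(
    sessions: List[Session], n_folds: int
) -> Iterator[Tuple[List[Session], List[Session]]]:
    """Yield (log, test_sessions) pairs, one per fold."""
    # Round-robin assignment of sessions to folds.
    folds = [sessions[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        log = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield log, test


def split_session(session: Session, cut: int) -> Tuple[Session, Session]:
    """Cut a test session into (current session, expected recommendation)."""
    return session[:cut], session[cut:]
```

For each fold, a recommender would be run on every current session against the log, and its output compared with the expected recommendation.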

Accuracy Result

[Figure: accuracy on the synthetic log]

[Figure: accuracy on the student log]

Novelty Result

[Figure: novelty measure]

Adaptation Result

[Figure: adaptation measure]

5. Conclusion & Perspectives

Conclusion

• Requirements for recommendation and similarity measures

• Definitions of query and session similarities for OLAP
  - Assessed with subjective and objective tests
  - Query similarity based on the structure
  - Session similarity extending subsequence alignment

• Proposal of a similarity-based recommender system for OLAP sessions, based on three phases:
  - Selection
  - Ranking
  - Tailoring

• The recommender system is assessed in terms of effectiveness:
  - Quality measure proposals
  - The recommendations:
    - are well adapted to the context of the current session
    - preserve the logic of the log session to provide new information to the user
    - are very accurate, for very different contexts of log density

Perspectives: a tool for session design assistance

How to remedy the cold-start problem?

• Solution: exploring former sessions to initiate the first queries of a current session

• Problem: how to navigate between the sessions?
  Proposal: organizing sessions in a hierarchical structure and defining browsing operators

• Problem: how to represent groups of sessions?
  Proposal: using summarization techniques to reduce the number of queries and sessions, but also to design a representative session ([Aligon & Marcel, EDA’2012], [Aligon et al., PersDB’2012])

Perspectives: a benchmark of OLAP sessions

Development of a platform to assess the quality of an analytical session over a cube:

• Making it possible to measure the effectiveness of user-centric approaches

• Finding a more precise definition of an OLAP session

Perspectives: adaptation of the recommender system to other contexts

• Adaptation to a data-mining context, where sessions can be considered as sequences of complex tasks
  - This requires adapting the session similarity

• On the Web, sequences can be analysis sequences over social networks:
  - Taking into account the relationships between users
  - Considering a similarity measure between users to define user profiles

Thank you for your attention!