Transforming and leveraging OLAP queries

Patrick Marcel

Université François Rabelais Tours

Laboratoire d'Informatique

SAP-BO, 06.22.2010

Outline

• Short CV
• Personalizing OLAP queries
• Recommending OLAP queries
• Summarizing OLAP queries
• Perspectives

About me

• PhD « multidimensional data(base) manipulations and rule based languages », defended 1998, LISI (now LIRIS), INSA Lyon. Supervisors: J. Kouloumdjian and MS Hacid
• Maître de Conférences (associate professor), UFRT, Computer Science Department
• Head of the Masters program in Information systems and decision making
• Semester off (September 2010 – January 2011)

About me (cont'd)

Member of the DB & NLP team (4 full professors, 8 associate professors)

– NLP
– XML and web technology
– Data mining and OLAP

Recent activities

• Pattern based global models (PhD Eynollah Khanjari 2009)

• Summarizing and visualizing large sets of association rules (PhD Marie Ndiaye 2010)

• Collaborative exploration of data warehouses (PhD Elsa Negre 2009)

Personalizing OLAP queries

• PhD Hassina Mouloudi (2007)
• Main publications
– ACM DOLAP 2005
– BDA 2006
– Hassina's dissertation (in French)
• Prototype
– Mobile application for querying a cube with query personalization
– Mondrian, Oracle, Tomcat, Axis

Motivation

SELECT CROSSJOIN({City.Tours, City.Orleans}, {Category.Members}) ON ROWS,
       {2003, 2004, 2005, 2006} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

Visualization depends on the user's profile

                 2003  2004  2005  2006
Tours    Drink     77    54    55    33
         Food      89    61    30    41
Orleans  Drink     25    50    49    32
         Food      33    44    59    27

Tours    2003  2004  2005  2006
Food       77    54    55    33
Drink      89    61    30    41
Cloth      56    30    32    60
Shoes      45    50    32    51

The problem

• Given

– An MDX query q

– User preferences P

– A visualization constraint v

• Find a preferred query q'

– Included in q

– Nearest to q satisfying v

– The most interesting w.r.t. P

Example of preferred query

SELECT CROSSJOIN({City.Tours}, {Category.Food, Category.Drink}) ON ROWS,
       {Year.2005} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

<

SELECT CROSSJOIN({City.Tours}, {Year.2006}) ON ROWS,
       {Category.Drink} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

Since the user profile contains

Location < Product, Product < Time
2005 < 2006, food < drink

Indeed:

(2005, Food, Tours, quantity) < (2006, Drink, Tours, quantity)
(2005, Drink, Tours, quantity) < (2006, Drink, Tours, quantity)
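The two fact comparisons above are consistent with a lexicographic reading of the profile: compare facts on the most important dimension first, then on the next one. The Python sketch below illustrates that reading only. The slides do not state how the order on facts is derived, so the lexicographic rule, the rank tables, and the names `DIMENSION_PRIORITY`, `MEMBER_RANK` and `less_preferred` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Dimensions from most to least important, read off "Location < Product, Product < Time".
DIMENSION_PRIORITY: List[str] = ["Time", "Product", "Location"]

# Member ranks per dimension (higher rank = more preferred), read off
# "2005 < 2006, food < drink". Only the members used in the example are listed.
MEMBER_RANK: Dict[str, Dict[str, int]] = {
    "Time": {"2005": 0, "2006": 1},
    "Product": {"Food": 0, "Drink": 1},
    "Location": {"Tours": 0},
}

Fact = Dict[str, str]  # dimension name -> member name


def preference_key(fact: Fact) -> Tuple[int, ...]:
    """Ranks of the fact's members, ordered by dimension importance."""
    return tuple(MEMBER_RANK[dim][fact[dim]] for dim in DIMENSION_PRIORITY)


def less_preferred(a: Fact, b: Fact) -> bool:
    """True if fact a is strictly less preferred than fact b (assumed lexicographic order)."""
    return preference_key(a) < preference_key(b)


food_2005 = {"Time": "2005", "Product": "Food", "Location": "Tours"}
drink_2005 = {"Time": "2005", "Product": "Drink", "Location": "Tours"}
drink_2006 = {"Time": "2006", "Product": "Drink", "Location": "Tours"}
```

With these tables, `less_preferred(food_2005, drink_2006)` and `less_preferred(drink_2005, drink_2006)` both hold, matching the two comparisons on the slide. The measure is left out because both queries use the same one.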

Personalizing

[Figure: architecture diagram with the following components: user query, user profile, personalization engine, query processor, dimension tables, fact table, result]

Personalizing OLAP queries

• Context
– Dimension tables in main memory
– No access to the fact table
• Principle
– Compute sets of positions in the resulting crosstab
  • Largest possible
  • Visualizable w.r.t. the visualization constraint
  • Corresponding to the preferred facts
– Compute the structures of the crosstabs

Example of personalization (1)

The query:

SELECT CROSSJOIN({City.Tours, City.Orleans}, {Category.Members}) ON ROWS,
       {2003, 2004, 2005, 2006} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

Preferences:

Time < Location and Product < Location
2002 < 2003 < 2004 < 2005 < 2006
Electronics < shoes < cloth < food < drink
Quantity < price

Constraint: 2 axes, no more than 4 positions on each axis

Example of personalization (2)

Step 1: the most preferred facts

                 2006
Drink  Orleans
       Tours

Example of personalization (3)

Step 2: the second most preferred facts

                 2006  2005
Drink  Orleans
       Tours
Food   Orleans
       Tours

Example of personalization (4)

Step 3: the next most preferred facts. But the selected facts have to satisfy the visualization constraint.

                 2006  2005  2004
Drink  Orleans
       Tours
Food   Orleans
       Tours

               Drink  Food  Cloth
Tours    2005
         2006
Orleans  2005
         2006

Example of personalization (5)

Finally, one of the constructed queries is

SELECT CROSSJOIN({City.Tours, City.Orleans}, {Category.Food, Category.Drink}) ON ROWS,
       {2003, 2004, 2005, 2006} ON COLUMNS
FROM SalesCube
WHERE (Measures.quantity)

                 2003  2004  2005  2006
Tours    Drink     77    54    55    33
         Food      89    61    30    41
Orleans  Drink     25    50    49    32
         Food      33    44    59    27
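The crosstab structure of this example can be reproduced with a simple trimming rule: while an axis has more positions than the constraint allows, drop the least preferred member of the least important dimension on that axis. The Python sketch below is a simplified illustration under that assumption, not the algorithm of the prototype. The function `trim_axis` and its arguments are hypothetical, and `Category.Members` is assumed to be the five categories named in the preferences.

```python
from math import prod
from typing import Dict, List


def trim_axis(axis: Dict[str, List[str]],
              trim_order: List[str],
              max_positions: int) -> Dict[str, List[str]]:
    """Shrink an axis until it has at most max_positions positions.

    axis maps each dimension on the axis to its members, least preferred first.
    trim_order lists the dimensions from least to most important.
    The number of positions is the size of the cross product of the member lists.
    """
    result = {dim: list(members) for dim, members in axis.items()}

    def positions() -> int:
        return prod(len(members) for members in result.values())

    for dim in trim_order:
        # Drop the least preferred member, but always keep at least one member.
        while positions() > max_positions and len(result[dim]) > 1:
            result[dim].pop(0)
    return result


# Rows: CROSSJOIN of cities and categories; columns: years.
rows = {
    "City": ["Tours", "Orleans"],
    "Category": ["Electronics", "Shoes", "Cloth", "Food", "Drink"],
}
columns = {"Year": ["2003", "2004", "2005", "2006"]}

# "Product < Location": categories are trimmed before cities.
trimmed_rows = trim_axis(rows, ["Category", "City"], max_positions=4)
trimmed_columns = trim_axis(columns, ["Year"], max_positions=4)
```

Here `trimmed_rows` keeps both cities with Food and Drink (4 positions) and `trimmed_columns` keeps the four years, which is the structure of the query above.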

Prototype

Speedup

Recommending OLAP queries

• PhD Elsa Negre (2009)
• Main publications
– ACM DOLAP 2008
– DaWaK 2009
– ACM DOLAP 2009
– Int. Journal of Data Warehousing and Mining
• Prototype
– Various methods for OLAP query recommendation
– Mondrian, MySQL

Context and principle

Distances

• Between positions in the cube

– Hamming

– Based on shortest path

• Between queries

– Based on differences in dimension

– Hausdorff

• Between sessions

– Based on the subsequence

– Edit distance
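To make the three levels concrete, here is a minimal Python sketch of one distance per level: Hamming between positions, Hausdorff between queries, and edit distance between sessions. The data model is an assumption made for illustration: a position is a tuple with one member per dimension, a query is the non-empty set of positions it references, and a session is a sequence of queries. The unit edit costs are also an assumption; the slides do not give the cost model.

```python
from typing import FrozenSet, Sequence, Tuple

Position = Tuple[str, ...]        # one member per dimension
Query = FrozenSet[Position]       # a query seen as a non-empty set of positions


def hamming(p: Position, q: Position) -> int:
    """Number of dimensions on which two positions differ."""
    return sum(a != b for a, b in zip(p, q))


def hausdorff(q1: Query, q2: Query) -> int:
    """Hausdorff distance between two queries, built on the Hamming distance."""
    def directed(xs: Query, ys: Query) -> int:
        # Farthest position of xs from its nearest position in ys.
        return max(min(hamming(x, y) for y in ys) for x in xs)
    return max(directed(q1, q2), directed(q2, q1))


def edit_distance(s1: Sequence[Query], s2: Sequence[Query]) -> int:
    """Levenshtein distance between two sessions, with unit costs (assumed)."""
    previous = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        current = [i]
        for j, b in enumerate(s2, start=1):
            substitution = previous[j - 1] + (0 if a == b else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


p1 = ("Tours", "Drink", "2006")
p2 = ("Orleans", "Drink", "2006")
p3 = ("Orleans", "Food", "2005")
qa: Query = frozenset({p1})
qb: Query = frozenset({p2, p3})
```

For these values, `hamming(p1, p2)` is 1, `hausdorff(qa, qb)` is 3, and `edit_distance([qa, qb], [qa])` is 1.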

Experiments

• Cube

– Foodmart (Mondrian sample cube)

• Session generator

– Max 100 cells per MDX query

– 25-50 sessions

– 20-50 queries/session

– Log of 150-25000 queries

– 1-20 queries/current session

Efficiency

• Shortest path

• Hausdorff distance

• Edit distance

Effectiveness

• 10-fold cross validation
– 1 query set = 10 equally sized subsets
  • 9 for the log
  • 1 for the current sessions
• For the current sessions
– Remove the last query
– Check how often this last query is recommended
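The protocol above can be sketched in a few lines of Python. This is an illustration only: sessions are modeled as lists of query identifiers, the split is a simple round-robin, and `recommend` stands for any recommender taking a log and the prefix of a current session. All of these names are hypothetical.

```python
from typing import Callable, List

Query = str
Session = List[Query]
Recommender = Callable[[List[Session], Session], Query]


def hit_rate(sessions: List[Session], recommend: Recommender, folds: int = 10) -> float:
    """Fraction of current sessions whose removed last query is the one recommended."""
    # Round-robin split into `folds` subsets (equal sizes when len(sessions) % folds == 0).
    subsets = [sessions[i::folds] for i in range(folds)]
    hits = trials = 0
    for i, current_sessions in enumerate(subsets):
        # The other subsets form the log.
        log = [s for j, subset in enumerate(subsets) if j != i for s in subset]
        for session in current_sessions:
            if len(session) < 2:
                continue  # nothing left to use as a current session
            prefix, expected = session[:-1], session[-1]
            trials += 1
            if recommend(log, prefix) == expected:
                hits += 1
    return hits / trials if trials else 0.0


# Toy data: half of the sessions end with "q_end", and the toy recommender
# always suggests "q_end", so it is right for exactly half of the sessions.
toy_sessions = [["a", "q_end"]] * 5 + [["a", "other"]] * 5
toy_rate = hit_rate(toy_sessions, lambda log, prefix: "q_end")
```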

Effectiveness

E = members of the expected query
R = members of the recommended query
Intersect = members common to E and R

Precision = |Intersect| / |R|
Recall = |Intersect| / |E|
F-measure = 2 * Precision * Recall / (Precision + Recall)
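These measures translate directly to set operations on member names. A minimal Python sketch follows; the member sets are made up for illustration, and returning 0 when a denominator is empty is a convention chosen here, not something the slides specify.

```python
from typing import Set, Tuple


def precision_recall_f(expected: Set[str], recommended: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of a recommended query against the expected one."""
    common = expected & recommended
    precision = len(common) / len(recommended) if recommended else 0.0
    recall = len(common) / len(expected) if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical member sets: 2 of the 3 recommended members are expected.
E = {"Tours", "Orleans", "Drink", "Food"}
R = {"Tours", "Drink", "Cloth"}
precision, recall, f_measure = precision_recall_f(E, R)
```

Here precision is 2/3, recall is 1/2, and the F-measure is 4/7.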

Query recommendation for discovery driven analysis?

"Hm, this looks strange to me..."

"Interesting..."

Processing the log

1: consider all sessions
2: consider all queries
3: consider all difference pairs
4: detect their drilldown pairs
5: detect their exception pairs
6: consider only the most general pairs having drilldown pairs or exception pairs
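The slides do not define difference, drilldown, or exception pairs, so the Python sketch below rests entirely on assumptions. It takes a difference pair to be two cells of a query result that differ on exactly one dimension and whose measures differ by at least a threshold. It takes a finer-level pair to be a drilldown pair when its gap has the same sign as the gap of the general pair, and an exception pair otherwise. The function names and the threshold are hypothetical; the cell values are the 2006 column of the sales crosstab shown earlier.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Cell = Tuple[str, ...]  # one member per dimension


def difference_pairs(result: Dict[Cell, float], threshold: float) -> List[Tuple[Cell, Cell]]:
    """Pairs of cells differing on one dimension with a measure gap >= threshold (assumed definition)."""
    pairs = []
    for a, b in combinations(sorted(result), 2):
        differing = sum(x != y for x, y in zip(a, b))
        if differing == 1 and abs(result[a] - result[b]) >= threshold:
            pairs.append((a, b))
    return pairs


def classify(general_gap: float, specific_gap: float) -> str:
    """'drilldown' if the finer gap has the same sign as the general gap, else 'exception' (assumed)."""
    return "drilldown" if general_gap * specific_gap > 0 else "exception"


sales_2006 = {
    ("Tours", "Drink", "2006"): 33.0,
    ("Tours", "Food", "2006"): 41.0,
    ("Orleans", "Drink", "2006"): 32.0,
    ("Orleans", "Food", "2006"): 27.0,
}
pairs = difference_pairs(sales_2006, threshold=5.0)
```

With a threshold of 5, three pairs are detected: Drink vs. Food in Orleans, Drink vs. Food in Tours, and Orleans vs. Tours for Food.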

Recommending

1: detect difference pairs
2: specialize a most general pair in the log?
3: suggest the most general queries...
4: ... then drilldown queries
5: ... then exception queries

Prototype

Java, Mondrian OLAP engine & Sarawagi's icube

Preliminary tests show that, for a small log (a few hundred queries), recommendation time does not exceed 50 ms

Conclusion: so far...

"Hm, this looks strange to me..."

Ongoing work with IRSA (a French social security health examination center) to analyze over 500,000 health care examination questionnaires

Summarizing OLAP queries

• Master's thesis Julien Aligon (in progress)
• Problem: viewpoints on former sessions?
– By summarizing the log
  • Summarize a sequence of queries by a sequence of queries
– By browsing/querying the summary
• Experiments on healthcare data
• Related publication
– EDA 2007, 2010

Perspectives

• Project STIC-AmSud PQUERY: preference models for personalized queries
• Forthcoming work with M. Golfarelli (U. Bologna)
– Preference mining to dynamically add preferences to an MDX query
• Contributions to a collaborative query management system for OLAP
