harrow school of computer science data & knowledge management group university of westminster...

19
Harrow School of Computer Science Harrow School of Computer Science Data & Knowledge Management Group Data & Knowledge Management Group University of Westminster University of Westminster Watford Road, Northwick Park, HA1 3TP Watford Road, Northwick Park, HA1 3TP London, UK London, UK Ermir Rogova, Panagiotis Chountas, Krassimir Ermir Rogova, Panagiotis Chountas, Krassimir Atanassov Atanassov CLBME CLBME Bulgarian Academy of Sciences Bulgarian Academy of Sciences Sofia, Bulgaria Sofia, Bulgaria

Upload: carson-timms

Post on 31-Mar-2015

219 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Harrow School of Computer Science Harrow School of Computer Science

Data & Knowledge Management GroupData & Knowledge Management Group

University of WestminsterUniversity of WestminsterWatford Road, Northwick Park, HA1 3TPWatford Road, Northwick Park, HA1 3TP

London, UKLondon, UK

Ermir Rogova, Panagiotis Chountas, Krassimir AtanassovErmir Rogova, Panagiotis Chountas, Krassimir Atanassov

CLBME CLBME

Bulgarian Academy of SciencesBulgarian Academy of Sciences

Sofia, BulgariaSofia, Bulgaria

Page 2: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Imprecision and OLAP

IF-Cube

IF-OLAP Operators

Knowledge Based OLAP

Summary

Page 3: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Current models are mainly concerned with the querying of imprecision at the fact table.

Use of inflexible hierarchies makes it difficult to reconcile data coming from different sources.

Knowledge has to be incorporated as part of the OLAP operators in order to enhance the results.

Page 4: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

To accommodate imprecise data using IFS measures

To accommodate precise information expressed in the form of concepts i.e. Whole milk, whole pasteurised milk, etc.

In the latter case, the information stored in the cube is precise, but the concept is not, i.e. Whole pasteurised milk satisfies requests for whole milk as well (to some extent)

Page 5: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

A cube is an abstract structure that serves as the foundation for the multidimensional data cube model. A cube C is defined as a five-tuple (D, l, F, O, H) where:

D is a set of dimensionsl is a set of levels l1,…, ln,

F is a set of facts instances O is a partial order between the elements of l.H is an object type history which allows to trace the evolution of the cubic structure

Page 6: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Standard OLAP operators cannot cope with imprecise facts

Standard OLAP operators cannot cope with concept-based information i.e. How do you detect and SUM up reconcilable concepts? i.e. whole semi-skimmed and skimmed milk?

Page 7: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Selection (Σ):

◦ The selection operator selects a set of fact-instances from a cubic structure that satisfy a predicate ( ).

◦ Input: Ci = (D, l, F, O, H) and the predicate θ

◦ Output: Co= (D, l, Fo , O, H) where Fo F and Fo={f | (f F)(f satisfies θ)

◦ Σ(amount>1000 (μ>0.4 ν<0.3) year=2004)(Sales)=CResult

Page 8: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Cubic Product (): This is a binary operator Ci1 Ci2 . It is used to

relate two cubes Ci1 and Ci2

Input: Ci1 = (D1, 11, F1, O1, H1) and

Ci2 = (D2,l2, F2, O2, H2)

Output: Co= (Do, lo, Fo, Oo, Ho) where

Do= D1 D2 , lo= l1 l2, Oo= O1 O2 Ho= H1 H2,

Fo= F1 X F2, = {<<x, y>, min(μf1(x), μf2(y)), max(νf1(x), νf2(y),)>|<x, y> XY}

CSales’ CDiscounts = CResult

ProdID StoreID Amount <μ, ν> ProdID StoreID Discount <μ, ν> P1 S1 10 .7, .2 P2 S1 2 .5, .5 P2 S2 15 .5, .5 P3 S3 5 .3, .3

= S.ProdID S.StoreID S.Amount D.ProdID D.StoreID D.Discount <μ, ν>

P1 S1 10 P2 S1 2 .5, .5 P1 S1 10 P3 S3 5 .3, .3 P2 S2 15 P2 S1 2 .5, .5 P2 S2 15 P3 S3 5 .3, .5

Page 9: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Union (): The union operator is a binary operator that finds

the union of two cubes. Ci1 and Ci2 have to be union compatible.

Input: Ci1 = (D1, l1, F1, O1, H1) and Ci2 = (D2,l2, F2, O2, H2) Output:Co= (Do, lo, Fo, Oo, Ho) where Do=D1=D2, lo=l1=l2, Oo=O1=O2,

Ho=H1=H2, Fo= F1 F2 = { < x, max(F1 (x), F2(x)), min(F1(x),F2(x)) > | x X }

CSales_North CSales_South = CResult

ProdID StoreID Amount <μ, ν> ProdID StoreID Amount <μ, ν>

P1 S1 10 .7, .2 P1 S1 10 .5, .5 P2 S2 15 .5, .5 P3 S3 5 .3, .3

=

S.ProdID S.StoreID S.Amount <μ, ν>

P1 S1 10 .7, .2 P2 S2 15 .5, .5 P3 S3 5 .3, .3

Page 10: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Why do we need to keep track of the history?

The structured history of the datacube allows us to keep all the information when applying Roll up and get it all back when Roll Down is performed. To be able to apply the operation of Roll Up we need to make use of the IFSSUM aggregation operator

H is an object type history that corresponds to a cubic structure ( l, D, A, H’ ) which allows us to trace back the evolution of a cubic structure after performing a set of operators i.e. aggregation.

Page 11: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

The result of applying Roll up over dimension di at level dlr using the aggregation operator over a datacube Ci is another datacube (Co )

Input: Ci = (Di ,li ,Fi , O , Hi )

Output: Co = (Do ,lo ,Fo , O , Ho )

An object of type history is the initial state of the cube is a recursive structure H = ( l, D, A, H’ ) is the state of the cube after performing an

operation on the cube

Page 12: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Will the result over which the aggregate is performed be either crisp or Intuitionistic Fuzzy?

What is the meaning of the result after the IF aggregation is performed?

Using the standard definitions for the group operators (SUM, AVG, MIN and MAX), we provide their IFS extensions and meaning.

Page 13: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

IFSUM((attn-1)(F)) =

{<u>/ y | (( u= mi 1min (μi(uki), νi(uki)) (y =

km

kki kiu1

) ( k1, …km : 1 k1, …km n))}

Example: IFSUM((Amount)(ProdID))

{<.8,.1>/10}+{(<.4,.2>/11),(<.3,.2>/12)}+{(<.5,.3>/13),(<.5,.1>/12)}={(<.3, .3>/34), (<.4, .2>/33), (<.3, .3>/35)}

IFAVG :

The IFAVG aggregate makes use of the IFSUM that was discussed previously and the standard COUNT.

IFSSUM :

Page 14: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

IFMAX((attn-1)(F)) = {<u>y|((u= m

i 1min (μi(uki),νi(uki)) (y= m

i 1max (μi(uki),νi(uki)))(k1,…km :1k1,…km n))}

Example: IFMAX((Amount)(ProdID))

{<.8,.1>/10},{(<.4,.2>/11),(<.3,.2>/12)},{(<.5,.3>/13),(<.5, 0.1>/120)}={(<.3,.3>/13),(<.3,.2>/12)}

Page 15: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Concepts are used to describe how the data is organized in the data sources.

These definitions are used to rewrite queries conditions and to combine OLAP features in this process.

This supports the analysis according to the context of users’ explorations in order to guide the decision making, feature inexistent in current analytical tools.

Page 16: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Milk<0.8, 0.1>

Whole milk<1.0, 0.0>

Condensed whole milk

Skim milk

Pasteurized milk

Half skim milk

Sweetened milk

Condensed milk<0.4, 0.3>

Whole pasteurized milkSweetened condensed milk

<0.8, 0.1>

<0.8, 0.1>

<0.8, 0.1>

<0.4, 0.3>

<1.0, 0.0>

<0.8, 0.1>

<1.0, 0.0>

Page 17: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

If the hierarchical IFS structure expresses preferences in a query, the choice of the maximum allows us not to exclude any possible answer. In real cases, the lack of answers to a query generally makes this choice preferable, because it consists of widening the query answer rather than restricting it.

If the hierarchical IFS set represents an ill-known concept, the choice of the maximum allows us to preserve all the possible values, but it also makes the answer less specific.

Page 18: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Flexible hierarchies and data◦ Extend querying language◦ Extend the OLAP modelling construct (definition of facts – cube)

Intuitionistic Fuzzy OLAP◦ Extending the OLAP operators

KNOLAP◦ Imprecise hierarchical Intuitionistic fuzzy concepts◦ Involvement of the domain knowledge in answering OLAP

queries Automatic navigation/summarisation paths Implementation

Page 19: Harrow School of Computer Science Data & Knowledge Management Group University of Westminster Watford Road, Northwick Park, HA1 3TP London, UK University

Harrow School of Computer Science Harrow School of Computer Science

Data & Knowledge Management GroupData & Knowledge Management Group

University of WestminsterUniversity of WestminsterWatford Road, Northwick Park, HA1 3TPWatford Road, Northwick Park, HA1 3TP

London, UKLondon, UK

CLBME CLBME

Bulgarian Academy of SciencesBulgarian Academy of Sciences

Sofia, BulgariaSofia, Bulgaria