Prediction Cubes
Bee-Chung Chen, Lei Chen, Yi Lin and Raghu Ramakrishnan
University of Wisconsin - Madison
Big Picture

• We are not trying to build a single, accurate "model"
• We want to find interesting subsets of the dataset
  – Interestingness: defined by the "model" built on a subset
  – Cube space: a combination of dimension attribute values defines a candidate subset (just like regular OLAP)
• We are not using regular aggregate functions as the measures to summarize subsets
• We want the measures to represent decision/prediction behavior
  – Summarize a subset using the "model" built on it
  – Big difference from regular OLAP!
One Sentence Summary
• Take OLAP data cubes, and keep everything the same except that we change the meaning of the cell values to represent decision/prediction behavior
  – The idea is simple, but it leads to interesting and promising data mining tools
Example (1/5): Regular OLAP

Goal: Look for patterns of unusually high numbers of applications

Fact table D (Z: Dimensions, Y: Measure):

  Location   Time     # of App.
  …          …        …
  AL, USA    Dec, 04  2
  …          …        …
  WY, USA    Dec, 04  3

Cell value: Number of loan applications

[Figure: a cube at the [Country, Month] level (CA and USA by month, 2003–2004); rolling up gives coarser regions (e.g., [Country, Year]); drilling down gives finer regions (e.g., states AL … WY of the USA and provinces AB … YT of CA).]
Example (2/5): Decision Analysis

Goal: Analyze a bank's loan decision process w.r.t. two dimensions: Location and Time

Fact table D (Z: Dimensions, X: Predictors, Y: Class):

  Location   Time     Race   Sex  …  Approval
  AL, USA    Dec, 04  White  M    …  Yes
  …          …        …      …    …  …
  WY, USA    Dec, 04  Black  F    …  No

A combination of dimension values selects a cube subset of D, on which a model h(X; σZ(D)) (e.g., a decision tree) is built.

[Figure: dimension hierarchies — Location: All → Country (Japan, USA, Norway) → State (AL … WY); Time: All → ….]
Example (3/5): Questions of Interest

• Goal: Analyze a bank's loan decision process with respect to two dimensions: Location and Time
• Target: Find discriminatory loan decisions
• Questions:
  – Are there locations and times when the decision making was similar to a set of discriminatory decision examples (or similar to a given discriminatory decision model)?
  – Are there locations and times during which Race or Sex is an important factor of the decision process?
Example (4/5): Prediction Cube

            2004              2003
        Jan  …   Dec      Jan  …   Dec    …
  CA    0.4  0.8  0.9     0.6  0.8  …     …
  USA   0.2  0.3  0.5     …    …    …     …
  …     …    …    …       …    …    …     …

For each cell, e.g., [USA, Dec 04]:

1. Build a model h(X; σ[USA, Dec 04](D)) (e.g., a decision tree) using the data σ[USA, Dec 04](D):

  Location   Time     Race   Sex  …  Approval
  AL, USA    Dec, 04  White  M    …  Y
  …          …        …      …    …  …
  WY, USA    Dec, 04  Black  F    …  N

2. Evaluate that model. Measure in a cell:
  • Accuracy of the model
  • Predictiveness of Race measured based on that model
  • Similarity between that model and a given model
Example (5/5): Prediction Cube

Cell value: Predictiveness of Race

            2004              2003
        Jan  …   Dec      Jan  …   Dec    …
  CA    0.4  0.1  0.3     0.6  0.8  …     …
  USA   0.7  0.4  0.3     0.3  …    …     …
  …     …    …    …       …    …    …     …

[Figure: drilling down gives cells at finer regions (states AL … WY of the USA and provinces AB … YT of CA by month); rolling up gives coarser cells (e.g., [Country, Year]).]
Outline
• Motivating example
• Definition of prediction cubes
• Efficient prediction cube materialization
• Experimental results
• Conclusion
Prediction Cubes
• User interface: OLAP data cubes
  – Dimensions, hierarchies, roll up and drill down
• Values in the cells:
  – Accuracy        → Test-set accuracy cube
  – Similarity      → Model-similarity cube
  – Predictiveness  → Predictiveness cube
Test-Set Accuracy Cube

Given:
- Data table D
- Test set ∆

Data table D:

  Location   Time     Race   Sex  …  Approval
  AL, USA    Dec, 04  White  M    …  Yes
  …          …        …      …    …  …
  WY, USA    Dec, 04  Black  F    …  No

Test set ∆:

  Race   Sex  …  Approval
  White  F    …  Yes
  …      …    …  …
  Black  M    …  No

Build a model on each cube subset of D, apply it to ∆, and record the accuracy of its predictions as the cell value.

[Figure: the resulting accuracy cube at level [Country, Month]; e.g., the decision model of USA during Dec 04 had high accuracy when applied to ∆.]
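To make the procedure concrete, here is a minimal sketch of computing a test-set accuracy cube at one level. All data are toy values, and a simple per-(Race, Sex) majority-class model stands in for the decision tree; this is an illustration, not the paper's implementation:

```python
from collections import Counter, defaultdict

# Toy fact table D: (location, time, race, sex, approval)
D = [
    ("USA", "Dec04", "White", "M", "Yes"),
    ("USA", "Dec04", "Black", "F", "No"),
    ("USA", "Dec04", "White", "F", "Yes"),
    ("CA",  "Dec04", "White", "M", "Yes"),
    ("CA",  "Dec04", "Black", "M", "Yes"),
]
# Test set Delta: (race, sex, approval)
Delta = [("White", "M", "Yes"), ("Black", "F", "No")]

def train(subset):
    """Toy 'model': per-(race, sex) majority class, with a global fallback."""
    by_x, overall = defaultdict(Counter), Counter()
    for _, _, race, sex, y in subset:
        by_x[(race, sex)][y] += 1
        overall[y] += 1
    def h(x):
        votes = by_x.get(x)
        return (votes or overall).most_common(1)[0][0]
    return h

def accuracy_cube(D, Delta):
    """One cell per (location, time); value = that cell's model accuracy on Delta."""
    cells = defaultdict(list)
    for row in D:
        cells[(row[0], row[1])].append(row)
    cube = {}
    for z, subset in cells.items():
        h = train(subset)
        correct = sum(h((race, sex)) == y for race, sex, y in Delta)
        cube[z] = correct / len(Delta)
    return cube

print(accuracy_cube(D, Delta))
```

Rolling up would group the same rows by coarser dimension values (e.g., by location only) before training.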
Model-Similarity Cube

Given:
- Data table D
- Target model h0(X)
- Test set ∆ w/o labels

Data table D:

  Location   Time     Race   Sex  …  Approval
  AL, USA    Dec, 04  White  M    …  Yes
  …          …        …      …    …  …
  WY, USA    Dec, 04  Black  F    …  No

Test set ∆ (w/o labels):

  Race   Sex  …
  White  F    …
  …      …    …
  Black  M    …

Build a model on each cube subset of D, apply both it and h0(X) to ∆, and record the fraction of matching predictions (the similarity) as the cell value.

[Figure: the resulting similarity cube at level [Country, Month]; e.g., the loan decision process in USA during Dec 04 was similar to a discriminatory decision model h0(X).]
Predictiveness Cube

Given:
- Data table D
- Attributes V
- Test set ∆ w/o labels

Data table D:

  Location   Time     Race   Sex  …  Approval
  AL, USA    Dec, 04  White  M    …  Yes
  …          …        …      …    …  …
  WY, USA    Dec, 04  Black  F    …  No

On each cube subset, build two models, h(X) and h(X−V); the cell value is the predictiveness of V, i.e., the fraction of ∆ on which the two models' predictions differ.

            2004              2003
        Jan  …   Dec      Jan  …   Dec    …
  CA    0.4  0.2  0.3     0.6  0.5  …     …
  USA   0.2  0.3  0.9     …    …    …     …
  …     …    …    …       …    …    …     …

[Figure: at level [Country, Month], a high cell value indicates, e.g., that Race was an important factor of the loan approval decision in USA during Dec 04.]
Outline
• Motivating example
• Definition of prediction cubes
• Efficient prediction cube materialization
• Experimental results
• Conclusion
One Sentence Summary
• Reduce prediction cube computation to data cube computation
  – Somehow represent a data-mining model as a distributive or algebraic (bottom-up computable) aggregate function, so that data-cube techniques can be directly applied
Full Materialization

Full materialization table:

  Level           Location  Time  Cell Value
  [All,All]       ALL       ALL   0.7
  [Country,All]   CA        ALL   0.4
                  …         ALL   …
                  USA       ALL   0.9
  [All,Year]      ALL       1985  0.8
                  ALL       …     …
                  ALL       2004  0.3
  [Country,Year]  CA        1985  0.9
                  CA        1986  0.2
                  …         …     …
                  USA       2004  0.8

[Figure: the corresponding cube views at levels [Country, Year], [All, Year], [Country, All], and [All, All].]
Bottom-Up Data Cube Computation

Cell values: Numbers of loan applications

Base table at [Country, Year]:

           1985  1986  1987  1988
  Norway   10    30    20    24
  …        23    45    14    32
  USA      14    32    42    11

Roll up Country → All:

           1985  1986  1987  1988
  All      47    107   76    67

Roll up Year → All:

           All
  Norway   84
  …        114
  USA      99

Roll up both: [All, All] = 297
Functions on Sets

• Bottom-up computable functions: functions that can be computed using only summary information
• Distributive function: α(X) = F({α(X1), …, α(Xn)})
  – X = X1 ∪ … ∪ Xn and Xi ∩ Xj = ∅
  – E.g., Count(X) = Sum({Count(X1), …, Count(Xn)})
• Algebraic function: α(X) = F({G(X1), …, G(Xn)})
  – G(Xi) returns a fixed-length vector of values
  – E.g., Avg(X) = F({G(X1), …, G(Xn)})
    • G(Xi) = [Sum(Xi), Count(Xi)]
    • F({[s1, c1], …, [sn, cn]}) = Sum({si}) / Sum({ci})
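A quick check of these definitions in code, using Count (distributive) and Avg (algebraic) over a disjoint partition:

```python
# Distributive: Count over a disjoint partition combines by Sum.
partition = [[3, 1, 4], [1, 5], [9, 2, 6]]
whole = [v for part in partition for v in part]

count_whole = len(whole)
count_combined = sum(len(p) for p in partition)
assert count_whole == count_combined  # Count(X) = Sum({Count(Xi)})

# Algebraic: Avg needs a fixed-length summary G(Xi) = [Sum(Xi), Count(Xi)].
def G(part):
    return (sum(part), len(part))

def F(summaries):
    s = sum(si for si, _ in summaries)
    c = sum(ci for _, ci in summaries)
    return s / c

avg_whole = sum(whole) / len(whole)
assert F([G(p) for p in partition]) == avg_whole

print(count_whole, avg_whole)
```

Avg itself is not distributive (averages of averages are wrong for unequal part sizes), which is exactly why the fixed-length summary G is needed.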
Scoring Function

• Represent a model as a function of sets.
• Conceptually, a machine-learning model h(X; σZ(D)) is a scoring function Score(y, x; σZ(D)) that gives each class y a score on test example x
  – h(x; σZ(D)) = argmax_y Score(y, x; σZ(D))
  – Score(y, x; σZ(D)) ≈ p(y | x, σZ(D))
  – σZ(D): the set of training examples (a cube subset of D)
Bottom-Up Score Computation

• Key observations:
  – Observation 1: Score(y, x; σZ(D)) is a function of the cube subset σZ(D); if it is distributive or algebraic, the data-cube bottom-up technique can be directly applied
  – Observation 2: Having the scores for all the test examples and all the cells is sufficient to compute a prediction cube
    • Scores ⇒ predictions ⇒ cell values
    • Details depend on what each cell means (i.e., the type of prediction cube), but are straightforward
[Figure: two parallel cube lattices over Country × Year — one whose cells hold scores for the test examples, one whose cells hold the final cell values — rolled up from the base level to [All, All].]

1. Build a model for each lowest-level cell
2. Compute the scores using the data-cube bottom-up technique
   • Ob. 1: Distributive scoring function ⇒ bottom up
3. Use the scores to compute the cell values
   • Ob. 2: Having scores ⇒ having cell values
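A sketch of steps 2–3, assuming a distributive scoring function (scores of merged cells simply add, as for a kernel-density-style scorer). The cells, test example, and score values are all illustrative:

```python
# Base-level scores: base_scores[(country, year)][test_example][class] (toy numbers).
base_scores = {
    ("Norway", 1985): {"x1": {"Yes": 2.0, "No": 1.0}},
    ("USA",    1985): {"x1": {"Yes": 0.5, "No": 3.0}},
    ("Norway", 1986): {"x1": {"Yes": 1.0, "No": 1.0}},
    ("USA",    1986): {"x1": {"Yes": 1.5, "No": 0.5}},
}

def rollup(scores, keep):
    """Merge base cells whose kept coordinates match; distributive => scores add."""
    merged = {}
    for cell, per_x in scores.items():
        key = tuple(c if k else "All" for c, k in zip(cell, keep))
        slot = merged.setdefault(key, {})
        for x, per_y in per_x.items():
            tgt = slot.setdefault(x, {})
            for y, s in per_y.items():
                tgt[y] = tgt.get(y, 0.0) + s
    return merged

def predictions(scores):
    """Ob. 2: scores => predictions (argmax per test example) => cell values."""
    return {cell: {x: max(per_y, key=per_y.get) for x, per_y in per_x.items()}
            for cell, per_x in scores.items()}

all_year = rollup(base_scores, keep=(False, True))   # roll up to [All, Year]
print(predictions(all_year))
```

Only the base-level models are ever trained; every coarser cell's scores come from merging summaries, which is the source of the speedup.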
Machine-Learning Models
• Naïve Bayes:
  – Scoring function: algebraic
• Kernel-density-based classifier:
  – Scoring function: distributive
• Decision tree, random forest:
  – Neither distributive nor algebraic
• PBE: Probability-based ensemble (new)
  – Makes any machine-learning model distributive
  – An approximation
Probability-Based Ensemble

[Figure: decision trees built on the lowest-level cells ([WA, Jan 85] … [WA, Dec 85]) are combined into the PBE version of the decision tree on [WA, 85], approximating a decision tree built directly on [WA, 85].]
Probability-Based Ensemble

• Scoring function:

  h_PBE(x; σ_S(D)) = argmax_y Score_PBE(y, x; σ_S(D))

  Score_PBE(y, x; σ_S(D)) = Σ_{i ∈ S} Score_PBE(y, x; b_i(D))

  Score_PBE(y, x; b_i(D)) = h(y | x; b_i(D)) · g(b_i | x)

  – h(y | x; b_i(D)): model h's estimate of p(y | x, b_i(D))
  – g(b_i | x): a model that predicts the probability that x belongs to base subset b_i(D)
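A sketch of this scoring function with two hypothetical base cells; the base-model probabilities and the gate g are toy stand-ins, not learned models:

```python
# Score_PBE(y, x; S) = sum over base subsets b_i of h(y | x; b_i) * g(b_i | x)

def pbe_score(y, x, base_models, gate):
    """base_models[b](y, x) = h(y | x; b(D)); gate(b, x) = g(b | x)."""
    return sum(base_models[b](y, x) * gate(b, x) for b in base_models)

def pbe_predict(x, classes, base_models, gate):
    """h_PBE(x) = argmax_y Score_PBE(y, x)."""
    return max(classes, key=lambda y: pbe_score(y, x, base_models, gate))

# Two base cells (think [WA, Jan 85] and [WA, Feb 85]) with fixed class
# probabilities, and a gate that says x is equally likely in either cell.
base_models = {
    "jan85": lambda y, x: {"Yes": 0.9, "No": 0.1}[y],
    "feb85": lambda y, x: {"Yes": 0.2, "No": 0.8}[y],
}
gate = lambda b, x: 0.5

x = {"Race": "White", "Sex": "M"}
print(pbe_predict(x, ["Yes", "No"], base_models, gate))
```

Because the score is a plain sum over base cells, it is distributive: rolling up a cell just extends the sum over more base subsets.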
Outline
• Motivating example
• Definition of prediction cubes
• Efficient prediction cube materialization
• Experimental results
• Conclusion
Experiments

• Quality of PBE on 8 UCI datasets
  – The accuracy of the PBE version of a model is slightly worse (0 ~ 6%) than that of the model trained directly on the whole training data
• Efficiency of the bottom-up score computation technique
• Case study on demographic data

[Figure: PBE built from the base cells of [WA, 1985] vs. a single model trained directly on all of [WA, 1985].]
Efficiency of the Bottom-up Score Computation
• Machine-learning models:
  – J48: J48 decision tree
  – RF: Random forest
  – NB: Naïve Bayes
  – KDC: Kernel-density-based classifier
• Bottom-up method vs. exhaustive method:
  – PBE-J48, PBE-RF, NB, KDC vs. J48ex, RFex, NBex, KDCex
Synthetic Dataset

• Dimensions: Z1, Z2 and Z3
  – Z1 and Z2: All → {0, 1, …, n}
  – Z3: All → {A, B, C, D, E} → {0, 1, …, 9}
• Decision rule:

  Condition                Rule
  When Z1 > 1              Y = I(4X1 + 3X2 + 2X3 + X4 + 0.4X6 > 7)
  else when Z3 mod 2 = 0   Y = I(2X1 + 2X2 + 3X3 + 3X4 + 0.4X6 > 7)
  else                     Y = I(0.1X5 + X1 > 1)
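The decision rule above can be written directly as code (Z2 is a dimension but does not appear in the rule; I(·) is the indicator function). A sketch of the label generator:

```python
def label(z1, z3, x):
    """Y for one synthetic record; x = [X1, ..., X6] (1-indexed in the rule)."""
    X1, X2, X3, X4, X5, X6 = x
    if z1 > 1:
        return int(4*X1 + 3*X2 + 2*X3 + X4 + 0.4*X6 > 7)
    elif z3 % 2 == 0:
        return int(2*X1 + 2*X2 + 3*X3 + 3*X4 + 0.4*X6 > 7)
    else:
        return int(0.1*X5 + X1 > 1)

print(label(2, 0, [1, 1, 1, 1, 0, 0]))  # Z1 > 1 branch: 4+3+2+1 = 10 > 7
```

The rule makes the X→Y relationship depend on the dimension values, so different cube cells genuinely contain different decision models.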
Efficiency Comparison

[Figure: execution time (sec, 0–2500) vs. number of records (40K–200K), comparing the exhaustive methods (RFex, KDCex, NBex, J48ex) against the methods using bottom-up score computation (NB, KDC, PBE-RF, PBE-J48).]
Take-Home Messages
• Promising exploratory data analysis paradigm:
  – Use models to identify interesting subsets
  – Concentrate only on subsets in the cube space
    • Those are meaningful subsets
  – Precompute the results
  – Provide the users with an interactive tool
• A simple way to plug "something" into cube-style analysis:
  – Try to describe/approximate "something" by a distributive or algebraic function
Related Work: Building Models in OLAP

• Multi-dimensional regression [Chen, VLDB 02]
  – Goal: Detect changes of trends
  – Build linear regression models for cube cells
• Step-by-step regression in stream cube [Liu, PAKDD 03]
• Loglinear-based quasi cubes [Barbara, J. IIS 01]
  – Use loglinear models to approximately compress dense regions of a data cube
• NetCube [Margaritis, VLDB 01]
  – Build a Bayes net on the entire dataset to approximately answer count queries
Related Work: Advanced Cube-Style Analysis

• Cubegrades [Imielinski, J. DMKD 02]
  – Extend data cubes using ideas from association rules
  – How the measure changes when we roll up or drill down
• Constrained gradients in data cube [Dong, VLDB 01]
  – Find pairs of similar cell characteristics associated with big changes in measure
• User-cognizant multidimensional analysis [Sarawagi, VLDBJ 01]
  – Help users explore the most informative unvisited regions in a data cube using the max-entropy principle
Questions
What are Our Assumptions?
• Machine-learning models are good approximations of the true decision/prediction model
  – Evaluate accuracy
• The size of each base subset is large enough to build a good model
  – Future work: Find the proper levels of subsets to start from
• Model properties are evaluated by test sets
  – We did not consider looking at the models themselves
Why Test Set?

• To obtain quantitative model properties, we need a test set
• Question: Why let users provide test sets?
• Flexibility vs. ease of use
  – Flexibility: The user can specify the p(X) that he/she is interested in (e.g., focus on rich people)
    • E.g., compare p1(Y | X, σ(D)) with p2(Y | X, σ(D))
  – Simple fix:
    • Sample the test set from the dataset
    • Cross-validation cube
Why is PBE not that Good?

• If the probability estimates of the base models are correct, then PBE is optimal
• Why is it not optimal in reality?
  – The probability estimation method is not good
  – The training datasets for the base models are too small
• Fix:
  – Improve the probability estimation method
  – Build models for some non-base-level cells
Feature Selection vs. Prediction Cubes

• Feature selection:
  – Goal: Find the best k predictive attributes
  – Search space: 2^n (n: number of attributes)
• Prediction cubes:
  – Goal: Find interesting cube cells
  – Search space: 2^d (d: number of dimension attributes)
  – You may use an accuracy cube to find predictive dimension attributes, but that is not our goal
  – For the predictiveness cube, the attributes whose predictiveness is of interest are given
Why Do We Need Efficient Precomputation?

• Several hours vs. several days vs. several months
• For upper-level cells, if the machine-learning algorithm is not scalable and we do not have a bottom-up method, we may never get the result
Backup Slides
Theoretical Comparison

• Training complexity:
  – Exhaustive: Σ_{[l1,…,ld] ∈ Levels} |Z1^(l1)| × … × |Zd^(ld)| × f_train(n_[l1,…,ld])
  – Bottom-up: |Z1^(1)| × … × |Zd^(1)| × f_train(n_[1,…,1])

[Figure: example hierarchies — Z1 = Location: Z1(3) = All; Z1(2) = State (MA, WI, MN); Z1(1) = City (Madison, WI; Green Bay, WI). Z2 = Time: Z2(3) = All; Z2(2) = Year (85, 86, …, 04); Z2(1) = Month (Jan., 86, …, Dec., 86).]
Theoretical Comparison

• Testing complexity:
  – Exhaustive: Σ_{[l1,…,ld] ∈ Levels} |Z1^(l1)| × … × |Zd^(ld)| × f_test(n_[l1,…,ld])
  – Bottom-up: |Z1^(1)| × … × |Zd^(1)| × f_test(n_[1,…,1]) + Σ_{[l1,…,ld] ∈ Levels − {[1,…,1]}} |Z1^(l1)| × … × |Zd^(ld)| × c
    (c: the per-cell cost of combining already-computed scores)

[Figure: the lattice of levels from [1,1] up to [3,3].]
Test-Set-Based Model Evaluation

• Given a set-aside test set ∆ of schema [X, Y]:
  – Accuracy of h(X):
    • The percentage of ∆ that is correctly classified
  – Similarity between h1(X) and h2(X):
    • The percentage of ∆ that is given the same class labels by h1(X) and h2(X)
  – Predictiveness of V ⊆ X (based on h(X)):
    • The difference between h(X) and h(X−V) measured by ∆; i.e., the percentage of ∆ that is predicted differently by h(X) and h(X−V)
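A minimal sketch of these three measures, with toy models given as plain functions (the data and models are illustrative):

```python
def accuracy(h, delta):
    """Fraction of the labeled test set delta classified correctly by h."""
    return sum(h(x) == y for x, y in delta) / len(delta)

def similarity(h1, h2, delta_x):
    """Fraction of (unlabeled) test examples on which h1 and h2 agree."""
    return sum(h1(x) == h2(x) for x in delta_x) / len(delta_x)

def predictiveness(h_full, h_without_v, delta_x):
    """Fraction predicted differently when V is removed: distance(h(X), h(X-V))."""
    return 1 - similarity(h_full, h_without_v, delta_x)

# Toy models over x = (race, sex): one uses race, one ignores it.
h  = lambda x: "Yes" if x[0] == "White" else "No"
hv = lambda x: "Yes"   # stands in for h(X - {Race}); here a constant model

delta = [(("White", "M"), "Yes"), (("Black", "F"), "No"),
         (("White", "F"), "Yes"), (("Black", "M"), "Yes")]
delta_x = [x for x, _ in delta]

print(accuracy(h, delta))              # h misclassifies one of four examples
print(predictiveness(h, hv, delta_x))  # h and hv disagree on the Black examples
```

Note that similarity and predictiveness need only predictions, which is why ∆ can be unlabeled for those two cubes.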
Model Accuracy

• Test-set accuracy (TS-accuracy):
  – Given a set-aside test set ∆ with schema [X, Y]:

    accuracy(h(X; D) | ∆) = (1/|∆|) Σ_{(x, y) ∈ ∆} I(h(x; D) = y)

    • |∆|: the number of examples in ∆
    • I(Ψ) = 1 if Ψ is true; otherwise, I(Ψ) = 0
• Alternative: Cross-validation accuracy
  – This will not be discussed further!
Model Similarity

• Prediction similarity (or distance):
  – Given a set-aside test set ∆ with schema X:

    similarity(h1(X), h2(X)) = (1/|∆|) Σ_{x ∈ ∆} I(h1(x) = h2(x))

    distance(h1(X), h2(X)) = 1 − similarity(h1(X), h2(X))

• Similarity between p_h1(Y | X) and p_h2(Y | X):

    KL-distance = (1/|∆|) Σ_{x ∈ ∆} Σ_y p_h1(y | x) log( p_h1(y | x) / p_h2(y | x) )

  – p_hi(Y | X): the class probability estimated by hi(X)
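A sketch of the KL-based variant, with toy class-probability estimators standing in for p_h1 and p_h2:

```python
import math

def kl_distance(p1, p2, delta_x, classes):
    """Average KL divergence of p1(.|x) from p2(.|x) over the test set."""
    total = 0.0
    for x in delta_x:
        total += sum(p1(y, x) * math.log(p1(y, x) / p2(y, x)) for y in classes)
    return total / len(delta_x)

# Toy class-probability estimators (constant in x for simplicity).
p1 = lambda y, x: {"Yes": 0.8, "No": 0.2}[y]
p2 = lambda y, x: {"Yes": 0.5, "No": 0.5}[y]

print(kl_distance(p1, p2, ["x1", "x2"], ["Yes", "No"]))
```

Unlike the 0/1 prediction similarity, the KL variant is sensitive to how confident the two models are, not just to whether their argmax labels agree.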
Attribute Predictiveness

• Predictiveness of V ⊆ X (based on h(X)):
  – PD-predictiveness: distance(h(X), h(X − V))
  – KL-predictiveness: KL-distance(h(X), h(X − V))
• Alternative: accuracy(h(X)) − accuracy(h(X − V))
  – This will not be discussed further!
Target Patterns

• Find subset σ(D) such that h(X; σ(D)) has high prediction accuracy on a test set ∆
  – E.g., the loan decision process in 2003's WI is similar to a set ∆ of discriminatory decision examples
• Find subset σ(D) such that h(X; σ(D)) is similar to a given model h0(X)
  – E.g., the loan decision process in 2003's WI is similar to a discriminatory decision model h0(X)
• Find subset σ(D) such that V is predictive on σ(D)
  – E.g., Race is an important factor of the loan approval decision in 2003's WI
Test-Set Accuracy
• We would like to discover:
  – The loan decision process in 2003's WI is similar to a set of problematic decision examples
• Given:
  – Data table D: the loan decision dataset
  – Test set ∆: the set of problematic decision examples
• Goal:
  – Find subset σ_{Loc,Time}(D) such that h(X; σ_{Loc,Time}(D)) has high prediction accuracy on ∆
Model Similarity
• We would like to discover:
  – The loan decision process in 2003's WI is similar to a problematic decision model
• Given:
  – Data table D: the loan decision dataset
  – Model h0(X): the problematic decision model
• Goal:
  – Find subset σ_{Loc,Time}(D) such that h(X; σ_{Loc,Time}(D)) is similar to h0(X)
Attribute Predictiveness
• We would like to discover:
  – Race is an important factor of the loan approval decision in 2003's WI
• Given:
  – Data table D: the loan decision dataset
  – Attribute V of interest: Race
• Goal:
  – Find subset σ_{Loc,Time}(D) such that h(X; σ_{Loc,Time}(D)) is very different from h(X − V; σ_{Loc,Time}(D))
Model-Based Subset Analysis
• Given: A data table D with schema [Z, X, Y]
  – Z: Dimension attributes, e.g., {Location, Time}
  – X: Predictor attributes, e.g., {Race, Sex, …}
  – Y: Class-label attribute, e.g., Approval

Data table D:

  Location   Time     Race   Sex  …  Approval
  AL, USA    Dec, 04  White  M    …  Yes
  …          …        …      …    …  …
  WY, USA    Dec, 04  Black  F    …  No
Model-Based Subset Analysis
  Z: Dimension         X: Predictor       Y: Class
  Location   Time      Race   Sex  …      Approval
  AL, USA    Dec, 04   White  M    …      Yes
  WY, USA    Dec, 04   Black  F    …      No
  …          …         …      …    …      …
  (σ[USA, Dec 04](D): the subset of rows selected by Location = USA, Time = Dec 04)

• Goal: To understand the relationship between X and Y on different subsets σZ(D) of the data D
  – Relationship: p(Y | X, σZ(D))
• Approach:
  – Build a model h(X; σZ(D)) ≈ p(Y | X, σZ(D))
  – Evaluate h(X; σZ(D))
    • Accuracy, model similarity, predictiveness
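A minimal sketch of this approach, using a simple frequency-count estimate of p(Y | X, σZ(D)) as a stand-in for whatever model h the analyst would actually train; the records and attribute values are invented:

```python
from collections import Counter, defaultdict

def build_h(records, x_attrs, y_attr):
    """Estimate p(y | x) on a data subset by frequency counts
    (a stand-in for an arbitrary trained model h)."""
    counts = defaultdict(Counter)
    for r in records:
        counts[tuple(r[a] for a in x_attrs)][r[y_attr]] += 1
    def h(x):
        c = counts[tuple(x[a] for a in x_attrs)]
        total = sum(c.values())
        return {y: n / total for y, n in c.items()} if total else {}
    return h

# Invented records following the slide's schema [Z, X, Y]
D = [
    {"Location": "WI", "Time": "2003", "Race": "White", "Approval": "Yes"},
    {"Location": "WI", "Time": "2003", "Race": "White", "Approval": "Yes"},
    {"Location": "WI", "Time": "2003", "Race": "Black", "Approval": "No"},
    {"Location": "WI", "Time": "2003", "Race": "Black", "Approval": "Yes"},
]
# sigma_Z(D): select the subset for one dimension-value combination
subset = [r for r in D if r["Location"] == "WI" and r["Time"] == "2003"]
h = build_h(subset, ["Race"], "Approval")
print(h({"Race": "White"}))  # {'Yes': 1.0}
```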
Dimension and Level
• Each dimension has a hierarchy of levels:
  – Z1 = Location: Z1(1) = City (e.g., Madison, WI; Green Bay, WI) → Z1(2) = State (e.g., MA, WI, MN) → Z1(3) = All
  – Z2 = Time: Z2(1) = Month (e.g., Jan. 86, Dec. 86) → Z2(2) = Year (e.g., 85, 86, 04) → Z2(3) = All
• The level combinations [Z1 level, Z2 level] form a lattice:
  – Finest: [1,1] = [City, Month]
  – Coarsest: [3,3] = [All, All]
  – In between: [1,2] = [City, Year], [2,1] = [State, Month], [1,3] = [City, All], [3,1] = [All, Month], [2,2] = [State, Year], [2,3] = [State, All], [3,2] = [All, Year]
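The nine level combinations can be enumerated mechanically as the cross product of the two per-dimension hierarchies; a small sketch:

```python
from itertools import product

location_levels = ["City", "State", "All"]   # Z1(1), Z1(2), Z1(3)
time_levels     = ["Month", "Year", "All"]   # Z2(1), Z2(2), Z2(3)

# Each pair of levels is one node of the lattice, from the finest
# [City, Month] up to the coarsest [All, All].
lattice = list(product(location_levels, time_levels))
print(len(lattice))  # 9
```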
Example: Full Materialization
[Figure: full materialization builds one cube for each level combination in the lattice — from the single-cell [All, All] cube down to the finest-grained [City, Month] cube, with one cell per dimension-value combination at each granularity]
Scoring Function
• Conceptually, a machine-learning model h(X; S) is a scoring function Score(y, x; S) that gives each class y a score on a test example x
  – h(x; S) = argmax_y Score(y, x; S)
  – Score(y, x; S) ≈ p(y | x, S)
  – S: A set of training examples
[Figure: a model h(X; S), trained on a set S of loan records (Location, Time, Race, Sex, …, Approval), maps a test example x to class scores, e.g., [Yes: 80%, No: 20%]]
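The argmax relationship can be sketched directly; the 80%/20% scores mirror the slide's example, and the binary class set and the name `classify` (the slide calls it h) are assumptions:

```python
def classify(x, score, classes=("Yes", "No")):
    """h(x; S) = argmax_y Score(y, x; S): pick the highest-scoring class."""
    return max(classes, key=lambda y: score(y, x))

# Hypothetical scores approximating p(y | x, S) for one test example
score = lambda y, x: {"Yes": 0.8, "No": 0.2}[y]
print(classify({"Race": "White"}, score))  # Yes
```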
Bottom-Up Score Computation
• Base cells: The finest-grained (lowest-level) cells in a cube
• Base subsets bi(D): The lowest-level data subsets
  – The subset of data records in a base cell is a base subset
• Properties:
  – D = ∪i bi(D) and bi(D) ∩ bj(D) = ∅
  – Any subset σS(D) of D that corresponds to a cube cell is the union of some base subsets
  – Notation: σS(D) = bi(D) ∪ bj(D) ∪ bk(D), where S = {i, j, k}
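A sketch of forming the base subsets by grouping records on their finest-grained dimension values (the records themselves are invented); the partition properties above fall out of the grouping:

```python
from collections import defaultdict

def base_subsets(D, dims):
    """Partition D into base subsets bi(D), one per finest-grained cell."""
    cells = defaultdict(list)
    for r in D:
        cells[tuple(r[d] for d in dims)].append(r)
    return cells

# Invented records
D = [
    {"Location": "WI", "Time": "1985", "Approval": "Yes"},
    {"Location": "WI", "Time": "1985", "Approval": "No"},
    {"Location": "WY", "Time": "1985", "Approval": "Yes"},
]
cells = base_subsets(D, ["Location", "Time"])
# The base subsets are disjoint and their union is D
assert sum(len(b) for b in cells.values()) == len(D)
print(sorted(cells))  # [('WI', '1985'), ('WY', '1985')]
```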
Bottom-Up Score Computation
• Data subset: σS(D) = ∪i∈S bi(D)
• Scores: Score(y, x; σS(D)) = F({Score(y, x; bi(D)) : i ∈ S})
[Figure: domain and lattice — at level [State, Year], base cells such as (WA, 1985) = b1(D), (WI, 1985) = b2(D), (WY, 1985) = b3(D) union into σS(D) at (All, 1985); correspondingly, the per-base scores Score(y, x; bi(D)) are aggregated into Score(y, x; σS(D)) at level [All, Year]]
Decomposable Scoring Function
• Let σS(D) = ∪i∈S bi(D)
  – bi(D) is a base (lowest-level) subset
• Distributively decomposable scoring function:
  – Score(y, x; σS(D)) = F({Score(y, x; bi(D)) : i ∈ S})
  – F is a distributive aggregate function
• Algebraically decomposable scoring function:
  – Score(y, x; σS(D)) = F({G(y, x; bi(D)) : i ∈ S})
  – F is an algebraic aggregate function
  – G(y, x; bi(D)) returns a fixed-length vector of values
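The distinction mirrors classic OLAP aggregates: SUM is distributive, while AVG is only algebraic, since it must be rolled up from the fixed-length vector (sum, count) of each base cell rather than from per-cell averages. A sketch with invented per-example scores:

```python
# Distributive: per-cell sums roll up directly.
def f_sum(partials):
    return sum(partials)

# Algebraic: an average cannot be rolled up from per-cell averages, but it
# can from the fixed-length vector G = (sum, count) of each base subset.
def g_vec(cell_scores):
    return (sum(cell_scores), len(cell_scores))

def f_avg(gs):
    total = sum(s for s, _ in gs)
    count = sum(n for _, n in gs)
    return total / count

cell1, cell2 = [0.9, 0.7], [0.2]            # invented per-example scores
print(f_avg([g_vec(cell1), g_vec(cell2)]))  # ≈ 0.6, the average over all three
```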
Algorithm
• Input: The dataset D and test set ∆
• For each lowest-level cell, which contains the data bi(D):
  – Build a model on bi(D)
  – For each x ∈ ∆ and each class y, compute:
    • Score(y, x; bi(D)), if distributive
    • G(y, x; bi(D)), if algebraic
• Use standard data-cube computation techniques to compute the scores in a bottom-up manner (by Observation 2)
• Compute the cell values from the scores (by Observation 1)
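For a distributive scoring function, the bottom-up step is an ordinary cube roll-up over the base-cell scores; the cells, scores, and the boolean level encoding below are invented for the sketch:

```python
from collections import defaultdict

# Hypothetical per-base-cell scores Score(y, x; bi(D)) for one fixed test
# example x and class y, keyed by the finest-grained (city, month) cell
base_scores = {
    ("Madison",  "Jan-85"): 0.8,
    ("Madison",  "Feb-85"): 0.6,
    ("GreenBay", "Jan-85"): 0.4,
}

def roll_up(generalize_loc, generalize_time):
    """Aggregate base-cell scores bottom-up for a distributive F (here: sum)."""
    cells = defaultdict(float)
    for (loc, t), s in base_scores.items():
        key = ("All" if generalize_loc else loc,
               "All" if generalize_time else t)
        cells[key] += s
    return dict(cells)

# Scores at level [All, Month], computed from base scores without retraining
print(roll_up(True, False))
```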
Probability-Based Ensemble
• Scoring function:
  – hPBE(x; σS(D)) = argmax_y ScorePBE(y, x; σS(D))
  – ScorePBE(y, x; σS(D)) = Σi∈S ScorePBE(y, x; bi(D))
  – ScorePBE(y, x; bi(D)) = h(y | x; bi(D)) · g(bi | x)
• where:
  – h(y | x; bi(D)): Model h's estimate of p(y | x, bi(D))
  – g(bi | x): A model that predicts the probability that x belongs to base subset bi(D)
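A sketch of the PBE ensemble with two invented base models h(y | x; bi(D)) and a hand-coded gating model g(bi | x); in practice both would be trained:

```python
def score_pbe(y, x, base_models, gate):
    """ScorePBE(y, x; sigma_S(D)) = sum_i h(y | x; bi(D)) * g(bi | x)."""
    return sum(h(y, x) * gate(i, x) for i, h in base_models.items())

def h_pbe(x, base_models, gate, classes=("Yes", "No")):
    """hPBE(x) = argmax_y ScorePBE(y, x)."""
    return max(classes, key=lambda y: score_pbe(y, x, base_models, gate))

# Two invented base models and a hand-coded gate g(bi | x)
base_models = {
    1: lambda y, x: {"Yes": 0.9, "No": 0.1}[y],
    2: lambda y, x: {"Yes": 0.2, "No": 0.8}[y],
}
gate = lambda i, x: {1: 0.7, 2: 0.3}[i]

print(h_pbe({}, base_models, gate))  # Yes (score ≈ 0.69 vs. ≈ 0.31)
```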
Optimality of PBE
• ScorePBE(y, x; σS(D)) = c ⋅ p(y | x, x∈σS(D))
( )( )∑
∑∑
∈
∈
∈
⋅⋅=
∈⋅∈⋅=
∈⋅=
∈⋅=∈∈
=
∈
S
S
S
S
S
S
S
xx
xxxx
xx
xxxxxx
xx
i ii
i ii
i i
bgbyhz
bpbypz
bypz
ypzpyp
yp
)|())(;|(
)|)(()),(|(
)|)(,(
)|)(,( )|)(()|)(,(
))(,|(
D
DD
D
DDD
D
σσσ
σ
[ bi(D)’s partitions σS(D)]
Efficiency Comparison
[Chart: execution time (0–700) vs. dataset size (200K–1M records) for J48-PBE, RF-PBE, KDC, and NB]
Where Is the Time Spent?
[Chart: breakdown of execution time (0–100%) into training, testing, and other, for J48-PBE, RF-PBE, KDC, and NB at 200K and 1M records]
Accuracy of PBE
• Goal:
  – To compare PBE with the gold standard
    • PBE: A set of n J48s/RFs, each trained on a small partition of the whole dataset
    • Gold standard: A single J48/RF trained on the whole dataset
  – To understand how the number of base classifiers in a PBE affects its accuracy
• Datasets:
  – Eight UCI datasets
Accuracy of PBE: Adult Dataset
[Chart: accuracy (80–100) vs. number of base classifiers in a PBE (2–20) for RF, J48, RF-PBE, and J48-PBE]
Accuracy of PBE: Nursery Dataset
[Chart: accuracy (80–100) vs. number of base classifiers in a PBE (2–20) for RF, J48, RF-PBE, and J48-PBE]
Accuracy of PBE
Error = the average absolute difference between a ground-truth cell value and the corresponding cell value computed by PBE
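This error measure is straightforward to compute; the cell names and values below are invented:

```python
def pbe_error(truth, approx):
    """Average absolute difference between ground-truth cell values and
    the corresponding cell values computed by PBE."""
    return sum(abs(truth[c] - approx[c]) for c in truth) / len(truth)

truth  = {"cell1": 0.80, "cell2": 0.50}   # invented ground-truth cell values
approx = {"cell1": 0.78, "cell2": 0.55}   # invented PBE approximations
print(pbe_error(truth, approx))  # ≈ (0.02 + 0.05) / 2 = 0.035
```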
[Charts: error vs. number of base models in a PBE for RF-PBE and J48-PBE — Flat Dataset (1–10,000 base models, error 0–0.16) and Deep Dataset (1–1,000 base models, error 0–0.14)]