Page 1: Scorpion - GitHub Pages, sirrice.github.io/files/talks/scorpion_vldb13.pdf. Scorpion: Explaining Away Outliers in Aggregate Queries, eugene wu and sam madden, MIT

Scorpion: Explaining Away Outliers in Aggregate Queries

eugene wu and sam madden

MIT

http://springfieldpunx.blogspot.com/2010/11/mortal-kombat-ninjas-scorpion.html

Page 8:

Table

Split Aggregate Visualize

Pages 9-14:

SELECT sum(cost) FROM expenses GROUP BY country

[Bar chart: Expenses by country (USA, Italy, China)]

Given: outlier and normal results

Understand why. What input properties

caused the outliers?
most caused the outliers?
caused outliers but didn’t affect normal outputs?
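
To make the running query concrete, here is a minimal Python sketch of the same group-by aggregation. The rows, the `desc` column values, and the function name are illustrative assumptions, not data from the talk.

```python
from collections import defaultdict

# Hypothetical expense rows (country, description, cost); not from the slides.
expenses = [
    ("USA", "travel", 120.0),
    ("USA", "toilets", 900.0),
    ("Italy", "travel", 110.0),
    ("China", "travel", 130.0),
]

def sum_cost_by_country(rows):
    """Mimics SELECT sum(cost) FROM expenses GROUP BY country."""
    totals = defaultdict(float)
    for country, _desc, cost in rows:
        totals[country] += cost
    return dict(totals)

totals = sum_cost_by_country(expenses)
```

In this made-up data the USA total is far larger than the other two, which is the kind of outlier result the talk asks the system to explain.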

Page 15:

Can’t Touch This

Pages 16-24:

Provenance

Data!

SELECT SUM(cost) FROM sam’s bank account

$$$

Darn! Ya caught me

http://weknowmemes.com/2012/04/whats-the-point/

Filter for “most influential”

Pages 25-29:

Provenance

Faceting

Dimensionality :(

Dealing with multiple outliers?

http://www.perceptualedge.com/articles/Whitepapers/Three_Blind_Men.pdf

Scorpion!

Pages 30-34:

[Bar chart: Expenses by country (USA, Italy, China)]

Understand Why

Given: outlier and normal results

Find: predicates correlated with outliers, e.g. Desc = “toilets”

s.t. removing the tuples that match the predicate from the inputs “fixes” outliers & maintains normal results
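
The problem statement can be illustrated with a small Python sketch: remove the tuples matching a predicate, re-run the aggregate, and compare outlier and normal groups. The rows and helper names below are hypothetical; only the predicate Desc = “toilets” comes from the slides.

```python
# Hypothetical rows (country, description, cost); not from the slides.
rows = [
    ("USA", "travel", 120.0),
    ("USA", "toilets", 900.0),
    ("Italy", "travel", 110.0),
    ("China", "travel", 130.0),
]

def sum_by_group(rows):
    """sum(cost) per country."""
    totals = {}
    for country, _desc, cost in rows:
        totals[country] = totals.get(country, 0.0) + cost
    return totals

def remove_matching(rows, predicate):
    """T - p(T): keep only the tuples the predicate does not select."""
    return [row for row in rows if not predicate(row)]

def is_toilets(row):
    return row[1] == "toilets"

before = sum_by_group(rows)
after = sum_by_group(remove_matching(rows, is_toilets))
```

Here the outlier group (USA) drops to the level of the others while Italy and China are unchanged, which is what "fixes outliers & maintains normal results" asks for.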

Page 35:

Formalize “influence” as metric
Predicate search heuristics
Some results

Pages 36-44:

[Figure: plot of the output over the input T (axis ticks 25, 50 and 12pm), with the subset p(T) selected by the predicate Desc = “toilet” marked; a second panel shows the plot for T – p(T). Labels: Δoutput, |p(T)|]

Influence Metric

$$\frac{\Delta\text{output}}{|p(T)|}$$
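
A minimal sketch of this basic metric for a single group, assuming Δoutput means "output on T minus output on T – p(T)" and |p(T)| is the number of matching tuples. The sign convention, the sample values, and the function name are assumptions for illustration.

```python
def influence(values, matches, agg=sum):
    """Δoutput / |p(T)| for one group.

    values:  the aggregated column for every tuple in T
    matches: booleans, True where the predicate p selects the tuple
    """
    removed = [v for v, m in zip(values, matches) if m]
    if not removed:
        return 0.0  # assumption: an empty p(T) has no influence
    kept = [v for v, m in zip(values, matches) if not m]
    delta_output = agg(values) - agg(kept)
    return delta_output / len(removed)

values = [10.0, 12.0, 400.0, 500.0]
matches = [False, False, True, True]
score = influence(values, matches)
```

With these numbers removing the two matching tuples changes the sum by 900, so the score is 900 / 2.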

Page 45:

Influence Metric

$$\frac{\Delta\text{output}}{|p(T)|}$$

Sensitivity Analysis

$$\frac{\Delta f(x)}{\Delta x}$$

Pages 46-51:

ΔOutput

$$\frac{\Delta\text{output}}{|p(T)|}$$

“High vs Low”

$$\frac{\Delta\text{output} \cdot V}{|p(T)|}$$

|p(T)|

$$\frac{\Delta\text{output} \cdot V}{|p(T)|^{c}}$$

ΔNormal

$$\frac{\Delta\text{outlier} \cdot V}{|p(T)|^{c}} - \Delta\text{Normal}$$

Multiple Outputs

$$\operatorname{mean}_{\text{outlier}} \frac{\Delta\text{outlier} \cdot V}{|p(T)|^{c}} - \max_{\text{normal}} \Delta\text{Normal}$$

influence(p) (ΔNormal relabeled ΔHold-out)

$$\text{influence}(p) = \operatorname{mean}_{\text{outlier}} \frac{\Delta\text{outlier} \cdot V}{|p(T)|^{c}} - \max_{\text{normal}} \Delta\text{Hold-out}$$
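
The final form of the metric can be sketched in Python as follows. Several details are assumptions rather than statements from the slides: V is taken to be +1 for an outlier that is too high and -1 for one that is too low, Δ is "original output minus output after removal", and the hold-out penalty uses the absolute change. All names and sample data are hypothetical.

```python
from statistics import mean

def delta(values, matches, agg=sum):
    """Change in the aggregate when the matching tuples are removed."""
    kept = [v for v, m in zip(values, matches) if not m]
    return agg(values) - agg(kept)

def outlier_term(values, matches, v, c):
    """(Δoutlier · V) / |p(T)|^c for one outlier group."""
    n_matched = sum(1 for m in matches if m)
    if n_matched == 0:
        return 0.0
    return delta(values, matches) * v / n_matched ** c

def influence(outliers, holdouts, c=1.0):
    """mean over outlier groups minus max over hold-out groups.

    outliers: list of (values, matches, v) triples
    holdouts: list of (values, matches) pairs
    """
    gains = [outlier_term(vals, m, v, c) for vals, m, v in outliers]
    # Assumption: any change to a hold-out result counts against p.
    penalties = [abs(delta(vals, m)) for vals, m in holdouts]
    return mean(gains) - max(penalties, default=0.0)

outliers = [([10.0, 12.0, 400.0, 500.0], [False, False, True, True], +1)]
holdouts = [([10.0, 11.0, 5.0], [False, False, True])]
score = influence(outliers, holdouts, c=1.0)
```

Taking the maximum over hold-outs means a predicate is penalized by the single normal result it disturbs most, while the mean over outliers rewards predicates that help across all flagged results.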

Page 52:

Formalize “influence” as metric
Predicate search heuristics
Some results

Pages 53-58:

$$p^{*} = \arg\max_{p \in \text{predicates}} \text{influence}(p)$$

O(agg(T-p(T)))

SUM({1,2,3,4,5}) = 15

p removes {4,5}

SUM({1,2,3}) = 6
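
As a rough illustration of the naive approach, the sketch below enumerates every conjunction of attribute = value clauses and re-runs the aggregate on T - p(T) for each one. The rows, attribute names, and the simple Δoutput / |p(T)| scoring are assumptions for the example, not the talk's implementation.

```python
from itertools import combinations, product

# Hypothetical rows; "desc" and "city" are made-up attributes.
rows = [
    {"desc": "travel", "city": "NYC", "cost": 10.0},
    {"desc": "travel", "city": "SF", "cost": 12.0},
    {"desc": "toilets", "city": "NYC", "cost": 400.0},
    {"desc": "toilets", "city": "SF", "cost": 500.0},
]
attrs = ["desc", "city"]

def candidate_predicates(rows, attrs):
    """All conjunctions of attr = value clauses; the count grows
    exponentially with the number of attributes."""
    domains = {a: sorted({r[a] for r in rows}) for a in attrs}
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            for vals in product(*(domains[a] for a in subset)):
                yield dict(zip(subset, vals))

def matches(row, pred):
    return all(row[a] == v for a, v in pred.items())

def influence(rows, pred):
    removed = [r for r in rows if matches(r, pred)]
    if not removed:
        return 0.0
    kept = [r for r in rows if not matches(r, pred)]
    total = sum(r["cost"] for r in rows)
    total_kept = sum(r["cost"] for r in kept)  # re-runs agg on T - p(T)
    return (total - total_kept) / len(removed)

candidates = list(candidate_predicates(rows, attrs))
best = max(candidates, key=lambda p: influence(rows, p))
```

On this data the highest score goes to the single most expensive tuple (desc = toilets and city = SF), which shows how dividing by |p(T)| alone favors very selective predicates.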

Pages 59-65:

$$p^{*} = \arg\max_{p \in \text{predicates}} \text{influence}(p)$$

O(exponential)

Operator Properties

Incrementally removable: O(agg(T-p(T))) becomes O(agg(p(T)))

SUM({1,2,3,4,5}) = 15

p removes {4,5}

15 - SUM({4,5}) = 6

Incrementally removable: SUM, COUNT, AVG, STDDEV

Not incrementally removable: MEDIAN, MODE
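
One way to picture the "incrementally removable" property is a constant-size state that can be subtracted. The sketch below keeps (count, sum, sum of squares), which is enough to recover SUM, COUNT, AVG and a population STDDEV after removing p(T) without rescanning T - p(T). The class and method names are hypothetical, and this is only one possible encoding of the property.

```python
import math

class RemovableStats:
    """Constant-size state: count, sum, and sum of squares."""

    def __init__(self, values=()):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0
        for v in values:
            self.n += 1
            self.total += v
            self.total_sq += v * v

    def minus(self, other):
        """State of T - p(T), computed from the states of T and p(T)."""
        out = RemovableStats()
        out.n = self.n - other.n
        out.total = self.total - other.total
        out.total_sq = self.total_sq - other.total_sq
        return out

    def sum(self):
        return self.total

    def count(self):
        return self.n

    def avg(self):
        return self.total / self.n

    def stddev(self):
        """Population standard deviation from the stored moments."""
        m = self.avg()
        return math.sqrt(max(self.total_sq / self.n - m * m, 0.0))

whole = RemovableStats([1, 2, 3, 4, 5])   # state of T, computed once
removed = RemovableStats([4, 5])          # state of p(T)
rest = whole.minus(removed)               # no scan of {1, 2, 3} needed
```

MEDIAN and MODE have no such fixed-size state in general, since the answer after a removal can depend on every remaining value.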

Pages 66-69:

$$p^{*} = \arg\max_{p \in \text{predicates}} \text{influence}(p)$$

O(exponential), O(agg(p(T)))

Operator Properties

Independent
Incrementally removable

[Figure: scale from Least influence to Most influence]

Pages 70-79:

$$p^{*} = \arg\max_{p \in \text{predicates}} \text{influence}(p)$$

O(exponential), O(agg(p(T)))

Operator Properties

Independent (Top Down)
Incrementally removable
Anti-monotonic (Bottom Up)

$$p' \subset p \implies \text{influence}(p') \le \text{influence}(p)$$
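
To show how an anti-monotonic score permits pruning, here is a small sketch that looks for all predicates whose score reaches a threshold. It is not the talk's Bottom Up algorithm. It assumes the score is the sum of the removed costs over non-negative values, which is anti-monotonic because a refinement of p selects a subset of p's tuples. Rows, attributes, and the threshold are made up.

```python
# Hypothetical rows; "desc" and "city" are made-up attributes.
rows = [
    {"desc": "travel", "city": "NYC", "cost": 10.0},
    {"desc": "travel", "city": "SF", "cost": 12.0},
    {"desc": "toilets", "city": "NYC", "cost": 400.0},
    {"desc": "toilets", "city": "SF", "cost": 500.0},
]
attrs = ["desc", "city"]

def matches(row, pred):
    return all(row[a] == v for a, v in pred.items())

def removed_sum(rows, pred):
    """Anti-monotonic score: SUM of non-negative costs selected by pred."""
    return sum(r["cost"] for r in rows if matches(r, pred))

def search(rows, attrs, threshold):
    domains = {a: sorted({r[a] for r in rows}) for a in attrs}
    results = []
    evaluated = 0
    # Each entry: (predicate, index of the next attribute that may be added).
    frontier = [({}, 0)]
    while frontier:
        pred, start = frontier.pop()
        for i in range(start, len(attrs)):
            for value in domains[attrs[i]]:
                refined = {**pred, attrs[i]: value}
                evaluated += 1
                score = removed_sum(rows, refined)
                if score >= threshold:
                    results.append((refined, score))
                    frontier.append((refined, i + 1))
                # Otherwise skip: every refinement selects a subset of these
                # tuples, so its score cannot exceed this one.
    return results, evaluated

results, evaluated = search(rows, attrs, threshold=300.0)
```

With the threshold at 300, the predicate desc = travel is rejected and its two refinements are never scored, so 6 of the 8 possible conjunctions are evaluated.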

Page 80:

Formalize “influence” as metric
Predicate search heuristics
Some results

Pages 81-84:

SELECT sum(Y) GROUP BY X

[Figure: sum(Y) plotted against X; further panels with axes Z and Y]

Pages 85-89:

[Plot: cost in seconds (y-axis ticks 10, 100, 1000) against thousand tuples / group (x-axis ticks 1K, 5K, 10K), with series Naive, Top down, Bottom up]

Page 90:

influence metric that is accessible to end-users for

Data cleaning
Data exploration
Provenance reduction

Pages 91-93:

scorpion

http://springfieldpunx.blogspot.com/2010/11/mortal-kombat-ninjas-scorpion.html

[email protected]

Page 94:

C-parameter

$$\frac{\Delta\text{output} \cdot V}{|p(T)|^{c}}$$

[Figure: two panels with axes Y and Z, labeled Low C and High C]
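
The effect of the exponent c can be seen with two made-up candidate predicates: a broad one that removes many tuples and changes the output a lot, and a narrow one that removes few tuples. The numbers and names below are illustrative assumptions only.

```python
def influence(delta_output, n_matched, v=1.0, c=1.0):
    """(Δoutput · V) / |p(T)|^c with precomputed Δoutput and |p(T)|."""
    return delta_output * v / n_matched ** c

# Hypothetical candidates: name -> (Δoutput, |p(T)|).
candidates = {
    "broad": (900.0, 90),
    "narrow": (500.0, 5),
}

def preferred(c):
    """Name of the candidate with the highest influence for a given c."""
    scores = {name: influence(d, n, c=c) for name, (d, n) in candidates.items()}
    return max(scores, key=scores.get)

low_c_choice = preferred(0.0)
high_c_choice = preferred(1.0)
```

With c = 0 only the size of the change matters, so the broad predicate wins. With c = 1 the change is averaged per removed tuple, so the narrow predicate wins.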