3 Ideas for Big Data Exploration

Stratos Idreos
CWI, INS-1, Amsterdam

adaptive indexing
adaptive loading

dbTouch

data is everywhere

[Figure: daily data over the years]

Eric Schmidt: Every two days we create as much data as we did from the dawn of humanity up to 2003

2012

3-4 exabytes

not always sure what we are looking for (until we find it)

data exploration

database systems great...
declarative processing, back-end to numerous apps

but databases are too heavy and blind!

timeline: load -> index -> query

expert users - idle time - workload knowledge

systems tailored for data exploration

quick response times

no installation steps

no workload knowledge

minimize data-to-query time

3 ideas

adaptive indexing: 7 years, 7 papers
ACM SIGMOD Jim Gray Diss. Award - Cor Baayen Award

adaptive loading: 3 years, 3 papers
In use in Hadapt and Tableau

dbTouch: 1 year, 1 paper
VENI 2012

Martin Kersten, Stefan Manegold, Felix Halim, Panagiotis Karras, Roland Yap, Goetz Graefe, Harumi Kuno,
Eleni Petraki, Anastasia Ailamaki, Ioannis Alagiannis, Renata Borovica, Miguel Branco, Ryan Johnson, Erietta Liarou

indexing

load index query

tune = create proper indexes offline
performance 10-100X

but it depends on workload!
which indices to build?
on which data parts?
and when to build them?

load index query

timeline

sample workload -> analyze -> create indices -> query

complex and time consuming process

human administrators + auto-tuning tools

what can go wrong?

not enough idle time to finish proper tuning

by the time we finish tuning, the workload changes

database cracking
incremental, adaptive, partial indexing

remove all tuning steps and need for human input

indexing

initialization query

every query is treated as advice on how data should be stored

cracking example (Database Cracking, CIDR 2007)

Column A: 13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6

Q1 (copy):
select *
from R
where R.A > 10
and R.A < 14

Cracker column of A after Q1:
Piece 1: A <= 10: 4, 9, 2, 7, 1, 3, 8, 6
Piece 2: 10 < A < 14: 13, 12, 11 (result tuples)
Piece 3: 14 <= A: 16, 19, 14

Q2 (in-place):
select *
from R
where R.A > 7
and R.A <= 16

Cracker column of A after Q2:
Piece 1: A <= 7: 4, 2, 1, 3, 6, 7
Piece 2: 7 < A <= 10: 9, 8
Piece 3: 10 < A < 14: 13, 12, 11
Piece 4: 14 <= A <= 16: 14, 16
Piece 5: 16 < A: 19
(result tuples: pieces 2 to 4)

Dynamically/on-the-fly within the select-operator

gain knowledge on how data is organized

every query is treated as advice on how data should be stored

the more we crack, the more we learn
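The cracking example above can be sketched in a few lines of Python. This is a toy illustration, not the MonetDB implementation: the names `CrackerColumn`, `select`, `_crack`, and `goes_left` are hypothetical, and each crack uses a stable partition with temporary lists rather than the in-place swapping of the original algorithm, so the order of values inside a piece may differ from the slide.

```python
import bisect


def goes_left(value, bound):
    """A bound is (pivot, inclusive): left side holds v <= pivot or v < pivot."""
    pivot, inclusive = bound
    return value <= pivot if inclusive else value < pivot


class CrackerColumn:
    """Toy cracker column: a copy of a column, reorganized a bit by every query."""

    def __init__(self, column):
        self.data = list(column)  # copy; the base column stays untouched
        self.bounds = []          # sorted crack bounds seen so far
        self.pos = []             # pos[i] = first index to the right of bounds[i]

    def _crack(self, bound):
        i = bisect.bisect_left(self.bounds, bound)
        if i < len(self.bounds) and self.bounds[i] == bound:
            return self.pos[i]    # this bound was cracked by an earlier query
        # Only the single piece that contains the bound is touched.
        lo = self.pos[i - 1] if i > 0 else 0
        hi = self.pos[i] if i < len(self.pos) else len(self.data)
        piece = self.data[lo:hi]
        left = [v for v in piece if goes_left(v, bound)]
        right = [v for v in piece if not goes_left(v, bound)]
        self.data[lo:hi] = left + right
        split = lo + len(left)
        self.bounds.insert(i, bound)
        self.pos.insert(i, split)
        return split

    def select(self, low, low_inclusive, high, high_inclusive):
        """Return values in the range; cracking happens as a side effect."""
        start = self._crack((low, not low_inclusive))
        end = self._crack((high, high_inclusive))
        return self.data[start:end]


column_a = [13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6]
cracker = CrackerColumn(column_a)
q1 = cracker.select(10, False, 14, False)  # R.A > 10 and R.A < 14
q2 = cracker.select(7, False, 16, True)    # R.A > 7 and R.A <= 16
print(q1)            # [13, 12, 11]
print(sorted(q2))    # [8, 9, 11, 12, 13, 14, 16]
print(cracker.pos)   # [6, 8, 11, 13], i.e. five pieces
```

After Q1 the column holds three pieces, and Q2 only repartitions the two pieces that contain its bounds, which is why each query gets cheaper as more cracks accumulate.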

TPC-H (Sideways Cracking, SIGMOD 09)

[Figure: response time (millisecs) over a query sequence of 0 to 30 for a TPC-H query. Legend: Sel. Crack, Sid. Crack, MonetDB, Presorted, MySQL Presorted]

Normal MonetDB

selection cracking

Presorted MonetDB

Preparation cost 3-14 minutes

MonetDB with sideways cracking

skyserver experiments

cracking answers 160,000 queries while full indexing is still halfway through creating the index

Stochastic Cracking, PVLDB 12

cracking databases

updates (SIGMOD07)
multiple columns (SIGMOD09)
concurrency control (PVLDB12)
storage restrictions (SIGMOD09)
robustness (PVLDB12)
benchmarking (TPCTC10)
algorithms (PVLDB11)

disk based
joins
multi-cores
aggregations
row-stores
vectorwised
optimization
multi-dimensional
holistic indexing

load index query

loading

copy data inside the database
database now has full control

slow process...
not all data might be needed all the time

database vs. unix tools

1 file, 4 attributes, 1 billion tuples

[Figure: single query cost (secs), DB vs. Awk, axis from 0 to 2,200]

break down db cost: 93% / 7% (Loading / Query Processing)

loading is a major bottleneck

... but writing/maintaining scripts is hard too

adaptive loading

load/touch only what is needed and only when it is needed

but raw data access is expensive

tokenizing - parsing - no indexing - no statistics

challenge: fast raw data access

query plan

[Figure: query plan whose scan operators read from the db or directly from raw files, backed by a cache]

zero loading cost

access raw data adaptively on-the-fly

selective parsing - file indexing - file splitting - online statistics
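Two of the ingredients named above, selective parsing and caching, can be illustrated with a minimal Python sketch. It is not the NoDB/PostgresRaw code: the class `RawTable`, its methods, the integer-only attributes, and the comma-delimited layout are all assumptions made for the example. The file is never loaded up front; a query tokenizes each line only as far as the attribute it needs, and parsed attributes are cached so later queries skip the raw file.

```python
import io


class RawTable:
    """Query a delimited text file in place, parsing only what queries touch."""

    def __init__(self, open_file, delimiter=","):
        self._open = open_file   # zero-argument callable returning a text stream
        self._delimiter = delimiter
        self._cache = {}         # attribute index -> list of parsed ints
        self.file_scans = 0      # how many times the raw file was read

    def column(self, attr):
        """Return one attribute, reading the raw file only on first use."""
        if attr not in self._cache:
            self.file_scans += 1
            values = []
            with self._open() as stream:
                for line in stream:
                    # Selective tokenizing: split only up to the requested
                    # attribute; the tail of the line stays one unsplit string.
                    fields = line.rstrip("\n").split(self._delimiter, attr + 1)
                    values.append(int(fields[attr]))
            self._cache[attr] = values
        return self._cache[attr]

    def sum_where(self, sum_attr, filter_attr, low, high):
        """select sum(sum_attr) where low < filter_attr < high."""
        keys = self.column(filter_attr)
        vals = self.column(sum_attr)
        return sum(v for v, k in zip(vals, keys) if low < k < high)


raw = "1,10,100,7\n2,20,200,8\n3,30,300,9\n"
table = RawTable(lambda: io.StringIO(raw))
print(table.sum_where(2, 1, 10, 40))  # 500
print(table.sum_where(2, 1, 0, 15))   # 100
print(table.file_scans)               # 2: one pass per attribute, none for the second query
```

Attributes 0 and 3 are never converted to integers here, which is the point of the sketch: the cost paid is proportional to what the queries actually touch.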

[Figure: execution time (sec), y-axis 0 to 1400, stacked per query Q1 to Q20 plus Load, for MySQL CSV Engine, MySQL, DBMS X, DBMS X w/ external files, PostgreSQL, and PostgresRaw PM + C; two bars run off the scale at ~7500 and ~3500]

reducing data-to-query time

NoDB SIGMOD 2012

load index query

querying

declarative SQL interface
correct and complete answers

complex and slow - not fit for exploration

dbTouch (CIDR 2013)

data

query

no expert knowledge - no set-up costs
interactive mode fit for exploration

challenge
design database systems tailored for touch exploration

user drives query processing actions

3 Ideas for Big Data Exploration

Stratos Idreos
CWI, INS-1, Amsterdam

adaptive indexing
adaptive loading
dbTouch

systems tailored for exploration

more in INS-1

data vaults
sciql
datacell
holistic indexing
sciborg
charles
hardware-conscious
cyclotron

Thank you!