3 ideas for big data exploration · 3 ideas for big data exploration stratos idreos cwi, ins-1,...
TRANSCRIPT
3 Ideas for Big Data Exploration
Stratos IdreosCWI, INS-1, Amsterdam
adaptive indexingadaptive loading
dbTouch
data is everywhere
years
daily data
years
daily data
Eric Schmidt: Every two days we create as much data as much we did from dawn of humanity to 2003
years
daily data
Eric Schmidt: Every two days we create as much data as much we did from dawn of humanity to 2003
2012
3-4 exabytes
not always sure what we are looking for (until we find it)
data exploration
not always sure what we are looking for (until we find it)
database systems great...declarative processing, back-end to numerous apps
database systems great...
but databases are too heavy and blind!
declarative processing, back-end to numerous apps
database systems great...
but databases are too heavy and blind!
timeline
declarative processing, back-end to numerous apps
database systems great...
but databases are too heavy and blind!
timeline
load
declarative processing, back-end to numerous apps
database systems great...
but databases are too heavy and blind!
timeline
load index
declarative processing, back-end to numerous apps
database systems great...
but databases are too heavy and blind!
timeline
load index query
declarative processing, back-end to numerous apps
database systems great...
but databases are too heavy and blind!
timeline
load index query
declarative processing, back-end to numerous apps
expert users - idle time - workload knowledge
systems tailored for data exploration
systems tailored for data exploration
no workload knowledge
systems tailored for data exploration
no installation steps
no workload knowledge
systems tailored for data exploration
quick response times
no installation steps
no workload knowledge
systems tailored for data exploration
quick response times
no installation steps
no workload knowledge
minimize data-to-query time
adaptive indexing7 years, 7 papers
ACM SIGMOD Jim Gray Diss. Award - Cor Baayen Award
3 ideas
adaptive indexing
adaptive loading
7 years, 7 papers
3 years, 3 papers
ACM SIGMOD Jim Gray Diss. Award - Cor Baayen Award
In use in Hadapt and Tableau
3 ideas
adaptive indexing
adaptive loading
dbTouch
7 years, 7 papers
3 years, 3 papers
1 year, 1 paperVENI 2012
ACM SIGMOD Jim Gray Diss. Award - Cor Baayen Award
In use in Hadapt and Tableau
3 ideas
adaptive indexing
adaptive loading
dbTouch
7 years, 7 papers
3 years, 3 papers
1 year, 1 paperVENI 2012
Martin Kersten, Stefan Manegold, Felix Halim, Panagiotis Karras, Roland Yap, Goetz Graefe, Harumi Kuno,
Eleni Petraki, Anastasia Ailamaki, Ioannis Alagiannis, Renata Borovica, Miguel Branco, Ryan Johnson, Erietta Liarou
ACM SIGMOD Jim Gray Diss. Award - Cor Baayen Award
In use in Hadapt and Tableau
3 ideas
indexing
tune= create proper indexes offlineperformance 10-100X
load index query
indexing
tune= create proper indexes offlineperformance 10-100X
but it depends on workload!which indices to build?on which data parts?and when to build them?
load index query
load index query
timeline
load index query
timeline
sample workload
load index query
timeline
sample workload analyze
load index query
timeline
sample workload analyze create indices
load index query
timeline
sample workload analyze create indices query
load index query
timeline
sample workload analyze create indices query
complex and time consuming process
load index query
timeline
sample workload analyze create indices query
complex and time consuming process
load index query
human administrators + auto-tuning tools
what can go wrong?
what can go wrong?
not enough idle time to finish proper tuning
what can go wrong?
not enough idle time to finish proper tuning
by the time we finish tuning, the workload changes
database cracking
database crackingincremental, adaptive, partial indexing
remove all tuning steps and need for human input
database crackingincremental, adaptive, partial indexing
remove all tuning steps and need for human input
indexing
initialization
database crackingincremental, adaptive, partial indexing
remove all tuning steps and need for human input
indexing
initialization query
database crackingincremental, adaptive, partial indexing
remove all tuning steps and need for human input
indexing
initialization query
every query is treated as an advice on how data should be stored
database crackingincremental, adaptive, partial indexing
remove all tuning steps and need for human input
indexing
initialization query
cracking exampleDatabase Cracking CIDR 2007
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
cracking exampleDatabase Cracking CIDR 2007
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Resu
lt tu
ples
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Dynamically/on-the-fly within the select-operatorRe
sult
tupl
es
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Dynamically/on-the-fly within the select-operatorRe
sult
tupl
es
Database Cracking CIDR 2007
gain knowledge on how data is organized
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Resu
lt tu
ples
Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Resu
lt tu
ples
Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
the more we crack, the more we learn
cracking example
from Rwhere R.A > 10
and R.A < 14
select *
Q2:select *from Rwhere R.A > 7
and R.A <= 16
Q1:136798131211141619 Piece 5: 16 < A
Piece 3: 10 < A < 14
Piece 1: A <= 7
Piece 2: 7 < A <= 10
Piece 4: 14 <= A <= 16
Cracker column of A Cracker column of A
10 < A < 14
14 <= A
A <= 10Piece 1:
Piece 3:
Piece 2:
(in!place)(copy)
Q1 Q2
Column A
13164921271193141186
49271386131211161914
42
Resu
lt tu
ples
Dynamically/on-the-fly within the select-operator
Database Cracking CIDR 2007
every query is treated as an advice on how data should be stored
the more we crack, the more we learn
TPC-H
Sideways Cracking, SIGMOD 09
70
330
100
150
200
250
300
0 5 10 15 20 25 30Query sequence
TPC-H Query 15764420
1000
10000
Sel. Crack
Sid. Crack
MonetDB
PresortedMySQL
PresortedRe
spon
se ti
me
(milli
sec
s)
TPC-H
Sideways Cracking, SIGMOD 09
70
330
100
150
200
250
300
0 5 10 15 20 25 30Query sequence
TPC-H Query 15764420
1000
10000
Sel. Crack
Sid. Crack
MonetDB
PresortedMySQL
PresortedRe
spon
se ti
me
(milli
sec
s)
Normal MonetDB
selection cracking
TPC-H
Sideways Cracking, SIGMOD 09
70
330
100
150
200
250
300
0 5 10 15 20 25 30Query sequence
TPC-H Query 15764420
1000
10000
Sel. Crack
Sid. Crack
MonetDB
PresortedMySQL
Presorted
Preparation cost3-14 minutes
Resp
onse
tim
e (m
illi s
ecs)
Presorted MonetDB
Normal MonetDB
selection cracking
TPC-H
Sideways Cracking, SIGMOD 09
70
330
100
150
200
250
300
0 5 10 15 20 25 30Query sequence
TPC-H Query 15764420
1000
10000
Sel. Crack
Sid. Crack
MonetDB
PresortedMySQL
Presorted
Preparation cost3-14 minutes
Resp
onse
tim
e (m
illi s
ecs)
Presorted MonetDB
MonetDB with sideways cracking
Normal MonetDB
selection cracking
TPC-H
Sideways Cracking, SIGMOD 09
70
330
100
150
200
250
300
0 5 10 15 20 25 30Query sequence
TPC-H Query 15764420
1000
10000
Sel. Crack
Sid. Crack
MonetDB
PresortedMySQL
Presorted
Preparation cost3-14 minutes
Resp
onse
tim
e (m
illi s
ecs)
Presorted MonetDB
MonetDB with sideways cracking
Normal MonetDB
selection cracking
TPC-H
Sideways Cracking, SIGMOD 09
70
330
100
150
200
250
300
0 5 10 15 20 25 30Query sequence
TPC-H Query 15764420
1000
10000
Sel. Crack
Sid. Crack
MonetDB
PresortedMySQL
Presorted
Preparation cost3-14 minutes
Resp
onse
tim
e (m
illi s
ecs)
Presorted MonetDB
MonetDB with sideways cracking
Normal MonetDB
selection cracking
TPC-H
Sideways Cracking, SIGMOD 09
70
330
100
150
200
250
300
0 5 10 15 20 25 30Query sequence
TPC-H Query 15764420
1000
10000
Sel. Crack
Sid. Crack
MonetDB
PresortedMySQL
Presorted
Preparation cost3-14 minutes
Resp
onse
tim
e (m
illi s
ecs)
Presorted MonetDB
MonetDB with sideways cracking
Normal MonetDB
selection cracking
skyserver experiments
cracking answers 160.000 queries while full indexing is still half way creating the index
Stochastic Cracking, PVLDB 12
cracking databasesupdates (SIGMOD07)
multiple columns (SIGMOD09)
concurrency control (PVLDB12)
storage restrictions (SIGMOD09)
robustness (PVLDB12)
benchmarking (TPCTC10)
algorithms (PVLDB11)
cracking databasesupdates (SIGMOD07)
multiple columns (SIGMOD09)
concurrency control (PVLDB12)
storage restrictions (SIGMOD09)
robustness (PVLDB12)
benchmarking (TPCTC10)
algorithms (PVLDB11)
disk based
joins
multi-cores
aggregations
row-stores
vectorwised
optimization
multi-dimensional
holistic indexing
load index query
loading
load index query
loading
copy data inside the databasedatabase now has full control
load index query
loading
copy data inside the databasedatabase now has full control
slow process...not all data might be needed all the time
database vs. unix tools
1 file, 4 attributes, 1 billion tuples
0 550 1,100 1,650 2,200DBAwk
single query cost (secs)
database vs. unix tools
93%
7%LoadingQuery Processing
1 file, 4 attributes, 1 billion tuples
0 550 1,100 1,650 2,200DBAwk
single query cost (secs)
break down db cost
database vs. unix tools
93%
7%LoadingQuery Processing
1 file, 4 attributes, 1 billion tuples
0 550 1,100 1,650 2,200DBAwk
single query cost (secs)
break down db cost
loading is a major bottleneck
database vs. unix tools
93%
7%LoadingQuery Processing
1 file, 4 attributes, 1 billion tuples
0 550 1,100 1,650 2,200DBAwk
single query cost (secs)
break down db cost
... but writing/maintaining scripts is hard too
loading is a major bottleneck
adaptive loading
load/touch only what is needed and only when it is needed
but raw data access is expensive
tokenizing - parsing - no indexing - no statistics
but raw data access is expensive
tokenizing - parsing - no indexing - no statistics
challenge: fast raw data access
query plan
query plan
scan
query plan
scan
db
query plan
scan
db
scan
query plan
scanscan
zero loading cost
query plan
scanscan
files
access raw data adaptively on-the-fly
zero loading cost
query plan
scanscan
files cache
access raw data adaptively on-the-fly
zero loading cost
query plan
scanscan
files cache
access raw data adaptively on-the-fly
zero loading cost
selective parsing - file indexing file splitting - online statistics
40
0
200
400
600
800
1000
1200
1400
MySQL CSV�Engine�MySQL
DBMS�X DBMS�Xw/�external�files
PostgreSQL PostgresRawPM�+�C
Execution�Time�(sec)
Q20 Q19 Q18Q17 Q16 Q15Q14 Q13 Q12Q11 Q10 Q9Q8 Q7 Q6Q5 Q4 Q3Q2 Q1 Load
~7500 ~3500
NoDB SIGMOD 2012
40
0
200
400
600
800
1000
1200
1400
MySQL CSV�Engine�MySQL
DBMS�X DBMS�Xw/�external�files
PostgreSQL PostgresRawPM�+�C
Execution�Time�(sec)
Q20 Q19 Q18Q17 Q16 Q15Q14 Q13 Q12Q11 Q10 Q9Q8 Q7 Q6Q5 Q4 Q3Q2 Q1 Load
~7500 ~3500
NoDB SIGMOD 2012
40
0
200
400
600
800
1000
1200
1400
MySQL CSV�Engine�MySQL
DBMS�X DBMS�Xw/�external�files
PostgreSQL PostgresRawPM�+�C
Execution�Time�(sec)
Q20 Q19 Q18Q17 Q16 Q15Q14 Q13 Q12Q11 Q10 Q9Q8 Q7 Q6Q5 Q4 Q3Q2 Q1 Load
~7500 ~3500
NoDB SIGMOD 2012
40
0
200
400
600
800
1000
1200
1400
MySQL CSV�Engine�MySQL
DBMS�X DBMS�Xw/�external�files
PostgreSQL PostgresRawPM�+�C
Execution�Time�(sec)
Q20 Q19 Q18Q17 Q16 Q15Q14 Q13 Q12Q11 Q10 Q9Q8 Q7 Q6Q5 Q4 Q3Q2 Q1 Load
~7500 ~3500
NoDB SIGMOD 2012
40
0
200
400
600
800
1000
1200
1400
MySQL CSV�Engine�MySQL
DBMS�X DBMS�Xw/�external�files
PostgreSQL PostgresRawPM�+�C
Execution�Time�(sec)
Q20 Q19 Q18Q17 Q16 Q15Q14 Q13 Q12Q11 Q10 Q9Q8 Q7 Q6Q5 Q4 Q3Q2 Q1 Load
~7500 ~3500
NoDB SIGMOD 2012
40
0
200
400
600
800
1000
1200
1400
MySQL CSV�Engine�MySQL
DBMS�X DBMS�Xw/�external�files
PostgreSQL PostgresRawPM�+�C
Execution�Time�(sec)
Q20 Q19 Q18Q17 Q16 Q15Q14 Q13 Q12Q11 Q10 Q9Q8 Q7 Q6Q5 Q4 Q3Q2 Q1 Load
~7500 ~3500
NoDB SIGMOD 2012
40
0
200
400
600
800
1000
1200
1400
MySQL CSV�Engine�MySQL
DBMS�X DBMS�Xw/�external�files
PostgreSQL PostgresRawPM�+�C
Execution�Time�(sec)
Q20 Q19 Q18Q17 Q16 Q15Q14 Q13 Q12Q11 Q10 Q9Q8 Q7 Q6Q5 Q4 Q3Q2 Q1 Load
~7500 ~3500
NoDB SIGMOD 2012
40
0
200
400
600
800
1000
1200
1400
MySQL CSV�Engine�MySQL
DBMS�X DBMS�Xw/�external�files
PostgreSQL PostgresRawPM�+�C
Execution�Time�(sec)
Q20 Q19 Q18Q17 Q16 Q15Q14 Q13 Q12Q11 Q10 Q9Q8 Q7 Q6Q5 Q4 Q3Q2 Q1 Load
~7500 ~3500
reducing data-to-query time
NoDB SIGMOD 2012
load index query
querying
load index query
querying
declarative SQL interface
load index query
querying
declarative SQL interfacecorrect and complete answers
load index query
querying
declarative SQL interfacecorrect and complete answers
load index query
querying
declarative SQL interfacecorrect and complete answers
complex and slow - not fit for exploration
dbTouch CIDR 2013
dbTouch
dbTouch CIDR 2013
data
dbTouch
dbTouch CIDR 2013
data
query
dbTouch
dbTouch CIDR 2013
data
query
no expert knowledge - no set-up costsinteractive mode fit for exploration
dbTouch
dbTouch CIDR 2013
challengedesign database systems tailored for touch exploration
dbTouch CIDR 2013
challengedesign database systems tailored for touch exploration
user drives query processing actions
3 Ideas for Big Data Exploration
Stratos IdreosCWI, INS-1, Amsterdam
adaptive indexingadaptive loading
dbTouch
3 Ideas for Big Data Exploration
Stratos IdreosCWI, INS-1, Amsterdam
adaptive indexingadaptive loading
dbTouch
systems tailored for exploration
more in INS-1
data vaults
sciql
datacell
holistic indexing
sciborg
charles
hardware-conscious
cyclotron
more in INS-1
data vaults
sciql
datacell
holistic indexing
sciborg
charles
Thank you!
hardware-conscious
cyclotron