![Page 1: LDBC & The Social Network Benchmark Peter Boncz Database Architectures (DA) @ CWI Special chair “Large-Scale Data Engineering” @ VU event.cwi.nl/lsde2015](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/1.jpg)
LDBC & The Social Network Benchmark
Peter Boncz
Database Architectures (DA) @ CWI
Special chair “Large-Scale Data Engineering” @ VU
event.cwi.nl/lsde2015
LDBC & The Social Network Benchmark - Scientific Meeting 2015-3-27
![Page 2](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/2.jpg)
Engines for Data Analysis
Inaugural Lecture
October 2014
![Page 3](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/3.jpg)
The Start-Up Company Experience 1996-2003
2008-
2013-
![Page 4](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/4.jpg)
The relational industry has been reshaped...
![Page 5](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/5.jpg)
![Page 6](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/6.jpg)
Benchmarking?
• A benchmark is a standard test that measures efficiency.
• Goal: quantification, making competing systems comparable.
• An important tool in experimental science: it accelerates progress and makes technology viable.
• Social goal: influence a research field.
![Page 7](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/7.jpg)
Graph data management
• Many Big Data problems revolve around graphs:
  - social network data
  - AI methods that build/discover relationships
• A wave of new systems (and research):
  - Graph database systems, e.g. Neo4j: graphs and paths are “first-class citizens”
  - RDF / SPARQL systems
  - Graph extensions to relational systems, e.g. recursive queries, traversals
  - Graph programming frameworks that leverage cluster computing for graph algorithms, e.g. GraphLab (distributed AI algorithms) and Giraph (“think like a vertex”)
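The “think like a vertex” model (Giraph follows Google's Pregel design) can be illustrated with PageRank. The sketch below simulates the superstep loop in a single Python process: in each superstep every vertex sends messages to its neighbours, then recomputes its own value from the messages it received. It is not Giraph's API; the function name and parameters are hypothetical.

```python
from collections import defaultdict


def pregel_pagerank(out_edges, num_supersteps=30, damping=0.85):
    """Vertex-centric PageRank, simulated in one process (hypothetical helper).

    out_edges maps every vertex (including pure targets) to the list of
    vertices it points to.
    """
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(num_supersteps):
        inbox = defaultdict(float)  # messages delivered to each vertex
        dangling = 0.0              # global sum, like a Pregel/Giraph aggregator
        for v in vertices:          # "send" phase: each vertex acts on its own
            targets = out_edges[v]
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share
            else:
                dangling += rank[v]  # vertex without out-edges
        # "compute" phase: each vertex only looks at its own inbox
        rank = {v: (1 - damping) / n + damping * (inbox[v] + dangling / n)
                for v in vertices}
    return rank
```

For a 3-cycle `{"a": ["b"], "b": ["c"], "c": ["a"]}` every vertex keeps rank 1/3. The design point is that no vertex reads another vertex's state directly; all communication goes through messages, which is what lets a framework distribute vertices across a cluster.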
![Page 8](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/8.jpg)
SNB (Social Network Benchmark) schema
![Page 9](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/9.jpg)
SNB Workloads
• Interactive: tests a system's throughput with relatively simple queries and concurrent updates.
  - e.g. for one person, recommend a friend based on shared friends and interests
• Business Intelligence: complex structured queries for analyzing online behavior.
  - e.g. who are the influential people on the topic of open source development?
• Graph Analytics: tests functionality and scalability on most of the data as a single operation.
  - e.g. PageRank, shortest paths, community detection
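The Interactive example query (recommend a friend based on shared friends and interests) can be sketched as a 2-hop neighbourhood scan. The candidate set (friends of friends), the scoring rule (number of shared friends plus a weighted count of shared interests) and all names are assumptions for illustration, not the official SNB query definition.

```python
def recommend_friend(person, knows, interests, interest_weight=1.0):
    """Return the best-scoring non-friend of `person`, or None (hypothetical).

    knows:     dict person -> set of persons (assumed symmetric)
    interests: dict person -> set of interest tags
    """
    friends = knows.get(person, set())
    my_interests = interests.get(person, set())
    # Candidates: friends of friends, minus the person and existing friends.
    candidates = set()
    for f in friends:
        candidates |= knows.get(f, set())
    candidates -= friends | {person}
    best, best_score = None, 0.0
    for c in sorted(candidates):  # sorted for deterministic tie-breaking
        shared_friends = len(friends & knows.get(c, set()))
        shared_interests = len(my_interests & interests.get(c, set()))
        score = shared_friends + interest_weight * shared_interests
        if score > best_score:
            best, best_score = c, score
    return best
```

Restricting candidates to the 2-hop neighbourhood keeps the query "relatively simple": it touches only a person's local neighbourhood rather than the whole graph.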
![Page 10](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/10.jpg)
Social networks: correlation between property values and network structure
![Page 11](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/11.jpg)
SNB datagen: correlated graph structure
[Figure: example graph of persons P1-P5 linked by <knows> edges, with properties such as <firstname> (“Anna”, “Laura”), <is> Student, <studyAt> (“University of Leipzig”, “University of Amsterdam”), <liveAt> (“Germany”, “Netherlands”), <birthYear> (“1990”) and <like> (<Britney Spears>)]
![Page 12](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/12.jpg)
SNB datagen: correlated graph structure
[Figure: the same example graph of persons P1-P5 and their properties, without <knows> edges; candidate connections are marked “?”]
Danger: this is very expensive to compute on a large graph! (quadratic, random access)
• Compute the similarity of two nodes based on their (correlated) properties.
• Use a probability density function with respect to this similarity to connect nodes.
[Plot: connection probability, from highly similar to less similar nodes]
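A minimal sketch of the naive approach, assuming three hypothetical properties (university, birth year, country) as the correlated attributes and a hypothetical exponential decay as the connection-probability function. The loop over all pairs is what makes it quadratic, and each pair touches two arbitrary persons (random access).

```python
import itertools
import math
import random


def similarity(a, b, keys=("university", "birth_year", "country")):
    """Fraction of (hypothetical) properties on which two persons agree."""
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)


def connect_probability(sim, p_max=0.9, decay=5.0):
    """Hypothetical density: high for similar pairs, decaying with dissimilarity."""
    return p_max * math.exp(-decay * (1.0 - sim))


def naive_generate(persons, seed=42):
    """Try every pair once: n*(n-1)/2 similarity evaluations."""
    rng = random.Random(seed)
    edges = []
    for i, j in itertools.combinations(range(len(persons)), 2):
        if rng.random() < connect_probability(similarity(persons[i], persons[j])):
            edges.append((i, j))
    return edges
```

With a million persons this is roughly 5·10^11 pair evaluations, which is why the next step restricts the pairs that are considered at all.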
![Page 13](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/13.jpg)
SNB datagen: correlated graph structure
[Figure: the same example graph, now with <knows> edges between persons P1-P5]
The probability that two nodes are connected is skewed with respect to their similarity (due to the probability distribution).
[Plot: connection probability, from highly similar to less similar nodes, with a similarity window]
Trick: disregard nodes whose similarity distance is too large (only connect nodes within a similarity window).
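The window trick can be sketched by sorting persons on a similarity key, so that similar persons end up close together, and then only trying to connect each person to the next few persons in sort order, with a probability that decays with the distance in that order. The key, window size and geometric decay are assumptions for illustration. The cost drops from O(n²) pair checks to O(n log n) for the sort plus O(n·W) for a window of size W, and the scan is sequential rather than random access.

```python
import random


def window_generate(persons, key, window=3, p_base=0.8, seed=7):
    """Connect each person only to the next `window` persons in key order."""
    rng = random.Random(seed)
    order = sorted(range(len(persons)), key=lambda i: key(persons[i]))
    edges = set()
    for pos, i in enumerate(order):
        for dist in range(1, window + 1):
            if pos + dist >= len(order):
                break
            j = order[pos + dist]
            # Hypothetical geometric decay: closer in sort order means
            # more similar, hence more likely to be connected.
            if rng.random() < p_base ** dist:
                edges.add((min(i, j), max(i, j)))
    return edges
```

Usage: `window_generate(persons, key=lambda p: (p["university"], p["birth_year"]))` connects mostly persons from the same university and similar birth years.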
![Page 14](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/14.jpg)
SNB datagen: MapReduce approach
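One way to see why MapReduce suits the sort-then-window trick is that the shuffle phase already sorts records by key. The in-process sketch below assumes the map phase emits each person under a (university, birth year) sort key and routes it to a reducer by university; each reducer then slides the window over its sorted input. The partitioning scheme, key and function names are hypothetical; this is not the datagen code.

```python
import random
import zlib


def map_phase(persons, num_reducers):
    """Emit (reducer, sort_key, person_id); same university -> same reducer."""
    for pid, p in enumerate(persons):
        key = (p["university"], p["birth_year"])
        reducer = zlib.crc32(p["university"].encode()) % num_reducers
        yield reducer, key, pid


def shuffle(records):
    """What the framework does: group by reducer, sort each group by key."""
    groups = {}
    for reducer, key, pid in records:
        groups.setdefault(reducer, []).append((key, pid))
    return {r: sorted(items) for r, items in groups.items()}


def reduce_phase(sorted_items, window, p_base, rng):
    """Slide a window over the key-sorted persons of one reducer."""
    ids = [pid for _, pid in sorted_items]
    edges = set()
    for pos, i in enumerate(ids):
        for dist in range(1, window + 1):
            if pos + dist >= len(ids):
                break
            j = ids[pos + dist]
            if rng.random() < p_base ** dist:  # hypothetical geometric decay
                edges.add((min(i, j), max(i, j)))
    return edges


def generate(persons, num_reducers=2, window=3, p_base=0.8, seed=1):
    rng = random.Random(seed)
    groups = shuffle(map_phase(persons, num_reducers))
    edges = set()
    for r in sorted(groups):
        edges |= reduce_phase(groups[r], window, p_base, rng)
    return edges
```

In this sketch edges never cross reducer boundaries, and windows that would span two reducers are simply ignored. A generator could repeat the pass with a different sort key (for example, interests instead of university) to add edges correlated on another dimension.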
![Page 15](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/15.jpg)
SNB datagen: temporal effects
![Page 16](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/16.jpg)
SNB datagen: friend degree distribution
• Based on the “Anatomy of Facebook” blog post (2013)
• Diameter increases logarithmically with the dataset scale factor
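Basing the degree distribution on a measured network means that each person's target friend count is drawn from an empirical distribution. A minimal sketch follows, assuming the measurements are available as a bucketed histogram. The bucket boundaries and fractions below are placeholders, not the data from the blog post, and the generator's actual sampling procedure may differ.

```python
import random

# Hypothetical histogram: (min_degree, max_degree, fraction of persons).
# Placeholder numbers for illustration only; NOT the Facebook measurements.
DEGREE_HISTOGRAM = [(1, 10, 0.30), (11, 50, 0.35), (51, 200, 0.25), (201, 1000, 0.10)]


def sample_degrees(n, histogram=DEGREE_HISTOGRAM, seed=0):
    """Draw a target friend count for each of n persons from a histogram."""
    rng = random.Random(seed)
    weights = [fraction for _, _, fraction in histogram]
    # Pick a bucket per person in proportion to its fraction ...
    buckets = rng.choices(histogram, weights=weights, k=n)
    # ... then a degree uniformly inside that bucket.
    return [rng.randint(lo, hi) for lo, hi, _ in buckets]
```

The sampled degrees only fix how many edges each person should get. The window-based edge generation decides who the friends are.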
![Page 17](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/17.jpg)
SNB datagen: how realistic is it?
GRADES 2014: “How community-like is the structure of synthetically generated graphs?” Arnau Prat (UPC), David Domínguez-Sal (Sparsity Technologies)
[Figure panels: LiveJournal, LFR3 (synthetic), SNB datagen]
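Comparing how community-like a real graph (LiveJournal) and synthetic graphs are requires structural metrics. One standard measure is the clustering coefficient: the fraction of a node's neighbour pairs that are themselves connected. The slide does not list which metrics the study used, so this is only an illustrative example. It assumes an undirected graph given as a dict of neighbour sets.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are connected to each other."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)
```

A triangle scores 1.0 and a star scores 0.0. Social networks, where friends of friends tend to be friends, score far higher than random graphs of the same size.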
![Page 18](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/18.jpg)
ldbcouncil.org
Code @ github/ldbc
![Page 19](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/19.jpg)
Industry Membership
![Page 20](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/20.jpg)
Summary
• LDBC: Graph and RDF benchmark council
  - Choke-point driven benchmark design (involving both users and system experts)
• Social Network Benchmark (SNB)
  - Advanced social network generator (scale-free, power laws, clustering, correlations)
  - Real data distributions from DBpedia
  - SIGMOD 2015 publication (to appear)
![Page 21](https://reader035.vdocument.in/reader035/viewer/2022062308/56649dce5503460f94ac194d/html5/thumbnails/21.jpg)
Designing Engines for Data Analysis - Inaugural Lecture - 14/10/2014
Working with industry increases impact
• Jim Gray: ACM Turing Award 1998
• Michael Stonebraker: IEEE Von Neumann Medal 2004, ACM Turing Award 2015