public - google i_o 2012- crunching big data with bigquery
TRANSCRIPT
-
7/28/2019 PUBLIC - Google I_O 2012- Crunching Big Data With BigQuery
Crunching Big Data
with BigQuery
Ryan Boyd, Developer Advocate
Jordan Tigani, Software Engineer
-
How BIG is big?
-
1 million rows?
-
10 million rows?
-
100 million rows
-
500 million rows
-
Big Data at Google
60 hours
100 million gigabytes
425 million users
-
Google's internal technology:
Dremel
-
Big Data at Google - Finding top installed market apps
SELECT
top(appId, 20) AS app,
count(*) AS count
FROM installlog.2012
ORDER BY
count DESC
Result in ~20 seconds!
-
Big Data at Google - Finding slow servers
SELECT
count(*) AS count, source_machine AS machine
FROM product.product_log.live
WHERE
elapsed_time > 4000
GROUP BY
source_machine
ORDER BY
count DESC
Result in ~20 seconds!
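To make the shape of the slow-servers query concrete, here is a minimal Python sketch of the same filter, group, count, and sort steps on a tiny in-memory log. The machine names, the sample rows, and the reading of elapsed_time as milliseconds are assumptions for illustration; nothing here reflects how Dremel executes the query.

```python
from collections import Counter

# Hypothetical log rows: (source_machine, elapsed_time).
# Treating elapsed_time as milliseconds is an assumption.
LOG_ROWS = [
    ("web-01", 5200),
    ("web-02", 120),
    ("web-01", 4100),
    ("web-03", 9000),
    ("web-02", 4500),
    ("web-01", 300),
]


def slow_request_counts(rows, threshold=4000):
    """Count requests slower than `threshold` per machine, busiest first."""
    # The generator plays the role of WHERE; Counter plays GROUP BY + COUNT(*).
    counts = Counter(machine for machine, elapsed in rows if elapsed > threshold)
    # most_common() sorts by count descending, like ORDER BY count DESC.
    return counts.most_common()


result = slow_request_counts(LOG_ROWS)
```

The first entry of `result` is the machine with the most slow requests, which is the row the query on the slide would list first.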
-
BigQuery gives you this power
Store data with reliability, redundancy and consistency
Go from data to meaning
Quickly!
At scale ...
-
How are developers using it?
Game and social media analytics
Advertising campaign optimization
Sensor data analysis
Infrastructure monitoring
-
Agenda
Show the power
Loading your data
Running your queries
Underlying architecture design
Advanced queries
-
Let's dive in!
-
BigQuery UI
bigquery.cloud.google.com
-
Loading your Data
-
Ingestion: Data format
birth_record
  parent_id_mother
  parent_id_father
  plurality
  is_male
  race
  weight
parents
  id
  race
  age
  cigarette_use
  state
-
Ingestion: Data format
birth_record
  mother_race
  mother_age
  mother_cigarette_use
  mother_state
  father_race
  father_age
  father_cigarette_use
  father_state
  plurality
  is_male
  race
  weight
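The step from two related tables to one flat birth_record can be sketched in Python as below. This is only an illustration of the denormalization shown on the slides: the sample values and the `flatten` helper are hypothetical, and the field names are the ones listed above.

```python
# Hypothetical sample data shaped like the two normalized tables.
parents = {
    1: {"race": 1, "age": 20, "cigarette_use": None, "state": "AL"},
    2: {"race": 1, "age": 25, "cigarette_use": None, "state": "AL"},
}
birth_records = [
    {"parent_id_mother": 1, "parent_id_father": 2,
     "plurality": 1, "is_male": True, "race": 1, "weight": 7.813},
]


def flatten(record, parents_by_id):
    """Copy each parent's fields into the birth record with a role prefix."""
    # Keep the child's own fields, dropping the foreign keys.
    flat = {k: v for k, v in record.items() if not k.startswith("parent_id_")}
    for role in ("mother", "father"):
        parent = parents_by_id[record[f"parent_id_{role}"]]
        for field, value in parent.items():
            flat[f"{role}_{field}"] = value
    return flat


flat_rows = [flatten(r, parents) for r in birth_records]
```

Each flattened row repeats the parent data, which costs storage but means a query never has to join back to a parents table.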
-
Ingestion: Data format
1969,1969,1,20,,AL,TRUE,1,7.813,AL,1,2
1971,1971,5,7,,NY,FALSE,1,7.213,MA,5,7
2001,2001,12,5,,CA,TRUE,2,6.427,CA,12,
CSV
-
Running your Queries
-
Libraries
Java
Python
.NET
PHP
JavaScript
Apps Script
... more ...
-
It's REST
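Because the service is plain REST, a query can be issued from any language that can send an HTTP POST with a JSON body. The sketch below only builds such a request with the Python standard library and does not send it. It assumes the current v2 `queries` endpoint, a placeholder project ID (`my-project`), and the public natality sample table; a real call would also need an OAuth 2.0 access token in an Authorization header.

```python
import json
import urllib.request

# Assumption: "my-project" is a placeholder project ID, not a real project.
PROJECT_ID = "my-project"
URL = f"https://bigquery.googleapis.com/bigquery/v2/projects/{PROJECT_ID}/queries"


def build_query_request(sql):
    """Build (but do not send) an HTTP POST that would run `sql`."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumption: the table name is used only as an example query target.
request = build_query_request(
    "SELECT COUNT(*) FROM [publicdata:samples.natality]"
)
```

The client libraries listed earlier wrap this same kind of request, so they are a convenience rather than a requirement.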
-
BigQuery architecture
Developing intuition about BigQuery
-
Relational Database Architecture: B-Tree
Root: 0-999
Level 1: 100-199
Level 1: 700-799
Level 1: 400-499
Level 2: 60-69
Level 2: 410-419
Level 2: 430-439
Level 2: 750-759
Disk storage
-
Relational Database Architecture: Finding a Value
Root: 0-999
Level 1: 100-199
Level 1: 700-799
Level 1: 400-499
Level 2: 60-69
Level 2: 410-419
Level 2: 430-439
Level 2: 750-759
Disk storage
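The difference between following an index and scanning the table can be sketched in a few lines of Python. Binary search over a sorted key list is used here as a stand-in for descending a B-tree from root to leaf; it is not a B-tree implementation, and the row contents are hypothetical.

```python
from bisect import bisect_left


def scan_lookup(rows, key):
    """Full table scan: examine rows one by one until the key is found."""
    examined = 0
    for k, value in rows:
        examined += 1
        if k == key:
            return value, examined
    return None, examined


def index_lookup(sorted_keys, rows, key):
    """Index-style lookup: binary search over sorted keys, a stand-in
    for walking a B-tree from the root down to a leaf."""
    pos = bisect_left(sorted_keys, key)
    if pos < len(sorted_keys) and sorted_keys[pos] == key:
        return rows[pos][1]
    return None


rows = [(k, f"row-{k}") for k in range(1000)]  # keys 0-999, as on the slide
keys = [k for k, _ in rows]

value, examined = scan_lookup(rows, 415)
indexed_value = index_lookup(keys, rows, 415)
```

Both calls return the same row, but the scan has to look at every row before key 415, while the index narrows the range at each step (0-999, then 400-499, then 410-419 in the diagram).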
-
If you do a table scan over a 1TB table,
you're going to have a bad time.
Anonymous
16th century Italian Philosopher-Monk
-
Goal: Perform a 1 TB table scan in 1 second
Parallelize Parallelize Parallelize!
Reading 1 TB / second from disk: 10k+ disks
Processing 1 TB / second: 5k processors
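A quick back-of-the-envelope check shows what these figures imply per device. The sketch assumes decimal units (1 TB = 10^12 bytes) and takes the slide's "10k+" and "5k" as exactly 10,000 and 5,000, so the results are rough estimates rather than measured numbers.

```python
TB = 10**12  # bytes, assuming decimal units
MB = 10**6

disks = 10_000       # the slide says "10k+", so this is a lower bound
processors = 5_000

# Throughput each device must sustain to finish 1 TB in one second.
per_disk_mb_s = TB / disks / MB
per_processor_mb_s = TB / processors / MB
```

Under these assumptions each disk has to deliver about 100 MB per second and each processor has to handle about 200 MB per second, which is why the work has to be spread over thousands of machines.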
-
Data access: Column Store
Record Oriented Storage
Column Oriented Storage
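The two layouts can be contrasted with a small Python sketch. The three sample rows reuse values from the CSV slide, but which CSV position maps to which field is an assumption, and in-memory lists only approximate what happens on disk.

```python
# Record-oriented: all fields of one record are stored together.
records = [
    {"state": "AL", "is_male": True, "weight": 7.813},
    {"state": "NY", "is_male": False, "weight": 7.213},
    {"state": "CA", "is_male": True, "weight": 6.427},
]

# Column-oriented: all values of one field are stored together.
columns = {name: [r[name] for r in records] for name in records[0]}


def avg_weight_from_records(rows):
    # Walks whole records; on disk this would mean reading every field
    # of every record even though only "weight" is needed.
    return sum(r["weight"] for r in rows) / len(rows)


def avg_weight_from_columns(cols):
    # Reads only the "weight" column; the other columns are never touched.
    weights = cols["weight"]
    return sum(weights) / len(weights)
```

Both functions return the same average. The column layout simply lets a query that names one column skip the bytes belonging to all the others.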
-
"Why not MapReduce?"
Anonymous
Reddit User
-
Distributed Storage (e.g. GFS)
MapReduce... how does it work?
Cont
Worker 0 Worker 1 Worker 2 Worker 3
-
Distributed Storage (e.g. GFS)
MapReduce
Cont
Worker 0 Worker 1 Worker 2 Worker 3
1. Map!
-
Distributed Storage (e.g. GFS)
MapReduce
Cont
Worker 0 Worker 1 Worker 2 Worker 3
2. Reduce!
-
Distributed Storage (e.g. GFS)
MapReduce
Cont
Worker 0 Worker 1 Worker 2 Worker 3
3. Profit!*
* Don't forget to shuffle
Multiple passes may be required
Void where prohibited
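The map, shuffle, and reduce steps can be sketched as three small Python functions. This is a single-process toy that counts app installs, meant only to show the data flow between the phases; the four chunks standing in for four workers and the app names are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (key, 1) pair for every app ID in one worker's chunk."""
    return [(app_id, 1) for app_id in chunk]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across workers."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Four hypothetical workers, each holding one chunk of an install log.
chunks = [["maps", "mail"], ["maps"], ["mail", "maps"], ["docs"]]
counts = reduce_phase(shuffle(map_phase(c) for c in chunks))
```

The shuffle is the step that moves data between workers, and a query needing a second grouping would have to run the whole pipeline again, which is the cost the footnote on the slide is joking about.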
-
Gray's third law for big data:
Bring computations to the data, rather
than data to the computations.
Jim Gray
Database Pioneer
-
BigQuery Architecture: Computation tree
Mixer 0
Mixer 1: Shard 0-8
Mixer 1: Shard 17-24
Mixer 1: Shard 9-16
Shard 0 Shard 10 Shard 12 Shard 20
Distributed Storage (e.g. GFS)
-
BigQuery Architecture: Finding a value
Mixer 0
Mixer 1: Shard 0-8
Mixer 1: Shard 17-24
Mixer 1: Shard 9-16
Shard 0 Shard 10 Shard 12 Shard 20
Distributed Storage (e.g. GFS)
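One way to build intuition for the computation tree is a sketch in which every shard scans its own data and only small partial results travel up through the mixers. The function names, the two-level layout, and the sample values are assumptions for illustration; the real system is not described at this level of detail on the slides.

```python
def shard_partial(values):
    """Leaf: a shard scans its own data and returns a small partial result."""
    return {"count": len(values), "max": max(values)}


def mix(partials):
    """Mixer: combine partial results from children into one partial result."""
    return {
        "count": sum(p["count"] for p in partials),
        "max": max(p["max"] for p in partials),
    }


# Hypothetical data held by four shards.
shards = [[3, 9, 4], [7, 1], [12, 5, 6], [2]]

leaf_results = [shard_partial(s) for s in shards]
# Two intermediate mixers each combine two shards; the root combines the mixers.
level1 = [mix(leaf_results[:2]), mix(leaf_results[2:])]
root = mix(level1)
```

Every shard is read in parallel, and the amount of data passed upward stays constant per node no matter how many rows each shard holds.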
-
SELECT COUNT(foo), MAX(foo), STDDEV(f
FROM ...
BigQuery SQL Example: Simple aggregates
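Simple aggregates fit a tree of workers because each one can be computed from small, mergeable summaries. COUNT and MAX merge trivially; the sketch below shows one standard way to merge standard deviation using per-shard (count, mean, sum of squared deviations) summaries. It assumes STDDEV means the sample standard deviation and is not a description of BigQuery's internals.

```python
import math
import statistics


def partial(values):
    """Per-shard summary: (count, mean, sum of squared deviations)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2


def merge(a, b):
    """Combine two summaries without revisiting the underlying rows."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


def sample_stddev(summary):
    """Sample standard deviation from a merged summary (assumes n > 1)."""
    n, _, m2 = summary
    return math.sqrt(m2 / (n - 1))


# Hypothetical values of `foo` spread over three shards.
shards = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
total = partial(shards[0])
for shard in shards[1:]:
    total = merge(total, partial(shard))
merged = sample_stddev(total)
```

The merged value agrees with `statistics.stdev` over all rows at once, up to floating-point rounding, even though no single step ever saw the whole dataset.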
-
SELECT ... FROM ...
WHERE REGEXP_MATCH(url, "\.com$")
AND user CONTAINS 'test'
BigQuery SQL Example: Complex Processing
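For readers less familiar with the two operators, the filter can be mirrored in Python: a regular-expression search for the URL test and a substring test for the user. The sample rows are hypothetical, and small differences between regex dialects are ignored here.

```python
import re

# Hypothetical rows with the two fields the query filters on.
rows = [
    {"url": "http://example.com", "user": "test_account"},
    {"url": "http://example.org", "user": "test_account"},
    {"url": "http://example.com", "user": "alice"},
    {"url": "http://example.com/page", "user": "tester"},
]

DOT_COM_AT_END = re.compile(r"\.com$")


def matches(row):
    # search() looks anywhere in the string, so the `$` anchor is what
    # restricts the match to URLs that end in ".com".
    return bool(DOT_COM_AT_END.search(row["url"])) and "test" in row["user"]


selected = [row for row in rows if matches(row)]
```

Only the first row passes both conditions. Because each row is evaluated independently, this kind of filter parallelizes across shards without any coordination.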
-
SELECT COUNT(*) FROM
(SELECT foo ..... )
GROUP BY foo
BigQuery SQL Example: Nested SELECT
-
7/28/2019 PUBLIC - Google I_O 2012- Crunching Big Data With BigQuery
43/63
BigQuery SQL Example: Small JOIN
SELECT huge_table.foo
FROM huge_table
JOIN small_table
ON huge_table.foo = small_table.foo
-
BigQuery Architecture: Small Join
Mixer 0
Mixer 1: Shard 0-8
Mixer 1: Shard 17-24
Shard 0 Shard 20
Distributed Storage (e.g. GFS)
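A small join works in this architecture because the small table can be copied to every shard, letting each shard join its own slice of the huge table locally. The sketch below illustrates that idea with a dictionary lookup; the table contents and the `join_on_shard` helper are hypothetical, and the slides do not spell out the actual mechanism.

```python
# Hypothetical small table, small enough to copy to every shard.
small_table = {"a": "alpha", "b": "beta"}

# Hypothetical shards of the huge table; each row is just its `foo` value.
huge_table_shards = [["a", "c", "b"], ["b", "b", "d"], ["a"]]


def join_on_shard(shard_rows, lookup):
    """Each shard joins its rows against its local copy of the small table."""
    return [(foo, lookup[foo]) for foo in shard_rows if foo in lookup]


# Every shard gets the same lookup; only matched rows travel back up the tree.
joined = [
    pair
    for shard in huge_table_shards
    for pair in join_on_shard(shard, small_table)
]
```

The huge table never moves. Only the small table is replicated, which is why the approach stops working once the second table no longer fits alongside each shard.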
-
SELECT foo, bar
FROM huge_table
Where huge_table is very large and no filter is applied
Fix with:
... LIMIT 100
BigQuery SQL Example: Response too large
-
SELECT ... FROM ... GROUP BY user_id
Where number of unique users is very large.
Fix with:
... WHERE HASH(user_id) % 10 = 0
BigQuery SQL Example: Internal response too large
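The HASH(user_id) % 10 = 0 filter keeps roughly one user in ten while keeping every row for the users it selects. The Python sketch below shows that property using `zlib.crc32` as a stand-in hash; the real HASH() function is not specified on the slide, and the synthetic event log is hypothetical.

```python
import zlib


def stable_hash(user_id):
    """Deterministic hash; stands in for the HASH() function on the slide."""
    return zlib.crc32(user_id.encode("utf-8"))


def in_sample(user_id, buckets=10):
    # Keep a user only if their hash lands in bucket 0 of `buckets`.
    return stable_hash(user_id) % buckets == 0


# Hypothetical event log: 500 users with 10 events each.
events = [f"user{n % 500}" for n in range(5000)]

sampled_events = [u for u in events if in_sample(u)]
sampled_users = set(sampled_events)
```

Because the decision depends only on the user ID, a sampled user keeps all of their events, so per-user aggregates stay correct for the sample while the number of groups shrinks.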
-
SELECT user_id, COUNT(user_id) ...
GROUP BY user_id
ORDER BY user_id DESC
Where number of unique users is very large.
Fix with:
SELECT TOP(user_id, 20), count(user_id) ...
BigQuery SQL Example: Internal response too large
-
Advanced Query Demo
Using GitHub timeline dataset
Wikipedia:
"GitHub is a web-based hosting service for software development projects ... GitHub is ... the most popular source hosting site"
http://en.wikipedia.org/wiki/Shared_web_hosting_service
-
Summary
What is big data, anyway?
BigQuery's Not MapReduce
What's BigQuery good for?
How to think about query execution
-
SELECT questions FROM audience
SELECT 'Thank You!'
FROM ryan, jordan
http://developers.google.com/bigquery
@ryguyrg http://profiles.google.com/ryan.boyd
@tigani https://plus.google.com/115600841849663767233