case studies session 2

Blackbird: Billions of rows, couple of milliseconds away
Ishan Chhabra, Shrijeet Paliwal, Abhijit Pol

Upload: hbasecon

Post on 09-May-2015


TRANSCRIPT

Page 1: Case studies   session 2

Blackbird: Billions of rows, couple of milliseconds away

Ishan Chhabra, Shrijeet Paliwal, Abhijit Pol

Page 2: Case studies   session 2
Page 3: Case studies   session 2

[Figure: dollar amounts (for example $2.11, $1.26, $2.78) shown with the labels Site/Page, Geo/Weather, Time of Day, Brand Affinity, User]

Page 4: Case studies   session 2
Page 5: Case studies   session 2

Simple View of Rocket Fuel Platform

1. Page Request
2. Ad Request
3. Bid Request
4. Bid & Ad
5. Rocket Fuel Winning Ad
6. Ad Served

Diagram labels: Browser, Publishers, Exchange Partners, Data Partners, Real-time Bidder, Model Scoring, User DataStore, User Segments, User Engagement, User Engagements, Optimize

Page 6: Case studies   session 2

So what is Blackbird?

Page 7: Case studies   session 2

Requests per day

Facebook likes: 5 B
Searches on Google: 6 B
Bid Requests Considered by Rocket Fuel: 45 B

Page 8: Case studies   session 2

Time (ms)

Blink of an eye: 400
SF to Tokyo network round trip: 100
One beat of a hummingbird's wing: 20
Look up in Blackbird: 2

Page 9: Case studies   session 2

Powered by

Page 10: Case studies   session 2

HBase, we have a problem..

Page 11: Case studies   session 2

Object NoSQL Mapper

Page 12: Case studies   session 2

List<KeyValue>

Page 13: Case studies   session 2

High Performance Collections

Page 14: Case studies   session 2

» Data loss on concurrent modification
» Read per write
» High amount of data per write
» O(n)

Page 15: Case studies   session 2

» Significantly reduced flushes, compaction, network usage, GC.

» O(1)

Page 16: Case studies   session 2

Append Only, the HBase view

Logical Collection

Column        Contents
c1:combined   Combined Column: 100 entries
c1:rand1      1 entry
c1:rand2      2 entries
c1:rand3      1 entry
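
A minimal sketch of the append-only idea on this slide, using an in-memory dict as a stand-in for one HBase row. The class and method names (`AppendOnlyCollection`, `add`, `read`) are hypothetical and not from the talk; only the `c1:combined` / `c1:rand…` column naming follows the slide. Each add is a blind write to a fresh, uniquely named column, so no read-before-write is needed and concurrent writers do not overwrite each other.

```python
import uuid


class AppendOnlyCollection:
    """Toy model of one logical collection stored in a single row."""

    def __init__(self, family="c1"):
        self.family = family
        # qualifier -> list of entries; stands in for the columns of one HBase row
        self.row = {f"{family}:combined": []}

    def add(self, entries):
        # Blind write: a new, uniquely named column holds only the new entries.
        qualifier = f"{self.family}:{uuid.uuid4().hex}"
        self.row[qualifier] = list(entries)
        return qualifier

    def read(self):
        # The logical collection is the combined column plus every appended column.
        combined_key = f"{self.family}:combined"
        result = list(self.row[combined_key])
        for qualifier, entries in self.row.items():
            if qualifier != combined_key:
                result.extend(entries)
        return result
```

The cost of a write here depends only on the size of the new entries, not on the size of the existing collection, which is the O(1) property the previous slide contrasts with O(n) read-modify-write.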

Page 17: Case studies   session 2

Optimizing reads using normalization

Before:

Column        Contents
c1:combined   Combined Column: 100 entries
c1:rand1      1 entry
c1:rand2      2 entries
c1:rand3      1 entry

After:

Column        Contents
c1:combined   Combined Column: 103 entries
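
The normalization step on this slide can be sketched as folding every appended column back into the combined column, so that a later read touches one column instead of many. This is only an illustration on a plain dict: the function name `normalize` is hypothetical, and the talk does not state when normalization runs or how entries are merged, so the sketch simply concatenates them.

```python
def normalize(row, family="c1"):
    """Return a new row dict with all appended columns merged into the combined column."""
    combined_key = f"{family}:combined"
    prefix = f"{family}:"
    merged = list(row.get(combined_key, []))
    for qualifier, entries in row.items():
        if qualifier != combined_key and qualifier.startswith(prefix):
            merged.extend(entries)
    # Columns that belong to other families are carried over untouched.
    normalized = {q: v for q, v in row.items() if not q.startswith(prefix)}
    normalized[combined_key] = merged
    return normalized
```

Against a real table the same step would presumably be one write of the new combined column plus deletes of the merged columns; that part is an assumption and is not shown.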

Page 18: Case studies   session 2

.filter(λ), .transform(λ), ƒ

Page 19: Case studies   session 2

Secondary Indexes

Page 20: Case studies   session 2

High Throughput, Low Latency Lookups

Page 21: Case studies   session 2

Not so easy!

HBase is designed for high throughput writes

Page 22: Case studies   session 2

Key Ideas

Read as little as possible

Stay stable, uniform, data local

Don’t go to disk

Even if you have to go to disk, make it fast

Page 23: Case studies   session 2

Protobufs, Protobufs, everywhere

Page 24: Case studies   session 2

Stay stable, uniform, data local at all times

Good quality hardware

Properly designed row keys

Off peak daily major compaction
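
The slide does not say how the row keys are designed. One common way to keep load uniform across regions is to prefix each key with a bucket derived from a hash of the natural key; the sketch below shows that technique as an assumption, not as Blackbird's actual scheme. The function name `salted_row_key`, the bucket count and the `|` separator are all hypothetical.

```python
import hashlib


def salted_row_key(user_id: str, buckets: int = 16) -> bytes:
    """Prefix the natural key with a hash-derived bucket so keys spread evenly."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    # Two-digit bucket, a separator, then the original key.
    return f"{bucket:02d}|".encode("ascii") + user_id.encode("utf-8")
```

The bucket is deterministic, so a point lookup can recompute the full key from the user id alone. The trade-off is that range scans over the natural key order are no longer possible, which matters little for a lookup-oriented store.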

Page 25: Case studies   session 2

Give me all your Cache!

128 GB machines with 50% block cache
High cache hit ratio (90%+) by effective utilization

Page 26: Case studies   session 2
Page 27: Case studies   session 2

It’s time to disk(o)

15K SAS drives
Local & short-circuit reads (20-30% improvement)

Page 28: Case studies   session 2

High throughput writes are supported too!

Page 29: Case studies   session 2

Small Writes

• Append Only
• Protobufs

Large Memstores

• 4 GB
• Avoids flushes, memory churn, compaction
• Maintains read performance by avoiding multiple seeks

Tuned Compaction

• Avoid minor compactions
• Off-peak major compaction

Page 30: Case studies   session 2

Reliability & Availability

Page 31: Case studies   session 2

Organize the chaos or pay the cost..

Page 32: Case studies   session 2

Be aware…

» Blind writes can grow rows & table too big

» Newbie clients 'guess' a lot

» Simple queries such as row count can be hard on the fly

Page 33: Case studies   session 2

Multitenant Blackbird

Web app
Bid Serving
Ad serving
Data augmentation
Batch data pipelines
Ops Housekeeping
Real time data pipelines

Page 34: Case studies   session 2

Multitenancy makes it hard to find the offending client

Use ACLs & client side metrics in all access paths
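
One way to get client-side metrics in every access path is to hand each tenant a thin wrapper that attributes every call to a named client. The sketch below illustrates this; `MeteredClient`, `new_registry` and the counter names are hypothetical, and a real deployment would export the counters to a metrics system rather than keep them in a dict.

```python
import time
from collections import defaultdict


def new_registry():
    # client id -> counters for that client
    return defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


class MeteredClient:
    """Wraps a read function and records per-client call counts, errors and latency."""

    def __init__(self, client_id, fetch, registry):
        self.client_id = client_id
        self.fetch = fetch
        self.registry = registry

    def get(self, row_key):
        stats = self.registry[self.client_id]
        start = time.perf_counter()
        try:
            return self.fetch(row_key)
        except Exception:
            stats["errors"] += 1
            raise
        finally:
            # Runs on success and on failure, so every call is counted.
            stats["calls"] += 1
            stats["total_seconds"] += time.perf_counter() - start
```

With one wrapper per tenant sharing a registry, a sudden jump in calls or latency can be traced to a specific client instead of to the cluster as a whole.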

Page 35: Case studies   session 2

Draft guidelines for new clients, help them estimate the growth

Keep track of growth, row count, row size, column size etc.

Page 36: Case studies   session 2

Maintaining SLA Guarantees

Page 37: Case studies   session 2

It’s a delicate equilibrium that is hard to maintain

Shield it with aggressive alerting, dashboards & canary monitoring

Page 38: Case studies   session 2

1st region server dies after several hours of clogged RPC queue

Bad region moves to another region server & soon kills it too!

2jmj7l5rSw0yVb_vlWAYkK_Ybwk

stgLVlK_SsLMn4HoG82ymp-QlRtA

Clients can go rogue, it can get as bad as a DoS attack

Protection via dynamic blacklists & size limit filters
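
A minimal sketch of the two protections named above: a blacklist that is updated at runtime and a size limit on what a lookup may return. All names (`GuardedReader`, `max_bytes`, `fetch`) are hypothetical. In HBase the size check would more plausibly run on the server as a filter so that oversized rows are never shipped; this sketch checks on the client side only to show the control flow.

```python
class GuardedReader:
    """Refuses blacklisted keys and blacklists any key whose value exceeds a size limit."""

    def __init__(self, fetch, max_bytes):
        self.fetch = fetch          # function: row_key -> bytes or None
        self.max_bytes = max_bytes
        self.blacklist = set()      # can also be updated by operators at runtime

    def get(self, row_key):
        if row_key in self.blacklist:
            # Known-bad row: do not touch the cluster at all.
            return None
        value = self.fetch(row_key)
        if value is not None and len(value) > self.max_bytes:
            # Oversized row: remember it so later requests are rejected up front.
            self.blacklist.add(row_key)
            return None
        return value
```

After the first oversized read, later requests for the same key return immediately, which keeps one bad row from repeatedly tying up the region server that hosts it.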

Page 39: Case studies   session 2

Surviving the failures

Page 40: Case studies   session 2

» In the absence of a proxy: ‘The client is part of the cluster’ [1]

» Client must report an availability error to the calling application thread within a short time span

» Follow circuit breaker pattern for read calls (Anecdote)

» ‘pseudo’ puts (local file) for write calls

[1] Blog post from Lars Hofhansl http://hadoop-hbase.blogspot.com/2012/09/hbase-client-timeouts.html
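
The circuit breaker pattern mentioned for read calls can be sketched as follows. This is a generic illustration, not the talk's implementation: the class name, the thresholds and the injectable `clock` parameter are assumptions. After a run of consecutive failures the breaker opens and calls fail immediately, which is how the caller gets an availability error in a short time span; after a cool-down one trial call is let through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_seconds=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self.clock = clock
        self.failures = 0        # consecutive failures seen so far
        self.opened_at = None    # time the breaker opened, or None when closed

    def call(self, func, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_seconds:
                raise CircuitOpenError("circuit open, failing fast")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Open (or re-open after a failed trial) and restart the cool-down.
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A read path would wrap each lookup as `breaker.call(lookup, row_key)` and treat `CircuitOpenError` as "store unavailable", falling back to whatever default the application can tolerate.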

Page 41: Case studies   session 2

Shoutouts!

Page 43: Case studies   session 2
Page 44: Case studies   session 2