HBase Design Patterns @ Yahoo!

HBase Design Patterns @ Y! PRESENTED BY Francis Liu | May 5, 2014

Upload: hbasecon

Post on 27-Aug-2014

DESCRIPTION

Speaker: Francis Liu (Yahoo!). HBase's introduction into the Yahoo! Grid has given our users new ways to process and store data. A year after it became available, usage has been varied: event processing for personalization, incremental processing for ingestion, time-based aggregations for analytics, etc. All of these were possible thanks to features HBase brings beyond working with HDFS files. This talk reviews some recurring HBase design patterns at Yahoo! and shares our lessons and experiences.

TRANSCRIPT

Page 1: HBase Design Patterns @ Yahoo!

HBase Design Patterns @ Y!

PRESENTED BY Francis Liu | May 5, 2014

Y! Grid

▪ Off-Stage Processing
▪ Hosted Service
▪ Multi-tenant

Batch Processing (with HDFS)

▪ Append-only
▪ Efficient full table scans
▪ Process entire data set (or partitions)

HBase

▪ Mutable
▪ Point Access
▪ Range scans
▪ Record-level processing
▪ 7 clusters, 1500 nodes, 6PB

Entity Store: Motivation

▪ Integrate data from multiple data sources
▪ Store historical data
▪ Share data
  › Analytics
  › Machine Learning
  › Consume a data source

Entity Store

▪ Records as Entities
  › Web pages
  › Celebrities
  › etc.
▪ Denormalized as a single table

Entity Store: Content Store

Entity Store: Considerations

▪ Row vs multiple rows as an entity?
  › Row in most cases
▪ Blob vs Primitives as cell values?
  › Blobs are more compact
  › Primitives work better for granular updates
  › Out of the box filters work better with primitives
  › Use a compact binary format
▪ Prepare for Schema Changes
  › Provide a DAO library
▪ Incremental Scan
  › Batch id (via version)
  › Size cache for batch
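One possible reading of the "Incremental Scan" bullet: each ingestion batch writes its cells with the batch id as the cell version, so a later job can ask only for cells whose version falls in a given batch range instead of rescanning everything. The sketch below models that idea in memory. `EntityTable`, `put`, and `scan_batches` are hypothetical names for illustration, not HBase client calls; in HBase itself the closest mechanism is probably a scan restricted to a time range, but that mapping is an assumption.

```python
from collections import defaultdict


class EntityTable:
    """Toy stand-in for a table whose cells keep multiple versions."""

    def __init__(self):
        # row key -> column -> {version: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, batch_id):
        # Assumption taken from the slide: the batch id is stored as the cell version.
        self._rows[row][column][batch_id] = value

    def scan_batches(self, min_batch, max_batch):
        """Yield (row, column, batch_id, value) for cells written by batches in
        [min_batch, max_batch), in row-key order, newest matching version only."""
        for row in sorted(self._rows):
            for column in sorted(self._rows[row]):
                versions = self._rows[row][column]
                matching = [b for b in versions if min_batch <= b < max_batch]
                if matching:
                    newest = max(matching)
                    yield row, column, newest, versions[newest]


table = EntityTable()
table.put("page:a", "meta:title", "A v1", batch_id=1)
table.put("page:b", "meta:title", "B v1", batch_id=1)
table.put("page:a", "meta:title", "A v2", batch_id=2)
table.put("page:c", "meta:title", "C v1", batch_id=3)

# Only rows touched by batches 2 and 3 come back; page:b is skipped.
changed = list(table.scan_batches(2, 4))
```

A downstream job that remembers the last batch it processed can pass that value as `min_batch` on its next run.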

Event Processing: Motivation

▪ Process a stream of events
  › Ad Targeting
  › Personalization
  › etc.
▪ Low average age of a record/model/etc.

Event Processing

▪ Entity Store
▪ Incremental computation
  › Persist incremental state
▪ Stream processing framework
  › e.g. Storm
▪ Fit working set in Block Cache
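The "Incremental computation / Persist incremental state" bullets can be illustrated with a small read-modify-write loop: each event is folded into a running per-entity state, and that state is written back so the next event (or a restarted worker) picks up where the last one left off. This is a minimal sketch under stated assumptions: `StateStore` is a hypothetical in-memory stand-in for the entity table, and the event fields `user` and `value` are invented for the example.

```python
class StateStore:
    """Toy key-value store standing in for the entity table."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


def process_event(store, event):
    """Fold one event into the persisted running state for its user."""
    key = event["user"]
    state = store.get(key) or {"count": 0, "total": 0.0}
    new_state = {
        "count": state["count"] + 1,
        "total": state["total"] + event["value"],
    }
    store.put(key, new_state)  # persist so later events build on this state
    return new_state


store = StateStore()
events = [
    {"user": "u1", "value": 2.0},
    {"user": "u2", "value": 5.0},
    {"user": "u1", "value": 4.0},
]
for event in events:
    process_event(store, event)

u1 = store.get("u1")
u1_mean = u1["total"] / u1["count"]
```

Only the small running state is read and written per event, which is why keeping the working set in the block cache matters for this pattern.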

Event Processing: Ad Targeting

Ad Targeting

Event Processing: Considerations

▪ Limit large compactions
▪ Deferred log flush
▪ Avoid compaction storms
▪ Async Access
  › HBase work queue
  › AsyncHBase
▪ Blobs when possible
▪ Cache optimizations
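The "Async Access" bullet suggests decoupling event handling from store writes. One way to picture a work queue: the event handler enqueues puts and returns immediately, while a background writer drains the queue and flushes puts in small batches. The sketch below uses only the Python standard library. The `writer` function, the `flush` callback, and `max_batch` are hypothetical names chosen for illustration; this is not the AsyncHBase API.

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer to shut down


def writer(work_queue, flush, max_batch=3):
    """Drain puts from the queue and hand them to flush() in batches."""
    while True:
        item = work_queue.get()  # block until there is work
        if item is _STOP:
            return
        batch = [item]
        # Opportunistically pull whatever else is already queued.
        while len(batch) < max_batch:
            try:
                nxt = work_queue.get_nowait()
            except queue.Empty:
                break
            if nxt is _STOP:
                flush(batch)
                return
            batch.append(nxt)
        flush(batch)


flushed = []  # each element is one batch handed to the store
work_queue = queue.Queue()
thread = threading.Thread(target=writer, args=(work_queue, flushed.append))
thread.start()

# The producer only enqueues; it never waits on the store.
for i in range(10):
    work_queue.put(("row-%d" % i, i))
work_queue.put(_STOP)
thread.join()

written = [put for batch in flushed for put in batch]
```

With a single writer thread the puts reach the store in the order they were enqueued; batch sizes depend on timing and will vary between runs.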

Phased Event Processing: Motivation

▪ Large/Complex event pipeline
▪ Modularization
▪ Dependency between pipelines

Phased Event Processing

▪ Notifications
  › Separate Table
  › Separate Column Family

Phased Event Processing: Personalization

Phased Event Processing: Considerations

▪ Notifications
  › Ordered
  › At least once
▪ Write to multiple regions
▪ Transactions
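"Ordered" and "At least once" together imply that a downstream phase may see the same notification more than once and has to tolerate it. A minimal sketch, assuming notifications are keyed by a zero-padded sequence number (so that sorted key order matches numeric order) and that the consumer skips anything at or below the last sequence it applied. The key format and the `Consumer` class are assumptions made for this example, not details from the talk.

```python
def notification_key(seq):
    # Zero-padded so lexicographic order matches numeric order.
    return "%012d" % seq


class Consumer:
    """Applies each notification once even if it is delivered repeatedly."""

    def __init__(self):
        self.last_applied = -1  # highest sequence number already applied
        self.applied = []

    def handle(self, key, payload):
        seq = int(key)
        if seq <= self.last_applied:
            return False  # duplicate delivery, skip
        self.applied.append(payload)
        self.last_applied = seq
        return True


# Notifications written out of order still sort into sequence order by key.
notifications = {notification_key(s): "entity-%d changed" % s for s in (2, 10, 1)}
ordered = sorted(notifications.items())

# Simulate at-least-once delivery: the first two are delivered, then the
# reader restarts and replays everything from the beginning.
consumer = Consumer()
deliveries = ordered[:2] + ordered
results = [consumer.handle(key, payload) for key, payload in deliveries]
```

The duplicate check relies on ordered delivery; without ordering, the consumer would need to track individual notification ids instead of a single high-water mark.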

Time Series DB: Motivation

▪ Track/Monitor changes over time
  › Application Metrics
  › User Analytics
  › System Metrics
  › etc.
▪ Alerts/Alarms
  › Thresholds
  › Changes over time

Time Series DB: Personalization Data Quality

Time Series DB: Considerations

▪ Hot metrics
  › Namespace
  › Indexed tags
▪ Pre-compute aggregates if they are accessed often
▪ Consider using a block encoding scheme (PREFIX, FAST_DIFF, etc.)
▪ Consider pre-computed aggregates in a separate table
▪ Consider OpenTSDB
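To make the row-key and pre-aggregation bullets concrete, here is a rough sketch of one possible layout: the row key is the metric name followed by a big-endian hour-bucket timestamp, so all rows for one metric sort together in time order, and per-hour aggregates are maintained in a separate structure as points arrive. The key format, `TimeSeriesStore`, and the choice of hourly buckets are all assumptions for illustration; this is not OpenTSDB's actual schema. Keys that share a long common prefix like this are also the case where prefix-style block encodings tend to help.

```python
import struct

BUCKET_SECONDS = 3600  # assumed bucket width: one row per metric per hour


def row_key(metric, ts):
    """Metric name + NUL separator + big-endian start of the hour bucket."""
    base = ts - (ts % BUCKET_SECONDS)
    return metric.encode("utf-8") + b"\x00" + struct.pack(">I", base)


class TimeSeriesStore:
    def __init__(self):
        self.raw = {}     # row key -> {offset within bucket: value}
        self.hourly = {}  # separate "table": row key -> [count, sum, min, max]

    def add(self, metric, ts, value):
        # Assumes each data point is written once; rewrites would double-count.
        key = row_key(metric, ts)
        self.raw.setdefault(key, {})[ts % BUCKET_SECONDS] = value
        agg = self.hourly.get(key)
        if agg is None:
            self.hourly[key] = [1, value, value, value]
        else:
            agg[0] += 1
            agg[1] += value
            agg[2] = min(agg[2], value)
            agg[3] = max(agg[3], value)

    def hourly_range(self, metric, start_ts, end_ts):
        """Return pre-computed aggregates for buckets between the two timestamps."""
        lo, hi = row_key(metric, start_ts), row_key(metric, end_ts)
        return [(k, self.hourly[k]) for k in sorted(self.hourly) if lo <= k <= hi]


tsdb = TimeSeriesStore()
tsdb.add("cpu.load", 3610, 1.0)
tsdb.add("cpu.load", 3620, 3.0)
tsdb.add("cpu.load", 7205, 2.0)
tsdb.add("mem.used", 3601, 9.0)

cpu_hours = [agg for _, agg in tsdb.hourly_range("cpu.load", 3600, 7300)]
```

Reading a dashboard value then touches one small aggregate row per hour rather than every raw point in the range.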

HBaseCon 2014

Thank You! (We're hiring)