Design Patterns for 360º Views using HBase and Kiji Jonathan Natkins

Upload: hbasecon

Post on 10-May-2015

DESCRIPTION

Speaker: Jonathan Natkins (WibiData)

Many companies aspire to have 360-degree views of their data. Whether they're concerned about customers, users, accounts, or more abstract things like sensors, organizations are focused on developing capabilities for analyzing all the data they have about these entities. This talk will introduce the concept of entity-centric storage, discuss what it means and what it enables for businesses, and show how to develop an entity-centric system using the open-source Kiji framework and HBase. It will also compare traditional methods of building a 360-degree view on a relational database with building against a distributed key-value store, and explain why HBase is a good choice for implementing an entity-centric system.

TRANSCRIPT

Page 1: Design Patterns for Building 360-degree Views with HBase and Kiji

Design Patterns for 360º Views using HBase and Kiji

Jonathan Natkins

Page 2

Who am I?

Jon “Natty” Natkins

Field Engineer at WibiData

Formerly at Cloudera/Vertica

Page 3

What is a 360º View?

Page 4

What is a 360º View For?

Past: What interactions has a customer had in the past?

Present: What is the customer doing right now?

Future: What is the customer likely to do next?

Past and present inform the future

Page 5

What If I Don’t Care About Customers?

Page 6

Generalizing the 360º View: Entity-Centric Systems

Page 7

Goal of an Entity-Centric System

“Show me everything I know about Natty”

Page 8

What Data Do I Need to Store?

Static data

Event-oriented data

Derived data

Page 9

Building Entity-Centric Systems

Often, this is an EDW with a star schema

[Figure: star schema diagram with one Fact table and four Dim tables]

Page 10

Challenges With Star Schemas

How do we answer the original question?

Full table scan + joins

OLTP systems will likely fall over from the volume

OLAP systems are usually not optimized for single-row lookups

Page 11

Need Something Else…

Page 12
Page 13

Why HBase?

HBase rows can store both static and event-oriented data

Cell versions are key

Single-row lookups are extremely fast

Page 14

Kiji is for Building Entity-Centric Systems

Often used for:

Building recommendation systems

Personalized search

Real-time HBase applications

Underlying technologies:

Page 15

Designing an Entity-Centric Datastore

Ask yourself this: what is the entity?

Determine your entity by determining how you want to analyze the data

It’s ok to have data organized in multiple ways

Page 16

Schema Management with Kiji

Sometimes you actually want a schema layer

Defining a schema allows for data discoverability

Page 17

Column Families in Kiji

Kiji has two types of column families

Group families are similar to relational tables

Predefined set of columns

Each column has its own data type

Map families specify columns at runtime

Every column has the same data type
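
The difference between the two family types can be sketched with a toy in-memory model. This is not the Kiji API: the class names `GroupFamily` and `MapFamily` are hypothetical, and plain Python types stand in for the real column schemas.

```python
class GroupFamily:
    """Hypothetical model of a group family: a fixed set of typed columns."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = dict(columns)  # qualifier -> expected Python type

    def validate(self, qualifier, value):
        # Columns must be declared up front, and each has its own type.
        if qualifier not in self.columns:
            raise KeyError(f"{self.name}:{qualifier} is not a declared column")
        if not isinstance(value, self.columns[qualifier]):
            raise TypeError(f"{self.name}:{qualifier} expects {self.columns[qualifier].__name__}")
        return True


class MapFamily:
    """Hypothetical model of a map family: any qualifier, one shared type."""

    def __init__(self, name, value_type):
        self.name = name
        self.value_type = value_type

    def validate(self, qualifier, value):
        # Qualifiers are chosen at write time; only the value type is fixed.
        if not isinstance(value, self.value_type):
            raise TypeError(f"{self.name}:{qualifier} expects {self.value_type.__name__}")
        return True


# Column names borrowed from the slides; the types are assumptions.
info = GroupFamily("info", {"name": str, "email": str, "purchases": float})
sessions = MapFamily("sessions", dict)
```

Writing `info:name` passes validation, writing an undeclared `info:phone` raises `KeyError`, and `sessions` accepts any qualifier (for example a session ID) as long as the value has the shared type.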

Page 18

Knowing When To Use Different Family Types

Do you know all of your columns up front?

Then use a group family

Map families are for when you don’t know your columns ahead of time

[Figure: example row with columns info:name, info:email, info:purchases, sessions:1234, sessions:2345]

Page 19

Choosing a Row Key

Row keys in Kiji are componentized

[ ‘component1’, ‘component2’, 1234 ]

More efficient than byte arrays

Consider ‘1234567890’ versus [ 1234567890 ]

Good for scanning areas of the keyspace
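
To make the ‘1234567890’ versus [ 1234567890 ] comparison concrete, here is a minimal sketch of a typed numeric component next to its text form. The slides do not describe Kiji's actual byte layout, so `encode_long` is a hypothetical, generic order-preserving encoding (8 bytes, big-endian, sign bit flipped), not Kiji's implementation.

```python
import struct


def encode_long(value):
    """Encode a signed 64-bit integer so that byte order matches numeric order.

    Adding 2**63 is equivalent to flipping the sign bit, which makes
    negative numbers sort before positive ones when compared as bytes.
    """
    return struct.pack(">Q", value + (1 << 63))


as_text = "1234567890".encode("utf-8")  # one byte per digit
as_long = encode_long(1234567890)       # fixed-width binary

# Text keys sort digit by digit, so "9" lands after "10";
# the fixed-width encoding keeps numeric order.
text_order_is_numeric = b"9" < b"10"
long_order_is_numeric = encode_long(9) < encode_long(10)
```

Under these assumptions the text form takes 10 bytes and the typed form takes 8, and only the typed form sorts numerically, which is what makes range scans over a numeric component meaningful.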

Page 20

A Common Use for Components

Known user IDs versus unknown IDs

On a website, how do you differentiate between a logged-in or cookie’d user and a brand new visitor?

[ ‘K’, ‘user1234’ ] or [ ‘U’, ‘unknown2345’ ]

Physically and logically separate rows

Run jobs over all known or unknown users
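
Because rows are stored in key order, a leading ‘K’ or ‘U’ component groups each class of user into one contiguous range. The sketch below models that with a sorted list of tuples; `prefix_scan` is a hypothetical helper, not an HBase or Kiji call.

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix):
    """Return all keys whose leading components equal `prefix`.

    `sorted_keys` must be a sorted list of tuples, mimicking the
    ordered keyspace of the table.
    """
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if key[:len(prefix)] != prefix:
            break  # left the contiguous range for this prefix
        matches.append(key)
    return matches


rows = sorted([
    ("K", "user1234"),
    ("U", "unknown2345"),
    ("K", "user0042"),
    ("U", "unknown0001"),
])

known_users = prefix_scan(rows, ("K",))
unknown_users = prefix_scan(rows, ("U",))
```

A job over all known users then only needs to read the range that starts at the ‘K’ prefix and can stop as soon as the prefix changes.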

Page 21

Identifying Known Users

Problem: Users have many cookies over time.

Challenge: Ideally, we would have a single row for each user. How do we ensure that new data goes to the right row?

Page 22

Finding Known Users With Lookup Tables

HBase get operations are fast

It’s easy enough to create a table that contains a mapping of cookies to known user IDs

When data is loaded, check the lookup table to determine if you should write data to an existing row or a new one
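
The load-time check described above can be sketched with two dictionaries standing in for the lookup table and the user table. The function names `resolve_row_key` and `load_event` are hypothetical, and treating the cookie itself as the ID of an unknown user is an assumption made for the example.

```python
def resolve_row_key(cookie, lookup_table):
    """Map a cookie to the row that should receive its data."""
    user_id = lookup_table.get(cookie)  # stands in for a single fast get
    if user_id is not None:
        return ("K", user_id)  # cookie belongs to a known user
    return ("U", cookie)       # no mapping yet: use an unknown-user row


def load_event(cookie, event, lookup_table, user_table):
    """Append an event to the row chosen by the lookup table."""
    row_key = resolve_row_key(cookie, lookup_table)
    user_table.setdefault(row_key, []).append(event)
    return row_key


cookie_to_user = {"cookie-abc": "user1234", "cookie-def": "user1234"}
user_table = {}

load_event("cookie-abc", "click", cookie_to_user, user_table)
load_event("cookie-def", "purchase", cookie_to_user, user_table)
load_event("cookie-xyz", "view", cookie_to_user, user_table)
```

Both cookies that map to `user1234` end up in the same row, while the unmapped cookie gets its own row under the ‘U’ prefix.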

Page 23

Avoiding Hotspots

Page 24

Unhashed Row Keys

[Figure: regions A-B, B-C, D-E, F-G, H-I and J-K distributed across Node 1, Node 2 and Node 3]

Page 25

Hash-Prefixed Row Keys

[Figure: regions 00A-0fK, 10A-1fK, 20A-2fK, 30A-3fK, 40A-4fK and 50A-5fK distributed across Node 1, Node 2 and Node 3]
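
The effect of a hash prefix can be sketched by assigning keys to regions with and without one. Everything here is an assumption for illustration: SHA-256 as the hash, a one-byte prefix, six regions, and `region_for`, which splits the range of the first key byte evenly. The slides do not say which hash or prefix length Kiji uses.

```python
import hashlib


def hash_prefixed_key(user_id, prefix_len=1):
    """Prepend the first bytes of a digest of the ID to the ID itself."""
    raw = user_id.encode("utf-8")
    return hashlib.sha256(raw).digest()[:prefix_len] + raw


def region_for(key, num_regions):
    """Pick a region by evenly splitting the range of the first key byte."""
    return key[0] * num_regions // 256


user_ids = [f"user{i:04d}" for i in range(1000)]

# Sequential IDs share a leading byte, so they all fall in one region.
unhashed_regions = {region_for(u.encode("utf-8"), 6) for u in user_ids}

# With a hash prefix, the same IDs spread across every region.
hashed_regions = {region_for(hash_prefixed_key(u), 6) for u in user_ids}
```

The trade-off is that hashed keys no longer sort by user ID, so range scans in ID order are lost; single-row lookups still work because the prefix can be recomputed from the ID.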

Page 26

Storing Event Series

360º views need easy access to all the transactions and events for a user

HBase cells may contain more than one version

Kiji leverages this to store event series data like clicks or purchases

[Figure: example row with columns info:name, info:email, info:purchases, sessions:1234, sessions:2345; the sessions and info:purchases columns appear multiple times]
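
A cell that keeps several timestamped versions behaves like a small time series. The sketch below is a toy in-memory model, not HBase or Kiji client code; the class name `VersionedCell` and its methods are hypothetical, and `max_versions` must be a positive integer or `None`.

```python
class VersionedCell:
    """Toy model of one column in one row, holding timestamped versions."""

    def __init__(self, max_versions=None):
        self.max_versions = max_versions  # None keeps every version
        self._versions = {}               # timestamp (ms) -> value

    def put(self, timestamp_ms, value):
        # A write with an existing timestamp replaces that version.
        self._versions[timestamp_ms] = value
        if self.max_versions is not None:
            # Drop the oldest versions beyond the configured limit.
            for ts in sorted(self._versions)[:-self.max_versions]:
                del self._versions[ts]

    def get(self, num_versions=1):
        """Return up to `num_versions` (timestamp, value) pairs, newest first."""
        newest_first = sorted(self._versions.items(), reverse=True)
        return newest_first[:num_versions]


purchases = VersionedCell()
purchases.put(1000, "book")
purchases.put(2000, "lamp")
purchases.put(3000, "desk")
latest_two = purchases.get(2)

recs = VersionedCell(max_versions=2)
for ts, value in [(1, "a"), (2, "b"), (3, "c")]:
    recs.put(ts, value)
```

Reading the full purchase history is then a single-row read that asks for many versions, rather than a scan over an events table.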

Page 27

How Many Events is Too Many?

The HBase book warns that too many versions of a cell can cause StoreFile bloat

HBase will never split a row

Common tactic is to add a timestamp range to the row key

Kiji makes this easy with componentized row keys
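
Adding a timestamp range as a row key component might look like the sketch below. The monthly bucket, the `YYYYMM` format and the function name `bucketed_row_key` are assumptions; the right granularity depends on how many events each entity generates.

```python
from datetime import datetime, timezone


def bucketed_row_key(user_id, event_ts_ms):
    """Build a row key whose last component is the event's month (UTC)."""
    event_time = datetime.fromtimestamp(event_ts_ms / 1000, tz=timezone.utc)
    return ("K", user_id, event_time.strftime("%Y%m"))


may_ts = int(datetime(2014, 5, 5, tzinfo=timezone.utc).timestamp() * 1000)
june_ts = int(datetime(2014, 6, 1, tzinfo=timezone.utc).timestamp() * 1000)

may_key = bucketed_row_key("user1234", may_ts)
june_key = bucketed_row_key("user1234", june_ts)
```

Each user's events are then split across one row per month, which bounds row size, and all of a user's rows stay adjacent in the keyspace so a prefix scan can still read the full history.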

Page 28

Beware of Timestamp Misuse

A major reason the HBase book warns against mucking with timestamps is that they can be dangerous

What happens if you use a sequence number as a timestamp? Think about TTLs
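
The TTL question can be worked through with a small sketch. TTL expiry compares a cell's timestamp with the current time, so a small sequence number looks like a moment shortly after the 1970 epoch. `is_expired` is a hypothetical helper that mirrors that comparison, and the clock value and 30-day TTL are illustrative.

```python
def is_expired(cell_timestamp_ms, ttl_seconds, now_ms):
    """Return True if the cell is older than the TTL allows."""
    return now_ms - cell_timestamp_ms > ttl_seconds * 1000


NOW_MS = 1_400_000_000_000          # an illustrative clock reading in May 2014
THIRTY_DAYS = 30 * 24 * 60 * 60     # TTL in seconds

real_timestamp = NOW_MS - 60 * 60 * 1000  # written one hour ago
sequence_number = 42                      # used in place of a timestamp

fresh_cell_expired = is_expired(real_timestamp, THIRTY_DAYS, NOW_MS)
sequence_cell_expired = is_expired(sequence_number, THIRTY_DAYS, NOW_MS)
```

With any finite TTL, the cell written with a sequence number is treated as decades old and is eligible for removal as soon as it is written.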

Page 29

Iterate and Evolve

Page 30

Why is Evolution Necessary?

No entity-centric system will be the end-all, be-all the first time around

Data sources in large enterprises are usually heavily silo’d

Start small

Incorporate new data sources over time

Page 31

Putting it Together

Kiji includes a shell to use DDL to create tables

Many of the features that have been discussed are declarative via the DDL

Page 32

Users Table

CREATE TABLE 'user_events' WITH DESCRIPTION 'Events table for online users.'
ROW KEY FORMAT (type STRING, user_id STRING NOT NULL, HASH(THROUGH user_id))
PROPERTIES (NUMREGIONS = 32)
WITH LOCALITY GROUP default WITH DESCRIPTION 'main storage' (
  MAXVERSIONS = INFINITY,
  TTL = FOREVER,
  INMEMORY = false,
  MAP TYPE FAMILY events CLASS com.kiji.avro.Event WITH DESCRIPTION 'events'
),
LOCALITY GROUP memory WITH DESCRIPTION 'recs storage' (
  MAXVERSIONS = 10,
  TTL = FOREVER,
  INMEMORY = true,
  FAMILY recs (
    recommended CLASS com.kiji.avro.ProductRecList WITH DESCRIPTION 'Recommended products.'
  )
);

Page 36

In Summary…

Designing applications in an entity-centric fashion can make them easier to build and more efficient

Kiji can speed up the development process of 360º views

Page 37

Questions?

Contact me

[email protected]

@nattyice

The Kiji Project: kiji.org