Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Presented by Dheeraj Kapur, Rajiv Chittajallu, Anish Mathew | May 5, 2014

Upload: hbasecon

Post on 09-May-2015


DESCRIPTION

Speakers: Dheeraj Kapur, Rajiv Chittajallu & Anish Mathew (Yahoo!). In early 2013, Yahoo! introduced multi-tenancy to HBase to offer it as a platform service for all Hadoop users. A degree of per-tenant customization (a tenant being a user or a project) was achieved through RegionServer groups, namespaces, and customized configs for each tenant. This talk covers how to accommodate the diverse needs of individual tenants on the cluster, as well as operational tips and techniques that allow Yahoo! to automate the management of multi-tenant clusters at petabyte scale without errors.

TRANSCRIPT

Page 1: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Presented by Dheeraj Kapur, Rajiv Chittajallu, Anish Mathew | May 5, 2014

Page 2: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Agenda

Topic (Speaker)

• Overview of Hadoop stack and Grid Infrastructure at Yahoo (Rajiv Chittajallu)

• Application onboarding on Multi-Tenant HBase (Dheeraj Kapur)

• Automation for Compaction/Splits and Monitoring (Anish Mathew)

• Q&A (All Presenters)

Page 3: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Hadoop at Yahoo

Page 4: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Hadoop Usage at Yahoo

HBaseCon 2014

[Diagram: Hadoop usage at Yahoo. Labels recovered from the figure:]

• Sources: Browsers, Mobile Devices, Web Crawl, KnowledgeGraph, 3rd Party

• Stages: Data Collection, Asynchronous Data Processing, Synchronous Serving

• Feeds: User Events, WCC, Entity Feeds, Content Feeds

• Yahoo Grid: source of truth for data*

• Downstream: Business Intelligence Tools (e.g. Tableau, MicroStrategy), Serving Systems, Content systems, Y! NoSql

• Properties: HomeRun, Search, Mail, Mobile, Flickr, Media, Stream Ads, Native Ads, Display Ads

Page 5: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Grid Infrastructure at Yahoo

A multi-tenant, secure, distributed compute and storage environment, based on the Hadoop stack, for large-scale data processing

Page 6: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Grid Stack

Page 7: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Deployment Model

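[Diagram: deployment model. Labels recovered from the figure:]

• Node roles: DataNode and NodeManager (with NameNode and RM); DataNodes and RegionServers (with NameNode and HBase Master); Supervisor (with Nimbus)

• Shared services: Administration, Management and Monitoring; ZooKeeper Pools; HTTP/HDFS/GDM Load Proxies; Oozie Server; HS2/HCat

• Applications and Data: Data Feeds, Data Stores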

Page 8: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Network Architecture – 1G to node

Page 9: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Network Architecture – 10G Node

[Diagram: 10G network architecture. Labels recovered from the figure: two virtual chassis (VC0 and VC1), each with spine0 through spine7 and leaf0 through leaf31; top-of-rack (tor) switches; 40G (or 4 x 10G) links; hosts attached at 10G]

Page 10: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

HBase @ Yahoo

• 7 clusters, 1500 region servers, 6 PB of data

• Diverse use cases, 500+ Tables, 100k regions

• Rolling Major compaction & Split and Group Rebalancing

• RegionServer groups, namespaces and multi region config System

Page 11: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Challenges

• Customer onboarding and provisioning

• Access management and Table provisioning

• Deployments

• Customizing group configs

• Rolling Major Compaction and Splits

• Group Balancing

Page 12: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Use Cases

Page 13: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Use Cases

Yahoo’s Global Business

• Search: Web Cache, Query Analysis, Local Listings, Analytics

• Y! Mail: Anti-spam, Log Analytics, Metadata Mgmt.

• Cloud Platforms: Performance, Monitoring, OpenStack

• Consumer Platforms: CMS, Social Data

• Online Ads: Traffic Protection, Ads Data Mgmt.

• P13N: Content P13N, Ad targeting

• Mobile: Notifications, Flickr

• Sales: eCommerce

Page 14: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Web Crawl Cache

[Diagram: Web Crawl Cache architecture. Labels recovered from the figure:]

• Actors and components: Developers/Scientists, Poller, Fetcher, Ingestor, Extruder, Processing, Random Read

• Operations: poll, fetch, launch, write, r/w, insert, scan

• Compute Clusters: NM/DN nodes, HDFS NN, YARN RM

• HBase Clusters: RS/DN nodes, HDFS NN, HM

Page 15: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Customer Onboarding & Multi Tenancy

Page 16: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Customer Onboarding & Provisioning

• Two identical environments (Prod and Non-Prod)

• Applications are onboarded to Non-Prod for performance and integration testing

• Once ready, they are provisioned on Prod

• Non-Prod performance results inform production onboarding

Page 17: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Namespaces

• Allow tenants to create/drop/modify their own tables

• Previously only the super admin could do this

• Quota Management

• Security administration

• Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
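
To make tenant self-service concrete, here is a minimal sketch that generates the HBase shell statements an administrator might run once per tenant: create the namespace, grant the tenant's user rights on it, then verify it. The helper name `namespace_onboarding_commands` and the tenant and user names are hypothetical. `create_namespace` and `describe_namespace` come from the command list above; the `grant ... '@namespace'` form is the HBase shell's namespace-level permission syntax and is assumed to be available on the cluster.

```python
def namespace_onboarding_commands(namespace, tenant_user, permissions="RWXCA"):
    """Return HBase shell statements that provision one tenant namespace.

    Hypothetical helper: it only builds strings, it does not talk to HBase.
    """
    if not namespace or not all(c.isalnum() or c == "_" for c in namespace):
        raise ValueError("namespace must be alphanumeric or underscore")
    if not permissions or not set(permissions) <= set("RWXCA"):
        raise ValueError("permissions must be drawn from R, W, X, C, A")
    return [
        # Create the tenant's namespace.
        f"create_namespace '{namespace}'",
        # Grant the tenant rights on the whole namespace ('@' marks a namespace).
        f"grant '{tenant_user}', '{permissions}', '@{namespace}'",
        # Show the namespace so the operator can verify the result.
        f"describe_namespace '{namespace}'",
    ]


if __name__ == "__main__":
    for line in namespace_onboarding_commands("search", "search_app"):
        print(line)
```

The printed statements could be piped into `hbase shell` by an onboarding job; after that the tenant can create tables such as `search:webcache` without super-admin involvement.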

Page 18: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

RegionServer Groups

• HBase 0.94 lacks QoS

• Isolation is required in a multi-tenant environment

• Different apps require different configs

• Commands: group_add, group_balance, group_get, group_list, group_list_tables, group_list_transitions, group_move_servers, group_move_tables, group_of_server, group_of_table, group_remove
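
The isolation idea behind RegionServer groups can be sketched with a toy in-memory model: every server belongs to exactly one group, every table is assigned to a group, and a table's regions may only be placed on servers in that group. The method names mirror some of the shell commands above, but the class itself, its `default` group, and `eligible_servers` are illustrative assumptions, not the HBase implementation.

```python
class RegionServerGroups:
    """Toy model of group membership (illustrative, not the HBase code)."""

    DEFAULT = "default"  # assumed name for the catch-all group

    def __init__(self, servers):
        # Every server starts in the default group.
        self._servers_by_group = {self.DEFAULT: set(servers)}
        self._group_by_table = {}

    def group_add(self, group):
        self._servers_by_group.setdefault(group, set())

    def group_move_servers(self, group, servers):
        if group not in self._servers_by_group:
            raise KeyError(group)
        # A server belongs to one group only, so remove it everywhere first.
        for members in self._servers_by_group.values():
            members.difference_update(servers)
        self._servers_by_group[group].update(servers)

    def group_move_tables(self, group, tables):
        if group not in self._servers_by_group:
            raise KeyError(group)
        for table in tables:
            self._group_by_table[table] = group

    def group_of_table(self, table):
        return self._group_by_table.get(table, self.DEFAULT)

    def group_of_server(self, server):
        for group, members in self._servers_by_group.items():
            if server in members:
                return group
        return None

    def eligible_servers(self, table):
        """Servers allowed to host regions of this table."""
        return sorted(self._servers_by_group[self.group_of_table(table)])


if __name__ == "__main__":
    groups = RegionServerGroups(["rs1", "rs2", "rs3"])
    groups.group_add("mail")
    groups.group_move_servers("mail", ["rs1", "rs2"])
    groups.group_move_tables("mail", ["mail:metadata"])
    print(groups.eligible_servers("mail:metadata"))
```

Because placement is restricted to the table's group, a heavy workload in one group cannot take RegionServer capacity from tenants in another.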

Page 19: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Multi Region Configs

[Diagram: multi region config pipeline. Labels recovered from the figure:]

• Components: SVN, Jenkins Build Farm, Master Repository, Slave Repository Colo A, Slave Repository Colo B, HBase Cluster A, HBase Cluster B

• Steps: Fetch Group List; Generate Multi Configs; Merge Default Config & Push multi config; Sync Configs; Download Host Maps and Multi Region Config
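
The "merge default config" step can be sketched as a dictionary overlay: each host receives the cluster-wide defaults with its group's overrides applied on top. This is a minimal sketch under assumptions; the function name, the group names, the host map shape, and all property values are illustrative and not taken from the talk.

```python
def generate_multi_configs(default_config, group_overrides, host_groups):
    """Return {host: effective config} by overlaying group overrides on defaults.

    default_config:  {property: value} shared by every host
    group_overrides: {group: {property: value}} per RegionServer group
    host_groups:     {host: group} (the "host map")
    """
    configs = {}
    for host, group in host_groups.items():
        merged = dict(default_config)                   # start from defaults
        merged.update(group_overrides.get(group, {}))   # group values win
        configs[host] = merged
    return configs


if __name__ == "__main__":
    defaults = {
        "hbase.hregion.max.filesize": "10737418240",       # illustrative value
        "hbase.hregion.memstore.flush.size": "134217728",  # illustrative value
    }
    overrides = {"webcache": {"hbase.hregion.max.filesize": "21474836480"}}
    hosts = {"rs1": "webcache", "rs2": "default"}
    for host, conf in sorted(generate_multi_configs(defaults, overrides, hosts).items()):
        print(host, conf)
```

Keeping overrides small and group-scoped means a change for one tenant touches only that group's hosts, while the defaults stay in a single place.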

Page 20: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Compaction

• Two types: minor and major

• Minor: picks up a few smaller files and rewrites them as one

• Major: picks up all files, drops deleted and expired cells, and rewrites them as one

Page 21: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Compaction file selection

[Chart: compaction file selection. Axes: file size against file age (older to younger). A minCompactSize threshold is marked, and files are labeled Excluded or Included]
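
The selection chart can be approximated in code. The sketch below is a simplified version of ratio-based selection for minor compactions: walking from the oldest file, a file is excluded while it is larger than both `min_compact_size` and `ratio` times the combined size of all newer files. The function name and default values are assumptions, and the real HBase policy has further limits (for example a maximum file count and a maximum size) that are left out here.

```python
def select_files_for_minor_compaction(sizes_oldest_first, min_compact_size,
                                      ratio=1.2, min_files=3):
    """Return the sizes of the files selected for a minor compaction.

    Simplified sketch: older files that are too large relative to the
    newer ones are skipped; everything after the first kept file is taken.
    """
    if min_files < 1:
        raise ValueError("min_files must be at least 1")
    sizes = list(sizes_oldest_first)
    start = 0
    while len(sizes) - start >= min_files:
        newer_total = sum(sizes[start + 1:])
        threshold = max(min_compact_size, newer_total * ratio)
        if sizes[start] > threshold:
            start += 1      # too big compared with the newer files: exclude
        else:
            break           # this file and all newer files are included
    selected = sizes[start:]
    # Not worth compacting if too few files remain.
    return selected if len(selected) >= min_files else []


if __name__ == "__main__":
    print(select_files_for_minor_compaction([1000, 100, 60, 50, 40], 15))
    print(select_files_for_minor_compaction([500, 10, 8, 6], 600))
```

In the first call the 1000-unit file is excluded because it dwarfs the newer files. In the second call every file is below `min_compact_size`, so all of them are included regardless of the ratio.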

Page 22: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Compaction/Split Management

Page 23: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Config Parameters

• hbase.hstore.blockingStoreFiles

• hbase.hstore.compaction.max.size

• hbase.hstore.compaction.min.size

• hbase.hstore.compaction.ratio

• hbase.hregion.max.filesize

• hbase.hregion.memstore.flush.size

• hbase.master.wait.on.regionservers.mintostart

Page 24: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Managed Compactions and Splits

• Flexible Scheduling

• Custom Logic per table and workload
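
One way to picture "custom logic per table" is a small policy object that the scheduler consults for each region. The sketch below is hypothetical: the field names, thresholds, and the idea of a daily maintenance window are assumptions used for illustration, not details given in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TablePolicy:
    """Hypothetical per-table rules for triggering a major compaction."""
    max_store_files: int          # compact once a region has this many files
    max_hours_since_major: float  # or once this much time has passed
    window_start_hour: int        # maintenance window, hours in 0..23
    window_end_hour: int          # exclusive; may wrap past midnight


def in_window(policy, hour):
    start, end = policy.window_start_hour, policy.window_end_hour
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end  # window wraps past midnight


def should_major_compact(policy, store_files, hours_since_major, hour):
    """Decide whether a region qualifies under its table's policy."""
    if not in_window(policy, hour):
        return False
    return (store_files >= policy.max_store_files
            or hours_since_major >= policy.max_hours_since_major)


if __name__ == "__main__":
    serving = TablePolicy(max_store_files=8, max_hours_since_major=168,
                          window_start_hour=22, window_end_hour=4)
    print(should_major_compact(serving, store_files=9, hours_since_major=10, hour=23))
    print(should_major_compact(serving, store_files=9, hours_since_major=10, hour=12))
```

A latency-sensitive serving table could get a narrow off-peak window, while a batch-loaded table could allow compaction at any hour with a lower file-count threshold.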

Page 25: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Compaction and Splits Scheduler

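[Diagram: compaction and splits scheduler. Labels recovered from the figure:]

• Inputs: Metrics (Region Specific Metrics, Server Metrics), stored in MySQL and analyzed to produce Scheduling Parameters

• HBaseCtl: Scheduler

• Targets: HBase Cluster A, HBase Cluster B, HBase Cluster C

• Outputs: Update Compaction/Split Statistics; Publish to HDFS

• ZooKeeper: Coordination & Intermediate Store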
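
A rolling schedule can be sketched as a series of waves in which no RegionServer compacts more than a fixed number of regions at once. This is an assumed design for illustration only; the function name, the input shape, and the per-server cap are hypothetical, and the real scheduler also uses the metrics shown in the diagram.

```python
from collections import defaultdict


def plan_rolling_waves(region_to_server, max_per_server=1):
    """Group regions into waves with at most max_per_server regions per server."""
    if max_per_server < 1:
        raise ValueError("max_per_server must be at least 1")
    by_server = defaultdict(list)
    for region in sorted(region_to_server):
        by_server[region_to_server[region]].append(region)

    waves = []
    wave_index = 0
    while True:
        lo = wave_index * max_per_server
        hi = lo + max_per_server
        wave = []
        for server in sorted(by_server):
            # Take this server's next slice of regions, if any remain.
            wave.extend(by_server[server][lo:hi])
        if not wave:
            break
        waves.append(wave)
        wave_index += 1
    return waves


if __name__ == "__main__":
    placement = {"r1": "rs1", "r2": "rs1", "r3": "rs2"}
    for number, wave in enumerate(plan_rolling_waves(placement), start=1):
        print(number, wave)
```

Each wave would be submitted only after the previous one finishes, which bounds the extra I/O any single server sees during a cluster-wide major compaction.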

Page 26: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Group Balancing

• Scheduled group balance followed by rolling major compaction

• Based on Data Locality

– Find data locality of each block of store files

– Move region to server where the maximum blocks are located

• Helps after cluster upgrades and restarts

• After config changes for a region group
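
The locality rule above ("move region to server where the maximum blocks are located") can be sketched as follows. The input is assumed to be a per-region count of store-file blocks held locally by each server. The function name and data shapes are hypothetical, and the sketch ignores server load, which a real balancer would also have to consider.

```python
def plan_locality_moves(local_blocks, current_server, group_servers):
    """Return {region: target server} for regions that would gain locality.

    local_blocks:   {region: {server: number of the region's blocks stored there}}
    current_server: {region: server currently hosting the region}
    group_servers:  servers in the region's RegionServer group
    """
    candidates = sorted(group_servers)  # sorted for deterministic tie-breaks
    moves = {}
    for region in sorted(local_blocks):
        counts = local_blocks[region]
        current = current_server[region]
        best = max(candidates, key=lambda server: counts.get(server, 0))
        # Move only when the best server holds strictly more blocks.
        if counts.get(best, 0) > counts.get(current, 0):
            moves[region] = best
    return moves


if __name__ == "__main__":
    blocks = {"r1": {"rs1": 2, "rs2": 9}, "r2": {"rs1": 5, "rs2": 5}}
    hosting = {"r1": "rs1", "r2": "rs2"}
    print(plan_locality_moves(blocks, hosting, {"rs1", "rs2"}))
```

Restricting candidates to the group's own servers keeps the balancer from breaking tenant isolation. Running it after restarts, upgrades, or config changes restores locality that region reassignment tends to lose.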

Page 27: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Monitoring

Page 28: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Monitoring

• Simon Metrics & Yahoo Monitoring As a Service (YMS)

• OpenTSDB at Yahoo, replacing MySQL as backend for YMS

Page 29: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Monitoring (cont.)

Page 30: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Monitoring (cont.): Metrics for Customers

[Diagram: customer metrics pipeline. Labels recovered from the figure:]

• Components: Simon System, HBase, HBase Master, HDFS, Grid Snodes, Customer Dashboards, Other Systems for Analysis & Reporting

• Jenkins Job: Merges and Formats Metrics

• Flows: Region Server Metrics; Memory Dump from Master; Upload data to HDFS; Push compiled metrics to snodes; Fetch metrics

Page 31: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Monitoring (cont.): OpenTSDB

• Evaluating

• Work is required to make it production-ready at Yahoo

Page 32: Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity

Thank You
