CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3

Source: nosqlroadshow.com/dl/NoSQL-Road-Show/slides/NoSQL-Basel/GugiK…



2

Agenda

•  Why Hadoop and HBase?
•  Social Media Monitoring
•  Prospective Search and Coprocessors
•  Challenges & Lessons Learned
•  Resources to get started

August 31, 2012


3

About Sentric

•  Spin-off of MeMo News AG, the leading provider of Social Media Monitoring & Analytics in Switzerland

•  Big Data expert, focused on Hadoop, HBase and Solr

•  Objective: Transforming data into insights


CC 2.0 by Editor B | http://flic.kr/p/bcU5aD1


5

Social Media Monitoring Process

Why Hadoop and HBase?


Information Gathering

Information Processing

Analysis & Interpretation

Insight Presentation


6

Requirements

Why Hadoop and HBase?


SMM

Cost effective

Highly scalable

RT Alerting

Analytical capabilities

Reliable

Freshness


7

Hadoop

•  HDFS + MapReduce
•  Based on Google Papers
•  Distributed Storage and Computation Framework
•  Affordable Hardware, Free Software
•  Significant Adoption

Why Hadoop and HBase?


8

HBase

•  Non-Relational, Distributed Database
•  Column-Oriented
•  Multi-Dimensional
•  High Availability
•  High Performance
•  Built on top of HDFS as storage layer

Why Hadoop and HBase?


9

Technology Stack

Why Hadoop and HBase?


•  HBase / HDFS: Storage
•  Hadoop, Mahout: Analytics
•  Solr: Search
•  HBase RowLog: Event mechanism (MQ)
•  Prospective search: Real-time alerting


CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf


11

Overview

Social Media Monitoring


Search Agents

Downloaded Articles

Output

match?

RT Alerts Reports Web-UI

Icons by http://dryicons.com


12

Solution Architecture

Social Media Monitoring


REST

n News Agents

MySQL Solr

Web-UI

RT Alerts

Coprocessor

HBase


13

Prospective Search with Coprocessors

Social Media Monitoring


Processing

HRegionServer

HRegion

Put operations

Prospective Search

RT Alerts
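The flow on this slide (Put operations arrive at a region, stored search agents are matched against each new article, matches become real-time alerts) can be sketched in a few lines. This is only an illustration in plain Python, not HBase coprocessor code: `SearchAgent`, `ProspectiveIndex`, `register` and `on_put` are hypothetical names, and the "all terms must occur" rule is an assumed matching criterion. In a real deployment the equivalent of `on_put` would presumably run inside a coprocessor hook on the region server.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SearchAgent:
    """A stored query: fires when every term occurs in an article (assumed rule)."""
    name: str
    terms: frozenset


@dataclass
class ProspectiveIndex:
    """Hypothetical in-memory stand-in for the matching step of prospective search."""
    agents: list = field(default_factory=list)

    def register(self, name, *terms):
        # Queries are stored up front; documents arrive later.
        self.agents.append(SearchAgent(name, frozenset(t.lower() for t in terms)))

    def on_put(self, row_key, text):
        """Called once per newly stored article; returns the alerts to emit."""
        tokens = set(text.lower().split())
        return [(agent.name, row_key) for agent in self.agents if agent.terms <= tokens]


index = ProspectiveIndex()
index.register("hbase-watch", "hbase", "coprocessor")
index.register("solr-watch", "solr")

alerts = index.on_put("article-1", "HBase coprocessor hooks run inside the region server")
print(alerts)
```

The point of the design is the inversion: queries are indexed and documents are the probes, so an alert can be raised at write time instead of by polling a search index.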


14

Key Figures

•  Monthly growth
•  Index: 200 GB
•  50 million docs/month
•  HBase: 600 GB
•  Raw data, meta data and extracted data
•  A few thousand MapReduce jobs/month

Social Media Monitoring


CC 2.0 by saebaryo | http://flic.kr/p/5T4t5L


16

1  Benchmarks - workloads
2  Supervision
3  Keys and shards – Schema design /LG
4  Timestamps, the 4th dimension
5  Short ColumnFamily names ->
6  File handles, OS
7  JVM Tuning, GC !!!
8  Scaling region servers, data locality!
9  Automatic vs manual splits, compaction
10  Do not use HBase as rock solid in prod
11  Forget fire-brigade (emergency) actions, it takes some time
12  Use HBase for an appropriate use case
13  Tune and tweak – it's not a project – it's a process
14  You need DevOps in production
15  Huge know-how curve, you need to know the whole ecosystem
16  Use a distribution, it's packaged, tested and supports migration, enterprise grade
17  Virtualization, Hardware
18  Don't struggle too much, there is a good community
19  Share your knowledge
20  It's early stage, many tools around, a few still missing

Challenges & Lessons Learned


17

Challenges

•  Everyone is still learning
•  Some issues only appear at scale
•  At scale, nothing works as advertised
•  Production cluster configuration
•  Hardware issues
•  Tuning cluster configuration to our workloads
•  HBase stability
•  Monitoring health of HBase

Challenges & Lessons Learned


18

Lessons - General

•  Do not rely on HBase as frontend storage layer. It's not going to be rock solid
•  Don't struggle too much, there is a good community
•  Share your knowledge
•  It's early stage, many tools around, a few still missing

Challenges & Lessons Learned


19

Lessons - Planning

•  Use HBase for an appropriate use case
•  Use a distribution, it's packaged, tested and supports migration, enterprise grade
•  Benchmarks – know your workloads & query patterns
•  YCSB
•  Schema & Key Design
•  What's queried together should be stored together
•  Scaling region servers, data locality!
•  Virtualization vs. Real Hardware
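To make "what's queried together should be stored together" concrete, here is a minimal sketch of a composite row key. It relies on one fact about HBase, namely that rows are kept sorted by the raw bytes of their key, so rows sharing a key prefix sit next to each other. Everything else is an assumption for illustration: the `row_key` helper, the agent/timestamp/document layout and the reversed-timestamp trick are not taken from the talk.

```python
import struct

MAX_TS = 2**63 - 1  # largest signed 64-bit value, used to reverse timestamps


def row_key(agent_id: str, ts_millis: int, doc_id: str) -> bytes:
    """Hypothetical layout: <agent_id> 0x00 <reversed timestamp> 0x00 <doc_id>.

    All rows of one agent share a prefix, and the reversed timestamp makes
    newer articles sort before older ones within that prefix.
    """
    reversed_ts = MAX_TS - ts_millis
    return (
        agent_id.encode("utf-8")
        + b"\x00"
        + struct.pack(">q", reversed_ts)  # fixed-width, big-endian
        + b"\x00"
        + doc_id.encode("utf-8")
    )


keys = sorted(
    [
        row_key("agent-7", 1_000, "a"),
        row_key("agent-8", 1_500, "c"),
        row_key("agent-7", 2_000, "b"),
    ]
)
for key in keys:
    print(key)
```

With such a key, "latest matches for agent-7" becomes one short prefix scan rather than a filter over the whole table. The trade-off is that a very active agent concentrates writes on a single region, which is worth checking in benchmarks.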

Challenges & Lessons Learned


20

Lessons - Performance Tuning

•  Number of CF < 10
•  Compaction + Flushing I/O intensive
•  Short ColumnFamily names
•  HFile index size occupying allocated RAM (storefileindexSize)
•  OS file handles
•  ulimit -n 32768
•  JVM Tuning, GC !!!
•  HMaster 1024 MB
•  RegionServer 8192 MB
•  -XX:+UseConcMarkSweepGC
•  -XX:+CMSIncrementalMode
•  Automatic vs. manual splits
•  Be careful with expensive operations in coprocessors
•  Play with all the configurations and benchmark for tuning
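The file-handle item is easy to verify before starting a node. The sketch below compares the current open-file limit of the process with the 32768 from the slide. It uses Python's standard `resource` module, which is only available on Unix-like systems; the `nofile_ok` helper is a hypothetical name, and the check itself is a suggestion rather than part of the original setup.

```python
import resource

RECOMMENDED_NOFILE = 32768  # value from the slide: ulimit -n 32768


def nofile_ok(soft_limit: int, recommended: int = RECOMMENDED_NOFILE) -> bool:
    """Return True if the soft open-file limit meets the recommendation."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # unlimited always satisfies the recommendation
    return soft_limit >= recommended


# Limits of the current process; a service started under another user
# or init system may see different values.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
status = "ok" if nofile_ok(soft) else "too low"
print(f"open files: soft={soft} hard={hard} -> {status}")
```

Running this as the same user that starts the region server gives the most meaningful answer, since limits are set per user and per session.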

Challenges & Lessons Learned


21

Lessons - Operation

•  Monitoring/Operational tooling is most important
•  Forget "emergency actions", it takes some time
•  Tune and tweak – it's not a project – it's a process
•  You need DevOps in production
•  Huge know-how curve, you need to know the whole ecosystem
•  Hadoop, HDFS, MapRed

Challenges & Lessons Learned


22

Resources to get started

•  http://hbase.apache.org/book.html

•  http://www.sentric.ch/blog/best-practice-why-monitoring-hbase-is-important

•  http://www.sentric.ch/blog/hadoop-overview-of-top-3-distributions

•  http://www.sentric.ch/blog/hadoop-best-practice-cluster-checklist

•  http://outerthought.org/blog/465-ot.html


23

Thank you!

Questions? Christian Gügi, [email protected]

Jean-Pierre König, [email protected]

NoSQL Roadshow Basel

August 31, 2012


24

Cluster

Masters


25

Cluster

Worker
