CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
Agenda
• Why Hadoop and HBase?
• Social Media Monitoring
• Prospective Search and Coprocessors
• Challenges & Lessons Learned
• Resources to get started
August 31, 2012
About Sentric
• Spin-off of MeMo News AG, the leading provider of Social Media Monitoring & Analytics in Switzerland
• Big Data expert, focused on Hadoop, HBase and Solr
• Objective: Transforming data into insights
CC 2.0 by Editor B | http://flic.kr/p/bcU5aD1
Social Media Monitoring Process
Why Hadoop and HBase?
Information Gathering
Information Processing
Analysis & Interpretation
Insight Presentation
Requirements
SMM
Cost-effective
Highly scalable
RT Alerting
Analytical capabilities
Reliable
Freshness
Hadoop
• HDFS + MapReduce
• Based on Google Papers
• Distributed Storage and Computation Framework
• Affordable Hardware, Free Software
• Significant Adoption
HBase
• Non-Relational, Distributed Database
• Column-Oriented
• Multi-Dimensional
• High Availability
• High Performance
• Built on top of HDFS as storage layer
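The "multi-dimensional" bullet can be read as a nested map: row key, column family, column qualifier, and timestamp together address one value. The sketch below is only a logical model in plain Python, not the HBase client API; `ToyTable` and its methods are made-up names for illustration.

```python
from collections import defaultdict


class ToyTable:
    """Logical model only: row -> family -> qualifier -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, qualifier, timestamp, value):
        # Writing the same cell with a new timestamp adds a version.
        self._rows[row][family][qualifier][timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        versions = self._rows.get(row, {}).get(family, {}).get(qualifier, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None


table = ToyTable()
table.put("article-1", "meta", "title", 100, "Draft headline")
table.put("article-1", "meta", "title", 200, "Final headline")
latest = table.get("article-1", "meta", "title")
as_of_150 = table.get("article-1", "meta", "title", timestamp=150)
```

Here `latest` holds the newer version, while `as_of_150` reads the cell as it looked before the second write.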
Technology Stack
• HBase / HDFS: Storage
• Hadoop Mahout: Analytics
• Solr: Search
• HBase RowLog: Event mechanism (MQ)
• Prospective search: Real-time alerting
CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
Overview
Social Media Monitoring
[Diagram: Search Agents and Downloaded Articles are compared (match?); Output: RT Alerts, Reports, Web-UI]
Icons by http://dryicons.com
Solution Architecture
[Architecture diagram, components: REST, n News Agents, MySQL, Solr, Web-UI, RT Alerts, Coprocessor, HBase]
Prospective Search with Coprocessors
[Diagram labels: Processing, HRegionServer, HRegion, Put operations, Prospective Search, RT Alerts]
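Prospective search inverts the usual search flow: the queries (search agents) are stored up front, and every newly written document is matched against them. The following is a minimal Python sketch of that idea, assuming simple keyword agents where all terms must occur in the document. It is not the coprocessor code from the talk; the class, the `on_put` hook name, and the agent ids are hypothetical.

```python
import re
from collections import defaultdict


class ProspectiveSearch:
    """Stores queries up front and matches each new document against them."""

    def __init__(self):
        self._agents = {}                 # agent id -> set of required terms
        self._by_term = defaultdict(set)  # term -> ids of agents that use it

    def register(self, agent_id, query):
        terms = set(self._tokenize(query))
        self._agents[agent_id] = terms
        for term in terms:
            self._by_term[term].add(agent_id)

    def on_put(self, text):
        """Called for every stored document; returns ids of matching agents."""
        doc_terms = set(self._tokenize(text))
        # Only agents that share at least one term with the document are checked.
        candidates = set()
        for term in doc_terms:
            candidates |= self._by_term.get(term, set())
        return sorted(a for a in candidates if self._agents[a] <= doc_terms)

    @staticmethod
    def _tokenize(text):
        return re.findall(r"\w+", text.lower())


search = ProspectiveSearch()
search.register("agent-hbase", "hbase outage")
search.register("agent-solr", "solr")
alerts = search.on_put("Report: HBase outage hits cluster")
```

Indexing agents by term keeps the per-document work proportional to the agents that could match, which matters when the hook runs on every Put.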
Key Figures
• Monthly growth
  • Index: 200 GB
  • 50 million docs/month
• HBase: 600 GB
  • Raw data, metadata and extracted data
• A few thousand MapReduce jobs/month
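To put these figures in perspective, the short calculation below derives an average ingest rate and an index footprint per document. It assumes a 30-day month, decimal gigabytes, and that the 200 GB index figure is monthly growth belonging to the same 50 million documents; none of these assumptions is stated on the slide.

```python
DOCS_PER_MONTH = 50_000_000         # from the slide
INDEX_GROWTH_BYTES = 200 * 10**9    # from the slide; assumes decimal GB per month
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month

docs_per_second = DOCS_PER_MONTH / SECONDS_PER_MONTH
index_bytes_per_doc = INDEX_GROWTH_BYTES / DOCS_PER_MONTH

print(f"{docs_per_second:.1f} docs/s on average")
print(f"{index_bytes_per_doc:.0f} index bytes per document")
```

Under these assumptions the average is roughly 19 documents per second and about 4 KB of index per document; peak rates are not recoverable from the slide.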
CC 2.0 by saebaryo | http://flic.kr/p/5T4t5L
1 Benchmarks - workloads
2 Supervision
3 Keys and shards – Schema design /LG
4 Timestamps, the 4th dimension
5 Short ColumnFamily names ->
6 File handles, OS
7 JVM Tuning, GC !!!
8 Scaling region servers, data locality!
9 Automatic vs. manual splits, compaction
10 Do not treat HBase as rock solid in prod
11 Forget firefighting actions, it takes some time
12 Use HBase for an appropriate use case
13 Tune and tweak – it's not a project – it's a process
14 You need DevOps in production
15 Huge know-how curve, you need to know the whole ecosystem
16 Use a distribution: it's packaged, tested and supports migration, enterprise grade
17 Virtualization, hardware
18 Don't struggle too much, there is a good community
19 Share your knowledge
20 It's early stage, many tools around, a few still missing
Challenges & Lessons Learned
Challenges
• Everyone is still learning
  • Some issues only appear at scale
  • At scale, nothing works as advertised
• Production cluster configuration
  • Hardware issues
  • Tuning cluster configuration to our workloads
• HBase stability
  • Monitoring health of HBase
Lessons - General
• Do not rely on HBase as a frontend storage layer. It's not going to be rock solid
• Don't struggle too much, there is a good community
• Share your knowledge
• It's early stage, many tools around, a few still missing
Lessons - Planning
• Use HBase for an appropriate use case
• Use a distribution: it's packaged, tested and supports migration, enterprise grade
• Benchmarks – know your workloads & query patterns
  • YCSB
• Schema & Key Design
  • What's queried together should be stored together
• Scaling region servers, data locality!
• Virtualization vs. Real Hardware
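"What's queried together should be stored together" follows from HBase keeping rows sorted by row key, so rows that share a key prefix sit next to each other and can be read in one scan. The sketch below illustrates this with a plain sorted list in Python. The key layout (source, then reversed timestamp) is a hypothetical example, not the schema used in the talk.

```python
import bisect

MAX_TS = 10**13  # assumption: millisecond timestamps stay below this bound


def row_key(source, timestamp_ms):
    """Hypothetical composite key: source prefix, then reversed timestamp."""
    # Zero-padding keeps lexicographic order equal to numeric order.
    return f"{source}|{MAX_TS - timestamp_ms:013d}"


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with `prefix`; they are adjacent in sorted order."""
    start = bisect.bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


keys = sorted([
    row_key("blog.example", 1_346_400_000_000),
    row_key("news.example", 1_346_400_000_000),
    row_key("news.example", 1_346_403_600_000),
    row_key("zine.example", 1_346_400_000_000),
])
news_rows = prefix_scan(keys, "news.example|")
```

Reversing the timestamp makes the newest article of a source sort first within its prefix. Whether that suits a given workload is exactly what the benchmarking bullet is about.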
Lessons - Performance Tuning
• Number of CF < 10
  • Compaction + Flushing I/O intensive
• Short ColumnFamily names
  • HFile index size occupies allocated RAM (storefileindexSize)
• OS file handles
  • ulimit -n 32768
• JVM Tuning, GC !!!
  • HMaster 1024 MB
  • RegionServer 8192 MB
  • -XX:+UseConcMarkSweepGC
  • -XX:+CMSIncrementalMode
• Automatic vs. manual splits
• Be careful with expensive operations in coprocessors
• Play with all the configurations and benchmark for tuning
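The advice on short ColumnFamily names makes sense because the family name is stored as part of every cell's key, and those keys also end up in the HFile block index. The estimate below assumes the classic KeyValue layout (no tags, no block encoding, no compression), as commonly described for HBase releases of that period. The row, family, and qualifier values are invented for illustration.

```python
# Fixed bytes per cell: key length (4), value length (4), row length (2),
# family length (1), timestamp (8), key type (1).
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_size(row, family, qualifier, value):
    """Approximate uncompressed bytes for one cell in the classic KeyValue layout."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


row, qualifier, value = b"news.example|0001", b"title", b"x" * 100
short = cell_size(row, b"d", qualifier, value)
long_ = cell_size(row, b"article_metadata", qualifier, value)
saved_per_cell = long_ - short
```

The saving is just the difference in name length per cell, which adds up across billions of cells. Compression and block encoding reduce the on-disk effect, so this is an upper bound rather than a measurement.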
Lessons - Operation
• Monitoring/Operational tooling is most important
• Forget “emergency actions”, it takes some time
• Tune and tweak – it's not a project – it's a process
• You need DevOps in production
• Huge know-how curve, you need to know the whole ecosystem
  • Hadoop, HDFS, MapReduce
Resources to get started
• http://hbase.apache.org/book.html
• http://www.sentric.ch/blog/best-practice-why-monitoring-hbase-is-important
• http://www.sentric.ch/blog/hadoop-overview-of-top-3-distributions
• http://www.sentric.ch/blog/hadoop-best-practice-cluster-checklist
• http://outerthought.org/blog/465-ot.html
Thank you!
Questions?
Christian Gügi, [email protected]
Jean-Pierre König, [email protected]
NoSQL Roadshow Basel
Cluster
Masters
Cluster
Worker