Datacenter@Night: How Big Data Technologies Power Facebook
DESCRIPTION
Karthik Ranganathan, a former lead engineer at Facebook who now works at Nutanix, uses his presentation to explain modern datacenters through the business use case of Facebook.
TRANSCRIPT
![Page 1: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/1.jpg)
1
Kannan Muthukkaruppan & Karthik Ranganathan
Jun/20/2013
How Big Data Technologies Power Facebook
Karthik Ranganathan
September, 2013
![Page 2: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/2.jpg)
2
Introduction
Email: [email protected]: @KarthikR
Current: Member of Technical Staff, Nutanix
Background: Technical Engineering Lead at Facebook. Co-built Cassandra for Facebook Inbox Search and improved the performance and resiliency of HBase for Facebook Messages and Search Indexing.
![Page 3: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/3.jpg)
3
Agenda
Big data at Facebook
HBase use cases
• OLTP
• Analytics
Operating at scale
The Nutanix solution
![Page 4: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/4.jpg)
4
Big Data at Facebook
OLTP
• User databases (MySQL)
• Photos (Haystack)
• Facebook Messages, Operational Data Store (HBase)
Warehouse
• Hive Analytics
• Graph Search Indexing
![Page 5: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/5.jpg)
5
HBase in a nutshell
Apache project, modeled after BigTable
Distributed, large scale data store
Built on top of Hadoop DFS (HDFS)
Efficient at random reads and writes
![Page 6: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/6.jpg)
6
FB’s Largest HBase Application
Facebook Messages
![Page 7: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/7.jpg)
7
The New Facebook Messages
![Page 8: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/8.jpg)
8
Why HBase?
Evaluated a bunch of different options
• MySQL, Cassandra, building a custom storage system for messages
Horizontal Scalability
Automatic failover and load balancing
Optimized for write-heavy workloads
HDFS already battle-tested at Facebook
HBase’s strong consistency model
![Page 9: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/9.jpg)
9
Quick stats (as of Nov 2011)
Traffic to HBase
• Billions of messages per day
• 75B+ RPCs per day
Usage pattern
• 55% reads, 45% writes
• Average write: 16 KVs (key-values) to multiple CFs (column families)
![Page 10: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/10.jpg)
10
Data Sizes
7PB+ online data
• ~21PB with replication
• LZO compressed
• Excludes backups
Growth rate
• 500TB+ per month
• ~20PB of raw disk per year!
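The figures on this slide are consistent with 3x replication. That factor is an assumption here (3 is the HDFS default; the slide does not state it), as is the use of decimal units. A quick check of the arithmetic, using the slide's lower bounds:

```python
# Back-of-the-envelope check of the slide's numbers.
# REPLICATION = 3 is an assumption (the HDFS default); the slide does not state it.
REPLICATION = 3
TB_PER_PB = 1000  # decimal units, assumed

online_pb = 7                # "7PB+ online data"
growth_tb_per_month = 500    # "500TB+ per month"

replicated_pb = online_pb * REPLICATION
raw_growth_pb_per_year = growth_tb_per_month * 12 * REPLICATION / TB_PER_PB

print(replicated_pb)           # 21
print(raw_growth_pb_per_year)  # 18.0
```

Since both inputs are quoted as lower bounds ("7PB+", "500TB+"), 18 PB per year is a floor, which fits the slide's "~20PB of raw disk per year".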
![Page 11: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/11.jpg)
11
Growing with size
Constant need for new features with growth
Read and write path improvements
• Performance optimizations
• IOPS reduction
• New database file format
Intelligent data and compute placement
• Shard level block placement
• Locality based load-balancing
![Page 12: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/12.jpg)
12
Other OLTP use cases of HBase
Operational Data Store
Multi-tenant KeyValue store
Site integrity – fighting spam
![Page 13: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/13.jpg)
13
Warehouse use cases of HBase
Graph Search Indexing
• Complex application logic
• Multiple verticals
Hive over HBase
• Realtime data ingest
• Enables real-time analytics
![Page 14: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/14.jpg)
14
Real-time monitoring and anomaly detection
Operational Data Store
![Page 15: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/15.jpg)
15
ODS: Facebook’s #1 Debugging Tool
Collects metrics from production servers
Supports complex aggregations and transformations
Really well-designed UI
![Page 16: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/16.jpg)
16
Quick stats
Traffic to HBase
• 150B+ ops per day
Usage pattern
• Heavy reads of recent data
• Frequent MapReduce (MR) jobs for rollups
• TTL to expire older data
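The rollup and TTL pattern can be sketched in a few lines. This is a minimal in-memory illustration, not how ODS is implemented: `rollup_hourly` and `expire` are hypothetical helper names, and in a real deployment the rollup would run as a batch job and expiry would be handled by the store's TTL setting.

```python
from collections import defaultdict

def rollup_hourly(points):
    """Average (timestamp_seconds, value) points into one value per hour bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % 3600].append(value)  # key = start of the hour
    return {hour: sum(vs) / len(vs) for hour, vs in sorted(buckets.items())}

def expire(points, now, ttl_seconds):
    """Keep only points whose age is within the TTL."""
    return [(ts, v) for ts, v in points if now - ts <= ttl_seconds]

points = [(0, 10.0), (1800, 20.0), (3600, 30.0), (7300, 50.0)]
hourly = rollup_hourly(points)
recent = expire(points, now=7300, ttl_seconds=3700)
```

Rolling up before expiring keeps a coarse history cheaply while the raw, fine-grained points age out.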
![Page 17: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/17.jpg)
17
Real-time Analytics
Facebook Insights
![Page 18: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/18.jpg)
18
Real-time URL/Domain Insights
Deep analytics for websites
• Facebook widgets
Massive scale
• Billions of URLs
• Millions of increments/sec
![Page 19: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/19.jpg)
19
Detailed Insights
Tracks many metrics
• Clicks, likes, shares, impressions
• Referral traffic
Detailed breakdown
• Age buckets, gender, location
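One way to picture the increment workload is a counter per (URL, metric, dimension, bucket). The sketch below is purely illustrative: the key layout, the `record` function, and the example URL are assumptions, not the schema used by Facebook Insights.

```python
from collections import Counter

# (url, metric, dimension, bucket) -> count; the key layout is hypothetical.
counters = Counter()

def record(url, metric, age_bucket, gender, location):
    """Increment the total and one counter per breakdown dimension."""
    counters[(url, metric, "total", "")] += 1
    counters[(url, metric, "age", age_bucket)] += 1
    counters[(url, metric, "gender", gender)] += 1
    counters[(url, metric, "location", location)] += 1

record("example.com/a", "like", "18-24", "f", "US")
record("example.com/a", "like", "25-34", "m", "US")
record("example.com/a", "share", "18-24", "f", "DE")
```

Each event fans out into several increments, which is one reason a single user action can translate into many writes per second at the storage layer.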
![Page 20: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/20.jpg)
20
Controlled Multi-tenancy
Generic KeyValue Store
![Page 21: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/21.jpg)
21
A Multi-tenant solution on HBase
Generic Key-Value store
• Multiple apps on the same cluster
• Transparent schema design
• Simple API
put(appid, key, value)
value = get(appid, key)
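A minimal sketch of what such an API could look like, assuming each tenant's keys are namespaced by `appid` inside one shared table. The class name `MultiTenantStore` and the row-key encoding are hypothetical; the slides only give the `put`/`get` signatures.

```python
class MultiTenantStore:
    """In-memory stand-in for a shared cluster; a dict plays the role of the table."""

    def __init__(self):
        self._rows = {}

    @staticmethod
    def _row_key(appid, key):
        # Prefix every key with the app id so tenants cannot collide.
        # The length prefix keeps ("a", "b:c") and ("a:b", "c") distinct.
        return f"{len(appid)}:{appid}:{key}"

    def put(self, appid, key, value):
        self._rows[self._row_key(appid, key)] = value

    def get(self, appid, key):
        return self._rows.get(self._row_key(appid, key))

store = MultiTenantStore()
store.put("chat", "user42", b"hello")
store.put("ads", "user42", b"campaign-7")
```

Callers never see the row key, which is one reading of "transparent schema design": the layout can change without touching application code.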
![Page 22: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/22.jpg)
22
Architecture
[Diagram: HBase and Memcache, with a write path labeled put(appid, key, value) and a read path labeled get(appid, key)]
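The diagram labels suggest reads are served through Memcache while writes go to HBase. The sketch below shows one common way to wire that up: a read-through cache with invalidation on write. The invalidation policy is an assumption (the slide does not say how the cache is kept fresh), and plain dicts stand in for both systems.

```python
hbase = {}         # stand-in for the backing store
memcache = {}      # stand-in for the cache tier
backend_reads = 0  # counts reads that had to go to the backing store

def put(appid, key, value):
    hbase[(appid, key)] = value
    memcache.pop((appid, key), None)  # invalidate so readers do not see stale data

def get(appid, key):
    global backend_reads
    k = (appid, key)
    if k in memcache:
        return memcache[k]
    backend_reads += 1
    value = hbase.get(k)
    if value is not None:
        memcache[k] = value  # populate the cache on a miss
    return value

put("app1", "x", "v1")
first = get("app1", "x")    # miss: read from the backing store
second = get("app1", "x")   # hit: served from the cache
put("app1", "x", "v2")
third = get("app1", "x")    # miss again after invalidation
```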
![Page 23: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/23.jpg)
23
Multi-tenancy Issues
Not a self-service model
• Each app is reviewed
Global and per-app metrics
• Monitor RPCs by type, latencies, errors
• Friendly names for apps
If things go wrong
• Per-app kill switch
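The per-app metrics and kill switch can be sketched as a thin gate in front of the store. `Gatekeeper` and `AppDisabled` are hypothetical names for illustration; the slides do not describe how the switch is implemented.

```python
from collections import Counter

class AppDisabled(Exception):
    """Raised when a call arrives for an app whose kill switch is on."""

class Gatekeeper:
    def __init__(self):
        self.killed = set()
        self.rpc_counts = Counter()  # (appid, rpc_type) -> number of admitted calls
        self.rejected = Counter()    # appid -> number of rejected calls

    def kill(self, appid):
        self.killed.add(appid)

    def call(self, appid, rpc_type, fn):
        """Run fn() on behalf of appid unless the app has been switched off."""
        if appid in self.killed:
            self.rejected[appid] += 1
            raise AppDisabled(appid)
        self.rpc_counts[(appid, rpc_type)] += 1
        return fn()

gate = Gatekeeper()
result = gate.call("chat", "get", lambda: "ok")
gate.kill("spammy")
try:
    gate.call("spammy", "put", lambda: "never runs")
    blocked = False
except AppDisabled:
    blocked = True
```

Keeping the counters per app is what makes the switch usable: an operator can see which tenant is misbehaving before turning it off.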
![Page 24: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/24.jpg)
24
Powering FB’s Semantic Search Engine
Graph Search Indexing
![Page 25: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/25.jpg)
25
Framework to build search indexes
Multiple, independent input sources
HBase stores document info
Output is the search index image
rowKey = document id
value = terms, document data
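To make the row layout concrete, here is a toy version of the index build: rows keyed by document id carry their terms, and the output maps each term to the documents that contain it. The function `build_index` and the row shape are assumptions for illustration; the actual index image format is not described in the slides.

```python
from collections import defaultdict

def build_index(rows):
    """rows: document id -> {"terms": [...], "data": ...}. Returns term -> sorted doc ids."""
    postings = defaultdict(set)
    for doc_id, row in rows.items():
        for term in row["terms"]:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

rows = {
    "doc1": {"terms": ["hbase", "search"], "data": "..."},
    "doc2": {"terms": ["search", "graph"], "data": "..."},
}
index = build_index(rows)
```

Keying rows by document id lets independent sources update the same document without coordinating, and the inversion to term-keyed postings happens only at build time.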
![Page 26: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/26.jpg)
26
Architecture
[Diagram: Document source 1, Document source 2, … feed the HBase cluster; an MR cluster produces the image files]
![Page 27: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/27.jpg)
27
Do’s and Do-Not’s From Experience
Operating at Scale
![Page 28: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/28.jpg)
28
Design for failures(!)
Architect for failures and manageability
No single point of failure
• Killing any process is legit
Minimize manual intervention
• Especially for frequent failures
Uptime is important
• Rolling upgrades are the norm
• Need to survive rack failures
![Page 29: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/29.jpg)
29
Dashboard and Metrics
Single place to graph/report everything
RPC calls
SLA misses
• Latencies, p99, Errors
• Per-request profiling
Cluster and node health
Network Utilization
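For readers unfamiliar with p99, the sketch below computes a nearest-rank percentile and counts SLA misses over a list of latencies. The 50 ms SLA threshold and the sample data are made up for the example.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

SLA_MS = 50  # hypothetical SLA threshold

latencies_ms = list(range(1, 101))  # 1, 2, ..., 100
p99 = percentile(latencies_ms, 99)
sla_misses = sum(1 for x in latencies_ms if x > SLA_MS)
```

Tail percentiles such as p99 expose slow requests that an average would hide, which is why they sit next to SLA misses on the dashboard.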
![Page 30: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/30.jpg)
30
Health Checks
Constantly monitor nodes
Auto-exclude nodes on failure
• Machine not ssh-able
• Hardware failures (HDD failure, etc.)
• Do NOT exclude on rack failures
Auto-include nodes once repaired
Rate limit remediation of nodes
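These rules can be sketched as a single decision function. It is an illustration under stated assumptions, not Facebook's tooling: a rack is treated as failed when every node in it looks unhealthy, and rate limiting is modeled as a cap on how many nodes are acted on per round. The name `nodes_to_exclude` and its parameters are hypothetical.

```python
def nodes_to_exclude(health, rack_of, max_per_round):
    """health: node -> True if healthy; rack_of: node -> rack name."""
    by_rack = {}
    for node, ok in health.items():
        by_rack.setdefault(rack_of[node], []).append(ok)
    # If every node in a rack looks dead, assume a rack-level problem
    # (switch, power) and leave those nodes alone.
    dead_racks = {rack for rack, oks in by_rack.items() if not any(oks)}
    candidates = sorted(
        node for node, ok in health.items()
        if not ok and rack_of[node] not in dead_racks
    )
    return candidates[:max_per_round]  # cap the number of actions per round

health = {"a1": False, "a2": True, "b1": False, "b2": False,
          "c1": False, "c2": True, "c3": False}
rack_of = {n: n[0] for n in health}  # rack name = first letter of the node name
excluded = nodes_to_exclude(health, rack_of, max_per_round=2)
```

Skipping whole-rack failures avoids draining many nodes at once for a fault that node-level repair cannot fix, and the per-round cap keeps an automated loop from amplifying a monitoring glitch.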
![Page 31: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/31.jpg)
31
In a nutshell…
Use commodity hardware
Scaling out is #1
Efficiency is #2
• though pretty close behind scale-out
Design for failures
• Frequent failures must be auto handled
Metrics, Metrics, Metrics!
![Page 32: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/32.jpg)
32
Overview through comparison
The Nutanix Solution
![Page 33: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/33.jpg)
33
Nutanix compared with HBase
Evaluated a bunch of different options
• MySQL, Cassandra, building a custom storage system for messages
Horizontal Scalability
• Just add more nodes to scale out
Automatic failover and load balancing
• When a node goes down, others take its place automatically
• Load of node that went down is distributed to many others
![Page 34: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/34.jpg)
34
Nutanix compared with HBase philosophy
Optimized for write-heavy workloads
• Optimized for virtualized environments
• Read and write heavy workloads
• Transparent use of flash to boost perf
HDFS already battle-tested at Facebook
• Nutanix is also quite battle-tested
HBase’s strong consistency model
• Nutanix is also strongly consistent
![Page 35: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/35.jpg)
35
Other aspects of Nutanix
Architected for failures and manageability
• No single point of failure
• Minimal manual intervention for frequent failures
Uptime is important
• Rolling upgrades are the norm
• Need to survive rack failures
Single place to graph/report everything
• Prism UI to report and manage the entire cluster
Constantly monitor nodes
• Auto-exclude nodes on failure
![Page 36: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/36.jpg)
36
In a nutshell about Nutanix…
Runs on commodity hardware
Scaling out is #1
• Drop in scale out for nodes
Efficiency is #2
• Constant work on perf improvements
Design for failures
• Frequent failures auto handled
• Alerts in UI for many other states
Metrics, Metrics, Metrics!
• Prism UI gives insights into the cluster health
![Page 37: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/37.jpg)
37
Questions?
![Page 38: Datacenter@Night: How Big Data Technologies Power Facebook](https://reader037.vdocument.in/reader037/viewer/2022110306/55503ee3b4c9058f768b487b/html5/thumbnails/38.jpg)
38
Thank You
NUTANIX INC. – CONFIDENTIAL AND PROPRIETARY