Design for a Distributed Name Node

DESCRIPTION

A proposed design for a distributed HDFS NameNode.

TRANSCRIPT
[Slide 1]
Reaching 10,000
Aaron Cordova
Booz Allen Hamilton | Hadoop Meetup DC | Sep 7 2010
[Slide 2]
Lots of Applications Require Scalability
Intelligence
Biometrics
Bioinformatics
Defense
Video
Images
Text
Structured Data
Graph Analytics
Machine Learning
Network Security
[Slide 3]
Hadoop Scales
[Slide 4]
[Chart: Cost vs. Data Size for Shared Nothing and Shared Disk architectures]
Linear Scalability
[Slide 5]
Massive Parallelism
[Slide 6]
MapReduce
Simplified Distributed Programming Model
Fault Tolerant
Designed to Scale to Thousands of Servers
Many Algorithms Easily Expressed as Map and Reduce
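The classic illustration is word count, which fits the model directly. A minimal sketch in plain Python (not the Hadoop API; `map_phase` and `reduce_phase` are illustrative names):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce: sum the counts emitted for each word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

counts = reduce_phase(map_phase(["a b a", "b c"]))
```

In a real MapReduce job the framework shuffles the pairs so that all counts for one word reach the same reducer, which is what lets the same two functions run unchanged across thousands of servers.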
[Slide 7]
HDFS
Distributed File System
Optimized for High-Throughput
Fault Tolerant Through Replication, Checksumming
Designed to Scale to 10,000 Servers
[Slide 8]
Hadoop is a Platform
[Slide 9]
MapReduce
HDFS
HBase
Mahout
Hive
Pig
Flume
Cascading
Nutch
[Slide 10]
HBase
Scalable Structured Store
Fast Lookups
Durable, Consistent Writes
Automatic Partitioning
[Slide 11]
Mahout
Scalable Machine Learning Algorithms
Clustering
Classification
[Slide 12]
Fuzzy Table
Low-Latency Parallel Search
Generalized Fuzzy Matching
Images, Biometrics, Audio
[Slide 13]
One Major Problem
[Slide 14]
HDFS Single NameNode
Single NameSpace - easy to serialize operations
NameSpace stored entirely in memory
Changes written to transaction log first
Single Point of Failure
Performance Bottleneck?
[Slide 15]
NameNode Scalability
By software evolution standards Hadoop is a young project. In 2005, inspired by two Google papers, Doug Cutting and Mike Cafarella implemented the core of Hadoop. Its wide acceptance and growth started in 2006 when Yahoo! began investing in its development and committed to use Hadoop as its internal distributed platform. During the past several years Hadoop installations have grown from a handful of nodes to thousands. It is now used in many organizations around the world.
In 2006, when the buzzword for storage was Exabyte, the Hadoop group at Yahoo! formulated long-term target requirements [7] for the Hadoop Distributed File System and outlined a list of projects intended to bring the requirements to life. What was clear then has now become a reality: the need for large distributed storage systems backed by distributed computational frameworks like Hadoop MapReduce is imminent.
Today, when we are on the verge of the Zettabyte Era, it is time to take a retrospective view of the targets and analyze what has been achieved, how aggressive our views on the evolution and needs of the storage world have been, how the achievements compare to competing systems, and what our limits to growth may be.
The main four-dimensional scale requirement targets for HDFS were formulated [7] as follows:
10PB capacity x 10,000 nodes x 100,000,000 files x 100,000 clients
The biggest Hadoop clusters [8, 5], such as the one recently used at Yahoo! to set sorting records, consist of 4000 nodes and have a total space capacity …
“100,000 HDFS clients on a 10,000-node HDFS cluster will exceed the throughput capacity of a single name-node.
... any solution intended for single namespace server optimization lacks scalability.
... the most promising solutions seem to be based on distributing the namespace server ...”
Konstantin Shvachko, ;login:, April 2010
[Slide 16]
[Chart: writes/second (thousands), single-NameNode throughput vs. the target]
Goal
[Slide 17]
HDFS Single NameNode
Server grade machine
Lots of memory
Reliable components
RAID
Hot-Failover
[Slide 18]
Needs Parallelism
[Slide 19]
Scaling NameNode
Grow memory
Read-only Replicas of NameNode
Multiple static namespace partitions
Distributed name server, partition namespace dynamically
[Slide 20]
Distributed NameNode Features
Fast Lookups
Durable, Consistent writes
Automatic Partitioning
[Slide 21]
Can we use HBase?
[Slide 22]
Mappings as HBase Tables
- NameSpace: filename -> blocks
- DataNodes: node -> blocks
- Blocks: block -> nodes
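A hypothetical sketch of the three mappings using plain Python dicts; in the proposed design each would be an HBase table with the left-hand side as row key. The table contents, `locate`, and the block and node names are all illustrative, not taken from the actual code:

```python
namespace = {          # NameSpace table: filename -> block IDs
    "/dir1/file": ["blk_1", "blk_2"],
}
datanodes = {          # DataNodes table: node -> block IDs it stores
    "dn-a": ["blk_1"],
    "dn-b": ["blk_1", "blk_2"],
}
blocks = {             # Blocks table: block ID -> nodes holding a replica
    "blk_1": ["dn-a", "dn-b"],
    "blk_2": ["dn-b"],
}

def locate(path):
    # The core lookup a NameNode serves: resolve a path to its
    # blocks and the replica locations of each block.
    return [(b, blocks[b]) for b in namespace[path]]

located = locate("/dir1/file")
```

Because each lookup touches only one row per table, this access pattern matches HBase's strengths: fast point reads and durable, consistent single-row writes.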
[Slide 23]
How to order namespace?
[Slide 24]
Depth First Search Order
/
/dir1
/dir1/subdir
/dir1/subdir/file
/dir2/file1
/dir2/file2
[Slide 25]
Depth First Operations
Delete (Recursive)
Move / Rename
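With paths sorted in depth-first (lexicographic) order, a whole subtree occupies one contiguous run of keys, which is why recursive delete and move/rename map naturally onto range operations. A simplified sketch (the `"\xff"` sentinel ignores edge cases such as a sibling named `/dir10`; a real key encoding would terminate the range on the path separator):

```python
import bisect

# Namespace keys in depth-first (lexicographic path) order.
keys = sorted([
    "/", "/dir1", "/dir1/subdir", "/dir1/subdir/file",
    "/dir2/file1", "/dir2/file2",
])

def subtree_range(keys, prefix):
    # Every key under `prefix` sorts between `prefix` and
    # `prefix + "\xff"`, so a recursive delete or a rename is a
    # single contiguous range operation over the sorted keys.
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_right(keys, prefix + "\xff")
    return keys[lo:hi]

subtree = subtree_range(keys, "/dir1")
```

The same contiguity is what a range scan over HBase row keys would exploit.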
[Slide 26]
Breadth First Search Order
0/
1/dir1
2/dir2/file1
2/dir2/file2
2/dir1/subdir
3/dir1/subdir/file
[Slide 27]
Breadth First Operations
List
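In breadth-first order the depth prefix groups each directory level together, so List becomes a short contiguous scan. A hypothetical sketch using `(depth, path)` tuples to stand in for the string keys on the slide (e.g. `2/dir2/file1`); `bfs_key` and `list_dir` are illustrative names:

```python
def bfs_key(path):
    # Key = (depth, path); sorting by this tuple reproduces the
    # depth-prefixed breadth-first ordering of the namespace.
    depth = 0 if path == "/" else path.count("/")
    return (depth, path)

entries = sorted(bfs_key(p) for p in [
    "/", "/dir1", "/dir1/subdir", "/dir1/subdir/file",
    "/dir2/file1", "/dir2/file2",
])

def list_dir(entries, parent):
    # Children of `parent` all share depth+1 and the parent's path
    # prefix, so they sit in one contiguous run of the key space.
    depth = (0 if parent == "/" else parent.count("/")) + 1
    prefix = parent.rstrip("/") + "/"
    return [p for d, p in entries if d == depth and p.startswith(prefix)]

children = list_dir(entries, "/dir2")
```

This is the mirror image of the depth-first layout: breadth-first keys make List cheap, at the cost of scattering a subtree across depth groups.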
[Slide 28]
[Diagram: DFSClients and DataNodes all communicating with a single NameNode]
Current Architecture
[Slide 29]
[Diagram: each DFSClient and DataNode talks through a DNNProxy to a set of RServers hosting the namespace]
Proposed Architecture
[Slide 30]
100k clients -> 41k writes/s
[Slide 31]
[Chart: writes/second (thousands) vs. # machines hosting namespace, comparing Single NN, Distributed NN, and Target]
Anticipated Performance
[Slide 32]
Issues
Synchronization: multiple writers making concurrent changes
Name distribution hotspots
[Slide 33]
Current Status
Working code exists that uses HBase with slightly modified DFSClient and DataNode for create, write, close, open, read, mkdirs, delete.
New component: a HealthServer monitors DataNodes and performs garbage collection. It is more like the BigTable master: it can die and restart without affecting clients.
[Slide 34]
Code
Will be at http://code.google.com/p/hdfs-dnn
Available under the Apache license - whichever is compatible with Hadoop
[Slide 35]
Doesn’t HBase run on HDFS?
[Slide 36]
Self-Hosted HBase
May be possible to have HBase use the same HDFS instance it’s supporting
Some recursion and self-reference already exists: HBase Metadata table is itself a table in HBase
Have to work out bootstrapping and failure recovery to resolve any potential circular dependencies