Welcome to Hadoop 2 Land!
DESCRIPTION
This talk takes you on a rollercoaster ride through Hadoop 2 and explains the most significant changes and components. The talk was held at the JavaLand conference in Brühl, Germany, on 25.03.2014. Agenda: Welcome Office - YARN Land - HDFS 2 Land - YARN App Land - Enterprise Land
TRANSCRIPT
Welcome to Hadoop 2 Land!
uweseiler, 25.03.2014
Your Travel Guide
Big Data Nerd
Hadoop Trainer
NoSQL Fan Boy
Travel Pirate
Photography Enthusiast
Your Travel Agency
specializes in...
Big Data Nerds, Agile Ninjas, Continuous Delivery Gurus, Enterprise Java Specialists, Performance Geeks
Join us!
Your Travel Destination
Welcome Office
YARN Land
HDFS 2 Land
YARN App Land
Enterprise Land
Goodbye Photo
Stop #1: Welcome Office
In the beginning of Hadoop, there was MapReduce
• It could handle data sizes way beyond those of its competitors
• It was resilient in the face of failure
• It made it easy for users to bring their code and algorithms to the data
Hadoop 1 (2007): …but it was too low level
Hadoop 1 (2007): …but it was too rigid
Hadoop 1 (2007): …but it was batch
[Diagram: several separate silos, each running a single batch app on its own HDFS]
Hadoop 1 (2007): …but it had limitations
• Scalability
  – Maximum cluster size ~ 4,500 nodes
  – Maximum concurrent tasks ~ 40,000
  – Coarse synchronization in JobTracker
• Availability
  – Failure kills all queued and running jobs
• Hard partition of resources into map & reduce slots
  – Low resource utilization
• Lacks support for alternate paradigms and services
Hadoop 2 (2013): YARN to the rescue!
Stop #2: YARN Land
A brief history of Hadoop 2
• Originally conceived & architected by the team at Yahoo!
  – Arun Murthy created the original JIRA in 2008 and is now the Hadoop 2 release manager
• The community has been working on Hadoop 2 for over 4 years
• Hadoop 2-based architecture running at scale at Yahoo!
  – Deployed on 35,000+ nodes for 6+ months
Hadoop 2: Next-gen platform
Hadoop 1 (single-use system: batch apps)
• MapReduce: cluster resource mgmt. + data processing
• HDFS: redundant, reliable storage
Hadoop 2 (multi-purpose platform: batch, interactive, streaming, …)
• MapReduce & others: data processing
• YARN: cluster resource management
• HDFS 2: redundant, reliable storage
Taking Hadoop beyond batch
Store all data in one place, interact with data in multiple ways: applications run natively in Hadoop
• Batch: MapReduce
• Interactive: Tez
• Online: HOYA
• Streaming: Storm, …
• Graph: Giraph
• In-Memory: Spark
• Other: Search, …
All on top of YARN (cluster resource management) and HDFS 2 (redundant, reliable storage)
YARN: Design Goals
• Build a new abstraction layer by splitting up the two major functions of the JobTracker
  • Cluster resource management
  • Application life-cycle management
• Allow other processing paradigms
  • Flexible API for implementing YARN apps
  • MapReduce becomes a YARN app
  • Lots of different YARN apps
Hadoop: BigDataOS™
• Traditional operating system: storage = file system, execution/scheduling = processes/kernel
• Hadoop: storage = HDFS, execution/scheduling = YARN
YARN: Architectural Overview
Split up the two major functions of the JobTracker: cluster resource management & application life-cycle management
[Diagram: a ResourceManager with its Scheduler coordinates several NodeManagers; each application gets its own ApplicationMaster (AM 1, AM 2) which runs containers (1.1, 1.2, 2.1-2.3) on the NodeManagers]
YARN: Multi-tenancy I
Different types of applications on the same cluster
[Diagram: one ResourceManager/Scheduler and many NodeManagers running MapReduce jobs (map & reduce tasks), a Tez DAG (vertices 1-4), a HOYA-managed HBase cluster (HBase Master, Region Servers 1-3) and Storm (nimbus) side by side]
YARN: Multi-tenancy II
Different users and organizations on the same cluster
[Diagram: the same multi-tenant cluster, with the Scheduler dividing capacity across a queue hierarchy, e.g. a root queue split into DWH, User and Ad-Hoc queues (30% / 60% / 10%) with Dev/Prod and Dev1/Dev2 sub-queues (20%/80%, 25%/75%, 60%/40%)]
YARN: Your own app
[Diagram: a client app ("MyApp") uses YARNClient and the ApplicationClientProtocol to ask the ResourceManager/Scheduler to launch an application-specific ApplicationMaster ("myAM") in a container; the AM uses AMRMClient (ApplicationMasterProtocol) to request further containers and NMClient (Container Management Protocol) to launch them on NodeManagers; the client talks to its AM over an app-specific API]
DistributedShell is the new WordCount!
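To make these moving parts concrete, here is a minimal, hedged sketch of a YARN client submitting its own ApplicationMaster, assuming the Hadoop 2.2+ client libraries; the AM class my.example.MyAppMaster, its launch command and the resource sizes are placeholders, not part of the original talk.

import java.util.Collections;
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class MyAppClient {
    public static void main(String[] args) throws Exception {
        // YarnClient wraps the ApplicationClientProtocol towards the ResourceManager
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("my-yarn-app");
        ctx.setQueue("default");

        // Command that starts the (hypothetical) ApplicationMaster inside its container
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList(
            "java my.example.MyAppMaster"
            + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
            + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
        ctx.setAMContainerSpec(amContainer);

        // Resources the AM container needs; real apps also ship their jars as LocalResources
        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(256);
        capability.setVirtualCores(1);
        ctx.setResource(capability);

        ApplicationId appId = yarnClient.submitApplication(ctx);
        System.out.println("Submitted application " + appId);
    }
}

Inside the ApplicationMaster, AMRMClient and NMClient then play the corresponding roles for requesting and launching the worker containers shown in the diagram.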
Stop #3: HDFS 2 Land
HDFS 2: In a nutshell
• Removes tight coupling of Block Storage and Namespace
• Adds (built-in) High Availability
• Better Scalability & Isolation
• Increased performance
Details: https://issues.apache.org/jira/browse/HDFS-1052
HDFS 2: Federation
• NameNodes do not talk to each other
• Each NameNode manages only a slice of the namespace
• DataNodes can store blocks managed by any NameNode
• Block storage acts as a generic storage service, organized into block pools
• Horizontally scales I/O and storage
[Diagram: NameNode 1 and NameNode 2 each hold their own namespace state and block map; the DataNodes (JBOD) store blocks from both block pools]
HDFS 2: Architecture
• The Active NameNode maintains the block map and edits file; the Standby NameNode simultaneously reads and applies the edits
• DataNodes report to both NameNodes but take orders only from the Active
• Shared state lives either on NFS or in quorum-based storage (JournalNodes)
HDFS 2: High Availability
• DataNodes send heartbeats & block reports to both the Active and the Standby NameNode
• A ZKFailoverController next to each NameNode monitors the health of the NameNode, OS and hardware and sends heartbeats to a ZooKeeper ensemble
• The Active NameNode holds a special lock znode in ZooKeeper
• Shared state is kept on the JournalNodes
HDFS 2: Write-Pipeline
• Earlier versions of HDFS
  • Files were immutable
  • Write-once-read-many model
• New features in HDFS 2
  • Files can be reopened for append
  • New primitives: hflush and hsync
  • Replace DataNode on failure (a new node is added to the write pipeline)
  • Read consistency: a reader can read from any node in the pipeline and fail over to any other node
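As a small illustration of the new primitives, here is a hedged sketch using the Hadoop 2 FileSystem API; the path /tmp/append-demo.log is a placeholder for an existing file and append must be enabled on the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendAndFlush {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/append-demo.log");   // hypothetical, already existing file

        // Reopen an existing file for append (new in HDFS 2)
        FSDataOutputStream out = fs.append(file);
        out.writeBytes("another log line\n");

        // hflush: flush data out of the client buffers so concurrent readers can see it
        out.hflush();
        // hsync: additionally ask the DataNodes to persist the data to disk
        out.hsync();

        out.close();
    }
}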
HDFS 2: Snapshots
• Admins can create point-in-time snapshots of HDFS
  • Of the entire file system
  • Of a specific data set (sub-tree directory of the file system)
• Restore the state of the entire file system or a data set to a snapshot (like Apple Time Machine)
  • Protect against user errors
• Snapshot diffs identify changes made to a data set
  • Keep track of how raw or derived/analytical data changes over time
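For illustration, a hedged sketch of snapshots through the Java API (the same operations also exist as hdfs shell commands); the /weblogs directory and the snapshot name are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SnapshotDemo {
    public static void main(String[] args) throws Exception {
        DistributedFileSystem dfs =
            (DistributedFileSystem) FileSystem.get(new Configuration());
        Path dataSet = new Path("/weblogs");   // hypothetical data set directory

        // An admin first marks the directory as snapshottable
        dfs.allowSnapshot(dataSet);

        // Point-in-time snapshot; readable under /weblogs/.snapshot/before-cleanup
        dfs.createSnapshot(dataSet, "before-cleanup");

        // ... run the risky job, then drop the snapshot once it is no longer needed
        dfs.deleteSnapshot(dataSet, "before-cleanup");
    }
}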
HDFS 2: NFS Gateway
• Supports NFS v3 (NFS v4 is work in progress)
• Supports all HDFS commands
  • List files
  • Copy, move files
  • Create and delete directories
• Ingest for large-scale analytical workloads
  • Load immutable files as a source for analytical processing
  • No random writes
• Stream files into HDFS
  • Log ingest by applications writing directly to the HDFS client mount
HDFS 2: Performance
• Many improvements
  • New appendable write pipeline
  • Read-path improvements for fewer memory copies
  • Short-circuit local reads for 2-3x faster random reads
  • I/O improvements using posix_fadvise()
  • libhdfs improvements for zero-copy reads
• Significant improvements: I/O 2.5 - 5x faster
Stop #4: YARN App Land
YARN Apps: Overview
• MapReduce 2: Batch
• Tez: DAG Processing
• Storm: Stream Processing
• Samza: Stream Processing
• Apache S4: Stream Processing
• Spark: In-Memory
• HOYA: HBase on YARN
• Apache Giraph: Graph Processing
• Apache Hama: Bulk Synchronous Parallel
• Elastic Search: Scalable Search
MapReduce 2: In a nutshell
• MapReduce is now a YARN app
  • No more map and reduce slots, it's containers now
  • No more JobTracker, it's the YarnAppmaster library now
• Multiple versions of MapReduce
  • The older mapred APIs work without modification or recompilation
  • The newer mapreduce APIs may need to be recompiled
• Still has one master server component: the Job History Server
  • The Job History Server stores the execution of jobs
  • Used to audit prior execution of jobs
  • Will also be used by the YARN framework to store charge-backs at that level
• Better cluster utilization
• Increased scalability & availability
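As a quick reminder of what a job against the newer mapreduce API looks like, here is a hedged word-count driver using mapper and reducer classes bundled with Hadoop 2; the driver code itself is the same as for MR1, only the runtime underneath is now YARN.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class Mr2WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // On a Hadoop 2 cluster this submits to YARN; there is no JobTracker anymore
        Job job = Job.getInstance(conf, "mr2-wordcount");
        job.setJarByClass(Mr2WordCount.class);
        job.setMapperClass(TokenCounterMapper.class);   // mapper bundled with Hadoop
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}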
MapReduce 2: Shuffle
• Faster shuffle
  • Better embedded server: Netty
• Encrypted shuffle
  • Secures the shuffle phase as data moves across the cluster
  • Requires 2-way HTTPS, certificates on both sides
  • Causes significant CPU overhead; reserve 1 core for this work
  • Certificates stored on each node (provisioned with the cluster), refreshed every 10 secs
• Pluggable shuffle & sort
  • Shuffle is the first phase in MapReduce that is guaranteed to not be data-local
  • Pluggable shuffle/sort allows application or hardware developers to intercept the network-heavy workload and optimize it
  • Typical implementations have hardware components like fast networks and software components like sorting algorithms
  • API will change with future versions of Hadoop
MapReduce 2: Performance
• Key optimizations
  • No hard segmentation of resources into map and reduce slots
  • YARN scheduler is more efficient
  • The MR2 framework is more efficient than MR1: the shuffle phase in MRv2 is more performant thanks to Netty
• In production (the Yahoo! deployment mentioned earlier): 40,000+ nodes running YARN across over 365 PB of data
  • About 400,000 jobs per day for about 10 million hours of compute time
  • Estimated 60% - 150% improvement in node usage per day
  • Got rid of a whole 10,000-node datacenter because of the increased utilization
Apache Tez: In a nutshell
• Distributed execution framework that works on computations represented as dataflow graphs
• Tez is Hindi for “speed”
• Naturally maps to execution plans produced by query optimizers
• Highly customizable to meet a broad spectrum of use cases and to enable dynamic performance optimizations at runtime
• Built on top of YARN
Apache Tez: Architecture
• Task with pluggable Input, Processor & Output
  • "Classical" Map = Task with HDFSInput, MapProcessor, SortedOutput
  • "Classical" Reduce = Task with ShuffleInput, ReduceProcessor, HDFSOutput
• A YARN ApplicationMaster runs the DAG of Tez tasks
Apache Tez: Tez Service
• MapReduce query startup is expensive:
  – Job-launch & task-launch latencies are fatal for short queries (on the order of 5s to 30s)
• Solution:
  – Tez Service (= preallocated ApplicationMaster)
    • Removes job-launch overhead (ApplicationMaster)
    • Removes task-launch overhead (pre-warmed containers)
  – Hive (or Pig) submits its query plan to the Tez Service
  – Native Hadoop service, not ad-hoc
Apache Tez: The new primitive
• Hadoop 1: MapReduce as the base, handling cluster resource mgmt. + data processing on top of HDFS (redundant, reliable storage)
• Hadoop 2, MapReduce as base: Pig, Hive and others run on MapReduce, on top of YARN (cluster resource management) and HDFS
• Hadoop 2, Apache Tez as base: MR, Pig, Hive, real-time frameworks like Storm and others run on the Tez execution engine, on top of YARN and HDFS
Apache Tez: Performance

Example query:
SELECT a.state, COUNT(*), AVERAGE(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state

Existing Hive: Parse Query 0.5s | Create Plan 0.5s | Launch Map-Reduce 20s | Process Map-Reduce 10s | Total 31s
Hive/Tez: Parse Query 0.5s | Create Plan 0.5s | Launch Map-Reduce 20s | Process Map-Reduce 2s | Total 23s
Tez & Hive Service: Parse Query 0.5s | Create Plan 0.5s | Submit to Tez Service 0.5s | Process Map-Reduce 2s | Total 3.5s

* No exact numbers, for illustration only
Stinger Initiative: In a nutshell
Stinger: Overall Performance
* Real numbers, but handle with care!
Storm: In a nutshell
• Stream processing
• Real-time processing
• Developed as a standalone application
  • https://github.com/nathanmarz/storm
• Ported to YARN
  • https://github.com/yahoo/storm-yarn
Storm: Conceptual view
• Spout: source of streams
• Bolt: consumer of streams, processes tuples, possibly emits new tuples
• Tuple: list of name-value pairs
• Stream: unbound sequence of tuples
• Topology: network of spouts & bolts as the nodes and streams as the edges
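A hedged sketch of these concepts with the Storm 0.9 Java API (backtype.storm packages): a test spout bundled with Storm as the stream source, one custom bolt, and a local in-process cluster for trying it out; names and parallelism hints are illustrative only.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class DemoTopology {

    // Bolt: consumes tuples from the spout's stream and emits new tuples (upper-cased words)
    public static class UpperCaseBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            collector.emit(new Values(input.getString(0).toUpperCase()));
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Spout: source of the stream (TestWordSpout ships with Storm and emits random words)
        builder.setSpout("words", new TestWordSpout(), 2);
        // Bolt subscribes to the spout's stream; shuffle grouping distributes tuples randomly
        builder.setBolt("upper", new UpperCaseBolt(), 2).shuffleGrouping("words");

        // Run the topology in-process for local testing; on a real cluster one would
        // use StormSubmitter (or storm-yarn) instead of LocalCluster
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("demo", new Config(), builder.createTopology());
        Thread.sleep(10000);
        cluster.shutdown();
    }
}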
Spark: In a nutshell
• High-speed in-memory analytics over Hadoop and Hive
• Separate MapReduce-like engine
  – Speedup of up to 100x
  – On-disk queries 5-10x faster
• Spark is now a top-level Apache project
  – http://spark.apache.org
• Compatible with Hadoop's storage API
• Spark can be run on top of YARN
  – http://spark.apache.org/docs/0.9.0/running-on-yarn.html
Spark: RDD
• Key idea: Resilient Distributed Datasets (RDDs)
• Read-only partitioned collection of records
  • Optionally cached in memory across the cluster
• Manipulated through parallel operators
  • Support only coarse-grained operations
    • Map
    • Reduce
    • Group-by transformations
• Automatically recomputed on failure
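A hedged sketch of the RDD idea using the Spark Java API from the 0.9 era referenced above; the local master, the sample data and the operations are illustrative only.

import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

public class RddExample {
    public static void main(String[] args) {
        // Local master for a quick test; on a cluster this would point to YARN
        JavaSparkContext sc = new JavaSparkContext("local[2]", "rdd-demo");

        // An RDD: a read-only, partitioned collection of records (3 partitions here)
        JavaRDD<String> lines = sc.parallelize(Arrays.asList("a b c", "d e", "f"), 3);

        // Coarse-grained transformation (map); cache() keeps the result in memory
        JavaRDD<Integer> lengths = lines.map(new Function<String, Integer>() {
            public Integer call(String line) { return line.length(); }
        }).cache();

        // Action (reduce) triggers the computation; lost partitions are recomputed from lineage
        int total = lengths.reduce(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer a, Integer b) { return a + b; }
        });
        System.out.println("Total characters: " + total);

        sc.stop();
    }
}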
Spark: Data Sharing
HOYA: In a nutshell
• Create on-demand HBase clusters
  • Small HBase clusters in a large YARN cluster
  • Dynamic HBase clusters
  • Self-healing HBase clusters
  • Elastic HBase clusters
  • Transient/intermittent clusters for workflows
• Configure custom configurations & versions
• Better isolation
• More efficient utilization/sharing of the cluster
HOYA: Creation of AppMaster
[Diagram: the HOYA client uses YARNClient and a HOYA-specific API to ask the ResourceManager/Scheduler to start the HOYA ApplicationMaster in a container on one of the NodeManagers]
HOYA: Deployment of HBase
[Diagram: the HOYA ApplicationMaster requests further containers from the ResourceManager and deploys the HBase Master and Region Servers into them]
HOYA: Bind via ZooKeeper
[Diagram: an HBase client discovers the freshly deployed HBase Master and Region Servers via ZooKeeper, just like with a regular HBase cluster]
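Since a HOYA-provisioned cluster is reached like any other HBase cluster, here is a hedged sketch of a client binding via ZooKeeper (HBase 0.96-era API); the quorum hosts and the table name are placeholders and the table is assumed to already exist.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HoyaHBaseClient {
    public static void main(String[] args) throws Exception {
        // The client only needs the ZooKeeper quorum; it discovers the HOYA-deployed
        // HBase Master and Region Servers through ZooKeeper, like for any HBase cluster
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");   // hypothetical ZooKeeper hosts

        HTable table = new HTable(conf, "demo_table");       // hypothetical, pre-created table
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
        table.put(put);

        Result result = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println(
            Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));
        table.close();
    }
}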
Giraph: In a nutshell
• Giraph is a framework for processing semi-structured graph data on a massive scale
• Giraph is loosely based upon Google's Pregel
  • Both systems are inspired by the Bulk Synchronous Parallel model
• Giraph performs iterative calculations on top of an existing Hadoop cluster
  • Uses a single map-only job
• Apache top-level project since 2012
  – http://giraph.apache.org
Stop #5: Enterprise Land
Falcon: In a nutshell
• A framework for managing data processing in Hadoop clusters
• Falcon runs as a standalone server as part of the Hadoop cluster
• Key features:
  • Data replication handling
  • Data lifecycle management
  • Process coordination & scheduling
  • Declarative data process programming
• Apache incubation status
  • http://falcon.incubator.apache.org
Falcon: One-stop Shop
• Data management needs: data processing, replication, retention, scheduling, reprocessing, multi-cluster mgmt.
• Tools orchestrated for those needs: Oozie, Sqoop, DistCp, Flume, MapReduce, Pig & Hive
Falcon: Weblog Use Case
• Weblogs saved hourly to the primary cluster
  • HDFS location is /weblogs/{date}
• Desired data policy:
  • Replicate weblogs to the secondary cluster
  • Evict weblogs from the primary cluster after 2 days
  • Evict weblogs from the secondary cluster after 1 week
Falcon: Weblog Use Case

<feed description="" name="feed-weblogs" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>

  <clusters>
    <cluster name="cluster-primary" type="source">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(2)" action="delete"/>
    </cluster>
    <cluster name="cluster-secondary" type="target">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(7)" action="delete"/>
    </cluster>
  </clusters>

  <locations>
    <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
  </locations>

  <ACL owner="hdfs" group="users" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>
Knox: In a nutshell
• System that provides a single point of authentication and access for Apache Hadoop services in a cluster
• The gateway runs as a server (or cluster of servers) that provides centralized access to one or more Hadoop clusters
• The goal is to simplify Hadoop security for both users and operators
• Apache incubation status
  • http://knox.incubator.apache.org
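A hedged sketch of what gateway access can look like from Java, assuming a Knox topology that exposes WebHDFS under /gateway/<topology>/webhdfs/v1 with HTTP Basic authentication; host, topology name, credentials and path are placeholders, and the gateway's TLS certificate must be trusted by the JVM.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class KnoxWebHdfsClient {
    public static void main(String[] args) throws Exception {
        // All traffic goes through the gateway in the DMZ, never to the cluster directly
        URL url = new URL(
            "https://knox.example.com:8443/gateway/default/webhdfs/v1/weblogs?op=LISTSTATUS");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        // The gateway authenticates the user (e.g. against an identity provider)
        // before forwarding the request to the secure Hadoop cluster
        String credentials = Base64.getEncoder()
            .encodeToString("guest:guest-password".getBytes("UTF-8"));
        conn.setRequestProperty("Authorization", "Basic " + credentials);

        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);   // JSON directory listing returned by WebHDFS
            }
        }
    }
}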
Knox: Architecture
[Diagram: browsers, REST clients, JDBC clients and an Ambari/Hue server reach a Knox gateway cluster placed in the DMZ between two firewalls; the gateway authenticates against identity providers (e.g. a Kerberos SSO provider) and forwards requests to one or more secure Hadoop clusters running YARN workloads such as MapReduce, Tez, HOYA/HBase and Storm]
Final Stop: Goodbye Photo
Hadoop 2: Summary
1. Scale
2. New programming models & services
3. Beyond Java
4. Improved cluster utilization
5. Enterprise Readiness
One more thing…
Let’s get started with Hadoop 2!
Hortonworks Sandbox 2.0
http://hortonworks.com/products/hortonworks-sandbox
Trainings
Developer Training
• 19. - 22.05.2014, Düsseldorf
• 30.06. - 03.07.2014, München
• 04. - 07.08.2014, Frankfurt
Admin Training
• 26. - 28.05.2014, Düsseldorf
• 07. - 09.07.2014, München
• 11. - 13.08.2014, Frankfurt
Details: https://www.codecentric.de/schulungen-und-workshops/
Thanks for traveling with me