Apache Hadoop YARN: Present and Future
Vinod Kumar Vavilapalli, Hortonworks
© Hortonworks Inc. 2014
vinodkv [at] apache.org
@tshooter
A quick show of hands..
• Hadoop 2
Architecting the Future of Big Data
Real life Hadoop Logo
Who am I?
• 6.75 Hadoop-years old
• Last thing at school: a two-node Tomcat cluster. Three months later, first thing on the job, brought down an 800-node cluster ;)
• Previously @Yahoo!
• Now @Hortonworks
• Two hats
  – Hortonworks: Hadoop MapReduce and YARN development lead
  – Apache: Apache Hadoop YARN lead, Apache Hadoop PMC, Apache Member
• Worked/working on
  – YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security
  – Apache Ambari: kickstarted the project and its first release
  – Stinger: high-performance data processing with Hadoop/Hive
• Lots of troubleshooting on clusters
• 99%+ code in Apache, Hadoop
Agenda
• Apache Hadoop 2: Overview
• Past
• Present
• Future
Apache Hadoop 2: Next Generation Architecture
What is YARN?
• Resource Management Platform
  – MapReduce v2
  – Beyond MapReduce with Tez, Storm, Spark; in Hadoop!
  – Did I mention Services like HBase, Accumulo on YARN with HoYA/Slider?
• How is it different from Hadoop 1? ..
Hadoop 1 vs Hadoop 2
HADOOP 1.0: Single Use System (Batch Apps)
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)
HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
• MapReduce (data processing), Others
• YARN (cluster resource management)
• HDFS2 (redundant, highly-available & reliable storage)
Key Benefits of YARN
• Scale
• New Programming Models & Services
• Improved cluster utilization
• Agility
• To infinity and beyond ..
Why Migrate?
• 2.0 >= 2 * 1.0
  – HDFS: Lots of ground-breaking features
  – YARN: Next generation architecture
• Return on Investment: 2x throughput on same hardware!
• Ready for improvements in hardware
• Not convinced? Let’s see what others are saying!
Yahoo!
• Leader/Visionary on all things Hadoop!
• On YARN (0.23.x)
• Moving fast to 2.x
http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html
eBay
• Has one of the largest Hadoop clusters in the industry with many petabytes of data
• Migrated production clusters to Hadoop-2
• Go to Mayank’s talk
  – “Hadoop-2 @ ebay”!
  – Thursday, April 3
  – Track: Deployment and Operations
• Should be convinced by now... No?
YARN: the Data Operating System
Present
Apache Hadoop releases
• 15 October, 2013
• The 1st GA release of Apache Hadoop 2.x
• YARN
  – First stable and supported release of YARN
  – Binary Compatibility for MapReduce applications built on hadoop-1.x
  – YARN level APIs solidified for the future
  – Performance
  – Scale!
• HDFS
  – High Availability for HDFS
  – HDFS Federation
  – HDFS Snapshots
  – NFSv3 access to data in HDFS
• Support for running Hadoop on Microsoft Windows
• Substantial amount of integration testing with rest of projects in the ecosystem
Apache Hadoop 2.2
Apache Hadoop releases (contd)
• 24 February, 2014
• First post GA release for the year 2014
• Alpha features in YARN
  – ResourceManager HA
  – Application History
  – Will cover in the 2.4 content
• HDFS
  – Details follow..
• Number of bug-fixes, enhancements
Apache Hadoop 2.3
HDFS: Heterogeneous Storage
HDFS: DataNode caching
Apache Hadoop releases (contd)
• Very soon!
• YARN
  – Details follow..
  – ResourceManager restart fail-over for high availability
  – Preemption
  – Application History and timeline
• HDFS
  – FileSystem ACLs
  – Rolling upgrades
Apache Hadoop 2.4
ResourceManager Restart and fail-over
ZooKeeper
Capacity Scheduler Preemption
Application History and Timeline
• Few MR specific implementations: History and web-UI
• Not just MR anymore!
• History
  – MapReduce specific Job History Server
  – Beyond ResourceManager Restart
• Timeline
  – Framework specific event collection and UIs
• Run analytics on historical apps!
Future
Future: Operational enhancements
• Rolling upgrades
  – No/minimal impact to users
  – Ideal: Always rolling!
• HDFS in
• YARN
Future: Enabling more apps
• Beyond MR
• Discussing next
  – Long running services
  – Isolation
  – Multi-dimensional resource scheduling
Future: Long running services
• You can run them already!
• Few enhancements needed
  – Logs
  – Security
  – Management/monitoring
• Resource sharing across workload types
• Project Slider
Fine-grain isolation for multi-tenancy
• Custom memory-monitoring
• Cgroups
• Linux Containers
• VMs
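The custom memory-monitoring bullet can be illustrated with a small sketch. This is a toy Python model, not YARN's NodeManager code: the `ContainerLimits` type, the `over_limit` function, and the 2.1 virtual-to-physical ratio are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainerLimits:
    """Hypothetical per-container limits (names are illustrative only)."""
    pmem_limit_mb: float
    vmem_ratio: float = 2.1  # assumed virtual-to-physical allowance

    @property
    def vmem_limit_mb(self) -> float:
        # Virtual memory allowance derived from the physical limit.
        return self.pmem_limit_mb * self.vmem_ratio


def over_limit(pmem_used_mb: float, vmem_used_mb: float,
               limits: ContainerLimits) -> Optional[str]:
    """Return which limit a container exceeds, or None if it is within both."""
    if pmem_used_mb > limits.pmem_limit_mb:
        return "physical"
    if vmem_used_mb > limits.vmem_limit_mb:
        return "virtual"
    return None


limits = ContainerLimits(pmem_limit_mb=1024)
within = over_limit(900, 1500, limits)
too_much_physical = over_limit(1100, 1500, limits)
too_much_virtual = over_limit(900, 2200, limits)
```

A monitor of this kind only observes usage and reacts after the fact, which is one reason the slide lists kernel-enforced options such as cgroups next.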
Multi-resource scheduling
• Today: memory & CPU
  – Physical memory / virtual memory
  – CPU cores / virtual cores
• CPU stuff: more bake-in
• Disks
  – Space
  – IOPS
• Network
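One common way to compare applications across several resource types is the dominant-share idea: each app is judged by the resource it uses the largest fraction of. The slide does not say which policy YARN applies, so the sketch below is an assumption-laden toy in Python. The function names, resource keys, and numbers are hypothetical and are not YARN APIs.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_share(usage: Resources, capacity: Resources) -> float:
    """Largest fraction of any single cluster resource that this usage occupies."""
    return max(usage.get(name, 0.0) / total for name, total in capacity.items())


def next_app_to_serve(apps: Dict[str, Resources], capacity: Resources) -> str:
    """Pick the app with the smallest dominant share (the least-served app)."""
    return min(apps, key=lambda name: dominant_share(apps[name], capacity))


# Hypothetical cluster and per-app usage, for illustration only.
cluster = {"memory_mb": 100_000, "vcores": 100}
apps = {
    "app_a": {"memory_mb": 20_000, "vcores": 10},  # memory-heavy
    "app_b": {"memory_mb": 10_000, "vcores": 30},  # CPU-heavy
}

share_a = dominant_share(apps["app_a"], cluster)
share_b = dominant_share(apps["app_b"], cluster)
chosen = next_app_to_serve(apps, cluster)
```

Adding disks or network as further dimensions, as the slide anticipates, would only mean adding keys to the dictionaries; the comparison itself stays the same.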
Other features
• Application SLAs
• Node labels
• Node affinity/anti-affinity
• Better online queue-management
YARN Ecosystem
Beyond the core YARN project: Briefly
Eco-system
Applications Powered by YARN
Apache Giraph – Graph Processing
Apache Hama – BSP
Apache Hadoop MapReduce – Batch
Apache Tez – Batch/Interactive
Apache S4 – Stream Processing
Apache Samza – Stream Processing
Apache Storm – Stream Processing
Apache Spark – Iterative applications
HOYA – HBase on YARN
YARN Frameworks
Apache Twill
REEF by Microsoft
Spring support for Hadoop 2
There's an app for that...
YARN App Marketplace!
Apache TEZ
• Moving beyond MR
• A data processing framework that can execute a complex DAG of tasks.
• “Apache Tez - A New Chapter in Hadoop Data Processing”
  – By Siddharth Seth: YARN & Tez Committer/PMC Member
  – Thursday, April 3 (4:20-5:00pm)
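To make "a complex DAG of tasks" concrete, here is a minimal Python sketch of ordering a task graph so that every task runs only after its inputs are done. It uses the standard-library `graphlib` module (Python 3.9+) and has nothing to do with the actual Tez API; the vertex names and the `execution_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_waves(dag: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; tasks in one wave have all predecessors finished.

    `dag` maps each task name to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished so successors unlock
    return waves


# Hypothetical query plan: two scans feed a join, which feeds an aggregate.
plan = {
    "scan_a": set(),
    "scan_b": set(),
    "join": {"scan_a", "scan_b"},
    "aggregate": {"join"},
}
waves = execution_waves(plan)
```

Tasks within a wave are independent of each other, which is what lets a DAG engine run them in parallel instead of forcing everything through chained map and reduce stages.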
Recap
• Apache Hadoop 2 is, at least, twice as good!
• Exciting journey with Hadoop for this decade…
  – Hadoop is no longer a one-trick pony, err elephant
  – Beyond just HDFS & MapReduce
• Architecture for the future
  – Centralized data
  – Exciting spectrum of application types, workloads and use cases
Couple more things..
The Book is out!
http://yarn-book.com/
Thank you!
Download Sandbox: Experience Apache Hadoop
Both 2.x and 1.x Versions Available!
http://hortonworks.com/products/hortonworks-sandbox/
Questions Time!