Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future


Upload: vinod-kumar-vavilapalli

Post on 19-Aug-2014


Category:

Engineering



DESCRIPTION

Title: Apache Hadoop YARN: Present and Future

Abstract: Apache Hadoop YARN evolves the Hadoop compute platform from being centered only around MapReduce to being a generic data processing platform that can take advantage of a multitude of programming paradigms, all on the same data. In this talk, we'll trace the journey of YARN from a concept to being the cornerstone of Hadoop 2 GA releases. We'll cover the current status of YARN, how it is faring today, and how it stands apart from the monochromatic world that is Hadoop 1.0. We'll then move on to the exciting future of YARN: features that are making YARN a first-class resource-management platform for enterprise Hadoop, such as rolling upgrades, high availability, support for long-running services alongside applications, fine-grain isolation for multi-tenancy, preemption, application SLAs, and application history, to name a few.

TRANSCRIPT

Page 1: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

© Hortonworks Inc. 2014

Apache Hadoop YARN: Present and Future

Vinod Kumar Vavilapalli
vinodkv [at] apache.org
@tshooter

Page 2: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

A quick show of hands..

• Hadoop 2

Real life Hadoop Logo

Page 3: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Who am I?

• 6.75 Hadoop-years old
• Last thing at school: a two-node Tomcat cluster. Three months later, first thing at the job, brought down an 800-node cluster ;)
• Previously @Yahoo!
• Now @Hortonworks
• Two hats
  – Hortonworks: Hadoop MapReduce and YARN development lead
  – Apache: Apache Hadoop YARN lead, Apache Hadoop PMC, Apache Member
• Worked/working on
  – YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security
  – Apache Ambari: kickstarted the project and its first release
  – Stinger: high-performance data processing with Hadoop/Hive
• Lots of troubleshooting on clusters
• 99%+ of code in Apache, Hadoop

Page 4: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Agenda

• Apache Hadoop 2: Overview
• Past
• Present
• Future

Page 5: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Apache Hadoop 2
Next Generation Architecture

Page 6: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

What is YARN?

• Resource Management Platform
  – MapReduce v2
  – Beyond MapReduce with Tez, Storm, Spark; in Hadoop!
  – Did I mention services like HBase, Accumulo on YARN with HoYA/Slider?
• How is it different from Hadoop 1? ..

Page 7: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Hadoop 1 vs Hadoop 2

HADOOP 1.0: Single Use System (Batch Apps)
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)

HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
• MapReduce (data processing), Others
• YARN (cluster resource management)
• HDFS2 (redundant, highly-available & reliable storage)

Page 8: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Key Benefits of YARN

• Scale
• New Programming Models & Services
• Improved cluster utilization
• Agility
• To infinity and beyond ..

Page 9: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Why Migrate?

• 2.0 >= 2 * 1.0
  – HDFS: Lots of ground-breaking features
  – YARN: Next generation architecture
• Return on Investment: 2x throughput on same hardware!
• Ready for improvements in hardware
• Not convinced? Let’s see what others are saying!

Page 10: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Yahoo!

• Leader/Visionary on all things Hadoop!
• On YARN (0.23.x)
• Moving fast to 2.x

http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html

Page 11: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Twitter

Page 12: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

eBay

• Has one of the largest Hadoop clusters in the industry with many petabytes of data
• Migrated production clusters to Hadoop-2
• Go to Mayank’s talk
  – “Hadoop-2 @ ebay”!
  – Thursday, April 3
  – Track: Deployment and Operations
• Should be convinced by now .. No?

Page 13: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

YARN: the Data Operating System

Page 14: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Present

Page 15: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Apache Hadoop releases

Apache Hadoop 2.2

• 15 October, 2013
• The 1st GA release of Apache Hadoop 2.x
• YARN
  – First stable and supported release of YARN
  – Binary Compatibility for MapReduce applications built on hadoop-1.x
  – YARN level APIs solidified for the future
  – Performance
  – Scale!
• HDFS
  – High Availability for HDFS
  – HDFS Federation
  – HDFS Snapshots
  – NFSv3 access to data in HDFS
• Support for running Hadoop on Microsoft Windows
• Substantial amount of integration testing with rest of projects in the ecosystem

Page 16: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Apache Hadoop releases (contd)

Apache Hadoop 2.3

• 24 February, 2014
• First post GA release for the year 2014
• Alpha features in YARN
  – ResourceManager HA
  – Application History
  – Will cover in the 2.4 content
• HDFS
  – Details follow..
• Number of bug-fixes, enhancements

Page 17: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

HDFS: Heterogeneous Storage

Page 18: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

HDFS: DataNode caching

Page 19: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Apache Hadoop releases (contd)

Apache Hadoop 2.4

• Very soon!
• YARN
  – Details follow..
  – ResourceManager restart fail-over for high availability
  – Preemption
  – Application History and timeline
• HDFS
  – FileSystem ACLs
  – Rolling upgrades

Page 20: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

ResourceManager Restart and fail-over

ZooKeeper

Page 21: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Capacity Scheduler Preemption

Page 22: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Application History and Timeline

• Few MR specific implementations: History and web-UI
• Not just MR anymore!
• History
  – MapReduce specific Job History Server
  – Beyond ResourceManager Restart
• Timeline
  – Framework specific event collection and UIs
• Run analytics on historical apps!

Page 23: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Future

Page 24: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Future: Operational enhancements

• Rolling upgrades
  – No/minimal impact to users
  – Ideal: Always rolling!
• HDFS in
• YARN

Page 25: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Future: Enabling more apps

• Beyond MR
• Discussing next
  – Long running services
  – Isolation
  – Multi-dimensional resource scheduling

Page 26: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Future: Long running services

• You can run them already!
• Few enhancements needed
  – Logs
  – Security
  – Management/monitoring
• Resource sharing across workload types
• Project Slider

Page 27: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Fine-grain isolation for multi-tenancy

• Custom memory-monitoring
• Cgroups
• Linux Containers
• VMs

Page 28: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Multi-resource scheduling

• Today – memory & CPU
  – Physical memory / virtual memory
  – CPU cores – virtual cores
• CPU stuff: More bake in
• Disks
  – Space
  – IOPS
• Network
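As a rough illustration of what scheduling on more than one resource means, here is a minimal sketch in which a container request is admitted only if both its memory and its virtual-core demand fit in what a node still has free. The `Resource` class and the `place` function are hypothetical names invented for this example; they are not YARN APIs, and the greedy first-fit policy is an assumption, not how the YARN schedulers decide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A two-dimensional resource vector (hypothetical, not a YARN class)."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        # A request fits only if every dimension fits.
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores

    def minus(self, other: "Resource") -> "Resource":
        # Subtract another vector dimension by dimension.
        return Resource(self.memory_mb - other.memory_mb,
                        self.vcores - other.vcores)


def place(requests, node_free):
    """Greedily admit requests in order; return (admitted, remaining free)."""
    admitted = []
    for req in requests:
        if req.fits_in(node_free):
            admitted.append(req)
            node_free = node_free.minus(req)
    return admitted, node_free


if __name__ == "__main__":
    node = Resource(memory_mb=8192, vcores=4)
    reqs = [Resource(4096, 1), Resource(2048, 4), Resource(2048, 2)]
    admitted, free = place(reqs, node)
    print(admitted)
    print(free)
```

In this example the second request is rejected even though its memory would fit, because its virtual-core demand exceeds what is left. Adding disks or network to the picture would amount to adding more fields to the vector and more terms to the fit check.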

Page 29: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Other features

• Application SLAs
• Node labels
• Node affinity/anti-affinity
• Better online queue-management

Page 30: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

YARN Ecosystem
Beyond the core YARN project: Briefly

Page 31: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Eco-system

Applications Powered by YARN
• Apache Giraph – Graph Processing
• Apache Hama – BSP
• Apache Hadoop MapReduce – Batch
• Apache Tez – Batch/Interactive
• Apache S4 – Stream Processing
• Apache Samza – Stream Processing
• Apache Storm – Stream Processing
• Apache Spark – Iterative applications
• HOYA – HBase on YARN

YARN Frameworks
• Apache Twill
• REEF by Microsoft
• Spring support for Hadoop 2

There's an app for that... YARN App Marketplace!

Page 32: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Apache TEZ

• Moving beyond MR
• A data processing framework that can execute a complex DAG of tasks.
• “Apache Tez - A New Chapter in Hadoop Data Processing”
  – By Siddharth Seth: YARN & Tez Committer/PMC Member
  – Thursday, April 3 (4:20-5:00pm)
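To give a feel for what executing a DAG of tasks involves, the sketch below groups a small, made-up task graph into waves, where every task in a wave has all of its dependencies already finished. It uses Python's standard `graphlib` module (Python 3.9+) and is not the Tez API; the task names and the `run_in_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each task maps to the set of tasks it depends on.
dag = {
    "scan_orders": set(),
    "scan_users": set(),
    "join": {"scan_orders", "scan_users"},
    "aggregate": {"join"},
    "write": {"aggregate"},
}


def run_in_waves(graph):
    """Group tasks into waves; a wave holds tasks whose dependencies are done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark them finished to unblock successors
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(run_in_waves(dag), start=1):
        print(number, wave)
```

The two scans land in the same wave and could run in parallel, while the join has to wait for both. A chain of MapReduce jobs would express the same graph as separate jobs with intermediate output written between them.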

Page 33: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Recap

Page 34: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Recap

• Apache Hadoop 2 is, at least, twice as good!
• Exciting journey with Hadoop for this decade…
  – Hadoop is no longer a one-trick pony, err elephant
  – Beyond just HDFS & MapReduce
• Architecture for the future
  – Centralized data
  – Exciting spectrum of application types, workloads and use cases

Page 35: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Couple more things..

Page 36: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

The Book is out!

http://yarn-book.com/

Page 37: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future


Page 38: Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Thank you!

Download Sandbox: Experience Apache Hadoop
Both 2.x and 1.x Versions Available!
http://hortonworks.com/products/hortonworks-sandbox/

Questions Time!