Hadoop Summit Europe 2014 talk: Apache Hadoop YARN: Present and Future

Post on 19-Aug-2014

DESCRIPTION

Title: Apache Hadoop YARN: Present and Future

Abstract: Apache Hadoop YARN evolves the Hadoop compute platform from being centered only around MapReduce to being a generic data processing platform that can take advantage of a multitude of programming paradigms, all on the same data. In this talk, we'll trace the journey of YARN from a concept to being the cornerstone of the Hadoop 2 GA releases. We'll cover the current status of YARN, how it is faring today, and how it stands apart from the monochromatic world that is Hadoop 1.0. We'll then move on to the exciting future of YARN: features that are making YARN a first-class resource-management platform for enterprise Hadoop, such as rolling upgrades, high availability, support for long-running services alongside applications, fine-grained isolation for multi-tenancy, preemption, application SLAs, and application history, to name a few.

TRANSCRIPT

© Hortonworks Inc. 2014

Apache Hadoop YARN: Present and Future

Vinod Kumar Vavilapalli
vinodkv [at] apache.org
@tshooter

A quick show of hands..

• Hadoop 2

Real life Hadoop Logo

Who am I?

• 6.75 Hadoop-years old
• Last thing at school: a two-node Tomcat cluster. Three months later, first thing at the job, brought down an 800-node cluster ;)
• Previously @Yahoo!
• Now @Hortonworks
• Two hats
  – Hortonworks: Hadoop MapReduce and YARN development lead
  – Apache: Apache Hadoop YARN lead, Apache Hadoop PMC, Apache Member
• Worked/working on
  – YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security
  – Apache Ambari: kickstarted the project and its first release
  – Stinger: high-performance data processing with Hadoop/Hive
• Lots of troubleshooting on clusters
• 99%+ of code in Apache, Hadoop

Agenda

• Apache Hadoop 2: Overview
• Past
• Present
• Future

Apache Hadoop 2: Next Generation Architecture

What is YARN?

• Resource Management Platform
  – MapReduce v2
  – Beyond MapReduce with Tez, Storm, Spark; in Hadoop!
  – Did I mention services like HBase, Accumulo on YARN with HoYA/Slider?

• How is it different from Hadoop 1? ..

Hadoop 1 vs Hadoop 2

HADOOP 1.0: Single Use System (Batch Apps)
  – MapReduce (cluster resource management & data processing)
  – HDFS (redundant, reliable storage)

HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
  – MapReduce (data processing), Others
  – YARN (cluster resource management)
  – HDFS2 (redundant, highly-available & reliable storage)

Key Benefits of YARN

• Scale

• New Programming Models & Services

• Improved cluster utilization

• Agility

• To infinity and beyond ..

Why Migrate?

• 2.0 >= 2 * 1.0
  – HDFS: Lots of ground-breaking features
  – YARN: Next generation architecture
• Return on Investment: 2x throughput on same hardware!
• Ready for improvements in hardware
• Not convinced? Let's see what others are saying!

Yahoo!

• Leader/Visionary on all things Hadoop!
• On YARN (0.23.x)
• Moving fast to 2.x

http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html

Twitter

Ebay

• Has one of the largest Hadoop clusters in the industry, with many petabytes of data
• Migrated production clusters to Hadoop-2
• Go to Mayank's talk
  – "Hadoop-2 @ ebay"!
  – Thursday, April 3
  – Track: Deployment and Operations
• Should be convinced by now... No?

YARN: the Data Operating System

Present

Apache Hadoop releases

Apache Hadoop 2.2

• 15 October, 2013
• The 1st GA release of Apache Hadoop 2.x
• YARN
  – First stable and supported release of YARN
  – Binary compatibility for MapReduce applications built on hadoop-1.x
  – YARN level APIs solidified for the future
  – Performance
  – Scale!
• HDFS
  – High Availability for HDFS
  – HDFS Federation
  – HDFS Snapshots
  – NFSv3 access to data in HDFS
• Support for running Hadoop on Microsoft Windows
• Substantial amount of integration testing with the rest of the projects in the ecosystem

Apache Hadoop releases (contd)

Apache Hadoop 2.3

• 24 February, 2014
• First post-GA release for the year 2014
• Alpha features in YARN
  – ResourceManager HA
  – Application History
  – Will cover in the 2.4 content
• HDFS
  – Details follow..
• Number of bug-fixes, enhancements

HDFS: Heterogeneous Storage

HDFS: DataNode caching

Apache Hadoop releases (contd)

Apache Hadoop 2.4

• Very soon!
• YARN
  – Details follow..
  – ResourceManager restart, fail-over for high availability
  – Preemption
  – Application History and timeline
• HDFS
  – FileSystem ACLs
  – Rolling upgrades

ResourceManager Restart and fail-over

ZooKeeper

Capacity Scheduler Preemption

Application History and Timeline

• Few MR-specific implementations: History and web-UI
• Not just MR anymore!
• History
  – MapReduce-specific Job History Server
  – Beyond ResourceManager Restart
• Timeline
  – Framework-specific event collection and UIs
• Run analytics on historical apps!

Future

Future: Operational enhancements

• Rolling upgrades
  – No/minimal impact to users
  – Ideal: Always rolling!
• HDFS in
• YARN

Future: Enabling more apps

• Beyond MR
• Discussing next
  – Long running services
  – Isolation
  – Multi-dimensional resource scheduling

Future: Long running services

• You can run them already!
• Few enhancements needed
  – Logs
  – Security
  – Management/monitoring
• Resource sharing across workload types
• Project Slider

Fine-grain isolation for multi-tenancy

• Custom memory-monitoring
• Cgroups
• Linux Containers
• VMs

Multi-resource scheduling

• Today – memory & CPU
  – Physical memory / virtual memory
  – CPU cores – virtual cores
• CPU stuff: More bake in
• Disks
  – Space
  – IOPS
• Network
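
To make the "memory & CPU" point concrete, here is a minimal Python sketch of a two-dimensional fit check: a container request is placed only if both its memory and its virtual cores fit in what the node has left. The `Resource` class, the `place` function and all the numbers are hypothetical illustrations, not YARN's actual scheduler code or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical two-dimensional resource vector: memory and virtual cores."""
    memory_mb: int
    vcores: int

    def fits_in(self, available: "Resource") -> bool:
        # A request fits only if EVERY dimension fits, not just memory.
        return (self.memory_mb <= available.memory_mb
                and self.vcores <= available.vcores)

    def minus(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb,
                        self.vcores - other.vcores)


def place(requests, node_capacity):
    """Greedy, in-order placement on one node; returns (placed, remaining)."""
    available = node_capacity
    placed = []
    for request in requests:
        if request.fits_in(available):
            placed.append(request)
            available = available.minus(request)
    return placed, available


# Assumed example values: one 8 GB / 4-vcore node and three container requests.
node = Resource(memory_mb=8192, vcores=4)
requests = [Resource(4096, 1), Resource(2048, 4), Resource(2048, 2)]
placed, remaining = place(requests, node)
```

The second request is skipped even though its memory would fit, because it asks for more virtual cores than remain. Adding disks or network, as the slide anticipates, would mean adding further dimensions to the same check.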

Other features

• Application SLAs
• Node labels
• Node affinity/anti-affinity
• Better online queue-management

YARN Ecosystem
Beyond the core YARN project: Briefly

Eco-system

Applications Powered by YARN
• Apache Giraph – Graph Processing
• Apache Hama – BSP
• Apache Hadoop MapReduce – Batch
• Apache Tez – Batch/Interactive
• Apache S4 – Stream Processing
• Apache Samza – Stream Processing
• Apache Storm – Stream Processing
• Apache Spark – Iterative applications
• HOYA – HBase on YARN

YARN Frameworks
• Apache Twill
• REEF by Microsoft
• Spring support for Hadoop 2

There's an app for that...
YARN App Marketplace!

Apache TEZ

• Moving beyond MR
• A data processing framework that can execute a complex DAG of tasks.
• "Apache Tez - A New Chapter in Hadoop Data Processing"
  – By Siddharth Seth: YARN & Tez Committer/PMC Member
  – Thursday, April 3 (4:20-5:00pm)
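
As a rough illustration of what "executing a DAG of tasks" means, the Python sketch below runs tasks in dependency order using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is not the Tez API; the task names and the `run_dag` helper are made up for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dag = {
    "scan_orders": set(),
    "scan_users": set(),
    "join": {"scan_orders", "scan_users"},
    "aggregate": {"join"},
    "write": {"aggregate"},
}


def run_dag(graph, run_task):
    """Run every task once, only after all of its upstream tasks have finished."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    order = []
    while sorter.is_active():
        # get_ready() yields tasks whose dependencies are all done; a real
        # engine could run these in parallel. Sorted here for determinism.
        for task in sorted(sorter.get_ready()):
            run_task(task)
            order.append(task)
            sorter.done(task)
    return order


executed = run_dag(dag, lambda name: None)  # no-op task body for the sketch
```

Unlike a fixed map-then-reduce pipeline, the graph can have any number of stages and fan-in points; here the two scans are independent and the join waits for both.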

Recap

• Apache Hadoop 2 is, at least, twice as good!
• Exciting journey with Hadoop for this decade…
  – Hadoop is no longer a one-trick pony, err elephant
  – Beyond just HDFS & MapReduce
• Architecture for the future
  – Centralized data
  – Exciting spectrum of application types, workloads and use cases

Couple more things..

The Book is out!

http://yarn-book.com/

Thank you!

Download Sandbox: Experience Apache Hadoop
Both 2.x and 1.x Versions Available!
http://hortonworks.com/products/hortonworks-sandbox/

Questions Time!
