harnessing the power of yarn with apache twill · pdf fileharnessing the power of yarn with...

57
Harnessing the Power of YARN with Apache Twill Andreas Neumann andreas[at]continuuity.com @anew68

Upload: vuongtu

Post on 02-Feb-2018

239 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Harnessing the Power of YARN with Apache TwillAndreas Neumann

andreas[at]continuuity.com @anew68

Page 2: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!
Page 3: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

A Distributed App

split split split

part part part

shuffle

Reducers

Mappers

Page 4: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

A Map/Reduce Cluster

split split split

part part part

split split split

part part part

split split

part part

split split

part part

Page 5: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

What Next?

Page 6: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

A Message Passing (MPI) App

data

data data

data data

data

Page 7: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

A Stream Processing App

Database

events

Page 8: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

A Distributed Load Test

test testtest

test test test

test test test

test

test

test

Web Service

Page 9: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

A Multi-Purpose Clustert

Page 10: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Continuuity Reactor• Developer-Centric Big Data Application Platform!

• Many different types of jobs in Hadoop cluster!• Real-time stream processing!

• Ad-hoc queries!

• Map/Reduce!

• Web Apps

Page 11: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

The Answer is YARN• Resource Manager of Hadoop 2.0!

• Separates !• Resource Management!

• Programming Paradigm!

• (Almost) any distributed app in Hadoop cluster!• Application Master to negotiate resources

Page 12: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

A YARN Application

data

data data

data data

data

YARNResourceManager

AppMaster

Page 13: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

A Multi-Purpose Clustert

AM

AM

AM AM

AM

YARNResourceManager

Page 14: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Node Mgr

Node Mgr

YARN - How it works

YARNResourceManager

YARN Client

1.

1. Submit App Master

AM

2.

2. Start App Master in a Container

3.

3. Request Containers Task

TaskTask

4.

4. Start Tasks in Containers

Page 15: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Starting the App Master

YARNResourceManager

YARN Client

Node Mgr

. . .

Local file system

Local file system

HDFS distributed file system

Local file system

!AM.jar

AM.jar1) copy to HDFS

AM.jar 2) copy to local

AM

3) load

Node Mgr

Page 16: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

The YARN Client1. Connect to the Resource Manager.!2. Request a new application ID.!3. Create a submission context and a container launch context.!4. Define the local resources for the AM.!5. Define the environment for the AM.!6. Define the command to run for the AM.!7. Define the resource limits for the AM.!8. Submit the request to start the app master.

Page 17: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Writing the YARN Client1. Connect to the Resource Manager: YarnConfiguration yarnConf = new YarnConfiguration(conf); InetSocketAddress rmAddress = NetUtils.createSocketAddr(yarnConf.get( YarnConfiguration.RM_ADDRESS, YarnConfiguration.DEFAULT_RM_ADDRESS)); LOG.info("Connecting to ResourceManager at " + rmAddress); configuration rmServerConf = new Configuration(conf); rmServerConf.setClass( YarnConfiguration.YARN_SECURITY_INFO, ClientRMSecurityInfo.class, SecurityInfo.class); ClientRMProtocol resourceManager = ((ClientRMProtocol) rpc.getProxy( ClientRMProtocol.class, rmAddress, appsManagerServerConf)); !

Page 18: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Writing the YARN Client2) Request an application ID:

GetNewApplicationRequest request = Records.newRecord(GetNewApplicationRequest.class); GetNewApplicationResponse response = resourceManager.getNewApplication(request); LOG.info("Got new ApplicationId=" + response.getApplicationId());

3) Create a submission context and a launch context ! ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class); appContext.setApplicationId(appId); appContext.setApplicationName(appName); ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);

Page 19: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Writing the YARN Client4. Define the local resources: Map<String, LocalResource> localResources = Maps.newHashMap(); // assume the AM jar is here: Path jarPath; // <- known path to jar file ! // Create a resource with location, time stamp and file length LocalResource amJarRsrc = Records.newRecord(LocalResource.class); amJarRsrc.setType(LocalResourceType.FILE); amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath)); FileStatus jarStatus = fs.getFileStatus(jarPath); amJarRsrc.setTimestamp(jarStatus.getModificationTime()); amJarRsrc.setSize(jarStatus.getLen()); localResources.put("AppMaster.jar", amJarRsrc); ! amContainer.setLocalResources(localResources);

Page 20: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Writing the YARN Client5. Define the environment:

// Set up the environment needed for the launch context Map<String, String> env = new HashMap<String, String>(); ! // Setup the classpath needed. // Assuming our classes are available as local resources in the // working directory, we need to append "." to the path. String classPathEnv = "$CLASSPATH:./*:"; env.put("CLASSPATH", classPathEnv); ! // setup more environment env.put(...); ! amContainer.setEnvironment(env); !

Page 21: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Writing the YARN Client6. Define the command to run for the AM: // Construct the command to be executed on the launched container String command = "${JAVA_HOME}" + /bin/java" + " MyAppMaster" + " arg1 arg2 arg3" + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION + "/stdout" + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION + "/stderr"; ! List<String> commands = new ArrayList<String>(); commands.add(command); ! // Set the commands into the container spec amContainer.setCommands(commands); !

Page 22: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Writing the YARN Client7. Define the resource limits for the AM: // Define the resource requirements for the container. // For now, YARN only supports memory constraints. // If the process takes more memory, it is killed by the framework. Resource capability = Records.newRecord(Resource.class); capability.setMemory(amMemory); amContainer.setResource(capability); // Set the container launch content into the submission context appContext.setAMContainerSpec(amContainer); !!!!

Page 23: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Writing the YARN Client8. Submit the request to start the app master: // Create the request to send to the Resource Manager SubmitApplicationRequest appRequest = Records.newRecord(SubmitApplicationRequest.class); appRequest.setApplicationSubmissionContext(appContext); ! // Submit the application to the ApplicationsManager resourceManager.submitApplication(appRequest); !!!!!

Page 24: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Node Mgr

Node Mgr

YARN - How it works

YARNResourceManager

YARN Client

1.

1. Submit App Master

AM

2.

2. Start App Master in a Container

3.

3. Request Containers Task

TaskTask

4.

4. Start Tasks in Containers

Page 25: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

YARN is complex• Three different protocols to learn!

• Client -> RM, AM -> RM, AM -> NM!

• Asynchronous protocols!

• Full Power at the expense of simplicity!

• Duplication of code !• App masters!

• YARN clients

Page 26: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Can it be easier?

Page 27: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

YARN App vs Multi-threaded App

YARN Application Multi-threaded Java Application

YARN Client java command to launch application with options and arguments

Application Master main() method preparing threads

Task in Container Runnable implementation, each runs in its own Thread

Page 28: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Apache Twill• Adds simplicity to the power of YARN!

• Java thread-like programming model!

• Incubated at Apache Software Foundation (Nov 2013)!• Current release: 0.2.0-incubating

Page 29: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Hello Worldpublic class HelloWorld { static Logger LOG = LoggerFactory.getLogger(HelloWorld.class); static class HelloWorldRunnable extends AbstractTwillRunnable { @Override public void run() { LOG.info("Hello World"); } }

public static void main(String[] args) throws Exception { YarnConfiguration conf = new YarnConfiguration(); TwillRunnerService runner = new YarnTwillRunnerService(conf, "localhost:2181"); runner.startAndWait(); TwillController controller = runner.prepare(new HelloWorldRunnable()) .start(); Services.getCompletionFuture(controller).get(); }

Page 30: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Hello Worldpublic class HelloWorld { static Logger LOG = LoggerFactory.getLogger(HelloWorld.class); static class HelloWorldRunnable extends AbstractTwillRunnable { @Override public void run() { LOG.info("Hello World"); } }

public static void main(String[] args) throws Exception { YarnConfiguration conf = new YarnConfiguration(); TwillRunnerService runner = new YarnTwillRunnerService(conf, "localhost:2181"); runner.startAndWait(); TwillController controller = runner.prepare(new HelloWorldRunnable()) .start(); Services.getCompletionFuture(controller).get(); }

Page 31: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Hello Worldpublic class HelloWorld { static Logger LOG = LoggerFactory.getLogger(HelloWorld.class); static class HelloWorldRunnable extends AbstractTwillRunnable { @Override public void run() { LOG.info("Hello World"); } }

public static void main(String[] args) throws Exception { YarnConfiguration conf = new YarnConfiguration(); TwillRunnerService runner = new YarnTwillRunnerService(conf, "localhost:2181"); runner.startAndWait(); TwillController controller = runner.prepare(new HelloWorldRunnable()) .start(); Services.getCompletionFuture(controller).get(); }

Page 32: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Twill is easy.

Page 33: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Architecture

!!

Twill Client !!

… !!

Twill Runnable

Twill Runner

Twill Runnable

This is the only programming

interface you need

Node Mgr

Twill AM

YARNResourceManager

Task Task

Task

Task Task

Task

Node Mgr

Node Mgr

Node Mgr

1.

2.

3.

4.

Page 34: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Twill ApplicationWhat if my app needs more than one type of task?!

!

!

!

!

!

!

!

Page 35: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Twill ApplicationWhat if my app needs more than one type of task?!

• Define a TwillApplication with multiple TwillRunnables inside:! public class MyTwillApplication implements TwillApplication { ! @Override public TwillSpecification configure() { return TwillSpecification.Builder.with() .setName("Search") .withRunnable() .add("crawler", new CrawlerTwillRunnable()).noLocalFiles() .add("indexer", new IndexerTwillRunnable()).noLocalFiles() .anyOrder() .build(); } }

Page 36: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Patterns and Features

Page 37: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Real-Time Logging

!!

Twill Client

Twill Runner

LogHandler to receive logs from

all runnables

Node Mgr

AM+ Kafka

YARNResourceManager

Task Task

Task

Task Task

Task

Node Mgr

Node Mgr

Node Mgr

log streamlog

log appender to send logs

to Kafka

Page 38: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Real-time Logging TwillController controller = runner.prepare(new HelloWorldRunnable()) .addLogHandler(new PrinterLogHandler( new PrintWriter(System.out, true))) .start(); !

OR !! controller.addLogHandler(new PrinterLogHandler( new PrintWriter(System.out, true))); !!

Page 39: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Resource Report• Twill Application Master exposes HTTP endpoint!

• Resource information for each container (AM and Runnables)!

• Memory and virtual core!

• Number of live instances for each Runnable!

• Hostname of the container is running on!

• Container ID!

• Registered as AM tracking URL

Page 40: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Resource Report• Programmatic access to resource report.!

! ! ResourceReport report = controller.getResourceReport(); Collection<TwillRunResources> resources = report.getRunnableResource("MyRunnable");

!

!

!

!

Page 41: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

State Recovery• What happens to the Twill app if the client terminates?!

• It keeps running!

• Can a new client take over control?

Page 42: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

!!

Twill Client

State Recovery

Twill Runner

Recover state from ZooKeeper

Node Mgr

AM+ Kafka

YARNResourceManager

Task Task

Task

Task Task

Task

Node Mgr

Node Mgr

Node Mgr

ZooKeeperZooKeeperZooKeeper

state

Page 43: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

State Recovery• All live instances of an application!

! ! Iterable<TwillController> controllers = runner.lookup("HelloWorld");!

• A particular live instance!

! ! TwillController controller = runner.lookup(“HelloWorld”, RunIds.fromString("lastRunId"));

• All live instances of all applications!

! ! Iterable<LiveInfo> liveInfos = runner.lookupLive();

Page 44: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

!!

Twill Client

Command Messages

Twill Runner

Send command message to runnables

Node Mgr

AM+ Kafka

YARNResourceManager

Task Task

Task

Task Task

Task

Node Mgr

Node Mgr

Node Mgr

ZooKeeperZooKeeperZooKeeper

message

Page 45: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Command Messages• Send to all Runnables:!

! ListenableFuture<Command> completion = controller.sendCommand( Command.Builder.of("gc").build());

• Send to the “indexer” Runnable:!

! ListenableFuture<Command> completion = controller.sendCommand("indexer", Command.Builder.of("flush").build());

• The Runnable implementation defines how to handle it:!

! void handleCommand(Command command) throws Exception;

Page 46: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Elastic Scaling• Change the instance count of a live runnable!

! ListenableFuture<Integer> completion = controller.changeInstances("crawler", 10); // Wait for change complete completion.get();

• Implemented by a command message to the AM.!

!

Page 47: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Service Discovery• Running a service in Twill!

• What host/port should a client connect to?

Page 48: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

!!

Twill Client

Service Discovery

Twill Runner

Watch for change in discovery nodes

Node Mgr

AM+ Kafka

YARNResourceManager

Task Task

Task

Task Task

Task

Node Mgr

Node Mgr

Node Mgr

ZooKeeperZooKeeperZooKeeper

Service changes

Registerservice

Page 49: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Service Discovery• In TwillRunnable, register as a named service:!

! @Override public void initialize(TwillContext context) { // Starts server on random port int port = startServer(); context.announce("service", port); }

• Discover by service name on client side:!

! ServiceDiscovered serviceDiscovered = controller.discoverService(“service");

!

Page 50: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Bundle Jar Execution• Different library dependencies than Twill!

• Run existing application in YARN

Page 51: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Bundle Jar Execution• Bundle Jar contains classes and all libraries depended on inside a jar.!

! MyMain.class MyRecord.class lib/guava-16.0.1.jar lib/netty-all-4.0.17.jar

• Easy to create with build tools!• Maven - maven-bundle-plugin!

• Gradle - apply plugin: 'osgi' !

Page 52: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Bundle Jar Execution• Execute with BundledJarRunnable!

• See BundledJarExample in the source tree.!

! java org.apache.twill.example.yarn.BundledJarExample <zkConnectStr> <bundleJarPath> <mainClass> <args…> !

• Successfully ran Presto on YARN!• Non-intrusive, no code modification for Presto!

• Simple maven project to use Presto in embedded way and create Bundle Jar!!

Page 53: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Distributed Debugging• Enable debugging when starting the app!

controller = runner.prepare(new DummyApplication()) .enableDebugging("r1") .start();

• Runnable r1 will start that in JVM that is open for remote debugging!

• The remote debug port is available in the resource report!ResourceReport report = controller.getResourceReport(); for (resources : report.getRunnableResources(runnable)) { print(resources.getHost() + “:“ + resources.getDebugPort()); }

• Attach your IDE’s debugger to the remote JVM.

Page 54: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Distributed Debugging• Note: This is not secure !

• JVM does not secure the debug port!

• Use it only when needed and environment is safe.

Page 55: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

The Road Ahead• Scripts for easy life cycle management and scaling!

• Distributed coordination within application!

• Non-Java application!

• Suspend and resume application!

• Metrics!

• Local runner service!

• …

Page 56: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Summary• YARN is powerful!

• Allows applications other than M/R in Hadoop cluster!

• YARN is complex!• Complex protocols, boilerplate code!

• Twill makes YARN easy!• Java Thread-like runnables!

• Add-on features required by many distributed applications!

• Productivity Boost!• Developers can focus on application logic

Page 57: Harnessing the Power of YARN with Apache Twill · PDF fileHarnessing the Power of YARN with Apache Twill Andreas Neumann andreas ... ZooKeeper ZooKeeper ZooKeeper ... maven-bundle-plugin!

Berlin Buzzwords 2014

Thank You• Twill is Open Source and needs your contributions!

• twill.incubator.apache.org!

[email protected]!

!

• Continuuity is hiring!• continuuity.com/careers