Let Mapreduce Programs Fly
Tang Zhenkun, Email: tangzk2011@163.com


Post on 14-Dec-2015


Page 1: Tang Zhenkun Email: tangzk2011@163.com

Let Mapreduce Programs Fly

Tang Zhenkun
Email: tangzk2011@163.com

Page 2:

Mapreduce Basics
Hadoop Counters
Hadoop Log Info (slf4j)
Unit Test (JUnit, MRUnit)
Guava (Google Core Libraries for Java 1.6+)
Others
References

Overview

Page 3:

Hadoop job submit flow

Mapreduce Basics

Page 4:

Hadoop Web GUI

Mapreduce Basics

Page 5:

Hadoop job submit flow

Mapreduce Basics

1. The execution details are invisible.
2. No step-through debugging.

What if errors occur?

Don't just pray!

Page 6:

Command errors, syntax errors: check, and check, and check again…

Logic errors: these are the ones we really need to deal with.

Errors

Page 7:

First, we need to check our MapReduce outputs.

Page 8:

Hadoop standard counters, e.g. Map output records, Reduce output records

Custom counters

Hadoop Counters

Page 9:

How to define a custom MapReduce counter?

context.getCounter(counterName);
context.getCounter(groupName, counterName);

Input file

Page 10:

How to define a custom MapReduce counter?

Page 11:

Second, we need more output information to check our programs.

Page 12:

Stdout does not work: System.out.println() ✗

Use a logger, e.g. log4j or slf4j.

Hadoop Log Info

Page 13:

SLF4J – Simple Logging Facade for Java.

Simple, easy to use.

Hadoop Log Info – Slf4j

Page 14:

Hadoop Log Info – Slf4j

Page 15:

But we still want to ensure correctness before we run the programs.

Page 16:

TDD: Test-Driven Development

Unit Test

TDD encourages simple designs and inspires confidence.

Page 17:

JUnit (unit testing for Java)
#Unit (for C#)
xUnit

Unit Test – JUnit

Page 18:

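The children's oil-splitting problem: two children go to buy oil. One brings an empty 1-jin bottle; the other brings two empty bottles, one of 7 liang and one of 3 liang. Each planned to buy 1 jin of oil, but they did not bring enough money, so together they bought just 1 jin (10 liang). On the way home they want to split this oil equally, but they have no other tools. Using only the three bottles (1 jin, 7 liang, 3 liang), measure out exactly two portions of half a jin each.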

How to write unit tests using JUnit?

Page 19:

Define a state: each component represents the amount of oil in the 10-ounce, 7-ounce, and 3-ounce bottle (one ounce here stands for one liang).

Define the operation: multiAndPlus(X, b)

E.g. pour from the first (10-ounce) bottle into the third one, which can take at most 3 ounces.

How to write unit tests using JUnit?
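The state and pouring operation described above can be sketched in plain Java. This is only a minimal sketch, not the Mat.java from the slides: the class OilBottles and the methods pour and replay are hypothetical names, and the move list is one known solution to the puzzle. The check printed in main plays the role that an assertEquals call would play in a JUnit test.

```java
import java.util.Arrays;

// Hypothetical sketch; it does not reproduce the author's Mat.multiAndPlus.
final class OilBottles {
    // Bottle capacities in ounces: bottle 0 holds 10, bottle 1 holds 7, bottle 2 holds 3.
    static final int[] CAPACITY = {10, 7, 3};

    // Pours from bottle `from` into bottle `to` until the source is empty
    // or the target is full. Returns a new state; the input is not modified.
    static int[] pour(int[] state, int from, int to) {
        int amount = Math.min(state[from], CAPACITY[to] - state[to]);
        int[] next = state.clone();
        next[from] -= amount;
        next[to] += amount;
        return next;
    }

    // Applies a sequence of {from, to} pours to the start state.
    static int[] replay(int[] start, int[][] moves) {
        int[] state = start.clone();
        for (int[] move : moves) {
            state = pour(state, move[0], move[1]);
        }
        return state;
    }

    public static void main(String[] args) {
        // One known solution: nine pours from (10, 0, 0) to (5, 5, 0).
        int[][] moves = {
            {0, 1}, {1, 2}, {2, 0}, {1, 2}, {2, 0}, {1, 2}, {0, 1}, {1, 2}, {2, 0}
        };
        int[] end = replay(new int[] {10, 0, 0}, moves);
        System.out.println(Arrays.toString(end)); // prints [5, 5, 0]
    }
}
```

Because pour returns a fresh array, each intermediate state can be compared against an expected value without worrying about aliasing, which keeps per-step unit tests simple.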

Page 20:

How to write unit tests using JUnit?

MatTest.java

Mat.java

Page 21:

@Test
@Before, @After
Assert*

Finally, run it as a normal Java application.

How to write unit tests using JUnit?

Page 22:

MRUnit: unit testing for Hadoop MapReduce

Unit Test - MRUnit

Page 23:

Drivers: MapDriver, ReduceDriver, MapReduceDriver

Methods: withInput(key, value), withOutput(key, value), runTest()

Finally, run it as a normal Java application.

How to write unit tests using MRUnit?

Page 24:

And then, we also want to assert correctness inside our programs.

Page 25:

"The Art of Assertion" in Chapter 5 of Programming Pearls, Second Edition.

Assert in Java:
assert <boolean expression>
assert <boolean expression> : <error message>

But assertions are disabled by default, so you must enable them explicitly when running the application (java -ea <className>).

Preconditions in Guava

Assertions
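The two forms of the Java assert statement listed above can be seen in a small example. This is a minimal sketch; the class AssertDemo and the method half are hypothetical names chosen for illustration.

```java
// Hypothetical sketch of the `assert <boolean expression> : <error message>` form.
final class AssertDemo {
    // Splits an even amount in two; the assert documents the precondition.
    static int half(int amount) {
        assert amount % 2 == 0 : "amount must be even, got " + amount;
        return amount / 2;
    }

    public static void main(String[] args) {
        System.out.println(half(10)); // prints 5

        // With `java -ea AssertDemo` the next call throws an AssertionError.
        // Without -ea the assert is skipped and the call prints 3.
        System.out.println(half(7));
    }
}
```

Since the check disappears when -ea is absent, an assert should never carry side effects that the program relies on.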

Page 26:

Guava, Google Core Libraries for Java 1.6+

Preconditions

Preconditions in Guava

checkArgument(i >= 0, "Argument was %s but expected nonnegative", i);
checkArgument(i < j, "Expected i < j, but %s >= %s", i, j);

Page 27:

Other useful libraries.

Guava

http://code.google.com/p/guava-libraries/

Page 28:

How to write a custom partitioner in Hadoop?

Custom data type: CustomType

Custom Partitioner

Page 29:

How to write a custom partitioner in Hadoop?

Page 30:

How to write a custom partitioner in Hadoop?

Partitioner: return key % 3

When changed to (return key / 3), with the number of reduce tasks changed to 4:

The output is totally ordered.
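The effect of the two partition functions above can be checked without a cluster. The following is a minimal plain-Java sketch, not a Hadoop Partitioner subclass: the class PartitionSketch and its methods are hypothetical, and the key range 0 to 11 is an assumption chosen so that key / 3 always falls within 4 reduce tasks. It simulates each reduce task sorting its own keys and then concatenates the per-task outputs in task order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical simulation; a real job would extend Hadoop's Partitioner instead.
final class PartitionSketch {
    // Routes every key to a reduce task, sorts within each task (as the
    // shuffle does), and returns the task outputs concatenated in task order.
    static List<Integer> concatenatedOutput(int[] keys, int numReduceTasks,
                                            IntUnaryOperator partitioner) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            buckets.get(partitioner.applyAsInt(key)).add(key);
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> bucket : buckets) {
            Collections.sort(bucket);
            out.addAll(bucket);
        }
        return out;
    }

    // True when the concatenated output is globally sorted.
    static boolean isSorted(List<Integer> xs) {
        for (int i = 1; i < xs.size(); i++) {
            if (xs.get(i - 1) > xs.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] keys = {7, 2, 11, 0, 5, 9, 3, 8, 1, 10, 4, 6}; // assumed key range 0..11

        // key % 3 with 3 reduce tasks: each task is sorted, the whole is not.
        System.out.println(concatenatedOutput(keys, 3, k -> k % 3));
        // prints [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]

        // key / 3 with 4 reduce tasks: contiguous key ranges, so the whole is sorted.
        System.out.println(concatenatedOutput(keys, 4, k -> k / 3));
        // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    }
}
```

The range-based function gives a total order only because every key in task r is smaller than every key in task r + 1; with a modulo function neighbouring keys land in different tasks.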

Page 31:

Maven: automatic dependency management

Hadoop remote debugging: JDWP (Java Debug Wire Protocol)

HPROF: analysis tool in the JDK

Others

Page 32:

Hadoop: The Definitive Guide, Second Edition.
http://www.junit.org/
http://incubator.apache.org/mrunit/
http://code.google.com/p/guava-libraries/
http://insightfullogic.com/blog/2011/oct/21/5-reasons-use-guava/

References

Page 33:

Thank you for your time!