Big Data 2012
Hadoop Tutorial
Oct 19th, 2012
Martin Kaufmann – Systems Group, ETH Zürich
Contact
Exercise Session: Friday, 14:15 to 15:00, CHN D 46
Your Assistant: Martin Kaufmann
Office: CAB E 77.2
E-Mail: [email protected]
Download of Exercises: http://www.systems.ethz.ch/courses/fall2012/BigData
MapReduce
• Parallel problems are distributed across huge data sets using a large number of nodes
• Two stages:
  • Map step: the master node takes the input and divides it into smaller sub-problems
  • Reduce step: the master node collects the answers to all sub-problems and combines them in some way
• Condition: the reduction function is associative. Remember: A x (B x C) = (A x B) x C
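Associativity is what lets partial results be combined in any grouping. A minimal illustration in plain Java (not Hadoop code): summing a list of counts all at once, or in two independent halves that are combined afterwards, yields the same total.

```java
import java.util.Arrays;
import java.util.List;

public class Associativity {
    // The reduction of the word-count example: summing partial counts.
    static int sum(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(3, 1, 4, 1, 5);

        // One worker reduces everything: ((((3+1)+4)+1)+5)
        int sequential = sum(counts);

        // Two workers reduce halves independently; a final step combines them:
        // (3+1) + ((4+1)+5)
        int combined = sum(counts.subList(0, 2)) + sum(counts.subList(2, 5));

        System.out.println(sequential + " == " + combined); // prints "14 == 14"
    }
}
```

Because addition is associative, the split point does not matter; a non-associative reduction (e.g. subtraction) would give different answers for different groupings.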
MapReduce
MapReduce transforms (key, value) pairs into lists of values:
• Map and Reduce functions are defined with respect to data stored in (key, value) pairs:
  Map(k1, v1) → list(k2, v2)
• MapReduce then groups all pairs with the same key:
  Reduce(k2, list(v2)) → list(v3)
• All functions are executed in parallel!
Dataflow of MapReduce
• Input reader: divides the input into 'splits'. One split is assigned to one 'map' function.
• Map function: takes (key, value) pairs and generates one or more output (key, value) pairs.
• Partition function: assigns each map output to a 'reducer' by returning the index of the 'reduce'.
• Comparison function: the input for each 'reduce' is sorted using a comparison function.
• Reduce function: called once for each unique key in sorted order, iterating through the values and producing zero, one, or more outputs.
• Output writer: writes the output of 'reduce' to storage.
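The dataflow above can be sketched as a single-JVM simulation in plain Java (no Hadoop dependencies; the hash partitioner and per-reducer sorted tables are illustrative choices, not Hadoop internals):

```java
import java.util.*;

public class DataflowSketch {
    // Map: one line of text in, a (word, 1) pair out per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+"))
            if (!word.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(word, 1));
        return out;
    }

    // Partition: route a key to one of numReducers reducers by hash.
    static int partition(String key, int numReducers) {
        return Math.floorMod(key.hashCode(), numReducers);
    }

    // Reduce: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Drive the pipeline: splits -> map -> partition -> sort -> reduce -> output.
    static Map<String, Integer> run(List<String> splits, int numReducers) {
        // One sorted key -> values table per reducer (TreeMap plays the
        // role of the comparison/sort step).
        List<TreeMap<String, List<Integer>>> shuffled = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) shuffled.add(new TreeMap<>());
        for (String split : splits)                    // the input reader gave us splits
            for (Map.Entry<String, Integer> kv : map(split))
                shuffled.get(partition(kv.getKey(), numReducers))
                        .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
        Map<String, Integer> result = new TreeMap<>(); // the "output writer"
        for (TreeMap<String, List<Integer>> part : shuffled)
            part.forEach((k, vs) -> result.put(k, reduce(vs)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("we are not what", "we want to be"), 2));
    }
}
```

In the real framework, each loop iteration runs on a different machine and the "tables" are spilled to disk and copied across the network; the data movement is the same.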
Overview of Hadoop
• Hadoop provides a programming model with efficient, automatic distribution of data & work across machines
• Open-source implementation of Google's MapReduce framework, on top of the Hadoop Distributed File System (HDFS)
• Large-scale distributed batch processing for vast amounts of data (multi-terabytes)
• Runs on large clusters (1000s of nodes) of commodity hardware with reliability & fault-tolerance
• Highly scalable filesystem; computing coupled to storage
• Provides a simplified programming model: map() & reduce(); no schema or type support

Slides adapted from Cagri Balkesen
HDFS Architecture
• Namenode: the HDFS master server
  • Manages the filesystem namespace (block mappings)
  • Regulates access to files by clients (open, close, rename, ...)
• Datanode: manages the data attached to each node
  • Data is split into blocks & replicated (default block size is 64MB)
  • Serves read/write requests for blocks
• Data locality: computing goes to the data, enabling effective scheduling & parallel processing
• High aggregate bandwidth
Image Sources: [1] http://developer.yahoo.com/hadoop/tutorial/, [2] http://hadoop.apache.org/common/docs/current/hdfs_design.html
The MapReduce Paradigm
• Maps execute in parallel over different local chunks
• Map outputs are shuffled/copied to the reduce nodes
• Reduce tasks begin after all local data is sorted

WordCount example (pseudocode):

Mapper(filename, contents):
    for each word in contents:
        emit(word, 1)

Reducer(word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit(word, sum)

Image Sources: [1] http://developer.yahoo.com/hadoop/tutorial/
MapReduce Terminology
• Job: a «full program», i.e. an execution of a Mapper and Reducer across a data set
• Task: an execution of a Mapper or a Reducer on a slice of data, a.k.a. Task-In-Progress (TIP)
• The master node runs a JobTracker instance, which accepts job requests from clients
• TaskTracker instances run on the slave nodes and periodically query the JobTracker for work
• The TaskTracker forks a separate Java process per task instance, so failures are isolated and a failed task restarts with the same input
• All mappers are equivalent, so each maps whatever data is local to its particular node in HDFS
• The TaskRunner launches the Mapper/Reducer and knows which InputSplits should be processed; it calls the Mapper/Reducer for each record of the InputSplit
  Ex: InputSplit ↔ each 64MB file chunk; RecordReader ↔ each line in the chunk; the InputFormat (e.g. TextInputFormat) identifies the InputSplit
• Partitioner: used in the shuffle; determines the partition number for a key

Credits: [3] http://www.cloudera.com/wp-content/uploads/2010/01/4-ProgrammingWithHadoop.pdf
The WordCount Example

function map(String name, String document):
    // name: document name
    // document: document contents
    for each word w in document:
        emit(w, 1)

function reduce(String word, Iterator partialCounts):
    // word: a word
    // partialCounts: a list of aggregated partial counts
    sum = 0
    for each pc in partialCounts:
        sum += pc
    emit(word, sum)
The WordCount Example
Dataset:
We are not what
we want to be,
but at least
we are not what
we used to be.

The dataset is divided into splits: InputSplit-1, InputSplit-2, InputSplit-3, InputSplit-4.
InputSplits are read and processed via TextInputFormat:
• Parses the input
• Generates key-value pairs (key=offset, value=line-contents): (k1,v1) ... (k5,v5), one per line
• InputSplit boundaries are expanded to the next newline \n
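A minimal sketch in plain Java (not the actual Hadoop TextInputFormat class) of how each line becomes a (byte-offset, line-contents) record:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OffsetLines {
    // Produce (byte offset, line contents) pairs, mimicking the keys and
    // values TextInputFormat hands to the Mapper.
    static Map<Long, String> toRecords(String contents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : contents.split("\n", -1)) {
            if (!line.isEmpty()) records.put(offset, line);
            offset += line.length() + 1; // +1 for the '\n' the split consumed
        }
        return records;
    }

    public static void main(String[] args) {
        String doc = "We are not what\nwe want to be,\nbut at least\n";
        toRecords(doc).forEach((k, v) -> System.out.println("(" + k + ", \"" + v + "\")"));
        // (0, "We are not what")
        // (16, "we want to be,")
        // (31, "but at least")
    }
}
```

The offset key is why the map key k1 is typically ignored in WordCount: it identifies the record's position, not its content.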
The WordCount Example
Each Map call emits one <word, 1> pair per word of its line:
Map(k1,v1): <we, 1> <are, 1> <not, 1> <what, 1>
Map(k2,v2): <we, 1> <want, 1> <to, 1> <be, 1>
Map(k3,v3): <but, 1> <at, 1> <least, 1>
Map(k4,v4): <we, 1> <are, 1> <not, 1> <what, 1>
Map(k5,v5): <we, 1> <used, 1> <to, 1> <be, 1>

Shuffle/Sort groups the pairs by key:
<we, 1> <we, 1> <we, 1> <we, 1>
<are, 1> <are, 1>
<not, 1> <not, 1>
<what, 1> <what, 1>
...

Reduce(k,v[]) sums each group:
<we, 4>
<are, 2>
<not, 2>
<what, 2>
...
Setting up Hadoop
3 modes of setup:
• Standalone: a single Java process, for verification & debugging
• Pseudo-distributed: a single machine, but JobTracker & NameNode run in different processes
• Fully-distributed: JobTracker & NameNode on different machines, together with the other slave machines

Let's try standalone:
• Download the latest stable release: http://hadoop.apache.org/core/releases.html
• Extract the files: tar xvzf hadoop-1.0.4*.tar.gz
• Set the following in conf/hadoop-env.sh:
  JAVA_HOME=/usr/java/default
• In the Hadoop directory, create an input folder: $~/hadoop> mkdir input
• Download & extract the sample: $~/hadoop/input> wget http://www.systems.ethz.ch/sites/default/files/hadoop-words.tar__0.gz
• Run the word count example: $~/hadoop> bin/hadoop jar hadoop-examples-*.jar wordcount input/ out/
• See the results in out/
Dissecting the Word Count code
The source code of Word Count is src/examples/org/apache/hadoop/examples/WordCount.java

Mapper class:

public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
Dissecting the Word Count code
Reducer class:

public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
Dissecting the Word Count code
Job setup:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
Pseudo-distributed Setup
Hadoop still runs on a single machine but simulates a distributed setup by running the JobTracker & NameNode in different processes.
Change the configuration files as follows:

conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Setup SSH, HDFS and start Hadoop
• Check whether you can connect without a passphrase: $> ssh localhost
• If not, set it up by executing the following:
  $> ssh-keygen -t rsa -P ''
  $> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
• Format and make the HDFS ready: $> bin/hadoop namenode -format
• Start the Hadoop daemons: $> bin/start-all.sh
• Browse the web interfaces of the NameNode & JobTracker:
  NameNode - http://localhost:50070/
  JobTracker - http://localhost:50030/
• Copy input files to the HDFS: $> bin/hadoop dfs -put localInput dfsInput
Job Tracker Web-Interface
NameNode Web-Interface
HDFS Commands
$~/hadoop> bin/hadoop dfs
[-ls <path>]
[-du <path>]
[-cp <src> <dst>]
[-rm <path>]
[-put <localsrc> <dst>]
[-copyFromLocal <localsrc> <dst>]
[-moveFromLocal <localsrc> <dst>]
[-get [-crc] <src> <localdst>]
[-cat <src>]
[-copyToLocal [-crc] <src> <locdst>]
[-moveToLocal [-crc] <src> <locdst>]
[-mkdir <path>]
[-touchz <path>]
[-test -[ezd] <path>]
[-stat [format] <path>]
[-help [cmd]]
Example
Input data is a sample set of tweets from Twitter as follows (one tweet per line):

{"text":"tweet contents #TAG1 #TAG2",
 ...,
 "hashtags": [{"text":"TAG1",...},
              ...,
              {"text":"TAG2",...}
 ],
 ...
} \n

Output the tags that occur more than 10 times in the sample data set, along with their occurrence counts. Sample output:
TAG1 11
TAG2 50
TAG3 19
...
Implement by modifying WordCount.java.
Compile your source (you might need to download Apache Commons CLI first):
> cd src/examples
> javac -cp ../../hadoop-core-1.0.4.jar:../../../lib/commons-cli-1.2.jar org/apache/hadoop/examples/HashtagFreq.java
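The core logic can be sketched in plain Java (the class name HashtagFreq comes from the compile command above; the helper names are illustrative, and extracting tags from the raw text with a regex, rather than parsing the full JSON "hashtags" array, is a simplification):

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashtagFreq {
    private static final Pattern TAG = Pattern.compile("#(\\w+)");

    // "Map" phase: emit each hashtag found in one tweet line.
    static List<String> extractTags(String tweetLine) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(tweetLine);
        while (m.find()) tags.add(m.group(1));
        return tags;
    }

    // "Reduce" phase plus the filter: count each tag and keep only those
    // occurring strictly more than minCount times.
    static Map<String, Integer> frequentTags(List<String> tweets, int minCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets)
            for (String tag : extractTags(tweet))
                counts.merge(tag, 1, Integer::sum);
        counts.values().removeIf(c -> c <= minCount);
        return counts;
    }
}
```

In the real job, extractTags would live in map() (emitting <tag, 1>) and the counting plus threshold test in reduce(), exactly as in WordCount.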
Example
Input data is a sampled set of stock market trades for a single day, 06/01/2006. The contents are as follows:

Description: @SYMBOL DATE EX TIME PRICE SIZE
IBM 06/01/2006 N 49813 84.2200 100 \n
IBM 06/01/2006 N 38634 84.0100 100 \n
SUN 06/01/2006 N 46684 85.4200 300 \n
SUN 06/01/2006 N 44686 85.6600 100 \n

Task: compute the total volume of trades for each stock ticker and return all stocks having a volume higher than a given value taken from the command line.
In SQL: SELECT symbol, SUM(price*size) AS volume FROM Ticks GROUP BY symbol HAVING volume > V
Example total volume for IBM: 84.22*100 + 84.01*100 = 16823
Sample output, assuming filter = 20K (IBM, at 16823, falls below it):
SUN 25711.66
...

Implement by modifying WordCount.java:
• Create a directory in your $HADOOP_HOME, let's say stocks/
• Copy src/org/apache/hadoop/examples/WordCount.java to stocks/
• Modify the code & class name accordingly
• Compile: javac -cp hadoop-core-0.20.203.0.jar:lib/commons-cli-1.2.jar stocks/StockVolume.java
• Copy the dataset to input/ : http://www.systems.ethz.ch/education/hs11/nis/project/stock_data.tar.gz
• Run: > bin/hadoop stocks/StockVolume input/ output/
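A plain-Java sketch of the per-record and per-group logic (hypothetical helper names; the real solution would live in a Hadoop Mapper/Reducer pair exactly as in WordCount):

```java
import java.util.*;

public class StockVolumeSketch {
    // "Map": parse one "SYMBOL DATE EX TIME PRICE SIZE" line into
    // a (symbol, price*size) pair.
    static Map.Entry<String, Double> parseTrade(String line) {
        String[] f = line.trim().split("\\s+");
        double volume = Double.parseDouble(f[4]) * Integer.parseInt(f[5]);
        return new AbstractMap.SimpleEntry<>(f[0], volume);
    }

    // "Reduce" plus the HAVING clause: sum volumes per symbol and keep
    // only those strictly above the filter value.
    static Map<String, Double> totalVolumes(List<String> trades, double filter) {
        Map<String, Double> totals = new TreeMap<>();
        for (String line : trades) {
            Map.Entry<String, Double> kv = parseTrade(line);
            totals.merge(kv.getKey(), kv.getValue(), Double::sum);
        }
        totals.values().removeIf(v -> v <= filter);
        return totals;
    }
}
```

Note that the filter value cannot be hard-coded; the next slide shows how to pass it from the command line into map()/reduce() via the job configuration.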
Setting Job Specific Parameters
• Set in main, before submitting the Job:
  job.getConfiguration().setInt("filter", Integer.parseInt(otherArgs[2]));
• Use inside map() or reduce():
  context.getConfiguration().getInt("filter", -1);
• See the Hadoop API for other details: http://hadoop.apache.org/common/docs/current/api/index.html
Solution: Mapper
Solution: Reducer
Solution: The Job Setup
References
[1] http://developer.yahoo.com/hadoop/tutorial/
[2] http://hadoop.apache.org/common/docs/current/hdfs_design.html
[3] http://www.cloudera.com/wp-content/uploads/2010/01/4-ProgrammingWithHadoop.pdf
[4] http://hadoop.apache.org/common/docs/current/api/index.html
Happy Coding