Post on 27-Feb-2018

How to program MapReduce jobs in Hadoop with R

Group 8

João Rosa, Mario Almeida, Alex Pérez

Index

● Introduction

● Hadoop

● MapReduce

● R

● Why and how?

● Possible uses? Business opportunities?

● Conclusion

● Questions

● References

Nowadays, we have lots of data. BIG DATA!

If we need to analyse this we have a problem...

...but, if we need to analyse this we have a BIG DATA problem!

A possible solution!

+

How can we analyse this BIG DATA?

The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.

The project includes these subprojects:

Hadoop Common is a set of utilities that support the Hadoop subprojects. Hadoop Common includes FileSystem, RPC, and serialization libraries.

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant.

● Highly fault-tolerant in the face of hardware failure
● Designed to be deployed on low-cost hardware
● Streaming data access
● Large data sets
● Portability across heterogeneous hardware and software platforms

MapReduce supports distributed computing on large data sets across clusters of computers

Process large amounts of raw data

Map + Reduce
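To make the "Map + Reduce" idea concrete, here is a minimal word-count sketch. It is plain Python running in a single process, not Hadoop and not the rmr package. It only illustrates the shape of the paradigm: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step collapses each group. All function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (a real framework does this step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data about data"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions would run on many machines over data stored in HDFS, and the framework would handle the grouping and the movement of data between them.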

R is the language of Pirates!!!

Rrrrr

What is R?

It's a language and environment for statistical computing and graphics!

● 2 million analysts!
● Quantitative finance!
● Google, Facebook and LinkedIn!

Why R?

● Current analytic solutions are costly!
● New methods for analyzing complex datasets!

Why Hadoop with R?

"Easiest, most productive, most elegant way to write map reduce jobs."

● One to two orders of magnitude less code than Java

Readable, reusable and extensible language.

To give R analysts a way to access the map-reduce programming paradigm using big data sets.

How to use Hadoop with R?

RHadoop = rmr + rhdfs + rhbase

● rhdfs - functions providing file management of the HDFS from within R (via rJava).
● rhbase - functions providing database management for the HBase distributed database from within R (via Thrift).
● rmr - functions providing Hadoop MapReduce functionality in R.

Business opportunities?

xkcd.com

Conclusions

● Productivity vs efficiency
● Wide variety of statistical and graphical techniques
● Business orientation

Questions?

References

● http://hadoop.apache.org/ - Apache Hadoop project
● http://www.r-bloggers.com/how-to-program-mapreduce-jobs-in-hadoop-with-r/ - teacher's page
● http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf - MapReduce
● https://github.com/RevolutionAnalytics/RHadoop/wiki/Tutorial - MapReduce in R tutorial
● http://www.inside-r.org/r-doc/base/lapply - R lapply
● http://www.inside-r.org/r-doc/base/tapply - R tapply
● http://www.revolutionanalytics.com/what-is-open-source-r/ - What is R?
● http://www.r-project.org/ - What is R? Official page
● http://en.wikipedia.org/wiki/R_(programming_language) - R wiki
● http://www.johndcook.com/R_language_for_programmers.html - R programming for those coming from other languages
● http://www.revolutionanalytics.com/why-revolution-r/whitepapers/r-is-hot.php - why R is hot

Pictures

We tried to use CC pictures; below are their respective links:

● http://www.flickr.com/photos/nanagyei/4880468290 - pig pirates
● http://www.flickr.com/photos/timypenburg/5328226108 - maths and pen
● http://www.flickr.com/photos/48481327 - graduate
● http://s0.geograph.org.uk/geophotos/01/53/43/1534341_7dc47500.jpg - store
● http://www.flickr.com/photos/dizfunk/3066153143/ - nerd
● http://geekithawaii.com/wp-content/uploads/2011/01/7562581_l.jpg - sky
● http://www.robweir.com/blog/wp-content/uploads/2011/01/numbers.jpg - numbers
● http://delightfulchildrensbooks.files.wordpress.com/2011/02/read-around-the-world.jpg - children

Others: http://www.xkcd.com
