Hadoop MapReduce
Bui Quang Duy @ Septeni Technology
Hanoi 2014/01
Outline
- Introduction
- Hadoop
- Hadoop Architecture
- HDFS
- PYXIS & Hadoop
What's Septeni Technology?
- Operating in Vietnam since March 2013
- 45 employees in total
- Aiming to become the No. 1 ad technology center in Asia
What's MapReduce?
- A programming model for distributing a task across multiple nodes
- Used to build solutions that process large amounts of data in parallel on clusters of computing nodes
- Features of MapReduce:
  - Fault tolerance
  - Status and monitoring tools
  - A clean abstraction for programmers
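To make the "clean abstraction" concrete, here is a minimal single-process sketch of the model using the classic word-count example. The programmer writes only a map function, (key, value) to a list of intermediate (key, value) pairs, and a reduce function, (key, list of values) to output pairs; the framework does the grouping in between. The helper names (`map_fn`, `reduce_fn`, `run_mapreduce`) are hypothetical and are not Hadoop's API; real Hadoop distributes these same steps across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: (document id, text) -> (word, 1) for every word in the text."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Iterator[tuple[str, int]]:
    """Reduce: (word, [1, 1, ...]) -> (word, total count)."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer) -> list:
    """Hypothetical in-memory driver: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)  # shuffle: collect all values for a key
    output = []
    for k2 in sorted(groups):  # one reduce call per distinct key, sorted order
        output.extend(reducer(k2, groups[k2]))
    return output


docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog"), ("d3", "the fox")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

Because `map_fn` and `reduce_fn` have no shared state, the driver is free to run many copies of each in parallel; that separation is what lets the framework handle distribution and fault tolerance on the programmer's behalf.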
MapReduce Execution Overview
[Figure: the User Program forks a Master and Workers; the Master assigns map and reduce tasks. Map Phase: workers read the input files as splits (Split 1, Split 2, ...) and write intermediate key/value pairs to local disk. Reduce Phase: workers remote-read those pairs and write the output files (Output file 1, Output file 2).]
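The execution flow in the figure can be sketched in a single process as follows. This is an illustration under stated assumptions, not Hadoop's scheduler: one map task per input split, a hypothetical fixed number of reduce tasks `NUM_REDUCERS`, and a partition function of the form hash(key) mod R so that every map worker sends a given key to the same reducer (Hadoop's default `HashPartitioner` follows the same idea using Java's `hashCode()`). `zlib.crc32` is used here only because Python's built-in `hash()` for strings changes between runs.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # R: number of reduce tasks, hence output files (assumed)


def partition(key: str, num_reducers: int) -> int:
    # Stable hash, so all map workers route a given key to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(split: list[str], num_reducers: int) -> list[dict]:
    """One map worker: emits (word, 1) into R local regions ("local write")."""
    regions = [defaultdict(list) for _ in range(num_reducers)]
    for line in split:
        for word in line.split():
            regions[partition(word, num_reducers)][word].append(1)
    return regions


def reduce_task(r: int, map_outputs: list[list[dict]]) -> dict[str, int]:
    """One reduce worker: "remote-reads" region r of every map output, then sums."""
    merged = defaultdict(list)
    for regions in map_outputs:
        for key, values in regions[r].items():
            merged[key].extend(values)
    return {key: sum(values) for key, values in sorted(merged.items())}


def run_job(splits: list[list[str]], num_reducers: int = NUM_REDUCERS) -> list[dict]:
    # "Master": one map task per split, then one reduce task per region.
    map_outputs = [map_task(split, num_reducers) for split in splits]
    return [reduce_task(r, map_outputs) for r in range(num_reducers)]


splits = [["a b a"], ["b c"], ["a c c"]]
for r, output_file in enumerate(run_job(splits)):
    print(f"output file {r + 1}: {output_file}")
```

Which words land in which output file depends on the hash, but each key appears in exactly one output file with its full count, which is the property the partition step exists to guarantee.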
Hadoop
- An open-source implementation of MapReduce by the Apache Software Foundation
- Created by Doug Cutting
- Derived from Google's MapReduce and Google File System (GFS) papers
- Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
- It enables applications to work with thousands of independent computers and petabytes of data
Hadoop Components
- HDFS (storage): self-healing, high-bandwidth clustered storage
- MapReduce (processing): fault-tolerant distributed processing
Hadoop Architecture
[Figure: a Namenode, a Secondary Namenode and a JobTracker, plus several worker nodes, each running a Data node and a TaskTracker that executes Map and Reduce tasks.]
Dataflow in Hadoop
- Map tasks write their output to local disk
  - The output becomes available only after the map task has completed
- Reduce tasks write their output to HDFS
  - Once a job has finished, the next job's map tasks can be scheduled and will read their input from HDFS
- Therefore fault tolerance is simple: just re-run tasks on failure
  - No consumer ever sees partial operator output
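The "re-run on failure, never expose partial output" rule can be sketched with a write-then-publish pattern: each task attempt writes to a private temporary file, and only a successful attempt is renamed to the final name. This is an assumption-laden illustration, not Hadoop's code; it relies on `os.replace` being atomic within one POSIX filesystem. Hadoop's `FileOutputCommitter` is similar in spirit (task attempts write to a temporary location that is promoted on commit). The task function and the `part-r-00000` file name are hypothetical, chosen to mimic a reducer's output file.

```python
import os
import tempfile


def run_task_with_retries(task, output_path: str, max_attempts: int = 4) -> int:
    """Re-run a deterministic task until it succeeds; return the attempt number.

    Each attempt writes to its own temporary file. Only a successful attempt
    is published (atomic rename), so readers never see partial output.
    """
    out_dir = os.path.dirname(output_path) or "."
    for attempt in range(1, max_attempts + 1):
        fd, tmp_path = tempfile.mkstemp(dir=out_dir, prefix=".attempt-")
        try:
            with os.fdopen(fd, "w") as tmp:
                task(tmp, attempt)  # may fail part-way through writing
            os.replace(tmp_path, output_path)  # publish: atomic on POSIX
            return attempt
        except Exception:
            os.remove(tmp_path)  # discard the failed attempt's partial output
    raise RuntimeError(f"task failed {max_attempts} times")


def flaky_reduce_task(out, attempt: int) -> None:
    # Hypothetical task: writes some output, then "crashes" on its first attempt.
    out.write("apple\t3\n")
    if attempt == 1:
        raise OSError("simulated node failure")
    out.write("banana\t2\n")


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "part-r-00000")
    attempts = run_task_with_retries(flaky_reduce_task, path)
    with open(path) as f:
        print(attempts, f.read().splitlines())  # 2 ['apple\t3', 'banana\t2']
```

Re-running is only safe because the task is deterministic and side-effect free apart from its output file, which matches the slide's point that map and reduce outputs are consumed only after the producing task completes.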
HDFS Basics
- HDFS is a filesystem written in Java
- It sits on top of a native filesystem
- It provides redundant storage for massive amounts of data
- It runs on commodity hardware
HDFS Data
- Data is split into blocks that are stored on multiple nodes in the cluster
- Each block is usually 64 MB or 128 MB
- Each block is replicated multiple times (three by default)
- Replicas are stored on different data nodes
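As a worked example of the block arithmetic: a 1000 MB file with 128 MB blocks becomes 7 full blocks plus one 104 MB block (8 blocks), and with three replicas it occupies about 3000 MB of raw disk across the cluster. The sketch below computes block sizes and assigns each block's replicas to distinct data nodes. The round-robin placement and node names (`dn1`...`dn4`) are hypothetical simplifications; real HDFS uses a rack-aware placement policy, and this only illustrates the "replicas on different data nodes" rule.

```python
MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> list[int]:
    """Sizes of the blocks a file is split into; the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = 3) -> list[list[str]]:
    """Simplified round-robin placement: each block's replicas on distinct nodes.

    Not HDFS's real (rack-aware) policy; illustrative only.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many data nodes as replicas")
    n = len(datanodes)
    return [[datanodes[(b + i) % n] for i in range(replication)]
            for b in range(num_blocks)]


blocks = split_into_blocks(1000 * MB)  # 1000 MB file, 128 MB blocks
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(len(blocks), blocks[-1] // MB)  # 8 104
print(sum(blocks) * 3 // MB)          # 3000 (MB of raw storage with 3 replicas)
print(placement[0], placement[7])     # ['dn1', 'dn2', 'dn3'] ['dn4', 'dn1', 'dn2']
```

Because no two replicas of a block share a node, losing any single data node still leaves at least two copies of every block, which is what lets HDFS re-replicate and "self-heal".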
What's PYXIS?
PYXIS is a one-stop ad management, measurement and optimization system for online ads, specialized in Facebook. It is the only system in the world approved by Facebook as both a PMD (ads management) and an MMP (measurement).
It specializes in mobile and LTV maximization.
Main Features
- Massive ad creation
- Graphical reporting
- Auto optimization
- Mobile measurement
- LTV maximization
Auto bidding: automated optimization (auto bidding and reallocating)
- Data source: PYXIS receives massive data every hour
  - Ad information (targeting segment & ad creative)
  - Delivery data (impressions, clicks, cost, ...)
  - Action data (likes, installs, billing, LTV)
- Summarize & Analyze: summarize and integrate all data into optimized units
- Tuning Campaign & Ad: change bid price and budget based on unit data
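The "summarize and integrate into optimized units" step is a natural MapReduce aggregation: map each hourly record to its unit key, then reduce by summing metrics and deriving ratios. The sketch below assumes a hypothetical record layout (`Delivery`) and treats a unit as a (targeting segment, ad creative) pair; neither is PYXIS's actual schema, and the bid and budget rule that would consume these numbers is not described in the slides, so it is not sketched here.

```python
from collections import defaultdict
from typing import NamedTuple


class Delivery(NamedTuple):
    # Hypothetical hourly record; field names are illustrative only.
    segment: str      # targeting segment
    creative: str     # ad creative
    impressions: int
    clicks: int
    cost: float
    installs: int


def map_record(rec: Delivery):
    """Map: key each record by its (assumed) optimization unit."""
    yield (rec.segment, rec.creative), (rec.impressions, rec.clicks, rec.cost, rec.installs)


def reduce_unit(unit, metrics):
    """Reduce: sum one unit's metrics, then derive CTR and cost per install."""
    imps, clicks, cost, installs = (sum(col) for col in zip(*metrics))
    return unit, {
        "impressions": imps, "clicks": clicks, "cost": cost, "installs": installs,
        "ctr": clicks / imps if imps else 0.0,
        "cpi": cost / installs if installs else None,  # None: no installs yet
    }


def summarize(records):
    grouped = defaultdict(list)
    for rec in records:
        for key, value in map_record(rec):
            grouped[key].append(value)  # shuffle: group by unit
    return dict(reduce_unit(unit, metrics) for unit, metrics in grouped.items())


hour = [
    Delivery("women_18_24", "video_a", 1000, 20, 5.0, 2),
    Delivery("women_18_24", "video_a", 3000, 40, 15.0, 3),
    Delivery("men_25_34", "banner_b", 2000, 10, 8.0, 0),
]
summary = summarize(hour)
print(summary[("women_18_24", "video_a")]["cpi"])  # 4.0
```

Summing raw counts first and computing ratios only in the reducer keeps the reduce step correct no matter how the hourly records are split across map tasks.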
The end!