2014 feb 5_what_ishadoop_mda
DESCRIPTION
What Is Hadoop update to include YARN, Tez, Spark, and StormTRANSCRIPT
![Page 1: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/1.jpg)
WELCOME TO HADOOP Adam Muise – Hortonworks
![Page 2: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/2.jpg)
Who am I?
![Page 3: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/3.jpg)
Why are we here?
![Page 4: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/4.jpg)
Data
![Page 5: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/5.jpg)
“Big Data” is the marke=ng term of the decade
![Page 6: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/6.jpg)
What lurks behind the hype is the democra=za=on of Data.
![Page 7: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/7.jpg)
You need to deal with Data.
![Page 8: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/8.jpg)
You’re probably not as good at that as you think.
![Page 9: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/9.jpg)
Put it away, delete it, tweet it, compress it, shred it, wikileak-‐it, put it in a database, put it in SAN/NAS, put it in the cloud, hide it in tape…
![Page 10: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/10.jpg)
Let’s talk challenges…
![Page 11: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/11.jpg)
Volume
Volume
Volume
Volume
![Page 12: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/12.jpg)
Volume
Volume
Volume
Volume
Volume Volume
Volume
Volume Volume Volume
Volume
Volume Volume
Volume Volume
Volume Volume
![Page 13: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/13.jpg)
Volume
Volume
Volume
Volume
Volume Volume
Volume
Volume Volume Volume
Volume
Volume Volume
Volume Volume
Volume Volume Volume
Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
![Page 14: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/14.jpg)
Volume
Volume
Volume
Volume
Volume Volume
Volume
Volume Volume Volume
Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
Volume
Volume Volume
Volume Volume
Volume
Volume
![Page 15: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/15.jpg)
Storage, Management, Processing all become challenges with Data at
Volume
![Page 16: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/16.jpg)
Tradi=onal technologies adopt a divide, drop, and conquer approach
![Page 17: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/17.jpg)
The solu=on? EDW
Data Data Data
Data Data Data
Data Data Data
Yet Another EDW
Data Data Data
Data Data Data
Data Data Data
Analy=cal DB
Data Data Data
Data Data Data
Data Data Data OLTP
Data Data Data
Data Data Data
Data Data Data
Another EDW
Data Data Data
Data Data Data
Data Data Data
![Page 18: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/18.jpg)
Ummm…you dropped something
Data Data Data
Data Data Data
Data Data Data Data Data Data
Data Data Data
Data Data Data Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data Data Data
Data
Data Data Data
Data Data Data
EDW
Data Data Data
Data Data Data
Data Data Data
Yet Another EDW
Data Data Data
Data Data Data
Data Data Data
Analy=cal DB
Data Data Data
Data Data Data
Data Data Data
OLTP
Data Data Data
Data Data Data
Data Data Data
Another EDW
Data Data Data
Data Data Data
Data Data Data
![Page 19: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/19.jpg)
Analyzing the data usually raises more interes=ng ques=ons…
![Page 20: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/20.jpg)
…which leads to more data
![Page 21: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/21.jpg)
Wait, you’ve seen this before.
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data Data Data
Data
Data Data Data
Data Data Data Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Sausage Factory
Data Data Data
Data Data Data
Data Data Data … Data
Data Data …
![Page 22: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/22.jpg)
Data begets Data.
![Page 23: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/23.jpg)
What keeps us from Data?
![Page 24: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/24.jpg)
“Prices, Stupid passwords, and Boring Sta=s=cs.” -‐ Hans Rosling
h"p://www.youtube.com/watch?v=hVimVzgtD6w
![Page 25: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/25.jpg)
Your data silos are lonely places.
EDW
Data Data Data
Data Data Data
Data Data Data
Accounts
Data Data Data
Data Data Data
Data Data Data
Customers
Data Data Data
Data Data Data
Data Data Data
Web Proper=es
Data Data Data
Data Data Data
Data Data Data
![Page 26: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/26.jpg)
… Data likes to be together.
EDW
Data Data Data
Data Data Data
Data Data Data
Accounts
Data Data Data
Data Data Data
Data Data Data
Customers
Data Data Data
Data Data Data
Data Data Data
Web Proper=es
Data Data Data
Data Data Data
Data Data Data
![Page 27: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/27.jpg)
Data likes to socialize too. EDW
Data Data Data
Data Data Data
Data Data Data
Accounts
Data Data Data
Data Data Data
Data Data Data
Customers
Data Data Data
Data Data Data
Data Data Data
Web Proper=es
Data Data Data
Data Data Data
Data Data Data
Machine Data
Data Data Data
Data Data Data
Data Data Data
TwiYer
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
Data Data Data
CDR
Data Data Data
Data Data Data
Data Data Data
Weather Data
Data Data Data
Data Data Data
Data Data Data
![Page 28: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/28.jpg)
New types of data don’t quite fit into your pris=ne view of the world.
My LiYle Data Empire
Data Data Data
Data
Data Data
Data Data Data
Logs
Data Data Data Data
Data
Data Data
Machine Data
Data Data Data Data
Data
Data Data
? ?
? ?
![Page 29: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/29.jpg)
To resolve this, some people take hints from Lord Of The Rings...
![Page 30: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/30.jpg)
…and create One-‐Schema-‐To-‐Rule-‐Them-‐All…
EDW
Data Data Data
Data Data Data
Data Data Data Schema
![Page 31: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/31.jpg)
…but that has its problems too.
EDW
Data Data Data
Data Data Data
Data Data Data Schema Data
Data Data
ETL ETL
ETL ETL
EDW
Data Data Data
Data Data Data
Data Data Data Schema Data
Data Data
ETL ETL
ETL ETL
![Page 32: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/32.jpg)
So what is the answer?
![Page 33: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/33.jpg)
Enter the Hadoop.
hYp://www.fabulouslybroke.com/2011/05/ninja-‐elephants-‐and-‐other-‐awesome-‐stories/
………
![Page 34: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/34.jpg)
Hadoop was created because Big IT never cut it for the Internet
Proper=es like Google, Yahoo, Facebook, TwiYer, and LinkedIn
![Page 35: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/35.jpg)
Tradi=onal architecture didn’t scale enough…
DB DB DB
SAN
App App App App
DB DB DB
SAN
App App App App DB DB DB
SAN
App App App App
![Page 36: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/36.jpg)
Databases become bloated and useless
![Page 37: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/37.jpg)
Tradi=onal architectures cost too much at that volume…
$/TB
$pecial Hardware
$upercompu=ng
![Page 38: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/38.jpg)
How would you fix this?
![Page 39: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/39.jpg)
If you could design a system that would handle this, what would it
look like?
![Page 40: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/40.jpg)
It would probably need a highly resilient, self-‐healing, cost-‐efficient,
distributed file system…
Storage Storage Storage
Storage Storage Storage
Storage Storage Storage
![Page 41: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/41.jpg)
It would probably need a completely parallel processing framework that
took tasks to the data…
Storage Storage Storage
Storage Storage Storage
Storage Storage Storage Processing Processing Processing
Processing Processing Processing
Processing Processing Processing
![Page 42: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/42.jpg)
It would probably run on commodity hardware, virtualized machines, and
common OS plaeorms
Storage Storage Storage
Storage Storage Storage
Storage Storage Storage Processing Processing Processing
Processing Processing Processing
Processing Processing Processing
![Page 43: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/43.jpg)
It would probably be open source so innova=on could happen as quickly
as possible
![Page 44: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/44.jpg)
It would need a cri=cal mass of users
![Page 45: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/45.jpg)
It would be Apache Hadoop
![Page 46: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/46.jpg)
{Processing + Storage} =
{MapReduce/Tez/YARN+ HDFS}
![Page 47: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/47.jpg)
HDFS stores data in blocks and replicates those blocks
Storage Storage Storage
Storage Storage Storage
Storage Storage Storage Processing Processing Processing
Processing Processing Processing
Processing Processing Processing block3 block3
block3
block2 block2
block2
block1
block1
block1
![Page 48: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/48.jpg)
If a block fails then HDFS always has the other copies and heals itself
Storage Storage Storage
Storage Storage Storage
Storage Storage Storage Processing Processing Processing
Processing Processing Processing
Processing Processing Processing block3
block3
block3
block2 block2
block2
block1
block1
block1
X
![Page 49: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/49.jpg)
MapReduce is a programming paradigm that completely parallel
Mapper
Mapper
Mapper
Mapper
Mapper
Reducer
Reducer
Reducer
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
![Page 50: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/50.jpg)
MapReduce has three phases: Map, Sort/Shuffle, Reduce
Mapper
Mapper
Mapper
Mapper
Mapper
Reducer
Reducer
Reducer
Key, Value Key, Value
Key, Value
Key, Value Key, Value
Key, Value
Key, Value Key, Value
Key, Value
Key, Value Key, Value
Key, Value
Key, Value Key, Value
Key, Value
Key, Value Key, Value
Key, Value Key, Value
Key, Value
Key, Value
Key, Value Key, Value
Key, Value
Key, Value Key, Value
Key, Value
Key, Value Key, Value
Key, Value
Key, Value Key, Value
Key, Value
![Page 51: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/51.jpg)
MapReduce applies to a lot of data processing problems
Mapper
Mapper
Mapper
Mapper
Mapper
Reducer
Reducer
Reducer
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
![Page 52: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/52.jpg)
MapReduce goes a long way, but not all data processing and analy=cs
are solved the same way
![Page 53: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/53.jpg)
Some=mes your data applica=on needs parallel processing and inter-‐
process communica=on
Process
Process
Process
Process
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
![Page 54: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/54.jpg)
…like Complex Event Processing in Apache Storm
![Page 55: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/55.jpg)
Some=mes your machine learning data applica=on needs to process in
memory and iterate
Process
Process
Process
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data Process Process
Process
Process
Data
Data Data
![Page 56: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/56.jpg)
…like in Machine Learning in Spark
![Page 57: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/57.jpg)
Introducing YARN
![Page 58: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/58.jpg)
YARN = Yet Another Resource Nego=ator
![Page 59: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/59.jpg)
YARN abstracts resource management so you can run more
than just MapReduce
HDFS2
MapReduce V2
YARN MapReduce V? STORM
MPI Giraph HBase Tez … and more Spark
![Page 60: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/60.jpg)
Resource Manager +
Node Managers = YARN
Resource Manager
AppMaster
Node Manager
Scheduler
AppMaster
AppMaster
Node Manager
Node Manager
Node Manager
Container
Container
MapReduce
Container
Storm
Container
Container
Container
Pig
Container
Container
Container
![Page 61: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/61.jpg)
YARN turns Hadoop into a smart phone: An App Ecosystem
hortonworks.com/yarn/
![Page 62: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/62.jpg)
Check out the book too…
Preview at: hortonworks.com/yarn/
![Page 63: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/63.jpg)
YARN is an essen=al part of a balanced breakfast in Hadoop 2.x
![Page 64: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/64.jpg)
Introducing Tez
![Page 65: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/65.jpg)
Tez is a YARN applica=on, like MapReduce is a YARN applica=on
![Page 66: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/66.jpg)
Tez is the Lego set for your data applica=on
![Page 67: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/67.jpg)
Tez provides a layer for abstract tasks, these could be mappers, reducers, customized stream
processes, in memory structures, etc
![Page 68: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/68.jpg)
Tez can chain tasks together into one job to get Map – Reduce – Reduce jobs
suitable for things like Hive SQL projec=ons, group by, and order by
TezMap
TezMap
TezMap
TezMap
TezMap
TezReduce
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
Data
Data Data
TezReduce
TezReduce
TezReduce
TezReduce
TezReduce
![Page 69: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/69.jpg)
Tez can provide long-‐running containers for applica=ons like Hive to side-‐step batch process startups you would have with MapReduce
![Page 70: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/70.jpg)
Hadoop has other open source projects…
![Page 71: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/71.jpg)
Hive = {SQL -‐> Tez || MapReduce} SQL-‐IN-‐HADOOP
![Page 72: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/72.jpg)
Pig = {PigLa=n -‐> Tez || MapReduce}
![Page 73: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/73.jpg)
HCatalog = {metadata* for MapReduce, Hive, Pig, HBase}
*metadata = tables, columns, par==ons, types
![Page 74: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/74.jpg)
Oozie = Job::{Task, Task, if Task, then Task, final Task}
![Page 75: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/75.jpg)
Falcon
Hadoop Hadoop Feed Feed
Feed Feed
Feed
Feed
Feed
Feed DR
Replica=on
![Page 76: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/76.jpg)
Knox
Hadoop Cluster
REST Client Knox Gateway
Hadoop Cluster
REST Client
REST Client
Enterprise LDAP
![Page 77: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/77.jpg)
Flume
JMS
Weblogs
Events
Files
Hadoop Flume
Flume
Flume
Flume
Flume
Flume
![Page 78: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/78.jpg)
Sqoop
Hadoop
DB DB Sqoop
Sqoop
![Page 79: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/79.jpg)
Ambari = {install, manage, monitor}
![Page 80: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/80.jpg)
HBase = {real-‐=me, distributed-‐map, big-‐tables}
![Page 81: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/81.jpg)
Storm = {Complex Event Processing, Near-‐Real-‐Time, Provisioned by
YARN }
![Page 82: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/82.jpg)
Apache Hadoop
Flume Ambari
HBase Falcon
MapReduce HDFS
Sqoop HCatalog
Pig
Hive
Storm YARN
Knox
Tez
![Page 83: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/83.jpg)
Hortonworks Data Plaeorm
Flume Ambari
HBase Falcon
MapReduce HDFS
Sqoop HCatalog
Pig
Hive
Storm YARN
Knox
Tez
![Page 84: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/84.jpg)
What else are we working on?
hortonworks.com/labs/
![Page 85: 2014 feb 5_what_ishadoop_mda](https://reader034.vdocument.in/reader034/viewer/2022051819/54c6ba904a795938518b459d/html5/thumbnails/85.jpg)
Hadoop is the new Modern Data Architecture