jenny blog

8/20/2019 Jenny Blog http://slidepdf.com/reader/full/jenny-blog

Jenny (Xiao) Zhang
Technology Professional and Enthusiast

Upload: amit-bhartiya

Post on 07-Aug-2018



Flume = imports and exports unstructured or semi-structured data to/from Hadoop.

Sqoop (SQL+Hadoop) = imports and exports structured data to/from Hadoop.

HDFS = Hadoop Distributed File System.

MapReduce = a programming model and associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.

HBase = NoSQL database, reads/writes from HDFS to MapReduce, can be used for OLTP.

Pig = data analysis tool originally developed by Yahoo, uses a procedural data-flow language (PigLatin), good for semi-structured data.

Hive = data warehouse tool, uses a SQL-like language (HiveQL), good for structured data.

Mahout = a machine learning framework, used to develop social network/e-commerce recommendations.

Apache Oozie = workflow scheduler and management tool, can schedule and run Hadoop jobs in parallel.
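To make the MapReduce entry above more concrete, here is a minimal single-process sketch of the programming model in plain Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are illustrative assumptions, chosen only to show how map output is grouped by key before being reduced.

```python
from collections import defaultdict

def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would do between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def word_count(splits):
    # Each split is mapped independently; on a cluster these would run in parallel.
    pairs = [pair for split in splits for pair in map_phase(split)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["hadoop stores data", "hadoop processes data"])
```

Because each split is mapped without looking at any other split, the map calls could run on different machines, which is the property the definition above relies on.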

HDFS


HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode and a number of DataNodes.

NameNode is the master. It maintains and manages the blocks that are present on the DataNodes. It manages all the metadata of HDFS. The NameNode serves this metadata from RAM (it also persists it to disk as an fsimage and an edit log). NameNode is associated with Job Tracker. There is a secondary NameNode in the system, but it is not a hot standby of the NameNode. It periodically checkpoints the NameNode's metadata to a file system (hard disk), and that copy can be used for disaster recovery.

DataNode is the slave. It serves the read/write requests from the clients. DataNode is associated with Task Tracker. A file is split into one or more blocks and these blocks are stored in a set of DataNodes.
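The block idea can be sketched in a few lines of Python. This is only an illustration, not how HDFS is implemented: the 128 MB block size, the round-robin placement, and the names `split_into_blocks` and `place_blocks` are all assumptions made for the example, and real HDFS placement involves more than this.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (illustrative)

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block for a file of file_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    # Every block is full-sized except possibly the last one.
    return [block_size] * full_blocks + ([remainder] if remainder else [])

def place_blocks(num_blocks, datanodes):
    """Map block index -> DataNode name in round-robin order (a simplification)."""
    return {i: datanodes[i % len(datanodes)] for i in range(num_blocks)}

MB = 1024 * 1024
blocks = split_into_blocks(300 * MB)
placement = place_blocks(len(blocks), ["dn1", "dn2"])
```

The mapping returned by `place_blocks` is the kind of metadata the NameNode keeps, while the block contents themselves live on the DataNodes.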

The beauty of Hadoop is data localization. A traditional distributed file system transfers terabytes of data over the network to be processed. Hadoop has the concept of data localization and only transfers kilobytes of code over the network. Data is processed locally on the DataNodes.

MapReduce


1. User (a person) copies the input file into the Distributed File System.
2. User submits the job to Client (software).
3. Client gets information about the input file.
4. Client splits the job into multiple splits.
5. Client uploads the job information to HDFS.
6. Client submits the job to Job Tracker.
7. Job Tracker initializes the job in the job queue.
8. Job Tracker reads the job files from HDFS to understand the job.
9. Job Tracker creates Map tasks and Reduce tasks based on the job type. The number of Map tasks equals the number of input splits, which is configurable. Each Map task runs on one input split. The output of the Map tasks goes to the Reduce tasks. The number of Reduce tasks generated can be defined. The Map and Reduce tasks run on DataNodes.
10. Task Trackers send Heartbeats to Job Tracker to let it know they are available for tasks.
11. Job Tracker picks the Task Trackers that have the most local data.
12. Job Tracker assigns tasks to Task Trackers.

Once the tasks are completed, Task Tracker sends a Heartbeat to Job Tracker again and Job Tracker will assign more tasks.
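Steps 10 to 12 above can be sketched as a small scheduling loop. This is a simplified model under stated assumptions, not the actual Job Tracker logic: the heartbeat is reduced to a "free slots" count per Task Tracker, and `pick_tracker` and `assign_tasks` are hypothetical names introduced for the example.

```python
def pick_tracker(split_hosts, available_trackers):
    """Prefer an available Task Tracker that stores the split locally."""
    for tracker in available_trackers:
        if tracker in split_hosts:
            return tracker
    # No local tracker is free: fall back to any available one, if there is one.
    return available_trackers[0] if available_trackers else None

def assign_tasks(splits, heartbeats):
    """splits: split name -> set of hosts holding it; heartbeats: tracker -> free slots."""
    free = dict(heartbeats)  # copy so the caller's heartbeat data is not modified
    assignment = {}
    for name, hosts in splits.items():
        available = [t for t, slots in free.items() if slots > 0]
        tracker = pick_tracker(hosts, available)
        if tracker is None:
            break  # remaining tasks wait until a later heartbeat reports free slots
        assignment[name] = tracker
        free[tracker] -= 1
    return assignment

splits = {"split0": {"dn1"}, "split1": {"dn2"}, "split2": {"dn1"}}
assignment = assign_tasks(splits, {"dn1": 1, "dn2": 1})
```

In this run `split2` stays unassigned because both trackers are busy; it would be picked up after the next heartbeat, which mirrors the closing sentence of the list above.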

Hadoop 1.0 vs 2.0


There are the following challenges in Hadoop 1.0:
• Horizontal scalability of the NameNode (bottleneck after 4000 nodes)
• NameNode is a single point of failure of the system
• Job Tracker is overburdened
• Cannot run non-MapReduce applications on HDFS
• No multi-tenancy: the ability to run multiple types of jobs on the same resource at the same time

How Hadoop 2.0 solves the challenges of Hadoop 1.0?

HDFS Federation = different NameNodes for different organizations. One Namespace has one NameNode. NameNodes are independent and do not talk to each other. Data is spread across a large number of DataNodes. All DataNodes are used as common storage for all NameNodes. Each DataNode is registered with all NameNodes. There is one block pool per NameNode/Namespace, but one DataNode can belong to multiple Namespaces.
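The relationship between Namespaces, NameNodes, block pools and DataNodes described above can be modeled with two small classes. This is a toy data-structure sketch, not Hadoop code; the class names, the `build_federation` helper, and the namespace names "sales" and "research" are made up for illustration.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.block_pools = {}  # namespace -> list of block ids stored for it

    def register(self, namespace):
        # One block pool per namespace; the DataNode itself is shared storage.
        self.block_pools.setdefault(namespace, [])

class NameNode:
    def __init__(self, namespace):
        self.namespace = namespace  # one Namespace has one NameNode
        self.datanodes = []

    def add_datanode(self, datanode):
        datanode.register(self.namespace)
        self.datanodes.append(datanode)

def build_federation(namespaces, datanode_names):
    """Register every DataNode with every NameNode; NameNodes never reference each other."""
    datanodes = [DataNode(name) for name in datanode_names]
    namenodes = [NameNode(namespace) for namespace in namespaces]
    for namenode in namenodes:
        for datanode in datanodes:
            namenode.add_datanode(datanode)
    return namenodes, datanodes

namenodes, datanodes = build_federation(["sales", "research"], ["dn1", "dn2"])
```

After this runs, each DataNode holds one block pool per Namespace, while each NameNode only knows about its own Namespace.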


NameNode High Availability = One Namespace has one active NameNode, one standby NameNode, and one secondary NameNode (optional).

Failover process = If the NameNode does not respond within 10 seconds, the system assumes it is dead. All DataNodes will talk to the standby NameNode, which becomes the active NameNode. But the issue is that the NameNode may not really be dead. Maybe the network is just slow.

Fencing = make sure the failed NameNode is actually dead. Kill all active processes on that NameNode and then kill the NameNode itself (STONITH: send a special signal to the power supply and cut the power). The dead NameNode then needs to be brought back manually.
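The failover and fencing steps above can be sketched as follows. This is a simplified model under stated assumptions: the 10-second timeout comes from the text, but the `NameNode` class, the `fence` stand-in, and `failover_if_needed` are hypothetical names, and real fencing acts on the machine rather than on a Python object.

```python
TIMEOUT_SECONDS = 10  # from the text: no response within 10 seconds means "assumed dead"

class NameNode:
    def __init__(self, name, role):
        self.name = name
        self.role = role          # "active", "standby", or "fenced"
        self.last_response = 0.0  # time of the last response, in seconds

def fence(node):
    """Stand-in for fencing (for example STONITH); here it only marks the node."""
    node.role = "fenced"

def failover_if_needed(active, standby, now):
    """Promote the standby only after the unresponsive active node has been fenced."""
    if now - active.last_response <= TIMEOUT_SECONDS:
        return active  # still responding in time, nothing to do
    fence(active)      # guarantee the old node cannot keep acting as active
    standby.role = "active"
    return standby

active = NameNode("nn1", "active")
standby = NameNode("nn2", "standby")
active.last_response = 100.0
still_active = failover_if_needed(active, standby, now=105.0)
new_active = failover_if_needed(active, standby, now=111.0)
```

Fencing before promotion is the point of the design: if the old NameNode was only slow, it can no longer act as active alongside the promoted standby.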

YARN = Yet Another Resource Negotiator (has nothing to do with MapReduce). It gives better processing control and supports non-MapReduce processing. The Resource Manager replaces the Job Tracker (it contains the scheduler and the applications manager, which manages jobs). The Node Manager replaces the Task Tracker.


Multi-tenancy = YARN supports multi-tenancy, which means you can run multiple types of jobs (batch, interactive, streaming) on the same resource at the same time. There are multiple job queues. Each queue has a priority and gets a certain percentage of the cluster. Jobs are FIFO within each queue.
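The queue behavior described above can be sketched like this. It is an illustrative model only, not YARN's scheduler: the `JobQueue` class, the `schedule` function, the convention that a lower priority number is served first, and the slot-based view of the cluster are all assumptions made for the example.

```python
from collections import deque

class JobQueue:
    def __init__(self, name, priority, share_percent):
        self.name = name
        self.priority = priority            # lower number is served first (assumption)
        self.share_percent = share_percent  # percentage of the cluster for this queue
        self.jobs = deque()                 # FIFO order within the queue

    def submit(self, job):
        self.jobs.append(job)

def schedule(queues, total_slots):
    """Give each queue its share of the slots and fill them in FIFO order."""
    plan = []
    for queue in sorted(queues, key=lambda q: q.priority):
        slots = total_slots * queue.share_percent // 100
        while slots > 0 and queue.jobs:
            plan.append((queue.name, queue.jobs.popleft()))
            slots -= 1
    return plan

batch = JobQueue("batch", priority=2, share_percent=50)
interactive = JobQueue("interactive", priority=1, share_percent=50)
for job in ["b1", "b2", "b3"]:
    batch.submit(job)
interactive.submit("i1")
plan = schedule([batch, interactive], total_slots=4)
```

Here the batch queue cannot take more than its 50 percent share even though the interactive queue leaves a slot unused; whether unused capacity may be borrowed is a policy choice this sketch leaves out.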

Thank you very much for reading this blog. Please feel free to contact me if you have any questions or want to learn more about Hadoop.

MapReduce, C++, Hive, SQL, Pig, Python
