YARN
Alex Moundalexis, @technmsg

DESCRIPTION

A brief introduction to YARN: how and why it came into existence and how it fits together with this thing called Hadoop. Focus is given to architecture, availability, resource management and scheduling, migration from MR1 to MR2, job history and logging, interfaces, and applications.

TRANSCRIPT

Page 1: YARN

YARN
Alex Moundalexis
@technmsg

Page 2: YARN

CC BY 2.0 / Richard Bumgardner

Been there, done that.

Page 3: YARN

Alex @ Cloudera

• Solutions Architect
  • a.k.a. consultant
  • government
  • infrastructure

Page 4: YARN

4  

• product  •  distribuAon  of  Hadoop  components,  Apache  licensed  •  enterprise  tooling  

• support  • training  • services  (aka  consulAng)  • community  

What  Does  Cloudera  Do?  

©2014  Cloudera,  Inc.  All  rights  reserved.  

Page 5: YARN

Disclaimer

• Cloudera builds software
  • most donated to Apache
  • some closed-source
• Cloudera “products” I reference are open source
  • Apache licensed
  • source code is on GitHub: https://github.com/cloudera

Page 6: YARN

What This Talk Isn’t About

• deploying
  • Puppet, Chef, Ansible, homegrown scripts, intern labor
• sizing & tuning
  • depends heavily on data and workload
• coding
  • line diagrams don’t count
• algorithms
  • I suck at math, ask anyone

Page 7: YARN

So What ARE We Talking About?

• Why YARN?
• Architecture
• Availability
• Resources & Scheduling
• MR1 to MR2 Gotchas
• History
• Interfaces
• Applications
• Storytime

Page 8: YARN
Page 9: YARN

Why “Ecosystem”?

• In the beginning, just Hadoop
  • HDFS
  • MapReduce
• Today, dozens of interrelated components
  • I/O
  • Processing
  • Specialty Applications
  • Configuration
  • Workflow

Page 10: YARN

Partial Ecosystem

[Diagram: data flowing into and out of Hadoop: log collection from web servers and device logs, DB table import and export to an RDBMS/DWH, API access from external systems, batch processing and machine learning inside the cluster, and users reaching results through Search, SQL, and BI tools over JDBC/ODBC.]

Page 11: YARN

HDFS

• Distributed, highly fault-tolerant filesystem
• Optimized for large streaming access to data
• Based on the Google File System
  • http://research.google.com/archive/gfs.html

Page 12: YARN

Lots of Commodity Machines

Image: Yahoo! Hadoop cluster [OSCON ’07]

Page 13: YARN

MapReduce (MR)

• Programming paradigm
• Batch oriented, not realtime
• Works well with distributed computing
• Lots of Java, but other languages supported
• Based on Google’s paper
  • http://research.google.com/archive/mapreduce.html

Page 14: YARN

MR1 Components

• JobTracker
  • accepts jobs from clients
  • schedules jobs on particular nodes
  • accepts status data from TaskTrackers
• TaskTracker
  • one per node
  • manages tasks
  • crunches data in place
  • reports to JobTracker

Page 15: YARN

Under the Covers

Page 16: YARN

You specify map() and reduce() functions. The framework does the rest.
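To make that concrete, here is a minimal word-count sketch against the MR2 Java API. The class names and the whitespace tokenization are illustrative choices, not something from the deck; the point is that only map() and reduce() are written by hand, while partitioning, shuffling, and sorting come from the framework.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

  // map(): emit (word, 1) for every token in the input line.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        word.set(token);
        context.write(word, ONE);
      }
    }
  }

  // reduce(): sum the counts for each word; the framework handles
  // the shuffle and sort between the two phases.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
}
```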


Page 17: YARN

But wait… WHY DO WE NEED THIS?

Page 18: YARN


Page 19: YARN
Page 20: YARN

YARN: Yet Another Ridiculous Name

Page 21: YARN

YARN: Yet Another Ridiculous Name

Page 22: YARN

YARN: Yet Another Resource Negotiator

Page 23: YARN

Why YARN / MR2?

• Scalability
  • The JT kept track of individual tasks and wouldn’t scale
• Utilization
  • All slots are equal, even if the work is not
• Multi-tenancy
  • Every framework shouldn’t need to write its own execution engine
  • All frameworks should share the resources on a cluster

Page 24: YARN

An Operating System?

Traditional operating system
• Storage: file system
• Execution/scheduling: processes / kernel scheduler

Hadoop
• Storage: Hadoop Distributed File System (HDFS)
• Execution/scheduling: Yet Another Resource Negotiator (YARN)

Page 25: YARN

Multiple Levels of Scheduling

• YARN
  • Which application (framework) should resources go to?
• Application (framework: MR, etc.)
  • Which task within the application should use these resources?

Page 26: YARN
Page 27: YARN

Architecture

Page 28: YARN

Architecture: running multiple applications

Page 29: YARN

Control Flow: submit application

Page 30: YARN

Control Flow: get application updates

Page 31: YARN

Control Flow: AM asking for resources

Page 32: YARN

Control Flow: AM using containers
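Taken together, the control flow above looks roughly like the following client-side sketch using the public YarnClient API. The application name, launch command, queue, and resource sizes are assumptions for illustration.

```java
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitApp {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // 1. Ask the RM for a new application id.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ctx.setApplicationName("demo-app"); // illustrative name

    // 2. Describe how to launch the ApplicationMaster container.
    ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
        Collections.emptyMap(),                       // local resources (jars, etc.)
        Collections.emptyMap(),                       // environment
        Collections.singletonList("./run-my-am.sh"),  // launch command (assumption)
        null, null, null);
    ctx.setAMContainerSpec(amContainer);
    ctx.setResource(Resource.newInstance(1024, 1));   // 1024 MB, 1 vcore for the AM
    ctx.setQueue("default");

    // 3. Submit, then poll the RM for status updates.
    ApplicationId appId = yarnClient.submitApplication(ctx);
    ApplicationReport report = yarnClient.getApplicationReport(appId);
    System.out.println(appId + " is " + report.getYarnApplicationState());
  }
}
```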

Page 33: YARN

Execution Modes

• Local mode
• Uber mode
• Executors (see the sketch below)
  • DefaultContainerExecutor
  • LinuxContainerExecutor
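As a rough sketch of where these knobs live (the property names and the executor class are real; the values and the choice of LinuxContainerExecutor here are just examples):

```java
import org.apache.hadoop.conf.Configuration;

public class ExecutionModes {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Local mode: run the whole job in a single client-side JVM (debugging).
    conf.set("mapreduce.framework.name", "local");

    // Uber mode: run a sufficiently small job inside the MR AM's own JVM.
    conf.setBoolean("mapreduce.job.ubertask.enable", true);

    // Container executor, set cluster-wide in yarn-site.xml:
    // DefaultContainerExecutor runs containers as the NM user;
    // LinuxContainerExecutor runs them as the submitting user (needed for security).
    conf.set("yarn.nodemanager.container-executor.class",
        "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor");
  }
}
```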

 

Page 34: YARN
Page 35: YARN

Availability

[Diagram: clients fail over between two ResourceManagers; each RM runs an elector backed by a ZooKeeper state store, and the NodeManagers and clients all redirect to whichever RM is active.]

Page 36: YARN

Availability: Subtleties

• Embedded leader elector
  • No need for a separate daemon like ZKFC
• Implicit fencing using the ZKRMStateStore
  • The active RM claims exclusive access to the store through ACL magic

Page 37: YARN

Availability: Implications

• Previously submitted applications continue to run
  • New ApplicationMasters are created
  • If the AM checkpoints state, it can continue from where it left off
  • MR keeps track of completed tasks, so they don’t have to be re-run
• Future
  • Work-preserving RM restart/failover

Page 38: YARN

Availability: Implications

• Transparent to clients
  • The RM is unavailable for only a small duration
  • Clients automatically fail over to the active RM
  • The web UI redirects
  • The REST API redirects (starting in 5.1.0)

Page 39: YARN
Page 40: YARN

Resource Model and Capacities

• Resource vectors
  • e.g. 1024 MB, 2 vcores, …
  • No more task slots!
• Nodes specify the amount of resources they have (see the sketch below)
  • yarn.nodemanager.resource.memory-mb
  • yarn.nodemanager.resource.cpu-vcores
• vcores relate to physical cores; they’re not really “virtual”
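For instance, a node might advertise its capacity like this; the values are illustrative, and in practice these properties live in the node’s yarn-site.xml:

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeCapacity {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Memory this NodeManager offers to containers (illustrative: 8 GB).
    conf.setInt("yarn.nodemanager.resource.memory-mb", 8192);
    // vcores this NodeManager offers (illustrative: 8).
    conf.setInt("yarn.nodemanager.resource.cpu-vcores", 8);
  }
}
```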

 

Page 41: YARN

Resources and Scheduling

• What you request is what you get
  • No more fixed-size slots
  • The framework/application requests resources for a task
• The MR AM requests resources for map and reduce tasks; these requests can potentially be for different amounts of resources (see the sketch below)
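From the application’s side, such a request goes through the AMRMClient API. Here is a minimal sketch matching the “2 containers with 1024 MB and 1 core each” exchange on the following slides; the priority value and the empty registration arguments are assumptions:

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AskForContainers {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(new YarnConfiguration());
    rmClient.start();
    rmClient.registerApplicationMaster("", 0, ""); // host/port/tracking URL omitted

    // "I want 2 containers with 1024 MB and 1 core each."
    Resource capability = Resource.newInstance(1024, 1);
    Priority priority = Priority.newInstance(0);
    for (int i = 0; i < 2; i++) {
      rmClient.addContainerRequest(
          new ContainerRequest(capability, null, null, priority));
    }

    // The heartbeat ("I'm still here") doubles as the allocation call.
    AllocateResponse response = rmClient.allocate(0.0f);
    for (Container container : response.getAllocatedContainers()) {
      System.out.println("Got " + container.getId() + " on "
          + container.getNodeId());
    }
  }
}
```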

Page 42: YARN

YARN Scheduling

[Diagram: a ResourceManager, two Application Masters, and three nodes.]

Page 43: YARN

YARN Scheduling

AM1 to the RM: “I want 2 containers with 1024 MB and 1 core each.”

Page 44: YARN

YARN Scheduling

RM: “Noted.”

Page 45: YARN

YARN Scheduling

AM1: “I’m still here.”

Page 46: YARN

YARN Scheduling

RM: “I’ll reserve some space on Node 1 for AM1.”

Page 47: YARN

YARN Scheduling

AM1: “Got anything for me?”

Page 48: YARN

YARN Scheduling

RM: “Here’s a security token to let you launch a container on Node 1.”

Page 49: YARN

YARN Scheduling

AM1 to Node 1: “Hey, launch my container with this shell command.”

Page 50: YARN

YARN Scheduling

[Diagram: the container is now running on Node 1.]

Page 51: YARN

Resources on a Node (5 GB)

[Diagram: a single node’s 5 GB hosting containers of mixed sizes: an MR AM (1024 MB), maps of 512, 1024, 256, and 256 MB, and reduces of 1536 and 512 MB.]

Page 52: YARN

FairScheduler (FS)

• When space becomes available to run a task on the cluster, which application do we give it to?
• Find the job that is using the least space.

Page 53: YARN

FS: Apps and Queues

• Apps go in “queues”
• Share fairly between queues
• Share fairly between apps within queues

Page 54: YARN

FS: Hierarchical Queues

[Diagram: Root (capacity: 12 GB memory, 24 cores) divides its fair share evenly among Marketing, R&D, and Sales (4 GB, 8 cores each); one of those queues is further split between Jim’s Team and Bob’s Team (2 GB, 4 cores each).]

Page 55: YARN

FS: Fast and Slow Lanes

[Diagram: Root (capacity: 12 GB memory, 24 cores) with Marketing and Sales queues (fair share: 4 GB, 8 cores each); one queue is split into a Fast Lane (max share: 1 GB, 1 core) and a Slow Lane (fair share: 3 GB, 7 cores).]

Page 56: YARN

FS: Fairness for Hierarchies

• Traverse the tree starting at the root queue
• Offer resources to subqueues in order of how few resources they’re using

Page 57: YARN

FS: Hierarchical Queues

[Diagram: the same queue tree: Root, with Marketing, R&D, and Sales beneath it, and Jim’s Team and Bob’s Team below.]

Page 58: YARN

FS: Multi-resource Scheduling

• Scheduling based on multiple resources
  • CPU, memory
  • Future: disk, network
• Why multiple resources?
  • Better utilization
  • More fair

Page 59: YARN

FS: More Features

• Preemption
  • To avoid starvation, preempt tasks using more than their fair share once the preemption timeout passes
  • Applications are warned; an application can choose to kill any of its containers
• Locality through delay scheduling
  • Try to give node-local or rack-local resources by waiting for some time

Page 60: YARN

Enforcing Resource Limits

• Memory
  • Monitor process usage and kill the container if it crosses its limit
  • Disable virtual memory checking
  • Physical memory checking is being improved
• CPU
  • cgroups

Page 61: YARN

Microsoft Office EULA. Really.

Page 62: YARN

MR1 to MR2 Gotchas

• AMs can take up all resources
  • Symptom: submitted jobs don’t run
  • A fix is in progress to limit the maximum number of applications
  • Workaround: use scheduler allocations to limit the number of applications
• How do you run 4 maps and 2 reduces per node?
  • Don’t try to tune the number of tasks per node
  • Set assignMultiple to false to spread allocations

Page 63: YARN

MR1 to MR2 Gotchas

• Comparing MR1 and MR2 benchmarks
  • TestDFSIO runs best on dedicated CPU/disk, which is harder to pin down
  • TeraSort changed: less compressible data means more network transfer
• Resource allocation vs. resource consumption (see the sketch below)
  • The RM allocates resources; heap is specified elsewhere
  • JVM overhead is not included
  • Mind your mapred.[map|reduce].child.java.opts
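A sketch of the allocation-vs-consumption distinction, using the MR2-era property names; the values are illustrative, and the point is the headroom between heap and container:

```java
import org.apache.hadoop.conf.Configuration;

public class HeapVsContainer {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // What the RM allocates: the container size for each task.
    conf.setInt("mapreduce.map.memory.mb", 1024);
    conf.setInt("mapreduce.reduce.memory.mb", 2048);

    // What the JVM actually gets: the heap, set separately.
    // Leave headroom for JVM overhead, or the NM may kill the container.
    conf.set("mapreduce.map.java.opts", "-Xmx800m");
    conf.set("mapreduce.reduce.java.opts", "-Xmx1600m");
  }
}
```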

Page 64: YARN

MR1 to MR2 Gotchas

• Changes in logs make tracing problems harder
  • MR1: distributed grep on the JobId
  • YARN logs are more generic and deal with containers, not apps

Page 65: YARN
Page 66: YARN

Job History

• Job history viewing was moved to its own server: the Job History Server (JHS)
  • Helps with load on the RM (the JT equivalent)
  • Helps separate MR from YARN

Page 67: YARN

How History Flows

• The AM
  • While running, keeps track of all events during execution
  • On success, before finishing up, writes the history information to the done_intermediate dir
• The JHS
  • periodically scans the done_intermediate dir
  • moves the files to the done dir
  • starts showing the history

Page 68: YARN

History: Important Configuration Properties

• yarn.app.mapreduce.am.staging-dir
  • Default (CM): /user  ← you want this for security, too
  • Default (CDH): /tmp/hadoop-yarn/staging
  • Staging directory for MapReduce applications
• mapreduce.jobhistory.done-dir
  • Default: ${yarn.app.mapreduce.am.staging-dir}/history/done
  • Final location in HDFS for history files
• mapreduce.jobhistory.intermediate-done-dir
  • Default: ${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate
  • Location in HDFS where AMs dump history files

 

Page 69: YARN

History: Important Configuration Properties

• mapreduce.jobhistory.max-age-ms
  • Default: 604800000 (7 days)
  • Max age before the JHS deletes history
• mapreduce.jobhistory.move.interval-ms
  • Default: 180000 (3 min)
  • Frequency at which the JHS scans the done_intermediate dir

Page 70: YARN

History: Miscellaneous

• The JHS runs as ‘mapred’, the AM runs as the user who submitted the job, and the RM runs as ‘yarn’
• The done_intermediate dir needs to be writable by the user who submitted the job and readable by ‘mapred’
• The RM, AM, and JHS should have identical versions of the job-history-related properties so they all “agree”

Page 71: YARN

Application History Server / Timeline Server

• Work in progress to capture history and other information for non-MR YARN applications

Page 72: YARN

YARN Container Logs

• While the application is running
  • Local to the NM: yarn.nodemanager.log-dirs
• After the application finishes
  • Logs are aggregated to HDFS: yarn.nodemanager.remote-app-log-dir
• Disable aggregation?
  • yarn.log-aggregation-enable (see the sketch below)
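A minimal configuration sketch for the properties above; the paths are illustrative, not prescriptions:

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LogConfig {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Local log location on each NodeManager while the app runs.
    conf.set("yarn.nodemanager.log-dirs", "/var/log/hadoop-yarn/containers");
    // Aggregate container logs to HDFS once the application finishes.
    conf.setBoolean("yarn.log-aggregation-enable", true);
    conf.set("yarn.nodemanager.remote-app-log-dir", "/tmp/logs");
  }
}
```

Once aggregated, `yarn logs -applicationId <appId>` pulls an application’s logs back out of HDFS.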

   

Page 73: YARN
Page 74: YARN
Page 75: YARN

Interacting with a YARN Cluster

• Java API
  • MR1 and MR2 APIs are compatible
• REST API
  • The RM, NM, and JHS all have very useful REST APIs (see the sketch below)
• Llama (Long-Lived Application Master)
  • Cloudera Impala can reserve, use, and release resource allocations without using YARN-managed container processes
• CLI
  • yarn rmadmin, application, etc.
• Web UI
  • New and “improved”; it takes time to get used to
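As an example of the REST API, this sketch lists running applications via the RM’s ws/v1/cluster/apps endpoint; the hostname is an assumption, and 8088 is the RM’s usual web port:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ListRunningApps {
  public static void main(String[] args) throws Exception {
    // RM host is an assumption; 8088 is the usual RM web port.
    URL url = new URL(
        "http://resourcemanager.example.com:8088/ws/v1/cluster/apps?state=RUNNING");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // JSON describing each running application
      }
    }
  }
}
```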

Page 76: YARN
Page 77: YARN

YARN Applications

• MR2
• Cloudera Impala
• Apache Spark
• Others? Custom?
• Apache Slider (incubating); not production-ready
  • Accumulo
  • HBase
  • Storm

Page 78: YARN
Page 79: YARN

The Cloudera View of YARN

• Shipping
  • Enabled by default on CDH 5+
  • Included for the past two years, though not enabled
• Supported
• Recommended

Page 80: YARN

Growing Pains

• Benchmarking is harder
  • different utilization paradigm
  • “whole cluster” benchmarks matter more, e.g. SWIM
• Tuning is still largely trial and error
  • MR1 was the same originally
  • YARN/MR2 will get there eventually

Page 81: YARN

What Are Customers Doing?

• A few are using it in production
• Many are exploring
  • Spark
  • Impala via Llama
• Most are waiting

Page 82: YARN

Why Not Mesos?

• Mesos
  • designed to be completely general purpose
  • more burden on the app developer (offer model vs. app request)
• YARN
  • designed with Hadoop in mind
  • supports Kerberos
  • more robust/familiar scheduling
  • rack/machine locality out of the box
• Supportability
  • all commercial Hadoop vendors support YARN
  • support for Mesos is limited to the startup Mesosphere

Page 83: YARN

Is This the End for MapReduce?

Page 84: YARN

Extra special thanks: ALL OF YOU

Page 85: YARN

Image Credits

• CC BY 2.0, flik: https://flic.kr/p/4RVoUX
• CC BY 2.0, Ian Sane: https://flic.kr/p/nRyHxd
• CC BY-NC 2.0, lollyknit: https://flic.kr/p/49C1Xi
• CC BY-ND 2.0, jankunst: https://flic.kr/p/deU71s
• CC BY-SA 2.0, pierrepocs: https://flic.kr/p/9mgdMd
• CC BY-SA 2.0, bekathwia: https://flic.kr/p/4FpABU
• CC BY-NC-ND 2.0, digitalnc: https://flic.kr/p/dxyTt1
• CC BY-NC-ND 2.0, arselectronica: https://flic.kr/p/7yw8z2
• CC BY-NC-ND 2.0, yum9me: https://flic.kr/p/81hQ49
• CC BY-NC-SA 2.0, jimnix: https://flic.kr/p/gsqpWC
• Microsoft Office EULA (really)

Page 86: YARN

Thank You!

Alex Moundalexis
@technmsg

Insert witty tagline here.