multicorelinuxandmiddlewarelu/cse520s/slides/mc-orb.pdf · 2011. 11. 9. ·...


Multicore Linux and Middleware

Chenyang Lu, CSE 520S


Motivation and Contributions

- Trend towards multi-processor and multi-core platforms affects both OS and middleware
  - Techniques designed for uni-processors need revisiting
- Contributions to real-time systems on multi-core platforms
  - A performance evaluation of relevant Linux features
  - MC-ORB middleware designed for multi-core platforms
  - Evaluation of MC-ORB's multi-core aware RT performance


Background and Related Work

- Linux 2.6 introduced SMP and multi-core support
  - Linux 2.6.23 added the Completely Fair Scheduler (CFS)
  - However, many deployed platforms predate 2.6.23
  - We studied Linux 2.6.17 as a representative compromise
- We assume unmodified COTS Linux as our middleware design point, for highly portable real-time performance
- The differing trade-offs for uni-processor vs. multi-processor platforms motivate new middleware designs


Linux Performance: Clock Differences I

- We first evaluated clock differences between cores
  - How well do the platform and Linux maintain synchronization?
  - We used the RDTSC instruction to record clock ticks on each core
- We bounced a message back and forth between two cores
  - Used arrival TSCs (x, y, z) to measure round trip delay (RTD)
  - The results show that the cores' frequencies were well matched


Linux Performance: Clock Differences II

- We then estimated the cores' temporal offsets as
  $\delta_0 = 2y_1 - x_0 - z_0$ ; $\delta_1 = 2y_0 - x_1 - z_1$
- Insight 1
  - Though frequencies matched well, avg. offset was ~1.3 μs
  - Motivates measuring offsets in our subsequent analyses
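The offset expression can be tried on sample numbers. The sketch below is only an illustration: the tick values are hypothetical, and the function names are not from the slides. It assumes x and z are TSC readings taken on the initiating core (send and return) and y is the reading taken on the other core when the message arrives. Under the additional assumption that both one-way delays are equal, the expression 2y - x - z works out to twice the difference between the two cores' counters.

```python
def round_trip_delay(x, z):
    """RTD in TSC ticks; both readings are taken on the initiating core."""
    return z - x


def offset_estimate(x, y, z):
    """Offset term as transcribed on the slide: 2y - x - z.

    x, z: send and return readings on the initiating core.
    y:    arrival reading on the other core.
    With equal one-way delays this equals twice the counter difference
    between the two cores (an assumption of this sketch).
    """
    return 2 * y - x - z


# Hypothetical readings in ticks, chosen only to exercise the formula.
x0, y1, z0 = 1000, 1650, 1200
print(round_trip_delay(x0, z0))     # 200
print(offset_estimate(x0, y1, z0))  # 1100
```

Converting ticks to microseconds additionally requires the TSC frequency of the target platform, which the slides do not state.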


Linux Performance: Load Balancing

- Can thread affinity thwart (bad) Linux rebalancing?
  - We ran sets of 10 vs. 30 tasks (all bound to one core to prevent rebalancing), with total utilizations of 0.6 vs. 1.0
- Insight 2
  - Though overhead is small and amortized, compiling kernels with rebalancing off appears to be a preferable method

Tasks  Utilization  Imbalances detected  Overhead per imbalance (ns)  Overhead
                    in 5 min             Minimum  Mean   Maximum      (total μs)
10     0.6          211                  405      983    1899         207
30     0.6          210                  566      1178   2120         247
10     1.0          588                  536      854    1463         509
30     1.0          596                  671      1124   2069         670
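To put "small and amortized" in perspective, the table values can be related to the 5-minute measurement window. The sketch below is only arithmetic on the numbers above. It assumes the total overhead is roughly the imbalance count times the mean overhead per imbalance; that product matches the reported totals to rounding in three rows and gives about 502 μs against the reported 509 μs in the third, so the assumption is approximate.

```python
ROWS = [
    # (tasks, utilization, imbalances in 5 min, mean overhead ns, reported total us)
    (10, 0.6, 211, 983, 207),
    (30, 0.6, 210, 1178, 247),
    (10, 1.0, 588, 854, 509),
    (30, 1.0, 596, 1124, 670),
]
WINDOW_US = 5 * 60 * 1_000_000  # 5 minutes expressed in microseconds


def estimated_total_us(imbalances, mean_ns):
    """Assumed relation: total overhead = count x mean, converted to us."""
    return imbalances * mean_ns / 1000.0


def overhead_fraction(total_us):
    """Share of the 5-minute window spent on rebalancing overhead."""
    return total_us / WINDOW_US


for tasks, util, count, mean_ns, reported in ROWS:
    est = estimated_total_us(count, mean_ns)
    print(f"{tasks} tasks, U={util}: estimated {est:.0f} us, "
          f"reported {reported} us, "
          f"fraction of window {overhead_fraction(reported):.2e}")
```

Even the largest reported total (670 μs) is a few millionths of the window.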


Linux Performance: Migration Strategies

- Two key migration strategies
  - Thread migrates itself
  - Separate manager thread migrates it
- State of the migrating thread influences the mechanisms & cost
  - Thread core affinity mask is updated
  - For a running thread, may involve kernel run queues & scheduler
  - Self migration always entails migrating a running thread

[Figure: three migration cases, each drawn over cores labeled 0 to 3]
- Case 1: a running thread modifies its own affinity
- Case 2: a separate manager thread modifies a running thread's affinity
- Case 3: a separate manager thread modifies a sleeping thread's affinity
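Case 1 can be reproduced in a few lines on a Linux host. This is a minimal sketch, not the measurement harness behind the slides: it uses Python's Linux-only `os.sched_setaffinity`, the helper names are hypothetical, and the timing includes interpreter overhead, so the numbers are not comparable to the TSC-based costs reported in this deck.

```python
import os
import time


def allowed_cores():
    """Cores the caller may run on; empty set where the API is unavailable."""
    if not hasattr(os, "sched_getaffinity"):
        return set()
    return os.sched_getaffinity(0)


def time_self_migration(target_core):
    """Case 1 sketch: the running caller rewrites its own affinity mask.

    Returns elapsed nanoseconds for the affinity call, or None on
    platforms without os.sched_setaffinity (it is Linux-only).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    original = os.sched_getaffinity(0)      # 0 selects the caller
    start = time.perf_counter_ns()
    os.sched_setaffinity(0, {target_core})  # bind the caller to one core
    elapsed = time.perf_counter_ns() - start
    os.sched_setaffinity(0, original)       # restore the previous mask
    return elapsed


cores = allowed_cores()
if cores:
    print(time_self_migration(min(cores)), "ns")
```

Cases 2 and 3 would pass another thread's id instead of 0, from a separate manager thread.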


Linux Performance: Migration Costs

- Insight 3
  - Every strategy risks a non-negligible thread migration cost
  - Motivates binding task threads into core-specific thread pools
  - Motivates an ORB architecture with a separate manager thread (next)

Measured migration costs:
- self migration (~16 to 45 µs)
- manager migrates sleeping thread (~4 to 10 µs)
- manager migrates running thread (~18 to 36 µs)


Conventional Middleware Architecture

- Traditional single-CPU approach benefits from leader/followers to reduce costly hand-offs
  - E.g., TAO, nORB
- However, multiple cores increase risk of migration

1. Leader invokes TA (and AC) for task
2. Picks new leader
3. New leader may need to move old leader
4. Old leader runs the task (on the appropriate core)


MC-ORB Middleware Architecture

- In contrast, MC-ORB's threading architecture leverages hand-offs to avoid thread migrations
  - Key trade off: copying/locking costs vs. migration costs

1. Request is queued
2. Manager thread reads requests in priority order
3. Invokes TA w/AC
4. Manager picks thread from pool
5. Thread runs task
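The five steps can be sketched as a toy dispatcher. This is an illustration under stated assumptions, not MC-ORB's implementation: TA and AC are read here as task allocation and admission control, allocation is a first-fit search with a per-core utilization bound of 1.0, a lower number means higher priority, there is one pool thread per core, and the pool threads are not actually pinned to cores. All names (`run_manager`, `worker`, `NUM_CORES`, the request fields) are hypothetical.

```python
import heapq
import queue
import threading

NUM_CORES = 2  # hypothetical core count for the sketch


def worker(core, inbox, log):
    """Pool thread for one core: runs tasks handed off by the manager."""
    while True:
        task = inbox.get()
        if task is None:                  # sentinel: shut down
            break
        log.append((core, task["name"]))  # stand-in for running the task


def run_manager(requests):
    """Queue requests, read them by priority, allocate, and hand off."""
    pending = []
    for seq, req in enumerate(requests):  # step 1: request is queued
        heapq.heappush(pending, (req["priority"], seq, req))

    inboxes = [queue.Queue() for _ in range(NUM_CORES)]
    log, rejected = [], []
    load = [0.0] * NUM_CORES              # utilization admitted per core
    pool = [threading.Thread(target=worker, args=(c, inboxes[c], log))
            for c in range(NUM_CORES)]
    for thread in pool:
        thread.start()

    while pending:
        _, _, req = heapq.heappop(pending)  # step 2: priority order
        # Step 3: first-fit allocation with a utilization-bound test
        # (an assumed stand-in for TA with AC).
        core = next((c for c in range(NUM_CORES)
                     if load[c] + req["util"] <= 1.0), None)
        if core is None:
            rejected.append(req["name"])    # admission control says no
            continue
        load[core] += req["util"]
        inboxes[core].put(req)              # steps 4-5: hand off to pool

    for box in inboxes:
        box.put(None)
    for thread in pool:
        thread.join()
    return log, rejected
```

The hand-off through a per-core queue is where the copying/locking cost is paid; in exchange, no thread ever changes cores.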


Real-Time ORB Performance Evaluation

- To gauge performance costs of our middleware architecture we examined four key features
  - Allocate on same vs. other core (as manager thread)
  - Thread available vs. migration needed
  - Reallocation is vs. is not required to allocate task
  - New task is admitted vs. rejected
- We evaluated our middleware architecture both with (MC-ORB) and without (MC-ORB*) rejection
  - MC-ORB* compared to nORB (designed for uniprocessors)
  - Varied utilization granularity & magnitude (10 task sets)
  - We measured how many of the task sets missed a deadline


Overheads for MC-ORB's Extensions (μs)

Scenarios used for Overhead Evaluation
1. New task on same core as manager
2. New task on different core (similar cost to 1)
3. (Sleeping) thread moved from other core to run new task
4. (All) running tasks reallocated to make room for new task
5. The new task is rejected (low cost, but it's pure overhead)

Scenario  Minimum  Mean  Maximum
1         43       55    109
2         42       58    111
3         50       64    121
4         222      235   289
5         39       50    107


Fraction of Workloads w/ Deadline Misses

- With rejection, >94% of tasks were admitted by MC-ORB and all admitted tasks met all deadlines
- Without rejection (where + shows need for AC), MC-ORB*
  - Worked better with less balanced workloads
  - Outperformed nORB in 6 cases (green)
  - Performed the same as nORB in 4 cases (grey)
  - Underperformed nORB in 2 cases (red)

Total        ORB       Balance Factor
Utilization            0.1    0.2    0.3    0.5
1.4          nORB      0.4    0      0      0
             MC-ORB*   0      0      0      0
1.5          nORB      0.8    0.3    0.1    0.1
             MC-ORB*   0      0.1+   0.1+   0
1.6          nORB      1.0    0.5    0.1    0.1
             MC-ORB*   0.3+   0.4+   0.4+   0.3+
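Since the cell colors do not survive in text form, the 6/4/2 split can be checked directly from the numbers. The sketch below just compares the two rows for each total utilization, cell by cell; the `+` markers are dropped because they annotate the need for AC, not the value. The names are hypothetical.

```python
MISS_FRACTION = {
    # total utilization: (nORB row, MC-ORB* row)
    # columns are balance factors 0.1, 0.2, 0.3, 0.5
    1.4: ([0.4, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]),
    1.5: ([0.8, 0.3, 0.1, 0.1], [0.0, 0.1, 0.1, 0.0]),
    1.6: ([1.0, 0.5, 0.1, 0.1], [0.3, 0.4, 0.4, 0.3]),
}


def tally(table):
    """Count cells where MC-ORB* misses less, the same, or more than nORB."""
    better = same = worse = 0
    for norb_row, mc_row in table.values():
        for norb, mc in zip(norb_row, mc_row):
            if mc < norb:
                better += 1
            elif mc == norb:
                same += 1
            else:
                worse += 1
    return better, same, worse


print(tally(MISS_FRACTION))  # (6, 4, 2)
```

Both cells where MC-ORB* does worse are at total utilization 1.6, with balance factors 0.3 and 0.5.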


Concluding Remarks

- COTS OS evaluations
  - Measurement on specific target platforms is crucial
  - Behaviors of hardware and OS mechanisms are important
- Middleware architectures
  - OS evaluations establish design trade-off parameters
  - Prior design decisions may be reversed on new platforms
- Performance evaluations bear out our new design
  - Even without admission control, the MC-ORB architecture helps
  - With AC, MC-ORB admitted high utilization and met all deadlines


Paper

- Y. Zhang, C. Gill, and C. Lu, Real-Time Performance and Middleware for Multiprocessor and Multicore Linux Platforms, IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA'09), August 2009.
- Slides based on Chris Gill's RTCSA'09 presentation.
