jonathan/aldrich /charliegarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2...

44
1 15214 School of Computer Science Principles of So3ware Construc9on: Objects, Design, and Concurrency Part 6: Concurrency and distributed systems Distributed Systems Jonathan Aldrich Charlie Garrod

Upload: others

Post on 18-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

1 15-­‐214

School  of    Computer  Science  

Principles  of  So3ware  Construc9on:          Objects,  Design,  and  Concurrency    Part  6:  Concurrency  and  distributed  systems    Distributed  Systems    Jonathan  Aldrich  Charlie  Garrod  

Page 2: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

2 15-­‐214

Administrivia  

•  Homework  5b  due  Thursday,  11:59  p.m.  –  Finish  by  Friday  10  a.m.  if  you  want  to  be  considered  as  a  "Best  

Framework"  for  Homework  5c  •  Our  evalua9on  considers:  

–  Novelty  –  Func9onal  correctness  –  Documenta9on  – …  

Page 3: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

3 15-­‐214

Key  concepts  from  last  Thursday  

Page 4: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

4 15-­‐214

Concurrency  at  the  language  level  

•  Consider: int  sum  =  0;  Iterator  i  =  coll.iterator();  while  (i.hasNext())  {          sum  +=  i.next();  }  

•  In python: sum  =  0;  for  item  in  coll:          sum  +=  item  

Page 5: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

5 15-­‐214

Parallel  prefix  sums  algorithm,  winding  

•  Computes  the  par9al  sums  in  a  more  useful  manner

[13, 9, -4, 19, -6, 2, 6, 3]

[13, 22, -4, 15, -6, -4, 6, 9]  

 

[13, 22, -4, 37, -6, -4, 6, 5]  

 

[13, 22, -4, 37, -6, -4, 6, 42]  

  …

Page 6: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

6 15-­‐214

Parallel  prefix  sums  algorithm,  unwinding  

•  Now  unwinds  to  calculate  the  other  sums  

[13, 22, -4, 37, -6, -4, 6, 42]  

 

[13, 22, -4, 37, -6, 33, 6, 42]  

 

[13, 22, 18, 37, 31, 33, 39, 42]  

•  Recall,  we  started  with:[13, 9, -4, 19, -6, 2, 6, 3]

 

Page 7: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

7 15-­‐214

A  framework  for  asynchronous  computa9on  

•  The  java.util.concurrent.Future<V>  interface  V              get();  V              get(long  timeout,  TimeUnit  unit);  boolean  isDone();  boolean  cancel(boolean  mayInterruptIfRunning);  boolean  isCancelled();  

•  The  java.util.concurrent.ExecutorService  interface  Future                    submit(Runnable  task);  Future<V>              submit(Callable<V>  task);  List<Future<V>>  invokeAll(Collection<Callable<V>>  tasks);  Future<V>              invokeAny(Collection<Callable<V>>  tasks);  

Page 8: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

8 15-­‐214

Fork/Join:  another  common  computa9onal  paYern  

•  In  a  long  computa9on:  –  Fork  a  thread  (or  more)  to  do  some  work  –  Join  the  thread(s)  to  obtain  the  result  of  the  work  

•  The  java.util.concurrent.ForkJoinPool  class  –  Implements  ExecutorService    –  Executes      java.util.concurrent.ForkJoinTask<V>  or      

                                   java.util.concurrent.RecursiveTask<V>  or          java.util.concurrent.RecursiveAction  

Page 9: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

9 15-­‐214

Parallel  prefix  sums  algorithm  

•  How  good  is  this?  –  Work:  O(n)  –  Depth:  O(lg  n)  

•  See  PrefixSumsSequen9alImpl.java  –  n-­‐1  addi9ons  –  Memory  access  is  sequen9al  

•  For  PrefixSumsNonsequen9alImpl.java  –  About  2n  useful  addi9ons,  plus  extra  addi9ons  for  the  loop  indexes  –  Memory  access  is  non-­‐sequen9al  

•  The  punchline:    Constants  maYer.      

Page 10: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

10 15-­‐214

Today:    Distributed  system  design  

•  Java  networking  fundamentals  •  Introduc9on  to  distributed  systems  

–  Mo9va9on:  reliability  and  scalability  –  Failure  models  –  Techniques  for:  

•  Reliability  (availability)  •  Scalability  •  Consistency  

Page 11: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

11 15-­‐214

Our  des9na9on:    Distributed  systems  

•  Mul9ple  system  components  (computers)  communica9ng  via  some  medium  (the  network)  

•  Challenges:  –  Heterogeneity  –  Scale  –  Geography  –  Security  –  Concurrency  –  Failures  

(courtesy of http://www.cs.cmu.edu/~dga/15-440/F12/lectures/02-internet1.pdf

Page 12: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

12 15-­‐214

Communica9on  protocols  

•  Agreement  between  par9es  for  how  communica9on  should  take  place  

Friendly greeting.

Muttered reply.

Destination?

Pittsburgh.

Thank you.

(courtesy of http://www.cs.cmu.edu/~dga/15-440/F12/lectures/02-internet1.pdf

Page 13: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

13 15-­‐214

Abstractions of a network connection

IP

TCP | UDP | …

HTTP | FTP | …

HTML | Text | JPG | GIF | PDF | …

data link layer

physical layer

Page 14: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

14 15-­‐214

Internet  addresses  and  sockets  

•  For  IP  version  4  (IPv4)  host  address  is  a  4-­‐byte  number  –  e.g.  127.0.0.1  –  Hostnames  mapped  to  host  IP  addresses  via  DNS  –  ~4  billion  dis9nct  addresses  

•  Port  is  a  16-­‐bit  number  (0-­‐65535)  –  Assigned  conven9onally  

•  e.g.,  port  80  is  the  standard  port  for  web  servers  

Page 15: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

15 15-­‐214

Packet-­‐oriented  and  stream-­‐oriented  connec9ons  

•  UDP:    User  Datagram  Protocol  –  Unreliable,  discrete  packets  of  data  

•  TCP:    Transmission  Control  Protocol  –  Reliable  data  stream  

Page 16: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

16 15-­‐214

Networking  in  Java  

•  The  java.net.InetAddress:  static  InetAddress  getByName(String  host);  static  InetAddress  getByAddress(byte[]  b);  static  InetAddress  getLocalHost();  

•  The  java.net.Socket:  Socket(InetAddress  addr,  int  port);  boolean            isConnected();  boolean            isClosed();  void                  close();  InputStream    getInputStream();  OutputStream  getOutputStream();  

•  The  java.net.ServerSocket:  ServerSocket(int  port);  Socket              accept();  void                  close();  …  

Page 17: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

17 15-­‐214

Simple  sockets  demos  

•  NetworkServer.java  •  A  basic  chat  system:  

–  TransferThread.java  –  TextSocketClient.java  –  TextSocketServer.java  

Page 18: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

18 15-­‐214

Higher  levels  of  abstrac9on  

•  Applica9on-­‐level  communica9on  protocols  •  Frameworks  for  simple  distributed  computa9on  

–  Remote  Procedure  Call  (RPC)  –  Java  Remote  Method  Invoca9on  (RMI)  

•  Common  paYerns  of  distributed  system  design  •  Complex  computa9onal  frameworks  

–  e.g.,  distributed  map-­‐reduce  

Page 19: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

19 15-­‐214

Today  

•  Java  networking  fundamentals  •  Introduc9on  to  distributed  systems  

–  Mo9va9on:  reliability  and  scalability  –  Failure  models  –  Techniques  for:  

•  Reliability  (availability)  •  Scalability  •  Consistency  

Page 20: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

20 15-­‐214

Page 21: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

21 15-­‐214

Aside:    The  robustness  vs.  redundancy  curve  

? redundancy robustness

Page 22: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

22 15-­‐214

Metrics  of  success  

•  Reliability  –  O3en  in  terms  of  availability:    frac9on  of  9me  system  is  working  

•  99.999%  available  is  "5  nines  of  availability"  •  Scalability  

–  Ability  to  handle  workload  growth  

Page 23: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

23 15-­‐214

A  case  study:    Passive  primary-­‐backup  replica9on  

•  Architecture  before  replica9on:      

–  Problem:    Database  server  might  fail  

client front-end {alice:90, bob:42, …} client front-end

database server:

Page 24: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

24 15-­‐214

A  case  study:    Passive  primary-­‐backup  replica9on  

•  Architecture  before  replica9on:      

–  Problem:    Database  server  might  fail  

•  Solu9on:    Replicate  data  onto  mul9ple  servers  

client front-end {alice:90, bob:42, …} client front-end

database server:

client front-end {alice:90, bob:42, …} client front-end

primary:

{alice:90, bob:42, …}

backup:

{alice:90, bob:42, …}

backup:

Page 25: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

25 15-­‐214

Passive  primary-­‐backup  replica9on  protocol  

1.  Front-­‐end  issues  request  with  unique  ID  to  primary  DB  2.  Primary  checks  request  ID  

–  If  already  executed  request,  re-­‐send  response  and  exit  protocol  3.  Primary  executes  request  and  stores  response  4.  If  request  is  an  update,  primary  DB  sends  updated  state,  ID,  and  

response  to  all  backups  –  Each  backup  sends  an  acknowledgement  

5.  A3er  receiving  all  acknowledgements,  primary  DB  sends  response  to  front-­‐end  

Page 26: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

26 15-­‐214

Issues  with  passive  primary-­‐backup  replica9on  

•  If  primary  DB  crashes,  front-­‐ends  need  to  agree  upon  which  unique  backup  is  new  primary  DB  –  Primary  failure  vs.  network  failure?  

•  If  backup  DB  becomes  new  primary,  surviving  replicas  must  agree  on  current  DB  state  

•  If  backup  DB  crashes,  primary  must  detect  failure  to  remove  the  backup  from  the  cluster  –  Backup  failure  vs.  network  failure?  

•  If  replica  fails*  and  recovers,  it  must  detect  that  it  previously  failed  

•  Many  subtle  issues  with  par9al  failures  •  …  

Page 27: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

27 15-­‐214

More  issues…  

•  Concurrency  problems?  –  Out  of  order  message  delivery?  

•  Time…  

•  Performance  problems?  –  2n  messages  for  n  replicas  –  Failure  of  any  replica  can  delay  response  –  Rou9ne  network  problems  can  delay  response  

•  Scalability  problems?  –  All  replicas  are  wriYen  for  each  update  –  Primary  DB  responds  to  every  request  

Page 28: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

28 15-­‐214

Today  

•  Java  networking  fundamentals  •  Introduc9on  to  distributed  systems  

–  Mo9va9on:  reliability  and  scalability  –  Failure  models  –  Techniques  for:  

•  Reliability  (availability)  •  Scalability  •  Consistency  

Page 29: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

29 15-­‐214

Types  of  failure  behaviors  

•  Fail-­‐stop  •  Other  hal9ng  failures  •  Communica9on  failures  

–  Send/receive  omissions  –  Network  par99ons  –  Message  corrup9on  

•  Data  corrup9on  •  Performance  failures  

–  High  packet  loss  rate  –  Low  throughput  –  High  latency  

•  Byzan9ne  failures  

Page 30: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

30 15-­‐214

Common  assump9ons  about  failures  

•  Behavior  of  others  is  fail-­‐stop  (ugh)  •  Network  is  reliable  (ugh)  •  Network  is  semi-­‐reliable  but  asynchronous  •  Network  is  lossy  but  messages  are  not  corrupt  •  Network  failures  are  transi9ve  •  Failures  are  independent  •  Local  data  is  not  corrupt  •  Failures  are  reliably  detectable  •  Failures  are  unreliably  detectable    

Page 31: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

31 15-­‐214

Some  distributed  system  design  goals  

•  The  end-­‐to-­‐end  principle  –  When  possible,  implement  func9onality  at  the  end  nodes  (rather  than  the  

middle  nodes)  of  a  distributed  system  

•  The  robustness  principle  –  Be  strict  in  what  you  send,  but  be  liberal  in  what  you  accept  from  others  

•  Protocols  •  Failure  behaviors  

•  Benefit  from  incremental  changes  •  Be  redundant  

–  Data  replica9on  –  Checks  for  correctness  

Page 32: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

32 15-­‐214

Replica9on  for  scalability:    Client-­‐side  caching  

•  Architecture  before  replica9on:  

–  Problem:    Server  throughput  is  too  low  

•  Solu9on:    Cache  responses  at  (or  near)  the  client  –  Cache  can  respond  to  repeated  read  requests  

client front-end {alice:90, bob:42, …} client front-end

database server:

client front-end

client front-end

{alice:90, bob:42, …}

database server: cache

cache

Page 33: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

33 15-­‐214

Replica9on  for  scalability:    Client-­‐side  caching  

•  Hierarchical  client-­‐side  caches:  

client

front-end

client

front-end

{alice:90, bob:42, …}

database server:

cache

cache

cache

client

client

cache

cache

cache

Page 34: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

34 15-­‐214

Replica9on  for  scalability:    Server-­‐side  caching  

•  Architecture  before  replica9on:  

–  Problem:    Database  server  throughput  is  too  low  

•  Solu9on:    Cache  responses  on  mul9ple  servers  –  Cache  can  respond  to  repeated  read  requests  

client front-end {alice:90, bob:42, …} client front-end

database server:

client front-end

client front-end

{alice:90, bob:42, …}

database server: cache

cache

cache

Page 35: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

35 15-­‐214

Cache  invalida9on  

•  Time-­‐based  invalida9on    (a.k.a.  expira9on)  –  Read-­‐any,  write-­‐one  –  Old  cache  entries  automa9cally  discarded  –  No  expira9on  date  needed  for  read-­‐only  data  

•  Update-­‐based  invalida9on  –  Read-­‐any,  write-­‐all  –  DB  server  broadcasts  invalida9on  message  to  all  caches  when  the  DB  is  

updated  

Page 36: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

36 15-­‐214

Cache  replacement  policies  

•  Problem:    caches  have  finite  size  •  Common*  replacement  policies  

–  Op9mal  (Belady's)  policy  •  Discard  item  not  needed  for  longest  9me  in  future  

–  Least  Recently  Used  (LRU)  •  Track  9me  of  previous  access,  discard  item  accessed  least  recently  

–  Least  Frequently  Used  (LFU)  •  Count  #  9mes  item  is  accessed,  discard  item  accessed  least  frequently  

–  Random  •  Discard  a  random  item  from  the  cache  

Page 37: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

37 15-­‐214

Par99oning  for  scalability  

•  Par99on  data  based  on  some  property,  put  each  par99on  on  a  different  server  

client front-end {cohen:9, bob:42, …}

client front-end

CMU server:

{alice:90, pete:12, …}

Yale server: {deb:16, reif:40, …}

MIT server:

Page 38: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

38 15-­‐214

Horizontal  par99oning  

•  a.k.a.  "sharding"  •  A  table  of  data:  

username   school   value  

cohen   CMU   9  

bob   CMU   42  

alice   Yale   90  

pete   Yale   12  

deb   MIT   16  

reif   MIT   40  

Page 39: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

39 15-­‐214

Recall:    Basic  hash  tables  

•  For  n-­‐size  hash  table,  put  each  item  X  in  the    bucket:  X.hashCode() % n

0 1 2 3 4 5 6 7 8 9 10 11 12

{reif:40} {bob:42} {pete:12} {deb:16}

{alice:90} {cohen:9}

Page 40: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

40 15-­‐214

Par99oning  with  a  distributed  hash  table  

•  Each  server  stores  data  for  one  bucket  •  To  store  or  retrieve  an  item,  front-­‐end  server  hashes  the  key,  

contacts  the  server  storing  that  bucket  

client front-end {reif:40}

client front-end

Server 0:

{bob:42} Server 3: {pete:12,

alice:90}

Server 5:

{ } Server 1:

Page 41: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

41 15-­‐214

Consistent  hashing  

•  Goal:    Benefit  from  incremental  changes  –  Resizing  the  hash  table  (i.e.,  adding  or  removing  a  server)  should  not  

require  moving  many  objects  

•  E.g.,  Interpret  the  range  of  hash  codes  as  a  ring  –  Each  bucket  stores  data  for  a  range  of  the  ring  

•  Assign  each  bucket  an  ID  in  the  range  of  hash  codes  •  To  store  item  X  don't  compute  X.hashCode() % n.    Instead,  place  X  in  bucket  with  the  same  ID  as  or  next  higher  ID  than X.hashCode()

Page 42: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

42 15-­‐214

Problems  with  hash-­‐based  par99oning  

•  Front-­‐ends  need  to  determine  server  for  each  bucket  –  Each  front-­‐end  stores  look-­‐up  table?  –  Master  server  storing  look-­‐up  table?  –  Rou9ng-­‐based  approaches?  

•  Places  related  content  on  different  servers  –  Consider  range  queries:       SELECT * FROM users WHERE lastname STARTSWITH 'G'

Page 43: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

43 15-­‐214

Master/tablet-­‐based  systems  •  Dynamically  allocate  range-­‐based  par99ons  

–  Master  server  maintains  tablet-­‐to-­‐server  assignments  –  Tablet  servers  store  actual  data  –  Front-­‐ends  cache  tablet-­‐to-­‐server  assignments  

client front-end

k-z: {pete:12, reif:42}

client front-end

Tablet server 1:

a-c: {alice:90, bob:42, cohen:9}

Tablet server 2: d-g: {deb:16} h-j:{ }

Tablet server 3:

{a-c:[2], d-g:[3,4], h-j:[3], k-z:[1]}

Master:

d-g: {deb:16}

Tablet server 4:

Page 44: Jonathan/Aldrich /CharlieGarrod&charlie/courses/15-214/2015-fall/slides/06b... · 15214 2 Administrivia/ • Homework/5b/due/Thursday,/11:59/p.m./ – Finish/by/Friday/10/a.m./if/you/wantto/be/considered/as/a"Best

44 15-­‐214

Coming  next…  

•  More  distributed  systems  –  MapReduce