samc:semac-awaremodelchecking forfastdiscoveryof … · 2019-12-18 · 1...

43
1 SAMC: Sema+cAware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems Tanakorn Leesatapornwongsa, Mingzhe Hao, Pallavi Joshi * , Jeffrey F. Lukman , and Haryadi S. Gunawi *

Upload: others

Post on 23-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

1  

SAMC:  Sema+c-­‐Aware  Model  Checking  for  Fast  Discovery  of  

Deep  Bugs  in  Cloud  Systems  

Tanakorn  Leesatapornwongsa,  Mingzhe  Hao,  Pallavi  Joshi*,  Jeffrey  F.  Lukman†,  

and  Haryadi  S.  Gunawi      

†   *  

Page 2: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

2  

Internet  Services  

Page 3: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

3  

Internet  Services  Cloud  Systems  

Page 4: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Reliability  

4  

Page 5: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Deep  Bug  Example  ZooKeeper (synchronization service) Issue #335.

1. Nodes A, B, C start (w/ latex txid: 10) 2. B becomes leader 3. B crashes 4. C becomes leader 5. C commits new txid-value pair (11, X) 6. A crashes, before committing the new txid 11 7. C loses quorum and C crashes 8. A and B are back online after C crashes 9. A becomes leader 10. A's commits new txid-value pair (11, Y) 11. C is back online after A's new tx commit 12. C announce to B (11, X) 13. B replies diff starting with tx 12 14. Inconsistency: A has (11, Y), C has (11, X)

5  

PERMANENT INCONSISTENT REPLICA

Page 6: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Deep  Bug  Characteris+cs  ZooKeeper (synchronization service) Issue #335.

1. Nodes A, B, C start (w/ latex txid: 10) 2. B becomes leader 3. B crashes 4. C becomes leader 5. C commits new txid-value pair (11, X) 6. A crashes, before committing the new txid 11 7. C loses quorum and C crashes 8. A and B are back online after C crashes 9. A becomes leader 10. A's commits new txid-value pair (11, Y) 11. C is back online after A's new tx commit 12. C announce to B (11, X) 13. B replies diff starting with tx 12 14. Inconsistency: A has (11, Y), C has (11, X)

6  

Specific  Order  

1.  Out-­‐of-­‐order  messages  

2.  Mul+ple  crashes  

3.  Mul+ple  reboots    

HAPPEN  IN  ANY  ORDER  1 2 3

Page 7: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Study  of  Deep  Bugs  

0  

1  

2  

3  

4  

5  

6  

335  

569  

769  

790  

791  

975  

1075  

1118  

1154  

1294  

1319  

1332  

1367  

1372  

1419  

1492  

1573  

1653  

ZooKeeper  

Bug  #  

0  

1  

2  

3  

4  

5  

6  

913  

3780  

3846  

4252  

4425  

4607  

4748  

4832  

4833  

4890  

5000  

5169  

5198  

5358  

5405  

5409  

5476  

5489  

5505  

Hadoop  

Bug  #  

0  

1  

2  

3  

4  

5  

6  

515  

1221  

1432  

1730  

1992  

2115  

2514  

3273  

3466  

3626  

3876  

5179  

6156  

6364  

6503  

Cassandra  Bu

g  #  

x-­‐axis  is  bug  number  y-­‐axis  is  number  of  crashes                                and  reboots  

7  

#  crashe

s  /  #  re

boots  

#  crashe

s  /  #  re

boots  

#  crashe

s  /  #  re

boots   SEVERE

IMPLICATIONS INCONSISTENT REPLICAS,

DATA LOSS, DOWNTIMES, ETC.

Page 8: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

How  do  we  catch  deep  bugs  in  distributed  systems?  

8  

Page 9: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

How  to  Catch  Deep  Bugs  •  Distributed  system  model  checker  – Re-­‐ordering  all  non-­‐determinis+c  events  – Find  which  specific  orderings  lead  to  bugs  

9  

1  2  3  4  5  6  7  8  9  10  11  12  13  14  

2  7  1  4  5  6  3  8  11  10  9  12  14  13  

6  9  3  4  5  1  7  8  2  10  11  13  12  14  

3  4  1  5  2  6  7  8  9  12  11  10  14  13  

2  1  3  4  5  7  6  8  9  10  11  14  12  13  

. . .

Page 10: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

What’s  Wrong  with  Exis+ng  Model  Checkers?  

•  Last  7  years  •   MaceMC  [NSDI’07],    Modist  [NSDI’09],  dBug  [SSV’10],  

Demeter  [SOSP’13],  etc.  

•  BUT  – Too  many  events  – Mul+ple  crashes  and  reboots  •  Create  more  messages  

– No  model  checker  incorporate          mul+ple  crashes  and  reboots  – Cannot  find  deep  bugs!  

10  

100 events

Page 11: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

How  do  we  catch  deep  bugs  REALLY  FAST?  

11  

Page 12: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Black-­‐Box  Approach  

•  Exis+ng  model  checkers  are  so  slow  •  They  treat  target  systems  as  black  boxes  – A  large  number  of  event  orderings  are  generated  

12  

Black  Box  

A   B   C   D

Black  Box  Model  Checker  

ABCD  

ABDC  

ACBD  

ACDB  

ADBC  

.  .  .  

(24  total)  

Page 13: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Seman+c  Knowledge  

•  How  can  we  make  model  checker  fast?  – Exploit  seman+c  knowledge  

•  Seman+c-­‐aware  model  checker  (SAMC)  

13  

A   B   C   D

SAMC  Black  Box  

Page 14: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Black  Box  vs.  SAMC  

14  

Black  Box  

Black  Box  Model  Checker  

ABCD  

ABDC  

ACBD  

ACDB  

ADBC  

ADCB  

BACD  

BADC  

BCAD  

BCDA  

BDAC  

.  .  .  

A  

SAMC  with  message  processing  

semanJc  

ABCD  

ABDC  

ACBD  

ACDB  

ADBC  

ADCB  

BACD  

BADC  

BCAD  

BCDA  

.  .  .  

B   C   D A   B   C   D

Unnecessary Re-orderings

Lead to the same state

Message  Processing  Seman+c  

Page 15: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

SAMC  with  crash  recovery  

semanJc  

ABCDX  

ABCXD  

ABXCD  

AXBCD  

XABCD  

ABDCX  

ABDXC  

.  .  .  

N3  

SAMC  with  Crashes  

15  

Black  Box  Model  checker  

ABCDX  

ABCXD  

ABXCD  

AXBCD  

XABCD  

ABDCX  

ABDXC  

.  .  .  

N1  

N2  

N4  

A,B  

C,D  

Unnecessary Re-orderings

Crash  Recovery  Seman+c  

Page 16: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Generic  Reduc+on  Algorithms  

16  

Principle of Semantic

Awareness

Local-­‐Message  Independence  (LMI)  

Crash-­‐Message  Independence  (CMI)  

Crash  Recovery  Symmetry  (CRS)  

Reboot  Synchroniza+on  Symmetry  (RSS)  

Page 17: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

SAMC  Implementa+on  and  Integra+on  

Cloud  systems   Protocol  

Cassandra   Gossiper  

Hinted  handoff  

Read/write  

Hadoop  2.0   Cluster  management  

Specula+ve  execu+on  

ZooKeeper   Atomic  broadcast  

Leader  elec+on  

17  

•  SAMC  implementa+on  – 10,000  LOC  from  scratch  

•  Apply  SAMC  to  3  cloud          systems  – 7  protocols  – 10  versions  

Page 18: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Result  

•  Reproduced  12  old  bugs  – Compare  to  state-­‐of-­‐the-­‐art  techniques  

•  Dynamic  Par+al  Order  Reduc+on  (DPOR)  •  Random-­‐DPOR  

– Find  bugs  2x  to  340x  faster  •  49x  on  average  

•  Found  2  new  bugs  – Submit  them  to  developers  

18  

Page 19: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Outline  

•  IntuiJon  •  SAMC  – Local-­‐Message  Independence  – Crash-­‐Message  Independence  – Crash  Recovery  Symmetry  – Reboot  Synchroniza+on  Symmetry  

•  Evalua+on  

19  

Page 20: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Dependency  vs.  Independency  

20  

A   B  

Node  State  =  S  

S  

A,  B  

B,  A  

S’  

S’’   B,  A  

A,  B  

S   S’  

A, B = Dependent A, B = Independent

INDEPENDENT = NO REORDERING 2X SPEED-UP

Unnecessary

Page 21: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Black  Box  vs.  SAMC  

21  

SAMC  

ABCD  

ABDC  

BACD  

BADC  

Black  Box  

A   B   C   D A   B   C   D

All  dependent   Dependent   Dependent  

Seman+c  Info  

Black  Box  Model  Checker  

ABCD  

ABDC  

ACBD  

ACDB  

ADBC  

ADCB  

BACD  

BADC  

BCAD  

BCDA  

.  .  .  

4!=24  orderings  6X SPEED-UP

Page 22: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Reduc+on  Speed-­‐up  

22  

S  ABDC  

.  .  .  

.  

.  

.  .  .  .  

.  

.  

.  

S  ABDC  

.  .  .  

.  

.  

.  .  .  .  

.  

.  

.  

Page 23: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

How  to  Declare  Message  Independency?  

23  

Q:  Which  concurrent  messages  are  independent  ?  

A:  Use  message  processing  seman+c  

Page 24: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Message  Processing  Seman+c  in  Simplified  Leader  Elec+on  

24  

Belief  =  n3  

Vote=1  

B  =  3  

V  =  1  

if(vote <= belief)! // do nothing!else! belief = vote;!

Vote=2   Vote=4  

B  =  3  

B  =  3  

V  =  2  

B  =  3  

B  =  3  

V  =  4  

B  =  4  

Page 25: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Removing  Re-­‐ordering  via  Message  Processing  Seman+c  

25  

B  =  4  

V  =  1  

B  =  4  

B  =  4  

V  =  2  

V  =  3  

B  =  4  

Belief=4   B  =  4  

V  =  1  

B  =  4  

B  =  4  

V  =  3  

V  =  2  

B  =  4  

B  =  4  

V  =  2  

B  =  4  

B  =  4  

V  =  1  

V  =  3  

B  =  4  

Vote=1   Vote=2   Vote=3  

1,  2,  3   1,  3,  2   2,  1,  3  

if (vote <= belief)! // do nothing!else! belief = vote;!

Unnecessary  

Page 26: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Formalizing  the  Intui+on  

26  

if (isDiscard(msg, ls)) {! // do nothing;!}!

DISCARD  PATTERN  

boolean discardPredicate(msg, ls) {! if (msg.vote <= ls.belief)! return true;! else! return false;!}!

DISCARD  PREDICATE  

if (vote <= belief)! // do nothing!else! belief = vote;!!

MESSAGE  PROCESSING  SEMANTIC  

Page 27: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Formalizing  the  Intui+on  

27  

mx   my   discard(mx)   discard(my)   Independent  

1   2   true   true   ✔  

1   3   true   true   ✔  

2   3   true   true   ✔  

vote   belief   discardPredicate  

1   4   true  

2   4   true  

3   4   true  

Belief=4  

Vote=1   Vote=2   Vote=3  

Page 28: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Outline  

•  Intui+on  •  SAMC  – Local-­‐Message  Independence  – Crash-­‐Message  Independence  – Crash  Recovery  Symmetry  – Reboot  Synchroniza+on  Symmetry  

•  Evalua+on  

28  

Page 29: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Local  state:  ls1  

SAMC  Architecture  

29  

interceptor  

a,  b  

c,  d  Local  state:  

ls2  

interceptor  Dynamic  Par+al  Order  Reduc+on  (DPOR)  Symmetry  

Basic  Reduc+on  Techniques  

Generic  Reduc+on  Policies  

LMI   CMI   CRS   RSS  

Protocol  Specific  Rules  

Leader  Elec+on  

Atomic  Broadcast   …  

release(c)  

SAMC

…  

Page 30: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Local-­‐Message  Independence  

30  

Generic  Reduc+on  Policies  

Local-­‐Message  Independence  (LMI)  

Crash-­‐Message  Independence  (CMI)  

Crash  Recovery  Symmetry  (CRS)  

Reboot  Synchroniza+on  Symmetry  (RSS)  

SAMC

Page 31: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Local-­‐Message  Independence  

•  Discard  parern  •  Increment  parern  

•  Constant  parern  31  

if (msg.type == ack) {! node.ackCount++;!}!

C=0  

ack  

C=1  

C=2  

ack  

boolean incrementPredicate(msg, ls) {! if (msg.type == ack)! return true;! else! return false;!}!

C=0  

ack  

C=1  

C=2  

ack  

Page 32: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Crash-­‐Message  Independence  

32  

Generic  Reduc+on  Policies  

Crash-­‐Message  Independence  (CMI)  

Local-­‐Message  Independence  (LMI)  

Crash  Recovery  Symmetry  (CRS)  

Reboot  Synchroniza+on  Symmetry  (RSS)  

SAMC

Page 33: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Crash-­‐Message  Independence  

33  

Black  Box  

ABCDX  

ABCXD  

ABXCD  

AXBCD  

XABCD  

ABDCX  

…  

void handleCrash() {! if (X == follower && isQuorum())! followerCount--;! // No new messages!}!

boolean localImpact(X, ls) {! if (X == follower && isQuorum())! return true;! else! return false;!}!

L  

F   F  

A,B   C,D  

F  

L  

F   F  

A,B   C,D  

F  

Page 34: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Crash-­‐Message  Independence  

34  

boolean globalImpact(X, ls) {! if (X == leader || !isQuorum())! return true;! else! return false;!}!

L  

F   F  

A,B   C,D  

F  

L  

S   S  S  

void handleCrash() {! if (X == leader || !isQuorum()) ! electLeader()! // New messages created!}!

Page 35: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Crash  Recovery  Symmetry  Reboot  Synchroniza+on  Symmetry  

35  

Generic  Reduc+on  Policies  

Crash  Recovery  Symmetry  (CRS)  

Local-­‐Message  Independence  (LMI)  

Crash-­‐Message  Independence  (CMI)  

Reboot  Synchroniza+on  Symmetry  (RSS)  

SAMC

Page 36: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Outline  

•  Intui+on  •  SAMC  – Local-­‐Message  Independence  – Crash-­‐Message  Independence  – Crash  Recovery  Symmetry  – Reboot  Synchroniza+on  Symmetry  

•  EvaluaJon  

36  

Page 37: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Evalua+on  

Cloud  systems   Protocol  

Cassandra   Gossiper  

Hinted  handoff  

Read/write  

Hadoop  2.0   Cluster  management  

Specula+ve  execu+on  

ZooKeeper   Atomic  broadcast  

Leader  elec+on  

37  

•  SAMC  implementa+on  – 10,000  LOC  from  scratch  

•  Apply  SAMC  to  3  cloud          systems  – 7  protocols  – 10  versions  

Page 38: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Protocol-­‐Specific  Rules  (e.g.  ZooKeeper  Leader  Elec+on)  

•  Guide  SAMC  to  remove  re-­‐orderings  •  35  LOC  on  average  per  protocol  

38  

Page 39: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Catching  Old  Bugs  

SAMC  

#exe  

117  

7  

53  

16  

100  

576  

11  

4  

53  

40  

104  

96  39  

A  table  shows  number  of  execu+ons  to  reach  the  bugs  and  speedup  

Issue#  

ZooKeeper-­‐335  

ZooKeeper-­‐790  

ZooKeeper-­‐975  

ZooKeeper-­‐1075  

ZooKeeper-­‐1419  

ZooKeeper-­‐1492  

ZooKeeper-­‐1653  

MapReduce-­‐4748  

MapReduce-­‐5489  

MapReduce-­‐5505  

Cassandra-­‐3395  

Cassandra-­‐3626  

Black-­‐Box  DPOR  

#exe   speedup  

5000+  

14  

967  

1081  

924  

5000+  

945  

22  

5000+  

1212  

2552  

5000+  

Random  

#exe   speedup  

1057  

225  

71  

86  

2514  

5000+  

3756  

6  

5000+  

5000+  

191  

5000+  

Random  DPOR  

#exe   speedup  

5000+  

82  

163  

250  

987  

5000+  

3462  

6  

5000+  

1210  

550  

5000+  

Page 40: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Catching  Old  Bugs  

SAMC  

#exe  

117  

7  

53  

16  

100  

576  

11  

4  

53  

40  

104  

96  40  

A  table  shows  number  of  execu+ons  to  reach  the  bugs  and  speedup  

Issue#  

ZooKeeper-­‐335  

ZooKeeper-­‐790  

ZooKeeper-­‐975  

ZooKeeper-­‐1075  

ZooKeeper-­‐1419  

ZooKeeper-­‐1492  

ZooKeeper-­‐1653  

MapReduce-­‐4748  

MapReduce-­‐5489  

MapReduce-­‐5505  

Cassandra-­‐3395  

Cassandra-­‐3626  

Black-­‐Box  DPOR  

#exe   speedup  

5000+   43+  

14   2  

967   18  

1081   68  

924   9  

5000+   9+  

945   86  

22   6  

5000+   94+  

1212   30  

2552   25  

5000+   52+  

Random  

#exe   speedup  

1057   9  

225   32  

71   1  

86   5  

2514   25  

5000+   9+  

3756   341  

6   2  

5000+   94+  

5000+   125+  

191   2  

5000+   52+  

Random  DPOR  

#exe   speedup  

5000+   43+  

82   12  

163   3  

250   16  

987   10  

5000+   9+  

3462   315  

6   2  

5000+   94+  

1210   30  

550   5  

5000+   52  

Page 41: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Reduc+on  Ra+o  •  ZooKeeper  leader  elec+on  protocol  •  Run  black-­‐box  DPOR  •  Ater  each  execu+on,  find  that  this  execu+on  is  executed  by  SAMC  or  not  

•  Count  DPOR’s  execu+ons  that  are  executed  by  SAMC  too  

41  

ReducJon  RaJo  

ALL   LMI   CMI   CRS   RSS  

37X   18X   5X   4X   3X  

63X   35X   6X   5X   5X  

103X   37X   9X   9X   14X  

#Crash   #Reboot  

1   1  

2   2  

3   3  

Page 42: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Conclusion  

•  Deep  bugs  live  in  the  cloud  •  Model  checker  needs  to  incorporate  complex  failure  to  reach  deep  bugs  –  State  space  explosion  

•  Seman+c-­‐aware  model  checking  –  LMI,  CMI,  CRS,  RSS  

•  Bring  future  research  ques+ons  – What  other  seman+c  knowledge  is  useful?  – How  to  extract  them  from  the  code  automa+cally?  

42  

Page 43: SAMC:Semac-AwareModelChecking forFastDiscoveryof … · 2019-12-18 · 1 SAMC:"Semac-Aware"Model"Checking" for"FastDiscovery"of" Deep"Bugs"in"Cloud"Systems" Tanakorn"Leesatapornwongsa,

Thank  You    

Ques+ons?  

43  

hrp://ucare.cs.uchicago.edu