![Page 1: Execution Replay for Multiprocessor Virtual Machines George W. Dunlap Dominic Lucchetti Michael A. Fetterman Peter M. Chen](https://reader030.vdocument.in/reader030/viewer/2022032800/56649d405503460f94a1a41f/html5/thumbnails/1.jpg)
Execution Replay for
Multiprocessor Virtual Machines
George W. Dunlap, Dominic Lucchetti
Michael A. Fetterman, Peter M. Chen
Big ideas
• Detection and replay of memory races is possible on commodity hardware
• Overhead high for some workloads
• …but surprisingly low for other workloads
Execution Replay
[Figure: diagram with the labels CPU, Memory, Disk, Network, Keyboard/mouse, Interrupts]
Uses of Execution Replay
• Reconstructing state
  – Fault tolerance
• Reconstructing execution
  – Debugging
  – Realistic trace generation
• Both
  – Intrusion analysis
Single-processor Replay
• Basic principles well understood
  – Log all non-deterministic inputs
  – Timing of asynchronous events
• Minimal overhead (Dunlap02)
  – 13% worst case
  – Log for months or years
• Available commercially
  – VMware: Record/Replay
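A minimal sketch of the "log all non-deterministic inputs" principle, assuming a hypothetical `guest` computation whose only source of non-determinism is an `inputs.read()` call. `Recorder`, `Replayer` and `guest` are illustrative names, not part of ReVirt or any commercial product, and the timing of asynchronous events is not modeled here.

```python
import random

class Recorder:
    """Passes inputs through to the guest and logs every value it returns."""
    def __init__(self, source):
        self.source = source
        self.log = []

    def read(self):
        value = self.source()
        self.log.append(value)
        return value

class Replayer:
    """Feeds the guest the logged values instead of live inputs."""
    def __init__(self, log):
        self._values = iter(log)

    def read(self):
        return next(self._values)

def guest(inputs, steps=5):
    """Deterministic computation apart from the values it reads."""
    total = 0
    for _ in range(steps):
        total = total * 31 + inputs.read()
    return total

recorder = Recorder(lambda: random.randint(0, 255))
recorded_result = guest(recorder)
replayed_result = guest(Replayer(recorder.log))
```

Because the guest is deterministic given its inputs, the replayed run reaches the same result as the recorded one.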
Replay for Multiprocessors
• Memory races in multiprocessor VMs
• The Ordering Requirement
• The CREW Protocol
  – Implementing with page protections
  – Relation to the Ordering Requirement
  – Generating constraints from CREW events
• DMA-capable devices and CREW
• Performance
The Multiprocessor Challenge
• Interleaved reads and writes
  – Fine-grained non-determinism
  – Much more difficult
• Existing solutions
  – Hardware modification
  – Software instrumentation
• SMP-ReVirt
  – Hardware MMU to detect sharing
Multiprocessor Replay
[Figure: processors P1 and P2 sharing Memory; a P1/P2 timeline with the accesses n=3, n=5 and if (n<4)]
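To make the race concrete, the sketch below enumerates every interleaving of the three accesses shown on the slide. It assumes, for illustration only, that one processor performs both writes (`n=3`, then `n=5`), the other performs the `if (n<4)` test, and `n` starts at 0; the slide text does not state which processor does what.

```python
def interleavings(xs, ys):
    """Yield every merge of xs and ys that keeps each list's own order."""
    if not xs or not ys:
        yield list(xs) + list(ys)
        return
    for rest in interleavings(xs[1:], ys):
        yield [xs[0]] + rest
    for rest in interleavings(xs, ys[1:]):
        yield [ys[0]] + rest

def run(schedule, initial_n=0):
    """Execute one global schedule and return the outcome of the branch."""
    n = initial_n
    branch_taken = None
    for op in schedule:
        if op == "n=3":
            n = 3
        elif op == "n=5":
            n = 5
        elif op == "if (n<4)":
            branch_taken = n < 4
    return branch_taken

writer = ["n=3", "n=5"]    # assumed to run on one processor
reader = ["if (n<4)"]      # assumed to run on the other
outcomes = {tuple(s): run(s) for s in interleavings(writer, reader)}
```

The three possible schedules do not all agree on the branch outcome, so logging external inputs alone cannot reproduce the execution; the order of the shared-memory accesses has to be captured too.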
Ordering Memory Accesses
• Preserving order will reproduce execution
  – a→b: “a happens-before b”
  – Ordering is transitive: a→b, b→c means a→c
• Two instructions must be ordered if:
  – they both access the same memory, and
  – one of them is a write
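The ordering rule can be written as a small predicate. This is a sketch with a hypothetical `Access` record; the extra check that the two accesses come from different processors is an assumption made here, on the grounds that accesses on one processor are already ordered by program order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Access:
    cpu: str        # processor issuing the access
    addr: int       # memory location touched
    is_write: bool  # True for a write, False for a read

def must_order(x, y):
    """True if replay has to preserve the relative order of x and y."""
    same_memory = x.addr == y.addr
    one_is_write = x.is_write or y.is_write
    return x.cpu != y.cpu and same_memory and one_is_write

read_p1 = Access("P1", 0x100, False)
read_p2 = Access("P2", 0x100, False)
write_p2 = Access("P2", 0x100, True)
write_elsewhere = Access("P2", 0x200, True)
```

Two reads of the same location need no ordering, which is what lets the concurrent-read state run without logging anything.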
Constraints: Enforcing order
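• To guarantee a→d, any one of these constraints suffices:
  – a→d
  – b→d
  – a→c
  – b→c
• Suppose we need b→c
  – b→c is necessary
  – a→d is redundant
[Figure: timelines for P1 and P2 with instructions a, b, c, d; label: overconstrained]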
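The redundancy argument can be checked mechanically with a reachability test over the happens-before edges. The sketch assumes the layout implied by the bullets: P1 executes a then b, and P2 executes c then d, so program order contributes a→b and c→d. The function name `implied` is illustrative.

```python
def implied(edges, src, dst):
    """True if src -> dst follows from `edges` by transitivity."""
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        for a, b in edges:
            if a == node and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
    return False

# Assumed program order: P1 runs a then b, P2 runs c then d.
program_order = {("a", "b"), ("c", "d")}

# Each candidate cross-processor constraint, added on its own.
candidates = [("a", "d"), ("b", "d"), ("a", "c"), ("b", "c")]
guarantees = {c: implied(program_order | {c}, "a", "d") for c in candidates}

# Once b -> c is logged, a -> d comes for free ...
a_d_redundant = implied(program_order | {("b", "c")}, "a", "d")
# ... but logging a -> d does not give b -> c.
b_c_from_a_d = implied(program_order | {("a", "d")}, "b", "c")
```

Every candidate guarantees a→d, yet only b→c also covers the case where b→c itself is required, so logging it makes the weaker constraint unnecessary.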
CREW Protocol
• Each shared object in one of two states:
  – Concurrent-Read: all processors can read, none can write
  – Exclusive-Write: one processor (the owner) can read and write; others have no access
CREW protocol, cont'd
• Enforced with hardware MMU
  – Read/write
  – Read-only
  – None
• Change CREW states on demand
  – Fault, fixup, re-execute
• CREW event
  – Increasing or reducing permission due to CREW state changes
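A minimal simulation of the fault, fixup, re-execute cycle for one shared page. This is a sketch under stated assumptions, not the SMP-ReVirt implementation: `CrewPage`, `Perm` and `access` are hypothetical names, the page is assumed to start in concurrent-read, and permissions are tracked in a dictionary rather than in real page tables.

```python
from enum import Enum

class Perm(Enum):
    NONE = 0
    READ = 1
    READ_WRITE = 2

class CrewPage:
    """Per-processor permissions for one shared page under CREW."""
    def __init__(self, cpus):
        # Assumed initial state: concurrent-read (all read-only).
        self.perm = {cpu: Perm.READ for cpu in cpus}
        self.events = []  # CREW events as (cpu, old_perm, new_perm)

    def _set(self, cpu, new):
        old = self.perm[cpu]
        if old != new:
            self.perm[cpu] = new
            self.events.append((cpu, old, new))

    def access(self, cpu, write):
        """Return True if the access faulted and the state was fixed up."""
        needed = Perm.READ_WRITE if write else Perm.READ
        if self.perm[cpu].value >= needed.value:
            return False  # permission already sufficient, no fault
        if write:
            # Move to exclusive-write: reduce everyone else, then raise owner.
            for other in self.perm:
                if other != cpu:
                    self._set(other, Perm.NONE)
            self._set(cpu, Perm.READ_WRITE)
        else:
            # Move to concurrent-read: reduce the owner, then raise the rest.
            for other in self.perm:
                if self.perm[other] == Perm.READ_WRITE:
                    self._set(other, Perm.READ)
            for other in self.perm:
                self._set(other, Perm.READ)
        return True  # caller re-executes the faulting instruction

page = CrewPage(["P1", "P2"])
first_read_faulted = page.access("P1", write=False)
write_faulted = page.access("P2", write=True)
events_after_write = list(page.events)
```

In this sketch permission reductions are always applied before the matching increase, which is the order the logged constraints later rely on.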
CREW Property
• If two instructions on different processors:
  – access the same page,
  – and one of them is a write,
  – there will be a CREW event on each processor between them.
Generating Constraints
• State: Concurrent Read
  – All processors read-only
• d*: CREW fault
• New state: P2 Exclusive
• r: privilege reduction
  – Read to None
• i: privilege increase
  – Read to Read/write
• Log timing of r and i
• Constraint:
  – r → i
[Figure: P1 and P2 timelines with the labels a, d, r, i and d*]
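The sketch below shows one way the logged r → i pair could be represented and then enforced during replay. It assumes, purely for illustration, that "timing" is recorded as a per-processor instruction count; `Point`, `log_write_fault` and `may_proceed` are hypothetical names, and the counts are made-up values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    """A position in one processor's execution."""
    cpu: str
    count: int  # hypothetical per-processor instruction count

def log_write_fault(faulting_cpu, counts):
    """Return the r -> i constraints for one CREW write fault.

    `counts` maps each cpu to its instruction count at the moment
    its permission on the page is changed.
    """
    i = Point(faulting_cpu, counts[faulting_cpu])
    return [(Point(cpu, n), i) for cpu, n in counts.items() if cpu != faulting_cpu]

def may_proceed(cpu, count, progress, constraints):
    """During replay, let (cpu, count) run only once every logged r is reached."""
    target = Point(cpu, count)
    return all(progress[r.cpu] >= r.count
               for r, i in constraints if i == target)

# P2 faults at d* while P1 has executed 120 instructions (made-up numbers).
constraints = log_write_fault("P2", {"P1": 120, "P2": 45})
blocked = may_proceed("P2", 45, {"P1": 100, "P2": 45}, constraints)
released = may_proceed("P2", 45, {"P1": 120, "P2": 45}, constraints)
```

During replay P2 waits at the point of its privilege increase until P1 has passed the point of its privilege reduction, so P1's earlier reads of the page still precede P2's write.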
Direct Memory Access
• Device accesses memory directly
• Logically another processor
  – Reads and writes need to be ordered
  – IOMMU: can’t fault/fixup/re-execute
• Observation: Transaction model
• Device: non-preemptible actor
Prototype: SMP-ReVirt
• Modified Xen hypervisor
• Implement logging, CREW protocol
• Details in paper
Evaluation questions
• What is the overhead?
• What affects performance?
  – In paper
• When might I want to use MP?
  – Log with 1, 2, or N cpus?
Evaluation Workloads
• SPLASH2 parallel application suite
  – FMM, LU, ocean, radix, water-spatial, radiosity
• Kernel-build
• Dbench
Predicting results
• Key changes in sharing attributes
  – 4096-byte sharing granularity
  – “Miss” is very expensive
• SPLASH2
  – Good: high spatial locality / low false sharing
  – Bad: random access patterns / high false sharing
• The Linux kernel
  – Tuned to 16-byte cacheline
  – Involving the kernel may be expensive
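To illustrate why the coarser granularity raises false sharing, the sketch below checks whether two addresses fall in the same sharing unit. The 4096-byte and 16-byte sizes are the ones quoted on the slide; the two addresses and the function name `same_unit` are made up for the example.

```python
PAGE_SIZE = 4096  # sharing granularity under page protections
LINE_SIZE = 16    # cacheline size quoted on the slide

def same_unit(addr_a, addr_b, granularity):
    """True if both byte addresses fall in the same aligned unit."""
    return addr_a // granularity == addr_b // granularity

# Two hypothetical variables 2 KiB apart, each used by a different CPU.
x_addr, y_addr = 0x1000, 0x1800

shared_at_line = same_unit(x_addr, y_addr, LINE_SIZE)
shared_at_page = same_unit(x_addr, y_addr, PAGE_SIZE)
```

The two variables never share a cacheline, but they do share a page, so data laid out to avoid cacheline contention can still trigger CREW faults.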
Single-processor Xen guests
[Figure: bar chart of normalized runtime for FMM, LU, ocean, radix, water-spatial, kernel-build, radiosity and dbench. Series: unmodified 1-cpu guest, logging 1-cpu guest. Bar labels as extracted: 1.00, 1.04, 1.01, 1.00, 1.03, 1.13, 1.00, 1.05.]
Log Growth Rate

| Workload | Log growth (GB/day) | Days to fill 300 GB |
|---|---|---|
| FMM | 0.234 | 1280 |
| LU | 0.237 | 1261 |
| Ocean | 0.232 | 1295 |
| Radix | 0.292 | 1025 |
| Water-spatial | 0.232 | 1296 |
| Kernel-build | 0.564 | 531 |
| Radiosity | 0.231 | 1295 |
| Dbench | 0.557 | 538 |
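The last column follows from the second by dividing the 300 GB capacity by the growth rate. The short check below recomputes it from the rates in the table; small differences from the listed values are expected, since the rates are presumably rounded to three decimals. The function name `days_to_fill` is illustrative.

```python
DISK_GB = 300

def days_to_fill(growth_gb_per_day, capacity_gb=DISK_GB):
    """Days until a log growing at a constant rate fills the disk."""
    return capacity_gb / growth_gb_per_day

# (workload, GB/day, days listed in the table)
rows = [
    ("FMM", 0.234, 1280), ("LU", 0.237, 1261), ("Ocean", 0.232, 1295),
    ("Radix", 0.292, 1025), ("Water-spatial", 0.232, 1296),
    ("Kernel-build", 0.564, 531), ("Radiosity", 0.231, 1295),
    ("Dbench", 0.557, 538),
]

# Relative gap between the recomputed and the listed number of days.
relative_gap = {
    name: abs(days_to_fill(rate) - listed) / listed
    for name, rate, listed in rows
}
```

Every row agrees with the listed value to within one percent.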
2-processor Xen guests
[Figure: bar chart of normalized runtime for FMM, LU, ocean, radix, water-spatial and kernel-build. Series: unmodified 2-cpu guest, logging 2-cpu guest, logging 1-cpu guest. Bar labels as extracted: 1.51, 1.00, 1.08, 1.60, 1.48, 2.10, 1.90, 1.76, 1.96, 1.74, 1.83, 1.99.]
2-processor, cont'd
[Figure: bar chart of normalized runtime for radiosity and dbench. Series: unmodified 2-cpu guest, logging 2-cpu guest, logging 1-cpu guest. Bar labels as extracted: 8.70, 7.21, 1.85, 1.88.]
Log Growth Rate

| Workload | Log growth (GB/day) | Days to fill 300 GB |
|---|---|---|
| FMM | 34.5 | 8.7 |
| LU | 3.2 | 92.7 |
| Ocean | 4.3 | 69.1 |
| Radix | 39.8 | 7.5 |
| Water-spatial | 36.3 | 8.25 |
| Kernel-build | 43.3 | 6.9 |
| Radiosity | 88.4 | 3.4 |
| Dbench | 77.0 | 3.9 |
4-processor Xen guests
[Figure: bar chart of normalized runtime for FMM, LU, ocean, radix, water-spatial and kernel-build. Series: unmodified domain with 4 cpus; CREW logging with 4 cpus, 2 cpus* and 1 cpu. Bar labels as extracted: 7.36, 1.12, 1.28, 4.20, 1.72, 9.03.]
Recap
• Memory races in multiprocessor VMs
• The Ordering Requirement
• The CREW Protocol
  – Implementing with page protections
  – Relation to the Ordering Requirement
  – Generating constraints from CREW events
• DMA-capable devices and CREW
• Performance
Big ideas
• Detection and replay of memory races is possible on commodity hardware
• Overhead high for some workloads
• …but surprisingly low for other workloads
Questions