Yatin Manerkar
Automated Full-Stack Memory Model Verification with the Check suite
http://check.cs.princeton.edu/
Princeton University
ARM Cambridge, July 20th, 2018
What are Memory (Consistency) Models?
[Figure: software/hardware stack. Labels: Java Bytecode, C11/C++11, Cuda, OpenCL; JVM, LLVM IR, PTX, SPIR; x86 CPU, ARM CPU, Power CPU, Nvidia GPU, AMD GPU, …; Shared Virtual Memory]
Memory Consistency Models (MCMs)
Specify rules and guarantees about the ordering and visibility of accesses to shared memory [Sorin et al., 2011].
[Figure builds: the same stack, highlighting first HLL MCMs and then ISA-level MCMs]
Sequential Consistency (SC) - Interleaving Model
▪ Defined by [Lamport 1979]: execution is the same as if:
(R1) Memory ops of each processor appear in program order
(R2) Memory ops of all processors were executed in some total order
(load reads the value of last store to its address in the total order)
Program (mp litmus test) (all addrs initially 0)
Core 0: x=1; y=1
Core 1: r1=y; r2=x
Legal Executions
x=1; y=1; r1=y; r2=x → r1=1, r2=1
x=1; r1=y; y=1; r2=x → r1=0, r2=1
x=1; r1=y; r2=x; y=1 → r1=0, r2=1
r1=y; r2=x; x=1; y=1 → r1=0, r2=0
r1=y; x=1; r2=x; y=1 → r1=0, r2=1
r1=y; x=1; y=1; r2=x → r1=0, r2=1
Illegal Outcome: r1=1, r2=0
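The interleaving model above can be checked mechanically. The following is a minimal Python sketch, assuming a toy encoding of the mp litmus test (the tuple format and the names `CORE0`, `CORE1`, `interleavings`, and `run` are illustrative, not part of any Check tool). It enumerates every total order that respects program order and collects the resulting register values.

```python
from itertools import combinations

# Toy encoding (an assumption for this sketch): (kind, address, value-or-register).
CORE0 = [("st", "x", 1), ("st", "y", 1)]
CORE1 = [("ld", "y", "r1"), ("ld", "x", "r2")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list in program order (R1)."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(order):
    """Execute one total order (R2); a load reads the last store to its address."""
    mem = {"x": 0, "y": 0}  # all addresses initially 0
    regs = {}
    for kind, addr, arg in order:
        if kind == "st":
            mem[addr] = arg
        else:
            regs[arg] = mem[addr]
    return regs["r1"], regs["r2"]

sc_outcomes = {run(order) for order in interleavings(CORE0, CORE1)}
print(sorted(sc_outcomes))  # [(0, 0), (0, 1), (1, 1)]
```

The pair (r1, r2) = (1, 0) never appears, which matches the illegal outcome on the slide.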
Hardware Implements Weak Memory Models
▪ Most processors don’t implement SC
• x86: Total Store Order (TSO): Relaxes Write->Read ordering
• ARMv8 and Power relax more orderings
▪ Compilation to weak memory ISAs must maintain ordering guarantees
• [Owens et al. TPHOLs 2009], [Batty et al. POPL 2011, POPL 2012], [Wickerson et al. OOPSLA 2015], …
C11 Source Code
atomic<int> x = 0; atomic<int> y = 0;
Thread 0: x = 1; y = 1;
Thread 1: r1 = y; r2 = x;
C11 Forbids: r1 = 1, r2 = 0
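To make "relaxes Write->Read ordering" concrete, here is a rough Python sketch of a store-buffer machine. It is a simplification assumed for illustration and not a complete x86-TSO model (no fences, no locked instructions). Each core gets a FIFO store buffer, loads forward from the core's own buffer, and buffered stores drain to memory at arbitrary times. The `SB` (store buffering) test is not from the slides; it is added here because it is the standard test that separates TSO from SC.

```python
def tso_outcomes(programs, addrs=("x", "y")):
    """Explore all executions on a toy store-buffer machine; return (r1, r2) pairs.

    Assumes every program set writes registers named r1 and r2.
    """
    init = (
        tuple(0 for _ in programs),    # program counter per core
        tuple(() for _ in programs),   # FIFO store buffer per core
        tuple((a, 0) for a in addrs),  # memory, all addresses initially 0
        (),                            # registers written so far
    )
    stack, seen, finals = [init], {init}, set()
    while stack:
        pcs, bufs, mem, regs = stack.pop()
        memory = dict(mem)
        nxt = []
        for c, prog in enumerate(programs):
            if pcs[c] < len(prog):
                kind, addr, arg = prog[pcs[c]]
                pcs2 = pcs[:c] + (pcs[c] + 1,) + pcs[c + 1:]
                if kind == "st":
                    # A store only enters the local buffer; memory is unchanged.
                    bufs2 = bufs[:c] + (bufs[c] + ((addr, arg),),) + bufs[c + 1:]
                    nxt.append((pcs2, bufs2, mem, regs))
                else:
                    # A load reads the newest matching buffered store, else memory.
                    val = memory[addr]
                    for a, v in bufs[c]:
                        if a == addr:
                            val = v
                    regs2 = tuple(sorted({**dict(regs), arg: val}.items()))
                    nxt.append((pcs2, bufs, mem, regs2))
            if bufs[c]:
                # Drain the oldest buffered store to memory.
                (a, v), rest = bufs[c][0], bufs[c][1:]
                mem2 = tuple(sorted({**memory, a: v}.items()))
                bufs2 = bufs[:c] + (rest,) + bufs[c + 1:]
                nxt.append((pcs, bufs2, mem2, regs))
        if not nxt:  # all instructions done and all buffers empty
            finals.add((dict(regs)["r1"], dict(regs)["r2"]))
        for state in nxt:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return finals

MP = [[("st", "x", 1), ("st", "y", 1)], [("ld", "y", "r1"), ("ld", "x", "r2")]]
SB = [[("st", "x", 1), ("ld", "y", "r1")], [("st", "y", 1), ("ld", "x", "r2")]]

print(sorted(tso_outcomes(MP)))  # [(0, 0), (0, 1), (1, 1)]
print(sorted(tso_outcomes(SB)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In this toy model mp still forbids r1 = 1, r2 = 0 because stores drain in order and loads execute in order, while sb allows r1 = 0, r2 = 0, an outcome no SC interleaving produces.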
Compile: C11 Source Code → ARMv8 Assembly Language
Initially, [x] = [y] = 0
Core 0: stl #1, [x]; stl #1, [y]
Core 1: lda r1, [y]; lda r2, [x]
ARMv8 forbids: r1 = 1, r2 = 0
Is the ARMv8 hardware correctly implementing the ARMv8 MCM?
MCM Verification is a Full-Stack Problem!
High-Level Languages (HLL)
Compiler
Architecture (ISA)
OS
▪Each layer has responsibilities for ensuring correct MCM operation
▪Need MCM checking tools at all layers of the computing stack!
Is compiler maintaining HLL guarantees?
Is the ISA-level MCM formally defined?
[Batty et al. POPL 2011, POPL 2012]
[Wickerson et al. OOPSLA 2015]…
[Alglave et al. TOPLAS 2014]
Lower layers of the stack: Microarchitecture, Processor RTL
Is hardware incorrectly reordering instructions?
Are virtual memory mappings correct?
Is RTL correctly implementing microarchitecture?
Check Suite: Full-Stack Automated MCM Analysis
High-Level Languages (HLL)
Compiler
Architecture (ISA)
Microarchitecture
OS
Processor RTL
▪ Suite of tools at various levels of computing stack
▪ Automated Full-Stack MCM checking across litmus test suites
PipeCheck & CCICheck [Lustig et al. MICRO 2014] [Manerkar et al. MICRO 2015]
COATCheck [Lustig et al. ASPLOS 2016]
TriCheck [Trippel et al. ASPLOS 2017]
RTLCheck [Manerkar et al. MICRO 2017]
PipeCheck & CCICheck: Does microarchitecture correctly implement ISA MCM?
RTLCheck: Does RTL like Verilog correctly implement microarchitecture?
TriCheck: Do HLL, Compiler, and microarchitecture work together correctly?
So far, tools have found bugs in:
• Widely-used gem5 Research simulator
• Cache coherence paper (TSO-CC)
• IBM XL C++ compiler (fixed in v13.1.5)
• In-design commercial processors
• RISC-V draft ISA specification
• Compiler mapping proofs
• C11 memory model
• Open-source processor RTL
Modelling Microarchitecture: Going below the ISA
▪ Hardware enforces consistency model using smaller localized orderings
• In-order fetch/decode/execute…
• Orderings enforced by memory hierarchy
• …and many more
[Figure: two core pipelines, each labelled Fetch, Dec., Exec., Mem., WB, SB, L1; further labels L2, Lds., Memory Hierarchy]
Pipeline stages may be FIFO to ensure in-order execution
Do individual orderings correctly work together to satisfy consistency model?
Microarchitectural Consistency Checking
Inputs: Microarchitecture in µspec DSL, Litmus Test
Axiom "Decode_is_FIFO":
... EdgeExists ((i1, Decode), (i2, Decode))
=> AddEdge ((i1, Execute), (i2, Execute)).
Axiom "PO_Fetch":
... SameCore i1 i2 /\ ProgramOrder i1 i2 =>
AddEdge ((i1, Fetch), (i2, Fetch)).
Each axiom specifies an ordering that µarch should respect
Microarchitectural happens-before (µhb) graphs
Microarch. verification checks that combination of axioms satisfies MCM
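As a rough illustration of how such axioms contribute edges to a µhb graph, the sketch below renders the two axioms from the slides as Python functions. This is a hypothetical rendering, not µspec and not how PipeCheck is implemented. The instruction dictionaries, the function names, and the hand-seeded Decode edge are all assumptions; the slides elide the quantifiers (the "...") and do not show the axiom that would produce the Decode edge.

```python
# Nodes are (instruction name, pipeline stage); edges are (node, node) pairs.

def po_fetch(instrs, edges):
    """PO_Fetch: same-core instructions in program order are fetched in order."""
    for i1 in instrs:
        for i2 in instrs:
            if i1["core"] == i2["core"] and i1["index"] < i2["index"]:
                edges.add(((i1["name"], "Fetch"), (i2["name"], "Fetch")))

def decode_is_fifo(instrs, edges):
    """Decode_is_FIFO: a Decode ordering between i1 and i2 implies the same
    ordering at Execute."""
    for i1 in instrs:
        for i2 in instrs:
            if ((i1["name"], "Decode"), (i2["name"], "Decode")) in edges:
                edges.add(((i1["name"], "Execute"), (i2["name"], "Execute")))

# Two instructions on core 0, in program order (hypothetical data).
instrs = [
    {"name": "i1", "core": 0, "index": 0},
    {"name": "i2", "core": 0, "index": 1},
]
# Seed a Decode edge by hand, since the axiom producing it is not on the slide.
edges = {(("i1", "Decode"), ("i2", "Decode"))}

po_fetch(instrs, edges)
decode_is_fifo(instrs, edges)
for edge in sorted(edges):
    print(edge)
```

Each function only ever adds edges, so the order in which axioms are applied does not change the final edge set in this small example.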
PipeCheck: Executions as µhb Graphs [Lustig et al. MICRO 2014]
Litmus Test mp
[Figure: µhb graph for mp, built up one instruction at a time. Core 0 instructions (i1) and (i2) have nodes Fetch, Dec., Exec., Mem., WB, SB, Mem Hier.; Core 1 instructions (i3) and (i4) have nodes Fetch, Dec., Exec., Mem., WB]
PipeCheck: Microarchitectural Correctness
▪ Cycle in µhb graph => event has to happen before itself (impossible)
▪ Cyclic graph → unobservable on µarch
▪ Acyclic graph → observable on µarch
▪ Exhaustively enumerate and check all possible execs of litmus test on µarch
• Implemented using fast SMT solvers
• Compare against ISA-level outcome from herd [Alglave et al. TOPLAS 2014]
Litmus Test mp
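The observability rule can be sketched with an ordinary graph search. The slides state that PipeCheck uses SMT solvers; the depth-first search below is a swapped-in stand-in to show the idea, and the node names and the helper names `has_cycle` and `observable` are hypothetical.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
        succ.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in succ}

    def visit(node):
        color[node] = GREY
        for nxt in succ[node]:
            # Reaching a GREY node means we looped back onto the current path.
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in succ)

def observable(graphs):
    """An outcome is observable if at least one candidate µhb graph is acyclic."""
    return any(not has_cycle(g) for g in graphs)

acyclic = [(("i1", "Fetch"), ("i2", "Fetch")), (("i1", "Fetch"), ("i1", "Decode"))]
cyclic = acyclic + [(("i1", "Decode"), ("i1", "Fetch"))]

print(has_cycle(acyclic))             # False
print(has_cycle(cyclic))              # True
print(observable([cyclic, acyclic]))  # True
print(observable([cyclic]))           # False
```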
| ISA-Level Outcome | Observable (≥ 1 Graph Acyclic) | Not Observable (All Graphs Cyclic) |
| --- | --- | --- |
| Allowed | OK | OK (stricter than necessary) |
| Forbidden | Consistency violation! | OK |
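The four cases of this comparison fit in a small function. This is only a sketch; the name `verdict` and its boolean parameters are assumptions made for illustration.

```python
def verdict(isa_allowed: bool, uarch_observable: bool) -> str:
    """Compare the ISA-level outcome (e.g. from herd) with µarch observability."""
    if uarch_observable:
        return "OK" if isa_allowed else "Consistency violation!"
    return "OK (stricter than necessary)" if isa_allowed else "OK"

for allowed in (True, False):
    for seen in (True, False):
        print(f"allowed={allowed}, observable={seen}: {verdict(allowed, seen)}")
```

Only one of the four combinations is a bug: the microarchitecture can produce an outcome that the ISA-level model forbids.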
Abstracted memory hierarchy prevents verification of complex coherence issues!
CCICheck: Coherence vs Consistency
▪ Memory hierarchy is a collection of caches
• Coherence protocols ensure that all caches agree on the value of any variable
▪ CCICheck [Manerkar et al. MICRO 2015] shows that consistency verification often cannot simply treat memory hierarchy abstractly
• Nominated for Best Paper at MICRO 2015
[Figure: the computing stack (High-Level Languages (HLL), Compiler, Architecture (ISA), Microarchitecture, OS, Processor RTL) beside two core pipelines labelled Fetch, Dec., Exec., Mem., WB, SB, L1; further labels L2, Lds., Memory Hierarchy]
[Figure build: the Memory Hierarchy label is replaced by Coherence Protocol (SWMR, DVI, etc.)]
Coherence Protocol Example
▪ If P1 updates the value of x to 200, the stale value of x in other processors must be invalidated
▪ If P3 wants to subsequently read/write x, it must request the new value
▪ SWMR = Single-Writer Multiple Readers, DVI = Data Value Invariant
Processors: P1, P2, P3
Caches: x = 100, x = 100, x = 100
![Page 45: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/45.jpg)
Coherence Protocol Example
[Diagram: St x = 200 is issued; the caches of P1, P2, P3 each hold x = 100]
![Page 46: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/46.jpg)
Coherence Protocol Example
[Diagram: St x = 200; Invalidations are sent for the x = 100 copies in the other caches]
![Page 47: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/47.jpg)
Coherence Protocol Example
[Diagram: caches of P1, P2, P3 show x = 200, x = 100, x = 100]
![Page 48: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/48.jpg)
Coherence Protocol Example
[Diagram: Ld x is issued and a Request Data message is sent; caches show x = 200, x = 100, x = 100]
![Page 49: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/49.jpg)
Coherence Protocol Example
[Diagram: a Data Response answers Ld x; caches show x = 200, x = 100, and x = 100 updated to x = 200]
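The invalidate-then-request sequence in this example can be sketched in a few lines of Python. This is only an illustrative toy, not the protocol from the talk: the `Cache` and `InvalidatingProtocol` classes, the `owner` map, and the method names are all assumptions made for the sketch.

```python
class Cache:
    """A private cache: maps address -> value; a missing address means Invalid."""
    def __init__(self, name, lines=None):
        self.name = name
        self.lines = dict(lines or {})


class InvalidatingProtocol:
    """Toy invalidation-based protocol (hypothetical, for illustration only)."""
    def __init__(self, caches, memory):
        self.caches = caches
        self.memory = dict(memory)
        self.owner = {}  # address -> cache holding the most recently written value

    def store(self, writer, addr, value):
        # Single writer: every other copy of the line is invalidated first.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)
        writer.lines[addr] = value
        self.owner[addr] = writer

    def load(self, reader, addr):
        # On a miss, request the data from the current owner (or from memory).
        if addr not in reader.lines:
            owner = self.owner.get(addr)
            reader.lines[addr] = owner.lines[addr] if owner else self.memory[addr]
        return reader.lines[addr]


# Initial state from the slides: P1, P2, P3 all cache x = 100.
p1, p2, p3 = (Cache(n, {"x": 100}) for n in ("P1", "P2", "P3"))
protocol = InvalidatingProtocol([p1, p2, p3], {"x": 100})

protocol.store(p1, "x", 200)               # St x = 200 invalidates P2 and P3
value_seen_by_p3 = protocol.load(p3, "x")  # Ld x misses and requests the new value
```

In this sketch P3 can only observe the new value because its stale copy was removed before the store completed.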
![Page 50: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/50.jpg)
Motivating Example – “Peekaboo” [Sorin et al. Primer 2011]
▪ Three optimizations: correct individually, but not in combination
![Page 51: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/51.jpg)
1. Prefetching
![Page 52: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/52.jpg)
2. Invalidation before use
• Invalidation can arrive before data
• Acknowledge Inv early rather than wait for data to arrive
• But repeated inv before use → livelock [Kubiatowicz et al. ASPLOS 1992]
![Page 53: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/53.jpg)
3. Livelock avoidance: allow destination core to perform one operation on data when it arrives, even if already invalidated [Sorin et al. Primer 2011]
• Does not break coherence
• Sometimes intentionally returns stale data
![Page 54: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/54.jpg)
Motivating Example – “Peekaboo”
▪ Consider mp with the livelock-avoidance mechanism:
Core 0: [x] ← 1; [y] ← 1 (cache state x: Shared, y: Modified)
Core 1: r1 ← [y]; r2 ← [x] (cache state x: Invalid, y: Invalid)
Optimizations:
1. Prefetching
2. Invalidation-before-use
3. Livelock avoidance
![Page 55: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/55.jpg)
Motivating Example – “Peekaboo”
Core 0: [x] ← 1; [y] ← 1 (cache state x: Shared, y: Modified)
Core 1: r1 ← [y]; r2 ← [x] (cache state x: Invalid, y: Invalid)
Messages: Prefetch x
![Page 56: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/56.jpg)
Motivating Example – “Peekaboo”
Core 0: [x] ← 1; [y] ← 1 (cache state x: Shared, y: Modified)
Core 1: r1 ← [y]; r2 ← [x] (cache state x: Invalid, y: Invalid)
Messages: Prefetch x, Data (x = 0)
![Page 57: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/57.jpg)
Motivating Example – “Peekaboo”
Core 0: [x] ← 1; [y] ← 1 (cache state x: Shared, y: Modified)
Core 1: r1 ← [y]; r2 ← [x] (cache state x: Invalid, y: Invalid)
Messages: Prefetch x, Data (x = 0), Inv
![Page 58: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/58.jpg)
Motivating Example – “Peekaboo”
Core 0: [x] ← 1; [y] ← 1 (cache state x: Shared, y: Modified)
Core 1: r1 ← [y]; r2 ← [x] (cache state x: Invalid, y: Invalid)
Messages: Prefetch x, Data (x = 0), Inv, Inv-Ack
![Page 59: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/59.jpg)
Motivating Example – “Peekaboo”
Core 0: [x] ← 1; [y] ← 1 (cache state x: Modified, y: Modified)
Core 1: r1 ← [y]; r2 ← [x] (cache state x: Invalid, y: Invalid)
Messages: Prefetch x, Data (x = 0), Inv, Inv-Ack
![Page 60: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/60.jpg)
![Page 61: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/61.jpg)
Motivating Example – “Peekaboo”
Core 0: [x] ← 1; [y] ← 1 (cache state x: Modified, y: Modified)
Core 1: r1 ← [y]; r2 ← [x] (cache state x: Invalid, y: Invalid)
Messages: Prefetch x, Data (x = 0), Inv, Inv-Ack, Request y
![Page 62: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/62.jpg)
Motivating Example – “Peekaboo”
Core 0: [x] ← 1; [y] ← 1 (cache state x: Modified, y: Shared)
Core 1: r1 = 1; r2 ← [x] (cache state x: Invalid, y: Shared)
Messages: Prefetch x, Data (x = 0), Inv, Inv-Ack, Request y, Data (y = 1)
![Page 63: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/63.jpg)
Motivating Example – “Peekaboo”
Core 0: [x] ← 1; [y] ← 1 (cache state x: Modified, y: Shared)
Core 1: r1 = 1; r2 ← [x] (cache state x: Invalid, y: Shared)
Messages: Prefetch x, Inv, Inv-Ack, Request y, Data (y = 1), Data (x = 0)
![Page 64: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/64.jpg)
Motivating Example – “Peekaboo”
Core 0: [x] ← 1; [y] ← 1 (cache state x: Modified, y: Shared)
Core 1: r1 = 1; r2 = 0 (cache state x: Invalid, y: Shared)
Messages: Prefetch x, Inv, Inv-Ack, Request y, Data (y = 1), Data (x = 0)
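To see why the final outcome of this example (r1 = 1, r2 = 0) is a problem, it helps to enumerate what mp can produce under sequential consistency. The sketch below is not part of the Check suite; it is a small brute-force enumerator written for illustration, and the tuple encoding of the instructions is an assumption of the sketch.

```python
from itertools import combinations

# mp litmus test, with x and y initially 0.
CORE0 = [("st", "x", 1), ("st", "y", 1)]        # [x] <- 1; [y] <- 1
CORE1 = [("ld", "y", "r1"), ("ld", "x", "r2")]  # r1 <- [y]; r2 <- [x]


def sc_outcomes(t0, t1):
    """Return every (r1, r2) result over all SC interleavings of two threads."""
    n, m = len(t0), len(t1)
    results = set()
    # Pick which positions of the merged order belong to thread 0;
    # program order inside each thread is preserved.
    for slots in combinations(range(n + m), n):
        mem = {"x": 0, "y": 0}
        regs = {}
        i = j = 0
        for pos in range(n + m):
            if pos in slots:
                op, loc, arg = t0[i]
                i += 1
            else:
                op, loc, arg = t1[j]
                j += 1
            if op == "st":
                mem[loc] = arg
            else:
                regs[arg] = mem[loc]
        results.add((regs["r1"], regs["r2"]))
    return results


outcomes = sc_outcomes(CORE0, CORE1)
peekaboo_outcome = (1, 0)
peekaboo_is_sc = peekaboo_outcome in outcomes
```

No interleaving yields r1 = 1 together with r2 = 0, so the result produced by the three combined optimizations is one that a sequentially consistent machine can never show.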
![Page 65: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/65.jpg)
The Coherence-Consistency Interface (CCI)
▪ CCI = coherence protocol guarantees to microarch. + orderings microarch. expects from coherence protocol
[Diagram labels: Expected Coherence; SWMR, DVI, No Stale Data; Consistency]
![Page 66: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/66.jpg)
![Page 67: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/67.jpg)
![Page 68: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/68.jpg)
![Page 69: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/69.jpg)
![Page 70: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/70.jpg)
The Coherence-Consistency Interface (CCI)
▪ CCI = coherence protocol guarantees to microarch. + orderings microarch. expects from coherence protocol
[Diagram labels: Expected Coherence; SWMR, DVI, No Livelock; Consistency]
![Page 71: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/71.jpg)
![Page 72: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/72.jpg)
The Coherence-Consistency Interface (CCI)
▪ CCI = coherence protocol guarantees to microarch. + orderings microarch. expects from coherence protocol
[Diagram labels: Expected Coherence; SWMR, DVI, No Livelock; CCI Mismatch; Consistency Violation!]
![Page 73: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/73.jpg)
ViCL: Value in Cache Lifetime
▪ Need a way to model cache occupancy and coherence events for:
• Coherence protocol optimizations (e.g. Peekaboo)
• Partial incoherence and lazy coherence (GPUs, etc.)
▪ A ViCL is a 4-tuple:
(cache_id, address, data_value, generation_id)
▪ cache_id and generation_id uniquely identify each cache line
▪ A ViCL 4-tuple maps onto the period of time over which the cache line serves the data value for the address
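One way to picture the 4-tuple is as a small record paired with a lifetime interval. The sketch below is only an illustration: the field types, the integer timestamps, the cache name, and the `serving_vicl` helper are assumptions of the sketch and do not come from CCICheck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViCL:
    """The 4-tuple from the slide; the field types are assumed."""
    cache_id: str
    address: str
    data_value: int
    generation_id: int


def serving_vicl(lifetimes, cache_id, address, time):
    """Return the ViCL whose lifetime covers `time` for this cache and address."""
    for vicl, (create, expire) in lifetimes.items():
        if (vicl.cache_id == cache_id and vicl.address == address
                and create <= time < expire):
            return vicl
    return None


# Hypothetical lifetimes: ViCL -> (create_time, expire_time).
lifetimes = {
    ViCL("core1-L1", "x", 0, 0): (2, 5),  # first generation holds x = 0
    ViCL("core1-L1", "x", 1, 1): (7, 9),  # a later generation holds x = 1
}

early = serving_vicl(lifetimes, "core1-L1", "x", 3)
gap = serving_vicl(lifetimes, "core1-L1", "x", 6)
late = serving_vicl(lifetimes, "core1-L1", "x", 8)
```

Between the two lifetimes no ViCL serves the address, which corresponds to the line not being usable in that cache during that period.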
![Page 74: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/74.jpg)
ViCLs in µhb Graphs
▪ ViCLs start at a ViCL Create event and end at a ViCL Expire event
• Correspond to nodes in µhb graphs
• Axioms over these nodes and edges enforce coherence and data movement orderings
▪ Use pipeline model from PipeCheck, but add ViCL nodes and edges
[Figure: Litmus Test co-mp]
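In PipeCheck-style analysis, a candidate execution is ruled out when its µhb graph contains a cycle, since no event can happen before itself. The sketch below shows that check on a toy graph with ViCL Create and Expire nodes. The node names and edges are hypothetical and are not taken from the co-mp figure.

```python
from collections import defaultdict


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as (src, dst) pairs."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
        graph[dst]  # make sure nodes without outgoing edges are present too
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True  # back edge: a cycle exists
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))


# Hypothetical nodes: (instruction, event).
create = ("St x", "ViCL Create")
expire = ("St x", "ViCL Expire")
load = ("Ld x", "reads L1")

# A load that reads the value must fall inside the ViCL's lifetime.
edges = [(create, expire), (create, load), (load, expire)]

within_lifetime_cyclic = has_cycle(edges)
# Requiring the load to also happen after the ViCL expired is contradictory.
after_expiry_cyclic = has_cycle(edges + [(expire, load)])
```

The first graph is acyclic, so that ordering of events is possible; the second is cyclic, so an execution needing both constraints is unobservable.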
![Page 75: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/75.jpg)
![Page 76: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/76.jpg)
![Page 77: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/77.jpg)
![Page 78: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/78.jpg)
![Page 79: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/79.jpg)
µhb Graph for the Peekaboo Problem
▪ Additional nodes represent ViCL requests and invalidations
▪ Solution: Invalidated data only usable if accessing load/store is oldest in program order at time of request [Sorin et al. Primer 2011]
▪ TSO-CC protocol [Elver and Nagarajan HPCA 2014] was vulnerable to variant of Peekaboo!
• Now fixed
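The solution rule can be read as a simple predicate over program-order positions. The sketch below is one possible reading of that rule and not an implementation from the Primer or from TSO-CC; the function name and the integer program-order indices are assumptions of the sketch.

```python
def may_use_invalidated_data(access_index, oldest_index_at_request):
    """Allow one use of already-invalidated data only if the accessing
    load/store was the oldest in program order when the request was sent."""
    return access_index == oldest_index_at_request


# Core 1 of mp: r1 <- [y] is index 0, r2 <- [x] is index 1.
# The prefetch of x was sent while r1 <- [y] (index 0) was still the oldest access,
# so r2 <- [x] may not consume the invalidated prefetched data.
prefetched_use = may_use_invalidated_data(access_index=1, oldest_index_at_request=0)

# A demand request for x sent once r2 <- [x] is itself the oldest access is allowed.
demand_use = may_use_invalidated_data(access_index=1, oldest_index_at_request=1)
```

Under this reading, the stale x = 0 from the prefetch is rejected and r2 must re-request x, which removes the r1 = 1, r2 = 0 outcome while still letting the oldest access make progress.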
![Page 80: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/80.jpg)
![Page 81: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/81.jpg)
![Page 82: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/82.jpg)
![Page 83: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/83.jpg)
![Page 84: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/84.jpg)
CCICheck Takeaways
▪ Coherence & consistency often closely coupled in implementations
▪ In such cases, coherence & consistency cannot be verified separately
▪ CCICheck: CCI-aware microarchitectural MCM checking
• Uses ViCL (Value in Cache Lifetime) abstraction
▪ Discovered bug in TSO-CC lazy coherence protocol
![Page 85: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/85.jpg)
ISA-level MCMs in the Hardware-Software Stack
[Diagram: High-Level Languages (HLLs), New ISA-level MCM, Hardware]
![Page 86: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/86.jpg)
ISA-level MCMs in the Hardware-Software Stack
Which orderings must be guaranteed by hardware?
![Page 87: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/87.jpg)
ISA-level MCMs in the Hardware-Software Stack
Which orderings does the compiler need to enforce?
Which orderings must be guaranteed by hardware?
![Page 88: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/88.jpg)
ISA-level MCMs in the Hardware-Software Stack
Which orderings does the compiler need to enforce?
Which orderings must be guaranteed by hardware?
TriCheck checks that HLL, compiler, ISA, and hardware align on MCM requirements
![Page 89: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/89.jpg)
TriCheck: Layers of the Stack are Intertwined
High-Level Languages (HLL)
Compiler
Architecture (ISA)
Microarchitecture
OS
Processor RTL
▪ ISA-level MCMs should allow microarchitectural optimizations but also be compatible with HLLs
▪ TriCheck [Trippel et al. ASPLOS 2017] enables holistic analysis of HLL memory model, ISA-level MCM, compiler mappings, and microarchitectures
• Mapping: translation of HLL synchronization primitives to one or more assembly language instructions
▪ Also useful for checking HLL compiler mappings to ISA-level MCMs
▪ Selected as one of 12 “Top Picks of Comp. Arch. Conferences” for 2017
![Page 90: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/90.jpg)
TriCheck: Comparing HLL to Microarchitecture
Four Primary Inputs:
• HLL Litmus Test Variants
• HLL Model (e.g. C11)
• HLL to ISA Compiler Mapping
• µspec Microarch. Model
![Page 91: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/91.jpg)
TriCheck: Comparing HLL to Microarchitecture
Examine all C11 memory_order combinations (release, acquire, relaxed, seq_cst) for HLL litmus tests
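Generating the litmus test variants amounts to taking a Cartesian product over the memory orders each access may carry. The sketch below is not TriCheck's generator; it is a minimal illustration for mp. The per-access order lists are an assumption of the sketch, based on C11 not permitting acquire on a plain atomic store or release on a plain atomic load.

```python
from itertools import product

# Orders that C11 permits on each kind of access (read-modify-writes are ignored here).
STORE_ORDERS = ("relaxed", "release", "seq_cst")
LOAD_ORDERS = ("relaxed", "acquire", "seq_cst")

# mp skeleton: two stores on one thread, two loads on the other.
MP_ACCESSES = [
    ("T0", "store", "x"),
    ("T0", "store", "y"),
    ("T1", "load", "y"),
    ("T1", "load", "x"),
]


def litmus_variants(accesses):
    """Yield one tuple of memory orders per variant, aligned with `accesses`."""
    choices = [STORE_ORDERS if kind == "store" else LOAD_ORDERS
               for _thread, kind, _loc in accesses]
    return list(product(*choices))


variants = litmus_variants(MP_ACCESSES)
first_variant = variants[0]
```

With three legal orders for each of the four accesses, mp alone expands into 81 variants, which is why the later steps of the flow need to be automated.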
![Page 92: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/92.jpg)
TriCheck: Comparing HLL to Microarchitecture
Translate HLL Litmus Tests to ISA-level litmus tests
• Uses the HLL to ISA Compiler Mapping
![Page 93: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/93.jpg)
TriCheck: Comparing HLL to Microarchitecture
Use Herd [Alglave et al. TOPLAS 2014] to check HLL outcomes
• Inputs: HLL Litmus Test Variants, HLL Model (e.g. C11)
• Output: HLL Outcome (Forbidden/Allowed?)
![Page 94: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/94.jpg)
TriCheck: Comparing HLL to Microarchitecture
Use µhb analysis to check microarch. outcomes
• Inputs: ISA-level litmus tests, µspec Microarch. Model
• Tool: µhb Analysis with Check
• Output: Microarch. Outcome (Observable/Unobservable?)
![Page 95: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/95.jpg)
TriCheck: Comparing HLL to Microarchitecture
Compare HLL and microarch. outcomes
• HLL Outcome (Forbidden/Allowed?) from Herd [Alglave et al. TOPLAS 2014]
• Microarch. Outcome (Observable/Unobservable?) from µhb Analysis with Check
![Page 96: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/96.jpg)
TriCheck: Comparing HLL to Microarchitecture
Compare HLL and microarch. outcomes
• Case shown: HLL Outcome Forbidden, Microarch. Outcome Observable
![Page 97: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/97.jpg)
TriCheck: Comparing HLL to Microarchitecture
Compare HLL and microarch. outcomes
• HLL Outcome Forbidden, Microarch. Outcome Observable: BUG!
![Page 98: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/98.jpg)
TriCheck: Comparing HLL to Microarchitecture
• HLL Outcome Forbidden, Microarch. Outcome Observable: BUG!
If bugs found, iterate by changing the inputs and re-run
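The comparison step can be summarized as a small decision function over the two verdicts. Only the Forbidden and Observable case is labeled on the slides; the names given to the other cases below, and the hard-coded verdicts standing in for Herd and µhb analysis results, are assumptions of this sketch.

```python
def classify(hll_allowed, uarch_observable):
    """Compare the HLL verdict with the microarchitectural verdict for one test."""
    if not hll_allowed and uarch_observable:
        return "BUG"  # forbidden by the HLL, yet observable on the hardware model
    if hll_allowed and not uarch_observable:
        return "stricter than required"  # assumed label: legal but never produced
    return "OK"  # the two verdicts agree


# Hypothetical verdicts: test name -> (HLL allowed?, microarch. observable?).
verdicts = {
    "mp_rel_acq": (False, True),
    "mp_relaxed": (True, True),
    "mp_sc": (False, False),
    "sb_relaxed": (True, False),
}

report = {name: classify(*verdict) for name, verdict in verdicts.items()}
bugs = sorted(name for name, result in report.items() if result == "BUG")
```

Any test that lands in `bugs` is a cue to change one of the inputs (HLL model, mapping, ISA-level MCM, or microarchitecture model) and re-run the flow.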
![Page 99: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/99.jpg)
Using TriCheck for ISA MCM Design: RISC-V
▪ Ran TriCheck on draft RISC-V ISA MCM with
• C11 HLL MCM [Batty et al. POPL 2011] [Batty et al. POPL 2016]
• Compiler mappings based on RISC-V manual
• Variety of microarchitectures that relaxed various memory orderings
− All legal according to draft RISC-V spec
− Ranging from SC microarchitecture to one with reorderings allowed by ARM/Power
▪ Draft RISC-V MCM for Base ISA incapable of correctly compiling C11:
• C11 outcome forbidden, but impossible to forbid on hardware
• RISC-V fences too weak to restore orderings that implementations could relax
![Page 100: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/100.jpg)
Current RISC-V Status
▪ In response to our findings, the RISC-V Memory Model Working Group was formed (we are members)
• Mandate to create an MCM for RISC-V that satisfies community needs
▪Working Group has developed an MCM proposal that fixes the aforementioned bugs (and other issues)
▪MCM proposal recently passed the 45-day public feedback period!
• Well on its way to being included in the next version of the RISC-V ISA spec
![Page 101: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/101.jpg)
TriCheck: Analysing Compiler Mappings
Same TriCheck flow (HLL litmus test variants, Herd [Alglave et al. TOPLAS 2014], ISA-level litmus tests, µhb analysis with Check), now with the HLL to ISA compiler mapping as the input under test (?)
Fix (hold constant) the HLL model (e.g. C11), the µspec microarch. model, and the ISA-level MCM
HLL outcome: Forbidden/Allowed?
Microarch. outcome: Observable/Unobservable?
![Page 102: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/102.jpg)
TriCheck: Analysing Compiler Mappings
Forbidden at the HLL but Observable on the microarchitecture: BUG!
![Page 103: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/103.jpg)
Checking C11 Mappings to ARMv7/Power
▪ Ran TriCheck on a microarch. with reordering similar to ARMv7/Power
• Utilised “trailing-sync” compiler mapping [Batty et al. POPL 2012]
• Discovered 2 cases where C11 outcome forbidden, but allowed by hardware!
• Deduced that the mapping must be flawed
▪Mapping was supposedly proven correct [Batty et al. POPL 2012]
• Traced the loophole in the proof [Manerkar et al. CoRR’16]
▪Problem: C11 model slightly too strong for mappings
• C11 has happens-before (ℎ𝑏) ordering and total order on all SC accesses (𝑠𝑐)
• ℎ𝑏 and 𝑠𝑐 orders must agree with each other
• Trailing-sync mapping does not guarantee this for our counterexamples
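The agreement requirement between the two orders can be illustrated with a small check. This is a simplified sketch, not the formal C11 definition: events are plain strings, `hb_edges` is a hypothetical set of happens-before pairs, and `sc_order` is a hypothetical total order over the SC accesses only.

```python
def transitive_closure(edges):
    """Naive transitive closure of a set of (before, after) pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def sc_agrees_with_hb(hb_edges, sc_order):
    """True if no two SC accesses are ordered one way by hb and the other way by sc."""
    position = {event: i for i, event in enumerate(sc_order)}
    for a, b in transitive_closure(hb_edges):
        if a in position and b in position and position[a] > position[b]:
            return False
    return True


# Hypothetical example: a and c are SC accesses, b is a non-SC access
# that links them through happens-before.
hb = {("a", "b"), ("b", "c")}
print(sc_agrees_with_hb(hb, ["a", "c"]))  # sc places a before c, matching hb
print(sc_agrees_with_hb(hb, ["c", "a"]))  # sc places c before a, contradicting hb
```

The second call models the kind of disagreement the slide describes: the hardware execution permits an SC order that contradicts an hb chain passing through non-SC accesses.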
![Page 104: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/104.jpg)
Current state of C11
▪ “Leading-sync” mapping [McKenney and Silvera 2011]
• Counterexample discovered concurrently with our work [Lahav et al. PLDI 2017]
▪ Both mappings currently broken
▪ Possible solutions under discussion by C11 memory model committee:
• RC11 [Lahav et al. PLDI 2017]: remove the requirement that 𝑠𝑐 and ℎ𝑏 orders agree
− Current mappings work, but this reduces intuition in an already complicated C11 model
• Adding extra fences to mappings
− Drawbacks: lower performance, requires recompilation, and the counterexample pattern is not common
![Page 105: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/105.jpg)
TriCheck Takeaways
▪ Both HLL memory models and microarchitectural optimizations influence the design of ISA-level MCMs
▪TriCheck enables holistic analysis of HLL memory model, ISA-level MCM, compiler mappings, and microarchitectural implementations
▪TriCheck discovered numerous issues with draft RISC-V MCM
• Influenced the design of the new RISC-V MCM
▪Discovered two counterexamples to C11 -> ARMv7/Power compiler mappings
• Mappings were previously “proven” correct; isolated flaw in proof
![Page 106: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/106.jpg)
Microarchitecture Checking
Memory Consistency Checking for RTL
[Figure: two pipelines (Fetch, Dec., Exec., Mem., WB), each with a store buffer (SB) and L1, connected to an L2 under a coherence protocol (SWMR, DVI, etc.)]
![Page 107: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/107.jpg)
Microarchitecture Checking
Memory Consistency Checking for RTL
RTL implementation [RTL Image: Christopher Batten]
How to ensure RTL maintains orderings?
![Page 108: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/108.jpg)
✓ Microarchitecture Checking
Memory Consistency Checking for RTL
How to ensure RTL maintains orderings?
![Page 110: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/110.jpg)
RTLCheck: Checking RTL Implementations
[Figure: system stack with layers labelled High-Level Languages (HLL), Compiler, OS, Architecture (ISA), Microarchitecture, and Processor RTL]
▪ RTLCheck [Manerkar et al. MICRO 2017] enables checking microarchitectural axioms against an implementation’s Verilog RTL for litmus test suites
▪ This helps ensure that the RTL maintains orderings required for consistency
▪ Selected as an Honorable Mention from the “Top Picks of Comp. Arch. Conferences” for 2017
![Page 111: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/111.jpg)
RTL Verification is Maturing…
▪…but usually ignores memory consistency!
▪Often use SystemVerilog Assertions (SVA)
![Page 112: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/112.jpg)
RTL Verification is Maturing…
▪ ISA-Formal [Reid et al. CAV 2016]: instruction operational semantics; no MCM verification
![Page 113: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/113.jpg)
RTL Verification is Maturing…
▪ DOGReL [Stewart et al. DIFTS 2014]: memory subsystem transactions; no multicore MCM verification (?)
![Page 114: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/114.jpg)
RTL Verification is Maturing…
▪ Kami [Vijayaraghavan et al. CAV 2015] [Choi et al. ICFP 2017]: MCM correctness for all programs, but needs a Bluespec design and manual proofs!
![Page 115: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/115.jpg)
RTL Verification is Maturing…
Lack of automated memory consistency verification at RTL!
![Page 116: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/116.jpg)
RTLCheck: Checking RTL Consistency Orderings
Inputs: RTL design, µspec microarch. axioms, litmus test, mapping functions
RTLCheck → temporal SystemVerilog Assertions (SVA) → Cadence JasperGold (RTL verifier) → Proven?
![Page 117: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/117.jpg)
RTLCheck: Checking RTL Consistency Orderings
User-provided mapping functions translate microarch. primitives to RTL equivalents
![Page 118: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/118.jpg)
RTLCheck: Checking RTL Consistency Orderings
RTLCheck automatically translates µarch. ordering axioms to temporal properties
![Page 119: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/119.jpg)
RTLCheck: Checking RTL Consistency Orderings
Properties may be proven, or a counterexample found
![Page 120: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/120.jpg)
Meaning can be Lost in Translation!
小心地滑
![Page 121: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/121.jpg)
Meaning can be Lost in Translation!
小心地滑 (Caution: Slippery Floor)
![Page 122: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/122.jpg)
Meaning can be Lost in Translation!
[Image: Barbara Younger] [Inspiration: Tae Jun Ham]
小心地滑 (Caution: Slippery Floor)
![Page 123: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/123.jpg)
RTLCheck: Checking Consistency at RTL
Axiomatic Microarch. Analysis
![Page 124: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/124.jpg)
RTLCheck: Checking Consistency at RTL
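Axiomatic Microarch. Analysis
Temporal RTL Verification (SVA, etc.)
[Figure: waveform over clock cycles 2 to 7 with signals clk, Core[0].DX, Core[0].WB, Core[0].SData, Core[1].DX, Core[1].WB, and Core[1].LData; St x and St y pass through Core 0, Ld y and Ld x pass through Core 1, and the data signals carry 0x1]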
![Page 125: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/125.jpg)
RTLCheck: Checking Consistency at RTL
Axiomatic Microarch. Analysis: abstract nodes and happens-before edges
![Page 126: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/126.jpg)
RTLCheck: Checking Consistency at RTL
Temporal RTL Verification (SVA, etc.): concrete signals and clock cycles
![Page 127: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/127.jpg)
RTLCheck: Checking Consistency at RTL
Axiomatic/Temporal Mismatch!
Axiomatic analysis works on abstract nodes and happens-before edges; temporal RTL verification works on concrete signals and clock cycles
![Page 128: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/128.jpg)
Outcome Filtering in Axiomatic Analysis
▪ Outcome Filtering: Restrict test outcome to one particular outcome
• Allows for more efficient verification
▪ Axiomatic models make outcome filtering easy
mp (Message Passing)
Core 0          Core 1
(i1) x = 1;     (i3) r1 = y;
(i2) y = 1;     (i4) r2 = x;
![Page 129: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/129.jpg)
Outcome Filtering in Axiomatic Analysis
Outcome: r1 = 1, r2 = 1
Execution examined as a whole, so outcome can be enforced!
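To make "execution examined as a whole" concrete, the sketch below enumerates every sequentially consistent interleaving of mp and then keeps only the executions matching a chosen outcome. It is a plain operational enumeration in Python, assumed here for illustration; it is not the µhb-graph analysis the Check tools perform, and all names in it are hypothetical.

```python
from itertools import combinations

# mp litmus test; x and y are initially 0.
CORE0 = [("st", "x", 1), ("st", "y", 1)]        # (i1), (i2)
CORE1 = [("ld", "y", "r1"), ("ld", "x", "r2")]  # (i3), (i4)


def interleavings(a, b):
    """Yield every merge of a and b that preserves each core's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


def run(schedule):
    """Execute one interleaving against a single shared memory (SC)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, addr, arg in schedule:
        if op == "st":
            mem[addr] = arg
        else:
            regs[arg] = mem[addr]
    return regs["r1"], regs["r2"]


def executions_with(outcome):
    """Outcome filtering: keep only complete executions with this (r1, r2)."""
    return [s for s in interleavings(CORE0, CORE1) if run(s) == outcome]


print(len(executions_with((1, 1))))  # executions producing r1 = 1, r2 = 1
print(len(executions_with((1, 0))))  # executions producing r1 = 1, r2 = 0
```

Because each execution is complete before the filter is applied, restricting attention to one outcome is a simple predicate on the result, which is the property the slide attributes to axiomatic models.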
![Page 132: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/132.jpg)
Outcome Filtering in Temporal Verification
▪ Filtering executions by outcome requires expensive global analysis
• Not done by many SVA verifiers, including JasperGold!
mp
Core 0          Core 1
(i1) x = 1;     (i3) r1 = y;
(i2) y = 1;     (i4) r2 = x;
Is r1 = 1, r2 = 0 possible?
![Page 133: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/133.jpg)
Outcome Filtering in Temporal Verification
Is r1 = 1, r2 = 0 possible?
Step 1: (i1) x = 1
![Page 134: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/134.jpg)
Outcome Filtering in Temporal Verification
Is r1 = 1, r2 = 0 possible?
Step 1: (i1) x = 1
Step 2: (i2) y = 1
Step 3: (i3) r1 = y = 1
Step 4: (i4) r2 = x = 1
![Page 135: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/135.jpg)
Outcome Filtering in Temporal Verification
Is r1 = 1, r2 = 0 possible?
Step 4: (i4) r2 = x = 1, or (i4) r2 = x = 0?
![Page 136: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/136.jpg)
Outcome Filtering in Temporal Verification
Is r1 = 1, r2 = 0 possible?
Other branches, e.g. (i3) r1 = y = 0, lead to further paths (…)
Need to examine all possible paths from current step to end of execution: too expensive!
![Page 137: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/137.jpg)
Outcome Filtering in Temporal Verification
SVA Verifier Approximation: Only check if constraints hold up to current step
Makes Outcome Filtering impossible!
![Page 138: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/138.jpg)
µspec Analysis Uses Outcome Filtering
mp
Core 0          Core 1
(i1) x = 1;     (i3) r1 = y;
(i2) y = 1;     (i4) r2 = x;
SC Forbids: r1 = 1, r2 = 0
Axiom "Read_Values": Every load either reads BeforeAllWrites OR reads FromLatestWrite
Note: Axioms abstracted for brevity
![Page 140: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/140.jpg)
µspec Analysis Uses Outcome Filtering
No write for load to read from!
(Under the filtered outcome r2 = 0, no store in mp writes 0 to x, so the FromLatestWrite case of Read_Values cannot apply to (i4).)
![Page 141: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/141.jpg)
µspec Analysis Uses Outcome Filtering
Outcome Filtering leads to simpler axioms!
![Page 142: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/142.jpg)
Temporal Outcome Filtering Fails!
Filtered Read_Values: Unless Load returns non-zero value, Load happens before all stores to its address
mp
Core 0          Core 1
(i1) x = 1;     (i3) r1 = y;
(i2) y = 1;     (i4) r2 = x;
SC Forbids: r1 = 1, r2 = 0
[Figure: waveform against time (cycles) with signals clk, Core[0].Commit, Core[0].SData, Core[1].Commit, Core[1].LData]
Note: Axioms/properties abstracted for brevity
![Page 143: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/143.jpg)
Temporal Outcome Filtering Fails!
[Waveform, cycles 1 to 3: Core[0].Commit shows St x in cycle 3 with Core[0].SData = 0x1]
After 3 cycles:
![Page 144: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/144.jpg)
Temporal Outcome Filtering Fails!
After 3 cycles: Store happens before load!
Property Violated?
![Page 145: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/145.jpg)
Temporal Outcome Filtering Fails!
[Waveform, cycles 1 to 6: Core[0].Commit shows St x in cycle 3 and St y in cycle 4 (Core[0].SData = 0x1 for both); Core[1].Commit shows Ld y in cycle 5 and Ld x in cycle 6 (Core[1].LData = 0x1 for both)]
After 3 cycles: Store happens before load! Property Violated?
After 6 cycles: Load does not read 0. No Violation!
![Page 146: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/146.jpg)
Temporal Outcome Filtering Fails!
After 6 cycles: Load does not read 0. No Violation!
But SVA verifiers don’t check future cycles!
![Page 147: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/147.jpg)
Temporal Outcome Filtering Fails!
After 3 cycles: Store happens before load! Property Violated?
Counterexample flagged despite hardware doing nothing wrong!
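The false alarm can be reproduced with a toy model. The sketch below is an assumption-laden illustration in Python, not a description of how JasperGold or any SVA verifier works internally: the trace is read off the slide's waveform, and `prefix_only_check` is a hypothetical stand-in for "only check if constraints hold up to the current step".

```python
# Commit trace from the waveform: (cycle, event, data value).
TRACE = [(3, "St x", 1), (4, "St y", 1), (5, "Ld y", 1), (6, "Ld x", 1)]


def whole_execution_check(trace):
    """Filtered Read_Values, evaluated with the complete execution in hand."""
    cycle = {event: c for c, event, _ in trace}
    value = {event: v for _, event, v in trace}
    if value["Ld x"] != 0:
        return True  # load returned non-zero: the property holds vacuously
    return cycle["Ld x"] < cycle["St x"]


def prefix_only_check(trace):
    """Return the cycle at which a prefix-only checker flags a violation, or None."""
    for c, event, _ in trace:
        if event == "St x":
            # Ld x has not committed yet (otherwise we would have returned),
            # so "Ld x before St x" can no longer hold, and the escape clause
            # (a non-zero load value) lies in cycles not yet examined.
            return c
        if event == "Ld x":
            # The load committed first: the property is settled either way.
            return None
    return None


print(prefix_only_check(TRACE))      # cycle at which the prefix check gives up
print(whole_execution_check(TRACE))  # verdict once the load value is known
```

On this trace the prefix-only check reports cycle 3, while the whole-execution check accepts the same trace because Ld x eventually returns 1.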
![Page 148: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/148.jpg)
Solution: Load Value Constraints
Axiom "Read_Values": Every load either reads BeforeAllWrites OR reads FromLatestWrite
▪ Don’t simplify axioms; translate all cases
▪ Tag each case with appropriate load value constraints
• reflect the data constraints required for edge(s)
Property to check: mapNode(Ld x → St x, Ld x == 0) or mapNode(St x → Ld x, Ld x == 1);
mp
Core 0          Core 1
(i1) x = 1;     (i3) r1 = y;
(i2) y = 1;     (i4) r2 = x;
SC Forbids: r1 = 1, r2 = 0
Note: Axioms and properties abstracted for brevity
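The two-case property above can be evaluated on small commit traces to see why tagging each case with its load value removes the false alarm. This Python sketch is an abstraction made for illustration: `edge_with_value` is a hypothetical stand-in for the slide's `mapNode(...)` notation, not RTLCheck's implementation, and the traces are invented.

```python
def edge_with_value(trace, first, second, load, expected):
    """One case: `first` commits before `second`, and `load` returns `expected`."""
    cycle = {event: c for c, event, _ in trace}
    value = {event: v for _, event, v in trace}
    return cycle[first] < cycle[second] and value[load] == expected


def read_values_ld_x(trace):
    """Both cases of Read_Values for Ld x, each tagged with its load value."""
    return (edge_with_value(trace, "Ld x", "St x", "Ld x", 0)      # BeforeAllWrites
            or edge_with_value(trace, "St x", "Ld x", "Ld x", 1))  # FromLatestWrite


# Hypothetical traces: (cycle, event, data value).
store_then_load_1 = [(3, "St x", 1), (6, "Ld x", 1)]  # load sees the store
load_0_then_store = [(2, "Ld x", 0), (3, "St x", 1)]  # load sees the initial value
store_then_load_0 = [(3, "St x", 1), (6, "Ld x", 0)]  # stale read after the store

for trace in (store_then_load_1, load_0_then_store, store_then_load_0):
    print(read_values_ld_x(trace))
```

The first two traces satisfy one case each; only the third, where the load returns 0 after the store has committed, satisfies neither and would be reported.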
![Page 152: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/152.jpg)
Multi-V-scale: a Multicore Case Study
[Diagram: Core 0, Core 1, Core 2, and Core 3, each with IF, DX, and WB pipeline stages, connected through an Arbiter to Memory]
▪ 3-stage in-order pipelines
▪ Arbiter enforces that only one core can access memory at any time
![Page 155: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/155.jpg)
Bug Discovered in V-scale
▪ V-scale memory internally writes stores to wdata register
▪ wdata pushed to memory when subsequent store occurs
▪ Akin to single-entry store buffer
▪ When two stores are sent to memory in successive cycles, first of two stores is dropped by memory!
▪ Fixed bug by eliminating wdata
▪ V-scale has since been deprecated by RISC-V Foundation
[Diagram: Core 0, Core 1, Core 2, and Core 3 (IF, DX, WB stages) connected through the Arbiter to Memory, which contains the wdata register and a zero-initialized Mem array; the stores x = 1 and y = 1 are sent to memory]
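The dropped-store behaviour can be mimicked with a toy Python model. This is an assumed abstraction for illustration, not the V-scale Verilog: the class names, the cycle bookkeeping, and the forwarding of loads from wdata are all hypothetical choices.

```python
class ToyWdataMemory:
    """Toy memory with a single wdata register (assumed behaviour, not the RTL)."""

    def __init__(self, size=16):
        self.array = [0] * size
        self.wdata = None             # (addr, data) buffered, not yet in the array
        self.last_store_cycle = None

    def store(self, cycle, addr, data):
        back_to_back = (self.last_store_cycle is not None
                        and cycle == self.last_store_cycle + 1)
        if self.wdata is not None and not back_to_back:
            # Normal case: the buffered store is pushed when the next store arrives.
            old_addr, old_data = self.wdata
            self.array[old_addr] = old_data
        # Modelled bug: on successive-cycle stores the push is skipped,
        # so the buffered store is overwritten and lost.
        self.wdata = (addr, data)
        self.last_store_cycle = cycle

    def load(self, addr):
        # Assumption: loads can observe the store currently held in wdata.
        if self.wdata is not None and self.wdata[0] == addr:
            return self.wdata[1]
        return self.array[addr]


class ToyDirectMemory:
    """Toy model of the fix: no wdata register, stores go straight to the array."""

    def __init__(self, size=16):
        self.array = [0] * size

    def store(self, cycle, addr, data):
        self.array[addr] = data

    def load(self, addr):
        return self.array[addr]


X, Y = 0, 1  # hypothetical addresses for x and y

buggy = ToyWdataMemory()
buggy.store(cycle=1, addr=X, data=1)
buggy.store(cycle=2, addr=Y, data=1)   # successive cycle: the store to x is lost

fixed = ToyDirectMemory()
fixed.store(cycle=1, addr=X, data=1)
fixed.store(cycle=2, addr=Y, data=1)

print(buggy.load(X), buggy.load(Y))
print(fixed.load(X), fixed.load(Y))
```

In the toy model the store to x survives if the two stores are separated by at least one cycle, matching the slide's statement that the loss only occurs for stores in successive cycles.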
![Page 158: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/158.jpg)
RTLCheck Takeaways
▪ Microarchitectural models must be validated against RTL
▪ RTLCheck: Automated translation of microarch. axioms into equivalent temporal SVA properties for litmus test suites
• Translation is complicated by the axiomatic-temporal mismatch
• JasperGold was able to prove 90% of properties per test within an 11-hour runtime
▪ Last piece of the Check suite; now have tools at all levels of the stack!
![Page 159: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/159.jpg)
Conclusion
[Diagram: layers of the stack: High-Level Languages (HLL), Compiler, OS, Architecture (ISA), Microarchitecture, Processor RTL]
▪ The Check suite provides automated full-stack MCM checking of implementations
▪ Litmus-test based verification to concentrate on error-prone cases
▪ Can check:
• Implementation of HLL requirements
• Virtual memory implementation
• HLL Compiler mappings
• Microarchitectural Orderings (including coherence)
• and even RTL (Verilog)!
▪ All tools are open-source and publicly available!
![Page 160: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/160.jpg)
With Thanks to…
▪Collaborators:
• Margaret Martonosi
• Daniel Lustig
• Caroline Trippel
• Michael Pellauer
• Aarti Gupta
▪ Funding:
• Princeton Wallace Memorial Honorific Fellowship
• STARnet C-FAR (Center for Future Architectures Research)
• JUMP ADA Center (Applications Driving Architectures)
• National Science Foundation
![Page 161: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/161.jpg)
Questions?
http://check.cs.princeton.edu/
http://www.cs.princeton.edu/~manerkar
• Yatin A. Manerkar, Daniel Lustig, Margaret Martonosi, and Michael Pellauer. RTLCheck: Verifying the Memory Consistency of RTL Designs. The 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2017.
• Yatin A. Manerkar, Caroline Trippel, Daniel Lustig, Michael Pellauer, and Margaret Martonosi. Counterexamples and Proof Loophole for the C/C++ to POWER and ARMv7 Trailing-Sync Compiler Mappings. CoRR abs/1611.01507, November 2016.
• Caroline Trippel, Yatin A. Manerkar, Daniel Lustig, Michael Pellauer, and Margaret Martonosi. TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA. The 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2017.
• Yatin A. Manerkar, Daniel Lustig, Michael Pellauer, and Margaret Martonosi. CCICheck: Using µhb Graphs to Verify the Coherence-Consistency Interface. The 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), December 2015.
![Page 162: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/162.jpg)
Coherence and Consistency
[Diagram labels: Conceptual (Coherence, Consistency); Real Implementations; CCI]
▪ Real implementations: coherence and consistency often interwoven
• Verifiers can’t ignore consistency implications!
• Verifiers can’t assume abstract coherence/memory hierarchy!
▪ Most coherence protocols are not that simple!
• Partial incoherence (e.g. GPUs) [Wickerson et al. OOPSLA 2016]
• Lazy coherence (e.g. TSO-CC) [Elver and Nagarajan HPCA 2014]
▪ CCI: Coherence-Consistency Interface
![Page 166: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/166.jpg)
Issue with Draft RISC-V MCM: Cumulativity
▪ Consider this litmus test variant (WRC):
• C11 atomics can specify memory orderings: REL = release, ACQ = acquire
Thread 0          Thread 1            Thread 2
St (x, 1, REL)    r0 = Ld (x, ACQ)    r1 = Ld (y, ACQ)
                  St (y, 1, REL)      r2 = Ld (x, ACQ)
Forbidden by C11: r0 = 1, r1 = 1, r2 = 0
▪ RISC-V lacked cumulative fences to enforce this ordering:
• (x5 and x6 contain addresses of x and y)
Core 0         Core 1         Core 2
sw x1, (x5)    lw x2, (x5)    lw x3, (x6)
               fence r, rw    fence r, rw
               fence rw, w    lw x4, (x5)
               sw x2, (x6)
Allowed by draft RISC-V: x1 = 1, x2 = 1, x3 = 1, x4 = 0
![Page 172: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/172.jpg)
ARMv7/Power Trailing-Sync Counterexample
▪ Consider this litmus test variant (IRIW):
• Total order over all SC atomic accesses is required
Thread 0         Thread 1         Thread 2            Thread 3
St (x, 1, SC)    St (y, 1, SC)    r0 = Ld (x, ACQ)    r2 = Ld (y, ACQ)
                                  r1 = Ld (y, SC)     r3 = Ld (x, SC)
Forbidden by C11: r0 = 1, r1 = 0, r2 = 1, r3 = 0
▪ With the trailing-sync mapping, this compiles to the following:
• Allowed on Power [Sarkar et al. PLDI 2011] and ARMv7 [Alglave et al. TOPLAS 2014]
Core 0        Core 1        Core 2               Core 3
str 1, [x]    str 1, [y]    ldr r1, [x]          ldr r3, [y]
                            ctrlisb/ctrlisync    ctrlisb/ctrlisync
                            ldr r2, [y]          ldr r4, [x]
Allowed by Power/ARMv7: r1 = 1, r2 = 0, r3 = 1, r4 = 0
![Page 173: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/173.jpg)
ARMv7/Power Trailing-Sync Counterexample
▪ Consider this litmus test variant (IRIW):
• Total order over all SC atomic accesses is required
Thread 0         Thread 1         Thread 2            Thread 3
St (x, 1, SC)    St (y, 1, SC)    r0 = Ld (x, ACQ)    r2 = Ld (y, ACQ)
                                  r1 = Ld (y, SC)     r3 = Ld (x, SC)
Forbidden by C11: r0 = 1, r1 = 0, r2 = 1, r3 = 0
[Execution graph generated with CPPMEM from Cambridge; SC events shown: c: Wsc x = 1, d: Wsc y = 1, f: Rsc y = 0, h: Rsc x = 0]
▪ SC total order must respect happens-before, i.e. (sb ∪ sw)+
▪ SC reads must be before later SC writes
• Cycle in the SC order implies outcome is forbidden
• But compiled code allows the behaviour!
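The cycle argument can be replayed mechanically. The sketch below is a small Python illustration, not CPPMEM: it builds happens-before as the transitive closure of sb and sw, keeps only the pairs between SC events, adds the "SC reads before later SC writes" edges, and checks for a cycle. The event names c, d, f, and h come from the slide; the labels e and g for the two acquire loads are assumed.

```python
# SC events named on the slide; "e" and "g" are assumed labels for the
# acquire loads of x (Thread 2) and y (Thread 3).
SC_EVENTS = {"c", "d", "f", "h"}

sb = {("e", "f"), ("g", "h")}   # sequenced-before within Thread 2 and Thread 3
sw = {("c", "e"), ("d", "g")}   # each acquire load reads from an SC store (r0 = 1, r2 = 1)

def transitive_closure(edges):
    """Return the transitive closure of a set of (src, dst) pairs."""
    closure = set(edges)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra

# Happens-before is (sb U sw)+; the SC order must respect it between SC events.
hb = transitive_closure(sb | sw)
sc_hb = {(a, b) for (a, b) in hb if a in SC_EVENTS and b in SC_EVENTS}

# f reads y = 0, so it precedes the SC write d; h reads x = 0, so it precedes c.
reads_before_writes = {("f", "d"), ("h", "c")}

sc_constraints = sc_hb | reads_before_writes
cyclic = any(a == b for (a, b) in transitive_closure(sc_constraints))

print(sorted(sc_hb))
print(cyclic)
```

The two hb edges between SC events arise only through the non-SC acquire loads e and g, which is why constraints stated directly between SC accesses miss them.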
![Page 179: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/179.jpg)
What went wrong?
▪ It was thought that program order and coherence edges directly between SC accesses were all that needed enforcing [Batty et al. POPL 2012]
▪ But hb edges can arise between SC accesses through the transitive composition of edges to and from a non-SC intermediate access
▪ Occurs in IRIW counterexample:
![Page 182: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/182.jpg)
Assumption Generation
▪ Need to restrict executions to those of litmus test
▪ Three classes of assumptions:
• Memory initialization
− Instr. mem and data mem
• Register initialization
• Value assumptions
− Load value assumptions: loads return correct value (when they occur)
− Final value assumptions: required final values of memory are respected
▪ RTLCheck generates SystemVerilog Assumptions to constrain executions
• Utilises user-provided program mapping function
![Page 183: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/183.jpg)
Assumption Generation
▪ Covering trace: execution where assumption condition is enforced
• E.g., execution where load of x returns 0
• Must obey all assumptions
▪ Covering final value assumption == finding forbidden execution!
• No covering trace => equivalent to verifying overall test!
▪ Quicker verification for some tests
• Expect benefit to be largest for small designs
![Page 184: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/184.jpg)
The Benefits of Final Value Assumptions
▪ Why generate final value assumptions if test has no final conditions?
▪ Answer: Covering traces can lead to faster verification
▪ These are traces where assumption condition occurs and can be enforced
[Waveform: clk, Core[0].DX, Core[0].WB, Core[1].DX, Core[1].WB, Core[0].SData, and Core[1].LData over cycles 2 to 7; St x and St y on Core 0, Ld y and Ld x on Core 1; SData and LData show 0x1]
▪ Covering trace for final value assumption is complete execution of litmus test
▪ Covering trace must also obey other assumptions, including load value assumptions (for mp, Ld y = 1 and Ld x = 0)
▪ Thus, covering trace for mp final value assumption (full execution with Ld y = 1 and Ld x = 0) is equivalent to finding forbidden execution of mp!
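The equivalence can be illustrated on an abstract machine. The Python sketch below enumerates every complete execution of mp on an idealized, atomic, in-order (SC) memory and keeps those that also satisfy the load value assumptions. The machine model and all names are assumptions for illustration; RTLCheck instead asks JasperGold to search for covering traces over the actual RTL.

```python
# mp litmus test from the slides: (kind, address, value).
PROGRAM = {
    0: [("St", "x", 1), ("St", "y", 1)],
    1: [("Ld", "y", None), ("Ld", "x", None)],
}
# Load value assumptions for the forbidden outcome of mp.
LOAD_VALUE_ASSUMPTIONS = {"y": 1, "x": 0}

def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def run(trace):
    """Execute a complete trace on an atomic memory; return the load results."""
    mem = {"x": 0, "y": 0}
    loads = {}
    for kind, addr, value in trace:
        if kind == "St":
            mem[addr] = value
        else:
            loads[addr] = mem[addr]
    return loads

def covering_traces():
    """Complete executions that also obey the load value assumptions."""
    return [t for t in interleavings(PROGRAM[0], PROGRAM[1])
            if run(t) == LOAD_VALUE_ASSUMPTIONS]

print(len(list(interleavings(PROGRAM[0], PROGRAM[1]))))
print(covering_traces())
```

On this idealized machine no complete execution satisfies Ld y = 1 and Ld x = 0, so no covering trace exists, which corresponds to the test being verified without proving each property separately.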
![Page 188: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/188.jpg)
▪Two configurations (Hybrid and Full_Proof), avg. runtime 6.2 hrs
• See paper for configuration details
Results: Time to Prove Properties
0
2
4
6
8
10
12
safe
00
6 lbsa
fe0
07
mp
safe
02
2sa
fe0
10
ssl
safe
00
0sa
fe0
08
n4
n5
co-m
psa
fe0
01
wrc sb
safe
01
8p
od
wr0
00
safe
00
3m
p+s
tale
ldsa
fe0
12
safe
00
2sa
fe0
14
iwp
23b
safe
00
9sa
fe0
29
safe
02
7rw
cn
2rf
i01
3sa
fe0
30
safe
01
1rf
i01
5rf
i00
3sa
fe0
21
iriw n
7iw
p24
po
dw
r00
1sa
fe0
17
rfi0
12
n6
safe
01
9rf
i00
1rf
i00
0rf
i01
1sa
fe0
26
safe
00
4sa
fe0
16
rfi0
02
rfi0
05
rfi0
14
rfi0
04
rfi0
06
n1
amd
3co
-iri
wM
ean
Tim
e (
ho
urs
)
Hybrid Full_Proof
![Page 189: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/189.jpg)
▪Two configurations (Hybrid and Full_Proof), avg. runtime 6.2 hrs
• See paper for configuration details
Results: Time to Prove Properties
0
2
4
6
8
10
12
safe
00
6 lbsa
fe0
07
mp
safe
02
2sa
fe0
10
ssl
safe
00
0sa
fe0
08
n4
n5
co-m
psa
fe0
01
wrc sb
safe
01
8p
od
wr0
00
safe
00
3m
p+s
tale
ldsa
fe0
12
safe
00
2sa
fe0
14
iwp
23b
safe
00
9sa
fe0
29
safe
02
7rw
cn
2rf
i01
3sa
fe0
30
safe
01
1rf
i01
5rf
i00
3sa
fe0
21
iriw n
7iw
p24
po
dw
r00
1sa
fe0
17
rfi0
12
n6
safe
01
9rf
i00
1rf
i00
0rf
i01
1sa
fe0
26
safe
00
4sa
fe0
16
rfi0
02
rfi0
05
rfi0
14
rfi0
04
rfi0
06
n1
amd
3co
-iri
wM
ean
Tim
e (
ho
urs
)
Hybrid Full_Proof
Complete quickly due to covering traces
![Page 190: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/190.jpg)
▪Two configurations (Hybrid and Full_Proof), avg. runtime 6.2 hrs
• See paper for configuration details
Results: Time to Prove Properties
0
2
4
6
8
10
12
safe
00
6 lbsa
fe0
07
mp
safe
02
2sa
fe0
10
ssl
safe
00
0sa
fe0
08
n4
n5
co-m
psa
fe0
01
wrc sb
safe
01
8p
od
wr0
00
safe
00
3m
p+s
tale
ldsa
fe0
12
safe
00
2sa
fe0
14
iwp
23b
safe
00
9sa
fe0
29
safe
02
7rw
cn
2rf
i01
3sa
fe0
30
safe
01
1rf
i01
5rf
i00
3sa
fe0
21
iriw n
7iw
p24
po
dw
r00
1sa
fe0
17
rfi0
12
n6
safe
01
9rf
i00
1rf
i00
0rf
i01
1sa
fe0
26
safe
00
4sa
fe0
16
rfi0
02
rfi0
05
rfi0
14
rfi0
04
rfi0
06
n1
amd
3co
-iri
wM
ean
Tim
e (
ho
urs
)
Hybrid Full_Proof
Max runtime: 11 hours (when some properties remain unproven)
![Page 191: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/191.jpg)
▪Full_Proof generally better (90%/test) than Hybrid (81%/test)
▪On average, Full_Proof can prove more properties in same time
Results: Proven Properties
[Figure: bar chart of the percentage of proven properties (axis 0 to 100) per litmus test (e.g. mp, sb, wrc, iriw, lb), plus the mean, for the Hybrid and Full_Proof configurations]
![Page 192: Automated Full-Stack Memory Model Verification with the ...manerkar/slides/... · What are Memory (Consistency) Models? LLVM IR JVM PTX SPIR Java Bytecode C11/ C++11 Cuda OpenCL x86](https://reader034.vdocument.in/reader034/viewer/2022052012/6028683bbba83c2e0346170b/html5/thumbnails/192.jpg)
▪Full_Proof generally better (90%/test) than Hybrid (81%/test)
▪On average, Full_Proof can prove more properties in same time
Results: Proven Properties
[Figure: bar chart of the percentage of proven properties (axis 0 to 100) per litmus test (e.g. mp, sb, wrc, iriw, lb), plus the mean, for the Hybrid and Full_Proof configurations]
Hybrid better for only a few tests