Automatic Compilation for Domain Specific Accelerators
2020. 8. 5.
Ross Daly, Caleb Donovick, Jackson Melchert
Golden Age of Computer Architecture!
• Architecture specifications change frequently
• The compiler is the (often overlooked) key component!
• Waterfall methodology: Application Analysis → Architectural Specification → RTL Design and Test → Physical Design → Software / Compiler Design
• Agile methodology: [Diagram: Base Hardware Accelerator v0 + Compiler Toolchain v0 → Application 1, Application 2 → Power, Performance, Area → Incremental Updates → Base Hardware Accelerator v1 + Compiler Toolchain v1 → Application 2.1, Application 3]
• Automatically generate the compiler for every spec change
CPU
• Compile to IR (LLVM)
• Common Optimizations
• Instruction Selection
• Peephole Optimization
• Instruction Scheduling
• Register Allocation
• Assembly
CGRA/FPGA
• Compile to IR (CoreIR)
• Common Optimizations
• Mapping
• Packing
• Placement
• Routing
• Bitfile generation
CGRA Mapping
Application Halide Program → (Lower) → CoreIR Graph → (Map PE and Memory) → Mapped CoreIR Graph → CGRA Bitstream
Our DSL-based Hardware Generation and Software Compilation Flow
• Hardware: PEak Program (PE spec) → PEak Compiler → PE HW in Magma → Magma Compiler → CGRA Verilog
• Compiler: the PEak Compiler also emits Compiler Collateral used by the software flow
• Software: Application Halide Program → Halide Compiler → CoreIR Graph → PE and MEM Mapper → Mapped CoreIR Graph → Place & Route Engine → CGRA Bitstream
Our DSL-based Hardware Generation and Software Compilation Flow
• Hardware: PEak Program (PE spec) → PEak Compiler → PE HW in Magma, and Lake Program (MEM spec) → Lake Compiler → MEM HW in Magma; the Magma Compiler turns both into CGRA Verilog
• Compiler: the PEak and Lake Compilers also emit Compiler Collateral used by the software flow
• Software: Application Halide Program → Halide Compiler → CoreIR Graph → PE and MEM Mapper → Mapped CoreIR Graph → Place & Route Engine → CGRA Bitstream
Output of Halide Compiler
CoreIR Graph: From Global Buffer → Unified Buffer → Computation Kernel → Unified Buffer → Computation Kernel → To Global Buffer
Desired Output of Mapper
Mapped CoreIR Graph: From Global Buffer → Lake-Specified Mem Tiles (for the unified buffers) and PEak-Specified PE Tiles (for the computation kernels) → To Global Buffer
Kernels are composed of CoreIR Primitives
[Diagram: Computational Kernel: From Buffer/IO → CoreIR Primitives add, add, sub, ashr, div, mul, mul → To Buffer/IO]
CoreIR primitives have SMT QF_BV (quantifier-free bit-vector) semantics
Example: CoreIR.Sub, with inputs In0, In1 and output Out, computes Out = In0 - In1
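To make the bit-vector semantics concrete, here is a minimal plain-Python model of a few CoreIR primitives. The function names (`coreir_add`, `coreir_sub`, ...) are hypothetical and are not CoreIR's API. The 16-bit width is an assumption chosen to match the PE's `Word`. An SMT solver would treat the same operations as QF_BV terms (`bvadd`, `bvsub`, `bvmul`, `bvashr`).

```python
# Plain-Python model of some CoreIR primitives as fixed-width bit-vector ops.
# Function names and WIDTH are illustrative assumptions, not the CoreIR API.
WIDTH = 16
MASK = (1 << WIDTH) - 1


def to_signed(x: int) -> int:
    """Read a WIDTH-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << WIDTH) if x >> (WIDTH - 1) else x


def coreir_add(in0: int, in1: int) -> int:
    return (in0 + in1) & MASK  # wraps modulo 2**WIDTH (like bvadd)


def coreir_sub(in0: int, in1: int) -> int:
    return (in0 - in1) & MASK  # Out = In0 - In1 modulo 2**WIDTH (like bvsub)


def coreir_mul(in0: int, in1: int) -> int:
    return (in0 * in1) & MASK  # low WIDTH bits of the product (like bvmul)


def coreir_ashr(in0: int, in1: int) -> int:
    # Arithmetic shift right (like bvashr): the sign bit is replicated.
    return (to_signed(in0) >> (in1 & MASK)) & MASK


print(hex(coreir_sub(3, 5)))        # 0xfffe
print(hex(coreir_ashr(0x8000, 4)))  # 0xf800
```

Every result is reduced modulo 2^16. As a result, `3 - 5` wraps to `0xfffe` instead of becoming negative, which is exactly what the SMT encoding assumes.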
Mapping
[Diagram: the Kernel of CoreIR Primitives is covered by PEak-Specified PE Tiles (labeled a, as, a, dm, m), producing the Mapped Kernel]
Instruction Selection
[Diagram: the Instruction Selection Algorithm turns the Kernel (CoreIR Primitives) into the Mapped Kernel (PEak-Specified PE Tiles a, as, a, dm, m) using a Rewrite Rule Table (Rewrite Rule 1, 2, 3, 4, …) whose patterns include div, mul → add, sub, and ashr → add]
Each rewrite rule also carries a Cost (Rewrite Rule 1: 4.3, Rewrite Rule 2: 6.0, Rewrite Rule 3: 3.1, Rewrite Rule 4: 1.2, …). The instruction selection algorithm uses these costs when choosing how to cover the kernel.
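Below is a minimal sketch of cost-based instruction selection as tree covering with dynamic programming. This is a classic technique, but the slides do not say which algorithm the mapper actually uses. The sketch makes several assumptions:

* The kernel is a tree; real CoreIR kernels are DAGs.
* The rule names, patterns, and costs in `RULES` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    op: str                          # CoreIR primitive name, or "input"
    args: Tuple["Node", ...] = ()


# Hypothetical rewrite rule table: (rule name, pattern, cost).
# A pattern is "_" (wildcard: any subtree, covered separately)
# or a tuple (op, sub_pattern, ...).
RULES = [
    ("add", ("add", "_", "_"), 1.2),
    ("sub", ("sub", "_", "_"), 3.1),
    ("div", ("div", "_", "_"), 4.3),
    ("mul", ("mul", "_", "_"), 5.0),
    ("mul_add", ("add", ("mul", "_", "_"), "_"), 6.0),  # one PE does mul then add
]


def match(pattern, node: Node, leaves: List[Node]) -> bool:
    """Match pattern at node, collecting the subtrees bound to wildcards."""
    if pattern == "_":
        leaves.append(node)
        return True
    op, *subpatterns = pattern
    if node.op != op or len(subpatterns) != len(node.args):
        return False
    return all(match(p, a, leaves) for p, a in zip(subpatterns, node.args))


def select(node: Node) -> Tuple[float, List[str]]:
    """Cheapest cover of the tree rooted at node: (total cost, rules used)."""
    if node.op == "input":
        return 0.0, []
    best = None
    for name, pattern, cost in RULES:
        leaves: List[Node] = []
        if not match(pattern, node, leaves):
            continue
        total, chosen = cost, [name]
        for leaf in leaves:          # each wildcard subtree is covered recursively
            leaf_cost, leaf_rules = select(leaf)
            total += leaf_cost
            chosen += leaf_rules
        if best is None or total < best[0]:
            best = (total, chosen)
    if best is None:
        raise ValueError(f"no rewrite rule covers {node.op!r}")
    return best


x, y, z, w = (Node("input") for _ in range(4))
kernel = Node("add", (Node("mul", (x, y)), Node("sub", (z, w))))
print(select(kernel)[1])  # ['mul_add', 'sub']
```

For `add(mul(x, y), sub(z, w))`, the fused cover costs 6.0 + 3.1. Separate `add`, `mul`, and `sub` tiles cost 1.2 + 5.0 + 3.1, so the fused `mul_add` rule wins. Memoization is omitted for brevity; a DAG-aware selector would need it, along with a policy for shared subexpressions.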
The PEak Compiler generates a table of Rewrite Rules
[Diagram: PEak Program (PE spec) → PEak Compiler → Rewrite Rule Table (Rewrite Rule 1, 2, 3, 4, …; patterns such as div, mul → add, sub, ashr → add) → PE and MEM Mapper. Application Halide Program → Halide Compiler → CoreIR Graph → PE and MEM Mapper → Mapped CoreIR Graph → Place & Route Engine → CGRA Bitstream]
PEak: PE DSL
PE ISA Specification (specific types, or compositions of types, for operands and instructions):

    class Opcode(Enum):
        Add = 0
        Mul = 1
        ...

    # Define Instruction
    class Instruction(Product):
        op = Opcode
        invert_A = Bit
        c_in = Bit

    # Define Word
    Word = UnsignedBitVector[16]

PE Functional Specification:

    class PE(Peak):
        def __call__(self, inst: Const(Instruction), A: Word, B: Word, C: Word) -> {"res": Word, "flag": Bit}:
            if inst.invert_A:
                A = ~A
            if inst.op == Opcode.Add:
                res, c_out = A.add(B, inst.c_in)
                flag = c_out
            elif inst.op == Opcode.Mul:
                res = A * B
                flag = (res == 0)
            elif ...:
                ...
            return res, flag

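To experiment without the PEak toolchain, the PE spec can be mirrored in plain Python. This is a hedged stand-in, not PEak itself:

* Python's `Enum` and a frozen dataclass play the roles of the `Enum` and `Product` types.
* `Word` is modeled as a 16-bit int.
* The elided opcodes are left out.
* `flag` for `Add` is the carry out, as in the spec.

```python
from dataclasses import dataclass
from enum import Enum

WIDTH = 16                      # Word = UnsignedBitVector[16]
MASK = (1 << WIDTH) - 1


class Opcode(Enum):
    Add = 0
    Mul = 1                     # further opcodes are elided on the slide


@dataclass(frozen=True)
class Instruction:              # stands in for the Product type
    op: Opcode
    invert_A: int = 0           # Bit
    c_in: int = 0               # Bit


def pe(inst: Instruction, A: int, B: int, C: int) -> dict:
    """Concrete interpretation of the PE on Python ints (C is unused here)."""
    if inst.invert_A:
        A = ~A & MASK
    if inst.op == Opcode.Add:
        total = A + B + inst.c_in
        res, flag = total & MASK, total >> WIDTH   # flag = carry out
    elif inst.op == Opcode.Mul:
        res = (A * B) & MASK
        flag = int(res == 0)
    else:
        raise NotImplementedError(inst.op)
    return {"res": res, "flag": flag}


print(pe(Instruction(Opcode.Add), 0xFFFF, 1, 0))  # {'res': 0, 'flag': 1}
```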
Subtract?
[Diagram: PE with inputs inst, A, B, C and outputs res, flag]
Configuring the PE with

    inst = Instruction(op=Add, invert_A=1, c_in=1)

gives res = ~A + B + 1 = B - A, because in two's complement ~A + 1 = -A.
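The identity behind this configuration can be checked exhaustively at a small width. The 8-bit width is an assumption that keeps the loop short. The argument holds for any width W, including the PE's 16-bit `Word`, because ~A = 2^W - 1 - A.

```python
WIDTH = 8                        # assumption: small width so the check is exhaustive
MASK = (1 << WIDTH) - 1


def add_with_invert(a: int, b: int) -> int:
    """The PE's Add path with invert_A=1 and c_in=1: ~A + B + 1."""
    return ((~a & MASK) + b + 1) & MASK


def subtract_identity_holds() -> bool:
    """Check ~A + B + 1 == B - A (mod 2**WIDTH) for every pair of values."""
    return all(
        add_with_invert(a, b) == (b - a) & MASK
        for a in range(1 << WIDTH)
        for b in range(1 << WIDTH)
    )


print(subtract_identity_holds())  # True
```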
RiscV Peak Specification

    class RISCV(Peak):
        def __init__(self):
            # Define sub-components and state
            self.rf = RegisterFile(32, Word)
            self.PC = Register(Word)

        def __call__(self, inst: Instruction) -> {"next_PC": Word}:
            # ID
            rs1_idx, rs2_idx, rd_idx, … = decode(inst)
            rs1_val, rs2_val = self.rf.read(rs1_idx, rs2_idx)
            # EX
            ...
            # MEM
            ...
            # WB
            self.rf.write(rd_idx, rd_val)

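Here is a plain-Python sketch of the same structure. It assumes RV32 (32-bit registers) and models only R-type ADD and SUB, taking already-decoded fields as arguments. The class and method names echo the slide but are otherwise hypothetical. The sketch shows the state the spec declares and how one instruction reads and updates it:

* a 32-entry register file, with x0 hardwired to zero;
* a PC.

```python
from typing import List, Tuple

WIDTH = 32                         # assumption: RV32, 32-bit registers
MASK = (1 << WIDTH) - 1


class RegisterFile:
    """n registers of WIDTH bits; register x0 always reads as zero."""

    def __init__(self, n: int = 32) -> None:
        self.regs: List[int] = [0] * n

    def read(self, idx1: int, idx2: int) -> Tuple[int, int]:
        return self.regs[idx1], self.regs[idx2]

    def write(self, idx: int, value: int) -> None:
        if idx != 0:               # writes to x0 are discarded
            self.regs[idx] = value & MASK


class RISCV:
    def __init__(self) -> None:
        self.rf = RegisterFile(32)  # sub-component
        self.PC = 0                 # state

    def __call__(self, funct7: int, rs2: int, rs1: int, funct3: int, rd: int) -> int:
        """Execute one pre-decoded R-type ADD/SUB and return next_PC."""
        rs1_val, rs2_val = self.rf.read(rs1, rs2)          # ID: read operands
        if funct3 == 0 and funct7 == 0x00:                 # EX: ADD
            rd_val = rs1_val + rs2_val
        elif funct3 == 0 and funct7 == 0x20:               # EX: SUB
            rd_val = rs1_val - rs2_val
        else:
            raise NotImplementedError("only ADD and SUB are modeled")
        # MEM: nothing to do for register-register ALU instructions
        self.rf.write(rd, rd_val)                          # WB
        self.PC = (self.PC + 4) & MASK
        return self.PC


cpu = RISCV()
cpu.rf.regs[1], cpu.rf.regs[2] = 10, 3
next_pc = cpu(0x20, 2, 1, 0, 3)    # sub x3, x1, x2
print(cpu.rf.regs[3], next_pc)     # 7 4
```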
RiscV ISA Specification with Algebraic Data Types

    class Register(Product):
        funct7 = Funct7Enum
        rs2 = BitVector[5]
        rs1 = BitVector[5]
        funct3 = Funct3Enum
        rd = BitVector[5]
        opcode = Opcode

    class Immediate(Product): ...
    class UImmediate(Product): ...
    class Store(Product): ...
    class Branch(Product): ...
    class Jump(Product): ...

    Instruction = Sum[Register, Immediate, UImmediate, Store, Branch, Jump]

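Python has no built-in sum-of-products types like PEak's `Product` and `Sum`. Frozen dataclasses combined with `typing.Union` give a rough analogue. This sketch uses hypothetical names and models only the R-type `Register` variant and an I-type `Immediate` variant. It decodes a 32-bit RV32I word into one variant using the standard RISC-V field positions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Register:            # R-type fields, as in the slide's Register Product
    funct7: int
    rs2: int
    rs1: int
    funct3: int
    rd: int
    opcode: int


@dataclass(frozen=True)
class Immediate:           # I-type fields (12-bit signed immediate)
    imm: int
    rs1: int
    funct3: int
    rd: int
    opcode: int


# Sum of the modeled variants; UImmediate, Store, Branch, Jump are omitted.
Instruction = Union[Register, Immediate]

OP = 0x33                  # RV32I register-register ALU opcode
OP_IMM = 0x13              # RV32I register-immediate ALU opcode


def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word: int) -> Instruction:
    opcode = bits(word, 6, 0)
    if opcode == OP:
        return Register(bits(word, 31, 25), bits(word, 24, 20), bits(word, 19, 15),
                        bits(word, 14, 12), bits(word, 11, 7), opcode)
    if opcode == OP_IMM:
        imm = bits(word, 31, 20)
        if imm & 0x800:    # sign-extend the 12-bit immediate
            imm -= 1 << 12
        return Immediate(imm, bits(word, 19, 15), bits(word, 14, 12),
                         bits(word, 11, 7), opcode)
    raise NotImplementedError(f"opcode {opcode:#x} is not modeled")


print(decode(0x402081B3))  # sub x3, x1, x2
# Register(funct7=32, rs2=2, rs1=1, funct3=0, rd=3, opcode=51)
```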
Multiple Interpretations of PEak Specification
• A PEak program uses abstract types provided by the PEak DSL, such as Bit and BitVector.
• Each component of the PEak compiler provides a separate concrete implementation of these abstract types.
• This gives multiple interpretations of a PEak specification in different contexts:
  • Python context (BitVector): functional model
  • Magma context (Bits): RTL
  • SMT context (SMTBitVector): symbolic representation (for rewrite rules)
The PEak program is the SINGLE SOURCE OF TRUTH for all three interpretations.
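The "one program, several interpretations" idea can be imitated with Python duck typing. A function written only against `~` and `+` runs unchanged on different value types. The two classes below are hypothetical stand-ins for PEak's BitVector and SMTBitVector contexts; the Magma context is not modeled. The symbolic class builds SMT-LIB bit-vector terms as strings rather than calling a solver.

```python
from dataclasses import dataclass

WIDTH = 16
MASK = (1 << WIDTH) - 1


def sub_via_add(A, B):
    """A single 'program', written only against ~, + and integer constants."""
    return ~A + B + 1


@dataclass(frozen=True)
class ConcreteBV:
    """Concrete interpretation: WIDTH-bit values on Python ints."""
    value: int

    def __invert__(self):
        return ConcreteBV(~self.value & MASK)

    def __add__(self, other):
        other = other.value if isinstance(other, ConcreteBV) else other
        return ConcreteBV((self.value + other) & MASK)


@dataclass(frozen=True)
class SymbolicBV:
    """Symbolic interpretation: build an SMT-LIB term instead of a value."""
    expr: str

    def __invert__(self):
        return SymbolicBV(f"(bvnot {self.expr})")

    def __add__(self, other):
        other = other.expr if isinstance(other, SymbolicBV) else f"(_ bv{other} {WIDTH})"
        return SymbolicBV(f"(bvadd {self.expr} {other})")


print(sub_via_add(ConcreteBV(5), ConcreteBV(7)))             # ConcreteBV(value=2)
print(sub_via_add(SymbolicBV("A"), SymbolicBV("B")).expr)
# (bvadd (bvadd (bvnot A) B) (_ bv1 16))
```

The symbolic result is the term a solver would reason about, while the concrete result is what a functional model would compute. Both come from the same `sub_via_add` source.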
Discovering a Rewrite Rule
[Diagram: CoreIR.Sub (inputs In0, In1; output Out) next to the PE (inputs inst, A, B, C; outputs res, flag)]
Input/Output Bindings
[Diagram: In0 and In1 of CoreIR.Sub are bound to PE data inputs and Out to a PE output; the PE's inst input is marked as a Constant]
Setting Constants: inst = Instruction(op=Add, invert_A=1, c_in=1)
A rewrite rule is a choice of inst and input_binding such that:

    CoreIR.Sub(in0, in1) == PE(inst, input_binding(in0, in1))

This must hold for every input, so finding a rule is an exists-forall query:

    ∃(input_binding, inst) such that ∀(in0, in1):
        CoreIR.Sub(in0, in1) == PE(inst, input_binding(in0, in1))

CoreIR.Sub has a single output, so only the PE's res output is compared:

    ∃(input_binding, inst) such that ∀(in0, in1):
        CoreIR.Sub(in0, in1) == PE(inst, input_binding(in0, in1))['res']

PE inputs not driven by in0 or in1 (collected as other) must not affect the result, so they are universally quantified too:

    ∃(input_binding, inst) such that ∀(in0, in1, other):
        CoreIR.Sub(in0, in1) == PE(inst, input_binding(in0, in1, other))['res']

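At a tiny bit-width, the ∃∀ query can be answered by brute force instead of an SMT solver, which makes the structure of the search visible. This is only a sketch, under these assumptions:

* `Word` is 4 bits wide.
* The PE is reduced to its Add and Mul paths, with only the `res` output.
* `input_binding` is a permutation assigning `in0`, `in1`, and `other` to the ports A, B, C.

A real generator hands the formula to an SMT solver, since enumerating every input is infeasible at 16 bits.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import permutations, product

WIDTH = 4                    # tiny width so "for all inputs" can be enumerated
MASK = (1 << WIDTH) - 1


class Opcode(Enum):
    Add = 0
    Mul = 1


@dataclass(frozen=True)
class Instruction:
    op: Opcode
    invert_A: int
    c_in: int


def pe(inst: Instruction, A: int, B: int, C: int) -> dict:
    if inst.invert_A:
        A = ~A & MASK
    if inst.op == Opcode.Add:
        return {"res": (A + B + inst.c_in) & MASK}
    return {"res": (A * B) & MASK}


def coreir_sub(in0: int, in1: int) -> int:
    return (in0 - in1) & MASK


def find_rewrite_rule():
    """∃ (inst, binding) s.t. ∀ (in0, in1, other): Sub(in0, in1) == PE(...)['res']."""
    values = range(1 << WIDTH)
    for op, inv, cin in product(Opcode, (0, 1), (0, 1)):          # ∃ inst
        inst = Instruction(op, inv, cin)
        # binding[i] = PE port (0=A, 1=B, 2=C) that receives in0, in1, other
        for binding in permutations(range(3)):                     # ∃ input_binding
            def run(in0: int, in1: int, other: int) -> int:
                ports = [0, 0, 0]
                for port, val in zip(binding, (in0, in1, other)):
                    ports[port] = val
                return pe(inst, *ports)["res"]
            if all(run(a, b, o) == coreir_sub(a, b)                # ∀ inputs
                   for a, b, o in product(values, repeat=3)):
                return inst, binding
    return None


print(find_rewrite_rule())
```

The search finds `Instruction(Opcode.Add, invert_A=1, c_in=1)` with binding `(1, 0, 2)`. In that binding, `in0` goes to B, `in1` goes to A, and `other` goes to C. This is the `res = B - A` configuration found by hand earlier.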
How to Handle State?
[Diagram: a PE (inputs inst, A, B, C; outputs res, flag) with internal State, and the same PE after a Transform]
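A stateful PE can be made to fit the stateless rewrite-rule formula by lifting the state into an extra input (current state) and an extra output (next state). This is a common approach, but it is assumed here, since the slide only shows that a Transform is applied. The accumulator PE below is hypothetical.

```python
from typing import Tuple

MASK = 0xFFFF                      # assumption: 16-bit words


class AccumulatorPE:
    """Hypothetical stateful PE: adds A into an internal accumulator."""

    def __init__(self) -> None:
        self.acc = 0               # hidden state

    def __call__(self, A: int) -> int:
        self.acc = (self.acc + A) & MASK
        return self.acc            # res


def accumulator_pe_transformed(A: int, acc_in: int) -> Tuple[int, int]:
    """Same PE after the transform: state enters and leaves as ports."""
    acc_out = (acc_in + A) & MASK
    return acc_out, acc_out        # (res, next state)


# Both versions agree when the caller threads the state through.
stateful, acc = AccumulatorPE(), 0
for a in (1, 2, 0xFFFF):
    res, acc = accumulator_pe_transformed(a, acc)
    assert res == stateful(a)
print(acc)  # 2
```

Once the state is a port, the transformed PE is a pure function. It can then be quantified over like any other input in the ∃∀ formula.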
Floating Point?
[Diagram: a PE (inputs inst, A, B, C; outputs res, flag) containing Floating Point logic, and the same PE after a Transform]
Performance of Rewrite Rule Generator
• Problem: universally quantified SMT queries can take a long time
• Solutions:
  • It is okay to be slightly slow (unless doing design space exploration, DSE!)
  • Different ways to encode the final formula
  • Different techniques for solving quantified expressions
• Recent results: ~1 minute to solve 20 rewrite rules on our current CGRA
What patterns to use in the rewrite rule table?
[Diagram: PEak Program (PE spec) → PEak Compiler → Rewrite Rule Table (Rewrite Rule 1, 2, 3, 4, …) → PE and MEM Mapper, with the table's patterns (div, mul → add, sub, ashr → add) marked ??. Application Halide Program → Halide Compiler → CoreIR Graph → PE and MEM Mapper → Mapped CoreIR Graph → Place & Route Engine → CGRA Bitstream]
Which Patterns?
• Enumerate all possible patterns up to a size
  • Lots of uncommon patterns
  • Bloated rewrite rule table
  • Slower instruction selection
• Analyze the target domain's applications for common subgraphs
  • Approach used for our upcoming DSE paper
• Only very basic patterns
  • Use peephole optimization/packing after instruction selection
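The second option, analyzing applications for common subgraphs, can be shown in its simplest form. The sketch counts which producer/consumer primitive pairs occur most often across application dataflow graphs. It is a deliberately minimal, hypothetical stand-in: real frequent-subgraph mining considers larger subgraphs of arbitrary shape, and the graph encoding here is assumed.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Dataflow graph: node name -> (CoreIR primitive, names of its inputs).
# Names that are not keys of the graph are treated as primary inputs.
Graph = Dict[str, Tuple[str, List[str]]]


def count_pairs(graphs: List[Graph]) -> Counter:
    """Count (producer op, consumer op) edges across application graphs."""
    counts: Counter = Counter()
    for graph in graphs:
        for op, inputs in graph.values():
            for src in inputs:
                if src in graph:                  # skip primary inputs
                    counts[(graph[src][0], op)] += 1
    return counts


apps = [
    {"m": ("mul", ["x", "y"]), "a": ("add", ["m", "z"])},
    {"m": ("mul", ["x", "y"]), "a": ("add", ["m", "w"]), "s": ("ashr", ["a", "k"])},
]
print(count_pairs(apps).most_common(1))  # [(('mul', 'add'), 2)]
```

The most frequent pairs are candidates for fused rewrite rules, such as a mul → add pattern. Rare pairs can be left to the basic single-primitive rules.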
CPU Instruction Selection
CGRA Compilation
[Diagram: CoreIR Graph: From Global Buffer → Unified Buffer → Computation Kernel → Unified Buffer → Computation Kernel → To Global Buffer]
[Diagram: a Control Flow Graph of Basic Blocks (machine independent); one basic block contains:]

    R2 ← Sub(R0, R1)
    R3 ← M[R2]
    M[R3] ← R1
    R4 ← Add(R1, 0x50)
    …

Compiling WebAssembly to RiscV?
[Diagram: WebAssembly Subtract (inputs In0, In1; output Out): Out ← Sub(In0, In1), next to the RISCV core (input inst) with its Register File]
Transform RiscV to remove Register File
[Diagram: RISCV (input inst) with an internal Register File → Transform → RISCV with register operands exposed as ports: inputs rs1, rs2 and output rd]
Discovering Subtract
[Diagram: Out ← Sub(In0, In1) (inputs In0, In1; output Out) next to the transformed RISCV (inputs inst, rs1, rs2; output rd)]
Branch/Memory Instructions?
[Diagram: the transformed RISCV (inputs inst, rs1, rs2; output rd) with additional input signals PC and MemRead and output signals Next PC, Mem Addr, and Mem Write]
The Future
• Goal: fully automatic compiler generation for accelerator architectures
Thank You