shunning jiang, christopher torng, an ... - cornell university

1
An Open-Source Python-Based Hardware Generation, Simulation, and Verification Framework Shunning Jiang, Christopher Torng, Christopher Batten Computer Systems Laboratory School of Electrical and Computer Engineering Cornell University 1 Abstract We present an overview of previously published features and work in progress for PyMTL, an open-source Python-based hardware generation, simulation, and verification framework that brings compelling productivity benefits to hardware design and verification. PyMTL provides a natural environment for multi-level modeling using method-based interfaces, features highly parametrized static elaboration and analysis/transform passes, supports fast simulation and property-based random testing in pure Python environment, and includes seamless SystemVerilog integration. 2 Hardware Development Workflows Existing HDL, HPF, HGF, and HGSF Workflows We categorize state-of-the-art hardware development workflows into four main groups with increasing productivity: traditional HDLs, hardware preprocessing frameworks (HPF), hardware generation frameworks (HGF), and hardware generation and simulation frameworks (HGSF) . * DUT = design under test; DUT’ = generated DUT; TB = test bench; TB* = TB with limited functionality; TB’ = generated TB; Sim = simulation. DUT Sim TB HDL (Verilog) FPGA/ ASIC synth DUT' Sim TB' HDL (Verilog) DUT Mixed (Verilog+Perl) TB gen gen FPGA/ ASIC synth DUT' Sim TB' HDL (Verilog) DUT Host Language (Scala) TB TB FPGA/ ASIC gen gen synth * * (a) Traditional (b) HPF (e.g., Genesis2) (c) HGF (e.g., Chisel) Other examples: VHDL Other examples: Verischemelog Other examples: Lava, HML, Stratus, PHDL 3 Fast edit-debug-sim loop 3 Single language for design and testbench 7 Not-so-good parametrization 7 Difficult to build powerful testbench 3 Better parametrization with insignificant coding style change 7 Multi-language semantic gap 7 Still difficult to build powerful testbench 3 Powerful parametrization 3 Single language for design 7 Slower edit-debug-sim loop 7 Yet still difficult to build powerful testbench (can only generate simple testbench) DUT' Sim HDL (Verilog) DUT Host Language (Python) TB Sim cosim FPGA/ ASIC gen synth (d) HGSF (e.g., PyMTL) 3 Fast(?) edit-debug-sim loop in host language 3 Single language for design and testbench 3 Powerful parametrization 3 Powerful testbench (unleash Python’s full power!) Other examples: JHDL, CλaSH, PyMTL, PyRTL, Migen, PyHDL PyMTL’s HGSVF Workflow and Software Architecture Verilog DUT' Sim HDL (Verilog) FL DUT Host Language (Python) Test Bench Sim cosim FPGA/ ASIC generate synth CL DUT RTL DUT coverage.py pytest hypothesis Model PyMTL Specifications (Python) Config Elaboration Model Instance PyMTL "Kernel" (Python) Simulatable Model Test & Sim Harnesses PyMTL Passes (Python) Simulation Pass Translation Pass Verilog Analysis Pass Analysis Output Transform Pass New Model 3 PyMTL’s Eight Features . Multi-level modeling # FL implementation for calculating log2(N) @s.tick_fl def fl_algorithm(): # put/get have blocking semantics s.out.put( math.log( s.in.get(), 2 ) ) # CL implementation emulates a 3-cycle pipeline s.pipe = Pipeline( latency = 3 ) @s.tick_cl def cl_algo_pipelined(): if s.out_q.enq_ready(): if s.pipe.can_pop(): s.out_q.push( s.pipe.do_pop() ) else: s.pipe.advance() if not s.in_q.deq_ready(): s.pipe.do_push( math.log( s.in_q.deq(), 2 ) ) # Part of RTL implementation s.N = Reg( Bits32 ) s.res = RegEn( Bits32 ) s.connect( s.res.out, s.out.msg ) ... @s.combinational def rtl_combN(): s.res.in_ = s.res.out + 1 s.N.in_ = s.N.out >> 1 if s.N.out == 0: s.res.en = Bits1( 0 ) else: s.res.en = Bits1( 1 ) . Pure-Python simulation . Property-based random testing . Highly Parametrized Static Elaboration . Method-based interfaces Xcel FL Adapter RTL Interface x = read(0x10) Xcel CL Adapter RTL Interface enqueue(req) dequeue(resp) Xcel RTL RTL Interface valid = 1; addr = ... if valid: x = data read(0x10) Mem FL Xcel FL Adapter FL Interface x = read(0x10) Xcel CL CL Interface enqueue(req) dequeue(resp) Xcel RTL RTL Interface if ready: en = 1; ... rdy = 1; if en: ... Port-Based Method-Based FL Interface valid/ ready valid/ ready valid/ ready read() en/ ready read() Adapter enq() read() deq() read(0x10) Mem FL FL Interface read(0x10) Mem FL FL Interface Mem FL Adapter RTL Interface read(0x10) Mem FL Adapter RTL Interface read(0x10) Mem FL Adapter RTL Interface read(0x10) . Python/SystemVerilog integration # By default PyMTL imports module DUT of DUT.v # in the same folder as the python source file. class DUT( VerilogModel ): def __init__( s ): s.in_ = InPort ( Bits32 ) s.out = OutPort ( Bits32 ) # Connect top level ports of DUT # to corresponding PyMTL ports s.set_ports({ 'clk' : s.clk, 'reset' : s.reset, 'in' : s.in_, 'out' : s.out, }) . Analysis and Transform passes # Analysis pass example: # Get a list of processors with >=2 input ports def count_pass( top ): ret = [] for m in top.get_all_modules_filter( lambda m: len( m.get_input_ports() ) >= 2 ): if isinstance( m, AbstractProcessor ): ret.append( m ) return m # Transform pass example: # Wrap every ctrl with CtrlWrapper def debug_port_pass( top ): for m in top.get_all_modules(): if m.get_full_name().startswith("ctrl"): p = m.get_parent() ctrl = p.delete_component( "ctrl" ) w = p.add_component( "ctrl_wrap", CtrlWrapper() ) new_ctrl = w.add_component( "ctrl", m ) ... < connect ports > ... . Fast Pure-Python Simulation PyMTL MyHDL PyRTL Migen IVerilog CVS Mamba Divider 118K CPS 0.82.20.030.69.3201-core 20K CPS - - - 1151632-core 360 CPS - - - 1.825124 PyMTL Use Cases Course Lab Assignments PyMTL has been used by over 400 students across two universities, including in a senior-level undergraduate computer architecture course at Cornell Univer- sity(ECE 4750), in a similar course at Boston University (EC 513), and in a graduate-level ASIC design course at Cornell University (ECE 5745). The com- puter architecture courses involved multiple design labs (integer multiplier, simple RISC-V processor, set-associative blocking cache, and bus/ring network), cul- minating in a final lab composing all previous components to build a multi-core system. Students chose whether to design in PyMTL, in SystemVerilog, or with a mix, but they were required to test their designs using PyMTL. Computer Architecture Research Cosimulate PyMTL cycle-level accelerator model with gem5 CPU/memory: Shreesha Srinath, Berkin Ilbeyi, Mingxing Tan, Gai Liu, Zhiru Zhang, and Christopher Batten. ”Architectural Specialization for Inter-Iteration Loop Dependence Patterns.” 47th ACM/IEEE Int’l Symp. on Microarchitecture (MICRO-47), Dec. 2014 Ji Kim, Shunning Jiang, Christopher Torng, Moyang Wang, Shreesha Srinath, Berkin Ilbeyi, Khalid Al-Hawaj, and Christopher Batten. ”Using Intra-Core Loop-Task Accelerators to Improve the Productivity and Performance of Task-Based Parallel Programs.” 50th ACM/IEEE Int’l Symp. on Microarchitecture (MICRO-50), Oct. 2017. Create architecture templates of tuned accelerator with PyMTL RTL modeling: Tao Chen, Shreesha Srinath, Christopher Batten, and Edward Suh. ”An Architectural Framework for Accelerating Dynamic Parallel Algorithms on Reconfigurable Hardware.” 51st ACM/IEEE Int’l Symp. on Microarchitecture (MICRO-51), Oct. 2018. Tao Chen and Edward Suh. ”Efficient Data Supply for Hardware Accelerators with Prefetching and Access/Execute Decoupling.” 49th ACM/IEEE Int’l Symp. on Microarchitecture (MICRO-49), Oct. 2016. Batten Research Group Test Chip 1 (2016) . Fabricated in IBM 130nm; 2 x 2mm die; 1.2M transistor; RISC processor; LVDS clock receiver. Batten Research Group Test Chip 2 (2018) . Fabricated in TSMC 28nm; 1mm x 1.25mm die; 6.7M transistor; quad-core in-order RV32IMAF; smart sharing techniques for LLFUs and caches; synthesizable PLL. This work was supported in part by NSF CRI Award #1512937, NSF SHF Award #1527065, DARPA POSH Award #FA8650-18-2-7852, and a donation from Intel. The authors acknowledge and thank Derek Lockhart for his valuable feedback and his work on the original PyMTL framework. The authors would like to thank Ajay Joshi for using PyMTL for the computer architecture course at Boston University. The author also thank all the students who have provided feedback to PyMTL. U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation theron. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of any funding agency. Publication: Appears in The First Workshop on Open-Source EDA Technology (WOSET 2018), Nov. 2018. URL: http://github.com/cornell-brg Contact Author: Shunning Jiang, 471 Rhodes Hall, Ithaca, NY 14853, [email protected]

Upload: others

Post on 21-Apr-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Shunning Jiang, Christopher Torng, An ... - Cornell University

An Open-Source Python-Based HardwareGeneration, Simulation, and Verification Framework

Shunning Jiang, Christopher Torng,Christopher Batten

Computer Systems LaboratorySchool of Electrical and Computer Engineering

Cornell University

1 AbstractWe present an overview of previously published features and work in progress for PyMTL, an open-source Python-based hardware generation,simulation, and verification framework that brings compelling productivity benefits to hardware design and verification. PyMTL provides a naturalenvironment for multi-level modeling using method-based interfaces, features highly parametrized static elaboration and analysis/transform passes,supports fast simulation and property-based random testing in pure Python environment, and includes seamless SystemVerilog integration.

2 Hardware Development WorkflowsExisting HDL, HPF, HGF, and HGSF Workflows

We categorize state-of-the-art hardware development workflows into four main groups with increasing productivity: traditional HDLs, hardwarepreprocessing frameworks (HPF), hardware generation frameworks (HGF), and hardware generation and simulation frameworks (HGSF) .

* DUT = design under test; DUT’ = generated DUT; TB = test bench; TB* = TB with limited functionality; TB’ = generated TB; Sim = simulation.

DUT

Sim

TB

HDL(Verilog)

FPGA/ ASIC

synthDUT'

Sim

TB'

HDL(Verilog)

DUT

Mixed(Verilog+Perl)

TB

gen

genFPGA/ ASIC

synthDUT'

Sim

TB'

HDL(Verilog)

DUT

Host Language (Scala)

TB TB

FPGA/ ASIC

gen

gen

synth

* *

(a) Traditional (b) HPF (e.g., Genesis2) (c) HGF (e.g., Chisel)

Other examples: VHDL Other examples: Verischemelog Other examples: Lava, HML, Stratus, PHDL

3 Fast edit-debug-sim loop3 Single language for design and

testbench7 Not-so-good parametrization7 Difficult to build powerful testbench

3 Better parametrization with insignificantcoding style change

7 Multi-language semantic gap7 Still difficult to build powerful testbench

3 Powerful parametrization3 Single language for design

7 Slower edit-debug-sim loop7 Yet still difficult to build powerful testbench

(can only generate simple testbench)

DUT'

Sim

HDL(Verilog)

DUT

Host Language (Python)

TB

Sim

cosim

FPGA/ ASIC

gen synth

(d) HGSF (e.g., PyMTL)

3 Fast(?) edit-debug-sim loop in host language

3 Single language for design and testbench

3 Powerful parametrization

3 Powerful testbench (unleash Python’s full power!)

Other examples: JHDL, C�aSH, PyMTL, PyRTL, Migen, PyHDL

PyMTL’s HGSVF Workflow and Software Architecture

Verilog DUT'

Sim

HDL(Verilog)

FL DUT

Host Language (Python)

Test Bench

Sim cosim

FPGA/ ASIC

generate synth

CL DUTRTL DUT

coverage.pypytest hypothesis

Model

PyMTLSpecifications

(Python)

Config

Elaboration ModelInstance

PyMTL "Kernel"(Python)

SimulatableModel

Test & SimHarnesses

PyMTLPasses(Python)

SimulationPass

TranslationPass Verilog

AnalysisPass

AnalysisOutput

TransformPass

NewModel

3 PyMTL’s Eight Features

.Multi-level modeling# FL implementation for calculating log2(N)@s.tick_fldef fl_algorithm():

# put/get have blocking semanticss.out.put( math.log( s.in.get(), 2 ) )

# CL implementation emulates a 3-cycle pipelines.pipe = Pipeline( latency = 3 )@s.tick_cldef cl_algo_pipelined():

if s.out_q.enq_ready():if s.pipe.can_pop(): s.out_q.push( s.pipe.do_pop() )else: s.pipe.advance()

if not s.in_q.deq_ready():s.pipe.do_push( math.log( s.in_q.deq(), 2 ) )

# Part of RTL implementations.N = Reg( Bits32 )s.res = RegEn( Bits32 )s.connect( s.res.out, s.out.msg )[email protected] rtl_combN():

s.res.in_ = s.res.out + 1s.N.in_ = s.N.out >> 1if s.N.out == 0: s.res.en = Bits1( 0 )else: s.res.en = Bits1( 1 )

. Pure-Python simulation

. Property-based random testing

.Highly Parametrized Static Elaboration

.Method-based interfaces

Xcel FL

Ada

pter

RTL

Inte

rfac

e

x = read(0x10)

Xcel CL

Ada

pter

RTL

Inte

rfac

e

enqueue(req)

dequeue(resp)

Xcel RTL

RTL

Inte

rfac

e

valid = 1; addr = ...

if valid: x = data

read(0x10)

Mem FLXcel FL

Ada

pter

FL In

terf

ace

x = read(0x10)

Xcel CL

CL

Inte

rfac

e

enqueue(req)

dequeue(resp)

Xcel RTL

RTL

Inte

rfac

e

if ready: en = 1; ...

rdy = 1; if en: ...

Port-Based Method-Based

FL In

terf

ace

valid/ready

valid/ready

valid/ready

read()

en/ready

read()

Ada

pter

enq()read()

deq() read(0x10)

Mem FL

FL In

terf

ace

read(0x10)

Mem FL

FL In

terf

ace

Mem FL

Ada

pter

RTL

Inte

rfac

e

read(0x10)

Mem FL

Ada

pter

RTL

Inte

rfac

e

read(0x10)

Mem FL

Ada

pter

RTL

Inte

rfac

e

read(0x10)

. Python/SystemVerilog integration# By default PyMTL imports module DUT of DUT.v# in the same folder as the python source file.class DUT( VerilogModel ):

def __init__( s ):s.in_ = InPort ( Bits32 )s.out = OutPort ( Bits32 )

# Connect top level ports of DUT# to corresponding PyMTL portss.set_ports({'clk' : s.clk,'reset' : s.reset,'in' : s.in_,'out' : s.out,

})

.Analysis and Transform passes# Analysis pass example:# Get a list of processors with >=2 input portsdef count_pass( top ):

ret = []for m in top.get_all_modules_filter(

lambda m: len( m.get_input_ports() ) >= 2 ):if isinstance( m, AbstractProcessor ):

ret.append( m )return m

# Transform pass example:# Wrap every ctrl with CtrlWrapperdef debug_port_pass( top ):

for m in top.get_all_modules():if m.get_full_name().startswith("ctrl"):

p = m.get_parent()ctrl = p.delete_component( "ctrl" )w = p.add_component( "ctrl_wrap", CtrlWrapper() )new_ctrl = w.add_component( "ctrl", m )...< connect ports >...

. Fast Pure-Python SimulationPyMTL MyHDL PyRTL Migen IVerilog CVS Mamba

Divider 118K CPS 0.8⇥ 2.2⇥ 0.03⇥ 0.6⇥ 9.3⇥ 20⇥1-core 20K CPS - - - 1⇥ 15⇥ 16⇥32-core 360 CPS - - - 1.8⇥ 25⇥ 12⇥

4 PyMTL Use Cases

Course Lab Assignments

PyMTL has been used by over 400 students across two universities, includingin a senior-level undergraduate computer architecture course at Cornell Univer-sity(ECE 4750), in a similar course at Boston University (EC 513), and in agraduate-level ASIC design course at Cornell University (ECE 5745). The com-puter architecture courses involved multiple design labs (integer multiplier, simpleRISC-V processor, set-associative blocking cache, and bus/ring network), cul-minating in a final lab composing all previous components to build a multi-coresystem. Students chose whether to design in PyMTL, in SystemVerilog, or with amix, but they were required to test their designs using PyMTL.

Computer Architecture Research

Cosimulate PyMTL cycle-level accelerator model with gem5 CPU/memory:Shreesha Srinath, Berkin Ilbeyi, Mingxing Tan, Gai Liu, Zhiru Zhang, and Christopher Batten. ”Architectural Specialization for Inter-IterationLoop Dependence Patterns.” 47th ACM/IEEE Int’l Symp. on Microarchitecture (MICRO-47), Dec. 2014Ji Kim, Shunning Jiang, Christopher Torng, Moyang Wang, Shreesha Srinath, Berkin Ilbeyi, Khalid Al-Hawaj, and Christopher Batten. ”UsingIntra-Core Loop-Task Accelerators to Improve the Productivity and Performance of Task-Based Parallel Programs.” 50th ACM/IEEEInt’l Symp. on Microarchitecture (MICRO-50), Oct. 2017.

Create architecture templates of tuned accelerator with PyMTL RTL modeling:Tao Chen, Shreesha Srinath, Christopher Batten, and Edward Suh. ”An Architectural Framework for Accelerating Dynamic ParallelAlgorithms on Reconfigurable Hardware.” 51st ACM/IEEE Int’l Symp. on Microarchitecture (MICRO-51), Oct. 2018.Tao Chen and Edward Suh. ”Efficient Data Supply for Hardware Accelerators with Prefetching and Access/Execute Decoupling.” 49thACM/IEEE Int’l Symp. on Microarchitecture (MICRO-49), Oct. 2016.

Batten Research Group Test Chip 1 (2016)

. Fabricated in IBM 130nm; 2 x 2mm die; 1.2M transistor; RISC processor; LVDS clock receiver.

Batten Research Group Test Chip 2 (2018)

. Fabricated in TSMC 28nm; 1mm x 1.25mm die; 6.7M transistor; quad-core in-order RV32IMAF;smart sharing techniques for LLFUs and caches; synthesizable PLL.

This work was supported in part by NSF CRI Award #1512937, NSF SHF Award #1527065, DARPA POSH Award #FA8650-18-2-7852, and a donation from Intel. The authors acknowledge and thank DerekLockhart for his valuable feedback and his work on the original PyMTL framework. The authors would like to thank Ajay Joshi for using PyMTL for the computer architecture course at Boston University. Theauthor also thank all the students who have provided feedback to PyMTL. U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notationtheron. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of any funding agency.

Publication: Appears in The First Workshop on Open-Source EDA Technology (WOSET 2018), Nov. 2018. URL: http://github.com/cornell-brg Contact Author: Shunning Jiang, 471 Rhodes Hall, Ithaca, NY 14853, [email protected]