hwacha v4: decoupled data parallel custom extension · raven-1 hv1 raven-2 hv2 raven-3 hv3 raven-4...

40
Hwacha V4: Decoupled Data Parallel Custom Extension Colin Schmidt, Albert Ou, Krste Asanović UC Berkeley

Upload: others

Post on 27-Jun-2020

30 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha V4: Decoupled Data Parallel Custom Extension

Colin Schmidt, Albert Ou, Krste Asanović

UC Berkeley

Page 2: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Introduction

• Data-parallel, custom ISA extension to RISC-V with a configurable ASIC-focused implementation

• Developed to research energy-efficient implementation of vector architecture

• Several versions over the years with this the most recent being V4

• Rocket-chip based accelerator with binutils, llvm-backend, FireSim, and ASIC VLSI support

• Open-sourced now!

2

Page 3: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Key Ideas in Vector Architectures

• Runtime-variable vector length register • Compact, portable code

• Hardware implementation scalability

• Reasonable encoding space usage

• Long vectors with temporal and spatial execution

• Scalar-vector computation

• Configurable register file • Trade architectural registers for longer vector lengths

3

Page 4: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha Key Ideas

• Goal: Maximize efficiency of in-order vector microarchitecture

• Access/execute decoupling

• Precision-reconfigurable vector RF to increase vector lengths

• Predication and consensual branches to enable control flow

• Mixed scalar-vector computation

• Mixed-precision registers and datapaths

• OS support with unified virtual memory and restartable exceptions

4

Page 5: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha Architecture Overview

• Configurable vector data and predicate register files

• Fixed number of address and scalar registers

• Unit-stride, constant-stride, indexed load/stores

• Full set of half, single, double FP, integer, and predicate operations

5

Page 6: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Vector Fetch Programming model

• Separate vector instructions from scalar control thread

• Vector-fetch instruction, vf, sends address from control thread to vector unit

• Vector unit fetches and decodes separate instruction stream

• Control processor runs ahead enqueuing more work, hiding latency

CP VI$

Vector

Execution

Unit

Vector Unit

6

Page 7: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha Vector-Fetch Architectural Paradigm

7

a0: n, a1: a,

a2: *x, a3: *y

vmcs vs0, a1

stripmine:

vsetvl t0, a0

vmca va0, a2

vmca va1, a3

vf saxpy

slli t1, t0, 2

add a2, a2, t1

add a3, a3, t1

sub a0, a0, t0

bnez a0, stripmine

saxpy:

vlw vv0, va0

vlw vv1, va1

vfma.svv vv1, vs0, vv0, vv1

vsw vv1, va1

vstop

Control Thread

Worker Thread

CP VI$

Vector

Execution

Unit

Vector Unit

for (i=0; i<n; i++) {

y[i] = a*x[i] + y[i];

}

Page 8: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha Vector-Fetch Architectural Paradigm

8

a0: n, a1: a,

a2: *x, a3: *y

vmcs vs0, a1

stripmine:

vsetvl t0, a0

vmca va0, a2

vmca va1, a3

vf saxpy

slli t1, t0, 2

add a2, a2, t1

add a3, a3, t1

sub a0, a0, t0

bnez a0, stripmine

saxpy:

vlw vv0, va0

vlw vv1, va1

vfma.svv vv1, vs0, vv0, vv1

vsw vv1, va1

vstop

Control Thread

Worker Thread

CP VI$

Vector

Execution

Unit

Vector Unit

for (i=0; i<n; i++) {

y[i] = a*x[i] + y[i];

}

Page 9: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha Vector-Fetch Architectural Paradigm

9

a0: n, a1: a,

a2: *x, a3: *y

vmss vs0, a1

stripmine:

vsetvl t0, a0

vmsa va0, a2

vmsa va1, a3

vf saxpy

slli t1, t0, 2

add a2, a2, t1

add a3, a3, t1

sub a0, a0, t0

bnez a0, stripmine

saxpy:

vlw vv0, va0

vlw vv1, va1

vfma.svv vv1, vs0, vv0, vv1

vsw vv1, va1

vstop

Control Thread

Worker Thread

CP VI$

Vector

Execution

Unit

Vector Unit

for (i=0; i<n; i++) {

y[i] = a*x[i] + y[i];

}

Page 10: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha Vector-Fetch Architectural Paradigm

10

a0: n, a1: a,

a2: *x, a3: *y

vmcs vs0, a1

stripmine:

vsetvl t0, a0

vmca va0, a2

vmca va1, a3

vf saxpy

slli t1, t0, 2

add a2, a2, t1

add a3, a3, t1

sub a0, a0, t0

bnez a0, stripmine

saxpy:

vlw vv0, va0

vlw vv1, va1

vfma.svv vv1, vs0, vv0, vv1

vsw vv1, va1

vstop

Control Thread

Worker Thread

CP VI$

Vector

Execution

Unit

Vector Unit

for (i=0; i<n; i++) {

y[i] = a*x[i] + y[i];

}

Page 11: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha Microarchitecture

11

Page 12: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha Microarchitecture

12

Page 13: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha Microarchitecture

13

Page 14: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha Microarchitecture

14

Page 15: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha Microarchitecture

15

Page 16: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Hwacha Microarchitecture

16

Page 17: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Decoupling

17

• Exploit regularity of constant-stride vector memory accesses for aggressive yet accurate prefetching

• Extensive decoupling in microarchitecture to tolerate latency and resolve memory references as early as possible

Page 18: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Decoupling

Before stripmine loop, control thread pushes scalar register write to VXU command queue

18

Page 19: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Decoupling

Vector length updates, address register writes, and vector fetch commands pushed to both queues for each stripmine iteration

19

Page 20: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Decoupling

VRU prefetches instructions and pre-executes vector loads, refilling L2 cache ahead of VMU access

20

Page 21: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Decoupling

VMU moves data from L2 cache into vector register file when eventually needed; overlaps with VXU computation

21

Page 22: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Decoupling

Control processor runs ahead of vector unit to next iteration

• Performs address computations

• Pushes next vector fetch command

22

Page 23: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Decoupling

VRU prefetches next iteration while VXU remains busy with previous one

23

Page 24: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Reconfigurable Vector Register File • Software specifies number of vector data and predicate registers

• Hardware dynamically remaps elements to physical storage

vsetcfg 4 vlen = 2

vsetcfg 2 vlen = 4

• Maximum hardware vector length automatically extends to fill available capacity

• Exchange unused architectural registers for longer hardware vectors

24

Page 25: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Mixed-Precision Support • Configurable element widths for individual vector registers

• Subword register packing and alignment of mixed-precision operands are managed by hardware – software remains fully oblivious

25

vsetcfg 1, 1 vlen = 5

Page 26: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Vector Lane Organization

• Compact register file of four 1R/1W SRAM banks

• Per-bank integer ALU/PLU • Two independently scheduled FMA

clusters • Total of four double-precision FMAs

per cycle

• Pipelined integer multiplier • Variable-latency decoupled

functional units • Integer divide • Floating-point divide with square root • Reduction

26

Page 27: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Vector Lane Organization

• Compact register file of four 1R/1W SRAM banks

• Per-bank integer ALU/PLU • Two independently scheduled FMA

clusters • Total of four double-precision FMAs

per cycle

• Pipelined integer multiplier • Variable-latency decoupled

functional units • Integer divide • Floating-point divide with square root • Reduction

27

Page 28: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Vector Lane Organization

• Compact register file of four 1R/1W SRAM banks

• Per-bank integer ALU/PLU • Two independently scheduled FMA

clusters • Total of four double-precision FMAs

per cycle

• Pipelined integer multiplier • Variable-latency decoupled

functional units • Integer divide • Floating-point divide with square root • Reduction

28

Page 29: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Vector Lane Organization

• Compact register file of four 1R/1W SRAM banks

• Per-bank integer ALU/PLU • Two independently scheduled FMA

clusters • Total of four double-precision FMAs

per cycle

• Pipelined integer multiplier • Variable-latency decoupled

functional units • Integer divide • Floating-point divide with square root • Reduction

29

Page 30: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Vector Lane Organization

• Compact register file of four 1R/1W SRAM banks

• Per-bank integer ALU/PLU • Two independently scheduled FMA

clusters • Total of four double-precision FMAs

per cycle

• Pipelined integer multiplier • Variable-latency decoupled

functional units • Integer divide • Floating-point divide with square root • Reduction

30

Page 31: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Vector Lane Organization

• Compact register file of four 1R/1W SRAM banks

• Per-bank integer ALU/PLU • Two independently scheduled FMA

clusters • Total of four double-precision FMAs

per cycle

• Pipelined integer multiplier • Variable-latency decoupled

functional units • Integer divide • Floating-point divide with square root • Reduction

31

Page 32: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Sequencer

• Instruction window

• Group of 8 elements striped across all four banks forms atomic unit of scheduling within lane

• Issues to expander after all hazards resolved for element group • Vector ILP: Element groups from

different instructions can issue/complete out-of-order

• Expander converts sequencer ops to μops – low-level control signals that drive lane datapath

32

Page 33: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Systolic Bank Execution

• Sustains n operands/cycle after n-cycle initial latency

• Stall-free cascade of μops (“fire and forget”) after clearing hazards

• Striped element mapping avoids bank conflicts 33

Bank 0

Bank 1

Bank 2

Bank 3

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5

Page 34: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Vector Memory Unit

• 128-bit TileLink2 interface to L2 cache

• Coalesced accesses for unit-stride memory operations

• Optimizations to handle response reordering for loads: • Out-of-order writeback mechanism to

minimize buffering

• Support for “tailgating” overlapped load operations

34

Page 35: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Benchmarks/Tape-outs

• Silicon-proven: 10+ tape-outs

• Peak power efficiency: 40+ DP-GFLOPS/W in ST 28nm FD-SOI

• Peak floating-point performance: 64+ DP-GFLOPS, 128+ SP-GFLOPS, 256+ HP-GFLOPS in TSMC 16nm 5x5mm

• 95+% of peak performance on DGEMM

• 10X acceleration over Rocket on Caffe NN with single lane

35

2.8x2.8mm ST 28nm FD-SOI

Page 36: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Jun May

Raven-1

HV1

Raven-2

HV2 Raven-3

HV3

Raven-4

HV3

EOS14

HV2

EOS16

HV2

EOS18

HV2 EOS20

HV2

EOS22

HV3

EOS24

HV3.5

2011 2012 2013 2014 2015

May Apr Aug Feb Jul Sep Mar Nov Mar Apr

Hurricane-1

2-core HV4

2016

Mar

Hurricane-2

2-lane HV4

2017

CraftP1

4-lane HV4

Raven/Hurricane: ST 28nm FD-SOI; EOS: IBM 45nm SOI; CRAFT/EAGLE:TSMC16nm

EAGLE

8-core HV4

2018

Hwacha Chip Chronology

Page 37: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Current/Future Work

• Move frontend to the standard RISC-V Vector extension • Retain lane structure

• Extensions to RVV for polymorphic instructions

• Explore domain-specific extensions

• Draft of RVV spec at github.com/riscv/riscv-v-spec

37

Page 38: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Open Source Release

• Full parameterized Chisel3 RTL • Multi-lane, Sequencer slots, queue depths

• ISA assembly tests, hand-coded micro-benchmark suite

• Random torture test generator for verification

• Spike extension for simple software simulation

• FireSim support for ~90MHz FPGA simulation

• GNU binutils assembler

• Preliminary OpenCL compiler based on LLVM+pocl

• Documentation: technical reports

38

Page 39: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Acknowledgements

• Yunsup Lee

• Alon Amid, Sagar Karandikar, Quan Nguyen, Stephen Twigg, Huy Vo, Andrew Waterman, Jerry Zhao

• All students who worked on these chips as part of the following Berkeley labs and centers: ParLab, ASPIRE, ADEPT, BWRC

Sponsors:

Research partially funded by DARPA CRAFT HR0011-16-C-0052 and

PERFECT HR0011-12-2-0016; Intel Science and Technology Center for

Agile Design; and ADEPT Lab industrial sponsors and affiliates Intel, Google, Huawei, Siemens, SK Hynix, Seagate.

39

Page 40: Hwacha V4: Decoupled Data Parallel Custom Extension · Raven-1 HV1 Raven-2 HV2 Raven-3 HV3 Raven-4 HV3 EOS14 HV2 EOS16 HV2 EOS18 HV2 EOS20 8 HV2 EOS22 HV3 EOS24 HV3.5 2011 2012 2013

Questions and Links

RTL: github.com/ucb-bar/hwacha

Example Project: github.com/ucb-bar/hwacha-template

Toolchain+spike+tests: github.com/ucb-bar/esp-tools

FireSim: github.com/firesim/firesim/tree/hwacha

OpenCL+LLVM: github.com/ucb-bar/esp-llvm

Docs: hwacha.org

40