
Notes on an actor language

Jörn W. Janneck, Xilinx Inc.

13 February 2007 – 7th Ptolemy Miniconference

CAL Actor Language

• scripting actor specifications
  – make it easier to write atomic actors

• experimenting with domain polymorphism

• (code generation)

CAL @ Ptolemy
• the language
• domain-dependent interpretation

CAL @ Xilinx
• overview
• application

actors in CAL

encapsulated state

Actions

State

guarded atomic actions



simple actors

actor Sum () Input ==> Output:

  sum := 0;

  action [a] ==> [sum]
  do
    sum := sum + a;
  end
end

Sum

actor SumAbs () Input ==> Output:

  sum := 0;

  action [a] ==> [sum]
  guard a >= 0
  do
    sum := sum + a;
  end

  action [a] ==> [sum]
  guard a < 0
  do
    sum := sum - a;
  end
end

[Figure: SumAbs actor with input port Input and output port Output]
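The guarded-action behavior of SumAbs can be mimicked in a few lines of Python. This is a minimal sketch, not CAL tooling: the `Action` and `Actor` classes, the `fire` method, and the helper names are hypothetical, and the sketch assumes every action consumes one input token and emits one output token.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class Action:
    """A guarded action that consumes one token and produces one token."""
    guard: Callable[[int], bool]
    body: Callable[[State, int], int]


@dataclass
class Actor:
    """Hypothetical single-input, single-output actor with encapsulated state."""
    state: State
    actions: List[Action] = field(default_factory=list)

    def fire(self, token: int) -> int:
        # Run the first action whose guard accepts the token; the whole
        # state update happens inside one call, mimicking an atomic step.
        for action in self.actions:
            if action.guard(token):
                return action.body(self.state, token)
        raise RuntimeError(f"no action enabled for token {token!r}")


def add_to_sum(state: State, a: int) -> int:
    state["sum"] += a
    return state["sum"]  # the output token is the updated sum


def subtract_from_sum(state: State, a: int) -> int:
    state["sum"] -= a
    return state["sum"]


def make_sum_abs() -> Actor:
    """Build an actor shaped like SumAbs: two actions with complementary guards."""
    return Actor(
        state={"sum": 0},
        actions=[
            Action(guard=lambda a: a >= 0, body=add_to_sum),
            Action(guard=lambda a: a < 0, body=subtract_from_sum),
        ],
    )


def run(actor: Actor, tokens: List[int]) -> List[int]:
    """Fire the actor once per input token and collect the outputs."""
    return [actor.fire(t) for t in tokens]


if __name__ == "__main__":
    print(run(make_sum_abs(), [1, -2, 3]))  # prints [1, 3, 6]
```

Because the two guards are complementary, exactly one action is enabled for any token, so the order in which the sketch tests them does not matter here.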



nondeterminism

actor NDMerge () Input1, Input2 ==> Output:

  action Input1: [x] ==> [x] end

  action Input2: [x] ==> [x] end
end

[Figure: NDMerge actor with input ports Input1 and Input2 and output port Output]
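A small Python simulation can make the nondeterminism concrete. This is a sketch under stated assumptions: the function name `nd_merge` is hypothetical, input ports are modeled as queues, and a seeded random generator stands in for the unspecified choice when both actions are enabled.

```python
import random
from collections import deque
from typing import Deque, List


def nd_merge(input1: List[int], input2: List[int], rng: random.Random) -> List[int]:
    """Merge two token streams, choosing arbitrarily when both have tokens."""
    q1: Deque[int] = deque(input1)
    q2: Deque[int] = deque(input2)
    out: List[int] = []
    while q1 or q2:
        # An action is enabled when its input port holds at least one token.
        enabled = [q for q in (q1, q2) if q]
        # With both actions enabled the choice is unspecified; the RNG
        # is only a stand-in for that freedom.
        out.append(rng.choice(enabled).popleft())
    return out


if __name__ == "__main__":
    print(nd_merge([1, 2, 3], [10, 20], random.Random(0)))
```

Whatever choice is made, each input's tokens keep their relative order in the output; only the interleaving varies from run to run.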



data-dependent token flow

actor Select () S, A, B ==> Output:

  action S: [sel], A: [v] ==> [v]
  guard sel
  end

  action S: [sel], B: [v] ==> [v]
  guard not sel
  end
end

[Figure: Select actor with input ports S, A, and B and output port Output]
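The data-dependent token flow of Select can be sketched in Python as follows. The function name `select` and the queue-based port model are assumptions made for illustration; the guard is modeled as peeking at the control token without consuming it.

```python
from collections import deque
from typing import List


def select(s: List[bool], a: List[int], b: List[int]) -> List[int]:
    """Route tokens from A or B to the output, steered by control tokens on S."""
    qs, qa, qb = deque(s), deque(a), deque(b)
    out: List[int] = []
    while qs:
        sel = qs[0]  # peek: the guard inspects the control token
        source = qa if sel else qb
        if not source:
            # The action picked by the guard also needs a data token;
            # without one, no action is enabled and the actor stalls.
            break
        qs.popleft()
        out.append(source.popleft())
    return out


if __name__ == "__main__":
    print(select([True, False, True], [1, 2], [10, 20]))  # prints [1, 10, 2]
```

Which data port is read depends on the value of the control token, so the consumption pattern cannot be known before the tokens arrive.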



CAL and domain polymorphism

• two fundamental questions:
  1. Can an actor be interpreted/used in a given MoC?
  2. What is its interpretation?

domain-specific interpretation



Example: SDF

actor Add () Input1, Input2 ==> Output:

  action [a], [b] ==> [a + b] end
end

actor AddSeq () Input ==> Output:

  action [a, b] ==> [a + b] end
end

[Figure: Add with token rates 1 on Input1, 1 on Input2, and 1 on Output; AddSeq with token rates 2 on Input and 1 on Output]



Example: SDF (cont’d)



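[Figure: NDMerge actor with input ports Input1 and Input2 and output port Output]

actor Merge () Input1, Input2 ==> Output:

  action [x1], [x2] ==> [x1, x2] end
end

[Figure: Merge with token rates 1 on Input1, 1 on Input2, and 2 on Output]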
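One way to approach the question "can this actor be used in SDF?" is to look only at how many tokens each action consumes and produces. The Python sketch below applies a deliberately conservative check that is an assumption of this example, not a rule stated on the slides: an actor is accepted only if all its actions are unguarded and share identical rates. The names `sdf_rates` and `ActionSig` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rates = Dict[str, int]
# (tokens consumed per port, tokens produced per port, action has a guard)
ActionSig = Tuple[Rates, Rates, bool]


def sdf_rates(actions: List[ActionSig]) -> Optional[Tuple[Rates, Rates]]:
    """Return fixed (consumption, production) rates, or None if they vary."""
    if not actions:
        return None
    first_in, first_out, _ = actions[0]
    for consumed, produced, guarded in actions:
        # Conservative: any guard or any rate mismatch rejects the actor.
        if guarded or consumed != first_in or produced != first_out:
            return None
    return first_in, first_out


# Action signatures read off the actors shown on the slides.
ADD = [({"Input1": 1, "Input2": 1}, {"Output": 1}, False)]
ADD_SEQ = [({"Input": 2}, {"Output": 1}, False)]
MERGE = [({"Input1": 1, "Input2": 1}, {"Output": 2}, False)]
ND_MERGE = [
    ({"Input1": 1}, {"Output": 1}, False),
    ({"Input2": 1}, {"Output": 1}, False),
]

if __name__ == "__main__":
    for name, sig in [("Add", ADD), ("AddSeq", ADD_SEQ),
                      ("Merge", MERGE), ("NDMerge", ND_MERGE)]:
        print(name, sdf_rates(sig))
```

The check recovers the rates annotated in the figures for Add, AddSeq, and Merge, and rejects NDMerge because its two actions read from different ports. It also rejects guarded actors such as SumAbs whose rates are in fact fixed, which is the price of keeping the test this simple.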



Some kind of “synchronous”...

[Figure: NDMerge and Merge actors with token-rate annotations; the labels "A2" and "F" also appear]



Example: CSP



actor Add () Input1, Input2 ==> Output:

  action [a], [b] ==> [a + b] end
end

[ Input1 ? x -> Output ! x
|| Input2 ? x -> Output ! x ]

Input1 ? a -> Input2 ? b -> Output ! a + b

[ Input1 ? a -> Input2 ? b
|| Input2 ? b -> Input1 ? a ] ; Output ! a + b



Example: CSP (cont’d)

actor Select () S, A, B ==> Output:

  action S: [sel], A: [v] ==> [v]
  guard sel
  end

  action S: [sel], B: [v] ==> [v]
  guard not sel
  end
end

S ? sel; [ sel -> A ? v -> Output ! v
        || not sel -> B ? v -> Output ! v ]

actor A () X, Y ==> Z:

  action X: [x1, x2] ==> [f(x1, x2)]
  guard P(x1, x2)
  end

  action Y: [y1, y2] ==> [f(y1, y2)]
  guard P(y1, y2)
  end
end

?



CAL and dataflow at Xilinx

class MyActor {
  schedule();
  readPort( portNum );
  writePort( portNum );
}

[Figure: tool flow with the labels: actor source + network, simulation, software, hardware, high-level synthesis]

new FPGA programming model & tools
• hardware code generation
• software (& mixed) code generation

driver application
• MPEG4 Simple Profile Decoder

MPEG standardization effort
• ISO/IEC 23001-4 (working draft): Codec Configuration Representation
• ISO/IEC 23002-4 (working draft): Video Tool Library



FPGA Programming In Practice: Networked MPEG-4 Viewer

[Figure: system block diagram with the labels: Remote Video Stream Server, UDP over Ethernet, XUP Board (2VP30), Microblaze running LWIP protocol stack, Ethernet, UDP, Decoder Actor Network, Raster Scan Actor, Memory Controller, VGA Display IP, Local VGA Monitor]



MPEG-4 SP Decoder

quality of compiled code

                              Area
Version                       Slice   LUT    FF     BRAM     MULT   Performance
VHDL IP [1] (15000 lines)     4637    7923   2637   26 [2]   34     4-CIF image size; 180K macroblock/s @ 100MHz; requires ZBT SRAM framebuf
CAL decoder (4000 lines)      3872    7720   3576   22 [3]   7      HD image size; 243K macroblock/s @ 120MHz; interfaces to DRAM framebuf; I-frame parsing: 50 Mbit/s

[1] http://www.xilinx.com/bvdocs/ipcenter/data_sheet/ds520_prod_brf.pdf
[2] BRAM-limited to 4-CIF image size.
[3] Supports HD image size. Reduces to 16 BRAMs for 4-CIF image size.



comparing decoder solutions

[Figure: relative area efficiency (axis ticks 1, 2, 5, 10) plotted against throughput in macroblocks/sec x1000 (axis ticks 10, 100, 1000, with CIF, SD, and HD marked); data points a, b, c, d]

a: TI64xx MPEG-4 (CPU + L1 cache only)
b: ISSCC’06 H.264 capable (includes periphery)
c: FPGA MPEG-4 using traditional HDL flow (12 MM effort)
d: FPGA MPEG-4 using actor/dataflow synthesis (3 MM effort)



Thank You.

CAL actor language: embedded.eecs.berkeley.edu/caltrop

Credits: Dave B. Parlour, Ian D. Miller, Johan Eker, Edward A. Lee, and many others.

BACKUP

programming language adoption

Name         TPCI     TPCI cum.   Year
C            17.66%   17.66%      1973
C++          11.06%   28.73%      1985
Perl          5.48%   34.20%      1987
Python        3.47%   37.67%      1990
VB            9.73%   47.40%      1991
Delphi        2.15%   49.54%      1994
Java         21.17%   70.72%      1995
PHP           9.86%   80.58%      1995
JavaScript    2.20%   82.78%      1995
C#            3.07%   85.85%      2002

source: TIOBE Programming Community Index, TPCI, October 2006, http://www.tiobe.com/tpci.htm

[Figure: cumulative TPCI by language creation date (for top 10 languages); horizontal axis from 1970 to 2005, vertical axis with ticks at 50 and 100; points labeled C, C++, Perl, Python, VB, Delphi, Java, PHP, JavaScript, C#]

Smaller, Faster, Easier: Too good to be true?

• This is what happens when design effort is constrained.
• The key is enabling architectural exploration with rapid turn-around time.
• New decoder architecture incorporates many improvements over original design in motion compensation, AC/DC reconstruction, parser, 2-d IDCT.
• Approximate manpower numbers:
  – VHDL decoder: 12 months
  – Dataflow decoder: 3 months

Architectural Exploration: MPEG4 Motion Compensator

video stream feedback

video frame buffer(off-chip DRAM)

PROBLEM! Memory latency for random access reads and writes prevents real-world operation at HD rates.

First Step: Try on-chip cache

• Break the address and data streams, insert a cache placeholder.

• Insert different policies, see what happens.

policy1: Pass-through, just to make sure model is OK.

policy2: Insert a cache actor in the read path and monitor statistics.

Simulation result with policy2

Monitor console:

Frame 1 OK time: 28111ms
Frame 2 OK time: 23834ms
Requests: 49456, Hits: 45360
Miss rate: 8.28%
Frame 3 OK time: 27369ms
Requests: 98704, Hits: 90512
Miss rate: 8.30%
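The miss rates on the console follow directly from the request and hit counters. A minimal Python check, where the function name `miss_rate` is just a label chosen for this sketch:

```python
def miss_rate(requests: int, hits: int) -> float:
    """Fraction of cache requests that were not served from the cache."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return (requests - hits) / requests


if __name__ == "__main__":
    # Counter values copied from the monitor console output.
    print(f"{miss_rate(49456, 45360):.2%}")  # prints 8.28%
    print(f"{miss_rate(98704, 90512):.2%}")  # prints 8.30%
```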

• Memory controller performance: 133MHz clock, 32 pixel cache line fill in ~18 cycles
• Worst case compensation is 81 reads for an 8x8 block.
• 8.3% miss rate implies average read is ~2.4 cycles
• Rate limit is 44 Mpixel/s
• HD (1920x1080, 4:2:0, 30fps) rate target is 93.3 Mpixel/s
• Options for improvement
  - more expensive controller
  - much better cache policy
  - application-aware prefetch
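The figures in this list can be reproduced with a simple cost model. The slides do not spell out the arithmetic, so the following Python sketch is one reading that happens to match the rounded numbers. Its assumptions: a cache hit costs one cycle, a miss costs the full 18-cycle line fill, the worst case of 81 reads per 64-pixel block applies throughout, and the HD target counts 1920x1080 luma samples plus 4:2:0 chroma (1.5 samples per pixel). All names are hypothetical.

```python
CLOCK_HZ = 133e6          # memory controller clock
LINE_FILL_CYCLES = 18     # cycles to fill one 32-pixel cache line
MISS_RATE = 0.083         # from the policy2 simulation
READS_PER_BLOCK = 81      # worst-case reads for one 8x8 block
PIXELS_PER_BLOCK = 64     # 8x8 block
HIT_CYCLES = 1            # assumption: a hit is served in one cycle


def average_read_cycles(miss_rate: float, hit_cycles: float, fill_cycles: float) -> float:
    """Expected cycles per read: hits cost hit_cycles, misses cost a line fill."""
    return (1.0 - miss_rate) * hit_cycles + miss_rate * fill_cycles


def pixel_rate(clock_hz: float, cycles_per_read: float, reads_per_pixel: float) -> float:
    """Pixels per second sustainable at the given read cost."""
    return clock_hz / (cycles_per_read * reads_per_pixel)


def hd_target(width: int = 1920, height: int = 1080, fps: int = 30) -> float:
    """Samples per second for 4:2:0 video (1.5 samples per pixel)."""
    return width * height * 1.5 * fps


if __name__ == "__main__":
    avg = average_read_cycles(MISS_RATE, HIT_CYCLES, LINE_FILL_CYCLES)
    limit = pixel_rate(CLOCK_HZ, avg, READS_PER_BLOCK / PIXELS_PER_BLOCK)
    print(f"average read: {avg:.2f} cycles")          # about 2.4
    print(f"rate limit:   {limit / 1e6:.1f} Mpixel/s")  # about 44
    print(f"HD target:    {hd_target() / 1e6:.1f} Mpixel/s")
```

Under these assumptions the cache falls short of the HD target by roughly a factor of two, which is the gap the listed options are meant to close.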

Step 2: Application-aware prefetch

• replace cache with “search window”
• compensation addresses now relative to search window
• search window senses block type
• prefetch requests to frame buffer
• prefetch data

Results of prefetch strategy

• Better performance
  – prefetch needs to operate at 3x pixel rate
  – exploits longer burst read with application-awareness (longer cache line did not help policy2 significantly)
  – 64 pixels in 26 cycles → average read is ~0.4 cycles
  – peak theoretical performance is 111 Mpixel/s
  – exceeds HD rate target with cheap DRAM

• Substantial change to overall model behavior, but
  – impact limited to two actors
  – no refactoring of control in other actors needed
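The prefetch numbers can be checked the same way. The slides do not say how the 111 Mpixel/s peak is derived; the Python sketch below is a guess at the arithmetic, assuming the 133MHz controller clock from the cache experiment, the rounded 0.4 cycles per read, and a division by the 3x prefetch factor. The function names are hypothetical.

```python
CLOCK_HZ = 133e6        # assumed: same memory controller clock as before
BURST_PIXELS = 64       # pixels fetched per burst
BURST_CYCLES = 26       # cycles per burst
PREFETCH_FACTOR = 3     # prefetch operates at 3x pixel rate
HD_TARGET = 1920 * 1080 * 1.5 * 30  # 4:2:0 samples per second at 30fps


def cycles_per_read(burst_cycles: int, burst_pixels: int) -> float:
    """Average cycles per pixel read when reads arrive in long bursts."""
    return burst_cycles / burst_pixels


def peak_pixel_rate(clock_hz: float, read_cycles: float, prefetch_factor: float) -> float:
    """Peak pixel rate if prefetch must read prefetch_factor pixels per output pixel."""
    return clock_hz / (read_cycles * prefetch_factor)


if __name__ == "__main__":
    exact = cycles_per_read(BURST_CYCLES, BURST_PIXELS)
    print(f"cycles per read: {exact:.1f}")
    # Using the rounded 0.4 cycles lands close to the quoted 111 Mpixel/s.
    print(f"{peak_pixel_rate(CLOCK_HZ, round(exact, 1), PREFETCH_FACTOR) / 1e6:.1f} Mpixel/s")
    # Without rounding the peak is slightly lower, still above the HD target.
    print(f"{peak_pixel_rate(CLOCK_HZ, exact, PREFETCH_FACTOR) / 1e6:.1f} Mpixel/s")
```

Either way the peak stays above the 93.3 Mpixel/s HD target, consistent with the claim that the prefetch design exceeds it.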

The FPGA programming problem

• Big, heterogeneous chips
• circuit-design programming (+ C, Simulink, ...)

1985: 128 4-LUTs

2006: [V5-LX] 207360 6-LUTs, 10Mbit BRAM, 192 ALUs
