flexible filters for high performancehigh performance ...€¦ · stream programmingstream...

118
Flexible Filters for High Performance High Performance Embedded Computing Rebecca Collins and Luca Carloni Department of Computer Science Department of Computer Science Columbia University

Upload: others

Post on 28-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Flexible Filters for High PerformanceHigh Performance

Embedded ComputingRebecca Collins and Luca CarloniDepartment of Computer ScienceDepartment of Computer Science

Columbia University

Page 2: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

MotivationMotivation

Page 3: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 4: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 5: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 6: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 7: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 8: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 9: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 10: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 11: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 12: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 13: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 14: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 15: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 16: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Stream ProgrammingStream Programming

CBA

• Stream Programming model• Stream Programming model– filter: a piece of sequential programming– channels: how filters communicate– token: an indivisible unit of data for a filter

• Examples: Signal processing, image processing, embedded applicationsembedded applications

Page 17: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Example: Constant False-Alarm R (CFAR) D iRate (CFAR) Detection

guard cells cell under testguard cells cell under test

ith dditi l f tcompare to with additional factors• number of gates• threshold (μ)• number of guard cellsHPEC Challenge • number of guard cells• rows and other dimensional

datahttp://www.ll.mit.edu/HPECchallenge/

HPEC Challenge

Page 18: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

CFAR BenchmarkuInt to right left align find

CFAR BenchmarkuInt tofloat square right

windowleft

window

add

aligndata

findtargets

add

Data Stream:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Page 19: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

CFAR BenchmarkuInt to right left align find1

CFAR BenchmarkuInt tofloat square right

windowleft

window

add

aligndata

findtargets1

add

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Page 20: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

CFAR BenchmarkuInt to right left align find2 1

CFAR BenchmarkuInt tofloat square right

windowleft

window

add

aligndata

findtargets2 1

add

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Page 21: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

CFAR BenchmarkuInt to right left align find

1

23 1

CFAR BenchmarkuInt tofloat square right

windowleft

window

add

aligndata

findtargets23 1

add

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Page 22: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

CFAR BenchmarkuInt to right left align find1

67891011

231213

CFAR Benchmark

cell under test

uInt tofloat square right

windowleft

window

add

aligndata

findtargets123910111213

8

cell under test add

right window

left window

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

do

Page 23: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

CFAR BenchmarkuInt to right left align find1

67891011

231213

CFAR Benchmark

cell under test

uInt tofloat square right

windowleft

window

add

aligndata

findtargets123910111213

8tcell under test add

right window

left window

t

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

do

t

Page 24: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

CFAR BenchmarkuInt to right left align find1

7891011

231213

CFAR Benchmark

cell under test

uInt tofloat square right

windowleft

window

add

aligndata

findtargets123910111213

8t6

cell under test add

right window

left window

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

do

t

Page 25: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

MappingMapping

CBA

Page 26: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

MappingMapping

1 2 3core 1 core 2 core 3

CBA

Th h t t fA B C Throughput: rate of processing data tokens

A B C

multi-core chip

Page 27: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

MappingMapping

1 3core 1 core 3

CBA

Th h t t fA C Throughput: rate of processing data tokens

AB

C

multi-core chip

Page 28: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Unbalanced Flow

1 2 3

Unbalanced Flow

core 1 core 2 core 3

CBA

• Bottlenecks reduce throughputCaused by backpressure– Caused by backpressure• inherent algorithmic imbalances• data dependent computational spikes• data-dependent computational spikes

Page 29: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Unbalanced Flow

1 2 3

Unbalanced Flow

core 1 core 2 core 3

CBA

• Bottlenecks reduce throughputCaused by backpressure– Caused by backpressure• inherent algorithmic imbalances• data dependent computational spikes• data-dependent computational spikes

Page 30: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Unbalanced Flow

1 2 3

Unbalanced Flow

core 1 core 2 core 3

CBA

• Bottlenecks reduce throughputCaused by backpressure– Caused by backpressure• inherent algorithmic imbalances• data dependent computational spikes• data-dependent computational spikes

Page 31: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Unbalanced Flow

1 2 3

Unbalanced Flow

core 1 core 2 core 3

CBA

• Bottlenecks reduce throughputCaused by backpressure– Caused by backpressure• inherent algorithmic imbalances• data dependent computational spikes• data-dependent computational spikes

Page 32: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Unbalanced Flow

1 2 3

Unbalanced Flow

core 1 core 2 core 3

CBA

• Bottlenecks reduce throughputCaused by backpressure– Caused by backpressure• inherent algorithmic imbalances• data dependent computational spikes• data-dependent computational spikes

Page 33: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Unbalanced Flow

1 2 3

Unbalanced Flow

core 1 core 2 core 3

CBA

• Bottlenecks reduce throughputCaused by backpressure– Caused by backpressure• inherent algorithmic imbalances• data dependent computational spikes• data-dependent computational spikes

Page 34: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Unbalanced Flow

1 2 3

Unbalanced Flow

core 1 core 2 core 3

CBA

• Bottlenecks reduce throughputCaused by backpressure– Caused by backpressure• inherent algorithmic imbalances• data dependent computational spikes• data-dependent computational spikes

Page 35: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Unbalanced Flow

1 2 3

Unbalanced Flow

core 1 core 2 core 3

CBA

• Bottlenecks reduce throughputCaused by backpressure– Caused by backpressure• inherent algorithmic imbalances• data dependent computational spikes• data-dependent computational spikes

Page 36: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Unbalanced Flow

1 2 3

Unbalanced Flow

core 1 core 2 core 3

CBA

• Bottlenecks reduce throughputCaused by backpressure– Caused by backpressure• inherent algorithmic imbalances• data dependent computational spikes• data-dependent computational spikes

Page 37: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Unbalanced Flow

1 2 3

Unbalanced Flow

core 1 core 2 core 3

CBA

“wait”

• Bottlenecks reduce throughputCaused by backpressure– Caused by backpressure• inherent algorithmic imbalances• data dependent computational spikes• data-dependent computational spikes

Page 38: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Unbalanced Flow

1 2 3

Unbalanced Flow

core 1 core 2 core 3

CBA

“wait”

• Bottlenecks reduce throughputCaused by backpressure– Caused by backpressure• inherent algorithmic imbalances• data dependent computational spikes• data-dependent computational spikes

Page 39: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Data Dependent Execution Time: CFARCFAR

• Using Set 1 from the HPEC 1.3 % targets

35

CFAR Kernel Benchmark• Over a block of about 100

cells 15202530

Perc

ent

• Extra workload of 32 microseconds per target 0

510

2 128 254 380 506

P

7.3 % targets15

Execution Time (microseconds)

0.3 % targets80

5

10

Perc

ent

20

40

60Pe

rcen

t

00 96 192 288 384 480 576

Execution Time (microseconds)

0

20

2 128 255 381 508Execution Time (microseconds)

Page 40: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Data Dependent Execution TimeData Dependent Execution Time

Some other examples:p• Bloom Filters (Financial, Spam detection)• Compression (Image processing)

789

3456

Perc

ent

0123

0.000 0.001 0.002 0.003 0.004 0.005

Execution Time(s)

Page 41: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 42: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 43: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 44: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 45: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 46: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 47: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 48: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 49: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 50: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 51: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 52: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 53: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 54: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 55: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 56: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 57: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 58: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 59: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 60: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 61: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 62: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 63: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Our Solution: Flexible FiltersOur Solution: Flexible Filters

1 2 3

flex-split flex-merge

BA Ccore 1 core 2 core 3

C

• Unused cycles on B are filled by working ahead on filter C with the data already present on BPush stream flow upstream of a bottleneck• Push stream flow upstream of a bottleneck

• Semantic Preservation

Page 64: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Related WorksRelated Works

• Static Compiler Optimizations• Static Compiler Optimizations– StreamIt [Gordon et al, 2006]

• Dynamic Runtime Load BalancingW k St li /Filt Mi ti– Work Stealing/Filter Migration

• [Kakulavarapu et al., 2001]• Cilk [Frigo et al., 1998]• Flux [Shah et al 2003]• Flux [Shah et al., 2003]• Borealis [Xing et al., 2005]

– Queue based load balancing• Diamond [Huston et al 2005] distributed search queue based load• Diamond [Huston et al., 2005] distributed search, queue based load

balancing, filter re-ordering

• Combination Static+Dynamic– FlexStream [Hormati et al., 2009] multiple competing programsFlexStream [Hormati et al., 2009] multiple competing programs

Page 65: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

OutlineOutline

• IntroductionIntroduction• Design Flow of a Stream Program with

FlexibilityFlexibility • Performance• Implementation of Flexible Filters• Experimentsp

– CFAR Case Study

Page 66: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Design Flow of a Stream Program ith Fl ibl Filtwith Flexible Filters

Design streamDesign stream algorithm

Mapping• Filters

• Memory

Profile

Add Flexibility to Bottlenecks

Profile of CFAR filters on Cell

right window

uIntToFloat

compiler/design tools

0 1 2 3 4 5

find targets

add

Average Execution Time (microseconds) per 100 tokens

Page 67: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Design ConsiderationsDesign Considerations

Design streamDesign stream algorithm

Mapping• Filters

• Memory

Profile

Add Flexibility to Bottlenecks

Page 68: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Design ConsiderationsDesign Considerations

Design streamDesign stream algorithm

Mapping• Filters

• Memory

Profile

Add Flexibility to Bottlenecks

Page 69: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Design ConsiderationsDesign Considerations

Design streamDesign stream algorithm

Mapping• Filters

• Memory

Profile

Add Flexibility to Bottlenecks

Page 70: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Adding FlexibilityAdding Flexibility

Design streamDesign stream algorithm

Mapping• Filters

• Memory

Profile

Add Flexibility to Bottlenecks

Page 71: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Adding FlexibilityAdding Flexibility

Design streamDesign stream algorithm

Mapping• Filters

• Memory

Profile

Add Flexibility to Bottlenecks

Page 72: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Adding FlexibilityAdding Flexibility

Design streamDesign stream algorithm

Mapping• Filters

• Memory

Profile

Add Flexibility to Bottlenecks

Page 73: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Adding FlexibilityAdding Flexibility

Design streamDesign stream algorithm

Mapping• Filters

• Memory

Profile

Add Flexibility to Bottlenecks

Page 74: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

OutlineOutline

• IntroductionIntroduction• Design Flow of a Stream Program with

FlexibilityFlexibility• Performance• Implementation of Flexible Filters• Experimentsp

– CFAR Case Study

Page 75: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Mapping Stream Programs M l i C Pl fto Multi-Core Platforms

BA CBA Ccore 1 core 2 core 3

core 1

BA Cpipeline mappingcore 2

BA C

core 1 core 3core 2

BA CBA C

SPMD mappingsharing a core

Page 76: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: SPMD MappingThroughput: SPMD Mapping

BA Ccore 1

1

BA Ct0 t1 t2 t3 t4 t5 t6

core 1

core 2

2 BA Ccore 2

core 3core 3

2

BA C

SPMD i3 tokens processed in 7 timesteps,ideal throughput = 3/7 = 0 429

3

SPMD mapping ideal throughput = 3/7 = 0.429

Page 77: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: SPMD MappingThroughput: SPMD Mapping

BA C Suppose EA = 2, EB = 2, EC = 3core 1

1 EA=2 EB=2 EC=3

BA Ct0 t1 t2 t3 t4 t5 t6

core 1

core 2

2 BA Ccore 2

core 3core 3

2

BA C

SPMD i3 tokens processed in 7 timesteps,ideal throughput = 3/7 = 0 429

3

SPMD mapping ideal throughput = 3/7 = 0.429

Page 78: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: SPMD MappingThroughput: SPMD Mapping

BA C Suppose EA = 2, EB = 2, EC = 3core 1

1 EA=2 EB=2 EC=3

BA Ct0 t1 t2 t3 t4 t5 t6

core 1 A1

core 2

2 BA Ccore 2

core 3

A1

A2

A3core 3

2

BA C

SPMD i

A3

3 tokens processed in 7 timesteps,ideal throughput = 3/7 = 0 429

3

SPMD mapping ideal throughput = 3/7 = 0.429

Page 79: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: SPMD MappingThroughput: SPMD Mapping

BA C Suppose EA = 2, EB = 2, EC = 3core 1

1 EA=2 EB=2 EC=3

BA Ct0 t1 t2 t3 t4 t5 t6

core 1 A1 B1

core 2

2 BA Ccore 2

core 3

A1

A2

A3

B1

B2

B3core 3

2

BA C

SPMD i

A3 B33

SPMD mapping

Page 80: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: SPMD MappingThroughput: SPMD Mapping

BA C Suppose EA = 2, EB = 2, EC = 3core 1

1 EA=2 EB=2 EC=3

BA Ct0 t1 t2 t3 t4 t5 t6

core 1 A1 B1 C1

core 2

2 BA Ccore 2

core 3

A1

A2

A3

B1

B2

B3

C1

C2

C3core 3

2

BA C

SPMD i

A3 B3 C3

3 tokens processed in 7 timesteps,ideal throughput = 3/7 = 0 429

3

SPMD mapping ideal throughput = 3/7 = 0.429

Page 81: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: Pipeline MappingThroughput: Pipeline Mapping

CBAcore 1 core 2 core 3

latency 2 latency 2 latency 3

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9

core 1core 1core 2core 3core 3

Page 82: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: Pipeline MappingThroughput: Pipeline Mapping

CBAcore 1 core 2 core 3

1

latency 2 latency 2 latency 3

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9

core 1 A1core 1core 2core 3

A1

core 3

Page 83: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: Pipeline MappingThroughput: Pipeline Mapping

CBAcore 1 core 2 core 3

1 12

latency 2 latency 2 latency 3

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9

core 1 A1 A2core 1core 2core 3

A1 A2B1

core 3

Page 84: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: Pipeline MappingThroughput: Pipeline Mapping

CBAcore 1 core 2 core 3

1 12 123

latency 2 latency 2 latency 3

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9

core 1 A1 A2 A3core 1core 2core 3

A1 A2B1

C1B2A3

core 3 C1

Page 85: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: Pipeline MappingThroughput: Pipeline Mapping

CBAcore 1 core 2 core 3

1 12 123 234

latency 2 latency 2 latency 3

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9

core 1 A1 A2 A3 A4core 1core 2core 3

A1 A2B1

C1B2A3

C2B3A4

core 3 C1 C2throughput = 1/3 = 0.333 < 0.429

Page 86: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Data BlocksData Blocks

CBAcore 1 core 2 core 3

CBA

C

data block: group of data tokens

Page 87: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Data BlocksData Blocks

CBAcore 1 core 2 core 3

CBA

C

data block: group of data tokens

Page 88: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Data BlocksData Blocks

CBAcore 1 core 2 core 3

CBA

C

data block: group of data tokens

Page 89: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Data BlocksData Blocks

CBAcore 1 core 2 core 3

CBA

C

data block: group of data tokens

Page 90: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Data BlocksData Blocks

CBAcore 1 core 2 core 3

CBA

C

data block: group of data tokens

Page 91: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Data BlocksData Blocks

CBAcore 1 core 2 core 3

CBA

C

data block: group of data tokens

Page 92: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Data BlocksData Blocks

CBAcore 1 core 2 core 3

CBA

C

data block: group of data tokens

Page 93: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Data BlocksData Blocks

CBAcore 1 core 2 core 3

CBA

C

data block: group of data tokens

Page 94: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Data BlocksData Blocks

CBAcore 1 core 2 core 3

CBA

C

data block: group of data tokens

Page 95: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: Pipeline Augmented with Flexibilitywith Flexibility

core 1 core 2 core 3

CBA

C

1 1

C

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9

core 1core 1core 2core 3core 3

throughput = 2/5 = 0.4 < 0.429 (but > 0.333)

Page 96: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: Pipeline Augmented with Flexibilitywith Flexibility

core 1 core 2 core 3

CBA

C

1

C

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9

core 1 A1core 1core 2core 3

A1

core 3

throughput = 2/5 = 0.4 < 0.429 (but > 0.333)

Page 97: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: Pipeline Augmented with Flexibilitywith Flexibility

core 1 core 2 core 3

CBA

C

1 12

C

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9

core 1 A1 A2core 1core 2core 3

A1 A2B1

core 3

throughput = 2/5 = 0.4 < 0.429 (but > 0.333)

Page 98: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: Pipeline Augmented with Flexibilitywith Flexibility

core 1 core 2 core 3

CBA

C

1 12 123

C

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9

core 1 A1 A2 A3core 1core 2core 3

A1 A2B1

C1B2A3

core 3 C1throughput = 2/5 = 0.4 < 0.429 (but > 0.333)

Page 99: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: Pipeline Augmented with Flexibilitywith Flexibility

core 1 core 2 core 3

CBA

C

1 12 123

C

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9

core 1 A1 A2 A3core 1core 2core 3

A1 A2B1

C1B2A3

C2core 3 C1

throughput = 2/5 = 0.4 < 0.429 (but > 0.333)

Page 100: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Throughput: Pipeline Augmented with Flexibilitywith Flexibility

core 1 core 2 core 3

CBA

C

1 12 123 234

C

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9

core 1 A1 A2 A3 A4core 1core 2core 3

A1 A2B1

C1B2A3

C2B3A4

C2core 3 C1 C2

throughput = 2/5 = 0.4 < 0.429 (but > 0.333)

Page 101: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

OutlineOutline

• IntroductionIntroduction• Design Flow of a Stream Program with

FlexibilityFlexibility• Performance• Implementation of Flexible Filters• Experimentsp

– CFAR Case Study

Page 102: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Flex-SplitFlex SplitFlex-split core 2 core 3p

pop data block b from in

n0 = available space on out0

n1 |b| n0CB

core 2 core 3

n1 = |b| - n0

send n0 to out0, n1 to out1

send n0 0’s, then n1 1’s to l

Cselect

select

flex split flex mergeC out

out1

in0

in1

out0in

Cout1

maintain ordering based on run-time state of queues

Page 103: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Flex-MergeFlex MergeFlex-merge core 2 core 3g

pop i from select

if i is 0, pop token from in0

if i is 1 pop token from in1CB

core 2 core 3

if i is 1, pop token from in1

push token to out

C

select

flex split flex mergeC out

out1

in0

in1

out0in

Cout1

Page 104: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Flex-MergeFlex MergeFlex-merge core 2 core 3g

pop i from select

if i is 0, pop token from in0

if i is 1 pop token from in1CB

core 2 core 3

if i is 1, pop token from in1

push token to out

C

select

flex split flex mergeC out

out1

in0

in1

out0in Overhead of Flexibility?C

out1

Page 105: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Multi-Channel Fl S li d Fl MFlex-Split and Flex-Merge

output channel 1

filter

output channel 2

…output channel n

Page 106: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Multi-Channel Fl S li d Fl MFlex-Split and Flex-Merge

flex merge output channel 1

flex split filterflex merge output channel 2

…flex merge output channel n

Page 107: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Multi-Channel Fl S li d Fl MFlex-Split and Flex-Merge

flex merge output channel 1

flex split filterflex merge output channel 2

…filterflex

flex merge output channel n

Page 108: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Multi-Channel Fl S li d Fl MFlex-Split and Flex-Merge

flex merge output channel 1

flex split filterflex merge output channel 2

…filterflex

selectflex merge output channel n

Page 109: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Multi-Channel Fl S li d Fl MFlex-Split and Flex-Merge

i h l 1filter

…input channel 1

input channel 2

input channel n

Page 110: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Multi-Channel Fl S li d Fl MFlex-Split and Flex-Merge

i h l 1filter

…input channel 1

input channel 2flex merge

flex split

Centralized

filterflexinput channel n

select

Page 111: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Multi-Channel Fl S li d Fl MFlex-Split and Flex-Merge

i h l 1filter

…input channel 1

input channel 2flex merge

flex split

Centralized

filterflexinput channel n

select

filterinput channel 1

flex mergeflex split

Distributed

filter

… filterflex

input channel 2flex merge

β flex split

te flexinput channel n β flex split

select

Page 112: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

OutlineOutline

• IntroductionIntroduction• Design Flow of a Stream Program with

FlexibilityFlexibility• Performance• Implementation of Flexible Filters• Experimentsp

– CFAR Case Study

Page 113: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Cell BE ProcessorCell BE Processor• Distributed Memoryy• Heterogeneous

– 8 SIMD (SPU) cores– 1 PowerPC (PPU)

• Element Interconnect BBus– 4 rings– 205 Gb/s– 205 Gb/s

• Gedae Programming LanguageCommunication Layery

Page 114: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

GedaeGedae• Commercial data-flow language and programming GUI • Performance analysis tools• Performance analysis tools

Page 115: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

CFAR BenchmarkCFAR BenchmarkuInt to right left align finduInt tofloat square right

windowleft

window

add

aligndata

findtargets

add

Profile of CFAR filters on Cell

right window

uIntToFloat

find targets

add

0 1 2 3 4 5Average Execution Time (microseconds) per 100 tokens

Page 116: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

Data DependencyData Dependency• By changing threshold, change % targets

– 1.3 % – 7.3 %

• Additional workload per target– 16 µs– 32 µs– 64 µs

% T t

1.3 7.3

% Targets

16 μs 0.82 1.4532 μs 1.06 1.3964 1 27 1 47

AdditionalWorkload

64 μs 1.27 1.47

Page 117: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

More BenchmarksMore BenchmarksBenchmark Field Results

DedupInformation

TheoryRabin block/ max chunk size Speedup

4096/512 2.00

JPEG Image Processing

Image width x height128x128256 256

1.311 16Processing 256x256

512x5121.161.25

stocks/walks/timestepsValue-at-

RiskFinance 16/1024/1024

64/1024/1024128/1024/1024

0.981.561.55

Page 118: Flexible Filters for High PerformanceHigh Performance ...€¦ · Stream ProgrammingStream Programming A B C • Stream Programming modelStream Programming model – filter: a piece

ConclusionsConclusions

• Flexible filters – adapt to data dependent bottlenecks– distributed load balancing– provide speedup without modification to

original filters– can be implemented on top of general

stream languages