Posted on 19-Dec-2015

Page 1

Connex Technology Proprietary and Confidential

The CA1024: A Massively Parallel Processor for Cost-Effective HDTV

Page 2

Company Background

• Fabless semiconductor company in Silicon Valley

• VC funded (Series A and B)

• In the product-development stage with 26+ employees

  – Deep experience with video algorithms, processor design, and digital-video system software

• Core asset: ConnexArray™ vector-processor architecture

  – Architecture verified in the CA4096 test chip

• Six patent applications on Connex vector-processor technology

  – 1 US patent granted, 3 US patents pending, 2 US provisional

  – Granted and pending patents also filed in China, Taiwan, Korea, EEC, Japan, and Singapore

• Initial market focus on DTV

Page 3

Presentation Agenda

• Why a massively parallel processor (MPP)?

• How is MPP integrated in an SoC?

• Processor performance

• Project status

Page 4

Challenges

• HDTV codec & post-processing are computationally intensive

• Computation is dominated by data-parallel processes

• HDTV is a fast-evolving domain

• ASICs are a very costly solution

Page 5

Our Solution: Integral Parallel Machine

• Data-parallel computation

• Time-parallel computation (supported by speculative parallelism)

• I/O process is transparent to the computational process

Page 6

Key Technology

• Fully programmable solution for HDTV video encoding, decoding, and transcoding at the system and algorithm levels

  – Simple programming model

• Silicon-efficient architecture; die size competitive with similar-function ASICs

  – Re-use of transistors

  – Minimal dedicated hard-wired blocks

• Sufficient performance to enable multistandard, multichannel, high-definition DTV

  – Linearly scalable

Page 7

The Connex Architecture

[Figure: a sequencer and an I/O controller drive an m × n Connex Array; each cell has a 16-bit ALU, registers R0–R7 with Connex, I/O, and AUX registers, Select and Index, and a 16-bit RAM with addresses 0–255]

CA1024-PVP: m = n = 32 for a 1,024-PE Connex Machine

Test chip: m = n = 64 for a 4,096-PE Connex Array; sequencer and I/O control in an FPGA

3.2 GByte/sec I/O channel in parallel with code running on the Connex Array

Page 8

Connex Cell Architecture

• PE (Processing Element) has eight accumulator registers, including Connex, Aux, and I/O special-function registers

• Select flag enables or disables instruction processing

• Index is a unique cell number used to direct certain instructions

• Bidirectional 16-bit bus to 256 RAM locations

• Connex register includes connections for shifts to/from adjacent PEs

• Aux and I/O registers dedicated to specific instruction functions
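The cell described above can be modeled in plain C as a minimal sketch. The struct and function names are hypothetical, and because the slide does not say which of R0–R7 serve as the Connex, Aux, and I/O registers, the sketch assumes, purely for illustration, that R0 acts as the Connex register. The shift shows how the Connex register links each PE to its neighbor.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS   8    /* R0..R7 */
#define RAM_WORDS  256  /* local RAM addresses 0..255 */
#define CONNEX_REG 0    /* hypothetical: which register is "Connex" is not stated */

/* Host-side model of one Connex cell; names are illustrative only. */
typedef struct {
    int16_t  r[NUM_REGS];     /* accumulator registers R0..R7 */
    int16_t  ram[RAM_WORDS];  /* 256 x 16-bit local RAM */
    bool     select;          /* false: the cell ignores broadcast instructions */
    uint16_t index;           /* unique cell number */
} connex_cell;

/* Shift the Connex register one cell toward lower indices: cell i takes the
   value of cell i + 1, and the last cell takes `fill`. Iterating upward reads
   each neighbor before it is overwritten. */
static void connex_shift_left(connex_cell *cells, int n, int16_t fill) {
    for (int i = 0; i < n - 1; i++)
        cells[i].r[CONNEX_REG] = cells[i + 1].r[CONNEX_REG];
    cells[n - 1].r[CONNEX_REG] = fill;
}
```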

[Figure: Connex cell with a 16-bit ALU; registers R0–R7 plus Connex, I/O, and AUX; Select flag; Index; and RAM addresses 0–255]

Page 9

ConnexArray Structure

• Replicated Connex cells, each with a PE and local RAM

• Linear interconnect of neighbor registers

• Conditional execution based on state of select bit or index value

• All selected cells execute the same instruction stream
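The execution rules above (one broadcast instruction stream, made conditional by each cell's select bit or index) can be sketched as a sequential C loop that stands in for the 1,024 cells. This is an illustrative model with hypothetical names, not Connex code; on the array, each step is a single instruction executed by all cells at once.

```c
#include <stdbool.h>
#include <stdint.h>

#define CELLS 1024

/* Minimal per-cell state for this sketch: one register and the select flag. */
typedef struct {
    int16_t r0;
    bool    select;
} cell_state;

/* Set each cell's select flag from its index; here, cells below `limit`. */
static void select_by_index(cell_state *c, int n, int limit) {
    for (int i = 0; i < n; i++)
        c[i].select = (i < limit);  /* i plays the role of the cell's Index */
}

/* One broadcast "instruction" (add k to R0): every cell sees it,
   but only selected cells execute it. */
static void broadcast_add(cell_state *c, int n, int16_t k) {
    for (int i = 0; i < n; i++)
        if (c[i].select)
            c[i].r0 = (int16_t)(c[i].r0 + k);
}
```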

[Figure: Connex cells 0 to 1023 side by side, each with R0–R7, a 16-bit ALU, and RAM addresses 0–255; each cell's Select flag is either On or Off]

Page 10

Connex Data-Array Structure

[Figure: data array with lines 0–255 (line m) by elements 0–1023 (element n) of 16-bit data operands]

256 lines with 1,024 16-bit elements per line

1 GByte data I/O in parallel with computation operations
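Assuming line m corresponds to RAM address m in every cell (which the 256-line × 1,024-element geometry suggests), the data array can be modeled on a host as a two-dimensional C array indexed by line, then element. The names are hypothetical.

```c
#include <stdint.h>

#define LINES 256   /* line m: assumed to be RAM address m in every cell */
#define ELEMS 1024  /* element n: cell number 0..1023 */

/* Host-side model of the Connex data array. */
static int16_t data_array[LINES][ELEMS];

/* Address of element n of line m. */
static int16_t *element(int line, int elem) {
    return &data_array[line][elem];
}
```

With 16-bit elements, the array holds 256 × 1,024 × 2 bytes = 512 KiB.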

Page 11

Full-Line Operations: Operate on All Elements in Parallel

[Figure: lines i and j (of lines 0–255, elements 0–1023) combined element by element with +, -, *, XOR, etc. into line k]

Line k = Line i OP Line j

Line k = Line i OP scalar value (repeated for all elements)
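The two full-line forms above can be modeled as follows. The loop stands in for all 1,024 elements being processed in parallel; truncating results to 16 bits is an assumption, since the slide does not specify overflow behavior, and the function names are hypothetical.

```c
#include <stdint.h>

#define ELEMS 1024

typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_XOR } line_op;

/* One element of the operation, truncated to 16 bits (assumed behavior). */
static int16_t apply(line_op op, int16_t a, int16_t b) {
    switch (op) {
    case OP_ADD: return (int16_t)(a + b);
    case OP_SUB: return (int16_t)(a - b);
    case OP_MUL: return (int16_t)(a * b);
    default:     return (int16_t)(a ^ b);
    }
}

/* Line k = Line i OP Line j */
static void line_op_vv(int16_t *k, const int16_t *i, const int16_t *j, line_op op) {
    for (int n = 0; n < ELEMS; n++) k[n] = apply(op, i[n], j[n]);
}

/* Line k = Line i OP scalar (the scalar is repeated for all elements) */
static void line_op_vs(int16_t *k, const int16_t *i, int16_t s, line_op op) {
    for (int n = 0; n < ELEMS; n++) k[n] = apply(op, i[n], s);
}
```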

Page 12

Columns Active Based on Repeating Patterns

[Figure: lines i and j combined with +, -, *, XOR, etc. into line k, only in columns (of 0–1023) selected by a repeating pattern]

Example: Mark all odd columns active. Or mark every third column active. Or mark every third and fourth column active, etc.
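A minimal sketch of pattern-based activation, assuming a hypothetical encoding in which column n is active when n % period matches one of a list of phases. Odd columns are period 2, phase 1; every third column is period 3 with a single phase. "Every third and fourth column" could be read as period 4, phases 2 and 3 (zero-based), though the slide does not pin down that reading.

```c
#include <stdbool.h>
#include <stdint.h>

#define ELEMS 1024

/* Column n is active when (n % period) equals one of the given phases. */
static void select_pattern(bool *sel, int period, const int *phases, int nphases) {
    for (int n = 0; n < ELEMS; n++) {
        sel[n] = false;
        for (int p = 0; p < nphases; p++)
            if (n % period == phases[p]) sel[n] = true;
    }
}

/* Line k = Line i + Line j, written only in active columns. */
static void masked_add(int16_t *k, const int16_t *i, const int16_t *j, const bool *sel) {
    for (int n = 0; n < ELEMS; n++)
        if (sel[n]) k[n] = (int16_t)(i[n] + j[n]);
}

static int count_active(const bool *sel) {
    int c = 0;
    for (int n = 0; n < ELEMS; n++) c += sel[n];
    return c;
}
```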

Page 13

Columns Active Based on Results of Previous Operations

[Figure: lines i and j combined with +, -, *, XOR, etc. into line k, only in columns (of 0–1023) marked by earlier results]

Example: columns that appear random are marked active based on the data-dependent results of previous operations. This enables selective processing based on data content.
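Data-dependent activation can be sketched in two steps: a comparison on every element sets the selection, and a later operation runs only in the selected columns. The threshold-and-clamp operation is an arbitrary illustrative choice, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ELEMS 1024

/* Step 1: a comparison on all elements yields a data-dependent selection. */
static void select_greater(bool *sel, const int16_t *x, int16_t threshold) {
    for (int n = 0; n < ELEMS; n++) sel[n] = x[n] > threshold;
}

/* Step 2: a later operation touches only the selected columns;
   here it clamps them to `value`. */
static void clamp_selected(int16_t *x, const bool *sel, int16_t value) {
    for (int n = 0; n < ELEMS; n++)
        if (sel[n]) x[n] = value;
}
```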

Page 14

Outer-Loop Parallelism: Program in the context of 128+ data-structure instances

[Figure: lines i and j across cells 0–1023, divided into side-by-side 8x8 blocks (rows and columns 0–7)]

Example: 8x8 DCT

Example: 128 sets of 8x8 run in parallel in a 1024-cell array
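A sketch of the layout implied above, under the assumption that row r and column c of block b sit at line base + r and cell 8b + c, so that 128 blocks exactly fill 1,024 cells. The per-instance code (here a vertical sum over a block's eight lines, standing in for one pass of a transform such as the DCT) is written once and walks the lines sequentially, while the loop over cells models the array working on all 128 blocks at once. Names are hypothetical.

```c
#include <stdint.h>

#define LINES   256
#define CELLS   1024
#define BLK     8
#define NBLOCKS (CELLS / BLK)  /* 128 blocks side by side */

static int16_t mem[LINES][CELLS];

/* Assumed layout: row r, column c of block b -> line base + r, cell b*8 + c. */
static int16_t *blk_elem(int base, int b, int r, int c) {
    return &mem[base + r][b * BLK + c];
}

/* Sum the 8 lines starting at `base` into line `dst` for every cell.
   The outer loop over r is the sequential per-instance code; the inner loop
   over cells is what the array does in one step across all blocks. */
static void vertical_sum(int base, int dst) {
    for (int cell = 0; cell < CELLS; cell++) mem[dst][cell] = 0;
    for (int r = 0; r < BLK; r++)
        for (int cell = 0; cell < CELLS; cell++)
            mem[dst][cell] = (int16_t)(mem[dst][cell] + mem[base + r][cell]);
}
```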

Page 15

I/O System

[Figure: I/O system: the Connex Array and its I/O plane, the IOC, the IS, interrupts, a switch fabric, and a DDR-DRAM controller with four DRAMs]

Page 16

Computation-Intensive Architecture

• All forms of parallelism are strongly segregated

  – Connex Array for data-parallel computation

  – Speculative Array for time-parallel computation

• The granularity matches the application domain

  – 16-bit processing elements

  – No MACs, no FPUs, no multipliers…
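The slides list * among the line operations while stating that the PEs have no hardware multipliers; they do not say how multiplication is carried out. Shift-and-add is one standard way an add/shift ALU can multiply, sketched here purely as an assumption about the technique, not as the Connex implementation. The function name is hypothetical.

```c
#include <stdint.h>

/* 16 x 16 -> low 16 bits multiply using only add and shift. */
static uint16_t shift_add_mul(uint16_t a, uint16_t b) {
    uint16_t acc = 0;
    for (int bit = 0; bit < 16; bit++) {
        if (b & 1u) acc = (uint16_t)(acc + a);  /* add partial product when bit is set */
        a = (uint16_t)(a << 1);                 /* next partial product */
        b = (uint16_t)(b >> 1);                 /* next multiplier bit */
    }
    return acc;
}
```

A 16-bit multiply this way costs on the order of 16 add/shift steps per element, but every selected cell performs them in parallel.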

Page 17

High I/O Bandwidth

• External I/O: 3.2 GBytes/sec

  – Serial access and random access with similar performance

• Internal I/O: 400 GBytes/sec

Page 18

Area & Power Efficiency

• 2 GOPS/mm² (peak performance)

• GOPS/Watt is 25–50 times higher than that of a mature sequential technology

Page 19

Programming Connex

• CPL (Connex Programming Language) is an extension of C with C/C++ syntax

• Code that operates on scalar data is written in regular C notation

• Connex-specific operators defined for features not available in C, e.g. operations on vectors, selections

• CPL uses sequential operators and control structures on vector and select datatypes

• Using CPL, the Connex Machine is programmed the same way as conventional sequential machines

• Hides the complexities of the parallel execution hardware

• Complete SDK

{
    ...
    const short OFFSET = 15;
    ...
    short vector x, y;
    short vector min, max;
    ...
    sel = all;
    x += OFFSET;
    ...
    min = (x < y)? x : y;
    max = (x > y)? x : y;
    ...
}

Vectors are arrays of scalar components.

Selections are arrays of Boolean values that dictate which vector components are active.
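For readers without the Connex SDK, the CPL fragment above can be emulated in plain C: each short vector becomes an array with one component per cell, and the selection becomes an array of booleans. The vector length of 1,024 and the function name are assumptions; this is not CPL and not SDK output.

```c
#include <stdbool.h>

#define VLEN 1024  /* assumed: one vector component per cell of a CA1024 */

/* Plain-C emulation of the CPL fragment; min_v/max_v correspond to min/max. */
static void cpl_example(short x[VLEN], const short y[VLEN],
                        short min_v[VLEN], short max_v[VLEN], bool sel[VLEN]) {
    const short OFFSET = 15;
    for (int n = 0; n < VLEN; n++) sel[n] = true;         /* sel = all;    */
    for (int n = 0; n < VLEN; n++)                        /* x += OFFSET;  */
        if (sel[n]) x[n] = (short)(x[n] + OFFSET);
    for (int n = 0; n < VLEN; n++) {
        if (!sel[n]) continue;                            /* inactive component */
        min_v[n] = (x[n] < y[n]) ? x[n] : y[n];           /* min = (x < y)? x : y; */
        max_v[n] = (x[n] > y[n]) ? x[n] : y[n];           /* max = (x > y)? x : y; */
    }
}
```

In CPL the three loops disappear: each statement applies to every active component at once.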

Page 20

Performance

• DCT: 0.35 clock cycle per pixel

• SAD: 0.0025 clock cycle per pixel
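To put the per-pixel figures in frame terms, multiply by the pixel count. The 1920 × 1080 frame size is an illustrative assumption, and the slide does not say whether the SAD figure is per candidate position. The function name is hypothetical.

```c
/* Clock cycles to process one frame at a given cycles-per-pixel figure. */
static double cycles_per_frame(double cycles_per_pixel, int width, int height) {
    return cycles_per_pixel * (double)width * (double)height;
}
```

For a 1920 × 1080 frame (2,073,600 pixels) this gives about 725,760 cycles for the DCT and about 5,184 cycles for SAD.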

Page 21

H.264 Dual HD Stream Decoding

Clock Cycles Per Macroblock

Dezigzagging   37.3

Intra Prediction 54.1

IT/IQ 97.3

Motion Compensation 114.3

Deblocking Filter   27.1

Total [clock cycles/macroblock]: 337.8

Allowed clock cycles per macroblock (2-channel 1080i): 409 cycles
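The slide does not state the clock rate or frame geometry behind the 409-cycle budget. One set of assumptions that lands on about 409 cycles is a 200 MHz array clock, a 1920 × 1088 coded frame (8,160 16x16 macroblocks), and 30000/1001 frames per second per channel; the sketch below makes that arithmetic explicit. All three parameters, and the function name, are assumptions.

```c
/* Cycle budget per macroblock: clock rate divided by macroblocks per second
   across all channels. */
static double mb_cycle_budget(double clock_hz, int width, int height,
                              double frames_per_sec, int channels) {
    int mbs_per_frame = (width / 16) * (height / 16);  /* 120 * 68 = 8160 for 1920x1088 */
    double mbs_per_sec = mbs_per_frame * frames_per_sec * channels;
    return clock_hz / mbs_per_sec;
}
```

Under these assumptions the budget is about 408.9 cycles, so the 337.8-cycle total leaves roughly 71 cycles of headroom per macroblock.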

Page 22

H.264 CABAC (SA) Decoding

• Targeted profile and level: Main Profile, Level 4.1

• Bit rate per stream considered: 35 Mbps (45 Mbps maximum)

• Number of bins to decode using CABAC: 47M/sec

• Number of clock cycles per bin: 1 cycle

• Cycles to decode bins per stream: 50 MHz

• Typical bit rate expected for DVB: 10 Mbps

• Cycles to decode bins for a typical (DVB) stream: 15 MHz
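The clock figures above follow from bins per second multiplied by cycles per bin. The linear scaling of bin rate from 35 Mbps down to 10 Mbps is an assumption used only to check that the 15 MHz figure is in range; the function names are hypothetical.

```c
/* Clock rate needed to decode CABAC bins at a given rate. */
static double cabac_clock_hz(double bins_per_sec, double cycles_per_bin) {
    return bins_per_sec * cycles_per_bin;
}

/* Assumed: bin rate scales linearly with stream bit rate. */
static double scaled_bins(double bins_per_sec, double from_mbps, double to_mbps) {
    return bins_per_sec * (to_mbps / from_mbps);
}
```

47M bins/sec at 1 cycle per bin needs 47 MHz, within the 50 MHz quoted; scaling to 10 Mbps gives about 13.4 MHz, within the 15 MHz quoted.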

Page 23

[Figure: CA1024 SoC block diagram. The ConnexArray™ programmable media processor (multi-codec processing, pre-analysis, 3D filter, scaling, graphics processing, video merge/blend, motion-adaptive de-interlacing), with its instruction sequencer and I/O controller, connects through a switch fabric to the SA host CPU, audio CPU, TS/Sec CPU, and video CPU; to a DDR-DRAM controller (400 MHz data rate, 64-bit wide DRAM); and to audio in/out, video in/out, host I/F, external bus, and test/ICE ports (PCI v2.2 or generic, BT.656/1120, I2S, S/PDIF, flash, JTAG, GPIO, I2C)]

Page 24

CA1024 Project Status

[Figure: CA1024 floorplan showing four CA256 blocks, MIPS cores (one labeled SA), PCI, and DDRC blocks]

• TSMC 0.13 micron

• 676-pin PBGA

• Samples Q3 2006

Page 25

In Summary

• Fully programmable processor

• Computation-intensive architecture

• High-bandwidth I/O

• Connex Programming Language & SDK

• Die-area and power-efficient architecture

Page 26

Thank You!