Communications R&D Center

Trends in Compilable DSP Architecture

John Glossner, Jaime Moreno, Mayan Moudgill, Jeff Derby, Erdem Hokenek, David Meltzer, Uzi Shvadron, and Malcolm Ware

IBM Communications R&D Center, Yorktown Heights, NY

[email protected]

J. Glossner SIPS-2000 Communications R&D Center

Introduction

■ Broadband Applications
● Network & Functions
● Market
■ DSP Algorithms
■ DSP Architectures
● Classification
● Comparison to General Purpose Architectures
● Classical, Transitional, Modern DSP Examples
■ Compilation Issues
● The “C” Problem
● Previous solutions
● IBM solution
■ IBM e-lite DSP
● Compilable ultra-low power DSP
■ Conclusions / Future

Network Functions

[Figure: network functions across SOHO/Consumer, Enterprise/Campus (desktop, work group, wiring closet, data center), WAN Access, Central Office, and High Speed Backbone segments. Equipment shown: PBX, access switch/router, ACD/IVR, LAN, gateway, central office switch, base station, high-end routers/switches. Links: Ethernet, ADSL, cable, WAN]

Broadband Functions

[Figure: broadband network diagram linking the Public Switched Network, the Public Internet, and a Corporate Intranet. Product categories: Network Gateway products, PC Client products, Embedded products, Wireless Client products. Elements shown: set-top box, ADSL modem, notebook computer, workpad, radio tower, PBX/PABX, DSL access mux, cell phone, base station, voice-over-net server, modem server, fax-over-net server, router, PC w/ modem, telephone, fax, web phone]

Broadband Communications

■ Aggregation of multiple streams at a network access boundary
● streams from different ports
● multiple streams from a single port
● streams have different QoS requirements
● voice, data, ...
■ Signal processing functions
● xDSL
● VoIP
● VoDSL
● V.90
■ Network processing functions
● ATM with SAR
● forwarding
● QoS / bandwidth management
● policing / scheduling
● filtering
● service enablement

Programmable DSP Market

■ CAGR 34.4%

■ Growing faster than the general semiconductor market

[Figure: market share by application: Communications 64%, Computer 13%, Consumer 10%, Industrial 6%, Instrumentation 4%, Military 2%, Office Automation 1%]

[Figure: General Purpose DSP Market ($B): 1999 $4.4, 2000 $6.1, 2001 $8.2, 2002 $10.9, 2003 $14.5, 2004 $19.2, 2005 $25.4]

Programmable DSP Market

[Figure: DSP Market by Word Size: 16-bit fixed point 87%, 24-bit fixed point 7%, floating point 6%]

[Figure: DSP Market Share: TI 48%, Lucent 25%, ADI 12%, Motorola 10%, Other 5%]

Wireless Market

[Figure: wireless market in billions (axis 0 to 25), 1995 to 2003, by standard: Analog, GSM, IS-95, IS-136, PDC, 3G]

Source: Micrologic Research / Forward Concepts Worldwide

DSP Algorithms

DSP Applications

DSP Algorithm: System Applications

Speech Coding: Digital cellular telephones, personal communications systems, digital cordless telephones, multimedia computers, secure communications
Speech Encryption: Digital cellular telephones, personal communications systems, digital cordless telephones, secure communications
Speech Recognition: Advanced user interfaces, multimedia workstations, robotics, automotive applications, cellular telephones, personal communications systems
Speech Synthesis: Advanced user interfaces, robotics
Speaker Identification: Security, multimedia workstations, advanced user interfaces
High-fidelity Audio: Consumer audio, consumer video, digital audio broadcast, professional audio, multimedia computers
Modems: Digital cellular telephones, personal communications systems, digital cordless telephones, digital audio broadcast, digital signaling on cable TV, multimedia computers, wireless computing, navigation, data/fax
Noise Cancellation: Professional audio, advanced vehicular audio, industrial applications
Audio Equalization: Consumer audio, professional audio, advanced vehicular audio, music
Ambient Acoustics Emulation: Consumer audio, professional audio, advanced vehicular audio, music
Audio Mixing/Editing: Professional audio, music, multimedia computers
Sound Synthesis: Professional audio, music, multimedia computers, advanced user interfaces
Vision: Security, multimedia computers, advanced user interfaces, instrumentation, robotics, navigation
Image Compression: Digital photography, digital video, multimedia computers, videoconferencing
Image Compositing: Multimedia computers, consumer video, advanced user interfaces, navigation
Beamforming: Navigation, medical imaging, radar/sonar, signals intelligence
Echo Cancellation: Speakerphones, hands-free cellular telephones
Spectral Estimation: Signals intelligence, radar/sonar, professional audio, music

Source: BDTI

Sample Rates

[Figure: sample rate (Hz, log scale from 1/1000 to 1G) versus algorithm complexity (low to high) for control, speech, audio, video, high definition television, voiceband modems, radio modems, xDSL modems, broadband communications, radio signaling and radar, seismic modeling, instrumentation, financial modeling, and weather modeling]

Signal Processing: 12+ Orders of Magnitude!!!

Source: BDTI

DSP Operations

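■ FIR: $y_n = \sum_{k=0}^{N} b_k\, x_{n-k}$

■ FFT: $y_k = \sum_{j=0}^{N-1} x_j\, \omega^{jk}, \quad \omega = e^{-2\pi i/N}$

■ 2D-DCT: $F(u,v) = \frac{1}{N^2} \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f(m,n) \cos\left(\frac{(2m+1)u\pi}{2N}\right) \cos\left(\frac{(2n+1)v\pi}{2N}\right)$

■ Neural Nets: $y = f\left(\sum_{k=0}^{N} w_k\, x_k - \phi\right)$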

Inner Products Easily Described By Vectors
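Each kernel on this slide reduces to inner products. The rough illustration below is in plain Python. The names `dot`, `fir`, and `neuron` are hypothetical, and floating point stands in for the fixed-point arithmetic a DSP would use. It shows the FIR and neural-net formulas sharing one multiply-accumulate loop:

```python
def dot(a, b):
    """Inner product: one multiply-accumulate (MAC) per element pair."""
    acc = 0.0
    for ai, bi in zip(a, b):
        acc += ai * bi
    return acc


def fir(b, x):
    """FIR filter y_n = sum_{k=0}^{N} b_k * x_{n-k}, taking x_{n-k} = 0 before the input starts."""
    taps = len(b)  # N + 1 coefficients
    y = []
    for n in range(len(x)):
        # Newest sample first, so window[k] = x_{n-k}.
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y.append(dot(b, window))
    return y


def neuron(w, x, phi, f):
    """Neural-net unit y = f(sum_k w_k * x_k - phi) for an activation function f."""
    return f(dot(w, x) - phi)


print(fir([0.5, 0.5], [1.0, 2.0, 3.0, 4.0]))  # [0.5, 1.5, 2.5, 3.5]
print(neuron([1.0, -1.0], [0.5, 0.25], 0.0, lambda s: max(s, 0.0)))  # 0.25
```

A DSP's single-cycle MAC unit executes the body of `dot` once per cycle. That is why all four kernels map well onto such hardware.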

Code Characteristics

■ General Purpose
● Limited Parallelism
● Control Dominated
● Inherently Serial
● Branch Intensive (20%)
● Limited By Amdahl’s Law
■ 30% of Dynamic Execution
■ Amdahl’s Law:
● The serial 30% limits speedup to about 3x (1/0.3 ≈ 3.3)
■ DSP
● Parallel Inner Loops
● Loop Setup, then Compute
● Overlapped Parallel Processing
● Multiple Independent Streams
■ 70% of Dynamic Execution
■ Gustafson’s Law:
● $t_{parallel}$ is independent of N
● Parallel Portion Scales With N
● Linear slope!

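Amdahl's Law, for N processors and normalized times $t_{serial} + t_{parallel} = 1$:

$$\text{speedup} = \frac{1}{t_{serial} + t_{parallel}/N} \;\to\; \frac{1}{t_{serial}} \quad \text{as } N \to \infty$$

Gustafson's Law (parallel portion scaled with N):

$$\text{Scaled Speedup} = \frac{t_{serial} + N\, t_{parallel}}{t_{serial} + t_{parallel}} = N + (1 - N)\, t_{serial}$$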
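The two laws can be compared numerically. The sketch below assumes normalized times ($t_{serial} + t_{parallel} = 1$, with Gustafson's fractions measured on the parallel machine). The function names are illustrative:

```python
def amdahl_speedup(t_serial, n_processors):
    """Fixed-size speedup 1 / (t_serial + t_parallel / N), with t_serial + t_parallel = 1."""
    t_parallel = 1.0 - t_serial
    return 1.0 / (t_serial + t_parallel / n_processors)


def amdahl_limit(t_serial):
    """Upper bound on Amdahl speedup as N grows without limit: 1 / t_serial."""
    return 1.0 / t_serial


def gustafson_speedup(t_serial, n_processors):
    """Scaled speedup N + (1 - N) * t_serial, with t_serial measured on the parallel machine."""
    return n_processors + (1 - n_processors) * t_serial


# 30% serial code: Amdahl saturates near 3.3x, while Gustafson grows linearly in N.
for n in (1, 4, 16, 64):
    print(n, round(amdahl_speedup(0.3, n), 2), round(gustafson_speedup(0.3, n), 2))
print(round(amdahl_limit(0.3), 2))  # 3.33
```

The same `amdahl_limit` applied to 1 minus the parallel fraction gives the "% parallel code" curve in the workload comparison. For example, 90% parallel code caps the speedup at 10x.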

Workload Comparisons

[Figure: Amdahl's Law: speedup (log scale, 1 to 10,000) versus % parallel code (0 to 100), with General Purpose, DSP, and Video workloads marked. Plotted speedup limits: 1.0, 1.1, 1.3, 1.4, 1.7, 2.0, 2.5, 3.3, 5.0, 6.7, 10.0, 20.0, 33.3, 50.0, 100.0, 200.0, 500.0, 1000.0, 10000.0]

Computational Requirements

[Figure: computational requirements on a scale marked from 100 MOPs to 100 GOPs. Values as labeled: GSM_FR (2.5), GSM_EFR (16), GSM_HR, AC-3 decode, V.34 (20), GSM Terminal (Baseband, HR) (52), ADSL XCVR 1.5 Mb/s (100), ADSL XCVR 6.1 Mb/s (360), 16 X GSM_EFR (380), Full-rate DAB Viterbi Decoder and MPEG II MP@ML 30 fps Decode (600), DFSE EQ 2 Mb/s (650), P X 64 CIF 15 f/s 100 kb/s (1.2), MPEG II Encode MP@ML 30 f/s ALG Search P=16 (1.68), MPEG II Encode 30 f/s Full Search P=16 (35)]

Architecture Domain

[Figure: performance (OPS) versus MACs, with a "1 MAC DSP" label and workloads GSM_FR (2.5M), GSM_EFR (16M), GSM_HR and V.34bis (20M), VFLEX2 (30M), and GSM Terminal (HR/EFR) (52M)]

Architecture Domain

[Figure: the architecture-domain chart, adding a "2 MAC DSP" label and workloads AC-3/MUSICAM Decode (20M), GSM Terminal (EHR/HSCSD/GPRS) (80M), ADSL XCVR 1.5 Mb/s (100M), 4 X GSM_HR/EFR (110M), and Single Carrier GSM BTS (180M)]

Architecture Domain

[Figure: the architecture-domain chart, adding an "MP" label and workloads 16 X GSM_HR/EFR (400M), DAB XCVR (800M), and Multi-Carrier GSM BTS (800M)]

Architecture Domain

[Figure: the architecture-domain chart, adding a "New Architectures Required" label and workloads ADSL XCVR 6.1 Mb/s (500M), MPEG II MP@ML 30 fps Decode (600M), DFSE EQ (UMTS) 2 Mb/s (650M), Single Chip DAB XCVR (800M), H.263L + GSM Terminal (EHR/HSCSD/GPRS) (1.0G), Single Chip STB (1.5G), and MPEG II Encode MP@ML 30 f/s ALG Search P=16 (1.68G)]

Architecture Domain

[Figure: the complete architecture-domain chart (1 MAC DSP, 2 MAC DSP, MP, and New Architectures Required labels; workloads from GSM_FR at 2.5M up to Single Chip STB at 1.5G and MPEG II Encode at 1.68G), adding Symphonic Synthesis, Natural Language Processing, Real-time Speech Recognition, 3G Wireless, and Software Radio]

DSP Classifications

Processor Classification

Processor
■ General Purpose
■ DSP

Processor Classification

Processor
■ General Purpose
● Floating Point: 32/64 bit IEEE; Other (80 bit)
■ DSP
● Floating Point: 32 bit IEEE; Other

Processor Classification

Processor
■ General Purpose
● Integer: 32 bit + subsets; 64 bit + subsets
● Floating Point: 32/64 bit IEEE; Other (80 bit)
■ DSP
● Fixed Point: 16 bit; 20 bit; 24 bit
● Floating Point: 32 bit IEEE; Other

Numeric Representations

[Figure: bit layouts of two's-complement fractional and integer words (sign bit and radix point marked) and of a floating-point word (sign, mantissa with an implied leading 1, exponent)]

Fractional, bit weights $-2^0, 2^{-1}, \ldots, 2^{-7}$:
1 0 1 0 1 1 0 0 = $-2^0 + 2^{-2} + 2^{-4} + 2^{-5} = -1 + .25 + .0625 + .03125 = -.65625$

Integer, bit weights $-2^7, 2^6, \ldots, 2^0$:
1 0 1 0 1 1 0 0 = $-2^7 + 2^5 + 2^3 + 2^2 = -128 + 32 + 8 + 4 = -84$

Floating point:
mantissa $= 2^0 + 2^{-1} + 2^{-3} = 1 + .5 + .125 = 1.625$; exponent $= 2^2 + 2^0 = 5$
$1.625 \times 2^5 = 52.0$

Multiplication complicates fractional representations

Source: BDTI
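The worked examples above can be checked mechanically. The sketch below is a minimal one with illustrative helper names. It reads the same 8-bit pattern as a two's-complement integer and as a fraction; the fraction is just the integer scaled by $2^{-7}$. It also evaluates the floating-point example as mantissa × 2^exponent:

```python
def twos_complement_int(bits):
    """Integer with weights -2^(n-1), 2^(n-2), ..., 2^0; bits are given MSB first."""
    n = len(bits)
    value = -bits[0] * 2 ** (n - 1)  # the sign bit carries a negative weight
    for i, b in enumerate(bits[1:], start=1):
        value += b * 2 ** (n - 1 - i)
    return value


def twos_complement_frac(bits):
    """Fraction with weights -2^0, 2^-1, ..., 2^-(n-1): the integer value scaled by 2^-(n-1)."""
    return twos_complement_int(bits) / 2 ** (len(bits) - 1)


def float_value(mantissa, exponent):
    """Floating-point value mantissa * 2^exponent, as in the 1.625 x 2^5 example."""
    return mantissa * 2.0 ** exponent


bits = [1, 0, 1, 0, 1, 1, 0, 0]
print(twos_complement_int(bits))                       # -84
print(twos_complement_frac(bits))                      # -0.65625
print(float_value(1 + 0.5 + 0.125, 2 ** 2 + 2 ** 0))   # 52.0
```

Because each fractional operand carries an implied scale factor, the integer product of two fractions carries the square of that scale. The result must be realigned, which is one way multiplication complicates fractional representations.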

DSP vs. General Purpose

DSP
■ Execution Predictability
● Required to guarantee real-time constraints
■ 1 cycle MAC
■ 0-overhead Loop Buffer
■ Complex Instructions
● Multiple Operations Issued
■ Harvard Memory Architecture
● Multiple memory accesses
■ Specialized Addressing Modes
■ Operate on Vector Stream Data
■ Data-independent Execution
■ Fractional Arithmetic
■ Pipeline Non-interlocked
● Shallow Pipeline (3-5 stage)
■ Delayed Branch

General Purpose
■ Fast But Non-predictable
● Dynamic Instruction Issue
● Non-deterministic caches
■ Multicycle MAC
■ Branch Prediction
■ RISC Superscalar Instructions
● Multiple Instructions Issued
■ von Neumann Architecture
● Split Cache has similar benefit
■ Typically Linear Addressing
■ Caches Assume Locality
■ Data-dependent Execution
● Dependent upon operands
■ Integer Arithmetic
■ Pipeline Typically Interlocked
● Deep Pipeline (5+ stage)
■ Multicycle Branch

ISA Comparison

ISA    Orthogonality   Parallelism Within an Instruction   Number of Instructions / Addressing Modes   Width of Instructions
RISC   High            None                                Small                                       Fixed
CISC   Low-Medium      Medium                              Large                                       Variable
DSP    Low             High                                Medium                                      Mostly Fixed

DSP Architectures

Trends in DSP Processors

■ Software programmability
● Focus on compilation

■ Ultra-low power

■ Very high performance

■ Computational performance with control processing

Performance vs. Power

[Figure: performance (MMAC/s, 0 to 1200) versus power (mW, log scale, 10 to 1000) for 2164, 2181, 2173, 21065L, 1609, 1620, 1628, 1629, 16210, 56002, 56307, 56602, 56652, 56812, C203, C549, C5421, C5441, SC140, FR500, FR300, ISP-5.7, and Carmel. Labeled regions: "Previous Generation DSPs", "C55x announce: 10-80 mW, 400-800 MMAC", "Better Pwr/Perf", "Future Sweet Spot"]

Anticipated/Projected Entrants:
● TigerSHARC: 1.2 GMAC/s @ 2-8 W
● AltiVec: 4 GMAC/s @ 5+ W
● C62x: 400 MMAC/s @ 1.8 W
● C64x: (2005+? / 1.1 GHz) 4.4 GMAC/s @ ?? W

Performance vs. Power

[Figure: DSP Performance vs. Power (log-log scale): performance (MMAC/s, 10 to 10,000) versus power (mW, 10 to 1000) for C55x, C203, C549, C5421, C5441, 2181, 2164, 2173, 21065L, 16210, 1629, 1628, 1609, 1620, 56652, 56602, 56307, 56002, 56812, SC140, FR500, and FR300, with iso-efficiency lines at 1, 5, 10, and 50 MMAC/s per mW and a 100 GMAC/sec marker]
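The iso-efficiency lines are ratios of throughput to power. The small worked calculation below uses only the anticipated-entrant figures quoted on the previous slide. The C64x entry is left out because its power is unstated. AltiVec's "5+ W" is taken at 5 W, so its ratio is an upper bound. The helper name is illustrative:

```python
def mmacs_per_mw(mmacs, milliwatts):
    """Efficiency in MMAC/s per mW (1 MMAC/s per mW equals 1 MAC per nanojoule)."""
    return mmacs / milliwatts


# (MMAC/s, mW) as quoted for the anticipated entrants.
entrants = {
    "TigerSHARC @ 2 W": (1200, 2000),
    "TigerSHARC @ 8 W": (1200, 8000),
    "AltiVec @ 5 W (upper bound)": (4000, 5000),
    "C62x": (400, 1800),
}

for name, (mmacs, mw) in entrants.items():
    print(f"{name}: {mmacs_per_mw(mmacs, mw):.2f} MMAC/s per mW")
```

The four ratios come out to 0.60, 0.15, 0.80, and 0.22 MMAC/s per mW. By this measure, each of them sits below the lowest iso-efficiency line on the chart.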

Classification

■ Classical DSPs

■ Transitional DSPs

■ Modern DSPs

■ Future DSPs

Classical DSP Architectures

■ Dot product processors

■ Poor compiler targets

■ Non-orthogonal

■ Small Address space

■ Multiple address spaces

■ Compound ISA

■ Highly focused on an application

TI C54x

■ Quintessential Classical Architecture
● Eight 16-bit buses
● 40-bit ALU
● Two 40-bit accumulators
◆ 8 guard bits
● 40-bit barrel shifter
● 17x17 multiply unit with 40-bit adder & 1-cycle throughput
◆ Zero detect, rounding, saturation
● Compare, select, store unit
◆ Viterbi algorithm
● Exponent encoder
● 16-bit address space
◆ 548/549 use segments to give 23-bit address
◆ Circular, bit-reversed addressing
● Block repeat

Source: TI C54x CPU Reference
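Circular addressing lets a filter's delay line take a new sample without moving the stored data. The sketch below is a rough software analogue, not the C54x's addressing hardware; the class and function names are hypothetical. Only the write index advances, wrapping modulo the buffer length:

```python
class CircularDelayLine:
    """Delay line addressed modulo its length, emulating circular addressing.

    Instead of shifting every stored sample when a new one arrives, only the
    write index advances and wraps around; a DSP's address unit performs this
    wrap without extra instructions.
    """

    def __init__(self, length):
        self.buf = [0] * length
        self.pos = 0  # index of the newest sample

    def push(self, sample):
        self.pos = (self.pos + 1) % len(self.buf)  # modulo (circular) increment
        self.buf[self.pos] = sample

    def tap(self, k):
        """Return x[n-k], the sample pushed k steps ago."""
        return self.buf[(self.pos - k) % len(self.buf)]


def fir_circular(coeffs, samples):
    """FIR filter whose delay line is a circular buffer."""
    line = CircularDelayLine(len(coeffs))
    out = []
    for x in samples:
        line.push(x)
        out.append(sum(c * line.tap(k) for k, c in enumerate(coeffs)))
    return out


print(fir_circular([1, 2, 3], [1, 0, 0, 0]))  # impulse response: [1, 2, 3, 0]
```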

C54x

■ A 16-bit machine with many modes!
● Block repeat active
● Overflow mode
● Sign extension mode
● Double precision or dual 16-bit precision mode
● Fractional mode (left shift multiply <<1)
● Accumulator shift mode (5-bit shift field mode)
● Saturation on multiplication mode
◆ before accumulation
◆ ETSI GSM operation
● Saturation on Store
● Compiler mode
◆ Relative addressing using Data Page Pointer or Stack Pointer
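Fractional mode's left shift and the saturate-on-multiply mode both follow from Q15 arithmetic. The minimal sketch below uses Python integers and illustrative names. It models the arithmetic only, not the C54x instruction set. The saturating case follows the style of the ETSI GSM `L_mult` basic operation:

```python
MAX_32 = 0x7FFFFFFF
MIN_32 = -0x80000000


def saturate32(x):
    """Clamp to the signed 32-bit range (saturation) instead of wrapping."""
    return max(MIN_32, min(MAX_32, x))


def wrap32(x):
    """Two's-complement 32-bit wraparound, for comparison."""
    return ((x + 2**31) % 2**32) - 2**31


def frac_mult(a, b):
    """Q15 x Q15 -> Q31 product.

    The raw 16x16 product of two Q15 values is Q30 (it carries two sign bits),
    so it is shifted left by 1. Only -1 x -1 overflows (2^30 << 1 = 2^31),
    and that case saturates to the largest positive value.
    """
    return saturate32((a * b) << 1)


HALF = 1 << 14          # 0.5 in Q15
MINUS_ONE = -(1 << 15)  # -1.0 in Q15
print(hex(frac_mult(HALF, HALF)))            # 0x20000000, i.e. 0.25 in Q31
print(hex(frac_mult(MINUS_ONE, MINUS_ONE)))  # 0x7fffffff: saturated, just below +1.0
print(wrap32((MINUS_ONE * MINUS_ONE) << 1))  # -2147483648: wrapping turns +1.0 into -1.0
```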

Transitional Architectures

■ Characteristics of both Classical and Modern DSPs

■ More programmable but not architected for compilation

■ Typically small address space (64 kB)

■ More computational units (dual-MAC)

■ Parallel instruction issue
● Versus compound instructions

■ More registers with RISC-like ISA

■ Media Processors

■ General Purpose Processors with SIMD

Infineon Carmel

■ Superscalar: two 24-bit instructions issued every cycle

■ Up to six instructions executed per cycle with CLIW™

■ Conditional execution

■ Memory-based architecture
● Memory operands used directly

■ Memory accesses: 4x16 data read and 2x16 data write (a total of 4 memory accesses per cycle)

■ Data buffer addressing: linear, modulo, special, and bit-reversal

■ Execution units: 2xALU, 2xMAC, Barrel Shifter, Exp. Unit

■ Six 40-bit accumulators

■ Four nesting levels of zero-overhead loops

■ 8 Stage pipeline

[Figure: Carmel data path: EXP, SHIFT, ALU1, MAC1, ALU2, and MAC2 units (Execution Units 1 and 2) connected through data bus switches to the 6 x 40-bit accumulator bank, with 16-bit operand buses from memory, immediate operands, and 2 x 16-bit results to memory; address generation uses 2 x ALU units with register sets 0-3 and 4-7 plus an SP ALU]

Used Courtesy of Infineon Technologies
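Bit-reversal addressing exists because an in-place radix-2 FFT consumes or produces its data in bit-reversed index order. A DSP address unit generates these indices in hardware. The software sketch below uses hypothetical function names and computes the same permutation:

```python
def bit_reverse(index, num_bits):
    """Reverse the low `num_bits` bits of `index` (e.g. 0b001 -> 0b100 for 3 bits)."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index into result
        index >>= 1
    return result


def bit_reversed_order(data):
    """Permute a power-of-two-length sequence into bit-reversed index order."""
    n = len(data)
    num_bits = n.bit_length() - 1
    if n == 0 or 1 << num_bits != n:
        raise ValueError("length must be a power of two")
    return [data[bit_reverse(i, num_bits)] for i in range(n)]


print(bit_reversed_order(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying the permutation twice restores the original order. The same address sequence therefore serves for both reordering an FFT's input and un-scrambling its output.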

ZSP Block Diagram

[Figure: ZSP block diagram: DSP core (instruction unit, data unit, register file, ALU 1, ALU 2, MAC 1, MAC 2, pipeline control unit), 8x64b I-cache, 12x64b D-cache, two 64Kx16 memories, bus interface, MXU, DEU, interrupt control, boot ROM, PLL, JTAG, DMA, external memory, and peripherals, on 32b and 64b buses]

■ 4-issue Superscalar engine
● Simple RISC-like programming model
● Orthogonal Instruction Set
● Register-based Operations

■ Pipeline complexity managed by Hardware
● 5-stage Hardware-controlled Pipeline
● Relieves DSP programmers from having to deal with pipeline nuances
● Eliminates programming errors due to hidden states and execution restrictions

■ Parallel execution optimized by Hardware
● Hardware automatically schedules instructions
● Programmers don’t need to find parallelism: simple straight-line coding
● No delay slots, no prefixes, no wasteful NOPs

Used Courtesy of LSI Logic

Modern DSP Architectures

■ Focus on compilability

■ RISC based with Control + DSP processing

■ Highly Parallel

■ Multiple instruction issue

■ Multiple operation issue
● MAC
● ALU
● Load/Store

■ Predominantly VLIW
● Some use of SIMD

■ 32-bit unified address space

TI C6x

■ A groundbreaking machine
● VLIW with 8 functional units
◆ Up to eight 32-bit instructions issued per cycle
◆ 2 MPY, 6 Arithmetic
◆ Int 32x32 -> 64-bit result
◆ Instruction packing
● RISC-based
● Conditional execution
● 8/16/32/40-bit types
● Saturation / Normalization
● Bit field manipulation
● Circular addressing
● Deep pipeline

Source: TI C6x Instruction Set Reference Manual
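On a VLIW machine such as the C6x, the compiler or assembly programmer groups independent instructions into parallel issue groups ahead of time. The ZSP does that grouping in hardware at run time. The sketch below shows greedy, in-order packing and makes several assumptions. It uses a hypothetical three-address op format. It ignores functional-unit types, latencies, and cross-path limits, so it does not reproduce TI's actual packing rules:

```python
from typing import List, NamedTuple, Tuple


class Op(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read
    text: str               # assembly-like text, for display only (hypothetical syntax)


def pack_bundles(ops: List[Op], width: int) -> List[List[Op]]:
    """Greedily pack straight-line ops, in program order, into issue bundles.

    All ops in a bundle read their operands before any of them write, so an op
    may join the current bundle only if a slot is free and it neither reads nor
    rewrites a register written earlier in that bundle (RAW and WAW hazards).
    """
    bundles: List[List[Op]] = []
    current: List[Op] = []
    written: set = set()
    for op in ops:
        hazard = op.dest in written or any(s in written for s in op.srcs)
        if current and (len(current) == width or hazard):
            bundles.append(current)  # close the bundle; op goes in the next one
            current, written = [], set()
        current.append(op)
        written.add(op.dest)
    if current:
        bundles.append(current)
    return bundles


ops = [
    Op("a0", ("x0", "c0"), "mpy a0, x0, c0"),
    Op("a1", ("x1", "c1"), "mpy a1, x1, c1"),
    Op("s", ("a0", "a1"), "add s, a0, a1"),   # needs both products
    Op("p", ("ptr",), "ld p, *ptr"),          # independent load
]
for cycle, bundle in enumerate(pack_bundles(ops, width=8)):
    print(cycle, " || ".join(op.text for op in bundle))
```

Here the two multiplies share the first bundle. The dependent add starts a second bundle, which the independent load also joins.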

StarCore SC140 Core Block Diagram

[Figure: SC140 core block diagram: ISA engine section (program sequencer and branch unit with 8 loop registers, instruction dispatcher, two AAUs, address register file with 27 registers (16 general), BMU), data ALU section (data register file with 16 general registers; MAC1-MAC4, ALU1-ALU4, BFU1-BFU4), debug section (JTAG, trace event unit, event detection, event counter, EOnCE™ controller), and instruction-set architecture plug-in(s); buses PDB (128), PAB (32), ABA and ABB (32), DBA and DBB (64), IB (128), TAB, TDB]

- Address Register File (32-bit, 27 total, 16 general purpose); also 4 modulo, 4 offset, 2 stack pointers, 1 modulo control
- Branch Registers: 8 hardware loop registers in Branch Unit
- 128-bit VLES
- Up to 6 instructions per clock, including 4 MACs
- 128-bit data bandwidth
- Up to 8 data words per clock (4.8 GBytes per second)
- 300 MHz @ 1.5 V; low power, static design
- 16 functional units total
- 16-bit data, 40-bit accumulators
- Single-cycle MAC, integer and fractional data
- 32-bit address, byte addressable
- One unified data and program space
- Data Register File: 16 40-bit general-purpose registers

Used courtesy of Lucent / Motorola / StarCore

4/6/00

Summary DSP Architectures

■ Classical
● Dot product processors
● Poor compiler targets
● Non-orthogonal
● Small address space
● Multiple address spaces
● Compound ISA
● Highly focused app.

■ Transitional
● More programmable
● Some classical features
● Some modern features

■ Modern
● Focus on compiler / architecture pair
● Highly parallel
● Multiple MACs
● 32-bit unified address space
● RISC-based
◆ Control + DSP

DSP Compilation

DSP Application Complexity

[Figure: lines of C code in DSP applications (log scale, 1,000 to 100,000), 1985 to 2005]

10x Complexity every 10 years

Compiler Productivity

Design Algorithms → Map to Fixed Point C → Write DSP-Specific C → Write DSP Assembly → Hand-Schedule Operations on DSP → Final Product

6-9 Months!

Compiler Productivity

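Design Algorithms → Map to Fixed Point C → Write DSP-Specific C → Write DSP Assembly → Hand-Schedule Operations on DSP → Final Product (6-9 Months!)

NEW: Compile → Final Product, bypassing the DSP-specific C, assembly, and hand-scheduling steps; a second Final Product path is marked "If floating point implemented"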

Compilable Architecture

[Figure: compiler and architecture optimized together for cost / power and performance, linking algorithms to implementations for 3G, DSL, GSM, and VoIP]

DSP Compilation Problem

■ Mismatch between C & DSP
● 16-bit fixed point
● 40-bit accumulators with mixed type arithmetic
● Saturation arithmetic vs. modulo semantics

■ Historically...
● DSPs have had compiler-unfriendly architectures
◆ very complex instructions
◆ non-orthogonal, specialized resources
◆ exposed pipelines
● DSP compiled performance
◆ Typical: 1/10 speed of handwritten assembly
◆ Assembly code is required for performance
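The accumulator mismatch is easy to see numerically. The sketch below uses Python integers, and the helpers `wrap`, `saturate`, and `mac` are hypothetical. C's unsigned arithmetic is defined modulo 2^n, and signed overflow, though formally undefined, typically wraps in practice. DSP hardware instead saturates and adds guard bits above the 32-bit product width:

```python
def wrap(x, bits):
    """Modulo (wraparound) semantics: keep the low `bits` bits, two's complement."""
    half = 1 << (bits - 1)
    return ((x + half) % (1 << bits)) - half


def saturate(x, bits):
    """Saturation semantics: clamp to the signed range of `bits` bits."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, x))


def mac(products, acc_bits, overflow):
    """Accumulate products, applying the overflow rule after every addition."""
    acc = 0
    for p in products:
        acc = overflow(acc + p, acc_bits)
    return acc


# 256 products of 0.5 in Q31 (2^30 each): the exact sum is 2^38.
products = [1 << 30] * 256
print(mac(products, 32, wrap))      # 0: the 32-bit sum wrapped all the way around
print(mac(products, 32, saturate))  # 2147483647: clamped at the 32-bit maximum
print(mac(products, 40, saturate))  # 274877906944: 8 guard bits hold the exact sum
```

A C compiler that sees only `int` arithmetic has to reproduce the first behavior. DSP code is usually written expecting the second or third, and that gap is what the language extensions, intrinsics, and type analysis on the following slides try to close.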

DSP Compilation Solutions

■ Extensive libraries
● Often more than 1000 functions
● Resource consuming but high reuse

■ C language extensions (DSP-C)
● Type support (Q15)
● Memory disambiguation

■ Intrinsics

■ Handwritten assembly code

■ Matlab compiler (BOPS)
● 64-bit double precision of Matlab problematic

■ Tensor compiler
● Algorithm specific
● Highly skilled algorithm designers required

DSP Intrinsics

■ Intrinsics let programmers use instructions that a compiler cannot generate

■ Have the appearance of a function call in C
  ● Replaced with assembly statements by the compiler
  ● Highly architecture-dependent

■ Often condense 10 assembly instructions into 1

■ Early attempts were blocking (they prevented optimization across the intrinsic)
  ● Inlined asm statements

■ Non-blocking intrinsics pioneered by Lucent
  ● Written in the compiler's intermediate language
  ● Well-defined side-effect semantics
  ● Allow further optimization
  ● Architecturally neutral
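
To see why one intrinsic can stand in for roughly ten instructions, consider the fixed-point basic operators used in the ETSI GSM speech-coder reference C code. The sketch below is modeled on the semantics of `L_mult`, `L_add`, and `L_mac`, but omits the reference code's global overflow flag and is not that code verbatim. On a DSP with a saturating fractional MAC, an intrinsic maps the whole `L_mac` call to one instruction, whereas a generic compiler must emit the multiply, doubling, compares, and saturation branches.

```c
#include <stdint.h>

/* Saturating 32-bit add (after the ETSI L_add semantics). */
static int32_t L_add(int32_t a, int32_t b) {
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Fractional 16x16 multiply with one left shift (after ETSI L_mult).
   The only overflow, -32768 * -32768, saturates to INT32_MAX. */
static int32_t L_mult(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * b;
    if (p == 0x40000000) return INT32_MAX;
    return p * 2;  /* multiply instead of << to avoid shifting negatives */
}

/* Multiply-accumulate with saturation (after ETSI L_mac). */
static int32_t L_mac(int32_t acc, int16_t a, int16_t b) {
    return L_add(acc, L_mult(a, b));
}
```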

IBM DSP Compilation Solution

■ Intrinsics work well, but…
  ● Compiler writers become DSP assembly-language programmers
  ● They only work for a specific application

■ IBM solution: semantic analysis
  ● Type inference
  ● No intrinsics: out-of-the-box C compiler
  ● Near-parity with assembly code
  ● Novel DSP optimizations
  ● Existing optimizations adapted for DSPs
  ● Power-driven optimizations
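
The slide does not say how the IBM compiler performs type inference. As a hedged illustration of the general idea only, the sketch below uses interval (range) analysis, a standard compiler technique that is not necessarily IBM's: from the value ranges of `short` operands it infers how many bits a product, and a sum of products, actually need. All names are hypothetical.

```c
#include <stdint.h>

/* Inferred value range [lo, hi] of an expression. */
typedef struct { int64_t lo, hi; } range_t;

static range_t range_add(range_t a, range_t b) {
    return (range_t){a.lo + b.lo, a.hi + b.hi};
}

/* Range of a product: the extremes occur at the corner combinations. */
static range_t range_mul(range_t a, range_t b) {
    int64_t c[4] = {a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi};
    range_t r = {c[0], c[0]};
    for (int i = 1; i < 4; i++) {
        if (c[i] < r.lo) r.lo = c[i];
        if (c[i] > r.hi) r.hi = c[i];
    }
    return r;
}

/* Range of a sum of n values (n >= 0), each drawn from r. */
static range_t range_sum_n(range_t r, int64_t n) {
    return (range_t){r.lo * n, r.hi * n};
}

/* Smallest two's-complement width that holds every value in r. */
static int signed_bits(range_t r) {
    for (int bits = 1; bits < 64; bits++) {
        int64_t min = -((int64_t)1 << (bits - 1));
        int64_t max = ((int64_t)1 << (bits - 1)) - 1;
        if (r.lo >= min && r.hi <= max) return bits;
    }
    return 64;
}
```

Applied to 16-bit inputs, this infers 32 bits for one product and 40 bits for a sum of 256 products: the kind of fact a compiler needs before it can map ordinary C `int`/`long` arithmetic onto a 40-bit DSP accumulator without intrinsics.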

Compiled Simulator

■ Aids compiler debugging
  ● Fast compile/execute/check-for-correctness turnaround time

■ Provides profile information
  ● Adds extra instructions to gather statistics

■ Debug using the host debugger
  ● One-to-one mapping between instruction breakpoints and compiled-simulator code sequences
  ● One-to-one mapping between architected state and compiled-simulator state

■ Mix and match with native code
  ● Can use native libraries
  ● Debug by compiling some files with the native compiler
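
A compiled simulator translates each target instruction into host code ahead of time instead of fetching and decoding it at run time. The sketch below is a hypothetical illustration: the toy instruction set, `cpu_state_t`, and `simulate_block` are invented here and are neither the e-lite ISA nor IBM's simulator. It shows the properties listed above: architected state is an ordinary C struct, each target instruction maps to one commented C statement sequence, and profile counters are extra statements inserted by the translator.

```c
#include <stdint.h>

/* Architected state of a toy target, one field per architected resource,
   so a host debugger can inspect it directly. */
typedef struct {
    int32_t  r[8];           /* general registers */
    int64_t  acc;            /* accumulator (40-bit saturation omitted) */
    uint64_t exec_count[4];  /* profile counters, one per target instruction */
} cpu_state_t;

/* Compiled form of a 4-instruction target loop.
   Assumes r0 > 0 on entry (the loop count), as the target code would. */
static void simulate_block(cpu_state_t *s) {
    /* pc 0: CLR  acc */
    s->exec_count[0]++; s->acc = 0;
pc1:
    /* pc 1: MAC  acc, r1, r2   (acc += r1 * r2) */
    s->exec_count[1]++; s->acc += (int64_t)s->r[1] * s->r[2];
    /* pc 2: ADDI r0, r0, -1 */
    s->exec_count[2]++; s->r[0] -= 1;
    /* pc 3: BNZ  r0, pc1 */
    s->exec_count[3]++; if (s->r[0] != 0) goto pc1;
}
```

A host-debugger breakpoint on the source line for pc 1 corresponds to a target breakpoint on instruction 1, and because the translated code is plain C it can be linked with files built by the native compiler.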

IBM e-lite Architecture

Architecture Domain

[Figure: performance required by applications, with required operations per second shown in parentheses; performance axis ticks at 100, 500 and 2k; annotation: "New Architectures Required".]

● ADSL transceiver, 6.1 Mb/s (500M)
● DFSE equalizer (UMTS), 2 Mb/s (650M)
● Single-chip DAB transceiver (800M)
● H.263L + GSM terminal (EHR/HSCSD/GPRS) (1.0G)
● MPEG-2 MP@ML, 30 fps decode (600M)
● Single-chip set-top box (1.5G)
● MPEG-2 MP@ML encode, 30 f/s, ALG search, P=16 (1.68G)
● Symphonic synthesis, natural language processing, real-time speech recognition
● 3G wireless, software radio
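
One way to read the chart: dividing a required operation rate by a clock frequency gives the number of operations each cycle must sustain. Assuming, purely for illustration, a 300 MHz clock (a figure not taken from the slide):

$$
\frac{1.68\times10^{9}\ \text{ops/s}}{3\times10^{8}\ \text{cycles/s}} = 5.6\ \text{ops/cycle},
\qquad
\frac{5\times10^{8}\ \text{ops/s}}{3\times10^{8}\ \text{cycles/s}} \approx 1.67\ \text{ops/cycle}
$$

Under that assumption even the ADSL transceiver needs more than one operation per cycle and MPEG-2 encoding needs more than five, which is consistent with the chart's "New Architectures Required" annotation.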

e-lite Objectives

■ Fully compilable DSP
  ● "Out-of-the-box" C compilation

■ Low-power focus
  ● Algorithm techniques
  ● Software techniques
  ● Architectural techniques
  ● Microarchitectural techniques
  ● Circuit techniques
  ● Process techniques

■ Application area: broadband communications
  ● 3G wireless, VoIP, xDSL

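Prior Research Contributions

[Figure: map of prior research contributing to low-power DSP compilation. Research programs: Chameleon, vector research, SiGe, SOI. Disciplines: methodology, circuits, architecture, process, processor design. Compiler work: SIMD vectors, scheduling, type recognition. Processors: PPC, IBM DSP. Applications: 3G wireless, wideband CDMA, 2.5G GSM, software radio, Bluetooth, WLAN.]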

e-lite Architecture

■ Developed hand-in-hand with the compiler

■ 64-bit multiple-instruction bundles
  ● Peak issue of 3 instructions per cycle
  ● Each instruction may specify multiple operations

■ Pre-decoded instruction cache
  ● 5-issue-per-cycle peak

■ Non-interlocked pipeline
  ● Except for long loads
  ● Minimal control paths

■ SIMD execution

■ Streaming register file

■ Fully visible hardware resources

[Figure: bundle formats showing a field labeled P with either two 30-bit or three 20-bit instruction slots.]
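
The bundle-format figure shows only a field labeled P and slot widths of 30 and 20 bits. Purely as a hypothetical illustration, here is how a decoder might split a 64-bit bundle into its instruction slots. The bit positions, the 4-bit width of P, and the format-select bit are all assumptions, not the documented e-lite encoding; they are chosen so that 4 + 3*20 = 4 + 2*30 = 64.

```c
#include <stdint.h>

/* HYPOTHETICAL layout: P in bits 63..60; P's top bit selects
   three 20-bit slots (1) or two 30-bit slots (0) in bits 59..0. */
typedef struct {
    unsigned prefix;   /* the 4-bit P field (assumed width) */
    int      nslots;   /* 2 or 3 */
    uint32_t slot[3];  /* raw instruction slots, slot[0] issues first */
} bundle_t;

static bundle_t decode_bundle(uint64_t word) {
    bundle_t b = {0};
    b.prefix = (unsigned)(word >> 60) & 0xFu;
    if (b.prefix & 0x8u) {
        /* three 20-bit slots: bits 59..40, 39..20, 19..0 */
        b.nslots = 3;
        for (int i = 0; i < 3; i++)
            b.slot[i] = (uint32_t)(word >> (40 - 20 * i)) & 0xFFFFFu;
    } else {
        /* two 30-bit slots: bits 59..30, 29..0 */
        b.nslots = 2;
        for (int i = 0; i < 2; i++)
            b.slot[i] = (uint32_t)(word >> (30 - 30 * i)) & 0x3FFFFFFFu;
    }
    return b;
}
```

Fixed slot boundaries keep decoding to shifts and masks, the kind of work a pre-decoded instruction cache can also move off the critical path.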

e-lite Execution Units

■ Integer unit
  ● 32-bit modulo arithmetic

■ Storage access unit
  ● Byte (8-bit), half-word (16-bit), and word (32-bit) transfers
  ● 64-bit vector transfers

■ Vector (SIMD) unit
  ● 16-bit Q15 format

■ Vector reduction unit
  ● Parallel accumulation

■ Branch unit
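
With 64-bit vector transfers and 16-bit Q15 elements, each vector holds four lanes. The sketch below models in portable C how a vector unit and a vector reduction unit could cooperate on a dot product: four lane-wise 16x16 multiplies, then a tree sum of the four products into a wide accumulator. The packing order, lane numbering, and function names are assumptions for illustration, not the e-lite ISA.

```c
#include <stdint.h>
#include <stddef.h>

#define LANES 4  /* 64-bit vector / 16-bit elements */

/* Pack four Q15 values into one 64-bit vector; lane 0 is least significant. */
static uint64_t pack4(int16_t x0, int16_t x1, int16_t x2, int16_t x3) {
    return ((uint64_t)(uint16_t)x0)         | ((uint64_t)(uint16_t)x1 << 16) |
           ((uint64_t)(uint16_t)x2 << 32)   | ((uint64_t)(uint16_t)x3 << 48);
}

/* Extract lane i of a packed vector as a signed Q15 value. */
static int16_t lane(uint64_t v, int i) {
    uint16_t u = (uint16_t)(v >> (16 * i));
    return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Dot product over nvec packed vectors; the result is in Q30 format.
   Saturation is omitted for brevity. */
static int64_t dot_q15(const uint64_t *a, const uint64_t *b, size_t nvec) {
    int64_t acc = 0;
    for (size_t k = 0; k < nvec; k++) {
        int32_t prod[LANES];
        /* Vector (SIMD) unit: four independent 16x16 multiplies. */
        for (int i = 0; i < LANES; i++)
            prod[i] = (int32_t)lane(a[k], i) * lane(b[k], i);
        /* Vector reduction unit: pairwise tree sum, then accumulate. */
        acc += ((int64_t)prod[0] + prod[1]) + ((int64_t)prod[2] + prod[3]);
    }
    return acc;
}
```

Reducing the lanes as a tree rather than serially is what allows the four additions to proceed in parallel, and the wide accumulator absorbs sums that exceed 32 bits.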

e-lite Compiler Results

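[Figure: GSM EFR/AMR speech coder, MHz required, compared at 12/99 and 12/00 (scale ticks 20 to 120 MHz; a value of 400 also appears). Series: typical DSP EFR with intrinsics, typical DSP EFR without intrinsics ("ni"), e-lite EFR and e-lite AMR compiled from untouched C code, and the EFR target.]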

Conclusions

■ DSP design has undergone a major paradigm shift
  ● Soaring costs of assembly programming have altered DSP architectures
  ● Compilable DSPs are required
  ● Ultra-low-power implementations are desirable
  ● Deterministic execution is still required

■ Multiple-issue, highly parallel architectures will become more prevalent
  ● Mix of control and compute code
  ● Vectorization technology makes SIMD implementations possible

■ Compilers will play an even more important role in the future
  ● Taking the memory hierarchy into account
  ● Parallelism extraction
  ● Type analysis
  ● Precision analysis