low power qdi asynchronous fft · 14 this work sync (chip)* [1] [2] tech 65 nm 65 nm 65 nm 0.35 µm...

17
Low Power QDI Asynchronous FFT Benjamin Z. Tang, Frank Lane Qualcomm Research May 10, 2016

Upload: others

Post on 30-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

Low Power QDI Asynchronous FFT

Benjamin Z. Tang, Frank LaneQualcomm ResearchMay 10, 2016

Page 2: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

2

• Extreme long battery life is crucial for M2M communication

• Must push innovation in low-power communication

• FFT is a common IP block that is computationally and memory intensive

Motivation

Page 3: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

3

• DFT:

• FFT: O(N log(N))

• Implementation options:• Algorithm level: Radix, DIT/DIF, number representation• Circuit level: Twiddle multiplication, CORDIC rotation, multi-rate clocking, etc

• Reference sync version:• Radix 23

• Decimation in frequency (DIF)• 16-bit real, 16-bit imaginary• Twiddle multiplication: (a+bj)*(c+dj) = (ac-bd) + (ad+bc)j

FFT

21

0[ ] [ ] , 0,1,...., 1

kN j nN

nX k x n e k N

π− −

=

= = −∑

Page 4: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

4

Radix 23

FFT Architecture

Stage 1 Memory

Stage 2 Memory

Stage 2 Memory

Stage 3 Memory

Stage 3 Memory

Stage 3 Memory

Stage 3 Memory

-j

-j

-jW8

W8

IN

Memory

TwiddleNi/2

Ni/4

Ni/8

Page 5: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

5

• QDI

• Micro-architecture optimizations highlights:• Token ring memory controls

• CORDIC twiddle multiplication

Async FFT

Page 6: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

6

Example:

• 1st stage of 8-pt FFT

• 1-D ring

Token-Ring Memory Intro

Tin

ToutC

Tin

ToutC

Tin

ToutC

Tin

ToutC

Stage mem input

Out

OutdelayedTo butterfly

Stage memory

0

1

2

3

WR

W

W

W

R

R

R

Page 7: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

7

1st stage example

• 2 passes, need 2-D ring

• 1st pass: store 8, read 8, once

• 2nd pass: store 1, read 1, 8 times

• Token ring keeps track of pattern

16-point FFT

Token-Ring ControlsTin

ToutC

Tin

ToutC

Tin

ToutC

Tin

ToutC

Stage mem input

Out

OutdelayedTo butterfly

Stage memory

Tin

ToutC

Tin

ToutC

Tin

ToutC

Tin

ToutC

0

1

2

3

4

5

6

7

WR

WR

WR

WR

WR

WR

WR

WR

W

WR

WR

WR

WR

WR

WR

WR

R

Page 8: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

8

1st stage example

• 3 passes, need 3-D ring, with 8 groups of 8

• 1st pass: store 64, read 64, once

• 2nd pass: store 8, read 8, 8 times

• 3rd pass: store 1, read 1, 64 times

128-point FFT

Token-Ring Controls Tin1Tout

CTin0

Tin1Tout

CTin0

ToutC

Tin

ToutC

Tin

Tout0C

Tin

Tout1

Tout0C

Tin

Tout1

0

1

7

8

9

15

...

...

Group 0

Group 1

Group 2

Group 7

...

...

...

Token rings provide controls for memory read/write, automatic back-pressure, counting and addressing

Page 9: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

9

• Twiddle (e-j(2πkn/N)) multiplication = rotation by angle –(2πkn/N)

• Pipelined vs iterative

• Performance, area, clock cycles

CORDIC Rotation Intro

1

1

1

0

0

0

2

2

arctan 21, 01, 0

ii i i i

ii i i i

ii i i

ii

i

in

in

x x y d

y y x dz z d

zd

zx xy yz rotation angle

−+

−+

−+

= − × ×

= + × ×

= − ×

− <⎧= ⎨

≥⎩=

=

=

Chose iterative architecture

Page 10: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

10

• 6 iterations

• Bypass CORDIC for 0 degrees• About 37% of the time in 128-point FFT

• Increased performance, reduced power

Asynchronous CORDIC Engine

×(-1)atan

01

0 1

0 0

1 1

msb

Z07

7

01

0 1

×(-1)

0 0

1 1

18

>>>i

Y0

Youtscale16

C

Cend

18

Dinv

01

0 1

0 0

1 118

>>>i

X0

Xoutscale16

C

Cend

18

count

Dinv Dinv

Xs Ys

i i

C=*[1,0,0,0,0,0,0]Cend=*[0,0,0,0,0,0,1]

C

Cend

×(-1)

Ramps up performance to cycle through iterations without being constrained by system clock

Page 11: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

11

• CHP production rules transistor-level (Spice) netlist

• Sinusoid input, negligible difference compared to result from Matlab's native FFT function

Results - FFT Plot

Page 12: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

12

Results – Spice Waveforms

Inputs

1st stage

2nd stage

3rd stage

Outputs

Page 13: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

13

Subsystems Energy (nJ)

Memories, controls, butterflies, others 3.1

CORDIC 2.8

Total 5.9

Results - Power

• 65nm technology

• Vdd=1V

• 10 MHz data rate

Page 14: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

14

This Work Sync (Chip)* [1] [2]

Tech 65 nm 65 nm 65 nm 0.35 µm

Voltage 1.0 V 1.0 V 0.3 V 1.1 V

N-point 128 128 128 128

Data rate 10 MHz 10 MHz - 16 kHz

Energy 5.9 nJ 205 nJ 31 nJ 120 nJ

Results – Comparison

* Normalized to same data rate and FFT length

[1] K.-S. Chong, J. Chang, I. Ebong, Y. Yilmaz, and P. Mazumder, “Comparison of FFT/IFFT Designs Utilizing Different Low Power Techniques,” in Electronic System Design (ISED), 2012 International Symposium.

[2] K.-S. Chong, B.-H. Gwee, and J. S. Chang, “Energy-Efficient Synchronous-Logic and Asynchronous-Logic FFT/IFFT Processors,”, IEEE Journal of Solid-State Circuits, 2007.

Page 15: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

15

• Low power clockless FFT design• Same design concepts can be extended to high performance

systems• Can lower supply voltage further for near-threshold computing

• Simple, fast token rings memory controls

• Small, fast CORDIC engine

Summary

Page 16: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

16

• Rajit Manohar for async CAD tools

Acknowledgment

Page 17: Low Power QDI Asynchronous FFT · 14 This Work Sync (Chip)* [1] [2] Tech 65 nm 65 nm 65 nm 0.35 µm Voltage 1.0 V 1.0V 0.3 V 1.1 V N-point 128 128 128 128 Data rate 10 MHz 10 MHz

Thank you

Follow us on:For more information, visit us at: www.qualcomm.com & www.qualcomm.com/blog

Nothing in these materials is an offer to sell any of the components or devices referenced herein.

©2016 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.

Qualcomm is a trademark of Qualcomm Incorporated, registered in the United States and other countries.Other products and brand names may be trademarks or registered trademarks of their respective owners.

References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable.Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.