Architecting an LTE Base Station with Graphics Processing Units

Qi Zheng*, Yajing Chen*, Ronald Dreslinski*, Chaitali Chakrabarti+, Achilleas Anastasopoulos*, Scott Mahlke*, Trevor Mudge*
*University of Michigan, Ann Arbor
+Arizona State University, Tempe

SiPS ’13, Oct 17, 2013




Wireless Base Station


Baseband SDR in a Base Station

GOPS Throughput

Technology Evolution

[Figure: peak data rate (Mbps) on a log scale from 1.0E-01 to 1.0E+03]

Processor for Wireless Base Station

Baseband processor
•  Flexibility -- Good programmability
•  Performance -- High computing throughput

Baseband processor: GPU

Graphics Processing Unit
•  High Throughput
   •  GOPS/TOPS-level peak throughput
•  Good Programming Support
   •  CUDA
   •  OpenCL
•  High Efficiency

Processor               GFLOP/dollar   GFLOP/watt
Nvidia GeForce GTX680   6.192          15.848
Intel Xeon E7-8837      0.037          0.656
Intel Itanium 8350      0.007          0.150
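To put the efficiency table in perspective, the ratios between the GPU and the two CPUs can be worked out directly from the listed figures. The Python sketch below only restates the table's numbers; the dictionary layout and the function name `advantage` are illustrative and do not come from the slides.

```python
# Efficiency figures copied from the slide's table: (GFLOP/dollar, GFLOP/watt).
EFFICIENCY = {
    "Nvidia GeForce GTX680": (6.192, 15.848),
    "Intel Xeon E7-8837": (0.037, 0.656),
    "Intel Itanium 8350": (0.007, 0.150),
}


def advantage(gpu, cpu):
    """Return how many times better `gpu` is than `cpu` per dollar and per watt."""
    gpu_dollar, gpu_watt = EFFICIENCY[gpu]
    cpu_dollar, cpu_watt = EFFICIENCY[cpu]
    return gpu_dollar / cpu_dollar, gpu_watt / cpu_watt


for cpu in ("Intel Xeon E7-8837", "Intel Itanium 8350"):
    per_dollar, per_watt = advantage("Nvidia GeForce GTX680", cpu)
    print(f"GTX680 vs {cpu}: {per_dollar:.0f}x per dollar, {per_watt:.1f}x per watt")
```

On these figures the GTX680 comes out roughly 167 times ahead of the Xeon per dollar and about 24 times ahead per watt.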

GPU Architecture
•  SIMT – Single Instruction Multiple Threads
   •  Thousands of cores on a GPU
   •  Exploit Data-Level Parallelism

[Figure: SIMT execution. One Instruction Fetch Unit issues a single instruction to threads t0 … tn, whose ALUs operate on data d0 … dn from Memory]

•  Multithreading
   •  Hide long memory latency

[Figure: several thread groups, each with threads t0 … tn. Thread Group 0 runs
   ld  R0 [R1+Offset]
   sub R2, R0, #2
   add R0, R2, R3
and is marked with a Dcache miss]

[Figure: Thread Group 1 then issues
   mul R3,R4,R5,R6
   ld  R0 [R1+Offset]
   sub R2, R0, #2
   add R0, R2, R3]

•  SIMT + Multithreading: GOPS/TOPS-level peak throughput
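The latency-hiding idea on the GPU Architecture slides can be illustrated with a toy cycle-level model. This is only a sketch under stated assumptions: a single-issue core, a fixed load latency of 20 cycles, and a "lowest-numbered ready group first" scheduler are all hypothetical choices, not a description of the GTX680's actual warp scheduler. The three-instruction program mirrors the ld/sub/add sequence shown for Thread Group 0.

```python
# Toy model: every thread group runs the same program; a group that issued a
# load cannot issue again until the load returns. All parameters are
# hypothetical and chosen only to make the effect visible.
PROGRAM = ["ld", "alu", "alu"]   # mirrors ld / sub / add on the slide
LOAD_LATENCY = 20                # assumed cycles until a load's data is usable


def simulate(num_groups, program=PROGRAM, load_latency=LOAD_LATENCY):
    """Return the cycles a single-issue core needs to finish all groups."""
    pc = [0] * num_groups         # next instruction index for each group
    ready_at = [0] * num_groups   # first cycle at which each group may issue
    cycle = 0
    issued = 0
    total = num_groups * len(program)
    while issued < total:
        # Issue from the first group that still has work and is not stalled.
        for g in range(num_groups):
            if pc[g] < len(program) and ready_at[g] <= cycle:
                op = program[pc[g]]
                pc[g] += 1
                issued += 1
                ready_at[g] = cycle + (load_latency if op == "ld" else 1)
                break
        cycle += 1
    return cycle


for groups in (1, 8, 20):
    cycles = simulate(groups)
    utilization = groups * len(PROGRAM) / cycles
    print(f"{groups:2d} groups: {cycles} cycles, issue utilization {utilization:.2f}")
```

In this model one group needs 22 cycles to issue its 3 instructions, whereas twenty groups issue 60 instructions in 60 cycles: the stall of one group is covered by work from the others.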

GPU Mapping Challenge
•  Core underutilization
   •  Over 1000 cores on a commercial GPU
•  Pipeline stall
   •  Long memory access latency

Our Contribution
•  Previous works
   •  Mapped only a single kernel onto a GPU
   •  Base station study on traditional platforms, such as DSP, FPGA, etc.
•  In this work
   •  Key Kernel Parallelization
   •  Kernel runtime performance
   •  Minimum number of GPUs needed
   •  System Power consumption

Outline
•  Motivation
•  GPU Architecture
•  Key Kernels Parallelization
•  Experimental Results
•  Conclusion

List of Key Kernels
•  PHY Layer
   •  SC-FDMA demodulation (FFT)
   •  Transform decoder (IDFT)
   •  Channel estimation
   •  MIMO detection
   •  Modulation demapper
•  Turbo decoder

Parallelism in PHY Layer Kernels

Parallelism        Description                                                  Num of threads
User-level         Process data from different users in parallel                #thread = Nusr
Antenna-level      Process data from different receiver antennae in parallel    #thread = Nant
Symbol-level       Process SC-FDMA symbols in a subframe in parallel            #thread = Nsym
Subcarrier-level   Process each subcarrier in a symbol in parallel              #thread = Nsub
Algorithm-level    Parallelism inherent in each algorithm                       Varies based on kernels

Parallelism in PHY Layer Kernels

Kernel                Parallelism levels                              Number of threads
FFT/IFFT              User, Antenna, Symbol, Algorithm                Nusr × Nant × Nsym × NFFT
Channel Estimation    User, Antenna, Subcarrier                       Nusr × Nant × Nsub
MIMO detector         User, Symbol, Subcarrier                        Nusr × Nsym × Nsub
Modulation demapper   User, Antenna, Symbol, Subcarrier, Algorithm    Nusr × Nant × Nsym × Nsub × NMod
                                                                      Nusr × Nant × Nsym × Nsub × log2(NMod)
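As a worked example of the thread-count products on this slide, the sketch below evaluates them for one configuration. The FFT size (2048), the 1200 subcarriers, and 16QAM appear on the Experimental Setup slide; the single user, two antennas, and 14 symbols per subframe are assumed values picked for illustration, and the function and key names are hypothetical.

```python
import math


def phy_thread_counts(n_usr, n_ant, n_sym, n_sub, n_fft, n_mod):
    """Evaluate the per-kernel thread-count products listed on the slide."""
    per_subcarrier = n_usr * n_ant * n_sym * n_sub
    return {
        "FFT/IFFT": n_usr * n_ant * n_sym * n_fft,
        "Channel estimation": n_usr * n_ant * n_sub,
        "MIMO detector": n_usr * n_sym * n_sub,
        # The slide lists two products for the modulation demapper.
        "Modulation demapper (x NMod)": per_subcarrier * n_mod,
        "Modulation demapper (x log2 NMod)": per_subcarrier * int(math.log2(n_mod)),
    }


# Assumed example configuration: 1 user, 2 antennas, 14 symbols, 16QAM.
counts = phy_thread_counts(n_usr=1, n_ant=2, n_sym=14, n_sub=1200, n_fft=2048, n_mod=16)
for kernel, threads in counts.items():
    print(f"{kernel:35s} {threads:8d} threads")
```

Even this small configuration yields tens of thousands of threads for most kernels, which is the scale needed to keep a GPU with over 1000 cores busy.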

Parallelism in Turbo Decoder*
•  Total number of threads

   $N_{thread} = N_{packet} \cdot N_{subblock} \cdot Thread_{trellis}$

•  Implementation performance tradeoff

Parallelism Scheme   Throughput   Latency     Bit Error Rate
Packet-level         Better       Worse       No Change
Subblock-level       Better       No Change   Worse
Trellis-level        Better       No Change   No Change
Subblock+NII         Worse        No Change   Better
Subblock+TS          Worse        No Change   Better

*Qi Zheng, et al., “Parallelization Techniques for Implementing Trellis Algorithms on Graphics Processors”, ISCAS 2013, Beijing
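The thread-count formula for the turbo decoder can be evaluated in a couple of lines. In the sketch below, the 2 packets and 512 subblocks are example inputs of the kind listed on the Turbo Decoder Performance slide, and the codeword length of 6144 comes from the Experimental Setup slide. The value of 8 threads per trellis is an assumption (one thread per state of an 8-state trellis), and both function names are hypothetical.

```python
def turbo_threads(n_packet, n_subblock, threads_per_trellis):
    """N_thread = N_packet * N_subblock * Thread_trellis."""
    return n_packet * n_subblock * threads_per_trellis


def subblock_length(codeword_length, n_subblock):
    """Trellis steps handled by each subblock; requires an even split."""
    if codeword_length % n_subblock != 0:
        raise ValueError("codeword length must be divisible by the subblock count")
    return codeword_length // n_subblock


# Example inputs; threads_per_trellis=8 is an assumed value.
print(turbo_threads(n_packet=2, n_subblock=512, threads_per_trellis=8))
print(subblock_length(codeword_length=6144, n_subblock=512))
```

More subblocks raise the thread count but shorten each subblock, which is consistent with the table's note that subblock-level parallelism improves throughput at the cost of bit error rate.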

Experimental Setup
•  Nvidia GeForce GTX680
   •  8 Streaming Multiprocessors
   •  1536 Streaming Processors
   •  64KB L1 cache + shared memory
   •  512KB L2 cache
   •  2GB DRAM
•  GPU runtime measurement
   •  CUDA event record
•  GPU power measurement
   •  GPU-Z

Experimental Setup

Kernel                Configuration
Turbo decoder         Code Rate = 1/3, Codeword Length = 6144, Iteration Number = 5
Modulation demapper   16QAM/64QAM
SC-FDMA FFT           2048
Decoding IFFT         1200
MIMO                  1x1, 2x2, 4x4

PHY Layer Kernel Runtime
•  One LTE subframe on a GPU
•  Different antenna configurations

[Figure: PHY layer kernel runtimes (ms) of an LTE subframe, axis from 0 to 2 ms, for FFT, IFFT, MIMO detector, Channel estimation, Modulation demapper (16QAM), and Modulation demapper (64QAM), under 1x1, 2x2, and 4x4 antenna configurations]

Turbo Decoder Performance

Schemes                                              Throughput   Worst-case Codeword   BER
Trellis-level          Subblock Num   Codeword Num   (Mbps)       Latency (ms)
State-level            512            2              77.6         0.7                   1.6×10^-3
State-level            256            4              78.2         1.7                   4.1×10^-3
State-level For-Back   256            2              78.3         0.7                   4.1×10^-3
State-level For-Back   128            7              80.6         3.1                   2.0×10^-3


Number of needed GPUs
•  Meet latency and throughput requirements
   •  $\sum_k t_k \le 1\,\text{ms}$ for a subframe
   •  $Th_{turbo} \ge Th_{sys}$

[Figure: number of GTX680 GPUs (axis 0 to 10) versus peak data rate (50, 75, 100, 150, 200, 300 Mbps), with series PHY, Turbo, and Total]
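The two requirements on this slide translate directly into a feasibility check. The sketch below assumes the sum runs over the PHY kernel runtimes for one subframe, which the slide does not state explicitly. The runtimes in the example are made-up placeholders, not measurements from the talk; only the 77.6 Mbps turbo throughput is taken from the Turbo Decoder Performance slide.

```python
def meets_requirements(kernel_runtimes_ms, turbo_throughput_mbps, system_rate_mbps):
    """Check sum(t_k) <= 1 ms per subframe and Th_turbo >= Th_sys."""
    latency_ok = sum(kernel_runtimes_ms.values()) <= 1.0
    throughput_ok = turbo_throughput_mbps >= system_rate_mbps
    return latency_ok and throughput_ok


# Hypothetical per-subframe runtimes in ms (placeholders, NOT measured values).
example_runtimes = {
    "FFT": 0.10,
    "IFFT": 0.15,
    "Channel estimation": 0.05,
    "MIMO detector": 0.05,
    "Modulation demapper": 0.40,
}

for rate in (75, 100):
    ok = meets_requirements(example_runtimes, turbo_throughput_mbps=77.6, system_rate_mbps=rate)
    print(f"{rate} Mbps on one turbo GPU: {'feasible' if ok else 'not feasible'}")
```

When either check fails, the workload has to be spread over more GPUs, which is what the GPU-count results quantify.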



Power and Energy
•  Support 75Mbps
   •  Two GTX680 GPUs + One Intel Core 2 CPU
   •  Total power is 188W

[Figure: energy (J/subframe) per kernel, with data labels Turbo decoder 144, SC-FDMA FFT 3.4, Decoding IFFT 5.7, Modulation demapper 26.5, Channel estimation 1.2, MIMO detector 1.3]

[Figure: power (W) per kernel, axis from 40 to 65 W, for the same six kernels]

Conclusion
•  Highly parallel GPU implementations of all key kernels
•  Kernel runtimes under different configurations
•  Up to four GTX680 GPUs needed for ≤150Mbps
   •  Can fit into a motherboard with low latency
•  Dual-GPU solution consumes 188W for 75Mbps
   •  Competitive with a commercial solution


Thanks!

Any questions?


Backup

Our Contribution
•  Previous works
   •  Mapped only a single LTE kernel onto a GPU
   •  Base station study on traditional platforms, such as DSP, FPGA, etc.
•  In this work
   •  Parallel implementations of LTE key signal processing kernels on GPU
   •  Kernel runtime performance under different system configurations
   •  The number of GPUs needed for the baseband subsystem in an LTE base station
   •  Power consumption of the GPU-based solution

Parallelism in PHY Layer Kernels
•  User-level Parallelism
   •  #thread = Nusr
•  Antenna-level Parallelism
   •  #thread = Nant
•  Symbol-level Parallelism
   •  #thread = Nsym
•  Subcarrier-level Parallelism
   •  #thread = Nsub
•  Algorithm-level Parallelism


Number of needed GPUs
•  The minimum number of GTX680 GPUs needed for the baseband system of an LTE base station

Peak Data Rate (Mbps)   Number of GPUs
                        PHY   Turbo   Total
50                      1     1       2
75                      1     1       2
100                     1     2       3
150                     2     2       4
200                     2     3       5
300                     5     4       9
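One plausible reading of the Turbo column is that it follows from the throughput requirement alone: divide the target data rate by the per-GPU turbo throughput and round up. The sketch below tests that reading using 77.6 Mbps, the lowest throughput reported on the Turbo Decoder Performance slide. This derivation is an assumption rather than the authors' stated method; the PHY column is copied from the table as-is because the slides do not give enough data to recompute it.

```python
import math

# Lowest per-GPU turbo throughput on the Turbo Decoder Performance slide.
TURBO_MBPS_PER_GPU = 77.6

# Peak data rate (Mbps) -> (PHY GPUs, Turbo GPUs), copied from the table.
TABLE = {50: (1, 1), 75: (1, 1), 100: (1, 2), 150: (2, 2), 200: (2, 3), 300: (5, 4)}


def turbo_gpus(rate_mbps, per_gpu_mbps=TURBO_MBPS_PER_GPU):
    """Smallest GPU count whose combined turbo throughput covers the rate."""
    return math.ceil(rate_mbps / per_gpu_mbps)


for rate, (phy, turbo) in TABLE.items():
    estimate = turbo_gpus(rate)
    match = "matches" if estimate == turbo else "differs from"
    print(f"{rate:3d} Mbps: estimated {estimate} turbo GPU(s), {match} table; total {phy + estimate}")
```

Under this assumption the estimate agrees with every row of the Turbo column, and adding the listed PHY counts reproduces the Total column.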

Processor for Wireless Base Station
•  Good programmability
•  High computing throughput

GPU: high computing throughput with good programmability

Power and Energy
•  Support 75Mbps
   •  Two GTX680 GPUs + one Intel Core 2 CPU
   •  Total power is 188W

Kernel                Power (W)   Energy (J/subframe)
Turbo decoder         63.3        144.0
SC-FDMA FFT           56.7        3.4
Decoding IFFT         56.9        5.7
Modulation demapper   56.3        26.5
Channel estimation    61.8        1.2
MIMO detector         57.7        1.3