telecom systems simulations acceleration via cpu/gpu...

Paolo Spallaccini, Stefano Chinnici Telecom Systems Simulations via CPU/GPU co-processing NVIDIA GPU Technology Conference 2012, San Jose, CA


TRANSCRIPT

Page 1: Telecom Systems Simulations Acceleration via CPU/GPU …on-demand.gputechconf.com/gtc/2012/presentations/S0255-Sims... · Slide title minimum 48 pt Slide subtitle minimum 30 pt Paolo


Paolo Spallaccini, Stefano Chinnici

Telecom Systems Simulations via CPU/GPU co-processing

NVIDIA GPU Technology Conference 2012, San Jose, CA

Page 2:

Paolo Spallaccini, Stefano Chinnici

TURBO CODES CASE STUDY

Ericsson Telecomunicazioni, Milan (Italy)

Page 3:

GTC 2012 | © Ericsson Telecomunicazioni SpA 2012 | 2012-05-15

Fast Simulations for Communications Systems

PHY layer simulation is extremely time consuming

Extrapolation of short timescale results is risky

So what?

HW prototyping (full/reduced speed) is costly

Educated guesses are not always optimal

Page 4:

Fast Simulations for Communications Systems

CPU-based serially iterated simulations

First-time-right ASIC design

Trading off computation latency for data-level parallelism!

Time-to-market (TTM) driven development

Higher design quality

Page 5:

Describing the battlefield

Simulating a whole telecom system chain is a very time-intensive task, due to the complexity of the overall system

Typically, physical layer simulations on conventional CPUs have a runtime of several weeks

Several algorithms that depend on the particular processed layer have to be implemented. They often do not benefit from parallel data processing

An adequate statistical characterization of the simulation often requires a very large number of iterations

Page 6:

The PHY layer simulation model

Random Bits Source → Serial Turbo Code Encoder → QAM Modulator → AWGN Channel → Soft QAM Demod → Serial Turbo Code Decoder → BER Meter

Serial turbo code: SCCC (serially concatenated convolutional code)

The key players: traffic source, channel features, transceiver modulation and coding
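The chain above can be illustrated with a deliberately reduced sketch. It keeps only the source, modulator, AWGN channel, soft demodulator and BER meter, and leaves out the turbo encoder and decoder. It also assumes one real dimension of a QPSK symbol (equivalent to BPSK with unit bit energy); the function names are illustrative and not taken from the slides.

```python
import math
import random


def simulate_uncoded_chain(num_bits, ebn0_db, seed=0):
    """Source -> modulator -> AWGN -> soft demod -> BER meter; returns the measured BER."""
    rng = random.Random(seed)
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    # Unit bit energy, so the noise variance per real dimension is N0/2 = 1 / (2 Eb/N0)
    sigma = math.sqrt(1.0 / (2.0 * ebn0))
    errors = 0
    for _ in range(num_bits):
        bit = rng.getrandbits(1)                   # random bits source
        symbol = 1.0 - 2.0 * bit                   # modulator: 0 -> +1, 1 -> -1
        received = symbol + rng.gauss(0.0, sigma)  # AWGN channel
        llr = 2.0 * received / sigma ** 2          # soft demodulator output
        decided = 0 if llr >= 0.0 else 1           # hard decision (no FEC in this sketch)
        errors += int(decided != bit)              # BER meter
    return errors / num_bits


def theoretical_ber(ebn0_db):
    """Closed-form BER of uncoded BPSK/QPSK over AWGN, used as a sanity check."""
    return 0.5 * math.erfc(math.sqrt(10.0 ** (ebn0_db / 10.0)))
```

Every loop iteration is independent of the others, which is why this part of the chain is easy to spread over many threads; the coded chain loses that property inside the decoder.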

Page 7:

Our case study...

Random Bits Source → Serial Turbo Code Encoder → QAM Modulator → AWGN Channel → Soft QAM Demod → Serial Turbo Code Decoder → BER Meter

Turbo Forward-Error Correction (FEC) performance characterization below BER 10^-9

Iterative decoding has a very high computational complexity

The decoding algorithm performs both intrinsically parallel calculations and recursive computations
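A rough sizing exercise shows why characterization below BER 10^-9 is so expensive. The sketch below assumes a target of 100 observed bit errors and a hypothetical FEC block of 4096 information bits; neither figure comes from the slides. The 45.7 ms per block is the QPSK CPU time reported in the coarse profiling slide of this deck.

```python
import math


def blocks_needed(target_ber, min_errors, bits_per_block):
    """Number of FEC blocks to simulate so that about min_errors errors are expected."""
    bits = min_errors / target_ber
    return math.ceil(bits / bits_per_block)


def runtime_days(num_blocks, ms_per_block):
    """Serial CPU runtime in days for the given per-block processing time."""
    return num_blocks * ms_per_block / 1000.0 / 86400.0


# Assumed values: 100 errors, 4096-bit blocks (hypothetical), 45.7 ms per block
n_blocks = blocks_needed(1e-9, 100, 4096)
days = runtime_days(n_blocks, 45.7)
print(n_blocks, round(days, 1))
```

Under these assumptions a single operating point already needs about 10^11 simulated bits, and a full BER curve needs several such points.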

Page 8:

...and the next steps

Random Bits Source → Serial Turbo Code Encoder → QAM Modulator → AWGN Channel → Soft QAM Demod → Serial Turbo Code Decoder → BER Meter

Deliver a GPU-accelerated simulation library

Next functions to be attacked (in order of computational complexity):

1. Soft-decision demodulator

2. Additive White Gaussian Noise (AWGN) channel model

3. Turbo-code encoder

Page 9:

Our CPU-GPU co-processing perspective

The T-shaped approach:

1. (Horizontal) Span the whole simulation system to identify the best CPU-GPU synergy perspective, in a scenario able to exploit parallel processing

2. (Vertical) Dive deeply into the block targeted for GPU implementation

Page 10:

Coarse profiling, first (and maybe some coffee...)

...Checking system performance...

The “GPU idea” is promising, but you can feel you’re going along an unpaved path!

Easy finding, the rule of thumb: the decoder must be “accelerated” before any other simulation block

Very next in line is the soft demod, heavier for larger modulations

CPU execution times [ms]        QPSK    1024 QAM
Average time per FEC block      45.7    57.8
Spent in SISO decoder           43.0    49.1
Spent in soft demodulator        2.0     7.2
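The profiling figures above can be turned into an upper bound on the achievable speed-up with Amdahl's law. The sketch below uses only the numbers in the table; the 20x block speed-up in the last line is a hypothetical value chosen for illustration.

```python
def speedup_bound(total_ms, accelerated_ms):
    """Amdahl limit: overall speed-up if the accelerated part took zero time."""
    return total_ms / (total_ms - accelerated_ms)


def amdahl_speedup(total_ms, accelerated_ms, block_speedup):
    """Overall speed-up when only the accelerated part runs block_speedup times faster."""
    remaining_ms = total_ms - accelerated_ms
    return total_ms / (remaining_ms + accelerated_ms / block_speedup)


# Decoder only (figures from the profiling table)
print(round(speedup_bound(45.7, 43.0), 2))   # QPSK
print(round(speedup_bound(57.8, 49.1), 2))   # 1024 QAM
# Decoder plus soft demodulator
print(round(speedup_bound(57.8, 49.1 + 7.2), 2))
# Hypothetical 20x faster decoder, QPSK
print(round(amdahl_speedup(45.7, 43.0, 20.0), 2))
```

With 1024 QAM the bound for the decoder alone is much lower than with QPSK, which is consistent with the soft demodulator being next in line.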

Page 11:

A “complex” scenario in a parallel processing perspective

The algorithm mix in a typical telecom network PHY layer model is extremely complex

Execution-level parallelization work often requires algorithm re-engineering

Data-level parallelism is:

- either inherent (but always limited)

- or obtainable by treating larger input data sets

- don’t forget we are not running real-time stuff!

BUT, surely this is not an isolated case!

Page 12:

Inherently serial vs embarrassingly parallel algorithms

Simulation chain: Random Bits Source → Serial Turbo Code Encoder → QAM Modulator → AWGN Channel → Soft QAM Demod → Serial Turbo Code Decoder → BER Meter

Block annotations (complexity, load, parallel suitability):

- Simple, light, parallel unfriendly

- Simple, medium, parallel friendly

- Simple, medium, parallel friendly

- Complex, heavy, parallel unfriendly

- Simple, medium, inherently serial

- Complex, very heavy, poses challenges

From GTC 2010, among many other examples: deflation (highly parallel) and preconditioning (inherently serial) of conjugate gradient

Page 13:

Exploiting parallelism in a telecom system

[Figure: system blocks arranged along two axes, "Algorithm Intrinsic Parallelism" and "Data Level Parallelism", and grouped into "Parallel domain 1" and "Parallel domain 2"]

Page 14:

From A Naive...

PARALLEL

Page 15:

...to a more structured approach

The T-shaped approach is good to prove the feasibility of a GPU-based simulation platform for heterogeneous and (very) complex systems

But to what extent shall we push efforts to parallelize and optimize (in the “CUDA sense”) the implementation of a single block...

...rather than look at the overall simulation? Ultimately, what kind of research efforts are needed in order to sort out the challenges posed by a telecom system?

Page 16:

Need for newer abstraction perspectives in modeling?

So, possibly the most important lesson learned is that a different modeling strategy is necessary in order to...

...define methods and criteria to cope optimally with the different aspects of a fully parallel approach to the problem of simulating a telecommunication network (even if at just the PHY layer)

Page 17:

Serial turbo code overview

Encoder chain: Message Bits → Outer Conv. Coder (rate 1/2) → Outer Code Punct (6/8) → π → Inner Conv. Coder (rate 1/2) → Inner Code Punct → Coded Bits

General structure: OUTER CONVOLUTIONAL CODE C_OUT (rate k/p) → INTERLEAVER N → INNER CONVOLUTIONAL CODE C_IN (rate p/n)

SCCC (rate = k/n)
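The overall rate follows directly from the two constituent rates, because the p bits produced by the outer code for every k message bits are exactly the bits consumed by the inner code. The symbols R_out and R_in are introduced here for convenience, and the numeric example in the comment uses hypothetical rates that are not taken from the slides.

```latex
R_{\mathrm{SCCC}} = R_{\mathrm{out}} \cdot R_{\mathrm{in}}
                  = \frac{k}{p} \cdot \frac{p}{n}
                  = \frac{k}{n}
% Hypothetical example: R_out = 3/4 and R_in = 2/3 give R_SCCC = 1/2
```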

Page 18:

Serial turbo code: key points

Iterative Decoding of Turbo Codes

Accelerating the iterated functions leads to high speed-up

Accelerating the non-iterated functions removes memory bottlenecks

Functions involved: inner code puncturing, data permutation, constituent codes soft decoding, outer code puncturing

Page 19:

BCJR decoding algorithm: block diagram

Turbo code decoding algorithm based on Bahl, Cocke, Jelinek and Raviv (1974), updated by Berrou (1993) and Benedetto (1996)

Iterative algorithm:

- Minimizes the bit error probability

- Iterates until convergence is reached

Each iteration uses a double recursion to compute updated probabilities of each bit in the received FEC block, given the channel characteristics and the code structure.

Page 20:

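Decoder constituents: definitions

The INTERLEAVER $I$ maps the input sequence $x_i$ to the output sequence $y_i$, $y = I(x)$, under the causality assumption

SOFT DEMODULATOR: for each received symbol $y_k$, $L_k(c) \stackrel{\mathrm{def}}{=} p(y_k \mid c)$

Log-likelihood ratio: $\mathrm{LLR} \stackrel{\mathrm{def}}{=} \ln \dfrac{P(u_k = +1 \mid y)}{P(u_k = -1 \mid y)}$

Extrinsic information:

$\tilde{L}^{(0)}_{2k}(i) = 1$

$\tilde{L}^{(m)}_{1k}(i) = \sum_{u:\, u_k = i} L_1(c_1(u)) \prod_{l \neq k} \tilde{L}^{(m-1)}_{2l}(u_l)\, p_a(u_l)$

$\tilde{L}^{(m)}_{2k}(i) = \sum_{u:\, u_k = i} L_2(c_2(u)) \prod_{l \neq k} \tilde{L}^{(m)}_{1l}(u_l)\, p_a(u_l)$

Extrinsic information provided by decoder 1 is used as a-priori information by the other decoder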
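As a concrete instance of the log-likelihood ratio definition, the sketch below evaluates it for a case that is simple enough to check by hand. It assumes BPSK symbols (+1 and -1), equal a-priori probabilities and real AWGN with standard deviation `sigma`; these assumptions and the function names are illustrative and not part of the slides. Under them the ratio of posteriors equals the ratio of likelihoods and reduces to 2y/sigma^2.

```python
import math


def gaussian_pdf(y, mean, sigma):
    """Probability density of a real Gaussian with the given mean and standard deviation."""
    return math.exp(-((y - mean) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def llr_from_definition(y, sigma):
    """ln p(y | u=+1) / p(y | u=-1), which equals the posterior ratio for equal priors."""
    return math.log(gaussian_pdf(y, +1.0, sigma) / gaussian_pdf(y, -1.0, sigma))


def llr_closed_form(y, sigma):
    """Simplified form of the same ratio for BPSK over AWGN."""
    return 2.0 * y / sigma ** 2


print(llr_from_definition(0.3, 0.8), llr_closed_form(0.3, 0.8))
```

For larger constellations the demodulator has to combine many such terms per bit, which is one reason its cost grows with the modulation order.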

Page 21:

Decoding algorithm details: SISO block

(A)-SISO ports: inputs $P_k(c; I)$ and $P_k(u; I)$, outputs $P_k(c; O)$ and $P_k(u; O)$

The core is the Soft-Input Soft-Output (SISO) decoder

- based on a double recursion

- Log-domain formulation (log-MAP algorithm) used

- lower computation complexity!

Key steps:

- branch metrics computation

- forward and backward recursions

- soft output computation

All these operations are performed on the code trellis

(1st SISO) $P_k(u; I) = \tilde{L}^{(n-1)}_{2k}(u)\, p_a(u)$, $P_k(c; I) = L_{1k}(c_1)$

(2nd SISO) $P_k(u; I) = \tilde{L}^{(n)}_{1k}(u)\, p_a(u)$ (after permutation), $P_k(c; I) = L_{2k}(c_2)$

A posteriori extrinsic: $P_k(u; O)$ of SISO 1 gives $\tilde{L}^{(n)}_{1k}(u)$, $P_k(u; O)$ of SISO 2 gives $\tilde{L}^{(n)}_{2k}(u)$

Page 22:

SISO decoding: double recursions

SISO decoding on the code trellis

1- branch metrics (normed distance of the received symbol from every possible code alphabet symbol) are computed as in Viterbi decoding

2- forward path metrics are computed recursively from the branch metrics, moving forward along the trellis

3- backward path metrics are computed recursively from the branch metrics, moving backward along the trellis

4- soft output is computed by combining the branch, forward and backward metrics
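The four steps can be sketched in a few lines of Python. This is a minimal log-MAP SISO under assumptions that are not in the slides: a 4-state feedforward (7,5) convolutional code, an unterminated trellis, and the convention LLR = ln P(bit=0)/P(bit=1), i.e. bit 0 mapped to +1. The names `max_star`, `build_trellis`, `encode` and `siso_decode` are illustrative and do not describe the authors' implementation.

```python
import math

NEG_INF = float("-inf")


def max_star(a, b):
    """Jacobian logarithm log(exp(a) + exp(b)), the core operation of log-MAP."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def build_trellis():
    """Return next_state[s][u] and outputs[s][u] for the (7,5) octal code."""
    next_state, outputs = [], []
    for s in range(4):
        s1, s0 = (s >> 1) & 1, s & 1  # s1 is the most recent input bit
        next_state.append([(u << 1) | s1 for u in (0, 1)])
        outputs.append([(u ^ s1 ^ s0, u ^ s0) for u in (0, 1)])
    return next_state, outputs


def encode(bits):
    """Encode from the all-zero state; returns one (c0, c1) pair per input bit."""
    next_state, outputs = build_trellis()
    state, coded = 0, []
    for u in bits:
        coded.append(outputs[state][u])
        state = next_state[state][u]
    return coded


def siso_decode(code_llrs, prior_llrs=None):
    """Return the a-posteriori LLR of every information bit."""
    next_state, outputs = build_trellis()
    n = len(code_llrs)
    prior = prior_llrs if prior_llrs is not None else [0.0] * n

    def metric(bit, llr):  # log-probability of the bit value, up to a constant
        return 0.5 * llr if bit == 0 else -0.5 * llr

    # Step 1: branch metrics, independent for every trellis step k
    gamma = [[[metric(u, prior[k])
               + metric(outputs[s][u][0], code_llrs[k][0])
               + metric(outputs[s][u][1], code_llrs[k][1])
               for u in (0, 1)] for s in range(4)] for k in range(n)]
    # Step 2: forward recursion, step k+1 needs step k
    alpha = [[NEG_INF] * 4 for _ in range(n + 1)]
    alpha[0][0] = 0.0
    for k in range(n):
        for s in range(4):
            for u in (0, 1):
                t = next_state[s][u]
                alpha[k + 1][t] = max_star(alpha[k + 1][t], alpha[k][s] + gamma[k][s][u])
    # Step 3: backward recursion, step k needs step k+1 (all end states allowed)
    beta = [[NEG_INF] * 4 for _ in range(n + 1)]
    beta[n] = [0.0] * 4
    for k in range(n - 1, -1, -1):
        for s in range(4):
            for u in (0, 1):
                beta[k][s] = max_star(beta[k][s], gamma[k][s][u] + beta[k + 1][next_state[s][u]])
    # Step 4: soft output, again independent for every k
    llr_out = []
    for k in range(n):
        num = den = NEG_INF
        for s in range(4):
            for u in (0, 1):
                m = alpha[k][s] + gamma[k][s][u] + beta[k + 1][next_state[s][u]]
                if u == 0:
                    num = max_star(num, m)
                else:
                    den = max_star(den, m)
        llr_out.append(num - den)
    return llr_out
```

Steps 1 and 4 have no dependence between trellis steps, whereas each row of `alpha` needs the previous one and each row of `beta` needs the next one. Subtracting the a-priori LLR from the returned values would give the extrinsic part for this non-systematic code.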

Page 23:

Decoding iterations (turbo code generic scheme)

[Block diagram: SISO 1 and SISO 2 connected through the interleaver $I$ and the deinterleaver $I^{-1}$, followed by a decision block. SISO 1 takes $P_k(c; I) = L_{1k}(c_1)$ and the a priori LLR $P_k(u; I) = \tilde{L}^{(n-1)}_{2k}(u)\, p_a(u)$; SISO 2 takes $P_k(c; I) = L_{2k}(c_2)$ and the permuted $P_k(u; I) = \tilde{L}^{(n)}_{1k}(u)\, p_a(u)$; two ports are marked "not used"]

A single decoding iteration involves

- two SISO decoding operations (on inner and outer code)

- two soft bits permutations (direct and reverse)

Iterations are repeated until convergence, or until a limit value is reached

Page 24:

turbo decoding parallel algorithm

Parallel-friendly sections:

- permutation (direct and reverse)

  - memory access problems must be taken into account!

- SISO branch metrics computation

- soft output computation

Inherently serial sections:

- forward and backward recursions

  - algorithmic reformulation/re-engineering needed!
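To illustrate why the permutation counts as parallel friendly, here is a minimal Python sketch, not taken from the presentation. Every output element depends on exactly one input element, so each index could be handled by its own GPU thread. Precomputing the inverse table (the hypothetical helper `invert_permutation`) lets the reverse permutation use the same gather pattern, with scattered reads and contiguous writes; whether the authors' kernels do this is not stated on the slide.

```python
def invert_permutation(perm):
    """Return inv such that inv[perm[i]] == i for every i."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv


def gather(values, index):
    """out[i] = values[index[i]]; each i is independent of all the others."""
    return [values[j] for j in index]
```

With `inv = invert_permutation(perm)`, the direct permutation is `gather(x, perm)` and the reverse one is `gather(y, inv)`.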

decoding bottleneck: double recursion

SISO decoding bottleneck

- forward path metrics computation: recursive sum over "past" trellis steps

- backward path metrics computation: recursive sum over "future" trellis steps
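The serial dependency behind the double recursion can be seen in a small Python sketch. It assumes the max-log approximation (the sum over predecessor states is replaced by a maximum) and a hypothetical layout `gamma[k][s_prev][s_next]` for the branch metrics; neither detail comes from the slides.

```python
def forward(gamma, alpha0):
    """Forward path metrics: alpha[k+1] depends on alpha[k] ("past" steps)."""
    n = len(alpha0)
    alpha = [list(alpha0)]
    for g in gamma:                       # step k cannot start before step k-1 ends
        prev = alpha[-1]
        alpha.append([max(prev[sp] + g[sp][s] for sp in range(n))
                      for s in range(n)])
    return alpha


def backward(gamma, beta_end):
    """Backward path metrics: beta[k] depends on beta[k+1] ("future" steps)."""
    n = len(beta_end)
    beta = [list(beta_end)]
    for g in reversed(gamma):             # walks the trellis from the end
        nxt = beta[0]
        beta.insert(0, [max(g[s][sn] + nxt[sn] for sn in range(n))
                        for s in range(n)])
    return beta
```

Inside one trellis step the states can be updated in parallel, but the loop over steps is a chain: this is the part that needs algorithmic re-engineering before it maps well onto a GPU.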

breaking recursions

Recursions can be split over N windows

- minimum window size is FEC dependent

- window boundary values are taken from the previous iteration

Parallel-friendly algorithmic re-engineering

- side effect: convergence is slowed down

- more iterations are required

[Figure: the input codewords vector is split into windows 0, 1, ..., k-1, k, ..., N-1. Each window runs its own FORWARD and BACKWARD recursion. At iteration 1 the window boundaries are initialized (INIT); at iteration 2 they start from the forward and backward recursion state metric distributions produced by iteration 1.]
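A minimal Python sketch of the windowing idea follows, restricted to the forward recursion and using the max-log form as an assumption. The function names and the data layout are hypothetical. Each window starts from the boundary value its left neighbour produced in the previous iteration, so all windows can run concurrently, at the price of boundary information advancing only one window per iteration.

```python
def forward_step(alpha, g):
    """One max-log trellis step; g[s_prev][s_next] is the branch metric."""
    n = len(alpha)
    return [max(alpha[sp] + g[sp][s] for sp in range(n)) for s in range(n)]


def full_forward(gamma, alpha0):
    """Reference: unsplit forward recursion over the whole block."""
    a = list(alpha0)
    for g in gamma:
        a = forward_step(a, g)
    return a


def windowed_iteration(gamma, window, starts):
    """Run every window from its own start metrics; return the end metrics.

    The loop over windows has no cross-window dependency, so on a GPU each
    window could be processed by a different thread.
    """
    ends = []
    for w, start in enumerate(starts):
        a = list(start)
        for g in gamma[w * window:(w + 1) * window]:
            a = forward_step(a, g)
        ends.append(a)
    return ends


def next_starts(alpha0, ends):
    """Boundary values for the next iteration: window w starts where w-1 ended."""
    return [list(alpha0)] + ends[:-1]
```

With fixed branch metrics, the last window matches the unsplit recursion only after as many iterations as there are windows, which is a simplified picture of the slower convergence noted above. In a real decoder the branch metrics also change from one iteration to the next.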

BCJR algorithm implemented - iterations

cudaMonolithicBCJR_I <<< B, T >>>
    (Cod_cu, Inf_cu, OInf_cu, AlphaWinMem_I_cu, BetaWinMem_I_cu);

cuda_Deinterleaver <<< blocksInter, threadsInter >>>
    (Permutation_cu, InfOInner_cu, Ext_cu);

cudaO_Depunct <<< blocksPunct, threadsPunct >>>
    (Ext_cu, OCod_cu);

cudaMonolithicBCJR_O <<< B, T >>>
    (Cod_cu, Punctarray_cu, OInf_cu, OCod_cu, OPunct_cu, AlphaWinMem_O_cu,
     BetaWinMem_O_cu, StopOrGo_cu);

cuda_Punct_Interleaver_StopRule <<< blocksInter, threadsInter >>>
    (Permutation_cu, Ext_cu, OPunct_cu, Inf_cu, Stopping_cu);

Callouts:

- Smaller grid, time-intensive kernel (huge data vectors exchanged with host)

- Larger grid, lean kernel

- Larger grid, lean kernel (no host-device data exchange)

- Smaller grid, time-intensive kernel

- Main data interface with host; relatively large grid
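For readers unfamiliar with the puncturing steps named in the kernel list, here is a generic Python sketch of what puncturing and depuncturing do. It is not the authors' implementation: the periodic 0/1 `pattern` and the function names are hypothetical, and the pattern and memory layout used by the actual kernels are not shown on the slide.

```python
def puncture(values, pattern):
    """Keep values[i] only where the periodic pattern holds a 1."""
    return [v for i, v in enumerate(values) if pattern[i % len(pattern)]]


def depuncture(received, pattern, length):
    """Rebuild a vector of `length` LLRs.

    Punctured positions get 0.0, i.e. "no information" in the LLR domain.
    """
    it = iter(received)
    return [next(it) if pattern[i % len(pattern)] else 0.0
            for i in range(length)]
```

Like the permutation, both operations touch each element independently, which fits the "lean kernel" description in the callouts.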

Sim Architecture – baseline evolution

[Figure: two block diagrams of the simulation chain, each with the blocks input, enc, mod, chan, demod and dec, split into a TX and an RX side; every block is marked "looped execution". In one diagram the input is drawn as a vector of frames 0, 1, 2, ..., N-3, N-2, N-1.]

Data-parallel "unfriendly" model: CPU baseline architecture

"Very large input vector" means that a lot of input frames are processed at the same time. Only parallel architectures allow this kind of processing for telecom systems!

Sim Architecture – cuda

[Figure: CUDA simulation architecture. A very large input vector (frames 0, 1, 2, ..., N-3, N-2, N-1) feeds the blocks enc, mod, chan and demod, each marked "looped execution"; a side label reads "CPU based blocks". The decoder is labeled "BCJR - GPU grids" and contains I_SISO, Deinter + Depunct, O_SISO and Inter + Punct.]

The “monolithic kernel” example

extern "C"
__global__
void cudaMonolithicBCJR_I
(int* Cod_cu, int* Inf_cu, int* OInf_cu, int* AlphaWinMem_I_cu, int* BetaWinMem_I_cu)
{
    int vector, window, p, i, j, jj, k, m, base = blockDim.x*blockIdx.x + threadIdx.x;

    […]

    // Load four state metrics from the beta window memory
    // (presumably the boundary values of the backward recursion).
    mP = &BetaWinMem_I_cu[m+EIGHT];
    tempState3 = *--mP;
    tempState2 = *--mP;
    tempState1 = *--mP;
    tempState0 = *--mP;

    // Loop walking the window downwards (k decreases to 0).
    for(jj = (vector + disp*TRELLIS_MEM_LENGTH__TIMES__TWO),
        p=vector, k=window; k>0; p-=N_STATES, k-=EIGHT, jj-=N_STATES)
    {
        […]
    }

    // Store the updated state metrics back to the beta window memory.
    *--mP = tempState3;
    *--mP = tempState2;
    *--mP = tempState1;
    *--mP = tempState0;

    // Same load / loop / store pattern on the alpha window memory,
    // this time walking the window upwards.
    mP = &AlphaWinMem_I_cu[m];
    tempState0 = *mP++;
    tempState1 = *mP++;
    tempState2 = *mP++;
    tempState3 = *mP++;

    vector = base >> 2;

    for(p=vector, k=0; k<WINDOW_SIZE__TIMES__N_STATES; p+=INPUTS, k+=EIGHT)
    {
        […]
    }

    *mP++ = tempState0;
    *mP++ = tempState1;
    *mP++ = tempState2;
    *mP = tempState3;
}

15 or 18μs in there! (≈30% of GPU time for a single decoding iteration)

What Visual Profiler said: (ipse dixit...)

1. This kernel is most probably compute-bound

2. Global memory accesses have large margins for improvement (as we already suspected...)

3. GPU computational resources should be better used (occupancy issues)

We do have potential for performance improvements, but: should we stick with this CUDA architecture, or try to reorganize the data structures and the computation?

Simulations: BER results in two corner cases

[Figure: two plots of BER versus coded Eb/No on an AWGN channel. "BER 1024 QAM AWGN": BER axis from 1.00E-10 to 1.00E-02, Eb/No axis from 22.50 to 25.50. "BER QPSK AWGN": BER axis from 1.00E-09 to 1.00E-01, Eb/No axis from 4 to 6.]

Simulation performance comparison

Execution Time: Whole Simulation @ BER = 10^-6

              CPU [s]    CUDA accelerated [s]    Speed-up factor
QPSK          3070       190                     16.17
1024 QAM      3601       305                     11.82

Execution Time: Decoder @ BER = 10^-6

              CPU [ms]   CUDA accelerated [ms]   Speed-up factor
QPSK          51         2.021                   25.24
1024 QAM      62         1.9552                  31.71

CPU: Intel Xeon X5690 3.47GHz; 12GB RAM

GPU: NVIDIA Tesla C2050
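The speed-up factors on this slide can be cross-checked with a few lines of Python. The pairing of each chart value with a modulation and with CPU or CUDA is an assumption read off the chart labels, and the dictionary names are made up for this sketch. Because the displayed times are rounded, the recomputed factors can differ from the slide's figures in the last digit.

```python
def speed_up(cpu_time, gpu_time):
    """Speed-up factor = CPU time / CUDA-accelerated time (same unit)."""
    return cpu_time / gpu_time


# (CPU, CUDA accelerated) pairs as read from the charts.
whole_simulation_s = {"QPSK": (3070, 190), "1024 QAM": (3601, 305)}
decoder_ms = {"QPSK": (51, 2.021), "1024 QAM": (62, 1.9552)}


def speed_ups(table):
    """Return {modulation: speed-up rounded to two decimals}."""
    return {name: round(speed_up(cpu, gpu), 2)
            for name, (cpu, gpu) in table.items()}
```

Calling `speed_ups(whole_simulation_s)` and `speed_ups(decoder_ms)` reproduces the reported factors to within rounding.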

A glance into next challenges

[Figure: normalized execution time (axis from 1 to 1.7) versus frame parallel factor (axis from 0 to 50).]

Execution Time: Single Decoder Iteration @ BER = 10^-6

              CPU [ms]   CUDA accelerated [ms]   Speed-up factor
QPSK          17814      198                     89.97
1024 QAM      17072      194                     88.00
