
Processing in Memory

Advanced Seminar Computer Engineering

ZITI CAG – University of Heidelberg

Felix Kaiser

6.2.2017

2/6/17 Advanced Seminar – Processing in Memory

Processing in Memory

"Processing in Memory" can mean almost anything


Outline

Computation walls

Near-data processing

Processing-in-memory systems

Results and challenges

Conclusion


Walls

Peak FLOPS per socket increasing at 50%-60% per year

Memory bandwidth increasing at 23% per year

Memory latency increasing at 4% per year

Interconnect bandwidth increasing at 20% per year

Interconnect latency decreasing at 20% per year
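These rates compound year over year. A minimal Python sketch, assuming (as a simplification) that each rate stays constant, shows how quickly peak compute outgrows memory bandwidth; the function and constant names are hypothetical.

```python
# Minimal sketch: compound the yearly growth rates from the "Walls" slide,
# assuming (as a simplification) that each rate stays constant every year.

MEMORY_BW_RATE = 0.23  # memory bandwidth growth per year, from the slide


def growth(rate_per_year: float, years: int) -> float:
    """Total growth factor after `years` of compounding at `rate_per_year`."""
    return (1.0 + rate_per_year) ** years


def flops_per_byte_gap(flops_rate: float, bw_rate: float, years: int) -> float:
    """Factor by which peak FLOPS outgrows memory bandwidth after `years`."""
    return growth(flops_rate, years) / growth(bw_rate, years)


if __name__ == "__main__":
    for flops_rate in (0.50, 0.60):  # the 50%-60% range quoted for peak FLOPS
        for years in (1, 5, 10):
            gap = flops_per_byte_gap(flops_rate, MEMORY_BW_RATE, years)
            print(f"FLOPS +{flops_rate:.0%}/yr vs. bandwidth +23%/yr, "
                  f"{years:2d} years: {gap:5.1f}x more FLOPS per byte")
```

Even at the lower rate, peak compute gains roughly 7x on memory bandwidth within ten years (close to 14x at 60%), which is the gap near-data processing aims to close.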


Taxonomy of Near-Data Processing (NDP)

Computing in Caches

Near-Memory Processing/Processing-in-Memory

Processing in Flash

Computing on Disk

Intelligent Network


Subsets of PIM

In front of the sense amplifier (with reservations)

Between sense amplifier and column decoder

Memory embedded into the pipeline of a processor

In front of a bus/crossbar switch

In a 3D stack:

• In each vault

• Processing dies in the memory stack


Subsets of PIM

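Figure: a sense amplifier (with its enable signal) and three DRAM cells A, B and C on the same bitline

When the rows holding A, B and C are activated at the same time, charge sharing pulls the bitline toward the majority value, and the sense amplifier amplifies it and writes it back into all three cells:

Final state = AB + BC + AC = C(A + B) + ~C(AB)

Presetting C to 0 therefore computes A AND B; presetting C to 1 computes A OR B.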
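The sense-amplifier slide shows that three cells A, B and C sharing a bitline, activated together, all end at the majority AB + BC + AC, so a control row C turns the array into an AND gate (C = 0) or an OR gate (C = 1). This mechanism is commonly called triple-row activation. The Python sketch below models only the logical result, not the analog charge sharing; the row width and all function names are hypothetical.

```python
# Hypothetical sketch of triple-row activation: three DRAM rows A, B, C share
# bitlines, and after simultaneous activation every cell holds the bitwise
# majority of the three. Rows are modeled as Python ints (one bit per
# bitline); only the logic is modeled, not the analog charge sharing.

ROW_BITS = 8                      # assumed row width, chosen for readability
ALL_ONES = (1 << ROW_BITS) - 1    # control row preset to all ones


def triple_row_activate(a: int, b: int, c: int) -> int:
    """Bitwise majority AB + BC + AC: the value all three rows end up holding."""
    return (a & b) | (b & c) | (a & c)


def dram_and(a: int, b: int) -> int:
    # Control row C preset to all zeros: majority(A, B, 0) = A AND B
    return triple_row_activate(a, b, 0)


def dram_or(a: int, b: int) -> int:
    # Control row C preset to all ones: majority(A, B, 1) = A OR B
    return triple_row_activate(a, b, ALL_ONES)


if __name__ == "__main__":
    a, b = 0b1100_1010, 0b1010_0110
    print(f"A AND B = {dram_and(a, b):08b}")
    print(f"A OR  B = {dram_or(a, b):08b}")
```

Because every bitline computes its own majority, one activation processes a whole row in parallel; in the sketch a single integer operation stands in for that row-wide step.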


Computing near Memory


DEEP-ER Network Attached Memory (NAM)

The HMC (3D-stacked memory) and the FPGA are connected by wide data paths

Logic in the FPGA can compute on the data while using the full memory bandwidth

Data can be written back to the HMC or sent to the host
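The point of placing an FPGA next to the HMC is that data need not cross the network before it is processed. A minimal Python sketch of that trade-off, assuming a simple sum reduction over 8-byte values; the class and function names are hypothetical and are not part of any DEEP-ER software interface.

```python
# Hypothetical model (not a DEEP-ER API): compare bytes that cross the
# network when the host pulls raw data versus when logic next to the memory
# (e.g. an FPGA beside an HMC) reduces the data and sends only the result.

from dataclasses import dataclass


@dataclass
class Transfer:
    bytes_moved: int  # bytes sent over the network to the host
    result: float     # value the host ends up with


def host_side_sum(values: list[float], bytes_per_value: int = 8) -> Transfer:
    # Every value travels over the network before the host adds it up.
    return Transfer(len(values) * bytes_per_value, sum(values))


def near_memory_sum(values: list[float], bytes_per_value: int = 8) -> Transfer:
    # The reduction runs next to memory at full local bandwidth;
    # only the single result is sent to the host.
    return Transfer(bytes_per_value, sum(values))


if __name__ == "__main__":
    data = [float(i) for i in range(1_000_000)]
    host = host_side_sum(data)
    near = near_memory_sum(data)
    assert host.result == near.result
    print(f"host-side:   {host.bytes_moved:>9} bytes over the network")
    print(f"near-memory: {near.bytes_moved:>9} bytes over the network")
```

For a million values the host-side version moves 8,000,000 bytes and the near-memory version moves 8; real savings depend on how much the computation shrinks the data.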


Stacking

Idea:

Stack components that are traditionally placed side by side on PCBs

Memory stacking in particular can be worthwhile:

• Higher density

• More capacity

• Lower power consumption


Subsets of PIM

In front of the sense amplifier (with reservations)

Between sense amplifier and column decoder

Memory embedded into the pipeline of a general-purpose processor (GPP)

In front of a bus/crossbar switch

In a 3D stack:

• In each vault

• Processing dies between memory dies


Simulation System

Host:

• 2 Cortex-A15 cores at 2 GHz

Memory:

• Hybrid Memory Cube (HMC)

• 512 MB

• 16 vaults

• 4 stacked dies

PIM:

• Similar to the host, with voltage and frequency scaling
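The memory configuration above implies how much capacity each vault, and thus a PIM unit placed in each vault, can reach locally. A small Python sketch, assuming capacity is split evenly across vaults and dies; in HMC terminology one vault's slice of a single die is called a partition. The function names are hypothetical.

```python
# Hypothetical sketch: split the simulated HMC's capacity (512 MB, 16 vaults,
# 4 stacked dies, as listed on the slide) into per-vault and per-partition
# shares, assuming capacity is distributed evenly.

CAPACITY_MB = 512
VAULTS = 16
DIES = 4


def per_vault_mb(capacity_mb: int, vaults: int) -> float:
    # Each vault is a vertical column through all dies of the stack.
    return capacity_mb / vaults


def per_partition_mb(capacity_mb: int, vaults: int, dies: int) -> float:
    # A partition is one vault's slice of a single DRAM die.
    return capacity_mb / (vaults * dies)


if __name__ == "__main__":
    print(f"{per_vault_mb(CAPACITY_MB, VAULTS):.0f} MB per vault")
    print(f"{per_partition_mb(CAPACITY_MB, VAULTS, DIES):.0f} MB per partition")
```

With the listed values this gives 32 MB per vault and 8 MB per partition.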


Results


Heat Problems with 3D-Stacking

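No actual implementations of processing logic in 3D memory stacks could be found

One of the biggest problems is heat: processing logic inside the stack dissipates power right next to the temperature-sensitive DRAM dies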
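One concrete effect of heat is on refresh: DRAM cells leak charge faster when hot, so they must be refreshed more often, costing bandwidth and energy. A minimal Python sketch, assuming the JEDEC DDR3/DDR4 convention of a 64 ms refresh window up to 85 °C and a halved 32 ms window in the extended range up to 95 °C; HMC devices may specify different limits, and the function names are hypothetical.

```python
# Hypothetical sketch: refresh window versus temperature, assuming the JEDEC
# DDR3/DDR4 convention (64 ms up to 85 °C, 32 ms up to 95 °C). HMC parts
# may use different limits; temperatures above 95 °C are rejected here.


def refresh_window_ms(temp_c: float) -> float:
    """Time within which every row must be refreshed at temperature `temp_c`."""
    if temp_c <= 85.0:
        return 64.0
    if temp_c <= 95.0:
        return 32.0
    raise ValueError("outside the assumed operating range (> 95 °C)")


def refresh_overhead_factor(temp_c: float) -> float:
    """How many times more refresh work is needed than at normal temperature."""
    return refresh_window_ms(25.0) / refresh_window_ms(temp_c)


if __name__ == "__main__":
    for t in (45.0, 85.0, 90.0, 95.0):
        print(f"{t:5.1f} °C: refresh every {refresh_window_ms(t):.0f} ms "
              f"({refresh_overhead_factor(t):.0f}x the refresh work at 25 °C)")
```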


Conclusion

NDP is a sensible way to overcome the walls

Traditional PIM approaches have problems

• Reason: DRAM and logic fabrication processes differ too much

The bus/crossbar approach looks realistic

3D-stacking approaches seem promising

But:

• Not verified (only simulations)

• Potential heat problems


