
AI in IoT at the Edge: MRAM’s Golden Opportunity

Andy Walker

October 2019


Spin Memory Corporate Overview

• US Company based in Fremont, CA
• Strong Corporate Partners
  - Arm
  - Applied Materials
  - www.spinmemory.com/spin-memory-announces-52-million-series-b-funding-round/
• Highly differentiated MRAM IP and Expertise
  - Design Techniques and MRAM Management
  - 10+ Years MRAM Design Expertise
  - Magnetics / MTJ technology
  - Selector/Process Expertise
• Complete MRAM Teams
  - Magnetics/Physics
  - Device Fabrication
  - CMOS Design
  - Test & Reliability Engineering
• 200mm MRAM Prototype Line, Spin Technology Center - Fremont, CA

Arm Tech Con 2019 – © Spin Memory, Inc. October 2019



Contents

• AI, IoT and Colossal Energy Demand
• Physics of Charging and Discharging Capacitors
• Relevance to Integrated Circuits, Systems and AI
• The Main Hog – Energy Demand in Memory
• What Can We Do About This for AI?
  - But First – What is STT-MRAM?
  - Minimize Static Energy Loss/Maximize On-Chip Memory Capacity
  - The Engine for Energy Efficiency
    • Application-Targeted MRAM for SoCs / MRAM Macros / Markets for SRAM-like MRAM
  - The Selector
  - Fault Tolerance and Voltage Manipulation
• MRAM in AI



IoT, AI and Colossal Energy Demand

• Energy efficiency is a key constraint for IoT and AI-in-IoT proliferation
  - 1T IoT devices by 2035 (1)
  - Large growth in AI-at-the-Edge
  - Widespread innovations in IoT power sources
  - What can be done at the silicon level?

(1) P. Sparks, “The Route to a Trillion Devices”, Arm White Paper, June 2017
(2) “AI Chip Architectures Race to the Edge”, Semiconductor Engineering, Nov. 2018
(3) A. Raj et al., J. Electrochemical Soc., 2018

[Figures taken from references (1) (Arm), (2) and (3)]


IoT, AI and Colossal Energy Demand

• Energy efficiency is a key constraint for AI proliferation
  - Training a single AI model can emit as much CO2 as five cars in their lifetimes (1)
  - AI data centers to consume > 10% of world energy capacity by 2025 (1)

(1) G. Dickerson, Applied Materials AI Design Forum, July 9 2019, and E. Strubell et al., “Energy and Policy Considerations for Deep Learning in NLP”, arXiv.org, June 5 2019



The Main Hog – Energy Demand in Memory

• Fetching/storing data in solid state memory uses ≳ 60% of system energy (1)
  - On-chip SRAM access ~ 10X energy of CPU data manipulation
  - Off-chip DRAM access ~ 1000X energy of CPU data manipulation
• On-chip SRAM has fundamental leakage – wasted energy (2) (3)

(1) A. Boroumand et al., “Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks”, ASPLOS’18, March 2018

(2) A. Pedram et al., “Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era”, IEEE Design & Test, vol.34, April 2017

(3) Simulations/estimates from 7nm transistor data in IEDM 2017
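To make the 10X and 1000X ratios concrete, here is a minimal sketch of how quickly memory traffic can dominate an energy budget. Energy is counted in units of one CPU data manipulation; the two ratios come from the slide, while the workload mix (`cpu_ops`, `sram_accesses`, `dram_accesses`) is a hypothetical example, not data from the talk.

```python
# Relative energy costs, in units of one CPU data manipulation.
CPU_OP = 1.0
SRAM_ACCESS = 10.0     # ~10X a CPU operation (ratio quoted on the slide)
DRAM_ACCESS = 1000.0   # ~1000X a CPU operation (ratio quoted on the slide)


def memory_energy_share(cpu_ops, sram_accesses, dram_accesses):
    """Return the fraction of total energy spent on memory accesses."""
    memory = sram_accesses * SRAM_ACCESS + dram_accesses * DRAM_ACCESS
    total = cpu_ops * CPU_OP + memory
    return memory / total


# Hypothetical workload mix (assumed for illustration only).
share = memory_energy_share(cpu_ops=1000, sram_accesses=100, dram_accesses=1)
print(f"memory share of energy: {share:.0%}")
```

In this assumed mix, a single off-chip DRAM access costs as much as all 1000 CPU operations combined, which is the intuition behind keeping data on-chip.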



Physics of Charging and Discharging Capacitors

• Charging
  - Total energy stored in capacitor = $\frac{1}{2} C V_b^2$
  - Total energy converted into heat = $\frac{1}{2} C V_b^2$
• Discharging
  - Total energy stored in capacitor = $\frac{1}{2} C V_0^2$
  - Total energy converted into heat = $\frac{1}{2} C V_0^2$
• During charging half** of electrical energy converted into heat and half** stored on capacitor
• During discharging all** of stored capacitor energy converted into heat

** About right since does not count EM radiation

[Figures: charging and discharging circuits*]

* http://hyperphysics.phy-astr.gsu.edu
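The half-and-half split during charging can be checked numerically. The sketch below integrates the resistor loss $i^2 R$ over an RC charging transient and compares it with $\frac{1}{2} C V_b^2$. The component values (1 fF, 1 V, and the two resistances) are assumptions chosen for illustration; the point is that the heat does not depend on R.

```python
import math


def charging_energies(C, R, Vb, steps=100_000, n_tau=20.0):
    """Integrate an RC charging transient; return (stored, heat) in joules."""
    tau = R * C
    dt = n_tau * tau / steps
    heat = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                 # midpoint rule
        i = (Vb / R) * math.exp(-t / tau)  # charging current at time t
        heat += i * i * R * dt             # energy dissipated in the resistor
    v_final = Vb * (1.0 - math.exp(-n_tau))  # capacitor voltage after n_tau time constants
    stored = 0.5 * C * v_final ** 2
    return stored, heat


C, Vb = 1e-15, 1.0  # assumed values: 1 fF capacitor, 1 V supply
for R in (1e3, 1e6):  # two very different (assumed) resistances
    stored, heat = charging_energies(C, R, Vb)
    print(f"R={R:.0e} ohm  stored={stored:.4e} J  heat={heat:.4e} J")
```

Both resistances give the same dissipated energy, so lowering wire resistance alone does not reduce the heat generated per charging event; only C or V does.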




Relevance to Integrated Circuits, Systems and AI

• Any IC and system is an electrical power supply and a network of capacitors and resistors
• Data movements require charging and discharging of wires
• Wires are capacitors with $C \propto L_{\mathrm{wire}}$
• Energy conversion into heat $\propto L_{\mathrm{wire}}$
• Long wires between data in on-chip SRAM and processor
• Longest wires between data in off-chip DRAM and processor
• Most energy conversion into heat takes place in data transactions with memory
• AI requires extremely intensive store and recall between processor and memory
• Memory requirements pose a huge challenge in energy efficiency for deep learning models
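The proportionality between wire length and heat can be sketched in a few lines. The capacitance per unit length (`cap_per_um`, 0.2 fF per micron), the supply voltage and the two wire lengths are assumed illustrative values, not figures from the talk; only the linear scaling with length is the point.

```python
def wire_switch_energy(length_um, volts, cap_per_um=0.2e-15):
    """Energy (J) turned into heat by one charge/discharge cycle of a wire.

    cap_per_um is an assumed, illustrative capacitance per micron of wire.
    """
    capacitance = cap_per_um * length_um
    # Half of C*V^2 is lost while charging, the other half while discharging.
    return capacitance * volts ** 2


# Hypothetical wire lengths in microns (assumed for illustration).
for label, length_um in [("short local wire", 10), ("long cross-chip wire", 5000)]:
    energy = wire_switch_energy(length_um, volts=0.8)
    print(f"{label:22s} {energy:.3e} J per cycle")
```

A wire 500 times longer dissipates 500 times more energy per transition at the same voltage, which is why moving data closer to the processor pays off directly.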



What Can We Do About This for AI?

• Domain Specific Architectures for AI accelerator chips
• Algorithms that minimize data flow interactions with off-chip DRAM
• Package solutions that minimize inter-chip impedances (capacitances)
• In-memory compute
• Near memory compute
• Principle of locality in time and space – cache structure and control
• Data compression to minimize weight populations
• Reduced precision arithmetic
• Maximize stand-alone main memory single chip capacity
• Minimize static energy loss
• Maximize on-chip memory capacity
• Fault tolerance and voltage manipulation

What is Spin doing about this?



Minimize Static Energy Loss/Maximize On-Chip Memory Capacity

Spin’s Solution: ENGINE
- What it is: Circuit tuned to the physics of the magnetic element
- What it does: Allows MRAM to behave as RAM-like (high endurance and symmetric R/W)
- Importance: High density, ~zero leakage, and persistent memory

Spin’s Solution: SELECTOR
- What it is: Manufacturable, scalable semiconductor device using existing Fab tools, materials and switching mechanisms
- What it does: Allows dramatic shrinks of MRAM cells (≲10F²) to enable SRAM replacement and persistent (e)DRAM
- Importance:
  • Very high density, ~zero leakage and persistent memory for embedded and stand-alone solutions
  • Reduce transactions with off-chip DRAM for dramatic energy efficiency as embedded memory
  • Low energy Storage Class Memory



The Engine for Energy Efficiency
Enables RAM-like Performance with Energy Efficient MRAM

• Engine allows reduced electrical stress
  - Results in large endurance increase (5 – 6 orders)
• Engine deals with resultant WER increase
  - Managed transparently to the user
  - No change in latency
  - Allows for faster pulses at high endurance
  - Symmetric Read/Write

[Figure: Log(Write Error Rate) vs. MTJ Voltage (V); annotations: reduced electrical stress, increased WER, ENGINE on-chip]



Application-Targeted MRAM Design for SoCs

[Figure: MTJ design space; axes: NVM Retention (up to 10+ years) vs. Speed & Endurance; regions: Foundry NVM MTJ, Foundry SRAM MTJ]

• eNVM: eFlash replacement
  - 25ns Rd / 50-500ns Wrt
  - 10^6 to 10^8 cycles
  - >10 years retention
• SRAM: SRAM Replacement (LLC, AI, DDI, many others)
  - 10-15ns R/W
  - >10^13 cycles
  - Days-months retention
• HS&E NVM: High Speed & Endurance NVM (IoT, Edge AI)
  - 25-50ns R/W
  - >10^11 cycles
  - >10 years retention
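To see why the endurance figures separate these three targets, the following sketch converts an endurance rating into a worst-case time to wear-out. It assumes the same cell is rewritten continuously at a fixed rate with no wear leveling; the write rate of 1000 writes per second is a hypothetical value, and the endurance numbers are the bounds quoted above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_wearout(endurance_cycles, writes_per_second):
    """Worst-case lifetime in years of one cell rewritten at a constant rate."""
    return endurance_cycles / writes_per_second / SECONDS_PER_YEAR


# Endurance bounds from the slide; the write rate is an assumed example.
WRITE_RATE = 1e3  # writes per second to the same cell (hypothetical)
for name, cycles in [("eNVM", 1e8), ("HS&E NVM", 1e11), ("SRAM replacement", 1e13)]:
    years = years_to_wearout(cycles, WRITE_RATE)
    print(f"{name:17s} {cycles:.0e} cycles -> {years:.3g} years")
```

Under this assumed rate an eFlash-class cell wears out within days, while the SRAM-replacement target outlasts any realistic product lifetime, which matches the intended use of each design point.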



MRAM Compiler and Macro Availability

• Arm and Spin creating MRAM compilers
  - HS&E NVM first
  - SRAM replacement near future
• Arm and Spin can create custom macros
  - Especially SRAM-replacement at advanced nodes



Markets for SRAM-like MRAM

[Figure: target markets arranged by process node, from 5 - 7nm to 22 - 28nm]

• Display Driver IC (DDI)
• CMOS Image Sensor (CIS)
• MCUs
• CPUs & Networking
• Datacenter AI
• SSD Controller

Especially Edge AI



The Selector
Disruptive Non-Disruptive Technology – The Key to Manufacturability

[Figure: 3-D NAND + high voltage vertical NMOS transistor using selective epitaxy → Adapt, Optimize, Combine → SPIN’s Selector]

• Allows Embedded Persistent DRAM (<10F²)
  - Maximize on-chip memory capacities
  - Minimize off-chip DRAM transactions
  - Dramatic increase in energy efficiency
  - Useful for any switchable element



Fault Tolerance and Voltage Manipulation

• Energy stored on a capacitor $\propto V^2$
• Voltage V can be supply voltage, maximum bit line voltage and so on
• Trade off classification accuracy for energy efficiency
• SRAM voltage scaling for energy efficiency in convolutional neural nets (1)
• SRAM/DRAM/Flash voltage scaling in deep neural net resilience study (2)
• Traditional memories tend to have chaotic bit behavior with reducing V
• MRAM is stochastic but with predictable bit error behavior with V (read and write)
• Match neural net fault tolerance with MRAM bit error rates using V (3)
  - Quadratic reduction in energy conversion into heat
  - Large MRAM endurance boost due to reduced wearout of the thin magnesium oxide
  - Improve performance through fast reads with predictable read disturb rates
  - Improve MRAM density with smaller cells with predictable retention errors

(1) L. Yang and B. Murmann, ISQED 2017
(2) B. Reagen et al., DAC 2018
(3) M. Tzoufras, M. Gajek and A. Walker, arXiv 2019
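The quadratic dependence on voltage is simple to tabulate. This sketch computes the switching energy at a scaled voltage relative to a nominal one; the voltage values are assumed for illustration and do not correspond to any specific MRAM or SRAM operating point from the talk.

```python
def relative_switching_energy(v_scaled, v_nominal):
    """Capacitive switching energy at v_scaled relative to v_nominal (E ∝ V^2)."""
    return (v_scaled / v_nominal) ** 2


V_NOMINAL = 1.0  # assumed nominal voltage in volts
for v in (1.0, 0.9, 0.8, 0.7):  # assumed scaled voltages
    saving = 1.0 - relative_switching_energy(v, V_NOMINAL)
    print(f"V = {v:.1f} V  energy saving = {saving:.0%}")
```

The saving has to be weighed against the higher bit error rate at reduced voltage, which is the trade-off the bullets above describe.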



MRAM in AI (1)

• Stochasticity is linked in a fundamental way to neural networks. At the same time it is an inherent property of MRAM that has hampered it for more than a decade

• The convergence between neural networks and MRAM presents a unique opportunity for research and for improving the performance of many ANN applications

• SRAM is on-chip and provides flawless but untuneable precision
• DRAM is off chip and provides flawless but untuneable precision
• MRAM can be integrated on-chip to provide dense and tuneable precision

(1) M. Tzoufras, M. Gajek and A. Walker, arXiv 2019.
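One way to explore tuneable precision is to inject random bit flips into stored weights and watch how a computed result degrades. The sketch below does this for a single dot product over hypothetical 8-bit weights. The bit error rates, vector size and helper names (`flip_bits`, `dot`) are assumptions for illustration; it does not model the actual relationship between MRAM voltage and error rate, nor the method of reference (1).

```python
import random


def flip_bits(values, bit_error_rate, rng, bits=8):
    """Return a copy of unsigned `bits`-wide integers with random bit flips."""
    noisy = []
    for v in values:
        for b in range(bits):
            if rng.random() < bit_error_rate:
                v ^= 1 << b  # flip bit b of this stored value
        noisy.append(v)
    return noisy


def dot(weights, inputs):
    """Plain integer dot product."""
    return sum(w * x for w, x in zip(weights, inputs))


rng = random.Random(0)  # fixed seed so runs are repeatable
weights = [rng.randrange(256) for _ in range(1024)]  # hypothetical 8-bit weights
inputs = [rng.randrange(256) for _ in range(1024)]   # hypothetical 8-bit activations
exact = dot(weights, inputs)

for ber in (0.0, 1e-4, 1e-2):  # assumed bit error rates
    noisy = dot(flip_bits(weights, ber, rng), inputs)
    print(f"BER={ber:g}  relative error={abs(noisy - exact) / exact:.2e}")
```

A real study would replace the dot product with a full network and measure classification accuracy, but the same injection step applies.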



Conclusions

• Energy Use in AI and IoT calls for Energy Efficiency techniques
• The Main Hog – Energy Demand in Memory
• Physics of Charging and Discharging Capacitors
• Relevance to Integrated Circuits, Systems and AI
• How Can STT-MRAM Help?
  - The Engine to enable RAM-like performance
    • High endurance and symmetric read/write
    • Application-Targeted MRAM Designs for SoCs / MRAM Macros / Markets
  - The Selector
    • Very high density persistent embedded and stand-alone memory with ~zero leakage
  - Fault Tolerance and Voltage Manipulation
• MRAM in AI


Thank You

spinmemory.com

Spin Memory Inc.
45500 Northpoint Loop West
Fremont, CA 94538
