ERD Architecture Benchmarking: The NRI MIND Activity Ralph K. Cavin, III, Kerry Bernstein & Jeff Welser July 12, 2009 San Francisco, CA


Page 1

ERD Architecture Benchmarking: The NRI MIND Activity

Ralph K. Cavin, III, Kerry Bernstein & Jeff Welser

July 12, 2009

San Francisco, CA

Page 2

Goals of the NRI/MIND Benchmarking Project

• Develop circuit/subsystem level examples of the applications of novel devices

• Evaluate the circuits/subsystems in the energy-time-space context versus CMOS implementations

• Determine most promising applications for emerging devices with an emphasis on integration with CMOS

Page 3

[Figure: Architecture Performance* SPEC 2000 Int (Base). Y-axis: SPECint / FPG (0 to 20). X-axis: Year (2000 to 2009). Series: Alpha, AMD, HP, IBM, Intel, SGI, Sun, AVG. Source: K. Bernstein 11/06.]

* Highest reported SPEC2000 INT per (adj) FPG Generation. FPG, SPECmark approximated when necessary; broken line = discontinued series.

[Figure: Architecture Performance* SPEC 2000 fp (Base). Y-axis: SPECfp / FPG (0 to 30). X-axis: Year (2000 to 2009). Series: Alpha, AMD, HP, IBM, Intel, SGI, Sun, AVG. Source: K. Bernstein 11/06.]

* Highest reported SPEC2000 INT per (adj) FPG Generation. FPG, SPECmark approximated when necessary; broken line = discontinued series.

Architectural innovations haven’t been the major driver for system performance

Analysis of high-performance architectures and the technologies they were built in, examining device vs. architecture contributions to throughput:

- Predominant influence on SPEC2000 is from device technology

- Modest contributions from architecture

Page 4

Four Architectural Projections

1) CMOS is not going away anytime soon. Charge (the state variable) and the MOSFET (the fundamental switch) will remain the preferred HPC solution until new switches appear as the long-term replacement solution in 10-20 years.

2) Hardware accelerators execute selected functions faster than software performing them on the CPU. Accelerators are responsible for substantial improvements in throughput.

3) Alternative switches often exhibit emergent, idiosyncratic behavior. We should exploit them. Certain physical behaviors may emulate selected HPC instruction sequences. Some operations may be superior to digital solutions.

4) New switches may improve high-utilization accelerators. The shorter-term supplemental solution (5-15 years) improves or replaces accelerators “built in CMOS and designed for CMOS”, either on-chip, on-3D-stack, or on-planar.

Page 5

Matching Logic Functions & New Switch Behaviors

New Switch Ideas

• Single Spin

• Spin Domain

• Tunnel-FETs

• NEMS

• MQCA

• Molecular

• Bio-inspired

• CMOL

• Excitonics

• ?

Popular Accelerators

• Encrypt / Decrypt

• Compress / Decompress

• Regular Expression Scan

• Discrete Cosine Transform

• Bit Serial Operations

• H.264 Std Filtering

• DSP, A/D, D/A

• Viterbi Algorithms

• Image, Graphics

Example: Cryptography Hardware Acceleration

• Operations required: Rotate, Byte Alignment, EXORs, Multiply, Table Lookup

• Circuits used in accelerators: Transmission Gates (“T-Gates”)

• New switch opportunity: A number of new switches (e.g. T-FETs) don’t have thermionic barriers, for example, so they won’t suffer from CMOS pass-gate VT drop, body effect, or source-follower delay.

• Potential opportunity: Replace 4 T-Gate MOSFETs with 1 low-power switch.

Page 6

Examples of Benchmarking Work in Progress

• Magnetic Tunnel Junction one-bit adder

• Magnetic Logic for one-bit adder

• Magnetic Ring Logic Devices

• Many other devices are being evaluated in a variety of circuit configurations.

Page 7

Background - MTJ

• Researchers have been investigating post-CMOS devices for many years. In the short term, people are looking for switches that supplement CMOS and are CMOS-compatible, supporting ultra-low-power operation.

• The MTJ (Magnetic Tunnel Junction) is one of the strongest candidates, and it is available in practice rather than only in theory.

– Excellent for memory and storage.

  • STT-RAM using MTJs is a strong candidate for universal memory.

– For logic design, good or not?

  • Any memory device can also be used to build logic circuits, in theory at least, and the MTJ is no exception.

  • The discovery of spin torque transfer (STT) makes the MTJ scalable and completely CMOS-compatible.

Page 8

MTJ-based DyCML 1 Bit Full Adder

• The MTJ is used as both a memory cell and a functional input.

• Switching of the MTJ is conducted by STT using control signals WL and BL.

• It is actually a combined CMOS-MTJ version of DyCML. Thus, it is more reasonable to compare it with CMOS-based DyCML to see the MTJ’s impact.

Page 9

Results

• ED curve of 65 nm process

[Figure: ED curves. Series: DyCML-MTJ, SCMOS, DyCML-CMOS]

Page 10

Nanomagnet Logic (NML)

PIs: Gary Bernstein¹, X. Sharon Hu², Michael Niemier², Wolfgang Porod¹

Student Researchers: M. Tanvir Alam¹, Michael Crocker², Aaron Dingler², Steve Kurtz², Shawn Liu², M. Jafar Siddiq¹, Edit Varga¹

Affiliations: ¹Department of Electrical Engineering, ²Department of Computer Science and Engineering

Page 11

Comparison to CMOS

• Hard to compare a magnet to a transistor

– Need to make the technology comparison at the functional unit level; consider initial projections here

• Natural comparison = low-power CMOS systems, sub-threshold, etc.

[Figure: adder design with inputs A, B, C, outputs Sum and Cout, and elements M1, M2, M3]

Base performance projections on adder design.
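The labels in the adder figure (A, B, C, Sum, Cout, M1, M2, M3) are consistent with a full adder built from three three-input majority gates, the basic gate of nanomagnet logic. The transcript does not say what M1 to M3 are, so the following is only a sketch under that assumption, not the authors' circuit; the function names and the gate wiring are illustrative.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote: 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0


def invert(x: int) -> int:
    """Logical NOT on a 0/1 value."""
    return 1 - x


def majority_full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry_out) using three majority gates and two inverters.

    The assignment of M1..M3 to these roles is an assumption made for
    illustration; the slide only shows the labels.
    """
    m1 = majority(a, b, c)             # carry out
    m2 = majority(a, b, invert(c))     # intermediate term
    m3 = majority(invert(m1), m2, c)   # sum
    return m3, m1


def matches_binary_addition() -> bool:
    """Compare the gate network with ordinary addition for all 8 inputs."""
    return all(
        majority_full_adder(a, b, c) == ((a + b + c) % 2, (a + b + c) // 2)
        for a, b, c in product((0, 1), repeat=3)
    )


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        s, cout = majority_full_adder(a, b, c)
        print(f"A={a} B={b} C={c} -> Sum={s} Cout={cout}")
    print("matches binary addition:", matches_binary_addition())
```

Counting majority gates rather than transistors is one way to make the functional-unit-level comparison that the slide calls for.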

Page 12

Trends

[Figure labels: V & r; EDP (pJ ns)]

Because of sensitivity to sub-threshold slope, threshold voltage …, energy and delay can vary significantly from technology to technology. These are the best data points for CMOS (0.3V - 1V).

          Energy (pJ)   Delay (ns)
CMOS      0.020         261
NMLNP     0.029         198
NMLP      0.029         18

With r = 1, can still see ~15X performance gain due to higher throughput.

          Energy (pJ)   Delay (ns)
CMOS      0.19          20

If higher supply voltage to match delay, ~7X energy savings.

          Energy (pJ)   Delay (ns)
NMLNP     0.0012        198
NMLP      0.0012        18

With r = 5, ~17X (NP) and ~158X (P) energy savings with better performance.
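The quoted gains can be reproduced from the table values. The sketch below is a plain arithmetic check; which CMOS data point each quoted ratio uses as its baseline is not stated on the slide, so the pairings here are an assumption chosen because they reproduce the quoted numbers. All variable and function names are illustrative.

```python
# (energy in pJ, delay in ns), copied from the table above.
CMOS_LOW_VOLTAGE = (0.020, 261)
CMOS_MATCHED_DELAY = (0.19, 20)   # higher supply voltage, delay close to NMLP
NML_R1 = {"NMLNP": (0.029, 198), "NMLP": (0.029, 18)}
NML_R5 = {"NMLNP": (0.0012, 198), "NMLP": (0.0012, 18)}


def edp(point: tuple[float, float]) -> float:
    """Energy-delay product in pJ*ns."""
    energy_pj, delay_ns = point
    return energy_pj * delay_ns


def gain(baseline: float, candidate: float) -> float:
    """How many times smaller the candidate value is than the baseline."""
    return baseline / candidate


# Assumed pairings (not stated in the source):
# r = 1: delay of low-voltage CMOS vs. NMLP.
speedup_r1 = gain(CMOS_LOW_VOLTAGE[1], NML_R1["NMLP"][1])
# r = 1: energy of delay-matched CMOS vs. NMLP.
energy_gain_r1 = gain(CMOS_MATCHED_DELAY[0], NML_R1["NMLP"][0])
# r = 5: energy of low-voltage CMOS vs. NMLNP.
energy_gain_r5_np = gain(CMOS_LOW_VOLTAGE[0], NML_R5["NMLNP"][0])
# r = 5: energy of delay-matched CMOS vs. NMLP.
energy_gain_r5_p = gain(CMOS_MATCHED_DELAY[0], NML_R5["NMLP"][0])

if __name__ == "__main__":
    print(f"r=1 speedup (P):          {speedup_r1:.1f}x")
    print(f"r=1 energy savings (P):   {energy_gain_r1:.1f}x")
    print(f"r=5 energy savings (NP):  {energy_gain_r5_np:.1f}x")
    print(f"r=5 energy savings (P):   {energy_gain_r5_p:.1f}x")
    print(f"EDP, low-voltage CMOS:    {edp(CMOS_LOW_VOLTAGE):.3f} pJ*ns")
    print(f"EDP, NMLP at r=5:         {edp(NML_R5['NMLP']):.4f} pJ*ns")
```

Under these pairings the ratios come out near 14.5, 6.6, 16.7, and 158, in line with the rounded ~15X, ~7X, ~17X, and ~158X figures on the slide.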

Page 13

Magnetic Ring Logic Devices – Benchmarks/Metrics

• Caroline Ross - MIT

• These devices work by the movement of domain walls around thin film rings with general structure Hard layer/Spacer/Soft layer, e.g. Co/Cu/NiFe or Co/MgO/NiFe.

• Rings can have several remanent states with different resistances. This is useful for multibit memory. However, digital logic uses two levels, so in these examples some of the complexity available in ring devices is wasted.

• NAND/NOR configurations are being analyzed.

Page 14

Prototype Magnetic Ring Device Performance

• Device area
  Existing prototype: 1 µm²
  Projection: improve x 100?

• Switching speed
  Existing prototype: 5 ns
  Projection: proportional to 1/device length (improve x 10?) and domain wall velocity (improve x 10?)

• Switching energy
  Existing prototype: 5 x 10^-14 J (10^7 kT)
  Projection: proportional to switching speed (improve x 100??), to device cross-section area (improve x 10-20?), and to critical current for wall motion (improve x 10-100?)
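The switching energy of the prototype is quoted both in joules and in units of kT. A minimal check of that conversion follows; the slide does not give a temperature, so room temperature (300 K) is an assumption, and the names are illustrative.

```python
BOLTZMANN_J_PER_K = 1.380649e-23   # Boltzmann constant (exact SI value)
ASSUMED_TEMPERATURE_K = 300.0      # assumption: the slide gives no temperature


def energy_in_kt(energy_j: float,
                 temperature_k: float = ASSUMED_TEMPERATURE_K) -> float:
    """Express an energy in joules as a multiple of kT."""
    return energy_j / (BOLTZMANN_J_PER_K * temperature_k)


PROTOTYPE_SWITCHING_ENERGY_J = 5e-14   # value from the table above
prototype_energy_kt = energy_in_kt(PROTOTYPE_SWITCHING_ENERGY_J)

if __name__ == "__main__":
    print(f"{PROTOTYPE_SWITCHING_ENERGY_J:.1e} J = {prototype_energy_kt:.2e} kT")
```

At 300 K this gives roughly 1.2 x 10^7 kT, which matches the order of magnitude quoted for the existing prototype.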

Page 15

Summary

• The Nanoelectronics Research Initiative benchmarking project should be nearing completion by mid-August 2009.

• The ERA section plans to provide a summary of findings for 2009.