
134 IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, VOL. 6, NO. 2, JUNE 2016

Comparative Evaluation of Spin-Transfer-Torque and Magnetoelectric Random Access Memory

Shaodi Wang, Student Member, IEEE, Hochul Lee, Student Member, IEEE, Farbod Ebrahimi, P. Khalili Amiri, Member, IEEE, Kang L. Wang, Fellow, IEEE, and Puneet Gupta

Abstract—Spin-transfer torque random access memory (STT-RAM), as a promising nonvolatile memory technology, faces challenges of high write energy and low density. The recently developed magnetoelectric random access memory (MeRAM) enables the possibility of overcoming these challenges by the use of voltage-controlled magnetic anisotropy (VCMA) effect and achieves high density, fast speed, and low energy simultaneously. As both STT-RAM and MeRAM suffer from the reliability problem of write errors, we implement a fast Landau–Lifshitz–Gilbert equation-based simulator to capture their write error rate (WER) under process and temperature variation. We utilize a multi-write peripheral circuit to minimize WER and design reliable STT-RAM and MeRAM. With the same acceptable WER, MeRAM shows advantages of 83% faster write speed, 67.4% less write energy, 138% faster read speed, and 28.2% less read energy compared with STT-RAM. Benefiting from the VCMA effect, MeRAM also achieves twice the density of STT-RAM with a 32 nm technology node, and this density difference is expected to increase with technology scaling down.

Index Terms—Evaluation, magnetic tunnel junctions (MTJ), magnetoelectric random access memory (MeRAM), spin-transfer torque RAM (STT-RAM), voltage-controlled memory, write energy, write error rate, write speed.

I. INTRODUCTION

Magnetoresistive random access memory (MRAM) [1] using magnetic tunnel junctions (MTJs) is a promising data storage technology due to its nonvolatility, zero leakage power, high speed, high endurance, immunity to single-event soft error and high thermal budget [2], [3]. MTJs switched by spin-transfer torque (STT-MTJ) [4], [5] potentially promise the speed and area of dynamic RAM (DRAM) [6]. Therefore, spin-transfer torque RAM (STT-RAM) designed with STT-MTJs is identified as a possible replacement of current memory technologies, such as static RAM (SRAM)

Manuscript received June 24, 2015; revised August 28, 2015 and October 31, 2015; accepted December 01, 2015. Date of publication April 06, 2016; date of current version June 09, 2016. This work was supported in part by the NSF Engineering Research Center on Translational Applications of Nanoscale Multiferroic Systems (TANMS), and in part by a Phase II SBIR award from the National Science Foundation. This paper was recommended by Guest Editor S. Ghosh.

S. Wang, H. Lee, F. Ebrahimi, P. K. Amiri, K. L. Wang, and P. Gupta are with the Department of Electrical Engineering, University of California, Los Angeles, CA 90095 USA (e-mail: [email protected]).

F. Ebrahimi and P. K. Amiri are also with the Inston Inc., Los Angeles, CA 90095 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JETCAS.2016.2547681

cache [7]–[10] and DRAM main memory [11]. In addition to the traditional uses, research also focuses on exploring new memory architectures to utilize its non-volatility (e.g., a fast persistent memory system enabled by STT-RAM [12] allows instant recovery from the off state). However, STT-RAM faces the challenge of high write current (around at 45 nm node [6]), which does not directly scale with MTJ dimension [13]. As a result, a large access transistor is always required, leading to high write energy and low density.

The recent development of voltage-controlled MTJs (VC-MTJs) with voltage-controlled magnetic anisotropy (VCMA) provides more promising performance than STT-MTJs [14]–[20]. This technology allows for precessional switching, a process which provides flipping of the magnetization upon a voltage pulse, irrespective of the initial state. It enables the use of minimum sized access transistors, as well as precessional switching to simultaneously achieve low energy, high density and high speed magnetoelectric random access memory (MeRAM). MeRAM reduces switching energy due to reduced ohmic loss ( [15] for the VC-MTJs with over 100X higher resistance than STT-MTJs).

Both STT-MTJ and VC-MTJ suffer from the reliability problem of intrinsic switching failure caused by thermal fluctuation exacerbated by process variation [21], [22] and temperature variation. This problem can be quantified by write error rate (WER), which is the average number of switching failures per write. STT-RAM can simply reduce the WER by using high current and long write time. By contrast, MeRAM does not have a trivial solution, because every VC-MTJ has an optimal write pulse giving the lowest WER, and the effect of variation on the optimal pulse is less straightforward.

Considering the advantages and disadvantages, MeRAM requires a comprehensive evaluation, while STT-RAM, which is better known and has similar structure and fabrication process, is an appropriate reference. To accurately compare the reliability of the two technologies, the WER must be precisely captured. The state-of-the-art method is the Landau–Lifshitz–Gilbert (LLG) differential equation based Monte-Carlo simulation (e.g., [23] for STT-MTJs). However, previous implementations were too slow to be adapted for high-accuracy simulations needed for large memory arrays. As a result, limited samples were simulated in previous STT-RAM studies [21], [24], which could not address WER below . This may lead to inappropriate designs, e.g., the WER of requires 20% more write current than for STT-MTJs. Moreover, the context of circuit-level optimization is also essential given that peripheral circuits can significantly affect memory performance. For instance, MRAM

2156-3357 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Fig. 1. (a) Writing mechanism of STT-RAM and MeRAM. (b) A VC-MTJ is switched by unidirectional voltage pulses. The first two identical pulses switch the resistance state of the VC-MTJ from one state to the other and then back, and the third, double-width pulse makes two switches in succession. (c) An STT-MTJ is switched by directional current pulses, and the switching direction depends on the direction of the current.

can leverage circuit techniques to mitigate the WER by trading off the speed and power.

In this paper, we perform the first comprehensive circuit-level comparison between MeRAM and STT-RAM with respect to reliability, power, performance, density, and scalability using a high-speed Monte-Carlo simulator. Our contributions are summarized as follows.

• The LLG equation is modified to include thermal fluctuation, temperature dependence, STT, and VCMA effect.

• Based on the LLG equation, a simulator is implemented in CUDA on a GPU platform¹. It completes 100 000 simulations within 2 s.

• The density and scalability of STT-RAM and MeRAM are analyzed based on the 32 nm design rules and derived models, respectively.

• The impact of process and temperature variation on STT-RAM and MeRAM is compared: for the first time the variation impact on MeRAM is analyzed; the temperature related behavior is analyzed using a more accurate model than existing studies.

• The WER of both MRAMs is minimized below an acceptable rate by utilizing multiple writes enabled by the pre-read and write sense amplifier (PWSA) [25]. With this design, a fair circuit-level energy-speed comparison between STT-RAM and MeRAM is carried out.

The paper is organized as follows. Section II describes the LLG based model and simulation in detail. Section III analyzes the scalability of STT-RAM and MeRAM. Section IV designs MRAM cells under process and temperature variation and compares the cell density of two MRAMs with 32 nm design rules. Section V analyzes the WER of the nominal MTJs and MRAMs with process and temperature variation separately. Section VI introduces the PWSA multi-write design and carries out a circuit-level comparison with respect to write latency, energy and MRAM failure analysis. Section VII concludes the paper.

II. MODELING AND SIMULATION

Both STT-MTJ and VC-MTJ are resistive memory devices; their resistances are determined by the magnetization

¹ Freely available at http://nanocad.ee.ucla.edu/Main/DownloadForm

directions of two ferromagnetic layers. The direction of one layer is fixed (referred to as the reference layer) while the other one can be switched (referred to as the free layer). A low resistance is present when magnetic directions in the two layers are parallel (referred to as the P state); a high resistance is present when the two directions are anti-parallel (referred to as the AP state). The two states are utilized to store “0” and “1”. Tunnel magnetoresistance (TMR, defined as (R_AP − R_P)/R_P) over 200% has been demonstrated, which means that the high resistance can be over 3X of the low resistance. Based on the magnetization direction of the two layers, MTJs are classified as in-plane and out-of-plane (perpendicular magnetized) devices. Recently, STT-RAM with out-of-plane MTJs is found to have lower write current and less fabrication challenge than in-plane MTJs [26]–[29]. The magnetic anisotropy of out-of-plane MTJs is dominated by the perpendicular magnetic anisotropy (PMA). In this paper, we consider the STT-RAM and MeRAM with out-of-plane MTJs.

Although STT-MTJs and VC-MTJs share a similar device structure and data storing mechanism, their switching mechanisms differ as shown in Fig. 1, e.g., in an STT-MTJ, polarized electrons flowing from the reference layer to the free layer switch the magnetization of the free layer to the P state; when electrons flow in the opposite direction, the reflected electrons from the reference layer switch the free layer to the AP state.

Unlike STT-MTJ, VC-MTJ utilizes a unidirectional voltage pulse to make both switches, from P to AP and from AP to P. As is illustrated in Fig. 2, the energy barrier Eb separates the two stable states of the free layer magnetization (pointing up and down) when the voltage applied across the VC-MTJ is 0. The energy barrier Eb decreases with the voltage increase due to VCMA effect. When the voltage reaches the optimal voltage [see (8)], full 180° switching can be achieved by timing the precessional switching of magnetization.

In the MTJ switching simulation, the magnetization in the free layer during any short interval (e.g., 0.25 ps in our setup) is described by an LLG differential equation. The entire switching is captured by iteratively solving the LLG equations in sequence. The WER is then extracted from numerous simulations in a Monte-Carlo approach. Shorter interval and more simulations can improve the accuracy at the expense of time. The LLG (1) describes the dynamic behavior of the free layer magnetization vector in the presence of an external


Fig. 2. VCMA-induced precessional switching. When a voltage is applied on the VC-MTJ, the energy barrier separating the two magnetization states of the free layer is reduced so that the magnetization state starts to spin.

field , shape anisotropy , PMA , and thermal fluctuation , as follows:

(1)

where is the gyromagnetic ratio, is the effective magnetic field, is the intrinsic damping constant, is the saturation magnetization, and is the amplitude of the spin-transfer torque induced by current. can be reduced by voltage due to the VCMA effect, which is expressed below

(2)

where is the applied voltage, and are the thickness of the free layer and MgO layer respectively, is the anisotropy constant, is the anisotropy change slope, is the VCMA factor with the unit of . Positive causes VCMA effect to reduce as well as the perpendicular magnetization to cause precessional switching [14]. An optimal applied voltage can exactly cancel out the perpendicular magnetization, and then a perfect precessional switching (controlled by the in-plane external magnetic field ) starts, during which the magnetization in the free layer rotates. The optimal pulse width equals the half cycle of the precessional switching [16]. More specifically, the optimal pulse allows the magnetization to rotate exactly 180°. The VCMA effect is considered for both STT-MTJ and VC-MTJ, while previous circuit-level STT-RAM studies ignore it. As an example, the impact of VCMA effect on an STT-MTJ is shown in Fig. 3: VCMA can change the WER. When a write current is applied, the voltage drop on the STT-MTJ reduces the PMA and increases the chance of thermally activated switching. Hence, when the write pulse is not long enough to guarantee a switch, the thermally activated switching assisted by VCMA increases the

Fig. 3. During a write of the STT-MTJ, VCMA may assist the thermal activation to cause unintended switching. This effect can improve the switching probability when the write pulse width is insufficient to switch the STT-MTJ, but may lead to switching failure when the write pulse width is sufficiently long.

switching probability, but when the write pulse is long enough, it induces errors.

The in (1) is the thermal fluctuation field and is randomly determined as a variable following a Normal distribution at each simulation interval

(3)

where is the area of the MTJ, is the Boltzmann constant, and is the temperature.

Temperature significantly affects the MTJ switching behavior, e.g., the WER of an STT-MTJ can increase from to with temperature rising from 300 K to 350 K (see Fig. 9). Besides , other terms in (1) also change with temperature as described in (4) [30], which are commonly ignored in previous large-scale MRAM studies [7], [8]. In situ thermal sensors [31], [32] may help to monitor MRAM temperature and modulate MTJ write schemes.

(4)

where is 1120 K, and , , and are corresponding parameters at 1120 K.

The spin-transfer torque effect is described in (5) [28]

(5)

where is the reduced Planck constant, is the spin-torque efficiency factor [33], is the angle between the two magnetizations of the free layer and reference layer, and is the current


TABLE I
MODELING PARAMETERS AT 300 K

density through MTJ. can be further expanded in (6) [27], [33], [34]

(6)

where and , as functions of , are polarization efficiency of tunnel current and spin valve respectively. and are material-dependent polarization factors for the tunnel current and current passing through ferromagnetic layers respectively [34]. These two parameters are not necessarily equal, but we use 0.66 [35] for both of them in this paper. The required switching current (known as critical current) differs between switching directions due to the difference in polarizing efficiency [36]. The parameters used in the model and simulation are listed in Table I.

Inspired by the massive floating point calculations involved in the LLG equation and the highly independent operations in Monte-Carlo simulations, we implement the switching simulator in CUDA, and it completes 100 000 simulations within 2 s on an NVIDIA Tesla M2070. The model has been validated. The speed improvement comes from highly parallel simulations [39]–[41].
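The iterative time-stepping described in this section can be illustrated with a minimal single-spin (macrospin) sketch in Python. This is not the authors' CUDA simulator: it integrates only the deterministic precession and damping terms in the Landau–Lifshitz form, which is mathematically equivalent to the Gilbert form, and it omits the thermal field, STT, and VCMA terms of (1). The damping constant, field strength, and function names are hypothetical values chosen for illustration; only the 0.25 ps step follows the text.

```python
import math

GAMMA = 1.76e11  # electron gyromagnetic ratio in rad/(s*T)


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Rescale a vector to unit length (the magnetization direction)."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def llg_step(m, b_eff, alpha, dt):
    """One explicit Euler step of
    dm/dt = -g' (m x B) - g' alpha (m x (m x B)), with g' = gamma / (1 + alpha^2).
    The result is renormalized so that |m| stays equal to 1."""
    g = GAMMA / (1.0 + alpha * alpha)
    mxb = cross(m, b_eff)
    mxmxb = cross(m, mxb)
    dm = tuple(-g * (p + alpha * d) for p, d in zip(mxb, mxmxb))
    return normalize(tuple(mi + dt * di for mi, di in zip(m, dm)))


def simulate(m0, b_eff, alpha=0.1, dt=0.25e-12, t_end=5e-9):
    """Integrate the magnetization direction from m0 for t_end seconds.
    alpha and b_eff are illustrative assumptions, not values from Table I."""
    m = normalize(m0)
    for _ in range(int(round(t_end / dt))):
        m = llg_step(m, b_eff, alpha, dt)
    return m


if __name__ == "__main__":
    # Start in-plane (along x) with a 0.1 T effective field along z.
    print(simulate((1.0, 0.0, 0.0), (0.0, 0.0, 0.1)))
```

In this toy setting the magnetization spirals toward the field direction. A Monte-Carlo WER estimate would repeat such a run many times with a random thermal field added to `b_eff` at every step and count the runs that end in the wrong state.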

III. SCALABILITY

In this section, we analyze the scalability of STT-RAM and MeRAM regarding retention, write power, area, and fabrication challenges. Retention, as one of the most important metrics for memory systems [42], determines the available data-storing time and thus is a non-scalable parameter [6]. An MTJ with low retention is easy to flip, but high retention increases the write difficulty. Considering the trade-off, an efficient design should have its retention as low as possible while still satisfying the application requirement. For STT-MTJ and VC-MTJ, the retention time (mean time to false switching during idle state) is an exponential function of thermal stability [43], [44]

(7)

where is the sum of perpendicular components of , , and . Based on [14], [28], we derive the critical current of STT-MTJ and the optimal voltage of VC-MTJ as functions of and MTJ area in

(8)

where is the elementary charge, is the spin-torque polariza-tion efficiency, and is the VCMA factor. From (8), the critical

Fig. 4. Layouts of STT-RAM and MeRAM under 32 nm design rules. The area of an STT-RAM cell is twice the area of a MeRAM cell, as an STT-MTJ requires a 3X wider access transistor than a VC-MTJ. Vertical transistors such as nanowire transistors may help to reduce area inefficiency [47].

current of STT-MTJ does not directly depend on the MTJ dimension given that the thermal stability is constant. But as increases with decreased due to the sub-volume excitation for large MTJs with lateral size over 50 nm [45], can be slightly reduced by scaling dimension. However, the reduction trend does not continue for small MTJs. The optimal voltage of VC-MTJ is inversely proportional to , indicating that it will increase with dimension scaling. Hence, the key for scaling both technologies is finding materials that provide more and .

With respect to the memory density, access transistors dominate the area rather than the MTJs (see Fig. 4). Because of the non-scaling critical current for small STT-MTJs, access transistors in STT-RAM have to increase the width/length ratio with dimension scaling down. MeRAM always uses minimum sized transistors and hence promises better scalability in density. Alternatively, MeRAM can be integrated in a much denser cross-bar structure, unlike STT-RAM [16].

In terms of fabrication, both STT-RAM and MeRAM face the challenge of scaling MgO thickness. Scaling dimension forces STT-MTJs to reduce MgO thickness, and thin MgO may contain defects [46], such as pin-holes, which can cause MTJs to fail. Though MeRAM has thicker MgO because of the high resistance of VC-MTJs, increasing write voltage may cause MgO breakdown. Again, these challenges can be overcome by finding better materials with higher and .

IV. MRAM CELL DESIGN AND VARIATION

As discussed in Section III, both STT-MTJ and VC-MTJ face scaling problems, and enlarging MTJs exacerbates write difficulty but does not improve thermal stability due to the sub-volume excitation [48]. Considering these factors, we set the


TABLE II
DESIGN PARAMETERS FOR MTJS AND ACCESS TRANSISTORS. TRANSISTORS' THRESHOLD VOLTAGE VARIATION CONSIDERS THE EFFECTS OF LINE EDGE ROUGHNESS (LER), RANDOM DOPANT FLUCTUATION (RDF), AND NON-RECTANGULAR GATE (NRG). ACCESS TRANSISTORS OF MERAM HAVE LARGER THRESHOLD VOLTAGE VARIATION BECAUSE NARROW TRANSISTORS ARE AFFECTED MORE BY NRG, RDF, AND LER

diameter of STT-MTJs and VC-MTJs to 60 nm (i.e., a circular MTJ structure), which has been demonstrated [49] for STT-MTJs. Access transistors and peripheral circuits are built with 32 nm planar CMOS technology. The layouts of STT-RAM and MeRAM are drawn in Fig. 4 under 32 nm design rules. The density of MeRAM is twice that of STT-RAM because an STT-MTJ needs a 3X wider access transistor.

Table II lists the design parameters of MRAM cells (nominal and variation). In an MRAM cell, an MTJ is connected with an access transistor (1T1M). The MTJ resistance variation due to the MTJ shape variation was identified as a big design concern [21], [24], [54], [55]. But it is actually a secondary problem of the re-deposition of etched products on the MTJ sidewalls in current plasma-based etching systems, where the re-deposition may cause an MTJ failure [56] and a shape change. Developing a selective-etching process is expected to fix both problems by forming volatile compounds during an etch. Less than 4% in size variation has been shown for fabricated 50 nm STT-RAM [50]. We pessimistically choose in our simulation where is 10% of the MTJ diameter.

The designed MTJs have thermal stability margins of at 300 K and at 350 K for the requirement of 40.3 [6] (i.e., 10 years retention time). External magnetic field in VC-MTJs assists the precessional switching, but reduces thermal stability, and hence the free layer thickness of VC-MTJs is set thinner to offset the thermal stability loss.

Compared with the MTJ, CMOS variation has been well analyzed. Major variations in the 32 nm planar technology are considered in Table II. FinFET and Tunneling FET technologies [57]–[59], which are possibly introduced for scaled MRAM, show slightly smaller impact from process variation [60].
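The correspondence between a thermal stability requirement of 40.3 and a 10-year retention time can be checked with the Néel–Arrhenius relation, in which the retention time equals an attempt time multiplied by the exponential of the thermal stability. The sketch below assumes an attempt time of 1 ns, a commonly used value that is not stated in the text; the function names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
TAU0 = 1e-9  # assumed attempt time in seconds (not given in the text)


def retention_years(delta, tau0=TAU0):
    """Mean retention time in years for thermal stability factor delta."""
    return tau0 * math.exp(delta) / SECONDS_PER_YEAR


def required_delta(years, tau0=TAU0):
    """Thermal stability factor needed for a target mean retention time."""
    return math.log(years * SECONDS_PER_YEAR / tau0)


if __name__ == "__main__":
    print(retention_years(40.3))  # close to 10 years under the 1 ns assumption
    print(required_delta(10.0))   # close to 40.3 under the same assumption
```

Because the dependence is exponential, a small loss of thermal stability at elevated temperature shortens retention considerably, which is why the design keeps a margin above the requirement.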

V. WRITE ERROR RATE OF MRAM

The reliability problems of MRAMs include retention error, read disturbance, read failure, and write error. We focus on the write error in this section, which is our main contribution, and the other failures are discussed in Section VI-B.

Fig. 5. WER of the nominal STT-MTJ as a function of pulse width for different perfect current pulses (constant current) and switching directions.

Fig. 6. The WER of the nominal VC-MTJ as a function of pulse width for different perfect voltage pulses and switching directions. A VC-MTJ has an optimal pulse, which leads to the lowest WER. The curve of 1.2 V has a lower overall WER than 1.1 V and 1.3 V, indicating 1.2 V is closer to the optimal voltage.

A. Write Error Rate of MTJs Without Variation

Fig. 5 shows the WER of the nominal STT-MTJ. The two switching directions have different WER due to the asymmetric polarization efficiency (6). When the STT-MTJ switches from to , the polarizing current changes from the majority to minority, while the polarizing current changes from the minority to majority in the opposite direction.

The WER of the nominal VC-MTJ is shown in Fig. 6. The curve of 1.2 V is observed to have lower overall WER (for different pulse widths) than 1.1 V and 1.3 V, indicating that it is closer to the optimal voltage. A nonoptimal voltage either under-compensates or over-compensates the PMA, resulting in an imperfect precessional switching and thus a higher WER. As can be seen in Fig. 6, the low-WER region of 1.3 V lies, on average, to the left (shorter pulse width) of those of 1.2 V and 1.1 V, as 1.3 V over-compensates the PMA more and results in a faster precessional switching. A small WER asymmetry is observed for the two switching directions, because the write voltage induces leakage current and a corresponding STT effect, which assists the switching from to but resists the switching from to .
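Resolving low WER values such as those plotted in Figs. 5 and 6 by Monte-Carlo simulation requires many trials, which is why simulator speed matters. As a rough guide, assuming independent trials so that the number of observed failures is binomial, the relative standard error of an estimated WER p from N trials is sqrt((1 − p)/(N·p)). The helper names below are illustrative and are not part of the authors' simulator.

```python
import math


def relative_std_error(wer, n_trials):
    """Relative standard error of a WER estimated from n_trials independent writes."""
    return math.sqrt((1.0 - wer) / (n_trials * wer))


def trials_needed(wer, rel_err):
    """Number of independent trials needed to estimate wer to a given relative error."""
    return math.ceil((1.0 - wer) / (wer * rel_err ** 2))


if __name__ == "__main__":
    print(relative_std_error(1e-3, 100_000))
    print(trials_needed(1e-6, 0.1))
```

With 100 000 trials, a WER of 1e-3 is estimated to within about 10% relative standard error, while reaching the same precision at a WER of 1e-6 takes on the order of 1e8 trials.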

B. Write Error Rate of MRAM Array

To estimate the WER of an entire array with temperature and process variations, WER must be simulated for different cells that have varying design parameters.

The variations of access transistors result in variation of write pulse voltage, rise and fall time as is shown in Fig. 7(a). We


Fig. 7. (a) Write current (voltage) pulse on STT-MTJs (VC-MTJs). The rise and fall times are measured as the time during which the voltage is rising and falling between 10% and 90% of the peak voltage, respectively. (b) Mean of write current on STT-MTJs as a function of MTJ resistance. (c) Mean of write voltage on VC-MTJs as a function of MTJ resistance.

obtain the distribution of the write pulse using Monte-Carlo SPICE simulations. In simulations, an access transistor is connected with a resistor and a capacitor (a lumped model for the MTJ [61]). The parameters of access transistors are randomly determined based on Table II. Then the distributions of pulse current (current through STT-MTJs), pulse voltage (voltage on VC-MTJs), pulse rise time, and pulse fall time are statistically extracted from 100 000 simulations for each MTJ resistance state (i.e., resistance changes during switching) and supply voltage (between 0.9 V and 1.3 V, which drops over an access transistor and an MTJ in series). The standard deviation and mean of pulse current and voltage vary with MTJ resistance. As is shown in Fig. 7(b) and (c), the mean of pulse current changes up to 26.5% with STT-MTJ resistance, whereas the mean of pulse voltage changes by less than 3.5% with VC-MTJ resistance because the high resistance of VC-MTJs drops more than 95% of the supply voltage. The standard deviation of pulse current in STT-RAM is up to 16%, whereas the standard deviation of pulse voltage in MeRAM is below 1%, because in STT-RAM the pulse current is mainly controlled by access transistors and thus suffers more impact from transistor variation. The mean and standard deviation of pulse rise time are mainly determined by access transistors and barely depend on MTJ resistance. For a given supply voltage, the mean varies within 7% with resistance, and the standard deviation is around 10%. By contrast, the fall time is mostly determined by the leakage current through MTJs. In STT-RAM the mean of pulse fall time varies between 61.6 ps and 70.6 ps with MTJ resistance, while the mean for MeRAM is much longer and varies between 128 ps and 248 ps because of the high resistance of VC-MTJs. The standard deviation of pulse fall time due to transistor variation is around 9% (maximum 14.1%) for STT-RAM and 5% (maximum 7.3%) for MeRAM for different supply voltages and temperatures. We summarize the write pulse variation in Table III.
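Why transistor variation affects the STT-RAM write current far more than the MeRAM write voltage can be illustrated with a crude series-resistor model. The sketch below replaces the access transistor with a Gaussian-distributed on-resistance, which is a strong simplification of the SPICE setup described above, and all resistance values are hypothetical placeholders rather than values from Table II.

```python
import random
import statistics


def pulse_stats(r_mtj, r_tr_mean, r_tr_sigma, vdd=1.2, n=20000, seed=0):
    """Monte-Carlo over the transistor on-resistance for a transistor and MTJ in series.

    Returns (mean fraction of vdd dropped on the MTJ,
             sigma/mean of the MTJ current, which equals sigma/mean of the MTJ voltage
             because the MTJ resistance is held fixed)."""
    rng = random.Random(seed)
    fractions, currents = [], []
    for _ in range(n):
        r_tr = max(rng.gauss(r_tr_mean, r_tr_sigma), 1.0)  # keep resistance positive
        current = vdd / (r_tr + r_mtj)
        currents.append(current)
        fractions.append(current * r_mtj / vdd)
    spread = statistics.stdev(currents) / statistics.mean(currents)
    return statistics.mean(fractions), spread


# Hypothetical values: a low-resistance STT-MTJ with a wide (low-resistance) transistor,
# and a VC-MTJ with 100X higher resistance behind a narrower transistor.
stt_fraction, stt_spread = pulse_stats(r_mtj=2e3, r_tr_mean=2e3, r_tr_sigma=200.0)
vc_fraction, vc_spread = pulse_stats(r_mtj=200e3, r_tr_mean=6e3, r_tr_sigma=600.0)

if __name__ == "__main__":
    print(stt_fraction, stt_spread)
    print(vc_fraction, vc_spread)
```

In this toy model the high-resistance VC-MTJ takes nearly all of the supply voltage, so the same relative spread in transistor resistance produces a much smaller relative spread in the pulse than it does for the STT-MTJ.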

TABLE III
SUMMARY OF WRITE PULSE VARIATION DUE TO TRANSISTOR PROCESS VARIATION AT TEMPERATURE OF AND . MEAN SHIFT IS THE PERCENTAGE CHANGE OF PARAMETERS' MEAN BETWEEN HIGH AND LOW MTJ RESISTANCE STATES

Fig. 8. Monte-Carlo simulation flow to obtain the WER of 1T1M MRAM array. N is the sample size, and T is the simulation time including a writing time and a waiting time (for the MTJ to settle down, e.g., waiting time is 20 ns in the simulations).

The mean and standard deviation of pulse current/voltage and rise/fall time are fitted to polynomial models of MTJ resistance, which are accurate enough as their dependence on resistance is nearly linear. The models are inputs to the CUDA simulator. The Monte-Carlo simulation flow is shown in Fig. 8. At the beginning of a simulation, pulse voltage/current, rise/fall time, and MTJ parameters are generated following Normal distribution. During the simulation, pulse voltage, current, and fall time are updated according to the MTJ resistance state.

The WER of a 1T1M STT-RAM is shown in Fig. 9. As expected, the WER of STT-RAM decreases monotonically with increasing supply voltage and pulse width, and the WER increases with temperature. The switching from to shows higher WER due to the asymmetry in spin-torque polarization efficiency.

The WER of a 1T1M MeRAM is shown in Fig. 10. Unlike the results of the nominal VC-MTJ, the MeRAM with process variation cannot achieve WER below because there is no


Fig. 9. WER of an STT-RAM under process and temperature variation for different write pulses and switching directions (a: to , b: to ).

Fig. 10. WER of an MeRAM under process and temperature variation for different write pulses. WER is averaged over two switching directions.

common optimal voltage for all VC-MTJs in the MeRAM. Temperature also has significant impact on the MeRAM: the WER of the pulse (1.26 V/1.42 ns), which gives the lowest WER at 300 K, increases 1000X at 350 K; the voltage that gives the lowest WER changes from 1.26 V at 300 K to 1.20 V at 350 K because high temperature leads to low thermal stability and thus low required voltage (8); the pulse width giving the lowest WER for a given voltage decreases with temperature because higher temperature reduces the horizontal demagnetization field, so the external field is less canceled and drives the precessional switching faster. Despite the high WER, MeRAM shows a clear speed advantage.
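The link between the optimal pulse width and the precession speed can be made concrete with an idealized estimate. Assuming the PMA is exactly cancelled and the magnetization precesses about a net in-plane field B at the Larmor angular frequency γB, with damping, thermal noise, and the details of the demagnetization field ignored, a 180° rotation takes π/(γB). The function names are illustrative, and the field value is back-calculated from the pulse width rather than taken from the paper.

```python
import math

GAMMA = 1.76e11  # electron gyromagnetic ratio in rad/(s*T)


def half_period(b_tesla):
    """Time in seconds for a 180 degree precession about a net field of b_tesla."""
    return math.pi / (GAMMA * b_tesla)


def field_for_half_period(t_seconds):
    """Net field in tesla whose half precession period equals t_seconds."""
    return math.pi / (GAMMA * t_seconds)


if __name__ == "__main__":
    b = field_for_half_period(1.42e-9)
    print(b, half_period(b))
```

Under these assumptions, the 1.42 ns pulse quoted above corresponds to a net in-plane field of roughly 12.6 mT. A larger net field shortens the half period, which is consistent with the shorter optimal pulse width at higher temperature.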

Fig. 11. Data program flow for the PWSA multi-write design.
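The multi-write flow of Fig. 11 (pre-read, comparison, then repeated writes up to a maximum count, as described in Section VI-A) lends itself to a short expected-iteration estimate. The sketch below assumes independent write errors with a fixed per-write WER, error-free reads and comparisons, and a 50% chance that a cell needs writing after the pre-read. The target WER used in the example is a hypothetical placeholder, and the function names are illustrative.

```python
import math


def max_writes(wer, target):
    """Smallest number of consecutive writes k such that wer**k <= target."""
    k = max(1, math.ceil(math.log(target) / math.log(wer)))
    while wer ** k > target:  # guard against floating-point rounding
        k += 1
    return k


def expected_write_rounds(wer, word_bits, target, p_needs_write=0.5):
    """Expected number of write rounds for one word, capped at max_writes.

    A cell is still wrong after k rounds with probability p_needs_write * wer**k,
    so the word needs more than k rounds with probability
    1 - (1 - p_needs_write * wer**k) ** word_bits. Summing these tail
    probabilities for k = 0 .. k_max - 1 gives the expected round count."""
    k_max = max_writes(wer, target)
    total = 0.0
    for k in range(k_max):
        p_word_done = (1.0 - p_needs_write * wer ** k) ** word_bits
        total += 1.0 - p_word_done
    return total


if __name__ == "__main__":
    # 0.06 is the 300 K WER quoted in the text for the 3 ns STT-RAM pulse;
    # the 1e-9 target is a hypothetical placeholder.
    print(max_writes(0.06, 1e-9))
    print(expected_write_rounds(0.06, 256, 1e-9))
    print(expected_write_rounds(0.06, 64, 1e-9))
```

A longer word raises the expected number of rounds because every cell in the word must succeed before the operation ends, which matches the word-size trend reported for Fig. 13.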

VI. CIRCUIT-LEVEL EVALUATION

A. MRAM Write/Read With PWSA Multi-Write Design

As is well known, peripheral circuits can improve the reliability of MRAMs at the expense of speed, power, and area [62]. Memory with multi-write schemes can significantly reduce write errors, e.g., incremental step pulse programming for Flash technology [63]. We utilize PWSA [25] to reduce the WER of MRAMs, where PWSA is a peripheral circuit designed for STT-RAM's and MeRAM's write and read operations. As MeRAM uses a unidirectional pulse to write both “1” and “0”, PWSA uses an operation called pre-read to check the stored data prior to a write, and no write is performed if the stored data matches the data to be written. In addition to pre-read, PWSA also enables a multi-write policy, which can perform an additional write after a write error. The data flow in Fig. 11 illustrates the PWSA multi-write design. Note that read failure, which is mainly caused by process variation and is a permanent failure (unlike write error), has been well analyzed in [21], [55], [64]. Read failure can be eliminated by chip test at the expense of yield loss. All pre-read, comparison, and read operations share one sense amplifier and are assumed to be error-free in the PWSA multi-write design.

We divide the multi-write flow into four steps: read (also pre-read, including pre-charge and sensing), load, comparison, and write (including control of logic circuit and MTJ switching). To obtain reasonable delay and energy consumption of the peripheral circuit, each sense amplifier is connected to a bit-line of 256 1T1M cells and a reference bit-line. The delay and energy for these steps are extracted from Spectre simulation of the PWSA circuit [25] using the 32 nm PTM HP model [65] and are listed in Table IV. To guarantee a good pulse shape for the write in MeRAM, a pre-charge operation is performed to raise the bit-line voltage to prior to turning on access transistors, where the pre-charge takes around 0.15 ns. Though STT-MTJs do not have strict requirements on the pulse shape, STT-RAM needs to raise the bit-line voltage to bias access transistors to offer the required write current, which takes similar delay and consumes more power due to its larger access transistors. Conversely, the read energy of STT-RAM is lower, because the low resistance of STT-MTJ allows lower read voltage than VC-MTJ. The difference in read delay is within 0.2 ns. MeRAM shows great advantage in the MTJ switching energy, whereas this energy is a bottleneck for


TABLE IV
ENERGY AND DELAY FOR OPERATIONS IN THE PWSA MULTI-WRITE CIRCUIT AT 300 K TEMPERATURE

STT-RAM due to the high leakage current caused by the low re-sistance of STT-MTJs. The switching energy can be reduced bypre-read, though pre-read is not mandatory for STT-RAM (i.e.,the STT-MTJ can be directly written with directional current).We utilize the PWSA multi-write design to achieve reliable

STT-RAM and MeRAM with the same acceptable WER andthen compare the expected latency and energy of writing a word.The acceptable WER is for a cell in one multi-writeoperation, which is slightly smaller than the soft error rate inDRAM technology [66] and can be handled by error-correctioncode (ECC) designs. A word is a set of bits to be written in par-allel, and the size of word varies from storage devices, e.g., amain memory usually writes 64 bits in parallel. The expectedwrite latency/energy is the sum of products of the delay/energyand probability for all possible scenarios (i.e., different num-bers of writes for different number of bits). In our calculations,we assume the memory system to store “1”s and “0”s in bal-ance, which means there is 50% probability that the bit to bewritten equals to the bit stored in the target MRAM cell. As themulti-write design writes only to the cells that fail in the pre-readcheck or previous writes, so the expected energy does not countthe write energy of the cells that do not need writes (i.e., en-ergy of the pre-read and comparison is always counted). Thereis a maximum allowed write times per multi-write operation toavoid infinite writes in case that permanently failed cells exist,which is calculated by the number of writes to achieve the ac-ceptable WER of , e.g., three for MeRAM at 300 K.We tried multiple write configurations for STT-RAM to ex-

plore tradeoffs between write latency and energy: with pre-read,without pre-read, and different write pulse widths for switchingfrom to including 3 ns, 6 ns, and 9 ns. For switching from

to , the pulse width is set to 3 ns, which exactly achievesthe acceptable WER. The for STT-RAM is chosen to 1.3V, which is the most efficient one in Fig. 10(a). The write pulseused for MeRAM is the optimized pulse at 300 K [1.26 V/1.42ns, see Fig. 10(b)]. The pre-read is mandatory for MeRAM.Fig. 12 shows the expected write energy and latency for

STT-RAM and MeRAM with the PWSA multi-write design at 300 K. Benefiting from the fast and energy-efficient switching of VC-MTJs, MeRAM shows substantial advantages in both speed and energy over STT-RAM. Among the STT-RAM configurations, the one with the shortest pulse width of 3 ns and pre-read gives the lowest expected energy, as most cells pass the comparison check in the pre-read and the first write. Nevertheless, it has the longest expected latency, because the pre-read adds latency and its high WER leads to the most write errors and write iterations. The STT-RAM with a 6 ns pulse and no pre-read is the fastest; in comparison,

Fig. 12. Expected word-write energy and latency for MeRAM and STT-RAM with the PWSA multi-write circuit. The word size is 256 bits, which is the number of bits written simultaneously. The WER/bit after multiple writes is minimized below . The labels 3 ns, 6 ns, and 9 ns on STT-RAM are single write pulse widths. Pre-read is not mandatory for STT-RAM. The top circled designs are STT-RAMs without pre-read. The bottom circled ones are STT-RAMs with pre-read, which incur the pre-read overhead but save unnecessary writes.

Fig. 13. Impact of word size and temperature on the expected write latency andenergy of MRAMs. All bars are normalized within each group to the expectedwrite latency of MRAM at 300 K with 64-bit word size.
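The expected word-write cost plotted in Figs. 12 and 13 can be approximated with a short calculation. The sketch below is a simplified model, not the authors' simulator: it assumes that cells fail independently, that one per-write WER applies to both switching directions, and that every write iteration has the same latency. It also leaves out the latency and energy of the pre-read and comparison steps. The function name, the unit latency and energy values, and the three-write cap in the example are illustrative assumptions; the 50% pre-read match probability and the 0.06 WER of the 3 ns STT-RAM pulse at 300 K are taken from the text.

```python
def expected_word_write(n_bits, wer, max_writes, p_need_write=0.5,
                        t_write=1.0, e_write=1.0):
    """Expected write iterations, latency, and energy for one word.

    n_bits       -- word size (cells written in parallel)
    wer          -- per-cell, per-write error rate (assumed equal for all cells)
    max_writes   -- cap on write iterations per multi-write operation
    p_need_write -- chance a cell fails the pre-read check (0.5 for balanced data)
    t_write      -- latency of one write iteration (arbitrary units, assumed)
    e_write      -- energy of writing one cell once (arbitrary units, assumed)
    """
    exp_iterations = 0.0
    exp_energy = 0.0
    for k in range(max_writes):
        # Probability that a given cell still holds the wrong value
        # before write iteration k + 1.
        p_cell_pending = p_need_write * wer ** k
        # Iteration k + 1 runs if at least one cell in the word is pending.
        exp_iterations += 1.0 - (1.0 - p_cell_pending) ** n_bits
        # Only pending cells are written, so only they consume write energy.
        exp_energy += n_bits * p_cell_pending * e_write
    return exp_iterations, exp_iterations * t_write, exp_energy


for n_bits in (64, 256):
    iters, latency, energy = expected_word_write(n_bits, wer=0.06, max_writes=3)
    print(f"{n_bits}-bit word: {iters:.2f} expected write iterations, "
          f"{energy:.1f} cell-writes of energy")
```

Under these assumptions the longer word needs more expected write iterations, which is the word-size trend described for Fig. 13.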

the 9 ns pulse has lower WER but higher latency and energy, because the benefit of the lower WER does not compensate for the overhead of its longer pulse. This also indicates that directly using a long pulse in STT-RAM to guarantee zero WER is not an efficient design in terms of energy and latency. The STT-RAM designs on the energy-latency Pareto front are 3 ns with pre-read, 6 ns without pre-read, and 6 ns with pre-read.

Fig. 13 shows the impact of word size and temperature on

the write latency. A longer word usually leads to more write iterations, as more cells are written, giving rise to more write errors. The STT-RAM with the 3 ns pulse is affected most by the word size due to its high WER. The STT-RAM with the 3 ns pulse and MeRAM are affected most by temperature, as their WERs increase the most from 300 K to 350 K (i.e., the WER of a MeRAM cell increases from to , and the WER of an STT-RAM cell with 3 ns pulse increases from 0.06 to 0.09). Moreover, the temperature-induced overhead increases with word size. As the comparison of the two STT-RAMs with 6 ns pulse illustrates, pre-read can mitigate the impact of temperature variation, because about 50% of writes


TABLE V
FAILURE TYPES AND FIT FOR A 16 MB MEMORY BANK. READS AND WRITES IN A BANK-HOUR ARE ASSUMED. THE READ DISTURBANCE RATE IS EXTRAPOLATED FROM SIMULATIONS. AS A COMPARISON, THE FIT OF SINGLE-BIT FAULT IN A 16 MB BANK IS ABOUT [66], AND THE FIT OF DDR BUS ERRORS IS ABOUT 100 [68]
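As a rough illustration of how FIT values such as those in Table V relate to per-operation error rates, the sketch below converts a failure probability per cell write into failures per billion bank-hours. The operation count and the error rate in the example are placeholders, not the values assumed in the table, and the function names are illustrative. The WER remaining after a multi-write operation is modeled as the single-write WER raised to the number of writes, which assumes independent write attempts.

```python
def fit_from_rate(p_fail_per_op, ops_per_bank_hour):
    """Failures per 1e9 bank-hours for a given per-operation failure probability."""
    return p_fail_per_op * ops_per_bank_hour * 1e9


def residual_wer(single_write_wer, n_writes):
    """Per-cell WER left after n_writes independent write attempts."""
    return single_write_wer ** n_writes


# Placeholder inputs (assumptions for illustration, not the paper's numbers).
ops_per_bank_hour = 1e9
single_wer = 1e-4

fit_single = fit_from_rate(residual_wer(single_wer, 1), ops_per_bank_hour)
fit_multi = fit_from_rate(residual_wer(single_wer, 3), ops_per_bank_hour)
print(f"FIT with one write:    {fit_single:.3g}")
print(f"FIT with three writes: {fit_multi:.3g}")
```

The linear conversion is adequate only while the per-operation failure probability is small; each extra write attempt then lowers the FIT by a factor equal to the single-write WER.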

are saved if pre-read is enabled. For all MRAM designs, temperature variation increases latency by at most 15%, whereas energy increases by at most 2.3% at 350 K relative to 300 K. A large portion of the energy overhead comes from bit-line charging (i.e., a 4% increase in this energy from 300 K to 350 K). Again, the energy overhead is small because most cells need only one write.

The delay and power overhead of pre-read and comparison are simulated and listed in Table IV. In this section, we analyze the area overhead of PWSA. One PWSA contains 37 transistors, and one regular STT-RAM sense amplifier contains 8 transistors [67]. Considering a bit-line size of 256 cells and four bit-lines sharing one sense amplifier, the area overhead is 2.7%. However, the area of a design also depends on the size of its transistors. The sense-amplifier transistors for STT-RAM are much larger than those for MeRAM, given that STT-RAM requires a larger write current. Indeed, the PWSA (37 transistors) for MeRAM occupies 20% less area than the regular sense amplifier (8 transistors) for STT-RAM.
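The quoted area overhead can be sanity-checked with a transistor-count estimate. The sketch below assumes one access transistor per cell and equally sized transistors, which the text notes does not hold in practice, so it is a rough check and not necessarily the accounting used by the authors. The constant and function names are illustrative.

```python
CELLS_PER_BITLINE = 256
BITLINES_PER_SENSE_AMP = 4
PWSA_TRANSISTORS = 37
REGULAR_SA_TRANSISTORS = 8


def pwsa_overhead(access_transistors_per_cell=1):
    """Extra transistors added by PWSA relative to the baseline array slice."""
    cell_transistors = (CELLS_PER_BITLINE * BITLINES_PER_SENSE_AMP
                        * access_transistors_per_cell)
    # Baseline slice: the cells served by one sense amplifier plus that amplifier.
    baseline = cell_transistors + REGULAR_SA_TRANSISTORS
    extra = PWSA_TRANSISTORS - REGULAR_SA_TRANSISTORS
    return extra / baseline


print(f"Estimated PWSA overhead: {pwsa_overhead():.1%}")
```

With these assumptions the estimate comes out just under 3%, the same range as the reported value.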

B. Failure Analysis and Error Correction

As listed in Table V, memory failures in STT-RAM and MeRAM are classified into four types: write errors, retention errors, read failures, and read disturbance. We use failure-in-time (FIT, the average number of failures in a billion device-hours) to represent the error rate for these failure types.

Write errors have been analyzed in Sections V and VI-A.

The multi-write scheme significantly reduces the write error rate. The write-error FIT for a 16-MB MeRAM bank decreases from over without the multi-write scheme to below with the multi-write scheme.

False switching of MTJs during the idle state is called a retention error. As mentioned in Section III, the VC-MTJs and STT-MTJs have been designed with enough margin in thermal stability to minimize retention errors. The FIT of retention errors in a 16-MB MRAM is calculated according to Table II and is listed in Table V.

Read failure due to the MTJ resistance variation [21], [55],

[64] is a persistent memory failure that stays in memory and frequently produces errors. More specifically, a large shape variation of an MTJ can lead to a significant resistance change, which decreases the sensing margin and results in read errors. Read errors are produced in all reads to a failed MTJ (one with a large resistance change), so the multi-write design also creates write errors on such cells because of the read step it involves. However, the multi-write design does not increase the read error rate, because such write errors occur only on failed MTJs, and failed MTJs are always read out incorrectly. The resistance change due to shape

Fig. 14. Read disturbance rate as a function of read voltage. The read disturbance rates for MeRAM and STT-RAM are extrapolated to the read voltage drop on the MTJs (0.48 V for VC-MTJs and 0.15 V for STT-MTJs).

variation is mainly caused by wafer-level process variation [1], which can be minimized by increased TMR or recently developed peripheral circuit designs, e.g., the local-reference reading scheme [69] and the self-reference scheme [70]. In our experimental setup, the AP and P resistance for STT-RAM (MeRAM) are and respectively, and reference resistors are . Reference resistors are fabricated with a traditional CMOS process, and their variation is negligible compared to that of MTJs. The MTJ resistance variation is assumed to follow a Gaussian distribution [21], [55]. In [51], the standard deviation of MTJ resistance is measured as 1.5% of the mean resistance in a 4-Mb MRAM array. Accordingly, we calculate its MgO thickness variation in Table II and estimate a resistance standard deviation of 2.6% for STT-MTJs and VC-MTJs with a diameter of 60 nm. By setting a 0.05 V sensing margin for the sense amplifiers to operate functionally (i.e., 0.05 V is enough for the limited variation of large-sized sense amplifiers), our sensing scheme (using PWSA and a 2 ns sensing time) can tolerate 17.5% resistance variation (i.e., STT-RAM: 20% in AP and 45% in P; MeRAM: 17.5% in AP and 40% in P). Note that we consider both access transistor variation and MTJ shape variation, but the access transistor variation has negligible impact because of the low sensing voltage (0.2 V for STT-RAM and 0.48 V for MeRAM), at which the sensing current is dominated by the MTJs. The read failure rate of an MTJ due to resistance variation is , which gives rise to a 99.22% yield for a 16-MB bank array.

Redundancy through spare columns is a common technique for yield improvement. By adding one spare column (every column has 256 cells) to every mat (a mat contains multiple rows and columns; e.g., a 16-MB bank has 64 mats, and every mat has 256 rows and 8192 columns), the yield of a 16-MB memory bank is improved to 99.994% with 0.01% area overhead.

The failure rate of read disturbance is close to zero when short

read pulse and low read voltage (0.48 V on VC-MTJs and 0.15 V on STT-MTJs) are used [24]. We have simulated the read disturbance of MeRAM and STT-RAM as functions of read voltage with the precision of using the CUDA LLG-based model and extrapolated the read disturbance to our read voltage using polynomial models, as shown in Fig. 14.

Error-correction code (ECC) is a common technique to protect memory from memory errors. We use MEMRES (a fast system-level memory reliability simulator) [71] to simulate


TABLE VI
WRITE/READ LATENCY/ENERGY FOR ONE WRITE/READ IN X8 16-MB STT-RAM AND MERAM BANKS. ONE WRITE/READ OPERATES ON 64 BITS (72 BITS IN MEMORY BANKS FOR IN-MEMORY ECC DETECTION AND CORRECTION) IN A ROW IN BURST MODE

and analyze the impact of MRAM-introduced failures on an 8-GB memory. The memory comprises 512 16-MB banks and is protected by in-memory ECC (SECDED), located in the memory banks, and in-controller SECDED/Chipkill, located in the memory controller [72], [73]. MRAM-introduced failures (listed in Table V) are included in the MEMRES simulations in addition to typical failures induced by memory logic circuits (e.g., bank failure, row failure, and column failure [71]). Based on the simulated results, the probability that MRAM-introduced failures cause an ECC-uncorrectable error in an 8-GB memory is for a five-year operating time (i.e., no such error is found in 10 000 000 five-year memory reliability simulations), indicating that traditional ECC designs are strong enough to handle failures in MeRAM and STT-RAM with the PWSA multi-write design.
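Observing no uncorrectable error across a finite number of simulation runs bounds the underlying probability rather than showing that it is zero. As an illustration that is not part of the paper's analysis, the following sketch computes the exact one-sided upper confidence bound for zero observed events in independent trials (the basis of the familiar "rule of three"). The 95% confidence level and the function name are assumptions.

```python
import math


def zero_event_upper_bound(n_trials, confidence=0.95):
    """Largest per-trial probability consistent with zero events in n_trials.

    Solves (1 - p) ** n_trials = 1 - confidence for p.
    """
    return -math.expm1(math.log(1.0 - confidence) / n_trials)


n_runs = 10_000_000  # five-year reliability simulations reported in the text
bound = zero_event_upper_bound(n_runs)
print(f"95% upper bound on per-run failure probability: {bound:.2e}")
```

For large trial counts the bound is close to 3 divided by the number of runs, so ten million clean runs support a per-run probability of a few parts in ten million at most.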

C. Latency, Energy, and Area of a 16-MB MRAM Bank

In order to include the energy and latency of ECC designs, we compare STT-RAM and MeRAM at the memory-bank level. With inputs of MTJ cell area (see Table II) and bit-line write/read latency/energy (see Table IV), we use NVSim [74] to obtain the area, energy, and latency of a 16-MB STT-RAM bank and a 16-MB MeRAM bank. In-controller ECC commonly exists in current server-class processors for DRAM error detection and correction. In-memory ECC is a new technology, which corrects errors individually in memory banks. We count only the power and latency of in-memory ECC in our STT-RAM and MeRAM comparison, because in-memory ECC can correct all MRAM-introduced failures, and in-controller ECC is already used for current memory technologies. The in-memory ECC detection and correction latencies are about 0.34 ns and 4.4 ns, respectively [75], and the encoding latency is assumed to be 0.3 ns (i.e., it should be slightly shorter than detection). The energy of encoding, detection, and correction is below 1 pJ per access, which is negligible compared to other memory components. The area overhead of in-memory ECC is about 12.5%.

We summarize the area, latency, and energy of one access to 16-MB STT-RAM and MeRAM banks in Table VI (every access is 64 bits in burst mode). Owing to the smaller size of VC-MTJs, MeRAM has a smaller bank area, shorter interconnect, and smaller peripheral circuits, which translate into less energy and shorter latency in logic operations such as row decoding and MUX selection. The bank read latency and energy are dominated by these logic operations; thus MeRAM shows faster read speed and lower read energy. For write operations, the VC-MTJs' small cell size, shorter write pulse, and lower write energy jointly give MeRAM the advantage in both write energy and latency.
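The 72-bit stored word and the roughly 12.5% area overhead quoted for in-memory ECC are consistent with a standard (72, 64) SECDED code. Assuming an extended Hamming construction (the text does not specify which code is used), the number of check bits can be derived as follows; the function name is illustrative.

```python
def secded_check_bits(data_bits):
    """Check bits for an extended Hamming SECDED code over data_bits bits."""
    r = 1
    # Single-error correction needs 2**r >= data_bits + r + 1 (Hamming bound).
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


data_bits = 64
check_bits = secded_check_bits(data_bits)
print(f"codeword: {data_bits + check_bits} bits, "
      f"storage overhead: {check_bits / data_bits:.1%}")
```

For 64 data bits this gives 8 check bits, i.e., a 72-bit codeword and 12.5% extra storage.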

VII. CONCLUSION

We comprehensively compare the two promising nonvolatile magnetic memory technologies, STT-RAM and MeRAM, in the circuit context with respect to reliability, energy, speed, area, and scalability. MeRAM has higher WER than STT-RAM under process and temperature variation, but by utilizing a multi-write design, both MRAMs are able to achieve an acceptably low WER. With clear advantages in MTJ switching delay and energy, MeRAM outperforms STT-RAM by 83% in write speed, 67.4% in write energy, 138% in read speed, and 28.2% in read energy. In terms of density, VCMA allows the use of minimum-sized access transistors, which helps MeRAM achieve twice the density of STT-RAM at the 32 nm node, and the density advantage is expected to increase at smaller nodes, indicating that MeRAM has better density scalability. Regarding the challenge of technology scaling, simply shrinking dimensions does not save energy and introduces fabrication defects for both technologies; more effort should be spent on discovering materials with higher polarization efficiency.

ACKNOWLEDGMENT

The authors would like to thank R. Dorrance, D. Markovic, J. Alzate, J. Nath, D. Srinivasan, and D. Matic for their help and contribution in developing the LLG equation-based model and simulator.

REFERENCES

[1] S. Tehrani et al., “Progress and outlook for MRAM technology,” IEEE Trans. Magn., vol. 35, no. 5, pp. 2814–2819, Sep. 1999.

[2] B. Fang et al., “Tunnel magnetoresistance in thermally robust Mo/CoFeB/MgO tunnel junction with perpendicular magnetic anisotropy,” AIP Adv., vol. 5, no. 6, p. 067116, 2015.

[3] W. Skowroński et al., “Underlayer material influence on electric-field controlled perpendicular magnetic anisotropy in CoFeB/MgO magnetic tunnel junctions,” Phys. Rev. B, vol. 91, no. 18, p. 184410, 2015.

[4] C. Heide, “Spin currents in magnetic films,” Phys. Rev. Lett., vol. 87, no. 19, p. 197201, 2001.

[5] D. C. Worledge et al., “Spin torque switching of perpendicular Taform,” Appl. Phys. Lett., vol. 98, no. 2, p. 2501, 2011.

[6] ITRS, 2008 2011 [Online]. Available: http://www.itrs.net/about.html

[7] Z. Sun et al., “Multi retention level STT-RAM cache designs with a dynamic refresh scheme,” in Proc. 44th Annu. IEEE/ACM Int. Symp. Microarchitect. ACM, 2011, pp. 329–338.

[8] C. W. Smullen et al., “Relaxing non-volatility for fast and energy-efficient STT-RAM caches,” in Proc. IEEE 17th Int. Symp. IEEE High Performance Comput. Architect., 2011, pp. 50–61.

[9] W. Xu et al., “Design of last-level on-chip cache using spin-torque transfer RAM (STT RAM),” IEEE Trans. Very Large Scale (VLSI) Syst., vol. 19, no. 3, pp. 483–493, Mar. 2011.

[10] A. Jog et al., “Cache revive: Architecting volatile STT-RAM caches for enhanced performance in CMPs,” in Proc. 49th Annu. Design Automat. Conf., 2012, pp. 243–252.

[11] E. Kultursay, M. Kandemir, A. Sivasubramaniam, and O. Mutlu, “Evaluating STT-RAM as an energy-efficient main memory alternative,” in Proc. IEEE Int. Symp. Performance Anal. Syst. Software, 2013, pp. 256–267.

[12] S. Pelley, P. M. Chen, and T. F. Wenisch, “Memory persistency,” in Proc. 41st Annu. Int. Symp. Comput. Archit., 2014, pp. 265–276.

[13] K. C. Chun et al., “A scaling roadmap and performance evaluation of in-plane and perpendicular MTJ based STT-MRAMs for high-density cache memory,” IEEE J. Solid-State Circuits, vol. 48, no. 2, pp. 598–610, Feb. 2013.

[14] P. K. Amiri, P. Upadhyaya, J. G. Alzate, and K. L. Wang, “Electric-field-induced thermally assisted switching of monodomain magnetic bits,” J. Appl. Phys., vol. 113, no. 1, p. 013912, 2013.

[15] J. G. Alzate et al., “Voltage-induced switching of nanoscale magnetic tunnel junctions,” in Proc. IEEE Int. Electron Devices Meet., 2012, pp. 29–5.


[16] R. Dorrance et al., “Diode-MTJ crossbar memory cell using voltage-induced unipolar switching for high-density MRAM,” IEEE Electron Device Lett., vol. 34, no. 6, pp. 753–755, Jun. 2013.

[17] S. Kanai et al., “Electric field-induced magnetization reversal in a perpendicular-anisotropy CoFeB-MgO magnetic tunnel junction,” Appl. Phys. Lett., vol. 101, no. 12, p. 122403, 2012.

[18] Y. Shiota et al., “Pulse voltage-induced dynamic magnetization switching in magnetic tunneling junctions with high resistance-area product,” Appl. Phys. Lett., vol. 101, no. 10, p. 102406, 2012.

[19] Y. Shiota et al., “Induction of coherent magnetization switching in a few atomic layers of FeCo using voltage pulses,” Nature Mater., vol. 11, no. 1, pp. 39–43, 2012.

[20] W.-G. Wang, M. Li, S. Hageman, and C. L. Chien, “Electric-field-assisted switching in magnetic tunnel junctions,” Nature Mater., vol. 11, no. 1, pp. 64–68, 2012.

[21] J. Li, C. Augustine, S. Salahuddin, and K. Roy, “Modeling of failure probability and statistical design of spin-torque transfer magnetic random access memory (STT MRAM) array for yield enhancement,” in Proc. 45th ACM/IEEE Design Automat. Conf., 2008, pp. 278–283.

[22] Z. Xu et al., “Compact modeling of STT-MTJ devices,” Solid-State Electron., vol. 102, pp. 76–81, 2014.

[23] P. Wang et al., “A thermal and process variation aware MTJ switching model and its applications in soft error analysis,” in Proc. IEEE Int. Conf. Comput.-Aided Design, 2012, pp. 720–727.

[24] Y. Zhang, X. Wang, and Y. Chen, “STT-RAM cell design optimization for persistent and non-persistent error rate reduction: A statistical design view,” in Proc. Int. Conf. Comput.-Aided Design, 2011, pp. 471–477.

[25] H. Lee et al., “Design of a fast and low-power sense amplifier and writing circuit for high-speed MRAM,” IEEE Trans. Magn., vol. 51, no. 5, pp. 1–7, May 2015.

[26] R. Sbiaa et al., “Reduction of switching current by spin transfer torque effect in perpendicular anisotropy magnetoresistive devices,” J. Appl. Phys., vol. 109, no. 7, p. 07C707, 2011.

[27] Y. Zhang et al., “Compact modeling of perpendicular-anisotropy CoFeB/MgO magnetic tunnel junctions,” IEEE Trans. Electron Devices, vol. 59, no. 3, pp. 819–826, Mar. 2012.

[28] K. J. Lee, O. Redon, and B. Dieny, “Analytical investigation of spin-transfer dynamics using a perpendicular-to-plane polarizer,” Appl. Phys. Lett., vol. 86, no. 2, p. 022505, 2005.

[29] Y. Huai, “Spin-transfer torque MRAM (STT-MRAM): Challenges and prospects,” AAPPS Bull., vol. 18, no. 6, pp. 33–40, 2008.

[30] J. G. Alzate et al., “Temperature dependence of the voltage-controlled perpendicular anisotropy in nanoscale MgO|CoFeB|Ta magnetic tunnel junctions,” Appl. Phys. Lett., vol. 104, no. 11, p. 112410, 2014.

[31] X. Zhang et al., “An electrothermal/electrostatic dual driven MEMS scanner with large in-plane and out-of-plane displacement,” in Proc. IEEE Int. Conf. Opt. MEMS Nanophoton., 2013, pp. 13–14.

[32] X. Zhang, B. Li, X. Li, and H. Xie, “A robust, fast electrothermal micromirror with symmetric bimorph actuators made of copper/tungsten,” in Proc. 18th Int. Conf. IEEE Solid-State Sensors, Actuators and Microsystems, 2015, pp. 912–915.

[33] J. C. Slonczewski, “Current-driven excitation of magnetic multilayers,” J. Magnetism Magnetic Mater., vol. 159, no. 1, pp. L1–L7, 1996.

[34] G. D. Fuchs et al., “Adjustable spin torque in magnetic tunnel junctions with two fixed layers,” Appl. Phys. Lett., vol. 86, no. 15, p. 152509, 2005.

[35] J. C. Sankey et al., “Measurement of the spin-transfer-torque vector in magnetic tunnel junctions,” Nature Phys., vol. 4, no. 1, pp. 67–71, 2008.

[36] M. Hosomi et al., “A novel nonvolatile memory with spin torque transfer magnetization switching: Spin-RAM,” in Proc. IEEE Int. Electron Devices Meet., 2005, pp. 459–462.

[37] F. Bonell et al., “Large change in perpendicular magnetic anisotropy induced by an electric field in FePd ultrathin films,” Appl. Phys. Lett., vol. 98, no. 23, p. 232510, 2011.

[38] T. Nozaki et al., “Voltage-induced magnetic anisotropy changes in an ultrathin FeB layer sandwiched between two MgO layers,” Appl. Phys. Exp., vol. 6, no. 7, p. 073005, 2013.

[39] W. Wu et al., Exploiting Parallelism by Data Dependency Elimination: A Case Study of Circuit Simulation Algorithms, 2012.

[40] X. Chen et al., “An escheduler-based data dependence analysis and task scheduling for parallel circuit simulation,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 10, pp. 702–706, Oct. 2011.

[41] W. Wu et al., “FPGA accelerated parallel sparse matrix factorization for circuit simulations,” in Reconfigurable Computing: Architectures, Tools and Applicat. New York: Springer, 2011, pp. 302–315.

[42] R. F. Freitas and W. W. Wilcke, “Storage-class memory: The next storage system technology,” IBM J. Res. Develop., vol. 52, no. 4, pp. 439–447, 2008.

[43] A. Nigam et al., “Delivering on the promise of universal memory for spin-transfer torque RAM (STT-RAM),” in Proc. 17th IEEE/ACM Int. Symp. Low-Power Electron. Design, 2011, pp. 121–126.

[44] N. D. Rizzo et al., “Thermally activated magnetization reversal in submicron magnetic tunnel junctions for magnetoresistive random access memory,” Appl. Phys. Lett., vol. 80, no. 13, pp. 2335–2337, 2002.

[45] S. Ohuchida, K. Ito, and T. Endoh, “Impact of sub-volume excitation on improving overdrive delay product of sub-40 nm perpendicular magnetic tunnel junctions in adiabatic regime and its beyond,” Jpn J. Appl. Phys., vol. 54, no. 4S, p. 04DD05, 2015.

[46] S. Fu, V. Lomakin, A. Torabi, and B. Lengsfield, “Modeling perpendicular magnetic multilayered oxide media with discretized magnetic layers,” in Proc. IEEE Magn. Conf., 2015, pp. 1–1.

[47] W.-C. Wang and P. Gupta, “Efficient layout generation and evaluation of vertical channel devices,” in Proc. 2014 IEEE/ACM Int. Conf. Comput.-Aided Design, 2014, pp. 550–556.

[48] J. Z. Sun et al., “Effect of subvolume excitation and spin-torque efficiency on magnetic switching,” Phys. Rev. B, vol. 84, no. 6, p. 064413, 2011.

[49] T. Ohsawa et al., “1 Mb 4T-2MTJ nonvolatile STT-RAM for embedded memories using 32 b fine-grained power gating technique with 1.0 ns/200 ps wake-up/power-off times,” in Proc. Symp. IEEE VLSI Circuits, 2012, pp. 46–47.

[50] K. T. Nam et al., “Switching properties in spin transfer torque MRAM with sub-50 nm MTJ size,” in IEEE Non-Volatile Memory Technol. Symp., 2006, pp. 49–51.

[51] R. W. Dave et al., “MgO-based tunnel junction material for high-speed toggle magnetic random access memory,” IEEE Trans. Magn., vol. 42, no. 8, pp. 1935–1939, Aug. 2006.

[52] J. M. Slaughter et al., “Magnetic tunnel junction materials for electronic applications,” JOM, vol. 52, no. 6, p. 11, 2000.

[53] Y. Ye, F. Liu, S. Nassif, and Y. Cao, “Statistical modeling and simulation of threshold variation under dopant fluctuations and line-edge roughness,” in Proc. 45th ACM/IEEE Design Automat. Conf., 2008, pp. 900–905.

[54] E. Y. Chen et al., “Comparison of oxidation methods for magnetic tunnel junction material,” J. Appl. Phys., vol. 87, no. 9, pp. 6061–6063, 2000.

[55] J. Li, H. Liu, S. Salahuddin, and K. Roy, “Variation-tolerant spin-torque transfer (STT) MRAM array for yield enhancement,” in Proc. IEEE Custom Integr. Circuits Conf., 2008, pp. 193–196.

[56] J.-Y. Park et al., “Etching of CoFeB using CO/NH3 in an inductively coupled plasma etching system,” J. Electrochem. Soc., vol. 158, no. 1, pp. H1–H4, 2011.

[57] G. Leung et al., “An evaluation framework for nanotransfer printing-based feature-level heterogeneous integration in VLSI circuits,” IEEE Trans. Very Large Scale (VLSI) Syst., 2015, to be published.

[58] S. Wang, A. Pan, C. O. Chui, and P. Gupta, “PROCEED: A Pareto optimization-based circuit-level evaluator for emerging devices,” in Proc. 19th Asia South Pacific IEEE Design Automat. Conf., 2014, pp. 818–824.

[59] S. Wang, A. Pan, C. O. Chui, and P. Gupta, “PROCEED: A Pareto optimization-based circuit-level evaluator for emerging devices,” IEEE Trans. Very Large Scale (VLSI) Syst., vol. 24, no. 1, pp. 192–205, Jan. 2016.

[60] S. Wang et al., “Evaluation of digital circuit-level variability in inversion-mode and junctionless FinFET technologies,” IEEE Trans. Electron Devices, vol. 60, no. 7, pp. 2186–2193, Jul. 2013.

[61] R. Scheuerlein et al., “A 10 ns read and write non-volatile memory array using a magnetic tunnel junction and FET switch in each cell,” in Proc. IEEE Int. Solid-State Circuits Conf., 2000, pp. 128–129.

[62] W. Wu, F. Gong, G. Chen, and L. He, “A fast and provably bounded failure analysis of memory circuits in high dimensions,” in Proc. 19th IEEE Asia South Pacific Design Automat. Conf., 2014, pp. 424–429.

[63] K.-D. Suh et al., “A 3.3 V 32 Mb NAND flash memory with incremental step pulse programming scheme,” IEEE J. Solid-State Circuits, vol. 30, no. 11, pp. 1149–1156, 1995.

[64] R. Dorrance et al., “Scalability and design-space analysis of a 1T-1MTJ memory cell for STT-RAMs,” IEEE Trans. Electron Devices, vol. 59, no. 4, pp. 878–887, Apr. 2012.

[65] PTM Model [Online]. Available: http://ptm.asu.edu/


[66] V. Sridharan and D. Liberty, “A study of DRAM failures in the field,” in Proc. Int. Conf. High Performance Comput., Network., Storage Anal., 2012, p. 76.

[67] C.-T. Cheng, Y.-C. Tsai, and K.-H. Cheng, “A high-speed current mode sense amplifier for spin-torque transfer magnetic random access memory,” in Proc. 53rd IEEE Int. Midwest Symp. Circuits Syst., 2010, pp. 181–184.

[68] N. Miura, K. Kasuga, M. Saito, and T. Kuroda, “An 8Tb/s 1pJ/b 0.8mm2/Tb/s QDR inductive-coupling interface between 65 nm CMOS GPU and DRAM,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2010, pp. 436–437.

[69] U. K. Klostermann et al., “A perpendicular spin torque switching based MRAM for the 28 nm technology node,” in Proc. IEEE Int. Electron Devices Meet., 2007, pp. 187–190.

[70] Y. Chen et al., “A nondestructive self-reference scheme for spin-transfer torque random access memory (STT-RAM),” in Proc. DATE. IEEE, 2010, pp. 148–153.

[71] S. Wang, H. Hu, H. Zheng, and P. Gupta, “MEMRES: A fast memory system reliability simulator,” in SELSE: 12th IEEE Workshop Silicon Errors Logic System Effects, 2015.

[72] M. Blaum, R. Goodman, and R. Mceliece, “The reliability of single-error protected computer memories,” IEEE Trans. Comput., vol. 37, no. 1, pp. 114–119, Jan. 1988.

[73] T. J. Dell, “A white paper on the benefits of chipkill-correct ECC for PC server main memory,” IBM Microelectron. Division, pp. 1–23, 1997.

[74] X. Dong, C. Xu, Y. Xie, and N. P. Jouppi, “NVSim: A circuit-level performance, energy, and area model for emerging nonvolatile memory,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 7, pp. 994–1007, Jul. 2012.

[75] H. Duwe, X. Jian, and R. Kumar, “Correction prediction: Reducing error correction latency for on-chip memories,” in Proc. IEEE 21st Int. Symp. High Performance Comput. Architect., 2015, pp. 463–475.

Shaodi Wang (S'12) received the B.S. degree from the Electronics Engineering and Computer Science Department, Division of Microelectronic, Peking University, Beijing, China, and the M.S. degree in electrical engineering from the University of California, Los Angeles, CA, USA, where he is currently a fifth-year Ph.D. degree student in the NanoCAD lab, Department of Electrical Engineering, advised by Prof. P. Gupta.

His research interests include emerging memory and device technology circuit- and system-level design, evaluation and optimization, and modeling for manufacturing.

Mr. Wang received the Department Fellowship Award in September, 2011.

Hochul Lee (S'13) received the B.S. degree in electrical engineering from Korea University, Seoul, South Korea, in February 2005. In September 2005, he joined Semiconductor Material Device Lab (SMDL), Seoul National University, Seoul, South Korea, to pursue his M.S. degree. He joined the Device Research Laboratory, University of California, Los Angeles, CA, USA, and is currently a Ph.D. degree candidate exploring MTJs based hybrid CMOS circuit.

After graduating with the M.S. degree, he worked for Samsung Electronics Flash memory circuit design team until July 2012.

Farbod Ebrahimi received the B.E.E. degree and M.S. degree in electrical engineering from the University of Minnesota, Minneapolis, MN, USA, in 2010 and 2012, respectively.

In November 2013, he joined the Device Research Laboratory (DRL), University of California, Los Angeles, CA, USA, as a visiting scientist. In August 2014, alongside his position at DRL, he joined Inston Inc., a startup working towards the realization of MeRAM. He is currently exploring voltage controlled MTJs for memory and logic applications, and spinwave systems for RF and logic applications.

Pedram Khalili Amiri (M'05) received the B.Sc. degree from Sharif University of Technology, Tehran, Iran, in 2004, and the Ph.D. degree (cum laude) from Delft University of Technology (TU Delft), Delft, The Netherlands, in 2008, both in electrical engineering.

He joined the Department of Electrical Engineering at the University of California, Los Angeles, CA, USA, in 2009, where he is currently an Assistant Adjunct Professor. His professional activities have included serving as a Guest Editor for Spin, and serving on the technical program committee of the Joint MMM/Intermag Conference.

Dr. Amiri was a best student paper award finalist at the IEEE International Magnetics Conference (Intermag) in 2008.

Kang L. Wang (F’92) received the B.S. degree from National Cheng Kung University, Taiwan, and the M.S. and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, MA, USA.

He is currently a Distinguished Professor and holds the Raytheon Chair Professorship in Physical Science and Electronics in the Electrical Engineering Department of the University of California, Los Angeles, CA, USA. He served as an Editor for Artech House and other publications. His research areas include nanoscale physics and materials, topological insulators, and spintronics and devices.

Dr. Wang is a member of the American Physical Society. He also served as the Editor-in-Chief of IEEE TRANSACTIONS ON NANOTECHNOLOGY.

Puneet Gupta received the B.Tech degree in electrical engineering from Indian Institute of Technology, Delhi, India, in 2000, and the Ph.D. degree in 2007 from University of California, San Diego, CA, USA.

He is currently a faculty member of the Electrical Engineering Department at UCLA. He co-founded Blaze DFM Inc. (acquired by Tela Inc.) in 2004 and served as its product architect until 2007. He has authored over 130 papers, 16 U.S. patents, a book, and a book chapter. He currently leads the IMPACT+ center which focuses on future semiconductor technologies. His research has focused on building high-value bridges across application-architecture-implementation-fabrication interfaces for lowered cost and power, increased yield and improved predictability of integrated circuits and systems.

Dr. Gupta is a recipient of NSF CAREER award, ACM/SIGDA Outstanding New Faculty Award, SRC Inventor Recognition Award and IBM Faculty Award.
