Device Technology for Peta Computing

June 21, 2002

Masao Fukuma

NEC Laboratories, NEC Corporation

The World's First CMOS Supercomputer LSI

• 290 MFLOPS
• 0.25-um CMOS
• Supply voltage: 2.5V
• Clock: 125MHz
• No. of transistors: 3M

SX-4 @1996

Earth Simulator

Japanese ‘Computenik’!? (New York Times, 2002)

Performance:
• 40.0 TFLOPS (theoretical)
• 35.7 TFLOPS (Linpack)

Footprint: 65m x 50m

Performance:
= 8 GFLOPS / LSI * 5120
= 64 GFLOPS / node * 640
= 128 GFLOPS / box * 320

System memory: 10 TByte
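The three performance products on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the names `LEVELS`, `peak_tflops`, `peaks` and `linpack_efficiency` are made up for this example, and the conversion assumes decimal units (1 TFLOPS = 1000 GFLOPS).

```python
# Peak-performance check for the Earth Simulator, using the slide's figures.
# Each tuple: (level name, GFLOPS per unit, number of units).
LEVELS = [
    ("LSI", 8, 5120),
    ("node", 64, 640),
    ("box", 128, 320),
]


def peak_tflops(gflops_per_unit, units):
    """Aggregate peak in TFLOPS, assuming 1 TFLOPS = 1000 GFLOPS."""
    return gflops_per_unit * units / 1000


peaks = {name: peak_tflops(g, n) for name, g, n in LEVELS}

# Ratio of the quoted Linpack result to the quoted theoretical peak.
linpack_efficiency = 35.7 / 40.0

if __name__ == "__main__":
    for name, tflops in peaks.items():
        print(f"{name}: {tflops:.2f} TFLOPS")
    print(f"Linpack efficiency: {linpack_efficiency:.1%}")
```

All three products give the same aggregate figure, consistent with 8 LSIs per node and 2 nodes per box, and the quoted Linpack result is a little under 90% of the quoted theoretical peak.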

Single-Chip Vector Supercomputer

• Process: 0.15um CMOS, 8-layer copper
• Frequency: 500MHz (system)
• Performance: 8 GFLOPS

Multi-GFLOPS Processor

[Figure: floating point execution (GFLOPS, log scale from 0.1 to 10.0) vs. year (1995, 1998, 2001).]

• SX-4: 0.35µm CMOS, 4M Tr, 125 MHz x 2 = 0.25 GFLOPS
• SX-5: 0.25µm CMOS, 15M Tr, 250 MHz x 4 = 1 GFLOPS
• Earth Sim.: 0.15µm CMOS, 8 GFLOPS

High-speed Design with Lower Power

MPU Performance Trend

[Figure: MPU frequency (Hz, 10M to 10G) and number of transistors (1/chip, 100K to 1G) vs. year (1985 to 2010), for MPUs for supercomputers and MPUs for PCs.]

Technology/Power Trends for Peta-FLOPS (Scientific computing)

[Figure: performance (FLOPS, 1G to 10P) vs. system power (VA, 1K to 100M), with technology generations 0.25µm, 0.15µm, 0.08µm, 0.05µm and 0.03µm.]

NEC CMOS (UX Series) Roadmap

[Figure: roadmap chart, CY1998 to 2007, with ASIC and MPU tracks. Entries as labeled:]

• UC3: 180nm, 1.8V
• UR3: 150nm, 1.8V
• UR3H: 150nm, 1.8V
• UX4: 150nm (Lg: 130nm), 1.5V
• UX4 (Cu)
• UR4
• UC4: 0.12um, 1.2V
• UX5 (Cu): 130nm (Lg: 95nm), 1.2V
• UX6 (Cu): 90nm (Lg: 65nm), 1.0V
• UX7 (Cu): 65nm (Lg: 45nm), 0.9V
• UX8 (Cu, preliminary): 45nm (Lg: 30nm), 0.8V
• UX5A: 130nm (Lg: 77-60nm), 1.2V
• UX6A: 90nm (Lg: 48-38nm), 1.0V
• UX7A, UX8A (preliminary): 65nm (Lg: 30-24nm), 0.9V; 65nm (Lg: 20nm), 0.9V

ITRS Technology Roadmap 2001

YEAR
  2001     2002     2003     2004     2005     2006     2007     2010     2013     2016
MPU physical gate length (nm)
  65       53       45       37       32       28       25       18       13       9
Physical gate length, low-standby power (LSTP) (nm)
  90       75       65       53       45       37       32       22       16       11
Nominal Ion at 25 C (mA/mm) [NMOS] high-performance
  900      900      900      900      900      900      900      1200     1500     1500
Nominal sub-threshold current (at 25 C) (pA/µm) [NMOS] LSTP
  1        1        1        1        1        1        1        3        7        10
Equivalent physical oxide thickness Tox (nm), high-performance
  1.3-1.6  1.2-1.5  1.1-1.6  0.9-1.4  0.8-1.3  0.7-1.2  0.6-1.1  0.5-0.8  0.4-0.6  0.4-0.5
Lgate 3σ variation (nm)
  6.31     5.3      4.46     3.75     3.15     2.81     2.5      1.77     1.25     0.88
Gate electrode sheet Rs (Ω/sq)
  5        5        5        5        5        5        5        5        6        7
Silicide thickness (nm)
  35.8     29.2     24.8     20.4     17.6     15.4     13.8     9.9      7.2      5
Contact silicide sheet Rs (Ω/sq)
  4.2      5.1      6.1      7.4      8.5      9.7      10.9     15.2     21.0     30.3
Drain extension Xj (nm)
  27-45    22-36    19-31    15-25    13-22    12-19    10-17    7-12     5-9      4-6
Number of metal levels
  8        8        8        9        10       10       10       10       11       11
Local wiring pitch (nm)
  350      295      245      210      185      170      150      105      75       50
Intermediate wiring pitch (nm)
  450      380      320      265      240      215      195      135      95       65
Minimum global wiring pitch (nm)
  670      565      475      460      360      320      290      205      140      100
Conductor effective resistivity (µΩ-cm), Cu intermediate wiring
  2.2      2.2      2.2      2.2      2.2      2.2      2.2      2.2      2.2      2.2
Barrier/cladding thickness (for Cu intermediate wiring) (nm)
  16       14       12       10       9        8        7        5        3.5      2.5
Interlevel metal insulator, effective dielectric constant (k)
  3.0-3.6  3.0-3.6  3.0-3.6  2.6-3.1  2.6-3.1  2.6-3.1  2.3-2.7  2.1      1.9      1.8
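One way to read the roadmap rows is as a compound annual scaling rate. The sketch below takes the first (2001) and last (2016) entries of two rows from the table and computes the constant per-year multiplier that would connect them. The helper name `annual_factor` is made up for this example, and a constant yearly rate is an assumption made for illustration, not something the roadmap states.

```python
# Compound annual scaling factor between two roadmap entries.
def annual_factor(start_value, end_value, years):
    """Constant per-year multiplier that takes start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years)


YEARS = 2016 - 2001

# MPU physical gate length (nm) from the table: 65 in 2001, 9 in 2016.
gate_factor = annual_factor(65, 9, YEARS)

# Local wiring pitch (nm) from the table: 350 in 2001, 50 in 2016.
pitch_factor = annual_factor(350, 50, YEARS)

if __name__ == "__main__":
    print(f"Gate length: x{gate_factor:.3f} per year")
    print(f"Local wiring pitch: x{pitch_factor:.3f} per year")
```

Under this assumption both rows shrink by roughly 12% per year, so the wiring pitch tracks the gate length closely over the roadmap period.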

Changes in VLSI R&D

Scaling has been driving VLSI.

However, conventional scaling is less effective for sub-100nm SOC.

Why?

• Application oriented
  Mobile: low Vdd, thin Ox, low leakage current
  High end: minimum Lg, low Vt
• Physical/practical limits
  Thin Ox, subthreshold leak, lithography

CMOS Development

[Figure: SOC performance vs. gate length (nm, 400 down to 10), comparing conventional scaling, equivalent scaling and flexible parameter CMOS.]

Technology Challenge for sub-0.1µm CMOS

• Gate electrode (drivability, parasitic RC): low depletion, low R, high drivability
• Gate insulator (drivability, SCE): thin Tox, high reliability, low leakage, high mobility
• Channel design (drivability, SCE): halo, sharp profile
• Junction depth (SCE, parasitic RC): shallow xj, high doping, low defect

[Figure labels: gate, source, drain; gate depletion; insulator of 4~6 layers at 1nm (Si-Si distance: ~0.3nm); concentration vs. depth profile (shallow, high).]

[Image labels: sharp halo, gate electrode, Si substrate.]

Tr. Characteristics of 25nm (nFET) / 33nm (pFET)

[Figure: drain current ID [A/µm] (10^-12 to 10^-2) vs. gate voltage VG [V] (-1.5 to 1.5) at |VD| = 0.1, 1.2V, for a 33-nm pMOS by C-S/D and a 24-nm nMOS by R-S/D.]

[Figure: drain current ID (axis labeled [A/µm], 0 to 1200) vs. drain voltage VD [V] (-1.5 to 1.5) at |VG| = 0, 0.3, ..., 1.5V, for the same two devices.]

• 24nm n-FET w/ 800 µA/µm, 300 nA/µm by R-SD, Tox=1.6nm
• 33nm p-FET w/ 400 µA/µm, 300 nA/µm by C-SD, Tox=1.6nm

Driving Capability

[Figure: ID/VD (A/V/m, 0 to 1200) vs. Lgate (nm, 500 down to 10).]

Gate leakage current vs. Tox

[Figure: gate leakage current density (A/cm2, 10^-6 to 10^2) and gate leakage current (A, 1f to 1n, at L=0.1µm, W=1µm) vs. physical Tox (nm, 1.2 to 2.8). Gate dielectric: SiO2, nMOS inversion. Reference levels: Ioff for low power, Ioff for standard, Ioff criteria for high speed. Markers: UX4, UX5, UX6 (0.10µm node).]

High-k Materials for Low Gate Leakage Current

High-k material selection guideline:
• Barrier height higher than 1eV
• Medium k value (10-100) (due to lowering φB)

Materials: Al2O3, ZrO2, HfO2; ×: La2O3, Ta2O5

[Figure: gate leakage current JG (A/cm2, 10^-3 to 10^3) vs. dielectric constant (0 to 50) at TEQ=1nm, VOX=1V, m*=0.5m0, for barrier heights φB = 3.1eV (SiO2), 2.0eV, 1.5eV, 1.0eV and 0.5eV.]

[Figure: gate leakage current JG (A/cm2, 10^-12 to 10^3) vs. TEQ (nm, 0 to 5), with SiO2 as reference.]
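The reason a high-k film can lower gate leakage at a fixed TEQ is that it can be physically thicker for the same capacitance. A minimal sketch of that relation follows, assuming the usual equivalent-oxide-thickness definition (physical thickness = TEQ × k / k_SiO2) and the textbook value k_SiO2 = 3.9. Neither the formula nor the k values in the loop appear on the slide; the k values are arbitrary points inside the slide's 10-100 range and do not represent specific materials.

```python
K_SIO2 = 3.9  # relative permittivity of SiO2 (textbook value, not from the slide)


def physical_thickness_nm(teq_nm, k):
    """Physical film thickness giving the same capacitance as teq_nm of SiO2."""
    return teq_nm * k / K_SIO2


TEQ_NM = 1.0  # the slide's TEQ = 1nm condition

if __name__ == "__main__":
    # Illustrative k values only; they are not tied to any listed material.
    for k in (3.9, 10, 25, 50):
        t = physical_thickness_nm(TEQ_NM, k)
        print(f"k = {k:>4}: physical thickness = {t:.2f} nm")
```

The thicker film suppresses direct tunneling, but the figure's φB curves show the competing effect: a lower barrier height raises leakage again, which is why the guideline asks for a medium k value rather than the largest possible one.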

100nm SOI-CMOS

[Figure: propagation delay τpd [psec/stage] (0 to 30) vs. supply voltage [V] (0.0 to 2.0) for bulk and SOI 100-nm CMOS.]

• SOI-CMOS: ~20% faster than bulk-CMOS
• SOI-CMOS: smaller junction capacitance

[Cross-section image labels: CoSi2, SOI, Box; scale bar 100 nm.]

Wiring Issues

[Figure: interconnect cross-sections for UX4 (150nm), UX5 (130nm), UX6 (100nm) and UX7 (70nm), with metal levels M1 to M9 (M10?). Labels: W, Cu, SiO2, L-ox, low-k, k=4.2, k=2.9, k=2.4, k=2.0.]

Triple-Layered p-BCB/Cu SDI on 80nm CMOS

[Cross-section image labels: SiO2, W-plug, Cu wiring, BCB, Cu-plug, 80nm p/n-gate; scale bars 1.0µm and 0.5µm.]

• tox=1.9nm (SiON)
• SDI: single damascene interconnect
• With Ta/TaN barrier
• Keff = 3.1

Full p-BCB structure, Cu dual-damascene interconnect

[Image label: p-BCB]

BMF (Barrier Metal-Free) Cu-DDI

[Image labels: BMF, MOCVD-Cu]

• 50% reduction in via resistance
• Cu-DDI module with the full low-k structure reducing the interconnect delay

Performance of Cu-DDI with BMF structure

[Figure: medium via resistance (Ω/unit, 0.1 to 10) vs. designed via diameter (µm, 0.1 to 0.5) for Cu-DDI (BMF with Cu-epi-contact), Al/W-plug, and Cu-DDI with TaN-barrier.]

[Figure: medium Tpd (ns, 2 to 4.5) vs. space width (µm, 0.2 to 0.6) for full p-BCB (BMF, keff=3.1), p-BCB/SiO2 (hybrid, keff=3.6), and SiO2 (keff=5).]
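A rough way to see what the three keff values imply is a first-order scaling in which wiring capacitance, and hence RC delay, is proportional to keff. The sketch below applies that assumption to the slide's keff values. It deliberately ignores any change in wire or via resistance and geometry, so it is not a model of the measured Tpd curves, and the names `relative_wire_delay` and `STRUCTURES` are hypothetical.

```python
# First-order assumption (not from the slide): wire RC delay scales with keff
# when wire resistance and geometry are held fixed.
def relative_wire_delay(keff, keff_ref=5.0):
    """Wire delay relative to a reference dielectric with keff_ref."""
    return keff / keff_ref


# keff values as labeled in the figure.
STRUCTURES = {
    "SiO2": 5.0,
    "p-BCB/SiO2 hybrid": 3.6,
    "full p-BCB (BMF)": 3.1,
}

if __name__ == "__main__":
    for name, keff in STRUCTURES.items():
        print(f"{name}: {relative_wire_delay(keff):.2f} x SiO2 wire delay")
```

Under this assumption the full p-BCB structure has about 62% of the wire capacitance of the SiO2 reference. The measured Tpd also includes gate delay and resistance effects, so the curves in the figure need not follow this ratio.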

0.1µm CMOS 5GHz Clocking

[Figure labels: On-Chip Transmission Line; 2.5mm; 200ps; 20ps; Local Clock Distribution.]

5GHz clocking for 10mm x 10mm

Multiple System Configurations

e.g. communication between boards inside routers

OIF: Optical Interface
SW: Switch
MCSL: Multi-Channel Serial Link

[Figure: block diagrams for Case 1 and Case 2. Labels: Line Interface (with OIF and MCSL), SW with Multi-Channel Serial Link, Backplane or Cable, Optical Signal for WAN/LAN, 10G, 40G.]

Microphotograph of 100-Gb/s Test Chip

[Chip photo: five 4-CH RX blocks (4-channel receiver IP cores), five 4-CH TX blocks (4-channel transmitter IP cores), and a noise source (20,000 flip-flops); dimension labels 1 mm and 2.25 mm.]

100 Gb/s Bandwidth = 20-Channel 5-Gb/s Serial Links

• Process: 0.15-µm CMOS, 6 AL
• Supply voltage: 1.5 V
• Package: 1,296-pin FC-BGA
• Power dissipation: 142 mW/Ch. (TX), 236 mW/Ch. (RX)
• Bit error rate: under 10^-13 @ 5Gb/s with a 5-meter AWG28 cable
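The bandwidth and power figures for the test chip combine as simple arithmetic. The sketch below assumes the chip carries 20 transmit and 20 receive channels (five 4-channel cores of each kind); the slide gives only per-channel power, so the totals and the energy-per-bit figures are derived here, not quoted. All variable names are made up for this example.

```python
CHANNELS = 20           # per direction (assumed: five 4-channel cores each)
RATE_GBPS = 5.0         # per-channel serial rate from the slide
TX_MW_PER_CH = 142.0    # from the slide
RX_MW_PER_CH = 236.0    # from the slide

# Aggregate bandwidth in one direction.
total_gbps = CHANNELS * RATE_GBPS

# Derived totals across all channels (not stated on the slide).
tx_total_w = CHANNELS * TX_MW_PER_CH / 1000
rx_total_w = CHANNELS * RX_MW_PER_CH / 1000

# mW divided by Gb/s gives pJ per bit.
tx_pj_per_bit = TX_MW_PER_CH / RATE_GBPS
rx_pj_per_bit = RX_MW_PER_CH / RATE_GBPS

if __name__ == "__main__":
    print(f"Aggregate bandwidth: {total_gbps:.0f} Gb/s")
    print(f"TX: {tx_total_w:.2f} W total, {tx_pj_per_bit:.1f} pJ/bit")
    print(f"RX: {rx_total_w:.2f} W total, {rx_pj_per_bit:.1f} pJ/bit")
```

The receiver costs noticeably more energy per bit than the transmitter in these figures.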

Chip Photo

• 0.18µm CMOS
• TOX=3.8nm
• 2 metal layers

Waveforms

• Input data (10Gb/s)
• Input clock (2.5GHz)
• Output data (2.5Gb/s)
• VDD=1.3V

High-Speed Interface

[Figure: bandwidth (Gb/s) vs. power (W). Labels: CMOS, Bipolar, Parallel SiGe, High speed, Low power.]

[Chip block diagram: PLL; 10:1 MUX; 8B10B & 32:8 MUX; CDR; 1:10 DEMUX & WA; 10B8B; 8:32 DEMUX & EB; dimension labels 1.1mm and 1.3mm.]

5Gbps x 4 Parallel I/O LSI

CMOS Breaks Power Walls for Speed
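The block diagram lists an 8B10B encoder between a 32:8 MUX and a 10:1 MUX, which suggests how the 5 Gbps line rate relates to the parallel-side data rate. The sketch below works through that arithmetic under two assumptions that the slide does not state: that the 8B/10B overhead applies to the full 5 Gbps line rate, and that each lane has its own 32-bit parallel interface. The function names are hypothetical.

```python
def payload_gbps(line_rate_gbps):
    """Payload rate after 8B/10B coding: 8 data bits per 10 line bits."""
    return line_rate_gbps * 8 / 10


def word_clock_mhz(payload_rate_gbps, width_bits):
    """Clock needed to carry the payload on a parallel bus of width_bits."""
    return payload_rate_gbps * 1000 / width_bits


LINE_RATE_GBPS = 5.0  # per lane, from the slide
LANES = 4             # from the slide

per_lane_payload = payload_gbps(LINE_RATE_GBPS)
total_payload = per_lane_payload * LANES

# Assumed widths, following the MUX ratios in the block diagram.
clock_32bit_mhz = word_clock_mhz(per_lane_payload, 32)
clock_8bit_mhz = word_clock_mhz(per_lane_payload, 8)

if __name__ == "__main__":
    print(f"Payload per lane: {per_lane_payload:.1f} Gb/s")
    print(f"Payload over {LANES} lanes: {total_payload:.1f} Gb/s")
    print(f"32-bit side: {clock_32bit_mhz:.0f} MHz, 8-bit side: {clock_8bit_mhz:.0f} MHz")
```

Under these assumptions, the 10:1 MUX serializes 10-bit code words at 500 MHz to reach the 5 Gbps line rate.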

Dynamically Reconfigurable Logic

[Figure: number of application programs (1 to 10000) vs. power efficiency (MIPS/W, 10^1 to 10^5). Labels: MPU, MCU, DSP, ASIC, DRL, Pentium.]

• Realtime reconfiguration: 5ns
• 8 configuration data on a chip
• 0.25um CMOS, 5M Tr

Reconfigurable Computing

[Figure: test chip. Labels: A, B, FA; steps (1) to (5); Reconfiguration, PE, Execution of Program.]

Dynamically Reconfigurable Logic

Customize hardware dynamically

[Figure: external control style vs. internal control style. Labels: Reconfigurable Hardware, µP, Memory Controller, Task 1 (blocks A, B, C, D, E), Task N (blocks H, I, J, W, X, Y, Z), Time.]

MP98: Single-Chip Multiprocessor

• Novel multi-thread architecture
• Software for parallel architecture
• Sophisticated power management

• 0.15um, 14 million Trs.
• 125MHz x 4 PEs x 2-way superscalar
• 1GIPS@1W, Vdd=1.3V
• Dynamic power control of PEs, on-chip power switch
• Idling: 25mW, Sleep: 0.2mW

[Chip photo: 10.5mm x 10.5mm]
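The MP98 figures fit together as a short calculation: the peak instruction rate follows from clock, PE count and issue width, and the three power states give an average power once a duty cycle is chosen. In the sketch below the duty-cycle fractions are purely hypothetical, as are the names `POWER_MW` and `average_power_mw`. The power numbers are the slide's.

```python
CLOCK_MHZ = 125
NUM_PES = 4
ISSUE_WIDTH = 2  # 2-way superscalar

# Peak instruction rate in MIPS: 125 MHz x 4 PEs x 2-way.
peak_mips = CLOCK_MHZ * NUM_PES * ISSUE_WIDTH

# Power per state in mW, from the slide (1 W active, 25 mW idling, 0.2 mW sleep).
POWER_MW = {"active": 1000.0, "idle": 25.0, "sleep": 0.2}

# Efficiency at full activity, in MIPS per watt.
mips_per_watt = peak_mips / (POWER_MW["active"] / 1000)


def average_power_mw(fractions):
    """Time-weighted average power for state fractions that sum to 1."""
    assert abs(sum(fractions.values()) - 1.0) < 1e-9
    return sum(POWER_MW[state] * share for state, share in fractions.items())


# Hypothetical duty cycle for illustration only (not from the slide).
example_duty = {"active": 0.10, "idle": 0.20, "sleep": 0.70}
example_avg_mw = average_power_mw(example_duty)

if __name__ == "__main__":
    print(f"Peak: {peak_mips} MIPS, {mips_per_watt:.0f} MIPS/W")
    print(f"Average power for example duty cycle: {example_avg_mw:.2f} mW")
```

With a mostly sleeping workload like this one, the active fraction dominates the average, which is why the sleep and idle states matter mainly for how quickly and cheaply the chip can return to them.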

Power Management in MP98

[Layout figure labels: Power SW, Block, Cap. SW, Internal Circuits, Cd, Ctrl, Control Circuits, De-coupling Capacitors; dimensions 333µm and 38.8µm.]

• Multi-oxide/-Vt for high-speed and low-power circuits
• Software controlled power management circuit

[Figure: power (W) vs. time t across normal mode, idle and sleep. Axis ticks: 10, 1.0, 0.01, 0.0002. Time labels: 80us, 25ms. Other labels: cache data, leak current.]

Ultimately Scaled-down MOSFET

1988 Symp. VLSI Tech.

• Thin SOI structure: Ts<10nm (~Tinv)
• Minimum Lg predicted: Lg~10nm
• Limiting factor: tunneling current

[Figure conditions: L=15nm, Tox=1.5nm, Ts=10nm, Vd=0.5V, Vg=0.3V, 77K]

Top View of an EJ-MOSFET

[SEM image labels: lower gate, n+ region, field, 200nm LLG; scale markers 200 nm and 2mm.]

• 8nm < LLG < 100 nm
• Tox=5nm

Subthreshold Characteristics

[Figure: drain current (A, 10^-12 to 10^-4) vs. lower-gate voltage (V), in two panels for LLG = 8 nm and LLG = 52 nm, each with curves at T = 300, 200, 150, 100, 50 and 25 K.]

• No temperature dependency of slope for L=8nm at <100K
• Direct tunneling between S and D

New Direction for the 21st Century

• Equivalent System Scaling: New Architecture/Circuit
• Equivalent Device Scaling: New Material/Process
• Synergetic Device/Ckt. Co-design: Flexible Parameter CMOS

[Figure: chip block diagram with P, Logic1, Logic2, Logic 3, Memory1, Memory2 and Interconnect.]