
Introduction Regularization Methods Experimental Settings Experimental Results Conclusions

Virtual Metrology Enabled Early Stage Prediction for Enhanced Control of Multi-stage Fabrication Processes

Gian Antonio Susto *, Adrian B. Johnston **, Paul G. O’Hara **, Seán McLoone ***

* National University of Ireland, Maynooth (NUI Maynooth) - ** Seagate Technologies, Derry, UK - *** NUIM and Queen’s University of Belfast, UK

CASE 2013 Madison - Aug. 18th, 2013

Gian Antonio Susto (NUI Maynooth) Early Stage VM for Multi-Stage Processes CASE 2013 Madison - Aug. 18th, 2013 1 / 15


Virtual Metrology (VM) systems

Semiconductor Industry: one of the most technologically advanced manufacturing sectors

Wafer sizes are growing ⇒ Process Quality and Control are of increasing importance

Virtual Metrology (VM) systems: models of the process, based on tool variables, used to predict product parameters that are costly (in time and money) to measure directly

Possible advantages of a VM system:

(i) Quality Monitoring
(ii) Measurement Reduction
(iii) Smart Decision Systems (e.g., Skip Lot Sampling)
(iv) Enhanced Control



Multi-Step VM Problems

Generally, VM systems consider a single processing step

The VM target (Y) can be influenced by many processing steps

[Figure: wafer k passing through Processes 1 to 5 of the VM Production Line (VMPL), ending at the VM target]

[Figure: Processes Description. Stack cross-sections (photoresist layers PR1 to PR3, PR4 sacrificial layer (SL), Layer 5, feature material) through Dry Etching and Write Pole Formation, marking the litho width YL, the width after etching YRE and the write pole width Y; the VMPL spans both steps.]

• Goal: predicting Y after the Dry Etching and before the Write Pole Formation

Virtual Metrology Production Line (VMPL): the group of processes that 'strongly' influence Y

Case study available: prediction of the write pole width (Y)

Two processes strongly affect the VM target Y:

(i) Dry Etching
(ii) Write Pole Formation



Multi-Step VM Problems

[Figure: Processes Description, with the widths YL, YRE and Y]

VM objective: Prediction of Y



In the problem at hand, we need an early-stage prediction of the VM target

The VM prediction is used by a control/decision system

To this end, the VMPL input data for modeling are divided into:

(i) Observed Portion (X_O)
(ii) Unobserved Portion (X_U)

The modeling problem is approximated as

$y = f(x_O, x_U) \;\Rightarrow\; y \approx f(x_O)$
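A minimal sketch of the approximation y ≈ f(x_O), assuming a linear f fitted by least squares with NumPy: the unobserved columns are simply dropped before fitting. The column split `observed_cols` and the synthetic data are hypothetical; the slides do not fix the form of f at this step.

```python
import numpy as np

def fit_observed_portion(X, y, observed_cols):
    """Least-squares fit of y ≈ f(x_O), using only the observed VMPL columns.

    `observed_cols` (hypothetical) lists the column indices of X_O; all other
    columns, i.e. the unobserved portion X_U, are ignored.
    """
    X_O = X[:, observed_cols]
    A = np.column_stack([np.ones(len(X_O)), X_O])   # intercept + observed inputs
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_observed_portion(coef, X, observed_cols):
    """Early-stage prediction: needs only the observed columns of a wafer."""
    return coef[0] + X[:, observed_cols] @ coef[1:]

# Synthetic example: columns 0-1 are observed, column 2 is unobserved.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2]
coef = fit_observed_portion(X, y, observed_cols=[0, 1])
y_hat = predict_observed_portion(coef, X, observed_cols=[0, 1])
```

The contribution of x_U (here 0.5 · X[:, 2]) becomes unexplained error: this is the accuracy paid for predicting Y before the unobserved processes run.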

[Figure: wafer k passing through Processes 1 to 5; the VMPL is split into an observed portion and an unobserved portion]



Challenging Aspects

Apart from the need for an early-stage prediction, two factors add to the challenge of the problem at hand (from a model-building point of view):

(i) The desire to have an interpretable model
(ii) The high level of collinearity in the data

[Figure: Correlation Matrix (Absolute Values), Regressor ID vs. Regressor ID, color scale from 0.1 to 1]
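The collinearity visible in the correlation matrix can be quantified by listing regressor pairs with a high absolute correlation. A minimal NumPy sketch; the 0.9 threshold and the synthetic three-regressor data are assumptions for illustration, not values from the case study.

```python
import numpy as np

def highly_correlated_pairs(X, threshold=0.9):
    """Regressor pairs (i, j), i < j, whose absolute correlation is >= threshold."""
    C = np.abs(np.corrcoef(X, rowvar=False))      # p x p matrix, as in the figure
    rows, cols = np.triu_indices_from(C, k=1)     # upper triangle, no diagonal
    keep = C[rows, cols] >= threshold
    return [(int(i), int(j), float(C[i, j])) for i, j in zip(rows[keep], cols[keep])]

# Synthetic data: column 1 is a noisy copy of column 0, column 2 is independent.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
X = np.column_stack([x0, x0 + 0.05 * rng.normal(size=500), rng.normal(size=500)])
pairs = highly_correlated_pairs(X)
```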



Modeling for VM: State-of-the-Art

[Figure: neural network with input layer, hidden layer and output layer]

Neural Networks (NNs): state of the art in VM modelling
(+) Non-linear modelling
(+) High prediction accuracy
(-) Slow training
(-) Difficult interpretation
(-) Require pre-processing with big data

Regularization Methods: competitive with NNs in prediction accuracy and well suited to large datasets

Most widely used approaches:

(i) Ridge Regression
(ii) LASSO

G.A. Susto, A. Beghi, Least Angle Regression for Semiconductor Manufacturing Modeling, IEEE Multi-Conference on Systems and Control, Dubrovnik (Croatia), October 3-5th, 2012, pp. 658-663



LASSO and Ridge Regression 1/2

Minimization Objective for Regularization Methods:

$L(\beta) = \|Y - X\beta\|_2^2 + \lambda R(\beta)$

λ ≥ 0 governs the trade-off between prediction accuracy and model complexity

Application of Occam’s razor: simpler models tend to generalize better
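The objective can be written as a function of β with a pluggable penalty R, here using the two choices of R given for Ridge Regression and LASSO in this section. A small NumPy sketch on a hand-checkable 2 × 2 example; the numbers are arbitrary and only chosen so the arithmetic can be verified by hand.

```python
import numpy as np

def ridge_penalty(beta):
    """R(beta) = sum_j beta_j^2 (Ridge Regression)."""
    return float(np.sum(beta ** 2))

def lasso_penalty(beta):
    """R(beta) = sum_j |beta_j| (LASSO)."""
    return float(np.sum(np.abs(beta)))

def regularized_loss(beta, X, Y, lam, penalty):
    """L(beta) = ||Y - X beta||_2^2 + lam * R(beta)."""
    residual = Y - X @ beta
    return float(residual @ residual) + lam * penalty(beta)

X = np.eye(2)
Y = np.array([1.0, 2.0])
beta = np.array([1.0, -2.0])
# Squared residual: (1 - 1)^2 + (2 + 2)^2 = 16
loss_rr = regularized_loss(beta, X, Y, lam=1.0, penalty=ridge_penalty)     # 16 + 5
loss_lasso = regularized_loss(beta, X, Y, lam=1.0, penalty=lasso_penalty)  # 16 + 3
```

At β = 0 the penalty vanishes and only the fit term remains, whatever λ is; larger λ makes any departure from β = 0 more expensive.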

[Figure: LASSO sparsity example, contour lines of the loss and the LASSO admissible region]

Different methods, depending on the choice of R:

(i) Ridge Regression (RR): $R(\beta) = \sum_{j=1}^{p} \beta_j^2$
(ii) LASSO: $R(\beta) = \sum_{j=1}^{p} |\beta_j|$

RR handles strong collinearities well

LASSO performs variable selection (some coefficients are set exactly to zero)
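To illustrate why RR copes with collinearity, the toy sketch below (synthetic data, not the case-study data) compares OLS with the closed-form ridge solution β = (XᵀX + λI)⁻¹XᵀY, obtained by setting the gradient of ‖Y − Xβ‖² + λ‖β‖² to zero, on two nearly identical regressors. No intercept is fitted, which assumes centered data; λ = 10 is arbitrary.

```python
import numpy as np

def ridge(X, Y, lam):
    """Closed-form minimizer of ||Y - X b||^2 + lam * sum_j b_j^2."""
    p = X.shape[1]
    # Setting the gradient to zero gives (X^T X + lam I) b = X^T Y.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

rng = np.random.default_rng(1)
n = 100
x0 = rng.normal(size=n)
x1 = x0 + 0.01 * rng.normal(size=n)    # almost an exact copy of x0
x2 = rng.normal(size=n)                # unrelated regressor
X = np.column_stack([x0, x1, x2])
Y = x0 + x1 + 0.1 * rng.normal(size=n)

b_ols = np.linalg.lstsq(X, Y, rcond=None)[0]   # lam = 0: X^T X nearly singular
b_rr = ridge(X, Y, lam=10.0)
print("OLS:", b_ols.round(2))
print("RR: ", b_rr.round(2))
```

The OLS estimates for the two copies have very large variance because XᵀX is nearly singular, whereas ridge shrinks them toward an almost even split of the shared effect.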











RR: coefficients are reduced in magnitude

LASSO: one coefficient at a time ’enters’ the model

Methods already employed in the VM literature (RR [Schirru2011], LASSO [Pampuri2011])

[Figure: coefficient paths β_j versus 1/λ for Ridge Regression and LASSO]
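The LASSO behaviour in the path plots (coefficients entering one at a time as 1/λ grows) can be reproduced with cyclic coordinate descent and soft-thresholding, the algorithm family of [Friedman2010]. This is a minimal NumPy sketch, not the authors' implementation: it assumes no intercept or standardization, runs a fixed number of sweeps instead of a convergence test, and uses a two-value λ grid chosen for the synthetic data.

```python
import numpy as np

def soft_threshold(c, t):
    """S(c, t) = sign(c) * max(|c| - t, 0)."""
    return np.sign(c) * max(abs(c) - t, 0.0)

def lasso_cd(X, Y, lam, beta0=None, n_sweeps=200):
    """Cyclic coordinate descent for ||Y - X b||^2 + lam * sum_j |b_j|."""
    p = X.shape[1]
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    col_sq = (X ** 2).sum(axis=0)
    r = Y - X @ beta                            # current residual
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]              # partial residual without coordinate j
            # Exact minimizer over b_j of ||r - x_j b_j||^2 + lam * |b_j|
            beta[j] = soft_threshold(X[:, j] @ r, lam / 2) / col_sq[j]
            r -= X[:, j] * beta[j]              # put the updated coordinate back
    return beta

def entry_order(X, Y, lams):
    """Column indices in the order they first become non-zero as lam decreases."""
    order, beta = [], np.zeros(X.shape[1])
    for lam in sorted(lams, reverse=True):
        beta = lasso_cd(X, Y, lam, beta0=beta)  # warm start from the previous lam
        for j in np.flatnonzero(beta):
            if int(j) not in order:
                order.append(int(j))
    return order

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
Y = 6.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)  # column 2 is irrelevant
order = entry_order(X, Y, lams=[1000.0, 100.0])
beta_100 = lasso_cd(X, Y, lam=100.0)
```

The strongest regressor enters first, the weaker one only at the smaller λ, and the irrelevant one stays exactly at zero.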

A. Schirru, S. Pampuri, C. De Luca, G. De Nicolao, Multilevel Kernel Methods for Virtual Metrology in Semiconductor Manufacturing, 18th IFAC World Congress, Milan (Italy), 28 Aug.-2 Sep. 2011, pp. 11614-11621

S. Pampuri, A. Schirru, G. Fazio, G. De Nicolao, Multilevel Lasso applied to Virtual Metrology in Semiconductor Manufacturing, 7th IEEE CASE, Trieste (Italy), 24-27 Aug. 2011, pp. 244-249



Challenging Aspects

We require both properties: sparsity and robustness to collinearities

Elastic Nets (ENs) [Zou2005] combine the two approaches:

$R(\beta) = \sum_{j=1}^{p} \left[ \alpha \beta_j^2 + (1-\alpha)|\beta_j| \right]$

Increased complexity: an additional parameter α sets the trade-off between the L2 and L1 penalties

Efficient algorithms (coordinate descent) are available for computation [Friedman2010]

ENs work well even for problems where n < p
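A minimal coordinate-descent sketch of the Elastic Net penalty above, in NumPy, assuming centered data without intercept, a fixed number of sweeps and arbitrary λ = 60, α = 0.25. Each coordinate update is the exact one-dimensional minimizer: soft-threshold x_jᵀr at λ(1 − α)/2, then divide by ‖x_j‖² + λα. On two near-duplicate regressors it keeps both with almost equal weight (LASSO tends to select only one variable from such a group [Zou2005]) while still setting the irrelevant regressor exactly to zero.

```python
import numpy as np

def soft_threshold(c, t):
    """S(c, t) = sign(c) * max(|c| - t, 0)."""
    return np.sign(c) * max(abs(c) - t, 0.0)

def elastic_net_cd(X, Y, lam, alpha, n_sweeps=200):
    """Coordinate descent for ||Y - X b||^2 + lam * sum_j [alpha b_j^2 + (1 - alpha)|b_j|].

    alpha weights the L2 term, as in the penalty above (glmnet uses the
    opposite convention, with its alpha weighting the L1 term).
    """
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = Y.copy()                                  # residual for beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]                # partial residual without coordinate j
            # L1 part -> soft-threshold; L2 part -> extra lam * alpha in the denominator
            c = X[:, j] @ r
            beta[j] = soft_threshold(c, lam * (1 - alpha) / 2) / (col_sq[j] + lam * alpha)
            r -= X[:, j] * beta[j]
    return beta

rng = np.random.default_rng(3)
n = 400
x0 = rng.normal(size=n)
X = np.column_stack([x0, x0 + 0.01 * rng.normal(size=n), rng.normal(size=n)])
Y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=n)  # two near-copies matter, column 2 does not
beta_en = elastic_net_cd(X, Y, lam=60.0, alpha=0.25)
```

Setting α = 0 recovers the LASSO update and α = 1 the ridge update, which is one way to see the trade-off that α controls.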

H. Zou and T. Hastie, Regularization and Variable Selection via the Elastic Net, Journal of the Royal Statistical Society. Series B (Statistical Methodology), vol. 67, pp. 301-320, 2005

J. Friedman, T. Hastie and R. Tibshirani, Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, vol. 33, pp. 1-22, 2010



Data Available and Current Policy

[Figure: Processes Description, with the widths YL, YRE and Y and the Dry Etching KPIVs]

Available measurements:

(i) YL, width after Litho (process state)
(ii) YRE, width after the Dry Etch

Current Policy: YRE is monitored as a predictor of variation in Y

Input data for our model:

(a) KPIVs: dry etching statistics (avg., min., max.)
(b) YRE
(c) YL
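The inputs listed above can be assembled per wafer by summarizing each sensor trace at each etch stage and appending the two metrology values. A minimal sketch, assuming a hypothetical input format (a dict mapping (sensor, stage) to the samples recorded for one wafer); the sensor names are borrowed from the coefficient plot for illustration only, and the "SENSOR MIN @ Stage s" naming mirrors the variable labels in that plot.

```python
from statistics import mean

def wafer_features(traces, y_re, y_l):
    """Build one input row from dry-etch sensor traces plus Y_RE and Y_L.

    traces: hypothetical dict {(sensor_name, stage): [samples, ...]} for one wafer.
    Returns (names, values): avg/min/max per (sensor, stage), then Y_RE and Y_L.
    """
    names, values = [], []
    for (sensor, stage), samples in sorted(traces.items()):   # fixed column order
        for stat_name, stat in (("AVG", mean), ("MIN", min), ("MAX", max)):
            names.append(f"{sensor} {stat_name} @ Stage {stage}")
            values.append(float(stat(samples)))
    names += ["Y_RE", "Y_L"]
    values += [float(y_re), float(y_l)]
    return names, values

# Toy wafer with two (sensor, stage) traces; values are made up.
traces = {("CHA_VO", 9): [1.0, 2.0, 3.0], ("CHA_TOP_POWER_VPP", 4): [10.0, 12.0]}
names, values = wafer_features(traces, y_re=0.5, y_l=0.7)
```

Sorting the keys keeps the column order identical across wafers, so rows can be stacked into the n × p input matrix.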



Data Available and Current Policy

Data Description:
(a) n = 870 wafers (collected over 5 months)
(b) p = 327 input variables

Monte Carlo: K = 1000 simulations, each a random split with q × 100% = 70% of the wafers used for training and the remaining 30% for testing

Results are reported as averages over the K simulations, in terms of the indicators:

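$\text{Av. MSE} = \frac{1}{K(1-q)n} \sum_{k=1}^{K} \sum_{i=1}^{(1-q)n} \left(y_{i,k} - \hat{y}_{i,k}\right)^2$

$\text{Av. NMSE} = \frac{100}{K(1-q)n} \sum_{k=1}^{K} \sum_{i=1}^{(1-q)n} \frac{\left(y_{i,k} - \hat{y}_{i,k}\right)^2}{\sigma^2_{y_k}} \; [\%]$

where $\hat{y}_{i,k}$ is the prediction of the i-th test output $y_{i,k}$ in the k-th simulation.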
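The two indicators can be computed in a single Monte Carlo loop. A NumPy sketch under stated assumptions: `fit` and `predict` are user-supplied callables (hypothetical names), and σ²_{y_k} is taken as the variance of the test targets of simulation k, which the slides do not define.

```python
import numpy as np

def monte_carlo_eval(X, y, fit, predict, K=1000, q=0.7, seed=0):
    """Av. MSE and Av. NMSE [%] over K random splits with a fraction q for training.

    fit(X_train, y_train) -> model and predict(model, X_test) -> y_hat are
    user-supplied. sigma^2_{y_k} is assumed to be the variance of the k-th
    test targets.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(q * n))
    sse, sse_norm, n_test_total = 0.0, 0.0, 0
    for _ in range(K):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        y_hat = predict(fit(X[train], y[train]), X[test])
        sq_err = (y[test] - y_hat) ** 2
        sse += sq_err.sum()
        sse_norm += sq_err.sum() / np.var(y[test])
        n_test_total += len(test)          # ends as K * (n - n_train)
    return sse / n_test_total, 100.0 * sse_norm / n_test_total

def fit_nothing(X_train, y_train):
    return None                            # toy model with no parameters

def predict_off_by_one(model, X_test):
    return X_test[:, 0] + 1.0              # always 1 unit above the true value

y = np.arange(20, dtype=float)
X = y.reshape(-1, 1)                       # toy case: the regressor equals the target
av_mse, av_nmse = monte_carlo_eval(X, y, fit_nothing, predict_off_by_one, K=10)
```

Since every squared error is exactly 1 in this toy case, Av. MSE comes out as 1, which is a quick sanity check of the normalization by K(1 − q)n.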

Techniques compared:
(i) Target: the production target for Y, used as a constant predictor of Y
(ii) OLS on YRE: Ordinary Least Squares model based on YRE (current policy)
(iii) Ridge Regression (RR)
(iv) LASSO
(v) Elastic Nets (ENs)
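The two reference predictors (i) and (ii) take only a few lines each. A NumPy sketch; the target value and the toy data are hypothetical.

```python
import numpy as np

def target_predictor(target):
    """(i) Target: predict the production target for every wafer."""
    def predict(y_re):
        return np.full(len(y_re), float(target))
    return predict

def ols_on_yre(y_re, y):
    """(ii) OLS on Y_RE: fit y ≈ a + b * Y_RE by ordinary least squares."""
    A = np.column_stack([np.ones(len(y_re)), y_re])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    def predict(y_re_new):
        return a + b * np.asarray(y_re_new, dtype=float)
    return predict

y_re = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * y_re                      # toy data with an exact linear relation
predict_ols = ols_on_yre(y_re, y)
predict_target = target_predictor(10.0)
```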



Performance

Boxplot: summary of the performance over all Monte Carlo simulations

EN yields the best results among the methods considered

Regularization techniques outperform the current policy

The median variability in Y explained improves by 35% in relative terms (from 46% to 62%)
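Reading the 35% as a relative improvement of the median explained variability, the arithmetic is as follows (an absolute gain of 16 percentage points):

```latex
\frac{62\% - 46\%}{46\%} = \frac{16}{46} \approx 0.348 \approx 35\%
```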

[Figure: Normalized Mean Squared Error Distribution, boxplots for Target, OLS on Y_RE, RR, LASSO and Elastic Net]

Method              Averaged MSE [∗10^4]    Averaged NMSE
Target              1.784                   87.86 %
OLS on YRE          1.156                   53.82 %
Ridge Regression    1.01                    46.69 %
LASSO               0.89                    40.9 %
Elastic Net         0.832                   37.95 %

TABLE I: AVERAGED MSE AND NMSE OVER K = 1000 MC SIMULATIONS.



Model Interpretation

Model Size: number of variables with associated non-zero coefficients

Sparsity achieved: as expected, only a portion of the 327 variables enter the model

[Figure: histograms of Model Size vs. Times Selected over the Monte Carlo simulations, for LASSO and Elastic Net]

[Figure: Coefficients Distribution (scale ×10^−3) for VAR A to VAR E, with the variables Y_RE, CHA_VO MIN @ Stage 9, CHA_BOTTOM_POWER_VPP MIN @ Stage 9, CHA_VO MAX @ Stage 9 and CHA_TOP_POWER_VPP MAX @ Stage 4]

Model insights: the most important variables (e.g., by order of entry into the model or by frequency of selection) can be identified

Model coefficients can be studied to see their effect on the output
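Given the coefficients fitted in each Monte Carlo run, the model size per run and the selection frequency per variable follow from counting non-zero entries. A NumPy sketch; the coefficient matrix and the variable names below are made-up values, not results from the study.

```python
import numpy as np

def model_size_and_frequency(B, names, n_top=3):
    """B: K x p array of fitted coefficients, one row per Monte Carlo run."""
    nonzero = B != 0
    sizes = nonzero.sum(axis=1)                    # variables selected in each run
    freq = nonzero.mean(axis=0)                    # fraction of runs selecting each variable
    ranking = np.argsort(-freq, kind="stable")[:n_top]
    return sizes, [(names[j], float(freq[j])) for j in ranking]

# Hypothetical coefficients from 4 runs over 3 variables.
B = np.array([[0.5, 0.0, 0.1],
              [0.4, 0.2, 0.0],
              [0.6, 0.0, 0.0],
              [0.5, 0.0, 0.3]])
sizes, top = model_size_and_frequency(B, ["Y_RE", "VAR A", "VAR B"], n_top=2)
```

The histogram of `sizes` corresponds to the Model Size plots, and the ranking by `freq` is one way to pick the variables whose coefficient distributions are worth inspecting.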



Conclusions and Open Control Problem

A VM system for a multi-stage VM prediction problem has been presented

Two innovations:

(a) Early-stage VM scheme (observed and unobserved VMPL portions)
(b) Elastic Nets for VM modeling

Open problem: a feed-forward control scheme using VM predictions

[Figure: feed-forward control scheme: wafer through Processes 1 to 5 (observed and unobserved VMPL portions), a VM module, a control system and the target]
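As one possible starting point for the open problem, a feed-forward rule could turn the early-stage VM prediction into a correction applied in the unobserved portion of the VMPL. The sketch below is entirely hypothetical: it assumes a single actuator u with a known, linear gain on Y (Y ≈ ŷ + gain · u) and fixed actuator limits, none of which come from the slides.

```python
def feedforward_adjustment(y_hat, target, gain, u_min=-1.0, u_max=1.0):
    """Hypothetical feed-forward correction for the unobserved VMPL portion.

    Assumes Y ≈ y_hat + gain * u for a downstream actuator u with a known,
    non-zero gain. Chooses u to cancel the predicted deviation from target,
    then clips it to the actuator limits [u_min, u_max].
    """
    if gain == 0:
        raise ValueError("gain must be non-zero")
    u = (target - y_hat) / gain
    return min(max(u, u_min), u_max)
```

A real scheme would also need to account for the VM prediction error, since a correction based on a biased ŷ moves Y away from target.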



Thank you for your attention!


ACKNOWLEDGMENT - The financial support of the Irish Centre for Manufacturing Research and Enterprise Ireland is gratefully acknowledged.

