Introduction Regularization Methods Experimental Settings Experimental Results Conclusions
Virtual Metrology Enabled Early Stage Prediction for Enhanced Control of Multi-stage Fabrication Processes
Gian Antonio Susto *, Adrian B. Johnston **, Paul G. O’Hara **, Seán McLoone ***
* National University of Ireland, Maynooth (NUI Maynooth) - ** Seagate Technologies, Derry, UK - *** NUIM and Queen’s University of Belfast, UK
CASE 2013 Madison - Aug. 18th, 2013
Gian Antonio Susto (NUI Maynooth) Early Stage VM for Multi-Stage Processes CASE 2013 Madison - Aug. 18th, 2013 1 / 15
Virtual Metrology (VM) systems
Semiconductor Industry: one of the most technologically advanced manufacturing sectors
Wafer sizes are growing ⇒ Process Quality and Control of increasing importance
Virtual Metrology (VM) systems: models of the process based on tool variables used to predict product parameters that are costly (time & money) to measure directly
Possible advantages of a VM system:
(i) Quality Monitoring
(ii) Measurement Reduction
(iii) Smart Decision Systems (e.g. Skip Lot Sampling)
(iv) Enhanced Control
Multi-Step VM Problems
Generally VM systems consider a single processing step
The VM target (Y) can be influenced by many processing steps
[Figure: VM Production Line (VMPL) for wafer k, Processes 1 to 5 leading to the VM Target]
[Figure: Processes Description. Dry Etching followed by Write Pole Formation, with layers PR1, PR2, PR3, PR4 (Sacrificial layer, SL), Layer 5 and Feature Material, and the quantities YL, YRE and Y. Goal: predicting YF after the Dry Etching and before the Write Pole Formation]
Virtual Metrology Production Line (VMPL): group of processes that 'strongly' influence Y
Case study available: prediction of write pole width (Y)
2 processes strongly affect the VM target Y
(i) Dry Etching
(ii) Write Pole Formation
Multi-Step VM Problems
[Figure: Processes Description (Dry Etching, Write Pole Formation), with the quantities YL, YRE and Y]
VM objective: Prediction of Y before the Write Pole Formation
In the problem at hand we need an early stage prediction of the VM target
Control/Decision system usage of VM prediction
In the VM objective the VMPL input data for modeling are divided into
(i) Observed Portion (XO)
(ii) Unobserved Portion (XU)
The modeling problem is approximated
$y = f(x_O, x_U) \Rightarrow y = f(x_O)$
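The split into observed and unobserved inputs can be sketched in a few lines of Python. This is only an illustration: the record layout, the function name `split_vmpl` and all step and variable names are hypothetical and do not come from the case study. It assumes the prediction is made after Dry Etching, so only that step counts as observed.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each VMPL process step contributes named tool variables.
Record = Dict[str, Dict[str, float]]


def split_vmpl(
    record: Record, observed_steps: List[str]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Flatten one wafer record into observed (x_O) and unobserved (x_U) inputs."""
    x_obs: Dict[str, float] = {}
    x_unobs: Dict[str, float] = {}
    for step, variables in record.items():
        bucket = x_obs if step in observed_steps else x_unobs
        for name, value in variables.items():
            bucket[f"{step}.{name}"] = value
    return x_obs, x_unobs


# Illustrative values only; step and variable names are made up.
wafer_k: Record = {
    "dry_etching": {"chamber_pressure_avg": 1.2, "y_re": 0.48},
    "write_pole_formation": {"rf_power_max": 310.0},
}

# At prediction time only x_o exists, so the model is fitted as y = f(x_o).
x_o, x_u = split_vmpl(wafer_k, observed_steps=["dry_etching"])
```

Everything that lands in `x_u` is ignored by the early stage model, which is exactly the approximation stated on the slide.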
[Figure: VMPL for wafer k (Processes 1 to 5) divided into an Observed VMPL Portion and an Unobserved VMPL Portion]
Challenging Aspects
Apart from the need for an early stage prediction, two factors add to the challenge of the problem at hand (from a model-building point of view):
(i) The desire to have an interpretable model
(ii) The high level of collinearity in the data
[Figure: Correlation Matrix (Absolute Values), Regressor ID versus Regressor ID, color scale from 0.1 to 1]
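A matrix of absolute correlations like the one in the figure could be computed as in the sketch below. It uses plain Python and toy columns; the function name `abs_correlation_matrix` is an assumption for illustration, and the real data are not reproduced here.

```python
import math
from typing import List


def abs_correlation_matrix(columns: List[List[float]]) -> List[List[float]]:
    """Return |Pearson correlation| between every pair of regressor columns.

    Assumes no column is constant (a constant column would divide by zero).
    """
    def centered(col: List[float]) -> List[float]:
        mean = sum(col) / len(col)
        return [v - mean for v in col]

    cen = [centered(c) for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cen]
    p = len(columns)
    corr = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            dot = sum(u * v for u, v in zip(cen[a], cen[b]))
            corr[a][b] = abs(dot / (norms[a] * norms[b]))
    return corr


# Toy regressors: the second is an exact negative multiple of the first,
# the third is uncorrelated with the first.
cols = [
    [1.0, 2.0, 3.0, 4.0],
    [-2.0, -4.0, -6.0, -8.0],
    [1.0, -1.0, -1.0, 1.0],
]
C = abs_correlation_matrix(cols)
```

Off-diagonal entries close to 1 flag pairs of regressors that carry nearly the same information, which is the collinearity issue raised on this slide.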
Modeling for VM: State of the Art
[Figure: feed-forward neural network with input layer (u1 ... umI), hidden layer (v1 ... vmH) and output layer (w1 ... wmO)]
Neural Networks (NNs): state of the art in VM modelling
(+) Non-linear modelling
(+) Great prediction accuracy
(-) Slow training
(-) Difficult interpretation
(-) With big data requires pre-processing
Regularization Methods: competitive with NNs in prediction accuracy and well suited to large datasets
Most famous approaches
(i) Ridge Regression
(ii) LASSO
G.A. Susto, A. Beghi, "Least Angle Regression for Semiconductor Manufacturing Modeling", IEEE Multi-Conference on Systems and Control, Dubrovnik (Croatia), October 3-5th, 2012, pp. 658-663
LASSO and Ridge Regression 1/2
Minimization Objective for Regularization Methods:
$L(\beta) = \|Y - X\beta\|^2 + \lambda R(\beta)$,
trade-off between prediction accuracy and model complexity governed by λ
Application of Occam’s razor: simplicity enhances generality of results
[Figure: LASSO sparsity example, showing contour lines and the LASSO admissible region]
Different methods depending on the choice of R
(i) Ridge Regression (RR): $R(\beta) = \sum_{j=1}^{p} \beta_j^2$
(ii) LASSO: $R(\beta) = \sum_{j=1}^{p} |\beta_j|$
RR works best with strong collinearities
LASSO does variable selection
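As a small worked example, the objective L(β) can be evaluated directly for both penalties. The data and the helper names (`ridge_penalty`, `lasso_penalty`, `objective`) are illustrative assumptions, not part of the presentation.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def ridge_penalty(beta: Vector) -> float:
    """R(beta) = sum_j beta_j^2 (squared L2 norm)."""
    return sum(b * b for b in beta)


def lasso_penalty(beta: Vector) -> float:
    """R(beta) = sum_j |beta_j| (L1 norm)."""
    return sum(abs(b) for b in beta)


def objective(
    X: Matrix, y: Vector, beta: Vector, lam: float,
    penalty: Callable[[Vector], float],
) -> float:
    """L(beta) = ||y - X beta||^2 + lam * R(beta)."""
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(row, beta))
        for row, yi in zip(X, y)
    ]
    return sum(r * r for r in residuals) + lam * penalty(beta)


# Toy data fitted exactly by beta = [1, 2], so only the penalty term remains.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
beta = [1.0, 2.0]

ridge_value = objective(X, y, beta, lam=0.5, penalty=ridge_penalty)  # 0.5 * 5
lasso_value = objective(X, y, beta, lam=0.5, penalty=lasso_penalty)  # 0.5 * 3
```

Because the fit is exact here, the two values differ only through R(β), which makes the role of λ as a price on model complexity easy to see.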
LASSO and Ridge Regression 2/2
RR: coefficients are reduced in magnitude
LASSO: one coefficient at a time 'enters' the model
Methods already employed in VM literature (RR [Schirru2011], LASSO [Pampuri2011])
[Figure: coefficient paths βj versus 1/λ, one panel for Ridge Regression and one for LASSO]
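The different behaviour of the two penalties is easiest to see in the special case of an orthonormal design (XᵀX = I), where both minimizers of ‖Y − Xβ‖² + λR(β) have a closed form in terms of the OLS coefficient. The sketch below assumes that special case, which is not the situation of the case study, and the function names are made up for illustration.

```python
import math


def ridge_orthonormal(b_ols: float, lam: float) -> float:
    """Minimizer of (b_ols - b)^2 + lam * b^2: proportional shrinkage."""
    return b_ols / (1.0 + lam)


def lasso_orthonormal(b_ols: float, lam: float) -> float:
    """Minimizer of (b_ols - b)^2 + lam * |b|: soft thresholding at lam / 2."""
    return math.copysign(max(abs(b_ols) - lam / 2.0, 0.0), b_ols)


# Illustrative OLS coefficients; large lam corresponds to small 1/lam.
b_ols = [3.0, -1.0, 0.4]
for lam in (4.0, 1.0, 0.0):
    ridge = [ridge_orthonormal(b, lam) for b in b_ols]
    lasso = [lasso_orthonormal(b, lam) for b in b_ols]
    print(f"lam={lam}: ridge={ridge} lasso={lasso}")
```

Ridge scales every coefficient but never sets one exactly to zero, while LASSO keeps a coefficient at zero until λ/2 drops below its OLS magnitude, so coefficients enter one at a time as 1/λ grows.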
[Schirru2011] A. Schirru, S. Pampuri, C. De Luca, G. De Nicolao, "Multilevel Kernel Methods for Virtual Metrology in Semiconductor Manufacturing", 18th IFAC World Congress, Milan (Italy), 28 Aug.-2 Sep. 2011, pp. 11614-11621
[Pampuri2011] S. Pampuri, A. Schirru, G. Fazio, G. De Nicolao, "Multilevel Lasso applied to Virtual Metrology in Semiconductor Manufacturing", 7th IEEE CASE, Trieste (Italy), 24-27 Aug. 2011, pp. 244-249
Challenging Aspects
We require both qualities (sparsity and robustness to collinearities)
Elastic Nets (ENs) [Zou2005] combine the two approaches
$R(\beta) = \sum_{j=1}^{p} \left[ \alpha \beta_j^2 + (1 - \alpha) |\beta_j| \right]$
Increased complexity (trade-off α between L1 and L2 penalty)
Optimized algorithms for computation [Friedman2010]
ENs work well even for problems where n < p
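A minimal coordinate descent sketch for the elastic net objective, written against the parametrization on this slide (α = 1 is pure ridge, α = 0 is pure LASSO, which is the reverse of the convention used by some software packages). It is a didactic version in plain Python, not the optimized algorithm of [Friedman2010]; the function names and the toy data are assumptions.

```python
from typing import List


def soft_threshold(z: float, t: float) -> float:
    """Shrink z towards zero by t, returning 0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(
    X: List[List[float]], y: List[float],
    lam: float, alpha: float, n_sweeps: int = 200,
) -> List[float]:
    """Minimize ||y - X b||^2 + lam * sum_j (alpha * b_j^2 + (1 - alpha) * |b_j|).

    Assumes every column has a non-zero norm or lam * alpha > 0.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta for beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Inner product of column j with the residual that excludes b_j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * (1.0 - alpha) / 2.0) / (col_sq[j] + lam * alpha)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Two perfectly collinear regressors (identical columns), illustrative only.
X_dup = [[1.0, 1.0], [1.0, 1.0]]
y_dup = [2.0, 2.0]
beta_lasso = elastic_net_cd(X_dup, y_dup, lam=1.0, alpha=0.0)
beta_en = elastic_net_cd(X_dup, y_dup, lam=1.0, alpha=0.5)
```

On the duplicated columns the pure LASSO setting keeps one regressor and zeroes the other, whereas the mixed penalty shares the weight equally between them, which illustrates why the L2 part helps under collinearity.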
[Zou2005] H. Zou and T. Hastie, "Regularization and Variable Selection via the Elastic Net", Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, pp. 301-320, 2005
[Friedman2010] J. Friedman, T. Hastie and R. Tibshirani, "Regularization Paths for Generalized Linear Models via Coordinate Descent", Journal of Statistical Software, vol. 33, pp. 1-22, 2010
Data Available and Current Policy
[Figure: Processes Description (Dry Etching, Write Pole Formation), annotated with KPIVs and the quantities YL, YRE and Y]
Available Measures:
(i) YL, width after Litho (process state)
(ii) YRE, width after the Dry Etch
Current Policy: monitoring of YRE as a predictor of variation in Y
Input data for our model:
(a) KPIVs, dry etching statistics (avg., min., max.)
(b) YRE
(c) YL
Data Available and Current Policy
Data Description:
(a) n = 870 wafers (collected over 5 months)
(b) p = 327 input variables
Monte Carlo: K = 1000 simulations with q × 100% = 70% training data
Results reported as averages over K in terms of the indicators
$\text{Av. MSE} = \frac{1}{K(1-q)n} \sum_{k=1}^{K} \sum_{i=1}^{(1-q)n} (y_{i,k} - \hat{y}_{i,k})^2$
$\text{Av. NMSE} = \frac{100}{K(1-q)n} \sum_{k=1}^{K} \sum_{i=1}^{(1-q)n} \frac{(y_{i,k} - \hat{y}_{i,k})^2}{\sigma^2_{y_k}}$ [%]
Techniques compared:
(i) target - production target for Y (used as single predictor of Y)
(ii) OLS on YRE - Ordinary Least Squares model based on YRE (current policy)
(iii) Ridge Regression (RR)
(iv) LASSO
(v) Elastic Nets (ENs)
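The evaluation protocol (K random splits with a fraction q used for training, then averaged MSE and NMSE) can be sketched as follows. Several details are assumptions made for the sake of a runnable example: the data are synthetic, a one-regressor OLS stands in for "OLS on YRE", and the normalizing variance of each run is taken as the variance of y on that run's test set.

```python
import random
from statistics import mean, pvariance
from typing import Callable, List, Tuple

Model = Callable[[float], float]


def fit_ols_1d(x: List[float], y: List[float]) -> Model:
    """Ordinary least squares with a single regressor."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return lambda v: my + slope * (v - mx)


def monte_carlo_scores(
    x: List[float], y: List[float],
    fit: Callable[[List[float], List[float]], Model],
    K: int = 1000, q: float = 0.7, seed: int = 0,
) -> Tuple[float, float]:
    """Return (averaged MSE, averaged NMSE in percent) over K random splits."""
    rng = random.Random(seed)
    n = len(y)
    n_train = int(round(q * n))
    mse_runs, nmse_runs = [], []
    for _ in range(K):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        model = fit([x[i] for i in train], [y[i] for i in train])
        mse = mean([(y[i] - model(x[i])) ** 2 for i in test])
        mse_runs.append(mse)
        # Assumption: sigma^2_{y_k} is the variance of y on the test set of run k.
        nmse_runs.append(100.0 * mse / pvariance([y[i] for i in test]))
    return mean(mse_runs), mean(nmse_runs)


# Synthetic stand-in data: a linear relation plus Gaussian noise.
noise = random.Random(1)
x_demo = [i / 10 for i in range(50)]
y_demo = [2.0 * v + noise.gauss(0.0, 0.5) for v in x_demo]
av_mse, av_nmse = monte_carlo_scores(x_demo, y_demo, fit_ols_1d, K=200)
```

Since every run has the same test-set size, averaging the per-run means is equivalent to the double sum in the two indicators. Any of the compared techniques could be plugged in through the `fit` argument.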
Performances
Boxplot: summary of all Monte Carlo simulation performances
EN yields best results among the methods considered
Regularization Techniques outperform current policy
The median explained variability in Y improves by 35% in relative terms (from 46% to 62%)
[Figure: boxplots of the Normalized Mean Squared Error Distribution for Target, OLS on Y_RE, RR, LASSO and Elastic Net]
TABLE I: AVERAGED MSE AND NMSE OVER K = 1000 MC SIMULATIONS.
Method             Averaged MSE [∗10^4]   Averaged NMSE
Target             1.784                  87.86 %
OLS with yRE       1.156                  53.82 %
Ridge Regression   1.01                   46.69 %
LASSO              0.89                   40.9 %
Elastic Net        0.832                  37.95 %
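The 35% figure is a relative gain, which a short calculation makes explicit. The second computation below assumes that explained variability equals 100% minus NMSE and applies it to the averaged values of Table I; the slide itself quotes medians, so the two numbers need not coincide exactly.

```python
def relative_gain(before_pct: float, after_pct: float) -> float:
    """Relative improvement, in percent, when going from before_pct to after_pct."""
    return 100.0 * (after_pct - before_pct) / before_pct


# Medians quoted on the slide: 46% explained before, 62% after.
slide_gain = relative_gain(46.0, 62.0)

# Assumption: explained variability = 100 - NMSE, using Table I averages.
explained_ols = 100.0 - 53.82  # OLS with yRE (current policy)
explained_en = 100.0 - 37.95   # Elastic Net
table_gain = relative_gain(explained_ols, explained_en)

print(f"slide medians: {slide_gain:.1f}%  table averages: {table_gain:.1f}%")
```

Both routes land close to 35%, so the medians on the slide and the averages in the table tell a consistent story.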
Model Interpretation
Model Size: number of variables with associated non-zero coefficients
Sparsity achieved: as expected, just a portion of the 327 variables enter the model
[Figure: histograms of Model Size versus Times Selected, one panel for LASSO and one for Elastic Net]
[Figure: Coefficients Distribution (scale ×10^−3) for Y_RE, CHA_VO MIN @ Stage 9, CHA_BOTTOM_POWER_VPP MIN @ Stage 9, CHA_VO MAX @ Stage 9 and CHA_TOP_POWER_VPP MAX @ Stage 4, labelled VAR A to VAR E]
Model insights: the most important variables (i.e. by order of entry into the model or frequency of selection) can be identified
Model coefficients can be studied to see the effect on the output
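Model size and frequency of selection can both be read off the fitted coefficient vectors of the Monte Carlo runs. The sketch below assumes one coefficient list per run; the coefficient values are invented for illustration, and only the variable labels are borrowed from the figure.

```python
from collections import Counter
from typing import List


def model_sizes(coef_runs: List[List[float]], tol: float = 0.0) -> List[int]:
    """Number of non-zero coefficients in each run."""
    return [sum(1 for c in coefs if abs(c) > tol) for coefs in coef_runs]


def selection_frequency(
    coef_runs: List[List[float]], names: List[str], tol: float = 0.0
) -> Counter:
    """Count in how many runs each variable has a non-zero coefficient."""
    counts: Counter = Counter()
    for coefs in coef_runs:
        for name, c in zip(names, coefs):
            if abs(c) > tol:
                counts[name] += 1
    return counts


# Invented coefficients for three runs; labels taken from the slide's figure.
names = ["Y_RE", "VAR A", "VAR B"]
coef_runs = [
    [0.0040, 0.0, 0.0010],
    [0.0030, -0.0005, 0.0],
    [0.0035, 0.0, 0.0],
]
sizes = model_sizes(coef_runs)
freq = selection_frequency(coef_runs, names)
top_variable = freq.most_common(1)[0][0]
```

Ranking variables by `freq` gives the frequency-of-selection view mentioned on the slide, and a histogram of `sizes` corresponds to the Model Size panels.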
Conclusions and Open Control Problem
A VM system for a multi-stage VM prediction problem has been presented
Two innovations
(a) Early-Stage VM scheme (observable and non-observable VMPL)
(b) Elastic Nets
Open problem: feed-forward control scheme with VM measures
[Figure: VMPL (Processes 1 to 5) with Observed and Unobserved VMPL Portions, a VM module, a Control system and a Target]
Thank you for your attention !
Virtual Metrology Enabled Early Stage Prediction for Enhanced Control of Multi-stage Fabrication Processes
Gian Antonio Susto, Adrian B. Johnston, Paul G. O’Hara and Seán McLoone
ACKNOWLEDGMENT - The financial support of the Irish Centre for Manufacturing Research and Enterprise Ireland is gratefully acknowledged