
This document does not contain Technical Data or Technology as defined in the ITAR Part 120.10 or EAR Part 772

Applying Machine Learning and Design of Experiments to Visually-Intensive Process Metrics

George S. Baggs
Systems Engineer, Moog Space and Defense Group, East Aurora, New York

Version 20181025


Agenda

• Introduction

• What is Machine Learning?

• DOE – an Overview

• Visual Classification of Process Output via Machine Learning – an Overview

• Application 1: Visual Classification of Experimental Output

– Unsupervised Machine Learning

– Supervised Machine Learning

• Application 2: DoE Optimization of a CNN – a Demonstration

• Summary

• More Information

• Questions?

October 16, 2018 ACQ Buffalo 2018 Problem Solving Showcase 2


Introduction

• Machine Learning has been used at Moog to improve the quality of processes with visual outputs

• Machine Learning has been applied within the framework of statistical Design of Experiments (DoE) to improve experimental response quality for Additive Manufacturing (AM) process development

• Machine learning is simply another tool that complements traditional methods

• DoE can also be used to optimize the hyper-parameters of a deep learning CNN (Convolutional Neural Network)



What is Machine Learning?

• A data analysis method that automates analytical model building [1]

• A branch of Artificial Intelligence (AI), based on the idea that machines can [2]

– Learn from data and identify patterns

– Make decisions with minimal human intervention

[1,2] SAS Machine Learning https://www.sas.com/en_us/insights/analytics/machine-learning.html

[Diagram: Input, Data Preparation, Transformation / Feature Creation, Output]

Terminology (Statistics = Machine Learning):

– Dependent Variable = Label

– Variable = Feature

– Transformation = Feature Creation



What is Machine Learning?

Two different types of Machine Learning have been used at Moog to help improve process quality [3]

• Supervised: the machine is shown both the inputs and outputs (labeled examples), and it then learns the relationship between data inputs and outputs

– Statistical example would be a regression fit

• Unsupervised: the machine is shown the input data, and it then determines what structures and patterns exist within the data

– Statistical example would be finding data outliers

[3] Other types include semi-supervised learning (limited labeled examples) and reinforcement learning (learning by trial and error with a reward function)

Note that Moog is exploring these as well; however, only the two above pertain to this presentation



What is Deep Learning?

Deep Learning is a subfield of machine learning:

• Utilizes many stacked (i.e. ‘Deep’) layers of Artificial Neural Networks (ANN)

• ANNs are inspired by the function of neurons in the brain

• Two primary types:

– Convolutional Neural Networks (CNN) for recognizing patterns in spatial data (e.g. pictures)

– Recurrent Neural Networks (RNN) for recognizing patterns in temporal data (e.g. time series)

• CNNs and RNNs may be combined

– Example: recognize patterns in a series of images (e.g. video)

[Diagram: Input, Feature Extraction, Feature Creation, Output; Data Preparation & Training]



What is Design of Experiments?

A branch of applied statistics that deals with planning, conducting, analyzing and interpreting controlled tests for evaluation of factor effects on a parameter or group of parameters [4, 5]

• Strategically designed and executed

• Efficient (simultaneous study of factors)

• Provides for error control and error quantification

• Facilitates unbiased evaluation of factor effects and interactions

[4] From ASQ: http://asq.org/learn-about-quality/data-collection-analysis-tools/overview/design-of-experiments.html

[5] See: http://www.moog.com/news/blog-new/GeorgeBaggsonAdditiveManufacturing.html



A Visual Classification of Process Output

Automatic inspection and classification of metallic grain structure [6]

[6] Adapted Chollet simple binary classifier CNN: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

[Figures: example images of good grains and bad grains; CNN summary (950,369 parameters); Training & Test Accuracy plot]

Approximately 2500 images were used to create 7000 examples for the CNN training and validation dataset



A Visual Classification of Experimental Output

When performing a DoE on an Additive Manufacturing (AM) process [7], the experimental treatment combinations are arranged spatially on the AM machine build plate

[Figure: build-plate layout of treatment combinations by factor (Factor A through Factor F), followed by ANOVA]

[Figure: Main Effects Plot for Specific Energy Density (Simulation), data means, J/mm^3; factors: A_Power, B_Exp Time, C_Point Dist, D_Hatch Dist, E_Focal Dist, F_Layer Thk]

[Figure: Interaction Plot for Specific Energy Density (Simulation), data means; same six factors]

[Figure: Residual Plots for Specific Energy Density; panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order]

Typical responses = Bulk metallic properties


[7] http://www.moog.com/news/blog-new/GeorgeBaggsonAdditiveManufacturing.html


Problem Statement

Additive Manufacturing DoE coupons exhibited two non-desirable features that were visually obvious

• Raised lumpy structures that tend to catch and jam the AM machine’s recoater blade

• Highly visible seams in the material

• These two features were independent of the bulk material properties of the metal


[Figures: VOL DOE2 Response Coupon 1-4 and VOL DOE2 Response Coupon 2-15]



How to Quantify Visual Metrics?

The presence of non-desirable visual features could be rated by people, and these ratings could be used as new responses for the experiment…but…

• People tend to provide biased ratings

• People tend to be inconsistent (high variability)

• The number of participants should be as large as possible (often impractical)

• There are techniques that could be used to help mitigate these problems (e.g. psychometric methods) [8]

The decision was instead made to use a machine-learning-based method to eliminate the problems associated with people-provided ratings [9]


[8] https://www.psychometricsociety.org/content/what-psychometrics

[9] We have conducted studies that demonstrate that a machine-learning algorithm is almost twice as accurate as human SMEs



An Unsupervised Machine Learning Approach

An unsupervised k-means clustering machine learning algorithm was used to find and isolate the raised lumpy structures

• The algorithm was programmed to find two classes of data within the images, and then to color-classify these as either green or red pixels on the raw images

• The algorithm was used to color-classify all 40 coupon images from the DoE

– Images with more solid lumpy areas have less collective boundary length between the red and green than images with dispersed smaller areas of red

– The images were stored as JPEGs, which use a ‘lossy’ compression technique

– The algorithm clusters data by trying to separate samples into ‘n’ groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares [10]. Here ‘n’ = 2 (classes).

– These characteristics were exploited using some digital signal processing [11]

[Figures: VOL DOE2 Response Coupon 1-4 and VOL DOE2 Response Coupon 2-15 (raw images); K-means Classification of Response Coupon 1-4 and K-means Classification of Response Coupon 2-15]

[10] http://scikit-learn.org/stable/modules/clustering.html#k-means

[11] Processing on the pixel color frequencies (the image histogram)
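The two-class clustering described on this slide can be sketched in a few lines. This is a minimal pure-Python illustration of k-means (Lloyd's algorithm) with n = 2 on scalar pixel intensities, not the code used at Moog; footnote [10] points to the scikit-learn implementation. The function name `kmeans_1d`, the sample pixel values, and the rule that the brighter cluster is colored red are all assumptions made for the example.

```python
import random

def kmeans_1d(values, k=2, iters=50, seed=0):
    """Lloyd's algorithm on scalar values; returns (centers, labels, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(values)), k)  # k distinct starting centers

    def assign(cs):
        # Label each value with the index of its nearest center.
        return [min(range(k), key=lambda j: (v - cs[j]) ** 2) for v in values]

    for _ in range(iters):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers

    labels = assign(centers)
    # Inertia: within-cluster sum of squared distances to the assigned center.
    inertia = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return centers, labels, inertia

# Hypothetical grayscale intensities (0..255) from one flattened image.
pixels = [10, 12, 11, 200, 210, 205, 9, 198]
centers, labels, inertia = kmeans_1d(pixels, k=2)

# Assumption for illustration: the brighter cluster is painted red, the other green.
bright = max(range(2), key=lambda j: centers[j])
colors = ["red" if lab == bright else "green" for lab in labels]
red_fraction = colors.count("red") / len(colors)
```

A real coupon image would be clustered on its color channels rather than on a handful of scalars, but the assignment and update steps are the same.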



Statistical Analysis of K-Means Classification Response Metric

• The DoE matrix was an RSM (Response Surface Model)

– Central Composite Design (CCD), with standard star points (α)

– Three factors

[Figure: CCD geometry on the Factor A, Factor B and Factor C axes, showing Cube Points (+/-1), Center Point (0) and Star Points (+/- α)]



• ANOVA (Analysis of Variance) [12]

These extremely high F-ratios and 0% P-values indicate these factors have a very strong effect on the variation seen in the k-means response metric

Error was less for the k-means response than was seen for the traditional additive manufacturing bulk properties responses


[12] ANOVA: https://en.wikipedia.org/wiki/Analysis_of_variance



• Main Effects

[Figure: Main Effects Plots (fitted means), GRCop84 Red Histogram Ratio on KMeans Classification of Raw Top Surface Images; y-axis: Mean of Red HRatio (higher HRatio means more lumpy); x-axes: Factor A, Speed (mm/sec); Factor B, Hatch (um); Factor C, Overlap (um); each at low, medium and high. Annotation: these two factors were statistically significant]

• Interaction

[Figure: Interaction Contour Plot, GRCop84 Red Histogram on KMeans Classification of Raw Top Surface Images; Red HRatio contours from < 1.2 to > 2.4 over Factor A, Speed (mm/sec), and Factor B, Hatch (um); hold value: Overlap (um) = 50]



• Residual Plots

– Residuals are the ‘left-over’ variation after the RSM k-means fit, and they can provide further clues

– Surprisingly well-behaved


[Figure: Residual Plots for Red HRatio; panels: normal probability plot (Percent vs. Residual), Residual vs. Fitted Value, histogram (Frequency vs. Residual), Residual vs. Observation Order for 40 observations]



Deep-Learning CNN [13] to Recognize the Seams

• Raw data preparation (a necessary operation for CNNs)

– Image borders were removed and the remaining field was partitioned into 4 quadrants

– To reduce dimensionality, image pixel values were rescaled by 1/255 (to the range 0 to 1) and then gray-scaled (using weighted averaging on RGB channels)

A Supervised Machine Learning Approach


[13] http://cs231n.github.io/convolutional-networks/
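The preparation steps above (border removal, rescaling by 1/255, weighted gray-scaling, and partitioning into quadrants) can be sketched as follows. This is an illustrative pure-Python version working on nested lists; the function names, the border width, and the specific channel weights (the common 0.299 / 0.587 / 0.114 luma weights) are assumptions, since the slides do not state them.

```python
# Assumed RGB weights for the weighted average; the slides do not give the values.
GRAY_WEIGHTS = (0.299, 0.587, 0.114)

def crop_border(image, border):
    """Drop `border` pixels from every edge of a row-major image."""
    return [row[border:len(row) - border] for row in image[border:len(image) - border]]

def to_gray(rgb_image):
    """Map rows of (R, G, B) values in 0..255 to rows of gray floats in 0..1."""
    return [
        [sum(w * c for w, c in zip(GRAY_WEIGHTS, px)) / 255.0 for px in row]
        for row in rgb_image
    ]

def quadrants(image):
    """Split an image into top-left, top-right, bottom-left, bottom-right parts."""
    mid_r, mid_c = len(image) // 2, len(image[0]) // 2
    return [
        [row[:mid_c] for row in image[:mid_r]],
        [row[mid_c:] for row in image[:mid_r]],
        [row[:mid_c] for row in image[mid_r:]],
        [row[mid_c:] for row in image[mid_r:]],
    ]

# Hypothetical 6x6 test image whose pixel value encodes its (row, column) position.
raw = [[(10 * r + c,) * 3 for c in range(6)] for r in range(6)]
quads = quadrants(to_gray(crop_border(raw, border=1)))
```

Each coupon image then yields four smaller gray-scale inputs, which is what allows four CNN classifications per DoE treatment later in the analysis.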


CNN Dataset Preparation

• Training & validation dataset created

– Relatively sparse data so standard deep-learning data augmentation techniques were used [14]

– Random width and height shifts, random shears and zooms, and random rotations of dataset images

– Increased apparent size of the available data samples from 40 images to 200 total (140 for training and 60 for validation)

– Dataset was split 50/50 between seams and no-seams examples

[14] Francois Chollet (Google engineer): https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html



CNN Training and Validation

• Dynamic augmentation was applied during training as an in-memory preprocessing front end

• Trained in batches of 20 images across 500 training epochs

– Approximately 2 sec/epoch on an NVIDIA Titan Xp GPU (Graphics Processing Unit)

– 17 minutes of training for all 500 epochs

• Maximum CNN performance

– Training accuracy 95%; validation accuracy 91.7%; overall 94% (200 dataset images)

• Observations

– Elevation of training accuracy above validation accuracy around Epoch 200 indicates that the CNN was overfitting, but in this case, this was not a concern
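The overall figure quoted above is consistent with a sample-weighted average of the training and validation accuracies. The short check below reproduces it from the numbers on the slide; the variable names are illustrative only.

```python
# Dataset sizes and accuracies as stated on the slides.
train_n, val_n = 140, 60
train_acc, val_acc = 0.95, 0.917

# Overall accuracy is the average weighted by the number of images in each set.
overall = (train_acc * train_n + val_acc * val_n) / (train_n + val_n)

# Total training time at roughly 2 seconds per epoch for 500 epochs, in minutes.
training_minutes = 2 * 500 / 60
```

Here `overall` comes out at about 0.94 and `training_minutes` at about 16.7, matching the rounded 94% and 17 minutes reported.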




CNN Seam Parametric Response Metric

• The trained CNN was then used to classify each of the original greyscale image quadrants from the DoE as either no-seam = 0.00 or seam = 1.00

– The CNN achieved 94.38% classification accuracy on the original images

– Accuracy was measured against the visual seam / no-seam sorting done when the training and validation sets were first created

• The 4 classification metrics per response image were averaged for each DoE treatment

– This ensemble approach improves the ‘depth’ of the experimental response metric
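Averaging the four quadrant classifications turns a binary label into a response with five possible levels (0, 0.25, 0.5, 0.75, 1) per treatment. A minimal sketch of that aggregation is shown below; the function name, the treatment identifiers and the example labels are hypothetical and are not results from the experiment.

```python
from collections import defaultdict

def treatment_response(quadrant_labels):
    """Average the 0.0 / 1.0 quadrant labels for each treatment identifier."""
    groups = defaultdict(list)
    for treatment, label in quadrant_labels:
        groups[treatment].append(label)
    return {treatment: sum(vals) / len(vals) for treatment, vals in groups.items()}

# Hypothetical CNN outputs: four quadrant labels for each of two coupons.
example = [
    ("coupon-A", 1.0), ("coupon-A", 1.0), ("coupon-A", 0.0), ("coupon-A", 1.0),
    ("coupon-B", 0.0), ("coupon-B", 0.0), ("coupon-B", 0.0), ("coupon-B", 0.0),
]
responses = treatment_response(example)
```

The averaged value is what would be entered as the seam response for each row of the DoE matrix.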




Statistical Analysis of CNN Seam Response Metric

• ANOVA (Analysis of Variance)


Extremely high F-ratio and 0% P-values indicate this factor has a very strong effect on the variation seen in the CNN seam response metric

Error was less for the CNN seam response than was seen for the traditional additive manufacturing bulk properties responses

The curved component of Factor A was significant



• Main Effects

[Figure: Main Effects Plots (fitted means), GRCop84 Presence of Seam by CNN Classification of Raw Top Surface Images; y-axis: Mean of Seam-Probability (higher Pseam means more seam); x-axes: Factor A, Speed (mm/sec); Factor B, Hatch (um); Factor C, Overlap (um); each at low, medium and high. Annotation: these two factors were statistically significant]

• Interaction

[Figure: Interaction Contour Plot, GRCop84 Presence of Seam by CNN Classification of Raw Top Surface Images; Seam-Probability contours from < -0.50 to > 0.75 over Factor A, Speed (mm/sec), and Factor B, Hatch (um); hold value: Overlap (um) = 50]




• Residual Plot

– Residuals are the ‘left-over’ variation after the RSM CNN seam fit, and they can provide further clues

– Surprisingly well-behaved


[Figure: Residual Plots for Seam-Probability; panels: normal probability plot (Percent vs. Residual), Residual vs. Fitted Value, histogram (Frequency vs. Residual), Residual vs. Observation Order for 40 observations]



Augmenting DoE Visual Responses

• The use of ML techniques to create response metrics from visual images, for an Additive Manufacturing DoE, was positive

– Provides a surrogate for human subject-matter experts to classify and rate visual responses

– A CNN trained just for the purpose of evaluating a DoE response can over-fit without much worry:

o The experimental environment was a one-time situation and was not intended for production (the CNN has a very limited charter)

o The machine was intended to replace a human-rating with a non-biased metric

– If necessary, a production version of the CNN would be trained using an adequately sized dataset

Machine Learning with DoE



Structured Experimentation to Optimize a CNN

• Almost all publications concerning deep learning and AI report a one-factor-at-a-time (1FAT) approach to testing and algorithm optimization

• A deep learning CNN has hundreds of categorical and parametric hyperparameters (algorithm factors) to set and optimize

• When choosing methods and configurations, much of the approach follows so-called best practices

• It became apparent that the field of deep learning AI would benefit from the use of a structured experimental approach such as DoE

– The MNIST dataset was chosen as the vehicle to test this idea

– MNIST has established benchmarks of performance that can be used for CNN performance comparison

DoE Applied to Deep Learning Optimization



What is the MNIST dataset?

• Modified NIST (National Institute of Standards and Technology) handwriting database [15]

– Original NIST Special Database 19 created in 1995 (postal character recognition)

– Now widely used to baseline performance of various Machine Learning systems

– Dr. Sargur N. Srihari [16], UB Department of Computer Science and Engineering (CSE)


[15] The MNIST Database of handwritten digits: http://yann.lecun.com/exdb/mnist/

[16] SUNY at Buffalo, Dr. Sargur N. Srihari: https://cedar.buffalo.edu/~srihari/



Machine Learning Benchmarks using MNIST

• The dataset contains 70,000 examples: 60,000 training images and 10,000 test images

• Benchmark summary below (Wikipedia) [17]


[17] Wikipedia: MNIST Database: https://en.wikipedia.org/wiki/MNIST_database

[Table: MNIST benchmark summary; entries dated from 1998 to 2016]



Machine Learning Benchmarks using MNIST

• Load the MNIST dataset into the Keras API [18, 19]


[18] Keras: https://keras.io/

[19] https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/

[Figure: false color images to visualize gray scale levels]



Machine Learning Toolset

• How the CNN was configured for this demonstration

– Blue boxes are provided as either open-source or no-fee license applications


[Diagram: toolset components]

– Python 2.7.8

– Anaconda Stack (Library of Computational APIs)

– Theano (ANN API from University of Montreal)

– Keras (Deep Learning API from François Chollet, a Google engineer)

– TensorFlow (ANN API from Google)

– CNTK (ANN API from Microsoft)

– SciPy (Science and Engineering Libraries)

– scikit-learn (Machine Learning Libraries)

– PyCharm (Commercial Python IDE)

Diagram note: everything inside the marked boundary is included in the Anaconda Stack



MNIST CNN Demonstration

• A large 5-layer CNN example from Dr. Jason Brownlee [20] was selected


[20] Dr. Jason Brownlee http://machinelearningmastery.com/start-here/



MNIST CNN DoE

• Brownlee Model Large CNN for MNIST

– This demonstration CNN was already configured for near-state-of-the-art performance using best-practices

– Performance yielded ~ 1% error rate after 10 epochs of training and validation

• Built using the Keras API

– No changes to the structure of the Brownlee model, but superficial modifications for the DoE were introduced

– 2 internal CNN factors and 2 external training factors were varied for this DOE

– Internal: convolutional layer output activation (Relu or tanh) and dropout layer rate (20% and 40%) – blue indicates Brownlee baseline

– External: training sample batch size (100 and 200) and optimization algorithm (Adam or Nadam) – blue indicates Brownlee baseline




MNIST CNN DoE

• Response Surface Model (RSM)

– Central Composite Design (CCD) w/ standard star points

– Batch (79, 100, 150, 200, 221)

– Drops (15.9%, 20%, 30%, 40%, 44.1%)

– Activation (Relu and Tanh) [21]

– Optimizer (Adam and Nadam) [22]

• Run only 5 epochs per treatment

– Brevity needed (I didn’t have the GPU machine when I did this)

[Figure: CCD layout showing Cube Points and Star Points (α); 2 continuous variables (Batches and Dropouts) and 2 discrete variables (Activation and Optimization)]

[21] https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6

[22] http://ruder.io/optimizing-gradient-descent/
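The factor levels listed on this slide are consistent with a two-factor CCD whose star distance is α = √2 (the rotatable choice, α = (2^k)^(1/4) with k = 2), applied around a center of 150 for batch size and 30% for dropout. The sketch below generates the coded points and converts them to real units; the function names, the single center point, and the full crossing with the two discrete factors are assumptions, not details taken from the original script.

```python
from itertools import product

def ccd_coded(k, n_center=1):
    """Coded points of a central composite design with rotatable star distance."""
    alpha = (2 ** k) ** 0.25
    cube = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    star = []
    for axis in range(k):
        for sign in (-alpha, alpha):
            point = [0.0] * k
            point[axis] = sign
            star.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return cube + star + center, alpha

def decode(coded, center, half_range):
    """Convert a coded level (-alpha..+alpha) to real units."""
    return center + coded * half_range

coded, alpha = ccd_coded(k=2, n_center=1)
batch_levels = sorted({round(decode(p[0], 150.0, 50.0), 3) for p in coded})
dropout_levels = sorted({round(decode(p[1], 0.30, 0.10), 6) for p in coded})

# Assumed: every continuous point is run at each activation / optimizer pairing.
treatments = list(product(coded, ("relu", "tanh"), ("adam", "nadam")))
```

Batch sizes must be whole numbers when training, which is presumably why the slide lists the star levels as 79 and 221 rather than 79.289 and 220.711.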


MNIST CNN DoE - Method

• DOE run programmatically in Python code

• Factor level changes made automatically as program sequences through DoE matrix

• CNN is trained and validated on each DoE treatment combination (from input CSV)

• Output from CNN is then loaded back into Minitab for statistical analysis
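The loop described in these bullets can be sketched as below: read the design matrix, train and score once per treatment row, and write the results back out as text for the statistics package. This is a hedged outline only. The column names, the `run_doe` and `placeholder_train_and_score` functions, and the in-memory CSV are hypothetical; the real script (MNIST_CNN_DOE_CCD-2C-2D.py) is not shown in the slides, and the placeholder returns a sentinel instead of training a CNN.

```python
import csv
import io

def run_doe(matrix_csv, train_and_score):
    """Run one training job per row of the DoE matrix and collect the responses."""
    results = []
    for run, row in enumerate(csv.DictReader(io.StringIO(matrix_csv)), start=1):
        treatment = {
            "batch": int(round(float(row["batch"]))),  # batch size must be an integer
            "dropout": float(row["dropout"]),
            "activation": row["activation"],
            "optimizer": row["optimizer"],
        }
        error = train_and_score(**treatment)
        results.append({"run": run, **treatment, "error": error})
    return results

def placeholder_train_and_score(batch, dropout, activation, optimizer):
    """Stand-in for building, training and validating the CNN; returns a sentinel."""
    return -1.0  # not a real error rate

def to_csv(results):
    """Serialize the results so they can be loaded into the statistics package."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(results[0].keys()))
    writer.writeheader()
    writer.writerows(results)
    return out.getvalue()

# Hypothetical two-row design matrix with assumed column names.
matrix = "batch,dropout,activation,optimizer\n79.289,0.3,relu,adam\n200,0.4,tanh,nadam\n"
results = run_doe(matrix, placeholder_train_and_score)
output_text = to_csv(results)
```

Passing the training routine in as a function keeps the DoE sequencing separate from the model code, so the same loop can drive a different network or dataset.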


[Diagram: DoE workflow]

– MINITAB (Design Experiment): MNIST_CNN_DOE.MJP

– Excel (create matrix text file): Demo_CNN_DOE.xlsx

– Input file: CCD-2C-2D_20170812.txt

– Python (Load matrix into a data dictionary)

– Keras (Automatically Run DOE): MNIST_CNN_DOE_CCD-2C-2D.py

– Output file: DOE_Output.txt


MNIST CNN DoE - Analysis

• ANOVA (Analysis of Variance)


• Main Effects (Linear)

– Activation 7.14%

– Optimizer 19.1%

• 2-Factor Interactions

– Batch x Optimizer 4.76%

– Activation x Optimizer 4.76%

• Experimental Error

– Lack-of-fit 38.1%

– Pure Error 21.4% (reproducibility)

Surprised by variability (very stochastic, not deterministic)




• Main Effects

[Figure: Main Effects Plot, MNIST Large CNN 5-Epoch Error Rate; mean error rate (approximately 0.0096 to 0.0105) versus Batch (79.289, 100, 150, 200, 220.711), Dropout (0.158579, 0.2, 0.3, 0.4, 0.441421), Activation (relu, tanh) and Optimizer (adam, nadam)]

• Interactions

[Figure: Interaction Plot, MNIST Large CNN 5-Epoch Error Rate; error rate (0.009 to 0.011) for pairings of Batch, Dropout, Activation and Optimizer]




• Residual Plots

[Figure: Residual Plots for Error; panels: normal probability plot (Percent vs. Residual), Residual vs. Fitted Value, histogram (Frequency vs. Residual), Residual vs. Observation Order]

• Relatively well-behaved

– Close to normal residual variation

– No evidence of a pattern superimposed over the run-order

– The experimental matrix was not randomized

• This was because computer models were assumed to be deterministic

• The CNN exhibits stochastic behavior (makes sense after thinking about it)

• Future DOEs will randomize the run order to prevent aliasing external effects into the experiment
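Randomizing the run order is a one-line change once the DoE matrix is held as a list of rows. The sketch below shuffles the run indices with a recorded seed so the order can be reproduced later; the function name, the seed and the run count of 12 are illustrative assumptions.

```python
import random

def randomized_run_order(n_runs, seed=None):
    """Return the run numbers 1..n_runs in a random order."""
    order = list(range(1, n_runs + 1))
    # A dedicated generator keeps the shuffle independent of other random state.
    random.Random(seed).shuffle(order)
    return order

# Hypothetical 12-run matrix; recording the seed makes the order repeatable.
order = randomized_run_order(12, seed=2018)
```

Executing the treatments in this order spreads any drift over time across all factor levels instead of aligning it with one factor.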



MNIST CNN DoE - Validation

• CNN configured to best settings found in the DoE

– Batch = 200, Dropouts = 40%, Activation = ‘Relu’, Optimizer = ‘Nadam’

– Loss function = ‘Categorical-Cross Entropy’

– Validation CNN run through 48 epochs on MNIST (70,000 digit images)




MNIST CNN DoE – Comparison to Baseline Brownlee CNN

• CNN configured to Brownlee baseline: error at 0.75% after 40+ epochs

• CNN configured to best settings found in the DoE: error at 0.60% after 40+ epochs

– DoE-optimized CNN demonstrated a 20% improvement over baseline
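The 20% figure is a relative reduction in error rate, not a change in percentage points of accuracy. The arithmetic, using the two error rates stated above, is:

```python
# Error rates after 40+ epochs, as stated on the slide.
baseline_error = 0.0075   # Brownlee baseline, 0.75%
doe_error = 0.0060        # DoE-optimized settings, 0.60%

# Relative improvement in error rate.
relative_improvement = (baseline_error - doe_error) / baseline_error

# The same comparison expressed as accuracy, in percent.
baseline_accuracy = 100 * (1 - baseline_error)
doe_accuracy = 100 * (1 - doe_error)
```

In accuracy terms this is a move from 99.25% to 99.40%, a gain of 0.15 percentage points.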




Final Conclusions

Complementing DoE with Machine Learning (and Vice Versa)

• Positive

– The use of ML techniques to create response metrics from visual images, for an Additive Manufacturing DoE, was successful

– ML can act as a surrogate for human subject-matter experts (SME) to classify and rate visual responses from the experiment

– ML in lieu of human SMEs will eliminate bias and variation from the experimental response

• Negative

– Proper data preparation and the correct training approach are critical for ML success (this is not trivial)

– ML tools are still in raw form (i.e. models need to be coded in a programming language)



Machine Learning Resources

More Information

1. Keras (https://keras.io/ )

2. Theano (http://deeplearning.net/software/theano/ )

3. Anaconda (https://www.continuum.io/what-is-anaconda )

4. Scipy (https://www.scipy.org/ )

5. Scikit-learn (http://scikit-learn.org/stable/ )

6. Python (https://www.python.org/ )

7. NIST database 19 (https://www.nist.gov/srd/nist-special-database-19 )

8. MNIST database (http://yann.lecun.com/exdb/mnist/ )

9. Dr. Jason Brownlee (http://machinelearningmastery.com/start-here/ )

10. CS231n online CNN course (http://cs231n.github.io/convolutional-networks/ )



More Information

1. Moog (http://www.moog.com/ )

2. Moog AM (http://www.moog.com/3dmetal/index.html )

3. Moog AM/AI (http://www.moog.com/news/blog-new/UB_Moog_Develop_AI_for_MetalAM.html )

4. Moog DoE (http://www.moog.com/news/blog-new/GeorgeBaggsonAdditiveManufacturing.html )




More Information

CONTACT INFORMATION

George S. Baggs

Questions?

