
This document does not contain Technical Data or Technology as defined in the ITAR Part 120.10 or EAR Part 772

Applying Machine Learning and Design of Experiments to Visually-Intensive Process Metrics

George S. Baggs, Systems Engineer, Moog Space and Defense Group, East Aurora, New York

Version 20181025


Agenda

• Introduction

• What is Machine Learning?

• DOE – an Overview

• Visual Classification of Process Output via Machine Learning – an Overview

• Application 1: Visual Classification of Experimental Output

– Unsupervised Machine Learning

– Supervised Machine Learning

• Application 2: DoE Optimization of a CNN – a Demonstration

• Summary

• More Information

• Questions?

October 16, 2018 ACQ Buffalo 2018 Problem Solving Showcase 2


Introduction

• Machine Learning has been used at Moog to improve the quality of processes with visual outputs

• Machine Learning has been applied within the framework of statistical Design of Experiments (DoE) to improve experimental response quality for Additive Manufacturing (AM) process development

• Machine Learning is simply another tool that complements traditional methods

• DoE can also be used to optimize the hyper-parameters of a deep learning CNN (Convolutional Neural Network)


What is Machine Learning?

• A data analysis method that automates analytical model building [1]

• A branch of Artificial Intelligence (AI), based on the idea that machines can [2]

– Learn from data and identify patterns

– Make decisions with minimal human intervention

[1,2] SAS Machine Learning https://www.sas.com/en_us/insights/analytics/machine-learning.html

[Diagram: Input, Data Preparation, Transformation, Output]

Statistics term = Machine Learning term:

– Dependent Variable = Label

– Variable = Feature

– Transformation = Feature Creation

What is Machine Learning?

Two different types of Machine Learning have been used at Moog to help improve process quality [3]

• Supervised: the machine is shown both the inputs and outputs (labeled examples), and it then learns the relationship between data inputs and outputs

– Statistical example would be a regression fit

• Unsupervised: the machine is shown the input data, and it then determines what structures and patterns exist within the data

– Statistical example would be finding data outliers
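The two statistical analogies above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the method used at Moog: the function names, the sample data, and the median-based outlier rule (with its 3.5 threshold) are all assumptions chosen for clarity.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Supervised: learn slope and intercept from labeled (x, y) examples."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def find_outliers(values, threshold=3.5):
    """Unsupervised: flag values far from the median; no labels are given."""
    med = median(values)
    mad = median([abs(v - med) for v in values])  # median absolute deviation
    if mad == 0:
        return []
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [v for v in values if abs(v - med) / scale > threshold]


# Hypothetical data: inputs with known outputs (labels) for the supervised case
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Hypothetical data: inputs only for the unsupervised case
outliers = find_outliers([10.1, 9.9, 10.0, 10.2, 9.8, 25.0])
```

In the first call the machine is given both inputs and outputs and recovers the relationship between them; in the second it is given inputs only and reports the structure it finds.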

[3] Other types include semi-supervised learning (limited labeled examples) and reinforcement learning (learning by trial and error with a reward function)

Note that Moog is exploring these as well; however, only the two above pertain to this presentation


What is Deep Learning?

Deep Learning is a subfield of machine learning:

• Utilizes many stacked (i.e. ‘Deep’) layers of Artificial Neural Networks (ANN)

• ANNs are inspired by the function of neurons in the brain

• Two primary types:

– Convolutional Neural Networks (CNN) for recognizing patterns in spatial data (e.g. pictures)

– Recurrent Neural Networks (RNN) for recognizing patterns in temporal data (e.g. time series)

• CNNs and RNNs may be combined

– Example: recognize patterns in a series of images (e.g. video)

[Diagram: Input, Data Preparation & Training (Feature Extraction, Feature Creation), Output]

What is Design of Experiments?

A branch of applied statistics that deals with planning, conducting, analyzing and interpreting controlled tests for evaluation of factor effects on a parameter or group of parameters [4, 5]

• Strategically designed and executed

• Efficient (simultaneous study of factors)

• Provides for error control and error quantification

• Facilitates unbiased evaluation of factor effects and interactions
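As an illustration of "simultaneous study of factors", the sketch below builds a two-level full factorial design and estimates main effects and an interaction from the same four runs. The response values are invented for the example, and the function names are hypothetical.

```python
from itertools import product


def full_factorial(n_factors):
    """All combinations of coded low (-1) and high (+1) levels."""
    return [list(levels) for levels in product((-1, 1), repeat=n_factors)]


def main_effect(design, responses, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(design, responses) if run[factor] == 1]
    low = [y for run, y in zip(design, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


def interaction_effect(design, responses, f1, f2):
    """Same contrast, computed on the product of two factor columns."""
    same = [y for run, y in zip(design, responses) if run[f1] * run[f2] == 1]
    diff = [y for run, y in zip(design, responses) if run[f1] * run[f2] == -1]
    return sum(same) / len(same) - sum(diff) / len(diff)


design = full_factorial(2)        # runs: (-1,-1), (-1,+1), (+1,-1), (+1,+1)
responses = [20, 30, 40, 54]      # hypothetical measurements, one per run

effect_a = main_effect(design, responses, 0)
effect_b = main_effect(design, responses, 1)
effect_ab = interaction_effect(design, responses, 0, 1)
```

Every run contributes to every estimate, which is why a designed experiment is more efficient than changing one factor at a time.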

[4] From ASQ: http://asq.org/learn-about-quality/data-collection-analysis-tools/overview/design-of-experiments.html

[5] See: http://www.moog.com/news/blog-new/GeorgeBaggsonAdditiveManufacturing.html


A Visual Classification of Process Output

Automatic inspection and classification of metallic grain structure [6]

[6] Adapted Chollet simple binary classifier CNN: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

[Figure: example images of good grains and bad grains; CNN with 950,369 parameters; plot of Training & Test Accuracy]

Approximately 2500 images were used to create 7000 examples for the CNN training and validation dataset

A Visual Classification of Experimental Output

When performing a DoE on an Additive Manufacturing (AM) process [7], the experimental treatment combinations are arranged spatially on the AM machine build plate

[Figure: six factors (A_Power, B_Exp Time, C_Point Dist, D_Hatch Dist, E_Focal Dist, F_Layer Thk) analyzed with ANOVA; Main Effects Plot, Interaction Plot, and Residual Plots (Normal Probability Plot, Versus Fits, Histogram, Versus Order) for Specific Energy Density (Simulation), in J/mm^3]

Typical responses = Bulk metallic properties


[7] http://www.moog.com/news/blog-new/GeorgeBaggsonAdditiveManufacturing.html


Problem Statement

Additive Manufacturing DoE coupons exhibited two non-desirable features that were visually obvious

• Raised lumpy structures that tend to catch and jam the AM machine’s recoater blade

• Highly visible seams in the material

• These two features were independent of the bulk material properties of the metal

[Figures: VOL DOE2 Response Coupon 1-4; VOL DOE2 Response Coupon 2-15]

How to Quantify Visual Metrics?

The presence of non-desirable visual features could be rated by people, and these ratings could be used as new responses for the experiment…but…

• People tend to provide biased ratings

• People tend to be inconsistent (high variability)

• The number of participants should be as large as possible (often impractical)

• There are techniques that could be used to help mitigate these problems (e.g. psychometric methods) [8]

The decision was instead made to use a machine-learning-based method to eliminate the problems associated with people-provided ratings [9]

[8] https://www.psychometricsociety.org/content/what-psychometrics

[9] We have conducted studies that demonstrate that a machine-learning algorithm is almost twice as accurate as human SMEs


An unsupervised k-means clustering machine learning algorithm was used to find and isolate the raised lumpy structures

• The algorithm was programmed to find two classes of data within the images, and then to color-classify these as either green or red pixels on the raw images

• The algorithm was used to color-classify all 40 coupon images from the DoE

– Images with more solid lumpy areas have less collective boundary length between the red and green regions than images with dispersed smaller areas of red

– The images were stored as JPEGs, which use a ‘lossy’ compression technique

– The algorithm clusters data by trying to separate samples into ‘n’ groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares [10]. Here ‘n’ = 2 (classes).

– These characteristics were exploited using some digital signal processing [11]
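The two-class clustering step can be sketched as follows. This is a simplified stand-in, not the scikit-learn implementation cited in [10]: it runs Lloyd's algorithm on a flat list of gray-level intensities rather than on color images, starts from the minimum and maximum values, and assigns "red" to the brighter cluster. All of those choices, and the pixel values, are assumptions made for illustration.

```python
def kmeans_two_classes(values, iterations=20):
    """Lloyd's algorithm in one dimension with n = 2 clusters."""
    centers = [float(min(values)), float(max(values))]  # deterministic start
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest center
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Update step: each center moves to the mean of its members
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers


def inertia(values, labels, centers):
    """Within-cluster sum of squares, the criterion k-means minimizes."""
    return sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))


pixels = [12, 15, 10, 200, 210, 190, 14, 205]  # hypothetical intensities
labels, centers = kmeans_two_classes(pixels)
colors = ["green" if lab == 0 else "red" for lab in labels]
wcss = inertia(pixels, labels, centers)
```

For real coupon images, each sample would be a pixel's color vector rather than a single intensity, and the distance would be computed across all channels.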

An Unsupervised Machine Learning Approach

[Figures: K-means Classification of Response Coupon 1-4; K-means Classification of Response Coupon 2-15]

[10] http://scikit-learn.org/stable/modules/clustering.html#k-means

[11] Processing on the pixel color frequencies (image histogram)


Statistical Analysis of K-Means Classification Response Metric

• The DoE matrix was an RSM (Response Surface Model)

– Central Composite Design (CCD), with standard star points (α)

– Three factors
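A three-factor CCD in coded units can be generated as below. The sketch assumes that "standard star points" means the rotatable choice, α = (2^k)^(1/4), and uses a single center point; the number of center points and replicates in the actual experiment is not stated in the slides.

```python
from itertools import product


def central_composite(n_factors, n_center=1):
    """Return (runs, alpha) for a CCD in coded units.

    Assumes a rotatable design: alpha = (2 ** n_factors) ** 0.25.
    """
    alpha = (2 ** n_factors) ** 0.25
    # Cube points: every combination of -1 and +1
    cube = [list(p) for p in product((-1.0, 1.0), repeat=n_factors)]
    # Star points: one factor at +/- alpha, all others at the center
    star = []
    for i in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[i] = sign * alpha
            star.append(point)
    # Center points: all factors at 0
    center = [[0.0] * n_factors for _ in range(n_center)]
    return cube + star + center, alpha


runs, alpha = central_composite(3)
```

With three factors this gives 8 cube points, 6 star points, and the center point, with α of roughly 1.68.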

[Figure: Central Composite Design for Factors A, B, and C, showing Cube Points (+/-1), Star Points (+/- α), and the Center Point (0)]

Statistical Analysis of K-Means Classification Response Metric

• ANOVA (Analysis of Variance) [12]

These extremely high F-ratios and near-zero P-values indicate these factors have a very strong effect on the variation seen in the k-means response metric

Error was less for the k-means response than was seen for the traditional additive manufacturing bulk properties responses


[12] ANOVA: https://en.wikipedia.org/wiki/Analysis_of_variance


Statistical Analysis of K-Means Classification Response Metric

• Main Effects

• Interaction

[Figure: Main Effects Plots (fitted means) for GRCop84 Red Histogram Ratio on KMeans Classification of Raw Top Surface Images: mean of Red HRatio versus Factor A, Speed (mm/sec); Factor B, Hatch (um); and Factor C, Overlap (um), each at low, medium, and high levels. Annotations: "Higher HRatio, more lumpy" and "These two factors were statistically significant"]

[Figure: Interaction Contour Plot for GRCop84 Red Histogram on KMeans Classification of Raw Top Surface Images: Red HRatio contours (< 1.2 to > 2.4) for Factor B, Hatch (um), versus Factor A, Speed (mm/sec), with Overlap (um) held at 50]

Statistical Analysis of K-Means Classification Response Metric

• Residual Plots

– Residuals are the ‘left-over’ variation after the RSM k-means fit, and they can provide further clues

– Surprisingly well-behaved

[Figure: Residual Plots for Red HRatio: Normal Probability Plot, Versus Fits, Histogram, and Versus Order (observations 1 to 40)]

Deep-Learning CNN [13] to Recognize the Seams

• Raw data preparation (a necessary operation for CNNs)

– Image borders were removed and the remaining field was partitioned into 4 quadrants

– To reduce dimensionality, image pixel color values were rescaled by 1/255 and then gray-scaled (using weighted averaging on RGB channels)
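The preparation steps above can be sketched with plain nested lists. The border width, the image size, and the RGB weights (the common 0.299 / 0.587 / 0.114 luma weights) are assumptions; the slides do not state which weights or border size were used.

```python
def crop_border(image, border):
    """Remove `border` pixels from every edge of a 2-D image."""
    rows = image[border:len(image) - border]
    return [row[border:len(row) - border] for row in rows]


def quadrants(image):
    """Split an image into top-left, top-right, bottom-left, bottom-right."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [row[:w] for row in image[:h]],
        [row[w:] for row in image[:h]],
        [row[:w] for row in image[h:]],
        [row[w:] for row in image[h:]],
    ]


def to_gray(rgb_image, weights=(0.299, 0.587, 0.114)):
    """Rescale 0..255 channels by 1/255, then take a weighted RGB average."""
    return [
        [sum(w * (c / 255.0) for w, c in zip(weights, pixel)) for pixel in row]
        for row in rgb_image
    ]


# Hypothetical 6x6 all-white RGB image
raw = [[(255, 255, 255)] * 6 for _ in range(6)]
parts = [to_gray(q) for q in quadrants(crop_border(raw, border=1))]
```

Gray-scaling reduces each pixel from three values to one, which is the dimensionality reduction the slide refers to.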

A Supervised Machine Learning Approach


[13] http://cs231n.github.io/convolutional-networks/


CNN Dataset Preparation

• Training & validation dataset created

– Relatively sparse data, so standard deep-learning data augmentation techniques were used [14]

– Random width and height shifts, random shears and zooms, and random rotations of dataset images

– Increased apparent size of the available data samples from 40 images to 200 total (140 for training and 60 for validation)

– Dataset was split 50/50 between seams and no-seams examples
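A much-reduced sketch of augmentation is shown below. It implements only random width and height shifts (not shears, zooms, or rotations), pads with zeros, and assumes five variants per source image, which happens to reproduce the 40 to 200 growth quoted above. The actual work used the Keras facilities described in [14]; the function names, image size, and shift limit here are hypothetical.

```python
import random


def random_shift(image, max_shift, rng):
    """Shift a 2-D image by a random (dy, dx), filling exposed pixels with 0."""
    h, w = len(image), len(image[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    shifted = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # source pixel for this destination
            if 0 <= sy < h and 0 <= sx < w:
                shifted[y][x] = image[sy][sx]
    return shifted


def augment(images, copies_per_image, max_shift=2, seed=0):
    """Create `copies_per_image` randomly shifted variants of each image."""
    rng = random.Random(seed)
    return [
        random_shift(img, max_shift, rng)
        for img in images
        for _ in range(copies_per_image)
    ]


originals = [[[1.0] * 8 for _ in range(8)] for _ in range(40)]  # placeholders
examples = augment(originals, copies_per_image=5)
training, validation = examples[:140], examples[140:]
```

A real pipeline would shuffle before splitting and keep the seam and no-seam classes balanced in both sets.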


[14] Francois Chollet (Google engineer): https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html


CNN Training and Validation

• Augmentation was applied dynamically during training, as an in-memory preprocessing front end

• Trained in batches of 20 images across 500 training epochs

– Approximately 2 sec/epoch on an NVIDIA Titan Xp GPU (Graphics Processing Unit)

– 17 minutes of training for all 500 epochs

• Maximum CNN performance

– Training accuracy 95%; validation accuracy 91.7%; overall 94% (200 dataset images)

• Observations

– Elevation of training accuracy above validation accuracy around Epoch 200 indicates that the CNN was overfitting, but in this case, this was not a concern
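The figures on this slide are mutually consistent, as the short check below shows. It assumes the 140-image training set from the dataset preparation step and reads "overall" as the image-weighted average of training and validation accuracy; that reading is an interpretation, not something the slide states.

```python
def batches_per_epoch(n_train, batch_size):
    """Number of full batches needed to pass once over the training set."""
    return n_train // batch_size


def training_minutes(epochs, seconds_per_epoch):
    """Total wall-clock training time in minutes."""
    return epochs * seconds_per_epoch / 60.0


def overall_accuracy(train_acc, n_train, val_acc, n_val):
    """Accuracy over both sets, weighted by the number of images in each."""
    return (train_acc * n_train + val_acc * n_val) / (n_train + n_val)


steps = batches_per_epoch(140, 20)                 # 7 batches per epoch
minutes = training_minutes(500, 2.0)               # about 16.7 minutes
overall = overall_accuracy(0.95, 140, 0.917, 60)   # about 0.94
```

At roughly 2 seconds per epoch, 500 epochs take about 1000 seconds, which rounds to the 17 minutes quoted.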


CNN Seam Parametric Response Metric

• The trained CNN was then used to classify each of the original greyscale image quadrants from the DoE as either no-seam = 0.00 or seam = 1.00

– The CNN achieved 94.38% classification accuracy on the original images

– Accuracy was based on which images were visually sorted for seam or no-seam when the training and validation sets were first created

• The 4 classification metrics per response image were averaged for each DoE treatment

– This ensemble approach improves the ‘depth’ of the experimental response metric
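Averaging the four quadrant classifications turns a binary label into a five-level response (0.00, 0.25, 0.50, 0.75, 1.00) for each treatment. A minimal sketch, using invented labels and hypothetical function names:

```python
def seam_response(quadrant_labels):
    """Average the four 0/1 quadrant classifications for one coupon image."""
    if len(quadrant_labels) != 4:
        raise ValueError("expected one classification per quadrant")
    return sum(quadrant_labels) / 4.0


def responses_by_treatment(classified):
    """Map each DoE treatment to its averaged seam metric."""
    return {name: seam_response(labels) for name, labels in classified.items()}


# Hypothetical CNN outputs: 1 = seam, 0 = no-seam, one value per quadrant
classified = {"1-4": [1, 1, 0, 1], "2-15": [0, 0, 0, 0]}
seam_metric = responses_by_treatment(classified)
```

The averaged value is what would be entered as the response column for the statistical analysis.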


Statistical Analysis of CNN Seam Response Metric

• ANOVA (Analysis of Variance)

Extremely high F-ratio and near-zero P-values indicate this factor has a very strong effect on the variation seen in the CNN seam response metric

Error was less for the CNN seam response than was seen for the traditional additive manufacturing bulk properties responses

The curved component of Factor A was significant

Statistical Analysis of CNN Seam Response Metric

• Main effects

• Interaction

[Figure: Main Effects Plots (fitted means) for GRCop84 Presence of Seam by CNN Classification of Raw Top Surface Images: mean of Seam-Probability versus Factor A, Speed (mm/sec); Factor B, Hatch (um); and Factor C, Overlap (um), each at low, medium, and high levels. Annotations: "Higher Pseam, more seam" and "These two factors were statistically significant"]

[Figure: Interaction Contour Plot for GRCop84 Presence of Seam by CNN Classification of Raw Top Surface Images: Seam-Probability contours (< -0.50 to > 0.75) for Factor B, Hatch (um), versus Factor A, Speed (mm/sec), with Overlap (um) held at 50]

Statistical Analysis of CNN Seam Response Metric

• Residual Plot

– Residuals are the ‘left-over’ variation after the RSM CNN seam fit, and they can provide further clues

– Surprisingly well-behaved

[Figure: Residual Plots for Seam-Probability: Normal Probability Plot, Versus Fits, Histogram, and Versus Order (observations 1 to 40)]

Augmenting DoE Visual Responses

• Using ML techniques to create response metrics from visual images for an Additive Manufacturing DoE gave positive results

– Provides a surrogate for human subject-matter experts to classify and rate visual responses

– A CNN trained just for the purpose of evaluating a DoE response can overfit without much concern:

o The experimental environment was a one-time situation and was not intended for production (the CNN has a very limited charter)

o The machine was intended to replace a human-rating with a non-biased metric

– If necessary, a production version of the CNN would be trained using an adequately sized dataset

Machine Learning with DoE


Structured Experimentation to Optimize a CNN

• Almost all publications concerning deep learning and AI report a one-factor-at-a-time (1FAT) approach to testing and algorithm optimization

• A deep learning CNN has hundreds of categorical and parametric hyperparameters (algorithm factors) to set and optimize

• When choosing methods and configurations, much of the approach follows so-called best practices

• It became apparent that the field of deep learning AI would benefit from the use of a structured experimental approach such as DoE

– The MNIST dataset was chosen as the vehicle to test this idea

– MNIST has established benchmarks of performance that can be used for CNN performance comparison

DoE Applied to Deep Learning Optimization


What is the MNIST dataset?

• Modified NIST (National Institute of Standards and Technology) handwriting database [15]

– Original NIST Special Database 19 created in 1995 (postal character recognition)

– Now widely used to baseline performance of various Machine Learning systems

– Dr. Sargur N. Srihari [16], UB Department of Computer Science and Engineering (CSE)

[15] The MNIST Database of handwritten digits: http://yann.lecun.com/exdb/mnist/

[16] SUNY at Buffalo, Dr. Sargur N. Srihari: https://cedar.buffalo.edu/~srihari/


Machine Learning Benchmarks using MNIST

• The dataset contains 70,000 examples: 60,000 training images and 10,000 test images

• Benchmark summary below (Wikipedia) [17]

[17] Wikipedia: MNIST Database: https://en.wikipedia.org/wiki/MNIST_database

[Table: MNIST benchmark summary from Wikipedia, with entries dated 1998 to 2016]

Machine Learning Benchmarks using MNIST

• Load the MNIST dataset into the Keras API [18, 19]

[18] Keras: https://keras.io/

[19] https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/

[Figure: false-color images of MNIST digits to visualize gray-scale levels]

Machine Learning Toolset

• How the CNN was configured for this demonstration

– Blue boxes are provided as either open-source or no-fee license applications

[Diagram: toolset components]

– Python 2.7.8

– Anaconda Stack (Library of Computational APIs)

– Theano (ANN API from University of Montreal)

– Keras (Deep Learning API from François Chollet, a Google engineer)

– TensorFlow (ANN API from Google)

– CNTK (ANN API from Microsoft)

– SciPy (Science and Engineering Libraries)

– scikit-learn (Machine Learning Libraries)

– PyCharm (Commercial Python IDE)

[Diagram note: everything inside the marked boundary is included in the Anaconda Stack]

MNIST CNN Demonstration

• A large 5-layer CNN example from Dr. Jason Brownlee [20] was selected

[20] Dr. Jason Brownlee: http://machinelearningmastery.com/start-here/


MNIST CNN DoE

• Brownlee Model Large CNN for MNIST

– This demonstration CNN was already configured for near-state-of-the-art performance using best-practices

– Performance yielded ~ 1% error rate after 10 epochs of training and validation

• Built using the Keras API

– No changes to the structure of the Brownlee model, but superficial modifications for the DoE were introduced

– 2 internal CNN factors and 2 external training factors were varied for this DoE

– Internal: convolutional layer output activation (ReLU or tanh) and dropout layer rate (20% and 40%) – blue indicates Brownlee baseline

– External: training sample batch size (100 and 200) and optimization algorithm (Adam or Nadam) – blue indicates Brownlee baseline


MNIST CNN DoE

• Response Surface Model (RSM)

– Central Composite Design (CCD) w/ standard star points

– Batch (79, 100, 150, 200, 221)

– Drops (15.9%, 20%, 30%, 40%, 44.1%)

– Activation (ReLU and Tanh) [21]

– Optimizer (Adam and Nadam) [22]

• Run only 5 epochs per treatment

– Brevity needed (I didn’t have the GPU machine when I did this)
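The five levels listed for Batch and Drops follow from the cube levels once a star distance is chosen. The sketch below assumes the rotatable value for two continuous factors, α = (2^2)^(1/4) ≈ 1.414, which reproduces the numbers on the slide after rounding; the function name is hypothetical.

```python
def ccd_levels(low, high, n_continuous=2):
    """Five CCD levels for one factor: -alpha, -1, 0, +1, +alpha (uncoded)."""
    alpha = (2 ** n_continuous) ** 0.25   # rotatable star distance
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return [
        center - alpha * half_range,  # lower star point
        low,                          # lower cube point
        center,                       # center point
        high,                         # upper cube point
        center + alpha * half_range,  # upper star point
    ]


batch_levels = [round(v) for v in ccd_levels(100, 200)]
dropout_levels = [round(v, 1) for v in ccd_levels(20.0, 40.0)]
```

The batch size must be an integer, which is why the star points appear as 79 and 221 rather than 79.3 and 220.7.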

DoE Applied to Deep Learning Optimization

2 continuous variables:

- Batches and Dropouts

2 discrete variables:

- Activation and

Optimization

Cube Points

Star Points (α)

October 16, 2018 ACQ Buffalo 2018 Problem Solving Showcase 32

[21] https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6

[22] http://ruder.io/optimizing-gradient-descent/

Page 33: Applying Machine Learning and Design of Experiments to ... 2018... · This document does not contain Technical Data or Technology as defined in the ITAR Part 120.10 or EAR Part 772


MNIST CNN DoE - Method

• DOE run programmatically in Python code

• Factor level changes made automatically as program sequences through DoE matrix

• CNN is trained and validated on each DoE treatment combination (from input CSV)

• Output from CNN is then loaded back into Minitab for statistical analysis
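The loop described above can be sketched as follows. This is a hedged illustration rather than the presenter's `MNIST_CNN_DOE_CCD-2C-2D.py`: the column names (`Batch`, `Dropout`, `Activation`, `Optimizer`, `Error`), the comma delimiter, and the functions `load_design` and `run_doe` are all assumptions, and `train_and_score` is a placeholder for whatever trains the CNN and returns its validation error rate.

```python
import csv
import io


def load_design(text_stream):
    """Read a delimited design matrix into a list of per-run dictionaries."""
    runs = []
    for row in csv.DictReader(text_stream):
        runs.append({
            "batch": int(round(float(row["Batch"]))),   # star points arrive as non-integers
            "dropout": float(row["Dropout"]),
            "activation": row["Activation"].strip().lower(),
            "optimizer": row["Optimizer"].strip().lower(),
        })
    return runs


def run_doe(runs, train_and_score, out_stream):
    """Evaluate every treatment in matrix order and write one result row per run."""
    writer = csv.writer(out_stream)
    writer.writerow(["Run", "Batch", "Dropout", "Activation", "Optimizer", "Error"])
    for run_number, run in enumerate(runs, start=1):
        error = train_and_score(run)  # placeholder: train the CNN, return its error rate
        writer.writerow([run_number, run["batch"], run["dropout"],
                         run["activation"], run["optimizer"], error])


# Usage with an in-memory matrix and a dummy scorer standing in for CNN training.
matrix = io.StringIO("Batch,Dropout,Activation,Optimizer\n79.289,0.3,Relu,Adam\n")
results = io.StringIO()
run_doe(load_design(matrix), lambda run: 0.0, results)
```

Writing the results as a delimited text file keeps the hand-off to the statistics package simple, since the output can be opened directly as a worksheet.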

DoE Applied to Deep Learning Optimization

[Figure: workflow diagram]

1. Minitab (design experiment): MNIST_CNN_DOE.MJP

2. Excel (create matrix text file): Demo_CNN_DOE.xlsx

3. Python (load matrix into a data dictionary)

4. Keras (automatically run DOE): MNIST_CNN_DOE_CCD-2C-2D.py

Input: CCD-2C-2D_20170812.txt

Output: DOE_Output.txt

Page 34: Applying Machine Learning and Design of Experiments to ... 2018... · This document does not contain Technical Data or Technology as defined in the ITAR Part 120.10 or EAR Part 772


MNIST CNN DoE - Analysis

• ANOVA (Analysis of Variance)

DoE Applied to Deep Learning Optimization

• Main Effects (Linear)

– Activation 7.14%

– Optimizer 19.1%

• 2-Factor Interactions

– Batch x Optimizer 4.76%

– Activation x Optimizer 4.76%

• Experimental Error

– Lack-of-fit 38.1%

– Pure Error 21.4% (reproducibility)

Surprised by variability (very stochastic, not deterministic)
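To put the figures above in proportion, the short sketch below totals them, assuming they are percent contributions to the total sum of squares (the slide does not say so explicitly). The grouping into model terms and error terms follows the slide; the function name `summarize` is hypothetical.

```python
# Percentages as reported on the slide; their interpretation as percent
# contribution to the total sum of squares is an assumption.
reported = {
    "Activation": 7.14,
    "Optimizer": 19.1,
    "Batch x Optimizer": 4.76,
    "Activation x Optimizer": 4.76,
    "Lack-of-fit": 38.1,
    "Pure Error": 21.4,
}
ERROR_TERMS = ("Lack-of-fit", "Pure Error")


def summarize(contrib):
    """Return (listed model terms, experimental error, remainder not listed) in percent."""
    error = sum(contrib[name] for name in ERROR_TERMS)
    model = sum(value for name, value in contrib.items() if name not in ERROR_TERMS)
    unlisted = 100.0 - model - error
    return round(model, 2), round(error, 2), round(unlisted, 2)


model_pct, error_pct, unlisted_pct = summarize(reported)
print(model_pct, error_pct, unlisted_pct)  # prints 35.76 59.5 4.74
```

Under that reading, experimental error (lack-of-fit plus pure error) accounts for roughly 60% of the variation, more than the listed main effects and interactions combined, which is consistent with the surprise at how stochastic the CNN results were.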


Page 35: Applying Machine Learning and Design of Experiments to ... 2018... · This document does not contain Technical Data or Technology as defined in the ITAR Part 120.10 or EAR Part 772


MNIST CNN DoE - Analysis

• Main Effects

• Interactions

DoE Applied to Deep Learning Optimization

[Figure: Main effects plot, MNIST Large CNN 5-Epoch Error Rate. Panels: Batch (79.289, 100, 150, 200, 220.711), Dropout (0.158579, 0.2, 0.3, 0.4, 0.441421), Activation (relu, tanh), Optimizer (adam, nadam). Y-axis: mean error rate, approximately 0.0096 to 0.0105.]

[Figure: Interaction plot, MNIST Large CNN 5-Epoch Error Rate. Factors: Batch, Dropout, Activation, Optimizer, at the same levels as above. Y-axis: mean error rate, approximately 0.009 to 0.011.]


Page 36: Applying Machine Learning and Design of Experiments to ... 2018... · This document does not contain Technical Data or Technology as defined in the ITAR Part 120.10 or EAR Part 772


MNIST CNN DoE - Analysis

• Residual Plots

DoE Applied to Deep Learning Optimization

[Figure: Residual Plots for Error, four panels: Normal Probability Plot (percent vs. residual), Versus Fits (residual vs. fitted value), Histogram (frequency vs. residual), Versus Order (residual vs. observation order)]

• Relatively well-behaved

– Close to normal residual variation

– No evidence of a pattern superimposed over the run-order

– The experimental matrix was not randomized

• This was because computer models were assumed to be deterministic

• The CNN exhibits stochastic behavior (makes sense after thinking about it)

• Future DOEs will randomize the run order to prevent aliasing external effects into the experiment
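Randomizing the run order is straightforward to do in code before the matrix is executed. The sketch below is one possible approach, not the presenter's implementation; `randomize_run_order` and the `run_order` / `std_order` keys are hypothetical names, and the seed is only there so the shuffled order can be reproduced and recorded.

```python
import random


def randomize_run_order(runs, seed=None):
    """Return the runs in shuffled order, tagged with run order and standard order."""
    rng = random.Random(seed)          # local generator; leaves global random state alone
    order = list(range(len(runs)))
    rng.shuffle(order)
    return [
        {"run_order": position, "std_order": index + 1, **runs[index]}
        for position, index in enumerate(order, start=1)
    ]


# Example: five batch-size levels in standard order, shuffled reproducibly.
standard = [{"batch": b} for b in (79, 100, 150, 200, 221)]
for run in randomize_run_order(standard, seed=2018):
    print(run)
```

Keeping the standard order alongside the run order lets the results be sorted back into design order for analysis while still showing the sequence in which the treatments were actually executed.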


Page 37: Applying Machine Learning and Design of Experiments to ... 2018... · This document does not contain Technical Data or Technology as defined in the ITAR Part 120.10 or EAR Part 772


MNIST CNN DoE - Validation

• CNN configured to best settings found in the DoE

– Batch = 200, Dropout = 40%, Activation = ‘ReLU’, Optimizer = ‘Nadam’

– Loss function = ‘Categorical Cross-Entropy’

– Validation CNN run through 48 epochs on MNIST (70,000 digit images)

DoE Applied to Deep Learning Optimization


Page 38: Applying Machine Learning and Design of Experiments to ... 2018... · This document does not contain Technical Data or Technology as defined in the ITAR Part 120.10 or EAR Part 772


MNIST CNN DoE – Comparison to Baseline Brownlee CNN

• CNN configured to Brownlee baseline: error at 0.75% after 40+ epochs

• CNN configured to best settings found in the DoE: error at 0.60% after 40+ epochs

– DoE-optimized CNN demonstrated a 20% improvement over baseline
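The 20% figure is a relative reduction in error rate, not a difference in percentage points. A small worked check, with a hypothetical helper name:

```python
def relative_improvement(baseline_error, new_error):
    """Fractional reduction in error rate relative to the baseline."""
    return (baseline_error - new_error) / baseline_error


# Error rates in percent, as reported for the baseline and DoE-optimized CNNs.
print(f"{relative_improvement(0.75, 0.60):.0%}")  # prints 20%
```

In absolute terms the change is 0.15 percentage points (0.75% down to 0.60%).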

DoE Applied to Deep Learning Optimization


Page 39: Applying Machine Learning and Design of Experiments to ... 2018... · This document does not contain Technical Data or Technology as defined in the ITAR Part 120.10 or EAR Part 772


Final Conclusions

Complementing DoE with Machine Learning (and Vice Versa)

• Positive

– The use of ML techniques to create response metrics from visual images, for an Additive Manufacturing DoE, was successful

– ML can act as a surrogate for human subject-matter experts (SMEs) to classify and rate visual responses from the experiment

– ML in lieu of human SMEs will eliminate bias and variation from the experimental response

• Negative

– Proper data preparation and the correct training approach are critical for ML success (this is not trivial)

– ML tools are still in raw form (i.e., models need to be coded in a programming language)


Page 40: Applying Machine Learning and Design of Experiments to ... 2018... · This document does not contain Technical Data or Technology as defined in the ITAR Part 120.10 or EAR Part 772


Machine Learning Resources

More Information

1. Keras (https://keras.io/)

2. Theano (http://deeplearning.net/software/theano/)

3. Anaconda (https://www.continuum.io/what-is-anaconda)

4. SciPy (https://www.scipy.org/)

5. Scikit-learn (http://scikit-learn.org/stable/)

6. Python (https://www.python.org/)

7. NIST database 19 (https://www.nist.gov/srd/nist-special-database-19)

8. MNIST database (http://yann.lecun.com/exdb/mnist/)

9. Dr. Jason Brownlee (http://machinelearningmastery.com/start-here/)

10. CS231n online CNN course (http://cs231n.github.io/convolutional-networks/)


Page 41: Applying Machine Learning and Design of Experiments to ... 2018... · This document does not contain Technical Data or Technology as defined in the ITAR Part 120.10 or EAR Part 772


More Information

1. Moog (http://www.moog.com/)

2. Moog AM (http://www.moog.com/3dmetal/index.html)

3. Moog AM/AI (http://www.moog.com/news/blog-new/UB_Moog_Develop_AI_for_MetalAM.html)

4. Moog DoE (http://www.moog.com/news/blog-new/GeorgeBaggsonAdditiveManufacturing.html)



Page 42: Applying Machine Learning and Design of Experiments to ... 2018... · This document does not contain Technical Data or Technology as defined in the ITAR Part 120.10 or EAR Part 772


More Information

CONTACT INFORMATION

George S. Baggs

Questions?
