8/14/2015

Copyright © 2015, SAS Institute Inc. All rights reserved.

VISUALIZING SPATIAL INFORMATION FROM MULTIPLE MEASURES WITHIN A UNIT OVER TIME; SUPPORTED BY JSL

• Tony Cooper & Sam Edgemon

• Analytical Consultants at SAS

BIO

• Tony Cooper, PhD, Principal Analytic Consultant, SAS
  • Tony has extensive experience using statistical methods to solve executive-driven business problems in a variety of industries. He has done project work and has taught the principles of work process improvement, statistical thinking and statistical methods. Tony is an excellent SAS coder, JMP user and JSL scripter. He received his doctorate from the University of Tennessee and also has a BS in chemical engineering from Rensselaer Polytechnic Institute.

• Sam Edgemon, Senior Analytical Consultant, SAS
  • Sam has been a SAS and JMP user for over 20 years, during which he has focused on consulting and corporate work using many SAS products, with project roles ranging from contributing analyst to project lead, as well as all aspects of managing technically oriented projects. He has gained experience in many areas: Government, Environmental, Biological Surveillance, Health Care, Pharmaceutical, Automotive, Financial Services, Education, Gaming, Recreation, and Agriculture. Edgemon holds a BS in mathematics and a BS in statistics from the University of Tennessee, with certificates from the University of Tennessee in Process Controls and Experimental Design, and from the Massachusetts Institute of Technology in Data Mining.

A SCRIPT TO INVESTIGATE THE MULTIVARIATE DIACHRONIC NATURE OF SPATIAL SOURCES OF VARIATION GRAPHICALLY & QUANTITATIVELY

• Understanding the causes of variation in key product or process characteristics is an ongoing task in product and process design and manufacturing. Once the causes are discovered, the initial investigation is followed by engineering solutions to reduce or mitigate the sources of variation. The multitude of diagnostic strategies available for initial investigation and development of theories includes various data methodologies for examining retrospective and observational data.

BACKGROUND

• Edward Tufte classifies fundamental graphical designs into data maps, time series maps, space-time narrative designs, and relational graphics. Data maps, which place data on cartographic displays, have a rich history. Perhaps the most famous is Dr. John Snow’s use of data maps in investigating the cause of cholera. An equivalent of data maps for investigating spatial structure is the concentration diagram, where data (measurements, defects, etc.) are placed on a schematic of a product. The importance of time-based graphics in production is well understood; run charts and control charts are widely used. Investigation of diachronic structure is the basis of time series plots; “Do the sources of variation act in a consistent fashion?” or “How long does it take the process to ‘settle down’?” are typical questions. But, despite their usefulness, space-time maps are not yet common in engineering. Typically, a large number of product dimensions and process parameters are observed simultaneously over time. Investigation of these multivariate systems with space-time maps, graphically and quantitatively, is a nontrivial process.

A famous map combining spatial and time characteristics is Minard’s map describing the destruction of Napoleon’s army in the invasion of 1812.

Charles Joseph Minard, “Carte Figurative des pertes successives en hommes de l'Armée Française dans la campagne de Russie 1812-1813”; Paris 1869

SPATIAL RELATIONSHIPS: CONCENTRATION DIAGRAMS / MAPS IN JMP

Malaria Cases (World Health Organization)

JMP finds these map shapes ‘automatically’

TEMPORAL CHANGES

• The graph shows two measurements on 70 parts over time (some labeled)
• Unsupervised Causal Models:
  • Special cause / Common cause
    • Championed by Shewhart and Deming
    • Typically evaluated with control charts
  • Clusters

Figure panels: Control Chart; Cluster Analysis

SMALL 3D EXAMPLE

• What are the interesting facets of this data?

AN ANSWER

Cluster Analysis

Multivariate Control Chart

Each step is readily available in JMP

A script can speed up this analysis

TYPICAL DATA

• Univariate
• Across dimensions over time
• Across regions (geographic) over time
• Across part locations (quality) over time
• Columns
• Time (or a logical order)
• Location
• Measure
• Product Examples
• Profiles within parts
• Measured on a CMM
• Impurity profiles
• Public Health
• Disease counts by country

SCRIPT

Step 1) Run ’WITHINPARTANALYSIS’
Step 1a) Open data with linked (custom) map
Step 2) Choose # Clusters using dendrogram
Step 2a) Choose # Principal Components
Step 3) Continue (or Update)
Step 4) Add Cluster information to original table

ORIGINAL & REARRANGED DATA

Original Data: Rows = (# Parts) x (# Locations)

Name  | L | Part | Value
R1_S1 | 1 | 1 | 125.02
R1_S2 | 1 | 1 | 123.1
R1_S3 | 1 | 1 | 122.38
R1_S4 | 1 | 1 | 122.81
R1_S5 | 1 | 1 | 126.74
R1_S6 | 1 | 1 | 125.05
R1_S7 | 1 | 1 | 125.27
R1_S8 | 1 | 1 | 122.08
R2_S1 | 1 | 1 | 127.04
R2_S2 | 1 | 1 | 123.58

• At least one column describes locations within a part
• One column identifies the part
• One column relates the measure at the location / part

Script splits columns

Rearranged Data: Rows = (# Parts); locations within part became columns

Part | R1_S1 1 | R1_S1 2 | R1_S1 3 | R1_S2 1 | R1_S2 2 | R1_S2 3 | R1_S3 1 | R1_S3 2 | R1_S3 3
1 | 125.02 | 128.75 | 126.55 | 123.1 | 126 | 122.5 | 122.38 | 125.4 | 125.68
2 | 125.41 | 127.77 | 126.22 | 125.16 | 124.72 | 123.34 | 125.18 | 119.41 | 125.97
3 | 124.67 | 126.15 | 125.98 | 123.41 | 125.07 | 128.28 | 125.66 | 122.97 | 121.73
4 | 120.96 | 123.44 | 125.26 | 124.17 | 122.77 | 130.01 | 123.49 | 123.33 | 124.06
5 | 123.93 | 125.47 | 125.16 | 127.48 | 121.23 | 128.23 | 124.44 | 121.6 | 123.71
6 | 125.3 | 123.9 | 126.95 | 123.87 | 125.35 | 124.83 | 124.93 | 125.42 | 122.49

Step 1) Run ’WITHINPARTANALYSIS’
Step 1a) Open data with linked (custom) map
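The rearrangement from stacked rows to one row per part can be sketched in a few lines. This is a minimal illustration in Python, not the authors' JSL script. The function name `split_by_location` and the column label format "<Name> <L>" are assumptions made here to mirror the slide, and only a few of the slide's values are included.

```python
from collections import defaultdict

# Stacked ("original") rows as (Name, L, Part, Value).
# Values are copied from the slide for parts 1 and 2, level L = 1 only.
stacked = [
    ("R1_S1", 1, 1, 125.02),
    ("R1_S2", 1, 1, 123.10),
    ("R1_S3", 1, 1, 122.38),
    ("R1_S1", 1, 2, 125.41),
    ("R1_S2", 1, 2, 125.16),
    ("R1_S3", 1, 2, 125.18),
]


def split_by_location(rows):
    """Return {part: {"<Name> <L>": value}}, i.e. one wide row per part."""
    wide = defaultdict(dict)
    for name, level, part, value in rows:
        wide[part][f"{name} {level}"] = value
    return dict(wide)


wide = split_by_location(stacked)
columns = sorted({col for row in wide.values() for col in row})

print(["Part"] + columns)
for part in sorted(wide):
    print([part] + [wide[part].get(col) for col in columns])
```

Each location within the part becomes a column, so the row count drops from (# parts) x (# locations) to (# parts), which is the layout that clustering and PCA need.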

HIERARCHICAL CLUSTER ANALYSIS USING THE WARD METHOD

• Starts with the maximum number of clusters (= # parts) and iteratively combines points & clusters of points that are closest together.
• ‘Closest’ means the shortest distance between summaries of the profiles.
• Ward’s method forms each next cluster so that the increase in within-cluster variance of the profiles is as small as possible.

Step 2) Choose # Clusters using dendrogram
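The agglomeration described on this slide can be sketched as follows. This is a naive Python illustration of Ward's merge criterion, assuming each part's profile is a list of numbers. It is not JMP's implementation. The profiles are made-up values, and the function names are hypothetical.

```python
def centroid(points):
    """Mean profile of a list of equal-length profiles."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(profiles, n_clusters):
    """Merge the cheapest pair until n_clusters remain; returns index lists."""
    clusters = [[i] for i in range(len(profiles))]  # start: one cluster per part
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([profiles[k] for k in clusters[i]],
                                 [profiles[k] for k in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up two-location profiles for five parts (illustration only).
profiles = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9], [9.0, 9.1]]
result = ward_clusters(profiles, n_clusters=3)
print(result)
```

Stopping at a chosen number of clusters plays the role of cutting the dendrogram in Step 2. A production implementation would update distances incrementally rather than recomputing every pair.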

PRINCIPAL COMPONENTS:

ASSESSING COVARIANCE / CORRELATION STRUCTURE

• Maximum number of Principal Components = Number of Locations within the part
• Considers linear combinations of locations (describing the profile)
• 1st component describes the strongest relationship and so on.
• Consider correlation between locations as an example.

Example of a correlation matrix

   | Y1 | Y2 | Y3 | Z1 | Z2 | Z3 | x1 | x2
Y1 | 1.000 | 0.796 | 0.693 | 0.509 | 0.487 | 0.439 | 0.113 | -0.157
Y2 | 0.796 | 1.000 | 0.745 | 0.345 | 0.500 | 0.361 | 0.016 | 0.019
Y3 | 0.693 | 0.745 | 1.000 | 0.287 | 0.315 | 0.345 | -0.012 | -0.161
Z1 | 0.509 | 0.345 | 0.287 | 1.000 | 0.904 | 0.734 | 0.157 | 0.011
Z2 | 0.487 | 0.500 | 0.315 | 0.904 | 1.000 | 0.809 | 0.109 | 0.037
Z3 | 0.439 | 0.361 | 0.345 | 0.734 | 0.809 | 1.000 | -0.039 | -0.059
x1 | 0.113 | 0.016 | -0.012 | 0.157 | 0.109 | -0.039 | 1.000 | 0.033
x2 | -0.157 | 0.019 | -0.161 | 0.011 | 0.037 | -0.059 | 0.033 | 1.000

Are the relationships always as described by this matrix? I.e., are there multiple loadings?

Step 2a) Choose # Principal Components
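To make the "1st component describes the strongest relationship" idea concrete, the sketch below extracts the first principal component of the slide's correlation matrix by power iteration. It is a Python illustration under stated assumptions, not what JMP does internally: the diagonal is set to 1, the Z2/Y3 entry is taken as 0.315 (the value the Y3 row gives for the same pair), and the helper names are hypothetical.

```python
import math

labels = ["Y1", "Y2", "Y3", "Z1", "Z2", "Z3", "x1", "x2"]
R = [
    [1.000, 0.796, 0.693, 0.509, 0.487, 0.439, 0.113, -0.157],
    [0.796, 1.000, 0.745, 0.345, 0.500, 0.361, 0.016, 0.019],
    [0.693, 0.745, 1.000, 0.287, 0.315, 0.345, -0.012, -0.161],
    [0.509, 0.345, 0.287, 1.000, 0.904, 0.734, 0.157, 0.011],
    [0.487, 0.500, 0.315, 0.904, 1.000, 0.809, 0.109, 0.037],
    [0.439, 0.361, 0.345, 0.734, 0.809, 1.000, -0.039, -0.059],
    [0.113, 0.016, -0.012, 0.157, 0.109, -0.039, 1.000, 0.033],
    [-0.157, 0.019, -0.161, 0.011, 0.037, -0.059, 0.033, 1.000],
]


def matvec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def first_component(m, iterations=500):
    """Power iteration: returns (largest eigenvalue, unit-length loadings)."""
    v = [1.0] * len(m)
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
    return eigenvalue, v


eigenvalue, loadings = first_component(R)
share = eigenvalue / len(R)  # the trace of a correlation matrix equals # variables

print(f"first eigenvalue: {eigenvalue:.3f} ({share:.1%} of total variation)")
for name, weight in zip(labels, loadings):
    print(f"{name}: {weight:+.3f}")
```

The loadings show which locations move together in the dominant pattern. Further components would be obtained by deflating the matrix or, more practically, by calling an eigen-decomposition routine from a numerical library.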

MORE ABOUT PRINCIPAL COMPONENTS

• Columns are locations
• The idea that the locations are distinct is preserved
• The idea that certain locations are closer than others is lost

Dimension Reduction

Scree Plot: Eigenvalues (λ)

Most of the variation is explained by 3 or 4 components

MULTIVARIATE CONTROL CHARTS

ASSESSING COVARIANCE / CORRELATION STRUCTURE OVER TIME

• Data must be in a logical (time) order.
• Signal
• Noise
• Summarized by Hotelling $T^2$
• $T^2 = \sum_i \frac{\mathrm{PCA}_i^2}{\lambda_i}$
• $T^2 \sim \frac{(n-1)^2}{n}\, B\!\left(\frac{p}{2}, \frac{n-p-1}{2}\right)$

Step 3) Continue (or Update)
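The Hotelling summary is a sum of squared principal component scores, each divided by its eigenvalue. The sketch below computes it in Python for made-up scores and eigenvalues. It assumes that n is the number of parts and p the number of retained components, which the slide does not state explicitly. The Beta quantile needed for an actual control limit is left to a statistics library and is not computed here.

```python
def hotelling_t2(scores, eigenvalues):
    """T^2 for one part: squared PC scores, each scaled by its eigenvalue."""
    return sum(s * s / lam for s, lam in zip(scores, eigenvalues))


def beta_scale(n):
    """Multiplier (n - 1)^2 / n applied to the Beta quantile for the limit."""
    return (n - 1) ** 2 / n


# Made-up eigenvalues and centered PC scores for three parts (illustration only).
eigenvalues = [4.0, 1.0, 0.25]
scores_by_part = [
    [2.0, 1.0, 0.5],
    [0.0, 0.0, 0.0],
    [4.0, -2.0, 1.0],
]

t2 = [hotelling_t2(scores, eigenvalues) for scores in scores_by_part]
print(t2)             # one T^2 value per part, in time order
print(beta_scale(5))  # scale factor for n = 5 parts
```

Plotting these values in time order, against a limit of the scale factor times the Beta quantile, gives the multivariate control chart.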

T² WITH PHASES BY CLUSTER

• The T² multivariate control chart assumes a single homogeneous baseline process (with a few observations affected by special causes).
• The clustering suggests many ‘common’ cause states, invalidating the calculated limits.
• If the reason for the clustering is understood, phased control limits could make sense.
• Multiple machines? (fixed, systematic differences)
• Raw material lots, setup?

VIEW CONCENTRATION MAPS REPRESENTING KEY TIME PERIODS

Step 4) Add Cluster information to original table
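Step 4 amounts to a join on the part identifier, followed by a per-cluster summary at each location that a concentration map can color by. The Python sketch below illustrates the idea. The cluster labels assigned to parts 1 to 3 are hypothetical, the values are taken from the data slide, and the function names are assumptions rather than the script's own.

```python
from collections import defaultdict

# Stacked rows (values from the data slide, parts 1-3, two locations at L = 1).
stacked = [
    {"Name": "R1_S1", "L": 1, "Part": 1, "Value": 125.02},
    {"Name": "R1_S2", "L": 1, "Part": 1, "Value": 123.10},
    {"Name": "R1_S1", "L": 1, "Part": 2, "Value": 125.41},
    {"Name": "R1_S2", "L": 1, "Part": 2, "Value": 125.16},
    {"Name": "R1_S1", "L": 1, "Part": 3, "Value": 124.67},
    {"Name": "R1_S2", "L": 1, "Part": 3, "Value": 123.41},
]

# Hypothetical cluster assignment per part (would come from the wide table).
cluster_by_part = {1: 3, 2: 3, 3: 6}


def add_cluster(rows, cluster_by_part):
    """Copy each stacked row and attach the cluster of its part."""
    return [dict(row, Cluster=cluster_by_part[row["Part"]]) for row in rows]


def mean_by_cluster_and_location(rows):
    """Average Value for every (Cluster, Name, L) combination."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        key = (row["Cluster"], row["Name"], row["L"])
        totals[key][0] += row["Value"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


labeled = add_cluster(stacked, cluster_by_part)
means = mean_by_cluster_and_location(labeled)
for key in sorted(means):
    print(key, round(means[key], 3))
```

One map per cluster, colored by these means, then shows where within the part each time period differs.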

EXAMPLE 1

• Measures at 72 Locations per part

• Three levels (L=1,2,3)

• 24 measurements per Level

• Three rings

• Eight sections per ring

• 617 parts produced over time

• 44,424 records
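The counts on this slide follow directly from the layout, as a quick Python check shows. The location naming `R<ring>_S<section>` with a separate level `L` is an assumption based on the data table shown earlier in the deck.

```python
from itertools import product

levels, rings, sections, parts = 3, 3, 8, 617

# Every (location name, level) pair measured on one part.
locations = [
    (f"R{ring}_S{section}", level)
    for level, ring, section in product(
        range(1, levels + 1), range(1, rings + 1), range(1, sections + 1)
    )
]

per_level = rings * sections
records = len(locations) * parts

print(per_level)       # 24 measurements per level
print(len(locations))  # 72 locations per part
print(records)         # 44424 records in the stacked table
```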

ANALYSIS OF EXAMPLE 1:

Multivariate Summary over Time

Spatial Description of clusters 3, 6 & 8

A QUICK NOTE ABOUT CUSTOM MAP FILES: TWO DIMENSIONAL MAPS

Data Table
• Includes a column identifying the map location for the row.
• This location ID column has a Column Property > Map Role > Shape Name Use > nn-Name.jmp

nn-Name.jmp
• Default file location
• Custom file location; the data table must reference the custom location
• At least two columns
• Must have “Shape ID” column with Column Property > Map Role > Shape Name Definition
• Custom Name: This has the same values used in the map location in the data table. Examples: Parish, Loc_ID

nn-XY.jmp
• Must be in the same location as nn-Name.jmp
• Column Names
• Shape ID: Identifies each map shape
• Part ID: allows a single shape to have separate sections (even noncontiguous)
• X, Y: The vertices
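The two map tables can be pictured as plain rows before they are saved as JMP tables. The Python sketch below builds such rows for a hypothetical part drawn as three concentric rings of eight sections each. The wedge geometry, the radii, the use of Part ID = 1 for every shape, and the `Loc_ID` naming are all assumptions for illustration. Writing the actual .jmp files is not shown.

```python
import math


def wedge_vertices(r_inner, r_outer, start_deg, end_deg, steps=6):
    """Outline of one ring section as a list of (x, y) vertices."""
    angles = [
        math.radians(start_deg + (end_deg - start_deg) * k / steps)
        for k in range(steps + 1)
    ]
    outer = [(r_outer * math.cos(a), r_outer * math.sin(a)) for a in angles]
    inner = [(r_inner * math.cos(a), r_inner * math.sin(a)) for a in reversed(angles)]
    return outer + inner


def build_map(rings=3, sections=8):
    """Return (name_rows, xy_rows) mirroring the -Name and -XY tables."""
    name_rows, xy_rows = [], []
    shape_id = 0
    for ring in range(1, rings + 1):
        for section in range(1, sections + 1):
            shape_id += 1
            name_rows.append({"Shape ID": shape_id, "Loc_ID": f"R{ring}_S{section}"})
            start = 360.0 * (section - 1) / sections
            end = 360.0 * section / sections
            for x, y in wedge_vertices(ring, ring + 1, start, end):
                xy_rows.append({"Shape ID": shape_id, "Part ID": 1, "X": x, "Y": y})
    return name_rows, xy_rows


name_rows, xy_rows = build_map()
print(len(name_rows), "shapes,", len(xy_rows), "vertices")
print(name_rows[0], xy_rows[0])
```

The `Loc_ID` values are what the data table's location column must match, while the XY rows only carry geometry keyed by Shape ID.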

CONCLUSION

• Semiconductor wafer manufacture and CMMs (Coordinate Measuring Machines) provide many measurements for a single part. Another example of repeated measures within a unit is defect counts. This data includes a within-unit location.

• Modeling using the spatial information can be difficult, but spatial information should always be used in visualization. The potential information provided by this data includes: “Is the current unit similar to other recent units?” and “Where are the opportunities concentrated within the unit?”

• This data can be analyzed and visualized using Principal Components Analysis (PCA), multivariate control charting and clustering. Most importantly, the data will be visualized using custom maps within JMP. More than one arrangement of the data is needed for the analysis to work: the data is split for PCA and clustering, and the cluster information is then merged with the original stacked view.

• This presentation described a methodology that is streamlined with JMP scripting to indicate similar processing time periods on a control chart and then map the within-part information during each of those time periods.

THANK YOU

NEXT STEPS / ADDITIONAL IDEAS

• Implement option to calculate control limits by cluster
• Consider time series in T²: a 3rd method of looking for patterns in the profile

SUPPORT PAGES

TWO RELATED UNSUPERVISED MODELS; ‘STATES’ OF THE CAUSAL STRUCTURE

Special Cause / Common Cause
• Shewhart / Deming Model
• Common cause
  • Ubiquitous causal structure
  • Typical
  • The norm
• Special cause
  • Additional ‘assignable’ cause(s)
• Methodology
  1. Summarize profile
     1. Often based on PCA
     2. Max # components = # variables
  2. Use (robust) estimators to assess what is typical / normal / baseline
  3. Compare data to baseline to identify what is atypical

Clustering / Segmentation
• Parts with similar profiles are grouped, each group representing a causal structure.
• Methodology
  • ‘Distance’ between profiles of different parts
  • Max number of clusters = # observations

Figure panels: T² Multivariate Control Chart; Cluster Analysis

The two models are not entirely compatible. Clustering suggests more than one ‘common’ causal structure.

OVERVIEW OF SCRIPT

Flowchart boxes:
• Data
• Rearrange Data
• Principal Components across Locations
• Multivariate Control Chart
• Color Code Multivariate Control Chart by Cluster
• Update Original table
• Summarize Maps by Cluster
• Cluster Analysis
• (Custom) Map

FOUR COMPONENTS SAVED FROM EXAMPLE

T2 PRINCIPAL COMPONENTS ON COVARIANCE

CUSTOM MAPS DEFAULT FILE LOCATIONS

• JMP looks for these files in two locations. One location is shared by all users on a machine. This location is:
  • Windows: C:\Program Files\SAS\JMP\<Version Number>\Maps
  • Mac: /Library/Application Support/JMP/<Version Number>/Maps
• The other location is specific for an individual user:
  • On Windows: C:\Users\<user name>\AppData\Roaming\SAS\JMP\Maps
  • Note: On Windows, in JMP Pro, the “JMP” folder is named “JMPPro”. In JMP Shrinkwrap, the “JMP” folder is named “JMPSW”.
  • On Mac: /Users/<user name>/Library/Application Support/JMP/Maps
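As a convenience, the two lookup locations can be assembled from the templates above. The Python helper below is a hypothetical sketch: the function name, the example version "12" and the example user name are assumptions, and the Pro / Shrinkwrap folder note is applied only to the per-user Windows path, which is where the slide places it.

```python
def map_directories(platform, version, user, product_folder="JMP"):
    """Return (shared, per_user) map folders built from the slide's templates.

    product_folder is "JMP", "JMPPro" or "JMPSW" and is used only for the
    per-user Windows path.
    """
    if platform == "windows":
        shared = "\\".join(["C:", "Program Files", "SAS", "JMP", version, "Maps"])
        per_user = "\\".join(
            ["C:", "Users", user, "AppData", "Roaming", "SAS", product_folder, "Maps"]
        )
    elif platform == "mac":
        shared = f"/Library/Application Support/JMP/{version}/Maps"
        per_user = f"/Users/{user}/Library/Application Support/JMP/Maps"
    else:
        raise ValueError(f"unsupported platform: {platform}")
    return shared, per_user


# Example values are placeholders, not taken from the presentation.
for folder in map_directories("windows", "12", "tony", product_folder="JMPPro"):
    print(folder)
for folder in map_directories("mac", "12", "tony"):
    print(folder)
```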

EXAMPLE OF DATA TABLE

EXAMPLE OF -NAME.JMP FILE

EXAMPLE OF -XY.JMP FILE

• Column Names
  • Shape ID: Identifies each map shape
  • Part ID: allows a single shape to have separate sections (even noncontiguous)
  • X, Y: The vertices