A JMP® SCRIPT FOR GEOSTATISTICAL CLUSTER ANALYSIS OF MIXED DATA SETS WITH SPATIAL INFORMATION

Steffen Brammer

OBJECTIVE

Example 1: Find a location for your luxury car sales outlet

• Data set with samples across the city at specific locations
• Establish the mean for each suburb to identify affluent clientele
• Result at the chosen outlet location: no customers. Why?

[Figure labels: Location of outlet; Reality; Affluent suburb]

Example 2: How much pesticide do you need to get rid of the bugs?

• Data set with samples at specific locations
• Establish the mean for each field
• Strong bug population only in trees along roads -> remove tree lines from your statistics

Example 3: Image processing

[Figure panels: Available pixels; Interpolation algorithm; Completed image; Reality]

Example 4: Mine geology

• High-grade gold mineralisation in quartz vein stockwork (pink lines)

[Figure panels: Sample data; Domains]

MIXED DATA

• Single data set with two (or more) underlying populations that are independent of each other; these need to be separated into sub-sets (‘domains’) before any statistical analysis
• Challenge: allocate samples within the range of overlap to the correct domain

DECOMPOSITION OF DATA

• By statistical decomposition
  – Various methods and algorithms available
• In a spatial framework (that is, samples are not randomly distributed, but a spatial relationship exists between samples), not only the value of a sample must be taken into account, but also its location
  – By manual creation of polygons to separate the various domains
  – By geostatistical decomposition, e.g. geostatistical cluster analysis
    • ROMARY, T., RIVOIRARD, J. et al. 2012. Domaining by Clustering Multivariate Geostatistical Data. In: ABRAHAMSEN et al. (eds) Geostatistics Oslo 2012, pp. 455-466, Springer, Dordrecht
  – Conventional geostatistical methods struggle or fail when the clusters are intertwined with irregular, discontinuous or complex geometries
  – New concept developed and applied using JMP®
    • Assumption 1: The distributions of the underlying populations are known; the outcome after decomposition must honour these distributions
    • Assumption 2: Populations occur in clusters with a certain degree of connectivity between their samples
    • Brammer, S. 2015. Domaining of long-tailed bimodal data-sets with statistical methods. In: The Danie Krige Geostatistical Conference. SAIMM, Johannesburg. pp. 281-286
    • Brammer, S. 2015. A self-guiding domaining tool for long-tailed bi-modal data sets. In: Proceedings of the 17th annual conference of the International Association for Mathematical Geosciences. Sept 5-13, 2015, Freiberg (Saxony), Germany

CONCEPT & METHODOLOGY

• 1st step: Establish statistical moments of underlying sample populations*
  – Mean, spread, number of samples (a)
  – Build target histogram of expected outcome (b)

*assuming both populations are approx. normally distributed

[Figure panels (a), (b): Original sample data with small outlier population]
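
As an illustration of this first step, the sketch below builds a target histogram from a mean, a spread and a number of samples. It is written in Python rather than JSL and is not the author's script. The function names (`normal_cdf`, `target_histogram`), the bin edges and the parameter values are all hypothetical, and the normal distribution is taken from the assumption stated on the slide.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def target_histogram(mean, sd, n_samples, edges):
    """Expected number of samples per bin for a normal population.

    edges is an ascending list of bin boundaries; the result has
    len(edges) - 1 entries (one expected count per bin).
    """
    counts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p = normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)
        counts.append(n_samples * p)
    return counts

# Hypothetical outlier population: mean 10, spread 2, 50 samples.
edges = [4, 6, 8, 10, 12, 14, 16]
target = target_histogram(10.0, 2.0, 50, edges)
```

The expected counts are symmetric around the mean and sum to slightly less than the number of samples, because the bins here only cover three standard deviations on either side.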

CONCEPT & METHODOLOGY (cont.)

• 2nd step: Build a continuous search path through sample grid
  – Pick random seed within upper domain
  – Follow progressively adjacent samples as long as they fit into the target histogram

[Figure panels: Sample grid; Sample grid (detail) with seed sample; Domains (Reality), red dots = outlier domain]

CONCEPT & METHODOLOGY (cont.)

• Search path stops when no sample in neighbourhood fits into target histogram
  – outside high-grade zone; lower tail of target histogram is filled up

• Once search is interrupted, repeat search from new random seed

• Repeat procedure until all samples potentially belonging to the upper domain are investigated
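
The search-path idea can be sketched in a few lines of Python. This is a simplified illustration under several assumptions that the slides do not state: the grid is 2-D, the neighbourhood is a circle of fixed `radius`, the path always moves to the nearest fitting neighbour, and every seed is accepted as soon as it fits a free histogram bin. The names `bin_index` and `search_paths` and the example data are hypothetical.

```python
import math
import random

def bin_index(value, edges):
    """Index of the histogram bin containing value, or None if outside."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None

def search_paths(samples, edges, capacity, radius, rng):
    """Assign samples to the upper domain by walking between neighbours.

    samples: list of (x, y, value); capacity: target count per bin.
    Returns the set of sample indices assigned to the upper domain.
    """
    remaining = list(capacity)      # free slots left in each bin
    assigned = set()

    def dist(i, j):
        return math.hypot(samples[i][0] - samples[j][0],
                          samples[i][1] - samples[j][1])

    def fits(i):
        b = bin_index(samples[i][2], edges)
        return i not in assigned and b is not None and remaining[b] > 0

    def take(i):
        remaining[bin_index(samples[i][2], edges)] -= 1
        assigned.add(i)

    while True:
        seeds = [i for i in range(len(samples)) if fits(i)]
        if not seeds:
            break                   # no candidate left to investigate
        current = rng.choice(seeds) # new random seed
        take(current)
        while True:
            near = [i for i in range(len(samples))
                    if fits(i) and dist(i, current) <= radius]
            if not near:
                break               # path interrupted, pick a new seed
            current = min(near, key=lambda i: dist(i, current))
            take(current)
    return assigned

# Hypothetical data: three high values in a row, two low values far away.
samples = [(0, 0, 9.0), (1, 0, 11.0), (2, 0, 10.5),
           (10, 10, 1.0), (11, 10, 1.5)]
result = search_paths(samples, [8, 10, 12], [1, 2], 1.5, random.Random(1))
```

In this sketch the histogram capacity is the only thing that limits the domain, so a tighter capacity (for example `[1, 1]`) leaves one of the high samples unassigned. Which one is left out depends on the random sequence of seeds.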

SCRIPT 1: ESTIMATION OF STATISTICAL MOMENTS

• Input dialog 1 – estimated parameters (a)
• Input dialog 2 – iteration parameters (b)

[Figure panels (a), (b); plot labelled "Original sample data with small outlier population"]

1. Calculate statistical moments for various distribution scenarios (a)
2. Fit distribution and assess goodness-of-fit (b)
3. Record critical parameters (c)

[Figure panels (a), (b), (c)]

4. Iterate through all possible combinations in nested loops (d)
5. Rank output values by goodness-of-fit tests and choose the best option as final result (e)

[Figure panels (d), (e)]
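
The nested iteration and ranking can be illustrated with a small Python sketch. The slides do not say which goodness-of-fit tests the script uses, so a Kolmogorov-Smirnov style distance between the empirical distribution and a two-component normal mixture is used here purely as a stand-in. Holding the main population fixed, the function names and the data are further assumptions made for the example.

```python
import itertools
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_distance(data, cdf):
    """Largest gap between the empirical CDF of data and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(max(abs((i + 1) / n - cdf(x)), abs(i / n - cdf(x)))
               for i, x in enumerate(xs))

def rank_scenarios(data, main, means, sds, fractions):
    """Try every (mean, sd, fraction) of the outlier population.

    main is the (mean, sd) of the main population, held fixed here.
    Returns (distance, mean, sd, fraction) tuples, best fit first.
    """
    results = []
    for m, s, f in itertools.product(means, sds, fractions):
        def mixture(x, m=m, s=s, f=f):
            return ((1 - f) * normal_cdf(x, main[0], main[1])
                    + f * normal_cdf(x, m, s))
        results.append((ks_distance(data, mixture), m, s, f))
    return sorted(results)

# Hypothetical data: eight values near 1 and two outliers near 10.
data = [0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.95, 1.05, 9.5, 10.5]
ranked = rank_scenarios(data, (1.0, 0.15), [5.0, 10.0], [0.5], [0.2])
best = ranked[0]
```

`itertools.product` plays the role of the nested loops. The first entry of the sorted list is the scenario with the smallest distance, which for this example is the outlier mean of 10.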

SCRIPT 2: SEARCH PATH THROUGH SAMPLE GRID

• Input dialog 1 – assign columns (a)
• Input dialog 2 – statistical moments, as established by Script 1 (b)
• Input dialog 3 – search parameters (c)

[Figure panels (a), (b), (c)]

1. Set up target histogram for outlier population (a)
2. Set up rotation matrix for oriented search (b)

[Figure panels (a), (b)]
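
A rotation matrix for an oriented search might look like the following Python sketch. The slides give no details of the script's convention, so this assumes a 2-D grid, an angle measured counter-clockwise from the x-axis, and an anisotropy `ratio` that shortens the search range across the search direction. All names are hypothetical.

```python
import math

def rotation_matrix(angle_deg):
    """2-D matrix that rotates coordinates by -angle, so that the
    search direction ends up along the first axis."""
    a = math.radians(angle_deg)
    return [[math.cos(a), math.sin(a)],
            [-math.sin(a), math.cos(a)]]

def oriented_distance(dx, dy, rot, ratio):
    """Distance of the offset (dx, dy) in the rotated frame; the
    component across the search direction is divided by ratio."""
    u = rot[0][0] * dx + rot[0][1] * dy   # along the search direction
    v = rot[1][0] * dx + rot[1][1] * dy   # across the search direction
    return math.hypot(u, v / ratio)

# Search oriented along the y-axis, half the range across it.
rot = rotation_matrix(90.0)
along = oriented_distance(0.0, 2.0, rot, 0.5)
across = oriented_distance(2.0, 0.0, rot, 0.5)
```

With a ratio below 1, a sample lying across the search direction appears farther away than one at the same offset along it, so a fixed search radius favours neighbours in the chosen orientation.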

3. Select random seed sample from outlier population (c)
4. Select all samples within specified neighbourhood (d)

[Figure panels (c), (d)]

5. Select sample within specified neighbourhood that fits into target histogram (e)
6. Increase the number of samples of the respective histogram bin (e)
7. Go to selected sample and continue search at new location (e)
8. Continue search as long as the criteria of the target histogram are satisfied, then choose a new seed sample for the next cluster and repeat the search until the whole grid is investigated (f)

[Figure panels (e), (f)]

9. Post-processing to clean up results (g)
10. Repeat the whole procedure several times to conduct the cluster analysis with a variety of different seed samples and different search orientations (as the result of a single run depends on the random sequence of seed samples)
11. Results are given as probabilities for each sample to belong to the outlier population
12. Select the specified number of samples with the highest probabilities for the final result

[Figure panel (g)]
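
One plausible way to turn repeated runs into probabilities is to count how often each sample is assigned to the outlier population. The slides do not state how the script computes them, so the relative frequency used below is an assumption, and the function names and run data are hypothetical.

```python
from collections import Counter

def membership_probabilities(runs, n_samples):
    """Fraction of runs in which each sample was assigned to the
    outlier population. runs is a list of sets of sample indices."""
    counts = Counter()
    for assigned in runs:
        counts.update(assigned)
    return [counts[i] / len(runs) for i in range(n_samples)]

def select_top(probabilities, n_select):
    """Indices of the n_select samples with the highest probability."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    return sorted(order[:n_select])

# Hypothetical outcome of four runs over five samples.
runs = [{0, 1, 2}, {1, 2}, {1, 2, 4}, {2, 3}]
probs = membership_probabilities(runs, 5)
final = select_top(probs, 2)
```

Here sample 2 is selected in every run and sample 1 in three of four, so these two form the final result when two samples are requested.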

FINAL RESULT

[Figure panels: Reality; Result after 10 runs; Result after 25 runs; Result after 50 runs]

How to do this without ?!?

No idea.....!

Thank You!!!