
Delineating gold occurrence using K-means cluster analysis.

Final report prepared for the Data Analysis and Interpretation Specialization

June 20, 2016

Objective & Scope

The role of the geologist in mineral exploration is to analyze all the available information and decide where to investigate as the project moves forward. These investigations include (but are not limited to) rock and soil sampling, geophysical surveys and drilling. Drilling is usually one of the most expensive items at the exploration stage, with costs of up to US$100/m or more as the holes go deeper.

As the project moves forward, the amount of data increases. This information is analyzed using different approaches, for example plotting results on maps and looking for trends. The scope of this report is to test whether a K-means cluster analysis can refine the understanding of the ore deposit being investigated.

The study is built around the following questions:

• Do the generated clusters separate high-grade zones from barren ones?
• Is it possible to differentiate two different ore deposits with this approach?
• Does the clustering identify an element association not clearly spotted using traditional statistical methods?

Methods

Sample

The number of observations used for this study is 16,175. These observations were obtained from the sampling of 101 boreholes drilled in two spatially separated targets, each representing one of two ore deposits: a hydrothermal breccia and a gold porphyry. Although the boreholes come from two deposits, both are in the same region and share some rock types.

The sampling of the boreholes analyzed in the present study followed these criteria: a lithological contact always marks the boundary between two samples; the maximum sample length was 1.5 m; and the entire drill hole was sampled and sent to the laboratory.

Measures

Once in the laboratory, the samples were assayed; in this case 48 elements were analyzed: gold, silver, copper, lead, zinc, bismuth, sodium, molybdenum, and so on. All 48 elements are included in this study. The laboratory reports each element either in ppm or as a percentage. Table 1 lists the elements and the units in which they are reported.

Sometimes an element concentration is below the detection limit, meaning that the equipment used for the assays cannot measure the concentration of that element. In that case the concentration is reported as a “<” symbol followed by the detection limit (e.g. <0.05), and the value assigned to the sample was half of the detection limit (e.g. 0.025).
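The half-detection-limit rule above can be sketched as a small parsing function. This is an illustrative sketch, not the code used in the report; the function name `parse_assay` and the assumption that assays arrive as text strings such as "<0.05" are hypothetical.

```python
def parse_assay(value: str) -> float:
    """Convert one laboratory assay string to a number.

    Values reported below the detection limit (e.g. "<0.05") are replaced
    by half of that limit, following the substitution rule in the text.
    """
    text = value.strip()
    if text.startswith("<"):
        detection_limit = float(text[1:])
        return detection_limit / 2
    return float(text)


# Hypothetical raw assays for one element
raw_assays = ["0.42", "<0.05", "210", "<0.005"]
clean_assays = [parse_assay(v) for v in raw_assays]
print(clean_assays)
```

Substituting a constant keeps below-limit samples in the dataset, but it also places all of them at a single value, which shows up as a spike at the low end of the histograms.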

Symbol  Element       Units
Au      Gold          ppm
Ag      Silver        ppm
Al      Aluminum      %
As      Arsenic       ppm
Ba      Barium        ppm
Be      Beryllium     ppm
Bi      Bismuth       ppm
Ca      Calcium       %
Cd      Cadmium       ppm
Ce      Cerium        ppm
Co      Cobalt        ppm
Cr      Chromium      ppm
Cs      Cesium        ppm
Cu      Copper        ppm
Fe      Iron          %
Ga      Gallium       ppm
Ge      Germanium     ppm
Hf      Hafnium       ppm
In      Indium        ppm
K       Potassium     ppm
La      Lanthanum     ppm
Li      Lithium       ppm
Mg      Magnesium     %
Mn      Manganese     ppm
Mo      Molybdenum    ppm
Na      Sodium        ppm
Nb      Niobium       ppm
Ni      Nickel        ppm
P       Phosphorus    ppm
Pb      Lead          ppm
Rb      Rubidium      ppm
S       Sulfur        ppm
Sb      Antimony      ppm
Sc      Scandium      ppm
Se      Selenium      ppm
Sn      Tin           ppm
Sr      Strontium     ppm
Ta      Tantalum      ppm
Te      Tellurium     ppm
Th      Thorium       ppm
Ti      Titanium      %
Tl      Thallium      ppm
U       Uranium       ppm
V       Vanadium      ppm
W       Tungsten      ppm
Y       Yttrium       ppm
Zn      Zinc          ppm
Zr      Zirconium     ppm

Table 1. List of elements (variables) and their units of measure.

Analysis

As a first approach, descriptive statistics were calculated for the dataset: mean, median, maximum and minimum values, and standard deviation. Each element was then binned and plotted to assess whether its distribution was normal and to check for outliers and skewness. All the variables analyzed are quantitative.
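A minimal sketch of these descriptive measures, using only Python's standard library, is shown below. The report does not state which skewness estimator or bin count it used; the adjusted Fisher-Pearson coefficient and equal-width bins are assumptions here, and the function names are hypothetical.

```python
import statistics


def describe(values):
    """Summary statistics of the kind reported for each element."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1).

    Positive values indicate a long right tail (a few very high values),
    negative values a long left tail.
    """
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)


def histogram(values, bins=10):
    """Count values in equal-width bins between the minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against constant data
    counts = [0] * bins
    for x in values:
        index = min(int((x - lo) / width), bins - 1)  # maximum goes in last bin
        counts[index] += 1
    return counts


# Hypothetical gold assays (ppm): many low values and one high-grade outlier
gold = [0.0025, 0.01, 0.02, 0.05, 0.1, 0.3, 12.0]
print(describe(gold))
print(round(skewness(gold), 2), histogram(gold, bins=5))
```

A mean well above the median together with a positive skewness, as in this toy sample, is the signature of a long right tail.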

Secondly, correlation coefficients (r) and their p values were calculated for the relationship between gold and each of the other 47 elements, and a scatterplot was generated for each pair. As part of this step, the presence of moderators was tested for the relationship between gold and silver.
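Assuming the r values are Pearson correlation coefficients (the report does not name the coefficient), they and the r² column of Table 3 can be computed as sketched below. The helper names are hypothetical; for p values one would typically call `scipy.stats.pearsonr(x, y)`, which returns both r and the two-sided p value. The t statistic shown is the standard one for testing zero correlation, t = r·sqrt((n - 2)/(1 - r²)), with n - 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def rank_by_correlation(gold, elements):
    """Return (element, r, r squared) tuples, strongest |r| first."""
    rows = []
    for name, values in elements.items():
        r = pearson_r(gold, values)
        rows.append((name, r, r * r))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)
```

With the report's figures, `t_statistic(0.403, 16175)` is roughly 56 on 16,173 degrees of freedom; the corresponding p value is so small that statistical software typically displays it as 0, which is how the gold-silver p value appears in Table 3.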

Next, a K-means cluster analysis was carried out on the dataset. At this step, 46 elements were used to generate the clusters, while gold was used to validate the clustering. The data were standardized to a mean of zero and a standard deviation of 1. The Elbow method was applied to determine the number of clusters, and a Tukey test was performed to evaluate the post hoc comparisons between clusters.
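The report does not include its clustering code. Under that caveat, the sketch below shows what the standardization, K-means and Elbow steps involve, using NumPy and a plain implementation of Lloyd's algorithm with random restarts; in practice a library routine such as scikit-learn's `KMeans` would normally be used. The Elbow curve here uses the within-cluster sum of squared distances (SSE), one common choice; the report does not say which dispersion measure its Elbow plot shows, nor whether the sample or population standard deviation was used (the sketch uses the population SD). All function names are hypothetical.

```python
import numpy as np


def standardize(X):
    """Scale every column to mean 0 and (population) standard deviation 1."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)  # ddof=0, i.e. population SD
    std[std == 0] = 1.0  # leave constant columns at 0 instead of dividing by 0
    return (X - X.mean(axis=0)) / std


def nearest_centroid(X, centroids):
    """Index of the closest centroid (squared Euclidean distance) for each row."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with random restarts.

    Returns (labels, centroids, sse) for the restart with the lowest
    within-cluster sum of squared distances.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # start from k distinct samples chosen at random
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            labels = nearest_centroid(X, centroids)
            # move each centroid to the mean of its members (keep it if empty)
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        labels = nearest_centroid(X, centroids)
        sse = float(((X - centroids[labels]) ** 2).sum())
        if best is None or sse < best[2]:
            best = (labels, centroids, sse)
    return best


def elbow_curve(X, k_values, seed=0):
    """SSE for each candidate number of clusters; look for the bend."""
    return {k: kmeans(X, k, seed=seed)[2] for k in k_values}
```

Applied to this study, `X` would be the matrix of the 46 predictor assays with gold held out: one would inspect `elbow_curve(standardize(X), range(1, 17))` for bends and then keep the labels from `kmeans(standardize(X), 4)`.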

Finally, the clusters were plotted on a map to evaluate their distribution at real-world locations.

Results

Descriptive statistics

Although descriptive statistics were calculated for all 48 elements, Table 2 shows the statistics for gold (the response variable) and twelve of the elements most commonly found in porphyry and hydrothermal deposits. The average gold concentration in the area is 0.42 ppm, with a standard deviation of 2.58 ppm. The maximum gold concentration is 210 ppm and the lowest is 0.0025 ppm.

Element  N       Mean      SD      Min     Max
Au       16,175  0.42      2.58    0.0025  210.00
Ag       16,175  0.89      4.96    0.0100  587.00
As       16,175  37.16     57.79   0.2000  2,132.00
Ba       16,175  468.83    593.68  5.0000  4,847.00
Bi       16,175  0.35      0.66    0.0050  50.10
Cu       16,175  6.50      48.89   0.0001  2,310.00
Mo       16,175  8.39      32.92   0.0500  2,520.70
Mn       16,175  1,154.70  914.74  0.0036  4,713.00
Pb       16,175  71.47     228.00  0.2500  10,000.00
Sb       16,175  9.33      10.44   0.0500  206.70
Te       16,175  0.34      0.69    0.0250  40.60
W        16,175  1.95      19.19   0.1000  1,977.00
Zn       16,175  357.93    651.46  0.5000  15,400.00

Table 2. Descriptive statistics for gold and other elements associated with the ore deposits studied (all values in ppm).

The gold distribution is highly skewed to the right (positively skewed; Figure 1): most samples have low concentrations and a few have very high ones, which is common behavior for this element. Some elements, such as silver, zinc, lead and copper, follow the same pattern, while others show bimodal (iron, chromium, cobalt) or normal (yttrium, sodium, aluminum) distributions. See Appendix 1 for the distribution graphs and the complete statistics for all the elements.

Figure 1. Gold distribution in the dataset.

Bivariate Analyses

According to Table 3, the strongest relationship is between gold and silver (r = 0.403, p ≈ 0), which is positive; however, this relationship is weak and probably non-linear, as revealed by the scatterplot (Figure 2). The same table shows that the relationships between gold and arsenic, bismuth, copper, manganese, lead, antimony, tellurium, tungsten and zinc are statistically significant (p < 0.05). However, the scatterplots of these relationships (Figure 2) show that all of them are non-linear.

Predictor  r         p          r²
Ag         0.403     0          0.162409
As         0.0945    2.15e-33   0.00893025
Ba         -0.00017  0.98       2.89e-08
Bi         0.084     1.34e-26   0.007056
Cu         0.035     1.05e-05   0.001225
Mn         0.069     2.16e-18   0.004761
Mo         0.008     0.28       0.000064
Pb         0.15      4.49e-82   0.0225
Sb         0.11      9.23e-46   0.0121
Te         0.103     2.01e-39   0.010609
W          0.035     9.98e-06   0.001225
Zn         0.15      3.91e-82   0.0225

Table 3. Correlation coefficients for the relationship between gold and the elements most commonly found in porphyry and hydrothermal deposits.

Although not shown in Table 3, cerium has a weak negative relationship with gold (r = -0.3, p = 0.00018), which the scatterplot shows to be non-linear. For a complete list of r values for the relationship between gold and the other elements in the dataset, as well as the scatterplots, refer to Appendix 2.

Since gold-silver is the strongest relationship, zinc, yttrium and hafnium were tested as moderators of that relationship. Table 4 shows how the groups were defined and the respective r and p values for each group. The correlation between gold and silver shows no consistent trend as the concentration of zinc increases. For yttrium, however, the correlation between gold and silver becomes positive and significant as its concentration increases (r values of 0.37 and 0.44 in the two highest groups). The same pattern is observed for hafnium: as its concentration increases, the association between gold and silver becomes positive and significant.
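The grouping in Table 4 appears to use equal-width bins of the moderator starting at 0 (350 ppm for zinc, 2.93 ppm for yttrium, 0.576 ppm for hafnium), with the last bin open-ended. The sketch below reproduces that layout under this assumption; the bin width and number of groups are supplied by the caller, and the function names are hypothetical. Per-group p values could be added with `scipy.stats.pearsonr`.

```python
import math


def pearson_r(x, y):
    """Pearson r; returns NaN when either variable is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def moderation_table(gold, silver, moderator, width, n_groups):
    """Gold-silver correlation within equal-width bins of a moderator.

    Bins start at 0 and have the given width; the last bin is open-ended
    (the "max" rows of Table 4). Returns (group, lower, upper, n, r) tuples.
    """
    rows = []
    for g in range(n_groups):
        lower = g * width
        upper = (g + 1) * width if g < n_groups - 1 else math.inf
        members = [i for i, m in enumerate(moderator) if lower <= m < upper]
        if len(members) < 3:
            r = float("nan")  # too few samples for a meaningful r
        else:
            r = pearson_r([gold[i] for i in members], [silver[i] for i in members])
        rows.append((g + 1, lower, upper, len(members), r))
    return rows


# Hypothetical toy data: the gold-silver association flips between zinc bins
gold = [1, 2, 3, 4, 5, 6]
silver = [2, 4, 6, 3, 2, 1]
zinc = [10, 20, 30, 400, 500, 600]
for row in moderation_table(gold, silver, zinc, width=350, n_groups=2):
    print(row)
```

Equal-width bins are simple to read, but with skewed moderators such as zinc they can leave very uneven group sizes; quantile-based groups are a common alternative if that becomes a problem.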

Figure 2. Scatterplots for the relationship between gold and other predictors.

Zinc (ppm)
Group  From    To     r      p
1      0       350    0.21   1.51e-129
2      350     700    0.21   4.89e-18
3      700     1050   0.53   1.44e-65
4      1050    max    0.37   1.29e-43

Yttrium (ppm)
Group  From    To     r      p
1      0       2.93   0.27   0.93
2      2.93    5.86   0.057  0.55
3      5.86    8.79   0.044  0.27
4      8.79    11.72  0.37   1.6e-61
5      11.72   max    0.43   0

Hafnium (ppm)
Group  From    To     r      p
1      0       0.576  0.08   0.011
2      0.576   1.152  0.22   1.26e-58
3      1.152   1.728  0.035  1.84e-58
4      1.728   max    0.65   0

Table 4. Testing moderators for the relationship between gold and silver.

Cluster Analysis

The Elbow method (Figure 3) showed bends at 4, 6, 8 and 16 clusters. All of these solutions were tried, and the 4-cluster solution worked best. The clusters generated are shown in Figure 4, and Table 5 displays the number of observations per cluster.

Figure 3. Elbow method to determine the number of clusters.

When gold was used to validate the clusters, the OLS regression (Figure 5) indicated that the clusters differ in gold concentration (F-statistic = 34.61, p = 2.95e-22). The mean gold concentration of each cluster is shown in Table 6. The Tukey test (Figure 6) shows that the clusters differ in gold concentration, with clusters 0 and 1 showing the largest difference.
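Regressing gold on the cluster label treated as a categorical variable gives an overall F test that is identical to a one-way ANOVA across clusters. The sketch below computes that F statistic directly; it is an illustration, not the report's own code (which is not shown), and the function name is hypothetical. The p value would come from the F distribution, e.g. `scipy.stats.f.sf(F, df_between, df_within)`, and the Tukey post hoc comparisons from a routine such as `statsmodels.stats.multicomp.pairwise_tukeyhsd`.

```python
def one_way_anova_f(values, labels):
    """F statistic for H0: every group (cluster) has the same mean.

    Returns (F, df_between, df_within).
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = 0.0
    ss_within = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        # spread of the cluster means around the grand mean
        ss_between += len(members) * (group_mean - grand_mean) ** 2
        # spread of the samples around their own cluster mean
        ss_within += sum((v - group_mean) ** 2 for v in members)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical gold values (ppm) for two clusters
gold = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cluster = [0, 0, 0, 1, 1, 1]
print(one_way_anova_f(gold, cluster))  # (13.5, 1, 4)
```

Because gold is strongly right-skewed, the ANOVA assumption of roughly normal residuals is questionable here; this is one reason the post hoc Tukey comparisons and the cluster means are worth reading alongside the F statistic.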

Cluster  Number of observations
0        7,244
1        261
2        26
3        3,790

Table 5. Number of observations per cluster.

Figure 4. Four-cluster solution; some agglomeration of observations to the left is observed.

Figure 5. OLS regression table to evaluate gold and the clusters.

Figure 6. Results of the Tukey test.

Plotting clusters in real-world coordinates

The clusters generated were plotted using real-world coordinates. Figure 7 shows the clusters for the porphyry-type deposit and for the hydrothermal breccia. Although the quality of these images is limited, clusters 1 and 2 are found mainly in the hydrothermal breccia, with only a few samples in the porphyry-type deposit. Beyond this, the real-world distribution of the clusters shows no trend and provides no additional information, at least in 2D.
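The map view can be complemented by a simple cross-tabulation of cluster against deposit, which turns statements such as "clusters 1 and 2 are found mainly in the hydrothermal breccia" into counts. A minimal standard-library sketch follows; the deposit labels and the function name are hypothetical.

```python
from collections import Counter


def cluster_by_deposit(clusters, deposits):
    """Contingency table: number of samples for each (cluster, deposit) pair."""
    counts = Counter(zip(clusters, deposits))
    return {
        c: {d: counts[(c, d)] for d in sorted(set(deposits))}
        for c in sorted(set(clusters))
    }


# Hypothetical per-sample cluster labels and deposit of origin
clusters = [2, 2, 2, 0, 0, 1]
deposits = ["breccia", "breccia", "porphyry", "porphyry", "breccia", "breccia"]
for cluster, row in cluster_by_deposit(clusters, deposits).items():
    print(cluster, row)
```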

Conclusions/Limitations

A k-means cluster analysis was performed to differentiate barren zones from high-grade zones and to assess whether the algorithm can differentiate two different ore deposits. For this purpose, a database containing 16,175 samples from 101 boreholes was used. Of the 48 chemical elements available in the database, gold was the main focus of this study.

A basic statistical approach showed strong right (positive) skewness in the gold distribution, indicating that most of the samples assayed contained low concentrations of gold. It was also possible to identify outliers (high-grade gold samples with values above 50 ppm), while the mean gold concentration was 0.42 ppm. The distributions of other elements (iron, chromium, cobalt) were bimodal or multimodal, possibly indicating the presence of two different mineralization events, one related to the porphyry-type deposit and the other to the hydrothermal breccia.

Figure 7. Distribution of the clusters for the porphyry-type deposit (left) and the hydrothermal breccia (right).

After evaluating the correlation between gold and the other 47 elements, the strongest was found to be the gold-silver correlation, with r = 0.403 (p ≈ 0). Testing moderators of the relationships between gold and other elements was outside the scope of this study; however, a few moderators were tested. As the concentrations of yttrium and hafnium increase, the relationship between gold and silver becomes stronger. Although the data need to be interpreted more carefully, yttrium and hafnium could potentially be used as pathfinder elements for gold-silver mineralization in the studied region.

The cluster analysis showed that the k-means algorithm was not able to separate high-grade zones from barren ones. Four clusters was the optimal solution. A weak separation between the ore deposits was achieved by cluster 2, which contains 6 samples from the porphyry deposit and 20 samples from the hydrothermal breccia. Moreover, the samples in cluster 2 are not relevant in terms of gold concentration. Cluster 0 contains the largest number of samples with high gold concentrations, but it also contains samples considered barren. Plotting the clusters in 2D did not provide further information on either deposit; it will be necessary to evaluate their 3D distribution.

Although the cluster analysis carried out in this study did not provide relevant new information on the geology of the region, further analyses should not be ruled out, because this study had some limitations. One limitation of this first trial may be the presence of outliers in gold and silver, as well as the skewness in the distributions of many of the other elements assayed. Using rock-type information to build the clusters is an option not explored in this study; however, rock type could also introduce bias if the geologist defined a rock incorrectly. Finally, a Lasso regression analysis is recommended to find out which other elements, besides silver, are associated with gold, and to explore their possible use as indicator elements for gold.

APPENDIX 1

Distribution and basic statistics

Basic statistics for all the elements

APPENDIX 2

Scatterplots for the relationship between gold and other elements
