Statistical Tests for Nonrandomness
By: Curtis B. Storlie
University of New Mexico
E-mail:[email protected]
Curt Storlie Statistical Tests for Nonrandomness
Outline
◮ Tests Based on Gridding
  ◮ Common Means (CMN)
  ◮ Common Locations (CL)
  ◮ Common Variances (CV)
  ◮ Chi-Square Test for Statistical Independence (SI)
◮ Flexible Grid-Free Tests
  ◮ Linear and Quadratic Regression
  ◮ Rank Correlation Coefficient Test
  ◮ Squared Rank Differences Test (SRD)
◮ Combining Statistical Tests
◮ Reliability of Results, TDCC
WIPP Example
◮ Computational model from the Waste Isolation Pilot Plant
◮ 31 input variables
◮ Response variable: Waste Pressure at 10,000 years (WAS PRES.10K)
Common Means (CMN) Test
◮ Divide all of the values of x_j into M disjoint classes (usually at equally spaced quantiles along the x-axis).
◮ For discrete x_j, define classes for each of the distinct values.
◮ Then perform an analysis of variance (ANOVA) to determine if y has a different mean across these classes.
Notation for CMN Test
◮ Let the observed values of x_j and y be denoted (x_{i,j}, y_i), i = 1, ..., n.
◮ Let c, c = 1, 2, ..., M, designate the individual classes into which the values of x_j have been divided.
◮ Let X_c designate the set such that i ∈ X_c only if x_{i,j} belongs to class c.
◮ Let m_c equal the number of elements contained in X_c (i.e., the number of x_{i,j}'s associated with class c).
Common Means Test
◮ Conducting the ANOVA or F-test
◮ Typical ANOVA assumptions: assume that the y_i's are independent and y_i ∼ N(µ_c, σ²) for i ∈ X_c.
◮ The ANOVA procedure is a test of the hypothesis
H0 : µ_1 = µ_2 = · · · = µ_M   (1)
versus the alternative that H0 is not true.
◮ If H0 is assumed to be true, then

  F* = [ ∑_{c=1}^{M} m_c (ȳ_c − ȳ)² / (M − 1) ] / [ ∑_{c=1}^{M} ∑_{i∈X_c} (y_i − ȳ_c)² / (n − M) ]   (2)

follows an F-distribution with (M − 1, n − M) degrees of freedom, where ȳ = (1/n) ∑_{i=1}^{n} y_i and ȳ_c = (1/m_c) ∑_{i∈X_c} y_i.
Common Means Test
◮ Let F_{a,b} denote a random variable from an F distribution with a and b degrees of freedom.
◮ The p-value for the test of the null hypothesis H0 is given by
  p = P[F_{M−1, n−M} > F*]   (3)
◮ A small p-value suggests that at least one of the µ_c is not equal to the rest. Hence, the observed pattern involving x_j and y did not arise by chance, and x_j has an effect on the behavior of y.
◮ A level of significance α is specified a priori (e.g., α = .05). If p < α, then we conclude x_j has an effect on the behavior of y.
◮ The relative importance of the x_j's can be assessed by ranking them according to their respective p-values (the smaller the p-value, the more important the input).
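To make the procedure concrete, here is a minimal Python sketch of the CMN test. This is my own illustration, not code from the slides: the name cmn_test, the choice M = 5, and the simulated data are all assumptions, while the gridding and ANOVA follow the steps above via scipy.stats.f_oneway.

```python
import numpy as np
from scipy import stats

def cmn_test(x, y, M=5):
    """Common Means (CMN) test sketch: bin x into M classes at equally
    spaced quantiles, then one-way ANOVA on y across the classes.
    Returns (F*, p-value)."""
    x, y = np.asarray(x), np.asarray(y)
    edges = np.quantile(x, np.linspace(0, 1, M + 1))
    # class index for each observation (clip so the max lands in the last bin)
    cls = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, M - 1)
    groups = [y[cls == c] for c in range(M) if np.any(cls == c)]
    return stats.f_oneway(*groups)

# simulated example: x clearly affects the mean of y
rng = np.random.default_rng(0)
x = rng.uniform(size=300)
y = 2 * x + rng.normal(scale=0.5, size=300)
F, p = cmn_test(x, y)
```

A small p here would flag this input as influential; for a discrete x_j one would instead form one class per distinct value, as noted above.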
Common Means Test
◮ CMN Test Results on WAS PRES.10K output
Input p-value
BHPERM 0.0000
HALCOMP 0.0000
HALPRM 0.0000
BPCOMP 0.0004
ANHPRM 0.0019
ANHCOMP 0.0212
Common Locations (CL) Test
◮ The CL test is the Kruskal-Wallis test for common medians among groups.
◮ Groups are defined by the same classes of x_j values as in the CMN test above.
◮ Assume that the y_i's are independent and identically distributed with median(y_i) = η_c for i ∈ X_c, c = 1, ..., M. It is also assumed that the shape and scale of the distribution of the y_i's is the same across all M groups.
◮ The CL procedure is then a test of the hypothesis
H0 : η_1 = η_2 = · · · = η_M   (4)
versus the alternative that H0 is not true.
Common Locations Test
◮ The test statistic T* for the CL test is based on rank-transformed data. Specifically,

  T* = (n − 1) [ ∑_{c=1}^{M} m_c (r̄_c − r̄)² ] / [ ∑_{c=1}^{M} ∑_{i∈X_c} (r(y_i) − r̄_c)² ]   (5)

where

  r̄_c = (1/m_c) ∑_{i∈X_c} r(y_i),   (6)

  r̄ = (1/n) ∑_{i=1}^{n} r(y_i),   (7)

r(y_i) denotes the rank of y_i, and m_c equals the number of elements contained in X_c.
Common Locations Test
◮ If all of the y values have the same distribution, then T* approximately follows a χ²_{M−1} distribution.
◮ The p-value for this test is
  p = P[χ²_{M−1} > T*].   (8)
◮ A small p-value indicates that y has a different distribution depending on which of the groups x_j is in. Since it was assumed that the shape and scale of the distribution of y across each of the M groups is the same, the difference must be between the locations (medians).
◮ Even without the shape and scale assumption, though, a small p-value indicates that x_j has some effect on y (location shift or otherwise).
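Under the same gridding, the CL test is a direct call to the Kruskal-Wallis procedure. The sketch below is my own illustration (the name cl_test and the simulated data are assumptions); it reuses the quantile-class construction described for the CMN test and scipy.stats.kruskal.

```python
import numpy as np
from scipy import stats

def cl_test(x, y, M=5):
    """Common Locations (CL) test sketch: Kruskal-Wallis test on y
    across M quantile classes of x. Returns (T*, p-value)."""
    x, y = np.asarray(x), np.asarray(y)
    edges = np.quantile(x, np.linspace(0, 1, M + 1))
    cls = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, M - 1)
    groups = [y[cls == c] for c in range(M) if np.any(cls == c)]
    return stats.kruskal(*groups)

rng = np.random.default_rng(1)
x = rng.uniform(size=300)
y_null = rng.normal(size=300)                        # no relationship
y_alt = np.exp(x) + rng.normal(scale=0.3, size=300)  # location shifts with x
T0, p0 = cl_test(x, y_null)
T1, p1 = cl_test(x, y_alt)
```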
Common Locations Test
◮ CMN and CL Test Results on WAS PRES.10K output
CMN Results            CL Results
Input      p-value     Input      p-value
BHPERM     0.0000      BHPERM     0.0000
HALCOMP    0.0000      HALCOMP    0.0000
HALPRM     0.0000      HALPRM     0.0000
BPCOMP     0.0004      BPCOMP     0.0002
ANHPRM     0.0019      ANHPRM     0.0005
ANHCOMP    0.0212      ANHCOMP    0.0063
                       BPPRM      0.0403
Common Variances Tests
◮ Define groups by the same classes of x_j values as in the CMN and CL tests above.
◮ Any test for common variances (Levene's test, Bartlett's test, etc.) can be used to test whether or not the variance of y is the same across x_j.
◮ Two pitfalls to a CV test:
  ◮ When the distribution of y changes with x_j, it tends to do so more dramatically by location shifts than by changes in higher-order moments.
  ◮ Variances are much harder to estimate than means, so even when the variance shift is pronounced, it is harder to detect.
◮ One possibility is to combine the CMN and CV tests (or CL and CV tests) into one test using the Bonferroni inequality.
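The Bonferroni combination amounts to rejecting at level α if either p-value falls below α/2; equivalently, the combined p-value is min(1, 2·min(p1, p2)). A one-line sketch (my own illustration, not from the slides):

```python
def bonferroni_combine(p1, p2):
    """Combine two test p-values via the Bonferroni inequality:
    reject at level alpha iff min(p1, p2) < alpha / 2, i.e. the
    combined p-value is min(1, 2 * min(p1, p2))."""
    return min(1.0, 2.0 * min(p1, p2))
```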
Statistical Independence Test
◮ The SI test uses the same partitioning of x_j values as used for the CMN and CL tests.
◮ In addition, the y values are also partitioned in a manner analogous to that used for the x_j values.
Statistical Independence Test
◮ The SI test makes use of the χ²-test for contingency tables to indicate if the pattern appearing in a scatterplot appears to be nonrandom.
◮ Let r, r = 1, 2, ..., L, designate the individual classes into which the values of y are divided.
◮ Let Y_r designate the set such that i ∈ Y_r only if y_i belongs to class r.
◮ Let l_r equal the number of elements contained in Y_r (i.e., the number of y_i's associated with class r).
◮ The partitioning of x_j and y into M and L classes, respectively, in turn partitions (x_j, y) into M × L classes.
◮ Let O_{r,c} denote the set such that i ∈ O_{r,c} only if i ∈ X_c and also i ∈ Y_r, and let k_{r,c} equal the number of elements contained in O_{r,c}.
Statistical Independence Test
◮ The SI procedure is a test of the hypothesis
  H0 : y is independent of x_j   (9)

versus the alternative that H0 is not true.
◮ Under the assumption of H0, y has the same distribution in each of the x_j classes.
◮ If x_j and y are independent,

  E_{r,c} = (l_r/n)(m_c/n) n = l_r m_c / n   (10)

is an estimate of the expected number of observations (x_j, y) that should fall in class (r, c).
Statistical Independence Test
◮ Define the test statistic
  T* = ∑_{c=1}^{M} ∑_{r=1}^{L} (k_{r,c} − E_{r,c})² / E_{r,c}   (11)
◮ Asymptotically, T* follows a χ²_{(M−1)(L−1)} distribution when x_j and y are independent. Since large values of T* indicate evidence against the independence hypothesis, the p-value can be obtained as

  p = P[χ²_{(M−1)(L−1)} > T*]   (12)

◮ A small p-value indicates that x_j and y are likely not independent.
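The bookkeeping above reduces to building an L × M contingency table of counts k_{r,c} and handing it to a χ² test. The sketch below is my own illustration (the name si_test and the simulated data are assumptions); scipy.stats.chi2_contingency computes the expected counts E_{r,c} and the statistic (11) internally.

```python
import numpy as np
from scipy import stats

def si_test(x, y, M=4, L=4):
    """Statistical Independence (SI) test sketch: chi-square test on
    the L x M contingency table from quantile-binning y and x."""
    def classes(v, k):
        edges = np.quantile(v, np.linspace(0, 1, k + 1))
        return np.clip(np.searchsorted(edges, v, side="right") - 1, 0, k - 1)
    table = np.zeros((L, M))
    for r, c in zip(classes(y, L), classes(x, M)):
        table[r, c] += 1        # k_{r,c}
    chi2, p, dof, expected = stats.chi2_contingency(table)
    return chi2, p

# a nonmonotone pattern that a purely linear test could miss
rng = np.random.default_rng(4)
x = rng.uniform(size=400)
y = (x - 0.5) ** 2 + rng.normal(scale=0.02, size=400)
chi2, p = si_test(x, y)
```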
Statistical Independence Test
◮ CMN, CL, and SI Test Results on WAS PRES.10K output
CMN Results            CL Results             SI Results
Input      p-value     Input      p-value     Input      p-value
BHPERM     0.0000      BHPERM     0.0000      BHPERM     0.0000
HALCOMP    0.0000      HALCOMP    0.0000      HALCOMP    0.0000
HALPRM     0.0000      HALPRM     0.0000      HALPRM     0.0001
BPCOMP     0.0004      BPCOMP     0.0002      WGRCOR     0.0026
ANHPRM     0.0019      ANHPRM     0.0005      BPCOMP     0.0029
ANHCOMP    0.0212      ANHCOMP    0.0063      ANHPRM     0.0189
                       BPPRM      0.0403      WGRMICH    0.0331
                                              WRBRNSAT   0.0495
Outline
◮ Tests Based on Gridding
  ◮ Common Means (CMN)
  ◮ Common Locations (CL)
  ◮ Common Variances (CV)
  ◮ Chi-Square Test for Statistical Independence (SI)
◮ Flexible Grid-Free Tests
  ◮ Linear and Quadratic Regression
  ◮ Rank Correlation Coefficient Test
  ◮ Squared Rank Differences Test (SRD)
◮ Combining Statistical Tests
◮ Reliability of Results, TDCC
Linear & Quadratic Regression
◮ The regression (REG) test for nonrandomness in scatterplots is performed by fitting a simple linear regression of y on x_j.
◮ The p-value for the test is obtained from the test for nonzero slope.
◮ The quadratic regression (QREG) test performs a quadratic regression of y on x_j. That is, the multiple regression model

  y_i = β_0 + β_1 x_{i,j} + β_2 x_{i,j}² + ε_i   (13)

is fit to the data.
◮ The p-value is obtained by testing the full model for significance (i.e., testing the hypothesis β_1 = β_2 = 0 versus the alternative that either β_1 ≠ 0 or β_2 ≠ 0).
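A sketch of the QREG test follows (my own illustration; the name qreg_test and the simulated data are assumptions). It fits model (13) by least squares and forms the usual full-model F-statistic with 2 and n − 3 degrees of freedom.

```python
import numpy as np
from scipy import stats

def qreg_test(x, y):
    """QREG test sketch: fit y = b0 + b1*x + b2*x^2 and F-test the
    full model (H0: b1 = b2 = 0). Returns (F, p-value)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x, x ** 2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)        # error sum of squares
    sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
    k = 2                                    # slope parameters under test
    F = ((sst - sse) / k) / (sse / (n - k - 1))
    return F, stats.f.sf(F, k, n - k - 1)

# pure quadratic signal: a simple linear (REG) fit would see ~zero slope
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 200)
y = x ** 2 + rng.normal(scale=0.2, size=200)
F, p = qreg_test(x, y)
```

This mirrors the BHPERM behavior shown on the following slides, where the linear fit misses the relationship but the quadratic fit finds it.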
Linear & Quadratic Regression
◮ REG and QREG Test Results on WAS PRES.10K output
REG Results            QREG Results
Input      p-value     Input      p-value
HALCOMP    0.0000      BHPERM     0.0000
HALPRM     0.0000      HALCOMP    0.0000
BPCOMP     0.0000      HALPRM     0.0000
ANHPRM     0.0001      ANHPRM     0.0001
HALPOR     0.0031      BPCOMP     0.0001
BPMAP      0.0045      BPMAP      0.0044
BPPRM      0.0046      HALPOR     0.0121
ANHCOMP    0.0166      BPPRM      0.0164
Linear & Quadratic Regression
◮ Fitted linear regression line for WAS PRES.10K versus BHPERM (blue line), p-value = 0.34
◮ Fitted quadratic regression line for WAS PRES.10K versus BHPERM (red line), p-value = 0.00
Rank Correlation Coefficient Test
◮ The rank correlation coefficient (RCC) test is based on the rank (or Spearman) correlation coefficient

  r = ∑_{i=1}^{n} [r(x_{i,j}) − (n+1)/2][r(y_i) − (n+1)/2] / ( { ∑_{i=1}^{n} [r(x_{i,j}) − (n+1)/2]² }^{1/2} { ∑_{i=1}^{n} [r(y_i) − (n+1)/2]² }^{1/2} )   (14)

where r(x_{i,j}) and r(y_i) are the ranks associated with x_j and y for sample element i.
Rank Correlation Coefficient Test
◮ Under the null hypothesis of no monotonic relationship between x_j and y, the quantity

  t* = r √[(n − 2)/(1 − r²)]

is approximately distributed as a Student's t random variable with n − 2 degrees of freedom.
◮ Values of r which are large in magnitude (absolute value) indicate that the underlying rank correlation may be different from zero and that there is a relationship between x_j and y.
◮ Thus, the p-value to test the null hypothesis is given by

  p = P(|t_{n−2}| > |t*|) = 2 P(t_{n−2} > |t*|).   (15)
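A sketch of the RCC test, computing (14) directly and cross-checking against scipy's spearmanr (my own illustration; the name rcc_test and the simulated data are assumptions):

```python
import numpy as np
from scipy import stats

def rcc_test(x, y):
    """RCC test sketch: Spearman rank correlation with a t-based
    p-value. Returns (r, p-value)."""
    n = len(x)
    rx, ry = stats.rankdata(x), stats.rankdata(y)
    dx, dy = rx - (n + 1) / 2, ry - (n + 1) / 2
    r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    t = r * np.sqrt((n - 2) / (1 - r ** 2))   # ~ t_{n-2} under H0
    return r, 2 * stats.t.sf(abs(t), n - 2)

rng = np.random.default_rng(5)
x = rng.uniform(size=150)
y = np.log(1 + x) + rng.normal(scale=0.1, size=150)  # monotone + noise
r, p = rcc_test(x, y)
rho_ref = stats.spearmanr(x, y)[0]  # should match r exactly (no ties here)
```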
Squared Rank Differences Test
◮ The Squared Rank Differences (SRD) test is effective at identifying linear and very general nonlinear patterns in analysis results.
◮ The SRD test does not involve the specification of a grid or a parametric model between y and x.
◮ The SRD test is based on the statistic

  Q = ∑_{i=1}^{n−1} (r_{i+1,j} − r_{i,j})²   (16)

where r_{i,j}, i = 1, 2, ..., n, is the rank of y obtained with the sample element in which x_j has rank i.
Squared Rank Differences Test
◮ Under the null hypothesis of no relationship between x_j and y, the test statistic

  S* = [Q − n(n² − 1)/6] / (n^{5/2}/6)   (17)

approximately follows a standard normal distribution for n > 40. Small values of Q (and subsequently small values of S*) indicate similar ranks among y values with similar x_j values. This is inconsistent with the null hypothesis of independence between x_j and y. Thus, a p-value can be obtained as

  p = P(Z < S*)   (18)

where Z is a standard normal random variable.
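A sketch of the SRD test (my own illustration; the name srd_test and the simulated sine pattern are assumptions). Sorting the y ranks by the rank of x_j produces the sequence r_{i,j} appearing in (16).

```python
import numpy as np
from scipy import stats

def srd_test(x, y):
    """SRD test sketch: sum of squared successive rank differences of
    y, taken in order of increasing x; normal approximation (n > 40).
    Returns (S*, p-value)."""
    n = len(x)
    ry = stats.rankdata(y)[np.argsort(x)]   # r_{i,j}: y ranks ordered by x
    Q = np.sum(np.diff(ry) ** 2)
    S = (Q - n * (n ** 2 - 1) / 6) / (n ** 2.5 / 6)
    return S, stats.norm.cdf(S)             # small Q -> small p-value

# a smooth nonlinear pattern: neighboring x values get similar y ranks
rng = np.random.default_rng(6)
x = rng.uniform(size=100)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=100)
S, p = srd_test(x, y)
```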
Squared Rank Differences Test
◮ SRD can be combined with RCC to produce a hybrid SRD/RCC test which can perform better than either test alone.
◮ The SRD/RCC test is obtained by combining the p-values of the SRD and RCC tests; see Hora and Helton [2] for details.
RCC Results            SRD Results            SRD/RCC Results
Input      p-value     Input      p-value     Input      p-value
HALCOMP    0.0000      BHPERM     0.0000      BHPERM     0.0000
HALPRM     0.0000      SHBCEXP    0.0269      HALPRM     0.0000
BPCOMP     0.0001                             HALCOMP    0.0000
ANHPRM     0.0017                             BPCOMP     0.0001
HALPOR     0.0032                             ANHPRM     0.0011
BPPRM      0.0086                             HALPOR     0.0037
BPMAP      0.0086                             ANHCOMP    0.0082
ANHCOMP    0.0127                             BPMAP      0.0149
                                              BPPRM      0.0305
Outline
◮ Tests Based on Gridding
  ◮ Common Means (CMN)
  ◮ Common Locations (CL)
  ◮ Common Variances (CV)
  ◮ Chi-Square Test for Statistical Independence (SI)
◮ Flexible Grid-Free Tests
  ◮ Linear and Quadratic Regression
  ◮ Rank Correlation Coefficient Test
  ◮ Squared Rank Differences Test (SRD)
◮ Combining Statistical Tests
◮ Reliability of Results, TDCC
Top Down Coefficient of Concordance (TDCC)
◮ The Top-Down Coefficient of Concordance (TDCC) was introduced as a way to quantify the level of agreement between different sensitivity analysis (SA) procedures.
◮ Assume that there are K SA procedures used to rank the importance of the inputs x on the response y.
◮ For example, suppose we use the CMN, SI, and SRD/RCC tests and rank each of the input variables by their p-values, in ascending order.
◮ The TDCC measure emphasizes agreement between the multiple rankings assigned to important variables and de-emphasizes disagreement between rankings assigned to less important/unimportant variables.
Top Down Coefficient of Concordance (TDCC)
◮ The TDCC is based on the consideration of arrays of the form

           Proc1     Proc2     · · ·    ProcK
  x_1      r_{1,1}   r_{1,2}   · · ·    r_{1,K}
  x_2      r_{2,1}   r_{2,2}   · · ·    r_{2,K}
  ...      ...       ...                ...
  x_p      r_{p,1}   r_{p,2}   · · ·    r_{p,K}
                                                   (19)

where (i) x_1, x_2, ..., x_p are the input variables under consideration, (ii) Proc1, Proc2, ..., ProcK designate the K SA procedures, and (iii) r_{j,k} is the importance rank of the input variable x_j resulting from procedure k.
Top Down Coefficient of Concordance (TDCC)
◮ The TDCC measure uses corresponding Savage scores ss_{j,k}, where

  ss_{j,k} = ∑_{i=r_{j,k}}^{p} 1/i   (20)

(average Savage scores are assigned in the event of ties).
◮ Notice that the Savage scores are large (∼ 2) for the smallest ranks and very small (1/p) for the largest ranks.
◮ The TDCC is defined by

  C_T = [ ∑_{j=1}^{p} ( ∑_{k=1}^{K} ss_{j,k} )² − pK² ] / [ K² ( p − ∑_{j=1}^{p} 1/j ) ].   (21)

◮ The TDCC is a number between 0 and 1, where 1 indicates perfect agreement between the respective ranks of the SA procedures.
Top Down Coefficient of Concordance (TDCC)
◮ Under repeated random assignment of the input rankings for each column of the ranks array,

  T = K(p − 1) C_T   (22)

approximately follows a χ²_{p−1} distribution and thus provides the basis for a statistical test of agreement.
◮ Specifically, we can test the null hypothesis that the rankings from each SA method are generated at random with the p-value

  p = P(χ²_{p−1} > T).   (23)
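The Savage-score bookkeeping in (20)-(23) can be sketched as follows (my own illustration; the name tdcc and the tiny rank arrays are assumptions, and ties are not handled):

```python
import numpy as np
from scipy import stats

def tdcc(ranks):
    """TDCC sketch from a (p x K) array of importance ranks
    (1 = most important). Returns (C_T, chi-square test p-value)."""
    ranks = np.asarray(ranks)
    p, K = ranks.shape
    inv = 1.0 / np.arange(1, p + 1)          # 1, 1/2, ..., 1/p
    # Savage score for rank r is sum_{i=r}^{p} 1/i
    savage = np.cumsum(inv[::-1])[::-1]      # savage[r-1] = score for rank r
    ss = savage[ranks - 1]                   # (p x K) Savage scores
    CT = (np.sum(ss.sum(axis=1) ** 2) - p * K ** 2) / \
         (K ** 2 * (p - inv.sum()))
    T = K * (p - 1) * CT                     # ~ chi-square with p-1 df
    return CT, stats.chi2.sf(T, p - 1)

r1 = np.array([1, 2, 3, 4, 5])
CT1, p1 = tdcc(np.column_stack([r1, r1, r1]))    # perfect agreement
CT2, p2 = tdcc(np.column_stack([r1, r1[::-1]]))  # strong disagreement
```

Perfect agreement yields C_T = 1 exactly, and a reversed ranking drives C_T toward 0.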
Top Down Coefficient of Concordance (TDCC)
◮ TDCC can also be used to test for reproducibility of sensitivityresults obtained when using one particular SA procedure.
◮ To accomplish this, instead of using one sample of size n, we consider R replicated samples of size n. This is the case with the WIPP data example we have been using (i.e., 3 LHS samples of size n = 100, for a total sample of size 300).
◮ Suppose we apply one particular SA method (e.g., CMN) to each of the R samples. Each of the R results produces a ranking of the order of importance for the inputs.
◮ These R sets of ranks are then treated the same as the K setsof ranks from the various SA procedures to produce theTDCC value.
◮ A large TDCC indicates that the results from this particularSA procedure are “trustworthy” in the sense that they will notchange that much from one sample to another.
Summary & Conclusions
◮ Scatterplot tests can capture very general nonlinear relationships between x and y.
◮ Scatterplot test results can be ranked for variable importance based on p-value, but this is not all that satisfying. Ideally, a more quantitative measure of relative importance is desired (like a variance decomposition).
◮ Scatterplot tests cannot assess the strength of interactions.
◮ The p-values for these tests are based on distributional assumptions that in many cases do not hold. Permutation p-values can be used with any of these tests to improve the accuracy of the p-values.
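A generic permutation p-value can be wrapped around any of the statistics above: randomly re-pairing x and y enforces the null of no relationship while keeping both marginals fixed. The sketch below (my own illustration; the function names and data are assumptions) uses the absolute Spearman correlation as the statistic.

```python
import numpy as np
from scipy import stats

def permutation_pvalue(x, y, stat, n_perm=500, seed=0):
    """One-sided permutation p-value: fraction of random re-pairings
    whose statistic is at least as large as the observed one
    (with the standard +1 correction)."""
    rng = np.random.default_rng(seed)
    obs = stat(x, y)
    perm = np.array([stat(x, rng.permutation(y)) for _ in range(n_perm)])
    return (1 + np.sum(perm >= obs)) / (1 + n_perm)

# absolute Spearman correlation as the test statistic
abs_rho = lambda x, y: abs(stats.spearmanr(x, y)[0])
rng = np.random.default_rng(3)
x = rng.uniform(size=80)
y = x + rng.normal(scale=0.3, size=80)
p = permutation_pvalue(x, y, abs_rho)
```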
References
1. W.J. Conover. Practical Nonparametric Statistics, 2nd edition. New York, NY: John Wiley & Sons, 1980.
2. S.C. Hora and J.C. Helton. A distribution-free test for the relationship between model input and output when using Latin hypercube sampling. Reliability Engineering and System Safety, 79:333–339, 2003.
3. R.L. Iman and W.J. Conover. A measure of top-down correlation. Technometrics, 29:351–357, 1987.
4. J.C. Helton, J.D. Johnson, C.J. Sallaberry, and C.B. Storlie. Survey of sampling-based methods for uncertainty and sensitivity analysis. Reliability Engineering and System Safety, 91:1175–1209, 2006.