6 BASIC STATISTICAL TOOLS

"There are lies, damn lies, and statistics..." (Anon.)

6.1 Introduction
6.2 Definitions
6.3 Basic Statistics
6.4 Statistical tests

6.1 Introduction

In the preceding chapters, basic elements for the proper execution of analytical work, such as personnel, laboratory facilities, equipment, and reagents, were discussed. Before embarking upon the actual analytical work, however, one more tool for the quality assurance of the work must be dealt with: the statistical operations necessary to control and verify the analytical procedures (Chapter 7) as well as the resulting data (Chapter 8).

It was stated before that making mistakes in analytical work is unavoidable. This is why a complex system of precautions to prevent errors, and traps to detect them, has to be set up. An important aspect of quality control is the detection of both random and systematic errors. This can be done by looking critically at the performance of the analysis as a whole, and also at the instruments and operators involved in the job. For the detection itself, as well as for the quantification of the errors, statistical treatment of data is indispensable.

A multitude of different statistical tools is available, some of them simple, some complicated, and often very specific to certain purposes. In analytical work, the most important common operation is the comparison of data, or sets of data, to quantify accuracy (bias) and precision. Fortunately, with a few simple and convenient statistical tools, most of the information needed in regular laboratory work can be obtained: the "t-test", the "F-test", and regression analysis. Examples of these will therefore be given in the ensuing pages.

Clearly, statistics are a tool, not an aim. Simple inspection of data, without statistical treatment, by an experienced and dedicated analyst may be just as useful as statistical figures on the desk of the disinterested. The value of statistics lies in organizing and simplifying data, to permit some objective estimate showing that an analysis is under control or that a change has occurred. Equally important is that the results of these statistical procedures are recorded and can be retrieved.

6.2 Definitions

6.2.1 Error
6.2.2 Accuracy
6.2.3 Precision
6.2.4 Bias

Discussing Quality Control implies the use of several terms and concepts with a specific (and sometimes confusing) meaning. Therefore, some of the most important concepts will be defined first.

6.2.1 Error

Error is the collective noun for any departure of the result from the "true" value*. Analytical errors can be:

1. Random or unpredictable deviations between replicates, quantified with the "standard deviation".
2. Systematic or predictable regular deviations from the "true" value, quantified as "mean difference" (i.e. the difference between the true value and the mean of replicate determinations).
3. Constant, unrelated to the concentration of the substance analyzed (the analyte).
4. Proportional, i.e. related to the concentration of the analyte.

* The "true" value of an attribute is by nature indeterminate and often has only a very relative meaning. Particularly in soil science, for several attributes there is no such thing as the true value, as any value obtained is method-dependent (e.g. cation exchange capacity). Obviously, this does not mean that no adequate analysis serving a purpose is possible. It does, however, emphasize the need for the establishment of standard reference methods and the importance of external QC (see Chapter 9).

6.2.2 Accuracy

The "trueness" or the closeness of the analytical result to the "true" value. It is constituted by a combination of random and systematic errors (precision and bias) and cannot be quantified directly. The test result may be a mean of several values. An accurate determination produces a "true" quantitative value, i.e. it is precise and free of bias.

6.2.3 Precision

The closeness with which results of replicate analyses of a sample agree. It is a measure of dispersion or scattering around the mean value and is usually expressed in terms of standard deviation, standard error, or a range (difference between the highest and the lowest result).

6.2.4 Bias

The consistent deviation of analytical results from the "true" value caused by systematic errors in a procedure. Bias is the opposite, but most used, measure of "trueness", which is the agreement of the mean of analytical results with the true value, i.e. excluding the contribution of randomness represented in precision. There are several components contributing to bias:

1. Method bias: the difference between the (mean) test result obtained from a number of laboratories using the same method and an accepted reference value. The method bias may depend on the analyte level.

2. Laboratory bias: the difference between the (mean) test result from a particular laboratory and the accepted reference value.

3. Sample bias: the difference between the mean of replicate test results of a sample and the ("true") value of the target population from which the sample was taken. In practice, for a laboratory this refers mainly to sample preparation, subsampling, and weighing techniques. Whether a sample is representative of the population in the field is an extremely important aspect but usually falls outside the responsibility of the laboratory (in some cases laboratories have their own field sampling personnel).

The relationship between these concepts can be expressed in the following equation:

[Figure: an equation expressing the total error of a result as the combination of systematic error (bias) and random error (precision); not reproduced here.]

The types of errors are illustrated in Fig. 6-1.

Fig. 6-1. Accuracy and precision in laboratory measurements. (Note that the qualifications apply to the mean of results: in c the mean is accurate but some individual results are inaccurate.)

6.3 Basic Statistics

6.3.1 Mean
6.3.2 Standard deviation
6.3.3 Relative standard deviation. Coefficient of variation
6.3.4 Confidence limits of a measurement
6.3.5 Propagation of errors

In the discussions of Chapters 7 and 8, basic statistical treatment of data will be considered. Therefore, some understanding of these statistics is essential, and they will briefly be discussed here.

The basic assumption to be made is that a set of data, obtained by repeated analysis of the same analyte in the same sample under the same conditions, has a normal or Gaussian distribution. (When the distribution is skewed, statistical treatment is more complicated.) The primary parameters used are the mean (or average) and the standard deviation (see Fig. 6-2), and the main tools are the F-test, the t-test, and regression and correlation analysis.

Fig. 6-2. A Gaussian or normal distribution. The figure shows that (approx.) 68% of the data fall in the range x̄ ± s, 95% in the range x̄ ± 2s, and 99.7% in the range x̄ ± 3s.

6.3.1 Mean

The average of a set of n data xi:

x̄ = (Σ xi) / n    (6.1)

6.3.2 Standard deviation

This is the most commonly used measure of the spread or dispersion of data around the mean. The standard deviation is defined as the square root of the variance (V). The variance is defined as the sum of the squared deviations from the mean, divided by n-1. Operationally, there are several ways of calculation:

s = √[ Σ(xi - x̄)² / (n-1) ]    (6.2)

or

s = √[ (Σxi² - (Σxi)²/n) / (n-1) ]    (6.3)

or

s = √[ (Σxi² - n·x̄²) / (n-1) ]    (6.4)

The calculation of the mean and the standard deviation can easily be done on a calculator, but most conveniently on a PC with computer programs such as dBASE, Lotus 123, Quattro-Pro, Excel, and others, which have simple ready-to-use functions. (Warning: some programs use n rather than n-1!)

6.3.3 Relative standard deviation. Coefficient of variation

Although the standard deviation of analytical data may not vary much over limited ranges of such data, it usually depends on the magnitude of such data: the larger the figures, the larger s. Therefore, for comparison of variations (e.g. precision) it is often more convenient to use the relative standard deviation (RSD) than the standard deviation itself. The RSD is expressed as a fraction, but more usually as a percentage, and is then called the coefficient of variation (CV). Often, however, these terms are confused.

RSD = s / x̄    (6.5)

CV = (s / x̄) × 100%    (6.6)
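As an illustration of the warning about n versus n-1, a minimal Python sketch (the replicate values are made up for demonstration):

```python
import statistics

# Hypothetical replicate results for one sample (made-up values)
data = [10.2, 10.5, 9.8, 10.1, 10.4]

mean = statistics.fmean(data)   # Eq. (6.1): sum(x) / n
s = statistics.stdev(data)      # Eq. (6.2): divides by n-1 (sample s)
s_pop = statistics.pstdev(data) # divides by n -- the "warning" case
cv = 100 * s / mean             # Eq. (6.6): coefficient of variation, in %

print(mean, s, s_pop, cv)
```

For any n > 1 the n-denominator value is always smaller than the proper n-1 value, so using the wrong function understates the spread.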

Note. When needed (e.g. for the F-test, see Eq. 6.11) the variance can, of course, be calculated by squaring the standard deviation:

V = s²    (6.7)

6.3.4 Confidence limits of a measurement

The more an analysis or measurement is replicated, the closer the mean x̄ of the results will approach the "true" value μ of the analyte content (assuming absence of bias). A single analysis of a test sample can be regarded as literally sampling the imaginary set of a multitude of results obtained for that test sample. The uncertainty of such subsampling is expressed by

μ = x̄ ± t·s/√n    (6.8)

where
μ = "true" value (mean of large set of replicates)
x̄ = mean of subsamples
t = a statistical value which depends on the number of data and the required confidence (usually 95%)
s = standard deviation of the set of subsamples
n = number of subsamples

(The term s/√n is also known as the standard error of the mean.) The critical values for t are tabulated in Appendix 1 (they are, therefore, here referred to as ttab). To find the applicable value, the number of degrees of freedom has to be established by: df = n-1 (see also Section 6.4.2).

Example

For the determination of the clay content in the particle-size analysis, a semi-automatic pipette installation is used with a 20 mL pipette. This volume is approximate and the operation involves the opening and closing of taps. Therefore, the pipette has to be calibrated, i.e. both the accuracy (trueness) and precision have to be established. A tenfold measurement of the volume yielded the following set of data (in mL):

19.941  19.812  19.829  19.828  19.742

19.797  19.937  19.847  19.885  19.804
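These calibration statistics can be reproduced with a short script (the value ttab = 2.26 is taken from Appendix 1 for df = 9, as in the text):

```python
import math
import statistics

# Tenfold calibration of the 20 mL pipette (data from the example, in mL)
volumes = [19.941, 19.812, 19.829, 19.828, 19.742,
           19.797, 19.937, 19.847, 19.885, 19.804]

n = len(volumes)
mean = statistics.fmean(volumes)       # mean volume
s = statistics.stdev(volumes)          # standard deviation, n-1 denominator
t_tab = 2.26                           # two-sided, 95%, df = 9 (Appendix 1)

half_width = t_tab * s / math.sqrt(n)  # Eq. (6.8)
print(f"volume = {mean:.2f} ± {half_width:.2f} mL")  # → volume = 19.84 ± 0.04 mL
```

Since 20 mL lies outside 19.84 ± 0.04 mL, the script confirms the systematic deviation noted below.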

The mean is 19.842 mL and the standard deviation 0.0627 mL. According to Appendix 1, for n = 10, ttab = 2.26 (df = 9), and using Eq. (6.8) this calibration yields:

pipette volume = 19.842 ± 2.26 × (0.0627/√10) = 19.84 ± 0.04 mL

(Note that the pipette has a systematic deviation from 20 mL, as this value lies outside the found confidence interval. See also bias.)

In routine analytical work, results are usually single values obtained in batches of several test samples. No laboratory will analyze a test sample 50 times to be confident that the result is reliable. Therefore, the statistical parameters have to be obtained in another way. Most usually this is done by method validation (see Chapter 7) and/or by keeping control charts, which is basically the collection of analytical results from one or more control samples in each batch (see Chapter 8). Equation (6.8) is then reduced to

μ = x ± t·s    (6.9)

where
μ = "true" value
x = single measurement
t = applicable ttab (Appendix 1)
s = standard deviation of set of previous measurements

In Appendix 1 it can be seen that if the set of replicated measurements is large (say > 30), t is close to 2. Therefore, the (95%) confidence of the result x of a single test sample (n = 1 in Eq. 6.8) is approximated by the commonly used and well-known expression

μ = x ± 2s    (6.10)

where s is the previously determined standard deviation of the large set of replicates (see also Fig. 6-2).

Note: This "method-s", or s of a control sample, is not a constant and may vary for different test materials, analyte levels, and analytical conditions.

Running duplicates will, according to Equation (6.8), increase the confidence of the (mean) result by a factor √2:

μ = x̄ ± t·s/√2

where
x̄ = mean of duplicates
s = known standard deviation of the large set

Similarly, triplicate analysis will increase the confidence by a factor √3, etc. Duplicates are further discussed in Section 8.3.3.

Thus, in summary, Equation (6.8) can be applied in various ways to determine the size of errors (confidence) in analytical work or measurements: single determinations in routine work, determinations for which no previous data exist, certain calibrations, etc.

6.3.5 Propagation of errors

6.3.5.1 Propagation of random errors
6.3.5.2 Propagation of systematic errors

The final result of an analysis is often calculated from several measurements performed during the procedure (weighing, calibration, dilution, titration, instrument readings, moisture correction, etc.). As was indicated in Section 6.2, the total error in an analytical result is an adding-up of the sub-errors made in the various steps. For daily practice, the bias and precision of the whole method are usually the most relevant parameters (obtained from validation, Chapter 7, or from control charts, Chapter 8). However, it is sometimes useful to get an insight into the contributions of the subprocedures (which then have to be determined separately), for instance if one wants to change (part of) the method. Because the "adding-up" of errors is usually not a simple summation, this will be discussed here. The main distinction to be made is between random errors (precision) and systematic errors (bias).

6.3.5.1 Propagation of random errors

In estimating the total random error from factors in a final calculation, the treatment of summation or subtraction of factors is different from that of multiplication or division.

1. Summation calculations

If the final result x is obtained from the sum (or difference) of (sub)measurements a, b, c, etc.:

x = a + b + c + ...

then the total precision is expressed by the standard deviation obtained by taking the square root of the sum of the individual variances (squares of standard deviations):

sx = √(sa² + sb² + sc² + ...)

If a (sub)measurement has a constant multiplication factor or coefficient (such as an extra dilution), then this is included when calculating the effect of the variance concerned, e.g. (2sb)².

Example

The Effective Cation Exchange Capacity of soils (ECEC) is obtained by summation of the exchangeable cations:

ECEC = Exch. (Ca + Mg + Na + K + H + Al)

Standard deviations experimentally obtained for exchangeable Ca, Mg, Na, K and (H + Al) on a certain sample, e.g. a control sample, are 0.30, 0.25, 0.15, 0.15, and 0.60 cmolc/kg respectively. The total precision is:

s = √(0.30² + 0.25² + 0.15² + 0.15² + 0.60²) = 0.75 cmolc/kg
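The summation rule can be checked numerically for this example:

```python
import math

# Standard deviations of the exchangeable cations (cmolc/kg), from the example:
# Ca, Mg, Na, K, (H + Al)
s_cations = [0.30, 0.25, 0.15, 0.15, 0.60]

# Random errors of summed terms add in quadrature:
# s_total = sqrt(sum of the individual variances)
s_total = math.sqrt(sum(s**2 for s in s_cations))
print(round(s_total, 2))  # → 0.75
```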

It can be seen that the total standard deviation is larger than the highest individual standard deviation, but (much) less than their sum. It is also clear that if one wants to reduce the total standard deviation, the best result can be expected from reducing the largest individual contribution, in this case the exchangeable acidity.

2. Multiplication calculations

If the final result x is obtained from multiplication (or division) of (sub)measurements according to

x = a × b / c ...

then the total error is expressed by the relative standard deviation obtained by taking the square root of the sum of the squared individual relative standard deviations (RSD or CV, as a fraction or as a percentage, see Eqs. 6.5 and 6.6):

RSDx = √(RSDa² + RSDb² + RSDc² + ...)

If a (sub)measurement has a constant multiplication factor or coefficient, then this is included when calculating the effect of the RSD concerned, e.g. (2RSDb)².

Example

The calculation of Kjeldahl-nitrogen may be as follows:

%N = (a - b) × M × 1.4 × mcf / s

where
a = mL HCl required for titration of the sample
b = mL HCl required for titration of the blank
s = air-dry sample weight in gram
M = molarity of HCl
1.4 = 14 × 10⁻³ × 100% (14 = atomic weight of N)
mcf = moisture correction factor

Note that in addition to multiplications, this calculation also contains a subtraction (often, calculations contain both summations and multiplications).

Firstly, the standard deviation of the titration (a - b) is determined as indicated under Summation calculations above. This is then transformed to RSD using Equation (6.5) or (6.6). Then the RSDs of the other individual parameters have to be determined experimentally. The RSDs found are, for instance: distillation 0.8%, titration 0.5%, molarity 0.2%, sample weight 0.2%, mcf 0.2%. The total calculated precision is:

RSD = √(0.8² + 0.5² + 0.2² + 0.2² + 0.2²) = 1.0%
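The multiplication rule can likewise be checked for this example:

```python
import math

# RSDs (%) of the subprocedures, from the example:
# distillation, titration (a - b), molarity, sample weight, mcf
rsds = [0.8, 0.5, 0.2, 0.2, 0.2]

# Relative errors of multiplicative factors add in quadrature
rsd_total = math.sqrt(sum(r**2 for r in rsds))
print(round(rsd_total, 1))  # → 1.0
```

The largest term (distillation, 0.8%) contributes 0.64 of the 1.01 total sum of squares, which is why it dominates the result.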

Here again, the highest RSD (that of the distillation) dominates the total precision. In practice, the precision of the Kjeldahl method is usually considerably worse (2.5%), probably mainly as a result of the heterogeneity of the sample. The present example does not take that into account. It would imply that 2.5% - 1.0% = 1.5%, or 3/5 of the total random error, is due to sample heterogeneity (or some other overlooked cause). This implies that painstaking efforts to improve subprocedures such as the titration or the preparation of standard solutions may not be very rewarding. It would, however, pay to improve the homogeneity of the sample, e.g. by careful grinding and mixing in the preparatory stage.

Note. Sample heterogeneity is also represented in the moisture correction factor. However, the influence of this factor on the final result is usually very small.

6.3.5.2 Propagation of systematic errors

Systematic errors of (sub)measurements contribute directly to the total bias of the result, since the individual parameters in the calculation of the final result each carry their own bias. For instance, a systematic error in a balance will cause a systematic error in the sample weight (as well as in the moisture determination). Note that some systematic errors may cancel out; e.g. weighings by difference may not be affected by a biased balance. The only way to detect or avoid systematic errors is by comparison (calibration) with independent standards and outside reference or control samples.

6.4 Statistical tests

6.4.1 Two-sided vs. one-sided test
6.4.2 F-test for precision
6.4.3 t-Tests for bias
6.4.4 Linear correlation and regression
6.4.5 Analysis of variance (ANOVA)

In analytical work, a frequently recurring operation is the verification of performance by comparison of data. Some examples of comparisons in practice are:

- performance of two instruments,
- performance of two methods,
- performance of a procedure in different periods,
- performance of two analysts or laboratories,
- results obtained for a reference or control sample with the "true", "target" or "assigned" value of this sample.

Some of the most common and convenient statistical tools to quantify such comparisons are the F-test, the t-tests, and regression analysis. Because the F-test and the t-tests are the most basic tests, they will be discussed first. These tests examine whether two sets of normally distributed data are similar or dissimilar (belong or do not belong to the same "population") by comparing their standard deviations and means respectively. This is illustrated in Fig. 6-3.

Fig. 6-3. Three possible cases when comparing two sets of data (n1 = n2). A. Different mean (bias), same precision; B. Same mean (no bias), different precision; C. Both mean and precision are different. (The fourth case, identical sets, has not been drawn.)

6.4.1 Two-sided vs. one-sided test

These tests for comparison, for instance between methods A and B, are based on the assumption that there is no significant difference (the "null hypothesis"). In other words, when the difference is so small that a tabulated critical value of F or t is not exceeded, we can be confident (usually at the 95% level) that A and B are not different. Two fundamentally different questions can be asked concerning both the comparison of the standard deviations s1 and s2 with the F-test, and of the means x̄1 and x̄2 with the t-test:

1. Are A and B different? (two-sided test)
2. Is A higher (or lower) than B? (one-sided test)

This distinction has an important practical implication, as statistically the probabilities for the two situations are different: the chance that A and B are only different ("it can go two ways") is twice as large as the chance that A is higher (or lower) than B ("it can go only one way"). The most common case is the two-sided (also called two-tailed) test: there are no particular reasons to expect that the means or the standard deviations of two data sets are different. An example is the routine comparison of a control chart with the previous one (see 8.3). However, when it is expected or suspected that the mean and/or the standard deviation will go only one way, e.g. after a change in an analytical procedure, the one-sided (or one-tailed) test is appropriate. In this case the probability that it goes the other way than expected is assumed to be zero and, therefore, the probability that it goes the expected way is doubled. Or, more correctly, the uncertainty of 5% in the two-way test (i.e. the probability of 5% that the critical value is exceeded) is divided over the two tails of the Gaussian curve (see Fig. 6-2), i.e. 2.5% at the end of each tail beyond ±2s. If we perform the one-sided test with 5% uncertainty, we actually increase this 2.5% to 5% at the end of one tail. (Note that for the whole Gaussian curve, which is symmetrical, this is then equivalent to an uncertainty of 10% in two ways!)

This difference in probability in the tests is expressed in the use of two tables of critical values for both F and t. In fact, the one-sided table at the 95% confidence level is equivalent to the two-sided table at the 90% confidence level.

It is emphasized that the one-sided test is only appropriate when a difference in one direction is expected or aimed at. Of course, it is tempting to perform this test after the results show a clear (unexpected) effect. In fact, however, a two times higher probability level is then used in retrospect. This is underscored by the observation that in this way even contradictory conclusions may arise: if in an experiment calculated values of F and t are found within the range between the two-sided and one-sided values of Ftab and ttab, the two-sided test indicates no significant difference, whereas the one-sided test says that the result of A is significantly higher (or lower) than that of B. What actually happens is that in the first case the 2.5% boundary in the tail was just not exceeded, and then, subsequently, this 2.5% boundary is relaxed to 5%, which is then obviously more easily exceeded. This illustrates that statistical tests differ in strictness, and that for proper interpretation of results in reports, the statistical techniques used, including the confidence limits or probability, should always be specified.

6.4.2 F-test for precision

Because the result of the F-test may be needed to choose between the Student's t-test and the Cochran variant (see next section), the F-test is discussed first. The F-test (or Fisher's test) is a comparison of the spread of two sets of data, to test if the sets belong to the same population; in other words, if the precisions are similar or dissimilar. The test makes use of the ratio of the two variances:

F = s1² / s2²    (6.11)

where, by convention, the larger s² is the numerator. If the performances are not very different, then the estimates s1 and s2 do not differ much and their ratio (and that of their squares) should not deviate much from unity. In practice, the calculated F is compared with the applicable F value in the F-table (also called the critical value, see Appendix 2). To read the table, it is necessary to know the applicable numbers of degrees of freedom for s1 and s2. These are calculated by:

df1 = n1 - 1
df2 = n2 - 1

If Fcal ≤ Ftab one can conclude with 95% confidence that there is no significant difference in precision (the "null hypothesis" that s1 = s2 is accepted). Thus, there is still a 5% chance that we draw the wrong conclusion. In certain cases more confidence may be needed; then a 99% confidence table can be used, which can be found in statistical textbooks.

Example 1 (two-sided test)

Table 6-1 gives the data sets obtained by two analysts for the cation exchange capacity (CEC) of a control sample. Using Equation (6.11) the calculated F value is 1.62. As we had no particular reason to expect that the analysts would perform differently, we use the F-table for the two-sided test and find Ftab = 4.03 (Appendix 2, df1 = df2 = 9). This exceeds the calculated value and the null hypothesis (no difference) is accepted. It can be concluded with 95% confidence that there is no significant difference in precision between the work of Analyst 1 and 2.

Table 6-1. CEC values (in cmolc/kg) of a control sample determined by two analysts.

Analyst 1   Analyst 2
10.2        9.7
10.7        9.0
10.5        10.2
9.9         10.3
9.0         10.8
11.2        11.1
11.5        9.4
10.9        9.2
8.9         9.8
10.6        10.2

x̄:   10.34    9.97
s:   0.819    0.644
n:   10       10

Fcal = 1.62    tcal = 1.12
Ftab = 4.03    ttab = 2.10
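As a check, the F value of Table 6-1 can be recomputed from the raw data. (With the n-1 definition of s, Eq. 6.2, the individual standard deviations come out at about 0.86 and 0.68, suggesting the tabulated 0.819 and 0.644 were computed with n in the denominator; the ratio F is unaffected, since both variances scale by the same factor.)

```python
import statistics

# CEC data from Table 6-1 (cmolc/kg)
analyst1 = [10.2, 10.7, 10.5, 9.9, 9.0, 11.2, 11.5, 10.9, 8.9, 10.6]
analyst2 = [9.7, 9.0, 10.2, 10.3, 10.8, 11.1, 9.4, 9.2, 9.8, 10.2]

v1 = statistics.variance(analyst1)  # n-1 denominator, Eq. (6.7)
v2 = statistics.variance(analyst2)

# Eq. (6.11): larger variance in the numerator by convention
F = max(v1, v2) / min(v1, v2)
print(round(F, 2))  # → 1.62

F_tab = 4.03  # two-sided, 95%, df1 = df2 = 9 (Appendix 2)
print("no significant difference" if F <= F_tab else "significant difference")
```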

Example 2 (one-sided test)

The determination of the calcium carbonate content with the Scheibler standard method is compared with the simple and more rapid "acid-neutralization" method, using one and the same sample. The results are given in Table 6-2. Because of the nature of the rapid method, we suspect it to produce a lower precision than obtained with the Scheibler method, and we can therefore perform the one-sided F-test. The applicable Ftab = 3.07 (App. 2, df1 = 12, df2 = 9), which is lower than Fcal (= 18.3), and the null hypothesis (no difference) is rejected. It can be concluded (with 95% confidence) that for this one sample the precision of the rapid titration method is significantly worse than that of the Scheibler method.

Table 6-2. Contents of CaCO3 (in mass/mass %) in a soil sample determined with the Scheibler method (A) and the rapid titration method (B).

A     B
2.5   1.7
2.4   1.9
2.5   2.3
2.6   2.3
2.5   2.8
2.5   2.5
2.4   1.6
2.6   1.9
2.7   2.6
2.4   1.7
-     2.4
-     2.2
-     2.6

x̄:   2.51    2.13
s:   0.099   0.424
n:   10      13

Fcal = 18.3    tcal = 3.12
Ftab = 3.07    ttab* = 2.18

(ttab* = Cochran's "alternative" ttab)

6.4.3 t-Tests for bias

6.4.3.1 Student's t-test
6.4.3.2 Cochran's t-test
6.4.3.3 t-Test for large data sets (n ≥ 30)
6.4.3.4 Paired t-test

Depending on the nature of two sets of data (n, s, sampling nature), the means of the sets can be compared for bias by several variants of the t-test. The following most common types will be discussed:

1. Student's t-test for comparison of two independent sets of data with very similar standard deviations;
2. the Cochran variant of the t-test when the standard deviations of the independent sets differ significantly;
3. the paired t-test for comparison of strongly dependent sets of data.

Basically, for the t-tests Equation (6.8) is used, but written in a different way:

t = |x̄ - μ| × √n / s    (6.12)

where
x̄ = mean of test results of a sample
μ = "true" or reference value
s = standard deviation of test results
n = number of test results of the sample

To compare the mean of a data set with a reference value, normally the "two-sided t-table of critical values" is used (Appendix 1). The applicable number of degrees of freedom here is: df = n - 1.

If a value for t calculated with Equation (6.12) does not exceed the critical value in the table, the data are taken to belong to the same population: there is no difference, and the "null hypothesis" is accepted (with the applicable probability, usually 95%). As with the F-test, when it is expected or suspected that the obtained results are higher or lower than the reference value, the one-sided t-test can be performed: if tcal > ttab, then the results are significantly higher (or lower) than the reference value.

More commonly, however, the "true" value of proper reference samples is accompanied by the associated standard deviation and number of replicates used to determine these parameters. We can then apply the more general case of comparing the means of two data sets: the "true" value in Equation (6.12) is then replaced by the mean of a second data set. As is shown in Fig. 6-3, to test if two data sets belong to the same population, it is tested whether the two Gauss curves sufficiently overlap; in other words, whether the difference between the means x̄1 - x̄2 is small. This is discussed next.

Similarity or non-similarity of standard deviations

When using the t-test for two small sets of data (n1 and/or n2