cdb 3093 data handling, statistic and errors
-
8/15/2019 CDB 3093 Data Handling, Statistic and Errors
Data Handling, Statistic and Errors
Dr Asna M. Zain, RSci AMIChemE
CDB3093 Analytical Chemistry
-
Outline
Sample handling and management
QC and QA
Errors in analysis
Statistical analysis parameters
Descriptive statistics
Inferential statistics
Example questions
-
Nature and Scope
Subject: Chemical analysis
Analytical problem: Solve using chemical or physico-chemical processes as underlying principles of the technique
Method: Based on purpose and intended quality
Validate: Reliability in accuracy, reproducibility
Procedures: Set of instructions
-
Techniques and method of analysis
Techniques
Atomic & molecular spectrometry (AAS, FTIR)
Gravimetry
Mass spectrometry
Chromatography (HPLC, GC)
Thermal
Electrochemical
Radiochemical
-
Validation method
Performance characteristic of detector for single-analyte calibration standards
Process repeated for mixed-analyte calibration standards
Process repeated for analyte calibration standard with possible interfering substances and for reagent blank
Process repeated for analyte calibration standard with anticipated matrix component to evaluate matrix interference
Analysis of spiked simulated matrix (matrix with added known amount of analyte) to test recoveries
Field trials in routine lab with more junior personnel to test ruggedness
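The recovery test in the spiked-matrix step can be sketched as a short calculation. This is a minimal illustration, assuming the usual definition of percent recovery (amount found from the spike divided by amount added); the function name and the numbers are hypothetical and not taken from the slides.

```python
def percent_recovery(spiked_result: float, unspiked_result: float, amount_added: float) -> float:
    """Percent of the added analyte that the method actually finds."""
    return (spiked_result - unspiked_result) / amount_added * 100.0


# Hypothetical values (same concentration units throughout):
# matrix alone reads 5.0, matrix + 10.0 added reads 14.6.
recovery = percent_recovery(spiked_result=14.6, unspiked_result=5.0, amount_added=10.0)
print(f"Recovery = {recovery:.1f}%")
```

A recovery well below or above 100% would point to a matrix effect or a loss of analyte during sample preparation.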
-
Sampling and sample handling
The sample must reflect the real composition of the material
Because the time elapsed between sample collection and analysis varies, proper storage is required to prevent loss of analyte
Preservatives maintain the sample condition for storage or for analysis
Pre-treatment prior to analysis, such as extraction, grinding, concentration or dissolution
Analysis
-
Representative sample
Coning and quartering: solids
Grab sample or composite of grab samples: water/liquids
Random pick
-
Quality control and quality assurance
QC - ensures that the operational techniques and activities in the analytical lab provide results suitable for the intended purpose
Meets specific requirements in the context of a defined problem, e.g. accuracy and precision, calibration
Confidence in validity
Cost effective
QA - managerial component/responsibility of an analytical lab, ensuring that all QC procedures are in place.
Builds confidence through lab participation in inter-lab studies.
Proficiency tests of the performance of the lab or the analyst.
Method performance and certification studies are undertaken
-
Errors in analytical measurement
Measurement error: use statistical methods to assess the error, and minimize it by careful experimental design and control
Absolute and relative error
Absolute error: Ea = Xm – Xt
Relative error: Er = (Xm – Xt)/Xt
Determinate errors
Systematic errors lead to bias in the measured value. They arise from the analyst, equipment or procedure, and are addressed by record keeping, training or equipment maintenance.
Indeterminate errors
Random errors arise from random fluctuations in measured quantities and occur even in a closely controlled environment
Minimize by careful experimental design and control of environmental factors
Accumulated error
The aggregate of the errors in every measurement made in an analytical procedure, which contributes to the final calculated result.
-
Determinate and indeterminate error
Determinate error
Instrumental errors include instrument faults, uncalibrated weights and uncalibrated glassware
Operative errors: due to lack of skill and training
Errors in methods: arising from coprecipitation, slight solubility, side reactions, incomplete reactions and impurities in reagents
Indeterminate error
Accidental or random error
Use probability or statistics to draw conclusions about the error
Indeterminate errors should follow the normal distribution or Gaussian curve
σ represents the standard deviation of an infinite population and measures the precision by the spread of the normal population distribution, as in Fig. 3.2
-
Gaussian distribution
Random errors follow a Gaussian or normal distribution.
We are 95% certain that the true value falls within 2σ (infinite population), IF there is no systematic error.
Fig. 3.2 Normal error curve. ©Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)
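The 95% figure for 2σ can be checked numerically. The sketch below assumes only Python's standard library and evaluates the area under the normal curve through the error function; the function name `coverage_within` is illustrative and does not come from the slides.

```python
import math


def coverage_within(k: float) -> float:
    """Fraction of a normal population lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


# Print the coverage for 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage_within(k) * 100:.2f}%")
```

For k = 2 the coverage comes out a little above 95%, which is why 2σ is commonly quoted as the 95% range.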
-
Ways to express accuracy - absolute error and relative error
Absolute error
Difference between the true value and the measured value
If the true value is 2.62 g and the measured value is 2.52 g, the absolute error, Ea, is -0.10 g
If Xm is the average of several measurements, the value is called the mean error.
Relative error
The absolute or mean error expressed as a percentage of the true value is the relative error
Based on the same measurement, the relative error, Er, is (-0.10/2.62) x 100% = -3.8%
The relative accuracy is the measured value or mean expressed as a percentage of the true value, (2.52/2.62) x 100% = 96.2%
-
Example 3.6
The result of an analysis is 36.97 g, compared with the accepted value of 37.06 g.
What is the relative error in parts per thousand, ppt?
Absolute error = 36.97 g – 37.06 g = -0.09 g
Relative error = (-0.09/37.06) x 1000 ppt
= -2.4 ppt
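The arithmetic of Example 3.6 can be reproduced in a few lines. This is a minimal sketch using the definitions Ea = Xm – Xt and Er = (Xm – Xt)/Xt given earlier; the function names and the `scale` parameter are illustrative choices, not part of the source.

```python
def absolute_error(measured: float, true: float) -> float:
    """Ea = Xm - Xt, in the units of the measurement."""
    return measured - true


def relative_error(measured: float, true: float, scale: float = 100.0) -> float:
    """Er = (Xm - Xt) / Xt, scaled to percent (100) or parts per thousand (1000)."""
    return (measured - true) / true * scale


# Values from Example 3.6
ea = absolute_error(36.97, 37.06)
er_ppt = relative_error(36.97, 37.06, scale=1000.0)
print(f"Ea = {ea:.2f} g, Er = {er_ppt:.1f} ppt")
```

Calling `relative_error(2.52, 2.62)` with the default scale gives the percentage form used in the 2.62 g example.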
-
Statistical analysis
Uses a statistical model
Data are assumed to follow a normal (Gaussian) distribution
Average or normalize the data if the data set is small, so that the Gaussian distribution can be applied
A batch may contain one or more samples that differ for various reasons, e.g. parameters, holding time
-
Accuracy and precision
You can’t have accuracy without good precision.
But a precise result can have a determinate or systematic error.
©Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)
Fig. 3.1. Accuracy and precision.
-
R chart and X chart
Use control charts to present or evaluate a batch of QC samples.
The R chart presents precision. It records the property of interest in a running sequence and shows the centerline or average, the standard deviation, and the warning or control limits.
The X chart requires results from a sample of known composition and is used to evaluate accuracy.
Warning limits are set at 2 standard deviations and control limits at 3 standard deviations.
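The warning and control limits can be computed directly from a run of QC results. The following is a rough sketch, assuming the centerline is the mean of past QC results and the limits sit at 2 and 3 sample standard deviations; the data and the function names are hypothetical.

```python
import statistics


def control_limits(qc_results):
    """Centerline plus 2s warning limits and 3s control limits from past QC results."""
    center = statistics.mean(qc_results)
    s = statistics.stdev(qc_results)
    return {
        "center": center,
        "warning": (center - 2 * s, center + 2 * s),
        "control": (center - 3 * s, center + 3 * s),
    }


def classify(value, limits):
    """Label a new QC result against the warning and control limits."""
    low_c, high_c = limits["control"]
    low_w, high_w = limits["warning"]
    if not low_c <= value <= high_c:
        return "out of control"
    if not low_w <= value <= high_w:
        return "warning"
    return "in control"


history = [10.0, 10.2, 9.8, 10.1, 9.9]  # hypothetical QC results
limits = control_limits(history)
for new_result in (10.1, 10.4, 10.6):
    print(new_result, classify(new_result, limits))
```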
-
Statistical parameters
Software: Excel, SPSS, Minitab, SYSTAT
Descriptive statistics
Check the data for problems or non-normality (a data set that departs from the bell shape or has outliers) using a frequency chart or normal plot
Means,
standard deviation, or S (data
-
Data distribution
-
Confidence limit
Estimates the range, at a given probability, within which the true value might fall, defined by the experimental mean and standard deviation
The range is called the confidence interval and its bounds are called the confidence limits.
The likelihood that the true value falls within the range is called the probability or confidence level
Select a confidence level (95% is good) for the number of samples analyzed (= degrees of freedom + 1).
Confidence limit = $\bar{x} \pm ts/\sqrt{N}$.
It depends on the precision, s, and the confidence level you select.
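The confidence limit formula can be worked through as follows. This is a sketch only: the replicate results are hypothetical, and the t values are the standard two-sided 95% entries of a Student's t table, typed in by hand for 1 to 10 degrees of freedom rather than taken from the slides.

```python
import math
import statistics

# Two-sided Student's t values at the 95% confidence level,
# keyed by degrees of freedom (N - 1). Standard table values.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def confidence_limit_95(results):
    """Return (mean, half-width) so that the limit is mean +/- t*s/sqrt(N)."""
    n = len(results)
    mean = statistics.mean(results)
    s = statistics.stdev(results)   # sample standard deviation (N - 1 in the denominator)
    t = T_95[n - 1]                 # degrees of freedom = N - 1
    return mean, t * s / math.sqrt(n)


# Hypothetical triplicate results, in percent
mean, half_width = confidence_limit_95([93.50, 93.58, 93.43])
print(f"{mean:.2f} ± {half_width:.2f}")
```

With only three replicates the t value is large (4.303), so the interval is much wider than s alone would suggest.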
-
Inferential statistic
Researchers need to make inferences about a population from a sample
Types of inferential statistics
Significance tests: F test and t-test
Analysis of variance (ANOVA)
Q-test (to discard bad data)
-
Significance test
Compares the results of a method with those of an accepted method to decide whether one set of data is significantly different from another (in the mean or in the variability and spread)
Uses statistical tables, as in the F test or t test
F test
Indicates whether there is a significant difference between two methods based on their standard deviations
F is defined in terms of the variances of the two methods, where the variance is the square of the standard deviation
$F = s_1^2 / s_2^2$ (Eq. 3.10), where $s_1^2 > s_2^2$
If the calculated F value from Eq. 3.10 exceeds the tabulated F value at the selected confidence level (e.g. Table 3.2 at the 95% confidence level), then there is a significant difference between the variances of the two methods
-
F value
$F = s_1^2 / s_2^2$.
You compare the variances of two different methods to see if there is a significant difference in the methods, at the 95% confidence level.
©Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)
-
Example 3.16
You are developing a new colorimetric procedure for determining the glucose content in blood serum. You have chosen the standard Folin-Wu procedure with which to compare your results. From the following two sets of replicate analyses on the same sample, determine whether the variance of your method differs significantly from that of the standard method using the F test.
Your method (mg/dL)    Folin-Wu method (mg/dL)
127                    130
125                    128
123                    131
130                    129
131                    127
126                    125
129
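The F test for Example 3.16 can be sketched in Python using the two data sets above. The critical value 4.95 is the standard tabulated F for 6 and 5 degrees of freedom at the 95% level, entered by hand here as an assumption; check it against the F table you are using. The function name `f_ratio` is illustrative.

```python
import statistics

your_method = [127, 125, 123, 130, 131, 126, 129]   # mg/dL
folin_wu = [130, 128, 131, 129, 127, 125]           # mg/dL


def f_ratio(a, b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1


F_TABLE_95 = 4.95  # tabulated F for 6 and 5 degrees of freedom, 95% level (assumed from a standard table)

f_calc, df_num, df_den = f_ratio(your_method, folin_wu)
print(f"F = {f_calc:.2f} with {df_num} and {df_den} degrees of freedom")
print("significant difference" if f_calc > F_TABLE_95 else "no significant difference in variance")
```

Because the variances are computed from unrounded means, the calculated F may differ in the second decimal place from a hand calculation that rounds the means first.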
-
t-Test
Tests the difference between two group means
Requires assumptions to be checked before the test
Do the samples follow a normal distribution? If the sample is small the test may be incorrect; a moderate sample size of 40-100 is needed for it to be accurate
The variances of the two groups are about the same. Check the homogeneity-of-variance assumption; violating it can lead to inaccurate results, particularly for small groups with unequal sample sizes
Observations are assumed to be independent, such that one subject's score does not influence another subject's score.
The statistic is the difference between the sample means divided by a measure of their variability, for comparison with the critical value obtained from a probability table at the selected p value (0.05, 0.01 or 0.001)
If the t statistic equals or exceeds the critical value, the difference between the two group means is significant at the chosen level of alpha.
The test can be one-sided or two-sided. The former is used when the mean of a particular group is hypothesized to be higher than the mean of the other group; the latter is used when the means are expected to be different.
-
Example 3.18
A new gravimetric method is developed for iron (III) in which the iron is precipitated in crystalline form with an organoboron cage compound. The accuracy of the method is checked by analyzing the iron in an ore sample and comparing with the results using the standard precipitation with ammonia and weighing of Fe2O3. The results, reported as % Fe for each analysis, were as follows:
Find the F and t values, given:
Test method    Reference method
20.10          18.89
20.50          19.20
18.65          19.00
19.25          19.70
19.40          19.40
19.99
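The t value for Example 3.18 can be sketched with a pooled standard deviation, which is one common form of the two-sample t-test and assumes the two variances are comparable. The critical value 2.262 is the standard two-sided t for 9 degrees of freedom at 95%, entered by hand as an assumption; the function name `pooled_t` is illustrative.

```python
import math
import statistics

test_method = [20.10, 20.50, 18.65, 19.25, 19.40, 19.99]   # % Fe
reference_method = [18.89, 19.20, 19.00, 19.70, 19.40]     # % Fe


def pooled_t(a, b):
    """Two-sample t statistic using a pooled standard deviation; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    # Sums of squared deviations recovered from the sample variances
    ss_a = statistics.variance(a) * (na - 1)
    ss_b = statistics.variance(b) * (nb - 1)
    s_pooled = math.sqrt((ss_a + ss_b) / (na + nb - 2))
    diff = abs(statistics.mean(a) - statistics.mean(b))
    t = diff / s_pooled * math.sqrt(na * nb / (na + nb))
    return t, na + nb - 2


T_TABLE_95 = 2.262  # two-sided t for 9 degrees of freedom, 95% level (assumed from a standard table)

t_calc, df = pooled_t(test_method, reference_method)
print(f"t = {t_calc:.2f} with {df} degrees of freedom")
print("significant difference" if t_calc >= T_TABLE_95 else "no significant difference between means")
```

The pooled form is only appropriate once the F test has shown that the two variances do not differ significantly.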
-
ANOVA
Replaces multiple t-tests when there are more than a few groups
A comparison of group means, with no limit on the number of groups compared
ANOVA examines the variability of scores within and between groups.
Subject scores within groups vary due to individual differences and random error
ANOVA assumes the observations are independent and normally distributed, and that the group variances are equal
ANOVA determines whether any group mean is significantly different from any other group mean by an overall F test.
If there is no difference (i.e. the F test is not significant), there is no point in comparing any of the groups; retain the null hypothesis.
If the F test is significant, at least one group mean is significantly different from one other group mean; investigate the hypothesis for the individual groups.
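The overall F test of a one-way ANOVA can be sketched as the ratio of the between-group to the within-group mean square. The three groups below are hypothetical results from three analysts, and the critical value 5.14 is the standard tabulated F for 2 and 6 degrees of freedom at 95%, entered by hand as an assumption.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual results around their own group mean
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical replicate results from three analysts
groups = [
    [10.1, 10.2, 10.3],
    [10.2, 10.3, 10.4],
    [10.5, 10.6, 10.7],
]
F_TABLE_95 = 5.14  # tabulated F for 2 and 6 degrees of freedom, 95% level (assumed from a standard table)

f_calc, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f_calc:.2f} ({df_b}, {df_w} degrees of freedom)")
print("at least one group mean differs" if f_calc > F_TABLE_95 else "retain the null hypothesis")
```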
-
Q-test
QCalc = outlier difference/range.
If QCalc > QTable, then reject the outlier as due to a systematic error.
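The Q-test rule can be sketched as follows. The data are hypothetical, and the critical values are the commonly tabulated 95% confidence Q values for 3 to 10 observations, typed in by hand as an assumption; use the table from your own reference if it differs.

```python
# Commonly tabulated critical Q values at the 95% confidence level,
# keyed by number of observations (assumed; check against your own table).
Q_TABLE_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
              7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def q_test(values):
    """Test the most extreme value; return (suspect, Q_calc, reject?)."""
    data = sorted(values)
    spread = data[-1] - data[0]              # range of the data set
    gap_low = data[1] - data[0]              # gap between lowest value and its neighbour
    gap_high = data[-1] - data[-2]           # gap between highest value and its neighbour
    if gap_high >= gap_low:
        suspect, gap = data[-1], gap_high
    else:
        suspect, gap = data[0], gap_low
    q_calc = gap / spread
    return suspect, q_calc, q_calc > Q_TABLE_95[len(data)]


# Hypothetical replicate results
suspect, q_calc, reject = q_test([10.1, 10.2, 10.0, 10.3, 11.6])
print(f"suspect = {suspect}, Q = {q_calc:.3f}, reject = {reject}")
```

The test is applied to one suspect value at a time, and only to the value at either end of the sorted data.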
-
Example of Q-test
Perform a Q-test to find any outlier in the following measurements and draw your conclusion about the data.
Sydney Cherry Tien Dick
10.2
10.8
11.6
9.9
9.4
7.8
10.0
9.2
11.3
9.5
10.6
11.6
-
Correlation
A measure of association between two variables that takes a value between +1.0 and -1.0
If the two variables are positively correlated, then as one increases, the other increases.
If the two variables are negatively correlated, then as one variable increases, the other decreases
If they are not associated at all, the correlation is zero
A scatter plot of zero correlation shows a circular field of points on the x-y axes, with no particular relationship between x and y.
A positive correlation appears as an increasing straight line, whereas a negative correlation appears as a decreasing straight line.
Inferences about the association between two variables in a population are made by assuming the data are normally distributed
Pearson correlation , or
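The Pearson correlation coefficient can be computed from the deviations of x and y from their means. This is a minimal sketch with hypothetical data chosen to show the two extreme cases; the function name `pearson_r` is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]      # rises with x: perfect positive correlation
y_down = [10, 8, 6, 4, 2]    # falls as x rises: perfect negative correlation

print(f"r (increasing) = {pearson_r(x, y_up):+.2f}")
print(f"r (decreasing) = {pearson_r(x, y_down):+.2f}")
```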
-
Regression
Regression treats a variable such as age as continuous, rather than dividing it into groups
Regression creates a linear equation that predicts the score on a dependent variable.
The equation represents the line of best fit through a scatter plot of points describing the relationship between the dependent variable and one or more independent variables
The beta weights or coefficients of the independent variables in the equation give information on the relationships between the independent and dependent variables
The slope of the single line that best fits the x-y data represents the beta weight and reflects the change in the value of the dependent variable associated with each change of one unit in the independent variable.
Regression analysis assumes independence, normality, constant variance, and a linear relationship between the independent and dependent variables.
-
A least-squares plot gives the best straight line through experimental points. Excel will do this for you.
Fig. 3.7. Straight-line plot.
©Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)
-
Riboflavin (vitamin B2) is determined in a cereal sample by its fluorescence intensity in 5% HAc solution. A calibration curve was prepared by measuring the fluorescence intensities of a series of standards of increasing concentration. The following data were obtained. Use the method of least squares to obtain the best straight line for the calibration curve and to calculate the concentration of riboflavin in the sample.
Fig. 3.8. Least-squares plot of data from Example 3.21.
This Excel plot gives the same results for slope and intercept as calculated in the example.
©Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)
$m = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$b = \bar{y} - m\bar{x}$
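The slope and intercept formulas can be applied directly in code. Because the data table of the example did not survive in this transcript, the calibration values below are hypothetical (constructed to lie exactly on y = 50x + 1), and the function name `least_squares` is illustrative.

```python
def least_squares(x, y):
    """Slope m and intercept b of the best straight line y = m*x + b."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


# Hypothetical calibration standards (concentration, fluorescence intensity)
conc = [0.0, 0.1, 0.2, 0.4, 0.8]
signal = [1.0, 6.0, 11.0, 21.0, 41.0]

m, b = least_squares(conc, signal)

# Read the unknown off the calibration line: x = (y - b) / m
sample_signal = 16.0  # hypothetical sample reading
sample_conc = (sample_signal - b) / m
print(f"m = {m:.2f}, b = {b:.2f}, sample concentration = {sample_conc:.3f}")
```

The same slope and intercept should come out of Excel's LINEST or a chart trendline for the same data.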
-
Manual solution for example 3.21
-
EXCEL spreadsheet solution for 3.21
Select LINEST from the statistical function list (in the Paste Function window; click on fx in the tool bar to open).
LINEST calculates key statistical functions for a graph or set of data.
Fig. 3.10. Using LINEST for statistics.
-
Use of spreadsheets in analytical chemistry
We often use relative cell references in formulas.
If a number from a given cell is to be a constant in the formula, place $ in front of that cell's descriptors.
Fig. 3.5. Relative and absolute cell references.
-
EXCEL Mathematical function
Excel has a number of mathematical and statistical functions.
Click on fx on the tool bar to open the Paste Function.
Math & trig syntaxes:
LOG10
PRODUCT
POWER
SQRT
Statistical syntaxes:
AVERAGE
MEDIAN
STDEV
TTEST
VAR
-
References
Gary D. Christian, Analytical Chemistry, 6th Ed., Wiley, 2003. QD101.2 C57 2003
Daniel C. Harris, Exploring Chemical Analysis, 2nd Ed., W.H. Freeman and Company, 2000. QD75.2 H368
Seamus P.J. Higson, Analytical Chemistry, Oxford University Press, 2004. QD101.2 H54