cdb 3093 data handling, statistic and errors

Upload: jc-jackson

Post on 05-Jul-2018


  • 8/15/2019 CDB 3093 Data Handling, Statistic and Errors

    1/38

    Data Handling, Statistic and Errors

    Dr Asna M. Zain, RSci AMIChemE

    CDB3093 Analytical Chemistry


    Outline

    Sample handling and management

    QC and QA

    Errors in analysis

    Statistical analysis parameters

    Descriptive statistics

    Inferential statistics

    Example questions


    Nature and Scope

    Set of instruction

    Reliability in accuracy, reproducibility

    Solve using chemical or physico-chemical process as underlying principles of the technique

    Subject

    Chemical analysis

    Analytical problem

    Method

    Validate

    Procedures

    Based on purpose and intended quality


     Techniques and method of analysis

    Techniques

    Atomic & molecular spectrometry (AAS, FTIR)

    Gravimetry

    Mass spectrometry

    Chromatography (HPLC, GC)

    Thermal

    Electrochemical

    Radiochemical


    Validation method

    Performance characteristic of detector for single analyte calibration standards

    Process repeated for mixed analyte calibration standards

    Process repeated for analyte calibration standard with possible interfering substances and for reagent blank

    Process repeated for analyte calibration standard with anticipated matrix component to evaluate matrix interference

    Analysis of spiked simulated matrix (matrix with a known amount of analyte added) to test recoveries

    Field trials in routine lab with more junior personnel to test ruggedness


    Sampling and sample handling

    The sample must reflect the real composition of the material sampled

    Because the time elapsed between sample collection and analysis varies, proper storage is required to prevent loss of analyte

    A preservative maintains the sample condition for storage or for analysis

    Treatment prior to analysis, such as extraction, grinding, concentration or dissolution

    Analysis


    Representative sample

    Coning and quartering: solids

    Grab sample or composite of grab samples: water and other liquids

    Random pick

    [Figure: three diagrams, each divided into portions numbered 1 to 4]


    Quality control and quality assurance

    QC: ensures that the operational techniques and activities in an analytical lab provide results suitable for the intended purpose

    Meets specific requirements in the context of a defined problem, e.g. accuracy and precision, calibration

    Confidence in validity

    Cost effective

    QA: the managerial component or responsibility of an analytical lab, ensuring that all QC procedures are in place

    Builds confidence through lab participation in inter-laboratory studies

    Proficiency tests of the performance of the lab or the analyst

    Method performance and certification studies undertaken


    Errors in analytical measurement

    Measurement error: use statistical methods to assess the error, and minimize it by careful experimental design and control

    Absolute and relative error: the absolute error is Ea = Xm - Xt and the relative error is Er = (Xm - Xt)/Xt, where Xm is the measured value and Xt is the true value

    Determinate errors: systematic errors lead to bias in the measured value. They arise from the analyst, the equipment or the procedure, and call for record keeping, training or equipment maintenance.

    Indeterminate errors: random errors arise from random fluctuations in measured quantities and occur even in a closely controlled environment. Minimize them by careful experimental design and control of the environmental factors.

    Accumulated error: the aggregate of the errors in every measurement made in an analytical procedure, which contributes to the final calculated result


    Determinate and indeterminate error

    Determinate error

    Instrumental error: includes instrument faults, uncalibrated weights and uncalibrated glassware

    Operative error: due to lack of skill and training

    Errors in methods: arise from coprecipitation, slight solubility, side reactions, incomplete reactions and impurities in reagents

    Indeterminate error

    Accidental error or random error

    Use probability or statistics to draw conclusions about the error

    Indeterminate error should follow the normal distribution or Gaussian curve

    σ represents the standard deviation of an infinite population and measures the precision by the spread of the normal population distribution, as in Fig. 3.2


    Gaussian distribution

    Random errors follow a Gaussian or normal distribution.

    We are 95% certain that the true value falls within 2σ (infinite population), IF there is no systematic error.

    Fig. 3.2 Normal error curve. ©Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)
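The 95% figure quoted for 2σ can be checked numerically. The short sketch below is an illustration only (the function name `normal_coverage` is ours, not from the slides); it uses the error function from the Python standard library to give the fraction of a normal population lying within k standard deviations of the mean.

```python
import math


def normal_coverage(k: float) -> float:
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545 and 0.9973.
    print(f"within {k} sigma: {normal_coverage(k):.4f}")
```

The value for k = 2 is about 95.4%, which is where the "95% certain within 2σ" rule of thumb comes from.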


    Ways to express accuracy: absolute error and relative error

    Absolute error

    The difference between the measured value and the true value

    If the true value is 2.62 g and the measured value is 2.52 g, the absolute error, Ea, is -0.10 g

    If Xm is based on the average of several measurements, the value is called the mean error

    Relative error

    The absolute or mean error expressed as a percentage of the true value is the relative error

    Based on the same measurement, the relative error, Er, is (-0.10/2.62) x 100% = -3.8%

    The relative accuracy is the measured value or mean expressed as a percentage of the true value: (2.52/2.62) x 100% = 96.2%


    Example 3.6

    The results of an analysis are 36.97 g, compared with the accepted value of 37.06 g.

    What is the relative error in parts per thousand, ppt?

    Absolute error = 36.97 g - 37.06 g = -0.09 g

    Relative error = (-0.09/37.06) x 1000 ppt

    = -2.4 ppt
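The arithmetic of Example 3.6 can be reproduced with a few lines of Python. This is a minimal sketch; the function names and the `scale` parameter are illustrative choices, not part of the course material.

```python
def absolute_error(measured: float, true: float) -> float:
    """Ea = Xm - Xt, in the units of the measurement."""
    return measured - true


def relative_error(measured: float, true: float, scale: float = 100.0) -> float:
    """Er = (Xm - Xt) / Xt, scaled to percent (100) or parts per thousand (1000)."""
    return (measured - true) / true * scale


# Values from Example 3.6: measured 36.97 g, accepted 37.06 g.
ea = absolute_error(36.97, 37.06)
er_ppt = relative_error(36.97, 37.06, scale=1000.0)
print(f"Ea = {ea:.2f} g, Er = {er_ppt:.1f} ppt")
```

Calling `relative_error(2.52, 2.62)` with the default scale gives the -3.8% quoted for the 2.62 g example.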


    Statistical analysis

    The statistical models used assume that the data follow a normal (Gaussian) distribution

    Average or normalize the data if the data set is small, so that the Gaussian distribution can be applied

    A batch may contain one or more samples that differ in some respect, e.g. parameters or holding time


    Accuracy and precision

     You can’t have accuracy without good precision.

    But a precise result can have a determinate or systematic error.

     ©Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)

    Fig. 3.1. Accuracy and precision.


    R chart and X chart

    Use control charts to present or evaluate a batch of QC samples.

    The R chart presents precision: it records the property of interest in a running sequence and shows the centerline (average), the standard deviation and the warning or control limits.

    The X chart requires results from a sample of known composition and is used to evaluate accuracy.

    The warning limit is 2 standard deviations and the control limit is 3 standard deviations.
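The 2s warning and 3s control limits can be sketched in code. The example below is an illustration under stated assumptions: the baseline QC results are hypothetical, and the helper names `control_limits` and `classify` are ours.

```python
import statistics


def control_limits(baseline):
    """Centerline with 2s warning limits and 3s control limits from baseline QC results."""
    center = statistics.mean(baseline)
    s = statistics.stdev(baseline)
    return {
        "center": center,
        "warning": (center - 2 * s, center + 2 * s),
        "control": (center - 3 * s, center + 3 * s),
    }


def classify(value, limits):
    """Label a new QC result against the warning and control limits."""
    low_c, high_c = limits["control"]
    low_w, high_w = limits["warning"]
    if value < low_c or value > high_c:
        return "out of control"
    if value < low_w or value > high_w:
        return "warning"
    return "in control"


# Hypothetical baseline QC results (mean 10.0, s = 0.2).
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
limits = control_limits(baseline)
for result in (10.1, 10.5, 10.7):
    print(result, classify(result, limits))
```

With this baseline the warning band is about 9.6 to 10.4 and the control band about 9.4 to 10.6, so the three new results fall in control, in the warning zone and out of control respectively.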


    Statistical parameters

    Software: Excel, SPSS, Minitab, SYSTAT

    Descriptive statistics

    Check the data for problems or non-normality (a data set that departs from the bell shape or contains outliers) using a frequency chart or normal plot

    Means,

    standard deviation, or S (data
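As a small illustration of the descriptive statistics named here, the sketch below uses Python's standard `statistics` module on a hypothetical set of replicate results. It distinguishes the sample standard deviation s (N - 1 in the denominator) from the population standard deviation σ (N in the denominator).

```python
import statistics

# Hypothetical replicate results; not data from the slides.
data = [10.1, 9.9, 10.0, 10.2, 9.8]

mean = statistics.mean(data)
median = statistics.median(data)
s = statistics.stdev(data)       # sample standard deviation, divides by N - 1
sigma = statistics.pstdev(data)  # population standard deviation, divides by N

print(f"mean = {mean:.2f}, median = {median:.2f}, s = {s:.3f}, sigma = {sigma:.3f}")
```

For a small data set s is noticeably larger than σ; the two converge as N grows.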


    Data distribution


    Select a confidence level (95% is good) for the number of samples analyzed (= degrees of freedom + 1).

    Confidence limit = x̄ ± ts/√N.

    It depends on the precision, s, and the confidence level you select.

    The confidence limit estimates the range, within a given probability, in which the true value might fall, defined by the experimental mean and standard deviation.

    The range is called the confidence interval and its bounds are called the confidence limits.

    The likelihood that the true value falls within the range is called the probability or confidence level.
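The confidence-limit formula x̄ ± ts/√N can be sketched as follows. The replicate values are hypothetical, and the t value (4.303 for 2 degrees of freedom at 95%) is taken from a standard t table and passed in by hand rather than computed.

```python
import math
import statistics


def confidence_limits(values, t):
    """Return (mean, half_width) so that the interval is mean ± t*s/sqrt(N)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    half_width = t * s / math.sqrt(n)
    return mean, half_width


# Hypothetical triplicate results (%); N = 3, so degrees of freedom = 2.
values = [93.50, 93.58, 93.43]
T_95_DF2 = 4.303  # tabulated t, 95% confidence, 2 degrees of freedom

mean, half = confidence_limits(values, T_95_DF2)
print(f"{mean:.2f} ± {half:.2f}")
```

Tightening the interval requires either better precision (smaller s) or more replicates (larger N, which also lowers t).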


    Inferential statistic

    Researchers need to make inferences about a population from a sample

    Types of inferential statistics

    Significance tests: F test and t-test

    Analysis of variance (ANOVA)

    Q-Test (to discard bad data)


    Significance test

    Compares the results of a method with those of an accepted method to decide whether one set of data is significantly different from another (in the mean or in the variability and spread)

    Uses statistical tables, as in the F test or t test

    The F test indicates whether there is a significant difference between two methods based on their standard deviations

    F is defined in terms of the variances of the two methods, where the variance is the square of the standard deviation

    $F = s_1^2 / s_2^2$ (Eq. 3.10), where $s_1^2 > s_2^2$

    If the calculated F value from Eq. 3.10 exceeds the tabulated F value at the selected confidence level (e.g. Table 3.2 at the 95% confidence level), then there is a significant difference between the variances of the two methods


    F value

    $F = s_1^2 / s_2^2$.

    You compare the variances of two different methods to see if there is a significant difference in the methods, at the 95% confidence level.

     ©Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)


    Example 3.16

    You are developing a new colorimetric procedure for determining the glucose content in blood serum. You have chosen the standard Folin-Wu procedure with which to compare your results. From the following two sets of replicate analyses on the same sample, determine whether the variance of your method differs significantly from that of the standard method using the F test.

    Your method (mg/dL)    Folin-Wu method (mg/dL)
    127                    130
    125                    128
    123                    131
    130                    129
    131                    127
    126                    125
    129
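A sketch of the F-test calculation for this example follows. It assumes that the first seven values in the listing belong to "Your method" and the remaining six to the Folin-Wu method; the tabulated F of 4.95 (95% level, 6 and 5 degrees of freedom) is taken from a standard F table and entered by hand. The function name `f_ratio` is illustrative.

```python
import statistics


def f_ratio(a, b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a = statistics.variance(a)  # sample variance, s^2
    var_b = statistics.variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1


# Assumed column split of the flattened table (7 values, then 6 values).
your_method = [127, 125, 123, 130, 131, 126, 129]
folin_wu = [130, 128, 131, 129, 127, 125]

F, df_num, df_den = f_ratio(your_method, folin_wu)
F_TABLE = 4.95  # assumed tabulated F at 95% for 6 and 5 degrees of freedom

print(f"F = {F:.2f} with {df_num} and {df_den} degrees of freedom")
print("significant difference" if F > F_TABLE else "no significant difference")
```

Under these assumptions the calculated F is well below the tabulated value, so the variances of the two methods would not be judged significantly different.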


    t-Test

    Tests the difference between two means

    Assumptions must be checked before the test:

    Does the sample follow a normal distribution? If the sample is small the test may be incorrect; a moderate sample size of 40-100 is needed for accuracy

    The variances of the two groups are about the same. Check the homogeneity of variance assumption; violating it can lead to inaccurate results, particularly for small groups with unequal sample sizes

    Observations are assumed to be independent, such that one subject's score does not influence another's

    The statistic is calculated as the difference between the sample means divided by a measure of their variability, for comparison with the critical value obtained from a probability table at the selected p value (0.05, 0.01 or 0.001)

    If the t statistic equals or exceeds the critical value, then the difference between the two group means is significant at the chosen level of alpha.

    The test can be one-sided or two-sided. The former is used when the mean for a particular group is hypothesized to be higher than the mean for the other group; the latter is used when the means are expected to be different.


    Example 3.18

    A new gravimetric method is developed for iron (III) in which the iron is precipitated in crystalline form with an organoboron cage compound. The accuracy of the method is checked by analyzing the iron in an ore sample and comparing with the results using the standard precipitation with ammonia and weighing of Fe2O3. The results, reported as % Fe for each analysis, were as follows:

    Find the F and t value,

    given

    Test method    Reference method
    20.10          18.89
    20.50          19.20
    18.65          19.00
    19.25          19.70
    19.40          19.40
    19.99
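The F and t values asked for here can be sketched in Python. This is an illustration under two assumptions: the first six listed values belong to the test method and the last five to the reference method, and the two-sample t statistic is computed with a pooled standard deviation (appropriate only if the F test shows the variances are comparable). The helper name `pooled_t` is ours.

```python
import math
import statistics


def pooled_t(a, b):
    """Two-sample t statistic using a pooled standard deviation; returns (t, df)."""
    na, nb = len(a), len(b)
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    s_pooled = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    diff = abs(statistics.mean(a) - statistics.mean(b))
    t = diff / (s_pooled * math.sqrt(1 / na + 1 / nb))
    return t, na + nb - 2


# Assumed column split of the flattened table (6 values, then 5 values).
test_method = [20.10, 20.50, 18.65, 19.25, 19.40, 19.99]
reference = [18.89, 19.20, 19.00, 19.70, 19.40]

variances = [statistics.variance(test_method), statistics.variance(reference)]
F = max(variances) / min(variances)  # larger variance in the numerator
t, df = pooled_t(test_method, reference)

print(f"F = {F:.2f}, t = {t:.2f} with {df} degrees of freedom")
```

Both numbers are then compared with tabulated values at the chosen confidence level; for reference, standard tables give about 6.26 for F (5 and 4 degrees of freedom) and 2.262 for t (9 degrees of freedom) at 95%.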


    ANOVA

    Used in place of multiple t-tests when there are more than a few groups

    A comparison of group means, with no limit on the number of groups compared

    ANOVA examines the variability of scores within and between groups.

    Subject scores within groups vary because of individual differences and random error

    ANOVA assumes the observations are independent and normally distributed and that the group variances are equal

    The ANOVA test determines whether any group mean is significantly different from any other group mean by an overall F test.

    If there is no difference (i.e. the F test is not significant), then there is no point in comparing any of the groups: retain the null hypothesis.

    If the F test is significant, at least one group mean is significantly different from one other group mean; investigate the hypothesis for the individual groups.
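The overall F test described above compares the variability between groups with the variability within groups. A minimal one-way ANOVA sketch is shown below; the three groups are hypothetical and the function name is illustrative.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical scores for three groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
F, df_b, df_w = one_way_anova_f(groups)
print(f"F = {F:.1f} with {df_b} and {df_w} degrees of freedom")
```

As with the two-method F test, the calculated F is compared with a tabulated value for the given degrees of freedom; only if it is significant is it worth asking which groups differ.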


    Q-test

    $Q_{calc}$ = outlier difference/range.

    If $Q_{calc} > Q_{table}$, then reject the outlier as due to a systematic error.


    Example of Q-test

    Perform a Q-test to find outlying data in the following measurements and draw your conclusion about the data.

    Sydney    Cherry    Tien    Dick
    10.2      9.9       10.0    9.5
    10.8      9.4       9.2     10.6
    11.6      7.8       11.3    11.6
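The Q-test rule (outlier difference divided by range, compared with a tabulated Q) can be sketched as follows. The data set here is hypothetical rather than taken from the table, and the tabulated Q of 0.710 for five observations at 95% confidence is an assumed value from a standard Q table, entered by hand.

```python
def dixon_q(values):
    """Return (Q, suspect): Q = gap to nearest neighbour / range, for the more extreme end."""
    data = sorted(values)
    data_range = data[-1] - data[0]
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")
    gap_low = data[1] - data[0]     # gap between the lowest value and its neighbour
    gap_high = data[-1] - data[-2]  # gap between the highest value and its neighbour
    if gap_high >= gap_low:
        return gap_high / data_range, data[-1]
    return gap_low / data_range, data[0]


# Hypothetical measurements with one suspiciously high value.
values = [10.0, 10.2, 10.1, 10.3, 11.5]
Q_TABLE = 0.710  # assumed tabulated Q, 5 observations, 95% confidence

q, suspect = dixon_q(values)
reject = q > Q_TABLE
print(f"Q = {q:.2f} for suspect value {suspect}; reject = {reject}")
```

The tabulated Q depends on the number of observations, so it has to be looked up again for each data set the test is applied to.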


    Correlation

    A measure of association between two variables that takes a value between +1.0 and -1.0

    If the two variables are positively correlated, then as one increases, the other increases.

    If the two variables are negatively correlated, then as one variable increases, the other decreases

    If they are not associated at all, the correlation is zero

    A scatter plot of zero correlation shows a circular field of points on the x-y axes, i.e. no particular relationship between x and y.

    A positive correlation appears as an increasing straight line, whereas a negative correlation appears as a decreasing straight line.

    Inferences about the association between two variables in a population can be made by assuming the data are normally distributed

    Pearson correlation , or
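As an illustration of the +1.0 to -1.0 range, the sketch below computes the Pearson correlation coefficient from its definition for two hypothetical data sets, one perfectly increasing and one perfectly decreasing. The function name is ours.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: y rises with x in one set and falls with x in the other.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

print(pearson_r(x, y_up), pearson_r(x, y_down))
```

Real data rarely reach exactly ±1; values near zero indicate no linear association.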


    Regression

    Regression treats a variable such as age as continuous rather than dividing it into groups

    Regression creates a linear equation that predicts the score on a dependent variable.

    The equation represents the line that best fits a scatter plot of points describing the relationship between the dependent variable and one or more independent variables

    The beta weights, or coefficients, of the independent variables in the equation give information on the relationships between the independent and dependent variables

    The slope of the single best-fit line on the x-y axes represents the beta weight and reflects the change in the value of the dependent variable associated with each one-unit change in the independent variable.

    Regression analysis assumes independence, normality, constant variance and a linear relationship between the independent and dependent variables.


    A least-squares plot gives the best straight line through experimental points. Excel will do this for you.

    Fig. 3.7. Straight-line plot.

    ©Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)


    Riboflavin (vitamin B2) is determined in a cereal sample by its fluorescence intensity in 5% HAc solution. A calibration curve was prepared by measuring the fluorescence intensities of a series of standards of increasing concentrations. The following data were obtained. Use the method of least squares to obtain the best straight line for the calibration curve and to calculate the concentration of riboflavin in the sample.

    Fig. 3.8. Least-squares plot of data from Example 3.21.

    This Excel plot gives the same results for slope and intercept as calculated in the example.

    ©Gary Christian, Analytical Chemistry, 6th Ed. (Wiley)

    $m = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

    $b = \bar{y} - m\bar{x}$
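The slope and intercept formulas can be applied directly in code. The sketch below uses hypothetical calibration data, not the data of Example 3.21 (which are not reproduced in this transcript), and then reads an unknown concentration off the fitted line. The function name `least_squares` is illustrative.

```python
def least_squares(x, y):
    """Slope m and intercept b of the best straight line y = m*x + b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b = y_bar - m * x_bar
    return m, b


# Hypothetical calibration standards: concentration vs. instrument signal.
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [1.0, 3.0, 5.0, 7.0, 9.0]

m, b = least_squares(conc, signal)

# Invert the calibration line to find the concentration of an unknown.
unknown_signal = 6.0
unknown_conc = (unknown_signal - b) / m
print(f"m = {m:.2f}, b = {b:.2f}, unknown = {unknown_conc:.2f}")
```

The same slope and intercept are what a spreadsheet trendline or LINEST would return for these points.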


    Manual solution for example 3.21


    EXCEL spreadsheet solution for 3.21

    Select LINEST from the statistical function list (in the Paste Function window; click on fx in the tool bar to open).

    LINEST calculates key statistical functions for a graph or set of data.

    Fig. 3.10. Using LINEST for statistics.


    Use of spreadsheets in analytical chemistry

    We often use relative cell references in formulas.

    If a number from a given cell is to be a constant in the formula, place $ in front of that cell’s descriptors.

    Fig. 3.5. Relative and absolute cell references.


    EXCEL Mathematical function

    Excel has a number of mathematical and statistical functions.

    Click on fx on the tool bar to open the Paste Function.

    Math & trig syntaxes:

    LOG10

    PRODUCT

    POWER

    SQRT

    Statistical syntaxes:

    AVERAGE

    MEDIAN

    STDEV

    TTEST

    VAR


    References

    Gary D. Christian, Analytical Chemistry, 6th Ed., Wiley, 2003. QD101.2 C57 2003

    Daniel C. Harris, Exploring Chemical Analysis, 2nd Ed., W.H. Freeman and Company, 2000. QD 75.2 H368

    Seamus P.J. Higson, Analytical Chemistry, Oxford University Press, 2004. QD 101.2 H54
