8[1].basic stat inference

Upload: manish-mahabir

Post on 03-Jun-2018


TRANSCRIPT

  • 8/12/2019 8[1].Basic Stat Inference

    1/41

    Basics of

    Statistical Inference

    V. Sreenivas


  • 8/12/2019 8[1].Basic Stat Inference

    2/41

    Basics of Statistical Inference

    Importance

    Primary entities in statistical inference

    Types of statistical inference

    Estimation considerations

    Measure of accuracy of estimates

    General framework of testing

    Types of errors in statistical testing

    Form of a Statistical test

    Interpretation of a test result

  • 8/12/2019 8[1].Basic Stat Inference

    3/41

    Basics of Statistical Inference

    In all clinical/epidemiological studies, information collected represents only a sample from the target population

    Drawing conclusions about the population depends on statistical analysis of data

    So the basis of statistical inference is important to understand & interpret the results from the epidemiological studies

  • 8/12/2019 8[1].Basic Stat Inference

    4/41

    Basics of Statistical Inference

    3 primary entities

    The target population

    Set of characteristics or variables

    Probability distribution of the characteristics

  • 8/12/2019 8[1].Basic Stat Inference

    5/41

    Basics of Statistical Inference

    Population

    Collection of units of observation that are of

    interest & is the target of the investigation

    Eg. In studying the prevalence of

    osteoporosis in women of a city, all the

    women in that city would form the target

    population

    Essential to identify the population clearly &

    precisely

  • 8/12/2019 8[1].Basic Stat Inference

    6/41

    Basics of Statistical Inference

    Variables

    Once population is identified, clearly define what characteristics of the units of this population are to be studied

    In the above example, we need to define:

    Osteoporosis (reliable & valid method of diagnosis: DEXA / Ultrasound, normal values of BMD etc.)

    Clear & precise methods of measuring these characteristics are essential for the success of the study

  • 8/12/2019 8[1].Basic Stat Inference

    7/41

    Basics of Statistical Inference

    Variables

    Qualitative: Take a few possible values (Eg. Sex, Disease status)

    Quantitative: Can theoretically take any value within a specified range (Eg. Blood sugar, Syst. BP)

    Type of analysis depends on the type of the

    variable

  • 8/12/2019 8[1].Basic Stat Inference

    8/41

    Basics of Statistical Inference

    Probability distribution

    Most crucial link between population & its

    characteristics

    Allows us to draw inferences on the population, based on a sample

    Tells us what different values a variable can take

    and how frequently each value can occur in the population

  • 8/12/2019 8[1].Basic Stat Inference

    9/41

    Basics of Statistical Inference

    Probability distribution

    Common distributions in health research are Binomial, Poisson & Normal

    Eg. Incidence of a relatively common disease may be approximated by Binomial distribution

    Incidence of a rare disease can be considered to have a Poisson distribution

    Continuous variables are often considered to be Normally distributed

  • 8/12/2019 8[1].Basic Stat Inference

    10/41

    Probability distribution

    Prob. Distribution is characterized by certain quantities called parameters

    These quantities allow us to calculate the probabilities of various events concerning the variable

    Eg. Binomial dist. has 2 parameters n and p.

    This distribution occurs when a fixed number (n) of subjects is observed, the characteristic is dichotomous in nature and each subject has the same probability (p) of having one value and (1-p) of having the other value

    The statistical inference then involves finding out the value of p in the population, based on a carefully selected sample

  • 8/12/2019 8[1].Basic Stat Inference

    11/41

    [Figure: Binomial distribution for n = 10 & p = 0.5. Horizontal axis: Number of successes (0 to 10). Vertical axis: probability (0.00 to 0.25).]
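
The bar heights in the figure can be recomputed directly from the two parameters. The following is a minimal sketch using only the Python standard library; the function name `binomial_pmf` is an illustrative choice, not something from the slides.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Parameters taken from the slide: n = 10 subjects, p = 0.5
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"{k:2d} successes: {prob:.4f}")
```

The probabilities sum to 1 and peak at 5 successes, which matches the tallest bar (just under 0.25) in the plot.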

  • 8/12/2019 8[1].Basic Stat Inference

    12/41

    Probability distribution

    Eg. The Normal distribution is a mathematical curve represented by two quantities (μ, σ): mean and standard deviation respectively.

    Most quantitative characteristics follow this

    Symmetric, Bell shaped curve

    One half is the mirror image of the other

    Mean, median & mode are same and are at center

    Mean ± 1 SD covers about 68% of the data, ± 2 SD about 95%, ± 3 SD about 99.7%
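
These coverage figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is an assumption made for illustration.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage(z: float) -> float:
    """Fraction of a Normal population lying within z SDs of the mean."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


for z in (1.0, 1.96, 2.0, 2.58, 3.0):
    print(f"Mean +/- {z:.2f} SD covers {100 * coverage(z):.1f}% of values")
```

The values 1.96 and 2.58 are included because they give almost exactly 95% and 99% coverage, which is why they appear in confidence-interval formulas.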

  • 8/12/2019 8[1].Basic Stat Inference

    13/41

    [Figure: Empirical properties of a Normal Deviate. X: Variable in original units; Z: Standardized variable. Mean ± 1 SD (Z = ±1) covers 68.3% of the area under the curve; Mean ± 1.96 SD (Z = ±1.96) covers 95.0%; Mean ± 2.58 SD (Z = ±2.58) covers 99.0%.]

  • 8/12/2019 8[1].Basic Stat Inference

    14/41

    Statistical inference

    Estimation: We estimate some characteristic of the population, based on a sample

    Testing: We test some hypotheses about the population parameters

  • 8/12/2019 8[1].Basic Stat Inference

    15/41

    Descriptive Studies

    In these, generally the objective is:

    To estimate the values of the parameters of the Prob. dist., based on the sampled observations

    Best guess of the value in the population and a measure of accuracy of this estimate are obtained

  • 8/12/2019 8[1].Basic Stat Inference

    16/41

    Estimation

    Best guesses

    Population mean: Mean of sample

    Population proportion: Sample proportion

    Considerations:

    Consistency: As the sample size increases, the estimates approach their target values

    Unbiased: The average value of the estimated parameter over a large number of repeated samples of same size will be equal to the population value

    Maximum likelihood: That value of the parameter which maximizes the probability of observing the sample that has been observed
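
The maximum likelihood idea can be illustrated with a Binomial proportion. The sketch below assumes hypothetical data (7 affected subjects out of 20, not from the slides) and searches a grid of candidate values of p for the one that makes the observed sample most probable.

```python
from math import comb


def binomial_likelihood(p: float, k: int, n: int) -> float:
    """Probability of observing k successes out of n if the true proportion is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical sample, for illustration only
k, n = 7, 20

# Candidate values of p: 0.001, 0.002, ..., 0.999
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: binomial_likelihood(p, k, n))

print(f"Maximum likelihood estimate: {p_hat:.3f}")
print(f"Sample proportion:           {k / n:.3f}")
```

The grid search lands on the sample proportion, which is why the sample proportion is the "best guess" for the population proportion.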

  • 8/12/2019 8[1].Basic Stat Inference

    17/41

    Accuracy of estimates

    When an estimate (E) of a parameter is obtained, we need to know how this value (E) would change if another sample is studied

    The distribution of values of E over different repeated samples (under identical conditions) is known as the sampling distribution of E

    This sampling distribution can be determined empirically or purely based on sampling theory

    The standard deviation of the estimate E is called the Standard Error (SE)
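
Determining a sampling distribution empirically can be sketched with a small simulation. The population values below (mean 0.68, SD 0.12, samples of 150) are assumptions chosen for illustration, and the population is assumed to be Normal.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical Normal population and sample size
POP_MEAN, POP_SD, N = 0.68, 0.12, 150
REPEATS = 2000


def sample_mean(n: int) -> float:
    """Draw one random sample of size n and return its mean (the estimate E)."""
    return mean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))


# Repeating the study many times gives the sampling distribution of E
estimates = [sample_mean(N) for _ in range(REPEATS)]

empirical_se = stdev(estimates)       # SD of the estimates = Standard Error
theoretical_se = POP_SD / N ** 0.5    # sampling theory: SD / sqrt(n)

print(f"Empirical SE:   {empirical_se:.4f}")
print(f"Theoretical SE: {theoretical_se:.4f}")
```

The two values should agree closely, showing that the empirical route and sampling theory describe the same quantity.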

  • 8/12/2019 8[1].Basic Stat Inference

    18/41

    Accuracy of Estimates

    Once the sampling distribution of the estimate is known, we can answer:

    How close is my estimate likely to be to the true value of the parameter?

    Can state with certain confidence that the true value will be within a certain interval (Confidence Interval)

  • 8/12/2019 8[1].Basic Stat Inference

    19/41

    Confidence Interval

    The more the confidence required, the wider the interval, for a given sample size

    Intuitively, the more information we have (larger sample), the smaller the width of the interval (the more certain we are about the result)
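
Both effects can be seen by computing the width of a Normal-based interval for a mean. This is a sketch under the assumption that the interval is mean ± z × SD/√n; the SD of 0.12 and the sample sizes are illustrative values.

```python
from statistics import NormalDist


def ci_width(sd: float, n: int, confidence: float) -> float:
    """Full width of a Normal-based confidence interval for a mean."""
    # z is the standard Normal quantile that leaves (1 - confidence)/2 in each tail
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sd / n ** 0.5


for confidence in (0.90, 0.95, 0.99):
    print(f"n = 150, {confidence:.0%} confidence: width = {ci_width(0.12, 150, confidence):.4f}")

for n in (150, 600, 2400):
    print(f"n = {n}, 95% confidence: width = {ci_width(0.12, n, 0.95):.4f}")
```

Because the width depends on 1/√n, the sample size has to be quadrupled to halve the interval.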

  • 8/12/2019 8[1].Basic Stat Inference

    20/41

    Estimation of parameters from a Normal population - Example

    The average Bone Mineral Density (BMD) of 150 elderly women (60+ years) is 0.678 gm/cm² with an SD of 0.12 gm/cm². What is the 95% C.I. of the mean BMD?

    Sample size (n) = 150, Mean = 0.678, SD = 0.12

    It has been shown that the sample Mean will have a Normal distribution centred on the population mean, with Standard Error SE = SD / √n

    SE = 0.12 / √150 ≈ 0.0098

  • 8/12/2019 8[1].Basic Stat Inference

    21/41

    Confidence Interval for BMD in elderly women

    We have the Mean = 0.678, SE = 0.0098; and we also know that the Mean follows a Normal Distribution

    Using the Normal distribution properties, we know that Mean ± 1.96 SE covers 95% of values

    0.678 ± (1.96 × 0.0098) = (0.659, 0.697)

    This interval is called the 95% CI for mean BMD

  • 8/12/2019 8[1].Basic Stat Inference

    22/41

    Interpretation of CI

    Mean BMD = 0.678 and 95% CI: (0.659, 0.697)

    If we repeat the study 100 times and compute the interval each time, about 95 of the 100 intervals will contain the true mean BMD

    Put another way:

    There is 95% chance that these two limits cover the true, unknown, but fixed value of BMD in the elderly women

    There is 95% chance that the truth is somewhere in this interval

    Do not get the impression that truth varies from 0.66 to 0.70

  • 8/12/2019 8[1].Basic Stat Inference

    23/41

    Interpretation of CI

    Mean BMD = 0.678 and 95% CI: (0.659, 0.697)

    The narrower the interval, the more confident we are of the result

    Alternatively, the wider this interval, the less certain we are about the result
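
The BMD interval can be recomputed from the three summary figures given in the example (n = 150, mean = 0.678, SD = 0.12). The sketch below assumes the standard formula SE = SD/√n, with the SD (not the mean) in the numerator, and the Normal multiplier 1.96.

```python
from math import sqrt

# Summary figures from the BMD example
n, mean_bmd, sd = 150, 0.678, 0.12

se = sd / sqrt(n)   # Standard Error of the mean
z_95 = 1.96         # Normal multiplier for 95% confidence

lower = mean_bmd - z_95 * se
upper = mean_bmd + z_95 * se

print(f"SE = {se:.4f} gm/cm2")
print(f"95% CI = ({lower:.3f}, {upper:.3f}) gm/cm2")
```

With these inputs the Standard Error is about 0.0098 and the interval is roughly 0.659 to 0.697 gm/cm².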

  • 8/12/2019 8[1].Basic Stat Inference

    24/41

    Analytical Studies

    Involve testing of hypothesis

    Study will have formulated research questions (hypotheses)

    Eg. Is treatment A superior to treatment B?

    Based on the observations from the sample, we need to draw conclusions

    Inference is a 2 step process:

    - Estimate the parameters

    - Test the hypotheses involving these parameters

  • 8/12/2019 8[1].Basic Stat Inference

    25/41

    Statistical Tests of Hypotheses

    Step 1: Identify the Null Hypothesis (H0)

    - No additional effect of the new treatment;

    - No difference in prevalence rates;

    - Relative risk is one etc.

    It should be testable

    - Possible to identify which parameters

    need to be estimated and their sampling

    distribution, given the study design

  • 8/12/2019 8[1].Basic Stat Inference

    26/41

    Statistical tests of Hypotheses

    Null hypothesis: cure rate p1= p2

    Alternative hypotheses to the null hypothesis:

    The cure rates are different (p1 ≠ p2: two-tailed alternative)

    The cure rate in new method is more (p2 > p1: one-tailed alternative)

  • 8/12/2019 8[1].Basic Stat Inference

    27/41

    Step 2: Determine the levels of errors that can be acceptable

    Decision     | Truth in the population
                 | H0 is true         | H0 is false
    Accept H0    | No error           | Type II error (β)
    Reject H0    | Type I error (α)   | No error

    Analogous with a laboratory test

    α: False Positivity

    β: False Negativity

    1 - β: Sensitivity (Power of a test)

  • 8/12/2019 8[1].Basic Stat Inference

    28/41

    Step 2: Determine the levels of errors that can be acceptable

    Decision     | Truth in the population
                 | H0 is true         | H0 is false
    Accept H0    | No error           | Type II error (β)
    Reject H0    | Type I error (α)   | No error

    For a given sample size, it is impossible to reduce both the errors simultaneously

    One decreases when the other increases

    Design the study with a desired level of α and minimize β

    The choice of α & β is made after determining the consequences of each of the errors and is made at the design stage itself
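
The trade-off between the two errors can be made concrete for a one-sided Z test. This is a sketch under stated assumptions: the test statistic is Normal with unit SD, and the true effect is 2.5 standard errors (a hypothetical value). The function name `type_ii_error` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution


def type_ii_error(alpha: float, shift: float) -> float:
    """beta for a one-sided Z test.

    alpha: chosen Type I error level.
    shift: true difference divided by its SE (hypothetical effect size).
    """
    critical = Z.inv_cdf(1 - alpha)   # reject H0 when the statistic exceeds this
    return Z.cdf(critical - shift)    # chance of NOT exceeding it when H0 is false


for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, shift=2.5)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}, power = {1 - beta:.3f}")
```

With the sample size and effect held fixed, tightening α from 0.10 to 0.01 raises β, which is the behaviour the slide describes.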

  • 8/12/2019 8[1].Basic Stat Inference

    29/41

    Step 3: Determine the best Statistical test for

    the stated Null hypothesis

    Depends on:

    Study design (Cross over or Independent

    groups, Paired or Unpaired observations etc.)

    Type of variable (Qualitative / Quantitative)

    The properties of the study variable

    (Binomial/Normal distribution, Standard

    Error of the estimate etc.)

  • 8/12/2019 8[1].Basic Stat Inference

    30/41

    Step 3: Determining the best test

    Common tests of significance

    t test

    Chi-square (χ²) test

    Z test

    Non-parametric tests

    Involves calculating a critical ratio that helps to make a decision

  • 8/12/2019 8[1].Basic Stat Inference

    31/41

    Tests of significance

    Critical Ratio (Test Statistic) = Parameter / SE of that parameter

    If we are comparing two proportions:

    Critical Ratio = Z = (Diff. between the two proportions) / (SE of the difference between the two proportions)
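
The critical ratio for two proportions might be computed as follows. The slides do not say how the SE of the difference is obtained; this sketch assumes the common pooled-proportion formula under the Null hypothesis, and the cure counts are hypothetical.

```python
from math import sqrt


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Z = difference between two proportions / SE of that difference.

    Assumes the pooled SE, which treats both groups as sharing one
    proportion under H0 (an assumption, not stated in the slides).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se_diff = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se_diff


# Hypothetical trial: 60/100 cured on the old method, 75/100 on the new one
z = two_proportion_z(60, 100, 75, 100)
print(f"Z = {z:.2f}")
```

A Z beyond ±1.96 would be significant at the 5% level for a two-tailed alternative.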

  • 8/12/2019 8[1].Basic Stat Inference

    32/41

    Step 4: Perform the Statistical Test

    - Calculate the test statistic (Z / χ² / t etc.)

    - Using the properties of the distribution of the test Statistic, obtain the probability of observing such an estimate of the Statistic

    - This probability is the probability of getting a value of the test statistic at least as extreme as the one observed, if the Null hypothesis is true

    - If this is small, Null hypothesis is an unlikely explanation for the results: Reject the Null hypothesis (Significant result). If not, do not reject the Null hypothesis

  • 8/12/2019 8[1].Basic Stat Inference

    33/41

    [Figure: Acceptance & Rejection regions for a paired t test. Acceptance region: |t| < t(n-1, 1-α/2). Rejection regions: t < -t(n-1, 1-α/2) and t > t(n-1, 1-α/2).]

  • 8/12/2019 8[1].Basic Stat Inference

    34/41

    Step 5: If the Null hypothesis is not rejected at the given level of significance, the statistical power of the test (1-β) should be computed

    Recall that β is the probability of accepting H0 when it is false. So 1-β will be the prob. of rejecting H0 when it is false. If this quantity is low, we recommend that the study be repeated with a larger sample
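
How power grows with sample size can be sketched for a two-sided Z test comparing two means. The assumptions here are equal group sizes, a common SD, and a Normal test statistic; the difference of 0.03 and SD of 0.12 are hypothetical values.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution


def power_two_sample_z(delta: float, sd: float, n_per_group: int,
                       alpha: float = 0.05) -> float:
    """Power (1 - beta) of a two-sided Z test for a true mean difference delta."""
    se = sd * sqrt(2 / n_per_group)        # SE of the difference in means
    critical = Z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    shift = abs(delta) / se                # true difference in SE units
    # Probability the statistic falls in either rejection region
    return Z.cdf(shift - critical) + Z.cdf(-shift - critical)


for n in (50, 100, 200, 400):
    print(f"n per group = {n:3d}: power = {power_two_sample_z(0.03, 0.12, n):.2f}")
```

Under these assumptions a study with 50 subjects per group would detect the difference only about a quarter of the time, so a non-significant result from it says little.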

  • 8/12/2019 8[1].Basic Stat Inference

    35/41

    Statistical test of Hypothesis: An example

    We wish to compare the BMD of Indian elderly

    women with Caucasian elderly women

    We hypothesize that Indian women will have lower

    BMD

    Our Null hypothesis: both groups will have equal

    BMD level

    Our alternative hypothesis is: the BMD levels of the two groups

    will be unequal

    We collected data on 150 Indian women and data

    on Caucasian women is available from literature

  • 8/12/2019 8[1].Basic Stat Inference

    36/41

    Statistical test of Hypothesis: An example

    Indian data: $n_1 = 150,\ \bar{x}_1 = 0.678,\ S_1 = 0.12$

    Caucasian data: $n_2 = 300,\ \bar{x}_2 = 0.854,\ S_2 = 0.11$

    Since the sample sizes are large, we can apply a test called Z test and Z statistic is calculated as:

    $$Z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

    Calculations give us a Z value:

    0.176/0.0117 = 15.04
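
The Z value can be reproduced from the two sets of summary figures. This sketch uses the unpooled SE of the difference, as in the formula for this example.

```python
from math import sqrt

# Summary data from the example
n1, mean1, sd1 = 150, 0.678, 0.12   # Indian women
n2, mean2, sd2 = 300, 0.854, 0.11   # Caucasian women (from literature)

# SE of the difference between the two means
se_diff = sqrt(sd1**2 / n1 + sd2**2 / n2)
z = abs(mean1 - mean2) / se_diff

print(f"Difference in means = {abs(mean1 - mean2):.3f}")
print(f"SE of difference    = {se_diff:.4f}")
print(f"Z                   = {z:.2f}")
```

Carrying the SE at full precision gives Z ≈ 15.07; the slide's 15.04 comes from rounding the SE to 0.0117 before dividing. The conclusion is the same either way.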

  • 8/12/2019 8[1].Basic Stat Inference

    37/41

    Statistical test of Hypothesis: An example

    From the data, we have Z = 15.04

    We know that Z follows a Normal distribution

    Using the properties of Normal distribution we realize that the probability of observing a value of Z this large, or more extreme in either direction, is < 0.000001, i.e. less than one in a million

    In other words, if our Null hypothesis is correct, our chance of finding a Z = 15.04 is extremely small

    We suspect the Null hypothesis and Reject it and conclude that both groups have statistically significantly different BMD levels
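
The two-tailed probability for a given Z can be computed from the Normal distribution. The sketch below uses `math.erfc` rather than `1 - cdf`, since the latter would round to zero for a Z this large; the function name is an illustrative choice.

```python
from math import erfc, sqrt


def two_sided_p_value(z: float) -> float:
    """Probability of a standard Normal value at least as extreme as z
    in either direction: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return erfc(abs(z) / sqrt(2))


print(f"Z = 1.96  -> p = {two_sided_p_value(1.96):.4f}")
print(f"Z = 15.04 -> p = {two_sided_p_value(15.04):.3g}")
```

For Z = 1.96 this returns about 0.05, the conventional cut-off. For Z = 15.04 the result is far below one in a million.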

  • 8/12/2019 8[1].Basic Stat Inference

    38/41

    Summary

    3 entities viz. Population, Variables & Probability distribution of the variables are important in Statistical Inference

    Estimation & Testing are 2 components of Statistical Inference

    Descriptive studies generally deal with estimation & Analytical studies deal with testing of hypotheses

  • 8/12/2019 8[1].Basic Stat Inference

    39/41

    Summary contd.

    Estimation is followed by a measure of accuracy: the Confidence Interval

    2 types of errors can be committed in statistical testing

    The Type I error level α (False Positivity) is the significance level against which the usual p value is compared

    The complement of type II error (False negativity) is called the Power of the test (Sensitivity)

    Test estimator is generally of the form of a ratio of 2 quantities

  • 8/12/2019 8[1].Basic Stat Inference

    40/41

    Summary contd.

    The calculated Ratio under given

    circumstances follows a known pattern called

    its distribution

    Using this distribution, we can know the probability of observing a Ratio of the

    magnitude that is observed, by chance alone

    If this chance probability is low, chance is

    unlikely to explain the observed result and we

    Reject the null hypothesis

  • 8/12/2019 8[1].Basic Stat Inference

    41/41

    Summary Contd..

    If the Null hypothesis is rejected, we attribute the

    observed difference to the exposure under

    consideration

    If the null hypothesis is not rejected (accepted), we

    should be sure that our data is sensitive enough to

    believe the negative result (statistical power

    should be calculated)