class notes, statistical methods in research 1

113
STA 6166 STATISTICAL RESEARCH METHODS I Demetris Athienitis Department of Statistics, University of Florida

Upload: chance-shafor

Post on 27-Sep-2015

224 views

Category:

Documents


4 download

DESCRIPTION

Class Notes, Statistical Methods in Research 1

TRANSCRIPT

  • STA 6166STATISTICAL RESEARCH METHODS I

    Demetris AthienitisDepartment of Statistics, University of Florida

  • Contents

    Contents 1

    I Part 1 Material 4

    1 Descriptive Statistics 51.1 Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.2 Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . 5

    1.2.1 Location . . . . . . . . . . . . . . . . . . . . . . . . . . 51.2.2 Spread . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.2.3 Effect of shifting and scaling measurements . . . . . . . 7

    1.3 Graphical Summaries . . . . . . . . . . . . . . . . . . . . . . . 71.3.1 Dot Plot . . . . . . . . . . . . . . . . . . . . . . . . . . 71.3.2 Histogram . . . . . . . . . . . . . . . . . . . . . . . . . 81.3.3 Box-Plot . . . . . . . . . . . . . . . . . . . . . . . . . . 91.3.4 Pie chart . . . . . . . . . . . . . . . . . . . . . . . . . . 111.3.5 Scatterplot . . . . . . . . . . . . . . . . . . . . . . . . 11

    2 Probability 132.1 Sample Space and Events . . . . . . . . . . . . . . . . . . . . 13

    2.1.1 Basic concepts . . . . . . . . . . . . . . . . . . . . . . . 132.1.2 Relating events . . . . . . . . . . . . . . . . . . . . . . 14

    2.2 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.3 Conditional Probability and Independence . . . . . . . . . . . 16

    2.3.1 Independent Events . . . . . . . . . . . . . . . . . . . . 172.3.2 Law of Total Probability . . . . . . . . . . . . . . . . . 182.3.3 Bayes Rule . . . . . . . . . . . . . . . . . . . . . . . . 19

    2.4 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . 202.4.1 Expected Value And Variance . . . . . . . . . . . . . . 232.4.2 Population Percentiles . . . . . . . . . . . . . . . . . . 252.4.3 Common Discrete Distributions . . . . . . . . . . . . . 252.4.4 Common Continuous Distributions . . . . . . . . . . . 272.4.5 Covariance . . . . . . . . . . . . . . . . . . . . . . . . . 302.4.6 Mean and variance of linear combinations . . . . . . . 322.4.7 Central Limit Theorem . . . . . . . . . . . . . . . . . . 33

    1

  • 3 Inference For Population Mean 353.1 Confidence intervals . . . . . . . . . . . . . . . . . . . . . . . . 35

    3.1.1 Large sample C.I. for population mean . . . . . . . . . 363.1.2 Small sample C.I. for population mean . . . . . . . . . 363.1.3 Sample size for a C.I. of fixed level and width . . . . . 38

    3.2 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . 393.2.1 One sample hypothesis tests . . . . . . . . . . . . . . . 393.2.2 Small sample test for population mean . . . . . . . . . 43

    II Part 2 Material 45

    4 Inference For Population Proportion 464.1 Large sample C.I. for population proportion . . . . . . . . . . 464.2 Large sample test for population proportion . . . . . . . . . . 47

    5 Inference For Two Population Means 485.1 Two Sample C.I.s . . . . . . . . . . . . . . . . . . . . . . . . 48

    5.1.1 Large sample C.I. for two means . . . . . . . . . . . . . 485.1.2 Small sample C.I. for two means . . . . . . . . . . . . . 495.1.3 Large sample C.I. for two population proportions . . . 505.1.4 C.I. for paired data . . . . . . . . . . . . . . . . . . . . 51

    5.2 Two Sample Hypothesis Tests (optional) . . . . . . . . . . . . 525.2.1 Large sample test for difference of two means . . . . . 525.2.2 Small sample test for difference of two means . . . . . 535.2.3 Large sample test for difference of two proportions . . . 545.2.4 Test for paired data . . . . . . . . . . . . . . . . . . . . 54

    5.3 Normal Probability Plot . . . . . . . . . . . . . . . . . . . . . 55

    6 Nonparametric Procedures For Population Location 576.1 Sign test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576.2 Wilcoxon rank-sum test . . . . . . . . . . . . . . . . . . . . . 596.3 Wilcoxon signed-rank test . . . . . . . . . . . . . . . . . . . . 60

    7 Inference About Population Variances 647.1 Inference On One Variance . . . . . . . . . . . . . . . . . . . . 647.2 Comparing Two Variances . . . . . . . . . . . . . . . . . . . . 667.3 Comparing t 2 Variances . . . . . . . . . . . . . . . . . . . . 67

    8 Contingency Tables 69

    III Part 3 Material 73

    9 Regression 749.1 Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . 74

    2

  • 9.1.1 Goodness of fit . . . . . . . . . . . . . . . . . . . . . . 779.1.2 Distribution of response and coefficients . . . . . . . . 799.1.3 Inference on slope coefficient . . . . . . . . . . . . . . . 809.1.4 C.I. on the mean response . . . . . . . . . . . . . . . . 819.1.5 Prediction interval . . . . . . . . . . . . . . . . . . . . 829.1.6 Checking assumptions . . . . . . . . . . . . . . . . . . 829.1.7 Box-Cox (Power) transformation . . . . . . . . . . . . 85

    9.2 Multiple Regression . . . . . . . . . . . . . . . . . . . . . . . . 889.2.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 889.2.2 Goodness of fit . . . . . . . . . . . . . . . . . . . . . . 899.2.3 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . 89

    10 Analysis Of Variance 10210.1 Completely Randomized Design . . . . . . . . . . . . . . . . . 102

    10.1.1 Post-hoc comparisons . . . . . . . . . . . . . . . . . . . 10610.1.2 Nonparametric procedure . . . . . . . . . . . . . . . . 108

    10.2 Randomized Block Design . . . . . . . . . . . . . . . . . . . . 10910.2.1 Nonparametric procedure . . . . . . . . . . . . . . . . 111

    3

  • Part I

    Part 1 Material

    4

  • Chapter 1

    Descriptive StatisticsChapter 3 in textbook

    1.1 Concept

    Definition 1.1. Population parameters are a numerical summary concerningthe complete collection of subjects, i.e. the population.

    The population parameters are notated by Greek symbols such as popu-lation mean .

    Definition 1.2. Sample statistics are a numerical summary concerning asubset of the population, i.e. the sample, from which we try to draw inferenceabout the population parameter.

    Sample statistics are notated by the hat symbol over the populationparameter such as the sample mean , or sometimes for convenience a symbolfrom the English alphabet. For the sample mean x.

    1.2 Summary Statistics

    Let x1, . . . , xn denote n observations/numbers.

    1.2.1 Location

    Themean is the arithmetic average of the observations. x = 1n

    ni=1 xi.

    The median is the center of the ordered data.

    If n is odd then the median is located at the (n+1)/2 position ofthe ordered data.

    If n is even the median is the average of two observations, the onelocated at the n/2 position and the (n/2) + 1 position.

    The mode is the most frequently encountered observation.

    5

  • The % trimmed mean is the mean of the data with the smallest% n observations and the largest % n observations truncatedfrom the data.

    The pthpercentile value divides the ordered data such that p% ofthe data are less than that value and (100-p)% greater than it. It islocated at (p/100)(n+1) position of the ordered data. If the positionvalue is not an integer then average the values at (p/100)(n+1) and(p/100)(n+ 1). The median is actually the 50th percentile.According to the textbook p.76, the jth ordered observation corre-sponds to the 100(j 0.5)/n percentile.

    Example 1.1. The following values of fracture stress (in megapascals) weremeasured for a sample of 24 mixtures of hot mixed asphalt (HMA).

    30 75 79 80 80 105 126 138 149 179 179 191223 232 232 236 240 242 245 247 254 274 384 470

    Hence, 24i=1 xi = 30 + 75 + + 384 + 470 = 4690 and thus x =4690/24 = 195.4167.

    The median is the average of the observations at the 12thand 13thpositionof the ordered data, i.e. x = (191 + 223)/2 = 207.

    There are three modes, 80, 179 and 232.

    To compute the 5% trimmed mean we need to remove 0.05(24) = 1.2 1 observations from the lower and upper side of the data. Hence remove30 and 470 and recalculate the average of those 22 observations. Thatis 190.45.

    The 25thpercentile (a.k.a. 1stQuartile) is located at (25/100)(24 +1) = 6.25 position. So average the values at 6thand 7thposition, i.e.(105+126)/2=115.5

    http://www.stat.ufl.edu/~athienit/STA6166/loc_stats.pdf

    Remark 1.1. Note that the mean is more sensitive to outliers-observationsthat do not fall in the general pattern of the rest of the data-than the median.Assume we have values 2, 3, 5. The mean is 3.33 and the median is 3. Assumewe now have 2, 3, 5, 112. The mean is 30.5 but the median is now 4.

    1.2.2 Spread

    The variance is a measure of spread of the individual observationsfrom their center as indicated by the mean.

    2 = s2 =1

    n 1

    n

    i=1

    (xi x)2 =1

    n 1

    ([n

    i=1

    x2i

    ]

    nx2)

    6

    http://www.stat.ufl.edu/~athienit/STA6166/loc_stats.pdf
  • The standard deviation is simply the square root of the variance inorder to return to the original units of measurement.

    The range is the maximum observation - minimum observation.

    The interquartile range (IQR) is 75thpercentile - 25thpercentile (orQ3 Q1).

    Example 1.2. Continuing from Example 1.1

    we havex2i = 1152494, and hence s2 = 123(115249424(195.4167)2) =10260.43 and s =

    10260.43 = 101.2938.

    The range is 470-30=440.

    The IQR is 243.5-115.5=128.

    http://www.stat.ufl.edu/~athienit/STA6166/loc_stats.pdf

    1.2.3 Effect of shifting and scaling measurements

    As we know measurements can be made in different scales, e.g. cm, m, km,etc and even different units of measurements, e.g. Kelvin, Celsius, Fahren-heit. Let us see how shifting and rescaling influence the mean and variance.Let x1, . . . , xn denote the data and define yi = axi + b, where a and b aresome constants. Then,

    y =1

    n

    yi =1

    n

    (axi + b) =1

    n

    (

    nb+ a

    xi

    )

    = ax+ b,

    and,

    s2y =1

    n 1

    (yiy)2 =1

    n 1

    (axi+baxb)2 =1

    n 1a2

    (xix)2 = a2s2x

    1.3 Graphical Summaries

    1.3.1 Dot Plot

    Stack each observation on a horizontal line to create a dot plot that gives anidea of the shape of the data. Some rounding of data values is allowed inorder to stack.

    7

    http://www.stat.ufl.edu/~athienit/STA6166/loc_stats.pdf
  • Fracture stress in mPa

    100 200 300 400

    Figure 1.1: Dot plot of data from Example 1.1

    1.3.2 Histogram

    1. Create class intervals (by choosing boundary points) in which to placethe data.

    2. Construct a Frequency Table.

    3. Draw a rectangle for each class.

    It is up to the researcher to decide how many class intervals to create.As a rule of thumb one creates about 2n1/3 classes. For Example 1.1 that is5.75 so we can either go with 5 or 6 classes.

    Class Interval Freq. Relative Freq. Density0 -

  • Remark 1.2. May use Frequency, Relative Frequency or Density as the ver-tical axis when class widths are equal. However, class widths are not nec-essarily equal; usually done to create smoother graphics if not mandated bythe situation at hand. If this is the case then we must use Density thataccounts for the width because large classes may have unrepresentative largefrequencies.

    http://www.stat.ufl.edu/~athienit/STA6166/hist1_boxplot1.R

    1.3.3 Box-Plot

    Box-Plot is a graphic that only uses quartiles. A box is created with Q1, Q2,and Q3. A lower whisker is drawn from Q1 down to the smallest data pointthat is within 1.5 IQR of Q1. Hence from Q1 = 115.5 down to Q11.5IQR =115.5 1.5(128) = 76.5, but we stop at the smallest point within thanwhich is 30. Similarly the upper whisker is drawn from Q3 = 243.5 toQ3 + 1.5IQR = 435.5 but we stop at the largest point within which is 384.

    100 200 300 400

    Figure 1.3: Box-Plot of data from Example 1.1

    Remark 1.3. Any point beyond the whiskers is classified as an outlier andany point beyond 3IQR from either Q1 or Q3 is classified as an extremeoutlier.

    http://www.stat.ufl.edu/~athienit/STA6166/hist1_boxplot1.R

    9

    http://www.stat.ufl.edu/~athienit/STA6166/hist1_boxplot1.Rhttp://www.stat.ufl.edu/~athienit/STA6166/hist1_boxplot1.R
  • These densities have shapes that can be described as:

    Symmetric

    3 2 1 0 1 2 3

    0.0

    0.1

    0.2

    0.3

    0.4

    0.5

    Symmetric Density Shapes

    =1=1.5=0.8

    Skewed Left and Right

    5 4 3 2 1 0 1

    0.0

    0.1

    0.2

    0.3

    0.4

    Skewed left

    1 0 1 2 3 4 5

    0.0

    0.1

    0.2

    0.3

    0.4

    Skewed right

    Bi-Modal (or more than two modes)

    4 2 0 2 4 6

    0.00

    0.05

    0.10

    0.15

    0.20

    10

  • 1.3.4 Pie chart

    A pie or circle has 360 degrees. For each category of a variable, the size of theslice is determined by the fraction of 360 that corresponds to that category.

    Example 1.3. There is a total of 337,297,000 native English speakers of theworld, categorizes as

    Country Pop. (1000) % of Total % of pieUSA 226,710 67.21 0.6721(360) = 241.97

    UK 56,990 16.90 60.83

    Canada 19,700 5.84 21.02

    Australia 15,316 4.54 16.35

    Other 18,581 5.51 19.83

    Total 337,297 100 360

    Table 1.2: Frequency table for native English speakers of 1997

    USA 67%

    UK 17%

    Canada 6%

    Australia 5%

    Other 6%

    Pie Chart of Countries

    Figure 1.4: Pie chart of English speaking countries

    http://www.stat.ufl.edu/~athienit/STA6166/pie.R

    1.3.5 Scatterplot

    It is used to plot the raw 2-D points of two variables in an attempt to discerna relationship.

    Example 1.4. A small study with 7 subjects on the pharmacodynamicsof LSD on how LSD tissue concentration affects the subjects math scoresyielded the following data.

    Score 78.93 58.20 67.47 37.47 45.65 32.92 29.97Conc. 1.17 2.97 3.26 4.69 5.83 6.00 6.41

    Table 1.3: Math score with LSD tissue concentration

    11

    http://www.stat.ufl.edu/~athienit/STA6166/pie.R
  • 1 2 3 4 5 6

    3040

    5060

    7080

    Scatterplot

    LSD tissue concentration

    Mat

    h sc

    ore

    Figure 1.5: Scatterplot of Math score vs. LSD tissue concentration

    http://www.stat.ufl.edu/~athienit/STA6166/scatterplot.R

    12

    http://www.stat.ufl.edu/~athienit/STA6166/scatterplot.R
  • Chapter 2

    ProbabilityChapter 4.1 - 4.5 in textbook.

    The study of probability began in the 17thcentury when gamblers startinghiring mathematicians to calculate the odds of winning for different types ofgames.

    2.1 Sample Space and Events

    2.1.1 Basic concepts

    Definition 2.1. The set of all possible outcomes of an experiment is calledthe sample space (S) for the experiment.

    Example 2.1. Here are some basic examples:

    Rolling a die. Then S = {1, 2, 3, 4, 5, 6}

    Tossing a quarter and a penny. S = {Hh, Ht, Th, Tt}

    Counting the number of flaws in my personality. S = {1, 2, . . .}

    Machine cuts rods of certain length (in cm). S = {x|5.6 < x < 6.4}

    Remark 2.1. Elements in S may not be equally weighted.

    Definition 2.2. A subset of a sample space is called an event.

    For instance the empty set = {} and the entire sample space S are alsoevents.

    Example 2.2. Let A be the event of an even outcome when rolling a die.Then, A = {2, 4, 6} S.

    13

  • 2.1.2 Relating events

    When we are concerned with multiple events within the sample space, VennDiagrams are useful to help explain some of the relationships. Lets illustratethis via an example.

    Example 2.3. Let,

    S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}A = {1, 3, 5, 7, 9}B = {6, 7, 8, 9, 10}

    Figure 2.1: Venn Diagram

    Combining events implies combining the elements of the events. For ex-ample,

    A B = {1, 3, 5, 6, 7, 8, 9, 10}.Intersecting events implies only listing the elements that the events havein common. For example,

    A B = {7, 9}.

    The complement of an event implies listing all the elements in the samplespace that are not in that event. For example,

    Ac = {2, 4, 6, 8, 10} (A B)c = {2, 4}.

    Definition 2.3. A collection of events A1, A2, . . . is mutually exclusive if notwo of them have any outcomes in common. That is, Ai Aj = , i, j

    In terms of the Venn Diagram, there is no overlapping between them.

    14

  • 2.2 Probability

    Notation: Let P (A) denote the probability that the event A occurs. It is theproportion of times that the event A would occur in the long run.

    Axioms of Probability:

    P (S) = 1

    0 P (A) 1, since A S

    If A1, A2, . . . are mutually exclusive, thenP (A1 A2 ) = P (A1) + P (A2) +

    As a result of the axioms we have that P (A) = 1 P (Ac) and thatP () = 0.Example 2.4. In a computer lab there are 4 computers and once a day atechnician inspects them and counts the number of computer crashes. Hence,S = {0, 1, 2, 3, 4} and

    Crashes Probability0 0.601 0.302 0.053 0.044 0.01

    Table 2.1: Probabilities for computer crashes

    Let A be the event that at least one crash occurs on a given day.

    P (A) = 0.30 + 0.05 + 0.04 + 0.01

    = 0.4

    or

    = 1 P (Ac)= 1 0.60= 0.4

    If S containsN equally likely outcomes/elements and the event A containsk( N) outcomes then,

    P (A) =k

    N

    Example 2.5. The experiment consists of rolling a die. There are 6 outcomesin the sample space, all of which are equally likely (assuming a fair die).Then, if A is the event of an outcome of a roll being even, A = {2, 4, 6} with3 elements so, P (A) = 3/6 = 0.5

    15

  • The axioms provide a way of finding the probability of a union of twoevents but only if they are mutually exclusive. What if they are not mutuallyexclusive? Show the formula via the following example

    Example 2.6. Lets continue from Example 2.3. We have shown that

    A B = {1, 3, 5, 6, 7, 8, 9, 10} and A B = {7, 9}.So,

    P (A B) = P (A) + P (B) P (A B).We need to subtract the probability of the intersection set {7, 9} as thatprobability was double counted since it is included within A and within B.

    2.3 Conditional Probability and Independence

    Definition 2.4. A probability that is based upon the entire sample space iscalled an unconditional probability, but when it is based upon a subset of thesample space it is a conditional (on the subset) probability.

    Definition 2.5. Let A and B be two events with P (B) 6= 0. Then theconditional probability of A given B (has occurred) is

    P (A|B) = P (A B)P (B)

    .

    The reason that we divide by the probability of given said occurrence, i.e.P (B) is to re-standardize the sample space. We update the sample space tobe just B, i.e. S = B and hence P (B|B) = 1. The only part of event A thatoccurs within this new S = B is P (A B).Proposition 2.1. Rule of Multiplication:

    If P (A) 6= 0, then P (A B) = P (B|A)P (A).

    If P (B) 6= 0, then P (A B) = P (A|B)P (B).Example 2.7. A player serving at tennis is only allowed one fault. At adouble fault the server loses a point/other player gains a point. Given thefollowing information:

    Serve 1

    Success

    0.56

    Serve 2

    Success

    0.98

    Loss of point0.02

    Fault 2

    0.44

    Fault 1

    16

  • What is the probability that the server loses a point, i.e. P (Fault 1 and Fault 2)?

    P (Fault 1 and Fault 2) = P (Fault 2|Fault 1)P (Fault 1) = (0.02)(0.44) = 0.009

    2.3.1 Independent Events

    When the given occurrence of one event does not influence the probabilityof a potential outcome of another event, then the two events are said to beindependent.

    Definition 2.6. Two events A and B are independent if the probability ofeach remains the same, whether or not the other has occurred. If P (A) 6=0, P (B) 6= 0, then

    P (B|A) = P (B) P (A|B) = P (A).

    If either P (A) = 0, or P (B) = 0, then the two events are independent.

    Definition 2.7. (Generalization) The events A1, . . . , An are independent iffor each Ai and each collection Aj1, . . . Ajm of events with P (Aj1 Ajm) 6=0,

    P (Ai|Aj1 Ajm) = P (Ai)

    As a consequence of independence, the rule of multiplication then says

    P (A B) = P (A|B)P (B) ind.= P (A)P (B),

    and in the general case

    P

    (k

    i=1Ai

    )

    =k

    i=1

    P (Ai) 0 < k n

    Example 2.8. Of the microprocessors manufactured by a certain process,20% of them are defective. Assume they function independently. Five mi-croprocessors are chosen at random. What is the probability that they willall work?

    Let Ai denote the event that the ithmicroprocessor works, for i = 1, 2, 3, 4, 5.

    Then,

    P (all work) = P (A1 A2 A3 A4 A5)= P (A1)P (A2)P (A3)P (A4)P (A5)

    = 0.85

    = 0.328

    17

  • 2.3.2 Law of Total Probability

    Recall that the sequence of events A1, . . . , An is mutually exclusive if no twopairs have any elements in common, i.e. Ai Aj = , i, j. We also say thatthe sequence is exhaustive if the union of all the events is the sample space,i.e. ni=1Ai = S.

    Proposition 2.2. Law of Total ProbabilityIf A1, . . . An are mutually exclusive and exhaustive events, and B is any event,then,

    P (B) =n

    i=1

    P (Ai B) = P (A1 B) + + P (An B).

    Equivalently, if P (Ai) 6= 0 for each Ai,

    P (B) =

    n

    i=1

    P (B|Ai)P (Ai) = P (B|A1)P (A1) + + P (B|An)P (An).

    To better illustrate this proposition let n = 4 and look at Figure 2.3.2

    Figure 2.2: Venn Diagram illustrating Law of Total Probability

    Example 2.9. Customers can purchase a car with three options for enginesizes

    Small 45% sold

    Medium 35% sold

    Large 20% sold

    Of the cars with the small engine 10% fail an emissions test within 10 yearsof purchase, while 12% fail of the medium and 15% of the large.

    18

  • What is the probability that a randomly chosen car will fail the emissionstest within 10 years?

    IN CLASS

    2.3.3 Bayes Rule

    In most cases P (B|A) 6= P (A|B). Bayes rule provides a method to calculateone conditional probability if we know the other one. It uses the rule ofmultiplication in conjunction with the law of total probability.

    Proposition 2.3. Bayess RuleSpecial Case: Let A and B be two events with P (A) 6= 0, P (Ac) 6= 0, andP (B) 6= 0. Then,

    P (A|B) = P (A B)P (B)

    =P (B|A)P (A)

    P (B|A)P (A) + P (B|Ac)P (Ac) .

    General Case: Let A1, . . . , An be mutually exclusive and exhaustive eventswith P (Ai) 6= 0 for each i = 1, . . . , n. Let B be any event with P (B) 6= 0.Then,

    P (Ak|B) =P (B|Ak)P (Ak)

    ni=1 P (B|Ai)P (Ai)

    .

    19

  • Example 2.10. In a telegraph signal a dot or dash is sent. Assume that

    P (dot sent) =3

    7, P (dash sent) =

    4

    7

    Suppose that there is some interference and with probability 1/8 a dot ismistankenly received on the other end as a dash, and vice versa.

    Find P (dot sent|dash received).

    IN CLASS

    2.4 Random Variables

    Chapter 4.6 - 4.10 in textbook.

    Definition 2.8. A random variable is a function that assigns a numericalvalue (between [0,1]) to each outcome in a sample space.

    It is an outcome characteristic that is unknown prior to the experiment.For example, an experiment may consist of tossing two dice. One poten-

    tial random variable could be the sum of the outcome of the two dice, i.e.X= sum of two dice. Then, X is a random variable. Another experimentcould consist of applying different amounts of a chemical agent and a poten-tial random variable could consist of measuring the amount of final productcreated in gramms.

    Quantitative random random variables can either be discrete, by whichthey have a countable set of possible values, or continuous which haveuncountably infinite.

    Notation: For a discrete random variable (r.v.) X , the probability distribu-tion is the probability of a certain outcome occurring, denoted as

    P (X = x) = pX(x).

    This is also called the probability mass function (p.m.f.).

    20

  • Notation: For a continuous random variable (r.v.) X , the probability den-sity function (p.d.f.), denoted by fX(x), models the relative frequency of X .Since there are infinitely many outcomes within an interval, the probabilityevaluated at a singularity is always zero, e.g. P (X = x) = 0, x, X being acontinuous r.v.

    Conditions for a function to be:

    p.m.f. 0 p(x) 1 and x p(x) = 1

    p.d.f. f(x) 0 and f(x)dx = 1

    Example 2.11. (Discrete) Suppose a storage tray contains 10 circuit boards,of which 6 are type A and 4 are type B, but they both appear similar. Aninspector selects 2 boards for inspection. He is interested in X = number oftype A boards. What is the probability distribution of X?

    The sample space of X is {0, 1, 2}. We can calculate the following:

    p(2) = P (A on first)P (A on second|A on first)= (6/10)(5/9) = 0.3333

    p(1) = P (A on first)P (B on second|A on first)+ P (B on first)P (A on second|B on first)

    = (6/10)(4/9) + (4/10)(6/9) = 0.5333

    p(0) = P (B on first)P (B on second|B on first)= (4/10)(3/9) = 0.1334

    Consequently,

    X = x p(x)

    0 0.1334

    1 0.5333

    2 0.3333

    Total 1.0

    Table 2.2: Probability Distribution of X

    21

  • Example 2.12. (Continuous) The lifetime of a certain battery has a distri-bution that can be approximated by f(x) = 0.5e0.5x, x > 0.

    0 2 4 6 8

    0.0

    0.1

    0.2

    0.3

    0.4

    0.5

    Lifetime in 100 hours

    Den

    sity

    Figure 2.3: Probability density function of battery lifetime.

    Notation: You may recall thatf(t)dt is contrived from lim

    f(ti)i. Hence

    for the following definitions and expressions we will only be using notationfor continuous variables and wherever you see

    simply replace it with

    .

    Definition 2.9. The cumulative distribution function (c.d.f.) of a r.v. X isdenoted by FX(x) and defined as

    FX(x) = P (X x) = x

    f(t)dt

    (

    discrete=

    txp(t)

    )

    Example 2.13. Example 2.11 continued.Find F (1). That is,

    F (1) = P (X 1)= P (X = 0) + P (X = 1)

    = 0.1334 + 0.5333

    = 0.6667

    Example 2.14. Example 2.12 continued.Find F (1). That is,

    F (1) =

    1

    f(x)dx

    =

    0

    0dx+

    1

    0

    0.5e0.5xdx

    = 0 + (e0.5x)|10= 0.3935

    22

  • 2.4.1 Expected Value And Variance

    The expected value of a r.v. is thought of as the long term average for thatvariable. Similarly, the variance is thought of as the long term average ofvalues of the r.v. to the expected value.

    Definition 2.10. The expected value (or mean) of a r.v. X is

    X := E(X) =

    xf(x)dx

    (

    discrete=

    x

    xp(x)

    )

    .

    In actuality, this definition is a special case of a much broader statement.

    Definition 2.11. The expected value (or mean) of function h() of a r.v. Xis

    E(h(X)) =

    h(x)f(x)dx.

    Due to this last definition, if the function h performs a simple lineartransformation, such as h(t) = at+ b, for constants a and b, then

    E(aX + b) =

    (ax+ b)f(x)dx = a

    xf(x)dx+ b

    f(x)dx = aE(X) + b

    Example 2.15. Referring back to Example 2.11, the expected value of thenumber of type A boards (X) is

    E(X) =

    x

    xp(x) = 0(0.1334) + 1(0.5333) + 2(0.3333) = 1.1999.

    We can also calculate the expected value of (i) 5X + 3 and (ii) 3X2.

    (i) 5(1.1999) + 3 = 8.995.

    (ii) 3(02)(0.1334) + 3(12)(0.5333) + 3(22)(0.3333) = 5.5995

    Definition 2.12. The variance of a r.v. X is

    2X := V (X) = E[(X X)2

    ]

    =

    (x X)2f(x)dx

    =

    (x2 2xX + 2X)f(x)dx

    =

    x2f(x)dx 2X

    xf(x)dx+ 2X

    f(x)dx

    = E(X2) 2E2(X) + E2(X)= E(X2) E2(X)

    23

  • Example 2.16. We know that E(X) = 1.1999 and E(X2) = 02(0.1334) +12(0.5333) + 22(0.3333) = 1.8665. Thus,

    V (X) = E(X2)E2(X)= 1.8665 1.19992= 0.42674

    Example 2.17. IN CLASS. Variance for Example 2.12

    24

  • Definition 2.13. The variance of a function h of a r.v. X is

    V (h(X)) =

    [h(x) E(h(x))]2f(x)dx

    = E(h2(X)) E2(h(X))

    Notice that if h stands for a linear transformation function then,V (aX + b) = . . . IN CLASS

    2.4.2 Population Percentiles

    Let X be a continuous r.v. with p.d.f. f and c.d.f. F . The populationpthpercentile, xp is found by solving the following equation for xp

    F (xp) =

    xp

    f(t)dt =

    p

    100.

    Example 2.18. Let r.v. X have p.d.f. f(x) = 0.5e0.5x, x > 0. The medianof X is found by solving for xm in

    F (xm) =

    xm

    0

    0.5e0.5tdt = 0.5.

    We note that xm

    0

    0.5e0.5tdt =0.5

    0.5e0.5t|xm0

    = e0.5xm (e0)= e0.5xm + 1.

    Hence, we need to solve

    e0.5xm + 1 = 0.50.5xm = log 0.5

    xm = 2 log 0.5 = 1.386294

    Example 2.19. Example 2.11, IN CLASS

    2.4.3 Common Discrete Distributions

    Bernoulli

    Imagine an experiment where the r.v. X can take only two possible outcomes,success (X = 1) with some probability p and failure (X = 0) with probability1 p. The p.m.f. of X is

    p(x) = px(1 p)1x x = 0, 1 0 p 1

    25

  • and we denote this by stating X Bernoulli(p). The mean of X is

    E(X) =

    x

    xp(x) = 0p(0) + 1p(1) = p,

    and the variance is

    V (X) = E(X2) E2(X) = [02p(0) + 12p(1)] p2 = p p2 = p(1 p).Example 2.20. A die is rolled and we are interested in whether the outcomeis a 6 or not. Let,

    X =

    {

    1 if outcome is 6

    0 otherwise

    Then, X Bernoulli(1/6) with mean 1/6 and variance 5/36.

    Binomial

    If X1, . . . , Xn correspond to n Bernoulli trials conducted where

    the trials are independent

    each trial has identical probability of success p

    the r.v. X is the total number of successesthen X =

    ni=1Xi Bin(n, p). The the intuition behind the form of the

    p.m.f. can be motivated by the following example.

    Example 2.21. A fair coin is tossed 10 times and X = the number of headsis recorded. What is the probability that X = 3?

    One possible outcome is

    (H) (H) (H) (T) (T) (T) (T) (T) (T) (T)

    The probability of this outcome occurring in exactly this order is p3(1 p)7.However there are

    (103

    )possible ways of 3 Heads and 7 Tails since order is

    not important.

    Consequently, the p.m.f. of X Bin(n, p) is

    p(x) =

    (n

    x

    )

    px(1 p)nx, x = 0, 1, . . . , n

    with E(X) = np and V (X) = np(1 p).Another variable of interest concerning experiments with binary outcomes

    is the proportion of successes p = X/n. Note that p is simply the r.v. Xmultiplied by a constant, 1/n. Hence,

    E(p) = E(X/n) =np

    n= p

    and

    V (p) = V (X/n) =1

    n2V (X) =

    np(1 p)n2

    =p(1 p)

    n

    26

  • Example 2.22. A die is rolled 4 times and the number of 6s is observed.Find the probability that there is at least one 6.

    Let, X be the number of 6s which implies X Bin(4, 1/6).

    P (X 1) =4

    i=1

    (4

    i

    )(1

    6

    )i(

    1 16

    )4i

    = 1 P (X < 1)= 1 P (X = 0)

    = 1(4

    0

    )(1

    6

    )0(

    1 16

    )4

    = 0.518

    Also, E(X) = 4(1/6) = 2/3 and V (X) = 4(1/6)(5/6) = 5/9. The ex-pected value of the proportion of 6s which is E(p) = 1/6 and has varianceV (p) = (5/36)/4 = 5/144.

    2.4.4 Common Continuous Distributions

    Uniform

    A continuous r.v. that places equal weight to all values within its support,[a, b], a b, is said to be a uniform r.v. It has p.d.f.

    f(x) =1

    b a a x b

    0 1 2 3 4 5 6

    0.0

    0.1

    0.2

    0.3

    0.4

    0.5

    Uniform Distribution

    Figure 2.4: Density function of Uniform[1, 5].

    Hence if X Uniform[a, b] then E(X) = a+b2

    and V (X) = (ba)2

    12.

    27

  • Example 2.23. Waiting time for the delivery of a part from the warehouseto certain destination is said to have a uniform distribution from 1 to 5 days.What is the probability that the delivery time is two or more days?

    Let X Uniform[1, 5]. Then, f(x) = 0.25 for 1 x 5 and hence

    P (X 2) = 5

    2

    0.25dt = 0.75.

    Normal

    The normal distribution (Gaussian distribution) is by far the most importantdistribution in statistics. The normal distribution is identified by a locationparameter and a scale parameter 2(> 0). A normal r.v. X is denoted asX N(, 2) with p.d.f.

    f(x) =1

    2

    e1

    22(x)2 < x <

    3 2 1 0 1 2 3

    0.0

    0.1

    0.2

    0.3

    0.4

    Normal Distribution

    Figure 2.5: Density function of N(0, 1).

    It is symmetric, unimodal, bell shaped with E(X) = and V (X) = 2.

    Notation: A normal random variable with mean 0 and variance 1 is calleda standard normal r.v. It is usually denoted by Z N(0, 1). The c.d.f.of a standard normal is given at the end of the textbook and also availableonline (z-table) so that probabilities, which can be expressed in terms of c.d.fcan be conveniently obtained.

    Example 2.24. Find P (2.34 < Z < 1). From the relevant remark,

    P (2.34 < Z < 1) = P (Z < 1)P (Z < 2.34) = 0.15870.0096 = 0.1491

    28

    http://www.stat.ufl.edu/~dathien/Tables/Ztable.pdf
  • If Z is standard normal then it has mean 0 and variance 1. Now if wetake a linear transformation of Z, say X = aZ + b, then

    E(X) = E(aZ + b) = aE(Z) + b = b

    andV (X) = V (aZ + b) = a2V (Z) = a2.

    This fact together with the following proposition allows us to express anynormal r.v. as a linear transformation of the standard normal r.v. Z bysetting a = and b = .

    Proposition 2.4. The r.v. X that is expressed as the linear transformationZ + , is a also a normal r.v. with E(X) = and V (X) = 2.

    Linear transformations are completely reversible, so given a normal r.v.X with mean and variance 2 we can revert back to a standard normal by

    Z =X

    .

    As a consequence any probability statements made about an arbitrary normalr.v. can be reverted to statements about a standard normal r.v.

    Example 2.25. Let X N(15, 7). Find P (13.4 < X < 19.0).We begin by noting

    P (13.4 < X < 19.0) = P

    (13.4 15

    7 5 and n(1 p) > 5

    Example 2.30. At a university the mean age of students is 22.3 and thestandard deviation is 4. A random sample of 64 students is to be drawn.What is the probability that the average age of the sample will be greaterthan 23?

    By the CLT

    Xapprox. N

    (

    22.3,42

    64

    )

    .

    So we need to find

    P (X > 23) = P

    (

    X 22.34/

    (64)>

    23 22.34/

    (64)

    )

    = P (Z > 1.4)

    = 0.0808

    33

  • Example 2.31. At a university assume it is known that 25% of students areover 21. In a sample of 400 what is the probability that more than 110 ofthem are over 21?

    IN CLASS

    34

  • Chapter 3

    Inference For Population Mean

    Chapter 5 in textbook.

    3.1 Confidence intervals

    When a population parameter is estimated by a sample statistic such as = x, the sample statistic is a point estimate of the parameter. Due tosampling variability the point estimate will vary from sample to sample.

    An alternative or complementary approach is to report an interval ofplausible values based on the point estimate sample statistic and its standarddeviation (a.k.a. standard error). A confidence interval (C.I.) is calculatedby first selecting the confidence level, the degree of reliability of the interval.A 100(1)% C.I. means that the method by which the interval is calculatedwill contain the true population parameter 100(1 )% of the time. Thatis, if a sample is replicated multiple times, the proportion of times that theC.I. will not contain the population parameter is .

    Figure 3.1: Multiple confidence intervals from different samples

    35

  • 3.1.1 Large sample C.I. for population mean

    Let X1, . . . , Xn be i.i.d. N(, 2) with unknown mean and known variance

    2 (both assumed finite). Then, X N(, 2/n). However, if the samplesize is large enough we do not require that Xi be normal r.vs. The centrallimit theorem guarantees that the sample mean X is normal.

    Let zc stand for the value of Z N(0, 1) such that P (Z > zc) = c. Hencethe proportion of C.Is containing the population parameter is,

    1 = P(

    z/2 30 and the assumptions of C.L.T. are satisfied

    In the small sample setting with n 30 we must assume that the data arederived from a normal distribution, since we cannot use the C.L.T. Then, if2 is known the 100(1 )% C.I. for is

    x z/2n. (3.2)

    However, when 2 is unknown, simply replacing with the sample statistics is not sufficient, as s in no longer considered an accurate estimate dueto the small sample size.

    In higher level statistics the distribution of s2 is found, as it is a statisticthat depends on the random variables X1, . . . , Xn and it is shown that

    X s/n

    tn1 (3.3)

    where tn1 stands for Students-t distribution with parameter degrees of free-dom = n1. A Students-t distribution is similar to the standard normalexcept that it places more weight to extreme values as seen in Figure 3.1.2.

    4 2 0 2 4

    0.0

    0.1

    0.2

    0.3

    0.4

    Density Functions

    N(0,1)t_4

    Figure 3.2: Standard normal and t4 probability density functions

    It is important to note that Students-t is not just similar to the stan-dard normal but asymptotically (as n ) is the standard normal. Onejust needs to view the t-table to see that under infinite degrees of freedom thevalues in the table are exactly the same as the ones found for the standardnormal. Intuitively then, using Students-t when 2 is unknown makes sense

    37

    http://www.stat.ufl.edu/~dathien/Tables/Ttable.pdf
  • as it adds more probability to extreme values due to the uncertainty placedby estimating 2.

    The 100(1 )% C.I. for is then

    x t(n1,/2)sn. (3.4)

    Remark 3.2. To be technically correct then when 2 in known one should useequation (3.2) and when it is unknown, equation (3.4). It is common practice,for convenience mainly, to use equation (3.2) even when 2 is unknown butthe sample size is large. As discussed earlier under this scenario s2 is a goodestimate of 2 and the values in the t-table and z-table are very close toeachother.

    Example 3.2. Suppose that a sample of 36 resistors is taken with x = 10and s2 = 0.7. A 95% C.I. for is

    10 t(35,0.025)

    2.03

    0.7

    36= (9.71693, 10.28307)

    Note: If the exact degrees of freedom are not in the table, use the closest one.Since n > 30 you may see in practice see used equation (3.2) for the

    reasons discussed. The 95% C.I. using that method would be

    10 z0.025

    1.96

    0.7

    36= (9.726691, 10.273309)

    3.1.3 Sample size for a C.I. of fixed level and width

    The price paid for a higher confidence level, for the same sample statistics, isa wider interval - try this at home using different values. We know that asthe sample size n increases the standard deviation of X , /

    n decreases and

    consequently so does the margin of error. Thus, knowing some preliminaryinformation such as a rough estimate for can help us determine the samplesize needed to obtain a fixed margin of error.

    Using equation (3.2), the width of the interval is twice the margin of error

    width = 2z/2n.

    Thus,n = 2z/2

    width n

    (

    2z/2

    width

    )2

    .

    Example 3.3. In Example 3.1 we had that x = 12.05 and s = 0.1 for the100 boxes, leading to a 95% C.I. for the true mean width 0.0392 or 0.0196(3.1). Boss man requires a 95% C.I. of 0.0120.

    38

  • IN CLASS

    3.2 Hypothesis Testing

    A statistical hypothesis is a claim about a population characteristic (and onoccasion more than one). An example of a hypothesis is the claim that thepopulation is some value, e.g. = 0.75.

    Definition 3.1. The null hypothesis, denoted by H0, is the hypothesis thatis initially assumed to be true.

    The alternative hypothesis, denoted by Ha or H1, is the complementaryassertion to H0 and is usually the hypothesis, the new statement that wewish to test.

    A test procedure is created under the assumption of H0 and then it isdetermined how likely that assumption is compared to its complement Ha.The decision will be based on

    Test statistic, a function of the sampled data.

    Rejection region, the set of all test statistic values for which H0 willbe rejected.

    The basis for choosing a particular rejection region lies in an understandingof the errors that can be made.

    Definition 3.2. A type I error consists of rejecting H0 when it is actuallytrue.

    A type II error consists of failing to reject H0 when in actuality H0 isfalse.

    The type I error is generally considered to be the most serious one and dueto limitations we can only control for one, so the rejection region is chosenbased upon the maximum probability of a type I error that a researcher iswilling to accept.

    3.2.1 One sample hypothesis tests

    We motivate the test procedure by an example whereby the drying timeof a certain type of paint, under fixed environmental conditions, is known

    39

  • to be normally distributed with mean 75 min. and standard deviation 9min. Chemists have added a new additive that is believed to decrease dryingtime and have obtained a sample of 35 drying times and wish to test theirassertion. Hence,

    H0 : 75 (or = 75)Ha : < 75

    An obvious candidate for a test statistic, that is an unbiased estimator of thepopulation mean, is X which is normally distributed. If the data were notknown to be normally distributed the normality of X can also be confirmedby the C.L.T. Thus, under the null assumption H0

    XH0 N

    (

    75,92

    35

    )

    ,

    or equivalentlyX 75

    935

    H0 N(0, 1).

    Since we wish to control for the type I error, we set P (type I error) = .The default value of is usually taken to be 5%.

    0

    0.0

    0.1

    0.2

    0.3

    0.4

    Standard Normal

    1.645

    =0.05 area=0.05 area

    Figure 3.3: Rejection region equivalent to 0.05

    Then if the test statistic value,

    T.S. =x 75

    935

    ,

    40

  • is in the blue region, i.e. T.S. < z0.05, then H0 is rejected. We assume thatsample mean x is a good estimate for and hence x should be closeto 0, which implies T should be close to zero. However, if it is not, then itimplies that = 75 was not a good hypothesis value for the true mean andconsequently that T was not centered correctly.

    Assume that x = 70.8 from the 35 samples. Then T.S. = 2.76, which isin the rejection region and we reject H0 at the = 0.05 level. Equivalently,a conclusion can be reached in hypothesis/significance testing by using thep-value.

    Definition 3.3. The p-value of a hypothesis test is the probability of ob-serving the specific value of the test statistic, T.S., or a more extreme value,under the null hypothesis. The direction of the extreme values is indicatedby the alternative hypothesis.

    Therefore, in this example values more extreme than -2.76 are {x|x 30 and hence X is normallydistributed. To test

    41

  • (i) H0 : 0 vs Ha : > 0(ii) H0 : 0 vs Ha : < 0(iii) H0 : = 0 vs Ha : 6= 0at the significance level, compute the test statistic

    T.S. =x 0/

    n. (3.5)

    Reject the null if(i) T.S. > z

    (ii) T.S. < z(iii) |T.S.| > z/2

    (i) p-value=P (Z > T.S.) <

    (ii) p-value=P (Z < T.S.) <

    (iii) p-value=P (|Z| > |T.S.|) < Remark 3.3. Is is unknown and instead s is used, one should be usingStudents-t and the relevant t-table instead of the z-table, but since thesample size is large the two distributions are equivalent.

    Example 3.4. A scale is to be calibrated by weighing a 1000g weight 60times. From the sample we obtain x = 1000.6 and s = 2. Test whether thescale is calibrated correctly.

    IN CLASS

    42

  • Example 3.5. A company representative claims that the number of callsarriving at their center is no more than 15/week. To investigate the claim, 36random weeks were selected from the companies records with a sample meanof 17 and sample standard deviation of 3. Do the sample data contradictthis statement?

    First we begin by stating the hypotheses of

    H0 : 15 vs Ha : > 15

    The test statistic is

    T.S. =17 153/36

    = 4

    The conclusion is that there is significance evidence to reject H0 as the p-valueis very close to 0.

    3.2.2 Small sample test for population mean

    If the sample size is small, i.e. n 30, then the C.L.T. is not applicablefor X and therefore we must assume that the individual r.vs. X1, . . . , Xncorresponding to the sample are normal r.vs with mean and variance 2.Then, by Proposition 2.5 we have that X N(, 2/n) and we can proceedexactly as in equation (3.5).

    However, if is unknown, which is usually the case, we replace it by itssample estimate s. Consequently,

    X 0s/n

    H0 tn1,

    and the for an observed value X = x, the test statistic becomes

    T.S. =x 0s/n.

    At the significance level, for the same hypothesis tests as before, we rejectH0 if

    (i) T.S. > t(n1,)

    (ii) T.S. < t(n1,)(iii) |T.S.| > t(n1,/2)

    (i) p-value=P (tn1 > T.S.) <

    (ii) p-value=P (tn1 < T.S.) <

    (iii) p-value=P (|tn1| > |T.S.|) <

    Example 3.6. In an ergonomic study, 5 subjects were chosen to study themaximin weight of lift (MAWL) for a frequency of 4 lifts/min. Assuming theMAWL values are normally distributed, does the following data suggest thatthe population mean of MAWL exceeds 25?

    25.8, 36.6, 26.3, 21.8, 27.2

    IN CLASS

    43

  • Remark 3.4. The values contained within a two-sided 100(1 )% C.I. areprecisely those values for which the p-value of a two sided hypothesis testwill be greater than .

    Example 3.7. The lifetime of single cell organism is believed to be on av-erage 257 hours. A small preliminary study was conducted to test whetherthe average lifetime was different when the organism was placed in a certainmedium. The measurements are assumed to be normally distributed andturned out to be 253, 261, 258, 255, and 256. The hypothesis test is

    H0 : = 257 vs. Ha : 6= 257

    With x = 256.6 and s = 3.05, the test statistic value is

    T.S. =256.6 2573.05/

    5

    = 0.293.

    The p-value is P (|t4| > | 0.293|) = P (t4 < 0.293) + P (t4 > 0.293) =0.7839. Hence, since the p-value is large (> 0.05) we fail to reject H0 andconclude that population mean is not statistically different from 257.

    Instead of a hypothesis test if a two sided 95% was constructed by

    256.6 t(4,0.025)

    2.776

    3.055

    (252.81, 260.39),

    it clear that the null hypothesis value of = 257 is a plausible value andconsequently H0 is plausible, so it is not rejected.

    44

  • Part II

    Part 2 Material

    45

  • Chapter 4

    Inference For PopulationProportion

    Chapter 10.1 - 10.2 in textbook.

    4.1 Large sample C.I. for population propor-

    tion

    In the binomial setting experiments had binary outcomes and of interest wasthe number of successes out of the total number of trials. Let X be the totalnumber of successes, then X Bin(n, p). Once an experiment is conductedand data obtained an estimate for p can be obtained,

    p =x

    n

    which is an average. It is the total number of successes over the total numberof trials. As such, if the number of successes and number of failures aregreater than 5, the C.L.T. tells us that

    p N(

    p,p(1 p)

    n

    )

    .

    Then a 100(1 )% C.I. can be created as in equation (3.2),

    p z/2

    p(1 p)n

    .

    This is the classical approach for when the sample size is large. This cannotbe used for the small sample framework as the C.L.T. is not applicable. Anexact version exists in the field nonparametric statistics. However, there doesexist an interval similar to classical version that works relatively well for smallsample sizes (not too small) and is equivalent for large sample sizes. It iscalled the Agresti-Coull 100(1 )% C.I.,

    p z/2

    p(1 p)n

    ,

    46

  • where n := n + 4, and p := (x+ 2)/n.Note: The instructor will be using the Agresti-Coull interval on the exam andquizes, however the current textbook will be using the classical approach.

    Example 4.1. A map and GPS application for a smartphone was tested foraccuracy. The experiment yielded 26 error out of the 74 trials. Find the 90%C.I. for the proportion of errors.

    Since n = 74 and x = 26, then n = 74 + 4 and p = (26 + 2)/78 = 0.359.Hence the 90% C.I. for p is

    0.359 z0.05

    1.645

    0.359(1 0.359)78

    (0.269, 0.448)

    4.2 Large sample test for population propor-

    tion

    Let X be the number of successes in n Bernoulli trials with probability ofsuccess p, then X Bin(n, p). We know by the the C.L.T. that under certainregularity conditions (number of successes and number of failures is greaterthan 5), then p N(p, p(1 p)/n). To test

    (i) H0 : p p0 vs Ha : p > p0(ii) H0 : p p0 vs Ha : p < p0(iii) H0 : p = p0 vs Ha : p 6= p0we must assume, under the null hypothesis H0, that the number of successesand failures is greater than 5, i.e. np0 > 5 and n(1 p0) > 5, such that

    pH0 N

    (

    p0,p0(1 p0)

    n

    )

    .

    The test statistic is

    T.S. =p p0

    p0(1p0)n

    ,

    and the r.v. corresponding to the test statistic has a standard normal distri-bution under the null hypothesis assumption. Reject the null if

    (i) T.S. > z

    (ii) T.S. < z(iii) |T.S.| > z/2

    (i) p-value=P (Z > T.S.) <

    (ii) p-value=P (Z < T.S.) <

    (iii) p-value=P (|Z| > |T.S.|) <

    47

  • Chapter 5

    Inference For Two PopulationMeans

    Chapter 6 in textbook.

    5.1 Two Sample C.I.s

    There are instances when a C.I. for the difference between two means is ofinterest when one wishes to compare the sample mean from one populationto the sample mean of another.

    5.1.1 Large sample C.I. for two means

    Let X1, . . . , XnX and Y1, . . . , YnY represent two independent random largesamples with nX > 40, nY > 40 with means X , Y and variances

    2X ,

    2Y

    respectively. A simple application of the C.L.T. implies that X and Y arenormal random variables. Proposition 2.5 allows us to find the distributionof the random variable K corresponding to the difference X Y . Hence, Kis a normal random variable with

    E(K) = E(X Y ) = X Y ,

    and

    V (K) = V (X Y ) = 2X

    nX+

    2YnY

    .

    Therefore,

    K := X Y N(

    X Y ,2XnX

    +2YnY

    )

    ,

    and hence a 100(1 )% C.I. for the difference of X Y is

    x y z/2

    2XnX

    +2YnY

    .

    48

  • Once again, if the variances are unknown we can replace them with thesample variances due to the large sample size. In addition, we could useStudents-t critical values instead of the z-score, z/2, (as the variances areunknown) but large sample sizes imply that the t-score will be approximatelyequal to the z-score.

    Example 5.1. In an experiment, 50 observations of soil NO3 concentration(mg/L) were taken at each of two (independent) locations X and Y . Thedescriptive statistics are: x = 88.5, sX = 49.4, y = 110.6 and sY = 51.5.Construct a 95% C.I. for the difference in means and interpret.

    IN CLASS

    5.1.2 Small sample C.I. for two means

    As in Section 3.1.2, with small sample sizes we must assume thatX1, . . . , XnXare i.i.d N(X ,

    2X) and Y1, . . . , YnY are i.i.d N(Y ,

    2Y ) with the two sample

    being independent of one another. As in equation (3.3)

    X Y (X Y )

    s2X

    nX+

    s2Y

    nY

    t

    where

    =

    (s2XnX

    +s2YnY

    )2

    (s2X/nX)2

    nX1 +(s2

    Y/nY )2

    nY 1

    . (5.1)

    Hence the 100(1 )% for X Y is

    x y t(,/2)

    s2XnX

    +s2YnY

    .

    Example 5.2. Two methods are considered standard practice for surfacehardening. For Method A there were 15 specimens with a mean of 400.9(N/mm2) and standard deviation 10.6. For Method B there were also 15specimens with a mean of 367.2 and standard deviation 6.1. Assuming the

    49

  • samples are independent and from a normal distribution the 98% C.I. forA B is

    400.9 367.2 t,0.01

    10.62

    15+

    6.12

    15

    where

    =

    (10.62

    15+ 6.1

    2

    15

    )2

    (10.62/15)2

    14+ (6.1

    2/15)2

    14

    = 22.36 = 22

    and hence t22,0.01 = 2.508 giving a 98% C.I. of (25.8,41.6).

    Remark 5.1. When population variances are believed to be equal, i.e. 2X 2Y we can improve on the estimate of variance by using a pooled or weightedaverage estimate. If in addition to the regular assumptions, if we can assumeequality of variances then the 100(1 )% C.I. for X Y is

    x y t(nX+nY 2,/2)sp

    1

    nX+

    1

    nY,

    with

    sp =

    (nX 1)s2X + (nY 1)s2YnX + nY 2

    .

    The assumption that the variances are equal must be made a priori and notused simply because the two variances may be close in magnitude.

    Example 5.3. Consider Example 5.2 but now assume that 2X 2Y . A98% C.I. for the difference of X Y constructed with

    sp =

    14(10.62) + 14(6.12)

    28= 8.648

    is

    400.9 367.2 t(28,0.01)

    2.467

    (8.648)

    2

    15 (25.9097, 41.4903)

    How is this interval different from the one in Example 5.2?

    5.1.3 Large sample C.I. for two population proportions

    A simple extension of Section 4.1 to the two sample framework yields the100(1 )% C.I. for the difference of two population proportions. Let X Bin(nX , pX) and Y Bin(nY , pY ) be two independent binomial r.vs. DefinenX = nX + 2 and pX = (x+ 1)/nX , similarly for Y . Then the 100(1 )%C.I. for pX pY is

    pX pY z/2

    pX(1 pX)nX

    +pY (1 pY )

    nY.

    50

  • Intuitively, since proportions are between 0 and 1, the difference of two pro-portions must lie between -1 and 1. Hence if the bounds of a C.I. are outsidethe intuitive ones, they should be replaced by the intuitive bounds.

    Example 5.4. In a clinical trial for a pain medication, 394 subjects wereblindly administered the drug, while an independent group of 380 were givena placebo. From the drug group, 360 showed an improvement. From theplacebo group 304 showed improvement. Construct a 95% C.I. for the dif-ference and interpret.

    IN CLASS

    5.1.4 C.I. for paired data

    There are instances when two samples are not independent, when a rela-tionship exists between the two. For example, before treatment and aftertreatment measurements made on the same experimental subject are depen-dent on eachother through the experimental subject. This is a common eventin clinical studies where the effectiveness of a treatment, that may be quan-tified by the difference in the before and after measurements, is dependentupon the individual undergoing the treatment. Then, the data is said to bepaired.

    Consider the data in the form of the pairs (X1, Y1), (X2, Y2), . . . , (Xn, Yn).We note that the pairs, i.e. two dimensional vectors, are independent as theexperimental subjects are assumed to be independent with marginal expec-tations E(Xi) = X and E(Yi) = Y for all i = 1, . . . , n. By defining,

    D1 = X1 Y1D2 = X2 Y2...

    Dn = Xn Yna two sample problem has been reduced to a one sample problem. Inferencefor X Y is equivalent to one sample inference on D as was done inChapter 3. This holds since,

    D := E(D) = E

    (

    1

    n

    n

    i=1

    Di

    )

    = E

    (

    1

    n

    n

    i=1

    Xi Yi)

    = E(XY ) = XY .

    51

  • In addition we note that the variance of D does incorporate the covariancebetween the two samples as

    2D := V (D) = V

    (

    1

    n

    n

    i=1

    Di

    )

    =1

    n2

    n

    i=1

    V (Di) =2X +

    2Y 2XYn

    .

    Example 5.5. A new and old type of rubber compound can be used intires. A researcher is interested in a compound/type that does not weareasily. Ten random cars were chosen at random that would go around atrack a predetermined number of times. Each car did this twice, once foreach tire type and the depth of the tread was then measured.

    Car1 2 3 4 5 6 7 8 9 10

    New 4.35 5.00 4.21 5.03 5.71 4.61 4.70 6.03 3.80 4.70Old 4.19 4.62 4.04 4.72 5.52 4.26 4.27 6.24 3.46 4.50D 0.16 0.38 0.17 0.31 0.19 0.35 0.43 -0.21 0.34 0.20

    With d = 0.232 and sD = 0.183. Assuming that the data are normallydistributed, a 95% C.I. for new old = D is

    0.232 t9,0.025

    2.262

    0.18310

    (0.101, 0.363)

    and we note that the interval is strictly greater than 0, implying that thatthe difference is positive, i.e. that new > old

    5.2 Two Sample Hypothesis Tests (optional)

    5.2.1 Large sample test for difference of two means

Let $X_1, \ldots, X_{n_X}$ and $Y_1, \ldots, Y_{n_Y}$ represent two independent random large samples, with $n_X > 40$, $n_Y > 40$, with means $\mu_X, \mu_Y$ and variances $\sigma^2_X, \sigma^2_Y$ respectively. We have seen in Section 5.1.1, by virtue of the C.L.T., that

$$\bar{X} - \bar{Y} \sim N\left(\mu_X - \mu_Y,\ \frac{\sigma^2_X}{n_X} + \frac{\sigma^2_Y}{n_Y}\right).$$

    To test

(i) $H_0: \mu_X - \mu_Y \le \Delta_0$ vs $H_a: \mu_X - \mu_Y > \Delta_0$

(ii) $H_0: \mu_X - \mu_Y \ge \Delta_0$ vs $H_a: \mu_X - \mu_Y < \Delta_0$

(iii) $H_0: \mu_X - \mu_Y = \Delta_0$ vs $H_a: \mu_X - \mu_Y \ne \Delta_0$


we assume that the variances are known and the test statistic is

$$T.S. = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\sigma^2_X/n_X + \sigma^2_Y/n_Y}}.$$

The r.v. corresponding to the test statistic has a standard normal distribution under the null hypothesis $H_0$, that $\mu_X - \mu_Y = \Delta_0$. Reject the null if

(i) $T.S. > z_\alpha$

(ii) $T.S. < -z_\alpha$

(iii) $|T.S.| > z_{\alpha/2}$

or equivalently if

(i) p-value $= P(Z > T.S.) < \alpha$

(ii) p-value $= P(Z < T.S.) < \alpha$

(iii) p-value $= P(|Z| > |T.S.|) < \alpha$

If the variances $\sigma^2_X$ and $\sigma^2_Y$ are unknown, it is acceptable to replace them by their sample estimates, or alternatively to use a t distribution as shown in the next section.
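A minimal R sketch of this z-test from summary statistics; the helper function and the illustrative numbers below are my own, not from the notes:

    # Large sample z-test for muX - muY with null value Delta0
    two.mean.z <- function(xbar, ybar, s2x, s2y, nx, ny, Delta0 = 0,
                           alternative = "two.sided") {
      ts <- (xbar - ybar - Delta0)/sqrt(s2x/nx + s2y/ny)
      p  <- switch(alternative,
                   greater   = pnorm(ts, lower.tail = FALSE),
                   less      = pnorm(ts),
                   two.sided = 2*pnorm(abs(ts), lower.tail = FALSE))
      c(T.S. = ts, p.value = p)
    }
    two.mean.z(5.1, 4.8, 1.2, 1.5, 50, 60)   # made-up numbers for illustration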

    5.2.2 Small sample test for difference of two means

Inference via hypothesis testing is analogous to Section 5.1.2, which is an extension of the large sample methodology. However, since the C.L.T. is not applicable, we must assume that the two random samples are normally distributed and independent.

If the variances are known, the test statistic is

$$T.S. = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\sigma^2_X/n_X + \sigma^2_Y/n_Y}},$$

which has a standard normal distribution under $H_0$. Reject $H_0$ same as before.

Usually the variances are unknown and have to be estimated; then the test statistic is

$$T.S. = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{s^2_X/n_X + s^2_Y/n_Y}},$$

which has a t distribution under $H_0$, where the degrees of freedom are given by equation (5.1).

Remark 5.2. As in Remark 5.1, when the population variances are believed to be equal, i.e. $\sigma^2_X \approx \sigma^2_Y$, we can improve on the estimate of variance, and hence obtain a more powerful test, by using a pooled estimate of the variance. If, in addition to the regular assumptions, we can assume equality of variances, then replace both $s_X$ and $s_Y$ with

$$s_p = \sqrt{\frac{(n_X - 1)s^2_X + (n_Y - 1)s^2_Y}{n_X + n_Y - 2}},$$

and the degrees of freedom for the t distribution by $n_X + n_Y - 2$.
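In R, both versions are available through t.test; a short sketch with simulated stand-in data (the data below are not from the notes):

    set.seed(1)
    x <- rnorm(12, mean = 10, sd = 2)   # stand-in sample 1
    y <- rnorm(15, mean = 9,  sd = 2)   # stand-in sample 2
    t.test(x, y)                        # Welch t-test, df from equation (5.1)
    t.test(x, y, var.equal = TRUE)      # pooled t-test, df = nX + nY - 2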


5.2.3 Large sample test for difference of two proportions

Let $X \sim \text{Bin}(n_X, p_X)$ and $Y \sim \text{Bin}(n_Y, p_Y)$ represent two independent binomial r.v.s from two Bernoulli trial experiments. To test

(i) $H_0: p_X - p_Y \le 0$ vs $H_a: p_X - p_Y > 0$

(ii) $H_0: p_X - p_Y \ge 0$ vs $H_a: p_X - p_Y < 0$

(iii) $H_0: p_X - p_Y = 0$ vs $H_a: p_X - p_Y \ne 0$

we must assume that the number of successes and failures is greater than 10 for both samples. As the null hypothesis values for $p_X$ and $p_Y$ are not available, we simply check that the sample successes and failures are greater than 10. By virtue of the C.L.T.,

$$\hat{p}_X - \hat{p}_Y \overset{H_0}{\sim} N\left(0,\ \frac{p_X(1-p_X)}{n_X} + \frac{p_Y(1-p_Y)}{n_Y}\right),$$

and the test statistic would be constructed in the usual way. However, under $H_0$ it is assumed that $p_X = p_Y$, which implies that the two variances are equal, and therefore, in light of Remark 5.1, we can replace $\hat{p}_X$ and $\hat{p}_Y$ in the variance by the pooled estimate

$$\hat{p} = \frac{x + y}{n_X + n_Y}.$$

The test statistic is then

$$T.S. = \frac{\hat{p}_X - \hat{p}_Y - 0}{\sqrt{\hat{p}(1-\hat{p})(1/n_X + 1/n_Y)}},$$

and the r.v. corresponding to the test statistic has a standard normal distribution under the null hypothesis.
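A minimal R sketch of the pooled test, reusing the counts of Example 5.4 purely for illustration (variable names are mine):

    x <- 360; nx <- 394; y <- 304; ny <- 380
    phat.x <- x/nx; phat.y <- y/ny
    pbar <- (x + y)/(nx + ny)                  # pooled estimate
    ts <- (phat.x - phat.y)/sqrt(pbar*(1 - pbar)*(1/nx + 1/ny))
    2*pnorm(abs(ts), lower.tail = FALSE)       # two-sided p-value
    # prop.test(c(x, y), c(nx, ny), correct = FALSE) runs the equivalent
    # chi-square test; its X-squared statistic equals ts^2.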

    5.2.4 Test for paired data

In the event that two samples are dependent, i.e. paired, such as when two different measurements are made on the same experimental unit, the inference methodology must be adapted to account for the dependence/covariance between the two samples.

Refer to Section 5.1.4, where we consider the data in the form of the pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ and construct the one-dimensional, i.e. one-sample, $D_1, D_2, \ldots, D_n$, where $D_i = X_i - Y_i$ for all $i = 1, \ldots, n$. As shown earlier, $\bar{D} = \bar{X} - \bar{Y}$ and the variance term $\sigma^2_{\bar{D}}$ incorporates the covariance between $X$ and $Y$.

    To test

(i) $H_0: \mu_X - \mu_Y = \mu_D \le D_0$ vs $H_a: \mu_X - \mu_Y = \mu_D > D_0$


(ii) $H_0: \mu_X - \mu_Y = \mu_D \ge D_0$ vs $H_a: \mu_X - \mu_Y = \mu_D < D_0$

(iii) $H_0: \mu_X - \mu_Y = \mu_D = D_0$ vs $H_a: \mu_X - \mu_Y = \mu_D \ne D_0$

perform a one-sample hypothesis test, by either large or small sample inference, using the test statistic

$$T.S. = \frac{\bar{d} - D_0}{\sigma_D/\sqrt{n}} \qquad\text{or}\qquad T.S. = \frac{\bar{d} - D_0}{s_D/\sqrt{n}}.$$

    5.3 Normal Probability Plot

A probability plot is a graphical technique for comparing two data sets: either two sets of empirical observations, or one empirical set against a theoretical set.

Definition 5.1. The empirical distribution function, or empirical c.d.f., is the cumulative distribution function associated with the empirical measure of the sample. This c.d.f. is a step function that jumps up by $1/n$ at each of the $n$ data points:

$$F_n(x) = \frac{\text{number of elements} \le x}{n} = \frac{1}{n}\sum_{i=1}^{n} I\{x_i \le x\}.$$

    Example 5.6. Consider the sample: 1, 5, 7, 8. The empirical c.d.f. is

$$F_4(x) = \begin{cases} 0 & \text{if } x < 1 \\ 0.25 & \text{if } 1 \le x < 5 \\ 0.50 & \text{if } 5 \le x < 7 \\ 0.75 & \text{if } 7 \le x < 8 \\ 1 & \text{if } x \ge 8 \end{cases}$$

Figure 5.1: Empirical c.d.f. (step plot of $F_4(x)$ against $x$).
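In R, the empirical c.d.f. of Example 5.6 can be reproduced with the base function ecdf; a minimal sketch:

    x <- c(1, 5, 7, 8)
    Fn <- ecdf(x)       # the step function of Definition 5.1
    Fn(6)               # evaluates to 0.50
    plot(Fn, verticals = FALSE, main = "Empirical c.d.f.")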


The normal probability plot is a graphical technique for normality testing: assessing whether or not a data set is approximately normally distributed. The data are plotted against a theoretical normal distribution in such a way that the points should form an approximate straight line. Departures from this straight line indicate departures from normality.

There are two types of plots commonly used to compare the empirical c.d.f. to the theoretical normal one ($G(\cdot)$): the P-P plot, which plots $(F_n(x), G(x))$ (with scales changed to look linear), and, in wider use, the Q-Q plot, which plots the quantile functions $(F_n^{-1}(x), G^{-1}(x))$.

Example 5.7. An experiment measuring lead concentrations (mg/kg dry weight) at 37 stations yielded 37 observations. Of interest is to determine whether the data are normally distributed (of more practical use if sample sizes are small, e.g. < 30).

Figure 5.2: Normal Q-Q plot of the lead concentration data (sample quantiles against theoretical quantiles).

    http://www.stat.ufl.edu/~dathien/STA6166/QQplot.R

    INTERPRETATION OF FIGURES
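The linked script is not reproduced here; the following is a generic sketch of how such a plot is drawn in base R, with simulated stand-in data since the 37 lead readings are not listed in the notes:

    set.seed(2)
    lead <- rexp(37, rate = 1/50)   # stand-in for the 37 lead readings
    qqnorm(lead)                    # sample vs. theoretical normal quantiles
    qqline(lead)                    # reference line through the quartiles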


Chapter 6

Nonparametric Procedures For Population Location

When the sample size is small and we cannot assume that the data are normally distributed, we must use exact nonparametric procedures to perform inference on population central values. Instead of means we will be referring to medians ($\eta$) and other location concepts, as they are less influenced by outliers, which can have a drastic impact (especially) on small samples.

    6.1 Sign test

    Section 5.8 in textbook.

Recall that the median is the 50th percentile, so we expect 50% of the data to fall above that value. Let $B$ be the number of observations that are strictly greater than the hypothesized median $\eta_0$. (This will be the test statistic irrespective of the type of hypothesis test.) By definition of the median, we expect a 50-50 chance that an observation is above the median. Therefore, $B \sim \text{Bin}(n, 0.5)$. To test the hypotheses

(i) $H_0: \eta \le \eta_0$ vs $H_a: \eta > \eta_0$

(ii) $H_0: \eta \ge \eta_0$ vs $H_a: \eta < \eta_0$

(iii) $H_0: \eta = \eta_0$ vs $H_a: \eta \ne \eta_0$

we reject $H_0$ if the p-value $< \alpha$. We illustrate the calculation of the p-value with the following example.

    Example 6.1. Pulse rates for a sample of 15 students were:

    60, 62, 72, 60, 63, 75, 64, 68, 63, 60, 52, 64, 82, 68, 64


To test $H_0: \eta \ge 65$ vs $H_a: \eta < 65$ we have $B = 5$. The p-value (i.e. the probability of observing the test statistic or a value more extreme) is

$$\text{p-value} = P(B \le 5 \mid B \sim \text{Bin}(15, 0.5)) = P(B=0) + \cdots + P(B=5) = \sum_{i=0}^{5}\binom{15}{i}0.5^i\,0.5^{15-i} = 0.1509.$$

Hence, we fail to reject $H_0$. In R we would simply run
binom.test(5,15,alternative="less")

    How does one calculate the p-value for a two-sided test? IN CLASS
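For reference, R's default two-sided alternative computes an exact two-sided p-value, summing $P(B = k)$ over all outcomes $k$ no more likely than the observed one:

    binom.test(5, 15, p = 0.5)   # alternative = "two.sided" is the default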

Remark 6.1. If we wanted to test the location of the 70th percentile, then $B \sim \text{Bin}(n, 0.3)$.

Remark 6.2. There is also a normal approximation (shown in the textbook), but we will stick to the exact method.


6.2 Wilcoxon rank-sum test

    Section 6.3 in textbook.

This is one of the most widely used two sample tests for location differences between two populations (treatments). Assume that two independent samples $X_1, \ldots, X_{n_X}$ are i.i.d. with c.d.f. $F_1(\cdot)$ and $Y_1, \ldots, Y_{n_Y}$ are i.i.d. with c.d.f. $F_2(\cdot)$. The null hypothesis $H_0: F_1(x) = F_2(x)\ \forall x$ is tested against

(i) $Y$'s tend to be smaller than the $X$'s.

(ii) $Y$'s tend to be larger than the $X$'s.

(iii) One of the two populations is shifted from the other.

To conduct the test we

• first rank all the $(n_X + n_Y)$ data irrespective of sample,

• calculate the sum of the ranks associated with the smallest sample (if the sample sizes are equal, the choice of "smallest" is irrelevant; usually go with the first sample).

    H0 is rejected if

(i) $T_X \ge T_U$ if $n_X \le n_Y$; $T_Y \le T_L$ if $n_X > n_Y$

(ii) $T_X \le T_L$ if $n_X \le n_Y$; $T_Y \ge T_U$ if $n_X > n_Y$

(iii) $T_X \ge T_U$ or $T_X \le T_L$ if $n_X \le n_Y$; $T_Y \ge T_U$ or $T_Y \le T_L$ if $n_X > n_Y$

where the critical values $T_U$ and $T_L$ can be found in Table 5 (Table 6 in the textbook), where the first sample is the smaller one (done for convenience). In practice though, R can provide exact p-values.


    http://www.stat.ufl.edu/~athienit/Tables/tables.pdf
Example 6.2. Two groups of 10 subjects did not know whether they were receiving alcohol or a placebo, and their reaction times (in seconds) were recorded.

    Placebo  0.90  0.37  1.63  0.83  0.95  0.78  0.86  0.61  0.38  1.97
    Alcohol  1.46  1.45  1.76  1.44  1.11  3.07  0.98  1.27  2.56  1.32

Test whether the distribution of reaction times for the placebo is shifted to the left of that for alcohol (case (ii)). The ranks are:

                                                    Sum
    Placebo   7   1  16   5   8   4   6   3   2  18   70
    Alcohol  15  14  17  13  10  20   9  11  19  12  140

The test statistic is $T = 70$. From Table 6(b), $T_L = 83$ and $T_U = 127$. Since $T \le T_L$ we reject $H_0$.
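The same test in R, as a sketch; note that wilcox.test reports the Mann-Whitney form $W = T - n_X(n_X+1)/2$ rather than the rank sum $T$ itself:

    placebo <- c(0.90, 0.37, 1.63, 0.83, 0.95, 0.78, 0.86, 0.61, 0.38, 1.97)
    alcohol <- c(1.46, 1.45, 1.76, 1.44, 1.11, 3.07, 0.98, 1.27, 2.56, 1.32)
    wilcox.test(placebo, alcohol, alternative = "less")
    # W = 70 - 10*11/2 = 15 here, with an exact p-value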

Remark 6.3. Notice that the table only provides critical values for $n_X \le 10$ and $n_Y \le 10$. For larger values, you may use

• other tables online,

• the normal approximation

$$z = \frac{T - n_1(n_1 + n_2 + 1)/2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}$$

(as shown in textbook p. 254), or

• software such as R, which gives you exact p-values:
http://www.stat.ufl.edu/~athienit/STA6166/wilcox_1.R

Remark 6.4. If there are ties in the data, then the values that are tied get the average of the ranks they would have received if not tied. For example, the ranks of the data 0.3, 0.5, 0.5, 0.7 are 1, 2.5, 2.5, 4, as the two values of 0.5 would have received ranks 2 and 3 if they were slightly different.

    6.3 Wilcoxon signed-rank test

    Section 6.5 in textbook.

To test for location differences between the $X$ and $Y$ components in the i.i.d. pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$, we take the differences $D_i = X_i - Y_i$ (as in Section 5.1.4) and test

$H_0$: the distribution of the $D_i$'s is symmetric about the null value $D_0$, against the alternatives


(i) $D_i$'s tend to be larger than $D_0$, i.e. $X$'s tend to be larger than $Y$'s by an amount of $D_0$ or greater.

(ii) $D_i$'s tend to be smaller than $D_0$, i.e. $X$'s tend to be smaller than $Y$'s by an amount of $D_0$ or greater.

(iii) $D_i$'s tend to be consistently larger or smaller than $D_0$, i.e. $X$'s tend to be consistently different from $Y$'s by an amount of $D_0$ or greater.

The test procedure consists of

• calculating the differences $d_i = (x_i - y_i) - D_0$,

• discarding any $d_i = 0$ from the data,

• ranking $|d_1|, \ldots, |d_n|$ from smallest to largest,

• calculating

$T_+$ = sum of ranks corresponding to positive $d_i$'s,
$T_-$ = sum of ranks corresponding to negative $d_i$'s.

$H_0$ is rejected if

(i) $T_- < T_c$

(ii) $T_+ < T_c$

(iii) $\min\{T_-, T_+\} < T_c$

where $T_c$ is the critical value found in Table 6 (Table 7 in the textbook).

Remark 6.5. The table of critical values is limited, but there does exist a normal approximation, provided in the textbook, for larger sample sizes; or one can simply use software like R.

Example 6.3. A city park department compared two fertilizers, A and B, on 20 softball fields. Each field was divided in half, with each fertilizer used on one half. The effect of the fertilizer was measured in the pounds (lbs) of grass clippings produced.

Since it is not specified in the problem, we consider as the alternative hypothesis the (general) two-sided alternative (case (iii)) with $D_0 = 0$.


    http://www.stat.ufl.edu/~athienit/Tables/tables.pdf
    Field    A      B      D    Rank(|D|)    Field    A      B      D    Rank(|D|)
      1    211.4  186.3   25.1     15          11    208.9  183.6   25.3    17.5
      2    204.4  205.7   -1.3      1          12    208.7  188.7   20.0     8
      3    202.0  184.4   17.6      7          13    213.8  188.6   25.2    16
      4    201.9  203.6   -1.7      2          14    201.6  204.2   -2.6     4
      5    202.4  180.4   22.0     14          15    201.8  181.6   20.1     9
      6    202.0  202.0    0        0          16    200.3  208.7   -8.4     6
      7    202.4  181.5   20.9     13          17    201.8  181.5   20.3    10
      8    207.1  186.7   20.4     11          18    201.5  208.7   -7.2     5
      9    203.6  205.7   -2.1      3          19    212.1  186.8   25.3    17.5
     10    216.0  189.1   26.9     19          20    203.4  182.9   20.5    12

$$T_+ = 15 + 7 + 14 + 13 + 11 + 19 + 17.5 + 8 + 16 + 9 + 10 + 17.5 + 12 = 169$$

$$T_- = 1 + 2 + 3 + 4 + 6 + 5 = 21$$

The test statistic is $T = 21$, which is smaller than $T_c = 46$ (from Table 7 with $n = 19$ nonzero differences, $\alpha = 0.05$), and we reject $H_0$.

We can conclude that fertilizers A and B differ, and since $T_+$ is greater than $T_-$, that A produces more clippings than B.
http://www.stat.ufl.edu/~athienit/STA6166/wilcox_2.R
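A sketch of the same analyses in R; with ties and a zero difference present, wilcox.test warns and switches to a normal approximation, so its p-values will be approximate:

    A <- c(211.4, 204.4, 202.0, 201.9, 202.4, 202.0, 202.4, 207.1, 203.6, 216.0,
           208.9, 208.7, 213.8, 201.6, 201.8, 200.3, 201.8, 201.5, 212.1, 203.4)
    B <- c(186.3, 205.7, 184.4, 203.6, 180.4, 202.0, 181.5, 186.7, 205.7, 189.1,
           183.6, 188.7, 188.6, 204.2, 181.6, 208.7, 181.5, 208.7, 186.8, 182.9)
    wilcox.test(A, B, paired = TRUE)             # case (iii), D0 = 0
    wilcox.test(A, B, paired = TRUE, mu = 5,
                alternative = "greater")         # Remark 6.6 below, D0 = 5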

Remark 6.6. Suppose that type B was the old fertilizer and that a sales agent approached the city council with a claim that their new fertilizer (type A) was better, in that it would produce 5 or more pounds more of grass clippings than B.

The alternative hypothesis is case (i) with $D_0 = 5$. As a result we obtain the following table, where now $D = (A - B) - 5$.

    Field   A-B     D    Rank(|D|)    Field   A-B     D    Rank(|D|)
      1     25.1   20.1     16          11    25.3   20.3     18.5
      2     -1.3   -6.3      2          12    20.0   15.0      9
      3     17.6   12.6      7          13    25.2   20.2     17
      4     -1.7   -6.7      3          14    -2.6   -7.6      5
      5     22.0   17.0     15          15    20.1   15.1     10
      6      0     -5.0      1          16    -8.4  -13.4      8
      7     20.9   15.9     14          17    20.3   15.3     11
      8     20.4   15.4     12          18    -7.2  -12.2      6
      9     -2.1   -7.1      4          19    25.3   20.3     18.5
     10     26.9   21.9     20          20    20.5   15.5     13

$$T_+ = 16 + 7 + 15 + 14 + 12 + 20 + 18.5 + 9 + 17 + 10 + 11 + 18.5 + 13 = 181$$

$$T_- = 2 + 3 + 1 + 4 + 5 + 8 + 6 = 29$$


The test statistic is $T = 29$, which is smaller than $T_c = 60$ (from Table 7 with $n = 20$ nonzero differences, $\alpha = 0.05$), and we reject $H_0$.


Chapter 7

Inference About Population Variances

    Chapter 7 in textbook.

    7.1 Inference On One Variance

The sample statistic $s^2$ is widely used as the point estimate for the population variance $\sigma^2$, and, similar to the sample mean, it varies from sample to sample and has a sampling distribution.

Let $X_1, \ldots, X_n$ be i.i.d. r.v.s. We already have some tools that help us determine the distribution of $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, a function of the r.v.s; hence $\bar{X}$ is a r.v. itself, and once a sample is collected a realization $\bar{X} = \bar{x}$ is observed. Similarly, let

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

be a function of the r.v.s $X_1, \ldots, X_n$, and hence a r.v. itself. A realization of this r.v. is the sample variance $s^2$. If $X_1, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$, then

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$

where $\chi^2_{n-1}$ denotes a chi-square distribution with $(n-1)$ degrees of freedom. Let $\chi^2_{(n-1),\alpha}$ denote the critical value of a $\chi^2_{n-1}$ distribution such that the area to the right is $\alpha$.


    http://en.wikipedia.org/wiki/Chi-squared_distribution
Figure 7.1: $\chi^2$ distribution and critical value $\chi^2_{(n-1),\alpha}$ (right-tail area $\alpha$).

Consequently,

$$1 - \alpha = P\left(\chi^2_{(n-1),1-\alpha/2} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{(n-1),\alpha/2}\right)$$