Stat Review - Keller


TRANSCRIPT

  • 7/24/2019 Stat Review -Keller

    1/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    !hat is "tatistics#

    Statistics is a way to get information from data

    Data

    Statistics

    Information

    Data: Facts, especially

    numerical facts, collected

    together for reference or

    information.

    Definitions: Oxford English Dictionary

    Information: Knowledge

    communicated concerning

    some particular fact.

Statistics is a tool for creating new understanding from a set of numbers.

  • 7/24/2019 Stat Review -Keller

    2/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    $ey "tatistical Concepts%

    Population

    apopulationis the group of allitems of interest toa statistics practitioner.

    frequently very large; sometimes infinite.

    E.g. All 5 million Florida voters, per Example 12.5

    Sample

    Asampleis a set of data drawn from thepopulation.

    Potentially very large, but less than the population.

    E.g. a sample of 765 voters exit polled on election day.

  • 7/24/2019 Stat Review -Keller

    3/209

  • 7/24/2019 Stat Review -Keller

    4/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    $ey "tatistical Concepts%

    Populations have Parameters,

    Samples have Statistics.

    &arameter

    &op'lation "ample

    "tatistic

    "'(set

  • 7/24/2019 Stat Review -Keller

    5/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Descriptive Statistics:

are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:

Graphical Techniques (Chapter 2), and

Numerical Techniques (Chapter 4).

The actual method used depends on what information we would like to extract. Are we interested in

    measure(s) of central location? and/or

    measure(s) of variability (dispersion)?

    Descriptive Statistics helps to answer these questions

  • 7/24/2019 Stat Review -Keller

    6/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "tatistical Inference%

    Statistical inferenceis theprocessof making an estimate,

    prediction, or decision about a population based on a sample.

    &arameter

    &op'lation

    "ample

    "tatistic

    Inference

    What can we inferabout a Populations Parameters

    based on a Samples Statistics?

  • 7/24/2019 Stat Review -Keller

    7/209Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Definitions:

A variable is some characteristic of a population or sample.

E.g. student grades.

Typically denoted with a capital letter: X, Y, Z

The values of the variable are the range of possible values for a variable.

E.g. student marks (0..100)

Data are the observed values of a variable.

    E.g. student marks: {67, 74, 71, 83, 93, 55, 48}

  • 7/24/2019 Stat Review -Keller

    8/209Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Interval Data:

Interval data

Real numbers, i.e. heights, weights, prices, etc.

Also referred to as quantitative or numerical.

Arithmetic operations can be performed on Interval Data, thus it's meaningful to talk about 2*Height, or Price + $1,

  • 7/24/2019 Stat Review -Keller

    9/209

  • 7/24/2019 Stat Review -Keller

    10/209Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Ordinal Data:

Ordinal Data appear to be categorical in nature, but their values have an order; a ranking to them:

E.g. College course rating system:

poor = 1, fair = 2, good = 3, very good = 4, excellent = 5

While it's still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like:

    excellent > poor or fair < very good

    That is, order is maintained no matter what numeric values

    are assigned to each category.

  • 7/24/2019 Stat Review -Keller

    11/209Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Graphical & Tabular Techniques for Nominal Data: The only allowable calculation on nominal data is to count the frequency of each value of the variable.

We can summarize the data in a table that presents the categories and their counts, called a frequency distribution.

A relative frequency distribution lists the categories and the proportion with which each occurs.

    Refer to Example 2.1

http://e/TT%20PowerPoint%20slides/References/Xm02-01.xls
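The referenced data file is not reproduced here, but the counting itself is simple. A minimal Python sketch of a frequency and relative frequency distribution; the category labels below are made-up stand-ins, not the Example 2.1 data:

    from collections import Counter

    # Hypothetical nominal responses (stand-in for the Xm02-01 data)
    responses = ["Coke", "Pepsi", "Coke", "Sprite", "Pepsi", "Coke"]

    counts = Counter(responses)          # frequency distribution
    n = sum(counts.values())

    for category, freq in counts.items():
        rel_freq = freq / n              # relative frequency = proportion
        print(f"{category:8s} frequency={freq}  relative frequency={rel_freq:.3f}")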
  • 7/24/2019 Stat Review -Keller

    12/209Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Nominal Data (Tabular Summary)

  • 7/24/2019 Stat Review -Keller

    13/209Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Nominal Data (Frequency)

Bar Charts are often used to display frequencies.

  • 7/24/2019 Stat Review -Keller

    14/209Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Nominal Data

It's all the same information (based on the same data).

Just different presentation.

  • 7/24/2019 Stat Review -Keller

    15/209Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Graphical Techniques for Interval Data: There are several graphical methods that are used when the data are interval (i.e. numeric, non-categorical).

The most important of these graphical methods is the histogram.

The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.

  • 7/24/2019 Stat Review -Keller

    16/209Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Building a Histogram:

1) Collect the Data
2) Create a frequency distribution for the data.
3) Draw the Histogram.
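A short sketch of steps 2 and 3, assuming NumPy and matplotlib are available; the data values and the choice of five bins are illustrative only:

    import numpy as np
    import matplotlib.pyplot as plt

    # Step 1: the data (hypothetical interval data)
    data = np.array([42, 55, 61, 47, 70, 58, 66, 51, 49, 73, 62, 57])

    # Step 2: frequency distribution -- count observations in each class
    counts, bin_edges = np.histogram(data, bins=5)
    print(list(zip(zip(bin_edges[:-1], bin_edges[1:]), counts)))

    # Step 3: draw the histogram
    plt.hist(data, bins=bin_edges, edgecolor="black")
    plt.xlabel("Class")
    plt.ylabel("Frequency")
    plt.show()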

  • 7/24/2019 Stat Review -Keller

    17/209Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    3istogram and "tem . Leaf%

  • 7/24/2019 Stat Review -Keller

    18/209Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Ogive:

Is a graph of a cumulative frequency distribution.

We create an ogive in three steps:

1) Calculate relative frequencies.

2) Calculate cumulative relative frequencies by adding the current class relative frequency to the previous class cumulative relative frequency.

(For the first class, its cumulative relative frequency is just its relative frequency.)
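A minimal sketch of the running-sum step in Python. Only the first two and the last class relative frequencies appear on the next slide; the middle values here are hypothetical fill-ins chosen to total 1.0:

    # Relative frequencies for successive classes (middle values hypothetical)
    rel_freqs = [0.355, 0.185, 0.160, 0.130, 0.100, 0.070]

    cumulative = []
    running = 0.0
    for rf in rel_freqs:
        running += rf            # add current class to the previous cumulative total
        cumulative.append(round(running, 3))

    print(cumulative)            # the last value should be 1.0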

  • 7/24/2019 Stat Review -Keller

    19/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Cumulative Relative Frequencies:

first class: .355

next class: .355 + .185 = .540

...

last class: .930 + .070 = 1.00

  • 7/24/2019 Stat Review -Keller

    20/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Ogive:

The ogive can be used to answer questions like:

What telephone bill value is at the 50th percentile?

(Refer also to Fig. 2.13 in your textbook) "around $35"

  • 7/24/2019 Stat Review -Keller

    21/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "catter )iagram%

    Example 2.9A real estate agent wanted to know to what

    extent the selling price of a home is related to its size

    1) Collect the data

    2) Determine the independent variable (X house size) andthe dependent variable (Y selling price)

    3) Use Excel to create a scatter diagram

http://e/TT%20PowerPoint%20slides/References/Xm02-09.xls
  • 7/24/2019 Stat Review -Keller

    22/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "catter )iagram%

    It appears that in fact there is a relationship, that is, the

    greater the house size the greater the selling price

  • 7/24/2019 Stat Review -Keller

    23/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    &atterns of "catter )iagrams%

    Linearity and Direction are two concepts we are interested in

    &ositive Linear 4elationship +egative Linear 4elationship

    !eak or +on;Linear 4elationship

  • 7/24/2019 Stat Review -Keller

    24/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    Time "eries )ata%

    Observations measured at the same point in time are called

    cross-sectionaldata.

    Observations measured at successive points in time are

    calledtime-seriesdata.

    Time-series data graphed on a line chart, which plots the

    value of the variable on the vertical axis against the time

    periods on the horizontal axis.

  • 7/24/2019 Stat Review -Keller

    25/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Numerical Descriptive Techniques:

    Measures of Central Location

    Mean, Median, Mode

    Measures of Variability

    Range, Standard Deviation, Variance, Coefficient of Variation

    Measures of Relative Standing

    Percentiles, Quartiles

    Measures of Linear Relationship

    Covariance, Correlation, Least Squares Line

  • 7/24/2019 Stat Review -Keller

    26/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

  • 7/24/2019 Stat Review -Keller

    27/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    >rithmetic

  • 7/24/2019 Stat Review -Keller

    28/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "tatistics is a pattern lang'age%

    Population Sample

    Size N n

    Mean

  • 7/24/2019 Stat Review -Keller

    29/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    The >rithmetic

  • 7/24/2019 Stat Review -Keller

    30/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

  • 7/24/2019 Stat Review -Keller

    31/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Range:

The range is the simplest measure of variability, calculated as:

Range = Largest observation − Smallest observation

E.g.

Data: {4, 4, 4, 4, 50}  Range = 46

Data: {4, 8, 15, 24, 39, 50}  Range = 46

The range is the same in both cases,

  • 7/24/2019 Stat Review -Keller

    32/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "tatistics is a pattern lang'age%

    Population Sample

    Size N n

    Mean

    Variance

  • 7/24/2019 Stat Review -Keller

    33/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Variance:

The variance of a population is: σ² = Σ(xᵢ − μ)² / N, where μ is the population mean and N is the population size.

The variance of a sample is: s² = Σ(xᵢ − x̄)² / (n − 1), where x̄ is the sample mean.

Note! The denominator is sample size (n) minus one!

  • 7/24/2019 Stat Review -Keller

    34/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    >pplication%

    Example 4.7. The following sample consists of the number

    of jobs six randomly selected students applied for: 17, 15,23, 7, 9, 13.

    Finds its mean and variance.

    What are we looking to calculate?

    The following sampleconsists of the number of jobs six

    randomly selected students applied for: 17, 15, 23, 7, 9, 13.

    Finds its meanand variance.

    as opposed to or 2
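A quick check of Example 4.7 with Python's standard library (statistics.variance divides by n − 1, matching the sample variance definition above):

    import statistics

    jobs = [17, 15, 23, 7, 9, 13]      # Example 4.7 sample data

    x_bar = statistics.mean(jobs)      # sample mean
    s2 = statistics.variance(jobs)     # sample variance (divides by n - 1)

    print(x_bar, s2)                   # 14 and 33.2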

  • 7/24/2019 Stat Review -Keller

    35/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "ample

  • 7/24/2019 Stat Review -Keller

    36/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "tandard )eviation%

    The standard deviation is simply the square root of the

    variance, thus:

    Population standard deviation:

    Sample standard deviation:

  • 7/24/2019 Stat Review -Keller

    37/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "tandard )eviation%

    Consider Example 4.8where a golf club manufacturer has

    designed a new club and wants to determine if it is hit moreconsistently (i.e. with less variability) than with an old club.

    UsingTools )ata >nalysisDmay need to 7add in:% )escriptive

    "tatisticsin Excel, we produce the following tables for

    interpretation

    You get moreconsistent

    distance with the

    new club.

http://e/TT%20PowerPoint%20slides/References/Xm04-08.xls
  • 7/24/2019 Stat Review -Keller

    38/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The Empirical Rule: If the histogram is bell shaped:

Approximately 68% of all observations fall within one standard deviation of the mean.

Approximately 95% of all observations fall within two standard deviations of the mean.

Approximately 99.7% of all observations fall within three standard deviations of the mean.

  • 7/24/2019 Stat Review -Keller

    39/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Chebysheff's Theorem: Not often used because the interval is very wide.

A more general interpretation of the standard deviation is derived from Chebysheff's Theorem, which applies to all shapes of histograms (not just bell shaped).

The proportion of observations in any sample that lie within k standard deviations of the mean is at least 1 − 1/k².

For k=2 (say), the theorem states that at least 3/4 of all observations lie within 2 standard deviations of the mean. This is a "lower bound" compared to the Empirical Rule's approximation of 95%.

  • 7/24/2019 Stat Review -Keller

    40/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    Bo@ &lots%

    These box plots are based on

    data in Xm04-15.

    Wendys service time is

    shortest and least variable.

    Hardees has the greatest

    variability, while Jack-in-

    the-Box has the longest

    service times.

    < h d f C ll i )

http://e/TT%20PowerPoint%20slides/References/Xm04-15.xls
  • 7/24/2019 Stat Review -Keller

    41/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

  • 7/24/2019 Stat Review -Keller

    42/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "ampling%

    Recall that statistical inference permits us to draw

    conclusions about a population based on a sample.

    Sampling (i.e. selecting a sub-set of a whole population) is

    often done for reasons ofcost(its less expensive to sample

    1,000 television viewers than 100 million TV viewers) and

    practicality(e.g. performing a crash test on every

    automobile produced is impractical).

    In any case, thesampled populationand thetarget

    populationshould be similarto one another.

    " li &l

  • 7/24/2019 Stat Review -Keller

    43/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "ampling &lans%

    Asampling planis just a method or procedure for

    specifying how a sample will be taken from a population.

    We will focus our attention on these three methods:

    Simple Random Sampling,

    Stratified Random Sampling, and

    Cluster Sampling.

    "i l 4 d " li

  • 7/24/2019 Stat Review -Keller

    44/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "imple 4andom "ampling%

    Asimple random sampleis a sample selected in such a way

    that every possible sample of the same size is equally likelyto be chosen.

    Drawing three names from a hat containing all the names of

    the students in the class is an example of a simple random

    sample: any group of three names is as equally likely as

    picking any other group of three names.

    "t ti* d 4 d " li

  • 7/24/2019 Stat Review -Keller

    45/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "trati*ed 4andom "ampling%

    After the population has been stratified, we can usesimple

    random samplingto generate the complete sample:

    e only have s'Ncient reso'rces to sample M00 people total,

    e o'ld dra O00 of them from the lo income gro'p%

    %if e are sampling O000 people, ed dra50 of them from the high income gro'p

    Cl t " li

  • 7/24/2019 Stat Review -Keller

    46/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    Cl'ster "ampling%

    Acluster sampleis a simple random sample of groups or

    clusters of elements (vs. a simple random sample ofindividual objects).

    This method is useful when it is difficult or costly to develop

    a complete list of the population members or when the

    population elements are widely dispersed geographically.

    Cluster sampling may increase sampling error due tosimilarities among cluster members.

    " li E

  • 7/24/2019 Stat Review -Keller

    47/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    "ampling Error%

    Sampling errorrefers to differences between the sample and

    the population that exist only because of the observationsthat happened to be selected for the sample.

    Another way to look at this is: the differences in results for

    different samples (of the same size) is due to sampling error:

    E.g. Two samples of size 10 of 1,000 households. If we

    happened to get the highest income level data points in ourfirst sample and all the lowest income levels in the second,

    this delta is due to sampling error.

    + li E

  • 7/24/2019 Stat Review -Keller

    48/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Nonsampling Error:

Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly. Three types of nonsampling errors:

Errors in data acquisition,

Nonresponse errors, and

Selection bias.

Note: increasing the sample size will not reduce this type of error.

  • 7/24/2019 Stat Review -Keller

    49/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    >pproaches to >ssigning&ro(a(ilities%There are three ways to assign a probability, P(Oi), to an

    outcome, Oi, namely:

    Classical approach: make certain assumptions (such asequally likely, independence) about situation.

    Relative frequency: assigning probabilities based onexperimentation or historical data.

    Subjective approach: Assigning probabilities based on theassignors judgment.

    Interpreting &ro(a(ility

  • 7/24/2019 Stat Review -Keller

    50/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    Interpreting &ro(a(ility%

    One way to interpret probability is this:

    If a random experiment is repeated an infinitenumber of

    times, the relative frequency for any given outcome is the

    probability of this outcome.

    For example, the probability of heads in flip of a balanced

    coin is .5, determined using the classical approach. The

    probability is interpreted as being the long-term relativefrequency of heads if the coin is flipped an infinite number

    of times.

    Conditional &ro(a(ility

  • 7/24/2019 Stat Review -Keller

    51/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    Conditional &ro(a(ility%

    Conditional probabilityis used to determine how two events

    are related; that is, we can determine the probability of oneeventgiventhe occurrence of another related event.

    Conditional probabilities are written as P(A | B)and read as

    the probability of A givenB and is calculated as:

    Independence

  • 7/24/2019 Stat Review -Keller

    52/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Independence:

One of the objectives of calculating conditional probability is to determine whether two events are related.

In particular, we would like to know whether they are independent, that is, if the probability of one event is not affected by the occurrence of the other event.

    Two events A and B are said to be independentif

    P(A|B) = P(A)or

    P(B|A) = P(B)
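A tiny numeric illustration of that check; the joint and marginal probabilities below are hypothetical, chosen so the events come out independent:

    import math

    # Hypothetical probabilities for events A and B
    p_a_and_b = 0.12
    p_a = 0.30
    p_b = 0.40

    p_a_given_b = p_a_and_b / p_b        # conditional probability P(A | B)

    # A and B are independent exactly when P(A | B) equals P(A)
    print(p_a_given_b, p_a, math.isclose(p_a_given_b, p_a))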

Complement Rule

  • 7/24/2019 Stat Review -Keller

    53/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Complement Rule:

The complement of an event A is the event that occurs when A does not occur.

The complement rule gives us the probability of an event NOT occurring. That is:

P(A^C) = 1 − P(A)

For example, in the simple roll of a die, the probability of the number 1 being rolled is 1/6. The probability that some number other than 1 will be rolled is 1 − 1/6 = 5/6.

  • 7/24/2019 Stat Review -Keller

    54/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Addition Rule

  • 7/24/2019 Stat Review -Keller

    55/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    >ddition 4'le%

    Recall: theaddition rulewas introduced earlier to provide a

    way to compute the probability of event AorBorboth Aand B occurring; i.e. the union of A and B.

    P(A or B) = P(A) + P(B) P(A and B)

    Why do we subtract the joint probability P(A and B) from

    the sum of the probabilities of A and B?

    P(A or B) = P(A) + P(B) P(A and B)

    >ddition 4'le for

  • 7/24/2019 Stat Review -Keller

    56/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    >ddition 4'le for

  • 7/24/2019 Stat Review -Keller

    57/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Two Types of Random Variables:

Discrete Random Variable

one that takes on a countable number of values. E.g. values on the roll of dice: 2, 3, 4, ..., 12

Continuous Random Variable

one whose values are not discrete, not countable. E.g. time (30.1 minutes? 30.10000001 minutes?)

Analogy:

Integers are Discrete, while Real Numbers are Continuous.

Laws of Expected Value

  • 7/24/2019 Stat Review -Keller

    58/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Laws of Expected Value:

1. E(c) = c

The expected value of a constant (c) is just the value of the constant.

2. E(X + c) = E(X) + c

3. E(cX) = cE(X)

We can pull a constant out of the expected value expression (either as part of a sum with a random variable X or as a coefficient of random variable X).

Laws of Variance

  • 7/24/2019 Stat Review -Keller

    59/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Laws of Variance:

1. V(c) = 0

The variance of a constant (c) is zero.

2. V(X + c) = V(X)

The variance of a random variable plus a constant is just the variance of the random variable (per 1 above).

3. V(cX) = c²V(X)

The variance of a random variable times a constant coefficient is the coefficient squared times the variance of the random variable.
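These laws can be verified numerically. A short Python sketch using a small, made-up discrete distribution and an arbitrary constant c:

    # A small discrete distribution: values and their probabilities (hypothetical)
    values = [1, 2, 3, 4]
    probs  = [0.1, 0.2, 0.3, 0.4]
    c = 5.0

    def expectation(vals, ps):
        return sum(v * p for v, p in zip(vals, ps))

    def variance(vals, ps):
        mu = expectation(vals, ps)
        return sum(p * (v - mu) ** 2 for v, p in zip(vals, ps))

    ex, vx = expectation(values, probs), variance(values, probs)

    # E(X + c) = E(X) + c  and  E(cX) = c E(X)
    print(expectation([v + c for v in values], probs), ex + c)
    print(expectation([c * v for v in values], probs), c * ex)

    # V(X + c) = V(X)  and  V(cX) = c^2 V(X)
    print(variance([v + c for v in values], probs), vx)
    print(variance([c * v for v in values], probs), c ** 2 * vx)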

Binomial Distribution

  • 7/24/2019 Stat Review -Keller

    60/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Binomial Distribution:

The binomial distribution is the probability distribution that results from doing a binomial experiment. Binomial experiments have the following properties:

1. Fixed number of trials, represented as n.

2. Each trial has two possible outcomes, a success and a failure.

3. P(success) = p (and thus: P(failure) = 1 − p), for all trials.

4. The trials are independent, which means that the outcome of one trial does not affect the outcomes of any other trials.

Binomial Random Variable

  • 7/24/2019 Stat Review -Keller

    61/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Binomial Random Variable:

The binomial random variable counts the number of successes in n trials of the binomial experiment. It can take on values 0, 1, 2, ..., n. Thus, it's a discrete random variable.

To calculate the probability associated with each value we use combinatorics:

P(X = x) = [n! / (x!(n − x)!)] p^x (1 − p)^(n−x), for x = 0, 1, 2, ..., n
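The same formula in Python, using the standard library's math.comb for the binomial coefficient; n and p below are illustrative:

    from math import comb

    def binomial_pmf(x, n, p):
        """P(X = x) = C(n, x) * p**x * (1-p)**(n-x)"""
        return comb(n, x) * p**x * (1 - p)**(n - x)

    # e.g. n = 10 trials with p = 0.2 probability of success
    print(binomial_pmf(2, 10, 0.2))    # about 0.302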

Binomial Table

  • 7/24/2019 Stat Review -Keller

    62/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Binomial Table:

What is the probability that Pat fails the quiz?

i.e. what is P(X ≤ 4), given P(success) = .20 and n = 10?

P(X ≤ 4) = .967

Binomial Table

  • 7/24/2019 Stat Review -Keller

    63/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Binomial Table:

What is the probability that Pat gets two answers correct?

i.e. what is P(X = 2), given P(success) = .20 and n = 10?

P(X = 2) = P(X ≤ 2) − P(X ≤ 1) = .678 − .376 = .302

(remember, the table shows cumulative probabilities)
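The cumulative values the table lists can be reproduced by summing the binomial pmf. A sketch reproducing both answers above:

    from math import comb

    def binomial_pmf(x, n, p):
        return comb(n, x) * p**x * (1 - p)**(n - x)

    def binomial_cdf(x, n, p):
        """Cumulative probability P(X <= x), i.e. what the binomial table lists."""
        return sum(binomial_pmf(k, n, p) for k in range(x + 1))

    n, p = 10, 0.20
    print(round(binomial_cdf(4, n, p), 3))                          # 0.967 -> P(Pat fails)
    print(round(binomial_cdf(2, n, p) - binomial_cdf(1, n, p), 3))  # 0.302 -> P(X = 2)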

=BINOMDIST (Excel function)

  • 7/24/2019 Stat Review -Keller

    64/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

=BINOMDIST (Excel function)

  • 7/24/2019 Stat Review -Keller

    65/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

=BINOMDIST (Excel function)

  • 7/24/2019 Stat Review -Keller

    66/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Binomial Distribution:

As you might expect, statisticians have developed general formulas for the mean, variance, and standard deviation of a binomial random variable. They are:

μ = np,  σ² = np(1 − p),  σ = √(np(1 − p))

Poisson Distribution

  • 7/24/2019 Stat Review -Keller

    67/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    &oisson )istri('tion%

    Named for Simeon Poisson, thePoisson distributionis a

    discrete probability distribution and refers to the number ofevents (a.k.a. successes) within a specific time period or

    region of space. For example:

    The number of cars arriving at a service station in 1 hour. (The

    interval of time is 1 hour.)The number of flaws in a bolt of cloth. (The specific region is a

    bolt of cloth.)

    The number of accidents in 1 day on a particular stretch of

    highway. (The interval is defined by both time, 1 day, and space,the particular stretch of highway.)

    The &oisson E@periment

  • 7/24/2019 Stat Review -Keller

    68/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    The &oisson E@periment%

    Like a binomial experiment, aPoisson experimenthas four

    defining characteristic properties:1. The number of successes that occur in any interval is

    independent of the number of successes that occur in any

    other interval.

    2. The probability of a success in an interval is the same for

    all equal-size intervals

    3. The probability of a success is proportional to the size of

    the interval.4. The probability of more than one success in an interval

    approaches 0 as the interval becomes smaller.

    &oisson )istri('tion

  • 7/24/2019 Stat Review -Keller

    69/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    &oisson )istri('tion%

    ThePoisson random variableis the number of successes

    that occur in a period of time or an interval of space in aPoisson experiment.

    E.g. On average, 96trucks arrive at a border crossing

    every hour.

    E.g. The number of typographic errors in a new textbook

    edition averages 1.5per 100 pages.

    s'ccesses

    timeperiod

    s'ccesses#A1

    interval

    &oisson &ro(a(ility )istri('tion

  • 7/24/2019 Stat Review -Keller

    70/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    &oisson &ro(a(ility )istri('tion%

    The probability that a Poisson random variable assumes a

    value of xis given by:

    and eis the natural logarithm base.

    FYI:

    E@ample J O2

  • 7/24/2019 Stat Review -Keller

    71/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Example 7.12:

The number of typographical errors in new editions of textbooks varies considerably from book to book. After some analysis, an instructor concludes that the number of errors is Poisson distributed with a mean of 1.5 per 100 pages. The instructor randomly selects 100 pages of a new book. What is the probability that there are no typos?

That is, what is P(X = 0) given that μ = 1.5?

P(X = 0) = e^(−1.5)(1.5)^0 / 0! = .2231

There is about a 22% chance of finding zero errors.

Poisson Distribution

  • 7/24/2019 Stat Review -Keller

    72/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    &oisson )istri('tion%

    As mentioned on the Poisson experiment slide:

    The probability of a success is proportional to the size of

    the interval

    Thus, knowing an error rate of 1.5 typos per 100 pages, we

    can determine a mean value for a 400 page book as:

    =1.5(4) = 6 typos / 400 pages.

    E@ample JO9%

  • 7/24/2019 Stat Review -Keller

    73/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Example 7.13:

For a 400 page book, what is the probability that there are no typos?

P(X = 0) = e^(−6)(6)^0 / 0! = .002479

There is a very small chance there are no typos.
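Both answers follow directly from the Poisson formula. A short Python check using only the standard library:

    from math import exp, factorial

    def poisson_pmf(x, mu):
        """P(X = x) = e**(-mu) * mu**x / x!"""
        return exp(-mu) * mu**x / factorial(x)

    print(poisson_pmf(0, 1.5))   # about 0.2231 -- no typos in 100 pages
    print(poisson_pmf(0, 6.0))   # about 0.0025 -- no typos in 400 pages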

Example 7.13:

  • 7/24/2019 Stat Review -Keller

    74/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Example 7.13:

Excel is an even better alternative:

Probability Density Functions:

http://e/TT%20PowerPoint%20slides/References/Poisson%20probabilities.xls
  • 7/24/2019 Stat Review -Keller

    75/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    &ro(a(ility )ensity 'nctions%

    Unlike a discrete random variable which we studied in

    Chapter 7, acontinuous random variableis one that canassume an uncountablenumber of values.

    We cannot list the possible values because there is an

    infinite number of them.

    Because there is an infinite number of values, the

    probability of each individual value is virtually 0.

    &oint &ro(a(ilities are Sero

  • 7/24/2019 Stat Review -Keller

    76/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    &oint &ro(a(ilities are Sero

    Because there is an infinite number of values, the

    probability of each individual value is virtually 0.

    Thus, we can determine the probability of arange of values

    only.

    E.g. with a discreterandom variable like tossing a die, it is

    meaningful to talk about P(X=5), say.

    In a continuoussetting (e.g. with time as a random variable), theprobability the random variable of interest, say task length, takes

    exactly5 minutes is infinitesimally small, hence P(X=5) = 0.

    It is meaningful to talk about P(X 5).

    &ro(a(ility )ensity 'nction%

  • 7/24/2019 Stat Review -Keller

    77/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    &ro(a(ility )ensity 'nction%

    A function f(x) is called aprobability density function(over

    the range a

    x

    bif it meets the followingrequirements:

    1) f(x) 0 for all xbetween aand b, and

    2) The total area under the curve between aand bis 1.0

    f(x)

    xba

    area=1

    The +ormal )istri('tion%

  • 7/24/2019 Stat Review -Keller

    78/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The Normal Distribution:

The normal distribution is the most important of all probability distributions. The probability density function of a normal random variable is given by:

f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²)), for −∞ < x < ∞

It looks like this:

Bell shaped,

Symmetrical around the mean.

The Normal Distribution:

  • 7/24/2019 Stat Review -Keller

    79/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The Normal Distribution:

Important things to note:

The normal distribution is fully defined by two parameters: its standard deviation and mean.

Unlike the range of the uniform distribution (a ≤ x ≤ b), normal distributions range from minus infinity to plus infinity.

The normal distribution is bell shaped and symmetrical about the mean.

Standard Normal Distribution:

  • 7/24/2019 Stat Review -Keller

    80/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

A normal distribution whose mean is zero and standard deviation is one is called the standard normal distribution.

As we shall see shortly, any normal distribution can be converted to a standard normal distribution with simple algebra. This makes calculations much easier.

Calculating Normal Probabilities:

  • 7/24/2019 Stat Review -Keller

    81/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Calculating Normal Probabilities:

We can use the following function to convert any normal random variable to a standard normal random variable:

Z = (X − μ) / σ

Some advice: always draw a picture!

Calculating Normal Probabilities:

  • 7/24/2019 Stat Review -Keller

    82/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Calculating Normal Probabilities:

Example: The time required to build a computer is normally distributed with a mean of 50 minutes and a standard deviation of 10 minutes.

What is the probability that a computer is assembled in a time between 45 and 60 minutes?

Algebraically speaking, what is P(45 < X < 60)?

Calculating Normal Probabilities:

  • 7/24/2019 Stat Review -Keller

    83/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Calculating Normal Probabilities:

P(45 < X < 60) = P((45 − 50)/10 < Z < (60 − 50)/10) = P(−.5 < Z < 1)

(mean of 50 minutes and a standard deviation of 10 minutes)

Calculating Normal Probabilities:

  • 7/24/2019 Stat Review -Keller

    84/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Calculating Normal Probabilities:

We can use Table 3 in Appendix B to look up probabilities P(0 < Z < z).

We can break up P(−.5 < Z < 1) into:

P(−.5 < Z < 0) + P(0 < Z < 1)

The distribution is symmetric around zero, so we have: P(−.5 < Z < 0) = P(0 < Z < .5)

Hence: P(−.5 < Z < 1) = P(0 < Z < .5) + P(0 < Z < 1)

Calculating Normal Probabilities:

http://e/TT%20PowerPoint%20slides/References/Table%203.xls
  • 7/24/2019 Stat Review -Keller

    85/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

How to use Table 3:

This table gives probabilities P(0 < Z < z).

First column = integer + first decimal

Top row = second decimal place

P(0 < Z < 0.5) = .1915

P(0 < Z < 1) = .3413

P(−.5 < Z < 1) = .1915 + .3413 = .5328

Using the Normal Table (Table 3):

http://e/TT%20PowerPoint%20slides/References/Table%203.xls
  • 7/24/2019 Stat Review -Keller

    86/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Using the Normal Table (Table 3):

What is P(Z > 1.6)?

P(0 < Z < 1.6) = .4452

P(Z > 1.6) = .5 − P(0 < Z < 1.6) = .5 − .4452 = .0548

Using the Normal Table (Table 3):

http://e/TT%20PowerPoint%20slides/References/Table%203.xls
  • 7/24/2019 Stat Review -Keller

    87/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Using the Normal Table (Table 3):

What is P(Z < −2.23)?

P(0 < Z < 2.23) = .4871

P(Z < −2.23) = P(Z > 2.23) = .5 − P(0 < Z < 2.23) = .0129

Using the Normal Table (Table 3):

http://e/TT%20PowerPoint%20slides/References/Table%203.xls
  • 7/24/2019 Stat Review -Keller

    88/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Using the Normal Table (Table 3):

What is P(Z < 1.52)?

P(Z < 0) = .5

P(Z < 1.52) = .5 + P(0 < Z < 1.52) = .5 + .4357 = .9357

Using the Normal Table (Table 3):

http://e/TT%20PowerPoint%20slides/References/Table%203.xls
  • 7/24/2019 Stat Review -Keller

    89/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Using the Normal Table (Table 3):

What is P(0.9 < Z < 1.9)?

P(0.9 < Z < 1.9) = P(0 < Z < 1.9) − P(0 < Z < 0.9) = .4713 − .3159 = .1554

Finding Values of Z:

    http://e/TT%20PowerPoint%20slides/References/Table%203.xls
  • 7/24/2019 Stat Review -Keller

    90/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Other Z values are:

Z.05 = 1.645

Z.01 = 2.33

Using the values of Z

  • 7/24/2019 Stat Review -Keller

    91/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    Because z.025= 1.96 and - z.025= -1.96, it follows that we can

    state

    P(-1.96 < Z < 1.96) = .95

    Similarly

    P(-1.645 < Z < 1.645) = .90

Other Continuous Distributions:

  • 7/24/2019 Stat Review -Keller

    92/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Three other important continuous distributions which will be used extensively in later sections are introduced here:

Student t Distribution,

Chi-Squared Distribution, and

F Distribution.

Student t Distribution:

  • 7/24/2019 Stat Review -Keller

    93/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Here the letter t is used to represent the random variable, hence the name. The density function for the Student t distribution is as follows:

ν (nu) is called the degrees of freedom, and

Γ (the Gamma function) satisfies Γ(k) = (k−1)(k−2)···(2)(1), i.e. (k−1)!, for integer k.

Student t Distribution:

  • 7/24/2019 Stat Review -Keller

    94/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

In much the same way that μ and σ define the normal distribution, ν, the degrees of freedom, defines the Student t distribution:

As the number of degrees of freedom increases, the t distribution approaches the standard normal distribution.

(Figure 8.24)

Determining Student t Values:

  • 7/24/2019 Stat Review -Keller

    95/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The Student t distribution is used extensively in statistical inference. Table 4 in Appendix B lists values of t_A,ν.

That is, values of a Student t random variable with ν degrees of freedom such that:

P(t > t_A) = A

The values for A are pre-determined critical values, typically in the 10%, 5%, 2.5%, 1% and 1/2% range.

Using the t table (Table 4) for

http://e/TT%20PowerPoint%20slides/References/Table%204.xls
  • 7/24/2019 Stat Review -Keller

    96/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

t values: For example, if we want the value of t with 10 degrees of freedom such that the area under the Student t curve to its right is .05:

t_.05,10 = 1.812 (area under the curve to the right of this value: .05)

  • 7/24/2019 Stat Review -Keller

    97/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The F density function is given by (for F > 0). Two parameters define this distribution, and as we've already seen these are again degrees of freedom:

ν₁ is the numerator degrees of freedom and

ν₂ is the denominator degrees of freedom.

Determining Values of F:

  • 7/24/2019 Stat Review -Keller

    98/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

For example, what is the value of F for 5% of the area under the right hand tail of the curve, with a numerator degree of freedom of 3 and a denominator degree of freedom of 7?

Solution: use the F look-up (Table 6): F.05,3,7 = 4.35

(Numerator Degrees of Freedom: 3; Denominator Degrees of Freedom: 7)

  • 7/24/2019 Stat Review -Keller

    99/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

For areas under the curve on the left hand side of the curve, we can leverage the following relationship:

F(1−A, ν1, ν2) = 1 / F(A, ν2, ν1)

Pay close attention to the order of the terms!
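Both the table lookup above and the left-tail relationship can be checked numerically; the sketch below assumes SciPy is available:

    # Assumes SciPy is installed; f.ppf gives percent points of the F distribution.
    from scipy.stats import f

    # Right tail: the value with 5% of the area to its right, df = (3, 7)
    print(round(f.ppf(0.95, dfn=3, dfd=7), 2))      # about 4.35

    # Left-tail relationship: F(1-A, v1, v2) = 1 / F(A, v2, v1)
    left = f.ppf(0.05, dfn=3, dfd=7)
    print(round(left, 4), round(1 / f.ppf(0.95, dfn=7, dfd=3), 4))   # the two agree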

  • 7/24/2019 Stat Review -Keller

    100/209

Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Chapter 9

Sampling Distributions

Sampling Distribution of the Mean

  • 7/24/2019 Stat Review -Keller

    101/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    " O 2 9 M 5 F

    P#"$ O/F O/F O/F O/F O/F O/F

    A fair dieis thrown infinitely many times,

    with the random variable X = # of spots on any throw.

    The probability distribution of X is:

    and the mean and variance are calculated as well:

    "ampling )istri('tion of To )ice

  • 7/24/2019 Stat Review -Keller

    102/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

A sampling distribution is created by looking at all samples of size n=2 (i.e. two dice) and their means.

While there are 36 possible samples of size 2, there are only 11 values for x̄, and some (e.g. x̄ = 3.5) occur more frequently than others (e.g. x̄ = 1).

Sampling Distribution of Two Dice:

  • 7/24/2019 Stat Review -Keller

    103/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The sampling distribution of x̄ is shown below:

x̄     P(x̄)
1.0   1/36
1.5   2/36
2.0   3/36
2.5   4/36
3.0   5/36
3.5   6/36
4.0   5/36
4.5   4/36
5.0   3/36
5.5   2/36
6.0   1/36

(Histogram of P(x̄) against x̄ = 1.0, 1.5, ..., 6.0)

Compare:

  • 7/24/2019 Stat Review -Keller

    104/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Compare the distribution of X with the sampling distribution of x̄.

As well, note that: μ_x̄ = μ and σ²_x̄ = σ²/n.
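The two-dice sampling distribution and those two relations can be reproduced exactly by enumerating all 36 samples. A sketch using exact fractions:

    from itertools import product
    from collections import Counter
    from fractions import Fraction

    faces = range(1, 7)

    # All 36 equally likely samples of size n = 2 and their sample means
    means = [Fraction(a + b, 2) for a, b in product(faces, repeat=2)]
    dist = Counter(means)
    for m in sorted(dist):
        print(float(m), f"{dist[m]}/36")        # matches the table above

    mu = Fraction(sum(faces), 6)                         # population mean = 3.5
    var = sum((x - mu) ** 2 for x in faces) / 6          # population variance = 35/12

    mean_of_means = sum(means) / len(means)
    var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)
    print(mean_of_means == mu, var_of_means == var / 2)  # True True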

Central Limit Theorem:

  • 7/24/2019 Stat Review -Keller

    105/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The sampling distribution of the mean of a random sample drawn from any population is approximately normal for a sufficiently large sample size.

The larger the sample size, the more closely the sampling distribution of x̄ will resemble a normal distribution.

Central Limit Theorem:

  • 7/24/2019 Stat Review -Keller

    106/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

If the population is normal, then x̄ is normally distributed for all values of n.

If the population is non-normal, then x̄ is approximately normal only for larger values of n.

In many practical situations, a sample size of 30 may be sufficiently large to allow us to use the normal distribution as an approximation for the sampling distribution of x̄.

    "ampling )istri('tion of the "ample

  • 7/24/2019 Stat Review -Keller

    107/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

  • 7/24/2019 Stat Review -Keller

    108/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The foreman of a bottling plant has observed that the amount of soda in each 32-ounce bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce.

If a customer buys one bottle, what is the probability that the bottle will contain more than 32 ounces?

Example 1(a):

  • 7/24/2019 Stat Review -Keller

    109/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    We want to find P(X > 32), where X is normally distributed

    and =32.2 and =.3

    there is about a 75% chance that a single bottle of soda

    contains more than 32oz.

    E@ample O(1%

  • 7/24/2019 Stat Review -Keller

    110/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The foreman of a bottling plant has observed that the amount of soda in each 32-ounce bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce.

If a customer buys a carton of four bottles, what is the probability that the mean amount of the four bottles will be greater than 32 ounces?

Example 1(b):

  • 7/24/2019 Stat Review -Keller

    111/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    We want to find P(X > 32), where X is normally distributed

    with =32.2 and =.3

    Things we know:

    1) X is normally distributed, therefore so will X.

    2) = 32.2 oz.

    3)

    E@ample O(1%

  • 7/24/2019 Stat Review -Keller

    112/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

If a customer buys a carton of four bottles, what is the probability that the mean amount of the four bottles will be greater than 32 ounces?

P(x̄ > 32) = P(Z > (32 − 32.2)/.15) = P(Z > −1.33) = .5 + .4082 = .9082

There is about a 91% chance the mean of the four bottles will exceed 32 oz.
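Both parts of the bottling example can be checked with an erf-based standard normal CDF:

    from math import erf, sqrt

    def phi(z):
        """Standard normal CDF, P(Z < z)."""
        return 0.5 * (1 + erf(z / sqrt(2)))

    mu, sigma = 32.2, 0.3

    # (a) one bottle: P(X > 32)
    print(round(1 - phi((32 - mu) / sigma), 4))   # about 0.7475 (roughly 75%)

    # (b) mean of n = 4 bottles: standard error = sigma / sqrt(n)
    se = sigma / sqrt(4)
    print(round(1 - phi((32 - mu) / se), 4))      # about 0.9088 (roughly 91%)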

    -raphically "peaking%mean=92

  • 7/24/2019 Stat Review -Keller

    113/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

What is the probability that one bottle will contain more than 32 ounces?

What is the probability that the mean of four bottles will exceed 32 oz?

Sampling Distribution: Difference of two means

  • 7/24/2019 Stat Review -Keller

    114/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The final sampling distribution introduced is that of the difference between two sample means. This requires:

independent random samples be drawn from each of two normal populations.

If this condition is met, then the sampling distribution of the difference between the two sample means, i.e. x̄₁ − x̄₂, will be normally distributed.

(Note: if the two populations are not both normally distributed, but the sample sizes are large (>30), the distribution of x̄₁ − x̄₂ is approximately normal.)

Sampling Distribution: Difference of two means

  • 7/24/2019 Stat Review -Keller

    115/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The expected value and variance of the sampling distribution of x̄₁ − x̄₂ are given by:

mean: μ_(x̄₁−x̄₂) = μ₁ − μ₂

standard deviation: σ_(x̄₁−x̄₂) = √(σ₁²/n₁ + σ₂²/n₂)

(also called the standard error of the difference between two means)

Estimation:

  • 7/24/2019 Stat Review -Keller

    116/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

There are two types of inference: estimation and hypothesis testing; estimation is introduced first.

The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.

E.g., the sample mean (x̄) is employed to estimate the population mean (μ).

Estimation:

  • 7/24/2019 Stat Review -Keller

    117/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.

There are two types of estimators:

    There are two types of estimators:

    Point Estimator

    Interval Estimator

    &oint . Interval Estimation%

  • 7/24/2019 Stat Review -Keller

    118/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

For example, suppose we want to estimate the mean summer income of a class of business students. For n=25 students, x̄ is calculated to be 400 $/week.

point estimate: 400 $/week; interval estimate: 380 to 420 $/week

An alternative statement is: The mean income is between 380 and 420 $/week.

Estimating μ when σ is known: the confidence interval

  • 7/24/2019 Stat Review -Keller

    119/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

We established in Chapter 9 that x̄ is approximately normally distributed with mean μ and standard deviation σ/√n.

Thus, the probability that the interval:

x̄ ± z_(α/2) σ/√n

contains the population mean μ is 1 − α. This is a confidence interval estimator for μ.

The sample mean x̄ is in the center of the interval (the confidence interval).

Four commonly used confidence levels

  • 7/24/2019 Stat Review -Keller

    120/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

levels:

Confidence Level    α      α/2     z_(α/2)
0.90                0.10   0.05    1.645
0.95                0.05   0.025   1.96
0.98                0.02   0.01    2.33
0.99                0.01   0.005   2.575

(cut & keep handy! Table 10.1)

Example 10.1:

  • 7/24/2019 Stat Review -Keller

    121/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

A computer company samples demand during lead time over 25 time periods:

It is known that the standard deviation of demand over lead time is 75 computers. We want to estimate the mean demand over lead time with 95% confidence in order to set inventory levels.

235 374 309 499 253
421 361 514 462 369
394 439 348 344 330
261 374 302 466 535
386 316 296 332 334

Example 10.1: CALCULATE

  • 7/24/2019 Stat Review -Keller

    122/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

In order to use our confidence interval estimator, we need the following pieces of data:

x̄ = 370.16 (calculated from the data)
z_(α/2) = 1.96 (given the 95% confidence level)
σ = 75 (given)
n = 25 (given)

therefore: x̄ ± z_(α/2) σ/√n = 370.16 ± 1.96(75/√25) = 370.16 ± 29.40

The lower and upper confidence limits are 340.76 and 399.56.
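The same interval computed directly in Python:

    from math import sqrt

    x_bar, sigma, n = 370.16, 75, 25
    z = 1.96                               # z_{alpha/2} for 95% confidence

    half_width = z * sigma / sqrt(n)       # 1.96 * 75 / 5 = 29.4
    print(round(x_bar - half_width, 2), round(x_bar + half_width, 2))   # 340.76 and 399.56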

Example 10.1: INTERPRET

  • 7/24/2019 Stat Review -Keller

    123/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The estimate for the mean demand during lead time lies between 340.76 and 399.56; we can use this as input in developing an inventory policy.

That is, we estimated that the mean demand during lead time falls between 340.76 and 399.56, and this type of estimator is correct 95% of the time. That also means that 5% of the time the estimator will be incorrect.

Incidentally, the media often refer to the 95% figure as "19 times out of 20," which emphasizes the long-run aspect of the confidence level.

Interval Width:

  • 7/24/2019 Stat Review -Keller

    124/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

A wide interval provides little information.

For example, suppose we estimate with 95% confidence that an accountant's average starting salary is between $15,000 and $100,000.

Contrast this with: a 95% confidence interval estimate of starting salaries between $42,000 and $45,000.

The second estimate is much narrower, providing accounting students more precise information about starting salaries.

Interval Width:

  • 7/24/2019 Stat Review -Keller

    125/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The width of the confidence interval estimate is a function of the confidence level, the population standard deviation, and the sample size.

Selecting the Sample Size:

  • 7/24/2019 Stat Review -Keller

    126/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

We can control the width of the interval by determining the sample size necessary to produce narrow intervals.

Suppose we want to estimate the mean demand to within 5 units; i.e. we want the interval estimate to be x̄ ± 5.

Since the interval estimate is x̄ ± z_(α/2) σ/√n,

it follows that z_(α/2) σ/√n = 5.

Solve for n to get the requisite sample size!

Selecting the Sample Size:

  • 7/24/2019 Stat Review -Keller

    127/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Solving the equation: n = (z_(α/2) σ / 5)² = (1.96 × 75 / 5)² = (29.4)² = 864.36, so n = 865.

That is, to produce a 95% confidence interval estimate of the mean (± 5 units), we need to sample 865 lead time periods (vs. the 25 data points we have currently).

Sample Size to Estimate a Mean

  • 7/24/2019 Stat Review -Keller

    128/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The general formula for the sample size needed to estimate a population mean with an interval estimate of x̄ ± W requires a sample size of at least this large:

n = (z_(α/2) σ / W)²  (rounded up)

Example 10.2:


  • 7/24/2019 Stat Review -Keller

    129/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

A lumber company must estimate the mean diameter of trees to determine whether or not there is sufficient lumber to harvest an area of forest. They need to estimate this to within 1 inch at a confidence level of 99%. The tree diameters are normally distributed with a standard deviation of 6 inches.

How many trees need to be sampled?

Example 10.2:

  • 7/24/2019 Stat Review -Keller

    130/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Things we know:

Confidence level = 99%, therefore α = .01

We want x̄ ± 1, hence W = 1.

We are given that σ = 6.

Example 10.2:

  • 7/24/2019 Stat Review -Keller

    131/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

We compute: n = (z_(.005) σ / W)² = (2.575 × 6 / 1)² = (15.45)² = 238.7

That is, we will need to sample at least 239 trees to have a 99% confidence interval of x̄ ± 1.
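Both sample-size calculations follow from the same formula; a small Python helper that rounds up, applied to the two examples above:

    from math import ceil

    def sample_size(z, sigma, W):
        """Smallest n giving a z*sigma/sqrt(n) half-width of at most W."""
        return ceil((z * sigma / W) ** 2)

    # Lead-time demand (95% confidence, sigma = 75, within 5 units)
    print(sample_size(1.96, 75, 5))     # 865

    # Tree diameters (99% confidence, sigma = 6, within 1 inch)
    print(sample_size(2.575, 6, 1))     # 239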

  • 7/24/2019 Stat Review -Keller

    132/209

Nonstatistical Hypothesis Testing:


  • 7/24/2019 Stat Review -Keller

    133/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

There are two possible errors.

A Type I error occurs when we reject a true null hypothesis. That is, a Type I error occurs when the jury convicts an innocent person.

A Type II error occurs when we don't reject a false null hypothesis. That occurs when a guilty defendant is acquitted.

Nonstatistical Hypothesis Testing:

  • 7/24/2019 Stat Review -Keller

    134/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The probability of a Type I error is denoted as α (Greek letter alpha). The probability of a Type II error is β (Greek letter beta).

The two probabilities are inversely related. Decreasing one increases the other.

Nonstatistical Hypothesis Testing:

  • 7/24/2019 Stat Review -Keller

    135/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

The critical concepts are these:

    1. There are two hypotheses, the null and the alternative

    hypotheses.

    2. The procedure begins with the assumption that the null

    hypothesis is true.

3. The goal is to determine whether there is enough evidence to infer that the alternative hypothesis is true.

    4. There are two possible decisions:

    Conclude that there is enough evidence to support the

    alternative hypothesis.

    Conclude that there is not enough evidence to support the

    alternative hypothesis.

Nonstatistical Hypothesis Testing:

  • 7/24/2019 Stat Review -Keller

    136/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

5. Two possible errors can be made.

Type I error: Reject a true null hypothesis.
Type II error: Do not reject a false null hypothesis.

P(Type I error) = α

P(Type II error) = β

Concepts of Hypothesis Testing (1):

  • 7/24/2019 Stat Review -Keller

    137/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

There are two hypotheses. One is called the null hypothesis and the other the alternative or research hypothesis. The usual notation is:

H0: the null hypothesis (pronounced "H-nought")

H1: the alternative or research hypothesis

The null hypothesis (H0) will always state that the parameter equals the value specified in the alternative hypothesis (H1).

Concepts of Hypothesis Testing:

  • 7/24/2019 Stat Review -Keller

    138/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

Consider Example 10.1 (mean demand for computers during assembly lead time) again. Rather than estimate the mean demand, our operations manager wants to know whether the mean is different from 350 units. We can rephrase this request into a test of the hypothesis:

H0: μ = 350

Thus, our research hypothesis becomes:

H1: μ ≠ 350

This is what we are interested in determining.

Concepts of Hypothesis Testing (4):

  • 7/24/2019 Stat Review -Keller

    139/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

There are two possible decisions that can be made:

Conclude that there is enough evidence to support the alternative hypothesis (also stated as: rejecting the null hypothesis in favor of the alternative).

Conclude that there is not enough evidence to support the alternative hypothesis (also stated as: not rejecting the null hypothesis in favor of the alternative).

NOTE: we do not say that we accept the null hypothesis.

Concepts of Hypothesis Testing:


Once the null and alternative hypotheses are stated, the next step is to randomly sample the population and calculate a test statistic (in this example, the sample mean).

If the test statistic's value is inconsistent with the null hypothesis, we reject the null hypothesis and infer that the alternative hypothesis is true.

For example, if we're trying to decide whether the mean is not equal to 350, a large value of x̄ (say, 600) would provide enough evidence. If x̄ is close to 350 (say, 355), we could not say that this provides a great deal of evidence to infer that the population mean is different from 350.

Types of Errors:


A Type I error occurs when we reject a true null hypothesis (i.e. reject H0 when it is TRUE).

A Type II error occurs when we don't reject a false null hypothesis (i.e. do NOT reject H0 when it is FALSE).

                      H0 is true          H0 is false
Reject H0             Type I error        Correct decision
Do not reject H0      Correct decision    Type II error


Example 11.1:


    The system will be cost effective if the mean account balance

    for all customers is greater than $170.

We express this belief as our research hypothesis, that is:

H1: µ > 170 (this is what we want to determine)

Thus, our null hypothesis becomes:

H0: µ = 170 (this specifies a single value for the parameter of interest)


Example 11.1:


    To test our hypotheses, we can use two different approaches:

The rejection region approach (typically used when computing statistics manually), and

The p-value approach (which is generally used with a computer and statistical software).

We will explore both in turn.


Example 11.1:


All that's left to do is calculate the critical value of the sample mean and compare it to 170. We can calculate this critical value based on any level of significance (α) we want.


Example 11.1: The Big Picture:


[Figure: sampling distribution of x̄ under H0: µ = 170, with critical value x̄_L = 175.34. Since the observed sample mean x̄ = 178 exceeds 175.34, we reject H0: µ = 170 in favor of H1: µ > 170.]

    "tandardied Test "tatistic%


An easier method is to use the standardized test statistic:

z = (x̄ − µ) / (σ/√n)

and compare its result to z_α (rejection region: z > z_α).

Since z = 2.46 > 1.645 (= z.05), we reject H0 in favor of H1.
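A minimal sketch (not part of the original slides) of the standardized-test-statistic calculation above, in Python with SciPy. The sample size n = 400 and population standard deviation σ = 65 are assumptions for illustration; they are consistent with the z = 2.46 and x̄_L = 175.34 quoted above.

```python
from math import sqrt
from scipy.stats import norm

mu0, xbar = 170.0, 178.0   # hypothesized mean (H0) and observed sample mean
sigma, n = 65.0, 400       # assumed population std. dev. and sample size (not stated above)
alpha = 0.05

se = sigma / sqrt(n)                 # standard error of the sample mean
z = (xbar - mu0) / se                # standardized test statistic (~2.46)
z_crit = norm.ppf(1 - alpha)         # z_alpha = 1.645 for alpha = .05
xbar_crit = mu0 + z_crit * se        # critical value of the sample mean (~175.34)

print(f"z = {z:.2f}, z critical = {z_crit:.3f}, x-bar critical = {xbar_crit:.2f}")
print("Reject H0 in favor of H1: mu > 170" if z > z_crit else "Do not reject H0")
```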


p-Value:


The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true.

In the case of our department store example, what is the probability of observing a sample mean at least as extreme as the one already observed (i.e. x̄ = 178), given that the null hypothesis (H0: µ = 170) is true? That probability is the p-value.

Interpreting the p-value:


The smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.

If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.

If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.

If the p-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.

If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.

We observe a p-value of .0069, hence there is overwhelming evidence to support H1: µ > 170.
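A minimal sketch (not part of the original slides) of the p-value calculation above, under the same assumed n = 400 and σ = 65; it reproduces the right-tail probability of roughly .0069.

```python
from math import sqrt
from scipy.stats import norm

mu0, xbar, sigma, n = 170.0, 178.0, 65.0, 400   # sigma and n assumed for illustration

z = (xbar - mu0) / (sigma / sqrt(n))   # standardized test statistic (~2.46)
p_value = norm.sf(z)                   # right-tail area P(Z > z) ~= .0069

print(f"p-value = {p_value:.4f}")
if p_value < 0.01:
    print("Overwhelming evidence to support H1: mu > 170")
```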

Interpreting the p-value:


Compare the p-value with the selected value of the significance level:

If the p-value is less than α, we judge the p-value to be small enough to reject the null hypothesis.

If the p-value is greater than α, we do not reject the null hypothesis.

Since p-value = .0069 < α = .05, we reject H0 in favor of H1.


Chapter-Opening Example:

The objective is to draw a conclusion about the mean payment period. Thus, the parameter to be tested is the population mean. We want to know whether there is enough statistical evidence to show that the population mean is less than 22 days. Thus, the alternative hypothesis is

H1: µ < 22

The null hypothesis is

H0: µ = 22

Chapter-Opening Example:

The test statistic is

z = (x̄ − µ) / (σ/√n)

We wish to reject the null hypothesis in favor of the alternative only if the sample mean, and hence the value of the test statistic, is small enough. As a result, we locate the rejection region in the left tail of the sampling distribution.

We set the significance level at 10%.

Chapter-Opening Example:

Rejection region: z < −z_α = −z_.10 = −1.28


From the data in SSA we compute

x̄ = Σxᵢ / 220 = 4,759 / 220 = 21.63

and

z = (x̄ − µ) / (σ/√n) = (21.63 − 22) / (6/√220) = −.91

p-value = P(Z < −.91) = .5 − .3186 = .1814

Chapter-Opening Example:

Conclusion: There is not enough evidence to infer that the mean is less than 22.

There is not enough evidence to infer that the plan will be profitable.

Since z = −.91 > −z.10 = −1.28, we fail to reject H0: µ = 22 at a 10% level of significance.
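A minimal sketch (not part of the original slides) of the left-tail test above, using the summary numbers quoted in the example (x̄ = 21.63, σ = 6, n = 220, α = .10).

```python
from math import sqrt
from scipy.stats import norm

mu0, xbar, sigma, n, alpha = 22.0, 21.63, 6.0, 220, 0.10

z = (xbar - mu0) / (sigma / sqrt(n))   # ~ -0.91
z_crit = -norm.ppf(1 - alpha)          # -z_.10 = -1.28
p_value = norm.cdf(z)                  # left-tail area P(Z < z) ~= .18

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if z < z_crit else "Fail to reject H0: not enough evidence that mu < 22")
```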

Plot the power curve.


Right-Tail Testing:


Calculate the critical value of the mean (x̄_L) and compare it against the observed value of the sample mean (x̄).

Left-Tail Testing:


Calculate the critical value of the mean (x̄_L) and compare it against the observed value of the sample mean (x̄).

Two-Tail Testing:


Two-tail testing is used when we want to test a research hypothesis that a parameter is not equal (≠) to some value.

Example 11.2:


AT&T argues that its rates are such that customers won't see a difference in their phone bills between them and their competitors. They calculate the mean and standard deviation for all their customers at $17.09 and $3.87 (respectively).

They then sample 100 customers at random and recalculate a monthly phone bill based on competitors' rates.

What we want to show is whether or not: H1: µ ≠ 17.09. We do this by assuming that:

H0: µ = 17.09

Example 11.2:


The rejection region is set up so we can reject the null hypothesis when the test statistic is large or when it is small. That is, we set up a two-tail rejection region. The total area in the rejection region must sum to α, so we divide this probability by 2.

Example 11.2:


At a 5% significance level (i.e. α = .05), we have α/2 = .025. Thus, z.025 = 1.96 and our rejection region is:

z < −1.96 or z > 1.96

Example 11.2:

From the data, we calculate x̄ = 17.55.


Using our standardized test statistic, we find that:

z = (x̄ − µ) / (σ/√n) = (17.55 − 17.09) / (3.87/√100) = 1.19

Since z = 1.19 is not greater than 1.96, nor less than −1.96, we cannot reject the null hypothesis in favor of H1. That is, there is insufficient evidence to infer that there is a difference between the bills of AT&T and the competitor.
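A minimal sketch (not part of the original slides) of the two-tail z-test above, using the AT&T numbers quoted in the example.

```python
from math import sqrt
from scipy.stats import norm

mu0, xbar, sigma, n, alpha = 17.09, 17.55, 3.87, 100, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))   # ~ 1.19
z_crit = norm.ppf(1 - alpha / 2)       # z_.025 = 1.96
p_value = 2 * norm.sf(abs(z))          # two-tail p-value

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if abs(z) > z_crit else "Insufficient evidence of a difference in mean bills")
```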

Plot the power curve.


    "'mmary of ne; and To;TailTests%


[Summary table: One-Tail Test (left tail) | Two-Tail Test | One-Tail Test (right tail)]

Inference About a Population:


We will develop techniques to estimate and test three population parameters:

Population Mean µ

Population Variance σ²

Population Proportion p

[Diagram: Population → Sample → Statistic → Inference about the Parameter]

Inference With Variance Unknown:


Previously, we looked at estimating and testing the population mean when the population standard deviation (σ) was known or given:

z = (x̄ − µ) / (σ/√n)

But how often do we know the actual population variance? Instead, we use the Student t-statistic, given by:

t = (x̄ − µ) / (s/√n)

Testing µ When σ Is Unknown:


When the population standard deviation is unknown and the population is normal, the test statistic for testing hypotheses about µ is:

t = (x̄ − µ) / (s/√n)

which is Student t distributed with ν = n − 1 degrees of freedom. The confidence interval estimator of µ is given by:

x̄ ± t_{α/2} (s/√n)

Example 12.1:


Will new workers achieve 90% of the level of experienced workers within one week of being hired and trained?

Experienced workers can process 500 packages/hour; thus if our conjecture is correct, we expect new workers to be able to process .90(500) = 450 packages per hour.

Given the data, is this the case?

Example 12.1:

IDENTIFY

Our objective is to describe the population of the numbers of packages processed in 1 hour by new workers; that is, we want to know whether the new workers' productivity is more than 90% of that of experienced workers. Thus we have:

H1: µ > 450

Therefore we set our usual null hypothesis to:

H0: µ = 450
Example 12.1:

COMPUTE

Our test statistic is:

t = (x̄ − 450) / (s/√n)

With n = 50 data points, we have n − 1 = 49 degrees of freedom. Our hypothesis under question is:

H1: µ > 450

Our rejection region becomes:

t > t_{α, 49} = t_{.05, 49} ≈ 1.676

Thus we will reject the null hypothesis in favor of the alternative if our calculated test statistic falls in this region.

Example 12.1:

COMPUTE

From the data, we calculate x̄ = 460.38 and s = 38.83, and thus:

t = (x̄ − 450) / (s/√n) = (460.38 − 450) / (38.83/√50) = 1.89

Since t = 1.89 > 1.676, we reject H0 in favor of H1; that is, there is sufficient evidence to conclude that the new workers are producing at more than 90% of the average of experienced workers.
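A minimal sketch (not part of the original slides) of the one-sample t-test above, using the summary values quoted for Example 12.1; the 5% significance level is an assumption consistent with the critical value of roughly 1.676.

```python
from math import sqrt
from scipy.stats import t

mu0, xbar, s, n, alpha = 450.0, 460.38, 38.83, 50, 0.05   # alpha assumed
df = n - 1

t_stat = (xbar - mu0) / (s / sqrt(n))   # ~ 1.89
t_crit = t.ppf(1 - alpha, df)           # t_.05,49 ~= 1.68
p_value = t.sf(t_stat, df)              # right-tail p-value

print(f"t = {t_stat:.2f}, critical t = {t_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0: mean exceeds 450/hr" if t_stat > t_crit else "Do not reject H0")
```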

Example 12.2:

IDENTIFY

Can we estimate the return on investment for companies that won quality awards?

We are given a random sample of n = 83 such companies. We want to construct a 95% confidence interval for the mean return, i.e. what is x̄ ± t_{α/2} (s/√n)?

Example 12.2:

COMPUTE

From the data, we calculate the sample mean x̄ and sample standard deviation s. For this term, t_{α/2, ν} = t_{.025, 82} ≈ 1.99, and so the 95% interval estimate is:

x̄ ± 1.99 (s/√83)
Check Requisite Conditions:

The Student t distribution is robust, which means that if the population is nonnormal, the results of the t-test and confidence interval estimate are still valid provided that the population is not extremely nonnormal.

To check this requirement, draw a histogram of the data and see how bell-shaped the resulting figure is. If a histogram is extremely skewed (say, in the case of an exponential distribution), that could be considered extremely nonnormal, and hence t-statistics would not be valid in this case.

Inference About a Population Variance:

If we are interested in drawing inferences about a population's variability, the parameter we need to investigate is the population variance, σ².

The sample variance (s²) is an unbiased, consistent and efficient point estimator for σ². Moreover, the statistic χ² = (n − 1)s² / σ² has a chi-squared distribution with n − 1 degrees of freedom.

Testing and Estimating a Population Variance:

Combining this statistic:

χ² = (n − 1)s² / σ²

with the probability statement:

P(χ²_{1−α/2} < χ² < χ²_{α/2}) = 1 − α

yields the confidence interval estimator for σ²:

LCL = (n − 1)s² / χ²_{α/2}  (lower confidence limit)

UCL = (n − 1)s² / χ²_{1−α/2}  (upper confidence limit)

Example 12.3:

IDENTIFY

Consider a container-filling machine. Management wants a machine to fill 1 liter (1,000 cc) so that the variance of the fills is less than 1 cc². A random sample of n = 25 one-liter fills was taken. Does the machine perform as it should at the 5% significance level?

We want to show that:

H1: σ² < 1  (the variance is less than 1 cc²)

so our null hypothesis becomes H0: σ² = 1. We will use this test statistic:

χ² = (n − 1)s² / σ²

Example 12.3:

COMPUTE

Since our alternative hypothesis is phrased as H1: σ² < 1, we will reject H0 in favor of H1 if our test statistic falls into this rejection region:

χ² < χ²_{1−α, n−1} = χ²_{.95, 24} = 13.85

We compute the sample variance to be s² = .8088, and thus our test statistic takes on the value:

χ² = (n − 1)s² / σ² = (24)(.8088) / 1 = 19.41

Comparing: 19.41 is not less than 13.85, so the test statistic does not fall in the rejection region.
Example 12.4:

As we saw, we cannot reject the null hypothesis in favor of the alternative. That is, there is not enough evidence to infer that the claim is true.

Note: the result does not say that the variance is greater than 1; rather, it merely states that we are unable to show that the variance is less than 1.

We could estimate (at 99% confidence, say) the variance of the fills.

Example 12.4:

COMPUTE

In order to create a confidence interval estimate of the variance, we need these formulae:

LCL = (n − 1)s² / χ²_{α/2, n−1}  (lower confidence limit)

UCL = (n − 1)s² / χ²_{1−α/2, n−1}  (upper confidence limit)

We know (n − 1)s² = 19.41 from our previous calculation, and we have from Table 5 in Appendix B: χ²_{.005, 24} = 45.6 and χ²_{.995, 24} = 9.89, so that:

LCL = 19.41 / 45.6 = .43   and   UCL = 19.41 / 9.89 = 1.96
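A minimal sketch (not part of the original slides) of the chi-squared test and the 99% confidence interval for the variance above, using n = 25 and s² = .8088 from the example.

```python
from scipy.stats import chi2

n, s2, sigma2_0 = 25, 0.8088, 1.0
df = n - 1

chi2_stat = df * s2 / sigma2_0           # (n-1)s^2 / sigma0^2 ~= 19.41
chi2_crit = chi2.ppf(0.05, df)           # left-tail critical value (~13.85) for H1: sigma^2 < 1
print(f"chi2 = {chi2_stat:.2f}, critical value = {chi2_crit:.2f}")
print("Reject H0" if chi2_stat < chi2_crit else "Cannot conclude the variance is less than 1")

alpha = 0.01                             # 99% confidence interval for sigma^2
lcl = df * s2 / chi2.ppf(1 - alpha / 2, df)
ucl = df * s2 / chi2.ppf(alpha / 2, df)
print(f"99% CI for sigma^2: {lcl:.2f} to {ucl:.2f}")
```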

Comparing Two Populations:

    Previously we looked at techniques to estimate and test


parameters for one population:

Population Mean µ, Population Variance σ²

We will still consider these parameters when we are looking at two populations; however, our interest will now be:

The difference between two means.

The ratio of two variances.


    "ampling )istri('tion of

1. x̄1 − x̄2 is normally distributed if the original populations are normal, or approximately normal if the populations are nonnormal and the sample sizes are large (n1, n2 > 30).

2. The expected value of x̄1 − x̄2 is µ1 − µ2.

3. The variance of x̄1 − x̄2 is σ1²/n1 + σ2²/n2, and the standard error is:

√(σ1²/n1 + σ2²/n2)

Making Inferences About µ1 − µ2:

Since x̄1 − x̄2 is normally distributed if the original populations are normal, or approximately normal if the populations are nonnormal and the sample sizes are large (n1, n2 > 30), then:

z = [(x̄1 − x̄2) − (µ1 − µ2)] / √(σ1²/n1 + σ2²/n2)

is a standard normal (or approximately normal) random variable. We could use this to build test statistics or confidence interval estimators for µ1 − µ2 …

Making Inferences About µ1 − µ2:

    except that, in practice, the z statistic is rarely used since


the population variances are unknown.

Instead we use a t-statistic. We consider two cases for the unknown population variances: when we believe they are equal, and conversely when they are not equal.

When Are Variances Equal?

    How do we know when the population variances are equal?


Since the population variances are unknown, we can't know for certain whether they're equal, but we can examine the sample variances and informally judge their relative values to determine whether we can assume that the population variances are equal or not.

    Test "tatistic for e'alvariances11) Calculate thepooled variance estimatoras

  • 7/24/2019 Stat Review -Keller

    192/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    2) and use it here:

    degrees of freedom

CI Estimator (Equal Variances):

The confidence interval estimator for µ1 − µ2 when the population variances are equal is given by:

(x̄1 − x̄2) ± t_{α/2} √(s_p² (1/n1 + 1/n2)),   with ν = n1 + n2 − 2 degrees of freedom and s_p² the pooled variance estimator
    Test "tatistic for 'ne'alvariances1The test statistic for when the population variances

    are unequal is given by:

  • 7/24/2019 Stat Review -Keller

    194/209

    Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc

    are unequalis given by:

    Likewise, the confidence interval estimator is:

    degrees of freedom

Example 13.2:

IDENTIFY

Two methods are being tested for assembling office chairs.

Assembly times are recorded (25 times for each method). At a 5% significance level, do the assembly times for the two methods differ?

That is, H1: µ1 − µ2 ≠ 0

Hence, our null hypothesis becomes: H0: µ1 − µ2 = 0

Reminder: This is a two-tailed test.

Example 13.2:

COMPUTE

The assembly times for each of the two methods are recorded and preliminary data are prepared.

The sample variances are similar, hence we will assume that the population variances are equal.

Example 13.2:

COMPUTE

Recall, we are doing a two-tailed test, hence the rejection region will be:

t < −t_{α/2, ν} or t > t_{α/2, ν}

The number of degrees of freedom is:

ν = n1 + n2 − 2 = 25 + 25 − 2 = 48

Hence our critical values of t (and our rejection region) become:

t < −t_{.025, 48} ≈ −2.01 or t > 2.01
Example 13.2:

COMPUTE

In order to calculate our t-statistic, we need to first calculate the pooled variance estimator, followed by the t-statistic.

Example 13.2: INTERPRET


Since our calculated t-statistic does not fall into the rejection region, we cannot reject H0 in favor of H1; that is, there is not sufficient evidence to infer that the mean assembly times differ.

Example 13.2:

INTERPRET

Excel, of course, also provides us with this information. Compare the computed t-statistic with the critical values, or look at the p-value.

Confidence Interval:

    We can compute a 95% confidence interval estimate for the


difference in mean assembly times as:

(x̄1 − x̄2) ± t_{.025, 48} √(s_p² (1/25 + 1/25))

That is, we estimate the mean difference between the two assembly methods to be between −.36 and .96 minutes. Note: zero is included in this confidence interval, which is consistent with our inability to reject H0.


Matched Pairs Experiment:

So far the samples we have drawn have been independent samples.

If, however, an observation in one sample is matched with an observation in a second sample, this is called a matched pairs experiment.

To help understand this concept, let's consider Example 13.4.

Identifying Factors:

Factors that identify the t-test and estimator of the mean difference:


Inference About the Ratio of Two Variances:

So far we've looked at comparing measures of central location, namely the mean of two populations.


When looking at two population variances, we consider the ratio of the variances, i.e. the parameter of interest to us is:

σ1² / σ2²

The sampling statistic (s1²/σ1²) / (s2²/σ2²) is F distributed with ν1 = n1 − 1 and ν2 = n2 − 1 degrees of freedom.

Inference About the Ratio of Two Variances:

Our null hypothesis is always:

H0: σ1² / σ2² = 1

(i.e. the variances of the two populations will be equal, hence their ratio will be one)

Therefore, our statistic simplifies to:

F = s1² / s2²,   with df1 = n1 − 1 and df2 = n2 − 1
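A minimal sketch (not part of the original slides) of the two-tail F-test for the ratio of two variances, using hypothetical sample variances and sizes.

```python
from scipy.stats import f

s1_sq, n1 = 4.5, 43      # hypothetical sample variance and size, population 1
s2_sq, n2 = 10.4, 107    # hypothetical sample variance and size, population 2
alpha = 0.05
df1, df2 = n1 - 1, n2 - 1

F = s1_sq / s2_sq                                # test statistic under H0: ratio = 1
lo = f.ppf(alpha / 2, df1, df2)                  # lower critical value
hi = f.ppf(1 - alpha / 2, df1, df2)              # upper critical value
p_two_tail = 2 * min(f.cdf(F, df1, df2), f.sf(F, df1, df2))

print(f"F = {F:.3f}, rejection region: F < {lo:.2f} or F > {hi:.2f}, p = {p_two_tail:.4f}")
print("Reject H0: variances differ" if (F < lo or F > hi) else "Do not reject H0")
```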

Example:

IDENTIFY

In Example 13.1, we looked at the variances of the samples of people who consumed high-fiber cereal and those who did not, and assumed they were not equal. We can use the ideas just developed to test if this is in fact the case.

We want to show: H1: σ1²/σ2² ≠ 1 (the variances are not equal to each other)

Hence we have our null hypothesis: H0: σ1²/σ2² = 1

Example:

CALCULATE

Since our research hypothesis is H1: σ1²/σ2² ≠ 1, we are doing a two-tailed test, and our rejection region is:

F < F_{1−α/2, ν1, ν2} or F > F_{α/2, ν1, ν2}

Example:

CALCULATE

Our test statistic is:

F = s1² / s2²

Since the calculated F-statistic falls in the rejection region (boundaries approximately F < .58 and F > 1.61), there is sufficient evidence to reject the null hypothesis in favor of the alternative; that is, there is a difference in the variance between the two populations.

Example: INTERPRET

We may need to work with the Excel output before drawing conclusions.

Our research hypothesis (H1) requires two-tail testing, but Excel only gives us values for one-tail testing.