08_review_of_part_i

Upload: rama-dulce

Post on 13-Apr-2018


  • 7/26/2019 08_Review_of_Part_I


    REVIEW OF PART I

    Topics Outline

    Probability, Probability Distributions, Describing Distributions

    Decision Analysis

    Statistical Inference

    Simple Linear Regression

    Terms and Concepts

    1. Probability of events
    2. Probability rules
    3. Mutually exclusive (disjoint) events
    4. Conditional probability
    5. Independent events
    6. Discrete random variable
    7. Continuous random variable
    8. Discrete probability distribution function
    9. Density function
    10. Expected value (mean) μ, variance σ², and standard deviation σ of a random variable
    11. Binomial distribution
    12. Uniform distribution
    13. Normal distribution
    14. Standard normal distribution
    15. Forward and backward calculations
    16. Types of data (categorical, numerical, cross-sectional, time series)
    17. Bar (column) graph
    18. Pie chart
    19. Histogram
    20. Box plot
    21. Measures of central tendency (mean x̄, median, mode)
    22. Measures of shape (skewness, kurtosis)
    23. Symmetric, right-skewed, left-skewed distributions
    24. Measures of variability (range, quartiles, interquartile range, variance s², standard deviation s)
    25. The empirical rule
    26. Decision analysis, decisions and outcomes
    27. Value model
    28. Payoff table
    29. Maximin and maximax criteria
    30. EMV (expected monetary value)


    31. EMV criterion
    32. Risk profile
    33. Sensitivity analysis
    34. Decision trees, nodes, branches
    35. Folding-back procedure
    36. Prior probabilities
    37. Likelihoods
    38. Posterior probabilities
    39. Law of total probability
    40. Bayes' rule
    41. EVPI (expected value of perfect information)
    42. EVSI (expected value of sample information)
    43. PrecisionTree add-in
    44. Interpretation of charts (strategy region, tornado, spider)
    45. Population, parameters
    46. Sample, statistics
    47. Simple random sample
    48. Statistical inference
    49. Sampling distributions
    50. Central Limit Theorem (CLT)
    51. Sampling distribution of X̄ (σ known)
    52. Sampling distribution of X̄ (σ unknown)
    53. t distributions and their properties
    54. Point estimate
    55. Confidence interval (critical value, confidence level, standard error, margin of error)
    56. Calculation of sample size
    57. Hypothesis tests (null hypothesis, alternative hypothesis, left-sided, right-sided, two-sided tests, test statistic, P-value)
    58. Statistical significance, level of significance
    59. Confidence interval approach for two-sided tests
    60. Type I and Type II errors
    61. Simple linear regression
    62. Explanatory and response variables
    63. Scatterplots and correlation r
    64. Least-squares regression line (interpretation of slope and intercept)
    65. Making predictions (interpolation, extrapolation)
    66. Residuals and residual plots
    67. Standard error of estimate s_e
    68. Coefficient of determination r²
    69. Outliers and influential observations
    70. Causation and lurking variables


    Example 1

    Cooper Realty is a small real estate company located in Albany, New York, specializing primarily in residential listings. They recently became interested in determining the likelihood of one of their listings being sold within a certain number of days. An analysis of company sales of 800 homes in previous years produced the following data.

                                   Days Listed Until Sold
    Initial Asking Price       Under 30   31 - 90   Over 90   Total
    Under $150,000                   50        40        10     100
    $150,000 - $199,999              20       150        80     250
    $200,000 - $250,000              20       280       100     400
    Over $250,000                    10        30        10      50
    Total                           100       500       200     800

    (a) If A is defined as the event that a home is listed for more than 90 days before being sold, estimate the probability of A.

    The joint probability table (each count divided by 800) is:

                                   Days Listed Until Sold
    Initial Asking Price       Under 30   31 - 90   Over 90   Total
    Under $150,000               0.0625    0.05      0.0125   0.125
    $150,000 - $199,999          0.025     0.1875    0.1      0.3125
    $200,000 - $250,000          0.025     0.35      0.125    0.5
    Over $250,000                0.0125    0.0375    0.0125   0.0625
    Total                        0.125     0.625     0.25     1

    P(A) = 0.25 (using the marginal probability)

    or

    P(A) = 200/800 = 0.25 (using counts)

    (b) If B is defined as the event that the initial asking price is under $150,000, estimate the probability of B.

    P(B) = 0.125 (using the marginal probability)

    or

    P(B) = 100/800 = 0.125 (using counts)

    (c) What is the probability of A ∩ B?

    P(A ∩ B) = 0.0125 (using the joint probability)

    or

    P(A ∩ B) = 10/800 = 0.0125 (using counts)


    (d) Assuming that a contract just signed to list a home has an initial asking price of less than $150,000, what is the probability that the home will take Cooper Realty more than 90 days to sell?

    P(A | B) = P(A ∩ B) / P(B) = 0.0125 / 0.125 = 0.10

    (e) Are events A and B independent?

    No, since

    0.10 = P(A | B) ≠ P(A) = 0.25

    or

    0.0125 = P(A ∩ B) ≠ P(A)P(B) = (0.25)(0.125) = 0.0313
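The counting logic in parts (a) through (e) can be checked with a short Python sketch. It assumes the price bands and day ranges exactly as in the table; the dictionary keys are labels chosen here for illustration, and independence is tested by comparing P(A ∩ B) with P(A)P(B) up to a floating-point tolerance.

```python
# Counts from the Cooper Realty table: rows = initial asking price, columns = days listed
counts = {
    "Under $150,000":      {"Under 30": 50, "31-90": 40,  "Over 90": 10},
    "$150,000 - $199,999": {"Under 30": 20, "31-90": 150, "Over 90": 80},
    "$200,000 - $250,000": {"Under 30": 20, "31-90": 280, "Over 90": 100},
    "Over $250,000":       {"Under 30": 10, "31-90": 30,  "Over 90": 10},
}
total = sum(sum(row.values()) for row in counts.values())  # 800 homes

# Marginal probabilities: sum a column (event A) or a row (event B), then divide by the total
p_a = sum(row["Over 90"] for row in counts.values()) / total  # A: listed more than 90 days
p_b = sum(counts["Under $150,000"].values()) / total           # B: asking price under $150,000
p_a_and_b = counts["Under $150,000"]["Over 90"] / total        # joint probability P(A and B)
p_a_given_b = p_a_and_b / p_b                                   # conditional probability P(A | B)

# A and B are independent only if P(A and B) equals P(A) * P(B)
independent = abs(p_a_and_b - p_a * p_b) < 1e-12
print(p_a, p_b, p_a_and_b, round(p_a_given_b, 4), independent)
```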


    Example 2

    Nine percent of undergraduate students carry credit card balances greater than $7000.

    Suppose 10 undergraduate students are selected randomly to be interviewed about credit

    card usage.

    (a) Is the selection of 10 students a binomial experiment? Explain.

    Yes. Since the students are selected randomly, p is the same from trial to trial and the trials are independent. We have a binomial experiment with n = 10 and p = 0.09:

    $$ f(x) = \frac{10!}{x!\,(10-x)!}\,(0.09)^x\,(0.91)^{10-x} $$

    (b) What is the probability that two of the students will have a credit card balance greater

    than $7000?

    $$ P(X = 2) = f(2) = \binom{10}{2}(0.09)^2(1-0.09)^{10-2} = \frac{10!}{2!\,(10-2)!}(0.09)^2(0.91)^8 = (45)(0.0081)(0.4703) = 0.1714 $$

    (c) How many students would you expect to have a credit card balance greater than $7000

    in the sample of 10 students?

    $$ E(X) = np = (10)(0.09) = 0.9 $$

    Approximately one student.

    (d) What is the variance in the number of students with credit card balances greater than $7000?

    $$ \sigma^2 = np(1-p) = (10)(0.09)(0.91) = 0.819 $$

    (e) What is the standard deviation of the number of students with credit card balances

    greater than $7000?

    $$ \sigma = \sqrt{0.819} = 0.905 $$
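As a cross-check on parts (b) through (e), here is a minimal sketch using Python's standard `math.comb`; the helper `binom_pmf` is a name introduced here for illustration, not part of any library.

```python
from math import comb, sqrt

n, p = 10, 0.09  # 10 students; 9% carry balances greater than $7000

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

prob_two = binom_pmf(2, n, p)   # part (b)
mean = n * p                    # part (c): E(X) = np
variance = n * p * (1 - p)      # part (d): np(1 - p)
std_dev = sqrt(variance)        # part (e)
print(round(prob_two, 4), round(mean, 3), round(variance, 3), round(std_dev, 3))
```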


    Example 3

    The time customers spend in a record store is uniformly distributed between 3 and 12

    minutes.

    (a) What is the probability that a customer will spend less than 5 minutes in the store?

    $$ P(X < 5) = (5 - 3)\left(\tfrac{1}{9}\right) = \tfrac{2}{9} = 0.2222 $$

    (b) What is the probability that a customer will spend exactly 5 minutes in the store?

    P(X= 5) = 0

    (c) What is the probability that a customer will spend between 5 and 15 minutes in the store?

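    Because X cannot exceed 12, only the interval from 5 to 12 carries probability:

    $$ P(5 \le X \le 15) = P(5 \le X \le 12) = (12 - 5)\left(\tfrac{1}{9}\right) = \tfrac{7}{9} = 0.7778 $$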

    (d) Determine the expected time customers spend in the store.

    $$ E(X) = \frac{a+b}{2} = \frac{3+12}{2} = \frac{15}{2} = 7.50 $$

    (e) Compute the standard deviation for the time customers spend in the store.

    $$ \sigma^2 = \frac{(b-a)^2}{12} = \frac{(12-3)^2}{12} = \frac{9^2}{12} = \frac{81}{12} = 6.75, \qquad \sigma = \sqrt{6.75} = 2.5981 $$
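A small sketch of the uniform calculations, assuming X ~ Uniform(3, 12) as stated. The hypothetical helper `uniform_prob` clips the requested interval to [3, 12] before multiplying its length by the density 1/9, which is why the 5-to-15 question reduces to 5-to-12.

```python
from math import sqrt

a, b = 3.0, 12.0  # X ~ Uniform(3, 12), minutes spent in the store

def uniform_prob(lo: float, hi: float) -> float:
    """P(lo <= X <= hi): clip the interval to [a, b], then multiply its length by 1/(b - a)."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)

p_less_5 = uniform_prob(a, 5)     # part (a): 2/9
p_5_to_15 = uniform_prob(5, 15)   # part (c): clipped to [5, 12], giving 7/9
mean = (a + b) / 2                # part (d)
variance = (b - a) ** 2 / 12      # 81/12
std_dev = sqrt(variance)          # part (e)
print(round(p_less_5, 4), round(p_5_to_15, 4), mean, variance, round(std_dev, 4))
```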


    Example 4

    The scores of adults on an IQ test are approximately normal with mean 100 and standard deviation 15.

    μ = 100, σ = 15

    (a) Corinne scores 118 on such a test. She scores higher than what percent of all adults?

    $$ z = \frac{x - \mu}{\sigma} = \frac{118 - 100}{15} = 1.2 $$

    The area to the left of 1.2 is 0.8849.

    Corinne scores higher than 88.49% of all adults.

    (b) The organization MENSA, which calls itself "the high IQ society," requires an IQ score of 130 or higher for membership. What percent of adults would qualify for membership?

    $$ z = \frac{130 - 100}{15} = 2 $$

    area to the right of 2 = 1 − (area to the left of 2) = 1 − 0.9772 = 0.0228

    Hence, 2.28% of adults would qualify for membership.

    The same question, answered using the 68-95-99.7% rule:

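    130 is two standard deviations above the mean. About 95% of adults score within two standard deviations of the mean, and the remaining 5% is split equally between the two tails. So, approximately 2.5% of adults have IQs of 130 or more.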

    (c) What percent of all adults score between 90 and 120?

    $$ z_{90} = \frac{90 - 100}{15} = -0.67, \qquad z_{120} = \frac{120 - 100}{15} = 1.33 $$

    area between 90 and 120 = (area to the left of 1.33) − (area to the left of −0.67)
                            = 0.9082 − 0.2514
                            = 0.6568

    65.68% of all adults score between 90 and 120.
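The table lookups in parts (a) through (c) can be reproduced with `statistics.NormalDist` from the Python standard library (3.8+). This is a sketch, not the course's method: `cdf` uses exact z-values, so part (c) comes out near 0.6563 rather than 0.6568, which was read from z-values rounded to two decimals.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # IQ scores of adults

below_118 = iq.cdf(118)                    # (a) share of adults scoring below 118
at_least_130 = 1 - iq.cdf(130)             # (b) share qualifying for MENSA
between_90_120 = iq.cdf(120) - iq.cdf(90)  # (c) share scoring between 90 and 120
print(f"{below_118:.4f} {at_least_130:.4f} {between_90_120:.4f}")
```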


    (d) What percent of all adults score within one standard deviation of the mean?

    The area within one standard deviation of the mean of a normal distribution with mean μ and standard deviation σ is equal to the area between −1 and +1 on the standard normal distribution.

    area between 85 and 115 = (area to the left of z = +1) − (area to the left of z = −1)
                            = 0.8413 − 0.1587
                            = 0.6826

    68.26% of all adults score within one standard deviation from the mean.

    (e) What percent of all adults score within two standard deviations from the mean?

    area between 70 and 130 = (area to the left of z = +2) − (area to the left of z = −2)
                            = 0.9772 − 0.0228
                            = 0.9544

    95.44% of all adults score within two standard deviations from the mean.

    (f) What IQ scores would place Corinne in the bottom 30% of all adults?

    The area of 0.3015 is to the left of z = −0.52.

    Solving

    $$ \frac{x - 100}{15} = -0.52 $$

    gives x = 92.20 ≈ 92.

    Scores below 92 would place Corinne in the bottom 30% of all adults.

    (g) How well must Corinne do in order to place in the top 20% of all adults?

    The cut-off point for top 20% is equal to the cut-off point for bottom 80%.

    The area of 0.7995 is to the left of z = 0.84.

    Solving

    $$ \frac{x - 100}{15} = 0.84 $$

    gives x = 112.6 ≈ 113.

    Corinne must score 113 or better to place in the top 20% of all adults.
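The areas in parts (d) and (e) and the backward calculations in parts (f) and (g) map onto `cdf` and `inv_cdf` of the same standard-library `NormalDist`. This is an illustrative sketch: `inv_cdf` uses the exact percentile z-values (about −0.524 and 0.842) instead of the table values −0.52 and 0.84, so the cut-offs come out near 92.1 and 112.6, which still round to the same whole-number scores as in the text.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

within_1sd = iq.cdf(115) - iq.cdf(85)  # (d) within one standard deviation of the mean
within_2sd = iq.cdf(130) - iq.cdf(70)  # (e) within two standard deviations
bottom_30_cutoff = iq.inv_cdf(0.30)    # (f) score with 30% of adults below it
top_20_cutoff = iq.inv_cdf(0.80)       # (g) the top 20% starts where the bottom 80% ends
print(f"{within_1sd:.4f} {within_2sd:.4f} {bottom_30_cutoff:.2f} {top_20_cutoff:.2f}")
```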


    Example 5

    Marketing a New Product at Acme (See Acme_MarketingDecisions.xlsx)

    The Acme Company is trying to decide whether to market a new product. Acme believes that it might be wise to introduce the product in a regional test market before introducing it nationally. Acme estimates that the net cost of the test market is $100,000. Based on the results of the test market, it can then decide whether to market the product nationally, in which case it will incur a fixed cost of $7 million. Acme's unit margin (the difference between the anticipated selling price and the known unit cost of the product) is $18. We assume this is relevant only for the national market.

    Acme classifies the results in either the test market or the national market as great, fair, or awful. Let NG, NF, and NA represent great, fair, and awful national-market results, respectively, and TG, TF, and TA represent similar events for the test market.

    In the absence of any test-market information, Acme estimates that the probabilities of the three national-market outcomes are 0.45, 0.35, and 0.20, respectively. Each of the results in the national market is accompanied by a forecast of total units sold. These sales volumes (in 1000s of units) are 600 (great), 300 (fair), and 90 (awful). In addition, Acme has the following historical data from products that were introduced into both test markets and national markets.

    Of the products that eventually did great in the national market, 64% did great in the test market,

    26% did fair in the test market, and 10% did awful in the test market.

    Of the products that eventually did fair in the national market, 18% did great in the test market, 57% did fair in the test market, and 25% did awful in the test market.

    Of the products that eventually did awful in the national market, 9% did great in the test market, 48% did fair in the test market, and 43% did awful in the test market.

    (a) What are Acme's possible strategies?

    Acme must first decide whether to run a test market.

    Then it must decide whether to introduce the product nationally.

    (b) What are the prior probabilities of national-market results?

    P(NG) = 0.45 P(NF) = 0.35 P(NA) = 0.20

    (c) What are the likelihoods of fair test-market results, given national-market results?

    According to the historical percentages,

    P(TF|NG) = 0.26 P(TF|NF) = 0.57 P(TF|NA) = 0.48

    (d) What is the probability of a fair test-market result?

    Applying the Law of total probability, we get

    $$ P(TF) = P(TF \mid NG)P(NG) + P(TF \mid NF)P(NF) + P(TF \mid NA)P(NA) = 0.26(0.45) + 0.57(0.35) + 0.48(0.20) = 0.4125 $$


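    (e) What is the probability of a great national-market result, given a fair test-market result?

    Using Bayes' rule gives

    $$ P(NG \mid TF) = \frac{P(TF \mid NG)\,P(NG)}{P(TF)} = \frac{0.26(0.45)}{0.4125} = \frac{0.117}{0.4125} = 0.2836 $$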

    (f) Is this result reasonable?

    This is a reasonable result. In the absence of test-market information, the probability of a great national market is 0.45. However, after a test market with only fair results, the probability of a great national market is revised down to 0.2836.
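The prior-to-posterior revision in parts (b) through (e) can be written out for all three test-market results at once. A minimal Python sketch, assuming the likelihoods are exactly the historical percentages quoted above; the dictionary layout is an illustrative choice.

```python
priors = {"NG": 0.45, "NF": 0.35, "NA": 0.20}  # P(national result)
# likelihoods[test][national] = P(test result | national result), from the historical data
likelihoods = {
    "TG": {"NG": 0.64, "NF": 0.18, "NA": 0.09},
    "TF": {"NG": 0.26, "NF": 0.57, "NA": 0.48},
    "TA": {"NG": 0.10, "NF": 0.25, "NA": 0.43},
}

# Law of total probability: P(T) = sum over N of P(T | N) P(N)
p_test = {t: sum(lk[n] * priors[n] for n in priors) for t, lk in likelihoods.items()}

# Bayes' rule: P(N | T) = P(T | N) P(N) / P(T)
posteriors = {
    t: {n: likelihoods[t][n] * priors[n] / p_test[t] for n in priors}
    for t in likelihoods
}
print({t: round(p, 4) for t, p in p_test.items()})
print({n: round(p, 4) for n, p in posteriors["TF"].items()})
```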

    (g) Interpret the decision tree developed with Excel.

    To interpret this tree, recall that each value just below each node name is an EMV. (These are colored red or green in Excel.) For example, the 796.76 in cell B41 is the EMV for the entire decision problem. It means that Acme's best EMV from acting optimally is $796,760.


    As another example, the 74 in cell D35 means that if Acme ever gets to that point (there is no test market and the product is marketed nationally), the EMV is $74,000. Actually, this is the expected selling profit minus the $7 million fixed cost, so the expected selling profit, given that no information from a test market has been obtained, is $7,074,000: the expected sales of 0.45(600,000) + 0.35(300,000) + 0.20(90,000) = 393,000 units times the $18 unit margin.

    (h) Use the decision tree to find Acme's optimal strategy.

    Acme's optimal strategy is apparent by following the TRUE branches from left to right:

    Acme should first run a test market. If the test-market result is great, the product should be marketed nationally. However, if the test-market result is fair or awful, the product should be abandoned. In these cases the prospects from a national market look bleak, so Acme should cut its losses. (And there are losses. In these latter two cases, Acme has already spent $100,000 on the test market and has nothing to show for it.)

    (i) Construct the risk profile of the optimal strategy.

    There are two values at each end node. The bottom number is the combined monetary value along this sequence of branches, and the top number is the probability of this sequence of branches. This information leads directly to the probability distribution in the risk profile:


    For this optimal strategy, the only possible monetary outcomes are a gain of $3,700,000 and losses of $100,000, $1,700,000, and $5,480,000.

    Their respective probabilities are 0.288, 0.631, 0.063, and 0.018.

    Fortunately, the large possible losses are unlikely enough that the EMV is still positive, $796,760.
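The folding-back procedure behind parts (g) through (i) can be sketched in plain Python. Assumptions: values are in $1000s, the test market costs 100, the fixed cost is 7000, each national result's profit is units × 18 − 7000, and the test-market branch weights outcomes by the joint probabilities P(T ∩ N) = P(T | N)P(N). This reconstructs the tree's logic; it is not the Excel add-in's own calculation.

```python
# Acme decision tree; all monetary values in $1000s
priors = {"NG": 0.45, "NF": 0.35, "NA": 0.20}
units = {"NG": 600, "NF": 300, "NA": 90}  # forecast sales, 1000s of units
likelihoods = {                            # P(test result | national result)
    "TG": {"NG": 0.64, "NF": 0.18, "NA": 0.09},
    "TF": {"NG": 0.26, "NF": 0.57, "NA": 0.48},
    "TA": {"NG": 0.10, "NF": 0.25, "NA": 0.43},
}
unit_margin, fixed_cost, test_cost = 18, 7000, 100
profit = {n: units[n] * unit_margin - fixed_cost for n in priors}  # 3800, -1600, -5380

# Branch "no test market": market nationally only if that beats abandoning (EMV 0)
emv_no_test = max(0.0, sum(priors[n] * profit[n] for n in priors))

# Branch "test market": fold back each test result, then subtract the test cost
emv_test = -test_cost
risk_profile = {}  # monetary outcome -> probability under the test-market strategy
for lk in likelihoods.values():
    joint = {n: lk[n] * priors[n] for n in priors}                 # P(T and N)
    p_t = sum(joint.values())                                       # P(T)
    emv_market = sum(joint[n] * profit[n] for n in priors) / p_t   # E[profit | T]
    if emv_market > 0:  # market nationally after this test result
        emv_test += p_t * emv_market
        for n in priors:
            outcome = profit[n] - test_cost
            risk_profile[outcome] = risk_profile.get(outcome, 0.0) + joint[n]
    else:               # abandon; only the test cost is lost
        risk_profile[-test_cost] = risk_profile.get(-test_cost, 0.0) + p_t

best = "run test market" if emv_test > emv_no_test else "skip test market"
print(round(emv_test, 2), round(emv_no_test, 2), best)
print({k: round(v, 3) for k, v in sorted(risk_profile.items())})
```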

    (j) Do you think Acme should implement this optimal strategy?

    You might argue that the large potential losses and the slightly higher than 70% chance of some loss (0.631 + 0.063 + 0.018 = 0.712) should persuade Acme to abandon the product right away, without a test market. However, this is what playing the averages with EMV is all about. Because the EMV of this optimal strategy is greater than 0, the EMV from abandoning the product right away, Acme should go ahead with this optimal strategy if the company is indeed an EMV maximizer.

    (k) Use the strategy region chart given below to investigate whether the decision about whether to run a test market or to market nationally changes when the unit margin (currently $18) varies from $8 to $2

    The chart indicates that for small unit margins, it is better not to run a test market. The top line, at value 0, corresponds to abandoning the product altogether, whereas the bottom line, at value −100, corresponds to running a test market and then abandoning the product regardless of the results.

    Similarly, for large unit margins, it is also best not to run a test market. Again, the top line is 100 above the bottom line. However, the reasoning now is different. For large unit margins, the company should market nationally regardless of test-market results, so there is no reason to spend money on a test market. Finally, for intermediate unit margins, as in the original model, the chart shows that it is best to run a test market.

    [Figure: Probabilities for Decision Tree 'Acme', Optimal Path of Entire Decision Tree (y-axis: Probability)]

    [Figure: Strategy Region of Decision Tree 'Acme', Expected Value of Node 'Test market?' (B41) With Variation of Unit margin (B8) (x-axis: Unit margin (B8); y-axis: Expected Value; series: No, Yes)]
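The strategy region can be approximated by recomputing both branches of the tree over a grid of unit margins. The sketch below is a hypothetical reconstruction (same inputs as the text, values in $1000s, margins from $5 to $30 in $5 steps to match the chart's axis); it reports "Yes" when running a test market has the higher EMV.

```python
# Same Acme inputs as in the text ($1000s); the unit margin is the parameter being varied
priors = {"NG": 0.45, "NF": 0.35, "NA": 0.20}
units = {"NG": 600, "NF": 300, "NA": 90}
likelihoods = {
    "TG": {"NG": 0.64, "NF": 0.18, "NA": 0.09},
    "TF": {"NG": 0.26, "NF": 0.57, "NA": 0.48},
    "TA": {"NG": 0.10, "NF": 0.25, "NA": 0.43},
}
fixed_cost, test_cost = 7000, 100

def strategy_values(unit_margin):
    """Return (EMV without a test market, EMV with a test market) in $1000s."""
    profit = {n: units[n] * unit_margin - fixed_cost for n in priors}
    no_test = max(0.0, sum(priors[n] * profit[n] for n in priors))
    with_test = -test_cost
    for lk in likelihoods.values():
        # P(T) * E[profit | T]: market only when positive, otherwise abandon (value 0)
        with_test += max(0.0, sum(lk[n] * priors[n] * profit[n] for n in priors))
    return no_test, with_test

for margin in range(5, 31, 5):
    no_test, with_test = strategy_values(margin)
    choice = "Yes" if with_test > no_test else "No"
    print(f"${margin}: no test {no_test:8.1f}, test {with_test:8.1f}, test market? {choice}")
```

At small margins both branches abandon, so the test branch sits exactly 100 below; at large margins both branches market nationally, and the gap is again the 100 spent on the test.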


    (l) Currently, the fixed cost of the test market is $100,000. How much is this test market really worth? In other words, what is the EVSI (expected value of sample information)?

    EVSI = EMV with (free) sample information − EMV without information

    The EMV from test marketing is $796,760, and this figure already has the $100,000 cost of the test market subtracted. Therefore, if this test market were free, the expected profit would be $896,760.

    On the other hand, the EMV from not running a test market is $74,000 (see cell C31 in the tree). The difference is EVSI:

    EVSI = $896,760 − $74,000 = $822,760

    Intuitively, running the test market is worth something because it changes the optimal decision. With no test-market information, the best decision is to market nationally (see the top part of the tree). However, with the test-market information, the ultimate decision depends on the test-market result. Specifically, Acme should market nationally only if the test-market result is great. This is what makes information worth something: its outcome affects the optimal decision.

    (m) In general, Acme might have many sources of information it could obtain that would help it make its national decision; the test market is just one of them. How much could such information be worth?

    This is answered by EVPI, the expected value of perfect information. Imagine that Acme could purchase an envelope that has the true national-market result (great, fair, or awful) written inside. EVPI is what this envelope is worth.

    If the envelope reveals that the national-market result will be great, then Acme will have a profit of $3,800,000 (600,000 units sold times $18 per unit, minus the fixed cost of $7 million). If the contents of the envelope reveal that the national market will be fair or awful, Acme should abandon the product right away (that is, the profit will be $0).

    The probabilities for great, fair, and awful national markets are 0.45, 0.35, and 0.20, respectively. Therefore, if the envelope (perfect information) is free,

    EMV with (free) perfect information = 0.45($3,800,000) + 0.35($0) + 0.20($0) = $1,710,000

    If there is no information, the EMV is $74,000. Therefore,

    EVPI = EMV with (free) perfect information − EMV without information
         = $1,710,000 − $74,000 = $1,636,000

    No sample information, test market or otherwise, could possibly be worth more than this. So if some hotshot market analyst offers to provide extremely reliable market information to Acme for, say, $1.8 million, Acme knows this information cannot be worth its cost.
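EVSI and EVPI follow directly from the tree inputs. In this sketch (values in $1000s, same data as the text), free sample information is valued by letting Acme pick the better of marketing and abandoning after each test result, and perfect information by letting it choose after learning the true national result. The variable names are illustrative.

```python
priors = {"NG": 0.45, "NF": 0.35, "NA": 0.20}
units = {"NG": 600, "NF": 300, "NA": 90}
likelihoods = {
    "TG": {"NG": 0.64, "NF": 0.18, "NA": 0.09},
    "TF": {"NG": 0.26, "NF": 0.57, "NA": 0.48},
    "TA": {"NG": 0.10, "NF": 0.25, "NA": 0.43},
}
unit_margin, fixed_cost = 18, 7000
profit = {n: units[n] * unit_margin - fixed_cost for n in priors}

# No information: the better of marketing nationally and abandoning
emv_no_info = max(0.0, sum(priors[n] * profit[n] for n in priors))

# Free sample information: after each test result, the better of marketing and abandoning
emv_free_sample = sum(
    max(0.0, sum(lk[n] * priors[n] * profit[n] for n in priors))
    for lk in likelihoods.values()
)

# Free perfect information: knowing N in advance, market only when profit[N] > 0
emv_free_perfect = sum(priors[n] * max(0.0, profit[n]) for n in priors)

evsi = emv_free_sample - emv_no_info   # worth of the test market
evpi = emv_free_perfect - emv_no_info  # upper bound on any information
print(round(evsi, 2), round(evpi, 2))
```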


    Example 6

    To estimate the mean height of male students on your campus, you measure a simple random sample of 25 students. You know from government data that the heights of young men vary according to the normal distribution with mean μ = 70 inches and standard deviation σ = 2.8 inches.

    (a) If you choose one student at random, what is the probability that he is between 69 and 71 inches tall?

    For the height X of an individual student,

    P(69


    Example 7

    A manager of an insurance company wanted to see how well one of his sales representatives was doing, so he randomly selected 30 matured policies that had been sold by the sales rep and computed the net profit (premium charged minus paid claims) for each of the 30 policies:

    Profit (in $) from 30 policies

      222.80    463.35   2089.40
     1756.23    -66.20   2692.75
     1100.85     57.90   2495.70
     3340.66    833.95   2172.70
     1006.50   1390.70   3249.65
      445.50   2447.50   -397.10
     3255.60   1847.50   -397.31
     3701.85    865.40    186.25
     -803.35   1415.65    590.85
     3865.90   2756.94    578.95

    (a) Are the necessary conditions for statistical inference satisfied?

    The sample was selected randomly from the matured policies sold by the sales representative. The sample appears to be unimodal and fairly symmetric, without strong skewness or outliers. The sample size is fairly large, so the use of the t distribution with df = n − 1 = 30 − 1 = 29 is safe.

    (b) Construct a 95% confidence interval for the mean profit of policies sold by the sales rep.

    We calculate the sample mean x̄ = $1,438.90 and the standard deviation s = $1,329.60. The critical value t* for the t distribution with df = n − 1 = 30 − 1 = 29 (from Table C) for 95% confidence is 2.045.

    [Figure: histogram of profit from the 30 policies (x-axis: Profit; y-axis: Count)]


    The 95% confidence interval is:

    $$ \bar{x} \pm t^* \frac{s}{\sqrt{n}} = 1438.90 \pm 2.045\,\frac{1329.60}{\sqrt{30}} = 1438.90 \pm 2.045(242.75) = 1438.90 \pm 496.42 $$

    = $942.48 to $1,935.32
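The interval can be reproduced from the raw data with Python's `statistics` module. The critical value t* = 2.045 is hard-coded from Table C as in the text, because the standard library has no t quantile function; a library such as SciPy could supply it instead. This is a sketch for checking the arithmetic, not a replacement for the table-based method.

```python
from math import sqrt
from statistics import mean, stdev

# Net profit ($) of the 30 sampled policies
profits = [
    222.80, 463.35, 2089.40, 1756.23, -66.20, 2692.75,
    1100.85, 57.90, 2495.70, 3340.66, 833.95, 2172.70,
    1006.50, 1390.70, 3249.65, 445.50, 2447.50, -397.10,
    3255.60, 1847.50, -397.31, 3701.85, 865.40, 186.25,
    -803.35, 1415.65, 590.85, 3865.90, 2756.94, 578.95,
]
n = len(profits)
x_bar = mean(profits)
s = stdev(profits)   # sample standard deviation (divides by n - 1)
t_star = 2.045       # df = 29, 95% confidence; copied from Table C, not computed
se = s / sqrt(n)     # standard error of the mean
margin = t_star * se
ci = (x_bar - margin, x_bar + margin)
print(f"mean {x_bar:.2f}, s {s:.2f}, SE {se:.2f}, CI ({ci[0]:.2f}, {ci[1]:.2f})")
```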

    (c) Interpret the confidence interval in the proper context.

    From our analysis of the selected policies, we are 95% confident that the true mean profit of all policies sold by this sales rep is contained in the interval from $942.48 to $1,935.32.

    Note: Insurance losses are notoriously subject to outliers. One very large loss could influence the average profit substantially. However, there were no such cases in this data set.

    (d) Is there evidence that the mean profit of policies sold by this sales representative is less than $1,500?

    To test the hypotheses

    $$ H_0: \mu = 1500 \qquad H_a: \mu < 1500 $$

    we calculate the test statistic

    $$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1438.90 - 1500}{1329.60/\sqrt{30}} = \frac{-61.10}{242.75} = -0.2517 $$

    The P-value is the probability of observing a sample mean as small as $1,438.90 (or smaller) if the true mean were $1,500, as the null hypothesis states.

    Using the t table, we find P-value > 0.25.

    If the mean were $1,500, we would expect a sample of size 30 to have a mean this low more than 25% of the time. Therefore, the result we obtained from the sampled policies is not surprising, and we conclude that there is not enough evidence in this sample of policies to indicate that the true mean is below $1,500.
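The test statistic uses only the summary statistics, so a few lines suffice. This sketch stops at the t value; the P-value still comes from the t table (or, if SciPy happens to be available, from `scipy.stats.t.cdf(t, df)`), which is an assumption about tooling rather than part of the original solution.

```python
from math import sqrt

x_bar, s, n = 1438.90, 1329.60, 30  # summary statistics from part (b)
mu_0 = 1500.0                       # H0: mu = 1500 versus Ha: mu < 1500 (left-sided)

se = s / sqrt(n)          # standard error of the mean
t = (x_bar - mu_0) / se   # how many standard errors x_bar lies below mu_0
df = n - 1
print(f"t = {t:.4f} with df = {df}")
```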


    Example 8

    European GDP growth

    Is economic growth in Europe related to growth in the United States? Here's a regression output for the average growth in 25 European countries (in % of Gross Domestic Product) versus the growth in the United States. Each point represents one of the years from 1970 to 2007.

    (a) Describe the relationship between the economic growth in the United States and the economic growth in Europe.

    The scatterplot shows a positive linear association, with one or two possible outliers.

    The correlation is r = √0.2965 = 0.545 (positive, because the slope of the fitted line is positive), indicating a moderate linear relationship.

    (b) Economists speculate that the growth rate of the United States can help predict the growth rate of the 25 European countries. Do you think the data confirm the economists' speculation?

    r² = 0.2965

    About 30% of the variation in the growth rates of the 25 European countries is accounted for by the growth rates of the United States.

    The growth rates of the United States can be used to give (very rough) estimates of the growth rates of the 25 European countries. The spread is so wide that the estimates would not be very reliable.

    [Figure: scatterplot of Annual GDP Growth of 25 European Countries (%) versus Annual GDP Growth of US (%), with fitted line y = 0.3616x + 1.3297 and R² = 0.2965]


    (c) In 2007, the United States experienced a 3.20% growth, while Europe grew at a rate of 2.16%. Is this more or less than you would have predicted?

    The predicted value of y at an x value of 3.2% is:

    Growth(25 European countries) = 1.330 + 0.3616 × Growth(United States)
                                  = 1.330 + 0.3616(3.2)
                                  = 2.48712%, or 2.49%

    The predicted value from the linear model is higher than the actual percentage, so Europe grew less than predicted:

    residual = observed y − predicted y = 2.16 − 2.49 = −0.33%

    (d) Would your prediction be better if the outlier (x, y) = (0.2167, 4.3748) had been removed? After removing the outlier, the summary statistics are

    x̄ = 3.1487     ȳ = 2.3882     r = 0.6347
    s_x = 2.0174   s_y = 1.3382

    Find the equation of the new regression line.

    The slope is

    $$ b = r\,\frac{s_y}{s_x} = 0.6347\left(\frac{1.3382}{2.0174}\right) = (0.6347)(0.6633) = 0.421 $$

    The intercept is

    $$ a = \bar{y} - b\,\bar{x} = 2.3882 - (0.421)(3.1487) = 1.0626 $$

    The least-squares regression line is thus

    $$ \hat{y} = 1.0626 + 0.421x $$
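The slope and intercept formulas b = r·s_y/s_x and a = ȳ − b·x̄ can be checked from the summary statistics alone. A minimal sketch; because b is not rounded to 0.421 before computing a, the intercept may differ from 1.0626 slightly in the fourth decimal place.

```python
# Summary statistics after removing the outlier, as given in part (d)
x_bar, y_bar = 3.1487, 2.3882
s_x, s_y, r = 2.0174, 1.3382, 0.6347

b = r * s_y / s_x       # least-squares slope
a = y_bar - b * x_bar   # intercept: the line passes through (x_bar, y_bar)
r_squared = r ** 2      # coefficient of determination, used in part (f)
print(f"y_hat = {a:.4f} + {b:.4f} x, r^2 = {r_squared:.4f}")
```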

    (e) How influential is the outlier for the correlation?

    The correlation rises from 0.545 to 0.635.

    Removing the outlier makes the linear association stronger and so moves r closer to 1.

    (f) How influential is the outlier for the coefficient of determination?

    The new coefficient of determination is

    $$ r^2 = 0.6347^2 = 0.4028 $$

    The coefficient of determination rises from 0.2965 to 0.4028. Removing the outlier raises the percentage of explained variation from about 30% to about 40%.


    (g) How influential is the outlier for the regression line?

    The slope increases slightly, from 0.3616 to 0.421, with the outlier removed. This is too small a change to consider the outlier influential for the regression line.

    (h) Find the prediction for European growth when the United States growth is 3.20% and compare it with the prediction from part (c).

    For x = 3.20, we predict

    $$ \hat{y} = 1.0626 + 0.421x = 1.0626 + (0.421)(3.20) = 2.4098 \approx 2.41\% $$

    For predictions, both lines give similar results. The prediction with the outlier removed is only slightly better:

    residual = observed y − predicted y = 2.16 − 2.41 = −0.25%
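Parts (c) and (h) compare the two fitted lines at x = 3.20. A short sketch with the coefficients copied from the text; the labels "all data" and "outlier removed" are introduced here for illustration.

```python
x_2007, y_2007 = 3.20, 2.16  # US growth and observed European growth in 2007 (%)
lines = {
    "all data": (1.330, 0.3616),         # (intercept, slope) from part (c)
    "outlier removed": (1.0626, 0.421),  # (intercept, slope) from part (d)
}

residuals = {}
for name, (a, b) in lines.items():
    predicted = a + b * x_2007
    residuals[name] = y_2007 - predicted  # observed minus predicted
    print(f"{name}: predicted {predicted:.2f}%, residual {residuals[name]:+.2f}%")
```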