comm291 practice midterm


Upload: al3ung

Post on 29-Dec-2015


DESCRIPTION

Practice midterm for Business Statistics Course at UBC

TRANSCRIPT

Page 1: Comm291 Practice Midterm

Part A. Midterm Exam 2013

Notes: This exam has 9 questions. The duration is 2 hours. Books, notes, and calculators are allowed, but not computers, cellphones or on-line connectivity.

MT2013: Question 1 "A sampling of sampling questions"

a) The Human Resources Department of a large university maintains records on its faculty members. The table displays some of these data.

Place an X in the space beside each variable that is best described as Quantitative.

_ Payroll Number
_ Birth date
_ Years of Employment
_ Teaching Rating
_ Faculty
_ Salary Classification

b) Which of the following is (are) based on cross-sectional data?

_A. Company quarterly profits
_B. Percentage of Canadian adults who work full-time
_C. Historical closing stock prices
_D. Yearly student enrolments
_E. Annual costs

c) Which of the following is (are) time series data?

_A. Number of employees in 2012
_B. This month's demand for an automotive part
_C. This quarter's sales of automobiles
_D. Weekly receipts at a clothing boutique
_E. Percentage of employees who are female

d) The administration of a large university wants to study the types of wellness programs that would interest its employees. They plan to survey a random sample of employees. Under consideration are several sampling plans. Beside each plan, write the number of the sampling strategy from the following list:

1: Simple Random Sampling
2: Stratified Random Sampling
3: Cluster Sampling
4: Systematic Sampling

_ (i) There are five categories of employees (administration, faculty, professional staff, clerical and maintenance). Randomly select ten individuals from each category.
_ (ii) Each employee has an ID number. Randomly select 50 numbers.
_ (iii) Randomly select a school within the university (e.g., Business School) and survey all of the individuals (administration, faculty, professional staff, clerical and maintenance) who work in that school.
_ (iv) The HR Department has an alphabetized list of newly hired employees (hired within the last five years). After starting the process by randomly selecting an employee from the list, every fifth name is chosen to be included in the sample.


Page 2: Comm291 Practice Midterm

e) A manufacturer of toys claims that less than 3% of his toys are defective. When 100 toys were drawn from one production run of 5,000 toys, 5% were found to be defective. For each term on the left, select the matching answer from the list to the right, and write the number in the blank.

_ Population        1  The 3% value
_ Sample            2  The 5% value
_ Sampling Frame    3  The 100 toys
_ Parameter         4  The 5,000 toys
_ Statistic         5  All toys produced

MT2013: Question 2 "(Could this label be called a 'phone tag'?)"

A magazine that publishes product reviews conducted a survey of teenagers' preferences for cell phones. Three brands of cell phone designed specifically with teens in mind were the focus of the study. The table summarizes responses by brand and gender.

Cell Phone       Male   Female   Total
Call Me Maybe      55       87     142
Phone Fun XS       99      150     249
Black Keys II     196      113     309
Total             350      350     700

a) Which of the following charts would be appropriate for displaying the marginal distribution of cell phone brand?

_A. Histogram
_B. Boxplot
_C. Bar Chart
_D. Line Graph
_E. Stem and Leaf Display

b) What percent of teenagers preferred Call Me Maybe?

_A. 50%   _B. 41%   _C. 25%   _D. 16%   _E. 20%

c) What percent of female teenagers preferred the Phone Fun XS?

_A. 43%   _B. 60%   _C. 21%   _D. 50%   _E. 16%

d) What percent of teenagers who preferred the Black Keys II were males?

_A. 63%   _B. 32%   _C. 16%   _D. 50%   _E. 41%

e) Which of the following statements is true?

_A. It appears that cell phone brand preference and gender are not related.
_B. It appears that cell phone brand preference and gender are not independent.
_C. It appears that cell phone brand preference and gender are independent.
_D. A scatterplot will be more informative here than a table.
_E. None of the above


Page 3: Comm291 Practice Midterm

MT2013: Question 3 "Spring into these summary questions"

a) You have a set of 30 numbers. The standard deviation of these numbers is zero. You can be certain that:

_A. Half of the numbers are above the mean
_B. All of the numbers in the set are zero
_C. All of the numbers in the set are equal
_D. The numbers are evenly spaced below and above the mean

b) Here is the five number summary of the hourly wages for sales managers.

Min      Q1       Median   Q3       Max
20.94    37.64    44.77    49.24    67.11

(i) The shape of this distribution is best described as:

_A. Symmetric
_B. Skewed to the right
_C. Skewed to the left
_D. Not enough information to tell

(ii) The IQR for these data is:

(iii) Compute the lower and upper inner fences:

Lower inner fence:
Upper inner fence:

Space for calculations:

(iv) Are there any outliers, as defined by the "inner fences" criterion?

_A. Yes, only on the left side of the distribution
_B. Yes, only on the right side of the distribution
_C. Yes, on both sides of the distribution
_D. No

(v) Suppose there had been an error and that the lowest hourly wage for sales managers was $18.50 instead of $20.94. Indicate how this change would affect the following summary statistics (increase, decrease, or stay about the same):

a. Mean      Increase   Decrease   Stay the Same
b. Median    Increase   Decrease   Stay the Same
c. Range     Increase   Decrease   Stay the Same
d. IQR       Increase   Decrease   Stay the Same

Page 4: Comm291 Practice Midterm

c) In a perfectly symmetrical distribution, which of the following statements is false?

_A. The distance from Q1 to Q2 is equal to the distance from Q2 to Q3.
_B. [illegible in source]
_C. [illegible] to Q2 is the same as the distance [illegible]
_D. The distance from [illegible] is [illegible] the distance from the smallest to the [illegible]

d) Here is a stem plot of scores (out of 200) in a graduate finance course.

[Stem plot: stems 12 to 18; leaves not recoverable from the source]

(i) How many students were in the course?
(ii) What was the maximum score?
(iii) What is the median score?

e) An office supply chain has stores in Toronto and Vancouver. One of these stores is to be closed within the coming year and, to help make the decision, management reviews sales data. Below are boxplots of monthly unit sales for both locations.

Which of the following statements is not correct?

_A. Monthly sales are higher in Toronto compared to Vancouver.
_B. The IQR for sales in Toronto is larger than for Vancouver.
_C. Monthly sales are less variable in Vancouver compared to Toronto.
_D. Both distributions are fairly symmetric.
_E. Monthly sales are more variable in Vancouver compared to Toronto.


Page 5: Comm291 Practice Midterm

MT2013: Question 4 "Time for relationship-building"

a) A consumer research group investigating the relationship between the price of meat (per kg) and the fat content (grams) gathered data that produced the following scatterplot.

[Figure: Scatterplot of Fat Grams vs Price/kg]

(i) Which best describes the association between the price of meat and fat content?

_A. Negative, moderately strong
_B. Negative, weak
_C. Positive, strong
_D. Positive, weak
_E. No apparent association

(ii) If the point in the lower left hand corner ($2.00 per kilogram, 6 grams of fat) is removed, the correlation would most likely

_A. remain the same
_B. become stronger negative
_C. become weaker negative
_D. become positive
_E. become zero

b) For each of the following pairs of variables, would you expect a large negative correlation, a large positive correlation, or a small correlation? Circle your choices.

1. The age of a used car and its price     Large Neg.   Large Pos.   Small
2. The height and weight of a person       Large Neg.   Large Pos.   Small
3. The height and the IQ of a person       Large Neg.   Large Pos.   Small

c) For each of the following statements about the correlation coefficient, r, decide whether it is True or False. Circle your choices as appropriate.

1. r equals the proportion of times two variables lie on a straight line.   True   False
2. r will be +1.0 only if all the data lie exactly on a horizontal straight line.   True   False
3. r measures the fraction of outliers that appear in a scatterplot.   True   False
4. If the correlation between X and Y is r, the correlation between Y and X is -r.   True   False
5. r is a unitless number and must always lie between -1.0 and +1.0 inclusive.   True   False

Page 6: Comm291 Practice Midterm

MT2013: Question 5 "If mistrust is the opposite of trust, would mistress be the opposite of stress?"

A labour efficiency consultant collected some data on several employees of a manufacturing operation: their stress levels (X, on a scale from 0 to 10) and their productivity levels (Y, in parts made per hour). She only recorded some of the relevant computations, as follows:

x̄ = 5.4      ȳ = 57.5      b1 = -3.19
s_x = 3.3    s_y = 11.1    s_e = 4.3

a) Write the estimated regression equation here: (Use two decimals only for each value)

b) Write the correlation coefficient here: (Round to two decimals)

Space for work:

c) Complete this sentence: For each additional unit on the stress scale, the productivity level ______ parts per hour.

d) What percentage of the variation in productivity levels can be explained by the stress level variable? Give your answer here, to the nearest whole percent:

e) Estimate the productivity of an individual whose stress level is 8: (Round to nearest whole number)

f) Suppose the employee in part e) has an actual productivity level of 60 parts per hour. Compute the residual and use the fact that the standard deviation of the residuals is 4.3 to decide whether this data point would be considered an outlier. Explain why in one sentence only.

Residual:      Outlier?   Yes   No

Explanation:

g) Estimate the productivity of an individual whose stress level is unknown.

h) Give an interval range in which the productivity level of 95% of employees would be expected to fall. Report to the nearest whole numbers.

______ to ______


Page 7: Comm291 Practice Midterm

MT2013: Question 6 "Can you answer the call of the 'bell'?"

a) Which statistic(s) would you expect to have a normal distribution?

I. Height of women
II. Shoe sizes of men
III. Age (years) of first-year university students

_A. I & II only
_B. II & III only
_C. I & III only
_D. All three
_E. None of the three

b) The length of time taken by a statistics professor to solve The Globe & Mail crossword has a normal distribution. It is known that the probability of needing more than 20 minutes is 0.5, while the probability of needing more than 30 minutes is 0.1587.

(i) Find the mean and the standard deviation of the professor's solving time.

Mean =      SD =

(ii) What is the probability that the solving time is between 15 and 25 minutes?

_A. 0.38   _B. 0.17   _C. 0.68   _D. 0.06   _E. 0.12

c) A soft drink machine dispenses a cup, syrup and carbonated water, hopefully in order! The amount of syrup injected is normally distributed with mean 15 ml and variance 10 ml². The amount of water injected is normally distributed with mean 80 ml and variance 15 ml². The two amounts are independent of one another.

(i) Find the mean and standard deviation of the total amount of syrup and water dispensed.

Mean:      SD:

(ii) If 25 drinks are dispensed in a day, what are the mean and standard deviation of the total amount of liquid (syrup and water) that are required?

Mean:      SD:

Page 8: Comm291 Practice Midterm

d) Suppose the time it takes for a purchasing agent to complete an online ordering process is normally distributed with a mean of 8 minutes and a standard deviation of 2 minutes. Suppose a random sample of 25 ordering processes is selected.

(i) The standard deviation of the sampling distribution of mean times is

_A. 0.4 minutes
_B. 2 minutes
_C. 0.08 minutes
_D. 1.6 minutes
_E. 0.12 minutes

(ii) What is the probability that the sample mean will be less than 7.5 minutes?

_A. 0.3944
_B. 0.1056
_C. 0.2114
_D. 0.4013
_E. 0.8944

e) The mean height of male UBC students is 70 inches, with SD 3 inches. The mean height of female UBC students is 65 inches, with SD 4 inches. You measure the heights of random samples of 100 males and 100 females. Which result is the most unlikely? To decide, compute the z-score for each result and write the values in the space provided.

_A. One randomly chosen male having a height of 79 inches or more
_B. One randomly chosen female having a height of 74 inches or more
_C. All females in your sample having an average height of 68 inches or more
_D. All males in your sample having an average height of 73 inches or more

z-score for A =      z-score for B =
z-score for C =      z-score for D =


Page 9: Comm291 Practice Midterm

MT2013: Question 7 "Work with confidence!"

a) EU (European Union) countries report that 46% of their labour force is female. Is the percentage of females in the Canadian labour force the same? Statscan plans to check a random sample selected from more than 10,000 employment records on file to estimate the percentage of females in the Canadian labour force.

(i) Statscan wants to estimate the percentage of females in the Canadian labour force within ±5% with 90% confidence. How many employment records should be sampled?

_A. 121
_B. 269
_C. 451
_D. 382
_E. 1000

(ii) Suppose that Statscan wants to be [confidence level illegible in source] confident of estimating the percentage of females in the labour force to within ±2% of the true percentage. Which of the following would they have to do?

_A. Decrease the sample size
_B. Select the same number of employment records
_C. Increase the sample size
_D. Decrease the precision
_E. Increase the sampling error

(iii) They actually select a random sample of 525 employment records, and find that 229 of the people are females. The 90% confidence interval is closest to:

_A. 40.1% to 47.2%
_B. 27.5% to 59.7%
_C. 17.8% to 69.4%
_D. 42.4% to 56.8%
_E. 12.4% to 71.0%

b) For each of the following statements about a 95% confidence interval (CI) for the mean, decide whether it is True or False. Circle your answers at the right.

1. Results from 95% of all samples will lie in this interval.   True   False
2. CIs are more informative than point estimates because they show how much the population parameters can vary.   True   False
3. The interval is wider than a 90% CI would be.   True   False
4. 95% of data values will fall in the range of a 95% CI for the mean.   True   False
5. We are 95% confident that the confidence interval includes the sample mean.   True   False
6. If we took many additional samples and computed a 95% CI for each, then approximately 95% of those intervals would contain the population mean.   True   False


Page 10: Comm291 Practice Midterm

MT2013: Question 8 "Hypothetically speaking"

Suppose that a report indicates that 28% of Canadians have experienced difficulty in making mortgage payments. Further suppose that a news organization randomly sampled 400 Canadians from 10 cities and found that 136 reported such difficulty. Does this indicate that the problem is more severe among these cities?

a) The correct null and alternative hypotheses are

_A. H0: p = 0.28 and HA: p > 0.28
_B. H0: p = 0.28 and HA: p < 0.28
_C. H0: p = 0.28 and HA: p ≠ 0.28
_D. H0: p ≠ 0.28 and HA: p = 0.28
_E. H0: p > 0.28 and HA: p = 0.28

b) The correct value of the test statistic is:

_A. -1.28
_B. -2.67
_C. 2.67
_D. 1.96
_E. -1.28

Space for work:

c) The P-value corresponding to this test statistic is:

_A. 0.025
_B. 0.2119
_C. 0.0177
_D. 0.0522
_E. 0.0038

d) At α = .05, we can conclude that the percentage of Canadians in these cities experiencing difficulty making mortgage payments ...

_A. is significantly higher than 28%
_B. is significantly lower than 28%
_C. is not significantly different from 28%
_D. is equal to 28%
_E. is none of the above; no conclusion can be drawn with the given information.

e) Using the P-value in part c), which one of the following statements is true?

_A. A 90% confidence interval for p would contain 28%
_B. A 95% confidence interval for p would contain 28%
_C. A 95% confidence interval for p would not contain 28%
_D. None of the above

Part f) is unrelated to parts a) through e):

f) An opinion poll in a city of 200,000 was based on a simple random sample of 2000 people. Another poll is to be taken in the same way in a second city of population 400,000. In order for this poll to have the same margin of error as the poll in the first city, the sample size in the second city should be:

_A. 1000
_B. 2000
_C. 4000
_D. 8000


Page 11: Comm291 Practice Midterm

MT2013: Question 9 "No Surprise: A Statistics Test with a test

Insurance companies track life expectancy information to assist in determining the cost of life insurance policies. Last year the average life expectancy of all policyholders was 77 years. ABI Insurance wants to determine if their clients now have a longer life expectancy, on average, so they randomly sample 20 recently paid policies. They will only change their premium structure if there is evidence that people who buy their policies are living longer than before. The sample has a mean of 78.6 years and a standard deviation of 4.48 years.

a) The appropriate null and alternative hypotheses are:

H0:
HA:

b) Give the formula for the appropriate test statistic and compute its value.

Formula:      Computed value:

Space for work:

c) The corresponding P-value is:

_A. Greater than 0.20
_B. Between 0.10 and 0.20
_C. Between 0.05 and 0.10
_D. Between 0.025 and 0.05
_E. Between 0.01 and 0.025
_F. Less than 0.01

d) State your conclusion using α = .05. Write a statistically and grammatically correct sentence that tells ABI Insurance whether there is evidence to increase their premiums.

e) Suppose ABI randomly samples 100 recently paid policies. This sample yields a mean of 77.7 years and a standard deviation of 3.6 years. Compute a 95% confidence interval. Report it in the form [xx.x, xx.x] with one decimal place.

MT2013 - END OF QUESTIONS; ANSWERS AND EXPLANATIONS FOLLOW

Sample data for Question 9:

86 75 83 84 81 77 78 79 79 81
76 85 70 76 79 81 73 74 72 83


Page 12: Comm291 Practice Midterm

MT2013: Answer 1

a) Years of Employment, Teaching Rating b) B. c) D. d) 2, 1, 3, 4. e) 5, 3, 4, 1, 2. Population = All toys produced; Sample = 100 toys; Sampling Frame = 5,000 toys; Parameter = 3%; Statistic = 5%

Details and Comments:
a) Years of Employment has units (yrs); Teaching Rating does not have units but the rating is an average of ordinal data over a number of courses, and can range from 1 to 5 with fractional values possible.
b) "Percentage of Canadian adults who work full-time" is measured at one time point, hence cross-sectional. The other variables are measured repeatedly over time, hence longitudinal or time-series.
c) Only "Weekly receipts at a clothing boutique" is measured at more than one time point. The other variables are measured once each.
d) (i) The five categories are strata; random samples are taken within each one.
(ii) Each employee has the same chance of being selected for the sample.
(iii) One school is a reasonable representative of the entire university, hence a cluster.
(iv) Choosing "every fifth name" makes it systematic.
e) The sampling frame is the production run, namely, that part of the population from which the sample can be drawn.
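
The four sampling plans in part d) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster, its size (200 employees), the school names and the function names are all hypothetical, and the systematic sampler picks its random start among the first k names, which is one common variant of the procedure described in (iv).

```python
import random

CATEGORIES = ["administration", "faculty", "professional staff", "clerical", "maintenance"]
SCHOOLS = ["Business", "Science", "Arts", "Law"]  # hypothetical school names


def build_roster(per_category=40):
    """Hypothetical roster: one dict per employee, ordered by ID."""
    roster = []
    for c_index, category in enumerate(CATEGORIES):
        for j in range(per_category):
            emp_id = c_index * per_category + j
            roster.append({
                "id": emp_id,
                "category": category,
                "school": SCHOOLS[emp_id % len(SCHOOLS)],
            })
    return roster


def simple_random_sample(roster, n, rng):
    """Plan (ii): every employee has the same chance of selection."""
    return rng.sample(roster, n)


def stratified_sample(roster, n_per_stratum, rng):
    """Plan (i): a separate random sample inside each category (stratum)."""
    sample = []
    for category in CATEGORIES:
        stratum = [e for e in roster if e["category"] == category]
        sample.extend(rng.sample(stratum, n_per_stratum))
    return sample


def cluster_sample(roster, rng):
    """Plan (iii): pick one school at random and survey everyone in it."""
    school = rng.choice(SCHOOLS)
    return [e for e in roster if e["school"] == school]


def systematic_sample(roster, k, rng):
    """Plan (iv): random start, then every k-th name on the ordered list."""
    start = rng.randrange(k)
    return roster[start::k]


rng = random.Random(291)  # fixed seed so the sketch is repeatable
roster = build_roster()
srs = simple_random_sample(roster, 50, rng)
strat = stratified_sample(roster, 10, rng)
clus = cluster_sample(roster, rng)
syst = systematic_sample(roster, 5, rng)
print(len(srs), len(strat), len(clus), len(syst))
```

Note that the stratified sample is guaranteed to contain ten people from every category, while the simple random sample is not; that guarantee is the practical difference between plans (i) and (ii).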


MT2013: Answer 2

a) C. b) E. c) A. d) A. e) B.

Details and Comments:
a) Categorical data are displayed with a bar chart. Histograms, stem-and-leaf displays, boxplots (and usually line graphs) are for quantitative data.
b) 20% (142/700)
c) 43% (150/350)
d) 63% (196/309)
e) The column percentages for males are different from those for females, which suggests that cell phone brand preference and gender are related (i.e., not independent).
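
The percentages in Answer 2 are marginal and conditional proportions of the Question 2 table. The sketch below recomputes them; the counts come from the exam table, while the variable and function names are illustrative choices.

```python
# Counts from the Question 2 table: brand -> {gender: count}
counts = {
    "Call Me Maybe": {"Male": 55, "Female": 87},
    "Phone Fun XS": {"Male": 99, "Female": 150},
    "Black Keys II": {"Male": 196, "Female": 113},
}

row_totals = {brand: sum(by_gender.values()) for brand, by_gender in counts.items()}
col_totals = {g: sum(by_gender[g] for by_gender in counts.values()) for g in ("Male", "Female")}
grand_total = sum(row_totals.values())


def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


# b) marginal percentage: all teenagers who preferred Call Me Maybe
pct_b = pct(row_totals["Call Me Maybe"], grand_total)
# c) conditional on gender: females who preferred Phone Fun XS
pct_c = pct(counts["Phone Fun XS"]["Female"], col_totals["Female"])
# d) conditional on brand: Black Keys II fans who are male
pct_d = pct(counts["Black Keys II"]["Male"], row_totals["Black Keys II"])

# e) column percentages: the brand distribution within each gender
column_pcts = {
    g: {brand: round(pct(counts[brand][g], col_totals[g]), 1) for brand in counts}
    for g in col_totals
}

print(round(pct_b), round(pct_c), round(pct_d))  # 20 43 63
print(column_pcts)
```

If brand and gender were independent, the two columns of `column_pcts` would be roughly equal; here they differ markedly (for example 56.0% of males but 32.3% of females chose Black Keys II).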

Page 13: Comm291 Practice Midterm

MT2013: Answer 3

a) C.
b) (i) C. (ii) 11.6 (iii) Lower inner fence = 20.24, Upper inner fence = 66.64 (iv) B. (v) Decrease, Stay the same, Increase, Stay the same
c) D. d) 15, 189, 138 e) E.

Details and Comments:
a) Look at the formula for standard deviation. If all numbers are equal, then they are all equal to the mean, so all the deviations are zero. This is the only way the standard deviation can be zero.
b) (i) The median is closer to Q3 than to Q1 so the distribution is skewed to the left.
(ii) IQR = Q3 - Q1 = 49.24 - 37.64 = 11.6
(iii) Lower inner fence = 37.64 - 1.5×11.6 = 20.24; Upper inner fence = 49.24 + 1.5×11.6 = 66.64
(iv) Yes, only on the right side of the distribution since the maximum exceeds 66.64.
(v) Decreasing the lowest data value decreases the sum, and hence the mean. But it doesn't really affect which is the middle value or the quartiles. The range increases.
c) Quartiles divide the area of the distribution into four equal sections.
d) (i) Count up the number of data values. Don't forget to attach the leaf to the stem for the maximum and median.
e) Monthly sales are more variable in Vancouver compared to Toronto since the box is taller.
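
The IQR and inner-fence arithmetic in part b) can be checked with a short script. The numbers are the five-number summary from Question 3; the function name `inner_fences` is an illustrative choice, and the 1.5 × IQR rule is the one used in the answer key.

```python
def inner_fences(q1, q3):
    """Return (IQR, lower inner fence, upper inner fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Five-number summary of hourly wages from Question 3 b)
minimum, q1, median, q3, maximum = 20.94, 37.64, 44.77, 49.24, 67.11

iqr, lower, upper = inner_fences(q1, q3)
low_outlier = minimum < lower    # any value below the lower fence?
high_outlier = maximum > upper   # any value above the upper fence?

print(round(iqr, 2), round(lower, 2), round(upper, 2), low_outlier, high_outlier)
# 11.6 20.24 66.64 False True
```

Only the maximum (67.11) lies outside the fences, which matches answer (iv) B.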

MT2013: Answer 4

a) (i) A. Negative, moderately strong (ii) B. Become stronger negative
b) Large Neg.; Large Pos.; Small
c) False, False, False, False, True

Details and Comments:
a) (i) Top left to bottom right is negative association.
(ii) Removing the lower left point reduces the scatter.
b) 1. The older the car, the lower the price. 2. The taller the person, the heavier the person. 3. Height has no connection with IQ.
c) 1. "Creative" but completely wrong.
2. The points must lie exactly on a straight line with a positive slope.
3. "Creative" but also completely wrong.
4. Corr of X and Y = Corr of Y and X. The roles are interchangeable.
5. Two of the properties of r.
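
The properties of r discussed in part c) can be demonstrated numerically. The data below are made up for illustration (they are not the exam's scatterplot), and `pearson_r` is a hand-written helper rather than a library call.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical price ($/kg) and fat (grams) values, for illustration only
price = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
fat = [22.0, 19.0, 17.0, 12.0, 11.0, 6.0]

r_xy = pearson_r(price, fat)
r_yx = pearson_r(fat, price)                              # statement 4: roles are interchangeable
r_rescaled = pearson_r([100 * p for p in price], fat)     # statement 5: r has no units

# Statement 2: r = +1 needs a line with positive slope, not a horizontal line
r_up = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
r_down = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])

print(round(r_xy, 3), round(r_yx, 3), round(r_rescaled, 3), r_up, r_down)
```

For a perfectly horizontal line the y-values have zero spread, so the denominator is zero and r is undefined; `pearson_r` would raise `ZeroDivisionError` in that case.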

Page 14: Comm291 Practice Midterm

MT2013: Answer 5

a) ŷ = 74.73 - 3.19x b) -0.95 c) "decreases by 3.19" d) 90% e) 49
f) Residual = 11; yes, it is an outlier since the residual is more than 2.5 s_e away from 0.

Details and Comments:
a) b0 = ȳ - b1·x̄ = 57.5 - (-3.19)(5.4) = 74.73
b) Rearrange the formula for b1: r = b1(s_x/s_y) = (-3.19)(3.3/11.1) = -0.95
c) Interpretation of slope.
d) r² = (-0.95)² = 0.90 or 90%
e) ŷ(8) = 74.73 - 3.19(8) = 49.21 (Round to 49)
f) Residual = 60 - 49 = 11; remember the 68-95-99.7 Rule for identifying outliers.
g) Since x is unknown, just use the mean of y.
h) Use the 68-95-99.7 Rule, i.e., 57.5 ± 2(11.1) = 35.3, 79.7
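
Every number in Answer 5 follows from the six summary statistics given in Question 5. The sketch below reproduces the chain of calculations; the variable names are illustrative, and the residual uses the prediction rounded to 49, as the answer key does.

```python
# Summary statistics from Question 5
x_bar, y_bar = 5.4, 57.5          # mean stress level, mean productivity
b1 = -3.19                        # slope
s_x, s_y, s_e = 3.3, 11.1, 4.3    # SDs of x, y and of the residuals

b0 = y_bar - b1 * x_bar           # a) intercept: the line passes through (x_bar, y_bar)
r = b1 * s_x / s_y                # b) rearranged from b1 = r * s_y / s_x
r_squared = r ** 2                # d) share of variation in y explained by x
y_hat_8 = b0 + b1 * 8             # e) prediction at stress level 8

residual = 60 - round(y_hat_8)    # f) observed minus (rounded) predicted
residual_in_sd = residual / s_e   #    how many residual SDs from zero

low, high = y_bar - 2 * s_y, y_bar + 2 * s_y   # h) 68-95-99.7 rule: mean +/- 2 SD

print(round(b0, 2), round(r, 2), round(100 * r_squared), round(y_hat_8))
print(residual, round(residual_in_sd, 2), round(low, 1), round(high, 1))
```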

MT2013: Answer 6

a) A. I and II only
b) (i) Mean = 20; SD = 10 (ii) A.
c) (i) Mean = 95; SD = 5 (ii) Mean = 2375; SD = 25
d) (i) A. 0.4 minutes (ii) B. 0.1056
e) D. z-scores: 3, 2.25, 7.5, 10. D has the highest z-score and therefore is the most unlikely.

Details and Comments:
a) First-year students' ages will vary only slightly since most are within a year or two in age. There might be some older students, i.e., those returning to school etc., but it is highly unlikely to have students who are much younger than 18 or 19!
b) (i) Computations: Pr(Z > z) = 0.5 => z = 0, so X = μ + zσ => 20 = μ + 0 => μ = 20
Pr(Z > z) = 0.1587 => z = 1, so X = μ + zσ => 30 = 20 + 1σ => σ = 10
(ii) Computations: Pr(15 < X < 25) = Pr([15-20]/10 < Z < [25-20]/10) = Pr(-0.5 < Z < 0.5) = 1 - 2(0.3085) = 0.383
c) (i) Computations: E(X+Y) = E(X) + E(Y) = 15 + 80 = 95; Var(X+Y) = Var(X) + Var(Y) (since independent) = 10 + 15 = 25, so SD = √25 = 5
(ii) Computations: Mean = 25(95) = 2375; Var = 25(25) = 625; SD = √625 = 25
d) (ii) Pr(X̄ < 7.5) = Pr(Z < [7.5-8]/0.4) = Pr(Z < -1.25) = 0.1056
e) Computations:
z-score for A = [79-70]/3 = 3      z-score for B = [74-65]/4 = 2.25
z-score for C = [68-65]/[4/√100] = 7.5      z-score for D = [73-70]/[3/√100] = 10
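
The normal-distribution calculations in Answer 6 can be verified with Python's standard-library `statistics.NormalDist` (available from Python 3.8). This is a checking sketch, not the method the exam expects (which is a z-table); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# b)(i): P(X > 20) = 0.5 puts the mean at 20; P(X > 30) = 0.1587 fixes sigma
mu = 20
z_30 = std_normal.inv_cdf(1 - 0.1587)   # z-score of 30, very close to 1
sigma = (30 - mu) / z_30                # very close to 10

# b)(ii): P(15 < X < 25) for X ~ N(20, 10)
solve_time = NormalDist(mu=20, sigma=10)
p_15_25 = solve_time.cdf(25) - solve_time.cdf(15)

# c): means add; variances add because the two amounts are independent
total_mean = 15 + 80
total_sd = sqrt(10 + 15)
day_mean = 25 * total_mean
day_sd = sqrt(25 * (10 + 15))

# d): the standard error of the mean is sigma / sqrt(n)
se = 2 / sqrt(25)
p_below = NormalDist(mu=8, sigma=se).cdf(7.5)

# e): individuals use the SD; sample means use SD / sqrt(n)
z_scores = {
    "A": (79 - 70) / 3,
    "B": (74 - 65) / 4,
    "C": (68 - 65) / (4 / sqrt(100)),
    "D": (73 - 70) / (3 / sqrt(100)),
}

print(round(sigma, 1), round(p_15_25, 3), total_mean, total_sd, day_mean, day_sd)
print(se, round(p_below, 4), {k: round(v, 2) for k, v in z_scores.items()})
```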


Page 15: Comm291 Practice Midterm

MT2013: Answer 7

a) (i) B. 269 (ii) C. Increase the sample size (iii) A. [40.1%, 47.2%]
b) 1. False; 2. False; 3. True; 4. False; 5. False; 6. True

Details and Comments:
a) (i) n = (1.645²)(0.46)(0.54)/(0.05²) = 269
(ii) Look at the formula for the CI. The sample size is in the denominator of the error, so increasing the sample size decreases the margin of error.
(iii) p̂ = 229/525 = 0.4362;
90% CI = 0.4362 ± 1.645√(0.4362 × 0.5638/525) = 0.4362 ± 0.0356 or [0.4006, 0.4718]
b) 1. The interval changes from sample to sample
2. Population parameters don't vary; sample statistics vary
3. Higher confidence requires wider intervals
4. CIs are not about individual data values; they are about estimates
5. All CIs for the mean include the sample mean; only 95% include the population mean
6. Definition of a CI
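
The sample-size and confidence-interval computations in part a) can be reproduced as follows. The critical value 1.645 is the one used in the answer key; the variable names are illustrative.

```python
from math import ceil, sqrt

z_90 = 1.645  # critical value for 90% confidence, as used in the answer key

# a)(i): n = z^2 * p * (1 - p) / ME^2, rounded up to a whole record
p_guess = 0.46           # the EU figure serves as the planning value
margin = 0.05
n_required = ceil(z_90 ** 2 * p_guess * (1 - p_guess) / margin ** 2)

# a)(iii): 90% CI for a proportion from 229 females out of 525 records
n = 525
p_hat = 229 / n
standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z_90 * standard_error
ci = (p_hat - margin_of_error, p_hat + margin_of_error)

print(n_required, round(p_hat, 4), round(margin_of_error, 4))
print(round(ci[0], 4), round(ci[1], 4))
```

Because the required n grows with 1/ME², tightening the margin from ±5% to ±2% at the same confidence level multiplies the sample size by about (5/2)² = 6.25, which is why (ii) calls for a larger sample.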

MT2013: Answer 8

a) A. H0: p = 0.28 and HA: p > 0.28
b) C. 2.67 c) E. 0.0038 d) A. e) C. f) B. 2000

Details and Comments:
a) One-sided alternative since the question asks whether the problem is "more severe."
b) p̂ = 136/400 = 0.34; z = (p̂ - p0)/√(p0·q0/n) = (0.34 - 0.28)/√(0.28 × 0.72/400) = 2.67
c) The P-value is the area to the right of 2.67 on a standard normal curve.
d) Since the P-value is less than 0.05, the null hypothesis is rejected; the true population proportion is significantly higher than 28%.
e) Rejecting the null hypothesis for a two-tailed alternative is equivalent to the null value falling outside the usual (two-sided) confidence interval.
f) Sampling variability only depends on sample size, as long as the population is large.

MT2013: Answer 9

a) H0: μ = 77 and HA: μ > 77
b) Formula and computed value: t = (x̄ - μ0)/(s/√n) = (78.6 - 77)/(4.48/√20) = 1.597
c) C. Between 0.05 and 0.10
d) There is not sufficient evidence that the mean length of life of people who buy their policies is higher, so do not increase premiums.
e) [77.0, 78.4]

Details and Comments:
a) One-sided alternative since the question asks whether policy-buyers are "living longer" than before.
c) Use the t-table with 19 degrees of freedom
d) Since the P-value is greater than 0.05, do not reject the null hypothesis.
e) 77.7 ± 1.984 × 3.6/√100 = 77.7 ± 0.7
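
The test statistics in Answers 8 and 9 can be recomputed with the standard library. This is a checking sketch: Python's `statistics` module has no t distribution, so the t critical value 1.984 is simply taken from the answer key's computation rather than derived, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Answer 8: one-sided z-test for a proportion, H0: p = 0.28 vs HA: p > 0.28
p0, n_prop = 0.28, 400
p_hat = 136 / n_prop
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n_prop)   # SE uses p0 because H0 is assumed true
p_value = 1 - NormalDist().cdf(z)                 # area to the right of z

# Answer 9 b): one-sample t statistic, H0: mu = 77 vs HA: mu > 77
x_bar, mu0, s, n = 78.6, 77, 4.48, 20
t = (x_bar - mu0) / (s / sqrt(n))
degrees_of_freedom = n - 1

# Answer 9 e): 95% CI from the second sample of 100 policies
t_crit = 1.984  # from the answer key (t-table value, about 100 degrees of freedom)
margin = t_crit * 3.6 / sqrt(100)
ci = (77.7 - margin, 77.7 + margin)

print(round(z, 2), round(p_value, 4))
print(round(t, 3), degrees_of_freedom)
print(round(ci[0], 1), round(ci[1], 1))
```

The z-test rejects H0 at α = .05 (P-value about 0.004), while the t statistic of about 1.6 with 19 degrees of freedom falls short of the one-sided 5% cut-off, matching the two conclusions in the answer key.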

END OF ANSWERS AND EXPLANATIONS TO MIDTERM 2013


Page 16: Comm291 Practice Midterm

Midterm Exam 2012

Notes: This exam has 9 questions. The duration is 2 hours. Books, notes, and calculators are allowed, but not computers, cellphones or on-line connectivity.

MT2012: Question 1 "A sole practitioner"

ASW, a regional shoe chain, has recently launched an online store. Sales via the Internet have been sluggish compared to their brick and mortar stores, and management suspects that its regular customers have concerns regarding the security of online transactions. To determine if this is the case, they plan to survey a sample of their regular customers.

a) Suppose that ASW's regular customers belong to a rewards program and have a customer rewards ID number. ASW decides to randomly select 100 numbers. This sampling plan is called:

_A. Simple Random Sampling
_B. Stratified Sampling
_C. Cluster Sampling
_D. Systematic Sampling
_E. Convenience Sampling

b) Suppose that ASW has an alphabetized list of regular customers who belong to their rewards program. After randomly selecting a customer on the list, every 25th customer from that point on is chosen to be in the sample. This sampling plan is called:

_A. Simple Random Sampling
_B. Stratified Sampling
_C. Cluster Sampling
_D. Systematic Sampling
_E. Convenience Sampling

c) "All regular ASW customers" is known as the ______ of the study.

_A. Parameter
_B. Statistic
_C. Target Population
_D. Sampling Frame
_E. Sample

d) Which of the following is the parameter of interest in the ASW study?

_A. All regular ASW customers
_B. % of regular ASW customers who have concerns about online security
_C. ASW customers who belong to the rewards program
_D. % of ASW customers who belong to the rewards program but don't shop online
_E. None of the above

e) One member of the management team at ASW suggests that their survey could be done online. Customers logging on to the online store would be asked to complete the survey and offered a coupon as incentive to participate. Which statement is true?

_A. This is a voluntary response sample
_B. This would result in an unbiased random sample
_C. This would result in a biased sample
_D. Both A and B
_E. Both A and C


Page 17: Comm291 Practice Midterm

MT2012: Question 2 "Planning

A brokerage firm gathered information on how their clients were investing for retirement. Here is a small sample of the data they collected.

a) Place an X in the space beside each variable that is best described as Quantitative.

_ Respondent Number
_ Age
_ Gender
_ Household Income
_ Self-directed RRSP
_ Book value of portfolio

Based on age, clients were categorized according to where the largest percentage of their retirement portfolio was invested and shown in the table below.

                Age 50 or Younger   Over Age 50   Total
Mutual Funds                   30            34      64
Stocks                         37            45      82
Bonds                          19            23      42
Total                          86           102     188

b) The percentage of clients who are over age 50 and invest in mutual funds is:

_A. 53.1%   _B. 33.3%   _C. 18.1%   _D. 34.0%   _E. 54.3%

c) Of the clients over age 50, the percentage who invest in mutual funds is:

_A. 53.1%   _B. 33.3%   _C. 18.1%   _D. 34.0%   _E. 54.3%

d) Of the clients who invest in mutual funds, the percentage over age 50 is:

_A. 53.1%   _B. 33.3%   _C. 18.1%   _D. 34.0%   _E. 54.3%

e) The percentage of clients over age 50 is:

_A. 53.1%   _B. 33.3%   _C. 18.1%   _D. 34.0%   _E. 54.3%

f) Consider the following side-by-side bar chart for the data below:

[Figure: side-by-side bar chart of investment type by age group; vertical axis from 0 to 50]

Does the chart indicate that mode of investment is independent of age?

Yes   No

Explain in one short sentence only.

Page 18: Comm291 Practice Midterm

MT2012t Question 3 '6Mmm - Marketing Manager Money"

Here is a histogram and the five number salary for salaries (in $) for a sample of 48marketing managers.

a) The shape of this distribution is:
_A. Symmetric
_B. Bimodal
_C. Skewed to the right
_D. Skewed to the left
_E. Normal

b) Which of the following is true?
_A. Mode < Median < Mean
_B. Median < Mode < Mean
_C. Mean < Median < Mode
_D. Mean < Mode < Median
_E. All three are equal

c) Which of the following is closest to the standard deviation?
_A. $3,676
_B. $13,843
_C. $20,765
_D. $83,060
_E. Can't tell without the data

d) The IQR for these data is:
_A. $83,060
_B. $22,057
_C. $69,693
_D. $77,020
_E. $14,566

[Figure: Histogram of Marketing Manager Salaries]

Min      Q1       Median   Q3       Max
46360    69693    77020    91750    129420


Page 19: Comm291 Practice Midterm

e) Compute the lower and upper inner fences.

Lower inner fence:

Upper inner fence:

Space for calculations:

f) Are there any outliers, as defined by the "inner fences" criterion?
_A. Yes, only on the left side of the distribution
_B. Yes, only on the right side of the distribution
_C. Yes, on both sides of the distribution
_D. No

g) Suppose the marketing manager who was earning $129,420 got a raise and is now earning $140,000. Which of the following statements is true?
_A. The mean would increase
_B. The median would increase
_C. The range would stay the same
_D. The IQR would increase
_E. The IQR would decrease

The next two parts are not related to parts (a) through (g) above.

The boxplots below show monthly sales revenue figures (in $ thousands) for a discount office supply company with locations in three different regions of Canada (Atlantic, Central and West).

h) Which of the following statements is true?
_A. Central has the lowest sales revenues
_B. Central has the lowest median sales revenue
_C. West has the lowest mean sales revenue
_D. West has the lowest median sales revenue
_E. Atlantic has the lowest mean sales

i) Which of the following statements is true?
_A. West has the most variable sales revenues.
_B. West has the largest IQR.
_C. Central has the smallest IQR.
_D. Atlantic has the most variable sales revenues.
_E. Central has the least variable sales revenues.

[Figure: Boxplot of Atlantic, Central, and West]

Page 20: Comm291 Practice Midterm

MT2012: Question 4 "OMG: A great place to work"

To determine whether the cash bonus paid by a company is related to annual pay, data were gathered for 10 account executives at Outstanding Management Group (OMG) who received cash bonuses in 2007. The data and summary statistics are shown below.

                      ANNUAL PAY    CASH BONUS
                      $  70,609     $ 11,225
                      $  58,487     $  6,238
                      $ 104,561     $ 14,194
                      $  43,922     $  4,188
                      $  82,613     $ 11,863
                      $ 116,250     $ 13,671
                      $  76,751     $  7,759
                      $  68,513     $ 20,760
                      $ 137,000     $ 55,000
                      $  94,469     $ 34,368

Mean                  $  85,318     $ 17,927
Standard Deviation    $  28,077     $ 15,618

Correlation           0.735

a) What percentage of variability in cash bonuses can be explained by pay?

b) What would the correlation be if the dollars were converted to euros at the current conversion rate (1 Canadian dollar = 0.76 euros)?

c) Estimate the linear regression model that relates the response variable (cash bonus) to the predictor variable (annual pay).

Slope of the regression line:          (Report to three decimal places)

Intercept of the regression line:      (Report to nearest whole number)

Equation of the linear model:

Space for work:

d) From the equation in part c), estimate the cash bonus for an executive at OMG earning $82,613 a year, and compute the residual for this estimate.

Estimated cash bonus:          Residual:

Page 21: Comm291 Practice Midterm

e) Would you be confident in using your regression equation to estimate the cash bonus for an executive at OMG earning $200,000 a year?

Yes    No    Reason:

f) Below is a plot showing residuals versus fitted values for the estimated regression equation relating cash bonus to pay for the account executives at OMG.

[Figure: residuals versus fitted values for cash bonus]

Circle the conditions for linear regression which are violated, if any.
None are violated
Linearity
Normality
Constant Variance (Equal spread)
Independence

Parts (g) through (i) are unrelated to parts (a) through (f):

g) In commenting on the increase in home foreclosures (i.e. banks repossessing homes), a news reporter stated "there appears to be a strong correlation between home foreclosures and job loss of the head of household." Comment on this statement; use one sentence only.

h) A research study investigated the relationship between number of hours individuals spend on the Internet and age. Which is the predictor variable? Circle your choice.

Hours on Internet          Age

i) The correlation associated with the following scatterplot is:

[Figure: scatterplot]

_A. 1.00
_B. -1.00
_C. 0.50
_D. -0.50
_E. 0.00

Page 22: Comm291 Practice Midterm

MT2012: Question 5 "Greater attitude, greater latitude"

The Survey of Study Habits and Attitudes (SSHA) is a psychological test that measures academic motivation and study habits. Females score higher, on average, than males. The distribution of SSHA scores among the female students at a university has mean 120 and standard deviation 28; the distribution among male students has mean 105 and standard deviation 35. Scores are normally distributed; assume also that scores are independent.

a) What percentage of female students have SSHA scores greater than 162? Report your answer [...]

b) What SSHA score is exceeded by only 10% of female students? Round your answer to the nearest whole number.

c) Compute the lower and upper quartiles for the distribution of scores of female students. Round your answers to the nearest whole numbers.

d) Suppose you select a single female student and a single male student at random and give them the SSHA test. What are the mean and the standard deviation of the difference (female minus male) between their scores? Report to one decimal place.

Mean =          Standard Deviation =

e) Using your answers from part d), compute the probability that the chosen female has a higher score than the chosen male.

Page 23: Comm291 Practice Midterm

f) Suppose Angelina (a female) scores 78 on the SSHA, while Brad (a male) scores 70 on the SSHA. Use an appropriate calculation to determine who did worse compared to the average for their gender. Circle the name of the person who did worse.

Angelina          Brad

Explanation:

MT2012: Question 6 "A convenient truth"

Part I. A convenience store owner suspects that only 10% of the customers buy magazines and thinks that he might be able to sell something more profitable. In order to decide whether he should stop selling them, he tracks the number of customers who buy magazines on a given day.

a) On that day he had 300 customers. Assuming it was a typical day and that his estimate is correct, what are the mean and standard deviation of the number of customers who buy magazines each day? Report your answers to one decimal place.

Mean:          Standard Deviation:

b) What is the probability that 25 to 35 customers (inclusive) bought magazines that day?

c) How many magazine sales would you consider to be very strong evidence that his 10% estimate was too low? That is, what number of sales would be extremely unusually high? Hints: Use the Empirical (68-95-99.7) Rule. Remember to give a whole number answer.

Part II. Past records indicate that the magazines he sells on any day have an average revenue of $150 with a standard deviation of $30. Suppose he takes a random sample of 36 past days' sales receipts and records the dollar value of magazine sales.

a) Describe the sampling distribution for the sample mean by naming the model and telling its mean and standard deviation.

b) Suppose the resulting sample mean is $130. Do you think that this sample result is unusually small? Explain.

Page 24: Comm291 Practice Midterm

MT2012: Question 7 "Talk about confidence!"

One division of a telecommunications equipment company reports that 12% of non-electrical components are reworked. Management wants to determine if this percentage is the same as the percentage rework for electrical components manufactured by the company. The Quality Control Department plans to check a random sample of the over 10,000 electrical components manufactured across all divisions.

a) The Quality Control Department wants to estimate the true percentage of rework for electrical components to within ±4%, with 99% confidence. How many components should they sample?
_A. 651
_B. 1000
_C. 344
_D. 438
_E. 579

b) They actually select a random sample of 450 electrical components and find that 46 of those had to be reworked. The 99% confidence interval is closest to:
_A. [0.0654, 0.1390]
_B. [0.0432, 0.1608]
_C. [0.0763, 0.1277]
_D. [0.0541, 0.1499]
_E. Cannot be determined with the given information.

c) The 95% confidence interval based on these data is 0.0742 to 0.1302. Which one of the following is the correct interpretation?
_A. The percentage of electronic components that are reworked is between 7.4% and 13.0%.
_B. We are 95% confident that between 7.4% and 13.0% of electrical components are reworked.
_C. The margin of error for the true percentage of electrical components that are reworked is between 7.4% and 13.0%.
_D. All samples of size 450 will yield a percentage of reworked electrical components that falls within 7.4% and 13.0%.
_E. There is a 95% chance that 7.4% to 13.0% of the electrical components have to be reworked.

d) Based on the 95% confidence interval, should the Quality Control Department conclude that the percentage of rework for the electrical components is lower than the rate of 12% for non-electrical components?
_A. Yes, because the lower limit of the confidence interval is 7.4%.
_B. Yes, because 12% is contained within the 95% confidence interval.
_C. No, because 12% is contained within the 95% confidence interval.
_D. No, because the upper limit of the confidence interval is 13.0%.
_E. We cannot say since the sample size is not large enough.

e) All else being equal, increasing the level of confidence desired will...:
_A. ...tighten the confidence interval
_B. ...decrease the margin of error
_C. ...increase precision
_D. ...increase the margin of error
_E. ...increase the margin of error and tighten the confidence interval

Page 25: Comm291 Practice Midterm

MT2012: Question 8 "A dip in chips"

A company manufacturing computer chips finds that 8% of all chips manufactured are defective. Management is concerned that high employee turnover is partially responsible for the high defect rate. In an effort to decrease the percentage of defective chips, management decides to provide additional training to those employees hired within the last year. After training was implemented, a sample of 450 chips revealed only 27 with defects. Was the additional training effective in lowering the defect rate?

a) The appropriate null and alternative hypotheses are:

b) Give the formula for the appropriate test statistic and compute its value.

Test Statistic Formula:

Computed value:

Show your work:

c) Assume that the value of the test statistic is -1.4. Don't use your computed value from part b). The P-value associated with the given test statistic is closest to:
_A. 0.0404
_B. 0.05
_C. 0.0808
_D. 0.1616
_E. 0.9192

d) From the P-value in part c), and using a 1% significance level (i.e. α = .01), which of the following is true?
_A. Conclude that additional training significantly lowered the defect rate.
_B. Conclude that additional training did not significantly lower the defect rate.
_C. Conclude that additional training significantly increased the defect rate.
_D. Conclude that additional training did not affect the defect rate.
_E. No conclusion can be made with the given information.

Page 26: Comm291 Practice Midterm

MT2012: Question 9 "The non-profit motive"

A large software development firm recently relocated its facilities. Top management has [...] their professional employees to engage in local service activities. They [...] that the firm's professionals volunteer an average of more than 15 hours per month. If this is not the case, they will institute an incentive program to increase it. A sample of 24 professionals reported the following number of hours:

The sample has a mean of 16.75 hours and a standard deviation of 2.40 hours.

a) The correct null and alternative hypotheses are:
_A. H0: x̄ = 15 and Ha: x̄ > 15
_B. H0: μ = 15 and Ha: μ > 15
_C. H0: μ = 15 and Ha: μ < 15
_D. H0: μ ≠ 15 and Ha: μ = 15
_E. H0: μ = 15 and Ha: μ ≠ 15

b) The correct value of the test statistic is closest to:
_A. 3.572
_B. -3.572
_C. 1.327
_D. -1.327
_E. 0.729

c) Which of the following conclusions is correct?
_A. We reject the alternative hypothesis at the 5% significance level.
_B. We fail to reject the null hypothesis at the 5% significance level.
_C. An incentive program is needed since the evidence indicates professional employees volunteer an average of no more than 15 hours per month.
_D. We reject the null hypothesis; the firm shouldn't need to institute an incentive program since the evidence indicates that professional employees volunteer an average of more than 15 hours per month.
_E. No conclusion can be reached about the hypothesis with the information that is given.

d) It is appropriate to test the mean because:
_A. The data are a simple random sample from the population of interest
_B. The distribution of the sample data appears to be approximately normal
_C. Volunteer hours is likely to be independent across employees
_D. All of the above

e) A 95% confidence interval for the true mean number of hours of volunteer time is closest to:
_A. 16.75 ± 1.016
_B. 16.75 ± 0.840
_C. 16.75 ± 4.966
_D. 16.75 ± 4.114
_E. 2.40 ± 7.074

Data for Question 9 (hours volunteered by the 24 sampled professionals):

12 13 14 14 15 15 15 16 16 16 16 16
17 17 17 18 18 18 19 19 19 20 20 22

MT2012 - END OF QUESTIONS; ANSWERS AND EXPLANATIONS FOLLOW

Page 27: Comm291 Practice Midterm

MT2012: ANSWERS AND EXPLANATIONS

MT2012: Answer 1
a) A.  b) D.  c) C.  d) B.  e) E.

Details and Comments:
a) Each regular customer has the same chance of being selected for the sample.
b) Choosing "every 25th customer" makes it systematic.
c) The target population is the "universe" for which you want to be able to generalize.
d) A parameter is a numerical characteristic such as a mean or a proportion/percentage.
e) Since people can decide whether to answer or not, it is a voluntary response, and hence subject to bias. People who decide to participate may not be like people who decide not to participate.

MT2012: Answer 2
a) Age, Household Income, Book value of portfolio
b) C. 18.1%  c) B. 33.3%  d) A. 53.1%  e) E. 54.3%
f) Yes: The age distribution (ratio of younger to older) is about the same for each mode (i.e. type) of investment.

Details and Comments:
a) Age (yr), Household Income ($), and Book Value ($) all have units and are measured on a continuum; so they are quantitative.
b) 34/188 = 0.181
c) 34/102 = 0.333
d) 34/64 = 0.531
e) 102/188 = 0.543
f) Look for differences across the clusters of bars.
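The four percentages in Answer 2 differ only in which total goes in the denominator. The short Python sketch below recomputes them from the table counts; the variable names are illustrative and not part of the exam.

```python
# Counts from the MT2012 Question 2 table.
# Each tuple is (age 50 or younger, over age 50).
counts = {
    "Mutual Funds": (30, 34),
    "Stocks": (37, 45),
    "Bonds": (19, 23),
}

grand_total = sum(young + older for young, older in counts.values())  # 188
over_50_total = sum(older for _, older in counts.values())            # 102
mutual_total = sum(counts["Mutual Funds"])                            # 64
over_50_mutual = counts["Mutual Funds"][1]                            # 34

# b) joint percentage: over 50 AND mutual funds, out of all clients
joint = over_50_mutual / grand_total
# c) conditional on age: mutual funds among clients over 50
given_over_50 = over_50_mutual / over_50_total
# d) conditional on investment: over 50 among mutual fund clients
given_mutual = over_50_mutual / mutual_total
# e) marginal percentage: over 50 among all clients
marginal_over_50 = over_50_total / grand_total

for part, value in [("b", joint), ("c", given_over_50),
                    ("d", given_mutual), ("e", marginal_over_50)]:
    print(f"{part}) {value:.3f}")
```

Reading the question wording for "of the ..." is usually enough to pick the denominator: "of the clients over age 50" means divide by 102, while no qualifier means divide by the grand total of 188.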

MT2012: Answer 3
a) C. Skewed to the right
b) A. Mode < Median < Mean
c) B. $13,843
d) B. $22,057
e) Lower inner fence = $36,607.50; Upper inner fence = $124,835.50
f) B.  g) A.  h) B.  i) D.

Details and Comments:
a) Long right-hand tail: more of the area is piled up to the left.
b) The mode is the peak and it is clearly to the left of the median value of 77020. The median is less than the mean for a right-skewed distribution.
c) Use the rule of thumb: s ≈ Range/6
d) IQR = Q3 - Q1 = 91750 - 69693 = 22,057
e) Lower inner fence = 69,693 - 1.5 × 22,057 = $36,607.50
   Upper inner fence = 91,750 + 1.5 × 22,057 = $124,835.50
f) The maximum is larger than the upper fence but the minimum is not smaller than the lower fence.
g) The sum is increased so the mean is increased.
h) The median is the line in the interior of the box.
i) Variability is shown by the length of the box.
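As a check on parts c) through f) of Answer 3, the sketch below recomputes the IQR, the inner fences, and the Range/6 rule of thumb from the five-number summary. It is only an illustration in Python; the variable names are assumptions of this example.

```python
# Five-number summary from MT2012 Question 3 (salaries in dollars).
minimum, q1, median, q3, maximum = 46_360, 69_693, 77_020, 91_750, 129_420

# d) interquartile range
iqr = q3 - q1

# e) inner fences sit 1.5 IQRs beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# c) rule-of-thumb estimate of the standard deviation
sd_estimate = (maximum - minimum) / 6

# f) a side has outliers if its extreme value lies beyond the fence
outliers_on_left = minimum < lower_fence
outliers_on_right = maximum > upper_fence

print(iqr, lower_fence, upper_fence, round(sd_estimate))
print(outliers_on_left, outliers_on_right)
```

Only the minimum and maximum need to be compared with the fences to decide whether each side has any outliers, which is why the five-number summary is enough for part f).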


Page 28: Comm291 Practice Midterm

MT2012: Answer 4
a) r² = 0.735² = 0.5402 or 54%
b) Unchanged at 0.735
c) b1 = r(s_y / s_x) = 0.735(15,618/28,077) = 0.409; b0 = ȳ - b1·x̄ = 17,927 - (0.409 × 85,318) = -16,968; ŷ = -16,968 + 0.409x
d) ŷ(82,613) = -16,968 + 0.409(82,613) = $16,821; Residual = 11,863 - 16,821 = -$4,958
e) No; a prediction at $200,000 is an extrapolation beyond the range of the data.
f) Constant Variance (V-shape)
g) The two variables are categorical, not quantitative, so correlation is not appropriate.
h) Age
i) E. 0.00

Details and Comments:
a) This is the definition of r-squared.
b) The correlation coefficient has no units; it doesn't change if the measurement units change.
c) Straightforward application of least squares regression line formulas.
d) Substitute the x-value into the fitted equation to get the prediction; the residual is the observed value minus the predicted value.
h) Age "precedes" Hours on Internet.
i) The best-fitting straight line is horizontal.

MT2012: Answer 5
a) 6.68%  b) 156  c) Q1 = 101, Q3 = 139
d) Mean = 15; SD = 44.8
e) 0.6293 or 0.63 or 63%
f) Angelina; Z-score for Angelina = -1.5, Z-score for Brad = -1.0

Details and Comments:
a) Standardize the X-value; 162 is 1.5 SDs above the average. Find the area to the right of 1.5.
   Pr(X > 162) = Pr(Z > [162 - 120]/28) = Pr(Z > 1.5) = 0.0668
b) Find the value of z that has an area of 10% to the right; then "unstandardize."
   z = 1.28; X = 120 + 1.28(28) = 155.8
c) Find z-values that have an area of 25% to the right and to the left; then unstandardize. Since the z-distribution is symmetric, the z-value for Q1 is the negative of the z-value for Q3.
   Q1: z = -0.675; X = 120 + (-0.675)(28) = 101
   Q3: z = 0.675; X = 120 + (0.675)(28) = 139
d) Mean = 120 - 105 = 15; SD = √(28² + 35²) = 44.8
e) Pr(F - M > 0) = Pr(Z > [0 - 15]/44.8) = Pr(Z > -0.33) = 0.6293 or 0.63 or 63%
f) Z-score for Angelina = (78 - 120)/28 = -1.5; Z-score for Brad = (70 - 105)/35 = -1.0; Angelina did worse relative to the reference populations since her Z-score is more negative.
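The normal-model calculations in Answer 5 can be reproduced without a z-table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative. Because it does not round z to two decimals, part e) comes out near 0.631 rather than the table-based 0.6293.

```python
from math import sqrt
from statistics import NormalDist

female = NormalDist(mu=120, sigma=28)
male = NormalDist(mu=105, sigma=35)

# a) share of female students scoring above 162
p_above_162 = 1 - female.cdf(162)

# b) score exceeded by only 10% of female students (90th percentile)
top_10_cutoff = female.inv_cdf(0.90)

# c) lower and upper quartiles for female students
q1, q3 = female.inv_cdf(0.25), female.inv_cdf(0.75)

# d) difference (female minus male) of two independent scores:
#    the means subtract and the variances add
diff = NormalDist(mu=female.mean - male.mean,
                  sigma=sqrt(female.variance + male.variance))

# e) probability that the chosen female outscores the chosen male
p_female_higher = 1 - diff.cdf(0)

# f) z-scores relative to each person's own reference group
z_angelina = (78 - female.mean) / female.stdev
z_brad = (70 - male.mean) / male.stdev

print(round(p_above_162, 4), round(top_10_cutoff), round(q1), round(q3))
print(round(diff.stdev, 1), round(p_female_higher, 2), z_angelina, z_brad)
```

The key design point in part d) is that standard deviations never add directly: 28 + 35 = 63 would overstate the spread of the difference, whereas adding variances gives about 44.8.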


Page 29: Comm291 Practice Midterm
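For MT2012 Answer 4, the regression line can be rebuilt from the summary statistics alone. The following Python sketch follows the rounding the question asks for (slope to three decimals, intercept to the nearest whole number); the variable names are assumptions of this example, not exam notation.

```python
# Summary statistics from the MT2012 Question 4 table.
r = 0.735
mean_pay, sd_pay = 85_318, 28_077        # predictor x: annual pay
mean_bonus, sd_bonus = 17_927, 15_618    # response y: cash bonus

# a) proportion of variability in bonus explained by pay
r_squared = r ** 2

# c) least squares slope and intercept, rounded as the question requests
slope = round(r * sd_bonus / sd_pay, 3)
intercept = round(mean_bonus - slope * mean_pay)

# d) prediction and residual for the executive earning $82,613,
#    whose actual bonus in the table was $11,863
pay, actual_bonus = 82_613, 11_863
predicted_bonus = intercept + slope * pay
residual = actual_bonus - predicted_bonus

print(round(r_squared, 4), slope, intercept)
print(round(predicted_bonus), round(residual))
```

Rounding the slope before computing the intercept matches the answer key; carrying the unrounded slope through would shift the intercept by a few dollars, which is why hand answers and software output can differ slightly.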

MT2012: Answer 6

Part I.
a) Mean = np = 300 × 0.10 = 30.0; SD = √(npq) = √(300 × 0.10 × 0.90) = 5.2
b) Pr(25 ≤ X ≤ 35) = Pr([25 - 30]/5.2 ≤ Z ≤ [35 - 30]/5.2) = Pr(-0.96 ≤ Z ≤ 0.96) = 1 - 2(0.1685) = 0.663
c) From the Empirical Rule, 3 SDs above the mean is extremely unusual; μ + 3σ = 30 + 3(5.2) = 45.6. Sales of 46 or more would be extremely unusual.

Part II.
a) Normal: Mean = 150 and SD = 30/√36 = 5
b) Pr(x̄ < 130) = Pr(Z < [130 - 150]/5) = Pr(Z < -4) < 0.0001
There is an extremely small probability of getting a sample mean this small.

Details and Comments:
Part I.
a) Use the mean and standard deviation of a count.
b) Use the normal sampling distribution of a count. (Note: Continuity correction was not needed, but if you used it correctly you would get an answer of 0.711.)
Part II.
a) Use the mean and standard deviation of a mean. (Note: The CLT applies here, but it is not necessary to say this in the answer.)
b) Use the normal sampling distribution of a mean.
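Both parts of Answer 6 rest on a normal approximation, once for a count and once for a sample mean. The Python sketch below works through them with the standard library; it keeps the unrounded SD, so the probabilities agree with the key to about two decimals rather than exactly. All names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Part I: number of magazine buyers among n customers
n, p = 300, 0.10
mean_count = n * p                       # 30
sd_count = sqrt(n * p * (1 - p))         # about 5.2

count_model = NormalDist(mean_count, sd_count)   # normal approximation

# b) Pr(25 <= X <= 35), without and with a continuity correction
p_plain = count_model.cdf(35) - count_model.cdf(25)
p_corrected = count_model.cdf(35.5) - count_model.cdf(24.5)

# c) smallest whole number of sales at least 3 SDs above the mean
unusual_sales = ceil(mean_count + 3 * sd_count)

# Part II: sampling distribution of the mean of 36 daily revenues
mu, sigma, days = 150, 30, 36
mean_model = NormalDist(mu, sigma / sqrt(days))  # SD of the mean is 5
p_mean_below_130 = mean_model.cdf(130)

print(round(sd_count, 1), round(p_plain, 3), round(p_corrected, 3))
print(unusual_sales, mean_model.stdev, p_mean_below_130 < 0.0001)
```

The contrast between the two parts is the denominator of the standard deviation: a count uses √(npq), while a sample mean uses σ/√n.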

" .:..

MT2012: Answer 7
a) D.  b) A.  c) B.  d) C.  e) D.

Details and Comments:
a) n = (2.576)²(0.12)(0.88)/(0.04)² = 438
b) p̂ = 46/450 = 0.1022
   99% CI: 0.1022 ± 2.576 √(0.1022 × 0.8978/450) = 0.1022 ± 0.0368
c) Notice the wording and the use of the term "95% confident".
d) Values inside a confidence interval are likely values of the parameter. Evidence of a change or a difference depends on the target value being outside the CI.
e) Examine the CI formula; a higher confidence level requires a larger multiplier/critical value so the margin of error will be larger.
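The sample-size and confidence-interval arithmetic in Answer 7 can be checked as follows. This Python sketch takes the 99% critical value from the standard library instead of a table (it is about 2.576 either way); the variable names are assumptions of this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Two-sided 99% critical value, about 2.576
z_99 = NormalDist().inv_cdf(0.995)

# a) sample size for a margin of error of 0.04, planning with p = 0.12
p_plan, margin = 0.12, 0.04
n_needed = ceil(z_99 ** 2 * p_plan * (1 - p_plan) / margin ** 2)

# b) 99% confidence interval from 46 reworked components out of 450
reworked, sampled = 46, 450
p_hat = reworked / sampled
std_error = sqrt(p_hat * (1 - p_hat) / sampled)
half_width = z_99 * std_error
interval = (p_hat - half_width, p_hat + half_width)

print(n_needed)
print(round(p_hat, 4), round(half_width, 4))
print(tuple(round(limit, 4) for limit in interval))
```

Sample sizes are always rounded up, since rounding down would leave the margin of error slightly larger than the target.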


Page 30: Comm291 Practice Midterm

MT2012: Answer 8
a) H0: p = 0.08 and Ha: p < 0.08
b) Formula and computed value: p̂ = 27/450 = 0.06
   z = (p̂ - p0) / √(p0·q0/n) = (0.06 - 0.08) / √((0.08)(0.92)/450) = -1.56
c) C.
d) B.

Details and Comments:
a) This is a one-sided test since the question asks whether the training was effective in lowering the defect rate.
b) Remember that the test statistic uses p0 in the denominator, not p̂ as in the confidence interval.
c) Find the area to the left of -1.4 on the standard normal curve.
d) Since the P-value is not less than 0.05 the evidence is not statistically significant.
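A short Python sketch of the one-proportion z-test in Answer 8 is given below. It computes the test statistic from the sample and, separately, the P-value for the stated value z = -1.4 that part c) asks about; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

p0, n, defects = 0.08, 450, 27
p_hat = defects / n                       # 0.06

# b) the standard error uses the null value p0, not p_hat
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Lower-tail P-value for the computed statistic (alternative is p < 0.08)
p_value_computed = NormalDist().cdf(z)

# c) P-value for the test statistic given in the question, z = -1.4
p_value_given = NormalDist().cdf(-1.4)

# d) decision at the 1% significance level
alpha = 0.01
reject_null = p_value_given < alpha

print(round(z, 2), round(p_value_computed, 4))
print(round(p_value_given, 4), reject_null)
```

Because the P-value of about 0.08 exceeds 0.01, the null hypothesis is not rejected, which corresponds to choice B in part d).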

MT2012: Answer 9
a) B.  b) A.  c) D.  d) D.  e) A.

Details and Comments:
a) H0: μ = 15 and Ha: μ > 15. The test is one-sided since the question is about "increasing" the volunteer time.
b) t = (x̄ - μ0) / (s/√n) = (16.75 - 15) / (2.40/√24) = 3.572
c) The P-value is much smaller than 0.05 so reject the null hypothesis. The volunteer time is greater than 15 hours, so no incentive program is needed to get past 15 hours.
d) These are the assumptions/conditions for a one-sample t-test.
e) 16.75 ± 2.069 × 2.40/√24 = 16.75 ± 1.016

END OF ANSWERS AND EXPLANATIONS TO MIDTERM 2012

c) c.


Page 31: Comm291 Practice Midterm

Midterm Exam 2011

Notes: This exam has 9 questions. The duration is 2 hours. Books, notes, and calculators are allowed, but not computers, cellphones or on-line connectivity.

MT2011: Question 1 "First things first"

a) At the beginning of the term we asked all Commerce 291 students to complete our on-line survey. This survey was most likely designed to be:
_A. a random sample of all C291 students
_B. a census of all C291 students
_C. a random sample of business students
_D. a random sample of 2nd year UBC students
_E. all of the above

b) The survey asked a wide range of questions. For each variable, circle the description which best describes the type of data the variable represents.

Ethnic background       Categorical   Quantitative   Identifier
Height                  Categorical   Quantitative   Identifier
C290 grade              Categorical   Quantitative   Identifier
# hrs online per day    Categorical   Quantitative   Identifier

c) From the survey results, we can estimate that, on average, students spent 15.2 hours per week studying. This number seems high given that for a course load of 4 courses students spend 12 hours per week in the classroom and nearly half of the students reported doing paid work. What is the most likely explanation?
_A. the data are very skewed and the median is a better numerical summary
_B. the data are bimodal, the two groups are those that work and those that do not
_C. women study more than men
_D. none of the above

d) Unfortunately, not every C291-registered student responded to the survey. If it were true that students who didn't respond also spend less time studying, then our estimate of study time from the survey is:
_A. a good estimate of average study time of C291 students
_B. biased above the true average study time of C291 students
_C. biased below the true average study time of C291 students
_D. not a good estimate for study time of C291 students but we can't say whether it is too high or too low.

e) From the survey we find that the Commerce 290 Grade (call this variable X) has a symmetric, bell-shaped distribution. Also, 95% of the grades fall in the range 53 to 93. Use that information to compute the mean and standard deviation of X. Report to at least one decimal place.

Mean of X =          SD of X =

Page 32: Comm291 Practice Midterm

MT2011: Question 2 "Stock answers are sufficient here"

a) The following data are the price-to-earnings ratios (P/E ratio) for a random sample of 25 stocks traded on the NYSE. The data values have been sorted from smallest to largest.

Data: 4 8 11 11 12 13 13 14 14 15 16 17 17 17 18 21 22 22 24 24 26 28 33 35 39

The mean of these values is 19.0 and the standard deviation is 8.5.

i) Find the following:
Median =
Q1 =
Q3 =
IQR =
Inner fences =
Outliers =
(Note: If there are no outliers, write "None".)

ii) Is the distribution symmetric or skewed? (Note: You do not have to draw a graph to answer this.) Circle your choice. Then give your reason.

Symmetric          Skewed

iii) Sketch a boxplot of these data. Use the version based on the five-number summary; do not use the modified version using fences.

b) Determine whether each statement is true or false. Circle your choice. No explanation is required.

1. If the mean and SD are equal for a measurement variable that only takes positive values, the distribution is symmetric.   True   False
2. If the mean and median are equal, the distribution must be normal.   True   False
3. If the mean and median are equal, the mode must also equal the mean and median.   True   False
4. The SD and IQR are always equal for a symmetric distribution.   True   False
5. The SD of a set of data values can never be zero.   True   False

Page 33: Comm291 Practice Midterm

MT2011: Question 3 "To-fu or not to-fu, that is the question"

Read the following survey design plan and then answer the questions after it.

Get Healthy, a producer of health foods, conducts a survey of the Lower Mainland to determine how receptive high school students would be to its TOFU BURGH product and what market potential (sales) it could expect. It plans the survey as follows:

i. From the list of all schools in the area, two groups are defined, public and private high schools, called PUBS and PRIS.
ii. From the PUBS, four schools are chosen randomly.
iii. From the PRIS, one school is chosen randomly.
iv. In the PUBS schools selected, on one day, researchers give every fifteenth student to exit the school a TOFU BURGH and a stamped, self-addressed postcard (like the one here).
v. In the PRIS school, researchers set up a stand outside the school and give a free TOFU BURGH and the postcard to any student who comes to the stand.

a) The overall survey sampling design planned by the company can best be described as:
_A. convenience sampling
_B. multi-stage sampling
_C. stratified sampling
_D. simple random sampling
_E. cluster sampling

b) In the PUBS selected, the sampling design uses:
_A. systematic sampling
_B. voluntary response strategy
_C. unacceptable bribery of students
_D. anecdotal responses

c) In the PRIS selected, the sampling design uses:
_A. systematic sampling
_B. voluntary response strategy
_C. unacceptable bribery of students
_D. anecdotal responses

d) One parameter of interest is likely to be:
_B. the number of high school students in the Lower Mainland
_C. the number of students who replied they would buy at least one TOFU BURGH in a typical week
_D. the proportion of students who replied they would buy at least one TOFU BURGH in a typical week

e) Which of the two samples is likely to have non-response bias?
_A. PUBS schools only
_B. PRIS school only
_C. Both PUBS and PRIS schools
_D. Neither will have non-response bias

[Postcard]

Dear High School Student,
You have been randomly selected to participate in a research project by Get Healthy Foods. You will receive $1.00 for your participation. If you try your TOFU BURGH, circle your choice below and mail this post card before April 30, 2002. Thank you.
Twiggy Osohealthy, Director of Marketing

Having tried the TOFU BURGH, in a typical week I would buy:  0  1  2  3  4 or more

Get Healthy Foods
Marketing Research Department
TOFU BURGH Study
1236 S. E. Marine Drive
Vancouver, BC

Return $1.00 to:
Name:
Address:
Tel:


Page 34: Comm291 Practice Midterm

MT2011: Question 4 "Unassociated questions about association - how ironic"

Note: This question has three unrelated parts.

a) A business school conducted a survey of companies in its state. They mailed a questionnaire to small, medium-sized, and large companies. The rate of non-response is important in deciding how reliable survey results are. Here are the data on responses to this survey:

               Small   Medium   Large
Response        375     160      40
No Response     225     240     160
Total           600     400     200

(i) What was the overall percent of non-response?

(ii) How is non-response related to the size of the business? Use percents to make your statement precise.

b) Investment reports now often include correlations. Following a table of correlations among mutual funds, a report adds, "Two funds can have perfect correlation, yet different levels of risk. For example, Fund A and Fund B may be perfectly correlated, yet Fund A moves 20% whenever Fund B moves 10%." Explain to someone who knows no statistics how this can happen.

c) A study shows that there is a positive correlation between the size of a hospital (measured by its number of beds, x) and the median number of days, y, that patients remain in the hospital. Does this mean that you can shorten a hospital stay by choosing a small hospital? Explain your answer choice.

Yes No

Reason:


Page 35: Comm291 Practice Midterm

MT2011: Question 5 "Bart vs. Lisa does not refer to Simpson's Paradox"

a) At a well-known business school the grade point averages (GPA) of its 1000 undergraduates are normally distributed with mean 2.84 and standard deviation 0.40.

(i) What percentage of the undergraduates have GPAs below 2.00 (i.e. "on probation")?

Answer:

(ii) What GPA will be exceeded by only 20% of the student body?

Answer:

(iii) Compute the lower and upper quartiles, and the interquartile range for this distribution.

Q1 =          Q3 =          IQR =

b) Bart scores 725 on the mathematics section of the Scholastic Aptitude Test (SAT). In a reference population, SAT scores are normally distributed with mean 500 and standard deviation 100. Lisa scores 33 on the American College Test (ACT) mathematics test; ACT scores are normally distributed with mean 18 and standard deviation 6.

(i) What are the z-scores for each student?

Bart:          Lisa:

(ii) Circle either the name Bart or Lisa (above) based on who did better relative to the reference populations.

Page 36: Comm291 Practice Midterm

MT2011: Question 6 "Strength in numbers; numbers on strength"

a) To test the strength of building materials such as steel girders, engineers place increasing loads on the girders until they break. The pressure exerted by the load that eventually breaks the material is called the 'strength' of the girder. Generally speaking, the longer the girder, the less the strength. Your company makes steel girders. The engineer in charge of testing tells you that he has tested 10 girders to breaking point and has obtained data linking the length of each girder (in metres) to its strength (in kg per square centimetre). But his computer crashed just after he ran a regression analysis on the data and all he can remember is the lengths of the girders and a few strengths. He did manage to record the means and standard deviations of all the lengths and strengths and the r² of the regression, which was 0.719.

        (X) Length (m)    (Y) Strength (kg/cm²)
             1                  90
             1                 101
             2                 Lost
             2                 Lost
             3                  91
             3                  77
             4                 Lost
             4                 Lost
             5                  76
             5                 Lost

Mean         3.00               82.60
SD           1.49               10.72

Note: The means and standard deviations are calculated for the ENTIRE data set,

including those that are missing.

(i) What is the correlation between length and strength? Report to three decimal places.

(ii) Work out a regression equation that predicts strength from length.

Equation:

(iii) You notice that the purchaser of your girders requires the 5 m girders to support an

average load of 75 kg per square centimetre. Do you feel confident your girders will do

that? Give a numerical rationale.


Page 37: Comm291 Practice Midterm

b) What is the correlation coefficient for the following three points in the X-Y plane?

X   1   3   5
Y   4   3   2

(STOP AND THINK BEFORE YOU START!)

Answer:

c) An American study found that the correlation between two-year-old children's heights (measured in inches) and their weights (measured in pounds) was 0.46. What would the correlation coefficient be if you converted their heights to centimetres and weights to kilograms? (One inch = 2.54 cm and 1 pound = 0.454 kg.)

Answer:

d) An economist studied salaries of 321 bank employees with five or less years of employment in a national bank. He found that the relationship between years of service and salary was linear and that the regression equation predicting salary (in thousands of dollars) was: Salary = 21.5 + 3.1 * Years.

He concludes that employees with 10 years of service should make an average salary of $52,500. Is his conclusion correct? If not, say why.

e) In part d) the economist has used the regression equation to make a prediction. Which of these numbers best measures the precision of this prediction?
_A. The slope of the line (b1)
_B. The standard deviation of y (s_y)
_C. The standard deviation of x (s_x)
_D. The square of the correlation coefficient (r²)
_E. The ratio of the two standard deviations (s_y / s_x)

f) An investigator measuring various characteristics of a large group of athletes found that

the correlation coefficient between the weight of the athlete and the weight that the

athlete could lift was r = 0.60. Determine whether each statement is true or false. Circle your choice.

(i) If an athlete gains 5 kg, he/she will be able to lift an additional 3 kg.    True   False

(ii) The more an athlete can lift, on the average the more that athlete weighs.    True   False

(iii) 36 per cent of the athlete's lifting ability can be attributed to his or her weight alone.    True   False

(iv) 60 per cent of the athlete's lifting ability can be attributed to his or her weight alone.    True   False

MT2011: Question 7 "Pack up all your troubles, and call it a day"

An important part of the customer service responsibilities of a telephone company relates to the speed with which troubles in residential service can be repaired. Suppose that past data indicate that there is a probability of 0.70 that service troubles can be repaired on the same day they are reported.

a) Suppose the company receives 100 trouble calls on a particular day. What is the approximate chance that 80% or more will receive same-day repairs?

b) Suppose it is also known that the repair time for a trouble call has a mean of 480 minutes and a standard deviation of 250 minutes. A random sample of 400 trouble calls was taken and the repair times recorded. Compute the probability that the mean of the 400 repair times is less than 500 minutes.

MT2011: Question 8 "Statistical analysis of a logo transformation"

An established clothing retailer, CHAP, is interested in customer response to a proposed new logo. A survey randomly samples 100 customers; 55 of them say they would prefer the new logo to the previous one. However, CHAP will only change its logo if it is convinced that the newly designed logo is preferred by the majority (i.e. more than half) of its customers. Based on this information answer the following questions.

a) The sample estimate, p̂, of the proportion of customers who prefer the newly designed logo over the previous one is:
_ A. 0.55
_ B. 55
_ C. 100
_ D. Not able to be determined from the information given

b) The standard error of this estimate is closest to:
_ A. 0.0025
_ B. 0.050
_ C. 0.071
_ D. 0.50

c) The 95% confidence interval for the true proportion of the customers who prefer the new logo over the previous one is closest to:
_ A. 0.55 ± 0.098
_ B. 0.55 ± 0.98
_ C. 0.55 ± 0.0049
_ D. 55 ± 9.8

d) How large a sample n would you need to estimate p, the proportion of people who prefer the newly designed logo over the previous one, with margin of error 0.05 with 99% confidence? Use the guess p̂ = 0.5 as the value for p.
_ A. 384
_ B. 664
_ C. 26
_ D. 271

e) If a hypothesis test were conducted on these data, the test statistic would be 1.00. If the alternative hypothesis were one-sided, what would the P-value be?
_ A. 0.0794
_ B. 0.1587
_ C. 0.3174
_ D. 0.8413

f) Which of the following is a correct conclusion from the hypothesis test in part e)?
_ A. Customers definitely prefer the new logo
_ B. Customers definitely do not prefer the new logo
_ C. There is not enough evidence to say customers prefer the new logo
_ D. There is not enough evidence to say customers do not prefer the new logo


MT2011: Question 9 "The business of bus-ness"

You are the new Operations Manager of the local public transportation company and are especially interested in the reliability of bus service. You plan, on a monthly basis, to take a random sample of major bus stops and observe whether the buses depart on time or late and how late they are. (Buses never leave early since, if they arrive early, they wait until their departure will be exactly on time.)

a) The first month, you gather a random sample of 121 bus departures from a variety of times of day, days of the week, routes and locations. The sample has an average lateness of departure of 6.4 minutes with a standard deviation of 1.8 minutes. Which of the following is closest to a 95% confidence interval for the average lateness of departures for the entire bus system this month?
_ A. 6.4 ± 0.029
_ B. 6.4 ± 0.271
_ C. 6.4 ± 0.324
_ D. 6.4 ± 3.564

b) Which of the following would decrease the width of the confidence interval?

- A. Reduce the confidence level

- B. Increase the sample size

- C. Reduce the sample standard deviation

- D. All of the above

Five years ago, the system-wide mean lateness of departure was known to be 6.8 minutes. Using a 5% level of significance and the sample results of part a), carry out a hypothesis test to decide whether the system is improving; that is, whether the mean lateness has decreased from five years ago.

c) The appropriate null and alternative hypotheses are:

H0:

HA:

d) Give the formula for the appropriate test statistic and compute its value.

Formula:

Computed value: (Show your work to the right ==>)

e) Give a range in which the P-value is located.

f) From the P-value associated with this test statistic, which of the following is correct?
_ A. Do not reject H0 at the 10% significance level
_ B. Reject H0 at the 10% significance level but not at the 5% significance level
_ C. Reject H0 at the 5% significance level but not at the 1% significance level
_ D. Reject H0 at the 1% significance level

g) Using the 5% significance level, state your conclusion in words that the bus company management can understand.

h) The distribution of lateness of departure is strongly skewed to the right. However, it is

still appropriate to test the mean because:

_ A. The data are a simple random sample from the population of interest

_ B. The sample size is large enough for the Central Limit Theorem to apply

_ C. Since the sample is random, bus departures are independent of one another

_ D. All of the above

BONUS: In what century did the "equals" sign first appear in print?
_ A. 1300s
_ B. 1400s
_ C. 1500s
_ D. 1600s
_ E. 1700s
_ F. 1800s
_ G. 1900s

MT2011 - END OF QUESTIONS; ANSWERS AND EXPLANATIONS FOLLOW

MT2011: Answer 1

[illegible] Quantitative; C290 grade: Quantitative; c) A; d) B; e) Mean(E) = 73, SD(E) = 10

Details and Comments:
a) The goal was to survey the entire population of C291 students; that is the definition of a census.
[illegible] measured with units (cm, %, and hrs, [illegible]
[illegible] with a high number of [illegible] values, which are missing for a reason [illegible]
[illegible] Rule): 73 ± 2(10)

MT2011: Answer 2

a) (i) Median = 17, Q1 = 13, Q3 = 24, IQR = 11. Inner fences: (-3.5, 40.5) or (0, 40.5). There are no outliers.
(ii) The distribution is skewed to the right; the mean is quite different from the median.

b) All five statements are False.

Details and Comments:
a) (i) With 25 data points, the median is the 13th value. [illegible] It is also acceptable to report the lower fence as 0, since the computed value (Q1 - 1.5×IQR) is negative.
(ii) [illegible] the distribution is skewed to the right. [illegible] was not needed. The sketch must show the skewness: in the box and whiskers, the median is closer to the left side [illegible].
b) [illegible] 3. [illegible] so the distribution is NOT symmetric. 4. There is no reason for this to be true. 5. SD = 0 if all data values are the same.


MT2011: Answer 3

a) B or C; b) A; c) B; d) D; e) C;

Details and Comments:
a) Both multi-stage sampling and stratified sampling are acceptable answers. Technically, multi-stage sampling is the preferred answer, since for PUBS, four schools are chosen randomly but the actual students are selected systematically.

b) Since every fifteenth student is selected, the selection is systematic, not random.

c) Since students are free to come, or not, to the stand, this is voluntary response.

d) Counts are not parameters because they are not adjusted for sample size; however,

proportions are parameters.

e) Cards are handed out either to every fifteenth student or to volunteers; however, in

each group not everyone who receives a card will mail the card in; that's non-response.

MT2011: Answer 4

a) (i) 52% (625/1200 = 0.52)
(ii) Non-response rates are: Small: 37.5%, Medium: 60%, Large: 80%. The larger the company, the higher the expected rate of non-response.

b) Correlation is not the same as slope. So a perfect correlation does not mean that the slope is 1; hence a 1-unit increase in x does not mean a 1-unit increase in y.

c) No: Larger hospitals are more likely to take more serious cases requiring longer length of stay.

Details and Comments:
a) (i) Sum across the columns to get the row totals of 575 Respondents and 625 Non-respondents. Then divide by the overall total of 1200.
(ii) Column percentages are needed here, not row percentages.

b) Remember the formula for slope: b1 = r(sy/sx). Even if r = 1, the slope is still the ratio of the SDs, which need not be equal.

c) Look for lurking variables to explain unusual or nonsensical correlations.

MT2011: Answer 5

a) (i) Pr(X < 2.00) = Pr(Z < [2.00 - 2.84]/0.40) = Pr(Z < -2.10) = 0.0179 or 1.79%.
(ii) Z = 0.84; X = 2.84 + 0.84(0.40) = 3.18 (or 3.176)
(iii) Q1 = 2.57; Q3 = 3.11; IQR = 0.54
Q1 for Z = -0.675; X = 2.84 + (-0.675)(0.40) = 2.57
Q3 for Z = 0.675; X = 2.84 + (0.675)(0.40) = 3.11
IQR = 3.11 - 2.57 = 0.54

b) Bart: 2.25; Lisa: 2.50. Circle Lisa.
Z-score for Bart = (725 - 500)/100 = 2.25; Z-score for Lisa = (33 - 18)/6 = 2.50.
Lisa did better relative to the reference populations since her positive Z-score is higher.

Details and Comments:
a) Remember to make sketches of the required areas so that you get the correct parts of the normal curve. In (i), standardize X to Z and find the corresponding area; in (ii) and (iii), begin with the area, find Z and "unstandardize" to get X.
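The standardize and "unstandardize" steps in Answer 5 a) can be checked with a short sketch. It assumes X is normal with mean 2.84 and SD 0.40, as in the answer; the area 0.80 used for part (ii) is an assumption, chosen because it is the area that corresponds to Z = 0.84. The variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()                   # Z: mean 0, SD 1
x_dist = NormalDist(mu=2.84, sigma=0.40)    # X as described in Answer 5 a)

# (i) Standardize X to Z, then find the area to the left.
z = (2.00 - 2.84) / 0.40
p_below = std_normal.cdf(z)

# (ii) Begin with an area (assumed 0.80), find Z, then "unstandardize".
z_80 = std_normal.inv_cdf(0.80)
x_80 = 2.84 + z_80 * 0.40

# (iii) Quartiles and IQR of X.
q1 = x_dist.inv_cdf(0.25)
q3 = x_dist.inv_cdf(0.75)
iqr = q3 - q1

print(round(z, 2), round(p_below, 4))
print(round(x_80, 2))
print(round(q1, 2), round(q3, 2), round(iqr, 2))
```

Software uses more decimal places for Z than a printed table does, so the last digit can differ slightly from a hand calculation.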


MT2011: Answer 6

a) (i) r = -√0.719 = -0.848 (Note that the correlation is negative!)
(ii) b1 = r(sy/sx) = -0.848(10.72/1.49) = -6.10
b0 = ȳ - b1·x̄ = 82.6 - (-6.10)(3.00) = 100.9; ŷ = 100.9 - 6.10x
(iii) ŷ = 100.9 - 6.10(5) = 70.4, which is less than the required 75 kg/cm², so you cannot be confident that the 5 m girders will support the load.

b) Perfect negative correlation, r = -1 (plot the points; they fall on a straight line.)
c) r = 0.46, unchanged (correlation is invariant to the measurement scales.)
d) No: a prediction for 10 years of service is extrapolation beyond the range of data (that is, [illegible])
e) [illegible]
f) False, True, True, False

Details and Comments:
a) The minus sign is vital; the correlation is negative since the longer the girder, the lower the strength. If you forget the minus sign, your calculations of the slope, intercept and regression equation will be incorrect and you would end up concluding that 5 m girders [illegible] building might fall down.
b) Remember to make a plot before doing the calculations.
f) (i) is false because a 5 kg gain will mean an additional lift of 3 kg only on average. A gain of 5 kg might mean an additional lift greater than 3 kg for some people and less than 3 kg for others; (ii) [illegible] on average; (iii) uses the definition of r²; [illegible]
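The chain of calculations in Answer 6 a) can be reproduced from the summary statistics alone. The sketch below assumes the figures given in the question (r² = 0.719, means 3.00 and 82.60, SDs 1.49 and 10.72); the function name predict_strength is illustrative only.

```python
import math

# Summary statistics from the girder question.
r_squared = 0.719
x_bar, s_x = 3.00, 1.49      # length (m)
y_bar, s_y = 82.60, 10.72    # strength (kg/cm^2)

# The sign of r must come from the data: longer girders are weaker,
# so the negative square root is taken.
r = -math.sqrt(r_squared)

b1 = r * s_y / s_x           # slope
b0 = y_bar - b1 * x_bar      # intercept


def predict_strength(length_m):
    """Predicted strength (kg/cm^2) for a girder of the given length."""
    return b0 + b1 * length_m


print(round(r, 3), round(b1, 2), round(b0, 1))
print(round(predict_strength(5), 1))   # compare with the required 75
```

Dropping the minus sign in the first step flips the slope, which is how the wrong conclusion about the 5 m girders arises.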

MT2011: Answer 7

a) Pr(p̂ > 0.80) = Pr(Z > (0.80 - 0.70)/√((0.70)(0.30)/100)) = Pr(Z > 2.18) = 0.0145 or 1.45%

b) Pr(x̄ < 500) = Pr(Z < (500 - 480)/(250/√400)) = Pr(Z < 1.60) = 0.945 or 94.5%

Details and Comments:

a) Use the sampling distribution of p̂.

b) Use the sampling distribution of x̄ (i.e. remember the √n in the denominator).

Both of these situations depend on the Central Limit Theorem and on random samples that are large enough (100 and 400, respectively). Remember to make a sketch to get the correct area.
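Both parts of Answer 7 use a normal approximation to a sampling distribution. A minimal sketch, assuming the values from the question (p = 0.70 with 100 calls; mean 480, SD 250 with 400 calls); the variable names are illustrative only.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

# a) Sampling distribution of the sample proportion.
p, n_calls = 0.70, 100
se_prop = math.sqrt(p * (1 - p) / n_calls)
z_a = (0.80 - p) / se_prop
prob_a = 1 - std_normal.cdf(z_a)       # area to the right

# b) Sampling distribution of the sample mean.
mu, sigma, n_sample = 480, 250, 400
se_mean = sigma / math.sqrt(n_sample)  # note the sqrt(n) in the denominator
z_b = (500 - mu) / se_mean
prob_b = std_normal.cdf(z_b)           # area to the left

print(round(z_a, 2), round(prob_a, 4))
print(round(z_b, 2), round(prob_b, 3))
```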


MT2011: Answer 8

a) A; b) B; c) A; d) B; e) B; f) C

Details and Comments:
a) Reason: p̂ = 55/100 = 0.55

b) Reason: √((0.55)(0.45)/100) = 0.050

c) Reason: 0.55 ± 1.96(0.050)

d) Reason: n = (2.576²)(0.5)(0.5)/(0.05²) = 664

e) Reason: Area to the right of 1.00 on the z-curve.

f) Reason: The P-value is not less than 0.05 (and not even less than 0.10).
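The reasons listed for Answer 8 can be verified in a few lines. This sketch assumes the survey figures from the question (55 of 100) and the critical values 1.96 and 2.576 used in the answer key; the null value 0.5 for the test statistic is an assumption based on the "more than half" wording.

```python
import math
from statistics import NormalDist

successes, n = 55, 100

p_hat = successes / n                            # a) sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)          # b) standard error
margin_95 = 1.96 * se                            # c) 95% margin of error

# d) Sample size for margin of error 0.05 at 99% confidence, guess 0.5.
n_needed = math.ceil(2.576 ** 2 * 0.5 * 0.5 / 0.05 ** 2)

# e) Test statistic under the assumed null value 0.5, one-sided P-value.
z_stat = (p_hat - 0.5) / math.sqrt(0.5 * 0.5 / n)
p_value = 1 - NormalDist().cdf(1.00)

print(p_hat, round(se, 3), round(margin_95, 3))
print(n_needed, round(z_stat, 2), round(p_value, 4))
```

The sample size is rounded up because a fractional respondent is not possible and rounding down would miss the target margin.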

MT2011: Answer 9

a) C b) D c) H0: μ = 6.8; HA: μ < 6.8

d) t = (ȳ - μ0)/(s/√n) = (6.4 - 6.8)/(1.8/√121) = -2.44

e) 0.005 < P-value < 0.01

f) D. Reject H0 at the 1% significance level

g) There is strong evidence to say that the system is improving (or that mean lateness decreased)

h) B or D (either is acceptable)

Details and Comments:

a) Reason: t*(120) = 1.980; CI = 6.4 ± 1.980(1.8/√121) = 6.4 ± 0.324

b) Examine the effect of each of these by referring to the formula for the CI.

c) This is a one-tailed altemative since the question asks whether mean lateness has

decreased from five years ago.

d) Remember the minus sign on the test statistic.

e), f) & g) Reject H0 since the P-value is less than 0.01. Remember to state your

conclusion in a sentence that answers the original question.

h) B is the most important of the three, but A and C are also needed for the test to work.
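The confidence interval and test statistic in Answer 9 follow from the same standard error. A minimal sketch, assuming the sample figures from the question; the critical value 1.980 is taken from the answer key (120 degrees of freedom) and is entered by hand here, not computed.

```python
import math

n, y_bar, s = 121, 6.4, 1.8   # sample size, mean and SD of lateness (minutes)
mu_0 = 6.8                    # mean lateness five years ago

se = s / math.sqrt(n)         # standard error of the mean

# a) 95% confidence interval, using the tabled t* for 120 df.
t_crit = 1.980
margin = t_crit * se
ci = (y_bar - margin, y_bar + margin)

# d) One-sample t statistic; negative because the sample mean is below 6.8.
t_stat = (y_bar - mu_0) / se

print(round(margin, 3), tuple(round(v, 3) for v in ci))
print(round(t_stat, 2))
```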

BONUS: C. The "equals" sign first appeared in print in 1557.

MT2011 - END OF ANSWERS AND EXPLANATIONS

Midterm Exam 2010

Notes: This exam has 9 questions. The duration is 2 hours. Books, notes, and calculators are allowed, but not computers, cellphones or on-line connectivity.

MT2010: Question 1 "Mittens, means, and medians"

a) The Hudson's Bay Company was the official retailer of Olympics merchandise, including the very popular red mittens. Their database included information on each sale made to customers who paid by credit card (Visa only). Some of the variables they collected are listed below. Decide whether each variable would, for analysis, be most usefully considered as categorical, quantitative or neither.

• Total amount of the sale ($)          Categorical   Quantitative   Neither
• Country of origin on credit card      Categorical   Quantitative   Neither
• Gender of the customer                Categorical   Quantitative   Neither
• Visa credit card number               Categorical   Quantitative   Neither

b) Credit card customers were divided into two groups: Canadian residents and visitors to Canada. The average amount spent by all Canadian residents was $200. The average amount spent by all visitors to Canada was $300. What must be true about the average amount spent by all customers?
_ A. It must be $250
_ B. It must be larger than the median expenditure
_ C. It could be any number between $200 and $300
_ D. It must be larger than $250

c) A sample of 500 cash sales had a mean of $20 and a standard deviation of $40. The histogram of the data would most likely be:
_ A. skewed to the left (i.e. long left-hand tail)
_ B. approximately symmetric
_ C. skewed to the right (i.e. long right-hand tail)
_ D. bimodal

d) Which of the following is likely to have a mean that is smaller than the median?
_ A. The salaries of all National Hockey League players
_ B. The grades of students (out of 100) on a very easy exam on which most score very high or perfectly, but a few do very poorly
_ C. The prices of homes in Vancouver
_ D. The grades of students (out of 100) on a very difficult exam on which most score poorly, but a few do very well

e) Here is the frequency distribution of the ages of a sample of 100 employees of the Hudson's Bay Company.

Age (years)   Frequency
15-19         2
20-24         10
25-29         19
30-34         27
35-39         16
40-44         10
45-49         6
50-54         5
55-59         3
60-64         2
Total         100

(i) What percentage of the employees is 50 or older?

(ii) The median age of the employees is:
_ A. About 40
_ B. Between 30 and 34
_ C. Between 40 and 49
_ D. None of the above

(iii) The mean age of the employees is:
_ A. About 34 because about half are younger than 34 and half are older
_ B. Above the median because the distribution is approximately symmetric
_ C. Above the median because the distribution is skewed to the right
_ D. None of the above

f) Based on the following figure, decide whether each of the statements below the figure is more likely to be True or False. (Note: House income means "total household income" and is referred to simply as "income" in the statements.)

[Figure: household income (vertical axis, 0 to 350,000) by car brand: BMW, Cadillac, Lexus, Lincoln, Mercedes]

Mercedes buyers have the highest variability in income.      True   False

For each car type, the incomes are reasonably symmetric.     True   False

There is a positive correlation between income and brand.    True   False

MT2010: Question 2 "Catching some zzzs"

a) Consider a standard normal random variable, Z (i.e. with mean 0 and standard deviation 1). Find the median, lower and upper quartiles, and interquartile range (IQR) of Z.

Median of Z:

Lower quartile (Q1):

Upper quartile (Q3):

Interquartile Range:

b) What percentage of values of Z lie outside 1.5×IQR on each side of the median? That is, find the total percentage below "Median - 1.5×IQR" or above "Median + 1.5×IQR".

c) Draw a boxplot that would represent data obtained from a large sample of values of Z.

d) This part is unrelated to parts a), b) and c).
Scores on the Wechsler Adult Intelligence Scale (WAIS), a standard IQ test, are approximately normally distributed for all age groups; however, the means and standard deviations of scores differ across different age groups. For the 20 to 34 age group, the mean is 110 and the standard deviation is 25, while for the 60 to 64 age group, the mean is 90 and the standard deviation is 25. Sarah is 29 and her mother Ann is 62. Sarah scores 135 on the WAIS while Ann scores 120. Which of the two has the higher score relative to her age group? Explain your choice with appropriate calculations.

Ann     Sarah

MT2010: Question 3 "Contender for gender offender"

A university offers only two degree programs, one in Engineering and one in English. Admission to the programs is competitive, and a women's group suspects discrimination against women in the admissions process. They obtain the following data from the university, a two-way classification of all applicants by gender and admissions decision.

               Male   Female
Admitted       35     20
Not Admitted   45     40

a) Is there evidence of an association between the applicants' gender and success in obtaining admission? Why or why not?

b) The university replies that there is no discrimination. In its defence, it produces a three-way table that classifies applicants by gender, admission decision AND program to which they applied.

Engineering                       English
               Male   Female                     Male   Female
Admitted       30     10          Admitted       5      10
Not Admitted   30     10          Not Admitted   15     30

Is there an association between admission rates and gender in either program? Explain why or why not.

c) Are the answers in parts a) and b) contradictory? If so, how can you explain the contradiction?

d) After disregarding gender, are admission rates different in the two programs? Support your conclusion with an appropriate two-way table (i.e. admission decision by program).

MT2010: Question 4 "Beauty is in the eye of the frolder"

On a recent trip to Mars, scientists discovered a colony of small creatures that they named frolders. Due to the speed and agility of the frolders, the scientists could only capture five specimens to bring back to Earth to study. One scientist suspects the weight of the frolder may be related to the number of eyes it has. The following table shows the weight and number of eyes for each of the five specimens:

Specimen ID      A101   A102   A103   A104   A105
Weight (kg)      2      8      4      15     6
Number of Eyes   2      11     5      17     5

a) Plot these data below.

[Blank grid titled "Frolder Study": Weight (kg) on the horizontal axis and Eyes on the vertical axis, both from 0 to 20]

b) Briefly describe the association (must be brief for full marks!)

c) Which of the following values is the correct correlation coefficient for this data? Note: You can reason this out without doing the calculation.
_ A. r = 0.5
_ B. r = 0.975
_ C. r = 0
_ D. r = -0.954
_ E. r = -0.5

d) Looking at the scatterplot, is the correlation coefficient an appropriate measure? Why or why not?

e) A journalist reporting on this study claims that being heavier causes a frolder to grow more eyes. What is wrong with this statement?

f) Do you think these five frolders represent a random sample? Why or why not?

MT2010: Question 5 "Wires, dam wires, and electricians"

Electrical wires can corrode over time. And wires used near hydroelectric dams can corrode more quickly because of the extra moisture in the air. Corrosion rates (measured in hundredths of mils) are generally known for various types of wire, but electricians would like to be able to predict the corrosion rates near dams. Corrosion rates for 30 types of wire were measured in normal use and at dams to assess the relationship. A linear regression model can be constructed with wires in normal use as the x variable and the same wires used at dams as the y variable. The following scatterplot shows the data:

[Scatterplot: corrosion rate of wire used at dams (vertical axis) versus corrosion rate of wire in normal use (horizontal axis)]

a) In this study, the response variable is:
_ A. Corrosion rate for a dam wire
_ B. Corrosion rate for a wire in normal use
_ C. Either rate; it does not matter which is considered the response
_ D. Neither; the instrument used to measure corrosion is the response variable

b) Is linear regression appropriate here? Choose the single best statement.
_ A. Yes, the scatterplot is straight enough
_ B. No, there is not enough scatter
_ C. No, there is too much scatter
_ D. Yes, there are no outliers

c) Summary statistics are presented below. Use them to calculate the regression line. Show the formulas and your work. Report your final answers to three decimal places.

x̄ = 304.6667    sx = 196.4466    r = 0.8691
ȳ = 554.0000    sy = 286.6104

d) A new type of wire has a corrosion rate measure of 555. What does the model predict for the corrosion measure of this type of wire used at a dam?

e) One of the data points is (220, 245). What is the value of the residual for this point?

f) What fraction of the variation in y is accounted for by the model?

g) Can the regression line be used to reliably estimate the dam wire corrosion rate for a wire which has a rate of 2500 mil under normal use? Give a reason.

_ Yes   _ No
Reason:

h) Fill in each blank with the letter of the ending that fits best.

(i) If the x and y variables are switched, _
(ii) If the units are changed for both x and y variables, _
(iii) If the units are changed for just the x variable, _
(iv) If a constant is added to the y variable, _

Endings:
A. ...the slope will change but the averages and standard deviations will not change.
B. ...sx will change but ȳ will not change.
C. ...the data will be normally distributed.
D. ...only the correlation will change.
E. ...the correlation, slope, and standard deviations will remain the same.
F. ...the correlation and slope will both change.
G. ...the slope will change, and sx and sy will also change.

MT2010: Question 6 "Putting the pedal to the medal"

Retain full precision throughout your calculations but write down only two decimal places for your final answer.

For parts a) and b), assume that the weights of the gold medals, silver medals, and ribbons are all independent (especially since we have not learned how to deal with such questions otherwise!).

a) Each medal made for the recent Olympics is unique. Ours were the first Olympic Games for which the medals have not been identical! Complete gold medals (that is, the medal plus the ribbon) weigh 48 grams on average with a standard deviation of 6 grams. The ribbons that are attached to the medals weigh 8 grams on average with a standard deviation of 2 grams. Find the mean, variance and standard deviation of the weights of the gold medals without their ribbons.

b) Complete silver medals (i.e. medal plus ribbon) weigh 38 grams on average with a standard deviation of 5 grams. Find the mean, variance and standard deviation of a pair of complete medals (gold and silver) combined.

c) You were instructed to assume that the weights of the gold medals, silver medals, and lengths of ribbon are all independent. Is this a reasonable assumption? Explain why or why not in one brief sentence at most.

d) In some winter Olympic events, such as the snowboard parallel giant slalom, the winner is the rider with the best combined time over two runs. In some summer Olympic events, such as the javelin throw, the winner is the athlete with the best single distance out of four tries. Generally speaking, does the sum of two random times or the maximum of four random distances have greater variability?
_ A. Sum of two random times
_ B. Maximum of four random distances
_ C. Cannot say because time and distance are unrelated variables

Why? Explain in one sentence maximum.

MT2010: Question 7 "The food of the gods"

Chocolate bars produced by a certain machine are labeled 240 grams to comply with advertising rules and regulations. However, the distribution of the actual weight of these chocolate bars is claimed to be normal with a mean of 243 grams and a standard deviation of 3 grams.

a) Approximately what percentage of all chocolate bars produced by this machine would be expected to be between 240 and 246 grams?

b) A quality control manager initially plans to take a random sample of size n from the production line. If he were to double his sample size to 2n, the standard deviation of the sampling distribution of the sample mean x̄ would be multiplied by:
_ A. 1/2
_ B. 1/√2
_ C. √2
_ D. 2

c) The quality control manager plans to take a random sample of size n from the production line. How big should n be so that the sampling distribution of x̄ has standard deviation 0.3 grams?
_ A. 10
_ B. 100
_ C. 1000
_ D. Cannot be determined unless we know that the population is normal.

d) If the quality control manager takes a random sample of nine chocolate bars from the production line, what is the probability that the sample mean weight of the nine sample chocolate bars will be less than 240 grams?
_ A. 0
_ B. 0.0013
_ C. 0.1587
_ D. 0.9987

Show your work:

MT2010: Question 8 "Shooters for the shooters?"

A radio talk show host with a large audience is interested in the proportion p of adults in his listening area who think the drinking age should be lowered to 18. To find out, he poses the following question to his listeners: "Do you think that the drinking age should be reduced to 18, in light of the fact that 18-year-olds are eligible for military service?" He asks listeners to phone in and vote "yes" if they agree the drinking age should be lowered and "no" if not. Of the 100 people who phoned in, 70 answered "yes".

a) The sample estimate, p̂, of the proportion of adults who think the drinking age should be reduced is:
_ A. 70
_ B. 0.70
_ C. 0.69
_ D. Not able to be determined from the information given

b) The standard error of this estimate is closest to:
_ A. 0.089
_ B. 0.046
_ C. 0.0021
_ D. 0.0045

c) The margin of error for a 90% confidence interval is closest to:
_ A. 0.046
_ B. 0.075
_ C. 0.090
_ D. 0.690

d) How large a sample n would you need to estimate p with margin of error 0.01 with 95% confidence? Use the guess p̂ = 0.6 as the value for p.
_ A. 6768
_ B. 9220
_ C. 9502
_ D. 9596

e) Which of the following assumptions for inference about a proportion using a confidence interval are violated in this case?
_ A. The data are a simple random sample from the population of interest
_ B. The success/failure condition
_ C. A third choice of no opinion needed to be included
_ D. There appear to be no violations

MT2010: Question 9 "Going postal"

A simple random sample of 100 Canada Post employees found that the average time these employees had worked for the postal service was 7 years, with standard deviation of 2.0 years. Do these data provide evidence that the mean length of time that the [illegible] postal service has changed [illegible]

a) Give the appropriate null and alternative hypotheses.

b) Give the formula for the appropriate test statistic and compute its value.

c) Give a range in which the P-value is located.

[illegible] which of the following is correct?
_ A. [illegible]
_ B. Reject H0 at the 10% significance level but not at the 5% significance level
_ C. Reject H0 at the 5% significance level but not at the 1% significance level
_ D. Reject H0 at the 1% significance level

[illegible] conclusion in one clearly worded sentence [illegible]

[illegible] time the population of postal employees have spent with the [illegible]
_ A. 7.0 ± 0.2
_ B. 7.0 ± 0.4
_ C. 7.0 ± 2.0
_ D. 7.0 ± 4.0

Bonus Question: Just for Fun and Bragging Rights
Over the 17 days of the Winter Olympics you saw the Olympic rings logo countless times. In the official logo, not the [illegible] version, each of the five [illegible] the order of the [illegible] in the [illegible]

Ring 1    Ring 2    Ring 3    Ring 4    Ring 5

END OF MIDTERM 2010 - ANSWERS AND EXPLANATIONS FOLLOW

MIDTERM EXAM 2010: ANSWERS AND EXPLANATIONS

MT2010: Answer 1

a) Quantitative; Categorical; Categorical; Neither

b) C c) C d) B e) (i) 10% (ii) B (iii) C f) True; True; False

Details and Comments:
a) Although the text considers an identifier variable, such as a Visa credit card number, a type of categorical variable, it is useless in that form; it is best thought of as Neither. You aren't likely to do any analysis on the Visa card number!

b) The average must lie between the minimum and maximum, but depending on skewness it could be smaller or larger than the midpoint or median.

c) The minimum value is 0 but the maximum can be very large, hence right-skewed.

d) All except B are likely to have a long right-hand tail, where the mean exceeds the median.

e) (i) (5+3+2)/100 = 10%
(ii) 31% of values (2+10+19) are less than 30; including the 30-34 interval increases the cumulative count to 58% (2+10+19+27).

f) Incomes are not exactly symmetric, but for all practical purposes and especially for data analysis, they certainly are reasonably symmetric.
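The cumulative-count reasoning in part e) can be sketched as follows. The frequencies are those of the age table in Question 1 e); the rule used to pick the median class (the first class whose cumulative count reaches half the total) is a common convention and is stated here as an assumption.

```python
from itertools import accumulate

age_classes = ["15-19", "20-24", "25-29", "30-34", "35-39",
               "40-44", "45-49", "50-54", "55-59", "60-64"]
frequencies = [2, 10, 19, 27, 16, 10, 6, 5, 3, 2]

total = sum(frequencies)
cumulative = list(accumulate(frequencies))   # running totals by class

# (i) Percentage aged 50 or older: the last three classes.
pct_50_plus = 100 * sum(frequencies[-3:]) / total

# (ii) Median class: first class whose cumulative count reaches half of n.
median_class = next(label for label, cum in zip(age_classes, cumulative)
                    if cum >= total / 2)

print(pct_50_plus)
print(cumulative[:4])
print(median_class)
```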

MT2010: Answer 2

a) Median = 0; Q1 = -0.675; Q3 = 0.675; IQR = 1.35

b) Prob. = 2×Pr(Z > 1.5×1.35) = 2×Pr(Z > 2.025) = 2×0.0215 = 0.0430 or about 4%

c) The boxplot is symmetric around 0, with the ends of the box at Q1 and Q3 at -0.675 and 0.675 (from part a). Since Z has no limits, the whiskers can't extend to the minimum and maximum. Instead, use inner fences; the whiskers should extend to -2.7 and 2.7.

d) Ann has a higher rank.
Ann's z-score = (120 - 90)/25 = 1.2; Sarah's z-score = (135 - 110)/25 = 1

Details and Comments:
a) Z is symmetric so the median equals the mean.

It is acceptable to report answers to two decimal places:

For Q1: -0.68 or -0.67; for Q3: 0.68 or 0.67; for IQR: 1.36 or 1.34

b) If you used IQR of 1.36, the probability is 0.0414.

If you used IQR of 1.34, the probability is 0.0444.

c) Since the distribution is unbounded, any reasonable choice of whiskers is acceptable.
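The quartiles, the tail probability in part b) and the z-scores in part d) can be checked with the standard library. This is a sketch under the assumptions stated in the question (Z standard normal; both WAIS age groups with SD 25); the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()

# a) Quartiles and IQR of the standard normal distribution.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
iqr = q3 - q1

# b) Total area more than 1.5 x IQR away from the median (which is 0).
cutoff = 1.5 * iqr
prob_outside = 2 * (1 - std_normal.cdf(cutoff))

# d) z-scores relative to each person's own age group.
ann_z = (120 - 90) / 25
sarah_z = (135 - 110) / 25

print(round(q1, 3), round(q3, 3), round(iqr, 2))
print(round(prob_outside, 3))
print(ann_z, sarah_z)
```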


MT2010: Answer 3
a) Yes: Percent of males admitted = 35/80 = 0.4375 or 43.75%; percent of females admitted = 20/60 = 0.33 or 33%.
b) No: Half of engineers of either sex are admitted. One-quarter of English students of either sex are admitted.
c) The English program is harder to get into, and that is where more females applied. This is an illustration of Simpson's paradox.
d)

                Engineering   English   Row Total
Admitted             40          15         55
Not Admitted         40          45         85
Column Total         80          60        140

Admitted to Engineering: 40/80 = 0.50 or 50%
Admitted to English: 15/60 = 0.25 or 25%

Details and Comments:
When a two-way table is provided, it is useful to add the row totals and the column totals. They are needed to compute conditional probabilities. Simpson's paradox is one of the most revealing illustrations of the need to dig deeper into the relationship between categorical variables. What might appear to be the result for a two-way table may well be reversed when a third variable is incorporated.
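
The reversal in Answer 3 can be sketched in a few lines of Python. The sex-by-program counts below are an assumption: they do not appear in the answer itself, but are the only counts consistent with the stated figures (35 of 80 males and 20 of 60 females admitted, with 50% admitted in Engineering and 25% in English for either sex).

```python
def admit_rate(cells):
    """Pooled admission rate for a list of (applicants, admitted) pairs."""
    total = sum(n for n, _ in cells)
    admitted = sum(k for _, k in cells)
    return admitted / total

# Assumed (applicants, admitted) counts, reconstructed from the stated rates
counts = {
    ("Engineering", "Male"): (60, 30),
    ("Engineering", "Female"): (20, 10),
    ("English", "Male"): (20, 5),
    ("English", "Female"): (40, 10),
}

# Overall rates by sex: males appear to be favoured
male_rate = admit_rate([counts[("Engineering", "Male")], counts[("English", "Male")]])
female_rate = admit_rate([counts[("Engineering", "Female")], counts[("English", "Female")]])

# Rates within each program: identical for both sexes
eng_male = admit_rate([counts[("Engineering", "Male")]])
eng_female = admit_rate([counts[("Engineering", "Female")]])
english_male = admit_rate([counts[("English", "Male")]])
english_female = admit_rate([counts[("English", "Female")]])

print(male_rate, round(female_rate, 4))
print(eng_male, eng_female, english_male, english_female)
```

The overall gap comes entirely from where each group applied: two-thirds of the females applied to the program with the lower admission rate.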

MT2010: Answer 4

d) Yes; there is a clear linear relationship.
e) Correlation does not imply causation.
f) No. They were the slower ones, or the easier ones to catch.

Details and Comments:
c) Since the correlation is strong and positive, only 0.875 is a sensible choice for r.
d) Correlation coefficients require linear relationships.

[Scatterplot; axis label: Weight (kg)]

MT2010: Answer 5
a) A
b) A
c) b1 = r(sy/sx) = 0.8691(286.6104/196.4466) = 1.268

b0 = ȳ - b1·x̄ = 554.0000 - 1.268(304.6667) = 167.683 (or 167.684)

ŷ = 167.683 + 1.268x (or ŷ = 167.684 + 1.268x)
d) ŷ = 167.683 + 1.268(555) = 871.423 (or 871.424)
e) ŷ = 167.683 + 1.268(220) = 446.643 (or 446.644)

Residual = e = 245 - 446.643 = -201.643 (or -201.644)

f) r² = 0.8691² = 0.755
g) No; this is extrapolation far beyond the range of data.
h) (i) A (ii) G (iii) B (iv) E

Details and Comments:
a) Response variable is on the vertical axis.
c) Beware of round-off error. Carry all available decimal places in the intermediate calculations, but report fewer as instructed.
d) Simple substitution.
e) Use the definition of residual: observed minus predicted.
f) This is the definition of r-squared.
g) Although it is mathematically correct to substitute 2500 into the regression equation, extrapolation far beyond the range of data is a major misuse of regression.
h) Examine the formulas for slope, intercept and correlations and test out the effect of the suggested changes. For (ii), correlation does not depend on units, but slope and SDs do

change if both variables change. For (iv), the scatterplot is simply moved straight up, so

SDs, slope, and correlation are not affected.
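
The calculations in Answer 5 can be reproduced as follows. This is a sketch, not the official working: the summary statistics are taken as read from the answer (the value of sx in particular is read from a damaged line and should be treated as an assumption), and the slope is rounded to three decimals before use, as the answer does.

```python
def predict(intercept, slope, x):
    """Predicted response from a fitted least squares line."""
    return intercept + slope * x

# Summary statistics as read from the answer (s_x is an assumed reading)
r = 0.8691
s_y = 286.6104
s_x = 196.4466
y_bar = 554.0000
x_bar = 304.6667

# c) Slope and intercept, rounded to 3 decimals as in the answer
b1 = round(r * s_y / s_x, 3)
b0 = round(y_bar - b1 * x_bar, 3)

# d) and e) Predictions, then residual = observed minus predicted
pred_555 = predict(b0, b1, 555)
pred_220 = predict(b0, b1, 220)
residual = 245 - pred_220

# f) Proportion of variation explained
r_squared = r ** 2

print(b1, b0)
print(round(pred_555, 3), round(pred_220, 3), round(residual, 3))
print(round(r_squared, 3))
```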

MT2010: Answer 6
a) Mean(X-Y) = Mean(X) - Mean(Y) = 48 - 8 = 40

Var(X-Y) = Var(X) + Var(Y) = 36 + 4 = 40
SD(X-Y) = √40 = 6.32

b) Mean(X+Y) = Mean(X) + Mean(Y) = 48 + 38 = 86

Var(X+Y) = Var(X) + Var(Y) = 36 + 25 = 61
SD(X+Y) = √61 = 7.81

c) Yes: Heavier ribbons are not expected to be found only on heavier medals.

d) A

Details and Comments:
a) and b) The variance of a sum or difference of two independent variables is always the sum of the individual variances. Remember that calculations are not done with standard deviations; combine variances first and then take the square root.
d) The sum of two random variables generally has greater variability than a single random variable. However, if the question had asked about the "mean" of two measures rather than the sum, then the mean would have lesser variability than a single measure.
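
The rule in a) and b) can be sketched in Python as below. The function name is illustrative; the inputs are the variances used in Answer 6, and independence of the two variables is assumed, as in the answer.

```python
def sd_of_sum_or_difference(var_x, var_y):
    """SD of X+Y or X-Y for independent X and Y: add variances, then take the root."""
    return (var_x + var_y) ** 0.5

# a) Var(X) = 36, Var(Y) = 4
sd_diff = sd_of_sum_or_difference(36, 4)

# b) Var(X) = 36, Var(Y) = 25
sd_sum = sd_of_sum_or_difference(36, 25)

# Common error: adding the standard deviations directly overstates the spread
wrong_sd_sum = 6 + 5

print(round(sd_diff, 2), round(sd_sum, 2), wrong_sd_sum)
```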


MT2010: Answer 7
a) 68%
b) [...]
c) [...]
d) Pr(x̄ < 240) = Pr(Z < -3) = 0.0013

Details and Comments:
a) Use the 68/95/99.7 rule: 243 ± 3 = (240, 246)
b) [...]
c) [...]
d) Since the question is about the sample mean, the standardization uses σ/√n.

MT2010: Answer 8
a) B b) B c) B d) B e) A

Details and Comments:
a) Reason: p̂ = 70/100 = 0.70
b) [...]
c) [...]
d) Reason: [...]
e) The data are [...] since people choose whether or not to phone in!

MT2010: Answer 9
a) H0: μ = 7.5; HA: μ ≠ 7.5
b) t = (7.0 - 7.5)/[...]
c) [...]
d) C
e) [...] to say that the mean length of time employees have worked for [...]

Details and Comments:
a) This is a t-test of a single mean. [...]
Note that although the t-table [...] positive values [...]; if the test statistic is negative, you look up the [...]
e) Since the p-value is [...] the null hypothesis at the 5% level; but since [...]
f) [...] n - 1 = 99 df. Use the value for 100 df in the table.

Bonus Question: Olympic Rings colours
BLUE BLACK RED YELLOW GREEN

END OF ANSWERS AND EXPLANATIONS TO MIDTERM 2010

Part B. Past Years' Midterm Exams

A collection of questions from midterm exams of past years, with answers and explanations

This section presents questions from midterm exams prior to 2010. Since course syllabi, textbooks, order of topics, and even notation, have changed, not every question from past exams is relevant today. So the exam questions have been reorganized by broad topic area as follows:

Section A: Descriptive Statistics
Section B: Scatterplots, Association, Correlation, Least Squares Regression
Section C: Normal Curve, Sampling Distributions, Combining Random Variables
Section D: Introduction to Inference, Confidence Intervals, Hypothesis Tests
Section E: Miscellaneous

Questions in each topic area are arranged from the most recent year and go back in time. Following the questions in each topic area is a set of answers and explanations/comments about the answers. The comments give details of calculations and common errors made by students.

Since the teaching of any course is dynamic and always undergoing change, there may

still be some terminology or notation or even a few parts of questions which are

unfamiliar to you. If you are unclear whether a particular question or topic is relevant to

the current year, please ask your instructor.


SECTION A: DESCRIPTIVE STATISTICS

Question A1 (MT2009-Q1) "Not yet an Olympic Sport"

[...]

[Boxplots: ages of the male and female team members]

a) Which team has more members? (Circle the correct response)
Male    Female    Can't tell    Same size

b) [...] which of the following is most likely the mean age of the female team members?
21    22    23    30

c) For each of the three measures below, fill in the numerical value in the blank provided. Also indicate whether it is a measure of Shape, Centre, Spread, or none of these (circle one).

                                   Value:    Is a measure of:
Interquartile range (for males)    ______    Shape  Centre  Spread  None
50th percentile (for females)      ______    Shape  Centre  Spread  None
Oldest male member                 ______    Shape  Centre  Spread  None

d) The distribution of male ages is: (circle the correct response)
Symmetric    Skewed to the left    Skewed to the right

e) The distribution of female ages is: (circle the correct response)
Symmetric    Skewed to the left    Skewed to the right


f) The mean male age is 22.5 years. One of the members of the male team is 22 years old and has a z-score of -0.25. What is the standard deviation of male ages?

g) If we assume that male ages are normally distributed, what proportion of males on the team are 22 years of age or younger?

h) Which of the following is the best justification for the assumption of normality made in part g)? (Check the best response)

_ A. The Law of Large Numbers
_ B. The Central Limit Theorem
_ C. Least squares regression
_ D. None of the above

i) Team members are required to take a course in the history of underwater basket-weaving. The professor records the values of several variables for each student. These variables are listed below. For each one, decide whether it has been recorded as quantitative or categorical.

Score on the final exam (out of 200 points)      Quantitative    Categorical
Final grade for the course (A, B, C, D, or F)    Quantitative    Categorical
The number of lectures the student missed        Quantitative    Categorical
Brand name of favorite swimsuit                  Quantitative    Categorical

j) Universities across North America require underwater basket-weaving students to take a quantitative skills test. Percentage scores on this test have a mean of 30% and a standard deviation of 10%. Give a range within which you would expect to find the middle 95% of all North American underwater basket-weaving student test scores.


Question A2 (MT2008-Q1) "There are two kinds of data: good and bad!"

a) Here is part of the data set in which CyberStat Corporation keeps information on its employees. Circle the variables below which are recorded as quantitative scale variables.

Employee #    Surname    Age    Gender    Salary    Job Type

Employee #    Surname    Age      Gender    Salary     Job Type
11234         Smith      39       Female    $62,100    Management
23467         Jones      [...]    Male      $47,350    [...]
98543         Chan       27       Female    $25,250    Clerical
76548         Wong       48       Male      $77,600    Management

b) Three small Statistics classes all took the same test. Histograms of the scores for each class are shown below.

[Histograms of test scores for Class 1, Class 2, and Class 3; horizontal axes run from 40 to 100]

(i) Which class had the highest mean score?                     1  2  3
(ii) Which class had the highest median score?                  1  2  3
(iii) For which class are the mean and median most different?   1  2  3
(iv) Which class had the smallest standard deviation?           1  2  3

c) For each of these variables, decide whether its distribution is more likely symmetric or skewed right (long right-hand tail) or skewed left (long left-hand tail). Circle one.

Individual incomes in the United States    Symmetric    Skewed right    Skewed left
Age of male heart attack victims           Symmetric    Skewed right    Skewed left
Lifetimes of electric light bulbs          Symmetric    Skewed right    Skewed left
IQ scores of the Canadian population       Symmetric    Skewed right    Skewed left

Question A3 (MT2008-Q2) "A Nash-ional Game"

The data set to the right contains all the point differentials or margins in all NBA games played by the Phoenix Suns up to February 13 of the 2007/08 season. Negative numbers indicate losses, positive numbers indicate wins. The data have been arranged in ascending order for you (biggest loss to biggest win).

a) Compute the various numerical summaries and put them into the table below part b) under "original data." Some have been computed for you.

NOTE: Part b) is not part of the current curriculum. You can ignore it. But think of it as a challenge question. It is easy to figure out. Instructions are given in the Answers/Comments.

b) Suppose the data undergo a transformation such that X* = 2X - 3, where X = the original variable and X* is the "new," transformed variable. Find all of the numerical summaries for X* and put them into the table below under "transformed data".

            Original Data (X)    Transformed Data (X*)
Mean              5.6
Median
Range
Q1
Q3
IQR
Std dev          11.7

c) Are there any outliers? Use the "inner fences" definition of outliers and the original data (not the transformed data) to identify any outliers.

Lower inner fence =

Upper inner fence =

Observation numbers of outliers:

[Data column: Obs# 1 to 52; point differential values not reproduced]

Question A4 (MT2007-Q1) "Data, data, data! I can't make bricks without clay!"

a) A sample of shoppers at a mall was asked the following questions. Decide whether the type of data are more likely to be quantitative or categorical. (Circle your choice)

What is your age (in years)?                                    Categorical    Quantitative
How much did you spend (in $)?                                  Categorical    Quantitative
What is your marital status?                                    Categorical    Quantitative
Rate the availability of parking. (Excellent, Good, Fair, Poor) Categorical    Quantitative

b) Here is a table of sources of electricity in Canada and the US and the percentage of electricity generated by each. Construct a bar graph to compare Canada and the US. Do NOT use separate sets of axes for each graph.

c) A news article reports that, "Of the 411 players on National Basketball Association rosters in February 1998, only 139 made more than the league ______ salary of $2.36 million." Which word should go in the blank, mean or median? That is, is $2.36 million the mean or median salary for NBA players?

d) A study was made of the age of entering first-year university students. Which of the following is most likely to be the standard deviation?

_ B. 1 year
_ C. 5 years

e) The following histogram displays the December 2000 percentage unemployment rates in the 50 U.S. states and Puerto Rico. The labels on the horizontal axis should be interpreted as follows: the bar labelled "1" represents rates of 1.0% to 1.9%, the bar labelled "2" represents rates of 2.0% to 2.9%, etc.

[Histogram: horizontal axis "Unemployment Rate", bars labelled 1 to 8]

(i) What percentage of the rates (out of a total of 51 observations) is 5.0% or greater?

(ii) Estimate the median unemployment rate.

f) You have decided to sell your home. The market is booming now with the 2010 Olympic Games preparations, and therefore most sellers of houses with similar characteristics have received extremely good deals in the past few months. You ask the realtor for a summary of net prices of homes sold in your neighborhood. The realtor hands you the following two density curves, one of them of the prices of homes sold in the past few months in your neighborhood, and the other of the prices of homes sold during a deep economic recession.

[Density curves: Curve A and Curve B]

(i) Under the given assumptions, which of the two curves better represents the distribution of prices of homes sold in the past few months? Circle your answer choice.

Curve A    Curve B

(ii) A potential buyer offers to give you the mean, the median or the mode of the prices of all the homes sold in the past few months in your neighborhood. Assuming that the density curve is the one you chose in (i) directly above, which numerical measure would you prefer? Circle your answer choice.

If you chose Curve A:    Mean    Median    Mode
OR:
If you chose Curve B:    Mean    Median    Mode

(iii) You are told that the mean price of 50 houses sold is $700,000. However, you notice that there was a mistake in the calculation, and that one of the buyers paid $500,000 instead of the $800,000 that was used when making this calculation. What is the actual mean price of the 50 houses sold?


SECTION A: ANSWERS AND EXPLANATIONS

Answer to Question A1 (MT2009-Q1)
a) Can't tell
b) 23
c) IQR = 3, Spread; 50th percentile = 22, Centre; Oldest male = 27, None
d) Symmetric
e) Skewed to the right
f) Z = -0.25 = (22 - 22.5)/σ, so σ = (22 - 22.5)/(-0.25) = 2
g) Pr(Z < -0.25) = 0.4013
h) D. None of the above
i) Quantitative, Categorical, Quantitative, Categorical
j) Empirical (68-95-99.7) Rule: 30 ± 20 = (10, 50) (Also accept 30 ± 19.6)
Note: Parts g), h) and j) are about "Sampling Distributions and the Normal Model". Check your notes or the textbook.

Details and Comments:
a) Boxplots do not show sample sizes; they only show: min, Q1, median, Q3, and max.
b) Since the age distribution for females is strongly skewed to the right, the mean is greater than the median. The median (from the graph) is 22, so the mean must be a little larger, hence 23. Note that 30 is close to the maximum and far above Q3 so it is not a realistic estimate of the mean.
c) IQR (Males) = 24 - 21 = 3; 50th percentile (Females) = median = 22; Oldest Male = max = 27
f) Use the formula for standardizing X to Z; however, here both the values of X and Z are given and it is the value of σ which is unknown.
h) The Central Limit Theorem cannot be used as the reason here since the sample is unlikely to be large.
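
Parts f), g) and j) of Answer A1 can be checked with a short Python sketch. The variable names are illustrative; the normal model for male ages is the assumption stated in part g) of the question.

```python
from statistics import NormalDist

# f) Solve z = (x - mean) / sigma for sigma
mean_age = 22.5
age = 22
z = -0.25
sigma = (age - mean_age) / z

# g) Proportion aged 22 or younger under the assumed normal model
prop_22_or_younger = NormalDist(mu=mean_age, sigma=sigma).cdf(age)

# j) Empirical rule: the middle 95% lies within about 2 SDs of the mean
test_mean, test_sd = 30, 10
low = test_mean - 2 * test_sd
high = test_mean + 2 * test_sd

print(sigma, round(prop_22_or_younger, 4), (low, high))
```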

Answer to Question A2 (MT2008-Q1)
a) The quantitative variables are Age and Salary.
b) Answers: 3, 3, 3, 1.
c) Answers:
Individual incomes in the United States: Skewed right (long right-hand tail)
Age of male heart attack victims: Skewed left (long left-hand tail)
Lifetimes of electric light bulbs: Skewed right (long right-hand tail)
IQ scores of the Canadian population: Symmetric (equal tails)

Details and Comments:

a) Gender and Job Type are categorical; Employee # and Surname are simply strings and used as identifier variables. Taking the mean of the Employee # would not make sense.

b) Class 3 has much more area to the right than Class 1 or Class 2 so the mean and median are also shifted to the right. And since the histogram for Class 3 shows the greatest skewness, it has the greatest difference between mean and median. Class 1 is less spread out (the tails are both smaller than in the other two classes) so it has the smallest standard deviation.

c) Incomes are skewed right because fewer people have very large incomes, more people

have incomes at the lower end or middle.

Age of heart attack victims is skewed left because heart attacks are much more likely in older people.

Lifetimes of bulbs are skewed right because most bulbs last the amount of time they are

engineered to last but some will last much longer; that is, quality is designed in. Only a

few will fail early. Lifetimes in general are skewed right.

Answer to Question A3 (MT2008-Q2)
a) and b)

            Original Data (X)    Transformed Data (X*)
Mean              5.6                   8.2
Median            6.5                  10
Range            51                   110
Q1               -3                    -9
Q3               11                    19
IQR              14                    28
Std dev          11.7                  23.4

c) Lower inner fence = -3 - 1.5(14) = -24
Upper inner fence = 11 + 1.5(14) = 32
Observation numbers of outliers: 52

Details and Comments:
Note that the question asked for the observation number(s), not the margin!
For part b): Suppose the data are transformed (linearly) as follows: X* = a + bX; that is, multiply the original observations by "b" and then add "a". That shifts all the values of X up or down by the amount "a" and changes the size of the unit of measurement by "b".
Mean(X*) = a + b×Mean(X);
Median(X*) = a + b×Median(X);
Range(X*) = b×Range(X); [the effect of "a" is cancelled]
Q1(X*) = a + b×Q1(X);
Q3(X*) = a + b×Q3(X);
IQR(X*) = b×IQR(X); [the effect of "a" is cancelled]
SD(X*) = b×SD(X); [the effect of "a" is cancelled]
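
The transformation rules and the inner fences can be illustrated with a small Python sketch. The seven data values below are hypothetical, chosen only for illustration (they are not the Phoenix Suns margins); a = -3 and b = 2 match the transformation X* = 2X - 3 in the question, and the quartiles used for the fences are those given in the answer.

```python
from math import isclose
from statistics import mean, median, stdev

a, b = -3, 2                       # X* = a + bX, as in part b)
x = [-12, -3, 0, 4, 7, 11, 25]     # hypothetical data, for illustration only
x_star = [a + b * v for v in x]

# Measures of centre shift by a and scale by b
mean_ok = isclose(mean(x_star), a + b * mean(x))
median_ok = median(x_star) == a + b * median(x)

# Measures of spread only scale by b; the shift a cancels
range_ok = (max(x_star) - min(x_star)) == b * (max(x) - min(x))
sd_ok = isclose(stdev(x_star), b * stdev(x))

# Inner fences from the answer's quartiles: Q1 = -3, Q3 = 11
q1, q3 = -3, 11
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(mean_ok, median_ok, range_ok, sd_ok)
print(lower_fence, upper_fence)
```

Any observation below the lower fence or above the upper fence is flagged as an outlier under the inner fences definition.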


Answer to Question A4 (MT2007-Q1)

a) What is your age (in years)?          Quantitative
How much did you spend (in $)?        Quantitative
What is your marital status?          Categorical
Rate the availability of parking      Categorical

b)

[Bar graph: Sources of Electricity; categories include Nuclear and Natural Gas]

c) "Of the 411 players on National Basketball Association rosters in February 1998, only 139 made more than the league MEAN salary of $2.36 million." If it were the median, then half of the 411 players (i.e. 205 or 206) would exceed the value.

d) 1 year is the typical difference in age between entering first-year university students.

e) (i) 5/51 = 0.098, so 9.8%. It is also acceptable to round to 10%.
(ii) The median is in the 3.0-3.9 interval, so the median is best estimated as the midpoint of that interval at 3.5%.
Comment: It is also acceptable to give the range 3.0-3.9. It is not acceptable to estimate the median as 3.0%.

f) (i) Curve B
(ii) If you chose Curve A: Mean
If you chose Curve B: Mode
Note: The two choices offered in part (ii) are to give you a chance to get the correct answer to part (ii) even if you made the wrong choice in part (i).

(iii) [(50×700,000) - 300,000]/50 = $694,000
Comment: Use the formula for mean and adjust accordingly.
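
The adjustment in f)(iii), together with the percentage in e)(i), can be written out as a short Python sketch. The variable names are illustrative only.

```python
# f)(iii) Correct a mean after one value was entered wrongly
n = 50
reported_mean = 700_000
wrong_value = 800_000      # value used in the original calculation
right_value = 500_000      # value that should have been used

# Rebuild the total, swap the wrong value for the right one, then re-average
corrected_total = n * reported_mean - wrong_value + right_value
corrected_mean = corrected_total / n

# e)(i) Share of the 51 rates that are 5.0% or greater
share_pct = 100 * 5 / 51

print(corrected_mean, round(share_pct, 1))
```

The same approach works for any single-value correction: the mean changes by (right value - wrong value)/n, here -300,000/50 = -6,000.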
