experimental designbugusky.weebly.com/.../stats_review_fall_semester.pdf · stats review fall...


Stats Review Fall Semester

Vocabulary and Free Response Experimental Design

Block Placebo

Completely Randomized Design Random Assignment

Confounding Randomized Block Design

Control Group Replication

Double Blind Observational Study

Experiment Matched Pairs

Experimental Unit/Subject Response Variable

Explanatory Variable Treatment

Homogeneous Response Variable

Population Sample

Census Bias

Voluntary Response Convenience Sample

Simple Random Sample Non-response

Stratified Random Sample Undercoverage

Response Bias Systematic sample

Single Blind Three Principles of Experimental Design

Cluster sample

What are the population and the sample, and how do they relate? What different symbols do we use for each?

“Blocking is used to control the factors you can see; randomization helps balance the ones you cannot see.”


2000 Free Response Question

High cholesterol level in people can be reduced by exercise or by drug treatment. A pharmaceutical company has developed a new cholesterol-reducing drug. Researchers would like to compare its effects to the effects of the cholesterol-reducing drug that is currently available on the market. Volunteers who have a history of high cholesterol and who are currently not on medication will be recruited to participate in a study.

(a) Explain how you would carry out a completely randomized experiment for the study.

(b) Describe an experimental design that would improve the design in (a) by incorporating blocking.

(c) Can the experimental design in (b) be carried out in a double-blind manner? Explain.

2003 Free Response Question

There have been many studies recently concerning coffee drinking and cholesterol level. While it is known that several coffee-bean components can elevate blood cholesterol level, it is thought that a new type of paper coffee filter may reduce the presence of some of these components in coffee.

The effect of the new filter on cholesterol level will be studied over a 10-week period using 300 nonsmokers who each drink 4 cups of caffeinated coffee per day. Each of these 300 participants will be assigned to one of two groups: the experimental group, who will only drink coffee that has been made with the new filter, or the control group, who will only drink coffee that has been made with the standard filter. Each participant’s cholesterol level will be measured at the beginning and at the end of the study.

(a) Describe an appropriate method for assigning the subjects to the two groups so that each group will have an equal number of subjects.

(b) In this study, the researchers chose to include a group who only drank coffee that was made with the standard filter. Why is it important to include a control group in this study even though cholesterol levels will be measured at the beginning and at the end of the study?

(c) Which test would you conduct to determine whether the change in cholesterol level would be greater if people used the new filter rather than using the standard filter?

(d) Why would the researchers choose to use only nonsmokers in the study?
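For part (a) of the coffee-filter question, one possible assignment method is to label the participants, shuffle the labels, and split the shuffled list in half. The sketch below is only an illustration of that idea: the labels 1 to 300, the function name `assign_groups`, and the seed are all assumptions, and a software shuffle stands in for a random number table or drawing names from a hat.

```python
import random

def assign_groups(subject_ids, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)            # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical labels 1..300 for the 300 participants.
new_filter, standard_filter = assign_groups(range(1, 301), seed=2003)
print(len(new_filter), len(standard_filter))
```

Because the split happens after the shuffle, each participant is equally likely to land in either group and both groups end up with 150 subjects.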


2004 Free Response Question

At a certain university, students who live in the dormitories eat at a common dining hall. Recently, some students have been complaining about the quality of the food served there. The dining hall manager decided to do a survey to estimate the proportion of students living in the dormitories who think that the quality of the food should be improved. One evening, the manager asked the first 100 students entering the dining hall to answer the following question.

Many students believe that the food served in the dining hall needs improvement. Do you think that the quality of food served here needs improvement, even though that would increase the cost of the meal plan?

_____ Yes _____ No _____ No opinion

(a) In this setting, explain how bias may have been introduced based on the way this convenience sample was selected and suggest how the sample could have been selected differently to avoid that bias.

(b) In this setting, explain how bias may have been introduced based on the way the question was worded and suggest how it could have been worded differently to avoid that bias.


Viewing and Describing Data

Categorical data Quantitative data

Graphical displays of categorical data Graphical displays of quantitative data

Center Shape

Spread Outliers

Gaps and Clusters Mean

Median Mode

Symmetrical Bi-modal

Skewed Range

IQR Quartiles

5 number summary Box and Whisker plot

Modified boxplot Histogram

Bar Graph Pie chart

Stem and leaf plot Dotplot

Variance Standard Deviation

Non resistant to outliers

2001 AP Free Response

The summary statistics for the number of inches of rainfall in Los Angeles since 1877 are shown below.

N     MEAN     MEDIAN   TRMEAN   STDEV   SE MEAN   MIN     MAX      Q1      Q3
117   14.941   13.070   14.416   6.747   0.624     4.850   38.180   9.680   19.250

(a) Describe a procedure that uses these summary statistics to determine whether there are outliers.

(b) Are there outliers in these data? Justify your answer based on the procedure that you described in part (a).

(c) The news media reported that in a particular year, there were only 10 inches of rainfall. Use the information provided to comment on this reported statement.
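One common procedure for parts (a) and (b) is the 1.5 × IQR rule, which flags any value more than 1.5 interquartile ranges below Q1 or above Q3. The sketch below applies that rule to the summary statistics given in the rainfall question. The variable names are illustrative, and the 1.5 × IQR rule is an assumed choice, since other outlier criteria exist.

```python
# Summary statistics copied from the rainfall table.
q1, q3 = 9.680, 19.250
minimum, maximum = 4.850, 38.180

iqr = q3 - q1                      # interquartile range
lower_fence = q1 - 1.5 * iqr       # values below this are flagged
upper_fence = q3 + 1.5 * iqr       # values above this are flagged

has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

print(f"IQR = {iqr:.3f}")
print(f"fences = ({lower_fence:.3f}, {upper_fence:.3f})")
print("low outlier:", has_low_outlier, "| high outlier:", has_high_outlier)
```

Only the minimum and maximum need checking: if the most extreme value on a side is inside the fence, every other value on that side is too.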


2000 AP Free Response

Two pain relievers, A and B, are being compared for relief of postsurgical pain. Twenty different strengths (doses in milligrams) of each drug were tested. Eight hundred postsurgical patients were randomly divided into 40 different groups. Twenty groups were given drug A. Each group was given a different strength. Similarly, the other twenty groups were given different strengths of drug B. Strengths used ranged from 210 to 400 milligrams. Thirty minutes after receiving the drug, each patient was asked to describe his or her pain on a scale of 0 (no decrease in pain) to 100 (pain totally gone).

The strength of the drug given in milligrams and the average pain rating for each group are shown in the scatterplot below. Drug A is indicated with A’s and drug B with B’s.

a. Based on the scatterplot, describe the effect of drug A and how it is related to strength in milligrams.

b. Based on the scatterplot, describe the effect of drug B and how it is related to strength in milligrams.

c. Which drug would you give and at what strength, if the goal is to get pain relief of at least 50 at the lowest possible strength? Justify your answer based on the scatterplot.


2002 AP Free Response

At a school field day, 50 students and 50 faculty members each completed an obstacle course. Descriptive statistics for the completion times (in minutes) for the two groups are shown below.

                 Students   Faculty Members
Mean             9.90       12.09
Median           9.25       11.00
Minimum          3.75       4.50
Maximum          16.50      25.00
Lower quartile   6.75       8.75
Upper quartile   13.75      15.75

a) Use the same scale to draw boxplots for the completion times for students and for faculty members.

b) Write a few sentences comparing the variability of the two distributions.

c) You have been asked to report on this event for the school newspaper. Write a few sentences describing student and faculty performances in this competition for the paper.
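When comparing variability in the obstacle-course question, it can help to compute the range and IQR of each group directly from the five-number summaries. The following sketch does only that arithmetic. The dictionary layout and the helper name `spread` are illustrative assumptions.

```python
# Five-number summaries (minutes) copied from the obstacle-course table.
summary = {
    "Students": {"min": 3.75, "q1": 6.75, "median": 9.25, "q3": 13.75, "max": 16.50},
    "Faculty":  {"min": 4.50, "q1": 8.75, "median": 11.00, "q3": 15.75, "max": 25.00},
}

def spread(stats):
    """Return the two spread measures available from a five-number summary."""
    return {"range": stats["max"] - stats["min"], "iqr": stats["q3"] - stats["q1"]}

spreads = {group: spread(stats) for group, stats in summary.items()}
for group, values in spreads.items():
    print(group, values)
```

The two measures can tell different stories: the IQR describes the middle half of each group, while the range is driven entirely by the fastest and slowest finishers.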


Regression

Explanatory variable Response variable

LSRL equation Confounding

Correlation coefficient Coefficient of determination

Strength Direction

Slope Residual

Residual plot Exponential model

Power model Influential point

Extrapolation Causation


1999 Free Response

Lydia and Bob were searching the Internet to find information on air travel in the United States. They found data on the number of commercial aircraft flying in the United States during the years 1990-1998. The dates were recorded as years since 1990. Thus, the year 1990 was recorded as year 0. They fit a least squares regression line to the data. The graph of the residuals and part of the computer output for their regression are given below.

a) Is a line an appropriate model to use for these data? What information tells you this?

b) What is the value of the slope of the least squares regression line? Interpret the slope in the context of this situation.

c) What is the value of the intercept of the least squares regression line? Interpret the intercept in the context of this situation.

d) What is the predicted number of commercial aircraft flying in 1992?

e) What was the actual number of commercial aircraft flying in 1992?


2004 Free Response Form B Question 1

The Earth’s Moon has many impact craters that were created when the inner solar system was subjected to heavy bombardment of small celestial bodies. Scientists studied 11 impact craters on the Moon to determine whether there was any relationship between the age of the craters (based on radioactive dating of lunar rocks) and the impact rate (as deduced from the density of the craters). The data are displayed in the scatterplot below.

(a) Describe the nature of the relationship between impact rate and age.

Prior to fitting a linear regression model, the researchers transformed both impact rate and age by using logarithms. The following computer output and residual plot were produced.

(b) Interpret the value of r².

(c) Comment on the appropriateness of this linear regression for modeling the relationship between the transformed variables.


Distributions

1999 Free Response

A company is considering implementing one of two quality control plans for monitoring the weights of automobile batteries that it manufactures. If the manufacturing process is working properly, the battery weights are approximately normally distributed with a specified mean and standard deviation.

Quality control plan A calls for rejecting a battery as defective if its weight falls more than 2 standard deviations below the specified mean.

Quality control plan B calls for rejecting a battery as defective if its weight falls more than 1.5 interquartile ranges below the lower quartile of the specified population.

Assume the manufacturing process is under control.

a. What proportion of batteries will be rejected by plan A?

b. What is the probability that at least 1 of 2 randomly selected batteries will be rejected by plan A?

c. What proportion of batteries will be rejected by plan B?
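Because both plans are stated in standard-deviation or quartile terms, the battery calculations can be checked on the standard normal curve without knowing the actual mean or standard deviation. The sketch below uses `statistics.NormalDist` from the Python standard library. It assumes the two selected batteries are rejected independently of each other, and the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Plan A: reject if the weight is more than 2 standard deviations below the mean.
p_plan_a = z.cdf(-2)

# At least 1 of 2 rejected = 1 - P(neither is rejected), assuming independence.
p_at_least_one = 1 - (1 - p_plan_a) ** 2

# Plan B: reject if the weight is more than 1.5 IQRs below the lower quartile.
q1 = z.inv_cdf(0.25)
q3 = z.inv_cdf(0.75)
cutoff_b = q1 - 1.5 * (q3 - q1)
p_plan_b = z.cdf(cutoff_b)

print(f"Plan A: {p_plan_a:.4f}")
print(f"At least one of two (plan A): {p_at_least_one:.4f}")
print(f"Plan B cutoff z = {cutoff_b:.3f}, proportion = {p_plan_b:.4f}")
```

Plan B's cutoff sits farther below the mean than plan A's, so plan B rejects a smaller share of batteries when the process is under control.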


Probability

Sample space Event

Tree diagram Complement rule

Addition rule Multiplication rule

Conditional probability Independence

Random variable Discrete vs. continuous

Probability distribution Expected value

Variance of random variable Transformation of random variable

Combining random variables Binomial setting

Binomial formula Binomial cumulative distribution function

Standard deviation of binomial random variable Mean of binomial random variable

Geometric setting Geometric cumulative distribution function

Mean of geometric random variable Mutually exclusive

Simulation Stopping Rule

Number assignment Trial


2001 Free Response

Every Monday a local radio station gives coupons away to 50 people who correctly answer a question about a news fact from the previous day’s newspaper. The coupons given away are numbered from 1 to 50, with the first person receiving coupon 1, the second person receiving coupon 2, and so on, until all 50 coupons are given away. On the following Saturday, the radio station randomly draws numbers from 1 to 50 and awards cash prizes to the holders of the coupons with these numbers. Numbers continue to be drawn without replacement until the total amount awarded first equals or exceeds $300. If selected, coupons 1 through 5 each have a cash value of $200, coupons 6 through 20 each have a cash value of $100, and coupons 21 through 50 each have a cash value of $50.

a) Explain how you would conduct a simulation using the random number table provided below to estimate the distribution of the number of prize winners each week.
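The question asks for a simulation with a random number table, but the same logic can be sketched in software to see what the resulting distribution looks like. The code below swaps the table for Python's pseudo-random generator: it shuffles the 50 coupon numbers (drawing without replacement), adds up prize values until the total first reaches $300, and records how many winners that took. The function names, the seed, and the choice of 10,000 simulated weeks are all assumptions made for illustration.

```python
import random
from collections import Counter

def prize_value(coupon):
    """Cash value of a coupon number, per the rules in the question."""
    if coupon <= 5:
        return 200
    if coupon <= 20:
        return 100
    return 50

def winners_in_one_week(rng):
    """Draw coupons without replacement until the total is at least $300."""
    coupons = list(range(1, 51))
    rng.shuffle(coupons)             # a shuffled order = draws without replacement
    total = 0
    for count, coupon in enumerate(coupons, start=1):
        total += prize_value(coupon)
        if total >= 300:             # stopping rule
            return count

def simulate(weeks, seed=None):
    rng = random.Random(seed)
    return Counter(winners_in_one_week(rng) for _ in range(weeks))

weeks = 10_000
counts = simulate(weeks, seed=2001)
for winners in sorted(counts):
    print(winners, counts[winners] / weeks)
```

With a random number table the steps are the same: read two-digit numbers, skip anything outside 01 to 50 and any repeat, and stop once the running total reaches $300.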

2003 Free Response

Men’s shirt sizes are determined by their neck sizes. Suppose that men’s neck sizes are approximately normally distributed with mean 15.7 inches and standard deviation 0.7 inch. A retailer sells men’s shirts in sizes S, M, L, and XL, where the shirt sizes are defined in the table below.

Shirt Size   Neck size
S            14 ≤ neck size < 15
M            15 ≤ neck size < 16
L            16 ≤ neck size < 17
XL           17 ≤ neck size < 18

(a) Because the retailer only stocks the sizes listed above, what proportion of customers will find that the retailer does not carry any shirts in their sizes? Show your work.

(b) Using a sketch of a normal curve, illustrate the proportion of men whose shirt size is M. Calculate this proportion.

(c) Of 12 randomly selected customers, what is the probability that exactly 4 will request size M? Show your work.
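The three shirt-size parts chain together: parts (a) and (b) are areas under the N(15.7, 0.7) curve, and part (c) feeds the size-M proportion into a binomial calculation. The sketch below is one way to check the arithmetic. It assumes the 12 customers are independent with the same chance of needing size M, and the variable names are illustrative.

```python
from math import comb
from statistics import NormalDist

neck = NormalDist(mu=15.7, sigma=0.7)   # neck size in inches

# (a) No stocked size fits: neck size below 14 or at least 18.
p_not_stocked = neck.cdf(14) + (1 - neck.cdf(18))

# (b) Size M: neck size from 15 up to 16.
p_medium = neck.cdf(16) - neck.cdf(15)

# (c) Exactly 4 of 12 customers need size M (binomial with n = 12, p = p_medium).
n, k = 12, 4
p_exactly_four = comb(n, k) * p_medium**k * (1 - p_medium) ** (n - k)

print(f"(a) {p_not_stocked:.4f}")
print(f"(b) {p_medium:.4f}")
print(f"(c) {p_exactly_four:.4f}")
```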


Contestants on a game show spin a wheel like the one shown in the figure above. Each of the four outcomes on this wheel is equally likely and outcomes are independent from one spin to the next.

The contestant spins the wheel.

If the result is a skunk, no money is won and the contestant’s turn is finished.

If the result is a number, the corresponding amount in dollars is won. The contestant can then stop with those winnings or can choose to spin again, and his or her turn continues.

If the contestant spins again and the result is a skunk, all of the money earned on that turn is lost and the turn ends.

The contestant may continue adding to his or her winnings until he or she chooses to stop or until a spin results in a skunk.

(a) What is the probability that the result will be a number on all of the first three spins of the wheel?

(b) Suppose a contestant has earned $800 on his or her first three spins and chooses to spin the wheel again. What is the expected value of his or her total winnings for the four spins?


Mixed Questions. Free response

0. Mr. Bugusky wants to investigate students’ opinions on whether “okay Boomer” is worse than or equal to the n-word. For each of the following ways to sample, explain what type of sample it is and what could occur.

a) He just asks the first 30 kids that come into his Pre-AP Algebra 2 class.

b) Mr. B randomly selects 30 freshmen, 30 sophomores, 30 juniors, and 30 seniors and asks their opinions.

c) He sets up a booth in the cafeteria that says “Okay Boomer = N-Word, prove me wrong” and sips coffee while recording the comical results.

d) He asks the first 8 kids from each of his periods (7*8 = 56 students) for their opinions on the matter and records the results.

1. The number of sweatshirts a vendor sells daily has the following probability distribution.

a) What is the mean?

b) What is the standard deviation?

If each sweatshirt sells for $25, what is the expected daily total dollar amount taken in by the vendor from the sale of sweatshirts?

c) What is the new mean?

d) What is the new standard deviation?

Number of Sweatshirts x   0     1     2     3     4      5
P(x)                      0.3   0.2   0.3   0.1   0.08   0.02
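The sweatshirt question is a direct application of the mean and standard deviation formulas for a discrete random variable, followed by a linear transformation (multiplying by the $25 price). The sketch below works through that arithmetic with the table's values. The variable names are illustrative.

```python
from math import sqrt

# Probability distribution copied from the sweatshirt table.
distribution = {0: 0.3, 1: 0.2, 2: 0.3, 3: 0.1, 4: 0.08, 5: 0.02}
total_probability = sum(distribution.values())   # should equal 1

# Mean and standard deviation of the number sold.
mean = sum(x * p for x, p in distribution.items())
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())
sd = sqrt(variance)

# Revenue = 25 * X, so the mean and the standard deviation both scale by 25.
price = 25
revenue_mean = price * mean
revenue_sd = price * sd

print(f"mean = {mean:.2f}, sd = {sd:.4f}")
print(f"revenue mean = {revenue_mean:.2f}, revenue sd = {revenue_sd:.2f}")
```

Multiplying a random variable by a constant multiplies its mean and standard deviation by that constant, while the variance is multiplied by the constant squared.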


2. Given the data to the right, answer the following questions.

a) What is the IQR?

b) What would constitute an outlier?

c) 400 and above would be about what % of values?

3. Explain why each of the following would possibly result in a skewed distribution.

a) Mr. B asks how many chess games ALL his students have won.

b) The weight of babies, in pounds, in a very malnourished country follows the Normal distribution N(5.1, 3.1).

c) The grades of David Okoh, the goat of 2018-2019 ALG2, were normally distributed as N(99.85, 0.39).

4. Mr. C and his friends form the trivia dream team. Mr. C is secretly an IBM-created robot that gets 0.98 of the trivia correct. His friend Albert gets 0.95 of the trivia correct. His friend Charles gets 0.92 of the trivia correct.

a) In one round they each have to answer a question. What is the probability that all of them answer correctly?

In a lightning round, Mr. C has to answer 10 questions.

b) What is the probability he gets all 10 correct?

c) What is the probability he gets only 8 correct?

d) If an answer is incorrect by the team, what is the probability it came from Charles?

BillyBob fills in one night and has to do the same 10-question lightning round, and he has a 0.3 probability of getting each trivia question correct.

e) What is the probability he gets at least 1 correct?
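Most of the trivia parts reduce to the multiplication rule and the binomial formula. The sketch below checks the all-correct round, Mr. C's 10-question round, and BillyBob's round. It assumes every answer is independent of the others and reads "only 8 correct" as exactly 8. The conditional-probability part about Charles is left out because it depends on how the team's incorrect answer is modeled. The helper name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_mr_c, p_albert, p_charles = 0.98, 0.95, 0.92

# All three answer their own question correctly (multiplication rule).
p_all_correct = p_mr_c * p_albert * p_charles

# Mr. C's 10-question round.
p_all_ten = binomial_pmf(10, 10, p_mr_c)
p_exactly_eight = binomial_pmf(8, 10, p_mr_c)

# BillyBob: at least 1 correct = 1 - P(none correct).
p_billybob_at_least_one = 1 - binomial_pmf(0, 10, 0.3)

print(f"all three correct: {p_all_correct:.5f}")
print(f"Mr. C, all 10: {p_all_ten:.4f}")
print(f"Mr. C, exactly 8: {p_exactly_eight:.4f}")
print(f"BillyBob, at least 1: {p_billybob_at_least_one:.4f}")
```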


5. Given the two-way table to the right.

a) Is preferring to watch football and male independent?

b) Find P(baseball|female)

c) Find P(football or female)

d) P(male)

e) Find P(football|male)

f) What is more likely, P(football|male) or P(baseball|female)?

6. Mr. B’s commute to work in minutes is normally distributed as follows N(25,3).

a) Draw a Normal Curve and label 3 sigma in each direction.

b) Label the empirical rule %

c) What is the probability of Mr. B arriving in less than 19 mins?
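The commute question can be checked numerically, assuming N(25, 3) means a mean of 25 minutes and a standard deviation of 3 minutes. The sketch below lists the axis labels at each whole standard deviation, the exact areas behind the empirical rule, and the probability of a commute under 19 minutes. The variable names are illustrative.

```python
from statistics import NormalDist

commute = NormalDist(mu=25, sigma=3)    # minutes; assumes N(mean, sd) notation

# Axis labels from 3 sigma below the mean to 3 sigma above it.
marks = [commute.mean + k * commute.stdev for k in range(-3, 4)]

# Exact areas within 1, 2, and 3 standard deviations of the mean.
within = {
    k: commute.cdf(25 + 3 * k) - commute.cdf(25 - 3 * k)
    for k in (1, 2, 3)
}

# 19 minutes is 2 standard deviations below the mean.
p_under_19 = commute.cdf(19)

print("axis marks:", marks)
print("area within k sd:", {k: round(area, 4) for k, area in within.items()})
print(f"P(commute < 19) = {p_under_19:.4f}")
```

The empirical rule's rounded 68-95-99.7 figures give about 2.5% for the area below 19 minutes, slightly more than the exact normal calculation.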


Mixed Questions. MultiChoice

0. Which residual plots are the only two that are appropriate for a linear model?

1. Which of the following can be used to show a cause-and-effect relationship between two variables?

(A) A census (B) A controlled experiment (C) An observational study

(D) A sample survey (E) A cross-sectional survey (F) Repeat it until people believe it

2. A company wanted to determine the health care costs of its employees. A sample of 25 employees were interviewed and their medical expenses for the previous year were determined. Later the company discovered that the highest medical expense in the sample was mistakenly recorded as 10 times the actual amount. However, after correcting the error, the correct amount was still greater than or equal to any other medical expense in the sample. Which of the following sample statistics must have remained the same after the correction was made?

(A) Mean (B) Median (C) Mode

(D) Range (E) Variance (F) Extrapolation

3. For which of the following distributions is the mean greater than the median?

(A) (B) (C) (D) (E) [answer-choice graphs not reproduced]
