results from the 2009 ap statistics exam administration allan rossman, cal poly – san luis obispo...

83
Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

Post on 21-Dec-2015

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

Results from the 2009 AP Statistics Exam Administration

Allan Rossman, Cal Poly – San Luis Obispo

Ruth Carver, Germantown Academy

Page 2: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

Outline

Introductions Six questions on operational exam

Intent Question Common errors Advice for teachers

Overall performance on operational exam Comparability study Q & A

Page 3: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

Introductions

Allan Chief Reader-Designate for past year Professor of Statistics at Cal Poly

Ruth Test Development Committee member

Just completed first year Math teacher at Germantown Academy (near

Philadelphia)

Page 4: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

Introductions (cont.)

We represent other TDC members Ken Koehler (Chair) Chris Franklin (Chief Reader “Emeritus”) Dorinda Hewitt Chris Olsen (CB liaison, now retired from TDC) Tom Short Josh Tabor (now retired from TDC) Bob Taylor

New TDC members Floyd Bullard, John Mahoney

Page 5: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#1: Intent of question

Assess a student’s ability to: construct an appropriate graphical display for

comparing the distributions of two categorical variables;

summarize from this graph the relationship of the two categorical variables;

identify the appropriate statistical procedure to test if an association exists between two categorical variables and stating appropriate hypotheses.

Page 6: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

A simple random sample of 100 high school seniors was selected from a large school district. The gender of each student was recorded, and each student was asked the following questions.

#1: The question

1. Have you ever had a part-time job?2. If you answered yes to the previous question, was your part-time job in the summer only?

Page 7: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#1: The question (cont.)

The responses are summarized in the table below.

Page 8: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#1: The question- part (a)On the grid below, construct a graphical display that represents the association between gender and job experience for the students in the sample.

Page 9: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#1: Common errors- part (a)

Use counts (frequencies) instead of percents (relative frequencies) Not appropriate when comparing groups of

unequal size Provide no or incorrect label on vertical axis Indicate conditioning on one variable (say

gender), but draw graph as if conditioning on other variable (say job experience).

Construct nonstandard graphs of many varieties

Page 10: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#1: The question- part (b)

Write a few sentences summarizing what the display in part (a) reveals about the association between gender and job experience for the students in the sample.

Page 11: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#1: Common errors- part (b)

Not fully discuss how males and females compare in all three job experience categories. Some only commented on which gender had more (or

fewer) part-time jobs, ignoring the two different categories Struggle to communicate statistical thinking clearly

when writing sentences about the association between gender and job experience.

Describe graphs as if discussing quantitative data, using terms like shape, center, spread, correlation None of these appropriate for describing distributions of

categorical data

Page 12: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#1: The question- part (c)

Which test of significance should be used to test if there is an association between gender and job experience for the population of high school seniors in the district?

State the null and alternative hypotheses for the test, but do not perform the test.

Page 13: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#1: Common errors- part (c)

Could not correctly name appropriate test Some incomplete names, like “chi-square test”

Some stated hypotheses suggesting causation, not appropriate for observational study Ho: Gender has no effect on job experience. Ha: Gender has an effect on job experience.

Several attempted to use symbols to state hypotheses Very hard to do well for chi-square test of

association/independence

Page 14: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#1: Advice for teachers

Pay attention to categorical variables! Graphical displays: bar graphs Numerical summaries: conditional proportions

Cannot over-emphasize distinction between categorical vs. quantitative variables

Focus on concept of independence Distribution of one variable (e.g., gender) is

identical for all categories of the other (e.g., job condition)

Page 15: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#2: Intent of question

Assess a student’s ability to:

calculate a percentile value from a normal probability distribution;

recognize a binomial scenario and calculate an appropriate probability;

use the sampling distribution of the sample mean to find a probability for the mean of five observations.

Page 16: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

A tire manufacturer designed a new tread pattern for its all-weather tires. Repeated tests were conducted on cars of approximately the same weight traveling at 60 miles per hour. The tests showed that the new tread pattern enables the cars to stop completely in an average distance of 125 feet with a standard deviation of 6.5 feet and that the stopping distances are approximately normally distributed.

#2: The question

Page 17: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#2: Parts (a)-(c)

a) What is the 70th percentile of the distribution of stopping distances ?

b) What is the probability that at least 2 cars out of 5 randomly selected cars in the study will stop in a distance that is greater than the distance calculated in part (a) ?

c) What is the probability that a randomly selected sample of 5 cars in the study will have a mean stopping distance of at least 130 feet ?

Page 18: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#2: The question- part (a)

What is the 70th percentile of the distribution of stopping distances?

Page 19: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#2: Common errors- part (a)

A number of students set z=0.7 and attempted to calculate the stopping distance. Many had difficulty determining the 70%-tile.

Several students interpreted the 70%-tile to be 70% centered about the mean. By doing so, several used the 68-95-99% rule to try to solve the problem.

Used “calculator speak” to calculate stopping distance without defining parameters of the distribution. Calculator commands without defining the arguments for the commands are discouraged.

Sketches of the approximately normal distribution were unclear and unlabeled.

Page 20: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#2: The question- part (b)

What is the probability that at least 2 cars out of 5 randomly selected cars in the study will stop in a distance that is greater than the distance calculated in part (a) ?

Page 21: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#2: Common errors- part (b)

Several students failed to use the binomial distribution correctly; calculating 1 - P( Y ≤ 2 ) instead of 1 - P( Y ≤ 1).

Several students only gave one term, generally, P(Y=2).

Several students used 0.7 as their probability of a success, as well as incorrectly defining the parameters of the binomial distribution

Several students constructed another normal probability not seeing the binomial was needed.

Page 22: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#2: The question- part (c)

What is the probability that a randomly selected sample of 5 cars in the study will have a mean stopping distance of at least 130 feet?

Page 23: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#2: Common errors- part (c)

Several students failed to get the sampling distribution for the sample mean statistic,-didn’t define the distribution or its parameters correctly.

Several students gave a value z=1.72, and a value p=0.0427, without correctly indicating that this probability relates to P( Z ≥ 1.72) = 1 - P( Z < 1.72)=0.0427.

Several students confused the question with a test of hypothesis and gave the probability P( Z ≥ 1.72) = 1 - P( Z < 1.72)=0.0427 as a p-value.

Page 24: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#2: Advice for teachers Pay attention to the distribution being used and

its corresponding parameters. Use standard notation to name the distribution

and its corresponding parameters. For example N(125, 6.5) for a Normal Distribution B(5, 0.3) or Bin(5, 0.3) for a Binomial Distribution

Avoid calculator-speak such as invNorm(0.70,125,6.5); it is not sufficient for defining a distribution and its parameters.

An appropriately labeled sketch with correct labels for center and spread is a definite plus.

Page 25: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#2: Advice for teachers (cont.) Cannot overemphasize the distinction between

the distribution of a variable and the sampling distribution of a sample statistic of a variable. Have students define the variable of interest

(in words) and the distribution of the variable (using standard notation). For example X = the stopping distance of a car with

new tread tiresX ~N (125, 6.5)

= the mean of the stopping distances of five randomly selected cars.

X

6.5~ 125,5

X N

Page 26: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#3: Intent of question

Assess a student’s ability to: describe a randomization process required for

comparing two groups in a randomized experiment

describe a potential consequence of using self-selection instead of randomization

Page 27: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#3: The question

Before beginning a unit on frog anatomy, a seventh-grade biology teacher gives each of the 24 students in the class a pre-test to assess their knowledge of frog anatomy. The teacher wants to compare the effectiveness of an instructional program in which students physically dissect frogs with the effectiveness of a different program in which students use computer software that only simulates the dissection of a frog. After completing one of the two programs, students will be given a posttest to assess their knowledge of frog anatomy. The teacher will then analyze the changes in the test scores (score on the pre-test minus score on posttest).

Page 28: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#3: The question (cont.)

(a) Describe a method for assigning the 24 students to two groups of equal size that allows for a statistically valid comparison of the two instructional programs.

Page 29: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#3(a): Common errors

Not stating a specific device or mechanism for randomization

Not specifying groups in context I.e., forming group 1 and group 2 but not

indicating which is the dissection and which is the computer simulation group

Using a stopping rule with a coin toss (or equivalent) without prior randomization

Referring to simple random samples

Page 30: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#3(a): Common errors (cont.) Providing only a design diagram Picking names or numbers from a hat but

forgetting to first mix the contents Using a paired design but not blocking on

similar pre-test scores Forming blocks on a characteristic other than

pre-test, such as gender

Page 31: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#3: The question (cont.)

(b) Suppose the teacher decided to allow the students in the class to select which instructional program on frog anatomy (physical dissection or computer simulation) they prefer to take, and 11 students choose actual dissection and 13 students choose computer simulation. How might that self-selection process jeopardize a statistically valid comparison of the changes in the test scores (score on post-test minus score on pre-test) for the two instructional programs? Provide a specific example to support your answer.

Page 32: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#3(b): Common errors

Stating a reasonable characteristic but only saying that students “like it”

Not describe how behaviors associated with the self-selection criterion impact the changes in the differences (post – pre).

Referring only to the post-test (instead of the change in score)

Page 33: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#3(b): Common errors (cont.) Mentioning a vague aspect of performance

E.g., do better, learn more/less Using common terms unclearly

Bias, observation, voluntary response, … Mentioning only a characteristic without any

connection to performance

Page 34: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#3: Advice for teachers

Give practice, feedback in providing enough detail in describing randomization process So two KSUs (“knowledgeable statistics users”)

would use the same method Make students aware of (subtle) “stopping rule”

issue Can’t just balance groups at end unless order has been

randomized to begin with

Page 35: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#3: Advice for teachers (cont.) Pay attention to what the variables are

Response is change in test score (post – pre) Emphasize concept of confounding

Challenging concept Confounding variable must relate to both

explanatory and response variables

Page 36: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#4: Intent of question

Assess a student’s ability to: identify and compute an appropriate

confidence interval, after checking the necessary conditions;

interpret the interval in the context of the question;

use the confidence interval to make an inference about whether or not a council member’s belief is supported.

Page 37: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

One of the two fire stations in a certain town responds to calls in the northern half of the town, and the other fire station responds to calls in the southern half of the town. One of the town council members believes that the two fire stations have different mean response times. Response time is measured by the difference between the time an emergency call comes into the fire station and the time the first fire truck arrives at the scene of the fire.

#4: The question

Page 38: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#4: The question (cont.)

Data were collected to investigate whether the council member’s belief is correct. A random sample of 50 calls selected from the northern fire station had a mean response time of 4.3 minutes with a standard deviation of 3.7 minutes. A random sample of 50 calls selected from the southern fire station had a mean response time of 5.3 minutes with a standard deviation of 3.2 minutes.

Page 39: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#4: Parts (a)-(b)

a) Construct and interpret a 95 percent confidence interval for the difference in mean response times between the two fire stations.

b) Does the confidence interval in part (a) support the council member’s belief that the two fire stations have different mean response times? Explain.

Page 40: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#4: The question- part (a)

Construct and interpret a 95 percent confidence interval for the difference in mean response times between the two fire stations.

Page 41: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#4: Common errors- part (a)

Many students identified a z confidence interval as the appropriate procedure rather than a t.

Many students failed to check the sample size condition at all.

Some students did an inadequate job of checking the sample size condition by saying that the samples are large enough, with no reference to a number (such as 25 or 30), the central limit theorem or sampling distributions.

Some students stated that 50 is large enough to assume that the populations or samples or data are approximately normal, rather than that the sampling distribution(s) is (are) approximately normal.

Step 1 (name distribution+ conditions)

Page 42: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#4: Common errors- part (a) (cont.)More students received credit for this part than

for any of the other parts, but:• A few students used 1.645 as the multiplier in

their computation.• A few students neglected to square the standard

deviations when computing the standard error, and consequently presented an incorrect final answer.

• A few students thought that the interval could not go below 0, and truncated it at 0.

Step 2 (Mechanics)

Page 43: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#4: Common errors- part (a) (cont.) Some students omitted the word “mean” and interpreted the

interval as applying to the difference in response times. Some students omitted the word “difference” or similar words to

indicate the interval is for a difference in means, stating that the interval is for the “mean response time.”

A few students omitted the context. A few students interpreted the confidence level instead of the

confidence interval. A few students interpreted the confidence interval correctly but

interpreted the confidence level incorrectly. A few students wrote that the confidence interval was for a

“mean proportion” or similar wording using “proportion.”

Step 3 (Interpretation)

Page 44: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#4: The question- part (b)

Does the confidence interval in part (a) support the council member’s belief that the two fire stations have different mean response times? Explain.

Page 45: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#4: Common errors- part (b) Many students made a statistically incorrect statement,

such as “because the interval contains 0, the council member’s belief is wrong.”

Some students thought that the interval supported the council member’s belief because it included more values on one side of 0 than the other.

Some students thought that the interval supported the council member’s belief because it included values as large as 2 minutes.

A few students based a conclusion solely on testing hypotheses and made no reference to the confidence interval.

Page 46: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#4: Advice for teachers Pay attention to the parameter of interest

Response time/ difference in response times/ mean response time/ difference in mean response times

Spiral inference procedures Identify correct procedure from mixed problem set Verification of conditions for a particular procedure

Focus on relationship between hypothesis test and corresponding confidence interval Values contained in the interval

Contextual meaning beyond the mantra.

Page 47: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#5: Intent of question

Assess a student’s ability to: interpret a p-value in context make an appropriate conclusion about the

study based on the p-value based on the conclusion, identify the type

of error that could have occurred and a possible consequence of this error in context

Page 48: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#5: The question

For many years, the medically accepted practice of giving aid to a person experiencing a heart attack was to have the person who placed the emergency call administer chest compression (CC) plus standard mouth-to-mouth resuscitation (MMR) to the heart attack patient until the emergency response team arrived. However, some researchers believed that CC alone would be a more effective approach.

Page 49: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#5: The question (cont.)

In the 1990s a study was conducted in Seattle in which 518 cases were randomly assigned to treatments: 278 to CC plus standard MMR and 240 to CC alone. A total of 64 patients survived the heart attack: 29 in the group receiving CC plus standard MMR, and 35 in the group receiving CC alone. A test of significance was conducted on the following hypotheses.

Page 50: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#5: The question (cont.)

H0: The survival rates for the two treatments are equal.

Ha: The treatment that uses CC alone produces a higher survival rate.

This test resulted in a p-value of 0.0761.

Page 51: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#5: The question (cont.)

a) Interpret what this p-value measures in the context of this study.

b) Based on this p-value and study design, what conclusion should be drawn in the context of this study? Use a significance level of = 0.05.

c) Based on your conclusion in part (b), which type of error, Type I or Type II, could have been made? What is one potential consequence of this error?

Page 52: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#5(a): Common errors

Confusing the p-value with the significance level (saying that the p-value is the probability of rejecting H0)

Interpreting the p-value as the probability that H0 (or Ha) is true (or false)

Omitting a reference to the difference between proportions obtained in this study: “There is a 7.61% chance that the treatment that uses CC alone produces a higher survival rate than CC+MMR, if the true difference between the survival rates is 0”

Page 53: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#5(a): Common errors

Omitting “as large as” in the probability phrase (“probability of obtaining the observed difference in survival rates”)

Saying “by chance alone” or “as a result of sampling variation” in place of the more specific conditional phrase (“if the survival rates for the two treatments are in fact the same”)

Omitting the conditional phrase entirely Omitting context

Page 54: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#5(b): Common errors

Stating that we “accept H0” (Incorrectly) concluding that the two treatments

have the same survival rate

Omitting linkage Need to compare p-value to given level

Omitting context

Page 55: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#5(c): Common errors

Confusing Type I and Type II errors Providing a “consequence” that is a decision,

rather than an action Must refer to actions on heart attack victims, not

decisions by those analyzing the data Lacking specificity with respect to treatments

– specifically, failing to distinguish whether “both treatments” means “CC+MMR and CC” or “CC+MMR”

Page 56: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#5: Advice for teachers

Focus on concepts of inference, not only procedures These concepts are very challenging, require

frequent emphasis and revisiting Emphasize what a p-value is, including through

repeated use of simulations Insist on clear, thorough communication

Page 57: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Intent of question

Assess a student’s ability to: state a correct pair of hypotheses; explain how a particular statistic measures

skewness; use the observed value of the statistic and a

simulated sampling distribution to make a conclusion about the shape of the population;

create a new statistic and explain how it measures skewness.

Page 58: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

A consumer organization was concerned that an automobile manufacturer was misleading customers by overstating the average fuel efficiency (measured in miles per gallon, or mpg) of a particular car model. The model was advertised to get 27 mpg. To investigate, researchers selected a random sample of 10 cars of that model. Each car was then randomly assigned a different driver. Each car was driven for 5,000 miles, and the total fuel consumption was used to compute mpg for that car.

#6: The question

Page 59: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Parts (a)-(d)a) Define the parameter of interest and state the null

and alternative hypotheses the consumer organization is interested in testing.

b) One condition for conducting a one-sample t-test in this situation is that the mpg measurements for the population of cars of this model should be normally distributed. However, the boxplot and histogram shown below indicate that the distribution of the 10 sample values is skewed to the right.

Page 60: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Parts (a)-(d) (cont.)

b) One possible statistic that measures skewness is the ratio

What values of that statistic (small, large, close to one) might indicate that the population distribution of mpg values is skewed to the right? Explain.

median sample

mean sample

Part (b) (cont.)

Page 61: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Parts (a)-(d) (cont.)c) Even though the mpg values in the sample were

skewed to the right, it is still possible that the population distribution of mpg values is normally distributed and that the skewness was due to sampling variability. To investigate, 100 samples, each of size 10, were taken from a normal distribution with the same mean and standard deviation as the original sample. For each of those 100 samples, the statistic (sample mean)/(sample median) was calculated. A dotplot of the 100 simulated statistics is shown below.

Page 62: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Parts (a)-(d) (cont.)

In the original sample, the value of the statistic (sample mean)/(sample median) was 1.03. Based on the value of 1.03 and the dotplot above, is it plausible that the original sample of 10 cars came from a normal population, or do the simulated results suggest the original population is really skewed to the right? Explain.

Page 63: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Parts (a)-(d) (cont.)d) The table below shows summary statistics for mpg

measurements for the original sample of 10 cars.

Choosing only from the summary statistics in the table, define a formula for a different statistic that measures skewness.

What values of that statistic might indicate that the distribution is skewed to the right? Explain.

Minimum Q1 Median Q3 Maximum

23 24 25.5 28 32

Page 64: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: The question- part (a)

a) Define the parameter of interest and state the null and alternative hypotheses the consumer organization is interested in testing.

Page 65: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Common errors- part (a) Students did not understand how to define the parameter

of interest. Instead we saw: “The mpg of the cars” (the variable of interest) “All the cars of this model” (the population of interest) “To determine if the manufacturer is misleading

customers” (the question of interest) Students often attempted to define the parameter more

than once (e.g. saying “the parameter is …” and then later saying “ = …”). These were treated as parallel solutions and the worst attempt was scored.

Students used non-standard notation in the hypotheses, often without explicitly defining their notation.

Students used a two-tailed alternative hypothesis.

Page 66: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: The question- part (b)One condition for conducting a one-sample t-test in this situation is that the mpg measurements for the population of cars of this model should be normally distributed. However, the boxplot and histogram shown below indicate that the distribution of the 10 sample values is skewed to the right.

Page 67: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: The question- part (b) (cont.)b) One possible statistic that measures skewness is

the ratio

What values of that statistic (small, large, close to one) might indicate that the population distribution of mpg values is skewed to the right? Explain.

median sample

mean sample

Page 68: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Common errors- part (b) Students reversed the relationship between the mean and

median in a right skewed distribution (saying the mean will be less than the median in a right skewed distribution).

Students made good statements about the relationship between the mean and median but did not state that large values of the statistic indicate right skewness.

Students stated that large values of the statistic indicate right skewness but only argued that in a normal (or symmetric) distribution the ratio should be close to 1 and did not explain how the mean and median are related in a right skewed distribution.

Students said “large” without any explanation.

Page 69: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: The question- part (c)c) Even though the mpg values in the sample were skewed to the right, it

is still possible that the population distribution of mpg values is normally distributed and that the skewness was due to sampling variability. To investigate, 100 samples, each of size 10, were taken from a normal distribution with the same mean and standard deviation as the original sample. For each of those 100 samples, the statistic (sample mean)/(sample median) was calculated. A dotplot of the 100 simulated statistics is shown below.

Page 70: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: The question- part (c) (cont.)

In the original sample, the value of the statistic (sample mean)/(sample median) was 1.03. Based on the value of 1.03 and the dotplot above, is it plausible that the original sample of 10 cars came from a normal population, or do the simulated results suggest the original population is really skewed to the right? Explain.

Page 71: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Common errors- part (c)

Students did not understand that the dotplot approximated the sampling distribution of the statistic (sample mean)/(sample median). In other words, students did not understand that it showed what values of the statistic would occur when sampling from a normal population. Some students thought the dotplot showed sample data (as

opposed to simulated values of a sample statistic), described the shape of the dotplot as approximately normal, and used this to justify that the original sample came from a normal population.

Other students thought that the values in the dotplot came from new samples of size 10 from the original population, instead of from a normal population.

Page 72: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Common errors- part (c) (cont.) Students did not understand how to use the dotplot to make an

appropriate conclusion. In other words, students did not know to look for where 1.03, the observed value of the sample statistic, fell in the distribution and explicitly indicate whether or not a value of 1.03 would be likely to occur by chance when sampling from a normal population. Some students stated the relative position of 1.03 without

specific numerical evidence from the dotplot (“1.03 is towards the middle of the distribution”) and then correctly decided that it was plausible the original sample came from a normal population.

Some students thought that 1.03 was unusual enough to conclude that the original population was skewed to the right (“1.03 is in the tail of the distribution so I conclude that the sample came from a right skewed population”).

Page 73: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Common errors- part (c) (cont.)Other incorrect responses included: Saying that the dots are centered around 1 so the sample came

from a normal population. Because the sampling distribution was generated using samples from a normal population, this is not surprising. However, it doesn’t address whether or not 1.03 is unusual.

Arguing that 1.03 is close to 1 without describing its relative position in the dotplot. It was clear that many students were thinking simply about the absolute difference between 1 and 1.03 without considering the variability in the sampling distribution.

Stating that the sample size (or number of samples) is large, so the distribution is normal or that the sample size is too small to make a conclusion

Stating that the sample came from a normal population with no explanation.

Page 74: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: The question- part (d)d) The table below shows summary statistics for mpg

measurements for the original sample of 10 cars.

Choosing only from the summary statistics in the table, define a formula for a different statistic that measures skewness.

What values of that statistic might indicate that the distribution is skewed to the right? Explain.

Minimum Q1 Median Q3 Maximum

23 24 25.5 28 32

Page 75: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Common errors- part (d)

Students did not provide a statistic that measured skewness. This typically occurred if the student only focused on the right half of the distribution (e.g. max/Q3) or used a measure of spread (e.g. (max – min)/median).

Students provided a method for identifying skewness but not a statistic. For example, “if (max – med) > (med – min), then the distribution is skewed right.”

Page 76: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Common errors- part (d) (cont.) Students did not correctly identify the values of their

statistic which indicate skewness to the right (e.g. looking for values < 1 when using (med – min)/(max – med)).

Students did not justify the values that indicate right skewness by discussing how right skewness affects the relationship between the components of the statistic.

Students often tried to use outlier rules to measure skewness. Using these rules correctly and concluding that there is right skewness if there are outliers on the right but not on the left got credit for a reasonable method, but not a reasonable statistic.

Page 77: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Advice for teachers

Pay attention to difference between Variable of interest/ population of interest/ question of

interest/ parameter of interest Understand relationship between

distribution of a variable and sampling distribution of a sample statistic for samples

of size n Use simulations throughout the course

Emphasize reasoning process behind them Solve real problems based on simulations

Page 78: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

#6: Advice for teachers Cannot emphasize enough the relationship

between The sampling distribution of a sample statistic and the

likelihood of obtaining a sample statistic as or more extreme than what was observed in your sample.

The sampling distribution of a sample statistic as a model of the null hypothesis.

Give students experience with thinking “outside the box” E.g., creating own statistics/estimators

Page 79: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

Overall performance

Mean scores by question:1 2 3 4 5 6

2.02 0.84 1.42 1.64 0.96 1.32

Exam Score 2008 2009

5 12.9% 12.1%

4 22.7% 22.3%

3 23.7% 24.3%

2 18.8% 19.2%

1 21.8% 22.1%

Exam Score 2008 2009

5 12.9% 12.1%

4 22.7% 22.3%

3 23.7% 24.3%

2 18.8% 19.2%

1 21.8% 22.1%

Page 80: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

Overall performance

Mean scores by question:1 2 3 4 5 6

2.02 0.84 1.42 1.64 0.96 1.32

score 2006 2007 2008 20095 12.5% 11.8% 12.7% 12.1%4 22.3% 21.4% 22.5% 22.3%3 25.4% 25.5% 23.9% 24.3%2 18.3% 17.2% 19.0% 19.2%1 21.5% 24.1% 21.9% 22.1%

Page 81: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

2009 AP Statistics Comparability Study The College Board conducts College Comparability Studies

every 5-7 years by subject to ensure that AP Exam scoring aligns with college-level expectations.

Portions of recent AP exams are administered to college students; their professors score the exams.

The essay portion of the exams taken by college students are scored at the AP Reading; multiple-choice sections are machine scored.

College student performance is then compared to AP student performance to verify the alignment of expectations that are reflected in the AP exam scores.

Page 82: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

AP Statistics Comparability Study Results

The 2009 AP College Comparability study confirmed that standards for scoring the AP Statistics exam remain high and, overall, are calibrated well with college expectations.

Page 83: Results from the 2009 AP Statistics Exam Administration Allan Rossman, Cal Poly – San Luis Obispo Ruth Carver, Germantown Academy

Q & A?

Please ask questions now Please feel free to contact us later

[email protected] [email protected]

Please consider coming to Daytona Beach next June for the Reading Apply if you have not done so

apcentral.collegeboard.com