
Upload: tranngoc

Post on 23-May-2018


AP Statistics Exam Review Packet

Tips for the AP Statistics Exam

The Multiple Choice Section (50%):
40 questions in 90 minutes (there is usually enough time for this section).
There is no penalty for wrong answers, so ANSWER EVERY QUESTION!
Generally the questions get harder as you go. Skip tough questions and return to them later.

The Free Response Section (50%):
6 questions in 90 minutes (students usually feel rushed on this section).
The first 5 questions are shorter and should take 10-15 minutes each.
The 6th and final question is called the investigative task. It is worth 25% of the free response portion and typically takes longer. The question usually has a “flow” and almost always asks you to do something familiar and something new. Don’t save it until the end of the exam, when you will be too tired and rushed to think creatively. Keep an open mind and do your best; if it is hard for you, it will be hard for everyone.
A good strategy is to do question 1, then question 6, then the remaining 4 questions.
Read each question first so you can get the big picture and prioritize your time.
Communication is very important. Make sure the grader knows what you are doing and why.
Don’t use statistical vocabulary unless you use it correctly.
Define all symbols, draw pictures, etc. Never just give a numerical answer.
Don’t just rely on calculator commands. If you use calculator commands, clearly label each number.
Explain your reasoning. When asked to choose between several options, give reasons for your choice AND reasons why you did not choose the others.
When you are asked to compare two distributions, use explicit comparison phrases such as “higher than” or “approximately the same as.” Lists of characteristics do not count as a comparison.
Don’t waste time erasing. Cross out wrong answers and draw arrows to help the reader follow your work.
Don’t give 2 different solutions to a problem. The worst one will be graded.
Answer all questions in the context of the problem.
If the question asks you to use results from previous parts of the question, make sure you explicitly refer to them in your answer.
If you cannot get an answer for an early part of a question but need it for a later part, make up a value or carefully explain what you would do if you knew the answer.
Space on the exam is not suggestive of the desired length of an answer. The best answers are usually quite succinct. There is no need for “extra fluff” on an AP Stats exam.
Don’t automatically enter data into your calculator. In most cases, you will not need to.
Use words like “approximately” liberally, especially with the word “Normal.”

Other Stuff:
Make sure to eat a healthy, typical meal before the test.
Bring a watch to help pace yourself.
Bring an extra calculator or extra batteries and an extra pencil.
You will be provided formulas and tables (normal, t, chi-square) on both sections.
Do NOT bring a cell phone (or any other communication device).
You may not use rulers, white-out or highlighters.
You may not discuss the multiple choice questions (ever) and may not discuss the free response questions until they are released on AP Central (not all FR questions will be released).
The AP Exam is harder than a normal classroom test. Scoring at least 40% will almost guarantee a 3 or higher on the exam. Don’t panic if you cannot answer a question or two.

How to study:
Work on the class-compiled review & use the mock analysis sheet as a guide.
Review chapter tests & FRAPPYs.
Watch videos/do practice quizzes at www.whfreeman.com/TPS5e, especially from chapters 4-6.

You have worked very hard this year and I am proud of you.

Calculator Functions for Stats (TI-83/84)

NAME | RESULT | CALC KEYS
1-VarStats (L1) | Summary of univariate statistics on data (put data in L1) | STAT / Calc / 1: 1-Var Stats
2-VarStats (L1, L2) | Summary of bivariate statistics on data (x data in L1, y data in L2) | STAT / Calc / 2: 2-Var Stats
ZoomStat | Defines viewing window so that all data points are displayed in a StatPlot (graph) | ZOOM / 9: ZoomStat
Normalcdf (min, max, µ, σ) | Computes cumulative probability (area under the curve) for normal distribution between bounds for given mean and st. dev. | 2nd Distr / 2: normalcdf
Normalpdf (x, µ, σ) | Computes the height of the normal density curve at x for given mean and st. dev. (a density value, not a probability) | 2nd Distr / 1: normalpdf
InvNorm (area, µ, σ) | Computes the value with the given area to its left under the normal curve with given mean and st. dev. (a z-score when µ = 0 and σ = 1) | 2nd Distr / 3: invNorm
DiagnosticOn | Turns on display of r, r², R² | 2nd Catalog / “D” / scroll down to DiagnosticOn
DiagnosticOff | Turns off display of r, r², R² | 2nd Catalog / “D” / scroll down to DiagnosticOff
LinReg (a + bx) | Fits linear regression to list data (L1: x data, L2: y data) | STAT / Calc / 8: LinReg (a+bx)
RandInt (lower, upper, # trials) | Displays random integers between given bounds for specified # trials | MATH / Prob / 5: RandInt
Binomcdf (n, p, x) | Computes cumulative probability for ≤ x successes for binomial distribution with probability p of success on one trial | 2nd Distr / B: binomcdf
Binompdf (n, p, x) | Computes probability of exactly x successes for binomial distribution with probability p of success on one trial | 2nd Distr / A: binompdf
Geometcdf (p, x) | Computes cumulative probability for ≤ x trials for geometric distribution with probability p of success on one trial | 2nd Distr / F: geometcdf
Geometpdf (p, x) | Computes probability that the first success occurs at trial x for geometric distribution with probability p of success on one trial | 2nd Distr / E: geometpdf
1-Sample Z Interval | Calculates a one proportion confidence interval with number of successes x out of trials n | STAT / Tests / A: 1-PropZInt
1-Sample Z Test | Performs a one proportion z test (one- and two-sided options) | STAT / Tests / 5: 1-PropZTest
2-Sample Z Interval | Calculates confidence interval for two proportions (p1 − p2) with number of successes x1 and x2 out of trials n1 and n2 | STAT / Tests / B: 2-PropZInt
2-Sample Z Test | Performs a two proportion z test | STAT / Tests / 6: 2-PropZTest
Tcdf (min, max, df) | Computes the t-distribution probability (area under curve) between lower and upper bound for given df | 2nd Distr / 5: tcdf
InvT (area, df) | Computes value (t-score) for given area under t-distribution curve with given df | 2nd Distr / 4: invT
1-Sample T Interval | Computes confidence interval for one mean with either data list or given statistics | STAT / Tests / 8: TInterval
1-Sample T Test | Performs a one mean t test (one- and two-sided options) with either data list or given statistics | STAT / Tests / 2: TTest
2-Sample T Interval | Calculates confidence interval for two means (µ1 − µ2) with either data lists or given statistics | STAT / Tests / 0: 2-SampTInt
2-Sample T Test | Performs a two mean t test with data lists or given statistics | STAT / Tests / 4: 2-SampTTest
χ²cdf (min, max, df) | Computes probability for chi-square distribution between bounds (area under curve) with given degrees of freedom df | 2nd Distr / 7: χ²cdf
χ² Test for Goodness of Fit (only available for TI-84) | Performs chi-square test (observed counts in L1 and expected counts in L2) | STAT / Tests / D: χ²GOF-Test
χ² Test for Homogeneity OR for Independence | Performs chi-square test (observed counts in matrix [A]; the calculator stores the expected counts in matrix [B]) | STAT / Tests / C: χ²-Test
Linear Regression T Test | Performs linear regression and a t-test (x data in L1 and y data in L2) | STAT / Tests / F: LinRegTTest
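To see what the binomial and geometric commands in the table compute, here is a minimal Python sketch of the same formulas. The function names simply mirror the calculator commands; they are illustrative helpers written for this example, not part of any calculator or library, and the inputs (n = 10, p = 0.5, and so on) are made up.

```python
from math import comb


def binompdf(n, p, x):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomcdf(n, p, x):
    """P(X <= x): add the individual binomial probabilities for 0 through x successes."""
    return sum(binompdf(n, p, k) for k in range(x + 1))


def geometpdf(p, x):
    """P(the first success occurs on trial x): x - 1 failures followed by one success."""
    return (1 - p) ** (x - 1) * p


def geometcdf(p, x):
    """P(the first success occurs on or before trial x): 1 - P(x failures in a row)."""
    return 1 - (1 - p) ** x


# Hypothetical inputs, chosen only to exercise the functions.
print("P(exactly 5 heads in 10 flips) =", binompdf(10, 0.5, 5))
print("P(at most 5 heads in 10 flips) =", binomcdf(10, 0.5, 5))
print("P(first success on trial 3, p = 0.2) =", geometpdf(0.2, 3))
print("P(first success by trial 3, p = 0.2) =", geometcdf(0.2, 3))
```

On the exam you would still write out the distribution, its parameters, and the probability statement rather than only the command.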

Inference Review: Picking the correct inference procedure

The list below gives the 15 different inference procedures you should know for the AP exam. In each of the scenarios below, choose the correct inference procedure.

One-sample z interval for p
One-sample z test for p
One-sample t interval for µ, including paired data
One-sample t test for µ, including paired data
Two-sample z interval for p1 − p2
Two-sample z test for p1 − p2
Two-sample t interval for µ1 − µ2
Two-sample t test for µ1 − µ2
t interval for the slope of a least-squares regression line
t test for the slope of a least-squares regression line
Matched pairs z test
Matched pairs t test
Chi-square test for goodness-of-fit
Chi-square test for homogeneity
Chi-square test for association/independence

1. Which brand of AA batteries lasts longer: Duracell or Eveready?

2. According to a recent survey, a typical teenager has 38 contacts stored in his/her cellphone. Is this true at your school?

3. What percent of students at your school have a Facebook?

4. Is there a relationship between the age of a student’s car and the mileage reading on the odometer at a large university?

5. Is there a relationship between students’ favorite academic subject and preferred type of music at a large high school?

6. Who is more likely to own an iPod—middle school girls or middle school boys?

7. How long do teens typically spend brushing their teeth?

8. Are the colors equally distributed in Froot Loops?

9. Which brand of razor gives a closer shave? To answer this question, researchers recruited 25 men to shave one side of their face with Razor A and the other side with Razor B.

10. How much more effective is exercise and drug treatment than drug treatment alone at reducing the incidence of heart attacks among men aged 65 and older?

Web resource for more problems like these: http://www.ltcconline.net/greenL/java/Statistics/catStatProb/categorizingStatProblems13.html

Chapter 1: Exploring Data

Vocabulary
Individuals - The people, animals, or things described by the data.
Variables - Describe some characteristic of an individual.
Categorical - A variable that assigns a label placing each individual into one of several groups.
Quantitative - A variable with numerical values that measure some characteristic of each individual.
Marginal Distribution - The distribution of values among all individuals for one of the categorical variables in a two-way table of counts.
Conditional Distribution - Describes the values of one variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.

Summary
Ways to display categorical data: bar graphs, pie charts, two-way tables, segmented bar graphs
Example: a two-way table and the conditional distribution of opinion among women
Ways to graph quantitative data: dotplots, stemplots, histograms
Describing quantitative data: SCS - GO!

o Shape: Symmetric/Left or Right skewness, Unimodal/Bimodal/Uniform
o Center: Mean (average), Median (midpoint)
o Spread: Range (Max − Min), IQR (Q3 − Q1), Standard deviation
o Outliers: Any value more than 1.5∗IQR above the third quartile OR more than 1.5∗IQR below the first quartile

Five-number summary: Min, Q1, Median, Q3, Max
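As a hedged illustration of the five-number summary and the 1.5∗IQR outlier rule, the Python sketch below finds the quartiles as the medians of the lower and upper halves of the sorted data (leaving out the median itself when the count is odd), which is the convention assumed here. The data values and the function names are made up for the example.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the two halves."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For an odd count, the overall median is left out of both halves.
    upper = values[half:] if n % 2 == 0 else values[half + 1:]
    return values[0], median(lower), median(values), median(upper), values[-1]


def outliers(data):
    """Values more than 1.5*IQR below Q1 or more than 1.5*IQR above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


# Illustrative data only (not taken from a real study).
data = [4, 5, 5, 5, 6, 7, 7, 7, 7, 8, 10, 10, 10, 10, 11, 12, 14, 22, 35, 38]
print("Five-number summary:", five_number_summary(data))
print("Outliers by the 1.5*IQR rule:", outliers(data))
```

Software packages use several slightly different quartile definitions, so results from other tools may differ a little from this median-of-halves approach.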

Tips/Mistakes
Quantitative: numerical values for which arithmetic such as averaging makes sense vs. Categorical: labels that place individuals into groups
When making a bar graph/histogram, keep the bars/bins the same width (no pictographs!)
Label your axes!!!
For skewness, remember to look at your feet! Right-skewed looks like your right foot: peak on the left, long tail to the right, etc.
When asked to compare distributions, actually COMPARE: “about the same as”, “is greater than”…
You can’t see “modes” in a boxplot. Don’t describe shape for a boxplot.
Formulas are given for mean & standard deviation OR use 1-Var Stats

MULTIPLE CHOICE

[Figure: boxplot labeled with Min, Q1, Median, Q3, IQR, the upper fence Q3 + 1.5∗IQR, and Outlier/Max]

Two-way table:
Opinion | Female | Male | Total
Almost no chance | 96 | 98 | 194
Some chance | 426 | 286 | 712
50-50 chance | 696 | 720 | 1416
Good chance | 663 | 758 | 1421
Almost certain | 486 | 597 | 1083
Total | 2367 | 2459 | 4826

Conditional distribution of opinion among women:
Opinion | Percent
Almost no chance | 96/2367 = 4.1%
Some chance | 426/2367 = 18%
50-50 chance | 696/2367 = 29.4%
Good chance | 663/2367 = 28%
Almost certain | 486/2367 = 20.5%
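The conditional distribution above comes from dividing each count in the Female column by the Female total. A small Python sketch of that arithmetic follows; the dictionary layout and the function name are choices made for this illustration only.

```python
# Counts from the two-way table of opinion by gender.
counts = {
    "Almost no chance": {"Female": 96, "Male": 98},
    "Some chance": {"Female": 426, "Male": 286},
    "50-50 chance": {"Female": 696, "Male": 720},
    "Good chance": {"Female": 663, "Male": 758},
    "Almost certain": {"Female": 486, "Male": 597},
}


def conditional_distribution(table, group):
    """Proportion of each opinion among the individuals in one column (group)."""
    group_total = sum(row[group] for row in table.values())
    return {opinion: row[group] / group_total for opinion, row in table.items()}


female = conditional_distribution(counts, "Female")
for opinion, proportion in female.items():
    print(f"{opinion}: {100 * proportion:.1f}%")
```

Calling `conditional_distribution(counts, "Male")` gives the other conditional distribution, which is what you would need in order to compare the two genders.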

Color | % of Vehicles
White | 23
Black | 18
Silver | 16
Gray | 13
Red | 10
Blue | 9
Brown/Beige | 5
Yellow/Gold | 3
Green | 2

Use the following for #1-2: The National Survey of Adolescent Health interviewed several thousand teens (grade 7 to 12). One question asked was “What do you think are the chances you will be married in the next ten years?” Here is a two-way table of the responses by gender:

1) The percent of females among the respondents was…
(a) 2625
(b) 4877
(c) About 46%
(d) About 54%
(e) None of the above

2) What percent of females thought they were almost certain to be married in the next ten years?
(a) About 16%
(b) About 24%
(c) About 40%
(d) About 45%
(e) About 61%

3) Here are the amounts of money (cents) in coins carried by 10 students in a statistics class: 50 35 0 97 76 0 0 87 23 65. To make a stem plot of these data, you would use the stems...
(a) 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
(b) 0, 2, 3, 5, 6, 7, 8, 9
(c) 0, 3, 5, 6, 7
(d) 00, 10, 20, 30, 40, 50, 60, 70, 80, 90
(e) None of these

4) The stem plot (on right) shows the number of home runs hit by each of the 30 MLB teams in 2011. Home run totals above what value should be considered outliers?
(a) 173
(b) 210
(c) 222
(d) 229
(e) 257

5) You look at real estate ads for houses in Naples, Florida. There are many houses ranging from $200,000 to $500,000 in price. The few houses on the water, however, have prices up to $15 million. The distribution of house prices will be
(a) skewed to the left.
(b) roughly symmetric.
(c) skewed to the right.
(d) unimodal.
(e) too high.

FREE RESPONSE

6) The dot plot below displays the last digit of 100 phone numbers chosen at random from a phonebook. Describe the shape of the distribution. Does the shape make sense to you? Explain.

7) Here is the distribution of colors for vehicles sold in North America in 2011.

(a) What percent of vehicles had colors other than those listed?

(b) Display these data in a bar graph. Be sure to label your axes.

(c) Would it be appropriate to make a pie chart of these data? Explain.

Chapter 2: Distributions of Data (Density Curves)

Vocabulary
Percentile - The value with p percent of the observations less than it (a location in the distribution).

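Stemplot of home runs hit by the 30 MLB teams in 2011 (Chapter 1, multiple choice question 4):
09 | 15
10 | 3789
11 | 47
12 | 19
13 |
14 | 89
15 | 34445
16 | 239
17 | 223
18 | 356
19 | 1
20 | 3
21 | 0
22 | 2
Key: 14 | 8 is a team with 148 home runs.

Two-way table of responses by gender (Chapter 1, multiple choice questions 1-2):
Response | Female | Male
Almost no chance | 119 | 103
Some chance but probably not | 150 | 171
50-50 chance | 447 | 512
Good chance | 735 | 710
Almost certain | 1174 | 756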

Standardized Score (z-score) - Tells us how many standard deviations from the mean an observation falls, and in which direction.
Density Curve - Describes the overall pattern of a distribution. It is always on or above the horizontal axis and has an area of exactly 1 underneath it.
Normal Distribution - Described by a Normal density curve and specified by its mean µ and standard deviation σ.

o Abbreviate: N(µ, σ)

68-95-99.7 Rule - In a Normal distribution:

o Approximately 68% of observations will fall within 1 standard deviation of the mean.

o Approximately 95% of observations will fall within 2 standard deviations of the mean.

o Approximately 99.7% of observations will fall within 3 standard deviations of the mean.

Standard Normal distribution - The Normal distribution with mean 0 and standard deviation 1.

Summary

$z = \dfrac{x \text{ value} - \text{mean}}{\text{standard deviation of } x} = \dfrac{x_i - \mu_x}{\sigma_x}$

Transformations:
o Adding/Subtracting: Changes center, but NOT spread
o Multiplying/Dividing: Changes center & spread
o Neither changes shape
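The transformation rules above can be checked numerically. The Python sketch below uses a small set of hypothetical scores, adds a constant to each, multiplies each by a constant, and compares center (mean, median) and spread (standard deviation, IQR is omitted for brevity). The data and the helper name `describe` are assumptions made for the illustration.

```python
from statistics import mean, median, stdev


def describe(values):
    """Return (mean, median, sample standard deviation) for a list of numbers."""
    return mean(values), median(values), stdev(values)


# Hypothetical test scores, made up for this example.
scores = [62, 70, 75, 81, 92]
shifted = [x + 10 for x in scores]   # add 10 points to every score
scaled = [x * 2 for x in scores]     # double every score

for label, values in [("original", scores), ("add 10", shifted), ("times 2", scaled)]:
    m, med, s = describe(values)
    print(f"{label:>8}: mean = {m:.2f}, median = {med:.2f}, st. dev. = {s:.2f}")
```

Adding 10 moves the mean and median up by 10 and leaves the standard deviation alone; multiplying by 2 doubles all three.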

Notation:

Mean µ is the balance point of the curve.
Median divides the area under the curve in half & is much more resistant to skewness/outliers than the mean.
For symmetric density curves the mean and median are equal; the mean of a skewed curve is located farther toward the long tail than the median is.
All Normal distributions obey the 68-95-99.7 rule.
Can use z table or Normalcdf( to find area under a curve with given boundaries, mean, and standard deviation.
Can use z table or invNorm( to find the z score from a given area under a Normal curve.

Tips/Mistakes
Percentiles are LOCATIONS. An observation isn’t “in” the 84th percentile, it’s “at” the 84th percentile.
A z-score is NOT measured in the same units as the variable.
When drawing a Normal curve:

o Start with seven evenly-spaced tick marks
o Put the mean in the center and go out 3 standard deviations in both directions

When doing Normal distribution calculations, the z table gives the area to the left of the boundary value. If you want the area to the right, you need to subtract from 1. (Always sketch the curve to be sure what you are looking for.)

Don’t just use “calculator speak.” When using the calculator, carefully label each of the inputs in the calculator command.

o Ex: normalcdf(lower: 2.35, upper: 10000, µ: 2.1, σ: 0.6)
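For checking Normal calculations away from the calculator, here is a minimal Python sketch using `statistics.NormalDist` from the standard library. The wrapper names `normalcdf` and `inv_norm` are hypothetical helpers written to mirror the calculator commands; they are not calculator or library functions.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


def inv_norm(area, mu=0.0, sigma=1.0):
    """Value with the given area to its LEFT under the N(mu, sigma) curve."""
    return NormalDist(mu, sigma).inv_cdf(area)


# Same inputs as the labeled calculator example: area to the right of 2.35.
area_right = normalcdf(lower=2.35, upper=10000, mu=2.1, sigma=0.6)
print("Area to the right of 2.35:", round(area_right, 4))

# The 68-95-99.7 rule on the standard Normal curve.
for k in (1, 2, 3):
    print(f"Within {k} st. dev.:", round(normalcdf(-k, k), 4))

# Value with 90% of the area to its left (an illustrative percentile lookup).
print("90th percentile:", round(inv_norm(0.90, mu=2.1, sigma=0.6), 3))
```

Because `inv_norm` takes the area to the left, a question about the top 20% of a distribution would use an area of 0.80.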

MULTIPLE CHOICE

Notation (population vs. sample):
 | Population | Sample
Mean | µ | x̄
Standard Deviation | σ | s_x

1) A different species of cockroach has weights that follow a normal distribution with a mean of 50 grams. After measuring the weights of many of these roaches, a lab assistant reports that 14% of the roaches weigh more than 55 grams. Based on this report, what is the approximate standard deviation of weights for this species of cockroach?

(a) 4.6 (b) 5.0 (c) 6.2 (d) 14.0 (e) Cannot determine

2) Jorge’s score on an exam in his stats class was at the 64th percentile of the scores for all students. His score falls:
(a) Between the minimum and the first quartile
(b) Between the first quartile and the median
(c) Between the median and the third quartile
(d) Between the third quartile and the maximum
(e) At the mean score for all students

3) Scores on the ACT college entrance exam follow a bell shaped distribution with mean 18 and standard deviation 6. Wayne’s standardized score on the ACT was -0.5. What was Wayne’s actual ACT score?

(a) 5.5 (b) 12 (c) 15 (d) 17.5 (e) 21

4) Two lines are drawn on a density curve. Which is correct?
(a) The median is at the dashed line and the mean is at the solid line
(b) The median is at the solid line and the mean is at the dashed line
(c) The mode is at the solid line and the median is at the dashed line
(d) The mode is at the dashed line and the median is at the solid line
(e) The mode is at the solid line and the median is at the dashed line

5) The following normal probability plot shows the distribution of points scored for the 551 players in the 2011-2012 NBA season. If the distribution of points was displayed in a histogram, what would be the best description of the histogram’s shape?

(a) Approximately normal
(b) Symmetric but not approximately normal
(c) Skewed left
(d) Skewed right
(e) Cannot be determined

FREE RESPONSE

6) How many pairs of shoes do students have? Do girls have more shoes than boys? Here are data from a random sample of 20 female and 20 male students at a large high school:

(a) Find and interpret the percentile in the female distribution for the girl with 22 pairs of shoes.

(b) Find and interpret the percentile in the male distribution for the boy with 22 pairs of shoes.

(c) Who is more unusual: the girl with 22 pairs of shoes or the boy with 22 pairs of shoes?

7) The length of human pregnancies from conception to birth varies according to a distribution that is approximately Normal with a mean 266 days and standard deviation 16 days.

(a) At what percentile is a pregnancy that lasts 240 days (8 months)?

(b) What percent of pregnancies last between 240 and 270 days?

(c) How long do the longest 20% of pregnancies last?

Chapter 3: Describing Relationships

Data for Chapter 2, free response question 6 (pairs of shoes):
Female: 50 26 26 31 57 19 24 22 23 38 13 50 13 34 23 30 49 13 15 51
Male: 14 7 6 5 12 38 8 7 10 10 10 11 4 5 22 7 5 10 35 7

Vocabulary
Response Variable - Measures the outcome of a study.
Explanatory Variable - Helps explain/predict changes in a response variable.
Scatterplot - Shows the relationship between two quantitative variables measured on the same individuals.
Correlation - Measures the direction and strength of the linear relationship in a scatterplot.
Regression Line - Line that describes how a response variable (y) changes as an explanatory variable (x) changes.
Extrapolation - Use of a regression line for prediction outside the interval of values of x used to obtain the line.
Residual - Difference between an observed value of the response variable and the value predicted by the regression line (observed y minus predicted y).
Least-Squares Regression Line - Line that makes the sum of the squared residuals as small as possible.
Residual Plot - Scatterplot of the residuals against the explanatory variable. A leftover pattern in the residual plot indicates that a linear model may not be appropriate.
Standard deviation of the residuals (s) - Measures the typical size of the prediction errors when using the regression line.
r² - Fraction of the variation in the response variable that is accounted for by the least-squares regression on the explanatory variable.

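Summary
When describing a scatterplot, look at direction, form, strength, and outliers.
Direction describes the association: it can be positive or negative.
For form, check whether the relationship is linear rather than curved or clustered.
Strength is determined by how closely the points in the scatterplot follow the form (for a linear relationship, how close they lie to a line).
Correlation r is always between -1 and 1. Values near 1 or -1 indicate a strong linear relationship (positive or negative), and values near 0 indicate a weak linear relationship.
Regression line: $\hat{y} = a + bx$, where $\hat{y}$ is the predicted value of y, b is the slope, and a is the y-intercept.

Least-squares regression line: slope $b = r\dfrac{s_y}{s_x}$ and intercept $a = \bar{y} - b\bar{x}$.

Standard deviation of the residuals (s): $s = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$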
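The quantities in this summary can be computed step by step. The Python sketch below uses the standard AP Statistics formulas (r as the average product of z-scores with divisor n − 1, slope b = r·s_y/s_x, intercept a = ȳ − b·x̄, and s with divisor n − 2) and the body weight and backpack weight values from the hiking exercise in this chapter. The function name and the returned dictionary keys are choices made for this illustration.

```python
from math import sqrt
from statistics import mean, stdev


def least_squares(x, y):
    """Correlation, least-squares line y-hat = a + b*x, residuals, and s."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    # r is the sum of products of z-scores, divided by n - 1.
    r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)
    b = r * s_y / s_x            # slope
    a = y_bar - b * x_bar        # intercept: the line passes through (x-bar, y-bar)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed - predicted
    s = sqrt(sum(e**2 for e in residuals) / (n - 2))
    return {"r": r, "r2": r**2, "a": a, "b": b, "s": s, "residuals": residuals}


# Body weight (lb) and backpack weight (lb) for one hiking group.
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

fit = least_squares(body, pack)
print(f"r = {fit['r']:.3f}, r^2 = {fit['r2']:.3f}")
print(f"y-hat = {fit['a']:.2f} + {fit['b']:.4f} x")
print(f"s = {fit['s']:.2f}")
```

Plotting `fit["residuals"]` against `body` would give the residual plot used to judge whether a linear model is appropriate.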

Tips/Mistakes
Correlation doesn't imply causation.
Correlation isn't resistant: outliers can greatly skew results.
You can find r in the calculator by inputting the data into L1 and L2 and then finding the linear regression line equation (Stat, Calc, 8). *Make sure your diagnostics are on (go to the catalog on the calculator)*
Scatterplots can be graphed on the calculator: input data into L1 and L2, go to Stat Plot, turn on plot one, and hit graph.
eXplanatory goes on the X axis.
Always make sure the relationship is roughly linear and check for any outliers.
Look for influential observations: individual points that change the correlation/regression line.
NEVER conclude that there is a cause-and-effect relationship between two variables based on correlation alone.
Residual plots can be made in the calculator using the list option “RESID”.

MULTIPLE CHOICE

1) You have data for many years on the average price of oil and the average price of a gallon of unleaded regular gasoline. If you want to see how well the price of oil predicts the price of gas, then you should make a scatter plot with what as your explanatory variable?

(a) The price of oil
(b) The price of gas
(c) The year
(d) Either oil price or gas price
(e) Time

2) In a scatterplot of the average price of a barrel of oil and the average retail price of a gallon of gas, you expect to see:

(a) Very little association
(b) A weak negative association
(c) A weak positive association
(d) A strong negative association
(e) A strong positive association

3) Which of the following is not a characteristic of the least-squares regression line?
(a) The slope of the least-squares regression line is always between -1 and 1
(b) The least-squares regression line always goes through the point (x̄, ȳ)
(c) The least-squares regression line minimizes the sum of squared residuals
(d) The slope of the least-squares regression line will always have the same sign as the correlation
(e) The least-squares regression line is not resistant to outliers

4) Measurements on young children in Mumbai, India, found this least-squares line for predicting height y from arm span x: ŷ = 6.4 + 0.93x. In addition to the regression line, the report on the Mumbai measurements says that r² = 0.95. This suggests that

(a) Although arm span and height are correlated, arm span does not predict height very accurately
(b) Height increases by √0.95 = 0.97 cm for each additional centimeter of arm span
(c) 95% of the relationship between height and arm span is accounted for by the regression line
(d) 95% of the variation in height is accounted for by the regression line
(e) 95% of the height measurements are accounted for by the regression line

5) Suppose that a tall child with arm span 120 cm and height 118 cm was added to the sample used in this study. What effect will adding this child have on the correlation and the slope of the least-squares regression line?

(a) Correlation will increase, slope will increase
(b) Correlation will increase, slope will stay the same
(c) Correlation will increase, slope will decrease
(d) Correlation will stay the same, slope will stay the same
(e) Correlation will stay the same, slope will increase

FREE RESPONSE

6) Ninth grade students at the Webb Schools go on a backpacking trip each fall. Students are divided into hiking groups of size 8 by selecting names from a hat. Before leaving, students and their backpacks are weighed. The data here are from one hiking group in a recent year. Make a scatter plot by hand that shows how backpacks relate to body weight.

7) What’s my line? You use the same bar of soap to shower each morning. The bar weighs 80 grams when it is new. Its weight goes down by 6 grams per day on average. What is the equation of the regression line for predicting weight from days of use?

Data for question 6:
Body Weight (lb) | 120 | 187 | 109 | 103 | 131 | 165 | 158 | 116
Backpack Weight (lb) | 26 | 30 | 26 | 24 | 29 | 35 | 31 | 28

Chapter 4: Designing Studies

Vocabulary
Census - Collects data from every individual in the population.
Strata - Groups of similar individuals into which the population is classified.
Clusters - Groups of individuals that are located near each other, into which the population is classified.
Bias - A study design shows bias if it would consistently underestimate or consistently overestimate the value we want to know.
Convenience sample - Choosing individuals that are easy to reach.
Voluntary response sample - Consists of people who choose themselves by responding to a general invitation.
Simple random sample (SRS) - A sample of size n chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample.
Confounding - Occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.
Treatment - A specific condition applied to the individuals in an experiment.
Subjects - Experimental units that are human beings.
Completely randomized design - Experimental units are assigned to the treatments completely by chance.
Statistically significant - An observed effect so large that it would rarely occur by chance.
Block - A group of experimental units that are known before the experiment to be similar in some way.
Randomized block design - Random assignment of experimental units to treatments is carried out separately within each block.
Matched pairs design - Creates blocks by matching pairs of similar experimental units.

Summary
This chapter is about sampling and surveys, experiments, and how to use studies wisely. After reading section 4.1 you will be able to randomly obtain a sample, know the difference between kinds of sample data, and explain aspects of a survey that can lead to bias. Section 4.2 is all about experiments. In this section you will be taught the difference between an observational study and an experiment. You will also learn how to correctly set up an experiment, the purpose of certain kinds of experiments, and how to interpret the meaning of statistically significant in the context of an experiment (basically the four step process: SPDC). Section 4.3 is the shortest of the three sections. After reading this section you should be able to evaluate whether a study has been done ethically and describe the scope of inference that is appropriate for a study.

Tips/Mistakes
Bad samples include: convenience sampling, voluntary response sampling, undercoverage, and nonresponse.
Always randomize your sampling, whether with technology, randInt( , ), or without, slips of paper in a hat.
Question wording and the order in which questions are asked matter.
Most well designed experiments compare two or more treatments.
Control groups in experiments help reduce variability in the response variable.
If asked to describe the design of an experiment, you’re expected to describe how treatments are assigned and clearly state what will be measured or compared.
Don’t mix the language of experiments and the language of sample surveys or other observational studies.
Remember CRCR: Comparison, Randomness, Control, and Replication.
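As a hedged sketch of the “randomize with technology” tip, the Python code below draws a simple random sample and randomly assigns experimental units to treatments in equal-sized groups. The function names, the unit labels 1 to 30, the treatment names, and the seed are all hypothetical, chosen only for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Choose n distinct individuals so that every group of n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)


def completely_randomized_design(units, treatments, seed=None):
    """Shuffle the units, then deal them into equal-sized treatment groups."""
    shuffled = list(units)
    if len(shuffled) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {
        treatment: sorted(shuffled[i * size:(i + 1) * size])
        for i, treatment in enumerate(treatments)
    }


# Hypothetical example: 30 labeled units and three made-up treatment names.
groups = completely_randomized_design(range(1, 31), ["Treatment A", "Treatment B", "Control"], seed=2024)
for treatment, members in groups.items():
    print(treatment, members)

print("SRS of 5 from 100:", simple_random_sample(range(1, 101), 5, seed=2024))
```

In a written answer you would describe the same steps in words: label the units, state how the random numbers are used, and say which units receive which treatment.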

MULTIPLE CHOICE1) A study of treatments for angina (pain due to low blood supply to the heart) compared bypass surgery, angioplasty and use of drugs. The study looked at the medical records of thousands of angina patients whose doctors had chosen one of these treatments. It found that the average survival time of patients given drugs was the highest. What do you conclude?

(a) This study proves that drugs prolong life and should be the treatment of choice.(b) We can conclude that drugs prolong life because the study was a comparative experiment.(c) We can’t conclude that drugs prolong life because the patients were volunteers.(d) We can’t conclude that that drugs prolong life because this was an observational study.(e) We can’t conclude that drugs prolong life because no placebo was used.

2) Consider an experiment to investigate the effectiveness of different insecticides in controlling pests and their impact on the productivity of tomato plants. What is the best reason for randomly assigning treatment levels (Spraying or not spraying) to the experimental units (farms)?

(a) Random assignment allows researchers to generalize conclusions about the effectiveness of the insecticides to all farms.(b) Random assignment will tend to average out all other uncontrolled factors such as soil fertility so that they are not confounded with the treatment effects(c) Random assignment eliminates the effects of other variables, like soil fertility(d) Random assignment eliminates chance variation in the response.(e) Random assignment helps avoid bias due to the placebo effect.

3) Bias in a sampling method is (a) Any difference between the sample result and the truth about the population.(b) The difference between the sample result and the truth about the population due to using chance to select a sample(c) Any difference between the sample result and the truth about the population due to practical difficulties such as contacting the subjects selected.(d) Any difference between the sample result and the truth about the population that tends to occur in the same direction whenever you use this sampling method(e) Racism or sexism on the part of those who take the sample.

4) The web portal AOL places opinion poll questions next to many of its news stories. Simply click your response to join the sample. One of the questions in January 2008 was “Do you plan to diet this year?” More than 30,000 people responded, with 68% saying “Yes”. You can conclude

(a) About 68% of Americans planned to diet in 2008.
(b) The poll used a convenience sample, so the results tell us little about the population of all adults.
(c) The poll uses voluntary response, so the results tell us little about the population of all adults.
(d) The sample is too small to draw any conclusion.
(e) None of these

5) A farmer is conducting an experiment to determine which variety of apple tree, Fuji or Gala, will produce more fruit in his orchard. The orchard is divided into 20 equally sized square plots. He has 10 trees of each variety and randomly assigns each tree to a separate plot in the orchard. What are the experimental units in this study?

(a) The trees (b) The plots (c) The apples (d) The farmer (e) The orchard

FREE RESPONSE

6) Elephants sometimes damage trees in Africa. It turns out that elephants dislike bees. They recognize bee hives in areas where they are common and avoid them. Can this be used to keep elephants away from trees? Will elephant damage be less in trees with hives? Will even empty hives keep elephants away? Researchers want to design an experiment to answer these questions using 72 acacia trees.

(a) Identify the experimental units, treatments, and the response variable.
(b) Describe how the researchers could carry out a completely randomized design for this experiment. Include a description of how the treatments should be assigned.

7) On odd-numbered days, the subjects took either a tablet that contained aspirin or a dummy pill that looked and tasted like aspirin but had no active ingredients (a placebo). On even-numbered days, they took either a capsule containing beta-carotene or a placebo. There were several response variables: the study looked for heart attacks, several kinds of cancer, and other medical outcomes. After several years, 239 of the placebo group but only 139 of the aspirin group had suffered heart attacks. This difference is large enough to give good evidence that taking aspirin does reduce heart attacks. It did not appear, however, that beta-carotene had any effect on preventing cancer.

Explain how each of the four principles of experimental design was used in the Physicians’ Health Study.

Chapter 5: Probability

Vocabulary
Law of Large Numbers - the proportion of times that a particular outcome occurs in many repetitions will approach a single number (the probability of that outcome)
Sample space (S) - the set of all possible outcomes
Probability model - a description of some chance process that consists of two parts: the sample space and the probability of each outcome
Event - any collection of outcomes from some chance process. Events are subsets of the sample space
Mutually exclusive (disjoint) - two events that have no outcomes in common and so can never occur together
Complement rule - $P(A) = 1 - P(A^C)$
Union ($A \cup B$) - all outcomes in A, B, or both
Intersection ($A \cap B$) - all outcomes in both A and B
General addition rule - $P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Conditional probability - the chance that one event will happen given that another event has already occurred, written $P(B \mid A)$

Conditional probability formula - $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
General multiplication rule - probability that events A and B both occur: $P(A \text{ and } B) = P(A \cap B) = P(A) \cdot P(B \mid A)$
Independent - knowledge of one outcome does not alter the probability of the other outcome. If A and B are independent, then $P(A \mid B) = P(A)$
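The conditional probability formula, the general multiplication rule, and the independence check can be tried out on a small two-way table. The sketch below is only an illustration: the counts are made up for this example and do not come from any question in this packet.

```python
from fractions import Fraction

# Hypothetical two-way table of counts (invented for illustration).
counts = {
    ("A", "B"): 30,
    ("A", "not B"): 20,
    ("not A", "B"): 30,
    ("not A", "not B"): 20,
}
total = sum(counts.values())

p_a = Fraction(counts[("A", "B")] + counts[("A", "not B")], total)
p_b = Fraction(counts[("A", "B")] + counts[("not A", "B")], total)
p_a_and_b = Fraction(counts[("A", "B")], total)

# Conditional probability formula: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# General multiplication rule: P(A and B) = P(A) * P(B | A)
multiplication_rule_holds = (p_a * p_b_given_a == p_a_and_b)

# Independence check: A and B are independent when P(A | B) = P(A)
independent = (p_a_given_b == p_a)

print("P(A | B) =", p_a_given_b)
print("Independent?", independent)
```

Exact fractions are used instead of decimals so that the equality check for independence is not affected by rounding.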

Summary
5 basic probability rules

1. For any event A, $0 \le P(A) \le 1$.
2. If S is the sample space in a probability model, $P(S) = 1$.
3. In the case of equally likely outcomes, $P(A) = \dfrac{\text{number of outcomes corresponding to event } A}{\text{total number of outcomes in } S}$.
4. Complement rule: $P(A^C) = 1 - P(A)$.
5. Addition rule for mutually exclusive events: $P(A \text{ or } B) = P(A) + P(B)$.
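The five rules above can be checked numerically. This is a minimal sketch, assuming a fair six-sided die as the chance process; the die and the helper name `is_legitimate` are illustrative choices, not part of the packet.

```python
import math

def is_legitimate(probabilities):
    """Rules 1 and 2: every probability is in [0, 1] and they sum to 1."""
    probabilities = list(probabilities)
    in_range = all(0 <= p <= 1 for p in probabilities)
    return in_range and math.isclose(sum(probabilities), 1.0)

# Hypothetical probability model: one roll of a fair six-sided die.
model = {face: 1 / 6 for face in range(1, 7)}
model_ok = is_legitimate(model.values())

# Rule 3 (equally likely outcomes): P(even) = 3 favorable / 6 total.
p_even = sum(model[face] for face in (2, 4, 6))

# Rule 4 (complement rule): P(not even) = 1 - P(even).
p_not_even = 1 - p_even

# Rule 5 (addition rule, mutually exclusive events): P(1 or 2) = P(1) + P(2).
p_one_or_two = model[1] + model[2]

print(model_ok, p_even, p_not_even, p_one_or_two)
```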

Tree Diagrams

[Tree diagram: tennis serve example. First branch: makes first serve (0.63) or misses first serve (0.37). Second branch: wins point or doesn’t win point. Branch products: (0.63) x (0.78) = 0.4914, (0.63) x (0.22) = 0.1386, (0.37) x (0.43) = 0.1591, (0.37) x (0.57) = 0.2109.]

Probability Model (ex: flipping a coin 3 times)
S = {HHH, THH, HTH, HHT, HTT, THT, TTH, TTT}

Tips/Mistakes
Don’t apply the Law of Large Numbers to a small number of trials.
Use a tree diagram when faced with a sequence of events.
Check for independence before diving into a problem.
Don’t confuse the Law of Large Numbers with the “Law of Averages.”
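The coin-flip sample space listed above can also be generated by enumeration, which is a handy way to double-check a hand-drawn tree diagram. The sketch below assumes a fair coin, so all 8 outcomes are equally likely; the helper name `probability` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Sample space for flipping a coin 3 times: every sequence of H and T.
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]

def probability(event):
    """P(event) for equally likely outcomes: favorable / total."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

p_exactly_two_heads = probability(lambda outcome: outcome.count("H") == 2)

# Complement rule: P(at least one head) = 1 - P(no heads).
p_at_least_one_head = 1 - probability(lambda outcome: outcome.count("H") == 0)

print(sample_space)
print(p_exactly_two_heads, p_at_least_one_head)
```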

MULTIPLE CHOICE

1) A randomly selected student is asked to respond Yes, No, or Maybe to the question “Do you intend to vote in the next election?” The sample space is {Yes, No, Maybe}. Which of the following represents a legitimate assignment of the probabilities for this sample space?

(a) 0.4, 0.4, 0.2 (b) 0.4, 0.4, 0.6 (c) 0.3, 0.3, 0.3 (d) 0.5, 0.4, 0.15

2) There are 10 red marbles and 8 green marbles in a jar. If you take 3 marbles from the jar (without replacement), the probability that they are all red is:

(a) 0.069 (b) 0.088 (c) 0.147 (d) 0.171

3) You play tennis regularly with a friend, and from past experience, you believe that the outcome of each match is independent. For any given match you have a probability of 0.6 of winning. The probability that you win the next two matches is:

(a) 0.16 (b) 0.36 (c) 0.4 (d) 0.6

4) Sara and Brandon are applying for summer jobs at a local restaurant. After interviewing them, the restaurant owner says, “The probability that I hire Sara is 0.7, and the probability that I hire Brandon is 0.4. The probability that I hire at least one of you is 0.9.” What is the probability that both Sara and Brandon get hired?

(a) 0.1 (b) 0.2 (c) 0.28 (d) 0.3

5) Select a random integer from -100 to 100. Which of the following pairs of events are mutually exclusive?

(a) the number is odd; the number is 5
(b) the number is even; the number is greater than 10
(c) the number is less than 5; the number is negative
(d) the number is above 50; the number is less than 20

FREE RESPONSE

6) Airlines routinely overbook flights because they expect a certain number of no-shows. An airline runs a 5 P.M. commuter flight from Washington, D.C., to New York City on a plane that holds 38 passengers. Past experience has shown that if 41 tickets are sold for the flight, then the probability distribution for the number who actually show up for the flight is as shown in the table below.

Number who actually show up   36    37    38    39    40    41
Probability                   0.46  0.30  0.16  0.05  0.02  0.01

Assume that 41 tickets are sold for each flight.

(a) There are 38 passenger seats on the flight. What is the probability that all passengers who show up for this flight will get a seat?
(b) What is the expected number of no-shows for this flight?
(c) Given that not all passenger seats are filled on a flight, what is the probability that only 36 passengers showed up for the flight?

7) Nine sales representatives, 6 men and 3 women, at a small company wanted to attend a national convention. There were only enough travel funds to send 3 people. The manager selected 3 people to attend and stated that the people were selected at random. The people selected were women. There were concerns that no men were selected to attend the convention.

(a) Calculate the probability that randomly selecting 3 people from a group of 6 men and 3 women will result in selecting 3 women.
(b) Based on your answer to part (a), is there reason to doubt the manager’s claim that the 3 people were selected at random? Explain.
(c) An alternative to calculating the exact probability is to conduct a simulation to estimate the probability. A proposed simulation process is described below.

Each trial in the simulation consists of rolling three fair, six-sided dice, one die for each of the convention attendees. For each die, rolling a 1, 2, 3, or 4 represents selecting a man; rolling a 5 or 6 represents selecting a woman. After 1,000 trials, the number of times the dice indicate selecting 3 women is recorded.

Does the proposed process correctly simulate the random selection of 3 women from a group of 9 people consisting of 6 men and 3 women? Explain why or why not.

Chapter 6: Random Variables

Vocabulary
Random variable - takes numerical values that describe the outcomes of some chance process.
Probability distribution - gives the possible values of a random variable and their probabilities.
Discrete vs. continuous random variables - a discrete random variable takes a fixed set of possible values with gaps between them; a continuous random variable takes all values in an interval (often measurements we cannot count). Binary variables are always discrete.
Binomial random variable - the count X of successes in a binomial setting
Binomial distribution - the probability distribution of X
Parameter n - the number of trials
Parameter p - the probability of success on each trial

Summary
Mean (expected value): $\mu_X = E(X) = \sum x_i p_i$
Standard deviation of X: $\sigma_X = \sqrt{\operatorname{Var}(X)} = \sqrt{\sum (x_i - \mu_X)^2 p_i}$

Effects of multiplying or dividing a random variable by a constant:
Shape - no change
Center - the mean and median are multiplied (or divided) by the constant
Spread - the standard deviation, interquartile range, and range are multiplied (or divided) by the absolute value of the constant

For any two random variables X and Y, if T = X + Y:
Expected value of T: $E(T) = \mu_T = \mu_X + \mu_Y$
Variance of T: $\sigma_T^2 = \sigma_X^2 + \sigma_Y^2$ **Only if X and Y are independent
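The mean and standard deviation formulas, and the rules for T = X + Y, can be verified by direct computation. This is a sketch under stated assumptions: the two distributions below are invented for illustration, and X and Y are assumed independent so that joint probabilities are products.

```python
import math
from itertools import product

def mean(dist):
    """Expected value: sum of x_i * p_i."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance: sum of (x_i - mean)^2 * p_i."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Hypothetical probability distributions (value -> probability).
X = {0: 0.2, 1: 0.5, 2: 0.3}
Y = {10: 0.5, 20: 0.5}

# Distribution of T = X + Y, ASSUMING X and Y are independent,
# so P(X = x and Y = y) = P(X = x) * P(Y = y).
T = {}
for (x, px), (y, py) in product(X.items(), Y.items()):
    T[x + y] = T.get(x + y, 0.0) + px * py

sd_X = math.sqrt(variance(X))

print("mean of T:", mean(T), "vs", mean(X) + mean(Y))
print("variance of T:", variance(T), "vs", variance(X) + variance(Y))
```

Note that the variances add, not the standard deviations: the standard deviation of T is the square root of the summed variances.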

Conditions for a binomial setting
Binary - can the outcomes be classified as “success” or “failure”?
Independent - the result of one trial must not affect the result of another trial.
Number - the number of trials of the chance process must be fixed beforehand.
Same - there is the same probability of success for each trial.

Calculating binomial probabilities: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$

Calculating binomial probabilities in calculator: $P(X \le k)$ = binomcdf and $P(X = k)$ = binompdf

Calculating the mean and standard deviation of a binomial distribution: mean $\mu_X = np$, standard deviation $\sigma_X = \sqrt{np(1-p)}$
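The binomial formula and its calculator counterparts can be reproduced in a few lines. The sketch below is illustrative only: `binom_pmf` and `binom_cdf` are hypothetical helper names chosen to mirror binompdf and binomcdf, and the values n = 10 and p = 0.5 are made up.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k): the role binompdf plays on the calculator."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k): the role binomcdf plays on the calculator."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Hypothetical setting: n = 10 trials, success probability p = 0.5.
n, p = 10, 0.5
p_exactly_3 = binom_pmf(3, n, p)
p_at_most_3 = binom_cdf(3, n, p)
p_more_than_3 = 1 - p_at_most_3  # complement rule for "more than"

mean_X = n * p
sd_X = math.sqrt(n * p * (1 - p))

print(p_exactly_3, p_at_most_3, p_more_than_3, mean_X, sd_X)
```

A common slip is to use the cumulative function for "more than k" without taking the complement; the `p_more_than_3` line shows the correct form.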

10% condition: $n \le \frac{1}{10} N$, where n = sample size and N = population size

Calculating geometric probabilities: $P(Y = k) = (1-p)^{k-1} p$

Calculating geometric probabilities in calculator: $P(Y \le k)$ = geometcdf and $P(Y = k)$ = geometpdf

Geometric distribution: mean $\mu_Y = E(Y) = \frac{1}{p}$
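The geometric formulas can be sketched the same way. The helper names `geom_pmf` and `geom_cdf` are hypothetical stand-ins for geometpdf and geometcdf, and p = 0.2 is an invented value for illustration.

```python
import math

def geom_pmf(k, p):
    """P(Y = k): first success occurs on trial k (k = 1, 2, 3, ...)."""
    return (1 - p) ** (k - 1) * p

def geom_cdf(k, p):
    """P(Y <= k): first success occurs on or before trial k."""
    return 1 - (1 - p) ** k

# Hypothetical success probability on each trial.
p = 0.2
p_first_success_on_3 = geom_pmf(3, p)
p_success_within_3 = geom_cdf(3, p)
expected_trials = 1 / p  # mean of the geometric distribution

print(p_first_success_on_3, p_success_within_3, expected_trials)
```

The closed form used in `geom_cdf` follows from the complement rule: the first success comes after trial k only if the first k trials are all failures.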

Tips/Mistakes
Sketch a graph to help you visualize the problem.
Make sure your answer is logical and relates to the problem.
Adding or subtracting a constant has no effect on the standard deviation!
When using a calculator to find the mean and standard deviation, write down the first few terms of the calculation in order to receive credit.

MULTIPLE CHOICE

1) Choose an American household at random and let the random variable X be the number of cars (including SUVs and light trucks) they own. Here is the probability model if we ignore the few households that own more than 5 cars:

Number of cars, X:   0     1     2     3     4     5
Probability:         0.09  0.36  0.35  0.13  0.05  0.02

What is the expected number of cars in a randomly selected American household?

(a) 1.00 (b) 1.75 (c) 1.84 (d) 2.00 (e) 2.50

2) Using the mean (expected number) in Q1 and knowing the standard deviation of X is 1.08, about what percentage of households have a number of cars within 2 standard deviations of the mean?

(a) 68% (b) 71% (c) 93% (d) 95% (e) 98%

3) Seventeen people have been exposed to a particular disease. Each one independently has a 40% chance of contracting the disease. A hospital has the capacity to handle 10 cases of the disease. What is the probability that the hospital’s capacity will be exceeded?

(a) 0.011 (b) 0.035 (c) 0.092 (d) 0.965 (e) 0.989

4) Joe reads that 1 out of 4 eggs contains salmonella bacteria. So he never uses more than 3 eggs in cooking. If eggs do or do not contain salmonella independently of each other, the number of contaminated eggs when Joe uses 3 chosen at random has the following distribution:

(a) Binomial; n = 4 and p = 1/4
(b) Binomial; n = 3 and p = 1/4
(c) Binomial; n = 3 and p = 1/3
(d) Geometric; p = 1/4
(e) Geometric; p = 1/3

5) In which of the following situations would it be appropriate to use a Normal distribution to approximate probabilities for a binomial distribution with the given values of n and p?

(a) n = 10, p = 0.5
(b) n = 40, p = 0.88
(c) n = 100, p = 0.2
(d) n = 100, p = 0.99
(e) n = 1000, p = 0.003

FREE RESPONSE

6) Mr. Bullard's AP statistics class did the activity on pg 346. There were 21 students in the class. If we assume that the students in his class could not tell tap water from bottled water, then each one was basically guessing with a 1/3 chance of being correct. Let X = the number of students who correctly identified the cup containing the different type of water.

(a) Explain why X is a binomial random variable.
(b) Find the mean and standard deviation of X. Interpret each value in context.
(c) Of the 21 students in the class, 13 made correct identifications. Are you convinced that Mr. Bullard’s students could tell bottled water from tap water? Justify your answer.

7) Researchers randomly select a married couple in which both spouses are employed. Let X be the income of the husband and Y be the income of the wife. Suppose that you know the means $\mu_X$ and $\mu_Y$ and the variances $\sigma_X^2$ and $\sigma_Y^2$ of both variables.

(a) Is it reasonable to take the mean of the total income X + Y to be $\mu_X + \mu_Y$? Explain your answer.
(b) Is it reasonable to take the variance of the total income to be $\sigma_X^2 + \sigma_Y^2$? Explain your answer.

8) Buckley Farms produces homemade potato chips that sell in bags labeled “16 ounces”. The total weight of each bag follows an approximately Normal distribution with a mean of 16.15 ounces and a standard deviation of 0.12 ounces.

(a) If you randomly selected 1 bag of these chips, what is the probability that the total weight is less than 16 ounces?
(b) If you randomly selected 10 bags of these chips, what is the probability that exactly 2 of the bags will have a total weight less than 16 ounces?

Chapter 7: Sampling Distributions

Vocabulary

Parameter - a number that describes a population
Statistic - a number that describes a sample
Sampling variability - the value of a statistic varies in repeated random sampling; this matters because it tells us how close our estimates are likely to be to the truth.
Central Limit Theorem - when n is large, the sampling distribution of the sample mean is approximately Normal.
Unbiased estimator - a statistic used to estimate a parameter is unbiased only if the mean of its sampling distribution equals the value of the parameter being estimated
Biased estimator - the statistic is consistently lower or higher than the parameter

Summary

Sampling Distribution of Sample Proportion: $\mu_{\hat{p}} = p$, $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$

1. Random sampling
2. Independent: 10% condition
3. Large counts condition: $np \ge 10$, $n(1-p) \ge 10$

Sampling Distribution of Sample Mean: $\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

1. & 2. Same
3. If the population is not Normal, large sample condition: $n \ge 30$
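The formulas and conditions above can be bundled into two small helpers. This is a sketch, not a prescribed procedure: the function names, the optional `population_size` argument, and all the numbers are assumptions made for illustration.

```python
import math

def proportion_sampling_dist(p, n, population_size=None):
    """Mean, SD, and condition checks for the sampling distribution of p-hat."""
    return {
        "mean": p,
        "sd": math.sqrt(p * (1 - p) / n),
        # Large counts condition: np >= 10 and n(1 - p) >= 10
        "large_counts": n * p >= 10 and n * (1 - p) >= 10,
        # 10% condition: n <= (1/10) N; unknown N is treated as "very large"
        "ten_percent": population_size is None or n <= 0.10 * population_size,
    }

def mean_sampling_dist(mu, sigma, n):
    """Mean and SD of the sampling distribution of x-bar."""
    return mu, sigma / math.sqrt(n)

# Hypothetical values chosen only for illustration.
prop = proportion_sampling_dist(p=0.4, n=100, population_size=5000)
xbar_mean, xbar_sd = mean_sampling_dist(mu=50, sigma=12, n=36)

print(prop)
print(xbar_mean, xbar_sd)
```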

7.1 What is a Sampling Distribution?
- parameter vs. statistic
- statistics vary
- distribution of a population vs. distribution of a sample
- how to describe/interpret a sampling distribution

7.2 Sample Proportions
- Large counts: $np \ge 10$, $n(1-p) \ge 10$ (checking for Normality)
- The sample proportion $\hat{p}$ is an unbiased estimator of p
- The variability of the distribution of $\hat{p}$ is smaller when the sample is larger

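7.3 Sample Means
- If the population is Normal, the sampling distribution of $\bar{x}$ is Normal
- If the population is not Normal, the sampling distribution of $\bar{x}$ resembles the population’s shape for small n but is approximately Normal when $n \ge 30$; this is known as the Central Limit Theorem
- $\bar{x}$ is an unbiased estimator of the population mean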
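A short simulation can make the points in 7.3 concrete. The sketch below assumes a right-skewed population (exponential with mean 1 and standard deviation 1, chosen only for illustration) and draws many samples of size n = 40 to see where the sample means center and how much they vary.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

n = 40              # size of each sample
num_samples = 5000  # number of simulated samples

# Hypothetical right-skewed population: exponential with mean 1 and SD 1.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

center = statistics.fmean(sample_means)  # should be close to mu = 1
spread = statistics.stdev(sample_means)  # should be close to sigma / sqrt(n)
theoretical_spread = 1 / math.sqrt(n)

print(f"mean of sample means: {center:.3f}")
print(f"SD of sample means:   {spread:.3f} (theory: {theoretical_spread:.3f})")
```

The simulated values will not match the theory exactly, but with this many samples they should agree to about two decimal places.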

Tips/Mistakes

If you’re unsure about notation then don’t use it! You could potentially earn a P instead of an E because of incorrect symbols being used.

The same goes for terminology: do not say “sample distribution” when you mean “sampling distribution.” Be very careful with both symbols and terminology!

Be specific with which distribution you are referring to—“the distribution is …” is not enough.

MULTIPLE CHOICE

1) A researcher initially plans to take an SRS of size n from a population that has mean 80 and standard deviation 20. If he were to double his sample size, the standard deviation of the sampling distribution of the sample mean would be multiplied by…

(a) $\sqrt{2}$ (b) $\frac{1}{\sqrt{2}}$ (c) 2 (d) $\frac{1}{2}$ (e) $\frac{1}{\sqrt{2n}}$

2) Which of the following statements are true?
I. Sample parameters are used to make inferences about populations.
II. Statistics from smaller samples have more variability.
III. Parameters are fixed while statistics vary depending on which sample is chosen.

(a) I and II (b) I and III (c) II and III (d) I, II, and III (e) None of the above.

3) The Gallup Poll has decided to increase the size of its random sample of voters from about 1500 people to about 4000 people right before an election. The poll is designed to estimate the proportion of voters who favor a new law banning smoking in public buildings. The effect of this increase is to…

(a) reduce the bias of the estimate.
(b) increase the bias of the estimate.
(c) reduce the variability of the estimate.
(d) increase the variability of the estimate.
(e) reduce the bias and variability of the estimate.

4) Which of the following statements are true?
I. The sampling distribution of $\hat{p}$ has a mean equal to the population proportion p.
II. The sampling distribution of $\hat{p}$ has a standard deviation equal to $\sqrt{np(1-p)}$.
III. The sampling distribution of $\hat{p}$ is considered close to Normal, provided that $n \ge 30$.

(a) I and II (b) I and III (c) II and III (d) I, II, and III (e) None of the above.

5) Which of the following statistics are unbiased estimators for the corresponding population parameters?
I. Sample means
II. Sample proportions
III. Difference of sample means
IV. Difference of sample proportions

(a) None of the above. (b) I and II (c) I and III (d) III and IV (e) All of the above.

FREE RESPONSE

6) The mathematics department at a state university notes that the SAT math scores of high school seniors applying for admission into their program are normally distributed with a mean of 610 and standard deviation of 50.

(a) What is the probability that a randomly chosen applicant to the department has an SAT math score above 700?
(b) What is the shape, mean, and standard deviation of the sampling distribution of the mean of a sample of 40 randomly selected applicants?
(c) What is the probability that the mean SAT math score in an SRS of 40 applicants is above 625?
(d) Would your answers to parts (a), (b), or (c) be affected if the original population of SAT math scores were highly skewed instead of Normal? Explain.

7) Suppose that the heights of college basketball players are normally distributed with a mean of 74 inches and a standard deviation of 4 inches.

(a) What percentage of players are over 7 feet?
(b) What is the probability that at least one of ten randomly selected players is over 7 feet?
(c) If an outlier is defined to be any value more than 1.5 interquartile ranges above the third quartile or below the first quartile, what percentage of heights of players are outliers?

Chapter 8: Confidence Intervals (1-sample)

Vocabulary
Confidence Interval - an interval of plausible values for a parameter. It is calculated as point estimate ± margin of error.
Point Estimate - a single best guess for the value of a population parameter.
Margin of Error - how far we expect the sample statistic to vary from the population parameter at most.
Confidence Level - the overall success rate of the method for calculating the confidence interval. That is, in C% of all possible samples, the method would yield an interval that captures the true parameter value.
Standard Error of the Statistic - the result when the standard deviation of a statistic is estimated from data.
Standard Error of the Sample Mean - describes how far $\bar{x}$ will typically be from $\mu$ in repeated SRSs of size n.

Summary
Interpret a confidence interval and confidence level in context.
Describe and determine whether the Random, 10%, and Large Counts conditions are met so that a confidence interval can be constructed for a population proportion.
o Large Counts: $n\hat{p} \ge 10$, $n(1-\hat{p}) \ge 10$
Describe and determine whether the Random, 10%, and Normal/Large Sample conditions are met so that a confidence interval can be constructed for a population mean.
o Normal/Large Sample: the population is Normal or $n \ge 30$
Determine the point estimate from a confidence interval: it is a single value on the number line, the midpoint of the interval.
Determine the margin of error of a confidence interval: (critical value) × (standard deviation of statistic)
Calculate a confidence interval (formula)
o statistic ± (critical value) × (standard deviation of statistic)
Distinguish the standard deviation of a statistic and the standard error of a statistic. Both measure the typical distance of the statistic from its mean, but we use the standard deviation when the population parameters are known and the standard error when they must be estimated from sample data.
Calculate the standard error of a sample proportion: $SE_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Identify and calculate a critical value.
o z* = critical value; CI = statistic ± (z*)(SE)
o The critical value measures how many standard errors we need to extend the interval to get the desired confidence level
Calculate a 1-sample z interval for a proportion and a 1-sample t interval for a mean.
o $\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
o $\bar{x} \pm t^* \dfrac{s_x}{\sqrt{n}}$
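The 1-sample z interval for a proportion can be sketched in code to show how the point estimate, critical value, and standard error fit together. The function name `one_prop_z_interval` and the data (120 successes in a sample of 200) are hypothetical and chosen only for illustration; the t interval is not sketched here because the t* critical value needs a t table or a statistics library.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence=0.95):
    """Return (low, high) for p-hat ± z* · sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n                                # point estimate
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z_star * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error

# Hypothetical sample: 120 successes out of n = 200.
successes, n = 120, 200

# Large Counts check uses the observed counts of successes and failures.
large_counts_ok = successes >= 10 and (n - successes) >= 10

low, high = one_prop_z_interval(successes, n, confidence=0.95)
print(f"95% CI: ({low:.4f}, {high:.4f}); Large Counts met: {large_counts_ok}")
```

On the exam the interval alone is not enough: the conditions still need to be checked and the interval interpreted in context.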

Tips/Mistakes
Ensure all conditions for constructing a confidence interval are met (proportion or mean). If one of the conditions is violated, the actual capture rate will be off from the intended confidence level.
Always perform the four-step process when calculating a confidence interval. This process was designed from the AP rubric and doing it will maximize the number of points you can receive.
If you are uncertain whether you should perform a 1-Sample Z Interval or a 1-Sample T Interval, remember zap (proportion) tax (mean)!
When you are constructing a confidence interval for a mean, not including a graph of the sample data if n < 30 could cause you to lose credit for the Normal/Large Sample Condition on the AP Exam.
When you are constructing a confidence interval for a mean, not understanding that the Normal/Large Sample Condition is about the population being Normal could cause you to lose credit.
Understand that the confidence level does not tell us the chance that a particular confidence interval captures the population parameter. Rather, the confidence interval gives us a set of plausible values for the parameter.
On the AP Exam, you may be asked to interpret the confidence interval, the confidence level, or both. Ensure you are able to differentiate the two.
It is OK to use your calculator to compute a confidence interval on the AP Exam, but ensure you name the procedure and give the interval. If you just give the calculator answer with no work, you could receive no credit if the interval is wrong.

MULTIPLE CHOICE

1) A health fitness research group wishes to estimate the mean amount of time (in hours) that members of a fitness center spend each week exercising at the center. They want to estimate the mean within a margin of error of 0.5 hours with a 95% level of confidence. Previous data suggests that σ = 2.2. Which of the following is the smallest sample size that meets these criteria?

(a) 60 (b) 75 (c) 90 (d) 180 (e) 190

2) A 90% confidence interval for a population mean is determined to be 800 to 900. If the confidence level is increased to 95% while the sample statistics and sample size remain the same, the confidence interval for μ

(a) becomes narrower (b) becomes 0.05 (c) does not change (d) becomes wider (e) becomes 0.025

3) A random sample of 100 visitors to a popular theme park spent an average of $142 on the trip with a standard deviation of $47.50. Which of the following would be the 98% confidence interval for the mean money spent by all visitors to this theme park?

(a) ($130.77, $153.23) (b) ($132.57, $151.43) (c) ($132.69, $151.31) (d) ($140.88, $143.12) (e) ($95.45, $188.55)

4) A quality control specialist at a plate glass factory must estimate the mean clarity rating of a new batch of glass sheets being produced using a sample of 18 sheets of glass. The actual distribution of this batch is unknown, but preliminary investigations show that a normal approximation is reasonable. The specialist decides to use a t-distribution rather than a z-distribution because

(a) The z-distribution is not appropriate because the sample size is too small.
(b) The sample size is large compared to the population size.
(c) The data comes from only one batch.
(d) The variability of the batch is unknown.
(e) The t-distribution results in a narrower confidence interval.

5) A biologist has taken a random sample of a specific type of fish from a large lake. A 95% confidence interval was calculated to be 6.8 ± 1.2 pounds. Which of the following is true?

(a) 95% of all fish in the lake weigh between 5.6 and 8 pounds.
(b) In repeated sampling, 95% of the sample proportions will fall within 5.6 and 8 pounds.
(c) In repeated sampling, 95% of the time the true population of mean fish weights will be equal to 6.8 pounds.
(d) In repeated sampling, 95% of the time the true population mean of fish weight will be captured in the constructed interval.
(e) We are 95% confident that all the fish weigh less than 8 pounds in this lake.

FREE RESPONSE

6) A random survey finds that 587 out of 675 adults claim never to take naps during the day. Construct a 95% confidence interval for the proportion of adults who never take naps.

7) An SRS of 1000 voters finds that 57% believe that competence is more important than character in voting for President of the United States.

(a) Determine a 95% confidence interval estimate for the percentage of voters who believe competence is more important than character. Assume the conditions for inference have been met.

(b) If your parents know nothing about statistics, how would you explain to them why you couldn’t simply say that 57% of voters believe that competence is more important?

(c) Also explain to your parents what is meant by 95% confidence level.

Chapter 9: Significance Tests (1-sample)

Vocabulary
Significance test - a formal procedure for using observed data to decide between two competing claims (hypotheses).
Null hypothesis (Ho) - the statement of no difference
Alternative hypothesis (Ha) - the claim about the population that we are trying to find evidence for
One-sided - states that the parameter is larger (or smaller) than the null hypothesis value
Two-sided - states that the parameter is different from the null hypothesis value
P-value - the probability, computed assuming Ho is true, that the statistic would take a value as extreme as or more extreme than the one observed.
Reject Ho - if our sample result is too unlikely to have happened by chance assuming Ho is true (P-value < α, for example α = 0.05).
Fail to reject Ho - we say that there is NOT convincing evidence for Ha.
Test statistic - measures how far the sample statistic diverges from what we would expect if Ho were true.
Power - the probability that the test will reject the null at a chosen significance level when the specified alternative value of the parameter is true.
Paired data - study designs that involve making two observations on the same individual or one observation on each of 2 similar individuals

Summary

Hypotheses are usually stated in terms of population parameters. Often Ho is a statement of no change, whereas Ha is a statement of what we hope or suspect is true.

Conditions for 1-sample z test: Random, 10% condition, Large Counts: $np_0 \ge 10$, $n(1-p_0) \ge 10$

Confidence Intervals: provide additional information that significance tests do not. (Formula chart)

There is an important link between the probabilities of Type I and Type II Errors in significance tests- as one increases, the other decreases.

We can increase the power of a significance test by increasing the sample size, increasing the significance level, or increasing the difference that is important to detect between the Ho and Ha parameter values.

Conditions for 1-sample t test:
o Random
o Independence: 10% condition
o Normal/large sample: $n \ge 30$. If n < 30, examine a graph of the sample data for Normality.

There are 3 factors that influence the sample size required for a statistical test: significance level, effect size, and the desired power of the test.

Many tests run at once will probably produce some significant results by chance alone, even if all the null hypotheses are true.

Errors:
                      Ho True             Ha True
Reject Ho             Type I error        Correct decision
Fail to reject Ho     Correct decision    Type II error
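The pieces of a 1-sample z test for a proportion (test statistic, P-value, and decision) can be sketched as follows. The function name `one_prop_z_test`, its `alternative` argument, and the data (60 successes in 100 trials, testing Ho: p = 0.5 against Ha: p > 0.5) are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for a 1-sample z test of Ho: p = p0."""
    p_hat = successes / n
    # The standard deviation uses p0 because the P-value assumes Ho is true.
    sd = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    cdf = NormalDist().cdf
    if alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "less":
        p_value = cdf(z)
    else:  # two-sided
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in n = 100, Ho: p = 0.5, Ha: p > 0.5.
n, p0, alpha = 100, 0.5, 0.05
large_counts_ok = n * p0 >= 10 and n * (1 - p0) >= 10

z, p_value = one_prop_z_test(60, n, p0, alternative="greater")
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"z = {z:.2f}, P-value = {p_value:.4f}: {decision}")
```

If Ho were actually true here, rejecting it would be a Type I error; if Ha were true and the test failed to reject Ho, that would be a Type II error.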

Tips/Mistakes
*Never* accept the Ho!
Always label the variables that you use!
o Example: Ho: µ1 = µ2, where µ is the….
Zap Tax! Where p = proportion and x = mean
If you are unsure of which variable notation to use on the FRAPPY portion of the exam, go straight to plugging the given numbers into the equation!
The four-step process: State, Plan, Do, Conclude

MULTIPLE CHOICE

1) In formulating hypotheses for a statistical test of significance, the null hypothesis is often

(a) A statement of “no effect” or no difference
(b) The probability of observing the data you actually obtained
(c) A statement that the data are all 0
(d) 0.05
(e) The probability that the parameter value is actually µ

2) The P-value of a test of a null hypothesis is the probability that

(a) assuming that the null hypothesis is false, the test statistic will take a value at least as extreme as that actually observed
(b) the null hypothesis is true
(c) the null hypothesis is false
(d) the alternative hypothesis is true
(e) assuming that the null hypothesis is true, the test statistic will take a value at least as extreme as that actually observed

3) I conduct a statistical test of hypotheses and find that the result is statistically significant at the level α = 0.05. I may conclude:

(a) The test would also be significant at level α = 0.10
(b) The test would also be significant at level α = 0.01
(c) The P-value is less than 0.05
(d) Both A and C
(e) Both B and C

4) A researcher plans to conduct a test of hypotheses at the α = 0.01 significance level. She designs her experiment to have a power of 0.99 at a particular alternative value of the parameter of interest. The probability that the researcher will commit a Type I error is

(a) 0.01 (b) 0.10 (c) 0.89 (d) 0.99 (e) None of these

5) Which of the following will increase the power of a statistical test of significance?

(a) Increase the Type II error probability
(b) Increase the sample size
(c) Reject the null hypothesis only if the P-value is smaller than the level of significance
(d) Decrease the significance level
(e) All of the above

FREE RESPONSE

6) A certain intelligence test is designed to have a population of scores following a Normal distribution with a mean score of 100. Below are scores on this intelligence test from 6 randomly selected undergraduate students from Thorndike University.

110 118 110 122 110 150

(a) Do these scores suggest that, on average, the population of undergraduates at Thorndike University have higher than average intelligence scores? Carry out an appropriate test at the 5% level to help answer this question.
(b) What would constitute a Type I error for this test?

7) A drug manufacturer claims that 9 out of 10 doctors recommend aspirin for their patients with headaches. To test this claim, a random sample of 100 doctors is obtained. Of these 100 doctors, 82 indicate that they recommend aspirin.

(a) Do these results support the claim of the drug manufacturer? Support your conclusion with a test of significance.
(b) What would constitute a Type II error for this test?

Chapter 10: Comparing Two Populations or Groups

Vocabulary
Standard Error - The standard error is a measure of the variability of a statistic. It is an estimate of the standard deviation of a sampling distribution. The standard error depends on three factors:

N: The number of observations in the population.
n: The number of observations in the sample.
The way that the random sample is chosen.

Test Statistic - In hypothesis testing, the test statistic is a value computed from sample data. The test statistic is used to assess the strength of evidence against a null hypothesis.
Parameter - A parameter is a measurable characteristic of a population, such as a mean or a standard deviation.
Sampling Distribution - Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a statistic (e.g., a mean, proportion, standard deviation) for each sample. The probability distribution of this statistic is called a sampling distribution.

Summary

In this chapter, one learns to:

Describe the characteristics of the sampling distribution of the difference between two sample means.
Calculate probabilities using the sampling distribution of the difference between two sample means.
Determine whether the conditions for performing inference are met.
Use two-sample t procedures to compare two means based on summary statistics or raw data.
Interpret computer output for two-sample t procedures.
Perform a significance test to compare two means.
Interpret the results of inference procedures.
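As one way to see what a two-sample t procedure computes from summary statistics, here is a minimal Python sketch. The summary statistics and the function name are hypothetical, made up for illustration. It uses the unpooled standard error and reports both the technology (Welch) degrees of freedom and the conservative choice, the smaller of n1 − 1 and n2 − 1. Finding a P-value from t and df would still require a t table or a calculator.

```python
import math


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, standard error, Welch df) for an unpooled two-sample t test."""
    v1 = sd1 ** 2 / n1                  # variance contribution from sample 1
    v2 = sd2 ** 2 / n2                  # variance contribution from sample 2
    se = math.sqrt(v1 + v2)             # standard error of (x-bar 1 - x-bar 2)
    t = (mean1 - mean2) / se            # test statistic when H0 says the means are equal
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, se, df


# Hypothetical summary statistics (assumptions, not data from this packet).
mean1, sd1, n1 = 52.0, 8.0, 16
mean2, sd2, n2 = 46.0, 6.0, 9

t, se, df = two_sample_t(mean1, sd1, n1, mean2, sd2, n2)
conservative_df = min(n1, n2) - 1

print(f"t = {t:.3f}, SE = {se:.3f}")
print(f"Welch df = {df:.2f}, conservative df = {conservative_df}")
```

The conservative df is easier to compute by hand and gives a slightly larger P-value than the Welch df, so it never overstates the evidence.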

Tips/Mistakes

Use 10% to check conditions, not > 30.
Name the procedure when using the 2-PropZInt on the calculator (two proportion z interval).
Always check conditions in the Plan step. (And don’t forget State, Plan, Do, Conclude.)
Designs with higher power are best.
Just say no to pooling!
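For readers who want to see what a two proportion z interval calculates, the following Python sketch reproduces the arithmetic under stated assumptions. The counts are hypothetical, the function name is made up for illustration, and the "at least 10 successes and 10 failures" check is the usual AP large counts convention rather than something stated in the tips above. It is not a substitute for naming the procedure and checking conditions in a written answer.

```python
import math
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, conf_level=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    phat1 = x1 / n1
    phat2 = x2 / n2
    # Critical value z* for the requested confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se = math.sqrt(phat1 * (1 - phat1) / n1 + phat2 * (1 - phat2) / n2)
    diff = phat1 - phat2
    return diff - z_star * se, diff + z_star * se


def large_counts_ok(x1, n1, x2, n2, minimum=10):
    """Check that every success and failure count is at least `minimum`."""
    return min(x1, n1 - x1, x2, n2 - x2) >= minimum


# Hypothetical counts (assumptions, not data from this packet).
x1, n1, x2, n2 = 45, 100, 30, 100

low, high = two_prop_z_interval(x1, n1, x2, n2, conf_level=0.95)
print(f"large counts condition met: {large_counts_ok(x1, n1, x2, n2)}")
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

An interval that lies entirely above 0 would suggest that p1 is larger than p2; an interval that contains 0 would not rule out equal proportions.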

MULTIPLE CHOICE

1) An SRS of size 100 is taken from a population having proportion 0.8 successes. An independent SRS of size 400 is taken from a population having proportion 0.5 successes. The sampling distribution of the difference in the sample proportions has what mean?

(a) 0.3
(b) 0.15
(c) The smaller of 0.8 and 0.5
(d) The mean cannot be determined without the sampling results.
(e) None of the above

2) We wish to test if a new feed increases the mean weight gain compared to an old feed. At the conclusion of the experiment it was found that the new feed gave a 10 kg bigger gain than the old feed. A two-sample t test with the proper one-sided alternative was done and the resulting P-value was .082. This means that

(a) There is an 8.2% chance the null hypothesis is true.
(b) There was only an 8.2% chance of observing an increase greater than 10 kg (assuming the null hypothesis was true).
(c) There was only an 8.2% chance of observing an increase greater than 10 kg (assuming the null hypothesis was false).
(d) There is an 8.2% chance the alternative hypothesis is true.
(e) There is only an 8.2% chance of getting a 10 kg increase.

3) Suppose we have two SRSs from two distinct populations and the samples are independent. We measure the same variable for both samples. Suppose both populations of the values of these variables are Normally distributed but the means and standard deviations are unknown. For purposes of comparing the two means, we use

(a) Two-sample t procedures
(b) Matched pairs t procedures
(c) Two-proportion z procedures
(d) The least squares regression line
(e) None of the above.

4) A study of road rage asked separate random samples of 596 men and 523 women about their behavior while driving. Based on their answers, each respondent was assigned a road rage score on a scale of 0 to 20. Are the conditions for performing a two-sample t test satisfied?

(a) Maybe; we have independent random samples, but we need to look at the data to check Normality.
(b) No; road rage scores in a range between 0 and 20 can’t be Normal.
(c) No; we don’t know the population standard deviations.
(d) Yes; the large sample sizes guarantee that the corresponding population distributions will be Normal.
(e) Yes; we have two independent random samples and large sample sizes.

5) Researchers are interested in evaluating the effect of a natural product on reducing blood pressure. This will be done by comparing the mean reduction in blood pressure of a treatment (natural product) group and a placebo group using a two-sample t test. The researchers would like to be able to detect whether the natural product reduces blood pressure by at least 7 points more, on average, than the placebo. If groups of size 50 are used in the experiment, a two-sample t test using α = 0.01 will have a power of 80% to detect a 7-point difference in mean blood pressure reduction. If the researchers want to be able to detect a 5-point difference instead, then the power of the test

(a) Would be less than 80%.
(b) Would be greater than 80%.
(c) Would still be 80%.
(d) Could be either less than or greater than 80%, depending on whether the natural product is effective.
(e) Would vary depending on the standard deviation of the data.

FREE RESPONSE

6) Observational studies suggest that moderate use of alcohol by adults reduces heart attacks and that red wine may have special benefits. One reason may be that red wine contains polyphenols, substances that do good things to cholesterol in the blood and so may reduce the risk of heart attacks. In an experiment, healthy men were assigned at random to drink half a bottle of either red or white wine each day for two weeks. The level of polyphenols in their blood was measured before and after the two-week period. Here are the percent changes in level for subjects in both groups:

Red:   3.5  8.1  7.4  4.0  0.7  4.9  8.4  7.0  5.5
White: 3.1  0.5  -3.8  4.1  -0.6  2.7  1.9  -5.9  0.1

(a) Make a dotplot and write a few sentences comparing the distributions.
(b) Construct and interpret a 90% confidence interval for the difference in mean percent change in polyphenol levels for the red wine and the white wine treatments.
(c) Does the interval in part (b) suggest that red wine is more effective than white wine? Explain.

7) Nicotine patches are often used to help smokers quit. Does giving medicine to fight depression also help? A randomized double-blind experiment assigned 244 smokers to receive nicotine patches and another 245 to receive both a patch and the antidepressant drug bupropion. Results: After a year, 40 subjects in the nicotine patch group had abstained from smoking, as had 87 in the patch-plus-drug group.

(a) Is this good evidence that adding bupropion increases the success rate? Carry out an appropriate test to help answer this question.
(b) Construct and interpret a 99% confidence interval for the difference in population proportions.

Chapter 11: Distributions of Categorical Data

Vocabulary

Chi-Square statistic - Measures how far the observed counts are from the expected counts.

Chi-Square Test for Independence - Used to test whether there is an association between two categorical variables in the population of interest.

One-way table - Often used to display the distribution of a single categorical variable for a sample of individuals.

Expected counts - The expected number of individuals in the sample that would fall in each cell of the one-way or two-way table if Ho were true.

Summary

Goodness of Fit: A goodness of fit test is used to help determine whether a population has a certain hypothesized distribution, expressed as proportions of individuals in the population falling into various outcome categories.
o Conditions: Random, Independent (10% condition), Expected Counts ≥ 5
o Degrees of freedom: df = (number of categories) − 1
o Expected counts = n × p0, where n is the sample size and p0 is the hypothesized proportion for that category
o H0: The distribution of ______ is the same as the stated/hypothesized distribution.

Homogeneity: Used to test whether there is a difference in the distribution of a categorical variable for several populations or treatments.
o Conditions: Random, Independent (10% condition), Expected Counts ≥ 5
o Degrees of freedom: df = (# rows − 1)(# columns − 1)
o Expected counts = (row total)(column total) / (table total)
o H0: There is no difference in true distributions of ________ and ________.

Independence: Used to test whether two categorical variables are associated. Two variables are associated if knowing the value of one variable helps predict the value of the other. If knowing one variable does not help predict the value of the other variable, there is no association between the variables.
o Conditions: Random, Independent (10% condition), Expected Counts ≥ 5
o Degrees of freedom: df = (# rows − 1)(# columns − 1)
o Expected counts = (row total)(column total) / (table total)
o H0: There is no association between _____ and ______ in population of ______.
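The expected count and degrees of freedom formulas for a two-way table can be checked with a few lines of Python. This is a sketch, with function names chosen only for illustration. The observed counts are the termite data from multiple choice question 2 later in this chapter (same colony: 40 aggressive, 9 nonaggressive; different colonies: 31 aggressive, 24 nonaggressive), so the result can be compared with the expected counts and the χ2 value printed in that question.

```python
def expected_counts(observed):
    """Expected count for each cell: (row total)(column total) / (table total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: same colony, different colonies. Columns: aggressive, nonaggressive.
observed = [[40, 9], [31, 24]]

expected = expected_counts(observed)
chi2 = chi_square_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Large counts condition uses the EXPECTED counts, not the observed ones.
large_counts_met = all(e >= 5 for row in expected for e in row)

print("expected counts:", [[round(e, 1) for e in row] for row in expected])
print(f"chi-square = {chi2:.3f}, df = {df}, large counts met: {large_counts_met}")
```

Note that the calculation uses counts throughout; substituting proportions for the observed and expected counts would give a different, incorrect statistic.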

Tips/Mistakes

Failing to reject Ho does not mean the null hypothesis is true! It means only that we cannot conclude Ho is false. All we can say is that the sample did not provide convincing evidence to reject Ho.
The Chi-Square statistic compares observed and expected counts. Don’t try to perform calculations with observed and expected proportions in each category.
When checking the large counts condition, be sure to examine expected counts, NOT observed counts.
We cannot use a Chi-Square test for a one-sided alternative hypothesis.

MULTIPLE CHOICE

1) If χ2 = 69.8 and P-value ≈ 0, and assuming researchers used a significance level of 0.05, which of the following is true?

(a) A Type I error is possible.
(b) A Type II error is possible.
(c) Both a Type I and Type II error are possible.
(d) There is no chance of making Type I or Type II error because the P-value is approximately 0.
(e) There is no chance of making Type I or Type II error because the calculations are correct.

2) An investigator was studying a territorial species of Central American termites, Nasutitermes corniger. Forty-nine termite pairs were randomly selected; both members of each of these pairs were from the same colony. Fifty-five additional termite pairs were randomly selected; the two members in each of these pairs were from different colonies. The pairs were placed in petri dishes and observed to see whether they exhibited aggressive behavior. The results are shown in the table.

A Chi-square test for homogeneity was conducted, resulting in χ2 = 7.638. The expected counts are shown in parentheses in the table. Which of the following sets of statements follows from these results?

(a) χ2 is not significant at the 0.05 level.
(b) χ2 is significant, 0.01 < p < 0.05; the counts in the table suggest that termite pairs from the same colony are less likely to be aggressive than termite pairs from different colonies.
(c) χ2 is significant, 0.01 < p < 0.05; the counts in the table suggest that termite pairs from different colonies are less likely to be aggressive than termite pairs from the same colony.
(d) χ2 is significant, p < 0.01; the counts in the table suggest that termite pairs from the same colony are less likely to be aggressive than termite pairs from different colonies.
(e) χ2 is significant, p < 0.01; the counts in the table suggest that termite pairs from different colonies are less likely to be aggressive than termite pairs from the same colony.

3) A chi-square goodness of fit test is considered to be valid if each of the expected values is…
(a) greater than 0
(b) less than 5
(c) between 0 and 5
(d) at most 1
(e) at least 5

4) The actual counts in a two-way table are referred to as the expected counts.
(a) True
(b) False

Table for multiple choice question 2 (expected counts in parentheses):

                     Aggressive   Nonaggressive   Total
Same colony          40 (33.5)    9 (15.5)        49
Different colonies   31 (37.5)    24 (17.5)       55
Total                71           33              104

FREE RESPONSE

5) A public opinion poll surveyed a simple random sample of 1000 voters. Respondents were classified by gender (male or female) and by voting preference (Republican, Democrat, or Independent). Results are shown in the table below.

Voting Preferences
          Republican   Democrat   Independent   Total
Male      200          150        50            400
Female    250          300        50            600
Total     450          450        100           1000

Is there a gender gap? Do the men's voting preferences differ significantly from the women's preferences? Use a 0.05 level of significance.

6) In a study of the television viewing habits of children, a developmental psychologist selects a random sample of 300 first graders - 100 boys and 200 girls. Each child is asked which of the following TV programs they like best: The Lone Ranger, Sesame Street, or The Simpsons. Results are shown in the table below.

Do the boys' preferences for these TV programs differ significantly from the girls' preferences? Use a 0.05 level of significance.

Viewing Preferences
          Lone Ranger   Sesame Street   The Simpsons   Total
Boys      50            30              20             100
Girls     50            80              70             200
Total     100           110             90             300