chapter 5

Download Chapter 5

If you can't read please download the document

Upload: crescent

Post on 25-Feb-2016

40 views

Category:

Documents


4 download

DESCRIPTION

Chapter 5. Statistical Reasoning. 5.1 – exploring data. Chapter 5. Data set. The payrolls for three small companies are shown in the table. Figures include year-end bonuses. Each company has 15 employees. Sanela wonders if the companies have similar “average” salaries. - PowerPoint PPT Presentation

TRANSCRIPT

PowerPoint Presentation

Statistical ReasoningChapter 5Chapter 55.1 exploring dataData set

The payrolls for three small companies are shown in the table. Figures include year-end bonuses. Each company has 15 employees. Sanela wonders if the companies have similar average salaries.What is the range of salaries for each company?245 000 27 300 (Media Focus Advertising)217 700362 000 52 300 (Computer Rescue)309 70097 500 55 250 (Auto Value Sales)42 250Are there any outliers for any of the companies?Measures of central tendency

What are the measures of central tendency (mean, median, and mode)?Mean: Company A:27 300 + 28 500 + 33 400 + 36 200 + 39 500 + 42 500 + 47 400 + 57 500 + 61 000 + 61 000 + 65 000 + 71 000 + 86 000 + 162 000 + 245 000 = 1 063 3001 063 300/15 = 70 887

Company B:52 300 + 52 700 + 53 100 + 53 800 + 55 200 + 55 900 + 56 500 + 59 000 + 59 200 + 62 500 + 63 000 + 96 500 + 96 500 + 112 000 + 362 000 = 1 290 2001 290 200/15 = 86 013

Company C:55 250 + 55 250 + 56 900 + 57 300 + 57 900 + 58 200 + 58 300 + 58 900 + 61 500 + 62 300 + 62 800 + 62 800 + 64 400 + 66 900 + 97 500 = 936 200 936 200/15 = 62 413Median and mode

What are the measures of central tendency (mean, median, and mode)?Median: Were looking for the term right in the middle

Company A: 57 500Company B: 59 000Company C: 58 900Mode:The number that appears the most times

Company A: 61 000 Company B: 96 500Company C: 55 250Mean:Company A: 70 887Company B: 86 013Company C: 62 413Paulo needs a new battery for his car. He is trying to decide between two different brands. Both brands are the same price. He obtains data for the lifespan, in years, of 30 batteries of each brand, as shown below.

Whats the range?Brand X: 3.1 8.2Brand Y: 4.5 7.0Range is a measure of dispersion. Dispersion is a measure that varies by the spread among the data in a set; dispersion has a value of zero if all the data in a set is identical, and it increases in value as the data becomes more spread out.Whats the mode?Brand X: 5.7Brand Y: 5.6, 5.7, 5.8, 5.9, 6.0, 6.8 Is the mode useful in this comparison?Which brand would you choose?handoutToday we will be examining sets of data. Answer all of the questions in the handout to your fullest abilities, because this is a summative assessment.Chapter 55.2 Frequency Tables, histograms, and frequency polygonsexample

The following data represents the flow rates of the Red River from 1950 to 1999, as recorded at the Redwood Bridge in Winnipeg, Manitoba.Example (continued)Determine the water flow rate that is associated with serious flooding by creating a frequency distribution.Frequency distribution is a set of intervals (table or graph), usually of equal width, into which data is organized; each interval is associated with a frequency that indicates the number of measurements in this interval.Whats the lowest water flow rate?159Whats the highest? 4587The range is 4587 159 = 4428 We can divide this into 10 equal parts:4428/10 = 442.8So, lets let the interval width be 500. Flow Rate(m3/s)TallyFrequency (# of years)0 - 500500 - 10001000-15001500-20002000-25002500-30003000-35003500-40004000-45004500-5000Example (continued)Instead, you could have created a histogram. A histogram is the graph of a frequency distribution, in which equal intervals of values are marked on a horizontal axis and the frequencies associated with these intervals are indicated by the areas of the rectangles drawn for these intervals.

We know that there were nine floods. Based on the histogram, the flow rate was greater than 1950 m3/s in only 12 years. These 12 years must include the flood years. We could predict that floods occur when the flow rate is greater than 1950 m3/s.Example (continued)Or, we could create a frequency polygon. A frequency polygon is the graph of a frequency distribution, produced by joining the midpoints of the intervals using straight lines.

Most of the data is in the first four intervals, and the most common water flow is between 1500 and 2000 m3/s. After this, the frequencies drop off dramatically.

There were six years where the flow rate was around 2625, 3075 or 4425 m3/s. These must have been flood years. The other three floods should have occurred when the flow rate was around 2175 m3/s. Assuming that the flow rate in three of those years was 2175 m3/s or greater, floods should occur when the flow rate is 2175 m3/s or greater.example

The magnitude of an earthquake is measured using the Richter scale. Examine the histograms for the frequency of earthquake magnitudes in Canada from 2005 to 2009. Which of these years could have had the most damage from earthquakes?Example (continued)We could use a frequency table:Year20052006200720082009Frequency of Earthquakes from 4.0 to 4.95891672008 had the greatest number of earthquakes with the potential for minor damage.

Four of the years had three earthquakes with magnitudes from 5.0 to 5.9, while 2008 had five earthquakes with these magnitudes.YearMagnitude on Richter ScaleTotal4.0 4.95.0 5.96.0 6.92008165425200973111Therefore, 2008 could have had the most damage from earthquakes.Example (continued)We could have made a frequency polygon.

Both 2008 and 2009 had the strongest earthquakes, registering from 6.0 to 6.9 on the Richter scale.

The number of earthquakes in the three highest intervals was greater into 2008 than in 2009, so 2008 could have had the most damage from earthquakes.Independent PracticePg. 249-253, # 1, 3, 5, 6, 8, 10, 11, 12Chapter 55.3 standard deviationUsing excel to find mean and standard deviationWe will go over how to find the mean and standard deviation using Excel. Its important that you pay close attention as I go through the steps, since you will need to use this information to do complete todays summative assessment.HandoutToday we will be finding the mean and the standard deviation of a set of data. Answer all of the questions in the handout to your fullest abilities, because this is a summative assessment.Standard deviationStandard deviation is a measure of dispersion, of how the data in a set is distributed.Standard deviation, , can be expressed as:

The mean, , can be expressed as:

where x is each term in a set of data, and n is the number of terms in the data.What does mean?exampleBrendan works part-time in the canteen at his local community centre. One of his tasts is to unload delivery trucks. He wondered about the accuracy of the mass measurements given on two cartons that contained sunflower seeds. He decided to measure the masses of the 20 bags in the two cartons. One carton contained 227 g bags, and the other carton contained 454 g bags.

How can measures of dispersion be used to determine if the accuracy of measurement is the same for both bag sizes?Example (continued)

We can find the mean, , and standard deviation, , using our calculator.

STATEDITEnter all of the terms into L1STATCALC1-Var StatsWe are given all of the information about the data set. Try it with these two sets: Set 1:

Set 2:

The accuracy is not the same for both bags. The 454 g bags are more accurate.exampleAngele conducted a survey to determine the number of hours per week that Grade 11 males in her school play video games. She determined that the mean was 12.84 h, with a standard deviation of 2.16 h.

Janessa conducted a similar survey of Grade 11 females in her school, she organized her results in this frequency table. Compare the results.

Firstly, we need to use the midpoint of the Hours, since we cant enter a range into our data.Midpoints: 4, 6, 8, 10, 12, 14For group data, like this, there is an extra step. Enter Hours into L1, & Frequency into L2STAT

Midpoints: 4, 6, 8, 10, 12, 141-Var(2ndSTATL1STATL22nd)ENTERFor the girls, we find:

Compared to the boys:

Boys have a higher mean (they play more video games), and they have a lower standard deviation (so theyre more consistent than the girls).,Independent PracticePg. 261-265, # 2, 4, 5, 7, 9, 10, 12, 13, 14Chapter 55.4 The normal distributionNormal distributionIf we rolled a single die 50 000 times, what do you think the graph would look like?What about two dice?In partners, roll two dice 50 times. One person rolls, one person keeps a frequency diagram.

Normal distributionA normal curve is a symmetrical curve that represents the normal distribution; also called a bell curve.Normal distribution is data that, when graphed as a histogram or a frequency polygon, results in a unimodal symmetric distribution about the mean.

exampleHeidi is opening a new snowboard shop near a local ski resort. She knows that the recommended length of a snowboard is related to a persons height. Her research shows that most of the snowboarders who visit this resort are males, 20 to 39 years old. To ensure that she stock the most popular snowboard lengths, she collects height data for 1000 Canadian men, 20 to 39 years old. How can she use the data to help her stock her store with boards that are the appropriate lengths?

First, enter the data into your calculator. Remember to use midpoints for the height range (i.e. 60.5, 61.5, etc).

example

Heights within one standard deviation of the mean:69.52 2.98 = 66.5 inches69.52 + 2.98 = 72.5 inchesHeights within two standard deviations of the mean:69.52 2(2.98) = 63.5 inches69.52 + 2(2.98) = 75.5 inchesNumber of men within one std. deviation (from 66.572.5 in): 116 + 128 + 147 + 129 + 115 + 63 = 698 men

Number of men within two std. deviation (from 63.575.5 in): 30 + 52 + 64 + 116 + 128 + 147 + 129 + 115 + 63 + 53 + 29 + 20 = 944 menShe surveyed 1000 men, so 698/1000 or 69.8% of men are 66.5 to 72.5 inches tall, while 944/1000 or 94.4% of men are within 63.5 to 75.5 inches tall. exampleJim raises Siberian husky sled dogs at his kennel. He know, from the data he has collected over the years, that the weights of adult male dogs are normally distributed, with a mean of 52.5 lbs and a standard deviation of 2.4 lbs. Jim used this information to sketch a normal curve, with:

68% of the data within one standard deviation of the mean95% of the data within two standard deviations of the mean99.7% of the data within three standard deviations of the meanWhat percent of dogs at Jims kennel would you expect to have a weight between 47.7 lbs and 54.9 lbs.Example (continued)

We want to find the percentage of dogs who weigh between 47.7 lbs and 54.9 lbs.

Shade in the area on the provided graph.So, we can see that 68% of dogs will be between 50.1 and 54.9 lbs, so we just need to know what percent falls between 47.7 and 50.1.How can we figure it out?

What percent happens between 47.7 and 52.5?95/2 = 47.5%What percent is between 50.1 and 52.5?68/2 = 34%So, what percent is left between 47.7 and 50.1?47.5% 34% = 13.5%13.5% So, 68% + 13.5% = 81.5% are between 47.7 lbs and 54.9 lbs.example

Sometimes, we use the symbol to represent the mean.Enter it into your calculator, to find that:

= 2.526 = 0.482Median = 2.55 (scroll down in the 1-Var list)When the median and the mean are close, that suggests the data may be normally distributed.We can make a histogram using this data on our calculator to check if it is normally distributed. First, enter all of the data into your STAT list.ExampleMaking a histogram on your calculator:2ndY=Switch it to ONENTERUse the arrows to navigate to the histogram picture, press ENTER

ZOOM9GRAPHIts also helpful to change the x-scale under WINDOW to the standard deviation (0.482, in this case)

exampleNow that we have a graph, we can see that it looks like its probably normally distributed. However, to make sure, we need to check that it has the right percentsnormally distributed graphs have approximately 68% of the data within one standard deviation of the mean, approximately 95% within two standard deviations of the mean, and approximately 99.7% within three. These are important to remember. We can check this on the calculator, using the TRACE function.When on the graph screen, press TRACE, and then use the arrow keys to find the number of terms within each standard deviation. In the two tallest bars, we have:n = 18n = 17Thats within one standard deviation, so 17 + 18 = 35 terms are within one standard deviation.35/50 = 70% are within one deviation.In the next two, we have:n = 7 and n = 635 + 7 + 6 = 48 terms are within 2 standard deviations48/50 = 96%

In the last, we have:n = 1 and n = 148 + 2 = 5050/50 = 100% are within three standard deviations70%, 96% and 100% are pretty close to the percents for a normal distribution, so we can say that this data approximates a normal distribution.Example (continued)b) If Shirley purchases this cellphone, what is the likelihood that it will last for more than three years?Sketch a frequency diagram, and show one standard deviation (0.482) above the mean (2.526):

Using the standard normal distribution percents (68%, 95%, 99.7%), we know that 50% of data should fall below the mean of 2.526.

68%/2 = 34%, so approximately 34% should fall within one standard deviation above the mean.

So, approximately, what percent would be lasting longer than 3?100% 34% 50% = 16%There is a 16% chance that her cellphone will last more than three years.Standard normal distribution

Independent PracticePg. 279-282, # 1, 3, 4-7, 9, 12-16Chapter 55.5 Z-SCORESZ-scoresA z-score is a standardized value that indicates the number of standard deviations of a data value above or below the mean (it tells you how far away from the mean a value isthe higher the z-score, the further a value is from the mean).The standard normal distribution is a normal distribution that has a mean of zero and a standard deviation of one.When we do problems involving z-scores, we need to use the z-score tables on pages 580-581. exampleHailey and Serge belong to a running club in Vancouver. Part of their training involves a 200 m sprint. Below are normally distributed times for the 200 m sprint in Vancouver and on a recent trip to Lake Louise. At higher altitudes, run times improve.

Determine at which location Haileys run time was better, when compared with the results. For any score, x, we have x = + z, where z represents the number of standard deviations of the score from the mean.

Solve for z:

Example (continued)

Vancouver:Lake Louise:

In this case, is a score that is more negative a good or bad thing?

Haileys time was better than the mean in both places, however, since her z-score was lower for Lake Louise, her better time was in Lake Louise.Try it: Use z-scores to figure out with of Serges run times was better.exampleIQ tests are sometimes used to measure a persons intellectual capacity at a particular time. IQ scores are normally distributed, with a mean of 100 and a standard deviation of 15. If a person scores 119 on an IQ test, how does this score compare with the scores of the general population?

Diagram:119

Now that we have the z-score, we can find the percentile of the score, which tells you the percent of people who would be lower on the graph.

The value in the z-score table is 0.8980. This means that a person who scores 119, scores greater than 89.80% of the population. We say that hes in the 89.8th percentile.exampleAthletes should replace their running shoes before the shoes lose their ability to absorb shock. Running shoes lose their shock-absorption after a mean distance of 640 km, with a standard deviation of 160 km. Zack is an elite runner and wants to replace his shoes at a distance when only 25% of people would replace their shoes. At what distance should he replace his shoes?Diagram:

Now, we use the z-score table in the opposite way: look for the number closest to 0.25 in the middle of the table.What is the accompanying z-score?0.2500 isnt an option on the table, so we see that it must fall between 0.2483 and 0.2514. That means that the z-score falls between 0.67 and 0.68. We can say that z = 0.675.

He should replace his shoes after 532 km.exampleThe ABC Company produces bungee cords. When the manufacturing process is running well, the lengths of the bungee cords produced are normally distributed, with a mean of 45.2 cm and a standard deviation of 1.3 cm. Bungee cords that are shorter than 42.0 cm or longer than 48.0 cm are rejected by the quality of control workers. If 20 000 bungee cords are manufactured each day, how many bungee cords would you expect the quality control workers to reject?Minimum = 42 cm

Maximum = 48 cm

We want to find the area to the right of the max, so the process is different.2.15 = 1 0.98422.15 = 0.0158

There is 0.69% below the minimum, and 1.58% above the max. They will reject approximately 0.69 + 1.58 = 2.27% of the bungee cords.

(0.0227)(20 000) = 454

So, they would reject approximately 454 cords.Look up the percentiles for these values:2.46 = 0.0069

exampleA manufacturer of personal music players has determined that the mean life of the players is 32.4 months, with a standard deviation of 6.3 months. What length of warranty should be offered if the manufacturer wants to restrict repairs to less than 1.5% of all the players sold?1.5% = 0.015

Find the z-score for 0.0150, using your calculatorz = 2.17

They should offer an 18-month warranty.

Independent PracticePG. 292-294 # 1-4, 6, 8, 10, 12, 13, 15, 17, 20, 23Chapter 55.6 Confidence intervalsIndependent PracticePg. 302-304, #