
Statistics Course Materials

Math 12

Cabrillo College

Marcella Laddon

Activity #1: How a Typical Worker in the OPF Spends Her Day

A study was conducted to see how technicians spend their time on the job at Kennedy Space Center’s Orbiter Processing Facility. The table below gives the information gathered.

Technician Activity     Number of Times Observed
Setup                   55
Work                    215
Delay                   27
Cleanup                 31
Training/Meetings       39
Miscellaneous           23

Your job is to make a pie chart summarizing the above data. You must do this with a computer, using Excel or another spreadsheet to enter data and create a pie chart. Make sure that, whichever method you use, your chart is clearly labeled and easy to read. When you have finished the pie chart, make a bar chart as well. Which one do you think is a better presentation of the information?
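As a rough check on whatever chart the spreadsheet draws, the size of each slice can be worked out directly from the counts in the table. The short Python sketch below is only an illustration (the course itself uses Excel), and the variable names are made up for this example.

```python
# Counts from the Activity #1 table (technician activity -> times observed).
observations = {
    "Setup": 55,
    "Work": 215,
    "Delay": 27,
    "Cleanup": 31,
    "Training/Meetings": 39,
    "Miscellaneous": 23,
}

total = sum(observations.values())

# Each slice's percentage of the whole and its central angle in a pie chart.
shares = {name: 100 * count / total for name, count in observations.items()}
angles = {name: 360 * count / total for name, count in observations.items()}

for name in observations:
    print(f"{name:<18} {shares[name]:5.1f}%  {angles[name]:6.1f} degrees")
```

If the percentages printed here do not match the labels on your pie chart, the data range given to the chart is probably wrong.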

Homework #1: Graph It!

For each of the following situations, make a pie chart to describe the data visually. Use a computer to generate these charts. When you are finished, comment on the “clarity” of the charts. Are there things you could do to make them easier to read? If so, what? Use your software to also make a bar chart for problem 2. How does it compare to the pie chart?

1. Teachers at a college were observed at different times to see how they spent their day at work¹. The following results were obtained from this study:

Teacher Activity/Role   Number of Observations
Housekeeping            7
Lecturing               41
Student Interaction     20
Facilitator             25
Off Task                9
Testing                 15
Miscellaneous           13

2. Between 1959 and 1992, 14 groups of astronauts were selected to participate in different NASA projects (like the Mercury, Gemini and Apollo projects)². Here is the data about the numbers of astronauts that were selected in Groups #1-14:

Group Number           1   2   3   4   5   6   7   8   9  10  11  12  13  14
Number of Astronauts   7   9  14   6  19  11   7  35  19  17  13  15  23  19

1. If you are interested in an actual work sampling study of teachers, one example is: “Utilization of Professional Manpower in the Teaching Profession” by Paul E. Christensen, Ph.D. Thesis, Wayne University, Detroit, MI, 1955. Mr. Christensen surveyed elementary schoolteachers in Royal Oak, Michigan.

2. NASA, Astronaut Fact Book, Information Summaries, PMS-011E (JSC), 1995, p. 53

Excel Lab Instructions for Lab #1

If you are working in the CTC, first go to the main desk and get your account name and password. They claim they will have your name within 24 hours of when you register for the course. In both the MLC and the CTC, the machines have Microsoft Excel on them. Using the mouse, click on “start” at the bottom of the screen, select “programs”, then “Microsoft Excel.”

For Lab 1, I want you to make some pie charts and bar charts using Excel. Here is what you need to do:

1. Type in the data and the labels from the worksheet you have. (You will be doing “Activity #1” and “Homework #1.”) You may then highlight the area you typed the information in.

2. Use the “chart wizard” icon located in the upper right-hand portion of the screen. (It has red, yellow and blue bars on it.)

3. For the first step, select the type of chart you want. (Excel calls bar graphs “column charts.” These are the type of bar charts we want: with the bars rising vertically.)

4. To answer the second question, you need to know where you have typed in your data (if Excel has not already selected it for you.) If the choice is empty or incorrect, you need to type in the locations of your first piece of data and the last piece of data, separated by a colon. For example, if the data is in locations A1 through B6, I type “a1:b6.” Notice that the separator between the cell locations is a colon.

5. Continue following through the chart wizard instructions. To go on, press “Next” or “Finish.” If you make an error, you can go “Back”! Try it and see what happens.

6. If you see a data legend that says “series 1,” click on it and press the “delete” key. Your graph does not need goofy labels put there by the program. You should make sure all of the labeling is clear, with correct spelling.

7. Before you print your chart (you will be paying for whatever printing you do in the CTC), select “Print Preview” from the “File” menu at the top left of the screen. This enables you to check that the chart you print is indeed the chart you desire. Be sure to verify that it will print on one page, and not on parts of two pages. Always do this so that we can save the paper wasted by printing incorrect charts. You can put the boxes with the charts on the same page with the data. If you want to print the chart all by itself on a full sheet of paper, click somewhere in the region of the chart until you see black dots around the chart. (Some versions of Excel show a fuzzy, denim-like border around the chart.) It should then print on a page by itself. If you do a “Print Preview” while the black dots are around the chart, you will see what it looks like when printed on a page by itself. Feel free to use the “delete” key to omit any extraneous information or legends on your chart.

8. Make sure your name is on your chart and that it is clearly labeled. Double check that the chart you made is indeed an accurate representation of the data. For example, does the bar chart convey the information that you intend it to?


4-1 Consider the population provided in Table 4-1 for the latest operating incomes achieved by the owners of franchises from a famous doughnut chain. Using the following two-digit random numbers, select a sample of 10 incomes for further analysis. Starting with random numbers in the first row, select those franchises that have matching numbers. You may skip a number that has already been selected or that does not match a franchise number. The first 10 franchise incomes selected will constitute the sample. List them on the following page.

97 78 17 40 30 23 80 32 94 31 20 91 46 75 29 15 31 82 44 77

Table 4-1

Franchise              Franchise              Franchise              Franchise
Number    Income       Number    Income       Number    Income       Number    Income
01        [illegible]  14        $10,694      27        $19,574      39        $14,249
02        [illegible]  15        17,801       28        7,987        40        21,347
03        16,026       16        5,993        29        15,949       41        9,980
04        5,964        17        13,961       30        5,763        42        17,936
05        11,971       18        3,843        31        13,427       43        [illegible]
06        [illegible]  19        11,513       32        21,068       44        15,339
07        [illegible]  20        19,160       33        36,429       45        22,975
08        17,250       21        14,693       34        8,141        46        7,848
09        7,257        22        7,020        35        19,745       47        20,563
10        [illegible]  23        13,942       36        8,632        48        6,121
11        9,651        24        13,754       37        11,884       49        9,089
12        6,519        25        9,512        38        2,337        50        7,773
13        [illegible]  26        12,472
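The selection rule in problem 4-1 (take the random numbers in order, skipping any that repeat or that match no franchise) can be written out as a short procedure. The Python sketch below is one way to express it; the function name and its parameters are invented for this illustration and are not part of the original exercise.

```python
def select_by_random_numbers(random_numbers, valid_ids, sample_size):
    """Pick IDs in the order the random numbers list them.

    A number is skipped if it matches no ID or has already been chosen.
    """
    chosen = []
    for number in random_numbers:
        if number in valid_ids and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


# Franchise numbers run from 01 to 50 in Table 4-1.
franchise_ids = set(range(1, 51))

# The two-digit random numbers given in problem 4-1.
random_numbers = [97, 78, 17, 40, 30, 23, 80, 32, 94, 31,
                  20, 91, 46, 75, 29, 15, 31, 82, 44, 77]

sample_ids = select_by_random_numbers(random_numbers, franchise_ids, 10)
print(sample_ids)
```

The incomes for the sample are then read off Table 4-1 for the franchise numbers the procedure returns.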

THE STATISTICAL SAMPLING STUDY

Sample
Franchise No.   Income

4-2 Figure 4-1 is a map of a hypothetical city, Dullsville. The homes have been arbitrarily placed into “census tracts,” each representing a contiguous neighborhood. Household identities, addresses, and family income data are provided in Table 4-2. (Figure 4-1 is on page 24.)

Table 4-2 (continued)

Household  Address  Income          Household  Address  Income
East Court
57         3        $18,998         79         3        $42,735
58         4        13,556          80         4        60,600
59         7        [illegible]     81         5        38,887
60         8        14,665          82         6        71,775
61         11       19,545          83         8        31,119
62         12       16,997          84         10       40,000
63         15       15,305          85         12       56,337
64         16       [illegible]     West Court
65         19       16,885          86         1        12,223
66         20       17,554          87         5        10,678
67         23       21,115          88         9        14,556
68         24       20,997          89         13       13,665
69         27       16,666          90         17       15,997
70         28       17,002          91         21       14,555
71         31       15,155          92         25       16,554
72         32       18,444          93         26       22,115
73         35       15,876          94         29       19,997
74         36       16,123          95         30       17,666
75         39       20,001          96         33       16,002
76         40       [illegible]     97         34       [illegible]
Hillcrest                           98         37       17,444
77         [illegible] [illegible]  99         38       16,876
78         2        28,553

(a) Matching the following random numbers with household identities, select a simple random sample of 10 family incomes.

34 01 17 73 93 05 54 89 42 29

Sample                 Sample
Household   Income     Household   Income

Total

(b) Calculate the sample mean family income.

x̄ = __________

[Figure 4-1 Map of Dullsville Showing Census Tracts. Labels still legible on the map include Tract D-01, Tract D-04, Tract D-05, Tract D-06, School, Vacant, Shopping Center, Parking, Broadway, and W. Boondocks Ln.]

4-3 (a) Select a systematic random sample of 10 Dullsville family incomes, picking every tenth household starting with the randomly chosen household 07.

Sample                 Sample
Household   Income     Household   Income

Total

(b) Calculate the sample mean family income.

x̄ = __________
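The household numbers for a systematic sample follow a fixed pattern once the starting point is chosen, so they can be listed mechanically. The Python sketch below assumes households are numbered consecutively (Table 4-2 runs up to household 99); the function name is made up for this illustration.

```python
def systematic_sample(start, step, sample_size):
    """Return the IDs start, start + step, start + 2*step, and so on."""
    return [start + step * i for i in range(sample_size)]


# Every tenth household, beginning with household 07, ten households in all.
household_ids = systematic_sample(start=7, step=10, sample_size=10)
print(household_ids)
```

Only the first household is chosen at random; every later one is determined by the step size.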

4-4 (a) A cluster random sample of Dullsville family incomes is to be taken. Treating each census tract as a cluster, use the first applicable random numbers in the following sequence to select two clusters by matching the appropriate census tract identity code.

1, 6, 8, 6, 3, 8, 7, 4

The clusters chosen are:

D-____   D-____

(b) Indicate the sample incomes by circling the appropriate entries in Table 4-2.

(c) Calculate the sample mean family income.

x̄ = __________

4-5 (a) A stratified random sample that contains two family incomes from each census tract in Dullsville is to be taken, based on street address or apartment number. Starting with tract D-01 in Figure 4-1, use the following random number sequence, continuing with the next random number on the list until the last family income in D-06 is obtained. Be sure to skip those random numbers that do not apply to an address and never back up to an earlier random number. Start in the upper left-hand corner and proceed horizontally, line by line.

56 04 57 55 85 34 01 17 73 93 05 54 89 42 29 57 69 43 10 77
97 78 17 40 30 23 80 32 94 31 20 91 46 75 29 15 31 82 44 77
32 42 ?? 09 14 ?? 19 00 12 05 63 29 ?? ?? 33 ?? ?? 29 48 21
90 01 58 07 03 ?? 25 ?? 16 ?? ?? 61 ?? 55 86 88 02 39 44 57 56

The sample incomes are:

Address   Income

(b) Calculate the sample mean family income.

x̄ = __________

4-6 (a) Suppose that a canvass has been made of Dullsville family incomes but that the census data are not yet known. As a convenience sample, only the incomes in the apartment house (108 Main Street) were included. No family member in unit 02 was home, and the tenant in 05 slammed the door on the canvasser. The remaining family incomes were obtained. List the sample outcomes.

Unit   Income

Total

(b) Calculate the sample mean family income.

x̄ = __________

4-7 (a) Select a judgment sample of family incomes by choosing the households with the lowest identity numbers in each census tract. All tracts are to be represented by one family, except for the largest ones (D-01 and D-05), which are to be represented by three families from each tract.

Census Tract   Household   Income

(b) Calculate the sample mean family income.

x̄ = __________

Measures of the Center: Mean, Median, and Mode

1. A list has 10 entries. Each entry is either a 1, a 2, or a 3. a. What must the list be if the mean is 1? b. What must the list be if the mean is 3? c. Can the mean be 4?

2. Which of the following two lists has a larger mean? How could you determine this without doing any calculations?
a. 10, 7, 8, 3, 5, 9
b. 10, 7, 8, 3, 5, 9, 11

3. Ten people in a room have a mean height of 5 ft 6 in. (or 66 in.). An eleventh person, who is 6 ft 5 in. tall, enters the room. Find the mean height of all eleven people. Show your reasoning clearly.

4. An instructor gives a quiz with 3 questions. Each question is worth 1 point. 40% of the class scores 3 points, 30% of the class scores 2 points, 20% of the class scores 1 point, and 10% of the class scores 0.

Score on Quiz   Percent of Class with that Score
3               40
2               30
1               20
0               10

a. If there were 10 people in the class, what would the mean score be?
b. If there were 20 people in the class, what would the mean score be?
c. If you were not told the number of people in the class, what would the mean score be?
d. If there were 10 people in the class, what would the median be?
e. If there were 10 people in the class, what would the mode be?
f. If you were not told how many people were in the class, what would the median be?
g. If you were not told how many people were in the class, what would the mode be?

Variability: Standard Deviation

1. Below are rough sketches of the histograms for three lists of data. Match the sketch with the description. (Some descriptions will be left over.) Explain your reasoning.

DATA DESCRIPTIONS:

i) x̄ = 3.5, s = 1

ii) x̄ = 3.5, s = 0.5

iii) x̄ = 3.5, s = 2

iv) x̄ = 2.5, s = 1

v) x̄ = 2.5, s = 0.5

vi) x̄ = 4.5, s = 0.5

SKETCHES:

2. Make up a list of 10 numbers so that the standard deviation is as large as possible and:

a. Every number is either a 1 or a 5
b. Every number is either a 1 or a 9
c. Every number is either a 1, 5, or 9 and at least two of the numbers are 5.

3. For a list of positive numbers, can the standard deviation ever be larger than the mean? Why or why not?

4. In 2007, Governor Schwarzenegger proposed that all state employees be given a flat raise of $200 per month.

a. What would this do to the (mean) average monthly salary of state employees?
b. What would this do to the standard deviation of the salaries of state employees?
c. What would a 5% increase in salaries do to the mean monthly salary?
d. What would a 5% increase in salaries do to the standard deviation?

5. Can the standard deviation ever be negative?

Comparing the Mean Absolute Deviation (MAD) and the Standard Deviation

Here is the data: 13, 14, 24, 24, 25, 26

1. Find the mean, median and mode for this data.

2. Use the table to find the Mean Absolute Deviation of the data:

\[ \mathrm{MAD} = \frac{\sum |x - \mu|}{n} \]

Value of x   Deviation x - µ   Absolute Deviation |x - µ|
13
14
24
24
25
26
Total

MAD = _________

3. Now find the standard deviation of the data.

Value of x   Deviation x - µ   Squared Deviation (x - µ)²
13
14
24
24
25
26
Total

The population standard deviation is given by the formula:

\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}} \]

σ = __________

While I think that the MAD is a more “natural” way to measure deviation, we use the standard deviation in statistics. This is basically because it is easier mathematically to deal with squares and square roots than it is to deal with absolute values.
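To see the two measures side by side, both formulas can be written as small functions. The Python sketch below is an illustration only: the function names are invented here, and the example list is hypothetical data chosen because the arithmetic comes out evenly (it is not the worksheet data).

```python
import math


def mean(values):
    return sum(values) / len(values)


def mean_absolute_deviation(values):
    """MAD = (sum of |x - mu|) / n."""
    mu = mean(values)
    return sum(abs(x - mu) for x in values) / len(values)


def population_sd(values):
    """sigma = square root of (sum of (x - mu)^2) / n."""
    mu = mean(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


# Hypothetical data, not taken from the worksheet.
example = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(example))                     # 5.0
print(mean_absolute_deviation(example))  # 1.5
print(population_sd(example))            # 2.0
```

In this example the standard deviation is larger than the MAD because squaring gives the values far from the mean more weight.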

Top 10 Reasons Why We Use the Standard Deviation as Our Measure of Variation (by Ann E. Watkins, California State University Northridge)

10. Every value gets taken into account.
9. In important distributions, the formula for the SD turns out to be short and sweet.
8. It is easy to compute with squares.
7. The SD has a rather intimate relationship with the mean, so if the mean is the measure of center, the SD is the logical measure for spread.
6. The SD and the mean have a rather intimate relationship with the normal distribution.
5. The SD goes someplace mathematically. The MAD is reasonable, but it isn’t important.
4. Standard deviations add like Pythagoras.
3. The formula for the SD looks suspiciously like the Euclidean distance formula.
2. The SD is a distance!
1. It must be the official one to use since it is called the “standard” deviation.
0. The SD is on the calculator and the MAD is not.

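Population Standard Deviation:

\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}} \]

Sample Standard Deviation (the deviations are measured from the sample mean x̄):

\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]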

Measures of Central Tendency, Variability, Grouped Data and Cumulative Frequencies

1. For the data: 16, 6, 10, 6, 16, 6, 15, 19, 23, 5, 6, find the:

a. Mean b. Median c. Mode d. Midrange e. Range f. MAD g. Sample Standard Deviation

2. Here is a frequency distribution:

Score Classes Frequency (# of people) Cumulative Frequency

10 – 24 15

25 – 39 20

40 – 54 27

55 – 69 13

70 - 84 15

Total 90

a. Draw a frequency histogram of the data, showing the class boundaries as the edges of your bars. You may also want to show the class marks.
b. Draw a cumulative frequency ogive of the data.
c. Find the mean of this grouped data. (You’ll need the class marks or midpoints.)
d. Find the sample standard deviation of this grouped data.
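The grouped-data calculations all follow the same recipe: replace each class by its class mark, then weight by the frequency. The Python sketch below shows that recipe on a small hypothetical frequency table (not the one in this problem); the variable names are made up for the illustration.

```python
import math
from itertools import accumulate

# Hypothetical table: (lower class limit, upper class limit, frequency).
classes = [(0, 9, 2), (10, 19, 3), (20, 29, 5)]

frequencies = [f for _, _, f in classes]
class_marks = [(low + high) / 2 for low, high, _ in classes]  # midpoints
cumulative = list(accumulate(frequencies))                    # running totals
n = sum(frequencies)

# Grouped mean: every value in a class is treated as if it sat at the class mark.
grouped_mean = sum(m * f for m, f in zip(class_marks, frequencies)) / n

# Grouped sample standard deviation: divide by n - 1, as for ungrouped samples.
sum_sq = sum(f * (m - grouped_mean) ** 2 for m, f in zip(class_marks, frequencies))
grouped_sd = math.sqrt(sum_sq / (n - 1))

print(class_marks, cumulative, grouped_mean, round(grouped_sd, 2))
```

The last cumulative frequency should always equal the total number of observations, which is a quick check on the table.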

Criminal Data Lab

The purpose of this lab is to show you how a spreadsheet program can help you process large amounts of data.

Preliminaries: You can do this lab at home or in the labs on campus. You will need Microsoft Excel and Word. I am writing for, and using, a PC, but it should work similarly on a Mac.

Stress: If you are not comfortable with the computer, expect a certain level of frustration, though I have tried to eliminate as much of that as possible. The focus is on processing and interpreting data. If you are stressing and things aren’t proceeding smoothly, stop what you are doing, and come talk to me. It will help if you do not leave this assignment until the last minute. Quite often the problems you may have are some small things that either you or I have overlooked.

What it is: In this lab we are going to use some data that was collected about 100 years ago on criminals. The file consists of two columns: one that gives the height (in inches) of 3000 male criminals and the other column that contains the lengths of their left middle finger (in centimeters.) The data is stored on a file online at www.cabrillo.edu/~mladdon/Criminals.xls. We will be looking at some summary statistics as well as some graphs. We will do our statistical work using Excel and then we will copy the important information into a Word document where we can add our answers to the questions.

Open up the data file using Excel: To open the file “Criminals.xls,” go to the above link, click on it, and save it to your computer. You can do this by clicking on File, scroll to “Save As…” and select a place on your computer to save the file. Then open up Excel (Start>Programs>Microsoft Excel) and under “File” select “Open” and select the Criminals.xls file. (If you are using Internet Explorer, be sure to save the file on your computer first, before opening up Excel. Do not just click on it to open it up. If you do, then you will be working with it on the internet in an html file. You want to be working on it, and saving it, to your local computer.)

Open up Word: To do this, click Start>Programs>Microsoft Word. In Word, start a new document (File>New>Blank Document.) At the bottom of your computer screen there will be buttons that will allow you to easily go between the two programs. At this point there should be two open. You can go from one program to the other by clicking on the desired button at the bottom of the screen.

Create a Histogram and Summary Statistics for Heights:

• Make sure you are in Excel. Have the Criminal Data file open.
• Double click on Tools, at the top of the screen. At or near the bottom of the tool list may be “Data Analysis.” If it is, skip the next few steps until you get to the Tools>Data Analysis step.
• Click on Tools>Add Ins. (It may take a little while.)
• Select Analysis ToolPak and Analysis ToolPak-VBA. (Click on the box to the left of them so that there is a check in each box.)
• Click OK. (You may need the disks you used to install Excel if you are doing this at home.)
• Click on Tools>Data Analysis.
• Click on Descriptive Statistics.
• Click OK.
• To select the input range, simply click on the A above the column where the height data is. (Make sure the cursor is in the input range box.)
• Since our data includes a heading on the column, be sure the “labels in first row” box is checked.
• Select the “Summary Statistics” box.
• Click OK.
• In the column headings, move your mouse to put the cursor between the column A and column B headings. The pointer should now be a vertical line with two arrows pointing to the left and right.
• Double click. This will widen column A so that you can read all the headings.
• The summary statistics should still be highlighted (dark) at this point. If they are not, click on the upper left hand block of the statistics and drag the mouse to the lower right hand corner of the block.
• Click Edit>Copy.
• Now go to your Word document.
• Click Edit>Paste.

You have now pasted the table of the summary statistics of the heights into your Word document. It is a good idea to name this document and save it. Keep doing this as you go, so that when you are ready to stop, you won’t lose all of your work so far. You will answer the questions posed in this lab in the Word document. On your summary statistics table, the minimum value should be 56 inches and the count, n, should be 3000. The original data did not go away. When you need to go back to it, select the Microsoft Excel box and then click on the “Criminal Data” tab at the bottom of the screen. In the Word document, you will answer the numbered questions.

1. What is the maximum height?

We now will use Excel to make a histogram (bar chart) of the data. We need to set up the “classes” for Excel to use. They are called “bins.”

• Go to Excel.
• Click on the Criminal Data tab to get back to the data.
• In an empty cell, say D1, type “height (inches).”
• In D2, enter the minimum value (56).
• In D3, enter “57.”
• Select D2 and D3.
• Move the cursor to the lower right hand corner of D3. The cursor should turn into a “+.”
• Click and drag down until the number in the box is the same as the maximum value.

Now you have the “bins” that Excel will use for the histogram
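The bins are simply every whole number from the minimum to the maximum, and the histogram tool tallies how many data values land in each one. The Python sketch below imitates that idea on a handful of hypothetical heights (the real file has 3000); it assumes, as in this lab, that every height is recorded as a whole number of inches.

```python
from collections import Counter

# Hypothetical heights in whole inches, standing in for column A.
heights = [64, 65, 65, 66, 66, 66, 67, 68, 68, 70]

# Bins run from the minimum to the maximum in steps of 1, like column D.
bins = list(range(min(heights), max(heights) + 1))

# Tally how many heights equal each bin value; a bin with no data gets 0.
counts = Counter(heights)
frequencies = [counts[b] for b in bins]

for b, f in zip(bins, frequencies):
    print(b, f)
```

Note that a bin with no data (69 in this made-up list) still appears, with a frequency of 0, so the bars stay evenly spaced.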

• Click on Tools>Data Analysis>Histogram>OK.
• For the input range, select column A by clicking on the “A”.
• Click on the bin box.
• Select D1 through to the maximum value (D something.)
• Be sure the labels box is checked.
• Be sure the Chart Output box is checked.
• Click OK.
• In the graph, click on the word “frequency” on the right hand side and press “delete.” (This gets rid of the label “frequency.”)
• Click on the word “Histogram.”
• Now select, or double click on, “Histogram” and type in an appropriate title for your chart. When you are done, click somewhere away from the title to deselect it.
• Do the same for the word “frequency” on the left side of the chart. Give it a correct title. (That is, what are the units being shown in the bars?)

We will now copy this histogram and paste it into Word:

• Click in the box containing the chart, but not directly on the bars or titles, to select the chart. (You may see little black boxes at the edges.)
• Click Edit>Copy.
• Go to your Word document.
• Click Edit>Paste.

You can resize the histogram in your Word document by:

• Clicking on the graph to select it.
• Clicking on one of the little squares on the perimeter of the box and dragging the mouse to resize the chart. The histogram should not take up your whole page, nor should it be as small as a postage stamp. Pick a medium size to look good in your document.

2. Describe the shape of the data. This is called the shape of the “distribution.”

Now we will do some calculations with the summary statistics. Go to the Excel sheet that has these. (It is probably called “sheet 1” in your workbook. You can rename it by clicking on the tab at the bottom of the page, select the “sheet 1” title, and type “Summary Stats” as your title for the sheet.)

3. Calculate x̄ ± s, x̄ ± 2s, x̄ ± 1.5s, and x̄ ± 3s to the nearest tenth of an inch. (x̄ is the mean and s is the standard deviation of our data.) Put these intervals in your Word document.

4. What percent of the 3000 heights do you expect in the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s? Answer this using Chebyshev’s Rule, and then answer this using the Empirical Rule.

We will now determine the actual percent of the data falling in the intervals above. Since the data is in discrete classes, we will estimate the percent that is in each interval. We will assume that a measurement of 65 really means that the height is between 64.5 and 65.5 inches, etc. To estimate the number of heights in your intervals from question 3, we will “interpolate” to determine what percent of each class is in the interval.

For example, if one of the intervals were (62.3, 68.6), we would want a certain percentage of the “62” inch heights, all of the “63” inch heights, and so on until the “68” class, and lastly, a percentage of the “69” inch heights. To find this, we would add up (62.5-62.3)*(number of people in the “62” class) + (number of people in the “63” class) + (number of people in the “64” class) + (number of people in the “65” class) + (number of people in the “66” class) + (number of people in the “67” class) + (number of people in the “68” class) + (68.6-68.5)*(number of people in the “69” class).

Notice, I am finding the number of people in each class from my Excel sheet where the histogram is. Excel shows the “bins” and the “frequencies”, or number of people whose heights are in each bin. It is only for the first and last classes in my interval that I need to find a percentage.

5. Determine the percentage of heights that fall in each of the intervals you found in question 3. Please show your calculations clearly.

6. How do these percentages compare with what you expected? Which of the two rules seems most appropriate?

7. Based upon the last 2 questions, make a conclusion about the percent of data that falls within 1.5 standard deviations of the mean. (i.e.: what would you “expect” to find?)
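The interpolation described above can be checked with a short program. The Python sketch below is one possible way to write it, using hypothetical frequencies rather than the real histogram output; the function name and the dictionary layout are assumptions made for this illustration.

```python
def estimated_count(lower, upper, class_counts):
    """Estimate how many measurements fall in the interval (lower, upper).

    class_counts maps a recorded value c to its frequency. A recorded value c
    is treated as covering the range from c - 0.5 to c + 0.5.
    """
    total = 0.0
    for value, count in class_counts.items():
        # Length of the part of this class that lies inside the interval.
        overlap = min(upper, value + 0.5) - max(lower, value - 0.5)
        if overlap > 0:
            total += overlap * count
    return total


# Hypothetical frequencies: 10 people recorded at each height from 62 to 69.
class_counts = {height: 10 for height in range(62, 70)}

estimate = estimated_count(62.3, 68.6, class_counts)
percent = 100 * estimate / sum(class_counts.values())
print(round(estimate, 2), round(percent, 2))
```

With these made-up counts the estimate is 0.2 of the “62” class, all of the “63” through “68” classes, and 0.1 of the “69” class, which matches the pattern of the sum written out above.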

Now we will repeat this with the left middle finger data. We need to determine the minimum and maximum finger length.

• Go to Excel.
• Click on the Criminal Data tab.
• Click on Tools>Data Analysis.
• Click on Descriptive Statistics.
• Click OK.
• To select the input range, click on the B above column B. (Make sure the cursor is in the input range box.)
• Check the “labels in first row” box.
• Check the “Summary Statistics” box.
• Click OK.

8. What is the maximum length for middle fingers?

• Go back to Excel.
• Click on the Criminal Data tab.
• In an empty cell, say E1, type “length (cm).”
• In cell E2, enter the minimum value, 9.5.
• In cell E3, enter 9.6.
• Select E2 and E3.
• Move the cursor to the lower right hand corner of E3. The cursor should turn into a “+.”
• Click and drag down until the number in the box is the same as the maximum value.
• Click on Tools>Data Analysis>Histogram>OK.
• For the input range, select column B by clicking on the B.
• Highlight anything in the “bin range” box and delete it.
• Select E1 through the maximum value.
• Check the label and chart output boxes.
• Click OK.
• Change the labels on the graph like you did with the height histogram.
• Copy and paste the graph into your Word document.

9. Calculate x̄ ± 1.5s to the nearest hundredth of a cm. Determine the percentage of lengths that fall into this interval.

10. The shape of the histogram is up and down in several places. Why might this be? What could have been done differently to avoid this?

11. How does the calculated percentage compare with your predicted value?

You are now done. You can save your Excel file and the Word document to a disk. You will turn in your Word document with the histograms and answers to the questions all typed up to me.

Box and Whisker Plots

Uses:

• To study the distribution of numerical data, to see how spread out or bunched up it is.

• To compare different sets of numerical data.

To Draw the Box Plot (Find the 5 number summary):

• Find the minimum value of the data (Min)

• Find the maximum value of the data (Max)

• Find the median, the middle value of the data (Med)

• Find the lower (first) and upper (third) quartiles (Q1 and Q3)

The lower quartile, Q1, is the middle value of the numbers that are less than the

median and the upper quartile, Q3, is the middle value of the numbers that are

greater than the median.

Make a number line, in scale that works for the values of your data. Draw vertical line segments above the values of Q1, Med, and Q3. Make the box around these values. Extend lines (“whiskers”) from the left and right sides of the box to the minimum and maximum values.
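The five-number summary can also be computed with a few lines of code. The Python sketch below is an illustration on hypothetical data; it takes Q1 and Q3 to be the medians of the values positioned below and above the middle of the sorted list, which is one common reading of the rule stated above (other textbooks and programs, Excel included, may use slightly different quartile rules).

```python
from statistics import median


def five_number_summary(data):
    """Return (Min, Q1, Med, Q3, Max) for a list with at least two values."""
    values = sorted(data)
    n = len(values)
    lower_half = values[: n // 2]        # values before the middle position
    upper_half = values[(n + 1) // 2:]   # values after the middle position
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])


# Hypothetical data, not the home run lists.
print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))  # (1, 3, 7, 11, 13)
```

The box is drawn from Q1 to Q3 with a line at Med, and the whiskers reach out to Min and Max.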

Who Was the Greatest Yankee Home Run Hitter? (from: Exploring Data by James M. Landwehr and Ann E. Watkins, prepared for the American Statistical Association’s Quantitative Literacy Project)

Here are four of the greatest New York Yankee home run hitters with a list of the number of home runs each hit while a Yankee.

Babe Ruth             Lou Gehrig            Mickey Mantle         Roger Maris
Year   # Home Runs    Year   # Home Runs    Year   # Home Runs    Year   # Home Runs
1920   54             1923   1              1951   13             1960   39
1921   59             1924   0              1952   23             1961   61
1922   35             1925   20             1953   21             1962   33
1923   41             1926   16             1954   27             1963   23
1924   46             1927   47             1955   37             1964   26
1925   25             1928   27             1956   52             1965   8
1926   47             1929   35             1957   34             1966   13
1927   60             1930   41             1958   46
1928   54             1931   46             1959   31
1929   46             1932   34             1960   40
1930   49             1933   32             1961   54
1931   46             1934   49             1962   30
1932   41             1935   30             1963   15
1933   34             1936   49             1964   35
1934   22             1937   37             1965   19
                      1938   29             1966   23
                      1939   0              1967   22
                                            1968   18

Source: The Baseball Encyclopedia, 4th ed., Joseph L. Reichler, ed., 1979

1. Using ONE numerical scale (or number line), make 4 box and whiskers plots on the same page. Be sure to include the 5 number summary that you got for each player. Compare the box plots and:

2. Decide who was the best player. What is your reasoning? 3. Rank the 4 players from best to worst. What are the reasons for your rankings?

Probability: Experiment versus Theory

Throw one die 36 times and record your results on this table. We will fill in the theory column together, as a class.

Number showing on the die Experiment: # of times it occurred

Theory: Percentage expected

1

2

3

4

5

6
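If you would like to compare your 36 throws with a computer's, the experiment is easy to imitate. The Python sketch below is only an illustration: it uses the standard `random` module as a stand-in for a fair die, and the seed value is arbitrary, chosen just so that the same run can be repeated.

```python
import random
from collections import Counter

random.seed(12)  # arbitrary seed so the run can be repeated

# Throw one simulated die 36 times.
throws = [random.randint(1, 6) for _ in range(36)]
counts = Counter(throws)

for face in range(1, 7):
    percent = 100 * counts[face] / len(throws)
    print(f"{face}: occurred {counts[face]} times ({percent:.1f}%)")
```

Changing the seed gives a different set of 36 throws, much as a different student's table will differ from yours.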

Throw two dice 36 times and record the sums that you get:

Sum of the 2 dice Experiment: # of times that sum occurred


2

3

4

5

6

7

8

9

10

11

12

Throw two dice 36 times and record the differences you get:

Difference of the 2 dice (large # - small #)

Experiment: # of times that difference occurred


0

1

2

3

4

5

Two Dice Sums – Probabilities of Combined Events 1. Write the theoretical probabilities of the following sums:

a) P(sum of 2) b) P(sum of 3) c) P(sum of 4) d) P(sum of 5) e) P(sum of 6) f) P(sum of 7) g) P(sum of 8) h) P(sum of 9) i) P(sum of 10) j) P(sum of 11) k) P(sum of 12) l) P(sum of 13)

2. Find these probabilities:

a) P(sum of 7 or 11) b) P(sum of 3 or 4) c) P(sum is an even number) d) P(sum is an odd number) e) P(a sum of 3, 5 or 9) f) P(a sum larger than 5) g) P(a sum smaller than 5) h) P(a sum less than 12) i) P(a sum smaller than 5 OR larger than 9) j) P(a sum smaller than 4 OR larger than 8) k) P(a sum larger than 2 AND smaller than 5) l) P(one die shows a 5 and the other die has 3 or less)
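One way to check answers like these is to list all 36 outcomes and count the ones that belong to the event. The Python sketch below does this with exact fractions; the helper name `probability` is made up for this illustration, and the example event (rolling doubles) is deliberately not one of the problems above.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """P(event) = (number of outcomes in the event) / (number of outcomes)."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))


# Example event: both dice show the same number.
p_doubles = probability(lambda dice: dice[0] == dice[1])
print(p_doubles)  # 1/6
```

Any event about the two dice can be tested the same way by changing the condition inside the `lambda`.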

3. The sample space for “roll one die” had 6 outcomes. The sample space for “roll two dice” had 36 outcomes. How many outcomes will there be for the experiment “roll three dice?”

4. Write out a suitable sample space for the experiment “roll a die and toss a coin.”

Find the probabilities:

a) P(6 and heads) = P(6H) b) P(rolling a 5) c) P(heads and any number) d) P(even number and tails)

5. Write out a suitable sample space for the experiment “roll a die, toss a coin, and pick one of 3 cards.” (You have the ace, king and queen of clubs.)

Find the probabilities:

a) P(a 1, a head and an ace) b) P(a 3 and tails) c) P(tails and a queen) d) P(a 4 AND a king) e) P(anything but a 4 on the die) f) P(an even #, heads and king) g) P(an odd # and an ace) h) P(an odd # on the die and a king or queen) i) P(a 4 OR a king)
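The two-dice questions above all come down to counting outcomes in a sample space of equally likely results. Here is a minimal Python sketch of that idea. The names `outcomes`, `p_sum` and `prob` are made up for this illustration and are not part of the course materials.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each sum, then turn counts into probabilities.
sum_counts = Counter(a + b for a, b in outcomes)
p_sum = {s: Fraction(c, len(outcomes)) for s, c in sum_counts.items()}


def prob(event):
    """Probability that an outcome (a, b) satisfies the predicate `event`."""
    favorable = sum(1 for a, b in outcomes if event(a, b))
    return Fraction(favorable, len(outcomes))


p_seven = p_sum[7]
p_thirteen = p_sum.get(13, Fraction(0))  # a sum of 13 cannot happen
p_seven_or_eleven = prob(lambda a, b: a + b in (7, 11))

# The same enumeration idea extends to three dice.
three_dice_outcomes = len(list(product(range(1, 7), repeat=3)))

print(p_seven, p_thirteen, p_seven_or_eleven, three_dice_outcomes)
```

Using `Fraction` keeps the answers as exact fractions, which makes them easy to compare with the theory column you fill in by hand.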

Counting Recap

Here is a summary of what we discussed in class:

Multiplication Principle

e.g. How many possible outcomes do we have when we toss a coin, toss a die and pick one of 3 cards? 2·6·3 = 36 possible outcomes

Factorials

n! = n(n-1)(n-2)…3·2·1

e.g. 6! = 6·5·4·3·2·1 = 720, 4! = 4·3·2·1 = 24

Permutations

How can I use n objects to fill r spaces? Order is important, no repeats.

$P(n,r) = {}_nP_r = \frac{n!}{(n-r)!}$

e.g. How can I use 7 books to fill 4 empty spaces on the shelf? 7P4 = 840

Combinations

How can I pick n objects r at a time? Order is not important, no repeats.

$C(n,r) = {}_nC_r = \binom{n}{r} = \frac{n!}{(n-r)!\,r!}$

e.g. How many ways can you pick a committee of 3 from a group of 12 people?

12C3 = 220

The 4 Ways to Count:

1. License Plates: Order is important, repeats are allowed

2. Teams: Order is important, repeats are not allowed

3. Committees: Order is not important, repeats are not allowed

4. Weird Stuff: Order is not important, repeats are allowed
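The recap formulas can be checked with Python's `math` module, as a rough sketch. The function `four_ways` and its labels are made up for this illustration. The count used for “Weird Stuff,” C(n + r − 1, r), is the standard formula for unordered selection with repeats; it is an addition here and does not appear in the recap above.

```python
import math

# Multiplication principle: coin (2 faces) x die (6 faces) x card (3 cards).
coin_die_card = 2 * 6 * 3

# Factorials, permutations and combinations from the recap.
six_factorial = math.factorial(6)
books_on_shelf = math.perm(7, 4)   # 7P4: order matters, no repeats
committees = math.comb(12, 3)      # 12C3: order ignored, no repeats


def four_ways(n, r):
    """Counts for choosing r items from n under the four order/repeat rules."""
    return {
        "license plates": n ** r,                # ordered, repeats allowed
        "teams": math.perm(n, r),                # ordered, no repeats
        "committees": math.comb(n, r),           # unordered, no repeats
        "weird stuff": math.comb(n + r - 1, r),  # unordered, repeats allowed (assumed formula)
    }


print(coin_die_card, six_factorial, books_on_shelf, committees)
print(four_ways(3, 2))
```

`math.perm` and `math.comb` require Python 3.8 or later.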

Some Counting Problems:

1. Calculate:

P(7, 4) = 7P4 = ________

C(7,4) = 7C4 = ________

2. On a math test there are 10 multiple choice questions with 4 possible answers each, and 15 true-false questions. In how many possible ways can the 25 questions be answered?

3. The student affairs committee has 3 faculty, 2 administration members, and 5 students on it. In how many ways can a subcommittee of 1 faculty, 1 administrator and 2 students be formed?

4. Here is an ice-cream question:

The Laddon Ice-cream Shoppe has 31 flavors of ice-cream. You are going to buy a triple decker ice-cream cone. How many ways are there to arrange your 3 scoops of ice-cream if:

a) Each scoop has to be a different flavor and you do not care how they are arranged.

b) Each scoop has to be a different flavor and you do care how they are arranged.

c) Each scoop does not have to be a different flavor and you do care how they are arranged.

d) Each scoop does not have to be a different flavor and you do not care how they are arranged.

Conditional Probability

Consider this contingency table, which presents the results of an advertising survey about the use of credit by Martan Oil Company customers:

Number of Purchases at Gasoline Stations (Made Last Year)

Method of Payment | 0 - 4 | 5 - 9 | 10 - 14 | 15 - 19 | 20 and over
Cash | 150 | 100 | 25 | 0 | 0
Oil Company Card | 50 | 35 | 115 | 80 | 70
National or Bank Credit Card | 50 | 60 | 65 | 45 | 5

a) How many customers were surveyed?

b) Why is this bivariate data? What type of variable is each one?

c) What is the probability of preferring to use an oil-company credit card?

d) What is the probability of preferring cash AND making 10 - 14 purchases last year?

e) What is the probability of making between 5 and 9 purchases GIVEN that the person prefers to use a national or bank credit card?

f) What does the 70 in the fifth cell of the second row mean?

g) What is the probability of preferring cash OR making 10 – 14 purchases last year?

h) Using whatever category you like, try an example to see if you can show that the 2 variables are independent (or not independent.)
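To see how AND, OR and GIVEN probabilities come out of a contingency table, here is a small Python sketch using the Martan Oil counts. All function and variable names are made up for this illustration. The independence check compares P(A and B) with P(A)·P(B) for a single cell; with real survey counts the two rarely match exactly, so treat exact equality as a strict test.

```python
from fractions import Fraction

bins = ["0-4", "5-9", "10-14", "15-19", "20 and over"]
counts = {
    "Cash": [150, 100, 25, 0, 0],
    "Oil Company Card": [50, 35, 115, 80, 70],
    "National or Bank Credit Card": [50, 60, 65, 45, 5],
}

grand_total = sum(sum(row) for row in counts.values())
row_total = {m: sum(row) for m, row in counts.items()}
col_total = {b: sum(row[j] for row in counts.values()) for j, b in enumerate(bins)}


def cell(method, purchases):
    """Number of customers in one cell of the table."""
    return counts[method][bins.index(purchases)]


def p_and(method, purchases):
    """P(method AND purchases): cell count over the grand total."""
    return Fraction(cell(method, purchases), grand_total)


def p_or(method, purchases):
    """P(method OR purchases): add both totals, subtract the overlap once."""
    overlap = cell(method, purchases)
    return Fraction(row_total[method] + col_total[purchases] - overlap, grand_total)


def p_purchases_given_method(purchases, method):
    """P(purchases GIVEN method): cell count over that method's row total."""
    return Fraction(cell(method, purchases), row_total[method])


def looks_independent(method, purchases):
    """True only if P(A and B) equals P(A) * P(B) exactly for this cell."""
    p_a = Fraction(row_total[method], grand_total)
    p_b = Fraction(col_total[purchases], grand_total)
    return p_and(method, purchases) == p_a * p_b


print(grand_total, p_and("Cash", "10-14"), looks_independent("Cash", "10-14"))
```

Notice that a GIVEN probability changes the denominator from the grand total to a row (or column) total.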

Conditional Probability and Independence of Variables

Marcella went to several restaurants. Here is a table describing the cost and quality of the food she had.

The Food was: | Good | Bad
Cheap | 50 | 40
Expensive | 25 | 5
Total | |

Find these probabilities:

a) P(the food was good)

b) P(the food was cheap)

c) P(the food was good and cheap)

d) P(the food was good given that it was cheap)

e) P(the food was expensive given that it was bad)

f) Based on this study, are cost and quality independent? Why or why not?

Probability Distributions

For each of the following, state whether or not it is a probability distribution. If it is not, tell the reason why. If it is, find its mean and standard deviation.

1.

X | P(X)
1 | 0.2
2 | 0.3
3 | 0.3
4 | 0.1

µ = _________, σ = ____________

2.

X | P(X)
1 | 0.4
2 | 0.3
3 | 0.2
4 | 0.1

µ = _________, σ = ____________

3.

X | P(X)
1 | 0.5
2 | -0.3
3 | 0.6
4 | 0.2

µ = _________, σ = ____________

4.

X | P(X)
0 | 0.2
1 | 0.2
2 | 0.2
3 | 0.2
4 | 0.2

µ = _________, σ = ____________

5.

X | P(X)
1 | 1/8
2 | 3/8
3 | 3/8
4 | 1/8

µ = _________, σ = ____________
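The two checks for a probability distribution (every probability between 0 and 1, and all of them adding to 1) and the formulas µ = Σ x·P(x) and σ = √(Σ (x − µ)²·P(x)) can be sketched in Python as follows. The function names and the tolerance `tol` are assumptions made for this illustration.

```python
import math


def check_distribution(ps, tol=1e-9):
    """Return (True, "") if ps is a valid distribution, else (False, reason)."""
    if any(p < 0 or p > 1 for p in ps):
        return False, "a probability lies outside the range 0 to 1"
    if abs(sum(ps) - 1) > tol:
        return False, f"the probabilities sum to {sum(ps):.4g}, not 1"
    return True, ""


def mean_and_sd(xs, ps):
    """Mean and standard deviation of a discrete random variable."""
    mu = sum(x * p for x, p in zip(xs, ps))
    variance = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))
    return mu, math.sqrt(variance)


# Table 3 contains a negative entry, so it fails the first check.
table3_ok, table3_reason = check_distribution([0.5, -0.3, 0.6, 0.2])

# Table 5 passes both checks, so its mean and standard deviation exist.
xs5, ps5 = [1, 2, 3, 4], [1 / 8, 3 / 8, 3 / 8, 1 / 8]
table5_ok, _ = check_distribution(ps5)
mu5, sigma5 = mean_and_sd(xs5, ps5)

print(table3_ok, table3_reason)
print(table5_ok, mu5, round(sigma5, 4))
```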

Binomial type problems.

1. A weird penny is tossed 8 times. The probability of heads on any one toss is equal to 0.3. Fill in the following table:

Number of Heads X | Use the formula (the long way!) P(X) | Use Binomial Table in the text P(X) | Use Binoml83 program on calculator P(X)

0

1

2

3

4

5

6

7

8

2. This time the weird penny is tossed 6 times. P(heads) = 0.9. Fill in the following table:

Number of Heads X | Use the formula (the long way!) P(X) | Use Binomial Table in the text P(X) | Use Binoml83 program on calculator P(X)

0

1

2

3

4

5

6

3. Last time. The penny is tossed 10 times. P(heads) = 0.5. Fill in the following table:

Number of Heads X | Use the formula (the long way!) P(X) | Use Binomial Table in the text P(X) | Use Binoml83 program on calculator P(X)

0

1

2

3

4

5

6

7

8

9

10

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
1 10 45 120 210 252 210 120 45 10 1
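The entries of the triangle above are the binomial coefficients nCx, which is what “the long way” formula P(X) = nCx · p^x · (1 − p)^(n − x) needs. A minimal Python sketch, assuming a helper named `binomial_pmf` made up for this illustration, fills in a whole column of one of the weird penny tables at once:

```python
import math


def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for n tosses with P(heads) = p."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


# Row 10 of the triangle is the list of coefficients 10C0, 10C1, ..., 10C10.
pascal_row_10 = [math.comb(10, k) for k in range(11)]

# The weird penny from problem 1: 8 tosses, P(heads) = 0.3.
weird_penny = binomial_pmf(8, 0.3)

for heads, probability in enumerate(weird_penny):
    print(heads, round(probability, 4))
```

The printed values can be compared against the binomial table in the text and the calculator program.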

THE STANDARD NORMAL CURVE Worksheet

For each of the problems below, be sure to SKETCH the standard normal curve and SHADE IN the area you are being asked to find. See the following example:

Find the area under the standard normal curve that lies between z = -1.65 and z = 1.65.

From the table in the text, or using the Normal83 program, the area = 2(.4505) = .901. The related probability statement is P(-1.65 < z < 1.65) = .901 or 90.1%.

The shaded area is .901 or 90.1% of the entire area.

1. Find the area under the standard normal curve that lies between:

Sketch Area

a) z = 0 and z = 2

b) z = 0 and z = 3

c) z = −3 and z = 3

d) z = 0 and z = 1.70

e) z = −1.70 and z = 2

f) z = 1.70 and z = 2.70

2. Find the AREA under the standard normal curve:

SKETCH:

a) to the left of z = −0.40

b) for z < 0.40

c) to the right of z = 1.65

d) to the right of z = −1.65

e) outside the interval from z = −2.00 to z = 2.00

3. Solve for z in each of the following (I have shaded in the areas; you find the z value that gives the correct area):

a) Area = 80.64% z =

b) Area = 44.52% z =

c) Area = 17.11% z =

d) Area = 99.18% z =

e) Area = 91.15% z =

f) Combined area = 4.04% z =
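The worked example at the start of this worksheet (the area between z = −1.65 and z = 1.65) can also be checked in software. This is a sketch using Python's standard `statistics.NormalDist` in place of the table or the Normal83 program; the helper name `area_between` is made up for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve: mean 0, standard deviation 1


def area_between(z_low, z_high):
    """Area under the standard normal curve between two z values."""
    return Z.cdf(z_high) - Z.cdf(z_low)


# The worksheet example: area between z = -1.65 and z = 1.65.
example_area = area_between(-1.65, 1.65)

# Going the other way (as in question 3): the z value with 95% of the area to its left.
z_for_left_area_95 = Z.inv_cdf(0.95)

print(round(example_area, 4), round(z_for_left_area_95, 3))
```

`cdf` always gives the area to the LEFT of z, so areas to the right or between two values are found by subtraction, just as with the sketches you draw.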

Statistics Using the TI-83 Graphics Calculator

The Normal Distribution By Mike Koehler Blue Valley North High School Overland Park, KS [email protected]

Problem 9: A company manufactures light bulbs that have a life expectancy that is normally distributed with a mean of 750 hours and a standard deviation of 40 hours. Find the probability that a bulb burns between 728 hours and 784 hours.

Problem 10: The heights of 6 year old girls are normally distributed with a mean of 46 inches and a standard deviation of 2.17 inches. Find the probability that a girl selected at random will have a height less than 44 inches.

Problem 11: In a letter to Ann Landers, a wife claimed to give birth to a baby 308 days after a visit from her husband who was in the Navy and stationed on board a ship. Pregnancies are normally distributed with a mean of 268 days and a standard deviation of 15 days. Does the wife have a problem? (Triola, Elementary Statistics, Addison Wesley, 1992)

Problem 12: On a SAT exam administered by the CEEB, the mean math score was 475 with a standard deviation of 130. If a scholarship is available to students with scores above the 85th percentile, what is the score needed to be eligible for the scholarship?

Problem 13: The life of an electric drill when used commercially follows a normal distribution with a mean of 8 years with a standard deviation of 1.25 years. The manufacturer will replace free all drills that fail while under warranty. If the manufacturer is willing to replace only 5% of the drills that fail, how long a guarantee should be offered?

Copyright 1996 by Mike Koehler

(c) Copyright 1997 Texas Instruments Incorporated. All rights reserved.


The Normal Distribution

1. What is the standard normal distribution and why is it important?

2. Consider data that was obtained from a normal distribution with a mean of 6.3 and a standard deviation of 3.17. Convert the following to z-scores.

a. 12.31 b. 8.2 c. 2.1 d. 15.8

3. What is the physical meaning of a z-score of 1.96?

4. What is the physical meaning of a z-score of –1.96?

5. Consider the intelligence quotient (IQ) of a person. IQs are approximately normal with a mean of 100 and a standard deviation of 15. If a person were selected at random,

a. What is the probability that her/his IQ would be below 120?
b. What is the probability that her/his IQ would be above 120?
c. What is the probability that her/his IQ would be between 90 and 105?
d. What is the probability that her/his IQ would be below 90?

6. Consider data that was obtained from a normal distribution with a mean of 6.3 and a standard deviation of 3.17. Find the 85th percentile.

7. Find the following probabilities:

a. P(-1.5 < z < 1.1) b. P(0 < z < 6.2) c. P(-1 < z < 1) d. P(-2 < z < 2) e. P(-3 < z < 3) f. P(-1.96 < z < 1.96)

8. Given that a particular normal random variable, x, has a mean of 13 and a standard deviation of 6.2, find the following probabilities:

a. P(14 < x < 20) b. P(4 < x < 17.2) c. P(x < 7)

9. A job satisfaction index score for nurses is normally distributed with a mean of 50 and a standard deviation of 10. What is the probability that a nurse selected at random has an index score:

a. Higher than 55? b. Between 47 and 59?

10. A radar unit is used to measure the speed of automobiles on a busy street in downtown Santa Cruz that has a speed limit of 35 mph. Suppose the speeds of individual automobiles are normally distributed with a mean of 37 mph.

a. Find the standard deviation if 5% of the automobiles travel faster than 45 mph.

b. Based on the standard deviation you just calculated, find the 85th percentile for the random variable “automobile speed.”

c. Based on the standard deviation you just calculated, what percentage of cars travel within 3 mph of the posted speed limit?

Questions about Normal Data and Sample Means

A manufacturer of Sea-Monkeys finds that the daily numbers of Sea-Monkey fun packs assembled by a machine are normally distributed with a mean of 560 and a standard deviation of 12.

1. For a randomly selected day, find the probability that more than 575 items are produced.

2. For a randomly selected day, find the probability that between 565 and 580 items were produced.

3. For 36 different randomly selected days, find the probability that the mean number of items produced is less than 565.

4. For 20 different randomly selected days, find the probability that the mean number of items produced is greater than 565.

5. For a randomly selected day, find the value of the twentieth percentile.
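The difference between a question about one day and a question about the mean of n days is the standard deviation you use: σ for a single day, σ/√n for a sample mean. A brief Python sketch of that distinction, with names made up for this illustration:

```python
import math
from statistics import NormalDist

MU, SIGMA = 560, 12  # daily production: mean and standard deviation


def sample_mean_dist(n):
    """Distribution of the mean of n randomly selected days."""
    return NormalDist(MU, SIGMA / math.sqrt(n))


# One day versus the mean of 36 days, both compared with 565.
p_single_day_below_565 = NormalDist(MU, SIGMA).cdf(565)
p_mean_of_36_below_565 = sample_mean_dist(36).cdf(565)

print(round(p_single_day_below_565, 4), round(p_mean_of_36_below_565, 4))
```

The mean of 36 days is far less variable than a single day, so it is much more likely to fall below 565.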

Sampling Distributions, An Example

Original Population: X = 1, 2, 3. All are equally probable, with $\mu = 2$ and $\sigma = \sqrt{2/3} = .8165$

The population of all SAMPLES of size 4 (sampling with replacement):

Samples of size 4 {Population 1, 2, 3} | x-bar (mean)
1 1 1 1 | 1
1 1 1 2 | 1.25
1 1 1 3 | 1.5
1 1 2 1 | 1.25
1 1 2 2 | 1.5
1 1 2 3 | 1.75
1 1 3 1 | 1.5
1 1 3 2 | 1.75
1 1 3 3 | 2
1 2 1 1 | 1.25
1 2 1 2 | 1.5
1 2 1 3 | 1.75
1 2 2 1 | 1.5
1 2 2 2 | 1.75
1 2 2 3 | 2
1 2 3 1 | 1.75
1 2 3 2 | 2
1 2 3 3 | 2.25
1 3 1 1 | 1.5
1 3 1 2 | 1.75
1 3 1 3 | 2
1 3 2 1 | 1.75
1 3 2 2 | 2
1 3 2 3 | 2.25
1 3 3 1 | 2
1 3 3 2 | 2.25
1 3 3 3 | 2.5
2 1 1 1 | 1.25
2 1 1 2 | 1.5
2 1 1 3 | 1.75
2 1 2 1 | 1.5
2 1 2 2 | 1.75
2 1 2 3 | 2
2 1 3 1 | 1.75
2 1 3 2 | 2
2 1 3 3 | 2.25
2 2 1 1 | 1.5
2 2 1 2 | 1.75
2 2 1 3 | 2
2 2 2 1 | 1.75
2 2 2 2 | 2
2 2 2 3 | 2.25
2 2 3 1 | 2
2 2 3 2 | 2.25
2 2 3 3 | 2.5
2 3 1 1 | 1.75
2 3 1 2 | 2
2 3 1 3 | 2.25
2 3 2 1 | 2
2 3 2 2 | 2.25
2 3 2 3 | 2.5
2 3 3 1 | 2.25
2 3 3 2 | 2.5
2 3 3 3 | 2.75
3 1 1 1 | 1.5
3 1 1 2 | 1.75
3 1 1 3 | 2
3 1 2 1 | 1.75
3 1 2 2 | 2
3 1 2 3 | 2.25
3 1 3 1 | 2
3 1 3 2 | 2.25
3 1 3 3 | 2.5
3 2 1 1 | 1.75
3 2 1 2 | 2
3 2 1 3 | 2.25
3 2 2 1 | 2
3 2 2 2 | 2.25
3 2 2 3 | 2.5
3 2 3 1 | 2.25
3 2 3 2 | 2.5
3 2 3 3 | 2.75
3 3 1 1 | 2
3 3 1 2 | 2.25
3 3 1 3 | 2.5
3 3 2 1 | 2.25
3 3 2 2 | 2.5
3 3 2 3 | 2.75
3 3 3 1 | 2.5
3 3 3 2 | 2.75
3 3 3 3 | 3

The Distribution of the Sample Means:

x-bar | Frequency | P(x-bar)
1 | 1 | 0.01
1.25 | 4 | 0.05
1.5 | 10 | 0.12
1.75 | 16 | 0.20
2 | 19 | 0.23
2.25 | 16 | 0.20
2.5 | 10 | 0.12
2.75 | 4 | 0.05
3 | 1 | 0.01

[Figure: Histogram of Sample Means. Horizontal axis: Average of 4 numbers (1 to 3 in steps of 0.25). Vertical axis: Frequency (0 to 20).]

(Looks kind of normal, no?)

Summary Sampling Statistics

n = 4

$\mu_{\bar{x}} = \mu = 2$

$\sigma_{\bar{x}}^2 = .166$

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{.8165}{2} = .4082$
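The whole example can be reproduced by letting a computer list every possible sample. The Python sketch below rebuilds all 81 samples of size 4, tallies the sample means, and compares the standard deviation of those means with σ/√n. Variable names are made up for this illustration.

```python
import math
from collections import Counter
from itertools import product
from statistics import mean, pstdev

population = [1, 2, 3]
n = 4

# Every ordered sample of size 4, drawn with replacement: 3^4 = 81 samples.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Frequency table of the sample means (the distribution of x-bar).
frequency = Counter(sample_means)

mu_xbar = mean(sample_means)
sigma_xbar = pstdev(sample_means)               # spread of the 81 sample means
sigma_over_root_n = pstdev(population) / math.sqrt(n)

for value in sorted(frequency):
    print(value, frequency[value])
print(mu_xbar, round(sigma_xbar, 4), round(sigma_over_root_n, 4))
```

`pstdev` is used because both lists are complete populations (all 3 values, all 81 samples), not samples from something larger.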

Sampling Activity

1. Look at the sheet for a minute. Turn it face down. Write down your “estimate” of the average area.

2. Pick any 5 of the rectangles that you consider “typical.” Find their areas and take the average. (judgement)

3. Using a random number generator, generate 5 numbers between 1 and 100. Find the areas of those 5 rectangles and take the average. (random)

4. Pick a number between 1 and 20. Add 10 to it 4 times. Take those 5 rectangles and find their areas. Take the average. (systematic)

From: Activity-Based Statistics by R. Scheaffer, M. Gnanadesikan, A. Watkins, J. Witmer, New York, Springer-Verlag, 1996

Bivariate Data Examples

1. Two “qualitative” variables: Gender and Major

Major

Gender | Liberal Arts | Business Administration | Technology | Row Totals
Male | 5 | 6 | 7 |
Female | 6 | 4 | 2 |
Column Totals | | | |

a) What is the “grand total?”

b) Fill in the percentages based on the grand total in the grid below:

Gender Liberal

Arts

Business

Administration


Male

Female

Column

Totals

c) Fill in the percentages for each cell based on the row totals:

Gender Liberal

Arts

Business

Administration


Male

Female

Column

Totals

d) Fill in the percentages for each cell based on the column totals:

Gender Liberal

Arts

Business

Administration


Male

Female

Column

Totals

Practice stating exactly what you are finding in clear English!

2. Two “quantitative” variables: Number of Hours Studied and Score on a Test

X: Number of Hours Studied | Y: Score on the Test

18 68

27 82

20 77

10 90

30 79

24 72

32 94

27 88

12 60

16 70

a) Make a “scatter plot” of this data.

b) Enter the data into your calculator. Find the correlation coefficient, r.

c) Find the slope and intercept of the regression line.

d) Write the equation of the regression line.

(Here’s what I get: r = 0.4798612569 and the slope is 0.681642, and the intercept is 63.2765298. Check to see if you are getting these values on your calculator also.)
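If you would like to see what the calculator is doing behind the scenes, here is a short Python sketch of the least-squares formulas applied to the hours and scores table. The function `least_squares` is made up for this illustration; it uses slope = Sxy/Sxx, intercept = ȳ − slope·x̄, and r = Sxy/√(Sxx·Syy).

```python
import math

hours = [18, 27, 20, 10, 30, 24, 32, 27, 12, 16]
scores = [68, 82, 77, 90, 79, 72, 94, 88, 60, 70]


def least_squares(xs, ys):
    """Return (slope, intercept, r) for the line y = intercept + slope * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = least_squares(hours, scores)
print(round(slope, 6), round(intercept, 4), round(r, 6))
```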

Football:

A random sample of eight quarterbacks listed in The Sports Encyclopedia: Pro Football, 11th Edition, gave the following information:

Height of Q-back (in inches) | Weight of Q-back (in pounds)

75 205

78 230

74 210

73 210

72 195

75 215

76 203

73 196

a) Get a piece of graph paper and draw a scatter diagram for the data.

b) Write the equation of the line of best fit.

c) Graph this line on your scatter diagram.

d) If a quarterback is 76 inches tall, what would you predict his weight to be?

e) If a quarterback weighs 200 pounds, what would you predict his height to be?

f) What is r, the correlation coefficient? Does this seem to indicate a strong or weak correlation?

g) Now, go to the lab and use EXCEL to do this for you again. (The following page will give you the instructions on how to do this.)

Here is some information about some professional golfers.

Player’s Name | Earnings per year (x in dollars) | World Ranking (y) as of 1998

David Duval 1,272,305 8.77

Fred Couples 1,056,533 6.47

Tiger Woods 1,056,086 11.91

Justin Leonard 1,052,346 8.76

Mark O’Meara 894,724 7.49

Phil Mickelson 788,800 7.72

Ernie Els 601,363 12.35

Vijay Singh 383,979 6.59

Mark Calcavecchia 766,224 5.72

Davis Love III 621,987 10.67

Tom Watson 418,385 5.25

Nick Price 296,668 7.79

Colin Montgomerie 272,000 9.03

a) Draw the scatter diagram on graph paper.

b) What is r? Does there appear to be linear correlation for this data?

c) Write the equation of the regression line. Graph it on your scatter diagram.

d) What are the units of the slope of the regression line? What is the interpretation of the slope in English? What are the units of the intercept of the line? What does the y-intercept tell you?

e) If a golfer makes $500,000 per year, what would you expect their ranking to be?

f) If a golfer is ranked number 7 in the world, what would you expect his/her income to be?

Regression Example for Statistics For each of the following 4 data sets, use your calculator or a piece of software to find:

1. n, the sample size

2. the mean of the x's

3. the mean of the y's

4. the equation of the regression line

5. the correlation coefficient, r

6. the value of r-squared, r²

Now graph the scatter-plot for each set of data and draw in the regression line.

What is similar about the sets of data? What is different about the sets?

For which set(s) is it most appropriate to use linear regression as a model?

Set 1 Set 1 Set 2 Set 2 Set 3 Set 3 Set 4 Set 4

x y x y x y x y

10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58

8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76

13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71

9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84

11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47

14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04

6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25

4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50

12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56

7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91

5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89

This data comes from: The Visual Display of Quantitative Information by Edward R.

Tufte, Graphics Press, Cheshire, Connecticut c. 1983
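If you choose software instead of the calculator for this example, the six requested quantities can be computed for all four sets in one pass. This is a sketch only; the `summarize` helper and the dictionary layout are made up for this illustration. It deliberately does not draw the scatter plots, which are the part that answers the last two questions.

```python
import math

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # Sets 1-3 share the same x values
sets = {
    "Set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Set 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Sample size, means, regression line and correlation for one data set."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return {"n": n, "x_bar": x_bar, "y_bar": y_bar, "slope": slope,
            "intercept": y_bar - slope * x_bar, "r": r, "r_squared": r ** 2}


summaries = {name: summarize(xs, ys) for name, (xs, ys) in sets.items()}

for name, s in summaries.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

Compare the printed numbers across the four sets before you look at the graphs.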



Confidence Intervals

Problem 14: A company manufactures light bulbs that have a length of life that is approximately normally distributed with a standard deviation of 30 hours. If a sample of 40 bulbs has an average life of 765 hours, find a 95% confidence interval for the population mean of all bulbs produced by the company.

Problem 15: Ten high school students are randomly selected and asked to count the amount of change in their pocket or purse. They are found to have a mean of $1.11 and a standard deviation of $1.42. Construct a 95% confidence interval for the mean amount of money carried by high school students.

Problem 16: The following were recorded for the relief time, in hours, for a certain brand of cough suppressant: 3.4, 2.5, 4.8, 2.9, 3.6, 2.8, 3.3, 5.6, 3.7, 2.8, 4.4, 4.0, 5.2, 3.0, 4.8. Assuming that the measurements represent a random sample from a normal population, find the 95% confidence interval for relief times of the cough suppressant.

Problem 17: As a freshman, Jacque Vaughn of the University of Kansas, made 28 out of 70 shots from three point range. Construct a 95% confidence interval of the proportion of such shots this player might make in his career at K.U.



Problem 18: In a high school, it is believed that the percentage of senior males who drive to school exceeds the percentage of senior females who drive to school by 10%. It is found that 158 of 200 males surveyed drive and 114 out of 150 females drive. Find a 95% confidence interval for the difference in proportions of seniors who drive to school. Decide if the 10% difference is valid.

Problem 19: A car dealership asked 25 randomly selected buyers how long they planned to keep their new cars. The sample mean and standard deviation are 6.2 years and 3.1 years respectively. Construct the 95% confidence interval for the population standard deviation.
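Two of the interval types in these problems have simple closed forms: a z-interval for a mean when σ is known (x̄ ± z*·σ/√n, as in Problem 14) and a large-sample interval for a proportion (p̂ ± z*·√(p̂(1 − p̂)/n), as in Problem 17). The Python sketch below shows only these two; the function names are made up for this illustration, and the t-based and chi-square-based intervals needed for the other problems are not covered here.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Critical z value that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    margin = z_star(confidence) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def proportion_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z_star(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


bulb_low, bulb_high = z_interval(765, 30, 40)     # Problem 14 inputs
shot_low, shot_high = proportion_interval(28, 70)  # Problem 17 inputs

print(round(bulb_low, 2), round(bulb_high, 2))
print(round(shot_low, 4), round(shot_high, 4))
```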




Hypothesis Testing

The 5 step process:

1. State the null hypothesis H0 and the alternative hypothesis H1. (They are always about µ, p, or σ.)

2. State what you are testing, α, the level of significance given for the problem, whether the test to be used is a one-tailed or two-tailed test, and what test statistic is to be used (z, t, F, or χ².)

3. Draw a picture, shade in the appropriate tail or tails, look up the critical values using tables or calculators and label the picture.

4. Compute the test statistic and show it on your picture. The computation may be done on your calculator. I generally show the p-value in this step as well.

5. Make a decision to reject or fail to reject H0 and state your conclusion in clear English.

Example:

An IQ test was administered to twelve students and the scores were as follows:

87 102 94 81 115 75

74 116 98 114 96 102

Based on this sample, is the claim that the true mean IQ in the population is less than 100 justified? Use α = .10, and assume that IQ is normally distributed with a population variance, σ² = 144.

Solution:

1. H0: µ = 100

H1: µ < 100

2. This is a one population, one-tailed test of the mean. The test statistic to use is a z-test because σ is known (even though the sample size n is small.) We will use $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

3. This is a left-tailed test. The critical value for z is zcrit = −1.28.

Decision rule: If ztest ≤ −1.28, we’ll reject H0. If ztest > −1.28, we’ll fail to reject H0.

4. Using the formula above, with the sample mean of the twelve scores equal to 96.167, we get $z_{test} = \frac{96.167 - 100}{12 / \sqrt{12}} = -1.11$

5. Fail to reject H0. (We are NOT in the critical region.) There is not enough evidence to show that the mean IQ is less than 100.

Note: To do Step 4 on the TI-83 and TI-84 calculators, press STAT, go to TESTS, and choose Z-Test. Input: Stats, µ0 = 100, σ = 12 (the calculator will divide by √n for you), x̄ = 96.167, n = 12, µ: < µ0, Calculate.

The p-value will be given to you by the calculator. It is the probability of having a z value less than the value of our test statistic. P(z < −1.11) = .1335. By comparing the 13.35% with our significance level, α = 10%, we also see that we are not in the critical region.
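Steps 3 through 5 of the IQ example can be reproduced in a few lines of Python, starting from the twelve raw scores (whose mean works out to about 96.167). This sketch uses `statistics.NormalDist` in place of the table or calculator; the variable names are made up for this illustration.

```python
import math
from statistics import NormalDist, mean

scores = [87, 102, 94, 81, 115, 75, 74, 116, 98, 114, 96, 102]
mu_0 = 100      # value of the mean under H0
sigma = 12      # known population standard deviation (variance 144)
alpha = 0.10    # level of significance

x_bar = mean(scores)
z_test = (x_bar - mu_0) / (sigma / math.sqrt(len(scores)))

# Left-tailed test: reject H0 when z_test falls at or below the critical value.
z_crit = NormalDist().inv_cdf(alpha)
p_value = NormalDist().cdf(z_test)
reject_h0 = z_test <= z_crit

print(round(x_bar, 3), round(z_test, 2), round(z_crit, 2), round(p_value, 4), reject_h0)
```

The p-value here is computed from the unrounded test statistic, so it differs slightly in the third decimal place from a table lookup at z = −1.11.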

Examples of One Sample Tests

1. From his long-standing experience, a farmer believes that the mean yield of grain

per plot on his farm is 150 bushels. When a new seed introduced on the market

was tried on sixteen randomly picked experimental plots, the mean yield was 158

bushels. Suppose the yield per plot can be assumed to be normally distributed

with a standard deviation of yield, σ, of 20 bushels. Is the new seed significantly

better? Use α = 0.02.

2. A random sample of size n = 16 is drawn from a population having a normal distribution. The sample mean and the sample variance are given, respectively, as x̄ = 23.8 and s² = 10.24. At the 5 percent level of significance, test the following:

a) H0: µ = 25 versus H1: µ ≠ 25

b) H0: µ = 25 versus H1: µ < 25

3. A ski coach claims that she can train beginning skiers for 3 weeks so that at the

end of the program they will finish a certain downhill course in less than 13

minutes. It was found that, when a random sample of ten skiers was given the

training, their mean time was 12.3 minutes with s = 1.2 minutes. On the basis of

the evidence, is the true mean time significantly less than 13 minutes? Use α =

0.025.

4. In a sample of 160 cathode tubes inspected, 22 were found to be defective. If the

true proportion of defectives is significantly higher than the 8 percent that the

company considers tolerable, repairs on the machine are in order. At the 5 percent

level of significance, does the machine need repairs?

5. Quarters are currently minted with weights having a mean of 5.670 g and a

standard deviation of 0.062 g. New equipment is being tested in an attempt to

improve quality by reducing variation. A simple random sample of 24 quarters is

obtained from those manufactured with the new equipment, and this sample has a

standard deviation of 0.049 g. Use a 0.05 significance level to test the claim that

quarters manufactured with the new equipment have weights with a standard

deviation less than 0.062 g. Does the new equipment appear to be effective in

reducing the variation of weights? What would be an adverse consequence of

having quarters with weights that vary too much?

6. To test the effect of a physical fitness course on one’s physical ability, the number

of sit-ups that a person could do in one minute, both before and after the course,

was recorded. Ten randomly selected participants scored as shown in the

following table. Can you conclude that a significant amount of improvement took

place? Use α = 0.01.

Before 29 22 25 29 26 24 31 46 34 28

After 30 26 25 35 33 36 32 54 50 43

Examples of two-sample tests

1. The purchasing department for a regional supermarket chain is considering two

sources from which to purchase 10-lb bags of potatoes. A random sample taken

from each source shows the following results:

 | Idaho Supers | Idaho Best
No. of bags weighed | 100 | 100
Mean Weight | 10.2 lb | 10.4 lb
Sample Variance | 0.36 lb² | 0.25 lb²

At the 0.05 level of significance, is there a difference between the mean weights

of the “10-lb” bags of potatoes?

2. Two competing headache remedies claim to give fast-acting relief. An

experiment was performed to compare the mean lengths of time required for

bodily absorption of brand A and brand B headache remedies. Twelve people

were randomly selected and given an oral dosage of brand A. Another 12 were

randomly selected and given an equal dosage of brand B. The length of time in

minutes for the drugs to reach a specified level in the blood was recorded. The

information follows:

Brand A mean = 20.1 s1 = 8.7 n1=12

Brand B mean = 18.9 s2 = 7.5 n2=12

Past experience with the drug composition of the two remedies permits

researchers to assume that the standard deviations of the two time distributions are

approximately equal. Use a 5% level of significance to test the claim that there is

no difference in the mean time required for bodily absorption.

3. In a survey of working parents (both parents working outside the home) one of the

questions asked was “Have you refused a job, promotion, or transfer because it

would mean less time with your family?” Two hundred men and two hundred

women were asked this question. 29% of the men and 24% of the women

responded “yes.” Based on this survey, can we conclude that there is a difference

in the proportion of men and women responding “yes” at the 0.05 level of

significance?

4. Researchers collected data on the numbers of hospital admissions resulting from

motor vehicle crashes, and the results are given in the table for Fridays on the 6th

of a month and Fridays on the following 13th of the same month (based on data

from “Is Friday the 13th Bad for Your Health?” by Scanlon et al., British Medical

Journal, Vol. 307, as listed in the Data and Story Line online resource of data

sets). Use a 0.05 significance level to test the claim that when the 13th day of a

month falls on a Friday, the numbers of hospital admissions from motor vehicle

crashes are not affected.

Friday the 6th 9 6 11 11 3 5

Friday the 13th 13 12 14 10 4 12

5. Bipolar Depression Treatment. In clinical experiments involving different groups

of independent samples, it is important that the groups be similar in the important

ways that affect the experiment. In an experiment designed to test the

effectiveness of paroxetine for treating bipolar depression, subjects were

measured using the Hamilton depression scale with the results given below (based

on data from “Double-Blind, Placebo-Controlled Comparison of Imipramine and

Paroxetine in the Treatment of Bipolar Depression,” by Nemeroff et al., American

Journal of Psychiatry, Vol. 158, No. 6). Using a 0.05 significance level, test the

claim that both populations have the same standard deviation. Based on the

results, does it appear that the two populations have different standard deviations?

Placebo Group n = 43 x̄ = 21.57 s = 3.87

Paroxetine Treatment Group n = 33 x̄ = 20.38 s = 3.91

Chi-Square (χ²)

We’ll work through an example. Say we have polled registered voters about a piece of legislation proposed by the governor. We poll 200 urban, 200 suburban and 100 rural residents (selected randomly) and ask them if they are in favor of, or oppose the proposal.

Governor’s Proposal

Type of Residence | Favor | Oppose | Total
Urban | | | 200
Suburban | | | 200
Rural | | | 100
Total | | | 500

If people’s preferences about the proposal are independent of where they live, then we would expect 40% of the people in favor of the proposal to be urban dwellers, since they make up 40% of the population here. Likewise, we would expect 40% of the people opposing the proposal to be urban dwellers also.

With this in mind, and using the fact that 254 people were in favor of the proposal and

246 people were opposed to the proposal, fill out the table below for “expected”

responses.


Type of Residence | Favor | Oppose | Total
Urban | | | 200
Suburban | | | 200
Rural | | | 100
Total | 254 | 246 | 500

Now, as it happens, the real data was somewhat different. It was as follows:


Type of Residence | Favor | Oppose | Total
Urban | 143 | 57 | 200
Suburban | 98 | 102 | 200
Rural | 13 | 87 | 100
Total | 254 | 246 | 500

Here is how we test this:

Step 1:

H0: The proportion of voters favoring the proposed legislation is the same in all 3

groups. (This is the presumption of “independence.”)

H1: The proportion of voters favoring the proposed legislation is not the same in

all three groups.

Step 2:

This is a test of independence. The test statistic to use is $\chi^2 = \sum \frac{(O - E)^2}{E}$.

O is the observed frequency (pulled out of the data table) and E is the expected

frequency (pulled out of the table we created using the totals from each type of

residence.)

It is a one-tailed test because χ² tests of this type always are. If the observed and expected frequencies are close, then χ² will be close to 0 (i.e., the opinion on the legislation is independent of where the person lives.) If χ² is large, then there is some dependence between opinion and residency.

Step 3:

Use the table in your text to select a critical value for χ². We will use α = .05. The degrees of freedom are df = (r−1)(c−1), where r is the number of rows in your table and c is the number of columns. Here, df = (3−1)(2−1) = 2. The critical value here is χ² = 5.99.

Step 4:

Do the number crunching. You can try it longhand (once will probably satisfy you) and then notice that with the TI-83, after entering the matrix of observations in [A], you can select the χ² test from the TESTS part of the STAT menu. The expected matrix will be calculated automatically and stored wherever you specify.

I got the χ² value to be 91.72, so we are definitely in the critical region.

Step 5:

Make your decision and state the conclusion. We reject H0. The three groups of

voters do not all have the same proportions favoring the proposed legislation. (Notice,

we can decide that there is dependence, but not what caused it. That’s why we study

social science as well as mathematics.)

For any old contingency table, you can always construct the table of expected values by using the row and column totals off of the data table. The expected frequency of the ith row and jth column position in the table is given by:

$E_{i,j} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} = \frac{R_i \times C_j}{n}$

(EXCEL can do all of these calculations for you also!)
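As one more way to do the number crunching, here is a Python sketch that builds the expected table from the row and column totals and then adds up (O − E)²/E over all six cells for the governor's proposal data. Variable names are made up for this illustration. The closed-form critical value and p-value at the end rely on a fact that holds only when df = 2 (the right-tail area of χ² is e^(−x/2)); for other degrees of freedom use the table in your text.

```python
import math

observed = [
    [143, 57],   # Urban: favor, oppose
    [98, 102],   # Suburban: favor, oppose
    [13, 87],    # Rural: favor, oppose
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total x column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 2 only: P(chi-square > x) = exp(-x / 2).
alpha = 0.05
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi_square / 2)
reject_h0 = chi_square > critical_value

print(expected)
print(round(chi_square, 2), df, round(critical_value, 2), reject_h0)
```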




Hypothesis Testing

Problem 20: A certain type of children's pain reliever states that it contains 325 mg of acetaminophen in each ounce of the drug. Seventy one-ounce samples are tested for acetaminophen, and the sample has a mean of 319 mg and a standard deviation of 26 mg. With α = .01, test the claim that the population mean is equal to 325 mg.

Problem 21: A test is designed to determine if the right hand of right-handed people is stronger than their left hand. Nine right-handed adults were selected, and the strength of each hand was tested. The hand strengths in pounds for each person are given below.

Problem 22: A school wants to compare the ACT scores of those students who complete a college core of courses (4 years of English, and at least 3 years of Math, Social Science, and Natural Science) to the scores of those who do not complete this core. The results of a recent administration of the test follow.


Determine whether completing the core courses produces a higher score on the sub-tests.

Problem 23: Cuckoos lay their eggs in the nests of other birds. Some biologists speculate that the size of the cuckoo's eggs might be different depending on whether the eggs are laid in warbler's nests or wren's nests. To check this, biologists searched a wildlife refuge for warbler's and wren's nests. Summary statistics for the lengths (in mm) of cuckoo's eggs found in these nests are shown below.

Does this data support the biologists' claim that the size of the eggs differs depending on whether they are laid in warbler's nests or wren's nests? (Advanced Placement Course Description, Preliminary Edition; The College Board, 1995)

Problem 24: It is believed that at least 70% of the homes in a city have smoke detectors installed. Would you agree with this claim if a random survey of homes in the city shows that 153 of the 240 homes surveyed have working smoke detectors installed?
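A possible setup for Problem 24 is sketched below as a left-tailed, one-proportion z test of the claim p ≥ .70. The problem does not give a significance level, so α = .05 here is my assumption, as are the variable names.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.70      # claimed proportion of homes with smoke detectors
x = 153        # homes in the sample with working detectors
n = 240        # homes surveyed
alpha = 0.05   # ASSUMED significance level (not stated in the problem)

p_hat = x / n  # sample proportion

# Standard error uses the claimed proportion p0, since H0 is assumed true.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Left-tailed test: small sample proportions count against the claim.
p_value = NormalDist().cdf(z)
reject_h0 = p_value < alpha

print(round(p_hat, 4), round(z, 3), reject_h0)
```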

Problem 25: A vote is to be taken among the residents of a state to determine whether casino gambling should be legalized. Many voters in the out-state areas feel it will pass because of the large proportion of city and suburban voters who favor the amendment. To determine if there is a significant difference in the proportion of metropolitan voters and out-state voters favoring the proposal, a poll is taken. If 123 of 210 town voters favor the legalization, and 244 of 515 out-state residents favor it, would you agree that the proportion of metropolitan voters favoring the proposal is higher than the proportion of out-state voters? Do you think the proposal will pass?
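The first question in Problem 25 can be set up as a right-tailed, two-proportion z test using the pooled proportion. The sketch below assumes the 210 "town" voters are the metropolitan sample; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 123, 210   # metropolitan (town) voters in favor, sample size
x2, n2 = 244, 515   # out-state voters in favor, sample size

p1_hat = x1 / n1
p2_hat = x2 / n2

# Pooled proportion: combine both samples, since H0 says p1 = p2.
p_pooled = (x1 + x2) / (n1 + n2)
q_pooled = 1 - p_pooled

# Test statistic for H0: p1 = p2 (so the hypothesized difference is 0).
z = (p1_hat - p2_hat) / sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))

# Right-tailed p-value: is the metropolitan proportion higher?
p_value = 1 - NormalDist().cdf(z)

print(round(p1_hat, 4), round(p2_hat, 4), round(z, 3))
```

Whether the proposal will pass is a separate question: it depends on the overall proportion in favor and on how many voters live in each area, not just on the difference between the two groups.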

Problem 26: A national pizza chain recently test marketed a new type of pizza in a large metropolitan area. It is important for the company to evaluate the product's performance during this time. This was done in part by sampling consumers and assessing their exposure to the product. The company selected random samples of consumers from different age groups and obtained the following results.

Do these data indicate that market penetration is independent of age?

Problem 27: The following data is obtained from a random sample of absences from a company. At the .05 level of significance, test the claim that the absences occur with equal frequency on each of the 5 days.

χ2cdf computes the cumulative probability (the area under the χ2 curve) between two bounds. The syntax is χ2cdf(lowerbound, upperbound, degrees of freedom).
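To see what the calculator is computing, here is a small Python sketch that mimics the same (lowerbound, upperbound, degrees of freedom) syntax. It is NOT the calculator's algorithm: it uses a closed-form formula that only works when the degrees of freedom are even, which happens to cover Problem 27 (5 days gives df = 4). The function names are my own.

```python
import math


def chi2_upper_tail(x, df):
    """Area to the right of x under the chi-square curve (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    if x <= 0:
        return 1.0
    if math.isinf(x):
        return 0.0
    half = x / 2
    term = 1.0    # current term (x/2)^k / k!, starting at k = 0
    total = 1.0   # running sum of the terms
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def chi2cdf(lower, upper, df):
    """Area under the chi-square curve between lower and upper."""
    return chi2_upper_tail(lower, df) - chi2_upper_tail(upper, df)


# Area to the right of the df = 4 critical value 9.488 should be about .05.
right_tail = chi2cdf(9.488, math.inf, 4)

# Area to the left of 5.99 with df = 2 should be about .95.
left_area = chi2cdf(0, 5.99, 2)

print(round(right_tail, 3), round(left_area, 3))
```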


Problem 28: Does your life expectancy depend on where you live? The following table gives the state-by-state list of average life expectancy at birth as compiled by the National Center for Health Statistics. Is there a significant difference in a person's average life expectancy based on the region of the country in which they live?

Analysis of Variance provides the methods for comparing the means of more than two populations. In this case we use a one-way ANOVA because we are comparing the means of populations that are classified in one way, by region of the country. The procedure for comparing the means involves analyzing the variation between the sample means relative to the variation within the samples. If the variation between the sample means is large relative to the variation within the samples, then we conclude that the means of the populations are not all equal.
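The comparison described above boils down to a single ratio, the F statistic. The sketch below shows one way to compute it. The three small samples are made up so the arithmetic is easy to follow by hand; they are NOT the life expectancy data, and the function name is my own.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation BETWEEN the sample means (weighted by sample size).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Variation WITHIN the samples (each value against its own sample mean).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy samples (illustration only).
toy_regions = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(toy_regions)

print(f_stat, df_between, df_within)
```

A large F means the sample means are spread out much more than the scatter inside each sample would explain, which is evidence that the population means are not all equal.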

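TWO populations: Test Statistics

Test of the mean of differences (Matched Pairs)

What is being tested? MEAN (dependent/paired data)

Null hypothesis: Ho: µd = 0

Statistic to use:

$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$

You find $\bar{d}$ and $s_d$ with your calculator.

Test of the difference of means

What is being tested? MEAN (independent samples)

Is σ known? Yes – any size n is ok, use a z test. No – use large n, use a t test.

Null hypothesis: Ho: µ1 = µ2

Statistic to use:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

df = smaller of n1 – 1 and n2 – 1

Test of the difference of means (Pooled Variances)

What is being tested? MEAN (independent samples) (Pooled Variances)

Is σ known? No – assume σ1 = σ2.

Null hypothesis: Ho: µ1 = µ2

Statistic to use:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \quad \text{where} \quad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Test of the difference of proportions

What is being tested? PROPORTION (independent samples)

Samples have to be large.

Null hypothesis: Ho: p1 = p2

Statistic to use:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}\,\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{where} \quad \hat{p}_1 = \frac{x_1}{n_1}, \quad \hat{p}_2 = \frac{x_2}{n_2}, \quad \text{and} \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

$\hat{p}$ is the “pooled” proportion, and $\hat{q} = 1 - \hat{p}$.

Test of Equality of Variances

What is being tested? VARIANCES

Samples have to be normal.

Null hypothesis: Ho: σ1 = σ2

Statistic to use:

$$F = \frac{s_1^2}{s_2^2}$$

(sample #1 has larger variance)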
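The formulas for the tests of means and variances translate directly into short Python functions. The sketch below is one possible way to write them; the function names are my own, the numbers fed in at the bottom are made up for illustration, and the df = n1 + n2 − 2 for the pooled test is the standard textbook value rather than something stated in the summary above.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(differences, mu_d=0.0):
    """Matched pairs: t = (d-bar - mu_d) / (s_d / sqrt(n))."""
    n = len(differences)
    return (mean(differences) - mu_d) / (stdev(differences) / sqrt(n))


def two_sample_t(x1, s1, n1, x2, s2, n2, mu_diff=0.0):
    """Independent samples, variances not pooled."""
    t = ((x1 - x2) - mu_diff) / sqrt(s1**2 / n1 + s2**2 / n2)
    df = min(n1 - 1, n2 - 1)   # smaller of n1 - 1 and n2 - 1
    return t, df


def pooled_t(x1, s1, n1, x2, s2, n2, mu_diff=0.0):
    """Independent samples, assuming sigma1 = sigma2."""
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t = ((x1 - x2) - mu_diff) / (sp * sqrt(1 / n1 + 1 / n2))
    df = n1 + n2 - 2           # standard df for the pooled test (assumption)
    return t, df


def f_ratio(s1, s2):
    """F = larger variance / smaller variance."""
    larger, smaller = max(s1, s2), min(s1, s2)
    return larger**2 / smaller**2


# Hypothetical inputs (illustration only).
t_paired = paired_t([1, 2, 3])
t_unpooled, df_unpooled = two_sample_t(10, 2, 4, 8, 2, 4)
t_pooled, df_pooled = pooled_t(10, 2, 4, 8, 2, 4)
f_stat = f_ratio(2, 3)

print(t_paired, t_unpooled, df_unpooled, t_pooled, df_pooled, f_stat)
```

When the two sample sizes and standard deviations are equal, as in the made-up inputs here, the pooled and unpooled t values agree; only the degrees of freedom differ.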
