Statistics Course Materials
Math 12
Cabrillo College
Marcella Laddon
Activity #1: How a Typical Worker in the OPF Spends Her Day
A study was conducted to see how technicians spend their time on the job at Kennedy Space Center’s Orbiter Processing Facility. The table below gives the information gathered.
Technician Activity      Number of Times Observed
Setup                    55
Work                     215
Delay                    27
Cleanup                  31
Training/Meetings        39
Miscellaneous            23

Your job is to make a pie chart summarizing the above data. You must do this with a computer, using Excel or another spreadsheet to enter data and create a pie chart. Whichever method you use, make sure that your chart is clearly labeled and easy to read. When you have finished the pie chart, make a bar chart as well. Which one do you think is a better presentation of the information?
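Before charting, it can help to see what a pie chart actually encodes: each category's share of the total, drawn as an angle out of 360 degrees. The following is a minimal Python sketch of that arithmetic using the counts from the table above; the function name `pie_slices` is only an illustrative choice, and the spreadsheet does this calculation for you.

```python
# Observed counts from the Activity #1 table.
observations = {
    "Setup": 55,
    "Work": 215,
    "Delay": 27,
    "Cleanup": 31,
    "Training/Meetings": 39,
    "Miscellaneous": 23,
}


def pie_slices(counts):
    """Return {label: (percent of total, slice angle in degrees)}."""
    total = sum(counts.values())
    return {
        label: (100 * n / total, 360 * n / total)
        for label, n in counts.items()
    }


for label, (percent, angle) in pie_slices(observations).items():
    print(f"{label:<18} {percent:5.1f}%  {angle:6.1f} degrees")
```

A bar chart plots the raw counts instead of the shares, which is one reason the two charts can feel different to read.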
Homework #1: Graph It!
For each of the following situations, make a pie chart to describe the data visually. Use a computer to generate these charts. When you are finished, comment on the “clarity” of the charts. Are there things you could do to make them easier to read? If so, what? Use your software to also make a bar chart for problem 2. How does it compare to the pie chart?

1. Teachers at a college were observed at different times to see how they spent their day at work.¹ The following results were obtained from this study:

Teacher Activity/Role    Number of Observations
Housekeeping             7
Lecturing                41
Student Interaction      20
Facilitator              25
Off Task                 9
Testing                  15
Miscellaneous            13

2. Between 1959 and 1992, 14 groups of astronauts were selected to participate in different NASA projects (like the Mercury, Gemini and Apollo projects).² Here is the data about the numbers of astronauts that were selected in Groups #1-14:

Group Number             1   2   3   4   5   6   7   8   9   10  11  12  13  14
Number of Astronauts     7   9   14  6   19  11  7   35  19  17  13  15  23  19
1. If you are interested in an actual work sampling study of teachers, one example is: “Utilization of Professional Manpower in the Teaching Profession” by Paul E. Christensen, Ph.D. Thesis, Wayne University, Detroit, MI, 1955. Mr. Christensen surveyed elementary schoolteachers in Royal Oak, Michigan.
2. NASA, Astronaut Fact Book, Information Summaries, PMS-011E (JSC), 1995, p. 53
Excel Lab Instructions for Lab #1

If you are working in the CTC, first go to the main desk and get your account name and password. They claim they will have your name within 24 hours of when you register for the course. In both the MLC and the CTC, the machines have Microsoft Excel on them. Using the mouse, click on “start” at the bottom of the screen, select “programs”, then “Microsoft Excel.”

For Lab 1, I want you to make some pie charts and bar charts using Excel. Here is what you need to do:
1. Type in the data and the labels from the worksheet you have. (You will be doing “Activity #1” and “Homework #1.”) You may then highlight the area you typed the information in.
2. Use the “chart wizard” icon located in the upper right-hand portion of the screen. (It has red, yellow and blue bars on it.)
3. For the first step, select the type of chart you want. (Excel calls bar graphs “column charts.” These are the type of bar charts we want: with the bars rising vertically.)
4. To answer the second question, you need to know where you have typed in your data (if Excel has not already selected it for you.) If the choice is empty or incorrect, you need to type in the locations of your first piece of data and the last piece of data, separated by a colon. For example, if the data is in locations A1 through B6, I type “a1:b6.” Notice that the separator between the cell locations is a colon.
5. Continue following through the chart wizard instructions. To go on, press “Next”
or “Finish.” If you make an error, you can go “Back”! Try it and see what happens.
6. If you see a data legend that says “series 1,” click on it and press the “delete” key. Your graph does not need goofy labels put there by the program. You should make sure all of the labeling is clear, with correct spelling.
7. Before you print your chart (you will be paying for whatever printing you do in
the CTC), select “Print Preview” from the “File” menu at the top left of the top of the screen. This enables you to check that the chart you print is indeed the chart you desire. Be sure to verify that it will print on one page, and not on parts of two pages. Always do this so that we can save the paper wasted by printing incorrect charts. You can put the boxes with the charts on the same page with the data. If you want to print the chart all by itself on a full sheet of paper, click somewhere in the region of the chart until you see black dots around the chart. (Some versions of Excel show a fuzzy, denim like border around the chart.) It should then print on a page by itself. If you do a “Print Preview” while the black dots are around the chart, you will see what it looks like when printed on a page by itself. Feel free to use the “delete” key to omit any extraneous information or legends on your chart.
8. Make sure your name is on your chart and that it is clearly labeled. Double check that the chart you made is indeed an accurate representation of the data. For example, does the bar chart convey the information that you intend it to?
The Statistical Sampling Study

4-1 Consider the population provided in Table 4-1 for the latest operating incomes achieved by the owners of franchises from a famous doughnut chain. Using the following two-digit random numbers, select a sample of 10 incomes for further analysis. Starting with random numbers in the first row, select those franchises that have matching numbers. You may skip a number that has already been selected or that does not match a franchise number. The first 10 franchise incomes selected will constitute the sample. List them in the blank table that follows Table 4-1.

97 78 17 40 30 23 80 32 94 31 20 91 46 75 29 15 31 82 44 77

Table 4-1

Franchise            Franchise            Franchise            Franchise
Number   Income      Number   Income      Number   Income      Number   Income
01       $?1,8?4     14       $10,694     27       $19,574     39       $14,249
02       ?,914       15       17,801      28       7,987       40       21,347
03       16,026      16       5,993       29       15,949      41       9,980
04       5,964       17       13,961      30       5,763       42       17,936
05       11,971      18       3,843       31       13,427      43       7,?81
06       ??,921      19       11,513      32       21,068      44       15,339
07       ??,598      20       19,160      33       36,429      45       22,975
08       17,250      21       14,693      34       8,141       46       7,848
09       7,257       22       7,020       35       19,745      47       20,563
10       ?,833       23       13,942      36       8,632       48       6,121
11       9,651       24       13,754      37       11,884      49       9,089
12       6,519       25       9,512       38       2,337       50       7,773
13       ?7,2?6      26       12,472
(? = digit illegible in the source.)

Sample
Franchise No.    Income
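The selection rule in problem 4-1 (walk along the random numbers, keep those that match a franchise number, skip repeats and non-matches, stop at 10) can be sketched in Python as follows. This is only an illustration of the procedure; the function name `select_sample` is hypothetical, and the incomes still have to be looked up in Table 4-1 by hand.

```python
def select_sample(random_numbers, population_size, sample_size):
    """Keep the first `sample_size` random numbers that match a unit
    numbered 1..population_size and have not already been chosen."""
    chosen = []
    for number in random_numbers:
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


# Two-digit random numbers printed with problem 4-1.
random_numbers = [97, 78, 17, 40, 30, 23, 80, 32, 94, 31,
                  20, 91, 46, 75, 29, 15, 31, 82, 44, 77]

franchises = select_sample(random_numbers, population_size=50, sample_size=10)
print(franchises)
```

Numbers such as 97 and 78 are skipped because no franchise carries them, and the second 31 would be skipped as a repeat if the sample were not already complete by then.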
4-2 Figure 4-1 is a map of a hypothetical city, Dullsville. The homes have been arbitrarily placed into “census tracts,” each representing a contiguous neighborhood. Household identities, addresses, and family income data are provided in Table 4-2. (Figure 4-1 is on page 24.)
Table 4-2 (continued)

Household   Address   Income
East Court
57          3         $18,998
58          4         13,556
59          7         17,?56
60          8         14,665
61          11        19,545
62          12        16,997
63          15        15,305
64          16        15,55?
65          19        16,885
66          20        17,554
67          23        21,115
68          24        20,997
69          27        16,666
70          28        17,002
71          31        15,155
72          32        18,444
73          35        15,876
74          36        16,123
75          39        20,001
76          40        1?,888
Hillcrest
77          1         [illegible]
78          2         28,553
79          3         $42,735
80          4         60,600
81          5         38,887
82          6         71,775
83          8         31,119
84          10        40,000
85          12        56,337
West Court
86          1         12,223
87          5         10,678
88          9         14,556
89          13        13,665
90          17        15,997
91          21        14,555
92          25        16,554
93          26        22,115
94          29        19,997
95          30        17,666
96          33        16,002
97          34        ?6,155
98          37        17,444
99          38        16,876
(? = digit illegible in the source.)
(a) Matching the following random numbers with household identities, select a simple random sample of 10 family incomes.

34 01 17 73 93 05 54 89 42 29

Sample
Household    Income    Household    Income

Total

(b) Calculate the sample mean family income.

$\bar{x}$ = ________
[Figure 4-1: Map of Dullsville Showing Census Tracts. Labels legible in the source include Tracts D-01, D-04, D-05, and D-06, the School, the Shopping Center, Parking, Broadway, W. Boondocks Ln., and E. Boondocks Ln.]
4-3 (a) Select a systematic random sample of 10 Dullsville family incomes, picking every tenth household starting with the randomly chosen household 07.

Sample
Household    Income    Household    Income

Total

(b) Calculate the sample mean family income.

$\bar{x}$ = ________
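The household numbers picked by a systematic sample follow a simple pattern: a starting point plus a fixed step. A minimal Python sketch, assuming the household identities run from 01 to 99 as the end of Table 4-2 suggests (the function name `systematic_sample` is illustrative):

```python
def systematic_sample(start, step, population_size):
    """Return every `step`-th unit number, beginning at `start`."""
    return list(range(start, population_size + 1, step))


# Every tenth household, starting with household 07.
households = systematic_sample(start=7, step=10, population_size=99)
print(households)
```

Only the starting household is random here; once it is chosen, the rest of the sample is fixed.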
4-4 (a) A cluster random sample of Dullsville family incomes is to be taken. Treating each census tract as a cluster, use the first applicable random numbers in the following sequence to select two clusters by matching the appropriate census tract identity code.

1, 6, 8, 6, 3, 8, 7, 4

The clusters chosen are:

D-____    D-____

(b) Indicate the sample incomes by circling the appropriate entries in Table 4-2.

(c) Calculate the sample mean family income.

$\bar{x}$ = ________
4-5 (a) A stratified random sample that contains two family incomes from each census tract in Dullsville is to be taken, based on street address or apartment number. Starting with tract D-01 in Figure 4-1, use the following random number sequence, continuing with the next random number on the list until the last family income in D-06 is obtained. Be sure to skip those random numbers that do not apply to an address, and never back up to an earlier random number. Start in the upper left-hand corner and proceed horizontally, line by line.

56 04 57 55 85 34 01 17 73 93 05 54 89 42 29 57 69 43 10 77
97 78 17 40 30 23 80 32 94 31 20 91 46 75 29 15 31 82 44 77
32 42 11 09 14 6? 19 00 12 05 63 29 2? ?? 33 0? ?9 29 48 21
90 01 58 07 03 ?? ?8 25 ?7 16 4? ?4 61 ?2 55 86 88 02 39 44
57 56
(? = digit illegible in the source.)

The sample incomes are:

Address    Income

(b) Calculate the sample mean family income.

$\bar{x}$ = ________

4-6 (a) Suppose that a canvass has been made of Dullsville family incomes but that the census data are not yet known. As a convenience sample, only the incomes in the apartment house (108 Main Street) were included. No family member in unit 02 was home, and the tenant in 05 slammed the door on the canvasser. The remaining family incomes were obtained. List the sample outcomes.

Unit    Income

Total

(b) Calculate the sample mean family income.

$\bar{x}$ = ________
4-7 (a) Select a judgment sample of family incomes by choosing the households with the lowest identity numbers in each census tract. All tracts are to be represented by one family, except for the largest ones (D-01 and D-05), which are to be represented by three families from each tract.

Census Tract    Household    Income

(b) Calculate the sample mean family income.

$\bar{x}$ = ________
Measures of the Center: Mean, Median, and Mode
1. A list has 10 entries. Each entry is either a 1, a 2, or a 3.
a. What must the list be if the mean is 1?
b. What must the list be if the mean is 3?
c. Can the mean be 4?

2. Which of the following two lists has a larger mean? How could you determine this without doing any calculations?
a. 10, 7, 8, 3, 5, 9
b. 10, 7, 8, 3, 5, 9, 11

3. Ten people in a room have a mean height of 5 ft 6 in. (or 66 in.). An eleventh person, who is 6 ft 5 in. tall, enters the room. Find the mean height of all eleven people. Show your reasoning clearly.

4. An instructor gives a quiz with 3 questions. Each question is worth 1 point. 40% of the class scores 3 points, 30% of the class scores 2 points, 20% of the class scores 1 point, and 10% of the class scores 0.

Score on Quiz    Percent of Class with that Score
3                40
2                30
1                20
0                10

a. If there were 10 people in the class, what would the mean score be?
b. If there were 20 people in the class, what would the mean score be?
c. If you were not told the number of people in the class, what would the mean score be?
d. If there were 10 people in the class, what would the median be?
e. If there were 10 people in the class, what would the mode be?
f. If you were not told how many people were in the class, what would the median be?
g. If you were not told how many people were in the class, what would the mode be?
Variability: Standard Deviation
1. Below are rough sketches of the histograms for three lists of data. Match the sketch with the description. (Some descriptions will be left over.) Explain your reasoning.
DATA DESCRIPTIONS:
i) $\bar{x}$ = 3.5, s = 1
ii) $\bar{x}$ = 3.5, s = 0.5
iii) $\bar{x}$ = 3.5, s = 2
iv) $\bar{x}$ = 2.5, s = 1
v) $\bar{x}$ = 2.5, s = 0.5
vi) $\bar{x}$ = 4.5, s = 0.5
SKETCHES:
2. Make up a list of 10 numbers so that the standard deviation is as large as possible and:
a. Every number is either a 1 or a 5.
b. Every number is either a 1 or a 9.
c. Every number is either a 1, 5, or 9 and at least two of the numbers are 5.

3. For a list of positive numbers, can the standard deviation ever be larger than the mean? Why or why not?

4. In 2007, Governor Schwarzenegger proposed that all state employees be given a flat raise of $200 per month.
a. What would this do to the (mean) average monthly salary of state employees?
b. What would this do to the standard deviation of the salaries of state employees?
c. What would a 5% increase in salaries do to the mean monthly salary?
d. What would a 5% increase in salaries do to the standard deviation?

5. Can the standard deviation ever be negative?
Comparing the Mean Absolute Deviation (MAD) and the Standard Deviation

Here is the data: 13, 14, 24, 24, 25, 26

1. Find the mean, median and mode for this data.

2. Use the table to find $\mathrm{MAD} = \frac{\sum |x - \mu|}{n}$, the Mean Absolute Deviation of the data.

Value of x    Deviation x - µ    Absolute Deviation |x - µ|
13
14
24
24
25
26
Total

$\mathrm{MAD} = \frac{\sum |x - \mu|}{n}$ = _________

3. Now find the standard deviation of the data.

Value of x    Deviation x - µ    Squared Deviation (x - µ)²
13
14
24
24
25
26
Total

The population standard deviation is given by the formula:

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$ = __________
While I think that the MAD is a more “natural” way to measure deviation, we use the standard deviation in statistics. This is basically because it is easier mathematically to deal with squares and square roots than it is to deal with absolute values.
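To see the two measures side by side, here is a minimal Python sketch that follows the same table steps (deviation, then absolute or squared deviation, then average). The data list is hypothetical, chosen so the arithmetic comes out evenly; it is not the worksheet data, and the function names are illustrative.

```python
from math import sqrt


def mean_absolute_deviation(data):
    """MAD: the average distance of the values from their mean."""
    mu = sum(data) / len(data)
    return sum(abs(x - mu) for x in data) / len(data)


def population_sd(data):
    """Population SD: square root of the average squared deviation."""
    mu = sum(data) / len(data)
    return sqrt(sum((x - mu) ** 2 for x in data) / len(data))


# Hypothetical data with mean 5 (not the worksheet data).
values = [2, 4, 4, 4, 5, 5, 7, 9]

print("MAD:", mean_absolute_deviation(values))
print("SD: ", population_sd(values))
```

For this list the MAD is 1.5 and the standard deviation is 2.0. Squaring gives the values far from the mean (here 2 and 9) more weight than taking absolute values does.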
Top 10 Reasons Why We Use the Standard Deviation as Our Measure of Variation (by Ann E. Watkins, California State University Northridge)
10. Every value gets taken into account.
9. In important distributions, the formula for the SD turns out to be short and sweet.
8. It is easy to compute with squares.
7. The SD has a rather intimate relationship with the mean, so if the mean is the measure of center, the SD is the logical measure for spread.
6. The SD and the mean have a rather intimate relationship with the normal distribution.
5. The SD goes someplace mathematically. The MAD is reasonable, but it isn’t important.
4. Standard deviations add like Pythagoras.
3. The formula for the SD looks suspiciously like the Euclidean distance formula.
2. The SD is a distance!
1. It must be the official one to use since it is called the “standard” deviation.
0. The SD is on the calculator and the MAD is not.
Population Standard Deviation:

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$

Sample Standard Deviation:

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
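The only difference between the two formulas is the divisor: n for a population, n - 1 for a sample. Python's standard `statistics` module provides both as `pstdev` and `stdev`, so a short sketch can show the effect. The data list is hypothetical.

```python
import statistics

# Hypothetical data (not taken from the worksheets).
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(values)  # population SD: divides by n
s = statistics.stdev(values)       # sample SD: divides by n - 1

print(f"population SD = {sigma:.4f}")
print(f"sample SD     = {s:.4f}")
```

Because n - 1 is smaller than n, the sample standard deviation always comes out at least as large as the population value computed from the same numbers.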
Measures of Central Tendency, Variability, Grouped Data and Cumulative Frequencies
1. For the data: 16, 6, 10, 6, 16, 6, 15, 19, 23, 5, 6, find the:
a. Mean
b. Median
c. Mode
d. Midrange
e. Range
f. MAD
g. Sample Standard Deviation
2. Here is a frequency distribution:
Score Classes Frequency (# of people) Cumulative Frequency
10 – 24 15
25 – 39 20
40 – 54 27
55 – 69 13
70 - 84 15
Total 90
a. Draw a frequency histogram of the data, showing the class boundaries as the edges of your bars. You may also want to show the class marks.
b. Draw a cumulative frequency ogive of the data.
c. Find the mean of this grouped data. (You’ll need the class marks or midpoints.)
d. Find the sample standard deviation of this grouped data.
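For grouped data, the usual approach is to treat every value in a class as if it sat at the class mark (midpoint) and weight each mark by its frequency. The following Python sketch shows that approach on a small hypothetical frequency distribution, not the one in problem 2; the function name is illustrative.

```python
from math import sqrt

# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(0, 9, 2), (10, 19, 5), (20, 29, 3)]


def grouped_mean_and_sd(classes):
    """Estimate the mean and sample SD from class marks and frequencies."""
    n = sum(freq for _, _, freq in classes)
    marks = [((low + high) / 2, freq) for low, high, freq in classes]
    mean = sum(mark * freq for mark, freq in marks) / n
    variance = sum(freq * (mark - mean) ** 2 for mark, freq in marks) / (n - 1)
    return mean, sqrt(variance)


mean, sd = grouped_mean_and_sd(classes)
print(f"mean = {mean:.1f}, sample SD = {sd:.2f}")
```

The results are estimates, since the individual values inside each class are not known.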
Criminal Data Lab

The purpose of this lab is to show you how a spreadsheet program can help you process large amounts of data.

Preliminaries: You can do this lab at home or in the labs on campus. You will need Microsoft Excel and Word. I am writing for, and using, a PC, but it should work similarly on a Mac.

Stress: If you are not comfortable with the computer, expect a certain level of frustration, though I have tried to eliminate as much of that as possible. The focus is on processing and interpreting data. If you are stressing and things aren’t proceeding smoothly, stop what you are doing and come talk to me. It will help if you do not leave this assignment until the last minute. Quite often the problems you may have are small things that either you or I have overlooked.

What it is: In this lab we are going to use some data that was collected about 100 years ago on criminals. The file consists of two columns: one gives the heights (in inches) of 3000 male criminals and the other gives the lengths of their left middle fingers (in centimeters). The data is stored in a file online at www.cabrillo.edu/~mladdon/Criminals.xls. We will be looking at some summary statistics as well as some graphs. We will do our statistical work using Excel and then copy the important information into a Word document where we can add our answers to the questions.

Open up the data file using Excel: To open the file “Criminals.xls,” go to the above link, click on it, and save it to your computer. You can do this by clicking on File, scrolling to “Save As…” and selecting a place on your computer to save the file. Then open up Excel (Start>Programs>Microsoft Excel) and under “File” select “Open” and select the Criminals.xls file. (If you are using Internet Explorer, be sure to save the file on your computer first, before opening up Excel. Do not just click on it to open it up. If you do, then you will be working with it on the internet in an html file. You want to be working on it, and saving it, on your local computer.)

Open up Word: To do this, click Start>Programs>Microsoft Word. In Word, start a new document (File>New>Blank Document). At the bottom of your computer screen there will be buttons that allow you to easily go between the two programs. At this point there should be two programs open. You can go from one program to the other by clicking on the desired button at the bottom of the screen.
Create a Histogram and Summary Statistics for Heights:
• Make sure you are in Excel. Have the Criminal Data file open.
• Double click on Tools, at the top of the screen. At or near the bottom of the tool list may be “Data Analysis.” If it is, skip the next few steps until you get to the Tools>Data Analysis step.
• Click on Tools>Add Ins. (It may take a little while.)
• Select Analysis ToolPak and Analysis ToolPak-VBA. (Click on the box to the left of them so that there is a check in each box.)
• Click OK. (You may need the disks you used to install Excel if you are doing this at home.)
• Click on Tools>Data Analysis.
• Click on Descriptive Statistics.
• Click OK.
• To select the input range, simply click on the A above the column where the height data is. (Make sure the cursor is in the input range box.)
• Since our data includes a heading on the column, be sure the “labels in first row” box is checked.
• Select the “Summary Statistics” box.
• Click OK.
• In the column headings, move your mouse to put the cursor between the column A and column B headings. The pointer should now be a vertical line with two arrows pointing to the left and right.
• Double click. This will widen column A so that you can read all the headings.
• The summary statistics should still be highlighted (dark) at this point. If they are not, click on the upper left hand block of the statistics and drag the mouse to the lower right hand corner of the block.
• Click Edit>Copy.
• Now go to your Word document.
• Click Edit>Paste.
You have now pasted the table of the summary statistics of the heights into your Word document. It is a good idea to name this document and save it. Keep doing this as you go, so that when you are ready to stop, you won’t lose all of your work so far. You will answer the questions posed in this lab in the Word document. On your summary statistics table, the minimum value should be 56 inches and the count, n, should be 3000. The original data did not go away. When you need to go back to it, select the Microsoft Excel box and then click on the “Criminal Data” tab at the bottom of the screen. In the Word document, you will answer the numbered questions.
1. What is the maximum height?
We now will use Excel to make a histogram (bar chart) of the data. We need to set up the “classes” for Excel to use. They are called “bins.”
• Go to Excel.
• Click on the Criminal Data tab to get back to the data.
• In an empty cell, say D1, type “height (inches).”
• In D2, enter the minimum value (56).
• In D3, enter “57.”
• Select D2 and D3.
• Move the cursor to the lower right hand corner of D3. The cursor should turn into a “+.”
• Click and drag down until the number in the box is the same as the maximum value.
Now you have the “bins” that Excel will use for the histogram
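For readers curious about what the histogram tool does with these bins, the idea can be sketched in a few lines of Python: make one bin for every whole number from the minimum to the maximum, then count how many data values land in each. The heights below are hypothetical stand-ins, not values from Criminals.xls, and the function name is illustrative.

```python
from collections import Counter

# Hypothetical heights in whole inches (not the real lab data).
heights = [64, 66, 65, 67, 66, 65, 66, 68, 64, 66]


def frequency_table(values):
    """Return (bin, count) pairs for every whole number from min to max."""
    counts = Counter(values)
    return [(b, counts[b]) for b in range(min(values), max(values) + 1)]


for height_bin, count in frequency_table(heights):
    print(height_bin, "#" * count)
```

Because the data and the bins are both whole numbers one unit apart, each bin simply collects the values equal to it.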
• Click on Tools>Data Analysis>Histogram>OK.
• For the input range, select column A by clicking on the “A”.
• Click on the bin box.
• Select D1 through to the maximum value (D something).
• Be sure the labels box is checked.
• Be sure the Chart Output box is checked.
• Click OK.
• In the graph, click on the word “frequency” on the right hand side and press “delete.” (This gets rid of the label “frequency.”)
• Click on the word “Histogram.”
• Now select, or double click on, “Histogram” and type in an appropriate title for your chart. When you are done, click somewhere away from the title to deselect it.
• Do the same for the word “frequency” on the left side of the chart. Give it a correct title. (That is, what are the units being shown in the bars?)

We will now copy this histogram and paste it into Word:
• Click in the box containing the chart, but not directly on the bars or titles, to select the chart. (You may see little black boxes at the edges.)
• Click Edit>Copy.
• Go to your Word document.
• Click Edit>Paste.

You can resize the histogram in your Word document by:

• Clicking on the graph to select it.
• Clicking on one of the little squares on the perimeter of the box and dragging the mouse to resize the chart. The histogram should not take up your whole page, nor should it be as small as a postage stamp. Pick a medium size to look good in your document.
2. Describe the shape of the data. This is called the shape of the “distribution.”

Now we will do some calculations with the summary statistics. Go to the Excel sheet that has these. (It is probably called “sheet 1” in your workbook. You can rename it by clicking on the tab at the bottom of the page, selecting the “sheet 1” title, and typing “Summary Stats” as your title for the sheet.)

3. Calculate $\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 1.5s$ and $\bar{x} \pm 3s$ to the nearest tenth of an inch. ($\bar{x}$ is the mean and s is the standard deviation of our data.) Put these intervals in your Word document.

4. What percent of the 3000 heights do you expect in the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, and $\bar{x} \pm 3s$? Answer this using Chebyshev’s Rule, and then answer this using the Empirical Rule.
We will now determine the actual percent of the data falling in the intervals above. Since the data is in discrete classes, we will estimate the percent that is in each interval. We will assume that a measurement of 65 really means that the height is between 64.5 and 65.5 inches, etc. To estimate the number of heights in your intervals from question 3, we will “interpolate” to determine what percent of each class is in the interval.

For example, if one of the intervals were (62.3, 68.6), we would want a certain percentage of the “62” inch heights, all of the “63” inch heights, and so on until the “68” class, and lastly, a percentage of the “69” inch heights. To find this, we would add up (62.5-62.3)*(number of people in the “62” class) + (number of people in the “63” class) + (number of people in the “64” class) + (number of people in the “65” class) + (number of people in the “66” class) + (number of people in the “67” class) + (number of people in the “68” class) + (68.6-68.5)*(number of people in the “69” class).

Notice, I am finding the number of people in each class from my Excel sheet where the histogram is. Excel shows the “bins” and the “frequencies”, or number of people whose heights are in each bin. It is only for the first and last classes in my interval that I need to find a percentage.

5. Determine the percentage of heights that fall in each of the intervals you found in question 3. Please show your calculations clearly.
6. How do these percentages compare with what you expected? Which of the two rules seems most appropriate?
7. Based upon the last 2 questions, make a conclusion about the percent of data that falls within 1.5 standard deviations of the mean. (i.e.: what would you “expect” to find?)
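The interpolation used for question 5 can be sketched in Python. Each recorded height c is treated as covering c - 0.5 to c + 0.5, and a class contributes in proportion to how much of that one-inch span lies inside the interval. The frequencies below are hypothetical, not the lab's actual counts, and the function name `estimated_count` is illustrative.

```python
def estimated_count(frequencies, low, high):
    """Estimate how many measurements fall in (low, high), assuming each
    recorded value c is spread evenly over c - 0.5 to c + 0.5."""
    total = 0.0
    for value, count in frequencies.items():
        # Length of the overlap between the class and the interval.
        overlap = min(high, value + 0.5) - max(low, value - 0.5)
        if overlap > 0:
            total += overlap * count
    return total


# Hypothetical class frequencies: {height in inches: number of people}.
frequencies = {61: 5, 62: 10, 63: 20, 64: 30, 65: 40,
               66: 30, 67: 20, 68: 10, 69: 10, 70: 5}

count = estimated_count(frequencies, 62.3, 68.6)
percent = 100 * count / sum(frequencies.values())
print(f"about {count:.1f} of {sum(frequencies.values())} heights ({percent:.1f}%)")
```

With these made-up counts, the “62” class contributes 0.2 of its 10 people, the “63” through “68” classes contribute everyone, and the “69” class contributes 0.1 of its 10 people, for about 153 heights in all.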
Now we will repeat this with the left middle finger data. We need to determine the minimum and maximum finger length.
• Go to Excel.
• Click on the Criminal Data tab.
• Click on Tools>Data Analysis.
• Click on Descriptive Statistics.
• Click OK.
• To select the input range, click on the B above column B. (Make sure the cursor is in the input range box.)
• Check the “labels in first row” box.
• Check the “Summary Statistics” box.
• Click OK.
8. What is the maximum length for middle fingers?
• Go back to Excel.
• Click on the Criminal Data tab.
• In an empty cell, say E1, type “length (cm).”
• In cell E2, enter the minimum value, 9.5.
• In cell E3, enter 9.6.
• Select E2 and E3.
• Move the cursor to the lower right hand corner of E3. The cursor should turn into a “+.”
• Click and drag down until the number in the box is the same as the maximum value.
• Click on Tools>Data Analysis>Histogram>OK.
• For the input range, select column B by clicking on the B.
• Highlight anything in the “bin range” box and delete it.
• Select E1 through the maximum value.
• Check the label and chart output boxes.
• Click OK.
• Change the labels on the graph like you did with the height histogram.
• Copy and paste the graph into your Word document.
9. Calculate $\bar{x} \pm 1.5s$ to the nearest hundredth of a cm. Determine the percentage of lengths that fall into this interval.
10. The shape of the histogram is up and down in several places. Why might this be? What could have been done differently to avoid this?
11. How does the calculated percentage compare with your predicted value?
You are now done. You can save your Excel file and the Word document to a disk. You will turn in your Word document with the histograms and answers to the questions all typed up to me.
Box and Whisker Plots
Uses:
• To study the distribution of numerical data, to see how spread out or bunched up it is.
• To compare different sets of numerical data.
To Draw the Box Plot (Find the 5 number summary):
• Find the minimum value of the data (Min)
• Find the maximum value of the data (Max)
• Find the median, the middle value of the data (Med)
• Find the lower (first) and upper (third) quartiles (Q1 and Q3)
The lower quartile, Q1, is the middle value of the numbers that are less than the
median and the upper quartile, Q3, is the middle value of the numbers that are
greater than the median.
Make a number line, in scale that works for the values of your data. Draw vertical line segments above the values of Q1, Med, and Q3. Make the box around these values. Extend lines (“whiskers”) from the left and right sides of the box to the minimum and maximum values.
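The 5 number summary can also be computed with a short Python sketch. It splits the sorted data by position: the lower half is everything before the median position and the upper half is everything after it, which matches the quartile definition above when no values are tied with the median. The data list is hypothetical and the function name is illustrative.

```python
from statistics import median


def five_number_summary(data):
    """Return (Min, Q1, Med, Q3, Max), taking Q1 and Q3 as the medians
    of the lower and upper halves of the sorted data."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


# Hypothetical data set (needs at least two values).
data = [3, 7, 8, 5, 12, 14, 21, 13, 18]
print(five_number_summary(data))
```

Spreadsheets and calculators sometimes use slightly different quartile rules, so their Q1 and Q3 may not match this sketch exactly.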
Who Was the Greatest Yankee Home Run Hitter? (from: Exploring Data by James M. Landwehr and Ann E. Watkins, prepared for the American Statistical Association’s Quantitative Literacy Project)
Here are four of the greatest New York Yankee home run hitters with a list of the number of home runs each hit while a Yankee.
Babe Ruth            Lou Gehrig           Mickey Mantle        Roger Maris
Year  # Home Runs    Year  # Home Runs    Year  # Home Runs    Year  # Home Runs
1920  54             1923  1              1951  13             1960  39
1921  59             1924  0              1952  23             1961  61
1922  35             1925  20             1953  21             1962  33
1923  41             1926  16             1954  27             1963  23
1924  46             1927  47             1955  37             1964  26
1925  25             1928  27             1956  52             1965  8
1926  47             1929  35             1957  34             1966  13
1927  60             1930  41             1958  46
1928  54             1931  46             1959  31
1929  46             1932  34             1960  40
1930  49             1933  32             1961  54
1931  46             1934  49             1962  30
1932  41             1935  30             1963  15
1933  34             1936  49             1964  35
1934  22             1937  37             1965  19
                     1938  29             1966  23
                     1939  0              1967  22
                                          1968  18

Source: The Baseball Encyclopedia, 4th ed., Joseph L. Reichler, ed., 1979
1. Using ONE numerical scale (or number line), make 4 box and whisker plots on the same page. Be sure to include the 5 number summary that you got for each player.

Compare the box plots and:

2. Decide who was the best player. What is your reasoning?
3. Rank the 4 players from best to worst. What are the reasons for your rankings?
Probability: Experiment versus Theory
Throw one die 36 times and record your results on this table. We will fill in the theory column together, as a class.
Number showing on the die Experiment: # of times it occurred
Theory: Percentage expected
1
2
3
4
5
6
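If you want to repeat the one-die experiment many times without rolling by hand, a computer can simulate it. Below is a minimal Python sketch using the standard `random` module; the seed value is an arbitrary choice that only makes the run repeatable, and the function name is illustrative.

```python
import random
from collections import Counter


def roll_die(times, seed=None):
    """Simulate rolling one fair die `times` times and tally the faces."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) for _ in range(times))


counts = roll_die(36, seed=12)
for face in range(1, 7):
    print(f"{face}: occurred {counts[face]} times")
```

Running it with different seeds shows how much the experimental counts can wander from one set of 36 throws to the next.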
Throw two dice 36 times and record the sums that you get:
Sum of the 2 dice Experiment: # of times that sum occurred
Theory: Percentage expected
2
3
4
5
6
7
8
9
10
11
12
Throw two dice 36 times and record the differences you get:
Difference of the 2 dice (large # - small #)
Experiment: # of times that difference occurred
Theory: Percentage expected
0
1
2
3
4
5
Two Dice Sums – Probabilities of Combined Events

1. Write the theoretical probabilities of the following sums:
a) P(sum of 2)
b) P(sum of 3)
c) P(sum of 4)
d) P(sum of 5)
e) P(sum of 6)
f) P(sum of 7)
g) P(sum of 8)
h) P(sum of 9)
i) P(sum of 10)
j) P(sum of 11)
k) P(sum of 12)
l) P(sum of 13)

2. Find these probabilities:
a) P(sum of 7 or 11)
b) P(sum of 3 or 4)
c) P(sum is an even number)
d) P(sum is an odd number)
e) P(a sum of 3, 5 or 9)
f) P(a sum larger than 5)
g) P(a sum smaller than 5)
h) P(a sum less than 12)
i) P(a sum smaller than 5 OR larger than 9)
j) P(a sum smaller than 4 OR larger than 8)
k) P(a sum larger than 2 AND smaller than 5)
l) P(one die shows a 5 and the other die has 3 or less)
3. The sample space for “roll one die” had 6 outcomes. The sample space for “roll two dice” had 36 outcomes. How many outcomes will there be for the experiment “roll three dice?”

4. Write out a suitable sample space for the experiment “roll a die and toss a coin.”

Find the probabilities:
a) P(6 and heads) = P(6H)
b) P(rolling a 5)
c) P(heads and any number)
d) P(even number and tails)

5. Write out a suitable sample space for the experiment “roll a die, toss a coin, and pick one of 3 cards.” (You have the ace, king and queen of clubs.)

Find the probabilities:
a) P(a 1, a head and an ace)
b) P(a 3 and tails)
c) P(tails and a queen)
d) P(a 4 AND a king)
e) P(anything but a 4 on the die)
f) P(an even #, heads and king)
g) P(an odd # and an ace)
h) P(an odd # on the die and a king or queen)
i) P(a 4 OR a king)
Counting Recap
Here is a summary of what we discussed in class:
Multiplication Principle
e.g. How many possible outcomes do we have when we toss a coin, toss a die and pick one of 3 cards? 2·6·3 = 36 possible outcomes
Factorials
n! = n(n-1)(n-2)…3·2·1
e.g. 6! = 6·5·4·3·2·1 = 720, 4! = 4·3·2·1 = 24
Permutations
How can I use n objects to fill r spaces? Order is important, no repeats.
$P(n,r) = {}_nP_r = \frac{n!}{(n-r)!}$
e.g. How can I use 7 books to fill 4 empty spaces on the shelf? 7P4 = 840
Combinations
How can I pick n objects r at a time? Order is not important, no repeats.
$C(n,r) = {}_nC_r = \binom{n}{r} = \frac{n!}{(n-r)!\,r!}$
e.g. How many ways can you pick a committee of 3 from a group of 12 people? 12C3 = 220
The 4 Ways to Count:
1. License Plates: Order is important, repeats are allowed
2. Teams: Order is important, repeats are not allowed
3. Committees: Order is not important, repeats are not allowed
4. Weird Stuff: Order is not important, repeats are allowed
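The worked examples in the recap can be checked with a few lines of Python. This is a minimal sketch, not part of the original handout: the helper name `four_ways` is made up for illustration, and the "weird stuff" count uses the standard combinations-with-repetition formula C(n + r − 1, r), which the recap itself does not state.

```python
import math

# Worked examples from the recap
outcomes = 2 * 6 * 3                # multiplication principle: coin, die, card
six_factorial = math.factorial(6)   # 6! = 720
books_on_shelf = math.perm(7, 4)    # 7P4 = 7! / (7 - 4)!
committees = math.comb(12, 3)       # 12C3 = 12! / ((12 - 3)! 3!)


def four_ways(n, r):
    """Count selections of r items from n under each order/repeat rule."""
    return {
        "license plates": n ** r,                # order matters, repeats allowed
        "teams": math.perm(n, r),                # order matters, no repeats
        "committees": math.comb(n, r),           # order ignored, no repeats
        "weird stuff": math.comb(n + r - 1, r),  # order ignored, repeats allowed
    }


print(outcomes, six_factorial, books_on_shelf, committees)
print(four_ways(5, 2))
```

Calling `four_ways` with small values of n and r makes it easy to list the outcomes by hand and confirm each count.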
Some Counting Problems:
1. Calculate:
P(7, 4) = 7P4 = ________
C(7, 4) = 7C4 = ________
2. On a math test there are 10 multiple choice questions with 4 possible answers each, and 15 true-false questions. In how many possible ways can the 25 questions be answered?
3. The student affairs committee has 3 faculty, 2 administration members, and 5 students on it. In how many ways can a subcommittee of 1 faculty, 1 administrator and 2 students be formed?
4. Here is an ice-cream question:
The Laddon Ice-cream Shoppe has 31 flavors of ice-cream. You are going to buy a triple decker ice-cream cone. How many ways are there to arrange your 3 scoops of ice-cream if:
a) Each scoop has to be a different flavor and you do not care how they are arranged.
b) Each scoop has to be a different flavor and you do care how they are arranged.
c) Each scoop does not have to be a different flavor and you do care how they are arranged.
d) Each scoop does not have to be a different flavor and you do not care how they are arranged.
Conditional Probability
Consider this contingency table, which presents the results of an advertising survey about the use of credit by Martan Oil Company customers:
Number of Purchases at Gasoline Stations (Made Last Year)
Method of Payment | 0 - 4 | 5 - 9 | 10 - 14 | 15 - 19 | 20 and over
Cash | 150 | 100 | 25 | 0 | 0
Oil Company Card | 50 | 35 | 115 | 80 | 70
National or Bank Credit Card | 50 | 60 | 65 | 45 | 5
a) How many customers were surveyed?
b) Why is this bivariate data? What type of variable is each one?
c) What is the probability of preferring to use an oil-company credit card?
d) What is the probability of preferring cash AND making 10 - 14 purchases last year?
e) What is the probability of making between 5 and 9 purchases GIVEN that the person prefers to use a national or bank credit card?
f) What does the 70 in the fifth cell of the second row mean?
g) What is the probability of preferring cash OR making 10 – 14 purchases last year?
h) Using whatever category you like, try an example to see if you can show that the 2 variables are independent (or not independent.)
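The kinds of probabilities asked for above (marginal, AND, OR, GIVEN) can all be read off the table of counts. The sketch below is one way to organize that arithmetic; the function names are invented for illustration, and the independence check simply compares P(A and B) with P(A)·P(B) for one cell.

```python
# Counts from the Martan Oil contingency table
purchase_bins = ["0-4", "5-9", "10-14", "15-19", "20 and over"]
counts = {
    "Cash": [150, 100, 25, 0, 0],
    "Oil Company Card": [50, 35, 115, 80, 70],
    "National or Bank Credit Card": [50, 60, 65, 45, 5],
}
grand_total = sum(sum(row) for row in counts.values())


def p_method(method):
    """Marginal probability of a payment method (row total / grand total)."""
    return sum(counts[method]) / grand_total


def p_bin(label):
    """Marginal probability of a purchase range (column total / grand total)."""
    j = purchase_bins.index(label)
    return sum(row[j] for row in counts.values()) / grand_total


def p_and(method, label):
    """Joint probability: one cell divided by the grand total."""
    return counts[method][purchase_bins.index(label)] / grand_total


def p_or(method, label):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_method(method) + p_bin(label) - p_and(method, label)


def p_bin_given_method(label, method):
    """Conditional probability: one cell divided by its row total."""
    return counts[method][purchase_bins.index(label)] / sum(counts[method])


# Independence would require these two numbers to match for every cell.
print(p_and("Oil Company Card", "15-19"),
      p_method("Oil Company Card") * p_bin("15-19"))
```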
Conditional Probability and Independence of Variables
Marcella went to several restaurants. Here is a table describing the cost and quality of the food she had.
The Food was: | Good | Bad
Cheap | 50 | 40
Expensive | 25 | 5
Total | |
Find these probabilities:
a) P(the food was good)
b) P(the food was cheap)
c) P(the food was good and cheap)
d) P(the food was good given that it was cheap)
e) P(the food was expensive given that it was bad)
f) Based on this study, are cost and quality independent? Why or why not?
Probability Distributions
For each of the following, state whether or not it is a probability distribution. If it is not, tell the reason why. If it is, find its mean and standard deviation.
1.
X P(X)
1 0.2
2 0.3
3 0.3
4 0.1
µ = _________, σ = ____________ 2.
X P(X)
1 0.4
2 0.3
3 0.2
4 0.1
µ = _________, σ = ____________ 3.
X P(X)
1 0.5
2 -0.3
3 0.6
4 0.2
µ = _________, σ = ____________ 4
X P(X)
0 0.2
1 0.2
2 0.2
3 0.2
4 0.2
µ = _________, σ = ____________ 5.
X P(X)
1 1/8
2 3/8
3 3/8
4 1/8
µ = _________, σ = ____________
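The check described at the top of this worksheet (every P(X) between 0 and 1, probabilities summing to 1, then µ = Σ x·P(x) and σ = √Σ (x − µ)²·P(x)) can be sketched in Python as follows. The function name `describe` and the tolerance are assumptions made for this illustration; only table 5 is run here.

```python
import math


def describe(xs, ps, tol=1e-9):
    """Return (is_valid, mean, sd); mean and sd are None when invalid."""
    valid = all(0 <= p <= 1 for p in ps) and abs(sum(ps) - 1) < tol
    if not valid:
        return False, None, None
    mean = sum(x * p for x, p in zip(xs, ps))
    variance = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))
    return True, mean, math.sqrt(variance)


# Table 5 from the worksheet
valid, mu, sigma = describe([1, 2, 3, 4], [1 / 8, 3 / 8, 3 / 8, 1 / 8])
print(valid, mu, round(sigma, 4))
```

The same call with the other tables shows which ones fail the check, for example because a probability is negative or the column does not sum to 1.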
Binomial type problems.
1. A weird penny is tossed 8 times. The probability of heads on any one toss is equal to 0.3. Fill in the following table:
Number of Heads X | Use the formula (the long way!) P(X) | Use Binomial Table in the text P(X) | Use Binoml83 program on calculator P(X)
0
1
2
3
4
5
6
7
8
2. This time the weird penny is tossed 6 times. P(heads) = 0.9. Fill in the following table:
Number of Heads X | Use the formula (the long way!) P(X) | Use Binomial Table in the text P(X) | Use Binoml83 program on calculator P(X)
0
1
2
3
4
5
6
3. Last time. The penny is tossed 10 times. P(heads) = 0.5. Fill in the following table:
Number of Heads X | Use the formula (the long way!) P(X) | Use Binomial Table in the text P(X) | Use Binoml83 program on calculator P(X)
0
1
2
3
4
5
6
7
8
9
10
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
1 10 45 120 210 252 210 120 45 10 1
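The "long way" column in the tables above uses the binomial formula P(X = x) = C(n, x)·pˣ·(1 − p)ⁿ⁻ˣ, where the coefficients C(n, x) are the entries of row n of Pascal's triangle. The sketch below is a hedged illustration for the first table (n = 8, p = 0.3); the function name `binomial_pmf` is an assumption, not something from the text or the calculator program.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 8, 0.3
table = [binomial_pmf(x, n, p) for x in range(n + 1)]
for x, prob in enumerate(table):
    print(x, round(prob, 4))

# Row 10 of Pascal's triangle, for comparison with the printed triangle
print([comb(10, k) for k in range(11)])
```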
THE STANDARD NORMAL CURVE Worksheet
For each of the problems below, be sure to SKETCH the standard normal curve and SHADE IN the area you are being asked to find. See the following example:
Find the area under the standard normal curve that lies between z = -1.65 and z = 1.65
From the table in the text, or using the Normal83 program, the area = 2(.4505) = .901
The related probability statement is P(-1.65 < z < 1.65) = .901 or 90.1%
The shaded area is .901 or 90.1% of the entire area.
1. You find the area under the standard normal curve that lies between:
Sketch Area
a) z = 0 and z = 2
b) z = 0 and z = 3
c) z = −3 and z = 3
d) z = 0 and z = 1.70
e) z = −1.70 and z = 2
f) z = 1.70 and z = 2.70
2. Find the AREA under the standard normal curve:
SKETCH:
a) to the left of z = −0.40
b) for z < 0.40
c) to the right of z = 1.65
d) to the right of z = −1.65
e) outside the interval from z = −2.00 to z = 2.00
3. Solve for z in each of the following (I have shaded in the areas, you find the z value
that gives the correct area.):
a) Area = 80.64% z =
b) Area = 44.52% z =
c) Area = 17.11% z =
d) Area = 99.18% z =
e) Area = 91.15% z =
f) Combined area = 4.04% z =
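The worked example at the start of this worksheet (area between z = −1.65 and z = 1.65) can be reproduced in software as well as with the table or the Normal83 program. This is a minimal sketch using Python's standard library; the helper name `area_between` is made up here, and `inv_cdf` is shown only as the reverse lookup used in problem 3 (area to the left of z, which may not match how every sketch is shaded).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_between(a, b):
    """Area under the standard normal curve between z = a and z = b."""
    return Z.cdf(b) - Z.cdf(a)


example = area_between(-1.65, 1.65)
print(round(example, 3))

# Reverse lookup: the z value with 95% of the area to its left
z_95 = Z.inv_cdf(0.95)
print(round(z_95, 3))
```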
Statistics Using the TI-83 Graphics Calculator
The Normal Distribution By Mike Koehler Blue Valley North High School Overland Park, KS [email protected]
Problem 9: A company manufactures light bulbs that have a life expectancy that is normally distributed with a mean of 750 hours and a standard deviation of 40 hours. Find the probability that a bulb burns between 728 hours and 784 hours.
Problem 10: The heights of 6 year old girls are normally distributed with a mean of 46 inches and a standard deviation of 2.17 inches. Find the probability that a girl selected at random will have a height less than 44 inches.
Problem 11: In a letter to Ann Landers, a wife claimed to give birth to a baby 308 days after a visit from her husband who was in the Navy and stationed on board a ship. Pregnancies are normally distributed with a mean of 268 days and a standard deviation of 15 days. Does the wife have a problem? (Triola, Elementary Statistics, Addison Wesley, 1992)
Problem 12: On a SAT exam administered by the CEEB, the mean math score was 475 with a standard deviation of 130. If a scholarship is available to students with scores above the 85th percentile, what is the score needed to be eligible for the scholarship?
Problem 13: The life of an electric drill when used commercially follows a normal distribution with a mean of 8 years with a standard deviation of 1.25 years. The manufacturer will replace free all drills that fail while under warranty. If the manufacturer is willing to replace only 5% of the drills that fail, how long a guarantee should be offered?
Copyright 1996 by Mike Koehler
(c) Copyright 1997 Texas Instruments Incorporated. All rights reserved.
The Normal Distribution
1. What is the standard normal distribution and why is it important?
2. Consider data that was obtained from a normal distribution with a mean of 6.3 and a standard deviation of 3.17. Convert the following to z-scores.
a. 12.31 b. 8.2 c. 2.1 d. 15.8
3. What is the physical meaning of a z-score of 1.96? 4. What is the physical meaning of a z-score of –1.96?
5. Consider the intelligence quotient (IQ) of a person. IQ’s are approximately normal
with a mean of 100 and a standard deviation of 15. If a person were selected at random,
a. What is the probability that her/his IQ would be below 120? b. What is the probability that her/his IQ would be above 120? c. What is the probability that her/his IQ would be between 90 and 105? d. What is the probability that her/his IQ would be below 90?
6. Consider data that was obtained from a normal distribution with a mean of 6.3 and
a standard deviation of 3.17. Find the 85th percentile.
7. Find the following probabilities: a. P(-1.5 < z < 1.1) b. P(0 < z < 6.2) c. P(-1 < z < 1) d. P(-2 < z < 2) e. P(-3 < z < 3) f. P(-1.96 < z <1.96)
8. Given that a particular normal random variable, x, has a mean of 13 and a
standard deviation of 6.2, find the following probabilities: a. P(14 < x < 20) b. P(4 < x < 17.2) c. P(x < 7)
9. A job satisfaction index score for nurses is normally distributed with a mean of 50
and a standard deviation of 10. What is the probability that a nurse selected at random has an index score:
a. Higher than 55? b. Between 47 and 59?
10. A radar unit is used to measure the speed of automobiles on a busy street in
downtown Santa Cruz that has a speed limit of 35 mph. Suppose the speeds of individual automobiles are normally distributed with a mean of 37 mph.
a. Find the standard deviation if 5% of the automobiles travel faster than 45 mph.
b. Based on the standard deviation you just calculated, find the 85th percentile for the random variable “automobile speed.”
c. Based on the standard deviation you just calculated, what percentage of cars travel within 3 mph of the posted speed limit?
Questions about Normal Data and Sample Means
A manufacturer of Sea-Monkeys finds that the daily numbers of Sea-Monkey fun packs assembled by a machine are normally distributed with a mean of 560 and a standard deviation of 12.
1. For a randomly selected day, find the probability that more than 575 items are produced.
2. For a randomly selected day, find the probability that between 565 and 580 items were produced.
3. For 36 different randomly selected days, find the probability that the mean number of items produced is less than 565.
4. For 20 different randomly selected days, find the probability that the mean number of items produced is greater than 565.
5. For a randomly selected day, find the value of the twentieth percentile.
Sampling Distributions, An Example
Original Population: X = 1, 2, 3 All are equally probable
$\mu = 2$ and $\sigma = \sqrt{2/3} = .8165$
The population of all SAMPLES of size 4 (sampling with replacement): Samples of size 4 {Population 1,2, 3}
x-bar (mean)
1 1 1 1 1 1 1 1 2 1.25 1 1 1 3 1.5 1 1 2 1 1.25 1 1 2 2 1.5 1 1 2 3 1.75 1 1 3 1 1.5 1 1 3 2 1.75 1 1 3 3 2 1 2 1 1 1.25 1 2 1 2 1.5 1 2 1 3 1.75 1 2 2 1 1.5 1 2 2 2 1.75 1 2 2 3 2 1 2 3 1 1.75 1 2 3 2 2 1 2 3 3 2.25 1 3 1 1 1.5 1 3 1 2 1.75 1 3 1 3 2 1 3 2 1 1.75 1 3 2 2 2 1 3 2 3 2.25 1 3 3 1 2 1 3 3 2 2.25 1 3 3 3 2.5 2 1 1 1 1.25 2 1 1 2 1.5 2 1 1 3 1.75 2 1 2 1 1.5 2 1 2 2 1.75 2 1 2 3 2 2 1 3 1 1.75 2 1 3 2 2 2 1 3 3 2.25 2 2 1 1 1.5 2 2 1 2 1.75 2 2 1 3 2 2 2 2 1 1.75
2 2 2 2 2 2 2 2 3 2.25 2 2 3 1 2 2 2 3 2 2.25 2 2 3 3 2.5 2 3 1 1 1.75 2 3 1 2 2 2 3 1 3 2.25 2 3 2 1 2 2 3 2 2 2.25 2 3 2 3 2.5 2 3 3 1 2.25 2 3 3 2 2.5 2 3 3 3 2.75 3 1 1 1 1.5 3 1 1 2 1.75 3 1 1 3 2 3 1 2 1 1.75 3 1 2 2 2 3 1 2 3 2.25 3 1 3 1 2 3 1 3 2 2.25 3 1 3 3 2.5 3 2 1 1 1.75 3 2 1 2 2 3 2 1 3 2.25 3 2 2 1 2 3 2 2 2 2.25 3 2 2 3 2.5 3 2 3 1 2.25 3 2 3 2 2.5 3 2 3 3 2.75 3 3 1 1 2 3 3 1 2 2.25 3 3 1 3 2.5 3 3 2 1 2.25 3 3 2 2 2.5 3 3 2 3 2.75 3 3 3 1 2.5 3 3 3 2 2.75 3 3 3 3 3
The Distribution of the Sample Means:
x-bar | Frequency | P(x-bar)
1 | 1 | 0.01
1.25 | 4 | 0.05
1.5 | 10 | 0.12
1.75 | 16 | 0.20
2 | 19 | 0.23
2.25 | 16 | 0.20
2.5 | 10 | 0.12
2.75 | 4 | 0.05
3 | 1 | 0.01
[Figure: Histogram of Sample Means. Horizontal axis: Average of 4 numbers (1 to 3 in steps of 0.25); vertical axis: Frequency (0 to 20).]
(Looks kind of normal, no?)
Summary Sampling Statistics
n = 4
$\mu_{\bar{x}} = \mu = 2$
$\sigma_{\bar{x}}^2 = .166$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{.8165}{2} = .4082$
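The whole example above can be regenerated by brute force: list all 3⁴ = 81 samples of size 4 (with replacement), average each one, and tabulate the means. The sketch below does that with the standard library; the variable names are chosen for this illustration only.

```python
from collections import Counter
from itertools import product
from math import sqrt

population = [1, 2, 3]
n = 4

# Every ordered sample of size n, drawn with replacement, and its mean
means = [sum(sample) / n for sample in product(population, repeat=n)]
freq = Counter(means)

# Population parameters
mu = sum(population) / len(population)
sigma = sqrt(sum((x - mu) ** 2 for x in population) / len(population))

# Mean and standard deviation of the sampling distribution of x-bar
mu_xbar = sum(means) / len(means)
sigma_xbar = sqrt(sum((m - mu_xbar) ** 2 for m in means) / len(means))

for value in sorted(freq):
    print(value, freq[value], round(freq[value] / len(means), 2))
print(mu_xbar, round(sigma_xbar, 4), round(sigma / sqrt(n), 4))
```

The last line prints the standard deviation of the sample means next to σ/√n so the two can be compared directly.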
Sampling Activity
1. Look at the sheet for a minute. Turn it face down. Write down your “estimate” of the average area.
2. Pick any 5 of the rectangles that you consider “typical.” Find their areas and take the average. (judgement)
3. Using a random number generator, generate 5 numbers between 1 and 100. Find the areas of those 5 rectangles and take the average. (random)
4. Pick a number between 1 and 20. Add 10 to it 4 times. Take those 5 rectangles and find their areas. Take the average. (systematic)
From: Activity-Based Statistics by R. Scheaffer, M. Gnanadesikan, A. Watkins, J. Witmer, New York, Springer-Verlag, 1996
Bivariate Data Examples
1. Two “qualitative” variables: Gender and Major
Major
Gender | Liberal Arts | Business Administration | Technology | Row Totals
Male | 5 | 6 | 7 |
Female | 6 | 4 | 2 |
Column Totals | | | |
a) What is the “grand total?”
b) Fill in the percentages based on the grand total in the grid below:
Gender | Liberal Arts | Business Administration | Technology | Row Totals
Male | | | |
Female | | | |
Column Totals | | | |
c) Fill in the percentages for each cell based on the row totals:
Gender | Liberal Arts | Business Administration | Technology | Row Totals
Male | | | |
Female | | | |
Column Totals | | | |
d) Fill in the percentages for each cell based on the column totals:
Gender | Liberal Arts | Business Administration | Technology | Row Totals
Male | | | |
Female | | | |
Column Totals | | | |
Practice stating exactly what you are finding in clear English!
2. Two “quantitative” variables: Number of Hours Studied and Score on a Test
X (Number of Hours Studied)  Y (Score on the Test)
18 68
27 82
20 77
10 90
30 79
24 72
32 94
27 88
12 60
16 70
a) Make a “scatter plot” of this data.
b) Enter the data into your calculator. Find the correlation coefficient, r.
c) Find the slope and intercept of the regression line.
d) Write the equation of the regression line.
(Here’s what I get: r = 0.4798612569 and the slope is 0.681642, and the intercept is 63.2765298. Check to see if you are getting these values on your calculator also.)
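The calculator values quoted above can be cross-checked from the definitions: slope = Sxy/Sxx, intercept = ȳ − slope·x̄, and r = Sxy/√(Sxx·Syy). The sketch below is a hedged illustration of that arithmetic; the function name `linear_fit` is an assumption made here, not a calculator or library command.

```python
from math import sqrt

hours = [18, 27, 20, 10, 30, 24, 32, 27, 12, 16]
scores = [68, 82, 77, 90, 79, 72, 94, 88, 60, 70]


def linear_fit(xs, ys):
    """Least-squares slope and intercept, plus the correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = linear_fit(hours, scores)
print(round(slope, 6), round(intercept, 7), round(r, 10))
```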
Football:
A random sample of eight quarterbacks listed in The Sports Encyclopedia: Pro Football, 11th Edition, gave the following information:
Height of Q-back (in inches)  Weight of Q-back (in pounds)
75 205
78 230
74 210
73 210
72 195
75 215
76 203
73 196
a) Get a piece of graph paper and draw a scatter diagram for the data.
b) Write the equation of the line of best fit.
c) Graph this line on your scatter diagram.
d) If a quarterback is 76 inches tall, what would you predict his weight to be?
e) If a quarterback weighs 200 pounds, what would you predict his height to be?
f) What is r, the correlation coefficient? Does this seem to indicate a strong or weak correlation?
g) Now, go to the lab and use EXCEL to do this for you again. (The following page will give you the instructions on how to do this.)
Here is some information about some professional golfers.
Player’s Name  Earnings per year (x, in dollars)  World Ranking (y) as of 1998
David Duval 1,272,305 8.77
Fred Couples 1,056,533 6.47
Tiger Woods 1,056,086 11.91
Justin Leonard 1,052,346 8.76
Mark O’Meara 894,724 7.49
Phil Mickelson 788,800 7.72
Ernie Els 601,363 12.35
Vijay Singh 383,979 6.59
Mark Calcavecchia 766,224 5.72
Davis Love III 621,987 10.67
Tom Watson 418,385 5.25
Nick Price 296,668 7.79
Colin Montgomery 272,000 9.03
a) Draw the scatter diagram on graph paper.
b) What is r? Does there appear to be linear correlation for this data?
c) Write the equation of the regression line. Graph it on your scatter diagram.
d) What are the units of the slope of the regression line? What is the interpretation of
the slope in English? What are the units of the intercept of the line? What does
the y-intercept tell you?
e) If a golfer makes $500,000 per year, what would you expect their ranking to be?
f) If a golfer is ranked number 7 in the world, what would you expect his/her income
to be?
Regression Example for Statistics
For each of the following 4 data sets, use your calculator or a piece of software to find:
1. n, the sample size
2. the mean of the x's
3. the mean of the y's
4. the equation of the regression line
5. the correlation coefficient, r
6. the value of r-squared, r²
Now graph the scatter-plot for each set of data and draw in the regression line.
What is similar about the sets of data? What is different about the sets?
For which set(s) is it most appropriate to use linear regression as a model?
Set 1 Set 1 Set 2 Set 2 Set 3 Set 3 Set 4 Set 4
x y x y x y x y
10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04
6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25
4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50
12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56
7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91
5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89
This data comes from: The Visual Display of Quantitative Information by Edward R.
Tufte, Graphics Press, Cheshire, Connecticut c. 1983
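For readers using "a piece of software" rather than a calculator, the six summary quantities can be computed for all four sets at once. This is a minimal sketch in plain Python; the names `data` and `summarize` are invented for this illustration, and it deliberately does not draw the scatter-plots, which are the point of the exercise.

```python
from math import sqrt

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
data = {
    "Set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Set 4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Sample size, means, regression line and correlation for one data set."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy)
    return {"n": n, "x_bar": x_bar, "y_bar": y_bar, "slope": slope,
            "intercept": y_bar - slope * x_bar, "r": r, "r_squared": r * r}


results = {name: summarize(xs, ys) for name, (xs, ys) in data.items()}
for name, stats in results.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Comparing the printed rows with the four scatter-plots is a quick way to see how much the summary numbers alone can hide.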
Confidence Intervals
Problem 14: A company manufactures light bulbs that have a length of life that is approximately normally distributed with a standard deviation of 30 hours. If a sample of 40 bulbs has an average life of 765 hours, find a 95% confidence interval for the population mean of all bulbs produced by the company.
Problem 15: Ten high school students are randomly selected and asked to count the amount of change in their pocket or purse. They are found to have a mean of $1.11 and a standard deviation of $1.42. Construct a 95% confidence interval for the mean amount of money carried by high school students.
Problem 16: The following were recorded for the relief time, in hours, for a certain brand of cough suppressant: 3.4, 2.5, 4.8, 2.9, 3.6, 2.8, 3.3, 5.6, 3.7, 2.8, 4.4, 4.0, 5.2, 3.0, 4.8. Assuming that the measurements represent a random sample from a normal population, find the 95% confidence interval for relief times of the cough suppressant.
Problem 17: As a freshman, Jacque Vaughn of the University of Kansas, made 28 out of 70 shots from three point range. Construct a 95% confidence interval of the proportion of such shots this player might make in his career at K.U.
Problem 18: In a high school, it is believed that the percentage of senior males who drive to school exceeds the percentage of senior females who drive to school by 10%. It is found that 158 of 200 males surveyed drive and 114 out of 150 females drive. Find a 95% confidence interval for the difference in proportions of seniors who drive to school. Decide if the 10% difference is valid.
Problem 19: A car dealership asked 25 randomly selected buyers how long they planned to keep their new cars. The sample mean and standard deviation are 6.2 years and 3.1 years respectively. Construct the 95% confidence interval for the population standard deviation.
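As an illustration of the simplest case in this set, Problem 14 gives a known population standard deviation, so a z-interval x̄ ± z*·σ/√n applies. The sketch below shows that calculation under that assumption; the function name `z_interval` is hypothetical, and the other problems call for different intervals (t, proportion, or χ² based) that are not covered here.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Problem 14: sigma = 30 hours, n = 40 bulbs, sample mean 765 hours
low, high = z_interval(765, 30, 40)
print(round(low, 2), round(high, 2))
```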
Hypothesis Testing
The 5 step process:
1. State the null hypothesis H0 and the alternative hypothesis H1. (They are always
about µ, p, or σ.)
2. State what you are testing, α, the level of significance given for the problem, whether
the test to be used is a one-tailed or two-tailed test, and what test statistic is to be used
(z, t, F, or χ2.)
3. Draw a picture, shade in the appropriate tail or tails, look up the critical values using
tables or calculators and label the picture.
4. Compute the test statistic and show it on your picture. The computation may be done
on your calculator. I generally show the p-value in this step as well.
5. Make a decision to reject or fail to reject H0 and state your conclusion in clear
English.
Example:
An IQ test was administered to twelve students and the scores were as follows:
87 102 94 81 115 75
74 116 98 114 96 102
Based on this sample, is the claim that the true mean IQ in the population is less than 100 justified? Use α = .10, and assume that IQ is normally distributed with a population variance, σ² = 144.
Solution:
1. H0: µ = 100
H1: µ < 100
2. This is a one population, one-tailed test of the mean. The test statistic to use is a z-test because σ is known (even though the sample size n is small.) We will use $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
3. This is a left-tailed test. The critical value for z is zcrit = −1.28.
Decision rule: If ztest ≤ −1.28, we’ll reject H0. If ztest > −1.28, we’ll fail
to reject H0.
4. Using the formula above, we get $z_{test} = \frac{96.167 - 100}{12/\sqrt{12}} = -1.11$
5. Fail to reject H0. (We are NOT in the critical region.) There is not
enough evidence to show that the mean IQ is less than 100.
Note: To do Step 4 on the TI-83 and TI-84 calculators, press STAT, TESTS, Z-Test, Input: Stats, µ0 = 100, σ = 12 (the calculator will divide by √n for you), x̄ = 96.167, n = 12, µ: < µ0, Calculate.
The p-value will be given to you by the calculator. It is the probability of having a z value less than the value of our test statistic. P(z < −1.11) = .1335. By comparing the 13.35% with our significance level, α = 10%, we also see that we are not in the critical region.
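The five-step example above can also be run in software. This is a minimal sketch with Python's standard library, assuming the same setup as the example (σ = 12 known, left-tailed test, α = .10); the variable names are chosen for illustration. Because it uses the unrounded test statistic, the p-value it prints differs slightly from the .1335 obtained from z = −1.11.

```python
from math import sqrt
from statistics import NormalDist, mean

scores = [87, 102, 94, 81, 115, 75, 74, 116, 98, 114, 96, 102]
mu0, sigma, alpha = 100, 12, 0.10

x_bar = mean(scores)                                   # sample mean
z_test = (x_bar - mu0) / (sigma / sqrt(len(scores)))   # test statistic
z_crit = NormalDist().inv_cdf(alpha)                   # left-tail critical value
p_value = NormalDist().cdf(z_test)                     # P(z < z_test)
reject = z_test <= z_crit                              # decision rule from Step 3

print(round(x_bar, 3), round(z_test, 2), round(z_crit, 2), round(p_value, 4), reject)
```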
Examples of One Sample Tests
1. From his long-standing experience, a farmer believes that the mean yield of grain
per plot on his farm is 150 bushels. When a new seed introduced on the market
was tried on sixteen randomly picked experimental plots, the mean yield was 158
bushels. Suppose the yield per plot can be assumed to be normally distributed
with a standard deviation of yield, σ, of 20 bushels. Is the new seed significantly
better? Use α = 0.02.
2. A random sample of size n = 16 is drawn from a population having a normal
distribution. The sample mean and the sample variance are given, respectively, as
x = 23.8 and s2 = 10.24. At the 5 percent level of significance, test the following:
a) H0: µ = 25 versus H1: µ ≠ 25
b) H0: µ = 25 versus H1: µ < 25
3. A ski coach claims that she can train beginning skiers for 3 weeks so that at the
end of the program they will finish a certain downhill course in less than 13
minutes. It was found that, when a random sample of ten skiers was given the
training, their mean time was 12.3 minutes with s = 1.2 minutes. On the basis of
the evidence, is the true mean time significantly less than 13 minutes? Use α =
0.025.
4. In a sample of 160 cathode tubes inspected, 22 were found to be defective. If the
true proportion of defectives is significantly higher than the 8 percent that the
company considers tolerable, repairs on the machine are in order. At the 5 percent
level of significance, does the machine need repairs?
5. Quarters are currently minted with weights having a mean of 5.670 g and a
standard deviation of 0.062 g. New equipment is being tested in an attempt to
improve quality by reducing variation. A simple random sample of 24 quarters is
obtained from those manufactured with the new equipment, and this sample has a
standard deviation of 0.049 g. Use a 0.05 significance level to test the claim that
quarters manufactured with the new equipment have weights with a standard
deviation less than 0.062 g. Does the new equipment appear to be effective in
reducing the variation of weights? What would be an adverse consequence of
having quarters with weights that vary too much?
6. To test the effect of a physical fitness course on one’s physical ability, the number
of sit-ups that a person could do in one minute, both before and after the course,
was recorded. Ten randomly selected participants scored as shown in the
following table. Can you conclude that a significant amount of improvement took
place? Use α = 0.01.
Before 29 22 25 29 26 24 31 46 34 28
After 30 26 25 35 33 36 32 54 50 43
Examples of two-sample tests
1. The purchasing department for a regional supermarket chain is considering two
sources from which to purchase 10-lb bags of potatoes. A random sample taken
from each source shows the following results:
Idaho Supers Idaho Best
No. of bags weighed 100 100
Mean Weight 10.2 lb 10.4 lb
Sample Variance 0.36 lb 0.25 lb
At the 0.05 level of significance, is there a difference between the mean weights
of the “10-lb” bags of potatoes?
2. Two competing headache remedies claim to give fast-acting relief. An
experiment was performed to compare the mean lengths of time required for
bodily absorption of brand A and brand B headache remedies. Twelve people
were randomly selected and given an oral dosage of brand A. Another 12 were
randomly selected and given an equal dosage of brand B. The length of time in
minutes for the drugs to reach a specified level in the blood was recorded. The
information follows:
Brand A mean = 20.1 s1 = 8.7 n1=12
Brand B mean = 18.9 s2 = 7.5 n2=12
Past experience with the drug composition of the two remedies permits
researchers to assume that the standard deviations of the two time distributions are
approximately equal. Use a 5% level of significance to test the claim that there is
no difference in the mean time required for bodily absorption.
3. In a survey of working parents (both parents working outside the home) one of the
questions asked was “Have you refused a job, promotion, or transfer because it
would mean less time with your family?” Two hundred men and two hundred
women were asked this question. 29% of the men and 24% of the women
responded “yes.” Based on this survey, can we conclude that there is a difference
in the proportion of men and women responding “yes” at the 0.05 level of
significance?
4. Researchers collected data on the numbers of hospital admissions resulting from
motor vehicle crashes, and the results are given in the table for Fridays on the 6th
of a month and Fridays on the following 13th of the same month (based on data
from “Is Friday the 13th Bad for Your Health?” by Scanlon et al., British Medical
Journal, Vol. 307, as listed in the Data and Story Line online resource of data
sets). Use a 0.05 significance level to test the claim that when the 13th day of a
month falls on a Friday, the numbers of hospital admissions from motor vehicle
crashes are not affected.
Friday the 6th 9 6 11 11 3 5
Friday the 13th 13 12 14 10 4 12
5. Bipolar Depression Treatment. In clinical experiments involving different groups
of independent samples, it is important that the groups be similar in the important
ways that affect the experiment. In an experiment designed to test the
effectiveness of paroxetine for treating bipolar depression, subjects were
measured using the Hamilton depression scale with the results given below (based
on data from “Double-Blind, Placebo-Controlled Comparison of Imipramine and
Paroxetine in the Treatment of Bipolar Depression,” by Nemeroff et al., American
Journal of Psychiatry, Vol. 158, No. 6). Using a 0.05 significance level, test the
claim that both populations have the same standard deviation. Based on the
results, does it appear that the two populations have different standard deviations?
Placebo Group n = 43 x = 21.57 s = 3.87
Paroxetine Treatment Group n = 33 x = 20.38 s = 3.91
Chi-Square (χ²)
We’ll work through an example. Say we have polled registered voters about a piece of legislation proposed by the governor. We poll 200 urban, 200 suburban and 100 rural residents (selected randomly) and ask them if they are in favor of, or oppose the proposal.
Governor’s Proposal
Type of Residence | Favor | Oppose | Total
Urban | | | 200
Suburban | | | 200
Rural | | | 100
Total | | | 500
If people’s preferences about the proposal are independent of where they live, then we would expect 40% of the people in favor of the proposal to be urban dwellers, since they make up 40% of the population here. Likewise, we would expect 40% of the people opposing the proposal to be urban dwellers also.
With this in mind, and using the fact that 254 people were in favor of the proposal and
246 people were opposed to the proposal, fill out the table below for “expected”
responses.
Governor’s Proposal
Type of Residence | Favor | Oppose | Total
Urban | | | 200
Suburban | | | 200
Rural | | | 100
Total | 254 | 246 | 500
Now, as it happens, the real data was somewhat different. It was as follows:
Governor’s Proposal
Type of Residence | Favor | Oppose | Total
Urban | 143 | 57 | 200
Suburban | 98 | 102 | 200
Rural | 13 | 87 | 100
Total | 254 | 246 | 500
Here is how we test this:
Step 1:
H0: The proportion of voters favoring the proposed legislation is the same in all 3
groups. (This is the presumption of “independence.”)
H1: The proportion of voters favoring the proposed legislation is not the same in
all three groups.
Step 2:
This is a test of independence. The test statistic to use is $\chi^2 = \sum \frac{(O - E)^2}{E}$.
O is the observed frequency (pulled out of the data table) and E is the expected
frequency (pulled out of the table we created using the totals from each type of
residence.)
It is a one-tailed test because this type of χ² test always is: if the observed and expected frequencies are close, then χ² will be close to 0 (i.e., the opinion on the legislation is independent of where the person lives.) If χ² is large, then there is some dependence between opinion and residency.
Step 3:
Use the table in your text to select a critical value for χ2. We will use α = .05.
The degrees of freedom are df = (r−1)(c−1), where r is the number of rows in your table
and c is the number of columns. Here, df = (3−1)(2−1) = 2. The critical value here is
χ2 = 5.99.
Step 4:
Do the number crunching. You can try it longhand (once will probably satisfy
you) and then notice that with the TI-83, after entering the matrix of observations in [A],
you can select χ2 test from the TESTS part of the Stats menu. The expected matrix will
be calculated automatically and stored wherever you specify.
I got the χ² value to be 91.72, so we are definitely in the critical region.
Step 5:
Make your decision and state the conclusion. We reject H0. The three groups of
voters do not all have the same proportions favoring the proposed legislation. (Notice,
we can decide that there is dependence, but not what caused it. That’s why we study
social science as well as mathematics.)
For any old contingency table, you can always construct the table of expected values by
using the row and column totals off of the data table. The expected frequency of the ith
row and jth column position in the table is given by:
$E_{i,j} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}} = \frac{R_i \times C_j}{n}$
(EXCEL can do all of these calculations for you also!)
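The longhand calculation from Step 4 can be sketched in a few lines of Python as well. This is an illustration rather than the TI-83 or EXCEL procedure: it builds the expected table from the row and column totals, sums (O − E)²/E over the cells, and, as a labeled shortcut, uses the fact that for df = 2 only the upper-tail χ² probability equals exp(−χ²/2).

```python
from math import exp

# Observed counts: rows are Urban, Suburban, Rural; columns are Favor, Oppose
observed = [[143, 57], [98, 102], [13, 87]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total x column total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut valid for df = 2 only: P(chi-square > x) = exp(-x / 2)
p_value = exp(-chi_sq / 2)

print(expected)
print(round(chi_sq, 2), df, p_value)
```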
Hypothesis Testing
Problem 20: A certain type of children's pain reliever states that it contains 325 mg of acetaminophen in each ounce of the drug. Seventy one-ounce samples are tested for acetaminophen, and the sample has a mean of 319 mg of the drug and a standard deviation of 26 mg. With α = .01, test the claim that the population mean is equal to 325 mg.
Problem 21: A test is designed to determine if the right hand of right handed people is stronger than their left hand. Nine right handed adults were selected and hand strength tested for each hand. The hand strengths in pounds for each person are given below.
Problem 22: A school wants to compare the ACT scores of those students who complete a college core of courses (4 years of English, and at least 3 years of Math, Social Science, and Natural Science) to the scores of those who do not complete this core. The results of a recent administration of the test follow.
Determine whether completing the core courses produces a higher score on the sub-tests.
Problem 23: Cuckoos lay their eggs in the nests of other birds. Some biologists speculate that the size of the cuckoo's eggs might be different depending on whether the eggs are laid in warbler's nests or wren's nests. To check this, biologists searched a wildlife refuge for warbler's and wren's nests. Summary statistics for the lengths (in mm) of cuckoo's eggs found in these nests are shown below.
Does this data support the biologists' claim that the size of the eggs differs depending on whether they are laid in warbler's nests or wren's nests? (Advanced Placement Course Description, Preliminary Edition; The College Board, 1995)
Problem 24: It is believed that at least 70% of the homes in a city have smoke detectors installed. Would you agree with this claim if a random survey of homes in the city shows that 153 of the 240 homes surveyed have working smoke detectors installed?
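Problem 24 can be sketched as a one-proportion z test. The Python below assumes the hypotheses Ho: p ≥ .70 against Ha: p < .70 (a left-tailed test); the problem does not state a significance level, so the sketch only reports the statistic and the p-value. The variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.70  # claimed proportion, from the problem
x = 153    # homes with working smoke detectors
n = 240    # homes surveyed

p_hat = x / n
# Standard error is computed under Ho, using p0.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z)  # left-tailed p-value

print(f"p-hat = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

The sample proportion is 153/240 = .6375, and the p-value comes out between .01 and .05, so the conclusion depends on which significance level you choose.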
Problem 25: A vote is to be taken among the residents of a state to determine whether casino gambling should be legalized. Many voters in the out-state areas feel it will pass because of the large proportion of city and suburban voters who favor the amendment. To determine if there is a significant difference in the proportion of metropolitan voters and out-state voters favoring the proposal, a poll is taken. If 123 of 210 town voters favor the legalization, and 244 of 515 out-state residents favor it, would you agree that the
proportion of metropolitan voters favoring the proposal is higher than the proportion of out-state voters? Do you think the proposal will pass?
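For Problem 25, a two-proportion z test with a pooled proportion can be sketched as follows. The sketch assumes the 210 "town" voters are the metropolitan sample and that the test is right-tailed (Ha: p1 > p2); the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 123, 210  # metropolitan (town) voters in favor, sample size
x2, n2 = 244, 515  # out-state voters in favor, sample size

p1_hat = x1 / n1
p2_hat = x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)  # pooled proportion
q_pooled = 1 - p_pooled

# Under Ho: p1 = p2, so the hypothesized difference is 0.
z = (p1_hat - p2_hat) / sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
p_value = 1 - NormalDist().cdf(z)  # right-tailed p-value

print(f"p1-hat = {p1_hat:.4f}, p2-hat = {p2_hat:.4f}")
print(f"pooled = {p_pooled:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

The pooled proportion is also printed because it is the combined sample proportion in favor, which is worth looking at when you think about the last question in the problem.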
Problem 26: A national pizza chain recently test marketed a new type of pizza in a large metropolitan area. It is important for the company to evaluate the product's performance during this time. This was done in part by sampling consumers and assessing their exposure to the product. The company selected random samples of consumers from different age groups and obtained the following results.
Do these data indicate that market penetration is independent of age?
Problem 27: The following data is obtained from a random sample of absences from a company. At the .05 level of significance, test the claim that the absences occur with equal frequency on each of the 5 days.
χ²cdf computes the cumulative probability of the chi-square distribution between two bounds. The syntax is χ²cdf(lowerbound, upperbound, degrees of freedom).
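To see what this calculator function is doing, here is a rough Python stand-in that uses only the standard library. The names `chi2_lower_cdf` and `chi2cdf` are our own, and the series used is only meant for the moderate values that show up in classroom problems; it is a sketch, not the calculator's actual algorithm.

```python
import math


def chi2_lower_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z):
    # P = z^a e^(-z) / Gamma(a) * sum over k of z^k / (a (a+1) ... (a+k))
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2cdf(lower, upper, df):
    """Probability that a chi-square variable lies between lower and upper."""
    return chi2_lower_cdf(upper, df) - chi2_lower_cdf(lower, df)


# Right-tail area beyond 9.488 with 4 degrees of freedom
# (5 categories, such as 5 weekdays, give df = 5 - 1 = 4).
print(round(chi2cdf(9.488, math.inf, 4), 4))
```

With 4 degrees of freedom the area to the right of 9.488 is very close to .05, which is why 9.488 is the usual critical value for a .05-level test with 5 categories.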
Problem 28: Does your life expectancy depend on where you live? The following table gives the state-by-state list of average life expectancy at birth as compiled by the National Center for Health Statistics. Is there a significant difference in a person's average life expectancy based on the region of the country in which they live?
Analysis of Variance provides the methods for comparing the means of more than two populations. In this case we use a one-way ANOVA because we are comparing the means of populations that are classified in one way, by region of the country. The procedure for comparing the means involves analyzing the variation among the sample means relative to the variation within the samples. If the variation among the sample means is large relative to the variation within the samples, then we conclude that the means of the populations are not all equal.
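The comparison of "variation among the sample means" to "variation within the samples" is the F statistic of a one-way ANOVA. The Python sketch below computes it by hand. The function name `one_way_anova_f` and the three small groups are made up for illustration; they are not the life-expectancy data from Problem 28.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)                  # number of groups
    n = sum(len(g) for g in groups)  # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Variation among the sample means (between groups).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within the samples.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k


# Made-up data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F means the group means are spread out much more than the scatter inside each group would explain, which is evidence against the claim that all the population means are equal.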
Copyright 1996 by Mike Koehler
(c) Copyright 1997 Texas Instruments Incorporated. All rights reserved. Trademarks
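Test Statistics: TWO Populations

Test of the mean of differences (Matched Pairs)
What is being tested? MEAN (dependent/paired data)
Null hypothesis: Ho: µd = 0
Statistic to use:
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$
You find $\bar{d}$ and $s_d$ with your calculator.

Test of the difference of means
What is being tested? MEAN (independent samples)
Is σ known? Yes: any size n is OK, use a z test. No: use large n, use a t test.
Null hypothesis: Ho: µ1 = µ2
Statistic to use:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
df = smaller of n1 – 1 and n2 – 1

Test of the difference of means (Pooled Variances)
What is being tested? MEAN (independent samples) (Pooled Variances)
Is σ known? No: assume σ1 = σ2.
Null hypothesis: Ho: µ1 = µ2
Statistic to use:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \quad \text{where} \quad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Test of the difference of proportions
What is being tested? PROPORTION (independent samples)
Samples have to be large.
Null hypothesis: Ho: p1 = p2
Statistic to use:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}\,\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{where} \quad \hat{p}_1 = \frac{x_1}{n_1}, \quad \hat{p}_2 = \frac{x_2}{n_2}, \quad \text{and} \quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Here $\hat{p}$ is the "pooled" proportion, with $\hat{q} = 1 - \hat{p}$.

Test of Equality of Variances
What is being tested? VARIANCES
Samples have to be normal.
Null hypothesis: Ho: σ1 = σ2
Statistic to use:
$$F = \frac{s_1^2}{s_2^2}$$
(sample #1 has larger variance)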
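As a rough check on three of the statistics in this table, the Python sketch below works from summary statistics only. The function names and the numbers plugged in are made up for illustration, and each function takes the hypothesized difference µ1 − µ2 to be 0, as in the null hypotheses above.

```python
from math import sqrt


def t_unpooled(x1, s1, n1, x2, s2, n2):
    """Difference of means, independent samples; df = smaller of n1-1, n2-1."""
    t = (x1 - x2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return t, min(n1 - 1, n2 - 1)


def t_pooled(x1, s1, n1, x2, s2, n2):
    """Difference of means assuming sigma1 = sigma2; df = n1 + n2 - 2."""
    sp = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    t = (x1 - x2) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def f_ratio(s1, s2):
    """Equality of variances: larger sample variance goes on top."""
    big, small = max(s1, s2), min(s1, s2)
    return big ** 2 / small ** 2


# Made-up summary statistics: mean, standard deviation, sample size.
t_u, df_u = t_unpooled(10, 3, 16, 8, 2, 16)
t_p, df_p = t_pooled(10, 3, 16, 8, 2, 16)
print(f"unpooled t = {t_u:.3f}, df = {df_u}")
print(f"pooled   t = {t_p:.3f}, df = {df_p}")
print(f"F = {f_ratio(3, 2):.2f}")
```

With equal sample sizes, as in this made-up example, the pooled and unpooled t statistics come out the same; only the degrees of freedom differ.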