Breakout Session #1 Graphical Statistics Presented by Dr. Del Ferster


Post on 24-Feb-2016


Page 1: Breakout Session #1 Graphical Statistics

Breakout Session #1: Graphical Statistics

Presented by Dr. Del Ferster

Page 2: Breakout Session #1 Graphical Statistics

What’s in store for today?

We’ll start by doing a needs assessment: where do you want or need more information regarding the topics for this year’s work?

We’ll spend a bit of time looking at some “test-type” problems.

We’ll take another look at graphical statistics:
  Different types of plots
  2 column frequency tables

I know, it sounds like a blast!

Page 3: Breakout Session #1 Graphical Statistics

Let’s do some problems!

Practice Problems: GRAPHICAL STATISTICS

Page 4: Breakout Session #1 Graphical Statistics

Get your popcorn ready!

TODAY’S FEATURE PERFORMANCE

GRAPHICAL STATISTICS

Page 5: Breakout Session #1 Graphical Statistics

Descriptive Statistics: Tabular and Graphical Presentations

Summarizing Qualitative Data
Summarizing Quantitative Data

Recall:
  Qualitative = Essentially just a name.
  Quantitative = True numerical data.

Page 6: Breakout Session #1 Graphical Statistics


We Deal with 2 Types of Data

Numerical/Quantitative Data [Real Numbers]:
  Your height
  The number of people in your family
  The temperature of coffee bought at McDonald's
  The score on your last math test

Qualitative/Categorical Data [Labels rather than numbers]:
  Grade of a high school student [F, S, J, Senior]
  Favorite color
  Political party affiliation
  The part of a new automobile that breaks first
  The reason you get mad at your spouse

Page 7: Breakout Session #1 Graphical Statistics

Summarizing Qualitative Data

Frequency Distribution (shows how many)
Relative Frequency Distribution (shows what fraction)
Percent Frequency Distribution (shows what percentage)
Bar Graph
Pie Chart

The bar graph and the pie chart are both graphical means of displaying any of the distributions above.

Page 8: Breakout Session #1 Graphical Statistics

Frequency Distribution

A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.

The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

Page 9: Breakout Session #1 Graphical Statistics

Example: Stumble Inn

Guests staying at Stumble Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are shown below.

Below Average, Average, Above Average,
Above Average, Above Average, Above Average,
Above Average, Below Average, Below Average,
Average, Poor, Poor,
Above Average, Excellent, Above Average,
Average, Above Average, Average,
Above Average, Average

Page 10: Breakout Session #1 Graphical Statistics

Example: Stumble Inn

Frequency Distribution

Rating           Frequency
Poor             2
Below Average    3
Average          5
Above Average    9
Excellent        1
Total            20
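If you want to check the tally by machine, here is a minimal Python sketch. It assumes the 20 ratings are typed in exactly as listed on the previous slide; the variable names are just illustrative choices, not anything from the slides.

```python
from collections import Counter

# The 20 guest ratings from the Stumble Inn example, in the order listed.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Fixed class order, worst to best, so the table reads like the slide.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)
frequency = {rating: counts[rating] for rating in classes}

for rating, freq in frequency.items():
    print(f"{rating:<15}{freq:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```

Listing the classes explicitly keeps the rows in a sensible order; a bare Counter would list them in order of first appearance instead.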

Page 11: Breakout Session #1 Graphical Statistics

Relative Frequency Distribution

The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.

A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

Page 12: Breakout Session #1 Graphical Statistics

Percent Frequency Distribution

The percent frequency of a class is the relative frequency multiplied by 100.

A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

Page 13: Breakout Session #1 Graphical Statistics

Example: Stumble Inn

Relative Frequency and Percent Frequency Distributions

Rating           Relative Frequency   Percent Frequency
Poor             .10                  10
Below Average    .15                  15
Average          .25                  25
Above Average    .45                  45
Excellent        .05                  5
Total            1.00                 100
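The two definitions translate directly into a couple of lines of Python. This is a sketch only, starting from the Stumble Inn frequencies; the dictionary names are assumptions made for illustration.

```python
# Frequencies from the Stumble Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(frequency.values())

# Relative frequency = class frequency / total number of items.
relative = {rating: freq / total for rating, freq in frequency.items()}

# Percent frequency = relative frequency multiplied by 100.
percent = {rating: 100 * rel for rating, rel in relative.items()}

for rating in frequency:
    print(f"{rating:<15}{relative[rating]:>6.2f}{percent[rating]:>6.0f}")
```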

Page 14: Breakout Session #1 Graphical Statistics

Bar Graph

A bar graph is a graphical device for depicting qualitative data.
On the horizontal axis we specify the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency scale can be used for the vertical axis.
Using a bar of fixed width drawn above each class label, we extend the height appropriately.
The bars are separated to emphasize the fact that each class is a separate category.

Page 15: Breakout Session #1 Graphical Statistics

Example: Stumble Inn

Bar Graph

[Figure: bar graph of the rating frequencies. Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency, scaled from 1 to 9.]
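For a quick look without any plotting software, a rough text version of the bar graph can be printed in Python. This sketch draws the bars sideways (one row per class) rather than upright as on the slide, and `bar_lines` is a made-up helper name.

```python
# Frequencies from the Stumble Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}


def bar_lines(freq_table, symbol="#"):
    """Return one text bar per class (bar_lines is a made-up helper name)."""
    return [
        f"{label:<15}{symbol * count} {count}"
        for label, count in freq_table.items()
    ]


lines = bar_lines(frequency)
print("\n".join(lines))
```

Each class gets its own separate row, which mirrors the gaps between bars in a real bar graph.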

Page 16: Breakout Session #1 Graphical Statistics

Pie Chart

The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.

Page 17: Breakout Session #1 Graphical Statistics

Example: Stumble Inn

Pie Chart

[Figure: pie chart titled "Quality Ratings" with sectors Above Average 45%, Average 25%, Below Average 15%, Poor 10%, Exc. 5%.]
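The sector sizes follow from the rule relative frequency times 360. Here is a small Python sketch of that arithmetic for the Stumble Inn ratings; the names `relative` and `angles` are illustrative only.

```python
# Relative frequencies from the Stumble Inn example.
relative = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Each sector's angle is its relative frequency times 360 degrees.
angles = {rating: rel * 360 for rating, rel in relative.items()}

for rating, angle in angles.items():
    print(f"{rating:<15}{angle:>6.0f} degrees")
```

The angles should add up to a full circle, which is a handy check that no class was left out.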

Page 18: Breakout Session #1 Graphical Statistics

Example: Stumble Inn

Insights Gained from the Preceding Pie Chart

One-half of the customers surveyed gave Stumble Inn a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager.

For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.

Page 19: Breakout Session #1 Graphical Statistics

Summarizing Quantitative Data

Frequency Distribution
Relative Frequency and Percent Frequency Distributions
Dot Plot
Histogram
Cumulative Distributions
Ogive

Page 20: Breakout Session #1 Graphical Statistics

Example: RPM Auto Repair

The manager of RPM Auto Repair would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. He examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

Page 21: Breakout Session #1 Graphical Statistics

Example: RPM Auto Repair

Sample of Parts Cost for 50 Tune-ups

 91   78   93   57   75   52   99   80   97   62
 71   69   72   89   66   75   79   75   72   76
104   74   62   68   97  105   77   65   80  109
 85   97   88   68   83   68   71   69   67   74
 62   82   98  101   79  105   79   69   62   73

Including a line in the table for every possible cost is not a good idea.

We need to categorize the data.

Page 22: Breakout Session #1 Graphical Statistics

Frequency Distribution

Guidelines for Selecting Number of Classes

Use between 5 and 20 classes.
Smaller data sets usually require fewer classes.
Data sets with a larger number of elements usually require a larger number of classes.

Note that the upper limit of every class is also the lower limit of the next class. We treat the upper limit as OPEN (up to, but not including, that amount), so a value equal to a class's upper limit is counted in the next class.

Page 23: Breakout Session #1 Graphical Statistics

Frequency Distribution

Guidelines for Selecting Width of Classes

Use classes of equal width.

$$\text{Approximate Class Width} = \frac{\text{Largest Data Value} - \text{Smallest Data Value}}{\text{Number of Classes}}$$

Page 24: Breakout Session #1 Graphical Statistics

For RPM Auto Repair, if we choose 6 classes:

$$\text{Approximate Class Width} = \frac{109 - 52}{6} = 9.5$$

so we'll use an interval length of 10.

Frequency Distribution

Parts Cost ($)   Frequency
50-60            2
60-70            13
70-80            16
80-90            7
90-100           7
100-110          5
Total            50
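Here is a minimal Python sketch of the same steps: compute the approximate class width, round it up to 10, and count the costs in each class. It assumes the 50 parts costs as listed in the sample and a starting limit of 50, and it treats each upper limit as open, so a cost of exactly 80 lands in the 80-90 class.

```python
# The 50 parts costs (in dollars) from the RPM Auto Repair sample.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6
approx_width = (max(costs) - min(costs)) / num_classes  # (109 - 52) / 6

width = 10   # approximate width rounded up to a convenient interval length
start = 50   # lowest class limit used on the slide

frequency = {}
for k in range(num_classes):
    lower = start + k * width
    upper = lower + width
    # Upper limit is open: lower <= cost < upper.
    frequency[f"{lower}-{upper}"] = sum(lower <= c < upper for c in costs)

print(f"Approximate class width: {approx_width}")
for label, count in frequency.items():
    print(f"{label:<10}{count:>3}")
print(f"{'Total':<10}{sum(frequency.values()):>3}")
```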

Page 25: Breakout Session #1 Graphical Statistics

Relative Frequency and Percent Frequency Distributions

Parts Cost ($)   Relative Frequency   Percent Frequency
50-60            .04                  4
60-70            .26                  26
70-80            .32                  32
80-90            .14                  14
90-100           .14                  14
100-110          .10                  10
Total            1.00                 100

For the 50-60 class: relative frequency = 2/50 = .04; percent frequency = .04(100) = 4.

Page 26: Breakout Session #1 Graphical Statistics

Relative Frequency and Percent Frequency Distributions

For the RPM Auto Repair data, we can make the following observations.

Only 4% of the parts costs are in the $50-60 class.
30% of the parts costs are under $70.
The greatest percentage (32% or almost one-third) of the parts costs are in the $70-80 class.
10% of the parts costs are $100 or more.

Page 27: Breakout Session #1 Graphical Statistics

Dot Plot

One of the simplest graphical summaries of data is a dot plot.
A horizontal axis shows the range of data values.
Then each data value is represented by a dot placed above the axis.

Page 28: Breakout Session #1 Graphical Statistics

Example: RPM Auto Repair

Dot Plot

[Figure: dot plot of the 50 parts costs. Horizontal axis: Cost ($), marked at 50, 60, 70, 80, 90, 100, 110; one dot per tune-up stacked above its cost.]
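A dot plot is simple enough to print as plain text. The sketch below is one possible way to do it in Python, assuming the 50 parts costs from the sample: one column per dollar from 50 to 110, with repeated values stacked upward.

```python
from collections import Counter

# The 50 parts costs (in dollars) from the RPM Auto Repair sample.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

counts = Counter(costs)
low, high = 50, 110
values = range(low, high + 1)

# Build the plot from the top row down; a dot appears at a given level
# only if that cost occurs at least that many times.
rows = []
for level in range(max(counts.values()), 0, -1):
    rows.append("".join("." if counts[v] >= level else " " for v in values))

# Axis labels every 10 dollars, each starting under its own column.
axis = "".join(f"{v:<10}" for v in range(low, high + 1, 10))

print("\n".join(rows))
print(axis)
```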

Page 29: Breakout Session #1 Graphical Statistics

Histogram

Another common graphical presentation of quantitative data is a histogram.
The variable of interest is placed on the horizontal axis.
A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

Page 30: Breakout Session #1 Graphical Statistics

Example: RPM Auto Repair

Histogram

[Figure: histogram of the parts costs. Horizontal axis: Parts Cost ($), from 50 to 110 in steps of 10. Vertical axis: Frequency, scaled from 2 to 18 in steps of 2.]

Page 31: Breakout Session #1 Graphical Statistics

Cumulative Distributions

Cumulative frequency distribution -- shows the number of items with values less than the upper limit of each class (the upper limit is treated as open here).
Cumulative relative frequency distribution -- shows the proportion of items with values less than the upper limit of each class.
Cumulative percent frequency distribution -- shows the percentage of items with values less than the upper limit of each class.

Page 32: Breakout Session #1 Graphical Statistics

Example: RPM Auto Repair

Cumulative Distributions

Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
< 60       2                      .04                             4
< 70       15                     .30                             30
< 80       31                     .62                             62
< 90       38                     .76                             76
< 100      45                     .90                             90
< 110      50                     1.00                            100

Page 33: Breakout Session #1 Graphical Statistics

Exploratory Data Analysis

The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly.

One such technique is the stem-and-leaf display.

Page 34: Breakout Session #1 Graphical Statistics

Stem-and-Leaf Display

A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
The first digits of each data item are arranged to the left of a vertical line.
To the right of the vertical line we record the last digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.

8 | 5 7
9 | 3 6 7 8

Page 35: Breakout Session #1 Graphical Statistics

Stem-and-Leaf Display

Leaf Units

A single digit is used to define each leaf.
In the preceding example, the leaf unit was 1.
Leaf units may be 100, 10, 1, 0.1, and so on.
Where the leaf unit is not shown, it is assumed to equal 1.

Page 36: Breakout Session #1 Graphical Statistics

Example: RPM Auto Repair

Stem-and-Leaf Display

 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9
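Building the display by hand is good practice, but here is a minimal Python sketch for checking your work. It assumes the 50 parts costs from the sample and a leaf unit of 1; `stem_and_leaf` is a made-up function name.

```python
from collections import defaultdict

# The 50 parts costs (in dollars) from the sample.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def stem_and_leaf(data):
    """Map each stem to its leaves in rank order (leaf unit of 1)."""
    stems = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # 104 -> stem 10, leaf 4
        stems[stem].append(leaf)
    return dict(stems)


display = stem_and_leaf(costs)
for stem in sorted(display):
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in display[stem])}")
```

Sorting the data first is what puts the leaves on each stem in rank order.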

Page 37: Breakout Session #1 Graphical Statistics

SPLIT STEM Stem-and-Leaf Display

If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s).

Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second value corresponds to leaf values of 5-9.

Page 38: Breakout Session #1 Graphical Statistics

Example: RPM Auto Repair

SPLIT STEM Stem-and-Leaf Display

 5 | 2
 5 | 7
 6 | 2 2 2 2
 6 | 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4
 7 | 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3
 8 | 5 8 9
 9 | 1 3
 9 | 7 7 7 8 9
10 | 1 4
10 | 5 5 9
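The split-stem version only needs one extra rule: leaves 0-4 go on the first copy of a stem and leaves 5-9 on the second. A Python sketch under the same assumptions as before (the 50 parts costs, leaf unit of 1); `split_stem_and_leaf` is a made-up name.

```python
# The 50 parts costs (in dollars) from the sample.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def split_stem_and_leaf(data):
    """Group leaves by (stem, half); half 0 holds leaves 0-4, half 1 holds 5-9."""
    rows = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # leaf unit of 1
        half = 0 if leaf <= 4 else 1
        rows.setdefault((stem, half), []).append(leaf)
    return rows


rows = split_stem_and_leaf(costs)
for (stem, half), leaves in sorted(rows.items()):
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Note that this sketch only prints a row when it has at least one leaf; a half-stem with no data is simply skipped.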

Page 39: Breakout Session #1 Graphical Statistics

2 Way Data Tables

Thus far we have focused on methods that are used to summarize the data for one variable at a time.

Often we are interested in tabular and graphical methods that will help understand the relationship between two variables.

2 Way Data Tables and scatter diagrams are two methods for summarizing the data for two (or more) variables simultaneously.

Page 40: Breakout Session #1 Graphical Statistics

2 Way Data Tables

2 way data tables are used to summarize the data for two variables simultaneously.

2 way data tables can be used when:
  One variable is qualitative and the other is quantitative
  Both variables are qualitative
  Both variables are quantitative

The left and top margin labels define the classes for the two variables.

Page 41: Breakout Session #1 Graphical Statistics

Example: Finger Lakes Homes

2 Way Data Tables

The number of Finger Lakes homes sold for each style and price for the past two years is shown below.

                 Home Style
Price Range      Colonial   Ranch   Split   A-Frame   Total
≤ $99,000        18         6       19      12        55
> $99,000        12         14      16      3         45
Total            30         20      35      15        100

Page 42: Breakout Session #1 Graphical Statistics

Example: Finger Lakes Homes

Insights Gained from the Preceding 2 Way table

The greatest number of homes in the sample (19) are a split-level style and priced at less than or equal to $99,000.

Only three homes in the sample are an A-Frame style and priced at more than $99,000.

Page 43: Breakout Session #1 Graphical Statistics

2 Way Tables: Row or Column Percentages

Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.

Page 44: Breakout Session #1 Graphical Statistics

Example: Finger Lakes Homes

Row Percentages

                 Home Style
Price Range      Colonial   Ranch   Split   A-Frame   Total
≤ $99,000        32.73      10.91   34.55   21.82     100
> $99,000        26.67      31.11   35.56   6.67      100

Note: row totals are actually 100.01 due to rounding.

Page 45: Breakout Session #1 Graphical Statistics

Example: Finger Lakes Homes

Column Percentages

                 Home Style
Price Range      Colonial   Ranch   Split   A-Frame
≤ $99,000        60.00      30.00   54.29   80.00
> $99,000        40.00      70.00   45.71   20.00
Total            100        100     100     100
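Both percentage tables come from the same counts: divide each cell by its row total for row percentages, or by its column total for column percentages. A minimal Python sketch, assuming the Finger Lakes counts from the 2 way table; the dictionary layout and key labels are illustrative choices.

```python
styles = ["Colonial", "Ranch", "Split", "A-Frame"]

# Homes sold, by price range (rows) and home style (columns).
counts = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Row percentages: each cell divided by its row total.
row_pct = {
    price: [round(100 * n / sum(row), 2) for n in row]
    for price, row in counts.items()
}

# Column percentages: each cell divided by its column total.
col_totals = [sum(col) for col in zip(*counts.values())]
col_pct = {
    price: [round(100 * n / t, 2) for n, t in zip(row, col_totals)]
    for price, row in counts.items()
}

print("Row percentages:")
for price, row in row_pct.items():
    print(f"{price:<12}", row)
print("Column percentages:")
for price, row in col_pct.items():
    print(f"{price:<12}", row)
```

Row percentages answer "of the homes in this price range, what share were each style?", while column percentages answer "of the homes of this style, what share fell in each price range?".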

Page 46: Breakout Session #1 Graphical Statistics

A quick 2 way table problem

            Baked   Chips   Mashed   Total
Boys        __      __      34       100
Girls       25      37      __       __
Teachers    12      __      22       50
Total       __      104     __       250

The table above gives the preferences for a variety of people regarding their favorite way to consume potatoes (Yes it’s a carbohydrate extravaganza!!)

1) How many boys liked baked?
2) How many teachers preferred chips?
3) How many girls were asked?
4) Out of the people who liked chips, how many were boys?
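One way to attack the blanks is to work from any row or column that has only one missing value. The Python sketch below walks through that arithmetic; it assumes the known cells sit in the columns shown in the completed table on the following slide.

```python
# Known cells (column placement assumed from the completed table).
boys_mashed, boys_total = 34, 100
girls_baked, girls_chips = 25, 37
teachers_baked, teachers_mashed, teachers_total = 12, 22, 50
chips_total, grand_total = 104, 250

# The Teachers row has one blank: chips.
teachers_chips = teachers_total - teachers_baked - teachers_mashed

# The Total column has one blank: girls.
girls_total = grand_total - boys_total - teachers_total

# The Chips column now has one blank: boys.
boys_chips = chips_total - girls_chips - teachers_chips

# The Boys row now has one blank: baked.
boys_baked = boys_total - boys_chips - boys_mashed

print("1) Boys who liked baked:", boys_baked)
print("2) Teachers who preferred chips:", teachers_chips)
print("3) Girls asked:", girls_total)
print("4) Boys among chip lovers:", boys_chips, "out of", chips_total)
```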

Page 47: Breakout Session #1 Graphical Statistics

That was fun, let’s do another one!

This one deals with probabilities. Grab your calculator and let’s rock! A person is picked at random from this sample.

            Baked   Chips   Mashed   Total
Boys        15      51      34       100
Girls       25      37      38       100
Teachers    12      16      22       50
Total       52      104     94       250

1) What is the probability that the person picked is a boy?
2) What is the probability that the person picked likes mashed?
3) What is the probability that the person is a teacher who prefers baked potatoes?
4) What is the probability that, out of the girls, the person likes chips?
5) Out of the people who like chips, what is the probability that the person is a boy?
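After you have tried the five questions by hand, a Python sketch like this one can check the answers. It assumes the completed table above; the key point is the denominator: the grand total for questions 1 to 3, the girls' row total for question 4, and the chips column total for question 5.

```python
from fractions import Fraction

# Completed 2 way table: rows are groups, columns are potato preferences.
table = {
    "Boys": {"Baked": 15, "Chips": 51, "Mashed": 34},
    "Girls": {"Baked": 25, "Chips": 37, "Mashed": 38},
    "Teachers": {"Baked": 12, "Chips": 16, "Mashed": 22},
}

row_total = {group: sum(row.values()) for group, row in table.items()}
col_total = {
    pref: sum(row[pref] for row in table.values())
    for pref in ["Baked", "Chips", "Mashed"]
}
grand_total = sum(row_total.values())

p_boy = Fraction(row_total["Boys"], grand_total)                        # question 1
p_mashed = Fraction(col_total["Mashed"], grand_total)                   # question 2
p_teacher_and_baked = Fraction(table["Teachers"]["Baked"], grand_total)  # question 3
p_chips_given_girl = Fraction(table["Girls"]["Chips"], row_total["Girls"])  # question 4
p_boy_given_chips = Fraction(table["Boys"]["Chips"], col_total["Chips"])    # question 5

for name, p in [
    ("P(boy)", p_boy),
    ("P(mashed)", p_mashed),
    ("P(teacher and baked)", p_teacher_and_baked),
    ("P(chips | girl)", p_chips_given_girl),
    ("P(boy | chips)", p_boy_given_chips),
]:
    print(f"{name:<22}{str(p):>8}  = {float(p):.4f}")
```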

Page 48: Breakout Session #1 Graphical Statistics

Scatter Diagram

A scatter diagram is a graphical presentation of the relationship between two quantitative variables.

One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.

The general pattern of the plotted points suggests the overall relationship between the variables.

Page 49: Breakout Session #1 Graphical Statistics

Example: Panthers Football Team

Scatter Diagram

The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.

x = Number of Interceptions   y = Number of Points Scored
1                             14
3                             24
2                             18
1                             17
3                             27

Page 50: Breakout Session #1 Graphical Statistics

Example: Panthers Football Team

Scatter Diagram

[Figure: scatter diagram of the five games. Horizontal axis (x): Number of Interceptions, 0 to 3. Vertical axis (y): Number of Points Scored, 0 to 30 in steps of 5.]

Page 51: Breakout Session #1 Graphical Statistics

Example: Panthers Football Team

The preceding scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.

Higher points scored are associated with a higher number of interceptions.

The relationship is not perfect; not all of the plotted points in the scatter diagram lie on a straight line.
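One rough way to see the positive relationship without drawing anything is to average the points scored for each interception count. This Python sketch assumes the five (x, y) pairs from the Panthers table; it is only an informal check, not a substitute for the scatter diagram.

```python
from collections import defaultdict

# (interceptions, points scored) for the five Panthers games.
games = [(1, 14), (3, 24), (2, 18), (1, 17), (3, 27)]

# Collect the points scored for each number of interceptions.
points_by_interceptions = defaultdict(list)
for interceptions, points in games:
    points_by_interceptions[interceptions].append(points)

# Average points scored at each interception count.
mean_points = {
    x: sum(ys) / len(ys)
    for x, ys in sorted(points_by_interceptions.items())
}

for x, avg in mean_points.items():
    print(f"{x} interception(s): average {avg} points")
```

The averages climb as interceptions go up, which matches the upward drift of the plotted points. The two games with one interception scored different point totals, which is why the points cannot all sit on one line.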

Page 52: Breakout Session #1 Graphical Statistics

Scatter Diagram

A Positive Relationship

[Figure: scatter diagram with axes x and y.]

Page 53: Breakout Session #1 Graphical Statistics

Scatter Diagram

A Negative Relationship

[Figure: scatter diagram with axes x and y.]

Page 54: Breakout Session #1 Graphical Statistics

Scatter Diagram

No Apparent Relationship

[Figure: scatter diagram with axes x and y.]

Page 55: Breakout Session #1 Graphical Statistics

Wrapping Up

Thanks for your attention and participation.

I know it’s not easy doing this after a full day with the “munchkins”.

I hope that your year is off to a good start.

If I can help in any way, don’t hesitate to shoot me an email, or give me a call.