Statistics
Quiz Tomorrow
• Questions on anything?
• Comments?
• Fears?
• Ask away!!!
So…
• We have now learned how to display CATEGORICAL data.
• Now… we will talk about quantitative data.
To learn how to display and describe quantitative data, we will be using some baseball statistics. The following table shows the number of home runs in a single season for three well-known baseball players: Hank Aaron, Barry Bonds, and Babe Ruth.
Dot Plot
• Label the horizontal axis with the name of the variable and title the graph.
• Scale the axis based on the values of the variable.
• Mark a dot (we’ll use x’s) above the number on the axis corresponding to each data value.
[Dot plot: Babe Ruth, Number of Home Runs in a Single Season]
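The three steps above can be sketched in Python as a text-only dot plot. This is a minimal sketch, not how the slides' figures were drawn. The function name `dot_plot`, its parameters, and the sample numbers are all hypothetical, and the sample is made up rather than taken from the home run table. It assumes non-negative whole-number data, with one character column per value on the axis.

```python
from collections import Counter


def dot_plot(values, lo, hi, title, axis_label, tick_every=5):
    """Return a text dot plot with one 'x' stacked above the axis per data value.

    Sketch assumptions: non-negative integer data, one character column per
    integer from lo to hi, and tick labels short enough not to overlap.
    """
    if not values:
        raise ValueError("need at least one data value")
    if any(v < lo or v > hi for v in values):
        raise ValueError("every value must lie on the scaled axis [lo, hi]")
    counts = Counter(values)  # how many x's to stack above each value
    lines = [title]
    # Stack the x's from the top row down so each column rests on the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("x" if counts[v] >= level else " " for v in range(lo, hi + 1))
        lines.append(row.rstrip())
    lines.append("-" * (hi - lo + 1))
    # Scale the axis: print a number under every multiple of tick_every.
    ticks = [" "] * (hi - lo + 1 + len(str(hi)))
    for v in range(lo, hi + 1):
        if v % tick_every == 0:
            for i, ch in enumerate(str(v)):
                ticks[v - lo + i] = ch
    lines.append("".join(ticks).rstrip())
    lines.append(axis_label)  # label the horizontal axis with the variable name
    return "\n".join(lines)


# Hypothetical values for illustration only, not the table's home run data.
print(dot_plot([22, 25, 25, 30], 20, 30, "Demo dot plot", "Number of Home Runs"))
```

Stacking repeated values in one column is what lets the shape, gaps, and possible outliers show up at a glance.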
Describing a Distribution
• We describe a distribution using the acronym SOCS: Shape, Outliers, Center, Spread.
Shape:
• We describe the shape of a distribution in one of two ways.
• 1. Symmetric/Approximately Symmetric
[Dot plots: a symmetric distribution and a uniform distribution]
Shape:
• 2. Skewed: Right Skewed vs. Left Skewed
[Dot plots: a right-skewed distribution and a left-skewed distribution]
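Notice that the direction of the “skew” is the same as the direction of the “tail”: a right-skewed distribution has its long tail stretching to the right, and a left-skewed distribution has its long tail stretching to the left.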
Outliers:
• These are observations that we would consider “unusual”: pieces of data that don’t “fit” the overall pattern of the data.
[Dot plot: Babe Ruth, Number of Home Runs in a Single Season, with unusual observations marked]
Babe Ruth had two seasons that appear to be somewhat different from the rest of his career. These may be “outliers.”
Outliers:
[Dot plot: Barry Bonds, Number of Home Runs in a Single Season, with an unusual observation marked]
The season in which Barry Bonds hit 73 home runs does not appear to fit the overall pattern. This piece of data may be an outlier.
Center:
• A single value that describes the entire distribution. A “typical” value that gives a concise summary of the whole batch of numbers.
[Dot plot: Babe Ruth, Number of Home Runs in a Single Season]
A typical season for Babe Ruth appears to be approximately 46 home runs.
Spread:
• Since not everyone is typical, we also need to talk about the variation of a distribution. Are the values tightly clustered around the center, making them easy to predict, or do they vary a great deal from the center, making prediction more difficult?
Spread:
[Dot plot: Babe Ruth, Number of Home Runs in a Single Season]
Babe Ruth’s number of home runs in a single season varies from a low of 23 to a high of 60.
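Describing spread this way only needs the smallest and largest values. The hypothetical helper `spread` below also returns their difference, which is commonly called the range (a term these slides do not use). For Ruth that is 60 - 23 = 37 home runs. The sample list is invented so that its extremes match the slide's 23 and 60; it is not his actual season data.

```python
def spread(values):
    """Return (low, high, high - low) for a batch of numbers."""
    low, high = min(values), max(values)
    return low, high, high - low


# Hypothetical batch whose extremes match the slide's low of 23 and high of 60.
low, high, rng = spread([23, 41, 46, 60])
print(f"varies from a low of {low} to a high of {high} (range {rng})")
```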
Distribution Description using SOCS
• The distribution of Babe Ruth’s number of home runs in a single season is approximately symmetric (1) with two possible unusual observations at 23 and 25 home runs (2). He typically hits about 46 home runs in a season (3). Over his career, the number of home runs has varied from a low of 23 to a high of 60 (4).
• (1) Shape, (2) Outliers
• (3) Center, (4) Spread
Stem and Leaf Plot
Creating a stem and leaf plot
• Order the data points from least to greatest.
• Separate each observation into a stem (all but the rightmost digit) and a leaf (the final digit). Ex. 123 -> 12 (stem) : 3 (leaf)
• In a T-chart, write the stems vertically in increasing order on the left side of the chart.
• On the right side of the chart write each leaf to the right of its stem, spacing the leaves equally
• Include a key and title for the graph
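The steps above translate fairly directly into code. This sketch rests on several assumptions. `stem_and_leaf` is a hypothetical helper, and the sample values are made up rather than taken from the table. The data are assumed to be non-negative integers, so `divmod(v, 10)` gives the stem (all but the rightmost digit) and the leaf (the final digit), matching the 123 -> 12 : 3 example.

```python
from collections import defaultdict


def stem_and_leaf(values, title, key_value):
    """Return the rows of a stem-and-leaf plot for non-negative integers."""
    data = sorted(values)  # order the data from least to greatest
    leaves = defaultdict(list)
    for v in data:
        stem, leaf = divmod(v, 10)  # stem = all but the last digit, leaf = last digit
        leaves[stem].append(leaf)
    rows = [title]
    # Stems run down the left of the T-chart in increasing order, including empty ones.
    for stem in range(data[0] // 10, data[-1] // 10 + 1):
        rows.append(f"{stem:>3} | " + " ".join(str(leaf) for leaf in leaves[stem]))
    stem, leaf = divmod(key_value, 10)
    rows.append(f"Key: {stem} | {leaf} = {key_value}")
    return [row.rstrip() for row in rows]


# Hypothetical values for illustration only.
print("\n".join(stem_and_leaf([46, 23, 60, 41, 25, 46], "Demo", 46)))
```

Keeping empty stems (such as 3 and 5 in the demo) matters: dropping them would hide gaps in the distribution.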
Stem and Leaf Example:
• Number of Home Runs in a Single Season
[Stem and leaf plot; Key: 4 | 6 = 46]
Split Stem and Leaf Plot
• If the data in a distribution is concentrated in just a few stems, the picture may be more descriptive if we “split” the stems.
• When we “split” stems, we want the same number of possible leaf digits on each new stem. Since there are 10 possible leaves (0-9), each original stem can be split evenly into 2 new stems (5 leaves each) or 5 new stems (2 leaves each).
• A good rule of thumb is to have a minimum of 5 stems overall.
• Let’s look at how splitting stems changes the look of the distribution of Hank Aaron’s home run data.
Split Stem and Leaf Plot
• Split each stem into 2 new stems. The first new stem holds the leaves 0-4, and the second holds the leaves 5-9.
• Splitting the stems helps us to “see” the shape of the distribution in this case.
[Split stem and leaf plot: Hank Aaron, Number of Home Runs in a Single Season; Key: 4 | 6 = 46]
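Splitting stems changes only which row a leaf lands on. In the hypothetical helper `split_stem_and_leaf` below, `parts` is how many new stems each original stem becomes. A value of 2 gives the 0-4 / 5-9 split described above, and 5 gives rows of two leaf digits each. The helper assumes non-negative integers, and the sample values are invented, not Hank Aaron's actual seasons.

```python
from collections import defaultdict


def split_stem_and_leaf(values, parts=2):
    """Return the rows of a stem-and-leaf plot with each stem split into `parts` new stems.

    parts=2 puts leaves 0-4 on the first copy of a stem and 5-9 on the second;
    parts=5 uses 0-1, 2-3, 4-5, 6-7, 8-9. Assumes non-negative integers.
    """
    if parts not in (1, 2, 5):
        raise ValueError("a stem can only be split into 2 or 5 new stems (or left whole)")
    width = 10 // parts  # possible leaf digits per new stem: 10, 5, or 2

    def row_index(v):
        # Position of v's new stem when all split stems are listed in order.
        return (v // 10) * parts + (v % 10) // width

    data = sorted(values)  # sorted input keeps the leaves in increasing order
    rows = defaultdict(list)
    for v in data:
        rows[row_index(v)].append(v % 10)
    return [
        (f"{i // parts:>3} | " + " ".join(str(leaf) for leaf in rows[i])).rstrip()
        for i in range(row_index(data[0]), row_index(data[-1]) + 1)
    ]


# Hypothetical values for illustration only.
print("\n".join(split_stem_and_leaf([46, 23, 60, 41, 25, 46], parts=2)))
```

With `parts=1` the same function produces an ordinary (unsplit) stem and leaf plot, which makes it easy to compare the two pictures side by side.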
Back-to-Back Stem and Leaf
• Back-to-Back stem and leaf plots allow us to quickly compare two distributions.
• Use SOCS to make comparisons between distributions
[Back-to-back stem and leaf plot: Number of Home Runs in a Single Season; Key: 4 | 6 = 46]
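A back-to-back plot shares one column of stems between two data sets. The second set's leaves go on the right, and the first set's leaves are mirrored on the left. The sketch below uses a hypothetical helper (`back_to_back` is an illustrative name, and the two players' data are made up). It assumes non-negative integers split into tens-digit stems and ones-digit leaves.

```python
from collections import defaultdict


def back_to_back(left_values, right_values):
    """Return rows of a back-to-back stem-and-leaf plot sharing one column of stems.

    Assumes non-negative integers; stem = value // 10, leaf = value % 10.
    Leaves on the left side read outward (right to left) from the stem.
    """
    left, right = defaultdict(list), defaultdict(list)
    for side, values in ((left, left_values), (right, right_values)):
        for v in sorted(values):
            side[v // 10].append(v % 10)
    everything = list(left_values) + list(right_values)
    stems = range(min(everything) // 10, max(everything) // 10 + 1)
    # Reverse the left leaves so the smallest leaf sits next to the stem.
    left_text = {s: " ".join(str(leaf) for leaf in reversed(left[s])) for s in stems}
    left_width = max(len(text) for text in left_text.values())
    stem_width = len(str(stems[-1]))
    rows = []
    for s in stems:
        right_text = " ".join(str(leaf) for leaf in right[s])
        row = f"{left_text[s].rjust(left_width)} | {str(s).rjust(stem_width)} | {right_text}"
        rows.append(row.rstrip())
    return rows


# Hypothetical players A (left) and B (right), for illustration only.
print("\n".join(back_to_back([23, 25, 41], [34, 45, 47])))
```

Because both sides share the same stems, differences in shape, center, and spread can be read straight across each row.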
Advantages and Disadvantages of Dot Plots/Stem and Leaf Plots
• Advantages
- Preserves each piece of data
- Shows features of the distribution's shape, such as clusters, gaps, and outliers
• Disadvantages
- If creating by hand, large data sets can be cumbersome
- Data that vary widely may be difficult to graph