Post on 05-Jan-2016

A Short Tour of Probability & Statistics

Presented by:

Nick Bennett, Grass Roots Consulting & GUTS

Josh Thorp, Stigmergic Consulting & GUTS

Irene Lee, Santa Fe Institute, GUTS

November 6, 2010

Santa Fe Alliance for Science

Professional Enrichment Activity

Outline

• Framing the problem (Nick)

• Review of Statistics (Irene)

• Randomness (Nick)

• Dice & Data (Josh)

• Problem of Points (Nick)

• Crosswalk of Common Core standards (Irene)

What is Statistics?

• The science of collecting, organizing and interpreting data.

What do Statisticians do?

• Data analysis

• Probability

• Statistical inference

What is a Statistical question?

• One that anticipates variability.

• Compare:

• “How old am I?”

• “How old are the students in my school?”

Describing Data

• Data (plural) are the raw material

• Data are the numbers we use to interpret reality

• We will look at a few different ways of describing data:

• Dot plot

• Frequency table

• Stem-and-leaf diagram

A Sample Data Set

• 92 Penn State students’ weights

• MALES: 140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155, 153, 145, 170, 175, 175, 170, 180, 135, 170, 157, 130, 185, 190, 155, 170, 155, 215, 150, 145, 155, 155, 150, 155, 150, 180, 160, 135, 160, 130, 155, 150, 148, 155, 150, 140, 180, 190, 145, 150, 164, 140, 142, 136, 123, 155.

• FEMALES: 140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130, 131, 120, 118, 125, 135, 125, 118, 122, 115, 102, 115, 150, 110, 116, 108, 95, 125, 133, 110, 150, 108.

Dot Plot

In a dot plot, one dot per student goes over each student’s reported weight.
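As a rough illustration, a dot plot can be sketched in plain text. The snippet below is a minimal sketch, not part of the original slides: it uses only the first 14 female weights from the sample data set, turns the plot on its side (one row per reported weight), and the helper name `dot_plot_rows` is made up for this example.

```python
from collections import Counter

# First 14 female weights from the sample data set (pounds).
weights = [140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130]


def dot_plot_rows(values):
    """Return one text row per distinct value, with one mark per observation."""
    counts = Counter(values)
    return [f"{value:>3} | {'*' * counts[value]}" for value in sorted(counts)]


rows = dot_plot_rows(weights)
for row in rows:
    print(row)
```

Each `*` stands for one student, so repeated weights such as 130 show up as longer rows.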

Frequency table -> Histogram

Divide the number line into intervals and count the number of students' weights that fall within each interval.

The “frequency” is the count in any given interval.

The “relative frequency” is the proportion of weights in each interval.
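The counting described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the first 14 female weights, an interval width of 10 pounds chosen for the example, and a hypothetical helper named `frequency_table`.

```python
from collections import Counter

# First 14 female weights from the sample data set (pounds).
weights = [140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130]


def frequency_table(values, width=10):
    """Return (low, high, frequency, relative frequency) for each interval [low, high)."""
    # Map every value to the lower edge of its interval, then count.
    counts = Counter((v // width) * width for v in values)
    n = len(values)
    # Walk every interval between the smallest and largest, including empty ones.
    lows = range(min(counts), max(counts) + width, width)
    return [(low, low + width, counts[low], counts[low] / n) for low in lows]


table = frequency_table(weights)
for low, high, count, rel in table:
    # The row of '#' marks is a sideways text version of the histogram bar.
    print(f"{low}-{high - 1}: {count:2d}  {rel:.2f}  {'#' * count}")
```

The relative frequencies add up to 1, since every weight lands in exactly one interval.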

Histograms

• From the frequency table, we can make a bar graph called a histogram.

• Each bar covers an interval and is centered at the midpoint.

• The height of the bar corresponds with the number of data points in the interval

Stem-and-Leaf Diagram

Summarizes the data and still shows every data point.

• The STEM shows intervals (ranges in tens)

• The LEAVES show data points (ranges in ones)

• Put the leaves in order

• Is there evidence of reporting bias?
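The steps above can be sketched as follows. This is a minimal example, assuming whole-number weights and using only the first 14 female weights; the function name `stem_and_leaf` is invented for the illustration.

```python
from collections import defaultdict

# First 14 female weights from the sample data set (pounds).
weights = [140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130]


def stem_and_leaf(values):
    """Group sorted values by their tens (stem) and keep the ones digit (leaf)."""
    stems = defaultdict(list)
    for v in sorted(values):  # sorting first puts the leaves in order
        stems[v // 10].append(v % 10)
    return {stem: stems[stem] for stem in sorted(stems)}


diagram = stem_and_leaf(weights)
for stem, leaves in diagram.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

In this subset most leaves are 0 or 5, which may hint that self-reported weights were rounded; that is the kind of pattern the reporting-bias question invites you to look for.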

Summary Statistics

• Central or typical value

• Spread about that value

Measures of Center

• Mean

• Median

The Mean

• Given $x_1, x_2, x_3, \ldots, x_n$, the mean is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
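As a small worked example, the mean is just the sum of the values divided by how many there are. The sketch below applies that to the first 14 female weights from the sample data set; the function name `mean` is chosen for the example.

```python
# First 14 female weights from the sample data set (pounds).
weights = [140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130]


def mean(values):
    """Add up all the values and divide by the number of values."""
    return sum(values) / len(values)


x_bar = mean(weights)
print(f"sum = {sum(weights)}, n = {len(weights)}, mean = {x_bar:.1f}")
```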

The Median

• The midpoint of the data, once the data are put in order

• If odd number of data points, it is the middle value

• If even number of data points, average the two data points nearest the middle.
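A minimal sketch of the median, using the usual convention: with an odd number of data points take the middle value of the sorted data, and with an even number average the two values nearest the middle. The data are the first 14 female weights, and the function name `median` is chosen for the example.

```python
# First 14 female weights from the sample data set (pounds).
weights = [140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130]


def median(values):
    """Return the middle of the sorted values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two values nearest the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median(weights))       # even count (14 values)
print(median(weights[:13]))  # odd count (13 values)
```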

Why two measures of center?

Measures of Spread

• Interquartile range

• Standard deviation

Interquartile range

• Put the data in numerical order

• Divide the data set into two equal groups with the median as the center point.

• The median of the low group = 1st quartile

• The median of the high group = 3rd quartile

• $\mathrm{IQR} = Q_3 - Q_1$
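The quartile recipe above can be sketched like this. It is one convention among several (software packages often interpolate differently), and it assumes that when the number of data points is odd, the median itself is left out of both groups. The data are the first 14 female weights; `median` and `quartiles` are names chosen for the example.

```python
# First 14 female weights from the sample data set (pounds).
weights = [140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130]


def median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as the medians of the low and high halves of the data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    low_group = ordered[:half]                   # values below the median
    high_group = ordered[len(ordered) - half:]   # values above the median
    return median(low_group), median(high_group)


q1, q3 = quartiles(weights)
iqr = q3 - q1
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```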

Box & Whiskers plot

[Figure: box-and-whisker plot with Q1, the median and Q3 marked, and a distance of 1.5 IQR marked on each side.]

Standard deviation

• Average squared distance $= \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$

• Sample variance $S^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

• Standard deviation $= S = \sqrt{S^2}$
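The three quantities differ only in the divisor and the final square root, which a short sketch makes visible. The function names below are invented for this illustration, and the data are the first 14 female weights from the sample data set.

```python
import math

# First 14 female weights from the sample data set (pounds).
weights = [140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130]


def squared_distances(values):
    """Squared distance of each value from the mean."""
    x_bar = sum(values) / len(values)
    return [(x - x_bar) ** 2 for x in values]


def average_squared_distance(values):
    """Sum of squared distances divided by n."""
    return sum(squared_distances(values)) / len(values)


def sample_variance(values):
    """Sum of squared distances divided by n - 1."""
    return sum(squared_distances(values)) / (len(values) - 1)


def standard_deviation(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


print(f"average squared distance = {average_squared_distance(weights):.1f}")
print(f"sample variance S^2      = {sample_variance(weights):.1f}")
print(f"standard deviation S     = {standard_deviation(weights):.1f}")
```

Dividing by n - 1 instead of n always gives a slightly larger number, and the gap shrinks as the sample grows.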

Z-scores, Standardized Scores

• A student weighing 175 pounds has a z-score of 1.26:

$$z_i = \frac{x_i - \bar{x}}{S}$$

$$\frac{175 - 145.2}{23.7} \approx 1.26$$
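The figures in this example can be checked against the sample data set. The sketch below retypes all 92 weights from the earlier slide and uses Python's standard `statistics` module for the mean and the sample standard deviation; the helper name `z_score` is chosen for the example.

```python
import statistics

# All 92 weights from the sample data set (pounds).
males = [
    140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155, 153, 145,
    170, 175, 175, 170, 180, 135, 170, 157, 130, 185, 190, 155, 170, 155,
    215, 150, 145, 155, 155, 150, 155, 150, 180, 160, 135, 160, 130, 155,
    150, 148, 155, 150, 140, 180, 190, 145, 150, 164, 140, 142, 136, 123,
    155,
]
females = [
    140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130,
    131, 120, 118, 125, 135, 125, 118, 122, 115, 102, 115, 150, 110, 116,
    108, 95, 125, 133, 110, 150, 108,
]
weights = males + females


def z_score(x, x_bar, s):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - x_bar) / s


x_bar = statistics.mean(weights)  # sample mean
s = statistics.stdev(weights)     # sample standard deviation (divides by n - 1)
z = z_score(175, x_bar, s)
print(f"mean = {x_bar:.1f}, S = {s:.1f}, z(175) = {z:.2f}")
```

A positive z-score means the student is heavier than average; a student below the mean would get a negative one.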

Summary:

• Several ways to display data

• Measures of Center

• Measures of Spread

• Standard deviations

Statistical inference

• Use random sampling to draw inferences about a population.

• Generalizations about a population from a sample are valid only if the sample is representative of that population.

Sampling

• With replacement.

• Without replacement.
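The difference between the two kinds of sampling can be sketched with Python's standard `random` module. This is an illustration only: the student ID numbers 1 to 92, the sample size of 10 and the seed are all assumptions made for the example.

```python
import random

rng = random.Random(2010)  # fixed seed (arbitrary) so the example is repeatable

# Hypothetical ID numbers for 92 students.
population = list(range(1, 93))

# With replacement: each draw goes back in the pool, so an ID can repeat.
with_replacement = rng.choices(population, k=10)

# Without replacement: once drawn, an ID cannot be drawn again.
without_replacement = rng.sample(population, k=10)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Sampling without replacement can never ask for more items than the population holds, while sampling with replacement can.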
