Download - Chapter 1 Why Statistics?. 2 Learning can result from: Critical thinking Asking an authority Religious experience However, collecting DATA is the surest

Chapter 1

Why Statistics?


Learning can result from:
– Critical thinking
– Asking an authority
– Religious experience

However, collecting DATA is the surest way to learn about the world.


Data in the Sciences are messy

At first glance, data often look like an incoherent jumble of numbers

How do we make sense of data?

Statistical procedures are tools for learning about the world by learning from data.


Real Data!

To help you understand the power and usefulness of statistics, we will explore two real and interesting data sets:
– “The Smoking Study”
– “The Maternity Study”


The Smoking Study

From the University of Wisconsin Center for Tobacco Research and Intervention

608 participants provided data on smoking, addiction, withdrawal, and how best to quit smoking

The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book


The Maternity Study

From the Wisconsin Maternity Leave and Health Project

244 families provided data on marital satisfaction, child-rearing styles, and other household events

The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book


Variability

Why are data messy? Consider a concrete example:

Depression scores (“CESD”) for participants in the Smoking Study

Some participants (each has a different ID number) have CESD scores of 0, while others have scores of 2, 11 or 7, or some other value

These data are messy in that the scores are different from one another

Variability is the statistical term for the degree to which scores (such as the depression scores) differ from one another.


Sources of Variability

It is easy to see that depression scores are variable, but why?

– Individual differences
  Some people are more depressed than others
  Some people have difficulty reading and understanding the questions on the test
  Some people answer the questions more honestly than others
– Procedure
  Differences in the ways the data were collected
– Conditions or Treatments
  The conditions that are imposed on the participants of the study


Populations and Samples

Statistical Population – a collection or set of measurements of a variable that share some common characteristic

Sample – a subset of measurements from a population

Random sample – a sample selected such that every score in the population has an equal chance of being included
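The definition of a random sample can be illustrated with a minimal sketch. The population below is made up for illustration (608 hypothetical scores, not the Smoking Study data), and the fixed seed is only there so the sketch is repeatable.

```python
import random

# Hypothetical population: 608 made-up scores standing in for real measurements.
rng = random.Random(1)  # fixed seed only so the sketch gives the same result each run
population = [rng.randint(0, 50) for _ in range(608)]

# sample() draws without replacement, giving every score in the
# population the same chance of being included.
sample = rng.sample(population, k=60)

print(len(population), len(sample))
```

Drawing without replacement means no single measurement is counted twice in the sample.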

Chapter 2

Frequency Distributions and Percentiles

Variability (revisited)

Collecting data means measuring a variable

Those measurements differ (vary) from one another

One way to organize and summarize a set of measurements is to construct a frequency distribution

These methods can be applied to both populations and samples

Example

5 13 17 20 19 35 21 28 3 22

26 13 30 30 30 32 40 27 14 4

27 33 28 45 29 25 38 35 33 39

5 4 20 24 25 27 16 25 38 9

36 20 18 11 12 23 22 27 32 49

22 30 0 32 4 23 9 29 22 23

YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study

Example

0 3 4 4 4 5 5 9 9 10

11 13 13 14 16 17 18 19 20 20

20 21 22 22 22 22 23 23 23 24

25 25 25 26 27 27 27 27 28 28

29 29 30 30 30 30 32 32 32 33

33 35 35 36 38 38 39 40 45 49


A Better Summary?

Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Proportion
0 - 4            5           .083                 5                      .083
5 - 9            4           .067                 9                      .150
10 - 14          5           .083                 14                     .233
15 - 19          4           .067                 18                     .300
20 - 24          12          .200                 30                     .500
25 - 29          12          .200                 42                     .700
30 - 34          9           .150                 51                     .850
35 - 39          6           .100                 57                     .950
40 - 44          1           .017                 58                     .967
45 - 49          2           .033                 60                     1.000
Total (n)        60          1.000
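As a rough sketch of how such a table might be built, the code below counts the 60 YRSMK scores from the first example into class intervals of width 5. The function name `frequency_table` and its parameters are illustrative choices, not something taken from the book.

```python
# YRSMK scores for the first 60 participants, as listed in the first example.
yrsmk = [
    5, 13, 17, 20, 19, 35, 21, 28, 3, 22,
    26, 13, 30, 30, 30, 32, 40, 27, 14, 4,
    27, 33, 28, 45, 29, 25, 38, 35, 33, 39,
    5, 4, 20, 24, 25, 27, 16, 25, 38, 9,
    36, 20, 18, 11, 12, 23, 22, 27, 32, 49,
    22, 30, 0, 32, 4, 23, 9, 29, 22, 23,
]

def frequency_table(scores, width=5, low=0, high=49):
    """Return one row per class interval:
    (start, end, frequency, relative frequency,
     cumulative frequency, cumulative proportion)."""
    n = len(scores)
    rows = []
    cumulative = 0
    for start in range(low, high + 1, width):
        end = start + width - 1
        freq = sum(start <= s <= end for s in scores)  # scores inside this interval
        cumulative += freq
        rows.append((start, end, freq, freq / n, cumulative, cumulative / n))
    return rows

table = frequency_table(yrsmk)
for start, end, freq, rel, cum, cum_prop in table:
    print(f"{start:2d} - {end:2d}  {freq:3d}  {rel:.3f}  {cum:3d}  {cum_prop:.3f}")
```

The relative frequency is the interval's frequency divided by n, and the cumulative proportion is the running total divided by n, so the last row always reaches 1.000.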


Graphing Distributions

Percentiles

We have been focusing on distributions rather than individual scores

Sometimes, individual scores are of great importance

Computing percentiles, when n = 608:

The 50th percentile is the “middle” score. It is the 304th sorted score (608 × 0.50 = 304).

The 32nd percentile falls at position 608 × 0.32 = 194.56, which rounds up to the 195th sorted score.
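The position arithmetic above can be checked with a short sketch. It assumes the convention suggested by the worked example (multiply n by the percentile, then round up to the next whole position); other textbooks and software use slightly different rules, and the function name is a hypothetical one.

```python
def percentile_position(n, percent):
    """1-based position of a percentile among n sorted scores, rounded up.

    Uses integer arithmetic, equivalent to ceil(n * percent / 100),
    so there is no floating-point rounding to worry about.
    """
    return -(-n * percent // 100)

print(percentile_position(608, 50))  # 304
print(percentile_position(608, 32))  # 195
```

With the scores sorted in ascending order, the score sitting at the returned position is the requested percentile under this convention.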

Percentile Rank

The percentile rank of a score is the percent (the proportion times 100) of the measurements in the distribution below that score value

Computing percentile rank for YRSMK:

Sort the variable; call the result YRSMK_sorted

The proportion of scores below 9 is 50/608 = 0.082, so its percentile rank is 8 (it is the 8th percentile)

The proportion of scores below 21 is 246/608 = 0.4046053, so its percentile rank is 40 (it is the 40th percentile)
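A minimal sketch of the percentile-rank computation follows. The full 608-score YRSMK variable is not reproduced here, so the sketch first re-checks the two proportions quoted above and then applies the definition to a short made-up list; `percentile_rank` and `demo_sorted` are illustrative names.

```python
from bisect import bisect_left

def percentile_rank(sorted_scores, value):
    """Percent of measurements strictly below `value` (scores must be sorted)."""
    below = bisect_left(sorted_scores, value)  # number of scores < value
    return 100 * below / len(sorted_scores)

# Re-checking the arithmetic quoted for the full data set (n = 608).
rank_of_9 = 100 * 50 / 608
rank_of_21 = 100 * 246 / 608
print(round(rank_of_9, 1), round(rank_of_21, 1))  # 8.2 40.5

# Made-up sorted scores, only to show the function at work.
demo_sorted = [0, 3, 4, 4, 5, 9, 9, 13, 20, 21]
print(percentile_rank(demo_sorted, 9))  # 50.0
```

Because only scores strictly below the value are counted, tied scores such as the two 9s in the made-up list do not raise each other's rank.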

Graphing Distributions

Graphing distributions is a very valuable tool for highlighting features of the data
– Shape
– Range
– Central Tendency
– Variability

Shape

We classify the shape of distributions in three ways:
– Symmetry – is one half a mirror image of the other half?
– Skew – are there high/low frequencies of low/high scores?
– Modality – how many humps or modes?

Symmetry

Is one half of the distribution a mirror image of the other (along a vertical axis)?

Three examples of symmetrical distributions:

Skew

Positive – high frequencies of low values and low frequencies of high values

Negative – low frequencies of low values and high frequencies of high values

Modality

How many humps (or modes)?
– Unimodal
– Bimodal

Characterizing Shape

– Asymmetric, negatively skewed, bimodal
– Asymmetric, positively skewed, unimodal

Central Tendency and Variability

In addition to shape, distributions differ in terms of:
– Central Tendency – scores near the center of the distribution; where the scores “tend” to be
– Variability – the degree to which scores differ from one another; the “spread” of the scores
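To make the distinction concrete, here is a small sketch with two made-up sets of scores that share the same center but differ in spread. The choice of the mean as the measure of central tendency, and of the range and population standard deviation as measures of variability, is an assumption for illustration; the slides have not yet named specific measures.

```python
import statistics

# Two hypothetical distributions with the same center but different spread.
scores_a = [18, 19, 20, 21, 22]  # tightly clustered
scores_b = [0, 10, 20, 30, 40]   # widely spread

for name, scores in (("A", scores_a), ("B", scores_b)):
    center = statistics.mean(scores)        # central tendency
    value_range = max(scores) - min(scores)  # simplest measure of spread
    spread = statistics.pstdev(scores)       # population standard deviation
    print(name, center, value_range, round(spread, 2))
```

Both sets are centered on 20, yet the scores in B differ from one another far more than the scores in A.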

Comparing Distributions

It is very useful to be able to compare and contrast distributions (name their similarities and differences)

Distributions can differ in terms of shapes, central tendencies, and variability

Comparing Distributions

How do these distributions differ?
