
Upload: darleen-harrison

Post on 26-Dec-2015


Quantitative vs. Categorical Data

• Quantitative data consist of numbers that represent counts or measurements.

• All quantitative data is numerical, but not all numerical data is quantitative.

• Data with a unit of measurement (seconds, feet, pounds, dollars, etc.) is quantitative.

• Numerical data used as a label or range of values (Student ID Number, 20-25 years) is not quantitative.


Examples: Quantitative Data

The University keeps the following quantitative data about each student:

• Grade Point Average

• Number of Credit Hours Completed

• Age

• Amount of money owed for tuition

• Other examples?


Categorical Data

Non-quantitative data is called categorical.

• Non-numerical data must be categorical.

• Numerical data that serves to label or identify individuals is categorical (Example: Social Security Number).

• A useful guide: Would it make sense to consider an average value? If not, treat the data as categorical.
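The "average value" guide can be tried out on a few records. This is a minimal sketch: the student records and field names below are invented for illustration and do not come from any real university system.

```python
from collections import Counter
from statistics import mean

# Hypothetical student records (all values invented for illustration).
students = [
    {"id_number": 900111222, "age": 19, "residency": "in-state"},
    {"id_number": 900333444, "age": 22, "residency": "out-of-state"},
    {"id_number": 900555666, "age": 25, "residency": "in-state"},
]

# Age is quantitative: an average is a meaningful summary.
average_age = mean(s["age"] for s in students)

# ID numbers are numerical, but averaging them describes nothing
# about the students, so we treat them as categorical labels.
average_id = mean(s["id_number"] for s in students)

# Categorical data is usually summarized with counts instead.
residency_counts = Counter(s["residency"] for s in students)

print("average age:", average_age)
print("average ID number (not meaningful):", average_id)
print("residency counts:", dict(residency_counts))
```

The code will happily compute an average ID number. The point of the guide is that nobody would want to interpret it.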


Examples: Categorical Data

The University keeps the following categorical data about each student:

• Name

• Laker ID Number

• Date of Birth

• Gender

• Residency (“in-state” or “out-of-state”)

• Other?

Chapter One, Part 2

1.4: Critical Thinking in Statistics

1.5: Collecting Sample Data

Quick Review

• We typically have a VERY LARGE set of individuals (called the Population), but we cannot obtain data from every individual.

• A parameter is a numerical value that describes the population. The actual value is not known.

• We choose a subset of the Population (this subset is called a Sample), and gather data from those individuals.

• A statistic is a numerical value, computed from the Sample data. We often use this to estimate the unknown value of some parameter.
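The difference between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population below is made up, and in a real study we would not be able to compute the parameter at all.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 10,000 students (invented data).
population = [rng.randint(17, 60) for _ in range(10_000)]

# Parameter: describes the whole population (normally unknown).
parameter = mean(population)

# Statistic: computed from a sample of 50 individuals.
sample = rng.sample(population, 50)
statistic = mean(sample)

print(f"parameter (population mean age): {parameter:.2f}")
print(f"statistic (sample mean age):     {statistic:.2f}")
```

The two values are usually close but not equal, which is why the statistic is described as an estimate.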


Population vs. Sample (not to scale!!)

08/15/11

Sampling Bias

Ideally, we want our sample to be representative of the overall population.

If the way we choose the sample and/or gather data from the chosen individuals is more/less likely to include a certain type of individual or produce a certain type of response, then the conclusions we draw from the sample might be inaccurate for the intended population.

This is called bias. Examples to follow.


Examples of Bias

Estimate average class height using a sample of students from the front row.

Estimate average class height using a sample of male students.

Study the effectiveness of a weight-loss diet using a sample of professional athletes.

Estimate what percent of Americans approve of the president using a sample of voters from only one political party.
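The second example (sampling only male students) can be sketched in a few lines. The heights below are invented for illustration, and the class is assumed to have six men and six women.

```python
import random
from statistics import mean

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical class heights in inches (invented for illustration).
men = [68, 70, 71, 72, 69, 73]
women = [62, 64, 65, 63, 66, 61]
whole_class = men + women

# The value we actually want to estimate.
class_average = mean(whole_class)

# A biased sample: four students drawn from the men only.
biased_estimate = mean(rng.sample(men, 4))

print(f"true class average:        {class_average}")
print(f"estimate from men only:    {biased_estimate}")
```

With these made-up numbers, every possible sample of four men overestimates the class average, no matter which four are picked.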


Common Types of Sampling Bias

• A voluntary response sample occurs when the individuals to be studied have control over whether or not they are included in the sample.

– This is also called “self-selection bias.”

• A convenience sample occurs when the researcher is more likely to choose individuals for whom it is easier to obtain data.

– The researcher might be unaware of this!

• Small sample: Using too few individuals increases the chance of getting a sample that consists only of “unusual” individuals.


Example: Voluntary Response

• Ratemyprofessors.com is a website that collects information about college professors from their students.

• The ratings come from students volunteering to create an account and submit information.

• Question: What kind of students are likely to volunteer?


Voluntary Response Bias

• Answer: Students with stronger opinions are more likely to volunteer a response.

• In many “customer satisfaction” surveys, those with a strong negative opinion are most likely to volunteer. Those with a neutral opinion are least likely to volunteer.

• There is potential bias: those in the sample are more likely to have a negative opinion than the entire population.
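A rough simulation shows how this bias can arise. Everything here is assumed for illustration: ratings run from 1 (very negative) to 5 (very positive), and the chance of volunteering for each rating is a made-up number chosen to match the pattern described above.

```python
import random
from statistics import mean

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 customer ratings, 1 (worst) to 5 (best).
population = [rng.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]

# Assumed chance that a customer with each rating volunteers a response:
# strong negative opinions respond most, neutral opinions respond least.
response_chance = {1: 0.50, 2: 0.10, 3: 0.02, 4: 0.10, 5: 0.25}

# Each customer decides for themselves whether to respond.
volunteers = [r for r in population if rng.random() < response_chance[r]]

print(f"population mean rating: {mean(population):.2f}")
print(f"volunteer mean rating:  {mean(volunteers):.2f}")
print(f"number of volunteers:   {len(volunteers)}")
```

Under these assumptions the volunteers rate noticeably lower than the population as a whole, even though nobody gave a dishonest answer.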


Example: Convenience Sample

• I want to determine the average age of all current Clayton State students.

• For my sample data, I choose five students from the class and compute their average age.

• Why might this lead to inaccurate results?

• Although my intended population is all CSU students, I picked only from a small part of the population (that was most convenient for me).


Other Common Problems

• Some types of bias occur not in choosing the sample, but in gathering data from the chosen individuals.

• Misreported data: Individuals may give inaccurate results (perhaps unintentionally) when asked a certain question.

– Example: How much do you weigh today?

– Example: How many hours per week do you study?

• Question wording: Variations in the wording of a question can greatly influence people’s responses. Compare:

– Should the government spend more money on public education?

– Should the government spend more of your tax dollars on public education?


Good Ways to Sample

• We usually want to have some degree of randomness when choosing our sample.

• Randomly-chosen samples reduce the potential for biased results, but complete randomness is not always possible.

• We’ll talk more about what “random” actually means in Chapters 4 and 5.


*** Simple Random Sample ***

• All of the statistical inference in this course will assume that data comes from a Simple Random Sample (SRS). This means…

– Before choosing individuals, we decide how many we want to use. This is called the sample size, usually denoted by the letter n.

– We choose the sample so that each group of n individuals (from the overall population) is equally likely to be picked.
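One way to draw an SRS on a computer is Python's built-in `random.sample`, which picks n distinct items without replacement. The population list below is a hypothetical roster invented for illustration.

```python
import random

rng = random.Random(4)  # fixed seed so the run is repeatable

# Hypothetical population: a roster of 200 student labels.
population = [f"student_{i:03d}" for i in range(1, 201)]

# Decide the sample size first...
n = 10

# ...then draw n distinct individuals without replacement.
srs = rng.sample(population, n)

print(srs)
```

The sample size is fixed before any individual is chosen, and no individual can appear twice.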


Other “Good” Samples

• Random Sample: Each individual from the population has an equal chance of being chosen for the sample (every SRS is a Random Sample, but not every Random Sample is an SRS).

• Probability Sample: For each individual, we know the chance that he/she will be chosen for the sample, but different individuals may have different chances (Random Samples and SRS’s are special cases of this).


Other “Good” Samples

• Stratified Sample: Divide the population into mutually exclusive groups (strata), and choose a sample (often an SRS) from within each group.

– This is often done if we want to account for some kind of population demographics.

– Example: If our population is 60% women and 40% men, we might choose a sample of 30 women and 20 men. Sample demographics match those of the population.
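The 60%/40% example can be sketched as follows. The population size (1,000) and the labels are assumptions made for illustration. Only the proportions and the total sample of 50 come from the example above.

```python
import random

rng = random.Random(5)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000: 60% women, 40% men.
strata = {
    "women": [f"W{i}" for i in range(600)],
    "men": [f"M{i}" for i in range(400)],
}

total_sample_size = 50
population_size = sum(len(members) for members in strata.values())

sample = {}
for name, members in strata.items():
    # Give each stratum a share of the sample equal to its share of
    # the population. (With other proportions, rounding may make the
    # shares add up to slightly more or less than the intended total.)
    share = round(total_sample_size * len(members) / population_size)
    # Draw an SRS from within the stratum.
    sample[name] = rng.sample(members, share)

print({name: len(chosen) for name, chosen in sample.items()})
```

This should give 30 women and 20 men, matching the example.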


Other “Good” Samples

• Cluster Sampling: Divide the population into mutually exclusive groups (clusters), and randomly choose a set of clusters. For each selected cluster, gather data from all individuals within that cluster.

– Example: There are 13 sections of Math 1231 currently offered at Clayton State. Randomly choose 3 sections, and survey all students from each of those 3 sections.
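The Math 1231 example can be sketched like this. The section names and the class size of 25 students per section are assumptions made for illustration.

```python
import random

rng = random.Random(6)  # fixed seed so the run is repeatable

# Hypothetical rosters: 13 sections with 25 students each.
sections = {
    f"section_{k:02d}": [f"s{k:02d}_{j:02d}" for j in range(25)]
    for k in range(1, 14)
}

# Randomly choose 3 whole sections (the clusters)...
chosen_sections = rng.sample(sorted(sections), 3)

# ...and survey every student in each chosen section.
surveyed = [
    student
    for name in chosen_sections
    for student in sections[name]
]

print(chosen_sections)
print("students surveyed:", len(surveyed))
```

The randomness here is in choosing sections, not individual students. Once a section is chosen, everyone in it is included.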


Other Types of Samples?

• Real-world statistics often uses very complex sampling methods. See the text for a survey of some of these.

• The trade-off: A simple method (like an SRS) is easier to analyze mathematically, but often more difficult to achieve in practice.


How are the data obtained?

• In addition to how the individuals are chosen for a sample, we distinguish between the following two scenarios:

– Observational Study: Simply observe and/or measure individuals, without attempting to modify their characteristics or behavior.

– Experiment: Deliberately impose a specific set of conditions (a treatment) on each individual. A valid experiment has more than one possible treatment, and we can compare results.


Example: Experiment vs. Observational Study

• Question: Is there any sort of relationship between caffeine use and exam scores? Here is an Observational Study:

– Just before the exam, record how much caffeine each student has consumed today.

– Record each student’s exam score.

• It will probably be the case that different students consume different amounts of caffeine, but we do not deliberately try to create this difference.


Example: Experiment vs. Observational Study

• Question: Is there any sort of relationship between caffeine use and exam scores? Here is an Experiment:

– Require students to consume no caffeine on exam day.

– 15 minutes before the exam, give each student a cup of coffee. Some students will get regular coffee (with caffeine), others will get decaffeinated coffee.

– Record each student’s exam score.

• It will certainly be the case that different students consume different amounts of caffeine, because we deliberately created such a difference.
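The slide does not say how to decide which students get which coffee. One common choice, assumed here, is to assign the treatments at random. A minimal sketch, using a made-up roster of 20 students:

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical roster of 20 students.
students = [f"student_{i:02d}" for i in range(1, 21)]

# Shuffle a copy of the roster, then split it in half.
shuffled = students[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2

groups = {
    "regular": shuffled[:half],  # coffee with caffeine
    "decaf": shuffled[half:],    # decaffeinated coffee
}

for treatment, members in groups.items():
    print(treatment, len(members), members)
```

Every student ends up in exactly one group, and the experimenter's own preferences play no part in who gets which treatment.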


Questions for Discussion

• In the Experiment, why not give all students a cup of (caffeinated) coffee?

• In the Experiment, why not use “regular coffee” versus “no coffee”?

• What are some advantages/disadvantages of the Study versus the Experiment?

• Can you think of a scenario where it would not be possible to do an Experiment?
