Quantitative vs. Categorical Data
• Quantitative data consist of numbers that represent counts or measurements.
• All quantitative data is numerical, but not all numerical data is quantitative.
• Data with a unit of measurement (seconds, feet, pounds, dollars, etc.) is quantitative.
• Numerical data used as a label or range of values (Student ID Number, 20-25 years) is not quantitative.
Examples: Quantitative Data
The University keeps the following quantitative data about each student:
• Grade Point Average
• Number of Credit Hours Completed
• Age
• Amount of money owed for tuition
• Other examples?
Categorical Data
Non-quantitative data is called categorical.
• Non-numerical data must be categorical.
• Numerical data that serves to label or identify individuals is categorical (Example: Social Security Number).
• A useful guide: Would it make sense to consider an average value? If not, treat the data as categorical.
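The "average" guide can be tried on a few made-up student records. In this minimal sketch, the field names and all values are assumptions for illustration only: averaging GPA gives a meaningful summary, while averaging ID numbers yields a number that describes no one.

```python
from statistics import mean

# Hypothetical student records; every value here is invented for illustration.
students = [
    {"id_number": 900123456, "gpa": 3.2, "residency": "in-state"},
    {"id_number": 900654321, "gpa": 2.8, "residency": "out-of-state"},
    {"id_number": 900111222, "gpa": 3.6, "residency": "in-state"},
]

# GPA is quantitative: its average is a meaningful summary of the group.
average_gpa = mean(s["gpa"] for s in students)

# ID numbers are numerical, but averaging them is meaningless:
# the result identifies no student and measures nothing.
average_id = mean(s["id_number"] for s in students)

print(f"Average GPA: {average_gpa:.2f}")
print(f"'Average' ID number (not meaningful): {average_id}")
```

Residency is non-numerical, so the question of an average never arises; it is categorical by the first rule above.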
Examples: Categorical Data
The University keeps the following categorical data about each student:
• Name
• Laker ID Number
• Date of Birth
• Gender
• Residency (“in-state” or “out-of-state”)
• Other?
Quick Review
• We typically have a VERY LARGE set of individuals (called the Population), but we cannot obtain data from every individual.
• A parameter is a numerical value that describes the population. The actual value is not known.
• We choose a subset of the Population (this subset is called a Sample), and gather data from those individuals.
• A statistic is a numerical value, computed from the Sample data. We often use this to estimate the unknown value of some parameter.
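The relationship between a parameter and a statistic can be sketched with a simulated population. The population below is randomly generated (an assumption for illustration, not real student data). In a simulation we can compute the parameter directly, which is exactly what we usually cannot do in practice.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 10,000 students (simulated, not real data).
population = [rng.randint(17, 60) for _ in range(10_000)]

# Parameter: describes the whole population. Normally its value is unknown.
population_mean = mean(population)

# Statistic: computed from a sample of 50 individuals.
sample = rng.sample(population, 50)
sample_mean = mean(sample)

print(f"Parameter (population mean age): {population_mean:.2f}")
print(f"Statistic (sample mean age):     {sample_mean:.2f}")
```

The two printed values will generally be close but not equal; the statistic is only an estimate of the parameter.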
08/15/11
Sampling Bias
Ideally, we want our sample to be representative of the overall population.
If the way we choose the sample and/or gather data from the chosen individuals is more/less likely to include a certain type of individual or produce a certain type of response, then the conclusions we draw from the sample might be inaccurate for the intended population.
This is called bias. Examples to follow.
Examples of Bias
Estimate average class height using a sample of students from the front row.
Estimate average class height using a sample of male students.
Study the effectiveness of a weight-loss diet using a sample of professional athletes.
Estimate what percent of Americans approve of the president using a sample of voters from only one political party.
Common Types of Sampling Bias
• A voluntary response sample occurs when the individuals to be studied have control over whether or not they are included in the sample.
– This is also called “self-selection bias.”
• A convenience sample occurs when the researcher is more likely to choose individuals for whom it is easier to obtain data.
– The researcher might be unaware of this!
• Small sample: Using too few individuals increases the chance of getting a sample that consists only of “unusual” individuals.
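The small-sample problem can be illustrated by simulation. The sketch below assumes a made-up population of heights (normally distributed, in inches) and compares how much the sample average varies from sample to sample when n = 3 versus n = 100. The function name and all numbers are hypothetical.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1)  # fixed seed so the run is repeatable

# Simulated population of 5,000 heights in inches (assumed values, not real data).
population = [rng.gauss(67, 4) for _ in range(5_000)]

def spread_of_sample_means(n, repetitions=500):
    """Draw many samples of size n and return the spread of their averages."""
    means = [mean(rng.sample(population, n)) for _ in range(repetitions)]
    return pstdev(means)

small = spread_of_sample_means(3)
large = spread_of_sample_means(100)

print(f"Spread of sample averages, n = 3:   {small:.2f}")
print(f"Spread of sample averages, n = 100: {large:.2f}")
```

A larger spread for n = 3 means that a very small sample is more likely to land far from the population average, which is what a sample of "unusual" individuals looks like in practice.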
Example: Voluntary Response
• Ratemyprofessors.com is a website that collects information about college professors from their students.
• The ratings come from students volunteering to create an account and submit information.
• Question: What kind of students are likely to volunteer?
Voluntary Response Bias
• Answer: Students with stronger opinions are more likely to volunteer a response.
• In many “customer satisfaction” surveys, those with a strong negative opinion are most likely to volunteer. Those with a neutral opinion are least likely to volunteer.
• There is potential bias: those in the sample are more likely to have a negative opinion than the entire population.
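A small simulation can show how this bias arises. In this sketch, the ratings and the chance of volunteering at each rating are invented assumptions, chosen so that strong negative opinions respond most often and neutral opinions least often.

```python
import random
from statistics import mean

rng = random.Random(2)  # fixed seed so the run is repeatable

# Simulated population: 10,000 ratings from 1 (very negative) to 5 (very positive).
population = [rng.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]

# Assumed chance that a person holding each rating volunteers a response.
response_chance = {1: 0.50, 2: 0.15, 3: 0.02, 4: 0.10, 5: 0.25}

# Each individual decides for themselves whether to be in the sample.
volunteers = [r for r in population if rng.random() < response_chance[r]]

print(f"Population average rating: {mean(population):.2f}")
print(f"Volunteer average rating:  {mean(volunteers):.2f}")
print(f"Share of 1-ratings, population: {population.count(1) / len(population):.0%}")
print(f"Share of 1-ratings, volunteers: {volunteers.count(1) / len(volunteers):.0%}")
```

Because the individuals control their own inclusion, the volunteer average drifts toward the negative end even though no one reported a false rating.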
Example: Convenience Sample
• I want to determine the average age of all current Clayton State students.
• For my sample data, I choose five students from the class and compute their average age.
• Why might this lead to inaccurate results?
• Although my intended population is all CSU students, I picked only from a small part of the population (that was most convenient for me).
Other Common Problems
• Some types of bias occur not in choosing the sample, but in gathering data from the chosen individuals.
• Misreported data: Individuals may give inaccurate answers (perhaps unintentionally) when asked a certain question.
– Example: How much do you weigh today?
– Example: How many hours per week do you study?
• Question wording: Variations in the wording of a question can greatly influence people’s responses. Compare:
– Should the government spend more money on public education?
– Should the government spend more of your tax dollars on public education?
Good Ways to Sample
• We usually want to have some degree of randomness when choosing our sample.
• Randomly chosen samples reduce the potential for biased results, but complete randomness is not always possible.
• We’ll talk more about what “random” actually means in Chapters 4 and 5.
*** Simple Random Sample ***
• All of the statistical inference in this course will assume that data comes from a Simple Random Sample (SRS). This means…
– Before choosing individuals, we decide how many we want to use. This is called the sample size, usually denoted by the letter n.
– We choose the sample so that each group of n individuals (from the overall population) is equally likely to be picked.
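As a minimal sketch, an SRS can be drawn in Python with the standard library's random.sample, which selects n distinct items so that every group of n is equally likely. The roster of 200 student labels is a hypothetical population invented for this example.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population: a roster of 200 students.
population = [f"student_{i:03d}" for i in range(1, 201)]

# Step 1: decide the sample size before choosing anyone.
n = 10

# Step 2: draw n distinct individuals; every group of n is equally likely.
srs = rng.sample(population, n)

print(srs)
```

Note that the sample size is fixed first and the selection is made without replacement, so no student can appear twice.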
Other “Good” Samples
• Random Sample: Each individual from the population has an equal chance of being chosen for the sample (every SRS is a Random Sample, but not every Random Sample is an SRS).
• Probability Sample: For each individual, we know the chance that he/she will be chosen for the sample, but different individuals may have different chances (Random Samples and SRS’s are special cases of this).
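To see how a Random Sample can fail to be an SRS, consider a hypothetical class of 10 students seated in two rows of 5, where a coin flip decides which whole row is sampled. This scenario is an illustration added here, not one from the slides. Each student has a 1/2 chance of being chosen, yet only 2 of the many possible groups of 5 can ever occur.

```python
import random
from math import comb

rng = random.Random(4)  # fixed seed so the run is repeatable

# Hypothetical class of 10 students, seated in two rows.
front_row = ["A", "B", "C", "D", "E"]
back_row = ["F", "G", "H", "I", "J"]

# Flip a fair coin to pick one entire row as the sample.
sample = rng.choice([front_row, back_row])

# Every student has the same chance (1/2) of being in the sample,
# so this is a Random Sample in the sense defined above...
groups_possible_here = 2

# ...but an SRS of size 5 must allow every group of 5 with equal chance.
groups_possible_srs = comb(10, 5)

print(f"Sample chosen: {sample}")
print(f"Groups this method can produce: {groups_possible_here}")
print(f"Groups an SRS can produce:      {groups_possible_srs}")
```

A mixed group such as A, B, F, G, H has zero chance under this method, so the "each group of n is equally likely" requirement fails.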
Other “Good” Samples
• Stratified Sample: Divide the population into mutually exclusive groups (strata), and choose a sample (often an SRS) from within each group.
– This is often done if we want to account for some kind of population demographics.
– Example: If our population is 60% women and 40% men, we might choose a sample of 30 women and 20 men. Sample demographics match those of the population.
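The 60%/40% example can be sketched as follows. The population of 1,000 labeled individuals is hypothetical, and allocating the sample to each stratum in proportion to its size is one common choice assumed here; the slides do not prescribe a specific allocation rule.

```python
import random

rng = random.Random(5)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000: 60% women, 40% men.
strata = {
    "women": [f"W{i}" for i in range(600)],
    "men": [f"M{i}" for i in range(400)],
}

sample_size = 50
total = sum(len(group) for group in strata.values())

# Draw an SRS within each stratum, sized in proportion to the stratum.
sample = {}
for name, group in strata.items():
    k = round(sample_size * len(group) / total)
    sample[name] = rng.sample(group, k)

for name, chosen in sample.items():
    print(f"{name}: {len(chosen)} chosen")
```

With these numbers the loop selects 30 women and 20 men, matching the example above.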
Other “Good” Samples
• Cluster Sampling: Divide the population into mutually exclusive groups (clusters), and randomly choose a set of clusters. For each selected cluster, gather data from all individuals within that cluster.
– Example: There are 13 sections of Math 1231 currently offered at Clayton State. Randomly choose 3 sections, and survey all students from each of those 3 sections.
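The section example can be sketched like this. The section names, student labels, and enrollment sizes (20 to 35 per section) are invented assumptions; only the counts of 13 sections and 3 chosen sections come from the example above.

```python
import random

rng = random.Random(6)  # fixed seed so the run is repeatable

# Hypothetical rosters for 13 sections, each with 20 to 35 students.
sections = {
    f"section_{s:02d}": [
        f"s{s:02d}_student_{i:02d}" for i in range(1, rng.randint(20, 35) + 1)
    ]
    for s in range(1, 14)
}

# Randomly choose 3 clusters (sections)...
chosen_sections = rng.sample(sorted(sections), 3)

# ...then survey every student in each chosen section.
surveyed = [student for sec in chosen_sections for student in sections[sec]]

print(f"Chosen sections: {chosen_sections}")
print(f"Students surveyed: {len(surveyed)}")
```

The randomness applies to whole sections rather than to individual students, which is the key difference from a stratified sample.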
Other Types of Samples?
• Real-world statistics often uses very complex sampling methods. See the text for a survey of some of these.
• The trade-off: A simple method (like an SRS) is easier to analyze mathematically, but often more difficult to achieve in practice.
How are the data obtained?
• In addition to how the individuals are chosen for a sample, we distinguish between the following two scenarios:
– Observational Study: Simply observe and/or measure individuals, without attempting to modify their characteristics or behavior.
– Experiment: Deliberately impose a specific set of conditions (a treatment) on each individual. A valid experiment has more than one possible treatment, and we can compare results.
Example: Experiment vs. Observational Study
• Question: Is there any sort of relationship between caffeine use and exam scores? Here is an Observational Study:
– Just before the exam, record how much caffeine each student has consumed today.
– Record each student’s exam score.
• It will probably be the case that different students consume different amounts of caffeine, but we do not deliberately try to create this difference.
Example: Experiment vs. Observational Study
• Question: Is there any sort of relationship between caffeine use and exam scores? Here is an Experiment:
– Require students to consume no caffeine on exam day.
– 15 minutes before the exam, give each student a cup of coffee. Some students will get regular coffee (with caffeine), others will get decaffeinated coffee.
– Record each student’s exam score.
• It will certainly be the case that different students consume different amounts of caffeine, because we deliberately created such a difference.
Questions for Discussion
• In the Experiment, why not give all students a cup of (caffeinated) coffee?
• In the Experiment, why not use “regular coffee” versus “no coffee”?
• What are some advantages/disadvantages of the Study versus the Experiment?
• Can you think of a scenario where it would not be possible to do an Experiment?