
Design of Experiments / Sampling Design

Quality of Data

Hosung Sohn

Department of Public Administration and International Affairs
Maxwell School of Citizenship and Public Affairs

Syracuse University

Lecture Slide 3-2 (September 24, 2015)

Hosung Sohn (Lecture Slide 3-2) Introduction to Statistics: PAI 721 (Fall, 2015)


Table of Contents

1 Design of Experiments

2 Sampling Design


Announcement

Problem Set 1 is due at the end of our class on September 29 (Tuesday).

If you have any questions regarding Problem Set 1, please come by my office during today’s office hours.

Solutions to Problem Set 1 will be posted only on Blackboard on September 29 (Tuesday, after class).


Review of Previous Lecture

The most important factor for sound data analysis: quality of data!

=⇒ You cannot overcome flaws in your data by using statistical methods, regardless of how fancy or sophisticated your methods are.

Carefully examining where the data come from should precede any data analysis.


Review of Previous Lecture

Four main sources of data:

1. Anecdotal evidence

=⇒ Although limited, it can be a very good starting point for your research.

2. Data that are publicly available

=⇒ These data are not created for a specific research purpose, contain only a limited amount of information, and are not available at the individual level.


Review of Previous Lecture

3. Data from sample survey

A. Idea: The idea of a “sample” in a sample survey is to study a part in order to gain information about the whole.

B. Why sample? Because of cost, time, and error.

C. Although useful, it is limited in establishing causality between two variables; it is basically an observational study.

4. Data from experiments

=⇒ To establish causality, data obtained from experiments are considered the only source of convincing data.


Review of Previous Lecture

The vitamin C example: providing vitamin C to reduce the prevalence of late arrivers.

1. Hosung’s initial conclusion was not convincing:

=⇒ no valid “control group.”

2. Hosung’s subsequent conclusion was again not convincing:

=⇒ did not control for the “placebo” effect.

3. Hosung’s third attempt was again not convincing:

=⇒ did not control for “selection bias.”

To prove that vitamin C is an effective way of reducing late arrivers:

=⇒ randomize the treatment assignment!


Randomization

To recap, the friend’s argument is based on the fact that if there is “selection” into the treatment or control groups, then the observed effect may not be due to the treatment itself.

Rather, the effect might reflect a host of other differences between the treatment and control groups.

To put it differently, in order for Hosung to argue in favor of the vitamin C effect, the only difference between the treatment and control groups should be the treatment itself.


Why Randomize the Treatment Assignment?

The solution to achieve the aforementioned goal is to “randomize” the selection of treatment and control groups.

If assignment to either the treatment or control group is done completely at random, it:

1. Prevents any systematic difference between the treatment and control groups.

2. Prevents one from drawing spurious conclusions.


Rationale Behind Randomization

The assignment to the treatment and control groups is determined solely by “chance”:

1. The assignment does not depend on any characteristics of the experimental units.

2. The assignment does not depend on the judgment of the experimenter.

=⇒ It is unlikely that systematic differences arise between the two groups other than the treatment itself.

Randomization is what allows one to be confident that a relationship between X and Y is causal, rather than the result of some lurking or confounding variables.


How Should We Randomize the Assignment?

The idea of randomization is to assign subjects to treatments by drawing numbers from a bowl.

In practice, experimenters use statistical software.

1. Prepare a list of employees’ names.

2. Use a computer to randomly assign a number from 1 to 1,000 to each employee.

3. Assign employees who received an even number to the treatment group.

4. Assign employees who received an odd number to the control group.
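
The four steps above can be sketched in Python. This is only a minimal sketch, not the software used in the lecture: the employee names, the function name `assign_groups`, and the seed are hypothetical. It also assumes that each of the 1,000 employees receives a distinct number (a random permutation of 1 to 1,000), which is what makes the two groups come out at exactly 500 each.

```python
import random

def assign_groups(employees, seed=None):
    """Randomly split employees into a treatment and a control group."""
    rng = random.Random(seed)
    # Step 2: give each employee a distinct random number from 1 to N
    # (assumption: numbers are not repeated across employees).
    numbers = list(range(1, len(employees) + 1))
    rng.shuffle(numbers)
    treatment, control = [], []
    for name, number in zip(employees, numbers):
        # Steps 3 and 4: even numbers go to treatment, odd numbers to control.
        if number % 2 == 0:
            treatment.append(name)
        else:
            control.append(name)
    return treatment, control

# Step 1: a hypothetical list of 1,000 employee names.
employees = [f"employee_{i}" for i in range(1, 1001)]
treatment, control = assign_groups(employees, seed=721)
print(len(treatment), len(control))  # 500 500
```

If the numbers were instead drawn independently for each employee, the groups would still be chosen by chance, but their sizes would generally not be equal.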


Limitations of Randomized Experiments

If we randomly assign individuals to the treatment and control groups, then by virtue of “chance,” the characteristics of individuals, observable or unobservable, are likely to be similar across the two groups.

And that is why, in principle, randomized experiments are considered the “gold” standard for evidence of cause and effect.

Unfortunately, however, randomized experiments are not free of limitations. There are four potential limitations:

1. Experimenter bias.

2. Noncompliance.

3. Attrition.

4. Lack of realism.


Limitation 1: Experimenter Bias

The experimenter often has some preference over the outcome of the experiment.

To obtain a favorable result, the experimenter may communicate his or her expectations to the subjects, so that participants in the experiment alter their behavior to conform to those expectations.

This kind of bias is very likely in experimental studies because experiments are often funded by large grants.

The experimenter usually does not want to make the funder unhappy, so experimenters oftentimes behave in undesirable ways.


Limitation 1: Experimenter Bias

To prevent such experimenter bias, we use “double-blind” experiments.

Double-blind experiments: neither the subjects nor the experimenters know which treatment any subject has received.

The best scenario is one in which no one even knows that they are in an experimental setting.


Limitation 2: Noncompliance

Another potential limitation inherent in any experiment is the issue of noncompliance.

Consider the vitamin C example.

1. Suppose people who were given the GNC® vitamin C are not satisfied with the GNC® vitamin.

2. Then they may purchase the Nature Made® vitamin C and take the Nature Made® vitamin instead.

=⇒ i.e., employees in the control group are not complying with the protocol of the experiment.

3. If there are many noncompliers, it creates a huge problem for the validity of a randomized experiment. Why? Think about the extreme case; i.e., every employee in the control group takes the Nature Made® vitamin.

=⇒ There wouldn’t be a difference in the “treatment” between the treatment and control groups.


Limitation 2: Noncompliance

Preventing noncompliance is extremely hard.

The double-blind strategy is helpful in reducing participants’ incentives not to comply, but it’s not perfect.

In the vitamin C example, we are “giving” vitamin C to employees. We are not putting the vitamin C into the mouths of employees.

If employees who are given vitamin C do not actually “take” the vitamin that they are given, then it may also create an issue of noncompliance, if those who did not take the vitamin C are systematically different between the two groups.


Limitation 2: Noncompliance

In principle, we have to conduct a well-designed experiment in order to prevent noncompliance issues.

But what constitutes a well-designed experiment that is free of noncompliance issues is oftentimes hard to tell beforehand.

The bottom line is that you need to think seriously about whether your experiment is vulnerable to potential noncompliance issues.


Limitation 3: Attrition

Sometimes, it is hard to maintain the same number of subjects until the end of an experiment.

That is, the subjects may “attrit,” or drop out of the experimental setting.

In this case, we say that the experiment suffers from attrition problems.


Limitation 3: Attrition

Consider again the vitamin C example:

1. We divided the employees into two groups of equal size; i.e., 500 in each group.

2. And we decided to give vitamin C pills for 30 days.

3. For reasons we do not know, however, some employees may leave the company before the end of this 30-day period.

4. Those in the treatment group who left the company may be “systematically” different from those in the control group who also left the company.

5. Then the treatment and control groups may no longer be “comparable.”

=⇒ The validity of the experiment is severely compromised.


Limitation 3: Attrition

Attrition is especially likely in the control group because subjects in the control group may not like the fact that they are not receiving the treatment.

Again, preventing attrition is challenging in any experiment.

When you see research based on a randomized experiment, you should be on the lookout for the prevalence of attrition.


Limitation 4: Lack of Realism

The most serious potential weakness of an experiment is lack of realism.

The subjects, treatments, or setting of an experiment may not realistically duplicate the conditions we really want to study.

For example, many experiments in the psychology-related literature use student subjects in a campus setting.

=⇒ It is less likely that the conclusions from these subjects would apply to the real world.

Most experimenters want to generalize their conclusions to some setting wider than that of the actual experiment.

=⇒ But this lack of realism may prevent the experimenters from applying the conclusions of an experiment to the settings of greatest interest.


Implications

The aforementioned limitations notwithstanding, the randomized experiment, because of its ability to give convincing evidence for causation, is one of the most important ideas in statistics.

It is not too much to say that the use of randomized experiments has advanced our knowledge of the truth.

And it is important to note that a good experiment requires:

1. Attention to detail.

2. Good statistical design.


Outline of a Randomized Experiment


Caveats on the Observed Difference

The last step of a randomized experiment is to compare the outcome variable (or response variable) between the treatment and control groups.

=⇒ If there is a difference in the outcome, it doesn’t necessarily imply that the difference is due to the treatment!!

The observed difference in the outcome, if any, has two possible explanations:

1. The difference is due to the treatment, OR

2. The difference is due to “chance.”
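
To see how a difference can arise from “chance” alone, one can simulate experiments in which the treatment has no effect at all. The sketch below is illustrative only: the 20% late-arrival rate, the function name `placebo_difference`, and the number of simulated experiments are assumptions, not figures from the lecture.

```python
import random

def placebo_difference(n_per_group=500, late_rate=0.2, seed=None):
    """Difference in late-arrival rates when the treatment does nothing."""
    rng = random.Random(seed)
    # Every employee has the same chance of arriving late, treated or not.
    treated_late = sum(rng.random() < late_rate for _ in range(n_per_group))
    control_late = sum(rng.random() < late_rate for _ in range(n_per_group))
    return (treated_late - control_late) / n_per_group

# Repeat the "no effect" experiment 1,000 times with different seeds.
differences = [placebo_difference(seed=s) for s in range(1000)]
nonzero = sum(d != 0 for d in differences)
print(f"{nonzero} of 1000 simulated experiments show a nonzero difference")
print(f"largest absolute difference: {max(abs(d) for d in differences):.3f}")
```

Most simulated experiments show some difference between the groups even though the treatment is useless by construction, which is why an observed difference alone does not establish a treatment effect.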


Caveats on the Observed Difference

The fact that we find a difference in this sample does not “prove” that the treatment is effective.

The effect we found in this case might be specific to this sample.

So we need a way to verify that the effect we found is strong enough to argue in favor of the treatment effect and that it may apply to the whole population.

This brings us to the realm of “sampling design” and “statistical inference.”


Exercise 2-1

The evaluations of two instructors by their students are compared when it is time to decide raises for the coming years. Teacher A always hands out the evaluation forms in class when the grades on the first exam are given to students. Teacher B, on the other hand, always hands out the evaluation forms at the end of a class in which a very interesting film clip is shown.

Question: Is the Teacher A evaluation form biased?

=⇒ Answer: Yes!

Question: Is the Teacher B evaluation form biased?

=⇒ Answer: Yes!


Exercise 2-2

Explain how you would improve each of the following proposed experiments.

1. Two product promotion offers are to be compared. The first, which offers two items for $2, will be used in a store on Friday. The second, which offers three items for $3, will be used in the same store on Saturday.

=⇒ Shopping patterns may differ on Friday and Saturday.

2. A study compares two marketing campaigns to encourage individuals to eat more fruits and vegetables. The first campaign is launched in Florida at the same time that the second campaign is launched in Minnesota.

=⇒ Florida and Minnesota are not comparable.

3. You want to evaluate the effectiveness of a new investment strategy. You try the strategy for one year and evaluate the performance of the strategy.

=⇒ No control group.


Exercise 2-3

Explain what is wrong with each of the following “randomization” procedures and describe how you would do the randomization correctly.

1. 20 students are to be used to evaluate a new treatment. Ten men are assigned to receive the treatment and 10 women are assigned to be the controls.

=⇒ Assigning subjects by gender is not random.

=⇒ I would rather randomly assign 5 men and 5 women to each treatment.

2. 10 subjects are to be assigned to two treatments, 5 to each. For each subject, a coin is tossed. If the coin comes up heads, the subject is assigned to the first treatment; if the coin comes up tails, the subject is assigned to the second treatment.

=⇒ This randomization will not necessarily divide the subjects into two groups of 5.
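
One way to carry out both corrections is to shuffle the subjects and cut the list in half, rather than tossing a coin for each subject. The following is a minimal sketch under that assumption; the subject labels, the helper name `split_evenly`, and the seed are hypothetical.

```python
import random

def split_evenly(subjects, rng):
    """Shuffle the subjects and cut the list in half so both groups have equal size."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(2015)  # fixed seed only so the sketch is reproducible

# Part 2: 10 subjects, exactly 5 per treatment (unlike independent coin tosses).
subjects = [f"subject_{i}" for i in range(1, 11)]
first, second = split_evenly(subjects, rng)
print(len(first), len(second))  # 5 5

# Part 1: randomize within each gender so each group gets 5 men and 5 women.
men = [f"man_{i}" for i in range(1, 11)]
women = [f"woman_{i}" for i in range(1, 11)]
men_treat, men_ctrl = split_evenly(men, rng)
women_treat, women_ctrl = split_evenly(women, rng)
treatment = men_treat + women_treat
control = men_ctrl + women_ctrl
print(len(treatment), len(control))  # 10 10
```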


Exercise 2-4

“Bee pollen is effective for combating fatigue, depression, cancer, and colon disorders.” So says a website that offers the pollen for sale. We wonder if it really does prevent colon disorders. Here are two ways to study this question. Explain why the first design is more compelling.

1. Find 400 women who do not have colon disorders. Randomly assign 200 to take bee pollen capsules and the other 200 to take placebo capsules that are identical in appearance. Follow both groups for five years.

2. Find 200 women who take bee pollen regularly. Match each with a woman of the same age, race, and occupation who does not take bee pollen. Follow both groups for five years.


Sampling Design: Introduction

We are often interested in drawing conclusions about a population using a sample.

Population: the entire group of individuals that we want information about.

Sample: the part of the population that we actually examine and use to draw conclusions about the population.

“Population” is defined in terms of our desire for knowledge.

e.g.) If we wish to draw conclusions about married couples with no children, then this group is our population.

e.g.) If we wish to draw conclusions about married couples, then all married couples, regardless of whether they have children, are our population.


Two Types of Samples

To draw a sound conclusion regarding the population from the sample:

=⇒ The sample should be “representative” of the population!!

Note that there are two types of samples:

1. A voluntary response sample

2. A probability sample


Voluntary Response Sample

Definition: A voluntary response sample consists of people who choose to respond to a general appeal.

e.g.) online opinion polls use voluntary response samples.


Voluntary Response Sample

A critical issue with using this sample to draw conclusions about a population:

=⇒ The voluntary response sample suffers from selection bias.

=⇒ That is, the voluntary response sample is unlikely to represent the population of interest well. Why?

Consider the online opinion poll example: people who participate in the polls are not representative of a population, because:

1. Only people with internet access will respond to online polls.

2. Only people with strong opinions care enough to respond to the polls.


Voluntary Response Sample

To put it differently, because people “self-select” into participating in the online poll, the resulting sample suffers from selection bias.

And if people who participate in the online poll are systematically different from the people that you want to generalize about, then using this voluntary response sample will not give you relevant information from which you can draw conclusions.

So the “selection” is the culprit.

How do we prevent this “selection” in choosing a sample?

=⇒ We make use of chance when choosing a sample.


Probability Sample

The benefit of allowing “chance” to do the choosing:

=⇒ There will be no favoritism either by the sampler or participants.

Random selection of a sample eliminates selection bias by giving all individuals an equal chance to be chosen, just as randomization eliminates bias in assigning experimental units.

The sample chosen by chance is called a probability sample.


Random Sample vs. Random Assignment

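Random “sample” is not the same as random “assignment”: random sampling determines which individuals from the population enter the study, whereas random assignment determines which treatment each subject in the study receives.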


Probability Sample

Definition: A probability sample is a sample chosen by “chance.”

Two types of samples are widely used as a probability sample:

1. A simple random sample (SRS)

2. A stratified random sample


Simple Random Sample (SRS)

Suppose a sample of n observations is selected from a population of N observations.

Definition

A simple random sample (SRS) of size n consists of n observations drawn from the population, chosen as follows:

1) every possible set of n observations has the same probability of being selected; and

2) the selection of any one observation in no way affects the chance of selecting any other observation.
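
In software, an SRS is usually drawn with a library routine that samples without replacement. The sketch below uses Python’s standard `random.sample`; the population of ID numbers, the function name `simple_random_sample`, and the seed are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every set of n units is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# A hypothetical population of N = 10,000 ID numbers.
population = list(range(1, 10001))
sample = simple_random_sample(population, n=100, seed=1)
print(len(sample), len(set(sample)))  # 100 100
```

Because the draw is without replacement, no unit can appear in the sample twice.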


Simple Random Sample (SRS)

If you pick an SRS from a population, the “probability” that this SRS will represent the population is high.

But sometimes, an SRS may fail to produce a sample that is representative of a population.

=⇒ This is especially likely when the population of interest is very, very large in size.

For example, suppose your population of interest is everyone in the US, and consider drawing an SRS of size n = 1,000.

Although an SRS gives each member of the population an equal chance to be selected, the resulting sample might not represent the whole population of the US.

=⇒ The characteristics of the US population are very diverse, and the population is spread out over a wide area.


Stratified Random Sample

In such a situation, we select a sample using a “stratified” sampling design.

The sample obtained from a stratified sampling design is called a stratified random sample.
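
In the usual textbook formulation, a stratified design first divides the population into groups of similar individuals (strata) and then draws a separate SRS within each stratum. The sketch below illustrates that idea; the regions, their sizes, the 10% sampling fraction, and the function name `stratified_sample` are all assumptions made for illustration, since the lecture’s own figure is not reproduced in this transcript.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw a separate simple random sample of the given fraction within each stratum."""
    rng = random.Random(seed)
    # Group the units by stratum.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Sample within each stratum, then combine the pieces.
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population of 1,000 people tagged with a region (the stratum).
region_sizes = {"Northeast": 200, "South": 400, "West": 400}
units = [(f"{region}_{i}", region)
         for region, size in region_sizes.items()
         for i in range(size)]

sample = stratified_sample(units, stratum_of=lambda unit: unit[1], fraction=0.1, seed=3)
counts = Counter(region for _, region in sample)
print(dict(counts))  # {'Northeast': 20, 'South': 40, 'West': 40}
```

With this proportional allocation, each region appears in the sample in the same share as in the population, which a single SRS would only achieve approximately.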


Stratified Random Sample


Other Probability Samples

There are many other probability samples:

e.g.) Systematic sample, cluster sample, multi-stage sample, etc.

Two takeaway points:

1. An SRS is the building block of more elaborate samples.

2. An SRS is the fundamental survey design and the one upon which most statistical inference techniques are based.

In the end, every kind of probability sample has one purpose:

=⇒ To get a sample that is representative of a population!


Potential Problems with Surveys

We learned that randomly selecting a sample eliminates bias in the choice of a sample from a population.

But we also learned that the fact that a sample is selected randomly does not guarantee that the sample we obtained represents the population very well.

There are three potential scenarios that lead to a sample that is not representative of a population:

1. Undercoverage.

2. Nonresponse.

3. Response bias.


Undercoverage

When we draw a sample, we need an accurate and complete list of individuals in the population if we are to produce a sample that represents the population well.

Note, however, that such a list is rarely available.

=⇒ As such, most samples suffer from some degree of undercoverage.

Consider a sample survey of households. This survey will miss:

1. Homeless people.

2. Prison inmates.

3. Students in dormitories.

4. etc.


Undercoverage

Undercoverage is especially salient in an opinion poll.

For example, consider an opinion poll conducted via telephone.

=⇒ This sample will miss the large number of households without residential phones.

e.g.) The British election of 1992: the leading opinion polls failed to predict the election outcome.

So many national sample surveys have some bias because of this undercoverage.


Nonresponse

Another serious source of bias in sample surveys is nonresponse.

Nonresponse occurs when an individual chosen randomly for the sample cannot be contacted or does not cooperate.

This presents big problems for the survey because this individual, who was randomly chosen, represents some proportion of the population.

=⇒ Not including him/her will bias the results away from the true population values.


Response Bias

The third source of bias is called response bias.

Some respondents who are interviewed may lie systematically about their behaviors, or make mistakes when responding to questions.

e.g.) Abortion is commonly understood to be underreported in many survey data sets.

=⇒ Young women may be uncomfortable saying that they have had an abortion, particularly to a stranger.


Response Bias

e.g.) Asking individuals to recall the past can also lead to incorrect information.

=⇒ Many people answer “yes” to the question, “Have you visited a dentist in the last six months?” even though they visited a dentist eight months ago.

e.g.) The wording of questions can also elicit a response bias.

=⇒ Only 13% think that government is spending too much on “assistance to the poor.”

=⇒ But 44% think that government is spending too much on “welfare.”


Wrap Up on Sample Survey

When thinking about the validity of a sample, you should think seriously about the possible issues mentioned above.

For example, many polls done by the media and by market research and opinion-polling firms do not disclose their rates of nonresponse.

So insist on knowing the exact questions asked, the nonresponse rate, and the method of the survey design before you trust any poll results.


Exercise 3-1

Explain what is wrong with each of the following random selection procedures and explain how you would do the randomization correctly.

1. To determine the reading level of an introductory statistics text, you evaluate all the written material in Chapter 3.

=⇒ The content of a single chapter is not random; choose random words from random pages.

2. You want to sample student opinions about a proposed change in procedures for changing majors. You hand out questionnaires to 100 students as they arrive for class at 7:30 A.M.

=⇒ Students who are registered for an early-morning class might have different characteristics from those who avoid such classes.


Exercise 3-1

3. A population of subjects is put in alphabetical order and a simple random sample of size 10 is taken by selecting the first 10 subjects in the list.

=⇒ Alphabetical ordering is not random.

=⇒ One problem is that the sample might include people with the same last name (siblings, spouses, etc.).

=⇒ Additionally, some last names tend to be more common in some ethnic groups.


Exercise 3-2

A committee on community relations in a college town plans to survey local businesses about the importance of students as customers. From telephone book listings, the committee chooses 160 businesses at random. Of these, 72 return the questionnaire mailed by the committee.

1. What is the population for this sample survey?

=⇒ All local businesses OR businesses listed in a phone book.

2. What is the sample?

=⇒ The sample is the 72 businesses that “returned” the questionnaire.

3. What, then, are the 160 businesses?

=⇒ They are an SRS drawn from the population.

4. What’s the issue with our sample?

=⇒ Because of nonresponse (i.e., 88/160 = 55%), our sample may not represent the population well.
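
The nonresponse figure above follows directly from the counts in the exercise. A short sketch of the arithmetic, with variable names chosen here for illustration:

```python
mailed = 160    # businesses chosen at random from the phone book
returned = 72   # businesses that returned the questionnaire

nonrespondents = mailed - returned
nonresponse_rate = nonrespondents / mailed
response_rate = returned / mailed

print(nonrespondents)              # 88
print(f"{nonresponse_rate:.0%}")   # 55%
print(f"{response_rate:.0%}")      # 45%
```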


Exercise 3-3

Comment on each of the following as a potential sample survey question. Is the question clear? Is it slanted toward a desired response?

1. “Some cell phone users have developed brain cancer. Should all cell phones come with a warning label explaining the danger of using cell phones?”

=⇒ The question is not clear: What do you mean by cell phone users? How are you going to word the warning label?

=⇒ It is also slanted toward a desired response because the statement says that “some cell phone users have developed brain cancer,” without providing strong evidence that cell phones cause brain cancer.


Exercise 3-3

2. “Do you agree that a national system of health insurance should be favored because it would provide health insurance for everyone and would reduce administrative costs?”

=⇒ The question is not clear. What kind of system?

=⇒ It is slanted toward a desired response: How do you know the system would provide health insurance for everyone?

=⇒ It lists two benefits of such a system, and no arguments from the other side of the issue.


Exercise 3-3

3. “In view of escalating environmental degradation and incipient resource depletion, would you favor economic incentives for recycling of resource-intensive consumer goods?”

=⇒ The question is not clear: What kinds of economic incentives? What kinds of degradation?

=⇒ Besides, it says nothing about the benefits and costs of providing economic incentives for recycling.

=⇒ A better phrasing might be, “Would you be willing to pay more for the products you buy if the extra cost were used to conserve resources by encouraging recycling?”
