collecting data sensibly

CHAPTER 2 Collecting Data Sensibly

Upload: zita

Post on 22-Feb-2016



TRANSCRIPT

Page 1: Collecting Data Sensibly

CHAPTER 2

Collecting Data Sensibly

Page 2: Collecting Data Sensibly

2.1 Statistical Studies: Observation and Experimentation

Whether or not a conclusion is reasonable depends on how the data were collected.

Sometimes we are interested in answering questions about the characteristics of a single population or in comparing two or more well-defined populations.

Sometimes we’re trying to answer questions dealing with the effect of a certain explanatory variable on some response.

Page 3: Collecting Data Sensibly

In the former situation, an observational study is conducted.

The investigator observes characteristics of a subset of the members of one or more existing populations.

The goal is usually to draw conclusions about the corresponding population or about differences between two or more populations.

In the latter, an experiment is conducted.

The investigator observes how a response variable behaves when one or more explanatory variables, sometimes called factors, are manipulated.

The goal is to determine the effect of the manipulated factors on the response variable.

The composition of the groups that will be exposed to different experimental conditions is determined by random assignment.

Page 4: Collecting Data Sensibly

Important difference between an observational study and an experiment:

A well-designed experiment can result in data that provide evidence for a cause-and-effect relationship.

An observational study cannot, because it is possible that the observed effect is due to some variable other than the factor being studied.

Such variables are called confounding variables – variables that are related to both group membership and the response variable of interest in a research study.

Page 5: Collecting Data Sensibly

Example of confounding variables in a study

A July 1, 2003 article from the San Luis Obispo Tribune summarized the conclusions of a government advisory panel that investigated the benefits of vitamin use.

The panel looked at a large number of studies and concluded that the results were “inadequate or conflicted.”

A major concern was that many studies were observational in nature, and the panel worried that people who take vitamins might be healthier just because they take better care of themselves in general.

The confounding variable was lifestyle.
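The vitamin example can be illustrated with a small simulation. The sketch below is a hypothetical model, not data from the panel's report: a healthy lifestyle raises a made-up health score and also makes vitamin use more likely, while the vitamins themselves have no effect at all. The probabilities (0.8, 0.2, 0.5) and the score scale are assumptions chosen only for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def health_score(healthy_lifestyle):
    # Assumed model: health depends on lifestyle only; vitamins add nothing.
    return (1.0 if healthy_lifestyle else 0.0) + rng.gauss(0, 1)


def mean(values):
    return sum(values) / len(values)


def simulate(n, randomized):
    """Return mean health of vitamin takers minus mean health of non-takers."""
    takers, non_takers = [], []
    for _ in range(n):
        lifestyle = rng.random() < 0.5
        if randomized:
            # Experiment: a coin flip decides who takes vitamins.
            takes = rng.random() < 0.5
        else:
            # Observational study: people with healthy lifestyles
            # choose vitamins more often (the confounding link).
            takes = rng.random() < (0.8 if lifestyle else 0.2)
        (takers if takes else non_takers).append(health_score(lifestyle))
    return mean(takers) - mean(non_takers)


obs_diff = simulate(20000, randomized=False)
exp_diff = simulate(20000, randomized=True)
print(f"observational difference: {obs_diff:.2f}")
print(f"randomized difference:    {exp_diff:.2f}")
```

In this model the self-selected comparison shows a sizable gap even though vitamins do nothing, while random assignment leaves the two groups with similar average scores.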

Page 6: Collecting Data Sensibly

Two different types of conclusions have been described:

One type involves generalizing from what we have seen in a sample to some larger population.

The other involves reaching a cause-and-effect conclusion about the effect of an explanatory variable on a response.

It is important to think carefully about the objectives of a statistical study before planning how the data will be collected.

Both observational studies and experiments must be carefully designed if the resulting data are to be useful.

Page 7: Collecting Data Sensibly

Table 2.1 Drawing conclusions from statistical studies

Column A: Reasonable to generalize conclusions about group characteristics to the population?
Column B: Reasonable to draw a cause-and-effect conclusion?

Study description                                                          A     B

Observational study with sample selected at random from a population
of interest                                                                Yes   No

Observational study based on convenience or voluntary response sample
(poor sampling design)                                                     No    No

Experiment with groups formed by random assignment of individuals or
objects to experimental conditions:

  Individuals or objects used in study are volunteers or not randomly
  selected from some population of interest                                No    Yes

  Individuals or objects used in study are randomly selected from some
  population of interest                                                   Yes   Yes

Experiment with groups not formed by random assignment to experimental
conditions (poorly designed experiment)                                    No    No

Page 8: Collecting Data Sensibly

2.2 Sampling

If one is to generalize about a population from a sample, the sample must be representative of the population.

If a sample is chosen haphazardly or on the basis of convenience alone, it is impossible to interpret the resulting data with confidence.

There is no way to tell just by looking at a sample whether it is representative of the population from which it was drawn.

A census – obtaining information from an entire population – is often not feasible, so samples are selected instead. The process may be destructive. Resources are limited: not enough time and money.

Page 9: Collecting Data Sensibly

Bias in Sampling

Bias – the tendency for samples to differ from the corresponding population in some systematic way.

Selection bias – bias resulting from the systematic exclusion of some part of the population.

Example: Taking a sample of opinion in a community by selecting participants from phone numbers in the local phone book would systematically exclude people who choose to have unlisted numbers, people who do not have phones, and people who have moved into the community since the telephone directory was published.

Page 10: Collecting Data Sensibly

Measurement bias or response bias – bias that results when the method of observation tends to produce values that differ from the true value. Examples:

Taking a sample of weights of a type of apple when the scale consistently gives a weight that is 0.2 ounces high.

When questions on a survey are worded in a way that tends to influence the response. A Gallup survey sponsored by the American Paper Institute (Wall Street Journal, May 17, 1994) included the following question: “It is estimated that disposable diapers account for less than 2 percent of the trash in today’s landfills. In contrast, beverage containers, third-class mail and yard waste are estimated to account for 21 percent of trash in landfills. Given this, in your opinion, would it be fair to tax or ban disposable diapers?”

Page 11: Collecting Data Sensibly

Nonresponse bias – bias that results when data are not obtained from all individuals selected for inclusion in the sample. This form of bias is lowest when the response rate is high.

Nonresponse rates are highest for mail, telephone, and internet surveys, but these are the cheapest to conduct.

Response rates are best for personal interviews, but these are expensive to conduct.

Page 12: Collecting Data Sensibly

Important note on bias

Bias is introduced by the way in which a sample is selected or by the way in which the data are collected from the sample, so increasing the size of the sample does nothing to reduce the bias.
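The point that a larger sample does not remove bias can be checked with the apple-scale example, in which the scale reads 0.2 ounces high. The sketch below is a minimal simulation under assumed values: the true mean weight (6.0 oz) and the spread (0.5 oz) are hypothetical, and only the 0.2-ounce bias comes from the example.

```python
import random

rng = random.Random(1)   # fixed seed so the sketch is repeatable

TRUE_MEAN_OZ = 6.0       # hypothetical true mean apple weight
SPREAD_OZ = 0.5          # hypothetical apple-to-apple variability
SCALE_BIAS_OZ = 0.2      # the scale reads 0.2 ounces high (from the example)


def mean_measured_weight(n):
    """Average of n apple weights as reported by the biased scale."""
    total = 0.0
    for _ in range(n):
        true_weight = rng.gauss(TRUE_MEAN_OZ, SPREAD_OZ)
        total += true_weight + SCALE_BIAS_OZ
    return total / n


# Error of the sample mean relative to the true mean, for growing samples.
errors = {n: mean_measured_weight(n) - TRUE_MEAN_OZ for n in (10, 1000, 100000)}
for n, error in errors.items():
    print(f"n = {n:>6}: sample mean is off by {error:+.3f} oz")
```

As the sample grows, the random scatter shrinks, but the error settles near 0.2 ounces rather than near zero.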

Page 13: Collecting Data Sensibly

Random Sampling

Random sampling helps reduce bias in samples.

Most inferential methods introduced in this text are based on the idea of random selection.

Simple random sample of size n – a sample that is selected in a way that ensures that every different possible sample of the desired size has the same chance of being selected.

A common method of selecting a random sample is to first create a list, called a sampling frame, of the individuals in the population. Each item on the list can then be identified by a number, and a table of random digits or a random number generator can be used to select the sample.
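The sampling-frame procedure can be sketched in a few lines using a random number generator. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample` and the 200-student frame are hypothetical.

```python
import random


def simple_random_sample(sampling_frame, n, seed=None):
    """Select n distinct individuals so that every possible
    sample of size n is equally likely."""
    rng = random.Random(seed)
    # Identify each item on the list by its position number,
    # then draw n distinct numbers at random.
    numbers = rng.sample(range(len(sampling_frame)), n)
    return [sampling_frame[i] for i in numbers]


# Hypothetical sampling frame: a numbered list of 200 students.
frame = [f"student_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

Passing a seed makes the selection repeatable, which is useful when the selection needs to be documented; leaving it as `None` gives a fresh draw each time.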

Page 14: Collecting Data Sensibly

Sampling with replacement – after each item is selected for the sample, the item is “replaced” back into the population and may therefore be selected again.

Example: Choose a sample of 5 digits by spinning a spinner and choosing the number where the pointer is directed.

Page 15: Collecting Data Sensibly

Sampling without replacement – after an item is selected for the sample, it is removed from the population and therefore cannot be selected again.

Example: A hand of “five card stud” poker is dealt from an ordinary deck of playing cards. Once a card is dealt, it cannot appear again until the deck is reshuffled and dealt again.
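The spinner and poker examples map onto two standard-library calls in Python. The sketch below assumes a spinner with the digits 0 through 9 and an ordinary 52-card deck; the variable names are illustrative.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# With replacement: each spin can land on any digit, including repeats.
digits = list(range(10))
spins = rng.choices(digits, k=5)

# Without replacement: a dealt card cannot be dealt again.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [f"{rank} of {suit}" for suit in suits for rank in ranks]
hand = rng.sample(deck, 5)

print("spins:", spins)
print("hand: ", hand)
```

`choices` may return the same digit more than once, while `sample` always returns five different cards.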

Page 16: Collecting Data Sensibly
Page 17: Collecting Data Sensibly

A Note on Sample Size

Common misconception: If the sample size is relatively small compared to the population size, the sample can’t possibly accurately reflect the population.

The random selection process allows us to be confident that the resulting sample adequately reflects the population, even when the sample consists of only a small fraction of the population (see Figure 2.1 for illustration of this idea).
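A quick simulation can make this note plausible. The sketch below is not a reproduction of Figure 2.1; it is a hypothetical setup in which 30% of a population has some characteristic, and a simple random sample of 1,000 is drawn from populations of two very different sizes.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable


def sample_proportion(population_size, n, true_share=0.3):
    """Estimate the share with the characteristic from a simple random
    sample of size n. Individuals are numbered 0..population_size-1, and
    those numbered below the cutoff are the ones with the characteristic."""
    cutoff = int(true_share * population_size)
    chosen = rng.sample(range(population_size), n)
    return sum(1 for i in chosen if i < cutoff) / n


# The same sample size is 10% of one population and 0.1% of the other.
estimates = {size: sample_proportion(size, 1000) for size in (10_000, 1_000_000)}
for size, estimate in estimates.items():
    print(f"population {size:>9,}: estimated share = {estimate:.3f}")
```

Both estimates land close to the assumed 30%, even though the second sample covers only a tenth of a percent of its population.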

Page 18: Collecting Data Sensibly
Page 19: Collecting Data Sensibly

Other Sampling Methods

In some situations, alternative sampling methods may be less costly, easier to implement, or more accurate.

Stratified Random Sampling – separate random samples are taken from a set of non-overlapping subpopulations, called strata (singular: stratum).

Example: Estimating malpractice insurance cost among subgroups of doctors.

Provides information about subgroups as well as the overall population.

Can allow more accurate inferences about a population than simple random sampling (SRS) does.
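Stratified random sampling amounts to taking a separate simple random sample inside each stratum. The sketch below is illustrative only: the function name, the three specialties, and the sample sizes are hypothetical stand-ins for the doctor example.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate simple random sample from each stratum.

    strata:        dict mapping stratum name -> list of individuals
    n_per_stratum: dict mapping stratum name -> sample size for that stratum
    """
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum[name])
        for name, members in strata.items()
    }


# Hypothetical strata: doctors grouped by specialty.
strata = {
    "surgeons": [f"surgeon_{i}" for i in range(40)],
    "pediatricians": [f"pediatrician_{i}" for i in range(60)],
    "general_practice": [f"gp_{i}" for i in range(100)],
}
sizes = {"surgeons": 4, "pediatricians": 6, "general_practice": 10}
doctor_sample = stratified_sample(strata, sizes, seed=5)
for name, chosen in doctor_sample.items():
    print(name, chosen)
```

Because the result is keyed by stratum, each subgroup can be summarized on its own as well as combined into an overall estimate.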

Page 20: Collecting Data Sensibly

Cluster sampling – involves dividing a population of interest into nonoverlapping subgroups, called clusters, and then selecting clusters at random; all individuals in the selected clusters are included in the sample.

In this case, it is ideal for each cluster to mirror the characteristics of the population.

Note: The ideal situation occurs when it is reasonable to assume that each cluster reflects the general population. If that is not the case, or when clusters are small, a large number of clusters must be selected to get a sample that reflects the population.

Second note: Do not confuse stratified and cluster sampling. Strata should be homogeneous (similar). Clusters should be heterogeneous (reflecting variability in the population).
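In code, the contrast with stratified sampling is that the random step picks whole clusters, and everyone in a picked cluster is kept. The sketch below is a minimal illustration; the function name and the six hypothetical schools are assumptions.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and include every individual in them.

    clusters: dict mapping cluster name -> list of individuals
    Returns (names of chosen clusters, combined list of individuals).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    individuals = [person for name in chosen for person in clusters[name]]
    return chosen, individuals


# Hypothetical clusters: six schools with 30 students each.
schools = {
    f"school_{c}": [f"school_{c}_student_{i}" for i in range(30)]
    for c in "ABCDEF"
}
chosen_schools, students = cluster_sample(schools, 2, seed=9)
print(chosen_schools, len(students))
```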

Page 21: Collecting Data Sensibly

Systematic sampling is a procedure that can be employed when it is possible to view the population of interest as consisting of a list or some other sequential arrangement. A value k is specified (a number such as 25, 100, 2500…). Then one of the first k individuals is selected at random, and every kth individual in the sequence after that is included in the sample. This is called 1 in k systematic sampling.

Example: In a large university, a professor wanting to select a sample of students to determine the students’ ages might take the student directory (an alphabetical list), randomly choose one of the first 100 students, and then take every 100th student from that point on.

This works as long as there are no repeating patterns in the population list.
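The 1 in k procedure is short enough to write directly. The sketch below assumes the population is available as an ordered Python list; the function name and the 1,000-entry directory are hypothetical.

```python
import random


def systematic_sample(ordered_list, k, seed=None):
    """1 in k systematic sample: pick a random start among the first k
    individuals, then take every kth individual from that point on."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # position 0..k-1, i.e. one of the first k
    return ordered_list[start::k]


# Hypothetical student directory with 1,000 alphabetized entries.
directory = [f"student_{i:04d}" for i in range(1000)]
one_in_100 = systematic_sample(directory, 100, seed=21)
print(one_in_100)
```

With 1,000 entries and k = 100, the sample always has 10 students, spaced exactly 100 places apart in the directory.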

Page 22: Collecting Data Sensibly

Convenience sampling is using an easily available or convenient group to form a sample.

Example: A “voluntary response sample” is often taken by television news programs. Viewers are encouraged to go to a website and “vote” yes or no on some issue. The commentator then would announce the results of the survey.

A recipe for disaster! Results are rarely informative about the true nature of the population; wouldn’t want to generalize about the population.

Page 23: Collecting Data Sensibly

2.3 Simple Comparative Experiments

Sometimes the question we are trying to answer deals with the effect of a certain explanatory variable on some response. “What happens when…?” “What is the effect of…?”

To address these types of questions, the researcher conducts an experiment.

Page 24: Collecting Data Sensibly

Experiment – a planned intervention undertaken to observe the effects of one or more explanatory variables, called factors, on a response variable.

The purpose is to increase understanding of the nature of the relationships between the explanatory and response variables.

Any particular combination of values for the explanatory variables is called an experimental condition or treatment.

The design of an experiment is the overall plan for conducting an experiment.

Page 25: Collecting Data Sensibly

A good experiment requires more than just manipulating the explanatory variable. The design must also eliminate rival explanations or the experimental results will not be conclusive.

Example: Testing the effects of room temperature on student performance on a semester physics exam.

Four sections: two assigned to 65 deg F and two to 75 deg F.

If the 65 deg F group had a higher average, would these results be conclusive? Why or why not?

Page 26: Collecting Data Sensibly

Extraneous factor – a factor that is not of interest in the current study but is thought to affect the response variable; also called a lurking variable.

A well-designed experiment copes with the potential effects of extraneous factors by:

Random assignment to experimental conditions

Direct control

Blocking

Page 27: Collecting Data Sensibly

Direct control – when an experimenter holds extraneous factors constant so that their effects are not confounded with those of the experimental conditions.

Revisiting the physics exam example:

Requiring the use of the same physics textbook for all sections.

Having all sections meet at the same time of day.

Page 28: Collecting Data Sensibly

Blocking – using extraneous factors to create groups (blocks) that are similar. All experimental conditions are then tried in each block.

Extraneous factors addressed through blocking are called blocking factors.

In the physics example, if the four sections are taught by two different instructors, we might block by instructor.

Instructor 1

Section at 65 deg F

Section at 75 deg F

Instructor 2

Section at 65 deg F

Section at 75 deg F
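The blocked layout for the physics example can be generated programmatically. The sketch below adds one assumption that the slide does not state: within each instructor's block, a random draw decides which section gets which temperature. The function name and section labels are hypothetical, and each block is assumed to have exactly as many sections as there are treatments.

```python
import random


def assign_within_blocks(blocks, treatments, seed=None):
    """Try every treatment once in each block, in a random order.

    blocks:     dict mapping block name -> list of experimental units
    treatments: list of experimental conditions (one unit each per block)
    Returns a dict mapping unit -> (block, treatment).
    """
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        order = list(treatments)
        rng.shuffle(order)  # assumed: random order within the block
        for unit, treatment in zip(units, order):
            plan[unit] = (block, treatment)
    return plan


# Physics example: four sections taught by two instructors.
blocks = {
    "Instructor 1": ["Section 1", "Section 2"],
    "Instructor 2": ["Section 3", "Section 4"],
}
temperatures = ["65 deg F", "75 deg F"]
plan = assign_within_blocks(blocks, temperatures, seed=13)
for section, (instructor, temperature) in plan.items():
    print(section, instructor, temperature)
```

Each instructor ends up teaching one section at each temperature, so instructor differences cannot be mistaken for a temperature effect.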

Page 29: Collecting Data Sensibly

We can control the effects of extraneous factors through direct control or blocking as described above, but some factors cannot be controlled or blocked. Example: Student ability in the physics test example.

We can handle such extraneous factors through random assignment to experimental groups, a process called randomization. Randomization ensures that the experiment does not systematically favor one experimental condition over another and attempts to create experimental groups that are as much alike as possible.

Ideal situation: The ability to both randomly select subjects and randomly assign them to experimental conditions. This would allow conclusions to be made about the larger population. The former is not always possible, but we can still draw conclusions about the treatment.

Page 30: Collecting Data Sensibly

An investigation to test if an online review of course material before an exam would improve exam performance.

Subjects selected might have different ability, which is reflected in their SAT math and verbal scores.

Page 31: Collecting Data Sensibly

If we are going to assign these students to two groups, one receiving the review and one not, we should make sure that the assignment does not favor one group over another.

The original figure uses color to show that the subjects were randomly assigned: within each row of dots, the orange and blue dots are randomly dispersed.

Page 32: Collecting Data Sensibly

As long as the number of subjects is not too small, we can rely on random assignment to produce comparable experimental groups, so that extraneous variables do not systematically favor one group. This is the reason that randomization is part of all well-designed experiments.
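The online-review example can be used to see how random assignment balances an extraneous variable. The sketch below is a simulation under assumed values: 1,000 hypothetical subjects with SAT math scores drawn from a normal distribution (mean 500, standard deviation 100), split at random into a review group and a no-review group.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical subjects; SAT math score stands in for student ability.
subjects = [{"id": i, "sat_math": rng.gauss(500, 100)} for i in range(1000)]

# Random assignment: shuffle the subjects, then split the list in half.
shuffled = subjects[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
review_group = shuffled[:half]
no_review_group = shuffled[half:]


def mean_sat(group):
    return sum(s["sat_math"] for s in group) / len(group)


gap = mean_sat(review_group) - mean_sat(no_review_group)
print(f"review group mean SAT:    {mean_sat(review_group):.1f}")
print(f"no-review group mean SAT: {mean_sat(no_review_group):.1f}")
print(f"difference:               {gap:+.1f}")
```

Nothing in the assignment step looks at the scores, yet the two group means come out close; with only a handful of subjects the gap could be much larger, which is why the number of subjects matters.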

The gas additive/mileage example: Test the effect of three different fuel additives on fuel efficiency. Use the same car, with 10 trials for each additive.

When an experiment can be viewed as a sequence of trials, randomization involves the random assignment of treatments to trials.
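For the fuel-additive example, assigning treatments to trials at random can be done by listing each additive 10 times and shuffling the list. This is a minimal sketch; the additive labels A, B, and C are placeholders.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable

additives = ["A", "B", "C"]  # placeholder names for the three additives
trials_per_additive = 10

# One entry per trial, then a random order for the 30 trials.
run_order = [a for a in additives for _ in range(trials_per_additive)]
rng.shuffle(run_order)

for trial_number, additive in enumerate(run_order, start=1):
    print(f"trial {trial_number:2d}: additive {additive}")
```

Shuffling keeps exactly 10 trials per additive while preventing any one additive from being run only early or only late in the sequence.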

Page 33: Collecting Data Sensibly

Replication – a design strategy of making enough observations for each experimental condition that each group reliably reflects the variability of the population.

Example 2.3 Subliminal Messages

Language test: one group was given words related to politeness, the other words related to rudeness.

After the test, 63% of the group given words that were related to rudeness interrupted a conversation; only 17% of the other group interrupted.

Page 34: Collecting Data Sensibly

Many experiments compare a group that receives a particular treatment to a control group that receives no treatment.

A control group allows the experimenter to assess how the response variable behaves when the treatment is not used.

Example 2.4 Chilling Newborns? Then You Need a Control Group.

Infants were randomly assigned to usual care (control group) or whole-body cooling.

Results indicated that cooling reduced the risk of death and disability for infants deprived of oxygen at birth.

Page 35: Collecting Data Sensibly

Before proceeding with an experiment you must be able to answer these questions.

1. What is the research question that data from the experiment will be used to answer?

2. What is the response variable?

3. How will the values of the response variable be determined?

4. What are the factors (explanatory variables) for the experiment?

5. For each factor, how many different values are there, and what are these values?

6. What are the treatments for the experiment?

7. What extraneous variables might influence the response?

8. How does the design incorporate random assignment of subjects to treatments (or treatments to subjects) or random assignment of treatments to trials?

9. For each extraneous variable listed in Question 7, how does the design protect against its potential influence on the response through blocking, direct control, or randomization?

10. Will you be able to answer the research question using the data collected in the experiment?

Page 36: Collecting Data Sensibly

2.4 More on Experimental Design

The goal of experimental design is to provide a method of data collection that:

Minimizes extraneous sources of variability in the response so that any differences in response for various experimental conditions can be more easily assessed.

Creates experimental groups that are similar with respect to extraneous variables that cannot be controlled either directly or through blocking.

Notes on control groups:

When comparing a new treatment to an old one, the old treatment is considered the control.

Not all experiments use a control.

Example: how oven temperature affects overall cooking time.

Page 37: Collecting Data Sensibly

Dealing with human subjects

Placebo – something that is identical (in appearance, taste, feel, etc.) to the treatment received by the treatment group, except that it contains no active ingredients.

Placebos are used because people sometimes respond to the power of suggestion.

Single-blind experiment – an experiment in which subjects do not know what treatment they have received.

Double-blind experiment – an experiment in which neither the subjects nor the individuals who measure the response know which treatment was received.

Page 38: Collecting Data Sensibly

Experimental units and replication

Experimental unit – the smallest unit to which a treatment is applied.

Replication – each treatment is applied to more than one experimental unit.

Replication is necessary for randomization to be an effective way to create similar experimental groups, and to get a sense of the variability in the values of the response for individuals that receive the same treatment.

Page 39: Collecting Data Sensibly

2.5 More on Observational Studies: Designing Surveys (Optional)

Survey – a voluntary encounter between two strangers in which an interviewer seeks information from a respondent by engaging in a special type of conversation.

Designing and administering a survey is not as easy as it might seem. Great care required to obtain good information.

Survey researchers and psychologists agree that there is a sequence of tasks in a survey:

Comprehension of the question

Retrieval from memory

Reporting a response

Page 40: Collecting Data Sensibly

Keep in mind when writing a survey:

1. Questions should be understandable by the individuals in the population being surveyed. Vocabulary should be at an appropriate level, and sentence structure should be simple.

2. Questions should, as much as possible, recognize that human memory is fickle. Questions that are specific will aid the respondent by providing better memory cues. Limitations of memory should be kept in mind when interpreting responses.

3. Questions should not make respondents feel embarrassed or threatened. In such cases, respondents may introduce social desirability bias. This can compromise conclusions drawn from survey data.