
Upload: daniella-hodge

Post on 02-Jan-2016


Who Won

THE POWER OF SAMPLING

SURVEY (%)

Dates   Agency            Clinton   Dole   Perot
10/18   Hotline           49        40     9
11/1    Reuters           49        41     8
11/1    Harris            51        39     9
11/3    Gallup            51        38     9
        Election result   49        41     9

Sampling

• Sampling: the process of selecting observations

• Probability sampling ensures generalizability from a sample to a larger population through an extremely important procedure called random selection.

• We will first look at some examples of nonprobability sampling to illustrate the history of sampling.

President Alf Landon

• The Literary Digest, a famous magazine published from 1890 to 1938, accurately predicted the results of the U.S. presidential elections of 1924, 1928, and 1932 by mailing questionnaires to people listed in telephone directories and automobile registration lists.

• In 1936, the Digest sent ten million ballots to people listed in telephone directories and automobile registration lists. The roughly 2 million responses it received predicted a landslide for Landon over Roosevelt, 57 to 43 percent.

• In fact, Roosevelt won the election with 61 percent of the vote.

• So what happened?

President Thomas Dewey

• The Gallup poll correctly predicted the 1936 election using quota sampling, which assumes knowledge of the entire population and requires the sample's characteristics to match population features such as income, race, and residential area.

• The Gallup poll used income level for its quota sampling, which ensured the right proportion of respondents at each income level. It also correctly predicted the subsequent elections of 1940 and 1944.

President Thomas Dewey

• But in 1948, the Gallup poll predicted that Dewey would win the election. In fact, Truman won.

• 1) The poll was conducted in October, when many people had not yet made up their minds. A disproportionate number of undecided voters voted for Truman once they entered the voting booth.

• 2) Quota sampling assumes knowledge of the entire population. The post-WWII population differed drastically from the pre-WWII population, yet Gallup used 1940 census data to sample the 1948 population. The most striking change was the large increase in city dwellers since WWII, most of whom voted for Truman.

Two sampling methods

• Probability sampling: selection of a random sample from a list containing the names of everyone in the population being sampled.

• Nonprobability sampling includes reliance on available subjects, purposive or judgmental sampling, snowball sampling, and quota sampling.

Reliance on available subjects

• Relying on available subjects, such as stopping people at a street corner or surveying students in a large lecture class, is very risky, yet it is often the most feasible sampling method.

• Researchers should be extremely cautious about the generalizability of data obtained with this method. Simply put, surveys of available subjects support very little generalization beyond the people actually studied.

Purposive or judgmental sampling

• Purposive or judgmental sampling selects a sample on the basis of the researcher's knowledge of the population and of the purpose of the study.

• Suppose you want to study right-wing and left-wing students, and all you have are the membership lists of some right-wing and left-wing groups. You sample the members of those groups. Your sample may not provide a good description of students in general, but it suffices for your purpose of comparing the two groups.

• Selecting deviant cases to study is another illustration of purposive or judgmental sampling.

Snowball sampling

• You collect data on the few members of the target population you are able to locate, and then ask them to provide the information needed to locate other members of that population whom they happen to know.

• Classic examples include studies of homeless people, illegal immigrants, drug dealers and other criminals, migrant workers, and members of political groups.

• Snowball sampling produces samples of questionable representativeness, so it is used primarily for exploratory purposes. Why is this the case?

Quota sampling

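• Your sample should match up with this table, in which each cell (C11 through C32) holds the population's share of people in that age-gender group:

                Gender
Age         Male    Female
15-34       C11     C12
35-49       C21     C22
50-79       C31     C32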
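In practice, a quota matrix like this is turned into a number of interviews per cell by multiplying each cell's population share by the desired sample size. The sketch below assumes made-up shares for C11 through C32 (they are not from the text) and a hypothetical sample size of 200; the class and method names are illustrative only.

```java
/** Hypothetical quota-sampling helper: turns population shares per cell into interviewer quotas. */
public class QuotaAllocation {

    /**
     * Number of respondents to recruit in each cell = population share of the cell * sample size.
     * Because of rounding, the cells are not guaranteed to sum exactly to sampleSize for every input.
     */
    public static long[][] quotas(double[][] shares, int sampleSize) {
        long[][] result = new long[shares.length][];
        for (int i = 0; i < shares.length; i++) {
            result[i] = new long[shares[i].length];
            for (int j = 0; j < shares[i].length; j++) {
                result[i][j] = Math.round(shares[i][j] * sampleSize);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed shares for C11..C32 (rows: 15-34, 35-49, 50-79; columns: male, female); they sum to 1.
        double[][] shares = {
            {0.20, 0.19},
            {0.15, 0.16},
            {0.14, 0.16}
        };
        long[][] q = quotas(shares, 200);
        String[] ages = {"15-34", "35-49", "50-79"};
        for (int i = 0; i < q.length; i++) {
            System.out.println(ages[i] + "  male=" + q[i][0] + "  female=" + q[i][1]);
        }
    }
}
```

Note that the quotas are only as good as the shares fed into them; stale shares reproduce exactly the 1948 problem described above.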

Informants

• Informants are people who can provide information about the subjects you are studying.

• Researchers commonly select informants who are typical of the groups they are studying. However, informants who seem representative may, as the research unfolds, turn out to be marginal or atypical members of the group.

• Nonprobability sampling has particular applications in qualitative research. But acknowledging its limitations regarding accuracy and precise representation is crucial in such research.

The Theory and Logic of Probability Sampling

• Nonprobability sampling cannot guarantee a representative sample of the entire population; this is why large-scale surveys use probability sampling methods.

• If all members of a population were identical in every respect, a single case would suffice as a sample for studying the whole population. This never happens, because human beings vary in a great many characteristics.

Probability Sampling

• Using a hypothetical population of 100 people who vary in gender and race, we illustrate various aspects of probability sampling.

• Sampling bias means that those selected are not typical or representative of the larger population from which they were chosen. Researchers may unconsciously introduce sampling bias by choosing the respondents closest to them (see Figure 7-2 on page 184 for an illustration).

Techniques to Avoid Bias

• Although we offer no precise definition of the term representativeness, a sample is representative of the population from which it is selected if the aggregate characteristics of the sample closely match those same aggregate characteristics in the population.

• A basic principle of probability sampling is that a sample will be representative of the population from which it is selected if all members of the population have an equal chance of being selected in the sample, which is commonly called EPSEM (Equal Probability of Selection Method).
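One way to see what "equal chance of being selected" means is to draw many simple random samples and count how often each member turns up. The sketch below assumes a population of 100 (echoing the hypothetical 100-person example), samples of 10, and an arbitrary number of trials and seed; every member should be selected in roughly n/N = 10% of the trials.

```java
import java.util.Random;

/** Sketch: empirically checks that simple random sampling gives every element the same selection rate. */
public class EpsemCheck {

    /** Fraction of trials in which each of populationSize elements was selected, for samples of size n. */
    public static double[] selectionRates(int populationSize, int n, int trials, long seed) {
        Random rng = new Random(seed);
        int[] counts = new int[populationSize];
        int[] ids = new int[populationSize];
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < populationSize; i++) ids[i] = i;
            // Partial Fisher-Yates shuffle: the first n positions form a simple random sample.
            for (int i = 0; i < n; i++) {
                int j = i + rng.nextInt(populationSize - i);
                int tmp = ids[i]; ids[i] = ids[j]; ids[j] = tmp;
                counts[ids[i]]++;
            }
        }
        double[] rates = new double[populationSize];
        for (int i = 0; i < populationSize; i++) rates[i] = (double) counts[i] / trials;
        return rates;
    }

    public static void main(String[] args) {
        double[] rates = selectionRates(100, 10, 20000, 1L);
        double min = 1, max = 0;
        for (double r : rates) { min = Math.min(min, r); max = Math.max(max, r); }
        System.out.printf("expected n/N = 0.100, observed range %.3f to %.3f%n", min, max);
    }
}
```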

Advantages of Probability Samples

• 1) Probability samples, although never perfectly representative, are more representative than other types of samples, such as nonprobability samples, because selection bias is avoided.

• 2) Probability theory permits an estimate of the accuracy or representativeness of the sample. In other words, the probability sampler can estimate how likely the sample is to succeed or fail in representing the population.

Elements and Population

• Elements are the units about which information is collected and that provide the basis of analysis. In social research, the elements are most often individuals; sometimes they are families, social clubs, corporations, or nations.

• A population is the theoretically specified aggregation of the elements in a study, for example, current U.S. citizens or college students.

• A study population is the aggregation of elements from which the sample is actually selected. For practical reasons, a polling firm may exclude Alaska and Hawaii from a national sample.

Random Selection

• The purpose of sampling: to select a set of elements from a population in such a way that descriptions of those elements accurately portray the total population from which the elements are selected.

• Random selection, in which each element has an equal chance of selection independent of any other event in the selection process, is the key to accomplishing the purpose of sampling.

Flipping Coins

• A classic illustration of random selection is flipping a coin. On each flip, the chance of getting heads (or tails) is 50%, irrespective of all previous results.

• Sampling distribution of ten cases

• The example on pp. 186-188 will be illustrated in the class session.

• But remember the conclusion: every increase in sample size improves the distribution of estimates of the mean.
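The conclusion above can be checked with a small simulation. This sketch does not reproduce the textbook's ten-case example; it assumes a fair coin, draws 5,000 repeated samples at each of three arbitrary sample sizes (10, 100, 1,000), and reports how spread out the resulting estimates of the proportion of heads are.

```java
import java.util.Random;

/** Sketch: the spread of sample estimates shrinks as the sample size grows (simulated coin flips). */
public class CoinFlipSampling {

    /** Proportion of heads in one sample of n fair-coin flips. */
    static double sampleProportion(Random rng, int n) {
        int heads = 0;
        for (int i = 0; i < n; i++) {
            if (rng.nextBoolean()) heads++;   // each flip: 50% heads, independent of earlier flips
        }
        return (double) heads / n;
    }

    /** Standard deviation of the sample proportions across repeated samples of size n. */
    public static double spreadOfEstimates(int n, int samples, long seed) {
        Random rng = new Random(seed);
        double sum = 0, sumSq = 0;
        for (int s = 0; s < samples; s++) {
            double p = sampleProportion(rng, n);
            sum += p;
            sumSq += p * p;
        }
        double mean = sum / samples;
        return Math.sqrt(sumSq / samples - mean * mean);
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.printf("n = %4d  spread of estimates = %.3f%n", n, spreadOfEstimates(n, 5000, 42L));
        }
    }
}
```

The spreads should come out close to 0.16, 0.05, and 0.016, shrinking roughly with the square root of the sample size.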

Sampling Error

• Sampling error: the degree of error to be expected for a given sample design.

$$S = \sqrt{\frac{P \times Q}{N}}$$

S: standard error (the standard deviation of the sampling distribution)
P: percentage of cases equal to 1 on a binary variable
Q: percentage of cases equal to 0 on a binary variable (Q = 100 - P)
N: number of cases in each sample

• Suppose the approval rate on some social issue is 50% and the sample size is 100. The standard error is then 5%. About 34% of samples will give estimates within one standard error above the mean (50% to 55%), and another 34% will give estimates within one standard error below it (45% to 50%), so about 68% of samples fall between 45% and 55%.
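The standard-error formula is simple enough to compute directly. This minimal sketch takes P in percent (so Q = 100 - P) and reproduces the 50%, N = 100 example; the class and method names are illustrative.

```java
/** Standard error of a percentage, S = sqrt(P * Q / N), with P and Q expressed in percent. */
public class StandardError {

    public static double standardError(double p, int n) {
        double q = 100.0 - p;             // Q = 100 - P
        return Math.sqrt(p * q / n);
    }

    public static void main(String[] args) {
        double s = standardError(50.0, 100);
        System.out.println("S = " + s + "%");   // prints S = 5.0%
        // Plus or minus one standard error covers about 68% (34% + 34%) of samples.
        System.out.println("About 68% of samples fall in " + (50 - s) + "% to " + (50 + s) + "%");
    }
}
```

Quadrupling the sample to N = 400 halves S to 2.5%, which is why precision gains get expensive quickly.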

Populations and Sampling Frame

• A sampling frame is the list or quasi list of elements from which a probability sample is selected. Examples follow.

• The data for our research were obtained from a random sample of parents of children in the third grade in public schools in Yakima County, Washington.

• Our sample of 160 individuals was drawn randomly from the telephone directory of Fayetteville, Arkansas.

A Problem

• Properly drawn samples provide information appropriate for describing the population of elements that compose the sampling frame, and nothing more.

• Very often researchers select samples from a given sampling frame and make assertions about a population that is similar but not identical to the population defined by the sampling frame.

The Sequence

• The sampling frame is a list of the elements composing the study population.

• In practice, existing frames define the study population, rather than the other way around.

• Have a population in mind.

• Search for available sampling frames.

• Redefine your population to accommodate your sampling frame.

Elements

• A random sample of organizations drawn from a list can be used to represent all organizations on that list.

• You can also make use of lists of registered voters, automobile owners, and taxpayers, as well as telephone directories.

• Telephone directories have many defects as a representation of the entire population of a region. First, they have a social class bias: poor people may have no phone line, while rich people may have several. Second, many people choose not to be listed.

Principles

• Findings based on a sample can be taken as representing only the aggregation of elements that compose the sampling frame

• Omissions are inevitable. You need to assess your empirical results correctly and avoid over-generalizing your findings.

• Each element in a sample appears only once.

Types of Sampling Design

• Simple random sampling: once you have a sampling frame, assign a unique number to each element in the frame, and use a random number generator to select cases.

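• Example (Java): print ten random numbers between 0 and 10

    // Math.random() returns a double in [0.0, 1.0), so each printed value lies in [0, 10).
    // The values are not whole numbers and can repeat; they still need mapping to case numbers.
    public class RandomNumbers {
        public static void main(String[] args) {
            for (int i = 0; i < 10; i++) {
                System.out.println(Math.random() * 10);
            }
        }
    }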
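To select actual cases, the random numbers need to be whole case numbers with no repeats. A minimal sketch of simple random sampling without replacement, assuming a numbered frame of 1..N; it uses `Collections.shuffle` instead of `Math.random()`, and the frame size, sample size, and seed are arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch of simple random sampling: number the frame 1..N, then draw n distinct case numbers. */
public class SimpleRandomSample {

    /** Returns n distinct case numbers drawn at random from 1..frameSize, without replacement. */
    public static List<Integer> draw(int frameSize, int n, long seed) {
        if (n > frameSize) throw new IllegalArgumentException("sample larger than frame");
        List<Integer> caseNumbers = new ArrayList<>();
        for (int i = 1; i <= frameSize; i++) caseNumbers.add(i);   // a unique number for each element
        Collections.shuffle(caseNumbers, new Random(seed));       // random order of the whole frame
        List<Integer> sample = new ArrayList<>(caseNumbers.subList(0, n));
        Collections.sort(sample);                                  // sorted only for easier reading
        return sample;
    }

    public static void main(String[] args) {
        System.out.println(draw(1000, 10, 7L));
    }
}
```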

Types of Sampling Design

• Systematic sampling: Every Kth element in the entire list goes into the sample. Sampling interval = population size / sample size; sampling ratio = sample size/population size

• Systematic sampling is a very bad choice if the sampling interval coincides with a periodic pattern in the list. For example, suppose you sample every 10th case on an army roster, but the roster is arranged so that a sergeant comes first in every group of 10 names: the 1st, 11th, 21st, and so on. Your sample would then consist only of sergeants or contain no sergeants at all.
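A minimal sketch of systematic sampling with a random start, assuming a hypothetical roster of 1,000 names and a sample of 100 (interval k = 10). The `main` method builds in the periodic pattern from the sergeant example, assuming sergeants sit at positions 1, 11, 21, and so on, to show that the sample ends up with all sergeants or none.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Sketch of systematic sampling with a random start, plus a check for periodicity in the list. */
public class SystematicSample {

    /** 1-based positions of every k-th element, starting from a random position in 1..k. */
    public static List<Integer> draw(int populationSize, int sampleSize, long seed) {
        int k = populationSize / sampleSize;                 // sampling interval = population size / sample size
        int start = 1 + new Random(seed).nextInt(k);         // random start between 1 and k
        List<Integer> positions = new ArrayList<>();
        for (int pos = start; pos <= populationSize && positions.size() < sampleSize; pos += k) {
            positions.add(pos);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical roster: sergeants at positions 1, 11, 21, ..., so the interval of 10 matches the cycle.
        List<Integer> sample = draw(1000, 100, 3L);
        long sergeants = sample.stream().filter(pos -> pos % 10 == 1).count();
        System.out.println("start = " + sample.get(0) + ", sergeants in sample = " + sergeants + " of " + sample.size());
    }
}
```

Whatever the random start, the sergeant count is either 0 or 100; the usual remedy is to check the list's ordering before choosing an interval.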

Stratified Sampling

• Stratified sampling first organizes the population into homogeneous subsets (which are heterogeneous relative to one another) and then selects the appropriate number of elements from each.

• The goal of stratified sampling is to reduce sampling error by creating homogeneous subpopulations from which the samples are selected.

• One way to produce homogeneous subpopulations in a study of college students is to group students by age cohort, so that each subpopulation consists of students of the same age. Then randomly select cases from each age cohort.

• Depending on your research focus, you may stratify the population by different variables, such as sex, occupation, education, race, social class, or income.
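A minimal sketch of proportionate stratified sampling, in which the same sampling fraction is applied inside every stratum and a simple random sample is drawn within each. The age-cohort strata, their sizes, the 10% fraction, and the seed are all made-up illustration values.

```java
import java.util.*;

/** Sketch of proportionate stratified sampling: the same sampling fraction inside every stratum. */
public class StratifiedSample {

    public static Map<String, List<String>> draw(Map<String, List<String>> strata, double fraction, long seed) {
        Random rng = new Random(seed);
        Map<String, List<String>> sample = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> stratum : strata.entrySet()) {
            List<String> members = new ArrayList<>(stratum.getValue());
            int take = (int) Math.round(members.size() * fraction);   // stratum's share of the sample
            Collections.shuffle(members, rng);                          // simple random sample within the stratum
            sample.put(stratum.getKey(), new ArrayList<>(members.subList(0, take)));
        }
        return sample;
    }

    public static void main(String[] args) {
        // Hypothetical student roster stratified by age cohort (cohort sizes are invented).
        String[] cohorts = {"age 18", "age 19", "age 20", "age 21"};
        int[] sizes = {400, 300, 200, 100};
        Map<String, List<String>> strata = new LinkedHashMap<>();
        for (int c = 0; c < cohorts.length; c++) {
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < sizes[c]; i++) ids.add(cohorts[c] + " #" + i);
            strata.put(cohorts[c], ids);
        }
        draw(strata, 0.1, 11L).forEach((cohort, chosen) -> System.out.println(cohort + ": " + chosen.size() + " sampled"));
    }
}
```

Because each cohort contributes exactly its proportional share (40, 30, 20, 10), sampling error from the stratifying variable itself is removed; only within-cohort variation remains.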

Implicit Stratification

• Some lists have implicit stratification. For example, a university may use students' Social Security numbers to order a roster for the entire university. Because the first digits of an SSN historically reflected the state where it was issued, the roster is roughly stratified by geographic location. In such cases, systematic sampling from the ordered list yields a sample that is implicitly stratified by geography.

An example

• Studying students at the University of Hawaii

• The sampling frame is the computerized student file containing each student's ID, gender, name, address, SSN, major, age, and class.

• Redefine the study population as day-program, degree-seeking students in the fall semester on the Manoa campus, including all departments, all levels, and all nationalities.

• Stratify the population by college class into subpopulations.

• Set the sample size at 1,100 and the sampling ratio at 1/14. A random number generator produces a number from 1 to 14, and the student at that position in every block of 14 students is selected into the sample.

• Because of budget cuts, the sample size had to be reduced to 733. A systematic sample with a random start, drawn from the 1,100 selected students, reduces the sample to 733.

Multistage Cluster Sampling

• Multistage cluster sampling first samples groups of elements, followed by the selection of elements within each of the selected clusters

• Bian (1994) used multistage cluster sampling in his study of work and inequality in urban China. He randomly selected 2 of the 6 districts in Tianjin, China. Each district contains more than 100 street blocks, each of which keeps a complete list of the households living in it. Bian randomly selected 10 street blocks within each district and 50 households within each street block. Because he interviewed one individual in each household, his sample contained 2 * 10 * 50 = 1,000 individuals.

• Bian, Yanjie. 1994. Work and Inequality in Urban China. Albany, NY: State University of New York Press.
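The three stages of the Bian design can be sketched as nested simple random samples. The text gives only "more than 100" blocks per district, so this sketch assumes, hypothetically, 120 blocks in every district and 300 households in every block; real clusters differ in size, which is the problem PPS addresses later.

```java
import java.util.*;

/** Sketch of a three-stage cluster design using the Bian (1994) counts; cluster sizes are assumed. */
public class MultistageClusterSample {

    /** Draws k distinct indices from 0..n-1 by simple random sampling. */
    static List<Integer> srs(int n, int k, Random rng) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, rng);
        return new ArrayList<>(idx.subList(0, k));
    }

    /** Returns "district/block/household" labels for every sampled household. */
    public static List<String> draw(int districts, int blocksPerDistrict, int householdsPerBlock,
                                    int districtsToPick, int blocksToPick, int householdsToPick, long seed) {
        Random rng = new Random(seed);
        List<String> sample = new ArrayList<>();
        for (int d : srs(districts, districtsToPick, rng)) {                    // stage 1: districts
            for (int b : srs(blocksPerDistrict, blocksToPick, rng)) {           // stage 2: street blocks in district d
                for (int h : srs(householdsPerBlock, householdsToPick, rng)) {  // stage 3: households in block b
                    sample.add(d + "/" + b + "/" + h);
                }
            }
        }
        return sample;
    }

    public static void main(String[] args) {
        // 2 of 6 districts; 10 of an assumed 120 blocks each; 50 of an assumed 300 households per block.
        List<String> sample = draw(6, 120, 300, 2, 10, 50, 5L);
        System.out.println(sample.size() + " households, one respondent each");   // 2 * 10 * 50 = 1000
    }
}
```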

Increasing Sampling Error

• A multistage design has the drawback of increasing sampling error, which grows with the number of stages. In the Bian example, researchers incur sampling error when they randomly select districts, again when they select street blocks, and once more when they select households.

• However, for a given sample size (usually fixed by budget constraints), the number of clusters trades off against the number of elements within each cluster.

Solutions

• The first solution is to increase the number of clusters and decrease the number of elements within each cluster for a given sample size. This works because elements within a cluster tend to be homogeneous, so a few elements represent their cluster adequately, while more clusters capture more of the variation between clusters and thus reduce sampling error.

• The second solution uses stratification within the multistage design. For example, you might use geographic location as the stratifying variable to produce strata, within which you randomly select clusters such as churches.

• The U.S. Census Bureau has standardized this practice by surveying 5 households per census block. To study 2,000 households, you would randomly select 400 blocks from the list.

Probability Proportionate to Size (PPS) Sampling

• A more sophisticated method called probability proportionate to size (PPS) sampling ensures that every element has the same probability of being selected in multistage cluster sampling.

• Suppose we want to sample 100 blocks from a total of 1,000 street blocks in a city, and then select 1 out of 10 households in each selected block. The probability of block selection is 10%, and the probability of household selection within a block is 10%. Thus each household's probability of being selected for the study is 1%. So what's the problem?

PPS

• The problem is that guaranteeing each household the same probability of selection assumes every block has the same number of households, which is usually not the case.

• Suppose a city has 100 residents living in 1 densely populated block and another 99 residents spread equally over 9 sparsely populated blocks (11 per block). You select 1 block at random and then interview 10 residents in it. The probability of selection for a resident of the dense block is 10% * (10/100) = 1%, whereas for a resident of a sparse block it is 10% * (10/11), or about 9%.
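The arithmetic behind the dense-versus-sparse comparison is P(resident) = P(block) * (residents drawn / residents in block). This sketch assumes each of the 10 blocks is equally likely to be chosen and that a fixed 10 residents are interviewed in whichever block is drawn (for the dense block this is the same as 1 in 10).

```java
/** Sketch: per-resident selection probabilities when blocks are chosen with equal probability. */
public class UnequalClusterProbability {

    /** P(resident selected) = P(block selected) * (residents drawn / residents in block). */
    public static double residentProbability(double blockProbability, int drawn, int blockSize) {
        return blockProbability * drawn / blockSize;
    }

    public static void main(String[] args) {
        double pBlock = 1.0 / 10;                                 // 1 of 10 blocks, each equally likely
        double dense = residentProbability(pBlock, 10, 100);      // dense block: 100 residents
        double sparse = residentProbability(pBlock, 10, 11);      // each sparse block: 99 / 9 = 11 residents
        System.out.printf("dense: %.3f%%  sparse: %.3f%%%n", 100 * dense, 100 * sparse);
    }
}
```

Sparse-block residents end up roughly nine times more likely to be interviewed than dense-block residents.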

Solution

• PPS solves the unequal-probability problem of multistage cluster sampling by weighting each cluster's probability of selection in proportion to its size.

• A city has two blocks. Block A has 100 households, and block B has 10 households. We make the probability of selecting block A 10 times that of selecting block B: P(A) = 5% and P(B) = 0.5%. Suppose we select 5 households from each block. A household in block A then has a 5% chance of being selected within its block, whereas a household in block B has a 50% chance. Overall, however, a household in block A has a 5% * 5% = 0.25% chance of being selected, and a household in block B has the same 0.5% * 50% = 0.25% chance.
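The two-block example can be checked by coding the PPS rule directly: make each block's probability proportional to its size, draw a fixed number of households per block, and multiply. This sketch assumes a constant of 0.05% per household (so the 100-household block gets 5%), matching the example; the names are illustrative.

```java
/** Sketch of PPS arithmetic: block probability proportional to size, fixed households per block. */
public class PpsProbability {

    /** Overall probability for one household = P(block) * (households drawn / households in block). */
    public static double overall(double blockProbability, int householdsDrawn, int blockSize) {
        return blockProbability * householdsDrawn / blockSize;
    }

    public static void main(String[] args) {
        double probabilityPerHousehold = 0.05 / 100;   // chosen so the 100-household block A gets P(A) = 5%
        int[] blockSizes = {100, 10};                  // block A, block B
        int drawnPerBlock = 5;
        for (int size : blockSizes) {
            double pBlock = probabilityPerHousehold * size;   // PPS: 5% for A, 0.5% for B
            double p = overall(pBlock, drawnPerBlock, size);
            System.out.printf("block of %d: P(block) = %.2f%%, P(household) = %.2f%%%n", size, 100 * pBlock, 100 * p);
        }
    }
}
```

Both blocks print a household probability of 0.25% (0.0025 as a proportion): the block size cancels out, which is exactly what PPS is designed to achieve.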

Variations in PPS

• Disproportionate Sampling and Weighting

• A real example