week 7 sampling

Sampling

Andrew Martin

PS 372

University of Kentucky

What do polls tell us?

Population

❖ If we want to assess American support for competing presidential candidates, we must clarify what we mean by Americans.

❖ In other words, we need to specify a population.

❖ A population is the complete set of relevant units of analysis.

❖ For the purpose of studying elections, the population is generally defined as the U.S. voting-age population (residents 18 and older).

Sample

❖ Interviewing every voting-age American would be impossible; the time and money constraints are too great.

❖ However, it is possible (and more practical) to select a sample from the population for investigation.

❖ A sample is any subset of units collected in some manner from a population.

❖ The sample size and sampling method ultimately determine the quality of inferences that can be made about the population.

Population vs. Sample

Study | Population | Sample
U.S. Voting | Voting-Age Pop. | Gallup Poll Respondents
Military Coups | All coups | Coups in Latin America in 1980s
Supreme Court Decision-making | All decisions on the merits | Merits decisions on Burger Court
Democratization | All democratizing countries | Democratizing post-Cold War

Populations and Samples

❖ Ideally, political scientists would like to measure and gather information about the population.

❖ Examples: Averages, differences between two groups, relationships among variables.

❖ If this information can be found and quantified for the entire population, the number is known as a population parameter.

❖ Measuring the entire population is rarely possible. However, sample statistics allow political scientists to approximate the corresponding population values, or parameters.

Population Parameters

❖ Population parameters are typically denoted by lowercase English or Greek letters, usually the Greek letter theta ( θ ).

❖ A proportion, such as the proportion of Americans who support the war in Iraq at a particular time, is typically designated as P or π.

Sample Statistics

❖ Sample statistics are frequently expressed with a hat (^) over a character to denote that the quantity is a statistic, not a parameter. Sometimes lowercase p is used for a sample proportion.

Statistics vs. Parameters

❖ For a population mean, the lowercase Greek letter mu (μ) is used.

❖ For the corresponding sample statistic, μ̂ (mu hat) or Ȳ (Y bar) is used.

Population vs. Samples

❖ An element is a single occurrence, realization or instance of the objects or entities being studied.

❖ A population can be subdivided into small groups known as strata.

❖ The elements in each stratum share one or more characteristics.

Sampling

❖ The particular population from which a sample is actually drawn is called a sampling frame.

❖ Sampling frames are supposed to contain all elements that are part of the population of interest, but in practice are often incomplete.

❖ Example: Polling UK students using the annual student phone directory.

1936 Presidential Election

❖ The Literary Digest predicted that Republican Alf Landon would defeat Democrat Franklin D. Roosevelt.

❖ The population: U.S. voters.

❖ The sampling frame: Telephone directories and automobile registration lists.

❖ Telephone and car ownership were not common then. The sample was not representative of the actual population because it overrepresented wealthy voters.

Sampling

❖ Now, virtually everyone has a telephone. But some people have unlisted numbers.

❖ Researchers have developed random digit dialing to deal with this problem. A computer randomly selects telephone numbers, which is how people with unlisted numbers are contacted.

❖ However, not everyone owns a telephone. Millions of people are switching to cell phones, which will eventually cause pollsters to change their methodology.
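A minimal sketch of the idea behind random digit dialing, assuming a hypothetical helper `random_digit_dial` and the fictitious 555 exchange. The area code and exchange are held fixed and the last four digits are drawn at random, so listed and unlisted numbers are equally likely to come up. Actual polling designs are more elaborate; this only illustrates the principle.

```python
import random

def random_digit_dial(area_code, exchange, count, seed=None):
    """Generate `count` phone numbers whose last four digits are random.

    Hypothetical helper for illustration; not an actual polling tool.
    """
    rng = random.Random(seed)
    numbers = []
    for _ in range(count):
        last_four = rng.randrange(10000)  # 0000 through 9999, equally likely
        numbers.append(f"{area_code}-{exchange}-{last_four:04d}")
    return numbers

# The area code and exchange are assumptions chosen for this example.
numbers = random_digit_dial("859", "555", count=5, seed=1)
print(numbers)
```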

Sampling

❖ Political science researchers like to use information collected in the sample to make inferences about the whole population.

❖ If the sampling frame is incomplete or inappropriate, sample bias will occur.

❖ This causes the sample to be unrepresentative of the population and can lead scholars to draw incorrect conclusions.
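To see how an incomplete frame can distort an estimate, here is a small sketch with made-up numbers. The group shares, support levels, and frame coverage rates are assumptions for illustration only, not historical data. Wealthier voters are more likely to appear in the frame, so support within the frame differs from support in the population.

```python
# All numbers below are hypothetical, chosen only to illustrate frame bias.
groups = {
    # name: (share of population, support for candidate A, chance of being in frame)
    "wealthy":     (0.30, 0.40, 0.80),
    "not_wealthy": (0.70, 0.65, 0.20),
}

# True support for candidate A in the whole population.
true_support = sum(share * support for share, support, _ in groups.values())

# Support among only those elements that appear in the sampling frame.
frame_weight = sum(share * in_frame for share, _, in_frame in groups.values())
frame_support = sum(
    share * in_frame * support for share, support, in_frame in groups.values()
) / frame_weight

print(f"population: {true_support:.3f}, frame: {frame_support:.3f}")
```

Even a perfectly random sample drawn from this frame would center on the frame's value rather than the population's, which is why a larger sample does not cure a bad frame.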

Sampling

❖ A probability sample is simply a sample for which each element in the total population has a known probability of being sampled.

❖ A nonprobability sample is one in which each element in the population has an unknown probability of being selected.

Simple Random Sample

❖ In a simple random sample each element and combination of elements has an equal chance of being selected.

❖ However, this is often difficult to do in practice.

Simple Random Sample

❖ During the Vietnam War, the Selective Service decided it would randomly draft soldiers by sampling days of the year.

❖ A drum contained 366 capsules with days of the year. Capsules were drawn, and men born on the day written on the capsule would be called to serve in the military unless exempted.

❖ However, the capsules must not have been properly mixed, because the Selective Service tended to oversample days during the last six months of the year.

Simple Random Sample Method 1

❖ Example: We have 1,507 elements in the population and wish to draw a sample of 150.

❖ Every element is numbered, starting at 1 and ending at 1,507.

❖ Using a random number table, an element is selected each time its corresponding number appears.

❖ Any system of combining the numbers is acceptable as long as the numbers are random.
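The numbering approach above can be sketched in a few lines, with Python's pseudorandom number generator standing in for the random number table (an assumption; the slide describes a printed table). The function name `simple_random_sample` is hypothetical.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw element numbers 1..population_size at random, without replacement."""
    rng = random.Random(seed)
    element_numbers = range(1, population_size + 1)
    # rng.sample gives every combination of elements an equal chance.
    return rng.sample(element_numbers, sample_size)

sample = simple_random_sample(1507, 150, seed=372)
print(sorted(sample)[:10])  # first few selected element numbers
```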

Simple Random Sample Method 2

❖ Example: We have 1,507 elements in the population and wish to draw a sample of 150.

❖ All elements are represented on corresponding marbles and put in a hat, which is continuously and thoroughly mixed.

❖ Each element has an equal chance of being selected.

Systematic Sample

❖ Elements are selected from a list at pre-determined intervals. In other words, they are chosen systematically rather than randomly.

❖ Every jth element on a list is selected. The number j is known as the sampling interval.

❖ If you have a population of 2,000 on a list and you want a sample of 200, you can select every 10th element on the list for the sample.

❖ Usually the starting number is randomly selected. This is known as a random start.
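A minimal sketch of a systematic sample with a random start, using the 2,000-element list from the example. The function name `systematic_sample` is hypothetical, and the sketch assumes the list length is a multiple of the sample size.

```python
import random

def systematic_sample(elements, sample_size, seed=None):
    """Select every j-th element after a random start, where j is the sampling interval."""
    rng = random.Random(seed)
    interval = len(elements) // sample_size   # j = 2000 / 200 = 10
    start = rng.randrange(interval)           # random start within the first interval
    return elements[start::interval][:sample_size]

population = list(range(1, 2001))             # elements numbered 1..2000
sample = systematic_sample(population, 200, seed=7)
print(sample[:5])
```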

Systematic Sample

❖ Systematic samples could be biased if:

1. The elements on the list have been ranked according to a characteristic.

2. The list contains a pattern that corresponds to the sampling interval.

Stratified Sample

❖ A stratified sample is a probability sample in which elements sharing one or more characteristics are grouped, and elements are selected from each group in proportion to the group’s representation in the total population.

❖ It is less difficult to draw a sample from a homogeneous population than a heterogeneous population.

Stratified Sample

❖ Can be proportionate or disproportionate.

❖ In a proportionate sample, each stratum is represented in proportion to its size in the population.

❖ To determine the number of elements to sample in each stratum, a sampling fraction must be calculated.

Proportionate Stratified Sample

❖ Ex: We have 8,000 freshmen, 5,000 sophomores, 4,000 juniors and 3,000 seniors we wish to poll about the presidential election. We want to sample 2,000 students.

❖ The sampling fraction would be the desired sample size divided by the population, so 2000/20000.

❖ Therefore, we would sample 10 percent of each stratum, which in this case is defined by school year: 800 freshmen, 500 sophomores, 400 juniors and 300 seniors.
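The allocation for this example can be worked out as follows. This is a sketch of the arithmetic only; the variable names are illustrative, and drawing the elements within each stratum would still require a random selection step.

```python
# Stratum sizes from the example (students by school year).
strata = {"freshmen": 8000, "sophomores": 5000, "juniors": 4000, "seniors": 3000}
desired_sample_size = 2000

population_size = sum(strata.values())                   # 20,000
sampling_fraction = desired_sample_size / population_size

# Apply the same sampling fraction to every stratum.
allocation = {name: round(size * sampling_fraction) for name, size in strata.items()}
print(sampling_fraction, allocation)
```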

Proportionate Stratified Sample

❖ When selecting characteristics on which to stratify a list, you should choose those expected to have a relationship with the dependent variable.

❖ Ex: Household income samples should stratify by education, sex and race.

❖ Ex: Members of Congress (MCs) stratified by party and experience.

❖ Ex: News stories by network.

Disproportionate Stratified Sample

❖ In a disproportionate sample, some strata are overrepresented and others are underrepresented.

❖ Usually, strata making up a smaller percentage of the population are oversampled so we can make useful inferences about that group independent of the other strata.

❖ To prevent having a biased sample, each stratum is weighted by its proportion of the population.

Disproportionate Stratified Sample

❖ JRM 223

❖ .625(Liberal Arts) + .125(Engineering) + .25(Business) = Mean Student Body GPA

❖ .625(2.5) + .125(3.3) + .25(2.7) = 2.65
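The weighted mean above can be checked with a short calculation. This sketch assumes the weights are each college's share of the student body and the other figures are the stratum sample means, which is how the slide's equation reads.

```python
# Assumed interpretation: weights are population shares, values are stratum mean GPAs.
population_share = {"Liberal Arts": 0.625, "Engineering": 0.125, "Business": 0.25}
stratum_mean_gpa = {"Liberal Arts": 2.5, "Engineering": 3.3, "Business": 2.7}

# Weight each stratum mean by its share of the population, then sum.
weighted_mean = sum(
    population_share[college] * stratum_mean_gpa[college]
    for college in population_share
)
print(round(weighted_mean, 2))  # 2.65
```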

Cluster Samples

❖ A cluster sample is a probability sample in which the sampling frame initially consists of clusters of elements.

❖ NN 174

Cluster Samples

❖ Suppose there are 500 blocks, and from these blocks 25 are chosen at random.

❖ On these 25 blocks, there are 4,000 dwelling units or households.

❖ One quarter of these households will be contacted because we desire a sample of 1,000 individuals.

Cluster Samples

❖ Each household's probability of being selected is the same. How do we know this?

❖ 25/500 (probability that the household's block will be chosen) × 1000/4000 (probability of being surveyed if the block is chosen) = 1/80
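The two-stage probability can be verified with exact fractions. The sketch assumes the same one-in-four sampling fraction is applied within every selected block; the variable names are illustrative.

```python
from fractions import Fraction

p_block_chosen = Fraction(25, 500)             # stage 1: block is selected
p_surveyed_given_block = Fraction(1000, 4000)  # stage 2: household selected within block

# Multiply the stage probabilities to get the overall selection probability.
p_household = p_block_chosen * p_surveyed_given_block
print(p_household)  # 1/80
```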

Nonprobability Samples

❖ Sometimes an element's chance of being selected is unknown, and nonprobability samples have to be collected.

❖ Although probability samples are preferable, sometimes they are not feasible.

❖ Researchers may be able to learn more studying carefully selected, even unusual cases.

Nonprobability Samples

❖ Purposive samples allow the researcher to have discretion in selecting elements for observation. (Ex: Fenno's Home Style)

❖ A quota sample is a sample in which elements are sampled in proportion to the population. Similar to a stratified sample but elements are not chosen probabilistically.

❖ In a snowball sample, respondents are asked to identify other persons who might qualify for inclusion in the sample.

Statistical Inference

❖ Statistical inference involves the mathematical theory and techniques for making conjectures about the unknown characteristics (parameters) of populations based on samples.


❖ Sample statistics provide us with estimates or approximations of population parameters.

❖ These estimates may differ from the “true” value of the population parameter, but if the sample is collected correctly and is large enough, the estimates are unlikely to be far from the truth.


❖ We will focus on three concepts:

❖ Expected values

❖ Standard errors

❖ Sampling distributions

Expected Value

❖ Expected value is the mean or average value of a sample statistic based on repeated samples from a population.

Expected Value

❖ Suppose a candidate for state senate wants to know how many independent voters live in a district, which has grown rapidly during the last 10 years. Therefore, there are no reliable Census data available.

❖ Why might a state senator care about the number of independents in his/her district?

Expected Value

❖ Suppose the true level of registered Independents is 25 percent, or .25.

❖ In formal terms, P = .25, where P = population parameter.

❖ You take the first sample. Two of 10 respondents say they are independents. Your first estimate, or sample statistic, has some sampling error.

❖ Specifically, the sampling error is the discrepancy (.05) between the population parameter (.25) and the sample statistic (.20).

Expected Value

❖ What about four samples? Suppose you draw 4 samples of 10 and the proportions of independents are .20, .30, .40 and .20. Their mean is (.20 + .30 + .40 + .20)/4 = .275, an observed value not far from the true value of .25.

❖ Four samples of 10 bring us closer to the truth than one sample. What about 1,000 samples of 10? What about 1,000 samples of 50?
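The questions about 1,000 samples can be explored with a small simulation. This is a sketch under stated assumptions: the true proportion is .25, each respondent is drawn independently (as if from a very large population), and the function name `mean_of_sample_proportions` is hypothetical.

```python
import random

def mean_of_sample_proportions(true_p, sample_size, num_samples, seed=None):
    """Average the sample proportion p over many independent random samples."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Count respondents who turn out to be independents in this sample.
        independents = sum(rng.random() < true_p for _ in range(sample_size))
        proportions.append(independents / sample_size)
    return sum(proportions) / num_samples

mean_10 = mean_of_sample_proportions(0.25, sample_size=10, num_samples=1000, seed=1)
mean_50 = mean_of_sample_proportions(0.25, sample_size=50, num_samples=1000, seed=1)
print(mean_10, mean_50)  # both should land close to .25
```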

Expected Value

❖ If statistics (or observed values) are calculated for each of many, many independently and randomly chosen samples, their average or mean will equal the corresponding population parameter (or true value).

❖ Statisticians refer to this mean as the expected value (E) of the estimator.

Expected Value

❖ If θ represents the population parameter, then θ̂ (theta hat) represents a sample estimator of that characteristic. We can then write:

$$E(\hat{\theta}) = \theta$$

Expected Value

❖ In the case of a sample proportion based on a simple random sample, we have:

❖ E(p) = P, where p is the sample proportion and P is the population proportion.

❖ In the long run, the average of the sample statistics would theoretically equal the true value, the population proportion.

Sampling Distribution

❖ A sampling distribution of a sample statistic is a theoretical expression that describes the mean, variation, and shape of the distribution of an infinite number of occurrences of the statistic when calculated on samples of size N drawn independently and randomly from a population.

❖ JRM 230

❖ JRM 233

❖ JRM 234

❖ JRM 235
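A sampling distribution is theoretical, but a simulation with a large, finite number of samples can approximate its center and spread. The sketch below assumes a true proportion of .25 and samples of 50 drawn independently; the helper name `simulate_sample_proportions` is hypothetical.

```python
import math
import random
import statistics

def simulate_sample_proportions(true_p, sample_size, num_samples, seed=None):
    """Return the sample proportion from each of many independent random samples."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_p for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]

proportions = simulate_sample_proportions(0.25, sample_size=50, num_samples=5000, seed=2)

center = statistics.mean(proportions)      # approximates the expected value
spread = statistics.pstdev(proportions)    # approximates the standard error
theoretical_spread = math.sqrt(0.25 * 0.75 / 50)

print(center, spread, theoretical_spread)
```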

Confidence and Error

❖ Confidence is the degree of belief that an estimated range of values, bounded by a low value and a high value, includes or covers the population parameter. In political science this concept is normally described by a confidence interval.

❖ Standard error is the standard deviation or measure of variability of a sampling distribution. In other words, it tells us how much the sample statistic varies from one sample to the next.

Standard Error

❖ The standard error of a reported proportion or percentage p measures its accuracy, and is the estimated standard deviation of that percentage. It can be estimated from just p and the sample size, n, if n is small relative to the population size, using the following formula:

$$SE(p) = \sqrt{\frac{p(1-p)}{n}}$$

Standard Error

$$\sqrt{\frac{.25 \times .75}{10}} \approx .14$$
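The calculation for a proportion of .25 and a sample of 10 can be reproduced as follows. The function name `standard_error_of_proportion` is illustrative, and the formula assumes a simple random sample that is small relative to the population.

```python
import math

def standard_error_of_proportion(p, n):
    """Estimated standard error of a sample proportion p from a sample of size n."""
    return math.sqrt(p * (1 - p) / n)

se = standard_error_of_proportion(0.25, 10)
print(round(se, 2))  # 0.14
```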

Confidence Intervals

❖ Get a poll, talk about it.

Gallup Poll Standard Error

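$$\sqrt{\frac{.52 \times .48}{2761}} \approx .01$$

❖ If we want the margin of error for the poll, which is half the width of the 95 percent confidence interval, we multiply this standard error by 1.96.

❖ .01 × 1.96 = .0196 ≈ .02, or about 2 percentage points.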
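A short sketch of the margin-of-error calculation for this poll, using the slide's figures (a proportion of .52 and a sample of 2,761). It assumes a simple random sample and a 95 percent confidence level, and it carries the unrounded standard error through the calculation, so the intermediate value differs slightly from the rounded .01.

```python
import math

p = 0.52   # reported sample proportion
n = 2761   # sample size

standard_error = math.sqrt(p * (1 - p) / n)
margin_of_error = 1.96 * standard_error   # 95 percent confidence level

# The confidence interval is the estimate plus or minus the margin of error.
confidence_interval = (p - margin_of_error, p + margin_of_error)

print(round(standard_error, 4), round(margin_of_error, 4))
print(tuple(round(bound, 3) for bound in confidence_interval))
```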

❖ Margin of Error Graphic

Confidence Intervals

❖ Standard distribution photo
