Data Collection and Sampling (Chapter 5)

Upload: margaretmargaret-hopkins

Post on 21-Jan-2016

5.2 Methods of Collecting Data

• The reliability and accuracy of the data affect the validity of the results of a statistical analysis.

• The reliability and accuracy of the data depend on the method of collection.

• Three of the most popular sources of statistical data are:
– Published data
– Observational studies
– Experimental studies

Published Data

– This is often a preferred source of data because of its low cost and convenience.
– Published data are found as printed material, on tapes and disks, and on the Internet.
– Data published by the organization that collected them are called PRIMARY DATA.
  For example: data published by the US Bureau of the Census.
– Data published by an organization other than the one that collected them are called SECONDARY DATA.
  For example:
  • The Statistical Abstract of the United States compiles data from primary sources.
  • Compustat sells a variety of financial data tapes compiled from primary sources.

Observational and experimental studies

• When published data are unavailable, one needs to conduct a study to generate the data.
– An observational study is one in which measurements representing a variable of interest are observed and recorded without controlling any factor that might influence their values.
– An experimental study is one in which measurements representing a variable of interest are observed and recorded while controlling factors that might influence their values.

Surveys

• Surveys solicit information from people.
• Surveys can be made by means of
– personal interview
– telephone interview
– self-administered questionnaire

A good questionnaire must be well designed:
• Keep the questionnaire as short as possible.
• Ask short, simple, and clearly worded questions.
• Start with demographic questions to help respondents get started comfortably.
• Use dichotomous and multiple-choice questions.
• Use open-ended questions cautiously.
• Avoid leading questions.
• Pretest the questionnaire on a small number of people.
• Think about the way you intend to use the collected data when preparing the questionnaire.

5.3 Sampling

• Motivation for conducting a sampling procedure:
– Costs.
– Population size.
– The possible destructive nature of the sampling process.
• The sampled population and the target population should be similar to one another.

5.4 Sampling Plans

• We introduce three different sampling plans:
– Simple random sampling
– Stratified random sampling
– Cluster sampling

Simple Random Sampling

• In simple random sampling, all samples of the same size are equally likely to be chosen.
• To conduct random sampling:
– assign a number to each element of the chosen population (or use already given numbers),
– randomly select the sample numbers (members).
• Use a random number table or a software package.
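The two steps above can be sketched with a software package. The following is a minimal illustration in Python, not part of the original slides; the function name `simple_random_sample`, the 20-element population, and the seed are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the run repeatable
    # Step 1: number the elements 1..N.
    # Step 2: randomly select n distinct numbers.
    numbers = rng.sample(range(1, len(population) + 1), n)
    return [population[i - 1] for i in numbers]

# Hypothetical population of 20 labelled elements.
population = [f"element-{i}" for i in range(1, 21)]
sample = simple_random_sample(population, 5, seed=1)
print(sample)
```

Because `random.sample` draws without replacement, no element can appear twice, so no spare numbers are needed.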

• Example 5.1
– A government income-tax auditor is responsible for 1,000 tax returns.
– The auditor will randomly select 40 returns to audit.
– Use Excel’s random number generator to select the returns.
• Solution
– We generate 50 numbers between 1 and 1,000. We need only 40 numbers, but the extras can be used if duplicate numbers are generated.

Uniform number    × 1,000       Rounded up
0.3820002         382.00018     383
0.1006806         100.68056     101
0.5964843         596.48427     597
0.8991058         899.10581     900
0.8846095         884.60952     885
0.9584643         958.46431     959
0.0144963         14.496292     15
0.4074221         407.4221      408
0.8632466         863.24656     864
0.1385846         138.58455     139
0.2450331         245.03311     246
. . .             . . .         . . .

– Uniform number: 50 numbers uniformly distributed between 0 and 1.
– × 1,000: 50 random numbers between 0 and 1,000.
– Rounded up: 50 uniformly distributed random whole numbers between 1 and 1,000; each whole number has a probability of 1/1000 of being selected.

The auditor should select 40 files numbered 383, 101, ...
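The procedure in Example 5.1 (generate uniform numbers, multiply by 1,000, round up, skip duplicates) can be mimicked outside Excel. The sketch below is a Python analogue under stated assumptions: the function name `draw_return_numbers`, its parameters, and the seed are hypothetical, and the numbers it produces will not match the table above.

```python
import math
import random

def draw_return_numbers(n_returns=1000, n_needed=40, n_generated=50, seed=None):
    """Generate extra uniform numbers, scale, round up, and drop duplicates."""
    rng = random.Random(seed)
    selected = []
    for _ in range(n_generated):
        u = rng.random()                    # uniform on [0, 1)
        number = math.ceil(u * n_returns)   # whole number, normally in 1..n_returns
        number = max(number, 1)             # guard for the rare case u == 0.0
        if number not in selected:          # skip duplicates
            selected.append(number)
    if len(selected) < n_needed:
        raise ValueError("too many duplicates; generate more numbers")
    return selected[:n_needed]

returns_to_audit = draw_return_numbers(seed=5)
print(returns_to_audit)
```

Generating 50 numbers when only 40 are needed follows the slide's reasoning: the spare numbers absorb any duplicates.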

Stratified Random Sampling

• This sampling procedure separates the population into mutually exclusive sets (strata) and then draws a simple random sample from each stratum.
• Examples of criteria for forming strata:
– Sex: male, female
– Age: under 20, 20-30, 31-40, 41-50
– Occupation: professional, clerical, blue-collar

• With this procedure we can acquire information about
– the whole population
– each stratum
– the relationships among strata.

• There are several ways to build the stratified sample. For example, keep each stratum's share of the sample equal to its share of the population.

A sample of size 1,000 is to be drawn.

Stratum   Income             Population proportion   Stratum size (in sample)
1         under $15,000      25%                     250
2         $15,000-29,999     40%                     400
3         $30,000-50,000     30%                     300
4         over $50,000       5%                      50
Total                                                1,000
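The proportional allocation in the income table can be reproduced in a few lines. This is a sketch only: the proportions come from the table, while the function names, the 20,000-person population, and the use of integer IDs as population members are assumptions for illustration.

```python
import random

# Population proportions taken from the income table.
proportions = {
    "under $15,000": 0.25,
    "$15,000-29,999": 0.40,
    "$30,000-50,000": 0.30,
    "over $50,000": 0.05,
}

def proportional_allocation(proportions, sample_size):
    """Give each stratum the same share of the sample as of the population."""
    return {name: round(p * sample_size) for name, p in proportions.items()}

def stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample of the allocated size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, allocation[name])
            for name, members in strata.items()}

allocation = proportional_allocation(proportions, 1000)
print(allocation)

# Hypothetical population of 20,000 people, represented by ID numbers.
strata, next_id = {}, 0
for name, p in proportions.items():
    size = round(p * 20_000)
    strata[name] = list(range(next_id, next_id + size))
    next_id += size

sample = stratified_sample(strata, allocation, seed=0)
```

With other proportions or sample sizes, rounding can make the stratum sizes add up to slightly more or less than the target, so the total is worth checking.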

Cluster Sampling

• Cluster sampling is a simple random sample of groups or clusters of elements.
• This procedure is useful when
– it is difficult and costly to develop a complete list of the population members (making it difficult to develop a simple random sampling procedure),
– the population members are widely dispersed geographically.
• Cluster sampling may increase sampling error because of probable similarities among cluster members.
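Cluster sampling can be sketched as a simple random sample of cluster names, followed by taking every member of the chosen clusters. The example below is illustrative only; the city blocks, households, function name, and seed are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and return all of their members."""
    rng = random.Random(seed)
    # Only a list of clusters is needed, not a list of every population member.
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical population: 10 city blocks with 8 households each.
clusters = {
    f"block-{b}": [f"block-{b}/household-{h}" for h in range(1, 9)]
    for b in range(1, 11)
}
chosen_blocks, households = cluster_sample(clusters, 3, seed=2)
print(chosen_blocks)
print(len(households), "households selected")
```

Note that the random draw happens only at the block level, which is why similar households within a block can increase sampling error.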

5.5 Sampling and Non-sampling errors

• Two major types of errors can arise when a sampling procedure is performed.

• Sampling Error
– Sampling error refers to differences between the sample and the population that arise because of the specific observations that happen to be selected.
– Sampling error is expected to occur when making a statement about the population based on the sample taken.

Sampling Errors

[Figure: population income distribution, marking μ (the population mean) and x̄ (the sample mean); the distance between them is the sampling error.]

The sample mean falls where it does only because certain randomly selected observations were included in the sample.
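Sampling error can be illustrated with a short simulation. The sketch below assumes a hypothetical lognormal income population (the parameters 10.5 and 0.6 are arbitrary choices, not from the slides) and measures how far the sample mean typically lands from the population mean for two sample sizes.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 right-skewed incomes.
population = [rng.lognormvariate(10.5, 0.6) for _ in range(100_000)]
mu = statistics.fmean(population)  # population mean

def sampling_error(sample_size):
    """Return (sample mean - population mean) for one simple random sample."""
    sample = rng.sample(population, sample_size)
    return statistics.fmean(sample) - mu

def mean_abs_error(sample_size, repetitions=200):
    """Average size of the sampling error over many repeated samples."""
    errors = [abs(sampling_error(sample_size)) for _ in range(repetitions)]
    return statistics.fmean(errors)

small = mean_abs_error(25)
large = mean_abs_error(2_500)
print(f"typical sampling error with n=25:   {small:,.0f}")
print(f"typical sampling error with n=2500: {large:,.0f}")
```

Each individual sample gives a different error purely by chance; averaged over many samples, the error is typically much smaller for the larger sample size.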

Non-sampling Errors

• Non-sampling errors occur because of mistakes made in the process of acquiring the data.
• Increasing the sample size will not reduce this type of error.
• There are three types of non-sampling errors:
– Errors in data acquisition
– Non-response errors
– Selection bias

Data Acquisition Error

[Figure: a population and a sample drawn from it. If an observation is wrongly recorded in the sample, then the sample mean is affected, and the total error is sampling error + data acquisition error.]
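A small numeric illustration of a data acquisition error, using made-up figures: five hypothetical incomes, one of which is keyed in with an extra zero.

```python
import statistics

# Hypothetical sample of annual incomes (in dollars); values are invented.
true_values = [32_000, 41_500, 28_750, 55_000, 47_200]

# Assumed recording mistake: 55,000 is entered as 550,000.
recorded = list(true_values)
recorded[3] = 550_000

true_mean = statistics.fmean(true_values)
recorded_mean = statistics.fmean(recorded)
print(f"mean of true values:     {true_mean:,.0f}")
print(f"mean of recorded values: {recorded_mean:,.0f}")
```

In this made-up case the single mistake moves the sample mean from 40,890 to 139,890, an error that would remain even if the sample itself were perfectly representative.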

Non-Response Error

[Figure: a population and a sample drawn from it.] Missing responses from part of the selected sample may lead to biased results.
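How non-response can bias a result is easiest to see in a simulation. Everything in the sketch below is assumed for illustration: the lognormal income population, and in particular the response rates (30% for above-median incomes, 80% for the rest).

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of incomes.
population = [rng.lognormvariate(10.5, 0.6) for _ in range(100_000)]
mu = statistics.fmean(population)
median_income = statistics.median(population)

def responds(income):
    """Assumed behaviour: higher earners answer the survey less often."""
    probability = 0.3 if income > median_income else 0.8
    return rng.random() < probability

contacted = rng.sample(population, 5_000)           # simple random sample
respondents = [x for x in contacted if responds(x)]  # only these are observed

bias = statistics.fmean(respondents) - mu
print(f"contacted: {len(contacted)}, responded: {len(respondents)}")
print(f"respondent mean minus population mean: {bias:,.0f}")
```

The people contacted form a proper random sample, yet the respondents under-represent high incomes, so the estimate comes out too low under these assumptions.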

Selection Bias

[Figure: a population and a sample drawn from it.] When parts of the population cannot be selected, the sample cannot represent the whole population.
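Selection bias, and the earlier point that a larger sample does not remove non-sampling error, can be sketched as follows. The income population and the assumption that the sampling frame omits the lowest-income 20% are both hypothetical.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed so the run is repeatable

# Hypothetical population of incomes.
population = [rng.lognormvariate(10.5, 0.6) for _ in range(100_000)]
mu = statistics.fmean(population)

# Assumed flaw: the lowest-income 20% can never be selected.
cutoff = sorted(population)[len(population) // 5]
frame = [x for x in population if x >= cutoff]

errors = {}
for n in (100, 1_000, 10_000):
    estimate = statistics.fmean(rng.sample(frame, n))
    errors[n] = estimate - mu
    print(f"n={n:>6}: sample mean minus population mean = {errors[n]:,.0f}")

frame_bias = statistics.fmean(frame) - mu
print(f"bias built into the frame: {frame_bias:,.0f}")
```

As the sample grows, the estimate settles near the mean of the frame rather than the mean of the population, so the built-in bias remains.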