
Upload: laura-erica-johnson

Post on 03-Jan-2016


Scholarship: Statistics and Modelling Performance Standard 93201

Outcome description The student will demonstrate the ability to apply mathematical, statistical and probability knowledge and methods to complex problems in contexts which may be unfamiliar, interpret and, where appropriate, generalise results and clearly communicate concepts and findings.

The 4 Sampling techniques

Review and Critical Evaluation

SKILLS NEEDED: Memorise the descriptions of Simple Random, Stratified, Cluster and Systematic sampling, especially the advantages & disadvantages of using each method.

1. Read notes on the next few slides (memorise key points).

2. Do Old Sigma: Ex. 9.2–Q1, Ex. 9.3, Ex. 24.9.

Sampling Techniques that you need to be able to describe

Good sampling techniques (you must be able to describe these & give advantages & disadvantages for using each):

1. Simple Random Sampling
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling

Bad sampling techniques (biased selection processes):

• Convenience sampling (e.g. go to the Square & interview the first n people you see).

• Self-selected sampling (e.g. people phoning in).

1. Simple Random Sampling

1. Obtain a list of all N items in the target population, numbering them 1 to N (e.g. the school roll: 1-600).
2. Decide how many you will select for your sample (n).
3. Use the random number generator on your calculator to select numbers at random between 1 and N. On calculator, type: 1 + Population size × RAN# (use only the whole-number part of each result).
4. Keep pressing ‘equals’ until you have selected n different items (discard any repeats).

Advantage of SR sampling: Ensures that every item in the population has an equal chance of being selected; therefore, by definition, no selection bias.


Potential disadvantages:
• Under/over representation of some sub-groups: Does not ensure that all sub-groups of the population are represented in proportion (e.g. ethnicities, South Islanders).
• Data collection inconvenient: Only if sampling from a large geographical area. Chosen items are randomly distributed all over the place (no groupings or systems).

1. Do Old Sigma (2nd edition) pg. 134 – Ex. 9.2: Q1 only.

21.03 Simple random sampling.

Generate 20 different random numbers between 1 and 100.

Calculator formula: 1 + 100 × RAN#

42 67 2 12 77 49 60 20 45 15 64 7 8 21 15 64 58 14 29 68 26 90

If a random number has already occurred, generate more as needed (here 15 and 64 each occur twice, so 22 numbers were generated to get 20 different ones).
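The calculator steps above can be sketched in Python. This is only an illustration, not part of the original slides: the function name `simple_random_sample` is made up here, and `rng.random()` stands in for the calculator's RAN# key (a random decimal from 0 up to, but not including, 1).

```python
import random

def simple_random_sample(population_size, sample_size, rng=None):
    """Pick sample_size different item numbers from 1..population_size."""
    rng = rng or random.Random()
    chosen = []
    while len(chosen) < sample_size:
        # Mirrors 1 + N x RAN#: a decimal in [1, N+1), keeping the whole-number part.
        number = int(1 + population_size * rng.random())
        if number not in chosen:  # discard any repeats
            chosen.append(number)
    return chosen

# 20 different item numbers between 1 and 100 (seeded so the run is repeatable).
sample = simple_random_sample(100, 20, random.Random(1))
print(sorted(sample))
```

In everyday Python the same job is usually done in one call, `random.sample(range(1, 101), 20)`; the loop is written out here only so that each line matches a step on the slide.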

2. Systematic Sampling

1. Obtain a list of all N items in the target population and number the items from 1 to N (Sampling Frame).
2. Decide on your sample size, n.
3. Pick a random starting point in your list by generating a random number from 1 to N (e.g. item number 7).
4. Sample every kth item after that, where k = N/n, until you have selected n items. Cycle back through if you reach the end of the list before you have n items.

Advantages:
• Ensures that the sample is selected from throughout the breadth of the sampling frame.
• Convenient and fast: easier to collect info on items that are in a sequence (every 5th house) than from a random sample where they are scattered all over.


Disadvantage:
Be careful that the list itself has no regular pattern that matches your systematic selection (i.e. where every kth item shares similar characteristics in the variable you are measuring).


E.g. If every 2nd house on a street were sampled, all would be on the same side of the street!
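The street example can be checked with a few lines of Python. This is a sketch under an assumption that is not stated on the slide: houses are numbered 1 to 40 with odd numbers on one side of the street and even numbers on the other. The helper name `every_kth` is hypothetical.

```python
def every_kth(start, k, population_size):
    """Item numbers start, start + k, start + 2k, ... up to population_size."""
    return list(range(start, population_size + 1, k))

# Assumed street: 40 houses, odd numbers on one side, even numbers on the other.
houses = every_kth(start=3, k=2, population_size=40)
sides = {"odd side" if house % 2 else "even side" for house in houses}

print(houses)
print(sides)  # only one side of the street appears in the sample
```

Because the interval k = 2 matches the odd/even pattern in the list, every sampled house falls on the same side, whichever starting point is drawn.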

21.03 Systematic sampling.

To obtain a systematic sample of size 20 from this data:

Choose a starting point at random between 1 and 100.

Using calculator: 1 + 100 × RAN# =

Suppose this gives 5.87352, which is taken as 5. So start at item number 5.

Then choose every kth item, where k = N/n = 100/20 = 5. So sample every 5th item.
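As a rough sketch of the same procedure in Python (the function name `systematic_sample` and its parameters are illustrative, and it assumes n divides N exactly so that k is a whole number):

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=None):
    """Every kth item number from a random start, cycling past the end of the list."""
    rng = rng or random.Random()
    k = population_size // sample_size           # interval k = N/n (assumes n divides N)
    if start is None:
        start = rng.randint(1, population_size)  # random starting item between 1 and N
    # Item numbers run 1..N, so shift to 0-based, wrap with %, then shift back.
    return [(start - 1 + i * k) % population_size + 1 for i in range(sample_size)]

# The slide's example: N = 100, n = 20, starting at item 5.
print(systematic_sample(100, 20, start=5))
```

With a start of 5 this lists items 5, 10, 15 and so on up to 100. A start near the end of the list (say 98) wraps around to item 3 next, which is the "cycle back through" step.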

3. Stratified sampling

Use when the population consists of categories (strata), and you wish to represent each ‘stratum’ proportionally.

E.g. racial groups, one-story and multi-story homes within a city.

1. Obtain a list of all N items in the target population (Sampling Frame).
2. Decide on your sample size, n.
3. Divide sampling frame (list) into the strata (categories).
4. Select a separate random sample from each stratum in proportion to the percentage of the population found in each.

Proportional Allocation: Selecting from each stratum in proportion to its percentage of the population.


E.g. If 12% of a city’s citizens are Pacific Islanders, then 12% of the sample size should be selected from among the Pacific Island citizens.


Advantage: Guaranteed to be representative of each stratum.

Disadvantage: Time-consuming and expensive because you must collect information about the strata-sizes in advance.

21.03 Stratified sampling.

Suppose the avocados are of 3 different varieties.

Hass: items 1–40 (40%)
Fuerte: items 41–70 (30%)
Hopkins: items 71–100 (30%)

The number in each stratum of the sample should be proportional to the number in each group in the population.

Hass: 40% × 20 = 8
Fuerte: 30% × 20 = 6
Hopkins: 30% × 20 = 6

Thus generate random numbers as follows:

Hass: 1–40, 8 random nos.: 33 17 12 25 9 9 33 16 39 8 (9 and 33 repeat, so 10 numbers were generated to get 8 different ones)
Fuerte: 41–70, 6 random nos.: 58 59 67 43 53 56
Hopkins: 71–100, 6 random nos.: 98 85 96 99 90 81
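The avocado example can be sketched in Python as follows. The item ranges and the sample size of 20 come from the slide; the function name `stratified_sample` and the dictionary layout are assumptions made for illustration.

```python
import random

# Strata from the slide: item numbers for each avocado variety.
strata = {
    "Hass": range(1, 41),
    "Fuerte": range(41, 71),
    "Hopkins": range(71, 101),
}

def stratified_sample(strata, sample_size, rng=None):
    """Separate random sample from each stratum, sized by proportional allocation."""
    rng = rng or random.Random()
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        share = round(sample_size * len(items) / total)  # e.g. 40% of 20 = 8
        sample[name] = sorted(rng.sample(list(items), share))
    return sample

for name, items in stratified_sample(strata, 20, random.Random(2)).items():
    print(name, len(items), items)
```

Here the shares come out as 8, 6 and 6. With less tidy percentages the rounded shares may not add up to exactly n, so a real design would need a rule for adjusting them.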


4. Cluster sampling

Use when the population is distributed into naturally-occurring groups or ‘clusters’ (e.g. towns and cities in a country; purchases from a national franchise like Noel Leeming can be grouped by retail outlet).

Basic cluster sampling consists of 2 stages of selection:

Stage 1: Select the clusters: Select a representative sample of the clusters themselves (usually there are too many to sample from all of them).

Stage 2: Select the items within chosen clusters: Select a random sample of items from within each chosen cluster, maintaining proportional allocation.


E.g. Selecting samples of New Zealanders by selecting a sample of towns/cities from throughout the country, then a proportional random sample from within each.


*Multi-stage cluster sampling: In reality, most cluster sampling for real-life research uses more than 2 stages.

E.g. select a sample of cities, then a sample of suburbs from within each chosen city, then a sample of streets from each chosen suburb, then a sample of houses within each chosen street (4 stages)!


Advantage of cluster sampling:
• Data-collection process convenient, particularly if sampling from a geographically large area (data can be collected in groups within chosen clusters rather than being spread out).


Disadvantages of cluster sampling:
• Potential bias as items don’t have an equal chance of selection.
  – Very small clusters are unlikely to be sampled from.
  – Items that are not in clusters are excluded altogether. E.g. farmers or people in small rural communities may have no chance of being selected for a national survey of households.
• Can under-estimate the variation in the population when you sample items within clusters: items that are close together may have similar characteristics.
• Requires prior knowledge of cluster sizes.
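The two stages of basic cluster sampling described earlier can be sketched in Python. Everything concrete here is made up for illustration: the town names, the household ids and the function name `two_stage_cluster_sample` are hypothetical, and Stage 1 is done by simple random selection, which is only one way of picking the clusters.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, sample_size, rng=None):
    """Stage 1: pick clusters at random. Stage 2: sample within each, in proportion to size."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(clusters), n_clusters)       # Stage 1: select the clusters
    total = sum(len(clusters[name]) for name in chosen)
    sample = {}
    for name in chosen:                                      # Stage 2: select items within each
        share = round(sample_size * len(clusters[name]) / total)
        sample[name] = rng.sample(clusters[name], share)
    return sample

# Hypothetical population: town name -> list of household ids.
sizes = {"A": 200, "B": 100, "C": 100, "D": 50, "E": 50}
towns = {f"Town {t}": [f"{t}{i}" for i in range(1, size + 1)] for t, size in sizes.items()}

picked = two_stage_cluster_sample(towns, n_clusters=3, sample_size=20, rng=random.Random(4))
for town, households in picked.items():
    print(town, len(households), households)
```

The sketch also makes the cluster-size disadvantage visible: the Stage 2 shares cannot be worked out without knowing how many households each chosen town contains, and households in towns that are not picked in Stage 1 have no chance of selection in that run.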

Practice for scholarship questions:

1. Memorise the 4 types of sampling techniques and the advantages & disadvantages of each. (*asked in scholarship exam every year!*)

2. Do Old Sigma (2nd edition): Pg. 137 – Ex. 9.3.

3. Then pg. 399 – Ex. 24.9.

21.03 Cluster sampling.

Here is one way of obtaining a cluster sample of size 20.

Choose four clusters, each of 5 avocados, by selecting four numbers at random from the data, and taking them as the middle item of a ‘cross’.

Spreadsheet formula: 99 × RAN# + 1 =

62 22 2 68 56

If clusters overlap or run outside the boundaries, choose another.

Note: Depending on how a cluster is defined, it can exclude some items or make other items more likely to be chosen than under other sampling methods.
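A possible Python sketch of the ‘cross’ clusters follows. The transcript does not show how the 100 avocados are laid out, so the code assumes a 10 × 10 grid numbered row by row from 1 to 100; that layout, and the names `cross` and `cluster_sample`, are assumptions for illustration only.

```python
import random

ROWS, COLS = 10, 10  # assumed layout: items 1..100 numbered row by row

def cross(centre):
    """The 5 item numbers in a cross centred on `centre`, or None if it leaves the grid."""
    row, col = divmod(centre - 1, COLS)
    if row in (0, ROWS - 1) or col in (0, COLS - 1):
        return None  # an arm of the cross would run outside the boundary
    return {centre, centre - 1, centre + 1, centre - COLS, centre + COLS}

def cluster_sample(n_clusters=4, rng=None):
    """Draw random centres until n_clusters non-overlapping crosses are found."""
    rng = rng or random.Random()
    used, clusters = set(), []
    while len(clusters) < n_clusters:
        items = cross(rng.randint(1, ROWS * COLS))
        if items is None or items & used:  # outside the boundary or overlapping: choose another
            continue
        clusters.append(sorted(items))
        used |= items
    return clusters

print(cross(62))  # centre 62 with its four neighbours
print(cross(2))   # None under this layout: item 2 sits on the top edge
print(cluster_sample(rng=random.Random(0)))
```

Under the assumed layout, items on the edge of the grid can never be the centre of a cross and corner items can never be chosen at all, which illustrates the note that the definition of a cluster can exclude some items or favour others.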

21.03 Comparison of samples.

[Figure with four panels: Simple random sampling, Systematic sampling, Cluster sampling, Stratified sampling.]


21.04

Here is a brief description of an investigation into house prices.

Hypothesis
The median price that a house is sold for in Auckland is $350 000, and the distribution of house prices is not symmetric.

Sampling method
A mixture of cluster sampling and systematic sampling is used to obtain the data.

Data processing
The data is entered into a spreadsheet, sample statistics calculated and appropriate graphs produced.

Conclusions from the data
The sample data shows that the median price is only $281 000. A boxplot shows the distribution of house prices is not symmetric.

Critically evaluate the sampling process described in detail on the next slide.

The sampling method.

A mixture of cluster sampling and systematic sampling is chosen.

Cluster sampling
There are nine pages (numbered from 1811 to 1819) of real estate agents in the Auckland Yellow Pages.
Each page has 4 columns, with a maximum of about 50 phone numbers on each.
A random number generator or spreadsheet formula is used to get the 4 clusters (ie 4 real estate offices).

=RANDBETWEEN(1811,1819),RANDBETWEEN(1,4),RANDBETWEEN(1,50)
Page (1811 to 1819), Column (1–4), Entry (1 to 50)

Systematic sampling
Each of the 4 locations obtained is visited and all prices in every second row in their office windows recorded.
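For readers working outside a spreadsheet, the page/column/entry draw can be sketched in Python. `random.randint` includes both endpoints, like RANDBETWEEN; the function names `pick_office` and `pick_offices` are hypothetical.

```python
import random

def pick_office(rng):
    """One (page, column, entry) triple, mirroring the three RANDBETWEEN calls."""
    page = rng.randint(1811, 1819)  # nine Yellow Pages pages
    column = rng.randint(1, 4)      # 4 columns per page
    entry = rng.randint(1, 50)      # up to about 50 phone numbers per column
    return page, column, entry

def pick_offices(n=4, rng=None):
    """Draw triples until n different ones have been collected."""
    rng = rng or random.Random()
    picks = set()
    while len(picks) < n:
        picks.add(pick_office(rng))
    return sorted(picks)

for page, column, entry in pick_offices(4, random.Random(5)):
    print(f"page {page}, column {column}, entry {entry}")
```

Since a column holds "a maximum of about 50" numbers, some triples will point past the end of a shorter column. The slides do not say how that case is handled, so the sketch leaves it out.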

Justify the sampling method.

Justify the use of cluster sampling.

It would not be practical to visit a large number of real estate locations and only select a few house prices. Hence a cluster sample is faster and more convenient to obtain.

Justify the use of systematic sampling.

Taking a systematic sample from each location helps with the convenience. Because the data can be collected comparatively quickly, a ‘snapshot’ of prices at the same point in time is possible.


Critically evaluate the sampling process

Note that the sample is almost certainly biased, and has many faults.

Comment on the choice of clusters.

There are nine pages (numbered from 1811 to 1819) of real estate agents in the Auckland Yellow Pages. Each page has 4 columns, with a maximum of about 50 phone numbers on each. Random numbers are used to obtain 4 clusters (ie 4 real estate offices).

The number of clusters is too low and there is a significant difference between the clusters. (The characteristics of Auckland suburbs mean some have low and fairly uniform prices, while others, like Milford, have very expensive beachfront property.)

Further preliminary research would help select areas for inclusion in a stratified sample, and this would more accurately reflect the population.

Comment on the use of systematic sampling.

Each location is visited and all prices in every second row in their office windows recorded.

One particular row of advertisements and photos of houses may include bargains only, or better quality houses, and some houses may not be advertised there.

Comment on the quality of the data.

The list in the Yellow Pages is not exhaustive: private sales are excluded, for example, and some houses are listed with more than one real estate agent.

Several selling methods are excluded (sale by tender or auction, for example) where the price is unknown.

Some real estate agents are listed more than once, so are more likely to be chosen.

Some areas of Auckland may be over-supplied with real estate offices, again making these areas more likely to be selected.

There are an unequal number of agents listed in each column.

As well as the data not supplying the information required in the hypothesis (the hypothesis is about the price a house is sold for, whereas office windows show advertised prices), it is incomplete, and has duplications.
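The point about unequal columns can be made concrete with a small calculation. This is a sketch under stated assumptions, none of which come from the slides: a single page with four columns holding 50, 50, 50 and 10 listings, the column chosen with equal chance, and then one entry chosen at random from within that column (so a draw past the end of a short column is handled by redrawing the entry number only). The function name `chance_per_listing` is hypothetical.

```python
from fractions import Fraction

# Hypothetical numbers of agents listed in each of the 4 columns of one page.
column_lengths = [50, 50, 50, 10]

def chance_per_listing(column_lengths):
    """Chance that any one listing is picked: (1 / number of columns) x (1 / listings in its column)."""
    n_columns = len(column_lengths)
    return [Fraction(1, n_columns * length) for length in column_lengths]

chances = chance_per_listing(column_lengths)
for length, chance in zip(column_lengths, chances):
    print(f"column with {length} listings: each listing has chance {chance}")

print("short column vs full column:", chances[3] / chances[0])
```

Under these assumptions an agent in the 10-listing column is five times as likely to be chosen as an agent in a full column, so the offices do not all have an equal chance of selection.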

Comment as to how better data could be obtained.

The sampling process could be improved by gaining access to actual sales data used by real estate agents. (Not easy because they guard it jealously!)

This data would answer the question directly, and should include all sales within a particular time period, with no duplication, so would be representative of the population.

Do Old Sigma: Ex. 9.2–Q1, Ex. 9.3, Ex. 24.9.