OUTLINE:
sampling and census
sampling surveys: frame, size
probability and non-probability sampling methods
census
SAMPLING
SAMPLING AND CENSUS
collection methods for data:
Sampling: any data collection that is not a controlled experiment
e.g. the percentage of greenhouse gases in the atmosphere above Winnipeg
Census: a survey whose domain is the characteristics of an entire population
any study of the entire population of a particular set of ‘objects’
e.g. female polar bears in western Hudson Bay
human residents of Heidelberg
the number of Epacris impressa plants on a single hillside in Riding Mountain National Park
if we collect, analyse or study only some members of a population, then we are carrying out a survey
the aim is to make observations at a limited number of carefully chosen locations that are representative of a distribution
the sample is used to predict the overall character of the population; accuracy will depend on the quality of the sample
SAMPLING SURVEYS
sample surveys are done for several reasons:
they cost less than a census of the equivalent population
they are carried out to answer specific questions
a sample survey will usually offer greater scope than a census (larger geographical area, greater variety of questions)
development of a sampling survey:
state the objectives of the survey
define the target population
define the data to be collected
define the required precision and accuracy
define the measurement ‘instrument’
define the sample frame, sample size and sampling method, then select the sample
the process of generating a sample requires several critical decisions:
sample frame
sample size
sampling method
errors in any of these will compromise the entire survey
SAMPLE FRAME
if the frame is wrongly defined, the sample may not be representative of the target population
a frame might be ‘wrong’ in three ways:
it contains too many individuals (membership is under-defined)
it contains too few individuals (membership is over-defined)
it contains the wrong set of individuals (membership is ill-defined)
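When both the frame and the target population can be listed, the three kinds of ‘wrong’ frame can be checked mechanically by comparing the two lists. The sketch below is a minimal illustration, assuming both are available as Python sets of unit identifiers; the function name `diagnose_frame` and the household IDs are hypothetical, and treating a frame with both extra and missing members as ill-defined is a simplifying assumption.

```python
def diagnose_frame(frame: set, target: set) -> str:
    """Classify how a sample frame differs from the target population.

    Assumption: a frame with both extra and missing members counts as
    containing the wrong set of individuals (ill-defined).
    """
    extra = frame - target    # listed in the frame but outside the target population
    missing = target - frame  # in the target population but absent from the frame
    if extra and missing:
        return "wrong set of individuals (ill-defined)"
    if extra:
        return "too many individuals (under-defined)"
    if missing:
        return "too few individuals (over-defined)"
    return "frame matches the target population"


# Hypothetical target population of four households
target = {"H1", "H2", "H3", "H4"}
print(diagnose_frame({"H1", "H2", "H3", "H4", "H9"}, target))  # too many
print(diagnose_frame({"H1", "H2"}, target))                    # too few
print(diagnose_frame({"H1", "H2", "H9"}, target))              # wrong set
```

In practice the target population is rarely available as a complete list (that is why a frame is needed), so this check is most useful on a small pilot area where both can be enumerated.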
building a frame is a two-stage process:
divide the target population into sampling units
e.g. households, trees, light bulbs, soil samples, cities, individuals
create a finite list of the sampling units that make up the target population
e.g. names, addresses, identity numbers, numbered 50 mL sample bottles
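For a spatial survey, the two stages can be sketched as dividing a study area into quadrats and then numbering them. This is a minimal illustration, assuming a rectangular study area measured in whole metres; the function name `quadrat_frame`, the area dimensions and the rule of dropping partial quadrats at the edges are all hypothetical choices for the example.

```python
def quadrat_frame(width_m: int, height_m: int, quadrat_m: int) -> list[tuple[int, int, int]]:
    """Build a sample frame of square quadrats over a rectangular study area.

    Stage 1: divide the area into quadrat_m x quadrat_m sampling units.
    Stage 2: return them as a finite, numbered list of (unit_id, x_min, y_min).
    Partial quadrats at the edges are dropped (an assumption of this sketch).
    """
    n_cols = width_m // quadrat_m
    n_rows = height_m // quadrat_m
    frame = []
    unit_id = 1
    for row in range(n_rows):
        for col in range(n_cols):
            # record each quadrat by its lower-left corner
            frame.append((unit_id, col * quadrat_m, row * quadrat_m))
            unit_id += 1
    return frame


# Hypothetical 100 m x 50 m study area divided into 10 m quadrats
frame = quadrat_frame(100, 50, 10)
print(len(frame))  # 50
print(frame[0])    # (1, 0, 0)
print(frame[-1])   # (50, 90, 40)
```

Once the quadrats are numbered, any of the sampling methods below can be applied to the list of unit IDs.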
SAMPLING UNITS
a sampling unit is a member of a sample or sample frame
in geomatics, sampling units are points, lines (transects) and areas (quadrats)
e.g. measuring snow depth at 10 cm intervals along a 10 m line
measuring all features that fall within 10 m of a line
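The two transect examples can be sketched in a few lines: point samples at a fixed spacing along a line, and a belt that keeps every feature within a set distance of the line. This is a minimal illustration assuming planar (x, y) coordinates in metres; the function names and the feature coordinates are hypothetical.

```python
import math


def transect_points(length_m: float, spacing_m: float) -> list[float]:
    """Distances (m) of point samples along a line transect, including both ends."""
    n_steps = round(length_m / spacing_m)
    # multiply instead of repeatedly adding to limit floating-point drift
    return [round(i * spacing_m, 6) for i in range(n_steps + 1)]


def distance_to_segment(p, a, b) -> float:
    """Shortest distance from point p to the line segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # project p onto the segment and clamp to its end points
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


# Point sampling: snow depth every 10 cm along a 10 m line
points = transect_points(10.0, 0.1)
print(len(points))  # 101

# Belt sampling: hypothetical features kept if within 10 m of a 100 m line
line_start, line_end = (0.0, 0.0), (100.0, 0.0)
features = [(50.0, 5.0), (50.0, 15.0), (105.0, 0.0), (120.0, 0.0)]
in_belt = [f for f in features if distance_to_segment(f, line_start, line_end) <= 10.0]
print(in_belt)  # [(50.0, 5.0), (105.0, 0.0)]
```

Note that a 10 m line sampled every 10 cm gives 101 points, not 100, because both ends are measured.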
SAMPLE SIZE
more is not necessarily better: the quality of a sample matters more than its quantity
in statistics, a sample size of 30 or greater is generally recommended
in geomatics, the appropriate sample size is directly related to a distribution’s variability: the more variable the distribution, the larger the sample needed
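The link between variability and sample size can be made concrete with the standard textbook formula for estimating a mean to within a margin of error E: n = (zσ/E)², where σ is the standard deviation of the variable and z reflects the confidence level. This formula is not from these notes; the sketch assumes simple random sampling and a known σ, and the function name and snow-depth figures are hypothetical.

```python
import math


def required_sample_size(sigma: float, margin: float, z: float = 1.96) -> int:
    """Sample size to estimate a mean to within +/- margin.

    Uses n = (z * sigma / margin) ** 2, rounded up; z = 1.96 corresponds to
    roughly 95% confidence. Assumes simple random sampling and a known sigma.
    """
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical snow-depth surveys, both aiming for +/- 2 cm
print(required_sample_size(sigma=5.0, margin=2.0))   # 25  (low variability)
print(required_sample_size(sigma=15.0, margin=2.0))  # 217 (high variability)
```

Tripling the standard deviation multiplies the required sample size by roughly nine, since n grows with σ².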
SAMPLING METHOD
the aim is to obtain a sample that is representative of the target population
selecting a sampling method requires some minimal prior knowledge of the target population
the sampling method is how we actually decide which sampling units will be chosen
most sampling methods attempt to select units such that each has a definable probability of being chosen: probability sampling methods
alternatively, we can ignore the probability of selection and choose samples on some other criterion: non-probability sampling methods
NON-PROBABILITY SAMPLING
the units that make up the sample are collected with no specific probability structure in mind
e.g. units are self-selected
units are the most easily accessible
units are selected on economic grounds
units are considered to be typical of the population
units are chosen without an obvious design
considered inferior to probability methods: there is no statistical basis upon which the success of the sampling method can be evaluated
may be unavoidable, but should be regarded as a ‘last resort’ when designing a sampling scheme
PROBABILITY SAMPLING
the sampling units that make up the sample are selected by defining the chance that each unit in the sample frame will be included
e.g. with 100 students and 10 needed to fill out a survey, each student has a 1 in 10 chance of being selected (probability of selection is 0.1)
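The 1 in 10 figure can be checked empirically by drawing many random samples of 10 from 100 students and counting how often one particular student is included. This is a minimal simulation sketch using Python's pseudo-random generator in place of a random number table; the function name, seed and number of trials are hypothetical choices.

```python
import random


def estimate_inclusion_probability(population_size, sample_size, unit, trials, seed=0):
    """Estimate how often `unit` lands in a simple random sample by repeated draws."""
    rng = random.Random(seed)
    population = range(population_size)
    # count the draws (without replacement) that include the chosen unit
    hits = sum(unit in rng.sample(population, sample_size) for _ in range(trials))
    return hits / trials


theoretical = 10 / 100  # n / N for 10 students chosen from 100
estimate = estimate_inclusion_probability(100, 10, unit=0, trials=20_000)
print(theoretical, round(estimate, 3))
```

The estimate varies slightly with the seed, but with this many trials it should sit close to 0.1.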
each time we apply the same method to the same frame, we will usually generate a different sample
probability sampling is therefore concerned with the probability of each sample being chosen, rather than with the probability of choosing individual units
there are a number of probability sampling strategies
Simple random sampling: the simplest method
select n units such that every one of the possible samples has an equal chance of being chosen
generate the sample by selecting from the sample frame with any method that guarantees that each sampling unit has a specified (equal) probability of being included
the mechanism used to make the selection does not matter (e.g. random number tables, dice, …)
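For a very small frame, every possible sample can be listed, which shows how "each sample equally likely" translates into a probability of inclusion for each individual unit. The sketch below assumes a hypothetical five-unit frame and a sample size of 2, and uses exact fractions so the probabilities are not rounded.

```python
from fractions import Fraction
from itertools import combinations

frame = ["A", "B", "C", "D", "E"]  # hypothetical frame, N = 5
n = 2

samples = list(combinations(frame, n))  # every possible sample of size n
p_sample = Fraction(1, len(samples))    # SRS: each possible sample equally likely

# a unit's inclusion probability is the total probability of the samples containing it
inclusion = {unit: sum(p_sample for s in samples if unit in s) for unit in frame}

print(len(samples), p_sample)  # 10 1/10
print(inclusion["A"])          # 2/5, i.e. n / N
```

Each unit ends up with inclusion probability n/N, which is the 1 in 10 chance of the student example when n = 10 and N = 100.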
[Figure: Simple random sampling]
e.g. a sample frame of 14 units:
1. 94407382
2. 94409687
3. 93535459
4. 94552345
5. 94768091
6. 93732085
7. 94556321
8. 94562119
9. 93763450
10. 94127845
11. 94675420
12. 94562119
13. 93763450
14. 94127845
use a random number table to generate six random numbers between 1 and 14: 4, 6, 7, 9, 11, 13
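The random number table step can be reproduced in code by drawing six distinct positions from 1 to 14 and reading off the corresponding units. This sketch substitutes Python's pseudo-random generator for the table, so the positions it draws will generally differ from 4, 6, 7, 9, 11, 13; the seed value is an arbitrary assumption used only to make the run repeatable.

```python
import math
import random

# The 14 units of the example frame, in the order listed (duplicates kept as given)
frame = [
    "94407382", "94409687", "93535459", "94552345", "94768091",
    "93732085", "94556321", "94562119", "93763450", "94127845",
    "94675420", "94562119", "93763450", "94127845",
]

rng = random.Random(42)  # fixed seed so the draw is reproducible
# six distinct positions between 1 and 14, like six entries read from a random number table
positions = sorted(rng.sample(range(1, len(frame) + 1), 6))
sample = [frame[p - 1] for p in positions]

print(positions, sample)
# number of distinct samples of 6 positions out of 14; each is equally likely under SRS
print(math.comb(len(frame), 6))  # 3003
```

Sampling positions rather than values keeps the duplicated numbers in the frame distinct as units.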
Stratified sampling: used when you suspect the target population actually consists of a series of separate ‘sub-populations’
stratification is the process of splitting the sample frame to take account of possible sub-populations
in stratified sampling, the total population is first divided into a set of mutually exclusive sub-populations (strata)
the sub-samples drawn from the strata may be of equal size, or sized in proportion to the relative sizes of the strata
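The two ways of sizing the sub-samples, equal and proportional allocation, can be sketched as follows. Proportional quotas are rarely whole numbers, so this sketch rounds them with the largest-remainder method to keep the total equal to n; that rounding rule is an assumption, not something the notes specify, and the stratum sizes are hypothetical.

```python
def equal_allocation(n: int, k: int) -> list[int]:
    """Split n units as evenly as possible across k strata."""
    base, extra = divmod(n, k)
    return [base + (1 if i < extra else 0) for i in range(k)]


def proportional_allocation(n: int, stratum_sizes: list[int]) -> list[int]:
    """Allocate n units in proportion to stratum sizes (largest-remainder rounding)."""
    total = sum(stratum_sizes)
    quotas = [n * size / total for size in stratum_sizes]
    alloc = [int(q) for q in quotas]  # round each quota down first
    # hand the leftover units to the strata with the largest fractional parts
    remaining = n - sum(alloc)
    order = sorted(range(len(quotas)), key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in order[:remaining]:
        alloc[i] += 1
    return alloc


sizes = [500, 300, 200]  # hypothetical stratum sizes
print(equal_allocation(12, 3))           # [4, 4, 4]
print(proportional_allocation(12, sizes))  # [6, 4, 2]
```

Equal allocation gives small strata relatively more weight; proportional allocation mirrors the structure of the population.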
Stratified sampling: within each stratum, select a sample, usually ensuring that the probability of selection is the same for each unit within each sub-population: a stratified random sample
e.g. national polls and rating surveys
e.g. a sample frame of 14 units:
1. 94407382
2. 94409687
3. 94535459
4. 94552345
5. 94768091
6. 94732085
7. 94556321
8. 93562119
9. 93763450
10. 93127845
11. 93675420
12. 93562119
13. 93763450
14. 93127845
first, split the population into sub-populations based on the second digit of each number: the first seven numbers (second digit 4) and the last seven (second digit 3)
then sample from each sub-population: three from each, using a random number table (positions 1, 2 and 5 within each sub-population)
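The same stratified example can be sketched in code: group the 14 units by their second digit, then draw a simple random sample of three from each group. The function name and seed are hypothetical, and Python's pseudo-random generator stands in for the random number table, so the units drawn will generally differ from positions 1, 2 and 5.

```python
import random
from collections import defaultdict

# The 14 units of the example frame, in the order listed
frame = [
    "94407382", "94409687", "94535459", "94552345", "94768091",
    "94732085", "94556321", "93562119", "93763450", "93127845",
    "93675420", "93562119", "93763450", "93127845",
]


def stratified_random_sample(units, stratum_of, per_stratum, seed=None):
    """Split units into strata with stratum_of(unit), then draw a simple
    random sample of per_stratum units (without replacement) from each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    rng = random.Random(seed)
    return {key: rng.sample(members, per_stratum) for key, members in strata.items()}


# stratify on the second digit, as in the example, and take three units per stratum
sample = stratified_random_sample(frame, stratum_of=lambda u: u[1], per_stratum=3, seed=7)
for digit, chosen in sample.items():
    print(digit, chosen)
```

Passing the stratum rule as a function keeps the sketch general: any attribute of a unit (region, land cover class, age group) could define the strata.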
Systematic sampling: decide the sample size from the population size; the population has to be ordered in some way
e.g. points along a river, simple numerical order
simpler in design and easier to administer
Systematic sampling: choose a starting point along the sequence by selecting the rth unit from one end of the sequence
then take the rest of the sample by repeatedly adding a fixed interval k to r (units r, r + k, r + 2k, …)
e.g. a sample frame of 14 units:
1. 94407382
2. 94409687
3. 94535459
4. 94552345
5. 94768091
6. 94732085
7. 94556321
8. 93562119
9. 93763450
10. 93127845
11. 93675420
12. 93562119
13. 93763450
14. 93127845
first, order the sampling units (here, in the order listed)
next, select the starting point: r = 2
then take every third unit after this (k = 3): units 2, 5, 8, 11, 14
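The systematic example reduces to generating the positions r, r + k, r + 2k, … up to the end of the ordered frame. The sketch below reproduces the r = 2, k = 3 selection; the function name is hypothetical, and requiring r to fall within the first interval (1 to k) is an assumption of this sketch.

```python
def systematic_positions(population_size: int, start: int, interval: int) -> list[int]:
    """1-based positions r, r + k, r + 2k, ... that do not run past the end of the sequence."""
    if not 1 <= start <= interval:
        # assumption of this sketch: the start r is taken within the first interval
        raise ValueError("start must lie between 1 and the interval k")
    return list(range(start, population_size + 1, interval))


# The 14 units of the example frame, in the order listed
frame = [
    "94407382", "94409687", "94535459", "94552345", "94768091",
    "94732085", "94556321", "93562119", "93763450", "93127845",
    "93675420", "93562119", "93763450", "93127845",
]

positions = systematic_positions(len(frame), start=2, interval=3)
sample = [frame[p - 1] for p in positions]
print(positions)  # [2, 5, 8, 11, 14]
print(sample)     # ['94409687', '94768091', '93562119', '93675420', '93127845']
```

A random start could be drawn with `random.randint(1, 3)` instead of fixing r = 2; every later unit then follows from that single choice.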
CENSUS
the aim is to identify and record all members of a population
most countries routinely carry out a census of their population
e.g. Canada performs a census every 5 years (1981, 1986, 1991, 1996, 2001)
originally carried out to enumerate the population for electoral purposes, but now encompasses a large range of information about national populations
collects important information about the social and economic situation of people living in an area:
Population Counts
Age, Sex, Marital Status, Families (number, type and structure)
Structural Type of Dwelling and Household Size
Immigration and Citizenship, Education, Mobility, Migration
Mother Tongue, Home Language and Official/Non-Official Languages
Ethnic Origin and Population Group (visible minorities)
Labor Market Activities, Household Activity, Place of Work and Mode of Transportation
Sources of Income, Total Income and Family and Household Income
Families: Social and Economic Characteristics, Occupied Dwellings and Household Costs
disadvantages of a census:
time-consuming: requires years of planning
laborious: requires thousands of workers/volunteers
costly: millions of dollars to survey everyone
errors in census data:
people respond dishonestly due to a lack of confidence in confidentiality
a full accounting of residences is difficult to document (e.g. homeless people)
poorly qualified staff may be recruited to conduct the survey
CENSUS REGIONS
a census consists of “enumeration” data: counts tabulated or ‘aggregated’ by geographic area
census regions/enumeration areas are not distributed uniformly and vary in shape, size and orientation
Canada is divided into 51,500 enumeration areas
census regions are defined by political boundaries and by natural and cultural landmarks
Enumeration Area (EA): the smallest reported census area
canvassed by one census representative
125 to 440 dwellings, depending on whether the area is rural or urban
Census Tract (CT): represents urban or rural communities in CMAs and CAs
populations range between 2,500 and 8,000
Census Subdivision (CSD): term applied to municipalities or their equivalent
Census Division (CD): areas intermediate between the municipality (CSD) and province level
represent counties, regional districts, regional municipalities
Census Metropolitan Area/Census Agglomeration (CMA/CA): a large urban core together with adjacent urban and rural areas that are highly integrated with it
urban core population >100,000 for a CMA, >10,000 for a CA
a CMA may be combined with adjacent CAs to form a ‘consolidated CMA’
Federal Electoral District (FED): an area entitled to elect a representative member to the House of Commons
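The CMA/CA distinction above is a population threshold on the urban core, which can be written as a small rule. This sketch applies only the two thresholds as stated in these notes (strictly greater than); actual Statistics Canada delineation involves further criteria, such as commuting flows for adjacent areas, so the function name and its output are illustrative only.

```python
def classify_urban_core(core_population: int) -> str:
    """Classify an urban core using only the population thresholds given in the notes."""
    if core_population > 100_000:
        return "CMA"
    if core_population > 10_000:
        return "CA"
    return "neither"


# Hypothetical urban cores
for pop in (250_000, 45_000, 8_000):
    print(pop, classify_urban_core(pop))
```

Because the thresholds are strict, a core of exactly 100,000 falls into the CA class under this sketch.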
census information is aggregated within the boundaries of the data collection regions to:
reduce costs
protect confidentiality
GIS concerns:
census region totals are more abstract and spatially inaccurate
they mask the true nature of the population distribution
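The masking effect of aggregation is easy to see in a small example: once individual locations are reduced to a count per region, any clustering inside a region disappears. The records below are entirely hypothetical, with planar (x, y) coordinates and a region label for each person.

```python
from collections import Counter

# Hypothetical individual records: (x, y, census_region)
people = [
    (1.0, 1.0, "R1"), (1.2, 0.9, "R1"), (0.9, 1.1, "R1"),  # tight cluster in one corner of R1
    (9.0, 9.0, "R1"),                                      # a single resident far across R1
    (5.0, 5.0, "R2"), (5.5, 4.5, "R2"),
]

# Aggregation keeps only a count per region; the x, y detail is discarded
counts = Counter(region for _, _, region in people)
print(counts)  # Counter({'R1': 4, 'R2': 2})
```

From the count alone, region R1 could just as well have its four residents spread evenly; the cluster near (1, 1) is no longer recoverable.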
REPORTING METHOD
aggregated data are reported as census region totals: the data presentation is a count by region
census totals can also be reported at region centroids, either:
center of area: the balance point of the census region shape
center of population: found by averaging the x and y coordinates of the individual population
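The two kinds of centroid can differ considerably for the same region. The sketch below computes the center of area of a polygon with the shoelace formula and the center of population as the mean of individual coordinates; it assumes a simple (non-self-intersecting) polygon in planar coordinates, and the region and resident locations are hypothetical.

```python
def polygon_centroid(vertices):
    """Center of area of a simple polygon, from the shoelace formula.

    vertices: list of (x, y) in order (clockwise or counter-clockwise).
    """
    twice_area = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("polygon has zero area")
    return cx / (3 * twice_area), cy / (3 * twice_area)


def mean_center(points):
    """Center of population: the mean of the individuals' x and y coordinates."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


# Hypothetical rectangular census region and four individuals living in it
region = [(0.0, 0.0), (10.0, 0.0), (10.0, 4.0), (0.0, 4.0)]
residents = [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0), (9.0, 2.0)]
print(polygon_centroid(region))  # (5.0, 2.0)
print(mean_center(residents))    # (3.25, 1.75)
```

Here the center of population sits well to the west of the center of area because most residents live near the western edge, which is exactly the information a center-of-area point would hide.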
CENSUS AND GIS
[Figure: Map of census divisions]
the census represents a very important source of data for GIS because:
it provides data of use in many areas of human geography: social, economic, political
the census goes back to Confederation, so historical analyses can be performed
the census provides data in a large variety of readily mapped spatial zones (e.g. CMA, county)