1
2006 Public Use Microdata File (PUMF)
Outline
1. Change factors
2. Scenarios: characteristics
3. Analytic content: additions and losses
DLI Ontario Training, Ryerson University, Dec. 13, 2007
Martine Grenier, Mokili Mbuluyo, Jean René Boudreau, Statistics Canada
2
1. Change Factors
Improvement of the three files’ analytic content for greater use at the national and international levels
Greater accessibility of census data
Data confidentiality constraints
• File size
• Limited geography
• Age “variable”
• Income “variable”
Late release of PUMFs
• Delay due to the heavy workload of selecting, certifying and deriving variables, and of quality control on the files
3
2. Scenario #1: Status Quo (2001)
Content
1. Sample size
• Individuals: 800,000 records
• Families: 310,000 records
• Households and dwellings: 350,000 records
2. Geography
• Provinces, territories, CMAs
3. Variables
• Variables extracted from the dissemination database
• Large number of derived variables
• Less detailed variables for Maritime provinces and Northern territories
• Variables repeated in the 3 files
Reduction of disclosure risks
• Substantial disclosure control by the microdata file review committee
• Confidentiality rules applied separately to each file
Production time
• Considerable amount of work for SM analysts to certify derived variables
• 3 years, expected release in 2010?
4
2. Scenario #2: Single File
Content
1. File size
• Single file: 800,000 records (individuals)
• Some persons will represent a family or a household
2. Geography
• Canada, 5 regions, 5 CMAs with a population of at least one million
3. Variables
• Variables extracted from the dissemination database
• Derived variables of complexity level 4 or which require the use of limited data
Reduction of disclosure risks
• Eliminate values with a Canada frequency of less than 100,000
• Collapse some or all of the age groups
• Round off or generate noise in income components
Production time
• Reduced certification
• Projected release: Summer 2009
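The three disclosure-risk measures listed for the single-file scenario (dropping rare values, collapsing age, rounding income) can be illustrated with a minimal Python sketch. This is not Statistics Canada's procedure. The function names, the 5-year age bands, the $1,000 rounding base, the "Other" recode and the reading of "Canada frequency" as a weighted national count are all assumptions made for illustration. Only the 100,000 floor comes from the slide. The weights in the usage lines are exaggerated to keep the example small.

```python
from collections import Counter

# Only the 100,000 floor is taken from the slide; the band width and
# rounding base below are hypothetical choices for illustration.
MIN_CANADA_FREQUENCY = 100_000
AGE_BAND_WIDTH = 5
INCOME_ROUNDING_BASE = 1_000


def collapse_age(age, width=AGE_BAND_WIDTH):
    """Return a band label such as '30-34' for a single year of age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def round_income(income, base=INCOME_ROUNDING_BASE):
    """Round an income component to the nearest multiple of base."""
    return int(round(income / base)) * base


def suppress_rare_values(records, field, weight_field="weight",
                         floor=MIN_CANADA_FREQUENCY, other="Other"):
    """Recode values whose weighted national frequency is below floor.

    Assumption: 'Canada frequency' is read as the sum of the record
    weights for each value of the variable.
    """
    totals = Counter()
    for rec in records:
        totals[rec[field]] += rec[weight_field]
    return [
        {**rec, field: rec[field] if totals[rec[field]] >= floor else other}
        for rec in records
    ]


# Usage with made-up records (weights are unrealistically large).
sample = [
    {"age": 32, "income": 41_730, "language": "A", "weight": 60_000},
    {"age": 34, "income": 58_210, "language": "A", "weight": 55_000},
    {"age": 67, "income": 23_480, "language": "B", "weight": 40_000},
]
protected = suppress_rare_values(sample, "language")
protected = [
    {**rec, "age": collapse_age(rec["age"]), "income": round_income(rec["income"])}
    for rec in protected
]
```

In this sketch, value "B" has a weighted frequency of 40,000, so it is recoded to "Other", while "A" (115,000) is kept. Generating noise instead of rounding would replace `round_income` with a random perturbation. It is left out here because the slide gives no details on how noise would be generated.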
5
2. Scenario #3: Hierarchical File
Content
1. File size
• Hierarchical file: 350,000 records on households
• All families and persons are included and identified in the household (about 800,000 persons)
2. Geography
• Canada, regions with a population of at least 2 million
3. Variables
• 2B variables from the dissemination database
• Derived variables of complexity level 4 or which require the use of limited data
Reduction of disclosure risks
• Eliminate values with a Canada frequency of less than 100,000
• Collapse age groups
• Round off or generate noise in income components
Production time
• Reduced certification
• Projected release: Summer 2009
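To show what "all families and persons are included and identified in the household" could mean for users, here is a small Python sketch of a hierarchical layout. The slides do not give the actual record layout, so the class names, field names and region labels are hypothetical. The point is only that person-level and family-level views can be derived from household records once each person carries household and family identifiers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    person_id: int
    age_group: str
    family_id: Optional[int] = None  # None: person not in a census family


@dataclass
class Household:
    household_id: int
    region: str
    persons: List[Person] = field(default_factory=list)


def person_records(households):
    """Flatten to one row per person, carrying the household identifiers."""
    return [
        {
            "household_id": h.household_id,
            "region": h.region,
            "family_id": p.family_id,
            "person_id": p.person_id,
            "age_group": p.age_group,
        }
        for h in households
        for p in h.persons
    ]


def family_sizes(households):
    """Count persons per (household_id, family_id) pair."""
    sizes = {}
    for h in households:
        for p in h.persons:
            if p.family_id is not None:
                key = (h.household_id, p.family_id)
                sizes[key] = sizes.get(key, 0) + 1
    return sizes


# Usage with made-up households.
households = [
    Household(1, "Region A", [
        Person(1, "30-34", family_id=1),
        Person(2, "30-34", family_id=1),
        Person(3, "65-69"),
    ]),
    Household(2, "Region B", [Person(1, "20-24")]),
]
rows = person_records(households)
sizes = family_sizes(households)
```

With a layout of this kind, one file supports analysis of all three universes (persons, families, households), which is the advantage the hierarchical scenario claims over three independent samples.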
6
3. Analytic Content: additions and losses

PUMF-2006 (Status Quo)
Content
• Size: 2.7% of the population
• Independent samples of the three universes
• Diverse geographies at the province and CMA levels
• Families and households well represented
• Repetition of variables between the 3 universes; complex derived variables
• Analytic content limited to one universe at a time
Production requirements
• Certification and production projected for summer 2010
Confidentiality
• Suppression level higher than in 2001

PUMF-2006 (Single File)
Content
• Size: 2.7% of the population
• Some people represent a family or a household
• Geography limited to provinces and major CMAs (pop. 1 million)
• Loss of information about families and households
• Variables taken from the questionnaire so that users can create their own derived variables
• Analytic content extended to the three universes
Production requirements
• Production projected for summer 2009
Confidentiality
• Suppression level lower than in 2001 (fewer geographies)

PUMF-2006 (Hierarchical File)
Content
• Size: 2.7% of the population
• All families and persons in households sampled are included
• Geography more limited, to regions
• File representative of households; more varied content including all data
• Variables taken from the questionnaire so that users can create their own derived variables
• Analytic content extended to the three universes
• Greater potential for analysis and international comparison
Production requirements
• Production projected for summer 2009
Confidentiality
• Same suppression level as in 2001 (fewer geographies)
7
Thank you!