The Many Ways of Improving the Industrial Coding for Statistics Canada’s Business Register Yanick Beaucage ICES III June 2007


Page 1

The Many Ways of Improving the Industrial Coding for Statistics Canada’s Business Register

Yanick Beaucage

ICES III

June 2007

Page 2

Overview

Background

Automatic Coding

Manual Coding

Quality Evaluation of Classification Updates

Quality Assurance Survey

Conclusion

Page 3

Background

STC’s Business Register Redesign

Improve administrative data link

Improve treatment of births/deaths

Reflect business reality

Give update privileges to a larger set of people

Develop a quality assurance program

Part of the quality assurance program is ensuring good industrial classification

Page 4

Background

Good industrial classification

Leads to better population identification

Leads to smaller sample size

Leads to reduced collection cost

Leads to better precision

Prevents frustration from respondents (and interviewers)

Page 5

Background

Diagram labels: Business Register, Statistics Canada

Page 6

Background

Diagram labels: Business Register, Canada Revenue Agency, Statistics Canada

Page 7

Background

Diagram labels: Business Register, Canada Revenue Agency, Automatic, Manual, Statistics Canada

Page 8

Background

Diagram labels: Business Register, Updates, Canada Revenue Agency, Automatic, Manual, QE (shown twice), Statistics Canada

Page 9

Background

Diagram labels: Business Register, Updates, Canada Revenue Agency, Automatic, Manual, QE (shown twice), QAS, Statistics Canada

Page 10

Automatic Coding

New businesses apply for a Business Number (BN) at the Canada Revenue Agency (CRA)

In person, over the phone, over the internet, ...

What is the description of the main business activity?

Decision tree tool used by CRA

Prompts for details needed for coding

Returns a robot-phrase to Statistics Canada
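The flow on this slide (prompt for details, then return a standardized phrase) could be sketched roughly as follows. This is an illustration only: the tree contents, the phrases, the placeholder codes and the function name `robot_phrase` are all hypothetical and are not taken from the CRA tool.

```python
# Hypothetical decision tree: each node is either a question with answer
# branches, or a leaf holding a standardized "robot-phrase".
TREE = {
    "question": "Which best describes the main business activity?",
    "answers": {
        "food service": {
            "question": "Is table service provided?",
            "answers": {
                "yes": {"phrase": "RESTAURANT, FULL SERVICE"},
                "no": {"phrase": "RESTAURANT, LIMITED SERVICE"},
            },
        },
        "retail": {"phrase": "RETAIL STORE, GENERAL"},
    },
}

# Hypothetical lookup from robot-phrase to a classification code.
# The codes are placeholders, not real industry codes.
PHRASE_TO_CODE = {
    "RESTAURANT, FULL SERVICE": "CODE-A",
    "RESTAURANT, LIMITED SERVICE": "CODE-B",
    "RETAIL STORE, GENERAL": "CODE-C",
}


def robot_phrase(tree, answers):
    """Walk the tree with a sequence of answers and return the leaf phrase.

    Returns None when the answers run out before a leaf is reached or
    when an answer is not one of the branches of the current question.
    """
    node = tree
    replies = iter(answers)
    while "phrase" not in node:
        reply = next(replies, None)
        node = node["answers"].get(reply)
        if node is None:
            return None
    return node["phrase"]


phrase = robot_phrase(TREE, ["food service", "yes"])
print(phrase, "->", PHRASE_TO_CODE.get(phrase))
```

Because every leaf is a fixed phrase, the classification step reduces to an exact lookup, which is what makes this route fully automatic.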

Page 11
Page 12
Page 13
Page 14

Automatic Coding

Assign classification based on robot-phrase

Improving decision tree tool and usage

Re-developed on micro (originally mainframe)

Expand use for Web BN application (currently used for phone or in person registration)

Develop questions for all sectors

Currently used for 75% of all industrial sectors

Covers 90% of all descriptions to be coded

Page 15

Automatic Coding

Automated Coding by Text Recognition (ACTR)

If the description is too general: manual coding

Used to assign classification based on descriptions

Reference file (French and English)

Parsing strategy

Word weighting algorithm

Score derived
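The four ingredients listed above (reference file, parsing, word weighting, score) can be illustrated with a small text-matching sketch. This is not the ACTR algorithm: the reference phrases, placeholder codes, stop-word list, rarity-based weights, overlap score and 0.8 threshold are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical reference file: (description phrase, placeholder code).
REFERENCE = [
    ("full service restaurant", "CODE-A"),
    ("fast food restaurant", "CODE-B"),
    ("hair salon", "CODE-C"),
    ("auto repair shop", "CODE-D"),
]

STOP_WORDS = {"a", "an", "and", "of", "the", "for"}


def parse(text):
    """Lowercase, keep alphabetic words, drop stop words, return a set."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def word_weights(reference):
    """Give rarer words in the reference file a larger weight."""
    doc_freq = Counter(w for phrase, _ in reference for w in parse(phrase))
    n = len(reference)
    return {w: math.log(1 + n / df) for w, df in doc_freq.items()}


def best_match(description, reference, weights, threshold=0.8):
    """Return (code, score); code is None when the score is below threshold."""
    unknown = max(weights.values())  # unseen words count as rare
    words = parse(description)
    best_code, best_score = None, 0.0
    for phrase, code in reference:
        ref_words = parse(phrase)
        shared = sum(weights.get(w, unknown) for w in words & ref_words)
        total = sum(weights.get(w, unknown) for w in words | ref_words)
        score = shared / total if total else 0.0
        if score > best_score:
            best_code, best_score = code, score
    if best_score < threshold:
        return None, best_score  # too weak a match: route to manual coding
    return best_code, best_score


weights = word_weights(REFERENCE)
print(best_match("The Hair Salon!", REFERENCE, weights))
print(best_match("restaurant", REFERENCE, weights))
```

In this sketch a description such as "restaurant" matches two reference phrases only weakly, so it falls below the threshold and would be sent to manual coding, which mirrors the "too general" case on the slide.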

Page 16

Automatic Coding

Improving use of ACTR

Improve reference file

Each year new phrases are added

Currently 7 000 phrases

Study score needed for match

Opening the weighting algorithm

Improve parsing rules

Revisit the rules

Create an environment for testing purposes

Evaluate impact of changing input/rules/score

Page 17

Automatic Coding

40 000 new businesses a month to code

45% are coded using robot-phrases

5% are coded using ACTR

Leaves 20 000 new businesses to code

Need manual coding

Done at Statistics Canada
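The monthly workload figures on this slide can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those quoted above.

```python
# Figures quoted on the slide.
monthly_new = 40_000       # new businesses to code each month
pct_robot_phrase = 45      # coded from robot-phrases (percent)
pct_actr = 5               # coded by ACTR (percent)

# Integer arithmetic avoids floating-point rounding in the percentages.
coded_robot_phrase = monthly_new * pct_robot_phrase // 100
coded_actr = monthly_new * pct_actr // 100
left_for_manual = monthly_new - coded_robot_phrase - coded_actr

print(coded_robot_phrase, coded_actr, left_for_manual)
```

Half of the monthly intake is coded automatically, so the other half (20 000 units) goes to manual coding.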

Page 18

Manual Coding

Other units to code manually

Survey feedback

New operating entity found when profiling

Tool

Search engine for industrial coding

Improve manual coding

Add on-line ACTR or ACTR results

Add decision tree tool

Page 19

Manual Coding

New businesses

Goal: code all of them

Reality: do as many as we can

Result: backlog of businesses to code

Page 20

Manual Coding

New businesses

Goal: code all of them

Reality: do as many as we can

Result: backlog of businesses to code

Diagram labels: Business Register, CRA May batch, CRA June batch, Automatic, Manual, Backlog

Page 21

Manual Coding

Which units should be coded first?

First in, first out?

Economic activity signal?

Economic activity is determined by administrative data

Both! Select a sample from backlog

Take-all (large economic activity)

Take-some 1 (economic activity / older units)

Take-some 2 (economic activity / newer units)

Take-none (no economic activity)
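A minimal sketch of the four-stratum selection above, under stated assumptions: the `Unit` fields, the activity and age thresholds, and the sampling fractions are hypothetical values picked for illustration; the slide gives only the stratum definitions.

```python
import random
from dataclasses import dataclass


@dataclass
class Unit:
    unit_id: str
    activity: float         # economic activity signal from administrative data
    months_in_backlog: int  # how long the unit has waited to be coded


LARGE_ACTIVITY = 1_000_000  # hypothetical cut-off for "large"
OLD_UNIT_MONTHS = 6         # hypothetical cut-off for "older"

# Hypothetical sampling fractions per stratum.
FRACTIONS = {
    "take-all": 1.0,
    "take-some-1": 0.5,
    "take-some-2": 0.25,
    "take-none": 0.0,
}


def stratum(unit):
    """Assign a backlog unit to one of the four strata."""
    if unit.activity <= 0:
        return "take-none"
    if unit.activity >= LARGE_ACTIVITY:
        return "take-all"
    if unit.months_in_backlog >= OLD_UNIT_MONTHS:
        return "take-some-1"
    return "take-some-2"


def select_sample(backlog, seed=0):
    """Draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    by_stratum = {}
    for unit in backlog:
        by_stratum.setdefault(stratum(unit), []).append(unit)
    sample = []
    for name, units in by_stratum.items():
        n = round(len(units) * FRACTIONS[name])
        sample.extend(rng.sample(units, n))
    return sample


backlog = [
    Unit("a", 2_000_000, 1),
    Unit("b", 500, 12),
    Unit("c", 500, 1),
    Unit("d", 0, 3),
]
print([(u.unit_id, stratum(u)) for u in backlog])
print([u.unit_id for u in select_sample(backlog)])
```

Giving older units a higher fraction than newer ones is one way to combine the "first in, first out" and "economic activity" ideas; the actual fractions would come from the sample design.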

Page 22

Manual Coding

Prioritize units to code

Can produce under-coverage estimates of the backlog by industrial sector

Ultimate goal

Improve automatic coding

80% - 90%?

Code all remaining active units

Page 23

Quality Evaluation of Classification Updates

Update privileges will be expanded

Subject-matter specialists

Collection personnel

Need to evaluate the quality of updates

Prevent systematic errors

Identify where to focus training

Page 24

Quality Evaluation of Classification Updates

Two processes

Notification and sample selection

1- Notification

Specialist determines the set of enterprises to look at

Every update to a targeted enterprise is sent to the specialist

Agree/Disagree/Do nothing

Make use of expertise of specialist

Specialists keep up-to-date with their frame

Page 25

Quality Evaluation of Classification Updates

2- Sample selection and evaluation

Based on industry, source of industry, size and complexity of enterprise

Re-code and compare

Minimize respondent input when re-coding

Using notification and sample

Produce error rate for industrial coding

Target specific problems
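The "re-code and compare" step leads to an error rate that can be computed overall or by group to target specific problems. The sketch below assumes each sampled update carries a sampling weight and a source label; the record layout, the placeholder codes and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical evaluation sample: the code on the register, the code
# assigned by the re-coder, the source of the update and a sampling weight.
SAMPLE = [
    {"source": "automatic", "original": "A", "recoded": "A", "weight": 10.0},
    {"source": "automatic", "original": "A", "recoded": "B", "weight": 10.0},
    {"source": "manual", "original": "C", "recoded": "C", "weight": 5.0},
    {"source": "manual", "original": "D", "recoded": "D", "weight": 5.0},
]


def error_rate(records):
    """Weighted share of records whose re-code differs from the original."""
    total = sum(r["weight"] for r in records)
    wrong = sum(r["weight"] for r in records if r["original"] != r["recoded"])
    return wrong / total if total else 0.0


def error_rates_by(records, key):
    """Weighted error rate for each value of `key` (for example, the source)."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {value: error_rate(group) for value, group in groups.items()}


print(error_rate(SAMPLE))
print(error_rates_by(SAMPLE, "source"))
```

Breaking the rate down by source, industry or size is what would show where systematic errors occur and where training should be focused.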

Page 26

Quality Assurance Survey

Goal: assess the quality of classification on the BR on an on-going basis

Assess dead/alive status as well

Point in time surveys done in the past

1993, 1995, 1997, 2002

Implement a continuous survey

Produce overall results monthly

Produce detailed results combining 12 months

Page 27

Quality Assurance Survey

Stratification

Industrial sectors

2 or 3 size strata

Have higher sampling fraction for larger size

Recently contacted

Considered to have valid classification

Sample allocation

Target 3.5% standard error for annual industrial classification error rate

550 units a month
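To give a feel for what a 3.5% standard error implies, the sketch below uses the textbook formula for a proportion under simple random sampling. This is an assumption: the slide does not say at which level (overall, sector or stratum) the target applies, and the real design is stratified, so these numbers are only a rough guide.

```python
import math


def standard_error(p, n):
    """Standard error of an estimated proportion p from n units (SRS)."""
    return math.sqrt(p * (1 - p) / n)


def units_needed(p, target_se):
    """Smallest n giving a standard error at or below target_se (SRS)."""
    return math.ceil(p * (1 - p) / target_se ** 2)


annual_sample = 550 * 12             # units surveyed over 12 months
worst_case = units_needed(0.5, 0.035)  # p = 0.5 maximizes the variance

print(annual_sample, worst_case)
print(round(standard_error(0.5, worst_case), 4))
```

Under these assumptions roughly 205 units are enough for one domain in the worst case (error rate of 50%), so an annual sample of 6 600 units could in principle support the target for a number of domains.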

Page 28

Quality Assurance Survey

Currently doing a pilot test

Monthly estimates produced

Yearly estimates based on weighted average of 12 monthly measures

Weighted average based on 1/12

Weighted average based on population ratio over the year (Nm/(N1+...+N12))
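The two weighting options above can be compared side by side. The monthly error rates and population counts below are made-up illustration values, and the function names are hypothetical.

```python
def yearly_equal_weights(monthly_rates):
    """Each month gets weight 1/12 (more generally 1/len)."""
    return sum(monthly_rates) / len(monthly_rates)


def yearly_population_weights(monthly_rates, monthly_populations):
    """Month m gets weight N_m / (N_1 + ... + N_12)."""
    total = sum(monthly_populations)
    return sum(
        rate * n / total
        for rate, n in zip(monthly_rates, monthly_populations)
    )


# Made-up example: the error rate and the population both rise mid-year.
rates = [0.10] * 6 + [0.20] * 6
populations = [100] * 6 + [300] * 6

print(yearly_equal_weights(rates))
print(yearly_population_weights(rates, populations))
```

The two options agree when the population is stable over the year; when it changes, the population-ratio weights give more influence to the months with more units.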

Page 29

Quality Assurance Survey

Survey will be used to clean up the register as an independent source

Evaluate industrial in-scope and out-of-scope rates

Evaluate industrial error rate for non-surveyed portion of the register (e.g. small enterprises)

Evaluate death rate in order to adjust sample sizes

Potential use

Evaluate frame quality for new surveys

Clean up part of the register

Page 30

Conclusion

Classification is essential to the BR

Redesign provides an opportunity

To improve coding

To standardize tools used for coding

To measure quality of coding adequately

To set up good practices/good reports

Results

Better quality of business survey frames

More efficient surveys

Page 31

For more information, please contact

Visit our web site at www.statcan.ca

Yanick Beaucage 613-951-4622

[email protected]