Communicating Accuracy of Register Statistics
Thomas Laitila
Statistics Sweden and Örebro University
Presentation at Nordiskt Statistikermöte (Nordic Statistical Meeting), Bergen, 2013
Outline
Why - Why measure accuracy?
Criteria - Criteria on measures of uncertainty of register statistics
CIm - Confidence Image
Example
Discussion
Bergen, 2013 2 Thomas Laitila
Why - Some basic questions
• What is statistics (statistical inference methods) all about?
• What makes statistics so special, and why is it of value to us?
Why - Chatterjee (2003)
• There are two methods for deriving statements – deduction and induction
• Statistics is an extension of epistemology (the theory of knowledge and knowledge building)
• Statistics provides a method for inductive inference
Why - Induction
[Figure: induction diagram linking Assumptions, Evidence, Area of concern and Statement]
Why - Induction, example
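[Figure: the induction diagram for a sample survey. Assumptions: ignorable nonresponse; Evidence: sample of units; Area of concern: Swedish labor market; Statement: estimate of rate of unemployment]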
Why - Induction, another example
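[Figure: the induction diagram for register statistics, with the elements register on units (evidence), derived variables, Swedish labor market (area of concern) and estimate of rate of unemployment (statement)]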
Why - Induction and Evidence
• All evidence comes with uncertainty about the general case
• Statements derived by induction are uncertain
• Example: inductive statement: every man will inevitably die
– Evidence: no man born more than, e.g., 150 years ago is still alive.
Why - Why is statistical inference so special?
• Statistics is so far the only theory that provides objective measures of the uncertainty of inductive inference.
• Objective measures are important for the general communication of statistics.
Why - Summing up
• Register statistics yield inductive statements
• Register statistics are thereby uncertain
• Statistical inference provides objective measures of uncertainty
• Inference on register statistics should be founded on statistical inference theory
• Do we have appropriate statistical tools?
Yes and no
Criteria - Approaches for statistical inference on register statistics
• Model based methods
– Multivariate techniques
– Data mining methods
– Stochastic processes
– and more
• Sample surveys
– Use sample surveys as a complement for measuring uncertainty
Criteria – Criteria for a measure
a) Founded in statistical inference theory
• Interpretable and objective measures
b) Easy for users to interpret
• How easy is it to interpret an ordinary confidence interval?
c) Low cost
d) Comparable with measures used in sample surveys
• Comparability/coherence
CIm – A new statistical tool
• Statistical inference methods center on
– a point estimator, and
– its sampling distribution
• In register statistics, treating the variables as fixed, there is
– a point estimate, but
– its sampling distribution is degenerate
• One avenue towards appropriate tools for register statistics is to develop statistical inference procedures that are not based on the sampling distribution of an estimator!
CIm - Laitila (2012)
• Confidence Images
• Idea: use external information to restrict the potential values of the study variables (y_1, y_2, …, y_N)
– This restricts the potential values of the population parameter of interest t = f(y_1, y_2, …, y_N)
– The more information, the more t is restricted.
• Information can come in any form, as long as it comes with a measure of uncertainty
• We can use registers, sample surveys, old statistics, Google, Facebook, whatever!
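The confidence-image idea can be illustrated by brute force on a tiny population: enumerate every vector (y_1, …, y_N) that is consistent with all pieces of information and collect the resulting values of t = f(y_1, …, y_N). The sketch below is only an illustration under stated assumptions: the population, the three pieces of information and their probabilities are hypothetical, and combining the probabilities with Boole's inequality is our own conservative choice, not necessarily the construction in Laitila (2012).

```python
from itertools import product

# Hypothetical toy population: N = 3 units, each y_k an integer count in 0..10.
# Each piece of information: (description, predicate on y, probability it holds).
INFORMATION = [
    ("register: y_1 = 4; y_2 and y_3 are missing", lambda y: y[0] == 4, 1.00),
    ("external: no unit has more than 6", lambda y: max(y) <= 6, 1.00),
    ("survey: at least one unit has y_k = 0", lambda y: 0 in y, 0.95),
]


def f(y):
    """Parameter of interest: the population total t = y_1 + ... + y_N."""
    return sum(y)


def confidence_image(info, values=range(11), n_units=3):
    """Image of f over every y consistent with all pieces, plus a confidence level."""
    image = {
        f(y)
        for y in product(values, repeat=n_units)
        if all(holds(y) for _, holds, _ in info)
    }
    # If every piece is true, the true t lies in the image. Boole's inequality gives
    # P(all pieces true) >= 1 - sum of the individual failure probabilities.
    level = 1 - sum(1 - p for _, _, p in info)
    return sorted(image), level


image, level = confidence_image(INFORMATION)
print(image, round(level, 2))  # [4, 5, 6, 7, 8, 9, 10] 0.95
```

Dropping the survey piece widens the image to 4 through 16 at level 1, which mirrors the point that more information restricts t further.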
Example - Estimation of the total number of cattle on Swedish farms
County   No. of units   No. of missing values   Sum of y_k
1        18713          3817                    393797
2        14321          2918                    296944
3        12281          2475                    261832
4        10836          2213                    216535
5        8646           1763                    185285
6        7233           1485                    148029
Total    72030          14671                   1502422
Table 1: Information¹ in the available register on farms (N = 72030)
¹ No measurement or coverage errors in the register.
Problem: Estimate the total number of cattle with an interval estimate using the information in the register, which contains missing values.
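As a first check, the county rows of Table 1 add up to the totals shown, and the register on its own bounds the total only from below: the observed sum is exact (no measurement or coverage errors), but a missing cattle count can be any non-negative number. A minimal Python sketch, assuming only that cattle counts are non-negative; the dictionary layout is our own.

```python
import math

# Table 1: county -> (number of units, number of missing values, sum of observed y_k)
TABLE_1 = {
    1: (18713, 3817, 393797),
    2: (14321, 2918, 296944),
    3: (12281, 2475, 261832),
    4: (10836, 2213, 216535),
    5: (8646, 1763, 185285),
    6: (7233, 1485, 148029),
}

n_units = sum(n for n, _, _ in TABLE_1.values())
n_missing = sum(m for _, m, _ in TABLE_1.values())
observed_sum = sum(s for _, _, s in TABLE_1.values())

# Register only: the observed sum is exact, and each missing count can be
# any value from 0 upwards, so there is no finite upper bound.
lower, upper = observed_sum, math.inf
print(n_units, n_missing, observed_sum)  # 72030 14671 1502422
print(lower, upper)                      # 1502422 inf
```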
Example - Pieces of information
• A1: Available data in the register
• A2: The 100 largest farms are in the register and the N:o cattle for the 100th largest farm is 553.
• A3: Table 2 (below)
• A4: A 95% CI of the proportion of farms with zero cattle: 0.6 – 0.71
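Pieces A1 and A2 together already give a bounded interval: since all 100 largest farms are in the register and the 100th largest has 553 cattle, no farm with a missing value can have more than 553 cattle. The arithmetic below is our reading of how A1 and A2 combine; rounded to thousands it agrees with the A1 – A2 row of Table 3 (1502 and 9615).

```python
# Information A1 (Table 1): sum of the observed y_k and number of missing values.
OBSERVED_SUM = 1_502_422
N_MISSING = 14_671
# Information A2: the 100 largest farms are all in the register and the 100th largest
# has 553 cattle, so no farm with a missing value can have more than 553 cattle.
CAP = 553

lower = OBSERVED_SUM                    # every missing farm has 0 cattle
upper = OBSERVED_SUM + N_MISSING * CAP  # every missing farm has 553 cattle
print(lower, upper)  # 1502422 9615485
```

Both bounds hold with certainty given A1 and A2, which is why the confidence level for this row is 100%.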
Example – Table 2
County   Register: y_k = 0   Register: y_k >= 553   Register: y_k >= 100   Population: y_k >= 100
1        9108                29                     1252                   1288
2        6989                17                     931                    959
3        5960                21                     784                    800
4        5329                12                     677                    701
5        4196                10                     581                    601
6        3565                11                     467                    477
Total    35147               100                    4692                   4826
Table 2: Additional information (No. of units)
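The useful content of Table 2 is the gap between the last two columns. Reading the register columns as counts among the observed y_k and the population column as a count over all farms (our interpretation), the difference is the number of farms with missing values that have at least 100 cattle; every other missing farm has at most 99. A sketch of that bookkeeping, with the per-county missing counts taken from Table 1:

```python
# Table 2 columns: (register y_k = 0, register y_k >= 553,
#                   register y_k >= 100, population y_k >= 100)
TABLE_2 = {
    1: (9108, 29, 1252, 1288),
    2: (6989, 17, 931, 959),
    3: (5960, 21, 784, 800),
    4: (5329, 12, 677, 701),
    5: (4196, 10, 581, 601),
    6: (3565, 11, 467, 477),
}
# Missing values per county, from Table 1.
MISSING = {1: 3817, 2: 2918, 3: 2475, 4: 2213, 5: 1763, 6: 1485}

# Farms with >= 100 cattle counted in the population but not in the register
# must be among the farms with missing values.
missing_large = {c: pop - reg for c, (_, _, reg, pop) in TABLE_2.items()}
# The remaining missing farms have at most 99 cattle.
missing_small = {c: MISSING[c] - missing_large[c] for c in MISSING}

print(missing_large)                        # {1: 36, 2: 28, 3: 16, 4: 24, 5: 20, 6: 10}
print(sum(missing_large.values()))          # 134
print(sum(missing_small.values()))          # 14537
print(sum(r[1] for r in TABLE_2.values()))  # 100, the farms covered by A2
```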
Example – Calculated CIms
Information used   Confidence level   Lower bound   Upper bound
A1 – A2            100%               1502          9615
A1 – A3            100%               1516          3016
A1 – A4            95%                1516          2217
Table 3: Confidence intervals for the total number of cattle based on information sets A1 – A4 (thousands of cattle; true value 1.56 million).
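The slides do not show how the bounds in Table 3 were computed. The sketch below is our reconstruction, assuming: missing farms with at least 100 cattle (134 of them, from Table 2) have between 100 and 553 cattle (A2); the other missing farms have between 0 and 99; and the 95% interval in A4 is turned into integer counts of zero-cattle farms among all N farms. Under these assumptions it reproduces every rounded figure in Table 3. Only A4 is uncertain, so the last row has level 95%.

```python
# All counts are taken from Tables 1 and 2.
N = 72_030                       # farms in the population (Table 1)
OBSERVED_SUM = 1_502_422         # A1: sum of observed y_k
N_MISSING = 14_671               # A1: farms with a missing y_k
CAP = 553                        # A2: no missing farm has more than 553 cattle
ZEROS_OBSERVED = 35_147          # A3: register farms with y_k = 0
MISSING_LARGE = 4_826 - 4_692    # A3: missing farms with y_k >= 100 (= 134)
MISSING_SMALL = N_MISSING - MISSING_LARGE  # missing farms with y_k <= 99


def bounds_a1_a2():
    # Each missing farm has between 0 and CAP cattle.
    return OBSERVED_SUM, OBSERVED_SUM + N_MISSING * CAP


def bounds_a1_a3(min_small_zeros=0, max_small_zeros=MISSING_SMALL):
    # Large missing farms: 100..CAP cattle. Small missing farms: 0..99 cattle,
    # of which between min_small_zeros and max_small_zeros have exactly 0.
    lower = (OBSERVED_SUM + MISSING_LARGE * 100
             + (MISSING_SMALL - max_small_zeros) * 1)
    upper = (OBSERVED_SUM + MISSING_LARGE * CAP
             + (MISSING_SMALL - min_small_zeros) * 99)
    return lower, upper


def bounds_a1_a4(pct_low=60, pct_high=71):
    # A4 (95% CI): between 60% and 71% of all N farms have zero cattle.
    # Integer arithmetic avoids floating-point error in the ceiling and floor.
    min_zeros = -(-pct_low * N // 100)   # ceil(0.60 * N) = 43218
    max_zeros = pct_high * N // 100      # floor(0.71 * N) = 51141
    # Large missing farms are nonzero, so extra zeros must be small missing farms.
    min_small = max(0, min_zeros - ZEROS_OBSERVED)
    max_small = min(MISSING_SMALL, max_zeros - ZEROS_OBSERVED)
    return bounds_a1_a3(min_small, max_small)


for label, (lo, hi), level in [
    ("A1 - A2", bounds_a1_a2(), "100%"),
    ("A1 - A3", bounds_a1_a3(), "100%"),
    ("A1 - A4", bounds_a1_a4(), "95%"),
]:
    print(label, level, round(lo / 1000), round(hi / 1000))
# A1 - A2 100% 1502 9615
# A1 - A3 100% 1516 3016
# A1 - A4 95% 1516 2217
```

In this reading the upper limit of A4 (71%) does not bind, because it still allows every small missing farm to have zero cattle; only the lower limit (60%) tightens the upper bound.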
Discussion
• “Image” rather than “interval”, since the method may not produce a connected set of points: a CIm may consist of, e.g., several disjoint intervals
• The CIm can be generalized directly to multivariate cases.
• Easy to calculate in some cases; in others, the calculation can be very complicated. Research is needed here.
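Why the result is an image and not necessarily an interval can be seen in a two-farm toy case. Suppose (hypothetically) the information says each farm has either 0 cattle or between 50 and 60. The possible totals then fall into three separate pieces. The helper `to_intervals` is our own and simply collapses a set of integers into runs of consecutive values.

```python
from itertools import product

# Hypothetical information: each of two farms has either 0 cattle or 50-60 cattle.
ALLOWED = [0] + list(range(50, 61))


def to_intervals(values):
    """Collapse a set of integers into maximal runs (a, b) of consecutive values."""
    runs = []
    for v in sorted(values):
        if runs and v == runs[-1][1] + 1:
            runs[-1][1] = v        # extend the current run
        else:
            runs.append([v, v])    # start a new run
    return [tuple(r) for r in runs]


# Image of the total t = y_1 + y_2 over all values allowed by the information.
image = {y1 + y2 for y1, y2 in product(ALLOWED, repeat=2)}
print(to_intervals(image))  # [(0, 0), (50, 60), (100, 120)]
```

Reporting only the hull [0, 120] would throw away information; the image keeps the gaps.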
Discussion
• The CIm fulfills all four criteria listed above.
• Most interesting:
– Traditional confidence intervals are special cases of CIms
– Any kind of information (data) can be used, as long as there is a probability measure of its certainty
• The CIm is a theory; methodological development is still needed.
Thanks for your attention!
Requests for the paper Laitila (2012): [email protected]