Kommunikasjon (Communication): Communicating accuracy of register statistics

Communicating Accuracy of Register Statistics Thomas Laitila Statistics Sweden and Örebro university Presentation at Nordiskt Statistikermöte, Bergen, 2013

Page 1

Communicating Accuracy of Register Statistics

Thomas Laitila

Statistics Sweden and Örebro University

Presentation at Nordiskt Statistikermöte, Bergen, 2013

Page 2

Outline

Why - Why measure accuracy?

Criteria - Criteria for measures of uncertainty of register statistics

CIm - Confidence Image

Example

Discussion

Bergen, 2013 2 Thomas Laitila

Page 3

Why - Some basic questions

• What is statistics (statistical inference methods) all about?

• What makes statistics so special, and why is it of value to us?

Page 4

Why - Chatterjee (2003)

• There are two methods for deriving statements: deduction and induction

• Statistics is an extension of epistemology (the theory of knowledge and knowledge building)

• Statistics provides a method for inductive inference

Page 5

Why - Induction

[Diagram of induction: Assumptions + Evidence about the Area of concern → Statement]

Page 6

Why - Induction, example

Assumptions: Ignorable nonresponse

Evidence: Sample of units

Area of concern: Swedish labor market

Statement: Estimate of the unemployment rate

Page 7

Why - Induction, another example

Evidence: Register on units

Area of concern: Swedish labor market

Statement: Estimate of the unemployment rate

Assumptions: Derived variables

Page 8

Why - Induction and Evidence

• All evidence comes with uncertainty about the general case

• Statements derived by induction are uncertain

• Example: the inductive statement "A man will inevitably die"

– Evidence: no man born more than, e.g., 150 years ago is still alive.

Page 9

Why - Why is statistical inference so special?

• Statistics is, so far, the only theory that provides objective measures of the uncertainty of inductive inference.

• Objective measures are important for the general communication of statistics.

Page 10

Why - Summing up

• Register statistics yield inductive statements

• Register statistics are thereby uncertain

• Statistical inference provides objective measures of uncertainty

• Inference on register statistics should be founded on statistical inference theory

• Do we have appropriate statistical tools?

Yes and no

Page 11

Criteria - Approaches to statistical inference on register statistics

• Model based methods

– Multivariate techniques

– Data mining methods

– Stochastic processes

– and more

• Sample surveys

– Use sample surveys as a complement for measuring uncertainty

Page 12

Criteria – Criteria for a measure

a) Founded in statistical inference theory

• Interpretable and objective measures

b) Easy for users to interpret

• How easy is the interpretation of an ordinary confidence interval?

c) Low cost

d) Comparable with measures in sample surveys

• Comparability/coherency

Page 13

CIm – A new statistical tool

• Statistical inference methods center on

– a point estimator, and

– its sampling distribution

• In register statistics, treating the variables as fixed, there is

– a point estimate, but

– its sampling distribution is degenerate

• One avenue for finding appropriate tools for register statistics is to develop statistical inference procedures that are not based on the sampling distribution of an estimator!

Page 14

CIm - Laitila (2012)

• Confidence Images

• Idea: Use external information to restrict the potential values of the study variables $(y_1, y_2, \ldots, y_N)$

– This restricts the potential values of the population parameter of interest $t = f(y_1, y_2, \ldots, y_N)$

– The more information, the more $t$ is restricted.

• Information can come in any form, as long as it comes with a measure of uncertainty

• We can use registers, sample surveys, old statistics, Google, Facebook, whatever!
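The idea on this slide can be sketched for the simplest parameter, a population total $t = y_1 + \cdots + y_N$. The sketch assumes that every piece of information reduces to a closed interval $[lo_k, hi_k]$ for each $y_k$. The function name `image_of_total` is hypothetical. Because the total increases in every $y_k$, its image over all admissible combinations is just [sum of the $lo_k$, sum of the $hi_k$]. Narrowing any single interval narrows the image, which illustrates "the more information, the more $t$ is restricted". If all the information is certain, $t$ lies in this image with probability one. The sketch does not address how confidence levels combine when some information is uncertain.

```python
from typing import Sequence, Tuple

Interval = Tuple[int, int]


def image_of_total(bounds: Sequence[Interval]) -> Interval:
    """Image of t = y_1 + ... + y_N when each y_k is only known to lie in
    the closed interval bounds[k] = (lo_k, hi_k) (hypothetical helper).

    The total is increasing in every y_k, so its smallest and largest
    attainable values come from the smallest and largest admissible y_k.
    """
    for lo, hi in bounds:
        if lo > hi:
            raise ValueError(f"empty interval ({lo}, {hi})")
    return sum(lo for lo, _ in bounds), sum(hi for _, hi in bounds)


# Hypothetical toy population: two observed units (degenerate intervals)
# and one unit known only to lie in [0, 10].
print(image_of_total([(5, 5), (7, 7), (0, 10)]))  # (12, 22)

# More information about the third unit (say it is at most 4) shrinks the image.
print(image_of_total([(5, 5), (7, 7), (0, 4)]))   # (12, 16)
```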

Page 15

Example - Estimating the total number of cattle on Swedish farms

County | No. of units | No. of missing values | Sum of observed y_k
1 | 18713 | 3817 | 393797
2 | 14321 | 2918 | 296944
3 | 12281 | 2475 | 261832
4 | 10836 | 2213 | 216535
5 | 8646 | 1763 | 185285
6 | 7233 | 1485 | 148029
Total | 72030 | 14671 | 1502422

Table 1: Information¹ in the available register on farms (N = 72030)

¹ No measurement or coverage errors in the register.

Problem: Estimate the total number of cattle with an interval estimate using the information in the register, which contains missing values.

Page 16

Example - Pieces of information

• A1: Available data in the register

• A2: The 100 largest farms are in the register, and the number of cattle on the 100th largest farm is 553.

• A3: Table 2 (below)

• A4: A 95% CI for the proportion of farms with zero cattle: 0.60 – 0.71
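Here is a minimal sketch of what A1 and A2 imply for the total. It assumes that cattle counts are non-negative and that "in the register" means the farm's value is observed. With A1 alone, nothing caps the 14671 missing values, so the total is only bounded below by the observed sum. A2 adds a cap. Any farm with more than 553 cattle would be among the 100 largest, and those are all observed, so each missing farm has at most 553. The function names are hypothetical.

```python
import math
from typing import Tuple

# Totals from Table 1 (its footnote: no measurement or coverage errors).
N_MISSING = 14671        # farms whose cattle count y_k is missing
OBSERVED_SUM = 1502422   # sum of the observed y_k
CAP_A2 = 553             # A2: cattle on the 100th largest farm


def bounds_a1() -> Tuple[float, float]:
    # A1 only: a missing y_k can be 0 but has no known upper limit.
    return OBSERVED_SUM, math.inf


def bounds_a1_a2() -> Tuple[int, int]:
    # A2: the 100 largest farms are all observed, so no missing farm can
    # have more cattle than the 100th largest farm.
    return OBSERVED_SUM, OBSERVED_SUM + N_MISSING * CAP_A2


print(bounds_a1())      # (1502422, inf)
print(bounds_a1_a2())   # (1502422, 9615485)
```

Rounded to thousands, the second pair gives 1502 and 9615. That is the A1 – A2 row of Table 3.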

Page 17: Kommunikasjon: Communicating accuracy of register statistics

In register In population

County y_k=0 y_k>=553 y_k>=100 y_k>=100

1 9108 29 1252 1288

2 6989 17 931 959

3 5960 21 784 800

4 5329 12 677 701

5 4196 10 581 601

6 3565 11 467 477

Total 35147 100 4692 4826

Table 2: Additional information (N:o units)

Example – Table 2
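The last two columns of Table 2 show how many farms with a missing value have at least 100 cattle. This reading assumes that the "in register" columns count farms with an observed y_k and that the "in population" column counts all farms. The short sketch below checks the Total row and derives these counts per county. The variable names are illustrative.

```python
# Table 2, number of units per county:
# (register y_k = 0, register y_k >= 553, register y_k >= 100, population y_k >= 100)
TABLE2 = {
    1: (9108, 29, 1252, 1288),
    2: (6989, 17, 931, 959),
    3: (5960, 21, 784, 800),
    4: (5329, 12, 677, 701),
    5: (4196, 10, 581, 601),
    6: (3565, 11, 467, 477),
}
TOTAL_ROW = (35147, 100, 4692, 4826)

# The column sums reproduce the Total row of Table 2.
column_totals = tuple(sum(row[i] for row in TABLE2.values()) for i in range(4))
assert column_totals == TOTAL_ROW

# Farms with y_k >= 100 in the population that are not among the observed
# register values must be farms whose y_k is missing.
missing_ge100 = {county: pop - reg for county, (_, _, reg, pop) in TABLE2.items()}
print(missing_ge100)                # {1: 36, 2: 28, 3: 16, 4: 24, 5: 20, 6: 10}
print(sum(missing_ge100.values()))  # 134
```

Each of these counts is well below the county's number of missing values in Table 1, so the two tables are consistent.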

Page 18

Example – Calculated CIms

Information used | Confidence level | Lower bound | Upper bound
A1 – A2 | 100% | 1502 | 9615
A1 – A3 | 100% | 1516 | 3016
A1 – A4 | 95% | 1516 | 2217

Table 3: Confidence intervals for the total number of cattle based on information sets A1 – A4 (thousands of cattle; true value 1.56 million)
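The sketch below reproduces the A1 – A3 and A1 – A4 rows of Table 3 under one reading of the example: each missing farm is placed at the smallest or largest value the information allows. It assumes the following reading of the inputs:

- 4826 − 4692 = 134 missing farms have 100 ≤ y_k ≤ 553 (Table 2 and A2).
- The other missing farms have y_k ≤ 99.
- A4 forces at least ⌈0.60 · 72030⌉ = 43218 zero-cattle farms in total, so at least 8071 among the missing ones.

Under this reading, the 95% level of the A1 – A4 row comes from A4 being a 95% CI. This is a reconstruction and not necessarily the algorithm in Laitila (2012). The function names are hypothetical.

```python
import math
from fractions import Fraction
from typing import Optional, Tuple

# Totals taken from Tables 1 and 2 (the register is assumed error-free).
N = 72030                      # farms in the population
N_MISSING = 14671              # farms whose y_k is missing
OBSERVED_SUM = 1502422         # sum of the observed y_k
ZEROS_OBSERVED = 35147         # observed farms with y_k = 0
MISSING_GE100 = 4826 - 4692    # missing farms with y_k >= 100 (Table 2)
CAP = 553                      # A2: no missing farm exceeds the 100th largest


def total_bounds(zero_share: Optional[Tuple[str, str]] = None) -> Tuple[int, int]:
    """Smallest and largest total consistent with A1-A3, plus A4 if given.

    zero_share: A4's bounds on the population share of zero-cattle farms,
    given as strings so Fraction represents them exactly (e.g. "0.60").
    """
    rest = N_MISSING - MISSING_GE100    # missing farms with 0 <= y_k <= 99
    min_zeros, max_zeros = 0, rest      # zeros that must / may occur in `rest`
    if zero_share is not None:
        low, high = (Fraction(s) for s in zero_share)
        min_zeros = max(0, math.ceil(low * N) - ZEROS_OBSERVED)
        max_zeros = min(rest, math.floor(high * N) - ZEROS_OBSERVED)
        if not 0 <= min_zeros <= max_zeros:
            raise ValueError("A4 is inconsistent with Tables 1 and 2")
    # Lower bound: 100 cattle on each missing large farm, as many zeros as
    # allowed among the rest, and 1 head on each remaining farm.
    lower = OBSERVED_SUM + MISSING_GE100 * 100 + (rest - max_zeros)
    # Upper bound: 553 on each missing large farm, only the forced zeros,
    # and 99 on every other missing farm.
    upper = OBSERVED_SUM + MISSING_GE100 * CAP + (rest - min_zeros) * 99
    return lower, upper


def thousands(x: int) -> int:
    """Round a non-negative count to the nearest thousand (integer arithmetic)."""
    return (x + 500) // 1000


for label, share in [("A1 - A3", None), ("A1 - A4", ("0.60", "0.71"))]:
    lo, hi = total_bounds(share)
    print(label, thousands(lo), thousands(hi))
# A1 - A3 1516 3016
# A1 - A4 1516 2217
```

Both intervals contain the true value of 1.56 million given in the caption of Table 3. A4's upper limit of 0.71 does not bind here, because it allows more zeros than there are missing farms with y_k ≤ 99.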

Page 19

Discussion

• "Image" rather than "interval", since the method may not yield a connected interval of points; the CIm may consist of, e.g., separate disjoint intervals

• The CIm can be generalized directly to multivariate cases.

• Easy to calculate in some cases; in others, the calculation can be very complicated. Research is needed here.
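The sketch below shows one way a disjoint CIm can arise, using a hypothetical piece of information: suppose it were known that a missing farm has either no cattle or between 100 and 553. Each unit's admissible values are then a union of intervals. The image of the total is the Minkowski sum of those unions, merged wherever pieces overlap. This is an illustration only. The information and the function names are assumptions and are not part of the cattle example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def add_sets(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Minkowski sum {x + y : x in A, y in B} of two unions of closed
    intervals, returned as sorted, non-overlapping intervals."""
    sums = sorted((lo1 + lo2, hi1 + hi2) for lo1, hi1 in a for lo2, hi2 in b)
    merged: List[Interval] = []
    for lo, hi in sums:
        if merged and lo <= merged[-1][1]:
            # Overlaps (or touches) the previous piece: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def image_of_total_union(admissible: List[List[Interval]]) -> List[Interval]:
    """Image of y_1 + ... + y_N when each y_k lies in a union of intervals."""
    image: List[Interval] = [(0, 0)]
    for unit in admissible:
        image = add_sets(image, unit)
    return image


# Hypothetical information: each missing farm has 0 cattle or 100 to 553.
either = [(0, 0), (100, 553)]
print(image_of_total_union([either]))          # [(0, 0), (100, 553)]
print(image_of_total_union([either, either]))  # [(0, 0), (100, 1106)]
```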

Page 20

Discussion

• The CIm fulfills all four criteria listed above.

• Most interesting:

– Traditional confidence intervals are special cases of CIms

– Any kind of information (data) can be used, as long as there is a probability measure of its certainty

• The CIm is a theory; methodological development is needed.

Page 21

Thank you for your attention!

To request the paper Laitila (2012): [email protected]
