Adaptation to the survey ICT-H in Canary Islands of the Dual Frame methodology



The target variables considered are generated from Bernoulli distributions with different parameters:

p1: probability of availability of fixed telephone.

p2: probability of availability of desktop computer.

p3: probability of having only fixed telephone, for those with fixed telephone.

p4: probability of availability of internet connection, for those with fixed telephone.

p5: probability of availability of internet connection, for those without fixed telephone.
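The generation of the artificial target variables can be sketched as follows. This is only an illustration under stated assumptions: the function name, the population size and the parameter values in the usage line are hypothetical, and the computer variable is drawn independently of the telephone.

```python
import numpy as np

def generate_population(n_households, p1, p2, p3, p4, p5, seed=0):
    """Draw Bernoulli target variables for an artificial population.

    p1..p5 follow the definitions in the text; all values passed in
    are illustrative assumptions, not the ones used in the study.
    """
    rng = np.random.default_rng(seed)
    # Fixed telephone: Bernoulli(p1).
    fixed_phone = rng.random(n_households) < p1
    # Desktop computer: Bernoulli(p2), independent of the telephone.
    desktop = rng.random(n_households) < p2
    # Only fixed telephone: Bernoulli(p3) among households with fixed telephone.
    only_fixed = fixed_phone & (rng.random(n_households) < p3)
    # Internet: Bernoulli(p4) with fixed telephone, Bernoulli(p5) without.
    internet = rng.random(n_households) < np.where(fixed_phone, p4, p5)
    return {
        "fixed_phone": fixed_phone,
        "desktop": desktop,
        "only_fixed": only_fixed,
        "internet": internet,
    }

# Hypothetical parameter values, chosen only to run the sketch.
pop = generate_population(200_000, p1=0.75, p2=0.5, p3=0.1, p4=0.7, p5=0.2, seed=1)
```

Making p4 and p5 closer to each other makes the internet variable less dependent on the availability of a fixed telephone.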

ICT-H Survey (INE): Provides information on the equipment and use of new technologies in households at the Autonomous Community level.

It follows a stratified three-stage random sampling design in each province, with census sections as primary sampling units, dwellings as secondary units and people as tertiary units.

120 sections are sampled (8 dwellings per section, approximately 808 dwellings).

A quarter of the dwellings is renewed each year (rotating panel since 2004).

It uses direct ratio estimates with calibrated weights, wj.

All households are surveyed by personal interview (CAPI) at the first visit; in subsequent visits, those that have a fixed telephone are surveyed by telephone interview (CATI).

The variables of interest include, among others: availability of fixed telephone, mobile phone, only fixed telephone, desktop computer, portable computer and internet.

ICT-H Canary Survey (ISTAC): Similar to the ICT-H survey (INE), but with a lighter questionnaire, and it provides information at the island level (performed only in 2006 and 2010).

It follows a stratified three-stage random sampling design.

180 sections are sampled (20 dwellings per section, approximately 3,500 dwellings).

It uses direct ratio estimates with calibrated weights, wj.

70% of the survey is conducted by telephone interview (CATI) and 30% by personal interview.
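A direct ratio estimate with weights wj can be sketched as a weighted total divided by a weighted total. The function name and the toy data are hypothetical, and the calibration step that produces the weights is not shown.

```python
import numpy as np

def ratio_estimate(y, w, x=None):
    """Weighted ratio estimate sum(w*y) / sum(w*x).

    With x omitted (all ones) this is the weighted proportion of
    households for which the indicator y equals 1.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    x = np.ones_like(y) if x is None else np.asarray(x, dtype=float)
    return np.sum(w * y) / np.sum(w * x)

# Toy data: four households, indicator of internet access, calibrated weights.
proportion = ratio_estimate([1, 0, 1, 1], [100, 200, 100, 100])
```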


González-Dávila, Enrique (1); and González-Yanes, Jesús Alberto (2)

(1) [email protected], (2) [email protected]

(1) Departamento de Estadística, Investigación Operativa y Computación (Universidad de La Laguna, Spain)

(2) Instituto Canario de Estadística (ISTAC, Spain)

1. Description of the surveys

4. Real implementation on the ICT-H Canary survey (2010)

ERCIM 2011: 4th International Conference of the ERCIM WG on Computing & Statistics

London, December 17-19, 2011

Abstract

Combining information collected in person and by telephone in a survey usually involves two sampling frames, one associated with the census or population register and another with the phone book. Applying dual frame methodology to surveys on the availability or use of new technologies in households is of particular interest, because the target variables can coincide with the frames, have different degrees of association with them, or be independent of them. In this work the methodology is adapted to the particular case of the ICT-H survey conducted in the Canary Islands, where one of the frames is entirely contained in the other. The dual frame estimator used is the pseudo maximum likelihood estimator introduced by Skinner and Rao. The results are compared with those of the direct estimator and with the estimates for the Canaries given by the ICT-H survey conducted by the Spanish National Statistics Institute. Additionally, a simulation study of this survey, on an artificial population similar to the actual one, allows us to evaluate the efficiency of its application.

Keywords: Dual-Frame estimators; pseudo maximum likelihood estimator; Relative Bias; Relative Mean Squared Error.

Acknowledgments

This work was supported by the Instituto Canario de Estadística (ISTAC) and the Agencia Canaria de Investigación e Innovación y Sociedad de la Información (ACIISI), and partially supported by the Spanish MICINN Project MTM2010-16828.

2. Questions

1. If we use a survey similar to the one conducted by ISTAC, how does the use of telephone interviews affect the direct estimation of the target variables?

2. Assuming that the direct estimates of the target variables will be biased, is it possible to keep a high percentage of telephone interviews (low cost) and still avoid or correct this bias?

3. How does the telephone interview affect the Dual-Frame estimation of target variables that coincide with, have different degrees of association with, or are independent of the availability of a phone?

[Figure panel: Internet]

QUESTION 1: How does the telephone interview affect the direct estimation?

We build an artificial population of households in the Canary Islands, starting from the 2001 Housing Census of that community, and generate the variables of interest under different conditions.

We simulate the drawing of surveys with the same methodology as the ICT-H Canary survey, varying the proportion of in-person interviews (500 simulations). We obtain the relative mean squared errors and relative biases of the different estimators.
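The two performance measures can be computed from the simulated estimates as sketched below. The definitions used here (bias and root mean squared error, both divided by the true value) are the usual ones and are an assumption; the source does not spell out its exact formulas. Function names and toy values are hypothetical.

```python
import numpy as np

def relative_bias(estimates, true_value):
    """Mean deviation of the simulated estimates, relative to the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return (estimates.mean() - true_value) / true_value

def relative_rmse(estimates, true_value):
    """Root mean squared error of the simulated estimates, relative to the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return np.sqrt(np.mean((estimates - true_value) ** 2)) / true_value

# Toy example: in the study there would be one estimate per simulated survey (500).
simulated = np.array([0.58, 0.61, 0.60, 0.63])
rb = relative_bias(simulated, true_value=0.60)
rrmse = relative_rmse(simulated, true_value=0.60)
```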

Conducting a survey is very expensive, and the use of telephone interviews reduces the cost. However, in this survey most of the target variables are related to the availability of a phone at home.

We assume that the availability of a computer is independent of the fixed telephone. This allows us to evaluate the performance for a variable that is independent of the type of interview.

The variable “only fixed telephone” enables us to evaluate the performance for a variable contained entirely within the telephone frame.

The variable “internet” allows us to evaluate variables that are closely related to the availability of a fixed telephone but are not fully contained within the telephone frame.
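The different behaviour of the three variables can be illustrated with a deliberately simplified calculation. It assumes an unweighted pooled proportion in which in-person interviews represent the whole population and telephone interviews reach only households with fixed telephone; the real survey uses a complex design with calibrated weights, so this is only a sketch of the mechanism. The function name and all parameter values are hypothetical.

```python
def expected_pooled_proportion(p_overall, p_phone, in_person_share):
    """Expected value of an unweighted pooled proportion.

    p_overall: proportion in the whole population (in-person part).
    p_phone: proportion among households with fixed telephone (telephone part).
    in_person_share: fraction of interviews conducted in person.
    """
    return in_person_share * p_overall + (1.0 - in_person_share) * p_phone

# Hypothetical values of the simulation parameters.
p1, p2, p3, p4, p5 = 0.75, 0.5, 0.1, 0.7, 0.2
share = 0.3  # 30% in-person interviews, as in the ISTAC survey

# Computer: independent of the telephone, so both parts agree and there is no bias.
computer = expected_pooled_proportion(p2, p2, share)
# Only fixed telephone: entirely inside the telephone frame.
only_fixed = expected_pooled_proportion(p1 * p3, p3, share)
# Internet: associated with the telephone but not contained in its frame.
internet_true = p1 * p4 + (1.0 - p1) * p5
internet = expected_pooled_proportion(internet_true, p4, share)
```

Under these assumptions the pooled value for the computer equals p2, while the values for "only fixed telephone" and "internet" are pulled towards the proportions among households with fixed telephone.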

[Figure panels: Telephone, Computer, Only fixed telephone]

QUESTION 2 and 3: Using Dual-Frame Estimators

Dual-Frame methodology is suited to surveys that combine several frames in order to cover the total population and lower implementation costs. We only consider the situation of two frames, with one totally contained in the other.

Let A and B be these frames, in particular:

A: all households.

B: households with fixed telephone.

Independent samples of sizes nA and nB are drawn from each frame, with weights $w_j^A$ and $w_j^B$ equal to the inverse of the inclusion probabilities in each frame. We consider that the survey is conducted by in-person interview in frame A and by telephone interview in frame B.

[Figure: frame B nested in frame A, with domains a and ab]

This creates two mutually exclusive domains, a and ab: the units of A that are not in B, and the units that are in both frames at the same time, respectively.

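Hartley Estimator

$$\hat{Y}_H(\theta) = \hat{Y}_a^A + \theta\,\hat{Y}_{ab}^A + (1-\theta)\,\hat{Y}_{ab}^B, \qquad 0 \le \theta \le 1,$$

with the variance-minimizing choice:

$$\theta_{opt} = \frac{V(\hat{Y}_{ab}^B) - \mathrm{Cov}(\hat{Y}_a^A, \hat{Y}_{ab}^A)}{V(\hat{Y}_{ab}^A) + V(\hat{Y}_{ab}^B)},$$

where $\hat{Y}_a^A$ and $\hat{Y}_{ab}^A$ are the weighted totals of domains a and ab estimated from the sample of frame A, and $\hat{Y}_{ab}^B$ is the total of domain ab estimated from the sample of frame B.

The calculation of the variances and the covariance can be complex and depends on the type of sampling performed.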
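A minimal sketch of the Hartley estimator for two nested frames is given below, assuming its usual form: the total of domain a from sample A plus a convex combination, with mixing parameter theta, of the two estimates of the total of domain ab. The function and argument names are hypothetical, and theta is passed in by the user rather than estimated.

```python
import numpy as np

def hartley_total(y_A, w_A, in_B, y_B, w_B, theta):
    """Hartley dual-frame estimate of a total when frame B is nested in frame A.

    y_A, w_A: values and weights of the sample drawn from frame A.
    in_B: True for units of sample A that also belong to frame B (domain ab).
    y_B, w_B: values and weights of the sample drawn from frame B.
    theta: weight given to the frame-A estimate of domain ab (0 <= theta <= 1).
    """
    y_A = np.asarray(y_A, dtype=float)
    w_A = np.asarray(w_A, dtype=float)
    in_B = np.asarray(in_B, dtype=bool)
    y_B = np.asarray(y_B, dtype=float)
    w_B = np.asarray(w_B, dtype=float)
    total_a = np.sum(w_A[~in_B] * y_A[~in_B])    # domain a, from sample A
    total_ab_A = np.sum(w_A[in_B] * y_A[in_B])   # domain ab, from sample A
    total_ab_B = np.sum(w_B * y_B)               # domain ab, from sample B
    return total_a + theta * total_ab_A + (1.0 - theta) * total_ab_B

# Toy example with three in-person and two telephone interviews.
estimate = hartley_total(
    y_A=[1, 0, 1], w_A=[10, 10, 10], in_B=[False, True, True],
    y_B=[1, 1], w_B=[5, 5], theta=0.5,
)
```

With theta = 1 only the in-person sample is used for domain ab, and with theta = 0 only the telephone sample.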

Scale-load Estimator (Rao 1983)

It is very simple to compute, but it is highly influenced by the sample size.

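Pseudo-Maximum Likelihood (PML) Estimator (Skinner and Rao, 1996)

$$\hat{Y}_{PML}(\theta) = \frac{N_A - \hat{N}_{ab}^{PML}(\theta)}{\hat{N}_a^A}\,\hat{Y}_a^A + \frac{\hat{N}_{ab}^{PML}(\theta)}{\theta\,\hat{N}_{ab}^A + (1-\theta)\,\hat{N}_{ab}^B}\left[\theta\,\hat{Y}_{ab}^A + (1-\theta)\,\hat{Y}_{ab}^B\right],$$

where $\hat{N}_{ab}^{PML}(\theta)$ is the smallest root of the quadratic equation:

$$\left[\frac{\theta}{N_B} + \frac{1-\theta}{N_A}\right]x^2 - \left[1 + \theta\,\frac{\hat{N}_{ab}^A}{N_B} + (1-\theta)\,\frac{\hat{N}_{ab}^B}{N_A}\right]x + \theta\,\hat{N}_{ab}^A + (1-\theta)\,\hat{N}_{ab}^B = 0.$$

The optimal choice of $\theta$ requires estimating the variances of $\hat{N}_{ab}^A$ and $\hat{N}_{ab}^B$. We use the definition based on the sample sizes.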

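Additionally, a new weight variable can be defined, so that the estimator keeps the typical structure of a direct estimator used in Statistical Institutes:

$$\tilde{w}_j = \begin{cases} \dfrac{N_A - \hat{N}_{ab}^{PML}}{\hat{N}_a^A}\, w_j^A, & j \in a, \\[2ex] \dfrac{\hat{N}_{ab}^{PML}}{\theta\,\hat{N}_{ab}^A + (1-\theta)\,\hat{N}_{ab}^B}\,\theta\, w_j^A, & j \in ab \cap s_A, \\[2ex] \dfrac{\hat{N}_{ab}^{PML}}{\theta\,\hat{N}_{ab}^A + (1-\theta)\,\hat{N}_{ab}^B}\,(1-\theta)\, w_j^B, & j \in ab \cap s_B, \end{cases}$$

where $s_A$ and $s_B$ denote the samples drawn from frames A and B, so that $\hat{Y}_{PML} = \sum_j \tilde{w}_j\, y_j$.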
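The PML computation with modified weights can be sketched as follows, assuming the usual Skinner and Rao form with a mixing parameter theta for two frames where B is nested in A. All function and argument names are hypothetical, theta is supplied by the user (the sample-size-based choice used in this work is not reproduced here), and the toy frame sizes are made up.

```python
import numpy as np

def pml_overlap_size(N_A, N_B, Nab_A, Nab_B, theta):
    """Smallest root of the PML quadratic for the size of domain ab."""
    a = theta / N_B + (1.0 - theta) / N_A
    b = -(1.0 + theta * Nab_A / N_B + (1.0 - theta) * Nab_B / N_A)
    c = theta * Nab_A + (1.0 - theta) * Nab_B
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("the quadratic has no real root")
    return (-b - np.sqrt(disc)) / (2.0 * a)

def pml_weights(w_A, in_B, w_B, N_A, N_B, theta):
    """Modified weights so that sum(weights * y) gives the PML estimate."""
    w_A = np.asarray(w_A, dtype=float)
    in_B = np.asarray(in_B, dtype=bool)
    w_B = np.asarray(w_B, dtype=float)
    Na_A = w_A[~in_B].sum()   # estimated size of domain a (sample A)
    Nab_A = w_A[in_B].sum()   # estimated size of domain ab (sample A)
    Nab_B = w_B.sum()         # estimated size of domain ab (sample B)
    Nab = pml_overlap_size(N_A, N_B, Nab_A, Nab_B, theta)
    scale_a = (N_A - Nab) / Na_A
    scale_ab = Nab / (theta * Nab_A + (1.0 - theta) * Nab_B)
    new_w_A = np.where(in_B, scale_ab * theta * w_A, scale_a * w_A)
    new_w_B = scale_ab * (1.0 - theta) * w_B
    return new_w_A, new_w_B

# Toy example: 10 in-person interviews (2 in domain a) and 4 telephone interviews.
w_A = np.array([110.0, 110.0] + [97.5] * 8)
in_B = np.array([False, False] + [True] * 8)
w_B = np.full(4, 200.0)
new_w_A, new_w_B = pml_weights(w_A, in_B, w_B, N_A=1000, N_B=800, theta=0.5)
```

By construction the modified weights add up to N_A, so the estimate of a total is simply the sum of the modified weights times the observed values over both samples.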

The PML estimator remains unbiased and fairly stable. As p4 decreases (the variable becomes more independent of the phone), the RMSEs of the direct and PML estimators become more similar. When the percentage of in-person interviews is very small (less than 20%), the variance of the PML estimator increases, and the increase is more serious the smaller p4 is.

In the real implementation on the 2010 ICT-H Canary survey, the frames are:

A: Continuous Census (all households).

B: Phone directory (not all households with fixed telephone).

The domain a, units of A that are not in B, consists of the households without fixed telephone plus the households with fixed telephone that are not included in the phone directory. Because the variable “is in phone directory?” is not collected in this survey, we use an estimated proportion, p, of households with fixed telephone that are not included in the phone directory, and we denote the resulting estimator by PML(p%).

| Variable (%) | ICT-H-INE 2010 | ICT-HC-ISTAC 2010, Direct | ICT-HC-ISTAC 2010, PML(20%) |
|---|---|---|---|
| Households with computer | 67.0 | 76.7 | 68.6 |
| Desktop computer | 51.0 | 55.5 | 47.0 |
| Portable computer | 37.6 | 44.2 | 40.2 |
| Households with access to internet | 58.4 | 70.4 | 60.0 |
| Households with fixed telephone | 74.7 | 89.1 | 70.6 |
| Households with mobile telephone | 94.2 | 92.3 | 92.2 |
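As a reading aid, the differences in percentage points between each ISTAC estimate and the INE figure can be computed directly from the values above. The variable names are hypothetical, and the INE figures are themselves survey estimates, not true values.

```python
# (INE 2010, ISTAC 2010 direct, ISTAC 2010 PML(20%)), in percent.
estimates = {
    "Households with computer": (67.0, 76.7, 68.6),
    "Desktop computer": (51.0, 55.5, 47.0),
    "Portable computer": (37.6, 44.2, 40.2),
    "Households with access to internet": (58.4, 70.4, 60.0),
    "Households with fixed telephone": (74.7, 89.1, 70.6),
    "Households with mobile telephone": (94.2, 92.3, 92.2),
}

# Difference with respect to INE, in percentage points.
differences = {
    name: (round(direct - ine, 1), round(pml - ine, 1))
    for name, (ine, direct, pml) in estimates.items()
}

for name, (d_direct, d_pml) in differences.items():
    print(f"{name:36s} direct - INE: {d_direct:+5.1f}   PML(20%) - INE: {d_pml:+5.1f}")
```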