Methods Festival Oxford - July 2006
Things that go wrong in comparative surveys – evidence from ESS
Jaak Billiet, CCT of ESS, K.U. Leuven
Methods Festival June 2006 Oxford
Outline
- Introduction: a conceptual frame for assessing data quality in a methodologically well-prepared cross-national survey (ESS)
- Selection of two aspects: the non-response problem and measurement equivalence
- Evaluation of non-response bias
- Evaluation of equivalence of measures
- Conclusion: what can be done?
Introduction: conceptual frame
An attempt to combine two approaches: the Total Survey Error approach (TSE) and the Total Quality Management approach (TQM) (Loosveldt, Carton & Billiet, IJMR 2004)
TSE: data quality is the absence of variable and systematic error; often less attention is paid to non-sampling error
TQM: all components of the production process contribute to the quality of the end product
Introduction: conceptual frame
Here the focus is on data collection by means of face-to-face interviews
The conceptual frame combines:
- interviewer tasks (contacting and obtaining co-operation; the interview in the narrow sense)
- the result on completion (obtained sample; registered responses)
- the kind of evaluation: process evaluation or output evaluation
Conceptual frame
Interviewer tasks: (a) to contact respondents and to obtain their co-operation; (b) the interview in the narrow sense
Result on completion: (a) obtained sample; (b) registered responses
Process evaluation:
- (a) training and preparation; evaluation of the contacting procedure; follow-up and feedback
- (b) training and preparation; evaluation of interviewer behaviour; follow-up and feedback
Output evaluation (the focus now):
- (a) evaluation of non-response error
- (b) evaluation of measurement errors
Selection of two aspects
Evaluation of non-response bias in ESS: several possibilities
(1) comparing the sample with population statistics
(2) comparing respondents with non-respondents using additional information about non-respondents
(3) comparing co-operative respondents with reluctant respondents using information from the contact forms…
Example 1: traces of non-response bias found with (3)
Selection of two aspects
Evaluation of measurement invariance in ESS: several possibilities
- comparing distributions and looking for outliers…
- looking for response sets
- testing measurement models of latent variables and evaluating levels of factorial invariance
Example 2: two translation problems detected
I. Evaluation of non-response bias
Many rules and actions in ESS aim at high and comparable response rates (target = 70%)
Result: response rates vary between 35% and 80% in Round 1 and between 43% and 79% in Round 2
Example: Round 1. The target was not reached; however, response rates were much higher than in other surveys in most countries
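As a rough illustration of how a country's response rate is checked against the 70% target, the sketch below assumes a simplified definition: completed interviews divided by the issued sample minus ineligible units. The counts are hypothetical, and the ESS's own outcome codes are more detailed than this.

```python
def response_rate(interviews: int, issued: int, ineligible: int) -> float:
    """Simplified response rate: completed interviews / eligible sample units."""
    eligible = issued - ineligible
    if eligible <= 0:
        raise ValueError("no eligible sample units")
    return interviews / eligible


TARGET = 0.70  # ESS target response rate

# Hypothetical country: 3000 issued units, 150 ineligible, 1710 completed interviews.
rr = response_rate(interviews=1710, issued=3000, ineligible=150)
print(f"response rate = {rr:.1%}, target reached: {rr >= TARGET}")
```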
Figure 1. Response rates (target 70%) [chart not reproduced; vertical axis from 0 to 80]
When is there non-response bias in comparative surveys?
- None if there is no bias in the separate countries
- None if the respondent/non-respondent difference is the same in the separate countries and non-response rates are equal
- Yes if the respondent/non-respondent difference is the same in the separate countries but non-response rates are very different
- Yes if the bias differs between the separate countries
ESS: bias is expected because of very different response rates and traces of differences in bias…
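The four cases follow from the standard expression for the bias of a respondent mean, bias = (m/n)(ȳ_r − ȳ_m), where m/n is the non-response rate and ȳ_r − ȳ_m the difference between respondents and non-respondents. A minimal sketch with hypothetical numbers (not ESS results): two countries with the same respondent/non-respondent difference but different response rates get different biases, so the comparison between them is biased.

```python
def nonresponse_bias(nr_rate: float, mean_resp: float, mean_nonresp: float) -> float:
    """Bias of the respondent mean: (m/n) * (ybar_r - ybar_m)."""
    return nr_rate * (mean_resp - mean_nonresp)


# Hypothetical countries with the same respondent/non-respondent difference (0.5),
# but response rates of 80% (A) and 40% (B).
bias_a = nonresponse_bias(0.2, 5.5, 5.0)
bias_b = nonresponse_bias(0.6, 5.5, 5.0)

# The estimated difference A - B is off by bias_a - bias_b.
print(f"bias A = {bias_a:.2f}, bias B = {bias_b:.2f}, bias in A - B = {bias_a - bias_b:.2f}")
```

With equal non-response rates the two biases cancel in the comparison; with unequal rates they do not.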
non-response bias…
ESS is not worse than other surveys, and a lot of effort goes into estimating nr-bias
How?
One example of data-quality assessment in ESS: comparing co-operative and reluctant respondents
Reluctant respondents = initial refusals who were converted into respondents
Data = contact files and main data files
Contact form:
- time of each contact attempt (CA) (up to 10)
- mode of each CA
- outcome of each CA
- kind of non-response at each CA
- reason for refusal
- estimated strength of refusal → basis for conversion and for the distinction between soft and hard refusals
- information about ineligibles
- information about the respondent (age, gender, housing form)
non-response bias…
Methodological assumption: reluctant respondents are informative about final refusals (there is some evidence for this from successive waves of mail surveys…)
How to detect bias? By linking the contact forms of co-operative and reluctant respondents with the main data file
Analysis of countries with a substantial number of reluctant respondents (NL and DE more than 450 reluctant respondents; GB, AT and CH more than 115)
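A minimal sketch of the linking step, assuming each contact form is a list of attempt outcomes keyed by a respondent id shared with the main data file. The outcome codes, the id values and the `respondent_type` helper are hypothetical; the rule applied is the one stated above: a reluctant respondent is an initial refusal who was later interviewed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ContactAttempt:
    outcome: str  # hypothetical outcome codes: "interview", "refusal", "no contact", ...


def respondent_type(attempts: List[ContactAttempt]) -> Optional[str]:
    """'Co' = interviewed without an earlier refusal, 'Re' = interviewed after a refusal
    (a converted refusal), None = never interviewed."""
    outcomes = [a.outcome for a in attempts]
    if "interview" not in outcomes:
        return None
    first_interview = outcomes.index("interview")
    return "Re" if "refusal" in outcomes[:first_interview] else "Co"


# Hypothetical contact forms and main-data rows, keyed by the same respondent id.
contact_forms: Dict[int, List[ContactAttempt]] = {
    101: [ContactAttempt("no contact"), ContactAttempt("interview")],
    102: [ContactAttempt("refusal"), ContactAttempt("no contact"), ContactAttempt("interview")],
    103: [ContactAttempt("refusal"), ContactAttempt("refusal")],
}
main_data = {101: {"ethnic_threat": 5.0}, 102: {"ethnic_threat": 6.0}}

# Link: every interviewed case in the main data file gets its Co/Re type.
linked = {
    idno: {**row, "type": respondent_type(contact_forms[idno])}
    for idno, row in main_data.items()
    if idno in contact_forms
}
print(linked)
```

Case 103 (a final refusal) has no row in the main data file, which is exactly why the method has to rely on the reluctant respondents as stand-ins.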
non-response bias…
Look for differences between co-operative (Co) and reluctant (Re) respondents on socio-demographics (education, age, urban environment…)
Look for differences between Co and Re on relevant attitudinal variables (latent multiple-indicator variables: political trust, political participation, ethnic threat…)
Multiple regression with the attitudinal variable as dependent variable, and type of respondent (Re/Co) plus all relevant control variables as predictors (do we find a net effect of Co/Re on the attitude?); see figure
If the differences are significant: a trace of bias
Steps in the analysis of nr-bias
1. Simple regression: type of respondent → attitudinal variable
2. Multiple regression: type of respondent + socio-demographics → attitudinal variable
Evaluate the change between (1) and (2)
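The two steps can be sketched with ordinary least squares on simulated data (not ESS data). The simulation assumes that reluctance is more common among the less educated and that the attitude depends on education only, so any effect of being reluctant in Model 1 is compositional and should shrink once education is controlled in Model 2. The code reports unstandardized coefficients; multiplying by sd(x)/sd(y) would give β-coefficients like those in Table 2.

```python
import numpy as np


def ols(X, y):
    """OLS with an intercept; returns coefficients and their t-values."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se


# Simulated data: lower education -> more often reluctant; no direct effect of reluctance.
rng = np.random.default_rng(42)
n = 20_000
educ = rng.normal(size=n)
reluctant = (rng.random(n) < 1 / (1 + np.exp(1.5 + 0.8 * educ))).astype(float)
attitude = 0.5 * educ + rng.normal(size=n)

b1, t1 = ols(reluctant, attitude)                            # Model 1: type only
b2, t2 = ols(np.column_stack([reluctant, educ]), attitude)   # Model 2: type + control
print(f"Model 1: b = {b1[1]:.3f} (t = {t1[1]:.2f})")
print(f"Model 2: b = {b2[1]:.3f} (t = {t2[1]:.2f})")
```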
Some traces of nr-bias in attitudes
Most differences are in the expected direction (exceptions: CH in both cases, and GB for political participation)
Table 1. Means (10-point scales) and t-tests for the difference between co-operative (Co) and reluctant (Re) respondents in ESS R1, for political participation and ethnic threat.
Country | Political participation: Co / Re | t; p | Ethnic threat: Co / Re | t; p
AT | 5.81 / 6.29 | t = -2.63; p < 0.01 | 5.04 / 4.81 | t = 1.33; p = 0.18
GB | 5.28 / 5.21 | t = 0.38; p = 0.70 | 5.30 / 5.74 | t = -3.00; p < 0.01
CH | 5.82 / 5.95 | t = -0.81; p = 0.42 | 5.29 / 5.54 | t = -2.04; p = 0.04
DE | 5.60 / 5.29 | t = 3.20; p < 0.01 | 5.57 / 5.81 | t = -3.02; p < 0.01
NL | 5.89 / 5.66 | t = 2.69; p < 0.01 | 5.00 / 5.28 | t = -3.51; p < 0.001
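The slide does not say which two-sample t-test was used; as an assumption, the sketch below uses Welch's version (no equal-variance assumption) with the sign convention of the table, t = (mean Co − mean Re)/SE. The p-value uses a normal approximation, which is only reasonable for the large group sizes in the analysed countries; the toy data merely check the arithmetic.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic (mean(a) - mean(b)), Welch-Satterthwaite df, and a
    two-sided p-value from the normal approximation (large samples only)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p_normal = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_normal


# Toy scores (not ESS data) for co-operative and reluctant respondents.
coop = [1, 2, 3, 4, 5]
reluct = [2, 3, 4, 5, 6]
t, df, p = welch_t(coop, reluct)
print(f"t = {t:.2f}, df = {df:.1f}, p (normal approx.) = {p:.3f}")
```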
Some traces of nr-bias
Roughly the same pattern (but smaller differences) for interest in politics and trust in politics, though not in all countries
Participation in voting: differences not all in the same direction
Differences in socio-demographics: not all in the same direction!
Next step: simple and multiple regression of attitudes on type of respondent
Example: political participation and ethnic threat in DE and NL
Parameters in the simple (Model 1) and multiple (Model 2) regressions
Some traces of nr-bias
Table 2. Standardized regression coefficients (β-coefficients) and t-values for reluctant respondents (versus co-operative respondents) in simple (Model 1) and multiple regression models (Model 2) in Germany and the Netherlands.
Explained variable | Type of respondent (ref: co-operative) | DE β | DE t-value | NL β | NL t-value
Political participation | Reluctant, Model 1 | -0.060** | -3.20 | -0.056** | -2.69
Political participation | Reluctant, Model 2 | -0.047** | -2.56 | -0.039 | -1.91
Ethnic threat | Reluctant, Model 1 | 0.056** | 3.02 | 0.072*** | 3.51
Ethnic threat | Reluctant, Model 2 | 0.030 | 1.70 | 0.047* | 2.34
Model 2 always gives a smaller effect, but whether the effect becomes non-significant depends on the country and the dependent variable
Nr-bias: evaluation
Correctly recording contacts is a hard and more expensive task, which meets resistance from survey organisations (errors in some countries)
Refusal conversion: problems with privacy regulations; not done in every country (more successful, and in more countries, in R2)
In the countries that were used (5 in R1), the differences do not all point in the same direction
Therefore it is not possible to apply ‘corrections’ for every country, so the approach is not useful in comparative analysis unless…
It relies on a rather weak hypothesis: reluctant respondents ≈ refusals
II. Evaluation of measurement equivalence
Focus on sets of indicators for latent variables, i.e. evaluation of indicators within the context of the construct
Full equivalence (see figure and rule):
- invariance of corresponding slopes (factor loadings) over all countries (metric invariance)
- invariance of corresponding intercepts of the indicators on the latent variable over all countries (scalar invariance)
- invariance of error terms (residuals)
Weaker forms of equivalence when hypotheses on equality constraints are rejected: only the pattern of relations between concept and indicators is invariant
Valid measurement in a comparative setting
Idea of a causal relationship* between the latent variable (LV) and four observed indicators (OV), and between the measured latent variable and the theoretical (intended) concept (TC)
[Figure: LV → ov1, ov2, ov3, ov4, each indicator with a random error term e; LV linked to TC; a method factor M whose effect is assumed to be zero. Labels: measurement validity; theoretical validity?; random error; validity parameter (= equal for all disciplines?)]
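Formal notation of MG comparison
The relationship between item I_j and latent trait W in group (country) g is denoted as follows:
$$I_j^{(g)} = f\left(\beta_{0,j}^{(g)} + \beta_{1,j}^{(g)}\, W\right) \qquad (1)$$
where $I_j^{(g)}$ is the observed indicator (j = number of the indicator), f is the link function (e.g. linear, as in OLS regression), $\beta_{0,j}^{(g)}$ is the intercept term, $\beta_{1,j}^{(g)}$ is the slope parameter and W is the latent trait.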
Kinds of equivalence
Construct equivalence is defined as follows.
Metric invariance = equal slopes:
$$\beta_{1,j}^{(g)} = \beta_{1,j}^{(h)}, \quad j = 1, \dots, J;\; g, h = 1, \dots, G;\; g \neq h \qquad (2)$$
For regression coefficients (with the observed variable as dependent variable) to be validly comparable, equality of slopes is required
Kinds of equivalence
If one wants to compare means of the latent variable and of the indicators, there is an additional requirement, scalar invariance, i.e. equality of intercepts:
$$\beta_{0,j}^{(g)} = \beta_{0,j}^{(h)}, \quad j = 1, \dots, J;\; g, h = 1, \dots, G;\; g \neq h \qquad (3)$$
An item is measurement invariant across groups if restrictions (2) and (3) hold
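Why scalar invariance matters for mean comparisons can be shown with a worked example. Assuming a linear (identity) link in the item equation, the model-implied item mean in a group is intercept + slope × latent mean. The sketch uses hypothetical values: two countries with equal slopes and equal latent means but different intercepts still show different item means.

```python
def expected_item_mean(intercept: float, slope: float, latent_mean: float) -> float:
    """Model-implied mean of one indicator in one group, assuming an identity link."""
    return intercept + slope * latent_mean


# Hypothetical two-country example: slopes equal (metric invariance holds),
# latent means equal, intercepts different (scalar invariance violated).
slope = 0.8
latent_mean = 0.5
mean_g = expected_item_mean(intercept=2.0, slope=slope, latent_mean=latent_mean)
mean_h = expected_item_mean(intercept=2.3, slope=slope, latent_mean=latent_mean)

# The whole observed difference comes from the intercepts, not from the latent trait.
print(f"item mean g = {mean_g:.2f}, item mean h = {mean_h:.2f}, difference = {mean_h - mean_g:.2f}")
```

Comparing raw item means here would wrongly suggest a difference in the latent trait; only with equal intercepts does a difference in item means reflect a difference in latent means.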
What if the slopes/intercepts of some groups are not invariant?
If the test of equal slopes/intercepts shows that the parameters are not invariant for some items, there are several options:
- exclude groups
- remove items (and try to find out why: our examples)
- resort to partial factorial invariance (free estimation of the non-invariant factor loadings)
- conclude that the construct has a different meaning and rely on a weaker form of equivalence
An example of equivalent measurement: willingness to allow immigrants
Six items (D4-D9) in 21 countries
Tests by Multi-Group Structural Equation Modelling (MGSEM) and by a Proportional Odds Model (very strict)
(Welkenhuysen-Gybels & Billiet, 2002)
Here: only MGSEM
“To what extent do you think [country] should allow people xxxxxx to come and live here?” (4-point scale: many = 1, some = 2, a few = 3, none = 4)
d4: “…of a different race”
d5: “…of the same race…”
d6: “…from the richer countries in Europe…”
d7: “…from the poorer countries in Europe…”
d8: “…from the richer countries outside Europe…”
d9: “…from the poorer countries outside Europe…”
Model specifications
• Start from factorial invariance:
o 21 groups
o Equal factor loadings over all groups
o Equal intercepts over all groups
o Equal residual variances over all groups
o Variances of the latent variable not constrained to be equal over groups
o Latent mean κ = 0 in group 1, free in all other groups
Test information
Model modification | χ² | df | RMSEA | P (close fit) | NFI
M0 factorial invariance | 951.89 | 508 | 0.022 | 1 | 1
M1 free τ1(hu) | 867.01 | 507 | 0.020 | 1 | 1
M2 free λ1(hu) | 847.47 | 506 | 0.019 | 1 | 1
M3 free τ1(dk) | 827.29 | 505 | 0.019 | 1 | 1
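Each modification frees one parameter, so successive models can be compared with a chi-square difference test on 1 df. The sketch reads the table as χ² = 951.89 on 508 df (M0), 867.01 on 507 (M1), 847.47 on 506 (M2) and 827.29 on 505 (M3), and assumes plain maximum-likelihood chi-squares; for 4-point ordinal items that is itself an approximation. With samples this large, almost any freed parameter is significant, which is why the slide also reports RMSEA and the close-fit test.

```python
import math

# (model, chi-square, df) as read from the test-information table.
models = [
    ("M0 factorial invariance", 951.89, 508),
    ("M1 free tau1(hu)", 867.01, 507),
    ("M2 free lambda1(hu)", 847.47, 506),
    ("M3 free tau1(dk)", 827.29, 505),
]


def chi2_sf_1df(x: float) -> float:
    """P(chi-square with 1 df > x); exact, since chi2(1) = Z**2 and P(|Z| > sqrt(x)) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2))


results = []
for (name0, chi0, df0), (name1, chi1, df1) in zip(models, models[1:]):
    delta_chi2, delta_df = chi0 - chi1, df0 - df1
    assert delta_df == 1, "each step frees exactly one parameter"
    p = chi2_sf_1df(delta_chi2)
    results.append((name1, delta_chi2, p))
    print(f"{name0} -> {name1}: delta chi2 = {delta_chi2:.2f}, p = {p:.1e}")
```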
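The quasi-equivalent measurement model
Intercepts τ (t-value) and standardised loadings λ (t-value); all = parameter common to the countries, hu / dk = parameter estimated separately for Hungary / Denmark
Item | τ all | τ hu | τ dk | Standardised λ all | Standardised λ hu
d4 | 0.16 (8.38) | -0.25 (-8.77) | -0.07 (-2.71) | -0.83 | -0.59 (-26.32)
d5 | 0.15 (6.91) | | | -0.91 (-132.76) |
d6 | 0.11 (6.69) | | | -0.72 (-143.81) |
d7 | 0.15 (6.86) | | | -0.94 (-87.77) |
d8 | 0.12 (6.77) | | | -0.77 (-146.43) |
d9 | 0.15 (6.94) | | | -0.94 (-90.14) |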
(Coefficients are negative because the item scores are not reversed: many = 1 … none = 4)
(Intercepts do not differ much between countries)
Things that go wrong…
« Lost in translation »
Good results for the D4-D9 set, but not for all sets of items
Two examples of problematic translation
Possible to detect by MGSEM?
Where?
- in France: « asylum » items
- in DK: « ethnic threat » items
Problems were discovered outside the standard MGSEM tests because FR and DK were not in the release, but they were checked ad hoc (all details skipped)
Example 1: « …generous in judging people's applications for refugee status »
How was it detected? In a comparison between Belgium and its « neighbours », a very large deviation in FR on one item
D51 The government should be generous in judging people's applications for refugee status (%)
Country | Agree (strongly) | Neither agree nor disagree | Disagree (strongly) | N
Belgium | 18.0 / 17.9 | 22.8 / 22.6 | 59.2 / 59.5 | 1852
Germany | 15.8 / 15.9 | 24.8 / 24.7 | 59.4 / 59.4 | 2873
France | 62.9 / 63.1 | 18.8 / 18.6 | 18.3 / 18.3 | 1470
United Kingdom | 27.2 / 27.2 | 26.1 / 25.7 | 46.8 / 47.1 | 2028
Luxembourg | 31.3 / 31.0 | 23.5 / 24.3 | 45.2 / 44.7 | 1416
Netherlands | 10.0 / 10.0 | 14.9 / 15.2 | 75.2 / 74.8 | 2336
First value weighted by gender and age; second value weighted by gender, age and education
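The French deviation was spotted by comparing distributions. One way to make such a check systematic, offered here as an assumption rather than the procedure used in the ESS, is a robust outlier rule: flag a country whose share deviates from the median by more than three scaled median absolute deviations (MAD). The percentages are the "Agree (strongly)" values weighted by gender and age from the table.

```python
from statistics import median

# "Agree (strongly)" percentages for D51, weighted by gender and age.
agree = {
    "Belgium": 18.0, "Germany": 15.8, "France": 62.9,
    "United Kingdom": 27.2, "Luxembourg": 31.3, "Netherlands": 10.0,
}


def robust_outliers(values: dict, threshold: float = 3.0) -> list:
    """Flag entries whose distance from the median exceeds `threshold` scaled MADs."""
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation under normality
    return [k for k, v in values.items() if abs(v - med) / scale > threshold]


print(robust_outliers(agree))
```

The median and MAD are used instead of the mean and standard deviation because, with only six countries, a single extreme value such as France would otherwise inflate the yardstick it is measured against.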
CARD 40: Some people come to this country and apply for refugee status on the grounds [1] that they fear persecution in their own country. Using this card, please say how much you agree or disagree with the following statements.
“The government should be generous [2] in judging people’s applications for refugee status”
[1] “On the grounds”: in the sense of both ‘because’ and ‘stating that’
[2] “Generous”: ‘liberal’.
In Luxembourg, Genève (CH) and Wallonia (BE): “Le gouvernement devrait être généreux en traitant les demandes du statut de réfugié” (“The government should be generous in handling applications for refugee status”)
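Translation of D51 from the source language (English)
Wallonia, Switzerland, Luxembourg:
CARTE 40: Des gens viennent en [country] et demandent un statut de réfugié car ils se sentent persécutés dans leur propre pays. A l’aide de cette carte, dites-moi s’il vous plaît dans quelle mesure vous êtes d’accord ou en désaccord avec les propositions suivantes… “Le gouvernement devrait être généreux en traitant les demandes du statut de réfugié”
(“Some people come to [country] and apply for refugee status because they feel persecuted in their own country. Using this card, please tell me to what extent you agree or disagree with the following statements… ‘The government should be generous in handling applications for refugee status’”)
Luxembourg: …en accordant le statut de réfugié (…in granting refugee status)
Switzerland: …en traitant d’un statut de réfugié (…in dealing with a refugee status)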
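France:
Certaines personnes arrivent en France et demandent le statut de réfugié parce qu’elles craignent des persécutions dans leur pays (“Some people arrive in France and apply for refugee status because they fear persecution in their country”)
Enquêteur montre liste 49 (tout à fait d’accord etc.) (“Interviewer shows list 49 (strongly agree etc.)”)
“Les pouvoirs publics devraient se montrer plus ouverts dans l’examen de ces demandes” (“The public authorities should be more open in examining these applications”)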
Three differences in one small statement
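“Le gouvernement devrait être généreux en traitant les demandes du statut de réfugié” (The government should be generous in handling applications for refugee status)
“Les pouvoirs publics devraient se montrer plus ouverts dans l’examen de ces demandes” (The public authorities should be more open in examining these applications)
Differences:
- “gouvernement” = the government, whereas “pouvoirs publics” = the administration
- “se montrer plus ouverts” (be more open) is more acceptable than the extreme “devrait être généreux” (should be generous)
- « ces demandes » (these applications) makes no direct reference to « statut de réfugié »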
Example 2: D24 in Denmark
Source questionnaire
Questions D23 and D24 on ‘serious crime’ and ‘any crime’
D23 If people who have come to live here commit a serious crime, they should [1] be made to leave
D24 If people who have come to live here commit any crime, they should [2] be made to leave
(five-point scales: 1 = completely disagree … 5 = completely agree)
[1] “Should” in D23 and D24 has the sense of ‘must’.
[2] “Should” in D23 and D24 has the sense of ‘must’.
Two examples: item D24 in Denmark
A contrast (backfire) effect can be expected between these items
Mean approval of D23 for all countries = 79%
Mean approval of D24 for all countries = 51%
How was the problem detected?
A strange result for DK in the report of the EUMCR (Scheepers et al. 2004): the two items were combined for all countries (PCA scores and % support reported; support for “made to leave” = % scoring higher than the mean (0))
An unlikely figure
Mean score and percentage support on “favour repatriation policies for criminal immigrants” (ESS R1) (selection of countries, in correct order)
Country | Mean (PCA D23-D24) | % support
HU | 0.867 | 91.9%
FR | 0.834 | 87.3%
E-DE | 0.770 | 83.9%
PT | 0.756 | 83.2%
… | … | …
W-DE | 0.715 | 75.2%
… | … | …
SW | 0.577 | 49.3%
LU | 0.558 | 46.3%
DK | 0.540 | 43.8%
Mean all countries | 0.792 | 50.8%
Arguments:
- D23 (serious crime): 77.2% of the Danish want immigrants to leave, higher than AT, BE, ES, FI, FR, IE, IL, LU, NL, NO, SE and about the EU average (79%) (pweight); certainly not the lowest, as was reported
- D24 (any crime): only 12.9% of the Danish want immigrants to leave, the lowest of all and much lower than the European average of 51% (pweight)
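The argument can be checked with the percentages quoted above: almost all of the Danish deviation from the European average comes from D24. The sketch uses a simple unweighted average of the two items as a stand-in for a combined score; it is not the PCA score of the report, but it shows how a two-item index hides which item drives the result.

```python
# Percentages quoted on the slide: share wanting immigrants "made to leave".
dk = {"D23 serious crime": 77.2, "D24 any crime": 12.9}
europe = {"D23 serious crime": 79.0, "D24 any crime": 51.0}

gaps = {item: dk[item] - europe[item] for item in dk}
# Simple unweighted two-item average (an illustrative stand-in, not the report's PCA).
combined_gap = sum(gaps.values()) / len(gaps)

for item, gap in gaps.items():
    print(f"{item}: DK minus Europe = {gap:+.1f} points")
print(f"two-item average: {combined_gap:+.1f} points")
```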
What happened?
D24 in Denmark
Lost in translation
The contrast between D23 and D24 is much larger in DK because of a different translation of ‘crime’ in D24
D.23 Hvis mennesker, der er kommet for at bo her, begår en alvorlig forbrydelse, skal de udvises af landet (If people who have come to live here commit a serious crime, they must be expelled from the country)
D.24 Hvis mennesker, der er kommet for at bo her, begår nogen som helst form for lovovertrædelse, skal de udvises af landet (If people who have come to live here commit any form whatsoever of law violation, they must be expelled from the country)
Forbrydelse = “crime”, used in NO and SW for D23 and D24; alvorlig forbrydelse = serious crime
Lovovertrædelse = “any kind of law violation, associated with minor crime (violation of traffic rules included)”, used in D24 in DK; nogen som helst form for lovovertrædelse = any form of minor crime
Is this detectable in tests of Factorial Invariance? YES
Example of MGSEM: D51 in FR; measurement model for D49-D55 in FR, Genève, Wallonia and Luxembourg (four groups)
D49 [Country] has more than its fair share of people applying for refugee status (-)
D50 People applying for refugee status allowed to work while their cases are considered (+)
D51 Government should be generous in judging applications for refugee status (+)
D52 Most refugee applicants don’t fear persecution in their own countries (-)
D53 Refugee applicants kept in detention centres while their cases are considered (-)
D54 Financial support to refugee applicants while their cases are considered (+)
D55 Granted refugees should be entitled to bring close family members (+)
The set of items is balanced (4 positive, 3 negative), which allows an MGSEM model with a substantive factor and a style factor (acquiescence) (see Billiet & McClendon, SEM 2000) (see model)
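Why the balance of positive and negative items matters can be seen in a deliberately simplified, error-free response model, assumed here for illustration only (it is not the Billiet & McClendon MGSEM model): each answer is sign × content + style, where style is the acquiescent tendency to agree regardless of wording. With 4 positive and 3 negative items, the mean after reversing the negative items still contains one seventh of the style component, and the raw mean contains one seventh of the content; a separate style factor avoids relying on this near-cancellation.

```python
# Hypothetical, error-free response model for one respondent:
#   x_j = sign_j * content + style
# sign_j = +1 for positively worded items, -1 for negatively worded ones.
SIGNS = {"D49": -1, "D50": +1, "D51": +1, "D52": -1, "D53": -1, "D54": +1, "D55": +1}


def responses(content: float, style: float) -> dict:
    return {item: s * content + style for item, s in SIGNS.items()}


def content_index(x: dict) -> float:
    """Mean after reversing the negatively worded items."""
    return sum(SIGNS[i] * v for i, v in x.items()) / len(x)


def style_index(x: dict) -> float:
    """Mean of the raw (unreversed) answers, a crude acquiescence indicator."""
    return sum(x.values()) / len(x)


x = responses(content=1.0, style=0.5)
print(content_index(x), style_index(x))  # content + style/7, style + content/7
```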
The MGSEM measurement model
Detection of problematic item
Model | χ² | df | RMSEA | P (close fit) | CAIC
M0: basic model invariant (A) | 1,842.78 | 92 | 0.133 | 0.000 | 2,291.57
M0: basic model invariant (A+S) | 1,304.60 | 84 | 0.116 | 0.000 | 1,829.35
M1: free FR3 | 860.92 | 83 | 0.093 | 0.000 | 1,395.04
M2: free LU5 | 738.80 | 82 | 0.086 | 0.000 | 1,282.29
(A = asylum factor, S = style factor)
Aim: detect a problematic item very early in the test procedure for finding an adequate model
Problematic item no. 3 (D51) in France is detected directly in the model with Asylum + Style (method effect)
Item 5 for LU also needs inspection (to be done)
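The sequence in the table can be summarised with two simple checks, sketched here under the usual assumptions for nested models: the chi-square drop at each step, compared with the 5% critical value for the number of freed parameters, and the CAIC, where a lower value indicates a better balance of fit and parsimony. All numbers are taken from the table; the critical values are the standard chi-square quantiles for 1 and 8 df.

```python
# (model, chi-square, df, CAIC) from the detection table.
models = [
    ("M0 (A)", 1842.78, 92, 2291.57),
    ("M0 (A+S)", 1304.60, 84, 1829.35),
    ("M1: free FR3", 860.92, 83, 1395.04),
    ("M2: free LU5", 738.80, 82, 1282.29),
]
# 5% critical values of the chi-square distribution for the df differences that occur here.
CRITICAL_05 = {1: 3.841, 8: 15.507}

steps = []
for (name0, chi0, df0, _), (name1, chi1, df1, _) in zip(models, models[1:]):
    d_chi2, d_df = chi0 - chi1, df0 - df1
    significant = d_chi2 > CRITICAL_05[d_df]
    steps.append((name1, d_chi2, d_df, significant))
    print(f"{name0} -> {name1}: delta chi2 = {d_chi2:.2f} on {d_df} df, significant: {significant}")

best = min(models, key=lambda m: m[3])  # lowest CAIC = preferred model
print("lowest CAIC:", best[0])
```

Adding the style factor costs 8 df but removes most of the misfit, and freeing FR3 alone gives the largest single-parameter improvement, consistent with D51 in France being the problematic item.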
Equivalent measurement: evaluation
In spite of many efforts to ensure correct translation, some problems remain
Translation problems are detectable by:
- comparing distributions (finding ‘strange’ outliers)
- MGSEM tests (a non-invariant parameter, to be detected at an early stage of testing)
- discussion with native speakers, translators…
Other sources of non-equivalence…