
Methods Festival Oxford - July 2006

1

Things that go wrong in comparative surveys – evidence from ESS

Jaak Billiet, CCT of ESS, K.U. Leuven

Methods Festival June 2006 Oxford


2

Outline

- Introduction: conceptual frame for assessing data quality in a methodologically well-prepared cross-national survey (ESS)
- Selection of two aspects: the non-response problem and measurement equivalence
- Evaluation of non-response bias
- Evaluation of equivalence of measures
- Conclusion: what can be done?


3

Introduction: conceptual frame

attempt to combine two approaches: Total Survey Error approach (TSE) & Total Quality Management approach (TQM) (Loosveldt, Carton & Billiet, IJMR 2004)

TSE: data quality is the absence of variable and systematic error; non-sampling error often receives less attention

TQM: all components of the production process contribute to the quality of the end product


4

Introduction: conceptual frame

Here the focus is on data collection by means of face-to-face interviews

The conceptual frame is a combination of
- interviewer tasks (contacting and obtaining co-operation / the interview in the narrow sense)
- result on completion (obtained sample / responses)
- kind of evaluation: process / output evaluation

5

Conceptual frame

Interviewer task A: to contact respondents and to obtain their co-operation
- Result on completion: obtained sample
- Process evaluation: training and preparation; evaluation of contacting procedure; follow-up and feedback
- Output evaluation: evaluation of non-response error

Interviewer task B: the interview in the narrow sense
- Result on completion: registered responses
- Process evaluation: training and preparation; evaluation of interviewer behaviour; follow-up and feedback
- Output evaluation: evaluation of measurement errors

The focus is now on output evaluation.


6

Selection of two aspects

Evaluation of non-response bias in ESS: several possibilities
(1) comparing the sample with population statistics
(2) comparing respondents with non-respondents using additional information about the non-respondents
(3) comparing co-operative respondents with reluctant respondents using information from the contact forms…

example 1: traces of non-response bias with (3)


7

Selection of two aspects

Evaluation of measurement invariance in ESS: several possibilities
- comparing distributions and looking for outliers…
- looking for response sets
- testing measurement models of latent variables and evaluating levels of factorial invariance

example 2: two translation problems detected


8

I. Evaluation of non-response bias

Many rules and actions in ESS aim to obtain high and comparable response rates (goal = 70%)

Result: response rates vary between 35% and 80% in round 1 and between 43% and 79% in round 2

Example: Round 1

Goal not reached; however, in most countries response rates are much higher than in other surveys

9

Figure 1. Response rates (target 70%). [Chart not reproduced; the axis runs from 0 to 80%.]


10

When is there nr-bias in comparative surveys?

- None if there is no bias in the separate countries
- None if the bias is equal in the separate countries and the non-response rates are equal
- Yes if the bias is equal in the separate countries and the non-response rates are very different
- Yes if the bias differs between the separate countries

ESS = bias is expected because of very different response rates and traces of differences in bias…
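
The four cases above can be checked with a small numerical sketch. It assumes the standard deterministic expression for the bias of a respondent mean, non-response rate × (mean of respondents − mean of non-respondents), which is not stated on the slides, and it reads "equal bias" as an equal gap between respondents and non-respondents. All function names and numbers are hypothetical.

```python
def respondent_mean_bias(nonresponse_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean under the deterministic non-response model."""
    return nonresponse_rate * (mean_respondents - mean_nonrespondents)


def comparison_bias(country_a, country_b):
    """Bias in the difference between two country means (A minus B).

    Each country is a tuple:
    (nonresponse_rate, mean_respondents, mean_nonrespondents).
    """
    return respondent_mean_bias(*country_a) - respondent_mean_bias(*country_b)


# Hypothetical figures on a 10-point attitude scale (not ESS results).
no_gap = comparison_bias((0.3, 5.5, 5.5), (0.6, 5.5, 5.5))
equal_gap_equal_rates = comparison_bias((0.3, 5.5, 5.0), (0.3, 5.5, 5.0))
equal_gap_different_rates = comparison_bias((0.2, 5.5, 5.0), (0.6, 5.5, 5.0))
different_gap = comparison_bias((0.3, 5.5, 5.0), (0.3, 5.0, 5.5))

print(no_gap, equal_gap_equal_rates, equal_gap_different_rates, different_gap)
```

Only the last two cases leave a non-zero bias in the cross-country comparison, which matches the two "yes" cases in the list.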


11

non-response bias…

ESS is no worse than other surveys, and a lot of effort goes into estimating nr-bias

How?

One example of data quality assessment in ESS = comparing co-operative and reluctant respondents

Reluctant respondents = original refusals who are converted into respondents


12

Data = contact files and main data files

contact form:
- Info about time of each contact attempt (up to 10)
- Info about mode of each contact attempt (CA)
- Info about outcome of each CA
- Info about kind of non-response at each CA
- Info about reason for refusal
- Estimation of strength of refusal -> basis for conversion and for the distinction between soft and hard refusals
- Info about ineligibles
- Info about respondent (age, gender, housing form)


13


Method assumption: reluctant respondents are informative about final refusals (some evidence for that in waves of mail surveys…)

How to detect? By linking the contact forms of co-operative and reluctant respondents with the main data file

Analysis of countries with a substantial number of reluctant respondents (NL and DE more than 450 reluctant respondents; GB, AT & CH more than 115)


14


Find differences in scores between co-operative (Co) and reluctant (Re) respondents on social demographics (education, age, urban environment…)

Find differences in scores between Co & Re on relevant attitudinal variables (latent multiple-indicator variables: political trust, political participation, ethnic threat…)

Multiple regression with all relevant control variables and type of respondent (Re/Co) as predictors and the attitudinal variable as dependent variable (do we find a net effect of Co/Re on the attitude?); see figure

If the differences are significant = trace of bias


15

Steps in analysis of nr-bias

1. Simple regression: kind of respondent → attitudinal variable

2. Multiple regression: kind of respondent + social demographics → attitudinal variable

Evaluate the change between (1) and (2)
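
The two steps can be sketched on synthetic data. This is a minimal illustration, not the ESS analysis: the data are simulated, there is a single control variable (education), and the helper names `slope`, `residuals` and `net_slope` are hypothetical. The net effect in step 2 is obtained by partialling the control out of both variables (the Frisch-Waugh-Lovell result), which gives the same coefficient as a multiple regression.

```python
import random
import statistics


def slope(x, y):
    """OLS slope of y on one predictor x (intercept included)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (intercept included)."""
    b = slope(x, y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


def net_slope(x, control, y):
    """Slope of x in the regression of y on x and control (Frisch-Waugh-Lovell)."""
    return slope(residuals(control, x), residuals(control, y))


# Synthetic data (NOT ESS data): education drives both reluctance and the
# attitude, and reluctance has a true net effect of -0.3 on the attitude.
rng = random.Random(0)
education = [rng.gauss(0, 1) for _ in range(5000)]
reluctant = [1.0 if e + rng.gauss(0, 1) < -0.5 else 0.0 for e in education]
attitude = [0.5 * e - 0.3 * r + rng.gauss(0, 1)
            for e, r in zip(education, reluctant)]

model1 = slope(reluctant, attitude)                 # step 1: simple regression
model2 = net_slope(reluctant, education, attitude)  # step 2: control added
print(f"Model 1: {model1:.3f}  Model 2: {model2:.3f}")
```

In this setup the Model 1 coefficient mixes the net effect with the education difference between the two kinds of respondents; the shrinkage from Model 1 to Model 2 is the change the slide asks to evaluate.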

16

Some traces of nr-bias in attitudes

most in expected direction (exception CH both cases and GB for p.p.)

Table 1. Means (10-point scales) and t-tests for the difference between co-operative and reluctant respondents in ESS R1 for political participation and ethnic threat.

Political participation

Country   Coop.   Reluct.   t-test
AT        5.81    6.29      t = -2.63; p < 0.01
GB        5.28    5.21      t = 0.38; p = 0.70
CH        5.82    5.95      t = -0.81; p = 0.42
DE        5.60    5.29      t = 3.20; p < 0.01
NL        5.89    5.66      t = 2.69; p < 0.01

Ethnic threat

Country   Coop.   Reluct.   t-test
AT        5.04    4.81      t = 1.33; p = 0.18
GB        5.30    5.74      t = -3.00; p < 0.01
CH        5.29    5.54      t = -2.04; p = 0.04
DE        5.57    5.81      t = -3.02; p < 0.01
NL        5.00    5.28      t = -3.51; p < 0.001


17

Some traces of nr-bias

- About the same (but smaller differences) for interest in politics and trust in politics, but not in all countries
- Participation in voting: not all in the same direction
- Differences in social demographics: not all in the same direction!
- Next step is simple and multiple regression of type of respondent on attitudes

Example: political participation and ethnic threat in DE and NL

Parameters in simple (Model 1) and multiple (Model 2) regression

18

Some traces of nr-bias

Table 2. Standardized regression coefficients (β-coefficients) and t-values for reluctant respondents (versus co-operative respondents, the reference category) in simple (Model 1) and multiple (Model 2) regression models in Germany and the Netherlands.

Explained variable        Model     DE β       DE t-value   NL β       NL t-value
Political participation   Model 1   -0.060**   -3.20**      -0.056**   -2.69
                          Model 2   -0.047**   -2.56        -0.039     -1.91
Ethnic threat             Model 1   0.056**    3.02         0.072***   3.51
                          Model 2   0.030      1.70         0.047*     2.34

In Model 2 the effect is always smaller, but whether it becomes non-significant depends on the country and the dependent variable

19

Nr-bias: evaluation

Correctly recording contacts is a hard task and more expensive; resistance from survey organisations (errors in some countries)

Refusal conversion: problems with privacy regulation; not done in every country (more successful in more countries in R2)

In the countries that were used (5 in R1) there are some differences in direction

Therefore, it is not possible to apply ‘corrections’ for every country, thus not useful in comparative analysis unless…

Relies on a rather weak hypothesis: reluctant ≈ refusals

20

II. Evaluation of measurement equivalence

Focused on sets of indicators for latent variables = evaluation of indicators within the context of the construct

Full equivalence (see figure and rule):
- invariance of corresponding slopes (factor loadings) over all countries (metric invariance)
- invariance of corresponding intercepts of indicators with the latent variable over all countries (scalar invariance)
- invariance of error terms (residuals)

Weaker forms of equivalence when hypotheses on equality constraints are rejected = only the pattern of relations between concept and indicators is invariant

21

valid measurement in comparative setting

idea of causal relationship* between latent variable (LV) and four observed indicators (OV), and between measured latent variable and theoretical (intended) concept (TC)

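[Path diagram not reproduced. Recoverable labels: the latent variable (LV) points to four observed indicators (ov1 to ov4), each with a random error term (e); a method factor (M) is drawn, with the method effect assumed to be zero; LV is linked to the theoretical concept (TC). Annotations: measurement validity; theoretical validity?; random error; validity parameter (= equal for all disciplines?).]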


22

Formal notation of MG comparison

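The relationship between item I_j and latent trait W in group g is denoted as follows:

$$I_j^{(g)} = f\left(\beta_{0,j}^{(g)} + \beta_{1,j}^{(g)}\, W\right) \qquad (1)$$

where
- g = group (country)
- I_j = observed indicator (j = number of the indicator)
- f = link function (e.g. OLS regression)
- \beta_{0,j}^{(g)} = intercept term
- \beta_{1,j}^{(g)} = slope parameter
- W = latent trait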


23

Kinds of equivalence

construct equivalence is defined as follows:

Metric invariance = equal slopes

$$\beta_{1,j}^{(g)} = \beta_{1,j}^{(h)} \qquad (2)$$

with j = 1, …, J and g, h = 1, …, G; g ≠ h

For regression coefficients (with the observed variable as dependent variable) to be validly comparable, equality of slopes is required


24

Kinds of equivalence

If one wants to compare means of the latent variable and indicators, there is an additional requirement = scalar invariance, i.e. equality of intercepts:

$$\beta_{0,j}^{(g)} = \beta_{0,j}^{(h)} \qquad (3)$$

with j = 1, …, J and g, h = 1, …, G; g ≠ h

An item is measurement invariant across groups if restrictions (2) and (3) hold
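
Why equal intercepts matter for comparing means can be shown with a short sketch. It assumes a linear link function, so that the expected item score is intercept + slope × latent mean; the parameter values and the names `expected_item_score` and `observed_mean_gap` are hypothetical and not taken from the ESS models.

```python
def expected_item_score(intercept, slope, latent_mean):
    """Expected item score under a linear link: intercept + slope * latent mean."""
    return intercept + slope * latent_mean


def observed_mean_gap(params_g, params_h, latent_mean_g, latent_mean_h):
    """Difference in expected item scores between group g and group h.

    params_* are (intercept, slope) tuples for the same item in each group.
    """
    return (expected_item_score(*params_g, latent_mean_g)
            - expected_item_score(*params_h, latent_mean_h))


# Hypothetical item parameters; both countries have the SAME latent mean.
same_latent_mean = 2.0
country_g = (1.0, 0.8)
country_h_invariant = (1.0, 0.8)   # equal slope and equal intercept
country_h_shifted = (1.5, 0.8)     # equal slope, different intercept

gap_invariant = observed_mean_gap(country_g, country_h_invariant,
                                  same_latent_mean, same_latent_mean)
gap_shifted = observed_mean_gap(country_g, country_h_shifted,
                                same_latent_mean, same_latent_mean)
print(gap_invariant, gap_shifted)
```

With unequal intercepts the item means differ although the latent means are identical, so a difference in observed means could not be read as a difference in the underlying attitude.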


25

What if slopes/intercepts of some groups not invariant?

if test of equal slopes/intercepts shows that the parameters are not invariant for some items, then several options:

- exclude groups

- remove items (try to know why? = our examples)

- resort to partial factorial invariance (free estimation of factor loadings)

- conclude that construct has different meaning and rely on weaker form of equivalence


26

An example of equivalent measurement: willingness to allow immigrants

Six items (D4-D9) in 21 countries

Tests by Multi-Group Structural Equation Modeling (MGSEM) and the Proportional Odds Model (very strict)

(Welkenhuysen-Gybels & Billiet, 2002)

here = only MGSEM


27

“To what extent do you think [country] should allow people xxxxxx to come and live here?” (4-point scale: many = 1, some = 2, a few = 3, none = 4)

d4: “…of a different race”
d5: “…of the same race…”
d6: “…from the richer countries in Europe…”
d7: “…from the poorer countries in Europe…”
d8: “…from the richer countries outside Europe…”
d9: “…from the poorer countries outside Europe…”


28

Model specifications

• Start from factorial invariance:
  o 21 groups
  o Equal factor loadings over all groups
  o Equal intercepts over all groups
  o Equal residual variances over all groups
  o Variances of the latent variable not equal over groups
  o Latent mean κ = 0 in group 1, free in all other groups


29

Test information

Model   Modification           χ²       df    RMSEA   p (close fit)   NFI
M0      factorial invariance   951.89   508   0.022   1               1
M1      free τ1 (hu)           867.01   507   0.020   1               1
M2      free λ1 (hu)           847.47   506   0.019   1               1
M3      free τ1 (dk)           827.29   505   0.019   1               1
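
One common way to read such a sequence of nested models is the chi-square difference test; the slides do not say which criterion was used to select the modifications, so this is only an illustration. The sketch takes the χ² and df values from the test information table (decimal commas read as decimal points) and uses the exact tail probability of a chi-square variable with one degree of freedom, erfc(√(x/2)). The names `models`, `chi2_df1_pvalue` and `difference_tests` are hypothetical.

```python
import math

# (model, chi-square, degrees of freedom) from the test information table.
models = [
    ("M0", 951.89, 508),
    ("M1", 867.01, 507),
    ("M2", 847.47, 506),
    ("M3", 827.29, 505),
]

CRITICAL_DF1_05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def chi2_df1_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def difference_tests(nested_models):
    """Chi-square difference test for each pair of consecutive nested models."""
    results = []
    for (name_a, chi_a, df_a), (name_b, chi_b, df_b) in zip(nested_models,
                                                            nested_models[1:]):
        delta_chi, delta_df = chi_a - chi_b, df_a - df_b
        if delta_df != 1:
            raise ValueError("this sketch only handles one freed parameter per step")
        results.append((f"{name_a} -> {name_b}", delta_chi,
                        chi2_df1_pvalue(delta_chi), delta_chi > CRITICAL_DF1_05))
    return results


results = difference_tests(models)
for step, delta_chi, p_value, significant in results:
    print(f"{step}: delta chi2 = {delta_chi:.2f}, p = {p_value:.2g}, "
          f"significant = {significant}")
```

Each freed parameter lowers χ² by far more than the critical value for one degree of freedom, with the largest drop for the first modification (the intercept of item 1 in Hungary).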

30

The quasi equivalent measurement model

τ (t-value)

Item   all           hu              dk
d4     0.16 (8.38)   -0.25 (-8.77)   -0.07 (-2.71)
d5     0.15 (6.91)
d6     0.11 (6.69)
d7     0.15 (6.86)
d8     0.12 (6.77)
d9     0.15 (6.94)

Standardised λ (t-value)

Item   all               hu
d4     -0.83             -0.59 (-26.32)
d5     -0.91 (-132.76)
d6     -0.72 (-143.81)
d7     -0.94 (-87.77)
d8     -0.77 (-146.43)
d9     -0.94 (-90.14)

(negative coeff because item scores not reversed many=1…none=4)

(intercepts not very different between countries)


31

Things that go wrong…

« Lost in translation »
- Good results for the D4-D9 set, but not for all sets of items
- Two examples of problematic translation
- Possible to detect by MGSEM?

Where?
- In France: « asylum » items
- In DK: « ethnic threat » items

Problems discovered outside the standard MGSEM tests because FR and DK were not in the release, but checked ad hoc (skip all details)

32

Example 1: « …generous in judging people's applications for refugee status »

How detected? In a comparison between the « neighbours » of Belgium, a very large deviation in FR on one item

D51 The government should be generous in judging people's applications for refugee status. (percentages)

Country          Agree (strongly)   Neither agree nor disagree   Disagree (strongly)   N
                 *       **         *       **                   *       **
Belgium          18.0    17.9       22.8    22.6                 59.2    59.5          1852
Germany          15.8    15.9       24.8    24.7                 59.4    59.4          2873
France           62.9    63.1       18.8    18.6                 18.3    18.3          1470
United Kingdom   27.2    27.2       26.1    25.7                 46.8    47.1          2028
Luxembourg       31.3    31.0       23.5    24.3                 45.2    44.7          1416
Netherlands      10.0    10.0       14.9    15.2                 75.2    74.8          2336

* weighted by gender and age; ** weighted by gender, age, and education
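
"Comparing distributions and looking for outliers" can be made concrete with a small sketch. It takes the "Agree (strongly)" percentages from the first weighting column of the D51 table and flags countries with a large robust z-score (distance from the median, scaled by the median absolute deviation). The choice of this statistic and the cutoff of 3.0 are assumptions made for illustration; the slides only report that the French figure stood out in the comparison.

```python
import statistics

# "Agree (strongly)" percentages for D51, first weighting column of the table.
agree_pct = {
    "Belgium": 18.0,
    "Germany": 15.8,
    "France": 62.9,
    "United Kingdom": 27.2,
    "Luxembourg": 31.3,
    "Netherlands": 10.0,
}

THRESHOLD = 3.0  # illustrative cutoff, not taken from the slides


def robust_z_scores(values):
    """Robust z-score per key: 0.6745 * (value - median) / MAD."""
    median = statistics.median(values.values())
    mad = statistics.median(abs(v - median) for v in values.values())
    return {key: 0.6745 * (v - median) / mad for key, v in values.items()}


scores = robust_z_scores(agree_pct)
flagged = [country for country, z in scores.items() if abs(z) > THRESHOLD]
print(flagged)
```

A median-based score is used here because, with only six countries, one extreme value would inflate an ordinary standard deviation and partly mask itself.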

Translation of D51 in source language (English)

CARD 40: Some people come to this country and apply for refugee status on the grounds [1] that they fear persecution in their own country. Using this card, please say how much you agree or disagree with the following statements.

“The government should be generous [2] in judging people’s applications for refugee status”

[1] “On the grounds”: in the sense of both ‘because’ and ‘stating that’

[2] “Generous”: ‘liberal’.

In Luxemburg, Genève (CH), and Wallonia (BE): “Le gouvernement devrait être généreux en traitant les demandes du statut de réfugié”

Wallonia, Switzerland, Luxemburg

CARTE 40: Des gens viennent en [country] et demandent un statut de réfugié car ils se sentent persécutés dans leur propre pays. A l’aide de cette carte, dites-moi s’il vous plaît quelle mesure vous êtes d’accord ou en désaccord avec les propositions suivantes… “Le gouvernement devrait être généreux en traitant les demandes du statut de réfugié”

Luxemburg: …en accordant le statut de réfugié
Switzerland: …en traitant d’un statut de réfugié

France

Certaines personnes arrivent en France et demandent le statut de réfugié parce qu’elles craignent des persécutions dans leur pays

Enquêteur montre liste 49 (tout à fait d’accord etc.…) “Les pouvoir publics devraient se montrer plus ouverts dans l’examen de ces demandes”


35

Three differences in one small statement

“Le gouvernement devrait être généreux en traitant les demandes du statut de réfugié”

“Les pouvoir publics devraient se montrer plus ouverts dans l’examen de ces demandes”

Differences:
- “gouvernement” = the government; “pouvoir publics” = the administration
- “montrer plus ouvert” = more acceptable than the extreme “devrait être généreux”
- « ces demandes » = no direct reference to « statut de réfugié »


36

Example 2: D24 in Denmark

Source questionnaire

Questions D23 & D24 on ‘serious crime’ and ‘any crime’

D23 If people who have come to live here commit a serious crime, they should [1] be made to leave

D24 If people who have come to live here commit any crime, they should [2] be made to leave

(five-point scales: completely disagree = 1 … completely agree = 5)

[1], [2] “Should” in D23 and D24 has the sense of ‘must’.


37

Two examples: item D24 in Denmark

One can expect a contrast effect (backfire) between these items
Mean approval of D23 for all countries = 79%
Mean approval of D24 for all countries = 51%

How was the problem detected?

Strange result for DK in the report of EUMCR (Scheepers et al. 2004): the two items were combined for all countries (PCA scores and % support reported; support of “made to leave” = % higher than the mean (0))

38

unlikely figure

Mean score and percentage support on “favour repatriation policies for criminal immigrants” (ESS R1) (selection of countries in correct order)

Country              Mean (PCA D23-D24)   % support
HU                   0.867                91.9%
FR                   0.834                87.3%
E-DE                 0.770                83.9%
PT                   0.756                83.2%
…                    …                    …
W-DE                 0.715                75.2%
…                    …                    …
SW                   0.577                49.3%
LU                   0.558                46.3%
DK                   0.540                43.8%
Mean all countries   0.792                50.8%


39

Arguments:

D23 (serious crime): 77.2% of the Danish want immigrants to leave = higher than AT, BE, ES, FI, FR, IE, IL, LU, NL, NO, SE

about average of EU (79%) (pweight)

certainly not the lowest as was reported

D24: any crime: only 12.9% of the Danish want immigrants to leave = lowest of all. Much lower than average in Europe = 51% (pweight).

What happened?

D24 in Denmark


40

Lost in translation

Contrast between D23-D24 much larger because of different translation of ‘crime’ in D24 in DK

D.23 Hvis mennesker, der er kommet for at bo her, begår en alvorlig forbrydelse, skal de udvises af landet

D.24 Hvis mennesker, der er kommet for at bo her, begår nogen som helst form for lovovertrædelse, skal de udvises af landet

Forbrydelse = ”crime” is used in NO and SW for D23 and D24; alvorlig forbrydelse = serious crime

Lovovertrædelse = ”any kind of law violation, associated with minor crime (violation of traffic rules included)” is used in D24 in DK; nogen som helst form for lovovertrædelse = any form of minor crime

41

Is this detectable in tests of Factorial Invariance? YES

Example of MGSEM: D51 in FR, measurement model for D49-D55 in FR, Genève, Wallonia, and Luxemburg (four groups)

D49 [Country] has more than its fair share of people applying for refugee status (-)
D50 People applying for refugee status allowed to work while cases considered (+)
D51 Government should be generous judging applications for refugee status (+)
D52 Most refugee applicants don’t fear persecution in own countries (-)
D53 Refugee applicants kept in detention centres while cases considered (-)
D54 Financial support to refugee applicants while cases considered (+)
D55 Granted refugees should be entitled to bring close family members (+)

Set of items is balanced (4 pos, 3 neg) = an MGSEM model with a substantive factor and a style factor (acquiescence) (see Billiet & McClendon, SEM 2000) (see model)

42

The MGSEM measurement model


43

Detection of problematic item

Model                             Chisq      df   RMSEA   P(close fit)   Model CAIC
M0: basic model invariant (A)     1,842.78   92   0.133   0.000          2,291.57
M0: basic model invariant (A+S)   1,304.60   84   0.116   0.000          1,829.35
M1: free FR3                      860.92     83   0.093   0.000          1,395.04
M2: free LU5                      738.80     82   0.086   0.000          1,282.29

- Aim: detect the problematic item very early in the test procedure for finding an adequate model
- Problematic item nr 3 (D51) in France is directly detected in the model with Asylum + Style (method effect)
- Item 5 for LU also needs inspection (to be done)


44

Equivalent measurement: evaluation

In spite of many efforts to obtain correct translations, there are still some problems

Translation problems are detectable by
- comparing distributions (find ‘strange’ outliers)
- MGSEM tests (a parameter is not invariant; detectable at an early stage of testing)
- discussion with native speakers, translators…

Other sources of in-equivalence…
