2nd meeting: multilevel modeling: intra class correlation subjects for today:


Upload: tanika

Post on 07-Jan-2016


2nd meeting:

Multilevel modeling: intra class correlation

Subjects for today:

Multilevel data base construction

The difference between single level OLS regression and Multilevel analysis

Multilevel analysis: the intra class correlation (ICC)


What we learned in the first meeting:

• When we want to say something about higher-level units such as Indonesian districts or countries, it is best to use multilevel analysis, because it uses the right standard errors and the correct number of observations.

• We need a data file with a sample of districts and, within each district, a sample of individuals:

Individual  District    Body height
1           Bandung     150
2           Bandung     145
3           Bandung     156
4           Majalengka  118
5           Majalengka  174
6           Majalengka  156
7           Serang      167
8           Serang      153
9           Serang      145
10          District X  144
11          District X  177
12          District X  188


We also want to include level 2 variables, to explain why districts (or countries) differ.

The data file then looks like this (welfare included):

Individual  District    Body height  Welfare (in € per capita)
1           Bandung     150          300
2           Bandung     145          300
3           Bandung     156          300
4           Majalengka  118          100
5           Majalengka  174          100
6           Majalengka  156          100
7           Serang      167          200
8           Serang      153          200
9           Serang      145          200
10          District X  144          500
11          District X  177          500
12          District X  188          500

HOW TO GET THAT RIGHT?


First, we have to get the welfare data for Indonesian districts.

These can be found at the Central Bureau for Statistics or on other internet sites.

Second, we want them in an SPSS-readable data file (for instance an Excel file (Microsoft Office) or an SPSS file such as a SAV or POR file).

Third, we must connect the welfare data to the individual data:

Individual data (in SPSS called a file):

Individual  District
1           Bandung
2           Bandung
3           Bandung
4           Serang
5           Serang

Contextual data (in SPSS called a table):

District  Welfare
Bandung   300
Serang    200


SPSS Syntax to construct multilevel data files:

GET FILE="c:\multilevelmodeling\welfare.sav".

* Watch it: data MUST be sorted by district first.

SORT CASES BY district.

SAVE OUTFILE="c:\multilevelmodeling\welfare.sav".

GET FILE="c:\multilevelmodeling\all_individuals.sav".

* Watch it: data MUST be sorted by district first.

SORT CASES BY district.

SAVE OUTFILE="c:\multilevelmodeling\all_individuals.sav".

MATCH FILES TABLE="c:\multilevelmodeling\welfare.sav"
  /FILE="c:\multilevelmodeling\all_individuals.sav"
  /BY district.
EXECUTE.
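For readers who want to see the logic of the merge outside SPSS, here is a minimal Python sketch of the same idea: every individual case receives the welfare value of its district. The two small data structures are hypothetical stand-ins for all_individuals.sav and welfare.sav, filled with the values from the slides, and the function name attach_context is made up for this illustration.

```python
# Hypothetical stand-ins for all_individuals.sav (the "file")
# and welfare.sav (the "table"), using the values from the slides.
individuals = [
    {"individual": 1, "district": "Bandung"},
    {"individual": 2, "district": "Bandung"},
    {"individual": 3, "district": "Bandung"},
    {"individual": 4, "district": "Serang"},
    {"individual": 5, "district": "Serang"},
]
welfare_table = {"Bandung": 300, "Serang": 200}  # one entry per district


def attach_context(cases, table, key="district", new_var="welfare"):
    """Copy the level 2 value onto every level 1 case with the same key."""
    merged = []
    for case in cases:
        row = dict(case)                     # leave the input untouched
        row[new_var] = table.get(case[key])  # None if the district is missing
        merged.append(row)
    return merged


merged = attach_context(individuals, welfare_table)
for row in merged:
    print(row["individual"], row["district"], row["welfare"])
```

A dictionary lookup does not need sorted input, which is the main difference from the SPSS syntax, where both files must be sorted by district before matching.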


Ok, we have our data ready, what’s next?

First we want to know whether there is variation WITHIN districts and variation BETWEEN districts.
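A quick way to get a feeling for both kinds of variation is to compute them descriptively for the small body height example from the first meeting. The Python sketch below does that. It is only a descriptive decomposition and not the model-based estimate that SPSS or MLwiN would report; the variable names are chosen for this illustration.

```python
from statistics import mean, pvariance

# Body height example from the first meeting (three individuals per district).
heights = {
    "Bandung": [150, 145, 156],
    "Majalengka": [118, 174, 156],
    "Serang": [167, 153, 145],
    "District X": [144, 177, 188],
}

district_means = {d: mean(v) for d, v in heights.items()}
all_heights = [h for v in heights.values() for h in v]

# With equally sized districts the total variance splits exactly into the
# variance of the district means plus the average within-district variance.
between = pvariance(list(district_means.values()))
within = mean(pvariance(v) for v in heights.values())
total = pvariance(all_heights)

for district, m in district_means.items():
    print(f"{district}: mean = {m:.1f}")
print(f"between = {between:.1f}, within = {within:.1f}, total = {total:.1f}")
```

If the district means were all equal, `between` would be zero; if everybody within a district had the same height, `within` would be zero.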


In MLwiN we have something similar. First we have an equation with the within variance:

$Y_{ij} = a_j + e_{ij}$

where $Y_{ij}$ = dependent variable, $a_j$ = intercept of district j, $e_{ij}$ = individual residual, whose variance is the within variance (the error in regression analysis), i = individual, j = level 2 unit (district).

Second we have an equation with the between variance:

$a_j = B_0 + u_{0j}$

where $B_0$ = overall intercept and $u_{0j}$ = district residual, whose variance is the between variance.

Substituting $a_j$ in the first equation gives:

$Y_{ij} = B_0 + u_{0j} + e_{ij}$

A multilevel null model!

So in plain words: every individual score ($Y_{ij}$) depends on some overall figure ($B_0$) plus some level 2 variation plus some individual variation.
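To make the null model concrete, the following Python sketch simulates scores as an overall figure plus a district residual plus an individual residual, and then recovers the two variances with simple moment estimates. All numbers (the intercept of 160, the standard deviations, the number of districts) are assumptions chosen for illustration, and the estimates are not the maximum likelihood estimates that MLwiN reports.

```python
import random
from statistics import mean, variance

random.seed(1)

B0 = 160.0   # overall intercept (assumed value)
SD_U = 5.0   # sd of the district residuals  -> between variance 25
SD_E = 9.0   # sd of the individual residuals -> within variance 81
N_DISTRICTS, N_PER_DISTRICT = 200, 50

districts = []
for _ in range(N_DISTRICTS):
    u_j = random.gauss(0, SD_U)  # one draw per district
    scores = [B0 + u_j + random.gauss(0, SD_E)  # one draw per individual
              for _ in range(N_PER_DISTRICT)]
    districts.append(scores)

# Moment estimates: the variance of the district means still contains a
# share of the within variance (within / n), so that share is subtracted.
within_var = mean(variance(d) for d in districts)
between_var = variance([mean(d) for d in districts]) - within_var / N_PER_DISTRICT

print(f"within  = {within_var:.1f} (true value 81)")
print(f"between = {between_var:.1f} (true value 25)")
```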


$Y_{ij} = B_0 + u_{0j} + e_{ij}$ for two individuals in Bandung:

[Figure: Y plotted against X, marking the mean in Bandung and the overall mean across the population of districts.]


Now suppose that all districts have the same mean body height:

Then the between variance = 0.

Suppose that all individuals within a district have exactly the same height:

Then the within variance = 0.

In most research there is both within and between variance, that is, both level 1 and level 2 variance. The total variance is of course the level 1 variance plus the level 2 variance.

Now suppose that all individuals are relatively closely clustered around their district (or group) means. Then the so-called intra class correlation is said to be high:

ICC = level 2 variance / total variance (total variance = level 1 variance + level 2 variance)

The ICC is always between 0 (only level 1 variance, no clustering) and 1 (only level 2 variance).


Now down to business. We have data (named SCHOOL23.sav, see our site; data used with kind permission from I. Kreft and J. De Leeuw, Introducing Multilevel Modeling, Sage Publications, 1998) from 23 schools including 519 pupils, and we have a math test as the Y variable. We want to know the between and within variances.

* SPSS syntax.

MIXED math
  /RANDOM=INTERCEPT | SUBJECT(school) COVTYPE(UN)
  /PRINT=SOLUTION G TESTCOV
  /METHOD=ML.

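ICC = 24.85 / (24.85 + 81.23) = .23, where 24.85 is the level 2 (between schools) variance and 81.23 is the level 1 (within schools) variance.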
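The same arithmetic can be written as a small Python function, which makes the role of the parentheses explicit: the level 2 variance is divided by the sum of both variances. The function name is made up for this illustration; the two variance estimates are the ones used in the ICC calculation above.

```python
def intraclass_correlation(level2_variance, level1_variance):
    """ICC = level 2 variance / (level 1 variance + level 2 variance)."""
    total = level1_variance + level2_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return level2_variance / total


icc = intraclass_correlation(level2_variance=24.85, level1_variance=81.23)
print(round(icc, 2))
```

So roughly 23% of the total variance in the math scores lies between schools.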


Multilevel null model in MLwiN:


We can also use a Chi-square test to check whether the ICC is significant. This way of testing is recommended, because it does NOT rely on the normality assumption of a z-test.

In MLwiN you can use Chi-square testing because the difference between the -2 loglikelihoods of two nested models is Chi-square distributed.

Say we have a model with only level 1 variance and a -2 loglikelihood of 1600, and the same model but now with both level 1 and level 2 variance parameters:

The -2 loglikelihood of the second model will be equal or lower! So -2 loglikelihood figures are a measure of fit: the lower the figure, the better the model fits the data.

Because the difference in -2 loglikelihood between the models can only be zero or higher, the test probability must be divided by 2!

Note: On our site we included a brief instruction about statistical testing in MLwiN.


To test whether the ICC is significant, that is, whether the level 2 variance is significantly different from zero, we perform a Chi-square test:

-2 loglikelihood of the 1-level model minus -2 loglikelihood of the 2-level model:

3933.064 - 3800.776 = 132.288 (about 132) with 1 df, which is highly significant. This test is superior to the Z-test in SPSS because the latter uses an estimate for the standard error.

Note: we test one-sided because the outcome is always zero or higher.

Model with one level only:


Let us assume that the difference between the -2 loglikelihoods is 10. We have 1 df, because we added one extra parameter, which is the level 2 variance.

The Chi-square distribution looks something like this:

In fact we must divide 0.00156 by 2 to get the correct p-value, but the original p is already very low. The conclusion is that, beyond reasonable doubt, there is level 2 variance, or ICC > 0!


Testing in MLwiN with Chi-square (more info in the document about testing, see: 'statistical testing in Mlwin.pdf').

Type cprob 10 1 and press [Enter]

Note that 0.0015654 must be divided by 2 for level 2 variance testing.
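For readers without MLwiN at hand, the probability can be checked with a few lines of Python. The sketch below relies on the fact that, for 1 degree of freedom, the upper tail of the Chi-square distribution equals erfc(sqrt(x / 2)). The function names are made up for this illustration, and the second call reuses the -2 loglikelihood values of the school example (3933.064 and 3800.776).

```python
import math


def chi2_upper_tail_1df(x):
    """P(X > x) for a Chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def variance_test_p(deviance_1level, deviance_2level):
    """Difference in -2 loglikelihood and the halved p-value (1 df)."""
    difference = deviance_1level - deviance_2level
    return difference, chi2_upper_tail_1df(difference) / 2.0


p_10 = chi2_upper_tail_1df(10)
print(f"{p_10:.7f}")      # the quantity requested with: cprob 10 1
print(f"{p_10 / 2:.7f}")  # halved for level 2 variance testing

diff, p_school = variance_test_p(3933.064, 3800.776)
print(f"difference = {diff:.3f}, halved p = {p_school:.3g}")
```

The first printed value should agree with the 0.0015654 that MLwiN returns. The shortcut via erfc only holds for 1 df; tests with more degrees of freedom need a general Chi-square routine.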
