1st meeting: Multilevel modeling: introduction Subjects for today: Basic statistics (testing) The difference between regression analysis and multilevel analysis Multilevel data base construction

Upload: neil-poole

Post on 02-Jan-2016


1st meeting:

Multilevel modeling: introduction

Subjects for today:

Basic statistics (testing)

The difference between regression analysis and multilevel analysis

Multilevel data base construction


INFERENTIAL STATISTICS….


The distribution of age in the Netherlands in 1899


Statistical testing using a normal distribution


Calculate a standard error: take the standard deviation (s) and divide it by the square root of the sample size (n):

SE = s / √n

Assumption: the units in the sample are drawn at random. This means that all units in the population have an equal chance of being sampled.

In our example the standard deviation is 20.6 and we have 1000 units in the sample: SE = 20.6 / √1000 = 20.6 / 31.6 = 0.65

Now suppose that when we sample unit 1 we have a 100% chance of also sampling units 2, 3, 4, 5, 6, 7, 8, 9 and 10. Suppose further that all these units have more or less the same score on age.

As a consequence the effective sample is no longer 1000 but 1000 / 10 = 100! Hence the standard error is no longer 0.65 but 20.6 / √100 = 20.6 / 10 = 2.06! With a larger standard error, results that looked significant may no longer be significant (higher p-values!).

So, when a sample is clustered (for instance, sampling unit 1 means sampling units 2-10 as well) our effective sample size tends to decrease! This is important to note because in multilevel analysis we use clustered samples!
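The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the extreme case described above, in which every cluster of 10 units counts as a single independent observation; the names `standard_error`, `n_nominal`, `cluster_size` and `n_effective` are chosen here for illustration and do not come from the slides.

```python
import math


def standard_error(s, n):
    """Standard error of the mean: s divided by the square root of n."""
    return s / math.sqrt(n)


s = 20.6            # standard deviation of age in the example
n_nominal = 1000    # number of units in the sample
cluster_size = 10   # units that are always sampled together

# Assumption (the extreme case in the text): units within a cluster have
# more or less the same score, so each cluster adds about one observation.
n_effective = n_nominal // cluster_size

se_random = standard_error(s, n_nominal)
se_clustered = standard_error(s, n_effective)

print(round(se_random, 2))     # 0.65
print(round(se_clustered, 2))  # 2.06
```

In real clustered samples the units within a cluster are rarely identical, so the loss of effective sample size is usually smaller than in this illustration.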


Clustered samples… (see also the file “the bowl of rice problem.pdf”).

One may think that we still use a large number of cases, but due to clustering we use an effective sample of only 6. The standard error would then be close to 20.6 / √6 = 20.6 / 2.45 = 8.41 years!


Before we turn to multilevel analysis, let us first take a look at the data structure and how to construct a multilevel data set.

We take as an example Indonesia, with about 100 districts, such as Bandung, Majalengka, and Serang.

Suppose we have a lot of respondents sampled at random from these districts. Then we may have:

a dataset from Bandung, a dataset from Majalengka, a dataset from Serang, and maybe many more…

First we would like to construct a single dataset with all respondents from all districts.


Bandung.sav

Majalengka.sav

Serang.sav

District x.sav

Bandung + Majalengka + Serang + District x.sav

* SPSS syntax.
ADD FILES
  /FILE="c:/multilevelmodeling/bandung.sav"
  /FILE="c:/multilevelmodeling/majalengka.sav"
  /FILE="c:/multilevelmodeling/serang.sav"
  /FILE="c:/multilevelmodeling/district x.sav".
EXECUTE.

SAVE OUTFILE="c:/multilevelmodeling/all_individuals.sav".

NOTE: all files should have the same data definitions!
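For readers who want to see the logic of stacking files outside SPSS, here is a minimal Python sketch. It is not a replacement for ADD FILES; it only illustrates why the data definitions must match. The function `add_files` and the small in-memory lists are hypothetical stand-ins for the .sav files, with heights taken from the example table in this deck.

```python
def add_files(*datasets):
    """Stack datasets (lists of dict rows) that share the same variables."""
    expected = None
    stacked = []
    for rows in datasets:
        for row in rows:
            if expected is None:
                # The first row defines which variables every row must have.
                expected = set(row)
            elif set(row) != expected:
                raise ValueError(
                    f"variables differ: {sorted(row)} vs {sorted(expected)}"
                )
            stacked.append(dict(row))
    return stacked


# Hypothetical stand-ins for bandung.sav, majalengka.sav and serang.sav.
bandung = [
    {"district": "Bandung", "height": 150},
    {"district": "Bandung", "height": 145},
]
majalengka = [{"district": "Majalengka", "height": 118}]
serang = [{"district": "Serang", "height": 167}]

all_individuals = add_files(bandung, majalengka, serang)
print(len(all_individuals))  # 4
```

Raising an error on mismatched variables is a design choice of this sketch; it makes the problem visible immediately instead of producing a stacked file with silent gaps.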


OK, now we have something like this:

Individual   District      Body height
1            Bandung       150
2            Bandung       145
3            Bandung       156
4            Majalengka    118
5            Majalengka    174
6            Majalengka    156
7            Serang        167
8            Serang        153
9            Serang        145
10           District X    144
11           District X    177
12           District X    188


Now this may serve your purposes. For example, you may want to estimate the effect of body height on body weight with a linear regression model using ordinary least squares (OLS):

Body weight = a + b1 * body height + e

This may be problematic: the errors of people from the same district may be correlated, because districts may differ in average body height and weight.

[Figure with labels: Bandung, Majalengka, Serang]


SOLUTION:

Body weight = a + b1* body height + b2 * Bandung + b3 * Majalengka + e

The districts enter as so-called ‘dummy variables’, coded 0 and 1.

Serang is left out of the equation: it is the reference category. So b2 is the estimated difference in average body weight between Bandung and Serang, taking into account differences in body height.

The body height effect is now estimated more accurately because we rule out the effects of districts.
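As an illustration of dummy coding, the Python sketch below builds one 0/1 variable per district except the reference category. The helper `dummy_code` and the three example rows are assumptions made for this illustration; statistical packages such as SPSS offer their own ways to create these variables.

```python
def dummy_code(rows, variable, reference):
    """Add one 0/1 indicator per category of `variable`, except `reference`."""
    categories = sorted({row[variable] for row in rows} - {reference})
    coded = []
    for row in rows:
        new_row = dict(row)
        for category in categories:
            new_row[category] = 1 if row[variable] == category else 0
        coded.append(new_row)
    return coded


# Three example respondents, one per district (heights from the table above).
rows = [
    {"district": "Bandung", "height": 150},
    {"district": "Majalengka", "height": 118},
    {"district": "Serang", "height": 167},
]

coded = dummy_code(rows, "district", reference="Serang")
for row in coded:
    print(row["district"], row["Bandung"], row["Majalengka"])
```

A respondent from Serang scores 0 on both dummies, which is what makes Serang the reference category in the equation.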

As long as we restrict our analyses to these districts, this is statistically fine.

BUT suppose we would like to know WHY people in, say, Bandung are on average heavier than people in Serang. Then we need more information about these districts, such as average welfare.


SOLUTION 1:

Body weight = a + b1* body height + b2 * Average Welfare + e

Problem 1: we have only three data points for Average Welfare: Bandung, Majalengka and Serang.

Problem 2: the effect of welfare would only be valid for these three districts.

Problem 3: we must assume that Average Welfare explains all level-2 variance; otherwise part of the error is still correlated.

Problem 4: we mix up individual variance (not all people in Serang have the same body weight) with district variance (not all districts have the same average body weight): it is all in the variance of e.


Solution 2:

We first randomly collect a large number of districts say 30 or more and then randomly select individuals from these districts.

We set up a multilevel equation:

Y = a + b1 * body height + e1          (individual level)
      + b2 * average welfare + e2      (district level)

e1 is within (districts) variance (individual variance), e2 is between (districts) variance
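To get a feeling for the within/between distinction, the Python sketch below splits the variation in the 12 body heights from the example table into a within-district and a between-district part. This is only a descriptive sum-of-squares decomposition, shown under the assumption that it helps intuition; it is not the estimation procedure of a multilevel model, and the variable names are chosen here for illustration.

```python
from statistics import mean

# Body heights per district, taken from the example table in this deck.
heights = {
    "Bandung": [150, 145, 156],
    "Majalengka": [118, 174, 156],
    "Serang": [167, 153, 145],
    "District X": [144, 177, 188],
}

all_values = [h for values in heights.values() for h in values]
grand_mean = mean(all_values)

# Within: deviations of individuals from their own district mean.
ss_within = sum(
    (h - mean(values)) ** 2 for values in heights.values() for h in values
)

# Between: deviations of district means from the grand mean,
# weighted by the number of individuals in each district.
ss_between = sum(
    len(values) * (mean(values) - grand_mean) ** 2
    for values in heights.values()
)

ss_total = sum((h - grand_mean) ** 2 for h in all_values)

print(ss_within, ss_between, ss_total)
```

The two parts add up to the total sum of squares, which is the sense in which individual and district variation can be separated instead of being lumped into one error term.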

I will return to this in detail during the next meeting.

First: how to get the data right for this?


Problem: we have individual data from many districts added into one big file, BUT we need to add the welfare figure for each district:

Individual   District      Body height   Welfare (in € per capita)
1            Bandung       150           300
2            Bandung       145           300
3            Bandung       156           300
4            Majalengka    118           100
5            Majalengka    174           100
6            Majalengka    156           100
7            Serang        167           200
8            Serang        153           200
9            Serang        145           200
10           District X    144           500
11           District X    177           500
12           District X    188           500


SPSS syntax to construct multilevel data files:

* Watch it: the data MUST be sorted by district first!
GET FILE="c:/multilevelmodeling/welfare.sav".
SORT CASES BY district.
SAVE OUTFILE="c:/multilevelmodeling/welfare.sav".

* Watch it: this file MUST be sorted by district as well!
GET FILE="c:/multilevelmodeling/all_individuals.sav".
SORT CASES BY district.
SAVE OUTFILE="c:/multilevelmodeling/all_individuals.sav".

* Add the district-level welfare figure to every individual.
MATCH FILES
  /TABLE="c:/multilevelmodeling/welfare.sav"
  /FILE="c:/multilevelmodeling/all_individuals.sav"
  /BY district.
EXECUTE.
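The idea behind matching a district-level table to individual records can also be sketched in Python. The function `match_table` is a hypothetical helper written for this illustration; the welfare figures and the individuals come from the example table in this deck. A dictionary lookup does not need sorted input, so the sorting step required by the SPSS syntax has no counterpart here.

```python
def match_table(rows, table, key, new_variable):
    """Copy a group-level value onto every row, looked up by `key`.

    Rows whose key is absent from `table` get None, comparable to a
    missing value after a table lookup.
    """
    return [{**row, new_variable: table.get(row[key])} for row in rows]


# District-level file: one welfare figure (in € per capita) per district.
welfare = {"Bandung": 300, "Majalengka": 100, "Serang": 200, "District X": 500}

# A few individual-level records from the example table.
individuals = [
    {"individual": 1, "district": "Bandung", "height": 150},
    {"individual": 4, "district": "Majalengka", "height": 118},
    {"individual": 7, "district": "Serang", "height": 167},
    {"individual": 10, "district": "District X", "height": 144},
]

multilevel = match_table(individuals, welfare, "district", "welfare")
for row in multilevel:
    print(row["individual"], row["district"], row["height"], row["welfare"])
```

Every individual from the same district receives the same welfare value, which is exactly the structure of the combined table shown on the previous slide.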
