Advanced Study Design

February 19, 2010

Today’s Class

• Last Week’s Probing Question
• Advanced Study Design
• Assignments

Probing Question

• Let’s say you wanted to do a large-scale research study on boredom

• Under what conditions would it be preferable to use
– Questionnaire items
– Experience sampling method
– Quantitative field observations

Today’s Class

Today…

• Validity
• Validity Threats
• Stratification
• Counterbalancing and Cross-over Designs
• Regression-Discontinuity Designs

Validity

• Useful jargon...

Validity (Trochim & Donnelly, 2007)

• Conclusion validity
• Internal validity
• Construct validity
• External validity
• Ecological validity

Conclusion Validity

• The degree to which conclusions you reach about relationships in your data are justified

Internal Validity

• Assuming that there is a relationship in the study, can you justifiably infer that the relationship is causal?

Construct Validity

• The degree to which inferences can legitimately be made from the operationalizations in your study, to the theoretical constructs on which those operationalizations were based

External Validity

• Do your results generalize to other people, procedures, places, and times?

Ecological Validity

• What is the degree to which the methods, materials, and settings of the study are relevant to natural/legitimate settings?

Ecological vs. External Validity

• Ecological validity
– is not about *generalization* to real-life situations
– is about whether the "methods, materials and settings" are similar (or identical) to real life

• Ecological validity is about real-world *relevance*

• External validity is about generalizability

Examples?

• High External Validity, Low Ecological Validity
• Low External Validity, High Ecological Validity

High External Validity, Low Ecological Validity

• Lab studies of the “seductive details” effect
• Instruction that does not include interesting but ultimately irrelevant details leads to better learning; the studies were performed in lab settings at 2 universities, with students of a variety of ages and children of different socio-economic status (SES) and race

Low External Validity, High Ecological Validity

• A classroom study, with real students, involving legitimate educational tasks, presented in exactly the way a teacher would present them…

• With 1 student in each condition

Let’s consider a few examples

• Vote on which type of validity is violated (any of the five, could be multiple, could even be none)

• Explain your reasoning

Which type of validity is violated?

• Students who read bug messages perform more poorly on post-test

• So bug messages hurt learning!

• Example bug message: “You have chosen a categorical variable for the X axis; however, scatterplot graphs can only contain numerical variables.”

• I have proven that students learn more Calculus from my Calculus tutoring system

• Here is my test, used both pre and post

• How well do you know Calculus? 1 2 3 4 5 (1 = Not well, 5 = Very well)

• My new tutoring system is much better than the previous tutoring system!

• I conducted a study comparing my new tutoring system to a previous one

• Students who completed the whole tutoring system performed significantly better on post-test in the experimental condition than control condition

• Oops… did I mention only 3% of students completed the whole tutoring system in the control condition?

• Now that I have tested my new learning environment that responds to off-task behavior by giving it to single students in the guidance counselor’s office after school, we can be confident it will work in all school settings

• Now that I have tested my new learning environment with a set of 10 8th graders in Tuktoyaktuk (Northwest Territories, Canada), all bilingual English-Inuvialuit, with fathers who work in the mine nearby, we can be confident it will work for all students

• Now that I have tested my new learning environment with a set of 120 8th graders in a predominantly middle-class Caucasian suburb of Worcester, we can be confident it will work for all students

Some Popular Threats to Internal Validity

Maturation Threat

• Something happens between pre-test and post-test, aside from your intervention, that impacts student change
– E.g. the same thing would have happened whether or not you ran your study

Maturation Threat

• Any horror stories from your research?

Maturation Threat

• One teacher taught the same material in class during the same week as the study

Mortality Threat

• Common in urban classrooms

Mortality Threat

• Large numbers of participants systematically drop out of the study

• Any horror stories from your research?

Mortality Threat

• Large numbers of participants systematically drop out

• Example: I ran a study with homeschool students; response rates were different between conditions

Regression to the Mean

• If you choose a group based on pre-test performance
– The most frequent gamers
– The students who scored in the bottom 10% on the pre-test
• Some of them were in that group by chance
• And can be expected to do better on the post-test
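
A minimal simulation sketch of this effect, under assumptions that are not in the slides: each score is a stable "true ability" plus independent chance noise, both normally distributed, and there is no intervention at all. The function name and all numeric parameters are hypothetical.

```python
import random
import statistics

def simulate_regression_to_mean(n_students=10_000, bottom_fraction=0.10, seed=0):
    """Simulate pre/post scores with NO intervention. Returns the mean pre-test
    and mean post-test score of the students who were selected because they
    scored in the bottom fraction on the pre-test."""
    rng = random.Random(seed)
    students = []
    for _ in range(n_students):
        ability = rng.gauss(50, 10)        # stable "true" knowledge (assumed model)
        pre = ability + rng.gauss(0, 10)   # pre-test = ability + chance
        post = ability + rng.gauss(0, 10)  # post-test = same ability, fresh chance
        students.append((pre, post))
    students.sort(key=lambda s: s[0])      # lowest pre-test scores first
    selected = students[: int(n_students * bottom_fraction)]
    mean_pre = statistics.mean(s[0] for s in selected)
    mean_post = statistics.mean(s[1] for s in selected)
    return mean_pre, mean_post

mean_pre, mean_post = simulate_regression_to_mean()
print(f"bottom 10% on pre-test: pre = {mean_pre:.1f}, post = {mean_post:.1f}")
```

In this sketch the selected group improves from pre-test to post-test even though nothing was done to them, because part of what put them in the bottom 10% was bad luck that does not repeat.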

Diffusion of Treatment

• You assign kids to different conditions, but they see each others’ screens (or talk in the hallway, etc.)

• You assign classes randomly to condition within-teacher, but teachers learn strategies from the better condition and use that knowledge in the other condition
– A major study comparing curricula in Baltimore was called into question because teachers took teaching strategies from the experimental condition to the control condition

Compensatory rivalry/resentful demoralization

• Students in condition A learn about condition B, which is obviously better

• Resentful demoralization – “it’s no fair they got the better software, let’s just quit”
– More common for students

• Compensatory rivalry – “we can beat them, even if they got the better software”
– More common for teachers

Confounding

• You changed multiple things in your intervention (often inadvertently), and it’s not clear which change had the impact

• Some examples?

Confounding

• Your meta-cognitive intervention takes longer to go through
– Better learning, or just more time-on-task?

Comments? Questions?

Stratification

Pure random sampling

• Let’s say you have an intervention that you want to test in 4 groups: urban, wealthy suburban, working-class suburban, and rural students
– You have access to students in Worcester, Auburn, Ashburnham, and Cambridge
– If you just randomly sample in your population, you are going to get a lot more people from Worcester than Ashburnham
– In fact, if you sample 100 people randomly, you have a significant chance of getting nobody at all from Ashburnham
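
To see how large that chance can be, here is a rough sketch. The student counts are hypothetical placeholders, not real enrollment figures, and the function treats draws as independent, which is a close approximation when the sample is small relative to the population.

```python
# Hypothetical student counts, for illustration only (NOT real enrollment data).
population = {"Worcester": 25_000, "Cambridge": 7_000, "Auburn": 2_500, "Ashburnham": 500}

def prob_group_missing(population, group, sample_size):
    """Approximate probability that a simple random sample of `sample_size`
    people contains nobody from `group` (draws treated as independent)."""
    share = population[group] / sum(population.values())
    return (1 - share) ** sample_size

for town in population:
    print(town, round(prob_group_missing(population, town, 100), 3))
```

With these made-up numbers, the smallest town is missing from a sample of 100 roughly one time in four, while the largest is essentially never missing.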

Stratification

• Your population has N groups
• Sample randomly within each group

Proportional Stratification

• Sample from each group in proportion to its size
– e.g. randomly select
• 5% of all students in Worcester
• 5% of all students in Auburn
• 5% of all students in Cambridge
• 5% of all students in Ashburnham

Equalizing Stratification (also called “Disproportionate”)

• Sample the same number from each group, to get equal-sized groups
– e.g. randomly select
• 25 students in Worcester
• 25 students in Auburn
• 25 students in Cambridge
• 25 students in Ashburnham
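
The two stratification schemes can be sketched as follows. The roster sizes and student identifiers are invented for illustration, and the helper names are hypothetical; both helpers sample randomly within each group, and differ only in how many students they take.

```python
import random

def proportional_sample(strata, fraction, rng):
    """Draw the same fraction of members from every stratum."""
    return {name: rng.sample(members, round(len(members) * fraction))
            for name, members in strata.items()}

def equalizing_sample(strata, per_group, rng):
    """Draw the same number of members from every stratum."""
    return {name: rng.sample(members, per_group)
            for name, members in strata.items()}

rng = random.Random(0)
# Hypothetical rosters; the sizes are made up for illustration.
sizes = {"Worcester": 2000, "Cambridge": 600, "Auburn": 200, "Ashburnham": 40}
strata = {town: [f"{town}-{i}" for i in range(n)] for town, n in sizes.items()}

proportional = proportional_sample(strata, 0.05, rng)
equalized = equalizing_sample(strata, 25, rng)
print({town: len(chosen) for town, chosen in proportional.items()})
print({town: len(chosen) for town, chosen in equalized.items()})
```

Note that the equalizing version fails if any group has fewer members than the requested number, which is one practical constraint on choosing the per-group sample size.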

What variables could you stratify on? (in learning sciences)

• Gender
• Race/Ethnicity
• Prior knowledge (pre-test large group, then choose intervention sample)
• Disabilities

• Why might you want to use
– Proportional Stratification
– Equalizing Stratification
– Good Old Random Sampling

Some Reasons for Stratification

• Guarantee of representing all groups of interest in your sample

• Higher statistical power
• Discover inter-group differences in intervention’s effect

Some Reasons against Stratification

• Need to account for multiple groups in your statistical method

• Results will over-emphasize effects in rarer groups
– e.g. what if an intervention is wonderfully effective in major cities; stratification may make that effect harder to see

• Much more complicated, especially if you stratify on pre-test

Counterbalancing

• Also called “Cross-over design”

Counterbalancing

• Split your sample into groups A and B

          Time 1 (Topic 1?)    Time 2 (Topic 2?)
Group A   Control              Experimental
Group B   Experimental         Control
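
A minimal sketch of the assignment step for a cross-over design. The function name, the participant identifiers, and the choice that group A receives the control condition first are all assumptions made for illustration.

```python
import random

def assign_crossover(participants, seed=0):
    """Randomly split participants into two groups that receive the two
    conditions in opposite orders. Returns {participant: (time1, time2)}."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    schedule = {}
    for person in shuffled[:half]:
        schedule[person] = ("control", "experimental")   # group A (assumed order)
    for person in shuffled[half:]:
        schedule[person] = ("experimental", "control")   # group B (reverse order)
    return schedule

students = [f"student-{i}" for i in range(10)]  # hypothetical IDs
schedule = assign_crossover(students)
for person, (time1, time2) in sorted(schedule.items()):
    print(person, time1, time2)
```

Every participant experiences both conditions, which is what makes a within-subjects comparison possible.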

Advantages? Disadvantages?

Advantages

• Enables you to do a within-subjects statistical test
– More statistical power

• You can look at longer-term effects of your intervention (by looking at time 2 behavior in group that got experimental condition at time 1)

Disadvantages

• Statistical analysis will be complicated if there is any carry-over effect from time 1 to time 2

• Longer study
• Usually requires two versions of all material, such as two topics
– If the topics differ in difficulty or learning, there is increased variance (and thus less statistical power)

• Enthusiastically recommended by Shaaron Ainsworth in her AIED Evaluation tutorial

• For me, it has always been a disaster

Regression-Discontinuity Design

• The “that’s got to be invalid…” Design

• You conduct a pre-test
• You choose a cut-off
– Below the cut-off, you give the experimental condition
– Above the cut-off, you give the control condition

• You plot the pre-test and post-test for each condition on the same graph

[Figures: three plots of pre-test vs. post-test with the cut-off marked, showing no effect of intervention, a positive effect of intervention, and a negative effect of intervention]
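
The analysis behind such plots can be sketched as fitting one regression line on each side of the cut-off and measuring the gap between the two lines at the cut-off. This is a simplified illustration rather than a full regression-discontinuity analysis: the simulated data model (normally distributed ability plus test noise, a constant treatment effect, straight-line fits) and every name and number below are assumptions.

```python
import random

def fit_line(points):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def discontinuity(data, cutoff):
    """Gap between the two fitted lines at the cut-off (treated minus control)."""
    treated = [(pre, post) for pre, post in data if pre < cutoff]
    control = [(pre, post) for pre, post in data if pre >= cutoff]
    slope_t, intercept_t = fit_line(treated)
    slope_c, intercept_c = fit_line(control)
    return (slope_t * cutoff + intercept_t) - (slope_c * cutoff + intercept_c)

def simulate(effect, n=5000, cutoff=50, seed=0):
    """Simulated (pre, post) pairs; students below the cut-off get `effect` added."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        ability = rng.gauss(50, 10)       # assumed stable knowledge
        pre = ability + rng.gauss(0, 5)   # noisy pre-test
        post = ability + rng.gauss(0, 5)  # noisy post-test
        if pre < cutoff:
            post += effect                # experimental condition below the cut-off
        data.append((pre, post))
    return data

gap_no_effect = discontinuity(simulate(effect=0), cutoff=50)
gap_with_effect = discontinuity(simulate(effect=5), cutoff=50)
print(f"estimated gap with no effect: {gap_no_effect:.2f}")
print(f"estimated gap with a 5-point effect: {gap_with_effect:.2f}")
```

Under these assumptions the noisy pre-test flattens the slope of both lines but does not open a gap between them, so the estimated gap stays near zero when there is no effect and near the simulated effect otherwise.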

Why would you want to use this study method?

• Cases where it is unethical to give the control condition to some students
– Or where there is a real need for intervention for students at the bottom of the distribution
• Cases where the experimental condition is expected to have no effect on students who don’t need it

What are some limitations of this method?

• Complicated statistics; low statistical power
• Painful to explain to reviewers

Surprisingly…

• Regression to the mean isn’t a problem…
• It doesn’t cause discontinuities in the regression line!

Today’s Class

Assignment #4

• Any questions?
