upstream evaluations - university of york · 2019-12-20 · ‘upstream’ and ‘downstream’...

‘Upstream evaluations’: informing the development of programmes before full trial

Kathy Sylva and Fiona Jelley, Department of Education, University of Oxford

York RCT Conference

7 September 2017

Outline

1. Why ‘upstream’?

2. Standards of evidence with regard to interventions/programmes

3. Some history

• Formative evaluation

• Design experiments

• Stand-alone small-scale studies; process studies

• Pilot studies before trial

4. An iterative approach to developing a CPD programme (Diamond and Powell, 2011)

5. An experimental approach to developing a maths intervention (Nunes et al., 2007)

6. Interventions that arise from practice in non-academic contexts

The Esmée Fairbairn and Sutton Trust Parent Engagement Fund: supporting practice-based interventions through ‘upstream’ evaluations

• EasyPeasy Parent App: design, and effects on parents and children

• Parent Engagement Programme: design and problems in establishing effects

7. Moving ‘downstream’ to full trial, e.g. Education Endowment Foundation evaluations

8. Limitations, future directions

‘Upstream’ and ‘downstream’ evaluations

• Not a new concept: this talk pulls together the literature on designing and evaluating interventions

• Upstream is close to the source of the programme, e.g. early implementation of an idea, refining and adapting it, and initial testing

• Moving downstream to full scale evaluation

What is the best way to gather evidence when the programme is in the early stages of development, and money and time are in short supply? Some examples of downstream evaluations are considered to demonstrate (sometimes painfully) what they can and cannot tell us

EIF Standards of evidence: the journey

Early Intervention Foundation Evidence Standards

Formative v. Summative

Scriven (1996):

Formative “evaluations are intended - by the evaluator - as a basis for improvement – not effectiveness”

Formative evaluations are typically qualitative, e.g. one-to-one interviews with participants, focus groups, learner portfolios, unstructured observation of practice

Design experiments

• First introduced in education by Brown (1992) and Collins (1992)

• Aim: study learning in the real world rather than the lab, since real-world learning is ‘resistant to experimental control’

• Progressive refinement: placing a product or intervention in the real world to ‘see how it works’

• Dual goals: refining practice as well as constructing theory

Small scale studies, practitioner studies

As recommended by the Education Endowment Foundation:

‘It is important to make a distinction between DIY evaluation, which can be undertaken by teachers and take place in a single school or class, and other forms of evaluation such as randomised controlled trials… both forms of evaluation are useful, but they serve different purposes.’

The EEF DIY Evaluation Guide (Coe, Kime, Nevill and Coleman, 2013)

Pilot studies: two kinds

• At the very beginning of an intervention idea: just ‘trying it out’

• OR a pilot study as part of a large-scale RCT, whose aim is to establish feasibility and identify modifications to procedures/instruments needed for hypothesis testing at large scale

(1) An iterative approach to evaluation (Diamond and Powell, 2011)

Aim: develop vocabulary instruction CPD for Head Start teachers

Step 1: Self-reported approaches to vocabulary instruction: semi-structured small-group interviews. N=137 teachers, qualitative analysis

Step 2: Teacher use of e-resources through a ‘think aloud’ procedure. N=10, to identify challenges in using online resources; qualitative analysis

Step 3: Trial use of online coaching. N=5 teachers, to explore the ‘lived experience’ of coaching; qualitative analysis of experiences and quantitative analysis of timesheets

Step 4: First pilot. N=11 teachers given online resources and face-to-face coaching. Outcome: teacher compliance with protocol, assessed through self-report and coaches’ subjective assessment

Step 5: Second pilot. N=18 intervention and N=16 control teachers

Measures: teachers’ and coaches’ compliance with protocol (data analytics on use of online resources, formal qualitative interviews with coaches)

Measures: teachers’ instruction observed in both groups by researchers (counting frequency of instructional practices)

Measures: audio-taped teachers reading to children (counting frequency of instructional practices)

N.B. No outcome measures of child language, but LOTS of process measures

(2) An experimental approach to developing/testing a primary maths intervention (Nunes et al., 2007)

• ‘A causal link between children’s ability to reason logically and their mathematical understanding requires two kinds of design: longitudinal observation through tests over time, followed by experimentation.’

• The aim of the next two studies was to test a causal theory and NOT to develop an educational intervention; that came later.

Nunes, T., Bryant, P., Evans, D., Bell, D., Gardner, S., Gardner, A., & Carraher, J. (2007). The contribution of logical reasoning to the learning of mathematics in primary school. British Journal of Developmental Psychology, 25(1), 147-166.

5 bricks added to one end of the row; 5 taken away from the other end

7 + 5 - 5

Item from a test of Logical Reasoning in maths

Nunes Study 1: Longitudinal design

Participants from four schools (N=59) tested on three occasions in Years 1 and 2 (over 16 months)

Measures:

Year 1: tests of working memory, IQ, logical reasoning

Year 2: all the above, plus maths attainment on SATs and researcher-designed maths test at end of year 2

Results:

• Logical abilities and working memory predict maths achievement 16 months later

• Logical scores continued to predict maths levels after controlling for working memory, whereas working memory scores failed to predict maths levels after controlling for differences in logical ability

Nunes Study 2: Experimental Design

A researcher trained small groups of children from three of the four schools in logical reasoning for 12 weeks

Participants: N=13 intervention children who were underperforming in logical competence at the pretest screen (Note: these were new entrants to the classes from which the longitudinal participants had been drawn the previous year)

N=14 controls, taken from the previous longitudinal study, who were underperforming in logic (Note: no randomisation to treatment)

Measures and results:

Pretest: IQ, working memory, logical competence (same as in longitudinal study)

Immediate post-test: logical competence

Delayed post-test: SATs and researcher-designed maths test; scores improved (effect size 1.2 SD, Cohen’s d)

Conclusion: the experimental group made more progress than the control group on SATs and researcher-designed achievement tests after controlling for working memory and maths pretest
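The effect size above is a standardised mean difference. A minimal sketch of Cohen’s d with a pooled standard deviation follows; the function name and the example numbers are hypothetical, and the published 1.2 SD figure cannot be recomputed from the summary on this slide.

```python
from math import sqrt


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardised mean difference (treatment minus control) using the pooled SD."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / sqrt(pooled_var)


# Hypothetical example with group sizes like the study's (13 vs 14):
# a 6-point advantage on a test whose pooled SD is 5 gives d = 1.2.
d = cohens_d(56, 5, 13, 50, 5, 14)
```

An effect of this size means the average intervention child scored 1.2 pooled standard deviations above the average control child.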

Nunes et al. (2007) experiment results

Is the Nunes et al. study an ‘upstream’ evaluation?

YES, as an educational intervention

A downstream EEF evaluation followed the upstream experiment. What had to change?

• In the EEF evaluations, classrooms were randomly allocated to experimental and control conditions in a clustered design

• In the EEF evaluations, the teachers, not a researcher, carried out the classroom instruction

• Sample size was determined by power calculations that took clustering into account

• Materials were modified for mainstream schools and teacher training materials were developed
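Power calculations that take clustering into account usually inflate the sample size needed under individual randomisation by a design effect, 1 + (m - 1) × ICC, where m is the number of pupils per cluster. The sketch below uses the standard normal-approximation formula for two equal arms; the function names, effect size, cluster size and ICC are hypothetical, and a real EEF calculation would also allow for covariate adjustment and unequal cluster sizes.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_individual(d, alpha=0.05, power=0.8):
    """Pupils per arm for an individually randomised two-arm comparison (normal approximation)."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2


def design_effect(cluster_size, icc):
    """Variance inflation from randomising equal-sized clusters instead of individuals."""
    return 1 + (cluster_size - 1) * icc


def n_per_arm_clustered(d, cluster_size, icc, alpha=0.05, power=0.8):
    """Returns (pupils per arm, clusters per arm), both rounded up."""
    n = n_per_arm_individual(d, alpha, power) * design_effect(cluster_size, icc)
    return ceil(n), ceil(n / cluster_size)


# Hypothetical inputs: d = 0.2, classes of 25 pupils, ICC = 0.1.
pupils, classes = n_per_arm_clustered(0.2, 25, 0.1)
```

With these inputs the design effect is 3.4, so roughly 392 pupils per arm under individual randomisation become 1,335 pupils in 54 classes per arm.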

Supporting voluntary sector organisations in evaluation

There are many interventions developed by practitioners and teachers that do not arise in academic contexts

The Esmée Fairbairn and Sutton Trust Parent Engagement Fund

2015-2017: funded five ‘promising’ small-scale interventions, all involving parents from disadvantaged backgrounds in their children’s learning

Goal for all five: for the University of Oxford to support their movement towards ‘evidence-based practice’, with small budgets and collaborative working

https://www.suttontrust.com/our-programmes/parental-engagement-fund/

(3) The EasyPeasy study (Character Counts)

An app that encourages parents to support their preschool child’s task engagement through play.

It delivers games to the parent’s phone through the app.

The games seek to improve concentration, creativity and determination: skills linked to school readiness.

Each video centres on one example of parent-child play that will support development.


Aim of this evaluation

– What is the effect of using EasyPeasy on children’s self-regulation?

– What is the effect of using EasyPeasy on parents’ self-efficacy and parenting stress?


Study design


Participants

• 150 families recruited from 8 children's centres

• Coastal town in England

• Children aged between 2 and 6 years old (mean age: 3 yrs 7 mths)

Procedure

• Baseline measures – all parent report (paper)

• Individual randomisation (within centre) to intervention or control

• Use of app for 18 weeks; 1 new game via phone each week
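Individual randomisation within centre is a stratified design: each children’s centre is a stratum and families are allocated separately inside it, which keeps the arms balanced centre by centre. The slides do not say how the allocation sequence was generated, so the sketch below shows one simple way (shuffle, then alternate arms from a randomly chosen starting arm); the function name, family IDs and centre IDs are hypothetical.

```python
import random
from collections import defaultdict


def randomise_within_centre(families, seed=None):
    """families: list of (family_id, centre_id) pairs (hypothetical IDs).
    Returns {family_id: "intervention" or "control"}, balanced within each centre."""
    rng = random.Random(seed)
    by_centre = defaultdict(list)
    for family_id, centre_id in families:
        by_centre[centre_id].append(family_id)

    allocation = {}
    for centre_id, ids in by_centre.items():
        ids = list(ids)
        rng.shuffle(ids)               # random order within the centre
        arms = ["intervention", "control"]
        rng.shuffle(arms)              # random starting arm, so odd extras don't always go to one arm
        for i, family_id in enumerate(ids):
            allocation[family_id] = arms[i % 2]
    return allocation


# Hypothetical example: 8 centres with 5 families each.
families = [(f"F{c}{k}", f"C{c}") for c in range(8) for k in range(5)]
allocation = randomise_within_centre(families, seed=2015)
```

Fixing the seed makes the allocation reproducible, which helps when the sequence has to be audited later.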

Post-intervention

• Parent-report questionnaires 4-6 weeks post-intervention (online)

• Control group given access to the app after follow-up

Measures

• Child Self-regulation and Behaviour Questionnaire (CSBQ): parent report on a 5-point rating scale (Howard & Melhuish, 2016)

• Behavioural self-regulation (8 items)

• Cognitive self-regulation (6 items)

• Emotional self-regulation (7 items)

• Tool to measure Parenting Self-Efficacy (TOPSE): parent report on a 6-point scale (Kendall & Bloomfield, 2005)

• Play and enjoyment (6 items)

• Control (6 items)

• Discipline and Boundaries (6 items)

• Parenting Stress Index (PSI): parent report on a 5-point scale (Abidin, 1995)

• Parent-child interaction subscale (12 items)

• Demographic information collected at baseline via questionnaires

Participant flow through the trial

Enrolment: recruited (n=170 from 8 centres); baseline measures collected; excluded (n=20, not meeting inclusion criteria); randomised within centre (n=150)

Allocation:
• Allocated to intervention (n=75): received allocated intervention (n=70); later found to be ineligible (n=5)
• Allocated to control (n=75): received the app after data collection (n=74); later found to be ineligible (n=1)

Follow-up and analysis:
• Intervention: lost to follow-up (n=36); completed post-test and analysed (n=34)
• Control: lost to follow-up (n=33); completed post-test and analysed (n=41)
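The flow counts can be turned into the attrition figures usually reported for trials: overall attrition and the difference between arms. The sketch below uses the counts from the flow diagram and takes the number randomised to each arm (75) as the denominator; using the number eligible instead would give slightly different rates. The function name is hypothetical.

```python
def attrition(randomised, analysed):
    """Share of randomised participants missing from the analysis."""
    return (randomised - analysed) / randomised


# Counts from the participant flow: (randomised, analysed) per arm.
arms = {"intervention": (75, 34), "control": (75, 41)}
rates = {arm: attrition(r, a) for arm, (r, a) in arms.items()}
overall = attrition(150, 34 + 41)
differential = abs(rates["intervention"] - rates["control"])

for arm, rate in rates.items():
    print(f"{arm}: {rate:.1%}")
print(f"overall: {overall:.1%}, differential: {differential:.1%}")
```

This prints 54.7% attrition for the intervention arm and 45.3% for the control arm, 50.0% overall and a 9.3-point differential, which is why retention heads the list of limitations.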

Participant characteristics

All families with pre- and post-test (as analysed)

Characteristic | Intervention (n=34) | Control (n=41)
Parent gender (female) | 32 (94.1%) | 39 (95.1%)
Parent age | 33.7 (6.19) | 34.3 (6.09)
Parent ethnicity (White British) | 23 (69.7%) | 27 (65.9%)
Marital status (married, civil partnered, cohabiting) | 23 (67.6%) | 34 (82.9%)
Highest qualification: GCSE or below | 8 (23.5%) | 14 (34.1%)
Highest qualification: vocational 16-18 | 5 (14.7%) | 1 (2.4%)
Highest qualification: academic 16-18 | 6 (17.6%) | 4 (9.8%)
Highest qualification: degree or higher | 12 (35.3%) | 17 (41.5%)
Highest qualification: other | 3 (8.8%) | 5 (12.2%)
Employed (yes) | 18 (52.9%) | 21 (51.2%)
Partner employed | 21 (91.3%) | 32 (97%)
Home ownership | 17 (51.5%) | 19 (47.5%)
Relationship to child (biological) | 32 (100%) | 38 (95%)
Child gender (girls) | 16 (48.5%) | 18 (43.9%)
Child age in months | 44.09 (8.48) | 41.71 (8.12)
Child disability (yes) | 1 (3%) | 3 (7.3%)
Child ethnicity (White British) | 25 (78.1%) | 32 (80%)
Language spoken at home (only English) | 23 (69.7%) | 28 (70%)

No significant differences in demographic characteristics between the two groups

Analysis

75 families provided post-test data and were included in the analysis (Note: high attrition, but post hoc tests showed only small differences between drop-outs and retainees)

Analysis of Covariance (ANCOVA) was conducted, controlling for pre-test score, child’s gender, age and children’s centre

Nesting of children in centres was taken into account (centre entered as a fixed effect at child level). No centre effects were found
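ANCOVA is equivalent to a linear regression of the post-test score on a 0/1 group indicator plus covariates; the coefficient on the group indicator is the covariate-adjusted difference between arms. The stdlib-only sketch below fits that regression by solving the normal equations. The function names are hypothetical, the data are synthetic, it returns only the point estimate (no standard errors or p-values), and a categorical covariate such as children’s centre would first need dummy coding into several columns. In practice a statistics package would be used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ancova_group_effect(post, group, covariates):
    """OLS of post on [intercept, group (0/1), *covariates]; returns the group
    coefficient, i.e. the covariate-adjusted intervention-control difference."""
    rows = [[1.0, g, *cov] for g, cov in zip(group, covariates)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, post)) for i in range(k)]
    return solve(xtx, xty)[1]


# Synthetic data: post = 1 + 0.5*group + 0.8*pre + 0.1*age with no noise,
# so the adjusted group effect should come back as 0.5.
group = [0, 0, 0, 0, 1, 1, 1, 1]
pre = [1, 2, 3, 4, 2, 3, 4, 5]
age = [40, 45, 38, 50, 42, 47, 39, 44]
post = [1 + 0.5 * g + 0.8 * p + 0.1 * a for g, p, a in zip(group, pre, age)]
effect = ancova_group_effect(post, group, [[p, a] for p, a in zip(pre, age)])
```

Here the intervention group starts a point higher on the pre-test, so the raw post-test difference overstates the effect; adjusting for pre-test recovers the true 0.5.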


Results

Outcome | Intervention n | Intervention pre-test mean (SD) | Intervention post-test mean (SD) | Control n | Control pre-test mean (SD) | Control post-test mean (SD) | p | Effect size: Hedges’ g (CI)
TOPSE play & enjoyment | 31 | 4.19 (.73) | 4.36 (.54) | 41 | 4.03 (.73) | 4.26 (.79) | ns | 0.20 (-0.26, 0.66)
TOPSE control | 31 | 3.23 (.73) | 3.46 (.76) | 41 | 3.22 (.89) | 3.23 (.70) | ns | 0.39 (-0.04, 0.82)
TOPSE discipline and boundaries | 31 | 3.44 (.82) | 3.70 (.67) | 41 | 3.49 (.69) | 3.39 (.83) | p<.05 | 0.51 (0.12, 0.90)
PSI parent-child dysfunctional interaction | 30 | 1.67 (.46) | 1.64 (.63) | 41 | 1.78 (.61) | 1.76 (.55) | ns | 0.20 (-0.24, 0.64)
CSBQ behavioural self-regulation | 31 | 3.36 (.65) | 3.29 (.75) | 41 | 3.17 (.64) | 3.05 (.65) | ns | 0.26 (-0.13, 0.65)
CSBQ cognitive self-regulation | 31 | 3.45 (.65) | 3.64 (.69) | 41 | 3.53 (.62) | 3.42 (.54) | p<.05 | 0.44 (0.01, 0.87)
CSBQ emotional self-regulation | 31 | 3.53 (.61) | 3.46 (.75) | 41 | 3.33 (.67) | 3.21 (.63) | ns | 0.31 (-0.14, 0.76)

Covariates: child’s age, gender, children’s centre and pre-test scores
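Hedges’ g is Cohen’s d multiplied by a small-sample correction factor J = 1 - 3 / (4(n1 + n2) - 9). The sketch below computes g and an approximate confidence interval from group means, SDs and sizes, using a common large-sample variance formula. The function name is hypothetical, and because the table’s estimates come from the covariate-adjusted ANCOVA, this unadjusted sketch will not reproduce them.

```python
from math import sqrt
from statistics import NormalDist


def hedges_g(m1, s1, n1, m2, s2, n2, level=0.95):
    """Bias-corrected standardised mean difference (group 1 minus group 2)
    with an approximate confidence interval."""
    df = n1 + n2 - 2
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction, equal to 1 - 3/(4(n1+n2) - 9)
    g = j * d
    # Large-sample variance of d, scaled by j**2 for g.
    se = j * sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return g, (g - z * se, g + z * se)


# Raw (unadjusted) post-test figures for TOPSE discipline and boundaries.
g, ci = hedges_g(3.70, 0.67, 31, 3.39, 0.83, 41)
```

On the raw post-test means this gives g of roughly 0.40 with an interval that crosses zero, compared with the adjusted 0.51 (0.12, 0.90) in the table: one illustration of how much adjusting for pre-test scores can sharpen a small study.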

(Too many) limitations

Low retention between recruitment and follow-up

Small sample, highly localised

Constrained by parent-report measures, with no direct child measure of self-regulation

No measure of parents’ use of the app (needed to study the ‘process’ of change)


Discussion and next steps

• Promising ‘early stage’ findings indicate the potential of light-touch digital interventions for parents

• Possible relationship between parents’ reported boundary setting and children’s cognitive self-regulation (Pino-Pasternak & Whitebread, 2010)

• Next step: an ongoing EEF-funded RCT with more robust evaluation methods

• Further studies could use observational measures of children’s self-regulation, or (more objective?) teacher rating scales

• Collect detailed information (videos, daily diaries) on parents’ use of the app at home


(4) The Parent Engagement intervention

• A social enterprise delivering training to schools and early years settings to support staff in parental engagement.

• The training helps staff to support parents to engage in their children’s learning, with a focus on disadvantaged parents.

• The intervention is delivered by staff over a school year.

• 2-3 parent workshops per term, plus resources and activities for families to take home.

• Intervention and control groups in nursery classes.


Aim of the study

To assess the effects of the intervention on children and parents:

– What is the effect of the intervention on the child’s readiness for school?

– What is the effect of the intervention on the family’s home learning environment?

– How did schools respond to the intervention and engage parents? (collected by the intervention team)


Study design and methods

Design

• Small-scale, school-randomised evaluation

• Whole schools allocated to intervention or waiting-list control

• Minimised on: % FSM, % EAL, parental engagement experience, school size, LA

Participants

• 20 schools recruited from four Local Authorities in the North West

• Each school recruited 10 families with a 3-4 year old, targeting Pupil Premium children

• Final sample: 167 families from 18 schools

Measures

• Parent and teacher report questionnaires at beginning and end of school year

• Home Learning Environment (HLE) questionnaire completed by parents

• Brief Early Skills and Support Index (BESSI) completed by child’s teacher

Procedure

• Staff from intervention schools trained in the intervention

• Intervention delivered by school staff to families over one academic year

• Control schools offered training the following year
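Minimisation allocates each school to whichever arm would leave the groups least imbalanced on the chosen factors, usually with a random element. The slides do not describe the exact algorithm used, so the sketch below is a simple Pocock-Simon-style version under stated assumptions: continuous factors such as % FSM, % EAL or school size are assumed to have been banded into categories first, and the function name, factor names, bands, the probability of picking the better arm (`p_best`) and the example schools are all hypothetical.

```python
import random

ARMS = ("intervention", "control")


def minimise(schools, factors, p_best=0.8, seed=None):
    """Allocate schools one at a time, Pocock-Simon style.

    schools: list of dicts mapping each factor name to a category (hypothetical bands).
    Returns the arm assigned to each school, in order.
    """
    rng = random.Random(seed)
    # counts[arm][factor][level] = schools already in that arm with that level
    counts = {arm: {f: {} for f in factors} for arm in ARMS}
    allocation = []
    for school in schools:
        imbalance = {}
        for arm in ARMS:
            other = ARMS[1] if arm == ARMS[0] else ARMS[0]
            # Sum over factors of the imbalance at this school's level if it joined `arm`.
            imbalance[arm] = sum(
                abs(counts[arm][f].get(school[f], 0) + 1 - counts[other][f].get(school[f], 0))
                for f in factors
            )
        if imbalance[ARMS[0]] == imbalance[ARMS[1]]:
            arm = rng.choice(ARMS)  # tie: toss a coin
        else:
            best = min(ARMS, key=imbalance.get)
            worst = ARMS[1] if best == ARMS[0] else ARMS[0]
            arm = best if rng.random() < p_best else worst
        for f in factors:
            counts[arm][f][school[f]] = counts[arm][f].get(school[f], 0) + 1
        allocation.append(arm)
    return allocation


# Hypothetical schools with banded factors (bands are assumptions, not the study's).
factors = ["fsm_band", "eal_band", "experience", "size_band", "la"]
rng = random.Random(1)
schools = [
    {"fsm_band": rng.choice(["low", "high"]), "eal_band": rng.choice(["low", "high"]),
     "experience": rng.choice(["yes", "no"]), "size_band": rng.choice(["small", "large"]),
     "la": rng.choice("ABCD")}
    for _ in range(20)
]
allocation = minimise(schools, factors, seed=7)
```

Keeping `p_best` below 1 preserves some unpredictability, so that whoever recruits the next school cannot infer its allocation from the current balance.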

Measures

Early Years Home Learning Environment (HLE): parent report (Sylva et al., 2010)

– 7-item questionnaire assessing frequency of learning activities at home

– Rated on 0-7 scale; total score out of 49

Brief Early Skills and Support Index (BESSI): teacher report (Hughes & White, 2015)

– 4-point rating scale

– High score indicates a problem

– Four subscales: Behavioural Adjustment, Language & Cognition, Daily Living Skills, Family Support

Demographic information collected at baseline

Qualitative data collected on parents’ and schools’ response


Participant flow through the trial

Enrolment: schools expressed interest (n=20); randomised (n=20)

Allocation:
• Allocated to intervention (n=10): dropped out (n=1); recruited families (n=9 schools); families completed pre-test measures (n=84)
• Allocated to waiting list comparison (n=10): dropped out (n=1); recruited families (n=9 schools); families completed pre-test measures (n=83)

Follow-up:
• Intervention: schools lost to follow-up (n=0); families lost to follow-up (n=19)
• Comparison: schools lost to follow-up (n=0); families lost to follow-up (n=21)

Analysis:
• Intervention: analysed (n=9 schools); families in analysis (n=65)
• Comparison: analysed (n=9 schools); families in analysis (n=62)

Participant characteristics

Columns: all families in the trial (as randomised), families with follow-up (as analysed), families lost to follow-up (attrition)

Characteristic | Randomised: intervention (n=84) | Randomised: comparison (n=83) | Analysed: intervention (n=65) | Analysed: comparison (n=62) | Lost: intervention (n=19) | Lost: comparison (n=21)
Child gender (girls) | 34 (40.5%) | 43 (53.1%) | 30 (46.2%) | 31 (50%) | 4 (21.1%) | 12 (63.2%)
Child age in months | 43.2 (3.6) | 44.2 (3.7) | 43.4 (3.5) | 44.0 (3.8) | 42.4 (3.7) | 44.7 (3.4)
Child ethnicity (White European) | 40 (53.3%) | 34 (47.2%) | 36 (59%) | 24 (44.4%) | 4 (28.6%) | 10 (55.6%)
Language spoken at home (only English) | 45 (57.0%) | 55 (70.5%) | 41 (64.1%) | 40 (64.5%) | 4 (26.7%) | 15 (93.8%)
Pupil Premium funded | 62 (84.9%) | 48 (87.3%) | 52 (85.2%) | 35 (85.4%) | 10 (83.3%) | 13 (92.9%)
Has an older sibling | 63 (78.8%) | 62 (76.5%) | 52 (80%) | 47 (75.8%) | 11 (73.3%) | 15 (78.9%)
Has special educational needs | 6 (7.7%) | 7 (8.6%) | 4 (6.3%) | 5 (8.1%) | 2 (14.3%) | 2 (10.5%)
Parent gender (female) | 55 (87.3%) | 67 (89.3%) | 48 (90.6%) | 51 (91.1%) | 7 (70%) | 16 (84.2%)

Notes: values are numbers (valid % in brackets) for categorical variables and mean (SD) for numerical variables. No significant differences between intervention and comparison groups in the analysed sample.

Pre- and post-test means

Measure | Group | n | Pre-test mean (SD) | Post-test mean (SD) | Sig. different at baseline
Home Learning Environment score | Intervention | 54 | 24.01 (10.26) | 30.13 (8.41) | No
Home Learning Environment score | Control | 59 | 26.49 (9.12) | 27.06 (9.89) | No
BESSI¹ Behavioural Adjustment subscale | Intervention | 63 | 2.30 (.47) | 1.89 (.49) | No
BESSI¹ Behavioural Adjustment subscale | Control | 65 | 2.14 (.66) | 1.94 (.58) | No
BESSI¹ Language and Cognition subscale | Intervention | 63 | 2.54 (.53) | 1.67 (.51) | Yes
BESSI¹ Language and Cognition subscale | Control | 65 | 2.3 (.53) | 1.67 (.47) | Yes
BESSI¹ Daily Living Skills subscale | | | | |
BESSI¹ Family Support subscale | | | | |

¹ A high score on the BESSI subscales indicates a problem

Analysis considerations

• Some attrition between pre- and post-test: N=113-133 at post-test, depending on measure (recruited sample: 167)

• Need to deal with clustered data: is a two-level analysis appropriate for this small sample? Or robust standard errors?

• Analyse differences between school means instead of individuals?

• How best to deal with this messy study?
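One option raised above is to analyse school means rather than individuals. The sketch below aggregates hypothetical pupil records to one mean per school and compares arms with a pooled-variance t statistic on k1 + k2 - 2 degrees of freedom (16 with 9 schools per arm). The function names and records are hypothetical; the approach gives every school equal weight regardless of how many families it contributed, and the p-value would come from the t distribution, e.g. `scipy.stats.t.sf`.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def school_means(records):
    """records: iterable of (school_id, arm, score). Returns {arm: [one mean per school]}."""
    scores, arm_of = defaultdict(list), {}
    for school, arm, score in records:
        scores[school].append(score)
        arm_of[school] = arm
    by_arm = defaultdict(list)
    for school, vals in scores.items():
        by_arm[arm_of[school]].append(mean(vals))
    return by_arm


def cluster_t(means_a, means_b):
    """Pooled-variance two-sample t statistic on cluster means; returns (t, df)."""
    k1, k2 = len(means_a), len(means_b)
    pooled = ((k1 - 1) * variance(means_a) + (k2 - 1) * variance(means_b)) / (k1 + k2 - 2)
    t = (mean(means_a) - mean(means_b)) / sqrt(pooled * (1 / k1 + 1 / k2))
    return t, k1 + k2 - 2


# Hypothetical pupil-level HLE scores for four schools.
records = [("S1", "int", 10), ("S1", "int", 12), ("S2", "int", 13),
           ("S3", "ctl", 7), ("S3", "ctl", 9), ("S4", "ctl", 10)]
means = school_means(records)
t, df = cluster_t(means["int"], means["ctl"])
```

The trade-off is transparency against power: the analysis respects the unit of randomisation, but with 18 schools it has few degrees of freedom.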


Why we worry about nesting: ICC

Measure | ICC in null model (N = 18 clusters)
Home Learning Environment (HLE) | 0.11
BESSI Behavioural Adjustment | 0.18
BESSI Language and Cognition | 0.27
BESSI Daily Living Skills | 0.16
BESSI Family Support | 0.27
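The table reports ICCs from a null multilevel model. A quick way to get a comparable figure without multilevel software is the one-way ANOVA estimator, sketched below for clusters of unequal size; it is a different estimator from the one used for the table, so values can differ somewhat, and the function name and example data are hypothetical. The size of the worry follows from the design effect, 1 + (m - 1) × ICC: with about 7 analysed families per school (127 across 18 schools), an ICC of 0.27 implies a design effect of roughly 2.6, so the effective sample is well under half the nominal one.

```python
from statistics import mean


def icc_oneway(clusters):
    """ANOVA estimator of ICC(1) for clusters of possibly unequal size.

    clusters: list of lists of scores, one inner list per school.
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    cluster_means = [mean(c) for c in clusters]
    grand_mean = mean(x for c in clusters for x in c)
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, cluster_means))
    ss_within = sum((x - m) ** 2 for c, m in zip(clusters, cluster_means) for x in c)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # "Average" cluster size, adjusted for imbalance in cluster sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical scores for two schools with clearly different means.
icc = icc_oneway([[1, 2, 3], [4, 5, 6]])
```

With only 18 clusters, any ICC estimate is itself imprecise, which is part of why the choice of analysis above is hard.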

Limitations are many

Small sample of schools, constrained by the capacity of the training team

Nature of intervention necessitated cluster randomisation, but only 10 schools per arm

Limited by parent-report measures

Too limited for further statistical analysis?


Discussion and summary

The upstream stage of evaluation is important, especially if we wish to include practitioner-based interventions that do not come with a long and distinguished academic history

Our conclusions so far

1. Don’t wait too long before outcome measurement

2. Importance of a counterfactual

3. Importance of valid measures attuned to the intervention

4. Difficult to take account of nesting in small-scale studies (and Nunes et al. didn’t even consider it!), but important that we try

5. Sound exploration of recruitment and engagement procedures

6. Lack of a significant effect does not mean the intervention does not work (possible Type II error)

7. A cosy relationship between the intervention and evaluation teams is a mixed blessing.
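Point 6 is about statistical power: with small samples, a real effect can easily fail to reach significance. The sketch below approximates the power of a two-sided two-group comparison using the normal distribution, with an optional design effect for clustered designs. The function name is hypothetical, and the true effect size in the example (d = 0.3) is an assumed value, not an estimate from either study.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n1, n2, alpha=0.05, deff=1.0):
    """Normal-approximation power of a two-sided comparison of two group means.

    d: assumed true standardised effect size; deff: design effect for clustered
    designs, which shrinks the effective sample sizes.
    """
    z = NormalDist()
    n1_eff, n2_eff = n1 / deff, n2 / deff
    noncentrality = abs(d) * sqrt(n1_eff * n2_eff / (n1_eff + n2_eff))
    # Ignores the negligible chance of significance in the wrong direction.
    return z.cdf(noncentrality - z.inv_cdf(1 - alpha / 2))


power = approx_power(0.3, 31, 41)  # EasyPeasy-sized groups
type_ii = 1 - power                # chance of missing a true effect of this size
```

With 31 and 41 families, as in the EasyPeasy analysis, the approximate power to detect d = 0.3 is only about 0.24, so a non-significant result would be unsurprising even if an effect of that size were real.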


Thank you

http://www.education.ox.ac.uk/research/fell/

http://easypeasyapp.com/


Cognitive self-regulation rating scale: items

Persists with difficult tasks

Chooses activities on their own

Does not need much help with tasks

Persists with tasks until completed

Waits their turn in activities

Likes to work things out for self

CSBQ; Howard, & Melhuish, E. (2016). An Early Years Toolbox for assessing early executive function, language, self-regulation, and social development: validity, reliability, and preliminary norms. Journal of Psychoeducational Assessment, 10.


References

Abidin, R. R. (1995). Parenting Stress Index (3rd ed.). Odessa, FL: Psychological Assessment Resources.

Brown, A. L. (1992). Design experiments: Theoretical and methodological challenges in creating complex interventions in classroom settings. The Journal of the Learning Sciences, 2(2), 141-178.

Coe, R., Kime, S., Nevill, C., & Coleman, R. (2013). The DIY Evaluation Guide. Education Endowment Foundation.

Collins, A. (1992). Toward a design science of education. In New directions in educational technology (pp. 15-22). Springer, Berlin, Heidelberg.

Diamond, K. E., & Powell, D. R. (2011). An iterative approach to the development of a professional development intervention for Head Start teachers. Journal of Early Intervention, 33(1), 75-93.

EIF Evidence Standards. (n.d.). Retrieved from http://www.eif.org.uk/eif-evidence-standards/

Howard, & Melhuish, E. (2016). An Early Years Toolbox for assessing early executive function, language, self-regulation, and social development: validity, reliability, and preliminary norms. Journal of Psychoeducational Assessment, 10.

Kendall, S., & Bloomfield, L. (2005). Developing and validating a tool to measure parenting self-efficacy. Journal of Advanced Nursing, 51, 174-181.

Nunes, T., Bryant, P., Evans, D., Bell, D., Gardner, S., Gardner, A., & Carraher, J. (2007). The contribution of logical reasoning to the learning of mathematics in primary school. British Journal of Developmental Psychology, 25(1), 147-166.

Pino-Pasternak, D., & Whitebread, D. (2010). The role of parenting in children’s self-regulated learning. Educational Research Review, 5, 220-242.

Scriven, M. (1996). Types of evaluation and types of evaluator. Evaluation Practice, 17(2), 151-161.
