Epidemiological Analysis Workshop by Dr Suzanne Campbell

Training workshop: Epidemiological analysis Dr Suzy Campbell Postdoctoral Research Associate LSTM, Liverpool [email protected]

Upload: countdown-on-ntds

Post on 22-Jan-2018

Page 1: Epidemiological Analysis Workshop By Dr Suzanne Campbell

Training workshop: Epidemiological analysis

Dr Suzy Campbell

Postdoctoral Research Associate LSTM, Liverpool [email protected]

Overview of Workshop

Welcome, introduction and overview

Brief introduction to epidemiological data

Key measures encountered in helminthology

Epidemiological data collection

Designing an analysis plan for your work

Useful sources of information – where to go next?

Final thoughts, questions and close

Epidemiology

The ‘who’, ‘what’, ‘where’, ‘when’, ‘why’, and ‘how’ of disease

Field epidemiology – study design, data collection

Analytical epidemiology – analysing the results of the data to meet the stated research objectives (quantitative and qualitative epidemiology, biostatistics)

- Determine disease distributions in populations

- Determine associations between outcome (eg disease status) and explanatory variables (factors of interest/’determinants’ of disease)

Extremely useful (often, essential) for targeting control strategies appropriately

Introduction to epidemiological data

Baseline surveying useful:

- To find out what disease/s present in which areas

- To define where interventions are needed (may not be all areas: can rank order of priority)

- To estimate intervention (drug) requirements

- To determine the frequency of interventions

Follow-up surveying useful:

- To assess changes in diseases (and intervention requirements) over time

- To assess whether the intervention is working (reductions in morbidity, increases in coverage)

- To know when you can stop (theoretically)

Epidemiological measures encountered in helminthology

Prevalence of schistosomiasis and STH by area (and infection intensity): stool and urine examination

Also essential to collect information on other risk factors (eg demography, WASH, environmental variables etc)

Treatment coverage data (usually school-based) AND age and sex

School-based data: also interested in

• Size of school-aged population

• Number of schools (and teachers) by district/province

• School enrolment rate
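
Prevalence and treatment coverage are both simple proportions, so one helper can report either with a confidence interval. The sketch below is only an illustration: the counts are made up, the function name `proportion_with_ci` is hypothetical, and the Wilson score interval is one reasonable choice of interval that the slides do not specify.

```python
import math


def proportion_with_ci(positives, total, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = positives / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, centre - half_width, centre + half_width


# Hypothetical counts, for illustration only (not from the workshop data).
prevalence, prev_lo, prev_hi = proportion_with_ci(positives=30, total=120)
coverage, cov_lo, cov_hi = proportion_with_ci(positives=450, total=500)

print(f"Prevalence: {prevalence:.1%} (95% CI {prev_lo:.1%} to {prev_hi:.1%})")
print(f"Coverage:   {coverage:.1%} (95% CI {cov_lo:.1%} to {cov_hi:.1%})")
```

The same function works per area or per school once the counts are tabulated by that grouping.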

The ‘evidence pyramid’ of study designs

Image credit: http://libraryguides.unh.edu/health-literacy/pyramid

Most evidence tends to be observational: you CANNOT attribute causality to observational data

Few experimental studies are undertaken: you CAN attribute causality (if well designed and analysed)

Few systematic reviews/meta-analyses (observational evidence)

Epidemiological data collection

1. Know your research question/s at the BEGINNING!

   1. That way you are collecting the right data

   2. What do you need, what don’t you need, who is the population (inclusion/exclusion)

   3. Where could the mistakes be made

   4. Quality quality quality!

2. Design data collection forms to collect this information

   1. Use sources of information eg literature review, expert advice

   2. Test them! Pre-testing, pilot testing before you commence

   3. Be open to adaptation (but STRONG note of caution here!)

Some examples of data collection forms for helminth surveys

Laboratory specific

Electronic data capture with ODK Collect

Using Open Data Kit (ODK) Collect and ‘tablets’ to collect data

So you’ve collected the data: now what?

We need to carry out an analysis (often more than one) to answer certain questions. But where do we start?

• What analyses should be carried out?

• What statistical techniques are appropriate to use?

• Will these techniques be sufficient to answer the research questions?

• We need to choose the correct methods of analysis for each question.

• It is POOR PRACTICE to apply statistical techniques without thinking about their relevance to the research question and the study design.

You need an analysis plan: this essentially involves thinking ahead, from the time of designing the study, to determine the nature of the analysis to be conducted.

Designing an analysis plan for your study

1. This should be done AT THE BEGINNING

2. Know your data

   1. What is your research question (null hypothesis)?

   2. What are your outcome variables? Your explanatory variables?

   3. Always start with investigating the variables individually – describe them

   4. Then move on to consideration of two variables – is there anything happening?

   5. Then move on to consideration of modelling (univariable before multivariable)

ALWAYS: start simple – explore your data, do descriptive analyses, build up to more advanced modelling. NEVER just do statistical modelling without basic investigations!

Steps to designing an analysis plan

1. List the objective

2. Describe the study design

3. Identify variables of interest

4. Define inclusion/exclusion criteria

5. Explore variable(s) needed

6. Examine relationships of interest

7. Dummy tables and figures

8. Going further: confounding and effect modification, statistical modelling

Example:

• A study investigating prevalence, infection intensity and risk factors associated with schistosomiasis and soil-transmitted helminths (STH), in Barombi Kotto and Barombi Mbo crater lakes, Cameroon

• Cross-sectional schistosomiasis and STH study undertaken in May-June 2016

Steps to designing an analysis plan

1. List the objective/s

   1. Describe the prevalence and infection intensity of schistosomiasis in the populations of Barombi Mbo and Barombi Kotto

   2. Identify some of the risk factors associated with schistosomiasis (water contact behaviours)

2. Describe the study design

“The schistosomiasis and STH study is a cross-sectional study that was undertaken in May-June 2016 in the two crater lakes of Barombi Mbo and Barombi Kotto, in South West Region, Cameroon. It involved 338 children and adults who supplied a faecal and urine sample at commencement, for the purpose of ascertaining prevalence and intensity of schistosomiasis and STH. They also completed an interviewer-administered questionnaire.”

Steps to designing an analysis plan

3. Identify variables of interest:

Haematobium (binary), Mansoni (binary), Eggsper10mL (quantitative continuous), age (quantitative continuous), sex (categorical nominal), B1_Swim (categorical ordinal), B2_Wash (categorical ordinal), B3_Fish (categorical ordinal), B4_Other (categorical ordinal)

Get to know your data – don’t be scared of it!

• What kind of variables do you have? Categorical, binary, ordinal, continuous etc

• What are your outcome variables?

• What are your explanatory variables?

Are all variables listed in a codebook for quick reference? If not, write this!

Variable name    Code

P_Number         Person’s unique ID number. NB: People with the same number (but different letters) are in the same family
Area             BM = Barombi Mbo; BK = Barombi Kotto
Gender           F = Female; M = Male
Age              Age in years
Agecohort        Categorised age: 1 = Preschool-aged (1 to ≤6 years); 2 = School-aged (>6 to ≤18 years); 3 = Adults (>18 years)
Haematobium      0 = No S. haematobium; 1 = S. haematobium; 888 = No data
Eggsper10mL      Number of eggs
Mansoni          0 = No S. mansoni; 1 = S. mansoni; 888 = No data
Guineensis       0 = No S. guineensis; 1 = S. guineensis; 888 = No data
B1_Swim          Swims in lake: 1 = Never; 2 = Monthly; 3 = Weekly; 4 = Daily; 5 = Refused; 6 = Unsure; 888 = No data
B2_Wash          Washes self in lake: 1 = Never; 2 = Monthly; 3 = Weekly; 4 = Daily; 5 = Refused; 6 = Unsure; 888 = No data
B3_Fish          Fishes in lake: 1 = Never; 2 = Monthly; 3 = Weekly; 4 = Daily; 5 = Refused; 6 = Unsure; 888 = No data
B4_Other         Enters lake for any other reason: 1 = Never; 2 = Monthly; 3 = Weekly; 4 = Daily; 5 = Refused; 6 = Unsure; 888 = No data
B4Otherreason    Other reason for entering the lake: As written
B5Home_water     Source of water used at home: 1 = Lake; 2 = Bore hole; 3 = Pool; 4 = Well; 5 = Pump; 6 = Other; 888 = No data
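
A codebook like this can also be kept in machine-readable form so that raw codes are decoded consistently and the 888 "No data" code is never mistaken for a real value. The following is a minimal sketch under that assumption; the names `MISSING_CODE`, `FREQUENCY_LABELS` and `decode` are hypothetical, and the raw values are invented for illustration.

```python
MISSING_CODE = 888  # "No data" in the codebook

# Labels shared by B1_Swim, B2_Wash, B3_Fish and B4_Other in the codebook.
FREQUENCY_LABELS = {
    1: "Never",
    2: "Monthly",
    3: "Weekly",
    4: "Daily",
    5: "Refused",
    6: "Unsure",
}


def decode(value, labels):
    """Map a raw code to its label; the missing code becomes None."""
    if value == MISSING_CODE:
        return None
    if value not in labels:
        # An unexpected code usually signals a data-entry error worth checking.
        raise ValueError(f"Unexpected code: {value!r}")
    return labels[value]


raw_b1_swim = [1, 4, 888, 3, 6]  # made-up responses, for illustration only
decoded = [decode(v, FREQUENCY_LABELS) for v in raw_b1_swim]
print(decoded)
```

Raising an error on unknown codes, rather than silently passing them through, is a design choice that surfaces entry mistakes early.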

Steps to designing an analysis plan

4. Define inclusion/exclusion criteria

• Exclude those with missing stool or urine results for prevalence and intensity analysis

• Exclude those without questionnaire answers for risk factor analysis

• Exclude all children aged less than 1 year
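
These criteria can be written as explicit filter functions so that the same rules are applied every time the analysis is re-run. This is a sketch with several assumptions that are not in the slides: records are plain dictionaries keyed by the codebook variable names, Haematobium and Mansoni stand in for the urine and stool results, a questionnaire counts as unanswered if any of the four water-contact items is coded 888, and the example records are invented.

```python
MISSING = 888  # "No data" code from the codebook

# Assumed to represent the questionnaire for risk-factor analysis.
QUESTIONNAIRE_FIELDS = ("B1_Swim", "B2_Wash", "B3_Fish", "B4_Other")


def eligible_for_prevalence(rec):
    """Aged 1 year or more, with both urine and stool results present."""
    return (
        rec["Age"] >= 1
        and rec["Haematobium"] != MISSING
        and rec["Mansoni"] != MISSING
    )


def eligible_for_risk_factors(rec):
    """Aged 1 year or more, with all questionnaire items answered."""
    return rec["Age"] >= 1 and all(rec[f] != MISSING for f in QUESTIONNAIRE_FIELDS)


# Invented records, for illustration only.
records = [
    {"P_Number": "001A", "Age": 0.5, "Haematobium": 0, "Mansoni": 0,
     "B1_Swim": 1, "B2_Wash": 1, "B3_Fish": 1, "B4_Other": 1},
    {"P_Number": "002A", "Age": 9, "Haematobium": 1, "Mansoni": 0,
     "B1_Swim": 4, "B2_Wash": 4, "B3_Fish": 1, "B4_Other": 1},
    {"P_Number": "003A", "Age": 34, "Haematobium": 888, "Mansoni": 0,
     "B1_Swim": 3, "B2_Wash": 3, "B3_Fish": 2, "B4_Other": 1},
    {"P_Number": "004A", "Age": 12, "Haematobium": 0, "Mansoni": 0,
     "B1_Swim": 888, "B2_Wash": 888, "B3_Fish": 888, "B4_Other": 888},
]

prevalence_ids = [r["P_Number"] for r in records if eligible_for_prevalence(r)]
risk_factor_ids = [r["P_Number"] for r in records if eligible_for_risk_factors(r)]
print(prevalence_ids, risk_factor_ids)
```

Reporting how many records each rule removes is a useful addition to the analysis plan, since it documents the final sample size for each analysis.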

5. Explore the variable(s) needed

Always start with investigating the variables individually – describe them

1. Frequency, percentage, summary statistics (mean, median, standard deviation, minimum and maximum for continuous variables)

2. How much data is missing? What will you do about it?

3. What graphs are useful?

Histograms, bar charts, box plots, scatter plots

In our example:

Descriptive statistics of Haematobium (number, percent), Eggsper10mL (mean, median, SD), Haematobium_IOI (number, percent)

Graphical representation of the variable, such as histograms or boxplots of Haematobium and Eggsper10mL (to check range and distribution). If you were thinking of doing a regression, then a scatterplot would be appropriate
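
The descriptive step for a continuous variable such as Eggsper10mL can be sketched with the Python standard library alone. The egg counts below are invented, the helper name `describe` is hypothetical, and missing values are assumed to have already been converted to `None`.

```python
import statistics


def describe(values):
    """Summarise a continuous variable, reporting missing values separately."""
    observed = [v for v in values if v is not None]
    return {
        "n": len(observed),
        "missing": len(values) - len(observed),
        "mean": statistics.mean(observed),
        "median": statistics.median(observed),
        "sd": statistics.stdev(observed),  # sample standard deviation
        "min": min(observed),
        "max": max(observed),
    }


# Invented egg counts per 10 mL, for illustration only; None marks missing data.
eggs = [0, 0, 3, 12, None, 60, 0, 150]
summary = describe(eggs)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With skewed count data like this, the mean sits well above the median, which is one reason to look at both before choosing a statistical technique.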

Standard (normal) distribution

VERY important for the outcome variable: it helps you determine the statistical techniques to use later on:

- Parametric

- Non-parametric

5. Explore the variable(s) needed (cont’d)

Does a variable need to be categorised?

Eggsper10mL classes of infection intensity:

Calculation of categorised infection intensity:

New variable Haematobium_IOI:

if Eggsper10mL = 0 then Haematobium_IOI = 0 "No infection"

if Eggsper10mL >= 1 and <= 50 then Haematobium_IOI = 1 "Light"

if Eggsper10mL > 50 then Haematobium_IOI = 2 "Heavy"

if Eggsper10mL missing then Haematobium_IOI = NA

Descriptive statistics for Haematobium_IOI: Table showing Haematobium_IOI(%, N)
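
The categorisation rules can be expressed as a small function and then tabulated. This sketch uses the cut-offs shown on the slide (light up to 50 eggs, heavy above 50) and treats any positive count as at least a light infection, so that a count of exactly 1 is not left unclassified. The function name, labels dictionary and egg counts are illustrative assumptions.

```python
from collections import Counter

IOI_LABELS = {0: "No infection", 1: "Light", 2: "Heavy"}


def haematobium_ioi(eggs_per_10ml):
    """Return the intensity class (0, 1 or 2), or None when the count is missing."""
    if eggs_per_10ml is None:
        return None
    if eggs_per_10ml < 0:
        raise ValueError("egg count cannot be negative")
    if eggs_per_10ml == 0:
        return 0
    if eggs_per_10ml <= 50:
        return 1
    return 2


# Invented egg counts, for illustration only.
eggs = [0, 0, 1, 12, None, 50, 51, 150]
classes = [haematobium_ioi(e) for e in eggs]

# Frequency table (N and %) among non-missing observations.
counts = Counter(c for c in classes if c is not None)
total = sum(counts.values())
for code in sorted(counts):
    print(f"{IOI_LABELS[code]}: N = {counts[code]}, {counts[code] / total:.1%}")
```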

6. Examine relationships of interest:

One of our objectives is about relationships between Haematobium_IOI (likely, heavy infection intensity) and other variables.

Treat each of the selected 'other variables' separately, and come up with a plan to analyse the relationship. Are the relationships significant?

(i) Ask questions about the types of variables, and design issues, and the nature of the question itself

(ii) Choose a statistical method

(iii) Consider whether additional outputs are needed or useful: standard errors, confidence intervals, effect estimates and so on. Will graphs help?

6. Examine relationships of interest:

Investigate the relationship between two variables of interest: Outcome variable and a risk factor: how much does one influence the other?

Eg: prevalence of schistosomiasis and age

The way in which these relationships are analysed depends on the types of variable combinations involved

y = mx + b

Basic statistical techniques and when to use them

In our example:

Variation of heavy infection intensity with sex:

• Contingency (2x2) table showing sex (rows) by Haematobium_IOI and row %s

• Chi-squared test for prevalence of Haematobium, Males vs Females (assessing the role of ‘chance’) – a test of statistical significance

   • E.g. there is a statistically significant difference between males and females in prevalence of Schistosoma haematobium

• Odds ratio of prevalence of Haematobium in Males:Females and 95% CI

Variation of heavy infection intensity with age:

• Use AgeCohort (or construct new variable for age groups if this is not provided)
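
For the sex comparison, the chi-squared test and the odds ratio with its 95% CI can both be computed directly from the four cells of the 2x2 table. The sketch below makes several assumptions: the counts are invented, the function names are hypothetical, the chi-squared statistic is the uncorrected Pearson version (no continuity correction), and the confidence interval uses the log (Woolf) method. With small expected cell counts an exact test would be more appropriate.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a log-method (Woolf) confidence interval; all cells must be non-zero."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts, for illustration only:
#            infected   not infected
# Males         40           60
# Females       25           75
chi2, p_value = chi_squared_2x2(40, 60, 25, 75)
odds_ratio, or_lo, or_hi = odds_ratio_ci(40, 60, 25, 75)

print(f"Chi-squared = {chi2:.2f}, P = {p_value:.3f}, 1 df")
print(f"OR (Males:Females) = {odds_ratio:.2f} (95% CI {or_lo:.2f} to {or_hi:.2f})")
```

The age comparison follows the same pattern but has more than two rows, so it needs a general r x c chi-squared test rather than this 2x2 shortcut.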

6. Examine relationships of interest:

Remember:

The statistical program is quite capable of calculating something which is mathematically correct, but statistically invalid, because, for example, assumptions are not met. Therefore, the analysis plan should state the assumptions of each statistical test and how they will be checked.

7. Consider using ‘dummy’ tables

By this point, you will have quite a lot of output.

What is most useful?

How will you present it?

In our example:

Table X. Barombi Mbo Schistosomiasis and Soil-transmitted Helminths Study: demographic determinants of heavy infection intensity

                                 Intensity of infection
Variable  Values           N     No infection  Light  Heavy  95% CI for Heavy %  P-value from chi-squared
Sex       Females          X     %             %      %      %, %                Chi-squared = X, P = X, 1 df
          Males            X     %             %      %      %, %
Age       1 to <6 yrs      X     %             %      %      %, %                Chi-squared = X, P = X, 5 df
          6 to <18 yrs     X     %             %      %      %, %
          18 yrs or older  X     %             %      %      %, %

8. Going further: Confounding & effect modification, statistical modelling

This is where you consider

• univariable and

• multivariable

regression analysis (statistical modelling)

This is where:

• we get the most important results: how much does one factor influence risk of getting schistosomiasis, when ADJUSTED for other factors?

• we get the most detail to guide decision-making: where to target resources

This is where you CONTACT an epidemiologist!

Regression made simple!

You are simply exploring the effect of one (or more) explanatory variables (X) on an outcome variable (Y):

e.g. for a linear relationship, it is Y = a + b*X (a is the intercept, b is the slope of the line, i.e. the amount by which Y changes for each one-unit increase in X)
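
To make the intercept and slope concrete, the least-squares estimates of a and b can be computed by hand. This is a sketch only: the data are invented so that the points lie exactly on Y = 1 + 2*X, and `fit_line` is a hypothetical helper name. A real analysis would also examine residuals and standard errors.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (a, b) for Y = a + b*X."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx           # slope: change in Y per one-unit increase in X
    a = mean_y - b * mean_x  # intercept: value of Y when X is 0
    return a, b


# Invented data lying exactly on Y = 1 + 2*X, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f}*X")
```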

Multiple regression:

Having established that an outcome variable and an explanatory variable are associated, and estimated the regression coefficient, we might be interested in further exploring the other influences on the outcome variable. We can set up a multivariate (or multivariable) linear model such as:

Y = a + b*X1 + c*X2 + d*X3 + ...

where a is the intercept and b, c, d, ... are regression coefficients indicating the dependence of Y upon variables X1, X2, X3, ...

The underlying mathematics is a bit more complex, and there are some key assumptions, but ESSENTIALLY NO NEW PRINCIPLES EMERGE.

Multiple regression

When a multiple regression model is fitted, we obtain estimates of the separate or adjusted relationships between the dependent variable and the independent (or explanatory) variables. Thus, the association between the outcome variable and one explanatory variable is adjusted for confounding with the other explanatory variables. This is the method for dealing with confounding and multiple exposures.

A confounding variable is one that is associated with both the dependent variable and an independent variable. If the confounder is left out of the analysis, an apparent direct association may be found between the dependent and independent variables that exists only because of their shared association with the confounding variable.
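
Confounding can be illustrated without fitting a regression model by comparing a crude odds ratio with one adjusted by stratification. Note that this swaps in the Mantel-Haenszel method as a simpler stand-in for the regression adjustment described above. The counts are invented so that swimming appears to double the odds of infection overall, yet has no association within either age group; the age group and function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table: (a, b) exposed, (c, d) unexposed."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Invented counts, for illustration only. Each tuple is
# (swims & infected, swims & not infected, no swim & infected, no swim & not infected).
children = (60, 40, 15, 10)  # most children swim, and most are infected
adults = (5, 20, 20, 80)     # few adults swim, and few are infected

# Crude table: collapse over age group by adding the cells together.
crude_table = tuple(c + a for c, a in zip(children, adults))

crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or([children, adults])

print(f"Crude OR = {crude_or:.2f}")
print(f"Age-adjusted (Mantel-Haenszel) OR = {adjusted_or:.2f}")
```

Here age group is associated with both swimming and infection, so leaving it out produces a crude association that disappears once age is taken into account.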

Summary

Statistical analyses should not proceed in an ad hoc fashion

Good research requires construction of an analysis plan, which is driven by research objectives, and takes account of study design, including sampling and measurement issues, and variable types.

Any questions?

Thank you!