Andrew Smith Describing childhood diet with cluster analysis Young Statisticians’ meeting. 12th April 2011




Describing diet with cluster analysis

• Pauline M. Emmett

• P. Kirstin Newby

• Kate Northstone

• World Cancer Research Fund

• MRC, Wellcome Trust, University of Bristol


Outline

• Introductions

  – ALSPAC

  – Food frequency questionnaires

  – Dietary patterns

  – Cluster analysis

• k-means cluster analysis

• Results

  – 3 cluster solution

  – Associations with socio-demographic variables


ALSPAC

• Avon Longitudinal Study of Parents and Children

• Birth cohort study

• 14,541 pregnant women and their children

• www.bris.ac.uk/alspac


Food frequency questionnaires


Dietary patterns

• Examine diet as a whole

• Analyse multivariate FFQ data

• Use correlations between foods

• PCA

• Cluster analysis


Cluster analysis

• Separate subjects into non-overlapping groups

• Based on ‘distances’ between individuals

• Unsupervised learning


k-means cluster analysis

• Most widely used for dietary patterns

• Number of clusters, k, is specified beforehand

• Minimises

  – Distance from each subject to his/her cluster mean

  – Summed over all subjects in that cluster

  – Summed over all clusters
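
As a rough illustration of the quantity being minimised, the sketch below computes the within-cluster sum of squares for a given assignment of subjects to clusters. It assumes squared Euclidean distance, which is the usual choice for k-means but is not stated on the slide. The function name `within_cluster_ss` and the toy data are made up for illustration.

```python
def within_cluster_ss(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    dim = len(points[0])
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        mean = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        # Summed over all subjects in this cluster ...
        total += sum(sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in members)
    # ... and summed over all clusters.
    return total


# Toy data (hypothetical): four "subjects", two standardised food scores each.
toy = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = within_cluster_ss(toy, [0, 0, 1, 1], k=2)  # groups nearby subjects
bad = within_cluster_ss(toy, [0, 1, 0, 1], k=2)   # groups distant subjects
print(good, bad)
```

The assignment that groups nearby subjects gives the smaller total, which is the sense in which k-means prefers it.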


k-means cluster analysis


Problems with the standard algorithm

• Short-sighted

• Tends to find solutions that are at a local minimum

  – So run algorithm 100 times and choose solution that is minimum out of all minima
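
The repeated-runs idea can be sketched as follows, assuming the standard (Lloyd) algorithm with randomly chosen starting centres and squared Euclidean distance. The function names, the toy data and the fixed seed are illustrative assumptions, not the software used in the study.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from random starting centres."""
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: a (possibly local) minimum
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    cost = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return cost, labels, centres


def kmeans_best_of(points, k, n_runs=100, seed=0):
    """Run the algorithm n_runs times and keep the lowest-cost solution."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_runs))
    return min(runs, key=lambda run: run[0])


# Toy data (hypothetical): three well-separated groups of three points.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (0, 10), (0, 11), (1, 10)]
cost, labels, centres = kmeans_best_of(toy, k=3)
print(cost, labels)
```

A single run may stop at a poor local minimum, depending on where the starting centres fall. Taking the minimum over many runs makes that much less likely, at the price of extra computation.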


Standardising the input variables


Reliability of the cluster solution

• Split sample in half

• Perform separate analyses on each half

• See how many children change clusters

• Repeat 5 times

  – 32 out of 8,279 children changed cluster (0.4%)
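
A minimal sketch of a split-half check along these lines is given below. The slides do not say how clusters from a half-sample were matched to the full-sample clusters or exactly how changes were counted. Here each half-sample cluster is matched to the nearest full-sample centre, and a change is counted whenever a subject's matched cluster differs from its full-sample cluster. Both choices, along with the function names and toy data, are assumptions.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_runs=100, max_iter=100):
    """Best of n_runs runs of Lloyd's algorithm; returns (labels, centres)."""
    best = None
    for _run in range(n_runs):
        centres = rng.sample(points, k)
        labels = None
        for _it in range(max_iter):
            new = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
            if new == labels:
                break
            labels = new
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
        cost = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
        if best is None or cost < best[0]:
            best = (cost, labels, list(centres))
    return best[1], best[2]


def split_half_changes(points, k, n_repeats=5, seed=0):
    """Count subjects whose cluster differs between full- and half-sample fits."""
    rng = random.Random(seed)
    full_labels, full_centres = kmeans(points, k, rng)
    changed = 0
    for _rep in range(n_repeats):
        order = list(range(len(points)))
        rng.shuffle(order)
        mid = len(order) // 2
        for half in (order[:mid], order[mid:]):
            half_labels, half_centres = kmeans([points[i] for i in half], k, rng)
            # Assumed matching rule: nearest full-sample centre.
            match = [min(range(k), key=lambda c: sq_dist(hc, full_centres[c]))
                     for hc in half_centres]
            changed += sum(match[hl] != full_labels[i]
                           for i, hl in zip(half, half_labels))
    return changed, n_repeats * len(points)


# Toy data (hypothetical): three tight, well-separated groups of 20 points.
offsets = [(0.1 * dx, 0.1 * dy) for dx in range(5) for dy in range(4)]
toy = [(cx + ox, cy + oy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for ox, oy in offsets]
changed, total = split_half_changes(toy, k=3)
print(f"{changed} of {total} assignments changed")
```

With real dietary data the groups overlap far more than in this toy example, so some changes are expected; a small proportion suggests the solution is stable.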


Processed: 4,177 children


Plant-based: 2,065 children


Traditional British: 2,037 children


Associations with socio-demographic vars

                    n      Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
Girls               3,115  1                         1                                   1
Boys                2,941  0.82 (0.72, 0.93)         1.03 (0.89, 1.20)                   1.18 (1.04, 1.34)


Associations with socio-demographic vars

Maternal age        n      Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
< 21                130    1                         1                                   1
21-25               994    0.59 (0.33, 1.07)         1.07 (0.56, 2.05)                   1.57 (1.02, 2.43)
26-30               2,644  0.52 (0.29, 0.92)         1.20 (0.64, 2.28)                   1.60 (1.04, 2.46)
31+                 2,288  0.37 (0.21, 0.67)         1.50 (0.79, 2.88)                   1.77 (1.13, 2.76)


Associations with socio-demographic vars

Maternal education  n      Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
CSE                 740    1                         1                                   1
Vocational          504    0.84 (0.60, 1.17)         1.19 (0.82, 1.72)                   1.01 (0.76, 1.32)
O level             2,163  0.65 (0.51, 0.83)         1.46 (1.10, 1.94)                   1.05 (0.86, 1.30)
A level             1,604  0.42 (0.33, 0.55)         2.01 (1.50, 2.69)                   1.18 (0.95, 1.48)
Degree              1,045  0.30 (0.23, 0.39)         2.75 (2.00, 3.76)                   1.22 (0.94, 1.57)


Associations with socio-demographic vars

Siblings            n      Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
0 older             2,755  1                         1                                   1
1 older             2,317  1.21 (1.03, 1.42)         1.12 (0.94, 1.36)                   0.73 (0.62, 0.86)
2+ older            984    1.58 (1.28, 1.97)         0.99 (0.76, 1.27)                   0.64 (0.52, 0.80)


Associations with socio-demographic vars

Siblings            n      Processed vs Plant-based  Plant-based vs Traditional British  Traditional British vs Processed
0 younger           2,946  1                         1                                   1
1 younger           2,490  1.01 (0.86, 1.19)         0.58 (0.48, 0.71)                   1.69 (1.44, 1.99)
2+ younger          620    1.21 (0.92, 1.57)         0.43 (0.33, 0.58)                   1.90 (1.50, 2.40)


Summary

• Multivariate methods to compress FFQ data into dietary patterns

• k-means cluster analysis is widespread but must be applied carefully

• Processed, Plant-based and Traditional British clusters in 7-year-old children

• Associated with various socio-demographic variables
