Andrew Smith
Describing childhood diet with cluster analysis
Young Statisticians’ meeting. 12th April 2011
Describing diet with cluster analysis
• Pauline M. Emmett
• P. Kirstin Newby
• Kate Northstone
• World Cancer Research Fund
• MRC, Wellcome Trust, University of Bristol
Outline
• Introductions
• ALSPAC
• Food frequency questionnaires
• Dietary patterns
• Cluster analysis
• k-means cluster analysis
• Results
• 3 cluster solution
• Associations with socio-demographic variables
ALSPAC
• Avon Longitudinal Study of Parents and Children
• Birth cohort study
• 14,541 pregnant women and their children
• www.bris.ac.uk/alspac
Food frequency questionnaires
Dietary patterns
• Examine diet as a whole
• Analyse multivariate FFQ data
• Use correlations between foods
• PCA
• Cluster analysis
Image: Paul / FreeDigitalPhotos.net
Cluster analysis
• Separate subjects into non-overlapping groups
• Based on ‘distances’ between individuals
• Unsupervised learning
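As a rough illustration of a 'distance' between two individuals, the sketch below treats each child as a vector of food frequencies and computes the straight-line (Euclidean) distance between them. The choice of Euclidean distance, the function name and the intake values are all assumptions made for illustration; the slide does not say which distance measure was used.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two subjects' intake vectors."""
    if len(a) != len(b):
        raise ValueError("subjects must have the same number of food items")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical weekly frequencies for three food groups (illustrative values only).
child_a = [7.0, 2.0, 0.0]
child_b = [4.0, 6.0, 0.0]

print(euclidean_distance(child_a, child_b))  # 5.0
```

Children with similar intakes across all foods end up close together, which is what lets a clustering method group them.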
Image: Boaz Yiftach / FreeDigitalPhotos.net
k-means cluster analysis
• Most widely used for dietary patterns
• Number of clusters, k, is specified beforehand
• Minimises
– Distance from each subject to his/her cluster mean
– Summed over all subjects in that cluster
– Summed over all clusters
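Written out, the quantity being minimised can be sketched as below. This assumes squared Euclidean distance, the usual choice for k-means; the slide only says 'distance'. The symbols are introduced here for illustration: x_i is the vector of inputs for subject i, C_j is the set of subjects in cluster j, and \bar{x}_j is the mean of cluster j.

```latex
W(C_1, \dots, C_k) = \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - \bar{x}_j \rVert^2,
\qquad
\bar{x}_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i
```

The inner sum runs over the subjects in one cluster and the outer sum runs over the k clusters, matching the two 'summed over' steps above.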
k-means cluster analysis
Problems with the standard algorithm
• Short-sighted
• Tends to find solutions that are at a local minimum
– So run algorithm 100 times and choose solution that is minimum out of all minima
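The restart strategy can be sketched in plain Python as follows. This is a minimal illustration and not the code used in the study: it assumes squared Euclidean distance and random starting centres drawn from the data, and the function names, the toy data and the seed are all hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans_once(data, k, rng, max_iter=100):
    """One run of the standard algorithm from random starting centres."""
    centres = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Assign every subject to the nearest current centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in data]
        if new_labels == labels:
            break  # no subject moved, so a (possibly local) minimum is reached
        labels = new_labels
        # Move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = mean(members)
    cost = sum(sq_dist(p, centres[lab]) for p, lab in zip(data, labels))
    return cost, labels, centres

def kmeans_restarts(data, k, n_starts=100, seed=0):
    """Run the algorithm n_starts times and keep the lowest-cost solution."""
    rng = random.Random(seed)
    runs = (kmeans_once(data, k, rng) for _ in range(n_starts))
    return min(runs, key=lambda run: run[0])

# Toy data with two well-separated groups (illustrative values only).
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
best_cost, best_labels, best_centres = kmeans_restarts(data, k=2)
print(best_cost, best_labels)
```

Each run can stop at a different local minimum depending on where it starts, so keeping the run with the smallest total distance is a simple guard against a poor start.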
Standardising the input variables
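Only the slide title survives here, so the method of standardisation is not known. One common option, shown purely as an assumption, is to convert each input variable to z-scores so that a food recorded on a wide numeric scale does not dominate the distances. The function name and the intake values are hypothetical.

```python
import statistics

def standardise(column):
    """Rescale one input variable to mean 0 and standard deviation 1."""
    mu = statistics.mean(column)
    sd = statistics.pstdev(column)
    if sd == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(x - mu) / sd for x in column]

# Hypothetical weekly frequencies of one food across five children.
servings = [2.0, 4.0, 4.0, 4.0, 6.0]
z = standardise(servings)
print(z)
```

After this step every variable contributes on the same scale, whatever units it was recorded in.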
Reliability of the cluster solution
• Split sample in half
• Perform separate analyses on each half
• See how many children change clusters
• Repeat 5 times
– 32 out of 8,279 children changed cluster (0.4%)
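Counting how many children change cluster needs some care, because cluster numbers are arbitrary: cluster 1 in one analysis may be cluster 3 in another. The sketch below assumes that 'changed cluster' means a child was assigned differently in the half-sample analysis than in a reference solution, and it tries every relabelling before counting. The function name and the label vectors are hypothetical; only the 32 out of 8,279 figure comes from the slide.

```python
from itertools import permutations

def count_changes(labels_a, labels_b, k):
    """Fewest disagreements between two labelings over all relabelings of b."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same subjects")
    best = len(labels_a)
    for perm in permutations(range(k)):
        # perm[b] is the label in a's numbering that b's cluster is mapped to.
        changed = sum(1 for a, b in zip(labels_a, labels_b) if a != perm[b])
        best = min(best, changed)
    return best

# Hypothetical labels for seven children from two analyses.
reference = [0, 0, 1, 1, 2, 2, 2]
half_sample = [2, 2, 0, 0, 1, 1, 0]
print(count_changes(reference, half_sample, k=3))  # 1

# The proportion reported on the slide.
print(round(100 * 32 / 8279, 1))  # 0.4
```

Trying all relabellings is practical for 3 clusters (6 permutations) but would become slow for large k.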
Processed
4,177 children
Image: Suat Eman, Rawich, Master Isolated Images / FreeDigitalPhotos.net
Plant-based
2,065 children
Image: Suat Eman, Paul, Rob Wiltshire, Simon Howden, winnond / FreeDigitalPhotos.net
Traditional British
2,037 children
Image: Suat Eman, Filomena Scalise, Maggie Smith / FreeDigitalPhotos.net
Associations with socio-demographic vars
                    N      Processed / Plant-based  Plant-based / Traditional British  Traditional British / Processed
Girls               3,115  1                        1                                  1
Boys                2,941  0.82 (0.72, 0.93)        1.03 (0.89, 1.20)                  1.18 (1.04, 1.34)
Associations with socio-demographic vars
Maternal age        N      Processed / Plant-based  Plant-based / Traditional British  Traditional British / Processed
< 21                130    1                        1                                  1
21-25               994    0.59 (0.33, 1.07)        1.07 (0.56, 2.05)                  1.57 (1.02, 2.43)
26-30               2,644  0.52 (0.29, 0.92)        1.20 (0.64, 2.28)                  1.60 (1.04, 2.46)
31+                 2,288  0.37 (0.21, 0.67)        1.50 (0.79, 2.88)                  1.77 (1.13, 2.76)
Associations with socio-demographic vars
Maternal education  N      Processed / Plant-based  Plant-based / Traditional British  Traditional British / Processed
CSE                 740    1                        1                                  1
Vocational          504    0.84 (0.60, 1.17)        1.19 (0.82, 1.72)                  1.01 (0.76, 1.32)
O level             2,163  0.65 (0.51, 0.83)        1.46 (1.10, 1.94)                  1.05 (0.86, 1.30)
A level             1,604  0.42 (0.33, 0.55)        2.01 (1.50, 2.69)                  1.18 (0.95, 1.48)
Degree              1,045  0.30 (0.23, 0.39)        2.75 (2.00, 3.76)                  1.22 (0.94, 1.57)
Associations with socio-demographic vars
Siblings            N      Processed / Plant-based  Plant-based / Traditional British  Traditional British / Processed
0 older             2,755  1                        1                                  1
1 older             2,317  1.21 (1.03, 1.42)        1.12 (0.94, 1.36)                  0.73 (0.62, 0.86)
2+ older            984    1.58 (1.28, 1.97)        0.99 (0.76, 1.27)                  0.64 (0.52, 0.80)
Associations with socio-demographic vars
Siblings            N      Processed / Plant-based  Plant-based / Traditional British  Traditional British / Processed
0 younger           2,946  1                        1                                  1
1 younger           2,490  1.01 (0.86, 1.19)        0.58 (0.48, 0.71)                  1.69 (1.44, 1.99)
2+ younger          620    1.21 (0.92, 1.57)        0.43 (0.33, 0.58)                  1.90 (1.50, 2.40)
Summary
• Multivariate methods to compress FFQ data into dietary patterns
• k-means cluster analysis is widespread but must be applied carefully
• Processed, Plant-based and Traditional British clusters in 7-year-old children
• Associated with various socio-demographic variables