Andrew Smith
Describing childhood diet with cluster analysis
6th September 2012
Describing diet with cluster analysis
• Kate Northstone
• Pauline Emmett
• PK Newby
• World Cancer Research Fund
• MRC, Wellcome Trust, University of Bristol
Outline
• Introductions
• ALSPAC
• Food frequency questionnaires / diet diaries
• Dietary patterns
• Cluster analysis
– k-means cluster analysis
• Results
– 4 cluster solution
– Associations with socio-demographic variables
ALSPAC
• Avon Longitudinal Study of Parents and Children
• Birth cohort study
• 14,541 pregnant women and their children
• www.bris.ac.uk/alspac
Food frequency questionnaires
Diet diaries
• Records all food and drink consumed over a 3-day period
• 2 weekdays and 1 weekend day
• Parent completes the diary when the child is aged 7
• Child completes the diary at ages 10 and 13
Dietary patterns
• Examine diet as a whole
• Start with many variables (food group intakes)
• Express as a small number of variables
Image: Paul / FreeDigitalPhotos.net
Principal components analysis (PCA)
• Examine diet as a whole
• Start with many variables
• Use correlations between foods
• Express as a small number of components
Image: Paul / FreeDigitalPhotos.net
Cluster analysis
• Examine diet as a whole
• Start with many variables
• Use similarities between people
• Express as a small number of clusters
Image: Paul / FreeDigitalPhotos.net
Cluster analysis
• Separate subjects into non-overlapping groups
• Based on ‘distances’ between individuals
• Unsupervised learning
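
As a rough illustration of a ‘distance’ between individuals, the sketch below computes the Euclidean distance between children described by their food group intakes. The slides do not say which distance measure was used, so Euclidean distance is an assumption here, and the intake values are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two subjects' intake vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical intakes for three food groups (not ALSPAC data).
child_a = [0.0, 1.0, 2.0]
child_b = [3.0, 5.0, 2.0]
child_c = [0.5, 1.0, 2.0]

print(euclidean(child_a, child_b))  # 5.0
print(euclidean(child_a, child_c))  # 0.5
```

Children with a small distance between them, such as child_a and child_c, are candidates for the same cluster.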
Image: Boaz Yiftach / FreeDigitalPhotos.net
k-means cluster analysis
• Most widely used for dietary patterns
• Number of clusters, k, is specified beforehand
• Minimises
– Distance from each subject to his/her cluster mean
– Summed over all subjects in that cluster
– Summed over all clusters
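
The quantity being minimised can be sketched as follows. The slide says only ‘distance’; this sketch assumes squared Euclidean distance, which is the usual k-means criterion. The function names and the four data points are hypothetical.

```python
def cluster_mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def kmeans_objective(data, labels, k):
    """Squared distance to the cluster mean, summed over subjects and clusters."""
    total = 0.0
    for c in range(k):
        members = [x for x, lab in zip(data, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        mean = cluster_mean(members)
        total += sum(sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
                     for x in members)
    return total

# Hypothetical data: two tight pairs of subjects, far apart from each other.
data = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good = kmeans_objective(data, [0, 0, 1, 1], k=2)  # pairs kept together
bad = kmeans_objective(data, [0, 1, 0, 1], k=2)   # pairs split up
print(good, bad)  # 4.0 100.0
```

A grouping that keeps similar subjects together gives the smaller value, which is what the algorithm searches for.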
k-means cluster analysis
Problems with the standard algorithm
The algorithm for k-means cluster analysis is:
• Short-sighted
• Tends to find solutions that are at a local minimum
– So run the algorithm 100 times and choose the solution with the smallest of all the minima
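
A minimal sketch of the restart strategy, assuming the standard algorithm is Lloyd's algorithm with randomly chosen starting centres and squared Euclidean distance. The function names (lloyd, best_of) and the toy data are hypothetical, and this is not the software used in the study.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(data, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from k randomly chosen starting centres."""
    centres = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Assign each subject to its nearest centre.
        new_labels = [min(range(k), key=lambda c: sq_dist(x, centres[c]))
                      for x in data]
        if new_labels == labels:
            break  # assignments stopped changing: a (possibly local) minimum
        labels = new_labels
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    cost = sum(sq_dist(x, centres[lab]) for x, lab in zip(data, labels))
    return cost, labels

def best_of(data, k, n_starts=100, seed=0):
    """Run the algorithm n_starts times and keep the lowest-cost solution."""
    rng = random.Random(seed)
    return min((lloyd(data, k, rng) for _ in range(n_starts)),
               key=lambda run: run[0])

# Hypothetical one-variable data with three obvious pairs.
data = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]]
cost, labels = best_of(data, k=3)
print(cost, labels)
```

On this toy data a single run can stall: starting from centres 0, 1 and 10 leaves the four largest values in one cluster. Keeping the best of many runs guards against that.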
Standardising the input variables
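
The transcript keeps only the title of this slide, so the method of standardisation is not recorded. One common choice, assumed here purely for illustration, is to rescale each food group variable to mean 0 and standard deviation 1, so that variables measured on larger scales do not dominate the distances. The variable name and values below are hypothetical.

```python
import statistics

def standardise(column):
    """Rescale one variable to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.pstdev(column)
    return [(v - mean) / sd for v in column]

# Hypothetical intakes of one food group, in grams per day.
bread_g_per_day = [40.0, 60.0, 80.0]
z = standardise(bread_g_per_day)
print(z)
```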
Reliability of the cluster solution
• Split sample in half
• Perform separate analyses on each half
• See how many children change clusters
• Repeat 5 times
– 32 out of 8,279 children changed cluster (0.4%)
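
Counting how many children change cluster needs some care, because cluster numbers are arbitrary: cluster 1 in one analysis may be cluster 3 in another. The sketch below assumes each child has two assignments to compare (for example, one from a half-sample analysis and one from a reference analysis) and matches the cluster numbering before counting. The slides do not describe the comparison in this detail, so the function and the example labels are hypothetical.

```python
from itertools import permutations

def n_changed(labels_a, labels_b, k):
    """Fewest disagreements between two assignments over all relabellings."""
    best = len(labels_a)
    for perm in permutations(range(k)):
        # perm[b] renames cluster b of the second analysis.
        changed = sum(a != perm[b] for a, b in zip(labels_a, labels_b))
        best = min(best, changed)
    return best

# Hypothetical assignments for eight children, k = 3.
reference = [0, 0, 1, 1, 2, 2, 2, 0]
half_sample = [2, 2, 0, 0, 1, 1, 0, 2]
changed = n_changed(reference, half_sample, k=3)
print(changed)  # 1

# The proportion reported on the slide:
print(round(100 * 32 / 8279, 1))  # 0.4
```

Trying every relabelling is practical only for small k, which is the case here with 3 or 4 clusters.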
Results
• Food frequency questionnaire (FFQ) data
– Age 7
– 3 clusters
• Diet diary data
– Age 7, 10 and 13
– 4 clusters
Processed
30.2% of children
Image: Suat Eman, artemisphoto, -Marcus- / FreeDigitalPhotos.net
Plant-based (Healthy)
27.8% of children
Image: Suat Eman, Paul, Rob Wiltshire, Simon Howden, winnond / FreeDigitalPhotos.net
Traditional British
21.3% of children
Image: Suat Eman, Maggie Smith, Simon Howden / FreeDigitalPhotos.net
Packed Lunch
20.6% of children
Image: Grant Cochrane, luigi diamanti, Rawich, Master Isolated Images / FreeDigitalPhotos.net
Associations with socio-demographic vars
| | n | Processed vs Plant-based | Plant-based vs Traditional British | Traditional British vs Processed |
|---|---|---|---|---|
| Girls | 3,115 | 1 | 1 | 1 |
| Boys | 2,941 | 0.82 (0.72, 0.93) | 1.03 (0.89, 1.20) | 1.18 (1.04, 1.34) |
Associations with socio-demographic vars
| Maternal age | n | Processed vs Plant-based | Plant-based vs Traditional British | Traditional British vs Processed |
|---|---|---|---|---|
| < 21 | 130 | 1 | 1 | 1 |
| 21-25 | 994 | 0.59 (0.33, 1.07) | 1.07 (0.56, 2.05) | 1.57 (1.02, 2.43) |
| 26-30 | 2,644 | 0.52 (0.29, 0.92) | 1.20 (0.64, 2.28) | 1.60 (1.04, 2.46) |
| 31+ | 2,288 | 0.37 (0.21, 0.67) | 1.50 (0.79, 2.88) | 1.77 (1.13, 2.76) |
Associations with socio-demographic vars
| Maternal education | n | Processed vs Plant-based | Plant-based vs Traditional British | Traditional British vs Processed |
|---|---|---|---|---|
| CSE | 740 | 1 | 1 | 1 |
| Vocational | 504 | 0.84 (0.60, 1.17) | 1.19 (0.82, 1.72) | 1.01 (0.76, 1.32) |
| O level | 2,163 | 0.65 (0.51, 0.83) | 1.46 (1.10, 1.94) | 1.05 (0.86, 1.30) |
| A level | 1,604 | 0.42 (0.33, 0.55) | 2.01 (1.50, 2.69) | 1.18 (0.95, 1.48) |
| Degree | 1,045 | 0.30 (0.23, 0.39) | 2.75 (2.00, 3.76) | 1.22 (0.94, 1.57) |
Associations with socio-demographic vars
| Siblings | n | Processed vs Plant-based | Plant-based vs Traditional British | Traditional British vs Processed |
|---|---|---|---|---|
| 0 older | 2,755 | 1 | 1 | 1 |
| 1 older | 2,317 | 1.21 (1.03, 1.42) | 1.12 (0.94, 1.36) | 0.73 (0.62, 0.86) |
| 2+ older | 984 | 1.58 (1.28, 1.97) | 0.99 (0.76, 1.27) | 0.64 (0.52, 0.80) |
Associations with socio-demographic vars
| Siblings | n | Processed vs Plant-based | Plant-based vs Traditional British | Traditional British vs Processed |
|---|---|---|---|---|
| 0 younger | 2,946 | 1 | 1 | 1 |
| 1 younger | 2,490 | 1.01 (0.86, 1.19) | 0.58 (0.48, 0.71) | 1.69 (1.44, 1.99) |
| 2+ younger | 620 | 1.21 (0.92, 1.57) | 0.43 (0.33, 0.58) | 1.90 (1.50, 2.40) |
Summary
• Multivariate methods to compress dietary data into dietary patterns
• k-means cluster analysis is widespread but must be applied carefully
• 3 clusters in FFQ data (Processed, Plant-based and Traditional British)
• 4 clusters in diet diary data (+ Packed Lunch)
References
• Northstone, AS et al. (2012) ‘Longitudinal comparisons of dietary patterns derived by cluster analysis in 7 to 13 year old children.’ British Journal of Nutrition, to appear.
• AS et al. (2011) ‘A comparison of dietary patterns derived by cluster and principal components analysis in a UK cohort of children.’ European Journal of Clinical Nutrition 65, p1102-9.