Statistical Challenges in Genetic Studies of Mental Disorders
Heping Zhang
Collaborative Center for Statistics in Science, Yale University School of Medicine
June 9, 2009
Institute of Mathematical Statistics, National University of Singapore
Outline
• Heredity of Psychiatric Disorders – A Century Ago
  – One Example
• Genetic Studies of Mental Disorders – As We Are Speaking
  – Three Examples
• Statistical Challenges – Our Progress
  – Ordinal Traits
  – Multivariate Traits
• Closing Comments and Acknowledgements
Journal of Nervous and Mental Disease, May 1911
PRELIMINARY REPORT OF A STUDY OF HEREDITY IN INSANITY IN THE LIGHT OF THE MENDELIAN LAWS
BY GERTRUDE L. CANNON, A.M., AND A. J. ROSANOFF, M.D.
KINGS PARK STATE HOSPITAL, NEW YORK
Pedigrees from 11 Neuropathetic Patients
Correlated Phenotypes
Theoretical Conclusions
The Genetics of Tourette Syndrome
Tourette syndrome is a complex disorder characterized by repetitive, sudden, and involuntary movements or noises called tics.
Concordance in MZ twins ~ 50%
Concordance in DZ twins < 10%
In 1986, Pauls and Leckman concluded that Tourette's syndrome is inherited as a highly penetrant, sex-influenced, autosomal dominant trait.
Pete Bennett, winner of the 7th series of Big Brother
The Genetics of Tourette Syndrome
In 2005, State’s lab identified mutations involving the SLITRK1 gene (13q31.1) in a small number of people with Tourette syndrome.
Most people with Tourette syndrome do not have a mutation in the SLITRK1 gene. Because mutations have been reported in so few people with this condition, the association of the SLITRK1 gene with this disorder has not been confirmed.
TSICG (2008): Lack of association between SLITRK1var321 and Tourette syndrome in a large family-based sample
Schizophrenia
Schizophrenia is a chronic, severe, and disabling brain disorder that affects about 1.1 percent of the U.S. population age 18 and older in a given year. People with schizophrenia sometimes hear voices others don’t hear, believe that others are broadcasting their thoughts to the world, or become convinced that others are plotting to harm them. These experiences can make them fearful and withdrawn and cause difficulties when they try to have relationships with others.
http://www.nimh.nih.gov
Genetic Studies of Schizophrenia
In 1987-88, it was reported that “Bipolar affective disorders linked to DNA markers on chromosome 11” and “Localization of a susceptibility locus for schizophrenia on chromosome 5.” These findings attracted a lot of publicity but could not be replicated.
Some regions (e.g., dysbindin on chromosome 6p, neuregulin on 8p, and G72 on 13q) have been identified more consistently as candidate regions.
Kraepelin (Textbook of Psychiatry, 1896) described ‘Dementia Praecox’ as an inherited disorder.
Kety, Rosenthal, and Wender conducted a series of adoption studies beginning in 1968, establishing a genetic basis for schizophrenia.
There may not be a true sequence variation in a gene that causes illness. Rather, variable expression through epigenetic modification of gene activation may be the key (DeLisi et al. 2007).
Genetic Studies of Schizophrenia
Nature Online, July 30, 2008:
• International Schizophrenia Consortium: 3,391 schizophrenia cases and 3,181 controls in a European sample
• Stefansson et al.: 1,433 schizophrenia cases and 33,250 controls; 3,285 cases and 7,951 controls
• Both groups report genetic deletions associated with schizophrenia in the same three locations: on chromosomes 1 and 15, plus a third deletion on chromosome 22 that has previously been connected with increased susceptibility to schizophrenia.
Genetic Studies of Schizophrenia
Nature, July 31, 2008:
• The surveys have identified sections of the human genome that, when deleted, can elevate the risk of developing schizophrenia by up to 15 times compared with the general population.
• In ISC study, a total of 890 CNVs were observed in either a case or a control as a single occurrence. This set of CNVs showed a 1.45-fold increase in cases (empirical P = 5E-6). On average, 13.1% of cases of schizophrenia possessed a deletion or duplication observed only once in the sample, in contrast to 10.4% of controls.
Smoking
In the 1990s, a series of large-sample twin studies in the US and other countries showed repeatedly that smoking is a heritable behavior. The heritability of nicotine dependence is estimated at around 50%.
In the last decade, about 20 genome-wide linkage scans for smoking behavior have been reported, but only a limited number of putative genomic linkages have been replicated in independent studies (Li 2007). Challenges include genetic heterogeneity, the size of the genetic effect, the density of markers, the definition and assessment of the phenotypes, and the statistical approaches (Li 2007).
Diagnosis of Psychiatric Disorders
• Yale Global Tic Severity Scale and the symptom checklist, and the Yale-Brown Obsessive Compulsive Scale: ordinal scales
• Review with the family
• Perform comorbid psychiatric diagnoses using the Schedule for Affective Disorders and Schizophrenia for School-Age Children, the Children’s Depression Rating Scale-Revised, and the Revised Children’s Manifest Anxiety Scale.
Schizophrenia – DSM-IV
Example 1: 295.30 Schizophrenia, Paranoid Type, Continuous
• Current:
  – With severe psychotic dimension
  – With absent disorganized dimension
  – With moderate negative dimension
• Lifetime:
  – With mild psychotic dimension
  – With absent disorganized dimension
  – With mild negative dimension
Example 2: 295.60 Schizophrenia, Residual Type, Episodic With Residual Symptoms
• Current:
  – With mild psychotic dimension
  – With mild disorganized dimension
  – With mild negative dimension
• Lifetime:
  – With moderate psychotic dimension
  – With mild disorganized dimension
  – With mild negative dimension
http://www.psychiatryonline.com
Substance Abuse and Dependence
An individual continues use of the substance despite significant substance-related problems.
Dependence is defined as a cluster of three or more of the symptoms (Tolerance, Withdrawal, etc.) occurring at any time in the same 12-month period.
In Summary
• Psychiatric disorders are generally assessed with instruments based on ordinal severity scores.
• Comorbid psychiatric disorders are common:
  – TS, OCD, ADHD, etc.
  – Smoking, alcohol, depression, etc.
Fagerstrom Test for Nicotine Dependence (FTND)
1. How many cigarettes a day do you usually smoke?
   1 to 10 (0 points); 11 to 20 (1 point); 21 to 30 (2 points); 31 or more (3 points)
2. How soon after you wake up do you smoke your first cigarette?
   After 60 minutes (0 points); 31 to 60 minutes (1 point); 6 to 30 minutes (2 points); within 5 minutes (3 points)
3. Do you smoke more during the first two hours of the day than during the rest of the day?
   No (0 points); Yes (1 point)
4. Which cigarette would you most hate to give up?
   Any cigarette other than the first one (0 points); the first cigarette in the morning (1 point)
5. Do you find it difficult to refrain from smoking in places where it is forbidden, such as public buildings, on airplanes, or at work?
   No (0 points); Yes (1 point)
6. Do you still smoke even when you are so ill that you are in bed most of the day?
   No (0 points); Yes (1 point)
Total points: 0 to 10
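As a sketch of how the six items combine into one ordinal score, the hypothetical helper below (the function and argument names are ours, not part of the FTND) maps answers to points using the categories listed above and sums them; the total ranges from 0 to 10.

```python
def ftnd_score(cigs_per_day: int, minutes_to_first: float,
               more_in_morning: bool, hate_first: bool,
               hard_to_refrain: bool, smoke_when_ill: bool) -> int:
    """Total FTND score (0 to 10) from the six item answers (hypothetical helper)."""
    # Item 1: cigarettes per day -> 0, 1, 2, or 3 points
    if cigs_per_day <= 10:
        item1 = 0
    elif cigs_per_day <= 20:
        item1 = 1
    elif cigs_per_day <= 30:
        item1 = 2
    else:
        item1 = 3
    # Item 2: minutes from waking to the first cigarette -> 3, 2, 1, or 0 points
    if minutes_to_first <= 5:
        item2 = 3
    elif minutes_to_first <= 30:
        item2 = 2
    elif minutes_to_first <= 60:
        item2 = 1
    else:
        item2 = 0
    # Items 3-6 are yes/no questions worth 1 point for "yes"
    # (item 4: "yes" means the first morning cigarette is the hardest to give up)
    yes_no = sum(int(a) for a in (more_in_morning, hate_first,
                                  hard_to_refrain, smoke_when_ill))
    return item1 + item2 + yes_no


print(ftnd_score(25, 3, True, True, False, False))  # 2 + 3 + 2 = 7
```

The total is itself an ordinal trait with 11 possible values, which is the kind of phenotype the ordinal-trait methods below are designed for.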
Ordinal Traits
Experimental Cross
September 17, 2008
April 24, 2009
Genetic Analysis of Ordinal Traits
Software for Analysis of Ordinal Traits
LOT: Linkage Analysis of Ordinal Traits
LOT is a software program that performs linkage analysis of ordinal traits for pedigree data. It implements a latent-variable proportional-odds logistic model that relates inheritance patterns to the distribution of the ordinal trait.
Contents
1. Citation
2. Condition of use
3. Versions
4. Methodology
5. Input file formats
   5.1 .loc file
   5.2 .ped file
6. Downloads
7. Running LOT
   7.1 Running LOT with GUI on Windows and Linux
   7.2 Running LOT from command line in Windows
   7.3 Running LOT from command line in Linux
   7.4 Running LOT from command line in Mac OS X
8. Genehunter License Agreement
Inference of Inheritance Vectors v(t)
• Nuclear family: 2 founders and n nonfounders
• Alleles of the two founders: (1,2) (3,4)
• v(t) = (v_1, v_2, …, v_{2n-1}, v_{2n})’
• More complex pedigree: f founders and n nonfounders. Alleles of the f founders: (1,2) (3,4) (5,6) … (2f-1,2f)
LOT: Methodology
v_{2j-1} = 1 if the grandpaternal allele is transmitted in the paternal meiosis to the jth sibling
v_{2j-1} = 2 if the grandmaternal allele is transmitted in the paternal meiosis to the jth sibling
v_{2j} = 3 if the grandpaternal allele is transmitted in the maternal meiosis to the jth sibling
v_{2j} = 4 if the grandmaternal allele is transmitted in the maternal meiosis to the jth sibling
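To make the labeling concrete, the sketch below enumerates every inheritance vector for a nuclear family with n siblings under this convention (father's alleles labeled 1 and 2, mother's 3 and 4) and counts how many alleles two siblings share identical by descent (IBD). It only illustrates the notation; LOT infers the distribution of v(t) from marker data, which is not shown here, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def inheritance_vectors(n_sibs):
    """Yield every v = (v1, ..., v_2n): v_{2j-1} in {1, 2} records the paternal
    meiosis to sibling j, v_{2j} in {3, 4} the maternal meiosis."""
    per_sib = [(pat, mat) for pat in (1, 2) for mat in (3, 4)]
    for combo in product(per_sib, repeat=n_sibs):
        yield tuple(allele for pair in combo for allele in pair)


def ibd_count(v, j, k):
    """Number of alleles (0, 1, or 2) shared IBD by siblings j and k (1-based)."""
    same_paternal = v[2 * j - 2] == v[2 * k - 2]  # v_{2j-1} vs v_{2k-1}
    same_maternal = v[2 * j - 1] == v[2 * k - 1]  # v_{2j} vs v_{2k}
    return int(same_paternal) + int(same_maternal)


# Two siblings: 2^(2n) = 16 vectors, all equally likely before seeing marker data.
print(Counter(ibd_count(v, 1, 2) for v in inheritance_vectors(2)))
```

With no marker information the 16 vectors are equally likely, which reproduces the familiar 1/4, 1/2, 1/4 prior IBD distribution for a sib pair.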
$$\operatorname{logit} P(Y_{ij} \le k \mid x_{ij}, v, U_{1i}, U_{2ij}) = \alpha_k + x_{ij}'\beta + U_{1i} + U_{2ij}, \quad k = 0, 1, \ldots, K$$
Genetic Model and Hypothesis Testing
• Latent variables
  – U1: common genetic or environmental factors in a family not observed through the covariates
  – U2: genetic susceptibility of the family founders and nonfounders
• Proportional-odds logistic model
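A minimal sketch of how a proportional-odds (cumulative-logit) model turns a linear predictor into category probabilities, assuming the form logit P(Y ≤ k) = α_k + η, where η is a hypothetical stand-in for the covariate effects plus the latent family effects U1 and U2. Under this sign convention a larger η moves probability toward the lower categories.

```python
import math


def category_probs(alphas, eta):
    """P(Y = k) for k = 0, ..., K under logit P(Y <= k) = alphas[k] + eta.

    alphas holds the K finite cut points (k = 0, ..., K-1) in increasing
    order; P(Y <= K) = 1 closes the scale."""
    if any(a >= b for a, b in zip(alphas, alphas[1:])):
        raise ValueError("cut points must be strictly increasing")
    cumulative = [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas] + [1.0]
    # Category probabilities are successive differences of the cumulative ones.
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                              for k in range(1, len(cumulative))]


print(category_probs([0.0], 0.0))  # [0.5, 0.5]
```

The same η shifts every cumulative logit by the same amount, which is the "proportional odds" assumption: one effect per covariate, shared across all cut points.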
LOT: Data Files
Two input files are required: a locus data file and pedigree file.
• Locus file: This file contains information on the genetic distances between markers, the number of alleles at each locus, and their frequencies. The format of this file is very similar to the standard GENEHUNTER (or LINKAGE) format.
• Pedigree file: This file consists of columns with the following information, in this order:
Pedigree_ID Person_ID Father_ID Mother_ID Sex Phenotype Marker_genotypes Covariates
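For orientation, a hypothetical reader for one pedigree-file row in the column order above. It assumes whitespace-separated fields, two allele columns per marker as in LINKAGE-style files, numeric phenotype and covariate codes, and a marker count taken from the locus file; the exact conventions (for example, missing-value codes) should be checked against the LOT documentation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PedRow:
    pedigree_id: str
    person_id: str
    father_id: str
    mother_id: str
    sex: int
    phenotype: int                    # ordinal trait category
    genotypes: List[Tuple[int, int]]  # one (allele 1, allele 2) pair per marker
    covariates: List[float]


def parse_ped_row(line: str, n_markers: int) -> PedRow:
    """Parse: Pedigree_ID Person_ID Father_ID Mother_ID Sex Phenotype
    Marker_genotypes Covariates (two allele columns per marker assumed)."""
    fields = line.split()
    start, stop = 6, 6 + 2 * n_markers
    if len(fields) < stop:
        raise ValueError(f"expected at least {stop} columns, got {len(fields)}")
    alleles = [int(a) for a in fields[start:stop]]
    genotypes = [(alleles[2 * m], alleles[2 * m + 1]) for m in range(n_markers)]
    covariates = [float(c) for c in fields[stop:]]  # everything after the markers
    return PedRow(fields[0], fields[1], fields[2], fields[3],
                  int(fields[4]), int(fields[5]), genotypes, covariates)


row = parse_ped_row("7 3 1 2 2 2  1 2  3 3  45 0", n_markers=2)
print(row.genotypes, row.covariates)  # [(1, 2), (3, 3)] [45.0, 0.0]
```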
LOT: Output
Association Analysis
[Figure: n families; family 1 has n_1 siblings, …, family n has n_n siblings]
General Test Statistic
Assume that there are n nuclear families. In the ith family, there are n_i siblings, i = 1, …, n. For the jth child in the ith family, the trait value is y_ij, the covariates are z_ij, and the genotype is g_ij. X_ij is the number of copies of allele A in the genotype g_ij. The association test statistic can be constructed as follows:
$$T = \sum_{i=1}^{n} T_i = \sum_{i=1}^{n} \sum_{j=1}^{n_i} W_{ij} X_{ij},$$
where W_ij is a weight function of y_ij and z_ij.
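A sketch of computing T = Σ_i Σ_j W_ij X_ij for given weights, together with its null expectation given parental genotypes. It assumes the usual Mendelian argument of TDT-type tests: a parent carrying a copies of allele A transmits A with probability a/2, so E(X_ij | parental genotypes) = (a_father + a_mother)/2 under the null hypothesis. The data layout and names are hypothetical, and the variance needed to standardize T − E(T) is not shown.

```python
from typing import List, Tuple, TypedDict


class Family(TypedDict):
    father_a: int                      # copies of allele A in the father (0, 1, 2)
    mother_a: int                      # copies of allele A in the mother (0, 1, 2)
    children: List[Tuple[float, int]]  # (W_ij, X_ij) for each sibling


def association_statistic(families: List[Family]) -> Tuple[float, float]:
    """Return T = sum_i sum_j W_ij X_ij and the centered value T - E(T)."""
    t = expected = 0.0
    for fam in families:
        # Mendelian transmission: each parent passes A with prob. (copies of A) / 2
        mean_x = fam["father_a"] / 2 + fam["mother_a"] / 2
        for w, x in fam["children"]:
            t += w * x
            expected += w * mean_x
    return t, t - expected


# Heterozygous father x aa mother: the child with weight +1 received A, the other did not.
fams = [{"father_a": 1, "mother_a": 0, "children": [(1.0, 1), (-1.0, 0)]},
        {"father_a": 2, "mother_a": 2, "children": [(0.5, 2)]}]  # uninformative family
print(association_statistic(fams))  # (2.0, 1.0)
```

The homozygous family adds to T but not to T − E(T), which is why only families with a heterozygous parent carry information in this kind of test.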
O-TDT
Model and Method
Model:
$$\operatorname{logit}\big(P(Y \le k \mid g)\big) = \alpha_k + \beta I(g) + z'\gamma, \quad k = 1, \ldots, K-1,$$
where the α_k's are level parameters, β is the genetic effect, and I(g) is the number of copies of allele D in genotype g.
• Di-allelic marker with possible alleles A and a.
• Assume that there is a trait-increasing allele D, and we use d to denote the wild-type allele(s).
• Consider a trait taking values in ordinal responses 1, …, K.
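O-TDT
Score Statistic
$$T = \sum_{i=1}^{n} \sum_{j=1}^{n_i} w(y_{ij}, z_{ij}) X_{ij}$$
The score function under the null hypothesis is T − E(T | Y, M, P), where
$$\hat\pi(k, z) = \frac{\exp(\hat\alpha_k + z'\hat\gamma)}{1 + \exp(\hat\alpha_k + z'\hat\gamma)}, \quad k = 1, \ldots, K-1, \qquad \hat\pi(0, z) = 0, \quad \hat\pi(K, z) = 1,$$
$$w(k, z) = \hat\pi(k-1, z) + \hat\pi(k, z) - 1.$$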
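A sketch of the O-TDT weights, assuming the fitted null model (β = 0) gives cumulative probabilities π̂(k, z) = exp(α̂_k + z'γ̂)/(1 + exp(α̂_k + z'γ̂)) for k = 1, …, K−1, with π̂(0, z) = 0 and π̂(K, z) = 1 (π̂ and γ̂ are our labels), and that w(k, z) = π̂(k−1, z) + π̂(k, z) − 1. The fitted cut points and the covariate term z'γ̂ are passed in rather than estimated; the resulting weights would then be plugged into T = Σ_i Σ_j w(y_ij, z_ij) X_ij.

```python
import math
from typing import List, Sequence


def null_cumulative(alpha_hat: Sequence[float], z_gamma: float) -> List[float]:
    """pi(0), ..., pi(K): fitted P(Y <= k | z) under beta = 0, padded with 0 and 1."""
    inner = [1.0 / (1.0 + math.exp(-(a + z_gamma))) for a in alpha_hat]
    return [0.0] + inner + [1.0]


def otdt_weight(k: int, alpha_hat: Sequence[float], z_gamma: float) -> float:
    """w(k, z) = pi(k-1, z) + pi(k, z) - 1 for an observed category k in 1..K."""
    pi = null_cumulative(alpha_hat, z_gamma)
    if not 1 <= k < len(pi):
        raise ValueError("category k must lie in 1..K")
    return pi[k - 1] + pi[k] - 1.0


# K = 2 categories, one cut point at 0, no covariate effect:
print([otdt_weight(k, [0.0], 0.0) for k in (1, 2)])  # [-0.5, 0.5]
```

Under the fitted null distribution of Y the weights average to zero, because Σ_k (π̂_k − π̂_{k−1})(π̂_{k−1} + π̂_k − 1) telescopes to (1 − 0) − (1 − 0) = 0.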
O-TDT
Powers Based on 10,000 Replications: Test for Association in the Presence of Linkage
#F (number of families)
#F    K   Sig. level   OTDT     QTDT     TDT
200   3   0.05         0.4067   0.2334   0.1961
200   3   0.01         0.1853   0.0842   0.0654
200   3   0.001        0.0469   0.0171   0.0116
200   4   0.05         0.4531   0.2354   0.1844
200   4   0.01         0.2201   0.0862   0.0618
200   4   0.001        0.0596   0.0164   0.0102
400   3   0.05         0.6960   0.4266   0.3471
400   3   0.01         0.4486   0.2068   0.1549
400   3   0.001        0.1887   0.0594   0.0384
400   4   0.05         0.7704   0.4609   0.3508
400   4   0.01         0.5405   0.2323   0.1556
400   4   0.001        0.2572   0.0707   0.0404
Simulation
Quantitative Trait
Collaborative Studies on Genetics of Alcoholism (COGA)
• In the United States, 12.5% of adults have had an alcohol dependence problem in their lifetime (Hasin et al., 2007).
• A large-scale, multi-center study to map alcohol dependence susceptibility genes.
• 143 families with 1,614 individuals; 4,720 SNPs from the Illumina genotype data set.
• One ordinal trait with 4 levels was recorded (pure unaffected, never drank, unaffected with some symptoms, and affected).
• FBAT was also used for comparison.
Association Analysis of COGA Data
SNP Markers That Are Significant at the 0.001 Level Based on O-TDT after Adjusting for Gender and Age
SNP Marker   Chromosome   Physical Location   P-value (Gender and Age Adjusted)   P-value (Unadjusted)   Gene Name
rs1972373    14           18435498            0.00038                             0.00017
rs1571423    10           125256948           0.00046                             0.00035                LOC440007
rs485874     1            18182512            0.00050                             0.00101
rs619        X            29916017            0.00055                             0.07736                GK
rs718251     8            52437707            0.00067                             0.01073
rs1869907    15           38835904            0.00087                             0.03067
[Figure: Smoking and Drinking, with Nicotine as an extraneous variable]
Multivariate Traits
Comorbid psychiatric disorders are common and their determinants are multi-factorial.
Multivariate Traits
In theory, comorbid disorders should be considered together. Technically, testing multiple traits simultaneously can avoid adjusting for multiple testing.
But:
• How beneficial is it to consider multiple traits?
• In what situations is it most beneficial to consider multiple traits?
Multivariate Traits
Although we do not observe the causal relationships between the genotypes and traits or among the traits, we generate the data from 40 directed acyclic graphs (DAGs). For example:
[Figure: example DAG with nodes G, Y1, Y2, and Y3]
An arrow between any two nodes represents a causal relationship.
Graphical Structures for Simulation Models
[Figure panels: DAGs 1-20 and DAGs 21-40]
SEMs for each DAG (quantitative traits)
For Y_j in a DAG, if there exist some arrows pointing to Y_j, say, an arrow from gene G to Y_j and an arrow from Y_k to Y_j, we reflect these relationships through a linear regression model as follows:
$$Y_j = \mu_j + \beta_{Gj} X_G + \beta_{kj} Y_k + \varepsilon_j, \quad j, k = 1, 2, 3,$$
where ε_j is distributed as N(0, σ_j²) and ε_1, ε_2, ε_3 are mutually independent. Conditional on X_G and Y_k, Y_j can be generated from the normal distribution N(μ_j + β_Gj X_G + β_kj Y_k, σ_j²).
If there are no arrows pointing to Y_j, Y_j is independent of the disease gene and the other traits, and is distributed as N(μ_j, σ_j²).
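Without loss of generality, we use the following models for illustration:
$$\begin{aligned} Y_1 &= \mu_1 + \beta_{G1} X_G + \varepsilon_1, \\ Y_2 &= \mu_2 + \beta_{G2} X_G + \beta_{12} Y_1 + \varepsilon_2, \\ Y_3 &= \mu_3 + \beta_{G3} X_G + \beta_{13} Y_1 + \beta_{23} Y_2 + \varepsilon_3. \end{aligned}$$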
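A minimal sketch of generating one individual's traits from such a linear SEM: traits are visited in a topological order of the DAG so that every parent is generated before its children, and each Y_j is drawn from a normal distribution with mean μ_j plus the β-weighted parents and variance σ_j², with independent errors. The dictionary layout, the numeric genotype code x_g, and the function name are assumptions; correlated errors from extraneous variables are not covered. It uses graphlib, available from Python 3.9.

```python
import random
from graphlib import TopologicalSorter
from typing import Dict, List, Optional, Tuple


def simulate_traits(x_g: float,
                    parents: Dict[str, List[str]],
                    beta: Dict[Tuple[str, str], float],
                    mu: Optional[Dict[str, float]] = None,
                    sigma: Optional[Dict[str, float]] = None,
                    rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Draw Y_j = mu_j + sum(beta[(parent, j)] * parent) + N(0, sigma_j^2) per trait.

    parents maps each trait to its parent nodes; "G" denotes the trait locus,
    whose value is the genotype code x_g."""
    mu, sigma = mu or {}, sigma or {}
    rng = rng or random.Random()
    # Only trait-to-trait arrows constrain the order; G is always available.
    graph = {trait: [p for p in ps if p != "G"] for trait, ps in parents.items()}
    values = {"G": x_g}
    for trait in TopologicalSorter(graph).static_order():
        mean = mu.get(trait, 0.0) + sum(beta[(p, trait)] * values[p]
                                        for p in parents.get(trait, []))
        values[trait] = rng.gauss(mean, sigma.get(trait, 1.0))
    del values["G"]
    return values


# Fully connected example: G -> Y1, Y2, Y3; Y1 -> Y2, Y3; Y2 -> Y3.
parents = {"Y1": ["G"], "Y2": ["G", "Y1"], "Y3": ["G", "Y1", "Y2"]}
beta = {("G", "Y1"): 0.3, ("G", "Y2"): 0.3, ("G", "Y3"): 0.3,
        ("Y1", "Y2"): 0.5, ("Y1", "Y3"): 0.5, ("Y2", "Y3"): 0.5}
print(simulate_traits(1, parents, beta, rng=random.Random(2009)))
```

Sorting topologically is what makes the sequential conditional draws valid for any of the DAG structures, and a cyclic input would raise graphlib.CycleError instead of silently producing data.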
Heritability and Interability
Heritability:
$$h_j^2 = \frac{\operatorname{Var}(\beta_{Gj} X_G)}{\operatorname{Var}(Y_j)}$$
Interability:
$$t_{kj}^2 = \frac{\operatorname{Var}(\beta_{kj} Y_k)}{\operatorname{Var}(Y_j)}$$
After some simple algebra, the coefficients β_G1; β_G2, β_12; and β_G3, β_13, β_23 can be expressed in terms of the allele frequency p, the heritabilities h_j, and the interabilities t_kj.
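A sketch of that algebra for Y_1 = β_G1 X_G + ε_1 and Y_2 = β_G2 X_G + β_12 Y_1 + ε_2, with heritability h_j² = Var(β_Gj X_G)/Var(Y_j) and interability t_kj² = Var(β_kj Y_k)/Var(Y_j). It assumes μ_j = 0, σ_j² = 1, an additive genotype code X_G with Var(X_G) = 2p(1−p) under Hardy-Weinberg equilibrium (p the frequency of the coded allele), ε_j independent of X_G, and nonnegative h_j and t_kj. These closed forms are our own derivation under those assumptions, not necessarily the expressions used in the talk. The only non-obvious step is the cross term in Var(Y_2): substituting the three coefficients gives 2β_G2 β_12 Cov(X_G, Y_1) = 2h_1h_2t_12 Var(Y_2).

```latex
\begin{aligned}
\operatorname{Var}(Y_1) &= 2p(1-p)\,\beta_{G1}^2 + 1,
\qquad h_1^2 = \frac{2p(1-p)\,\beta_{G1}^2}{\operatorname{Var}(Y_1)}
\;\Longrightarrow\;
\operatorname{Var}(Y_1) = \frac{1}{1-h_1^2},
\quad \beta_{G1} = \frac{h_1}{\sqrt{2p(1-p)\,(1-h_1^2)}},\\[4pt]
\operatorname{Var}(Y_2) &= 2p(1-p)\,\beta_{G2}^2 + \beta_{12}^2\operatorname{Var}(Y_1)
  + 2\beta_{G2}\beta_{12}\,\underbrace{2p(1-p)\,\beta_{G1}}_{\operatorname{Cov}(X_G,\,Y_1)} + 1
  = \big(h_2^2 + t_{12}^2 + 2h_1h_2t_{12}\big)\operatorname{Var}(Y_2) + 1,\\[4pt]
\operatorname{Var}(Y_2) &= \frac{1}{1-h_2^2-t_{12}^2-2h_1h_2t_{12}},
\qquad \beta_{G2} = \frac{h_2}{\sqrt{2p(1-p)\,(1-h_2^2-t_{12}^2-2h_1h_2t_{12})}},
\qquad \beta_{12} = t_{12}\sqrt{\frac{1-h_1^2}{1-h_2^2-t_{12}^2-2h_1h_2t_{12}}}.
\end{aligned}
```

The same substitution extends to Y_3, which picks up one cross term for each pair of its parents.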
Extraneous Variables (EV)
[Figure: DAG with nodes G, Y1, Y2, Y3 and an extraneous variable EV]
There may exist one or more extraneous variables that are not included among the traits under consideration and that result in correlations among those traits.
To accommodate this situation, we consider that (ε_1, ε_2, ε_3)' is distributed as N(0, Σ), where Σ = (σ_kj) represents the correlation among the traits under consideration that is induced by extraneous variables.
Simulation Design and Settings
• Generate the parents’ genotypes via the haplotype frequencies (AD = 0.2, Ad = 0.1, aD = 0.1, ad = 0.6, where D is the minor allele at trait locus G and A is the minor allele at the marker locus).
• Given the parental genotypes, generate the offspring genotypes using a distance of 1 cM between the trait locus and the marker locus.
• Conditional on the trait genotype, use the SEMs of each DAG discussed above to generate the trait values for different scenarios.
Let μ_j = 0, σ_j² = 1, h_j² = 0.05, and t_kj² = 0.05, 0.15, and 0.35.
In the presence of extraneous variables, we let σ_jj = 1 and σ_kj = 0.2 or −0.2 for k ≠ j.
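A sketch of the genotype-generation step, assuming haplotypes are written marker allele first (so "Ad" carries marker allele A and trait allele d), parents are formed from two independently drawn haplotypes, and 1 cM is converted to a recombination fraction with Haldane's map function (the talk does not say which map function was used). It also checks that these haplotype frequencies imply the linkage disequilibrium coefficient D = P(AD) − P(A)P(D) = 0.2 − 0.3 × 0.3 = 0.11. All names are hypothetical.

```python
import math
import random

HAPLOTYPE_FREQS = {"AD": 0.2, "Ad": 0.1, "aD": 0.1, "ad": 0.6}  # marker allele, trait allele


def ld_coefficient(freqs):
    """D = P(AD) - P(A) * P(D) for marker allele A and trait allele D."""
    p_marker_a = freqs["AD"] + freqs["Ad"]
    p_trait_d = freqs["AD"] + freqs["aD"]
    return freqs["AD"] - p_marker_a * p_trait_d


def haldane_theta(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def draw_parent(freqs, rng):
    """Two haplotypes drawn independently from the population frequencies."""
    haplotypes = list(freqs)
    return tuple(rng.choices(haplotypes, weights=[freqs[h] for h in haplotypes], k=2))


def transmit(parent, theta, rng):
    """One gamete: pick a parental haplotype, recombining with probability theta."""
    first, other = parent if rng.random() < 0.5 else parent[::-1]
    if rng.random() < theta:
        return first[0] + other[1]  # marker allele from one haplotype, trait allele from the other
    return first


def trait_code(child):
    """Number of trait alleles D carried by a child (two gametes)."""
    return sum(h[1] == "D" for h in child)


rng = random.Random(1)
theta = haldane_theta(1.0)  # about 0.0099
father, mother = draw_parent(HAPLOTYPE_FREQS, rng), draw_parent(HAPLOTYPE_FREQS, rng)
child = (transmit(father, theta, rng), transmit(mother, theta, rng))
print(round(ld_coefficient(HAPLOTYPE_FREQS), 4), child, trait_code(child))
```

The child's trait_code would then serve as the genotype input to the SEM for its traits, while only the marker alleles would be passed to the association tests.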
Testing Strategies
• Univariate FBAT (Rabinowitz, 1997; Whittaker and Lewis, 1998)
• FBAT-GEE for multiple traits (Lange et al., 2003)
Type I Errors: Quantitative Traits (alpha = 0.01)
                       t_kj² = 0.05        t_kj² = 0.35        t_kj² = 0.15
σ_kj   Structure No.   Un-FBAT  FBAT-GEE   Un-FBAT  FBAT-GEE   Un-FBAT  FBAT-GEE
--     S1              0.0099   0.0100     0.0099   0.0100     0.0099   0.0100
--     S2              0.0096   0.0096     0.0085   0.0092     0.0101   0.0097
--     S3              0.0088   0.0095     0.0092   0.0091     0.0081   0.0089
--     S4              0.0098   0.0095     0.0095   0.0098     0.0092   0.0093
--     S5              0.0095   0.0091     0.0094   0.0091     0.0098   0.0099
--     S6              0.0090   0.0093     0.0091   0.0091     0.0070   0.0085
0.2    S1              0.0090   0.0097     0.0090   0.0097     0.0090   0.0097
0.2    S2              0.0100   0.0101     0.0094   0.0097     0.0094   0.0097
0.2    S3              0.0101   0.0101     0.0092   0.0096     0.0084   0.0096
0.2    S4              0.0095   0.0099     0.0101   0.0102     0.0087   0.0102
0.2    S5              0.0099   0.0100     0.0092   0.0101     0.0085   0.0095
0.2    S6              0.0093   0.0092     0.0080   0.0092     0.0078   0.0096
-0.2   S1              0.0095   0.0097     0.0095   0.0097     0.0095   0.0097
-0.2   S2              0.0102   0.0101     0.0095   0.0097     0.0094   0.0097
-0.2   S3              0.0104   0.0089     0.0098   0.0096     0.0093   0.0096
-0.2   S4              0.0098   0.0096     0.0094   0.0097     0.0103   0.0102
-0.2   S5              0.0090   0.0096     0.0095   0.0097     0.0093   0.0097
-0.2   S6              0.0093   0.0091     0.0094   0.0096     0.0078   0.0097
Power: Quantitative Traits (alpha = 0.01)
[Figure: three panels (t_kj² = 0.35, 0.15, and 0.05) plotting power (0.0 to 1.0) against structure no. 7 to 40; FBAT: dots, FBAT-GEE: triangles; Black: σ_kj = 0, Red: σ_kj = 0.2, Green: σ_kj = −0.2]
Multivariate Trait
Kendall’s Tau: a non-parametric statistic measuring the strength of the relationship between two variables
Let (X_i, Y_i) and (X_j, Y_j) be a pair of observations. If X_i − X_j and Y_i − Y_j have the same sign, we say that the pair is concordant. If they have different signs, we say that the pair is discordant. For a sample of size n, Kendall’s tau is defined as
$$\tau = \frac{2(C - D)}{n(n-1)},$$
where C and D are the numbers of concordant and discordant pairs.
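A direct sketch of the definition above, counting concordant and discordant pairs in O(n²) time. Pairs tied in X or Y count as neither, which gives the version written here (often called tau-a); library routines such as scipy.stats.kendalltau default to the tie-adjusted tau-b, so results can differ when ties are present.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """tau = 2(C - D) / (n(n - 1)), with C/D the numbers of concordant/discordant pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:      # differences have the same sign
            concordant += 1
        elif s < 0:    # differences have different signs
            discordant += 1
    return 2 * (concordant - discordant) / (n * (n - 1))


print(kendall_tau([1, 2, 3], [1, 3, 2]))  # C = 2, D = 1 -> 2 * 1 / 6
```

Because it uses only the signs of pairwise differences, the statistic is unchanged by any monotone recoding of either variable, which is what makes it attractive for ordinal severity scores.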
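Association Test
Notations:
Observations: a vector of traits T = (T^{(1)}, …, T^{(p)})' and a vector of markers M = (M^{(1)}, …, M^{(G)})'.
Let
$$U = \binom{n}{2}^{-1} \sum_{i<j} u_{ij} \otimes v_{ij},$$
where
$$u_{ij} = \big(f_1(T_i^{(1)} - T_j^{(1)}), \ldots, f_p(T_i^{(p)} - T_j^{(p)})\big)', \qquad v_{ij} = \big(C_i^{(1)} - C_j^{(1)}, \ldots, C_i^{(G)} - C_j^{(G)}\big)'.$$
Test Statistic:
$$W = U'\{\operatorname{Var}(U)\}^{-1} U,$$
which under H_0 is asymptotically distributed as χ² with rank(Var(U)) degrees of freedom.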
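A sketch of a pairwise U-statistic of this kind for unrelated individuals, under stated assumptions: the trait function is f = sign for every trait (a Kendall-style choice), marker codes are numeric, the product u_ij ⊗ v_ij is a Kronecker product so that U is a vector of length p·G, and Var(U) is estimated with the standard first-order (Hoeffding projection) approximation 4/n · Cov(ĥ_1), where ĥ_1(i) averages the kernel over j ≠ i. Family structure is ignored, and this variance estimator is a generic choice rather than necessarily the authors' one; a Moore-Penrose pseudo-inverse stands in for the inverse when Var(U) is singular.

```python
import numpy as np


def multitrait_u_test(traits, markers):
    """traits: (n, p) trait values; markers: (n, G) numeric marker codes.

    Returns (U, W, df): U = (n choose 2)^{-1} sum_{i<j} u_ij kron v_ij with
    u_ij = sign(T_i - T_j) and v_ij = M_i - M_j, and W = U' Var(U)^+ U."""
    traits = np.asarray(traits, dtype=float)
    markers = np.asarray(markers, dtype=float)
    n = traits.shape[0]
    u = np.sign(traits[:, None, :] - traits[None, :, :])          # (n, n, p)
    v = markers[:, None, :] - markers[None, :, :]                 # (n, n, G)
    h = (u[:, :, :, None] * v[:, :, None, :]).reshape(n, n, -1)   # kernel, (n, n, p*G)
    i_upper, j_upper = np.triu_indices(n, k=1)
    U = h[i_upper, j_upper].mean(axis=0)
    # Hoeffding projection: average kernel for each i over j != i (h[i, i] = 0).
    h1 = h.sum(axis=1) / (n - 1)
    var_u = 4.0 / n * np.atleast_2d(np.cov(h1, rowvar=False))
    W = float(U @ np.linalg.pinv(var_u) @ U)
    df = int(np.linalg.matrix_rank(var_u))
    return U, W, df


rng = np.random.default_rng(0)
T = rng.normal(size=(60, 3))            # p = 3 traits
M = rng.integers(0, 3, size=(60, 2))    # G = 2 markers coded 0/1/2
U, W, df = multitrait_u_test(T, M)
print(U.shape, round(W, 3), df)
```

A p-value would then come from the χ² distribution with df degrees of freedom (for example scipy.stats.chi2.sf(W, df)). The sketch holds all n² kernel values in memory, so it is only meant for small samples.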
Simulation Study: Model Setting
• Nominal type I error comparison: the coefficient of linkage disequilibrium takes the value 0.
• Power evaluation: the coefficient of linkage disequilibrium takes the value 0.11.
Given the genotype at the trait locus, a non-proportional odds model is used to generate ordinal phenotype data, and a Gaussian model is used for quantitative phenotypes.
Type I Error Comparison
                  alpha = 0.05       alpha = 0.01       alpha = 0.001
#(family)  K      O-FBAT   FBAT      O-FBAT   FBAT      O-FBAT   FBAT
200        3      0.043    0.044     0.009    0.009     0.001    0.001
200        4      0.049    0.051     0.008    0.007     0.001    0.001
200        5      0.059    0.062     0.013    0.010     <0.001   <0.001
200        6      0.047    0.043     0.005    0.005     <0.001   <0.001
400        3      0.049    0.051     0.012    0.009     0.002    0.002
400        4      0.055    0.054     0.009    0.011     0.001    0.001
400        5      0.042    0.041     0.006    0.006     0.001    0.002
400        6      0.045    0.045     0.006    0.008     0.001    0.001
600        3      0.036    0.038     0.006    0.006     <0.001   <0.001
600        4      0.054    0.055     0.013    0.010     0.001    0.001
600        5      0.061    0.055     0.005    0.009     0.001    <0.001
600        6      0.038    0.038     0.006    0.007     <0.001   <0.001
Power Comparison
                  alpha = 0.05       alpha = 0.01       alpha = 0.001
#(family)  K      O-FBAT   FBAT      O-FBAT   FBAT      O-FBAT   FBAT
200        3      0.783    0.778     0.553    0.541     0.261    0.249
200        4      0.732    0.702     0.492    0.456     0.213    0.184
200        5      0.760    0.672     0.541    0.429     0.277    0.193
200        6      0.504    0.403     0.266    0.184     0.076    0.042
400        3      0.980    0.982     0.922    0.916     0.757    0.752
400        4      0.961    0.946     0.882    0.857     0.664    0.627
400        5      0.978    0.949     0.914    0.839     0.757    0.604
400        6      0.792    0.664     0.584    0.437     0.328    0.203
600        3      0.999    0.999     0.989    0.991     0.958    0.954
600        4      0.996    0.988     0.978    0.970     0.920    0.885
600        5      0.999    0.990     0.987    0.957     0.935    0.837
600        6      0.947    0.859     0.826    0.658     0.582    0.379
Application for COGA Data
• Phenotypes:
  – Alcohol DX-DSM3R+Feighner (ALDX1): 4 categories
  – Maximum number of drinks in a 24-hour period (MaxDrink): 4 categories
  – Spent so much time drinking, had little time for anything else (TimeDrink): 3 categories
Single trait analysis
The p-value at D7S679 for ALDX1 is 0.002879, which is above the Bonferroni threshold 0.000538 = 0.05/(3 × 31).
Multiple traits analysis
The p-value is 0.000553 < 0.0016129 = 0.05/31 at marker D7S679, which is around 1 cM away from D7S1793, a marker that has been reported to show linkage evidence.
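Both thresholds are Bonferroni corrections. Spelling out the arithmetic, assuming 31 markers and, for the single-trait analysis, 3 traits tested separately at each marker, shows why the joint test helps at D7S679: it pays for 31 tests instead of 93.

```python
def bonferroni(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests


single_trait = bonferroni(0.05, 3 * 31)  # 3 traits x 31 markers -> about 0.000538
multi_trait = bonferroni(0.05, 31)       # one joint test per marker -> about 0.0016129
print(0.002879 < single_trait, 0.000553 < multi_trait)  # False True
```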
Closing Comments
• Genetic studies of mental diseases involve many challenges: some are clinical, some are statistical, and some are scientific.
• We attempted to deal with a few statistical challenges. It remains to be seen whether we have succeeded; however, our solutions appear promising.
• We need more people to pay attention to these challenges and be persistent in our pursuit.
Acknowledgements
Xiang Chen, Rui Feng, Ching-Ti Liu, Xueqin Wang, Minghui Wang, Yuanqing Ye, Meizhuo Zhang, Wensheng Zhu