journal of neuroscience methodswyp/resource/papers/d lin.pdf · imaging and genetic data dongdong...

10
Journal of Neuroscience Methods 237 (2014) 69–78 Contents lists available at ScienceDirect Journal of Neuroscience Methods jo ur nal ho me p age: www.elsevier.com/locate/jneumeth Computational Neuroscience Review Sparse models for correlative and integrative analysis of imaging and genetic data Dongdong Lin a,b , Hongbao Cao c , Vince D. Calhoun d,e , Yu-Ping Wang a,b,a Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USA b Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA c Unit on Statistical Genomics, Intramural Program of Research, National Institute of Mental Health, NIH, Bethesda 20852, USA d The Mind Research Network & LBERI, Albuquerque, NM 87106, USA e Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131, USA h i g h l i g h t s We review integration methods for imaging and genomic data analysis. We focus on our efforts in developing sparse models for imaging and genomic data integration. We show real examples on applications of sparse models to detecting genes and diseases diagnosis. We give a perspective on future research directions in imaging genomics. a r t i c l e i n f o Article history: Received 5 May 2014 Received in revised form 27 August 2014 Accepted 1 September 2014 Available online 9 September 2014 Keywords: Imaging genetics Sparse modeling Correspondence analysis Integration Classification a b s t r a c t The development of advanced medical imaging technologies and high-throughput genomic measure- ments has enhanced our ability to understand their interplay as well as their relationship with human behavior by integrating these two types of datasets. However, the high dimensionality and heterogeneity of these datasets presents a challenge to conventional statistical methods; there is a high demand for the development of both correlative and integrative analysis approaches. Here, we review our recent work on developing sparse representation based approaches to address this challenge. We show how sparse models are applied to the correlation and integration of imaging and genetic data for biomarker identification. We present examples on how these approaches are used for the detection of risk genes and classification of complex diseases such as schizophrenia. Finally, we discuss future directions on the integration of multiple imaging and genomic datasets including their interactions such as epistasis. © 2014 Elsevier B.V. All rights reserved. Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 2. Sparse models for [correlation/association] analysis of imaging and genetic data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 2.1. Sparse models for two block analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 2.2. Sparse multivariate regression models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 3. Sparse model for integration of imaging and genetic data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4. Application to detection of genes and disease diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.1. Detection of risk genes with correlative models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 4.2. Better diagnosis with integrated information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 5. Conclusion and future direction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 Corresponding author at: Biomedical Engineering Department, Tulane University, New Orleans, LA, USA. Tel.: +1 504 865 5867. E-mail addresses: [email protected] (D. Lin), [email protected] (H. Cao), [email protected] (V.D. Calhoun), [email protected] (Y.-P. Wang). http://dx.doi.org/10.1016/j.jneumeth.2014.09.001 0165-0270/© 2014 Elsevier B.V. All rights reserved.

Upload: others

Post on 13-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Journal of Neuroscience Methodswyp/resource/papers/D Lin.pdf · imaging and genetic data Dongdong ... derived from brain imaging are reproducible and reliable with high heritability,

CR

So

Da

b

c

d

e

h

••••

a

ARRAA

KISCIC

C

h0

Journal of Neuroscience Methods 237 (2014) 69–78

Contents lists available at ScienceDirect

Journal of Neuroscience Methods

jo ur nal ho me p age: www.elsev ier .com/ locate / jneumeth

omputational Neuroscienceeview

parse models for correlative and integrative analysisf imaging and genetic data

ongdong Lina,b, Hongbao Caoc, Vince D. Calhound,e, Yu-Ping Wanga,b,∗

Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USACenter of Genomics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USAUnit on Statistical Genomics, Intramural Program of Research, National Institute of Mental Health, NIH, Bethesda 20852, USAThe Mind Research Network & LBERI, Albuquerque, NM 87106, USADepartment of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131, USA

i g h l i g h t s

We review integration methods for imaging and genomic data analysis.We focus on our efforts in developing sparse models for imaging and genomic data integration.We show real examples on applications of sparse models to detecting genes and diseases diagnosis.We give a perspective on future research directions in imaging genomics.

r t i c l e i n f o

rticle history:eceived 5 May 2014eceived in revised form 27 August 2014ccepted 1 September 2014vailable online 9 September 2014

a b s t r a c t

The development of advanced medical imaging technologies and high-throughput genomic measure-ments has enhanced our ability to understand their interplay as well as their relationship with humanbehavior by integrating these two types of datasets. However, the high dimensionality and heterogeneityof these datasets presents a challenge to conventional statistical methods; there is a high demand forthe development of both correlative and integrative analysis approaches. Here, we review our recent

eywords:maging geneticsparse modelingorrespondence analysis

work on developing sparse representation based approaches to address this challenge. We show howsparse models are applied to the correlation and integration of imaging and genetic data for biomarkeridentification. We present examples on how these approaches are used for the detection of risk genesand classification of complex diseases such as schizophrenia. Finally, we discuss future directions on the

ntegrationlassification

integration of multiple imaging and genomic datasets including their interactions such as epistasis.© 2014 Elsevier B.V. All rights reserved.

ontents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702. Sparse models for [correlation/association] analysis of imaging and genetic data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

2.1. Sparse models for two block analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712.2. Sparse multivariate regression models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

3. Sparse model for integration of imaging and genetic data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734. Application to detection of genes and disease diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4.1. Detection of risk genes with correlative models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4.2. Better diagnosis with integrated information . . . . . . . . . . . . . . . . . . . . . .

5. Conclusion and future direction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

∗ Corresponding author at: Biomedical Engineering Department, Tulane University, NeE-mail addresses: [email protected] (D. Lin), [email protected] (H. Cao), vcalhoun@

ttp://dx.doi.org/10.1016/j.jneumeth.2014.09.001165-0270/© 2014 Elsevier B.V. All rights reserved.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

w Orleans, LA, USA. Tel.: +1 504 865 5867.unm.edu (V.D. Calhoun), [email protected] (Y.-P. Wang).

Page 2: Journal of Neuroscience Methodswyp/resource/papers/D Lin.pdf · imaging and genetic data Dongdong ... derived from brain imaging are reproducible and reliable with high heritability,

7 scienc

1

itrpi(psbatgrgeoiiltNipdm

tioGa(oeqptetracererf

ifeb2hMttciudao

0 D. Lin et al. / Journal of Neuro

. Introduction

In the past decades, increasing development of medical imag-ng and genomic techniques provides new opportunities to studyissue structure, function, and genetic variation as well as theirelationship with human behavioral components, e.g., cognitivehenotypes and psychiatric disorders. For example, medical imag-

ng measurements such as structural magnetic resonance imagingsMRI), functional MRI (fMRI), diffusion tensor imaging (DTI) andositron emission tomography (PET), provide quantitative mea-urements of structural information at brain tissue level, dynamiclood oxygenation-level dependent (BOLD) response of neuralctivity and brain structural and functional connectivity. In addi-ion, genetic measurements such as single polymorphism (SNP),ene expression, copy number variation (CNV) and proteomics caneveal structural and functional variations at molecular level. Theoal of imaging genetics is to identify genetic factors that influ-nce the intermediate quantitative measurements from anatomicalr functional images, and the cognition and psychiatric disordersn humans. Rasetti and Weinberger (2011) described a cascade ofmaging genetic studies, in which mutations start from geneticevel to cellular processes, to the system level, e.g., brain struc-ure, function and integrity, and eventually to human behaviors.umerous examples like this demonstrate that the fusion of

maging and genetics will facilitate the understanding of the patho-hysiology, the diagnosis of complex and heritable psychiatricisorders and the optimization of treatments in a personalizedanner.In recent imaging genetic studies, neural imaging endopheno-

ypes (or intermediate phenotype) derived from diverse medicalmages, are commonly used for genetic analysis. By the definitionf ‘endophenotype’ by psychiatric geneticists (Shen et al., 2013;ottesman and Gould, 2003), an endophenotype should: (1) bessociated with the illness/disorder of interest; (2) be heritable;3) be state-independent; (4) exist temporally before the onsetf the clinical illness in the pathophysiological pathway to themergence of the clinical syndrome; (5) be found with higher fre-uency in healthy relatives of illness/disorder than in the generalopulation. Each of these criteria is based on the hypothesis thathe effects of susceptible genes will be more penetrated to thendophenotyes. The endophenotypes are considered to be closer tohe biology of genetic function than the diagnostic results from self-eported and questionnaire-based clinical assessments (Gottesmannd Gould, 2003; Meyer-Lindenberg and Weinberger, 2006), whichan boost the causal variants detection power. Some quantitativendophenotypes derived from brain imaging are reproducible andeliable with high heritability, and can accommodate highly het-rogeneous symptoms from patients in the same group. For theseeasons, obtaining reliable and heritable endophenotypes is criticalor imaging genetics studies.

Many endophenotypes have been used in imaging genetic stud-es such as voxel-, vertex-, surface- or connection-based measuresrom structural, functional or diffusion images, respectively. Forxample, on structural MRI, the volumetric measures of total cere-ral and gray and white matters (Nymberg et al., 2013; Baaré et al.,001), cortical thickness and cortical area (Winkler et al., 2010)ave been studied as quantitative traits. On rest- or task-functionalRI images, there are functional endophenotypes utilized such as

he extent of activation or deactivation for each voxel responding tohe task-related stimuli, t-test contrast map, and functional brainonnectivity (van den Heuvel et al., 2013). On DTI images, brainntegrity (e.g., fractional anisotropy and mean diffusivity), meas-

res of coherent direction of axons (e.g., radial diffusivity and axialiffusivity) (Jahanshad et al., 2013; Kochunov et al., 2010) and thenatomical connectivity such as the measurement of fiber densityr integrity have also been explored. Moreover, due to the high

e Methods 237 (2014) 69–78

resolution of brain imaging (e.g., structural MRI), we can analyzethe genetic influence on diverse imaging endophenotypes acrossthe entire brain, which facilitates the understanding of underlyingneurobiological mechanism of psychiatric disorders.

Fig. 1 shows an integrative approach for combining imaging andgenetics techniques for biomarker detection, from which multi-ple psychiatric disorders or subtypes can be better classified. Wewill elaborate on this approach in the following three aspects: (1)Between modality analysis to explore the correspondence or asso-ciation between imaging and genetic data. One type of data is takenas an endophenotype (e.g., brain structure (Winkler et al., 2010),fiber integrity (Jahanshad et al., 2013), functional connectivity andnetwork (Thompson et al., 2013)) to find the correlated or asso-ciated variables in the other data (e.g., genotype (Filippini et al.,2009)). The imaging endophenotypes are usually used as quantita-tive traits to explore the potential genetic risk factors. As reviewedin (Liu and Calhoun, 2014; Ge et al., 2013), the research in thisarea has evolved from candidate approaches to the whole genomeand whole brain investigation, and many complicated models areproposed to account for the increasing number of variables. (2)Integration of imaging and genetic data for biomarker identifi-cation. A variety of medical imaging modalities (e.g., sMRI, fMRIand DTI) provide different insights on the change of brain or neu-ron activity at tissue-level, while genetic data (e.g., SNP, mRNAexpression, DNA methylation and proteomics) measure differentlayers of genetic information at the molecular level. These differ-ent types of data are complementary, so combing these multiplemodalities is likely to facilitate a better identification of biomarkersand a more comprehensive diagnosis of complex diseases. (3) Sub-typing or classification of diseases by using biomarkers extractedfrom multimodal data. The identified biomarkers from imaging andgenetic data contain complementary information about multiplepsychiatric disorders (e.g., schizophrenia, bipolar and unipolar dis-orders). By using these biomarkers as features and input into alinear or non-linear classifier, we can achieve better disease clas-sification, translating into more accurate diagnosis and ideally aclinical impact.

Many processing strategies and analysis approaches have beenproposed to combine imaging and genetic information. For exam-ple, as reviewed in (Hibar et al., 2011a), between modality analysismethods can be categorized into univariate and multivariate imag-ing genetic analysis. Voxel-wise genome-wide association study(vGWAS; Stein et al., 2010) and voxel-wise gene-wide study (vGe-neWAS; Hibar et al., 2011b) have been used to screen each pairof SNP/gene and voxel in maps of regional brain volume underthe control of multiple comparisons. Canonical correlation anal-ysis (CCA; Correa et al., 2008; Sui et al., 2010), partial least square(PLS; Wold et al., 1983; Krishnan et al., 2011) and parallel ICA (Liuet al., 2009) have been applied to extract a pair of correlated latentvariables from imaging and genetic datasets. Kernel machine based(Ge et al., 2012) and Bayes methods (Stingo et al., 2013) have alsobeen proposed for imaging genetics analysis. For biomarker iden-tification, joint ICA (Sui et al., 2011), multi-set CCA (Correa et al.,2010), multi-table PLS (Caplan et al., 2007) and multi-task learningmethods (Zhou et al., 2011) have been used in modeling multi-modal imaging, genetic and human behavioral data. Based on theidentified biomarkers, a variety of classifiers such as support vec-tor machine (Mourão-Miranda et al., 2005; Yang et al., 2010a) andmultiple kernel learning (Castro et al., 2014; Ji et al., 2008) havebeen applied to the classification of complex diseases.

Despite the success of these methods in the analysis of imagingand genetic data, there are still challenges due to the high dimen-

sionality and heterogeneity of these datasets. For example, manyconventional statistical methods such as CCA, PLS and ICA performpoorly for data with smaller sample size but with larger number offeatures/variables (e.g., voxels and SNPs). A dimensional reduction
Page 3: Journal of Neuroscience Methodswyp/resource/papers/D Lin.pdf · imaging and genetic data Dongdong ... derived from brain imaging are reproducible and reliable with high heritability,

D. Lin et al. / Journal of Neuroscience Methods 237 (2014) 69–78 71

Fig. 1. The illustration of an integrative approach for imaging genetics study. Here, correlation/association is to find genetic biomarkers correlated/associated with imagingendophenotypes extracted from medical images and related to the disease of interest. Different from the correlation/association analysis, integration analysis aims to integratec rs andt genes

aomvariimffaigta

rodao

2i

cdl2gpn

omplementary imaging and genetic data for identifying disease-related biomarkeheir connectivity). The identified biomarkers will be then used for the detection of

pproach, e.g., principle component analysis or univariate test, isften used, which, however, may cause the loss of useful infor-ation. Statistical tests such as vGWAS (Stein et al., 2010) and

GeneWAS (Hibar et al., 2011b) may not have enough power due to large number of multiple comparisons. In addition, high collinea-ity among whole brain or genome-wide variables may introducenstability and computational problems in the model (e.g., non-nvertible of matrix, over-fitting). Sparse representation, a powerful

ethod recently developed in statistics and signal processing, hasound successful applications in a broad of fields such as bioin-ormatics (Lin et al., 2013a), remote sensing (Chen et al., 2011)nd image processing (Li et al., 2013). We have been develop-ng sparse models for imaging genetics study, which demonstratereat advantages for the identification of biomarkers, leadingo more accurate classification of diseases than many existingpproaches.

In this review, we will review recent developments of sparseepresentation based methods for imaging genetics, with a focusn our own work. We will show how a variety of sparse models areeveloped and applied to the correlation and integration of imagingnd genomic data. Finally, we conclude the review with a discussionf future research directions.

. Sparse models for [correlation/association] analysis ofmaging and genetic data

Fig. 2 shows three types of sparse models in general for theorrelation/association analysis between imaging and geneticata. In the first approach (Fig. 2(A)), sparse penalties such as

asso (Kohannim et al., 2012), group lasso (Silver and Montana,

012), fuse lasso (Yang et al., 2012; Chen et al., 2010) and sparseroup lasso (Silver et al., 2013), have been used in the model toerform feature selection in a dataset (e.g., SNP data). A smallumber of SNPs (i.e., non-zero entries in the vector) are identified

their interactions (e.g., genetic variants and their network, risk brain regions and and/or the diagnosis/subtyping of complex mental illnesses.

to be associated with specific imaging phenotypes. In the secondand third approaches, the two-block methods (Fig. 2(B)) andsparse multivariate regression models (Fig. 2(C)) are used. Both aremultivariate models to account for pleiotropic effects (Andreassenet al., 2013; Mier et al., 2010) (i.e., genetic variants associated withmultiple imaging quantitative traits) and the covariance structureamong imaging endophenotypes (Alexander-Bloch et al., 2013;Bullmore and Sporns, 2009). Two block methods include sparseCCA (Le Cao et al., 2009; Lin et al., 2013b), regularized kernel CCA(Waaijenborg and Zwinderman, 2009) and sparse PLS correlation(PLSC) (Le Floch et al., 2012; Le Cao et al., 2008), which are usuallyused for correspondence analysis between two modalities. Le Flochet al. (2012) compared the performance of detecting the correla-tion between fMRI imaging and SNP data with different methodssuch as univariate approach, PLSC, sparse PLSC, regularized kernelCCA, and their combinations with PCA and pre-filters. The resultsshow the best performance of using filtering in combination withsparse PLSC. Sparse multivariate regression is another way to studythe association between multiple imaging and genetic factors,which includes sparse multi-task regression (Wang et al., 2012a),sparse reduced rank regression (sRRR) (Vounou et al., 2010, 2012;Wang et al., 2012b), collaborative sRRR (Lin et al., 2013c) andsparse PLS regression (Chun and Keles , 2010). These differentmodels incorporate prior knowledge with different ways, leadingto different results. In the following, we present several modelsfor the correlation or association analysis of imaging and genomicdata, with a focus on our own work.

2.1. Sparse models for two block analysis

Due to high dimensionality and small sample size of neuroimag-ing and genetic datasets, there is usually an issue of high collinearityin the data. To this end, several types of regularized CCA and PLSapproaches have been proposed, which enforce different sparse

Page 4: Journal of Neuroscience Methodswyp/resource/papers/D Lin.pdf · imaging and genetic data Dongdong ... derived from brain imaging are reproducible and reliable with high heritability,

72 D. Lin et al. / Journal of Neuroscience Methods 237 (2014) 69–78

Fig. 2. An illustration of three types of analysis methods with sparse penalizations for imaging genetics study. (A) Sparse univariate-imaging and multivariate-geneticregression is a sparse multiple regression model used to identify multiple genetic factors associated with a single imaging endopheontype. (B) Sparse two-block methodsinclude a number of correspondence analysis methods using sparse latent variable models, i.e., sCCA, sparse kernel CCA and sPLS. These methods can be used to explore apair of correlated latent variables, which consists of linear or non-linear combination of sparse features from each dataset. (C) Sparse multivariate regression is a regressionbased model to identify a set of genetic factors associated with a set of imaging endophenotypes, represented by the coefficient matrix penalized with a variety of sparseterms.

Page 5: Journal of Neuroscience Methodswyp/resource/papers/D Lin.pdf · imaging and genetic data Dongdong ... derived from brain imaging are reproducible and reliable with high heritability,

scienc

pl(

m

wvnu∑atdcw

mueCTCamIshlutecmfsvggtfw

2

adiof

m

wdaTf

Stit

D. Lin et al. / Journal of Neuro

enalties (e.g., lasso, elastic net and sparse group lasso) on theoading vectors into the model. We present a unified frameworkLin et al., 2013a) to formulate sparse CCA models as in Eq. (1):

inu,v − ut∑

XYv + �1‖u‖G + �1‖u‖1 + �2‖v‖G + �2‖v‖1 s.t.

ut∑

XXu ≤ 1, vt

∑YY

v ≤ 1 (1)

here X,Y are the two data matrices; u and v are the loadingectors constrained by sparse terms; ||u||1 and ||v||1 are the l − 1orm based lasso penalty to perform the selection of individ-al variables/features in two datasets respectively; and ‖u‖G =

Ll=1ωl

∥∥ul

∥∥2, ‖v‖G =

∑Hh=1�h

∥∥vh

∥∥2

are the group penalties toccount for joint effects of features within the same group inwo data sets respectively. The group penalty uses the non-iffentiability of ||ul||2 (or ||vh||2) at ul = 0 (vh = 0) to set theoefficients of the group to 0, such that the entire group of featuresill be removed to achieve group sparsity.

The above model is realistic for the analysis of many cases, e.g.,ultiple SNPs rather than individual SNPs from the same gene

sually function together as a group to be associated with a dis-ase. This general formulation includes a variety of existing sparseCA models as specific examples (Le Cao et al., 2009; Witten andibshirani, 2009), e.g., CCA-l1 or CCA-elastic net (�1 = 0, �2 = 0) andCA-group lasso (�1 = 0, �2 = 0) models. The solution of Eq. (1) usu-lly involves with the inversion of the covariance matrices, whichight be non-invertible because of the high dimensionality of data.

n the simplified case when the covariance matrix is diagonal,parse CCA model also reduces to sparse PLS model. A simulationas been performed to evaluate the efficiency of CCA-sparse group

asso model. Fig. 3(a) shows the results of recovered loading vectors and v by CCA-l1, CCA-group and CCA-sparse group models respec-ively. It can be seen that the CCA-sparse group model can betterstimate true u and v than CCA-l1, and CCA-group model. Fig. 3(b)ompares the accuracy of recovering loading vectors from threeethods with respect to different noise levels corresponding to dif-

erent degrees of correlations between the two data sets. The resulthows that the CCA-group model can recover the most correlatedariables but give the highest total discordance. The CCA-sparseroup model has a comparable recovering accuracy as the CCA-roup model but with much less total discordance, especially whenhe noise level decreases. We have also applied these methods toMRI data and SNP data to identify significant sets of SNPs correlatedith brain region activity.

.2. Sparse multivariate regression models

Multivariate regression model (Fig. 2(c)) is another popularpproach for correlation analysis between imaging and genomicata, i.e., associating the entire genetic variants with whole brain

maging measurements. A regression model with regularizationn the coefficient matrix can be described by the followingormula:

inA||Y − XA||2F + �(A) (2)

here X, Y are the two data matrices (e.g., genomics and imagingatasets) with n × p and n × q (p,q � n) dimensions respectively;nd A is p × q coefficient matrix penalized by function �(·).here are several models that vary in the use of penalizedunctions.

A sparse multi-task learning method was applied to MRI and

NP datasets in (Wang et al., 2012a) with a combination of penaltyerms on A, �(A) = �1

∑Ki=1||A[i]||F + �2

∑qj=1||Aj||1, where || A[i]||F

ndicates a collaborative group lasso penalization on submatrix A[i]o account for the group structure of SNPs, e.g., multiple SNPs with

e Methods 237 (2014) 69–78 73

linkage disequilibrium (LD) or from the same gene; || Aj||1 is a rowsparse penalty on each individual SNP. The simulation experimentshows better performance of this method than multivariate linearregression, ridge regression and multitask feature learning modelsin predicting SNPs using longitudinal data.

Sparse low rank regression model was developed to considerthe correlations among features in Y matrix. A low rank regulariza-tion was imposed on the coefficient matrix A due to the collinearityin Y. Vounou et al. (2010) proposed a sparse reduced rank regres-sion method (sRRR) by decomposing coefficient matrix into twofull-rank matrices, i.e., A = PQ, P ∈ Rp×r, Q ∈ Rr×q„ where r is the rankof A (r < p, q). At each stage sRRR performs unit rank decompositionto extract a pair of column vector Pk, Qk, k = 1,2,. . .,r, which are alsoconstrained to be sparse (i.e., with the use of ||Pk||1, ||Qk||1). ThesRRR was tested to have higher sensitivity than univariate method.It was applied to a whole brain and whole genome data set toidentify those genetic variants associated with Alzheimer-relatedimaging biomarkers. A pathway sparse reduced rank (PsRRR) model(Silver et al., 2012), based on sRRR, was developed to consider thejoint effects of SNPs within the same pathway by enforcing a grouplasso penalty (i.e.,

∑Gg=1||Pg

k||2), followed by a SNP-level selection at

each pathway. A unit rank decomposition method is needed to esti-mate rank r, which cannot perform feature selection across ranks.Instead of decomposing A into two matrices, Wang et al. (2012b)applied a trace-norm penalty || · ||* to enforce the low rank of Aand combined it with group lasso penalty || · ||2,1. The model isapplied to the longitudinal imaging genetic data to identify imag-ing biomarkers associated with a set of SNPs. To further considerthe group effects of SNPs and gene–gene interactions, we proposeda collaborative sparse reduce rank regression (c-sRRR) (Lin et al.,2013c) to incorporate protein–protein interaction information intothe model for the grouping of SNPs. The method can perform bi-level selection at both SNP- and module-levels. Several top genemodules are identified to be associated with functional networkfrom postcentral and precentralgyri. The genes from these modulesare enriched in some susceptible schizophrenia-related pathways(Lin et al., 2013c).

In addition to the sparse models discussed above, there are othersparse multivariate regression analysis methods for integrativeanalysis of imaging and genetic data. For example, Liu et al. (2014)applied an overlapped group fused lasso to incorporate geneticinformation into the multivariate regression model to identify thebrain connectivity pattern in resting fMRI data.

3. Sparse model for integration of imaging and genetic data

Integrative analysis of multiple datasets can combine com-plementary information from each individual data and thereforemay provide higher power to identify potential biomarkers thatwould otherwise be missed by using individual data alone. Dueto different characteristics of diverse data modality (e.g., resolu-tion, size and format), data integration is challenging. Wang et al.(2012c) proposed a sparse multimodal multitask learning methodto combine sMRI, PET and GWAS data to identify genetic andphenotypic biomarkers associated with Alzheimer disease. In themodel, the variables from each modality were combined equallyand group lasso penalty was applied to identify those variablesshared by multiple tasks (e.g., disease diagnosis, quantitative traitassociation). We address the data integration problem by devel-oping a generalized sparse model (GSM) with weighting factors toaccount for the contribution of different data in data combination,

as shown in Fig. 4. To solve the small-sample-large-variable prob-lem, we developed a novel sparse representation based variableselection (SRVS) algorithm, which is described as the follow-ing, For the purpose of illustration, we consider joint analysis of
Page 6: Journal of Neuroscience Methodswyp/resource/papers/D Lin.pdf · imaging and genetic data Dongdong ... derived from brain imaging are reproducible and reliable with high heritability,

74 D. Lin et al. / Journal of Neuroscience Methods 237 (2014) 69–78

Fig. 3. A comparison of group sparse CCA with the other sparse CCA models (e.g., CCA-lassfor simulated genetic and imaging data respectively) by three different models. (b) The tobetween imaging and genetic datasets.

Fdib

ig. 4. The representation of phentoypes or disease states by the fusion of twoatasets A1 and A2 with a parallel sparse model, where the correlative information

s represented simultaneously by non-zero entries in a sparse vector. The model cane easily extended to include multiple data sets.

o, CCA-group lasso). (a) The accuracy of recovering the loading vectors (i.e., u and vtal true positives and total discordance with respect to different correlation values

two types of data (which can be easily generalized to multipledata):

Y = [˛1A1, ˛2A2]

[X1

X2

]= ε + AX = ε, (3)

where Y ∈ Rm×1 is the observation vector (phenotypes of thesubjects or disease status); A1 ∈ Rm×n1 and A2 ∈ Rm×n2 are the mea-surements of two different data types; A = [˛1A1, ˛2A2] ∈ Rm×n,where ˛1 + ˛2 = 1, and ˛1, ˛2 > 0 are the weight factors for the twotypes of data; and ε ∈ Rm×1 is the measurement error due to noise.The biomarker/variable selection is formulated as the solution ofsparse vector X ∈ Rn×1 based on Y and A, where the sparsity (i.e.,

the number of non-zero elements) indicates the significant mark-ers/variables selected, as illustrated in Fig. 4.

For SNPs and fMRI imaging data, the number of samples is sig-nificantly less than that of variables, i.e., m � n. In this case, the

Page 7: Journal of Neuroscience Methodswyp/resource/papers/D Lin.pdf · imaging and genetic data Dongdong ... derived from brain imaging are reproducible and reliable with high heritability,

D. Lin et al. / Journal of Neuroscienc

Fig. 5. Our sparse model based variable selection (SRVS) method can better identifyard

cop

wm

m

wilas

2e(fistvf2ntu(

F5

bnormal brain region (a) than Li et al.’s method (b) (i.e., yielding more clusteredegions that have been validated before), and give better classification accuracy foriscriminating schizophrenia patients and health controls (c).

onventional sparse recovery algorithms will often fail. In order tovercome such a difficulty, we find the approximate solution to theroblem by using the following approximation:

min∑t

i=1||Xi||p subject to ||Y − AX||2 ≤ ε (4)

here Xi is the approximation to X. It can be solved with the mini-ization of the following functions:

inX ||AX − Y ||2 + �P(X) (5)

here P(X) =∑t

i=1||Xi||p will make the solution to be sparse; �s a regularization parameter, whose choice depends on the noiseevel. We have proposed an iterative procedure for the solutionnd we called this method as sparse representation based variableelection (SRVS) (Cao et al., 2013a).

Using the model, we obtained encouraging result (Cao et al.,013a) for the selection of both fMRI imaging and genetic biomark-rs associated with schizophrenia on the analysis of 208 subjects92 cases and 116 controls). Fig. 5(a) shows the results of identi-ed brain regions with SRVS method in comparison with Li et al.’sparse regression method (Li et al., 2009) (Fig. 5(b)); our methodends to find regions with voxels clustered together that have beenerified before. We evaluate the performance of the method for dif-erentiating healthy controls and schizophrenia patients (Cao et al.,013a,b) by comparing the SRVS models regularized with three Lp

orms (p = 0, 0.5, 1). They all give higher classification ratios thanhat of Li et al.’s method (p < 1E−11) and the sparse model reg-larized with L1/2 norm generates the highest classification ratioFig. 5(c)).

ig. 6. (a) ROIs 3, 4, 7, 8 which are identified to be mainly correlated with genes ERBB4, N5, 56, 67, 68 which are found to correlate with Gene ERBB4, CHL1, GRM3 and MAGI2. No

e Methods 237 (2014) 69–78 75

4. Application to detection of genes and disease diagnosis

We demonstrate the applications of above described sparsemodels to imaging genomics with the following two examples:detection of genes and diagnosis of diseases.

4.1. Detection of risk genes with correlative models

Our proposed correlative or integrative model has found sig-nificant application to the detection of risk genes and signalingpathways. For example, based on the two pairs of canonical vari-ates using the group sparse CCA model (Eq. (1) of Section 2.1),we found some regions of interest (ROI) in the brain correlatedwith a set of genes, contributing to the cause of schizophrenia. Foreach gene–ROI, gene–gene and ROI–ROI correlation, 10,000 per-mutations were performed to test the significance. As shown inFig. 6(a), Gene ERBB4, NRG1, MAGI2 and GABRG2 show correla-tions with some ROIs such as 3, 4, 7, 8 (the index of ROI is defined bythe automated anatomic labeling (aal) template (Tzourio-Mazoyeret al., 2002)) with correlations of � = 0.1822, 0.1483, 0.1335, 0.1782(p < 0.004), respectively. These ROIs mostly consist of superior, mid-dle, medial front gyrus and precentral gyrus located at frontallobe containing primary motor cortex and were suspected to haveabnormal changes in schizophrenia patients (Kiehl et al., 2005;Honey et al., 2005). ROIs 75, 76, 77, 11 and 13 (not shown) arealso found to be correlated with gene ERBB4, NRG1, and MAGI2with the correlation value of 0.1822, 0.1649 and 0.1541 respec-tively. These ROIs are mainly located at thalamus (right), whichplays a critical role in coordinating the pass of information betweenbrain regions. Many studies show the association between dysfunc-tion of thalamus with schizophrenia (Sui et al., 2011; Clinton andMeador-Woodruff, 2004; Kiehl and Liddle, 2001). These three genesare also correlated with each other, which may have the similareffects on the ROIs. In addition, ROIs 91, 92, 99, 100, 103, 111, 112are correlated with many genes (e.g., GRIN2B, CHL1, ERBB4, NRG1,FOXP2, MAGI1, GABRG2, MAGI2). These ROIs are located at decliveof cerebellum and culmen. Their relationship with schizophrenia isnot clear yet but several previous publications have reported sig-nificant difference of these regions between normal control andschizophrenia patients (Kim et al., 2009) and there were modelsof schizophrenia on the importance of the cerebellum (Andreasenand Pierson, 2008).

4.2. Better diagnosis with integrated information

Many studies have demonstrated that the integration of fMRIand SNP information will give a more comprehensive analysis

RG1, MAGI1 and GABRG2. (b) ROIs 40, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,te: the index of ROI is given by the aal template (Tzourio-Mazoyer et al., 2002).

Page 8: Journal of Neuroscience Methodswyp/resource/papers/D Lin.pdf · imaging and genetic data Dongdong ... derived from brain imaging are reproducible and reliable with high heritability,

76 D. Lin et al. / Journal of Neuroscienc

Fig. 7. Integration of fMRI (a) and SNP (b) with our proposed sparse models (Caoet al., 2013a) results in better classification of schizophrenia and healthy controlsthan using just fMRI data or SNPs only. In (c), blue line is the classification resultwithout using combined data, while the red and green lines are the results by comb-iar

ortcmhaCtfa

(ebmbu3ts2nSfbfi

dsdfiosct(eoccb

5

aoa

ng fMRI with SNP markers with different numbers, showing higher classificationccuracy. (For interpretation of reference to color in this figure legend, the reader iseferred to the web version of this article.)

f schizophrenia. Based on the biomarkers identified by the cor-elative and integrative models, we can use them to improvehe diagnosis of schizophrenia in combination with appropriatelassifiers. For example, Yang et al. (2010a) proposed a hybridachine learning method to identify schizophrenia patients from

ealthy controls combining genetic and fMRI data, and this methodchieved better classification accuracy than using either data alone.astro et al. (2014) proposed a multiple kernel learning approachhat employed both the phase and magnitude of complex-valuedMRI data for the classification, showing improved classificationccuracy compared to using only the magnitude of fMRI data.

We proposed a sparse representation clustering (SRC) modelCao et al., 2013a,b) to simultaneously select SNPs and fMRI vox-ls as biomarkers for schizophrenia and then input the detectediomarkers as features into a classifier such as support vectorachine (SVM). The accuracy of classification was further validated

y the cross-validation. This approach also offers an objective eval-ation of the identified biomarkers as described in Sections 2 and. Fig. 7 gives a demonstration that the combination of the iden-ified SNPs and fMRI voxels can lead to a better classification ofchizophrenia patients from healthy controls in a study (Cao et al.,013a,b). When the integrative model was tested on 20 schizophre-ia patients and 20 healthy controls, the use of both imaging andNPs markers can improve the accuracy from 67.5% (i.e., usingMRI only) to 92.5% (Lin et al., 2011). This indicates that by com-ining SNPs and imaging voxels, the complementary informationrom both data can increase the classification accuracy, leading tomproved diagnosis.

In addition to feature selection based on sparse modelsescribed in Sections 2 and 3, it should be noted that sparse repre-entations can be directly applied to classification or subtyping ofiseases. We have recently developed sparse models based classi-ers to chromosome classification (Cao et al., 2012a) and subtypingf leukemia (Tang et al., 2011) and gliomas (Tang et al., 2012),howing that sparse models provide a more robust approach inlassifying image data containing variations, with better accuracyhan many other classifiers such as SVM and Bayesian classifiersCao et al., 2012b). We have recently extended the model (Caot al., 2012a) to account for the correlation within the neighborhoodf an image pixel (Li et al., 2013) for further improvement of thelassification. We are currently also applying these models for thelassification of multiple psychiatric disorders (e.g., schizophrenia,ipolar and unipolar disorders).

. Conclusion and future direction

Imaging genetics is a relatively new and promising field whichims to explore genetic effects on the neurobiology and etiologyf brain structure and function, and human behavior and psychi-tric disorders. A number of heritable imaging endophenotypes can

e Methods 237 (2014) 69–78

improve our understanding of genetic influence on different brainregions. The availability of multiscale and multimodal imaginggenetics data bring about computational challenges for developingpowerful, efficient and biologically interpretable methods.

There have been many proposed approaches for the integrationof imaging and genomic information. Methods and tools such asICA, CCA, pICA have been developed by us, with many successfulapplications. Despite this fact, there are still many challenges toovercome in modeling the complex data. Sparse representationsprovide a powerful and flexible way to model the multi-scale het-erogeneous information. We have shown the advantage of sparsemodels in integrating imaging and genomic information in the fol-lowing aspects: (1) Imaging genetics data usually contain a smallersample size than that of features; enforcing sparsity can overcomethe difficulty of many existing models such as CCA and regres-sion analysis in modeling these data. (2) Imaging genomic dataoften exhibit many characteristics such as their correlations. Sparsemodels provide a powerful approach to consider this informationinto the integration. (3) Biological knowledge databases such asPPI contain rich information but is often overlooked. Sparse mod-els provide a flexible way to incorporate this prior knowledge intothe model. A problem with sparse models is that they are oftencomputationally expensive. However, many effective algorithmsare available to overcome such a difficulty.

The field of imaging genomics is progressing rapidly and posessignificant challenges. Most integrative analysis methods in imag-ing genetics focus on two modalities but there are many other typesof data available. Besides the GWAS data with SNPs, there are otherlevels of genetic data such as copy number variation, gene expres-sion, microRNA, DNA methylation and proteomics. These varioustypes of data have been widely studied in the study of complex dis-eases including cancers, e.g., TCGA (http://cancergenome.nih.gov/).Many advanced methods were proposed for integrating these mul-tiple omic datasets (Kristensen et al., 2014). The integration of othertypes of data will be able to provide a more comprehensive insighton genetic mechanisms underlying complex diseases. In addition,other types of neuroimaging measurements, e.g., sMRI, fMRI, PET,EEG and EPR, have also been available but limited work existedto integrate these multimodal imaging datasets as reviewed in (Suiet al., 2012a). For example, Sui et al. (2012b) proposed a mCCA + jICAmethod to integrate fMRI, DTI with DNA methylation to identifypotential brain abnormalities in schizophrenia. It is both promisingand pressing to develop multi-way integrative method to systemat-ically combine these multiple types of imaging and genetic datasetsto facilitate the understanding of brain activity and human behaviorand their underlying biological mechanisms.

Sparse models have also shown promise for the study of epista-sis, e.g., SNP–SNP interaction and gene–gene interaction. Epistasis,as an important factor for explaining the heritability of commontraits, has been widely studied in genetic data analysis (Cordell,2002; Pan et al., 2013). Some work has reported and verified theeffect of interactions on imaging traits as reviewed in (Birnbaumand Weinberger, 2013). The methods used for identifying theseinteractions are mainly based on exhaustive pair-wise searching.For example, Hibar et al. (2013) applied an iterative sure indepen-dence screening (SIS) algorithm to search SNP–SNP interactionsand found a significant interaction associated with temporal lobevolume. Meda et al. (2012) identified 109 SNP–SNP interactionsassociated with right hippocampal atrophy and 125 for rightentorhinal cortex atrophy. Despite of the crucial role of epistasis inexplaining part of missing heritability, the extremely high compu-tational burden of exhaustive searching imposes a challenge for the

study. Sparse representation can provide much effective way forepistasis identification from such a large number of interactions.For examples, adaptive mix lasso model (Wang et al., 2012d), adap-tive group lasso (Yang et al., 2010b) and more generally penalized
Page 9: Journal of Neuroscience Methodswyp/resource/papers/D Lin.pdf · imaging and genetic data Dongdong ... derived from brain imaging are reproducible and reliable with high heritability,

scienc

mldw

diacisbitatNQamIodtitgtmetsugpd

A

2R

R

A

A

A

B

B

B

C

C

C

D. Lin et al. / Journal of Neuro

ultiple regressions (Hoffman et al., 2013) have the promise forarge scale interaction analysis. Further effort is needed for theevelopments of sparse methods for detecting epistasis factors,hich can be further incorporated into imaging genetics study.

In our current work, we mainly provide approaches for the pre-iction of schizophrenia patients from controls. However it is more

nteresting clinically to consider other psychiatric disorders suchs bipolar disorder and unipolar disorders. In some cases (e.g. psy-hotic bipolar disorder and schizophrenia) these multiple mentalllnesses share clinically overlapping symptoms, providing a way totudy the diagnostic utility of our methods. Clinically unipolar andipolar can sometimes be difficult to distinguish initially because

ndividuals with bipolar sometimes have more depressive episodeshan manic episodes and depressive episodes tend to be longernd more unpleasant for patients than manic episodes. To date,here is no biological marker to distinguish these two disorders.IMH’s Research Domain Criteria (RDoC) Project (Simmons anduinn, 2014) aims to define basic dimensions of function that cutcross disorders as traditionally defined and can be studied acrossultiple units of analysis, from genes to neural circuits to behaviors.

n such a context, the integration of imaging and genetics as well asther sources of biological information has the potential to betteriscriminate clinically subtle subgroups and hopefully to even-ually translate into individualized treatments. As demonstratedn our examples, sparse models provide a powerful approach forhe detection of biomarkers by integrating multiple imaging andenetic information, which can better utilize their specific charac-eristics and incorporate biological knowledge. We will apply these

odels to look not only within the diagnostic category, but also tovaluate the degree to which the biologic data is consistent withhe symptom-based categories. We are currently developing novelparse models to help identify risk genes and/or genomic regionsnderlying these multiple psychiatric disorders, and better distin-uish these clinically cryptic subgroups. Based on these discoveries,ersonalized and optimal treatments can be done according to theirifferent genetic make-ups.

cknowledgement

This work is partially supported by NIH grants (NIBIBR01EB000840, COBRE 5P20RR021938/P20GM103472,01GM109068, and R01MH104680).

eferences

lexander-Bloch A, Raznahan A, Bullmore E, Giedd J. The convergence of matura-tional change and structural covariance in human cortical networks. J Neurosci2013;33(7):2889–99.

ndreasen NC, Pierson R. The role of the cerebellum in schizophrenia. Biol Psychiatry2008;64(2):81–8.

ndreassen OA, Djurovic S, Thompson WK, Schork AJ, Kendler KS, O‘Donovan MC,et al. Improved detection of common variants associated with schizophrenia byleveraging pleiotropy with cardiovascular-disease risk factors. Am J Hum Genet2013;92(2):197–209.

aaré WF, Pol HEH, Boomsma DI, Posthuma D, de Geus EJ, Schnack HG, et al. Quanti-tative genetic modeling of variation in human brain morphology. Cereb Cortex2001;11(9):816–24.

irnbaum R, Weinberger DR. Functional neuroimaging and schizophrenia: a viewtowards effective connectivity modeling and polygenic risk. Dialogues Clin Neu-rosci 2013;15(3):279–89.

ullmore E, Sporns O. Complex brain networks: graph theoretical analysis of struc-tural and functional systems. Nat Rev Neurosci 2009;10(3):186–98.

ao HB, Deng HW, Wang YP. Segmentation of M-FISH images for improved classi-fication of chromosomes with an adaptive fuzzy c-means clustering algorithm.IEEE T Fuzzy Syst 2012a;20(1):1–8.

ao HB, Deng HW, Li M, Wang YP. Classification of multicolor fluorescence in-situ

hybridization (M-FISH) images with sparse representation. IEEE T Nanobiosci2012b;11(2):111–8.

ao H, Duan J, Lin D, Calhoun V, Wang YP. Integrating fMRI and SNP data forbiomarker identification for schizophrenia with a sparse representation basedvariable selection method. BMC Med Genomics 2013a;6(Suppl. 3):S2.

e Methods 237 (2014) 69–78 77

Cao H, Calhoun V, Wang YP. Biomarker identification for diagnosis of schizophreniawith integrated analysis of fMRI and SNPs. In: 2012 IEEE international confer-ence on bioinformatics and biomedicine (BIBM12). IEEE; 2013b.

Caplan JB, McIntosh AR, De Rosa E. Two distinct functional networks for successfulresolution of proactive interference. Cereb Cortex 2007;17(7):1650–63.

Castro E, Gomez-Verdejo V, Martinez-Ramon M, Kiehl KA, Calhoun VD. A mul-tiple kernel learning approach to perform classification of groups fromcomplex-valued fMRI data analysis: application to schizophrenia. Neuroimage2014;87:1–17.

Chen X, Kim S, Lin Q, Carbonell JG, Xing EP. Graph-structured multi-task regres-sion and an efficient optimization method for general fused Lasso; 2010arxiv:10053579.

Chen Y, Nasrabadi NM, Tran TD. Hyperspectral image classification using dictionary-based sparse representation. IEEE T Geosci Remote 2011;49(10):3973–85.

Chun H, Keles S. Sparse partial least squares regression for simultaneous dimen-sion reduction and variable selection. J R Stat Soc: Ser B (Stat Methodol)2010;72(1):3–25.

Clinton SM, Meador-Woodruff JH. Thalamic dysfunction in schizophrenia: neuro-chemical, neuropathological, and in vivo imaging abnormalities. Schizophr Res2004;69(2–3):237–53.

Cordell HJ. Epistasis: what it means, what it doesn’t mean, and statistical methodsto detect it in humans. Hum Mol Genet 2002;11(20):2463–8.

Correa NM, Li YO, Adali T, Calhoun VD. Canonical correlation analysis for feature-based fusion of biomedical imaging modalities and its application to detectionof associative networks in schizophrenia. IEEE J Sel Top Signal Process2008;2(6):998–1007.

Correa NM, Eichele T, Adali T, Li YO, Calhoun VD. Multi-set canonical correlation anal-ysis for the fusion of concurrent single trial ERP and functional MRI. Neuroimage2010;50(4):1438–45.

Filippini N, MacIntosh BJ, Hough MG, Goodwin GM, Frisoni GB, Smith SM, et al.Distinct patterns of brain activity in young carriers of the APOE-epsilon4 allele.Proc Natl Acad Sci U S A 2009;106(17):7209–14.

Ge T, Feng J, Hibar DP, Thompson PM, Nichols TE. Increasing power for voxel-wisegenome-wide association studies: the random field theory, least square kernelmachines and fast permutation procedures. Neuroimage 2012;63(2):858–73.

Ge T, Schumann G, Feng J. Imaging genetics—towards discovery neuroscience. Quan-titative Biology 2013;1(4):227–45.

Gottesman II, Gould TD. The endophenotype concept in psychiatry: etymology andstrategic intentions. Am J Psychiatry 2003;160(4):636–45.

Hibar DP, Kohannim O, Stein JL, Chiang MC, Thompson PM. Multilocus genetic anal-ysis of brain images. Front Genet 2011a;2:73.

Hibar DP, Stein JL, Kohannim O, Jahanshad N, Saykin AJ, Shen L, et al. Voxelwisegene-wide association study (vGeneWAS): multivariate gene-based associationtesting in 731 elderly subjects. Neuroimage 2011b;56(4):1875–91.

Hibar DP, Stein JL, Jahanshad N, Kohannim O, Toga AW, McMahon KL, et al. Exhaus-tive search of the SNP-sNP interactome identifies epistatic effects on brainvolume in two cohorts. Med Image Comput Comput Assist Interv 2013;16(Pt.3):600–7.

Hoffman GE, Logsdon BA, Mezey JG. PUMA: a unified framework for penalized multi-ple regression analysis of GWAS data. PLOS Computat Biol 2013;9(6):e1003101.

Honey GD, Pomarol-Clotet E, Corlett PR, Honey RA, McKenna PJ, Bullmore ET, et al.Functional dysconnectivity in schizophrenia associated with attentional modu-lation of motor function. Brain 2005;128(Pt. 11):2597–611.

Jahanshad N, Kochunov PV, Sprooten E, Mandl RC, Nichols TE, Almasy L, et al.Multi-site genetic analysis of diffusion images and voxelwise heritability anal-ysis: a pilot project of the ENIGMA–DTI working group. Neuroimage 2013;81:455–69.

Ji S, Sun L, Jin R, Ye J. Multi-label multiple kernel learning. NIPS 2008;2008:777–84.Kiehl KA, Liddle PF. An event-related functional magnetic resonance imaging study

of an auditory oddball task in schizophrenia. Schizophr Res 2001;48(2–3):159–71.

Kiehl KA, Stevens MC, Celone K, Kurtz M, Krystal JH. Abnormal hemody-namics in schizophrenia during an auditory oddball task. Biol Psychiatry2005;57(9):1029–40.

Kim DI, Mathalon DH, Ford JM, Mannell M, Turner JA, Brown GG, et al. Auditoryoddball deficits in schizophrenia: an independent component analysis of thefMRI multisite function BIRN study. Schizophr Bull 2009;35(1):67–81.

Kochunov P, Glahn DC, Lancaster JL, Winkler AM, Smith S, Thompson PM, et al. Genet-ics of microstructure of cerebral white matter using diffusion tensor imaging.Neuroimage 2010;53(3):1109–16.

Kohannim O, Hibar DP, Stein JL, Jahanshad N, Hua X, Rajagopalan P, et al. Discoveryand replication of gene influences on brain structure using LASSO regression.Front Neurosci 2012;6:115.

Krishnan A, Williams LJ, McIntosh AR, Abdi H. Partial least squares (PLS)methods for neuroimaging: a tutorial and review. Neuroimage 2011;56(2):455–75.

Kristensen VN, Lingjærde OC, Russnes HG, Vollan HKM, Frigessi A, Børresen-DaleA-L. Principles and methods of integrative genomic analyses in cancer. Nat RevCancer 2014;14(5):299–313.

Le Cao KA, Rossouw D, Robert-Granie C, Besse P. A sparse PLS for variable selectionwhen integrating omics data. Stat Appl Genet Mol Biol 2008;7, Article 35.

Le Cao KA, Martin PGP, Robert-Granie C, Besse P. Sparse canonical methods for bio-logical data integration: application to a cross-platform study. BMC Bioinform2009;10:34.

Le Floch E, Guillemot V, Frouin V, Pinel P, Lalanne C, Trinchera L, et al. Signifi-cant correlation between a set of genetic polymorphisms and a functional brain

Page 10: Journal of Neuroscience Methodswyp/resource/papers/D Lin.pdf · imaging and genetic data Dongdong ... derived from brain imaging are reproducible and reliable with high heritability,

7 scienc

L

L

L

L

L

L

L

L

L

M

M

M

M

N

P

R

S

S

S

S

S

S

S

8 D. Lin et al. / Journal of Neuro

network revealed by feature selection and sparse partial least squares. Neuroim-age 2012;63(1):11–24.

i Y, Namburi P, Yu Z, Guan C, Feng J, Gu Z. Voxel selection in FMRI data anal-ysis based on sparse representation. IEEE Trans Bio-med Eng 2009;56(10):2439–51.

i J, Lin D, Cao H, Wang YP. An improved sparse representation model with structuralinformation for multicolour fluorescence in-situ hybridization (M-FISH) imageclassification. BMC Syst Biol 2013;7(Suppl. 4):S5.

in D, Calhoun V, Wang YP. Integrating of SNPs and fMRI data for improved classifica-tion of schizophrenia. In: 2011 IEEE international conference on bioinformaticsand biomedicine (BIBM11); 2011.

in D, Zhang J, Li J, Calhoun VD, Deng HW, Wang YP. Group sparse canonical corre-lation analysis for genomic data integration. BMC Bioinform 2013a;14:245.

in D, Calhoun VD, Wang YP. Correspondence between fMRI and SNP databy group sparse canonical correlation analysis. Medical image analysis2013b;18(6):891–902.

in D, Calhoun V, Deng HW, Wang YP. Network-based investigation of genomic mod-ules associated with functional brain network in schizophrenia. In: 2013 IEEEInternational Conference on Bioinformatics and Biomedicine (BIBM13); 2013c.

iu J, Calhoun VD. A review of multivariate analyses in imaging genetics. FrontNeuroinform 2014;8:29.

iu J, Pearlson G, Windemuth A, Ruano G, Perrone-Bizzozero NI, Calhoun V. Combin-ing fMRI and SNP data to investigate connections between brain function andgenetics using parallel ICA. Hum Brain Mapp 2009;30(1):241–55.

iu A, Chen X, Wang ZJ, Xu Q, Appel-Cresswell S, McKeown MJ. A geneti-cally informed, group FMRI connectivity modeling approach: application toschizophrenia. IEEE Trans Biomed Eng 2014;61(3):946–56.

eda SA, Koran M, Pryweller JR, Vega JN, Thornton-Wells T. Genetic interactionsassociated with 12-month atrophy in hippocampus and entorhinal cortex inAlzheimer’s disease neuroimaging initiative. Neurobiol Aging 2012;30(1):e10.

eyer-Lindenberg A, Weinberger DR. Intermediate phenotypes and genetic mech-anisms of psychiatric disorders. Nat Rev Neurosci 2006;7(10):818–27.

ier D, Kirsch P, Meyer-Lindenberg A. Neural substrates of pleiotropic actionof genetic variation in COMT: a meta-analysis. Mol Psychiatry 2010;15(9):918–27.

ourão-Miranda J, Bokde AL, Born C, Hampel H, Stetter M. Classifying brain statesand determining the discriminating activation patterns: support vector machineon functional MRI data. Neuroimage 2005;28(4):980–95.

ymberg C, Jia T, Ruggeri B, Schumann G. Analytical strategies for large imag-ing genetic datasets: experiences from the IMAGEN study. Ann N Y Acad Sci2013;1282:92–106.

an Q, Hu T, Moore JH. Epistasis, complexity, and multifactor dimensionality reduc-tion. In: Genome-wide association studies and genomic prediction. Springer;2013. p. 465–77.

asetti R, Weinberger DR. Intermediate phenotypes in psychiatric disorders. CurrOpin Genet Dev 2011;21(3):340–8.

hen Li, Thompson PM, Potkin SG, Bertram L, Farrer LA, Foroud TM, et al. Geneticanalysis of quantitative phenotypes in AD and MCI: imaging, cognition andbiomarkers. Brain imaging behavior 2013;8(2):183–207.

ilver M, Montana G. Initiative AsDN: fast identification of biological pathways asso-ciated with a quantitative trait using group lasso with overlaps. Stat Appl GenetMol 2012;11(1):7.

ilver M, Janousova E, Hua X, Thompson PM, Montana G. Initiative aTAsDN:identification of gene pathways implicated in Alzheimer’s disease using lon-gitudinal imaging phenotypes with sparse regression. Neuroimage 2012;63:1681–94.

ilver M, Chen P, Li R, Cheng C-Y, Wong T-Y, Tai E-S, et al. Pathways-driven sparseregression identifies pathways and genes associated with high-density lipopro-tein cholesterol in two asian cohorts. PLoS Genet 2013;9(11):e1003939.

immons JM, Quinn KJ. The NIMH research domain criteria (RDoC) project: implica-tions for genetics research. Mammalian Genome: Off J Int Mammalian Genome

Soci 2014;25(1–2):23–31.

tein JL, Hua X, Lee S, Ho AJ, Leow AD, Toga AW, et al. Voxelwise genome-wideassociation study (vGWAS). Neuroimage 2010;53(3):1160–74.

tingo FC, Guindani M, Vannucci M, Calhoun VD. An integrative Bayesian modelingapproach to imaging genetics. J Am Stat Assoc 2013;108(503):876–91.

e Methods 237 (2014) 69–78

Sui J, Adali T, Pearlson G, Yang H, Sponheim SR, White T, et al. A CCA+ ICA based modelfor multi-task brain imaging data fusion and its application to schizophrenia.Neuroimage 2010;51(1):123–34.

Sui J, Pearlson G, Caprihan A, Adali T, Kiehl KA, Liu J, et al. Discriminating schizophre-nia and bipolar disorder by fusing fMRI and DTI in a multimodal CCA+ joint ICAmodel. Neuroimage 2011;57(3):839–55.

Sui J, Yu Q, He H, Pearlson GD, Calhoun VD. A selective review of multimodal fusionmethods in schizophrenia. Front Hum Neurosci 2012a;6:27.

Sui J, He H, Liu J, Yu Q, Adali T, Pearlson GD, et al. Three-way FMRI-DTI-methylationdata fusion based on mCCA + jICA and its application to schizophrenia. Conf ProcIEEE Eng Med Biol Soc 2012b;2012:2692–5.

Tang W, Cao H, Wang YP. A compressive sensing method for subtyping of leukemiawith gene expression analysis data. J Bioinform Computat Biol 2011;9(5.).

Tang W, Zhang J, Duan J, Wang YP. Subtyping of Glioma by Combining Gene Expres-sion and CNVs Data Based on a Compressive Sensing Approach. Adv Genet Eng2012;1(101), 2169-0111.

Thompson PM, Ge T, Glahn DC, Jahanshad N, Nichols TE. Genetics of the connectome.Neuroimage 2013;80:475–88.

Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N,et al. Automated anatomical labeling of activations in SPM using a macro-scopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage2002;15(1):273–89.

van den Heuvel MP, van Soelen IL, Stam CJ, Kahn RS, Boomsma DI, Hulshoff Pol HE.Genetic control of functional brain network efficiency in children. Eur Neuropsy-chopharmacol 2013;23(1):19–23.

Vounou M, Nichols TE, Montana G. Discovering genetic associations with high-dimensional neuroimaging phenotypes: a sparse reduced-rank regressionapproach. Neuroimage 2010;53(3):1147–59.

Vounou M, Janousova E, Wolz R, Stein JL, Thompson PM, Rueckert D, et al.Initia ADN: sparse reduced-rank regression detects genetic associationswith voxel-wise longitudinal phenotypes in Alzheimer’s disease. Neuroimage2012;60(1):700–16.

Waaijenborg S, Zwinderman AH. Correlating multiple SNPs and multiple diseasephenotypes: penalized non-linear canonical correlation analysis. Bioinformatics2009;25(21):2764–71.

Wang H, Nie F, Huang H, Kim S, Nho K, Risacher SL, et al. Alzheimer’s diseaseneuroimaging I: identifying quantitative trait loci via group-sparse multitaskregression and feature selection: an imaging genetics study of the ADNI cohort.Bioinformatics 2012a;28(2):229–37.

Wang H, Nie F, Huang H, Yan J, Kim S, Nho K, et al. From phenotype to genotype:an association study of longitudinal phenotypic markers to Alzheimer’s diseaserelevant SNPs. Bioinformatics 2012b;28(18):i619–25.

Wang H, Nie F, Huang H, Risacher SL, Saykin AJ, Shen L. Identifying diseasesensitive and quantitative trait-relevant biomarkers from multidimensionalheterogeneous imaging genetics data via sparse multimodal multitask learning.Bioinformatics 2012c;28(12):i127–36.

Wang D, El-Basyoni IS, Baenziger PS, Crossa J, Eskridge K, Dweikat I. Predictionof genetic values of quantitative traits with epistatic effects in plant breedingpopulations. Heredity 2012d;109(5):313–9.

Winkler AM, Kochunov P, Blangero J, Almasy L, Zilles K, Fox PT, et al. Cortical thick-ness or grey matter volume? The importance of selecting the phenotype forimaging genetics studies. Neuroimage 2010;53(3):1135–46.

Witten DM, Tibshirani RJ. Extensions of sparse canonical correlation analysis withapplications to genomic data. Stat Appl Genet Mol Biol 2009;8(1):1–27.

Wold S, Martens H, Wold H. The multivariate calibration-problem in chemistrysolved by the PLS method. Lect Notes Math 1983;973:286–93.

Yang HH, Liu JY, Sui J, Pearlson G, Calhoun VD. A hybrid machine learning methodfor fusing fMRI and genetic data: combining both improves classification ofschizophrenia. Front Hum Neurosci 2010a;4.

Yang C, Wan X, Yang Q, Xue H, Yu W. Identifying main effects and epistatic inter-actions from large-scale SNP data via adaptive group Lasso. Bmc Bioinformatics

2010b;11(Suppl. 1):S18.

Yang S, Pan Z, Shen X, Wonka P, Ye J. Fused multiple graphical lasso; 2012arxiv:12092139.

Zhou J, Chen J, Ye J. Clustered multi-task learning via alternating structure optimiza-tion. NIPS 2011;2011:702–10.