network connectivity in resting-state fmri as a novel biomarker in diagnosing alzheimer's...

Network Connectivity in Resting-State fMRI as a Novel Biomarker in Diagnosing Alzheimer’s Disease

Daniel Park

Department of Bio and Brain Engineering

Korea Advanced Institute of Science and Technology

August 2nd, 2013

Abstract

This paper presents a novel algorithm that improves early detection of Alzheimer’s Disease (AD) by using neural networks as a classification tool. It also reveals newly discovered functional pathways among various regions of the brain that change significantly as the disease progresses.

Resting-state functional Magnetic Resonance Imaging (rs-fMRI) was used to acquire voxelwise time series in 156 subjects: 77 with clinically diagnosed AD (Clinical Dementia Rating (CDR) 1.0: n = 55; CDR 2.0: n = 22) and 79 normal controls. Each brain was divided into 116 regions of interest (ROIs), and the Pearson correlation coefficients between pairs of ROIs were used to classify the subjects (Tzourio, Rodgers). Classification error was estimated with five-fold cross-validation. An Artificial Neural Network (ANN) was used to classify features extracted from the correlation coefficients of functional connections.

With this algorithm, classification between the AD group and the non-AD group achieved an area under the receiver operating characteristic curve (AUC) of 87.4%, with 84.4% sensitivity and 88.6% specificity. The Correlation Feature Selection (CFS) algorithm (Hall) identified 25 functional pathways in the brain that showed statistically significant changes in AD patients.

1 Introduction

Alzheimer’s Disease is a neurodegenerative disorder and the most common cause of dementia (Heron). Definitive diagnosis is difficult in clinical settings because only an autopsy can confirm the abnormal levels of amyloid plaques and neurofibrillary tangles in the brain.

Because AD patients experience a severe decrease in quality of life as the disease advances, it is imperative to diagnose it at the earliest possible stage. There are several ways to diagnose the disease, but none yields high accuracy, especially in the early stages. Although the deterioration of the Default Mode Network (DMN) has been widely studied among Alzheimer’s patients, only one study has shown it to be an effective biomarker for diagnosis of AD (Chen, Raichle). The aim of this research is to improve the performance of such diagnosis and to create a connectivity map of the functional pathways that are correlated with the disease.

Resting-state data is collected to analyze the brain’s connections when a subject is not performing any task or responding to any stimulus during imaging. By tracking the hemodynamic response of an idle brain through Blood Oxygen Level Dependent (BOLD) signals, rs-fMRI data yields new insights into how structurally segregated and functionally specialized brain networks are interconnected (Fries). Several studies have proposed a new perspective on neurodegenerative diseases by focusing on networks such as Hippocampus Networks (HN), the Default Mode Network (DMN), and the Small-World Network (SWN).

With the recent rise in the use of Support Vector Machines (SVM), the number of papers on automated categorization of structural and functional brain images by differentiating images from two groups has increased dramatically. The pattern recognition techniques employing machine learning are multivariate and take into account various biomarkers to help categorize scans. New rs-fMRI scans of patients can be compared against the existing training set and then classified as either positive or negative (Li).

Current studies suggest a positive correlation between the progression of AD and the deterioration of the DMN (Palop); yet there is almost no literature on a quantitative approach to measuring network connectivity as a biomarker in diagnosing AD. Utilizing functional connections as features, the algorithm presented in this paper performs well in the differential diagnosis of three different forms of dementia, suggesting that it can generalize reliably to larger groups.

2 Materials and Methods

2.1 Human Subjects

Of the 156 test subjects, 79 had been diagnosed with Subjective Memory Impairment (SMI) but were treated as the control group, and 77 were placed into the experimental group with AD. Of the Alzheimer’s group, 55 subjects had a Clinical Dementia Rating (CDR) of 1.0 and 22 had a CDR of 2.0.

The control group had a mean age of 67.6 with a standard deviation of 7.41. For patients with a CDR of 1.0, the mean age was 76.60 with a standard deviation of 11.09. For patients with a CDR of 2.0, the mean age was 75.20 with a standard deviation of 7.17.

2.2 ROI Selection

Selecting Regions of Interest (ROIs) is necessary to define the regions that are related to AD. Independent Component Analysis (ICA) was used as an unsupervised clustering algorithm that can group proximal areas of the brain that show high correlation with other areas yet remain independent. The Default Mode Network (DMN) includes the Posterior Cingulate Cortex (PCC), Left Lateral Parietal Cortex (LPC-L), Right Lateral Parietal Cortex (LPC-R), Left Superior Frontal Gyrus (SFG-L), and Right Superior Frontal Gyrus (SFG-R).

The ICA problem can be solved by maximizing a contrast function, which can be considered an approximate representation of the independence among the components of the transformed data (Ashburner). The log-likelihood contrast function for ICA can be written as

$$L = \sum_{i=1}^{n} E[\ln p_{y_i}(y_i)] + \ln|\det B|,$$

where $E$ denotes the expectation operator; $y_i$ is the $i$th component of $y = Bx$, i.e., $y_i = b_i^T x$, where $b_i^T$ is the $i$th row vector of $B$; and $p_{y_i}(y_i)$ is the probability density function of $y_i$.

The ICA problem is solved by maximizing the equation above, which leads to the following equation

$$\frac{\partial L}{\partial B} = E[g(B)\,x^T] + (B^{-1})^T = 0$$

where g(B) is a column vector defined by

$$g(B) = [\,g_{y_1}(y_1) \;\dots\; g_{y_i}(y_i) \;\dots\; g_{y_n}(y_n)\,]^T$$

and

$$g_{y_i}(y_i) = p'_{y_i}(y_i) / p_{y_i}(y_i)$$

where $p'_{y_i}(y_i)$ stands for the derivative with respect to $y_i$. Multiplying the partial derivative equation by $B^T$ yields the following gradient equation

$$F(B) = B^T \frac{\partial L}{\partial B} = B^T E[g(B)\,x^T] + I = 0$$

where I is the identity matrix (Lai).
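The step from the partial derivative to the gradient equation rests on two standard matrix identities. A short worked derivation, using only the symbols already defined above, is sketched here for readers who want to see where the identity matrix comes from.

```latex
\begin{align*}
\frac{\partial}{\partial B} \sum_{i=1}^{n} E[\ln p_{y_i}(y_i)]
  &= E[g(B)\,x^T]
  && \text{since } y_i = b_i^T x \text{ and } \frac{\partial \ln p_{y_i}(y_i)}{\partial b_i} = g_{y_i}(y_i)\,x, \\
\frac{\partial}{\partial B} \ln|\det B|
  &= (B^{-1})^T, \\
B^T (B^{-1})^T
  &= (B^{-1} B)^T = I, \\
F(B) = B^T \frac{\partial L}{\partial B}
  &= B^T E[g(B)\,x^T] + B^T (B^{-1})^T
   = B^T E[g(B)\,x^T] + I .
\end{align*}
```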

After the ROIs were determined, they were labeled according to the Automated Anatomical Labeling (AAL) template from Statistical Parametric Mapping (SPM) (Tzourio-Mazoyer). Of the 116 regions, 90 were in the cerebrum; the remaining 26, in the cerebellum, were discarded due to their anatomical irrelevance to AD (Greicius).

Figure 1: The brain from different angles, colored by region according to the AAL template.

2.3 Pre-processing

After determining the ROIs using ICA, voxels representing Cerebrospinal Fluid (CSF) were excluded from the final set of ROIs because they introduced unnecessary noise. Chang and Glover showed that performance increased when both White Matter (WM) and CSF ROIs were regressed out of the data prior to correlation analysis. The assumption underlying this method is that the time series from WM and CSF may carry physiological fluctuations similar to those affecting grey matter, while containing little contribution from neural activity.
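Regressing a nuisance signal out of a time series can be illustrated with ordinary least squares. The following is a minimal sketch with a single regressor and an intercept; the function name `regress_out` and the toy signals are hypothetical and do not represent the pipeline used in this study.

```python
def regress_out(signal, nuisance):
    """Return the residuals of `signal` after removing a least-squares
    fit (intercept + slope) on one nuisance time series."""
    n = len(signal)
    if n != len(nuisance) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_s = sum(signal) / n
    mean_n = sum(nuisance) / n
    var_n = sum((v - mean_n) ** 2 for v in nuisance)
    if var_n == 0:
        # A constant nuisance series explains nothing beyond the mean.
        return [s - mean_s for s in signal]
    cov = sum((s - mean_s) * (v - mean_n) for s, v in zip(signal, nuisance))
    slope = cov / var_n
    return [(s - mean_s) - slope * (v - mean_n)
            for s, v in zip(signal, nuisance)]


# Toy example: an ROI signal that is fully explained by the CSF signal.
csf = [0.0, 1.0, 2.0, 3.0, 4.0]
roi = [1.0 + 2.0 * c for c in csf]
cleaned = regress_out(roi, csf)  # residuals are all zero here
```

With several nuisance regressors (for example WM and CSF together) the same idea applies, but the fit becomes a multiple regression rather than the one-variable formula used in this sketch.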

However, in the case of AD, results were better when only CSF was removed. This may be because AD is a disorder that affects connections among different regions of the brain, including the WM connections; in other words, the number and length of axons may be significant in determining AD in a test subject. From the selected ROIs, the voxel with the highest Pearson correlation coefficient was selected as the root, and a mean time series was calculated from the surrounding 5 x 5 x 5 block of voxels.
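The averaging step over the 5 x 5 x 5 neighbourhood can be sketched as follows. This assumes the root voxel has already been chosen and that the data are held as a mapping from voxel coordinates to time series; both the data layout and the name `mean_time_series` are illustrative assumptions, not the study's code.

```python
def mean_time_series(volume, root, half_width=2):
    """Average the time series of all voxels in a cube centred on `root`.

    `volume` maps (x, y, z) coordinates to a list of samples.
    half_width=2 gives a 5 x 5 x 5 neighbourhood. Coordinates missing
    from `volume` (for example voxels outside the ROI mask) are skipped.
    """
    rx, ry, rz = root
    offsets = range(-half_width, half_width + 1)
    series = [
        volume[(rx + dx, ry + dy, rz + dz)]
        for dx in offsets for dy in offsets for dz in offsets
        if (rx + dx, ry + dy, rz + dz) in volume
    ]
    if not series:
        raise ValueError("no voxels found around the root")
    n_time = len(series[0])
    return [sum(s[t] for s in series) / len(series) for t in range(n_time)]


# Toy volume: 7 x 7 x 7 voxels, 3 time points, identical in every voxel.
toy = {(x, y, z): [1.0, 2.0, 3.0]
       for x in range(7) for y in range(7) for z in range(7)}
avg = mean_time_series(toy, root=(3, 3, 3))
```

Averaging over a small block rather than using the root voxel alone trades spatial specificity for a less noisy signal, which is presumably the motivation for the step described above.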

Figure 2: ROI after pre-processing. Panels: raw ROI, CSF removed, both removed.

2.4 Feature Extraction

Of the 116 ROIs, the 90 non-cerebellum ROIs were put into a correlation matrix to build feature arrays for the classification algorithm. Each correlation value from the matrix was put into an array of features, but since the matrix is symmetric about the main diagonal (top left to bottom right) and the diagonal entries are self-correlations, there was a total of 90 x 89 / 2 = 4005 features instead of all 8100.

The Pearson correlation ranges from -1 to 1, with the former indicating a perfect negative correlation and the latter a perfect positive correlation. The absolute values of the correlations were taken as features, as the sign of the values is insignificant when they are used as indicators of AD.
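The feature construction described in this section can be sketched in a few lines. The function names and the 3 x 3 toy matrix are hypothetical; only the counting rule n(n - 1)/2 and the use of absolute values come from the text.

```python
def connectivity_features(corr):
    """Flatten the strict upper triangle of a symmetric correlation
    matrix into a feature list, taking absolute values."""
    n = len(corr)
    return [abs(corr[i][j]) for i in range(n) for j in range(i + 1, n)]


def n_features(n_rois):
    """Number of distinct ROI pairs, excluding the diagonal."""
    return n_rois * (n_rois - 1) // 2


toy_corr = [
    [1.0, -0.4, 0.2],
    [-0.4, 1.0, 0.7],
    [0.2, 0.7, 1.0],
]
features = connectivity_features(toy_corr)  # [0.4, 0.2, 0.7]
```

For 90 ROIs, `n_features(90)` gives 4005, matching the count above; for all 116 ROIs the same rule gives 6670.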

For variables $X$ and $Y$, the coefficient $P$ can be found with the following formula,

$$P(X, Y) = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - (E(X))^2}\,\sqrt{E(Y^2) - (E(Y))^2}}$$
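The formula above translates directly into code when each expectation is replaced by a sample mean. This is a minimal sketch under that assumption; the function name `pearson` and the guard against constant series are additions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation using sample means for the expectations
    E(XY), E(X), E(Y), E(X^2) and E(Y^2)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("series must be non-empty and of equal length")
    ex = sum(x) / n
    ey = sum(y) / n
    exy = sum(a * b for a, b in zip(x, y)) / n
    ex2 = sum(a * a for a in x) / n
    ey2 = sum(b * b for b in y) / n
    # max(..., 0.0) guards against tiny negative values from rounding.
    var_x = max(ex2 - ex ** 2, 0.0)
    var_y = max(ey2 - ey ** 2, 0.0)
    denom = sqrt(var_x) * sqrt(var_y)
    if denom == 0:
        raise ValueError("correlation is undefined for a constant series")
    return (exy - ex * ey) / denom


r_pos = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_neg = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

The two toy calls give values close to 1 and -1 respectively, the two extremes of the range.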

Figure 3: Correlation matrix of the 90 non-cerebellum regions.

2.5 Filtering

Of the 4005 connections among the 90 regions of the brain, not all will prove useful as biomarkers for diagnosing Alzheimer’s Disease. It was therefore hypothesized that selecting and using only a subset of the data would produce better results, because the remaining features act as noise in classification. At first, the subset was determined by filtering out features below a certain threshold, but because all 156 patients needed subject-specific thresholds, this method was inefficient and ineffective.

The final idea was to use a Correlation Feature Selection (CFS) algorithm, which evaluates subsets of features on the basis that good feature subsets contain features highly correlated with the classification, yet uncorrelated with each other. The subset can be found with the following equation,

$$\mathrm{CFS} = \max_{x \in \{0,1\}^n} \left[ \frac{\left(\sum_{i=1}^{n} a_i x_i\right)^2}{\sum_{i=1}^{n} x_i + \sum_{i \neq j} 2 b_{ij} x_i x_j} \right]$$

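where $x_i$ is the set membership indicator function for feature $i$. In the standard formulation of this objective, $a_i$ is the correlation between feature $i$ and the class, and $b_{ij}$ is the correlation between features $i$ and $j$. The size of the resulting subset was 25, far smaller than the original 4005. All 25 were used as features for classification and yielded higher performance than using all 4005.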
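The CFS objective can be evaluated for any candidate subset, and a simple search can then look for a subset with a high score. The sketch below makes several assumptions that are not stated in the text: `a[i]` is taken as the feature-class correlation, `b[i][j]` as the feature-feature correlation, the sum over i ≠ j is taken over unordered pairs, and a greedy forward search is swapped in for whatever search strategy the study actually used. All names and toy values are hypothetical.

```python
from itertools import combinations


def cfs_merit(subset, a, b):
    """Bracketed CFS objective for the feature indices in `subset`.

    a[i]    : assumed |correlation| between feature i and the class
    b[i][j] : assumed |correlation| between features i and j
    The redundancy term sums 2 * b[i][j] over unordered pairs i < j.
    """
    if not subset:
        return 0.0
    relevance = sum(a[i] for i in subset) ** 2
    redundancy = sum(2.0 * b[i][j]
                     for i, j in combinations(sorted(subset), 2))
    return relevance / (len(subset) + redundancy)


def greedy_cfs(a, b):
    """Greedy forward selection: repeatedly add the feature that raises
    the merit most, and stop when no addition improves it."""
    selected, best = [], 0.0
    remaining = list(range(len(a)))
    while remaining:
        scored = [(cfs_merit(selected + [i], a, b), i) for i in remaining]
        merit, choice = max(scored, key=lambda pair: pair[0])
        if merit <= best:
            break
        selected.append(choice)
        remaining.remove(choice)
        best = merit
    return selected, best


# Toy data: features 0 and 1 are both relevant but highly redundant;
# feature 2 is slightly less relevant but uncorrelated with the others.
a = [0.80, 0.75, 0.70]
b = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
chosen, merit = greedy_cfs(a, b)
```

On the toy data the search keeps features 0 and 2 and rejects feature 1, which illustrates the stated principle: a relevant but redundant feature lowers the merit of the subset.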


2.6 Classification

• Support Vector Machines

A Support Vector Machine (SVM) is an example of a supervised, multivariate classification method. SVMs are supervised because they include a training step to learn about differences between the groups to be classified. For this paper, a normalized polynomial kernel was created from normalized training data (Theodoridis).

Given data vectors $x_i$ ($i = 1, 2, \dots, N$), the standard SVM provides an optimal separating boundary (a hyperplane) to assign a label $y_i \in \{-1, 1\}$ to each $x_i$ by solving the following optimization problem:

$$\min_{w, \varepsilon, b} \left( \frac{1}{2}|w|^2 + c \sum_{i=1}^{N} \varepsilon_i \right)$$

subject to the constraints $y_i(w \cdot \phi(x_i) + b) \geq 1 - \varepsilon_i$ and $\varepsilon_i \geq 0$, where $c$ is the penalty parameter.
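To make the roles of the slack variables and the penalty parameter concrete, the sketch below evaluates the objective for a fixed hyperplane. It assumes the feature map φ is the identity (a linear kernel), which is a simplification and not the normalized polynomial kernel used in this paper; the function names and toy points are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def soft_margin_objective(w, b, c, xs, ys):
    """Evaluate 0.5 * |w|^2 + c * sum(slack) for a given hyperplane.

    Each slack is the smallest value satisfying both constraints:
    slack_i = max(0, 1 - y_i * (w . x_i + b)).
    Assumes phi(x) = x, i.e. a linear kernel.
    """
    slacks = [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(xs, ys)]
    return 0.5 * dot(w, w) + c * sum(slacks), slacks


# Toy data: two points outside the margin, one inside it.
xs = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
ys = [1, -1, 1]
objective, slacks = soft_margin_objective(
    w=[1.0, 0.0], b=0.0, c=5.0, xs=xs, ys=ys)
```

Only the third point violates the margin, so it alone contributes slack (0.5), and the objective is 0.5 + 5.0 x 0.5 = 3.0. A larger penalty parameter makes such violations more costly relative to the margin term.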

For the classification algorithm, the following parameters were used: C = 5.0, L = 0.05, P = 1.0E−12, N = 0, V = −1, W = 1, and K = Normalized Polynomial Kernel. They were found by looping through all the variables in small increments until the accuracy was maximized. The kernel was chosen in a similar fashion: the algorithm was run with each kernel, and the Normalized Polynomial Kernel performed the best.

Figure 4: Illustration of the concept used in support vector machines. The algorithm tries to find a boundary that maximizes the distance between groups. The figure reduces the problem to two dimensions for the purpose of illustration only.

• Neural Networks

An Artificial Neural Network (ANN) is an interconnected group of natural or artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network. In essence, ANNs are non-linear statistical data modeling or decision-making tools that can be used to model complex relationships between inputs and outputs or to find patterns in data.

Neural networks are trained with an algorithm called backpropagation, in which the initial error caused by the random assignment of weights is reduced by adjusting the weights of the nodes according to their influence on that error. ANNs are meta-functions whose characteristics are determined by the mapping functions in the hidden layer.

For the classification algorithm, the following parameters were used: B = 2, S = 6, R = 0.0, M = 2, W = 1.004. The activation function was chosen after looping through each function, and the Radial Basis Function performed the best.
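A network with radial basis activations in its hidden layer can be sketched as a forward pass. The Gaussian form of the basis function, the centres, widths, and weights below are all assumptions chosen for illustration; the text does not specify them, and this is not the trained network from the study.

```python
from math import exp


def rbf_activation(x, center, width):
    """Gaussian radial basis function: exp(-|x - c|^2 / (2 * width^2))."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return exp(-sq_dist / (2.0 * width ** 2))


def rbf_network_output(x, centers, widths, weights, bias=0.0):
    """One hidden layer of RBF units followed by a linear output unit."""
    hidden = [rbf_activation(x, c, s) for c, s in zip(centers, widths)]
    return bias + sum(w * h for w, h in zip(weights, hidden))


# Toy network: two hidden units with opposite output weights, so the
# sign of the output indicates which centre the input is closer to.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [1.0, 1.0]
weights = [1.0, -1.0]
out_a = rbf_network_output([0.0, 0.0], centers, widths, weights)
out_b = rbf_network_output([1.0, 1.0], centers, widths, weights)
```

Each hidden unit responds most strongly to inputs near its centre, so the output here is positive near the first centre and negative near the second. In a trained classifier the centres, widths, and weights would be learned from the data.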

Figure 5: Neural network with one hidden layer.

3 Results

There are two benchmark performance numbers for the algorithm presented in this paper: the original paper in Radiology and the diagnostic accuracy of doctors in clinical settings. The original paper had 87% accuracy, 85% sensitivity, and 80% specificity (Kloppel). The accuracy of doctors in clinical settings ranges from 50% to nearly 90%, and there are no objective criteria for definitive diagnosis (Heron).

The goal of this paper is to develop an algorithm that can objectively diagnose AD from one fMRI scan. The test compared SVM against ANN, each of which was further divided into four groups by combining pre- or post-CFS filtering with 90 or 116 ROIs. Sequential Minimal Optimization (SMO) was used as the classification algorithm for SVM, and a Radial Basis Function (RBF) was used as the activation function in the ANN.

Table 1: Comparison of pre & post-filter of SMO

Data       Accuracy   Sensitivity   Specificity
Pre 90     74.36%     0.744         0.742
Pre 116    66.02%     0.660         0.658
Post 90    82.05%     0.820         0.819
Post 116   75.00%     0.750         0.747

In the SMO group in the table above, the algorithm performed best when the 90 x 90 correlation matrix was used as features.
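The three columns reported in these tables follow from the counts of a two-class confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical (chosen only to add up to 77 AD and 79 control subjects) and are not taken from the study's results.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: AD subjects classified correctly/incorrectly
    tn/fp: control subjects classified correctly/incorrectly
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for 77 AD and 79 control subjects.
metrics = classification_metrics(tp=61, fn=16, tn=68, fp=11)
```

Sensitivity measures how many AD subjects are detected, while specificity measures how many controls are correctly left unflagged; accuracy combines both and therefore depends on the group sizes.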

Table 2: Comparison of pre & post-filter of RBF

Data       Accuracy   Sensitivity   Specificity
Pre 90     60.26%     0.603         0.600
Pre 116    58.97%     0.590         0.588
Post 90    86.5%      0.819         0.865
Post 116   84.6%      0.846         0.845

In the RBF group in the table above, the algorithm again performed best when the 90 x 90 correlation matrix was used as features. After CFS filtering, it outperformed the Radiology benchmark in specificity by 6.5 percentage points (86.5% vs. 80%) and was comparable in accuracy (86.5% vs. 87%), although its sensitivity was slightly lower (81.9% vs. 85%).

Table 3: Effectiveness of CFS Filter

Group      x2 − x1   % increase   P-value
SMO 90     7.69%     10.34        0.0491
SMO 116    8.98%     13.60        0.0491
RBF 90     26.24%    43.54        0.0075
RBF 116    25.63%    43.46        0.0075

The table above shows the performance of the CFS filter. Calculations show that the filter has a statistically significant effect in increasing the accuracy of the algorithm (Noether).
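The first two columns of Table 3 can be recomputed from the accuracies in Tables 1 and 2. The sketch below assumes that x1 and x2 denote the pre-filter and post-filter accuracies and that the percentage increase is taken relative to the pre-filter value; the variable names are illustrative, and the P-values are not recomputed.

```python
# (pre-filter accuracy, post-filter accuracy) in percent, from Tables 1 and 2
accuracy = {
    "SMO 90": (74.36, 82.05),
    "SMO 116": (66.02, 75.00),
    "RBF 90": (60.26, 86.5),
    "RBF 116": (58.97, 84.6),
}


def filter_gain(pre, post):
    """Absolute gain in percentage points and relative gain in percent."""
    diff = post - pre
    return round(diff, 2), round(100.0 * diff / pre, 2)


gains = {group: filter_gain(pre, post)
         for group, (pre, post) in accuracy.items()}
```

Under these assumptions the computed gains agree with the values listed in Table 3 for all four groups.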

Table 4: Summary statistics before and after CFS

        SMO              RBF
        Pre     Post     Pre     Post
Mean    70.19   78.53    59.61   85.5
SD      5.90    4.99     0.91    1.34
SEM     4.17    3.53     0.645   0.950

The table above shows the summary statistics before and after applying the CFS filter for both the SMO and RBF groups. It should be noted that this study had 5 times as many subjects as the Radiology study, which makes it less likely to have been overfitted and more reliable as a diagnostic tool in clinical settings.
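The entries in Table 4 are consistent with each column summarizing the two accuracies (90 and 116 ROIs) from Tables 1 and 2. That reading is an inference from the numbers rather than something stated in the text, and the sketch below simply checks it using the sample standard deviation and SEM = SD / sqrt(n).

```python
from math import sqrt
from statistics import mean, stdev

# Accuracies in percent for the 90-ROI and 116-ROI runs (Tables 1 and 2).
accuracies = {
    "SMO pre": [74.36, 66.02],
    "SMO post": [82.05, 75.00],
    "RBF pre": [60.26, 58.97],
    "RBF post": [86.5, 84.6],
}


def summarize(values):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    sd = stdev(values)
    return mean(values), sd, sd / sqrt(len(values))


summary = {name: summarize(vals) for name, vals in accuracies.items()}
```

With only two values per column, the standard deviation and SEM are very rough estimates, which is worth keeping in mind when reading the table.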

Table 5: Functional connections selected by the CFS filter

ROI #     Functional Connection
1x89      Amygdala L x Rectus L
4x32      Angular R x Cingulum Post R
8x58      Caudate R x Hippocampus R
8x70      Caudate R x Olfactory R
27x51     Cingulum Ant L x Frontal Sup Orb R
28x41     Cingulum Ant R x Frontal Med Orb L
28x42     Cingulum Ant R x Frontal Med Orb R
28x56     Cingulum Ant R x Heschl R
28x97     Cingulum Ant R x Temporal Inf L
29x53     Cingulum Mid L x Fusiform L
30x53     Cingulum Mid R x Fusiform L
30x84     Cingulum Mid R x Precentral R
30x108    Cingulum Mid R x Thalamus R
35x36     Frontal Inf Oper L x Frontal Inf Oper R
35x41     Frontal Inf Oper L x Frontal Med Orb L
39x60     Frontal Inf Tri L x Insula R
39x93     Frontal Inf Tri L x Supp Motor Area L
45x104    Frontal Mid Orb R x Temporal Pole Sup R
46x101    Frontal Mid R x Temporal Pole Mid L
51x56     Frontal Sup Orb R x Heschl R
53x57     Fusiform L x Hippocampus L
53x85     Fusiform L x Precuneus L
56x98     Heschl R x Temporal Inf R
58x95     Hippocampus R x SupraMarginal L
60x105    Insula R x Temporal Sup L

The list above contains the 25 features extracted by applying the CFS filter to the raw data; they signify functional connections that are related to the progression of AD. Aside from the Default Mode Network, there have not been any studies showing statistically significant pathways in the brain. This novel finding sheds light on new ways of treating Alzheimer’s Disease by targeting only specific networks. The list is not only statistically significant but also anatomically relevant, as it includes key areas such as the Hippocampus, the Temporal Lobe, and the Thalamus.

4 Discussion

Our results indicate that supervised machine learning techniques can aid the clinical diagnosis of AD. The analytical technique presented here promises to distinguish disease-specific atrophy from that of normal aging in a standard T1-weighted structural MRI scan. Furthermore, the study provides evidence that the method can be developed to correctly differentiate between different forms of dementia.

Further areas of improvement include increasing the sample size and including more groups with different CDRs in order to increase the classification power. Filters other than CFS could also be tried on a faster computer to optimize the performance of the algorithm.

5 Acknowledgement

I would like to thank Dr. Yong Jeong for providing the opportunity to conduct independent research over the summer and for offering helpful suggestions over the course of the internship. Young Beom Lee was also extremely helpful in introducing me to the literature and answering many questions about the different approaches to setting up the tests. This work was supported by the Korea Advanced Institute of Science and Technology (KAIST), and specifically the Laboratory for Cognitive Neuroscience and NeuroImaging (CNI). Finally, I would like to thank all the members of the CNI lab for making the internship as enjoyable as it could possibly have been.

6 References

Ashburner J. A fast diffeomorphic image registration algorithm. Neuroimage 2007;38:95–113.

Chen G. Classification of Alzheimer’s Disease, Mild Cognitive Impairment, and Normal Cognitive Status with Large-Scale Network Analysis Based on Resting-State Functional MR Imaging. Radiology 2011;259:213–22.

Fries P. Neuronal gamma-band synchronization as a fundamental process in cortical computation. Annu Rev Neurosci 2009;32:209–224.

Greicius MD. Default-mode network activity distinguishes Alzheimer’s disease from healthy aging: evidence from functional MRI. Proc Natl Acad Sci U S A 2004;101(13):4637–4642.

Hall M. Correlation-based Feature Selection for Machine Learning. Department of Computer Science 1999.

Heron M. Deaths: final data for 2006. Natl Vital Stat Rep 2009;57(14):1–134.

Kloppel S. Automatic classification of MR scans in Alzheimer’s disease. Brain 2008;131:681–689.

Lai TL. Strong consistency of least squares estimates in multiple regression. Proc Natl Acad Sci U S A 1978;75(7):3034–3036.

Li SJ. Alzheimer disease: evaluation of a functional MR imaging index as a marker. Radiology 2002;225(1):253–259.

Noether GE. The Wilcoxon two-sample test. In: Introduction to statistics: the nonparametric way. New York, NY: Springer-Verlag, 1991;103–112.

Palop JJ. A network dysfunction perspective on neurodegenerative diseases. Nature 2006;443(7113):768–773.

Raichle ME. A default mode of brain function: a brief history of an evolving idea. Neuroimage 2007;37(4):1083–1090; discussion 1097–1099.

Rodgers JL, Nicewander WA. Thirteen ways to look at the correlation coefficient. Am Stat 1988;42(1):59–66.

Theodoridis S. In: Pattern recognition. 3rd ed. San Diego, Calif: Academic Press, 2006;69–78, 474–475.

Tzourio-Mazoyer N. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 2002;15(1):273–289.
