Classification Algorithm of Speech Data of Parkinson's Disease Based on Convolution Sparse Kernel Transfer Learning with Optimal Kernel and Parallel Sample/Feature Selection

Xiaoheng Zhang, Yongming Li*, Member, IEEE, Pin Wang, Xiaoheng Tan, and Yuchuan Liu

Abstract—Labeled speech data from patients with Parkinson's disease (PD) are scarce, and the statistical distributions of training and test data differ significantly in the existing datasets. To solve these problems, dimensional reduction and sample augmentation must be considered. In this paper, a novel PD classification algorithm based on sparse kernel transfer learning combined with a parallel optimization of samples and features is proposed. Sparse transfer learning is used to extract effective structural information of PD speech features from public datasets as source domain data, and the fast ADMM iteration is improved to enhance the information extraction performance. To implement the parallel optimization, the potential relationships between samples and features are considered to obtain high-quality combined features. First, features are extracted from a specific public speech dataset to construct a feature dataset as the source domain. Then, the PD target domain, including the training and test datasets, is encoded by convolution sparse coding, which can extract more in-depth information. Next, parallel optimization is implemented. To further improve the classification performance, a convolution kernel optimization mechanism is designed. Using two representative public datasets and one self-constructed dataset, the experiments compare over thirty relevant algorithms. The results show that when taking the Sakar, MaxLittle and DNSH datasets as target domains, the proposed algorithm achieves obvious improvements in classification accuracy. The study also found large improvements of the algorithms in this paper over nontransfer learning approaches, demonstrating that transfer learning is both more effective and has an acceptable time cost.

Index Terms—Convolution sparse coding, parallel selection, Parkinson's disease, transfer learning.

Manuscript received December XX, 2020. This work was supported in part by the NSFC under Grant 61771080, in part by the Fundamental Research Funds for the Central Universities under Grants 2019CDCGTX306 and 2019CDQYTX019, in part by the Basic and Advanced Research Project in Chongqing under Grants cstc2018jcyjAX0779 and cstc2018jcyjA3022, and in part by the Chongqing Social Science Planning Project under Grant 2018YBYY133.

*Corresponding author: Yongming Li (e-mail: [email protected]).
Xiaoheng Zhang is with Chongqing Radio and TV University, Chongqing 400052, China (e-mail: [email protected]).
Y. Li, P. Wang and Y. Liu are with the School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400030, China (e-mail: [email protected]; [email protected]; [email protected]).

I. INTRODUCTION

Parkinson's disease (PD) is a degenerative neurological disease in which physical monitoring, early diagnosis and timely intervention are the key factors for improving the quality of PD diagnosis and treatment. Speech-based PD detection has several advantages, including convenience, a high cost-performance ratio, and noncontact and noninvasive administration. Therefore, further study of speech-based PD diagnostic ability has high scientific value and practical significance [1]–[3]. However, the current relevant classification algorithms for Parkinson's diagnosis are unsatisfactory; thus, it is necessary to improve their accuracy.

In speech-based PD diagnosis, features are extracted from clean speech as primitive materials. In the early stage, the commonly used traditional features were vowels, rhythm, intonation and syllables, which reflect speech characteristics, and these features were most often used in isolation. The prosodic features of speech in PD were studied by Pettorino et al. [4]; quantitative prosody analysis was carried out, and the potential correlations between monophonic pitch, loudness and phonological discontinuities in dysarthria were further analyzed [5]. The continuous vowel information in dysarthria and the syllabic characteristics of speech in PD were also studied [6], [7].

As the research progressed, the main types of features extracted from PD speech were extended to include more accurate features such as pitch, MFCC, cepstral features, shimmer, jitter, HNR [8]–[12], NSR and combinations of these features. Jitter, shimmer and NSR were used to calculate classification accuracy and the UPDRS score, which improved the experimental results [13], [14]. More complete speech features of PD were extracted and studied by Sakar et al. [15], who achieved notable results. In addition, some of the methods for detecting PD are based on mel-cepstrum and other spectrum information [16], [17]. The biomechanical characteristics of PD patients' pronunciation were adopted to make predictions based on Bayesian analysis, and the correlations between biomechanical properties, tremor and pronunciation caused by disarticulation were investigated, providing clues for the biomarkers of Parkinson's



disease [18]. The methods used for feature selection include PCA [19]–[21], LDA [22], [23], NN [24]–[27], a serial search of a rough set [28] and evolutionary computation [23], [29]–[32]. The classifiers mainly include SVM [9], [10], [17], [33], KNN [21], [34], RF [9], [35]–[37], ensemble learning [38]–[40] and decision trees [41]. The deep learning methods include DBN [42], DNN [43], autoencoders [44], and so on. Fuzzy theory [21], [32], [45], [46] has also been adopted as an auxiliary method.

Currently, most relevant research methods are based on two UCI datasets. One was contributed by Little et al. [8], and the other by Sakar et al. [15]. The latter is more challenging since it originated later and the accuracy achieved on it is currently lower. However, considering the regional differences among PD patients and the number of Chinese PD patients, it would be better if Chinese PD patients were also considered.

It is worth noting that all the above studies were based on the current speech data, and a machine learning algorithm was adopted to achieve PD classification [47], [48]. Because these methods are all directly based on a single small speech dataset, small sample-size problems prevent the algorithms from improving their performance. The best solution would be to increase the sample size while simultaneously reducing the feature dimensions. Therefore, a concrete algorithm is proposed in this paper. Convolution sparse coding (CSC) [49]–[51], which has become popular in two-dimensional image processing tasks in recent years, has good unsupervised sparse learning ability and is used to effectively compress feature dimensions to solve the small sample problem. Moreover, considering the large public speech datasets, transfer learning can be combined with sparse coding [52] to extract more valuable information from public datasets, thereby further solving both the small sample problem and the problem of different sample distributions between different datasets. Because the public speech datasets do not include labels, CSC can be used to learn convolution kernels from these public datasets to transform them into feature map information. In this study, a parallel sample/feature selection mechanism is designed and introduced into the subsequent sparse coding processing to simultaneously obtain high-quality samples and features. In addition, to further improve the classification accuracy, a convolution kernel optimization mechanism is adopted to select the proper kernel.

The main contributions and innovations of this paper are as follows:

1) Convolution sparse coding and transfer learning are combined to solve the small sample problem of PD speech data.

2) State-of-the-art sparse methods are optimized and introduced into PD speech data classification.

3) A parallel sample/feature selection mechanism is designed and implemented after the convolution sparse coding and transfer learning to further improve the quality of samples and features.

4) A convolution kernel optimization mechanism is proposed to enhance the current CSC algorithm.
5) Three representative PD speech datasets are considered for this study and used to verify the proposed classification algorithm. The Sakar and MaxLittle datasets are adopted as the well-known PD speech datasets (foreign PD patients), and the DNSH dataset (a self-designed PD speech dataset of Chinese PD patients) is also adopted. The Sakar and MaxLittle datasets involve classification of both normal subjects and PD patients, while the DNSH dataset involves classification of PD patients before and after treatment. The Sakar and MaxLittle datasets are publicly available UCI datasets, and the DNSH dataset was constructed by the authors.

The remainder of this paper is divided into four main sections. Section I describes the background, motivation and value of this paper. Section II elaborates on the proposed method. Section III analyzes the experimental results. Section IV discusses the contributions and limitations of this study and suggests future work.

II. METHOD - PD CLASSIFICATION ALGORITHM BASED ON CONVOLUTION SPARSE KERNEL TRANSFER LEARNING (PD_CSTLOK&S2)

A. Brief Description of the Proposed Algorithm

The main flow of this algorithm is as follows. First, the public speech dataset is expanded using noise injection, forming a larger dataset. Second, features are extracted from the data, thereby constructing a speech feature dataset as the source domain. Then, CSC learning is carried out on the source domain dataset, and a kernel matrix D̂ is obtained. Based on the kernels, the target domain dataset is encoded to calculate the feature maps; then, they are normalized to form a norm feature map matrix E. To improve the computational complexity and the engineering efficiency, the CSC [53] is optimized. Then, the sparse representation feature matrix is extended. Using the Relief algorithm, the weights of the mixed features in the training sample set are calculated, and feature optimization is conducted to generate a new dataset P [48]. For descriptive simplicity, the transfer learning with an optimal kernel, simple transfer learning and nontransfer learning versions are denoted as PD_CSTLOK&S2 (PD classification algorithm based on convolution sparse transfer learning with optimal kernel and parallel selection), PD_CSTL&S2 (PD classification algorithm based on convolution sparse transfer learning and parallel selection) and PD_CSC&S2 (PD classification algorithm based on convolution sparse coding and parallel selection), respectively. A toy end-to-end sketch of the first steps of this flow is given below.
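As a concrete illustration, the following is a minimal, runnable Python skeleton of noise injection, source-domain feature construction and random kernel initialization on synthetic data. All sizes and helper implementations are illustrative stand-ins for the components detailed in the rest of this section, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_noise(S, snrs=(0, 5, 10)):               # step 1: extend the public set
    out = [S]
    p_sig = np.mean(S ** 2)
    for snr in snrs:
        noise = rng.normal(size=S.shape) * np.sqrt(p_sig / 10 ** (snr / 10))
        out.append(S + noise)
    return np.concatenate(out)

def extract_features(S, n_feats=26):                # step 2: stand-in feature extractor
    usable = len(S) // n_feats * n_feats
    return S[:usable].reshape(-1, n_feats)

def init_kernels(k=4, size=4):                      # step 3: random init for Algorithm 1
    D = rng.normal(size=(k, size, size))
    return D / np.linalg.norm(D, axis=(1, 2), keepdims=True)

S = rng.normal(size=4000)                           # fake public speech signal
Y = extract_features(inject_noise(S))               # source-domain feature dataset
D_hat = init_kernels()                              # kernels to be refined by CSC
print(Y.shape, D_hat.shape)
```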

B. Notation

The target domain dataset is
$$\mathbf{A}=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N}\\ a_{21} & a_{22} & \cdots & a_{2N}\\ \vdots & \vdots & & \vdots\\ a_{H1} & a_{H2} & \cdots & a_{HN} \end{bmatrix}=\begin{bmatrix} \tilde{\mathbf{A}}_1\\ \tilde{\mathbf{A}}_2\\ \vdots\\ \tilde{\mathbf{A}}_M \end{bmatrix},$$
where $\mathbf{A}_i=[a_{i1},a_{i2},\ldots,a_{iN}]$, $1\le i\le H$, with label vector $\mathbf{C}=[c_1,c_2,\ldots,c_H]^{T}$, and the partitioned matrix is
$$\tilde{\mathbf{A}}_i=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N}\\ a_{21} & a_{22} & \cdots & a_{2N}\\ \vdots & \vdots & & \vdots\\ a_{H_0 1} & a_{H_0 2} & \cdots & a_{H_0 N} \end{bmatrix},\quad 1\le i\le M.$$
The total number of samples is H, and the number of features per sample (the number of vector components) is N. All the samples belong to M subjects; that is, the number of samples for each subject is $H_0=H/M$. The public dataset is extended to a larger scale by injecting different SNRs and different types of noise. A small indexing example is given below.
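A tiny numpy illustration of this notation, under assumed toy sizes: H samples of N features from M subjects are grouped into M blocks of H0 = H/M rows each.

```python
import numpy as np

H, N, M = 12, 5, 3                 # illustrative values only
H0 = H // M                        # samples per subject, H0 = H/M
A = np.arange(H * N, dtype=float).reshape(H, N)
A_blocks = A.reshape(M, H0, N)     # A_blocks[i] is subject i's H0-by-N matrix
print(A_blocks.shape)              # (3, 4, 5)
```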

The extended datasets are expressed as follows:
$$\bar{\mathbf{S}}=\left[\mathbf{S},\mathbf{S}_1,\mathbf{S}_2,\ldots,\mathbf{S}_J\right],\quad \mathbf{S}_j=\varphi\left(\mathbf{S},\mathbf{N}_j,\mathrm{SNR}_j\right),\ j=1,\ldots,J, \tag{1}$$
where $\mathbf{S}$ is the original voice signal from the public dataset, the $\mathbf{N}_j$ represent different types of noise signals, and $\varphi$ is a function of the type of noise adjustment and the signal-to-noise ratio (SNR). A sketch of one common choice of $\varphi$ is given below.
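The paper does not spell out the exact form of $\varphi$, so the sketch below uses the standard power-ratio scaling as an assumption: the noise is scaled so that the signal-to-noise ratio equals the prescribed value in dB, then added to the clean signal.

```python
import numpy as np

def phi(s, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) == snr_db, then add."""
    p_s = np.mean(s ** 2)
    p_n = np.mean(noise ** 2)
    scale = np.sqrt(p_s / (p_n * 10.0 ** (snr_db / 10.0)))
    return s + scale * noise

rng = np.random.default_rng(1)
s = rng.normal(size=16000)                      # stand-in clean utterance
noise = rng.normal(size=16000)                  # stand-in NOISEX-92 noise signal
S_extended = [s] + [phi(s, noise, snr) for snr in (0, 5, 10)]  # as in (1)
```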

Features are extracted from the extended datasets to form a new feature dataset as the source domain:
$$\mathbf{Y}=\begin{bmatrix} S_{11} & S_{12} & \cdots & S_{1N}\\ S_{21} & S_{22} & \cdots & S_{2N}\\ \vdots & \vdots & & \vdots\\ S_{L1} & S_{L2} & \cdots & S_{LN} \end{bmatrix}=\begin{bmatrix} \tilde{\mathbf{Y}}_1\\ \tilde{\mathbf{Y}}_2\\ \vdots\\ \tilde{\mathbf{Y}}_{L/H_0} \end{bmatrix}, \tag{2}$$
where $\mathbf{Y}_i=[S_{i1},S_{i2},\ldots,S_{iN}]$, $1\le i\le L$. The feature extraction method from [15] was adopted to extract N different signal features. The total number of feature samples is L, $\tilde{\mathbf{Y}}_i$ is a two-dimensional $H_0\times N$ block matrix, $\mathbf{Y}_i$ is a sparse dictionary learning training sample, and $\tilde{\mathbf{Y}}_i$ represents a convolution-kernel sparse learning training sample. For comparison, in the nontransfer learning version, the training set of the target domain dataset is used directly as the object of convolution sparse learning, and the remaining algorithmic processing is similar to the corresponding transfer learning version.

C. Convolution Sparse Coding Learning

In CSC, given M training samples $\{\mathbf{x}_m\}_{m=1}^{M}$, the convolution kernel group $\{\mathbf{d}_k\}_{k=1}^{K}$ is learned by minimizing the following objective function:
$$\arg\min_{\mathbf{d},\mathbf{e}}\ \frac{1}{2}\sum_{m=1}^{M}\left\|\mathbf{x}_m-\sum_{k=1}^{K}\mathbf{d}_k*\mathbf{e}_{m,k}\right\|_2^2+\lambda\sum_{m=1}^{M}\sum_{k=1}^{K}\left\|\mathbf{e}_{m,k}\right\|_1,\quad \text{s.t. } \left\|\mathbf{d}_k\right\|_2^2\le 1,\ k=1,\ldots,K, \tag{3}$$
where $\mathbf{x}_m=\tilde{\mathbf{Y}}_m$ is the $H_0\times N$ block matrix, $\mathbf{e}_{m,k}$ is the $H_0\times N$ feature map matrix, and $\mathbf{x}_m$ is approximated by convolving each feature map with the corresponding convolution kernel $\mathbf{d}_k$. The notation $*$ denotes a two-dimensional convolution, and $\lambda$ is a regularization factor greater than zero. The solution of the above optimization problem is based on the fundamental classical framework of the alternating direction method of multipliers (ADMM) [54].

Formula (3) can be re-expressed as
$$\arg\min_{\mathbf{e}}\ \frac{1}{2}\left\|\mathbf{D}\mathbf{e}-\mathbf{x}\right\|_2^2+\lambda\left\|\mathbf{e}\right\|_1,\quad \text{s.t. } \left\|\mathbf{d}_k\right\|_2^2\le 1, \tag{4}$$
where $\mathbf{D}\mathbf{e}=\sum_{k=1}^{K}\mathbf{d}_k*\mathbf{e}_{m,k}$, $\mathbf{D}=[\mathbf{D}_1\ \mathbf{D}_2\ \cdots\ \mathbf{D}_K]$ is the corresponding vectorizable convolution operator of $[\mathbf{d}_1\ \mathbf{d}_2\ \cdots\ \mathbf{d}_K]$, and $\mathbf{e}=[\mathbf{e}_1^T\ \mathbf{e}_2^T\ \cdots\ \mathbf{e}_K^T]^T$ is the feature map vector. The small check below illustrates why convolution can be treated as such a matrix operator.
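The following self-contained check shows the idea for the circular 1-D case (the paper's operator is two-dimensional; the 1-D case is shown purely for clarity): each $\mathbf{D}_k$ is a circulant matrix built from $\mathbf{d}_k$, so the sum of convolutions equals a single matrix-vector product.

```python
import numpy as np

def circulant_from_kernel(d, n):
    """Build the n-by-n circulant matrix whose action is circular convolution with d."""
    col = np.zeros(n)
    col[: len(d)] = d
    return np.stack([np.roll(col, i) for i in range(n)], axis=1)

n = 8
d = np.array([1.0, -2.0, 0.5])                     # a toy kernel d_k
e = np.random.default_rng(2).normal(size=n)        # a toy feature map e_k
D = circulant_from_kernel(d, n)
d_pad = np.append(d, np.zeros(n - len(d)))
conv = np.real(np.fft.ifft(np.fft.fft(d_pad) * np.fft.fft(e)))
assert np.allclose(D @ e, conv)                    # D e equals d * e (circular)
```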

The solution can be divided into two processes.
First, the convolution kernel is fixed to obtain the feature maps. Formula (4) can be expressed as follows:
$$\arg\min_{\mathbf{e},\mathbf{b}}\ \frac{1}{2}\left\|\mathbf{D}\mathbf{e}-\mathbf{x}\right\|_2^2+\lambda\left\|\mathbf{b}\right\|_1,\quad \text{s.t. } \mathbf{e}-\mathbf{b}=0. \tag{5}$$
In (5), $f_1(\mathbf{e})=\frac{1}{2}\left\|\mathbf{D}\mathbf{e}-\mathbf{x}\right\|_2^2$ and $f_2(\mathbf{b})=\lambda\left\|\mathbf{b}\right\|_1$, which can be solved via the ADMM iteration:
$$\begin{aligned} \mathbf{e}^{(j+1)}&=\arg\min_{\mathbf{e}}\ f_1(\mathbf{e})+\frac{\rho}{2}\left\|\mathbf{e}-\mathbf{b}^{(j)}+\mathbf{u}^{(j)}\right\|_2^2=\arg\min_{\mathbf{e}}\ \frac{1}{2}\left\|\mathbf{D}\mathbf{e}-\mathbf{x}\right\|_2^2+\frac{\rho}{2}\left\|\mathbf{e}-\mathbf{b}^{(j)}+\mathbf{u}^{(j)}\right\|_2^2\\ \mathbf{b}^{(j+1)}&=\arg\min_{\mathbf{b}}\ f_2(\mathbf{b})+\frac{\rho}{2}\left\|\mathbf{e}^{(j+1)}-\mathbf{b}+\mathbf{u}^{(j)}\right\|_2^2=\arg\min_{\mathbf{b}}\ \lambda\left\|\mathbf{b}\right\|_1+\frac{\rho}{2}\left\|\mathbf{e}^{(j+1)}-\mathbf{b}+\mathbf{u}^{(j)}\right\|_2^2\\ \mathbf{u}^{(j+1)}&=\mathbf{u}^{(j)}+\mathbf{e}^{(j+1)}-\mathbf{b}^{(j+1)} \end{aligned} \tag{6}$$

Second, the feature map is fixed to solve the convolution kernel. Formula (4) can be expressed as follows:
$$\arg\min_{\mathbf{d},\mathbf{c}}\ \frac{1}{2}\left\|\mathbf{E}\mathbf{d}-\mathbf{x}\right\|_2^2,\quad \text{s.t. } \left\|\mathbf{c}_k\right\|_2^2\le 1 \text{ and } \mathbf{d}-\mathbf{c}=0. \tag{7}$$
In (7), $g_1(\mathbf{d})=\frac{1}{2}\left\|\mathbf{E}\mathbf{d}-\mathbf{x}\right\|_2^2$ and $g_2(\mathbf{c})$ is the indicator function of the convex set $\left\|\mathbf{c}_k\right\|_2^2\le 1$; in (8), $\mathrm{prox}(\cdot)$ computes the corresponding proximal operator. The problem can be solved via the ADMM iteration:
$$\begin{aligned} \mathbf{d}^{(j+1)}&=\arg\min_{\mathbf{d}}\ g_1(\mathbf{d})+\frac{\rho}{2}\left\|\mathbf{d}-\mathbf{c}^{(j)}+\mathbf{v}^{(j)}\right\|_2^2=\arg\min_{\mathbf{d}}\ \frac{1}{2}\left\|\mathbf{E}\mathbf{d}-\mathbf{x}\right\|_2^2+\frac{\rho}{2}\left\|\mathbf{d}-\mathbf{c}^{(j)}+\mathbf{v}^{(j)}\right\|_2^2\\ \mathbf{c}^{(j+1)}&=\mathrm{prox}\left(\mathbf{d}^{(j+1)}+\mathbf{v}^{(j)}\right)\\ \mathbf{v}^{(j+1)}&=\mathbf{v}^{(j)}+\mathbf{d}^{(j+1)}-\mathbf{c}^{(j+1)} \end{aligned} \tag{8}$$
Finally, a set of sparse convolution kernels $[\mathbf{d}_1,\mathbf{d}_2,\ldots,\mathbf{d}_K]$ is obtained by the alternating iterations.

D. Solution of Fast Convolution Sparse Coding

CSC feature map learning can be realized by ADMM. The fast convolutional sparse coding algorithm [53] is adopted here when the convolution kernels $[\mathbf{d}_1,\mathbf{d}_2,\ldots,\mathbf{d}_K]$ are fixed. The feature extraction formula is expanded as follows:
$$\mathbf{e}^{(j+1)}=\arg\min_{\mathbf{e}}\ \frac{1}{2}\left\|\mathbf{D}\mathbf{e}-\mathbf{x}\right\|_2^2+\frac{\rho}{2}\left\|\mathbf{e}-\mathbf{b}^{(j)}+\mathbf{u}^{(j)}\right\|_2^2=\left(\mathbf{I}+\frac{1}{\rho}\mathbf{D}^T\mathbf{D}\right)^{-1}\left(\frac{1}{\rho}\mathbf{D}^T\mathbf{x}+\mathbf{b}^{(j)}-\mathbf{u}^{(j)}\right), \tag{9}$$
$$\mathbf{b}^{(j+1)}=\arg\min_{\mathbf{b}}\ \lambda\left\|\mathbf{b}\right\|_1+\frac{\rho}{2}\left\|\mathbf{e}^{(j+1)}-\mathbf{b}+\mathbf{u}^{(j)}\right\|_2^2=S_{\lambda/\rho}\left(\mathbf{e}^{(j+1)}+\mathbf{u}^{(j)}\right), \tag{10}$$
$$\mathbf{u}^{(j+1)}=\mathbf{u}^{(j)}+\mathbf{e}^{(j+1)}-\mathbf{b}^{(j+1)}. \tag{11}$$
Because matrix inversion is extremely time consuming, in (9), $\left(\mathbf{I}+\frac{1}{\rho}\mathbf{D}^T\mathbf{D}\right)^{-1}$ can be precomputed in the Fourier domain as $\left(1+\frac{1}{\rho}\sum_{k=1}^{K}|\hat{d}_k|^2\right)^{-1}$ for the fixed parameters, greatly reducing the algorithm complexity, where $\hat{d}_k$ is the magnitude of the Fourier transform of $\mathbf{d}_k$. In (10), $S_{\lambda/\rho}(\cdot)$ is the element-wise soft-threshold function. A stable feature map can be obtained after many iterations. A minimal numerical sketch of this fast update is given below.
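The sketch below implements (9)-(11) for the single-kernel, 1-D circular case, with the Fourier-domain factor precomputed once exactly as the text describes; the parameter values are illustrative, and this is a didactic stand-in rather than the authors' implementation.

```python
import numpy as np

def soft_threshold(z, t):                           # S in (10), element-wise
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def csc_feature_map(x, d, lam=0.1, rho=1.0, iters=50):
    n = len(x)
    d_hat = np.fft.fft(d, n)                        # kernel spectrum, zero-padded
    x_hat = np.fft.fft(x)
    inv = 1.0 / (1.0 + np.abs(d_hat) ** 2 / rho)    # precomputed once, as in the text
    b, u = np.zeros(n), np.zeros(n)
    for _ in range(iters):
        rhs = np.conj(d_hat) * x_hat / rho + np.fft.fft(b - u)
        e = np.real(np.fft.ifft(inv * rhs))         # e-update, Eq. (9)
        b = soft_threshold(e + u, lam / rho)        # b-update, Eq. (10)
        u = u + e - b                               # multiplier update, Eq. (11)
    return b

rng = np.random.default_rng(7)
x, d = rng.normal(size=64), rng.normal(size=5)
print(np.count_nonzero(csc_feature_map(x, d)), "nonzeros in the sparse map")
```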

Similarly, the kernel learning formula is expanded as follows:
$$\mathbf{d}^{(j+1)}=\arg\min_{\mathbf{d}}\ \frac{1}{2}\left\|\mathbf{E}\mathbf{d}-\mathbf{x}\right\|_2^2+\frac{\rho}{2}\left\|\mathbf{d}-\mathbf{c}^{(j)}+\mathbf{v}^{(j)}\right\|_2^2=\left(\mathbf{I}+\frac{1}{\rho}\mathbf{E}^T\mathbf{E}\right)^{-1}\left(\frac{1}{\rho}\mathbf{E}^T\mathbf{x}+\mathbf{c}^{(j)}-\mathbf{v}^{(j)}\right), \tag{12}$$
$$\mathbf{c}^{(j+1)}=\begin{cases} F_{\mathrm{supp}}\left(\mathbf{d}^{(j+1)}+\mathbf{v}^{(j)}\right)\big/\left\|F_{\mathrm{supp}}\left(\mathbf{d}^{(j+1)}+\mathbf{v}^{(j)}\right)\right\|_2, & \text{if }\left\|F_{\mathrm{supp}}\left(\mathbf{d}^{(j+1)}+\mathbf{v}^{(j)}\right)\right\|_2>1\\ F_{\mathrm{supp}}\left(\mathbf{d}^{(j+1)}+\mathbf{v}^{(j)}\right), & \text{otherwise} \end{cases} \tag{13}$$
where $F_{\mathrm{supp}}(\mathbf{d})$ is a mask that takes the value 1 on $\mathrm{supp}(\mathbf{d})$ and 0 otherwise, and
$$\mathbf{v}^{(j+1)}=\mathbf{v}^{(j)}+\mathbf{d}^{(j+1)}-\mathbf{c}^{(j+1)}. \tag{14}$$
To further improve the efficiency of the CSC algorithm, by revisiting the iterative scheme (9)-(14), it can be observed that the variables $\mathbf{e}$ and $\mathbf{d}$ play only an intermediate role. Based on [55], a new iterative feature extraction formula is generated as follows:

$$\begin{aligned} \text{Step 1:}\quad &\mathbf{e}^{(j)}=\left(\mathbf{I}+\frac{1}{\rho}\mathbf{D}^T\mathbf{D}\right)^{-1}\left(\frac{1}{\rho}\mathbf{D}^T\mathbf{x}+\mathbf{b}^{(j)}-\mathbf{u}^{(j)}\right),\\ &\mathbf{u}^{(j+1)}=\mathbf{u}^{(j)}+\mathbf{e}^{(j)}-\mathbf{b}^{(j)},\\ &\mathbf{b}^{(j+1)}=\begin{cases}\mathbf{e}^{(j)}+\mathbf{u}^{(j)}-\frac{\lambda}{\rho}\operatorname{sign}\left(\mathbf{e}^{(j)}+\mathbf{u}^{(j)}\right), & \left|\mathbf{e}^{(j)}+\mathbf{u}^{(j)}\right|\ge\frac{\lambda}{\rho}\\ 0, & \text{otherwise}\end{cases}\\ \text{Step 2:}\quad &\mathbf{b}^{(j+1)}\leftarrow\eta\,\mathbf{b}^{(j+1)}+(1-\eta)\,\mathbf{b}^{(j)},\qquad \mathbf{u}^{(j+1)}\leftarrow\eta\,\mathbf{u}^{(j+1)}+(1-\eta)\,\mathbf{u}^{(j)}. \end{aligned} \tag{15}$$

Then, a new kernel learning formula is generated as follows:
$$\begin{aligned} \text{Step 1:}\quad &\mathbf{d}^{(j)}=\left(\mathbf{I}+\frac{1}{\rho}\mathbf{E}^T\mathbf{E}\right)^{-1}\left(\frac{1}{\rho}\mathbf{E}^T\mathbf{x}+\mathbf{c}^{(j)}-\mathbf{v}^{(j)}\right),\\ &\mathbf{v}^{(j+1)}=\mathbf{v}^{(j)}+\mathbf{d}^{(j)}-\mathbf{c}^{(j)},\\ &\mathbf{c}^{(j+1)}=\begin{cases}F_{\mathrm{supp}}\left(\mathbf{d}^{(j)}+\mathbf{v}^{(j)}\right)\big/\left\|F_{\mathrm{supp}}\left(\mathbf{d}^{(j)}+\mathbf{v}^{(j)}\right)\right\|_2, & \text{if }\left\|F_{\mathrm{supp}}\left(\mathbf{d}^{(j)}+\mathbf{v}^{(j)}\right)\right\|_2>1\\ F_{\mathrm{supp}}\left(\mathbf{d}^{(j)}+\mathbf{v}^{(j)}\right), & \text{otherwise}\end{cases}\\ \text{Step 2:}\quad &\mathbf{c}^{(j+1)}\leftarrow\eta\,\mathbf{c}^{(j+1)}+(1-\eta)\,\mathbf{c}^{(j)},\qquad \mathbf{v}^{(j+1)}\leftarrow\eta\,\mathbf{v}^{(j+1)}+(1-\eta)\,\mathbf{v}^{(j)}, \end{aligned} \tag{16}$$
where $\eta$ is a relaxation factor.

A pseudocode description of the fast kernel learning algorithm from a public dataset is shown in Algorithm 1; a toy sketch of this alternating loop is given after the algorithm.

Algorithm 1: Fast kernel learning algorithm
Input: Public dataset S, total sample size H, number of features per sample N
Output: Convolution kernel D̂
Procedure:
1: Using Formula (1), add different types of noise with different signal-to-noise ratios to the dataset and extend it to the set S̄;
2: Using Formula (2), extract the features of the speech samples from S̄ and construct the feature dataset, which is the source domain dataset Y;
3: for iteration = 1…iter_num do
4:   for iteration1 = 1…iter_num1 do
5:     Using Formula (15), fix the convolution kernel to obtain the feature maps;
6:   end for
7:   for iteration2 = 1…iter_num2 do
8:     Using Formula (16), fix the feature map to obtain the convolution kernel;
9:   end for
10: end for
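The toy sketch below mirrors Algorithm 1's alternating structure for one 1-D signal and a single kernel: each outer round fixes the kernel to update the feature map via (15)-style soft-threshold iterations, then fixes the map and takes one (16)-style kernel step (a Fourier-domain solve with c and v initialized to zero, followed by the support mask and the unit-norm projection of (13)). It is a didactic stand-in under those simplifications, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, ksize = 64, 5
x = rng.normal(size=n)                    # one training signal (toy stand-in)
d = rng.normal(size=ksize)
d /= np.linalg.norm(d)                    # start feasible: ||d||_2 <= 1
lam, rho = 0.1, 1.0

def fourier_solve(a_hat, rhs_hat, rho):
    """Apply (I + A^T A / rho)^{-1} in the Fourier domain (diagonal there)."""
    return np.real(np.fft.ifft(rhs_hat / (1.0 + np.abs(a_hat) ** 2 / rho)))

for _ in range(20):                                        # iter_num
    # --- fix d, update the feature map (a few (15)-style iterations) ---
    d_hat, x_hat = np.fft.fft(d, n), np.fft.fft(x)
    b, u = np.zeros(n), np.zeros(n)
    for _ in range(10):                                    # iter_num1
        e = fourier_solve(d_hat, np.conj(d_hat) * x_hat / rho + np.fft.fft(b - u), rho)
        b = np.sign(e + u) * np.maximum(np.abs(e + u) - lam / rho, 0.0)
        u = u + e - b
    # --- fix the map, one (16)-style kernel step (c = v = 0), then (13) ---
    e_hat = np.fft.fft(b)
    d_full = fourier_solve(e_hat, np.conj(e_hat) * x_hat / rho, rho)
    d = d_full[:ksize]                                     # support mask F_supp
    d /= max(1.0, np.linalg.norm(d))                       # project onto ||d||_2 <= 1
print("kernel after training:", np.round(d, 3))
```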

E. Algorithm for Simultaneous Selection of Speech Samples and Features

In Formula (15), by replacing $\mathbf{x}$ with $\tilde{\mathbf{A}}_i$ and $\mathbf{D}$ with $\hat{\mathbf{D}}$, the target domain feature matrix $\tilde{\mathbf{A}}_i$ is mapped to the feature map matrix $\mathbf{e}=[\mathbf{e}_1,\mathbf{e}_2,\ldots,\mathbf{e}_K]$ through a finite number of iterations. Then, $\mathbf{e}_k$ is selected as a specific mapping $\tilde{\mathbf{E}}_i$, and a new feature set $\mathbf{E}=[\tilde{\mathbf{E}}_1;\tilde{\mathbf{E}}_2;\ldots;\tilde{\mathbf{E}}_M]$ is constructed.

Fig. 1 shows the images in the processing flow of CSC for the DNSH dataset. In Fig. 1(a), the locally normalized image is a visualization of one subject's data from the DNSH dataset. The image in Fig. 1(b) is the sparse coding of the subject data in Fig. 1(a), obtained from the convolutional sparse model. Fig. 1(c) shows a feature map extracted from the subject data in Fig. 1(a). The sparsity of Fig. 1(b) is greater than that of Fig. 1(a), and Fig. 1(c) is extremely sparse, which means that the features extracted by convolutional sparse coding are highly significant.

Fig. 1. Images in the processing flow of CSC for the DNSH dataset: (a) locally normalized image; (b) image obtained from the convolution sparse model; (c) feature map.

The feature map matrix E is extended to G as follows:
$$\mathbf{G}=\begin{bmatrix}\mathbf{G}_1\\ \mathbf{G}_2\\ \vdots\\ \mathbf{G}_M\end{bmatrix}=\begin{bmatrix}\mathrm{RESHAPE}(\tilde{\mathbf{E}}_1)\\ \mathrm{RESHAPE}(\tilde{\mathbf{E}}_2)\\ \vdots\\ \mathrm{RESHAPE}(\tilde{\mathbf{E}}_M)\end{bmatrix},\quad \mathbf{G}_i\in\mathbb{R}^{1\times H_0 N}, \tag{17}$$
where the feature extension of CSC expands the $H_0$ row vectors $\mathbf{E}_i$ of the same subject into one row vector; that is, the convolution sparse coding feature expansion transforms each feature matrix $\tilde{\mathbf{E}}_i$ into one row vector. G is then normalized to obtain $\bar{\mathbf{G}}$, and the matrix $\bar{\mathbf{G}}=\begin{bmatrix}\mathbf{G}_m\\ \mathbf{T}_m\end{bmatrix}$ is split into a training set $\mathbf{G}_m=[\Gamma_{G1},\Gamma_{G2},\ldots,\Gamma_{GN_0}]$ and a test set $\mathbf{T}_m$. A small reshape example follows.
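A small numpy sketch of the expansion (17): each subject's $H_0\times N$ feature map is flattened into one row of G. The min-max column normalization is an assumption; the paper does not specify the normalization used to obtain $\bar{\mathbf{G}}$.

```python
import numpy as np

M, H0, N = 3, 4, 5                                    # illustrative sizes only
E = np.random.default_rng(4).normal(size=(M, H0, N))  # one feature map per subject
G = E.reshape(M, H0 * N)                              # RESHAPE: one row per subject
G_bar = (G - G.min(axis=0)) / (G.max(axis=0) - G.min(axis=0) + 1e-12)
print(G_bar.shape)                                    # (3, 20): M rows, H0*N columns
```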

Based on the Relief algorithm, the weight vector $\mathbf{W}=[w_1,w_2,\ldots,w_{N_0}]$ of the normalized feature column vectors $[\Gamma_{G1},\Gamma_{G2},\ldots,\Gamma_{GN_0}]$ is calculated. The weight of feature $\Gamma_j$ is as follows:
$$w_j=\sum_{i=1}^{M}\left(\sum_{r=1}^{R}\left(\Gamma_{ij}-\mathrm{H}_r(\Gamma_{ij})\right)^2-\sum_{r=1}^{R}\left(\Gamma_{ij}-\mathrm{M}_r(\Gamma_{ij})\right)^2\right), \tag{18}$$
where $\mathrm{M}(\Gamma_{ij})$ is the neighbor set of the same class as $\Gamma_{ij}$, R is the set size, $\mathrm{H}(\Gamma_{ij})$ is the neighbor set of the other class, and $\mathrm{M}_r(\Gamma_{ij})\in \mathrm{M}(\Gamma_{ij})$, $\mathrm{H}_r(\Gamma_{ij})\in \mathrm{H}(\Gamma_{ij})$. By reordering $\mathbf{W}$ so that $w_1\ge w_2\ge\cdots\ge w_Q$, the feature sets $\mathbf{P}_{G_m}^{Q}$ and $\mathbf{P}_{T_m}^{Q}$ are reconstructed according to the weights:
$$\mathbf{P}_{G_m}^{Q}=\left[\Gamma'_{G1},\Gamma'_{G2},\ldots,\Gamma'_{GQ}\right],\qquad \mathbf{P}_{T_m}^{Q}=\left[\Gamma'_{T1},\Gamma'_{T2},\ldots,\Gamma'_{TQ}\right]. \tag{19}$$
The Q-dimensional vector sets $[\Gamma'_{G1},\ldots,\Gamma'_{GQ}]$ and $[\Gamma'_{T1},\ldots,\Gamma'_{TQ}]$ are selected according to the maximum Q weights of $[\Gamma_{G1},\ldots,\Gamma_{GN_0}]$, where $l_i=[0\ \cdots\ 0\ 1\ 0\ \cdots\ 0]^T$ and the index of its nonzero entry is the column marker for the feature column vector with weight $w_i$. A compact Relief sketch is given below.
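The sketch below is a generic Relief-style weighting consistent with (18): for each sample, a feature's weight grows with its squared difference from the R nearest neighbors of the other class (H) and shrinks with that from the same class (M). It is a hedged approximation, not the authors' exact code.

```python
import numpy as np

def relief_weights(X, y, R=3):
    """Relief-style weights: + squared distance to R nearest other-class
    neighbors (H), - squared distance to R nearest same-class neighbors (M)."""
    n, p = X.shape
    w = np.zeros(p)
    for i in range(n):
        dist = np.sqrt(((X - X[i]) ** 2).sum(axis=1))
        dist[i] = np.inf                              # exclude the sample itself
        same = np.where(y == y[i])[0]
        other = np.where(y != y[i])[0]
        hits = same[np.argsort(dist[same])[:R]]       # M(.) in the text
        misses = other[np.argsort(dist[other])[:R]]   # H(.) in the text
        w += ((X[misses] - X[i]) ** 2).sum(axis=0) - ((X[hits] - X[i]) ** 2).sum(axis=0)
    return w

rng = np.random.default_rng(5)
X, y = rng.normal(size=(40, 10)), rng.integers(0, 2, 40)
Q = 5
top_Q = np.argsort(relief_weights(X, y))[::-1][:Q]    # indices kept, as in (19)
print("selected feature columns:", top_Q)
```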

The pseudocode description of the simultaneous selection based on the convolution kernel algorithm is shown in Algorithm 2; a sketch of the LOSO evaluation in step 6 follows the algorithm.

Algorithm 2: Simultaneous Selection based on the Convolution Kernel Algorithm (SS_CK)
Input: Convolution kernel D̂, target domain dataset A, total sample size H, number of features per sample N, number of subjects M
Output: accuracy
Procedure:
1: Compute the feature map matrix E of the target domain dataset, expand it to G, and normalize it to Ḡ;
2: Split Ḡ = [Gm; Tm] into a training set Gm and a test set Tm;
3: Scan each sample of the training set Gm; select R neighbors from the same class to construct M(Γij), and select R neighbors from the other class to construct H(Γij);
4: Using (18), calculate W;
5: According to the maximum Q weights in W, select the optimal features and obtain a new matrix P;
6: Perform LOSO CV; the classification accuracy is then calculated based on an SVM classifier with a linear kernel.
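Step 6 can be reproduced with scikit-learn's leave-one-group-out splitter, using subject IDs as groups so that one subject's recordings never appear in both the training and test folds; the data below are random stand-ins.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(60, 20))              # selected feature matrix P (toy)
y = rng.integers(0, 2, 60)                 # class labels (toy)
subjects = np.repeat(np.arange(20), 3)     # 20 subjects, 3 samples each (toy)

scores = cross_val_score(SVC(kernel="linear"), X, y,
                         groups=subjects, cv=LeaveOneGroupOut())
print("LOSO accuracy: %.3f" % scores.mean())
```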

F. Convolution Kernel Optimization

The classification quality is influenced by numerous factors, such as the number of convolution kernels, the number of feature maps, and the number of features selected by simultaneous selection; therefore, it is important to train a proper sparse coding kernel. In (20), both the sparse convolution characteristic of Algorithm 1 (subproblem a) and the classification accuracy of Algorithm 2 (subproblem b) are considered:
$$\text{a: }\arg\min_{\mathbf{d},\mathbf{e}}\ \frac{1}{2}\sum_{m=1}^{M}\left\|\mathbf{x}_m-\sum_{k=1}^{K}\mathbf{d}_k*\mathbf{e}_{m,k}\right\|_2^2+\lambda\sum_{m=1}^{M}\sum_{k=1}^{K}\left\|\mathbf{e}_{m,k}\right\|_1;\qquad \text{b: }\max\ \mathrm{SS\_CK}\left(\mathbf{d}_k,Q\right)\ \text{on the optimal kernel set}. \tag{20}$$
Subproblem a is a biconvex problem; consequently, it is difficult to prove whether $\mathrm{SS\_CK}(\mathbf{d}_k,Q)$ is convex. Clearly, subproblem b in (20) is a complex combinatorial optimization problem with a large parameter search space.

For the reasons above, a simplified kernel optimization solution mechanism is proposed in this paper. The core idea is that by initializing the convolution kernel $\mathbf{D}_{ij}$ using multiple random seeds in parallel, the optimal kernel $\mathbf{D}_{\mathrm{optimal}}$ is selected based on the classification accuracy on the kernel training set.

The pseudocode of the entire proposed algorithm, including convolution kernel optimization, is shown in Algorithm 3.


Algorithm 3 (Complete Algorithm): PD_CSTLOK&S2
Input: Public dataset S, target domain dataset A, total sample size H, number of features per sample N, number of subjects M
Output: accuracy, sensitivity, specificity
Procedure:
1: Split A into a kernel optimization set A1 and a training-test set A2;
2: for kernel_num = 1…i do
3:   for featuremap_num = 1…kernel_num do
4:     parallel for seed = 1…j do
5:       Randomly initialize the convolution kernel Dij;
6:       Calculate the convolution kernel on dataset A1 using Algorithm 1;
7:       Calculate the accuracy using Algorithm 2;
8:       Obtain Doptimal when the best result of SS_CK(Doptimal, Q) is achieved;
9:     end parallel for
10:    Calculate the classification accuracy on dataset A2 based on Doptimal using Algorithm 2;
11:  end for
12: end for

A sketch of the parallel seed search in steps 4-9 is given below.
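The sketch below illustrates the parallel seed search: several randomly seeded kernel initializations are evaluated concurrently, and the kernel with the best kernel-training-set accuracy is kept. The training and scoring bodies are placeholders for Algorithms 1 and 2.

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def train_and_score(seed):
    """Placeholder for one branch of steps 5-8: random kernel init
    (Algorithm 1 would refine it), then a stand-in accuracy in place
    of Algorithm 2; replace both with the real routines."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(8, 8, 8))                     # 8 kernels of size 8x8
    D /= np.linalg.norm(D, axis=(1, 2), keepdims=True)
    accuracy = float(rng.uniform(0.5, 1.0))            # stand-in for SS_CK accuracy
    return accuracy, D

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(train_and_score, range(10)))  # 10 random seeds
    best_accuracy, D_optimal = max(results, key=lambda r: r[0])
    print("best kernel-training-set accuracy:", round(best_accuracy, 3))
```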

III. EXPERIMENTAL RESULTS AND ANALYSIS

A. Experimental Environment

The source domain dataset of this paper is the standard speech dataset TIMIT. The training set includes data from 40 men and 40 women, each with 3 sentences. The dataset is supplemented with noise (from the NOISEX-92 noise dataset) and expanded; the expanded dataset is 10 times larger than the original. Based on the expanded dataset, 26 features (jitter, shimmer, AC, NTH, HTN, etc.) are extracted to construct the feature set (feature matrix). Three PD datasets were selected as the target sets. The 1st dataset was created by Little et al. [8], [9]. The 2nd dataset was created by Sakar et al. [15] in 2014. The 3rd dataset was collected by the authors at the Department of Neurology, Southwest Hospital, and contains 10 subjects in the closing period and 21 subjects in the opening period. The corresponding 13 voice samples collected are labeled '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'a', 'o' and 'u', and the extracted feature types are the same as the features in the Sakar dataset.

The subjects in each dataset were subjected to cross-validation by LOSO. The main reason for choosing this cross-validation method is that there are insufficient samples; cross-validation maximizes the number of training samples to better reflect the algorithm's potential, so the test accuracy is closer to the results of an actual application scenario. Different from the current literature, which uses k-fold and holdout cross-validation, the training set and test set here represent different subjects to ensure that the classification accuracy is both realistic and consistent with the actual diagnosis.

The hardware configuration of the experiment platform was as follows: CPU (Intel i5-3230M), 4 GB of memory and MATLAB R2014a. For the PD_CSTLOK&S2 algorithm, the number of random seeds was 10, and the numbers of main training iterations, feature map iterations and convolution kernel iterations were set to 100, 10 and 10, respectively. The relaxation factor was $\eta=1$. The number of convolution kernels ranged from 2 to 8, and the size of the convolution kernel was 8×8 for the Sakar and DNSH datasets and 4×4 for the MaxLittle dataset.

B. Verifying the Main Parts of the Proposed Algorithm

1) Verification of Parallel Sample/Feature Selection

Fig. 2 shows a comparison of the classification accuracy of the PD_CSTLOK&S2 algorithm before and after parallel sample/feature selection for the MaxLittle dataset. The abscissa is the (convolution kernel num, feature map num) pair, and the ordinate is the classification accuracy. As Fig. 2 shows, the classification accuracy after applying parallel sample/feature selection is significantly higher than that before applying the technique. The results show that parallel sample/feature selection can significantly improve the test accuracy.

Fig. 2. Comparison of PD_CSTLOK&S2 before and after parallel sample/feature selection.

2) Verification of Transfer Learning and Optimal Convolution Kernels

The classification performance is illustrated in Fig. 3, where the confusion matrices show the actual and predicted classifications. Fig. 3(a) presents the comparison results of PD_CSC&S2, PD_CSTL&S2 and PD_CSTLOK&S2 on the Sakar dataset; the cases of the MaxLittle dataset in Fig. 3(b) and the DNSH dataset in Fig. 3(c) are similar.

PD_CSC&S2 | Predicted TRUE | Predicted FALSE
Actual TRUE | 0.73 | 0.27
Actual FALSE | 0.33 | 0.67

PD_CSTL&S2 | Predicted TRUE | Predicted FALSE
Actual TRUE | 0.91 | 0.09
Actual FALSE | 0.19 | 0.81

PD_CSTLOK&S2 | Predicted TRUE | Predicted FALSE
Actual TRUE | 0.90 | 0.10
Actual FALSE | 0.00 | 1.00

Fig. 3(a). The confusion matrices for the PD_CSC&S2, PD_CSTL&S2 and PD_CSTLOK&S2 algorithms on the Sakar dataset.

PD_CSC&S2 | Predicted TRUE | Predicted FALSE
Actual TRUE | 0.94 | 0.06
Actual FALSE | 0.66 | 0.34

PD_CSTL&S2 | Predicted TRUE | Predicted FALSE
Actual TRUE | 0.98 | 0.02
Actual FALSE | 0.57 | 0.42

PD_CSTLOK&S2 | Predicted TRUE | Predicted FALSE
Actual TRUE | 1.00 | 0.00
Actual FALSE | 0.00 | 1.00

Fig. 3(b). The confusion matrices for the PD_CSC&S2, PD_CSTL&S2 and PD_CSTLOK&S2 algorithms on the MaxLittle dataset.

PD_CSC&S2 | Predicted TRUE | Predicted FALSE
Actual TRUE | 0.35 | 0.65
Actual FALSE | 0.06 | 0.94

PD_CSTL&S2 | Predicted TRUE | Predicted FALSE
Actual TRUE | 0.60 | 0.40
Actual FALSE | 0.09 | 0.91

PD_CSTLOK&S2 | Predicted TRUE | Predicted FALSE
Actual TRUE | 1.00 | 0.00
Actual FALSE | 0.20 | 0.80

Fig. 3(c). The confusion matrices for the PD_CSC&S2, PD_CSTL&S2 and PD_CSTLOK&S2 algorithms on the DNSH dataset.

As the comparison in Fig. 3(a) shows, moving from PD_CSC&S2 to PD_CSTL&S2 to PD_CSTLOK&S2, the sensitivity improves from 73% to 91%, increasing by 18%, and then decreases slightly to 90%. The specificity improves from 67% to 81% and then to 100%, increasing by 14% and 19%, respectively. In Fig. 3(b), the sensitivity improves from 94% to 98% and then to 100%, increasing by 4% and 2%, respectively; the specificity improves from 34% to 42% and then to 100%, increasing by 8% and 58%, respectively. In Fig. 3(c), the sensitivity improves from 35% to 60% and then to 100%, increasing by 25% and 40%, respectively. These quantities follow directly from the confusion matrices, as illustrated below.
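For example, for PD_CSC&S2 on the Sakar dataset, the quoted sensitivity and specificity are simply the diagonal rates of the normalized confusion matrix in Fig. 3(a):

```python
import numpy as np

cm = np.array([[0.73, 0.27],    # actual TRUE:  [pred. TRUE, pred. FALSE]
               [0.33, 0.67]])   # actual FALSE: [pred. TRUE, pred. FALSE]
sensitivity = cm[0, 0] / cm[0].sum()   # 0.73 -> 73%
specificity = cm[1, 1] / cm[1].sum()   # 0.67 -> 67%
print(sensitivity, specificity)
```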

C. Effects of Parameters on the Proposed Algorithm's Performance

The convolution kernel is one of the main parameters of PD_CSTLOK&S2; therefore, it is necessary to study its effect on the algorithm's performance. For the Sakar dataset, as the number of convolution kernels increases from 2 to 8, the relationship between the number of convolution kernels and the classification accuracy is shown in Table I.

TABLE I
RELATIONSHIP BETWEEN THE NUMBER OF CONVOLUTION KERNELS AND CLASSIFICATION ACCURACY FOR THE SAKAR DATASET

Kernel num | Feature map num | Accuracy (%) | Sensitivity (%) | Specificity (%)
2 | 1 | 75.0 | 80.0 | 70.0
2 | 2 | 60.0 | 60.0 | 60.0
3 | 1 | 70.0 | 70.0 | 70.0
3 | 2 | 65.0 | 60.0 | 70.0
3 | 3 | 60.0 | 60.0 | 60.0
4 | 1 | 80.0 | 80.0 | 80.0
4 | 2 | 60.0 | 60.0 | 60.0
4 | 3 | 60.0 | 70.0 | 50.0
4 | 4 | 85.0 | 100.0 | 70.0
5 | 1 | 70.0 | 80.0 | 60.0
5 | 2 | 80.0 | 70.0 | 90.0
5 | 3 | 80.0 | 80.0 | 80.0
5 | 4 | 50.0 | 60.0 | 40.0
5 | 5 | 85.0 | 80.0 | 90.0
6 | 1 | 60.0 | 70.0 | 50.0
6 | 2 | 65.0 | 50.0 | 80.0
6 | 3 | 65.0 | 60.0 | 70.0
6 | 4 | 80.0 | 80.0 | 80.0
6 | 5 | 60.0 | 60.0 | 60.0
6 | 6 | 85.0 | 80.0 | 90.0
7 | 1 | 65.0 | 80.0 | 50.0
7 | 2 | 75.0 | 80.0 | 70.0
7 | 3 | 75.0 | 60.0 | 90.0
7 | 4 | 80.0 | 90.0 | 70.0
7 | 5 | 65.0 | 70.0 | 60.0
7 | 6 | 75.0 | 80.0 | 70.0
7 | 7 | 85.0 | 80.0 | 90.0
8 | 1 | 80.0 | 80.0 | 80.0
8 | 2 | 80.0 | 90.0 | 70.0
8 | 3 | 70.0 | 70.0 | 70.0
8 | 4 | 65.0 | 70.0 | 60.0
8 | 5 | 75.0 | 60.0 | 90.0
8 | 6 | 90.0 | 90.0 | 90.0
8 | 7 | 80.0 | 80.0 | 80.0
8 | 8 | 95.0 | 90.0 | 100.0

As shown in Table I, with 8 convolution kernels and 8 feature maps, the average classification accuracy reaches a maximum of 95.0%, the sensitivity reaches 90.0%, and the specificity reaches 100.0%. It can be seen that the best result is obtained by selecting a proper convolution kernel num and feature map num.

D. Comparison with Representative Algorithms

Table II presents the classification and comparison results of this algorithm on the Sakar dataset. The algorithm proposed in this paper is compared with other representative algorithms on this dataset. In addition, the proposed algorithm is compared with other relevant algorithms, including the SVM with linear and radial kernels, DBN, CNN and a deep autoencoder algorithm.

TABLE II
COMPARISON OF THE CLASSIFICATION RESULTS OF THE PROPOSED ALGORITHM (SAKAR DATASET)

Study | Method | Accuracy (%) | Sensitivity (%) | Specificity (%)
Sakar et al. [15] | KNN+SVM | 55 (LOSO CV) | 60 | 50
Canturk and Karabiber [56] | 4 feature selection methods + 6 classifiers | 57.5 (LOSO CV) | 54.28 | 80
Eskidere et al. [38] | Random subspace classifier ensemble | 74.17 (10-fold CV) | — | —
Behroozi and Sami [39] | Multiple classifier framework | 87.50 (A-MCFS) | 90.00 | 85.00
Zhang et al. [35] | MENN + RF with MENN | 81.5 (LOSO CV) | 92.50 | 70.50
Benba et al. [57] | HFCC+SVM | 87.5 (LOSO CV) | 90.00 | 85.00
Li et al. [33] | Hybrid feature learning + SVM | 82.50 (LOSO CV) | 85.00 | 80.00
Vadovský and Paralič [36] | C4.5+C5.0+RF+CART | 66.5 (4-fold CV) | — | —
Zhang [22] | LSVM+MSVM+RSVM+CART+KNN+LDA+NB | 94.17 (Holdout) | 50.00 | 94.92
Benba et al. [17] | MFCC+SVM | 82.5 (LOSO CV) | 80.0 | 85.0
Kraipeerapun and Amornsamanku [58] | Stacking + CMTNN | 75 (10-fold CV) | — | —
Khan et al. [40] | Evolutionary neural network ensembles | 90 (10-fold CV) | 93.00 | 97.00
Ali et al. [23] | LDA-NN-GA | 95 (LOSO CV) | 95 | 95
— | DBN | 54.6 (LOSO CV) | 52.4 | 56.8
— | CNN | 60.0 (LOSO CV) | 63.0 | 57.0
— | DBN+SVM | 50.5 (LOSO CV) | 53.0 | 48.0
— | DBN+SVM(TL) | 55.5 (LOSO CV) | 60.0 | 51.0
— | Autoencoder+SVM | 67.5 (LOSO CV) | 65.0 | 70.0
— | Autoencoder+SVM(TL) | 72.5 (LOSO CV) | 75.0 | 70.0
Proposed algorithm | PD_CSTLOK&S2 | 95.0 (LOSO CV) | 90.0 | 100.0

To fully illustrate the advantages of the proposed algorithm, the transfer learning (denoted as "algorithm (TL)") and nontransfer-learning versions of the DBN and the autoencoder are also compared, denoted as DBN+SVM, DBN+SVM(TL), Autoencoder+SVM and Autoencoder+SVM(TL), respectively. For the transfer learning versions, the autoencoder and DBN are used for unsupervised learning of the source dataset, and the networks are trained to represent the features of the target dataset; then, one hidden layer of the networks is extracted to construct feature sets. Finally, an SVM is used for classification. The source domain dataset is not used in the nontransfer learning versions.

The main reasons for choosing to compare the algorithms listed above are as follows: 1) deep learning is a popular method for feature learning and classification, so the representative DBN, CNN and autoencoder methods are chosen; 2) the transfer learning versions of the DBN and autoencoder take advantage of the source domain dataset, so they are more compelling comparison baselines.

As shown in Table II, the average accuracy rate of PD_CSTLOK&S2 reached 95%, achieving better results than the other state-of-the-art methods.

Table III presents the classification and comparison results of this algorithm on the MaxLittle dataset. The algorithm proposed in this paper is compared with the other representative algorithms on this dataset. In addition, the proposed algorithm is compared with the most relevant algorithms, including the SVM with linear and radial kernels, DBN, CNN and the deep autoencoder algorithm.

TABLE III
COMPARISON OF THE CLASSIFICATION RESULTS OF THE PROPOSED ALGORITHM (MAXLITTLE DATASET)

Study | Method | Accuracy (%) | Sensitivity (%) | Specificity (%)
Little et al. [8] | Preselection filter + exhaustive search + SVM | 91.40 (bootstrap with 50 replicates) | — | —
Shahbaba and Neal [59] | Dirichlet process mixtures | 87.7 (5-fold CV) | — | —
Psorakis et al. [60] | mRVMs | 89.47 (10-fold CV) | — | —
Guo et al. [30] | GA-EM | 93.10 (10-fold CV) | — | —
Sakar and Kursun [28] | Mutual information + SVM | 92.75 (bootstrap with 50 replicates) | — | —
Das [26] | ANN decision tree | 92.90 (Holdout) | — | —
Ozcift and Gulten [61] | Correlation-based feature selection + rotation forest | 87.10 (10-fold CV) | — | —
Luukka [45] | Fuzzy entropy measures + similarity | 85.03 (Holdout) | — | —
Li et al. [46] | Fuzzy-based nonlinear transformation + SVM | 93.47 (Holdout) | — | —
Spadoto et al. [31] | PSO+OPF, harmony search+OPF, gravitational search+OPF | 84.01 (Holdout) | — | —
Polat [34] | FCMFW+KNN | 97.93 (Holdout) | — | —
Chen et al. [21] | PCA-fuzzy KNN | 96.07 (10-fold CV) | — | —
Al-Fatlawi et al. [42] | DBN | 94 (Holdout) | — | —
Åström and Koker [27] | Parallel ANN | 91.20 (Holdout) | 90.5 | 93.0
Daliri [62] | SVM with Chi-square distance kernel | 91.20 (Holdout) | 91.71 | 89.92
Zuo et al. [32] | PSO-fuzzy KNN | 97.47 (10-fold CV) | 98.16 | 96.57
Kadam and Jadhav [43] | FESA-DNN | 93.84 (10-fold CV) | 95.23 | 90.00
Ma et al. [63] | SVM-RFE | 96.29 (10-fold CV) | 95.00 | 97.50
Cai et al. [37] | RF-BFO-SVM | 97.42 (10-fold CV) | 99.29 | 91.5
Dash et al. [64] | ECFA-SVM | 97.95 (10-fold CV) | 97.90 | —
Gürüler [44] | KMCFW-CVANN | 99.52 (10-fold CV) | 100 | 99.47
— | SVM (linear kernel) | 75.0 (LOSO) | 100.0 | 0.0
— | SVM (RBF kernel) | 75.0 (LOSO) | 100.0 | 0.0
Proposed algorithm | PD_CSTLOK&S2 | 100.0 (LOSO) | 100.0 | 100.0

As shown in Table III, the compared methods on the MaxLittle dataset are based on holdout CV and 10-fold CV. Holdout CV is more contingent, and even when 10-fold CV is adopted, there is still no deliberate effort to avoid training samples and test samples coming from the same subject. Therefore, the accuracies of these methods are perhaps higher than they would be in practice. As Table III shows, even under LOSO CV, the proposed algorithm still achieves the best performance.

It can also be found from Table IV that the proposed algorithm achieves the best results on the DNSH dataset. Outperforming the SVM, the average classification accuracy of the proposed algorithm reaches 86.7%, proving that it is quite effective even on the DNSH dataset.

TABLE IV
COMPARISON OF THE CLASSIFICATION RESULTS OF THE PROPOSED ALGORITHM (DNSH DATASET)

Study | Method | Accuracy (%) | Sensitivity (%) | Specificity (%)
— | SVM (linear kernel) | 61.3 | 0.0 | 90.5
— | SVM (RBF kernel) | 67.7 | 0.0 | 100.0
Proposed algorithm | PD_CSTLOK&S2 | 86.7 | 100.0 | 80.0

E. Time Complexity Analysis

The time cost of the proposed algorithm is another important aspect of its practical feasibility. Fig. 4 shows the time cost of feature extraction from one subject of the MaxLittle dataset. The abscissa is the number of iterations, and the ordinate is the time cost. As the number of iterations increases, the time cost of the proposed method increases less rapidly than does that of the fast convolutional sparse coding method in [65]. This result means that the engineering complexity of the proposed algorithm is acceptable.

Table V shows the time cost of the proposed algorithm based on the number of convolution kernels. The kernel training time was ignored because the optimal kernel is fixed.

As Table V shows, the time cost increases only slowly as the number of convolution kernels increases. On the MaxLittle and DNSH datasets, the time cost increases slowly with the number of kernels; on the Sakar dataset, the subject data size ($H_0 \times N$) is larger, so the number of kernels is not the key factor in the computational complexity, and the time cost does not strictly increase. Even on the DNSH dataset, the minimum time cost is 2.93 s and the maximum is 3.83 s, a difference of only 0.90 s, indicating that the time cost is acceptable even with large numbers of kernels.

Fig. 4. Time cost of feature extraction from one subject of the MaxLittle dataset.

TABLE V
THE TRAINING TIME COST OF THE PROPOSED ALGORITHM BASED ON THE NUMBER OF CONVOLUTION KERNELS

Convolution kernel num | Sakar time cost (s) | MaxLittle time cost (s) | DNSH time cost (s)
2 | 27.46 | 1.31 | 2.93
3 | 28.43 | 1.33 | 3.05
4 | 28.94 | 1.34 | 3.24
5 | 28.83 | 1.37 | 3.27
6 | 29.06 | 1.37 | 3.40
7 | 29.03 | 1.43 | 3.57
8 | 28.48 | 1.44 | 3.83

IV. DISCUSSION AND CONCLUSIONS

The current PD diagnostic methods based on speech data are primarily based on local PD data; however, the problem of insufficient samples prevents the methods from improving their classification accuracy. To solve this problem, a new method is proposed in this paper: a convolution sparse learning algorithm is adopted to reduce the dimensions and the demand for larger numbers of samples, based on sparse transfer learning from public datasets and kernel optimization, and mixed speech feature learning is used to optimize the sample features and improve the classification accuracy. In this study, tens of representative algorithms were applied to verify the performance and for comparison with the proposed algorithm. The experimental results show that the innovative parts of the proposed algorithm, including convolution sparse transfer learning, kernel optimization and parallel selection, are effective. Compared with other state-of-the-art algorithms, the proposed algorithm achieves significant improvements in terms of classification accuracy, sensitivity and specificity. In addition, the time complexity of the proposed algorithm is acceptable.

The main contributions and innovations of this paper are as follows: 1) convolution sparse coding and transfer learning are combined to solve the small sample problem that exists with currently available PD speech data; 2) a parallel sample/feature selection mechanism is designed to follow the convolution sparse coding and transfer learning operations to further improve the quality of the samples and features; 3) a convolution kernel optimization mechanism is proposed to enhance the current CSC algorithm; and 4) three representative PD speech datasets are considered to study and verify the classification algorithm.

Because the available PD training datasets have small sample sizes, a contradiction between limited samples and reliable generalization always exists. Based on the research results of this paper, the next steps are to conduct further studies regarding the size and type of public speech datasets and to investigate various domain adaptation architectures.

ACKNOWLEDGMENT

We are grateful for the support of the National Natural Science Foundation of China (NSFC, No. 61771080); the Fundamental Research Funds for the Central Universities (2019CDCGTX306, 2019CDQYTX019); the Basic and Advanced Research Project in Chongqing (cstc2018jcyjAX0779, cstc2018jcyjA3022); and the Chongqing Social Science Planning Project (2018YBYY133).

REFERENCES

[1] J. F. Benge, R. L. Roberts, Z. Kekecs, and G. Elkins, "Brief report: Knowledge of, interest in, and willingness to try behavioral interventions in individuals with Parkinson's disease," Advances Mind-Body Medicine, vol. 32, no. 1, pp. 8–12, Feb. 2018.
[2] A. O. A. Plouvier, T. C. O. Hartman, A. van Litsenburg, B. R. Bloem, C. van Weel, and A. L. M. Lagro-Janssen, "Being in control of Parkinson's disease: A qualitative study of community-dwelling patients' coping with changes in care," Eur. J. General Pract., vol. 24, no. 1, pp. 138–145, Mar. 2018, doi: 10.1080/13814788.2018.1447561.
[3] Y. F. Chiu and K. Forrest, "The impact of lexical characteristics and noise on intelligibility of parkinsonian speech," J. Speech Lang. Hearing Res., vol. 61, no. 4, pp. 837–846, Apr. 2018, doi: 10.1044/2017_jslhr-s-17-0205.
[4] M. Pettorino, M. G. Busà, and E. Pellegrino, "Speech rhythm in Parkinson's disease: A study on Italian," in Proc. Annu. Conf. Int. Speech Commun. Assoc., San Francisco, USA, 2016, pp. 1958–1961.
[5] Z. Galaz et al., "Prosodic analysis of neutral, stress-modified and rhymed speech in patients with Parkinson's disease," Comput. Methods Programs Biomedicine, vol. 127, pp. 301–317, Apr. 2016, doi: 10.1016/j.cmpb.2015.12.011.
[6] H. Byeon, H. Jin, and S. Cho, "Characteristics of hypokinetic dysarthria patients' speech based on sustained vowel phonation and connected speech," Int. J. u- e- Service Sci. Technol., vol. 9, no. 10, pp. 417–422, Sep. 2016, doi: 10.14257/ijunesst.2016.9.10.37.
[7] B. Bigi, K. Klessa, L. Georgeton, and C. Meunier, "A syllable-based analysis of speech temporal organization: A comparison between speaking styles in dysarthric and healthy populations," in Proc. Annu. Conf. Int. Speech Commun. Assoc., Dresden, Germany, 2015, pp. 2977–2981.
[8] M. A. Little, P. E. McSharry, S. J. Roberts, D. A. E. Costello, and I. M. Moroz, "Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection," BioMed. Eng. OnLine, vol. 6, no. 1, pp. 23–41, Jun. 2007, doi: 10.1186/1475-925X-6-23.
[9] A. Tsanas, M. A. Little, P. E. McSharry, J. Spielman, and L. O. Ramig, "Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease," IEEE Trans. Biomed. Eng., vol. 59, no. 5, pp. 1264–1271, Jan. 2012, doi: 10.1109/TBME.2012.2183367.
[10] S. Lahmiri, D. A. Dawson, and A. Shmuel, "Performance of machine learning methods in diagnosing Parkinson's disease based on dysphonia measures," Biomed. Eng. Lett., vol. 8, no. 1, pp. 29–39, Feb. 2018, doi: 10.1007/s13534-017-0051-2.
[11] J. I. Godino-Llorente, S. Shattuck-Hufnagel, J. Y. Choi, L. Moro-Velázquez, and J. A. Gómez-García, "Towards the identification of idiopathic Parkinson's disease from the speech. New articulatory kinetic biomarkers," PLoS One, vol. 12, no. 12, p. e0189583, Dec. 2017, doi: 10.1371/journal.pone.0189583.
[12] M. Kaya and H. Ş. Bilge, "Classification of Parkinson speech data by metric learning," in 2017 Int. Artif. Intell. Data Process. Symp. (IDAP), Malatya, Turkey, 2017, pp. 1–5.
[13] J. Mekyska et al., "Perceptual features as markers of Parkinson's disease: The issue of clinical interpretability," in Recent Advances in Nonlinear Speech Processing, A. Esposito et al., Eds. Cham, Switzerland: Springer International Publishing, 2016, pp. 83–91.
[14] A. Kacha, C. Mertens, F. Grenez, S. Skodda, and J. Schoentgen, "On the harmonic-to-noise ratio as an acoustic cue of vocal timbre of Parkinson speakers," Biomed. Signal Process. Control, vol. 37, pp. 32–38, Aug. 2017, doi: 10.1016/j.bspc.2016.09.004.
[15] B. E. Sakar et al., "Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings," IEEE J. Biomed. Health Inform., vol. 17, no. 4, pp. 828–834, Feb. 2013, doi: 10.1109/JBHI.2013.2245674.
[16] L. Jeancolas et al., "Automatic detection of early stages of Parkinson's disease through acoustic voice analysis with mel-frequency cepstral coefficients," in 2017 Int. Conf. Adv. Technologies Signal Image Process. (ATSIP), Fez, Morocco, 2017, pp. 1–6.
[17] A. Benba, A. Jilbab, and A. Hammouch, "Analysis of multiple types of voice recordings in cepstral domain using MFCC for discriminating between patients with Parkinson's disease and healthy people," Int. J. Speech Technol., vol. 19, no. 3, pp. 449–456, Sep. 2016, doi: 10.1007/s10772-016-9338-4.
[18] P. Gómez-Vilda, D. Palacios-Alonso, V. Rodellar-Biarge, A. Álvarez-Marquina, V. Nieto-Lluis, and R. Martínez-Olalla, "Parkinson's disease monitoring by biomechanical instability of phonation," Neurocomputing, vol. 255, pp. 3–16, Sep. 2017, doi: 10.1016/j.neucom.2016.06.092.
[19] W. Ji and Y. Li, "Stable dysphonia measures selection for Parkinson speech rehabilitation via diversity regularized ensemble," in 2016 IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), Shanghai, China, 2016, pp. 2264–2268.


[20] J. Kim et al., "Automatic estimation of Parkinson's diseaseseverity from diverse speech tasks," in Proc. Annu. Conf. Int.Speech Commun. Assoc., Dresden, Germany, 2015, pp.914–918.

[21] H.-L. Chen et al., "An efficient diagnosis system fordetection of Parkinson’s disease using fuzzy k-nearestneighbor approach," Expert Syst. Appl., vol. 40, no. 1, pp.263–271, Jan. 2013, doi: 10.1016/j.eswa.2012.07.014.

[22] Y. Zhang, "Can a smartphone diagnose Parkinson disease? Adeep neural network method and telediagnosis systemimplementation," Parkinson’s Dis., vol. 2017, p. 6209703,Aug. 2017, doi: 10.1155/2017/6209703.

[23] L. Ali, C. Zhu, C. Zhang, and Y. Liu, "Automated detectionof Parkinson’s disease based on multiple types of sustainedphonations using linear discriminant analysis and geneticallyoptimized neural network," IEEE J. Translational Eng.Health Medicine, vol. 7, pp. 1–10, Oct. 2019, doi:10.1109/JTEHM.2019.2940900.

[24] M. Shahbakhti, D. Taherifar, and A. Sorouri, "Linear andnon-linear speech features for detection of Parkinson'sdisease," in The 6th 2013 Biomed. Eng. Int. Conf., AmphurMuang, Thailand, 2013, pp. 1–3.

[25] H. Dubey, J. C. Goldberg, M. Abtahi, L. Mahler, and K.Mankodiya, "EchoWear: Smartwatch technology for voiceand speech treatments of patients with Parkinson's disease,"in Proc. Conf. ACM Wireless Health, Maryland, USA, 2015,pp. 1–8.

[26] R. Das, "A comparison of multiple classification methods fordiagnosis of Parkinson disease," Expert Syst. Appl., vol. 37,no. 2, pp. 1568–1572, Mar. 2010, doi:10.1016/j.eswa.2009.06.040.

[27] F. Åström and R. Koker, "A parallel neural networkapproach to prediction of Parkinson’s disease," Expert Syst.Appl., vol. 38, no. 10, pp. 12470–12474, Sept. 2011, doi:10.1016/j.eswa.2011.04.028.

[28] C. O. Sakar and O. Kursun, "Telediagnosis of Parkinson’sdisease using measurements of dysphonia," J. Med. Syst., vol.34, no. 4, pp. 591–599, Aug. 2010, doi:10.1007/s10916-009-9272-y.

[29] T. Arias-Vergara, J. C. Vasquez-Correa, J. R.Orozco-Arroyave, J. F. Vargas-Bonilla, and E. Nöth,"Parkinson's disease progression assessment from speechusing GMM-UBM," in Proc. Annu. Conf. Int. SpeechCommun. Assoc., San Francisco, USA, 2016, pp.1933–1937.

[30] P.-F. Guo, P. Bhattacharya, and N. Kharma, "Advances inDetecting Parkinson’s Disease," in Med. Biometrics Vol.6165 Lecture Notes Comput. Sci., Berlin, Heidelberg, 2010,pp. 306–314.

[31] A. A. Spadoto, R. C. Guido, F. L. Carnevali, A. F. Pagnin, A.X. Falcão, and J. P. Papa, "Improving Parkinson's diseaseidentification through evolutionary-based feature selection,"in 2011 Annu. Int. Conf. IEEE Eng. Medicine Biol. Soc.,Boston, MA, 2011, pp. 7857–7860.

[32] W.-L. Zuo, Z.-Y. Wang, T. Liu, and H.-L. Chen, "Effectivedetection of Parkinson's disease using an adaptive fuzzyk-nearest neighbor approach," Biomed. Signal Process.Control, vol. 8, no. 4, pp. 364–373, Jul. 2013, doi:10.1016/j.bspc.2013.02.006.

[33] Y. Li, C. Zhang, Y. Jia, P. Wang, X. Zhang, and T. Xie,"Simultaneous learning of speech feature and segment forclassification of Parkinson disease," in 2017 IEEE 19th Int.Conf. e-Health Netw. Appl. Services (Healthcom), Dalian,China, 2017, pp. 1–6.

[34] K. Polat, "Classification of Parkinson's disease using featureweighting method on the basis of fuzzy C-means clustering,"Int. J. Syst. Sci., vol. 43, no. 4, pp. 597–609, Apr. 2012, doi:10.1080/00207721.2011.581395.

[35] H.-H. Zhang et al., "Classification of Parkinson’s diseaseutilizing multi-edit nearest-neighbor and ensemble learningalgorithms with speech samples," BioMed. Eng. OnLine, vol.15, no. 1, pp. 122–143, Nov. 2016, doi:10.1186/s12938-016-0242-6.

[36] M. Vadovský and J. Paralič, "Parkinson's disease patientsclassification based on the speech signals," in 2017 IEEE

15th Int. Symp. Appl. Mach. Intell. Inform. (SAMI), Herl'any,Slovakia, 2017, pp. 321–326.

[37] Z. Cai, J. Gu, and H. Chen, "A new hybrid intelligentframework for predicting Parkinson’s disease," IEEE Access,vol. 5, pp. 17188–17200, Aug. 2017, doi:10.1109/ACCESS.2017.2741521.

[38] Ö. Eskıdere, A. Karatutlu, and C. Ünal, "Detection ofParkinson's disease from vocal features using randomsubspace classifier ensemble," in 2015 Twelve Int. Conf.Electron. Comput. Comput. (ICECCO), Almaty, Kazakhstan,2015, pp. 1–4.

[39] M. Behroozi and A. Sami, "A multiple-classifier framework for Parkinson's disease detection based on various vocal tests," Int. J. Telemedicine Appl., vol. 2016, p. 6837498, Mar. 2016, doi: 10.1155/2016/6837498.

[40] M. M. Khan, A. Mendes, and S. K. Chalup, "Evolutionary wavelet neural network ensembles for breast cancer and Parkinson's disease prediction," PLoS One, vol. 13, no. 2, p. e0192192, Feb. 2018, doi: 10.1371/journal.pone.0192192.

[41] O. Geman, "Data processing for Parkinson's disease: Tremor, speech and gait signal analysis," in E-Health Bioengineering Conf. (EHB 2011), Iasi, Romania, 2011, pp. 1–4.

[42] A. H. Al-Fatlawi, M. H. Jabardi, and S. H. Ling, "Efficient diagnosis system for Parkinson's disease using deep belief network," in 2016 IEEE Congr. Evol. Comput. (CEC), Vancouver, BC, 2016, pp. 1324–1330.

[43] V. J. Kadam and S. M. Jadhav, "Feature ensemble learning based on sparse autoencoders for diagnosis of Parkinson's disease," in Comput. Commun. Signal Process., Singapore, 2019, pp. 567–581.

[44] H. Gürüler, "A novel diagnosis system for Parkinson's disease using complex-valued artificial neural network with k-means clustering feature weighting method," Neural Comput. Appl., vol. 28, no. 7, pp. 1657–1666, Jul. 2017, doi: 10.1007/s00521-015-2142-2.

[45] P. Luukka, "Feature selection using fuzzy entropy measures with similarity classifier," Expert Syst. Appl., vol. 38, no. 4, pp. 4600–4607, Apr. 2011, doi: 10.1016/j.eswa.2010.09.133.

[46] D.-C. Li, C.-W. Liu, and S. C. Hu, "A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets," Artif. Intell. Medicine, vol. 52, no. 1, pp. 45–52, May 2011, doi: 10.1016/j.artmed.2011.02.001.

[47] Y. Li et al., "Research on diagnosis algorithm of Parkinson's disease based on speech sample multi-edit and random forest," J. Biomed. Eng., vol. 33, no. 6, pp. 1053–1059, Dec. 2016.

[48] X. Zhang et al., "Combining speech sample and feature bilateral selection algorithm for classification of Parkinson's disease," J. Biomed. Eng., vol. 34, no. 6, pp. 942–948, Feb. 2018, doi: 10.7507/1001-5515.201704061.

[49] H. Zhang and V. M. Patel, "Convolutional sparse and low-rank coding-based image decomposition," IEEE Trans. Image Process., vol. 27, no. 5, pp. 2121–2133, Dec. 2018, doi: 10.1109/TIP.2017.2786469.

[50] X. Hu, F. Heide, Q. Dai, and G. Wetzstein, "Convolutional sparse coding for RGB+NIR imaging," IEEE Trans. Image Process., vol. 27, no. 4, pp. 1611–1625, Dec. 2018, doi: 10.1109/TIP.2017.2781303.

[51] B. Wohlberg, "Efficient algorithms for convolutional sparse representations," IEEE Trans. Image Process., vol. 25, no. 1, pp. 301–315, Oct. 2016, doi: 10.1109/TIP.2015.2495260.

[52] H. Chang, J. Han, C. Zhong, A. M. Snijders, and J. Mao, "Unsupervised transfer learning via multi-scale convolutional sparse coding for biomedical applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 5, pp. 1182–1194, Jan. 2018, doi: 10.1109/TPAMI.2017.2656884.

[53] M. Šorel and F. Šroubek, "Fast convolutional sparse coding using matrix inversion lemma," Digital Signal Process., vol. 55, pp. 44–51, Aug. 2016, doi: 10.1016/j.dsp.2016.04.012.

[54] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, Jan. 2011.


[55] X. Cai, G. Gu, B. He, and X. Yuan, "A proximal point algorithm revisit on the alternating direction method of multipliers," Sci. China Math., vol. 56, no. 10, pp. 2179–2186, Oct. 2013, doi: 10.1007/s11425-013-4683-0.

[56] İ. Cantürk and F. Karabiber, "A machine learning system for the diagnosis of Parkinson's disease from speech signals and its application to multiple speech signal types," Arabian J. Sci. Eng., vol. 41, no. 12, pp. 5049–5059, Dec. 2016, doi: 10.1007/s13369-016-2206-3.

[57] A. Benba, A. Jilbab, and A. Hammouch, "Using human factor cepstral coefficient on multiple types of voice recordings for detecting patients with Parkinson's disease," IRBM, vol. 38, no. 6, pp. 346–351, Nov. 2017, doi: 10.1016/j.irbm.2017.10.002.

[58] P. Kraipeerapun and S. Amornsamankul, "Using stacked generalization and complementary neural networks to predict Parkinson's disease," in 2015 11th Int. Conf. Natural Comput. (ICNC), Zhangjiajie, China, 2015, pp. 1290–1294.

[59] B. Shahbaba and R. Neal, "Nonlinear models using Dirichlet process mixtures," J. Mach. Learn. Res., vol. 10, pp. 1829–1850, Dec. 2009.

[60] I. Psorakis, T. Damoulas, and M. A. Girolami, "Multiclass relevance vector machines: Sparsity and accuracy," IEEE Trans. Neural Netw., vol. 21, no. 10, pp. 1588–1598, Aug. 2010, doi: 10.1109/TNN.2010.2064787.

[61] A. Ozcift and A. Gulten, "Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms," Comput. Methods Programs Biomedicine, vol. 104, no. 3, pp. 443–451, Dec. 2011, doi: 10.1016/j.cmpb.2011.03.018.

[62] M. R. Daliri, "Chi-square distance kernel of the gaits for the diagnosis of Parkinson's disease," Biomed. Signal Process. Control, vol. 8, no. 1, pp. 66–70, Jan. 2013, doi: 10.1016/j.bspc.2012.04.007.

[63] H. Ma, T. Tan, H. Zhou, and T. Gao, "Support vector machine-recursive feature elimination for the diagnosis of Parkinson disease based on speech analysis," in 2016 Seventh Int. Conf. Intell. Control Inf. Process. (ICICIP), Siem Reap, Cambodia, 2017, pp. 34–40.

[64] S. Dash, R. Thulasiram, and P. Thulasiraman, "An enhanced chaos-based firefly model for Parkinson's disease diagnosis and classification," in 2017 Int. Conf. Inf. Technol. (ICIT), Bhubaneswar, India, 2017, pp. 159–164.

[65] H. Bristow, A. Eriksson, and S. Lucey, "Fast convolutional sparse coding," in 2013 IEEE Conf. Comput. Vision Pattern Recognit., Portland, OR, 2013, pp. 391–398.

Xiaoheng Zhang received an M.S. degree in Circuits and Systems from Chongqing University, China, in 2007. He is an engineer with many years of engineering experience in signal processing. He is currently an associate professor with Chongqing Radio and TV University. His current research interests include speech signal processing, pattern recognition and machine learning.

Yongming Li received M.S. and Ph.D. degrees in Circuits and Systems from Chongqing University, China, in 2003 and 2007, respectively. He was a visiting scholar at Pennsylvania State University, USA, from 2008 to 2009, and worked as a postdoctoral fellow at Carnegie Mellon University, USA, from Feb. to Dec. 2009. He is currently a professor with the School of Microelectronics and Communication Engineering, Chongqing University, and serves as an editorial board member for IEEE Access. His current research interests include signal processing, pattern recognition, data mining, and machine learning.

Pin Wang received an M.S. degree from Chongqing University, China, in 2003 and a Ph.D. degree from Nanyang Technological University, Singapore, in 2008. She worked as a postdoctoral fellow at Nanyang Technological University, Singapore, from 2008 to 2009. In 2009–2010, she was funded by the National Energy Laboratory Postdoctoral Fund to conduct postdoctoral research at the University of Pittsburgh. She is currently an associate professor at Chongqing University. Her current research interests include hyperspectral imaging and detection, intelligent information processing, and big data analysis.

Xiaoheng Tan received the B.E. and Ph.D. degrees in electrical engineering from Chongqing University, Chongqing, China, in 1998 and 2003, respectively. He was a visiting scholar with the University of Queensland, Brisbane, Qld., Australia, from 2008 to 2009. He is currently a professor with the School of Microelectronics and Communication Engineering, Chongqing University. His current research interests include modern communications technologies and systems, communications signal processing, pattern recognition, and machine learning.

Yuchuan Liu received a bachelor's degree in communication engineering from the Southwest University of Science and Technology, China, in 2015. He is currently pursuing a Ph.D. degree with Chongqing University, Chongqing, China. His research interests include dimensionality reduction, pattern recognition, and machine learning.