Hindawi Publishing Corporation
Advances in Bioinformatics
Volume 2013, Article ID 171530, 10 pages
http://dx.doi.org/10.1155/2013/171530

Research Article

Spectral Analysis on Time-Course Expression Data: Detecting Periodic Genes Using a Real-Valued Iterative Adaptive Approach

Kwadwo S. Agyepong,1 Fang-Han Hsu,1 Edward R. Dougherty,1,2 and Erchin Serpedin1

1 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128, USA
2 Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004-2101, USA

Correspondence should be addressed to Erchin Serpedin; serpedin@ece.tamu.edu

Received 26 October 2012; Accepted 23 January 2013

Academic Editor: Mohamed Nounou

Copyright © 2013 Kwadwo S. Agyepong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Time-course expression profiles and methods for spectrum analysis have been applied for detecting transcriptional periodicities, which are valuable patterns to unravel genes associated with cell cycle and circadian rhythm regulation. However, most of the proposed methods suffer from restrictions and large false positives to a certain extent. Additionally, in some experiments, arbitrarily irregular sampling times as well as the presence of high noise and small sample sizes make accurate detection a challenging task. A novel scheme for detecting periodicities in time-course expression data is proposed, in which a real-valued iterative adaptive approach (RIAA), originally proposed for signal processing, is applied for periodogram estimation. The inferred spectrum is then analyzed using Fisher's hypothesis test. With a proper $p$-value threshold, periodic genes can be detected. A periodic signal, two nonperiodic signals, and four sampling strategies were considered in the simulations, including both bursts and drops. In addition, two real yeast datasets were used for validation. The simulations and real data analysis reveal that RIAA can perform competitively with the existing algorithms. The advantage of RIAA is manifested when the expression data are highly irregularly sampled and when the number of cycles covered by the sampling time points is very small.

1. Introduction

Patterns of periodic gene expression have been found to be associated with essential biological processes such as cell cycle and circadian rhythm [1], and the detection of periodic genes is crucial to advance our understanding of gene function, disease pathways, and, ultimately, therapeutic solutions. Using high-throughput technologies such as microarrays, gene expression profiles at discrete time points can be derived, and hundreds of cell cycle regulated genes have been reported in a variety of species. For example, Spellman et al. applied cell synchronization methods and conducted time-course gene expression experiments on Saccharomyces cerevisiae [2]. The authors identified 800 cell cycle regulated genes using DNA microarrays. Also, Rustici et al. and Menges et al. identified 407 and about 500 cell cycle regulated genes in Schizosaccharomyces pombe and Arabidopsis, respectively [3, 4].

Signal processing in the frequency domain simplifies the analysis, and an increasing number of studies have demonstrated the power of spectrum analysis in the detection of periodic genes. Considering the common issues of missing values and noise in microarray experiments, Ahdesmäki et al. proposed a robust detection method incorporating the fast Fourier transform (FFT) with a series of data preprocessing and hypothesis testing steps [5]. Two years later, the authors further proposed a modified version for expression data with unevenly spaced time intervals [6]. A Lomb-Scargle (LS) approach, originally used for finding periodicities in astrophysics, was developed for expression data with uneven sampling [7]. Yang et al. further improved the performance using a detrended fluctuation analysis [8]. It used harmonic regression in the time domain for significance evaluation. The method was termed "Lomb-Scargle periodogram and harmonic regression (LSPR)." Basically, these methods consist of two steps: transferring the signals into the frequency
(spectral) domain and then applying a significance evaluation test for the resulting peak in the spectral density.

While numerous methods have been developed for detecting periodicities in gene expression, most of these methods suffer from false positive errors and working restrictions to a certain extent, particularly when the time-course data contain limited time points. In addition, no algorithm seems available to resolve all of these challenges. Microarray as well as other high-throughput experiments, due to high manufacturing and preparation costs, have common characteristics of small sample size [9], noisy measurements [10], and arbitrary sampling strategies [11], thereby making the detection of periodicities highly challenging. Since the number and functions of cell cycle regulated genes, or periodic genes, remain greatly uncertain, advances in detection algorithms are urgently needed.

Recently, Stoica et al. developed a novel nonparametric method, termed the "real-valued iterative adaptive approach (RIAA)," specifically for spectral analysis with nonuniformly sampled data [12]. As stated by the authors, RIAA, an iteratively weighted least-squares periodogram, can provide robust spectral estimates and is most suitable for sinusoidal signals. These characteristics of RIAA inspired us to apply it to time-course gene expression data and to examine its performance. Herein, we incorporate RIAA with Fisher's statistic to detect transcriptional periodicities. A rigorous comparison of RIAA with several aforementioned algorithms in terms of sensitivity and specificity is conducted through simulations, and the results of a real data analysis are also provided.

In this study, we found that the RIAA algorithm can provide robust spectral estimates for the detection of periodic genes regardless of the sampling strategies adopted in the experiments or the nonperiodic nature of noise present in the measurement process. We show through simulations that RIAA can outperform the existing algorithms, particularly when the data are highly irregularly sampled and when the number of cycles covered by the sampling time points is very small. These characteristics of RIAA fit the needs of time-course gene expression data analysis. This paper is organized as follows. In Section 2, we begin with an overview of RIAA. In Section 3, a scheme for detecting periodicities is proposed, and simulation models for performance evaluation and a real data analysis for validation purposes are presented. A complete investigation of the performance of RIAA and a rigorous comparison with other algorithms are provided in Section 4.

2. RIAA Algorithm

RIAA is an iterative algorithm developed for finding the least-squares periodogram with the utilization of a weighted function. The essential mathematics involved in RIAA is introduced in this section, with the algorithm input being time-course expression data; for more details regarding RIAA, the readers are encouraged to check the original paper by Stoica et al. [12].

2.1. Basics. Suppose that the signals associated with the periodic gene expressions are composed of noise and sinusoidal components. Let $y_h(t_i)$, $i = 1, \ldots, n$, denote the time-course expression ratios of gene $h$ at instances $t_1, \ldots, t_n$, respectively; the $y_h(t_i)$ are real numbers with $\sum_{i=1}^{n} y_h(t_i) = 0$. The least-squares periodogram $\Phi_{lsp}$ is given by

$$\Phi_{lsp} = \left|\hat{\alpha}(\omega)\right|^{2}, \tag{1}$$

where $\hat{\alpha}(\omega)$ is the solution to the following fitting problem:

$$\hat{\alpha}(\omega) = \arg\min_{\alpha(\omega)} \sum_{i=1}^{n} \left| y_h(t_i) - \alpha(\omega) e^{j\omega t_i} \right|^{2}. \tag{2}$$

Let $\alpha(\omega) = |\alpha(\omega)| e^{j\phi(\omega)} = \beta e^{j\theta}$, where $\beta = |\alpha(\omega)| \ge 0$ and $\theta = \phi(\omega) \in [0, 2\pi]$ refer to the amplitude and phase of $\alpha(\omega)$, respectively. The criterion in (2) can then be rewritten as

$$\sum_{i=1}^{n} \left[ y_h(t_i) - \beta \cos(\omega t_i + \theta) \right]^{2} + \beta^{2} \sum_{i=1}^{n} \sin^{2}(\omega t_i + \theta). \tag{3}$$

The second term in the above equation is data independent and can be omitted from the minimization operation. Hence, the criterion (2) is simplified to

$$(\hat{\beta}, \hat{\theta}) = \arg\min_{\beta, \theta} \sum_{i=1}^{n} \left[ y_h(t_i) - \beta \cos(\omega t_i + \theta) \right]^{2}. \tag{4}$$

We further apply $a = \beta \cos(\theta)$ and $b = -\beta \sin(\theta)$ and derive an equivalent of (4) as follows:

$$(\hat{a}, \hat{b}) = \arg\min_{a, b} \sum_{i=1}^{n} \left[ y_h(t_i) - a \cos(\omega t_i) - b \sin(\omega t_i) \right]^{2}. \tag{5}$$

The target of interest to the fitting problem now becomes $\hat{a}$ and $\hat{b}$ (instead of $\hat{\alpha}(\omega)$), and the solution is well known to be

$$\begin{bmatrix} \hat{a} \\ \hat{b} \end{bmatrix} = \mathbf{R}^{-1} \mathbf{r}, \tag{6}$$

where

$$\mathbf{R} = \sum_{i=1}^{n} \begin{bmatrix} \cos^{2}(\omega t_i) & \cos(\omega t_i) \sin(\omega t_i) \\ \sin(\omega t_i) \cos(\omega t_i) & \sin^{2}(\omega t_i) \end{bmatrix}, \qquad \mathbf{r} = \sum_{i=1}^{n} \begin{bmatrix} \cos(\omega t_i) \\ \sin(\omega t_i) \end{bmatrix} y_h(t_i). \tag{7}$$

After $\hat{a}$ and $\hat{b}$ are estimated, the least-squares periodogram can be derived.
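As an illustration of (6) and (7), the following is a minimal sketch of the least-squares fit at a single angular frequency. The function name `ls_fit` is ours and does not come from the paper; the 2 x 2 system is inverted in closed form, which assumes that R is nonsingular.

```python
import math

def ls_fit(t, y, omega):
    """Return (a_hat, b_hat) solving R [a, b]^T = r, as in (6)-(7)."""
    cc = cs = ss = cy = sy = 0.0
    for ti, yi in zip(t, y):
        c, s = math.cos(omega * ti), math.sin(omega * ti)
        cc += c * c      # R[0][0]
        cs += c * s      # R[0][1] = R[1][0]
        ss += s * s      # R[1][1]
        cy += c * yi     # r[0]
        sy += s * yi     # r[1]
    det = cc * ss - cs * cs          # assumed nonzero
    a_hat = (ss * cy - cs * sy) / det
    b_hat = (cc * sy - cs * cy) / det
    return a_hat, b_hat
```

For a noise-free input of the form a cos(ωt) + b sin(ωt) evaluated at the same ω, the fit returns a and b up to rounding error; the amplitude and phase in (4) follow from a = β cos(θ) and b = -β sin(θ).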

2.2. Observation Interval and Resolution. Prior to implementation of RIAA for periodogram estimation, the observation interval $[0, \omega_{\max}]$ and the resolution in terms of grid size have to be selected. To this end, the maximum frequency $\omega_{\max}$ in the observation interval without aliasing errors for sampling instances $t_1, \ldots, t_n$ can be evaluated by

$$\omega_{\max} = \frac{\omega_0}{2}, \tag{8}$$

where $\omega_0$ is given by

$$\omega_0 = \frac{2 (n - 1) \pi}{\sum_{i=1}^{n-1} (t_{i+1} - t_i)}. \tag{9}$$

The observation interval $[0, \omega_{\max}]$ is hence chosen after $\omega_{\max}$ is obtained.

To ensure that the smallest frequency separation in time-course expression data with regular or irregular sampling can be adequately detected, the grid size $\Delta\omega$ is chosen to be

$$\Delta\omega = \frac{2\pi}{t_n - t_1}, \tag{10}$$

which, in fact, is the resolution limit of the least-squares periodogram. As a result, the frequency grids $\omega_g$ considered in the periodogram are

$$\omega_g = g \Delta\omega, \quad g = 1, \ldots, G, \tag{11}$$

where the number of grids $G$ is given by

$$G = \left\lfloor \frac{\omega_{\max}}{\Delta\omega} \right\rfloor. \tag{12}$$
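The grid construction in (8)-(12) can be sketched as follows. The function name `frequency_grid` is an assumption of this sketch; the sampling times are assumed to be sorted in increasing order.

```python
import math

def frequency_grid(t):
    """Return the angular-frequency grid omega_1..omega_G from (8)-(12)."""
    n = len(t)
    omega_0 = 2.0 * (n - 1) * math.pi / sum(t[i + 1] - t[i] for i in range(n - 1))  # (9)
    omega_max = omega_0 / 2.0                    # (8)
    d_omega = 2.0 * math.pi / (t[-1] - t[0])     # (10)
    G = math.floor(omega_max / d_omega)          # (12)
    return [g * d_omega for g in range(1, G + 1)]  # (11)
```

Because the intervals in (9) sum to t_n - t_1, the ratio in (12) equals (n - 1)/2, so 24 time points give G = 11 regardless of how they are spaced. For odd n the ratio is an integer, and floating-point rounding in the division could in principle change the floor; computing G as `(n - 1) // 2` avoids that.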

2.3. Implementation. The following notations are introduced for the implementation of RIAA at a specific frequency $\omega_g$:

$$\mathbf{Y} = [y_h(t_1) \cdots y_h(t_n)]^{T}, \qquad \boldsymbol{\rho}_g = [a(\omega_g) \;\; b(\omega_g)]^{T}, \qquad \mathbf{A}_g = [\mathbf{c}_g \;\; \mathbf{s}_g], \tag{13}$$

where

$$\mathbf{c}_g = [\cos(\omega_g t_1) \cdots \cos(\omega_g t_n)]^{T}, \qquad \mathbf{s}_g = [\sin(\omega_g t_1) \cdots \sin(\omega_g t_n)]^{T}, \tag{14}$$

and $a(\omega_g)$ and $b(\omega_g)$ denote variables $a$ and $b$ at frequency $\omega_g$, respectively.

RIAA's salient feature is the addition of a weighted matrix $\mathbf{Q}_g$ to the least-squares fitting criterion. The weighted matrix $\mathbf{Q}_g$ can be viewed as a covariance matrix encapsulating the contributions to the spectrum of noise and of the sinusoidal components in $\mathbf{Y}$ at frequencies other than $\omega_g$; it is defined as

$$\mathbf{Q}_g = \boldsymbol{\Sigma} + \sum_{m=1,\, m \neq g}^{G} \mathbf{A}_m \mathbf{D}_m \mathbf{A}_m^{T}, \tag{15}$$

where

$$\mathbf{D}_m = \frac{a^{2}(\omega_m) + b^{2}(\omega_m)}{2} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \tag{16}$$

and $\boldsymbol{\Sigma}$ denotes the covariance matrix of noise in expression data $\mathbf{Y}$, given by

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma^{2} & & 0 \\ & \ddots & \\ 0 & & \sigma^{2} \end{bmatrix}. \tag{17}$$

Assuming that $\mathbf{Q}_g$ is invertible, in RIAA a weighted least-squares fitting problem is formulated and considered for finding $\hat{a}$ and $\hat{b}$ (instead of using (5)), and it is written in matrix form using (13) as follows:

$$\hat{\boldsymbol{\rho}}_g = \arg\min_{\boldsymbol{\rho}_g} \left[\mathbf{Y} - \mathbf{A}_g \boldsymbol{\rho}_g\right]^{T} \mathbf{Q}_g^{-1} \left[\mathbf{Y} - \mathbf{A}_g \boldsymbol{\rho}_g\right]. \tag{18}$$

In Stoica et al. [12], the solution to (18) has been shown to be

$$\hat{\boldsymbol{\rho}}_g = \left(\mathbf{A}_g^{T} \mathbf{Q}_g^{-1} \mathbf{A}_g\right)^{-1} \mathbf{A}_g^{T} \mathbf{Q}_g^{-1} \mathbf{Y}, \tag{19}$$

and the RIAA periodogram at $\omega = \omega_g$ can be derived by

$$\Phi_{riaa}(\omega_g) = \frac{1}{n} \hat{\boldsymbol{\rho}}_g^{T} \left(\mathbf{A}_g^{T} \mathbf{A}_g\right) \hat{\boldsymbol{\rho}}_g. \tag{20}$$

From (15) and (19), it is obvious that $\mathbf{Q}_g$ and $\hat{\boldsymbol{\rho}}_g$ are dependent on each other. An iterative approach (i.e., RIAA) is hence a feasible solution to get the estimate $\hat{\boldsymbol{\rho}}_g$ and the weighted matrix $\mathbf{Q}_g$.

The iteration for estimating the spectrum starts with initial estimates $\hat{\boldsymbol{\rho}}_g^{0}$, in which the elements $\hat{a}$ and $\hat{b}$ are given by (6) with $\omega = \omega_g$, $g = 1, \ldots, G$. After initialization, the first iteration begins. First, the elements $\hat{a}$ and $\hat{b}$ of $\hat{\boldsymbol{\rho}}_g^{0}$ are applied to obtain $\mathbf{D}_m^{1}$ using (16). Secondly, to get a good estimate $\hat{\boldsymbol{\Sigma}}^{1}$ of the noise covariance, the frequency $\omega_p$ at which the largest value $P$ is located in the temporary periodogram $\Phi^{0}(\omega_g)$, $g = 1, \ldots, G$, derived using (20) with $\boldsymbol{\rho}_g = \hat{\boldsymbol{\rho}}_g^{0}$, is applied for obtaining a reverse-engineered signal $\mathbf{Y}^{0}$. The elements $\hat{y}_h(t_i)$, $i = 1, \ldots, n$, in $\mathbf{Y}^{0}$ are given by

$$\hat{y}_h(t_i) = \sqrt{2P} \cos(\omega_p t_i + s), \quad i = 1, \ldots, n. \tag{21}$$

The phase $s$ of the cosine function is unknown; however, $\hat{\boldsymbol{\Sigma}}^{1}$ is estimable through the noise variance in (17) using

$$\hat{\sigma}^{2} = \min_{s \in [0, 2\pi]} \frac{\left\| \mathbf{Y} - \mathbf{Y}^{0} \right\|^{2}}{n}, \tag{22}$$

where $\| \cdot \|$ is the Euclidean norm. With estimates $\mathbf{D}_m^{1}$ and $\hat{\boldsymbol{\Sigma}}^{1}$, the estimates $\mathbf{Q}_g^{1}$, $g = 1, \ldots, G$, in the first iteration are hence given by (15). After this, $\mathbf{Q}_g^{1}$ are inserted into the right-hand side of (19), and updated estimates $\hat{\boldsymbol{\rho}}_g^{1}$, $g = 1, \ldots, G$, are derived. The algorithm consists of repeating these steps and updating $\mathbf{Q}_g^{k}$ and $\hat{\boldsymbol{\rho}}_g^{k}$ iteratively, where $k$ denotes the number of iterations, until a termination criterion is reached. If the process stops at the $K$th iteration, then the final RIAA periodogram is given by (20) using $\hat{\boldsymbol{\rho}}_g^{K}$. The pseudocode in Algorithm 1 represents a concise description of the iterative RIAA process.
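The iteration described above can be sketched in plain Python as follows. This is an illustrative reading of (15)-(22), not the authors' implementation: the helper names `solve`, `dot`, and `riaa` are ours, the minimization over the phase s in (22) is replaced by a grid search with an assumed `phase_steps` resolution, linear systems are solved by a small Gauss-Jordan routine, the input y is assumed to be zero-mean, and degenerate inputs for which the estimated noise variance is zero are not handled.

```python
import math

def solve(M, B):
    """Solve M X = B (B may have several columns) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            f = aug[r][col]
            if r != col and f != 0.0:
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def riaa(t, y, max_iter=15, tol=0.005, phase_steps=360):
    """Return (omegas, periodogram) for sorted times t and zero-mean data y."""
    n = len(t)
    d_omega = 2.0 * math.pi / (t[-1] - t[0])          # grid size (10)
    G = (n - 1) // 2                                   # equals floor(omega_max / d_omega) in (12)
    omegas = [g * d_omega for g in range(1, G + 1)]    # grid (11)
    C = [[math.cos(w * ti) for ti in t] for w in omegas]
    S = [[math.sin(w * ti) for ti in t] for w in omegas]

    def fit(g, wc, ws, wy):
        # 2x2 normal equations; wc, ws, wy are the (weighted) columns c_g, s_g, Y
        m = [[dot(C[g], wc), dot(C[g], ws)], [dot(S[g], wc), dot(S[g], ws)]]
        return [row[0] for row in solve(m, [[dot(C[g], wy)], [dot(S[g], wy)]])]

    def power(g, rho):
        # periodogram value (20): rho^T (A^T A) rho / n
        a, b = rho
        return (a * a * dot(C[g], C[g]) + 2.0 * a * b * dot(C[g], S[g])
                + b * b * dot(S[g], S[g])) / n

    rho = [fit(g, C[g], S[g], y) for g in range(G)]    # initial estimates via (6)
    for _ in range(max_iter):
        phi = [power(g, rho[g]) for g in range(G)]
        p = max(range(G), key=lambda g: phi[g])        # index of the current peak
        amp = math.sqrt(2.0 * phi[p])                  # amplitude in (21)
        sigma2 = min(                                  # (22), grid search over the phase s
            sum((yi - amp * math.cos(omegas[p] * ti + 2.0 * math.pi * k / phase_steps)) ** 2
                for ti, yi in zip(t, y)) / n
            for k in range(phase_steps))
        w = [(a * a + b * b) / 2.0 for a, b in rho]    # scalar factor of D_m in (16)
        new_rho = []
        for g in range(G):
            # Q_g from (15): noise covariance plus every sinusoid except the g-th
            Q = [[(sigma2 if i == j else 0.0)
                  + sum(w[m] * (C[m][i] * C[m][j] + S[m][i] * S[m][j])
                        for m in range(G) if m != g)
                  for j in range(n)] for i in range(n)]
            X = solve(Q, [[C[g][i], S[g][i], y[i]] for i in range(n)])
            wc, ws, wy = ([row[k] for row in X] for k in range(3))
            new_rho.append(fit(g, wc, ws, wy))         # weighted solution (19)
        d_old = [math.hypot(*r) for r in rho]
        d_new = [math.hypot(*r) for r in new_rho]
        rho = new_rho
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(d_new, d_old)))
        if change < tol * math.sqrt(sum(d * d for d in d_old)):
            break
    return omegas, [power(g, rho[g]) for g in range(G)]
```

The defaults `max_iter=15` and `tol=0.005` mirror the termination rule quoted in Algorithm 1. Building each Q_g explicitly keeps the sketch close to (15); it is adequate for a few dozen time points but would be slow for long series.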

Algorithm 1: The pseudocode of the iterative process in RIAA.

Initialization: Use (6) to obtain the initial estimates $\hat{a}$ and $\hat{b}$ in $\hat{\boldsymbol{\rho}}_g^{0}$.

The first iteration: Obtain $\mathbf{D}_m^{1}$ using (16) with parameters $\hat{a}$ and $\hat{b}$ given by $\hat{\boldsymbol{\rho}}_g^{0}$. Obtain $\hat{\boldsymbol{\Sigma}}^{1}$ using (22). Use $\mathbf{D}_m^{1}$ and $\hat{\boldsymbol{\Sigma}}^{1}$ to derive the first weighted matrix $\mathbf{Q}_g^{1}$ by (15). Update the estimate $\hat{\boldsymbol{\rho}}_g^{1}$ by (19) with $\mathbf{Q}_g = \mathbf{Q}_g^{1}$.

Updating iteration: At the $k$th iteration, $k = 1, 2, \ldots$, the estimates $\mathbf{Q}_g^{k}$ and $\hat{\boldsymbol{\rho}}_g^{k}$ are iteratively updated in the same way as in the first iteration.

Termination: Terminate simply after 15 iterations ($K = 15$) or when the total change in $d_g^{k} = \|\hat{\boldsymbol{\rho}}_g^{k}\|$ for $g = 1, \ldots, G$ is extremely small, say, $\sqrt{\sum_{g=1}^{G} (d_g^{k} - d_g^{k-1})^{2}} < 0.005 \sqrt{\sum_{g=1}^{G} (d_g^{k-1})^{2}}$; then $K = k$.

3. Methods

Figure 1 demonstrates our scheme for periodicity detection and algorithm comparison. The first step involves a periodogram estimation, which converts the time-course gene expression ratios into the frequency domain. Three methods are considered for comparison: RIAA, LS, and a detrended LS (termed DLS), which uses an additional detrend function (developed in LSPR) before regular LS periodogram estimation is applied. The derived spectra are then analyzed using hypothesis testing. This study is conducted using Fisher's test with the null hypothesis that there are no periodic signals in the time domain and hence no significantly large peak in the derived spectra. The algorithm performance is evaluated and compared via simulations and receiver operating characteristic (ROC) curves. In real microarray data analysis, three published benchmark sets are utilized as standards of cell cycle genes for performance comparison.

[Figure 1: The scheme of the process for detecting periodicities in time-course expression data. Flowchart: time-course expression ratios; spectral analysis in the frequency domain (periodogram estimation, then hypothesis testing with Fisher's test); RIAA compared with LS and DLS through simulations (periodic genes and nonperiodicities, ROC curves) and real data (benchmark sets).]

3.1. Fisher's Test. After the spectrum of time-course expression data is obtained via periodogram estimation, a Fisher's statistic $f$ for gene $h$, with the null hypothesis $H_0$ that the peak of the spectral density is insignificant against the alternative hypothesis $H_1$ that the peak of the spectral density is significant, is applied as

$$f_h = \frac{\max_{1 \le g \le G} \Phi(\omega_g)}{G^{-1} \sum_{g=1}^{G} \Phi(\omega_g)}, \tag{23}$$

where $\Phi$ refers to the periodogram derived using RIAA, LS, or DLS. The null hypothesis $H_0$ is rejected, and the gene $h$ is claimed as a periodic gene, if its $p$-value, denoted as $p_h$, is less than or equal to a specific significance threshold. For simplicity, $p_h$ is approximated from the asymptotic null distribution of $f$ assuming Gaussian noise [13] as follows:

$$p_h = 1 - e^{-n e^{-f_h}}. \tag{24}$$

In real data analysis, deviation might be invoked for the estimation of $p_h$ when the time-course data is short. This issue was carefully addressed by Liew et al. [14], and, as suggested, alternative methods such as random permutation may provide less deviation and better performance. However, permutation also has limitations, such as tending to be conservative [15]. While finding the most robust method for the $p$-value evaluation remains an open question, it is beyond the scope of this study, since the algorithm comparison via ROC curves is threshold independent [16] and the results are unaffected by the deviation.
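A minimal sketch of (23) and (24) follows; the function name `fisher_test` is ours, and the periodogram values are assumed to be nonnegative with a positive mean.

```python
import math

def fisher_test(phi, n):
    """Return Fisher's statistic (23) and its approximate p-value (24).

    phi: periodogram values at the G grid frequencies
    n:   number of time points, as it appears in (24)
    """
    G = len(phi)
    f = max(phi) / (sum(phi) / G)
    # -expm1(-x) equals 1 - exp(-x) but keeps precision when x is tiny
    p = -math.expm1(-n * math.exp(-f))
    return f, p
```

A gene would then be called periodic when the returned p-value is at or below the chosen significance threshold. Note that f can never exceed G, so short time courses bound how small the approximate p-value can become.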

3.2. Simulations. Simulations are applied to evaluate the performance of RIAA. The simulation models and sampling strategies used for simulations are described in the following paragraphs.

3.2.1. Periodic and Nonperiodic Signals. Three models, one for periodic signals and two for nonperiodic signals, are considered as transcriptional signals. Since periodic genes are transcribed in an oscillatory manner, the expression levels $y_s$ embedded with periodicities are assumed to be

$$y_s(t_i) = M \cos(\omega_s t_i) + \epsilon_{t_i}, \quad i = 1, \ldots, n, \tag{25}$$

where $M$ denotes the sinusoidal amplitude, $\omega_s$ refers to the signal frequency, and $\epsilon_{t_i}$ are Gaussian noise, independent and identically distributed (i.i.d.) with parameters $\mu$ and $\sigma$. For nonperiodic signals, the first model $y_n$ is simply composed of Gaussian noise, given by

$$y_n(t_i) = \epsilon_{t_i}, \quad i = 1, \ldots, n. \tag{26}$$

[Figure 2: (a) A time-course periodic signal with frequency = 0.2 sampled by the bio-like sampling strategy; 16 time points are assigned to the interval (0, 8] and 8 time points are assigned to the interval (8, 16]. Axes: time versus gene expression; legend: sampled data, periodic signal. (b) The periodogram derived using RIAA. The maximum value (peak) in the periodogram is located at frequency = 0.195. Axes: frequency versus amplitude; annotation: $p$-value = $2.4 \times 10^{-3}$.]

Additionally, as visualized by Chubb et al., gene transcription can be nonperiodically activated with irregular intervals in a living eukaryotic cell, like pulses turning on and off rapidly and discontinuously [17]. Based on this, the second nonperiodic model $y'_n$ incorporates one additional transcriptional burst and one additional sudden drop into the Gaussian noise, which can be written as

$$y'_n(t_i) = I_b(t_i) - I_d(t_i) + \epsilon_{t_i}, \quad i = 1, \ldots, n, \tag{27}$$

where $I_b$ and $I_d$ are indicator functions, equal to 1 at the location of the burst and the drop, respectively, and 0 otherwise. The transcriptional burst assumes a positive pulse, while the transcriptional drop assumes a negative pulse. Both of them may be located randomly among all time points and are assumed to last for two time points. In other words, the indicator functions are equal to 1 at two consecutive time points, say, $I_b = 1$ at $t_i$ and $t_{i+1}$. The burst and the drop have no overlap.
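The three signal models (25)-(27) could be simulated along the following lines. The function names and the `rng` parameter are assumptions of this sketch; the burst and drop positions are drawn uniformly and redrawn until the two pulses do not overlap, which is one possible reading of "located randomly."

```python
import math
import random

def periodic_signal(t, M=1.0, omega_s=0.4 * math.pi, mu=0.0, sigma=0.5, rng=random):
    """Model (25): sinusoid plus i.i.d. Gaussian noise."""
    return [M * math.cos(omega_s * ti) + rng.gauss(mu, sigma) for ti in t]

def noise_signal(t, mu=0.0, sigma=0.5, rng=random):
    """Model (26): Gaussian noise only."""
    return [rng.gauss(mu, sigma) for _ in t]

def burst_drop_signal(t, mu=0.0, sigma=0.5, rng=random):
    """Model (27): noise plus one two-point burst (+1) and one two-point drop (-1)."""
    n = len(t)
    while True:
        i_burst = rng.randrange(n - 1)      # burst occupies i_burst and i_burst + 1
        i_drop = rng.randrange(n - 1)       # drop occupies i_drop and i_drop + 1
        if abs(i_burst - i_drop) >= 2:      # the two pulses must not overlap
            break
    y = noise_signal(t, mu, sigma, rng)
    for k in (0, 1):
        y[i_burst + k] += 1.0
        y[i_drop + k] -= 1.0
    return y
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible.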

3.2.2. Sampling Strategies. As for the choices of sampling time points $t_i$, $i = 1, \ldots, n$, four different sampling strategies, one with regular sampling and three with irregular sampling, are considered. First, regular sampling is applied, in which all time intervals are set to be $1/c$, where $c$ is a constant. Secondly, a bio-like sampling strategy is invoked. This strategy tends to have more time points at the beginning of time-course experiments and fewer time points afterwards; we set the first 2/3 of the time intervals as $1/c$ and the next 1/3 of the time intervals as $2/c$. Third, time intervals are randomly chosen between $1/c$ and $2/c$. The last sampling strategy, in which all time intervals are exponentially distributed with parameter $c$, is less realistic than the others, but it is helpful for us to evaluate the performance of RIAA under pathological conditions.
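The four sampling strategies can be sketched as generators of time points. Several details here are assumptions rather than statements from the paper: time points are cumulative sums of the intervals starting from zero (so the first point sits at the first interval), the third strategy picks each interval as either 1/c or 2/c with equal probability, and the parameter c of the exponential distribution is treated as a rate (mean interval 1/c).

```python
import random

def cumulative(intervals):
    """Turn a list of positive intervals into increasing time points."""
    t, total = [], 0.0
    for dt in intervals:
        total += dt
        t.append(total)
    return t

def regular_times(n, c):
    return cumulative([1.0 / c] * n)

def bio_like_times(n, c):
    n_dense = (2 * n) // 3                      # first 2/3 of the intervals are short
    return cumulative([1.0 / c] * n_dense + [2.0 / c] * (n - n_dense))

def binomial_times(n, c, rng=random):
    return cumulative([rng.choice((1.0 / c, 2.0 / c)) for _ in range(n)])

def exponential_times(n, c, rng=random):
    return cumulative([rng.expovariate(c) for _ in range(n)])
```

With n = 24 and c = 2, `bio_like_times` places 16 points in (0, 8] and 8 points in (8, 16], which matches the example described for Figure 2.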

ROC curves are applied for performance comparison. To this end, 10,000 periodic signals were generated using (25), and 10,000 nonperiodic signals were generated using either (26) or (27). Sensitivity measures the proportion of successful detection among the 10,000 periodic signals, and specificity measures the proportion of correct claims on the 10,000 nonperiodic simulation datasets. Sampling time points are decided by one of the four sampling strategies, and the number of time points $n$ is chosen arbitrarily. For all ROC curves in Section 4, $c = 2$ and $n = 24$.
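The ROC construction can be illustrated with a short sketch. It assumes each simulated signal has been reduced to a single score for which larger means "more periodic" (for example, Fisher's statistic, or the negated p-value); the function names `roc_curve` and `auc` are ours. The threshold sweep below is quadratic in the number of signals, so it is meant for illustration rather than for 10,000 signals per class.

```python
def roc_curve(pos_scores, neg_scores):
    """Return ROC points (1 - specificity, sensitivity), starting at (0, 0)."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for th in thresholds:
        sensitivity = sum(s >= th for s in pos_scores) / len(pos_scores)
        false_positive_rate = sum(s >= th for s in neg_scores) / len(neg_scores)
        points.append((false_positive_rate, sensitivity))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Perfectly separated scores give an area of 1, while scores that rank every nonperiodic signal above every periodic one give an area of 0.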

3.3. Real Data Analysis. Two yeast cell cycle experiments synchronized using an alpha-factor, one conducted by Spellman et al. [2] and one conducted by Pramila et al. [18], are considered for a real data analysis. The first time-course microarray data, termed dataset alpha and downloaded from the Yeast Cell Cycle Analysis Project website (http://genome-www.stanford.edu/cellcycle), harbors 6178 gene expression levels and 18 sampling time points with a 7-minute interval. The second time-course data, termed dataset alpha 38, is downloaded from the online portal for Fred Hutchinson Cancer Research Center's scientific laboratories (http://labs.fhcrc.org/breeden/cellcycle). This dataset contains 4774 gene expression levels and 25 sampling time points with a 5-minute interval. Three benchmark sets of genes that have been utilized in Lichtenberg et al. [19] and Liew et al. [20] as standards of cell cycle genes are also applied herein for performance comparison. These benchmark sets, involving 113, 352, and 518 genes, respectively, include candidates of cell cycle regulated genes in yeast proposed by Spellman et al. [2], Johansson et al. [21], Simon et al. [22], Lee et al. [23], and Mewes et al. [24], and are accessible on a laboratory website (http://www.cbs.dtu.dk/cellcycle).

4. Results

RIAA performed well in the conducted simulations. As shown in Figure 2(a), a periodic signal (solid line) with amplitude $M = 1$ and frequency $\omega_s = 0.4\pi$ is sampled using the bio-like sampling strategy, which applies 16 time points in (0, 8] and 8 more time points in (8, 16]. Gaussian noise with parameters $\mu = 0$ and $\sigma = 0.5$ is assumed during microarray experiments. The resulting time-course expression levels (dots) at a total of 24 time points and the sampling time information were treated as inputs to the RIAA algorithm. Figure 2(b) demonstrates the result of periodogram estimation. In this example, the grid size $\Delta\omega$ was chosen to be 0.065, and a total of 11 amplitudes corresponding to different frequencies were obtained and shown in the spectrum. Using Fisher's test, the peak at the third grid (frequency = 0.195) was found to be significantly large ($p$-value = $2.4 \times 10^{-3}$), and hence a periodic gene was claimed.

[Figure 3: The ROC curves derived from simulations with 24 sampling time points, signal amplitude $M = 1$, $\omega_s = 0.4\pi$, and Gaussian noise $\mu = 0$ and $\sigma = 0.5$. Panels (a)-(h); axes: 1 - specificity versus sensitivity; legend: RIAA, LS, DLS. Description of subplots is provided in Section 4.]

[Figure 4: The ROC curves derived from simulations with 24 sampling time points, signal amplitude $M = 1$, $\omega_s = 0.1\pi$, and Gaussian noise $\mu = 0$ and $\sigma = 0.5$. Panels (a)-(h); axes: 1 - specificity versus sensitivity; legend: RIAA, LS, DLS. Description of subplots is provided in Section 4.]

[Figure 5: The intersection of preserved genes and the benchmark sets using RIAA, LS, and DLS algorithms. (a), (b), and (c) reveal the analysis results when dataset alpha was applied; (d), (e), and (f) reveal the analysis results when dataset alpha 38 was applied. Panel titles: 113-gene, 352-gene, and 518-gene benchmark sets; axes: the number of preserved genes versus the number of intersection; legend: RIAA, LS, DLS.]

ROC curves strongly illustrate the performance of RIAA. In Figures 3 and 4, subplots (a)-(b), (c)-(d), (e)-(f), and (g)-(h) refer to the simulations with regular, bio-like, binomially random, and exponentially random sampling strategies, respectively. Additionally, in the left-hand side subplots (a), (c), (e), and (g), nonperiodic signals were simply Gaussian noise with parameters $\mu = 0$ and $\sigma = 0.5$, while in the right-hand side subplots (b), (d), (f), and (h), nonperiodic signals involve not only the Gaussian noise but also a transcriptional burst and a sudden drop (27). Periodic signals were generated using (25) with amplitude $M = 1$, $c = 2$, and $n = 24$. The only difference in simulation settings between Figures 3 and 4 is the frequency of periodic signals; they are $\omega_s = 0.4\pi$ and $0.1\pi$, respectively. As shown in these figures, LS and DLS can perform as well as RIAA when the time-course data are regularly sampled or mildly irregularly sampled; however, when data are highly irregularly sampled, RIAA outperforms the others. The superiority of RIAA over DLS is particularly clear when the signal frequency is small.

Figure 5 illustrates the results of the real data analysis when the three algorithms, namely, RIAA, LS, and DLS, were applied. On the $x$-axis, the numbers indicate the thresholds $\eta$, that is, the number of genes that we preserved and classified as periodic among all yeast genes; on the $y$-axis, the numbers refer to the intersection of the $\eta$ preserved genes and the proposed periodic candidates listed in the benchmark sets. Figures 5(a)–5(c) show the results derived from dataset alpha when the 113-gene, 352-gene, and 518-gene benchmark sets were applied, respectively. Similarly, Figures 5(d)–5(f) show the results derived from dataset alpha 38. In most of these cases, RIAA does not result in significant differences in the numbers of intersections when compared to those corresponding to LS and DLS. However, RIAA shows slightly better coverage when dataset alpha 38 and the 113-gene benchmark set were used (Figure 5(d)).

5. Conclusions

In this study, rigorous simulations specifically designed to conform to real experiments reveal that RIAA can outperform the classical LS and modified DLS algorithms when the sampling time points are highly irregular and when the number of cycles covered by the sampling times is very limited. These characteristics, as also claimed in the original study by Stoica et al. [12], suggest that RIAA can be generally applied to detect periodicities in time-course gene expression data, with good potential to yield better results. A supplementary simulation further shows the superiority of RIAA over LS and DLS when multiple periodic signals are considered (see Supplementary Figure S1, available online at http://dx.doi.org/10.1155/2013/171530). From the simulations, we also learned that the addition of a transcriptional burst and a sudden drop to nonperiodic signals (the negatives) does not affect the power of RIAA in terms of periodicity detection. Moreover, the detrend function in DLS, designed to improve LS by removing the linearity in time-course data, may fail to provide improved accuracy and makes the algorithm unable to detect periodicities when transcription oscillates with a very low frequency.

The intersection of detected candidates and proposed periodic genes in the real data analysis (Figure 5) does not reveal many differences among RIAA, LS, and DLS. One possible reason is that the sampling time points in the yeast experiments are not highly irregular (not many missing values are included); as demonstrated in Figures 3(a)–3(d), RIAA performs just as well as the LS and DLS algorithms when the time-course data are regularly or mildly irregularly sampled. Also, the very limited number of time points contained in the datasets may bias the estimation of $p$-values [14] and thus hinder RIAA from exhibiting its advantage. In addition, the number of true cell cycle genes included in the benchmark sets remains uncertain. We expect that the superiority of RIAA in real data analysis will become clearer in the future, when more studies and more datasets become available.

Besides the comparison of these algorithms, it is interesting to note that the bio-like sampling strategy could lead to better detection of periodicities than the regular sampling strategy (as shown in Figures 3(c) and 3(d)). When the number of time points is limited, it might be beneficial to apply wider sampling intervals at later periods to prolong the experimental time coverage.

Acknowledgments

The authors would like to thank the members of the Genomic Signal Processing Laboratory, Texas A&M University, for the helpful discussions and valuable feedback. This work was supported by the National Science Foundation under Grant no. 0915444. The RIAA MATLAB code is available at http://gsp.tamu.edu/Publications/supplementary/agyepong12a.

References

[1] W. Zhao, K. Agyepong, E. Serpedin, and E. R. Dougherty, "Detecting periodic genes from irregularly sampled gene expressions: a comparison study," EURASIP Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 769293, 2008.

[2] P. T. Spellman, G. Sherlock, M. Q. Zhang et al., "Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization," Molecular Biology of the Cell, vol. 9, no. 12, pp. 3273–3297, 1998.

[3] G. Rustici, J. Mata, K. Kivinen et al., "Periodic gene expression program of the fission yeast cell cycle," Nature Genetics, vol. 36, no. 8, pp. 809–817, 2004.

[4] M. Menges, L. Hennig, W. Gruissem, and J. A. H. Murray, "Cell cycle-regulated gene expression in Arabidopsis," Journal of Biological Chemistry, vol. 277, no. 44, pp. 41987–42002, 2002.

[5] M. Ahdesmäki, H. Lähdesmäki, R. Pearson, H. Huttunen, and O. Yli-Harja, "Robust detection of periodic time series measured from biological systems," BMC Bioinformatics, vol. 6, article 117, 2005.

[6] M. Ahdesmäki, H. Lähdesmäki, A. Gracey et al., "Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data," BMC Bioinformatics, vol. 8, article 233, 2007.

[7] E. F. Glynn, J. Chen, and A. R. Mushegian, "Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms," Bioinformatics, vol. 22, no. 3, pp. 310–316, 2006.

[8] R. Yang, C. Zhang, and Z. Su, "LSPR: an integrated periodicity detection algorithm for unevenly sampled temporal microarray data," Bioinformatics, vol. 27, no. 7, pp. 1023–1025, 2011.

[9] E. R. Dougherty, "Small sample issues for microarray-based classification," Comparative and Functional Genomics, vol. 2, no. 1, pp. 28–34, 2001.

[10] Y. Tu, G. Stolovitzky, and U. Klein, "Quantitative noise analysis for gene expression microarray experiments," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 22, pp. 14031–14036, 2002.

[11] Z. Bar-Joseph, "Analyzing time series gene expression data," Bioinformatics, vol. 20, no. 16, pp. 2493–2503, 2004.

[12] P. Stoica, J. Li, and H. He, "Spectral analysis of nonuniformly sampled data: a new approach versus the periodogram," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 843–858, 2009.

[13] J. Fan and Q. Yao, Nonlinear Time Series: Nonparametric and Parametric Methods, Springer, New York, NY, USA, 2003.

[14] A. W. C. Liew, N. F. Law, X. Q. Cao, and H. Yan, "Statistical power of Fisher test for the detection of short periodic gene expression profiles," Pattern Recognition, vol. 42, no. 4, pp. 549–556, 2009.

[15] V. Berger, "Pros and cons of permutation tests in clinical trials," Statistics in Medicine, vol. 19, no. 10, pp. 1319–1328, 2000.

[16] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.

[17] J. R. Chubb, T. Trcek, S. M. Shenoy, and R. H. Singer, "Transcriptional pulsing of a developmental gene," Current Biology, vol. 16, no. 10, pp. 1018–1025, 2006.

[18] T. Pramila, W. Wu, W. Noble, and L. Breeden, "Periodic genes of the yeast Saccharomyces cerevisiae: a combined analysis of five cell cycle data sets," 2007.

[19] U. Lichtenberg, L. J. Jensen, A. Fausbøll, T. S. Jensen, P. Bork, and S. Brunak, "Comparison of computational methods for the identification of cell cycle-regulated genes," Bioinformatics, vol. 21, no. 7, pp. 1164–1171, 2005.

[20] A. W. C. Liew, J. Xian, S. Wu, D. Smith, and H. Yan, "Spectral estimation in unevenly sampled space of periodically expressed microarray time series data," BMC Bioinformatics, vol. 8, article 137, 2007.

[21] D. Johansson, P. Lindgren, and A. Berglund, "A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription," Bioinformatics, vol. 19, no. 4, pp. 467–473, 2003.

[22] I. Simon, J. Barnett, N. Hannett et al., "Serial regulation of transcriptional regulators in the yeast cell cycle," Cell, vol. 106, no. 6, pp. 697–708, 2001.

[23] T. I. Lee, N. J. Rinaldi, F. Robert et al., "Transcriptional regulatory networks in Saccharomyces cerevisiae," Science, vol. 298, no. 5594, pp. 799–804, 2002.

[24] H. W. Mewes, D. Frishman, U. Güldener et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 30, no. 1, pp. 31–34, 2002.


(spectral) domain and then applying a significance evaluation test for the resulting peak in the spectral density.

While numerous methods have been developed for detecting periodicities in gene expression, most of these methods suffer from false positive errors and working restrictions to a certain extent, particularly when the time-course data contain limited time points. In addition, no algorithm seems available to resolve all of these challenges. Microarray as well as other high-throughput experiments, due to high manufacturing and preparation costs, share the characteristics of small sample size [9], noisy measurements [10], and arbitrary sampling strategies [11], thereby making the detection of periodicities highly challenging. Since the number and functions of cell cycle regulated genes, or periodic genes, remain greatly uncertain, advances in detection algorithms are urgently needed.

Recently, Stoica et al. developed a novel nonparametric method, termed the "real-valued iterative adaptive approach (RIAA)," specifically for spectral analysis with nonuniformly sampled data [12]. As stated by the authors, RIAA, an iteratively weighted least-squares periodogram, can provide robust spectral estimates and is most suitable for sinusoidal signals. These characteristics of RIAA inspired us to apply it to time-course gene expression data and to examine its performance. Herein, we combine RIAA with Fisher's statistic to detect transcriptional periodicities. A rigorous comparison of RIAA with several aforementioned algorithms in terms of sensitivity and specificity is conducted through simulations, and results from a real data analysis are also provided.

In this study, we found that the RIAA algorithm can provide robust spectral estimates for the detection of periodic genes regardless of the sampling strategies adopted in the experiments or the nonperiodic nature of noise present in the measurement process. We show through simulations that RIAA can outperform the existing algorithms, particularly when the data are highly irregularly sampled and when the number of cycles covered by the sampling time points is very small. These characteristics of RIAA fit the needs of time-course gene expression data analysis well. This paper is organized as follows. In Section 2, we begin with an overview of RIAA. In Section 3, a scheme for detecting periodicities is proposed, and simulation models for performance evaluation and a real data analysis for validation purposes are presented. A complete investigation of the performance of RIAA and a rigorous comparison with other algorithms are provided in Section 4.

2. RIAA Algorithm

RIAA is an iterative algorithm developed for finding the least-squares periodogram with the use of a weighting function. The essential mathematics involved in RIAA is introduced in this section, with the algorithm input being time-course expression data; for more details regarding RIAA, readers are encouraged to consult the original paper by Stoica et al. [12].

2.1. Basics. Suppose that the signals associated with the periodic gene expressions are composed of noise and sinusoidal components. Let $y_h(t_i)$, $i = 1, \ldots, n$, denote the time-course expression ratios of gene $h$ at instances $t_1, \ldots, t_n$, respectively; the $y_h(t_i)$ are real numbers with $\sum_{i=1}^{n} y_h(t_i) = 0$. The least-squares periodogram $\Phi_{lsp}$ is given by

$$\Phi_{lsp}(\omega) = |\hat{\alpha}(\omega)|^2, \tag{1}$$

where $\hat{\alpha}(\omega)$ is the solution to the following fitting problem:

$$\hat{\alpha}(\omega) = \arg\min_{\alpha(\omega)} \sum_{i=1}^{n} \left| y_h(t_i) - \alpha(\omega) e^{j\omega t_i} \right|^2. \tag{2}$$

Let $\alpha(\omega) = |\alpha(\omega)| e^{j\phi(\omega)} = \beta e^{j\theta}$, where $\beta = |\alpha(\omega)| \ge 0$ and $\theta = \phi(\omega) \in [0, 2\pi]$ refer to the amplitude and phase of $\alpha(\omega)$, respectively. The criterion in (2) can then be rewritten as

$$\sum_{i=1}^{n} \left[ y_h(t_i) - \beta \cos(\omega t_i + \theta) \right]^2 + \beta^2 \sum_{i=1}^{n} \sin^2(\omega t_i + \theta). \tag{3}$$

The second term in the above equation is data independent and can be omitted from the minimization operation. Hence, the criterion (2) is simplified to

$$(\hat{\beta}, \hat{\theta}) = \arg\min_{\beta, \theta} \sum_{i=1}^{n} \left[ y_h(t_i) - \beta \cos(\omega t_i + \theta) \right]^2. \tag{4}$$

We further apply $a = \beta \cos(\theta)$ and $b = -\beta \sin(\theta)$ and derive an equivalent of (4) as follows:

$$(\hat{a}, \hat{b}) = \arg\min_{a, b} \sum_{i=1}^{n} \left[ y_h(t_i) - a \cos(\omega t_i) - b \sin(\omega t_i) \right]^2. \tag{5}$$

The target of interest in the fitting problem now becomes $\hat{a}$ and $\hat{b}$ (instead of $\hat{\alpha}(\omega)$), and the solution is well known to be

$$\begin{bmatrix} \hat{a} \\ \hat{b} \end{bmatrix} = \mathbf{R}^{-1} \mathbf{r}, \tag{6}$$

where

$$\mathbf{R} = \sum_{i=1}^{n} \begin{bmatrix} \cos^2(\omega t_i) & \cos(\omega t_i) \sin(\omega t_i) \\ \sin(\omega t_i) \cos(\omega t_i) & \sin^2(\omega t_i) \end{bmatrix}, \qquad \mathbf{r} = \sum_{i=1}^{n} \begin{bmatrix} \cos(\omega t_i) \\ \sin(\omega t_i) \end{bmatrix} y_h(t_i). \tag{7}$$

After $\hat{a}$ and $\hat{b}$ are estimated, the least-squares periodogram can be derived; since $\hat{a}^2 + \hat{b}^2 = \hat{\beta}^2$, (1) gives $\Phi_{lsp}(\omega) = \hat{a}^2 + \hat{b}^2$.
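As an illustration of (5)-(7), the following minimal Python sketch solves the 2 × 2 normal equations at a single frequency and returns the periodogram value of (1). The function name `ls_periodogram` and the pure-Python style are assumptions made for this example; they are not taken from the authors' MATLAB code.

```python
import math


def ls_periodogram(t, y, omega):
    """Least-squares fit of y(t_i) ~ a*cos(omega*t_i) + b*sin(omega*t_i).

    Returns (a, b, phi), where phi = a^2 + b^2 is the periodogram value
    of eq. (1). Illustrative helper, not part of the original paper.
    """
    c = [math.cos(omega * ti) for ti in t]
    s = [math.sin(omega * ti) for ti in t]
    # Entries of the symmetric matrix R and the vector r in eq. (7).
    r11 = sum(ci * ci for ci in c)
    r12 = sum(ci * si for ci, si in zip(c, s))
    r22 = sum(si * si for si in s)
    r1 = sum(ci * yi for ci, yi in zip(c, y))
    r2 = sum(si * yi for si, yi in zip(s, y))
    # Closed-form inverse of the 2x2 matrix R applied to r, eq. (6).
    det = r11 * r22 - r12 * r12
    a = (r22 * r1 - r12 * r2) / det
    b = (r11 * r2 - r12 * r1) / det
    return a, b, a * a + b * b
```

For a noise-free input of the form $a \cos(\omega t) + b \sin(\omega t)$ evaluated at the same $\omega$, the fit recovers $a$ and $b$ up to rounding, whether or not the time points are evenly spaced.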

2.2. Observation Interval and Resolution. Prior to implementation of RIAA for periodogram estimation, the observation interval $[0, \omega_{\max}]$ and the resolution in terms of grid size have to be selected. To this end, the maximum frequency $\omega_{\max}$ in the observation interval without aliasing errors for sampling instances $t_1, \ldots, t_n$ can be evaluated by

$$\omega_{\max} = \frac{\omega_0}{2}, \tag{8}$$

where $\omega_0$ is given by

$$\omega_0 = \frac{2(n-1)\pi}{\sum_{i=1}^{n-1} (t_{i+1} - t_i)}. \tag{9}$$

The observation interval $[0, \omega_{\max}]$ is hence chosen after $\omega_{\max}$ is obtained.

To ensure that the smallest frequency separation in time-course expression data with regular or irregular sampling can be adequately detected, the grid size $\Delta\omega$ is chosen to be

$$\Delta\omega = \frac{2\pi}{t_n - t_1}, \tag{10}$$

which in fact is the resolution limit of the least-squares periodogram. As a result, the frequency grids $\omega_g$ considered in the periodogram are

$$\omega_g = g \Delta\omega, \quad g = 1, \ldots, G, \tag{11}$$

where the number of grids $G$ is given by

$$G = \left\lfloor \frac{\omega_{\max}}{\Delta\omega} \right\rfloor. \tag{12}$$
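The grid construction in (8)-(12) can be sketched in a few lines of Python. The helper name `frequency_grid` is hypothetical. Because the sum in (9) telescopes to $t_n - t_1$, the number of grid points reduces to $\lfloor (n-1)/2 \rfloor$, which gives $G = 11$ for $n = 24$ time points.

```python
import math


def frequency_grid(t):
    """Return (omega_max, delta_omega, grid) following eqs. (8)-(12).

    t is a sorted list of sampling times; the function name is an
    illustrative choice, not taken from the paper.
    """
    n = len(t)
    span = sum(t[i + 1] - t[i] for i in range(n - 1))  # equals t[-1] - t[0]
    omega_0 = 2.0 * (n - 1) * math.pi / span           # eq. (9)
    omega_max = omega_0 / 2.0                          # eq. (8)
    delta_omega = 2.0 * math.pi / (t[-1] - t[0])       # eq. (10)
    n_grid = math.floor(omega_max / delta_omega)       # eq. (12)
    grid = [g * delta_omega for g in range(1, n_grid + 1)]  # eq. (11)
    return omega_max, delta_omega, grid
```

With 16 time points spaced 0.5 apart in $(0, 8]$ followed by 8 time points spaced 1 apart in $(8, 16]$, the grid spacing expressed in cycles per unit time is $\Delta\omega / 2\pi = 1/15.5 \approx 0.065$.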

2.3. Implementation. The following notations are introduced for the implementation of RIAA at a specific frequency $\omega_g$:

$$\mathbf{Y} = [y_h(t_1) \cdots y_h(t_n)]^T, \qquad \boldsymbol{\rho}_g = [a(\omega_g) \;\; b(\omega_g)]^T, \qquad \mathbf{A}_g = [\mathbf{c}_g \;\; \mathbf{s}_g], \tag{13}$$

where

$$\mathbf{c}_g = [\cos(\omega_g t_1) \cdots \cos(\omega_g t_n)]^T, \qquad \mathbf{s}_g = [\sin(\omega_g t_1) \cdots \sin(\omega_g t_n)]^T, \tag{14}$$

and $a(\omega_g)$ and $b(\omega_g)$ denote variables $a$ and $b$ at frequency $\omega_g$, respectively.

RIAA's salient feature is the addition of a weighting matrix $\mathbf{Q}_g$ to the least-squares fitting criterion. The weighting matrix $\mathbf{Q}_g$ can be viewed as a covariance matrix encapsulating the contributions to the spectrum of noise and of sinusoidal components in $\mathbf{Y}$ other than the one at $\omega_g$; it is defined as

$$\mathbf{Q}_g = \boldsymbol{\Sigma} + \sum_{m=1,\, m \ne g}^{G} \mathbf{A}_m \mathbf{D}_m \mathbf{A}_m^T, \tag{15}$$

where

$$\mathbf{D}_m = \frac{a^2(\omega_m) + b^2(\omega_m)}{2} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{16}$$

and $\boldsymbol{\Sigma}$ denotes the covariance matrix of noise in expression data $\mathbf{Y}$, given by

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma^2 & & 0 \\ & \ddots & \\ 0 & & \sigma^2 \end{bmatrix}. \tag{17}$$

Assuming that $\mathbf{Q}_g$ is invertible, in RIAA a weighted least-squares fitting problem is formulated for finding $\hat{a}$ and $\hat{b}$ (instead of using (5)), and it is written in matrix form using (13) as follows:

$$\hat{\boldsymbol{\rho}}_g = \arg\min_{\boldsymbol{\rho}_g} \, [\mathbf{Y} - \mathbf{A}_g \boldsymbol{\rho}_g]^T \mathbf{Q}_g^{-1} [\mathbf{Y} - \mathbf{A}_g \boldsymbol{\rho}_g]. \tag{18}$$

In Stoica et al. [12], the solution to (18) has been shown to be

$$\hat{\boldsymbol{\rho}}_g = \left( \mathbf{A}_g^T \mathbf{Q}_g^{-1} \mathbf{A}_g \right)^{-1} \mathbf{A}_g^T \mathbf{Q}_g^{-1} \mathbf{Y}, \tag{19}$$

and the RIAA periodogram at $\omega = \omega_g$ can be derived by

$$\Phi_{riaa}(\omega_g) = \frac{1}{n} \hat{\boldsymbol{\rho}}_g^T \left( \mathbf{A}_g^T \mathbf{A}_g \right) \hat{\boldsymbol{\rho}}_g. \tag{20}$$

From (15) and (19), it is clear that $\mathbf{Q}_g$ and $\hat{\boldsymbol{\rho}}_g$ depend on each other. An iterative approach (i.e., RIAA) is hence a feasible way to obtain the estimate $\hat{\boldsymbol{\rho}}_g$ and the weighting matrix $\mathbf{Q}_g$.

The iteration for estimating the spectrum starts with initial estimates $\hat{\boldsymbol{\rho}}_g^0$, in which the elements $\hat{a}$ and $\hat{b}$ are given by (6) with $\omega = \omega_g$, $g = 1, \ldots, G$. After initialization, the first iteration begins. First, the elements $\hat{a}$ and $\hat{b}$ of $\hat{\boldsymbol{\rho}}_g^0$ are applied to obtain $\mathbf{D}_m^1$ using (16). Secondly, to get a good estimate $\hat{\boldsymbol{\Sigma}}^1$, the frequency $\omega_p$ at which the largest value, $P$, is located in the temporary periodogram $\Phi^0(\omega_g)$, $g = 1, \ldots, G$ (derived using (20) with $\boldsymbol{\rho}_g = \hat{\boldsymbol{\rho}}_g^0$), is applied for obtaining a reverse-engineered signal $\mathbf{Y}^0$. The elements $y_h^0(t_i)$, $i = 1, \ldots, n$, of $\mathbf{Y}^0$ are given by

$$y_h^0(t_i) = \sqrt{2P} \cos(\omega_p t_i + s), \quad i = 1, \ldots, n. \tag{21}$$

The phase $s$ of the cosine function is unknown; however, $\hat{\boldsymbol{\Sigma}}^1$ is estimable using

$$\hat{\boldsymbol{\Sigma}}^1 = \min_{s \in [0, 2\pi]} \frac{\left\| \mathbf{Y} - \mathbf{Y}^0 \right\|^2}{n}, \tag{22}$$

where $\| \cdot \|$ is the Euclidean norm and the resulting scalar serves as the common diagonal entry $\sigma^2$ of (17). With estimates $\mathbf{D}_m^1$ and $\hat{\boldsymbol{\Sigma}}^1$, the estimates $\mathbf{Q}_g^1$, $g = 1, \ldots, G$, in the first iteration are hence given by (15). After this, the $\mathbf{Q}_g^1$ are inserted into the right-hand side of (19), and updated estimates $\hat{\boldsymbol{\rho}}_g^1$, $g = 1, \ldots, G$, are derived. The algorithm consists of repeating these steps and updating $\mathbf{Q}_g^k$ and $\hat{\boldsymbol{\rho}}_g^k$ iteratively, where $k$ denotes the iteration number, until a termination criterion is reached. If the process stops at the $K$th iteration, then the final RIAA periodogram is given by (20) using $\hat{\boldsymbol{\rho}}_g^K$. The pseudocode in Algorithm 1 represents a concise description of the iterative RIAA process.
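The iteration described in this section can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions rather than the authors' MATLAB implementation: the names `solve`, `wls`, `power`, and `riaa` are hypothetical, the minimization over the phase $s$ in (22) is approximated by a grid search with `n_phase` candidates, and a small floor on the noise variance is added so that $\mathbf{Q}_g$ stays invertible for nearly noise-free input.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(M, B):
    """Solve M X = B (M: n x n, B: n x k) by Gaussian elimination with pivoting."""
    n, k = len(M), len(B[0])
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for j in range(col, n + k):
                aug[r][j] -= f * aug[col][j]
    X = [[0.0] * k for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(k):
            tail = sum(aug[i][m] * X[m][j] for m in range(i + 1, n))
            X[i][j] = (aug[i][n + j] - tail) / aug[i][i]
    return X


def wls(y, c, s, Q=None):
    """Fit y ~ a*c + b*s by weighted LS, eq. (19); Q=None gives plain LS, eq. (6)."""
    if Q is None:
        qy, qc, qs = y, c, s
    else:
        X = solve(Q, [list(row) for row in zip(y, c, s)])  # Q^-1 [y c s]
        qy, qc, qs = ([row[j] for row in X] for j in range(3))
    r11, r12, r22 = dot(c, qc), dot(c, qs), dot(s, qs)
    r1, r2 = dot(c, qy), dot(s, qy)
    det = r11 * r22 - r12 * r12
    return (r22 * r1 - r12 * r2) / det, (r11 * r2 - r12 * r1) / det


def power(rho, c, s):
    """Periodogram value of eq. (20): (1/n) * ||a*c + b*s||^2."""
    return sum((rho[0] * ci + rho[1] * si) ** 2 for ci, si in zip(c, s)) / len(c)


def riaa(t, y, omegas, max_iter=15, tol=0.005, n_phase=360):
    n, G = len(t), len(omegas)
    C = [[math.cos(w * ti) for ti in t] for w in omegas]
    S = [[math.sin(w * ti) for ti in t] for w in omegas]
    rho = [wls(y, C[g], S[g]) for g in range(G)]  # initialization via eq. (6)
    for _ in range(max_iter):
        phi = [power(rho[g], C[g], S[g]) for g in range(G)]
        p = max(range(G), key=phi.__getitem__)
        amp = math.sqrt(2.0 * phi[p])
        # Eqs. (21)-(22): residual of the best-phase sinusoid at omega_p.
        sigma2 = min(
            sum((yi - amp * math.cos(omegas[p] * ti + 2.0 * math.pi * q / n_phase)) ** 2
                for ti, yi in zip(t, y)) / n
            for q in range(n_phase))
        sigma2 = max(sigma2, 1e-10)  # assumed floor, keeps Q invertible
        D = [(a * a + b * b) / 2.0 for a, b in rho]  # scalar part of eq. (16)
        # sigma2*I plus the sum over all m of A_m D_m A_m^T.
        full = [[(sigma2 if i == j else 0.0)
                 + sum(D[m] * (C[m][i] * C[m][j] + S[m][i] * S[m][j]) for m in range(G))
                 for j in range(n)] for i in range(n)]
        new = []
        for g in range(G):
            Qg = [[full[i][j] - D[g] * (C[g][i] * C[g][j] + S[g][i] * S[g][j])
                   for j in range(n)] for i in range(n)]  # eq. (15): drop m = g
            new.append(wls(y, C[g], S[g], Qg))
        d_old = [math.hypot(a, b) for a, b in rho]
        d_new = [math.hypot(a, b) for a, b in new]
        rho = new
        change = math.sqrt(sum((u - v) ** 2 for u, v in zip(d_new, d_old)))
        if change < tol * math.sqrt(sum(v * v for v in d_old)):
            break  # termination rule of Algorithm 1
    return [power(rho[g], C[g], S[g]) for g in range(G)]
```

The function returns the periodogram values on the supplied grid, so its output can be passed directly to a peak test. Building the full sum once and subtracting the term for $m = g$ avoids recomputing (15) from scratch for every grid point.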

Algorithm 1: The pseudocode of the iterative process in RIAA.

Algorithm RIAA
Initialization: Use (6) to obtain the initial estimates $\hat{a}$ and $\hat{b}$ in $\hat{\boldsymbol{\rho}}_g^0$.
The first iteration: Obtain $\mathbf{D}_m^1$ using (16) with parameters $\hat{a}$ and $\hat{b}$ given by $\hat{\boldsymbol{\rho}}_g^0$. Obtain $\hat{\boldsymbol{\Sigma}}^1$ using (22). Use $\mathbf{D}_m^1$ and $\hat{\boldsymbol{\Sigma}}^1$ to derive the first weighting matrix $\mathbf{Q}_g^1$ by (15). Update the estimate $\hat{\boldsymbol{\rho}}_g^1$ by (19) with $\mathbf{Q}_g = \mathbf{Q}_g^1$.
Updating iteration: At the $k$th iteration, $k = 1, 2, \ldots$, the estimates $\mathbf{Q}_g^k$ and $\hat{\boldsymbol{\rho}}_g^k$ are iteratively updated in the same way as in the first iteration.
Termination: Terminate simply after 15 iterations ($K = 15$) or when the total change in $d_g^k = \|\hat{\boldsymbol{\rho}}_g^k\|$ for $g = 1, \ldots, G$ is extremely small, say, $\sqrt{\sum_{g=1}^{G} (d_g^k - d_g^{k-1})^2} < 0.005 \sqrt{\sum_{g=1}^{G} (d_g^{k-1})^2}$; then $K = k$.

3. Methods

Figure 1 demonstrates our scheme for periodicity detection and algorithm comparison. The first step involves periodogram estimation, which converts the time-course gene expression ratios into the frequency domain. Three methods are considered for comparison: RIAA, LS, and a detrended LS (termed DLS), which applies an additional detrend function (developed in LSPR) before regular LS periodogram estimation. The derived spectra are then analyzed using hypothesis testing. This study uses Fisher's test, with the null hypothesis that there are no periodic signals in the time domain and hence no significantly large peak in the derived spectra. The algorithm performance is evaluated and compared via simulations and receiver operating characteristic (ROC) curves. In the real microarray data analysis, three published benchmark sets are utilized as standards of cell cycle genes for performance comparison.

Figure 1: The scheme of the process for detecting periodicities in time-course expression data. (Flowchart: time-course expression ratios; spectral analysis in the frequency domain by periodogram estimation, with RIAA compared with LS and DLS; hypothesis testing with Fisher's test; output of periodic genes and nonperiodicities. Evaluation uses simulations with ROC curves and real data with benchmark sets.)

3.1. Fisher's Test. After the spectrum of time-course expression data is obtained via periodogram estimation, Fisher's statistic $f$ for gene $h$, with the null hypothesis $H_0$ that the peak of the spectral density is insignificant against the alternative hypothesis $H_1$ that the peak of the spectral density is significant, is applied as

$$f_h = \frac{\max_{1 \le g \le G} \Phi(\omega_g)}{G^{-1} \sum_{g=1}^{G} \Phi(\omega_g)}, \tag{23}$$

where $\Phi$ refers to the periodogram derived using RIAA, LS, or DLS. The null hypothesis $H_0$ is rejected, and gene $h$ is claimed as a periodic gene, if its $p$-value, denoted as $p_h$, is less than or equal to a specific significance threshold. For simplicity, $p_h$ is approximated from the asymptotic null distribution of $f$ assuming Gaussian noise [13] as follows:

$$p_h = 1 - e^{-n e^{-f_h}}. \tag{24}$$

In real data analysis, the estimation of $p_h$ might be biased when the time-course data are short. This issue was carefully addressed by Liew et al. [14], and, as suggested there, alternative methods such as random permutation may provide less deviation and better performance. However, permutation also has limitations, such as tending to be conservative [15]. While finding the most robust method for the $p$-value evaluation remains an open question, it is beyond the scope of this study, since the algorithm comparison via ROC curves is threshold independent [16] and the results are unaffected by the deviation.
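As a small worked example of (23) and (24), the Python sketch below computes Fisher's statistic and the asymptotic $p$-value from a list of periodogram values. The function name `fisher_test` is an assumption made for this illustration.

```python
import math


def fisher_test(phi, n):
    """Return (f, p) for a periodogram phi sampled on G grid points.

    f is the peak-to-mean ratio of eq. (23); p follows the asymptotic
    approximation of eq. (24), with n the number of time points.
    """
    f = max(phi) / (sum(phi) / len(phi))
    # 1 - exp(-x), computed with expm1 to keep precision for small x.
    p = -math.expm1(-n * math.exp(-f))
    return f, p
```

A flat spectrum gives $f = 1$ and a $p$-value close to 1, whereas a spectrum dominated by a single peak gives a large $f$ and a small $p$-value.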

3.2. Simulations. Simulations are applied to evaluate the performance of RIAA. The simulation models and sampling strategies used for simulations are described in the following paragraphs.

3.2.1. Periodic and Nonperiodic Signals. Three models, one for periodic signals and two for nonperiodic signals, are considered as transcriptional signals. Since periodic genes are transcribed in an oscillatory manner, the expression levels $y_s$ embedded with periodicities are assumed to be

$$y_s(t_i) = M \cos(\omega_s t_i) + \epsilon_{t_i}, \quad i = 1, \ldots, n, \tag{25}$$

where $M$ denotes the sinusoidal amplitude, $\omega_s$ refers to the signal frequency, and $\epsilon_{t_i}$ are Gaussian noise terms, independent and identically distributed (i.i.d.) with parameters $\mu$ and $\sigma$. For nonperiodic signals, the first model $y_n$ is simply composed of Gaussian noise, given by

$$y_n(t_i) = \epsilon_{t_i}, \quad i = 1, \ldots, n. \tag{26}$$

Figure 2: (a) A time-course periodic signal with frequency = 0.2 sampled by the bio-like sampling strategy; 16 time points are assigned to the interval $(0, 8]$ and 8 time points are assigned to the interval $(8, 16]$ (axes: time versus gene expression; legend: sampled data, periodic signal). (b) The periodogram derived using RIAA (axes: frequency versus amplitude). The maximum value (peak) in the periodogram is located at frequency = 0.195 ($p$-value $= 2.4 \times 10^{-3}$).

Additionally, as visualized by Chubb et al., gene transcription can be nonperiodically activated with irregular intervals in a living eukaryotic cell, like pulses turning on and off rapidly and discontinuously [17]. Based on this, the second nonperiodic model $y'_n$ incorporates one additional transcriptional burst and one additional sudden drop into the Gaussian noise, which can be written as

$$y'_n(t_i) = I_b(t_i) - I_d(t_i) + \epsilon_{t_i}, \quad i = 1, \ldots, n, \tag{27}$$

where $I_b$ and $I_d$ are indicator functions, equal to 1 at the location of the burst and the drop, respectively, and 0 otherwise. The transcriptional burst assumes a positive pulse, while the transcriptional drop assumes a negative pulse. Both of them may be located randomly among all time points and are assumed to last for two time points. In other words, the indicator functions are equal to 1 at two consecutive time points, say, $I_b = 1$ at $t_i$ and $t_{i+1}$. The burst and the drop have no overlap.
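The three signal models (25)-(27) can be sketched as follows. The function names are hypothetical, the noise mean is fixed at $\mu = 0$ as in the reported simulations, and the burst and drop positions are drawn by simple rejection sampling so that the two pulses never share a time point; the paper does not specify how the positions were drawn.

```python
import math
import random


def periodic_signal(t, M, omega_s, sigma, rng):
    """Eq. (25): sinusoid of amplitude M plus i.i.d. zero-mean Gaussian noise."""
    return [M * math.cos(omega_s * ti) + rng.gauss(0.0, sigma) for ti in t]


def noise_signal(t, sigma, rng):
    """Eq. (26): zero-mean Gaussian noise only."""
    return [rng.gauss(0.0, sigma) for _ in t]


def burst_drop_signal(t, sigma, rng):
    """Eq. (27): noise plus one burst (+1) and one drop (-1).

    Each pulse lasts two consecutive time points and the two pulses do
    not overlap. Positions are chosen uniformly (an assumption).
    """
    n = len(t)
    while True:
        b = rng.randrange(n - 1)  # burst occupies indices b and b + 1
        d = rng.randrange(n - 1)  # drop occupies indices d and d + 1
        if abs(b - d) >= 2:       # the two index pairs are disjoint
            break
    y = [rng.gauss(0.0, sigma) for _ in t]
    for i in (b, b + 1):
        y[i] += 1.0
    for i in (d, d + 1):
        y[i] -= 1.0
    return y
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible.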

3.2.2. Sampling Strategies. As for the choices of sampling time points $t_i$, $i = 1, \ldots, n$, four different sampling strategies, one with regular sampling and three with irregular sampling, are considered. First, regular sampling is applied, in which all time intervals are set to be $1/c$, where $c$ is a constant. Secondly, a bio-like sampling strategy is invoked. This strategy tends to have more time points at the beginning of time-course experiments and fewer time points afterwards: we set the first $2/3$ of the time intervals to $1/c$ and the remaining $1/3$ of the time intervals to $2/c$. Third, time intervals are randomly chosen between $1/c$ and $2/c$. The last sampling strategy, in which all time intervals are exponentially distributed with parameter $c$, is less realistic than the others, but it is helpful for evaluating the performance of RIAA under pathological conditions.
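The four sampling strategies can be sketched as interval generators followed by a cumulative sum. Several details here are assumptions made for illustration: the first time point is placed one interval after time 0 (consistent with the time points of Figure 2 lying in $(0, 8]$ and $(8, 16]$), the third strategy picks each interval as either $1/c$ or $2/c$ with equal probability (one reading of the "binomially random" label used in Section 4), and the exponential parameter $c$ is treated as a rate, so the mean interval is $1/c$.

```python
import random


def _cumulative(intervals):
    """Turn a list of time intervals into increasing sampling times."""
    t, times = 0.0, []
    for dt in intervals:
        t += dt
        times.append(t)
    return times


def regular(n, c):
    return _cumulative([1.0 / c] * n)


def bio_like(n, c):
    k = (2 * n) // 3  # first 2/3 of the intervals are short
    return _cumulative([1.0 / c] * k + [2.0 / c] * (n - k))


def random_intervals(n, c, rng):
    # Assumption: each interval is 1/c or 2/c with probability 1/2.
    return _cumulative([rng.choice((1.0, 2.0)) / c for _ in range(n)])


def exponential(n, c, rng):
    # Assumption: c is the rate, so intervals have mean 1/c.
    return _cumulative([rng.expovariate(c) for _ in range(n)])
```

With $n = 24$ and $c = 2$, `bio_like` yields 16 time points up to $t = 8$ and 8 more up to $t = 16$.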

ROC curves are applied for performance comparison. To this end, 10,000 periodic signals were generated using (25), and 10,000 nonperiodic signals were generated using either (26) or (27). Sensitivity measures the proportion of successful detections among the 10,000 periodic signals, and specificity measures the proportion of correct claims on the 10,000 nonperiodic simulation datasets. Sampling time points are decided by one of the four sampling strategies, and the number of time points $n$ is chosen arbitrarily. For all ROC curves in Section 4, $c = 2$ and $n = 24$.
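The sensitivity and specificity definitions above translate into a short routine that sweeps a $p$-value threshold and collects one ROC point per threshold. The function name `roc_points` and the input layout (two lists of $p$-values) are assumptions for this sketch.

```python
def roc_points(p_periodic, p_nonperiodic, thresholds):
    """Return (1 - specificity, sensitivity) pairs, one per threshold.

    A signal is claimed periodic when its p-value is at or below the
    threshold, matching the rejection rule used with Fisher's test.
    """
    points = []
    for thr in thresholds:
        detected = sum(p <= thr for p in p_periodic)      # true positives
        rejected = sum(p > thr for p in p_nonperiodic)    # true negatives
        sensitivity = detected / len(p_periodic)
        specificity = rejected / len(p_nonperiodic)
        points.append((1.0 - specificity, sensitivity))
    return points
```

Because the curve is traced over all thresholds, the comparison between algorithms does not depend on any single significance level.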

3.3. Real Data Analysis. Two yeast cell cycle experiments synchronized using an alpha-factor, one conducted by Spellman et al. [2] and one conducted by Pramila et al. [18], are considered for a real data analysis. The first time-course microarray dataset, termed dataset alpha and downloaded from the Yeast Cell Cycle Analysis Project website (http://genome-www.stanford.edu/cellcycle), harbors 6178 gene expression levels and 18 sampling time points with a 7-minute interval. The second time-course dataset, termed dataset alpha 38, was downloaded from the online portal for Fred Hutchinson Cancer Research Center's scientific laboratories (http://labs.fhcrc.org/breeden/cellcycle). This dataset contains 4774 gene expression levels and 25 sampling time points with a 5-minute interval. Three benchmark sets of genes that have been utilized in Lichtenberg et al. [19] and Liew et al. [20] as standards of cell cycle genes are also applied herein for performance comparison. These benchmark sets, involving 113, 352, and 518 genes, respectively, include candidates of cell cycle regulated genes in yeast proposed by Spellman et al. [2], Johansson et al. [21], Simon et al. [22], Lee et al. [23], and Mewes et al. [24], and are accessible at a laboratory website (http://www.cbs.dtu.dk/cellcycle).

4 Results

RIAA performed well in the conducted simulations Asshown in Figure 2(a) a periodic signal (solid line) withamplitude 119872 = 1 and frequency 120596119904 = 04120587 is sampled

6 Advances in Bioinformatics

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(a)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(b)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(c)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(d)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(e)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(f)

RIAALSDLS

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(g)

RIAALSDLS

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(h)

Figure 3 The ROC curves derived from simulations with 24 sampling time points signal amplitude119872 = 1 120596119904 = 04120587 and Gaussian noise120583 = 0 and 120590 = 05 Description of subplots is provided in Section 4

Advances in Bioinformatics 7

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(a)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(b)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(c)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(d)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(e)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(f)

RIAALSDLS

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(g)

RIAALSDLS

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(h)

Figure 4 The ROC Curves derived from simulations with 24 sampling time points signal amplitude119872 = 1 120596119904 = 01120587 and Gaussian noise120583 = 0 and 120590 = 05 Description of subplots is provided in Section 4

[Figure 5: six panels (a)–(f); x-axis: the number of preserved genes (200 to 1000); y-axis: the number of intersection; legend: RIAA, LS, DLS. Panels (a) and (d): 113-gene benchmark set; panels (b) and (e): 352-gene benchmark set; panels (c) and (f): 518-gene benchmark set.]

Figure 5: The intersection of preserved genes and the benchmark sets using the RIAA, LS, and DLS algorithms. (a), (b), and (c) reveal the analysis results when dataset alpha was applied; (d), (e), and (f) reveal the analysis results when dataset alpha 38 was applied.

using the bio-like sampling strategy, which applies 16 time points in (0, 8] and 8 more time points in (8, 16]. Gaussian noise with parameters $\mu = 0$ and $\sigma = 0.5$ is assumed during microarray experiments. The resulting time-course expression levels (dots) at a total of 24 time points and the sampling time information were treated as inputs to the RIAA algorithm. Figure 2(b) demonstrates the result of periodogram estimation. In this example, the grid size $\Delta\omega$ was chosen to be 0.065, and a total of 11 amplitudes corresponding to different frequencies were obtained and shown in the spectrum. Using Fisher's test, the peak at the third grid (frequency = 0.195) was found to be significantly large ($p$-value = $2.4 \times 10^{-3}$), and hence a periodic gene was claimed.

ROC curves illustrate the performance of RIAA. In Figures 3 and 4, subplots (a)-(b), (c)-(d), (e)-(f), and (g)-(h) refer to the simulations with regular, bio-like, binomially random, and exponentially random sampling strategies, respectively. Additionally, in the left-hand side subplots (a), (c), (e), and (g), nonperiodic signals were simply Gaussian noise with parameters $\mu = 0$ and $\sigma = 0.5$, while in the right-hand side subplots (b), (d), (f), and (h), nonperiodic signals involve not only the Gaussian noise but also a transcriptional burst and a sudden drop (27). Periodic signals were generated using (25) with amplitude $M = 1$, $c = 2$, and $n = 24$. The only difference in simulation settings between Figures 3 and 4 is the frequency of the periodic signals: $\omega_s = 0.4\pi$ and $0.1\pi$, respectively. As shown in these figures, LS and DLS can perform as well as RIAA when the time-course data are regularly sampled or mildly irregularly sampled; however, when data are highly irregularly sampled, RIAA outperforms the others. The superiority of RIAA over DLS is particularly clear when the signal frequency is small.

Figure 5 illustrates the results of the real data analysis when the three algorithms, namely, RIAA, LS, and DLS, were applied. On the $x$-axis, the numbers indicate the thresholds $\eta$, that is, the numbers of genes that we preserved and classified as periodic among all yeast genes; on the $y$-axis, the numbers refer to the intersection of the $\eta$ preserved genes and the proposed periodic candidates listed in the benchmark sets. Figures 5(a)–5(c) demonstrate the results derived from dataset alpha when the 113-gene benchmark set, 352-gene benchmark set, and 518-gene benchmark set were applied, respectively. Similarly, Figures 5(d)–5(f) demonstrate the results derived from dataset alpha 38. In most of these cases, RIAA does not result in significant differences in the numbers of intersections when compared to those corresponding to LS and DLS. However, RIAA shows slightly better coverage when dataset alpha 38 and the 113-gene benchmark set were utilized (Figure 5(d)).
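The intersection counts plotted in Figure 5 can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes that genes are ranked by ascending $p$-value and that the $\eta$ top-ranked genes are the ones preserved. The function name `benchmark_overlap` and its arguments are hypothetical.

```python
import numpy as np

def benchmark_overlap(gene_ids, p_values, benchmark, etas):
    """Count benchmark genes among the eta top-ranked genes, for each eta.

    Assumption: genes are ranked by ascending p-value (most significant first).
    """
    order = np.argsort(np.asarray(p_values, dtype=float), kind="stable")
    ranked = [gene_ids[i] for i in order]
    bench = set(benchmark)
    # hits[k] = number of benchmark genes among the first k + 1 ranked genes
    hits = np.cumsum([gene in bench for gene in ranked])
    return [int(hits[eta - 1]) if eta > 0 else 0 for eta in etas]
```

Each value of `eta` must not exceed the number of genes. Calling the function once per algorithm (RIAA, LS, DLS) and per benchmark set would give one curve per panel.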

5. Conclusions

In this study, simulations specifically designed to conform to real experiments reveal that RIAA can outperform the classical LS and modified DLS algorithms when the sampling time points are highly irregular and when the number of cycles covered by the sampling times is very limited. These characteristics, as also claimed in the original study by Stoica et al. [12], suggest that RIAA can be generally applied to detect periodicities in time-course gene expression data, with good potential to yield better results. A supplementary simulation further shows the superiority of RIAA over LS and DLS when multiple periodic signals are considered (see Supplementary Figure S1, available online at http://dx.doi.org/10.1155/2013/171530). From the simulations, we also learned that the addition of a transcriptional burst and a sudden drop to nonperiodic signals (the negatives) does not affect the power of RIAA in terms of periodicity detection. Moreover, the detrend function in DLS, designed to improve LS by removing the linearity in time-course data, may fail to provide improved accuracy and makes the algorithm unable to detect periodicities when transcription oscillates with a very low frequency.

The intersection of detected candidates and proposed periodic genes in the real data analysis (Figure 5) does not reveal much difference among RIAA, LS, and DLS. One possible reason is that the sampling time points in the yeast experiments are not highly irregular (not many missing values are included); as demonstrated in Figures 3(a)–3(d), RIAA performs only as well as the LS and DLS algorithms when the time-course data are regularly or mildly irregularly sampled. Also, the very limited number of time points contained in the datasets may bias the estimation of $p$-values [14] and thus hinder RIAA from exhibiting its advantage. Besides, the number of true cell cycle genes included in the benchmark sets remains uncertain. We expect that the superiority of RIAA in real data analysis would be clearer in the future, when more studies and more datasets become available.

Besides the comparison of these algorithms, it is interesting to note that the bio-like sampling strategy could lead to better detection of periodicities than the regular sampling strategy (as shown in Figures 3(c) and 3(d)). When the number of time points is limited, it might be beneficial to apply wider sampling intervals in the later periods of an experiment to prolong the experimental time coverage.

Acknowledgments

The authors would like to thank the members of the Genomic Signal Processing Laboratory, Texas A&M University, for the helpful discussions and valuable feedback. This work was supported by the National Science Foundation under Grant no. 0915444. The RIAA MATLAB code is available at http://gsp.tamu.edu/Publications/supplementary/agyepong12a.

References

[1] W. Zhao, K. Agyepong, E. Serpedin, and E. R. Dougherty, "Detecting periodic genes from irregularly sampled gene expressions: a comparison study," EURASIP Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 769293, 2008.

[2] P. T. Spellman, G. Sherlock, M. Q. Zhang et al., "Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization," Molecular Biology of the Cell, vol. 9, no. 12, pp. 3273–3297, 1998.

[3] G. Rustici, J. Mata, K. Kivinen et al., "Periodic gene expression program of the fission yeast cell cycle," Nature Genetics, vol. 36, no. 8, pp. 809–817, 2004.

[4] M. Menges, L. Hennig, W. Gruissem, and J. A. H. Murray, "Cell cycle-regulated gene expression in Arabidopsis," Journal of Biological Chemistry, vol. 277, no. 44, pp. 41987–42002, 2002.

[5] M. Ahdesmäki, H. Lähdesmäki, R. Pearson, H. Huttunen, and O. Yli-Harja, "Robust detection of periodic time series measured from biological systems," BMC Bioinformatics, vol. 6, article 117, 2005.

[6] M. Ahdesmäki, H. Lähdesmäki, A. Gracey et al., "Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data," BMC Bioinformatics, vol. 8, article 233, 2007.

[7] E. F. Glynn, J. Chen, and A. R. Mushegian, "Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms," Bioinformatics, vol. 22, no. 3, pp. 310–316, 2006.

[8] R. Yang, C. Zhang, and Z. Su, "LSPR: an integrated periodicity detection algorithm for unevenly sampled temporal microarray data," Bioinformatics, vol. 27, no. 7, pp. 1023–1025, 2011.

[9] E. R. Dougherty, "Small sample issues for microarray-based classification," Comparative and Functional Genomics, vol. 2, no. 1, pp. 28–34, 2001.

[10] Y. Tu, G. Stolovitzky, and U. Klein, "Quantitative noise analysis for gene expression microarray experiments," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 22, pp. 14031–14036, 2002.

[11] Z. Bar-Joseph, "Analyzing time series gene expression data," Bioinformatics, vol. 20, no. 16, pp. 2493–2503, 2004.

[12] P. Stoica, J. Li, and H. He, "Spectral analysis of nonuniformly sampled data: a new approach versus the periodogram," IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 843–858, 2009.

[13] J. Fan and Q. Yao, Nonlinear Time Series: Nonparametric and Parametric Methods, Springer, New York, NY, USA, 2003.

[14] A. W. C. Liew, N. F. Law, X. Q. Cao, and H. Yan, "Statistical power of Fisher test for the detection of short periodic gene expression profiles," Pattern Recognition, vol. 42, no. 4, pp. 549–556, 2009.

[15] V. Berger, "Pros and cons of permutation tests in clinical trials," Statistics in Medicine, vol. 19, no. 10, pp. 1319–1328, 2000.

[16] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.

[17] J. R. Chubb, T. Trcek, S. M. Shenoy, and R. H. Singer, "Transcriptional pulsing of a developmental gene," Current Biology, vol. 16, no. 10, pp. 1018–1025, 2006.

[18] T. Pramila, W. Wu, W. Noble, and L. Breeden, "Periodic genes of the yeast Saccharomyces cerevisiae: a combined analysis of five cell cycle data sets," 2007.

[19] U. Lichtenberg, L. J. Jensen, A. Fausbøll, T. S. Jensen, P. Bork, and S. Brunak, "Comparison of computational methods for the identification of cell cycle-regulated genes," Bioinformatics, vol. 21, no. 7, pp. 1164–1171, 2005.

[20] A. W. C. Liew, J. Xian, S. Wu, D. Smith, and H. Yan, "Spectral estimation in unevenly sampled space of periodically expressed microarray time series data," BMC Bioinformatics, vol. 8, article 137, 2007.

[21] D. Johansson, P. Lindgren, and A. Berglund, "A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription," Bioinformatics, vol. 19, no. 4, pp. 467–473, 2003.

[22] I. Simon, J. Barnett, N. Hannett et al., "Serial regulation of transcriptional regulators in the yeast cell cycle," Cell, vol. 106, no. 6, pp. 697–708, 2001.

[23] T. I. Lee, N. J. Rinaldi, F. Robert et al., "Transcriptional regulatory networks in Saccharomyces cerevisiae," Science, vol. 298, no. 5594, pp. 799–804, 2002.

[24] H. W. Mewes, D. Frishman, U. Güldener et al., "MIPS: a database for genomes and protein sequences," Nucleic Acids Research, vol. 30, no. 1, pp. 31–34, 2002.


where $\omega_0$ is given by

$$\omega_0 = \frac{2(n-1)\pi}{\sum_{i=1}^{n-1}(t_{i+1}-t_i)}. \tag{9}$$

The observation interval $[0, \omega_{\max}]$ is hence chosen after $\omega_{\max}$ is obtained.

To ensure that the smallest frequency separation in time-course expression data with regular or irregular sampling can be adequately detected, the grid size $\Delta\omega$ is chosen to be

$$\Delta\omega = \frac{2\pi}{t_n - t_1}, \tag{10}$$

which, in fact, is the resolution limit of the least-squares periodogram. As a result, the frequency grids $\omega_g$ considered in the periodogram are

$$\omega_g = g\,\Delta\omega, \quad g = 1, \ldots, G, \tag{11}$$

where the number of grids $G$ is given by

$$G = \left\lfloor \frac{\omega_{\max}}{\Delta\omega} \right\rfloor. \tag{12}$$
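As a rough illustration of (9)–(12), the frequency grid can be computed from the sampling times as sketched below. The relation between $\omega_{\max}$ and $\omega_0$ is defined outside this passage; the sketch assumes $\omega_{\max} = \omega_0/2$, which reproduces the 11 grid points and the grid size of about 0.065 (in cycles per unit time) quoted for the Figure 2 example. The function name `frequency_grid` is hypothetical.

```python
import numpy as np

def frequency_grid(t):
    """Return (omegas, delta_omega, omega_max) for sampling times t.

    Assumption: omega_max = omega_0 / 2 (not stated in this passage).
    """
    t = np.asarray(t, dtype=float)
    n = t.size
    omega_0 = 2.0 * (n - 1) * np.pi / np.sum(np.diff(t))   # equation (9)
    omega_max = omega_0 / 2.0                               # assumed relation
    delta_omega = 2.0 * np.pi / (t[-1] - t[0])              # equation (10)
    n_grid = int(np.floor(omega_max / delta_omega))         # equation (12)
    omegas = delta_omega * np.arange(1, n_grid + 1)         # equation (11)
    return omegas, delta_omega, omega_max
```

For the bio-like sampling with 16 time points in (0, 8] and 8 in (8, 16], $t_n - t_1 = 15.5$, so $\Delta\omega/(2\pi) = 1/15.5 \approx 0.065$ and $G = \lfloor 11.5 \rfloor = 11$.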

2.3. Implementation. The following notations are introduced for the implementation of RIAA at a specific frequency $\omega_g$:

$$\mathbf{Y} = [y_h(t_1) \;\cdots\; y_h(t_n)]^T, \quad \boldsymbol{\rho}_g = [a(\omega_g) \;\; b(\omega_g)]^T, \quad \mathbf{A}_g = [\mathbf{c}_g \;\; \mathbf{s}_g], \tag{13}$$

where

$$\mathbf{c}_g = [\cos(\omega_g t_1) \;\cdots\; \cos(\omega_g t_n)]^T, \quad \mathbf{s}_g = [\sin(\omega_g t_1) \;\cdots\; \sin(\omega_g t_n)]^T, \tag{14}$$

and $a(\omega_g)$ and $b(\omega_g)$ denote the variables $a$ and $b$ at frequency $\omega_g$, respectively.

RIAA's salient feature is the addition of a weighted matrix $\mathbf{Q}_g$ to the least-squares fitting criterion. The weighted matrix $\mathbf{Q}_g$ can be viewed as a covariance matrix encapsulating the contributions to the spectrum of noise and of the sinusoidal components in $\mathbf{Y}$ at frequencies other than $\omega_g$; it is defined as

$$\mathbf{Q}_g = \boldsymbol{\Sigma} + \sum_{m=1,\, m \neq g}^{G} \mathbf{A}_m \mathbf{D}_m \mathbf{A}_m^T, \tag{15}$$

where

$$\mathbf{D}_m = \frac{a^2(\omega_m) + b^2(\omega_m)}{2} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \tag{16}$$

and $\boldsymbol{\Sigma}$ denotes the $n \times n$ covariance matrix of the noise in the expression data $\mathbf{Y}$, given by

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma^2 & & 0 \\ & \ddots & \\ 0 & & \sigma^2 \end{bmatrix}. \tag{17}$$

Assuming that $\mathbf{Q}_g$ is invertible, in RIAA a weighted least-squares fitting problem is formulated and considered for finding $a$ and $b$ (instead of using (5)), and it is written in matrix form using (13) as follows:

$$\hat{\boldsymbol{\rho}}_g = \arg\min_{\boldsymbol{\rho}_g} \, [\mathbf{Y} - \mathbf{A}_g \boldsymbol{\rho}_g]^T \mathbf{Q}_g^{-1} [\mathbf{Y} - \mathbf{A}_g \boldsymbol{\rho}_g]. \tag{18}$$

In Stoica et al. [12], the solution to (18) has been shown to be

$$\hat{\boldsymbol{\rho}}_g = \left(\mathbf{A}_g^T \mathbf{Q}_g^{-1} \mathbf{A}_g\right)^{-1} \mathbf{A}_g^T \mathbf{Q}_g^{-1} \mathbf{Y}, \tag{19}$$

and the RIAA periodogram at $\omega = \omega_g$ can be derived by

$$\Phi_{\mathrm{riaa}}(\omega_g) = \frac{1}{n} \hat{\boldsymbol{\rho}}_g^T \left(\mathbf{A}_g^T \mathbf{A}_g\right) \hat{\boldsymbol{\rho}}_g. \tag{20}$$

From (15) and (19), it is obvious that $\mathbf{Q}_g$ and $\hat{\boldsymbol{\rho}}_g$ are dependent on each other. An iterative approach (i.e., RIAA) is hence a feasible solution to get the estimate $\hat{\boldsymbol{\rho}}_g$ and the weighted matrix $\mathbf{Q}_g$.

The iteration for estimating the spectrum starts with initial estimates $\hat{\boldsymbol{\rho}}_g^0$, in which the elements $a$ and $b$ are given by (6) with $\omega = \omega_g$, $g = 1, \ldots, G$. After initialization, the first iteration begins. First, the elements $a$ and $b$ of $\hat{\boldsymbol{\rho}}_g^0$ are applied to obtain $\mathbf{D}_m^1$ using (16). Secondly, to get a good estimate of the noise covariance $\boldsymbol{\Sigma}^1$, the frequency $\omega_p$ at which the largest value $P$ is located in the temporary periodogram $\Phi^0(\omega_g)$, $g = 1, \ldots, G$, derived using (20) with $\hat{\boldsymbol{\rho}}_g = \hat{\boldsymbol{\rho}}_g^0$, is applied for obtaining a reverse-engineered signal $\mathbf{Y}^0$. The elements $\hat{y}_h(t_i)$, $i = 1, \ldots, n$, in $\mathbf{Y}^0$ are given by

$$\hat{y}_h(t_i) = \sqrt{2P}\cos(\omega_p t_i + s), \quad i = 1, \ldots, n. \tag{21}$$

The phase $s$ of the cosine function is unknown; however, the noise variance $(\hat{\sigma}^{1})^2$ on the diagonal of $\boldsymbol{\Sigma}^1$ is estimable using

$$(\hat{\sigma}^{1})^2 = \min_{s \in [0, 2\pi]} \frac{\left\| \mathbf{Y} - \mathbf{Y}^0 \right\|^2}{n}, \tag{22}$$

where $\|\cdot\|$ is the Euclidean norm. With estimates $\mathbf{D}_m^1$ and $\boldsymbol{\Sigma}^1$, the estimates $\mathbf{Q}_g^1$, $g = 1, \ldots, G$, in the first iteration are hence given by (15). After this, $\mathbf{Q}_g^1$ are inserted into the right-hand side of (19), and updated estimates $\hat{\boldsymbol{\rho}}_g^1$, $g = 1, \ldots, G$, are derived. The algorithm consists of repeating these steps and updating $\mathbf{Q}_g^k$ and $\hat{\boldsymbol{\rho}}_g^k$ iteratively, where $k$ denotes the number of iterations, until a termination criterion is reached. If the process stops at the $K$th iteration, then the final RIAA periodogram is given by (20) using $\hat{\boldsymbol{\rho}}_g^K$. The pseudocode in Algorithm 1 represents a concise description of the iterative RIAA process.
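The iterative procedure can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' MATLAB implementation: the initial estimates of (6) are taken to be ordinary least-squares fits at each grid frequency, the minimization over the phase $s$ in (22) is done on a uniform grid of `n_phase` candidate phases, and a small floor on the noise variance is added as a numerical safeguard. The function name `riaa_periodogram` and its parameters are hypothetical.

```python
import numpy as np

def riaa_periodogram(t, y, omegas, max_iter=15, tol=0.005, n_phase=360):
    """Sketch of the RIAA periodogram on the grid `omegas` (angular frequencies)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    omegas = np.asarray(omegas, dtype=float)
    n, G = t.size, omegas.size
    # A_g = [c_g s_g] from (13)-(14): one n-by-2 matrix per grid frequency.
    A = [np.column_stack((np.cos(w * t), np.sin(w * t))) for w in omegas]

    def periodogram(rho):
        # Equation (20) evaluated at every grid frequency.
        return np.array([rho[g] @ (A[g].T @ A[g]) @ rho[g] for g in range(G)]) / n

    # Initial estimates: plain least squares per frequency (assumed form of (6)).
    rho = np.array([np.linalg.lstsq(A[g], y, rcond=None)[0] for g in range(G)])
    d_prev = np.linalg.norm(rho, axis=1)
    phases = np.linspace(0.0, 2.0 * np.pi, n_phase, endpoint=False)

    for _ in range(max_iter):
        phi = periodogram(rho)
        p = int(np.argmax(phi))
        # Equations (21)-(22): fit the dominant sinusoid over candidate phases.
        fits = np.sqrt(2.0 * phi[p]) * np.cos(omegas[p] * t[None, :] + phases[:, None])
        sigma2 = np.min(np.sum((y[None, :] - fits) ** 2, axis=1)) / n
        sigma2 = max(sigma2, 1e-8)  # safeguard (assumption): keeps Q_g invertible
        # Equations (15)-(16): Q_g = Sigma + sum over m != g of A_m D_m A_m^T.
        power = np.sum(rho ** 2, axis=1) / 2.0
        total = sigma2 * np.eye(n)
        for m in range(G):
            total += power[m] * (A[m] @ A[m].T)
        new_rho = np.empty_like(rho)
        for g in range(G):
            Q = total - power[g] * (A[g] @ A[g].T)
            Qinv_A = np.linalg.solve(Q, A[g])                             # Q_g^{-1} A_g
            new_rho[g] = np.linalg.solve(A[g].T @ Qinv_A, Qinv_A.T @ y)  # equation (19)
        rho = new_rho
        # Termination rule of Algorithm 1 on d_g = ||rho_g||.
        d = np.linalg.norm(rho, axis=1)
        converged = np.linalg.norm(d - d_prev) < tol * np.linalg.norm(d_prev)
        d_prev = d
        if converged:
            break
    return periodogram(rho)
```

Solving linear systems with `np.linalg.solve` avoids forming $\mathbf{Q}_g^{-1}$ explicitly; since the time series are short ($n$ in the tens), the cost of rebuilding $\mathbf{Q}_g$ for every grid point is negligible.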

Algorithm RIAA

Initialization: Use (6) to obtain the initial estimates $a$ and $b$ in $\hat{\boldsymbol{\rho}}_g^0$.

The First Iteration: Obtain $\mathbf{D}_m^1$ using (16) with parameters $a$ and $b$ given by $\hat{\boldsymbol{\rho}}_g^0$. Obtain $\boldsymbol{\Sigma}^1$ using (22). Use $\mathbf{D}_m^1$ and $\boldsymbol{\Sigma}^1$ to derive the first weighted matrix $\mathbf{Q}_g^1$ by (15). Update the estimate $\hat{\boldsymbol{\rho}}_g^1$ by (19) with $\mathbf{Q}_g = \mathbf{Q}_g^1$.

Updating Iteration: At the $k$th iteration, $k = 1, 2, \ldots$, the estimates $\mathbf{Q}_g^k$ and $\hat{\boldsymbol{\rho}}_g^k$ are iteratively updated in the same way as in the first iteration.

Termination: Terminate simply after 15 iterations ($K = 15$) or when the total change in $d_g^k = \|\hat{\boldsymbol{\rho}}_g^k\|$ for $g = 1, \ldots, G$ is extremely small, say, $\sqrt{\sum_{g=1}^{G} (d_g^k - d_g^{k-1})^2} < 0.005 \sqrt{\sum_{g=1}^{G} (d_g^{k-1})^2}$; then $K = k$.

Algorithm 1: The pseudocode of the iterative process in RIAA.

3. Methods

Figure 1 demonstrates our scheme for periodicity detection and algorithm comparison. The first step involves a periodogram estimation, which converts the time-course gene expression ratios into the frequency domain. Three methods are considered for comparison: RIAA, LS, and a detrend LS (termed DLS), which uses an additional detrend function (developed in LSPR) before regular LS periodogram estimation is applied. The derived spectra are then analyzed using hypothesis testing. This study is conducted using Fisher's test, with the null hypothesis that there are no periodic signals in the time domain and hence no significantly large peak in the derived spectra. The algorithm performance is evaluated and compared via simulations and receiver operating characteristic (ROC) curves. In real microarray data analysis, three published benchmark sets are utilized as standards of cell cycle genes for performance comparison.

3.1. Fisher's Test. After the spectrum of time-course expression data is obtained via periodogram estimation, a Fisher's statistic $f$ for gene $h$, with the null hypothesis $H_0$ that the peak of the spectral density is insignificant against the alternative hypothesis $H_1$ that the peak of the spectral density is significant, is applied as

$$f_h = \frac{\max_{1 \le g \le G} \Phi(\omega_g)}{G^{-1} \sum_{g=1}^{G} \Phi(\omega_g)}, \tag{23}$$

where $\Phi$ refers to the periodogram derived using RIAA, LS, or DLS. The null hypothesis $H_0$ is rejected, and the gene $h$ is claimed as a periodic gene, if its $p$-value, denoted as $p_h$, is less than or equal to a specific significance threshold. For simplicity, $p_h$ is approximated from the asymptotic null distribution of $f$ assuming Gaussian noise [13] as follows:

$$p_h = 1 - e^{-n e^{-f_h}}. \tag{24}$$

In real data analysis, the estimate of $p_h$ may deviate from its true value when the time-course data are short. This issue was carefully addressed by Liew et al. [14], and, as suggested there, alternative methods such as random permutation may provide less deviation and better performance. However, permutation also has limitations, such as tending to be conservative [15]. While finding the most robust method for the $p$-value evaluation remains an open question, it is beyond the scope of this study, since the algorithm comparison via ROC curves is threshold independent [16] and the results are unaffected by the deviation.

[Figure 1: flowchart. Time-course expression ratios enter spectral analysis in the frequency domain, which consists of periodogram estimation (RIAA, compared with LS and DLS) followed by hypothesis testing (Fisher's test). Real data with benchmark sets and simulations with ROC curves are used for evaluation; the outputs are periodic genes and nonperiodicities.]

Figure 1: The scheme of the process for detecting periodicities in time-course expression data.
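Equations (23) and (24) translate directly into a few lines of code. The sketch below takes any periodogram (RIAA, LS, or DLS) evaluated on the frequency grid and the number of time points $n$; the function name `fisher_test` is hypothetical.

```python
import numpy as np

def fisher_test(phi, n):
    """Return Fisher's statistic (23) and its approximate p-value (24)."""
    phi = np.asarray(phi, dtype=float)
    f = phi.max() / phi.mean()             # peak divided by the average over the grid
    p = 1.0 - np.exp(-n * np.exp(-f))      # asymptotic approximation, equation (24)
    return f, p
```

A gene would then be claimed periodic when the returned `p` does not exceed the chosen significance threshold.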

3.2. Simulations. Simulations are applied to evaluate the performance of RIAA. The simulation models and sampling strategies used for the simulations are described in the following paragraphs.

3.2.1. Periodic and Nonperiodic Signals. Three models, one for periodic signals and two for nonperiodic signals, are considered as transcriptional signals. Since periodic genes are transcribed in an oscillatory manner, the expression levels $y_s$ embedded with periodicities are assumed to be

$$y_s(t_i) = M\cos(\omega_s t_i) + \epsilon_{t_i}, \quad i = 1, \ldots, n, \tag{25}$$

where $M$ denotes the sinusoidal amplitude, $\omega_s$ refers to the signal frequency, and $\epsilon_{t_i}$ are Gaussian noise terms, independent and identically distributed (i.i.d.) with parameters $\mu$ and $\sigma$. For nonperiodic signals, the first model $y_n$ is simply composed of Gaussian noise, given by

$$y_n(t_i) = \epsilon_{t_i}, \quad i = 1, \ldots, n. \tag{26}$$

Additionally, as visualized by Chubb et al., gene transcription can be nonperiodically activated with irregular intervals in a living eukaryotic cell, like pulses turning on and off rapidly and discontinuously [17]. Based on this, the second nonperiodic model $y'_n$ incorporates one additional transcriptional burst and one additional sudden drop into the Gaussian noise, which can be written as

$$y'_n(t_i) = I_b(t_i) - I_d(t_i) + \epsilon_{t_i}, \quad i = 1, \ldots, n, \tag{27}$$

where $I_b$ and $I_d$ are indicator functions, equal to 1 at the location of the burst and the drop, respectively, and 0 otherwise. The transcriptional burst assumes a positive pulse, while the transcriptional drop assumes a negative pulse. Both of them may be located randomly among all time points and are assumed to last for two time points. In other words, the indicator functions are equal to 1 at two consecutive time points, say, $I_b = 1$ at $t_i$ and $t_{i+1}$. The burst and the drop have no overlap.

[Figure 2: panel (a) plots gene expression against time (0 to 16), showing the sampled data and the periodic signal; panel (b) plots the RIAA periodogram (amplitude against frequency), with the peak annotated $p$-value = $2.4 \times 10^{-3}$.]

Figure 2: (a) A time-course periodic signal with frequency = 0.2, sampled by the bio-like sampling strategy: 16 time points are assigned to the interval (0, 8] and 8 time points are assigned to the interval (8, 16]. (b) The periodogram derived using RIAA. The maximum value (peak) in the periodogram is located at frequency = 0.195.
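The three signal models (25)–(27) can be simulated along the following lines. This is an illustrative sketch: the function names are hypothetical, the default parameter values follow the settings quoted for the ROC simulations, and the burst and drop positions are drawn uniformly at random, which is one plausible reading of "located randomly among all time points".

```python
import numpy as np

def periodic_signal(t, M=1.0, omega_s=0.4 * np.pi, mu=0.0, sigma=0.5, rng=None):
    """Equation (25): a cosine of amplitude M plus i.i.d. Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.asarray(t, dtype=float)
    return M * np.cos(omega_s * t) + rng.normal(mu, sigma, t.size)

def noise_signal(n, mu=0.0, sigma=0.5, rng=None):
    """Equation (26): i.i.d. Gaussian noise only."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(mu, sigma, n)

def burst_drop_signal(n, mu=0.0, sigma=0.5, rng=None):
    """Equation (27): noise plus a two-point burst (+1) and a two-point drop (-1)."""
    rng = np.random.default_rng() if rng is None else rng
    y = rng.normal(mu, sigma, n)
    while True:
        # Start indices lie in 0..n-2 so that each pulse covers two time points.
        b, d = rng.integers(0, n - 1, size=2)
        if abs(int(b) - int(d)) >= 2:   # the burst and the drop must not overlap
            break
    y[b:b + 2] += 1.0
    y[d:d + 2] -= 1.0
    return y
```

Passing a seeded generator, for example `np.random.default_rng(0)`, makes a simulation run reproducible.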

3.2.2. Sampling Strategies. As for the choices of sampling time points $t_i$, $i = 1, \ldots, n$, four different sampling strategies, one with regular sampling and three with irregular sampling, are considered. First, regular sampling is applied, in which all time intervals are set to be $1/c$, where $c$ is a constant. Secondly, a bio-like sampling strategy is invoked. This strategy tends to have more time points at the beginning of time-course experiments and fewer time points afterward: we set the first 2/3 of the time intervals to $1/c$ and the remaining 1/3 of the time intervals to $2/c$. Third, time intervals are randomly chosen between $1/c$ and $2/c$. The last sampling strategy, in which all time intervals are exponentially distributed with parameter $c$, is less realistic than the others, but it is helpful for evaluating the performance of RIAA under pathological conditions.
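The four strategies can be sketched as a single helper that draws $n$ intervals and accumulates them. Several details are assumptions made for illustration: the first interval is measured from time 0 (so that, with $c = 2$ and $n = 24$, the bio-like strategy gives 16 points in (0, 8] and 8 points in (8, 16]); the third strategy picks $1/c$ or $2/c$ with equal probability; and the exponential parameter $c$ is read as a rate (mean interval $1/c$). The function name `sampling_times` is hypothetical.

```python
import numpy as np

def sampling_times(n, c=2.0, strategy="regular", rng=None):
    """Return n increasing sampling times for one of the four strategies."""
    rng = np.random.default_rng() if rng is None else rng
    if strategy == "regular":
        steps = np.full(n, 1.0 / c)
    elif strategy == "bio-like":
        n_short = round(2 * n / 3)            # first 2/3 of the intervals are short
        steps = np.concatenate((np.full(n_short, 1.0 / c),
                                np.full(n - n_short, 2.0 / c)))
    elif strategy == "binomial":
        # Assumption: each interval is 1/c or 2/c with probability 1/2.
        steps = rng.choice([1.0 / c, 2.0 / c], size=n)
    elif strategy == "exponential":
        # Assumption: c is the rate, so the mean interval is 1/c.
        steps = rng.exponential(scale=1.0 / c, size=n)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return np.cumsum(steps)
```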

ROC curves are applied for performance comparison. To this end, 10,000 periodic signals were generated using (25), and 10,000 nonperiodic signals were generated using either (26) or (27). Sensitivity measures the proportion of successful detections among the 10,000 periodic signals, and specificity measures the proportion of correct claims on the 10,000 nonperiodic simulation datasets. Sampling time points are decided by one of the four sampling strategies, and the number of time points $n$ is chosen arbitrarily. For all ROC curves in Section 4, $c = 2$ and $n = 24$.
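Given the $p$-values obtained for the simulated periodic (positive) and nonperiodic (negative) signals, an ROC curve can be traced by sweeping the significance threshold, as in the sketch below. The function names `roc_from_pvalues` and `area_under_curve` are hypothetical, and the area computation by the trapezoidal rule is an addition for convenience rather than something reported in the text.

```python
import numpy as np

def roc_from_pvalues(p_pos, p_neg):
    """Return (1 - specificity, sensitivity) over all p-value thresholds."""
    p_pos = np.asarray(p_pos, dtype=float)
    p_neg = np.asarray(p_neg, dtype=float)
    thresholds = np.unique(np.concatenate(([0.0], p_pos, p_neg, [1.0])))
    # A signal is claimed periodic when its p-value is <= the threshold.
    sensitivity = np.array([(p_pos <= th).mean() for th in thresholds])
    false_positive_rate = np.array([(p_neg <= th).mean() for th in thresholds])
    return false_positive_rate, sensitivity

def area_under_curve(false_positive_rate, sensitivity):
    """Trapezoidal area under the ROC curve (points sorted by threshold)."""
    widths = np.diff(false_positive_rate)
    heights = (sensitivity[1:] + sensitivity[:-1]) / 2.0
    return float(np.sum(widths * heights))
```

Because both rates are nondecreasing in the threshold, the points come out already ordered along the curve.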

3.3. Real Data Analysis. Two yeast cell cycle experiments synchronized using an alpha-factor, one conducted by Spellman et al. [2] and one conducted by Pramila et al. [18], are considered for a real data analysis. The first time-course microarray dataset, termed dataset alpha and downloaded from the Yeast Cell Cycle Analysis Project website (http://genome-www.stanford.edu/cellcycle), harbors 6178 gene expression levels and 18 sampling time points with a 7-minute interval. The second time-course dataset, termed dataset alpha 38, is downloaded from the online portal for Fred Hutchinson Cancer Research Center's scientific laboratories (http://labs.fhcrc.org/breeden/cellcycle). This dataset contains 4774 gene expression levels and 25 sampling time points with a 5-minute interval. Three benchmark sets of genes that have been utilized in Lichtenberg et al. [19] and Liew et al. [20] as standards of cell cycle genes are also applied herein for performance comparison. These benchmark sets, involving 113, 352, and 518 genes, respectively, include candidates of cell cycle regulated genes in yeast proposed by Spellman et al. [2], Johansson et al. [21], Simon et al. [22], Lee et al. [23], and Mewes et al. [24], and are accessible on a laboratory website (http://www.cbs.dtu.dk/cellcycle).

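4. Results

RIAA performed well in the conducted simulations. As shown in Figure 2(a), a periodic signal (solid line) with amplitude $M = 1$ and frequency $\omega_s = 0.4\pi$ is sampled using the bio-like sampling strategy, with 16 time points in (0, 8] and 8 more time points in (8, 16].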

6 Advances in Bioinformatics

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(a)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(b)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(c)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(d)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(e)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(f)

RIAALSDLS

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(g)

RIAALSDLS

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(h)

Figure 3 The ROC curves derived from simulations with 24 sampling time points signal amplitude119872 = 1 120596119904 = 04120587 and Gaussian noise120583 = 0 and 120590 = 05 Description of subplots is provided in Section 4

Advances in Bioinformatics 7

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(a)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(b)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(c)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(d)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(e)

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(f)

RIAALSDLS

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(g)

RIAALSDLS

002

02

04

04

06

06

08

08

1

1-specificity

Sens

itivi

ty

(h)

Figure 4 The ROC Curves derived from simulations with 24 sampling time points signal amplitude119872 = 1 120596119904 = 01120587 and Gaussian noise120583 = 0 and 120590 = 05 Description of subplots is provided in Section 4

8 Advances in Bioinformatics

200 400 600 800 10000

20

40

60

80

100

The n

umbe

r of i

nter

sect

ion

113 gene benchmark set

The number of preserved genes

(a)

200 400 600 800 10000

20

40

60

80

100

The n

umbe

r of i

nter

sect

ion

352 gene benchmark set

The number of preserved genes

(b)

200 400 600 800 10000

20

40

60

80

100

The n

umbe

r of i

nter

sect

ion

518 gene benchmark set

The number of preserved genes

(c)

RIAALSDLS

200 400 600 800 10000

20

40

60

80

100

The n

umbe

r of i

nter

sect

ion

113 gene benchmark set

The number of preserved genes

(d)

RIAALSDLS

200 400 600 800 10000

20

40

60

80

120

100

The n

umbe

r of i

nter

sect

ion

352 gene benchmark set

The number of preserved genes

(e)

RIAALSDLS

200 400 600 800 10000

20

40

60

80

120

100

The n

umbe

r of i

nter

sect

ion

518 gene benchmark set

The number of preserved genes

(f)

Figure 5The intersection of preserved genes and the benchmark sets using RIAA LS andDLS algorithms (a) (b) and (c) reveal the analysisresults when dataset alpha was applied (d) (e) and (f) reveal the analysis results when dataset alpha 38 was applied

using the bio-like sampling strategy which applies 16 timepoints in (08] and 8 more time points in (816] Gaussiannoise with parameters 120583 = 0 and 120590 = 05 is assumedduring microarray experiments The resulting time-courseexpression levels (dots) at a total of 24 time points andthe sampling time information were treated as inputs tothe RIAA algorithm Figure 2(b) demonstrates the resultof periodogram estimation In this example the grid sizeΔ120596 was chosen to be 0065 and a total of 11 amplitudescorresponding to different frequencies were obtained andshown in the spectrum Using Fisherrsquos test the peak at thethird grid (frequency = 0195) was found to be significantlylarge (119901-value = 24 times 10 minus3) and hence a periodic gene wasclaimed

ROC curves strongly illustrate the performance of RIAAIn Figures 3 and 4 subplots (a)-(b) (c)-(d) (e)-(f) and (g)-(h) refer to the simulations with regular bio-like binomi-ally random and exponentially random sampling strategiesrespectively Additionally in the left-hand side subplots (a)(c) (e) and (g) nonperiodic signals were simply Gaussiannoise with parameters 120583 = 0 and 120590 = 05 while in the

right-hand side subplots (b) (d) (f) and (h) nonperiodicsignals involve not only the Gaussian noise but also atranscriptional burst and a sudden drop (27) Periodic signalswere generated using (25) with amplitude 119872 = 1 119888 = 2 and119899 = 24 The only difference in simulation settings betweenFigures 3 and 4 is the frequency of periodic signals they are120596119904 = 04120587 and 01120587 respectively As shown in these figuresLS and DLS can perform well as RIAA when the time-coursedata are regularly sampled or mildly irregularly sampledhowever when data are highly irregularly sampled RIAAoutperforms the others The superiority of RIAA over DLSis particularly clear when the signal frequency is small

Figure 5 illustrates the results of the real data analysiswhen these three algorithms namely the RIAA LS andDLS were applied On the 119909-axis the numbers indicate thethresholds 120578 that we preserved and classified as periodicitiesamong all yeast genes on the y-axis the numbers referto the intersection of 120578 preserved genes and the proposedperiodic candidates listed in the benchmark sets Figures5(a)ndash5(c) demonstrate the results derived from dataset alphawhen the 113-gene benchmark set 352-gene benchmark

Advances in Bioinformatics 9

set and 518-gene benchmark set were applied respectivelySimilarly Figures 5(d)ndash5(f) demonstrate the results derivedfrom dataset alpha 38The RIAA does not result in significantdifferences in the numbers of intersections when comparedto those corresponding to LS and DLS in most of thesecases However RIAA shows slightly better coverage whenthe dataset alpha 38 and the 113-gene benchmark set wasutilized (Figure 5(d))

5. Conclusions

In this study, simulations specifically designed to conform with real experiments reveal that RIAA can outperform the classical LS and modified DLS algorithms when the sampling time points are highly irregular and when the number of cycles covered by the sampling times is very limited. These characteristics, as also claimed in the original study by Stoica et al. [12], suggest that RIAA can be generally applied to detect periodicities in time-course gene expression data with good potential to yield better results. A supplementary simulation further shows the superiority of RIAA over LS and DLS when multiple periodic signals are considered (see Supplementary Figure S1 available online at http://dx.doi.org/10.1155/2013/171530). From the simulations, we also learned that the addition of a transcriptional burst and a sudden drop to nonperiodic signals (the negatives) does not affect the power of RIAA in terms of periodicity detection. Moreover, the detrend function in DLS, designed to improve LS by removing the linear trend in time-course data, may fail to provide improved accuracy and makes the algorithm unable to detect periodicities when transcription oscillates with a very low frequency.

The intersection of detected candidates and proposed periodic genes in the real data analysis (Figure 5) does not reveal much difference among RIAA, LS, and DLS. One possible reason is that the sampling time points in the yeast experiments are not highly irregular (few missing values are included); as demonstrated in Figures 3(a)–3(d), RIAA performs only as well as the LS and DLS algorithms when the time-course data are regularly or mildly irregularly sampled. Also, the very limited number of time points contained in the datasets may bias the estimation of $p$-values [14] and thus hinder RIAA from exhibiting its advantage. Besides, the number of true cell cycle genes included in the benchmark sets remains uncertain. We expect that the superiority of RIAA in real data analysis will become clearer in the future when more studies and more datasets become available.

Beyond the comparison of these algorithms, it is interesting to note that the bio-like sampling strategy could lead to better detection of periodicities than the regular sampling strategy (as shown in Figures 3(c) and 3(d)). When the number of time points is limited, it might be beneficial to use wider sampling intervals at later periods to prolong the experimental time coverage.

Acknowledgments

The authors would like to thank the members of the Genomic Signal Processing Laboratory, Texas A&M University, for the helpful discussions and valuable feedback. This work was supported by the National Science Foundation under Grant no. 0915444. The RIAA MATLAB code is available at http://gsp.tamu.edu/Publications/supplementary/agyepong12a.

References

[1] W. Zhao, K. Agyepong, E. Serpedin, and E. R. Dougherty, “Detecting periodic genes from irregularly sampled gene expressions: a comparison study,” EURASIP Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 769293, 2008.

[2] P. T. Spellman, G. Sherlock, M. Q. Zhang et al., “Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization,” Molecular Biology of the Cell, vol. 9, no. 12, pp. 3273–3297, 1998.

[3] G. Rustici, J. Mata, K. Kivinen et al., “Periodic gene expression program of the fission yeast cell cycle,” Nature Genetics, vol. 36, no. 8, pp. 809–817, 2004.

[4] M. Menges, L. Hennig, W. Gruissem, and J. A. H. Murray, “Cell cycle-regulated gene expression in Arabidopsis,” Journal of Biological Chemistry, vol. 277, no. 44, pp. 41987–42002, 2002.

[5] M. Ahdesmäki, H. Lähdesmäki, R. Pearson, H. Huttunen, and O. Yli-Harja, “Robust detection of periodic time series measured from biological systems,” BMC Bioinformatics, vol. 6, article 117, 2005.

[6] M. Ahdesmäki, H. Lähdesmäki, A. Gracey et al., “Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data,” BMC Bioinformatics, vol. 8, article 233, 2007.

[7] E. F. Glynn, J. Chen, and A. R. Mushegian, “Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms,” Bioinformatics, vol. 22, no. 3, pp. 310–316, 2006.

[8] R. Yang, C. Zhang, and Z. Su, “LSPR: an integrated periodicity detection algorithm for unevenly sampled temporal microarray data,” Bioinformatics, vol. 27, no. 7, pp. 1023–1025, 2011.

[9] E. R. Dougherty, “Small sample issues for microarray-based classification,” Comparative and Functional Genomics, vol. 2, no. 1, pp. 28–34, 2001.

[10] Y. Tu, G. Stolovitzky, and U. Klein, “Quantitative noise analysis for gene expression microarray experiments,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 22, pp. 14031–14036, 2002.

[11] Z. Bar-Joseph, “Analyzing time series gene expression data,” Bioinformatics, vol. 20, no. 16, pp. 2493–2503, 2004.

[12] P. Stoica, J. Li, and H. He, “Spectral analysis of nonuniformly sampled data: a new approach versus the periodogram,” IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 843–858, 2009.

[13] J. Fan and Q. Yao, Nonlinear Time Series: Nonparametric and Parametric Methods, Springer, New York, NY, USA, 2003.

[14] A. W. C. Liew, N. F. Law, X. Q. Cao, and H. Yan, “Statistical power of Fisher test for the detection of short periodic gene expression profiles,” Pattern Recognition, vol. 42, no. 4, pp. 549–556, 2009.

[15] V. Berger, “Pros and cons of permutation tests in clinical trials,” Statistics in Medicine, vol. 19, no. 10, pp. 1319–1328, 2000.

[16] A. P. Bradley, “The use of the area under the ROC curve in the evaluation of machine learning algorithms,” Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.

[17] J. R. Chubb, T. Trcek, S. M. Shenoy, and R. H. Singer, “Transcriptional pulsing of a developmental gene,” Current Biology, vol. 16, no. 10, pp. 1018–1025, 2006.

[18] T. Pramila, W. Wu, W. Noble, and L. Breeden, “Periodic genes of the yeast Saccharomyces cerevisiae: a combined analysis of five cell cycle data sets,” 2007.

[19] U. Lichtenberg, L. J. Jensen, A. Fausbøll, T. S. Jensen, P. Bork, and S. Brunak, “Comparison of computational methods for the identification of cell cycle-regulated genes,” Bioinformatics, vol. 21, no. 7, pp. 1164–1171, 2005.

[20] A. W. C. Liew, J. Xian, S. Wu, D. Smith, and H. Yan, “Spectral estimation in unevenly sampled space of periodically expressed microarray time series data,” BMC Bioinformatics, vol. 8, article 137, 2007.

[21] D. Johansson, P. Lindgren, and A. Berglund, “A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription,” Bioinformatics, vol. 19, no. 4, pp. 467–473, 2003.

[22] I. Simon, J. Barnett, N. Hannett et al., “Serial regulation of transcriptional regulators in the yeast cell cycle,” Cell, vol. 106, no. 6, pp. 697–708, 2001.

[23] T. I. Lee, N. J. Rinaldi, F. Robert et al., “Transcriptional regulatory networks in Saccharomyces cerevisiae,” Science, vol. 298, no. 5594, pp. 799–804, 2002.

[24] H. W. Mewes, D. Frishman, U. Güldener et al., “MIPS: a database for genomes and protein sequences,” Nucleic Acids Research, vol. 30, no. 1, pp. 31–34, 2002.



Algorithm RIAA

Initialization. Use (6) to obtain the initial parameter estimates collected in $\rho_g^0$.

The First Iteration. Obtain $D_m^1$ using (16) with the parameter estimates given by $\rho_g^0$. Obtain the first-iteration estimate defined by (22). Use $D_m^1$ and this estimate to derive the first weighting matrix $Q_g^1$ by (15). Update the estimate $\rho_g^1$ by (19) with $Q_g = Q_g^1$.

Updating Iteration. At the $k$th iteration, $k = 1, 2, \ldots$, the estimates $Q_g^k$ and $\rho_g^k$ are iteratively updated in the same way as in the first iteration.

Termination. Terminate simply after 15 iterations ($K = 15$) or when the total change in $d_g^k = \|\rho_g^k\|$ for $g = 1, \ldots, G$ is extremely small, say, $\sqrt{\sum_{g=1}^{G} (d_g^k - d_g^{k-1})^2} < 0.005 \sqrt{\sum_{g=1}^{G} (d_g^{k-1})^2}$; then $K = k$.

Algorithm 1: The pseudocode of the iterative process in RIAA.
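The termination rule of Algorithm 1 can be sketched in isolation as below. This is only an illustration of the stopping logic, assuming the norms $d_g^k$ are available as plain lists; the `update` callable is a hypothetical stand-in for the RIAA update of equations (15)-(22), which is not reproduced here.

```python
import math


def has_converged(d_prev, d_curr, rel_tol=0.005):
    """Stopping test: change in the norms d_g is small relative to the previous norms."""
    change = math.sqrt(sum((c - p) ** 2 for c, p in zip(d_curr, d_prev)))
    scale = math.sqrt(sum(p ** 2 for p in d_prev))
    return change < rel_tol * scale


def run_iterations(update, d_init, max_iter=15):
    """Apply `update` until convergence or max_iter; return (norms, K).

    `update` is a hypothetical placeholder mapping the current norms to the
    next ones; in RIAA it would wrap the full estimate update.
    """
    d_prev = d_init
    for k in range(1, max_iter + 1):
        d_curr = update(d_prev)
        if has_converged(d_prev, d_curr):
            return d_curr, k  # K = k, as in the termination step
        d_prev = d_curr
    return d_prev, max_iter  # K = 15 by default
```

A relative tolerance keeps the test independent of the overall scale of the spectrum, which is presumably why the pseudocode normalizes by the previous norms.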

expression ratios into the frequency domain. Three methods are considered for comparison: RIAA, LS, and a detrended LS (termed DLS), which applies an additional detrend function (developed in LSPR) before the regular LS periodogram estimation. The derived spectra are then analyzed using hypothesis testing. This study uses Fisher’s test with the null hypothesis that there are no periodic signals in the time domain and hence no significantly large peak in the derived spectra. The algorithm performance is evaluated and compared via simulations and receiver operating characteristic (ROC) curves. In the real microarray data analysis, three published benchmark sets are used as standards of cell cycle genes for performance comparison.
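For orientation, the LS baseline named above can be sketched with the classical Lomb-Scargle formula. This is the textbook periodogram written in pure Python, not the authors' MATLAB implementation; it omits the detrend step of DLS and the RIAA iterations, and the function name and the example grid are assumptions made for illustration.

```python
import math


def lomb_scargle(times, values, omegas):
    """Classical Lomb-Scargle power at each angular frequency in `omegas`."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered) / (n - 1)
    power = []
    for w in omegas:
        # The offset tau makes the sine and cosine regressors orthogonal.
        s2 = sum(math.sin(2 * w * t) for t in times)
        c2 = sum(math.cos(2 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum(c * ct for c, ct in zip(centered, cos_terms))
        ys = sum(c * st for c, st in zip(centered, sin_terms))
        cc = sum(ct * ct for ct in cos_terms)
        ss = sum(st * st for st in sin_terms)
        power.append((yc * yc / cc + ys * ys / ss) / (2 * variance))
    return power


# Noise-free example: 16 points spaced 0.5 apart, then 8 points spaced 1 apart.
example_times = [0.5 * i for i in range(1, 17)] + [8.0 + i for i in range(1, 9)]
example_values = [math.cos(0.4 * math.pi * t) for t in example_times]
# Assumed grid: frequencies 0.065 * k cycles per unit time, k = 1..11.
example_omegas = [2 * math.pi * 0.065 * k for k in range(1, 12)]
example_power = lomb_scargle(example_times, example_values, example_omegas)
peak_index = max(range(len(example_power)), key=example_power.__getitem__)
```

The resulting list plays the role of $\Phi(\omega_g)$ in the hypothesis test, whichever estimator produced it.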

3.1. Fisher’s Test. After the spectrum of the time-course expression data is obtained via periodogram estimation, Fisher’s statistic $f_h$ for gene $h$, which tests the null hypothesis $H_0$ that the peak of the spectral density is insignificant against the alternative hypothesis $H_1$ that the peak of the spectral density is significant, is computed as

$$f_h = \frac{\max_{1 \le g \le G} \Phi(\omega_g)}{G^{-1} \sum_{g=1}^{G} \Phi(\omega_g)}, \tag{23}$$

where $\Phi$ refers to the periodogram derived using RIAA, LS, or DLS. The null hypothesis $H_0$ is rejected, and gene $h$ is claimed to be a periodic gene, if its $p$-value, denoted as $p_h$, is less than or equal to a specific significance threshold. For simplicity, $p_h$ is approximated from the asymptotic null distribution of $f$, assuming Gaussian noise [13], as follows:

$$p_h = 1 - e^{-n e^{-f_h}}. \tag{24}$$
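Equations (23) and (24) translate directly into a few lines of code. The sketch below assumes the periodogram is given as a list of nonnegative values on the frequency grid; the function names are illustrative, and the argument `n` is passed through exactly as the symbol $n$ appears in (24).

```python
import math


def fisher_statistic(periodogram):
    """Equation (23): peak of the periodogram divided by its mean over the grid."""
    mean_power = sum(periodogram) / len(periodogram)
    return max(periodogram) / mean_power


def approx_p_value(f_stat, n):
    """Equation (24): asymptotic approximation p = 1 - exp(-n * exp(-f))."""
    return 1.0 - math.exp(-n * math.exp(-f_stat))


def is_periodic(periodogram, n, threshold):
    """Reject H0 (claim a periodic gene) when the p-value is at most the threshold."""
    return approx_p_value(fisher_statistic(periodogram), n) <= threshold
```

A flat spectrum gives a statistic of 1 and a $p$-value close to 1, while a single dominant peak drives the statistic up and the $p$-value toward 0.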

Figure 1: The scheme of the process for detecting periodicities in time-course expression data. (Flowchart labels: time-course expression ratios; spectral analysis in frequency domain; periodogram estimation; hypothesis testing; Fisher’s test; RIAA compared with LS, DLS; simulations, ROC curves; real data, benchmark sets; periodic genes and nonperiodicities.)

In real data analysis, the estimation of $p_h$ might be biased when the time-course data are short. This issue was carefully addressed by Liew et al. [14], and, as suggested, alternative methods such as random permutation may provide less deviation and better performance. However, permutation also has limitations, such as a tendency to be conservative [15]. While finding the most robust method for $p$-value evaluation remains an open question, it is beyond the scope of this study, since the algorithm comparison via ROC curves is threshold independent [16] and the results are unaffected by the deviation.
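The random permutation alternative mentioned above can be sketched generically. This is not a procedure used in this study; it is a standard permutation scheme, assuming that `statistic` is any callable that maps the expression values (with the sampling times held fixed) to a test statistic such as Fisher’s $f$. The function name, the default number of permutations, and the add-one correction are choices made for illustration.

```python
import random


def permutation_p_value(values, statistic, n_perm=1000, rng=None):
    """Estimate a p-value by shuffling the values across the fixed time points."""
    rng = rng or random.Random()
    observed = statistic(values)
    shuffled = list(values)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys any temporal structure
        if statistic(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

The smallest attainable value is 1 / (n_perm + 1), so the number of permutations bounds the resolution of the estimate.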

3.2. Simulations. Simulations are applied to evaluate the performance of RIAA. The simulation models and sampling strategies used for the simulations are described in the following paragraphs.

Figure 2: (a) A time-course periodic signal with frequency = 0.2 sampled by the bio-like sampling strategy; 16 time points are assigned to the interval (0, 8] and 8 time points are assigned to the interval (8, 16] (axes: time versus gene expression; legend: sampled data, periodic signal). (b) The periodogram derived using RIAA (axes: frequency versus amplitude; $p$-value = $2.4 \times 10^{-3}$). The maximum value (peak) in the periodogram is located at frequency = 0.195.

3.2.1. Periodic and Nonperiodic Signals. Three models, one for periodic signals and two for nonperiodic signals, are considered as transcriptional signals. Since periodic genes are transcribed in an oscillatory manner, the expression levels $y_s$ embedded with periodicities are assumed to be

$$y_s(t_i) = M \cos(\omega_s t_i) + \epsilon_{t_i}, \quad i = 1, \ldots, n, \tag{25}$$

where $M$ denotes the sinusoidal amplitude, $\omega_s$ refers to the signal frequency, and $\epsilon_{t_i}$ are Gaussian noise terms, independent and identically distributed (i.i.d.) with parameters $\mu$ and $\sigma$. For nonperiodic signals, the first model $y_n$ is simply composed of Gaussian noise, given by

$$y_n(t_i) = \epsilon_{t_i}, \quad i = 1, \ldots, n. \tag{26}$$

Additionally, as visualized by Chubb et al., gene transcription can be nonperiodically activated with irregular intervals in a living eukaryotic cell, like pulses turning on and off rapidly and discontinuously [17]. Based on this, the second nonperiodic model $y'_n$ incorporates one additional transcriptional burst and one additional sudden drop into the Gaussian noise, which can be written as

$$y'_n(t_i) = I_b(t_i) - I_d(t_i) + \epsilon_{t_i}, \quad i = 1, \ldots, n, \tag{27}$$

where $I_b$ and $I_d$ are indicator functions, equal to 1 at the location of the burst and the drop, respectively, and 0 otherwise. The transcriptional burst assumes a positive pulse, while the transcriptional drop assumes a negative pulse. Both may be located randomly among all time points and are assumed to last for two time points. In other words, the indicator functions are equal to 1 at two consecutive time points, say, $I_b = 1$ at $t_i$ and $t_{i+1}$. The burst and the drop do not overlap.
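The three signal models (25)-(27) can be simulated with a short sketch such as the one below. It assumes zero-mean Gaussian noise and a uniformly random placement of the two-point burst and drop; the function names and the rejection loop used to avoid overlap are illustrative choices, not details taken from the paper.

```python
import math
import random


def periodic_signal(times, omega_s, amplitude=1.0, sigma=0.5, rng=None):
    """Model (25): sinusoid plus i.i.d. zero-mean Gaussian noise."""
    rng = rng or random.Random()
    return [amplitude * math.cos(omega_s * t) + rng.gauss(0.0, sigma) for t in times]


def noise_signal(times, sigma=0.5, rng=None):
    """Model (26): i.i.d. zero-mean Gaussian noise only."""
    rng = rng or random.Random()
    return [rng.gauss(0.0, sigma) for _ in times]


def burst_drop_signal(times, sigma=0.5, rng=None):
    """Model (27): noise plus one two-point burst (+1) and one two-point drop (-1)."""
    rng = rng or random.Random()
    n = len(times)
    if n < 4:
        raise ValueError("need at least 4 time points for a burst and a drop")
    # Draw start indices until the two 2-point windows do not overlap.
    while True:
        burst_start = rng.randrange(n - 1)
        drop_start = rng.randrange(n - 1)
        if abs(burst_start - drop_start) >= 2:
            break
    y = noise_signal(times, sigma, rng)
    for i in (burst_start, burst_start + 1):
        y[i] += 1.0
    for i in (drop_start, drop_start + 1):
        y[i] -= 1.0
    return y
```

Setting `sigma=0.0` is a convenient way to inspect the deterministic part of each model.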

3.2.2. Sampling Strategies. As for the choice of sampling time points $t_i$, $i = 1, \ldots, n$, four different sampling strategies, one with regular sampling and three with irregular sampling, are considered. First, regular sampling is applied, in which all time intervals are set to $1/c$, where $c$ is a constant. Second, a bio-like sampling strategy is invoked. This strategy tends to have more time points at the beginning of a time-course experiment and fewer time points afterward: we set the first 2/3 of the time intervals to $1/c$ and the remaining 1/3 to $2/c$. Third, time intervals are randomly chosen between $1/c$ and $2/c$. The last sampling strategy, in which all time intervals are exponentially distributed with parameter $c$, is less realistic than the others, but it helps us evaluate the performance of RIAA under pathological conditions.
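The four sampling strategies can be sketched as cumulative sums of time intervals. Two points are assumptions here: the third strategy is read as picking each interval from the two values $1/c$ and $2/c$ with equal probability (consistent with the label "binomially random" used in Section 4), and the exponential parameter $c$ is read as a rate, so that the mean interval is $1/c$. Function names are illustrative.

```python
import random
from itertools import accumulate


def regular_times(n, c):
    """All n intervals equal to 1/c."""
    return list(accumulate([1.0 / c] * n))


def bio_like_times(n, c):
    """First 2/3 of the intervals equal to 1/c, the remaining 1/3 equal to 2/c."""
    n_dense = (2 * n) // 3
    return list(accumulate([1.0 / c] * n_dense + [2.0 / c] * (n - n_dense)))


def binomial_times(n, c, rng=None):
    """Each interval is 1/c or 2/c with equal probability (assumed reading)."""
    rng = rng or random.Random()
    return list(accumulate(rng.choice((1.0 / c, 2.0 / c)) for _ in range(n)))


def exponential_times(n, c, rng=None):
    """Intervals drawn from an exponential distribution with rate c (assumed reading)."""
    rng = rng or random.Random()
    return list(accumulate(rng.expovariate(c) for _ in range(n)))
```

With `n = 24` and `c = 2`, `bio_like_times` places 16 points in (0, 8] and 8 points in (8, 16], which matches the sampling described for Figure 2(a).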

ROC curves are applied for performance comparison. To this end, 10,000 periodic signals were generated using (25), and 10,000 nonperiodic signals were generated using either (26) or (27). Sensitivity measures the proportion of successful detections among the 10,000 periodic signals, and specificity measures the proportion of correct claims on the 10,000 nonperiodic simulation datasets. Sampling time points are decided by one of the four sampling strategies, and the number of time points $n$ is chosen arbitrarily. For all ROC curves in Section 4, $c = 2$ and $n = 24$.
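Given $p$-values for the simulated periodic and nonperiodic signals, the ROC points follow from the definitions of sensitivity and specificity above. The sketch below assumes a signal is claimed periodic when its $p$-value is at most the threshold, as in Section 3.1; the function name and the toy inputs are hypothetical.

```python
def roc_points(p_periodic, p_nonperiodic, thresholds):
    """Return (1 - specificity, sensitivity) for each p-value threshold."""
    points = []
    for thr in thresholds:
        # Sensitivity: fraction of periodic signals correctly detected.
        sensitivity = sum(p <= thr for p in p_periodic) / len(p_periodic)
        # Specificity: fraction of nonperiodic signals correctly left undetected.
        specificity = sum(p > thr for p in p_nonperiodic) / len(p_nonperiodic)
        points.append((1.0 - specificity, sensitivity))
    return points


# Toy p-values, two signals per class.
toy_points = roc_points([0.01, 0.2], [0.5, 0.03], thresholds=[0.0, 0.05, 1.0])
```

Sweeping the threshold from 0 to 1 traces the curve from (0, 0) to (1, 1), which is why the comparison does not depend on any single significance level.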

3.3. Real Data Analysis. Two yeast cell cycle experiments synchronized using an alpha-factor, one conducted by Spellman et al. [2] and one conducted by Pramila et al. [18], are considered for the real data analysis. The first time-course microarray dataset, termed dataset alpha and downloaded from the Yeast Cell Cycle Analysis Project website (http://genome-www.stanford.edu/cellcycle), harbors 6178 gene expression levels and 18 sampling time points with a 7-minute interval. The second time-course dataset, termed dataset alpha 38, was downloaded from the online portal for Fred Hutchinson Cancer Research Center’s scientific laboratories (http://labs.fhcrc.org/breeden/cellcycle). This dataset contains 4774 gene expression levels and 25 sampling time points with a 5-minute interval. Three benchmark sets of genes that have been used in Lichtenberg et al. [19] and Liew et al. [20] as standards of cell cycle genes are also applied herein for performance comparison. These benchmark sets, involving 113, 352, and 518 genes, respectively, include candidates of cell cycle regulated genes in yeast proposed by Spellman et al. [2], Johansson et al. [21], Simon et al. [22], Lee et al. [23], and Mewes et al. [24] and are accessible on a laboratory website (http://www.cbs.dtu.dk/cellcycle).

4. Results

RIAA performed well in the conducted simulations. As shown in Figure 2(a), a periodic signal (solid line) with amplitude $M = 1$ and frequency $\omega_s = 0.4\pi$ is sampled using the bio-like sampling strategy, which places 16 time points in (0, 8] and 8 more time points in (8, 16]. Gaussian noise with parameters $\mu = 0$ and $\sigma = 0.5$ is assumed during microarray experiments. The resulting time-course expression levels (dots) at a total of 24 time points and the sampling time information were treated as inputs to the RIAA algorithm. Figure 2(b) shows the result of periodogram estimation. In this example, the grid size $\Delta\omega$ was chosen to be 0.065, and a total of 11 amplitudes corresponding to different frequencies were obtained and shown in the spectrum. Using Fisher’s test, the peak at the third grid point (frequency = 0.195) was found to be significantly large ($p$-value = $2.4 \times 10^{-3}$), and hence a periodic gene was claimed.

Figure 3: The ROC curves (sensitivity versus 1-specificity for RIAA, LS, and DLS; panels (a)–(h)) derived from simulations with 24 sampling time points, signal amplitude $M = 1$, $\omega_s = 0.4\pi$, and Gaussian noise with $\mu = 0$ and $\sigma = 0.5$. Description of subplots is provided in Section 4.

Figure 4: The ROC curves (sensitivity versus 1-specificity for RIAA, LS, and DLS; panels (a)–(h)) derived from simulations with 24 sampling time points, signal amplitude $M = 1$, $\omega_s = 0.1\pi$, and Gaussian noise with $\mu = 0$ and $\sigma = 0.5$. Description of subplots is provided in Section 4.

Figure 5: The intersection of preserved genes and the benchmark sets using the RIAA, LS, and DLS algorithms ($x$-axis: the number of preserved genes; $y$-axis: the number of intersections). (a), (b), and (c) show the analysis results when dataset alpha was applied with the 113-gene, 352-gene, and 518-gene benchmark sets, respectively; (d), (e), and (f) show the analysis results when dataset alpha 38 was applied with the same three benchmark sets.


Figure 2: (a) A time-course periodic signal with frequency = 0.2 sampled by the bio-like sampling strategy; 16 time points are assigned to the interval (0, 8] and 8 time points are assigned to the interval (8, 16]. Axes: time versus gene expression; legend: sampled data, periodic signal. (b) The periodogram derived using RIAA (amplitude versus frequency). The maximum value (peak) in the periodogram is located at frequency = 0.195 ($p$-value $= 2.4 \times 10^{-3}$).

identically distributed (i.i.d.) with parameters $\mu$ and $\sigma$. For nonperiodic signals, the first model, $y_n$, is simply composed of Gaussian noise, given by

$$y_n(t_i) = \epsilon_{t_i}, \quad i = 1, \ldots, n. \tag{26}$$

Additionally, as visualized by Chubb et al., gene transcription can be nonperiodically activated with irregular intervals in a living eukaryotic cell, like pulses turning on and off rapidly and discontinuously [17]. Based on this, the second nonperiodic model, $y'_n$, incorporates one additional transcriptional burst and one additional sudden drop into the Gaussian noise, which can be written as

$$y'_n(t_i) = I_b(t_i) - I_d(t_i) + \epsilon_{t_i}, \quad i = 1, \ldots, n, \tag{27}$$

where $I_b$ and $I_d$ are indicator functions equal to 1 at the location of the burst and the drop, respectively, and 0 otherwise. The transcriptional burst assumes a positive pulse, while the transcriptional drop assumes a negative pulse. Both of them may be located randomly among all time points and are assumed to last for two time points. In other words, the indicator functions are equal to 1 at two consecutive time points, say, $I_b = 1$ at $t_i$ and $t_{i+1}$. The burst and the drop have no overlap.
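The two nonperiodic models (26) and (27) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' MATLAB code: the function names, the uniform choice of pulse positions, and the default noise parameters ($\mu = 0$, $\sigma = 0.5$, as used later in the simulations) are assumptions made here for illustration.

```python
import random

def gaussian_noise_signal(n, mu=0.0, sigma=0.5, rng=random):
    """Model (26): y_n(t_i) = eps_{t_i}, i.i.d. Gaussian noise at n time points."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

def burst_drop_signal(n, mu=0.0, sigma=0.5, rng=random):
    """Model (27): Gaussian noise plus one burst (+1) and one drop (-1).

    Each pulse lasts two consecutive time points, and the two pulses do not
    overlap. Pulse start positions are drawn uniformly (an assumption).
    Requires n >= 4 so that two non-overlapping pulses fit.
    """
    while True:
        b = rng.randrange(n - 1)  # burst covers indices b and b + 1
        d = rng.randrange(n - 1)  # drop covers indices d and d + 1
        if abs(b - d) >= 2:       # reject overlapping placements
            break
    y = gaussian_noise_signal(n, mu, sigma, rng)
    for k in (b, b + 1):
        y[k] += 1.0               # indicator I_b equals 1 here
    for k in (d, d + 1):
        y[k] -= 1.0               # indicator I_d equals 1 here
    return y, b, d
```

Passing a seeded `random.Random` instance as `rng` makes the generated signals reproducible.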

3.2.2. Sampling Strategies. As for the choices of sampling time points $t_i$, $i = 1, \ldots, n$, four different sampling strategies, one with regular sampling and three with irregular sampling, are considered. First, regular sampling is applied, in which all time intervals are set to be $1/c$, where $c$ is a constant. Second, a bio-like sampling strategy is invoked. This strategy tends to have more time points at the beginning of time-course experiments and fewer time points afterward: we set the first $2/3$ of the time intervals as $1/c$ and the next $1/3$ of the time intervals as $2/c$. Third, time intervals are randomly chosen between $1/c$ and $2/c$. The last sampling strategy, in which all time intervals are exponentially distributed with parameter $c$, is less realistic than the others, but it is helpful for evaluating the performance of RIAA under pathological conditions.
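The four sampling strategies can be illustrated with the following Python sketch. Several details are assumptions rather than statements from the paper: the function name and strategy labels are hypothetical, the first time point is placed one interval after $t = 0$, the third strategy picks each interval as either $1/c$ or $2/c$ with equal probability, and the exponential parameter $c$ is read as a rate (mean interval $1/c$).

```python
import random

def sampling_times(n, c, strategy, rng=random):
    """Return n increasing sampling times built from n time intervals."""
    if strategy == "regular":
        gaps = [1.0 / c] * n
    elif strategy == "bio-like":
        k = (2 * n) // 3  # first 2/3 of the intervals are short
        gaps = [1.0 / c] * k + [2.0 / c] * (n - k)
    elif strategy == "random":
        # Assumption: each interval is 1/c or 2/c with equal probability.
        gaps = [rng.choice((1.0 / c, 2.0 / c)) for _ in range(n)]
    elif strategy == "exponential":
        # Assumption: c is the rate, so the mean interval is 1/c.
        gaps = [rng.expovariate(c) for _ in range(n)]
    else:
        raise ValueError("unknown strategy: " + strategy)
    times, t = [], 0.0
    for g in gaps:
        t += g            # accumulate intervals into time points
        times.append(t)
    return times
```

With $n = 24$ and $c = 2$, the bio-like branch places 16 time points in (0, 8] and 8 time points in (8, 16].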

ROC curves are applied for performance comparison. To this end, 10,000 periodic signals were generated using (25), and 10,000 nonperiodic signals were generated using either (26) or (27). Sensitivity measures the proportion of successful detections among the 10,000 periodic signals, and specificity measures the proportion of correct claims on the 10,000 nonperiodic simulation datasets. Sampling time points are decided by one of the four sampling strategies, and the number of time points $n$ is chosen arbitrarily. For all ROC curves in Section 4, $c = 2$ and $n = 24$.
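As a rough sketch of how such ROC points could be computed, the snippet below takes the $p$-values obtained for the periodic and nonperiodic signals and sweeps a $p$-value threshold. The function name and the convention that a signal is declared periodic when its $p$-value is at or below the threshold are assumptions made for illustration.

```python
def roc_points(p_periodic, p_nonperiodic, thresholds):
    """Return (1 - specificity, sensitivity) pairs, one per threshold."""
    points = []
    for th in thresholds:
        # Sensitivity: fraction of periodic signals declared periodic.
        sens = sum(p <= th for p in p_periodic) / len(p_periodic)
        # Specificity: fraction of nonperiodic signals correctly rejected.
        spec = sum(p > th for p in p_nonperiodic) / len(p_nonperiodic)
        points.append((1.0 - spec, sens))
    return points
```

Sweeping the threshold from 0 to 1 traces the curve from (0, 0) toward (1, 1).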

3.3. Real Data Analysis. Two yeast cell cycle experiments synchronized using an alpha-factor, one conducted by Spellman et al. [2] and one conducted by Pramila et al. [18], are considered for a real data analysis. The first time-course microarray dataset, termed dataset alpha and downloaded from the Yeast Cell Cycle Analysis Project website (http://genome-www.stanford.edu/cellcycle), harbors 6,178 gene expression levels and 18 sampling time points with a 7-minute interval. The second time-course dataset, termed dataset alpha 38, was downloaded from the online portal for Fred Hutchinson Cancer Research Center’s scientific laboratories (http://labs.fhcrc.org/breeden/cellcycle). This dataset contains 4,774 gene expression levels and 25 sampling time points with a 5-minute interval. Three benchmark sets of genes that have been utilized in Lichtenberg et al. [19] and Liew et al. [20] as standards of cell cycle genes are also applied herein for performance comparison. These benchmark sets, involving 113, 352, and 518 genes, respectively, include candidates of cell cycle regulated genes in yeast proposed by Spellman et al. [2], Johansson et al. [21], Simon et al. [22], Lee et al. [23], and Mewes et al. [24] and are accessible on a laboratory website (http://www.cbs.dtu.dk/cellcycle).

4. Results

RIAA performed well in the conducted simulations. As shown in Figure 2(a), a periodic signal (solid line) with amplitude $M = 1$ and frequency $\omega_s = 0.4\pi$ is sampled

Figure 3: The ROC curves (sensitivity versus 1 − specificity, for RIAA, LS, and DLS) derived from simulations with 24 sampling time points, signal amplitude $M = 1$, $\omega_s = 0.4\pi$, and Gaussian noise with $\mu = 0$ and $\sigma = 0.5$. Description of subplots (a)–(h) is provided in Section 4.

Figure 4: The ROC curves (sensitivity versus 1 − specificity, for RIAA, LS, and DLS) derived from simulations with 24 sampling time points, signal amplitude $M = 1$, $\omega_s = 0.1\pi$, and Gaussian noise with $\mu = 0$ and $\sigma = 0.5$. Description of subplots (a)–(h) is provided in Section 4.

Figure 5: The intersection of preserved genes and the benchmark sets using the RIAA, LS, and DLS algorithms (number of intersection versus number of preserved genes). (a), (b), and (c) reveal the analysis results when dataset alpha was applied to the 113-gene, 352-gene, and 518-gene benchmark sets; (d), (e), and (f) reveal the analysis results when dataset alpha 38 was applied to the same three benchmark sets.

using the bio-like sampling strategy, which applies 16 time points in (0, 8] and 8 more time points in (8, 16]. Gaussian noise with parameters $\mu = 0$ and $\sigma = 0.5$ is assumed during microarray experiments. The resulting time-course expression levels (dots) at a total of 24 time points and the sampling time information were treated as inputs to the RIAA algorithm. Figure 2(b) demonstrates the result of periodogram estimation. In this example, the grid size $\Delta\omega$ was chosen to be 0.065, and a total of 11 amplitudes corresponding to different frequencies were obtained and shown in the spectrum. Using Fisher’s test, the peak at the third grid point (frequency = 0.195) was found to be significantly large ($p$-value $= 2.4 \times 10^{-3}$), and hence a periodic gene was claimed.
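For readers who want to experiment with the testing step, the sketch below computes the classical exact $p$-value of Fisher's $g$-statistic (the largest spectral ordinate divided by the sum of all ordinates). It is a hedged illustration only: the function name is hypothetical, the input is assumed to be a list of nonnegative spectral power values, and the exact statistic used in this paper is the one defined in its methods section, which may differ in detail. The classical formula is also derived for uniformly sampled data with Gaussian white noise, so it should be treated as an approximation elsewhere.

```python
import math

def fisher_g_pvalue(spectrum):
    """Return (g, p) for Fisher's test on a list of spectral power values.

    g = max(spectrum) / sum(spectrum), and
    p = sum_{k=1}^{floor(1/g)} (-1)^(k-1) * C(m, k) * (1 - k*g)^(m-1),
    where m is the number of spectral ordinates.
    """
    m = len(spectrum)
    g = max(spectrum) / sum(spectrum)
    upper = min(m, int(math.floor(1.0 / g)))
    p = 0.0
    for k in range(1, upper + 1):
        p += (-1) ** (k - 1) * math.comb(m, k) * (1.0 - k * g) ** (m - 1)
    # Clamp to [0, 1]; the alternating sum can lose precision for large m.
    return g, min(1.0, max(0.0, p))
```

A flat spectrum gives a $p$-value of 1, while a spectrum dominated by a single peak gives a small $p$-value.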

ROC curves illustrate the performance of RIAA. In Figures 3 and 4, subplots (a)-(b), (c)-(d), (e)-(f), and (g)-(h) refer to the simulations with regular, bio-like, binomially random, and exponentially random sampling strategies, respectively. Additionally, in the left-hand side subplots (a), (c), (e), and (g), nonperiodic signals were simply Gaussian noise with parameters $\mu = 0$ and $\sigma = 0.5$, while in the right-hand side subplots (b), (d), (f), and (h), nonperiodic signals involve not only the Gaussian noise but also a transcriptional burst and a sudden drop (27). Periodic signals were generated using (25) with amplitude $M = 1$, $c = 2$, and $n = 24$. The only difference in simulation settings between Figures 3 and 4 is the frequency of the periodic signals: $\omega_s = 0.4\pi$ and $0.1\pi$, respectively. As shown in these figures, LS and DLS can perform as well as RIAA when the time-course data are regularly sampled or mildly irregularly sampled; however, when data are highly irregularly sampled, RIAA outperforms the others. The superiority of RIAA over DLS is particularly clear when the signal frequency is small.

Figure 5 illustrates the results of the real data analysis when the three algorithms, namely, RIAA, LS, and DLS, were applied. On the $x$-axis, the numbers indicate the threshold $\eta$, that is, the number of genes that we preserved and classified as periodic among all yeast genes; on the $y$-axis, the numbers refer to the size of the intersection of the $\eta$ preserved genes and the proposed periodic candidates listed in the benchmark sets. Figures 5(a)–5(c) demonstrate the results derived from dataset alpha when the 113-gene benchmark set, 352-gene benchmark set, and 518-gene benchmark set were applied, respectively. Similarly, Figures 5(d)–5(f) demonstrate the results derived from dataset alpha 38. RIAA does not result in significant differences in the numbers of intersections when compared to those corresponding to LS and DLS in most of these cases. However, RIAA shows slightly better coverage when dataset alpha 38 and the 113-gene benchmark set were utilized (Figure 5(d)).
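The curves in Figure 5 can be reproduced in spirit with a short Python sketch. It assumes, as an illustration only, that genes are ranked by ascending $p$-value and that the $\eta$ top-ranked genes are the ones preserved; the function name and gene identifiers are hypothetical.

```python
def intersection_curve(p_values, benchmark, etas):
    """Count benchmark genes among the eta best-ranked genes, for each eta.

    p_values: dict mapping gene name -> p-value from the periodicity test.
    benchmark: iterable of gene names proposed as cell cycle regulated.
    etas: iterable of thresholds (numbers of preserved genes).
    """
    ranked = sorted(p_values, key=p_values.get)  # smallest p-value first
    bench = set(benchmark)
    return [len(bench.intersection(ranked[:eta])) for eta in etas]
```

For example, `intersection_curve(p, benchmark_113, range(100, 1001, 100))` would give one curve of a panel, where `p` and `benchmark_113` are placeholders for the per-gene $p$-values and a benchmark gene list.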

5. Conclusions

In this study, the rigorous simulations, specifically designed to conform to real experiments, reveal that RIAA can outperform the classical LS and modified DLS algorithms when the sampling time points are highly irregular and when the number of cycles covered by sampling times is very limited. These characteristics, as also claimed in the original study by Stoica et al. [12], suggest that RIAA can be generally applied to detect periodicities in time-course gene expression data, with good potential to yield better results. A supplementary simulation further shows the superiority of RIAA over LS and DLS when multiple periodic signals are considered (see Supplementary Figure s1, available online at http://dx.doi.org/10.1155/2013/171530). From the simulations, we also learned that the addition of a transcriptional burst and a sudden drop to nonperiodic signals (the negatives) does not affect the power of RIAA in terms of periodicity detection. Moreover, the detrend function in DLS, designed to improve LS by removing the linearity in time-course data, may fail to provide improved accuracy and makes the algorithm unable to detect periodicities when transcription oscillates with a very low frequency.

The intersection of detected candidates and proposed periodic genes in the real data analysis (Figure 5) does not reveal many differences among RIAA, LS, and DLS. One possible reason is that the sampling time points used in the yeast experiments are not highly irregular (not many missing values are included); as demonstrated in Figures 3(a)–3(d), RIAA performs just as well as the LS and DLS algorithms when the time-course data are regularly or mildly irregularly sampled. Also, the very limited number of time points contained in the datasets may bias the estimation of $p$-values [14] and thus hinder RIAA from exhibiting its advantage. In addition, the number of true cell cycle genes included in the benchmark sets remains uncertain. We expect that the superiority of RIAA in real data analysis would be clearer in the future, when more studies and more datasets become available.

Besides the comparison of these algorithms, it is interesting to note that the bio-like sampling strategy could lead to better detection of periodicities than the regular sampling strategy (as shown in Figures 3(c) and 3(d)). It might be beneficial to apply longer sampling time intervals in the later periods to prolong the experimental time coverage when the number of time points is limited.

Acknowledgments

The authors would like to thank the members of the Genomic Signal Processing Laboratory, Texas A&M University, for the helpful discussions and valuable feedback. This work was supported by the National Science Foundation under Grant no. 0915444. The RIAA MATLAB code is available at http://gsp.tamu.edu/Publications/supplementary/agyepong12a.

References

[1] W. Zhao, K. Agyepong, E. Serpedin, and E. R. Dougherty, “Detecting periodic genes from irregularly sampled gene expressions: a comparison study,” EURASIP Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 769293, 2008.

[2] P. T. Spellman, G. Sherlock, M. Q. Zhang et al., “Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization,” Molecular Biology of the Cell, vol. 9, no. 12, pp. 3273–3297, 1998.

[3] G. Rustici, J. Mata, K. Kivinen et al., “Periodic gene expression program of the fission yeast cell cycle,” Nature Genetics, vol. 36, no. 8, pp. 809–817, 2004.

[4] M. Menges, L. Hennig, W. Gruissem, and J. A. H. Murray, “Cell cycle-regulated gene expression in Arabidopsis,” Journal of Biological Chemistry, vol. 277, no. 44, pp. 41987–42002, 2002.

[5] M. Ahdesmäki, H. Lähdesmäki, R. Pearson, H. Huttunen, and O. Yli-Harja, “Robust detection of periodic time series measured from biological systems,” BMC Bioinformatics, vol. 6, article 117, 2005.

[6] M. Ahdesmäki, H. Lähdesmäki, A. Gracey et al., “Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data,” BMC Bioinformatics, vol. 8, article 233, 2007.

[7] E. F. Glynn, J. Chen, and A. R. Mushegian, “Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms,” Bioinformatics, vol. 22, no. 3, pp. 310–316, 2006.

[8] R. Yang, C. Zhang, and Z. Su, “LSPR: an integrated periodicity detection algorithm for unevenly sampled temporal microarray data,” Bioinformatics, vol. 27, no. 7, pp. 1023–1025, 2011.

[9] E. R. Dougherty, “Small sample issues for microarray-based classification,” Comparative and Functional Genomics, vol. 2, no. 1, pp. 28–34, 2001.

[10] Y. Tu, G. Stolovitzky, and U. Klein, “Quantitative noise analysis for gene expression microarray experiments,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 22, pp. 14031–14036, 2002.

[11] Z. Bar-Joseph, “Analyzing time series gene expression data,” Bioinformatics, vol. 20, no. 16, pp. 2493–2503, 2004.

[12] P. Stoica, J. Li, and H. He, “Spectral analysis of nonuniformly sampled data: a new approach versus the periodogram,” IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 843–858, 2009.

[13] J. Fan and Q. Yao, Nonlinear Time Series: Nonparametric and Parametric Methods, Springer, New York, NY, USA, 2003.

[14] A. W. C. Liew, N. F. Law, X. Q. Cao, and H. Yan, “Statistical power of Fisher test for the detection of short periodic gene expression profiles,” Pattern Recognition, vol. 42, no. 4, pp. 549–556, 2009.

[15] V. Berger, “Pros and cons of permutation tests in clinical trials,” Statistics in Medicine, vol. 19, no. 10, pp. 1319–1328, 2000.

[16] A. P. Bradley, “The use of the area under the ROC curve in the evaluation of machine learning algorithms,” Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.

[17] J. R. Chubb, T. Trcek, S. M. Shenoy, and R. H. Singer, “Transcriptional pulsing of a developmental gene,” Current Biology, vol. 16, no. 10, pp. 1018–1025, 2006.

[18] T. Pramila, W. Wu, W. Noble, and L. Breeden, “Periodic genes of the yeast Saccharomyces cerevisiae: a combined analysis of five cell cycle data sets,” 2007.

[19] U. Lichtenberg, L. J. Jensen, A. Fausbøll, T. S. Jensen, P. Bork, and S. Brunak, “Comparison of computational methods for the identification of cell cycle-regulated genes,” Bioinformatics, vol. 21, no. 7, pp. 1164–1171, 2005.

[20] A. W. C. Liew, J. Xian, S. Wu, D. Smith, and H. Yan, “Spectral estimation in unevenly sampled space of periodically expressed microarray time series data,” BMC Bioinformatics, vol. 8, article 137, 2007.

[21] D. Johansson, P. Lindgren, and A. Berglund, “A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription,” Bioinformatics, vol. 19, no. 4, pp. 467–473, 2003.

[22] I. Simon, J. Barnett, N. Hannett et al., “Serial regulation of transcriptional regulators in the yeast cell cycle,” Cell, vol. 106, no. 6, pp. 697–708, 2001.

[23] T. I. Lee, N. J. Rinaldi, F. Robert et al., “Transcriptional regulatory networks in Saccharomyces cerevisiae,” Science, vol. 298, no. 5594, pp. 799–804, 2002.

[24] H. W. Mewes, D. Frishman, U. Güldener et al., “MIPS: a database for genomes and protein sequences,” Nucleic Acids Research, vol. 30, no. 1, pp. 31–34, 2002.




[Figure 4, panels (a)–(h): ROC curves for RIAA, LS, and DLS; x-axis: 1-specificity; y-axis: sensitivity.]

Figure 4: The ROC curves derived from simulations with 24 sampling time points, signal amplitude $M = 1$, $\omega_s = 0.1\pi$, and Gaussian noise with $\mu = 0$ and $\sigma = 0.5$. Description of subplots is provided in Section 4.

[Figure 5, panels (a)–(f): intersection curves for RIAA, LS, and DLS; x-axis: the number of preserved genes; y-axis: the number of intersection. Panels (a) and (d) use the 113-gene benchmark set, (b) and (e) the 352-gene benchmark set, and (c) and (f) the 518-gene benchmark set.]

Figure 5: The intersection of preserved genes and the benchmark sets using the RIAA, LS, and DLS algorithms. (a), (b), and (c) reveal the analysis results when dataset alpha was applied; (d), (e), and (f) reveal the analysis results when dataset alpha 38 was applied.

using the bio-like sampling strategy, which applies 16 time points in (0, 8] and 8 more time points in (8, 16]. Gaussian noise with parameters $\mu = 0$ and $\sigma = 0.5$ is assumed during microarray experiments. The resulting time-course expression levels (dots) at a total of 24 time points and the sampling time information were treated as inputs to the RIAA algorithm. Figure 2(b) demonstrates the result of periodogram estimation. In this example, the grid size $\Delta\omega$ was chosen to be 0.065, and a total of 11 amplitudes corresponding to different frequencies were obtained and shown in the spectrum. Using Fisher's test, the peak at the third grid (frequency = 0.195) was found to be significantly large ($p$-value $= 2.4 \times 10^{-3}$), and hence a periodic gene was claimed.
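The testing step described above can be sketched as follows. This is a minimal illustration, assuming the classical exact form of Fisher's g-test applied to the spectral ordinates; the function name `fisher_g_test` and the toy spectrum values are hypothetical, and the exact normalization used with the RIAA spectrum in this study may differ.

```python
import math

def fisher_g_test(power):
    """Fisher's exact g-test on a list of spectral ordinates.

    g is the largest ordinate divided by the sum of all ordinates; the
    p-value is the classical finite alternating sum (an assumption here,
    not necessarily the exact variant used in the paper).
    """
    m = len(power)
    g = max(power) / sum(power)
    upper = min(m, int(math.floor(1.0 / g)))
    p_value = 0.0
    for k in range(1, upper + 1):
        base = max(0.0, 1.0 - k * g)  # guard against tiny negative round-off
        p_value += (-1) ** (k - 1) * math.comb(m, k) * base ** (m - 1)
    return g, min(max(p_value, 0.0), 1.0)

# Frequency grid from the example: 11 grid points with grid size 0.065.
grid_size = 0.065
frequencies = [grid_size * k for k in range(1, 12)]

# Toy spectrum (hypothetical values) with its peak at the third grid point.
toy_power = [0.2, 0.3, 4.0, 0.4, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1]
g, p_value = fisher_g_test(toy_power)
peak_index = toy_power.index(max(toy_power))
print("peak frequency:", round(frequencies[peak_index], 3))
print("g statistic:", g, "p-value:", p_value)
```

A gene would be declared periodic when the returned p-value falls below the chosen threshold.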

ROC curves strongly illustrate the performance of RIAA. In Figures 3 and 4, subplots (a)-(b), (c)-(d), (e)-(f), and (g)-(h) refer to the simulations with regular, bio-like, binomially random, and exponentially random sampling strategies, respectively. Additionally, in the left-hand side subplots (a), (c), (e), and (g), nonperiodic signals were simply Gaussian noise with parameters $\mu = 0$ and $\sigma = 0.5$, while in the right-hand side subplots (b), (d), (f), and (h), nonperiodic signals involve not only the Gaussian noise but also a transcriptional burst and a sudden drop (27). Periodic signals were generated using (25) with amplitude $M = 1$, $c = 2$, and $n = 24$. The only difference in simulation settings between Figures 3 and 4 is the frequency of the periodic signals: $\omega_s = 0.4\pi$ and $0.1\pi$, respectively. As shown in these figures, LS and DLS can perform as well as RIAA when the time-course data are regularly sampled or mildly irregularly sampled; however, when data are highly irregularly sampled, RIAA outperforms the others. The superiority of RIAA over DLS is particularly clear when the signal frequency is small.
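To make the construction of such ROC curves concrete, the sketch below sweeps a $p$-value threshold over simulated periodic (positive) and nonperiodic (negative) signals and records sensitivity against 1-specificity. The function names and the toy $p$-values are hypothetical; this is not the authors' MATLAB code, only an illustration of the evaluation idea.

```python
import numpy as np

def roc_points(p_periodic, p_nonperiodic):
    """Return (1 - specificity, sensitivity) for every p-value threshold.

    A signal is called periodic when its p-value is <= the threshold.
    """
    p_pos = np.asarray(p_periodic, dtype=float)
    p_neg = np.asarray(p_nonperiodic, dtype=float)
    thresholds = np.unique(np.concatenate(([0.0], p_pos, p_neg, [1.0])))
    sensitivity = np.array([(p_pos <= th).mean() for th in thresholds])
    one_minus_spec = np.array([(p_neg <= th).mean() for th in thresholds])
    return one_minus_spec, sensitivity

def area_under_curve(one_minus_spec, sensitivity):
    """Trapezoidal area under the ROC curve."""
    widths = np.diff(one_minus_spec)
    heights = (sensitivity[1:] + sensitivity[:-1]) / 2.0
    return float(np.sum(widths * heights))

# Toy p-values (hypothetical): periodic signals tend to get small p-values.
toy_periodic = [0.001, 0.004, 0.03, 0.20]
toy_nonperiodic = [0.02, 0.35, 0.60, 0.90]
x, y = roc_points(toy_periodic, toy_nonperiodic)
print("AUC:", area_under_curve(x, y))
```

Comparing algorithms then amounts to computing one such curve per algorithm on the same simulated signals.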

Figure 5 illustrates the results of the real data analysis when the three algorithms, namely, RIAA, LS, and DLS, were applied. On the $x$-axis, the numbers indicate the threshold $\eta$, that is, the number of genes that we preserved and classified as periodic among all yeast genes; on the $y$-axis, the numbers refer to the size of the intersection of the $\eta$ preserved genes and the proposed periodic candidates listed in the benchmark sets. Figures 5(a)–5(c) demonstrate the results derived from dataset alpha when the 113-gene benchmark set, 352-gene benchmark set, and 518-gene benchmark set were applied, respectively. Similarly, Figures 5(d)–5(f) demonstrate the results derived from dataset alpha 38. In most of these cases, RIAA does not result in significant differences in the numbers of intersections when compared to those corresponding to LS and DLS. However, RIAA shows slightly better coverage when the dataset alpha 38 and the 113-gene benchmark set were utilized (Figure 5(d)).
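The intersection counts plotted in Figure 5 can be computed along the following lines. This is a minimal sketch, assuming the genes have already been ranked from most to least significant by an algorithm; the function name and gene identifiers are hypothetical placeholders, not entries from the benchmark sets.

```python
def intersection_curve(ranked_genes, benchmark, thresholds):
    """For each threshold eta, count benchmark genes among the first eta ranked genes."""
    benchmark = set(benchmark)
    counts = []
    for eta in thresholds:
        preserved = ranked_genes[:eta]  # the eta preserved genes
        counts.append(len(benchmark.intersection(preserved)))
    return counts

# Toy ranking and benchmark set (hypothetical identifiers).
toy_ranking = ["g1", "g2", "g3", "g4", "g5"]
toy_benchmark = {"g2", "g5", "g9"}
print(intersection_curve(toy_ranking, toy_benchmark, [1, 2, 5]))
```

Plotting the returned counts against the thresholds gives one curve per algorithm and benchmark set.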

5. Conclusions

In this study, rigorous simulations specifically designed to conform to real experiments reveal that RIAA can outperform the classical LS and modified DLS algorithms when the sampling time points are highly irregular and when the number of cycles covered by the sampling times is very limited. These characteristics, as also claimed in the original study by Stoica et al. [12], suggest that RIAA can be generally applied to detect periodicities in time-course gene expression data, with good potential to yield better results. A supplementary simulation further shows the superiority of RIAA over LS and DLS when multiple periodic signals are considered (see Supplementary Figure s1, available online at http://dx.doi.org/10.1155/2013/171530). From the simulations, we also learned that the addition of a transcriptional burst and a sudden drop to nonperiodic signals (the negatives) does not affect the power of RIAA in terms of periodicity detection. Moreover, the detrend function in DLS, designed to improve LS by removing the linearity in time-course data, may fail to provide improved accuracy and makes the algorithm unable to detect periodicities when transcription oscillates with a very low frequency.
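The remark on detrending can be illustrated numerically. The sketch below assumes a noise-free cosine as the periodic signal and an ordinary least-squares line fit as the detrend step (both are assumptions made for illustration; the DLS detrend function may be implemented differently), and it measures how much of the signal variance the line fit removes at the two simulated frequencies.

```python
import numpy as np

def bio_like_times():
    """16 time points in (0, 8] and 8 more in (8, 16] (even spacing assumed)."""
    early = np.linspace(0.5, 8.0, 16)
    late = np.linspace(9.0, 16.0, 8)
    return np.concatenate((early, late))

def variance_removed_by_detrend(t, y):
    """Fraction of the variance of y explained (and removed) by a fitted line."""
    slope, intercept = np.polyfit(t, y, 1)
    residual = y - (slope * t + intercept)
    return 1.0 - residual.var() / y.var()

t = bio_like_times()
low = variance_removed_by_detrend(t, np.cos(0.1 * np.pi * t))   # under one cycle covered
high = variance_removed_by_detrend(t, np.cos(0.4 * np.pi * t))  # several cycles covered
print("variance removed at 0.1*pi:", low)
print("variance removed at 0.4*pi:", high)
```

When less than one cycle is covered, a slow oscillation looks partly like a linear trend, so removing the trend also removes part of the periodic signal that the test is meant to detect.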

The intersection of detected candidates and proposed periodic genes in the real data analysis (Figure 5) does not reveal much difference among RIAA, LS, and DLS. One possible reason is that the sampling time points in the yeast experiments are not highly irregular (not many missing values are included); as demonstrated in Figures 3(a)–3(d), RIAA performs only as well as the LS and DLS algorithms when the time-course data are regularly or mildly irregularly sampled. Also, the very limited number of time points contained in the datasets may bias the estimation of $p$-values [14] and thus hinder RIAA from exhibiting its advantage. Besides, the number of true cell cycle genes included in the benchmark sets remains uncertain. We expect that the superiority of RIAA in real data analysis will become clearer in the future, when more studies and more datasets become available.

Besides the comparison of these algorithms, it is interesting to note that the bio-like sampling strategy could lead to better detection of periodicities than the regular sampling strategy (as shown in Figures 3(c) and 3(d)). It might be beneficial to apply looser sampling time intervals at later periods to prolong the experimental time coverage when the number of time points is limited.
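A small worked example shows why looser late sampling extends the coverage. The bio-like schedule follows the description in the text (16 points in (0, 8] and 8 more in (8, 16]); the regular schedule with a fixed step of 0.5 is an assumption made here for comparison, as is even spacing within each interval.

```python
import math

def cycles_covered(last_time, omega):
    """Number of signal cycles between time 0 and the last sampling time."""
    return omega * last_time / (2.0 * math.pi)

# Both schedules use 24 time points.
regular = [0.5 * k for k in range(1, 25)]  # assumed fixed step of 0.5, ends at 12
bio_like = [0.5 * k for k in range(1, 17)] + [8.0 + k for k in range(1, 9)]  # ends at 16

for omega in (0.4 * math.pi, 0.1 * math.pi):
    print("omega =", omega,
          "regular:", cycles_covered(regular[-1], omega),
          "bio-like:", cycles_covered(bio_like[-1], omega))
```

With the same number of time points, the bio-like schedule spans a longer experiment and therefore covers more of each cycle, which matters most at low frequencies.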

Acknowledgments

The authors would like to thank the members of the Genomic Signal Processing Laboratory, Texas A&M University, for the helpful discussions and valuable feedback. This work was supported by the National Science Foundation under Grant no. 0915444. The RIAA MATLAB code is available at http://gsp.tamu.edu/Publications/supplementary/agyepong12a.

References

[1] W. Zhao, K. Agyepong, E. Serpedin, and E. R. Dougherty, “Detecting periodic genes from irregularly sampled gene expressions: a comparison study,” EURASIP Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 769293, 2008.

[2] P. T. Spellman, G. Sherlock, M. Q. Zhang et al., “Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization,” Molecular Biology of the Cell, vol. 9, no. 12, pp. 3273–3297, 1998.

[3] G. Rustici, J. Mata, K. Kivinen et al., “Periodic gene expression program of the fission yeast cell cycle,” Nature Genetics, vol. 36, no. 8, pp. 809–817, 2004.

[4] M. Menges, L. Hennig, W. Gruissem, and J. A. H. Murray, “Cell cycle-regulated gene expression in Arabidopsis,” Journal of Biological Chemistry, vol. 277, no. 44, pp. 41987–42002, 2002.

[5] M. Ahdesmäki, H. Lähdesmäki, R. Pearson, H. Huttunen, and O. Yli-Harja, “Robust detection of periodic time series measured from biological systems,” BMC Bioinformatics, vol. 6, article 117, 2005.

[6] M. Ahdesmäki, H. Lähdesmäki, A. Gracey et al., “Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data,” BMC Bioinformatics, vol. 8, article 233, 2007.

[7] E. F. Glynn, J. Chen, and A. R. Mushegian, “Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms,” Bioinformatics, vol. 22, no. 3, pp. 310–316, 2006.

[8] R. Yang, C. Zhang, and Z. Su, “LSPR: an integrated periodicity detection algorithm for unevenly sampled temporal microarray data,” Bioinformatics, vol. 27, no. 7, pp. 1023–1025, 2011.

[9] E. R. Dougherty, “Small sample issues for microarray-based classification,” Comparative and Functional Genomics, vol. 2, no. 1, pp. 28–34, 2001.

[10] Y. Tu, G. Stolovitzky, and U. Klein, “Quantitative noise analysis for gene expression microarray experiments,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 22, pp. 14031–14036, 2002.

[11] Z. Bar-Joseph, “Analyzing time series gene expression data,” Bioinformatics, vol. 20, no. 16, pp. 2493–2503, 2004.

[12] P. Stoica, J. Li, and H. He, “Spectral analysis of nonuniformly sampled data: a new approach versus the periodogram,” IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 843–858, 2009.

[13] J. Fan and Q. Yao, Nonlinear Time Series: Nonparametric and Parametric Methods, Springer, New York, NY, USA, 2003.

[14] A. W. C. Liew, N. F. Law, X. Q. Cao, and H. Yan, “Statistical power of Fisher test for the detection of short periodic gene expression profiles,” Pattern Recognition, vol. 42, no. 4, pp. 549–556, 2009.

[15] V. Berger, “Pros and cons of permutation tests in clinical trials,” Statistics in Medicine, vol. 19, no. 10, pp. 1319–1328, 2000.

[16] A. P. Bradley, “The use of the area under the ROC curve in the evaluation of machine learning algorithms,” Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.

[17] J. R. Chubb, T. Trcek, S. M. Shenoy, and R. H. Singer, “Transcriptional pulsing of a developmental gene,” Current Biology, vol. 16, no. 10, pp. 1018–1025, 2006.

[18] T. Pramila, W. Wu, W. Noble, and L. Breeden, “Periodic genes of the yeast Saccharomyces cerevisiae: a combined analysis of five cell cycle data sets,” 2007.

[19] U. Lichtenberg, L. J. Jensen, A. Fausbøll, T. S. Jensen, P. Bork, and S. Brunak, “Comparison of computational methods for the identification of cell cycle-regulated genes,” Bioinformatics, vol. 21, no. 7, pp. 1164–1171, 2005.

[20] A. W. C. Liew, J. Xian, S. Wu, D. Smith, and H. Yan, “Spectral estimation in unevenly sampled space of periodically expressed microarray time series data,” BMC Bioinformatics, vol. 8, article 137, 2007.

[21] D. Johansson, P. Lindgren, and A. Berglund, “A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription,” Bioinformatics, vol. 19, no. 4, pp. 467–473, 2003.

[22] I. Simon, J. Barnett, N. Hannett et al., “Serial regulation of transcriptional regulators in the yeast cell cycle,” Cell, vol. 106, no. 6, pp. 697–708, 2001.

[23] T. I. Lee, N. J. Rinaldi, F. Robert et al., “Transcriptional regulatory networks in Saccharomyces cerevisiae,” Science, vol. 298, no. 5594, pp. 799–804, 2002.

[24] H. W. Mewes, D. Frishman, U. Guldener et al., “MIPS: a database for genomes and protein sequences,” Nucleic Acids Research, vol. 30, no. 1, pp. 31–34, 2002.





set and 518-gene benchmark set were applied respectivelySimilarly Figures 5(d)ndash5(f) demonstrate the results derivedfrom dataset alpha 38The RIAA does not result in significantdifferences in the numbers of intersections when comparedto those corresponding to LS and DLS in most of thesecases However RIAA shows slightly better coverage whenthe dataset alpha 38 and the 113-gene benchmark set wasutilized (Figure 5(d))

5 ConclusionsIn this study the rigorous simulations specifically designedto comfort with real experiments reveal that the RIAA canoutperform the classical LS and modified DLS algorithmswhen the sampling time points are highly irregular andwhenthe number of cycles covered by sampling times is verylimited These characteristics as also claimed in the originalstudy by Stoica et al [12] suggest that the RIAA can begenerally applied to detect periodicities in time-course geneexpression data with good potential to yield better results Asupplementary simulation further shows the superiority ofRIAA over LS and DLS when multiple periodic signals areconsidered (see Supplementary Figure s1 available online athttpdxdoiorg1011552013171530) From the simulationswe also learned that the addition of a transcriptional burst anda sudden drop to nonperiodic signals (the negatives) does notaffect the power of RIAA in terms of periodicity detectionMoreover the detrend function in DLS designed to improveLS by removing the linearity in time-course data may fail toprovide improved accuracy and makes the algorithm unableto detect periodicities when transcription oscillates with avery low frequency

The intersection of detected candidates and proposedperiodic genes in the real data analysis (Figure 5) does notreveal much differences among RIAA LS and DLS Onepossible reason is that the sampling time points conductedin the yeast experiment are not highly irregular (not manymissing values are included) since as demonstrated in Fig-ures 3(a)ndash3(d) the RIAA just performs equally well as the LSand DLS algorithms when the time-course data are regularlyor mildly irregularly sampled Also the very limited timepoints contained in the dataset may deviate the estimationof 119901-values [14] and thus hinder the RIAA from exhibitingits excellence Besides the number of true cell cycle genesincluded in the benchmark sets remains uncertainWe expectthat the superiority of RIAA in real data analysis would beclearer in the future when more studies and more datasetsbecome available

Besides the comparison of these algorithms it is inter-esting to note that the bio-like sampling strategy could leadto better detection of periodicities than the regular samplingstrategy (as shown in Figures 3(c) and 3(d)) It might bebeneficial to apply loose sampling time intervals at posteriorperiods to prolong the experimental time coverage when thenumber of time points is limited

Acknowledgments

The authors would like to thank the members of the Genomic Signal Processing Laboratory, Texas A&M University, for the helpful discussions and valuable feedback. This work was supported by the National Science Foundation under Grant no. 0915444. The RIAA MATLAB code is available at http://gsp.tamu.edu/Publications/supplementary/agyepong12a.

References

[1] W. Zhao, K. Agyepong, E. Serpedin, and E. R. Dougherty, “Detecting periodic genes from irregularly sampled gene expressions: a comparison study,” EURASIP Journal on Bioinformatics and Systems Biology, vol. 2008, Article ID 769293, 2008.

[2] P. T. Spellman, G. Sherlock, M. Q. Zhang et al., “Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization,” Molecular Biology of the Cell, vol. 9, no. 12, pp. 3273–3297, 1998.

[3] G. Rustici, J. Mata, K. Kivinen et al., “Periodic gene expression program of the fission yeast cell cycle,” Nature Genetics, vol. 36, no. 8, pp. 809–817, 2004.

[4] M. Menges, L. Hennig, W. Gruissem, and J. A. H. Murray, “Cell cycle-regulated gene expression in Arabidopsis,” Journal of Biological Chemistry, vol. 277, no. 44, pp. 41987–42002, 2002.

[5] M. Ahdesmäki, H. Lähdesmäki, R. Pearson, H. Huttunen, and O. Yli-Harja, “Robust detection of periodic time series measured from biological systems,” BMC Bioinformatics, vol. 6, article 117, 2005.

[6] M. Ahdesmäki, H. Lähdesmäki, A. Gracey et al., “Robust regression for periodicity detection in non-uniformly sampled time-course gene expression data,” BMC Bioinformatics, vol. 8, article 233, 2007.

[7] E. F. Glynn, J. Chen, and A. R. Mushegian, “Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms,” Bioinformatics, vol. 22, no. 3, pp. 310–316, 2006.

[8] R. Yang, C. Zhang, and Z. Su, “LSPR: an integrated periodicity detection algorithm for unevenly sampled temporal microarray data,” Bioinformatics, vol. 27, no. 7, pp. 1023–1025, 2011.

[9] E. R. Dougherty, “Small sample issues for microarray-based classification,” Comparative and Functional Genomics, vol. 2, no. 1, pp. 28–34, 2001.

[10] Y. Tu, G. Stolovitzky, and U. Klein, “Quantitative noise analysis for gene expression microarray experiments,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 22, pp. 14031–14036, 2002.

[11] Z. Bar-Joseph, “Analyzing time series gene expression data,” Bioinformatics, vol. 20, no. 16, pp. 2493–2503, 2004.

[12] P. Stoica, J. Li, and H. He, “Spectral analysis of nonuniformly sampled data: a new approach versus the periodogram,” IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 843–858, 2009.

[13] J. Fan and Q. Yao, Nonlinear Time Series: Nonparametric and Parametric Methods, Springer, New York, NY, USA, 2003.

[14] A. W. C. Liew, N. F. Law, X. Q. Cao, and H. Yan, “Statistical power of Fisher test for the detection of short periodic gene expression profiles,” Pattern Recognition, vol. 42, no. 4, pp. 549–556, 2009.

[15] V. Berger, “Pros and cons of permutation tests in clinical trials,” Statistics in Medicine, vol. 19, no. 10, pp. 1319–1328, 2000.

[16] A. P. Bradley, “The use of the area under the ROC curve in the evaluation of machine learning algorithms,” Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.

[17] J. R. Chubb, T. Trcek, S. M. Shenoy, and R. H. Singer, “Transcriptional pulsing of a developmental gene,” Current Biology, vol. 16, no. 10, pp. 1018–1025, 2006.

[18] T. Pramila, W. Wu, W. Noble, and L. Breeden, “Periodic genes of the yeast Saccharomyces cerevisiae: a combined analysis of five cell cycle data sets,” 2007.

[19] U. Lichtenberg, L. J. Jensen, A. Fausbøll, T. S. Jensen, P. Bork, and S. Brunak, “Comparison of computational methods for the identification of cell cycle-regulated genes,” Bioinformatics, vol. 21, no. 7, pp. 1164–1171, 2005.

[20] A. W. C. Liew, J. Xian, S. Wu, D. Smith, and H. Yan, “Spectral estimation in unevenly sampled space of periodically expressed microarray time series data,” BMC Bioinformatics, vol. 8, article 137, 2007.

[21] D. Johansson, P. Lindgren, and A. Berglund, “A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription,” Bioinformatics, vol. 19, no. 4, pp. 467–473, 2003.

[22] I. Simon, J. Barnett, N. Hannett et al., “Serial regulation of transcriptional regulators in the yeast cell cycle,” Cell, vol. 106, no. 6, pp. 697–708, 2001.

[23] T. I. Lee, N. J. Rinaldi, F. Robert et al., “Transcriptional regulatory networks in Saccharomyces cerevisiae,” Science, vol. 298, no. 5594, pp. 799–804, 2002.

[24] H. W. Mewes, D. Frishman, U. Guldener et al., “MIPS: a database for genomes and protein sequences,” Nucleic Acids Research, vol. 30, no. 1, pp. 31–34, 2002.
