

isocir: An R Package for Constrained Inference using Isotonic Regression for Circular Data, with an Application to Cell Biology

Sandra Barragán, Universidad de Valladolid

Miguel A. Fernández, Universidad de Valladolid

Cristina Rueda, Universidad de Valladolid

Shyamal Das Peddada, National Institute of Environmental Health Sciences

Abstract

In many applications one may be interested in drawing inferences regarding the order of a collection of points on a unit circle. Due to the underlying geometry of the circle, standard constrained inference procedures developed for Euclidean space data are not applicable. Recently, statistical inference for parameters under such order constraints on a unit circle was discussed in Rueda et al. (2009) and Fernández et al. (2012). In this paper we introduce an R package called isocir, which provides a set of functions that can be used for analyzing angular data subject to order constraints on a unit circle. Since this work is motivated by applications in cell biology, we illustrate the proposed package using a relevant cell cycle data set.

Keywords: Circular Data, Isotropic Order, CIRE, Conditional Test, R package isocir, R.

1. Introduction

Circular data arise in a wide range of contexts, such as geography, cell biology, circadian biology, endocrinology and ornithology (cf. Zar (1999) or Mardia et al. (2008)). This work is motivated by applications in cell cycle biology, where one may want to draw inferences regarding angular parameters subject to order constraints on a unit circle. The cell cycle among eukaryotes follows a well-coordinated process in which cells go through four phases with distinct biological functions, namely the G1, S, G2 and M phases (see Figure 1).

Figure 1: Phases of a cell cycle.

Genes participating in the cell division cycle are often called cell cycle genes. A cell cycle gene has a periodic expression, with its peak expression occurring just before its biological function (Jensen et al. (2006)). Since the periodic expression of a cell cycle gene can be mapped onto a unit circle, the angle corresponding to its peak expression is known as the phase angle of the gene (see Liu et al. (2004) and Figure 2).

Figure 2: Phase angle (φ).

Since the cell cycle is fundamental to the growth and development of an organism, cell biologists have been interested in understanding various aspects of the cell cycle that are evolutionarily conserved. For instance, they would like to identify genes whose relative order of peak expression is evolutionarily conserved. To solve such problems, Rueda et al. (2009) introduced an order restriction on the unit circle, called circular order, and extended the notion of the isotonic regression estimator to the circular parameter space by defining the circular isotonic regression estimator (CIRE). Using CIRE, Fernández et al. (2012) developed a formal statistical theory and methodology for testing whether the circular order of peak expression of a subset of cell cycle genes is conserved across multiple species.

These statistical methods may have numerous other applications apart from the cell cycle. With the increase in human survival rates, there is considerable interest in understanding neurological diseases related to aging, such as Alzheimer's disease (AD) and Parkinson's disease (PD). An important aspect of such neurological diseases is the disruption of the circadian clock and of the genes participating in it. Researchers are interested in testing differences in the phases of expression of circadian genes in normal and AD patients (Cermakian et al. (2011)). The methodology discussed in this paper can be used for analyzing such data. Other areas of application include the study of migratory patterns and directions of birds (Cochran et al. (2004)), changes in wind directions (Bowers et al. (2000)), directional fluctuations in the atmosphere (van Doorn et al. (2000)), psychology (studies of mental maps or monitoring data (Kibiak and Jonas (2007))), the orientation of ridges in fingerprints, and magnetic maps (Boles and Lohmann (2003)).

Motivated by the wide range of applications and the lack of user-friendly software, in Section 3 we introduce the isocir package, programmed in the R environment (R Development Core Team (2011)), which can be downloaded from http://cran.r-project.org/web/packages/isocir. The package provides functions that can be used for drawing inferences regarding the order of a collection of points on a unit circle. In Section 2 we describe the statistical problem and the methodology of Rueda et al. (2009) and Fernández et al. (2012). The isocir package is illustrated in Section 4 using the motivating cell cycle gene expression data. Some concluding remarks are provided in Section 5.


2. Angular Parameters under Order Constraints

2.1. Circular Order Restriction

Let $\theta = (\theta_1, \theta_2, \ldots, \theta_q)$, where each $\theta_i$, $i = 1, 2, \ldots, q$, is the sample circular mean of a random sample of size $n_i$ from a population with unknown circular mean $\phi_i$. All angles are defined in a counterclockwise direction relative to a given pole. The mean resultant lengths for each population are denoted by $r_1, \ldots, r_q$ (see Mardia and Jupp (2000) for the definition of the circular mean and the mean resultant length of a set of angles). The problem of interest is to draw statistical inferences on $\phi_i$, $i = 1, 2, \ldots, q$, subject to the constraint that the angles $\phi_1, \phi_2, \ldots, \phi_q$ are in counterclockwise order on a unit circle. Thus $\phi_1$ is "followed" by $\phi_2$, which is "followed" by $\phi_3$, and so on, until $\phi_q$ is "followed" by $\phi_1$. More precisely, we denote this simple circular order among angular parameters as follows:

$$C_{sco} = \{\phi = (\phi_1, \phi_2, \ldots, \phi_q) \in [0, 2\pi]^q : \phi_1 \preceq \phi_2 \preceq \cdots \preceq \phi_q \preceq \phi_1\}. \qquad (1)$$

It is important to note that the order among the angular parameters is invariant under changes in the location of the pole (the initial point of the circle). Unlike points in Euclidean space, points on a unit circle wrap around: starting at the pole and traveling $2\pi$ radians around the circumference of the circle, one returns to the pole. For this reason, the circular order among points on a unit circle is preserved even if the location of the pole is shifted. This is why Rueda et al. (2009) and Fernández et al. (2012) refer to the circular order $C_{sco}$ as isotropic order. Since in this paper we also consider more general circular order restrictions, from now on we refer to $C_{sco}$ as the simple circular order.
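The invariance of the circular order under a shift of the pole can be checked numerically. The following is a minimal base R sketch; the helper `is_circular_order` is a hypothetical name introduced here for illustration and is not part of isocir. It measures every angle counterclockwise from the first one and checks that these offsets never decrease.

```r
# Hypothetical helper (not part of isocir): TRUE if the angles, in radians,
# are listed in counterclockwise circular order starting from phi[1].
is_circular_order <- function(phi) {
  # Offset of every angle from the first one, measured counterclockwise.
  offsets <- (phi - phi[1]) %% (2 * pi)
  # In circular order the offsets never decrease.
  all(diff(offsets) >= 0)
}

phi     <- c(5.0, 6.0, 0.5, 1.0)     # wraps past 2*pi between 6.0 and 0.5
shifted <- (phi + 1.3) %% (2 * pi)   # same points, pole moved by 1.3 radians

is_circular_order(phi)         # TRUE
is_circular_order(shifted)     # TRUE: the order survives the pole shift
is_circular_order(c(1, 3, 2))  # FALSE
```

This sketch treats the first angle as the reference point, so a later angle that coincides exactly with the first one is reported as a violation; a more careful implementation would need to handle such ties.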

As a consequence of the geometry, a circle can never be linearized, and hence methods developed for Euclidean space data are not applicable to circular data. The problem is even more challenging when the angular parameters are constrained by an order around the circle, such as $C_{sco}$. General methodology for circular data when there are no constraints on the angular parameters can be found in the book by Mardia and Jupp (2000), among others. Constrained inference for circular data is rather recent (Rueda et al. (2009) and Fernández et al. (2012)). As noted in Rueda et al. (2009), standard Euclidean space methods, such as the pool adjacent violators algorithm (PAVA) used for computing isotonic regression (see Robertson et al. (1988) for details), cannot be applied to circular data.

In some applications the complete circular order is not known. For example, when a cell biologist is investigating a large number of cell cycle genes, it may be difficult to ascertain the circular order among all cell cycle genes under consideration. However, based on the underlying biology, the investigator may know a priori the circular order among groups of genes but not the order among genes within each group. In such situations a partial circular order $C_{pco}$, as defined below, can be used:

$$C_{pco} = \left\{ \phi \in [0, 2\pi]^q : \{\phi_1, \phi_2, \ldots, \phi_{l_1}\} \preceq \{\phi_{l_1+1}, \phi_{l_1+2}, \ldots, \phi_{l_1+l_2}\} \preceq \cdots \preceq \{\phi_{l_1+\cdots+l_{L-1}+1}, \phi_{l_1+\cdots+l_{L-1}+2}, \ldots, \phi_{l_1+\cdots+l_L}\} \preceq \{\phi_1, \phi_2, \ldots, \phi_{l_1}\} \right\}. \qquad (2)$$

In this case we have $L$ sets of parameters, with $l_j$ angular parameters in set $j$ and $q = \sum_{j=1}^{L} l_j$. The order among the parameters within a set is not known, but every parameter in a given set is "followed" by every parameter in the next set.


2.2. Estimation and Testing under Circular Order Restrictions

Analogous to PAVA for Euclidean data, Rueda et al. (2009) derived a circular isotonic regression estimator (CIRE) for estimating angular parameters $(\phi_1, \phi_2, \ldots, \phi_q)$ subject to a circular order. The CIRE of $\phi = (\phi_1, \phi_2, \ldots, \phi_q)'$ under the constraint $(\phi_1, \phi_2, \ldots, \phi_q)' \in C_{sco}$ is given by

$$\tilde{\phi} = \arg\min_{\phi \in C_{sco}} SCE(\phi, \theta), \qquad (3)$$

where $SCE(\phi, \theta)$, defined below, is the Sum of Circular Errors, a circular analog of the Sum of Squared Errors (SSE) used for Euclidean data:

$$SCE(\phi, \theta) = \sum_{i=1}^{q} r_i \left(1 - \cos(\phi_i - \theta_i)\right), \qquad (4)$$

where the $r_i$ are the mean resultant lengths. The CIRE is implemented in the function CIRE of the package isocir.
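The Sum of Circular Errors in (4) is easy to evaluate directly. The sketch below is a plain base R transcription of that formula; the name `sum_circular_errors` is hypothetical and chosen to avoid clashing with the package's own sce function, and the default of unit mean resultant lengths is an assumption made only for this illustration.

```r
# Sum of Circular Errors, equation (4). Hypothetical helper name; by default
# all mean resultant lengths r_i are set to 1 (an assumption of this sketch).
sum_circular_errors <- function(phi, theta, r = rep(1, length(theta))) {
  stopifnot(length(phi) == length(theta), length(r) == length(theta))
  sum(r * (1 - cos(phi - theta)))
}

theta <- c(0.1, 1.0, 6.2)
sum_circular_errors(theta, theta)           # identical angles give 0
sum_circular_errors(theta + 2 * pi, theta)  # a full turn is (numerically) 0 too
sum_circular_errors(0, pi)                  # opposite points: 1 - cos(pi) = 2
```

Because the error depends on the angles only through the cosine of their difference, it is unaffected by adding multiples of $2\pi$, which is what makes it suitable for data on a circle.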

Just as the normal distribution is commonly used for Euclidean space data, the von Mises distribution is widely used for describing angular data on a unit circle. Accordingly, for $i = 1, 2, \ldots, q$, throughout this paper we assume that the $\theta_i$ are independently distributed according to a von Mises distribution, denoted by $M(\phi_i, \kappa)$, where $\phi_i$ is the modal direction and $\kappa$ is the concentration parameter (see Mardia and Jupp (2000)). Under this assumption, Fernández et al. (2012) developed a conditional test for the following hypotheses:

$H_0: \phi \in C_{sco}$
$H_1: H_0$ is not true.

The conditional test statistic is given by $T^* = 2\hat{\kappa}\, SCE(\tilde{\phi}, \theta)/q$, where $\hat{\kappa}$ is the estimator of $\kappa$ and $\tilde{\phi}$ is the CIRE computed under $H_0$. The estimate $\tilde{\phi}$ determines a partition of $\wp = \{1, \ldots, q\}$ into sets of coordinates on which $\tilde{\phi}$ is constant. These sets are called level sets. The rejection region for the conditional $\alpha$-level test is given by

Reject $H_0$ if $T^* \ge c(m)$,

where $m$ is the number of level sets of $\tilde{\phi}$ and, for large values of $\kappa$, the approximate critical value $c(m)$ is chosen so that

$$pr\left(F_{q-m,\,q-1} \ge c(m)\right) = \frac{\alpha}{1 - 1/(q-1)!}, \qquad (5)$$

where $F_{q-m,\,q-1}$ represents the central F random variable with $(q-m, q-1)$ degrees of freedom. The above test statistic is proportional to a chi-square test statistic when $\kappa$ is known. For details one may refer to Fernández et al. (2012). The above methodology can be modified to test

$H_0: \phi \in C_{pco}$
$H_1: H_0$ is not true,

by replacing $1/(q-1)!$ by $(l_1!\, l_2! \cdots l_L!)/(q-1)!$.
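As an illustration of how the critical value in (5) could be obtained numerically, the following base R sketch uses the upper-tail quantile of the F distribution. The function name `cire_critical_value` is hypothetical, and the sketch assumes that the adjustment term in (5) is $(l_1!\, l_2! \cdots l_L!)/(q-1)!$, which reduces to $1/(q-1)!$ for a simple circular order; it is not the package's own implementation. The number of level sets $m$ is counted as the number of distinct values of the constrained estimate, here the CIRE values reported in Example 1 of Section 3.3.

```r
# Approximate critical value c(m) from equation (5); hypothetical helper.
# Assumption: the adjustment term is prod(l_j!) / (q - 1)!, i.e. 1 / (q - 1)!
# when every group has a single parameter (simple circular order).
cire_critical_value <- function(alpha, q, m, group_sizes = rep(1, q)) {
  stopifnot(sum(group_sizes) == q, m >= 1, m < q)
  adjust    <- prod(factorial(group_sizes)) / factorial(q - 1)
  tail_prob <- alpha / (1 - adjust)
  stopifnot(tail_prob < 1)
  # Upper-tail quantile of the central F distribution with (q - m, q - 1) df.
  qf(tail_prob, df1 = q - m, df2 = q - 1, lower.tail = FALSE)
}

# Level sets are the groups of coordinates sharing one fitted value, so m is
# the number of distinct fitted values.
phi_tilde <- c(0.993, 1.475, 3.066, 5.056, 3.066, 5.056, 5.056, 0.993)
m <- length(unique(phi_tilde))   # 4 level sets

crit <- cire_critical_value(alpha = 0.05, q = 8, m = m,
                            group_sizes = c(3, 2, 1, 2))
```

The sketch stops when $m = q$, since the F distribution would then have zero numerator degrees of freedom; in that case the constrained and unconstrained estimates coincide.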


The simulation study performed in Fernández et al. (2012) suggests that the power of this test is quite reasonable. Notice that, for a given data set, the p-value obtained by using the above methodology may serve as a useful goodness-of-fit criterion when comparing two or more plausible circular orders among a set of angular parameters. Larger p-values suggest that the estimates are closer to the presumed circular order. Thus the statistical methodology developed in Fernández et al. (2012) can be used not only for testing the relative order among the parameters but also for selecting a "best fitting" circular order among several candidate circular orders.

These tests are implemented in the function cond.test in the R package isocir, introduced in the next section.

3. Package isocir

We start this section by giving some background on R packages for isotonic regression and for the analysis of circular data. We then describe the structure of our package isocir and illustrate it with some examples.

3.1. Related Packages

As isotonic regression is a well-known and widely used technique, there are many packages in R for performing isotonic regression, such as:

isotone (de Leeuw et al. (2009)): Active set and generalized PAVA for isotone optimization.

Iso (Turner (2009)): Functions to perform isotonic regression.

OrdMonReg (Balabdaoui et al. (2009)): Compute least squares estimates of one bounded or two ordered isotonic regression curves.

Similarly, there are several packages in R for analyzing circular data, such as:

CircStats (Lund and Agostinelli (2009)): Implementations of the circular statistics from "Topics in Circular Statistics" (Jammalamadaka and SenGupta (2001)). It is an R port of the S-plus library with the same name.

circular (Agostinelli and Lund (2011)): This package expands the CircStats package in several ways.

Since none of the existing packages for circular data can analyze circular data under constraints, in this article we introduce the software package "isotonic inference for circular data", with the acronym isocir. Our package depends on circular (see Agostinelli and Lund (2011)) and combinat (see Chasalow (2010)). These packages should be installed before loading isocir.

3.2. Package Structure

For the convenience of the reader, we summarize all the functions, arguments and descriptions of our package isocir in Table 1.


Functions | Arguments | Description
sce | (arg1, arg2, meanrl) | Sum of Circular Errors
mrl | (data) | Mean Resultant Length
CIRE | (data, groups, circular) | Circular Isotonic Regression Estimator
cond.test | (data, groups, kappa) | Conditional Test
isocir | (cirmeans, SCE, CIRE, pvalue, kappa) | Creates an object of class isocir
is.isocir | (x) | Checks an object
print.isocir | (x, decCIRE, decpvalue, deckappa, ...) | Prints an object of class isocir
plot.isocir | (x, option, ...) | Plots an object of class isocir

Table 1: Summary of the components of the package isocir.

In the following we describe each function of the software in detail.

Functions sce() and mrl()

The auxiliary function sce computes the sum of circular errors between a given q-dimensional vector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2). The function mrl computes the mean resultant length for the input data.
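For readers unfamiliar with these quantities, the sample circular mean and the mean resultant length can be sketched in a few lines of base R using their standard definitions (see Mardia and Jupp (2000)). The helper `circular_summary` is a hypothetical name used only for this illustration; it does not reproduce the interface of the package's mrl function.

```r
# Circular mean direction and mean resultant length of a sample of angles in
# radians. Hypothetical helper; isocir provides its own mrl() function.
circular_summary <- function(angles) {
  c_bar <- mean(cos(angles))   # average horizontal component
  s_bar <- mean(sin(angles))   # average vertical component
  list(
    mean_direction        = atan2(s_bar, c_bar) %% (2 * pi),
    mean_resultant_length = sqrt(c_bar^2 + s_bar^2)
  )
}

circular_summary(c(0.2, 0.2, 0.2))$mean_resultant_length  # identical angles: 1 up to rounding
circular_summary(c(0, pi))$mean_resultant_length          # opposite angles: close to 0
```

A mean resultant length near 1 indicates tightly concentrated angles, while a value near 0 indicates angles spread around the circle, which is why the $r_i$ act as weights in the Sum of Circular Errors.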

Function CIRE()

Using the methodology developed in Rueda et al. (2009), this R function computes the CIRE for a given circular order (1) or a partial circular order (2). The arguments of this function are summarized in Table 2.

Arguments | Values
data | vector or matrix with the data
groups | the groups of the order
circular | =TRUE (by default), =FALSE

Table 2: Arguments of the CIRE function.

The input variable data is a matrix in which each column contains the vector of unconstrained angular means corresponding to one replication. If there is only one replication, then data is a vector. Position i of the vector groups contains the number of the group to which the parameter $\phi_i$ belongs. The logical argument circular sets whether the order is wrapped around the circle, i.e., a circular order (circular=TRUE), or not, i.e., a simple order (circular=FALSE). For example, the simple order cone on the circle, $C_{so} = \{\phi = (\phi_1, \phi_2, \ldots, \phi_q) \in [0, 2\pi]^q : 0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_q \le 2\pi\}$, would be a non-circular order. The output of this function is an object of class isocir (explained later) containing the circular isotonic regression estimator ($\tilde{\phi}$), the unrestricted circular means ($\theta$) and the corresponding sum of circular errors ($SCE(\tilde{\phi}, \theta)$).
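The encoding of a partial circular order through the groups argument can be made concrete with a short base R sketch. It only manipulates the vector of group labels and does not call isocir; the labels used are those of Example 1 in Section 3.3, and the variable names are illustrative.

```r
# Position i holds the group label of parameter phi_i.
groups <- c(1, 1, 1, 2, 2, 3, 4, 4)

# Indices of the parameters belonging to each set of the partial order.
members <- split(seq_along(groups), groups)
members[["2"]]                # parameters 4 and 5 form the second set

# Set sizes l_1, ..., l_L; they add up to q, the number of parameters.
set_sizes <- lengths(members)
sum(set_sizes) == length(groups)
```

Under this encoding a simple circular order corresponds to every parameter having its own label, for instance `groups <- 1:8` for eight parameters.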

Function cond.test()

This function performs the conditional test and computes the corresponding p-value for the following hypotheses:

$H_0$: The angles $\phi_1, \ldots, \phi_q$ follow a (simple or partial) circular order.

$H_1$: $H_0$ is not true.

The arguments of this function appear in Table 3 and are explained below.

Arguments | κ known | κ unknown
data | numeric vector | matrix (as many columns as replications)
groups | numeric vector with the groups of the order to be tested (both cases)
kappa | positive numeric value | (NULL)
biasCorrect | (NULL) | =TRUE (by default), =FALSE

Table 3: Arguments of the cond.test function.

The arguments data and groups are the same as in the function CIRE, although in this function groups is the order to be tested instead of the known order. The argument kappa is needed only when data is a vector: if there are no replications in data, the value of $\kappa$ must be set by the user. Even when there are replicated data, if the user knows the value of $\kappa$ it may be supplied, and it will be taken into consideration to perform the conditional test. When $\kappa$ is unknown and there are replicated data, the parameter is estimated internally by maximum likelihood and $\hat{\kappa}$ is shown in the output. The argument biasCorrect is related to the estimation of $\kappa$. If biasCorrect=TRUE, the bias correction appearing in Mardia and Jupp (2000), p. 87, is performed in the estimation of $\kappa$. The output of this function is an object of class isocir (explained below) with all the results of the conditional test: the CIRE ($\tilde{\phi}$), the unrestricted circular means ($\theta$), the SCE ($SCE(\tilde{\phi}, \theta)$), the kappa value (estimated or supplied) and the p-value of the conditional test.
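To give an idea of what maximum likelihood estimation of the concentration parameter involves, the following base R sketch handles the textbook case of a single von Mises sample, where the estimate solves $A(\hat{\kappa}) = \bar{R}$ with $A(\kappa) = I_1(\kappa)/I_0(\kappa)$ and $\bar{R}$ the mean resultant length. The function names are hypothetical, and this is not claimed to be how cond.test pools information across populations or applies the bias correction.

```r
# Ratio of modified Bessel functions A(kappa) = I1(kappa) / I0(kappa).
# Exponential scaling cancels in the ratio and avoids overflow for large kappa.
bessel_ratio <- function(kappa) {
  besselI(kappa, 1, expon.scaled = TRUE) / besselI(kappa, 0, expon.scaled = TRUE)
}

# Single-sample maximum likelihood estimate of kappa from the mean resultant
# length rbar (hypothetical helper, no bias correction applied).
kappa_mle <- function(rbar) {
  stopifnot(rbar > 1e-6, rbar < 0.9999)
  uniroot(function(k) bessel_ratio(k) - rbar,
          interval = c(1e-8, 1e5), tol = 1e-10)$root
}

kappa_mle(bessel_ratio(2))   # recovers a value close to 2
kappa_mle(0.9) > kappa_mle(0.5)
```

Since $A(\kappa)$ is increasing in $\kappa$, more concentrated samples (larger $\bar{R}$) lead to larger estimates of $\kappa$, which is what the last line checks.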

Class isocir

Finally, we describe the isocir function. This function creates S3 objects of class isocir, each of which is a list with the following elements:

$cirmeans is a list with the unrestricted circular means. Notice that when the argument data is a vector, these values match the input exactly. However, if there are replicated data, the argument data is a matrix and $cirmeans contains the corresponding unrestricted circular means ($\theta_1, \ldots, \theta_q$).

$SCE is the value of the Sum of Circular Errors ($SCE(\tilde{\phi}, \theta)$).

$CIRE is a list with the Circular Isotonic Regression Estimator ($\tilde{\phi}$) obtained under the order defined by the groups argument.

$kappa is the value of kappa (either set by the user or estimated).

$pvalue is the p-value of the conditional test obtained from the function cond.test.

These objects of class isocir are the output of the functions CIRE and cond.test. The last two elements of the list ($kappa and $pvalue) are NULL if the object comes from the function CIRE. If the object comes from the function cond.test, it contains not only the results of the conditional test ($kappa and $pvalue) but also an attribute called estkappa, which informs (or rather reminds) the user whether the value in $kappa has been estimated internally or supplied as a known input.

Some S3 methods have also been defined for the class isocir:

isocir(cirmeans = NULL, SCE = NULL, CIRE = NULL, pvalue = NULL, kappa = NULL): This function creates an object of class isocir.

is.isocir(x): This function checks whether the object x is of class isocir.

print.isocir(x, decCIRE, decpvalue, deckappa, ...): This S3 method is used to print an object x of class isocir. The number of decimal places can be chosen.

plot.isocir(x, option = c("CIRE", "cirmeans"), ...): This S3 method is used to plot an object x of class isocir. The argument option lets the user plot either the points of the Circular Isotonic Regression Estimator (by default) or the unrestricted circular means.
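The general S3 pattern behind such a class (a constructor, a class check and a print method) can be sketched in base R as follows. The class name `demo_isocir` and all function names are hypothetical and chosen so as not to collide with the package; the sketch mirrors the list elements described above but is not the package's code.

```r
# Constructor: a list with the five documented elements and a class attribute.
new_demo_isocir <- function(cirmeans = NULL, SCE = NULL, CIRE = NULL,
                            pvalue = NULL, kappa = NULL) {
  structure(
    list(cirmeans = cirmeans, SCE = SCE, CIRE = CIRE,
         pvalue = pvalue, kappa = kappa),
    class = "demo_isocir"
  )
}

# Class check, analogous in spirit to is.isocir().
is_demo_isocir <- function(x) inherits(x, "demo_isocir")

# Print method dispatched by print() for objects of class "demo_isocir".
print.demo_isocir <- function(x, digits = 3, ...) {
  if (!is.null(x$CIRE)) cat("CIRE:", round(unlist(x$CIRE), digits), "\n")
  if (!is.null(x$SCE)) cat("SCE:", round(x$SCE, digits), "\n")
  if (!is.null(x$pvalue)) cat("p-value:", signif(x$pvalue, digits), "\n")
  invisible(x)
}

obj <- new_demo_isocir(cirmeans = list(0.1, 1.2), SCE = 0, CIRE = list(0.1, 1.2))
obj                  # uses print.demo_isocir
is_demo_isocir(obj)  # TRUE
```

Elements that are not supplied stay NULL, which matches the described behaviour of objects returned by CIRE, where $kappa and $pvalue are NULL.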

3.3. Examples

In this section we provide examples to illustrate isocir.

Example 1

Suppose the observed angular means of eight populations are given by

$\theta_1 = 0.025$, $\theta_2 = 1.475$, $\theta_3 = 3.274$, $\theta_4 = 5.518$, $\theta_5 = 2.859$, $\theta_6 = 5.387$, $\theta_7 = 4.179$, $\theta_8 = 1.962$.

We illustrate isocir by estimating the 8 population angular parameters under the following partial circular order constraint:

$$\{\phi_1, \phi_2, \phi_3\} \preceq \{\phi_4, \phi_5\} \preceq \phi_6 \preceq \{\phi_7, \phi_8\} \preceq \{\phi_1, \phi_2, \phi_3\}.$$

These data are a set of random circular data called cirdata in our package, and they can be loaded as follows:

> data(cirdata)
> cirdata

Since in this example there are no replications, we provide the data in vector format. The groups of the order are defined as follows:

> orderGroups <- c(1, 1, 1, 2, 2, 3, 4, 4)

We then obtain the CIRE using the function CIRE as follows:

> example1CIRE <- CIRE(cirdata, groups = orderGroups, circular = TRUE)


The output is saved in example1CIRE, and it is printed as follows:

> example1CIRE

The constrained estimates satisfy the required order:

$\tilde{\phi}_1 = 0.993$, $\tilde{\phi}_2 = 1.475$, $\tilde{\phi}_3 = 3.066$, $\tilde{\phi}_4 = 5.056$, $\tilde{\phi}_5 = 3.066$, $\tilde{\phi}_6 = 5.056$, $\tilde{\phi}_7 = 5.056$, $\tilde{\phi}_8 = 0.993$,

where $\tilde{\phi}$ is the Circular Isotonic Regression Estimator of $\phi$. Results may be displayed graphically by calling plot(example1CIRE), which produces a plot with the points of the CIRE. To see the plot for the unrestricted estimates, the argument option = "cirmeans" can be used (i.e., plot(example1CIRE, option = "cirmeans")).

Example 2 (κ unknown (replications needed))

Using the data set datareplic from our package, we demonstrate the use of the function cond.test when $\kappa$ is unknown. As remarked earlier, when $\kappa$ is unknown we need replicated data to estimate $\kappa$. The object datareplic is a matrix in which each column contains the values of one replication and each row contains the angles observed for one population. In this example we have 8 populations, and the hypotheses regarding the corresponding 8 parameters are as follows:

$H_0$: $\phi_1 \preceq \phi_2 \preceq \phi_3 \preceq \phi_4 \preceq \phi_5 \preceq \phi_6 \preceq \phi_7 \preceq \phi_8 \preceq \phi_1$

$H_1$: $H_0$ is not true.

We take the data from the package and set the groups of the order in the argument groups:

> data(datareplic)
> orderGroups2 <- c(1:8)

Since replicated data are available, we do not include the argument kappa, as we want the function to estimate it. Moreover, we correct the bias in the estimation of $\kappa$, so we set biasCorrect=TRUE. Thus we have the following code:

> example2test <- cond.test(datareplic, groups = orderGroups2, biasCorrect = TRUE)
> example2test

The output includes the p-value of the conditional test based on (5). Since the p-value is 0.0034, we reject the null hypothesis and conclude that the parameters do not satisfy the specified circular order.

If the user is interested in printing the unrestricted circular means $\theta$, the element example2test$cirmeans can be used. The result is a list that is saved in the same format as CIRE. Since each group in the circular order has a single element, it is convenient to use the vector format. Hence we have

> round(unlist(example2test$cirmeans), digits = 3)

4. Application to Analysis of the Cell Cycle Gene Expression Data

As noted in the Introduction, there has been considerable discussion in the literature on the conservation of various aspects of cell cycle genes (Fernández et al. (2012)), particularly between two yeast species, namely S. cerevisiae (budding yeast) and S. pombe (fission yeast). Using 10 published fission yeast data sets (Rustici et al. (2004), Oliva et al. (2005), Peng et al. (2005)), we illustrate the isocir package by testing the null hypothesis that 16 fission yeast genes, namely ssb1, cdc22, msh6, psm3, rad21, cig2, mik1, h3.3, hhf1, hht3, hta2, htb1, fkh2, chs2, sid2 and slp1, satisfy the same circular order as their budding yeast orthologs (RFA1, RNR1, MSH6, SMC3, MCD1, CLN2, SWE1, HHT2, HHF1, HHT1, HTA2, HTB2, FKH1, CHS2, DBF2 and CDC20), whose circular order is obtained from cyclebase (http://www.cyclebase.org) and the published literature. Thus we test the following hypotheses:

$$H_0: \phi_{ssb1} \preceq \phi_{cdc22} \preceq \phi_{msh6} \preceq \phi_{psm3} \preceq \phi_{rad21} \preceq \phi_{cig2} \preceq \phi_{mik1} \preceq \phi_{h3.3} \preceq \phi_{hhf1} \preceq \phi_{hht3} \preceq \phi_{hta2} \preceq \phi_{htb1} \preceq \phi_{fkh2} \preceq \phi_{chs2} \preceq \phi_{sid2} \preceq \phi_{slp1} \preceq \phi_{ssb1}$$

$$H_1: H_0 \text{ is not true.} \qquad (6)$$

For each of the 10 experimental data sets, the unconstrained estimates of the phase angles of the above 16 fission yeast genes, appearing in Table 5, were obtained using the Random Periods Model (Liu et al. (2004)). The R code for that software can be obtained from http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm.

Notice that there are no replicated data here, since the experiments were not performed under the same experimental conditions. It appears that the 10 experiments were not synchronized (i.e., cells were probably not arrested at the same point in the cell cycle). For this reason, Table 5 shows a large variability in the estimates of the phase angles of each of the 16 genes. Even though there may be large variability in the estimated values, our interest is in the relative order of the phase angles among the 16 genes, which does not rely on the location of the pole and hence does not rely on the synchronization. As there are no replicated data, we have a single observation for each of the 16 fission yeast genes in each experiment, and therefore the values in Table 5 play the role of the unrestricted circular means in each experiment. Consequently, we suppose that

$$\theta_{ij} \sim_{ind} M(\phi_{ij}, \kappa_j), \quad i = 1, 2, \ldots, 16, \quad j = 1, 2, \ldots, 10,$$

where $\theta_{ij}$ is the unrestricted circular mean of gene $i$ in experiment $j$.

Since the 10 experiments may not be considered as replications of each other, we performed a separate test for each experiment. Moreover, as explained in Fernández et al. (2012), we assume that the concentration parameter $\kappa_j$ depends on the experiment but not on the gene. The reason is that, of the two sources of uncertainty, one specific to the gene and another one due to the experiment (and therefore common to all genes within the experiment), the former may be considered negligible relative to the latter, as the number of time points used in each time course experiment is fairly large for any specific gene.

The $\kappa_j$ values considered for this example are obtained as in Fernández et al. (2012). The procedure used for the computation of these values comes from an analysis-of-variance type methodology. Under the assumptions made above, the model for the circular means is

$$\phi_{ij} = \mu + \alpha_i + \beta_j,$$

where $\alpha_i$ is the gene effect and $\beta_j$ is the experiment effect. The proposed model is analogous to the standard two-way analysis of variance model and is fully detailed in the supplementary material of Fernández et al. (2012).

For each of the 10 experiments, we test the hypothesis (6) using the function cond.test of our software. The following code gives the p-values for each experiment. Results are summarized in Table 5.

> data(cirgenes)
> kappas <- c(2.64773, 3.24742, 2.15936, 4.15314, 4.54357,
+     29.07610, 6.51408, 14.19445, 5.66920, 11.12889)
> allresults <- list()
> resultIsoCIRE <- matrix(ncol = ncol(cirgenes), nrow = nrow(cirgenes))
> SCEs <- vector(mode = "numeric", length = nrow(cirgenes))
> pvalues <- vector(mode = "numeric", length = nrow(cirgenes))
> for (i in 1:nrow(cirgenes)) {
+     k <- kappas[i]
+     genes <- as.numeric(cirgenes[i, !is.na(cirgenes[i, ])])
+     allresults[[i]] <- cond.test(genes, kappa = k)
+     resultIsoCIRE[i, !is.na(cirgenes[i, ])] <- unlist(allresults[[i]]$CIRE)
+     SCEs[i] <- allresults[[i]]$SCE
+     pvalues[i] <- allresults[[i]]$pvalue
+ }
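The loop above relies on a common pattern: for each experiment, only the genes with an observed phase angle are passed to the test, and the results are written back to the matching columns. The following base R sketch isolates that pattern on a small made-up matrix; the object `toy` and its values are purely illustrative, and an identity step stands in for the call to the test so that the sketch runs without the package.

```r
# Toy stand-in for the gene data: rows are experiments, columns are genes,
# and NA marks a gene that was not measured in that experiment.
toy <- rbind(c(0.3,  NA, 2.1, 4.0),
             c(1.0, 1.5,  NA,  NA))

fitted <- matrix(NA_real_, nrow = nrow(toy), ncol = ncol(toy))
n_used <- integer(nrow(toy))

for (i in seq_len(nrow(toy))) {
  observed <- !is.na(toy[i, ])   # logical mask of measured genes
  angles   <- toy[i, observed]   # only these would be passed to the test
  # Placeholder for the analysis step: the angles are returned unchanged.
  fitted[i, observed] <- angles
  n_used[i] <- length(angles)
}

n_used   # 3 genes used in the first experiment, 2 in the second
```

Writing results back through the same logical mask keeps every estimate aligned with its gene column, while unmeasured genes remain NA in the result matrix.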

From the p-values in Table 5 we see that the null hypothesis cannot be rejected in any of the 10 experiments, even at a level of significance as high as 0.20. Therefore it seems plausible that the peak expressions of these 16 genes in S. pombe (fission yeast) follow the same order as their S. cerevisiae (budding yeast) orthologs.

5. Conclusions

In this paper the R package isocir has been presented. This package provides useful tools for drawing inferences from circular data under order restrictions. There are two main functions (CIRE and cond.test). The first one computes the CIRE, the circular version of the widely known isotonic regression in $\mathbb{R}^q$. The second one is designed for testing circular hypotheses using a conditional test. We have also created the class isocir in order to properly save all the results. Although we illustrated the proposed methodology using an example from cell biology, the proposed software can be applied to a wide range of contexts. For example, biologists working on circadian clocks may be interested in testing for the conservation of circular order among circadian genes between two tissues (e.g., Liu et al. (2006)).

We would like to emphasize that the field of constrained inference on a unit circle is in its infancy and is wide open for new developments, both in methods and in applications. As observed in the Introduction, such constrained inference problems arise naturally in many applications. Therefore, we expect the software described in this paper to be widely used by researchers working in such areas.

Acknowledgments

SB's, MAF's and CR's research was partially supported by Spanish MCI grant MTM2009-11161. SB's work has been partially financed by Junta de Castilla y León, Consejería de Educación and the European Social Fund within the P.O. Castilla y León 2007-2013 programme. SDP's research [in part] was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01 ES101744-04). We thank Dr. Leping Li, Keith Shockley, the anonymous reviewers, the associate editor and the editor for several useful comments which improved the presentation of this manuscript.

References

Agostinelli C, Lund U (2011). circular: Circular Statistics. CA: Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, Italy. UL: Department of Statistics, California Polytechnic State University, San Luis Obispo, California, USA. R package version 0.4-3, URL https://r-forge.r-project.org/projects/circular/.

Balabdaoui F, Rufibach K, Santambrogio F (2009). OrdMonReg: Compute Least Squares Estimates of One Bounded or Two Ordered Isotonic Regression Curves. R package version 1.0.2, URL http://CRAN.R-project.org/package=OrdMonReg.

Boles L, Lohmann K (2003). "True Navigation and Magnetic Maps in Spiny Lobsters." Nature, 421, 60–63.

Bowers J, Morton I, Mould G (2000). "Directional Statistics of the Wind and Waves." Applied Ocean Research, 22, 13–30.

Cermakian N, Lamont E, Bourdeau P, Boivin D (2011). "Circadian Clock Gene Expression in Brain Regions of Alzheimer's Disease Patients and Control Subjects." Journal of Biological Rhythms, 26, 160–170.

Chasalow S (2010). combinat: Combinatorics Utilities. R package version 0.0-8, URL http://CRAN.R-project.org/package=combinat.

Cochran W, Mouritsen H, Wikelski M (2004). "Migrating Songbirds Recalibrate Their Magnetic Compass Daily from Twilight Cues." Science, 304, 405–408.

de Leeuw J, Hornik K, Mair P (2009). "Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods." Journal of Statistical Software, 32(5), 1–24. URL http://www.jstatsoft.org/v32/i05/.

Fernández M, Rueda C, Peddada S (2012). "Identification of a core set of signature cell cycle genes whose relative order of time to peak expression is conserved across species." Nucl. Acids Res., 40(7), 2823–2832. URL http://nar.oxfordjournals.org/content/40/7/2823.

Jammalamadaka S, SenGupta A (2001). Topics in Circular Statistics. World Scientific.

Jensen JL, Jensen T, Lichtenberg U, Brunak S, Bork P (2006). "Co-evolution of transcriptional and post-translational cell-cycle regulation." Nature, 443, 594–597.

Kibiak T, Jonas C (2007). "Applying Circular Statistics to the Analysis of Monitoring Data." European Journal of Psychological Assessment, 23, 227–237.

Liu D, Peddada S, Li L, Weinberg C (2006). "Phase Analysis of Circadian-Related Genes in Two Tissues." BMC Bioinformatics, 7, 87.

Liu D, Umbach D, Peddada S, Li L, Crockett P, Weinberg C (2004). "A Random Periods Model for Expression of Cell-Cycle Genes." PNAS, 101(19), 7240–7245.

Lund U, Agostinelli C (2009). CircStats: Circular Statistics, from "Topics in Circular Statistics" (2001). R package version 0.2-4, URL http://CRAN.R-project.org/package=CircStats.

Mardia K, Hughes G, Taylor C, Singh H (2008). "A Multivariate von Mises Distribution with Applications to Bioinformatics." Canadian Journal of Statistics, 36, 99–109.

Mardia K, Jupp P (2000). Directional Statistics. John Wiley & Sons.

Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, Skiena S, Futcher B, Leatherwood J (2005). "The Cell-Cycle-Regulated Genes of Schizosaccharomyces Pombe." Plos Biology, 3, 1239–1260.

Peng X, Karuturi R, Miller L, Lin K, Jia Y, Kondu P, Wang L, Wong L, Liu E, Balasubramanian M, Liu J (2005). "Identification of Cell Cycle-Regulated Genes in Fission Yeast." The American Society for Cell Biology, 16, 1026–1042.

R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.

Robertson T, Wright F, Dykstra R (1988). Order Restricted Statistical Inference. John Wiley & Sons.

Rueda C, Fernández M, Peddada S (2009). "Estimation of Parameters Subject to Order Restrictions on a Circle with Application to Estimation of Phase Angles of Cell-Cycle Genes." Journal of the American Statistical Association, 104(485), 338–347.

Rustici G, Mata J, Kivinen K, Lió P, Penkett C, Burns G, Hayles J, Brazma A, Nurse P, Bähler J (2004). "Periodic Gene Expression Program of the Fission Yeast Cell Cycle." Nature Genetics, 36, 809–817.

Turner R (2009). Iso: Functions to Perform Isotonic Regression. R package version 0.0-8, URL http://CRAN.R-project.org/package=Iso.

van Doorn E, Dhruva B, Sreenivasan K, Cassella V (2000). "Statistics of Wind Direction and Its Increments." Physics of Fluids, 12, 1529–1534.

Zar J (1999). Biostatistical Analysis. Prentice Hall.

Unrestricted Circular Means (rows: experiments; columns: genes; NA marks a value that is not available)

Experiment       ssb1   cdc22  msh6   psm3   rad21  cig2   mik1   h3.3   hhf1   hht3   hta2   htb1   fkh2   chs2   sid2   slp1
                 θ1     θ2     θ3     θ4     θ5     θ6     θ7     θ8     θ9     θ10    θ11    θ12    θ13    θ14    θ15    θ16
 1 Oliva cdc     0.202  0.218  6.262  5.765  0.893  5.612  6.257  1.178  0.912  1.200  0.971  1.289  5.298  5.597  4.252  5.209
 2 Oliva elut1   2.940  3.262  2.811  2.848  1.603  2.382  1.709  4.689  4.355  4.717  4.419  4.397  1.600  1.819  1.751  2.519
 3 Oliva elut2   0.440  0.447  5.258  6.206  4.381  5.458  6.045  1.541  0.727  6.115  0.352  0.687  3.935  3.970  5.836  5.896
 4 Peng cdc      3.328  3.565  3.387  2.806  3.194  3.260  3.026  4.779  4.694  4.755  4.816  4.675  2.685  2.770  2.885  2.422
 5 Peng elut     3.333  3.912  3.894  3.444  3.648  3.970  4.296  5.189  5.060  5.144  5.216  5.243  3.338  3.607  3.082  3.185
 6 Rust cdc1     1.965  2.151  2.034  2.029  1.742  2.073  1.730  3.130  2.993  3.085  3.063  2.873  1.281  1.179  1.906  1.237
 7 Rust cdc2     1.809  2.208  1.415  1.351  1.964  1.940  1.978  3.745  3.584  3.670  3.479  3.590  1.383  1.456  1.064  1.396
 8 Rust elut1    NA     1.457  1.288  NA     1.530  1.374  1.379  2.420  2.279  2.409  2.311  2.245  1.010  1.146  NA     1.091
 9 Rust elut2    2.214  1.786  1.730  1.987  1.878  1.882  3.071  2.704  2.788  2.814  2.909  2.740  1.352  1.441  1.275  1.420
10 Rust elut3    2.340  2.702  2.704  2.526  2.979  2.319  2.284  3.773  3.567  3.636  3.465  3.431  1.981  1.716  2.523  2.118

Table 4: Initial S. Pombe phase angle data for each experiment.

CIRE under Circular Order, SCE and p-value (rows: experiments; columns: genes; NA marks a value that is not available)

Experiment       φ1     φ2     φ3     φ4     φ5     φ6     φ7     φ8     φ9     φ10    φ11    φ12    φ13    φ14    φ15    φ16    SCE    p-value
 1 Oliva cdc     6.257  6.257  6.257  6.257  0.054  0.054  0.054  1.045  1.045  1.085  1.085  1.289  5.069  5.069  5.069  5.209  1.270  0.6658
 2 Oliva elut1   2.526  2.526  2.526  2.526  2.526  2.526  2.526  4.515  4.515  4.515  4.515  4.515  1.600  1.785  1.785  2.519  1.218  0.7214
 3 Oliva elut2   5.849  5.849  5.849  5.849  5.849  5.849  6.045  0.598  0.598  0.598  0.598  0.687  3.935  3.970  5.836  5.849  2.660  0.2437
 4 Peng cdc      3.225  3.225  3.225  3.225  3.225  3.225  3.225  4.736  4.736  4.749  4.749  4.749  2.685  2.693  2.693  2.693  0.248  0.9983
 5 Peng elut     3.333  3.725  3.725  3.725  3.725  3.970  4.296  5.124  5.124  5.144  5.216  5.243  3.302  3.302  3.302  3.302  0.156  0.9850
 6 Rust cdc1     1.961  1.961  1.961  1.961  1.961  1.961  1.961  3.029  3.029  3.029  3.029  3.029  1.230  1.230  1.571  1.571  0.213  0.4142
 7 Rust cdc2     1.693  1.693  1.693  1.693  1.952  1.952  1.978  3.614  3.614  3.614  3.614  3.614  1.301  1.301  1.301  1.396  0.296  0.9536
 8 Rust elut1    NA     1.373  1.373  NA     1.427  1.427  1.427  2.333  2.333  2.333  2.333  2.333  1.010  1.118  NA     1.118  0.028  0.9992
 9 Rust elut2    1.909  1.909  1.909  1.916  1.916  1.916  2.837  2.837  2.837  2.837  2.837  2.837  1.352  1.358  1.358  1.420  0.125  0.9992
10 Rust elut3    2.340  2.585  2.585  2.585  2.585  2.585  2.585  3.574  3.574  3.574  3.574  3.574  1.849  1.849  2.321  2.321  0.269  0.8748

Table 5: CIRE, SCE and p-values for each experiment.

Affiliation:

Sandra Barragán, Miguel A. Fernández, Cristina Rueda
Departamento de Estadística e Investigación Operativa
Instituto de Matemáticas (IMUVA)
Universidad de Valladolid
Valladolid, Spain
E-mail: sandraba@eio.uva.es, miguelaf@eio.uva.es, crueda@eio.uva.es
URL: http://www.eio.uva.es/~sandra/, http://www.eio.uva.es/~miguel/, http://www.eio.uva.es/~cristina/

Shyamal D. Peddada
Biostatistics Branch
National Institute of Environmental Health Sciences
Research Triangle Park, NC 27709, USA
E-mail: peddada@niehs.nih.gov
URL: http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm


Figure 1: Phases of a cell cycle.

Genes participating in the cell division cycle are often called cell cycle genes. A cell cycle gene has a periodic expression, with its peak expression occurring just before its biological function (Jensen et al. (2006)). Since the periodic expression of a cell cycle gene can be mapped onto a unit circle, the angle corresponding to its peak expression is known as the phase angle of the gene (see Liu et al. (2004) and Figure 2).

Figure 2: Phase angle (φ).

Since the cell cycle is fundamental to the growth and development of an organism, cell biologists have been interested in understanding various aspects of the cell cycle that are evolutionarily conserved. For instance, they would like to identify genes whose relative order of peak expressions is evolutionarily conserved. In order to solve such problems, Rueda et al. (2009) introduced an order restriction on the unit circle, called circular order, and extended the notion of isotonic regression estimator to circular parameter space by defining the circular isotonic regression estimator (CIRE).

[...] neurological diseases related to aging, such as Alzheimer's disease (AD) and Parkinson's disease (PD). An important aspect of such neurological diseases is the disruption of the circadian clock and the genes participating in it. Researchers are interested in testing differences in the phases of expression of circadian genes in normal and AD patients (Cermakian et al. (2011)). The methodology discussed in this paper can be used for analyzing such data. Other areas of application include the study of migratory patterns and directions of birds (Cochran et al. (2004)), the changes in wind directions (Bowers et al. (2000)), directional fluctuations in the atmosphere (van Doorn et al. (2000)), psychology (studies of mental maps or monitoring data, Kibiak and Jonas (2007)), the orientation of ridges in fingerprints, and magnetic maps (Boles and Lohmann (2003)).

Motivated by the wide range of applications and the lack of user-friendly software, in Section 3 we introduce the isocir package, programmed in the R environment (R Development Core Team (2011)), which can be downloaded from http://cran.r-project.org/web/packages/isocir/. The package provides functions which can be used for drawing inferences regarding the order of a collection of points on a unit circle. In Section 2 we describe the statistical problem and the methodology of Rueda et al. (2009) and Fernández et al. (2012). The isocir package is illustrated in Section 4 using the motivating cell cycle gene expression data. Some concluding remarks are provided in Section 5.


2. Angular Parameters under Order Constraints

2.1. Circular Order Restriction

Let $\theta = (\theta_1, \theta_2, \ldots, \theta_q)$, where each $\theta_i$, $i = 1, 2, \ldots, q$, is the sample circular mean of a random sample of size $n_i$ from a population with unknown circular mean $\phi_i$. All angles are defined in a counter-clockwise direction relative to a given pole. The mean resultant lengths for each population are denoted as $r_1, \ldots, r_q$ (see Mardia and Jupp (2000) for the definition of circular mean and mean resultant length of a set of angles). Then the problem of interest is to draw statistical inferences on $\phi_i$, $i = 1, 2, \ldots, q$, subject to the constraint that the angles $\phi_1, \phi_2, \ldots, \phi_q$ are in a counter-clockwise order on a unit circle. Thus $\phi_1$ is "followed" by $\phi_2$, which is "followed" by $\phi_3$, ..., and $\phi_q$ is "followed" by $\phi_1$. More precisely, we shall denote this simple circular order among angular parameters as follows:

$$C_{sco} = \{\phi = (\phi_1, \phi_2, \ldots, \phi_q) \in [0, 2\pi]^q : \phi_1 \preceq \phi_2 \preceq \cdots \preceq \phi_q \preceq \phi_1\}. \tag{1}$$

It is important to note that the order among the angular parameters is invariant under changes in location of the pole (the initial point of the circle). Unlike points in Euclidean space, points on a unit circle wrap around. That is, starting at the pole and traveling $2\pi$ radians around the circumference of the circle, one returns to the pole. For this reason, the circular order among points on a unit circle is preserved even if the location of the pole is shifted. This is why Rueda et al. (2009) and Fernández et al. (2012) refer to the circular order $C_{sco}$ as isotropic order. Since in this paper we will also consider more general circular order restrictions, from now on we will refer to $C_{sco}$ as simple circular order.
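The invariance to the location of the pole can be illustrated with a few lines of base R. This is only a sketch: `is_circular_order` is a hypothetical helper written for this illustration (it is not a function of isocir), and it assumes distinct angles given in radians.

```r
# Hypothetical helper (not part of isocir): TRUE if the angles in `phi`
# (radians) appear in counter-clockwise order starting from phi[1].
is_circular_order <- function(phi) {
  # Measure every angle as a counter-clockwise offset from phi[1].
  offsets <- (phi - phi[1]) %% (2 * pi)
  # The order holds when these offsets never decrease.
  all(diff(offsets) >= 0)
}

phi   <- c(0.5, 1.2, 3.0, 5.9)
shift <- 2.0  # move the pole by 2 radians

ordered_before <- is_circular_order(phi)
ordered_after  <- is_circular_order((phi + shift) %% (2 * pi))
not_ordered    <- is_circular_order(c(0.5, 3.0, 1.2, 5.9))
```

Shifting every angle by the same amount changes the numeric values (the last angle wraps past $2\pi$) but not the answer, whereas swapping two angles breaks the order.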

As a consequence of the geometry, a circle can never be linearized, and hence methods developed for Euclidean space data are not applicable to circular data. The problem is even more challenging when the angular parameters are constrained by an order around the circle, such as $C_{sco}$. General methodology for circular data when there are no constraints on the angular parameters can be found in the book by Mardia and Jupp (2000), among others. Constrained inference for circular data is rather recent (Rueda et al. (2009) and Fernández et al. (2012)). As noted in Rueda et al. (2009), standard Euclidean space methods, such as the pool adjacent violators algorithm (PAVA) used for computing isotonic regression (see Robertson et al. (1988) for details), cannot be applied to circular data.

For example, when a cell biologist is investigating a large number of cell cycle genes, it may be difficult to ascertain the circular order among all cell cycle genes under consideration. However, based on the underlying biology, the investigator may a priori know the circular order among groups of genes, but not the order among genes within each group. In such situations a partial circular order $C_{pco}$, as defined below, can be used:

$$C_{pco} = \left\{ \phi \in [0, 2\pi]^q :
\begin{Bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_{l_1} \end{Bmatrix}
\preceq
\begin{Bmatrix} \phi_{l_1+1} \\ \phi_{l_1+2} \\ \vdots \\ \phi_{l_1+l_2} \end{Bmatrix}
\preceq \cdots \preceq
\begin{Bmatrix} \phi_{l_1+\cdots+l_{L-1}+1} \\ \phi_{l_1+\cdots+l_{L-1}+2} \\ \vdots \\ \phi_{l_1+\cdots+l_L} \end{Bmatrix}
\preceq
\begin{Bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_{l_1} \end{Bmatrix}
\right\}. \tag{2}$$

In this case we have $L$ sets of parameters, with $l_j$ angular parameters in set $j$ and $q = \sum_{j=1}^{L} l_j$. The order among the parameters within a set is not known, but every parameter in a given set is "followed" by every parameter in the next set.

2.2. Estimation and Testing under Circular Order Restrictions

Analogous to PAVA for Euclidean data, Rueda et al. (2009) derived a circular isotonic regression estimator (CIRE) for estimating angular parameters $(\phi_1, \phi_2, \ldots, \phi_q)$ subject to a circular order. The CIRE of $\phi = (\phi_1, \phi_2, \ldots, \phi_q)'$ under the constraint $(\phi_1, \phi_2, \ldots, \phi_q)' \in C_{sco}$ is given by

$$\tilde{\phi} = \arg\min_{\phi \in C_{sco}} SCE(\phi, \theta), \tag{3}$$

where $SCE(\phi, \theta)$, defined below, is the Sum of Circular Errors, a circular analog of the Sum of Squared Errors (SSE) used for Euclidean data:

$$SCE(\phi, \theta) = \sum_{i=1}^{q} r_i \left(1 - \cos(\phi_i - \theta_i)\right), \tag{4}$$

where the $r_i$ are the mean resultant lengths. The CIRE is implemented in the function CIRE of the package isocir.
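As a concrete reading of the Sum of Circular Errors, $\sum_i r_i (1 - \cos(\phi_i - \theta_i))$, the following base R sketch evaluates it directly. The name `sce_sketch` and the default of unit mean resultant lengths are assumptions made for this illustration; the package's own `sce` function is not reproduced here.

```r
# Sum of Circular Errors: sum_i r_i * (1 - cos(phi_i - theta_i)).
# `phi`, `theta`: angles in radians; `r`: mean resultant lengths
# (assumed to be 1 when not supplied, an illustrative default).
sce_sketch <- function(phi, theta, r = rep(1, length(theta))) {
  stopifnot(length(phi) == length(theta), length(r) == length(theta))
  sum(r * (1 - cos(phi - theta)))
}

theta <- c(0.2, 1.5, 3.1)
sce_same     <- sce_sketch(theta, theta)           # identical vectors
sce_opposite <- sce_sketch(0, pi)                  # diametrically opposite angles
sce_wrapped  <- sce_sketch(theta + 2 * pi, theta)  # a full turn added to each angle
```

Because only the cosine of the difference enters, the error does not change when an angle is shifted by a full turn, and each term is bounded by $2 r_i$, which is reached for diametrically opposite angles.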

Just as the normal distribution is commonly used for Euclidean space data, the von Mises distribution is widely used for describing angular data on a unit circle. Accordingly, for $i = 1, 2, \ldots, q$, throughout this paper we assume that the $\theta_i$ are independently distributed according to a von Mises distribution, denoted as $M(\phi_i, \kappa)$, where $\phi_i$ is the modal direction and $\kappa$ is the concentration parameter (see Mardia and Jupp (2000)). Under such an assumption, Fernández et al. (2012) developed a conditional test for testing the following hypotheses:

$$H_0: \phi \in C_{sco} \qquad H_1: H_0 \text{ is not true.}$$

The conditional test statistic is given by $T^* = 2\hat{\kappa}\, SCE(\tilde{\phi}, \theta)/q$, where $\hat{\kappa}$ is the estimator of $\kappa$ and $\tilde{\phi}$ is the CIRE computed under $H_0$. The estimate $\tilde{\phi}$ determines a partition of $\wp = \{1, \ldots, q\}$ into sets of coordinates on which $\tilde{\phi}$ is constant. These sets are called level sets. The rejection region for the conditional $\alpha$-level test is given by

$$\text{Reject } H_0 \text{ if } T^* \ge c(m),$$

where $m$ is the number of level sets for $\tilde{\phi}$ and, for large values of $\kappa$, the approximate critical value $c(m)$ is chosen so that

$$\mathrm{pr}\left(F_{q-m,\,q-1} \ge c(m)\right) = \frac{\alpha}{1 - 1/(q-1)!}, \tag{5}$$

where $F_{q-m,\,q-1}$ represents the central F random variable with $(q-m, q-1)$ degrees of freedom. The above test statistic is proportional to a chi-square test when $\kappa$ is known. For details one may refer to Fernández et al. (2012). The above methodology can be modified to test

$$H_0: \phi \in C_{pco} \qquad H_1: H_0 \text{ is not true}$$

by replacing $1/(q-1)!$ by $(l_1!\, l_2! \cdots l_L!)/(q-1)!$.
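A minimal sketch of how the approximate critical value could be obtained with base R's F quantile function, assuming that the right-hand side of (5) reads $\alpha / (1 - 1/(q-1)!)$ for the simple circular order. The function name `critical_value_sketch` is hypothetical; this is not the code used inside isocir.

```r
# Approximate critical value c(m) for the simple circular order, solving
# pr(F_{q-m, q-1} >= c(m)) = alpha / (1 - 1 / (q - 1)!)   (assumed form of (5)).
critical_value_sketch <- function(alpha, q, m) {
  stopifnot(q >= 3, m >= 1, m < q)
  tail_prob <- alpha / (1 - 1 / factorial(q - 1))
  # Upper-tail quantile of the central F with (q - m, q - 1) degrees of freedom.
  qf(tail_prob, df1 = q - m, df2 = q - 1, lower.tail = FALSE)
}

c_m <- critical_value_sketch(alpha = 0.05, q = 8, m = 5)

# Plugging c_m back into the F upper tail recovers the target probability.
recovered <- pf(c_m, df1 = 8 - 5, df2 = 8 - 1, lower.tail = FALSE)
```

For moderate $q$ the factor $1 - 1/(q-1)!$ is very close to 1, so the critical value is essentially the usual upper-$\alpha$ quantile of the corresponding F distribution.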

The simulation study performed in Fernández et al. (2012) suggests that the power of this test is quite reasonable. Notice that, for a given data set, the p-value obtained by using the above methodology may serve as a useful goodness-of-fit criterion when comparing two or more plausible circular orders among a set of angular parameters. Larger p-values may suggest that the estimates are closer to the presumed circular order. Thus the statistical methodology developed in Fernández et al. (2012) can be used not only for testing the relative order among the parameters, but also for selecting a "best fitting" circular order among several candidate circular orders.

These tests are implemented in the function cond.test in the R package isocir, introduced in the next section.

3. Package isocir

We start this section by giving some background on R packages for isotonic regression and analysis of circular data. We then describe the structure of our package isocir and illustrate it with some examples.

3.1. Related Packages

As isotonic regression is a well-known and widely used technique, there are many packages in R for performing isotonic regression, such as:

isotone (de Leeuw et al. (2009)): Active set and generalized PAVA for isotone optimization.

Iso (Turner (2009)): Functions to perform isotonic regression.

OrdMonReg (Balabdaoui et al. (2009)): Compute least squares estimates of one bounded or two ordered isotonic regression curves.

Similarly, there are several packages in R for analyzing circular data, such as:

CircStats (Lund and Agostinelli (2009)): The implementation of the circular statistics from "Topics in Circular Statistics", Jammalamadaka and SenGupta (2001). It is an R port of the S-plus library with the same name.

circular (Agostinelli and Lund (2011)): This package expands the CircStats package in several ways.

Since none of the existing packages for circular data are applicable for analyzing circular data under constraints, in this article we introduce the software package "isotonic inference for circular data", with the acronym isocir. Our package depends on circular (see Agostinelli and Lund (2011)) and combinat (see Chasalow (2010)). These packages should be installed on the computer before loading isocir.

3.2. Package Structure

For the convenience of the reader, we summarize all the functions, arguments and descriptions of our package isocir in Table 1.

Functions      Arguments                                Description
sce            (arg1, arg2, meanrl)                     Sum of Circular Errors
mrl            (data)                                   Mean Resultant Length
CIRE           (data, groups, circular)                 Circular Isotonic Regression Estimator
cond.test      (data, groups, kappa)                    Conditional Test
isocir         (cirmeans, SCE, CIRE, pvalue, kappa)     Creates an object of class isocir
is.isocir      (x)                                      Checks an object
print.isocir   (x, decCIRE, decpvalue, deckappa, ...)   Prints an object of class isocir
plot.isocir    (x, option, ...)                         Plots an object of class isocir

Table 1: Summary of the components of the package isocir.

In the following we describe each function of the software in detail.

Functions sce() and mrl()

The auxiliary function sce computes the sum of circular errors between a given q-dimensional vector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2). The function mrl computes the mean resultant length for the input data.
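For readers unfamiliar with these summaries, the sketch below computes a circular mean direction and a mean resultant length from their textbook definitions in base R. The helper `circ_summary_sketch` is hypothetical and only illustrates the quantities; it is not the implementation of `mrl` in the package.

```r
# Hypothetical helper (not the package's mrl): circular mean direction and
# mean resultant length of a sample of angles given in radians.
circ_summary_sketch <- function(angles) {
  C <- mean(cos(angles))
  S <- mean(sin(angles))
  list(
    mean = atan2(S, C) %% (2 * pi),  # mean direction mapped into [0, 2*pi)
    mrl  = sqrt(C^2 + S^2)           # mean resultant length, between 0 and 1
  )
}

s <- circ_summary_sketch(c(0, pi / 2))

# Two angles just either side of the pole: the arithmetic mean is pi,
# but the circular mean direction stays next to the pole.
wrap <- circ_summary_sketch(c(0.1, 2 * pi - 0.1))
```

A mean resultant length close to 1 indicates tightly concentrated angles, while values near 0 indicate angles spread around the circle.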

Function CIRE()

Using the methodology developed in Rueda et al. (2009), this R function computes the CIRE for a given circular order (1) or a partial order (2). The arguments of this function are summarized in Table 2.

Arguments   Values
data        vector or matrix with the data
groups      the groups of the order
circular    =TRUE (by default), =FALSE

Table 2: Arguments of the CIRE function.

The input variable data is a matrix where each column contains the vector of unconstrained angular means corresponding to each replication. If there is only one replication, then data is a vector. Position $i$ in the vector groups contains the number of the group to which the parameter $\phi_i$ belongs. The logical argument circular sets whether the order is wrapped around the circle, i.e., circular order (circular=TRUE), or not, i.e., simple order (circular=FALSE). For example, the simple order cone in the circle, $C_{so} = \{\phi = (\phi_1, \phi_2, \ldots, \phi_q) \in [0, 2\pi]^q : 0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_q \le 2\pi\}$, would be a non-circular order. The output of this function is an object of class isocir (explained later) containing the circular isotonic regression estimator ($\tilde{\phi}$), the unrestricted circular means ($\theta$) and the corresponding sum of circular errors ($SCE(\tilde{\phi}, \theta)$).
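The encoding of an order through the groups argument can be made concrete with base R alone. The sketch below, which does not call isocir, shows how a vector of group numbers maps parameter positions to the sets of a partial circular order; the variable names are illustrative.

```r
# Partial circular order with four sets:
# {phi1, phi2, phi3}, then {phi4, phi5}, then {phi6}, then {phi7, phi8}.
# Position i holds the group number of parameter phi_i.
groups <- c(1, 1, 1, 2, 2, 3, 4, 4)

# List the parameter indices that belong to each set of the order.
sets <- split(seq_along(groups), groups)

# A simple circular order puts every parameter in its own group.
simple_groups <- seq_len(8)
```

Here `sets` has one element per group, so `sets[["2"]]` contains the positions 4 and 5, and the group sizes correspond to the $l_j$ of the partial order.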

Function cond.test()

This function performs the conditional test and computes the corresponding p-value for the following hypotheses:

H0: The angles $\phi_1, \ldots, \phi_q$ follow a (simple or partial) circular order.

H1: H0 is not true.

The arguments of this function appear in Table 3 and are explained below.

Arguments     κ known                  κ unknown
data          numeric vector           matrix (as many columns as replications)
groups        numeric vector with the groups of the order to be tested
kappa         positive numeric value   (NULL)
biasCorrect   (NULL)                   =TRUE (by default), =FALSE

Table 3: Arguments of the cond.test function.

Arguments data and groups are the same as those in function CIRE, although in this function groups is the order to be tested instead of the known order. The argument kappa is needed only when data is a vector. If there are no replications in data, the value of $\kappa$ must be set by the user. Even when there are replicated data, if the user knows the value of $\kappa$ it may be supplied, and it will be taken into consideration to perform the conditional test. When $\kappa$ is unknown and there are replicated data, the parameter is internally estimated by maximum likelihood and $\hat{\kappa}$ is shown in the output. The argument biasCorrect is related to the estimation of $\kappa$. If biasCorrect=TRUE, the bias correction appearing in Mardia and Jupp (2000), p. 87, is performed in the estimation of $\kappa$. The output of this function is an object of class isocir (explained below) with all the results from the conditional test: the CIRE ($\tilde{\phi}$), the unrestricted circular means ($\theta$), the SCE ($SCE(\tilde{\phi}, \theta)$), the kappa value (estimated or supplied) and the p-value of the conditional test.

Class isocir

Finally, we describe the isocir function. This function creates S3 objects of class isocir, which are lists with the following elements:

$cirmeans is a list with the unrestricted circular means. Notice that when the argument data is a vector, these values match exactly with the input. However, if there are replicated data, the argument data is a matrix and $cirmeans contains the corresponding unrestricted circular means $(\theta_1, \ldots, \theta_q)$.

$SCE is the value of the Sum of Circular Errors ($SCE(\tilde{\phi}, \theta)$).

$CIRE is a list with the Circular Isotonic Regression Estimator ($\tilde{\phi}$) obtained under the order defined by the groups argument.

$kappa is the value of kappa (either set by the user or estimated).

$pvalue is the p-value of the conditional test obtained from the function cond.test.

These objects of class isocir are the output of the functions CIRE and cond.test. The last two elements of the list ($kappa and $pvalue) are NULL if the object comes from the function CIRE. Otherwise, if the object comes from the function cond.test, it contains not only the results of the conditional test ($kappa and $pvalue) but also an attribute called estkappa, which informs (or rather reminds) the user whether the value in $kappa has been internally estimated or supplied as a known input.

Some S3 methods have also been defined for the class isocir:

isocir(cirmeans = NULL, SCE = NULL, CIRE = NULL, pvalue = NULL, kappa = NULL): This function creates an object of class isocir.

is.isocir(x): This function checks whether the object x is of class isocir.

print.isocir(x, decCIRE, decpvalue, deckappa, ...): This S3 method is used to print an object x of class isocir. The number of decimal places can be chosen.

plot.isocir(x, option = c("CIRE", "cirmeans"), ...): This S3 method is used to plot an object x of class isocir. The argument option gives the user the choice of plotting the points of the Circular Isotonic Regression Estimator (by default) or the unrestricted circular means.

3.3. Examples

In this section we provide examples to illustrate isocir.

Example 1

Suppose the observed angular means of eight populations are given by

$\theta_1 = 0.025$, $\theta_2 = 1.475$, $\theta_3 = 3.274$, $\theta_4 = 5.518$, $\theta_5 = 2.859$, $\theta_6 = 5.387$, $\theta_7 = 4.179$, $\theta_8 = 1.962$.

We illustrate isocir for estimating the 8 population angular parameters under the following partial circular order constraint:

$$\{\phi_1, \phi_2, \phi_3\} \preceq \{\phi_4, \phi_5\} \preceq \{\phi_6\} \preceq \{\phi_7, \phi_8\} \preceq \{\phi_1, \phi_2, \phi_3\}.$$

These data are a set of random circular data, called cirdata in our package, and they can be loaded as follows:

> data(cirdata)
> cirdata

Since in this example there are no replications, we provide data in vector format. The groups of the order are defined as follows:

> orderGroups <- c(1, 1, 1, 2, 2, 3, 4, 4)

We then obtain the CIRE using the function CIRE as follows:

> example1CIRE <- CIRE(cirdata, groups = orderGroups, circular = TRUE)

The output is saved in example1CIRE and can be printed as follows:

> example1CIRE

The constrained estimates satisfy the required order:

$\tilde{\phi}_1 = 0.993$, $\tilde{\phi}_2 = 1.475$, $\tilde{\phi}_3 = 3.066$, $\tilde{\phi}_4 = 5.056$, $\tilde{\phi}_5 = 3.066$, $\tilde{\phi}_6 = 5.056$, $\tilde{\phi}_7 = 5.056$, $\tilde{\phi}_8 = 0.993$,

where $\tilde{\phi}$ is the Circular Isotonic Regression Estimator of $\phi$. Results may be displayed graphically by calling plot(example1CIRE), which produces a plot with the points of the CIRE. To see the plot for the unrestricted estimates, the argument option = "cirmeans" can be used (i.e., plot(example1CIRE, option = "cirmeans")).

Example 2 (κ unknown, replications needed)

Using the data in our package called datareplic, we demonstrate the use of the function cond.test when $\kappa$ is unknown. As remarked earlier, when $\kappa$ is unknown we need replicate data to estimate $\kappa$. The object datareplic is a matrix where each column contains the values of a replication and each row contains the angles observed for one population. In this example we have 8 populations, and the hypotheses regarding the corresponding 8 parameters are as follows:

$$H_0: \phi_1 \preceq \phi_2 \preceq \phi_3 \preceq \phi_4 \preceq \phi_5 \preceq \phi_6 \preceq \phi_7 \preceq \phi_8 \preceq \phi_1 \qquad H_1: H_0 \text{ is not true.}$$

We take the data from the package and set the groups of the order in the argument groups:

> data(datareplic)
> orderGroups2 <- c(1:8)

Since replicate data are available, we do not include the argument kappa, as we want the function to estimate it. Moreover, we correct the bias in the estimation of $\kappa$, so we set biasCorrect=TRUE. Thus we have the following code:

> example2test <- cond.test(datareplic, groups = orderGroups2, biasCorrect = TRUE)
> example2test

The result is the p-value defined in (5). Since the p-value = 0.0034, we reject the null hypothesis and conclude that the parameters do not satisfy the specified circular order.

If the user is interested in printing the unrestricted circular means $\theta$, then the following command is used: example2test$cirmeans. The result is a list that is saved in the same format as CIRE. Since each group in the circular order has a single element, it is convenient to use the vector format. Hence we have:

> round(unlist(example2test$cirmeans), digits = 3)

4. Application to Analysis of the Cell Cycle Gene Expression Data

As noted in the Introduction, there has been considerable discussion in the literature on the conservation of various aspects of cell cycle genes (Fernández et al. (2012)), particularly between two yeast species, namely S. Cerevisiae (budding yeast) and S. Pombe (fission yeast). Using the 10 published fission yeast data sets (Rustici et al. (2004), Oliva et al. (2005), Peng et al. (2005)), we illustrate the isocir package to test the null hypothesis that 16 fission yeast genes, namely ssb1, cdc22, msh6, psm3, rad21, cig2, mik1, h3.3, hhf1, hht3, hta2, htb1, fkh2, chs2, sid2 and slp1, satisfy the same circular order as their budding yeast orthologs (RFA1, RNR1, MSH6, SMC3, MCD1, CLN2, SWE1, HHT2, HHF1, HHT1, HTA2, HTB2, FKH1, CHS2, DBF2 and CDC20), whose circular order is obtained from cyclebase (http://www.cyclebase.org/) and published literature. Thus we test the following hypothesis:

$$\begin{aligned}
H_0 &: \phi_{ssb1} \preceq \phi_{cdc22} \preceq \phi_{msh6} \preceq \phi_{psm3} \preceq \phi_{rad21} \preceq \phi_{cig2} \preceq \phi_{mik1} \preceq \phi_{h3.3} \preceq \phi_{hhf1} \preceq \phi_{hht3} \preceq \phi_{hta2} \preceq \phi_{htb1} \preceq \phi_{fkh2} \preceq \phi_{chs2} \preceq \phi_{sid2} \preceq \phi_{slp1} \preceq \phi_{ssb1} \\
H_1 &: H_0 \text{ is not true.}
\end{aligned} \tag{6}$$

For each of the 10 experimental data sets, the unconstrained estimates of the phase angles of the above 16 fission yeast genes, appearing in Table 4, were obtained using the Random Periods Model (Liu et al. (2004)). The R code for that software can be obtained from http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm.

Notice that there are no replicated data here, since the experiments were not performed under the same experimental conditions. It appears that the 10 experiments were not synchronized (i.e., cells were probably not arrested at the same point in the cell cycle). For this reason, from Table 4 it appears that there is a large variability in the estimates of the phase angles of each of the 16 genes. Even though there may be large variability in the estimated values, our interest is in the relative order of phase angles among the 16 genes, which does not rely on the location of the pole and hence does not rely on the synchronization. As there are no replicated data, we have a single observation for each of the 16 fission yeast genes in each experiment, and therefore the values in Table 4 play the role of the unrestricted circular mean in each experiment. Consequently, we suppose that

$$\theta_{ij} \sim_{ind} M(\phi_{ij}, \kappa_j), \qquad i = 1, 2, \ldots, 16, \quad j = 1, 2, \ldots, 10,$$

where $\theta_{ij}$ is the unrestricted circular mean of gene $i$ in experiment $j$.

Since the 10 experiments may not be considered as replications of each other, we performed a separate test for each experiment. Moreover, as explained in Fernández et al. (2012), we assume that the concentration parameter $\kappa_j$ depends on the experiment but not on the gene. The reason for this is that, out of the two sources of uncertainty, one specific to the gene and another one due to the experiment (and therefore common to all genes within the experiment), the former source may be considered negligible relative to the latter, as the number of time points used in each time course experiment is fairly large for any specific gene.

The $\kappa_j$ values considered for this example are obtained using Fernández et al. (2012). The procedure used for the computation of these values comes from an analysis-of-variance type methodology. Under the assumptions made before, the model for the circular means is

$$\phi_{ij} = \mu + \alpha_i + \beta_j,$$

where $\alpha_i$ is the gene effect and $\beta_j$ is the experiment effect. The proposed model is analogous to the standard two-way analysis of variance model and is fully detailed in the supplementary material of Fernández et al. (2012).

For each of the 10 experiments we test the hypothesis (6) using the function cond.test in our software. The following code gives the p-values for each experiment. Results are summarized in Table 5.

> data(cirgenes)
> kappas <- c(2.64773, 3.24742, 2.15936, 4.15314, 4.54357,
+   29.07610, 6.51408, 14.19445, 5.66920, 11.12889)
> allresults <- list()
> resultIsoCIRE <- matrix(ncol = ncol(cirgenes), nrow = nrow(cirgenes))
> SCEs <- vector(mode = "numeric", length = nrow(cirgenes))
> pvalues <- vector(mode = "numeric", length = nrow(cirgenes))
> for (i in 1:nrow(cirgenes)) {
+   k <- kappas[i]
+   # keep only the genes with an available phase angle in experiment i
+   genes <- as.numeric(cirgenes[i, !is.na(cirgenes[i, ])])
+   allresults[[i]] <- cond.test(genes, kappa = k)
+   # store the CIRE in the positions of the available genes
+   resultIsoCIRE[i, !is.na(cirgenes[i, ])] <- unlist(allresults[[i]]$CIRE)
+   SCEs[i] <- allresults[[i]]$SCE
+   pvalues[i] <- allresults[[i]]$pvalue
+ }

From the p-values in Table 5 we see that the null hypothesis cannot be rejected in any of the 10 experiments, even at a level of significance as high as 0.20. Therefore it seems plausible that the peak expressions of these 16 genes in S. Pombe (fission yeast) follow the same order as their S. Cerevisiae (budding yeast) orthologs.
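As a rough plausibility check of one reported p-value, the base R sketch below uses the figures given for the first experiment (Oliva cdc): concentration 2.64773, SCE 1.270, 16 genes, and 7 distinct values in the reported CIRE. It rests on an assumption that is not spelled out in the text, which only states that the statistic is proportional to a chi-square test when κ is known: that $2\kappa\,SCE$ can be referred to a chi-square distribution with $q - m$ degrees of freedom. It is not the computation performed by cond.test.

```r
# Figures quoted for experiment 1 (Oliva cdc).
kappa <- 2.64773   # concentration used for this experiment
sce   <- 1.270     # reported Sum of Circular Errors
q     <- 16        # number of genes
m     <- 7         # distinct values (level sets) in the reported CIRE

# ASSUMPTION: with kappa known, 2 * kappa * SCE is compared with a
# chi-square distribution with q - m degrees of freedom.
statistic <- 2 * kappa * sce
p_approx  <- pchisq(statistic, df = q - m, lower.tail = FALSE)
```

Under this assumption, `p_approx` can be set side by side with the p-value 0.6658 reported for that experiment; any discrepancy would point to details of the conditional test that the sketch leaves out.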

5. Conclusions

In this paper the R package isocir has been presented. This package provides useful tools for drawing inferences from circular data under order restrictions. There are two main functions (CIRE and cond.test). The first one computes the CIRE, the circular version of the widely known isotonic regression in $\mathbb{R}^q$. The second one is designed for testing circular hypotheses using a conditional test. We have also created the class isocir in order to properly save all the results. Although we illustrated the proposed methodology using an example from cell biology, the proposed software can be applied to a wide range of contexts. For example, biologists working on circadian clocks may be interested in testing for the conservation of circular order among circadian genes between two tissues (e.g., Liu et al. (2006)).

We would like to emphasize that the field of constrained inference on a unit circle is in its infancy and is wide open for new developments, both in methods and in applications. As observed in the Introduction, such constrained inference problems arise naturally in many applications. Therefore, we expect the software described in this paper to be widely used by researchers working in such areas.

Acknowledgments

SB's, MAF's and CR's research was partially supported by Spanish MCI grant MTM2009-11161. SB's work has been partially financed by Junta de Castilla y León, Consejería de Educación and the European Social Fund within the P.O. Castilla y León 2007-2013 programme. SDP's research [in part] was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01 ES101744-04). We thank Dr. Leping Li, Keith Shockley, the anonymous reviewers, the associate editor and the editor for several useful comments which improved the presentation of this manuscript.

References

Agostinelli C, Lund U (2011). circular: Circular Statistics. CA: Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, Italy. UL: Department of Statistics, California Polytechnic State University, San Luis Obispo, California, USA. R package version 0.4-3, URL https://r-forge.r-project.org/projects/circular/.

Balabdaoui F, Rufibach K, Santambrogio F (2009). OrdMonReg: Compute Least Squares Estimates of One Bounded or Two Ordered Isotonic Regression Curves. R package version 1.0.2, URL http://CRAN.R-project.org/package=OrdMonReg.

Boles L, Lohmann K (2003). "True Navigation and Magnetic Maps in Spiny Lobsters." Nature, 421, 60–63.

Bowers J, Morton I, Mould G (2000). "Directional Statistics of the Wind and Waves." Applied Ocean Research, 22, 13–30.

Cermakian N, Lamont E, Bourdeau P, Boivin D (2011). "Circadian Clock Gene Expression in Brain Regions of Alzheimer's Disease Patients and Control Subjects." Journal of Biological Rhythms, 26, 160–170.

Chasalow S (2010). combinat: Combinatorics Utilities. R package version 0.0-8, URL http://CRAN.R-project.org/package=combinat.

Cochran W, Mouritsen H, Wikelski M (2004). "Migrating Songbirds Recalibrate Their Magnetic Compass Daily from Twilight Cues." Science, 304, 405–408.

de Leeuw J, Hornik K, Mair P (2009). "Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods." Journal of Statistical Software, 32(5), 1–24. URL http://www.jstatsoft.org/v32/i05/.

Fernández M, Rueda C, Peddada S (2012). "Identification of a core set of signature cell cycle genes whose relative order of time to peak expression is conserved across species." Nucl. Acids Res., 40(7), 2823–2832. URL http://nar.oxfordjournals.org/content/40/7/2823.

Jammalamadaka S, SenGupta A (2001). Topics in Circular Statistics. World Scientific.

Jensen JL, Jensen T, Lichtenberg U, Brunak S, Bork P (2006). "Co-evolution of transcriptional and post-translational cell-cycle regulation." Nature, 443, 594–597.

Kibiak T, Jonas C (2007). "Applying Circular Statistics to the Analysis of Monitoring Data." European Journal of Psychological Assessment, 23, 227–237.

Liu D, Peddada S, Li L, Weinberg C (2006). "Phase Analysis of Circadian-Related Genes in Two Tissues." BMC Bioinformatics, 7, 87.

Liu D, Umbach D, Peddada S, Li L, Crockett P, Weinberg C (2004). "A Random Periods Model for Expression of Cell-Cycle Genes." PNAS, 101(19), 7240–7245.

Lund U, Agostinelli C (2009). CircStats: Circular Statistics, from "Topics in Circular Statistics" (2001). R package version 0.2-4, URL http://CRAN.R-project.org/package=CircStats.

Mardia K, Hughes G, Taylor C, Singh H (2008). "A Multivariate von Mises Distribution with Applications to Bioinformatics." Canadian Journal of Statistics, 36, 99–109.

Mardia K, Jupp P (2000). Directional Statistics. John Wiley & Sons.

Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, Skiena S, Futcher B, Leatherwood J (2005). "The Cell-Cycle-Regulated Genes of Schizosaccharomyces Pombe." PLoS Biology, 3, 1239–1260.

Peng X, Karuturi R, Miller L, Lin K, Jia Y, Kondu P, Wang L, Wong L, Liu E, Balasubramanian M, Liu J (2005). "Identification of Cell Cycle-Regulated Genes in Fission Yeast." The American Society for Cell Biology, 16, 1026–1042.

R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.

Robertson T, Wright F, Dykstra R (1988). Order Restricted Statistical Inference. John Wiley & Sons.

Rueda C, Fernández M, Peddada S (2009). "Estimation of Parameters Subject to Order Restrictions on a Circle with Application to Estimation of Phase Angles of Cell-Cycle Genes." Journal of the American Statistical Association, 104(485), 338–347.

Rustici G, Mata J, Kivinen K, Lio P, Penkett C, Burns G, Hayles J, Brazma A, Nurse P, Bähler J (2004). "Periodic Gene Expression Program of the Fission Yeast Cell Cycle." Nature Genetics, 36, 809–817.

Turner R (2009). Iso: Functions to Perform Isotonic Regression. R package version 0.0-8, URL http://CRAN.R-project.org/package=Iso.

van Doorn E, Dhruva B, Sreenivasan K, Cassella V (2000). "Statistics of Wind Direction and Its Increments." Physics of Fluids, 12, 1529–1534.

Zar J (1999). Biostatistical Analysis. Prentice Hall.

Unrestricted circular means (radians); "–" marks a gene with no estimate in that experiment.

Genes              ssb1  cdc22   msh6   psm3  rad21   cig2   mik1   h3.3   hhf1   hht3   hta2   htb1   fkh2   chs2   sid2   slp1
Experiment           θ1     θ2     θ3     θ4     θ5     θ6     θ7     θ8     θ9    θ10    θ11    θ12    θ13    θ14    θ15    θ16
1  Oliva cdc      0.202  0.218  6.262  5.765  0.893  5.612  6.257  1.178  0.912  1.200  0.971  1.289  5.298  5.597  4.252  5.209
2  Oliva elut1    2.940  3.262  2.811  2.848  1.603  2.382  1.709  4.689  4.355  4.717  4.419  4.397  1.600  1.819  1.751  2.519
3  Oliva elut2    0.440  0.447  5.258  6.206  4.381  5.458  6.045  1.541  0.727  6.115  0.352  0.687  3.935  3.970  5.836  5.896
4  Peng cdc       3.328  3.565  3.387  2.806  3.194  3.260  3.026  4.779  4.694  4.755  4.816  4.675  2.685  2.770  2.885  2.422
5  Peng elut      3.333  3.912  3.894  3.444  3.648  3.970  4.296  5.189  5.060  5.144  5.216  5.243  3.338  3.607  3.082  3.185
6  Rust cdc1      1.965  2.151  2.034  2.029  1.742  2.073  1.730  3.130  2.993  3.085  3.063  2.873  1.281  1.179  1.906  1.237
7  Rust cdc2      1.809  2.208  1.415  1.351  1.964  1.940  1.978  3.745  3.584  3.670  3.479  3.590  1.383  1.456  1.064  1.396
8  Rust elut1         –  1.457  1.288      –  1.530  1.374  1.379  2.420  2.279  2.409  2.311  2.245  1.010  1.146      –  1.091
9  Rust elut2     2.214  1.786  1.730  1.987  1.878  1.882  3.071  2.704  2.788  2.814  2.909  2.740  1.352  1.441  1.275  1.420
10 Rust elut3     2.340  2.702  2.704  2.526  2.979  2.319  2.284  3.773  3.567  3.636  3.465  3.431  1.981  1.716  2.523  2.118

Table 4: Initial S. pombe phase angle data for each experiment.

CIRE under the circular order (radians), with the SCE and the p-value of the conditional test; "–" marks a gene with no estimate in that experiment.

Experiment           φ1     φ2     φ3     φ4     φ5     φ6     φ7     φ8     φ9    φ10    φ11    φ12    φ13    φ14    φ15    φ16    SCE  p-value
1  Oliva cdc      6.257  6.257  6.257  6.257  0.054  0.054  0.054  1.045  1.045  1.085  1.085  1.289  5.069  5.069  5.069  5.209  1.270   0.6658
2  Oliva elut1    2.526  2.526  2.526  2.526  2.526  2.526  2.526  4.515  4.515  4.515  4.515  4.515  1.600  1.785  1.785  2.519  1.218   0.7214
3  Oliva elut2    5.849  5.849  5.849  5.849  5.849  5.849  6.045  0.598  0.598  0.598  0.598  0.687  3.935  3.970  5.836  5.849  2.660   0.2437
4  Peng cdc       3.225  3.225  3.225  3.225  3.225  3.225  3.225  4.736  4.736  4.749  4.749  4.749  2.685  2.693  2.693  2.693  0.248   0.9983
5  Peng elut      3.333  3.725  3.725  3.725  3.725  3.970  4.296  5.124  5.124  5.144  5.216  5.243  3.302  3.302  3.302  3.302  0.156   0.9850
6  Rust cdc1      1.961  1.961  1.961  1.961  1.961  1.961  1.961  3.029  3.029  3.029  3.029  3.029  1.230  1.230  1.571  1.571  0.213   0.4142
7  Rust cdc2      1.693  1.693  1.693  1.693  1.952  1.952  1.978  3.614  3.614  3.614  3.614  3.614  1.301  1.301  1.301  1.396  0.296   0.9536
8  Rust elut1         –  1.373  1.373      –  1.427  1.427  1.427  2.333  2.333  2.333  2.333  2.333  1.010  1.118      –  1.118  0.028   0.9992
9  Rust elut2     1.909  1.909  1.909  1.916  1.916  1.916  2.837  2.837  2.837  2.837  2.837  2.837  1.352  1.358  1.358  1.420  0.125   0.9992
10 Rust elut3     2.340  2.585  2.585  2.585  2.585  2.585  2.585  3.574  3.574  3.574  3.574  3.574  1.849  1.849  2.321  2.321  0.269   0.8748

Table 5: CIRE, SCE and p-values for each experiment.

Affiliation:

Sandra Barragán, Miguel A. Fernández, Cristina Rueda
Departamento de Estadística e Investigación Operativa
Instituto de Matemáticas (IMUVA)
Universidad de Valladolid
Valladolid, Spain
E-mail: sandraba@eio.uva.es, miguelaf@eio.uva.es, crueda@eio.uva.es
URL: http://www.eio.uva.es/~sandra/, http://www.eio.uva.es/~miguel/, http://www.eio.uva.es/~cristina/

Shyamal D. Peddada
Biostatistics Branch
National Institute of Environmental Health Sciences
Research Triangle Park, NC 27709, USA
E-mail: peddada@niehs.nih.gov
URL: http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm


2. Angular Parameters under Order Constraints

2.1. Circular Order Restriction

Let $\theta = (\theta_1, \theta_2, \ldots, \theta_q)$, where each $\theta_i$, $i = 1, 2, \ldots, q$, is the sample circular mean of a random sample of size $n_i$ from a population with unknown circular mean $\phi_i$. All angles are defined in a counter-clockwise direction relative to a given pole. The mean resultant lengths for each population are denoted as $r_1, \ldots, r_q$ (see Mardia and Jupp (2000) for the definition of circular mean and mean resultant length of a set of angles). The problem of interest is to draw statistical inferences on $\phi_i$, $i = 1, 2, \ldots, q$, subject to the constraint that the angles $\phi_1, \phi_2, \ldots, \phi_q$ are in a counter-clockwise order on a unit circle. Thus $\phi_1$ is "followed" by $\phi_2$, which is "followed" by $\phi_3$, and so on, until $\phi_q$, which is "followed" by $\phi_1$. More precisely, we denote this simple circular order among the angular parameters as follows:

$$C_{sco} = \{\phi = (\phi_1, \phi_2, \ldots, \phi_q) \in [0, 2\pi]^q : \phi_1 \preceq \phi_2 \preceq \cdots \preceq \phi_q \preceq \phi_1\}. \qquad (1)$$

It is important to note that the order among the angular parameters is invariant under changes in the location of the pole (the initial point of the circle). Unlike points in Euclidean space, points on a unit circle wrap around. That is, starting at the pole and traveling $2\pi$ radians around the circumference of the circle, one returns to the pole. For this reason, the circular order among points on a unit circle is preserved even if the location of the pole is shifted. This is why Rueda et al. (2009) and Fernández et al. (2012) refer to the circular order $C_{sco}$ as isotropic order. Since in this paper we also consider more general circular order restrictions, from now on we refer to $C_{sco}$ as simple circular order.
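The pole invariance of the simple circular order can be checked numerically. The following base-R sketch is illustrative only and is not part of isocir: the helper name in_circular_order is an assumption, and the criterion it uses (the counter-clockwise gaps between consecutive angles, including the return to the first one, add up to at most one revolution) is one simple way to encode membership in $C_{sco}$ for angles without ties.

```r
# Hypothetical helper (not an isocir function): TRUE when the angles, visited
# in the given sequence and back to the first one, complete at most one
# counter-clockwise revolution, i.e. they follow the simple circular order.
in_circular_order <- function(phi) {
  nxt  <- c(phi[-1], phi[1])          # cyclic successor of each angle
  gaps <- (nxt - phi) %% (2 * pi)     # counter-clockwise gap to the successor
  round(sum(gaps) / (2 * pi)) <= 1    # the gaps always sum to a multiple of 2*pi
}

phi     <- c(0.5, 1.2, 2.9, 4.0, 5.8)   # listed in counter-clockwise order
shifted <- (phi + 1) %% (2 * pi)        # same configuration rotated by 1 radian
swapped <- c(0.5, 2.9, 1.2, 4.0, 5.8)   # second and third angles exchanged

in_circular_order(phi)       # TRUE
in_circular_order(shifted)   # TRUE: rotating the pole does not change the order
in_circular_order(swapped)   # FALSE
```

With tied angles a small numerical tolerance would be needed, since a rounding error can turn a zero gap into a gap of almost $2\pi$.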

As a consequence of this geometry, a circle can never be linearized, and hence methods developed for Euclidean space data are not applicable to circular data. The problem is even more challenging when the angular parameters are constrained by an order around the circle, such as $C_{sco}$. General methodology for circular data when there are no constraints on the angular parameters can be found in Mardia and Jupp (2000), among others. Constrained inference for circular data is rather recent (Rueda et al. (2009) and Fernández et al. (2012)). As noted in Rueda et al. (2009), standard Euclidean space methods such as the pool adjacent violators algorithm (PAVA), used for computing isotonic regression (see Robertson et al. (1988) for details), cannot be applied to circular data.

In some applications the complete circular order is not known in advance. For example, when a cell biologist is investigating a large number of cell cycle genes, it may be difficult to ascertain the circular order among all cell cycle genes under consideration. However, based on the underlying biology, the investigator may know a priori the circular order among groups of genes, but not the order among genes within each group. In such situations a partial circular order $C_{pco}$, as defined below, can be used:

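$$C_{pco} = \left\{ \phi \in [0, 2\pi]^q :
\begin{Bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_{l_1} \end{Bmatrix}
\preceq
\begin{Bmatrix} \phi_{l_1+1} \\ \phi_{l_1+2} \\ \vdots \\ \phi_{l_1+l_2} \end{Bmatrix}
\preceq \cdots \preceq
\begin{Bmatrix} \phi_{l_1+\cdots+l_{L-1}+1} \\ \phi_{l_1+\cdots+l_{L-1}+2} \\ \vdots \\ \phi_{l_1+\cdots+l_L} \end{Bmatrix}
\preceq
\begin{Bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_{l_1} \end{Bmatrix}
\right\}. \qquad (2)$$

In this case we have $L$ sets of parameters, with $l_j$ angular parameters in set $j$, and $q = \sum_{j=1}^{L} l_j$. The order among the parameters within a set is not known, but every parameter in a given set is "followed" by every parameter in the next set.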

2.2. Estimation and Testing under Circular Order Restrictions

Analogous to PAVA for Euclidean data, Rueda et al. (2009) derived a circular isotonic regression estimator (CIRE) for estimating angular parameters $(\phi_1, \phi_2, \ldots, \phi_q)$ subject to a circular order. The CIRE of $\phi = (\phi_1, \phi_2, \ldots, \phi_q)$ under the constraint $(\phi_1, \phi_2, \ldots, \phi_q)' \in C_{sco}$ is given by

$$\tilde{\phi} = \arg\min_{\phi \in C_{sco}} SCE(\phi, \theta), \qquad (3)$$

where $SCE(\phi, \theta)$, defined below, is the Sum of Circular Errors, a circular analog of the Sum of Squared Errors (SSE) used for Euclidean data:

$$SCE(\phi, \theta) = \sum_{i=1}^{q} r_i \left(1 - \cos(\phi_i - \theta_i)\right), \qquad (4)$$

where $r_i$ are the mean resultant lengths. The CIRE is implemented in the function CIRE of the package isocir.
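Equation (4) is easy to evaluate directly. The base-R sketch below is a minimal illustration and not the package's own sce function; the name sce_sketch and the default of unit mean resultant lengths are assumptions made for the example.

```r
# Sum of Circular Errors, equation (4). `r` holds the mean resultant lengths
# and defaults to 1 for every angle (an assumption for this illustration).
sce_sketch <- function(phi, theta, r = rep(1, length(theta))) {
  stopifnot(length(phi) == length(theta), length(r) == length(theta))
  sum(r * (1 - cos(phi - theta)))
}

theta <- c(0.3, 6.1, 2.0)

sce_sketch(theta, theta)            # 0: no error when phi equals theta
sce_sketch(theta + 2 * pi, theta)   # essentially 0: angles 2*pi apart coincide
sce_sketch(theta + pi, theta)       # 6 up to rounding: every term is at its maximum 2 * r_i
```

Unlike the squared error, each term is bounded by $2 r_i$ and depends only on the angular separation, so it is unaffected by where the pole is placed.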

Just as the normal distribution is commonly used for Euclidean space data, the von Mises distribution is widely used for describing angular data on a unit circle. Accordingly, for $i = 1, 2, \ldots, q$, throughout this paper we assume that the $\theta_i$ are independently distributed according to a von Mises distribution, denoted as $M(\phi_i, \kappa)$, where $\phi_i$ is the modal direction and $\kappa$ is the concentration parameter (see Mardia and Jupp (2000)). Under this assumption, Fernández et al. (2012) developed a conditional test for the following hypotheses:

$$H_0: \phi \in C_{sco} \qquad H_1: H_0 \text{ is not true.}$$

The conditional test statistic is given by $T^* = 2\hat{\kappa}\, SCE(\tilde{\phi}, \theta)/q$, where $\hat{\kappa}$ is the estimator of $\kappa$ and $\tilde{\phi}$ is the CIRE computed under $H_0$. The estimate $\tilde{\phi}$ determines a partition of $\wp = \{1, \ldots, q\}$ into sets of coordinates on which $\tilde{\phi}$ is constant. These sets are called level sets. The rejection region for the conditional $\alpha$-level test is given by

Reject $H_0$ if $T^* \ge c(m)$,

where $m$ is the number of level sets for $\tilde{\phi}$ and, for large values of $\kappa$, the approximate critical value $c(m)$ is chosen so that

$$pr(F_{q-m,\,q-1} \ge c(m)) = \frac{\alpha}{1 - 1/(q-1)!}, \qquad (5)$$

where $F_{q-m,\,q-1}$ represents the central F random variable with $(q - m, q - 1)$ degrees of freedom. The above test statistic is proportional to a chi-square test when $\kappa$ is known. For details one may refer to Fernández et al. (2012). The above methodology can be modified to test

$$H_0: \phi \in C_{pco} \qquad H_1: H_0 \text{ is not true}$$

by replacing $1/(q-1)!$ by $(l_1!\, l_2! \cdots l_L!)/(q-1)!$.
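The critical value in (5) can be obtained by inverting the F distribution. The base-R sketch below is only an illustration, not the package's code: the function name crit_value_sketch is hypothetical, and it assumes that the denominator on the right-hand side of (5) is $1 - 1/(q-1)!$ for the simple circular order.

```r
# Critical value c(m) obtained by inverting equation (5).
# Assumption: the right-hand side of (5) is alpha / (1 - 1/(q - 1)!).
crit_value_sketch <- function(alpha, q, m) {
  stopifnot(q >= 3, m >= 1, m < q)
  adj_alpha <- alpha / (1 - 1 / factorial(q - 1))
  stopifnot(adj_alpha < 1)
  # upper-tail quantile of the central F with (q - m, q - 1) degrees of freedom
  qf(adj_alpha, df1 = q - m, df2 = q - 1, lower.tail = FALSE)
}

# Example: q = 8 angular parameters whose CIRE has m = 5 level sets.
c_m <- crit_value_sketch(alpha = 0.05, q = 8, m = 5)
c_m
pf(c_m, df1 = 3, df2 = 7, lower.tail = FALSE)   # recovers the adjusted level
```

For moderate q the adjustment is tiny (for q = 8 the factor is 1 − 1/5040), so the critical value is very close to the ordinary upper α quantile of the F distribution.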

The simulation study performed in Fernández et al. (2012) suggests that the power of this test is quite reasonable. Notice that, for a given data set, the p-value obtained by using the above methodology may serve as a useful goodness-of-fit criterion when comparing two or more plausible circular orders among a set of angular parameters. Larger p-values suggest that the unrestricted estimates are closer to the presumed circular order. Thus, the statistical methodology developed in Fernández et al. (2012) can be used not only for testing the relative order among the parameters, but also for selecting a "best fitting" circular order among several candidate circular orders.

These tests are implemented in the function cond.test of the R package isocir, introduced in the next section.

3. Package isocir

We start this section by giving some background on R packages for isotonic regression and for the analysis of circular data. We then describe the structure of our package isocir and illustrate it with some examples.

3.1. Related Packages

As isotonic regression is a well-known and widely used technique, there are many packages in R for performing isotonic regression, such as:

isotone (de Leeuw et al. (2009)): Active set and generalized PAVA for isotone optimization.

Iso (Turner (2009)): Functions to perform isotonic regression.

OrdMonReg (Balabdaoui et al. (2009)): Compute least squares estimates of one bounded or two ordered isotonic regression curves.

Similarly, there are several packages in R for analyzing circular data, such as:

CircStats (Lund and Agostinelli (2009)): Implementations of the circular statistics from "Topics in Circular Statistics", Jammalamadaka and SenGupta (2001). It is an R port of the S-plus library with the same name.

circular (Agostinelli and Lund (2011)): This package expands the CircStats package in several ways.

Since none of the existing packages for circular data can be used to analyze circular data under constraints, in this article we introduce the software package "isotonic inference for circular data", with the acronym isocir, for that purpose. Our package depends on circular (see Agostinelli and Lund (2011)) and combinat (see Chasalow (2010)). These packages should be installed before loading isocir.

3.2. Package Structure

For the convenience of the reader, we summarize all the functions, arguments and descriptions of our package isocir in Table 1.

Functions      Arguments                                Description
sce            (arg1, arg2, meanrl)                     Sum of Circular Errors
mrl            (data)                                   Mean Resultant Length
CIRE           (data, groups, circular)                 Circular Isotonic Regression Estimator
cond.test      (data, groups, kappa)                    Conditional Test
isocir         (cirmeans, SCE, CIRE, pvalue, kappa)     Creates an object of class isocir
is.isocir      (x)                                      Checks an object
print.isocir   (x, decCIRE, decpvalue, deckappa, ...)   Prints an object of class isocir
plot.isocir    (x, option, ...)                         Plots an object of class isocir

Table 1: Summary of the components of the package isocir.

In the following we describe each function of the software in detail.

Functions sce() and mrl()

The auxiliary function sce computes the sum of circular errors between a given q-dimensional vector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2). The function mrl computes the mean resultant length for the input data.
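For readers unfamiliar with these quantities, the following base-R sketch shows the textbook definitions of the circular mean and the mean resultant length for a single set of angles. It is not the package's implementation; the helper names circ_mean_sketch and mrl_sketch are assumptions made for illustration.

```r
# Circular mean: direction of the resultant of the unit vectors, in [0, 2*pi).
circ_mean_sketch <- function(angles) {
  atan2(sum(sin(angles)), sum(cos(angles))) %% (2 * pi)
}

# Mean resultant length: length of the average unit vector, between 0 and 1.
mrl_sketch <- function(angles) {
  sqrt(sum(cos(angles))^2 + sum(sin(angles))^2) / length(angles)
}

# Two angles just on either side of the pole.
angles <- c(0.1, 2 * pi - 0.1)

mean(angles)               # about pi: the arithmetic mean points the opposite way
circ_mean_sketch(angles)   # essentially 0 (or just below 2*pi, due to rounding)
mrl_sketch(angles)         # cos(0.1), about 0.995
```

The example also shows why Euclidean summaries fail on the circle: the arithmetic mean of two angles close to the pole lands on the opposite side of the circle, whereas the circular mean stays at the pole.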

Function CIRE()

Using the methodology developed in Rueda et al. (2009), this R function computes the CIRE for a given simple circular order (1) or partial circular order (2). The arguments of this function are summarized in Table 2.

Arguments   Values
data        vector or matrix with the data
groups      the groups of the order
circular    =TRUE (by default), =FALSE

Table 2: Arguments of the CIRE function.

The input variable data is a matrix where each column contains the vector of unconstrained angular means corresponding to each replication. If there is only one replication, then data is a vector. Position i of the vector groups contains the number of the group to which the parameter $\phi_i$ belongs. The logical argument circular sets whether the order is wrapped around the circle, i.e., a circular order (circular=TRUE), or not, i.e., a simple order (circular=FALSE). For example, the simple order cone in the circle,

$$C_{so} = \{\phi = (\phi_1, \phi_2, \ldots, \phi_q) \in [0, 2\pi]^q : 0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_q \le 2\pi\},$$

would be a non-circular order. The output of this function is an object of class isocir (explained later) containing the circular isotonic regression estimator ($\tilde{\phi}$), the unrestricted circular means ($\theta$) and the corresponding sum of circular errors ($SCE(\tilde{\phi}, \theta)$).

Function cond.test()

This function performs the conditional test and computes the corresponding p-value for the following hypotheses:

H0: The angles $\phi_1, \ldots, \phi_q$ follow a (simple or partial) circular order.
H1: H0 is not true.

The arguments of this function appear in Table 3 and are explained below.

Arguments     κ known                   κ unknown
data          numeric vector            matrix (as many columns as replications)
groups        numeric vector with the groups of the order to be tested (both cases)
kappa         positive numeric value    (NULL)
biasCorrect   (NULL)                    =TRUE (by default), =FALSE

Table 3: Arguments of the cond.test function.

Arguments data and groups are the same as those in function CIRE, although in this function groups is the order to be tested instead of the known order. The argument kappa is needed only when data is a vector: if there are no replications in data, the value of κ must be set by the user. Even when there are replicated data, if the user knows the value of κ, it may be supplied and it will be used to perform the conditional test. When κ is unknown and there are replicated data, the parameter is internally estimated by maximum likelihood and the estimate $\hat{\kappa}$ is shown in the output. The argument biasCorrect is related to the estimation of κ. If biasCorrect=TRUE, the bias correction appearing in Mardia and Jupp (2000), p. 87, is performed in the estimation of κ. The output of this function is an object of class isocir (explained below) with all the results from the conditional test: the CIRE ($\tilde{\phi}$), the unrestricted circular means ($\theta$), the SCE ($SCE(\tilde{\phi}, \theta)$), the kappa value (estimated or supplied) and the p-value of the conditional test.
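To give an idea of what maximum likelihood estimation of κ involves, the sketch below estimates κ for a single von Mises sample by solving the standard likelihood equation $A(\hat{\kappa}) = \bar{R}$, where $A(\kappa) = I_1(\kappa)/I_0(\kappa)$ and $\bar{R}$ is the mean resultant length. This is a generic textbook calculation in base R, not the internal code of cond.test: the helper names are hypothetical, the data are made up, no bias correction is applied, and the way the package pools information across populations is not reproduced here.

```r
# Ratio of modified Bessel functions A(kappa) = I1(kappa) / I0(kappa); the
# exponential scaling cancels in the ratio and avoids overflow for large kappa.
A <- function(kappa) {
  besselI(kappa, nu = 1, expon.scaled = TRUE) /
    besselI(kappa, nu = 0, expon.scaled = TRUE)
}

# Hypothetical helper: ML estimate of kappa for one von Mises sample, found by
# solving A(kappa) = mean resultant length (no bias correction applied).
kappa_ml_sketch <- function(angles) {
  rbar <- sqrt(mean(cos(angles))^2 + mean(sin(angles))^2)
  uniroot(function(k) A(k) - rbar, interval = c(1e-8, 1e4), tol = 1e-10)$root
}

angles <- c(0.1, 0.4, -0.3, 0.2, -0.2, 0.5)   # made-up replicates, in radians
kappa_hat <- kappa_ml_sketch(angles)
kappa_hat
```

The root search assumes that the mean resultant length is strictly between 0 and about 0.9999; an extremely concentrated sample would need a wider search interval.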

Class isocir

Finally, we describe the isocir function. This function creates S3 objects of class isocir, each of which is a list with the following elements:

$cirmeans is a list with the unrestricted circular means. Notice that when the argument data is a vector, these values match the input exactly. However, if there are replicated data, the argument data is a matrix and $cirmeans contains the corresponding unrestricted circular means (θ1, ..., θq).

$SCE is the value of the Sum of Circular Errors, SCE(φ̃, θ).

$CIRE is a list with the Circular Isotonic Regression Estimator (φ̃) obtained under the order defined by the groups argument.

$kappa is the value of kappa (either set by the user or estimated).

$pvalue is the p-value of the conditional test obtained from the function cond.test.

These objects of class isocir are the output of the functions CIRE and cond.test. The last two elements of the list ($kappa and $pvalue) are NULL if the object comes from the function CIRE. If the object comes from the function cond.test, it contains not only the results of the conditional test ($kappa and $pvalue) but also an attribute called estkappa, which informs (or rather reminds) the user whether the value in $kappa has been internally estimated or supplied as a known input.

Some S3 methods have also been defined for the class isocir:

isocir(cirmeans = NULL, SCE = NULL, CIRE = NULL, pvalue = NULL, kappa = NULL): This function creates an object of class isocir.

is.isocir(x): This function checks whether the object x is of class isocir.

print.isocir(x, decCIRE, decpvalue, deckappa, ...): This S3 method is used to print an object x of class isocir. The number of decimal places can be chosen.

plot.isocir(x, option = c("CIRE", "cirmeans"), ...): This S3 method is used to plot an object x of class isocir. The argument option lets the user plot the points of the Circular Isotonic Regression Estimator (the default) or the unrestricted circular means.
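The S3 mechanism behind such a class can be sketched in a few lines of base R. The code below is a generic illustration under assumed names (the class "isocir_sketch" and its helpers are hypothetical); it mimics the list layout described above but is not the source code of the isocir package.

```r
# Constructor: a plain list with the five elements described in the text,
# tagged with a class attribute (class name chosen for this illustration).
new_isocir_sketch <- function(cirmeans = NULL, SCE = NULL, CIRE = NULL,
                              pvalue = NULL, kappa = NULL) {
  structure(list(cirmeans = cirmeans, SCE = SCE, CIRE = CIRE,
                 pvalue = pvalue, kappa = kappa),
            class = "isocir_sketch")
}

# Class check, analogous in spirit to is.isocir().
is_isocir_sketch <- function(x) inherits(x, "isocir_sketch")

# Print method: dispatched automatically by print() for objects of the class.
print.isocir_sketch <- function(x, digits = 3, ...) {
  cat("CIRE:", round(unlist(x$CIRE), digits), "\n")
  cat("SCE: ", round(x$SCE, digits), "\n")
  if (!is.null(x$pvalue)) cat("p-value:", round(x$pvalue, 4), "\n")
  invisible(x)
}

# An object as an estimation-only call might produce it: no kappa, no p-value.
obj <- new_isocir_sketch(cirmeans = list(0.2, 1.1), SCE = 0.05,
                         CIRE = list(0.2, 1.1))
obj
```

Keeping the test-related elements as NULL when only estimation was performed lets a single class serve both kinds of output, which is the design described for CIRE and cond.test.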

3.3. Examples

In this section we provide examples to illustrate isocir.

Example 1

Suppose the observed angular means of eight populations are given by

θ1 = 0.025, θ2 = 1.475, θ3 = 3.274, θ4 = 5.518, θ5 = 2.859, θ6 = 5.387, θ7 = 4.179, θ8 = 1.962.

We illustrate isocir by estimating the 8 population angular parameters under the following partial circular order constraint:

$$\begin{Bmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \end{Bmatrix}
\preceq
\begin{Bmatrix} \phi_4 \\ \phi_5 \end{Bmatrix}
\preceq \phi_6 \preceq
\begin{Bmatrix} \phi_7 \\ \phi_8 \end{Bmatrix}
\preceq
\begin{Bmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \end{Bmatrix}$$

These data are a set of random circular data, called cirdata in our package, and they can be loaded as follows:

> data(cirdata)
> cirdata

Since in this example there are no replications, we provide the data in a vector format. The groups of the order are defined as follows:

> orderGroups <- c(1, 1, 1, 2, 2, 3, 4, 4)

We then obtain the CIRE using the function CIRE as follows:

> example1CIRE <- CIRE(cirdata, groups = orderGroups, circular = TRUE)

The output is saved in example1CIRE and it is printed as follows:

> example1CIRE

The constrained estimates satisfy the required order:

φ̃1 = 0.993, φ̃2 = 1.475, φ̃3 = 3.066, φ̃4 = 5.056, φ̃5 = 3.066, φ̃6 = 5.056, φ̃7 = 5.056, φ̃8 = 0.993,

where φ̃ is the Circular Isotonic Regression Estimator of φ. Results may be displayed graphically by calling plot(example1CIRE), which produces a plot with the points of the CIRE. To plot the unrestricted estimates instead, the argument option = "cirmeans" can be used (i.e., plot(example1CIRE, option = "cirmeans")).
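The numbers of Example 1 can be checked independently of the package. The base-R sketch below uses only the values printed in the text; the helper functions are hypothetical and written for this illustration (they are not isocir functions). It verifies that the reported estimates satisfy the partial circular order defined by orderGroups while the unrestricted means do not, and evaluates the SCE between them with unit mean resultant lengths.

```r
two_pi <- 2 * pi

# TRUE when the angles, visited in sequence and back to the first one,
# complete at most one counter-clockwise revolution.
in_circular_order <- function(phi) {
  gaps <- (c(phi[-1], phi[1]) - phi) %% two_pi
  round(sum(gaps) / two_pi) <= 1
}

# Partial circular order check: try each angle of the first group as starting
# point, walk through the groups in order, visiting the members of each group
# counter-clockwise from the last angle reached, then test the resulting path.
satisfies_partial_order <- function(phi, groups) {
  ids <- sort(unique(groups))
  for (start in phi[groups == ids[1]]) {
    ref  <- start
    path <- numeric(0)
    for (g in ids) {
      members <- phi[groups == g]
      members <- members[order((members - ref) %% two_pi)]
      path <- c(path, members)
      ref  <- members[length(members)]
    }
    if (in_circular_order(path)) return(TRUE)
  }
  FALSE
}

theta       <- c(0.025, 1.475, 3.274, 5.518, 2.859, 5.387, 4.179, 1.962)
phi_tilde   <- c(0.993, 1.475, 3.066, 5.056, 3.066, 5.056, 5.056, 0.993)
orderGroups <- c(1, 1, 1, 2, 2, 3, 4, 4)

satisfies_partial_order(phi_tilde, orderGroups)   # TRUE
satisfies_partial_order(theta, orderGroups)       # FALSE
sce_example1 <- sum(1 - cos(phi_tilde - theta))   # SCE with r_i = 1
sce_example1
```

The check relies on exact ties between the printed estimates; with estimates that are only approximately equal, a numerical tolerance would be needed.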

Example 2 (κ unknown, replications needed)

Using the data set called datareplic in our package, we demonstrate the use of the function cond.test when κ is unknown. As remarked earlier, when κ is unknown we need replicated data to estimate κ. The data set datareplic is a matrix where each column contains the values of one replication and each row contains the angles observed for one population. In this example we have 8 populations, and the hypotheses regarding the corresponding 8 parameters are as follows:

$$H_0: \phi_1 \preceq \phi_2 \preceq \phi_3 \preceq \phi_4 \preceq \phi_5 \preceq \phi_6 \preceq \phi_7 \preceq \phi_8 \preceq \phi_1 \qquad H_1: H_0 \text{ is not true.}$$

We take the data from the package and set the groups of the order in the argument groups:

> data(datareplic)
> orderGroups2 <- c(1:8)

Since replicated data are available, we do not include the argument kappa, as we want the function to estimate it. Moreover, we correct the bias in the estimation of κ, so we set biasCorrect=TRUE. Thus we have the following code:

> example2test <- cond.test(datareplic, groups = orderGroups2, biasCorrect = TRUE)
> example2test

The result is the p-value defined in (5). Since the p-value = 0.0034, we reject the null hypothesis and conclude that the parameters do not satisfy the specified circular order.

If the user is interested in printing the unrestricted circular means θ, the command example2test$cirmeans can be used. The result is a list that is saved in the same format as CIRE. Since each group in the circular order has a single element, it is convenient to use the vector format. Hence we have:

> round(unlist(example2test$cirmeans), digits = 3)

4. Application to Analysis of the Cell Cycle Gene Expression Data

As noted in the Introduction, there has been considerable discussion in the literature on the conservation of various aspects of cell cycle genes (Fernández et al. (2012)), particularly between two yeast species, namely S. cerevisiae (budding yeast) and S. pombe (fission yeast). Using the 10 published fission yeast data sets (Rustici et al. (2004), Oliva et al. (2005), Peng et al. (2005)), we illustrate the isocir package by testing the null hypothesis that 16 fission yeast genes, namely ssb1, cdc22, msh6, psm3, rad21, cig2, mik1, h3.3, hhf1, hht3, hta2, htb1, fkh2, chs2, sid2 and slp1, satisfy the same circular order as their budding yeast orthologs (RFA1, RNR1, MSH6, SMC3, MCD1, CLN2, SWE1, HHT2, HHF1, HHT1, HTA2, HTB2, FKH1, CHS2, DBF2 and CDC20), whose circular order is obtained from cyclebase (http://www.cyclebase.org) and published literature. Thus we test the following hypotheses:

$$H_0: \phi_{ssb1} \preceq \phi_{cdc22} \preceq \phi_{msh6} \preceq \phi_{psm3} \preceq \phi_{rad21} \preceq \phi_{cig2} \preceq \phi_{mik1} \preceq \phi_{h3.3} \preceq \phi_{hhf1} \preceq \phi_{hht3} \preceq \phi_{hta2} \preceq \phi_{htb1} \preceq \phi_{fkh2} \preceq \phi_{chs2} \preceq \phi_{sid2} \preceq \phi_{slp1} \preceq \phi_{ssb1}$$
$$H_1: H_0 \text{ is not true.} \qquad (6)$$

For each of the 10 experimental data sets, the unconstrained estimates of the phase angles of the above 16 fission yeast genes, appearing in Table 4, were obtained using the Random Periods Model (Liu et al. (2004)). The R code for that software can be obtained from http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm.

Notice that there are no replicated data here, since the experiments were not performed under the same experimental conditions. It appears that the 10 experiments were not synchronized (i.e., cells were probably not arrested at the same point in the cell cycle). For this reason, Table 4 shows a large variability across experiments in the estimated phase angles of each of the 16 genes. Even though there may be large variability in the estimated values, our interest is in the relative order of the phase angles among the 16 genes, which does not rely on the location of the pole and hence does not rely on the synchronization. As there are no replicated data, we have a single observation for each of the 16 fission yeast genes in each experiment, and therefore the values in Table 4 play the role of the unrestricted circular means in each experiment. Consequently, we suppose that

$$\theta_{ij} \sim_{ind} M(\phi_{ij}, \kappa_j), \quad i = 1, 2, \ldots, 16, \quad j = 1, 2, \ldots, 10,$$

where $\theta_{ij}$ is the unrestricted circular mean of gene $i$ in experiment $j$.

Since the 10 experiments may not be considered as replications of each other, we performed a separate test for each experiment. Moreover, as explained in Fernández et al. (2012), we assume that the concentration parameter $\kappa_j$ depends on the experiment but not on the gene. The reason is that, of the two sources of uncertainty, one specific to the gene and another due to the experiment (and therefore common to all genes within the experiment), the former may be considered negligible relative to the latter, as the number of time points used in each time course experiment is fairly large for any specific gene.

The $\kappa_j$ values considered for this example are obtained as in Fernández et al. (2012). The procedure used for the computation of these values comes from an analysis of variance type methodology. Under the assumptions made before, the model for the circular means is

$$\phi_{ij} = \mu + \alpha_i + \beta_j,$$

where $\alpha_i$ is the gene effect and $\beta_j$ is the experiment effect. The proposed model is analogous to the standard two-way analysis of variance model and is fully detailed in the supplementary material of Fernández et al. (2012).

For each of the 10 experiments we test the hypotheses (6) using the function cond.test of our software. The following code gives the p-values for each experiment. Results are summarized in Table 5.

> data(cirgenes)
> kappas <- c(2.64773, 3.24742, 2.15936, 4.15314, 4.54357,
+             29.07610, 6.51408, 14.19445, 5.66920, 11.12889)
> allresults <- list()
> resultIsoCIRE <- matrix(ncol = ncol(cirgenes), nrow = nrow(cirgenes))
> SCEs <- vector(mode = "numeric", length = nrow(cirgenes))
> pvalues <- vector(mode = "numeric", length = nrow(cirgenes))
> for (i in 1:nrow(cirgenes)) {
+   k <- kappas[i]
+   # keep only the genes observed in experiment i
+   genes <- as.numeric(cirgenes[i, !is.na(cirgenes[i, ])])
+   allresults[[i]] <- cond.test(genes, kappa = k)
+   resultIsoCIRE[i, !is.na(cirgenes[i, ])] <- unlist(allresults[[i]]$CIRE)
+   SCEs[i] <- allresults[[i]]$SCE
+   pvalues[i] <- allresults[[i]]$pvalue
+ }

From the p-values in Table 5 we see that the null hypothesis cannot be rejected in anyof the 10 experiments even at a level of significance as high as 020 Therefore it seemsplausible that the peak expressions of these 16 genes in S Pombe (fission yeast) followthe same order as their S Cerevisiae (budding yeast) orthologs

5 Conclusions

In this paper the R package isocir has been presented This package provides usefultools for drawing inferences from circular data under order restrictions There are twomain functions (CIRE and condtest) The first one computes the CIRE the circularversion of the widely known isotonic regression in Rq The second one is designed fortesting circular hypotheses using a conditional test We have also created the classisocir in order to properly save all the results Although we illustrated the proposedmethodology using an example from cell biology the proposed software can be applied

12 isocir An R Package for Isotonic Inference for Circular Data

to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))

We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas

Acknowledgments

SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript

References

Agostinelli C, Lund U (2011). circular: Circular Statistics. CA: Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, Italy. UL: Department of Statistics, California Polytechnic State University, San Luis Obispo, California, USA. R package version 0.4-3, URL https://r-forge.r-project.org/projects/circular/.

Balabdaoui F, Rufibach K, Santambrogio F (2009). OrdMonReg: Compute Least Squares Estimates of One Bounded or Two Ordered Isotonic Regression Curves. R package version 1.0.2, URL http://CRAN.R-project.org/package=OrdMonReg.

Boles L, Lohmann K (2003). "True Navigation and Magnetic Maps in Spiny Lobsters." Nature, 421, 60–63.

Bowers J, Morton I, Mould G (2000). "Directional Statistics of the Wind and Waves." Applied Ocean Research, 22, 13–30.

Cermakian N, Lamont E, Bourdeau P, Boivin D (2011). "Circadian Clock Gene Expression in Brain Regions of Alzheimer's Disease Patients and Control Subjects." Journal of Biological Rhythms, 26, 160–170.

Chasalow S (2010). combinat: Combinatorics Utilities. R package version 0.0-8, URL http://CRAN.R-project.org/package=combinat.

Cochran W, Mouritsen H, Wikelski M (2004). "Migrating Songbirds Recalibrate Their Magnetic Compass Daily from Twilight Cues." Science, 304, 405–408.

de Leeuw J, Hornik K, Mair P (2009). "Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods." Journal of Statistical Software, 32(5), 1–24. URL http://www.jstatsoft.org/v32/i05/.

Fernández M, Rueda C, Peddada S (2012). "Identification of a core set of signature cell cycle genes whose relative order of time to peak expression is conserved across species." Nucl. Acids Res., 40(7), 2823–2832. URL http://nar.oxfordjournals.org/content/40/7/2823.

Jammalamadaka S, SenGupta A (2001). Topics in Circular Statistics. World Scientific.

Jensen JL, Jensen T, Lichtenberg U, Brunak S, Bork P (2006). "Co-evolution of transcriptional and post-translational cell-cycle regulation." Nature, 443, 594–597.

Kibiak T, Jonas C (2007). "Applying Circular Statistics to the Analysis of Monitoring Data." European Journal of Psychological Assessment, 23, 227–237.

Liu D, Peddada S, Li L, Weinberg C (2006). "Phase Analysis of Circadian-Related Genes in Two Tissues." BMC Bioinformatics, 7, 87.

Liu D, Umbach D, Peddada S, Li L, Crockett P, Weinberg C (2004). "A Random Periods Model for Expression of Cell-Cycle Genes." PNAS, 101(19), 7240–7245.

Lund U, Agostinelli C (2009). CircStats: Circular Statistics, from "Topics in Circular Statistics" (2001). R package version 0.2-4, URL http://CRAN.R-project.org/package=CircStats.

Mardia K, Hughes G, Taylor C, Singh H (2008). "A Multivariate von Mises Distribution with Applications to Bioinformatics." Canadian Journal of Statistics, 36, 99–109.

Mardia K, Jupp P (2000). Directional Statistics. John Wiley & Sons.

Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, Skiena S, Futcher B, Leatherwood J (2005). "The Cell-Cycle-Regulated Genes of Schizosaccharomyces Pombe." PLoS Biology, 3, 1239–1260.

Peng X, Karuturi R, Miller L, Lin K, Jia Y, Kondu P, Wang L, Wong L, Liu E, Balasubramanian M, Liu J (2005). "Identification of Cell Cycle-Regulated Genes in Fission Yeast." The American Society for Cell Biology, 16, 1026–1042.

R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.

Robertson T, Wright F, Dykstra R (1988). Order Restricted Statistical Inference. John Wiley & Sons.

Rueda C, Fernández M, Peddada S (2009). "Estimation of Parameters Subject to Order Restrictions on a Circle with Application to Estimation of Phase Angles of Cell-Cycle Genes." Journal of the American Statistical Association, 104(485), 338–347.

Rustici G, Mata J, Kivinen K, Lio P, Penkett C, Burns G, Hayles J, Brazma A, Nurse P, Bahler J (2004). "Periodic Gene Expression Program of the Fission Yeast Cell Cycle." Nature Genetics, 36, 809–817.

Turner R (2009). Iso: Functions to Perform Isotonic Regression. R package version 0.0-8, URL http://CRAN.R-project.org/package=Iso.

van Doorn E, Dhruva B, Sreenivasan K, Cassella V (2000). "Statistics of Wind Direction and Its Increments." Physics of Fluids, 12, 1529–1534.

Zar J (1999). Biostatistical Analysis. Prentice Hall.

Unrestricted Circular Means

Experiment       ssb1   cdc22  msh6   psm3   rad21  cig2   mik1   h3.3   hhf1   hht3   hta2   htb1   fkh2   chs2   sid2   slp1
                 θ1     θ2     θ3     θ4     θ5     θ6     θ7     θ8     θ9     θ10    θ11    θ12    θ13    θ14    θ15    θ16
 1 Oliva cdc     0.202  0.218  6.262  5.765  0.893  5.612  6.257  1.178  0.912  1.200  0.971  1.289  5.298  5.597  4.252  5.209
 2 Oliva elut1   2.940  3.262  2.811  2.848  1.603  2.382  1.709  4.689  4.355  4.717  4.419  4.397  1.600  1.819  1.751  2.519
 3 Oliva elut2   0.440  0.447  5.258  6.206  4.381  5.458  6.045  1.541  0.727  6.115  0.352  0.687  3.935  3.970  5.836  5.896
 4 Peng cdc      3.328  3.565  3.387  2.806  3.194  3.260  3.026  4.779  4.694  4.755  4.816  4.675  2.685  2.770  2.885  2.422
 5 Peng elut     3.333  3.912  3.894  3.444  3.648  3.970  4.296  5.189  5.060  5.144  5.216  5.243  3.338  3.607  3.082  3.185
 6 Rust cdc1     1.965  2.151  2.034  2.029  1.742  2.073  1.730  3.130  2.993  3.085  3.063  2.873  1.281  1.179  1.906  1.237
 7 Rust cdc2     1.809  2.208  1.415  1.351  1.964  1.940  1.978  3.745  3.584  3.670  3.479  3.590  1.383  1.456  1.064  1.396
 8 Rust elut1    -      1.457  1.288  -      1.530  1.374  1.379  2.420  2.279  2.409  2.311  2.245  1.010  1.146  -      1.091
 9 Rust elut2    2.214  1.786  1.730  1.987  1.878  1.882  3.071  2.704  2.788  2.814  2.909  2.740  1.352  1.441  1.275  1.420
10 Rust elut3    2.340  2.702  2.704  2.526  2.979  2.319  2.284  3.773  3.567  3.636  3.465  3.431  1.981  1.716  2.523  2.118

Table 4: Initial S. pombe phase angle data for each experiment.

CIRE under Circular Order

Experiment       φ1     φ2     φ3     φ4     φ5     φ6     φ7     φ8     φ9     φ10    φ11    φ12    φ13    φ14    φ15    φ16    SCE    p-value
 1 Oliva cdc     6.257  6.257  6.257  6.257  0.054  0.054  0.054  1.045  1.045  1.085  1.085  1.289  5.069  5.069  5.069  5.209  1.270  0.6658
 2 Oliva elut1   2.526  2.526  2.526  2.526  2.526  2.526  2.526  4.515  4.515  4.515  4.515  4.515  1.600  1.785  1.785  2.519  1.218  0.7214
 3 Oliva elut2   5.849  5.849  5.849  5.849  5.849  5.849  6.045  0.598  0.598  0.598  0.598  0.687  3.935  3.970  5.836  5.849  2.660  0.2437
 4 Peng cdc      3.225  3.225  3.225  3.225  3.225  3.225  3.225  4.736  4.736  4.749  4.749  4.749  2.685  2.693  2.693  2.693  0.248  0.9983
 5 Peng elut     3.333  3.725  3.725  3.725  3.725  3.970  4.296  5.124  5.124  5.144  5.216  5.243  3.302  3.302  3.302  3.302  0.156  0.9850
 6 Rust cdc1     1.961  1.961  1.961  1.961  1.961  1.961  1.961  3.029  3.029  3.029  3.029  3.029  1.230  1.230  1.571  1.571  0.213  0.4142
 7 Rust cdc2     1.693  1.693  1.693  1.693  1.952  1.952  1.978  3.614  3.614  3.614  3.614  3.614  1.301  1.301  1.301  1.396  0.296  0.9536
 8 Rust elut1    -      1.373  1.373  -      1.427  1.427  1.427  2.333  2.333  2.333  2.333  2.333  1.010  1.118  -      1.118  0.028  0.9992
 9 Rust elut2    1.909  1.909  1.909  1.916  1.916  1.916  2.837  2.837  2.837  2.837  2.837  2.837  1.352  1.358  1.358  1.420  0.125  0.9992
10 Rust elut3    2.340  2.585  2.585  2.585  2.585  2.585  2.585  3.574  3.574  3.574  3.574  3.574  1.849  1.849  2.321  2.321  0.269  0.8748

Table 5: CIRE, SCE and p-values for each experiment.

Affiliation:

Sandra Barragán, Miguel A. Fernández, Cristina Rueda
Departamento de Estadística e Investigación Operativa
Instituto de Matemáticas (IMUVA)
Universidad de Valladolid
Valladolid, Spain
E-mail: sandraba@eio.uva.es, miguelaf@eio.uva.es, crueda@eio.uva.es
URL: http://www.eio.uva.es/~sandra, http://www.eio.uva.es/~miguel, http://www.eio.uva.es/~cristina

Shyamal D. Peddada
Biostatistics Branch
National Institute of Environmental Health Sciences
Research Triangle Park, NC 27709, USA
E-mail: peddada@niehs.nih.gov
URL: http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm


2.2. Estimation and Testing under Circular Order Restrictions

Analogous to PAVA for Euclidean data, Rueda et al. (2009) derived a circular isotonic regression estimator (CIRE) for estimating angular parameters $(\phi_1, \phi_2, \dots, \phi_q)$ subject to a circular order. The CIRE of $\phi = (\phi_1, \phi_2, \dots, \phi_q)'$ under the constraint $(\phi_1, \phi_2, \dots, \phi_q)' \in C_{sco}$ is given by

$$\tilde{\phi} = \arg\min_{\phi \in C_{sco}} SCE(\phi, \theta), \qquad (3)$$

where $SCE(\phi, \theta)$, defined below, is the Sum of Circular Errors, a circular analog of the Sum of Squared Errors (SSE) used for Euclidean data:

$$SCE(\phi, \theta) = \sum_{i=1}^{q} r_i \left(1 - \cos(\phi_i - \theta_i)\right), \qquad (4)$$

where $r_i$ are the mean resultant lengths. The CIRE is implemented in the function CIRE of the package isocir.
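Equation (4) can be evaluated directly. The following base R sketch is not the package's own sce function: the name sce_sketch and the default of unit weights are assumptions made here purely for illustration.

```r
# Sum of Circular Errors, equation (4):
#   SCE(phi, theta) = sum_i r_i * (1 - cos(phi_i - theta_i))
# sce_sketch is a hypothetical helper, not part of isocir.
sce_sketch <- function(phi, theta, r = rep(1, length(phi))) {
  stopifnot(length(phi) == length(theta), length(phi) == length(r))
  sum(r * (1 - cos(phi - theta)))
}

# Angles that differ by a multiple of 2*pi are the same point on the circle,
# so they contribute no error.
sce_sketch(c(0.5, 1.0), c(0.5, 1.0 + 2 * pi))

# Diametrically opposed angles give the largest contribution, 2 * r_i.
sce_sketch(pi, 0)
```

Unlike the squared error, each term is bounded by 2 r_i and does not depend on where the angle 0 is placed on the circle.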

Just as the normal distribution is commonly used for Euclidean space data, the von Mises distribution is widely used for describing angular data on a unit circle. Accordingly, for $i = 1, 2, \dots, q$, throughout this paper we assume that the $\theta_i$ are independently distributed according to the von Mises distribution, denoted $M(\phi_i, \kappa)$, where $\phi_i$ is the modal direction and $\kappa$ is the concentration parameter (see Mardia and Jupp (2000)). Under this assumption, Fernández et al. (2012) developed a conditional test for the following hypotheses:

$$H_0: \phi \in C_{sco} \qquad H_1: H_0 \text{ is not true.}$$

The conditional test statistic is given by $T^* = 2\hat{\kappa}\, SCE(\tilde{\phi}, \theta)/q$, where $\hat{\kappa}$ is the estimator of $\kappa$ and $\tilde{\phi}$ is the CIRE computed under $H_0$. The estimate $\tilde{\phi}$ determines a partition of $\wp = \{1, \dots, q\}$ into sets of coordinates on which $\tilde{\phi}$ is constant. These sets are called level sets. The rejection region for the conditional $\alpha$-level test is given by

Reject $H_0$ if $T^* \ge c(m)$,

where $m$ is the number of level sets for $\tilde{\phi}$ and, for large values of $\kappa$, the approximate critical value $c(m)$ is chosen so that

$$pr\left(F_{q-m,\,q-1} \ge c(m)\right) = \frac{\alpha}{1 - 1/(q-1)!}, \qquad (5)$$

where $F_{q-m,\,q-1}$ represents the central F random variable with $(q-m, q-1)$ degrees of freedom. The above test statistic is proportional to a chi-square test when $\kappa$ is known. For details one may refer to Fernández et al. (2012). The above methodology can be modified to test

$$H_0: \phi \in C_{pco} \qquad H_1: H_0 \text{ is not true}$$

by replacing $1/(q-1)!$ by $(l_1!\, l_2! \cdots l_L!)/(q-1)!$.
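As an illustration of equation (5) for the simple circular order, the critical value c(m) is the upper quantile of the F distribution at the adjusted level α / (1 − 1/(q − 1)!). The sketch below uses only base R; the helper name crit_value and the example values of q, m and α are assumptions chosen for illustration and are not taken from the package.

```r
# Approximate critical value c(m) from equation (5), valid for large kappa:
#   pr(F_{q-m, q-1} >= c(m)) = alpha / (1 - 1/(q-1)!)
# crit_value is a hypothetical helper, not part of isocir.
crit_value <- function(alpha, q, m) {
  stopifnot(q > 2, m >= 1, m < q, alpha > 0, alpha < 1)
  tail_prob <- alpha / (1 - 1 / factorial(q - 1))
  stopifnot(tail_prob < 1)
  # Upper-tail quantile of the central F with (q - m, q - 1) degrees of freedom
  qf(tail_prob, df1 = q - m, df2 = q - 1, lower.tail = FALSE)
}

# Illustrative values: q = 8 parameters, m = 4 level sets, 5% level
c_m <- crit_value(alpha = 0.05, q = 8, m = 4)
c_m
```

For moderate q the adjustment 1/(q − 1)! is tiny (1/5040 for q = 8), so the adjusted level is very close to α itself.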

The simulation study performed in Fernández et al. (2012) suggests that the power of this test is quite reasonable. Notice that, for a given data set, the p-value obtained by using the above methodology may serve as a useful goodness-of-fit criterion when comparing two or more plausible circular orders among a set of angular parameters. Larger values may suggest that the estimates are closer to the presumed circular order. Thus the statistical methodology developed in Fernández et al. (2012) can be used not only for testing relative order among the parameters, but also for selecting a "best fitting" circular order among several candidate circular orders.

These tests are implemented in the function condtest in the R package isocir, introduced in the next section.

3. Package isocir

We start this section by giving some background on R packages for isotonic regression and analysis of circular data. We then describe the structure of our package isocir and illustrate it with some examples.

3.1. Related Packages

As isotonic regression is a well-known and widely used technique, there are many packages in R for performing isotonic regression, such as:

  • isotone (de Leeuw et al. (2009)): Active set and generalized PAVA for isotone optimization.
  • Iso (Turner (2009)): Functions to perform isotonic regression.
  • OrdMonReg (Balabdaoui et al. (2009)): Compute least squares estimates of one bounded or two ordered isotonic regression curves.

Similarly, there are several packages in R for analyzing circular data, such as:

  • CircStats (Lund and Agostinelli (2009)): The implementations of the circular statistics from "Topics in Circular Statistics", Jammalamadaka and SenGupta (2001). It is an R port from the S-plus library with the same name.
  • circular (Agostinelli and Lund (2011)): This package expands the CircStats package in several ways.

Since none of the existing packages for circular data are applicable for analyzing circular data under constraints, in this article we introduce the software package "isotonic inference for circular data", with the acronym isocir, for this purpose. Our package depends on circular (see Agostinelli and Lund (2011)) and combinat (see Chasalow (2010)). These packages should be installed before loading isocir.

3.2. Package Structure

For the convenience of the reader, we summarize all the functions, arguments and descriptions of our package isocir in Table 1.

Functions      Arguments                                Description
sce            (arg1, arg2, meanrl)                     Sum of Circular Errors
mrl            (data)                                   Mean Resultant Length
CIRE           (data, groups, circular)                 Circular Isotonic Regression Estimator
condtest       (data, groups, kappa)                    Conditional Test
isocir         (cirmeans, SCE, CIRE, pvalue, kappa)     Creates an object of class isocir
is.isocir      (x)                                      Checks an object
print.isocir   (x, decCIRE, decpvalue, deckappa, ...)   Prints an object of class isocir
plot.isocir    (x, option, ...)                         Plots an object of class isocir

Table 1: Summary of the components of the package isocir.

In the following we describe each function of the software in detail.

Functions sce() and mrl()

The auxiliary function sce computes the sum of circular errors between a given q-dimensional vector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2). The function mrl computes the mean resultant length for the input data.
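For readers unfamiliar with the mean resultant length, the following base R sketch shows the standard definition for a single sample of angles. It is not the package's mrl function, whose handling of matrix input is not described here; the name mrl_sketch is hypothetical.

```r
# Mean resultant length of a sample of angles (in radians):
#   R_bar = sqrt(mean(cos(x))^2 + mean(sin(x))^2), a value in [0, 1]
# mrl_sketch is a hypothetical helper, not the package's mrl().
mrl_sketch <- function(x) {
  x <- x[!is.na(x)]
  sqrt(mean(cos(x))^2 + mean(sin(x))^2)
}

mrl_sketch(c(0.10, 0.12, 0.09))            # tightly clustered: close to 1
mrl_sketch(c(0, pi / 2, pi, 3 * pi / 2))   # evenly spread: essentially 0
mrl_sketch(1.3)                            # a single observation: 1
```

Values near 1 indicate highly concentrated angles, which is why the mean resultant lengths act as weights r_i in the sum of circular errors.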

Function CIRE()

Using the methodology developed in Rueda et al. (2009), this R function computes the CIRE for a given circular order (1) or a partial order (2). The arguments of this function are summarized in Table 2.

Arguments   Values
data        vector or matrix with the data
groups      the groups of the order
circular    =TRUE (by default), =FALSE

Table 2: Arguments of the CIRE function.

The input variable data is a matrix where each column contains the vector of unconstrained angular means corresponding to each replication. If there is only one replication, then data is a vector. Position i of the vector groups contains the number of the group to which the parameter $\phi_i$ belongs. The logical argument circular sets whether the order is wrapped around the circle, i.e. circular order (circular=TRUE), or not, i.e. simple order (circular=FALSE). For example, the simple order cone in the circle, $C_{so} = \{\phi = (\phi_1, \phi_2, \dots, \phi_q) \in [0, 2\pi]^q : 0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_q \le 2\pi\}$, would be a non-circular order. The output of this function is an object of class isocir (explained later) containing the circular isotonic regression estimator ($\tilde{\phi}$), the unrestricted circular means ($\theta$) and the corresponding sum of circular errors ($SCE(\tilde{\phi}, \theta)$).
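The difference between the two settings of circular can be made concrete with a small check. The sketch below is an assumption-laden illustration rather than package code: it reads the simple order as the cone C_so given above, and it reads the circular order as "moving counterclockwise from the first angle, the remaining angles are met in index order within one turn". Both helper names are hypothetical.

```r
# Simple (non-circular) order: 0 <= phi_1 <= ... <= phi_q <= 2*pi
in_simple_order <- function(phi) {
  all(phi >= 0 & phi <= 2 * pi) && all(diff(phi) >= 0)
}

# Circular order (assumed reading): offsets from phi_1, measured
# counterclockwise, must be non-decreasing within one turn.
in_circular_order <- function(phi) {
  rel <- (phi - phi[1]) %% (2 * pi)
  # An angle that ties with phi_1 after the sequence has moved on
  # has completed the full turn, so its offset is 2*pi rather than 0.
  moved_on <- cumsum(rel > 0) > 0
  rel[moved_on & rel == 0] <- 2 * pi
  all(diff(rel) >= 0)
}

phi <- c(5.5, 6.0, 0.3, 1.2)   # crosses the 0 / 2*pi pole
in_simple_order(phi)            # FALSE
in_circular_order(phi)          # TRUE
```

The example vector violates the simple order only because it crosses the pole, which is exactly the situation in which circular = TRUE is the appropriate setting.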

Function condtest()

This function performs the conditional test and computes the corresponding p-value for the following hypotheses:

H0: The angles $\phi_1, \dots, \phi_q$ follow a (simple or partial) circular order.
H1: H0 is not true.

The arguments of this function appear in Table 3 and are explained below.

Arguments     κ known                  κ unknown
data          numeric vector           matrix (as many columns as replications)
groups        numeric vector with the groups of the order to be tested (both cases)
kappa         positive numeric value   (NULL)
biasCorrect   (NULL)                   =TRUE (by default), =FALSE

Table 3: Arguments of the condtest function.

Arguments data and groups are the same as those in function CIRE, although in this function groups is the order to be tested instead of the known order. The argument kappa is needed only when data is a vector. If there are no replications in data, the value of κ must be set by the user. Even when there are replicated data, if the user knows the value of κ it may be supplied, and it will be taken into consideration to perform the conditional test. When κ is unknown and there are replicated data, the parameter is internally estimated by maximum likelihood and $\hat{\kappa}$ is shown in the output. The argument biasCorrect is related to the estimation of κ: if biasCorrect=TRUE, the bias correction appearing in Mardia and Jupp (2000), p. 87, is performed in the estimation of κ. The output of this function is an object of class isocir (explained below) with all the results from the conditional test: the CIRE ($\tilde{\phi}$), the unrestricted circular means ($\theta$), the SCE ($SCE(\tilde{\phi}, \theta)$), the kappa value (estimated or supplied) and the p-value of the conditional test.
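To give a feel for what maximum likelihood estimation of κ involves, the sketch below solves the standard von Mises likelihood equation A(κ) = I_1(κ)/I_0(κ) = R̄ for a given mean resultant length R̄, using base R only. This is a generic textbook calculation under stated assumptions; it does not reproduce how condtest pools replications across populations, nor the bias correction selected by biasCorrect. The names A_ratio and kappa_ml are hypothetical.

```r
# A(kappa) = I_1(kappa) / I_0(kappa); exponentially scaled Bessel
# functions avoid overflow for large kappa.
A_ratio <- function(kappa) {
  besselI(kappa, nu = 1, expon.scaled = TRUE) /
    besselI(kappa, nu = 0, expon.scaled = TRUE)
}

# Maximum likelihood estimate of kappa from a mean resultant length
# r_bar in (0, 1). kappa_ml is a hypothetical helper, not part of isocir.
kappa_ml <- function(r_bar, upper = 1e4) {
  stopifnot(r_bar > 0, r_bar < A_ratio(upper))
  uniroot(function(k) A_ratio(k) - r_bar,
          interval = c(1e-8, upper), tol = 1e-10)$root
}

kappa_ml(A_ratio(3))   # recovers 3 up to numerical tolerance
```

Because A(κ) is increasing in κ, more concentrated data (larger R̄) lead to a larger estimate of κ.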

Class isocir

Finally, we describe the isocir function. This function creates S3 objects of class isocir, which are lists with the following elements:

  • $cirmeans is a list with the unrestricted circular means. Notice that when the argument data is a vector, these values match the input exactly. However, if there are replicated data, the argument data is a matrix and $cirmeans contains the corresponding unrestricted circular means $(\theta_1, \dots, \theta_q)$.
  • $SCE is the value of the Sum of Circular Errors ($SCE(\tilde{\phi}, \theta)$).
  • $CIRE is a list with the Circular Isotonic Regression Estimator ($\tilde{\phi}$) obtained under the order defined by the groups argument.
  • $kappa is the value of kappa (either set by the user or estimated).
  • $pvalue is the p-value of the conditional test obtained from the function condtest.

These objects of class isocir are the output of the functions CIRE and condtest. The last two elements of the list ($kappa and $pvalue) are NULL if the object comes from the function CIRE. Otherwise, if the object comes from the function condtest, it contains not only the results of the conditional test ($kappa and $pvalue) but also an attribute called estkappa, which informs (or rather reminds) the user whether the value in $kappa has been internally estimated or supplied as a known input.

Some S3 methods have also been defined for the class isocir:

  • isocir(cirmeans = NULL, SCE = NULL, CIRE = NULL, pvalue = NULL, kappa = NULL): This function creates an object of class isocir.
  • is.isocir(x): This function checks whether the object x is of class isocir.
  • print.isocir(x, decCIRE, decpvalue, deckappa, ...): This S3 method is used to print an object x of class isocir. The number of decimal places can be chosen.
  • plot.isocir(x, option = c("CIRE", "cirmeans"), ...): This S3 method is used to plot an object x of class isocir. The argument option gives the user the option to plot the points of the Circular Isotonic Regression Estimator (by default) or the unrestricted circular means.

3.3. Examples

In this section we provide examples to illustrate isocir.

Example 1

Suppose the observed angular means of eight populations are given by

θ1 = 0.025, θ2 = 1.475, θ3 = 3.274, θ4 = 5.518, θ5 = 2.859, θ6 = 5.387, θ7 = 4.179, θ8 = 1.962.

We illustrate isocir by estimating the 8 population angular parameters under the following partial circular order constraint:

$$\left\{\begin{array}{c}\phi_1\\ \phi_2\\ \phi_3\end{array}\right\} \preceq \left\{\begin{array}{c}\phi_4\\ \phi_5\end{array}\right\} \preceq \phi_6 \preceq \left\{\begin{array}{c}\phi_7\\ \phi_8\end{array}\right\} \preceq \left\{\begin{array}{c}\phi_1\\ \phi_2\\ \phi_3\end{array}\right\}$$

These data are a set of random circular data, called cirdata in our package, and they can be loaded as follows:

> data(cirdata)
> cirdata

Since in this example there are no replications, we provide the data in vector format. The groups of the order are defined as follows:

> orderGroups <- c(1, 1, 1, 2, 2, 3, 4, 4)

We then obtain the CIRE using the function CIRE:

> example1CIRE <- CIRE(cirdata, groups = orderGroups, circular = TRUE)

The output is saved in example1CIRE and is printed as follows:

> example1CIRE

Thus the constrained estimates, which satisfy the required order, are

$\tilde{\phi}_1 = 0.993$, $\tilde{\phi}_2 = 1.475$, $\tilde{\phi}_3 = 3.066$, $\tilde{\phi}_4 = 5.056$, $\tilde{\phi}_5 = 3.066$, $\tilde{\phi}_6 = 5.056$, $\tilde{\phi}_7 = 5.056$, $\tilde{\phi}_8 = 0.993$,

where $\tilde{\phi}$ is the Circular Isotonic Regression Estimator of $\phi$. Results may be displayed graphically by calling plot(example1CIRE), which produces a plot with the points of the CIRE. To see the plot for the unrestricted estimates, the argument option = "cirmeans" can be used (i.e. plot(example1CIRE, option = "cirmeans")).
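The estimates of Example 1 can be cross-checked without the package. With equal weights, minimizing the sum of circular errors over a single common value gives the circular mean direction of the angles involved, so each level set of the CIRE should equal the circular mean of the corresponding unrestricted means. The base R sketch below assumes the level sets {1, 8}, {2}, {3, 5} and {4, 6, 7}, read off the estimates printed for this example; it does not implement the CIRE algorithm itself, and circ_mean is a hypothetical helper.

```r
# Unrestricted means from Example 1 (radians)
theta <- c(0.025, 1.475, 3.274, 5.518, 2.859, 5.387, 4.179, 1.962)

# Level sets read off the reported CIRE: coordinates sharing one value
level_sets <- list(c(1, 8), 2, c(3, 5), c(4, 6, 7))

# Circular mean direction of a set of angles, mapped to [0, 2*pi)
circ_mean <- function(x) atan2(sum(sin(x)), sum(cos(x))) %% (2 * pi)

# Pool each level set to its circular mean
phi_tilde <- numeric(length(theta))
for (s in level_sets) phi_tilde[s] <- circ_mean(theta[s])
round(phi_tilde, 3)
```

Because the inputs are rounded to three decimals, the pooled values agree with the reported estimates only up to small rounding differences in the last digit.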

Example 2 (κ unknown, replications needed)

Using the data in our package called datareplic, we demonstrate the use of the function condtest when κ is unknown. As remarked earlier, when κ is unknown we need replicated data to estimate κ. The object datareplic is a matrix where each column contains the values of a replication and each row the angles observed for each population mean. In this example we have 8 populations, and the hypotheses regarding the corresponding 8 parameters are as follows:

$$H_0: \phi_1 \preceq \phi_2 \preceq \phi_3 \preceq \phi_4 \preceq \phi_5 \preceq \phi_6 \preceq \phi_7 \preceq \phi_8 \preceq \phi_1$$
$$H_1: H_0 \text{ is not true.}$$

We take the data from the package and set the groups of the order in the argument groups:

> data(datareplic)
> orderGroups2 <- c(1:8)

Since replicated data are available, we do not include the argument kappa, as we want the function to estimate it. Moreover, we correct the bias in the estimation of κ, so we set biasCorrect=TRUE. Thus we have the following code:

> example2test <- condtest(datareplic, groups = orderGroups2, biasCorrect = TRUE)
> example2test

The result is the p-value defined in (5). Since the p-value = 0.0034, we reject the null hypothesis and conclude that the parameters do not satisfy the specified circular order.

If the user is interested in printing the unrestricted circular means $\theta$, the following command can be used: example2test$cirmeans. The result is a list that is saved in the same format as the CIRE. Since each group in the circular order has a single element, it is convenient to use the vector format. Hence we have:

> round(unlist(example2test$cirmeans), digits = 3)

4. Application to Analysis of the Cell Cycle Gene Expression Data

As noted in the Introduction, there has been considerable discussion in the literature on the conservation of various aspects of cell cycle genes (Fernández et al. (2012)), particularly between two yeast species, namely S. cerevisiae (budding yeast) and S. pombe (fission yeast). Using the 10 published fission yeast data sets (Rustici et al. (2004), Oliva et al. (2005), Peng et al. (2005)), we illustrate the isocir package by testing the null hypothesis that 16 fission yeast genes, namely ssb1, cdc22, msh6, psm3, rad21, cig2, mik1, h3.3, hhf1, hht3, hta2, htb1, fkh2, chs2, sid2 and slp1, satisfy the same circular order as their budding yeast orthologs (RFA1, RNR1, MSH6, SMC3, MCD1, CLN2, SWE1, HHT2, HHF1, HHT1, HTA2, HTB2, FKH1, CHS2, DBF2 and CDC20), whose circular order is obtained from cyclebase (http://www.cyclebase.org) and published literature. Thus we test the following hypotheses:

$$\begin{array}{l} H_0: \phi_{ssb1} \preceq \phi_{cdc22} \preceq \phi_{msh6} \preceq \phi_{psm3} \preceq \phi_{rad21} \preceq \phi_{cig2} \preceq \phi_{mik1} \preceq \phi_{h3.3} \preceq \phi_{hhf1} \preceq \phi_{hht3} \preceq \phi_{hta2} \preceq \phi_{htb1} \preceq \phi_{fkh2} \preceq \phi_{chs2} \preceq \phi_{sid2} \preceq \phi_{slp1} \preceq \phi_{ssb1} \\ H_1: H_0 \text{ is not true.} \end{array} \qquad (6)$$

For each of the 10 experimental data sets, the unconstrained estimates of the phase angles of the above 16 fission yeast genes, appearing in Table 4, were obtained using the Random Periods Model (Liu et al. (2004)). The R code for that software can be obtained from http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm.

Notice that there are no replicated data here, since the experiments were not performed under the same experimental conditions. It appears that the 10 experiments were not synchronized (i.e. cells were probably not arrested at the same point in the cell cycle). For this reason, Table 4 shows a large variability across experiments in the estimated phase angles of each of the 16 genes. Even though there may be large variability in the estimated values, our interest is in the relative order of phase angles among the 16 genes, which does not rely on the location of the pole and hence does not rely on the synchronization. As there are no replicated data, we have a single observation for each of the 16 fission yeast genes in each experiment, and therefore the values in Table 4 play the role of the unrestricted circular means in each experiment. Consequently, we suppose that

$$\theta_{ij} \overset{ind}{\sim} M(\phi_{ij}, \kappa_j), \quad i = 1, 2, \dots, 16, \quad j = 1, 2, \dots, 10,$$

where $\theta_{ij}$ is the unrestricted circular mean of gene i in experiment j.

Since the 10 experiments may not be considered as replications of each other, we performed a separate test for each experiment. Moreover, as explained in Fernández et al. (2012), we assume that the concentration parameter $\kappa_j$ depends on the experiment but not on the gene. The reason for this is that, of the two sources of uncertainty, one specific to the gene and another due to the experiment (and therefore common to all genes within the experiment), the former may be considered negligible relative to the latter, as the number of time points used in each time course experiment is fairly large for any specific gene.

The $\kappa_j$ values considered for this example are obtained using Fernández et al. (2012). The procedure used for the computation of these values comes from an analysis of variance type methodology. Under the assumptions made before, the model for the circular means is

$$\phi_{ij} = \mu + \alpha_i + \beta_j,$$

where $\alpha_i$ is the gene effect and $\beta_j$ is the experiment effect. The proposed model is analogous to the standard two-way analysis of variance model and is fully detailed in the supplementary material of Fernández et al. (2012).

For each of the 10 experiments we test the hypotheses (6) using the function condtest in our software. The following code gives the p-values for each experiment. Results are summarized in Table 5.

> data(cirgenes)
> kappas <- c(2.64773, 3.24742, 2.15936, 4.15314, 4.54357,
+     29.07610, 6.51408, 14.19445, 5.66920, 11.12889)
> allresults <- list()
> resultIsoCIRE <- matrix(ncol = ncol(cirgenes), nrow = nrow(cirgenes))
> SCEs <- vector(mode = "numeric", length = nrow(cirgenes))
> pvalues <- vector(mode = "numeric", length = nrow(cirgenes))
> for (i in 1:nrow(cirgenes)) {
+     k <- kappas[i]
+     genes <- as.numeric(cirgenes[i, !is.na(cirgenes[i, ])])
+     allresults[[i]] <- condtest(genes, kappa = k)
+     resultIsoCIRE[i, !is.na(cirgenes[i, ])] <- unlist(allresults[[i]]$CIRE)
+     SCEs[i] <- allresults[[i]]$SCE
+     pvalues[i] <- allresults[[i]]$pvalue
+ }
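The loop above passes a known κ to condtest. As a rough cross-check of the reported output, the sketch below recomputes the p-value for experiment 9 (Rust elut2) in base R. It assumes, since Section 2.2 only states that the statistic is proportional to a chi-square test when κ is known, that 2κ·SCE is referred to a chi-square distribution with q − m degrees of freedom, where q is the number of parameters and m the number of level sets. The correction involving (q − 1)! is ignored because it is negligible for q = 16. The inputs are the values reported for that experiment; this is not the package's internal code.

```r
# Experiment 9 (Rust elut2): values reported in the text and in Table 5
kappa <- 5.66920          # ninth entry of kappas
sce   <- 0.125            # reported SCE
cire  <- c(1.909, 1.909, 1.909, 1.916, 1.916, 1.916,
           2.837, 2.837, 2.837, 2.837, 2.837, 2.837,
           1.352, 1.358, 1.358, 1.420)

q <- length(cire)           # number of parameters
m <- length(unique(cire))   # level sets, counted as distinct CIRE values

# Assumed known-kappa reference distribution: chi-square with q - m df
p_value <- pchisq(2 * kappa * sce, df = q - m, lower.tail = FALSE)
round(p_value, 4)
```

Under these assumptions the result is about 0.9992, in line with the p-value listed for this experiment in Table 5.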



Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15

Unr

estr

icte

dC

ircu

lar

Mea

ns

``

``

```

```

``

Exp

erim

ents

Gen

esssb1

cdc2

2msh

6psm

3ra

d21

cig2

mik1

h33

hhf1

hht

3ht

a2ht

b1

fkh2

chs2

sid2

slp1

θ 1θ 2

θ 3θ 4

θ 5θ 6

θ 7θ 8

θ 9θ 1

0θ 1

1θ 1

2θ 1

3θ 1

4θ 1

5θ 1

6

1Oliva

cdc

020

20

218

626

25

765

089

35

612

625

71

178

091

21

200

097

11

289

529

85

597

425

25

209

2Oliva

elut1

294

03

262

281

12

848

160

32

382

170

94

689

435

54

717

441

94

397

160

01

819

175

12

519

3Oliva

elut2

044

00

447

525

86

206

438

15

458

604

51

541

072

76

115

035

20

687

393

53

970

583

65

896

4Pen

gcd

c3

328

356

53

387

280

63

194

326

03

026

477

94

694

475

54

816

467

52

685

277

02

885

242

2

5Pen

gelut

333

33

912

389

43

444

364

83

970

429

65

189

506

05

144

521

65

243

333

83

607

308

23

185

6Rust

cdc1

196

52

151

203

42

029

174

22

073

173

03

130

299

33

085

306

32

873

128

11

179

190

61

237

7Rust

cdc2

180

92

208

141

51

351

196

41

940

197

83

745

358

43

670

347

93

590

138

31

456

106

41

396

8Rust

elut1

mdash1

457

128

8mdash

153

01

374

137

92

420

227

92

409

231

12

245

101

01

146

mdash1

091

9Rust

elut2

221

41

786

173

01

987

187

81

882

307

12

704

278

82

814

290

92

740

135

21

441

127

51

420

10R

ust

elut3

234

02

702

270

42

526

297

92

319

228

43

773

356

73

636

346

53

431

198

11

716

252

32

118

Tab

le4

Init

ial

S

Pom

be

ph

ase

angl

edat

afo

rea

chex

per

imen

t

16 isocir An R Package for Isotonic Inference for Circular Data

CIR

Eunder

Circular

Order

SCE

p-value``

``

```

```

``

Experim

ents

Gen

esφ1

φ2

φ3

φ4

φ5

φ6

φ7

φ8

φ9

φ10

φ11

φ12

φ13

φ14

φ15

φ16

1Oliva

cdc6257

62576257

62570054

00540054

10451045

10851085

12895069

50695069

52091270

06658

2Oliva

elut12526

25262526

25262526

25262526

45154515

45154515

45151600

17851785

25191218

07214

3Oliva

elut25849

58495849

58495849

58496045

05980598

05980598

06873935

39705836

58492660

02437

4Peng

cdc3225

32253225

32253225

32253225

47364736

47494749

47492685

26932693

26930248

09983

5Peng

elut3333

37253725

37253725

39704296

51245124

51445216

52433302

33023302

33020156

09850

6Rust

cdc11961

19611961

19611961

19611961

30293029

30293029

30291230

12301571

15710213

04142

7Rust

cdc21693

16931693

16931952

19521978

36143614

36143614

36141301

13011301

13960296

09536

8Rust

elut1mdash

13731373

mdash1427

14271427

23332333

23332333

23331010

1118mdash

11180028

09992

9Rust

elut21909

19091909

19161916

19162837

28372837

28372837

28371352

13581358

14200125

09992

10Rust

elut32340

25852585

25852585

25852585

35743574

35743574

35741849

18492321

23210269

08748

Tab

le5

CIR

E

SC

Ean

dp-valu

esfor

eachex

perim

ent

Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17

Affiliation

Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes

URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel

httpwwweiouvaes~cristina

Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm

  • Introduction
  • Circular models
    • Description Orders
    • CIRE and test
      • isocir
        • Background
        • Package structure
        • Examples
          • Applications
          • Conclusions

Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 5

The simulation study performed in Fernández et al. (2012) suggests that the power of this test is quite reasonable. Notice that, for a given data set, the p-value obtained by using the above methodology may serve as a useful goodness-of-fit criterion when comparing two or more plausible circular orders among a set of angular parameters. Larger values may suggest that the estimates are closer to the presumed circular order. Thus the statistical methodology developed in Fernández et al. (2012) can be used not only for testing the relative order among the parameters but also for selecting a “best fitting” circular order among several candidate circular orders.

These tests are implemented in the function condtest in the R package isocir, introduced in the next section.

3. Package isocir

We start this section by giving some background on R packages for isotonic regression and for the analysis of circular data. We then describe the structure of our package isocir and illustrate it with some examples.

3.1. Related Packages

As isotonic regression is a well-known and widely used technique, there are many packages in R for performing isotonic regression, such as:

  • isotone (de Leeuw et al. (2009)): Active set and generalized PAVA for isotone optimization.

  • Iso (Turner (2009)): Functions to perform isotonic regression.

  • OrdMonReg (Balabdaoui et al. (2009)): Compute least squares estimates of one bounded or two ordered isotonic regression curves.

Similarly, there are several packages in R for analyzing circular data, such as:

  • CircStats (Lund and Agostinelli (2009)): The implementation of the circular statistics from “Topics in Circular Statistics”, Jammalamadaka and SenGupta (2001). It is an R port of the S-plus library with the same name.

  • circular (Agostinelli and Lund (2011)): This package expands the CircStats package in several ways.

Since none of the existing packages for circular data is applicable to the analysis of circular data under constraints, in this article we introduce the software package “isotonic inference for circular data”, with the acronym isocir. Our package depends on circular (see Agostinelli and Lund (2011)) and combinat (see Chasalow (2010)). These packages should be installed before loading isocir.

3.2. Package Structure

For the convenience of the reader, we summarize all the functions, arguments and descriptions of our package isocir in Table 1.

Functions      Arguments                                 Description
sce            (arg1, arg2, meanrl)                      Sum of Circular Errors
mrl            (data)                                    Mean Resultant Length
CIRE           (data, groups, circular)                  Circular Isotonic Regression Estimator
condtest       (data, groups, kappa)                     Conditional Test
isocir         (cirmeans, SCE, CIRE, pvalue, kappa)      Creates an object of class isocir
is.isocir      (x)                                       Checks an object
print.isocir   (x, decCIRE, decpvalue, deckappa, ...)    Prints an object of class isocir
plot.isocir    (x, option, ...)                          Plots an object of class isocir

Table 1: Summary of the components of the package isocir.

In the following we describe each function of the software in detail.

Functions sce() and mrl()

The auxiliary function sce computes the sum of circular errors between a given q-dimensional vector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2). The function mrl computes the mean resultant length for the input data.
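As a rough illustration of what these two auxiliary quantities measure, the following base R sketch computes a sum of circular errors and a mean resultant length without using the package. It assumes that the circular error between two angles is one minus the cosine of their difference, and that an optional weight per component plays the role of meanrl. The names sce_sketch and mrl_sketch are hypothetical, and the code is not the package's own implementation.

```r
# Sketch only: assumes the circular error d(a, b) = 1 - cos(a - b).
# sce_sketch and mrl_sketch are hypothetical names, not functions of isocir.
sce_sketch <- function(arg1, arg2, weights = rep(1, length(arg1))) {
  stopifnot(length(arg1) == length(arg2), length(weights) == length(arg1))
  sum(weights * (1 - cos(arg1 - arg2)))
}

mrl_sketch <- function(angles) {
  # Length of the average of the unit vectors (cos, sin) of the angles.
  sqrt(mean(cos(angles))^2 + mean(sin(angles))^2)
}

theta <- c(0.025, 1.475, 3.274)
sce_sketch(theta, theta)   # identical vectors carry no circular error
mrl_sketch(c(0, pi))       # opposite directions cancel each other out
```

A mean resultant length close to 1 indicates tightly concentrated angles, while a value close to 0 indicates angles spread around the circle.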

Function CIRE()

Using the methodology developed in Rueda et al. (2009), this R function computes the CIRE for a given circular order (1) or a partial order (2). The arguments of this function are summarized in Table 2.

Arguments   Values
data        vector or matrix with the data
groups      the groups of the order
circular    =TRUE (by default), =FALSE

Table 2: Arguments of the CIRE function.

The input variable data is a matrix where each column contains the vector of unconstrained angular means corresponding to each replication. If there is only one replication, then data is a vector. Position i in the vector groups contains the number of the group to which the parameter φi belongs. The logical argument circular sets whether the order is wrapped around the circle, i.e. a circular order (circular=TRUE), or not, i.e. a simple order (circular=FALSE). For example, the simple order cone in the circle,

\[ C_{so} = \{\phi = (\phi_1, \phi_2, \dots, \phi_q) \in [0, 2\pi]^q : 0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_q \le 2\pi\}, \]

would be a non-circular order. The output of this function is an object of class isocir (explained later) containing the circular isotonic regression estimator ($\tilde{\phi}$), the unrestricted circular means ($\hat{\theta}$) and the corresponding sum of circular errors (SCE($\tilde{\phi}$, $\hat{\theta}$)).
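To make the role of groups and of the simple order cone more concrete, the following base R sketch shows how a vector of group labels partitions the parameter indices into the levels of an order, and how membership in the simple order cone could be checked. The helper names are hypothetical and the code does not call isocir; it only illustrates the definitions given above.

```r
# Sketch only: helper names are hypothetical, not part of isocir.
# Group labels in the format of the groups argument (one label per parameter).
groups <- c(1, 1, 1, 2, 2, 3, 4, 4)

# Indices of the parameters that belong to each level of the order.
levels_of_order <- split(seq_along(groups), groups)

# Membership in the simple order cone: all angles lie in [0, 2*pi]
# and are non-decreasing in the given index order.
in_simple_order_cone <- function(phi) {
  all(phi >= 0 & phi <= 2 * pi) && !is.unsorted(phi)
}

in_simple_order_cone(c(0.3, 1.2, 1.2, 4.0))  # non-decreasing angles
in_simple_order_cone(c(0.3, 4.0, 1.2))       # the order is violated
```

Here levels_of_order lists, for each group number, the positions of the parameters assigned to it, so parameters sharing a label are not ordered among themselves.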

Function condtest()

This function performs the conditional test and computes the corresponding p-value for the following hypotheses:

H0: The angles φ1, ..., φq follow a (simple or partial) circular order.

H1: H0 is not true.

The arguments of this function appear in Table 3 and are explained below.

Arguments     κ known                  κ unknown
data          numeric vector           matrix (as many columns as replications)
groups        numeric vector with the groups of the order to be tested
kappa         positive numeric value   (NULL)
biasCorrect   (NULL)                   =TRUE (by default), =FALSE

Table 3: Arguments of the condtest function.

Arguments data and groups are the same as those in function CIRE, although in this function groups is the order to be tested instead of the known order. The argument kappa is needed only when data is a vector. If there are no replications in data, the value of κ must be set by the user. Even when there are replicated data, if the user knows the value of κ, it may be supplied and it will be taken into consideration to perform the conditional test. When κ is unknown and there are replicated data, the parameter is internally estimated by maximum likelihood and the estimate of κ is shown in the output. The argument biasCorrect is related to the estimation of κ. If biasCorrect=TRUE, the bias correction appearing in Mardia and Jupp (2000), p. 87, is performed in the estimation of κ. The output of this function is an object of class isocir (explained below) with all the results from the conditional test: the CIRE ($\tilde{\phi}$), the unrestricted circular means ($\hat{\theta}$), the SCE (SCE($\tilde{\phi}$, $\hat{\theta}$)), the kappa value (estimated or supplied) and the p-value of the conditional test.
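The maximum likelihood step can be pictured with a single von Mises sample. For such a sample the estimate of κ solves A(κ) = R̄, where A(κ) = I1(κ)/I0(κ) is a ratio of modified Bessel functions and R̄ is the mean resultant length. The base R sketch below solves that equation numerically. It is only an illustration under simplifying assumptions (one sample, no bias correction, hypothetical function names) and should not be read as the routine that condtest runs internally.

```r
# Sketch only: single-sample von Mises ML estimate of kappa, no bias correction.
# A1 and kappa_ml_sketch are hypothetical helper names.
A1 <- function(kappa) {
  # Ratio I1(kappa) / I0(kappa); exponential scaling avoids overflow.
  besselI(kappa, nu = 1, expon.scaled = TRUE) /
    besselI(kappa, nu = 0, expon.scaled = TRUE)
}

kappa_ml_sketch <- function(angles, upper = 1e4) {
  rbar <- sqrt(mean(cos(angles))^2 + mean(sin(angles))^2)
  # The root search needs rbar strictly inside the bracketed range of A1.
  stopifnot(rbar > A1(1e-8), rbar < A1(upper))
  uniroot(function(k) A1(k) - rbar,
          lower = 1e-8, upper = upper, tol = 1e-10)$root
}

x <- c(0.10, 0.40, -0.20, 0.30, 0.00, -0.30)  # made-up angles in radians
kappa_hat <- kappa_ml_sketch(x)
kappa_hat
```

The more concentrated the angles are, the closer R̄ is to 1 and the larger the resulting estimate of κ.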

Class isocir

Finally, we describe the isocir function. This function creates S3 objects of class isocir, which are lists with the following elements:

  • $cirmeans is a list with the unrestricted circular means. Notice that when the argument data is a vector, these values match the input exactly. However, if there are replicated data, the argument data is a matrix and $cirmeans contains the corresponding unrestricted circular means ($\hat{\theta}_1, \dots, \hat{\theta}_q$).

  • $SCE is the value of the Sum of Circular Errors (SCE($\tilde{\phi}$, $\hat{\theta}$)).

  • $CIRE is a list with the Circular Isotonic Regression Estimator ($\tilde{\phi}$) obtained under the order defined by the groups argument.

  • $kappa is the value of kappa (either set by the user or estimated).

  • $pvalue is the p-value of the conditional test obtained from the function condtest.

These objects of class isocir are the output of the functions CIRE and condtest. The last two elements of the list ($kappa and $pvalue) are NULL if the object comes from the function CIRE. Otherwise, if the object comes from the function condtest, it contains not only the results of the conditional test ($kappa and $pvalue) but also an attribute called estkappa, which informs (or rather reminds) the user whether the value in $kappa has been internally estimated or supplied as a known input.

Some S3 methods have also been defined for the class isocir:

  • isocir(cirmeans = NULL, SCE = NULL, CIRE = NULL, pvalue = NULL, kappa = NULL): This function creates an object of class isocir.

  • is.isocir(x): This function checks whether the object x is of class isocir.

  • print.isocir(x, decCIRE, decpvalue, deckappa, ...): This S3 method is used to print an object x of class isocir. The number of decimal places can be chosen.

  • plot.isocir(x, option = c("CIRE", "cirmeans"), ...): This S3 method is used to plot an object x of class isocir. The argument option lets the user plot either the points of the Circular Isotonic Regression Estimator (by default) or the unrestricted circular means.
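For readers less familiar with S3, the following base R sketch mimics the list layout described above with a stand-in class. The class name isocir_sketch and all function names are hypothetical, chosen so as not to clash with the package; the code only illustrates how a constructor, a class check and a print method fit together, and it is not the code shipped with isocir.

```r
# Sketch only: a stand-in class "isocir_sketch" that mimics the list layout
# described in the text. It is not the constructor shipped with isocir.
new_isocir_sketch <- function(cirmeans = NULL, SCE = NULL, CIRE = NULL,
                              pvalue = NULL, kappa = NULL) {
  structure(
    list(cirmeans = cirmeans, SCE = SCE, CIRE = CIRE,
         pvalue = pvalue, kappa = kappa),
    class = "isocir_sketch"
  )
}

# Class check, analogous in spirit to a function such as is.isocir.
is_isocir_sketch <- function(x) inherits(x, "isocir_sketch")

# Print method with a user-chosen number of decimal places for the estimator.
print.isocir_sketch <- function(x, decCIRE = 3, ...) {
  cat("CIRE:", round(unlist(x$CIRE), decCIRE), "\n")
  cat("SCE: ", x$SCE, "\n")
  invisible(x)
}

obj <- new_isocir_sketch(cirmeans = list(0.025, 1.475),
                         SCE = 0, CIRE = list(0.025, 1.475))
print(obj)
```

Leaving pvalue and kappa as NULL mirrors the case of an object that carries no test results.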

3.3. Examples

In this section we provide examples to illustrate isocir.

Example 1

Suppose the observed angular means of eight populations are given by

\[ \hat{\theta}_1 = 0.025,\ \hat{\theta}_2 = 1.475,\ \hat{\theta}_3 = 3.274,\ \hat{\theta}_4 = 5.518,\ \hat{\theta}_5 = 2.859,\ \hat{\theta}_6 = 5.387,\ \hat{\theta}_7 = 4.179,\ \hat{\theta}_8 = 1.962. \]

We illustrate isocir by estimating the 8 population angular parameters under the following partial circular order constraint:

\[ \begin{Bmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \end{Bmatrix} \preceq \begin{Bmatrix} \phi_4 \\ \phi_5 \end{Bmatrix} \preceq \phi_6 \preceq \begin{Bmatrix} \phi_7 \\ \phi_8 \end{Bmatrix} \preceq \begin{Bmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \end{Bmatrix} \]

These data are a set of random circular data, called cirdata in our package, and they can be loaded as follows:

> data(cirdata)
> cirdata

Since in this example there are no replications, we provide the data in vector format. The groups of the order are defined as follows:

> orderGroups <- c(1, 1, 1, 2, 2, 3, 4, 4)

Thus we obtain the CIRE using the function CIRE as follows:

> example1CIRE <- CIRE(cirdata, groups = orderGroups, circular = TRUE)

The output is saved in example1CIRE and can be printed as follows:

> example1CIRE

Thus the constrained estimates, which satisfy the required order, are

\[ \tilde{\phi}_1 = 0.993,\ \tilde{\phi}_2 = 1.475,\ \tilde{\phi}_3 = 3.066,\ \tilde{\phi}_4 = 5.056,\ \tilde{\phi}_5 = 3.066,\ \tilde{\phi}_6 = 5.056,\ \tilde{\phi}_7 = 5.056,\ \tilde{\phi}_8 = 0.993, \]

where $\tilde{\phi}$ is the Circular Isotonic Regression Estimator of $\phi$. Results may be displayed graphically with plot(example1CIRE), which produces a plot with the points of the CIRE. To see the plot for the unrestricted estimates, the argument option = "cirmeans" can be used (i.e. plot(example1CIRE, option = "cirmeans")).
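One way to read these estimates is to notice which unrestricted means end up sharing a value. The base R sketch below takes the eight observed angular means of this example as 0.025, 1.475, 3.274, 5.518, 2.859, 5.387, 4.179 and 1.962, and computes the circular mean of the observations that received a common estimate. The results agree with the reported estimates up to rounding. This is consistent with, though by itself not a proof of, the estimator pooling order-violating neighbours into their circular mean. The helper circ_mean is a hypothetical name and the code does not call isocir.

```r
# Sketch only: circ_mean is a hypothetical helper, not part of isocir.
circ_mean <- function(angles) {
  # Direction of the average unit vector, mapped into [0, 2*pi).
  atan2(mean(sin(angles)), mean(cos(angles))) %% (2 * pi)
}

theta <- c(0.025, 1.475, 3.274, 5.518, 2.859, 5.387, 4.179, 1.962)

# Observations that share an estimate in the example: (1, 8), (3, 5), (4, 6, 7).
pooled <- c(circ_mean(theta[c(1, 8)]),
            circ_mean(theta[c(3, 5)]),
            circ_mean(theta[c(4, 6, 7)]))
round(pooled, digits = 2)
```

The second observation keeps its own value because it already respects the order relative to its neighbours.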

Example 2 (κ unknown, replications needed)

Using the data in our package called datareplic, we demonstrate the use of the function condtest when κ is unknown. As remarked earlier, when κ is unknown we need replicated data to estimate κ. The data set datareplic is a matrix where each column contains the values of a replication and each row contains the angles observed for one population. In this example we have 8 populations, and the hypotheses regarding the corresponding 8 parameters are as follows:

\[ H_0: \phi_1 \preceq \phi_2 \preceq \phi_3 \preceq \phi_4 \preceq \phi_5 \preceq \phi_6 \preceq \phi_7 \preceq \phi_8 \preceq \phi_1 \]

H1: H0 is not true.

We take the data from the package and set the groups of the order in the argument groups:

> data(datareplic)
> orderGroups2 <- c(1:8)

Since replicated data are available, we do not include the argument kappa, as we want the function to estimate it. Moreover, we correct the bias in the estimation of κ, so we set biasCorrect=TRUE. Thus we have the following code:

> example2test <- condtest(datareplic, groups = orderGroups2, biasCorrect = TRUE)
> example2test

The result is the p-value defined in (5). Since the p-value = 0.0034, we reject the null hypothesis and conclude that the parameters do not satisfy the specified circular order.

If the user is interested in printing the unrestricted circular means $\hat{\theta}$, the following command is used: example2test$cirmeans. The result is a list saved in the same format as CIRE. Since each group in the circular order has a single element, it is convenient to use the vector format. Hence we have:

> round(unlist(example2test$cirmeans), digits = 3)
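For replicated data the unrestricted circular means are obtained per population across the replications. The following base R sketch shows one plausible reading of that step, with one row per population and one column per replication, as described for datareplic. The small matrix is made-up illustrative data and circ_mean is a hypothetical helper, so this should not be taken as the package's internal code.

```r
# Sketch only: made-up data, hypothetical helper, no call to isocir.
circ_mean <- function(angles) {
  atan2(mean(sin(angles)), mean(cos(angles))) %% (2 * pi)
}

# Three populations (rows) observed in three replications (columns).
reps <- matrix(c(0.10, 0.20, 0.30,
                 6.20, 0.10, 6.25,
                 3.00, 3.20, 3.10),
               nrow = 3, byrow = TRUE)

cirmeans_sketch <- apply(reps, 1, circ_mean)
round(cirmeans_sketch, digits = 3)
```

The second row shows why an arithmetic mean is not appropriate here: its angles sit on both sides of the origin, and only the circular mean stays close to them.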

4. Application to Analysis of the Cell Cycle Gene Expression Data

As noted in the Introduction, there has been considerable discussion in the literature on the conservation of various aspects of cell cycle genes (Fernández et al. (2012)), particularly between two yeast species, namely S. cerevisiae (budding yeast) and S. pombe (fission yeast). Using the 10 published fission yeast data sets (Rustici et al. (2004), Oliva et al. (2005), Peng et al. (2005)), we illustrate the isocir package by testing the null hypothesis that 16 fission yeast genes, namely ssb1, cdc22, msh6, psm3, rad21, cig2, mik1, h33, hhf1, hht3, hta2, htb1, fkh2, chs2, sid2 and slp1, satisfy the same circular order as their budding yeast orthologs (RFA1, RNR1, MSH6, SMC3, MCD1, CLN2, SWE1, HHT2, HHF1, HHT1, HTA2, HTB2, FKH1, CHS2, DBF2 and CDC20), whose circular order is obtained from cyclebase (http://www.cyclebase.org) and published literature. Thus we test the following hypotheses:

\[ \begin{aligned} H_0 &: \phi_{ssb1} \preceq \phi_{cdc22} \preceq \phi_{msh6} \preceq \phi_{psm3} \preceq \phi_{rad21} \preceq \phi_{cig2} \preceq \phi_{mik1} \preceq \phi_{h33} \preceq \phi_{hhf1} \preceq \phi_{hht3} \preceq \phi_{hta2} \preceq \phi_{htb1} \preceq \phi_{fkh2} \preceq \phi_{chs2} \preceq \phi_{sid2} \preceq \phi_{slp1} \preceq \phi_{ssb1} \\ H_1 &: H_0 \text{ is not true} \end{aligned} \tag{6} \]

For each of the 10 experimental data sets, the unconstrained estimates of the phase angles of the above 16 fission yeast genes, appearing in Table 4, were obtained using the Random Periods Model (Liu et al. (2004)). The R code for that software can be obtained from http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm.

Notice that there are no replicated data here, since the experiments were not performed under the same experimental conditions. It appears that the 10 experiments were not synchronized (i.e. cells were probably not arrested at the same point in the cell cycle). For this reason, Table 4 shows a large variability in the estimates of the phase angle of each of the 16 genes. Even though there may be large variability in the estimated values, our interest is in the relative order of the phase angles among the 16 genes, which does not rely on the location of the pole and hence does not rely on the synchronization. As there are no replicated data, we have a single observation for each of the 16 fission yeast genes in each experiment, and therefore the values in Table 4 play the role of the unrestricted circular means in each experiment. Consequently, we suppose that

\[ \hat{\theta}_{ij} \stackrel{ind}{\sim} M(\phi_{ij}, \kappa_j), \quad i = 1, 2, \dots, 16, \quad j = 1, 2, \dots, 10, \]

where $\hat{\theta}_{ij}$ is the unrestricted circular mean of gene i in experiment j.

Since the 10 experiments may not be considered as replications of each other, we performed a separate test for each experiment. Moreover, as explained in Fernández et al. (2012), we assume that the concentration parameter κj depends on the experiment but not on the gene. The reason for this is that, out of the two sources of uncertainty, one specific to the gene and another one due to the experiment (and therefore common to all genes within the experiment), the former may be considered negligible relative to the latter, as the number of time points used in each time course experiment is fairly large for any specific gene.

The κj values considered for this example are obtained as in Fernández et al. (2012). The procedure used for the computation of these values comes from an analysis of variance type methodology. Under the assumptions made before, the model for the circular means is

\[ \phi_{ij} = \mu + \alpha_i + \beta_j, \]

where αi is the gene effect and βj is the experiment effect. The proposed model is analogous to the standard two-way analysis of variance model and is fully detailed in the supplementary material of Fernández et al. (2012).

For each of the 10 experiments we test the hypotheses (6) using the function condtest in our software. The following code gives the p-values for each experiment. Results are summarized in Table 5.

> data(cirgenes)
> # One concentration parameter per experiment (row of cirgenes)
> kappas <- c(2.64773, 3.24742, 2.15936, 4.15314, 4.54357,
+   29.07610, 6.51408, 14.19445, 5.66920, 11.12889)
> allresults <- list()
> resultIsoCIRE <- matrix(ncol = ncol(cirgenes), nrow = nrow(cirgenes))
> SCEs <- vector(mode = "numeric", length = nrow(cirgenes))
> pvalues <- vector(mode = "numeric", length = nrow(cirgenes))
> for (i in 1:nrow(cirgenes)) {
+   k <- kappas[i]
+   # Keep only the genes observed in experiment i
+   genes <- as.numeric(cirgenes[i, !is.na(cirgenes[i, ])])
+   allresults[[i]] <- condtest(genes, kappa = k)
+   resultIsoCIRE[i, !is.na(cirgenes[i, ])] <- unlist(allresults[[i]]$CIRE)
+   SCEs[i] <- allresults[[i]]$SCE
+   pvalues[i] <- allresults[[i]]$pvalue
+ }

From the p-values in Table 5 we see that the null hypothesis cannot be rejected in any of the 10 experiments, even at a level of significance as high as 0.20. Therefore, it seems plausible that the peak expressions of these 16 genes in S. pombe (fission yeast) follow the same order as their S. cerevisiae (budding yeast) orthologs.
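This reading of the results can be checked directly against the p-values reported in Table 5. The base R sketch below hard-codes those ten values, transcribed from the table rather than recomputed with condtest, and compares them with the 0.20 level. The variable names are illustrative only.

```r
# Sketch only: p-values transcribed from Table 5, in the order of the
# experiments (Oliva cdc, Oliva elut1, Oliva elut2, Peng cdc, Peng elut,
# Rust cdc1, Rust cdc2, Rust elut1, Rust elut2, Rust elut3).
pvalues_table5 <- c(0.6658, 0.7214, 0.2437, 0.9983, 0.9850,
                    0.4142, 0.9536, 0.9992, 0.9992, 0.8748)

level <- 0.20
rejected <- pvalues_table5 < level

any(rejected)          # is the circular order rejected in any experiment?
min(pvalues_table5)    # the smallest p-value across the ten experiments
```

The same comparison could be applied to the vector pvalues filled by the loop above once the package and its data are available.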

5. Conclusions

In this paper the R package isocir has been presented. This package provides useful tools for drawing inferences from circular data under order restrictions. There are two main functions (CIRE and condtest). The first one computes the CIRE, the circular version of the widely known isotonic regression in $\mathbb{R}^q$. The second one is designed for testing circular hypotheses using a conditional test. We have also created the class isocir in order to store all the results properly. Although we illustrated the proposed methodology using an example from cell biology, the proposed software can be applied to a wide range of contexts. For example, biologists working on circadian clocks may be interested in testing for the conservation of circular order among circadian genes between two tissues (e.g. Liu et al. (2006)).

We would like to emphasize that the field of constrained inference on a unit circle is in its infancy and is wide open for new developments, both in methods and in applications. As observed in the Introduction, such constrained inference problems arise naturally in many applications. Therefore, we expect the software described in this paper to be widely used by researchers working in such areas.

Acknowledgments

SB's, MAF's and CR's research was partially supported by Spanish MCI grant MTM2009-11161. SB's work has been partially financed by Junta de Castilla y León, Consejería de Educación and the European Social Fund within the P.O. Castilla y León 2007-2013 programme. SDP's research [in part] was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01 ES101744-04). We thank Dr. Leping Li, Keith Shockley, the anonymous reviewers, the associate editor and the editor for several useful comments which improved the presentation of this manuscript.

References

Agostinelli C, Lund U (2011). circular: Circular Statistics. CA: Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, Italy. UL: Department of Statistics, California Polytechnic State University, San Luis Obispo, California, USA. R package version 0.4-3, URL https://r-forge.r-project.org/projects/circular/.

Balabdaoui F, Rufibach K, Santambrogio F (2009). OrdMonReg: Compute Least Squares Estimates of One Bounded or Two Ordered Isotonic Regression Curves. R package version 1.0.2, URL http://CRAN.R-project.org/package=OrdMonReg.

Boles L, Lohmann K (2003). “True Navigation and Magnetic Maps in Spiny Lobsters.” Nature, 421, 60–63.

Bowers J, Morton I, Mould G (2000). “Directional Statistics of the Wind and Waves.” Applied Ocean Research, 22, 13–30.

Cermakian N, Lamont E, Bourdeau P, Boivin D (2011). “Circadian Clock Gene Expression in Brain Regions of Alzheimer's Disease Patients and Control Subjects.” Journal of Biological Rhythms, 26, 160–170.

Chasalow S (2010). combinat: Combinatorics Utilities. R package version 0.0-8, URL http://CRAN.R-project.org/package=combinat.

Cochran W, Mouritsen H, Wikelski M (2004). “Migrating Songbirds Recalibrate Their Magnetic Compass Daily from Twilight Cues.” Science, 304, 405–408.

de Leeuw J, Hornik K, Mair P (2009). “Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods.” Journal of Statistical Software, 32(5), 1–24. URL http://www.jstatsoft.org/v32/i05/.

Fernández M, Rueda C, Peddada S (2012). “Identification of a core set of signature cell cycle genes whose relative order of time to peak expression is conserved across species.” Nucl. Acids Res., 40(7), 2823–2832. URL http://nar.oxfordjournals.org/content/40/7/2823.

Jammalamadaka S, SenGupta A (2001). Topics in Circular Statistics. World Scientific.

Jensen JL, Jensen T, Lichtenberg U, Brunak S, Bork P (2006). “Co-evolution of transcriptional and post-translational cell-cycle regulation.” Nature, 443, 594–597.

Kibiak T, Jonas C (2007). “Applying Circular Statistics to the Analysis of Monitoring Data.” European Journal of Psychological Assessment, 23, 227–237.

Liu D, Peddada S, Li L, Weinberg C (2006). “Phase Analysis of Circadian-Related Genes in Two Tissues.” BMC Bioinformatics, 7, 87.

Liu D, Umbach D, Peddada S, Li L, Crockett P, Weinberg C (2004). “A Random Periods Model for Expression of Cell-Cycle Genes.” PNAS, 101(19), 7240–7245.

Lund U, Agostinelli C (2009). CircStats: Circular Statistics, from “Topics in Circular Statistics” (2001). R package version 0.2-4, URL http://CRAN.R-project.org/package=CircStats.

Mardia K, Hughes G, Taylor C, Singh H (2008). “A Multivariate von Mises Distribution with Applications to Bioinformatics.” Canadian Journal of Statistics, 36, 99–109.

Mardia K, Jupp P (2000). Directional Statistics. John Wiley & Sons.

Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, Skiena S, Futcher B, Leatherwood J (2005). “The Cell-Cycle-Regulated Genes of Schizosaccharomyces Pombe.” Plos Biology, 3, 1239–1260.

Peng X, Karuturi R, Miller L, Lin K, Jia Y, Kondu P, Wang L, Wong L, Liu E, Balasubramanian M, Liu J (2005). “Identification of Cell Cycle-Regulated Genes in Fission Yeast.” The American Society for Cell Biology, 16, 1026–1042.

R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.

Robertson T, Wright F, Dykstra R (1988). Order Restricted Statistical Inference. John Wiley & Sons.

Rueda C, Fernández M, Peddada S (2009). “Estimation of Parameters Subject to Order Restrictions on a Circle with Application to Estimation of Phase Angles of Cell-Cycle Genes.” Journal of the American Statistical Association, 104(485), 338–347.

Rustici G, Mata J, Kivinen K, Lio P, Penkett C, Burns G, Hayles J, Brazma A, Nurse P, Bähler J (2004). “Periodic Gene Expression Program of the Fission Yeast Cell Cycle.” Nature Genetics, 36, 809–817.

Turner R (2009). Iso: Functions to Perform Isotonic Regression. R package version 0.0-8, URL http://CRAN.R-project.org/package=Iso.

van Doorn E, Dhruva B, Sreenivasan K, Cassella V (2000). “Statistics of Wind Direction and Its Increments.” Physics of Fluids, 12, 1529–1534.

Zar J (1999). Biostatistical Analysis. Prentice Hall.

Unrestricted Circular Means

Experiment      ssb1   cdc22  msh6   psm3   rad21  cig2   mik1   h33    hhf1   hht3   hta2   htb1   fkh2   chs2   sid2   slp1
                θ1     θ2     θ3     θ4     θ5     θ6     θ7     θ8     θ9     θ10    θ11    θ12    θ13    θ14    θ15    θ16
 1 Oliva cdc    0.202  0.218  6.262  5.765  0.893  5.612  6.257  1.178  0.912  1.200  0.971  1.289  5.298  5.597  4.252  5.209
 2 Oliva elut1  2.940  3.262  2.811  2.848  1.603  2.382  1.709  4.689  4.355  4.717  4.419  4.397  1.600  1.819  1.751  2.519
 3 Oliva elut2  0.440  0.447  5.258  6.206  4.381  5.458  6.045  1.541  0.727  6.115  0.352  0.687  3.935  3.970  5.836  5.896
 4 Peng cdc     3.328  3.565  3.387  2.806  3.194  3.260  3.026  4.779  4.694  4.755  4.816  4.675  2.685  2.770  2.885  2.422
 5 Peng elut    3.333  3.912  3.894  3.444  3.648  3.970  4.296  5.189  5.060  5.144  5.216  5.243  3.338  3.607  3.082  3.185
 6 Rust cdc1    1.965  2.151  2.034  2.029  1.742  2.073  1.730  3.130  2.993  3.085  3.063  2.873  1.281  1.179  1.906  1.237
 7 Rust cdc2    1.809  2.208  1.415  1.351  1.964  1.940  1.978  3.745  3.584  3.670  3.479  3.590  1.383  1.456  1.064  1.396
 8 Rust elut1   -      1.457  1.288  -      1.530  1.374  1.379  2.420  2.279  2.409  2.311  2.245  1.010  1.146  -      1.091
 9 Rust elut2   2.214  1.786  1.730  1.987  1.878  1.882  3.071  2.704  2.788  2.814  2.909  2.740  1.352  1.441  1.275  1.420
10 Rust elut3   2.340  2.702  2.704  2.526  2.979  2.319  2.284  3.773  3.567  3.636  3.465  3.431  1.981  1.716  2.523  2.118

Table 4: Initial S. pombe phase angle data for each experiment. A dash marks a missing value.

CIRE under Circular Order, SCE and p-value

Experiment      φ1     φ2     φ3     φ4     φ5     φ6     φ7     φ8     φ9     φ10    φ11    φ12    φ13    φ14    φ15    φ16    SCE    p-value
 1 Oliva cdc    6.257  6.257  6.257  6.257  0.054  0.054  0.054  1.045  1.045  1.085  1.085  1.289  5.069  5.069  5.069  5.209  1.270  0.6658
 2 Oliva elut1  2.526  2.526  2.526  2.526  2.526  2.526  2.526  4.515  4.515  4.515  4.515  4.515  1.600  1.785  1.785  2.519  1.218  0.7214
 3 Oliva elut2  5.849  5.849  5.849  5.849  5.849  5.849  6.045  0.598  0.598  0.598  0.598  0.687  3.935  3.970  5.836  5.849  2.660  0.2437
 4 Peng cdc     3.225  3.225  3.225  3.225  3.225  3.225  3.225  4.736  4.736  4.749  4.749  4.749  2.685  2.693  2.693  2.693  0.248  0.9983
 5 Peng elut    3.333  3.725  3.725  3.725  3.725  3.970  4.296  5.124  5.124  5.144  5.216  5.243  3.302  3.302  3.302  3.302  0.156  0.9850
 6 Rust cdc1    1.961  1.961  1.961  1.961  1.961  1.961  1.961  3.029  3.029  3.029  3.029  3.029  1.230  1.230  1.571  1.571  0.213  0.4142
 7 Rust cdc2    1.693  1.693  1.693  1.693  1.952  1.952  1.978  3.614  3.614  3.614  3.614  3.614  1.301  1.301  1.301  1.396  0.296  0.9536
 8 Rust elut1   -      1.373  1.373  -      1.427  1.427  1.427  2.333  2.333  2.333  2.333  2.333  1.010  1.118  -      1.118  0.028  0.9992
 9 Rust elut2   1.909  1.909  1.909  1.916  1.916  1.916  2.837  2.837  2.837  2.837  2.837  2.837  1.352  1.358  1.358  1.420  0.125  0.9992
10 Rust elut3   2.340  2.585  2.585  2.585  2.585  2.585  2.585  3.574  3.574  3.574  3.574  3.574  1.849  1.849  2.321  2.321  0.269  0.8748

Table 5: CIRE, SCE and p-values for each experiment. Columns φ1 to φ16 follow the gene order of Table 4; a dash marks a missing value.

Affiliation:

Sandra Barragán, Miguel A. Fernández, Cristina Rueda
Departamento de Estadística e Investigación Operativa
Instituto de Matemáticas (IMUVA)
Universidad de Valladolid
Valladolid, Spain
E-mail: sandraba@eio.uva.es, miguelaf@eio.uva.es, crueda@eio.uva.es
URL: http://www.eio.uva.es/~sandra, http://www.eio.uva.es/~miguel, http://www.eio.uva.es/~cristina

Shyamal D. Peddada
Biostatistics Branch
National Institute of Environmental Health Sciences
Research Triangle Park, NC 27709, USA
E-mail: peddada@niehs.nih.gov
URL: http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm

6 isocir An R Package for Isotonic Inference for Circular Data

Functions Arguments Description

sce (arg1 arg2 meanrl) Sum of Circular Errors

mrl (data) Mean Resultant Length

CIRE (data groups circular) Circular Isotonic Regression Estimator

condtest (data groups kappa) Conditional Test

isocir (cirmeans SCE CIRE pvalue kappa) Creates an Object of class isocir

isisocir (x) Checks an Object

printisocir (xdecCIREdecpvaluedeckappa) Prints an object of class isocir

plotisocir (x option ) Plots an object of class isocir

Table 1 Summary of the components of the package isocir

In the following we describe each function of the software in detail

Functions sce() and mrl()

The auxiliary function sce computes the sum of circular errors between a given q-dimensionalvector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2) Thefunction mrl computes the mean resultant length for the input data

Function CIRE()

Using the methodology developed in Rueda et al. (2009), this R function computes the CIRE for a given circular order (1) or a partial order (2). The arguments of this function are summarized in Table 2.

Arguments   Values

data        vector or matrix with the data
groups      the groups of the order
circular    =TRUE (by default), =FALSE

Table 2: Arguments of the CIRE function.

The input variable data is a matrix where each column contains the vector of unconstrained angular means corresponding to each replication. If there is only one replication, then data is a vector. Position i in the vector groups contains the number of the group to which the parameter φi belongs. The logical argument circular sets whether the order is wrapped around the circle, i.e., a circular order (circular=TRUE), or not, i.e., a simple order (circular=FALSE). For example, the simple order cone in the circle,

$C_{so} = \{\phi = (\phi_1, \phi_2, \ldots, \phi_q) \in [0, 2\pi]^q : 0 \le \phi_1 \le \phi_2 \le \cdots \le \phi_q \le 2\pi\}$,

would be a non-circular order. The output of this function is an object of class isocir (explained later) containing the circular isotonic regression estimator (φ̃), the unrestricted circular means (θ) and the corresponding sum of circular errors (SCE(φ̃, θ)).
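When data contains replications, the order restriction is applied to the unrestricted circular means. The base R sketch below shows one common way to compute a circular mean for each row of a matrix whose columns are replications. The helper circ_mean_sketch and the toy matrix are hypothetical, and this is not necessarily how CIRE computes the means internally.

```r
# Hypothetical helper: circular mean direction of angles in radians, in [0, 2*pi).
circ_mean_sketch <- function(angles) {
  atan2(mean(sin(angles)), mean(cos(angles))) %% (2 * pi)
}

# Toy data: 3 populations (rows) observed in 2 replications (columns).
toy <- matrix(c(0.10, 6.20,
                1.50, 1.70,
                3.00, 3.20),
              nrow = 3, byrow = TRUE)

# One unrestricted circular mean per population.
theta_hat <- apply(toy, 1, circ_mean_sketch)
theta_hat
```

The first row shows why an arithmetic mean is not appropriate here: 0.10 and 6.20 lie on either side of the origin of the circle, so their circular mean is close to 0, whereas their arithmetic mean is 3.15.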

Function cond.test()

This function performs the conditional test and computes the corresponding p-value for the following hypotheses:

H0: The angles φ1, ..., φq follow a (simple or partial) circular order.

H1: H0 is not true.

The arguments of this function appear in Table 3 and are explained below.

Arguments     κ known                   κ unknown

data          numeric vector            matrix (as many columns as replications)
groups        numeric vector with the groups of the order to be tested
kappa         positive numeric value    (NULL)
biasCorrect   (NULL)                    =TRUE (by default), =FALSE

Table 3: Arguments of the cond.test function.

Arguments data and groups are the same as those in function CIRE, although in this function groups is the order to be tested instead of the known order. The argument kappa is needed only when data is a vector. If there are no replications in data, the value of κ must be set by the user. Even when there are replicated data, if the user knows the value of κ, it may be supplied and it will be used to perform the conditional test. When κ is unknown and there are replicated data, the parameter is internally estimated by maximum likelihood and the estimate of κ is shown in the output. The argument biasCorrect is related to the estimation of κ. If biasCorrect=TRUE, the bias correction appearing in Mardia and Jupp (2000), p. 87, is applied in the estimation of κ. The output of this function is an object of class isocir (explained below) with all the results from the conditional test: the CIRE (φ̃), the unrestricted circular means (θ), the SCE (SCE(φ̃, θ)), the kappa value (estimated or supplied) and the p-value of the conditional test.
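For intuition about the maximum likelihood step, the concentration of a von Mises distribution is commonly estimated by solving A1(κ) = I1(κ)/I0(κ) = r, where r is a mean resultant length and I0, I1 are modified Bessel functions. The base R sketch below solves this equation numerically. The helper names are hypothetical, the way the conditional test pools information across populations is not reproduced, and the bias correction of Mardia and Jupp (2000) is deliberately left out.

```r
# Ratio of modified Bessel functions A1(kappa) = I1(kappa) / I0(kappa).
# Exponential scaling cancels in the ratio and avoids overflow for large kappa.
A1 <- function(kappa) {
  besselI(kappa, 1, expon.scaled = TRUE) / besselI(kappa, 0, expon.scaled = TRUE)
}

# Hypothetical helper: solve A1(kappa) = rbar for kappa by root finding.
kappa_mle_sketch <- function(rbar, lower = 1e-8, upper = 1e4) {
  # The target must be bracketed by the search interval.
  stopifnot(rbar > A1(lower), rbar < A1(upper))
  uniroot(function(k) A1(k) - rbar, lower = lower, upper = upper,
          tol = 1e-10)$root
}

kappa_hat <- kappa_mle_sketch(0.9)
A1(kappa_hat)   # recovers the target resultant length up to numerical tolerance
```

Because A1 is increasing in κ, a mean resultant length close to 1 (tightly clustered replications) corresponds to a large concentration, and a value close to 0 to a nearly uniform spread.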

Class isocir

Finally, we describe the isocir function. This function creates S3 objects of class isocir, each of which is a list with the following elements:

$cirmeans is a list with the unrestricted circular means. Notice that when the argument data is a vector, these values match the input exactly. However, if there are replicated data, the argument data is a matrix and $cirmeans contains the corresponding unrestricted circular means (θ1, ..., θq).

$SCE is the value of the Sum of Circular Errors (SCE(φ̃, θ)).

$CIRE is a list with the Circular Isotonic Regression Estimator (φ̃) obtained under the order defined by the groups argument.

$kappa is the value of kappa (either set by the user or estimated).

$pvalue is the p-value of the conditional test obtained from the function cond.test.

Objects of class isocir are the output of the functions CIRE and cond.test. The last two elements of the list ($kappa and $pvalue) are NULL if the object comes from the function CIRE. Otherwise, if the object comes from the function cond.test, it contains not only the results of the conditional test ($kappa and $pvalue) but also an attribute called estkappa, which informs (or rather reminds) the user whether the value in $kappa has been internally estimated or supplied as a known input.

Some S3 methods have also been defined for the class isocir:

isocir(cirmeans = NULL, SCE = NULL, CIRE = NULL, pvalue = NULL, kappa = NULL): This function creates an object of class isocir.

is.isocir(x): This function checks whether the object x is of class isocir.

print.isocir(x, decCIRE, decpvalue, deckappa, ...): This S3 method is used to print an object x of class isocir. The number of decimal places can be chosen.

plot.isocir(x, option = c("CIRE", "cirmeans"), ...): This S3 method is used to plot an object x of class isocir. The argument option lets the user plot either the points of the Circular Isotonic Regression Estimator (by default) or the unrestricted circular means.
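To make the class structure concrete, the sketch below builds a plain list with the five elements described above, tags it with a class attribute, and defines a print method that base R dispatches to. It uses a made-up class name, isocir_sketch, so that it does not clash with the package. The constructor new_isocir_sketch and the printing format are hypothetical and are not the package's implementation.

```r
# Hypothetical constructor mirroring the five elements of an isocir object.
new_isocir_sketch <- function(cirmeans = NULL, SCE = NULL, CIRE = NULL,
                              pvalue = NULL, kappa = NULL) {
  obj <- list(cirmeans = cirmeans, SCE = SCE, CIRE = CIRE,
              pvalue = pvalue, kappa = kappa)
  class(obj) <- "isocir_sketch"
  obj
}

# Hypothetical S3 print method; print(x) dispatches here for this class.
print.isocir_sketch <- function(x, decCIRE = 3, ...) {
  cat("CIRE:", round(unlist(x$CIRE), decCIRE), "\n")
  if (!is.null(x$pvalue)) cat("p-value:", x$pvalue, "\n")
  invisible(x)
}

obj <- new_isocir_sketch(cirmeans = list(0.1, 1.2), SCE = 0,
                         CIRE = list(0.1, 1.2))
inherits(obj, "isocir_sketch")   # class check on the tagged list
print(obj)                       # p-value line is skipped because pvalue is NULL
```

Storing the results in a tagged list is what allows generic calls such as print(x) and plot(x) to pick the class-specific method automatically.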

3.3. Examples

In this section we provide examples to illustrate the use of isocir.

Example 1

Suppose the observed angular means of eight populations are given by

θ1 = 0.025, θ2 = 1.475, θ3 = 3.274, θ4 = 5.518, θ5 = 2.859, θ6 = 5.387, θ7 = 4.179, θ8 = 1.962.

We illustrate isocir for estimating the 8 population angular parameters under the following partial circular order constraint:

$\{\phi_1, \phi_2, \phi_3\} \preceq \{\phi_4, \phi_5\} \preceq \phi_6 \preceq \{\phi_7, \phi_8\} \preceq \{\phi_1, \phi_2, \phi_3\}$

These data are a set of random circular data called cirdata in our package, and they can be loaded as follows:

> data(cirdata)
> cirdata

Since there are no replications in this example, we provide data in vector format. The groups of the order are defined as follows:

> orderGroups <- c(1, 1, 1, 2, 2, 3, 4, 4)

Thus, we obtain the CIRE using the function CIRE as follows:

> example1CIRE <- CIRE(cirdata, groups = orderGroups, circular = TRUE)

The output is saved in example1CIRE and is printed as follows:

> example1CIRE

Thus, the constrained estimates satisfy the required order and are as follows:

φ̃1 = 0.993, φ̃2 = 1.475, φ̃3 = 3.066, φ̃4 = 5.056, φ̃5 = 3.066, φ̃6 = 5.056, φ̃7 = 5.056, φ̃8 = 0.993,

where φ̃ is the Circular Isotonic Regression Estimator of φ. Results may be displayed graphically by calling plot(example1CIRE), which produces a plot with the points of the CIRE. To see the plot for the unrestricted estimates, the argument option = "cirmeans" can be used (i.e., plot(example1CIRE, option = "cirmeans")).
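The reported estimates can be inspected with a few lines of base R, without the package. The sketch below lists the constrained estimates by group, and checks that each repeated value is close to the circular mean of the unrestricted values it replaces, which is the pooling behaviour one would expect from an isotonic-type estimator. The helper circ_mean_sketch is hypothetical, the unit-weight form of the SCE is an assumption, and agreement is only approximate because the printed values are rounded to three decimals.

```r
# Values as printed in Example 1 (rounded to three decimals).
theta  <- c(0.025, 1.475, 3.274, 5.518, 2.859, 5.387, 4.179, 1.962)
phi    <- c(0.993, 1.475, 3.066, 5.056, 3.066, 5.056, 5.056, 0.993)
groups <- c(1, 1, 1, 2, 2, 3, 4, 4)

# Hypothetical helper: circular mean direction in [0, 2*pi).
circ_mean_sketch <- function(angles) {
  atan2(mean(sin(angles)), mean(cos(angles))) %% (2 * pi)
}

# Constrained estimates listed by group of the partial circular order.
split(phi, groups)

# Indices that share a common constrained value, and the circular mean of
# the corresponding unrestricted values.
pooled <- split(seq_along(phi), phi)
pooled_means <- sapply(pooled, function(idx) circ_mean_sketch(theta[idx]))
round(pooled_means, 3)

# Sum of circular errors with unit weights (assumed form of the SCE).
sum(1 - cos(phi - theta))
```

For instance, populations 1 and 8 share one constrained value, and the circular mean of 0.025 and 1.962 is 0.9935, in line with the reported 0.993.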

Example 2 (κ unknown, replications needed)

Using the data set called datareplic in our package, we demonstrate the use of the function cond.test when κ is unknown. As remarked earlier, when κ is unknown we need replicated data to estimate κ. The data set datareplic is a matrix where each column contains the values of one replication and each row contains the angles observed for one population. In this example we have 8 populations, and the hypotheses regarding the corresponding 8 parameters are as follows:

$H_0: \phi_1 \preceq \phi_2 \preceq \phi_3 \preceq \phi_4 \preceq \phi_5 \preceq \phi_6 \preceq \phi_7 \preceq \phi_8 \preceq \phi_1$

$H_1: H_0$ is not true.

We take the data from the package and set the groups of the order in the argument groups:

> data(datareplic)
> orderGroups2 <- c(1:8)

Since replicated data are available, we do not include the argument kappa, as we want the function to estimate it. Moreover, we correct the bias in the estimation of κ, so we set biasCorrect=TRUE. Thus, we have the following code:

> example2test <- cond.test(datareplic, groups = orderGroups2, biasCorrect = TRUE)
> example2test

The result is the p-value defined in (5). Since the p-value is 0.0034, we reject the null hypothesis and conclude that the parameters do not satisfy the specified circular order.

If the user is interested in printing the unrestricted circular means θ, then the following command is used: example2test$cirmeans. The result is a list that is saved in the same format as CIRE. Since each group in the circular order has a single element, it is convenient to use the vector format. Hence we have:

> round(unlist(example2test$cirmeans), digits = 3)

4. Application to Analysis of the Cell Cycle Gene Expression Data

As noted in the Introduction, there has been considerable discussion in the literature on the conservation of various aspects of cell cycle genes (Fernández et al. (2012)), particularly between two yeast species, namely S. cerevisiae (budding yeast) and S. pombe (fission yeast). Using the 10 published fission yeast data sets (Rustici et al. (2004); Oliva et al. (2005); Peng et al. (2005)), we illustrate the isocir package by testing the null hypothesis that 16 fission yeast genes, namely ssb1, cdc22, msh6, psm3, rad21, cig2, mik1, h33, hhf1, hht3, hta2, htb1, fkh2, chs2, sid2 and slp1, satisfy the same circular order as their budding yeast orthologs (RFA1, RNR1, MSH6, SMC3, MCD1, CLN2, SWE1, HHT2, HHF1, HHT1, HTA2, HTB2, FKH1, CHS2, DBF2 and CDC20), whose circular order is obtained from cyclebase (http://www.cyclebase.org/) and published literature. Thus, we test the following hypotheses:

$H_0: \phi_{ssb1} \preceq \phi_{cdc22} \preceq \phi_{msh6} \preceq \phi_{psm3} \preceq \phi_{rad21} \preceq \phi_{cig2} \preceq \phi_{mik1} \preceq \phi_{h33} \preceq \phi_{hhf1} \preceq \phi_{hht3} \preceq \phi_{hta2} \preceq \phi_{htb1} \preceq \phi_{fkh2} \preceq \phi_{chs2} \preceq \phi_{sid2} \preceq \phi_{slp1} \preceq \phi_{ssb1}$

$H_1: H_0$ is not true. (6)

For each of the 10 experimental data sets, the unconstrained estimates of the phase angles of the above 16 fission yeast genes, appearing in Table 4, were obtained using the Random Periods Model (Liu et al. (2004)). The R code for that software can be obtained from http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm.

Notice that there are no replicated data here, since the experiments were not performed under the same experimental conditions. It appears that the 10 experiments were not synchronized (i.e., cells were probably not arrested at the same point in the cell cycle). For this reason, Table 4 shows a large variability across experiments in the estimates of the phase angles of each of the 16 genes. Even though there may be large variability in the estimated values, our interest is in the relative order of the phase angles among the 16 genes, which does not rely on the location of the pole and hence does not rely on the synchronization. As there are no replicated data, we have a single observation for each of the 16 fission yeast genes in each experiment, and therefore the values in Table 4 play the role of the unrestricted circular means in each experiment. Consequently, we suppose that

$\theta_{ij} \sim_{ind} M(\phi_{ij}, \kappa_j), \quad i = 1, 2, \ldots, 16, \quad j = 1, 2, \ldots, 10,$

where θij is the unrestricted circular mean of gene i in experiment j.
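For reference, M(φ, κ) in the model above denotes the von Mises distribution with mean direction φ and concentration κ. Its standard density, stated here as background rather than taken from the text, is

```latex
f(\theta; \phi, \kappa) = \frac{1}{2\pi I_0(\kappa)} \exp\{\kappa \cos(\theta - \phi)\}, \qquad \theta \in [0, 2\pi),
```

where $I_0$ is the modified Bessel function of the first kind of order zero. A larger κj therefore means that the phase estimates in experiment j are more tightly concentrated around their mean directions φij.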

Since the 10 experiments may not be considered as replications of each other, we performed a separate test for each experiment. Moreover, as explained in Fernández et al. (2012), we assume that the concentration parameter κj depends on the experiment but not on the gene. The reason for this is that, out of the two sources of uncertainty, one specific to the gene and another one due to the experiment (and therefore common to all genes within the experiment), the former source may be considered negligible relative to the latter, as the number of time points used in each time course experiment is fairly large for any specific gene.

The κj values considered for this example are taken from Fernández et al. (2012). The procedure used for the computation of these values comes from an analysis of variance type methodology. Under the assumptions made before, the model for the circular means is

$\phi_{ij} = \mu + \alpha_i + \beta_j,$

where αi is the gene effect and βj is the experiment effect. The proposed model is analogous to the standard two-way analysis of variance model and is fully detailed in the supplementary material of Fernández et al. (2012).

For each of the 10 experiments, we test the hypothesis (6) using the function cond.test from our software. The following code gives the p-values for each experiment. Results are summarized in Table 5.

> data(cirgenes)
> # One known concentration parameter per experiment (row of cirgenes).
> kappas <- c(2.64773, 3.24742, 2.15936, 4.15314, 4.54357,
+   29.07610, 6.51408, 14.19445, 5.66920, 11.12889)
> allresults <- list()
> resultIsoCIRE <- matrix(ncol = ncol(cirgenes), nrow = nrow(cirgenes))
> SCEs <- vector(mode = "numeric", length = nrow(cirgenes))
> pvalues <- vector(mode = "numeric", length = nrow(cirgenes))
> for (i in 1:nrow(cirgenes)) {
+   k <- kappas[i]
+   # Drop the genes with missing phase angles in experiment i.
+   genes <- as.numeric(cirgenes[i, !is.na(cirgenes[i, ])])
+   allresults[[i]] <- cond.test(genes, kappa = k)
+   resultIsoCIRE[i, !is.na(cirgenes[i, ])] <- unlist(allresults[[i]]$CIRE)
+   SCEs[i] <- allresults[[i]]$SCE
+   pvalues[i] <- allresults[[i]]$pvalue
+ }

From the p-values in Table 5, we see that the null hypothesis cannot be rejected in any of the 10 experiments, even at a level of significance as high as 0.20. Therefore, it seems plausible that the peak expressions of these 16 genes in S. pombe (fission yeast) follow the same order as their S. cerevisiae (budding yeast) orthologs.
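The conclusion can be checked directly from the reported p-values. The short sketch below types in the ten p-values of Table 5 by hand (it does not rerun the tests) and compares them with the 0.20 level; the vector name pvalues_table5 is hypothetical.

```r
# p-values as reported in Table 5, in the order of the 10 experiments.
pvalues_table5 <- c(0.6658, 0.7214, 0.2437, 0.9983, 0.9850,
                    0.4142, 0.9536, 0.9992, 0.9992, 0.8748)
names(pvalues_table5) <- c("Oliva cdc", "Oliva elut1", "Oliva elut2",
                           "Peng cdc", "Peng elut", "Rust cdc1", "Rust cdc2",
                           "Rust elut1", "Rust elut2", "Rust elut3")

alpha <- 0.20
any(pvalues_table5 <= alpha)       # FALSE: no experiment rejects H0 at level 0.20
names(which.min(pvalues_table5))   # experiment with the smallest p-value
```

The smallest p-value, 0.2437, still exceeds 0.20, so the order hypothesis is not rejected in any experiment at that level.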

5. Conclusions

In this paper, the R package isocir has been presented. This package provides useful tools for drawing inferences from circular data under order restrictions. There are two main functions (CIRE and cond.test). The first one computes the CIRE, the circular version of the widely known isotonic regression in $\mathbb{R}^q$. The second one is designed for testing circular hypotheses using a conditional test. We have also created the class isocir in order to properly save all the results. Although we illustrated the proposed methodology using an example from cell biology, the proposed software can be applied to a wide range of contexts. For example, biologists working on circadian clocks may be interested in testing for the conservation of circular order among circadian genes between two tissues (e.g., Liu et al. (2006)).

We would like to emphasize that the field of constrained inference on a unit circle is in its infancy and is wide open for new developments, both in methods and in applications. As observed in the Introduction, such constrained inference problems arise naturally in many applications. Therefore, we expect the software described in this paper to be widely used by researchers working in such areas.

Acknowledgments

SB's, MAF's and CR's research was partially supported by Spanish MCI grant MTM2009-11161. SB's work has been partially financed by Junta de Castilla y León, Consejería de Educación, and the European Social Fund within the P.O. Castilla y León 2007-2013 programme. SDP's research [in part] was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01 ES101744-04). We thank Dr. Leping Li, Keith Shockley, the anonymous reviewers, the associate editor and the editor for several useful comments which improved the presentation of this manuscript.

References

Agostinelli C, Lund U (2011). circular: Circular Statistics. CA: Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, Italy. UL: Department of Statistics, California Polytechnic State University, San Luis Obispo, California, USA. R package version 0.4-3, URL https://r-forge.r-project.org/projects/circular/.

Balabdaoui F, Rufibach K, Santambrogio F (2009). OrdMonReg: Compute Least Squares Estimates of One Bounded or Two Ordered Isotonic Regression Curves. R package version 1.0.2, URL http://CRAN.R-project.org/package=OrdMonReg.

Boles L, Lohmann K (2003). "True Navigation and Magnetic Maps in Spiny Lobsters." Nature, 421, 60–63.

Bowers J, Morton I, Mould G (2000). "Directional Statistics of the Wind and Waves." Applied Ocean Research, 22, 13–30.

Cermakian N, Lamont E, Bourdeau P, Boivin D (2011). "Circadian Clock Gene Expression in Brain Regions of Alzheimer's Disease Patients and Control Subjects." Journal of Biological Rhythms, 26, 160–170.

Chasalow S (2010). combinat: Combinatorics Utilities. R package version 0.0-8, URL http://CRAN.R-project.org/package=combinat.

Cochran W, Mouritsen H, Wikelski M (2004). "Migrating Songbirds Recalibrate Their Magnetic Compass Daily from Twilight Cues." Science, 304, 405–408.

de Leeuw J, Hornik K, Mair P (2009). "Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods." Journal of Statistical Software, 32(5), 1–24. URL http://www.jstatsoft.org/v32/i05/.

Fernández M, Rueda C, Peddada S (2012). "Identification of a core set of signature cell cycle genes whose relative order of time to peak expression is conserved across species." Nucl. Acids Res., 40(7), 2823–2832. URL http://nar.oxfordjournals.org/content/40/7/2823.

Jammalamadaka S, SenGupta A (2001). Topics in Circular Statistics. World Scientific.

Jensen JL, Jensen T, Lichtenberg U, Brunak S, Bork P (2006). "Co-evolution of transcriptional and post-translational cell-cycle regulation." Nature, 443, 594–597.

Kubiak T, Jonas C (2007). "Applying Circular Statistics to the Analysis of Monitoring Data." European Journal of Psychological Assessment, 23, 227–237.

Liu D, Peddada S, Li L, Weinberg C (2006). "Phase Analysis of Circadian-Related Genes in Two Tissues." BMC Bioinformatics, 7, 87.

Liu D, Umbach D, Peddada S, Li L, Crockett P, Weinberg C (2004). "A Random Periods Model for Expression of Cell-Cycle Genes." PNAS, 101(19), 7240–7245.

Lund U, Agostinelli C (2009). CircStats: Circular Statistics, from "Topics in Circular Statistics" (2001). R package version 0.2-4, URL http://CRAN.R-project.org/package=CircStats.

Mardia K, Hughes G, Taylor C, Singh H (2008). "A Multivariate von Mises Distribution with Applications to Bioinformatics." Canadian Journal of Statistics, 36, 99–109.

Mardia K, Jupp P (2000). Directional Statistics. John Wiley & Sons.

Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, Skiena S, Futcher B, Leatherwood J (2005). "The Cell-Cycle-Regulated Genes of Schizosaccharomyces Pombe." Plos Biology, 3, 1239–1260.

Peng X, Karuturi R, Miller L, Lin K, Jia Y, Kondu P, Wang L, Wong L, Liu E, Balasubramanian M, Liu J (2005). "Identification of Cell Cycle-Regulated Genes in Fission Yeast." The American Society for Cell Biology, 16, 1026–1042.

R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.

Robertson T, Wright F, Dykstra R (1988). Order Restricted Statistical Inference. John Wiley & Sons.

Rueda C, Fernández M, Peddada S (2009). "Estimation of Parameters Subject to Order Restrictions on a Circle with Application to Estimation of Phase Angles of Cell-Cycle Genes." Journal of the American Statistical Association, 104(485), 338–347.

Rustici G, Mata J, Kivinen K, Lio P, Penkett C, Burns G, Hayles J, Brazma A, Nurse P, Bahler J (2004). "Periodic Gene Expression Program of the Fission Yeast Cell Cycle." Nature Genetics, 36, 809–817.

Turner R (2009). Iso: Functions to Perform Isotonic Regression. R package version 0.0-8, URL http://CRAN.R-project.org/package=Iso.

van Doorn E, Dhruva B, Sreenivasan K, Cassella V (2000). "Statistics of Wind Direction and Its Increments." Physics of Fluids, 12, 1529–1534.

Zar J (1999). Biostatistical Analysis. Prentice Hall.

Unrestricted circular means, by experiment (rows) and gene (columns). NA marks a missing value.

Experiment      ssb1   cdc22  msh6   psm3   rad21  cig2   mik1   h33    hhf1   hht3   hta2   htb1   fkh2   chs2   sid2   slp1
                θ1     θ2     θ3     θ4     θ5     θ6     θ7     θ8     θ9     θ10    θ11    θ12    θ13    θ14    θ15    θ16

 1 Oliva cdc    0.202  0.218  6.262  5.765  0.893  5.612  6.257  1.178  0.912  1.200  0.971  1.289  5.298  5.597  4.252  5.209
 2 Oliva elut1  2.940  3.262  2.811  2.848  1.603  2.382  1.709  4.689  4.355  4.717  4.419  4.397  1.600  1.819  1.751  2.519
 3 Oliva elut2  0.440  0.447  5.258  6.206  4.381  5.458  6.045  1.541  0.727  6.115  0.352  0.687  3.935  3.970  5.836  5.896
 4 Peng cdc     3.328  3.565  3.387  2.806  3.194  3.260  3.026  4.779  4.694  4.755  4.816  4.675  2.685  2.770  2.885  2.422
 5 Peng elut    3.333  3.912  3.894  3.444  3.648  3.970  4.296  5.189  5.060  5.144  5.216  5.243  3.338  3.607  3.082  3.185
 6 Rust cdc1    1.965  2.151  2.034  2.029  1.742  2.073  1.730  3.130  2.993  3.085  3.063  2.873  1.281  1.179  1.906  1.237
 7 Rust cdc2    1.809  2.208  1.415  1.351  1.964  1.940  1.978  3.745  3.584  3.670  3.479  3.590  1.383  1.456  1.064  1.396
 8 Rust elut1   NA     1.457  1.288  NA     1.530  1.374  1.379  2.420  2.279  2.409  2.311  2.245  1.010  1.146  NA     1.091
 9 Rust elut2   2.214  1.786  1.730  1.987  1.878  1.882  3.071  2.704  2.788  2.814  2.909  2.740  1.352  1.441  1.275  1.420
10 Rust elut3   2.340  2.702  2.704  2.526  2.979  2.319  2.284  3.773  3.567  3.636  3.465  3.431  1.981  1.716  2.523  2.118

Table 4: Initial S. pombe phase angle data for each experiment.

CIRE under circular order, SCE and p-value, by experiment (rows) and gene (columns). NA marks a missing value.

Experiment      φ1     φ2     φ3     φ4     φ5     φ6     φ7     φ8     φ9     φ10    φ11    φ12    φ13    φ14    φ15    φ16    SCE    p-value

 1 Oliva cdc    6.257  6.257  6.257  6.257  0.054  0.054  0.054  1.045  1.045  1.085  1.085  1.289  5.069  5.069  5.069  5.209  1.270  0.6658
 2 Oliva elut1  2.526  2.526  2.526  2.526  2.526  2.526  2.526  4.515  4.515  4.515  4.515  4.515  1.600  1.785  1.785  2.519  1.218  0.7214
 3 Oliva elut2  5.849  5.849  5.849  5.849  5.849  5.849  6.045  0.598  0.598  0.598  0.598  0.687  3.935  3.970  5.836  5.849  2.660  0.2437
 4 Peng cdc     3.225  3.225  3.225  3.225  3.225  3.225  3.225  4.736  4.736  4.749  4.749  4.749  2.685  2.693  2.693  2.693  0.248  0.9983
 5 Peng elut    3.333  3.725  3.725  3.725  3.725  3.970  4.296  5.124  5.124  5.144  5.216  5.243  3.302  3.302  3.302  3.302  0.156  0.9850
 6 Rust cdc1    1.961  1.961  1.961  1.961  1.961  1.961  1.961  3.029  3.029  3.029  3.029  3.029  1.230  1.230  1.571  1.571  0.213  0.4142
 7 Rust cdc2    1.693  1.693  1.693  1.693  1.952  1.952  1.978  3.614  3.614  3.614  3.614  3.614  1.301  1.301  1.301  1.396  0.296  0.9536
 8 Rust elut1   NA     1.373  1.373  NA     1.427  1.427  1.427  2.333  2.333  2.333  2.333  2.333  1.010  1.118  NA     1.118  0.028  0.9992
 9 Rust elut2   1.909  1.909  1.909  1.916  1.916  1.916  2.837  2.837  2.837  2.837  2.837  2.837  1.352  1.358  1.358  1.420  0.125  0.9992
10 Rust elut3   2.340  2.585  2.585  2.585  2.585  2.585  2.585  3.574  3.574  3.574  3.574  3.574  1.849  1.849  2.321  2.321  0.269  0.8748

Table 5: CIRE, SCE and p-values for each experiment.

Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17

Affiliation

Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes

URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel

httpwwweiouvaes~cristina

Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm

  • Introduction
  • Circular models
    • Description Orders
    • CIRE and test
      • isocir
        • Background
        • Package structure
        • Examples
          • Applications
          • Conclusions

Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 7

H0 The angles φ1 φq follow a (simple or partial) circular order

H1 H0 is not true

The arguments of this function appear in Table 3 and are explained below Arguments data

Arguments κ known κ unknown

data numeric vector matrix (as many columns as replications)

groups numeric vector with the groups of the order to be tested

kappa positive numeric value (NULL)

biasCorrect (NULL) =TRUE(by default) =FALSE

Table 3 Arguments of the condtest function

and groups are same as those in function CIRE although in this function groups is the orderto be tested instead of the known order The argument kappa is needed only when data is avector If there are no replications in data the value of κ must be set by the user Even whenthere are replicated data if the user knows the value of κ it may be introduced and it willbe taken into consideration to perfom the conditional test When κ is unknown and there arereplicated data the parameter is internally estimated by maximum likelihood and κ is shownin the output The biasCorrect is related to the estimation of κ If biasCorrect=TRUE thebias correction appearing in Mardia and Jupp (2000) p 87 is performed in the estimationof κ The output of this function is an object of class isocir (explained below) with all theresults from the conditional test the CIRE (φ) the unrestricted circular means (θ) the SCE(SCE(φ θ)) the kappa value (estimated or introduced) and the p-value of the conditionaltest

Class isocir

Finally we describe the isocir function This function creates the S3 objects of class isocirwhich is a list with the following elements

$cirmeans is a list with the unrestricted circular means Notice that when the argumentdata is a vector these values match exactly with the input However if there are repli-cated data the argument data is a matrix and $cirmeans contains the correspondingunrestricted circular means (θ1 θq)

$SCE is the value of the Sum of Circular Errors (SCE(φ θ))

$CIRE is a list with the Circular Isotonic Regression Estimator (φ) obtained under the orderdefined by the groups argument

$kappa the value of kappa (either set by the user or estimated)

$pvalue the p-value of the conditional test obtained from the function condtest

These objects of class isocir are the output of the functions CIRE and condtest The lasttwo elements of the list ($kappa and $pvalue) are NULL if the object comes from the functionCIRE Otherwise if the object comes from the function condtest not only there are theresults of the conditional test ($kappa and $pvalue) but also an attribute called estkappa

8 isocir An R Package for Isotonic Inference for Circular Data

will inform (or rather remind) the user if the value in $kappa has been internally estimatedor introduced as a known input

Some S3 methods have also been defined for the class isocir

isocir(cirmeans = NULL SCE = NULL CIRE = NULL pvalue = NULL kappa = NULL)This function creates an object of class isocir

isisocir(x) This function checks whether the object x is of class isocir

printisocir(x decCIRE decpvalue deckappa ) This S3 method is usedto print an object x of class isocir The number of decimal places can be chosen

plotisocir(x option = c(CIRE cirmeans) ) This S3 method is usedto plot an object x of class isocir The argument option gives the user the optionto plot the points of the Circular Isotonic Regression Estimator (by default) or theunrestricted circular means

33 Examples

In this section we provide examples to illustrate isocir

Example 1

Suppose the observed angular means of eight populations are given by

θ1 = 0025 θ2 = 1475 θ3 = 3274θ4 = 5518 θ5 = 2859θ6 = 5387θ7 = 4179 θ8 = 1962We illustrate isocir for estimating the 8 population angular parameters under the fol-lowing partial circular order constraint φ1

φ2

φ3

φ4

φ5

φ6

φ7

φ8

φ1

φ2

φ3

These data are a set of random circular data called cirdata in our package and theycan be used by calling as below

gt data(cirdata)

gt cirdata

Since in this example there are no replications we provide data in a vector formatThe groups of the order are defined as follows

gt orderGroups lt- c(1 1 1 2 2 3 4 4)

Thus we obtain CIRE using the function CIRE as follows

gt example1CIRE lt- CIRE(cirdata groups = orderGroups circular = TRUE)

Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 9

The output is saved in example1CIRE and the printed output is as follows

gt example1CIRE

Thus the constrained estimates satisfy the required order as followsφ1 = 0993

φ2 = 1475

φ3 = 3066

φ4 = 5056

φ5 = 3066

φ6 = 5056

φ7 = 5056

φ8 = 0993

where φ is the Circular Isotonic Regression Estimator of φ Results may be displayedgraphically by setting plot(example1CIRE) When done so a plot with the pointsof the CIRE is produced To see the plot for the unrestricted estimates the argumentoption=cirmeans can be used (ie plot(example1CIRE option = cirmeans))

Example 2 (κ unknown (replications needed))

Using the data in our package called datareplic we demonstrate the use of the functioncondtest when κ is unknown As remarked earlier when κ is unknown we need repli-cate data to estimate κ The file datareplic is a matrix where each column containsthe values of a replication and each row the angles observed at each population meanIn this example we have 8 populations and hypotheses regarding the corresponding 8parameters are as follows

H0 φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ1

H1 H0 is not true

We take the data from the package and set the groups of the order in the argumentgroups

gt data(datareplic)

gt orderGroups2 lt- c(18)

Since replicate data are available we do not include the argument kappa as we wantthe function to estimate it Moreover we correct the bias in the estimation of κ so weset biasCorrect=TRUE Thus we have the following code

gt example2test lt- condtest(datareplicgroups=orderGroups2biasCorrect=TRUE)

gt example2test

The result is the p-value defined in (5) Since the p-value = 00034 we reject the nullhypothesis and conclude that the parameters do not satisfy the specified circular order

If the user is interested in printing the unrestricted circular means θ then the followingcommand is used example2test$cirmeans The result is a list that is saved in thesame format as CIRE Since each group in the circular order has a single element it isconvenient to use the vector format Hence we have

10 isocir An R Package for Isotonic Inference for Circular Data

gt round(unlist(example2test$cirmeans) digits = 3)

4 Application to Analysis of the Cell Cycle Gene Expression Data

As noted in the Introduction there has been considerable discussion in the literatureon the conservation of various aspects of cell cycle genes (Fernandez et al (2012))particularly between two yeast species namely S Cerevisiae (budding yeast) and SPombe (fission yeast) Using the 10 published budding yeast data sets (Rustici et al(2004) Oliva et al (2005) Peng et al (2005)) we illustrate the isocir package to testthe null hypothesis that 16 fission yeast genes namely ssb1 cdc22 msh6 psm3 rad21cig2 mik1 h33 hhf1 hht3 hta2 htb1 fkh2 chs2 sid2 and slp1 satisfy the samecircular order as their budding yeast orthologs (RFA1 RNR1 MSH6 SMC3 MCD1CLN2 SWE1 HHT2 HHF1 HHT1 HTA2 HTB2 FKH1 CHS2 DBF2 and CDC20)whose circular order is obtained from cyclebase (httpwwwcyclebaseorg) andpublished literature Thus we test the following hypothesis

H0 φssb1 φcdc22 φmsh6 φpsm3 φrad21 φcig2 φmik1 φh33 φhhf1 φhht3 φhta2 φhtb1 φfkh2 φchs2 φsid2 φslp1 φssb1

H1 H0 is not true

(6)

For each of the 10 experimental data sets the unconstrained estimates of the phaseangles of the above 16 fission yeast genes appearing in Table 5 were obtained usingthe Random Periods Model (Liu et al (2004)) The R code for that software canbe obtained from httpwwwniehsnihgovresearchatniehslabsbbstaff

peddadaindexcfm

Notice that there are no replicated data here since the experiments were not performedunder the same experimental conditions It appears that the 10 experiments were notsynchronized (ie cells were probably not arrested at the same point in the cell cycle)For this reason from Table 5 it appears that there is a large variability in the estimatesof phase angles of each of the 16 genes Even though there may be large variability inthe estimated values our interest is in the relative order of phase angles among the 16genes which does not rely on the location of the pole and hence does not rely on thesynchronization As there are no replicated data we have a single observation for eachof the 16 fission yeast genes in each experiment and therefore the values in Table 5play the role of the unrestricted circular mean in each experiment Consequently wesuppose that

θij simind M(φij κj) i = 1 2 16 j = 1 2 10

where θij is the unrestricted circular mean of the gene i in the experiment j

Since the 10 experiments may not considered as replications of each other we performeda separate test for each experiment Moreover as explained in Fernandez et al (2012)we assume that the concentration parameter κj depends on the experiment but not onthe gene The reason for this is that out of the two sources of uncertainty one specificto the gene and another one due to the experiment (and therefore common to all genes

Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11

within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene

The κ_j values considered for this example are obtained using Fernández et al. (2012). The procedure used for the computation of these values comes from an analysis of variance type methodology. Under the assumptions made before, the model for the circular means is

\[
\phi_{ij} = \mu + \alpha_i + \beta_j,
\]

where α_i is the gene effect and β_j is the experiment effect. The proposed model is analogous to the standard two-way analysis of variance model and is fully detailed in the supplementary material of Fernández et al. (2012).

For each of the 10 experiments we test the hypothesis (6) using the function cond.test of our package. The following code gives the p-values for each experiment. Results are summarized in Table 5.

> data(cirgenes)
> kappas <- c(2.64773, 3.24742, 2.15936, 4.15314, 4.54357,
+             29.07610, 6.51408, 14.19445, 5.66920, 11.12889)
> allresults <- list()
> resultIsoCIRE <- matrix(ncol = ncol(cirgenes), nrow = nrow(cirgenes))
> SCEs <- vector(mode = "numeric", length = nrow(cirgenes))
> pvalues <- vector(mode = "numeric", length = nrow(cirgenes))
> for (i in 1:nrow(cirgenes)) {
+   k <- kappas[i]
+   # keep only the genes observed in experiment i
+   genes <- as.numeric(cirgenes[i, !is.na(cirgenes[i, ])])
+   allresults[[i]] <- cond.test(genes, kappa = k)
+   # store the CIRE in the columns of the observed genes
+   resultIsoCIRE[i, !is.na(cirgenes[i, ])] <- unlist(allresults[[i]]$CIRE)
+   SCEs[i] <- allresults[[i]]$SCE
+   pvalues[i] <- allresults[[i]]$pvalue
+ }

From the p-values in Table 5 we see that the null hypothesis cannot be rejected in any of the 10 experiments, even at a level of significance as high as 0.20. Therefore, it seems plausible that the peak expressions of these 16 genes in S. Pombe (fission yeast) follow the same order as their S. Cerevisiae (budding yeast) orthologs.
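As a sanity check on the reported results, the SCE of an experiment can be recomputed from the unrestricted means and the CIRE. The sketch below assumes that the sum of circular errors has the form SCE = Σ_i [1 − cos(θ_i − φ̃_i)], as in Rueda et al. (2009); the helper name `sum_circular_errors` is hypothetical and is not a function of isocir. The values are those of experiment 8 (Rust elut1) for its 13 non-missing genes.

```r
# Sum of circular errors between unrestricted means and constrained estimates
# (assumed definition: sum over genes of 1 - cos(theta_i - phi_i)).
sum_circular_errors <- function(theta, phi) {
  sum(1 - cos(theta - phi))
}

# Experiment 8 (Rust elut1): the 13 observed genes, in gene order.
theta <- c(1.457, 1.288, 1.530, 1.374, 1.379, 2.420, 2.279, 2.409,
           2.311, 2.245, 1.010, 1.146, 1.091)
cire  <- c(1.373, 1.373, 1.427, 1.427, 1.427, 2.333, 2.333, 2.333,
           2.333, 2.333, 1.010, 1.118, 1.118)

sce8 <- sum_circular_errors(theta, cire)
round(sce8, 3)
```

Under that assumption the result rounds to 0.028, the SCE reported for this experiment in Table 5.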

5. Conclusions

In this paper, the R package isocir has been presented. This package provides useful tools for drawing inferences from circular data under order restrictions. There are two main functions (CIRE and cond.test). The first one computes the CIRE, the circular version of the widely known isotonic regression in R^q. The second one is designed for testing circular hypotheses using a conditional test. We have also created the class isocir in order to properly save all the results. Although we illustrated the proposed methodology using an example from cell biology, the proposed software can be applied to a wide range of contexts. For example, biologists working on circadian clocks may be interested in testing for the conservation of circular order among circadian genes between two tissues (e.g., Liu et al. (2006)).

We would like to emphasize that the field of constrained inference on a unit circle is in its infancy and is wide open for new developments, both in methods and in applications. As observed in the introduction, such constrained inference problems arise naturally in many applications. Therefore, we expect the software described in this paper to be widely used by researchers working in such areas.

Acknowledgments

SB's, MAF's and CR's research was partially supported by Spanish MCI grant MTM2009-11161. SB's work has been partially financed by Junta de Castilla y León, Consejería de Educación and the European Social Fund within the PO Castilla y León 2007-2013 programme. SDP's research [in part] was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01 ES101744-04). We thank Dr. Leping Li, Keith Shockley, the anonymous reviewers, the associate editor and the editor for several useful comments which improved the presentation of this manuscript.

References

Agostinelli C, Lund U (2011). circular: Circular Statistics. CA: Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, Italy. UL: Department of Statistics, California Polytechnic State University, San Luis Obispo, California, USA. R package version 0.4-3, URL https://r-forge.r-project.org/projects/circular

Balabdaoui F, Rufibach K, Santambrogio F (2009). OrdMonReg: Compute Least Squares Estimates of One Bounded or Two Ordered Isotonic Regression Curves. R package version 1.0.2, URL http://CRAN.R-project.org/package=OrdMonReg

Boles L, Lohmann K (2003). "True Navigation and Magnetic Maps in Spiny Lobsters." Nature, 421, 60-63.

Bowers J, Morton I, Mould G (2000). "Directional Statistics of the Wind and Waves." Applied Ocean Research, 22, 13-30.

Cermakian N, Lamont E, Bourdeau P, Boivin D (2011). "Circadian Clock Gene Expression in Brain Regions of Alzheimer's Disease Patients and Control Subjects." Journal of Biological Rhythms, 26, 160-170.

Chasalow S (2010). combinat: Combinatorics Utilities. R package version 0.0-8, URL http://CRAN.R-project.org/package=combinat

Cochran W, Mouritsen H, Wikelski M (2004). "Migrating Songbirds Recalibrate Their Magnetic Compass Daily from Twilight Cues." Science, 304, 405-408.

de Leeuw J, Hornik K, Mair P (2009). "Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods." Journal of Statistical Software, 32(5), 1-24. URL http://www.jstatsoft.org/v32/i05

Fernández M, Rueda C, Peddada S (2012). "Identification of a core set of signature cell cycle genes whose relative order of time to peak expression is conserved across species." Nucl. Acids Res., 40(7), 2823-2832. URL http://nar.oxfordjournals.org/content/40/7/2823

Jammalamadaka S, SenGupta A (2001). Topics in Circular Statistics. World Scientific.

Jensen JL, Jensen T, Lichtenberg U, Brunak S, Bork P (2006). "Co-evolution of transcriptional and post-translational cell-cycle regulation." Nature, 443, 594-597.

Kibiak T, Jonas C (2007). "Applying Circular Statistics to the Analysis of Monitoring Data." European Journal of Psychological Assessment, 23, 227-237.

Liu D, Peddada S, Li L, Weinberg C (2006). "Phase Analysis of Circadian-Related Genes in Two Tissues." BMC Bioinformatics, 7, 87.

Liu D, Umbach D, Peddada S, Li L, Crockett P, Weinberg C (2004). "A Random Periods Model for Expression of Cell-Cycle Genes." PNAS, 101(19), 7240-7245.

Lund U, Agostinelli C (2009). CircStats: Circular Statistics, from "Topics in Circular Statistics" (2001). R package version 0.2-4, URL http://CRAN.R-project.org/package=CircStats

Mardia K, Hughes G, Taylor C, Singh H (2008). "A Multivariate von Mises Distribution with Applications to Bioinformatics." Canadian Journal of Statistics, 36, 99-109.

Mardia K, Jupp P (2000). Directional Statistics. John Wiley & Sons.

Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, Skiena S, Futcher B, Leatherwood J (2005). "The Cell-Cycle-Regulated Genes of Schizosaccharomyces Pombe." Plos Biology, 3, 1239-1260.

Peng X, Karuturi R, Miller L, Lin K, Jia Y, Kondu P, Wang L, Wong L, Liu E, Balasubramanian M, Liu J (2005). "Identification of Cell Cycle-Regulated Genes in Fission Yeast." The American Society for Cell Biology, 16, 1026-1042.

R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org

Robertson T, Wright F, Dykstra R (1988). Order Restricted Statistical Inference. John Wiley & Sons.

Rueda C, Fernández M, Peddada S (2009). "Estimation of Parameters Subject to Order Restrictions on a Circle with Application to Estimation of Phase Angles of Cell-Cycle Genes." Journal of the American Statistical Association, 104(485), 338-347.

Rustici G, Mata J, Kivinen K, Lio P, Penkett C, Burns G, Hayles J, Brazma A, Nurse P, Bahler J (2004). "Periodic Gene Expression Program of the Fission Yeast Cell Cycle." Nature Genetics, 36, 809-817.

Turner R (2009). Iso: Functions to Perform Isotonic Regression. R package version 0.0-8, URL http://CRAN.R-project.org/package=Iso

van Doorn E, Dhruva B, Sreenivasan K, Cassella V (2000). "Statistics of Wind Direction and Its Increments." Physics of Fluids, 12, 1529-1534.

Zar J (1999). Biostatistical Analysis. Prentice Hall.

Unrestricted circular means (rows: experiments; columns: genes; "-" marks a missing value)

Experiment      ssb1   cdc22  msh6   psm3   rad21  cig2   mik1   h3.3   hhf1   hht3   hta2   htb1   fkh2   chs2   sid2   slp1
                θ1     θ2     θ3     θ4     θ5     θ6     θ7     θ8     θ9     θ10    θ11    θ12    θ13    θ14    θ15    θ16
1  Oliva cdc    0.202  0.218  6.262  5.765  0.893  5.612  6.257  1.178  0.912  1.200  0.971  1.289  5.298  5.597  4.252  5.209
2  Oliva elut1  2.940  3.262  2.811  2.848  1.603  2.382  1.709  4.689  4.355  4.717  4.419  4.397  1.600  1.819  1.751  2.519
3  Oliva elut2  0.440  0.447  5.258  6.206  4.381  5.458  6.045  1.541  0.727  6.115  0.352  0.687  3.935  3.970  5.836  5.896
4  Peng cdc     3.328  3.565  3.387  2.806  3.194  3.260  3.026  4.779  4.694  4.755  4.816  4.675  2.685  2.770  2.885  2.422
5  Peng elut    3.333  3.912  3.894  3.444  3.648  3.970  4.296  5.189  5.060  5.144  5.216  5.243  3.338  3.607  3.082  3.185
6  Rust cdc1    1.965  2.151  2.034  2.029  1.742  2.073  1.730  3.130  2.993  3.085  3.063  2.873  1.281  1.179  1.906  1.237
7  Rust cdc2    1.809  2.208  1.415  1.351  1.964  1.940  1.978  3.745  3.584  3.670  3.479  3.590  1.383  1.456  1.064  1.396
8  Rust elut1   -      1.457  1.288  -      1.530  1.374  1.379  2.420  2.279  2.409  2.311  2.245  1.010  1.146  -      1.091
9  Rust elut2   2.214  1.786  1.730  1.987  1.878  1.882  3.071  2.704  2.788  2.814  2.909  2.740  1.352  1.441  1.275  1.420
10 Rust elut3   2.340  2.702  2.704  2.526  2.979  2.319  2.284  3.773  3.567  3.636  3.465  3.431  1.981  1.716  2.523  2.118

Table 4: Initial S. Pombe phase angle data for each experiment.

CIRE under circular order (rows: experiments; columns: genes; "-" marks a missing value), with SCE and p-value

Experiment      φ1     φ2     φ3     φ4     φ5     φ6     φ7     φ8     φ9     φ10    φ11    φ12    φ13    φ14    φ15    φ16    SCE    p-value
1  Oliva cdc    6.257  6.257  6.257  6.257  0.054  0.054  0.054  1.045  1.045  1.085  1.085  1.289  5.069  5.069  5.069  5.209  1.270  0.6658
2  Oliva elut1  2.526  2.526  2.526  2.526  2.526  2.526  2.526  4.515  4.515  4.515  4.515  4.515  1.600  1.785  1.785  2.519  1.218  0.7214
3  Oliva elut2  5.849  5.849  5.849  5.849  5.849  5.849  6.045  0.598  0.598  0.598  0.598  0.687  3.935  3.970  5.836  5.849  2.660  0.2437
4  Peng cdc     3.225  3.225  3.225  3.225  3.225  3.225  3.225  4.736  4.736  4.749  4.749  4.749  2.685  2.693  2.693  2.693  0.248  0.9983
5  Peng elut    3.333  3.725  3.725  3.725  3.725  3.970  4.296  5.124  5.124  5.144  5.216  5.243  3.302  3.302  3.302  3.302  0.156  0.9850
6  Rust cdc1    1.961  1.961  1.961  1.961  1.961  1.961  1.961  3.029  3.029  3.029  3.029  3.029  1.230  1.230  1.571  1.571  0.213  0.4142
7  Rust cdc2    1.693  1.693  1.693  1.693  1.952  1.952  1.978  3.614  3.614  3.614  3.614  3.614  1.301  1.301  1.301  1.396  0.296  0.9536
8  Rust elut1   -      1.373  1.373  -      1.427  1.427  1.427  2.333  2.333  2.333  2.333  2.333  1.010  1.118  -      1.118  0.028  0.9992
9  Rust elut2   1.909  1.909  1.909  1.916  1.916  1.916  2.837  2.837  2.837  2.837  2.837  2.837  1.352  1.358  1.358  1.420  0.125  0.9992
10 Rust elut3   2.340  2.585  2.585  2.585  2.585  2.585  2.585  3.574  3.574  3.574  3.574  3.574  1.849  1.849  2.321  2.321  0.269  0.8748

Table 5: CIRE, SCE and p-values for each experiment.

Affiliation:

Sandra Barragán, Miguel A. Fernández, Cristina Rueda
Departamento de Estadística e Investigación Operativa
Instituto de Matemáticas (IMUVA)
Universidad de Valladolid
Valladolid, Spain
E-mail: sandraba@eio.uva.es, miguelaf@eio.uva.es, crueda@eio.uva.es
URL: http://www.eio.uva.es/~sandra, http://www.eio.uva.es/~miguel, http://www.eio.uva.es/~cristina

Shyamal D. Peddada
Biostatistics Branch
National Institute of Environmental Health Sciences
Research Triangle Park, NC 27709, USA
E-mail: peddada@niehs.nih.gov
URL: http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm


will inform (or rather remind) the user if the value in $kappa has been internally estimated or introduced as a known input.

Some S3 methods have also been defined for the class isocir:

isocir(cirmeans = NULL, SCE = NULL, CIRE = NULL, pvalue = NULL, kappa = NULL): This function creates an object of class isocir.

is.isocir(x): This function checks whether the object x is of class isocir.

print.isocir(x, decCIRE, decpvalue, deckappa, ...): This S3 method is used to print an object x of class isocir. The number of decimal places can be chosen.

plot.isocir(x, option = c("CIRE", "cirmeans"), ...): This S3 method is used to plot an object x of class isocir. The argument option gives the user the option to plot the points of the Circular Isotonic Regression Estimator (by default) or the unrestricted circular means.

3.3. Examples

In this section we provide examples to illustrate isocir.

Example 1

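Suppose the observed angular means of eight populations are given by:

\[
\theta_1 = 0.025, \quad \theta_2 = 1.475, \quad \theta_3 = 3.274, \quad \theta_4 = 5.518, \quad \theta_5 = 2.859, \quad \theta_6 = 5.387, \quad \theta_7 = 4.179, \quad \theta_8 = 1.962.
\]

We illustrate isocir for estimating the 8 population angular parameters under the following partial circular order constraint:

\[
\{\phi_1, \phi_2, \phi_3\} \preceq \{\phi_4, \phi_5\} \preceq \phi_6 \preceq \{\phi_7, \phi_8\} \preceq \{\phi_1, \phi_2, \phi_3\}.
\]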

These data are a set of random circular data called cirdata in our package, and they can be loaded and displayed as below:

> data(cirdata)
> cirdata

Since in this example there are no replications, we provide the data in a vector format. The groups of the order are defined as follows:

> orderGroups <- c(1, 1, 1, 2, 2, 3, 4, 4)
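To see how the groups vector encodes the partial order, it can help to list which populations fall in each level set. This is a small base R sketch for illustration only; the name `levelSets` is hypothetical, and isocir does not require this step.

```r
orderGroups <- c(1, 1, 1, 2, 2, 3, 4, 4)

# Indices of the populations belonging to each level set of the order:
# populations sharing a group number are not ordered among themselves,
# while every population in group g must precede those in group g + 1.
levelSets <- split(seq_along(orderGroups), orderGroups)
levelSets
```

Here populations 1 to 3 form the first level set, 4 and 5 the second, 6 the third, and 7 and 8 the fourth, which matches the partial circular order stated above.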

Thus we obtain the CIRE using the function CIRE as follows:

> example1CIRE <- CIRE(cirdata, groups = orderGroups, circular = TRUE)

The output is saved in example1CIRE, and it is printed as follows:

> example1CIRE

Thus the constrained estimates satisfy the required order as follows:

\[
\tilde{\phi}_1 = 0.993, \quad \tilde{\phi}_2 = 1.475, \quad \tilde{\phi}_3 = 3.066, \quad \tilde{\phi}_4 = 5.056, \quad \tilde{\phi}_5 = 3.066, \quad \tilde{\phi}_6 = 5.056, \quad \tilde{\phi}_7 = 5.056, \quad \tilde{\phi}_8 = 0.993,
\]

where φ̃ is the Circular Isotonic Regression Estimator of φ. Results may be displayed graphically by calling plot(example1CIRE), which produces a plot with the points of the CIRE. To see the plot for the unrestricted estimates, the argument option = "cirmeans" can be used (i.e., plot(example1CIRE, option = "cirmeans")).

Example 2 (κ unknown (replications needed))

Using the data in our package called datareplic, we demonstrate the use of the function cond.test when κ is unknown. As remarked earlier, when κ is unknown we need replicate data to estimate κ. The file datareplic is a matrix where each column contains the values of a replication and each row contains the angles observed for one population. In this example we have 8 populations, and the hypotheses regarding the corresponding 8 parameters are as follows:

\[
\begin{aligned}
H_0 &: \phi_1 \preceq \phi_2 \preceq \phi_3 \preceq \phi_4 \preceq \phi_5 \preceq \phi_6 \preceq \phi_7 \preceq \phi_8 \preceq \phi_1 \\
H_1 &: H_0 \text{ is not true.}
\end{aligned}
\]

We take the data from the package and set the groups of the order in the argument groups:

> data(datareplic)
> orderGroups2 <- c(1:8)

Since replicate data are available, we do not include the argument kappa, as we want the function to estimate it. Moreover, we correct the bias in the estimation of κ, so we set biasCorrect = TRUE. Thus we have the following code:

> example2test <- cond.test(datareplic, groups = orderGroups2, biasCorrect = TRUE)
> example2test

The result is the p-value defined in (5). Since the p-value = 0.0034, we reject the null hypothesis and conclude that the parameters do not satisfy the specified circular order.

If the user is interested in printing the unrestricted circular means θ, the following command is used: example2test$cirmeans. The result is a list that is saved in the same format as CIRE. Since each group in the circular order has a single element, it is convenient to use the vector format. Hence we have:

> round(unlist(example2test$cirmeans), digits = 3)
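For replicate data, the unrestricted circular mean of each population is commonly taken to be the mean direction of its replications. The base R sketch below assumes that definition; the helper `circular_row_means` and the small matrix `angles` are hypothetical and do not come from the package. It follows the layout described above, with one row per population and one column per replication.

```r
# Mean direction of each row of a matrix of angles (radians), mapped to [0, 2*pi).
circular_row_means <- function(x) {
  atan2(rowSums(sin(x)), rowSums(cos(x))) %% (2 * pi)
}

# Hypothetical replicate data: 3 populations (rows), 2 replications (columns).
angles <- matrix(c(0.10, 0.30,
                   6.20, 0.10,
                   3.00, 3.20), ncol = 2, byrow = TRUE)

cmeans <- circular_row_means(angles)
round(cmeans, 3)   # approximately 0.2, 0.008 and 3.1
```

The second row shows why an arithmetic mean would be misleading: 6.20 and 0.10 lie on either side of the pole, so their mean direction is close to 0 rather than close to 3.15.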

4. Application to Analysis of the Cell Cycle Gene Expression Data

As noted in the Introduction, there has been considerable discussion in the literature on the conservation of various aspects of cell cycle genes (Fernández et al. (2012)), particularly between two yeast species, namely S. Cerevisiae (budding yeast) and S. Pombe (fission yeast). Using the 10 published fission yeast data sets (Rustici et al. (2004), Oliva et al. (2005), Peng et al. (2005)), we illustrate the isocir package by testing the null hypothesis that 16 fission yeast genes, namely ssb1, cdc22, msh6, psm3, rad21, cig2, mik1, h3.3, hhf1, hht3, hta2, htb1, fkh2, chs2, sid2 and slp1, satisfy the same circular order as their budding yeast orthologs (RFA1, RNR1, MSH6, SMC3, MCD1, CLN2, SWE1, HHT2, HHF1, HHT1, HTA2, HTB2, FKH1, CHS2, DBF2 and CDC20), whose circular order is obtained from cyclebase (http://www.cyclebase.org) and published literature. Thus we test the following hypotheses:

\[
\begin{aligned}
H_0 &: \phi_{\text{ssb1}} \preceq \phi_{\text{cdc22}} \preceq \phi_{\text{msh6}} \preceq \phi_{\text{psm3}} \preceq \phi_{\text{rad21}} \preceq \phi_{\text{cig2}} \preceq \phi_{\text{mik1}} \preceq \phi_{\text{h3.3}} \preceq \phi_{\text{hhf1}} \preceq \phi_{\text{hht3}} \preceq \phi_{\text{hta2}} \preceq \phi_{\text{htb1}} \preceq \phi_{\text{fkh2}} \preceq \phi_{\text{chs2}} \preceq \phi_{\text{sid2}} \preceq \phi_{\text{slp1}} \preceq \phi_{\text{ssb1}} \\
H_1 &: H_0 \text{ is not true.}
\end{aligned}
\tag{6}
\]


Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 9

The output is saved in example1CIRE and the printed output is as follows

gt example1CIRE

Thus the constrained estimates satisfy the required order as followsφ1 = 0993

φ2 = 1475

φ3 = 3066

φ4 = 5056

φ5 = 3066

φ6 = 5056

φ7 = 5056

φ8 = 0993

where φ is the Circular Isotonic Regression Estimator of φ Results may be displayedgraphically by setting plot(example1CIRE) When done so a plot with the pointsof the CIRE is produced To see the plot for the unrestricted estimates the argumentoption=cirmeans can be used (ie plot(example1CIRE option = cirmeans))

Example 2 (κ unknown (replications needed))

Using the data in our package called datareplic we demonstrate the use of the functioncondtest when κ is unknown As remarked earlier when κ is unknown we need repli-cate data to estimate κ The file datareplic is a matrix where each column containsthe values of a replication and each row the angles observed at each population meanIn this example we have 8 populations and hypotheses regarding the corresponding 8parameters are as follows

H0 φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ1

H1 H0 is not true

We take the data from the package and set the groups of the order in the argumentgroups

gt data(datareplic)

gt orderGroups2 lt- c(18)

Since replicate data are available, we do not include the argument kappa, as we want the function to estimate it. Moreover, we correct the bias in the estimation of κ, so we set biasCorrect = TRUE. Thus we have the following code:

> example2test <- cond.test(datareplic, groups = orderGroups2, biasCorrect = TRUE)
> example2test

The result is the p-value defined in (5). Since the p-value is 0.0034, we reject the null hypothesis and conclude that the parameters do not satisfy the specified circular order.

If the user is interested in printing the unrestricted circular means θ, the following command is used: example2test$cirmeans. The result is a list that is saved in the same format as CIRE. Since each group in the circular order has a single element, it is convenient to use the vector format. Hence we have:

10 isocir An R Package for Isotonic Inference for Circular Data

> round(unlist(example2test$cirmeans), digits = 3)
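As a rough illustration of what replicate data make possible, the sketch below computes a circular mean per population and a textbook von Mises estimate of κ from the mean resultant length, using base R only. The toy matrix, the helper names, and the simple averaging of resultant lengths across populations are all assumptions made for illustration; the estimator and bias correction actually used by cond.test may differ.

```r
# Toy replicate matrix: rows are populations, columns are replications
# (same layout as described for datareplic; the values are made up).
angles <- rbind(c(0.10, 0.25, 6.20),
                c(1.40, 1.55, 1.50))

# Circular mean of a vector of angles, reported in [0, 2*pi)
circ_mean <- function(x) atan2(sum(sin(x)), sum(cos(x))) %% (2 * pi)

# Mean resultant length: near 0 when dispersed, near 1 when concentrated
mean_resultant_length <- function(x) {
  sqrt(sum(sin(x))^2 + sum(cos(x))^2) / length(x)
}

# Ratio I1(kappa) / I0(kappa); the exponential scaling cancels in the ratio
bessel_ratio <- function(kappa) {
  besselI(kappa, 1, expon.scaled = TRUE) / besselI(kappa, 0, expon.scaled = TRUE)
}

# Textbook estimate: solve I1(kappa) / I0(kappa) = rbar for kappa
kappa_from_rbar <- function(rbar) {
  uniroot(function(k) bessel_ratio(k) - rbar,
          interval = c(1e-8, 1e4), tol = 1e-10)$root
}

means <- apply(angles, 1, circ_mean)
rbar  <- mean(apply(angles, 1, mean_resultant_length))  # crude pooling (assumption)
print(round(means, 3))
print(kappa_from_rbar(rbar))
```

Note that the first row mixes angles just above 0 and just below 2π; the arithmetic mean would land near the opposite side of the circle, whereas the circular mean stays close to 0.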

4. Application to Analysis of the Cell Cycle Gene Expression Data

As noted in the Introduction, there has been considerable discussion in the literature on the conservation of various aspects of cell cycle genes (Fernández et al. (2012)), particularly between two yeast species, namely S. cerevisiae (budding yeast) and S. pombe (fission yeast). Using the 10 published fission yeast data sets (Rustici et al. (2004), Oliva et al. (2005), Peng et al. (2005)), we illustrate the isocir package by testing the null hypothesis that 16 fission yeast genes, namely ssb1, cdc22, msh6, psm3, rad21, cig2, mik1, h33, hhf1, hht3, hta2, htb1, fkh2, chs2, sid2 and slp1, satisfy the same circular order as their budding yeast orthologs (RFA1, RNR1, MSH6, SMC3, MCD1, CLN2, SWE1, HHT2, HHF1, HHT1, HTA2, HTB2, FKH1, CHS2, DBF2 and CDC20), whose circular order is obtained from cyclebase (http://www.cyclebase.org) and published literature. Thus we test the following hypothesis:

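$$
\begin{aligned}
H_0 &: \phi_{\mathrm{ssb1}} \preceq \phi_{\mathrm{cdc22}} \preceq \phi_{\mathrm{msh6}} \preceq \phi_{\mathrm{psm3}} \preceq \phi_{\mathrm{rad21}} \preceq \phi_{\mathrm{cig2}} \preceq \phi_{\mathrm{mik1}} \preceq \phi_{\mathrm{h33}} \preceq \phi_{\mathrm{hhf1}} \preceq \phi_{\mathrm{hht3}} \preceq \phi_{\mathrm{hta2}} \preceq \phi_{\mathrm{htb1}} \preceq \phi_{\mathrm{fkh2}} \preceq \phi_{\mathrm{chs2}} \preceq \phi_{\mathrm{sid2}} \preceq \phi_{\mathrm{slp1}} \preceq \phi_{\mathrm{ssb1}} \\
H_1 &: H_0 \text{ is not true.}
\end{aligned}
\tag{6}
$$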

For each of the 10 experimental data sets, the unconstrained estimates of the phase angles of the above 16 fission yeast genes, appearing in Table 4, were obtained using the Random Periods Model (Liu et al. (2004)). The R code for that software can be obtained from http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm.

Notice that there are no replicated data here, since the experiments were not performed under the same experimental conditions. It appears that the 10 experiments were not synchronized (i.e., cells were probably not arrested at the same point in the cell cycle). For this reason, Table 4 shows a large variability in the estimates of the phase angles of each of the 16 genes. Even though there may be large variability in the estimated values, our interest is in the relative order of the phase angles among the 16 genes, which does not rely on the location of the pole and hence does not rely on the synchronization. As there are no replicated data, we have a single observation for each of the 16 fission yeast genes in each experiment, and therefore the values in Table 4 play the role of the unrestricted circular mean in each experiment. Consequently, we suppose that

$$\theta_{ij} \overset{\text{ind}}{\sim} M(\phi_{ij}, \kappa_j), \qquad i = 1, 2, \ldots, 16, \quad j = 1, 2, \ldots, 10,$$

where $\theta_{ij}$ is the unrestricted circular mean of gene $i$ in experiment $j$.
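For reference, and assuming that $M(\phi, \kappa)$ denotes the von Mises distribution with mean direction $\phi$ and concentration $\kappa$ (the usual reading of this notation in circular statistics), the density is

$$f(\theta; \phi, \kappa) = \frac{1}{2\pi I_0(\kappa)} \exp\{\kappa \cos(\theta - \phi)\}, \qquad 0 \le \theta < 2\pi,$$

where $I_0$ is the modified Bessel function of the first kind and order zero. Larger values of $\kappa$ correspond to observations that are more tightly concentrated around $\phi$.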

Since the 10 experiments may not be considered as replications of each other, we performed a separate test for each experiment. Moreover, as explained in Fernández et al. (2012), we assume that the concentration parameter $\kappa_j$ depends on the experiment but not on the gene. The reason is that, of the two sources of uncertainty, one specific to the gene and the other due to the experiment (and therefore common to all genes within the experiment), the former may be considered negligible relative to the latter, as the number of time points used in each time course experiment is fairly large for any specific gene.

The $\kappa_j$ values considered for this example are obtained as in Fernández et al. (2012). The procedure used to compute these values comes from an analysis-of-variance-type methodology. Under the assumptions made above, the model for the circular means is

$$\phi_{ij} = \mu + \alpha_i + \beta_j,$$

where $\alpha_i$ is the gene effect and $\beta_j$ is the experiment effect. The proposed model is analogous to the standard two-way analysis of variance model and is fully detailed in the supplementary material of Fernández et al. (2012).

For each of the 10 experiments, we test the hypothesis (6) using the function cond.test in our software. The following code gives the p-values for each experiment. Results are summarized in Table 5.

> data(cirgenes)
> # Concentration parameters, one per experiment
> kappas <- c(2.64773, 3.24742, 2.15936, 4.15314, 4.54357,
+             29.07610, 6.51408, 14.19445, 5.66920, 11.12889)
> allresults <- list()
> resultIsoCIRE <- matrix(ncol = ncol(cirgenes), nrow = nrow(cirgenes))
> SCEs <- vector(mode = "numeric", length = nrow(cirgenes))
> pvalues <- vector(mode = "numeric", length = nrow(cirgenes))
> for (i in 1:nrow(cirgenes)) {
+   k <- kappas[i]
+   # Keep only the genes observed in experiment i
+   genes <- as.numeric(cirgenes[i, !is.na(cirgenes[i, ])])
+   allresults[[i]] <- cond.test(genes, kappa = k)
+   resultIsoCIRE[i, !is.na(cirgenes[i, ])] <- unlist(allresults[[i]]$CIRE)
+   SCEs[i] <- allresults[[i]]$SCE
+   pvalues[i] <- allresults[[i]]$pvalue
+ }

From the p-values in Table 5, we see that the null hypothesis cannot be rejected in any of the 10 experiments, even at a significance level as high as 0.20. Therefore, it seems plausible that the peak expressions of these 16 genes in S. pombe (fission yeast) follow the same order as their S. cerevisiae (budding yeast) orthologs.
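The SCE column can be illustrated with a short base-R sketch. It assumes that the sum of circular errors is the sum of 1 - cos(θ_i - φ_i) over the observed genes, and it uses the values for experiment 8 (Rust elut1) as read from Tables 4 and 5, with NA for genes that have no estimate. The helper name sum_circular_errors is hypothetical and is not the package's own function.

```r
# Unrestricted circular means and CIRE for experiment 8 ("Rust elut1"),
# transcribed from Tables 4 and 5; NA marks genes with no estimate.
theta <- c(NA, 1.457, 1.288, NA, 1.530, 1.374, 1.379, 2.420,
           2.279, 2.409, 2.311, 2.245, 1.010, 1.146, NA, 1.091)
cire  <- c(NA, 1.373, 1.373, NA, 1.427, 1.427, 1.427, 2.333,
           2.333, 2.333, 2.333, 2.333, 1.010, 1.118, NA, 1.118)

# Hypothetical helper: sum of circular errors, assumed to be
# sum(1 - cos(theta - phi)) over the genes observed in both vectors
sum_circular_errors <- function(theta, phi) {
  keep <- !is.na(theta) & !is.na(phi)
  sum(1 - cos(theta[keep] - phi[keep]))
}

sce <- sum_circular_errors(theta, cire)
print(round(sce, 3))
```

Under these assumptions the result is approximately 0.028, in line with the SCE reported for that experiment, which suggests how small the adjustment needed to satisfy the circular order is in this case.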

5. Conclusions

In this paper the R package isocir has been presented. This package provides useful tools for drawing inferences from circular data under order restrictions. There are two main functions, CIRE and cond.test. The first computes the CIRE, the circular version of the widely known isotonic regression in $\mathbb{R}^q$. The second is designed for testing circular hypotheses using a conditional test. We have also created the class isocir in order to properly save all the results. Although we illustrated the proposed methodology using an example from cell biology, the proposed software can be applied to a wide range of contexts. For example, biologists working on circadian clocks may be interested in testing for the conservation of circular order among circadian genes between two tissues (e.g., Liu et al. (2006)).

We would like to emphasize that the field of constrained inference on a unit circle is in its infancy and is wide open for new developments, both in methods and in applications. As observed in the introduction, such constrained inference problems arise naturally in many applications. Therefore, we expect the software described in this paper to be widely used by researchers working in such areas.

Acknowledgments

SB's, MAF's and CR's research was partially supported by Spanish MCI grant MTM2009-11161. SB's work has been partially financed by Junta de Castilla y León, Consejería de Educación and the European Social Fund within the P.O. Castilla y León 2007-2013 programme. SDP's research [in part] was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01 ES101744-04). We thank Dr. Leping Li, Keith Shockley, the anonymous reviewers, the associate editor and the editor for several useful comments which improved the presentation of this manuscript.

References

Agostinelli C, Lund U (2011). circular: Circular Statistics. CA: Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, Italy. UL: Department of Statistics, California Polytechnic State University, San Luis Obispo, California, USA. R package version 0.4-3, URL https://r-forge.r-project.org/projects/circular.

Balabdaoui F, Rufibach K, Santambrogio F (2009). OrdMonReg: Compute Least Squares Estimates of One Bounded or Two Ordered Isotonic Regression Curves. R package version 1.0.2, URL http://CRAN.R-project.org/package=OrdMonReg.

Boles L, Lohmann K (2003). "True Navigation and Magnetic Maps in Spiny Lobsters." Nature, 421, 60–63.

Bowers J, Morton I, Mould G (2000). "Directional Statistics of the Wind and Waves." Applied Ocean Research, 22, 13–30.

Cermakian N, Lamont E, Bourdeau P, Boivin D (2011). "Circadian Clock Gene Expression in Brain Regions of Alzheimer's Disease Patients and Control Subjects." Journal of Biological Rhythms, 26, 160–170.

Chasalow S (2010). combinat: Combinatorics Utilities. R package version 0.0-8, URL http://CRAN.R-project.org/package=combinat.

Cochran W, Mouritsen H, Wikelski M (2004). "Migrating Songbirds Recalibrate Their Magnetic Compass Daily from Twilight Cues." Science, 304, 405–408.

de Leeuw J, Hornik K, Mair P (2009). "Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods." Journal of Statistical Software, 32(5), 1–24. URL http://www.jstatsoft.org/v32/i05.

Fernández M, Rueda C, Peddada S (2012). "Identification of a core set of signature cell cycle genes whose relative order of time to peak expression is conserved across species." Nucl. Acids Res., 40(7), 2823–2832. URL http://nar.oxfordjournals.org/content/40/7/2823.

Jammalamadaka S, SenGupta A (2001). Topics in Circular Statistics. World Scientific.

Jensen JL, Jensen T, Lichtenberg U, Brunak S, Bork P (2006). "Co-evolution of transcriptional and post-translational cell-cycle regulation." Nature, 443, 594–597.

Kibiak T, Jonas C (2007). "Applying Circular Statistics to the Analysis of Monitoring Data." European Journal of Psychological Assessment, 23, 227–237.

Liu D, Peddada S, Li L, Weinberg C (2006). "Phase Analysis of Circadian-Related Genes in Two Tissues." BMC Bioinformatics, 7, 87.

Liu D, Umbach D, Peddada S, Li L, Crockett P, Weinberg C (2004). "A Random Periods Model for Expression of Cell-Cycle Genes." PNAS, 101(19), 7240–7245.

Lund U, Agostinelli C (2009). CircStats: Circular Statistics, from "Topics in Circular Statistics" (2001). R package version 0.2-4, URL http://CRAN.R-project.org/package=CircStats.

Mardia K, Hughes G, Taylor C, Singh H (2008). "A Multivariate von Mises Distribution with Applications to Bioinformatics." Canadian Journal of Statistics, 36, 99–109.

Mardia K, Jupp P (2000). Directional Statistics. John Wiley & Sons.

Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, Skiena S, Futcher B, Leatherwood J (2005). "The Cell-Cycle-Regulated Genes of Schizosaccharomyces Pombe." Plos Biology, 3, 1239–1260.

Peng X, Karuturi R, Miller L, Lin K, Jia Y, Kondu P, Wang L, Wong L, Liu E, Balasubramanian M, Liu J (2005). "Identification of Cell Cycle-Regulated Genes in Fission Yeast." The American Society for Cell Biology, 16, 1026–1042.

R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Robertson T, Wright F, Dykstra R (1988). Order Restricted Statistical Inference. John Wiley & Sons.

Rueda C, Fernández M, Peddada S (2009). "Estimation of Parameters Subject to Order Restrictions on a Circle with Application to Estimation of Phase Angles of Cell-Cycle Genes." Journal of the American Statistical Association, 104(485), 338–347.

Rustici G, Mata J, Kivinen K, Lio P, Penkett C, Burns G, Hayles J, Brazma A, Nurse P, Bahler J (2004). "Periodic Gene Expression Program of the Fission Yeast Cell Cycle." Nature Genetics, 36, 809–817.

Turner R (2009). Iso: Functions to Perform Isotonic Regression. R package version 0.0-8, URL http://CRAN.R-project.org/package=Iso.

van Doorn E, Dhruva B, Sreenivasan K, Cassella V (2000). "Statistics of Wind Direction and Its Increments." Physics of Fluids, 12, 1529–1534.

Zar J (1999). Biostatistical Analysis. Prentice Hall.

Unrestricted Circular Means

| Experiments | ssb1 (θ1) | cdc22 (θ2) | msh6 (θ3) | psm3 (θ4) | rad21 (θ5) | cig2 (θ6) | mik1 (θ7) | h33 (θ8) | hhf1 (θ9) | hht3 (θ10) | hta2 (θ11) | htb1 (θ12) | fkh2 (θ13) | chs2 (θ14) | sid2 (θ15) | slp1 (θ16) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 Oliva cdc | 0.202 | 0.218 | 6.262 | 5.765 | 0.893 | 5.612 | 6.257 | 1.178 | 0.912 | 1.200 | 0.971 | 1.289 | 5.298 | 5.597 | 4.252 | 5.209 |
| 2 Oliva elut1 | 2.940 | 3.262 | 2.811 | 2.848 | 1.603 | 2.382 | 1.709 | 4.689 | 4.355 | 4.717 | 4.419 | 4.397 | 1.600 | 1.819 | 1.751 | 2.519 |
| 3 Oliva elut2 | 0.440 | 0.447 | 5.258 | 6.206 | 4.381 | 5.458 | 6.045 | 1.541 | 0.727 | 6.115 | 0.352 | 0.687 | 3.935 | 3.970 | 5.836 | 5.896 |
| 4 Peng cdc | 3.328 | 3.565 | 3.387 | 2.806 | 3.194 | 3.260 | 3.026 | 4.779 | 4.694 | 4.755 | 4.816 | 4.675 | 2.685 | 2.770 | 2.885 | 2.422 |
| 5 Peng elut | 3.333 | 3.912 | 3.894 | 3.444 | 3.648 | 3.970 | 4.296 | 5.189 | 5.060 | 5.144 | 5.216 | 5.243 | 3.338 | 3.607 | 3.082 | 3.185 |
| 6 Rust cdc1 | 1.965 | 2.151 | 2.034 | 2.029 | 1.742 | 2.073 | 1.730 | 3.130 | 2.993 | 3.085 | 3.063 | 2.873 | 1.281 | 1.179 | 1.906 | 1.237 |
| 7 Rust cdc2 | 1.809 | 2.208 | 1.415 | 1.351 | 1.964 | 1.940 | 1.978 | 3.745 | 3.584 | 3.670 | 3.479 | 3.590 | 1.383 | 1.456 | 1.064 | 1.396 |
| 8 Rust elut1 | - | 1.457 | 1.288 | - | 1.530 | 1.374 | 1.379 | 2.420 | 2.279 | 2.409 | 2.311 | 2.245 | 1.010 | 1.146 | - | 1.091 |
| 9 Rust elut2 | 2.214 | 1.786 | 1.730 | 1.987 | 1.878 | 1.882 | 3.071 | 2.704 | 2.788 | 2.814 | 2.909 | 2.740 | 1.352 | 1.441 | 1.275 | 1.420 |
| 10 Rust elut3 | 2.340 | 2.702 | 2.704 | 2.526 | 2.979 | 2.319 | 2.284 | 3.773 | 3.567 | 3.636 | 3.465 | 3.431 | 1.981 | 1.716 | 2.523 | 2.118 |

Table 4: Initial S. pombe phase angle data for each experiment.

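CIRE under Circular Order, SCE and p-value

| Experiments | φ1 | φ2 | φ3 | φ4 | φ5 | φ6 | φ7 | φ8 | φ9 | φ10 | φ11 | φ12 | φ13 | φ14 | φ15 | φ16 | SCE | p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 Oliva cdc | 6.257 | 6.257 | 6.257 | 6.257 | 0.054 | 0.054 | 0.054 | 1.045 | 1.045 | 1.085 | 1.085 | 1.289 | 5.069 | 5.069 | 5.069 | 5.209 | 1.270 | 0.6658 |
| 2 Oliva elut1 | 2.526 | 2.526 | 2.526 | 2.526 | 2.526 | 2.526 | 2.526 | 4.515 | 4.515 | 4.515 | 4.515 | 4.515 | 1.600 | 1.785 | 1.785 | 2.519 | 1.218 | 0.7214 |
| 3 Oliva elut2 | 5.849 | 5.849 | 5.849 | 5.849 | 5.849 | 5.849 | 6.045 | 0.598 | 0.598 | 0.598 | 0.598 | 0.687 | 3.935 | 3.970 | 5.836 | 5.849 | 2.660 | 0.2437 |
| 4 Peng cdc | 3.225 | 3.225 | 3.225 | 3.225 | 3.225 | 3.225 | 3.225 | 4.736 | 4.736 | 4.749 | 4.749 | 4.749 | 2.685 | 2.693 | 2.693 | 2.693 | 0.248 | 0.9983 |
| 5 Peng elut | 3.333 | 3.725 | 3.725 | 3.725 | 3.725 | 3.970 | 4.296 | 5.124 | 5.124 | 5.144 | 5.216 | 5.243 | 3.302 | 3.302 | 3.302 | 3.302 | 0.156 | 0.9850 |
| 6 Rust cdc1 | 1.961 | 1.961 | 1.961 | 1.961 | 1.961 | 1.961 | 1.961 | 3.029 | 3.029 | 3.029 | 3.029 | 3.029 | 1.230 | 1.230 | 1.571 | 1.571 | 0.213 | 0.4142 |
| 7 Rust cdc2 | 1.693 | 1.693 | 1.693 | 1.693 | 1.952 | 1.952 | 1.978 | 3.614 | 3.614 | 3.614 | 3.614 | 3.614 | 1.301 | 1.301 | 1.301 | 1.396 | 0.296 | 0.9536 |
| 8 Rust elut1 | - | 1.373 | 1.373 | - | 1.427 | 1.427 | 1.427 | 2.333 | 2.333 | 2.333 | 2.333 | 2.333 | 1.010 | 1.118 | - | 1.118 | 0.028 | 0.9992 |
| 9 Rust elut2 | 1.909 | 1.909 | 1.909 | 1.916 | 1.916 | 1.916 | 2.837 | 2.837 | 2.837 | 2.837 | 2.837 | 2.837 | 1.352 | 1.358 | 1.358 | 1.420 | 0.125 | 0.9992 |
| 10 Rust elut3 | 2.340 | 2.585 | 2.585 | 2.585 | 2.585 | 2.585 | 2.585 | 3.574 | 3.574 | 3.574 | 3.574 | 3.574 | 1.849 | 1.849 | 2.321 | 2.321 | 0.269 | 0.8748 |

Table 5: CIRE, SCE and p-values for each experiment.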

Affiliation

Sandra Barragán, Miguel A. Fernández, Cristina Rueda
Departamento de Estadística e Investigación Operativa
Instituto de Matemáticas (IMUVA)
Universidad de Valladolid
Valladolid, Spain
E-mail: sandraba@eio.uva.es, miguelaf@eio.uva.es, crueda@eio.uva.es
URL: http://www.eio.uva.es/~sandra, http://www.eio.uva.es/~miguel, http://www.eio.uva.es/~cristina

Shyamal D. Peddada
Biostatistics Branch
National Institute of Environmental Health Sciences
Research Triangle Park, NC 27709, USA
E-mail: peddada@niehs.nih.gov
URL: http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm



Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11

within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene

The κj values considered for this example are obtained using Fernandez et al (2012)The procedure used for the computation of these values comes from an analysis ofvariance type methodology Under the assumptions made before the model for thecircular means is

φij = micro+ αi + βj

where αi is the gene effect and βj is the experiment effect The proposed model isanalogous to the standard two-way analysis of variance model and is fully detailed inthe supplementary material of Fernandez et al (2012)

For each of the 10 experiments we test the hypothesis (6) using the function condtest

that is in our software The following code gives the p-values for each experimentResults are summarized in Table 5

    > data(cirgenes)
    > # Concentration parameter (kappa) for each of the 10 experiments
    > kappas <- c(2.64773, 3.24742, 2.15936, 4.15314, 4.54357,
    +   29.07610, 6.51408, 14.19445, 5.66920, 11.12889)
    > allresults <- list()
    > resultIsoCIRE <- matrix(ncol = ncol(cirgenes), nrow = nrow(cirgenes))
    > SCEs <- vector(mode = "numeric", length = nrow(cirgenes))
    > pvalues <- vector(mode = "numeric", length = nrow(cirgenes))
    > for (i in 1:nrow(cirgenes)) {
    +   k <- kappas[i]
    +   # Keep only the genes with an observed phase angle in experiment i
    +   genes <- as.numeric(cirgenes[i, !is.na(cirgenes[i, ])])
    +   allresults[[i]] <- cond.test(genes, kappa = k)
    +   resultIsoCIRE[i, !is.na(cirgenes[i, ])] <- unlist(allresults[[i]]$CIRE)
    +   SCEs[i] <- allresults[[i]]$SCE
    +   pvalues[i] <- allresults[[i]]$pvalue
    + }

From the p-values in Table 5 we see that the null hypothesis cannot be rejected in any of the 10 experiments, even at a level of significance as high as 0.20. Therefore, it seems plausible that the peak expressions of these 16 genes in S. pombe (fission yeast) follow the same order as their S. cerevisiae (budding yeast) orthologs.
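As a quick consistency check on Table 5, the SCE reported for experiment 8 (Rust elut1) can be recomputed from the values printed in Tables 4 and 5. This is only a sketch: it assumes that SCE is the unweighted sum of circular errors, the sum over observed genes of 1 - cos(θi - φi), and the variable names are illustrative rather than part of isocir.

```r
# Experiment 8 (Rust elut1), with the three missing genes dropped.
# theta: unrestricted circular means as printed in Table 4.
theta <- c(1.457, 1.288, 1.530, 1.374, 1.379, 2.420, 2.279,
           2.409, 2.311, 2.245, 1.010, 1.146, 1.091)
# phi: CIRE under the circular order as printed in Table 5.
phi <- c(1.373, 1.373, 1.427, 1.427, 1.427, 2.333, 2.333,
         2.333, 2.333, 2.333, 1.010, 1.118, 1.118)

# Assumed definition: sum of circular errors with unit weights.
sce <- sum(1 - cos(theta - phi))
print(round(sce, 3))

# p-values of the 10 experiments as printed in Table 5.
pvalues <- c(0.6658, 0.7214, 0.2437, 0.9983, 0.9850,
             0.4142, 0.9536, 0.9992, 0.9992, 0.8748)
# TRUE if no experiment rejects the null hypothesis at the 0.20 level.
none_rejected <- all(pvalues > 0.20)
print(none_rejected)
```

Under this assumption the recomputed value agrees with the tabulated SCE of 0.028 to three decimals, and the smallest p-value (0.2437, experiment 3) is above 0.20.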

5. Conclusions

In this paper the R package isocir has been presented. This package provides useful tools for drawing inferences from circular data under order restrictions. There are two main functions, CIRE and cond.test. The first computes the CIRE, the circular version of the widely known isotonic regression in $\mathbb{R}^q$. The second is designed for testing circular hypotheses using a conditional test. We have also created the class isocir in order to properly save all the results. Although we illustrated the proposed methodology using an example from cell biology, the proposed software can be applied to a wide range of contexts. For example, biologists working on circadian clocks may be interested in testing for the conservation of circular order among circadian genes between two tissues (e.g., Liu et al. (2006)).
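For readers less familiar with the Euclidean baseline mentioned above, the following sketch runs ordinary isotonic regression with isoreg() from base R's stats package. It illustrates the pooling of adjacent violators on the real line only; it is not the CIRE algorithm, and the data vector is made up for illustration.

```r
# Hypothetical observations that should be non-decreasing but are not.
y <- c(1, 3, 2, 4, 3.5, 5)

# isoreg() (stats, base R) fits the least-squares non-decreasing sequence.
fit <- isoreg(y)

# The violating pairs (3, 2) and (4, 3.5) are each replaced by their mean.
fitted_values <- fit$yf
print(fitted_values)
```

Here the fitted sequence is 1, 2.5, 2.5, 3.75, 3.75, 5. The CIRE plays the analogous role on the circle, where the order wraps around and ordinary averaging is replaced by circular means.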

We would like to emphasize that the field of constrained inference on a unit circle is in its infancy and is wide open for new developments, both in methods and in applications. As observed in the introduction, such constrained inference problems arise naturally in many applications. Therefore, we expect the software described in this paper to be widely used by researchers working in such areas.

Acknowledgments

SB's, MAF's and CR's research was partially supported by Spanish MCI grant MTM2009-11161. SB's work has been partially financed by Junta de Castilla y León, Consejería de Educación and the European Social Fund within the PO Castilla y León 2007-2013 programme. SDP's research [in part] was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01 ES101744-04). We thank Dr. Leping Li, Keith Shockley, the anonymous reviewers, the associate editor and the editor for several useful comments which improved the presentation of this manuscript.

References

Agostinelli C, Lund U (2011). circular: Circular Statistics. CA: Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, Italy. UL: Department of Statistics, California Polytechnic State University, San Luis Obispo, California, USA. R package version 0.4-3, URL https://r-forge.r-project.org/projects/circular

Balabdaoui F, Rufibach K, Santambrogio F (2009). OrdMonReg: Compute Least Squares Estimates of One Bounded or Two Ordered Isotonic Regression Curves. R package version 1.0.2, URL http://CRAN.R-project.org/package=OrdMonReg

Boles L, Lohmann K (2003). "True Navigation and Magnetic Maps in Spiny Lobsters." Nature, 421, 60–63.

Bowers J, Morton I, Mould G (2000). "Directional Statistics of the Wind and Waves." Applied Ocean Research, 22, 13–30.

Cermakian N, Lamont E, Bourdeau P, Boivin D (2011). "Circadian Clock Gene Expression in Brain Regions of Alzheimer's Disease Patients and Control Subjects." Journal of Biological Rhythms, 26, 160–170.

Chasalow S (2010). combinat: Combinatorics Utilities. R package version 0.0-8, URL http://CRAN.R-project.org/package=combinat

Cochran W, Mouritsen H, Wikelski M (2004). "Migrating Songbirds Recalibrate Their Magnetic Compass Daily from Twilight Cues." Science, 304, 405–408.

de Leeuw J, Hornik K, Mair P (2009). "Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods." Journal of Statistical Software, 32(5), 1–24. URL http://www.jstatsoft.org/v32/i05

Fernández M, Rueda C, Peddada S (2012). "Identification of a core set of signature cell cycle genes whose relative order of time to peak expression is conserved across species." Nucl. Acids Res., 40(7), 2823–2832. URL http://nar.oxfordjournals.org/content/40/7/2823

Jammalamadaka S, SenGupta A (2001). Topics in Circular Statistics. World Scientific.

Jensen JL, Jensen T, Lichtenberg U, Brunak S, Bork P (2006). "Co-evolution of transcriptional and post-translational cell-cycle regulation." Nature, 443, 594–597.

Kubiak T, Jonas C (2007). "Applying Circular Statistics to the Analysis of Monitoring Data." European Journal of Psychological Assessment, 23, 227–237.

Liu D, Peddada S, Li L, Weinberg C (2006). "Phase Analysis of Circadian-Related Genes in Two Tissues." BMC Bioinformatics, 7, 87.

Liu D, Umbach D, Peddada S, Li L, Crockett P, Weinberg C (2004). "A Random Periods Model for Expression of Cell-Cycle Genes." PNAS, 101(19), 7240–7245.

Lund U, Agostinelli C (2009). CircStats: Circular Statistics, from "Topics in Circular Statistics" (2001). R package version 0.2-4, URL http://CRAN.R-project.org/package=CircStats

Mardia K, Hughes G, Taylor C, Singh H (2008). "A Multivariate von Mises Distribution with Applications to Bioinformatics." Canadian Journal of Statistics, 36, 99–109.

Mardia K, Jupp P (2000). Directional Statistics. John Wiley & Sons.

Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, Skiena S, Futcher B, Leatherwood J (2005). "The Cell-Cycle-Regulated Genes of Schizosaccharomyces Pombe." PLoS Biology, 3, 1239–1260.

Peng X, Karuturi R, Miller L, Lin K, Jia Y, Kondu P, Wang L, Wong L, Liu E, Balasubramanian M, Liu J (2005). "Identification of Cell Cycle-Regulated Genes in Fission Yeast." The American Society for Cell Biology, 16, 1026–1042.

R Development Core Team (2011). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org

Robertson T, Wright F, Dykstra R (1988). Order Restricted Statistical Inference. John Wiley & Sons.

Rueda C, Fernández M, Peddada S (2009). "Estimation of Parameters Subject to Order Restrictions on a Circle with Application to Estimation of Phase Angles of Cell-Cycle Genes." Journal of the American Statistical Association, 104(485), 338–347.

Rustici G, Mata J, Kivinen K, Lio P, Penkett C, Burns G, Hayles J, Brazma A, Nurse P, Bahler J (2004). "Periodic Gene Expression Program of the Fission Yeast Cell Cycle." Nature Genetics, 36, 809–817.

Turner R (2009). Iso: Functions to Perform Isotonic Regression. R package version 0.0-8, URL http://CRAN.R-project.org/package=Iso

van Doorn E, Dhruva B, Sreenivasan K, Cassella V (2000). "Statistics of Wind Direction and Its Increments." Physics of Fluids, 12, 1529–1534.

Zar J (1999). Biostatistical Analysis. Prentice Hall.

Table 4: Initial S. pombe phase angle data for each experiment. Entries are the unrestricted circular means θ1, ..., θ16 of the 16 genes; a hyphen marks a missing value.

| Experiment | ssb1 (θ1) | cdc22 (θ2) | msh6 (θ3) | psm3 (θ4) | rad21 (θ5) | cig2 (θ6) | mik1 (θ7) | h3.3 (θ8) | hhf1 (θ9) | hht3 (θ10) | hta2 (θ11) | htb1 (θ12) | fkh2 (θ13) | chs2 (θ14) | sid2 (θ15) | slp1 (θ16) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 Oliva cdc | 0.202 | 0.218 | 6.262 | 5.765 | 0.893 | 5.612 | 6.257 | 1.178 | 0.912 | 1.200 | 0.971 | 1.289 | 5.298 | 5.597 | 4.252 | 5.209 |
| 2 Oliva elut1 | 2.940 | 3.262 | 2.811 | 2.848 | 1.603 | 2.382 | 1.709 | 4.689 | 4.355 | 4.717 | 4.419 | 4.397 | 1.600 | 1.819 | 1.751 | 2.519 |
| 3 Oliva elut2 | 0.440 | 0.447 | 5.258 | 6.206 | 4.381 | 5.458 | 6.045 | 1.541 | 0.727 | 6.115 | 0.352 | 0.687 | 3.935 | 3.970 | 5.836 | 5.896 |
| 4 Peng cdc | 3.328 | 3.565 | 3.387 | 2.806 | 3.194 | 3.260 | 3.026 | 4.779 | 4.694 | 4.755 | 4.816 | 4.675 | 2.685 | 2.770 | 2.885 | 2.422 |
| 5 Peng elut | 3.333 | 3.912 | 3.894 | 3.444 | 3.648 | 3.970 | 4.296 | 5.189 | 5.060 | 5.144 | 5.216 | 5.243 | 3.338 | 3.607 | 3.082 | 3.185 |
| 6 Rust cdc1 | 1.965 | 2.151 | 2.034 | 2.029 | 1.742 | 2.073 | 1.730 | 3.130 | 2.993 | 3.085 | 3.063 | 2.873 | 1.281 | 1.179 | 1.906 | 1.237 |
| 7 Rust cdc2 | 1.809 | 2.208 | 1.415 | 1.351 | 1.964 | 1.940 | 1.978 | 3.745 | 3.584 | 3.670 | 3.479 | 3.590 | 1.383 | 1.456 | 1.064 | 1.396 |
| 8 Rust elut1 | - | 1.457 | 1.288 | - | 1.530 | 1.374 | 1.379 | 2.420 | 2.279 | 2.409 | 2.311 | 2.245 | 1.010 | 1.146 | - | 1.091 |
| 9 Rust elut2 | 2.214 | 1.786 | 1.730 | 1.987 | 1.878 | 1.882 | 3.071 | 2.704 | 2.788 | 2.814 | 2.909 | 2.740 | 1.352 | 1.441 | 1.275 | 1.420 |
| 10 Rust elut3 | 2.340 | 2.702 | 2.704 | 2.526 | 2.979 | 2.319 | 2.284 | 3.773 | 3.567 | 3.636 | 3.465 | 3.431 | 1.981 | 1.716 | 2.523 | 2.118 |

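Table 5: CIRE, SCE and p-values for each experiment. Columns φ1, ..., φ16 give the CIRE under the circular order for the 16 genes; a hyphen marks a missing value.

| Experiment | φ1 | φ2 | φ3 | φ4 | φ5 | φ6 | φ7 | φ8 | φ9 | φ10 | φ11 | φ12 | φ13 | φ14 | φ15 | φ16 | SCE | p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 Oliva cdc | 6.257 | 6.257 | 6.257 | 6.257 | 0.054 | 0.054 | 0.054 | 1.045 | 1.045 | 1.085 | 1.085 | 1.289 | 5.069 | 5.069 | 5.069 | 5.209 | 1.270 | 0.6658 |
| 2 Oliva elut1 | 2.526 | 2.526 | 2.526 | 2.526 | 2.526 | 2.526 | 2.526 | 4.515 | 4.515 | 4.515 | 4.515 | 4.515 | 1.600 | 1.785 | 1.785 | 2.519 | 1.218 | 0.7214 |
| 3 Oliva elut2 | 5.849 | 5.849 | 5.849 | 5.849 | 5.849 | 5.849 | 6.045 | 0.598 | 0.598 | 0.598 | 0.598 | 0.687 | 3.935 | 3.970 | 5.836 | 5.849 | 2.660 | 0.2437 |
| 4 Peng cdc | 3.225 | 3.225 | 3.225 | 3.225 | 3.225 | 3.225 | 3.225 | 4.736 | 4.736 | 4.749 | 4.749 | 4.749 | 2.685 | 2.693 | 2.693 | 2.693 | 0.248 | 0.9983 |
| 5 Peng elut | 3.333 | 3.725 | 3.725 | 3.725 | 3.725 | 3.970 | 4.296 | 5.124 | 5.124 | 5.144 | 5.216 | 5.243 | 3.302 | 3.302 | 3.302 | 3.302 | 0.156 | 0.9850 |
| 6 Rust cdc1 | 1.961 | 1.961 | 1.961 | 1.961 | 1.961 | 1.961 | 1.961 | 3.029 | 3.029 | 3.029 | 3.029 | 3.029 | 1.230 | 1.230 | 1.571 | 1.571 | 0.213 | 0.4142 |
| 7 Rust cdc2 | 1.693 | 1.693 | 1.693 | 1.693 | 1.952 | 1.952 | 1.978 | 3.614 | 3.614 | 3.614 | 3.614 | 3.614 | 1.301 | 1.301 | 1.301 | 1.396 | 0.296 | 0.9536 |
| 8 Rust elut1 | - | 1.373 | 1.373 | - | 1.427 | 1.427 | 1.427 | 2.333 | 2.333 | 2.333 | 2.333 | 2.333 | 1.010 | 1.118 | - | 1.118 | 0.028 | 0.9992 |
| 9 Rust elut2 | 1.909 | 1.909 | 1.909 | 1.916 | 1.916 | 1.916 | 2.837 | 2.837 | 2.837 | 2.837 | 2.837 | 2.837 | 1.352 | 1.358 | 1.358 | 1.420 | 0.125 | 0.9992 |
| 10 Rust elut3 | 2.340 | 2.585 | 2.585 | 2.585 | 2.585 | 2.585 | 2.585 | 3.574 | 3.574 | 3.574 | 3.574 | 3.574 | 1.849 | 1.849 | 2.321 | 2.321 | 0.269 | 0.8748 |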

Affiliation

Sandra Barragán, Miguel A. Fernández, Cristina Rueda
Departamento de Estadística e Investigación Operativa
Instituto de Matemáticas (IMUVA)
Universidad de Valladolid
Valladolid, Spain
E-mail: sandraba@eio.uva.es, miguelaf@eio.uva.es, crueda@eio.uva.es
URL: http://www.eio.uva.es/~sandra, http://www.eio.uva.es/~miguel, http://www.eio.uva.es/~cristina

Shyamal D. Peddada
Biostatistics Branch
National Institute of Environmental Health Sciences
Research Triangle Park, NC 27709, USA
E-mail: peddada@niehs.nih.gov
URL: http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm






Unrestricted Circular Means

Experiments    ssb1   cdc22  msh6   psm3   rad21  cig2   mik1   h3.3   hhf1   hht3   hta2   htb1   fkh2   chs2   sid2   slp1
               θ1     θ2     θ3     θ4     θ5     θ6     θ7     θ8     θ9     θ10    θ11    θ12    θ13    θ14    θ15    θ16
1 Oliva cdc    0.202  0.218  6.262  5.765  0.893  5.612  6.257  1.178  0.912  1.200  0.971  1.289  5.298  5.597  4.252  5.209
2 Oliva elut1  2.940  3.262  2.811  2.848  1.603  2.382  1.709  4.689  4.355  4.717  4.419  4.397  1.600  1.819  1.751  2.519
3 Oliva elut2  0.440  0.447  5.258  6.206  4.381  5.458  6.045  1.541  0.727  6.115  0.352  0.687  3.935  3.970  5.836  5.896
4 Peng cdc     3.328  3.565  3.387  2.806  3.194  3.260  3.026  4.779  4.694  4.755  4.816  4.675  2.685  2.770  2.885  2.422
5 Peng elut    3.333  3.912  3.894  3.444  3.648  3.970  4.296  5.189  5.060  5.144  5.216  5.243  3.338  3.607  3.082  3.185
6 Rust cdc1    1.965  2.151  2.034  2.029  1.742  2.073  1.730  3.130  2.993  3.085  3.063  2.873  1.281  1.179  1.906  1.237
7 Rust cdc2    1.809  2.208  1.415  1.351  1.964  1.940  1.978  3.745  3.584  3.670  3.479  3.590  1.383  1.456  1.064  1.396
8 Rust elut1   -      1.457  1.288  -      1.530  1.374  1.379  2.420  2.279  2.409  2.311  2.245  1.010  1.146  -      1.091
9 Rust elut2   2.214  1.786  1.730  1.987  1.878  1.882  3.071  2.704  2.788  2.814  2.909  2.740  1.352  1.441  1.275  1.420
10 Rust elut3  2.340  2.702  2.704  2.526  2.979  2.319  2.284  3.773  3.567  3.636  3.465  3.431  1.981  1.716  2.523  2.118

Table 4: Initial S. Pombe phase angle data for each experiment. Angles are in radians; a dash marks a gene with no value in that experiment.


CIRE under Circular Order, SCE and p-value

Experiments    φ1     φ2     φ3     φ4     φ5     φ6     φ7     φ8     φ9     φ10    φ11    φ12    φ13    φ14    φ15    φ16    SCE    p-value
1 Oliva cdc    6.257  6.257  6.257  6.257  0.054  0.054  0.054  1.045  1.045  1.085  1.085  1.289  5.069  5.069  5.069  5.209  1.270  0.6658
2 Oliva elut1  2.526  2.526  2.526  2.526  2.526  2.526  2.526  4.515  4.515  4.515  4.515  4.515  1.600  1.785  1.785  2.519  1.218  0.7214
3 Oliva elut2  5.849  5.849  5.849  5.849  5.849  5.849  6.045  0.598  0.598  0.598  0.598  0.687  3.935  3.970  5.836  5.849  2.660  0.2437
4 Peng cdc     3.225  3.225  3.225  3.225  3.225  3.225  3.225  4.736  4.736  4.749  4.749  4.749  2.685  2.693  2.693  2.693  0.248  0.9983
5 Peng elut    3.333  3.725  3.725  3.725  3.725  3.970  4.296  5.124  5.124  5.144  5.216  5.243  3.302  3.302  3.302  3.302  0.156  0.9850
6 Rust cdc1    1.961  1.961  1.961  1.961  1.961  1.961  1.961  3.029  3.029  3.029  3.029  3.029  1.230  1.230  1.571  1.571  0.213  0.4142
7 Rust cdc2    1.693  1.693  1.693  1.693  1.952  1.952  1.978  3.614  3.614  3.614  3.614  3.614  1.301  1.301  1.301  1.396  0.296  0.9536
8 Rust elut1   -      1.373  1.373  -      1.427  1.427  1.427  2.333  2.333  2.333  2.333  2.333  1.010  1.118  -      1.118  0.028  0.9992
9 Rust elut2   1.909  1.909  1.909  1.916  1.916  1.916  2.837  2.837  2.837  2.837  2.837  2.837  1.352  1.358  1.358  1.420  0.125  0.9992
10 Rust elut3  2.340  2.585  2.585  2.585  2.585  2.585  2.585  3.574  3.574  3.574  3.574  3.574  1.849  1.849  2.321  2.321  0.269  0.8748

Table 5: CIRE, SCE and p-values for each experiment. The columns φ1 to φ16 follow the gene order of Table 4; angles are in radians, and a dash marks a gene with no value in that experiment.
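The link between Tables 4 and 5 can be checked by hand for the first experiment (Oliva cdc). The sketch below is not taken from the paper and does not call any isocir function; it uses base R only. It assumes that a block of genes sharing one CIRE value in Table 5 is pooled with the equally weighted circular mean of the corresponding unrestricted means in Table 4. The helper `circ_mean` and the variable names are introduced here purely for illustration.

```r
# Circular mean direction of angles given in radians, mapped to [0, 2*pi).
# circ_mean is a helper defined here for illustration; it is not an isocir function.
circ_mean <- function(theta) {
  atan2(sum(sin(theta)), sum(cos(theta))) %% (2 * pi)
}

# Unrestricted circular means, experiment 1 (Oliva cdc), copied from Table 4.
theta <- c(0.202, 0.218, 6.262, 5.765, 0.893, 5.612, 6.257, 1.178,
           0.912, 1.200, 0.971, 1.289, 5.298, 5.597, 4.252, 5.209)

# Index blocks that share one value in the Table 5 row, and those tabulated values.
blocks <- list(1:4, 5:7, 8:9, 10:11)
phi    <- c(6.257, 0.054, 1.045, 1.085)

# Pool each block with the circular mean and compare with the tabulated value.
pooled <- sapply(blocks, function(idx) circ_mean(theta[idx]))
print(cbind(pooled = round(pooled, 4), table5 = phi))

# The tables are rounded to three decimals, so allow a tolerance of 0.001.
stopifnot(all(abs(pooled - phi) < 1e-3))
```

The first block shows why the circular mean is needed: the angles 0.202, 0.218, 6.262 and 5.765 straddle the origin, so their arithmetic mean (about 3.11) points to the opposite side of the circle, whereas the circular mean stays close to 2π, in agreement with the tabulated 6.257.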
