isocir: an r package for constrained inference using ... · an application to cell biology. sandra...
TRANSCRIPT
isocir An R Package for Constrained Inference
using Isotonic Regression for Circular Data with
an Application to Cell Biology
Sandra BarraganUniversidad de Valladolid
Miguel A FernandezUniversidad de Valladolid
Cristina RuedaUniversidad de Valladolid
Shyamal Das PeddadaNational Institute of
Environmental Health Sciences
Abstract
In many applications one may be interested in drawing inferences regarding the orderof a collection of points on a unit circle Due to the underlying geometry of the circlestandard constrained inference procedures developed for Euclidean space data are notapplicable Recently statistical inference for parameters under such order constraints ona unit circle was discussed in Rueda et al (2009) Fernandez et al (2012) In this paperwe introduce an R package called isocir which provides a set of functions that can be usedfor analyzing angular data subject to order constraints on a unit circle Since this workis motivated by applications in cell biology we illustrate the proposed package using arelevant cell cycle data
Keywords Circular Data Isotropic Order CIRE Conditional Test R package isocir R
1 Introduction
Circular data arise in a wide range of contexts such as in geography cell biology circadianbiology endocrinology ornithology etc (cf Zar (1999) or Mardia et al (2008)) This workis motivated by applications in cell cycle biology where one may want to draw inferencesregarding angular parameters subject to order constraints on a unit circle A cell cycleamong eukaryotes follows a well-coordinated process where cells go through cycle four phasesof distinct biological functions namely G1 S G2 and M phase (see Figure 1)
cular order and extended the notion of isotonic regression estimator to circular parameterspace by defining the circular isotonic regression estimator (CIRE) Using CIRE Fernandezet al (2012) developed a formal statistical theory and methodology for testing whether thecircular order of peak expression of a subset of cell cycle genes is conserved across multiplespecies
These statistical methods may have numerous other applications apart from cell cycle Withthe increase in human survival rates there is considerable interest in understanding neuro-
2 isocir An R Package for Isotonic Inference for Circular Data
Figure 1 Phases of a cell cycle
Genes participating in the cell division cycle are oftencalled cell cycle genes A cell cycle gene has a periodicexpression with its peak expression occurring just beforeits biological function (Jensen et al (2006)) Since theperiodic expression of a cell cycle gene can be mappedonto a unit circle the angle corresponding to its peakexpression is known as the phase angle of the gene (seeLiu et al (2004) and Figure 2)
Since cell cycle is fundamental to the growth and developmentof an organism cell biologists have been interested in under-standing various aspects of cell cycle that are evolutionarilyconserved For instance they would like to identify geneswhose relative order of peak expressions is evolutionarily con-served In order to solve such problems Rueda et al (2009)introduced an order restriction on the unit circle called cir- Figure 2 Phase angle (φ)
logical diseases related to aging such as the Alzheimerrsquos disease (AD) and the Parkinsonrsquosdisease (PD) An important aspect of such neurological diseases is the disruption of circadianclock and genes participating in it Researchers are interested in testing differences in thephases of expression of circadian genes in normal and AD patients (Cermakian et al (2011))Methodology discussed in this paper can be used for analyzing such data Other areas ofapplications include the study of migratory patterns and directions of birds (Cochran et al(2004)) the changes in the wind directions (Bowers et al (2000)) directional fluctuations inthe atmosphere (van Doorn et al (2000)) psychology (studies of mental maps or monitoringdata (Kibiak and Jonas (2007)) the orientation of ridges in fingerprints or magnetic maps(Boles and Lohmann (2003))
Motivated by the wide range of applications and the non-existence of a user friendly softwarein Section 3 we introduce the isocir package programmed in R environment R Develop-ment Core Team (2011) which can be downloaded from httpcranr-projectorgweb
packagesisocir The package provides functions which can be used for drawing inferencesregarding the order of a collection of points on a unit circle In Section 2 we describe thestatistical problem and the methodology of Rueda et al (2009) and Fernandez et al (2012)The isocir package is illustrated in Section 4 using the motivating cell cycle gene expressiondata Some concluding remarks are provided in Section 5
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 3
2 Angular Parameters under Order Constraints
21 Circular Order Restriction
Let θ = (θ1 θ2 θq) where each θi i = 1 2 q is the sample circular mean of a randomsample of size ni from a population with unknown circular mean φi All angles are definedin a counter clockwise direction relative to a given pole The mean resultant lengths foreach population are denoted as r1 rq (see Mardia and Jupp (2000) for the definition ofcircular mean and mean resultant length of a set of angles) Then the problem of interest isto draw statistical inferences on φi i = 1 2 q subject to the constraint that the anglesφ1 φ2 φq are in a counter clockwise order on a unit circle Thus φ1 is ldquofollowedrdquo by φ2which is ldquofollowedrdquo by φ3 φq is ldquofollowedrdquo by φ1 More precisely we shall denote thissimple circular order among angular parameters as follows
Csco = φ = (φ1 φ2 φq) isin [0 2π]q φ1 φ2 middot middot middot φq φ1 (1)
It is important to note that the order among the angular parameters is invariant under changesin location of the pole (the initial point of the circle) Unlike the Euclidean space points ona unit circle wrap around That is starting at the pole by traveling 2π radians around thecircumference of the circle one would return to pole For this reason the circular order amongpoints on a unit circle is preserved even if the location of the pole is shifted This is whyRueda et al (2009) and Fernandez et al (2012) refer to the circular order Csco as isotropicorder As in this paper we will also consider more general circular order restrictions fromnow on we will refer to Csco as simple circular order
As a consequence of the geometry a circle can never be linearized and hence methods devel-oped for Euclidean space data are not applicable to circular data The problem is even morechallenging when the angular parameters are constrained by an order around the circle suchas Csco General methodology for circular data when there are no constraints on the angularparameters can be found in the book Mardia and Jupp (2000) among others Constrainedinference for circular data is rather recent (Rueda et al (2009) and Fernandez et al (2012))As noted in Rueda et al (2009) standard Euclidean space methods such as the pool adja-cent violators algorithm (PAVA) used for computing isotonic regression (see Robertson et al(1988) for details) cannot be applied to circular data For example when a cell biologist isinvestigating a large number of cell cycle genes it may be difficult to ascertain the circularorder among all cell cycle genes under consideration However based on the underlying bi-ology the investigator may a priori know the circular order among groups of genes but notthe order among genes within each group In such situations a partial circular order Cpco asdefined below can be used
Cpco =
φ isin [0 2π]q
φ1φ2φl1
φl1+1
φl1+2
φl1+l2
middot middot middot
φl1++lLminus1+1
φl1++lLminus1+2
φl1++lL
φ1φ2φl1
(2)In this case we have L sets of parameters with lj angular parameters in set j and q =
sumLj=1 lj
Order among the parameters within a set is not known but every parameter in a given set isldquofollowedrdquo by every parameter in the next set
4 isocir An R Package for Isotonic Inference for Circular Data
22 Estimation and Testing under Circular Order Restrictions
Analogous to PAVA for Euclidean data Rueda et al (2009) derived a circular isotonic regres-sion estimator (CIRE) for estimating angular parameters (φ1 φ2 φq) subject to a circularorder The CIRE of φ = (φ1 φ2 φq) under the constraint (φ1 φ2 φq)
prime isin Csco is givenby
φ = arg minφisinCsco
SCE(φ θ) (3)
where SCE(φ θ) defined below is the Sum of Circular Errors a circle analog to the Sum ofSquared Errors (SSE) used for Euclidean data
SCE(φ θ) =
qsumi=1
ri(1minus cos(φi minus θi)) (4)
where ri are the mean resultant lengths The CIRE is implemented in the function CIRE ofthe package isocir
Just as the normal distribution is commonly used for the Euclidean space data the von-Mises distribution is widely used for describing angular data on a unit circle Accordinglyfor i = 1 2 q throughout this paper we assume that θi are independently distributedaccording to von-Mises distribution denoted as M(φi κ) where φi is the modal direction andκ is the concentration parameter (see Mardia and Jupp (2000)) Under such an assumptionFernandez et al (2012) developed a conditional test for testing the following hypotheses
H0 φ isin CscoH1 H0 is not true
The conditional test statistic is given by T lowast = 2κSCE(φθ)q where κ is the estimator of κ φ
is the CIRE computed under H0 The estimate φ determines a partition of weierp = 1 Iinto sets of coordinates on which φ is constant These sets are called level sets The rejectionregion for the conditional α-level test is given by
Reject H0 if T lowast ge c(m)
where m is the number of level sets for φ and for large values of κ the approximate criticalvalue c(m) is chosen so that
pr(Fqminusmqminus1 ge c(m)) =α
1minus 1(q minus 1) (5)
where Fqminusmqminus1 represents the central F random variable with (qminusm qminus1) degrees of freedomThe above test statistic is proportional to a chi-square test when κ is known For details onemay refer to Fernandez et al (2012) The above methodology can be modified to test
H0 φ isin CpcoH1 H0 is not true
by replacing 1(q minus 1) by (l1l2 middot middot middot lL)(q minus 1)
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 5
The simulation study performed in Fernandez et al (2012) suggests that the power of this testis quite reasonable Notice that for a given data set the p-value obtained by using the abovemethodology may serve as a useful goodness of fit criterion when comparing two or moreplausible circular orders among a set of angular parameters Larger values may suggest thatthe estimations are closer to the presumed circular order Thus the statistical methodologydeveloped in Fernandez et al (2012) can be used not only for testing relative order amongthe parameters It can be also useful for selecting a ldquobest fittingrdquo circular order among severalcircular order candidates
These tests are implemented in the function condtest in the R package isocir introduced inthe next Section
3 Package isocir
We start this section by giving some background on R packages for isotonic regression andanalysis of circular data We then describe the structure of our package isocir and illustrateit by some examples
31 Related Packages
As isotonic regression is a well-known and widely used technique there are many packages inR for performing isotonic regression such as
isotone (de Leeuw et al (2009)) Active set and generalized PAVA for isotone optimiza-tion
Iso (Turner (2009)) Functions to perform isotonic regression
ordMonReg (Balabdaoui et al (2009)) Compute least squares estimates of one boundedor two ordered isotonic regression curves
Similarly there are several packages in R for analyzing circular data such as
CircStats (Lund and Agostinelli (2009)) The implementations of the Circular Statisticsfrom ldquoTopics in circular Statisticsrdquo Jammalamadaka and SenGupta (2001) It is an Rport from the S-plus library with the same name
circular (Agostinelli and Lund (2011)) This package expands in several ways the Circ-Stats package
Since none of the existing packages for circular data are applicable for analyzing circular dataunder constraints in this article we introduce the software package ldquoisotonic inference forcircular datardquo with the acronym isocir for analyzing circular data under constraints Ourpackage depends on circular (see Agostinelli and Lund (2011)) and combinat (see Chasalow(2010)) These packages should be installed in the computer before loading isocir
32 Package Structure
For the convenience of the reader we summarize all the functions arguments and descriptionsof our package isocir in Table 1
6 isocir An R Package for Isotonic Inference for Circular Data
Functions Arguments Description
sce (arg1 arg2 meanrl) Sum of Circular Errors
mrl (data) Mean Resultant Length
CIRE (data groups circular) Circular Isotonic Regression Estimator
condtest (data groups kappa) Conditional Test
isocir (cirmeans SCE CIRE pvalue kappa) Creates an Object of class isocir
isisocir (x) Checks an Object
printisocir (xdecCIREdecpvaluedeckappa) Prints an object of class isocir
plotisocir (x option ) Plots an object of class isocir
Table 1 Summary of the components of the package isocir
In the following we describe each function of the software in detail
Functions sce() and mrl()
The auxiliary function sce computes the sum of circular errors between a given q-dimensionalvector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2) Thefunction mrl computes the mean resultant length for the input data
Function CIRE()
Using the methodology developed in Rueda et al (2009) this R function computes the CIREfor a given circular order (1) or a partial order (2) The arguments of this function aresummarized in Table 2 The input variable data is a matrix where each column contains
Arguments Values
data vector or matrix with the data
groups the groups of the order
circular =TRUE(by default) =FALSE
Table 2 Arguments of the CIRE function
the vector of unconstrained angular means corresponding to each replication If there isonly one replication then data is a vector The position i in the vector groups containsthe group number to which the parameter φi belongs to The logical argument circular
sets whether the order is wrapped around the circle ie circular order (circular=TRUE) ornot ie simple order (circular=FALSE) For example the simple order cone in the circleCso = φ = (φ1 φ2 φq) isin [0 2π]q 0 le φ1 le φ2 le middot middot middot le φq le 2π would be a non circularorder The output of this function is an object of class isocir (explained later) containingthe circular isotonic regression estimator (φ) the unrestricted circular means (θ) and thecorresponding sum of circular error (SCE(φ θ))
Function condtest()
This function performs the conditional test and computes the corresponding p-value for thefollowing hypotheses
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 7
H0 The angles φ1 φq follow a (simple or partial) circular order
H1 H0 is not true
The arguments of this function appear in Table 3 and are explained below Arguments data
Arguments κ known κ unknown
data numeric vector matrix (as many columns as replications)
groups numeric vector with the groups of the order to be tested
kappa positive numeric value (NULL)
biasCorrect (NULL) =TRUE(by default) =FALSE
Table 3 Arguments of the condtest function
and groups are same as those in function CIRE although in this function groups is the orderto be tested instead of the known order The argument kappa is needed only when data is avector If there are no replications in data the value of κ must be set by the user Even whenthere are replicated data if the user knows the value of κ it may be introduced and it willbe taken into consideration to perfom the conditional test When κ is unknown and there arereplicated data the parameter is internally estimated by maximum likelihood and κ is shownin the output The biasCorrect is related to the estimation of κ If biasCorrect=TRUE thebias correction appearing in Mardia and Jupp (2000) p 87 is performed in the estimationof κ The output of this function is an object of class isocir (explained below) with all theresults from the conditional test the CIRE (φ) the unrestricted circular means (θ) the SCE(SCE(φ θ)) the kappa value (estimated or introduced) and the p-value of the conditionaltest
Class isocir
Finally we describe the isocir function This function creates the S3 objects of class isocirwhich is a list with the following elements
$cirmeans is a list with the unrestricted circular means Notice that when the argumentdata is a vector these values match exactly with the input However if there are repli-cated data the argument data is a matrix and $cirmeans contains the correspondingunrestricted circular means (θ1 θq)
$SCE is the value of the Sum of Circular Errors (SCE(φ θ))
$CIRE is a list with the Circular Isotonic Regression Estimator (φ) obtained under the orderdefined by the groups argument
$kappa the value of kappa (either set by the user or estimated)
$pvalue the p-value of the conditional test obtained from the function condtest
These objects of class isocir are the output of the functions CIRE and condtest The lasttwo elements of the list ($kappa and $pvalue) are NULL if the object comes from the functionCIRE Otherwise if the object comes from the function condtest not only there are theresults of the conditional test ($kappa and $pvalue) but also an attribute called estkappa
8 isocir An R Package for Isotonic Inference for Circular Data
will inform (or rather remind) the user if the value in $kappa has been internally estimatedor introduced as a known input
Some S3 methods have also been defined for the class isocir
isocir(cirmeans = NULL SCE = NULL CIRE = NULL pvalue = NULL kappa = NULL)This function creates an object of class isocir
isisocir(x) This function checks whether the object x is of class isocir
printisocir(x decCIRE decpvalue deckappa ) This S3 method is usedto print an object x of class isocir The number of decimal places can be chosen
plotisocir(x option = c(CIRE cirmeans) ) This S3 method is usedto plot an object x of class isocir The argument option gives the user the optionto plot the points of the Circular Isotonic Regression Estimator (by default) or theunrestricted circular means
33 Examples
In this section we provide examples to illustrate isocir
Example 1
Suppose the observed angular means of eight populations are given by
θ1 = 0025 θ2 = 1475 θ3 = 3274θ4 = 5518 θ5 = 2859θ6 = 5387θ7 = 4179 θ8 = 1962We illustrate isocir for estimating the 8 population angular parameters under the fol-lowing partial circular order constraint φ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ1
φ2
φ3
These data are a set of random circular data called cirdata in our package and theycan be used by calling as below
gt data(cirdata)
gt cirdata
Since in this example there are no replications we provide data in a vector formatThe groups of the order are defined as follows
gt orderGroups lt- c(1 1 1 2 2 3 4 4)
Thus we obtain CIRE using the function CIRE as follows
gt example1CIRE lt- CIRE(cirdata groups = orderGroups circular = TRUE)
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 9
The output is saved in example1CIRE and the printed output is as follows
gt example1CIRE
Thus the constrained estimates satisfy the required order as followsφ1 = 0993
φ2 = 1475
φ3 = 3066
φ4 = 5056
φ5 = 3066
φ6 = 5056
φ7 = 5056
φ8 = 0993
where φ is the Circular Isotonic Regression Estimator of φ Results may be displayedgraphically by setting plot(example1CIRE) When done so a plot with the pointsof the CIRE is produced To see the plot for the unrestricted estimates the argumentoption=cirmeans can be used (ie plot(example1CIRE option = cirmeans))
Example 2 (κ unknown (replications needed))
Using the data in our package called datareplic we demonstrate the use of the functioncondtest when κ is unknown As remarked earlier when κ is unknown we need repli-cate data to estimate κ The file datareplic is a matrix where each column containsthe values of a replication and each row the angles observed at each population meanIn this example we have 8 populations and hypotheses regarding the corresponding 8parameters are as follows
H0 φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ1
H1 H0 is not true
We take the data from the package and set the groups of the order in the argumentgroups
gt data(datareplic)
gt orderGroups2 lt- c(18)
Since replicate data are available we do not include the argument kappa as we wantthe function to estimate it Moreover we correct the bias in the estimation of κ so weset biasCorrect=TRUE Thus we have the following code
gt example2test lt- condtest(datareplicgroups=orderGroups2biasCorrect=TRUE)
gt example2test
The result is the p-value defined in (5) Since the p-value = 00034 we reject the nullhypothesis and conclude that the parameters do not satisfy the specified circular order
If the user is interested in printing the unrestricted circular means θ then the followingcommand is used example2test$cirmeans The result is a list that is saved in thesame format as CIRE Since each group in the circular order has a single element it isconvenient to use the vector format Hence we have
10 isocir An R Package for Isotonic Inference for Circular Data
gt round(unlist(example2test$cirmeans) digits = 3)
4 Application to Analysis of the Cell Cycle Gene Expression Data
As noted in the Introduction there has been considerable discussion in the literatureon the conservation of various aspects of cell cycle genes (Fernandez et al (2012))particularly between two yeast species namely S Cerevisiae (budding yeast) and SPombe (fission yeast) Using the 10 published budding yeast data sets (Rustici et al(2004) Oliva et al (2005) Peng et al (2005)) we illustrate the isocir package to testthe null hypothesis that 16 fission yeast genes namely ssb1 cdc22 msh6 psm3 rad21cig2 mik1 h33 hhf1 hht3 hta2 htb1 fkh2 chs2 sid2 and slp1 satisfy the samecircular order as their budding yeast orthologs (RFA1 RNR1 MSH6 SMC3 MCD1CLN2 SWE1 HHT2 HHF1 HHT1 HTA2 HTB2 FKH1 CHS2 DBF2 and CDC20)whose circular order is obtained from cyclebase (httpwwwcyclebaseorg) andpublished literature Thus we test the following hypothesis
H0 φssb1 φcdc22 φmsh6 φpsm3 φrad21 φcig2 φmik1 φh33 φhhf1 φhht3 φhta2 φhtb1 φfkh2 φchs2 φsid2 φslp1 φssb1
H1 H0 is not true
(6)
For each of the 10 experimental data sets the unconstrained estimates of the phaseangles of the above 16 fission yeast genes appearing in Table 5 were obtained usingthe Random Periods Model (Liu et al (2004)) The R code for that software canbe obtained from httpwwwniehsnihgovresearchatniehslabsbbstaff
peddadaindexcfm
Notice that there are no replicated data here since the experiments were not performedunder the same experimental conditions It appears that the 10 experiments were notsynchronized (ie cells were probably not arrested at the same point in the cell cycle)For this reason from Table 5 it appears that there is a large variability in the estimatesof phase angles of each of the 16 genes Even though there may be large variability inthe estimated values our interest is in the relative order of phase angles among the 16genes which does not rely on the location of the pole and hence does not rely on thesynchronization As there are no replicated data we have a single observation for eachof the 16 fission yeast genes in each experiment and therefore the values in Table 5play the role of the unrestricted circular mean in each experiment Consequently wesuppose that
θij simind M(φij κj) i = 1 2 16 j = 1 2 10
where θij is the unrestricted circular mean of the gene i in the experiment j
Since the 10 experiments may not considered as replications of each other we performeda separate test for each experiment Moreover as explained in Fernandez et al (2012)we assume that the concentration parameter κj depends on the experiment but not onthe gene The reason for this is that out of the two sources of uncertainty one specificto the gene and another one due to the experiment (and therefore common to all genes
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11
within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene
The κj values considered for this example are obtained using Fernandez et al (2012)The procedure used for the computation of these values comes from an analysis ofvariance type methodology Under the assumptions made before the model for thecircular means is
φij = micro+ αi + βj
where αi is the gene effect and βj is the experiment effect The proposed model isanalogous to the standard two-way analysis of variance model and is fully detailed inthe supplementary material of Fernandez et al (2012)
For each of the 10 experiments we test the hypothesis (6) using the function condtest
that is in our software The following code gives the p-values for each experimentResults are summarized in Table 5
gt data(cirgenes)
gt kappas lt- c(264773 324742 215936 415314 454357
+ 2907610 651408 1419445 566920 1112889)
gt allresults lt- list()
gt resultIsoCIRE lt- matrix(ncol = ncol(cirgenes) nrow = nrow(cirgenes))
gt SCEs lt- vector(mode = numeric length = nrow(cirgenes))
gt pvalues lt- vector(mode = numeric length = nrow(cirgenes))
gt for (i in 1 nrow(cirgenes))
+ k lt- kappas[i]
+ genes lt- asnumeric(cirgenes[i isna(cirgenes[i ]) ])
+ allresults[[i]] lt- condtest(genes kappa = k)
+ resultIsoCIRE[ i isna(cirgenes[i ]) ] lt- unlist(allresults[[i]]$CIRE)
+ SCEs[i] lt- allresults[[i]]$SCE
+ pvalues[i] lt- allresults[[i]]$pvalue
+
From the p-values in Table 5 we see that the null hypothesis cannot be rejected in anyof the 10 experiments even at a level of significance as high as 020 Therefore it seemsplausible that the peak expressions of these 16 genes in S Pombe (fission yeast) followthe same order as their S Cerevisiae (budding yeast) orthologs
5 Conclusions
In this paper the R package isocir has been presented This package provides usefultools for drawing inferences from circular data under order restrictions There are twomain functions (CIRE and condtest) The first one computes the CIRE the circularversion of the widely known isotonic regression in Rq The second one is designed fortesting circular hypotheses using a conditional test We have also created the classisocir in order to properly save all the results Although we illustrated the proposedmethodology using an example from cell biology the proposed software can be applied
12 isocir An R Package for Isotonic Inference for Circular Data
to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))
We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas
Acknowledgments
SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript
References
Agostinelli C Lund U (2011) circular Circular Statistics CA Department ofEnvironmental Sciences Informatics and Statistics Carsquo Foscari University VeniceItaly UL Department of Statistics California Polytechnic State University SanLuis Obispo California USA R package version 04-3 URL httpsr-forge
r-projectorgprojectscircular
Balabdaoui F Rufibach K Santambrogio F (2009) OrdMonReg Compute LeastSquares Estimates of One Bounded or Two Ordered Isotonic Regression Curves Rpackage version 102 URL httpCRANR-projectorgpackage=OrdMonReg
Boles L Lohmann K (2003) ldquoTrue Navigation and Magnetic Maps in Spiny LobstersrdquoNature 421 60ndash63
Bowers J Morton I Mould G (2000) ldquoDirectional Statistics of the Wind and WavesrdquoApplied Ocean Research 22 13ndash30
Cermakian N Lamont E Bourdeau P Boivin D (2011) ldquoCircadian Clock Gene Expres-sion in Brain Regions of Alzheimerrsquos Disease Patients and Control Subjectsrdquo Journalof Biological Rhythms 26 160ndash170
Chasalow S (2010) combinat Combinatorics Utilities R package version 00-8 URLhttpCRANR-projectorgpackage=combinat
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
2 isocir An R Package for Isotonic Inference for Circular Data
Figure 1 Phases of a cell cycle
Genes participating in the cell division cycle are oftencalled cell cycle genes A cell cycle gene has a periodicexpression with its peak expression occurring just beforeits biological function (Jensen et al (2006)) Since theperiodic expression of a cell cycle gene can be mappedonto a unit circle the angle corresponding to its peakexpression is known as the phase angle of the gene (seeLiu et al (2004) and Figure 2)
Since cell cycle is fundamental to the growth and developmentof an organism cell biologists have been interested in under-standing various aspects of cell cycle that are evolutionarilyconserved For instance they would like to identify geneswhose relative order of peak expressions is evolutionarily con-served In order to solve such problems Rueda et al (2009)introduced an order restriction on the unit circle called cir- Figure 2 Phase angle (φ)
logical diseases related to aging such as the Alzheimerrsquos disease (AD) and the Parkinsonrsquosdisease (PD) An important aspect of such neurological diseases is the disruption of circadianclock and genes participating in it Researchers are interested in testing differences in thephases of expression of circadian genes in normal and AD patients (Cermakian et al (2011))Methodology discussed in this paper can be used for analyzing such data Other areas ofapplications include the study of migratory patterns and directions of birds (Cochran et al(2004)) the changes in the wind directions (Bowers et al (2000)) directional fluctuations inthe atmosphere (van Doorn et al (2000)) psychology (studies of mental maps or monitoringdata (Kibiak and Jonas (2007)) the orientation of ridges in fingerprints or magnetic maps(Boles and Lohmann (2003))
Motivated by the wide range of applications and the non-existence of a user friendly softwarein Section 3 we introduce the isocir package programmed in R environment R Develop-ment Core Team (2011) which can be downloaded from httpcranr-projectorgweb
packagesisocir The package provides functions which can be used for drawing inferencesregarding the order of a collection of points on a unit circle In Section 2 we describe thestatistical problem and the methodology of Rueda et al (2009) and Fernandez et al (2012)The isocir package is illustrated in Section 4 using the motivating cell cycle gene expressiondata Some concluding remarks are provided in Section 5
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 3
2 Angular Parameters under Order Constraints
21 Circular Order Restriction
Let θ = (θ1 θ2 θq) where each θi i = 1 2 q is the sample circular mean of a randomsample of size ni from a population with unknown circular mean φi All angles are definedin a counter clockwise direction relative to a given pole The mean resultant lengths foreach population are denoted as r1 rq (see Mardia and Jupp (2000) for the definition ofcircular mean and mean resultant length of a set of angles) Then the problem of interest isto draw statistical inferences on φi i = 1 2 q subject to the constraint that the anglesφ1 φ2 φq are in a counter clockwise order on a unit circle Thus φ1 is ldquofollowedrdquo by φ2which is ldquofollowedrdquo by φ3 φq is ldquofollowedrdquo by φ1 More precisely we shall denote thissimple circular order among angular parameters as follows
Csco = φ = (φ1 φ2 φq) isin [0 2π]q φ1 φ2 middot middot middot φq φ1 (1)
It is important to note that the order among the angular parameters is invariant under changesin location of the pole (the initial point of the circle) Unlike the Euclidean space points ona unit circle wrap around That is starting at the pole by traveling 2π radians around thecircumference of the circle one would return to pole For this reason the circular order amongpoints on a unit circle is preserved even if the location of the pole is shifted This is whyRueda et al (2009) and Fernandez et al (2012) refer to the circular order Csco as isotropicorder As in this paper we will also consider more general circular order restrictions fromnow on we will refer to Csco as simple circular order
As a consequence of the geometry a circle can never be linearized and hence methods devel-oped for Euclidean space data are not applicable to circular data The problem is even morechallenging when the angular parameters are constrained by an order around the circle suchas Csco General methodology for circular data when there are no constraints on the angularparameters can be found in the book Mardia and Jupp (2000) among others Constrainedinference for circular data is rather recent (Rueda et al (2009) and Fernandez et al (2012))As noted in Rueda et al (2009) standard Euclidean space methods such as the pool adja-cent violators algorithm (PAVA) used for computing isotonic regression (see Robertson et al(1988) for details) cannot be applied to circular data For example when a cell biologist isinvestigating a large number of cell cycle genes it may be difficult to ascertain the circularorder among all cell cycle genes under consideration However based on the underlying bi-ology the investigator may a priori know the circular order among groups of genes but notthe order among genes within each group In such situations a partial circular order Cpco asdefined below can be used
Cpco =
φ isin [0 2π]q
φ1φ2φl1
φl1+1
φl1+2
φl1+l2
middot middot middot
φl1++lLminus1+1
φl1++lLminus1+2
φl1++lL
φ1φ2φl1
(2)In this case we have L sets of parameters with lj angular parameters in set j and q =
sumLj=1 lj
Order among the parameters within a set is not known but every parameter in a given set isldquofollowedrdquo by every parameter in the next set
4 isocir An R Package for Isotonic Inference for Circular Data
22 Estimation and Testing under Circular Order Restrictions
Analogous to PAVA for Euclidean data Rueda et al (2009) derived a circular isotonic regres-sion estimator (CIRE) for estimating angular parameters (φ1 φ2 φq) subject to a circularorder The CIRE of φ = (φ1 φ2 φq) under the constraint (φ1 φ2 φq)
prime isin Csco is givenby
φ = arg minφisinCsco
SCE(φ θ) (3)
where SCE(φ θ) defined below is the Sum of Circular Errors a circle analog to the Sum ofSquared Errors (SSE) used for Euclidean data
SCE(φ θ) =
qsumi=1
ri(1minus cos(φi minus θi)) (4)
where ri are the mean resultant lengths The CIRE is implemented in the function CIRE ofthe package isocir
Just as the normal distribution is commonly used for the Euclidean space data the von-Mises distribution is widely used for describing angular data on a unit circle Accordinglyfor i = 1 2 q throughout this paper we assume that θi are independently distributedaccording to von-Mises distribution denoted as M(φi κ) where φi is the modal direction andκ is the concentration parameter (see Mardia and Jupp (2000)) Under such an assumptionFernandez et al (2012) developed a conditional test for testing the following hypotheses
H0 φ isin CscoH1 H0 is not true
The conditional test statistic is given by T lowast = 2κSCE(φθ)q where κ is the estimator of κ φ
is the CIRE computed under H0 The estimate φ determines a partition of weierp = 1 Iinto sets of coordinates on which φ is constant These sets are called level sets The rejectionregion for the conditional α-level test is given by
Reject H0 if T lowast ge c(m)
where m is the number of level sets for φ and for large values of κ the approximate criticalvalue c(m) is chosen so that
pr(Fqminusmqminus1 ge c(m)) =α
1minus 1(q minus 1) (5)
where Fqminusmqminus1 represents the central F random variable with (qminusm qminus1) degrees of freedomThe above test statistic is proportional to a chi-square test when κ is known For details onemay refer to Fernandez et al (2012) The above methodology can be modified to test
H0 φ isin CpcoH1 H0 is not true
by replacing 1(q minus 1) by (l1l2 middot middot middot lL)(q minus 1)
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 5
The simulation study performed in Fernandez et al (2012) suggests that the power of this testis quite reasonable Notice that for a given data set the p-value obtained by using the abovemethodology may serve as a useful goodness of fit criterion when comparing two or moreplausible circular orders among a set of angular parameters Larger values may suggest thatthe estimations are closer to the presumed circular order Thus the statistical methodologydeveloped in Fernandez et al (2012) can be used not only for testing relative order amongthe parameters It can be also useful for selecting a ldquobest fittingrdquo circular order among severalcircular order candidates
These tests are implemented in the function condtest in the R package isocir introduced inthe next Section
3 Package isocir
We start this section by giving some background on R packages for isotonic regression andanalysis of circular data We then describe the structure of our package isocir and illustrateit by some examples
31 Related Packages
As isotonic regression is a well-known and widely used technique there are many packages inR for performing isotonic regression such as
isotone (de Leeuw et al (2009)) Active set and generalized PAVA for isotone optimiza-tion
Iso (Turner (2009)) Functions to perform isotonic regression
ordMonReg (Balabdaoui et al (2009)) Compute least squares estimates of one boundedor two ordered isotonic regression curves
Similarly there are several packages in R for analyzing circular data such as
CircStats (Lund and Agostinelli (2009)) The implementations of the Circular Statisticsfrom ldquoTopics in circular Statisticsrdquo Jammalamadaka and SenGupta (2001) It is an Rport from the S-plus library with the same name
circular (Agostinelli and Lund (2011)) This package expands in several ways the Circ-Stats package
Since none of the existing packages for circular data are applicable for analyzing circular dataunder constraints in this article we introduce the software package ldquoisotonic inference forcircular datardquo with the acronym isocir for analyzing circular data under constraints Ourpackage depends on circular (see Agostinelli and Lund (2011)) and combinat (see Chasalow(2010)) These packages should be installed in the computer before loading isocir
32 Package Structure
For the convenience of the reader we summarize all the functions arguments and descriptionsof our package isocir in Table 1
6 isocir An R Package for Isotonic Inference for Circular Data
Functions Arguments Description
sce (arg1 arg2 meanrl) Sum of Circular Errors
mrl (data) Mean Resultant Length
CIRE (data groups circular) Circular Isotonic Regression Estimator
condtest (data groups kappa) Conditional Test
isocir (cirmeans SCE CIRE pvalue kappa) Creates an Object of class isocir
isisocir (x) Checks an Object
printisocir (xdecCIREdecpvaluedeckappa) Prints an object of class isocir
plotisocir (x option ) Plots an object of class isocir
Table 1 Summary of the components of the package isocir
In the following we describe each function of the software in detail
Functions sce() and mrl()
The auxiliary function sce computes the sum of circular errors between a given q-dimensionalvector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2) Thefunction mrl computes the mean resultant length for the input data
Function CIRE()
Using the methodology developed in Rueda et al (2009) this R function computes the CIREfor a given circular order (1) or a partial order (2) The arguments of this function aresummarized in Table 2 The input variable data is a matrix where each column contains
Arguments Values
data vector or matrix with the data
groups the groups of the order
circular =TRUE(by default) =FALSE
Table 2 Arguments of the CIRE function
the vector of unconstrained angular means corresponding to each replication If there isonly one replication then data is a vector The position i in the vector groups containsthe group number to which the parameter φi belongs to The logical argument circular
sets whether the order is wrapped around the circle ie circular order (circular=TRUE) ornot ie simple order (circular=FALSE) For example the simple order cone in the circleCso = φ = (φ1 φ2 φq) isin [0 2π]q 0 le φ1 le φ2 le middot middot middot le φq le 2π would be a non circularorder The output of this function is an object of class isocir (explained later) containingthe circular isotonic regression estimator (φ) the unrestricted circular means (θ) and thecorresponding sum of circular error (SCE(φ θ))
Function condtest()
This function performs the conditional test and computes the corresponding p-value for thefollowing hypotheses
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 7
H0 The angles φ1 φq follow a (simple or partial) circular order
H1 H0 is not true
The arguments of this function appear in Table 3 and are explained below Arguments data
Arguments κ known κ unknown
data numeric vector matrix (as many columns as replications)
groups numeric vector with the groups of the order to be tested
kappa positive numeric value (NULL)
biasCorrect (NULL) =TRUE(by default) =FALSE
Table 3 Arguments of the condtest function
and groups are same as those in function CIRE although in this function groups is the orderto be tested instead of the known order The argument kappa is needed only when data is avector If there are no replications in data the value of κ must be set by the user Even whenthere are replicated data if the user knows the value of κ it may be introduced and it willbe taken into consideration to perfom the conditional test When κ is unknown and there arereplicated data the parameter is internally estimated by maximum likelihood and κ is shownin the output The biasCorrect is related to the estimation of κ If biasCorrect=TRUE thebias correction appearing in Mardia and Jupp (2000) p 87 is performed in the estimationof κ The output of this function is an object of class isocir (explained below) with all theresults from the conditional test the CIRE (φ) the unrestricted circular means (θ) the SCE(SCE(φ θ)) the kappa value (estimated or introduced) and the p-value of the conditionaltest
Class isocir
Finally we describe the isocir function This function creates the S3 objects of class isocirwhich is a list with the following elements
$cirmeans is a list with the unrestricted circular means Notice that when the argumentdata is a vector these values match exactly with the input However if there are repli-cated data the argument data is a matrix and $cirmeans contains the correspondingunrestricted circular means (θ1 θq)
$SCE is the value of the Sum of Circular Errors (SCE(φ θ))
$CIRE is a list with the Circular Isotonic Regression Estimator (φ) obtained under the orderdefined by the groups argument
$kappa the value of kappa (either set by the user or estimated)
$pvalue the p-value of the conditional test obtained from the function condtest
These objects of class isocir are the output of the functions CIRE and condtest The lasttwo elements of the list ($kappa and $pvalue) are NULL if the object comes from the functionCIRE Otherwise if the object comes from the function condtest not only there are theresults of the conditional test ($kappa and $pvalue) but also an attribute called estkappa
8 isocir An R Package for Isotonic Inference for Circular Data
will inform (or rather remind) the user if the value in $kappa has been internally estimatedor introduced as a known input
Some S3 methods have also been defined for the class isocir
isocir(cirmeans = NULL SCE = NULL CIRE = NULL pvalue = NULL kappa = NULL)This function creates an object of class isocir
isisocir(x) This function checks whether the object x is of class isocir
printisocir(x decCIRE decpvalue deckappa ) This S3 method is usedto print an object x of class isocir The number of decimal places can be chosen
plotisocir(x option = c(CIRE cirmeans) ) This S3 method is usedto plot an object x of class isocir The argument option gives the user the optionto plot the points of the Circular Isotonic Regression Estimator (by default) or theunrestricted circular means
33 Examples
In this section we provide examples to illustrate isocir
Example 1
Suppose the observed angular means of eight populations are given by
θ1 = 0025 θ2 = 1475 θ3 = 3274θ4 = 5518 θ5 = 2859θ6 = 5387θ7 = 4179 θ8 = 1962We illustrate isocir for estimating the 8 population angular parameters under the fol-lowing partial circular order constraint φ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ1
φ2
φ3
These data are a set of random circular data called cirdata in our package and theycan be used by calling as below
gt data(cirdata)
gt cirdata
Since in this example there are no replications we provide data in a vector formatThe groups of the order are defined as follows
gt orderGroups lt- c(1 1 1 2 2 3 4 4)
Thus we obtain CIRE using the function CIRE as follows
gt example1CIRE lt- CIRE(cirdata groups = orderGroups circular = TRUE)
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 9
The output is saved in example1CIRE and the printed output is as follows
gt example1CIRE
Thus the constrained estimates satisfy the required order as followsφ1 = 0993
φ2 = 1475
φ3 = 3066
φ4 = 5056
φ5 = 3066
φ6 = 5056
φ7 = 5056
φ8 = 0993
where φ is the Circular Isotonic Regression Estimator of φ Results may be displayedgraphically by setting plot(example1CIRE) When done so a plot with the pointsof the CIRE is produced To see the plot for the unrestricted estimates the argumentoption=cirmeans can be used (ie plot(example1CIRE option = cirmeans))
Example 2 (κ unknown (replications needed))
Using the data in our package called datareplic we demonstrate the use of the functioncondtest when κ is unknown As remarked earlier when κ is unknown we need repli-cate data to estimate κ The file datareplic is a matrix where each column containsthe values of a replication and each row the angles observed at each population meanIn this example we have 8 populations and hypotheses regarding the corresponding 8parameters are as follows
H0 φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ1
H1 H0 is not true
We take the data from the package and set the groups of the order in the argumentgroups
gt data(datareplic)
gt orderGroups2 lt- c(18)
Since replicate data are available we do not include the argument kappa as we wantthe function to estimate it Moreover we correct the bias in the estimation of κ so weset biasCorrect=TRUE Thus we have the following code
gt example2test lt- condtest(datareplicgroups=orderGroups2biasCorrect=TRUE)
gt example2test
The result is the p-value defined in (5) Since the p-value = 00034 we reject the nullhypothesis and conclude that the parameters do not satisfy the specified circular order
If the user is interested in printing the unrestricted circular means θ then the followingcommand is used example2test$cirmeans The result is a list that is saved in thesame format as CIRE Since each group in the circular order has a single element it isconvenient to use the vector format Hence we have
10 isocir An R Package for Isotonic Inference for Circular Data
gt round(unlist(example2test$cirmeans) digits = 3)
4 Application to Analysis of the Cell Cycle Gene Expression Data
As noted in the Introduction there has been considerable discussion in the literatureon the conservation of various aspects of cell cycle genes (Fernandez et al (2012))particularly between two yeast species namely S Cerevisiae (budding yeast) and SPombe (fission yeast) Using the 10 published budding yeast data sets (Rustici et al(2004) Oliva et al (2005) Peng et al (2005)) we illustrate the isocir package to testthe null hypothesis that 16 fission yeast genes namely ssb1 cdc22 msh6 psm3 rad21cig2 mik1 h33 hhf1 hht3 hta2 htb1 fkh2 chs2 sid2 and slp1 satisfy the samecircular order as their budding yeast orthologs (RFA1 RNR1 MSH6 SMC3 MCD1CLN2 SWE1 HHT2 HHF1 HHT1 HTA2 HTB2 FKH1 CHS2 DBF2 and CDC20)whose circular order is obtained from cyclebase (httpwwwcyclebaseorg) andpublished literature Thus we test the following hypothesis
H0 φssb1 φcdc22 φmsh6 φpsm3 φrad21 φcig2 φmik1 φh33 φhhf1 φhht3 φhta2 φhtb1 φfkh2 φchs2 φsid2 φslp1 φssb1
H1 H0 is not true
(6)
For each of the 10 experimental data sets the unconstrained estimates of the phaseangles of the above 16 fission yeast genes appearing in Table 5 were obtained usingthe Random Periods Model (Liu et al (2004)) The R code for that software canbe obtained from httpwwwniehsnihgovresearchatniehslabsbbstaff
peddadaindexcfm
Notice that there are no replicated data here since the experiments were not performedunder the same experimental conditions It appears that the 10 experiments were notsynchronized (ie cells were probably not arrested at the same point in the cell cycle)For this reason from Table 5 it appears that there is a large variability in the estimatesof phase angles of each of the 16 genes Even though there may be large variability inthe estimated values our interest is in the relative order of phase angles among the 16genes which does not rely on the location of the pole and hence does not rely on thesynchronization As there are no replicated data we have a single observation for eachof the 16 fission yeast genes in each experiment and therefore the values in Table 5play the role of the unrestricted circular mean in each experiment Consequently wesuppose that
θij simind M(φij κj) i = 1 2 16 j = 1 2 10
where θij is the unrestricted circular mean of the gene i in the experiment j
Since the 10 experiments may not considered as replications of each other we performeda separate test for each experiment Moreover as explained in Fernandez et al (2012)we assume that the concentration parameter κj depends on the experiment but not onthe gene The reason for this is that out of the two sources of uncertainty one specificto the gene and another one due to the experiment (and therefore common to all genes
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11
within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene
The κj values considered for this example are obtained using Fernandez et al (2012)The procedure used for the computation of these values comes from an analysis ofvariance type methodology Under the assumptions made before the model for thecircular means is
φij = micro+ αi + βj
where αi is the gene effect and βj is the experiment effect The proposed model isanalogous to the standard two-way analysis of variance model and is fully detailed inthe supplementary material of Fernandez et al (2012)
For each of the 10 experiments we test the hypothesis (6) using the function condtest
that is in our software The following code gives the p-values for each experimentResults are summarized in Table 5
gt data(cirgenes)
gt kappas lt- c(264773 324742 215936 415314 454357
+ 2907610 651408 1419445 566920 1112889)
gt allresults lt- list()
gt resultIsoCIRE lt- matrix(ncol = ncol(cirgenes) nrow = nrow(cirgenes))
gt SCEs lt- vector(mode = numeric length = nrow(cirgenes))
gt pvalues lt- vector(mode = numeric length = nrow(cirgenes))
gt for (i in 1 nrow(cirgenes))
+ k lt- kappas[i]
+ genes lt- asnumeric(cirgenes[i isna(cirgenes[i ]) ])
+ allresults[[i]] lt- condtest(genes kappa = k)
+ resultIsoCIRE[ i isna(cirgenes[i ]) ] lt- unlist(allresults[[i]]$CIRE)
+ SCEs[i] lt- allresults[[i]]$SCE
+ pvalues[i] lt- allresults[[i]]$pvalue
+
From the p-values in Table 5 we see that the null hypothesis cannot be rejected in anyof the 10 experiments even at a level of significance as high as 020 Therefore it seemsplausible that the peak expressions of these 16 genes in S Pombe (fission yeast) followthe same order as their S Cerevisiae (budding yeast) orthologs
5 Conclusions
In this paper the R package isocir has been presented This package provides usefultools for drawing inferences from circular data under order restrictions There are twomain functions (CIRE and condtest) The first one computes the CIRE the circularversion of the widely known isotonic regression in Rq The second one is designed fortesting circular hypotheses using a conditional test We have also created the classisocir in order to properly save all the results Although we illustrated the proposedmethodology using an example from cell biology the proposed software can be applied
12 isocir An R Package for Isotonic Inference for Circular Data
to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))
We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas
Acknowledgments
SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript
References
Agostinelli C Lund U (2011) circular Circular Statistics CA Department ofEnvironmental Sciences Informatics and Statistics Carsquo Foscari University VeniceItaly UL Department of Statistics California Polytechnic State University SanLuis Obispo California USA R package version 04-3 URL httpsr-forge
r-projectorgprojectscircular
Balabdaoui F Rufibach K Santambrogio F (2009) OrdMonReg Compute LeastSquares Estimates of One Bounded or Two Ordered Isotonic Regression Curves Rpackage version 102 URL httpCRANR-projectorgpackage=OrdMonReg
Boles L Lohmann K (2003) ldquoTrue Navigation and Magnetic Maps in Spiny LobstersrdquoNature 421 60ndash63
Bowers J Morton I Mould G (2000) ldquoDirectional Statistics of the Wind and WavesrdquoApplied Ocean Research 22 13ndash30
Cermakian N Lamont E Bourdeau P Boivin D (2011) ldquoCircadian Clock Gene Expres-sion in Brain Regions of Alzheimerrsquos Disease Patients and Control Subjectsrdquo Journalof Biological Rhythms 26 160ndash170
Chasalow S (2010) combinat Combinatorics Utilities R package version 00-8 URLhttpCRANR-projectorgpackage=combinat
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 3
2 Angular Parameters under Order Constraints
21 Circular Order Restriction
Let θ = (θ1 θ2 θq) where each θi i = 1 2 q is the sample circular mean of a randomsample of size ni from a population with unknown circular mean φi All angles are definedin a counter clockwise direction relative to a given pole The mean resultant lengths foreach population are denoted as r1 rq (see Mardia and Jupp (2000) for the definition ofcircular mean and mean resultant length of a set of angles) Then the problem of interest isto draw statistical inferences on φi i = 1 2 q subject to the constraint that the anglesφ1 φ2 φq are in a counter clockwise order on a unit circle Thus φ1 is ldquofollowedrdquo by φ2which is ldquofollowedrdquo by φ3 φq is ldquofollowedrdquo by φ1 More precisely we shall denote thissimple circular order among angular parameters as follows
Csco = φ = (φ1 φ2 φq) isin [0 2π]q φ1 φ2 middot middot middot φq φ1 (1)
It is important to note that the order among the angular parameters is invariant under changesin location of the pole (the initial point of the circle) Unlike the Euclidean space points ona unit circle wrap around That is starting at the pole by traveling 2π radians around thecircumference of the circle one would return to pole For this reason the circular order amongpoints on a unit circle is preserved even if the location of the pole is shifted This is whyRueda et al (2009) and Fernandez et al (2012) refer to the circular order Csco as isotropicorder As in this paper we will also consider more general circular order restrictions fromnow on we will refer to Csco as simple circular order
As a consequence of the geometry a circle can never be linearized and hence methods devel-oped for Euclidean space data are not applicable to circular data The problem is even morechallenging when the angular parameters are constrained by an order around the circle suchas Csco General methodology for circular data when there are no constraints on the angularparameters can be found in the book Mardia and Jupp (2000) among others Constrainedinference for circular data is rather recent (Rueda et al (2009) and Fernandez et al (2012))As noted in Rueda et al (2009) standard Euclidean space methods such as the pool adja-cent violators algorithm (PAVA) used for computing isotonic regression (see Robertson et al(1988) for details) cannot be applied to circular data For example when a cell biologist isinvestigating a large number of cell cycle genes it may be difficult to ascertain the circularorder among all cell cycle genes under consideration However based on the underlying bi-ology the investigator may a priori know the circular order among groups of genes but notthe order among genes within each group In such situations a partial circular order Cpco asdefined below can be used
Cpco =
φ isin [0 2π]q
φ1φ2φl1
φl1+1
φl1+2
φl1+l2
middot middot middot
φl1++lLminus1+1
φl1++lLminus1+2
φl1++lL
φ1φ2φl1
(2)In this case we have L sets of parameters with lj angular parameters in set j and q =
sumLj=1 lj
Order among the parameters within a set is not known but every parameter in a given set isldquofollowedrdquo by every parameter in the next set
4 isocir An R Package for Isotonic Inference for Circular Data
22 Estimation and Testing under Circular Order Restrictions
Analogous to PAVA for Euclidean data Rueda et al (2009) derived a circular isotonic regres-sion estimator (CIRE) for estimating angular parameters (φ1 φ2 φq) subject to a circularorder The CIRE of φ = (φ1 φ2 φq) under the constraint (φ1 φ2 φq)
prime isin Csco is givenby
φ = arg minφisinCsco
SCE(φ θ) (3)
where SCE(φ θ) defined below is the Sum of Circular Errors a circle analog to the Sum ofSquared Errors (SSE) used for Euclidean data
SCE(φ θ) =
qsumi=1
ri(1minus cos(φi minus θi)) (4)
where ri are the mean resultant lengths The CIRE is implemented in the function CIRE ofthe package isocir
Just as the normal distribution is commonly used for the Euclidean space data the von-Mises distribution is widely used for describing angular data on a unit circle Accordinglyfor i = 1 2 q throughout this paper we assume that θi are independently distributedaccording to von-Mises distribution denoted as M(φi κ) where φi is the modal direction andκ is the concentration parameter (see Mardia and Jupp (2000)) Under such an assumptionFernandez et al (2012) developed a conditional test for testing the following hypotheses
H0 φ isin CscoH1 H0 is not true
The conditional test statistic is given by T lowast = 2κSCE(φθ)q where κ is the estimator of κ φ
is the CIRE computed under H0 The estimate φ determines a partition of weierp = 1 Iinto sets of coordinates on which φ is constant These sets are called level sets The rejectionregion for the conditional α-level test is given by
Reject H0 if T lowast ge c(m)
where m is the number of level sets for φ and for large values of κ the approximate criticalvalue c(m) is chosen so that
pr(Fqminusmqminus1 ge c(m)) =α
1minus 1(q minus 1) (5)
where Fqminusmqminus1 represents the central F random variable with (qminusm qminus1) degrees of freedomThe above test statistic is proportional to a chi-square test when κ is known For details onemay refer to Fernandez et al (2012) The above methodology can be modified to test
H0 φ isin CpcoH1 H0 is not true
by replacing 1(q minus 1) by (l1l2 middot middot middot lL)(q minus 1)
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 5
The simulation study performed in Fernandez et al (2012) suggests that the power of this testis quite reasonable Notice that for a given data set the p-value obtained by using the abovemethodology may serve as a useful goodness of fit criterion when comparing two or moreplausible circular orders among a set of angular parameters Larger values may suggest thatthe estimations are closer to the presumed circular order Thus the statistical methodologydeveloped in Fernandez et al (2012) can be used not only for testing relative order amongthe parameters It can be also useful for selecting a ldquobest fittingrdquo circular order among severalcircular order candidates
These tests are implemented in the function condtest in the R package isocir introduced inthe next Section
3 Package isocir
We start this section by giving some background on R packages for isotonic regression andanalysis of circular data We then describe the structure of our package isocir and illustrateit by some examples
31 Related Packages
As isotonic regression is a well-known and widely used technique there are many packages inR for performing isotonic regression such as
isotone (de Leeuw et al (2009)) Active set and generalized PAVA for isotone optimiza-tion
Iso (Turner (2009)) Functions to perform isotonic regression
ordMonReg (Balabdaoui et al (2009)) Compute least squares estimates of one boundedor two ordered isotonic regression curves
Similarly there are several packages in R for analyzing circular data such as
CircStats (Lund and Agostinelli (2009)) The implementations of the Circular Statisticsfrom ldquoTopics in circular Statisticsrdquo Jammalamadaka and SenGupta (2001) It is an Rport from the S-plus library with the same name
circular (Agostinelli and Lund (2011)) This package expands in several ways the Circ-Stats package
Since none of the existing packages for circular data are applicable for analyzing circular dataunder constraints in this article we introduce the software package ldquoisotonic inference forcircular datardquo with the acronym isocir for analyzing circular data under constraints Ourpackage depends on circular (see Agostinelli and Lund (2011)) and combinat (see Chasalow(2010)) These packages should be installed in the computer before loading isocir
32 Package Structure
For the convenience of the reader we summarize all the functions arguments and descriptionsof our package isocir in Table 1
6 isocir An R Package for Isotonic Inference for Circular Data
Functions Arguments Description
sce (arg1 arg2 meanrl) Sum of Circular Errors
mrl (data) Mean Resultant Length
CIRE (data groups circular) Circular Isotonic Regression Estimator
condtest (data groups kappa) Conditional Test
isocir (cirmeans SCE CIRE pvalue kappa) Creates an Object of class isocir
isisocir (x) Checks an Object
printisocir (xdecCIREdecpvaluedeckappa) Prints an object of class isocir
plotisocir (x option ) Plots an object of class isocir
Table 1 Summary of the components of the package isocir
In the following we describe each function of the software in detail
Functions sce() and mrl()
The auxiliary function sce computes the sum of circular errors between a given q-dimensionalvector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2) Thefunction mrl computes the mean resultant length for the input data
Function CIRE()
Using the methodology developed in Rueda et al (2009) this R function computes the CIREfor a given circular order (1) or a partial order (2) The arguments of this function aresummarized in Table 2 The input variable data is a matrix where each column contains
Arguments Values
data vector or matrix with the data
groups the groups of the order
circular =TRUE(by default) =FALSE
Table 2 Arguments of the CIRE function
the vector of unconstrained angular means corresponding to each replication If there isonly one replication then data is a vector The position i in the vector groups containsthe group number to which the parameter φi belongs to The logical argument circular
sets whether the order is wrapped around the circle ie circular order (circular=TRUE) ornot ie simple order (circular=FALSE) For example the simple order cone in the circleCso = φ = (φ1 φ2 φq) isin [0 2π]q 0 le φ1 le φ2 le middot middot middot le φq le 2π would be a non circularorder The output of this function is an object of class isocir (explained later) containingthe circular isotonic regression estimator (φ) the unrestricted circular means (θ) and thecorresponding sum of circular error (SCE(φ θ))
Function condtest()
This function performs the conditional test and computes the corresponding p-value for thefollowing hypotheses
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 7
H0 The angles φ1 φq follow a (simple or partial) circular order
H1 H0 is not true
The arguments of this function appear in Table 3 and are explained below Arguments data
Arguments κ known κ unknown
data numeric vector matrix (as many columns as replications)
groups numeric vector with the groups of the order to be tested
kappa positive numeric value (NULL)
biasCorrect (NULL) =TRUE(by default) =FALSE
Table 3 Arguments of the condtest function
and groups are same as those in function CIRE although in this function groups is the orderto be tested instead of the known order The argument kappa is needed only when data is avector If there are no replications in data the value of κ must be set by the user Even whenthere are replicated data if the user knows the value of κ it may be introduced and it willbe taken into consideration to perfom the conditional test When κ is unknown and there arereplicated data the parameter is internally estimated by maximum likelihood and κ is shownin the output The biasCorrect is related to the estimation of κ If biasCorrect=TRUE thebias correction appearing in Mardia and Jupp (2000) p 87 is performed in the estimationof κ The output of this function is an object of class isocir (explained below) with all theresults from the conditional test the CIRE (φ) the unrestricted circular means (θ) the SCE(SCE(φ θ)) the kappa value (estimated or introduced) and the p-value of the conditionaltest
Class isocir
Finally we describe the isocir function This function creates the S3 objects of class isocirwhich is a list with the following elements
$cirmeans is a list with the unrestricted circular means Notice that when the argumentdata is a vector these values match exactly with the input However if there are repli-cated data the argument data is a matrix and $cirmeans contains the correspondingunrestricted circular means (θ1 θq)
$SCE is the value of the Sum of Circular Errors (SCE(φ θ))
$CIRE is a list with the Circular Isotonic Regression Estimator (φ) obtained under the orderdefined by the groups argument
$kappa the value of kappa (either set by the user or estimated)
$pvalue the p-value of the conditional test obtained from the function condtest
These objects of class isocir are the output of the functions CIRE and condtest The lasttwo elements of the list ($kappa and $pvalue) are NULL if the object comes from the functionCIRE Otherwise if the object comes from the function condtest not only there are theresults of the conditional test ($kappa and $pvalue) but also an attribute called estkappa
8 isocir An R Package for Isotonic Inference for Circular Data
will inform (or rather remind) the user if the value in $kappa has been internally estimatedor introduced as a known input
Some S3 methods have also been defined for the class isocir
isocir(cirmeans = NULL SCE = NULL CIRE = NULL pvalue = NULL kappa = NULL)This function creates an object of class isocir
isisocir(x) This function checks whether the object x is of class isocir
printisocir(x decCIRE decpvalue deckappa ) This S3 method is usedto print an object x of class isocir The number of decimal places can be chosen
plotisocir(x option = c(CIRE cirmeans) ) This S3 method is usedto plot an object x of class isocir The argument option gives the user the optionto plot the points of the Circular Isotonic Regression Estimator (by default) or theunrestricted circular means
33 Examples
In this section we provide examples to illustrate isocir
Example 1
Suppose the observed angular means of eight populations are given by
θ1 = 0025 θ2 = 1475 θ3 = 3274θ4 = 5518 θ5 = 2859θ6 = 5387θ7 = 4179 θ8 = 1962We illustrate isocir for estimating the 8 population angular parameters under the fol-lowing partial circular order constraint φ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ1
φ2
φ3
These data are a set of random circular data called cirdata in our package and theycan be used by calling as below
gt data(cirdata)
gt cirdata
Since in this example there are no replications we provide data in a vector formatThe groups of the order are defined as follows
gt orderGroups lt- c(1 1 1 2 2 3 4 4)
Thus we obtain CIRE using the function CIRE as follows
gt example1CIRE lt- CIRE(cirdata groups = orderGroups circular = TRUE)
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 9
The output is saved in example1CIRE and the printed output is as follows
gt example1CIRE
Thus the constrained estimates satisfy the required order as followsφ1 = 0993
φ2 = 1475
φ3 = 3066
φ4 = 5056
φ5 = 3066
φ6 = 5056
φ7 = 5056
φ8 = 0993
where φ is the Circular Isotonic Regression Estimator of φ Results may be displayedgraphically by setting plot(example1CIRE) When done so a plot with the pointsof the CIRE is produced To see the plot for the unrestricted estimates the argumentoption=cirmeans can be used (ie plot(example1CIRE option = cirmeans))
Example 2 (κ unknown (replications needed))
Using the data in our package called datareplic we demonstrate the use of the functioncondtest when κ is unknown As remarked earlier when κ is unknown we need repli-cate data to estimate κ The file datareplic is a matrix where each column containsthe values of a replication and each row the angles observed at each population meanIn this example we have 8 populations and hypotheses regarding the corresponding 8parameters are as follows
H0 φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ1
H1 H0 is not true
We take the data from the package and set the groups of the order in the argumentgroups
gt data(datareplic)
gt orderGroups2 lt- c(18)
Since replicate data are available we do not include the argument kappa as we wantthe function to estimate it Moreover we correct the bias in the estimation of κ so weset biasCorrect=TRUE Thus we have the following code
gt example2test lt- condtest(datareplicgroups=orderGroups2biasCorrect=TRUE)
gt example2test
The result is the p-value defined in (5) Since the p-value = 00034 we reject the nullhypothesis and conclude that the parameters do not satisfy the specified circular order
If the user is interested in printing the unrestricted circular means θ then the followingcommand is used example2test$cirmeans The result is a list that is saved in thesame format as CIRE Since each group in the circular order has a single element it isconvenient to use the vector format Hence we have
10 isocir An R Package for Isotonic Inference for Circular Data
gt round(unlist(example2test$cirmeans) digits = 3)
4 Application to Analysis of the Cell Cycle Gene Expression Data
As noted in the Introduction there has been considerable discussion in the literatureon the conservation of various aspects of cell cycle genes (Fernandez et al (2012))particularly between two yeast species namely S Cerevisiae (budding yeast) and SPombe (fission yeast) Using the 10 published budding yeast data sets (Rustici et al(2004) Oliva et al (2005) Peng et al (2005)) we illustrate the isocir package to testthe null hypothesis that 16 fission yeast genes namely ssb1 cdc22 msh6 psm3 rad21cig2 mik1 h33 hhf1 hht3 hta2 htb1 fkh2 chs2 sid2 and slp1 satisfy the samecircular order as their budding yeast orthologs (RFA1 RNR1 MSH6 SMC3 MCD1CLN2 SWE1 HHT2 HHF1 HHT1 HTA2 HTB2 FKH1 CHS2 DBF2 and CDC20)whose circular order is obtained from cyclebase (httpwwwcyclebaseorg) andpublished literature Thus we test the following hypothesis
H0 φssb1 φcdc22 φmsh6 φpsm3 φrad21 φcig2 φmik1 φh33 φhhf1 φhht3 φhta2 φhtb1 φfkh2 φchs2 φsid2 φslp1 φssb1
H1 H0 is not true
(6)
For each of the 10 experimental data sets the unconstrained estimates of the phaseangles of the above 16 fission yeast genes appearing in Table 5 were obtained usingthe Random Periods Model (Liu et al (2004)) The R code for that software canbe obtained from httpwwwniehsnihgovresearchatniehslabsbbstaff
peddadaindexcfm
Notice that there are no replicated data here since the experiments were not performedunder the same experimental conditions It appears that the 10 experiments were notsynchronized (ie cells were probably not arrested at the same point in the cell cycle)For this reason from Table 5 it appears that there is a large variability in the estimatesof phase angles of each of the 16 genes Even though there may be large variability inthe estimated values our interest is in the relative order of phase angles among the 16genes which does not rely on the location of the pole and hence does not rely on thesynchronization As there are no replicated data we have a single observation for eachof the 16 fission yeast genes in each experiment and therefore the values in Table 5play the role of the unrestricted circular mean in each experiment Consequently wesuppose that
θij simind M(φij κj) i = 1 2 16 j = 1 2 10
where θij is the unrestricted circular mean of the gene i in the experiment j
Since the 10 experiments may not considered as replications of each other we performeda separate test for each experiment Moreover as explained in Fernandez et al (2012)we assume that the concentration parameter κj depends on the experiment but not onthe gene The reason for this is that out of the two sources of uncertainty one specificto the gene and another one due to the experiment (and therefore common to all genes
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11
within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene
The κj values considered for this example are obtained using Fernandez et al (2012)The procedure used for the computation of these values comes from an analysis ofvariance type methodology Under the assumptions made before the model for thecircular means is
φij = micro+ αi + βj
where αi is the gene effect and βj is the experiment effect The proposed model isanalogous to the standard two-way analysis of variance model and is fully detailed inthe supplementary material of Fernandez et al (2012)
For each of the 10 experiments we test the hypothesis (6) using the function condtest
that is in our software The following code gives the p-values for each experimentResults are summarized in Table 5
gt data(cirgenes)
gt kappas lt- c(264773 324742 215936 415314 454357
+ 2907610 651408 1419445 566920 1112889)
gt allresults lt- list()
gt resultIsoCIRE lt- matrix(ncol = ncol(cirgenes) nrow = nrow(cirgenes))
gt SCEs lt- vector(mode = numeric length = nrow(cirgenes))
gt pvalues lt- vector(mode = numeric length = nrow(cirgenes))
gt for (i in 1 nrow(cirgenes))
+ k lt- kappas[i]
+ genes lt- asnumeric(cirgenes[i isna(cirgenes[i ]) ])
+ allresults[[i]] lt- condtest(genes kappa = k)
+ resultIsoCIRE[ i isna(cirgenes[i ]) ] lt- unlist(allresults[[i]]$CIRE)
+ SCEs[i] lt- allresults[[i]]$SCE
+ pvalues[i] lt- allresults[[i]]$pvalue
+
From the p-values in Table 5 we see that the null hypothesis cannot be rejected in anyof the 10 experiments even at a level of significance as high as 020 Therefore it seemsplausible that the peak expressions of these 16 genes in S Pombe (fission yeast) followthe same order as their S Cerevisiae (budding yeast) orthologs
5 Conclusions
In this paper the R package isocir has been presented This package provides usefultools for drawing inferences from circular data under order restrictions There are twomain functions (CIRE and condtest) The first one computes the CIRE the circularversion of the widely known isotonic regression in Rq The second one is designed fortesting circular hypotheses using a conditional test We have also created the classisocir in order to properly save all the results Although we illustrated the proposedmethodology using an example from cell biology the proposed software can be applied
12 isocir An R Package for Isotonic Inference for Circular Data
to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))
We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas
Acknowledgments
SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript
References
Agostinelli C Lund U (2011) circular Circular Statistics CA Department ofEnvironmental Sciences Informatics and Statistics Carsquo Foscari University VeniceItaly UL Department of Statistics California Polytechnic State University SanLuis Obispo California USA R package version 04-3 URL httpsr-forge
r-projectorgprojectscircular
Balabdaoui F Rufibach K Santambrogio F (2009) OrdMonReg Compute LeastSquares Estimates of One Bounded or Two Ordered Isotonic Regression Curves Rpackage version 102 URL httpCRANR-projectorgpackage=OrdMonReg
Boles L Lohmann K (2003) ldquoTrue Navigation and Magnetic Maps in Spiny LobstersrdquoNature 421 60ndash63
Bowers J Morton I Mould G (2000) ldquoDirectional Statistics of the Wind and WavesrdquoApplied Ocean Research 22 13ndash30
Cermakian N Lamont E Bourdeau P Boivin D (2011) ldquoCircadian Clock Gene Expres-sion in Brain Regions of Alzheimerrsquos Disease Patients and Control Subjectsrdquo Journalof Biological Rhythms 26 160ndash170
Chasalow S (2010) combinat Combinatorics Utilities R package version 00-8 URLhttpCRANR-projectorgpackage=combinat
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
4 isocir An R Package for Isotonic Inference for Circular Data
22 Estimation and Testing under Circular Order Restrictions
Analogous to PAVA for Euclidean data Rueda et al (2009) derived a circular isotonic regres-sion estimator (CIRE) for estimating angular parameters (φ1 φ2 φq) subject to a circularorder The CIRE of φ = (φ1 φ2 φq) under the constraint (φ1 φ2 φq)
prime isin Csco is givenby
φ = arg minφisinCsco
SCE(φ θ) (3)
where SCE(φ θ) defined below is the Sum of Circular Errors a circle analog to the Sum ofSquared Errors (SSE) used for Euclidean data
SCE(φ θ) =
qsumi=1
ri(1minus cos(φi minus θi)) (4)
where ri are the mean resultant lengths The CIRE is implemented in the function CIRE ofthe package isocir
Just as the normal distribution is commonly used for the Euclidean space data the von-Mises distribution is widely used for describing angular data on a unit circle Accordinglyfor i = 1 2 q throughout this paper we assume that θi are independently distributedaccording to von-Mises distribution denoted as M(φi κ) where φi is the modal direction andκ is the concentration parameter (see Mardia and Jupp (2000)) Under such an assumptionFernandez et al (2012) developed a conditional test for testing the following hypotheses
H0 φ isin CscoH1 H0 is not true
The conditional test statistic is given by T lowast = 2κSCE(φθ)q where κ is the estimator of κ φ
is the CIRE computed under H0 The estimate φ determines a partition of weierp = 1 Iinto sets of coordinates on which φ is constant These sets are called level sets The rejectionregion for the conditional α-level test is given by
Reject H0 if T lowast ge c(m)
where m is the number of level sets for φ and for large values of κ the approximate criticalvalue c(m) is chosen so that
pr(Fqminusmqminus1 ge c(m)) =α
1minus 1(q minus 1) (5)
where Fqminusmqminus1 represents the central F random variable with (qminusm qminus1) degrees of freedomThe above test statistic is proportional to a chi-square test when κ is known For details onemay refer to Fernandez et al (2012) The above methodology can be modified to test
H0 φ isin CpcoH1 H0 is not true
by replacing 1(q minus 1) by (l1l2 middot middot middot lL)(q minus 1)
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 5
The simulation study performed in Fernandez et al (2012) suggests that the power of this testis quite reasonable Notice that for a given data set the p-value obtained by using the abovemethodology may serve as a useful goodness of fit criterion when comparing two or moreplausible circular orders among a set of angular parameters Larger values may suggest thatthe estimations are closer to the presumed circular order Thus the statistical methodologydeveloped in Fernandez et al (2012) can be used not only for testing relative order amongthe parameters It can be also useful for selecting a ldquobest fittingrdquo circular order among severalcircular order candidates
These tests are implemented in the function condtest in the R package isocir introduced inthe next Section
3 Package isocir
We start this section by giving some background on R packages for isotonic regression andanalysis of circular data We then describe the structure of our package isocir and illustrateit by some examples
31 Related Packages
As isotonic regression is a well-known and widely used technique there are many packages inR for performing isotonic regression such as
isotone (de Leeuw et al (2009)) Active set and generalized PAVA for isotone optimiza-tion
Iso (Turner (2009)) Functions to perform isotonic regression
ordMonReg (Balabdaoui et al (2009)) Compute least squares estimates of one boundedor two ordered isotonic regression curves
Similarly there are several packages in R for analyzing circular data such as
CircStats (Lund and Agostinelli (2009)) The implementations of the Circular Statisticsfrom ldquoTopics in circular Statisticsrdquo Jammalamadaka and SenGupta (2001) It is an Rport from the S-plus library with the same name
circular (Agostinelli and Lund (2011)) This package expands in several ways the Circ-Stats package
Since none of the existing packages for circular data are applicable for analyzing circular dataunder constraints in this article we introduce the software package ldquoisotonic inference forcircular datardquo with the acronym isocir for analyzing circular data under constraints Ourpackage depends on circular (see Agostinelli and Lund (2011)) and combinat (see Chasalow(2010)) These packages should be installed in the computer before loading isocir
32 Package Structure
For the convenience of the reader we summarize all the functions arguments and descriptionsof our package isocir in Table 1
6 isocir An R Package for Isotonic Inference for Circular Data
Functions Arguments Description
sce (arg1 arg2 meanrl) Sum of Circular Errors
mrl (data) Mean Resultant Length
CIRE (data groups circular) Circular Isotonic Regression Estimator
condtest (data groups kappa) Conditional Test
isocir (cirmeans SCE CIRE pvalue kappa) Creates an Object of class isocir
isisocir (x) Checks an Object
printisocir (xdecCIREdecpvaluedeckappa) Prints an object of class isocir
plotisocir (x option ) Plots an object of class isocir
Table 1 Summary of the components of the package isocir
In the following we describe each function of the software in detail
Functions sce() and mrl()
The auxiliary function sce computes the sum of circular errors between a given q-dimensionalvector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2) Thefunction mrl computes the mean resultant length for the input data
Function CIRE()
Using the methodology developed in Rueda et al (2009) this R function computes the CIREfor a given circular order (1) or a partial order (2) The arguments of this function aresummarized in Table 2 The input variable data is a matrix where each column contains
Arguments Values
data vector or matrix with the data
groups the groups of the order
circular =TRUE(by default) =FALSE
Table 2 Arguments of the CIRE function
the vector of unconstrained angular means corresponding to each replication If there isonly one replication then data is a vector The position i in the vector groups containsthe group number to which the parameter φi belongs to The logical argument circular
sets whether the order is wrapped around the circle ie circular order (circular=TRUE) ornot ie simple order (circular=FALSE) For example the simple order cone in the circleCso = φ = (φ1 φ2 φq) isin [0 2π]q 0 le φ1 le φ2 le middot middot middot le φq le 2π would be a non circularorder The output of this function is an object of class isocir (explained later) containingthe circular isotonic regression estimator (φ) the unrestricted circular means (θ) and thecorresponding sum of circular error (SCE(φ θ))
Function condtest()
This function performs the conditional test and computes the corresponding p-value for thefollowing hypotheses
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 7
H0 The angles φ1 φq follow a (simple or partial) circular order
H1 H0 is not true
The arguments of this function appear in Table 3 and are explained below Arguments data
Arguments κ known κ unknown
data numeric vector matrix (as many columns as replications)
groups numeric vector with the groups of the order to be tested
kappa positive numeric value (NULL)
biasCorrect (NULL) =TRUE(by default) =FALSE
Table 3 Arguments of the condtest function
and groups are same as those in function CIRE although in this function groups is the orderto be tested instead of the known order The argument kappa is needed only when data is avector If there are no replications in data the value of κ must be set by the user Even whenthere are replicated data if the user knows the value of κ it may be introduced and it willbe taken into consideration to perfom the conditional test When κ is unknown and there arereplicated data the parameter is internally estimated by maximum likelihood and κ is shownin the output The biasCorrect is related to the estimation of κ If biasCorrect=TRUE thebias correction appearing in Mardia and Jupp (2000) p 87 is performed in the estimationof κ The output of this function is an object of class isocir (explained below) with all theresults from the conditional test the CIRE (φ) the unrestricted circular means (θ) the SCE(SCE(φ θ)) the kappa value (estimated or introduced) and the p-value of the conditionaltest
Class isocir
Finally we describe the isocir function This function creates the S3 objects of class isocirwhich is a list with the following elements
$cirmeans is a list with the unrestricted circular means Notice that when the argumentdata is a vector these values match exactly with the input However if there are repli-cated data the argument data is a matrix and $cirmeans contains the correspondingunrestricted circular means (θ1 θq)
$SCE is the value of the Sum of Circular Errors (SCE(φ θ))
$CIRE is a list with the Circular Isotonic Regression Estimator (φ) obtained under the orderdefined by the groups argument
$kappa the value of kappa (either set by the user or estimated)
$pvalue the p-value of the conditional test obtained from the function condtest
These objects of class isocir are the output of the functions CIRE and condtest The lasttwo elements of the list ($kappa and $pvalue) are NULL if the object comes from the functionCIRE Otherwise if the object comes from the function condtest not only there are theresults of the conditional test ($kappa and $pvalue) but also an attribute called estkappa
8 isocir An R Package for Isotonic Inference for Circular Data
will inform (or rather remind) the user if the value in $kappa has been internally estimatedor introduced as a known input
Some S3 methods have also been defined for the class isocir
isocir(cirmeans = NULL SCE = NULL CIRE = NULL pvalue = NULL kappa = NULL)This function creates an object of class isocir
isisocir(x) This function checks whether the object x is of class isocir
printisocir(x decCIRE decpvalue deckappa ) This S3 method is usedto print an object x of class isocir The number of decimal places can be chosen
plotisocir(x option = c(CIRE cirmeans) ) This S3 method is usedto plot an object x of class isocir The argument option gives the user the optionto plot the points of the Circular Isotonic Regression Estimator (by default) or theunrestricted circular means
33 Examples
In this section we provide examples to illustrate isocir
Example 1
Suppose the observed angular means of eight populations are given by
θ1 = 0025 θ2 = 1475 θ3 = 3274θ4 = 5518 θ5 = 2859θ6 = 5387θ7 = 4179 θ8 = 1962We illustrate isocir for estimating the 8 population angular parameters under the fol-lowing partial circular order constraint φ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ1
φ2
φ3
These data are a set of random circular data called cirdata in our package and theycan be used by calling as below
gt data(cirdata)
gt cirdata
Since in this example there are no replications we provide data in a vector formatThe groups of the order are defined as follows
gt orderGroups lt- c(1 1 1 2 2 3 4 4)
Thus we obtain CIRE using the function CIRE as follows
gt example1CIRE lt- CIRE(cirdata groups = orderGroups circular = TRUE)
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 9
The output is saved in example1CIRE and the printed output is as follows
gt example1CIRE
Thus the constrained estimates satisfy the required order as followsφ1 = 0993
φ2 = 1475
φ3 = 3066
φ4 = 5056
φ5 = 3066
φ6 = 5056
φ7 = 5056
φ8 = 0993
where φ is the Circular Isotonic Regression Estimator of φ Results may be displayedgraphically by setting plot(example1CIRE) When done so a plot with the pointsof the CIRE is produced To see the plot for the unrestricted estimates the argumentoption=cirmeans can be used (ie plot(example1CIRE option = cirmeans))
Example 2 (κ unknown (replications needed))
Using the data in our package called datareplic we demonstrate the use of the functioncondtest when κ is unknown As remarked earlier when κ is unknown we need repli-cate data to estimate κ The file datareplic is a matrix where each column containsthe values of a replication and each row the angles observed at each population meanIn this example we have 8 populations and hypotheses regarding the corresponding 8parameters are as follows
H0 φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ1
H1 H0 is not true
We take the data from the package and set the groups of the order in the argumentgroups
gt data(datareplic)
gt orderGroups2 lt- c(18)
Since replicate data are available we do not include the argument kappa as we wantthe function to estimate it Moreover we correct the bias in the estimation of κ so weset biasCorrect=TRUE Thus we have the following code
gt example2test lt- condtest(datareplicgroups=orderGroups2biasCorrect=TRUE)
gt example2test
The result is the p-value defined in (5) Since the p-value = 00034 we reject the nullhypothesis and conclude that the parameters do not satisfy the specified circular order
If the user is interested in printing the unrestricted circular means θ then the followingcommand is used example2test$cirmeans The result is a list that is saved in thesame format as CIRE Since each group in the circular order has a single element it isconvenient to use the vector format Hence we have
10 isocir An R Package for Isotonic Inference for Circular Data
gt round(unlist(example2test$cirmeans) digits = 3)
4 Application to Analysis of the Cell Cycle Gene Expression Data
As noted in the Introduction there has been considerable discussion in the literatureon the conservation of various aspects of cell cycle genes (Fernandez et al (2012))particularly between two yeast species namely S Cerevisiae (budding yeast) and SPombe (fission yeast) Using the 10 published budding yeast data sets (Rustici et al(2004) Oliva et al (2005) Peng et al (2005)) we illustrate the isocir package to testthe null hypothesis that 16 fission yeast genes namely ssb1 cdc22 msh6 psm3 rad21cig2 mik1 h33 hhf1 hht3 hta2 htb1 fkh2 chs2 sid2 and slp1 satisfy the samecircular order as their budding yeast orthologs (RFA1 RNR1 MSH6 SMC3 MCD1CLN2 SWE1 HHT2 HHF1 HHT1 HTA2 HTB2 FKH1 CHS2 DBF2 and CDC20)whose circular order is obtained from cyclebase (httpwwwcyclebaseorg) andpublished literature Thus we test the following hypothesis
H0 φssb1 φcdc22 φmsh6 φpsm3 φrad21 φcig2 φmik1 φh33 φhhf1 φhht3 φhta2 φhtb1 φfkh2 φchs2 φsid2 φslp1 φssb1
H1 H0 is not true
(6)
For each of the 10 experimental data sets the unconstrained estimates of the phaseangles of the above 16 fission yeast genes appearing in Table 5 were obtained usingthe Random Periods Model (Liu et al (2004)) The R code for that software canbe obtained from httpwwwniehsnihgovresearchatniehslabsbbstaff
peddadaindexcfm
Notice that there are no replicated data here since the experiments were not performedunder the same experimental conditions It appears that the 10 experiments were notsynchronized (ie cells were probably not arrested at the same point in the cell cycle)For this reason from Table 5 it appears that there is a large variability in the estimatesof phase angles of each of the 16 genes Even though there may be large variability inthe estimated values our interest is in the relative order of phase angles among the 16genes which does not rely on the location of the pole and hence does not rely on thesynchronization As there are no replicated data we have a single observation for eachof the 16 fission yeast genes in each experiment and therefore the values in Table 5play the role of the unrestricted circular mean in each experiment Consequently wesuppose that
θij simind M(φij κj) i = 1 2 16 j = 1 2 10
where θij is the unrestricted circular mean of the gene i in the experiment j
Since the 10 experiments may not considered as replications of each other we performeda separate test for each experiment Moreover as explained in Fernandez et al (2012)we assume that the concentration parameter κj depends on the experiment but not onthe gene The reason for this is that out of the two sources of uncertainty one specificto the gene and another one due to the experiment (and therefore common to all genes
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11
within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene
The κj values considered for this example are obtained using Fernandez et al (2012)The procedure used for the computation of these values comes from an analysis ofvariance type methodology Under the assumptions made before the model for thecircular means is
φij = micro+ αi + βj
where αi is the gene effect and βj is the experiment effect The proposed model isanalogous to the standard two-way analysis of variance model and is fully detailed inthe supplementary material of Fernandez et al (2012)
For each of the 10 experiments we test the hypothesis (6) using the function condtest
that is in our software The following code gives the p-values for each experimentResults are summarized in Table 5
gt data(cirgenes)
gt kappas lt- c(264773 324742 215936 415314 454357
+ 2907610 651408 1419445 566920 1112889)
gt allresults lt- list()
gt resultIsoCIRE lt- matrix(ncol = ncol(cirgenes) nrow = nrow(cirgenes))
gt SCEs lt- vector(mode = numeric length = nrow(cirgenes))
gt pvalues lt- vector(mode = numeric length = nrow(cirgenes))
gt for (i in 1 nrow(cirgenes))
+ k lt- kappas[i]
+ genes lt- asnumeric(cirgenes[i isna(cirgenes[i ]) ])
+ allresults[[i]] lt- condtest(genes kappa = k)
+ resultIsoCIRE[ i isna(cirgenes[i ]) ] lt- unlist(allresults[[i]]$CIRE)
+ SCEs[i] lt- allresults[[i]]$SCE
+ pvalues[i] lt- allresults[[i]]$pvalue
+
From the p-values in Table 5 we see that the null hypothesis cannot be rejected in anyof the 10 experiments even at a level of significance as high as 020 Therefore it seemsplausible that the peak expressions of these 16 genes in S Pombe (fission yeast) followthe same order as their S Cerevisiae (budding yeast) orthologs
5 Conclusions
In this paper the R package isocir has been presented This package provides usefultools for drawing inferences from circular data under order restrictions There are twomain functions (CIRE and condtest) The first one computes the CIRE the circularversion of the widely known isotonic regression in Rq The second one is designed fortesting circular hypotheses using a conditional test We have also created the classisocir in order to properly save all the results Although we illustrated the proposedmethodology using an example from cell biology the proposed software can be applied
12 isocir An R Package for Isotonic Inference for Circular Data
to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))
We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas
Acknowledgments
SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript
References
Agostinelli C Lund U (2011) circular Circular Statistics CA Department ofEnvironmental Sciences Informatics and Statistics Carsquo Foscari University VeniceItaly UL Department of Statistics California Polytechnic State University SanLuis Obispo California USA R package version 04-3 URL httpsr-forge
r-projectorgprojectscircular
Balabdaoui F Rufibach K Santambrogio F (2009) OrdMonReg Compute LeastSquares Estimates of One Bounded or Two Ordered Isotonic Regression Curves Rpackage version 102 URL httpCRANR-projectorgpackage=OrdMonReg
Boles L Lohmann K (2003) ldquoTrue Navigation and Magnetic Maps in Spiny LobstersrdquoNature 421 60ndash63
Bowers J Morton I Mould G (2000) ldquoDirectional Statistics of the Wind and WavesrdquoApplied Ocean Research 22 13ndash30
Cermakian N Lamont E Bourdeau P Boivin D (2011) ldquoCircadian Clock Gene Expres-sion in Brain Regions of Alzheimerrsquos Disease Patients and Control Subjectsrdquo Journalof Biological Rhythms 26 160ndash170
Chasalow S (2010) combinat Combinatorics Utilities R package version 00-8 URLhttpCRANR-projectorgpackage=combinat
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 5
The simulation study performed in Fernandez et al (2012) suggests that the power of this testis quite reasonable Notice that for a given data set the p-value obtained by using the abovemethodology may serve as a useful goodness of fit criterion when comparing two or moreplausible circular orders among a set of angular parameters Larger values may suggest thatthe estimations are closer to the presumed circular order Thus the statistical methodologydeveloped in Fernandez et al (2012) can be used not only for testing relative order amongthe parameters It can be also useful for selecting a ldquobest fittingrdquo circular order among severalcircular order candidates
These tests are implemented in the function condtest in the R package isocir introduced inthe next Section
3 Package isocir
We start this section by giving some background on R packages for isotonic regression andanalysis of circular data We then describe the structure of our package isocir and illustrateit by some examples
31 Related Packages
As isotonic regression is a well-known and widely used technique there are many packages inR for performing isotonic regression such as
isotone (de Leeuw et al (2009)) Active set and generalized PAVA for isotone optimiza-tion
Iso (Turner (2009)) Functions to perform isotonic regression
ordMonReg (Balabdaoui et al (2009)) Compute least squares estimates of one boundedor two ordered isotonic regression curves
Similarly there are several packages in R for analyzing circular data such as
CircStats (Lund and Agostinelli (2009)) The implementations of the Circular Statisticsfrom ldquoTopics in circular Statisticsrdquo Jammalamadaka and SenGupta (2001) It is an Rport from the S-plus library with the same name
circular (Agostinelli and Lund (2011)) This package expands in several ways the Circ-Stats package
Since none of the existing packages for circular data are applicable for analyzing circular dataunder constraints in this article we introduce the software package ldquoisotonic inference forcircular datardquo with the acronym isocir for analyzing circular data under constraints Ourpackage depends on circular (see Agostinelli and Lund (2011)) and combinat (see Chasalow(2010)) These packages should be installed in the computer before loading isocir
32 Package Structure
For the convenience of the reader we summarize all the functions arguments and descriptionsof our package isocir in Table 1
6 isocir An R Package for Isotonic Inference for Circular Data
Functions Arguments Description
sce (arg1 arg2 meanrl) Sum of Circular Errors
mrl (data) Mean Resultant Length
CIRE (data groups circular) Circular Isotonic Regression Estimator
condtest (data groups kappa) Conditional Test
isocir (cirmeans SCE CIRE pvalue kappa) Creates an Object of class isocir
isisocir (x) Checks an Object
printisocir (xdecCIREdecpvaluedeckappa) Prints an object of class isocir
plotisocir (x option ) Plots an object of class isocir
Table 1 Summary of the components of the package isocir
In the following we describe each function of the software in detail
Functions sce() and mrl()
The auxiliary function sce computes the sum of circular errors between a given q-dimensionalvector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2) Thefunction mrl computes the mean resultant length for the input data
Function CIRE()
Using the methodology developed in Rueda et al (2009) this R function computes the CIREfor a given circular order (1) or a partial order (2) The arguments of this function aresummarized in Table 2 The input variable data is a matrix where each column contains
Arguments Values
data vector or matrix with the data
groups the groups of the order
circular =TRUE(by default) =FALSE
Table 2 Arguments of the CIRE function
the vector of unconstrained angular means corresponding to each replication If there isonly one replication then data is a vector The position i in the vector groups containsthe group number to which the parameter φi belongs to The logical argument circular
sets whether the order is wrapped around the circle ie circular order (circular=TRUE) ornot ie simple order (circular=FALSE) For example the simple order cone in the circleCso = φ = (φ1 φ2 φq) isin [0 2π]q 0 le φ1 le φ2 le middot middot middot le φq le 2π would be a non circularorder The output of this function is an object of class isocir (explained later) containingthe circular isotonic regression estimator (φ) the unrestricted circular means (θ) and thecorresponding sum of circular error (SCE(φ θ))
Function condtest()
This function performs the conditional test and computes the corresponding p-value for thefollowing hypotheses
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 7
H0 The angles φ1 φq follow a (simple or partial) circular order
H1 H0 is not true
The arguments of this function appear in Table 3 and are explained below Arguments data
Arguments κ known κ unknown
data numeric vector matrix (as many columns as replications)
groups numeric vector with the groups of the order to be tested
kappa positive numeric value (NULL)
biasCorrect (NULL) =TRUE(by default) =FALSE
Table 3 Arguments of the condtest function
and groups are same as those in function CIRE although in this function groups is the orderto be tested instead of the known order The argument kappa is needed only when data is avector If there are no replications in data the value of κ must be set by the user Even whenthere are replicated data if the user knows the value of κ it may be introduced and it willbe taken into consideration to perfom the conditional test When κ is unknown and there arereplicated data the parameter is internally estimated by maximum likelihood and κ is shownin the output The biasCorrect is related to the estimation of κ If biasCorrect=TRUE thebias correction appearing in Mardia and Jupp (2000) p 87 is performed in the estimationof κ The output of this function is an object of class isocir (explained below) with all theresults from the conditional test the CIRE (φ) the unrestricted circular means (θ) the SCE(SCE(φ θ)) the kappa value (estimated or introduced) and the p-value of the conditionaltest
Class isocir
Finally we describe the isocir function This function creates the S3 objects of class isocirwhich is a list with the following elements
$cirmeans is a list with the unrestricted circular means Notice that when the argumentdata is a vector these values match exactly with the input However if there are repli-cated data the argument data is a matrix and $cirmeans contains the correspondingunrestricted circular means (θ1 θq)
$SCE is the value of the Sum of Circular Errors (SCE(φ θ))
$CIRE is a list with the Circular Isotonic Regression Estimator (φ) obtained under the orderdefined by the groups argument
$kappa the value of kappa (either set by the user or estimated)
$pvalue the p-value of the conditional test obtained from the function condtest
These objects of class isocir are the output of the functions CIRE and condtest The lasttwo elements of the list ($kappa and $pvalue) are NULL if the object comes from the functionCIRE Otherwise if the object comes from the function condtest not only there are theresults of the conditional test ($kappa and $pvalue) but also an attribute called estkappa
8 isocir An R Package for Isotonic Inference for Circular Data
will inform (or rather remind) the user if the value in $kappa has been internally estimatedor introduced as a known input
Some S3 methods have also been defined for the class isocir
isocir(cirmeans = NULL SCE = NULL CIRE = NULL pvalue = NULL kappa = NULL)This function creates an object of class isocir
isisocir(x) This function checks whether the object x is of class isocir
printisocir(x decCIRE decpvalue deckappa ) This S3 method is usedto print an object x of class isocir The number of decimal places can be chosen
plotisocir(x option = c(CIRE cirmeans) ) This S3 method is usedto plot an object x of class isocir The argument option gives the user the optionto plot the points of the Circular Isotonic Regression Estimator (by default) or theunrestricted circular means
33 Examples
In this section we provide examples to illustrate isocir
Example 1
Suppose the observed angular means of eight populations are given by
θ1 = 0025 θ2 = 1475 θ3 = 3274θ4 = 5518 θ5 = 2859θ6 = 5387θ7 = 4179 θ8 = 1962We illustrate isocir for estimating the 8 population angular parameters under the fol-lowing partial circular order constraint φ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ1
φ2
φ3
These data are a set of random circular data called cirdata in our package and theycan be used by calling as below
gt data(cirdata)
gt cirdata
Since in this example there are no replications we provide data in a vector formatThe groups of the order are defined as follows
gt orderGroups lt- c(1 1 1 2 2 3 4 4)
Thus we obtain CIRE using the function CIRE as follows
gt example1CIRE lt- CIRE(cirdata groups = orderGroups circular = TRUE)
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 9
The output is saved in example1CIRE and the printed output is as follows
gt example1CIRE
Thus the constrained estimates satisfy the required order as followsφ1 = 0993
φ2 = 1475
φ3 = 3066
φ4 = 5056
φ5 = 3066
φ6 = 5056
φ7 = 5056
φ8 = 0993
where φ is the Circular Isotonic Regression Estimator of φ Results may be displayedgraphically by setting plot(example1CIRE) When done so a plot with the pointsof the CIRE is produced To see the plot for the unrestricted estimates the argumentoption=cirmeans can be used (ie plot(example1CIRE option = cirmeans))
Example 2 (κ unknown (replications needed))
Using the data in our package called datareplic we demonstrate the use of the functioncondtest when κ is unknown As remarked earlier when κ is unknown we need repli-cate data to estimate κ The file datareplic is a matrix where each column containsthe values of a replication and each row the angles observed at each population meanIn this example we have 8 populations and hypotheses regarding the corresponding 8parameters are as follows
H0 φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ1
H1 H0 is not true
We take the data from the package and set the groups of the order in the argumentgroups
gt data(datareplic)
gt orderGroups2 lt- c(18)
Since replicate data are available we do not include the argument kappa as we wantthe function to estimate it Moreover we correct the bias in the estimation of κ so weset biasCorrect=TRUE Thus we have the following code
gt example2test lt- condtest(datareplicgroups=orderGroups2biasCorrect=TRUE)
gt example2test
The result is the p-value defined in (5) Since the p-value = 00034 we reject the nullhypothesis and conclude that the parameters do not satisfy the specified circular order
If the user is interested in printing the unrestricted circular means θ then the followingcommand is used example2test$cirmeans The result is a list that is saved in thesame format as CIRE Since each group in the circular order has a single element it isconvenient to use the vector format Hence we have
10 isocir An R Package for Isotonic Inference for Circular Data
gt round(unlist(example2test$cirmeans) digits = 3)
4 Application to Analysis of the Cell Cycle Gene Expression Data
As noted in the Introduction there has been considerable discussion in the literatureon the conservation of various aspects of cell cycle genes (Fernandez et al (2012))particularly between two yeast species namely S Cerevisiae (budding yeast) and SPombe (fission yeast) Using the 10 published budding yeast data sets (Rustici et al(2004) Oliva et al (2005) Peng et al (2005)) we illustrate the isocir package to testthe null hypothesis that 16 fission yeast genes namely ssb1 cdc22 msh6 psm3 rad21cig2 mik1 h33 hhf1 hht3 hta2 htb1 fkh2 chs2 sid2 and slp1 satisfy the samecircular order as their budding yeast orthologs (RFA1 RNR1 MSH6 SMC3 MCD1CLN2 SWE1 HHT2 HHF1 HHT1 HTA2 HTB2 FKH1 CHS2 DBF2 and CDC20)whose circular order is obtained from cyclebase (httpwwwcyclebaseorg) andpublished literature Thus we test the following hypothesis
H0 φssb1 φcdc22 φmsh6 φpsm3 φrad21 φcig2 φmik1 φh33 φhhf1 φhht3 φhta2 φhtb1 φfkh2 φchs2 φsid2 φslp1 φssb1
H1 H0 is not true
(6)
For each of the 10 experimental data sets the unconstrained estimates of the phaseangles of the above 16 fission yeast genes appearing in Table 5 were obtained usingthe Random Periods Model (Liu et al (2004)) The R code for that software canbe obtained from httpwwwniehsnihgovresearchatniehslabsbbstaff
peddadaindexcfm
Notice that there are no replicated data here since the experiments were not performedunder the same experimental conditions It appears that the 10 experiments were notsynchronized (ie cells were probably not arrested at the same point in the cell cycle)For this reason from Table 5 it appears that there is a large variability in the estimatesof phase angles of each of the 16 genes Even though there may be large variability inthe estimated values our interest is in the relative order of phase angles among the 16genes which does not rely on the location of the pole and hence does not rely on thesynchronization As there are no replicated data we have a single observation for eachof the 16 fission yeast genes in each experiment and therefore the values in Table 5play the role of the unrestricted circular mean in each experiment Consequently wesuppose that
θij simind M(φij κj) i = 1 2 16 j = 1 2 10
where θij is the unrestricted circular mean of the gene i in the experiment j
Since the 10 experiments may not considered as replications of each other we performeda separate test for each experiment Moreover as explained in Fernandez et al (2012)we assume that the concentration parameter κj depends on the experiment but not onthe gene The reason for this is that out of the two sources of uncertainty one specificto the gene and another one due to the experiment (and therefore common to all genes
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11
within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene
The κj values considered for this example are obtained using Fernandez et al (2012)The procedure used for the computation of these values comes from an analysis ofvariance type methodology Under the assumptions made before the model for thecircular means is
φij = micro+ αi + βj
where αi is the gene effect and βj is the experiment effect The proposed model isanalogous to the standard two-way analysis of variance model and is fully detailed inthe supplementary material of Fernandez et al (2012)
For each of the 10 experiments we test the hypothesis (6) using the function condtest
that is in our software The following code gives the p-values for each experimentResults are summarized in Table 5
gt data(cirgenes)
gt kappas lt- c(264773 324742 215936 415314 454357
+ 2907610 651408 1419445 566920 1112889)
gt allresults lt- list()
gt resultIsoCIRE lt- matrix(ncol = ncol(cirgenes) nrow = nrow(cirgenes))
gt SCEs lt- vector(mode = numeric length = nrow(cirgenes))
gt pvalues lt- vector(mode = numeric length = nrow(cirgenes))
gt for (i in 1 nrow(cirgenes))
+ k lt- kappas[i]
+ genes lt- asnumeric(cirgenes[i isna(cirgenes[i ]) ])
+ allresults[[i]] lt- condtest(genes kappa = k)
+ resultIsoCIRE[ i isna(cirgenes[i ]) ] lt- unlist(allresults[[i]]$CIRE)
+ SCEs[i] lt- allresults[[i]]$SCE
+ pvalues[i] lt- allresults[[i]]$pvalue
+
From the p-values in Table 5 we see that the null hypothesis cannot be rejected in anyof the 10 experiments even at a level of significance as high as 020 Therefore it seemsplausible that the peak expressions of these 16 genes in S Pombe (fission yeast) followthe same order as their S Cerevisiae (budding yeast) orthologs
5 Conclusions
In this paper the R package isocir has been presented This package provides usefultools for drawing inferences from circular data under order restrictions There are twomain functions (CIRE and condtest) The first one computes the CIRE the circularversion of the widely known isotonic regression in Rq The second one is designed fortesting circular hypotheses using a conditional test We have also created the classisocir in order to properly save all the results Although we illustrated the proposedmethodology using an example from cell biology the proposed software can be applied
12 isocir An R Package for Isotonic Inference for Circular Data
to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))
We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas
Acknowledgments
SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript
References
Agostinelli C Lund U (2011) circular Circular Statistics CA Department ofEnvironmental Sciences Informatics and Statistics Carsquo Foscari University VeniceItaly UL Department of Statistics California Polytechnic State University SanLuis Obispo California USA R package version 04-3 URL httpsr-forge
r-projectorgprojectscircular
Balabdaoui F Rufibach K Santambrogio F (2009) OrdMonReg Compute LeastSquares Estimates of One Bounded or Two Ordered Isotonic Regression Curves Rpackage version 102 URL httpCRANR-projectorgpackage=OrdMonReg
Boles L Lohmann K (2003) ldquoTrue Navigation and Magnetic Maps in Spiny LobstersrdquoNature 421 60ndash63
Bowers J Morton I Mould G (2000) ldquoDirectional Statistics of the Wind and WavesrdquoApplied Ocean Research 22 13ndash30
Cermakian N Lamont E Bourdeau P Boivin D (2011) ldquoCircadian Clock Gene Expres-sion in Brain Regions of Alzheimerrsquos Disease Patients and Control Subjectsrdquo Journalof Biological Rhythms 26 160ndash170
Chasalow S (2010) combinat Combinatorics Utilities R package version 00-8 URLhttpCRANR-projectorgpackage=combinat
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
6 isocir An R Package for Isotonic Inference for Circular Data
Functions Arguments Description
sce (arg1 arg2 meanrl) Sum of Circular Errors
mrl (data) Mean Resultant Length
CIRE (data groups circular) Circular Isotonic Regression Estimator
condtest (data groups kappa) Conditional Test
isocir (cirmeans SCE CIRE pvalue kappa) Creates an Object of class isocir
isisocir (x) Checks an Object
printisocir (xdecCIREdecpvaluedeckappa) Prints an object of class isocir
plotisocir (x option ) Plots an object of class isocir
Table 1 Summary of the components of the package isocir
In the following we describe each function of the software in detail
Functions sce() and mrl()
The auxiliary function sce computes the sum of circular errors between a given q-dimensionalvector (denoted by arg1) and one or more q-dimensional vectors (denoted by arg2) Thefunction mrl computes the mean resultant length for the input data
Function CIRE()
Using the methodology developed in Rueda et al (2009) this R function computes the CIREfor a given circular order (1) or a partial order (2) The arguments of this function aresummarized in Table 2 The input variable data is a matrix where each column contains
Arguments Values
data vector or matrix with the data
groups the groups of the order
circular =TRUE(by default) =FALSE
Table 2 Arguments of the CIRE function
the vector of unconstrained angular means corresponding to each replication If there isonly one replication then data is a vector The position i in the vector groups containsthe group number to which the parameter φi belongs to The logical argument circular
sets whether the order is wrapped around the circle ie circular order (circular=TRUE) ornot ie simple order (circular=FALSE) For example the simple order cone in the circleCso = φ = (φ1 φ2 φq) isin [0 2π]q 0 le φ1 le φ2 le middot middot middot le φq le 2π would be a non circularorder The output of this function is an object of class isocir (explained later) containingthe circular isotonic regression estimator (φ) the unrestricted circular means (θ) and thecorresponding sum of circular error (SCE(φ θ))
Function condtest()
This function performs the conditional test and computes the corresponding p-value for thefollowing hypotheses
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 7
H0 The angles φ1 φq follow a (simple or partial) circular order
H1 H0 is not true
The arguments of this function appear in Table 3 and are explained below Arguments data
Arguments κ known κ unknown
data numeric vector matrix (as many columns as replications)
groups numeric vector with the groups of the order to be tested
kappa positive numeric value (NULL)
biasCorrect (NULL) =TRUE(by default) =FALSE
Table 3 Arguments of the condtest function
and groups are same as those in function CIRE although in this function groups is the orderto be tested instead of the known order The argument kappa is needed only when data is avector If there are no replications in data the value of κ must be set by the user Even whenthere are replicated data if the user knows the value of κ it may be introduced and it willbe taken into consideration to perfom the conditional test When κ is unknown and there arereplicated data the parameter is internally estimated by maximum likelihood and κ is shownin the output The biasCorrect is related to the estimation of κ If biasCorrect=TRUE thebias correction appearing in Mardia and Jupp (2000) p 87 is performed in the estimationof κ The output of this function is an object of class isocir (explained below) with all theresults from the conditional test the CIRE (φ) the unrestricted circular means (θ) the SCE(SCE(φ θ)) the kappa value (estimated or introduced) and the p-value of the conditionaltest
Class isocir
Finally we describe the isocir function This function creates the S3 objects of class isocirwhich is a list with the following elements
$cirmeans is a list with the unrestricted circular means Notice that when the argumentdata is a vector these values match exactly with the input However if there are repli-cated data the argument data is a matrix and $cirmeans contains the correspondingunrestricted circular means (θ1 θq)
$SCE is the value of the Sum of Circular Errors (SCE(φ θ))
$CIRE is a list with the Circular Isotonic Regression Estimator (φ) obtained under the orderdefined by the groups argument
$kappa the value of kappa (either set by the user or estimated)
$pvalue the p-value of the conditional test obtained from the function condtest
These objects of class isocir are the output of the functions CIRE and condtest The lasttwo elements of the list ($kappa and $pvalue) are NULL if the object comes from the functionCIRE Otherwise if the object comes from the function condtest not only there are theresults of the conditional test ($kappa and $pvalue) but also an attribute called estkappa
8 isocir An R Package for Isotonic Inference for Circular Data
will inform (or rather remind) the user if the value in $kappa has been internally estimatedor introduced as a known input
Some S3 methods have also been defined for the class isocir
isocir(cirmeans = NULL SCE = NULL CIRE = NULL pvalue = NULL kappa = NULL)This function creates an object of class isocir
isisocir(x) This function checks whether the object x is of class isocir
printisocir(x decCIRE decpvalue deckappa ) This S3 method is usedto print an object x of class isocir The number of decimal places can be chosen
plotisocir(x option = c(CIRE cirmeans) ) This S3 method is usedto plot an object x of class isocir The argument option gives the user the optionto plot the points of the Circular Isotonic Regression Estimator (by default) or theunrestricted circular means
33 Examples
In this section we provide examples to illustrate isocir
Example 1
Suppose the observed angular means of eight populations are given by
θ1 = 0025 θ2 = 1475 θ3 = 3274θ4 = 5518 θ5 = 2859θ6 = 5387θ7 = 4179 θ8 = 1962We illustrate isocir for estimating the 8 population angular parameters under the fol-lowing partial circular order constraint φ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ1
φ2
φ3
These data are a set of random circular data called cirdata in our package and theycan be used by calling as below
gt data(cirdata)
gt cirdata
Since in this example there are no replications we provide data in a vector formatThe groups of the order are defined as follows
gt orderGroups lt- c(1 1 1 2 2 3 4 4)
Thus we obtain CIRE using the function CIRE as follows
gt example1CIRE lt- CIRE(cirdata groups = orderGroups circular = TRUE)
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 9
The output is saved in example1CIRE and the printed output is as follows
gt example1CIRE
Thus the constrained estimates satisfy the required order as followsφ1 = 0993
φ2 = 1475
φ3 = 3066
φ4 = 5056
φ5 = 3066
φ6 = 5056
φ7 = 5056
φ8 = 0993
where φ is the Circular Isotonic Regression Estimator of φ Results may be displayedgraphically by setting plot(example1CIRE) When done so a plot with the pointsof the CIRE is produced To see the plot for the unrestricted estimates the argumentoption=cirmeans can be used (ie plot(example1CIRE option = cirmeans))
Example 2 (κ unknown (replications needed))
Using the data in our package called datareplic we demonstrate the use of the functioncondtest when κ is unknown As remarked earlier when κ is unknown we need repli-cate data to estimate κ The file datareplic is a matrix where each column containsthe values of a replication and each row the angles observed at each population meanIn this example we have 8 populations and hypotheses regarding the corresponding 8parameters are as follows
H0 φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ1
H1 H0 is not true
We take the data from the package and set the groups of the order in the argumentgroups
gt data(datareplic)
gt orderGroups2 lt- c(18)
Since replicate data are available we do not include the argument kappa as we wantthe function to estimate it Moreover we correct the bias in the estimation of κ so weset biasCorrect=TRUE Thus we have the following code
gt example2test lt- condtest(datareplicgroups=orderGroups2biasCorrect=TRUE)
gt example2test
The result is the p-value defined in (5) Since the p-value = 00034 we reject the nullhypothesis and conclude that the parameters do not satisfy the specified circular order
If the user is interested in printing the unrestricted circular means θ then the followingcommand is used example2test$cirmeans The result is a list that is saved in thesame format as CIRE Since each group in the circular order has a single element it isconvenient to use the vector format Hence we have
10 isocir An R Package for Isotonic Inference for Circular Data
gt round(unlist(example2test$cirmeans) digits = 3)
4 Application to Analysis of the Cell Cycle Gene Expression Data
As noted in the Introduction there has been considerable discussion in the literatureon the conservation of various aspects of cell cycle genes (Fernandez et al (2012))particularly between two yeast species namely S Cerevisiae (budding yeast) and SPombe (fission yeast) Using the 10 published budding yeast data sets (Rustici et al(2004) Oliva et al (2005) Peng et al (2005)) we illustrate the isocir package to testthe null hypothesis that 16 fission yeast genes namely ssb1 cdc22 msh6 psm3 rad21cig2 mik1 h33 hhf1 hht3 hta2 htb1 fkh2 chs2 sid2 and slp1 satisfy the samecircular order as their budding yeast orthologs (RFA1 RNR1 MSH6 SMC3 MCD1CLN2 SWE1 HHT2 HHF1 HHT1 HTA2 HTB2 FKH1 CHS2 DBF2 and CDC20)whose circular order is obtained from cyclebase (httpwwwcyclebaseorg) andpublished literature Thus we test the following hypothesis
H0 φssb1 φcdc22 φmsh6 φpsm3 φrad21 φcig2 φmik1 φh33 φhhf1 φhht3 φhta2 φhtb1 φfkh2 φchs2 φsid2 φslp1 φssb1
H1 H0 is not true
(6)
For each of the 10 experimental data sets the unconstrained estimates of the phaseangles of the above 16 fission yeast genes appearing in Table 5 were obtained usingthe Random Periods Model (Liu et al (2004)) The R code for that software canbe obtained from httpwwwniehsnihgovresearchatniehslabsbbstaff
peddadaindexcfm
Notice that there are no replicated data here since the experiments were not performedunder the same experimental conditions It appears that the 10 experiments were notsynchronized (ie cells were probably not arrested at the same point in the cell cycle)For this reason from Table 5 it appears that there is a large variability in the estimatesof phase angles of each of the 16 genes Even though there may be large variability inthe estimated values our interest is in the relative order of phase angles among the 16genes which does not rely on the location of the pole and hence does not rely on thesynchronization As there are no replicated data we have a single observation for eachof the 16 fission yeast genes in each experiment and therefore the values in Table 5play the role of the unrestricted circular mean in each experiment Consequently wesuppose that
θij simind M(φij κj) i = 1 2 16 j = 1 2 10
where θij is the unrestricted circular mean of the gene i in the experiment j
Since the 10 experiments may not considered as replications of each other we performeda separate test for each experiment Moreover as explained in Fernandez et al (2012)we assume that the concentration parameter κj depends on the experiment but not onthe gene The reason for this is that out of the two sources of uncertainty one specificto the gene and another one due to the experiment (and therefore common to all genes
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11
within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene
The κj values considered for this example are obtained using Fernandez et al (2012)The procedure used for the computation of these values comes from an analysis ofvariance type methodology Under the assumptions made before the model for thecircular means is
φij = micro+ αi + βj
where αi is the gene effect and βj is the experiment effect The proposed model isanalogous to the standard two-way analysis of variance model and is fully detailed inthe supplementary material of Fernandez et al (2012)
For each of the 10 experiments we test the hypothesis (6) using the function condtest
that is in our software The following code gives the p-values for each experimentResults are summarized in Table 5
gt data(cirgenes)
gt kappas lt- c(264773 324742 215936 415314 454357
+ 2907610 651408 1419445 566920 1112889)
gt allresults lt- list()
gt resultIsoCIRE lt- matrix(ncol = ncol(cirgenes) nrow = nrow(cirgenes))
gt SCEs lt- vector(mode = numeric length = nrow(cirgenes))
gt pvalues lt- vector(mode = numeric length = nrow(cirgenes))
gt for (i in 1 nrow(cirgenes))
+ k lt- kappas[i]
+ genes lt- asnumeric(cirgenes[i isna(cirgenes[i ]) ])
+ allresults[[i]] lt- condtest(genes kappa = k)
+ resultIsoCIRE[ i isna(cirgenes[i ]) ] lt- unlist(allresults[[i]]$CIRE)
+ SCEs[i] lt- allresults[[i]]$SCE
+ pvalues[i] lt- allresults[[i]]$pvalue
+
From the p-values in Table 5 we see that the null hypothesis cannot be rejected in anyof the 10 experiments even at a level of significance as high as 020 Therefore it seemsplausible that the peak expressions of these 16 genes in S Pombe (fission yeast) followthe same order as their S Cerevisiae (budding yeast) orthologs
5 Conclusions
In this paper the R package isocir has been presented This package provides usefultools for drawing inferences from circular data under order restrictions There are twomain functions (CIRE and condtest) The first one computes the CIRE the circularversion of the widely known isotonic regression in Rq The second one is designed fortesting circular hypotheses using a conditional test We have also created the classisocir in order to properly save all the results Although we illustrated the proposedmethodology using an example from cell biology the proposed software can be applied
12 isocir An R Package for Isotonic Inference for Circular Data
to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))
We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas
Acknowledgments
SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript
References
Agostinelli C Lund U (2011) circular Circular Statistics CA Department ofEnvironmental Sciences Informatics and Statistics Carsquo Foscari University VeniceItaly UL Department of Statistics California Polytechnic State University SanLuis Obispo California USA R package version 04-3 URL httpsr-forge
r-projectorgprojectscircular
Balabdaoui F Rufibach K Santambrogio F (2009) OrdMonReg Compute LeastSquares Estimates of One Bounded or Two Ordered Isotonic Regression Curves Rpackage version 102 URL httpCRANR-projectorgpackage=OrdMonReg
Boles L Lohmann K (2003) ldquoTrue Navigation and Magnetic Maps in Spiny LobstersrdquoNature 421 60ndash63
Bowers J Morton I Mould G (2000) ldquoDirectional Statistics of the Wind and WavesrdquoApplied Ocean Research 22 13ndash30
Cermakian N Lamont E Bourdeau P Boivin D (2011) ldquoCircadian Clock Gene Expres-sion in Brain Regions of Alzheimerrsquos Disease Patients and Control Subjectsrdquo Journalof Biological Rhythms 26 160ndash170
Chasalow S (2010) combinat Combinatorics Utilities R package version 00-8 URLhttpCRANR-projectorgpackage=combinat
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 7
H0 The angles φ1 φq follow a (simple or partial) circular order
H1 H0 is not true
The arguments of this function appear in Table 3 and are explained below Arguments data
Arguments κ known κ unknown
data numeric vector matrix (as many columns as replications)
groups numeric vector with the groups of the order to be tested
kappa positive numeric value (NULL)
biasCorrect (NULL) =TRUE(by default) =FALSE
Table 3 Arguments of the condtest function
and groups are same as those in function CIRE although in this function groups is the orderto be tested instead of the known order The argument kappa is needed only when data is avector If there are no replications in data the value of κ must be set by the user Even whenthere are replicated data if the user knows the value of κ it may be introduced and it willbe taken into consideration to perfom the conditional test When κ is unknown and there arereplicated data the parameter is internally estimated by maximum likelihood and κ is shownin the output The biasCorrect is related to the estimation of κ If biasCorrect=TRUE thebias correction appearing in Mardia and Jupp (2000) p 87 is performed in the estimationof κ The output of this function is an object of class isocir (explained below) with all theresults from the conditional test the CIRE (φ) the unrestricted circular means (θ) the SCE(SCE(φ θ)) the kappa value (estimated or introduced) and the p-value of the conditionaltest
Class isocir
Finally we describe the isocir function This function creates the S3 objects of class isocirwhich is a list with the following elements
$cirmeans is a list with the unrestricted circular means Notice that when the argumentdata is a vector these values match exactly with the input However if there are repli-cated data the argument data is a matrix and $cirmeans contains the correspondingunrestricted circular means (θ1 θq)
$SCE is the value of the Sum of Circular Errors (SCE(φ θ))
$CIRE is a list with the Circular Isotonic Regression Estimator (φ) obtained under the orderdefined by the groups argument
$kappa the value of kappa (either set by the user or estimated)
$pvalue the p-value of the conditional test obtained from the function condtest
These objects of class isocir are the output of the functions CIRE and condtest The lasttwo elements of the list ($kappa and $pvalue) are NULL if the object comes from the functionCIRE Otherwise if the object comes from the function condtest not only there are theresults of the conditional test ($kappa and $pvalue) but also an attribute called estkappa
8 isocir An R Package for Isotonic Inference for Circular Data
will inform (or rather remind) the user if the value in $kappa has been internally estimatedor introduced as a known input
Some S3 methods have also been defined for the class isocir
isocir(cirmeans = NULL SCE = NULL CIRE = NULL pvalue = NULL kappa = NULL)This function creates an object of class isocir
isisocir(x) This function checks whether the object x is of class isocir
printisocir(x decCIRE decpvalue deckappa ) This S3 method is usedto print an object x of class isocir The number of decimal places can be chosen
plotisocir(x option = c(CIRE cirmeans) ) This S3 method is usedto plot an object x of class isocir The argument option gives the user the optionto plot the points of the Circular Isotonic Regression Estimator (by default) or theunrestricted circular means
33 Examples
In this section we provide examples to illustrate isocir
Example 1
Suppose the observed angular means of eight populations are given by
θ1 = 0025 θ2 = 1475 θ3 = 3274θ4 = 5518 θ5 = 2859θ6 = 5387θ7 = 4179 θ8 = 1962We illustrate isocir for estimating the 8 population angular parameters under the fol-lowing partial circular order constraint φ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ1
φ2
φ3
These data are a set of random circular data called cirdata in our package and theycan be used by calling as below
gt data(cirdata)
gt cirdata
Since in this example there are no replications we provide data in a vector formatThe groups of the order are defined as follows
gt orderGroups lt- c(1 1 1 2 2 3 4 4)
Thus we obtain CIRE using the function CIRE as follows
gt example1CIRE lt- CIRE(cirdata groups = orderGroups circular = TRUE)
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 9
The output is saved in example1CIRE and the printed output is as follows
gt example1CIRE
Thus the constrained estimates satisfy the required order as followsφ1 = 0993
φ2 = 1475
φ3 = 3066
φ4 = 5056
φ5 = 3066
φ6 = 5056
φ7 = 5056
φ8 = 0993
where φ is the Circular Isotonic Regression Estimator of φ Results may be displayedgraphically by setting plot(example1CIRE) When done so a plot with the pointsof the CIRE is produced To see the plot for the unrestricted estimates the argumentoption=cirmeans can be used (ie plot(example1CIRE option = cirmeans))
Example 2 (κ unknown (replications needed))
Using the data in our package called datareplic we demonstrate the use of the functioncondtest when κ is unknown As remarked earlier when κ is unknown we need repli-cate data to estimate κ The file datareplic is a matrix where each column containsthe values of a replication and each row the angles observed at each population meanIn this example we have 8 populations and hypotheses regarding the corresponding 8parameters are as follows
H0 φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ1
H1 H0 is not true
We take the data from the package and set the groups of the order in the argumentgroups
gt data(datareplic)
gt orderGroups2 lt- c(18)
Since replicate data are available we do not include the argument kappa as we wantthe function to estimate it Moreover we correct the bias in the estimation of κ so weset biasCorrect=TRUE Thus we have the following code
gt example2test lt- condtest(datareplicgroups=orderGroups2biasCorrect=TRUE)
gt example2test
The result is the p-value defined in (5) Since the p-value = 00034 we reject the nullhypothesis and conclude that the parameters do not satisfy the specified circular order
If the user is interested in printing the unrestricted circular means θ then the followingcommand is used example2test$cirmeans The result is a list that is saved in thesame format as CIRE Since each group in the circular order has a single element it isconvenient to use the vector format Hence we have
10 isocir An R Package for Isotonic Inference for Circular Data
gt round(unlist(example2test$cirmeans) digits = 3)
4 Application to Analysis of the Cell Cycle Gene Expression Data
As noted in the Introduction there has been considerable discussion in the literatureon the conservation of various aspects of cell cycle genes (Fernandez et al (2012))particularly between two yeast species namely S Cerevisiae (budding yeast) and SPombe (fission yeast) Using the 10 published budding yeast data sets (Rustici et al(2004) Oliva et al (2005) Peng et al (2005)) we illustrate the isocir package to testthe null hypothesis that 16 fission yeast genes namely ssb1 cdc22 msh6 psm3 rad21cig2 mik1 h33 hhf1 hht3 hta2 htb1 fkh2 chs2 sid2 and slp1 satisfy the samecircular order as their budding yeast orthologs (RFA1 RNR1 MSH6 SMC3 MCD1CLN2 SWE1 HHT2 HHF1 HHT1 HTA2 HTB2 FKH1 CHS2 DBF2 and CDC20)whose circular order is obtained from cyclebase (httpwwwcyclebaseorg) andpublished literature Thus we test the following hypothesis
H0 φssb1 φcdc22 φmsh6 φpsm3 φrad21 φcig2 φmik1 φh33 φhhf1 φhht3 φhta2 φhtb1 φfkh2 φchs2 φsid2 φslp1 φssb1
H1 H0 is not true
(6)
For each of the 10 experimental data sets the unconstrained estimates of the phaseangles of the above 16 fission yeast genes appearing in Table 5 were obtained usingthe Random Periods Model (Liu et al (2004)) The R code for that software canbe obtained from httpwwwniehsnihgovresearchatniehslabsbbstaff
peddadaindexcfm
Notice that there are no replicated data here since the experiments were not performedunder the same experimental conditions It appears that the 10 experiments were notsynchronized (ie cells were probably not arrested at the same point in the cell cycle)For this reason from Table 5 it appears that there is a large variability in the estimatesof phase angles of each of the 16 genes Even though there may be large variability inthe estimated values our interest is in the relative order of phase angles among the 16genes which does not rely on the location of the pole and hence does not rely on thesynchronization As there are no replicated data we have a single observation for eachof the 16 fission yeast genes in each experiment and therefore the values in Table 5play the role of the unrestricted circular mean in each experiment Consequently wesuppose that
θij simind M(φij κj) i = 1 2 16 j = 1 2 10
where θij is the unrestricted circular mean of the gene i in the experiment j
Since the 10 experiments may not considered as replications of each other we performeda separate test for each experiment Moreover as explained in Fernandez et al (2012)we assume that the concentration parameter κj depends on the experiment but not onthe gene The reason for this is that out of the two sources of uncertainty one specificto the gene and another one due to the experiment (and therefore common to all genes
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11
within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene
The κj values considered for this example are obtained using Fernandez et al (2012)The procedure used for the computation of these values comes from an analysis ofvariance type methodology Under the assumptions made before the model for thecircular means is
φij = micro+ αi + βj
where αi is the gene effect and βj is the experiment effect The proposed model isanalogous to the standard two-way analysis of variance model and is fully detailed inthe supplementary material of Fernandez et al (2012)
For each of the 10 experiments we test the hypothesis (6) using the function condtest
that is in our software The following code gives the p-values for each experimentResults are summarized in Table 5
gt data(cirgenes)
gt kappas lt- c(264773 324742 215936 415314 454357
+ 2907610 651408 1419445 566920 1112889)
gt allresults lt- list()
gt resultIsoCIRE lt- matrix(ncol = ncol(cirgenes) nrow = nrow(cirgenes))
gt SCEs lt- vector(mode = numeric length = nrow(cirgenes))
gt pvalues lt- vector(mode = numeric length = nrow(cirgenes))
gt for (i in 1 nrow(cirgenes))
+ k lt- kappas[i]
+ genes lt- asnumeric(cirgenes[i isna(cirgenes[i ]) ])
+ allresults[[i]] lt- condtest(genes kappa = k)
+ resultIsoCIRE[ i isna(cirgenes[i ]) ] lt- unlist(allresults[[i]]$CIRE)
+ SCEs[i] lt- allresults[[i]]$SCE
+ pvalues[i] lt- allresults[[i]]$pvalue
+
From the p-values in Table 5 we see that the null hypothesis cannot be rejected in anyof the 10 experiments even at a level of significance as high as 020 Therefore it seemsplausible that the peak expressions of these 16 genes in S Pombe (fission yeast) followthe same order as their S Cerevisiae (budding yeast) orthologs
5 Conclusions
In this paper the R package isocir has been presented This package provides usefultools for drawing inferences from circular data under order restrictions There are twomain functions (CIRE and condtest) The first one computes the CIRE the circularversion of the widely known isotonic regression in Rq The second one is designed fortesting circular hypotheses using a conditional test We have also created the classisocir in order to properly save all the results Although we illustrated the proposedmethodology using an example from cell biology the proposed software can be applied
12 isocir An R Package for Isotonic Inference for Circular Data
to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))
We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas
Acknowledgments
SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript
References
Agostinelli C Lund U (2011) circular Circular Statistics CA Department ofEnvironmental Sciences Informatics and Statistics Carsquo Foscari University VeniceItaly UL Department of Statistics California Polytechnic State University SanLuis Obispo California USA R package version 04-3 URL httpsr-forge
r-projectorgprojectscircular
Balabdaoui F Rufibach K Santambrogio F (2009) OrdMonReg Compute LeastSquares Estimates of One Bounded or Two Ordered Isotonic Regression Curves Rpackage version 102 URL httpCRANR-projectorgpackage=OrdMonReg
Boles L Lohmann K (2003) ldquoTrue Navigation and Magnetic Maps in Spiny LobstersrdquoNature 421 60ndash63
Bowers J Morton I Mould G (2000) ldquoDirectional Statistics of the Wind and WavesrdquoApplied Ocean Research 22 13ndash30
Cermakian N Lamont E Bourdeau P Boivin D (2011) ldquoCircadian Clock Gene Expres-sion in Brain Regions of Alzheimerrsquos Disease Patients and Control Subjectsrdquo Journalof Biological Rhythms 26 160ndash170
Chasalow S (2010) combinat Combinatorics Utilities R package version 00-8 URLhttpCRANR-projectorgpackage=combinat
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
8 isocir An R Package for Isotonic Inference for Circular Data
will inform (or rather remind) the user if the value in $kappa has been internally estimatedor introduced as a known input
Some S3 methods have also been defined for the class isocir
isocir(cirmeans = NULL SCE = NULL CIRE = NULL pvalue = NULL kappa = NULL)This function creates an object of class isocir
isisocir(x) This function checks whether the object x is of class isocir
printisocir(x decCIRE decpvalue deckappa ) This S3 method is usedto print an object x of class isocir The number of decimal places can be chosen
plotisocir(x option = c(CIRE cirmeans) ) This S3 method is usedto plot an object x of class isocir The argument option gives the user the optionto plot the points of the Circular Isotonic Regression Estimator (by default) or theunrestricted circular means
33 Examples
In this section we provide examples to illustrate isocir
Example 1
Suppose the observed angular means of eight populations are given by
θ1 = 0025 θ2 = 1475 θ3 = 3274θ4 = 5518 θ5 = 2859θ6 = 5387θ7 = 4179 θ8 = 1962We illustrate isocir for estimating the 8 population angular parameters under the fol-lowing partial circular order constraint φ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ1
φ2
φ3
These data are a set of random circular data called cirdata in our package and theycan be used by calling as below
gt data(cirdata)
gt cirdata
Since in this example there are no replications we provide data in a vector formatThe groups of the order are defined as follows
gt orderGroups lt- c(1 1 1 2 2 3 4 4)
Thus we obtain CIRE using the function CIRE as follows
gt example1CIRE lt- CIRE(cirdata groups = orderGroups circular = TRUE)
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 9
The output is saved in example1CIRE and the printed output is as follows
gt example1CIRE
Thus the constrained estimates satisfy the required order as followsφ1 = 0993
φ2 = 1475
φ3 = 3066
φ4 = 5056
φ5 = 3066
φ6 = 5056
φ7 = 5056
φ8 = 0993
where φ is the Circular Isotonic Regression Estimator of φ Results may be displayedgraphically by setting plot(example1CIRE) When done so a plot with the pointsof the CIRE is produced To see the plot for the unrestricted estimates the argumentoption=cirmeans can be used (ie plot(example1CIRE option = cirmeans))
Example 2 (κ unknown (replications needed))
Using the data in our package called datareplic we demonstrate the use of the functioncondtest when κ is unknown As remarked earlier when κ is unknown we need repli-cate data to estimate κ The file datareplic is a matrix where each column containsthe values of a replication and each row the angles observed at each population meanIn this example we have 8 populations and hypotheses regarding the corresponding 8parameters are as follows
H0 φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ1
H1 H0 is not true
We take the data from the package and set the groups of the order in the argumentgroups
gt data(datareplic)
gt orderGroups2 lt- c(18)
Since replicate data are available we do not include the argument kappa as we wantthe function to estimate it Moreover we correct the bias in the estimation of κ so weset biasCorrect=TRUE Thus we have the following code
gt example2test lt- condtest(datareplicgroups=orderGroups2biasCorrect=TRUE)
gt example2test
The result is the p-value defined in (5) Since the p-value = 00034 we reject the nullhypothesis and conclude that the parameters do not satisfy the specified circular order
If the user is interested in printing the unrestricted circular means θ then the followingcommand is used example2test$cirmeans The result is a list that is saved in thesame format as CIRE Since each group in the circular order has a single element it isconvenient to use the vector format Hence we have
10 isocir An R Package for Isotonic Inference for Circular Data
gt round(unlist(example2test$cirmeans) digits = 3)
4 Application to Analysis of the Cell Cycle Gene Expression Data
As noted in the Introduction there has been considerable discussion in the literatureon the conservation of various aspects of cell cycle genes (Fernandez et al (2012))particularly between two yeast species namely S Cerevisiae (budding yeast) and SPombe (fission yeast) Using the 10 published budding yeast data sets (Rustici et al(2004) Oliva et al (2005) Peng et al (2005)) we illustrate the isocir package to testthe null hypothesis that 16 fission yeast genes namely ssb1 cdc22 msh6 psm3 rad21cig2 mik1 h33 hhf1 hht3 hta2 htb1 fkh2 chs2 sid2 and slp1 satisfy the samecircular order as their budding yeast orthologs (RFA1 RNR1 MSH6 SMC3 MCD1CLN2 SWE1 HHT2 HHF1 HHT1 HTA2 HTB2 FKH1 CHS2 DBF2 and CDC20)whose circular order is obtained from cyclebase (httpwwwcyclebaseorg) andpublished literature Thus we test the following hypothesis
H0 φssb1 φcdc22 φmsh6 φpsm3 φrad21 φcig2 φmik1 φh33 φhhf1 φhht3 φhta2 φhtb1 φfkh2 φchs2 φsid2 φslp1 φssb1
H1 H0 is not true
(6)
For each of the 10 experimental data sets the unconstrained estimates of the phaseangles of the above 16 fission yeast genes appearing in Table 5 were obtained usingthe Random Periods Model (Liu et al (2004)) The R code for that software canbe obtained from httpwwwniehsnihgovresearchatniehslabsbbstaff
peddadaindexcfm
Notice that there are no replicated data here since the experiments were not performedunder the same experimental conditions It appears that the 10 experiments were notsynchronized (ie cells were probably not arrested at the same point in the cell cycle)For this reason from Table 5 it appears that there is a large variability in the estimatesof phase angles of each of the 16 genes Even though there may be large variability inthe estimated values our interest is in the relative order of phase angles among the 16genes which does not rely on the location of the pole and hence does not rely on thesynchronization As there are no replicated data we have a single observation for eachof the 16 fission yeast genes in each experiment and therefore the values in Table 5play the role of the unrestricted circular mean in each experiment Consequently wesuppose that
θij simind M(φij κj) i = 1 2 16 j = 1 2 10
where θij is the unrestricted circular mean of the gene i in the experiment j
Since the 10 experiments may not considered as replications of each other we performeda separate test for each experiment Moreover as explained in Fernandez et al (2012)we assume that the concentration parameter κj depends on the experiment but not onthe gene The reason for this is that out of the two sources of uncertainty one specificto the gene and another one due to the experiment (and therefore common to all genes
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11
within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene
The κj values considered for this example are obtained using Fernandez et al (2012)The procedure used for the computation of these values comes from an analysis ofvariance type methodology Under the assumptions made before the model for thecircular means is
φij = micro+ αi + βj
where αi is the gene effect and βj is the experiment effect The proposed model isanalogous to the standard two-way analysis of variance model and is fully detailed inthe supplementary material of Fernandez et al (2012)
For each of the 10 experiments we test the hypothesis (6) using the function condtest
that is in our software The following code gives the p-values for each experimentResults are summarized in Table 5
gt data(cirgenes)
gt kappas lt- c(264773 324742 215936 415314 454357
+ 2907610 651408 1419445 566920 1112889)
gt allresults lt- list()
gt resultIsoCIRE lt- matrix(ncol = ncol(cirgenes) nrow = nrow(cirgenes))
gt SCEs lt- vector(mode = numeric length = nrow(cirgenes))
gt pvalues lt- vector(mode = numeric length = nrow(cirgenes))
gt for (i in 1 nrow(cirgenes))
+ k lt- kappas[i]
+ genes lt- asnumeric(cirgenes[i isna(cirgenes[i ]) ])
+ allresults[[i]] lt- condtest(genes kappa = k)
+ resultIsoCIRE[ i isna(cirgenes[i ]) ] lt- unlist(allresults[[i]]$CIRE)
+ SCEs[i] lt- allresults[[i]]$SCE
+ pvalues[i] lt- allresults[[i]]$pvalue
+
From the p-values in Table 5 we see that the null hypothesis cannot be rejected in anyof the 10 experiments even at a level of significance as high as 020 Therefore it seemsplausible that the peak expressions of these 16 genes in S Pombe (fission yeast) followthe same order as their S Cerevisiae (budding yeast) orthologs
5 Conclusions
In this paper the R package isocir has been presented This package provides usefultools for drawing inferences from circular data under order restrictions There are twomain functions (CIRE and condtest) The first one computes the CIRE the circularversion of the widely known isotonic regression in Rq The second one is designed fortesting circular hypotheses using a conditional test We have also created the classisocir in order to properly save all the results Although we illustrated the proposedmethodology using an example from cell biology the proposed software can be applied
12 isocir An R Package for Isotonic Inference for Circular Data
to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))
We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas
Acknowledgments
SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript
References
Agostinelli C Lund U (2011) circular Circular Statistics CA Department ofEnvironmental Sciences Informatics and Statistics Carsquo Foscari University VeniceItaly UL Department of Statistics California Polytechnic State University SanLuis Obispo California USA R package version 04-3 URL httpsr-forge
r-projectorgprojectscircular
Balabdaoui F Rufibach K Santambrogio F (2009) OrdMonReg Compute LeastSquares Estimates of One Bounded or Two Ordered Isotonic Regression Curves Rpackage version 102 URL httpCRANR-projectorgpackage=OrdMonReg
Boles L Lohmann K (2003) ldquoTrue Navigation and Magnetic Maps in Spiny LobstersrdquoNature 421 60ndash63
Bowers J Morton I Mould G (2000) ldquoDirectional Statistics of the Wind and WavesrdquoApplied Ocean Research 22 13ndash30
Cermakian N Lamont E Bourdeau P Boivin D (2011) ldquoCircadian Clock Gene Expres-sion in Brain Regions of Alzheimerrsquos Disease Patients and Control Subjectsrdquo Journalof Biological Rhythms 26 160ndash170
Chasalow S (2010) combinat Combinatorics Utilities R package version 00-8 URLhttpCRANR-projectorgpackage=combinat
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 9
The output is saved in example1CIRE and the printed output is as follows
gt example1CIRE
Thus the constrained estimates satisfy the required order as followsφ1 = 0993
φ2 = 1475
φ3 = 3066
φ4 = 5056
φ5 = 3066
φ6 = 5056
φ7 = 5056
φ8 = 0993
where φ is the Circular Isotonic Regression Estimator of φ Results may be displayedgraphically by setting plot(example1CIRE) When done so a plot with the pointsof the CIRE is produced To see the plot for the unrestricted estimates the argumentoption=cirmeans can be used (ie plot(example1CIRE option = cirmeans))
Example 2 (κ unknown (replications needed))
Using the data in our package called datareplic we demonstrate the use of the functioncondtest when κ is unknown As remarked earlier when κ is unknown we need repli-cate data to estimate κ The file datareplic is a matrix where each column containsthe values of a replication and each row the angles observed at each population meanIn this example we have 8 populations and hypotheses regarding the corresponding 8parameters are as follows
H0 φ1 φ2 φ3 φ4 φ5 φ6 φ7 φ8 φ1
H1 H0 is not true
We take the data from the package and set the groups of the order in the argumentgroups
gt data(datareplic)
gt orderGroups2 lt- c(18)
Since replicate data are available we do not include the argument kappa as we wantthe function to estimate it Moreover we correct the bias in the estimation of κ so weset biasCorrect=TRUE Thus we have the following code
gt example2test lt- condtest(datareplicgroups=orderGroups2biasCorrect=TRUE)
gt example2test
The result is the p-value defined in (5) Since the p-value = 00034 we reject the nullhypothesis and conclude that the parameters do not satisfy the specified circular order
If the user is interested in printing the unrestricted circular means θ then the followingcommand is used example2test$cirmeans The result is a list that is saved in thesame format as CIRE Since each group in the circular order has a single element it isconvenient to use the vector format Hence we have
10 isocir An R Package for Isotonic Inference for Circular Data
gt round(unlist(example2test$cirmeans) digits = 3)
4 Application to Analysis of the Cell Cycle Gene Expression Data
As noted in the Introduction there has been considerable discussion in the literatureon the conservation of various aspects of cell cycle genes (Fernandez et al (2012))particularly between two yeast species namely S Cerevisiae (budding yeast) and SPombe (fission yeast) Using the 10 published budding yeast data sets (Rustici et al(2004) Oliva et al (2005) Peng et al (2005)) we illustrate the isocir package to testthe null hypothesis that 16 fission yeast genes namely ssb1 cdc22 msh6 psm3 rad21cig2 mik1 h33 hhf1 hht3 hta2 htb1 fkh2 chs2 sid2 and slp1 satisfy the samecircular order as their budding yeast orthologs (RFA1 RNR1 MSH6 SMC3 MCD1CLN2 SWE1 HHT2 HHF1 HHT1 HTA2 HTB2 FKH1 CHS2 DBF2 and CDC20)whose circular order is obtained from cyclebase (httpwwwcyclebaseorg) andpublished literature Thus we test the following hypothesis
H0 φssb1 φcdc22 φmsh6 φpsm3 φrad21 φcig2 φmik1 φh33 φhhf1 φhht3 φhta2 φhtb1 φfkh2 φchs2 φsid2 φslp1 φssb1
H1 H0 is not true
(6)
For each of the 10 experimental data sets the unconstrained estimates of the phaseangles of the above 16 fission yeast genes appearing in Table 5 were obtained usingthe Random Periods Model (Liu et al (2004)) The R code for that software canbe obtained from httpwwwniehsnihgovresearchatniehslabsbbstaff
peddadaindexcfm
Notice that there are no replicated data here since the experiments were not performedunder the same experimental conditions It appears that the 10 experiments were notsynchronized (ie cells were probably not arrested at the same point in the cell cycle)For this reason from Table 5 it appears that there is a large variability in the estimatesof phase angles of each of the 16 genes Even though there may be large variability inthe estimated values our interest is in the relative order of phase angles among the 16genes which does not rely on the location of the pole and hence does not rely on thesynchronization As there are no replicated data we have a single observation for eachof the 16 fission yeast genes in each experiment and therefore the values in Table 5play the role of the unrestricted circular mean in each experiment Consequently wesuppose that
θij simind M(φij κj) i = 1 2 16 j = 1 2 10
where θij is the unrestricted circular mean of the gene i in the experiment j
Since the 10 experiments may not considered as replications of each other we performeda separate test for each experiment Moreover as explained in Fernandez et al (2012)we assume that the concentration parameter κj depends on the experiment but not onthe gene The reason for this is that out of the two sources of uncertainty one specificto the gene and another one due to the experiment (and therefore common to all genes
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11
within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene
The κj values considered for this example are obtained using Fernandez et al (2012)The procedure used for the computation of these values comes from an analysis ofvariance type methodology Under the assumptions made before the model for thecircular means is
φij = micro+ αi + βj
where αi is the gene effect and βj is the experiment effect The proposed model isanalogous to the standard two-way analysis of variance model and is fully detailed inthe supplementary material of Fernandez et al (2012)
For each of the 10 experiments we test the hypothesis (6) using the function condtest
that is in our software The following code gives the p-values for each experimentResults are summarized in Table 5
gt data(cirgenes)
gt kappas lt- c(264773 324742 215936 415314 454357
+ 2907610 651408 1419445 566920 1112889)
gt allresults lt- list()
gt resultIsoCIRE lt- matrix(ncol = ncol(cirgenes) nrow = nrow(cirgenes))
gt SCEs lt- vector(mode = numeric length = nrow(cirgenes))
gt pvalues lt- vector(mode = numeric length = nrow(cirgenes))
gt for (i in 1 nrow(cirgenes))
+ k lt- kappas[i]
+ genes lt- asnumeric(cirgenes[i isna(cirgenes[i ]) ])
+ allresults[[i]] lt- condtest(genes kappa = k)
+ resultIsoCIRE[ i isna(cirgenes[i ]) ] lt- unlist(allresults[[i]]$CIRE)
+ SCEs[i] lt- allresults[[i]]$SCE
+ pvalues[i] lt- allresults[[i]]$pvalue
+
From the p-values in Table 5 we see that the null hypothesis cannot be rejected in anyof the 10 experiments even at a level of significance as high as 020 Therefore it seemsplausible that the peak expressions of these 16 genes in S Pombe (fission yeast) followthe same order as their S Cerevisiae (budding yeast) orthologs
5 Conclusions
In this paper the R package isocir has been presented This package provides usefultools for drawing inferences from circular data under order restrictions There are twomain functions (CIRE and condtest) The first one computes the CIRE the circularversion of the widely known isotonic regression in Rq The second one is designed fortesting circular hypotheses using a conditional test We have also created the classisocir in order to properly save all the results Although we illustrated the proposedmethodology using an example from cell biology the proposed software can be applied
12 isocir An R Package for Isotonic Inference for Circular Data
to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))
We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas
Acknowledgments
SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript
References
Agostinelli C Lund U (2011) circular Circular Statistics CA Department ofEnvironmental Sciences Informatics and Statistics Carsquo Foscari University VeniceItaly UL Department of Statistics California Polytechnic State University SanLuis Obispo California USA R package version 04-3 URL httpsr-forge
r-projectorgprojectscircular
Balabdaoui F Rufibach K Santambrogio F (2009) OrdMonReg Compute LeastSquares Estimates of One Bounded or Two Ordered Isotonic Regression Curves Rpackage version 102 URL httpCRANR-projectorgpackage=OrdMonReg
Boles L Lohmann K (2003) ldquoTrue Navigation and Magnetic Maps in Spiny LobstersrdquoNature 421 60ndash63
Bowers J Morton I Mould G (2000) ldquoDirectional Statistics of the Wind and WavesrdquoApplied Ocean Research 22 13ndash30
Cermakian N Lamont E Bourdeau P Boivin D (2011) ldquoCircadian Clock Gene Expres-sion in Brain Regions of Alzheimerrsquos Disease Patients and Control Subjectsrdquo Journalof Biological Rhythms 26 160ndash170
Chasalow S (2010) combinat Combinatorics Utilities R package version 00-8 URLhttpCRANR-projectorgpackage=combinat
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
10 isocir An R Package for Isotonic Inference for Circular Data
gt round(unlist(example2test$cirmeans) digits = 3)
4 Application to Analysis of the Cell Cycle Gene Expression Data
As noted in the Introduction there has been considerable discussion in the literatureon the conservation of various aspects of cell cycle genes (Fernandez et al (2012))particularly between two yeast species namely S Cerevisiae (budding yeast) and SPombe (fission yeast) Using the 10 published budding yeast data sets (Rustici et al(2004) Oliva et al (2005) Peng et al (2005)) we illustrate the isocir package to testthe null hypothesis that 16 fission yeast genes namely ssb1 cdc22 msh6 psm3 rad21cig2 mik1 h33 hhf1 hht3 hta2 htb1 fkh2 chs2 sid2 and slp1 satisfy the samecircular order as their budding yeast orthologs (RFA1 RNR1 MSH6 SMC3 MCD1CLN2 SWE1 HHT2 HHF1 HHT1 HTA2 HTB2 FKH1 CHS2 DBF2 and CDC20)whose circular order is obtained from cyclebase (httpwwwcyclebaseorg) andpublished literature Thus we test the following hypothesis
H0 φssb1 φcdc22 φmsh6 φpsm3 φrad21 φcig2 φmik1 φh33 φhhf1 φhht3 φhta2 φhtb1 φfkh2 φchs2 φsid2 φslp1 φssb1
H1 H0 is not true
(6)
For each of the 10 experimental data sets the unconstrained estimates of the phaseangles of the above 16 fission yeast genes appearing in Table 5 were obtained usingthe Random Periods Model (Liu et al (2004)) The R code for that software canbe obtained from httpwwwniehsnihgovresearchatniehslabsbbstaff
peddadaindexcfm
Notice that there are no replicated data here since the experiments were not performedunder the same experimental conditions It appears that the 10 experiments were notsynchronized (ie cells were probably not arrested at the same point in the cell cycle)For this reason from Table 5 it appears that there is a large variability in the estimatesof phase angles of each of the 16 genes Even though there may be large variability inthe estimated values our interest is in the relative order of phase angles among the 16genes which does not rely on the location of the pole and hence does not rely on thesynchronization As there are no replicated data we have a single observation for eachof the 16 fission yeast genes in each experiment and therefore the values in Table 5play the role of the unrestricted circular mean in each experiment Consequently wesuppose that
θij simind M(φij κj) i = 1 2 16 j = 1 2 10
where θij is the unrestricted circular mean of the gene i in the experiment j
Since the 10 experiments may not considered as replications of each other we performeda separate test for each experiment Moreover as explained in Fernandez et al (2012)we assume that the concentration parameter κj depends on the experiment but not onthe gene The reason for this is that out of the two sources of uncertainty one specificto the gene and another one due to the experiment (and therefore common to all genes
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11
within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene
The κj values considered for this example are obtained using Fernandez et al (2012)The procedure used for the computation of these values comes from an analysis ofvariance type methodology Under the assumptions made before the model for thecircular means is
φij = micro+ αi + βj
where αi is the gene effect and βj is the experiment effect The proposed model isanalogous to the standard two-way analysis of variance model and is fully detailed inthe supplementary material of Fernandez et al (2012)
For each of the 10 experiments we test the hypothesis (6) using the function condtest
that is in our software The following code gives the p-values for each experimentResults are summarized in Table 5
gt data(cirgenes)
gt kappas lt- c(264773 324742 215936 415314 454357
+ 2907610 651408 1419445 566920 1112889)
gt allresults lt- list()
gt resultIsoCIRE lt- matrix(ncol = ncol(cirgenes) nrow = nrow(cirgenes))
gt SCEs lt- vector(mode = numeric length = nrow(cirgenes))
gt pvalues lt- vector(mode = numeric length = nrow(cirgenes))
gt for (i in 1 nrow(cirgenes))
+ k lt- kappas[i]
+ genes lt- asnumeric(cirgenes[i isna(cirgenes[i ]) ])
+ allresults[[i]] lt- condtest(genes kappa = k)
+ resultIsoCIRE[ i isna(cirgenes[i ]) ] lt- unlist(allresults[[i]]$CIRE)
+ SCEs[i] lt- allresults[[i]]$SCE
+ pvalues[i] lt- allresults[[i]]$pvalue
+
From the p-values in Table 5 we see that the null hypothesis cannot be rejected in anyof the 10 experiments even at a level of significance as high as 020 Therefore it seemsplausible that the peak expressions of these 16 genes in S Pombe (fission yeast) followthe same order as their S Cerevisiae (budding yeast) orthologs
5 Conclusions
In this paper the R package isocir has been presented This package provides usefultools for drawing inferences from circular data under order restrictions There are twomain functions (CIRE and condtest) The first one computes the CIRE the circularversion of the widely known isotonic regression in Rq The second one is designed fortesting circular hypotheses using a conditional test We have also created the classisocir in order to properly save all the results Although we illustrated the proposedmethodology using an example from cell biology the proposed software can be applied
12 isocir An R Package for Isotonic Inference for Circular Data
to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))
We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas
Acknowledgments
SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript
References
Agostinelli C Lund U (2011) circular Circular Statistics CA Department ofEnvironmental Sciences Informatics and Statistics Carsquo Foscari University VeniceItaly UL Department of Statistics California Polytechnic State University SanLuis Obispo California USA R package version 04-3 URL httpsr-forge
r-projectorgprojectscircular
Balabdaoui F Rufibach K Santambrogio F (2009) OrdMonReg Compute LeastSquares Estimates of One Bounded or Two Ordered Isotonic Regression Curves Rpackage version 102 URL httpCRANR-projectorgpackage=OrdMonReg
Boles L Lohmann K (2003) ldquoTrue Navigation and Magnetic Maps in Spiny LobstersrdquoNature 421 60ndash63
Bowers J Morton I Mould G (2000) ldquoDirectional Statistics of the Wind and WavesrdquoApplied Ocean Research 22 13ndash30
Cermakian N Lamont E Bourdeau P Boivin D (2011) ldquoCircadian Clock Gene Expres-sion in Brain Regions of Alzheimerrsquos Disease Patients and Control Subjectsrdquo Journalof Biological Rhythms 26 160ndash170
Chasalow S (2010) combinat Combinatorics Utilities R package version 00-8 URLhttpCRANR-projectorgpackage=combinat
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 11
within the experiment) the former source maybe considered negligible relative to thelatter as the number of time points used in each time course experiment is fairly largefor any specific gene
The κj values considered for this example are obtained using Fernandez et al (2012)The procedure used for the computation of these values comes from an analysis ofvariance type methodology Under the assumptions made before the model for thecircular means is
φij = micro+ αi + βj
where αi is the gene effect and βj is the experiment effect The proposed model isanalogous to the standard two-way analysis of variance model and is fully detailed inthe supplementary material of Fernandez et al (2012)
For each of the 10 experiments we test the hypothesis (6) using the function condtest
that is in our software The following code gives the p-values for each experimentResults are summarized in Table 5
gt data(cirgenes)
gt kappas lt- c(264773 324742 215936 415314 454357
+ 2907610 651408 1419445 566920 1112889)
gt allresults lt- list()
gt resultIsoCIRE lt- matrix(ncol = ncol(cirgenes) nrow = nrow(cirgenes))
gt SCEs lt- vector(mode = numeric length = nrow(cirgenes))
gt pvalues lt- vector(mode = numeric length = nrow(cirgenes))
gt for (i in 1 nrow(cirgenes))
+ k lt- kappas[i]
+ genes lt- asnumeric(cirgenes[i isna(cirgenes[i ]) ])
+ allresults[[i]] lt- condtest(genes kappa = k)
+ resultIsoCIRE[ i isna(cirgenes[i ]) ] lt- unlist(allresults[[i]]$CIRE)
+ SCEs[i] lt- allresults[[i]]$SCE
+ pvalues[i] lt- allresults[[i]]$pvalue
+
From the p-values in Table 5 we see that the null hypothesis cannot be rejected in anyof the 10 experiments even at a level of significance as high as 020 Therefore it seemsplausible that the peak expressions of these 16 genes in S Pombe (fission yeast) followthe same order as their S Cerevisiae (budding yeast) orthologs
5 Conclusions
In this paper the R package isocir has been presented This package provides usefultools for drawing inferences from circular data under order restrictions There are twomain functions (CIRE and condtest) The first one computes the CIRE the circularversion of the widely known isotonic regression in Rq The second one is designed fortesting circular hypotheses using a conditional test We have also created the classisocir in order to properly save all the results Although we illustrated the proposedmethodology using an example from cell biology the proposed software can be applied
12 isocir An R Package for Isotonic Inference for Circular Data
to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))
We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas
Acknowledgments
SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript
References
Agostinelli C Lund U (2011) circular Circular Statistics CA Department ofEnvironmental Sciences Informatics and Statistics Carsquo Foscari University VeniceItaly UL Department of Statistics California Polytechnic State University SanLuis Obispo California USA R package version 04-3 URL httpsr-forge
r-projectorgprojectscircular
Balabdaoui F Rufibach K Santambrogio F (2009) OrdMonReg Compute LeastSquares Estimates of One Bounded or Two Ordered Isotonic Regression Curves Rpackage version 102 URL httpCRANR-projectorgpackage=OrdMonReg
Boles L Lohmann K (2003) ldquoTrue Navigation and Magnetic Maps in Spiny LobstersrdquoNature 421 60ndash63
Bowers J Morton I Mould G (2000) ldquoDirectional Statistics of the Wind and WavesrdquoApplied Ocean Research 22 13ndash30
Cermakian N Lamont E Bourdeau P Boivin D (2011) ldquoCircadian Clock Gene Expres-sion in Brain Regions of Alzheimerrsquos Disease Patients and Control Subjectsrdquo Journalof Biological Rhythms 26 160ndash170
Chasalow S (2010) combinat Combinatorics Utilities R package version 00-8 URLhttpCRANR-projectorgpackage=combinat
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
12 isocir An R Package for Isotonic Inference for Circular Data
to a wide range of contexts For example biologists working on circadian clocks maybe interested in the testing for the conservation of circular order among circadian genesbetween two tissues (eg Liu et al (2006))
We would like to emphasize that the field of constrained inference on a unit circle is in itsinfancy and is wide open for new developments both in methods as well as applicationsAs observed in the introduction such constrained inference problems arise naturally inmany applications Therefore we expect the software described in this paper to bewidely used by researchers working in such areas
Acknowledgments
SBrsquos MAFrsquos and CRrsquos research was partially supported by Spanish MCI grant MTM2009-11161 SBrsquos work has been partially financed by Junta de Castilla y Leon Consejerıade Educacion and the European Social Fund within the PO Castilla y Leon 2007-2013programme SDPrsquos research [in part] was supported by the Intramural Research Pro-gram of the NIH National Institute of Environmental Health Sciences (Z01 ES101744-04) We thank Dr Leping Li Keith Shockley the anonymous reviewers the associateeditor and the editor for several useful comments which improved the presentation ofthis manuscript
References
Agostinelli C Lund U (2011) circular Circular Statistics CA Department ofEnvironmental Sciences Informatics and Statistics Carsquo Foscari University VeniceItaly UL Department of Statistics California Polytechnic State University SanLuis Obispo California USA R package version 04-3 URL httpsr-forge
r-projectorgprojectscircular
Balabdaoui F Rufibach K Santambrogio F (2009) OrdMonReg Compute LeastSquares Estimates of One Bounded or Two Ordered Isotonic Regression Curves Rpackage version 102 URL httpCRANR-projectorgpackage=OrdMonReg
Boles L Lohmann K (2003) ldquoTrue Navigation and Magnetic Maps in Spiny LobstersrdquoNature 421 60ndash63
Bowers J Morton I Mould G (2000) ldquoDirectional Statistics of the Wind and WavesrdquoApplied Ocean Research 22 13ndash30
Cermakian N Lamont E Bourdeau P Boivin D (2011) ldquoCircadian Clock Gene Expres-sion in Brain Regions of Alzheimerrsquos Disease Patients and Control Subjectsrdquo Journalof Biological Rhythms 26 160ndash170
Chasalow S (2010) combinat Combinatorics Utilities R package version 00-8 URLhttpCRANR-projectorgpackage=combinat
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 13
Cochran W Mouritsen H Wikelski M (2004) ldquoMigrating Songbirds Recalibrate TheirMagnetic Compass Daily from Twilight Cuesrdquo Science 304 405ndash408
de Leeuw J Hornik K Mair P (2009) ldquoIsotone Optimization in R Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methodsrdquo Journal of Statistical Soft-ware 32(5) 1ndash24 URL httpwwwjstatsoftorgv32i05
Fernandez M Rueda C Peddada S (2012) ldquoIdentification of a core set of signaturecell cycle genes whose relative order of time to peak expression is conserved acrossspeciesrdquo Nucl Acids Res 40(7) 2823ndash2832 URL httpnaroxfordjournals
orgcontent4072823
Jammalamadaka S SenGupta A (2001) Topics in Circular Statistics World Scientific
Jensen JL Jensen T Lichtenberg U Brunak S Bork P (2006) ldquoCo-evolution of tran-scriptional and post-translational cell-cycle regulationrdquo Nature 443 594ndash597
Kibiak T Jonas C (2007) ldquoApplying Circular Statistics to the Analysis of MonitoringDatardquo European Journal of Psychological Assessment 23 227ndash237
Liu D Peddada S Li L Weinberg C (2006) ldquoPhase Analysis of Circadian-RelatedGenes in Two Tissuesrdquo BMC Bioinformatics 7 87
Liu D Umbach D Peddada S Li L Crockett P Weinberg C (2004) ldquoA RandomPeriods Model for Expression of Cell-Cycle Genesrdquo PNAS 101(19) 7240ndash7245
Lund U Agostinelli C (2009) CircStats Circular Statistics from rdquoTopics in Circu-lar Statisticsrdquo (2001) R package version 02-4 URL httpCRANR-projectorg
package=CircStats
Mardia K Hughes G Taylor C Singh H (2008) ldquoA Multivariate von Mises Distributionwith Applications to Bioinformaticsrdquo Canadian Journal of Statistics 36 99ndash109
Mardia K Jupp P (2000) Directional Statistics John Wiley amp Sons
Oliva A Rosebrock A Ferrezuelo F Pyne S Chen H Skiena S Futcher B LeatherwoodJ (2005) ldquoThe Cell-Cycle-Regulated Genes of Schizosaccharomyces Pomberdquo PlosBiology 3 1239ndash1260
Peng X Karuturi R Miller L Lin K Jia Y Kondu P Wang L Wong L Liu EBalasubramanian M Liu J (2005) ldquoIdentification of Cell Cycle-Regulated Genes inFission Yeastrdquo The American Society for Cell Biology 16 1026ndash1042
R Development Core Team (2011) R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing Vienna Austria ISBN 3-900051-07-0 URL httpwwwR-projectorg
Robertson T Wright F Dykstra R (1988) Order Restricted Statitical Inference JohnWiley amp Sons
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
14 isocir An R Package for Isotonic Inference for Circular Data
Rueda C Fernandez M Peddada S (2009) ldquoEstimation of Parameters Subject to OrderRestrictions on a Circle with Application to Estimation of Phase Angles of Cell-CycleGenesrdquo Journal of the American Statistical Association 104(485) 338ndash347
Rustici G Mata J Kivinen K Lio P Penkett C Burns G Hayles J Brazma A NurseP Bahler J (2004) ldquoPeriodic Gene Expression Program of the Fission Yeast CellCyclerdquo Nature Genetics 36 809ndash817
Turner R (2009) Iso Functions to Perform Isotonic Regression R package version00-8 URL httpCRANR-projectorgpackage=Iso
van Doorn E Dhruva B Sreenivasan K Cassella V (2000) ldquoStatistics of Wind Directionand Its Incrementsrdquo Physics of Fluids 12 1529ndash1534
Zar J (1999) Biostatistical Analysis Prentice Hall
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 15
Unr
estr
icte
dC
ircu
lar
Mea
ns
``
``
```
```
``
Exp
erim
ents
Gen
esssb1
cdc2
2msh
6psm
3ra
d21
cig2
mik1
h33
hhf1
hht
3ht
a2ht
b1
fkh2
chs2
sid2
slp1
θ 1θ 2
θ 3θ 4
θ 5θ 6
θ 7θ 8
θ 9θ 1
0θ 1
1θ 1
2θ 1
3θ 1
4θ 1
5θ 1
6
1Oliva
cdc
020
20
218
626
25
765
089
35
612
625
71
178
091
21
200
097
11
289
529
85
597
425
25
209
2Oliva
elut1
294
03
262
281
12
848
160
32
382
170
94
689
435
54
717
441
94
397
160
01
819
175
12
519
3Oliva
elut2
044
00
447
525
86
206
438
15
458
604
51
541
072
76
115
035
20
687
393
53
970
583
65
896
4Pen
gcd
c3
328
356
53
387
280
63
194
326
03
026
477
94
694
475
54
816
467
52
685
277
02
885
242
2
5Pen
gelut
333
33
912
389
43
444
364
83
970
429
65
189
506
05
144
521
65
243
333
83
607
308
23
185
6Rust
cdc1
196
52
151
203
42
029
174
22
073
173
03
130
299
33
085
306
32
873
128
11
179
190
61
237
7Rust
cdc2
180
92
208
141
51
351
196
41
940
197
83
745
358
43
670
347
93
590
138
31
456
106
41
396
8Rust
elut1
mdash1
457
128
8mdash
153
01
374
137
92
420
227
92
409
231
12
245
101
01
146
mdash1
091
9Rust
elut2
221
41
786
173
01
987
187
81
882
307
12
704
278
82
814
290
92
740
135
21
441
127
51
420
10R
ust
elut3
234
02
702
270
42
526
297
92
319
228
43
773
356
73
636
346
53
431
198
11
716
252
32
118
Tab
le4
Init
ial
S
Pom
be
ph
ase
angl
edat
afo
rea
chex
per
imen
t
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
16 isocir An R Package for Isotonic Inference for Circular Data
CIR
Eunder
Circular
Order
SCE
p-value``
``
```
```
``
Experim
ents
Gen
esφ1
φ2
φ3
φ4
φ5
φ6
φ7
φ8
φ9
φ10
φ11
φ12
φ13
φ14
φ15
φ16
1Oliva
cdc6257
62576257
62570054
00540054
10451045
10851085
12895069
50695069
52091270
06658
2Oliva
elut12526
25262526
25262526
25262526
45154515
45154515
45151600
17851785
25191218
07214
3Oliva
elut25849
58495849
58495849
58496045
05980598
05980598
06873935
39705836
58492660
02437
4Peng
cdc3225
32253225
32253225
32253225
47364736
47494749
47492685
26932693
26930248
09983
5Peng
elut3333
37253725
37253725
39704296
51245124
51445216
52433302
33023302
33020156
09850
6Rust
cdc11961
19611961
19611961
19611961
30293029
30293029
30291230
12301571
15710213
04142
7Rust
cdc21693
16931693
16931952
19521978
36143614
36143614
36141301
13011301
13960296
09536
8Rust
elut1mdash
13731373
mdash1427
14271427
23332333
23332333
23331010
1118mdash
11180028
09992
9Rust
elut21909
19091909
19161916
19162837
28372837
28372837
28371352
13581358
14200125
09992
10Rust
elut32340
25852585
25852585
25852585
35743574
35743574
35741849
18492321
23210269
08748
Tab
le5
CIR
E
SC
Ean
dp-valu
esfor
eachex
perim
ent
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-
Sandra Barragan Miguel A Fernandez Cristina Rueda Shyamal Das Peddada 17
Affiliation
Sandra Barragan Miguel A Fernandez Cristina RuedaDepartamento de Estadıstica e Investigacion OperativaInstituto de Matematicas (IMUVA)Universidad de ValladolidValladolid SpainE-mail sandrabaeiouvaesmiguelafeiouvaescruedaeiouvaes
URL httpwwweiouvaes~sandrahttpwwweiouvaes~miguel
httpwwweiouvaes~cristina
Shyamal D PeddadaBiostatistics BranchNational Institute of Environmental Health SciencesResearch Triangle ParkNC 27709 USAE-mail peddadaniehsnihgovURL httpwwwniehsnihgovresearchatniehslabsbbstaffpeddadaindexcfm
- Introduction
- Circular models
-
- Description Orders
- CIRE and test
-
- isocir
-
- Background
- Package structure
- Examples
-
- Applications
- Conclusions
-