empirical characteristic function approach to goodness of fit tests

Upload: sameer-al-subh

Post on 06-Apr-2018

235 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    1/277

    Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc.

    November, 2010, Vol. 9, No. 2, 332-603 1538 9472/10/$95.00

    iii

    Journal Of Modern Applied Statistical Methods

    Shlomo S. Sawilowsky

    Editor College of EducationWayne State University

    Harvey Keselman Associate Editor

    Department of PsychologyUniversity of Manitoba

    Bruno D. Zumbo Associate Editor Measurement, Evaluation, & Research Methodology

    University of British Columbia

    Vance W. Berger Assistant Editor

    Biometry Research Group National Cancer Institute

    John L. Cuzzocrea Assistant Editor

    Educational ResearchUniversity of Akron

    Todd C. Headrick Assistant Editor

    Educational Psychology and Special EducationSouthern Illinois University-Carbondale

    Alan Klockars Assistant Editor

    Educational PsychologyUniversity of Washington

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    2/277

    Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc.

    November, 2010, Vol. 9, No. 2, 332-603 1538 9472/10/$95.00

    iv

    Journal Of Modern Applied Statistical Methods

    Invited Debate

    332 339 Daniel H. Robinson, The Not-So-Quiet Revolution: CautionaryJoel R. Levin Comments on the Rejection of HypothesisTesting in Favor of a Causal ModelingAlternative

    340 347 Joseph Lee Rodgers Statistical and Mathematical Modelingversus NHST? Theres No Competition!

    348 358 Lisa L. Harlow On Scientific Research: The Role of Statistical Modeling and HypothesisTesting

    Regular Articles359 368 Robert H. Pearson, Recommended Sample Size for Conducting

    Daniel J. Mundfrom Exploratory Factor Analysis onDichotomous Data

    369 378 Marcelo Angelo Cirillo, Generalized Variances Ratio Test for Daniel Furtado Ferreira, Comparing k Covariance Matrices fromThelma Sfadi, Dependent Normal PopulationsEric Batista Ferreira

    379 387 Kung-Jong Lui Notes on Hypothesis Testing under a Single-Stage Design in Phase II Trial

    388 402 Housila P. Singh, Effect of Measurement Errors on the SeparateNamrata Karpe and Combined Ratio and Product Estimators

    in Stratified Random Sampling

    403 413 Xian Liu, Reducing Selection Bias in AnalyzingCharles C. Engel, Longitudinal Health Data with HighHan Kang, Mortality RatesKristie L. Gore

    414 442 Oluseun Odumade, Use of Two Variables Having Common MeanSarjinder Singh to Improve the Bar-Lev, Bobovitch and

    Boukai Randomized Response Model

    443 451 Peyman Jafari, A Flexible Method for Testing IndependenceNoori Akhtar-Danesh, in Two-Way Contingency TablesZahra Bagheri

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    3/277

    Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc.

    November, 2010, Vol. 9, No. 2, 332-603 1538 9472/10/$95.00

    v

    452 460 Seema Jaggi, Neighbor Balanced Block Designs for Two

    Cini Varghese, FactorsN. R. Abeynayake

    461 469 Moustafa Omar Ahmed Adjusted Confidence Interval for theAbu-Shawiesh Population Median of the Exponential

    Distribution

    470 479 K. A. Bashiru, Nonlinear Trigonometric TransformationO. E. Olowofeso, Time Series ModelingS. A. Owabumoye

    480 487 Hilmi F. Kittani Incidence and Prevalence for A TriplyCensored Data

    488 494 Mowafaq M. Al-Kassab, A Comparison between Unbiased Ridge andOmar Q. Qwaider Least Squares Regression Methods Using

    Simulation Technique

    495 501 Hatice Samkar, Ridge Regression Based on Some RobustOzlem Alpu Estimators

    502 511 Sanizah Ahmad, Robust Estimators in Logistic Regression:Norazan Mohamed Ramli, A Comparative Simulation StudyHabshah Midi

    512 519 Sunil Kumar, A General Class of Chain-Type EstimatorsHousila P. Singh, in the Presence of Non-Response Under Sandeep Bhougal Double Sampling Scheme

    520 535 Li-Chih Wang, A GA-Based Sales Forecasting ModelChin-Lien Wang Incorporating Promotion Factors

    536 546 Anton Abdulbasah Kamil, Maximum Downside Semi DeviationAdli Mustafa, Stochastic Programming for PortfolioKhilpah Ibrahim Optimization Problem

    547 557 Gyan Prakash On Bayesian Shrinkage Setup for ItemFailure Data Under a Family of Life TestingDistribution

    558 567 M. T. Alodat, Empirical Characteristic Function ApproachS. A. Al-Subh, to Goodness of Fit Tests for the LogisticKamaruzaman Ibrahim, Distribution under SRS and RSSAbdul Aziz Jemain

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    4/277

    Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc.

    November, 2010, Vol. 9, No. 2, 332-603 1538 9472/10/$95.00

    vi

    568 578 Sheikh Parvaiz Ahmad, Bayesian Analysis of Location-Scale Family

    Aquil Ahmed, of Distributions Using S-Plus and R SoftwareAthar Ali Khan

    579 583 Ozer Ozdemir, ANN Forecasting Models for ISE National-Atilla Aslanargun, 100 IndexSenay Asma

    584 595 Shafiqah Alawadhi, Markov Chain Analysis and StudentMokhtar Konsowa Academic Progress: An Empirical

    Comparative Study

    Brief Report 596 598 L. V. Nandakishore Bayesian Analysis for Component

    Manufacturing Process

    Emerging Scholars599 603 Hamid Reza Kamali, Estimating the Non-Existent Mean and

    Parisa Shahnazari- Variance of the F-Distribution by SimulationShahrezaei

    JMASM is an independent print and electronic journal (http://www.jmasm.com/), publishing (1) new statistical tests or procedures, or the comparison of existing statistical testsor procedures, using computer-intensive Monte Carlo, bootstrap, jackknife, or resamplingmethods, (2) the study of nonparametric, robust, permutation, exact, and approximaterandomization methods, and (3) applications of computer programming, preferably in Fortran(all other programming environments are welcome), related to statistical algorithms, pseudo-random number generators, simulation techniques, and self-contained executable code to carryout new or interesting statistical methods.

    Editorial Assistant: Julie M. SmithInternet Sponsor: Paula C. Wood , Dean, College of Education, Wayne State University

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    5/277

    Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc. November 2010, Vol. 9, No. 2, 332-339 1538 9472/10/$95.00

    332

    INVITED DEBATE The Not-So-Quiet Revolution: Cautionary Comments on the Rejection of

    Hypothesis Testing in Favor of a Causal Modeling Alternative

    Daniel H. Robinson Joel R. LevinUniversity of Texas University of Arizona

    Rodgers (2010) recently applauded a revolution involving the increased use of statistical modelingtechniques. It is argued that such use may have a downside, citing empirical evidence in educational

    psychology that modeling techniques are often applied in cross-sectional, correlational studies to produceunjustified causal conclusions and prescriptive statements.

    Key words: Modeling, hypothesis testing, SEM, HLM, causation.

    Daniel H. Robinson is Professor of EducationalPsychology and editor of Educational

    Psychology Review. His research interestsinclude educational technology innovations thatmay facilitate learning, team-based approachesto learning, and examining gender trendsconcerning authoring, reviewing, and editingarticles published in various educational journalsand societies. Email him at:

    [email protected] R. Levin is Professor Emeritus at theUniversity of Arizona and the University of Wisconsin-Madison. He is former editor of the

    Journal of Educational Psychology , and former Chief Editorial Advisor for journal publicationsof the American Psychological Association. Hisresearch interests include the design and

    statistical analysis of educational research, aswell as cognitive-instructional strategies thatimprove students learning. Email him at:

    [email protected].

    IntroductionOver the years, we have found that Joseph

    Rodgers (e.g., Rodgers, Cleveland, van denOord, & Rowe, 2000; Rodgers & Nicewander,1988) has something academically interesting,meaty, and instructive to say. Against that

    backdrop, Rodgers most recent essay, provocatively titled The epistemology of mathematical and statistical modeling: A quietmethodological revolution (Rogers, 2010)merits close examination and extensive

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    6/277

    ROBINSON & LEVIN

    333

    commentary. Rodgers appeared to have missedthe mark in two critical respects; both reflectedin the subtitle A quiet methodologicalrevolution, because as will become apparent inthe following discussion, the revolution isneither quiet nor methodological.

    The Null Hypothesis HullabalooRodgers is correct in stating that serious

    concerns about null hypothesis significancetesting (NHST) have been mounting over the

    past several decades. Yet, as is well representedin Harlow, Mulaik, & Steigers (1997)impressive volume, NHST criticisms havehardly been expressed quietly, but rather withfull sound and fury. Moreover, in making hiscase, Rodgers provided a one-sided view of thecontroversy. Although several sources that indict

    NHST were cited, short shrift was given toapproaches that have defended reasonable and

    proper applications of statistical hypothesistesting, including, among others, decidingwhether a believed-random process is trulyrandom (e.g., Abelson, 1997), intelligenthypothesis testing (Levin, 1998a), equivalencetesting (e.g., Serlin & Lapsley, 1993), andhypothesis testing supplemented by effect-sizeestimation and/or confidence-intervalconstruction (Steiger, 2004).

    In addition, numerous authors have

    defended the use of NHST when mindfullyapplied (e.g., Frick, 1996; Hagen, 1997;Robinson & Levin, 1997; Wainer & Robinson,2003). Rodgers cited social-sciences statisticalsage Jacob Cohen (1994) as one who dismissed

    NHST practices in his 1994 seminal article,The Earth is Round ( p < .05). Yet, in the samearticle, one could easily interpret Cohens (p.1001) comment about the nonexistence of magical alternatives to NHST as conceding thatfor whatever good NHST does, there are noadequate substitutes.

    Rodgers (p. 2) described thefundamental difference between the Fisherianand Neyman-Pearson approaches, with the latter emphasiz[ing] the importance of the individualdecision. However, he characterized NHST as ahybrid and condemned it. Just because atechnique is often misused is not a sufficientreason to abandon it. For example, it is argued

    below that in educational psychology we have

    observed frequent misapplication of theRodgers favored causal modeling techniques. Inrecognizing that misapplication, however, our goal is not to deter researchers from adoptingmodeling techniques, but rather to encourageresearchers to apply such techniquesappropriately and to interpret wisely the resultsthat they pump out. (Back in the Neanderthalage of computers, grind out would have been amuch more fitting description.)

    As researchers who have spent most of our careers conducting randomized experiments,we have sought to apply NHST judiciously,typically adopting or adapting Neyman-Pearsona priori Type I, Type II error, effect-size, andsample-size specification principles.Accordingly, we have found that in experimentsconducted with rationally (or better, optimally)

    determined sample sizes - that is, sample sizesassociated with enough statistical power todetect nontrivial differences but with not toomuch power to detect trivial differences (see, for example, Levin, 1998b; and Walster & Cleary,1970) - NHST provides useful informationconcerning whether one has an experimentaleffect worth pursuing. In this context, pursuingmeans that obtaining a statistically significanteffect is followed by a sufficient number of independent replications until the researcher hasconfidence that the initially observed effect is a

    statistically reliable one (see, for example, Levin& Robinson, 2003).

    In that sense as well, we have regarded NHST primarily as a screening device, similar infunction to what Sir Ronald had in mind (e.g.,Fisher, 1935). Much of the hullabaloo about

    NHST is caused by too many researchersfocusing on the results of a single study rather than on a series of studies that are part of a

    program of research (Levin & Robinson, 2000).Fisher was never satisfied with an effectidentified in a single study, even if it had a p

    value of less than 0.05! Instead, he believed thata treatment was only worth writing home aboutwhen it had consistently appeared in numerousexperiments. As is implied in the followingsection, whatever purported advantagesmodeling techniques have over NHST alsovanish unless researchers test a priori models inmultiple experiments.

    Rodgers (p. 3) also condemned the

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    7/277

    REJECTION OF HYPOTHESIS TESTING IN FAVOR OF CAUSAL MODELING

    334

    NHST jurisprudence model while aptly referringto Tukeys (1977) confirmatory data analysisstrategy as being judicial (or quasi-judicial) innature. Yet, Rodgers mischaracterized Tukeysexploratory data analysis strategy insofar as thedetective nature of that hypothesis-generatingapproach clearly is not jurisprudence. It is thisdetective role that one emphasizes when using

    NHST simply as a research-based screening process to determine whether posited effectsexist. To us, convincing a jury of ones peersthat a prescription for practice should be basedon a single research study is rarely, if ever,

    justified.Rodgers (p. 9) assertion that a

    fundamental problem with NHST is one of testing valueless nil null hypotheses has beenadvanced by many critics. As researchers who

    endeavor to use intelligent forms of hypothesistesting with experimental data, we regard the

    problem of nil nulls not as a statistical issue butas a methodological one. Specifically, it makeslittle or no conceptual sense to apply NHSTwhen comparing an instructional treatment witha closet (Levin, 1994, p. 233) control group(i.e., a condition in which participants sit in adark room and do nothing), just as it is inane tocompute p-values for reliability correlations(see, for example, Thompson, 1996).Educational psychology is filled with such

    examples of comparing new innovations withridiculous straw-person control conditions thatno sane researcher would ever consider using. Amore appropriate formulation of a nil null iswhen an investigator wishes to compare a newlydeveloped and previously untested experimentaltreatment with the best treatment that iscurrently available.

    According to Rodgers, the [1999 task force assembled by the American PsychologicalAssociation] concluded that NHST was brokenin [a] certain respect (p. 3). Task-force member

    Wainer and the present first author (Wainer &Robinson, 2003) provided a different view of thetask forces brief consideration of therecommendation to issue an outright ban on

    NHST. As we have argued previously (e.g.,Levin & Robinson, 1999) and in our precedingdiscussion, adopting such an extreme stancewould be akin to calling for a ban on hammers

    because hammerers were hammering their

    fingers instead of nails (for additionaldiscussion, see Levin & Robinson, 2003). Eventhe outspoken NHST critic Rozeboom (1997)acknowledged via another tools analogy thatthe sharpest of scalpels can only create a messif misdirected by the hand that drives it, (p.335). Fortunately, in the case of the most recent(6 th) edition of the APA Publication Manual (American Psychological Association, 2010),the hypothesis-testing baby was not thrown outwith the bath water.

    Causal Modeling TechniquesContemporary modeling techniques,

    including structural equations modeling (SEM)and hierarchical linear modeling (HLM), amongothers, which emerge from atheoretical/conceptual framework, are

    statistical/data-analytic and not methodologicalin nature. So, whence Rodgersmethodological revolution? Even he noted on

    p. 8 that SEM has been built into a powerfulanalytic method and is a prototype of the firstapproach [a model-comparison framework] to

    postrevolutionary modeling (p. 8).That a statistical modeling tail often

    wags the methodological dog may havecontributed to what we consider a major misuseof causal modeling: researchers attempting tosqueeze causality out of observational or

    correlational data. Because of the unfortunatecausal nomenclature, we fear that manyresearchers may be deluded into believing thatthe statistical control that such techniques

    provide for correlational (non-experimental)data is on a par with the genuine experimentalcontrol of randomized experiments (Levin &ODonnell, 2000, p. 211). This in turn results incausal-model appliers issuing causal conclusionsthat they mistakenly believe are scientificallyvalid. As Cliff (1983) previously noted, Literalacceptance of the results of fitting causal

    models to correlational data can lead toconclusions that are of questionable value (p.115).

    In addition, because causal-modelresearchers conclusions typically flow fromrevised data-driven models rather than from a

    priori theory-based model specifications, in theabsence of independent validations those causalconclusions present even more cause for

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    8/277

    ROBINSON & LEVIN

    335

    concern. As with our previous hammers vs.hammerers distinction, Rodgers is well aware of researchers potential shoddy application of causal modeling techniques. Yet, he could havesent a stronger cautionary message to therelatively uninitiated model builder than hisinnocuous pronouncement that the success of SEM depends on the extent to which it is appliedin many research settings (p. 8).

    To illustrate what we mean by prescriptive statements appearing in articles thatinclude statistical modeling techniques, we offer very recent examples that appeared in areputable educational psychology research

    journal. To avoid redundancy, we offer only twosuch unjustified causal excerpts here, fromnumerous ones that we have encountered inmultiple teaching-and-learning research journals

    that we have recently read or reviewed (seeRobinson, Levin, Thomas, Pituch, & Vaughn,2007, and the following section).

    Ciani, Middleton, Summers and Sheldon(2010)s Study

    The following summary appeared inCiani et al.s study abstract:

    Multilevel modeling was used to teststudent perceptions of three contextual

    buffers: classroom community, teachers

    autonomy support, and a masteryclassroom goal structureResults

    provide practitioners with tools for counteracting potential negativeimplications of emphasizing

    performance in the classroom. (p. 88)

    There was one predictor variable; one outcomevariable, a three-item scale that measuredstudents motivation to learn; and threemoderator variables, a three-item scale thatmeasured student perceptions of classroom

    community, a four-item scale that measuredstudent perceptions of instructor autonomysupport, and a three-item scale that measuredstudent perceptions of the extent to which their teacher emphasizes developing competence inthe classroom. All measures were collected at asingle point in time and HLM was used toanalyze the data. Here are a couple causalconclusions from the discussion section:

    However, it appears that comparingstudents achievement publicly, or usingthe work of the highest achievingstudents as an example for everyone,may not be so pernicious a practicewhen students in the classroom perceivea sense of community among their fellow classmates.

    [O]ur findings demonstrate that if students feel respected by the teacher,such that their preferences and ways of doing things are acknowledged andaccommodated as much as possible,then a strong performance orientation onthe part of the teacher is not harmful.Autonomy support enables students tointernalize what they are doing, so that

    they view their activity as importanteven if it is not enjoyable, or if it createsstress and pressure. Thus, it appears thatemphasizing competition betweenstudents is not necessarily underminingof student mastery goals, if the teacher can communicate and promote the

    performance structure in a non-controlling way. These findings arereassuring, showing that performanceorientations are not necessarilycorrosive certainly an important

    message, given the performancenecessities that all students face. (p. 95)

    As with most of these articles based oncorrelational data and yet that offer prescriptiverecommendations, certain limitations of theresearch are explicitly acknowledged by theauthors:

    The most significant limitation to thecurrent study is that all data reported arecorrelational.

    Gathering data at one point in time alsocreates a limitation regarding the causalrelationships among the variables in thisstudy. (p. 96)

    These limitations aside (or ignored?), the authors proceeded to offer the following prescriptive:

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    9/277

    REJECTION OF HYPOTHESIS TESTING IN FAVOR OF CAUSAL MODELING

    336

    Our findings, along with other goaltheorists (e.g., Urdan & Midgley, 2003),suggest that given current prevailingattitudes and policy it may be morefruitful to emphasize adaptiveinstructional practices in the classroom,as opposed to trying to reducemaladaptive practices. (p. 97)

    Thus, the authors made recommendations for practice (prescriptive statements) in theabsence of convincing evidence that such

    practices are clearly causally related to studentoutcomes.

    Chen, Wu, Kee, Lin & Shuis (2009) StudyChen et al. used SEM to analyze

    relations among fear of failure, achievement

    goals, and self-handicapping. Causal relationsamong the variables are implied in theDiscussion section:

    This finding shows fear of failure as adistal determinant of self-handicappingand achievement goals (MAv and PAv)as proximal determinants of self-handicapping, demonstrating themotivational process of self-handicapping. (p. 302)

    The authors revealed the perceived magicalquality of SEM allowing researchers to coaxcausality from correlational data:

    Since SEM analysis examines manyvariables relationships simultaneously, werely on its results as the basis for our conclusions and discussion. (p. 303)

    The Limitations section is predictable:

    Although we used the SEM approach to

    estimate the proposed model, the data inthe study are cross-sectional in natureand causal relations cannot be drawn.The longitudinal approach is preferredin order to ascertain the causal patternand to further clarify the chronic effectsof mastery-avoidance and performance-approach goals on achievement-relatedoutcomes. (p. 304)

    In contrast, what follows are the grand prescriptives that appeared in the Implicationsand Conclusions:

    We believe that the integrative modelcan help educators develop effectiveinterventions to reduce students self-handicapping, especially since we foundthat the mid-level achievement goals(MAv and PAv) mediate therelationships between fear of failure andself-handicapping it is suggested thatteachers use multiple indices to offer more opportunities for students to attainsuccess. In addition, teachers shouldencourage students to embrace amultiple goals perspective in whichdoing ones best and outperforming

    others are not in conflict with eachother. (p. 304)

    Rodgers (2010, p. 8) previously proffered caveataside, in both of the just-presented examples,cross-sectional (one time point), correlational(no variables were manipulated) data weretossed into a statistical modeling analysis andwhat popped out were causal conclusions.

    Correlational Data and Causal ConclusionsOver the past few years, we have

    examined empirical articles published in widelyread teaching- and-learning research journalsand have found that:

    1. In one journal survey (Hsieh et al., 2005),the proportion of articles based onintervention and experimental (randomassignment) methodology had decreasedfrom 47% in 1983 to 23% in 2004.

    2. In another journal survey (Robinson et al.,2007), the proportion of articles based on

    intervention methods had decreased from45% in 1994 to 33% in 2004. Meanwhile,the proportion of nonintervention articlesthat contained prescriptive statementsincreased from 34% in 1994 to 43% in 2004.The proportion of nonintervention (non-experimental and correlational) articles thatincluded prescriptive statements (in the formof causally implied implications for

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    10/277

    ROBINSON & LEVIN

    337

    educational practice) increased from 33% in1994 to 45% in 2004.

    3. In a follow-up to the just-describedRobinson et al. (2007) survey (Shaw, Walls,Dacy, Levin & Robinson, 2010), althoughonly 19 nonintervention studies in 1994included prescriptive statements, thesestatements were repeated in 30 subsequentarticles that had cited the original 19.

    For the present article, we examined thefirst two issues of the 1999 volume of the APA-

    published journal, the Journal of Educational Psychology , and again for the 2009 volume. Welooked specifically at the comparative

    proportions of articles based on correlationalmethods and those that involved interventions

    (either randomized experimental or nonrandomized but researcher manipulated), aswell as the proportion of correlational methodsarticles in which prescriptive statements wereoffered. The results are summarized in Table 1.

    Although roughly half of the articlesappearing in only one of the five journals thatwere part of Robinson et al.s (2007) study weresurveyed, the findings support the reportedtrends. Intervention studies (both randomizedand nonrandomized) are becoming increasinglyrare and instead researchers are basing their

    recommendations for practice on weaker evidence. Moreover, it appears that statistical

    and nonrandomized) are becoming increasinglyrare and instead researchers are nonrandomized)modeling techniques are becoming more popular - having increased from only 3% of thecorrelational research articles in 1999 to 40% in2009 - which may in turn contribute to theconcomitant 10-year increase in prescriptivestatements appearing in such articles.

    Thus, we have witnessed widespreadapplication of SEM, HLM, and other sophisticated statistical procedures incorrelational data contexts, where causality issought but the critical conditions needed toattribute causality are missing (e.g., Marley &Levin, 2011; Robinson, 2010). Rodgers statesthat researchers who are scientistsshould befocusing on building a modelembedded withinwell-developed theory (p. 4-5). Here we agree

    with former Institute for Educational ScienceDirector Grover Whitehurst who argued that - atleast in the field of education - we have enoughtheory development studies and need morestudies that address practical what worksquestions.

    It is our fear that a research approachwhere the question, Does the data fit mymodel? is far more dangerous than thequestion, Is there anything here worth

    pursuing? As we have seen, an affirmativeanswer to the former question seems to entitle a

    researcher to form a model that indicates acausal relationship between, say, students self-efficacy and their achievement. The researcher then develops a self-efficacy scale that measures

    Table 1: Summary of Selected Results of Surveyed Articles Appearing in the Journal of Educational Psychology (1999 and 2009) Based on Either Correlational or Intervention Methods

    1999 2009

    Type of Study Type of Study

    Correlational Intervention Correlational Intervention

    Number of Articles 18 (60%) 12 (40%) 23 (66%) 12 (34%)

    Prescriptive Statements 9 (50%) ------ a 13 (57%) ------ a

    Statistical Modeling 1 (3%) 0 (0%) 14 (40%) 2 (6%)

    Prescriptive Statements 1 ------ a 7 (50%) ------ a

    Note : This table includes preliminary data from a larger study recently completed by Reinhart, Haring,Levin, Patall, and Robinson (2011). a Not assessed in the present survey

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    11/277

    REJECTION OF HYPOTHESIS TESTING IN FAVOR OF CAUSAL MODELING

    338

    students self-perceptions and also measuresachievement. The data may fit the model but inthe absence of convincing longitudinal data,ruling out alternative explanations, andindependent replications based on the previousnice-fitting model, this practice may lead todangerous causal conclusions. For the just-

    presented self-efficacy example, it is just aslikely that high achievers feel better about their effectiveness as learners rather than the other way around. Apparently, many researchers

    believe that it is entirely appropriate to applysuch modeling techniques and to interpret theresults as support for prescriptive statementsfounded on causality.

    Conclusions About RevolutionsTo summarize, Rodgers (2010) has

    written a cogent essay on the vices of statisticalhypothesis testing and the virtues of statisticalmodeling. We believe, however, that his essay

    painted a somewhat distorted (and potentiallymisleading) portrait about those statistical arts.In particular, we take issue with two aspects of Rodgers so-called quiet methodologicalrevolution. For one aspect (rejecting statisticalhypothesis testing), we argue that the picture isneither as bleak nor as open and shut as Rodgers

    portrayed. As supporting evidence, witness thesustained presence of hypothesis testing, along

    with its more intelligent additions andadaptations, in various academic-researchdisciplines - including the research-and-

    publication bible of both our very own field of psychology and virtually all social-sciencesdomains, the most recent edition of the APA

    Publication Manual (American PsychologicalAssociation, 2010).

    For the other aspect of Rodgers essaythat merits critical commentary (acceptingmodeling techniques), we argue that causalmodeling and other related multivariate and

    multilevel data-analysis tools frequently causetheir users to think - in accord with Rodgersseductive subtitle - that the procedures aremethodological randomization-compensating

    panaceas rather than techniques that do the bestthey can to provide some degree of statisticalcontrol in a multiply confounded variableworld. The unfortunate consequence of thatmethodological understanding, then, is that

    when combined with researcher misapplicationof such modern modeling artillery, instead of

    being on target with their data analyses andresearch conclusions, weapons are backfiringand researchers are ending up (whether knowingly or not) with a considerable amount of egg on their faces.

    ReferencesAbelson, R. P. (1997). The surprising

    longevity of flogged horses: Why there is a case for the significance test. Psychological Science , 8, 12-15.

    American Psychological Association.(2010). Publication manual of the American

    Psychological Association (6 th Ed.). Washington,DC: American Psychological Association.

    Chen, L. H., Wu, C.-H., Kee, Y. H., Lin,M.-S., & Shui, S.-H. (2009). Fear of failure, 2 x 2achievement goal and self-handicapping: Anexamination of the hierarchical model of achievement motivation in physical education.Contemporary Educational Psychology , 34, 298-305.

    Ciani, K. D., Middleton, M. J., Summers, J.J., & Sheldon, K. M. (2010). Buffering against

    performance classroom goal structures: Theimportance of autonomy support and classroomcommunity. Contemporary Educational Psychology ,35, 88-99.

    Cliff, N. (1983). Some cautions concerningthe application of causal modeling methods.

    Multivariate Behavioral Research , 18, 115-126.

    Cohen, J. (1994). The earth is round ( p 0 (1)

    The parameter represents the mean number of events per unit time (e.g., the rate of arrivalsor the rate of failure). The exponential

    distribution is supported on the interval [0, ).The mean (expected value) of an exponentiallydistributed random variable X with rate

    parameter is given by:

    1)( == X E (2)

    In light of the examples given above, thismakes sense: if a person receives phone calls atan average rate of 2 per hour, then they canexpect to wait one-half hour for every call. Also,note that approximately 63% of the possiblevalues lie below the mean for any exponentialdistribution (Betteley, et al., 1994).

    The median of an exponentiallydistributed random variable X with rate

    parameter is given by:

    69315.0)2(ln

    == MD (3)

    The maximum likelihood estimator (MLE) for

    the rate parameter , given an independent andidentically distributed random sample of size n,

    n X X X ,...,, 21 , drawn from the exponential

    distribution, ( ) 1 Exp , is given by:

    n

    ii 1

    n 1XX

    =

    = = . (4)

    While this estimate is the most likelyreconstruction of the true parameter , it is onlyan estimate, and as such, the more data pointsavailable the better the estimate will be. Also,the MLEs are consistent estimators of their

    parameters and are asymptotically efficient(Casella & Berger, 2002).

    The Used EstimatorsThe sample mean, X , and the

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    136/277

    ABU-SHAWIESH M.O.A

    463

    sample median, 1 MD , which are used in thisstudy for constructing the proposedmodified confidence intervals for theexponential distribution median, MD , arenow introduced.

    The Sample Mean, X The sample mean is the most well

    known example of a measure of location, or average. It is defined for a set of values as thesum of values divided by the number of valuesand is denoted by X . The sample mean for arandom sample of size n observations

    n X X X ,...,, 21 can be defined as follows:

    n

    X X

    n

    i i== 1 (5)

    The main advantages of the sample mean, X ,are: it is easy to compute, easy to understoodand takes all values into account. Its maindisadvantages are: it is influenced by outliers,can be considered unrepresentative of datawhere outliers occur because many values may

    be well away from it and it requires all values inorder to calculate its value (Betteley, et al.,

    1994; Francis, 1995).The Sample Median, MD 1

    The sample median is perhaps the bestknown of the resistant location estimators. It isinsensitive to behavior in the tails of thedistribution. The sample median is defined for aset of values as the middle value when thevalues are arranged in order of magnitude and itis denoted herein by 1 MD . The sample medianfor a random sample of size n observations

    n X X X ,...,, 21 can be defined as follows:

    +=

    +

    +

    evenisnif

    X X

    odd isnif X

    MD nn

    n

    2

    122

    21

    1

    (6)

    The main advantages of the samplemedian, MD 1, is that, it is easy to determine,requires only the middle values to calculate, can

    be used when a distribution is skewed - as in thecase of the exponential distribution, is notaffected by outliers and has a maximal 50%

    breakdown point. The main disadvantages of thesample median, MD 1, are that it is difficult tohandle in mathematical equations, it does notuse all available values and it can be misleadingin a distribution with a long tail because itdiscards so much information. The samplemedian, though, is considered as an alternativeaverage to the sample mean (Betteley, et al.,1994; Francis, 1995). However, the samplemedian, MD 1, has become as a good general

    purpose estimator and is generally considered asan alternative average to the sample mean, X .

    Estimating the Exponential Distribution Median:Two techniques are now introduced for

    finding estimates, the method of sample medianand the method of maximum likelihood which isthe most widely used.

    The Method of Sample MedianGiven a random sample of size n

    observations, n X X X ,...,, 21 , the estimator of the exponential population median , MD , based

    on the method of sample median, MD 1, isdenoted by MD MD1 . Now, from equation (3):

    1 ln(2)MD ln(2)

    MD= =

    (8)

    Thus, if the exponential population median MD in (8) is estimated by the sample median MD 1,results in the following approximation:

    1

    )2ln(

    MD

    = . (9)

    Therefore, equating the results in (4) and (9) andsolving, the following approximation isobtained:

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    137/277

    ADJUSTED CONFIDENCE INTERVAL FOR EXPONENTIAL DISTRIBUTION MEDIAN

    464

    )2ln()2ln( 1

    111

    MDn X

    MD X

    n n

    iin

    ii

    ==

    =

    (10)

    The Maximum Likelihood Estimator of theMedianGiven a random sample of size n

    observations, n X X X ,...,, 21 , the estimator of the exponential population median based on themaximum likelihood method is denoted herein

    by MD MLE and can be defined as follows:

    )2ln()2ln(1 1

    n

    X MD

    n

    ii

    MLE ===

    (11a)

    where

    n

    ii 1

    n 1XX

    =

    = = . (11b)

    Comparing the Two Estimators of theExponential Distribution Median

    It is known that the maximum likelihoodestimators are asymptotically unbiased andefficient. Concretely, the estimator MD MLE is

    unbiased and2

    2)2(ln)(

    n MDVar MLE = .

    Moreover, the sample median estimator, MD1, isasymptotically normal distributed withasymptotic variance

    )(4

    1)(

    21 MDnf MDVar A = , where (.) f is

    the corresponding density and MD is thetheoretical median (Vann deer Vaart, 1998).Asymptotically unbiased means that theaverage value over many random samplesfor the two estimators MD 1 or MD MLE is theexponential distribution median, MD . Tocompare the two estimators MD 1 and MD MLE interms of how far they are from the exponentialdistribution median ( MD ) on the average for many random samples, it is necessary tocompare their root mean square error, RMSE ,given as follows:

    ==

    r

    i MDi MDr

    RMSE 1

    2)(1

    (12)

    where r MD MD MD ,...,, 21 are the values of the

    estimators MD 1 and MD MLE for r replications and MD is the value of the exponential distributiontrue median.

    The Confidence Interval for the ExponentialDistribution Median

    Next, the confidence interval for themedian of the exponential distribution, MD , isderived by modifying the confidence interval for the mean of the exponential distribution, . Let

    n X X X ,...,, 21 be a random sample of size n from the exponential distribution with parameter , that is, ( ) 1~ Exp X , then the exact twosided )%1(100 confidence interval for theexponential distribution mean, , is given by(Trivedi, 2001):

    =

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    138/277

    ABU-SHAWIESH M.O.A

    465

    n n

    i ii 1 i 1

    2 2(2n , 2) (2n ,1 2)

    n n

    i ii 1 i 1

    2 2(2n, 2) (2n,1 2)

    2 X 2 XMD 1

    P( )ln(2)

    2ln(2) X 2ln(2) XP( MD )

    1

    = =

    = =

    < = <

    = < <

    =

    (15)

    The MD MLE confidence interval is exact. It is

    based on the fact that=

    n

    1iiX2 follows a

    2)2( n distribution. The coverage probability

    must be exactly 95%.

    The MD MD1 Confidence IntervalThis confidence interval is obtained by

    substituting the result from (10) into equation(13) to give the exact )%1(100 confidenceinterval for the exponential distribution median,

    MD , as follows:

    1 12 2(2n , 2) (2n ,1 2)

    1 12 2(2n , 2) (2n,1 2)

    2(n MD ln(2)) 2(n MD ln(2))MDP( )

    ln(2)

    2n MD 2n MDP( MD )

    1

    < <

    = < <

    = (16)

    The MD1 confidence interval is notexact. Its expression in (16) is based on equation(10) which is only an approximation of thestatistics In order to see that, the performance of the MD1 confidence interval is studied bycalculating the coverage probability, the averagewidth and the standard error using Monte-Carlosimulations. Actual, approximate and exactconfidence intervals based on the sample median

    MD 1 can be also constructed using standardmethods.

    The Monte Carlo Simulation StudyA Monte Carlo simulation was designed

    to compare and study the behavior of the twoestimators MD 1 and MD MLE and investigate the

    behavior of the proposed approximateconfidence intervals for the exponentialdistribution median, MD . FORTRAN programswere used to generate the data from theexponential distribution and run the simulationsand to make the necessary tables. Results arefrom the exponential distribution with parameter which was set to 1 and 0.5, to increaseskewness. The more the repetition, the moreaccurate are simulated results, therefore 10,000random samples of sizes n = 10, 15, 20, 30, 40,50 and 100 were generated.

    Table (1) shows the simulated results for the root mean square error, RMSE , and theaverage of MD 1s and MD MLEs (AVG) toillustrate that both estimators are approximatelyunbiased for the true median of the exponentialdistribution, MD . The simulated results for thecoverage probability ( P ), average width (AW)and standard error (SE) of the exact confidenceinterval for the exponential mean, , and thetwo proposed approximate confidence intervalsfor the exponential distribution median, MD , areshown in tables (2-4). The criteria used toevaluate the exact and proposed approximateconfidence intervals is the value of the coverage

    probability ( P ) and average width (AW); a goodmethod should have an observed coverage

    probability ( P ) near to the nominal coverage probability and a small scaled average width(AW).

    The simulation results in Table 1 showthat the maximum likelihood estimator of themedian, MD MLE , is a much better estimator for the population median of the exponentialdistribution, MD , than the sample median, MD 1.While both estimators are approximately

    unbiased, the root mean square error, RMSE , for the sample median, MD 1, is larger than that of the maximum likelihood estimator of themedian, MD MLE . It should be noted that theaccuracy of the maximum likelihood estimator of the median, MD MLE , increases as the samplesize, n, increases which clearly provides a verygood estimator, even considering that thediscrepancy of these two estimators is very small

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    139/277

    ADJUSTED CONFIDENCE INTERVAL FOR EXPONENTIAL DISTRIBUTION MEDIAN

    466

    -that is, these two estimators asymptoticallycoincide.

    As shown in Tables 2-4, the simulationresults show that the coverage probability ( P )for the confidence interval of the mean and theapproximate confidence interval of the median

    based on the MLE method for the exponentialdistribution are the same and very close to thenominal confidence coefficient.

    The approximate confidence interval of the median based on the sample median methodfor the exponential distribution provides thelower coverage probability ( P ) and gives the

    lowest width among the three methods. Theaverage widths (AW) for the two proposedconfidence interval methods are approximatelythe same for moderate and large sample sizes.However, the estimated average width (AW) for the sample median method is the shortest amongall considered methods, but it has poor coverage

    probability. Furthermore, as sample sizesincreases, the performance of the proposedconfidence interval based on the MLE methodimproves, but is still much lower than thenominal confidence.

    Table 1: The Root Mean Square Error and Average of the Two Estimators for theExponential Distribution Median

    SampleSize (n)

    1= (True Median = 0.69315)

    RMSE( MD 1) AVG( MD 1) RMSE( MD MLE ) AVG( MD MLE )

    10 0.31575 0.74851 0.22369 0.69487

    15 0.26754 0.72676 0.18040 0.69332

    20 0.22459 0.72003 0.15715 0.69394

    30 0.18276 0.71180 0.12808 0.69335

    40 0.15907 0.70649 0.11106 0.69342

    50 0.14108 0.70459 0.09887 0.69439100 0.10005 0.69894 0.06961 0.69265

    SampleSize (n)

    5.0= (True Median = 1.38629)

    RMSE( MD 1) AVG( MD 1) RMSE( MD MLE ) AVG( MD MLE )

    10 0.63150 1.49703 0.44739 1.38973

    15 0.53509 1.45352 0.36080 1.38663

    20 0.44918 1.44005 0.31430 1.38788

    30 0.36553 1.42359 0.25615 1.38671

    40 0.31814 1.41299 0.22211 1.38684

    50 0.28215 1.40917 0.19773 1.38877

    100 0.20010 1.39788 0.13922 1.38530

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    140/277

    ABU-SHAWIESH M.O.A

    467

    Table 2: Coverage Probabilities, Average Width and Standard Error for the Confidence Intervalof the Mean of the Exponential Distribution

    n

    95.01 =

    1= 5.0=

    P AW SE P AW SE

    10 94.55 1.504 0.484 94.55 3.007 0.968

    15 94.84 1.148 0.299 94.84 2.297 0.59820 94.51 0.964 0.218 94.51 1.928 0.43730 94.58 0.762 0.141 94.58 1.524 0.282

    40 94.94 0.650 0.104 94.94 1.299 0.20850 94.79 0.576 0.082 94.79 1.153 0.164

    100 95.02 0.399 0.040 95.02 0.798 0.080

    Table 3: Coverage Probabilities, Average Width and Standard Error for the Confidence Intervalof the Median of the Exponential Distribution with 1

    =

    n

    95.01 = Confidence Interval Method

    MDMLE Method MD MD1 Method

    P AW SE P AW SE

    10 94.55 1.123 0.336 85.11 1.042 0.46615 94.84 0.834 0.207 82.76 0.796 0.30520 94.51 0.693 0.151 83.92 0.668 0.21530 94.58 0.542 0.098 83.41 0.528 0.139

    40 94.94 0.459 0.072 83.00 0.450 0.10350 94.79 0.405 0.057 83.28 0.400 0.081

    100 95.02 0.279 0.028 82.95 0.277 0.040

    Table 4: Coverage Probabilities, Average Width and Standard Error for the Confidence Intervalof the Median of the Exponential Distribution with 5.0=

    n

    95.01 = Confidence Interval Method

    MDMLE Method MD MD1 Method

    P AW SE P AW SE

    10 94.55 2.246 0.671 85.11 2.085 0.93315 94.84 1.669 0.414 82.76 1.592 0.60920 94.51 1.387 0.303 83.92 1.337 0.43030 94.58 1.085 0.195 83.41 1.056 0.27740 94.94 0.918 0.144 83.00 0.901 0.20650 94.79 0.811 0.114 83.28 0.799 0.162

    100 95.02 0.558 0.056 82.95 0.553 0.080

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    141/277

    ADJUSTED CONFIDENCE INTERVAL FOR EXPONENTIAL DISTRIBUTION MEDIAN

    468

    Numerical ExampleThis example is taken from Wilk, et al.

    (1962); the data set represents the lifetimes (inweeks) of 34 transistors in an accelerated lifetest. The transistors were tested and the testcontinued until all of them failed. The lifetimesfor the 34 transistors (in weeks) were recordedas follows:

    3, 4, 5, 6, 6, 7, 8, 8, 9,9, 9, 10, 10, 11, 11, 11, 13, 13,

    13, 13, 13, 17, 17, 19, 19, 25, 29,33, 42, 42, 52, 52, 52, 52

    The sample mean 912.18= X weeks, theexponential median MD 1 = 13 weeks, theexponential median MD2 = 13.108 weeks and the

    skewness is 1.265695, which is highly skeweddistribution. Furthermore,

    053.005287.0912.1811 ===

    X and -

    using the approximation derived earlier - results

    in 053.005332.013

    69315.0)2ln(1

    === MD

    which indicates that the two values are veryclose and therefore the approximation is good.Based on Kibria (2006), the above data set areassumed to come from an exponential

    distribution with mean 21= weeks; using theKolmogorov-Smirnov test, the K-S statistic =0.1603 and the p-value = 0.3125, it indicates thatthe sample data are from an exponentialdistribution with mean 21= weeks, andtherefore (from equation (3)) has a median MD = 14.556 weeks. The resulting 95% confidenceinterval and the corresponding confidence width

    for the exact confidence interval for theexponential mean and the two proposed methodsfor the exponential median are calculated andgiven in table (5).

    From table (5), it is observed that theexact confidence interval for the exponentialmean, as expected, covered the hypothesizedtrue population mean of 21= weeks and alsothe proposed confidence intervals for theexponential median, MD , covered thehypothesized true population median MD =14.556 weeks. However, the proposedconfidence interval for the exponential median,

    MD , based on the sample median, MD1, provided the shortest confidence interval width.

    ConclusionThe median - one of the most important and

    popular measures for location - has many goodfeatures. The median confidence interval isuseful for one parameter families, such as theexponential distribution, and it may not need to

    be adjusted if censored observations are present.The maximum likelihood estimation is a popular statistical method used to make inferences about

    parameters of the underlying probabilitydistribution from a given data set. This study

    proposed an approximate confidence interval for the median of the exponential distribution, MD ,

    based on two estimators, the sample median, MD 1, and the maximum likelihood estimator of the median, MD MLE .

    The results of this study show that usinga maximum likelihood estimator, MLE , for the

    population median of the exponentialdistribution, MD , is better alternative to theclassical estimator based on the sample median,

    MD 1. As shown by the study results, themaximum likelihood estimator of the median,

    MD MLE , provides a good estimation for the population median of the exponentialdistribution, MD , and the proposed confidenceinterval based on this estimator had a goodcoverage probabilities compared to the samplemedian method. However, it produced slightlywider estimated width. It appears that the samplesize, n, has significant effect on the two

    proposed confidence interval methods.Moreover, both of the proposed methods arecomputationally simpler. If scientists and

    Table 5: The 95% Confidence Intervals for theLifetimes Data

    ConfidenceInterval Method

    95% ConfidenceInterval Width

    Exact for Mean (13.874 , 27.308) 13.434

    MD MLE (9.617 , 18.929) 9.312

    MD MD1 (9.537 , 18.772) 9.235

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    142/277

    ABU-SHAWIESH M.O.A

    469

    researchers are conservative about the smaller width, they might consider confidence interval

    based on sample median method as a possibleinterval estimator for the population median of the exponential distribution, MD . Finally, theresults obtained from the simulation studycoincided with that of the numerical example.

    AcknowledgementThe author would like to thank the HashemiteUniversity for their cooperation during the

    preparation of this article.

    ReferencesBalakrishnan, N., & Basu, A. P. (1996).

    The exponential distribution: theory, methodsand applications , CRC press.

    Bettely, G., Mettric, N., Sweeney, E., &Wilson, D. (1994). Using statistics in industry:Quality improvement through total processcontrol , (1 st Ed. ). London: Prentice HallInternational Ltd.

    Casella, G., & Berger, R. L. (2002).Statistical inference . Pacific Grove, CA:Duxbury.

    Daniel, W. W. (1990). Applied nonparametric statistics (2nd Ed. ). Duxbury,Canada: Thomson Learning.

    Francis, A. (1995). Businessmathematics and statistics . London: DP

    Publications, Ltd.Kececioglu, D. (2002). Reliability

    engineering handbook . Lancaster, UK: DEStechPublications.

    Kibria, B. M. G. (2006). Modifiedconfidence intervals for the mean of theasymmetric distribution. Pakestanian Journal of Statistics , 22 (2), 109-120.

    Kinney, J. (1997). Probability: Anintroduction with statistical applications . NewYork: John Wiley and Sons, Inc.

    Lewis, E. E. (1996). Introduction to

    reliability engineering , (2nd

    Ed. ). New York:John Wiley and Sons, Inc.

    Maguire, B. A., Pearson, E. S., & Wynn,A. H. A. (1952). The time intervals betweenindustrial accidents. Biometrika , 39 , 168-180.

    Montgomery, D. C. (2005). Introductionto statistical quality control (5th Ed. ). New York:John Wiley and Sons, Inc.

    Patel, J. K., Kapadia, C. H., & Owen, D.B. (1976). Handbook of statistical distributions .

    New York: Marcel Dekker.Sinha, A., & Bhattacharjee, S. (2004).

    Test of parameter of an exponential distributionin predictive approach. Pakestanian Journal of Statistics , 20(3), 409-414.

    Smithson, M. (2001). Correctconfidence intervals for various regression effectsizes and parameters: The importance of noncentral distributions in computing intervals.

    Educational and Psychological Measurement ,61 , 605-532.

    Trivedi, K. S. (2001). Probability and statistics with reliability, queuing, and computer science applications (2nd Ed. ). New York: JohnWiley and Sons, Inc.

    Van der Vaart, A.W., (1998). Asymptotic Statistics, 1-496. CambridgeUniversity Press, Cambridge.

    Wilk, M. B., Gnanadesikan, R., &Huyett, M. J. (1962). Estimation of parametersof the gamma distribution using order statistics.

    Biometrika , 49 , 525-545

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    143/277

    Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc. November 2010, Vol. 9, No. 2, 470-479 1538 9472/10/$95.00

    470

    Nonlinear Trigonometric Transformation Time Series Modeling

    K. A. Bashiru O. E. Olowofeso S. A. OwabumoyeOsun State University,

    Osogbo, Osun StateFederal University Of Technology,

    Akure Ondo State, Nigeria

    The nonlinear trigonometric transformation and augmented nonlinear trigonometric transformation with a polynomial of order two was examined. The two models were tested and compared using daily meantemperatures for 6 major towns in Nigeria with different rates of missing values. The results were used todetermine the consistency and efficiency of the models formulated.

    Key words: Nonlinear time series, polynomial, consistency, efficiency, missing value, model andforecasting.

    Introduction

    Time series analysis is an important techniqueused in many disciplines, including physics,engineering, finance, economics, meteorology,

    biology, medicine, hydrology, oceanography andgeomorphology (Terasvirta & Anderson, 1992).This technique is primarily used to infer

    properties of a system by the analysis of ameasured time record (data) (Priestley, 1988);this is accomplished by fitting a representativemodel to the data with an aim of discovering theunderlying structure as closely as possible.

    Traditional time series analysis is based

    on assumptions of linearity and stationarity.However, time series analysis (Box & Jenkins,1970; Brock & Potter, 1993) has nonlinear features such as cycles, asymmetries, bursts,

    K. A. Bashiru is a lecturer in the Department of Mathematical and Physical Sciences. His areasof interest are Geostatistics and Time seriesEconometrics. Email him at:[email protected]. O. E.

    Olowofeso is an Associate Professor of Statisticsin the Mathematical Sciences Department. He isalso currently working in the Central Bank of

    Nigeria, Abuja. Email: [email protected]. A. Owabumoye is a Masters student under the supervision of O. E Olowofeso in theMathematical Sciences Department. Email:[email protected].

    jumps, chaos, thresholds and heteroscedasticity,

    and mixtures of these must also be taken intoaccount. Thus, a problem arises regarding asuitable definition of a nonlinear model becausenot every time series analysis is purely linear:the nonlinear class clearly encompasses a largenumber of possible choices. For these reasons,non-linear time series analysis is a rapidlydeveloping area and there have been major developments in model building and forecasting(De Gooijer & Kumar, 1992).

    The growing interest in studyingnonlinear and non-stationary time series models

    in many practical problems stems from theinherently non-linear nature of many phenomenain physics, engineering, meteorology, medicine,hydrology, oceanography, economics andfinance, that is, many real world problems donot satisfy the assumptions of linearity and/or stationarity (Bates & Watts, 1988; DeGooijer &Kumar, 1992; Sugihara & May, 1990).Therefore, for many real time series data,nonlinear models are more appropriate thanlinear models for accurately describing thedynamic of the series and making multi-step-

    ahead forecast (Tsay, 1986; Barnett, Powell &Tauchen, 1991; Olowofeso, 2006). For example,financial markets and trends are influenced byclimatic factors like daily temperature, amountof rainfall and intensity of sun, these are areaswhere a need exists to explain behaviors that arefar from being even approximately linear.

    Nonlinear models would be more appropriate for forecasting and accurately describing returns and

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    144/277

    BASHIRU, OLOWOFESO & OWABUMOYE

    471

    volatility. Thus, the need for the further development of the theory and applications for nonlinear models is essential, and, because thereare an enormous number of nonlinear modelsavailable for modeling and forecasting economictime series, research should help provideguidance for choosing the best model for a

    particular application (Robinson, 1983).

    MethodologyThe model proposed by Gallant (1981) calledthe Augmented Nonlinear Parametric TimeSeries Model (ANPTSM) was used in this studyand a second model was formulated based on theLeast Square Method Modified Nonlinear Trigonometric Transformation Time SeriesModel (MNTTTSM).

    DataData used in this study were daily mean

    of temperatures from 1987 to 1996 for Ikeja,Ibadan, Ilorin, Minna and Zaria. The data werecollected from the Meteorological Centre-Oshodi Lagos.

    Model FormulationConsider the format shown in Table 1.

    In this model, up to 9 years were considered andthe model is formulated based on the data asshown in Table 2.

    Assumption and Notation for the ModelsLet:

    X t,i,k = value of occurrence for day t of Month iin the year k;

    X t,k = mean occurrence for day t of year k;X*i,k = mean occurrence for month i of year k;X*yK = overall yearly mean for the sampled

    month;X*mi = overall monthly mean for the sampled

    year;t = the position of the day from the first day of

    the Month. 1 t 31;ti = the sum of days in month i for 1 i 12;tik = the sum of days from the initial sampled

    month of initial sampled year to month iof year k;

    ti* = the sum of days from the initial sampledmonth to month I;

    k = the position of a particular year from aninitial sample year for k ;

    n = the number of sampled years;m = the number of sampled months; andX* = Grand Mean occurrence for k year(s)

    examined.The first model was reviewed based on

    the assumption that the sum of the occurrenceswere presented monthly, where i th monthrepresents the month i for 1 i 12 which is to

    be modeled using the number of days in eachmonth (see Table 3).

    Table 1: Model Formulation for a Particular Year

    t/i 1 2 3 4 5 6 7 8 9 10 11 12 xi/12

    1 x1,1,k x1,2,k x1,3,k x1,4,k x1,5,k x1,6,k x1,7,k x1,8,k x1,9,k x1,10,k x1,11,k x1,12,k X 1

    2 x2,1,k x2,2,k x2,3,k x2,4,k x2,5,k x2,6,k x2,7,k x2,8,k x2,9,k x2,10,k x2,11,k x2,12,k X 2

    t x t,1,k xt,2,k xt,3,k xt,4,k xt,5,k xt,6,k xt,7,k xt,8,k xt,9,k xt,10,k xt,11,k xt,12,k X t *

    1,k x*2,k x*3,k x*4,k x*5,k x*6,k x*7,k x*8,k x*9,k x*10,k x*11,k x*12,k x*i,k /12

    Table 2: Model Data Formulationt/ik 1 2 3 i k xi/ik

    1 x1,1,1 x1,2,1 x1,3,1 x 1,i,k X 1,k

    2 x2,1,1 x2,2,1 x2,3,1 x 2,i,k X 2,k

    t x t,1,1 xt,2,1 xt,3,1 x t,i,k X t,k

    x*1,k x*2,k x*3,k x *i,k x*i,k /ik= X

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    145/277

    NONLINEAR TRIGONOMETRIC TRANSFORMATION TIME SERIES MODELING

    472

    Augmented Nonlinear Parametric Time SeriesModel (ANPTSM)

    Trigonometric (sine and cosine)transformation augmented with polynomial of order two was applied to formulate the modelacross the year, that is, the monthly meansample and the least square methods were usedfor estimating the models parameters asfollows. Let the equation be of the form

    ( ) ( )2t ,i,k 1 2 ik 3 ik iX a a tSin t a t Cos t1 i 12

    = + + +

    (3.0)

    The expected value of X t,i,k is X *i,k then theequation can be reformed as below to estimate

    the parameters; a 1, a2 and a 3 using Least SquareMethod.

    ( ) ( )* 2i,k 1 2 i ik 3 i ik iX a a t Sin t a t Cos t1 i 12

    = + + +

    (3.1)

    ( ) ( )( )* 2i i,k 1 2 i ik 3 i ik X a a t Sin t a t Cos t = + +

    (3.2)Let i2 = S

    ( ) ( )( )* 2 2i,k 1 2 i ik 3 i ik S (X a a t Sin t a t Cos t )= + + (3.3)

    Differentiating 3.3 with respect to a 1, a2, a3, a3,as

    01

    aS

    results in

    ( ) ( )* 2i,k 1 2 i ik 3 i ik X ma a t Sin t a t Cos t = + + (3.4)

    where m is the number of the monthly samplemean examined. Similarly, as

    02

    aS

    then

    ( ) ( )* 2 2i ik i,k 1 i ik 2 i ik 3

    3 i ik ik

    t Sin t X a t Sin t a t Sin (t )

    a t Sin(t )Cos(t )

    = +

    +

    (3.5)

    and as0

    3

    aS

    then

    ( ) ( )( ) ( )

    ( )

    2 * 21i ik i,k i ik

    32 i ik ik

    4 23 i ik

    t Cos t X a t Cos t

    a t Sin t Cos t

    a t Cos t

    =

    +

    +

    (3.6)

    Simultaneously solving equations 3.4, 3.5 and3.6 using Cramers Rule results in equations 3.7-3.10.

    Table 3: Months and Sums of Occurrences ModeledJan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan

    i 1 2 3 4 5 6 7 8 9 10 11 12 13 i k

    t 31 281/4 31 30 31 30 31 31 30 31 30 31 31 t i

    ti 31 591/4 90

    1/4 1201/4 151

    1/4 1811/4 212

    1/4 2431/4 273

    1/4 3041/4 334

    1/4 3651/4 396

    1/4 t

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    146/277

    BASHIRU, OLOWOFESO & OWABUMOYE

    473

    Therefore, from equations 3.7, 3.8, 3.9 and 3.10,the following result:

    a1 =0

    1

    (3.11)

    a2 =0

    2

    (3.12)

    a3 =0

    3

    (3.13)

    Next, substituting 3.11, 3.12 and 3.13 into 3.1gives:

    * 231 2i,k ik ik

    0 0 0

    X tSin(t ) t Cos(t ).

    = + +

    (3.14)

    Because X *i,k is the expected value of X t,i,k ,equation 3.14 can be rewritten as

    231 2t,i,k ik ik

    0 0 0X tSin(t ) t Cos(t ).

    = + +

    (3.15)

    Models 3.14 and 3.15 would only be visible provided there is an occurrence within a monthof any sampled year.

    Modified Nonlinear TrigonometricTransformation Time Series Model(MNTTTSM)

    In a situation where a whole month of

    data is missing, the above model may bedifficult to apply and a different model would beneeded. The model for such occurrence isformulated as follows. If the data in 3.2 arereformed such that the monthly means are thoseshown in Table 4. Consider:

    ( )*i,k i iX a bsin t *= + + (3.16)

    ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )

    2 2 4 2 3 20 i ik i ik i ik ik

    4 2 2 3i ik i ik i ik i ik i ik ik

    2 3 2 2 2i ik i ik i ik i i i ik

    m{ t Sin t t Cos t ( t Sin t Cos t ) }

    t Sin t { t Sin t t Cos t t Cos t t Sin t Cos t }

    t Cos t { t Sin t t Sin t Cos t t Cos t t Sin t }

    =

    +

    (3.7)

    ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )

    * 2 2 4 2 3 21 i,k i ik i ik i ik ik

    * 4 2 2 * 3i ik i ik i,k ik i ik i,k i ik ik

    2 * 3 2 * 2 2i ik i ik i,k i ik ik i ik i,k i ik

    X { t Sin t t Cos t ( t Sin t Cos t ) }

    t Sin t { t Sin t X t Cos t t Cos t X t Sin t Cos t }

    t Cos t { t Sin t X t Sin t Cos t t Cos t X t Sin t }

    =

    +

    (3.8)

    ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( )

    ( ) ( ) ( ) ( ) ( )

    * 4 2 2 * 32 i ik i,k i ik i ik i,k i ik ik

    * 4 2 2 3i,k i ik i ik i ik i ik ik

    2 2 * 2 *i ik i ik i ik i,k i ik i ik i,k

    m{ t Sin t X t Cos t ( t Cos t X t Sin t Cos t }

    X { t Sin t t Cos t t Cos t t Sin t Cos t }

    t Cos t { t Sin t t Cos t X t Cos t t Sin t X }

    =

    +

    (3.9)

    ( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( )

    ( ) ( ) ( ) ( ) ( )

    2 2 2 * 3 *3 i ik i ik i,k i ik i i ik i,k

    2 * 2 *i ik i ik i ik i,k i ik i ik i,k

    * 3 2 2 2i,k i ik i ik ik i ik i ik

    m{ t Sin t t Cos t X ( t Sin t Cos t t Sin t X }

    t Sin t { t Sin t t Cos t X t Cos t t Sin t X }

    X { t Sin t t Sin t Cos t t Cos t t Sin t

    =

    +

    (3.10)

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    147/277

    NONLINEAR TRIGONOMETRIC TRANSFORMATION TIME SERIES MODELING

    474

    where

    i1

    1 i 12 , 1 t 3654

    If the expected value of X *i,k is X *mi, then

    equation 3.16 can take the form

    ( )*m i i iX a b sin t *= + + (3.17)where

    i1

    1 i 12 , 1 t * 3654

    An ordinary least square method was used inestimating the parameters a and b. If S m = i2 = (X*mi -(a+bsin(t i*))2, then differentiating withrespect to a and b

    ( )( )*m *m i iS 2 X a bsin ta

    = +

    as

    0

    a

    S m

    ( )*m *i i X 12a b sin t = + (3.18)

    Also,

    ( ) ( )( )( )* *m *m i i iS 2 (sin t X a bsin t b

    = +

    as

    0

    bS m

    ( ) ( ) ( )*m * 2 *i i i isin t * X a sin t b sin t = + (3.19)

    Using Cramers Rule to solve equations 3.18and 3.19 simultaneously, results in:

    ( )22 * *

    4 i i

    *m 2 * * *m *5 i i i i i

    * *m *m *6 i i i i

    12 Sin (t ) Sin(t )

    X Sin (t ) Sin(t )X Sin(t )

    12 Sin(t )X X Sin(t )

    =

    =

    =

    where parameters

    ( )

    *m 2 * * *m *5 i i i i i

    22 * *4 i i

    X Sin (t ) Sin(t )X Sin(t )a

    12 Sin (t ) Sin(t )

    = =

    (3.20)

    and

    ( )

    * *m *m *6 i i i i

    22 * *4 i i

    12 Sin(t )X X Sin(t ) b

    12 Sin (t ) Sin(t )

    = =

    (3.21)

    Therefore, the model for monthly occurrence is

    Table 4: Modified Nonlinear Trigonometric Transformation Time Series Model Data

    t/i 1 2 3 4 5 6 7 8 9 10 11 12 xi/12

    1 X*1,1 X*2,1 X*3,1 X*4,1 X*5,1 X*6,1 X*7,1 X*8,1 X*9,1 X*10,1 X*11,1 X*12,1 X*y1

    2X*

    1,2 X*

    2,2 X*

    3,2 X*

    4,2 X*

    5,2 X*

    6,2 X*

    7,2 X*

    8,2 X*

    9,2 X*

    10,2 X*

    11,2 X*

    12,2 X*y

    2

    k X*1,k X*2,k X*3,k X*4,k X*5,k X*6,k X*7,k X*8,k X*9,k X*10,k X*11,k X*12,k X*yk

    x*m1 x*m2 x*m3 x*m4 x*m5 x*m6 x*m7 x*m8 x*m9 x*m10 x*m11 x*m x*m,i = X*12

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    148/277

    BASHIRU, OLOWOFESO & OWABUMOYE

    475

    * *5 6

    4 4

    ( )mi i X Sin t

    = +

    (3.22)

    Because X *mi is an expected value for X *i,k thenequation 3.22 can be rewritten as

    )( *

    4

    6

    4

    5,

    *ik i t Sin X

    +

    = (3.23)

    Similarly, along the sampled year X *yK = c + dSin( k) for - k + , 15 75. The must

    be chosen such that i = 0, i2 is as small as possible.

    If S y = i2 = (X*yK -(c+dsin( k ))2 then

    y *y

    K

    S2 (X (c dsin( k))

    c

    = +

    as

    0

    c

    S y

    *yK X nc d sin( k) = + (3.24)

    Also,

    y *yK

    S2 (sin( k) X (c dsin( k)))

    d

    = +

    as

    0

    d

    S y

    *y 2K sin( k) X c sin( k) d sin ( k) = +

    (3.25)

    Solving equations 3.24 and 3.25 simultaneously

    using Cramers Rule results in2 2

    7

    *y 2 *y8 k k

    *y *y9 k k

    n Sin ( k) ( Sin( k))

    X Sin ( k) Sin( k)X Sin( k)

    n Sin( k)X X Sin( k)

    =

    =

    =

    Where the parameters

    *y 2 *y8 k k

    2 27

    X Sin ( k) Sin( k)X Sin( k)c

    n Sin ( k) ( Sin( k))

    = =

    (3.26)and

    *y *y9 k k

    2 27

    n Sin( k)X X Sin( k)d

    n Sin ( k) ( Sin( k))

    = =

    (3.27)

    *y 8 9k

    7 7

    X Sin( k)

    = +

    (3.28)

    The method of placing expected

    occurrences in a contingency table of a Chi-square was applied using equations 3.23 and3.28 to obtain the model to find the dailyoccurrences for a particular month of a particular year. Therefore, the model for expected dailyoccurrences is

    k y

    k y

    im

    k it X X X n

    X ***

    ,,))((

    = (3.29)

    Substituting 3.23 and 3.28 into 3.29, results in

    +

    +

    +

    =

    )(

    )()(

    7

    9

    7

    8

    7

    9

    7

    8*

    4

    6

    4

    5

    ,,

    k Sin

    k Sint Sinn

    X i

    k it

    (3.30)

    ResultsModel Analysis and Discussion

    The data on the daily mean temperaturefor Ikeja, Ibadan, Ilorin, Minna and Zariacollected from the Meteorological Centre-Oshodi Lagos were used. The parameters of themodels were estimated and the fitted models for each zone are shown in Table 5 for Ikeja,Ibadan, Ilorin and Minna for ANPTSM. Data for the daily mean temperature was used to estimatethe parameters. The fitted model for Zaria could

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    149/277

    NONLINEAR TRIGONOMETRIC TRANSFORMATION TIME SERIES MODELING

    476

    not be formulated due to the fact that manymonths of data were missing.

    Table 6 shows the fitted models for

    Ikeja, Ibadan,Ilorin, Minna and Zaria for MNTTTSM using the daily mean temperaturedata to estimate their parameters. The fittedmodel for Zaria was formulated becauseMNTTTSM has the strength of addressing the

    problem of missing values. Thus, although manymonths data were missing from Zarias dailymean temperature, MNTTTSM parameterscould still be estimated. This is one of the

    advantages of MNTTTSM over ANPTSM.Table 7 shows that the results of the

    Pearson Product Moment Correlation

    coefficients and Spearman Browns rank Order Correlation coefficients for Ikeja, Ibadan, Ilorinand Minna are highly and positively correlated,indicating a strong relationship between theactual data and estimated data for the daily meantemperature. In Zaria the correlation coefficientfor MNTTTSM is positive but low which mayindicate a weak relationship between the actualand estimated daily mean temperatures.

    Table 5: The Fitted Models for ANPTSMZones Augmented Nonlinear Parametric Time Series Model (ANPTSM)

    IKEJA ( ) ( )226.88642582 0.047971536tSin 0.000143793 Cost t tik ik + IBADAN ( ) 2ik ik 26.36612286 0.054847742tSin 0.0000344912t Cost t+ ILORIN ( ) ( )t t ik Cos

    2

    ik 000833551.04tSin0.0481158726.2476883 t +

    MINNA ( ) ( )t t t ik ik CostSin2

    00073.0062853.072428.25 +

    ZARIA

    Table 6: The Fitted Models for MNTTTSMZones Modified Nonlinear Trigonometric Transformation Time Series Model (MNTTTSM)

    IKEJA

    =+

    ++10

    1

    *

    )601311.088996.26(

    )6013116.088996.26)(420072.187226.26(10

    K

    i

    k Sin

    k SinSin t

    IBADAN

    =

    +

    ++10

    1

    *

    )901311.036761.26(

    )9013535.036761.26)(591834.136749.26(10

    K

    i

    k Sin

    k SinSin t

    ILORIN

    =

    +

    ++10

    1

    *

    )45409024.040106.26(

    )45409024.040106.26)(816182.145708.26(10

    K

    i

    k Sin

    k SinSin t

    MINNA=

    +

    ++

    10

    1

    *

    )90112148.067047.27(

    )90112148.067047.27)(508736.256143.27(10

    K

    i

    k Sin

    k SinSin t

    ZARIA

    =

    +

    ++10

    1

    *

    )90222282.000445.25(

    )90222282.000445.25)(210108.198532.24(10

    K

    i

    k Sin

    k SinSin t

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    150/277

    BASHIRU, OLOWOFESO & OWABUMOYE

    477

    Apart from Ibadan, in which thecorrelation coefficient in ANPTSM is greater than MNTTTSM and Ikeja which has equalcorrelation coefficients, all other Zones, thecorrelation coefficient in MNTTTSM is greater than ANPTSM. This indicates that MNTTTSMshows a stronger relationship between the actualand estimated values than does ANPTSM.Although the relationship between actual andestimated values of MNTTTSM in Zaria is weak

    but positive, that of ANPTSM could not beestimated due to the large number of missingvalues in the data. Also, all of the correlations

    are significant at the 0.01 level (2-tailed).As shown in Table 8, the mean of the

    actual and estimated values for each zones of allmodels are almost equal; differences are due toapproximation (truncation error) duringcalculation. Also, the mean of the actual andestimated values of MNTTTSM are closer thanthose of ANPTSM, which implies thatMNTTTSM estimates better than ANPTSM. Itwas also discovered from results in Table 8 thatthe more missing values in the data, the weaker the ANPTSM is in estimating, while inMNTTTSM, the model maintains its precision.

    Table 7: Correlation Coefficients

    Zones TypesANPTSM MNTTTSM

    Coefficients Sig. Coefficients Sig.

    IKEJAPearsons r 0.607 .000 0.607 .000

    Spearmans Rho 0.620 .000 0.620 .000

    IBADANPearsons r 0.594 .000 0.575 .000

    Spearmans Rho 0.622 .000 0.584 .000

    ILORINPearsons r 0.503 .000 0.589 .000

    Spearmans Rho 0.560 .000 0.612 .000

    MINNAPearsons r 0.596 .000 0.676 .000

    Spearmans Rho 0.656 .000 0.686 .000

    ZARIAPearsons r - - 0.419 .000

    Spearmans Rho - - 0.445 .000

    Table 8: Comparison of ANPTSM and MNTTSM Means

    Zones NANPTSM MNTTTSM

    Actual Estimated Actual Estimated

    IKEJA 3,660 26.9077 26.8759 26.9077 26.8759

    IBADAN 3,601 26.3749 26.3756 26.3749 26.3791

    ILORIN 3,580 26.4558 26.2443 26.4558 26.4593

    MINNA 3,362 27.5489 26.3611 27.5489 27.5559

    ZARIA 3,588 - - 25.0514 25.0172

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    151/277

    NONLINEAR TRIGONOMETRIC TRANSFORMATION TIME SERIES MODELING

    478

    Table 9 shows that the standarddeviations for MNTTTSM are less than those of ANPTSM which indicates that MNTTTSM is

    better in estimating and forecasting thanANPTSM. Similarly, apart from the standarderror of ANPTSM and MNTTTSM of Ikeja,which are equal, it may be observed that thestandard errors for MNTTTSM were alsosmaller than those of ANPTSM, which indicatesthat MNTTTSM is better in estimating andforecasting than ANPTSM for time series datawith missing values.

    Table 10 shows that at Ikeja, there is a95% chance that the differences between theactual and estimated daily mean temperaturewould lie between -0.00749 and 0.07119 inANPTSM and -0.00748 and 0.07118 inMNTTTSM. Similarly, at Ibadan; -0.0462 and0.04473 in ANPTSM and -0.0496 and 0.04114

    in MNTTTSM, at Ilorin; 0.1505 and 0.2723 inANPTSM and -0.0592 and 0.05218 inMNTTTSM, at Minna; 1.1155 and 1.2601 inmodel I and -0.0689 and 0.05482 in MNTTTSMwhile in Zaria is between -0.0546 and 0.12310.

    It was also discovered that the range of the confidence interval for MNTTTSM is lessthan that of ANPTSM for Ikeja and Ibadan. InIlorin and Minna, the lower confidence intervalsof differences for ANPTSM are positive whichindicates a 95% chance that the differences

    between their actual and estimated dailytemperature (actual estimate) are positivewhile those of MNTTTSM are not. This impliesthat the estimated daily temperatures for ANPTSM at Ilorin and Minna were under-estimated. Hence MNTTTSM is better inestimating and forecasting than ANPTSM whenthere are missing values in the time series.

    Table 9: Comparison of ANPTSM and MNTTSMS Standard Deviation andStandard Error of Differences

    ZonesANPTSM MNTTTSM

    Std. Dev. Std. Error of the Mean Std. Dev.Std. Error of

    the MeanIKEJA 1.2138 0.02006 1.2137 0.02006

    IBADAN 1.3913 0.02319 1.3882 0.02313

    ILORIN 1.8585 0.03106 1.6996 0.02841MINNA 2.1381 0.03688 1.8293 0.03155

    ZARIA - - 2.7152 0.04533

    Table 10: Comparison of ANPTSM and MNTTTSMs 95 % Confidence Interval of the Difference

    ZonesANPTSM MNTTTSM

    Lower Upper Lower Upper

    IKEJA -0.00749 0.07119 -0.00748 0.07118

    IBADAN -0.0462 0.04473 -0.0496 0.04114

    ILORIN 0.1505 0.2723 -0.0592 0.05218

    MINNA 1.1155 1.2601 -0.0689 0.05482

    ZARIA - - -0.0546 0.12310

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    152/277

    BASHIRU, OLOWOFESO & OWABUMOYE

    479

    ConclusionThe two models tested in this study were theAugmented Nonlinear Parametric Time SeriesModel (ANPTSM) and the Modified Nonlinear Trigonometric Transformation Time SeriesModel (MNTTTSM). Both models were testedusing daily mean temperatures at Ikeja, Ibadan,Ilorin, Minna and Zaria, and the results wereanalyzed. It was discovered that ANPTSM could

    be used in forecasting provided the data ishaving few missing values. However MNTTTSM estimates forecasts better thanANPTSM in estimating missing values andforecasting. Based on results of this study,MNTTTSM is more efficient in estimatingmissing values and forecasts better thanANPTSM.

    The beauty of a good model developedfor nonlinear time series modeling is the abilityto forecast better, the new method MNTTTTSMis therefore recommended for numericalsolutions for a nonlinear model with missingvalues due to its higher capacity to addressmissing values. It was also noted that themathematical derivative of MNTTTSM issimpler than ANPTSM which did not forecast

    better. Further research could be conducted by placing a condition in which data having a year or more of missing values is taken intoconsideration.

    ReferencesBarnett, W. A., Powell, J., & Tauchen,

    G. E. (1991). Nonparametric and semi parametric methods in econometrics and statistics . Proceedings of the 5 th InternationalSymposium in Economic Theory andEconometrics. Cambridge: CambridgeUniversity Press.

    Bates, D. M., & Watts, D. G. (1988). Nonlinear regression analysis and itsapplications . New York: Wiley.

    Box, G. E. P., Jenkins, G. M. (1970).Time series analysis, forecasting and control .San Francisco: Holden-Day.

    Brock, W. A., Potter, S. M. (1993). Nonlinear time series and macroeconometrics.In: G.S. Maddala, C. R. Rao & H. R. Vinod,Eds., Handbook of Statistics , Vol. 11 , 195-229.Amsterdam: North-Holland.

    De Gooijer, J. G., & Kumar, K. (1992).Some recent developments in non-linear timeseries modeling, testing and forecasting.

    International Journal of Forecasting , 8, 135-156.

    Friedman, J. H., & Stuetzle, W. (1981).Projection pursuit regression. Journal of the

    American Statistical Association , 76 , 817-823.Gallant, A. R. (1987). Nonlinear

    statistical models . New York: Wiley.Gallant, A. R., & Tauchen, G. (1989).

    Semi nonparametric estimation of conditionallyconstrained heterogeneous processes: asset

    pricing application. Econometrica , 57 , 1091-1120.

    Granger, C. W. J., & Hallman, J. J.(1991a). Nonlinear transformations integratedtime series. Journal of Time Series Analysis , 12 ,207-224.

    Olowofeso, O. E. (2006). Varying co-efficient regression models for non-linear time

    series: Estimation and testing .13 th Meeting of the Forum for Interdisciplinary Mathematicaland Statistical Techniques.

    Priestley, M. (1988). Non-linear and

    non-stationary time series analysis . London:Academic Press.

    Robinson, P. M. (1983). Non-parametricestimation for time series models. Journal of Time Series Analysis , 4, 185-208.

    Sugihara, G., & May, R. M. (1990). Nonlinear forecasting as a way of distinguishingchaos from measurement error in time series.

    Nature , 344 , 734-741.Terasvirta, T., & Anderson, H. M.

    (1992). Modelling nonlinearities in businesscycles using smooth transition autoregressive

    models. Journal of Applied Econometrics , 7 ,S119-S136.

    Tsay, R. S. (1986). Nonlinearity tests for time series. Biometrika , 73 , 461-466.

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    153/277

    Journal of Modern Applied Statistical Methods Copyright 2010 JMASM, Inc. November 2010, Vol. 9, No. 2, 480-487 1538 9472/10/$95.00

    480

    Incidence and Prevalence for A Triply Censored Data

    Hilmi F. KittaniThe Hashemite University,

    Jordan

    The model introduced for the natural history of a progressive disease has four disease states which areexpressed as a joint distribution of three survival random variables. Covariates are included in the modelusing Coxs proportional hazards model with necessary assumptions needed. Effects of the covariates areestimated and tested. Formulas for incidence in the preclinical, clinical and death states are obtained, and

    prevalence formulas are obtained for the preclinical and clinical states. Estimates of the sojourn times inthe preclinical and clinical states are obtained.

    Key words: Progressive disease model, prevalence, incidence, trivariate hazard function, censored data, proportional hazards model, sojourn times, chronic habitu.

    IntroductionLouis, et al. (1978) introduced a natural historymodel for a progressive disease in a set of threearticles: Albert, Gertman and Louis (1978),Albert, Gertman, Louis and Liu(1978) andLouis, Albert and Heghinian (1978). This modelwas extended by Kittani (1995a). Clayton (1978)also developed a model for association for the

    bivariate case and Oakes (1982) made inferencesabout the association parameter in Claytonsmodel. Clayton and Cuzick (1985) introduced

    the bivariate survival function for two failuretimes and made inferences about the association parameter, . Kittani (1995a, 1996, 1997, 1997-1998) considered the model for the bivariatecase that is, a case with two failure times (X,T) by including covariates and by using Coxs

    proportional hazards model.The motivation for this research lies in

    the fact that it is necessary to identify a threedimensional survival function for three failuretimes (X, Y, D) with four disease states (diseasefree state, preclinical state, clinical state and

    death state). In the model, X is the age uponentering the preclinical state (tumor onset or first

    Hilmi Kittani is a Professor of Statistics in theCollege of Science, Department of Mathematics.Email: [email protected].

    heart attack), T is the age when entering theclinical state (symptoms first appear or secondheart attack) and D is the age upon entering thedeath state (dying of cancer or acute myocardialinfarction). Kittani (2010) considered estimatingthe parameters using nonparametric approach for a triply censored data.

    Background and AssumptionsAs in the Louis, et al. (1978) model, it is

    assumed that f XYZ(x,y,z,a) is continuous that is,

    X = Y = Z = is not allowed and Y and Z aretermed the sojourn times in the preclinical andclinical states respectively. The model proposed

    by Louis, et al. (1978) makes the assumption of no cohort effect, meaning that the distribution of the random variables (X, Y, Z) is independent of the age distribution A, or

    f XYZA (x, y, z, a) = f XYZ(x, y, z) f A(a)

    andf XYA (x, y, a) = f XY(x, y) f A(a)

    where f XYZ(x, y, z) is the joint pdf of (X, Y, Z) ,f XY(x,y) is the joint pdf of X, Y and f A(a) is the

    pdf of A( the age distribution of the subject population). In addition, a subject is a chronichabitu of the PCS if, for that subject, X < , Y= , for example, subject never leaves PCS.According to the model, there will be no chronichabitus of the PCS or CS because, if a subject

  • 8/2/2019 Empirical Characteristic Function Approach to Goodness of Fit Tests

    154/277

    KITTANI

    481

    lives long enough, then he/she will progress tothe next state eventually (Louis et al., 1978).

    The X, Y and Z axes are partitioned intoI, J and K intervals according to Chiang, et al.(1989) and Hollford (1976); they assumedconstant baseline hazards in each subinterval, 1i(x) = 1i, x Ii, 2j (y) = 2j , y I j and 3k (z)= 3k , z Ik in the i th, jth and k th intervalsrespectively. The hazard functions for the n th individual whose (X, Y, Z) values fall in thecube I ixI jxIk are modeled by assuming Coxs(1972) proportional hazards model and holds for each X, Y and Z in each respective I i, I j and I k interval.

    Assuming , and (regression parameters) for the covariate (p-dimensional)are constant (the same) for all intervals to beestimated . The hazard functions 1, 2 and 3 inthe I th, Jth and K th intervals for n th individualwhose observed (X, Y, Z) value is ( Xn, Yn, Zn)will be defined as

    'n(x ) e , x I1i n 1i n i

    (a , a ], (y )i i 1 2 j n'

    ne ,2 j

    =

    = +

    =

    andy I (b , b ], (z )n j j j 1 3k n

    'ne , z I3k n k

    (c ,c ].k k 1

    = +

    =

    = +

    Where 1i, 2j and 3k are baseline hazardfunctions associated with X, Y and Zrespectively. Assuming , and are constant(the same) regression parameters for thecovariate for all intervals and to be estimatedalong with the association parameter .

    The joint survival function for the threenon-negative random variables (X, Y, Z) given

    by Kittani (1995b) is:

    1

    -)()()( ]2[z),F(x, 321 ++= z y x eee y .(2.1)

    Where > 0, x > 0, y> 0, z > 0, and 1, 2, 3 are the cumulative hazard functions associatedwith X, Y and Z respectively. For example, tocompute 1i(x), which is the cumulative hazardfunction for the n th individual whose x value fallsin the i th interval (assuming a constant hazardover each interval) is as follows:

    k x

    n 10

    r 1 r n i

    (x ) (u)du1i

    i 1 '(a a ) (x a ) e1r 1i

    r 1

    +

    =

    = +

    = (2.2)

    where 2j(yn) and 3k (zn) are defined in a similar way. Thus, the joint density function (X, Y, Z) is

    1 32

    1 2

    3

    3)[ ( x ) ( y ) ( z )]

    y

    ( 1)( 2) (x ) (y)

    (z) e U1

    (-

    f(x, ,z)

    + +

    =

    + +

    (2.3)

    where > 0, x > 0, y > 0, z > 0, and 1, 2 and 3 are base line hazard functions associated with X,Y and Z respectively as

    31 2 (z)(x) (y)U e e e 2. = + +

    Kittani (1996) derived the likelihood functionfor the uncensored and censored cases in order to estimate the regression parameters bymaximizing the likelihood function, that is, thenth individual that generates data vector wn , andL(wn) is the likelihood function contribution for the n th individual as:

    .1 2 N N

    L(w ,w ,...,w ) L(w )nn 1

    = =

  • 8/2/201