
Local Regression with Meaningful Parameters

Rafael A. Irizarry*

Abstract

Local regression or loess has become a popular method for smoothing scatterplots and for nonparametric regression in general. The final result is a “smoothed” version of the data. In order to obtain the value of the smooth estimate associated with a given covariate a polynomial, usually a line, is fitted locally using weighted least squares. In this paper we will present a version of local regression that fits more general parametric functions. In certain cases, the fitted parameters may be interpreted in some way and we call them meaningful parameters. Examples showing how this procedure is useful for signal processing, physiological, and financial data are included.

KEY WORDS: Local Regression, Harmonic Model, Meaningful Parameters, Sound Analysis, Circadian Pattern.

1 Introduction

Local regression estimation is a method for smoothing scatterplots, $(x_i, y_i)$ for $i = 1, \dots, n$, in which the fitted value at, say, $x_0$ is the value of a polynomial fit to the data using weighted least squares where the weight given to $(x_i, y_i)$ is related to the distance between $x_i$ and $x_0$ (Cleveland 1979; Cleveland and Devlin 1988). Some theoretical work exists, for example Stone (1977) shows that estimates obtained using the local regression methods have desirable asymptotic properties. Recently, Fan (1992, 1993) has studied minimax properties of local linear regression. In this paper we will concentrate on the conceptual aspects.

*Rafael A. Irizarry is Assistant Professor, Department of Biostatistics, School of Hygiene and Public Health, Johns Hopkins University, Baltimore, MD 21205 (E-mail: [email protected]). The author would like to thank David Wessel, Adrian Freed, George Katinas, and Heather Bell for providing the data. The author would also like to thank the editor and the referee for suggestions that have improved this paper. Partial support for this work was provided by the Johns Hopkins University School of Public Health Faculty Innovation Fund.

Through the exploration of data representing sound signals produced by musical instruments, we present a version of local regression, which fits harmonic functions instead of polynomials. Local estimates of the frequency and amplitudes of harmonic components are obtained as well as a smooth version of the data. The fitted parameters, frequency and amplitudes, may have meaningful interpretations. When this is the case we will say that they are meaningful parameters. We explore the possibility of applying this procedure to other data that appear to be approximately periodic. Finally, we discuss the possibility of extending this version of local regression to other, more general, parametric functions where the parameters may be interpreted in some specific way. In the next section we examine an example where the fitted parameters contain useful information.

Sounds, data, and software related to the examples described in this paper can be obtained from the author’s web page: http://biosun01.biostat.jhsph.edu/~ririzarr

2 Local Regression

Local regression or loess is a smoothing technique designed to accommodate data for which we assume the observed $y_i$’s are the outcome of a random process defined by:

$$y_i = s(x_i) + \epsilon_i, \quad i = 1, \dots, n,$$

with $s(x)$ a “smooth” function of a covariate $x$ and $\epsilon_i$ an independent identically distributed random error. For the examples presented in this paper the covariate is time so we will use the notation $x_i = t_i$ to denote the time associated with the $i$th measurement $y_i$.

Loess is a numerical algorithm that prescribes how to compute an estimate $\hat{s}_0$ of $s(t_0)$ for a specific $t_0$. By repeating the procedure for all $t_0 = t_1, \dots, t_n$, we obtain a smooth, $\hat{s}_i$, $i = 1, \dots, n$. How do we compute an estimate of $s(t_0)$? Given $t_0$ we define a weight for each $t_i$, $i = 1, \dots, n$, using $w_i(t_0) = W(|t_i - t_0|, h)$, with $W$ a non-negative function decreasing with $d$ and $W(d, h) = 0$ for $d > h$. The statistical package S-Plus uses $W(d, h) = \{1 - (d/h)^3\}^3$ for $d < h$ (Cleveland et al. 1993). The span or window size $h$ controls the “smoothness” of the final estimate (Cleveland 1979; Cleveland and Devlin 1988).

Once the weights have been defined we find the $\hat{\beta}$ that minimizes

$$\sum_{i=1}^{n} w_i(t_0) \{ y_i - f(t_i; \beta) \}^2 \tag{1}$$

with $f(t; \beta)$ some polynomial, usually a line (S-Plus has two options, a line or a parabola). We estimate $s(t_0)$ with $\hat{s}_0 = f(t_0; \hat{\beta})$.
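The procedure above can be sketched in a few lines. The following is a minimal illustration, not the S-Plus implementation: it assumes NumPy, uses the tricube weight quoted above, takes $f$ to be a line centered at $t_0$ so that the fitted intercept is $\hat{s}_0$, and uses hypothetical function names (`tricube`, `local_linear_fit`, `loess_smooth`) and made-up data.

```python
import numpy as np

def tricube(d, h):
    """Tricube weight W(d, h) = {1 - (d/h)^3}^3 for d < h, and 0 otherwise."""
    d = np.asarray(d, dtype=float)
    return np.where(d < h, (1.0 - (d / h) ** 3) ** 3, 0.0)

def local_linear_fit(t, y, t0, h):
    """Minimize sum_i w_i(t0) {y_i - b0 - b1 (t_i - t0)}^2; return (b0, b1)."""
    w = tricube(np.abs(t - t0), h)
    X = np.column_stack([np.ones_like(t), t - t0])
    sw = np.sqrt(w)  # weighted least squares via row scaling
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def loess_smooth(t, y, h):
    """Repeat the local fit at every t_0 = t_1, ..., t_n; keep the value at t_0."""
    return np.array([local_linear_fit(t, y, t0, h)[0] for t0 in t])

# Made-up data: an exact line, which a local line should reproduce.
t = np.linspace(0.0, 1.0, 201)
y = 2.0 + 3.0 * t
smooth = loess_smooth(t, y, h=0.2)
```

Centering the line at $t_0$ is a convenience: the first coefficient is then the smooth value and the second is the local slope.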

In Figure 1 we see an example taken from Diggle, Liang, and Zeger (1994). The figure shows a scatter diagram of CD4 cell counts versus time since seroconversion for men with HIV. The first plot shows the fitted line when $h = 1$ year, for $t_0 = -2$, $0$, and $2$ years. The second plot shows the smooth $\hat{s}_i$, $i = 1, \dots, n$, the final result. Notice that this last plot shows a useful summary of the data: on average CD4 counts go down right after seroconversion and are close to being fixed before and after. However, in this particular case, the fitted slopes and intercepts contained in the $\hat{\beta}$s are not used to convey this information.

3 Local Harmonic Regression

Many time series data are modeled with the so-called signal plus noise models. Many methods have been proposed to analyze these types of data, see for example Brockwell and Davis (1996). In many cases the signal is the component of interest. Non-parametric methods, such as loess, have been used to obtain useful estimates of the signal in such models. In this section we examine a specific example where the local behavior of the signal can be well approximated with a harmonic function. This suggests that we use harmonic functions instead of polynomials for $f(t; \beta)$ in equation (1). We call this procedure local harmonic regression or lohess.

3.1 Musical Sound Signals

Sound can be represented as a real-valued function of time. This function can be sampled at a high enough rate so that the resulting discrete version is a good approximation of the continuous one. This permits one to study musical sounds as discrete time series. Physical modeling (Fletcher and Rossing 1991) suggests that many musical instruments’ sounds may be characterized by an approximately periodic signal plus stochastic noise model,

$$y_i = s(t_i) + \epsilon_i, \quad i = 1, \dots, n.$$

The time series is regular, so without loss of generality we will assume it has a one second duration and set $t_i = i/n$. Researchers are interested in separating these two elements of the sound and finding parametric representations with musical meaning (Rodet 1997).

[Figure 1: CD4 cell count since seroconversion for HIV infected men. Two panels; x-axis: Years since seroconversion.]

For many instruments one may argue that the local behavior of the signal produced when playing, say, a C-note is periodic, see Pierce (1992) for details. We may verify this from the data by computing and examining spectrograms. For data $y_i$, $i = 1, \dots, n$, we define the spectrogram at time $t_0 = i_0/n$ with

$$I(t_0, \lambda) = \frac{1}{2\pi(2M+1)} \left[ \left\{ \sum_{i=i_0-M}^{i_0+M} \cos(2\pi\lambda i/n)\, y_i \right\}^2 + \left\{ \sum_{i=i_0-M}^{i_0+M} \sin(2\pi\lambda i/n)\, y_i \right\}^2 \right].$$

Here $(2M+1)/n$ is some suitable window size. For any $i_0$, if the data $y_{i_0-M}, \dots, y_{i_0+M}$ is periodic with fundamental frequency $\lambda_0$ cycles per second the function $I(t_0, \lambda)$ will have peaks at the multiples of $\lambda_0$ (Bloomfield 1976). As an example, in Figure 2, we show spectrograms of the signals produced by a violin playing a C-note, an oboe playing a C-note, and a guitar playing a D-note, all of them sampled at 44100 measurements per second. Dark shades of grey represent large values. The dark horizontal bands at frequencies that are multiples of 260 Hz. (146 Hz. for the guitar) suggest that locally the signals are periodic with that fundamental frequency. The fact that for different times the darkness of these horizontal bands changes suggests that it is only locally that the signals are periodic. Similar results are obtained for other “harmonic” instruments.
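As a rough illustration of the spectrogram definition above, the sketch below evaluates $I(t_0, \lambda)$ for one $t_0$ on a grid of frequencies. It assumes NumPy, uses zero-based sample indices (which only shifts the phase), and the function name `local_periodogram`, the sampling rate of 8000, and the synthetic two-harmonic signal are all made up for the example.

```python
import numpy as np

def local_periodogram(y, i0, M, freqs, n):
    """I(t0, lam) at t0 = i0/n from the 2M+1 samples centered at index i0."""
    idx = np.arange(i0 - M, i0 + M + 1)
    seg = y[idx]
    phase = 2.0 * np.pi * np.outer(freqs, idx) / n  # one row per frequency
    c = (np.cos(phase) * seg).sum(axis=1)
    s = (np.sin(phase) * seg).sum(axis=1)
    return (c ** 2 + s ** 2) / (2.0 * np.pi * (2 * M + 1))

# Synthetic one-second signal: fundamental at 260 Hz plus its second harmonic.
n = 8000
i = np.arange(n)
y = np.cos(2 * np.pi * 260 * i / n) + 0.5 * np.cos(2 * np.pi * 520 * i / n)

freqs = np.arange(100.0, 1000.0, 10.0)
spec = local_periodogram(y, i0=4000, M=400, freqs=freqs, n=n)
peak = freqs[np.argmax(spec)]  # frequency with the largest spectrogram value
```

Repeating the call over a grid of `i0` values gives one column of the spectrogram image per time point; large values at 260 and 520 correspond to the dark horizontal bands described in the text.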

[Figure 2: Spectrograms of the sound signals of an oboe, a violin, and a guitar. Panels: Violin, Oboe, Guitar; x-axis: Time; y-axis: Frequency in Hz.]

Any periodic deterministic regular time series $x_1, \dots, x_n$ with period $p$ can be written as

$$x_i = a_0 + \sum_{k=1}^{p/2-1} \left\{ a_k \cos\left(\frac{2\pi k}{p} i\right) + b_k \sin\left(\frac{2\pi k}{p} i\right) \right\} + a_{p/2} \cos(\pi i)$$

for $i = 1, \dots, n$ (Bloomfield 1976). We define $\lambda_0 = n/p$ as the fundamental frequency and its units are cycles per every $n$ measurements. This suggests that for time series data that appear to be locally periodic we may use lohess, with

$$f(t_i; \beta) = \mu + \sum_{k=1}^{K} \left\{ a_k \cos(2\pi k \lambda t_i) + b_k \sin(2\pi k \lambda t_i) \right\} \tag{2}$$

$$\phantom{f(t_i; \beta)} = \mu + \sum_{k=1}^{K} \rho_k \cos(2\pi k \lambda t_i + \phi_k) \tag{3}$$

in equation (1). In this case the parameter we fit is $\beta = (\mu, \lambda, \rho_1, \dots, \rho_K, \phi_1, \dots, \phi_K)'$ with $\rho_k = \sqrt{a_k^2 + b_k^2}$ and $\phi_k = \arctan(-b_k/a_k)$ for $k = 1, \dots, K$, and $K \le n/(2\lambda)$ fixed. The parameters in $\beta$ are useful for interpretation. They are meaningful parameters as will be described in the next section.
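The step from the cosine and sine form (2) to the amplitude and phase form (3) follows from the cosine addition formula. Written out for a single harmonic, with the same symbols as above:

```latex
\begin{align*}
\rho_k \cos(2\pi k \lambda t_i + \phi_k)
  &= \rho_k \cos\phi_k \cos(2\pi k \lambda t_i)
   - \rho_k \sin\phi_k \sin(2\pi k \lambda t_i), \\
a_k &= \rho_k \cos\phi_k, \qquad b_k = -\rho_k \sin\phi_k, \\
\rho_k &= \sqrt{a_k^2 + b_k^2}, \qquad \tan\phi_k = -b_k / a_k .
\end{align*}
```

Since $\arctan$ only returns angles in $(-\pi/2, \pi/2)$, in computations the signs of $a_k$ and $-b_k$ are needed to place $\phi_k$ in the correct quadrant, for example with a two-argument arctangent.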

3.2 Meaningful Parameters

As described in Section 2, when using loess we compute a different $\hat{\beta}$ for each $t_i$, $i = 1, \dots, n$. We use the subscript $\hat{\beta}_i$, $i = 1, \dots, n$, to denote the different fitted values. In the case of loess, these fitted values can be used to give, for example, local slope estimates. However, in general, the smooth $\hat{s}$ itself conveys more information about the data than the fitted parameters $\hat{\beta}_i$’s. With lohess the fitted parameters $\hat{\beta}_i$ contain what could be interpreted as the local mean level $\hat{\mu}_i$, and local estimates of the fundamental frequency $\hat{\lambda}_i$ and amplitudes $\hat{\rho}_{i,k}$, $k = 1, \dots, K$, and phases $\hat{\phi}_{i,k}$, $k = 1, \dots, K$, of each sinusoidal component or harmonic. Another useful quantity is $\left( \sum_{k=1}^{K} \hat{\rho}_{i,k}^2 \right)^{1/2}$, which we refer to as the total amplitude. In the case of lohess, the fitted parameters may sometimes be considered more informative than the smooth $\hat{s}$. For some musical applications, the fitted parameters are the only quantities of interest and are used as parametric representations of sound. Timbre analysis, sound recreation, time-scale/pitch modification, and timbre morphing/modification are some examples of applications that use such parametric representations. See Irizarry (1998) for more details.

In order to give the reader a better idea of how these applications work and how they are useful in practice, we describe a timbre morphing example. Timbre morphing is the process of combining two or more sounds to create a new sound with “intermediate” timbre. Morphing can be used to create interesting sounds that are not found in nature, but that have the characteristics of naturally occurring sounds. An interesting example is the recreation of a castrato voice (Depalle et al. 1995). This was done to produce the sound-track for Farinelli, a film about the famous 18th century castrato. To simulate Farinelli’s voice, the voice of a countertenor and a soprano were analyzed with a procedure similar to lohess. The two meaningful parameters were combined in a way that produced a new parametric representation for a timbre similar to that of a castrato.

Now we consider an example of lohess applied to data presented in this paper (and available on the author’s web-page). We demonstrate how we can modify the timbre of the oboe sound of Figure 2 using the results from lohess. Timbre modification is the process of changing a sound by manipulation of musically meaningful parametric representations. To obtain a smooth $\hat{s}$ and fitted values of meaningful parameters for the oboe sound, we use lohess with a fixed $K$ and $h = 20$ milliseconds. In Figure 3 we see the resulting smooth version of the oboe sound signal, the residuals, and some of the estimated parameters. The fitted fundamental frequency and amplitudes of the first 5 sinusoidal components agree with what we hear: the instrumentalist is playing a vibrato and a tremolo.

The $\hat{\beta}_i$s provide a parametric representation for the harmonic part of the oboe sound, which we can construct using $f(t_i; \hat{\beta}_i)$. If we convert $f(t_i; \hat{\beta}_i)$ into sound we are able to hear a harmonic sound very similar to the original (indistinguishable from the original for most listeners). The residuals have a noisy sound (we must amplify the residuals in order to hear them). It is said that if one listens to an oboe very closely one can hear the voice of a soprano singing at an octave above the fundamental frequency of the oboe signal. We can modify the oboe sound through our parametric representation and bring out the hidden soprano. That is, by creating the sound $f(t_i; g(\hat{\beta}_i))$ where $g(\cdot)$ is a function that modulates the amplitudes of the even harmonics (the even harmonics are related to a sound an octave above the original). See Irizarry (1998) for more details. On the author’s web-page there is a music demo where one can hear all the sounds mentioned in this example.

[Figure 3: Smooth estimate of an oboe sound signal data with residuals and estimated meaningful parameters. Panels: Smooth Estimate; Residuals; Fitted fundamental frequency (Frequency in Hz.); Amplitudes of first 5 partials (Amplitude). x-axis: Time in seconds.]

If we perform the same analysis on a 3 second segment of the violin sound, we would then have two parametric representations, say $\hat{\beta}_i^{(\mathrm{oboe})}$, $\hat{\beta}_i^{(\mathrm{violin})}$, $i = 1, \dots, n$. A morph of these two sounds can be created by $f(t_i; g(\hat{\beta}_i^{(\mathrm{oboe})}, \hat{\beta}_i^{(\mathrm{violin})}))$ with $g(\cdot)$ a function that combines the parametric representation in order to obtain a satisfactory morph. In practice one must “tweak” to obtain a useful $g(\cdot)$, but in essence it defines new amplitudes by averaging the two originals, that is $\rho_{i,k}^{(\mathrm{morph})} = \alpha \hat{\rho}_{i,k}^{(\mathrm{oboe})} + (1 - \alpha) \hat{\rho}_{i,k}^{(\mathrm{violin})}$ with $0 < \alpha < 1$ determining if the morph sounds more like the oboe or the violin. See Depalle, García and Rodet (1995) for more details.
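The amplitude averaging just described is a simple convex combination. The sketch below is only an illustration of that one step, assuming NumPy; the function name `morph_amplitudes` is hypothetical, and the amplitude arrays (rows are time points $i$, columns are harmonics $k$) are made-up numbers rather than fitted values.

```python
import numpy as np

def morph_amplitudes(rho_oboe, rho_violin, alpha):
    """Return alpha * rho_oboe + (1 - alpha) * rho_violin, with 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * np.asarray(rho_oboe) + (1.0 - alpha) * np.asarray(rho_violin)

# Made-up amplitudes for two time points and two harmonics.
rho_oboe = np.array([[1.0, 0.5], [0.8, 0.4]])
rho_violin = np.array([[0.2, 0.9], [0.4, 0.6]])

# alpha = 0.25 weights the violin more heavily than the oboe.
rho_morph = morph_amplitudes(rho_oboe, rho_violin, alpha=0.25)
```

In a full morph the new amplitudes would be plugged back into equation (3) together with suitably combined frequencies and phases; that part depends on the “tweaking” mentioned above and is not sketched here.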

3.3 Standard error approximations

Loader (1999) presents finite sample results that may be used to obtain an approximation of the variance of our estimates. Irizarry (2000) presents asymptotic results related to local fitting of harmonic functions. In this section we present the relevant results in a heuristic fashion.

Loader (1999, Chapter 2) presents variance expressions based on finite sample calculations for local regression methods. As pointed out by the referee, the analogous form of the result presented in Loader (1999) for the non-linear problem is

$$\mathrm{cov}(\hat{\beta}_0) = \sigma^2 J_{1,0}^{-1} J_{2,0} J_{1,0}^{-1} \tag{4}$$

where

$$J_{j,0} = \sum_{i=1}^{n} w_i(t_0)^j \, \frac{\partial f(t_i; \beta)}{\partial \beta} \left\{ \frac{\partial f(t_i; \beta)}{\partial \beta} \right\}'.$$

Here we are assuming that the errors are independent and identically distributed with variance $\sigma^2$.
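To make the sandwich form in (4) concrete, the sketch below builds the two weighted cross-product matrices from the gradient of the harmonic model (2) and combines them. It is an illustration under stated assumptions, not the author's S-Plus code: it assumes NumPy, parameterizes $\beta$ as $(\mu, \lambda, a_1, \dots, a_K, b_1, \dots, b_K)$, and the names `harmonic_gradient` and `sandwich_cov`, the parameter values, and the error variance are all made up.

```python
import numpy as np

def tricube(d, h):
    """Tricube weight, zero for d >= h."""
    d = np.asarray(d, dtype=float)
    return np.where(d < h, (1.0 - (d / h) ** 3) ** 3, 0.0)

def harmonic_gradient(t, beta, K):
    """Rows are d f(t_i; beta) / d beta for model (2),
    with beta = (mu, lam, a_1..a_K, b_1..b_K)."""
    lam = beta[1]
    a, b = beta[2:2 + K], beta[2 + K:2 + 2 * K]
    k = np.arange(1, K + 1)
    ang = 2.0 * np.pi * np.outer(t, k) * lam  # shape (n, K)
    d_mu = np.ones_like(t)
    d_lam = ((-a * np.sin(ang) + b * np.cos(ang))
             * 2.0 * np.pi * k * t[:, None]).sum(axis=1)
    return np.column_stack([d_mu, d_lam, np.cos(ang), np.sin(ang)])

def sandwich_cov(G, w, sigma2):
    """sigma^2 * J1^{-1} J2 J1^{-1}, where Jj = sum_i w_i^j g_i g_i'."""
    J1 = G.T @ (w[:, None] * G)
    J2 = G.T @ (w[:, None] ** 2 * G)
    J1_inv = np.linalg.inv(J1)
    return sigma2 * J1_inv @ J2 @ J1_inv

# Made-up setting: a 20 ms window at 44100 samples per second, K = 2.
t = np.arange(882) / 44100.0
beta = np.array([0.0, 260.0, 1.0, 0.5, 0.3, 0.1])
G = harmonic_gradient(t, beta, K=2)
w = tricube(np.abs(t - 0.01), 0.01)
cov = sandwich_cov(G, w, sigma2=0.01)
```

With equal weights the expression collapses to the familiar $\sigma^2 (G'G)^{-1}$, which is a quick sanity check. One such matrix product and inversion is needed per $t_0$.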

For the harmonic regression case, Irizarry (2000) develops some asymptotic results. Consider that a locally periodic signal can be expressed as

$$y_i = f(t_i; \beta(t_i)) + \epsilon_i, \quad i = 1, \dots, n,$$

with $f(t; \beta(t))$ as in (2) but with the parameter $\beta$ now depending on $t$. Irizarry (2000) shows that if 1) the sample rate $n$ is high enough, 2) $\beta(t)$ is smooth enough, and 3) the fundamental frequency is large enough, then for any $t_0$, we can find small estimation windows $(t_0 - h/2, t_0 + h/2)$ with $\beta(t)$ well approximated by $\beta(t_0)$ within $(t_0 - h/2, t_0 + h/2)$ and with enough data points $nh$ receiving positive weight so that the estimates obtained with lohess have desirable asymptotic properties, such as asymptotic normality of the estimates. To get an idea of why, consider that conditions 2) and 3) give us that $f(t; \beta(t))$ is approximately harmonic within the estimation window, and condition 1) gives us many data points. We can use the results presented in Irizarry (2000) to obtain approximate marginal standard errors for our parameters. For example, for independent identically distributed errors we have

$$\mathrm{var}(\hat{\lambda}_0) \approx \frac{2 \hat{\sigma}_0^2}{n h^3} \, \frac{W_0^2 V_2 - 2 W_0 W_1 V_1 + W_1^2 V_0}{(W_0 W_2 - W_1^2)^2} \left\{ \sum_{k=1}^{K} k^2 \hat{\rho}_{0,k}^2 \right\}^{-1} \tag{5}$$

and

$$\mathrm{var}(\hat{\rho}_{0,k}) \approx \frac{2 \hat{\sigma}_0^2}{n h} \, \frac{V_0}{W_0^2}. \tag{6}$$

Here $\hat{\sigma}_0^2$ is a local estimate of the variance of the errors and $W_0$, $W_1$, $W_2$, $V_0$, $V_1$, and $V_2$ are constants defined by

$$W_j = \frac{1}{2h} \int_{-h}^{h} \left( \frac{u}{2h} + \frac{1}{2} \right)^j W(u, h) \, du \quad \text{and} \quad V_j = \frac{1}{2h} \int_{-h}^{h} \left( \frac{u}{2h} + \frac{1}{2} \right)^j W(u, h)^2 \, du \quad \text{for } j = 0, 1, 2.$$

Notice that $u/2h + 1/2$ takes $u$ from $[-h, h]$ to $[0, 1]$.

For the music signals, the sample rate ($n = 44100$ measurements per second) is large, the harmonic parameters can be considered to be slowly varying (as seen in the spectrogram) and the fundamental frequency (for the oboe sound it was 260 cycles per second) is large enough so that in relatively small windows ($h = 20$ milliseconds) we still have enough oscillation (about 5) to have the harmonic model as a good approximation and enough data points ($nh = 882$) to be able to use the large sample approximations. The normality approximation motivated by these asymptotics permits us to construct approximate confidence intervals. However, these confidence intervals should be interpreted with care as it is difficult to assess the assumption that “$\beta(t)$ is smooth enough”.
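The numbers quoted for the oboe example, and the kernel constants that enter (5) and (6), are easy to check numerically. The sketch below is a hedged illustration, assuming NumPy and a plain trapezoid rule; `kernel_constants` is a hypothetical helper, the error variance `sigma2` is made up, and the amplitude standard error uses (6) as we read it, with $|u|$ passed to the tricube weight.

```python
import numpy as np

n = 44100        # measurements per second
h = 0.020        # window size in seconds
lam = 260.0      # oboe fundamental in cycles per second

points_in_window = n * h      # data points used in each local fit
cycles_in_window = lam * h    # oscillations inside the window

def tricube(d, h):
    d = np.asarray(d, dtype=float)
    return np.where(d < h, (1.0 - (d / h) ** 3) ** 3, 0.0)

def kernel_constants(W, h, m=20001):
    """W_j and V_j, j = 0, 1, 2, by the trapezoid rule on [-h, h]."""
    u = np.linspace(-h, h, m)
    x = u / (2.0 * h) + 0.5          # maps [-h, h] onto [0, 1]
    w = W(np.abs(u), h)

    def integrate(f):
        return float(np.sum((f[1:] + f[:-1]) * np.diff(u)) / 2.0)

    Wj = [integrate(x ** j * w) / (2.0 * h) for j in range(3)]
    Vj = [integrate(x ** j * w ** 2) / (2.0 * h) for j in range(3)]
    return Wj, Vj

Wj, Vj = kernel_constants(tricube, h)

sigma2 = 0.01    # made-up local error variance
# Approximate standard error of a fitted amplitude from (6).
se_rho = np.sqrt(2.0 * sigma2 / (n * h) * Vj[0] / Wj[0] ** 2)
```

For equal weights both $W_0$ and $V_0$ equal 1, and (6) reduces to $2\sigma^2/(nh)$, the usual large-sample variance of an amplitude estimate from $nh$ points.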

Loader (1999, page 39) points out that asymptotic expansions “should never be considered an alternative to (finite sample results) for actually computing variance”. However, computing (4) may be computationally intensive, as we need to perform matrix inversions and multiplications for each $t_i$, $i = 1, \dots, n$. The asymptotic results presented in Irizarry (2000) provide approximations that are in closed form. Furthermore, one can show that when the number of cycles within the estimation window is large, (5) and (6) are equivalent to the expressions obtained with (4). In our experience with sound signals, when one has more than 5 cycles in the estimation window the results are practically equivalent, for 3-5 cycles they are usefully close. This means that even if we don’t use the asymptotic normality result, we may still find (5) and (6) to be useful approximations to the variance estimates obtained from (4). In the software section of the author’s web-page there is S-Plus code that compares (4) and (5)-(6).

For many data sets it is not convenient to assume constant variance. For example, for sound signals the noise is usually stronger during the beginning of the note, or what musicians call the attack. This is in agreement with the residual plot seen in Figure 3. However, (4), (5) and (6) permit a different estimate of $\sigma^2$ for each estimation window, but assume that within the estimation windows the errors are independent and identically distributed. We justify this with a heuristic argument: for our example, we believe the way the variance, $\sigma_i^2 = \mathrm{var}(\epsilon_i)$, $i = 1, \dots, n$, varies with $i$ is slow enough for us to consider data within an estimation window to be independent and identically distributed. We can use the $\hat{\sigma}_i^2$s as a smooth estimate of the variances $\sigma_i^2$. Furthermore, for many examples, especially for time series data, the independence assumption may not be appropriate. The results presented in Irizarry (2000) are developed for locally stationary noise. Extending the results presented in this paper to more general cases is of interest but is not discussed here.


3.4 Other data sets

Many time series data have trends for which one may argue that locally the behavior is periodic. Different methods have been proposed for estimating this trend, see for example Cleveland et al. (1990). In situations where the assumptions discussed in the previous section seem approximately true, we propose lohess as a useful alternative. In this section we briefly discuss the application of lohess to a biological data set.

Physiological measurements that vary in a circadian pattern in humans have been extensively studied, see for example Greenhouse, Kass, and Tsay (1987) and Wang and Brown (1996). Systolic blood pressure is an example: it is lowest during sleep and early morning and rises after awakening. Figure 4 shows measurements of systolic blood pressure taken for 208 days with an average of about 48 measurements taken per day on a patient that suffers from a condition dubbed circadian hyper-amplitude-tension (CHAT). This condition is diagnosed, for example, when the patient’s daily blood pressure range is larger than what is considered normal. CHAT may occur without an increase of over-all blood pressure, but is believed to put the patient at risk as well. Physicians are interested in how three different treatments affect the patient’s condition. From days 1-80 the patient was taking nifedipine, from days 81-125 blocalcin was added to the treatment, and during days 126-208 blocalcin was administered at night instead of in the morning. See Katinas et al. (1999) for details.

A plot of 15 days of data, also seen in Figure 4, suggests that these measurements in fact follow a circadian pattern. However, one may expect this pattern to change with time. For example, a patient may have smaller fluctuations if a treatment is being effective against CHAT. From just looking at the data it is hard to see if the patient’s blood pressure behaves differently in the three periods defined above.

The instrument that was used to take these measurements is a bit inaccurate and measurement error may be present. We use lohess with a fixed $K$ and $h = 7$ days to smooth the data and see if any useful information, not seen in the data plot, is obtained. In this case we have prior knowledge of what the fundamental frequency $\lambda$ should be, namely 1 cycle per day. In situations like this we may choose to set $\lambda$ to a fixed value instead of estimating it. In this case, lohess is equivalent to computing spectrogram values for $k\lambda$, $k = 1, \dots, K$. Furthermore, using (2) we can express (3) as a linear model and obtain approximate standard errors in a straightforward fashion.
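When $\lambda$ is fixed, model (2) is linear in $(\mu, a_1, \dots, a_K, b_1, \dots, b_K)$, so each local fit is an ordinary weighted least squares problem. A minimal sketch follows, assuming NumPy and tricube weights; the function names are hypothetical, `h` is used here as the tricube cutoff, and the blood-pressure-like series is synthetic and noise free, not the patient data.

```python
import numpy as np

def tricube(d, h):
    d = np.asarray(d, dtype=float)
    return np.where(d < h, (1.0 - (d / h) ** 3) ** 3, 0.0)

def harmonic_design(t, lam, K):
    """Columns: 1, cos(2 pi k lam t) for k = 1..K, sin(2 pi k lam t) for k = 1..K."""
    k = np.arange(1, K + 1)
    ang = 2.0 * np.pi * np.outer(t, k) * lam
    return np.column_stack([np.ones_like(t), np.cos(ang), np.sin(ang)])

def lohess_fixed_frequency(t, y, t0, h, lam, K):
    """Local weighted linear fit of model (2) with lam held fixed."""
    w = tricube(np.abs(t - t0), h)
    keep = w > 0
    X = harmonic_design(t[keep], lam, K)
    sw = np.sqrt(w[keep])
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
    mu, a, b = coef[0], coef[1:K + 1], coef[K + 1:]
    rho = np.hypot(a, b)          # amplitudes of form (3)
    phi = np.arctan2(-b, a)       # phases of form (3), quadrant aware
    fitted = harmonic_design(np.array([t0]), lam, K)[0] @ coef
    return mu, rho, phi, fitted

# Synthetic series: 48 measurements per day for 20 days, 1 cycle per day.
t = np.arange(0.0, 20.0, 1.0 / 48.0)
y = 120.0 + 15.0 * np.cos(2.0 * np.pi * t + 0.3)
mu, rho, phi, fitted = lohess_fixed_frequency(t, y, t0=10.0, h=3.5, lam=1.0, K=2)
```

Because the problem is linear, the usual weighted least squares formulas give standard errors for $a_k$ and $b_k$ directly.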

[Figure 4: 208 and 15 days of systolic blood pressure measurements taken from a heart patient. Smooth estimate of systolic blood pressure data with residuals and estimated meaningful parameters. Panels: Systolic Blood Pressure; Days 40 to 55; Smooth Estimate; Days 40 to 55 Smooth Estimate; Estimated Amplitude; Residuals. x-axis: Time in Days.]

In Figure 4, we see the smooth version and residuals for the data. The smooth version of the data clearly shows what could be considered to be a circadian pattern. When looking at the smooth $\hat{s}$ we see a drop in mean level at around day 80 and also notice that, between days 80 and 125, there seems to be a drop in total amplitude. After day 125 the amplitude estimates increase slightly. This may be interpreted as evidence that the second treatment is effective against CHAT. This is seen more clearly in the plots showing the fitted total amplitude. To explore the possibility that these changes are “real”, we plot marginal $\pm 2$ standard errors around the fitted amplitudes. The amplitudes plot seems to be the most informative plot in terms of showing how CHAT is being affected by the treatments.

In this case the $\pm 2$ standard errors should not be considered point-wise confidence intervals, since it is hard to argue that the assumptions needed for the asymptotic results are met. For example, we used a window size of 7 days which contains about 7 cycles, but only 280 data points. Within this window the behavior is not as well approximated by a harmonic function as in the music example. If we believe the shapes of the oscillations during days that are close together are more similar than for days that are far apart, then using smaller window sizes is an alternative we could consider to improve the local approximation. However, in doing so we would reduce the number of data points and/or number of cycles in the window, making the variance approximations inappropriate. In any case, lohess has been useful in terms of finding that something different is occurring during days 80 to 125.

3.5 Computational Issues

The S-Plus functions written for lohess are available on the software section of the author’s web-page. In this section we discuss some of the problems that the users may encounter.

In practice, fitting equation (2) is more convenient than fitting equation (3) because $\lambda$ is the only non-linear parameter and we can use minimization routines that take advantage of this, see Bates and Lindstrom (1986). The minimization routine used by lohess is the S-Plus function nlfit with the algorithm option set to "plinear". The function needs starting values for the non-linear parameter. In practice, we have found that it is important that this value is close to the “true” frequency. Notice that every multiple of the true fundamental frequency is a local minimum of (1).
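The partially linear structure can be illustrated with a simple profile search: for each candidate $\lambda$ the linear parameters are found by weighted least squares, and the $\lambda$ with the smallest weighted residual sum of squares is kept. This is a swapped-in grid search for illustration, not the "plinear" algorithm used by the S-Plus code; it assumes NumPy, and the function names, the synthetic signal, and the grid around a starting value of 260 are all made up.

```python
import numpy as np

def tricube(d, h):
    d = np.asarray(d, dtype=float)
    return np.where(d < h, (1.0 - (d / h) ** 3) ** 3, 0.0)

def harmonic_design(t, lam, K):
    k = np.arange(1, K + 1)
    ang = 2.0 * np.pi * np.outer(t, k) * lam
    return np.column_stack([np.ones_like(t), np.cos(ang), np.sin(ang)])

def profile_rss(lam, t, y, w, K):
    """Weighted residual sum of squares of (1) after solving for the
    linear parameters of model (2) at a fixed lam."""
    X = harmonic_design(t, lam, K)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ coef
    return float(np.sum(w * resid ** 2))

def fit_frequency(t, y, w, K, lam_grid):
    """Pick the grid frequency with the smallest profiled criterion."""
    rss = np.array([profile_rss(lam, t, y, w, K) for lam in lam_grid])
    return lam_grid[np.argmin(rss)], rss

# Synthetic 20 ms window at 44100 samples per second, true frequency 260.
t = np.arange(882) / 44100.0
y = np.cos(2 * np.pi * 260 * t) + 0.4 * np.cos(2 * np.pi * 520 * t + 1.0)
w = tricube(np.abs(t - 0.01), 0.01)

# The grid is deliberately kept close to the starting value.
lam_grid = np.arange(240.0, 281.0, 1.0)
lam_hat, rss = fit_frequency(t, y, w, K=2, lam_grid=lam_grid)
```

Keeping the search near a good starting value matters for the reason given above: the criterion has other local minima away from the true fundamental frequency.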


If in (2) $2K + 1$ is not much less than the number of points considered in the local estimation, we are fitting a “saturated model”. The final estimate may not be much smoother than the original signal. To avoid this, one possibility is to choose a $K$, smaller than $p/2$, that achieves the desired smoothness given a particular span $h$. In practice one has to balance between $K$ and $h$ to obtain a reasonable estimate. For the examples presented in this paper, appropriate values were found by using our scientific intuition combined with trial and error. See Irizarry (2001) for some data driven procedures for choosing $K$ and $h$.

4 Extensions

The idea of using non-polynomial local models has been explored by, for example, Hjort and Jones (1996) and Loader (1999). Extensive development of theoretical properties is discussed in their work. In this paper, we have developed practical aspects much more deeply and presented an example in sound analysis where the procedure seems to be successful. We also demonstrated how the fitted parameters have meaningful interpretations in these examples. We believe that this idea can be useful in the analyses of other data sets where functions other than polynomial or harmonic functions may be used for $f(t; \beta)$ in equation (1).

For example, if we look at any 30-year period of the daily closing prices of the DJIA, the data appear to be exponentially growing. However, different periods have different growth rates. By using local regression with a span of 30 years and $s(t; \boldsymbol{\beta}) = \beta_0 \exp(\beta_1 t)$ in equation (1), we may obtain a smooth version of these data. In this case the parameters are also meaningful. For example, we can view $\hat{\beta}_1$ as a measure of average growth rate in the span being considered. In Figure 5 we show the smooth version of the data, the residuals, and fitted meaningful parameters. By looking at the fitted growth rate plot, we can clearly see the recessions of the 30’s and 70’s, the roaring 20’s, and the booming 80’s. Furthermore, notice that if we compute the average over time of the smooth yearly growth estimate, we see that it is about 7%-8%, the number given by financial “experts” as the rate at which one should expect stocks to grow.
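A local exponential fit of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used for Figure 5: the weights are tricube weights over a window of width `span` centered at `t0`, the model is written as `b0 * exp(b1 * (t - t0))` (a reparameterization that leaves the growth rate `b1` unchanged and avoids overflow for calendar years), and `b1` is found by a golden-section search on a bracket assumed to contain it. Because the model is linear in `b0` for fixed `b1`, `b0` is profiled out in closed form. The function names are hypothetical.

```python
import math

def tricube(u):
    """Tricube weight; zero outside |u| < 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def profile_rss(b1, ts, ys, ws):
    """Weighted RSS for b0 * exp(b1 * t) with b0 set to its best value given b1."""
    es = [math.exp(b1 * t) for t in ts]
    b0 = (sum(w * y * e for w, y, e in zip(ws, ys, es))
          / sum(w * e * e for w, e in zip(ws, es)))
    rss = sum(w * (y - b0 * e) ** 2 for w, y, e in zip(ws, ys, es))
    return rss, b0

def local_exp_fit(t0, times, values, span, lo=-1.0, hi=1.0, tol=1e-8):
    """Fit b0 * exp(b1 * (t - t0)) near t0; returns (b0, b1).

    Assumes at least a few observations fall inside the window and that the
    growth rate lies in [lo, hi].
    """
    half = span / 2.0
    pts = [(t - t0, y, tricube((t - t0) / half)) for t, y in zip(times, values)]
    pts = [p for p in pts if p[2] > 0.0]
    ts, ys, ws = zip(*pts)
    g = (math.sqrt(5.0) - 1.0) / 2.0  # golden-section ratio
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if profile_rss(c, ts, ys, ws)[0] < profile_rss(d, ts, ys, ws)[0]:
            b = d
        else:
            a = c
    b1 = (a + b) / 2.0
    b0 = profile_rss(b1, ts, ys, ws)[1]
    return b0, b1

# Synthetic monthly series growing at a continuous rate of 0.07 per year.
times = [1950.0 + i / 12.0 for i in range(12 * 40)]
values = [100.0 * math.exp(0.07 * (t - 1950.0)) for t in times]
b0, b1 = local_exp_fit(1970.0, times, values, span=30.0)
print(b1, 100.0 * (math.exp(b1) - 1.0))  # growth rate and implied 1-year growth in %
```

Repeating the fit over a sequence of `t0` values gives both a smooth curve (the fitted `b0` at each `t0`) and a growth-rate curve (the fitted `b1`).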

[Figure 5 has four panels, each plotted against year from 1940 to 2000: “Data and fit” (closing price), “Residuals” (percentage away from expected price), “Estimated parameter” (estimated growth parameter), and “Meaningful parameter” (1 and 10-year growth in %).]

Figure 5: Smooth estimate of Dow Jones industrial average data with residuals and estimated meaningful parameters.

Notice that the results obtained from this analysis may not be very different from those obtained by fitting local lines to a log transform of the data. Taking log transforms is usually considered more useful than the exponential model with additive errors developed here. However, we believe that the analysis presented in this section serves as an example of how loess may be modified, and we hope it motivates readers to think of how it can be used in other applications.
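For comparison, the log-transform alternative mentioned above can be sketched in a few lines: fit a weighted least-squares line to the logarithm of the series around each point and read the slope as the growth rate. The tricube weights, the window of width `span` centered at `t0`, and the function names are assumptions made for this illustration.

```python
import math

def tricube(u):
    """Tricube weight; zero outside |u| < 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_log_line(t0, times, values, span):
    """Weighted least-squares line fitted to log(values) near t0.

    Returns (level, rate): exp(intercept) at t0 and the slope of the line,
    which plays the role of the growth rate under multiplicative errors.
    Assumes positive values and several observations inside the window.
    """
    half = span / 2.0
    pts = [(t - t0, math.log(y), tricube((t - t0) / half))
           for t, y in zip(times, values)]
    pts = [p for p in pts if p[2] > 0.0]
    sw = sum(w for _, _, w in pts)
    xbar = sum(w * x for x, _, w in pts) / sw
    zbar = sum(w * z for _, z, w in pts) / sw
    sxx = sum(w * (x - xbar) ** 2 for x, _, w in pts)
    sxz = sum(w * (x - xbar) * (z - zbar) for x, z, w in pts)
    rate = sxz / sxx
    intercept = zbar - rate * xbar
    return math.exp(intercept), rate

# Synthetic monthly series growing at a continuous rate of 0.07 per year.
times = [1950.0 + i / 12.0 for i in range(12 * 40)]
values = [100.0 * math.exp(0.07 * (t - 1950.0)) for t in times]
print(local_log_line(1970.0, times, values, span=30.0))
```

On noiseless exponential data the two approaches agree. They differ on real data because the line on the log scale assumes multiplicative errors, whereas the exponential model in this section assumes additive errors.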

5 Concluding Remarks

In this paper we have seen how for certain data sets local regression can be modified in order to produce meaningful parameters. The resulting procedure may be used for smoothing data and for obtaining useful information about the local behavior of the data. In particular, we introduced lohess and showed how it can be useful for smoothing time series data with approximately periodic trends. It is important to note that these procedures are intended as exploratory data analysis tools. We do not suggest that the fitted parametric functions are models for the data. However, scientists have found the plots resulting from the procedure presented in this paper insightful, a fact which has led to more detailed analyses.

References

Bates, D. and Lindstrom, M. (1986). Nonlinear least-squares with conditionally linear parameters. Proceedings, Statistical Computing Section, pages 152–157.

Bloomfield, P. (1976). Fourier Analysis of Time Series: An Introduction. John Wiley and Sons, New York.

Brockwell, P. J. and Davis, R. A. (1996). Introduction to Time Series and Forecasting. Springer-Verlag.

Cleveland, R. B., Cleveland, W. S., McRae, J. E., and Terpenning, I. (1990). STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6:3–33.

Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74(368):829–836.

Cleveland, W. S. and Devlin, S. J. (1988). Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association, 83:596–610.

Cleveland, W. S., Grosse, E., and Shyu, W. M. (1993). Local regression models. In Chambers, J. M. and Hastie, T. J., editors, Statistical Models in S, chapter 8, pages 309–376. Chapman & Hall, New York.

Depalle, P., García, G., and Rodet, X. (1995). The recreation of a castrato voice, Farinelli’s voice. In IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, pages 242–245, New Paltz, NY. IEEE.

Diggle, P. J., Liang, K.-Y., and Zeger, S. L. (1994). Analysis of Longitudinal Data. Oxford Science Publications.

Fan, J. (1992). Design-adaptive nonparametric regression. Journal of the American Statistical Association, 87:998–1004.

Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Annals of Statistics, 21:196–216.

Fletcher, N. H. and Rossing, T. D. (1991). The Physics of Musical Instruments. Springer-Verlag, New York.

Greenhouse, J. B., Kass, R. E., and Tsay, R. S. (1987). Fitting nonlinear models with ARMA errors to biological rhythm data. Statistics in Medicine, 6:167–183.

Irizarry, R. A. (2000). Asymptotic distribution of estimates for a time-varying parameter in a harmonic model with multiple fundamentals. Statistica Sinica. In press.

Irizarry, R. A. (2001). Information criteria for model selection in local likelihood estimation. Journal of the American Statistical Association. To appear.

Katinas, G., Cornelissen, G., Irizarry, R., Schaffer, E., Homans, D., Rhodus, N., Schwartzkopff, O., Siegelova, J., Palat, M., and Halberg, F. (1999). Case report of coexisting elderly mesor-hypertension and circadian amplitude-hypertension (CHAT). Geronto-Geriatrics, 2:68–86.

Pierce, J. (1992). The Science of Musical Sound. Freeman, New York.

Rodet, X. (1997). Musical sound signals analysis/synthesis: Sinusoidal+residual and elementary waveform models. In Proceedings of the IEEE Time-Frequency and Time-Scale Workshop (TFTS’97), Coventry, UK. IEEE.

Stone, C. J. (1977). Consistent nonparametric regression. The Annals of Statistics, 5:595–620.

Wang, Y. and Brown, M. B. (1996). A flexible model for human circadian rhythms. Biometrics, 52:588–596.
