an example
DESCRIPTION
An Example. The question Data Analyses Conclusions. Monthly mean air temperature at Recife 1953-1962. The question: What is the relationship between temperatures in Recife and El Nino?. Objective s - to layout analyses - to explore the data for surprises - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/1.jpg)
An Example.
The question
Data
Analyses
Conclusions
![Page 2: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/2.jpg)
Monthly mean air temperature at Recife 1953-1962
The question:
What is the relationship between temperatures in Recife and El Nino?
Objectives
- to layout analyses
- to explore the data for surprises
- predicted values
- signal + noise?
- ...
![Page 3: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/3.jpg)
Finding the data.
Google with various key words: temperature, Recife, ...
"Eventually lead" to:
cdiac.ornl.gov/ftp/ndp041
Carbon dioxide information analysis center!
Had to discover Recife Curado station id - 3068290000
(Looked at Brasil sites too, but that didn't turn up the data)
Years 1949-1988
Searched an inappropriate ste for a long time
![Page 4: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/4.jpg)
30682900001941 274 279 268 267 260 250 246 245 256 260 262 266
30682900001942 273 268 270 270 256 247 236 233 252 260 270 265
30682900001943 270 270 273 261 253 245 236 238 247 260 264 268
30682900001944 269 272 275 263 254 247 239 232 241 256 267 273
30682900001945 278 268 278 271 256 240 236 243 251 258 268 267
30682900001946 268 277 271 259 258 247 247 247 249 257 261 266
30682900001947 269 270 268-9999 253 246 244 245 249 262 268-9999
30682900001948 273 271 270 269 255 249 243 240 248-9999 264 270
30682900001949 272 274 278 266 252 246 240 236 250 261 265 271
30682900001950 273 280 269 251 248 243 235 234 248 258 264-9999
30682900001951 269 267 275 265 254 238 233 238 247 259 263-9999
30682900001952 272 278 267 262 252 245 240 238 253 256 263 268
The web data. monthly
notice -9999 replace by NA file: recifecurado
![Page 5: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/5.jpg)
How to handle missing values? Interpolate? Model? ...?
junk<-scan("recifecurado")
junk1<-matrix(junk,ncol=48)
junk2<-junk1[2:13,] # years in first row
series<-c(junk2)/10 # for degrees centigrade
length(series[is.na(series)]) #17 - need to understand missingness
Interpolation
series1<-series
for(i in 2:(length(series)-1)){if(is.na(series[i]))series1[i]<-.5*series[i-1] +.5*series[i+1]}
![Page 6: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/6.jpg)
plot(xaxis,series1,type="l",xlab="year",ylab="mean temp (degrees C)",las=1)
title("Mean monthly temperatures 1949-88 Recife Curado")
abline(h=mean(series1))
![Page 7: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/7.jpg)
There is seasonality and variability
Restricted range in mid-sixties - nonconstant mean level?
ylim<-range(series1)
par(mfrow=c(2,1))
plot(lowess(xaxis,series1),type="l",ylim=ylim,xlab="year",ylab="degrees C",main="Smoothed Recife series")
abline(h=mean(series1))
junk20<-lowess(xaxis,series1)
plot(xaxis,series1-junk20$y,type="l",xlab="year",ylab="degrees C",main="Residuals")
abline(h=mean(series1-junk20$y))
![Page 8: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/8.jpg)
![Page 9: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/9.jpg)
par(mfrow=c(1,1))
acf(series1,las=1,xlab="lag(mo)",ylab="",main="autocorrelation recife temperatures",lag.max=50,ylim=c(-1,1))
![Page 10: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/10.jpg)
More confirmation of period 12
Remember the interpretation of the error lines
Note that nearby values are highly correlated
![Page 11: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/11.jpg)
spectrum(series1,xlab="frequency (cycles/month)",las=1)
![Page 12: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/12.jpg)
Note peaks at frequency 1/12 and harmonics
Further confirmation of period 12
Note log scale for y-axis
Note vertical line in upper right
Gives uncertainty
![Page 13: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/13.jpg)
What is the shape of the seasonal?
junk4<-matrix(series1,nrow=12)
junk5<-apply(junk4,1,mean)
plot(junk5,type="l",las=1)
abline(h=mean(junk5))
![Page 14: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/14.jpg)
Cooler in July-Aug
Southern Hemisphere
Uncertainty?
![Page 15: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/15.jpg)
Cooler in July-August. Southern hemisphere
Part of a longer cycle? El Nino explanatory?
After "removing" trend middle has been pulled up
Need uncertainties
Back to original data
![Page 16: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/16.jpg)
Remove seasonal
series2<-series1
for(i in 1:48){
for(j in 1:12){
series2[(i-1)*12+j]<-series1[(i-1)*12+j]-junk5[j]
}
}
par(mfrow=c(2,1))
plot(xaxis,series2,type="l",xlab="year",ylab="residual",main="Series after removing seasonal",las=1)
abline(h=0)
ylim<-range(series2)
plot(xaxis,series1-mean(series1),type="l",xlab="year",ylab="degreesC",main="Mean removed series",las=1,ylim=ylim)
abline(h=mean(series1-mean(series1)))
![Page 17: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/17.jpg)
original variance 1.342 adjusted .248
![Page 18: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/18.jpg)
par(mfrow=c(2,1))
acf(series2,lag.max=50,las=1,xlab="lag (mo)",main="Ajusted by removing monthly means",las=1)
acf(diff(series1,lag=12),lag.max=50,xlab="lag (mo)",main="Order 12 differenced series")
![Page 19: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/19.jpg)
Frequency domain analysis.
par(mfrow=c(2,1))
junk9<-spec.pgram(series1,taper=0,detrend=F,demean=F,spans=5,plot=F)
ylim<-range(junk9$spec)
junk9<-spec.pgram(series1,taper=0,detrend=F,demean=F,spans=5,xlab="frequency (cycles/mo)",las=1,main="Original series")
junk10<-spec.pgram(series2,taper=0,detrend=F,demean=F,spans=5,ylim=ylim,main="Monthly means removed",las=1)
![Page 20: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/20.jpg)
Work remains on seasonal
Residual "not" white noise
![Page 21: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/21.jpg)
Time domain
distributions
![Page 22: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/22.jpg)
Parametric model. SARIMA ?
Thinking about prediction, consider
Yt = αYt-1 + βYt-12 + Nt
with some ARMA for Nt
Check seasonal residuals for normality
Hope to end up with white noise
![Page 23: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/23.jpg)
Call:
arima(x = series1, order = c(1, 0, 1), seasonal = list(order = c(1, 0, 1), period = 12))
Coefficients:
ar1 ma1 sar1 sma1 intercept
0.7604 -0.3344 0.9990 -0.9201 25.6117
s.e. 0.0610 0.0968 0.0007 0.0220 0.6100
sigma^2 estimated as 0.1771: log likelihood = -337.48, aic = 686.97
Junk<-arima(series1,order=c(1,0,1),seasonal=list(order=c(1,0,1),period=12))
![Page 24: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/24.jpg)
![Page 25: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/25.jpg)
tsdiag(Junk,gof.lag=25)
![Page 26: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/26.jpg)
![Page 27: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/27.jpg)
Junk<-arima(series1,order=c(1,0,1),seasonal=list(order=c(1,0,1),period=12))
postscript(file="recifeplots1a.ps",paper="letter",hor=T)
Junk2<-predict(Junk,n.ahead=24)
Junk3<-c(series1,Junk2$pred)
Junk3a<-c(rep(0,576),2*Junk2$se)
Junk3b<-c(rep(0,576),-2*Junk2$se)
Junk4a<-Junk3+Junk3a;Junk4b<-Junk3+Junk3b
ylim<-range(Junk4a,Junk4b)
par(mfrow=c(1,1))
xaxis1<-1941+(1:length(Junk3)/12)
plot(xaxis1[xaxis1>1983],Junk4a[xaxis1>1983],type="l",las=1,ylim=ylim,col="red",xlab="year",ylab="degrees C",main="Data + predictions")
lines(xaxis1[xaxis1>1983],Junk4b[xaxis1>1983],col="red")
lines(xaxis1[xaxis1>1983],Junk3[xaxis1>1983],col="blue")
lines(xaxis[xaxis>1983],series1[xaxis>1983])
![Page 28: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/28.jpg)
Two series
Bivariate case {Xt, Yt} - jointly distributed
Linear time invariant / transfer function model
tkktkt NXhY
nonparametric/parametric approaches
![Page 29: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/29.jpg)
Southern Oscillation Index
El Niño: global coupled ocean-atmosphere phenomenon.
The Pacific ocean signatures, El Niño and La Niña are important temperature fluctuations in surface waters of the tropical Eastern Pacific Ocean
![Page 30: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/30.jpg)
Southern Oscillation reflects monthly or seasonal fluctuations in the air pressure difference between Tahiti and Darwin
www.cpc.ncep.noaa.gov/data/soi
![Page 31: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/31.jpg)
junk<-scan("recifecurado")
junk1<-matrix(junk,ncol=48)
junk6<-junk1[1,]
junk1<-junk1[,junk6>1950]
junk2<-junk1[2:13,]
series<-c(junk2)/10
length(series[is.na(series)]) #13
xaxis<-1951+(1:length(series)/12)
series1<-series
junk4<-matrix(series1,nrow=12)
junk5<-apply(junk4,1,mean)
for(i in 2:(length(series)-1)){if(is.na(series[i]))series1[i]<-.5*series[i-1]+.5*series[i+1]}
junk4<-matrix(series1,nrow=12)
junk5<-apply(junk4,1,mean)
series2<-series1
for(i in 1:38){
for(j in 1:12){
series2[(i-1)*12+j]<-series1[(i-1)*12+j]-junk5[j]}}
![Page 32: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/32.jpg)
kunk<-scan("SOIa.dat")
kunk1<-matrix(kunk,ncol=58); kunk6<-kunk1[1,]
kunk1<-kunk1[,kunk6<1989]
kunk2<-kunk1[2:13,]
teries<-c(kunk2)
length(teries[is.na(teries)]) #0
teries1<-teries; teries2<-teries1
postscript(file="recifeplots3.ps",paper="letter",hor=T)
par(mfrow=c(2,1))
plot(xaxis,series2,type="l",las=1,xlab="year",ylab="",main="Seasonally adjusted Recife temps")
plot(xaxis,teries2,type="l",las=1,xlab="year",ylab="",main="Southern Oscillation Index")
![Page 33: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/33.jpg)
postscript(file="recifeplots2.ps",paper="letter",hor=F)
par(mfrow=c(1,1))
acf(cbind(series2,teries2))
![Page 34: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/34.jpg)
![Page 35: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/35.jpg)
junk10<-cbind(series2,teries2)
junk11<-spec.pgram(junk10,plot=F,taper=0,detrend=F,demean=F,spans=11)
par(mfcol=c(2,2))
plot(junk11$freq,10**(.1*junk11$spec[,2]),log="y",main="SOIspectrum", xlab="frequency", ylab="", las=1,type="l")
plot(junk11$freq,junk11$coh,main="Coherence",xlab="frequency",ylab="",las=1,ylim=c(0,1),type="l")
junkh<-1-(1-.95)**(1/(.5*junk11$df-1))
abline(h=junkh)
plot(junk11$freq,10**(.1*junk11$spec[,1]),log="y",main="Seasonally corrected Recife spectrum",xlab="frequency", ylab="",las=1, type="l")
![Page 36: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/36.jpg)
SARIMAX
Yt = αYt-1 + βYt-12 + γXt + Nt
ar1 ma1 sar1 sma1 intercept teries1
0.7885 -0.3717 0.9996 -0.9474 25.5572 -0.0255
s.e. 0.0610 0.1014 0.0006 0.0321 0.6403 0.0149
sigma^2 estimated as 0.1792: log likelihood = -275.68, aic = 565.35
Junk1<-arima(series1,order=c(1,0,1),seasonal=list(order=c(1,0,1),period=12),xreg=teries1)
![Page 37: An Example](https://reader035.vdocument.in/reader035/viewer/2022070409/568144a4550346895db16b3d/html5/thumbnails/37.jpg)
The answer to the question:
There is a hint of a linear time invariant relationship.