![Page 1: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/1.jpg)
Environmental Data Analysis with MatLab
Lecture 24:
Confidence Limits of Spectra; Bootstraps
![Page 2: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/2.jpg)
Housekeeping
This is the last lecture
The final presentations are next week
The last homework is due today
![Page 3: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/3.jpg)
SYLLABUS
Lecture 01 Using MatLab
Lecture 02 Looking At Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectral Density
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps
![Page 4: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/4.jpg)
purpose of the lecture
continue
develop a way to assess the significance of a spectral peak
and
develop the Bootstrap Method of determining confidence intervals
![Page 5: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/5.jpg)
Part 1
assessing the confidence level of a spectral peak
![Page 6: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/6.jpg)
what does confidence in a spectral peak mean?
![Page 7: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/7.jpg)
one possibility
indefinitely long phenomenon
you observe a short time window (looks “noisy” with no obvious periodicities)
you compute the p.s.d. and detect a peak
you ask
would this peak still be there if I observed some other time window?
or did it arise from random variation?
![Page 8: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/8.jpg)
example
[Figure: time series d versus t (t from 0 to 1000), with the a.s.d. versus frequency f for four time windows, labeled Y, N, N, N]
![Page 9: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/9.jpg)
[Figure: time series d versus t (t from 0 to 1000), with the a.s.d. versus frequency f for four time windows, labeled Y, Y, Y, Y]
![Page 10: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/10.jpg)
Null Hypothesis
The spectral peak can be explained by random variation in a time series that consists of nothing but random noise.
![Page 11: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/11.jpg)
Easiest Case to Analyze
Random time series that is:
Normally-distributed
uncorrelated
zero mean
variance that matches power of time series under consideration
![Page 12: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/12.jpg)
So what is the probability density function p(s²) of points in the power spectral density s² of such a time series?
![Page 13: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/13.jpg)
Chain of Logic, Part 1
The time series is Normally-distributed
The Fourier Transform is a linear function of the time series
Linear functions of Normally-distributed variables are Normally-distributed, so the Fourier Transform is Normally-distributed too
For a complex FT, the real and imaginary parts are individually Normally-distributed
![Page 14: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/14.jpg)
Chain of Logic, Part 2
The time series has zero mean
The Fourier Transform is a linear function of the time series
The mean of a linear function is the function of the mean value, so the mean of the FT is zero
For a complex FT, the means of the real and imaginary parts are individually zero
![Page 15: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/15.jpg)
Chain of Logic, Part 3
The time series is uncorrelated
The Fourier Transform has [GᵀG]⁻¹ proportional to I
So by the usual rules of error propagation, the Fourier Transform is uncorrelated too
For a complex FT, the real and imaginary parts are uncorrelated
![Page 16: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/16.jpg)
Chain of Logic, Part 4
The power spectral density is proportional to the sum of squares of the real and imaginary parts of the Fourier Transform
The sum of squares of two uncorrelated Normally-distributed variables with zero mean and unit variance is chi-squared distributed with two degrees of freedom.
Once the real and imaginary parts of the Fourier Transform are scaled to have unit variance, the correspondingly scaled p.s.d. is chi-squared distributed with two degrees of freedom.
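The last step of the chain can be checked numerically. The sketch below is a Monte Carlo illustration, not the lecture's own code: the number of trials is an assumed value, and the closed form P(x) = 1 - exp(-x/2) for two degrees of freedom is used in place of a toolbox function.

```matlab
% Monte Carlo check that x = u^2 + v^2 is chi-squared with 2 degrees of freedom
N = 100000;                 % number of trials (assumed value)
u = randn(N,1);             % stands in for the scaled real part of the FT
v = randn(N,1);             % stands in for the scaled imaginary part of the FT
x = u.^2 + v.^2;            % sum of squares of two unit-variance Normal variables

% for 2 degrees of freedom the cumulative distribution is P(x) = 1 - exp(-x/2),
% so the 95% point is x95 = -2*log(1-0.95)
x95 = -2*log(1-0.95);

fracbelow = sum(x < x95)/N; % empirical fraction of samples below the 95% point
xmean = mean(x);            % should be close to 2, the number of degrees of freedom

fprintf('x95 = %.3f, fraction below = %.4f, mean = %.3f\n', x95, fracbelow, xmean);
```

The fraction below the 95% point should come out near 0.95 and the mean near 2, up to sampling scatter.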
![Page 17: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/17.jpg)
so
s²/c is chi-squared distributed
where c is a yet-to-be-determined scaling factor
![Page 18: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/18.jpg)
in the text, a formula for c is derived in terms of the following quantities:
σ_d² is the variance of the data
N_f is the length of the p.s.d.
Δf is the frequency sampling
f_f is the variance of the taper; it adjusts for the effect of tapering
![Page 19: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/19.jpg)
example 1: a completely random time series
[Figure: A) tapered time series, d(i) versus time t, seconds, with +2sd and -2sd levels marked; B) power spectral density, s²(f) versus frequency f, Hz, with mean and 95% levels marked]
![Page 20: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/20.jpg)
example 1: histogram of spectral values
[Figure: counts versus power spectral density, s²(f), with mean and 95% levels marked]
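A minimal sketch of how an example like this could be set up is given below. It is not the lecture's code: the sample count, sampling interval, noise level, Hann taper and p.s.d. normalization are all assumed, and c is estimated empirically as half the mean of the p.s.d. (a chi-squared variable with two degrees of freedom has mean 2) rather than from the formula in the text.

```matlab
% sketch of example 1: p.s.d. of a random time series and its 95% level
N = 1024;                   % number of samples (assumed)
Dt = 0.025;                 % sampling interval, s (assumed; Nyquist frequency is 20 Hz)
sd = 5;                     % standard deviation of the noise (assumed)
d = sd*randn(N,1);          % Normally-distributed, uncorrelated, zero-mean data

w = 0.5*(1-cos(2*pi*(0:N-1)'/(N-1)));   % Hann taper (an assumed choice of taper)
dtilde = fft(w.*d);         % Fourier transform of the tapered time series

Nf = N/2+1;                 % number of non-negative frequencies (513 here)
Df = 1/(N*Dt);              % frequency sampling, Hz
f = Df*(0:Nf-1)';           % frequency axis, Hz
s2 = (2/(N*Dt))*abs(Dt*dtilde(1:Nf)).^2; % p.s.d., one common normalization (assumed)

% s2/c is chi-squared with 2 degrees of freedom, whose mean is 2, so c is
% estimated here from the interior frequencies (zero and Nyquist excluded)
k = (2:Nf-1)';
c = mean(s2(k))/2;          % empirical estimate of the scaling factor
s2mean = 2*c;               % mean level of the p.s.d.
s295 = -2*c*log(1-0.95);    % 95% level, from P(x) = 1 - exp(-x/2)

fracbelow = sum(s2(k) < s295)/length(k); % fraction of p.s.d. points below the 95% level
fprintf('mean level %.3f, 95%% level %.3f, fraction below %.3f\n', s2mean, s295, fracbelow);
```

For pure noise, roughly 95% of the spectral values should fall below the 95% level, which is what the histogram is meant to convey.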
![Page 21: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/21.jpg)
example 2: random time series consisting of 5 Hz cosine plus noise
[Figure: A) tapered time series, d(i) versus time t, seconds, with +2sd and -2sd levels marked; B) power spectral density, s²(f) versus frequency f, Hz, with mean and 95% levels marked]
![Page 22: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/22.jpg)
example 2: histogram of spectral values
[Figure: counts versus power spectral density, s²(f), with mean, 95% and peak levels marked]
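The second example can be sketched in the same way. Everything numerical below is assumed (amplitude, noise level, sampling, taper), so the probability it prints will not reproduce the lecture's value. Because the peak would bias a mean-based estimate, c is estimated here from the median of the p.s.d. (the median of a chi-squared variable with two degrees of freedom is 2 ln 2); that is a substitution made for this sketch, not the method of the text.

```matlab
% sketch of example 2: 5 Hz cosine plus noise, and the probability level of the peak
N = 1024;                   % number of samples (assumed)
Dt = 0.025;                 % sampling interval, s (assumed)
t = Dt*(0:N-1)';            % time, s
f0 = 5;                     % frequency of the cosine, Hz
A = 3;                      % amplitude of the cosine (assumed)
sd = 5;                     % standard deviation of the noise (assumed)
d = A*cos(2*pi*f0*t) + sd*randn(N,1);

w = 0.5*(1-cos(2*pi*(0:N-1)'/(N-1)));   % Hann taper (assumed choice)
dtilde = fft(w.*d);         % Fourier transform of the tapered time series
Nf = N/2+1;                 % number of non-negative frequencies
Df = 1/(N*Dt);              % frequency sampling, Hz
f = Df*(0:Nf-1)';           % frequency axis, Hz
s2 = (2/(N*Dt))*abs(Dt*dtilde(1:Nf)).^2; % p.s.d., one common normalization (assumed)

k = (2:Nf-1)';              % interior frequencies only
c = median(s2(k))/(2*log(2));   % robust estimate of c (assumption of this sketch)

[s2peak, ipeak] = max(s2(k));   % largest spectral value
fpeak = f(k(ipeak));            % frequency of the peak, Hz
Ppeak = 1 - exp(-s2peak/(2*c)); % P(s2 < s2peak) for pure noise at one frequency

fprintf('peak at %.2f Hz, P = %.6f\n', fpeak, Ppeak);
```

The quantity Ppeak is the probability that noise alone, at a single pre-selected frequency, stays below the level of the peak.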
![Page 23: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/23.jpg)
so how confident are we of a peak at 5 Hz?
probability that s² is below the level of the peak = 0.99994
the p.s.d. is predicted to be less than the level of the peak 99.994% of the time
But here we must be very careful
![Page 24: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/24.jpg)
two alternative Null Hypotheses
a peak of the observed amplitude at 5 Hz is caused by random variation
a peak at the observed amplitude somewhere in the p.s.d. is caused by random variation
![Page 25: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/25.jpg)
two alternative Null Hypotheses
a peak of the observed amplitude at 5 Hz is caused by random variation
a peak at the observed amplitude somewhere in the p.s.d. is caused by random variation
much more likely, since p.s.d. has many frequency points
(513 in this case)
![Page 26: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/26.jpg)
two alternative Null Hypotheses
a peak of the observed amplitude at 5 Hz is caused by random variation
a peak at the observed amplitude somewhere in the p.s.d. is caused by random variation
under the first Null Hypothesis, a peak of the observed amplitude or greater occurs only 1 - 0.99994 = 0.006% of the time
The Null Hypothesis can be rejected to high certainty
![Page 27: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/27.jpg)
two alternative Null Hypotheses
a peak of the observed amplitude at 5 Hz is caused by random variation
a peak at the observed amplitude somewhere in the p.s.d. is caused by random variation
under the second Null Hypothesis, a peak of the observed amplitude or greater occurs somewhere among the 513 frequency points 1 - (0.99994)^513 = 3% of the time
The Null Hypothesis can be rejected to acceptable certainty
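The two probabilities can be reproduced with a few lines of arithmetic. The sketch assumes, as the calculation above implicitly does, that the 513 frequency points can be treated as independent; the variable names are illustrative.

```matlab
% probabilities for the two alternative Null Hypotheses
P1 = 0.99994;       % probability that noise stays below the peak at one frequency (from the lecture)
Nf = 513;           % number of frequency points in the p.s.d. (from the lecture)

pone = 1 - P1;      % noise reaches the peak level at one pre-selected frequency
pany = 1 - P1^Nf;   % noise reaches the peak level at one or more of the Nf frequencies,
                    % treating the frequency points as independent (assumption)

fprintf('at 5 Hz: %.4f%%   anywhere: %.1f%%\n', 100*pone, 100*pany);
```

The second probability is about 500 times larger than the first, which is why the choice of Null Hypothesis matters.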
![Page 28: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/28.jpg)
Part 2
The Bootstrap Method
![Page 29: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/29.jpg)
The Issue
What do you do when you have a statistic that can test a Null Hypothesis but you don’t know its probability density function?
![Page 30: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/30.jpg)
If you could repeat the experiment many times, you could address the problem empirically
perform experiment
calculate statistic, s
repeat
make histogram of s’s
normalize histogram into empirical p.d.f.
![Page 31: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/31.jpg)
The problem is that it’s not usually possible to repeat an experiment many times over
![Page 32: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/32.jpg)
Bootstrap Method
create approximate repeat datasets by randomly resampling (with duplications) the one existing data set
![Page 33: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/33.jpg)
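example of resampling

| index | original data set | random integers in range 1-6 | resampled data set |
|---|---|---|---|
| 1 | 1.4 | 3 | 3.8 |
| 2 | 2.1 | 1 | 1.4 |
| 3 | 3.8 | 3 | 3.8 |
| 4 | 3.1 | 2 | 2.1 |
| 5 | 1.5 | 5 | 1.5 |
| 6 | 1.7 | 1 | 1.4 |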
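The resampling in this example amounts to indexing the data with a vector of random integers. The sketch below first uses the integers from the example and then draws fresh ones; the use of `randi` is an assumed choice of random-integer function, and the variable names are illustrative.

```matlab
% resampling with duplications, using the numbers from the example
d = [1.4; 2.1; 3.8; 3.1; 1.5; 1.7];   % original data set
N = length(d);

rowindex = [3; 1; 3; 2; 5; 1];        % the random integers from the example
dresampled = d(rowindex);             % resampled data set: 3.8 1.4 3.8 2.1 1.5 1.4
disp(dresampled');

% in practice the integers are drawn at random on every repetition
rowindex2 = randi(N, N, 1);           % N random integers in the range 1 to N
dresampled2 = d(rowindex2);           % another resampled data set
disp(dresampled2');
```

Some values appear more than once and others not at all, which is the "with duplications" part of the method.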
![Page 34: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/34.jpg)
![Page 35: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/35.jpg)
interpretation of resampling
[Figure: diagram relating p(d) to p’(d) through sampling, duplication and mixing]
![Page 36: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/36.jpg)
Example
what is the p(b), where b is the slope of a linear fit?
[Figure: data d(i) versus time t, hours]
![Page 37: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/37.jpg)
This is a good test case, because we know the answer
if the data are Normally-distributed, uncorrelated with variance σ_d²,
and given the linear problem d = G m where m = [intercept, slope]ᵀ
The slope is also Normally-distributed with a variance that is the lower-right element of σ_d² [GᵀG]⁻¹
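The variance of the slope from standard error propagation can be computed directly from G. In the sketch below the number of data, the times and the data variance are assumed values, and the cross-check uses the standard straight-line result that the slope variance is σ_d² divided by the sum of squared deviations of the times from their mean.

```matlab
% variance of the slope from standard error propagation
N = 100;                    % number of data (assumed)
t = (0:N-1)';               % time, hours (assumed sampling)
sd2 = 0.25;                 % variance of the data, sigma_d^2 (assumed)

G = [ones(N,1), t];         % data kernel for d = G*m, with m = [intercept; slope]
Cm = sd2*inv(G'*G);         % covariance of the model parameters
sb2 = Cm(2,2);              % variance of the slope: the lower-right element
sb = sqrt(sb2);             % standard deviation of the slope

% cross-check against the standard straight-line formula
sb2check = sd2/sum((t-mean(t)).^2);

fprintf('slope standard deviation %.6f (check %.6f)\n', sb, sqrt(sb2check));
```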
![Page 38: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/38.jpg)
![Page 39: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/39.jpg)
create resampled data set
returns N random integers from 1 to N
![Page 40: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/40.jpg)
usual code for least squares fit of line
save slopes
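The annotations above describe a bootstrap loop: resample, fit a line, save the slope. A minimal sketch of such a loop follows. It is not the code from the slides; the synthetic data, the number of repetitions and the use of `randi` are assumptions, and the variable names are illustrative.

```matlab
% bootstrap estimate of the distribution of the slope of a linear fit
N = 100;                              % number of data (assumed)
t = (0:N-1)';                         % time, hours (assumed sampling)
d = 1.0 + 0.53*t + 2.0*randn(N,1);    % synthetic data standing in for the lecture's data set

Nboot = 1000;                         % number of resampled data sets (assumed)
slope = zeros(Nboot,1);               % storage for the slopes

for p = 1:Nboot
    % create resampled data set
    rowindex = randi(N, N, 1);        % N random integers from 1 to N
    tresampled = t(rowindex);
    dresampled = d(rowindex);

    % usual least squares fit of a line
    G = [ones(N,1), tresampled];
    mest = (G'*G)\(G'*dresampled);

    % save slope
    slope(p) = mest(2);
end

fprintf('mean slope %.4f, standard deviation %.4f\n', mean(slope), std(slope));
```

The times and the data are resampled with the same indices, so each resampled point keeps its own (t, d) pair.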
![Page 41: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/41.jpg)
histogram of slopes
![Page 42: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/42.jpg)
integrate p(b) to P(b)
2.5% and 97.5% bounds
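The steps from saved slopes to confidence bounds can be sketched as follows. To keep the block self-contained, the vector of slopes is a Normally-distributed stand-in rather than the output of an actual bootstrap; its center and width, and the number of bins, are assumed values.

```matlab
% from a set of slopes to an empirical p.d.f., its cumulative distribution, and 95% bounds
slope = 0.53 + 0.007*randn(5000,1);   % stand-in for bootstrap slopes (assumed values)

Nbins = 50;                           % number of histogram bins (assumed)
bmin = min(slope);
bmax = max(slope);
Db = (bmax-bmin)/(Nbins-1);           % bin spacing
bins = bmin + Db*(0:Nbins-1)';        % bin centers

bhist = hist(slope, bins);            % histogram of slopes (counts per bin)
bhist = bhist(:);                     % force column vector
pb = bhist/(Db*sum(bhist));           % normalize into empirical p.d.f. p(b)
Pb = Db*cumsum(pb);                   % integrate p(b) to cumulative distribution P(b)

ilo = find(Pb >= 0.025, 1);           % first bin where P(b) reaches 2.5%
ihi = find(Pb >= 0.975, 1);           % first bin where P(b) reaches 97.5%
blo = bins(ilo);                      % lower bound of the 95% confidence interval
bhi = bins(ihi);                      % upper bound of the 95% confidence interval

fprintf('95%% confidence: %.4f < b < %.4f\n', blo, bhi);
```

The bounds are resolved only to within a bin width, so a finer histogram (with enough samples to fill it) gives sharper limits.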
![Page 43: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/43.jpg)
[Figure: p(b) versus slope, b, comparing standard error propagation with the bootstrap, with the 95% confidence interval marked]
![Page 44: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/44.jpg)
a more complicated example
p(r), where r is the CaO to Na2O ratio of the second varimax factor of the Atlantic Rock dataset
![Page 45: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/45.jpg)
[Figure: p(r) versus CaO/Na2O ratio, r, with the mean and the 95% confidence interval marked]
![Page 46: Environmental Data Analysis with MatLab](https://reader036.vdocument.in/reader036/viewer/2022062323/568165be550346895dd8bf9a/html5/thumbnails/46.jpg)
we can use this histogram to write confidence intervals for r
r has a mean of 0.486
95% probability that r is between 0.458 and 0.512
and roughly, since p(r) is approximately symmetrical
r = 0.486 ± 0.025 (95% confidence)