Oceanography 569: Oceanographic Data Analysis Laboratory
Kathie Kelly, Applied Physics Laboratory
515 Ben Hall IR Bldg
Class web site: faculty.washington.edu/kellyapl/classes/ocean569_2014/
Organization
• 1 lecture, 1 lab period (2 hrs) per week
• Exercise assigned in lab, finish by following lecture
• Presentation of solution in lecture session
• One class project completed individually
• Grade based on presentations and project
• Office hours by appointment
Materials
Materials available on class web site:
• PowerPoint notes
• mfiles & mat files for exercises
• specialized functions (mfiles)
• example solutions (following week)
Text: “Modeling Methods for Marine Science” by Glover, Jenkins & Doney
• on reserve in Physics Library
• a good reference to buy
General Procedure for Data Analysis
• Define analysis goal
• Characterize data
• Prepare data
• Errors and error propagation
• Statistical analyses
• Combine data with model (prognostic, diagnostic, statistical)
Exercise 1: Aegean Sea temperatures
Analysis goal: create continuous 3-m time series
• Daily satellite SST maps
• 5 buoys (POSEIDON)
• 3-m 3-hourly temperatures (with gaps)
Exercise 1: Characterize Data
3-m: higher resolution, but gaps
SST: continuous, but only daily
What happens when the data are “merged”?
To make a consistent series, what is sacrificed?
Exercise 1: Data discrepancies
Compare apples & apples: average 3-m to daily
What are the characteristics of the differences?
How can the differences be reconciled?
Periodic Signals
Robust way to estimate periodic signals, especially for gappy data:
fit_harmonics: fit to cosines with period L, L/2, etc. (cf. Fourier series)
[amp,phase,frac,offset,da]=fit_harmonics(data,time,nharm,L,cutoff);
For nharm = n, with t the time axis:
d_periodic = amp(1)*cos(2*pi*t/L+phase(1)) + amp(2)*cos(2*pi*2*t/L+phase(2)) + ... + amp(n)*cos(2*pi*n*t/L+phase(n)) + offset
• includes the jth term only if the frac(tion) of variance removed > cutoff/100
• returns the anomaly: da = data - d_periodic
Note: offset is not the same as mean(data). If there is a strong seasonal cycle, remove the mean using fit_harmonics!
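The class mfile fit_harmonics is not reproduced in these notes. As a rough illustration of the idea only, the Python sketch below fits an offset plus cosines of period L, L/2, ... by ordinary least squares and skips NaN gaps. The name fit_harmonics_sketch, the synthetic data, and the omission of the frac/cutoff logic are all assumptions made for this example; it is not the class function.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_harmonics_sketch(data, time, nharm, L):
    """Hypothetical stand-in: least-squares fit of offset + nharm cosines.

    Returns (amp, phase, offset) such that
    d_periodic = sum_j amp[j]*cos(2*pi*(j+1)*t/L + phase[j]) + offset.
    """
    def basis(t):
        row = [1.0]
        for j in range(1, nharm + 1):
            w = 2 * math.pi * j * t / L
            row += [math.cos(w), math.sin(w)]
        return row

    m = 2 * nharm + 1
    AtA = [[0.0] * m for _ in range(m)]
    Atb = [0.0] * m
    for t, d in zip(time, data):
        if math.isnan(d):            # skip gaps
            continue
        row = basis(t)
        for i in range(m):
            Atb[i] += row[i] * d
            for k in range(m):
                AtA[i][k] += row[i] * row[k]
    coef = solve_linear(AtA, Atb)    # normal equations
    amp, phase = [], []
    for j in range(nharm):
        a, b = coef[1 + 2 * j], coef[2 + 2 * j]
        amp.append(math.hypot(a, b))       # a*cos + b*sin = amp*cos(w + phase)
        phase.append(math.atan2(-b, a))
    return amp, phase, coef[0]

# Synthetic example (not class data): 500 days of a 365-day cycle with a gap.
L = 365.0
time = [float(t) for t in range(500)]
data = [10.0 + 3.0 * math.cos(2 * math.pi * t / L + 0.5) for t in time]
for i in range(100, 150):
    data[i] = math.nan               # flag a gap

amp, phase, offset = fit_harmonics_sketch(data, time, 1, L)
valid = [d for d in data if not math.isnan(d)]
sample_mean = sum(valid) / len(valid)
print(offset, amp[0], phase[0], sample_mean)
```

Because the record covers an incomplete number of cycles and has a gap, the plain mean of the valid samples differs from the fitted offset, which is the point of the note about offset versus mean(data).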
Exercise 1: Fix discrepancies
Find & remove seasonal cycle in difference
Result: daily average temperature that matches the seasonal cycle of the 3-m series
Other goals
1. Continuous SST with a diurnal cycle: use 3-m temperature to find diurnal cycle
2. Correct SST for aliasing from undersampling the diurnal cycle
3. Create non-seasonal temperature anomalies
Aliasing
SST sampling aliases the diurnal cycle
“Nyquist frequency”: corresponds to a period of 2*Δt, the shortest period resolved at sampling interval Δt
Example: sample the diurnal temperature signal using 26-hr intervals
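The 26-hr example can be worked through numerically. The Python sketch below folds the diurnal frequency to the nearest multiple of the sampling frequency, which gives an apparent period of 1/(1/24 - 1/26) = 312 hr (13 days). The helper name alias_period and the pure-cosine signal are assumptions for illustration.

```python
import math

def alias_period(signal_period, dt):
    """Apparent period of a sinusoid of period signal_period sampled every dt."""
    f = 1.0 / signal_period                 # true frequency
    fs = 1.0 / dt                           # sampling frequency
    f_alias = abs(f - round(f / fs) * fs)   # fold to nearest multiple of fs
    return math.inf if f_alias == 0.0 else 1.0 / f_alias

nyquist_period = 2 * 26.0                   # shortest resolved period, in hours
apparent = alias_period(24.0, 26.0)         # apparent period, in hours

# The 26-hr samples of the 24-hr cosine coincide with a slow cosine
# at the aliased period.
times = [26.0 * k for k in range(40)]
diurnal = [math.cos(2 * math.pi * t / 24.0) for t in times]
slow = [math.cos(2 * math.pi * t / apparent) for t in times]
max_diff = max(abs(a - b) for a, b in zip(diurnal, slow))
print(nyquist_period, apparent / 24.0, max_diff)
```

Since the 24-hr period is shorter than the 52-hr Nyquist period, the diurnal cycle cannot be resolved and shows up as a spurious signal with a period of about 13 days.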
Matlab functions
datenum: converts yyyy,mm,dd to serial date numbers (days counted from year 0, loosely called “Julian dates”); also datestr, datevec, datetick('x')
imagesc: bit map that shows each image pixel, scaled to colormap (cf. pcolor, which interpolates pixels to a grid)
NaN, “not a number”: use to flag invalid data; nanmean, nansum, etc. then ignore the NaNs. NaN values are not plotted. To find valid data:
ind=find(~isnan(data));
fit_harmonics(data,time,nharm,L,cutoff): use to find any periodic signal in the data, using the time axis, period L and a cutoff (% of variance explained)
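For readers working outside Matlab, the NaN-flagging idea carries over directly. The Python sketch below mirrors find(~isnan(data)) and a NaN-ignoring mean; the helper names valid_indices and nan_mean and the sample values are hypothetical, and note that Python indices start at 0 where Matlab starts at 1.

```python
import math

def valid_indices(data):
    """Indices of entries that are not NaN (0-based)."""
    return [i for i, d in enumerate(data) if not math.isnan(d)]

def nan_mean(data):
    """Mean over the non-NaN entries; NaN if nothing is valid."""
    good = [d for d in data if not math.isnan(d)]
    return sum(good) / len(good) if good else math.nan

temps = [18.5, math.nan, 19.0, 19.5, math.nan]   # made-up values
ind = valid_indices(temps)
plain_mean = sum(temps) / len(temps)             # contaminated by NaN
print(ind, nan_mean(temps), plain_mean)
```

A single NaN turns the ordinary mean into NaN, which is why the NaN-aware versions are needed once invalid data have been flagged.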
Statistics of Observations
“random” variables
Are these observations of random variables?
Will removing the mean make them random?
Statistical Definitions: mean
The sample mean is given by
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
The mean of the parent population is given by
$\mu = \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} x_i$
But we never know it since the sample is finite. For class the mean will refer to the sample mean, regardless of the symbol.
The factor N in the sample mean is the number of degrees of freedom.
Statistical Definitions: variance
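The sample variance is given by
$s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$
where s is the standard deviation of x and $\bar{x}$ is the sample mean. The variance of the parent population corresponds to an infinite number of samples, N:
$\sigma^2 = \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
where $\mu$ is the mean of the parent population.
The N-1 factor occurs because using the sample mean “uses up” one of the degrees of freedom of the data set.
In class we will refer to the sample variance.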
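A small numerical check of the sample mean and the N-1 sample variance may help. The Python sketch below uses made-up values and compares the hand-written formula against the standard library's statistics.variance, which also divides by N-1.

```python
import math
import statistics

def sample_mean(x):
    """Sum of the values divided by N."""
    return sum(x) / len(x)

def sample_variance(x):
    """Sum of squared deviations from the sample mean, divided by N-1."""
    xbar = sample_mean(x)
    return sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up values
xbar = sample_mean(x)
s2 = sample_variance(x)
s = math.sqrt(s2)                               # sample standard deviation
print(xbar, s2, s, statistics.variance(x))
```

Here the squared deviations sum to 32, so the sample variance is 32/7 rather than the 32/8 that dividing by N would give.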
Exercise 2: Periodic Signals
Need to remove non-random components
Both have periodic signals (seasonal, not random)
Caution: the mean of data with a periodic component is biased if the sample contains incomplete cycles
Use “offset” from fit_harmonics instead
Exercise 2: Probability Distributions (histogram)
Both non-seasonal SST and non-seasonal rain are random variables.
Is either of these normally distributed?
Normal Distribution for Random Variable
Why do we want a normal distribution?
Least-squares fit, correlations, optimal interpolation have error estimates based on assumption of normal distributions of random data and/or errors
Exercise 2: Making a variable more normal
Distribution of log(rain)
[Figure: histograms of rain and log(rain)]
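The effect of the log transform can be illustrated with synthetic data. The Python sketch below draws positive, strongly skewed values (lognormal, standing in for rain; this is an assumption, not the class data set) and compares the skewness before and after taking the log. It assumes all values are positive, since log(0) is undefined.

```python
import math
import random

def skewness(x):
    """Third central moment divided by the 3/2 power of the second."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

random.seed(0)
rain_like = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]  # synthetic
log_rain = [math.log(v) for v in rain_like]

skew_raw = skewness(rain_like)   # long right tail, strongly positive
skew_log = skewness(log_rain)    # close to zero, as for a normal distribution
print(skew_raw, skew_log)
```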
Exercise 2: distributions for modified variable
Deciles
[Figure panels: rain deciles, rain, uniform]
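One way to read the decile panels: replacing each value by the decile it falls in yields a variable whose distribution is uniform by construction. The Python sketch below shows this with synthetic skewed data; the function name decile_index and the rank-based definition of deciles are assumptions for illustration.

```python
import random

def decile_index(x):
    """Return, for each value, its decile (0..9) based on rank within x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])   # indices in ascending order
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n
    return deciles

random.seed(1)
values = [random.lognormvariate(0.0, 1.0) for _ in range(1000)]  # synthetic
dec = decile_index(values)
counts = [dec.count(k) for k in range(10)]
print(counts)
```

Each decile holds the same number of samples regardless of how skewed the original values are.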
Exercise 2: test for normal
Cumulative distribution
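A simple version of such a test compares the empirical cumulative distribution of the data with the cumulative normal distribution that has the same mean and standard deviation. The Python sketch below computes the largest gap between the two curves for synthetic samples; the function names and the use of the maximum difference as the summary number are assumptions, not necessarily the procedure used in the exercise.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """Cumulative normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def max_cdf_difference(x):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / (n - 1))
    worst = 0.0
    for i, v in enumerate(sorted(x)):
        model = normal_cdf(v, mu, sigma)
        # the empirical CDF steps from i/n to (i+1)/n at v
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst

random.seed(2)
gaussian = [random.gauss(0.0, 1.0) for _ in range(2000)]         # synthetic
skewed = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]  # synthetic
d_gaussian = max_cdf_difference(gaussian)
d_skewed = max_cdf_difference(skewed)
print(d_gaussian, d_skewed)
```

A small maximum difference suggests the data are close to normal; the skewed sample gives a much larger one.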
To edit or not to edit
For a truly normal distribution, 0.3% of the data are more than 3 standard deviations from the mean
“Three-sigma edit”: remove data more than 3 std dev from mean
Best to justify edits in terms of likely error sources and characteristics
• spikes
• unphysical values
• comparisons with other variables
Exercise 2: Edit data
3-sigma outliers
Procedure for removing suspicious data:
1) remove known signals (diurnal, seasonal, trends)
2) check for normal distribution
3) compute σ (standard deviation)
4) remove data more than 3*σ from mean
Do not iterate!
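Steps 3 and 4 of the procedure can be sketched in a few lines of Python. The example assumes steps 1 and 2 (removing known signals and checking for normality) have already been done, uses a made-up series with one spike, and applies the edit in a single pass, without iterating. The function name three_sigma_edit is hypothetical.

```python
import math

def three_sigma_edit(x):
    """Single-pass 3-sigma edit; returns (kept, removed). NaNs are dropped first."""
    valid = [v for v in x if not math.isnan(v)]
    n = len(valid)
    mean = sum(valid) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in valid) / (n - 1))
    kept = [v for v in valid if abs(v - mean) <= 3.0 * sigma]
    removed = [v for v in valid if abs(v - mean) > 3.0 * sigma]
    return kept, removed

# Made-up series: values between -1 and 1 plus one obvious spike.
series = [math.sin(i) for i in range(100)] + [50.0]
kept, removed = three_sigma_edit(series)
print(len(kept), removed)
```

Recomputing σ from the edited data and editing again would shrink the threshold each time and remove progressively more valid data, which is one reason not to iterate.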
Central Limit Theorem
Why is the Normal distribution commonly used?
Underlying distributions may be unknown or non-Normal
BUT if a measurement (or error) is the sum of many processes, its distribution will approach Normal
Example: distribution of the mean of X for different distributions as the number of samples increases
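This behavior can be reproduced with a short simulation. The Python sketch below draws samples from an exponential distribution (strongly skewed; chosen here only as an illustrative non-Normal case) and tracks the skewness of the sample mean as the number of samples per mean increases. The trial counts and sample sizes are arbitrary choices.

```python
import random

def skewness(x):
    """Third central moment divided by the 3/2 power of the second."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def sample_means(n, trials, rng):
    """Means of `trials` independent samples, each of n exponential draws."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(trials)]

rng = random.Random(3)
skew_by_n = {n: skewness(sample_means(n, 4000, rng)) for n in (1, 5, 50)}
for n, sk in skew_by_n.items():
    print(n, sk)
```

The skewness of the mean shrinks toward zero as n grows, consistent with the distribution of the mean approaching Normal.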