

Application of two Statistical Data Assimilation Procedures to a 1D Biological Model of the BATS Site

Paul Mattern 1,*, Michael Dowd 2, Katja Fennel 1

Sequential Monte Carlo methods are ensemble-based techniques that can be used in statistical data assimilation to provide optimal estimates of the state of the ocean. We tested the Ensemble Kalman Filter (EnKF) and the Sequential Importance Resampling (SIR) algorithm, two sequential Monte Carlo methods, on a simple physical-biological model. We implemented the time-dependent model in one spatial dimension for the Bermuda Atlantic Time-Series Study (BATS) site in the North Atlantic. Monthly physical and biological data are available for this site from October 1988 onward (http://bats.bios.edu).

* [email protected]
1 Department of Oceanography, Dalhousie University
2 Department of Mathematics and Statistics, Dalhousie University

Introduction

Our goals are:

• assimilating the observations from the BATS site into our biological model, employing both the EnKF and SIR procedures

• evaluating the results and comparing the model output with and without data assimilation to observations

• assessing ease of implementation and potential future use of the EnKF and SIR procedures with more complex three-dimensional models

The Physical Model

A physical model is required to simulate fluid motion and the effects of turbulent mixing on the distribution of the biological properties. We use the General Ocean Turbulence Model (GOTM, www.gotm.net) to simulate these physical processes at the BATS site. GOTM features a large number of turbulence parameterizations for dynamic, one-dimensional simulations of the ocean.

The model simulates the top 350 m of the ocean with a vertical resolution of 1 m. The main objective for our physical model is to provide a realistic physical environment for the embedded biological processes. Mesoscale eddies (which are known to affect the BATS site) and other three-dimensional processes cannot be represented dynamically in our one-dimensional model. To account for variations in vertical density structure caused by horizontal advection and eddy variability, we decided to nudge model temperature and salinity to observed profiles. With nudging, the model-predicted mixed layer depth agrees with the data-derived mixed layer depth, with a coefficient of determination $R^2$ = 84.6%.

The Biological Model

Our biological model is a simplified version of the model of Fennel et al. (2006) and includes five state variables. Phytoplankton (Phy), detritus (Det), and dissolved inorganic nitrogen (DIN) are nitrogen-based variables and describe a highly simplified nitrogen cycle. The two non-nitrogen state variables, oxygen (Oxy) and phytoplankton chlorophyll (Chl), serve mainly diagnostic purposes.

The core of the biological model is a set of partial differential equations (PDEs) that describes the sources and sinks of the state variables:

$$
\begin{aligned}
\frac{\partial \mathrm{Phy}}{\partial t} &= \mu\,\mathrm{Phy} - m_{\mathrm{Phy}}\,\mathrm{Phy} - w_{\mathrm{Phy}}\,\frac{\partial \mathrm{Phy}}{\partial z} \\
\frac{\partial \mathrm{Det}}{\partial t} &= m_{\mathrm{Phy}}\,\mathrm{Phy} - r_{\mathrm{Det}}\,\mathrm{Det} - w_{\mathrm{Det}}\,\frac{\partial \mathrm{Det}}{\partial z} \\
\frac{\partial \mathrm{DIN}}{\partial t} &= r_{\mathrm{Det}}\,\mathrm{Det} - \mu\,\mathrm{Phy} \\
\frac{\partial \mathrm{Chl}}{\partial t} &= \rho_{\mathrm{Chl}}\,\mu\,\mathrm{Phy} - m_{\mathrm{Phy}}\,\mathrm{Chl} - w_{\mathrm{Phy}}\,\frac{\partial \mathrm{Chl}}{\partial z} \\
\frac{\partial \mathrm{Oxy}}{\partial t} &= r_{\mathrm{Oxy:DIN}}\left(\mu\,\mathrm{Phy} - r_{\mathrm{Det}}\,\mathrm{Det}\right)
\end{aligned}
$$

Here $\mu$ is a parameterization for photosynthetic growth of phytoplankton that depends on the availability of light and nutrients. $m_{\mathrm{Phy}}$ is the phytoplankton mortality rate, $r_{\mathrm{Det}}$ the remineralization rate of detritus, and $r_{\mathrm{Oxy:DIN}}$ the ratio of oxygen to DIN. $\rho_{\mathrm{Chl}}$ is a parameterization that accounts for photoacclimation, i.e. the synthesis of chlorophyll in order to optimize light harvesting for photosynthesis. $w_{\mathrm{Phy}}$ and $w_{\mathrm{Det}}$ are the sinking rates for phytoplankton and detritus, respectively.
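As a rough illustration of how the source and sink terms of the five state variables fit together, the sketch below evaluates the local tendencies at a single depth. It is not the model code: sinking and turbulent mixing are left out, μ and ρ_Chl are passed in as fixed numbers rather than computed from light and nutrients, and the function name and all numerical values are hypothetical placeholders.

```python
def source_terms(phy, det, chl, mu, rho_chl, m_phy, r_det, r_oxy_din):
    """Local source-minus-sink terms per unit time (no sinking, no mixing)."""
    growth = mu * phy          # photosynthetic growth of phytoplankton
    mortality = m_phy * phy    # phytoplankton mortality, feeds detritus
    remin = r_det * det        # remineralization of detritus to DIN
    return {
        "Phy": growth - mortality,
        "Det": mortality - remin,
        "DIN": remin - growth,
        "Chl": rho_chl * growth - m_phy * chl,
        "Oxy": r_oxy_din * (growth - remin),
    }

# Hypothetical values, chosen only to exercise the function.
tend = source_terms(phy=0.2, det=0.1, chl=0.3,
                    mu=0.5, rho_chl=1.2, m_phy=0.1, r_det=0.05, r_oxy_din=8.0)

# The three nitrogen-based tendencies cancel: the local nitrogen cycle is closed.
nitrogen_budget = tend["Phy"] + tend["Det"] + tend["DIN"]
print(f"net nitrogen tendency: {nitrogen_budget:.2e}")
```

The nitrogen budget vanishes up to rounding error because every loss term of one nitrogen variable appears as a gain term of another.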

We assimilated the following BATS observations into the biological model.

| BATS bottle data | biological state variable |
|---|---|
| PON | 30% phytoplankton, 70% detritus |
| nitrate and nitrite | DIN |
| chlorophyll a | chlorophyll |
| oxygen | oxygen |

Statistical Data Assimilation

In the EnKF, each ensemble member is altered to match the observations more closely. The adjustment is carried out based on a Gaussian approximation of the probability distributions of the system state and the observations.
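The EnKF adjustment described above can be sketched as follows. This is a generic stochastic (perturbed-observation) EnKF analysis step with a linear observation operator, written under our own assumptions; it is not necessarily the variant used for the BATS simulations, and the function name, array shapes and test values are hypothetical.

```python
import numpy as np

def enkf_update(ensemble, obs, H, obs_var, rng):
    """Stochastic EnKF analysis step.

    ensemble: (n_members, n_state) forecast states
    obs:      (n_obs,) observations
    H:        (n_obs, n_state) linear observation operator
    obs_var:  (n_obs,) observation error variances
    """
    n_members = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    P = anomalies.T @ anomalies / (n_members - 1)   # sample state covariance
    R = np.diag(obs_var)                            # observation error covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
    # Each member is compared with its own perturbed copy of the observations.
    perturbed = obs + rng.normal(0.0, np.sqrt(obs_var), size=(n_members, obs.size))
    innovations = perturbed - ensemble @ H.T
    return ensemble + innovations @ K.T

# Hypothetical two-variable state; only the first variable is observed.
rng = np.random.default_rng(0)
forecast = rng.normal([1.0, 2.0], 0.5, size=(200, 2))
H = np.array([[1.0, 0.0]])
obs = np.array([2.0])
obs_var = np.array([0.01])
analysis = enkf_update(forecast, obs, H, obs_var, rng)
```

Every member is shifted toward the observation, and the gain is built only from the ensemble mean and covariance, which is where the Gaussian approximation enters.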

In the SIR procedure, individual ensemble members remain unaltered during the assimilation, while their distribution is changed by a resampling step. In this step, new ensemble members are chosen from the old ensemble, according to their similarity to the observations.
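A minimal sketch of such a resampling step is given below. The text does not specify how similarity to the observations is measured, so the Gaussian likelihood used for the weights is an assumption, as are the function name and the test values.

```python
import numpy as np

def sir_resample(ensemble, obs, H, obs_var, rng):
    """Resample ensemble members with probability proportional to their likelihood.

    Returns the new ensemble and the indices of the members that were chosen.
    """
    n_members = ensemble.shape[0]
    residuals = obs - ensemble @ H.T                       # (n_members, n_obs)
    # Assumed Gaussian observation errors: log-likelihood up to a constant.
    log_w = -0.5 * np.sum(residuals**2 / obs_var, axis=1)
    weights = np.exp(log_w - log_w.max())                  # shift for numerical stability
    weights /= weights.sum()
    idx = rng.choice(n_members, size=n_members, replace=True, p=weights)
    return ensemble[idx], idx

# Hypothetical two-variable state; only the first variable is observed.
rng = np.random.default_rng(0)
forecast = rng.normal([1.0, 2.0], 0.5, size=(200, 2))
H = np.array([[1.0, 0.0]])
obs = np.array([2.0])
obs_var = np.array([0.01])
resampled, idx = sir_resample(forecast, obs, H, obs_var, rng)
```

Each row of `resampled` is an exact copy of a forecast member, so no state is altered; only the frequency with which members occur changes.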

We implemented two sequential statistical data assimilation techniques, the EnKF and the SIR procedure. Both are Monte Carlo methods that require an ensemble of samples approximating the probability distribution of the system state. To produce the ensemble, we introduce stochasticity into the model by varying the biological parameters. For every model run, each parameter value is taken randomly from a parameter-specific distribution.
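Drawing the parameters for one ensemble could look like the sketch below. The parameter names, their means and spreads, and the choice of a lognormal distribution (which keeps rates positive) are all assumptions made for the example; the distributions actually used are not stated here.

```python
import numpy as np

# Hypothetical parameter distributions: name -> (mean, relative standard deviation)
PARAM_DISTS = {"m_phy": (0.1, 0.2), "r_det": (0.05, 0.2), "w_det": (1.0, 0.3)}

def draw_parameters(rng):
    """Draw one random parameter set for a single ensemble member."""
    params = {}
    for name, (mean, rel_sd) in PARAM_DISTS.items():
        sigma = np.sqrt(np.log(1.0 + rel_sd**2))
        mu_log = np.log(mean) - 0.5 * sigma**2   # lognormal mean then equals `mean`
        params[name] = rng.lognormal(mu_log, sigma)
    return params

rng = np.random.default_rng(42)
ensemble_params = [draw_parameters(rng) for _ in range(50)]
```

Each entry of `ensemble_params` would drive one model run, so the spread of the resulting states reflects the uncertainty placed on the biological parameters.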

Conclusions and Outlook

• EnKF and SIR have been implemented; both procedures increase model accuracy at the assimilation step and are promising for application with 3D models.

• The ensemble means for the EnKF and SIR often show considerable improvement over simulations without assimilation, but a sensible choice of biological parameter values and distributions is important. This is subject to further research.

References and Acknowledgements

Dowd, M. “Bayesian Statistical Data Assimilation for Ecosystem Models using Markov Chain Monte Carlo” Journal of Marine Systems 68, 439-456 (2007)

Fennel, K. and 5 others “Nitrogen cycling in the Mid Atlantic Bight and implications for the North Atlantic nitrogen budget: Results from a three-dimensional model” Global Biogeochemical Cycles 20, GB3007, doi:10.1029/2005GB002456 (2006)

Steinberg, D.K. and 5 others “Overview of the US JGOFS Bermuda Atlantic Time-series Study (BATS): a decade-scale look at ocean biology and biogeochemistry” Deep-Sea Research II 48, 1405-1447 (2001)

PM and KF gratefully acknowledge funding from the ONR MURI program. KF is also supported through the CRC and NSERC Discovery programs. MD is grateful for the support of NPCDS and NSERC.

| mean square error | PON | DIN | chlorophyll | oxygen |
|---|---|---|---|---|
| without assimilation | 0.046 | 1.66 | 0.0067 | 207.9 |
| EnKF result | 0.043 | 0.92 | 0.0065 | 136.7 |

The mean square error relative to the bottle data, for a simulation without assimilation and for the EnKF. Differences in scale between variables are due to their units.
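To put the tabulated values in perspective, the sketch below computes a mean square error for paired model and bottle values and then the relative reduction implied by the table. The helper function and its handling of missing observations (marked `None`) are our own illustration; only the eight error values are taken from the table.

```python
def mean_square_error(model, obs):
    """Mean square error over all pairs with an available observation."""
    pairs = [(m, o) for m, o in zip(model, obs) if o is not None]
    if not pairs:
        raise ValueError("no observations to compare against")
    return sum((m - o) ** 2 for m, o in pairs) / len(pairs)

# Toy check with made-up numbers: (0.5**2 + 1.0**2) / 2
toy_mse = mean_square_error([1.0, 2.0, 3.0], [1.5, None, 2.0])  # 0.625

# Values from the table above.
no_assim = {"PON": 0.046, "DIN": 1.66, "chlorophyll": 0.0067, "oxygen": 207.9}
enkf = {"PON": 0.043, "DIN": 0.92, "chlorophyll": 0.0065, "oxygen": 136.7}

reduction = {k: 100.0 * (no_assim[k] - enkf[k]) / no_assim[k] for k in no_assim}
for name, pct in reduction.items():
    print(f"{name}: {pct:.1f}% lower mean square error with the EnKF")
```

Expressed this way, the improvement is largest for DIN and oxygen (roughly 45% and 34%) and small for PON and chlorophyll (below 10%).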

• In contrast to the EnKF, our SIR procedure conserves mass and creates profiles that are dynamically consistent with the model equations. The price is that the profiles stay strictly within the bounds imposed by the model, for example those due to mass conservation.

• For our purposes, the SIR procedure is easier to implement than the EnKF and makes it possible to analyze the parameter combinations corresponding to resampled profiles.

Statistical data assimilation encompasses a range of procedures that incorporate observations into numerical models in order to improve the model predictions. Variational methods are one group of assimilation procedures and have been used with biological models mainly for parameter optimization with deterministic models. In contrast to variational methods, we used ensemble-based methods, which allow for model errors. These sequentially alter the model system’s state during the simulation as new observations become available. During an assimilation step the observations are used to adjust the system state, with the aim of decreasing its difference from the observations. This is also referred to as state-space modelling and is especially useful for real-time prediction and forecasting.

Figure: Time-averaged profiles of phytoplankton (conc. in mmol N m$^{-3}$), oxygen (conc. in mmol oxygen m$^{-3}$) and chlorophyll (conc. in mg chlorophyll m$^{-3}$) against depth z (in m): model (blue) versus BATS bottle data (red). The solid line shows the median, the light-colored area marks the region between the 0.1- and 0.9-quantile, and the dark-colored area shows the region between the 0.25- and 0.75-quantile.
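The median line and the two shaded bands of such a plot can be derived from a time-by-depth array, as in the sketch below. The array layout, the function name and the synthetic data are assumptions for illustration; missing values are marked with NaN and ignored.

```python
import numpy as np

def profile_bands(values):
    """Per-depth quantiles over time for an array of shape (n_times, n_depths)."""
    levels = [0.1, 0.25, 0.5, 0.75, 0.9]
    q = np.nanquantile(values, levels, axis=0)      # shape (5, n_depths)
    return dict(zip(("q10", "q25", "median", "q75", "q90"), q))

# Synthetic stand-in for a concentration time series at 5 depths.
rng = np.random.default_rng(1)
values = rng.normal(size=(120, 5))
values[0, 0] = np.nan                               # one missing sample
bands = profile_bands(values)
```

The light band would then be drawn between `bands["q10"]` and `bands["q90"]`, the dark band between `bands["q25"]` and `bands["q75"]`, and the solid line along `bands["median"]`.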