geostatistical inverse modeling for super-resolution ...danbrown/papers/wang_rse.pdf ·...

11
Geostatistical inverse modeling for super-resolution mapping of continuous spatial processes Jun Wang a, , Daniel G. Brown a , Dorit Hammerling b a School of Natural Resources and Environment, University of Michigan, Ann Arbor, MI, USA b Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI, USA abstract article info Article history: Received 18 October 2012 Received in revised form 4 August 2013 Accepted 9 August 2013 Available online 4 September 2013 Keywords: Multi-resolution data fusion Super-resolution mapping Change-of-support Spatial nonstationarity Geostatistical inverse modeling Spatial prediction Uncertainty The increasing availability of satellite images derived from multiple sensors creates opportunities for broader spatial and temporal coverage but also methodological challenges. We present a geostatistical inverse modeling (GIM) approach for merging coarse-resolution images with variable resolutions and for super- resolution (i.e., predictions at the sub-pixel level) mapping of continuous spatial processes. GIM can explicitly account for the differences in spatial supports of multiple datasets. The restricted maximum likelihood method was used for parameter estimations associated with the change-of-support problem. We used GIM to produce both spatial predictions of a target image and prediction uncertainties, while preserving the values of original measurements. GIM is totally data driven, and covariance parameters for a target resolution can be directly derived from measurements. We also developed a moving-window GIM approach to accommodate spatial nonstationarity and reduce computational burden associated with large image data. First, we demonstrated GIM and moving-window GIM on synthetic images. Aggregated synthetic images with variable resolutions were merged to produce a single resolution image. The results show that the two approaches can produce accu- rate spatial predictions and generate prediction uncertainties. Second, we applied moving-window GIM for merging aerosol optical depth (AOD) data with variable resolutions, which were derived from two satellite sen- sors. The modeling results show that moving-window GIM can be applied for merging complementary AOD data from two sensors and for super-resolution mapping of global AOD distributions. Therefore, we can conclude that GIM is a practical solution for merging complementary coarse-resolution images and for super-resolution mapping of continuous spatial processes. © 2013 Elsevier Inc. All rights reserved. 1. Introduction Synthesizing complementary information derived from multiple sensors prompts the need to study rigorous data fusion algorithms. Data fusion is a process that integrates information derived from differ- ent sensors or different spectral bands of the same sensor and produces a single image that contains complementary information from multiple sources, while minimizing loss or distortion of the original data (Hall, 2004; Pohl & Van Genderen, 1998). In this work, we focus on statistical algorithms for merging measurements derived from multiple coarse- resolution sensors. Statistical data fusion combines statistically hetero- geneous samples from marginal distributions to make statistical infer- ence about the unobserved joint distributions or functions of them (Braverman, 2008). Statistical data fusion, including those based on geostatistics, can produce spatial predictions of pixel values (Atkinson, Pardo-Iguzquiza, & Chico-Olmo, 2008). Recently, several geostatistical algorithms, including xed ranking kriging (Cressie & Johannesson, 2008; Shi & Cressie, 2007), xed ranking ltering (Cressie, Shi, & Kang, 2010; Kang, Cressie, & Shi, 2010), spatial statistical data fusion (Nguyen, Cressie, & Braverman, 2012), space-time data fusion (Braverman, Nguyen, & Cressie, 2011), and moving-window kriging (Hammerling, Michalak, & Kawa, 2012), have been developed for map- ping global distributions of environmental variables, such as aerosol op- tical depth (AOD) and carbon dioxide (CO 2 ), with sparsely distributed remotely sensed data. The numerous algorithms for merging measure- ments from different spectral bands (e.g., pan-sharpening) are beyond the scope of this paper. Measurements from remote sensing sensors are constantly inuenced by factors like atmospheric conditions, electronic noise of sensors, and changes in illumination. In order to build geostatistical models of sensor measurements contaminated with measurement er- rors, we take a stochastic view of remote sensing images. We regard the true spatial process of interest (i.e., spectral radiance) as a random eld, i.e., a spatial random process with a set of random variables that have certain probability distributions. Then, a remote sensing image covering an area can be conceived as a realization of the random eld. In this work, we adopt the Gaussian random eld model, which involves Remote Sensing of Environment 139 (2013) 205215 Corresponding author at: School of Natural Resources and Environment, University of Michigan, 440 Church Street, Ann Arbor, MI 48109-1041, USA. Tel.: +1 734 276 5048; fax: +1 734 936 2195. E-mail address: [email protected] (J. Wang). 0034-4257/$ see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.rse.2013.08.007 Contents lists available at ScienceDirect Remote Sensing of Environment journal homepage: www.elsevier.com/locate/rse

Upload: others

Post on 10-Jun-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Geostatistical inverse modeling for super-resolution ...danbrown/papers/wang_RSE.pdf · Geostatistical inverse modeling for super-resolution mapping of continuous spatial processes

Remote Sensing of Environment 139 (2013) 205–215

Contents lists available at ScienceDirect

Remote Sensing of Environment

j ourna l homepage: www.e lsev ie r .com/ locate / rse

Geostatistical inverse modeling for super-resolution mapping ofcontinuous spatial processes

Jun Wang a,⁎, Daniel G. Brown a, Dorit Hammerling b

a School of Natural Resources and Environment, University of Michigan, Ann Arbor, MI, USAb Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI, USA

⁎ Corresponding author at: School of Natural RUniversity of Michigan, 440 Church Street, Ann Arbor,734 276 5048; fax: +1 734 936 2195.

E-mail address: [email protected] (J. Wang).

0034-4257/$ – see front matter © 2013 Elsevier Inc. All rihttp://dx.doi.org/10.1016/j.rse.2013.08.007

a b s t r a c t

a r t i c l e i n f o

Article history:Received 18 October 2012Received in revised form 4 August 2013Accepted 9 August 2013Available online 4 September 2013

Keywords:Multi-resolution data fusionSuper-resolution mappingChange-of-supportSpatial nonstationarityGeostatistical inverse modelingSpatial predictionUncertainty

The increasing availability of satellite images derived from multiple sensors creates opportunities for broaderspatial and temporal coverage but also methodological challenges. We present a geostatistical inversemodeling (GIM) approach for merging coarse-resolution images with variable resolutions and for super-resolution (i.e., predictions at the sub-pixel level) mapping of continuous spatial processes. GIM can explicitlyaccount for the differences in spatial supports of multiple datasets. The restricted maximum likelihood methodwas used for parameter estimations associated with the change-of-support problem. We used GIM to produceboth spatial predictions of a target image and prediction uncertainties, while preserving the values of originalmeasurements. GIM is totally data driven, and covariance parameters for a target resolution can be directlyderived from measurements. We also developed a moving-window GIM approach to accommodate spatialnonstationarity and reduce computational burden associated with large image data. First, we demonstratedGIM and moving-window GIM on synthetic images. Aggregated synthetic images with variable resolutionswere merged to produce a single resolution image. The results show that the two approaches can produce accu-rate spatial predictions and generate prediction uncertainties. Second, we applied moving-window GIM formerging aerosol optical depth (AOD) data with variable resolutions, which were derived from two satellite sen-sors. Themodeling results show thatmoving-windowGIM can be applied formerging complementary AODdatafrom two sensors and for super-resolution mapping of global AOD distributions. Therefore, we can conclude thatGIM is a practical solution for merging complementary coarse-resolution images and for super-resolutionmapping of continuous spatial processes.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

Synthesizing complementary information derived from multiplesensors prompts the need to study rigorous data fusion algorithms.Data fusion is a process that integrates information derived from differ-ent sensors or different spectral bands of the same sensor and producesa single image that contains complementary information frommultiplesources, while minimizing loss or distortion of the original data (Hall,2004; Pohl & Van Genderen, 1998). In this work, we focus on statisticalalgorithms for merging measurements derived from multiple coarse-resolution sensors. Statistical data fusion combines statistically hetero-geneous samples from marginal distributions to make statistical infer-ence about the unobserved joint distributions or functions of them(Braverman, 2008). Statistical data fusion, including those based ongeostatistics, can produce spatial predictions of pixel values (Atkinson,Pardo-Iguzquiza, & Chico-Olmo, 2008). Recently, several geostatistical

esources and Environment,MI 48109-1041, USA. Tel.: +1

ghts reserved.

algorithms, including fixed ranking kriging (Cressie & Johannesson,2008; Shi & Cressie, 2007), fixed ranking filtering (Cressie, Shi, &Kang, 2010; Kang, Cressie, & Shi, 2010), spatial statistical data fusion(Nguyen, Cressie, & Braverman, 2012), space-time data fusion(Braverman, Nguyen, & Cressie, 2011), and moving-window kriging(Hammerling, Michalak, & Kawa, 2012), have been developed for map-ping global distributions of environmental variables, such as aerosol op-tical depth (AOD) and carbon dioxide (CO2), with sparsely distributedremotely sensed data. The numerous algorithms for merging measure-ments from different spectral bands (e.g., pan-sharpening) are beyondthe scope of this paper.

Measurements from remote sensing sensors are constantlyinfluenced by factors like atmospheric conditions, electronic noise ofsensors, and changes in illumination. In order to build geostatisticalmodels of sensor measurements contaminated with measurement er-rors, we take a stochastic view of remote sensing images. We regardthe true spatial process of interest (i.e., spectral radiance) as a randomfield, i.e., a spatial random process with a set of random variables thathave certain probability distributions. Then, a remote sensing imagecovering an area can be conceived as a realization of the random field.In thiswork,we adopt theGaussian randomfieldmodel,which involves

Page 2: Geostatistical inverse modeling for super-resolution ...danbrown/papers/wang_RSE.pdf · Geostatistical inverse modeling for super-resolution mapping of continuous spatial processes

206 J. Wang et al. / Remote Sensing of Environment 139 (2013) 205–215

a set of Gaussian probability density functions for random variables.For remote sensing measurements, the values of continuous spatialprocesses of interest are regularized to discrete pixels by aweighted av-erage process, with the spatial weights determined by point spreadfunctions (PSFs) of sensors (Jupp, Strahler, & Woodcock, 1988). The ef-fective instantaneous field of view (EIFOV) of the sensor is an area overwhichmeasurements are averaged. EIFOV defines the spatial support ofsensor measurements. Spatial support is a geostatistical concept thatmeans the shape, size, and orientation of measurements (Gotway &Young, 2002, 2005). The value assigned to a pixel represents the aver-age radiance arriving at the sensor from the EIFOV (Jupp, Strahler &Woodcock, 1988). Sensors with different sizes of pixels have differ-ent EIFOVs. As a result, measurements from these sensors have dif-ferent spatial supports.

Merging remote sensing images with variable resolutions usuallyinvolves the change-of-support problem (Curran & Atkinson, 1999;Gotway & Young, 2002, 2005). Several geostatistical algorithms havebeen developed to solve the change-of-support problem. Area-to-point krigingwas developed for downscaling areal data to point support(Kyriakidis, 2004; Kyriakidis & Yoo, 2005). Goovaerts (2008) developeda practical semivariogram deconvolution algorithm to derive point sup-port variogramparameters from areal data. This algorithm solved one ofthe key problems in the practical application of area-to-point kriging.Moreover, a parallel computing algorithm has been developed forspeeding-up the computations involved in the practical application ofarea-to-point kriging (Guan, Kyriakidis, & Goodchild, 2011). Area-to-point kriging has the potential for downscaling remote sensing data.Nguyen et al. (2012) developed a spatial statistical data fusion ap-proach, which integrates a change-of-support model into fixed rankkriging for multi-resolution data fusion. It was applied for mergingvariable-resolution AOD images derived from the Multi-angle ImagingSpectroradiometer (MISR) and the Moderate Resolution ImagingSpectroradiometer (MODIS) sensors. Sales, Souza, and Kyriakidis(2013) applied a Kriging with external drift (KED) method for in-creasing 500-m resolution (bands 3–7) MODIS images to 250-mresolution.

The spatially varying dependence structure of random variables(i.e., spatial nonstationarity) is another problem to be dealt withwhen working with remote sensing images covering large geographicareas. In geostatistics, a fundamental assumption of most models isthat random fields are second-order stationary, i.e., in a relativelysmall region the mean values of random variables are constant andthe covariance between the values of random variables only dependson the distance between them (Chilès & Delfiner, 1999). However, forspatial data covering large geographic areas, this assumption may notbe true because remote sensing images covering large geographicareas usually include spatial nonstationarity. In this work, we differen-tiated two types of spatial nonstationarity: one is spatial nonstationarityin the mean values of regionalized variables, and the other is spatialnonstationarity in the covariance structure. The need to address spatialnonstationarity has been discussed in the field of geostatistics overthe past decades. Universal kriging is one way to address spatialnonstationarity in the mean values of regionalized variables. Hass(1990) applied a moving-window kriging approach to model aciddepositions. In the moving-window kriging model, measurements inlocal-windows are used for both parameter estimations and spatialpredictions. This approach is simple to be implemented, and it alleviatesthe problem of spatial nonstationarity. Because of local fitting andcomputing, moving-window kriging is also computationally efficient.One caveat of the local-window approach is that there is no consistentcovariance function over the whole study domain. Higdon, Swall, andKern (1999) convolved spatially varying kernels to give a nonstationaryversion of the squared exponential stationary covariance function.This approach has been applied in modeling remote sensing images(D'Hondt, López-Martínez, Ferro-Famil, & Pottier, 2007). Although thismethod can produce a consistent covariance function over the whole

prediction domain, the Gaussian kernel applied in this method is toosmooth for real spatial processes.

Besides spatial nonstationarity, the problem of computationalburden also needs to be solved when applying geostatistical modelsfor dealing with large spatial data. The local-window approach(e.g., moving-window kriging) is one way to solve the problem ofcomputational burden. Data dimension reduction is another way to re-duce computational burden associated with large spatial data (Wikle,2010). Cressie and Johannesson (2008) developed a spatial mixedeffects model (i.e., fixed rank kriging) with a flexible family ofnonstationary covariance functions. For this approach, kriging can bedone exactly, and the computational complexity is linear to the size ofthe data. Moreover, Cressie et al. (2010) developed a spatial–temporalrandom effect model (i.e., fixed rank filtering), which integrates fixedrank kriging and Kalman filter for dealing with large spatial–temporaldata. Fixed rank kriging and fixed rank filtering are both approachesof data dimension reduction. These methods eliminate or reduce somecomponents of spatial variability to improve computational efficiency.

In this work, we present a geostatistical inverse modeling approachfor merging coarse-resolution remote sensing images with variablespatial supports. The geostatistical inverse model was designed to bestatistically principled, and it can produce the best predictions (i.e., min-imizing the squared errors between predictions andmeasurements) andprediction uncertainties, while honoring the original data (i.e., preserv-ing the values of original measurements) (Kitanidis, 1995; Michalak,Bruhwiler, & Tans, 2004). In the geostatistical inverse modeling frame-work, the restricted maximum likelihood method was used for estimat-ing covariance parameters related to the change-of-support problem.Moreover, we contributed a moving-window geostatistical inversemodeling approach to accommodate spatial nonstationarity and reducecomputational burden associated with large spatial data. Following theintroduction, we introduce the geostatistical inverse modeling method-ology in Section 2. In Section 3, we illustrate the computer experimentsusing synthetic and real images. The modeling results are presented inSection 4. Finally, we discuss possiblemodel improvements and summa-rize the major findings of this work.

2. Methodology

2.1. Geostatistical inverse modeling

Geostatistical inverse modeling (GIM) follows a Bayesian approach,and it is based on the principle of combining the prior information(i.e., spatial and/or temporal autocorrelation) with the informationfrom available measurements (Michalak, Bruhwiler & Tans, 2004). Spa-tial and/or temporal autocorrelation can provide information about thestructure of the data that can be used to reduce prediction uncertainty.GIM has been applied in ground water systems (Kitanidis, 1995), con-taminant sources identification (Snodgrass & Kitanidis, 1997), estimat-ing surface fluxes of atmospheric trace gases (Gourdji, Mueller,Schaefer, & Michalak, 2008; Michalak, Bruhwiler & Tans, 2004), charac-terizing attribute distributions in water sediments (Zhou & Michalak,2009), and merging remote sensing images with variable resolutions(Erickson & Michalak, 2006). There remain unrealized opportunities inapplying GIM for image scaling (i.e., downscaling and up-scaling) andmulti-resolution data fusion. In comparison with area-to-point kriging,which also deals with predicting point values from areal data, the co-variance parameters in the GIM framework can be inferred directlyfrom measurements by the restricted maximum likelihood algorithm.Moreover, measurements with variable spatial supports can also bemerged to produce a single resolution image using GIM.

We only present the key equations of GIM here. Readers are referredto Michalak et al. (2004) for an in-depth discussion about GIM. Thespatial prediction problem of GIM can be expressed as

z ¼ h s; rð Þ þ v ð1Þ

Page 3: Geostatistical inverse modeling for super-resolution ...danbrown/papers/wang_RSE.pdf · Geostatistical inverse modeling for super-resolution mapping of continuous spatial processes

207J. Wang et al. / Remote Sensing of Environment 139 (2013) 205–215

where z is an n × 1 vector of measurements, s is anm × 1 vector of pre-dictions, and the vector r contains other parameters needed by thetransformation function h(s, r). The vector v describes the model-datamismatch (i.e., error). The error includes both measurement errors as-sociated with collecting the data and any random numerical or concep-tual inaccuracies associated with the evaluation of the function h(s, r).When the function is linear with respect to the unknown surface s, itcan be written as

h s; rð Þ ¼ Hs ð2Þ

where H is a known n × mmatrix, the Jacobian representing the sensi-tivity of the measurements z to the predictions s (i.e., Hij = ∂zi / ∂sj,where i is the index ofmeasurements, and j is the index of prediction lo-cations). For remote sensing applications, the transformation functioncan be defined as the point spread functions (PSFs) of sensors. PSFs formany electro-optical sensors follow a two-dimensional Gaussian distri-bution (Huang, Townshend, Liang, Kalluri, & DeFries, 2002). The sensi-tivity matrix is calculated in the same way where one or multipledatasets are considered. In the cause of multiple datasets with variablespatial supports, the sensitivity matrix is

H ¼H1⋮HN

24 35 ð3Þ

where N is the number of datasets. We assume v has zero mean andknown covariance matrix R. The covariance of the measurement errorsis most commonly modeled as

R ¼ σ2RI ð4Þ

where σR2 is the variance of the measurement error, and I is an n × n

identity matrix.Following the Bayes' theorem, the posterior probability density

function of a state vector s given an observation vector z is proportionalto the likelihood of the state given the data (or, conversely, the probabil-ity density function of the data given the state) times the prior probabil-ity density function of the state (Eq. 5). The drift parameters β areestimated along with s.

p s;βð jzÞ ¼ p zð jsÞp s;βð ÞZp zð jsÞp s;βð Þds

ð5Þ

The prior probability density function represents the assumedspatial structure of the prediction field. The likelihood of the data repre-sents the degree to which a prediction of the unknown function s repre-sents the measurements z. The prior probability density function ismodeled as

p s;βð Þ ¼ p sð jβÞp βð Þ∝ Qj j−1=2 exp −12

s−Xβð ÞTQ−1 s−Xβð Þ� �

ð6Þ

where X is anm × tmatrix of auxiliary variables related to the distribu-tion of predictions, t is the number of auxiliary variables, β is a t × 1vector of drift coefficients on X, Xβ is the model of the trend, and Q isan m × m matrix representing the spatial autocorrelation of residualsthat are not explained by the model of the trend. The prior probabilitydensity function of β is assumed to be uniform over all values(p(β) ∝ 1). The likelihood of measurements is modeled as

p zð jsÞ∝ Rj j−1=2 exp −12

z−Hsð ÞTR−1 z−Hsð Þ� �

ð7Þ

The posterior probability density of the unknown surface distribu-tions s becomes

p s;βð jzÞ∝ Rj j−1=2 Qj j−1=2 exp −12

z−Hsð ÞTR−1 z−Hsð Þ−12

s−Xβð ÞTQ−1 s−Xβð Þ� �

ð8Þ

By taking the negative logarithm of the above formula, we can getthe objective function of GIM.

Ls;β ¼ z−Hsð ÞTR−1 z−Hsð Þ þ s−Xβð ÞTQ−1 s−Xβð Þ ð9Þ

The best predictions are obtained by minimizing the objectivefunction with respect to s and β. By taking the first order derivative ofthe above objective function with respect to s and β and expressingthe result in the matrix form, we can get the solution of the GIM.

HQHT þ RHXð ÞT

HX0

" #λT

M

� �¼ HQ

XT

� �ð10Þ

where λ is an m × n matrix of estimated weights assigned to relatedobservations, and M is a p × m matrix of Langrange multipliers. Bysolving λ and M, we can get the best predictions of s and β

bs ¼ λz ð11Þ

bβ ¼ XTQ−1X� �−1

XTQ−1bs ð12Þ

where bs and bβ are the best predictions of s and λ, respectively. Theposterior covariance of bs is calculated by the inverse of the Hessian ofthe objective function (Kitanidis, 1995; Michalak, Bruhwiler & Tans,2004).

VsbVsbβb

VsbβbVβb

� �¼ Q−1 þ HTR−1H

XTQ−1Q−1XXTQ−1X

�−1

"ð13Þ

The portion of the matrix defining the posterior covariance of bs isVbs ¼ −XM þ Q−QHTλ ð14Þ

The diagonal elements of Vbs represent the predicted error varianceof individual elements in bs. An overview of GIM for synthesizing infor-mation from multiple image datasets is shown in Fig. 1.

In order to accommodate spatial nonstationarity and reduce compu-tational burden associated with large spatial data, we developed amoving-window GIM approach, which is an integration of GIM andthe moving-window approach. The general idea of moving-windowGIM is that instead of using all available measurements to predictthe value at a prediction location, only measurements within local-windows are used. Local-windows are centered at prediction locations.The rationale behind this approach is that measurements distant from aprediction location may contribute very little to the prediction, exceptlong-range dependence in sparse data situations. In most remote sens-ing applications, this is particularly true because images usually covereverywhere in the study domain, except the missing pixels caused bymeasurement errors or clouds. Moreover, the point spread functions(PSFs) of optical sensors also have a blurring effect because pixel valuesare calculated using data from neighboring pixels (Huang, Townshend,Liang, Kalluri & DeFries, 2002). In this work, the size of local-windowswas chosen based on pre-calculations of covariance parameters overthe prediction domain in order to guarantee the size of local-windowsmeets both computational requirements and the accuracy of approxi-mation. We used measurements in local-windows for both parameterestimations and spatial predictions.

Page 4: Geostatistical inverse modeling for super-resolution ...danbrown/papers/wang_RSE.pdf · Geostatistical inverse modeling for super-resolution mapping of continuous spatial processes

Sensor Measurements

Dataset 1

Pixel coordinatesPixel values (z1)

Relationship matrix (H1)

Dataset 2

Pixel coordinatesPixel values (z2)

Relationship matrix (H2)

Dataset N

Pixel coordinatesPixel values (zN)

Relationship matrix (HN)

Geostatistical Inverse Model

Prior Information

Spatial trend (Χβ)Covariance structure (Q)

Modeling Results

Best predictions (s)Variances of errors (Vs)Conditional simulations

Fig. 1.Aflowchart of the geostatistical inversemodelingmethodology formerging remotelysensed datawith variable resolutions to produce a single resolution image.Multiple datasets(z1, z2, …, zN) can be merged to produce a single resolution image with different relation-shipsmatrices (H1,H2,…,HN). The prior information on the covariance structure of the pre-diction field (Q) was also estimated from measurements (see the following section).

208 J. Wang et al. / Remote Sensing of Environment 139 (2013) 205–215

2.2. Covariance parameter estimations

Parameter estimations refer to the statistical inference of covarianceparameters for a stochastic model of measurements. There are severalapproaches for estimating covariance parameters of geostatisticalmodels, such as ordinary-least-square (OLS; Bogaert & Russo, 1999),weighted-least-square (WLS; Cressie, 1985), and restricted maximumlikelihood (REML; Kitanidis, 1995; Michalak, Bruhwiler & Tans, 2004).In this work, we applied REML for optimizing covariance parameters bymaximizing the likelihood of measurements. REML is a variant of themaximum likelihood (ML) algorithm. REML leads to less biased estimatesof covariance parameterswhen the sample size is small (Diggle&Ribeiro,2007). Image scaling and multi-resolution data fusion involve two typesof parameter estimations. If there are available measurements at the pre-diction resolution, we can directly estimate covariance parameters usingthe measurements. Otherwise, we have to infer covariance parametersat the prediction resolution using data measured at other spatial resolu-tions. REML can be applied for the two types of parameter estimations.

First, whenmeasurements are available at the prediction resolution,REML for parameter estimations is referred to as “REML-Kriging(Gourdji, Hirsch, Mueller, Yadav, Andrews & Michalak, 2010; Nagle,Sweeney, & Kyriakidis, 2011).” The objective function of “REML-Kriging” is defined as

LΨ ¼ 12

ln Qj j þ 12

ln XTQ−1X��� ���

þ 12

sT Q−1−Q−1X XTQ−1X� �−1

XTQ−1� �

s� �

ð15Þ

In this case, the gradient-based searching routine can be used for pa-rameter optimizations because the number of measurements is usuallyquite large for remote sensing applications. The uncertainties of the co-variance parameters (i.e., sill, range, and nugget) in Q can be calculatedby the Hessian computations.

Second, when no measurements are available at the prediction res-olution, REML for parameter estimations is referred to as “REML-InverseModeling (Kitanidis, 1995; Michalak, Bruhwiler & Tans, 2004).” The ob-jective function of “REML-Inverse Modeling” is defined as

L ¼ 12

ln Ψj j þ 12

XTHTΨ−1HX��� ���þ 1

2zTΞz ð16Þ

Ψ ¼ HQHT þ R ð17Þ

Ξ ¼ Ψ−1−Ψ−1HX XTHTΨ−1HX� �−1

XTHTΨ−1 ð18Þ

If the number of measurements is small, we can use theunconstrained nonlinear optimization routine for parameter optimiza-tions. Otherwise, the gradient-based searching routine can be appliedfor parameter optimizations. The two optimization algorithms areboth provided in the software package MATLAB (Mathworks Inc.,Natick, Massachusetts, USA). In the moving-window GIM model, weused “REML-Inverse Modeling” for estimating covariance parametersin local-windows because “REML-InverseModeling” leads to less biasedestimates of covariance parameters when the sample size is small(Diggle & Ribeiro, 2007). The unconstrained nonlinear optimizationroutine was applied for parameter optimizations in local-windows.

In this work, we applied the exponential covariance function for allgeostatistical models. The exponential covariance function and thecorresponding semivariogrammodel are defined as

Q hð jσ2; lÞ ¼ σ2 exp −h

l

� �ð19Þ

γ hð jσ2; lÞ ¼ σ2 1− exp −h

l

� �� �ð20Þ

where σ2 is the variance, l is the integral scale, and h is the distancebetween two measurements. The straight-line distance (i.e., Euclidiandistance) and the great-circle distance were used in this work forsynthetic and real data modeling experiments, respectively. The great-circle distance between xi and xj is defined as (Diggle & Ribeiro, 2007)

h xi; xj

� �¼ θ cos−1 sinφi sinφ j þ cosφi cosφ j cos ϕi−ϕ j

� �� �ð21Þ

where θ is the radius of the Earth, φi and ϕi are the latitude and longi-tude of xi, respectively, and φj and ϕj are the latitude and longitude ofxj, respectively.

2.3. Unconditional and conditional simulations

Unconditional and conditional simulations are spatially consistentMonte Carlo simulations. They are geostatistical approaches for describ-ing spatial variability of random fields. Conditional simulations generaterealizations of a random field that possess the same structural character-istics asmeasurements. It can also reproducemeasurements. Conditionalsimulations are equally likely realizations that have the same spatialstructure defined by the covariance function. Unconditional simulationsalso follow the same spatial dependence structure defined by measure-ments, but they cannot reproduce measurements (Chilès & Delfiner,1999; Diggle & Ribeiro, 2007). Unconditional and conditional simulationscan generatemultiple outcomes, andeachofwhich is an equally likely re-alization. When modeling a stationary Gaussian random field over anarea that is much larger than the range, a single simulation can give aview of a variety of possible local situations (Chilès & Delfiner, 1999).

In order to demonstrate the application of GIM for image downscal-ing and multi-resolution data fusion on synthetic images with spatiallystationary characteristics, we used unconditional simulations to gener-ate such synthetic data. In this case, the Gaussian random field modelwas used inmodeling the spatial random process. The Gaussianmodelsare widely used because they are simple, and they can capture a widerange of spatial behaviors according to the specification of their correla-tion structures (Diggle & Ribeiro, 2007). The random field is spatiallystationary (i.e., second-order stationary) if the mean values of regional-ized variables are constant, and the covariance is only determined bythe distance between two measurements (Chilès & Delfiner, 1999).The generation of synthetic images proceeded as follows. The user-

Page 5: Geostatistical inverse modeling for super-resolution ...danbrown/papers/wang_RSE.pdf · Geostatistical inverse modeling for super-resolution mapping of continuous spatial processes

209J. Wang et al. / Remote Sensing of Environment 139 (2013) 205–215

specified covariance function is first decomposed by the Choleskydecomposition

Q ¼ CCT ð22Þ

where Q is the covariance function. The unconditional simulations aregenerated by

sui ¼ Xβ þ Cui ð23Þ

where Xβ is the model of spatial trend and can be set as zero, and ui is avector of normally distributed random numbers with zero-mean andunit-variance. In the case of generating synthetic images, the model ofspatial trend was set as zero.

The conditional simulations of the geostatistical inverse model aredefined as (Kitanidis, 1995; Michalak, Bruhwiler & Tans, 2004)

sci ¼ sui þ λ zþ v−Hsuið Þ ð24Þ

where v is a vector of normally distributed random numbers, which issampled from a multivariate normal distribution with mean zero andthemodel-data mismatch error covariance R = σR

2I. sui is the uncondi-tional simulation of GIM (Eq. 23).

2.4. Fixed Rank Kriging

In order to demonstrate the application of the moving-window GIMapproach for image downscaling and multi-resolution data fusion onsynthetic data with spatially nonstationary characteristics, we usedfixed ranking kriging (FRK; Cressie & Johannesson, 2008) to generatesuch synthetic data. FRK is a low-rank representation of spatial contin-uous random processes, and it eliminates or reduces some componentsof spatial variability to improve computational efficiency. This datadimension reduction approach was developed for dealing with thespatial prediction problem associated with large spatial data. Remotesensing images covering large geographic areas usually have spa-tially nonstationary characteristics. FRK can accommodate spatialnonstationarity by using a set of multi-scale basis functions in themodel. Readers are referred to Cressie et al. (2010), Cressie andJohannesson (2008), and Kang et al. (2010) for detailed discussionsabout FRK. Only brief descriptions are provided here because we onlyused FRK to generate synthetic images.

The goal is to make statistical inferences of a spatial processY sð Þ : s∈D⊂Rd

n obased on measurements with errors. The measure-

ments are given by the data model.

Z sð Þ ¼ Y sð Þ þ ε sð Þ ð25Þ

where {ε(s):s ∈ D} is a white-noise Gaussian process with zero-mean and variance σε

2 N 0. We assume that Y(s) has the followingstructure.

Y sð Þ ¼ μ sð Þ þ υ sð Þ ð26Þ

where μ(·) is a deterministic trend function representing the large-scale spatial variability (e.g., μ(·) = Xβ). The small-scale spatialvariability is modeled as a Gaussian process. We assume υ(·) iswith zero-mean, and it follows a spatial random effect model. Theunknown random variables to be predicted are fixed at a number,which is equal to the number of spatial basis functions. It can beexpressed as

υ sð Þ ¼ S sð Þ′ηþ ξ sð Þ ð27Þ

where S(⋅) ≡ (S1(⋅),..., Sr(⋅))′ is a set of basis-functions, which cancapture different scales of spatial dependence. The basis-functions arenot necessarily orthogonal, but they should represent information atmultiple resolutions. The multi-resolution wavelet basis function

(Cressie, Shi & Kang, 2010; Kang, Cressie & Shi, 2010) and the bi-square basis function (Cressie & Johannesson, 2008; Nguyen, Cressie,& Braverman, 2012) have been used in the previous studies. In thiswork, we chose the bi-square basis function considering its simplicityand computational efficiency (Cressie & Johannesson, 2008).

Sj lð Þ uð Þ ¼ 1− u−vj lð Þ =rl� �2

�2; u−vj lð Þ

≤rl

0 ; otherwise

8<: ð28Þ

where vj(l) is one of the center points of the lth resolution (l = 1, 2, 3,⋯N),and rl = 1.5 (i.e., the shortest arc distance between center points of the lthresolution). The generated synthetic images can have more spatial vari-ability by increasing the number of resolutions in the model. However,this will increase computational requirements for generating such syn-thetic data. The spatially nonstationary covariance function ismodeled as

C u; vð Þ ¼ S uð Þ′ΚS vð Þ; u; v∈Rd ð29Þ

where K is a positive definite r × r unknown matrix, η = (η1,..., ηr)′ is azero-mean Gaussian random vector with cov(η) ≡ Κ, and ξ(·) capturesthe fine-scale spatial variability. The fraction of the fine-scale spatialvariability to the total spatial variability and the signal-to-noise ratio ofthe generated synthetic image can also be defined in the FRK model(Cressie, Shi & Kang, 2010).

3. Computer experiments

In this work, the overall geostatistical modeling experimentsproceeded as follows. First, in order to isolate various problems associ-ated with real remote sensing images (e.g., measurement errors), wedemonstrated GIM and moving-window GIM for image downscalingand multi-resolution data fusion on synthetic images. Second, weapplied moving-window GIM for merging AOD measurements derivedfrom two sensors and for super-resolution mapping of global AODdistributions. For all experiments on synthetic and real data, we usedREML for estimating covariance parameters. In order to assess predic-tion accuracies and the correctness of prediction uncertainties, wecalculated the root mean square error (RMSE) between spatial predic-tions and measurements, and the percentage of pixels in originalimages with their values falling within two standard deviations of thebest predictions. We call the first measure the RMSE index and thesecond measure the percentage index. In addition, image registrationis an important step prior to pixel-level data fusion. We assumed thatthe images used for multi-resolution data fusion have been registeredcorrectly. Analyzing the influence of image mis-registration on theaccuracy of spatial predictions is beyond the scope of this work. All ofthe geostatistical models discussed in the paper (Section 2) and thefollowing experiments were coded in MATLAB by the authors.

3.1. Generating synthetic images

We applied unconditional simulations with user-specified covari-ance parameters to generate synthetic images with spatially stationarycharacteristics. The synthetic images with and without nugget effectwere generated and used in the modeling experiments. The simulatedreference pixel values were distributed on a 120 × 120 regular grid(of unit cell size). In this work, we differentiated fine-scale spatialvariability and measurement errors, and we defined nugget effect asfine-scale variability of image data (Cressie, Shi & Kang, 2012). More-over, we did not introduce simulated measurement errors into the syn-thetic images. The covariance functions used for generating syntheticimages are

C hð Þ ¼ 0:01� 1− h10

� �ð30Þ

Page 6: Geostatistical inverse modeling for super-resolution ...danbrown/papers/wang_RSE.pdf · Geostatistical inverse modeling for super-resolution mapping of continuous spatial processes

210 J. Wang et al. / Remote Sensing of Environment 139 (2013) 205–215

C hð Þ ¼ 0:005þ 0:02� 1− h5

� �ð31Þ

where C(h) is the covariance function, and h is the Euclidian distancebetween two locations. Eq. (30) is the exponential covariance function,and the values of sill and range are 0.01 and30, respectively. Eq. (31) is acombination of the exponential covariance function and nugget effect,and the values of sill and range are 0.025 and 15, respectively.

We used FRK to generate a synthetic image with spatiallynonstationary characteristics. The values of sill and range for the expo-nential covariance function in FRKwere set as 0.05 and 30, respectively.The specified covariance function was used to parameterize the Kmatrix in the spatially nonstationary covariance function (Eq. 29). Inorder to simulate spatial variability of real remote sensing images, weused a large number of basis-functions for characterizing multi-scalespatial variability. In this experiment, we set the resolution number (l)as five. The number of bi-square basis-functions used for generatingthe synthetic image was 1364, which was the sum of basis-functionsin each of the five lower resolutions: 4 (1 × 4), 16 (4 × 4), 64(16 × 4), 256 (64 × 4), and 1024 (256 × 4). We set the fraction offine-scale spatial variability to the total spatial variabiltiy as 0.05. Wedid not introduce measurement errors into the synthetic image. There-fore, the value of the signal-to-noise ratio was ∞. The simulated pixelswere distributed on a 320 × 320 regular grid (of unit size).

3.2. Merging synthetic images with variable resolutions

We designed four experiments for demonstrating the application ofGIM for image downscaling and multi-resolution data fusion on syn-thetic images with spatially stationary characteristics: two were onthe synthetic image without nugget effect, and the other two were onthe synthetic image with nugget effect. The modeling experimentsproceeded as follows. Given space limitations, we only used the exper-iment on the synthetic imagewithout nugget effect as an example. First,we averaged the synthetic image to two coarser resolutions using anon-overlapping moving-window. The first aggregated image waswith the spatial support of 64 (i.e., the size of the local-window was8 × 8), and the second aggregated image was with the spatial supportof four (i.e., the size of the local-window was 2 × 2). We only usedthe diagonal part of the second aggregated image for the experiment.We set two relationship matrices (i.e., the H matrix in GIM) for thetwo aggregated images because they had different spatial supports.We used the uniform distribution function to represent the relation-ships between coarse-resolution measurements and fine-resolutionpredictions, and we assumed sub-pixels within a coarse-resolutionpixel had the same contribution to the value of the coarse-resolutionpixel. The size of the prediction field was set as the same size of theoriginal simulated image. We also generated conditional simulationsto show spatial uncertainties of the predictions. In the secondexperiment, the aggregated image with the spatial support of 64was merged with the diagonal part of the aggregated image with thespatial support of 16 (i.e., the size of the local-window was 4 × 4). Wealso ran two similar experiments on the synthetic image with nuggeteffect.

We demonstrated the application of moving-window GIM for multi-resolution data fusion on the synthetic imagewith spatially nonstationarycharacteristics. Similar to the above experiments, the synthetic imagegenerated by FRK was averaged using non-overlapping moving win-dows to two coarser resolutions: one aggregated image was with thespatial support of 16 (i.e., the size of the local-window was 4 × 4),and the other aggregated image was with the spatial support of four(i.e., the size of the local-window was 2 × 2). Only the diagonal partof the second aggregated image was used in the experiment. Wealso set two relationship matrices for the two aggregated images withdifferent spatial supports. The uniform distribution function was alsoused for representing the relationships between coarse-resolution

measurements and fine-resolution predictions. In order to assess themodeling results, the size of the prediction field was also set as thesame size of the original simulated image.

3.3. Merging AOD measurements with variable resolutions

We applied moving-window GIM for merging Level-3 AOD data de-rived from the MISR and the Terra-MODIS sensors and for super-resolution mapping of global AOD distributions in short time-intervals.The spatial resolutions of Level-3 MISR AOD and Terra-MODIS AODare 0.5° × 0.5° and 1° × 1°, respectively. MISR and Terra-MODIS areboth carried by the Terra satellite but with different sensor characteris-tics, such as the size of swath and the instantaneous field of view.Second, the algorithms for retrieving AOD from original MISR andTerra-MODIS data are different. MODIS land retrieval algorithm doesnot operate over desert (e.g., northern Africa and Middle East) orother bright land surfaces, and MODIS has difficulties in computingAOD in part of its swath over darkwater due to sun glint. MISR canmea-sure AOD in the above conditions (Kahn et al., 2009). Moreover, MODIShasmuchwider spatial coverage thanMISR does due to itswider swath.However, MISR has measurements in some areas where MODIS doesnot have. Given the differences in instrumental designs and retrievalalgorithms, AOD measurements from the MISR and MODIS sensorswere used in conjunctionwith one another to exploit their complemen-tary strengths, especially for the middle and low latitudes (Kahn et al.,2009). In addition, AOD is changing quickly in space and time, measure-ments from one sensor in a short time-interval may not be enough tocapture the functional features of the continuous AOD process. Thecomplementary coverage makes the two AOD datasets suitable fordata fusion.

However, there were several difficulties in merging the AODmeasurements from the MISR and MODIS sensors. We have discussedspatial nonstationarity and computational burden associated withlarge spatial data. In the real data modeling experiments, we had bothof the problems. In this section, we ran two experiments. One experi-ment was merging the one-day AOD measurements (i.e., August 1,2008) derived from the two sensors, and the other experiment wasmerging the eight-day AOD measurements (i.e., from July 28, 2008 toAugust 4, 2008) derived from the two sensors. These data were selectedrandomlywith no specific reasons. For the one-dayAODmeasurements,the numbers of pixels for theMISR AODandMODISAODmeasurementswere 15,612 and 23,253, respectively. For the eight-day AOD measure-ments, the numbers of pixels for the MISR AOD and MODIS AOD mea-surements were 129,664 and 40,766, respectively. Most of the AODmeasurements from the two sensors were made between 70° N and50° S, and between 180° W and 180° E. For the two real data modelingexperiments, the MISR AOD and MODIS AOD measurements weremerged and downscaled to 0.25° × 0.25° for super-resolution mappingof global AOD distributions. Similar to the experiments on syntheticdata, we also used two relationship matrices for the two AODmeasure-ments with different spatial supports. We did not know exactly the re-lationships between measurements and predictions for the aggregatedAOD data. We assumed that sub-pixels within a coarse-resolutionpixel had the same contribution to the value of the coarse-resolutionpixel. We used the uniform distribution function to calculate therelationship matrix (i.e., the H matrix in GIM). We applied moving-window GIM to accommodate spatial nonstationarity and reduce com-putational burden associatedwith large spatial data. By pre-calculationsand analyses, we set a local-window with the size of 2000-km for bothparameter estimations and spatial predictions. Even for the areas withthe AOD measurements from both the MODIS and MISR sensors,merging them through the moving-window GIM approach canprovide more accurate and robust predictions of the true AOD process.We also calculated prediction uncertainties for the two real dataexperiments.

Page 7: Geostatistical inverse modeling for super-resolution ...danbrown/papers/wang_RSE.pdf · Geostatistical inverse modeling for super-resolution mapping of continuous spatial processes

211J. Wang et al. / Remote Sensing of Environment 139 (2013) 205–215

4. Results

4.1. Image downscaling and multi-resolution data fusion

For the synthetic images generated by unconditional simulations,the image without nugget effect and with a longer correlation lengthand a lower sill (Fig. 2A) shows less spatial variability than the otherimage (Fig. 2B). The synthetic image generated by FRK is shown inFig. 2C.

The results of the five synthetic data modeling experiments areshown in Table 1. The values of RMSE index and the percentage indexshow that the spatial predictions of the five experiments were all accu-rate and the prediction uncertainties were all correct. The values ofRMSE were all lower than 10% of the original measurements. Theoreti-cally, for realizations of Gaussian random fields, 95% of the pixel valuesof the original images should fall within the two standard deviationbounds of the best spatial predictions. The prediction accuracies of themodeling experiments using the synthetic imageswithout nugget effect(Experiments 1 and 2) were better than the results of the experimentsusing the synthetic images with nugget effect (Experiments 3 and 4).The prediction accuracies of the modeling experiments using fine-resolution aggregated images (Experiments 1 and 3) were better thanthe results of the experiments using coarse-resolution aggregatedimages (Experiments 2 and 4). The prediction accuracy of Experiment5 was better than the results of the first four experiments, and the per-centage index of this experiment was also higher than the results of thefirst four experiments. This may be caused by the fact that the syntheticimage generated by FRK has less spatial variability than the syntheticimages generated by unconditional simulations.

Given space limitations, we only show the figures of the modelingresults of Experiments 1, 3, and 5 (Fig. 3). The spatial predictionsshowmore spatial details in the diagonal parts with fine-resolution ag-gregated data added in (Fig. 3B, F, and J). Moreover, the diagonal partshad lower prediction uncertainties than the other parts (Fig. 3C, G,and K). These results were reasonable because all spatial predictionsbenefit from the presence of nearby measurements. Spatial predictionsmade in the densely measured regions should have lower errors thanthose made in the sparsely measured regions. The conditional simula-tions (Fig. 3D, H, and L) look more realistic as the original simulateddata (Fig. 2A, B, and C), and they show less smoothing effect than thespatial predictions (Fig. 3B, F, and J). From Fig. 3K we can see that thevalues of prediction uncertainties (i.e., standard errors) varied acrossthe domain. This demonstrates that moving-window GIM couldaccommodate the spatial nonstationarity of the synthetic image gener-ated by FRK. Therefore, using spatially stationary covariance functionsfor the realizations of spatial nonstationary random fields mayintroduce more prediction errors and generate incorrect predictionuncertainties.

Fig. 2. Synthetic images: (A) and (B) are the images without and with nugget effect, respectgenerated by FRK.

4.2. Super-resolution mapping of global AOD distributions

From the one-day AOD measurements derived from the MISR andMODIS sensors (Fig. 4A and C), we can see that MODIS had morecomplete spatial coverage than MISR did. MODIS AOD was missing insome of the low andmiddle latitudes (Fig. 4A).MISR hadmeasurementsin those regions, but the field of view of MISRwas narrow (Fig. 4C). Theeight-day MODIS AOD measurements had more complete coveragethan the one-day MODIS AOD measurements. However, there werestill gaps in the low and middle latitudes (i.e., North Africa and MiddleEast) (Fig. 4B). The eight-day MISR AODmeasurements had better cov-erage than the one-day MISR AOD measurements, but they still did notcover the whole globe (Fig. 4D). Visually, the data gaps of the eight-dayMODIS AOD measurements can be mostly covered by the eight-dayMISR AOD measurements. Merging the two datasets can produce amore complete picture of global AOD distributions.

By comparing the spatial predictions using the one-day and eight-day AOD measurements (Fig. 4E and F), we can find that although thegeneral patterns of the two images were similar, there were still differ-ences between the two figures, especially in North Eurasia and NorthAmerica. This indicates that aerosol concentrations were shifting quick-ly over space and time. Therefore, mapping AOD distributions overshort time-intervals may be important. Previous studies found thatthe high concentrations of AOD were from different sources: highAOD values in North Africa and Middle East, Central Africa, and EastAsia (e.g., China) and North America were mainly from dust, smoke,and pollution, respectively (Ichoku, Kaufman, Remer, & Levy, 2004).The standard errors of spatial predictions show that the errors of theexperiment using the eight-day AOD measurements (Fig. 4G) werelower than the errors of the experiment using the one-day AODmeasurements (Fig. 4H). In addition, the prediction uncertainties insome of the low and middle latitudes with fewer measurements werehigher than the prediction uncertainties in the regionswith densemea-surements. These were reasonable because there were few measure-ments to constrain spatial predictions in the low and middle latitudes.

4.3. Moving-window GIM versus spatial statistical data fusion

In order to gain insights into the relative performance and com-putational efficiency of moving-window GIM, we compared thisapproach with spatial statistical data fusion (SSDF; Nguyen, Cressie& Braverman, 2012). SSDF is a recently developed geostatistical ap-proach for merging large spatial data with variable resolutions.SSDF was developed based on FRK, and it uses a data dimension re-duction approach to improve computational efficiency. For the modelcomparison, we selected the subsets of the eight-day AOD measure-ments (i.e., from July 28, 2008 to August 4, 2008) from the MISR andMODIS sensors within the latitudes 20° N–60° N and the longitudes

ively. These two images were generated by unconditional simulations. (C) is the image

Page 8: Geostatistical inverse modeling for super-resolution ...danbrown/papers/wang_RSE.pdf · Geostatistical inverse modeling for super-resolution mapping of continuous spatial processes

Table 1Results of merging multi-resolution synthetic images.

Experiments with aggregated images RMSE index Prediction uncertainty Percentage index

Lower_bound Upper_bound

Images with spatially stationary characteristicsImages without the nugget effect

Experiment 1: 15 × 15 and part of 60 × 60 0.0395 0.0237 0.0663 95.60%Experiment 2: 15 × 15 and part of 30 × 30 0.0427 0.0305 0.0667 95.72%

Images with the nugget effectExperiment 3: 15 × 15 and part of 60 × 60 0.1008 0.0781 0.1361 95.61%Experiment 4: 15 × 15 and part of 30 × 30 0.1074 0.0917 0.1369 95.65%

Images with spatially nonstationary characteristicsExperiment 5: 160 × 160 and part of 80 × 80 0.0361 0.0209 0.0816 98.27%

212 J. Wang et al. / Remote Sensing of Environment 139 (2013) 205–215

70° E–120° E. The numbers of pixels for theMISR AOD andMODIS AODmeasurements within the subsets were 4412 and 1477, respectively. Inorder to evaluate both short-range and long-range predictions, we usedto two sampling schemes here. First, we randomly selected 10% of theAOD measurements within the subsets and treated them as testingdata. The remaining 90% of themeasurementswere used for spatial pre-dictions. This was for evaluating short-range predictions. Second, werandomly selected a test region within the latitudes 30° N–40° N andlongitudes 90° E–120° E (i.e., 10% of the total selected subset area).The numbers pixels for the MISR AOD and MODIS AOD measurementswithin the test regionwere 397 and 159, respectively. This was for eval-uating long-range predictions (e.g., big gaps in remote sensing images).In the two experiments, spatial predictionsweremade at the resolutionof 0.25° × 0.25°. For SSDF, the number of bi-square basis functions usedwas 340 (i.e., the resolution number was four). For moving-windowGIM, we used the uniform distribution function to calculate the

Fig. 3. Results of merging synthetic data with variable resolutions: (A), (E), and (I) are overlay(B), (F), (J) are spatial predictions with data from (A), (E), and (I), respectively; (C), (G),andconditional simulations for (B), (F), and (J), respectively.

relationship matrix (i.e., the H matrix in GIM). The spatial predictionswere averaged back to the spatial resolutions of the AODmeasurementsfor cross-validations. In the first experiment, the values of RMSE formoving-window GIM and SSDF were 0.3519 and 0.8106, respectively.In the second experiment, the values of RMSE for moving-windowGIM and SSDF were 0.4973 and 0.8112, respectively. The value ofRMSE formoving-windowGIM increased significantly in the second ex-periment. This was reasonable because spatial predictions made bymoving-window GIM were based on nearby measurements in local-windows, and SSDF used all of the available data for spatial predictions.The values of RMSE for the twomodelsmay vary across different test re-gions because the availabilities of the MODIS AOD and MISRAOD measurements vary across space (Fig. 4A–D). SSDF was morecomputationally efficient than moving-window GIM. For each of theabove two experiments, SSDF took about 117 s on a 2.8 GHz machinewith an Intel Core Processor, and moving-window GIM took about

s of synthetic images with different resolutions for Experiments 1, 3, and 5, respectively;(K) are prediction uncertainties for (B), (F), and (J), respectively; and (D), (H), (L) are

Page 9: Geostatistical inverse modeling for super-resolution ...danbrown/papers/wang_RSE.pdf · Geostatistical inverse modeling for super-resolution mapping of continuous spatial processes

Fig. 4. Results ofmerging real datawith variable resolutions: (A) and (C) are the one-dayAODmeasurements (i.e., August 1, 2008) derived from theMODIS andMISR sensors, respectively;(B) and (D) are the eight-day AODmeasurements (i.e., from July 28, 2008 to August 4, 2008) derived from theMODIS andMISR sensors, respectively; (E) is spatial predictions withmea-surements from (A) and (C); (F) is spatial predictionswithmeasurements from (B) and (D); and (G) and (H) are prediction uncertainties associatedwith (E) and (F), respectively. Only themodeling results covering the continents are shown in the maps.

213J. Wang et al. / Remote Sensing of Environment 139 (2013) 205–215

6 min on a 16-processor workstation (i.e., the computing processwas parallelized). These were reasonable because SSDF eliminatessome components of spatial variability to improve computationalefficiency.

5. Discussion and conclusions

We have demonstrated GIM and moving-window GIM for imagedownscaling and multi-resolution data fusion on synthetic and realimages. There were several limitations in the above experiments. First,we did not deal with measurement errors of remote sensing images.Measurement errors varywith spatial resolutions of remote sensing im-ages, and they also vary for data derived from different sensors. For theexperiments on synthetic and real images, we treated nugget effect as

fine-scale variability, and we built geostatistical models to model it.However, real remote sensing images are always contaminated withmeasurement errors. Future work needs to examine the influence ofmeasurement errors on spatial predictions. Fortunately, GIM canaccount for measurement errors by the model-data mismatch matrix(i.e., the R matrix in GIM). Second, we applied the Gaussian randomfield model for both synthetic and real images. In some cases, measure-ments may have non-Gaussian characteristics. In those occasions weneed to extend the geostatistical inverse model to accommodate non-Gaussian spatial data by using a set of models known as generalized lin-earmixedmodels. The actual probability distributions ofmeasurementsneed to be explored in order to define the link function of generalizedlinear mixed models. Third, we used exponential covariance functionsfor both synthetic and real data modeling experiments. However,

Page 10: Geostatistical inverse modeling for super-resolution ...danbrown/papers/wang_RSE.pdf · Geostatistical inverse modeling for super-resolution mapping of continuous spatial processes

214 J. Wang et al. / Remote Sensing of Environment 139 (2013) 205–215

this could affect the assessment of the correctness of prediction uncer-tainties (i.e., the values of the percentage index) when the data do notfollow exactly the exponential covariance function. Future work needsto include detailed exploratory analyses about the characteristics ofthe spatial variability of the data. In this work, we did not use auxiliaryinformation for spatial predictions, although GIM can include auxiliaryvariables (i.e., the model of the spatial trend Xβ) for improving predic-tion accuracy. Other studies found that including auxiliary variables inGIM can improve prediction accuracy (Gourdji, Mueller, Schaefer &Michalak, 2008). However, including inappropriate auxiliary informa-tion may bias the model. Usually, a statistical test of the contributionsof auxiliary variables is needed (Gourdji, Mueller, Schaefer & Michalak,2008).

We used REML for estimating covariance parameters related to thechange-of-support problem. The estimated covariance parametersalways have errors, and these errors can consequently propagate tospatial predictions. Therefore, analyzing error propagations in spatialpredictions is very important for understanding the accuracy of spatialpredictions. By runningmodeling experiments,we found that the errorsassociated with the estimated covariance parameters increased withthe increase in the resolution gap between measurements and spatialpredictions. More in-depth analyses about error propagations in spatialpredictions using GIM are still needed. Moreover, we only applied GIMfor merging coarse-resolution satellite images. When applying GIM onfine-resolution images with tens or hundreds of millions of pixels, thecomputational burden will still be a challenge. This is similar to manyother geostatisticalmodels for dealingwith large spatial data.We devel-oped amoving-windowGIM approach to reduce computational burdenassociated with large spatial data. When parallelizing the computationsof moving-windows, spatial predictions can be made very fast. Besidesparallel computing, low-rank representations (Wikle, 2010) and covari-ance tapering (Fuentes & Smith, 2001) are also possible solutions tothe problem of computational burden. Furthermore, although weused the uniform distribution function to simulate the relationships be-tween coarse-resolutionmeasurements andfine-resolution predictions,real point spread functions (e.g., the two-dimensional Gaussian distri-bution) of sensors can be used to calculate the relationship matrix(i.e., theHmatrix in GIM). Finally, we only applied GIM for image down-scaling. GIM can also be used for image up-scaling with one or moremeasurements, if we set-up a coarse prediction grid and define the rela-tionship matrix between fine-resolution measurements and coarse-resolution predictions.

We have presented an introduction of the GIM methodology formerging coarse-resolution images with variable resolutions andfor super-resolution mapping of continuous spatial processes.The results of the synthetic data modeling experiments demon-strate that GIM and moving-window GIM can produce accuratespatial predictions and generate prediction uncertainties. The re-sults of the real data modeling experiments show that moving-window GIM can be used for merging the complementary MISRAOD and MODIS AOD measurements and for super-resolutionmapping of global AOD distributions. To the best of our knowledge,this is the first introduction of the moving-window GIM approachfor solving the scale and data fusion related problems in remotesensing.

Acknowledgements

The work was conducted with the financial support from NASALand-Cover and Land-Use Change Program (NNX09AK87G). Theauthors would like to thank the science teams for producing andsharing the products of MODIS AOD and MISR AOD. The authors alsowould like to thank Dr. Hai Nguyen from the Jet Propulsion Laboratory,California Institute of Technology, andDr. Emily Kang fromUniversity ofCincinnati for providing the sample codes related to fixed rank krigingand spatial statistical data fusion.

References

Atkinson, P.M., Pardo-Iguzquiza, E., & Chico-Olmo, M. (2008). Downscaling cokriging forsuper-resolution mapping of continua in remotely sensed images. IEEE Transactionson Geoscience and Remote Sensing, 46, 573–580.

Bogaert, P., & Russo, D. (1999). Optimal spatial sampling design for the estimationof variogram based on a least squares approach. Water Resources Research, 35,1275–1289.

Braverman, A. (2008). Data fusion. In E. L. Melnick, & B.S. Everitt (Eds.), Encyclopedia ofquantitative risk analysis and assessment (pp. 125–139). New York: Wiley.

Braverman, A., Nguyen, H., & Cressie, N. (2011). Space-time data fusion for remotesensing applications. Working paper. CA: California Institute of Technology.

Chilès, J., & Delfiner, P. (1999). Geostatistics: modeling spatial uncertainty. Hoboken, NJ:John Wiley & Sons, Inc.

Cressie, N. (1985). Fitting variogram models by weighted least squares. MathematicalGeology, 17, 563–586.

Cressie, N., & Johannesson, G. (2008). Fixed rank kriging for very large spatial data sets.Journal of Royal Statistical Society B, 70, 209–266.

Cressie, N., Shi, T., & Kang, E. (2010). Fixed rank filtering for spatial–temporal data. Journalof Computational and Graphical Statistics, 19, 724–745.

Curran, P. J., & Atkinson, P.M. (1999). Issues of scale andoptimal pixel size. In A. Stein, & F. D.V.D. Meer (Eds.), Spatial statistics for remote sensing (pp. 115–133). Netherlands: Springer.

D'Hondt, O., López-Martínez, C., Ferro-Famil, L., & Pottier, E. (2007). Spatiallynonstationary anisotropic texture analysis in SAR images. IEEE Transactions onGeoscience and Remote Sensing, 45, 3905–3918.

Diggle, P., & Ribeiro, P. J., Jr. (2007). Model-based geostatistics. New York: Springer.Erickson, T., & Michalak, A.M. (2006). Merging of variable-resolution imagery using

geostatistics and sensor PSFs. Proceedings of ASPRS Annual Conference (pp. 1–8)(Reno, NV) http://www.asprs.org/Conference-Proceedings/ASPRS-Reno-2006-Conference-Proceedings.html.

Fuentes, M., & Smith, R. L. (2001). A new class of nonstationary spatial models. TechnicalReport, North Carolina State University, Institute of Statistics Mimeo Series #2534,Raleigh. NC.

Goovaerts, P. (2008). Kriging and semivariogram deconvolution in the presence of irreg-ular geographical units. Mathematical Geology, 40, 101–128.

Gotway, C. A., & Young, L. J. (2002). Combing incompatible spatial data. Journal of theAmerican Statistical Association, 97, 632–648.

Gotway, C. A., & Young, L. J. (2005). Change of support: An interdisciplinary challenge. InP.M. Atkinson, & C. D. Lloyd (Eds.), Geostatistics for Environmental Application(pp. 1–13). Netherlands: Springer.

Gourdji, S. M., Hirsch, A. I., Mueller, K. L., Yadav, V., Andrews, A. E., & Michalak, A.M.(2010). Regional-scale geostatistical inverse modeling of North American CO2 fluxes:a synthetic study. Atmospheric and Chemical Physics, 10, 6151–6171.

Gourdji, S. M., Mueller, K. L., Schaefer, K., & Michalak, A.M. (2008). Global monthlyaveraged CO2 fluxes recovered using a geostatistical inverse modeling approach: 2.Results including auxiliary environmental data. Journal of Geophysical Research, 113,D21115, http://dx.doi.org/10.1029/2007/JD009733.

Guan, Q., Kyriakidis, P. C., & Goodchild, M. F. (2011). A parallel computing approach to fastgeostatistical areal interpolation. International Journal of Geographical InformationScience, 25(8), 1241–1267.

Hall, D. L. (2004). Mathematical techniques in multisensor data fusion. Norwood, M.A.:Artech House Inc.

Hammerling, D., Michalak, A.M., & Kawa, S. R. (2012). Mapping of CO2 at high spatiotem-poral resolution using satellite observations: Global distributions from OCO-2.Journal of Geophysical Research, 117, http://dx.doi.org/10.1029/2011JD017015(D06306).

Hass, T. C. (1990). Lognormal andmovingwindowmethods of estimating acid deposition.Journal of the American Statistical Association, 85, 950–963.

Higdon, D.M., Swall, J., & Kern, J. (1999). Nonstationary spatial modeling. In J. M. Bernardo,J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian Statistics 6 (pp. 1–9). Oxford,U.K.: Oxford University Press.

Huang, C., Townshend, J. R. G., Liang, S., Kalluri, S. N. V., & DeFries, R. S. (2002). Impactof sensor's spread function on land cover characterization: assessment anddeconvolution. Remote Sensing of Environment, 80, 203–212.

Ichoku, C., Kaufman, Y. J., Remer, L. A., & Levy, R. (2004). Global aerosol remote sensingfrom MODIS. Advances in Space Research, 34, 820–827.

Jupp, D. L. B., Strahler, A. H., & Woodcock, C. E. (1988). Autocorrelation and regularizationin digital images I. basic theory. IEEE Transactions on Geoscience and Remote Sensing,26, 463–473.

Kahn, R. A., Nelson, D. L., Garay, M. J., Levy, R. C., Bull, M.A., Diner, D. J., et al. (2009). MISRaerosol product attributes and statistical comparisons with MODIS. IEEE Transactionson Geoscience and Remote Sensing, 47, 4095–4114.

Kang, E., Cressie, N., & Shi, T. (2010). Using temporal variability to improve spatialmapping with application to satellite data. The Canadian Journal of Statistics, 38,271–289.

Kitanidis, P. K. (1995). Quasi-linear geostatistical theory for inversing. Water ResourcesResearch, 31, 2411–2419.

Kyriakidis, P. C. (2004). A geostatistical framework for area-to-point spatial interpolation.Geographical Analysis, 36, 259–289.

Kyriakidis, P. C., & Yoo, E. (2005). Geostatistical prediction and simulation of point valuesfrom areal data. Geographical Analysis, 37, 124–151.

Michalak, A.M., Bruhwiler, L., & Tans, P. P. (2004). A geostatistical approach to surfaceflux estimation of atmospheric trace gases. Journal of Geophysical Research, 109,http://dx.doi.org/10.1029/2003JD004422 (D14109).

Nagle, N. N., Sweeney, S. H., & Kyriakidis, P. C. (2011). A geostatistical linear regressionmodel for small area data. Geographical Analysis, 43, 38–60.

Page 11: Geostatistical inverse modeling for super-resolution ...danbrown/papers/wang_RSE.pdf · Geostatistical inverse modeling for super-resolution mapping of continuous spatial processes

215J. Wang et al. / Remote Sensing of Environment 139 (2013) 205–215

Nguyen, H., Cressie, N., & Braverman, A. (2012). Spatial statistical data fusion for remotesensing applications. Journal of the American Statistical Association, 107(499),1004–1018.

Pohl, C., & Van Genderen, J. L. (1998). Multisensor image fusion in remote sensing:concepts, methods, and applications. International Journal of Remote Sensing, 19,823–854.

Sales, M. H. R., Souza, C. M., Jr., & Kyriakidis, P. C. (2013). Fusion of MODIS images usingKriging with external drift. IEEE Transactions on Geoscience and Remote Sensing,http://dx.doi.org/10.1109/TGRS.2012.2208467.

Shi, T., & Cressie, N. (2007). Global statistical analysis of MISR aerosol data: a massive dataproduct from NASA's Terra satellite. Environmetrics, 18, 665–680.

Snodgrass, M. F., & Kitanidis, P. K. (1997). A geostatistical approach to contaminant sourceidentification. Water Resources Research, 33, 537–546.

Wikle, C. K. (2010). Low-rank representation for spatial processes. In A. E. Gelfand, P.Diggle, P. Guttorp, & M. Fuentes (Eds.), Handbook of Spatial Statistics (pp. 107–118).New York: CRC Press.

Zhou, Y., &Michalak, A.M. (2009). Characterizing attribute distributions in water sedimentsby geostatistical downscaling. Environmental Science and Technology, 43, 9267–9273.