www.nr.no earthobs.nr.no
Retraining maximum likelihood classifiers using a low-rank model
Arnt-Børre Salberg
Norwegian Computing Center
Oslo, Norway
IGARSS July 25, 2011
Introduction
► Challenge: the dataset shift problem
▪ Training data match the test data poorly due to atmospheric, geographical, botanical and phenological variations in the image data → reduced classification performance
▪ Class-dependent data distribution varies
◦ between training images
◦ between test and training images
► Goal: Develop a method that re-estimates the parameters so that the classifier fits the test data well
Introduction
► Many surface reflectance algorithms require data from external sources
▪ LEDAPS (Landsat):
◦ ozone and water vapor measurements
► Phenological, botanical and geographical variation, in addition to atmospheric variation, makes the calibration problem even harder
An existing method…
► Models the test image as a mixture distribution and estimates all parameters using the EM-algorithm, with estimated parameters from training data as initial values
► Too many degrees of freedom: the statistical fit is excellent, but the class labels get mixed.
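$$(\hat{\mu}_1,\ldots,\hat{\mu}_C,\ \hat{\Sigma}_1,\ldots,\hat{\Sigma}_C,\ \hat{P}_1,\ldots,\hat{P}_C) = \arg\max_{\mu_1,\ldots,\mu_C,\ \Sigma_1,\ldots,\Sigma_C,\ P_1,\ldots,P_C} \ \sum_{n=1}^{N} \log \sum_{i=1}^{C} P_i\, f(x_n \mid \mu_i, \Sigma_i) + \lambda \left( 1 - \sum_{i=1}^{C} P_i \right)$$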
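The unconstrained approach above can be illustrated with a minimal NumPy sketch of one EM iteration in which every prior, mean and covariance is free. The function names (`weighted_densities`, `log_likelihood`, `em_step`) are illustrative and not taken from the slides; the starting values would be the parameters estimated from the training data.

```python
import numpy as np

def weighted_densities(x, priors, means, covs):
    """Return the N x C matrix with entries P_i * f(x_n | mu_i, Sigma_i)."""
    n, d = x.shape
    out = np.empty((n, len(priors)))
    for i, (p, mean, cov) in enumerate(zip(priors, means, covs)):
        diff = x - mean
        maha = np.einsum("nd,de,ne->n", diff, np.linalg.inv(cov), diff)
        norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
        out[:, i] = p * np.exp(-0.5 * maha) / norm
    return out

def log_likelihood(x, priors, means, covs):
    """Mixture log-likelihood: sum_n log sum_i P_i f(x_n | mu_i, Sigma_i)."""
    return float(np.sum(np.log(weighted_densities(x, priors, means, covs).sum(axis=1))))

def em_step(x, priors, means, covs):
    """One EM iteration with all C priors, means and covariances unconstrained."""
    dens = weighted_densities(x, priors, means, covs)
    resp = dens / dens.sum(axis=1, keepdims=True)   # E-step: class posteriors
    nk = resp.sum(axis=0)                           # effective samples per class
    new_priors = nk / x.shape[0]                    # M-step updates
    new_means = (resp.T @ x) / nk[:, None]
    new_covs = []
    for i in range(len(nk)):
        diff = x - new_means[i]
        new_covs.append((resp[:, i, None] * diff).T @ diff / nk[i])
    return new_priors, new_means, np.array(new_covs)
```

Each iteration cannot decrease the likelihood, which is why the statistical fit ends up excellent; nothing in the update ties a component to the class it started from, which is how the labels can get mixed.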
Low-rank parameter modeling
► Training image k:
▪ Class mean vector and covariance matrix (class i): $\mu_{i,k}$ ($D \times 1$) and $\Sigma_{i,k}$ ($D \times D$)
► Class mean vector and covariance matrix model for the test image
▪ α and β are unknown parameter vectors to be estimated from the data
Low-rank data modeling
► The proposed method for modeling the test data is a low-rank approach, since the number of parameters in α is L < D.
▪ This is far fewer than the C·D parameters needed to estimate all class mean vectors µ_i, i = 1,…,C
► By using a low-rank estimate of the class mean vectors of the test data, the spectral differences between the classes are maintained to a larger degree
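To make the parameter count concrete, here is a small sketch that assumes a linear low-rank form, in which each test-image class mean is a training reference mean plus a class-specific D x L basis times a shared vector α. The linear form, the `basis` array and the value L = 2 are assumptions made for illustration; C = 4 and D = 6 correspond to the four classes and six Landsat TM bands of Experiment 2.

```python
import numpy as np

def low_rank_means(train_means, basis, alpha):
    """Test-image class means under an assumed linear low-rank model.

    train_means: (C, D) reference class means from the training images
    basis:       (C, D, L) hypothetical per-class modeling functions
    alpha:       (L,) parameter vector shared by all classes
    """
    return train_means + basis @ alpha

C, D, L = 4, 6, 2             # L = 2 is a hypothetical choice with L < D
full_parameters = C * D       # free parameters when every class mean is re-estimated
low_rank_parameters = L       # free parameters in alpha
```

Because all classes move through the same α, the means cannot drift independently of one another, which is the intuition behind the spectral differences between classes being better preserved.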
Parameter estimation
► Procedure for estimating α and β:
▪ Select N random samples {y1, y2, …, yN} from the test image
▪ Model them using a Gaussian mixture distribution
► Estimate the parameters by maximizing the likelihood
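The sampling and likelihood evaluation steps can be sketched as follows. This is a simplified illustration, not the presented estimator: it assumes the image is an H x W x D array and uses diagonal covariances to keep the code short. The helper names are hypothetical.

```python
import numpy as np

def sample_pixels(image, n, rng):
    """Draw n pixels uniformly at random, without replacement, from an H x W x D image."""
    pixels = image.reshape(-1, image.shape[-1])
    idx = rng.choice(pixels.shape[0], size=n, replace=False)
    return pixels[idx]

def mixture_log_likelihood(y, priors, means, variances):
    """Log-likelihood of samples y (N x D) under a Gaussian mixture.

    means and variances are (C, D); covariances are assumed diagonal here.
    """
    diff = y[:, None, :] - means[None, :, :]                       # N x C x D
    log_pdf = -0.5 * np.sum(diff ** 2 / variances
                            + np.log(2.0 * np.pi * variances), axis=2)
    weighted = log_pdf + np.log(priors)                            # N x C
    top = weighted.max(axis=1, keepdims=True)                      # log-sum-exp for stability
    return float(np.sum(top[:, 0] + np.log(np.exp(weighted - top).sum(axis=1))))
```

Candidate values of α and β would each imply a set of class means and covariances; the candidate giving the highest value of `mixture_log_likelihood` on the sampled pixels is the maximum-likelihood choice.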
Experiment 1: Cloud detection in optical images
► 15 different QuickBird and WorldView-2 images covering 7 different scenes in Norway
► Features
▪ Band 2 (green)
▪ Band 3 (red)
► Classes
▪ clouds, cloud shadows, vegetation, concrete/asphalt/etc., haze and water
► Resolution down-sampled to 19.2 m (16.0 m)
► 4 different training (sub)images
Experiment 1: Cloud detection in optical images
► Model
δ_i is the eigenvector corresponding to the largest eigenvalue ν_i of the matrix
[Figure labels: average, test, eigenvector]
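Extracting the eigenvector for the largest eigenvalue of a symmetric matrix can be sketched with `numpy.linalg.eigh`. The matrix itself is not reproduced in this text, so the `mean_scatter` helper below is only an assumption: it builds the scatter of one class's mean vectors across the K training images around their average.

```python
import numpy as np

def leading_eigenpair(matrix):
    """Largest eigenvalue and its unit eigenvector of a symmetric matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)   # eigenvalues in ascending order
    return eigenvalues[-1], eigenvectors[:, -1]

def mean_scatter(class_means):
    """Assumed matrix: scatter of one class's mean vectors (K x D) around their average."""
    centered = class_means - class_means.mean(axis=0)
    return centered.T @ centered / class_means.shape[0]
```

With K = 4 training images and D = 2 features, as in this experiment, `mean_scatter` would return a 2 x 2 matrix. The sign of the returned eigenvector is arbitrary, so any scalar parameter multiplying it may be positive or negative.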
Experiment 1: Cloud detection in optical images
► Parameter estimation. At iteration l+1:
where
Results: Cloud detection in optical images
[Figure panels: without retraining | with retraining]
Experiment 2: Tree cover mapping of tropical forest
► 13 different Landsat TM images covering an area near Amani, Tanzania (path/row 166/063)
► Features
▪ Bands 1-5 and 7
► Classes
▪ Forest, sparse forest, grass and soil
► Two training images (1986-10-06 and 2010-02-10)
Experiment 2: Tree cover mapping of tropical forest
► Model
α constrained to contain only positive elements
► Solution found using non-negative least-squares in combination with iterative maximum-likelihood estimation
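The non-negativity constraint can be illustrated with a small least-squares solver. The sketch below uses plain projected gradient descent as a stand-in for a dedicated non-negative least-squares routine (a library solver such as `scipy.optimize.nnls` would normally be preferable). How the design matrix `a` and target `b` are built from the class statistics inside each maximum-likelihood iteration is not specified here and is left as an assumption.

```python
import numpy as np

def nnls_projected_gradient(a, b, iterations=500):
    """Minimise ||a x - b||^2 subject to x >= 0 by projected gradient descent."""
    step = 1.0 / np.linalg.norm(a, 2) ** 2     # 1 / largest eigenvalue of a^T a
    x = np.zeros(a.shape[1])
    for _ in range(iterations):
        gradient = a.T @ (a @ x - b)
        x = np.maximum(0.0, x - step * gradient)   # project onto the non-negative orthant
    return x
```

When the unconstrained least-squares solution is already non-negative the two solutions coincide; otherwise the offending elements are held at zero.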
Experiment 2: Tree cover mapping of tropical forest
► Parameter estimation: At iteration l+1
where
Results: Tree cover mapping of tropical forest
[Figure panels: without retraining and with retraining; February 2010 and July 2009]
Summary and conclusion
► Proposed a simple method for handling the dataset shift between training and test data
► Cloud detection: evaluated successfully on many different QuickBird and WorldView-2 images
▪ Haze versus clouds
▪ Confuses snow and clouds
► Guidelines on how to select the low-rank modeling functions are needed
► EM algorithm and the local minima problem
► More testing and validation of the method is necessary