Towards a unified signal representation via empirical mode decomposition
by
Jiexin Gao
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
Copyright © 2012 by Jiexin Gao
Abstract
Towards a unified signal representation via empirical mode decomposition
Jiexin Gao
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2012
Empirical mode decomposition was proposed recently as a time-frequency analysis
tool for nonlinear and nonstationary signals. Despite its many advantages, problems
such as the “uniqueness” problem have been discovered which limit its application.
Although this problem has been addressed to some extent by various extensions of
the original algorithm, the solution is far from satisfactory in some scenarios. In this
work we propose two variants of the original algorithm, with emphasis on providing
unified representations. R-EMD makes use of a set of reference signals to guide the
decomposition, thereby guaranteeing a unified representation for multiple 1D signals.
2D-BEMD takes advantage of a projection procedure and is capable of providing a unified
representation between a pair of 2D signals. Application of the proposed algorithms to
different problems in biometrics and image processing demonstrates promising results and
indicates the effectiveness of the proposed framework.
Acknowledgements
First and foremost, I would like to sincerely thank my advisor Prof. Dimitrios Hatzinakos.
He gave me guidance and direction that was needed to produce this work. Without his
support, this research would not have been possible. I truly appreciate all the help from
him during research and thesis writing.
I also gratefully thank the members of my thesis committee, Prof. Anastasios
Venetsanopoulos, Prof. Ashish Khisti and Prof. Gregory Steffan, for taking the time
to provide insightful comments.
Many thanks to all my colleagues in the Biometric Security Laboratory and friends in
the Communications group for their inspiring interaction and encouragement. You have
made the last two years of my life a wonderful experience.
Last but not least, I would like to thank my parents and my family for their
support, encouragement and love at all times. To them I dedicate this thesis.
Contents
List of Tables viii
List of Figures ix
List of Algorithms xiii
Symbols and Abbreviations xv
1 Introduction 1
1.1 Problems of Empirical Mode Decomposition . . . . . . . . . . . . . . . . 2
1.2 Research Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Related Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Empirical Mode Decomposition and Its Representational Problem 7
2.1 Empirical Mode Decomposition . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Local Extrema . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.2 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.3 Terminate Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.4 Stop Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Two Dimensional Empirical Mode Decomposition . . . . . . . . . . . . . 15
2.3 Bivariate Empirical Mode Decomposition . . . . . . . . . . . . . . . . . . 21
2.4 Other Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5 Problem with EMD - No unified representation . . . . . . . . . . . . . . 26
2.5.1 One Dimensional Case . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5.2 Two Dimensional Case . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3 EMD with Unified Representation 32
3.1 One Dimensional Reference EMD . . . . . . . . . . . . . . . . . . . . . . 32
3.1.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1.2 Relationship between extracted frequency and reference signal . . 34
3.1.3 Reference frequency selection for R-EMD . . . . . . . . . . . . . . 44
3.1.4 Experimental Result . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Two Dimensional Bivariate EMD . . . . . . . . . . . . . . . . . . . . . . 45
3.2.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2.2 Experimental Result . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3 Two Dimensional Reference EMD . . . . . . . . . . . . . . . . . . . . . . 50
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4 Otoacoustic Emissions for Biometric Recognition via R-EMD 53
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Otoacoustic Emissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.4 Signal Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.5 Biometric System Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.6 OAE signal decomposition using R-EMD . . . . . . . . . . . . . . . . . . 58
4.6.1 Reference frequency selection . . . . . . . . . . . . . . . . . . . . 59
4.6.2 Masking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.7 Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.7.1 Single ear . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.7.2 Fusion of left and right ear . . . . . . . . . . . . . . . . . . . . . . 62
4.8 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5 Image Fusion via 2D-BEMD 65
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2 Multiscale Fusion with 2D-BEMD . . . . . . . . . . . . . . . . . . . . . . 66
5.3 Experiment Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.3.1 Results on synthesized blurred images . . . . . . . . . . . . . . . 72
5.3.2 Results on partially focused photos . . . . . . . . . . . . . . . . . 72
5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6 Expression Invariant Face Recognition via 2D-BEMD 75
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3 Expression Transformation with 2D-BEMD . . . . . . . . . . . . . . . . . 77
6.3.1 Expression mask . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3.2 Expression transformation . . . . . . . . . . . . . . . . . . . . . . 78
6.4 Experimental setup and results . . . . . . . . . . . . . . . . . . . . . . . 80
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7 Conclusion 85
7.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
A Demonstration of the sifting process 88
B TEOAE dataset 99
B.1 Data collection setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
B.2 Intra-subject similarity and inter-subject difference . . . . . . . . . . . . 103
B.3 Outlier removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Bibliography 107
List of Tables
4.1 TEOAE recording protocol . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2 TEOAE biometric recognition performance . . . . . . . . . . . . . . . . . 64
List of Figures
2.1 A synthesized signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Intrinsic mode functions from decomposing the signal in Figure 2.1. From
top to bottom: IMF1, IMF2, IMF3, IMF4, IMF5, IMF6 and residue. . . 11
2.3 Amplitude versus time-frequency spectrum. . . . . . . . . . . . . . . . . 12
2.4 Local maximum and minimum detection. . . . . . . . . . . . . . . . . . . 19
2.5 Surface envelopes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6 An image. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.7 Intrinsic mode functions from decomposing the image in Figure 2.6. . . . 22
2.8 3D tube built to surround a complex signal. . . . . . . . . . . . . . . . . 23
2.9 Estimation of the center of 3D tube. . . . . . . . . . . . . . . . . . . . . 23
2.10 Two synthesized signals. Top: signal A. Bottom: signal B. . . . . . . . . 26
2.11 Complex intrinsic mode functions from decomposing the pair of signals in
Figure 2.10. Solid lines represent real parts of the IMFs and dashed lines
represent imaginary parts of the IMFs. . . . . . . . . . . . . . . . . . . . 27
2.12 Example of three signals. . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.13 Decomposition obtained by applying EMD separately. . . . . . . . . . . . 29
2.14 Two faces with different expressions. . . . . . . . . . . . . . . . . . . . . 30
2.15 Decomposition obtained by applying 2D-EMD separately. . . . . . . . . . 31
3.1 Contour plot for demonstration of the solution to Equation (3.16). Shaded
area corresponds to the region on t, defined by the second inequality. . . 38
3.2 Examples of different values of q and their intersection with y(t) = 0. The
vertical dotted line represents the current value q = qo. . . . . . . . . . . 39
3.3 Condition for q such that no solution exists for Equation (3.13). Two
arrows: q = 0.5 and q = 2. . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4 Instantaneous frequency of the chirp (solid line) and the reference (dashed
line). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.5 First level IMF: real part on the top plot and imaginary part on the bottom
plot. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.6 Approximate of the filter structure of one level R-EMD. . . . . . . . . . . 43
3.7 Smoothed one level frequency response. . . . . . . . . . . . . . . . . . . . 44
3.8 Filter bank structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.9 Filter bank in log scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.10 Demonstration of the proposed R-EMD. . . . . . . . . . . . . . . . . . . 47
3.11 Demonstration of the proposed 2D-BEMD algorithm. . . . . . . . . . . . 49
4.1 Click sound stimulus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2 TEOAE response after pre-processing. . . . . . . . . . . . . . . . . . . . 55
4.3 Proposed biometric recognition system. . . . . . . . . . . . . . . . . . . . 58
4.4 IMFs 1-4 from decomposing a raw TEOAE recording. . . . . . . . . . . . 60
5.1 Top row: partially defocused images A (left) and B (right). Bottom row:
zoomed in details showing the differences between the two images. . . . . 67
5.2 IMFs 1-5 obtained with the proposed 2D-BEMD algorithm for partially
defocused images. Left column: IMFs corresponding to image A. Right
column: IMFs corresponding to image B. . . . . . . . . . . . . . . . . . . 68
5.3 IMFs 6-10 obtained with the proposed 2D-BEMD algorithm for partially
defocused images. Left column: IMFs corresponding to image A. Right
column: IMFs corresponding to image B. . . . . . . . . . . . . . . . . . . 69
5.4 BEMD and 2D-BEMD fusion results on an artificially generated partially
blurred image. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.5 2D-BEMD fusion results on two partially defocused sets of images. . . . . 73
5.6 Examples of BEMD and 2D-BEMD based fusion results. The input images
are partially defocused (background versus foreground) while the recon-
structed (all in focus) image of the BEMD case exhibits significant artifacts. 74
6.1 Expression masks used for decomposition with an input image of arbitrary
expression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 2D-BEMD analysis for an input image and a surprised mask image. The
edges of the input are among the first few IMFs, while most of the infor-
mation of the mask is found in IMFs 5 and 6. . . . . . . . . . . . . . . . 79
6.3 Weights used in fusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.4 Examples of expression transformation with 2D-BEMD. From top to bot-
tom: input images, expression masks, transformed faces and ground truth
images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.5 The enrollment pipeline. Every gallery image to be enrolled is first used
to synthesize 6 expression variants. . . . . . . . . . . . . . . . . . . . . . 81
6.6 Verification rate versus false acceptance rate, for gallery of size 10. . . . . 82
6.7 Verification rate versus false acceptance rate, for gallery of size 20. . . . . 82
6.8 Verification rate versus false acceptance rate, for gallery of size 40. . . . . 83
A.1 Signal to be decomposed. . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
A.2 IMFs and the residue after decomposition. . . . . . . . . . . . . . . . . . 89
A.3 Sifting iteration 0 for Candidate IMF 1. . . . . . . . . . . . . . . . . . . . 90
A.4 Sifting iteration 1 for Candidate IMF 1. . . . . . . . . . . . . . . . . . . . 91
A.5 Sifting iteration 10 for Candidate IMF 1. . . . . . . . . . . . . . . . . . . 92
A.6 Sifting iteration 0 for Candidate IMF 2. . . . . . . . . . . . . . . . . . . . 93
A.7 Sifting iteration 10 for Candidate IMF 2. . . . . . . . . . . . . . . . . . . 94
A.8 Sifting iteration 0 for Candidate IMF 3. . . . . . . . . . . . . . . . . . . . 95
A.9 Sifting iteration 7 for Candidate IMF 3. . . . . . . . . . . . . . . . . . . . 96
A.10 Sifting iteration 0 for Candidate IMF 6. . . . . . . . . . . . . . . . . . . . 97
A.11 Sifting iteration 3 for Candidate IMF 6. . . . . . . . . . . . . . . . . . . . 98
B.1 Example of a stimulus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
B.2 Example of 2 per-buffer responses from the same group. . . . . . . . . . 101
B.3 Example of buffer averaged response. . . . . . . . . . . . . . . . . . . . . 102
B.4 Example of buffer averaged response with low frequency trend removed. . 102
B.5 Histogram of stabilization time for recordings from all subjects, including
outliers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
B.6 Similarity of TEOAE recordings from the same subject after applying R-
EMD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
B.7 Difference of TEOAE recordings from different subjects after applying R-
EMD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
B.8 Histogram of WWR for recordings from all subjects, including outliers. . 105
List of Algorithms
1 EMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 2D EMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 FA-2D-EMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4 BEMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5 R-EMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6 2D-BEMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7 FA-2D-BEMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Symbols and Abbreviations
t continuous variable
m,n discrete variables
x(t) one dimensional continuous time signal
x(n) one dimensional discrete time signal
x(m,n) two dimensional signal (or image, surface) with m and n representing
spatial indices
z(n) complex-valued one dimensional discrete time signal
z(m,n) complex-valued two dimensional signal (or image, surface)
x^i(n) the ith IMF associated with signal x(n), with the superscript denoting the
IMF index
x^i(m,n) the ith IMF associated with signal x(m,n), with the superscript denoting the
IMF index
sgn(x) the signum function
EMD Empirical Mode Decomposition
IMF Intrinsic Mode Function
BEMD Bivariate Empirical Mode Decomposition
2D-EMD Two Dimensional Empirical Mode Decomposition
R-EMD Reference Empirical Mode Decomposition
2D-BEMD Two Dimensional Bivariate Empirical Mode Decomposition
LDA Linear Discriminant Analysis
PCA Principal Component Analysis
OAE Otoacoustic Emission
TEOAE Transient Evoked Otoacoustic Emission
Chapter 1
Introduction
Time-frequency analysis has been widely used in engineering and biomedical applications
since it provides insight into the complex structure of a signal and can potentially
reveal the underlying processes. Unlike the traditional Fourier transform, time-frequency
transforms are capable of capturing the time-varying characteristics of a time series. The
wavelet transform, the most widely applied time-frequency analysis, is carried out by
calculating the inner product of the signal with a family of wavelets, which are dilated and
translated versions of a fixed mother wavelet. Although it is effective in many applications,
because it is essentially linear with a fixed basis it remains suboptimal for real-world
signals.
Recently, Empirical Mode Decomposition (EMD) together with the Hilbert Transform
was proposed by Huang et al. [1] as a new time-frequency analysis tool for nonlinear and
nonstationary signals. It has been applied to different areas and has been demonstrated
to be effective in mechanical engineering, biomedicine, geology and financial analysis [2-7].
It is data driven and adapts to the embedded signal properties automatically, which makes
it powerful as a data analysis tool. For a discrete 1D signal x(n), EMD decomposes it into
a set of oscillation components (Intrinsic Mode Functions, or IMFs) plus a signal trend:
x(n) = ∑_i x^i(n) + x_r(n)    (1.1)
where x^i(n) denotes the ith level IMF and x_r(n) denotes the signal trend. An IMF must
satisfy two criteria in order to be considered a characteristic oscillatory component. It is a
signal such that:

• The number of extrema and the number of zero crossings are equal or differ by at most one.

• The upper envelope and the lower envelope are symmetric to each other.
After the IMFs are obtained, one can apply the Hilbert Transform to each IMF to get its
instantaneous amplitude and instantaneous frequency, which can be combined together
to yield an amplitude-time-frequency spectrum.
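This amplitude and frequency extraction step can be sketched as follows. It is a minimal illustration using SciPy's `hilbert`; the sampling rate `fs` and the 50 Hz test tone are arbitrary choices for the demonstration, not values from the thesis.

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_amp_freq(imf, fs=1.0):
    """Instantaneous amplitude and frequency of one IMF via the Hilbert Transform."""
    analytic = hilbert(imf)                # analytic signal: imf + j*H{imf}
    amplitude = np.abs(analytic)           # instantaneous amplitude
    phase = np.unwrap(np.angle(analytic))  # unwrapped instantaneous phase
    # frequency is the derivative of the phase (Hz when scaled by fs)
    frequency = np.diff(phase) / (2.0 * np.pi) * fs
    return amplitude, frequency

# a pure tone should recover its own frequency almost everywhere
fs = 1000.0
t = np.arange(0.0, 1.0, 1.0 / fs)
tone = np.cos(2 * np.pi * 50 * t)
amp, freq = instantaneous_amp_freq(tone, fs)
```

Stacking the per-IMF amplitude and frequency traces over time is what produces the amplitude versus time-frequency spectrum described above.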
The decomposition is performed by iteratively separating faster oscillations from the slower
ones on local scales, thereby creating a decomposition which ranges from the fastest
oscillatory component x^1(n) to the slowest signal trend x_r(n). Moreover, the above
procedure is adaptive and data-driven, meaning that no pre-defined basis is required.
1.1 Problems of Empirical Mode Decomposition
The EMD algorithm is simple and widely applied, but its theoretical background and
limitations remain uncertain due to its empirical and algorithmic nature. An overview
was presented in [11] of some aspects of the existing problems with EMD, including the
confidence limit, the boundary effect, the interpolation method, mode mixing and the
“uniqueness” problem. The “uniqueness” problem is one of the major problems of EMD,
and it originates from the fact that there is no analytical definition for the decomposition.
Given a signal and the IMF requirements, there can be many valid sets of IMFs that
reconstruct the signal. Moreover, since the decomposition procedure is iterative and signal
dependent, information such as the number of levels and the oscillatory modes in each
level cannot be determined a priori.
Researchers have been investigating the possibility of detecting an optimal or unique
set of IMFs, but it is difficult to justify what is optimal and, in addition, how it can be
determined is unclear. It has been shown [11] that different IMF sets can be generated
from the same signal by varying the parameters related to the sifting process1, but how they
are inter-related is still unclear. To answer this, more fundamental problems need to be
solved first, such as how to define the decomposition process rigorously. Although the
problem has drawn significant attention from the mathematics community, until such a
definition can be made, addressing the “uniqueness” problem from an engineering point
of view requires refining the algorithm such that a unified representation can be
obtained.
1.2 Research Goals
Among the recent developments of the EMD algorithm, to the best of the author's knowledge,
there is no framework that addresses the problem of obtaining a unified representation
after decomposition, that is, one that develops a tractable algorithm while retaining the desired
features of the original EMD.

In this thesis, we will investigate why such a representation is necessary, and develop
two variants of the original algorithm: One Dimensional Reference EMD (R-EMD) to
address the problem in the one dimensional case, and Two Dimensional Bivariate EMD (2D-
BEMD) to address the problem in the two dimensional case. We will demonstrate that the
proposed algorithms are capable of obtaining a unified representation while providing
a meaningful decomposition, by comparing them with state-of-the-art EMD
algorithms.
Successful applications of the two proposed algorithms will be demonstrated on three
real-world problems where the state-of-the-art EMD algorithms either cannot be applied or
prove to be suboptimal. In cases where an existing EMD algorithm is applicable but
suboptimal, performance will be compared against the state-of-the-art EMD algorithms.
In cases where no existing EMD algorithm is applicable, performance will be
compared with available baseline methods. Performance will also be compared to that
of the wavelet transform where appropriate.

1A core process in EMD, which will be presented in detail in Chapter 2.
1.3 Contributions
Major contributions from this thesis are summarized as follows:
1. Proposed and developed R-EMD for decomposing multiple 1D signals under unified
representation. Upper and lower bound of frequency extraction region were derived
and validated. Rules for reference frequency selection were given.
2. Collected the first dataset of transient evoked otoacoustic emission (TEOAE) sig-
nals with moderate number of subjects, under a specific biometric setup.
3. Framework for using TEOAE as biometric modality has been developed by em-
ploying the proposed R-EMD. The method was validated on the collected dataset.
4. Proposed and developed 2D-BEMD for decomposing 2D signals under unified rep-
resentation.
5. Applied 2D-BEMD to image fusion and the results outperform BEMD, pixel aver-
aging and wavelet fusion.
6. Applied 2D-BEMD for transforming facial expressions in an expression invariant
face recognition framework. The results outperform the baseline PCA approach.
1.4 Related Publications
[12] Jiexin Gao, F. Agrafioti, S. Wang, and D. Hatzinakos. Transient otoacoustic
emissions for biometric recognition. In Acoustics, Speech and Signal Processing
(ICASSP), 2012 IEEE International Conference on, March 2012.
This paper was the winner of the Fujitsu Student Paper Award at ICASSP 2012.
[13] Jiexin Gao and D. Hatzinakos. Effect of initial phase in two tone separation
using empirical mode decomposition. In Acoustics, Speech and Signal Processing
(ICASSP), 2012 IEEE International Conference on, March 2012.
[14] F. Agrafioti, Jiexin Gao, H. Mohammadzade, and D. Hatzinakos. A 2d bivariate
emd algorithm for image fusion. In Digital Signal Processing (DSP), 2011 17th
International Conference on, pages 1-6, July 2011.
[15] H. Mohammadzade, F. Agrafioti, Jiexin Gao, and D. Hatzinakos. Bemd for expres-
sion transformation in face recognition. In Acoustics, Speech and Signal Processing
(ICASSP), 2011 IEEE International Conference on, pages 1501-1504, May 2011.
1.5 Thesis Organization
This thesis is organized as follows:
In Chapter 2, a brief review is provided of the original one dimensional EMD (EMD)
and two dimensional EMD (2D-EMD) algorithms, as well as the one dimensional bivariate
EMD (BEMD). We also discuss the importance of unified signal representation and
demonstrate different scenarios under which the existing algorithms fail to provide such
a representation.
In Chapter 3, two variants of the EMD algorithm are developed: one dimensional reference
EMD (R-EMD) for the one dimensional case and two dimensional bivariate EMD (2D-
BEMD) for the two dimensional case.
From Chapter 4 to Chapter 6 the focus is on different applications of the two proposed
algorithms.
In Chapter 4, we apply R-EMD to decompose transient evoked otoacoustic emission
(TEOAE) signals at multiple scales for biometric recognition. Under the unified
representation, a TEOAE recording can be matched against all enrolled recordings for
identity establishment purposes. The performance of the proposed system is evaluated on
our collected data.
In Chapter 5, we apply 2D-BEMD to simultaneously decompose two multi-focused
images, in order to fuse information at different scales into a better all-in-focus image.
Performance will be compared against the state-of-the-art EMD algorithm, and traditional
methods such as wavelet fusion and pixel averaging.
In Chapter 6, we apply 2D-BEMD to simultaneously decompose a face image with an
expression mask, in order to transform the expression on the input face into a desired one.
Using this as a pre-processing step in a face recognition system, significant improvement
can be achieved over the baseline method. As a preliminary result, we compare the
performance of the proposed method with a baseline PCA-based face recognition system.

In Chapter 7, conclusions and future directions are presented.
Chapter 2
Empirical Mode Decomposition and
Its Representational Problem
In this chapter a brief review is provided of the original EMD algorithm1 and some of
its related variants for providing a unified representation. Details of the parameters and
procedures involved in the original EMD algorithm are also given for a better understanding
of the method. A summary of each algorithm and the respective decomposition results
on sample signals are presented. Then a demonstration is provided to justify that under
certain scenarios a unified representation is necessary, which motivates the development
of the algorithms in the next chapter.
2.1 Empirical Mode Decomposition
Empirical Mode Decomposition (EMD) was developed by Huang et al. [1] for processing
nonlinear and nonstationary data. EMD performs an adaptive analysis in the time
domain, in order to isolate the inherent oscillations, referred to as intrinsic mode
functions (IMF). For a one dimensional signal x(n), the EMD algorithm operates as follows:
1For simplicity, this algorithm will be referred to as EMD directly in later discussions.
1. Initialize the residue as the original signal, x_r(n) = x(n), and the working signal as
the residue, x̃(n) = x_r(n). Initialize the IMF index i = 1.

2. Detect the local maxima and minima of x̃(n).

3. Interpolate among the local maxima to get an upper envelope env_u(n), and among the
local minima to get a lower envelope env_l(n).

4. Compute the mean envelope as the average of the upper and lower envelopes:
(1/2)[env_u(n) + env_l(n)].

5. Subtract the mean envelope from the working signal to get a candidate IMF:
x^i(n) = x̃(n) − (1/2)[env_u(n) + env_l(n)].

6. If the candidate IMF satisfies the stop criterion, update the residue for the next IMF
extraction, x_r(n) = x_r(n) − x^i(n), increase the IMF index i, reset the working signal
x̃(n) = x_r(n), and go to step 7. Otherwise, continue sifting on the candidate by setting
x̃(n) = x^i(n) and iterating from step 2.

7. Check if the terminate criterion for the residue is satisfied; if so, terminate the algorithm.
Otherwise, go to step 2.
Steps 2 to 6 describe a sifting process, which is stopped when x^i(n) meets the stop
criterion for an IMF:

• The total number of local extremum points and the total number of zero crossings are equal,
or differ by at most 1.

• The average of its upper envelope and lower envelope is close to zero everywhere.

If x^i(n) meets the stop criterion, it describes an underlying oscillation of x(n).
The next step is to remove this oscillation from x_r(n), increase the IMF index
and iterate on the residue. A detailed demonstration of the sifting process is presented
in Appendix A. The procedure is terminated when x_r(n) describes a true residue (signal
trend), i.e., a monotonic function from which no more IMFs can be extracted.
The algorithm is summarized in Algorithm 1, where CheckStopCriterion checks whether
the target signal is an IMF in order to stop the sifting process, and CheckTerminate-
Criterion checks whether the entire algorithm needs to be terminated, which happens when
the current residue cannot be decomposed further.
Algorithm 1: EMD

1  i = 1;
2  x_r(n) = x(n);
3  while NOT(CheckTerminateCriterion(x_r(n))) do
4      x̃(n) = x_r(n);
5      while NOT(CheckStopCriterion(x^i(n))) do
6          env_u(n) = Interpolate(FindLocalMax(x̃(n)));
7          env_l(n) = Interpolate(FindLocalMin(x̃(n)));
8          x^i(n) = x̃(n) − (1/2)[env_u(n) + env_l(n)];
9          x̃(n) = x^i(n);
10     end
11     x_r(n) = x_r(n) − x^i(n);
12     i = i + 1;
13 end
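The algorithm above can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: the endpoint handling in the envelopes, the energy-based stop test with threshold `tol`, and the `max_imfs`/`max_siftings` caps are all assumptions added so that the sketch terminates robustly.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _envelope(x, upper=True):
    """Cubic-spline envelope through the local maxima (upper) or minima (lower) of x."""
    s = x if upper else -x
    idx = np.where((s[1:-1] > s[:-2]) & (s[1:-1] > s[2:]))[0] + 1
    if len(idx) < 2:
        return None  # too few extrema: the signal is essentially a trend
    # including the endpoints is an assumed boundary fix, not part of Algorithm 1
    idx = np.concatenate(([0], idx, [len(x) - 1]))
    return CubicSpline(idx, x[idx])(np.arange(len(x)))

def emd(x, max_imfs=10, max_siftings=50, tol=0.05):
    """Decompose x into IMFs; the residue is appended as the last element."""
    x = np.asarray(x, dtype=float)
    imfs, residue = [], x.copy()
    while len(imfs) < max_imfs:
        work = residue.copy()                    # working signal for sifting
        decomposable = True
        for _ in range(max_siftings):
            env_u = _envelope(work, upper=True)
            env_l = _envelope(work, upper=False)
            if env_u is None or env_l is None:
                decomposable = False             # terminate: (near-)monotonic residue
                break
            mean = 0.5 * (env_u + env_l)
            work = work - mean                   # candidate IMF; sift it again if needed
            if np.mean(mean ** 2) < tol * np.mean(x ** 2):
                break                            # crude stop test: mean envelope near zero
        if not decomposable:
            break
        imfs.append(work)                        # accept the IMF and update the residue
        residue = residue - work
    imfs.append(residue)                         # simplified representation: residue last
    return imfs
```

By construction the returned components sum back to the input exactly, which is the reconstruction property the decomposition is meant to preserve.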
After execution of the algorithm, suppose there are a total of L − 1 IMFs plus the
residue at the end; then the original signal x(n) can be represented by:
x(n) = ∑_{i=1}^{L−1} x^i(n) + x_r(n)    (2.1)
where x^1(n), x^2(n), x^3(n), . . . , x^{L−1}(n) denote the Intrinsic Mode Functions (IMFs) and
x_r(n) denotes the residue. Alternatively we can include the residue in the summation to
obtain a simpler representation:
x(n) = ∑_{i=1}^{L} x^i(n)    (2.2)
where the residue is represented as the Lth level IMF. Note that this is just an alternative
representation for simplicity, since the residue does not satisfy the IMF criteria. In most
of the subsequent discussions, we use this simplified representation.
Figure 2.1: A synthesized signal (amplitude versus sample index).
The IMFs after decomposition are considered a set of intrinsic oscillatory components
of the original signal. If a time-frequency representation is needed, the set of
IMFs can further go through the Hilbert Transform, which gives the instantaneous
amplitude and frequency for each IMF; these can be combined into an amplitude versus
time-frequency spectrum.

Figure 2.1 shows a synthesized signal and a demonstration of the decomposition result
is shown in Figure 2.2. In Figure 2.2 there are 6 IMFs plus the residue, where the first row
shows IMF1 and the last row shows the residue. It can be observed that the decomposition
adapts to the intrinsic oscillatory modes without any prior assumption or information
about the signal.
The corresponding amplitude-time-frequency2 plot is shown in Figure 2.3. Note that
in the spectrum darker colors represent higher amplitude and lighter colors represent
lower amplitude.

2The combination of EMD and the Hilbert Transform is also known as the Hilbert-Huang Transform, or Hilbert-Huang spectrum.
Figure 2.2: Intrinsic mode functions from decomposing the signal in Figure 2.1. From
top to bottom: IMF1, IMF2, IMF3, IMF4, IMF5, IMF6 and residue.
Details of the algorithm, including local extrema detection, the interpolation method,
the terminate criterion and the stop criterion, are presented in the following sections.
2.1.1 Local Extrema
Since the algorithm starts decomposing the signal from the finest scale, FindLocalMax searches for samples that are larger than both their left and right neighbours. For a signal u(n), it finds the collection of points {ni} such that:
{ni} = {ni : u(ni) > u(ni−1) & u(ni) > u(ni+1)} (2.3)
FindLocalMin operates in a similar manner.
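As a concrete sketch, the strict-inequality search of Equation (2.3) can be written in a few lines of NumPy; the helper names below are illustrative, not code from the thesis:

```python
import numpy as np

def find_local_max(u):
    """Indices n_i with u(n_i) > u(n_i - 1) and u(n_i) > u(n_i + 1), Eq. (2.3)."""
    u = np.asarray(u, dtype=float)
    n = np.arange(1, len(u) - 1)                      # interior samples only
    return n[(u[n] > u[n - 1]) & (u[n] > u[n + 1])]

def find_local_min(u):
    """Strict local minima, obtained by negating the signal."""
    return find_local_max(-np.asarray(u, dtype=float))
```

Note that the strict inequalities skip plateaus of equal neighbouring samples, which practical EMD implementations usually handle with extra logic.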
Figure 2.3: Amplitude versus time-frequency spectrum.
2.1.2 Interpolation
Interpolation is of particular importance for the EMD algorithm. Indeed, the sum of IMF2 through the residue equals the sum of all the mean envelopes from the sifting process for IMF1. To see this, we can write the original signal as IMF1 plus the collection of mean envelopes accumulated while extracting the first IMF:
x(n) = x1(n) + Σ_s env_m^s     (2.4)
where env_m^s denotes the mean envelope in the sth sifting iteration, and the sum is taken over all sifting iterations during the process of extracting IMF1. In fact, this collection of mean envelopes constitutes the residue to be further processed for higher level IMFs and the true residue. In other words, the interpolation method defines how oscillatory modes are constructed in the EMD algorithm.
The most commonly used interpolation method is the cubic spline, which operates as follows. Given a set of K points from the original signal, consisting of the index set {t1, t2, · · · , tK} and the corresponding amplitude set {x(t1), x(t2), · · · , x(tK)}, cubic spline interpolation finds a piecewise function:
S(t) =
    S1(t)      for t1 ≤ t ≤ t2
    S2(t)      for t2 ≤ t ≤ t3
    ...
    SK−1(t)    for tK−1 ≤ t ≤ tK     (2.5)
where each Sk(t) is a 3rd order polynomial:
Sk(t) = ak(t− tk)3 + bk(t− tk)2 + ck(t− tk) + dk (2.6)
To solve for the parameters, the following conditions must be satisfied:
1. S(t) passes through the provided K points
2. S ′(t) is continuous
3. S ′′(t) is continuous
that is, for all k = 1, 2, · · · , K − 1 we need:

Sk(tk) = x(tk)
Sk(tk+1) = x(tk+1)
S′k(tk+1) = S′k+1(tk+1)
S′′k(tk+1) = S′′k+1(tk+1)
In addition, two more boundary conditions need to be specified in order for the parameters to be solved; either of two standard choices can be used. The natural cubic spline requires:

S′′1(t1) = 0,  S′′K−1(tK) = 0     (2.7)

The clamped cubic spline requires:

S′1(t1) = x′(t1),  S′K−1(tK) = x′(tK)     (2.8)
2.1.3 Terminate Criterion
The CheckTerminateCriterion in the outer loop checks whether the current residue xr(n) can be further decomposed or should be considered the true residue of the original signal. It is satisfied when the total number of extrema is less than three:
|{nmax}|+ |{nmin}| < 3
where |{nmax}| and |{nmin}| represent the number of maxima and minima, respectively.
2.1.4 Stop Criterion
The CheckStopCriterion in the inner loop checks whether the candidate IMF xi(n) is actually an IMF. In practice, this is determined either by checking the IMF conditions directly or by a Cauchy convergence test. Given the candidate IMF xi,s(n) at level i and sifting iteration s, the two types of test are as follows:
• IMF condition check
Let the upper envelope of xi,s(n) be envu(n) and the lower envelope be envl(n). By checking the IMF conditions directly, the stop criterion is satisfied when the candidate IMF is oscillatory and its mean envelope is close to zero:

|envu(n) + envl(n)| < δ2 · |envu(n) − envl(n)|, ∀n
|{no}| / N < δt
−1 ≤ |{nzero}| − (|{nmax}| + |{nmin}|) ≤ 1     (2.9)

with

{no} = {no : |envu(no) + envl(no)| > δ1 · |envu(no) − envl(no)|}

where N is the length of the candidate IMF vector and |{nzero}| is the number of zero crossings of the candidate IMF. δ1, δ2 and δt are adjustable parameters.
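A direct implementation of this check could look as follows; the function name and the default thresholds are illustrative assumptions rather than values prescribed by the thesis:

```python
import numpy as np

def stop_criterion_met(env_u, env_l, n_zero, n_max, n_min,
                       delta1=0.05, delta2=0.5, delta_t=0.05):
    """Direct IMF-condition check of Eq. (2.9); thresholds are assumed defaults."""
    mean_env = np.abs(env_u + env_l)          # |env_u(n) + env_l(n)|
    amp = np.abs(env_u - env_l)               # |env_u(n) - env_l(n)|
    everywhere = bool(np.all(mean_env < delta2 * amp))
    # Fraction of offending samples |{n_o}| / N.
    offending = float(np.mean(mean_env > delta1 * amp))
    counts_ok = abs(n_zero - (n_max + n_min)) <= 1
    return everywhere and offending < delta_t and counts_ok
```

The three returned conditions mirror the three lines of Eq. (2.9) in order.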
• Convergence test
By a Cauchy convergence test, the stop criterion is satisfied when the candidate
IMFs from two consecutive sifting iterations are close enough to each other:
SD < δSD
where δSD is an adjustable parameter and
SD = Σ_n |xi,s−1(n) − xi,s(n)|² / |xi,s−1(n)|²
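The statistic is straightforward to compute; the small eps term below is an implementation detail added to guard against division by zero and is not part of the formula:

```python
import numpy as np

def sifting_sd(prev, curr, eps=1e-12):
    """SD between candidate IMFs of two consecutive sifting iterations."""
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    return float(np.sum(np.abs(prev - curr) ** 2 / (np.abs(prev) ** 2 + eps)))
```

Sifting stops once sifting_sd(...) falls below δSD; commonly cited values in the EMD literature are around 0.2 to 0.3.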
2.2 Two Dimensional Empirical Mode Decomposition
The original EMD algorithm has also been extended to the decomposition of images (two dimensional signals) [16, 17]. There are multiple ways of referring to this algorithm; in this thesis we use two dimensional empirical mode decomposition (2D-EMD) throughout the discussion.
As an extension of the original algorithm, 2D-EMD utilizes a similar sifting process, except that instead of 1D cubic spline interpolation, 2D scattered data interpolation is used to derive the mean surface. The interpolation can be done via many methods, such as multilevel B-splines, Delaunay triangulation, or the finite element method. Given an image x(m,n), the 2D-EMD algorithm operates as follows:
1. Initialize the signal to be iterated on as the original signal xr(m,n) = x(m,n).
Initialize IMF index i = 1.
2. Detect local maxima and minima of xr(m,n).
3. Interpolate among local maxima to get an upper surface envelope envu(m,n), and
local minima for lower surface envelope envl(m,n), respectively.
4. Compute the mean envelope as the average of the two: (1/2)[envu(m,n) + envl(m,n)].

5. Subtract the mean envelope from the signal to get a candidate IMF: xi(m,n) = xr(m,n) − (1/2)[envu(m,n) + envl(m,n)].
6. If the candidate IMF satisfies the stop criterion, update the residue for the next IMF extraction, xr(m,n) = xr(m,n) − xi(m,n), increase the IMF index and go to step 7. Otherwise, take the candidate IMF as the signal for the next sifting iteration, xr(m,n) = xi(m,n), and keep iterating from step 2.
7. Check if the terminate criterion for the residue is satisfied; if so, terminate the algorithm. Otherwise, go to step 2.
The procedure is described in Algorithm 2, where x̃r(m,n) denotes the working signal within the sifting loop.

Algorithm 2: 2D-EMD
1 i = 1;
2 xr(m,n) = x(m,n);
3 while NOT(CheckTerminateCriterion2D(xr(m,n))) do
4     x̃r(m,n) = xr(m,n);
5     while NOT(CheckStopCriterion2D(xi(m,n))) do
6         envu(m,n) = Interpolate2D(FindLocalMax2D(x̃r(m,n)));
7         envl(m,n) = Interpolate2D(FindLocalMin2D(x̃r(m,n)));
8         xi(m,n) = x̃r(m,n) − (1/2)[envu(m,n) + envl(m,n)];
9         x̃r(m,n) = xi(m,n);
10    end
11    xr(m,n) = xr(m,n) − xi(m,n);
12    i = i + 1;
13 end

In this algorithm FindLocalMax2D (FindLocalMin2D) is a 2D extension of the extrema detector in EMD; it searches a 9 × 9 grid centered around the point of interest for local maxima (minima). CheckStopCriterion2D is slightly different from the one used in EMD, since for two dimensional signals it is impossible to check the number of zero crossings. Thus the stop criterion is modified as:
• At any point, the mean value of the upper and lower surface envelope is near zero.
• The IMFs are locally orthogonal to each other.
In practice, usually a Cauchy convergence test is used as the stop criterion. Let xi,s(m,n) be the candidate IMF at level i for sifting iteration s. The stop criterion is satisfied when SD < δSD, where δSD is an adjustable threshold and

SD = Σ_{m,n} |xi,s−1(m,n) − xi,s(m,n)|² / |xi,s−1(m,n)|²
As in the EMD algorithm, CheckTerminateCriterion2D checks whether the total number of local extrema is less than 3, in which case the residue cannot be further decomposed.
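One envelope-construction step of this kind can be sketched with SciPy's scattered-data interpolation (Delaunay-based cubic interpolation, one of the options mentioned above). The 3 × 3 extrema neighbourhood and the nearest-neighbour fill outside the convex hull are simplifying assumptions of this sketch:

```python
import numpy as np
from scipy.interpolate import griddata
from scipy.ndimage import maximum_filter, minimum_filter

def mean_surface(img):
    """Upper/lower surface envelopes from scattered extrema, then their average."""
    grid_m, grid_n = np.mgrid[0:img.shape[0], 0:img.shape[1]]

    def surface(mask):
        pts = np.column_stack([grid_m[mask], grid_n[mask]])
        vals = img[mask]
        s = griddata(pts, vals, (grid_m, grid_n), method='cubic')
        nn = griddata(pts, vals, (grid_m, grid_n), method='nearest')
        return np.where(np.isnan(s), nn, s)   # fill outside the convex hull

    env_u = surface(img == maximum_filter(img, size=3))
    env_l = surface(img == minimum_filter(img, size=3))
    return 0.5 * (env_u + env_l)
```

Subtracting mean_surface(img) from img gives one candidate-IMF sift; repeating until the stop criterion holds yields xi(m,n).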
After the execution of the algorithm, suppose we have a total of L − 1 IMFs and the residue in the end; then the original image x(m,n) can be represented by:

x(m,n) = Σ_{i=1}^{L−1} xi(m,n) + xr(m,n)     (2.10)
where x1(m,n), x2(m,n), . . . , xL−1(m,n) denote the intrinsic mode functions (IMFs) and xr(m,n) denotes the residue. Note that here both the IMFs and the residue are two dimensional signals (images). Similarly to the alternative representation for EMD, we can include the residue in the summation:

x(m,n) = Σ_{i=1}^{L} xi(m,n)     (2.11)
where the residue is represented as the Lth level IMF.
Due to the computational complexity of the 2D interpolation operation and the it-
erative nature of the algorithm, 2D-EMD is a relatively slow algorithm. Bhuiyan [18]
proposed the use of an approximation process to estimate the mean surface, thereby
reducing or avoiding the use of the iterative sifting process, in order to speed up the
2D-EMD algorithm. The resulting FA-2D-EMD3, which omits the sifting process, operates as follows:
1. Initialize the signal to be iterated on as the original signal xr(m,n) = x(m,n).
Initialize IMF index i = 1.
2. Detect local maxima and minima of xr(m,n).
3. Determine the set of distances {dadj−max} between adjacent maxima, and the set of
distances {dadj−min} between adjacent minima.
4. Find the largest and smallest distances in both {dadj−max} and {dadj−min}. Choose
a window size accordingly4.
5. Apply an order statistics filter followed by a smoothing filter to obtain the upper and lower surface envelopes.
6. Remove the mean surface (the average of the upper and lower surface envelopes) from xr(m,n) to get the current level IMF xi(m,n).
3In the original article it was referred to as FABEMD. In order to avoid confusion with Bivariate Empirical Mode Decomposition (BEMD), which will be covered later, FA-2D-EMD is used instead.
4There are four choices: the maximum of {dadj−max}, the minimum of {dadj−max}, the maximum of {dadj−min} and the minimum of {dadj−min}. Details on window size selection can be found in [18].
7. Update the residue for the next IMF extraction, xr(m,n) = xr(m,n) − xi(m,n), and increase the IMF index i.
8. Check if the terminate criterion for the residue is satisfied; if so, terminate the algorithm. Otherwise, go to step 2.
Algorithm 3: FA-2D-EMD
1 i = 1;
2 xr(m,n) = x(m,n);
3 while NOT(CheckTerminateCriterion2D(xr(m,n))) do
4     x̄r(m,n) = SmoothFilter(OrderStatisticsFilter(xr(m,n)));
5     xi(m,n) = xr(m,n) − x̄r(m,n);
6     xr(m,n) = x̄r(m,n);
7     i = i + 1;
8 end

where x̄r(m,n) denotes the estimated mean surface.
Figure 2.4: Local maximum and minimum detection.
In FA-2D-EMD the mean surface is estimated by applying an order statistics filter followed by a smoothing operation. It was claimed that this procedure approximates the original sifting process in one iteration. During order statistics filtering, the upper surface envelope is obtained by setting each pixel value to the maximal value within a window surrounding it, with the window size determined by the overall distance between adjacent local extrema. The lower surface envelope is obtained similarly.

Figure 2.5: Upper and lower surface envelopes, before and after smoothing.

Figure 2.4 shows the detected local maxima (denoted as 'o's) and minima (denoted as 'x's) for a 20 × 20 sample image. Figure 2.5 shows the upper and lower surface envelopes, before and after local smoothing. After this, the mean surface is obtained by averaging the two smoothed surface envelopes.
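This estimation maps directly onto standard image filters; the sketch below uses SciPy's order-statistics (max/min) and averaging filters, and assumes the window size win has already been chosen from the adjacent-extrema distances as described above:

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter, uniform_filter

def fa_envelopes(img, win):
    """FA-2D-EMD surface envelopes: order statistics filter, then smoothing."""
    env_u = uniform_filter(maximum_filter(img, size=win), size=win)
    env_l = uniform_filter(minimum_filter(img, size=win), size=win)
    return env_u, env_l, 0.5 * (env_u + env_l)   # upper, lower, mean surface
```

One such call replaces an entire sifting loop of 2D-EMD, which is where the speed-up of FA-2D-EMD comes from.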
A demonstration of the decomposition result is shown in Figures 2.6 and 2.7: Figure 2.6 shows an image and Figure 2.7 shows the result of applying FA-2D-EMD.
Figure 2.6: An image.
2.3 Bivariate Empirical Mode Decomposition
The Bivariate EMD (BEMD) [19] was proposed in order to obtain a unified representation for a pair of signals x(n) and y(n). BEMD performs a similar procedure to the original EMD, but on the complex signal x(n) + jy(n). The idea is that instead of looking for oscillating components, BEMD looks for rotating components in the three dimensional space defined by the real, imaginary and time axes. Thus, the analysis is performed simultaneously on the real and imaginary components, resulting in the same number of IMFs for both.
Analogous to the envelope in EMD, a 3D tube is built in BEMD to surround the signal being iterated on, such that the center of the tube acts like the mean envelope and can be removed iteratively to reveal the IMF. This is demonstrated in Figure 2.8. Note that the figure demonstrates how the 3D tube is built, which corresponds to one sifting iteration. In order to extract a valid IMF, the complex signal in Plot (c) needs to be further processed; that is, a 3D tube needs to be built around it so that the slowly rotating trend can be removed.
Figure 2.7: Intrinsic mode functions (IMF1–IMF6) from decomposing the image in Figure 2.6.
In order to estimate the center of the tube, projections of the complex signal are taken along different directions in the plane spanned by the real and imaginary axes. Combining the partial estimates from all directions results in an estimate of the center location. This is demonstrated in Figure 2.9, where the ellipse represents a cross section of the 3D tube, and the four points correspond to the extrema when projecting onto four directions. The tube center is defined as the center of mass of all these points, assuming unit mass for every point. Other methods of estimating the center are available and were presented in [19].
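The projection and the center-of-mass construction of Figure 2.9 can be illustrated numerically; the ellipse below is a synthetic cross section (not data from the thesis), and averaging its extremal points along K = 4 directions recovers the ellipse center:

```python
import numpy as np

def project(z, phi):
    """Projection of a complex signal onto direction phi: Re{e^{-j phi} z}."""
    return np.real(np.exp(-1j * phi) * z)

# Synthetic cross section: an ellipse centred at c.
c = 2.0 + 1.0j
theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
ellipse = c + 3.0 * np.cos(theta) + 1j * 1.5 * np.sin(theta)

K = 4
extremal = []
for k in range(K):
    phi = k * np.pi / K
    p = project(ellipse, phi)
    extremal.append(ellipse[np.argmax(p)])   # maximum along direction phi
    extremal.append(ellipse[np.argmin(p)])   # minimum along direction phi
center_est = np.mean(extremal)               # unit-mass center of mass
```

In BEMD proper, the extremal points become the interpolated envelopes of each projected signal, but the averaging principle is the same.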
Given two signals x(n) and y(n), BEMD operates as follows:
1. Initialize the signal to be iterated on as the complex signal: zr(n) = x(n) + jy(n).
Initialize IMF index i = 1.
Figure 2.8: 3D tube built to surround a complex signal.
Figure 2.9: Estimation of the center of 3D tube.
2. Project zr(n) onto K different directions within [0, π].
3. For each direction, find the upper and lower envelope.
4. Take the average of all the envelopes to get the center of the tube, envm(n).

5. Remove envm(n) from zr(n) to obtain the candidate IMF zi(n).

6. Check if the stop criterion is satisfied; if so, update the residue for the next IMF extraction, zr(n) = zr(n) − zi(n), increase the IMF index i and continue to step 7. Otherwise, take the candidate IMF as the signal for the next sifting iteration, zr(n) = zi(n), and keep iterating from step 2.
7. Check if the terminate criterion for the residue is satisfied; if so, terminate the algorithm. Otherwise, go to step 2.
The complete BEMD procedure is summarized in Algorithm 4.
Algorithm 4: BEMD
1 z(n) = x(n) + jy(n);
2 i = 0;
3 zr(n) = z(n);
4 while NOT(CheckTerminateCriterion(zr(n))) do
5     z̃r(n) = zr(n);
6     i = i + 1;
7     while NOT(CheckStopCriterion(zi(n))) do
8         envm(n) = [0 0 . . . 0];
9         for k = 1 → K do
10            ϕk = kπ/K;
11            envu(n) = Interpolate(FindLocalMax(Project(z̃r(n), ϕk)));
12            envl(n) = Interpolate(FindLocalMin(Project(z̃r(n), ϕk)));
13            envm(n) = envm(n) + (1/K) e^{jϕk} [envu(n) + envl(n)];
14        end
15        zi(n) = z̃r(n) − envm(n);
16        z̃r(n) = zi(n);
17    end
18    zr(n) = zr(n) − zi(n);
19 end

where z̃r(n) denotes the working signal within the sifting loop and Project(z(n), ϕ) = Re{e^{−jϕ} z(n)}.
The stop criterion and the terminate criterion are similar to those in the EMD algorithm, except that for the terminate criterion, the signal projections along the different directions need to be checked, and the algorithm is terminated whenever there are fewer than 3 extrema in any of the directions.
After the execution of the algorithm, suppose we have a total of L − 1 IMFs and the residue in the end; then the original complex signal z(n) can be represented by:

z(n) = Σ_{i=1}^{L−1} zi(n) + zr(n)     (2.12)

Writing this in a simpler form, we have:

z(n) = Σ_{i=1}^{L} zi(n)     (2.13)

where the residue has been moved inside the summation.
Since all IMFs are complex signals as well, we can write out their real and imaginary
parts explicitly:
zi(n) = xi(n) + jyi(n) (2.14)
xi(n) = Re{zi(n)}
yi(n) = Im{zi(n)}
so that we have the same number of decomposition levels for both signals:

x(n) = Σ_{i=1}^{L} xi(n)
y(n) = Σ_{i=1}^{L} yi(n)     (2.15)
Figure 2.10 shows an example of two synthesized signals. A demonstration of the decomposition result is shown in Figure 2.11, where the solid lines represent the real parts of the IMFs, which correspond to signal A in Figure 2.10, and the dashed lines represent the imaginary parts of the IMFs, from signal B.
Figure 2.10: Two synthesized signals. Top: signal A. Bottom: signal B.
2.4 Other Variations
For decomposing more than two 1D signals together, algorithms such as trivariate EMD [20] and multivariate EMD [21] have been proposed. One drawback of these algorithms is that as the signal dimension or the number of signals increases, the computational complexity grows exponentially, since within each sifting iteration projections need to be taken on higher dimensional hyperspheres. In practice such complexity is undesirable.
2.5 Problem with EMD - No unified representation
Due to the empirical and algorithmic nature of EMD, no unified representation for the decomposition is guaranteed when applying the original algorithm. When signals of a certain type are decomposed, it is desirable that the outputs share the same unified representation, for example the same number of levels and a correspondence at each level. All
Am
plit
ud
e
1000 2000 3000 4000 5000 6000 7000 8000 9000 10000−2
0
2
Sample
−5
0
5
−5
0
5
−10
0
10
−5
0
5
Figure 2.11: Complex intrinsic mode functions from decomposing the pair of signals
in Figure 2.10. Solid lines represent real parts of the IMFs and dashed lines represent
imaginary parts of the IMFs.
state-of-the-art EMD algorithms are either unable to solve this problem, or can only address it to a certain extent, which is far from satisfactory.
2.5.1 One Dimensional Case
Figure 2.12: Example of three signals.
When a set of signals is decomposed, for example multiple recordings of similar processes, we want the decompositions of all the signals to have the same number of levels, with each level consisting of similar oscillation types. In the case of two signals, a unified representation can be obtained by using BEMD. For three or more signals, although there are extensions of EMD such as trivariate EMD [20] and multivariate EMD [21], the computational complexity involved is exponential in the number of signals. Furthermore, for each new signal to be decomposed, we need to add the signal to the collection and repeat the decomposition from the beginning, which is not practical.
If we apply the original EMD to multiple signals separately, we will most probably get different numbers of IMFs and, more importantly, the IMFs at the same level might not
Figure 2.13: Decomposition obtained by applying EMD separately.
correspond to the same type of oscillations. In fact, this is true even when the number of levels is the same. Figure 2.12 shows three otoacoustic emission signals, and Figure 2.13 shows an example of decomposing the three signals using EMD separately, where the numbers of levels differ and no correspondence is guaranteed between the IMFs. In Figure 2.13 each column shows the decomposition of one of the signals. These three signals are actually of the same type, and they cover roughly the same frequency range. They will be discussed in detail in Chapter 4.
The problem of obtaining a unified representation for more than two signals will be addressed by Reference-EMD (R-EMD) in Chapter 3.
2.5.2 Two Dimensional Case
Figure 2.14: Two faces with different expressions (original signals A and B).
Since 2D-EMD is a relatively new tool, state-of-the-art algorithms are presently only capable of decomposing one image at a time. If information fusion is required between two images, the current solution is to vectorize the two images, apply BEMD to the two 1D signals, and then organize everything back into 2D. This is suboptimal, since vectorized images completely lose the correlation along one spatial dimension.
If 2D-EMD is applied to multiple images separately, we will get different numbers of IMFs and no correspondence is guaranteed. Figure 2.14 shows two images of the same face under different facial expressions. Figure 2.15 shows the decomposition after applying 2D-EMD separately, where no correspondence can be found between the IMFs. These images will be discussed in detail in Chapter 6.
The problem of obtaining a unified representation for more than one image will be addressed by 2D-BEMD in Chapter 3.
Figure 2.15: Decomposition obtained by applying 2D-EMD separately (IMFs A1–A6 of image A and B1–B4 of image B).
2.6 Conclusion
In summary, for the one dimensional case there exist several algorithms to satisfy different decomposition requirements: EMD for decomposing a single signal, BEMD for a pair of signals, trivariate EMD for three signals and multivariate EMD for more than three signals. But as the number of signals increases, even the most capable algorithm becomes intractable with the state-of-the-art approach. On the other hand, the development of two dimensional algorithms is still at an early stage, with only 2D-EMD covering the single image case.
In the next chapter, the problems described herein will be addressed via two new
algorithms that provide unified representations after decomposition.
Chapter 3
EMD with Unified Representation
In this chapter two new variants of the EMD algorithm are proposed, in order to obtain a unified representation after decomposition, for both the one dimensional and two dimensional cases.
The discussion in this chapter starts from the better developed one dimensional case, where Reference EMD (R-EMD) is proposed as a way to obtain a unified representation among multiple signals by decomposing each signal against a set of references, given that these signals originate from a similar process. We then continue to the development of two dimensional bivariate EMD (2D-BEMD) for obtaining a unified representation between two signals, as a step beyond the existing two dimensional algorithms.
At the end of this chapter, two dimensional reference bivariate EMD (2D-R-BEMD)
will be briefly discussed to complete the proposed framework. Although this algorithm is
not used for any application in this thesis, it has potential in applications where spatial-
frequency information needs to be fused from more than two images.
3.1 One Dimensional Reference EMD
For decomposing multiple one dimensional signals together, the state-of-the-art solution is to combine the target signals into higher dimensions, as in trivariate EMD and multivariate EMD. The problem with this is that as the number of signals increases, the dimensionality
gets higher and the complexity grows exponentially. Furthermore, adding a new signal
into the collection requires the repetition of the entire decomposition procedure, which
is not practical.
The proposed reference EMD (R-EMD) takes an alternative approach to this problem, carefully constructing a set of reference signals appropriate for a certain type of signals and using these references to guide the decomposition of every signal of interest. Note that by doing so, an arbitrary number of signals can be decomposed with a unified representation without increasing the computational complexity. Furthermore, adding a new signal into the collection simply requires decomposing the new signal against the references.
We propose to use a set of sinusoids as references to achieve a wavelet-like frequency separation, while retaining the adaptive feature of the EMD algorithm.
3.1.1 Algorithm
For a one dimensional signal x(n), the first step is to determine a rough frequency range for this type of signal and to design a set of reference frequencies w according to the frequency range and the desired number of decomposition levels. The rules for determining the reference frequencies will be discussed in detail in Section 3.1.2. After w is determined, R-EMD operates as follows:
1. Initialize the residue as the original signal, xr(n) = x(n), and the IMF index i = 1.
2. Create a sinusoid vi(n) of the reference frequency w(i) at the current level i.
3. Initialize the signal to be iterated on as the complex pair: zr(n) = xr(n) + jvi(n)
4. For 1 ≤ k ≤ K:

(a) Project zr(n) onto ϕk = (π/K)(k − 1) to get Re{e^{−jϕk} zr(n)}.

(b) Compute the upper and lower envelopes envu(n), envl(n) of the projection.

(c) Average the upper and lower envelopes to get the partial mean envelope on the kth direction.

Then the partial estimates are combined in the three dimensional space to get the overall mean envelope envm(n).
5. Compute the candidate IMF:

xi(n) = Re{zr(n) − envm(n)}

If this candidate IMF satisfies the stop criterion, remove it from the residue, xr(n) = xr(n) − xi(n), increase the IMF index i and go to step 6. Otherwise, keep iterating from step 4.

6. Check if the pre-determined number of IMF levels has been reached; if so, terminate the algorithm. If not, go to step 2.
The proposed algorithm is presented in more detail in Algorithm 5, where CheckTerminateCriterion checks whether the number of levels has been reached, in order to terminate the entire algorithm, and CheckStopCriterion checks whether the target signal is an IMF, in order to stop the sifting process, as in the original EMD algorithm.
3.1.2 Relationship between extracted frequency and reference signal

In order to construct the set of reference sinusoids, it is important to know, for each sinusoid used as a reference, what kind of frequency components can be extracted in correspondence with that sinusoid. To understand this, we derive the local frequency relationship between the reference sinusoid and the extracted IMF at a single level, together with experiments on synthesized signals to validate the result. For notational simplicity and ease of derivation, in this section we use continuous time signals
Algorithm 5: R-EMD
1 i = 0;
2 xr(n) = x(n);
3 while i < L do
4     vi(n) = CreateRefSignal(w, i);
5     z(n) = xr(n) + jvi(n);
6     i = i + 1;
7     while NOT(CheckStopCriterion(xi(n))) do
8         envm(n) = [0 0 . . . 0];
9         for k = 1 → K do
10            ϕk = (π/K)(k − 1);
11            envu(n) = Interpolate(FindLocalMax(Project(z(n), ϕk)));
12            envl(n) = Interpolate(FindLocalMin(Project(z(n), ϕk)));
13            envm(n) = envm(n) + (1/K) e^{jϕk} [envu(n) + envl(n)];
14        end
15        xi(n) = Re{z(n) − envm(n)};
16        z(n) = z(n) − envm(n);
17    end
18    xr(n) = xr(n) − xi(n);
19 end
instead of discrete time signals. Note that all EMD related algorithms still operate in
the discrete time domain only.
In a general signal model, we allow a signal to be a superposition of different amplitude modulated and frequency modulated (AM-FM) components:

x(t) = Σ_i ai(t) cos(ωi(t) t)     (3.1)
We first remark that:
• in the following analysis only local properties are considered
• ai(t) is usually smooth and slowly varying
Without loss of generality, we can set the amplitude and frequency to be both constants:
ai(t) = ai
ωi(t) = ωi
Let the current level reference signal be:
v(t) = cos(ωt) (3.2)
Essentially, the IMF is extracted by first forming the complex signal between the input signal and the reference signal, and then sifting out the fastest rotating component. The real part of the sifted-out component is the current level IMF. The question we ask is: among all the components in x(t), what is the criterion for a certain component to be extracted?
In order to investigate this, we first form the complex signal between the ith component of the input signal and the reference signal:
z(t) = ai cos(ωit) + jv(t) = ai cos(ωit) + j cos(ωt) (3.3)
We are seeking the conditions under which the ith component is part of the extracted
IMF. As suggested in [19], for a complex signal to be a fixed point of the sifting process,
one necessary condition is that the sense of rotation never changes. The sense of rotation
SoR is defined as the sign of the vector product of velocity and acceleration:

SoR = Sgn{Im[(dz/dt) · (d²z/dt²)*]}     (3.4)
Using the signal model, this can be computed as:
dz/dt = −aiωi sin(ωit) − jω sin(ωt)
d²z/dt² = −aiωi² cos(ωit) − jω² cos(ωt)
(d²z/dt²)* = −aiωi² cos(ωit) + jω² cos(ωt)
So SoR can be written as:

SoR = Sgn{Im[(dz/dt)(d²z/dt²)*]}
    = Sgn{−aiω²ωi sin(ωit) cos(ωt) + aiωi²ω sin(ωt) cos(ωit)}
    = Sgn{−(ωi + ω) sin[(ωi − ω)t] + (ωi − ω) sin[(ωi + ω)t]}     (3.5)

where the positive common factor aiωωi/2 has been dropped in the last step, since it does not affect the sign.
We investigate the function in two different scenarios:
1. ω = ωi
In this case we have SoR = Sgn{0}, which is constant. Under this condition the complex pair z(t) = ai cos(ωit) + j cos(ωt) is a fixed point of the sifting operator and the ith component will be extracted as part of the IMF. This makes intuitive sense, since ω = ωi implies that the ith component is in perfect synchronization with the reference signal.
2. ω ≠ ωi
In the general case, we consider a local time span where the faster of the two signals
(cos(ωt) and ai cos(ωit)) completes one full cycle. In other words, the time span is
restricted to:

0 ≤ t ≤ min(2π/ω, 2π/ωi)     (3.6)
Let y(t) be:
y(t) = −(ωi + ω) sin[(ωi − ω)t] + (ωi − ω) sin[(ωi + ω)t] (3.7)
Without loss of generality, we consider the ratio between the two frequencies:
q = ωi/ω     (3.8)
Substituting this into y(t), we have:
y(t) = −(ωq + ω) sin[(ωq − ω)t] + (ωq − ω) sin[(ωq + ω)t] (3.9)
Since only Sgn[y(t)] is considered, it suffices to evaluate:
y(t) = −(q + 1) sin[(ωq − ω)t] + (q − 1) sin[(ωq + ω)t] (3.10)
For better numerical demonstration, we scale y(t) with respect to ω by changing t to the normalized time t̄:

t̄ = ωt/2π     (3.11)
Figure 3.1: Contour plot for demonstration of the solution to Equation (3.16). Shaded
area corresponds to the region on t, defined by the second inequality.
and get:

y(t̄) = −(q + 1) sin[2π(q − 1)t̄] + (q − 1) sin[2π(q + 1)t̄]     (3.12)

Within the defined time span we require Sgn[y(t̄)] to be fixed, either +1 or −1. This is equivalent to requiring that there is no solution to:

y(t̄) = 0,  0 < t̄ < min(1, 1/q)     (3.13)
To solve this system of equations numerically, we plot two contours, as shown in
Figure 3.2: Examples of different values of q and their intersection with y(t̄) = 0; (a) q = 0.25, (b) q = 0.75, (c) q = 0.1, (d) q = 3. The vertical dotted line represents the current value q = qo.
Figure 3.1. The solid line represents the contour:

y(t̄) = 0     (3.14)

and the dashed line is the contour:

t̄ = 1/q     (3.15)
Consider the region defined by:

0 < t̄ < min(1, 1/q)     (3.16)
as shown in Figure 3.1 by the shaded area. We are looking for values for q such that
within the shaded region (excluding boundaries) there is no solution to y(t) = 0.
Suppose q takes on a specific value q = qo. The above is equivalent to requiring that within the shaded region (excluding boundaries) there is no intersection between the line q = qo and the contour y(t) = 0. Figures 3.2(b) and 3.2(c) show examples of q that do not intersect with y(t) = 0. In these cases no solution to y(t) = 0 exists within the region of Equation (3.16), so the SoR does not change; the corresponding ai cos(ωit) + j cos(ωt) is a fixed point of the sifting operator, and ai cos(ωit) is part of the extracted IMF.
On the other hand, Figures 3.2(a) and 3.2(d) show examples of q that do intersect with y(t) = 0. In these cases the requirement is not satisfied, so the corresponding ai cos(ωit) will not be part of the IMF when cos(ωt) is used as reference. It is clear that the requirement is satisfied only if q satisfies:

0.5 ≤ q ≤ 2 (3.17)
as shown by the darker shaded region in Figure 3.3.
Combining the results from the first and second scenarios, we conclude that ai cos(ωit) is part of the extracted IMF with reference signal cos(ωt) if the ratio between ωi and ω satisfies:

ω/2 ≤ ωi ≤ 2ω (3.18)
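This condition can be spot-checked numerically: sample y(t) of Equation (3.12) on the open interval 0 < t < min(1, 1/q) and look for a sign change. The following is a sketch; the grid density and the zero tolerance are arbitrary choices, not part of the derivation:

```python
import numpy as np

def has_zero_crossing(q, n=20000):
    """Check whether y(t) of Eq. (3.12) changes sign on 0 < t < min(1, 1/q)."""
    t = np.linspace(1e-4, min(1.0, 1.0 / q) - 1e-4, n)
    y = (-(q + 1) * np.sin(2 * np.pi * (q - 1) * t)
         + (q - 1) * np.sin(2 * np.pi * (q + 1) * t))
    y = y[np.abs(y) > 1e-9]  # drop numerically-zero samples near the boundary
    return bool(np.any(np.sign(y[:-1]) != np.sign(y[1:])))
```

For q inside [0.5, 2] no crossing is found (the component is kept in the IMF); for q outside that range a crossing exists and the component is rejected.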
To validate the above analysis, we construct a synthesized chirp signal of 5 seconds,
Figure 3.3: Condition on q such that no solution exists for Equation (3.13). The two arrows mark q = 0.5 and q = 2.
beginning at DC and increasing in frequency at 25 Hz per second:

x(t) = cos(2π · 12.5 · t²) (3.19)

We use a reference cosine of 50 Hz, which is the instantaneous frequency of x(t) at t = 2 seconds:

r(t) = cos(2π · 50 · t) (3.20)
The instantaneous frequencies of these two signals are plotted in Figure 3.4.
We apply the proposed R-EMD and investigate the first-level IMF. Denote the real part of IMF1, which corresponds to x(t), as x1(t), and the imaginary part of IMF1, which corresponds to r(t), as r1(t). Figure 3.5 shows these two signals.
Figure 3.4: Instantaneous frequency of the chirp (solid line) and the reference (dashed
line).
Figure 3.5: First level IMF: real part on the top plot and imaginary part on the bottom
plot.
According to our analysis, frequencies in the range (25, 100) Hz will be extracted. Note that in a practical setup of R-EMD, as we will present in Section 3.1.3, the decomposition starts from the finest scale and the set of reference frequencies is designed such that at each level the highest frequency in the signal x(t) is no more than twice the reference frequency, whereas in this experiment the chirp goes beyond the frequencies that the reference signal is supposed to extract. To evaluate performance in this experimental setting, the extracted frequency range is defined as the range where both x1(t) and r1(t) have high energy, that is, where x1(t) and r1(t) are in correspondence with each other. To evaluate this, we build upper envelopes on the two signals, denoted ex(t) and er(t). If we consider each level of R-EMD as a filter, the filter response can be approximated as in Figure 3.6, where the x-axis is the instantaneous frequency of x(t) and the y-axis is ex(t) · er(t). We also plot the location of the reference frequency at 50 Hz (vertical dashed line in the middle) together with the two bounds from our analysis: 25 Hz (vertical dashed line on the left) and 100 Hz (vertical dashed line on the right). It is clear that the frequency bounds from the analysis closely approximate the simulation result.
Figure 3.6: Approximation of the filter structure of one-level R-EMD.
Chapter 3. EMD with Unified Representation 44
3.1.3 Reference frequency selection for R-EMD
Figure 3.7: Smoothed one level frequency response.
To approximate the behaviour of one-level R-EMD with a filter-like structure, a mixture of three Gaussians is used to fit the amplitude response, as shown in Figure 3.7. Under this assumption we can create a filterbank structure that corresponds to the optimal reference frequency setting of the proposed algorithm, as shown in Figure 3.8. The structure is also plotted in log scale in Figure 3.9.
Ideally, once the highest reference frequency and the number of levels are determined, the subsequent frequencies can be determined one by one, each being half of the reference from the previous level. Furthermore, the highest reference frequency can be any value in the range (W/2, 2W), where W is the highest instantaneous frequency in the signal to be decomposed. In practice, reference frequencies need to be chosen according to this rough measure, as well as the frequency properties of the signals of interest. It is also preferable to start with a higher frequency, since there is always high-frequency noise in real-world applications.
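As an illustration of this rule, the reference set can be generated dyadically. The helper below is a sketch; its name and the choice to start exactly at W (rather than elsewhere in (W/2, 2W)) are assumptions for illustration:

```python
def reference_cascade(W, n_levels):
    """Hypothetical helper: dyadic reference frequencies for R-EMD.

    W is the highest instantaneous frequency in the signal; the first
    reference is taken as W itself (any value in (W/2, 2W) would do),
    and each subsequent reference is half of the previous one.
    """
    return [W / 2.0 ** i for i in range(n_levels)]
```

For example, `reference_cascade(100.0, 4)` gives references at 100, 50, 25 and 12.5 Hz.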
Figure 3.8: Filter bank structure.
3.1.4 Experimental Result
Figure 3.10 shows an example of decomposing three transient evoked otoacoustic emission (TEOAE) signals using the proposed R-EMD. The original signals are the same as the ones used to demonstrate the problems of EMD and were depicted in Figure 2.12. The reference frequencies are designed according to the above rules and the OAE signal properties, which will be presented in detail in Chapter 4.
3.2 Two Dimensional Bivariate EMD
For decomposing two dimensional signals, the state-of-the-art algorithms are only capable of dealing with one signal at a time. In this section, we take a step further and propose an algorithm to decompose two signals together, in order to obtain a unified representation between the two.
Figure 3.9: Filter bank in log scale.
3.2.1 Algorithm
In order to decompose two images together, we follow an approach similar to BEMD: the signals are combined to form a new one in a higher dimension. By projecting onto different directions spanning the two signal dimensions, partial surface envelopes are estimated and combined. We herein use the term complex-valued surface to describe a function (image) that maps a pixel to a complex number, so the first step is to combine the two images into a complex-valued surface. Following that, obtaining a unified representation means that each IMF is a complex-valued surface with correspondence of the local oscillations between the real and imaginary parts.
For two images x(m,n) and y(m,n), 2D-BEMD operates as follows:
1. Initialize IMF index i = 1.
2. Construct the complex-valued surface z(m,n) = x(m,n) + jy(m,n).
Figure 3.10: Demonstration of the proposed R-EMD.
3. For 1 ≤ k ≤ K:

(a) Project on ϕk = (π/K)(k − 1) to get Re{e−jϕk z(m,n)}.

(b) Determine the window size for filtering [18]: w = min{min{dadj−max}, min{dadj−min}}, where {dadj−max} is the collection of all pairwise distances between adjacent local maxima, and {dadj−min} is the collection of all pairwise distances between adjacent local minima.

(c) Apply an order statistics filter and smoothing to get the upper and lower surface envelopes.

(d) Average the upper and lower surface envelopes to get the partial mean surface envm(m,n).

(e) Update the complex-valued mean surface: env(m,n) = (1/2)[env(m,n) + e−jϕk envm(m,n)]
4. Compute the candidate IMF: zi(m,n) = zr(m,n) − env(m,n). If it satisfies the stop criterion, remove this IMF from the signal, increase the IMF index i and go to step 5. Otherwise, keep iterating from step 3.

5. Check if the termination criterion is satisfied; if so, terminate the algorithm. If not, go to step 3.
The proposed algorithm is presented in more detail in Algorithm 6, where CheckTerminateCriterion checks whether the entire algorithm needs to be terminated, which happens when the current signal to be iterated is considered a residue. CheckStopCriterion checks whether the target signal is an IMF, in order to stop the sifting process. Both criteria are similar to those of 2D-EMD.
In most cases only one sifting operation is needed to extract each IMF, so we can approximate the decomposition by removing the inner sifting loop to speed up the algorithm, as shown in Algorithm 7. In the following chapters we use only this fast version, Algorithm 7, and refer to it as 2D-BEMD for simplicity.
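A minimal NumPy sketch of the fast variant is given below. The outer structure follows Algorithm 7, but the envelope estimate (a local max/min filter followed by a mean filter over a fixed window) and the fixed level count are simplifying assumptions standing in for the order-statistics filtering, window-size rule and termination criteria of the thesis:

```python
import numpy as np

def _window_stack(a, w):
    # all (2w+1)^2 shifted copies of a, edge-padded to the original shape
    p = np.pad(a, w, mode='edge')
    return np.stack([p[i:i + a.shape[0], j:j + a.shape[1]]
                     for i in range(2 * w + 1) for j in range(2 * w + 1)])

def fa_2d_bemd(x, y, K=4, w=2, n_levels=3):
    """Sketch of FA-2D-BEMD: one sifting pass per level, K projections."""
    zr = x + 1j * y                    # complex-valued surface z = x + jy
    imfs = []
    for _ in range(n_levels):
        env = np.zeros_like(zr)
        for k in range(1, K + 1):
            phi = k * np.pi / K
            proj = np.real(np.exp(-1j * phi) * zr)      # project on phi_k
            stack = _window_stack(proj, w)
            upper = _window_stack(stack.max(axis=0), w).mean(axis=0)
            lower = _window_stack(stack.min(axis=0), w).mean(axis=0)
            env = 0.5 * (env + np.exp(-1j * phi) * 0.5 * (upper + lower))
        zi = zr - env                   # candidate IMF
        imfs.append(zi)
        zr = zr - zi                    # residue after one pass (equals env)
    return imfs, zr
```

Because each level removes zi = zr − env exactly, the IMFs and the final residue always sum back to x + jy, mirroring the additive reconstruction property of the decomposition.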
3.2.2 Experimental Result
A demonstration of the proposed 2D-BEMD is given in Figure 3.11. Here the same pair of face images as in Figure 2.14 is decomposed. The same number of levels is achieved, with correspondence from both sides at each level. This demonstrates the effectiveness of the proposed algorithm in obtaining a unified representation when decomposing a pair of images.
Figure 3.11: Demonstration of the proposed 2D-BEMD algorithm.
Algorithm 6: 2D-BEMD

z(m,n) = x(m,n) + jy(m,n)
i = 1
zr(m,n) = z(m,n)
while NOT(CheckTerminateCriterion(zr(m,n))) do
    zi(m,n) = zr(m,n)
    while NOT(CheckStopCriterion(zi(m,n))) do
        env(m,n) = [0 0 . . . 0]
        for k = 1 → K do
            ϕk = kπ/K
            envm(m,n) = SmoothFilter(OrderStatisticsFilter(Project(zi(m,n), ϕk)))
            env(m,n) = (1/2)[env(m,n) + e−jϕk envm(m,n)]
        end
        zi(m,n) = zi(m,n) − env(m,n)
    end
    zr(m,n) = zr(m,n) − zi(m,n)
    i = i + 1
end
Algorithm 7: FA-2D-BEMD

z(m,n) = x(m,n) + jy(m,n)
i = 1
zr(m,n) = z(m,n)
while NOT(CheckTerminateCriterion(zr(m,n))) do
    env(m,n) = [0 0 . . . 0]
    for k = 1 → K do
        ϕk = kπ/K
        envm(m,n) = SmoothFilter(OrderStatisticsFilter(Project(zr(m,n), ϕk)))
        env(m,n) = (1/2)[env(m,n) + e−jϕk envm(m,n)]
    end
    zi(m,n) = zr(m,n) − env(m,n)
    zr(m,n) = zr(m,n) − zi(m,n)
    i = i + 1
end
3.3 Two Dimensional Reference EMD
Using the same principle as in R-EMD, it is possible to extend the two-dimensional EMD algorithm to obtain a unified representation among multiple images. Given that the frequency range of a particular type of images can be determined roughly, a set of reference surfaces consisting of sinusoids in both directions can be constructed and used to guide the decomposition. It is expected that this algorithm would achieve similar performance for images as R-EMD does for one-dimensional signals. Although this algorithm was not implemented, since currently none of the applications requires spatial-frequency specific information fusion from more than two images, it is proposed here to complete the framework for obtaining unified representations using EMD.
3.4 Conclusion
In this chapter two algorithms were proposed for obtaining unified signal representations using EMD. Both algorithms go one step further than the state-of-the-art algorithms, with R-EMD for 1D signal decomposition and 2D-BEMD for 2D image decomposition. Since the state of the art differs between the 1D and 2D scenarios, the algorithms proposed here differ in their capabilities.

For 1D signals, existing algorithms are capable of providing a unified representation among pairs, triplets or more signals, but the algorithm complexity grows as more signals are added. R-EMD was proposed to decompose multiple signals under a unified representation with low complexity. For 2D images, existing algorithms do not provide a unified representation for a pair of images. 2D-BEMD was developed to address this problem.
To complete the framework of using reference signals for a unified representation, we also discussed 2D-R-EMD, which will be developed in the future.

For the next three chapters, the focus will be on the application of these proposed algorithms in different areas of biometrics and image processing. In Chapter 4 we present the application of R-EMD to decompose transient evoked otoacoustic emission signals for similarity comparison at each level, so that a biometric recognition system can be established. In Chapters 5 and 6, 2D-BEMD is applied to combine images at multiple levels. By using different combination strategies, we can emphasize details from both images for fusing de-focused images (Chapter 5), or highlight certain features that reside in particular levels for transforming facial expressions (Chapter 6). In these applications a unified representation is preferred because signal comparison or information fusion is carried out at each level, which requires a predictable number of levels and correspondence at each level.
Chapter 4
Otoacoustic Emissions for Biometric
Recognition via R-EMD
In this chapter the R-EMD algorithm proposed in Chapter 3 is employed for multi-level decomposition of transient evoked otoacoustic emission signals for biometric recognition purposes. The unified representations make it possible to carry out multi-level matching and decision on the IMFs from the decomposition. The performance of the proposed system is tested on a dataset we collected specifically for biometric evaluation purposes, and promising results are observed.
4.1 Introduction
Otoacoustic emission (OAE) is a weak acoustic sound generated by an active process in the cochlea; it can be collected easily using a special earphone with a built-in microphone. The transducer in the earphone sends the stimulus into the ear, and the microphone collects the response.
Using OAE as a biometric has several advantages. First of all, its physiological nature makes it robust against falsified credentials, because physiological signals are more difficult to reproduce than traditional biometric features. Furthermore, it can be applied in some special scenarios, for example newborn identification, where face, fingerprint and iris recognition all fail. Last but not least, the proposed framework can be easily integrated into existing OAE recording devices and operate in parallel with diagnostic or monitoring activities.
4.2 Otoacoustic Emissions
Figure 4.1: Click sound stimulus.
The human ear consists of three major parts: the outer ear, middle ear and inner ear. Sound collected by the outer ear travels along the ear canal and hits the ear drum, causing it to vibrate. Through the middle ear it arrives at the inner ear. The basilar membrane, a structure running through the coil of the cochlea, responds to different frequencies in a location-specific way: higher frequencies excite it near the base and lower frequencies near the apex. Along the basilar membrane there are two different types of hair cells: outer hair cells, responsible for the amplification of sound, which is related to OAE generation, and inner hair cells, responsible for transforming sound into electrical pulses that are sent to the brain through the auditory nerves.

Figure 4.2: TEOAE response after pre-processing.
OAE is a by-product of the normal hearing process. In mammals, in order to improve hearing sensitivity, the outer hair cells on the basilar membrane nonlinearly amplify quiet sounds more than loud ones. Vibration of the outer hair cell body results not only in a forward amplification of the sound, but also in a backward response that eventually comes out of the ear canal, which is the source of OAE.
Different types of OAE are distinguished by the stimulus used to generate the response. Transient evoked otoacoustic emission (TEOAE), the signal we investigate, is a response stimulated by a low-level click sound, which is a flat-band signal in the frequency domain. A flat-band signal can excite the entire basilar membrane, so the response is rich in frequency content. In addition, different frequency components arrive at the ear canal at different times, making the whole response long and complex compared to other OAE signals.
TEOAE is used in clinical applications for screening and diagnosis purposes. It has been observed that the TEOAE response differs considerably between individuals, which makes it a possible biometric.

An example of a TEOAE recorded with the Vivosonic Integrity system, after some pre-processing for better visual quality, is shown in Figure 4.2. The stimulus used to generate this signal is shown in Figure 4.1.
4.3 Related Work
A feasibility study of using TEOAE as a biometric modality was conducted by Swabey [22]. The datasets investigated consist of one adult short-term dataset with 23 subjects (recorded within the same session, which is of less value for biometric evaluation purposes), one neonate dataset with 760 subjects (no report on the time interval between the two recording sessions) and one adult long-term dataset with 6 subjects (the time interval between the two sessions was 6 months). Maximum likelihood estimation was employed to approximate the probability density functions of the inter-class and intra-class distances on the raw time-domain signals (after built-in filtering by the device). The reported Equal Error Rates (EER) for the three datasets were 1.24%, 2.29% and 2.35% respectively, each at 90% confidence.
The only dataset for long-term variability evaluation consists of just 6 subjects, so performance on such a dataset cannot be generalized to a larger population. Furthermore, analyzing the TEOAE signal in the time domain is suboptimal, since its frequency-specific features and details are ignored.

To address these problems, we collect data under a setup suitable for biometric evaluation purposes, and propose to use R-EMD for decomposing and analyzing the signal at multiple levels.
Table 4.1: TEOAE recording protocol

Stimulus Parameters
    STI-Mode: Nonlinear
    Click Interval: 21.12 ms
    Click Duration: 80 µs
    Click Level: 80 dB peSPL

Test Control
    Record Window: 2.8–20 ms
    Low-Pass Cut-off: 6000 Hz
    High-Pass Cut-off: 750 Hz
    Artifact Rejection Threshold: 55 dB SPL
4.4 Signal Collection
Signal collection was conducted in the Biometrics Security Laboratory at the University of Toronto, approved under University of Toronto protocol reference number 23018. The Vivosonic Integrity system [47] was used, with protocol details shown in Table 4.1.
To ensure the quality of the recordings without placing too many constraints on the environment, earmuffs were used for noise canceling, but the experiment was set up in an office where people were talking and entering. The participants were instructed to sit in a chair and relax. Details about signal collection and outlier removal are discussed in Appendix B.

After outlier removal, 54 subjects were successfully recorded in both sessions, with at least one week between sessions to validate long-term stability. Most of the subjects are between the ages of 20 and 30. The dataset consists of one response of length 17.2 ms per ear per session for each subject.
4.5 Biometric System Setup
The setup of the biometric system is shown in Figure 4.3. In the enrollment stage, we use the recording from the first session of each subject, pre-process it and store the signature in the system gallery set. In the recognition stage, we take the recording from the second session of one subject, pre-process it and match it against the whole gallery set to find the best fit and claim the identity.
Figure 4.3: Proposed biometric recognition system.
Denote the TEOAE recorded during the first session (enrollment) as {xLk}, {xRk} and the TEOAE recorded during the second session (recognition) as {yLk}, {yRk}, with subject IDs k = 1, 2, · · · , 54 and {L,R} for the left or right ear.
4.6 OAE signal decomposition using R-EMD
R-EMD is used to decompose the TEOAE recordings, with reference frequencies selected according to an auditory model. In addition, high-frequency noise is removed by applying a signal mask according to the frequency-latency relationship. These procedures serve as the preprocessing steps before biometric identification.
4.6.1 Reference frequency selection
According to an auditory model proposed for use with the wavelet transform [23, 24], the generation of TEOAE can be considered as the stimulus passing through a set of band-pass filters (BPF), each corresponding to one point on the basilar membrane and with a nonlinear damping effect. In that work, using a 128-level auditory model, the center frequency of each band is calculated as:

fm = fo/q^m (4.1)

where m = 0, 1, 2, · · · , 127, the base frequency fo = 15165.4 Hz and q = 1.0352952. To apply the R-EMD algorithm, reference frequencies have to be roughly spaced by a factor of 2 from level to level. Taking into account both the auditory model and the reference frequency selection rule, we propose to use the following 8 reference frequencies:

fk = fo/q^(16k−1) (4.2)

with k = 1, 2, · · · , 8, corresponding to levels 15, 31, 47, 63, 79, 95, 111, 127 in the 128-level auditory model.
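For concreteness, the eight reference frequencies can be computed directly from the model constants; the snippet below is a straightforward sketch of Equation (4.2):

```python
# Reference frequencies from the 128-level auditory model, Eq. (4.2)
fo, q = 15165.4, 1.0352952
refs = [fo / q ** (16 * k - 1) for k in range(1, 9)]
# adjacent references are spaced by the constant factor q**16
```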
In addition, since a low-frequency trend with high amplitude resides in every recording, as can be seen in Figure B.3 in Appendix B, we propose to use only the first 4 levels of IMFs to avoid low-frequency noise, rather than using all 8 levels and the residue. This also speeds up the algorithm, since the decomposition can be carried out for the first 4 levels only. Note that this type of hard thresholding can be improved in future work.
4.6.2 Masking
For TEOAE signals, since the high-frequency components exhibit shorter latency and duration, IMF1 and IMF2 are multiplied with a mask to remove high-frequency noise from the recording. The mask is a window centered at 0 ms with a falling cosine tail between 3.9 ms and 6.5 ms:

W(t) =
    1,                               0 ≤ t < 3.9 ms
    (1/2)[1 + cos(π(t − 3.9)/2.6)],  3.9 ms ≤ t < 6.5 ms
    0,                               6.5 ms ≤ t ≤ 17.2 ms
(4.3)
The above two-step procedure is applied to the TEOAE recordings in both the enrollment and recognition sessions. Denote the decomposed signals after applying the mask as {xiLk}, {xiRk} for the enrollment session and {yiLk}, {yiRk} for the recognition session, with subject IDs k = 1, 2, · · · , 54 and IMF index i = 1, 2, 3, 4. Note that in a practical setup, these decomposed signals are stored in the gallery together with the subject ID only during enrollment, since during recognition the subject ID is unknown. An example of a 4-level decomposition after masking is depicted in Figure 4.4.
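Equation (4.3) can be implemented as a simple vectorized window. This is a sketch; a time axis in milliseconds is assumed:

```python
import numpy as np

def teoae_mask(t_ms):
    """Raised-cosine mask of Eq. (4.3): flat to 3.9 ms, zero from 6.5 ms on."""
    t = np.asarray(t_ms, dtype=float)
    w = np.zeros_like(t)
    w[t < 3.9] = 1.0
    tail = (t >= 3.9) & (t < 6.5)
    w[tail] = 0.5 * (1.0 + np.cos(np.pi * (t[tail] - 3.9) / 2.6))
    return w
```

Multiplying IMF1 and IMF2 sample-wise by this window suppresses the portion of the recording after 6.5 ms, where high-frequency noise dominates.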
Figure 4.4: IMF1− 4 from decomposing a raw TEOAE recording.
4.7 Recognition
During recognition, we want to match the input decomposition against all the signals in the gallery to find the best match. To do this, correlations between IMFs at the same level are computed and combined to determine the best-match ID. To combine these correlation values, we take into account two facts. First, correlation coefficients from different levels cannot be combined directly, since each level might have a different baseline value; thus a normalization is needed within each level across the gallery. Second, each level is of different importance for identification purposes. Due to the limited size of the collected dataset, we propose to use empirical weights for the different levels, but this can be improved in future work by learning the best weights on a training set when a larger dataset is available.
4.7.1 Single ear
Recognition can be done with the recording from either the left or the right ear. For simplicity of discussion, we assume the use of the left-ear recording in this section. For a recording yLn from an unknown subject n in the recognition session, we want to find the best-matching identity among the enrolled recordings.
This is done by the following steps:
• Correlation matrix. Correlations between IMFs, together with the corresponding subject ID for each gallery entry, are computed as follows:

C(k, i) = corr(xiLk, yiLn)
I(k, i) = k

with k = 1, 2, · · · , 54 and i = 1, 2, 3, 4, where corr(a, b) denotes the correlation between two vectors a and b.

• Ranked, normalized and weighted correlation matrix. After computing the correlation matrix C54×4 (the subscript denotes the size of the matrix), we sort each column in descending order to get C′54×4. Normalizing each entry of C′54×4 with respect to the first row and keeping the first 3 rows (the 3 highest-ranked matches for each level) yields C3×4, with C(1,i) = 1 and C(k,i) = C′(k,i)/C′(1,i) for k = 1, 2, 3 and i = 1, 2, 3, 4. I54×4 is re-ordered correspondingly and truncated to I3×4.
With empirical weights imposed, the score matrix is computed as:

S3×4 =
    [ 1         0.8         0.8         0.6       ]
    [ C(2,1)    0.8C(2,2)   0.8C(2,3)   0.6C(2,4) ]
    [ C(3,1)    0.8C(3,2)   0.8C(3,3)   0.6C(3,4) ]
• Matching score. Denote the collection of all unique IDs in I3×4 as {Iu}, with 1 ≤ u ≤ N and N ≤ 12. For every Iu ∈ {Iu}, the final score is:

Su = Σ_{(m,n): I(m,n) = Iu} S(m,n)

• Decision. Sort {Su} in descending order, with {Iu} re-ordered accordingly. The 3 best-matched identities are I1, I2 and I3, with corresponding scores S1, S2 and S3. The subject is identified as I1.
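The single-ear procedure above can be sketched as follows. The function name, the dictionary accumulation and the generic gallery size are illustrative assumptions; only the rank/normalize/weight/sum logic follows the text:

```python
import numpy as np

def single_ear_match(C, weights=(1.0, 0.8, 0.8, 0.6), top=3):
    """C[k, i]: correlation of gallery subject k with the probe at IMF level i."""
    scores = {}
    for i in range(C.shape[1]):
        order = np.argsort(-C[:, i])            # rank level i in descending order
        ranked = C[order, i] / C[order[0], i]   # normalize to the level's top match
        for r in range(top):                    # keep the 3 best per level
            sid = int(order[r])
            scores[sid] = scores.get(sid, 0.0) + weights[i] * ranked[r]
    best = max(scores, key=scores.get)
    return best, scores
```

The subject is identified as `best`; the per-ID sums in `scores` are the quantities reused later for two-ear fusion.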
4.7.2 Fusion of left and right ear
A score-level fusion of both ears can be employed to improve system performance. Suppose we have the three best matches from the left-ear enrolled recordings, with identities IL1, IL2, IL3 and scores SL1, SL2, SL3; those from the right ear are denoted IR1, IR2, IR3 and SR1, SR2, SR3.

If the results from both sides agree, that is IL1 = IR1, the final identified subject ID is IL1. If IL1 ≠ IR1, the matched identity is calculated as follows:
• Concatenate the subject IDs and scores:

Ic = [IL1 IL2 IL3 IR1 IR2 IR3]
Sc = [SL1 SL2 SL3 SR1 SR2 SR3]

• Fused matching score. Denote the collection of all unique IDs in Ic as {Iu}. The final scores {Su} are computed as:

Su = Σ_{m: Ic(m) = Iu} Sc(m)

• Decision. Sort {Su} in descending order, with {Iu} re-ordered accordingly. The subject is identified as I1.
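A compact sketch of the two-ear decision rule; the function name and the list representation of the top-3 results are illustrative:

```python
def fuse_ears(ids_L, scores_L, ids_R, scores_R):
    """Score-level fusion of the top-3 lists from the left and right ears."""
    if ids_L[0] == ids_R[0]:            # both ears agree: done
        return ids_L[0]
    total = {}
    for i, s in zip(list(ids_L) + list(ids_R), list(scores_L) + list(scores_R)):
        total[i] = total.get(i, 0.0) + s
    return max(total, key=total.get)    # highest fused score wins
```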
4.8 Experimental Results
Recognition performance is summarized in Table 4.2 for three different scenarios: using the left-ear recording only, using the right-ear recording only, and the fusion of the two ears. Right-ear performance is slightly lower than left-ear performance. One possible cause may be the additive spontaneous otoacoustic emission (SOAE), which might not be as unique to each individual as TEOAE, and which has been shown to coexist with TEOAE and to exhibit greater intensity in the right ear than in the left ear [25]. With the fusion of information from the two ears, a recognition rate of 98.15% is achieved.
4.9 Conclusion
In this chapter a framework for biometric recognition using transient evoked otoacoustic emissions was presented. TEOAE signals from 54 subjects were collected for long-term stability validation purposes. By using the proposed R-EMD with reference frequencies determined by an auditory model, a recognition rate of 98.15% is achieved with the fusion of information from both ears.

Table 4.2: TEOAE biometric recognition performance

Scenario    Correctly Recognized    Performance
Left        52 out of 54            96.30%
Right       49 out of 54            90.74%
Fusion      53 out of 54            98.15%
Chapter 5
Image Fusion via 2D-BEMD
In this chapter a multi-level image fusion scheme is proposed to combine complementary information from partially blurred or defocused images. By using the proposed 2D-BEMD algorithm, which retains the embedded spatial information while addressing level correspondence when decomposing two images, improved fusion results can be obtained. We show the improvement by a lower mean squared error (MSE) on synthesized blurred images and a better visual appearance on real images from a camera.
5.1 Introduction
Image fusion is essentially a problem of combining information from different imagery, in a
way that complementary information from the distinct images is preserved. Image fusion
finds application in numerous areas such as brain imaging [26], aerial imaging [27], visible
and infra-red imaging for night vision [28], terrestrial observation [29] and combination
of visual and thermal images [30].
Traditionally, image fusion is done with the help of Principal Component Analysis (PCA), wavelets, Gaussian pyramids or pixel averaging [30]. In PCA, only high-energy components are retained in the reconstruction, and as a result there is information loss in the combined image. In addition, PCA performs linear projections on stochastic data, and is thus suboptimal. Wavelets require a predefined set of basis images that do not adapt to the particular characteristics of the images under consideration, which does not allow successful fusion to generalize. Pixel averaging, on the other hand, performs a simple low-pass operation, which results in a loss of edge information.
Recently, it was proposed that empirical mode decomposition can be used for image fusion [30–33]. Since EMD decomposes the signal into a number of oscillatory modes that are adaptively defined by the signal itself, it is a natural choice for image fusion. Prior works relied either on BEMD [30, 31, 33] or 2D-EMD [34]. In the BEMD case, the limitation arises from the fact that the two-dimensional data are vectorized, which destroys the spatial information of the image. In the 2D-EMD fusion case, the two images are decomposed separately, and no unified representation after decomposition is guaranteed, meaning that the number and type of detected oscillatory modes are uncertain. In an automatic fusion setting, where scale correspondence is required, this poses a great challenge for combining analogous information from both sides.
In this chapter we address the problem of image fusion, with the proposed 2D-BEMD
that overcomes the above difficulties. The two dimensional nature of the algorithm
preserves the spatial information of the image at all oscillatory modes. Furthermore, the
bivariate aspect of the algorithm allows us to decompose two images simultaneously, and
achieve mode correspondence, which can then be strategically incorporated in a fusion
mechanism.
5.2 Multiscale Fusion with 2D-BEMD
A fusion strategy similar to [31] is employed for the fusion of the real and imaginary IMFs that result from the 2D-BEMD decomposition. Given two partially blurred or defocused images A(m,n) and B(m,n), we form the complex image C(m,n):

C(m,n) = A(m,n) + jB(m,n) (5.1)
Figure 5.1: Top row: partially defocused images A (left) and B (right). Bottom row:
zoomed in details showing the differences between the two images.
By applying 2D-BEMD on C(m,n) we get:

C(m,n) = Σ_{i=1}^{L} Ci(m,n) (5.2)

Alternatively,

C(m,n) = Σ_{i=1}^{L} Ai(m,n) + j Σ_{i=1}^{L} Bi(m,n) (5.3)
Figure 5.1 shows an example of two partially defocused images: one is background
focused and the other is foreground focused. Figures 5.2 and 5.3 show the IMFs
from the 2D-BEMD decomposition of the two images. Note that zoomed-in details are shown
for the IMFs for a better demonstration of the decomposition.
The fused image is obtained by pixel-wise weighting of the two decompositions at each
level and summing across all levels:

F(m,n) = Σ_{i=1}^{L} [α_i(m,n) A_i(m,n) + β_i(m,n) B_i(m,n)]   (5.4)
where for each pixel at every level, the weights satisfy
α_i(m,n) + β_i(m,n) = 1   (5.5)
Figure 5.2: IMFs 1-5 obtained with the proposed 2D-BEMD algorithm for partially
defocused images. Left column: IMFs corresponding to image A. Right column: IMFs
corresponding to image B.
At each level, the weights are determined by comparing local variances between the IMF
from image A and the IMF from image B. Intuitively, more weight is assigned to the pixel
with a higher local variance to emphasize detailed information; as a result, the corresponding
Figure 5.3: IMFs 6-10 obtained with the proposed 2D-BEMD algorithm for partially
defocused images. Left column: IMFs corresponding to image A. Right column: IMFs
corresponding to image B.
pixel from the IMF of the other image will be assigned less weight. The procedure is as
follows:
Figure 5.4: BEMD and 2D-BEMD fusion results on an artificially generated partially
blurred image. Panels: original image; foreground blurred; background blurred; 2D-BEMD
(MSE 0.34); 1D BEMD (MSE 1.81); pixel average (MSE 48.82); wavelet fusion (MSE 0.59).
α_i(m,n) = 0   if var{A_i(m,n)} < var{B_i(m,n)} − ε   (5.6)
α_i(m,n) = 0.5   if |var{A_i(m,n)} − var{B_i(m,n)}| < ε   (5.7)
α_i(m,n) = 1   if var{A_i(m,n)} > var{B_i(m,n)} + ε   (5.8)
where ε is a controllable threshold that determines whether one local variance can be
considered significantly higher than the other. The local variance var{A_i(m,n)} is
calculated over a pre-defined window
var{A_i(m,n)} = Σ_{(p,q)∈win(m,n)} [A_i(p,q) − µ]²   (5.9)

where

µ = (1/|win(m,n)|) Σ_{(p,q)∈win(m,n)} A_i(p,q)   (5.10)

and

win(m,n) = {(p,q) : max(1, m−w) ≤ p ≤ min(M, m+w), max(1, n−w) ≤ q ≤ min(N, n+w)}   (5.11)
for image size M ×N .
5.3 Experimental Results
The performance of the proposed scheme was tested for fusion on both synthesized and
camera-captured images. In our experiments all images are of size 399 × 600, in grayscale
8-bit unsigned integer format. We use threshold ε = 30 and window size w = 40.
These settings are for demonstrating the results only, since the focus is on the application of
the 2D-BEMD algorithm rather than the design of a fusion scheme. More sophisticated
parameter settings can be used in real-world applications for better results.
5.3.1 Results on synthesized blurred images
The proposed algorithm was compared against well-known image fusion strategies, i.e.,
BEMD, pixel averaging and wavelets. For BEMD, the approach in [33] is used. For pixel
averaging, the fused image is obtained by averaging all pixel pairs of the two images.
For the wavelet method, we decompose the two images to level 5 with Symlets-4; approximation
and detail coefficients are merged element-wise by taking the maximum of the two.
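The wavelet baseline's merge rule can be illustrated with a single-level 2D Haar transform (a simplified stand-in for the Symlets-4, level-5 decomposition used in the experiments): corresponding subbands are merged element-wise by taking the maximum, and the fused image is reconstructed by the inverse transform.

```python
import numpy as np

def haar2d(x):
    # one level of the 2D Haar transform: approximation + 3 detail subbands
    a = (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4
    h = (x[0::2, 0::2] - x[1::2, 0::2] + x[0::2, 1::2] - x[1::2, 1::2]) / 4
    v = (x[0::2, 0::2] + x[1::2, 0::2] - x[0::2, 1::2] - x[1::2, 1::2]) / 4
    d = (x[0::2, 0::2] - x[1::2, 0::2] - x[0::2, 1::2] + x[1::2, 1::2]) / 4
    return a, h, v, d

def ihaar2d(a, h, v, d):
    # exact inverse of haar2d
    x = np.empty((2 * a.shape[0], 2 * a.shape[1]))
    x[0::2, 0::2] = a + h + v + d
    x[1::2, 0::2] = a - h + v - d
    x[0::2, 1::2] = a + h - v - d
    x[1::2, 1::2] = a - h - v + d
    return x

def wavelet_fuse(A, B):
    # merge corresponding subbands element-wise by taking the maximum
    merged = [np.maximum(sa, sb) for sa, sb in zip(haar2d(A), haar2d(B))]
    return ihaar2d(*merged)
```

The max rule keeps, per subband coefficient, whichever image contributes the stronger response, which is why the baseline preserves detail better than pixel averaging.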
For comparison purposes, partially blurred images were artificially generated in order
to quantify performance via MSE. Figure 5.4 shows the original image and the two
blurred versions. The foreground and background regions were artificially blurred with
a Gaussian filter of radius 1.5 pixels. Figure 5.4 also shows the reconstruction results for
four fusion methods.
Pixel averaging performs worse than all other fusion methods (MSE 48.82). This
is because it is equivalent to a low-pass operation whereby the detail of the image is
destroyed. The wavelet result is visually satisfying; however, the error is greater than in
the 2D-BEMD case. Although wavelet fusion is by definition multi-scale, the decompo-
sition relies on predefined bases and as such may miss the intrinsic information of the
image. BEMD addresses this problem by analyzing the signal adaptively, but lacks spatial
treatment because the decomposition is performed on vectorized versions of the images.
The proposed 2D-BEMD sufficiently addresses this problem by analyzing the two images
simultaneously in the space domain.
5.3.2 Results on partially focused photos
Figure 5.5 shows two examples of partially defocused images from a camera and the 2D-
BEMD reconstruction result. Despite the complexity of the depicted scenes, the adaptive
nature of the decomposition process manages to capture the finest detail from both
images. High frequency characteristics that are present in the two partially defocused
images are preserved by the decomposition and depicted in the fused result. In addition,
even though both images are crowded and the transitions from foreground to background
blurring occur among objects that physically overlap, the 2D-BEMD reconstruction exhibits
clear transitions from one object to the other.
Figure 5.5: 2D-BEMD fusion results on two partially defocused sets of images.
A further comparison of BEMD and 2D-BEMD is shown in Figure 5.6, where a
zoomed version of the fused reconstruction is provided for comparison. It is clear from
the fusion detail that, without further refinements, vectorization in the case of BEMD
introduces artifacts. On the contrary, the 2D-BEMD manages to transfer details from
both images in an artifact free way.
5.4 Conclusion
This chapter presented the use of 2D-BEMD for image fusion to achieve a unified rep-
resentation while retaining spatial information at each scale. The algorithm decomposes
Figure 5.6: Examples of BEMD and 2D-BEMD based fusion results. The input images
are partially defocused (background versus foreground) while the reconstructed (all in
focus) image of the BEMD case exhibits significant artifacts.
two images simultaneously to provide common oscillatory modes, which benefits image fusion,
where the purpose is to reconstruct an image that gathers details from both sources.
The algorithm was tested on defocused and partially blurred images, demonstrating its
fusion power over other traditional methods and BEMD, in terms of MSE and visual
quality.
Chapter 6
Expression Invariant Face
Recognition via 2D-BEMD
In this chapter an expression-invariant automatic face recognition system is proposed.
2D-BEMD is used for the simultaneous decomposition of two face images in order to apply
multi-level information fusion for expression transformation. With the help of the
expression transformation, within-class variation can be better learned, and the proposed
method shows an improvement over the traditional PCA method.
6.1 Introduction
Despite the significant advances of face recognition over the past two decades, the per-
formance of most algorithms degrades under severe expression variation. Even though
individuals can be enrolled in the biometric system under a desired facial expression (typ-
ically neutral), there is no guarantee that during the recognition mode of operation, the
subject will present his/her face under the same expression. Supervised learning, such as
linear discriminant analysis (LDA) is capable of addressing this problem, by learning the
morphological variations of a particular subject. This solution, however, requires that the
biometric is presented for training under all its variants i.e., under all facial expressions.
Not only is this impractical for real-life systems, but the performance may not
improve, because of the very high-dimensional representation of face images compared
with the relatively small number of training samples.
6.2 Related Works
In [36] the authors discuss the problem of face recognition from one sample image per
subject, and present a review of works done under the illumination, pose, and expression
variation problem. The most important methods proposed to deal with the problem of
recognizing expression-variant faces from one sample image per person include the local
eigenspace approach [37], separation of texture and shape features of the face [38], and
the use of the tensorface concept to transform non-neutral expressions to neutral [39].
All of these methods suffer from some or all of the following shortcomings:
• The expression of the probe image is required to be determined.
• The probe image is required to be warped to all the gallery images.
• Facial landmark points of the stored and probe images are required to be selected
to fit 2D triangulated meshes.
These requirements result in a time-consuming and error-prone process [39].
In this chapter we propose a solution based on expression transformation with the
2D-BEMD. Among the strengths of this method is that the expression of the probe does
not need to be determined, while a subject can be enrolled in the system using only one
image. This image is then used to synthesize new expressions for the enrollee, so that
the classifier can learn intra-class variability. With this treatment, a probe image with an
arbitrary expression can be recognized by the system.
6.3 Expression Transformation with 2D-BEMD
In order to transform expressions while retaining the anatomical properties of the face,
the expression-related information needs to be separated first. We can then replace the
expression-related information in a face image with that of the target expression, by
applying 2D-BEMD to the face image and the expression mask and fusing information at
different levels.
6.3.1 Expression mask
Under the assumption that local oscillations are related to particular expressions, we
propose the design of expression masks i.e., masks of superimposed oscillations that are
significant in differentiating expressions, and which do not carry subject-specific infor-
mation. To this end, a first step in the analysis of facial expressions is to identify the
oscillatory modes that are descriptive for every expression. We herein deal with the
problem of expression transformation among six emotions (happiness, surprise, fear, disgust,
anger and sadness). An expression mask is designed according to the following steps:
1. 2D-EMD analysis on images of the expression subset x.
2. Estimation of the within-class variability among corresponding IMFs.
3. Identification of the IMF level i with the lowest intra-expression variability.
4. Mask construction by averaging the ith IMFs (low-pass filtering to remove inter-
subject dependence).
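The four steps above can be sketched as follows. The decomposition here is a hypothetical stand-in (`decompose`, a difference-of-Gaussians layering) since a full 2D-EMD implementation is outside the scope of this sketch; the point is the mask-construction logic of steps 2–4.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def decompose(img, levels=4):
    # stand-in for 2D-EMD: difference-of-Gaussians layers, fine to coarse
    # (illustration only -- real 2D-EMD extracts data-adaptive modes)
    imfs, residue = [], np.asarray(img, float)
    for i in range(levels):
        smooth = gaussian_filter(residue, sigma=2.0 ** i)
        imfs.append(residue - smooth)
        residue = smooth
    return imfs  # the final residue is dropped for brevity

def build_mask(subset, levels=4, smooth_sigma=3.0):
    # Steps 1-4: decompose every image of one expression class, find the
    # level with the lowest within-class variability, average it, then
    # low-pass to suppress subject-specific detail
    decs = [decompose(img, levels) for img in subset]
    stacks = [np.stack([d[i] for d in decs]) for i in range(levels)]
    variability = [s.var(axis=0).mean() for s in stacks]
    best = int(np.argmin(variability))
    return gaussian_filter(stacks[best].mean(axis=0), smooth_sigma)
```

`subset` would be the registered images of one expression class; the returned array plays the role of the masks in Figure 6.1.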
Figure 6.1 shows the designed expression masks. It is interesting to note that, although
the expression within each mask is recognizable, the identity information is absent.
Happy Mask Surprise Mask Fear Mask Disgust Mask Angry Mask Sad Mask
Figure 6.1: Expression masks used for decomposition with an input image of arbitrary
expression.
6.3.2 Expression transformation
The next step is to decompose an input image simultaneously with the mask of a targeted
expression. The objective is to fuse the IMFs of the two in a way that only the
anatomical and subject-specific information is kept from the input side, while the
expression is replaced with that of the mask. The 2D-BEMD algorithm proposed in Chapter 3 is
utilized for this task, because it allows for a unified representation between the mask and
the input image, while applying 2D-EMD separately on the mask and input fails to do
so.
This procedure is usually referred to as fusion via fission [30, 32]. Although facial
oscillations with EMD have been explored thoroughly [32, 40–42], there is no report on
its ability to separate the expression from a random face image. By treating the mask as
the targeted oscillation, and by decomposing it together with the input via 2D-BEMD,
we expect the targeted mode to exist among the IMFs of the input.
More precisely, given an input image I(m,n) and an expression mask M(m,n), we
form the complex image C(m,n)
C(m,n) = I(m,n) + jM(m,n) (6.1)
By applying 2D-BEMD on C(m,n) we get
C(m,n) = Σ_{i=1}^{L} I_i(m,n) + j Σ_{i=1}^{L} M_i(m,n)   (6.2)
Figure 6.2: 2D-BEMD analysis for an input image and a surprised mask image. The
edges of the input are among the first few IMFs, while most of the information of the
mask is found in IMFs 5 and 6.
Figure 6.2 shows an example of IMFs acquired from a 2D-BEMD analysis of an input
and a surprise mask.
For reconstruction, the magnitude of both the real and imaginary IMFs is summed,
to form a new image R(m,n), with the anatomical characteristics of the input and the
expression of the mask. Weights are defined based on the correlation coefficient of the
mask M(m,n) with its IMFs M i(m,n), as follows:
w_mask(i) = E[(M(m,n) − µ_M)(M_i(m,n) − µ_{M_i})] / (σ_M σ_{M_i})   (6.3)
Since the mask is constructed using a number of moderate oscillations, the low-order
imaginary IMFs of the 2D-BEMD analysis (corresponding to the fastest oscillations) are
expected to exhibit low correlation with the mask. Therefore, the weighting function
relies on the correlation coefficient to minimize the effect of low-order IMFs, while
emphasizing the actual oscillations of the mask.
On the input side, an inverse treatment is required. It has been observed [32, 42] that
low-order IMFs carry most of the edge information in the image, which is directly related
to the anatomical properties of faces. Thus, automatically emphasizing this information
is crucial for accurate face recognition. In addition, the expression information of the
input needs to be suppressed. Based on these requirements, the following weighting
Figure 6.3: Weights used in fusion.
function is used for the IMFs of the input:
w_input(i) = 1 − w_mask(i)   (6.4)
Figure 6.3 shows how the weights are determined for the input and the mask IMFs
in Figure 6.2. The reconstructed image can then be computed as:
R(m,n) = Σ_{i=1}^{L} [w_input(i) I_i(m,n) + w_mask(i) M_i(m,n)]   (6.5)
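Given the two IMF stacks from the 2D-BEMD analysis, the weighting and reconstruction of Eqs. (6.3)–(6.5) reduce to a few lines. This sketch assumes the IMFs are already available as lists of equally sized arrays; the helper names are illustrative.

```python
import numpy as np

def corr(a, b):
    # correlation coefficient of two images (Eq. 6.3)
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def expression_transform(input_imfs, mask, mask_imfs):
    # emphasize mask IMFs that resemble the mask, input IMFs elsewhere
    w_mask = np.array([corr(mask, Mi) for Mi in mask_imfs])
    w_input = 1.0 - w_mask                       # Eq. (6.4)
    return sum(wi * Ii + wm * Mi                 # Eq. (6.5)
               for wi, Ii, wm, Mi in zip(w_input, input_imfs,
                                         w_mask, mask_imfs))
```

Levels whose mask IMF correlates strongly with the mask (the "expression" scales) are drawn from the mask side; the remaining levels, dominated by edges and anatomy, are drawn from the input side.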
Figure 6.4 shows examples of expression reconstructions for a number of images, along
with the ground truth, i.e., the actual image of the transformed person.
6.4 Experimental setup and results
To evaluate the performance of the proposed expression invariant face recognition sys-
tem, we used the Cohn-Kanade database [43], which is currently the most comprehensive
facial expression database. This database contains video sequences from 97 people, per-
forming a series of 1 to 9 facial expressions. Every video sequence starts from a neutral
Figure 6.4: Examples of expression transformation with 2D-BEMD. From top to bottom:
input images, expression masks, transformed faces and ground truth images.
Figure 6.5: The enrollment pipeline. Every gallery image to be enrolled is first used to
synthesize 6 expression variants.
Figure 6.6: Verification rate versus false acceptance rate, for gallery of size 10.
Figure 6.7: Verification rate versus false acceptance rate, for gallery of size 20.
expression and ends at a target expression. We selected the last frame, which corresponds
to the most intense expression, of video sequences displaying any of the following
six expressions: happy, surprise, fear, disgust, anger and sadness. We also selected the
first frame of one of the video sequences of each person as the neutral expression. The
Figure 6.8: Verification rate versus false acceptance rate, for gallery of size 40.
resulting dataset consists of 500 images of 96 people, where each person has at least two
different expressions. The images were aligned using the eye-center coordinates, masked,
and cropped to a size of 273 by 204 pixels. The final images are normalized to
have zero mean and unit variance.
For face recognition, we are considering the following scenario. There is a gallery
of neutral images with one image per subject, and there is a probe set with images of
random (non-neutral) expressions. For every image in the gallery set, we synthesize 6
additional expression images, using the proposed method. The reconstructed images,
along with the gallery one, are used to train the LDA classifier [44]. Figure 6.5 shows the
block diagram of the training phase for this system. The resulting discriminant projection
directions are used to extract the expression-invariant features of the gallery and probe
images for verification.
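The enrollment and verification pipeline can be sketched as follows. Low-dimensional feature vectors stand in for the (vectorized, dimensionality-reduced) face images, `synthesize_expressions` is a placeholder for the 2D-BEMD expression transformation of Section 6.3, and the LDA is a minimal Fisher implementation rather than the one of [44].

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_expressions(gallery_vec, n=6):
    # placeholder for the 2D-BEMD expression transformation (Section 6.3)
    return [gallery_vec + 0.1 * rng.standard_normal(gallery_vec.shape)
            for _ in range(n)]

def lda_directions(X, y, n_dirs):
    # Fisher LDA: directions maximizing between-class over within-class scatter
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)
    return evecs.real[:, order[:n_dirs]]

def cosine_score(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# enrollment: each gallery vector plus its six synthesized variants is a class
gallery = [rng.standard_normal(20) for _ in range(5)]
X = np.vstack([np.vstack([g] + synthesize_expressions(g)) for g in gallery])
y = np.repeat(np.arange(5), 7)
W = lda_directions(X, y, n_dirs=4)

# verification: project gallery and probe, then compare by cosine similarity
probe = gallery[2] + 0.1 * rng.standard_normal(20)   # unseen expression
scores = [cosine_score(W.T @ probe, W.T @ g) for g in gallery]
```

The synthesized variants give LDA the within-class scatter it needs even though only one real image per subject is enrolled, which is the crux of the proposed scheme.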
Figures 6.6, 6.7 and 6.8 show the ROC curves for the proposed method and the
eigenface method, for performance comparison. For the implementation of the eigenface
method, the PCA coefficients were weighted by their corresponding eigenvalues, as this
improves the recognition performance under expression variation [12]. The number of
eigenfaces that correspond to 99% of the eigenvalue energy was used. The performance
is reported for three different gallery sizes, i.e., 10, 20 and 40. In each case, gallery images
were randomly selected out of 96. The random selection procedure was repeated such
that almost all pairs of images in the dataset were matched against each other. The
cosine distance was used as the similarity measure. Figure 6.6 to Figure 6.8 demonstrate
the robustness of the system to changes of the gallery size.
6.5 Conclusion
This chapter presented the problem of expression invariant face recognition, under the
scenario of one sample image per gallery subject. We advocate that within subject ex-
pression variability can be learned by the classifier, using synthesized expressions. To that
end, we employ the proposed 2D-BEMD, in order to analyze and separate the anatomical
features of the face from the expression. Expression masks of particular oscillatory activ-
ity are designed to be decomposed simultaneously with the gallery image. A fusion via
fission approach is adopted, to synthesize a new image, where only the subject-specific
information is retained from the gallery, while the expression characteristics are passed
from the mask. When the gallery and its synthesized expressions are used to train the
LDA, significant recognition improvement is observed.
Chapter 7
Conclusion
7.1 Summary
In this thesis, a framework for obtaining unified signal representation using empirical
mode decomposition was proposed. The representation or “uniqueness” problem was
investigated in both the one-dimensional and two-dimensional cases, showing that results
from state of the art algorithms are far from satisfactory.
For the one-dimensional case, current algorithms are capable of obtaining a unified
representation for a pair, a triplet, or even a collection of signals. The problem is that the
computational complexity grows as the number of signals increases. Furthermore, decomposing
a new signal requires re-decomposing all the signals in the collection, which is not practical.
To address the problem, we proposed reference EMD (R-EMD), which incorporates a set
of reference signals to decompose each signal in the collection, thus obtaining a unified
representation for all signals. We also demonstrated that the reference frequencies need to be
designed according to certain rules, with small variations allowed and determined by the
signal properties.
The proposed R-EMD algorithm was applied in decomposing TEOAE signals in a
biometric system. The unified representation across all the subjects in the system makes
it possible to apply multi-level correlation score fusion in a systematic way, which is
not feasible with any of the current EMD algorithms.
For the two-dimensional case, EMD-related algorithms are only capable of decomposing
one image at a time, resulting in a non-unified representation even for a pair of signals.
We addressed this problem by introducing the two-dimensional bivariate EMD
(2D-BEMD), which decomposes two images as a complex pair. By doing so, a unified
representation can be obtained, with the same number of levels for both images and
correspondence between IMFs at the same level.
The proposed 2D-BEMD algorithm was applied in multiresolution image fusion and
expression invariant face recognition. In both applications, 2D-BEMD was used for
information fusion from two images (multi-focus images for the former, face images with
different expressions for the latter) which requires unified representation for both images.
In addition to the 2D-BEMD algorithm, a 2D-R-EMD was also discussed as an extension
for more than two images. Although no application was presented, 2D-R-EMD
completes the proposed framework.
7.2 Future Directions
As an extension of the thesis work, there are two major topics.
1. To complete the proposed framework, the following work needs to be done:
• 2D-R-EMD needs to be implemented in order to complete the proposed framework.
This algorithm will find application in multichannel data fusion for images.
• For R-EMD, since reference signals are used, we sacrifice a certain degree of
adaptiveness. A quantified measure of the tradeoff between adaptiveness and unified
representation would be beneficial.
Chapter 7. Conclusion 87
2. In terms of applications these are possible directions:
• For using TEOAE for biometric recognition, a dataset with more subjects will
need to be collected. Once the dataset is ready, the empirical weights used
for combining correlations in the proposed method can be determined from a
training procedure. In addition, other operation modes of the biometric system,
such as authentication and intruder testing, need to be investigated. Moreover, in
order for TEOAE to be established as a biometric modality, its short-term and
long-term variability need to be studied to justify why this signal is individually
unique.
• For image fusion related applications, the proposed method needs to be tested
on a broader range of applications, such as the fusion of visual and thermal
images. In addition, 2D-BEMD can be combined with more sophisticated
fusion schemes to improve the performance.
• For expression invariant face recognition, we need to compare the proposed
system with other state of the art methods. Furthermore, instead of assigning
weight to the entire IMF, block-wise weighting can be used to fuse information
according to a local rather than global criterion.
With the applications and promising results presented in this thesis, we believe that
the proposed framework will benefit many applications in signal and image processing,
as well as provide a better understanding of the EMD algorithm.
Appendix A
Demonstration of the sifting process
In order to gain better insight into how the sifting process acts as the core of the EMD
algorithm and how the IMFs are extracted, in this appendix a simple synthesized signal is
decomposed and every major step of the sifting iterations is shown. As presented
in Chapter 2, the EMD algorithm consists of two major loops: the outer loop
identifies the IMF and updates the residue, and the inner loop extracts the IMF, which
is done by the sifting process.
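The two loops can be sketched as follows. This is a minimal 1D illustration, not the implementation of [45]: extrema are interpolated with cubic splines, the signal endpoints are appended to both envelopes as a crude boundary treatment, and a simple Cauchy-type criterion stands in for the stopping rule of Equation 2.1.4.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def mean_envelope(x):
    # spline-interpolate interior maxima/minima; return the envelope mean,
    # or None when there are too few extrema to build both envelopes
    d = np.diff(x)
    maxima = 1 + np.where((d[:-1] > 0) & (d[1:] <= 0))[0]
    minima = 1 + np.where((d[:-1] < 0) & (d[1:] >= 0))[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    up = CubicSpline(np.r_[0, maxima, len(x) - 1],
                     np.r_[x[0], x[maxima], x[-1]])(t)
    lo = CubicSpline(np.r_[0, minima, len(x) - 1],
                     np.r_[x[0], x[minima], x[-1]])(t)
    return (up + lo) / 2

def emd(x, max_imfs=8, max_sift=50, sd_tol=0.2):
    imfs, residue = [], np.asarray(x, float)
    for _ in range(max_imfs):              # outer loop: one IMF per pass
        if mean_envelope(residue) is None:
            break                          # too few extrema: final residue
        h = residue
        for _ in range(max_sift):          # inner loop: sifting
            m = mean_envelope(h)
            if m is None:
                break
            prev, h = h, h - m
            # Cauchy-type stop: small relative change between iterations
            if np.sum((prev - h) ** 2) / (np.sum(prev ** 2) + 1e-12) < sd_tol:
                break
        imfs.append(h)
        residue = residue - h
    return imfs, residue
```

By construction the IMFs plus the final residue sum back to the input exactly, mirroring the completeness property used throughout this appendix.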
All plots except Figure A.1 in this appendix are generated using the EMD package
by Flandrin [45]. The synthesized signal is shown in Figure A.1; it is generated by
creating four uniformly distributed random time series of different lengths, interpolating
each of them to the same length, and summing all four. Its IMFs and residue after applying
EMD are shown in Figure A.2.
The first step is to interpolate the local maxima and minima from the original signal,
in order to get an upper envelope and a lower envelope. These envelopes are shown as
dashed lines in the top plot in Figure A.3. The mean envelope is computed as the average
of the upper and lower envelope and is shown as the line in between the two. After
subtracting the mean envelope from the signal, a candidate IMF is obtained and is
shown in the third plot in Figure A.3. Subtracting the candidate IMF from the original
88
Figure A.1: Signal to be decomposed.
Figure A.2: IMFs and the residue after decomposition.
signal gives us the residue for the current iteration, as shown in the last plot in Figure
A.3. In fact this residue is equal to the mean envelope, but this is only true for the first
iteration in the sifting process for every IMF, as we will see later.
Figure A.3: Sifting iteration 0 for Candidate IMF 1.
The stop parameter is checked according to Equation 2.1.4. In the second plot of each
figure, the solid line is the amplitude ratio between the upper and lower envelopes,
|env_u(n) + env_l(n)| / |env_u(n) − env_l(n)|, and the dashed line is δ1. The following
parameters are used: δ1 = 0.05, δ2 = 0.5, δt = 0.05.
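Under these definitions, the check can be sketched as a small predicate. This is an illustrative reading of the three-parameter criterion described above: the ratio must fall below δ1 on at least a (1 − δt) fraction of the samples and below δ2 everywhere.

```python
import numpy as np

def sifting_should_stop(env_u, env_l, d1=0.05, d2=0.5, dt=0.05):
    # sigma(n): mean-envelope amplitude relative to the envelope range
    sigma = np.abs(env_u + env_l) / (np.abs(env_u - env_l) + 1e-12)
    small_enough = np.mean(sigma >= d1) <= dt  # sigma < d1 almost everywhere
    bounded = np.all(sigma < d2)               # and never large anywhere
    return bool(small_enough and bounded)
```

Intuitively, sifting stops once the envelopes are nearly symmetric about zero almost everywhere, while δ2 tolerates small regions of locally large deviation.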
Since large values exist for the envelope amplitude ratio, as can be seen in the second
plot in Figure A.3, the stop criterion is not satisfied, so the sifting process needs to
continue. By interpolating the local maxima and minima of the residue in Figure
A.3, upper and lower envelopes are constructed, as shown in the first plot in Figure
A.4. Again, we remove the mean envelope to get a candidate IMF1 and test whether the stop
criterion is satisfied. Note that here the residue is the original signal from iteration 0 minus
the candidate IMF1, so it is equal to the sum of the two mean envelopes from iterations
0 and 1.
Figure A.4: Sifting iteration 1 for Candidate IMF 1.
Since the stop criterion is not satisfied, the sifting process continues. Figure A.5 shows
iteration 10, where the stop parameter is small enough that the candidate IMF1 can
be considered a valid IMF. This is where the sifting process stops for IMF1.
Now the termination criterion is checked to see whether the current residue is the final residue
for the signal. Since the residue has more than 3 local extrema, it is not the final residue;
IMF1 is then removed from the original signal from iteration 0, and the residue acts as
the input to the next step, where IMF2 will be extracted.
Figure A.5: Sifting iteration 10 for Candidate IMF 1.
Upper and lower envelopes are constructed on the current signal, which is the residue
after extracting IMF1. Now candidate IMF2 is the current signal minus the mean envelope.
Similar to the previous iterations for IMF1, the stop parameter is checked to see whether the
candidate IMF2 is valid. Since it is not, the sifting process continues on the residue
from this iteration.
Similar to the last sifting iteration for IMF1, at iteration 10 the IMF criterion is satisfied,
so the candidate IMF2 is considered valid. This is where the sifting process stops for IMF2.
Now the termination criterion is checked to see whether the current residue is the final residue
Figure A.6: Sifting iteration 0 for Candidate IMF 2.
for the signal. Since the residue has more than 3 local extrema, it is not the final residue.
By removing IMF2 from the signal at iteration 0, we get the residue that is
passed on to the next level.
From Figure A.8 to Figure A.11 we show some of the major steps of extracting the IMFs
up to the final residue. Note that in Figure A.11, after the stop criterion is checked to
validate the candidate IMF6, the termination criterion is checked on the residue to
determine that it can no longer be decomposed, so it can be considered the final
residue of the signal. This is where the entire algorithm stops.
Figure A.7: Sifting iteration 10 for Candidate IMF 2.
Figure A.8: Sifting iteration 0 for Candidate IMF 3.
Figure A.9: Sifting iteration 7 for Candidate IMF 3.
Figure A.10: Sifting iteration 0 for Candidate IMF 6.
Figure A.11: Sifting iteration 3 for Candidate IMF 6.
Appendix B
TEOAE dataset
In order to evaluate the proposed method of using transient evoked otoacoustic emission
(TEOAE) signals for biometric recognition, a dataset was collected at the Biometric
Security Laboratory, University of Toronto. This is the first TEOAE dataset with a moderate
number of subjects, created specifically for biometric evaluation purposes.
B.1 Data collection setup
The signal collection sessions were carried out at the Biometric Security Laboratory [46], under University of Toronto protocol #23018. The Vivosonic Integrity [47] system was used with the settings listed in Table 4.1.
Since the TEOAE signal is highly nonlinear and has a weak amplitude, the raw response data are subjected to pre-processing before they can be collected as output, as is standard in the nonlinear protocol for TEOAE recording. In this protocol, a train of stimuli consisting of identical series of four clicks is sent into the ear. Here we distinguish responses at four different pre-processing stages:
• Per-sweep response
Each stimulus is an 80µs wide click sound, called a click or a sweep, as depicted in Figure B.1. The per-sweep response is the response collected at the outer ear canal for a duration of 17.2ms after the stimulus. This response is not available for output in the Integrity system we used.
• Per-series response
Each series of stimuli consists of four stimuli: three identical ones and one with opposite polarity and three times the amplitude. The per-series response is the average of these four responses, with the purpose of cancelling out the linear components in the original per-stimulus response. This response is not available for output in the Integrity system we used.
• Per-buffer response
In our setting, per-series responses are divided into groups of 16, with responses 1, 3, 5, 7, 9, 11, 13, 15 going into buffer A and responses 2, 4, 6, 8, 10, 12, 14, 16 going into buffer B. The per-buffer response is the average over the 8 per-series responses in a buffer, and is the one we can collect using the Integrity system. An example of such a response is shown in Figure B.2. To measure whether a stable TEOAE has been detected, the per-buffer responses from A and B are filtered, and their correlation is the whole wave reproducibility (WWR).
• Buffer averaged response
Finally, the two per-buffer responses from the same group are averaged to obtain the buffer averaged response. An example of such a response is shown in Figure B.3.
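The buffering and WWR steps above can be sketched as follows. The synthetic per-series responses, their sizes, and the omission of the filtering step are assumptions for illustration only, not the Integrity system's actual processing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-series responses: one group of 16, each a noisy copy of a
# common emission waveform (sizes and noise level are arbitrary choices).
n_series, n_samples = 16, 512
emission = np.sin(2 * np.pi * np.arange(n_samples) / 64.0)
per_series = emission + 0.5 * rng.standard_normal((n_series, n_samples))

# Responses 1, 3, 5, ... go into buffer A; responses 2, 4, 6, ... into
# buffer B. Each per-buffer response averages its 8 per-series responses.
buffer_a = per_series[0::2].mean(axis=0)
buffer_b = per_series[1::2].mean(axis=0)

# The buffer averaged response averages the two per-buffer responses.
buffer_avg = 0.5 * (buffer_a + buffer_b)

# WWR: correlation between the two per-buffer responses (the real protocol
# filters them first), reported as a percentage.
wwr = 100.0 * np.corrcoef(buffer_a, buffer_b)[0, 1]
print(f"WWR = {wwr:.1f}%")  # high when a stable emission is present
```

Because the noise in buffers A and B is independent while the emission is common to both, their correlation rises toward 100% as the emission stabilizes, which is what makes WWR a useful stopping measure.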
Note that these responses have a high-amplitude, low frequency trend, so the structure of the signal cannot be observed. Figure B.4 shows the same buffer averaged response as in Figure B.3, with the trend removed for better visual quality.
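One simple way to remove such a trend for visualization is to subtract a moving-average estimate of it. This is only a plausible stand-in for the detrending used to produce Figure B.4; the window length and padding mode are arbitrary:

```python
import numpy as np

def remove_trend(x, win=101):
    """Subtract a moving-average estimate of the low frequency trend."""
    kernel = np.ones(win) / win
    pad = win // 2
    # Edge-padding keeps the output the same length as the input.
    trend = np.convolve(np.pad(x, pad, mode="edge"), kernel, mode="valid")
    return x - trend

t = np.arange(1024)
fine = np.sin(2 * np.pi * t / 32.0)   # fine structure of interest
drift = 3.0 * np.exp(-t / 400.0)      # high-amplitude low frequency trend
detrended = remove_trend(fine + drift)

# After detrending, the signal closely tracks the fine structure again.
print(np.corrcoef(detrended, fine)[0, 1])
```

The moving-average window must be much longer than the oscillations of interest so that the estimate captures only the slow drift.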
Each time three signals are recorded: the stimulus, the per-buffer response in buffer A, and the per-buffer response in buffer B. The length of a recording session depends on how fast
Figure B.1: Example of a stimulus. [Axes: Time (ms) vs. Amplitude (mPa).]
Figure B.2: Example of 2 per-buffer responses from the same group. [Panels: Buffer A and Buffer B; axes: Time (ms) vs. Amplitude (mPa).]
Figure B.3: Example of a buffer averaged response. [Axes: Time (ms) vs. Amplitude (mPa).]
Figure B.4: Example of a buffer averaged response with the low frequency trend removed. [Axes: Time (ms) vs. Amplitude (mPa).]
the response stabilizes, which varies with the subject, body condition, body movement, ear bud fitting, environmental noise, etc. The person who operates the Integrity device watches the whole wave reproducibility (WWR) measure on the system and stops recording whenever the WWR is high enough (above 90%) or saturates at some particular value. The stabilization times of all the recordings, including those from outliers, are summarized in Figure B.5.
Figure B.5: Histogram of stabilization time for recordings from all subjects, including outliers. [Axes: Stabilization Time (sec) vs. Number of subjects; Mean = 39.246, Std = 24.3503.]
B.2 Intra-subject similarity and inter-subject difference
After applying the proposed R-EMD, intra-subject similarity and inter-subject difference can be observed in the decomposed signals. Figure B.6 shows 4-level decomposed recordings from the left ear of subject 3, with solid lines representing the first session and dashed lines the second session. In Figure B.7 the same first session recording is plotted in solid lines, together with the decomposed second session recording from the left ear of subject 2 in dashed lines.
Figure B.6: Similarity of TEOAE recordings from the same subject after applying R-EMD. [Four panels; axes: Time (ms) vs. Amplitude (mPa).]
B.3 Outlier removal
The highest WWR from the left and right ears in both sessions for all subjects is summarized in Figure B.8. We removed the recordings from subject 3 since the wrong testing protocol was used by mistake in session one. A total of 6 outliers were also removed from the dataset because their WWRs were too low compared to the clinical standard of 85% [48]. The subject IDs of the outliers are 23, 37, 40, 44, 55, 60, and their WWRs are listed in Table B.3.
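The removal rule for the six low-WWR outliers can be sketched as follows, using the WWR values from Table B.3. Treating a subject as an outlier when any of the four recordings falls below the threshold is an assumption about how the rule was applied:

```python
# WWRs of the outlier subjects (left/right ear, sessions 1 and 2),
# reproduced from Table B.3.
outlier_wwr = {
    23: (79.82, 81.84, 80.74, 67.87),
    37: (8.12, 29.39, 81.81, 83.04),
    40: (22.91, 32.91, 56.30, 58.58),
    44: (48.52, 71.58, 22.84, 24.17),
    55: (38.04, 40.80, 62.61, 72.50),
    60: (63.70, 32.78, 80.78, 68.36),
}

CLINICAL_WWR = 85.0  # clinical standard cited in the text [48]

def is_outlier(wwrs, threshold=CLINICAL_WWR):
    # Flag a subject when any recording fails to reach the threshold.
    return any(w < threshold for w in wwrs)

removed = sorted(sid for sid, wwrs in outlier_wwr.items() if is_outlier(wwrs))
print(removed)  # [23, 37, 40, 44, 55, 60]
```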
Figure B.7: Difference of TEOAE recordings from different subjects after applying R-EMD. [Four panels; axes: Time (ms) vs. Amplitude (mPa).]
Figure B.8: Histogram of WWR for recordings from all subjects, including outliers. [Axes: WWR vs. Number of Recordings; Mean = 86.6228, Std = 12.2601.]
Table B.3: WWRs of the outlier subjects.

Subject ID   Left Session 1   Left Session 2   Right Session 1   Right Session 2
23           79.82            81.84            80.74             67.87
37            8.12            29.39            81.81             83.04
40           22.91            32.91            56.30             58.58
44           48.52            71.58            22.84             24.17
55           38.04            40.80            62.61             72.50
60           63.70            32.78            80.78             68.36
Bibliography
[1] N.E. Huang, Z. Shen, R.R. Long, M.L. Wu, Q. Zheng, N.C. Yen, and C.C. Tung. The empirical mode decomposition and Hilbert spectrum for nonlinear and nonstationary time series analysis. Proc. Roy. Soc. London, 454:903–995, 1998.
[2] K.T. Coughlin and K.K. Tung. 11-year solar cycle in the stratosphere extracted by the empirical mode decomposition method. Advances in Space Research, 34(2):323–329, 2004.
[3] Zhaohua Wu, Edwin K. Schneider, and Zeng-Zhen Hu. The impact of global warming on ENSO variability in climate records. Transform, 2001.
[4] Binwei Weng, M. Blanco-Velasco, and K.E. Barner. ECG denoising based on the empirical mode decomposition. In Engineering in Medicine and Biology Society, 2006. EMBS '06. 28th Annual International Conference of the IEEE, pages 1–4, Aug. 30 – Sept. 3, 2006.
[5] Pengfei Wei, Qiuhua Li, and Guanglin Li. Classifying motor imagery EEG by empirical mode decomposition based on a spatial-time-frequency joint analysis approach. In BioMedical Information Engineering, 2009. FBIE 2009. International Conference on Future, pages 489–492, dec. 2009.
[6] B. Liu, S. Riemenschneider, and Y. Xu. Gearbox fault diagnosis using empirical mode decomposition and Hilbert spectrum. Mechanical Systems and Signal Processing, 20(3):718–734, 2006.
[7] Kousik Guhathakurta, Indranil Mukherjee, and A. Roy Chowdhury. Empirical mode
decomposition analysis of two different financial time series and their comparison.
Chaos, Solitons and Fractals, 37(4):1214 – 1227, 2008.
[8] E. Delechelle, J. Lemoine, and Oumar Niang. Empirical mode decomposition: an
analytical approach for sifting process. Signal Processing Letters, IEEE, 12(11):764
– 767, nov. 2005.
[9] G. Rilling and P. Flandrin. One or two frequencies? The empirical mode decomposition answers. Signal Processing, IEEE Transactions on, 56(1):85–95, jan. 2008.
[10] P. Flandrin, G. Rilling, and P. Goncalves. Empirical mode decomposition as a filter
bank. Signal Processing Letters, IEEE, 11(2):112 – 114, feb. 2004.
[11] N.E. Huang and S.S. Shen. Hilbert-Huang Transform and its Applications, volume 5. World Scientific Publishing Co. Pte. Ltd., 2005.

[12] H. Mohammadzade, F. Agrafioti, Jiexin Gao, and D. Hatzinakos. BEMD for expression transformation in face recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 1501–1504, may 2011.

[13] F. Agrafioti, Jiexin Gao, H. Mohammadzade, and D. Hatzinakos. A 2D bivariate EMD algorithm for image fusion. In Digital Signal Processing (DSP), 2011 17th International Conference on, pages 1–6, july 2011.
[14] Jiexin Gao, F. Agrafioti, S. Wang, and D. Hatzinakos. Transient otoacoustic
emissions for biometric recognition. In Acoustics, Speech and Signal Processing
(ICASSP), 2012 IEEE International Conference on, march 2012.
[15] Jiexin Gao and D. Hatzinakos. Effect of initial phase in two tone separation using em-
pirical mode decomposition. In Acoustics, Speech and Signal Processing (ICASSP),
2012 IEEE International Conference on, march 2012.
[16] J.C. Nunes, Y. Bouaoune, E. Delechelle, O. Niang, and Ph. Bunel. Image analysis by bidimensional empirical mode decomposition. Image and Vision Computing, 21(12):1019–1026, 2003.

[17] Anna Linderhed. 2D empirical mode decompositions in the spirit of image compression. volume 4738, pages 1–8. SPIE, 2002.
[18] S. M. A. Bhuiyan, R. R. Adhami, and J. F. Khan. Fast and adaptive bidimensional
empirical mode decomposition using order-statistics filter based envelope estimation.
EURASIP Journal on Advances in Signal Processing, 2008.
[19] G. Rilling, P. Flandrin, P. Goncalves, and J.M. Lilly. Bivariate empirical mode decomposition. IEEE Signal Processing Letters, 14(12):936–939, Dec. 2007.

[20] N. ur Rehman and D.P. Mandic. Empirical mode decomposition for trivariate signals. Signal Processing, IEEE Transactions on, 58(3):1059–1068, march 2010.
[21] N. Rehman and D. P. Mandic. Multivariate empirical mode decomposition. Pro-
ceedings of the Royal Society A: Mathematical, Physical and Engineering Science,
2009.
[22] Matthew A. Swabey, Paul Chambers, Mark E. Lutman, Neil M. White, John E.
Chad, Andrew D. Brown, and Stephen P. Beeby. The biometric potential of transient
otoacoustic emissions. Int. J. Biometrics, 1:349–364, March 2009.
[23] Jun Yao and Yuan-Ting Zhang. Bionic wavelet transform: a new time-frequency method based on an auditory model. Biomedical Engineering, IEEE Transactions on, 48(8):856–863, aug. 2001.
[24] Ling Zheng, Yuan-Ting Zhang, Fu-Sheng Yang, and Da-Tian Ye. Synthesis and decomposition of transient-evoked otoacoustic emissions based on an active auditory model. Biomedical Engineering, IEEE Transactions on, 46(9):1098–1106, sept. 1999.

[25] Yael Raz. Otoacoustic Emissions: Clinical Applications, 3rd edition, Martin S. Robinette, Theodore J. Glattke, eds., New York: Thieme Medical Publishers, 2007. The Laryngoscope, 117(9):1700–1700, 2007.
[26] D.A. Socolinsky and L.B. Wolff. Image fusion for enhanced visualization of brain imaging. San Diego, CA, Feb. 1999.

[27] D.A. Socolinsky and L.B. Wolff. Multispectral image visualization through first-order fusion. IEEE Transactions on Image Processing, 11(8):923–931, aug. 2002.
[28] D.A. Fay, A.M. Waxman, M. Aguilar, D.B. Ireland, J.P. Racamato, W.D. Ross,
W.W. Streilein, and M.I. Braun. Fusion of multi-sensor imagery for night vision:
color visualization, target learning and search. In Proceedings of the Third Interna-
tional Conference on Information Fusion, 2000., volume 1, jul. 2000.
[29] L.P. Yaroslavsky, B. Fishbain, A. Shteinman, and S. Gepshtein. Processing and fusion of thermal and video sequences for terrestrial long range observation systems. In 7th Annual International Conference of Information Fusion, pages 848–855, 2004.
[30] D. Looney and D.P. Mandic. Fusion of visual and thermal images using complex extension of EMD. In Second ACM/IEEE Int. Conf. on Distributed Smart Cameras, pages 1–8, sep. 2008.

[31] D. Looney and D.P. Mandic. Multiscale image fusion using complex extensions of EMD. IEEE Transactions on Signal Processing, 57(4):1626–1630, apr. 2009.

[32] H. Hariharan, A. Koschan, B. Abidi, A. Gribok, and M. Abidi. Fusion of visible and infrared images using empirical mode decomposition to improve face recognition. In IEEE International Conference on Image Processing, pages 2049–2052, oct. 2006.

[33] N. Rehman, D. Looney, T.M. Rutkowski, and D.P. Mandic. Bivariate EMD-based image fusion. In IEEE/SP 15th Workshop on Statistical Signal Processing, pages 57–60, aug. 2009.
[34] X. Xu, H. Li, and A. N. Wang. The application of BEMD to multispectral image
fusion. In Proc. Int. Conference on Wavelet Analysis Pattern Recognition, pages
448–452, 2007.
[35] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.
[36] X. Tan, S. Chen, Z.H. Zhou, and F. Zhang. Face recognition from a single image per person: A survey. Pattern Recognition, 39(9):1725–1745, 2006.

[37] A.M. Martinez. Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(6):748–763, jun. 2002.

[38] X. Li, G. Mori, and H. Zhang. Expression-invariant face recognition with expression classification. In Proceedings of the 3rd Canadian Conference on Computer and Robot Vision, page 77, Washington, DC, USA, 2006.

[39] H.S. Lee and D. Kim. Expression-invariant face recognition by facial expression transformations. Pattern Recognition Letters, 29(13):1797–1805, 2008.
[40] D. Zhang and Y. Tang. Extraction of illumination-invariant features in face recognition by empirical mode decomposition. In M. Tistarelli and M. Nixon, editors, Advances in Biometrics, volume 5558 of Lecture Notes in Computer Science, pages 102–111. Springer Berlin / Heidelberg, 2009.
[41] Y.L. Liu, X.G. Xu, Y.W. Guo, J. Wang, X. Duan, X. Chen, and Q.S. Peng. Pores-
preserving face cleaning based on improved empirical mode decomposition. Journal
of Computer Science and Technology, 24:557–567.
[42] C. Qing, J. Jiang, and Z. Yang. Empirical mode decomposition-based facial pose
estimation inside video sequences. Optical Engineering, 49(3):037401, 2010.
[43] T. Kanade, J.F. Cohn, and Yingli Tian. Comprehensive database for facial expression analysis. pages 46–53, 2000.

[44] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711–720, jul. 1997.
[45] http://perso.ens-lyon.fr/patrick.flandrin/emd.html.
[46] http://www.comm.utoronto.ca/~biometrics/.
[47] http://www.vivosonic.com/.
[48] Angela Constance Garinis. Efferent Control of the Human Auditory System. PhD
thesis, Department of Speech, Language and Hearing Sciences, The University of
Arizona, 2008.