
Towards a unified signal representation via empirical mode decomposition

by

Jiexin Gao

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Graduate Department of Electrical and Computer Engineering
University of Toronto

Copyright © 2012 by Jiexin Gao

Abstract

Towards a unified signal representation via empirical mode decomposition

Jiexin Gao

Master of Applied Science

Graduate Department of Electrical and Computer Engineering

University of Toronto

2012

Empirical mode decomposition was proposed recently as a time-frequency analysis tool for nonlinear and nonstationary signals. Despite its many advantages, problems such as the "uniqueness" problem have been discovered which limit its application. Although this problem has been addressed to some extent by various extensions of the original algorithm, the solution is far from satisfactory in some scenarios. In this work we propose two variants of the original algorithm, with emphasis on providing unified representations. R-EMD makes use of a set of reference signals to guide the decomposition and therefore guarantees a unified representation for multiple 1D signals. 2D-BEMD takes advantage of a projection procedure and is capable of providing a unified representation between a pair of 2D signals. Application of the proposed algorithms to different problems in biometrics and image processing demonstrates promising results and indicates the effectiveness of the proposed framework.


Dedication

To my parents,

Chao Gao and Jian Zhao

and to my family,

Yao Chen and Camel


Acknowledgements

First and foremost, I would like to sincerely thank my advisor Prof. Dimitrios Hatzinakos.

He gave me guidance and direction that was needed to produce this work. Without his

support, this research would not have been possible. I truly appreciate all the help from

him during research and thesis writing.

I also gratefully thank members of my thesis committee, Prof. Anastasios Venetsanopoulos, Prof. Ashish Khisti and Prof. Gregory Steffan for taking time to provide

insightful comments.

Many thanks to all my colleagues in the Biometric Security Laboratory and friends in

the Communications group for their inspiring interaction and encouragement. You have

made the last two years of my life a wonderful experience.

Last but not least, I would like to thank my parents and my family, for their

support, encouragement and love at all times. To them I dedicate this thesis.


Contents

List of Tables

List of Figures

List of Algorithms

Symbols and Abbreviations

1 Introduction
1.1 Problems of Empirical Mode Decomposition
1.2 Research Goals
1.3 Contributions
1.4 Related Publications
1.5 Thesis Organization

2 Empirical Mode Decomposition and Its Representational Problem
2.1 Empirical Mode Decomposition
2.1.1 Local Extrema
2.1.2 Interpolation
2.1.3 Terminate Criterion
2.1.4 Stop Criterion
2.2 Two Dimensional Empirical Mode Decomposition
2.3 Bivariate Empirical Mode Decomposition
2.4 Other Variations
2.5 Problem with EMD - No unified representation
2.5.1 One Dimensional Case
2.5.2 Two Dimensional Case
2.6 Conclusion

3 EMD with Unified Representation
3.1 One Dimensional Reference EMD
3.1.1 Algorithm
3.1.2 Relationship between extracted frequency and reference signal
3.1.3 Reference frequency selection for R-EMD
3.1.4 Experimental Result
3.2 Two Dimensional Bivariate EMD
3.2.1 Algorithm
3.2.2 Experimental Result
3.3 Two Dimensional Reference EMD
3.4 Conclusion

4 Otoacoustic Emissions for Biometric Recognition via R-EMD
4.1 Introduction
4.2 Otoacoustic Emissions
4.3 Related Work
4.4 Signal Collection
4.5 Biometric System Setup
4.6 OAE signal decomposition using R-EMD
4.6.1 Reference frequency selection
4.6.2 Masking
4.7 Recognition
4.7.1 Single ear
4.7.2 Fusion of left and right ear
4.8 Experimental Results
4.9 Conclusion

5 Image Fusion via 2D-BEMD
5.1 Introduction
5.2 Multiscale Fusion with 2D-BEMD
5.3 Experiment Result
5.3.1 Results on synthesized blurred images
5.3.2 Results on partially focused photos
5.4 Conclusion

6 Expression Invariant Face Recognition via 2D-BEMD
6.1 Introduction
6.2 Related Works
6.3 Expression Transformation with 2D-BEMD
6.3.1 Expression mask
6.3.2 Expression transformation
6.4 Experimental setup and results
6.5 Conclusion

7 Conclusion
7.1 Summary
7.2 Future Directions

A Demonstration of the sifting process

B TEOAE dataset
B.1 Data collection setup
B.2 Intra-subject similarity and inter-subject difference
B.3 Outlier removal

Bibliography

List of Tables

4.1 TEOAE recording protocol
4.2 TEOAE biometric recognition performance

List of Figures

2.1 A synthesized signal.
2.2 Intrinsic mode functions from decomposing the signal in Figure 2.1. From top to bottom: IMF1, IMF2, IMF3, IMF4, IMF5, IMF6 and residue.
2.3 Amplitude versus time-frequency spectrum.
2.4 Local maximum and minimum detection.
2.5 Surface envelopes.
2.6 An image.
2.7 Intrinsic mode functions from decomposing the image in Figure 2.6.
2.8 3D tube built to surround a complex signal.
2.9 Estimation of the center of 3D tube.
2.10 Two synthesized signals. Top: signal A. Bottom: signal B.
2.11 Complex intrinsic mode functions from decomposing the pair of signals in Figure 2.10. Solid lines represent real parts of the IMFs and dashed lines represent imaginary parts of the IMFs.
2.12 Example of three signals.
2.13 Decomposition obtained by applying EMD separately.
2.14 Two faces with different expressions.
2.15 Decomposition obtained by applying 2D-EMD separately.
3.1 Contour plot for demonstration of the solution to Equation (3.16). Shaded area corresponds to the region on t, defined by the second inequality.
3.2 Examples of different values of q and their intersection with y(t) = 0. Vertical dotted line represents the current value q = qo.
3.3 Condition for q such that no solution exists for Equation (3.13). Two arrows: q = 0.5 and q = 2.
3.4 Instantaneous frequency of the chirp (solid line) and the reference (dashed line).
3.5 First level IMF: real part on the top plot and imaginary part on the bottom plot.
3.6 Approximation of the filter structure of one level R-EMD.
3.7 Smoothed one level frequency response.
3.8 Filter bank structure.
3.9 Filter bank in log scale.
3.10 Demonstration of the proposed R-EMD.
3.11 Demonstration of the proposed 2D-BEMD algorithm.
4.1 Click sound stimulus.
4.2 TEOAE response after pre-processing.
4.3 Proposed biometric recognition system.
4.4 IMF1-4 from decomposing a raw TEOAE recording.
5.1 Top row: partially defocused images A (left) and B (right). Bottom row: zoomed-in details showing the differences between the two images.
5.2 IMFs 1-5 obtained with the proposed 2D-BEMD algorithm for partially defocused images. Left column: IMFs corresponding to image A. Right column: IMFs corresponding to image B.
5.3 IMFs 6-10 obtained with the proposed 2D-BEMD algorithm for partially defocused images. Left column: IMFs corresponding to image A. Right column: IMFs corresponding to image B.
5.4 BEMD and 2D-BEMD fusion results on an artificially generated partially blurred image.
5.5 2D-BEMD fusion results on two partially defocused sets of images.
5.6 Examples of BEMD and 2D-BEMD based fusion results. The input images are partially defocused (background versus foreground) while the reconstructed (all in focus) image of the BEMD case exhibits significant artifacts.
6.1 Expression masks used for decomposition with an input image of arbitrary expression.
6.2 2D-BEMD analysis for an input image and a surprised mask image. The edges of the input are among the first few IMFs, while most of the information of the mask is found in IMFs 5 and 6.
6.3 Weights used in fusion.
6.4 Examples of expression transformation with 2D-BEMD. From top to bottom: input images, expression masks, transformed faces and ground truth images.
6.5 The enrollment pipeline. Every gallery image to be enrolled is first used to synthesize 6 expression variants.
6.6 Verification rate versus false acceptance rate, for gallery of size 10.
6.7 Verification rate versus false acceptance rate, for gallery of size 20.
6.8 Verification rate versus false acceptance rate, for gallery of size 40.
A.1 Signal to be decomposed.
A.2 IMFs and the residue after decomposition.
A.3 Sifting iteration 0 for Candidate IMF 1.
A.4 Sifting iteration 1 for Candidate IMF 1.
A.5 Sifting iteration 10 for Candidate IMF 1.
A.6 Sifting iteration 0 for Candidate IMF 2.
A.7 Sifting iteration 10 for Candidate IMF 2.
A.8 Sifting iteration 0 for Candidate IMF 3.
A.9 Sifting iteration 7 for Candidate IMF 3.
A.10 Sifting iteration 0 for Candidate IMF 6.
A.11 Sifting iteration 3 for Candidate IMF 6.
B.1 Example of a stimulus.
B.2 Example of 2 per-buffer responses from the same group.
B.3 Example of buffer averaged response.
B.4 Example of buffer averaged response with low frequency trend removed.
B.5 Histogram of stabilization time for recordings from all subjects, including outliers.
B.6 Similarity of TEOAE recordings from the same subject after applying R-EMD.
B.7 Difference of TEOAE recordings from different subjects after applying R-EMD.
B.8 Histogram of WWR for recordings from all subjects, including outliers.

List of Algorithms

1 EMD
2 2D EMD
3 FA-2D-EMD
4 BEMD
5 R-EMD
6 2D-BEMD
7 FA-2D-BEMD

Symbols and Abbreviations

t    continuous variable
m, n    discrete variables
x(t)    one dimensional continuous time signal
x(n)    one dimensional discrete time signal
x(m,n)    two dimensional signal (or image, surface), with m and n representing the spatial indices
z(n)    complex-valued one dimensional discrete time signal
z(m,n)    complex-valued two dimensional signal (or image, surface)
x^i(n)    the ith IMF associated with signal x(n), with the superscript denoting the IMF index
x^i(m,n)    the ith IMF associated with signal x(m,n), with the superscript denoting the IMF index
sgn(x)    the signum function
EMD    Empirical Mode Decomposition
IMF    Intrinsic Mode Function
BEMD    Bivariate Empirical Mode Decomposition
2D-EMD    Two Dimensional Empirical Mode Decomposition
R-EMD    Reference Empirical Mode Decomposition
2D-BEMD    Two Dimensional Bivariate Empirical Mode Decomposition
LDA    Linear Discriminant Analysis
PCA    Principal Component Analysis
OAE    Otoacoustic Emission
TEOAE    Transient Evoked Otoacoustic Emission
SOAE    Spontaneous Otoacoustic Emission
WWR    Whole Wave Reproducibility

Chapter 1

Introduction

Time-frequency analysis has been widely used in engineering and biomedical applications, since it provides insight into the complex structure of a signal and can potentially reveal the underlying processes. Unlike the traditional Fourier transform, time-frequency transforms are capable of capturing the time-varying characteristics of a time series. The wavelet transform, the most widely applied time-frequency analysis, is carried out by calculating the inner product of the signal with a family of wavelets, which are dilated and translated from a fixed mother wavelet. Although it is effective in many applications, it is essentially linear with a fixed basis and therefore still suboptimal for real world signals.

Recently, Empirical Mode Decomposition (EMD) together with the Hilbert Transform was proposed by Huang et al. [1] as a new time-frequency analysis tool for nonlinear and nonstationary signals. It has been applied to different areas and has been demonstrated to be effective in mechanical engineering, biomedicine, geology and financial analysis [2-7]. It is data driven and adapts to the embedded signal properties automatically, which makes it powerful as a data analysis tool. For a discrete 1D signal x(n), EMD decomposes it into a set of oscillation components (Intrinsic Mode Functions, or IMFs) plus a signal trend:

x(n) = \sum_i x^i(n) + x_r(n) \qquad (1.1)


where x^i(n) denotes the ith level IMF and x_r(n) denotes the signal trend. An IMF satisfies two criteria in order to be considered a characteristic oscillatory component. It is a signal such that:

• Number of extrema and number of zero crossings are equal or differ by at most one.

• Upper envelope and lower envelope are symmetric to each other.

After the IMFs are obtained, one can apply Hilbert Transform on each IMF to get its

instantaneous amplitude and instantaneous frequency which can be combined together

to yield an amplitude-time-frequency spectrum.
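As an illustrative sketch only (assuming NumPy and SciPy are available; imf and fs are hypothetical names for one IMF and its sampling rate), this step could be carried out as:

    import numpy as np
    from scipy.signal import hilbert

    def instantaneous_amp_freq(imf, fs):
        # Analytic signal: imf + j * (Hilbert transform of imf)
        analytic = hilbert(imf)
        amplitude = np.abs(analytic)               # instantaneous amplitude
        phase = np.unwrap(np.angle(analytic))      # unwrapped instantaneous phase
        freq = np.diff(phase) * fs / (2 * np.pi)   # phase derivative, in Hz
        return amplitude, freq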

The decomposition is done by iteratively separating faster oscillations from the slower

ones on local scales, thereby creating a decomposition which ranges from the fastest

oscillatory component x1(n) to the slowest signal trend xr(n). Moreover, the above

procedure is adaptive and data-driven, meaning that no pre-defined basis is required.

1.1 Problems of Empirical Mode Decomposition

The EMD algorithm is simple and widely applied, but its theoretical background and limitations remain uncertain due to its empirical and algorithmic nature. An overview was presented in [11] of some of the existing problems with EMD, including the confidence limit, the boundary effect, the interpolation method, mode mixing and the "uniqueness" problem. The "uniqueness" problem is one of the major problems of EMD, and originates from the fact that there is no analytical definition for the decomposition. Given a signal and the IMF requirements, there can be many valid sets of IMFs that reconstruct the signal. Moreover, since the decomposition procedure is iterative and signal dependent, information such as the number of levels and the oscillatory modes in each level cannot be determined a priori.

Researchers have been investigating the possibility of detecting an optimal or unique set of IMFs, but it is difficult to justify what is optimal and unclear how it can be determined. It has been shown [11] that different IMF sets can be generated from the same signal by varying the parameters related to the sifting process¹, but how they are inter-related is still unclear. To answer this, more fundamental problems need to be solved first, such as how to define the decomposition process rigorously. Although the problem has drawn significant attention from the mathematics community, before such a definition can be made it is necessary, in order to address the "uniqueness" problem from an engineering point of view, to refine the algorithm such that a unified representation can be obtained.

¹ A core process in EMD, which will be presented in detail in Chapter 2.

1.2 Research Goals

Among the recent developments of the EMD algorithm, to the best of the author's knowledge, there is no framework that addresses the problem of obtaining a unified representation after decomposition, that is, a tractable algorithm that retains the desired features of the original EMD.

In this thesis, we will investigate why such a representation is necessary, and develop two variants of the original algorithm: One Dimensional Reference EMD (R-EMD) to address the problem in the one dimensional case, and Two Dimensional Bivariate EMD (2D-BEMD) to address the problem in the two dimensional case. We will demonstrate that the proposed algorithms are capable of obtaining a unified representation while providing a meaningful decomposition, by comparing them with state of the art EMD algorithms.

Successful applications of the two proposed algorithms will be demonstrated on three real world problems where the state of the art EMD algorithms either cannot be applied or prove to be suboptimal. In cases where an existing EMD algorithm is applicable but considered suboptimal, performance will be compared against the state of the art EMD algorithms. In cases where no existing EMD algorithm is applicable, performance will be compared with available baseline methods. Performance will also be compared with that of the wavelet transform where appropriate.

1.3 Contributions

Major contributions from this thesis are summarized as follows:

1. Proposed and developed R-EMD for decomposing multiple 1D signals under a unified representation. Upper and lower bounds of the frequency extraction region were derived and validated. Rules for reference frequency selection were given.

2. Collected the first dataset of transient evoked otoacoustic emission (TEOAE) signals with a moderate number of subjects, under a specific biometric setup.

3. Developed a framework for using TEOAE as a biometric modality by employing the proposed R-EMD. The method was validated on the collected dataset.

4. Proposed and developed 2D-BEMD for decomposing 2D signals under a unified representation.

5. Applied 2D-BEMD to image fusion; the results outperform BEMD, pixel averaging and wavelet fusion.

6. Applied 2D-BEMD to transforming facial expressions in an expression invariant face recognition framework. The results outperform the baseline PCA approach.

1.4 Related Publications

[12] Jiexin Gao, F. Agrafioti, S. Wang, and D. Hatzinakos. Transient otoacoustic emissions for biometric recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, March 2012. This paper was the winner of the Fujitsu Student Paper Award at ICASSP 2012.

[13] Jiexin Gao and D. Hatzinakos. Effect of initial phase in two tone separation using empirical mode decomposition. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, March 2012.

[14] F. Agrafioti, Jiexin Gao, H. Mohammadzade, and D. Hatzinakos. A 2D bivariate EMD algorithm for image fusion. In Digital Signal Processing (DSP), 2011 17th International Conference on, pages 1-6, July 2011.

[15] H. Mohammadzade, F. Agrafioti, Jiexin Gao, and D. Hatzinakos. BEMD for expression transformation in face recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 1501-1504, May 2011.

1.5 Thesis Organization

This thesis is organized as follows:

In Chapter 2, a brief review is provided of the original one dimensional EMD (EMD) and two dimensional EMD (2D-EMD) algorithms, as well as the one dimensional bivariate EMD (BEMD). We also discuss the importance of unified signal representation and demonstrate different scenarios under which the existing algorithms fail to provide such a representation.

In Chapter 3, two variants of the EMD algorithm are developed: one dimensional reference EMD (R-EMD) for the one dimensional case and two dimensional bivariate EMD (2D-BEMD) for the two dimensional case.

From Chapter 4 to Chapter 6 the focus is on different applications of the two proposed algorithms.

In Chapter 4, we apply R-EMD to decompose transient evoked otoacoustic emission (TEOAE) signals at multiple scales for biometric recognition. Under the unified representation, a TEOAE recording can be matched with all enrolled recordings for identity establishment. Performance of the proposed system is evaluated on our collected data.

In Chapter 5, we apply 2D-BEMD to simultaneously decompose two multi-focused images, in order to fuse information at different scales into a better full-focused image. Performance is compared against the state of the art EMD algorithm and against traditional methods such as wavelet fusion and pixel averaging.

In Chapter 6, we apply 2D-BEMD to simultaneously decompose a face image with an expression mask, in order to transform the expression of the input face into the desired one. Using this as a pre-processing step in a face recognition system, significant improvement can be achieved over the baseline method. As a preliminary result, we compare the performance of the proposed method with a baseline PCA-based face recognition system.

In Chapter 7, conclusions and future directions are presented.

Chapter 2

Empirical Mode Decomposition and Its Representational Problem

In this chapter a brief review is provided of the original EMD algorithm¹ and some of its related variants for providing a unified representation. Details on the parameters and procedures involved in the original EMD algorithm are also given for a better understanding of the method. A summary of each algorithm and the respective decomposition results on sample signals are presented. Then a demonstration is provided to justify that under certain scenarios a unified representation is necessary, which motivates the development of the subsequent algorithms in the next chapter.

¹ For simplicity, this algorithm will be referred to as EMD directly in later discussions.

2.1 Empirical Mode Decomposition

Empirical Mode Decomposition (EMD) was developed by Huang et al. [1] for processing nonlinear and nonstationary data. EMD performs an adaptive analysis in the time domain, in order to isolate the inherent oscillations, referred to as the intrinsic mode functions (IMF). For a one dimensional signal x(n), the EMD algorithm operates as follows:

1. Initialize the signal to be iterated on as the original signal: xr(n) = x(n). Initialize the IMF index i = 1.

2. Detect the local maxima and minima of xr(n).

3. Interpolate among the local maxima to get an upper envelope envu(n), and among the local minima to get a lower envelope envl(n).

4. Compute the mean envelope as the average of the upper and lower envelopes: (1/2)[envu(n) + envl(n)].

5. Subtract the mean envelope from the signal to get a candidate IMF: x^i(n) = xr(n) − (1/2)[envu(n) + envl(n)].

6. If the candidate IMF satisfies the stop criterion, update the residue for the next IMF extraction: xr(n) = xr(n) − x^i(n), increase the IMF index i and go to step 7. Otherwise, take the candidate x^i(n) as the signal to be sifted and keep iterating from step 2.

7. Check if the terminate criterion for the residue is satisfied; if so, terminate the algorithm. Otherwise, go to step 2.

Steps 2 to 6 describe a sifting process, which is stopped when x^i(n) meets the stop criterion for an IMF:

• The total number of local extremum points and the total number of zero crossings are equal, or differ by at most 1.

• The average of its upper and lower envelopes is approximately zero.

If x^i(n) meets the stop criterion for an IMF, it describes an underlying oscillation of x(n). The next step is to remove this oscillation from xr(n), increase the IMF index and iterate on the residual. A detailed demonstration of the sifting process is presented in Appendix A. The procedure is terminated when xr(n) describes a true residue (signal trend), a monotonic function from which no more IMFs can be extracted.

The algorithm is summarized in Algorithm 1, where CheckStopCriterion checks whether the target signal is an IMF in order to stop the sifting process, and CheckTerminateCriterion checks whether the entire algorithm needs to be terminated, which happens when the current signal to be iterated cannot be further decomposed.

Algorithm 1: EMD

    i = 1
    xr(n) = x(n)
    while NOT CheckTerminateCriterion(xr(n)) do
        x̃(n) = xr(n)
        while NOT CheckStopCriterion(x^i(n)) do
            envu(n) = Interpolate(FindLocalMax(x̃(n)))
            envl(n) = Interpolate(FindLocalMin(x̃(n)))
            x^i(n) = x̃(n) − (1/2)[envu(n) + envl(n)]
            x̃(n) = x^i(n)
        end
        xr(n) = xr(n) − x^i(n)
        i = i + 1
    end

After execution of the algorithm, suppose there are a total of L − 1 IMFs plus the residue; then the original signal x(n) can be represented by:

x(n) = \sum_{i=1}^{L-1} x^i(n) + x_r(n) \qquad (2.1)

where x^1(n), x^2(n), ..., x^{L-1}(n) denote the Intrinsic Mode Functions (IMFs) and x_r(n) denotes the residue. Alternatively we can include the residue in the summation to obtain a simpler representation:

x(n) = \sum_{i=1}^{L} x^i(n) \qquad (2.2)

where the residue is represented as the Lth level IMF. Note that this is just an alternative representation for simplicity, since the residue does not satisfy the IMF criteria. In most of the subsequent discussions, we use this simplified representation.
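Under this convention, a decomposition can be sanity-checked by summing all levels back, for instance with the emd sketch above:

    imfs, residue = emd(x)                    # x: a 1D NumPy array
    x_rec = np.sum(imfs, axis=0) + residue    # right-hand side of Equation (2.1)
    assert np.allclose(x, x_rec)              # EMD reconstructs the signal exactly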

Figure 2.1: A synthesized signal.

The IMFs after decomposition are considered a set of intrinsic oscillatory components of the original signal. If a time-frequency representation is needed, the set of IMFs can further go through a Hilbert Transform, which gives the instantaneous amplitude and frequency for each IMF; these can be combined for an amplitude versus time-frequency spectrum.

Figure 2.1 shows a synthesized signal and a demonstration of the decomposition result is shown in Figure 2.2. In Figure 2.2 there are 6 IMFs plus the residue, where the first row shows IMF1 and the last row shows the residue. It can be observed that the decomposition adapts to the intrinsic oscillatory modes without any prior assumption or information about the signal.

The corresponding amplitude-time-frequency² plot is shown in Figure 2.3. Note that in the spectrum darker colors represent higher amplitude and lighter colors represent lower amplitude.

² The combination of EMD and Hilbert Transform is also known as the Hilbert-Huang Transform, or Hilbert-Huang spectrum.

Figure 2.2: Intrinsic mode functions from decomposing the signal in Figure 2.1. From top to bottom: IMF1, IMF2, IMF3, IMF4, IMF5, IMF6 and residue.

Details about the algorithm, including local extrema detection, the interpolation method, the terminate criterion and the stop criterion, are presented in the following sections.

2.1.1 Local Extrema

Since the algorithm starts decomposing the signal from the finest scale, FindLocalMax searches for samples that are larger than both their left and right neighbours. For a signal u(n), it finds the collection of points \{n_i\} such that:

\{n_i\} = \{n_i : u(n_i) > u(n_i - 1) \ \text{and} \ u(n_i) > u(n_i + 1)\} \qquad (2.3)

FindLocalMin operates in a similar manner.

Figure 2.3: Amplitude versus time-frequency spectrum.

2.1.2 Interpolation

Interpolation is of particular importance for the EMD algorithm. Indeed, the superposition of IMF2 up to the residue is the superposition of all the mean envelopes from the sifting process for IMF1. To see this, we can write the original signal as IMF1 plus a collection of mean envelopes, in the form it takes after the first IMF has been extracted:

x(n) = x^1(n) + \sum_s env_m^s(n) \qquad (2.4)

where env_m^s(n) denotes the mean envelope in the sth sifting iteration, and the sum is taken over all sifting iterations during the process to extract IMF1. In fact this collection of mean envelopes is the residue to be further processed for the higher level IMFs and the true residue. In other words, the interpolation method defines how an oscillatory mode is constructed in the EMD algorithm.

The most commonly used interpolation method is the cubic spline, which operates as follows.

Given a set of K points from the original signal, consisting of the index set \{t_1, t_2, \cdots, t_K\} and the corresponding amplitude set \{x(t_1), x(t_2), \cdots, x(t_K)\}, cubic spline interpolation finds a piecewise function:

S(t) = \begin{cases} S_1(t) & t_1 \le t \le t_2 \\ S_2(t) & t_2 \le t \le t_3 \\ \vdots \\ S_{K-1}(t) & t_{K-1} \le t \le t_K \end{cases} \qquad (2.5)

where each S_k(t) is a 3rd order polynomial:

S_k(t) = a_k(t - t_k)^3 + b_k(t - t_k)^2 + c_k(t - t_k) + d_k \qquad (2.6)

To solve for the parameters, these conditions must be satisfied:

1. S(t) passes through the provided K points.

2. S'(t) is continuous.

3. S''(t) is continuous.

That is, for all k = 1, 2, \cdots, K - 1 we need:

S_k(t_k) = x(t_k), \quad S_k(t_{k+1}) = x(t_{k+1}), \quad S'_k(t_{k+1}) = S'_{k+1}(t_{k+1}), \quad S''_k(t_{k+1}) = S''_{k+1}(t_{k+1})

In addition, two more boundary conditions need to be specified in order for the parameters to be solved. Two standard choices exist and can be used alternatively. In the natural cubic spline we require:

S''_1(t_1) = 0, \quad S''_{K-1}(t_K) = 0 \qquad (2.7)

In the clamped cubic spline we require:

S'_1(t_1) = x'(t_1), \quad S'_{K-1}(t_K) = x'(t_K) \qquad (2.8)

2.1.3 Terminate Criterion

The CheckTerminateCriterion in the outer loop checks whether the current residue x_r(n) can be further decomposed or should be considered the true residue of the original signal. It is satisfied when the total number of extrema is less than three:

|\{n_{max}\}| + |\{n_{min}\}| < 3

where |\{n_{max}\}| and |\{n_{min}\}| represent the number of maxima and minima, respectively.

2.1.4 Stop Criterion

The CheckStopCriterion in the inner loop checks whether the candidate IMF x^i(n) is actually an IMF. In practice, this is determined either by checking the IMF conditions directly, or by a Cauchy convergence test. Given a candidate IMF x_s^i(n) at level i for sifting iteration s, these two types of tests are as follows:

• IMF condition check

Let the upper envelope of x_s^i(n) be env_u(n) and the lower envelope be env_l(n). By checking the IMF conditions directly, the stop criterion is satisfied when the candidate IMF is oscillatory and its mean envelope is close to zero:

\begin{cases} |env_u(n) + env_l(n)| < \delta_2 \cdot |env_u(n) - env_l(n)|, & \forall n \\ |\{n_o\}| / N < \delta_t \\ -1 \le |\{n_{zero}\}| - (|\{n_{max}\}| + |\{n_{min}\}|) \le 1 \end{cases} \qquad (2.9)

with

\{n_o\} = \{n_o : |env_u(n_o) + env_l(n_o)| > \delta_1 \cdot |env_u(n_o) - env_l(n_o)|\}

where N is the length of the candidate IMF vector and |\{n_{zero}\}| is the number of zero crossings of the candidate IMF. \delta_1, \delta_2 and \delta_t are adjustable parameters.

• Convergence test

In a Cauchy convergence test, the stop criterion is satisfied when the candidate IMFs from two consecutive sifting iterations are close enough to each other:

SD < \delta_{SD}

where \delta_{SD} is an adjustable parameter and

SD = \sum_n \frac{|x_{s-1}^i(n) - x_s^i(n)|^2}{|x_{s-1}^i(n)|^2}
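As a sketch, the convergence test reduces to a few lines; the threshold value and the small-denominator guard here are assumptions, not values taken from the thesis:

    import numpy as np

    def sifting_converged(h_prev, h_curr, delta_sd=0.2):
        # SD between consecutive sifting iterates; eps guards near-zero samples
        eps = np.finfo(float).eps
        sd = np.sum((h_prev - h_curr) ** 2 / (h_prev ** 2 + eps))
        return sd < delta_sd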

2.2 Two Dimensional Empirical Mode Decomposition

The original EMD algorithm was also extended to the decomposition of images (two dimensional signals) [16, 17]. There are multiple ways of referring to this algorithm; in this thesis we use two dimensional empirical mode decomposition (2D-EMD) throughout the discussion.

As an extension of the original algorithm, 2D-EMD utilizes a similar sifting process, with the exception that instead of 1D cubic spline interpolation, 2D scattered data interpolation is used to derive the mean surface. The interpolation can be done via many types of methods, such as multilevel B-spline, Delaunay triangulation, or the finite-element method. Given an image x(m,n), the 2D-EMD algorithm operates as follows:

1. Initialize the signal to be iterated on as the original signal: xr(m,n) = x(m,n). Initialize the IMF index i = 1.

2. Detect the local maxima and minima of xr(m,n).

3. Interpolate among the local maxima to get an upper surface envelope envu(m,n), and among the local minima to get a lower surface envelope envl(m,n).

4. Compute the mean envelope as the average of the two: (1/2)[envu(m,n) + envl(m,n)].

5. Subtract the mean envelope from the signal to get a candidate IMF: x^i(m,n) = xr(m,n) − (1/2)[envu(m,n) + envl(m,n)].

6. If the candidate IMF satisfies the stop criterion, update the residue for the next IMF extraction: xr(m,n) = xr(m,n) − x^i(m,n), increase the IMF index and go to step 7. Otherwise, take the candidate x^i(m,n) as the signal to be sifted and keep iterating from step 2.

7. Check if the terminate criterion for the residue is satisfied; if so, terminate the algorithm. Otherwise, go to step 2.

The procedure is described in Algorithm 2.

Algorithm 2: 2D EMD

    i = 1
    xr(m,n) = x(m,n)
    while NOT CheckTerminateCriterion2D(xr(m,n)) do
        x̃(m,n) = xr(m,n)
        while NOT CheckStopCriterion2D(x^i(m,n)) do
            envu(m,n) = Interpolate2D(FindLocalMax2D(x̃(m,n)))
            envl(m,n) = Interpolate2D(FindLocalMin2D(x̃(m,n)))
            x^i(m,n) = x̃(m,n) − (1/2)[envu(m,n) + envl(m,n)]
            x̃(m,n) = x^i(m,n)
        end
        xr(m,n) = xr(m,n) − x^i(m,n)
        i = i + 1
    end

In this algorithm FindLocalMax2D (FindLocalMin2D) is a 2D extension of the extrema detector in EMD; it searches a 9 × 9 grid centered around the point of interest for local maxima and minima. CheckStopCriterion2D is slightly different from the one used in EMD, since for two dimensional signals it is impossible to check the number of zero crossings. Thus the stop criterion is modified as:

• At any point, the mean value of the upper and lower surface envelope is near zero.

• The IMFs are locally orthogonal to each other.

In practice, usually a Cauchy convergence test is used as the stop criterion. Let x_s^i(m,n) be the candidate IMF at level i for sifting iteration s. The stop criterion is satisfied when:

SD < \delta_{SD}

where \delta_{SD} is an adjustable threshold and

SD = \sum_{m,n} \frac{|x_{s-1}^i(m,n) - x_s^i(m,n)|^2}{|x_{s-1}^i(m,n)|^2}

Similarly to the EMD algorithm, CheckTerminateCriterion2D checks whether the total number of local extrema is less than 3, so that the residue cannot be further decomposed.

After the execution of the algorithm, suppose we have a total of L − 1 IMFs plus the residue; then the original image x(m,n) can be represented by:

x(m,n) = \sum_{i=1}^{L-1} x^i(m,n) + x_r(m,n) \qquad (2.10)

where x^1(m,n), x^2(m,n), ..., x^{L-1}(m,n) denote the Intrinsic Mode Functions (IMFs) and x_r(m,n) denotes the residue. Note that here both the IMFs and the residue are two dimensional signals (images). Similar to the alternative representation for EMD, we can include the residue in the summation:

x(m,n) = \sum_{i=1}^{L} x^i(m,n) \qquad (2.11)

where the residue is represented as the Lth level IMF.
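As an illustration of the interpolation step, a sketch using SciPy's scattered-data interpolation (griddata); the thesis leaves the specific method open, and the 3 × 3 neighbourhood here is a simplification of the 9 × 9 extrema search described above:

    import numpy as np
    from scipy.interpolate import griddata
    from scipy.ndimage import maximum_filter

    def upper_surface(x):
        # Local maxima: pixels equal to the maximum of their neighbourhood
        is_max = maximum_filter(x, size=3) == x
        pts = np.argwhere(is_max)                   # (row, col) of each maximum
        m, n = np.mgrid[0:x.shape[0], 0:x.shape[1]]
        surf = griddata(pts, x[is_max], (m, n), method='cubic')
        # Outside the convex hull of the maxima, fall back to nearest-neighbour
        near = griddata(pts, x[is_max], (m, n), method='nearest')
        return np.where(np.isnan(surf), near, surf)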

Due to the computational complexity of the 2D interpolation operation and the iterative nature of the algorithm, 2D-EMD is a relatively slow algorithm. Bhuiyan [18] proposed the use of an approximation process to estimate the mean surface, thereby reducing or avoiding the iterative sifting process, in order to speed up the 2D-EMD algorithm. This FA-2D-EMD³, without the sifting process, operates as follows:

1. Initialize the signal to be iterated on as the original signal: xr(m,n) = x(m,n). Initialize the IMF index i = 1.

2. Detect the local maxima and minima of xr(m,n).

3. Determine the set of distances {dadj−max} between adjacent maxima, and the set of distances {dadj−min} between adjacent minima.

4. Find the largest and smallest distances in both {dadj−max} and {dadj−min}. Choose a window size accordingly⁴.

5. Apply an order statistics filter and a smoothing filter to obtain the upper and lower surface envelopes.

6. Remove the mean surface (the average of the upper and lower surface envelopes) from xr(m,n) to get the current level IMF.

7. Update the residue for the next IMF extraction: xr(m,n) = xr(m,n) − x^i(m,n).

8. Check if the terminate criterion for the residue is satisfied; if so, terminate the algorithm. Otherwise, go to step 2.

³ In the original article it was referred to as FABEMD. In order to avoid confusion with Bivariate Empirical Mode Decomposition (BEMD), which will be covered later, FA-2D-EMD is used instead.

⁴ There are four choices: maximum of {dadj−max}, minimum of {dadj−max}, maximum of {dadj−min} and minimum of {dadj−min}. Details on window size selection can be found in [18].

Algorithm 3: FA-2D-EMD

    i = 1
    xr(m,n) = x(m,n)
    while NOT CheckTerminateCriterion2D(xr(m,n)) do
        envm(m,n) = SmoothFilter(OrderStatisticsFilter(xr(m,n)))
        x^i(m,n) = xr(m,n) − envm(m,n)
        xr(m,n) = envm(m,n)
        i = i + 1
    end

Figure 2.4: Local maximum and minimum detection.

Figure 2.5: Surface envelopes (upper and lower surface envelopes, before and after smoothing).

In FA-2D-EMD the mean surface is estimated by applying an order statistics filter followed by a smoothing operation. It was claimed that this procedure approximates the original sifting process in one iteration. During order statistic filtering, the upper

surface envelope is obtained by setting each pixel value to the maximal value within a

window surrounding it, with the window size determined by the overall distance between

adjacent local extrema. The lower surface envelope can be obtained similarly.

Figure 2.4 shows the detected local maxima (denoted as ‘o’s) and minima (denoted as

‘x’s) for a 20× 20 sample image. Figure 2.5 shows the upper and lower surface envelope,

before and after local smoothing. After this, the mean surface is obtained by averaging

the two smoothed upper and lower surface envelopes.
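A compact sketch of this envelope estimate (assuming SciPy's ndimage filters; win stands for the window size chosen in step 4):

    from scipy.ndimage import maximum_filter, minimum_filter, uniform_filter

    def fa_mean_surface(x, win):
        # Order-statistics envelopes followed by local smoothing, one pass each
        env_u = uniform_filter(maximum_filter(x, size=win), size=win)
        env_l = uniform_filter(minimum_filter(x, size=win), size=win)
        return 0.5 * (env_u + env_l)      # mean surface

    # One FA-2D-EMD level: the IMF is the image minus the mean surface,
    # and the mean surface becomes the residue for the next level.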

A demonstration of the decomposition result is shown in Figures 2.6 and 2.7. Figure 2.6 shows an image and Figure 2.7 shows the result after applying FA-2D-EMD.

Figure 2.6: An image.

2.3 Bivariate Empirical Mode Decomposition

The Bivariate EMD (BEMD) [19] was proposed in order to obtain a unified representation between a pair of signals x(n) and y(n). BEMD performs a procedure similar to the original EMD, but on the complex signal x(n) + jy(n). The idea is that instead of looking for oscillating components, BEMD looks for rotating components in the three dimensional space defined by the real, imaginary and time axes. Thus, the analysis is performed simultaneously on the real and imaginary components, resulting in the same number of IMFs for both.

Analogous to the envelope in EMD, a 3D tube is built in BEMD to surround the signal being iterated, such that the center of the tube acts similarly to the mean envelope and can be removed iteratively to reveal the IMF. This is demonstrated in Figure 2.8. Note that this is a demonstration of how the 3D tube is built, which corresponds to one sifting iteration. In order to extract a valid IMF, the complex signal in Plot (c) will need to be further processed; that is, a 3D tube needs to be built to surround it in order for the slowly rotating trend to be removed.

Figure 2.7: Intrinsic mode functions (IMF1-IMF6) from decomposing the image in Figure 2.6.

In order to estimate the center of the tube, projections of the complex signal are obtained at different directions between the real and imaginary parts. Combining the partial estimations from all directions results in an estimation of the center location. This is demonstrated in Figure 2.9, where the ellipse represents a cross section of the 3D tube, and the four points correspond to the extrema when projecting onto four directions. The tube center is defined as the center of mass of all these points, assuming unit mass for every point. Other methods of estimating the center are available and were presented in [19].

Given two signals x(n) and y(n), BEMD operates as follows:

1. Initialize the signal to be iterated on as the complex signal: zr(n) = x(n) + jy(n). Initialize the IMF index i = 1.


Figure 2.8: 3D tube built to surround a complex signal.

Figure 2.9: Estimation of the center of 3D tube.

2. Project zr(n) onto K different directions within [0, π].

3. For each direction, find the upper and lower envelopes.

4. Take the average of all envelopes to get the center of the tube, envm(n).

5. Remove envm(n) from zr(n) to obtain the candidate IMF z^i(n).

6. Check if the stop criterion is satisfied; if so, update the residue for the next IMF extraction, zr(n) = zr(n) − z^i(n), increase the IMF index i and continue to step 7. Otherwise, take z^i(n) as the signal to be sifted and keep iterating from step 2.

7. Check if the terminate criterion for the residue is satisfied; if so, terminate the algorithm. Otherwise, go to step 2.

The complete BEMD procedure is summarized in Algorithm 4.

Algorithm 4: BEMD

    z(n) = x(n) + jy(n)
    i = 1
    zr(n) = z(n)
    while NOT CheckTerminateCriterion(zr(n)) do
        z̃(n) = zr(n)
        while NOT CheckStopCriterion(z^i(n)) do
            envm(n) = [0 0 ... 0]
            for k = 1 → K do
                ϕk = kπ/K
                envu(n) = Interpolate(FindLocalMax(Project(z̃(n), ϕk)))
                envl(n) = Interpolate(FindLocalMin(Project(z̃(n), ϕk)))
                envm(n) = envm(n) + (1/K) e^{jϕk} [envu(n) + envl(n)]/2
            end
            z^i(n) = z̃(n) − envm(n)
            z̃(n) = z^i(n)
        end
        zr(n) = zr(n) − z^i(n)
        i = i + 1
    end

The stop criterion and the terminate criterion are similar to those of the EMD algorithm, with the exception that for the terminate criterion, the signal projections on the different directions need to be checked, and the algorithm is terminated whenever there are fewer than 3 extrema in any of the directions.

After the execution of the algorithm, suppose we have a total of L − 1 IMFs plus the residue; then the original complex signal z(n) can be represented by:

z(n) = \sum_{i=1}^{L-1} z^i(n) + z_r(n) \qquad (2.12)

Writing this in a simpler form:

z(n) = \sum_{i=1}^{L} z^i(n) \qquad (2.13)

where the residue has been moved inside the summation.

Since all IMFs are complex signals as well, we can write out their real and imaginary parts explicitly:

z^i(n) = x^i(n) + j y^i(n), \quad x^i(n) = \mathrm{Re}\{z^i(n)\}, \quad y^i(n) = \mathrm{Im}\{z^i(n)\} \qquad (2.14)

so that we have the same number of decomposition levels for both signals:

x(n) = \sum_{i=1}^{L} x^i(n), \qquad y(n) = \sum_{i=1}^{L} y^i(n) \qquad (2.15)

Figure 2.10 shows an example of two synthesized signals. A demonstration of the decomposition result is shown in Figure 2.11, where the solid lines represent the real parts of the IMFs, corresponding to signal A in Figure 2.10, and the dashed lines represent the imaginary parts of the IMFs, from signal B.

Figure 2.10: Two synthesized signals. Top: signal A. Bottom: signal B.

2.4 Other Variations

For decomposing more than two 1D signals together, algorithms such as trivariate EMD [20] and multivariate EMD [21] have been proposed. One drawback of these algorithms is that as the signal dimension or the number of signals increases, the computational complexity grows exponentially, since within each sifting iteration projections need to be taken on higher dimensional hyperspheres. In practice such complexity is undesirable.

2.5 Problem with EMD - No unified representation

Due to the empirical and algorithmic nature of EMD, no unified representation of the decomposition is guaranteed when applying the original algorithm. When signals of a certain type are decomposed, it is desirable that the outputs share the same unified representation, for example the same number of levels and a correspondence at each level. All state of the art EMD algorithms are either unable to solve this problem, or can only address it to a certain extent that is far from satisfactory.

Figure 2.11: Complex intrinsic mode functions from decomposing the pair of signals in Figure 2.10. Solid lines represent real parts of the IMFs and dashed lines represent imaginary parts of the IMFs.

2.5.1 One Dimensional Case

Figure 2.12: Example of three signals.

When a set of signals is decomposed, for example multiple recordings of similar processes, we want the decompositions of all the signals to have the same number of levels, with each level consisting of similar oscillation types. In the case of two signals, a unified representation can be obtained by using BEMD. For three or more signals, although there are extensions of EMD such as trivariate EMD [20] and multivariate EMD [21], the computational complexity involved is exponential in the number of signals. Furthermore, for each new signal to be decomposed, we need to add this signal into the collection and repeat the decomposition from the beginning, which is not practical.

If we apply the original EMD to multiple signals separately, we will most probably get different numbers of IMFs, and most importantly IMFs at the same level might not

Figure 2.13: Decomposition obtained by applying EMD separately.

correspond to the same type of oscillations. In fact, this is true even when the number of levels is the same. Figure 2.12 shows three otoacoustic emission signals and Figure 2.13 shows an example of decomposing the three signals using EMD separately, where the numbers of levels are different and no correspondence is guaranteed between the IMFs. In Figure 2.13 each column shows the decomposition of one of the signals. These three signals are actually of the same type and they cover roughly the same frequency range. These signals will be discussed in detail in Chapter 4.

The problem of obtaining a unified representation for more than two signals will be addressed by Reference EMD (R-EMD) in Chapter 3.


2.5.2 Two Dimensional Case

Figure 2.14: Two faces with different expressions (original signals A and B).

Since 2D-EMD is a relatively new tool, state of the art algorithms are presently only capable of decomposing one image at a time. If information fusion is required between two images, the current solution is to vectorize the two images, apply BEMD on the two 1D signals, and then organize everything back into 2D. This is suboptimal since vectorized images completely lose the correlation from one spatial dimension.

If 2D-EMD is applied to multiple images separately, we will get different numbers of IMFs and no correspondence is guaranteed. Figure 2.14 shows two images of the same face under different facial expressions. Figure 2.15 shows the decomposition after applying 2D-EMD separately, where no correspondence can be found between the IMFs. These images will be discussed in detail in Chapter 6.

The problem of obtaining a unified representation for more than one image will be addressed by 2D-BEMD in Chapter 3.

Figure 2.15: Decomposition obtained by applying 2D-EMD separately (IMFs A1-A6 for the first face, IMFs B1-B4 for the second).

2.6 Conclusion

In summary, for the one dimensional case there exist several different algorithms to satisfy different decomposition requirements: EMD for decomposing a single signal, BEMD for a pair of signals, trivariate EMD for three signals and multivariate EMD for more than three signals. But as the number of signals increases, even the most capable algorithm becomes intractable with the state of the art approach. On the other hand, the development of two dimensional algorithms is still at an early stage, with only 2D-EMD covering the single image case.

In the next chapter, the problems described herein will be addressed via two new

algorithms that provide unified representations after decomposition.

Chapter 3

EMD with Unified Representation

In this chapter two new variants of the EMD algorithm are proposed, in order to obtain a unified representation after decomposition, for both the one dimensional and two dimensional cases.

The discussion in this chapter starts from the better developed one dimensional case, where Reference EMD (R-EMD) is proposed as a way to obtain a unified representation among multiple signals by decomposing each signal against a set of references, given that these signals originate from a similar process. We then continue to the development of two dimensional bivariate EMD (2D-BEMD) for obtaining a unified representation between two signals, as a step beyond the existing two dimensional algorithms.

At the end of this chapter, two dimensional reference bivariate EMD (2D-R-BEMD) is briefly discussed to complete the proposed framework. Although this algorithm is not used for any application in this thesis, it has potential in applications where spatial-frequency information needs to be fused from more than two images.

3.1 One Dimensional Reference EMD

For decomposing multiple one dimensional signals together, the state-of-the-art solution is to combine the target signals into higher dimensions as in trivariate EMD and multivariate EMD. The problem with this is that as the number of signals increases, dimensionality gets higher and the complexity grows exponentially. Furthermore, adding a new signal into the collection requires the repetition of the entire decomposition procedure, which is not practical.

The proposed reference EMD (R-EMD) takes an alternative approach to this problem by carefully constructing a set of reference signals that is appropriate for a certain type of signals and using these references to guide the decomposition of every signal of interest. Note that by doing so, an arbitrary number of signals can be decomposed with unified representation without increasing the computational complexity. Furthermore, adding a new signal into the collection requires only a simple decomposition of the new signal against the references.

We propose to use a set of sinusoids as references to achieve a wavelet-like frequency separation, while retaining the adaptive feature of the EMD algorithm.

3.1.1 Algorithm

For a one dimensional signal x(n), the first step is to determine a rough frequency range for such type of signals and design a set of reference frequencies w according to the frequency range and the desired number of decomposition levels. The rules for determining reference frequencies will be discussed in detail in Section 3.1.2. After w is determined, R-EMD operates as follows:

1. Initialize IMF index i = 1.

2. Create a sinusoid vi(n) of the reference frequency w(i) at the current level i.

3. Initialize the signal to be iterated on as the complex pair: zr(n) = xr(n) + jvi(n)

4. For 1 ≤ k ≤ K:

(a) Project zr(n) onto ϕk = π(k − 1)/K to get Re{e^{−jϕk} zr(n)}.

(b) Compute the upper and lower envelopes envu(n), envl(n).

(c) Average the upper and lower envelopes to get the partial mean envelope envm(n) in the kth direction.

Then the partial estimates are averaged in three dimensional space to get the overall mean envelope envm(n).

5. Compute the candidate IMF:

xi(n) = Re{zr(n) − envm(n)}

If this candidate IMF satisfies the stop criterion, remove this IMF from zr(n), increase the IMF index i and go to step 6. Otherwise, keep iterating from step 4.

6. Check whether the pre-determined number of IMF levels has been reached; if so, terminate the algorithm. If not, go to step 2.

The proposed algorithm is presented in more detail in Algorithm 5, where CheckTerminateCriterion checks whether the number of levels has been reached, in order to terminate the entire algorithm, and CheckStopCriterion checks whether the target signal is an IMF in order to stop the sifting process, similar to the original EMD algorithm.

3.1.2 Relationship between extracted frequency and reference

signal

In order to construct a set of reference sinusoids, it is important to know, for each sinusoid in the set being used as a reference, what kind of frequency components can be extracted in correspondence with it. To understand this, we derive the local frequency relationship between the reference sinusoid and the extracted IMF at one single level, together with experiments on synthesized signals to validate the result. For notational simplicity and ease of derivation, in this section we use continuous time signals instead of discrete time signals. Note that all EMD related algorithms still operate in the discrete time domain only.

Algorithm 5: R-EMD

1   i = 0
2   xr(n) = x(n)
3   while i < L do
4       vi(n) = CreateRefSignal(w, i)
5       z(n) = xr(n) + j vi(n)
6       i = i + 1
7       while NOT(CheckStopCriterion(xi(n))) do
8           envm(n) = [0 0 . . . 0]
9           for k = 1 → K do
10              ϕk = π(k − 1)/K
11              envu(n) = Interpolate(FindLocalMax(Project(z(n), ϕk)))
12              envl(n) = Interpolate(FindLocalMin(Project(z(n), ϕk)))
13              envm(n) = (1/2)[envm(n) + (1/2) e^{−jϕk}(envu(n) + envl(n))]
14          end
15          xi(n) = Re{z(n) − envm(n)}
16          z(n) = z(n) − envm(n)
17      end
18      xr(n) = xr(n) − xi(n)
19  end
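For concreteness, the following is a minimal Python sketch of the R-EMD loop in Algorithm 5. It is an illustration only, not the thesis implementation: the function and variable names are ours, a fixed sifting count stands in for CheckStopCriterion, and the envelope averaging uses the common bivariate-EMD constant 2/K in place of the running average written in the listing.

    import numpy as np
    from scipy.interpolate import CubicSpline
    from scipy.signal import argrelextrema

    def spline_envelope(s, idx):
        """Cubic-spline envelope through the extrema of s at indices idx."""
        if len(idx) < 2:                      # too few extrema to interpolate
            return np.full(len(s), s.mean())
        return CubicSpline(idx, s[idx])(np.arange(len(s)))

    def r_emd(x, ref_freqs, fs, K=4, n_sift=8):
        """Decompose x (sampled at fs Hz) against one sinusoidal reference
        per level; ref_freqs lists the reference frequencies, fastest first."""
        n = np.arange(len(x))
        xr = x.astype(float).copy()
        imfs = []
        for f in ref_freqs:
            z = xr + 1j * np.cos(2 * np.pi * f * n / fs)   # complex pair (step 3)
            for _ in range(n_sift):                        # sifting (step 4)
                env = np.zeros(len(x), dtype=complex)
                for k in range(K):
                    phi = np.pi * k / K                    # projection direction
                    p = np.real(np.exp(-1j * phi) * z)
                    eu = spline_envelope(p, argrelextrema(p, np.greater)[0])
                    el = spline_envelope(p, argrelextrema(p, np.less)[0])
                    env += np.exp(1j * phi) * (eu + el) / 2
                z = z - 2.0 * env / K                      # subtract mean envelope
            imfs.append(z)                                 # Re{z} is this level's IMF
            xr = xr - np.real(z)                           # peel it off (step 5)
        return imfs

The real part of each returned complex IMF corresponds to the input signal and the imaginary part to the reference, as in step 5 of the algorithm.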


In a general signal model, we allow a signal to be a superposition of different amplitude

modulated and frequency modulated (AM-FM) components:

x(t) = ∑i ai(t) cos(ωi(t)t)   (3.1)

We first remark that:

• in the following analysis only local properties are considered

• ai(t) is usually smooth and slowly varying

Without loss of generality, we can set the amplitude and frequency to be both constants:

ai(t) = ai

ωi(t) = ωi


Let the current level reference signal be:

v(t) = cos(ωt) (3.2)

Essentially, the IMF is extracted by first forming the complex signal between the input

signal and the reference signal, and sifting out the fastest rotation component. The real

part of the sifted-out component is the current level IMF. The question we ask is: among all the components in x(t), what is the criterion for a certain component to be extracted?

In order to investigate this, we first form the complex signal between the ith component of the input signal and the reference signal:

z(t) = ai cos(ωit) + jv(t) = ai cos(ωit) + j cos(ωt)   (3.3)

We are seeking the conditions under which the ith component is part of the extracted

IMF. As suggested in [19], for a complex signal to be a fixed point of the sifting process,

one necessary condition is that the sense of rotation never changes. The sense of rotation

SoR is defined by the sense of the vector product of velocity and acceleration:

SoR = Sgn{ Im[ (dz/dt) · (d²z/dt²)* ] }   (3.4)

Using the signal model, this can be computed as:

dz/dt = −aiωi sin(ωit) − jω sin(ωt)

d²z/dt² = −aiωi² cos(ωit) − jω² cos(ωt)

(d²z/dt²)* = −aiωi² cos(ωit) + jω² cos(ωt)

So that SoR can be written as:

SoR = Sgn{ Im[ (dz/dt)(d²z/dt²)* ] }
    = Sgn{ −aiω²ωi sin(ωit) cos(ωt) + aiωi²ω sin(ωt) cos(ωit) }
    = Sgn{ −(ωi + ω) sin[(ωi − ω)t] + (ωi − ω) sin[(ωi + ω)t] }   (3.5)

We investigate the function in two different scenarios:


1. ω = ωi

In this case we have SoR = Sgn{0}, which is constant. Under this condition the complex pair z(t) = ai cos(ωit) + j cos(ωt) is a fixed point of the sifting operator and the ith component will be extracted as part of the IMF. This makes intuitive sense as well, since ω = ωi implies that the ith component is in perfect synchronization with the reference signal.

2. ω ≠ ωi

In the general case, we consider a local time span where the faster of the two signals (cos(ωt) and ai cos(ωit)) completes one full cycle. In other words, the time span is restricted to:

0 ≤ t ≤ min(2π/ω, 2π/ωi)   (3.6)

Let y(t) be:

y(t) = −(ωi + ω) sin[(ωi − ω)t] + (ωi − ω) sin[(ωi + ω)t]   (3.7)

Without loss of generality, we consider the ratio between the two frequencies:

q = ωi/ω   (3.8)

Substituting this into y(t) we have:

y(t) = −(ωq + ω) sin[(ωq − ω)t] + (ωq − ω) sin[(ωq + ω)t]   (3.9)

Since only Sgn[y(t)] is considered, it suffices to evaluate:

y(t) = −(q + 1) sin[(ωq − ω)t] + (q − 1) sin[(ωq + ω)t]   (3.10)

For better numerical demonstration, we scale y(t) with respect to ω by changing t to t̄:

t̄ = tω/(2π)   (3.11)


[Figure: contour plot of y(t̄) = 0, i.e. −(q + 1) sin[2π(q − 1)t̄] + (q − 1) sin[2π(q + 1)t̄] = 0, in the (t̄, q) plane.]

Figure 3.1: Contour plot for demonstration of the solution to Equation (3.13). The shaded area corresponds to the region on t̄ defined by the second inequality.

and get:

y(t̄) = −(q + 1) sin[2π(q − 1)t̄] + (q − 1) sin[2π(q + 1)t̄]   (3.12)

Within the defined time span we require Sgn[y(t̄)] to be fixed, either +1 or −1. This is equivalent to requiring that there is no solution to the system:

y(t̄) = 0,  0 < t̄ < min(1, 1/q)   (3.13)

To solve this system of equations numerically, we plot two contours, as shown in Figure 3.1. The solid line represents the contour:

y(t̄) = 0   (3.14)

and the dashed line is the contour:

t̄ = 1/q   (3.15)

[Figure: four panels, (a) q = 0.25, (b) q = 0.75, (c) q = 0.1, (d) q = 3.]

Figure 3.2: Examples of different values of q and their intersection with y(t̄) = 0. The vertical dotted line represents the current value q = qo.


Consider the region defined by:

0 < t̄ < min(1, 1/q)   (3.16)

as shown in Figure 3.1 by the shaded area. We are looking for values of q such that within the shaded region (excluding boundaries) there is no solution to y(t̄) = 0. Suppose q takes on a specific value q = qo; the above is equivalent to requiring that within the shaded region (excluding boundaries), there is no intersection between q = qo and the contour y(t̄) = 0. Figures 3.2(b) and 3.2(c) show examples of q that do not intersect with y(t̄) = 0. In these cases no solution exists for the system (3.13), so SoR does not change, the corresponding ai cos(ωit) + j cos(ωt) is a fixed point of the sifting operator, and ai cos(ωit) is part of the extracted IMF. On the other hand, Figures 3.2(a) and 3.2(d) show examples of q that intersect with y(t̄) = 0. In these cases the requirement is not satisfied, so the corresponding ai cos(ωit) will not be part of the IMF when cos(ωt) is used as reference. It is clear that the requirement is only satisfied if q satisfies:

0.5 ≤ q ≤ 2   (3.17)

as shown by the darker shaded region in Figure 3.3.

Combining the results from the first and second scenarios, we conclude that ai cos(ωit) is part of the extracted IMF with reference signal cos(ωt) if the ratio between ωi and ω satisfies:

ω/2 ≤ ωi ≤ 2ω   (3.18)
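The admissible range in Equation (3.18) can also be checked numerically. The short script below, our own illustration, evaluates the sign of y(t̄) on a grid over the open interval 0 < t̄ < min(1, 1/q) and reports the values of q for which the sign never changes; the boundaries come out at approximately 0.5 and 2.

    import numpy as np

    def sign_constant(q, n=4096):
        """True if y(t_bar) keeps a single sign on 0 < t_bar < min(1, 1/q)."""
        t = np.linspace(0.0, min(1.0, 1.0 / q), n)[1:-1]   # open interval
        y = (-(q + 1) * np.sin(2 * np.pi * (q - 1) * t)
             + (q - 1) * np.sin(2 * np.pi * (q + 1) * t))
        s = np.sign(y[np.abs(y) > 1e-9])                   # drop numerical zeros
        return s.size == 0 or np.all(s == s[0])

    qs = np.linspace(0.05, 6.0, 1200)
    admissible = [q for q in qs if sign_constant(q)]
    print(min(admissible), max(admissible))                # ~0.5 and ~2.0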

To validate the above analysis, we construct a synthesized chirp signal of 5 seconds,


[Figure: contour of y(t̄) = 0 in the (t̄, q) plane, with the admissible range of q marked.]

Figure 3.3: Condition for q such that no solution exists for Equation (3.13). The two arrows mark q = 0.5 and q = 2.

beginning at DC and increasing in frequency at 25 Hz per second:

x(t) = cos(2π · 25t · t)   (3.19)

We use a reference cosine of 50 Hz, which is the instantaneous frequency of x(t) at t = 2 seconds:

r(t) = cos(2π · 50 · t)   (3.20)

The instantaneous frequencies of these two signals are plotted in Figure 3.4.

We apply the proposed R-EMD and investigate the first level IMF. Denote the real part of IMF1, which corresponds to x(t), as x1(t). Denote the imaginary part of IMF1, which corresponds to r(t), as r1(t). Figure 3.5 shows these two signals.


[Figure: instantaneous frequency (Hz) versus time (sec) over 0 to 5 s.]

Figure 3.4: Instantaneous frequency of the chirp (solid line) and the reference (dashed line).

[Figure: two panels of amplitude versus time (sec) over 0 to 5 s.]

Figure 3.5: First level IMF: real part on the top plot and imaginary part on the bottom plot.

According to our analysis, frequencies in the range (25, 100) Hz will be extracted. Note that in a practical setup of R-EMD, as we will present in Section 3.1.3, the decomposition starts from the finest scale and we design the set of reference frequencies such that at each level the highest frequency in the signal x(t) is no more than twice the reference frequency, whereas in this experiment it goes beyond the frequencies that the reference signal is supposed to extract. To evaluate the performance in this experimental setting, the extracted frequency is defined to be related to the range where both x1(t) and r1(t) have high energy, that is, where x1(t) and r1(t) are in correspondence with each other. To evaluate this, we build upper envelopes on the two signals, which we denote as ex(t) and er(t). If we consider each level of R-EMD as a filter, the filter response can be approximated as in Figure 3.6, where the x-axis is the instantaneous frequency of x(t) and the y-axis is ex(t) · er(t). We also plot the location of the reference frequency at 50 Hz (vertical dashed line in the middle) together with the two bounds from our analysis: 25 Hz (vertical dashed line on the left) and 100 Hz (vertical dashed line on the right). It is clear that the frequency bounds from the analysis closely approximate the simulation result.

[Figure: amplitude response versus instantaneous frequency (0 to 120 Hz), with dashed lines at 25, 50 and 100 Hz.]

Figure 3.6: Approximation of the filter structure of one level of R-EMD.


3.1.3 Reference frequency selection for R-EMD

[Figure: smoothed amplitude response of one R-EMD level.]

Figure 3.7: Smoothed one level frequency response.

To approximate the behaviour of one-level R-EMD with a filter-like structure, a mixture of three Gaussians is used to fit the amplitude response, as shown in Figure 3.7. Under this assumption we can create a filter bank structure that corresponds to the optimal reference frequency setting of the proposed algorithm, as shown in Figure 3.8. The structure is also plotted in log scale, in Figure 3.9.

Ideally, once the highest frequency and the number of levels are determined, the subsequent frequencies can be determined one by one, each being half of the reference from the previous level. Furthermore, the highest frequency can be any value in the range (W/2, 2W), where W is the highest instantaneous frequency in the signal to be decomposed. In practice, reference frequencies need to be chosen according to this rough measure, as well as the frequency properties of the signals of interest. It is also preferred to start with a higher frequency, since there is always high frequency noise in real world applications.
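As an illustration, the dyadic construction just described takes only a few lines of Python; the starting frequency of 4000 Hz below is an arbitrary example value, not one from the thesis.

    import numpy as np

    def reference_frequencies(f_start, n_levels):
        """Dyadic reference set: each level is half of the previous one.
        f_start should be chosen in (W/2, 2W), where W is the highest
        instantaneous frequency expected in the signals of interest."""
        return f_start / 2.0 ** np.arange(n_levels)

    print(reference_frequencies(4000.0, 8))
    # [4000. 2000. 1000.  500.  250.  125.   62.5  31.25]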


[Figure: amplitude response versus instantaneous frequency, with passbands centered near ω, 0.5ω, 0.25ω and 0.125ω.]

Figure 3.8: Filter bank structure.

3.1.4 Experimental Result

Figure 3.10 shows an example of decomposing three transient evoked otoacoustic emission (TEOAE) signals using the proposed R-EMD. The original signals are the same as the ones used to demonstrate the problems of EMD and were depicted in Figure 2.12. The reference frequencies are designed according to the above rules and the OAE signal properties, which will be presented in detail in Chapter 4.

3.2 Two Dimensional Bivariate EMD

For decomposing two dimensional signals, the state-of-the-art algorithms are only capable of dealing with one signal at a time. In this section, we take a step further and propose an algorithm to decompose two signals together, in order to obtain a unified representation between the two.


[Figure: the same filter bank plotted against instantaneous frequency on a log scale, with passbands extending down to roughly 0.008ω.]

Figure 3.9: Filter bank in log scale.

3.2.1 Algorithm

In order to decompose two images together, we follow a similar approach as in BEMD: signals are combined to form a new one in a higher dimension. By projecting onto different directions among the two signal dimensions, partial surface envelopes are estimated and can be combined together. We herein use the term complex-valued surface to describe a function (image) that maps a pixel to a complex number, so the first step is to combine the two images into a complex-valued surface. Following that, obtaining a unified representation means that each IMF is a complex-valued surface with correspondence of the local oscillation between the real and the imaginary parts.

For two images x(m,n) and y(m,n), 2D-BEMD operates as follows:

1. Initialize IMF index i = 1.

2. Construct the complex-valued surface z(m,n) = x(m,n) + jy(m,n)


[Figure: three columns of stacked IMF waveforms, amplitude versus sample index, one column per TEOAE signal.]

Figure 3.10: Demonstration of the proposed R-EMD.

3. For 1 ≤ k ≤ K:

(a) Project onto ϕk = π(k − 1)/K to get Re{e^{−jϕk} z(m,n)}.

(b) Determine the window size for filtering [18]: w = min{min{dadj−max}, min{dadj−min}}, where {dadj−max} is the collection of all pairwise distances between adjacent local maxima, and {dadj−min} is the collection of all pairwise distances between adjacent local minima.

(c) Apply an order statistics filter and smoothing to get the upper and lower surface envelopes.

(d) Average the upper and lower surface envelopes to get the partial mean surface ẽnvm(m,n).

(e) Update the complex-valued mean surface: envm(m,n) = (1/2)[envm(m,n) + e^{−jϕk} ẽnvm(m,n)].

4. Compute the candidate IMF:

zi(m,n) = zr(m,n) − envm(m,n)

If this satisfies the stop criterion, remove this IMF from the signal, increase the IMF index i and go to step 5. Otherwise, keep iterating from step 3.

5. Check whether the termination criterion is satisfied; if so, terminate the algorithm. If not, go to step 3.

The proposed algorithm is presented in more detail in Algorithm 6, where CheckTerminateCriterion checks whether the entire algorithm needs to be terminated, which happens when the current signal to be iterated on is considered a residue. CheckStopCriterion checks whether the target signal is an IMF in order to stop the sifting process. Both criteria are similar to those of 2D-EMD.

In most cases, we need only one sifting operation to extract the IMF, so we can approximate the decomposition by removing the inner-loop sifting process to speed up the algorithm, as shown in Algorithm 7. In the following chapters, we only use this fast version (Algorithm 7) and we will refer to it as 2D-BEMD for simplicity.

3.2.2 Experimental Result

A demonstration of the proposed 2D-BEMD is given in Figure 3.11. Here the same pair of face images as in Figure 2.14 is decomposed. The same number of levels is achieved, with correspondence from both sides at each level. This demonstrates the effectiveness of the proposed algorithm in obtaining a unified representation when decomposing a pair of images.



Figure 3.11: Demonstration of the proposed 2D-BEMD algorithm.


Algorithm 6: 2D-BEMD

1   z(m,n) = x(m,n) + jy(m,n)
2   i = 1
3   zr(m,n) = z(m,n)
4   while NOT(CheckTerminateCriterion(zr(m,n))) do
5       z̃r(m,n) = zr(m,n)
6       while NOT(CheckStopCriterion(zi(m,n))) do
7           envm(m,n) = [0 0 . . . 0]
8           for k = 1 → K do
9               ϕk = kπ/K
10              ẽnvm(m,n) = SmoothFilter(OrderStatisticsFilter(Project(z̃r(m,n), ϕk)))
11              envm(m,n) = (1/2)[envm(m,n) + e^{−jϕk} ẽnvm(m,n)]
12          end
13          zi(m,n) = z̃r(m,n) − envm(m,n)
14          z̃r(m,n) = z̃r(m,n) − zi(m,n)
15      end
16      zr(m,n) = zr(m,n) − zi(m,n)
17      i = i + 1
18  end

Algorithm 7: FA-2D-BEMD

1   z(m,n) = x(m,n) + jy(m,n)
2   i = 1
3   zr(m,n) = z(m,n)
4   while NOT(CheckTerminateCriterion(zr(m,n))) do
5       envm(m,n) = [0 0 . . . 0]
6       for k = 1 → K do
7           ϕk = kπ/K
8           ẽnvm(m,n) = SmoothFilter(OrderStatisticsFilter(Project(zr(m,n), ϕk)))
9           envm(m,n) = (1/2)[envm(m,n) + e^{−jϕk} ẽnvm(m,n)]
10      end
11      zi(m,n) = zr(m,n) − envm(m,n)
12      zr(m,n) = zr(m,n) − zi(m,n)
13      i = i + 1
14  end
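A compact Python sketch of the fast variant (Algorithm 7) is given below. It is an approximation for illustration only: a fixed number of levels stands in for CheckTerminateCriterion, a fixed window replaces the extrema-distance rule of step 3(b), max/min filters plus a uniform smoother play the roles of OrderStatisticsFilter and SmoothFilter, and the direction averaging uses the standard 2/K bivariate-EMD constant.

    import numpy as np
    from scipy.ndimage import grey_dilation, uniform_filter

    def fa_2d_bemd(x, y, n_levels=6, K=4, win=9):
        """Single-sift 2D-BEMD sketch for two images x and y (same shape)."""
        z = x.astype(float) + 1j * y.astype(float)    # complex-valued surface
        imfs = []
        for _ in range(n_levels):
            env = np.zeros_like(z)
            for k in range(K):
                phi = np.pi * k / K
                p = np.real(np.exp(-1j * phi) * z)    # directional projection
                upper = grey_dilation(p, size=win)    # order statistics: local max
                lower = -grey_dilation(-p, size=win)  # order statistics: local min
                mean = uniform_filter((upper + lower) / 2.0, size=win)
                env += np.exp(1j * phi) * mean
            env *= 2.0 / K                            # mean surface over directions
            imfs.append(z - env)                      # current-level complex IMF
            z = env                                   # iterate on the residue
        return imfs, z

The real parts of the returned IMFs decompose x and the imaginary parts decompose y, level by level, which is exactly the unified representation sought.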

3.3 Two Dimensional Reference EMD

Using the same principle as in R-EMD, it is possible to extend the two dimensional EMD algorithm to obtain a unified representation among multiple images. Given that the frequency range of one particular type of images can be determined roughly, a set of reference surfaces consisting of sinusoids in both directions can be constructed and used to guide the decomposition. It is expected that this algorithm will achieve similar performance for images as R-EMD does for one dimensional signals. Although this algorithm was not implemented, since currently none of the applications requires spatial-frequency-specific information fusion from more than two images, it is proposed here to complete the entire framework for obtaining unified representation using EMD.

3.4 Conclusion

In this chapter two algorithms were proposed for obtaining unified signal representation using EMD. Both algorithms go one step beyond the state-of-the-art algorithms, with R-EMD for 1D signal decomposition and 2D-BEMD for 2D image decomposition. Since the state of the art is different for the 1D and 2D scenarios, the algorithms proposed here differ in their capabilities.

For 1D signals, existing algorithms are capable of providing unified representation among pairs, triplets or more signals, but the algorithm complexity grows as more signals are added. R-EMD was proposed to decompose multiple signals under a unified representation with low complexity. For 2D images, existing algorithms do not provide unified representation for a pair of images. 2D-BEMD was developed to address this problem. To complete the framework of using reference signals for a unified representation, we also discussed 2D-R-EMD, which will be developed in the future.

For the next three chapters, the focus will be on the application of these proposed algorithms in different areas of biometrics and image processing. In Chapter 4 we present the application of R-EMD to decompose transient evoked otoacoustic emission signals for similarity comparison at each level, in order for a biometric recognition system to be established. In Chapters 5 and 6, 2D-BEMD is applied to combine images at multiple levels. By using different strategies for combination, we can emphasize details from both images for fusing de-focused images (Chapter 5), or highlight certain features that reside in particular levels for transforming facial expressions (Chapter 6). In these applications unified representation is preferred because signal comparison or information fusion is carried out at each level, which requires a predictable number of levels and correspondence at each level.

Chapter 4

Otoacoustic Emissions for Biometric

Recognition via R-EMD

In this chapter the R-EMD algorithm proposed in Chapter 3 is employed in the multi-level decomposition of transient evoked otoacoustic emission signals for biometric recognition purposes. The unified representation makes it possible to carry out multi-level matching and decision on the IMFs from the decomposition. The performance of the proposed system is tested on a dataset we collected specifically for biometric evaluation purposes, and promising results are observed.

4.1 Introduction

Otoacoustic emission (OAE) is a weak acoustic sound generated by an active process in

the cochlea and can be collected easily using a special earphone with built-in microphone.

The transducer in the earphone is for sending stimulus into the ear and the microphone

is used to collect response.

Using OAE as a biometric has several advantages. First of all, its physiological nature makes it robust against falsified credentials, because physiological signals are more difficult to reproduce compared to traditional biometric features. Furthermore, it can


be applied in some special scenarios, for example in newborn identification, where face, fingerprint and iris recognition all fail. Last but not least, the proposed framework can be easily integrated into existing OAE recording devices and operate in parallel with diagnostic or monitoring activities.

4.2 Otoacoustic Emissions

[Figure: amplitude (mPa) versus time (ms) over 0 to 2.5 ms.]

Figure 4.1: Click sound stimulus.

The human ear consists of three major parts: the outer ear, the middle ear and the inner ear. Sound collected by the outer ear travels along the ear canal and hits the eardrum, causing it to vibrate. Through the middle ear it arrives at the inner ear. The basilar membrane, a structure running through the coil of the cochlea, responds to different frequencies in a location-specific way, with higher frequencies responded to near the base and lower frequencies near the apex. Along this basilar membrane there are two different types of hair cells: outer hair cells, responsible for the amplification of sound, which is related to OAE generation, and inner hair cells, responsible for transforming the sound into electrical pulses that are sent to the brain through the auditory nerves.


[Figure: amplitude (mPa) versus time (ms) over 0 to 16 ms.]

Figure 4.2: TEOAE response after pre-processing.


OAE is a by-product of the normal hearing process. In mammals, in order to improve hearing sensitivity, the outer hair cells inside the basilar membrane nonlinearly amplify quiet sounds more than loud ones. Vibration of the outer hair cell body results not only in a forward amplification of the sound, but also in a backward response that eventually comes out of the ear canal, which is the source of OAE.

Different types of OAE are determined by the stimulus used to generate the response. Transient evoked otoacoustic emission (TEOAE), which is the signal we investigate, is a response stimulated by a low level click sound, which is a flat-band signal in the frequency domain. A flat-band signal can excite the entire basilar membrane, so the response will be rich in frequency. In addition, different frequency components arrive at the ear canal at different times, thus making the whole response long and complex compared to other OAE signals.


TEOAE is used in clinical applications for screening and diagnosis purposes. It has been observed that the TEOAE response is quite different between individuals, which makes it possible to use as a biometric.

An example of a TEOAE recorded with the Vivosonic Integrity system, after some pre-processing for better visual quality, is shown in Figure 4.2. The stimulus used to generate this signal is shown in Figure 4.1.

4.3 Related Work

A feasibility study of using TEOAE as a biometric modality was conducted by Swabey [22]. The dataset that was investigated consists of one adult short-term dataset with 23 subjects (recorded within the same session, which is of less value for biometric evaluation purposes), one neonate dataset with 760 subjects (no report on the time interval between the two recording sessions) and one adult long-term dataset with 6 subjects (the time interval between the two sessions was 6 months). Maximum likelihood estimation was employed to approximate the probability density functions of the inter-class and intra-class distances, on raw time domain signals (after built-in filtering by the device). The reported Equal Error Rates (EER) for the three datasets were 1.24%, 2.29% and 2.35% respectively, all with 90% confidence.

The only dataset recorded for long-term variability evaluation consists of just 6 subjects, so the performance on such a dataset cannot be generalized to a larger population. Furthermore, analyzing the TEOAE signal in the time domain is suboptimal, since its frequency-specific features and details are ignored.

To address these problems, we collect data under a setup suitable for biometric eval-

uation purpose, and propose to use R-EMD for decomposing and analyzing the signal at

multiple levels.


Table 4.1: TEOAE recording protocol

Stimulus Parameters
    STI-Mode: Nonlinear
    Click Interval: 21.12 ms
    Click Duration: 80 µs
    Click Level: 80 dB peSPL

Test Control
    Record Window: 2.8–20 ms
    Low-Pass Cut-off: 6000 Hz
    High-Pass Cut-off: 750 Hz
    Artifact Rejection Threshold: 55 dB SPL

4.4 Signal Collection

Signal collection was conducted in the Biometrics Security Laboratory at the University of Toronto, approved under University of Toronto protocol reference number 23018. The Vivosonic Integrity system [47] was used, with protocol details shown in Table 4.1.

To ensure the quality of the recordings without placing too much constraint on the environment, an earmuff was used for noise cancellation, but the experiment was set up in an office where people were talking and entering the room. The participants were given the instruction to sit in a chair and relax. Details about signal collection and outlier removal are discussed in Appendix B.

After outlier removal, 54 subjects were successfully recorded in both sessions, with at least one week between sessions to validate long-term stability. Most of the subjects are between the ages of 20 and 30. The dataset consists of one response of length 17.2 ms per ear per session for each subject.


4.5 Biometric System Setup

The setup of the biometric system is shown in Figure 4.3. In the enrollment stage, we use the recording from the first session of each subject, pre-process it and store the signature into the system gallery set. In the recognition stage, we select the recording from the second session of one subject, pre-process it and match it against the whole gallery set to find the best fit and claim the identity.

Figure 4.3: Proposed biometric recognition system.

Denote the TEOAE recorded during the first session (enrollment) as {xLk}, {xRk} and the TEOAE recorded during the second session (recognition) as {yLk}, {yRk}, with subject IDs k = 1, 2, …, 54 and {L, R} for the left or right ear.

4.6 OAE signal decomposition using R-EMD

R-EMD is used to decompose the TEOAE recordings, with reference frequencies selected

according to an auditory model. In addition, high frequency noise is removed by applying

a signal mask according to the frequency-latency relationship. These procedures serve as

the preprocessing steps before biometric identification can be made.


4.6.1 Reference frequency selection

According to an auditory model proposed for use with the wavelet transform [23, 24], the generation of TEOAE can be considered as the stimulus passing through a set of band-pass filters (BPF), each corresponding to one point on the basilar membrane and having a nonlinear damping effect. In their work, using a 128-level auditory model, the center frequency of each band can be calculated as:

fm = fo / q^m   (4.1)

where m = 0, 1, 2, …, 127, base frequency fo = 15165.4 Hz and q = 1.0352952. To apply the R-EMD algorithm, reference frequencies have to be roughly spaced by a factor of 2 from level to level. Taking into account both the auditory model and the reference frequency selection rule, we propose to use the following 8 reference frequencies:

fk = fo / q^(16k−1)   (4.2)

with k = 1, 2, …, 8, corresponding to levels 15, 31, 47, 63, 79, 95, 111, 127 in the 128-level auditory model.
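Evaluating Equation (4.2) directly gives the following reference frequencies (a small Python check; the printed values are rounded):

    import numpy as np

    f_o, q = 15165.4, 1.0352952             # auditory-model constants [23, 24]
    k = np.arange(1, 9)
    f_ref = f_o / q ** (16 * k - 1)         # Equation (4.2)
    print(np.round(f_ref))
    # approximately [9013, 5174, 2971, 1705, 979, 562, 323, 185] Hz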

In addition, since a high-amplitude low-frequency trend resides in every recording, as can be seen from Figure B.3 in Appendix B, we propose to use only the first 4 levels of IMFs to avoid low frequency noise, rather than using all 8 levels and the residue. This also speeds up the algorithm, since the decomposition can be carried out for only the first 4 levels. Note that this type of hard thresholding can be improved in future work.

4.6.2 Masking

For TEOAE signals, the high frequency components exhibit shorter latency and duration. In order to remove high frequency noise from the recording, IMF1 and IMF2 are multiplied with a mask, which is a window centered at 0 ms with a falling cosine tail between 3.9 ms and 6.5 ms:

W(t) = 1,                                 0 ≤ t < 3.9 ms
W(t) = (1/2)[1 + cos(π(t − 3.9)/2.6)],    3.9 ms ≤ t < 6.5 ms
W(t) = 0,                                 6.5 ms ≤ t ≤ 17.2 ms     (4.3)
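A small Python sketch of this window follows; the 1/2 factor reflects our reading of Equation (4.3), making the tail fall smoothly from 1 at 3.9 ms to 0 at 6.5 ms.

    import numpy as np

    def teoae_mask(t_ms):
        """Masking window W(t) of Equation (4.3); t_ms in milliseconds."""
        w = np.zeros_like(t_ms, dtype=float)
        w[t_ms < 3.9] = 1.0                              # flat part
        tail = (t_ms >= 3.9) & (t_ms < 6.5)              # falling cosine tail
        w[tail] = 0.5 * (1.0 + np.cos(np.pi * (t_ms[tail] - 3.9) / 2.6))
        return w                                         # zero beyond 6.5 ms

    t = np.linspace(0.0, 17.2, 860)    # one 17.2 ms record
    # masked IMF: imf1_masked = imf1 * teoae_mask(t), elementwise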

The above two-step procedure is applied to the TEOAE recordings in both the enrollment and recognition sessions. Denote the decomposed signals after applying the mask as {xiLk}, {xiRk} for the enrollment session and {yiLk}, {yiRk} for the recognition session, with subject IDs k = 1, 2, …, 54 and IMF index i = 1, 2, 3, 4. Note that in a practical setup, these decomposed signals are stored in the gallery together with the subject ID only during enrollment, since during recognition the subject ID is unknown. An example of a 4-level decomposition after masking is depicted in Figure 4.4.

[Figure: four stacked panels, IMF1 through IMF4, amplitude versus time (0 to 10 ms).]

Figure 4.4: IMF1–4 from decomposing a raw TEOAE recording.

4.7 Recognition

During recognition, we want to match the input decomposition with all the signals in the gallery to find the best match. To do this, correlations between IMFs at the same level are computed and combined to determine the best match ID. To combine these correlation values, we take into account two facts. First, correlation coefficients from different levels cannot be combined directly, since each level might have a different baseline value; thus a normalization is needed within each level across the gallery. Second, each level is of different importance for identification purposes. Due to the limited size of the collected dataset, we here propose to use empirical weightings for the different levels, but this can be improved in future work by learning the best weights on a training set when a larger dataset is available.

4.7.1 Single ear

Recognition can be done with the recording from either the left or the right ear. For simplicity of discussion, we assume the use of the recording from the left ear in this section. For the recording yLn from an unknown subject n in the recognition session, we want to find the best matching identity among the enrolled recordings. This is done by the following steps:

• Correlation matrix. Correlations between IMFs are calculated, with the corresponding subject ID recorded for each gallery entry, as follows:

C(k, i) = corr(xiLk, yiLn)
I(k, i) = k

with k = 1, 2, …, 54 and i = 1, 2, 3, 4, where corr(a, b) denotes the correlation between two vectors a and b.

• Ranked, normalized and weighted correlation matrix. After computing the correlation matrix C54×4 (the subscript denotes the size of the matrix), we sort each column in descending order to get C′54×4. Normalizing each entry of C′54×4 with respect to the first row and keeping the first 3 rows (the 3 highest ranked matches for each level) yields C̄3×4, with C̄(1,i) = 1 and C̄(k,i) = C′(k,i)/C′(1,i) for k = 1, 2, 3 and i = 1, 2, 3, 4. I54×4 is also re-ordered correspondingly as Ī3×4.

With an empirical weight imposed, the score matrix is computed as:

S3×4 = [ 1         0.8          0.8          0.6
         C̄(2,1)    0.8·C̄(2,2)   0.8·C̄(2,3)   0.6·C̄(2,4)
         C̄(3,1)    0.8·C̄(3,2)   0.8·C̄(3,3)   0.6·C̄(3,4) ]

• Matching score. Denote the collection of all unique IDs in Ī3×4 as {Iu}, with 1 ≤ u ≤ N and N ≤ 12. For every Iu ∈ {Iu}, the final score is

Su = ∑_{Ī(m,n)=Iu} S(m,n)

• Decision. Sort {Su} in descending order as {S̄u}, with {Iu} re-ordered as {Īu}. The 3 best matched identities are Ī1, Ī2 and Ī3, with corresponding scores S̄1, S̄2 and S̄3. The subject is identified as Ī1.
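The whole single-ear procedure can be summarized in a short Python sketch. This is an illustration under assumed data structures: the gallery is a dictionary mapping each subject ID to its four masked IMFs, and the probe is the corresponding list of four IMFs.

    import numpy as np

    def identify_single_ear(gallery, probe, weights=(1.0, 0.8, 0.8, 0.6), top=3):
        """Return (ID, score) pairs ranked by the fused, weighted correlations."""
        ids = list(gallery)
        C = np.array([[np.corrcoef(gallery[k][i], probe[i])[0, 1]
                       for i in range(4)] for k in ids])   # correlation matrix
        scores = {}
        for i in range(4):                                 # per IMF level
            order = np.argsort(C[:, i])[::-1][:top]        # 3 best matches
            norm = C[order, i] / C[order[0], i]            # normalize to rank 1
            for row, s in zip(order, norm):
                scores[ids[row]] = scores.get(ids[row], 0.0) + weights[i] * s
        return sorted(scores.items(), key=lambda kv: -kv[1])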

4.7.2 Fusion of left and right ear

A score-level fusion from both ears can be employed to improve system performance. Suppose we have the three best matches from the left ear enrolled recordings, with identities IL1, IL2, IL3 and scores SL1, SL2, SL3. Those from the right ear are denoted IR1, IR2, IR3 and SR1, SR2, SR3.

If the results from both sides agree with each other, that is IL1 = IR1, the final identified subject ID is IL1. If IL1 ≠ IR1, the matched identity is calculated as follows:


• Concatenate subject IDs and scores:

Ic = [IL1 IL2 IL3 IR1 IR2 IR3]
Sc = [SL1 SL2 SL3 SR1 SR2 SR3]

• Fused matching score. Denote the collection of all unique IDs in Ic as {Iu}. The final scores {Su} are computed as follows:

Su = ∑_{Ic(m)=Iu} Sc(m)

• Decision. Sort {Su} in descending order as {S̄u}, with {Iu} re-ordered as {Īu}. The subject is identified as Ī1.
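Building on the single-ear sketch above, the score-level fusion reduces to a few lines (again an illustration, assuming each per-ear result is a ranked list of (ID, score) pairs):

    def fuse_ears(left, right):
        """left, right: ranked (ID, score) lists from each ear, best first."""
        if left[0][0] == right[0][0]:          # both ears agree
            return left[0][0]
        scores = {}
        for pid, s in left[:3] + right[:3]:    # concatenate 3 best per ear
            scores[pid] = scores.get(pid, 0.0) + s
        return max(scores, key=scores.get)     # highest fused score wins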

4.8 Experimental Results

Recognition performance is summarized in Table 4.2 for three different scenarios: using the left ear recording only, using the right ear recording only, and the fusion of the two ears. Right ear performance is slightly lower than left ear performance. One possible cause may be additive spontaneous otoacoustic emission (SOAE), which might not be as unique within each individual as TEOAE and has been shown to exist together with TEOAE and to exhibit greater intensity in the right ear than in the left ear [25]. With the fusion of information from the two ears a recognition rate of 98.15% is achieved.

4.9 Conclusion

In this chapter a framework for biometric recognition using transient evoked otoacoustic emissions was presented. TEOAE signals from 54 subjects were collected for long-term stability validation purposes.

Table 4.2: TEOAE biometric recognition performance

Scenario   Correctly Recognized   Performance
Left       52 out of 54           96.30%
Right      49 out of 54           90.74%
Fusion     53 out of 54           98.15%

By using the proposed R-EMD with reference frequencies

determined by an auditory model, a recognition rate of 98.15% can be achieved with

fusion of information from both ears.

Chapter 5

Image Fusion via 2D-BEMD

In this chapter a multi-level image fusion scheme is proposed to combine complementary information from partially blurred or defocused images. By using the proposed 2D-BEMD algorithm, which retains the embedded spatial information while addressing level correspondence when decomposing two images, improved fusion results can be obtained. We show the improvement by a lower mean squared error (MSE) value on synthesized blurred images and a better visual appearance on real images from a camera.

5.1 Introduction

Image fusion is essentially a problem of combining information from different imagery, in a

way that complementary information from the distinct images is preserved. Image fusion

finds application in numerous areas such as brain imaging [26], aerial imaging [27], visible

and infra-red imaging for night vision [28], terrestrial observation [29] and combination

of visual and thermal images [30].

Traditionally, image fusion is done with the help of either the Principal Component

Analysis (PCA), wavelets, Gaussian pyramids or pixel averaging [30]. In PCA, only high

energy components are retained in the reconstruction, and as a result there is information

loss for the combined image. In addition, PCA performs linear projections on stochastic


data, and is thus suboptimal. Wavelets require a predefined set of basis images that do not adapt to the particular characteristics of the images under consideration, which does not allow successful fusion to generalize. On the other hand, pixel averaging performs a simple low-pass operation, which results in loss of edge information.

Recently, it was proposed that empirical mode decomposition can be used for image

fusion [30–33]. Since EMD decomposes the signal into a number of oscillatory modes

that are adaptively defined by the signal itself, it is a natural choice for image fusion.

Prior works relied either on BEMD [30,31,33] or 2D-EMD [34]. For the BEMD case,

the limitations arise from the fact that two dimensional data are vectorized, and this

causes loss of the spatial information of the image. In the 2D-EMD fusion case, two

images are decomposed separately via 2D-EMD where no unified representation after

decomposition is guaranteed, meaning that the number and type of detected oscillatory

modes is uncertain. In an automatic fusion setting, where scale correspondence is re-

quired, this poses a great challenge on combining analogous information from both sides.

In this chapter we address the problem of image fusion, with the proposed 2D-BEMD

that overcomes the above difficulties. The two dimensional nature of the algorithm

preserves the spatial information of the image at all oscillatory modes. Furthermore, the

bivariate aspect of the algorithm allows us to decompose two images simultaneously, and

achieve mode correspondence, which can then be strategically incorporated in a fusion

mechanism.

5.2 Multiscale Fusion with 2D-BEMD

A fusion strategy similar to [31] is employed for the fusion of the real and imaginary IMFs that result from the 2D-BEMD decomposition. Given two partially blurred or defocused images A(m,n) and B(m,n), we form the complex image C(m,n):

C(m,n) = A(m,n) + jB(m,n)   (5.1)


Figure 5.1: Top row: partially defocused images A (left) and B (right). Bottom row:

zoomed in details showing the differences between the two images.

By applying 2D-BEMD on C(m,n) we get

C(m,n) = ∑_{i=1}^{L} Ci(m,n)   (5.2)

Alternatively,

C(m,n) = ∑_{i=1}^{L} Ai(m,n) + j ∑_{i=1}^{L} Bi(m,n)   (5.3)

Figure 5.1 shows an example of two partially defocused images: one is background focused and the other is foreground focused. Figures 5.2 and 5.3 show the IMFs from the 2D-BEMD decomposition of the two images. Note that zoomed-in details are shown for the IMFs for better demonstration of the decomposition.

The fused image is obtained by pixel-wise weighting of the two images at different levels, and summing the two across all levels:

F(m,n) = ∑_{i=1}^{L} [αi(m,n)Ai(m,n) + βi(m,n)Bi(m,n)]   (5.4)

where for each pixel at every level, the weights satisfy

αi(m,n) + βi(m,n) = 1   (5.5)


Figure 5.2: IMFs 1-5 obtained with the proposed 2D-BEMD algorithm for partially

defocused images. Left column: IMFs corresponding to image A. Right column: IMFs

corresponding to image B.

At each level, the weights are determined by comparing local variances between the IMF from image A and the IMF from image B. Intuitively, more weight is assigned to a pixel with a higher local variance to emphasize detailed information; as a result, the corresponding pixel from the IMF of the other image is assigned less weight.


Figure 5.3: IMFs 6-10 obtained with the proposed 2D-BEMD algorithm for partially

defocused images. Left column: IMFs corresponding to image A. Right column: IMFs

corresponding to image B.

The procedure is as follows:


[Figure panels: Original Image; Foreground Blurred; Background Blurred; 2D-BEMD (MSE 0.34); 1D-BEMD (MSE 1.81); Pixel Average (MSE 48.82); Wavelet Fusion (MSE 0.59).]

Figure 5.4: BEMD and 2D-BEMD fusion results on an artificially generated partially blurred image.


αi(m,n) = 0    if var{Ai(m,n)} < var{Bi(m,n)} − ε   (5.6)
αi(m,n) = 0.5  if |var{Ai(m,n)} − var{Bi(m,n)}| < ε   (5.7)
αi(m,n) = 1    if var{Ai(m,n)} > var{Bi(m,n)} + ε   (5.8)

where ε is a controllable threshold that determines whether one local variance can be considered significantly higher than the other. The local variance var{Ai(m,n)} is calculated by examining the variance within a pre-defined window:

var{Ai(m,n)} = ∑_{(p,q)∈win(m,n)} [Ai(m + p, n + q) − µ]²   (5.9)

where

µ = ∑_{(p,q)∈win(m,n)} Ai(m + p, n + q)   (5.10)

and

win(m,n) = {(p, q) : max(1, m − w) ≤ p ≤ min(M, m + w), max(1, n − w) ≤ q ≤ min(N, n + w)}   (5.11)

for image size M × N.

5.3 Experimental Results

The performance of the proposed scheme was tested for fusion on both synthesized and camera-captured images. In our experiments all images are of size 399 × 600, in grayscale, 8-bit unsigned integer format. We use threshold ε = 30 and window size w = 40. These settings are for demonstrating the results only, since the focus is on the application of the 2D-BEMD algorithm rather than the design of a fusion scheme. More sophisticated parameter settings can be used in real world applications for better results.


5.3.1 Results on synthesized blurred images

The proposed algorithm was compared against well known image fusion strategies, i.e., BEMD, pixel averaging and wavelets. For BEMD, the approach in [33] is used. For pixel averaging, the fused image is obtained by taking the average of corresponding pixel pairs in the two images. For wavelets, we decompose the two images to level 5 with Symlets-4; approximation and detail coefficients are merged elementwise by taking the maximum of the two sides.

For comparison purposes, partially blurred images were artificially generated in order to quantify performance via MSE. Figure 5.4 shows the original image and the two blurred versions. The foreground and background regions were artificially blurred with a Gaussian filter of radius 1.5 pixels. Figure 5.4 also shows the reconstruction results for the four fusion methods.

Pixel averaging performs worse than all other fusion methods (the MSE is 48.82). This is because it is equivalent to a low pass operation whereby the detail of the image is destroyed. The wavelet result is visually satisfying; however, the error is greater than in the 2D-BEMD case. Although wavelet fusion is by definition multi-scale, the decomposition relies on predefined bases and as such may miss the intrinsic information of the image. BEMD addresses this problem by analyzing the signal adaptively, but lacks spatial treatment because the decomposition is performed on vectorized versions of the images. The proposed 2D-BEMD sufficiently addresses this problem by analyzing the two images simultaneously in the space domain.

5.3.2 Results on partially focused photos

Figure 5.5 shows two examples of partially defocused images from a camera and the 2D-BEMD reconstruction results. Despite the complexity of the depicted scenes, the adaptive nature of the decomposition process manages to capture the finest detail from both images. High frequency characteristics that are present in the two partially defocused images are preserved by the decomposition and depicted in the fused result. In addition, even though both images are crowded and the transitions from foreground to background blurring are among objects that physically overlap, the 2D-BEMD reconstruction exhibits clear transitions from one object to the other.

[Figure panels: Image A and Image B, each shown foreground blurred, background blurred, and as the 2D-BEMD reconstruction.]

Figure 5.5: 2D-BEMD fusion results on two partially defocused sets of images.

A further comparison of BEMD and 2D-BEMD is shown in Figure 5.6, where a zoomed version of the fused reconstruction is provided for comparison. It is clear from the fusion detail that, without further refinements, vectorization in the case of BEMD introduces artifacts. On the contrary, 2D-BEMD manages to transfer details from both images in an artifact-free way.

5.4 Conclusion

This chapter presented the use of 2D-BEMD for image fusion to achieve a unified representation while retaining spatial information at each scale. The algorithm decomposes two images simultaneously to provide common oscillatory modes, which benefits image fusion, where the purpose is to reconstruct an image that gathers details from both sources.


[Figure panels: Background Focus; Foreground Focus; 1D-BEMD Fusion; 2D-BEMD Fusion; 1D-BEMD Fusion Details; 2D-BEMD Fusion Details.]

Figure 5.6: Examples of BEMD and 2D-BEMD based fusion results. The input images are partially defocused (background versus foreground), while the reconstructed (all in focus) image in the BEMD case exhibits significant artifacts.

The algorithm was tested on defocused and partially blurred images, to demonstrate its fusion power over traditional methods and BEMD, in terms of MSE and visual quality.

Chapter 6

Expression Invariant Face

Recognition via 2D-BEMD

In this chapter an expression-invariant automatic face recognition system is proposed. 2D-BEMD is used for the simultaneous decomposition of two face images, in order to apply multi-level information fusion for expression transformation. With the help of the expression transformation, within-class variation can be learned better, and the proposed method shows an improvement over the traditional PCA method.

6.1 Introduction

Despite the significant advances of face recognition over the past two decades, the per-

formance of most algorithms degrades under severe expression variation. Even though

individuals can be enrolled to the biometric system under a desired facial expression (typ-

ically neutral), there is no guarantee that during the recognition mode of operation, the

subject will present his/her face under the same expression. Supervised learning, such as

linear discriminant analysis (LDA) is capable of addressing this problem, by learning the

morphological variations of a particular subject. This solution, however, requires that the

biometric is presented for training under all its variants i.e., under all facial expressions.


Despite the fact that this is impractical for real life systems, the performance may still not improve, because of the very high dimensional representation of face images compared with the relatively small number of training samples.

6.2 Related Works

In [36] the authors discuss the problem of face recognition from one sample image per

subject, and present a review of works done under the illumination, pose, and expression

variation problem. The most important methods proposed to deal with the problem of

recognizing expression-variant faces from one sample image per person include the local

eigenspace approach [37], separation of texture and shape features of the face [38], and

the use of the tensorface concept to transform non-neutral expressions to neutral [39].

All of these methods suffer from some or all of the following shortcomings:

• The expression of the probe image is required to be determined.

• The probe image is required to be warped to all the gallery images.

• Facial landmark points of the stored and probe images are required to be selected to fit 2D triangulated meshes.

These requirements result in a time-consuming and error-prone process [39].

In this chapter we propose a solution based on expression transformation with the

2D-BEMD. Among the strengths of this method is that the expression of the probe does

not need to be determined, while a subject can be enrolled to the system using only one

image. This image is then used to synthesize new expressions for the enrollee, so that

the classifier can learn intra-class variability. With this treatment, a probe image with an arbitrary expression can be recognized by the system.


6.3 Expression Transformation with 2D-BEMD

In order to transform expressions while retaining the anatomical properties of the face, the expression-related information needs to be separated first. After that, we can replace the expression-related information in a face image with that of the target expression, by applying 2D-BEMD to the face image and the expression mask and fusing information at different levels.

6.3.1 Expression mask

Under the assumption that local oscillations are related to particular expressions, we

propose the design of expression masks i.e., masks of superimposed oscillations that are

significant in differentiating expressions, and which do not carry subject-specific infor-

mation. To this end, a first step in the analysis of facial expressions, is to identify the

oscillatory modes that are descriptive for every expression. We herein deal with the

problem of expression transformation, among 6 emotions (happy, surprise, fear, disgust,

angry and sad). An expression mask is designed according to the following steps:

1. 2D-EMD analysis on images of the expression subset x.

2. Estimation of the within class variability among corresponding IMFs.

3. Identification of IMF level i with lowest intra expression variability.

4. Mask construction by averaging of ith IMFs (low-pass filtering to remove inter-

subject dependence).

Figure 6.1 shows the designed expression masks. It is interesting to note that although the expression within each mask is recognizable, the identity information is absent.


[Figure panels: Happy Mask, Surprise Mask, Fear Mask, Disgust Mask, Angry Mask, Sad Mask.]

Figure 6.1: Expression masks used for decomposition with an input image of arbitrary

expression.

6.3.2 Expression transformation

The next step is to decompose an input image simultaneously with the mask of a targeted expression. The objective is to fuse the IMFs of the two in such a way that only the anatomical and subject-specific information is kept from the input side, while the expression is replaced with that of the mask. The 2D-BEMD algorithm proposed in Chapter 3 is utilized for this task, because it allows for a unified representation between the mask and the input image, while applying 2D-EMD separately on the mask and the input fails to do so.

This procedure is usually referred to as fusion via fission [30, 32]. Although facial

oscillations with EMD have been explored thoroughly [32, 40–42], there is no report on

its ability to separate the expression from a random face image. By treating the mask as

the targeted oscillation, and by decomposing it together with the input via 2D-BEMD,

we expect the targeted mode to exist among the IMFs of the input.

More precisely, given an input image I(m,n) and an expression mask M(m,n), we

form the complex image C(m,n)

C(m,n) = I(m,n) + jM(m,n) (6.1)

By applying 2D-BEMD on C(m,n) we get

C(m,n) = ∑_{i=1}^{L} I^i(m,n) + j ∑_{i=1}^{L} M^i(m,n)   (6.2)


[Figure panels: input image with its real IMFs 1 through 9; desired-expression mask with its imaginary IMFs 1 through 9; plot of correlation coefficient versus IMF index.]

Figure 6.2: 2D-BEMD analysis for an input image and a surprise mask image. The edges of the input are among the first few IMFs, while most of the information of the mask is found in IMFs 5 and 6.

Figure 6.2 shows an example of IMFs acquired from a 2D-BEMD analysis of an input

and a surprise mask.

For reconstruction, the real and imaginary IMFs are summed with weights to form a new image R(m,n), with the anatomical characteristics of the input and the expression of the mask. Weights are defined based on the correlation coefficient of the mask M(m,n) with its IMFs M^i(m,n), as follows:

wmask(i) = E[(M(m,n) − µM)(M^i(m,n) − µM^i)] / (σM σM^i)   (6.3)

Since the mask is constructed using a number of moderate oscillations, in the 2D-

BEMD analysis, the imaginary IMFs are expected to exhibit low correlation with low

order IMFs (corresponding to fastest oscillations). Therefore, the weighting function

relies on the correlation coefficient to minimize the effect of low order IMFs, while em-

phasizing the actual oscillations of the mask.

On the input side, an inverse treatment is required. It has been observed [32,42], that

low order IMFs carry most of the edge information in the image, which is directly related

to the anatomical properties of faces. Thus, automatically emphasizing this information

is crucial for accurate face recognition. In addition, the expression information of the

input needs to be suppressed. Based on these requirements, the following weighting


[Figure: expression mask weights and input weights, plotted as correlation coefficient versus IMF index (1 through 9).]

Figure 6.3: Weights used in fusion.

function is used for the IMFs of the input:

w_{input}(i) = 1 - w_{mask}(i)    (6.4)

Figure 6.3 shows how the weights are determined for the input and the mask IMFs

in Figure 6.2. The reconstructed image can then be computed as:

R(m,n) = \sum_{i=1}^{L} \left[ w_{input}(i) \, I^i(m,n) + w_{mask}(i) \, M^i(m,n) \right]    (6.5)
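A minimal sketch of this weighting and reconstruction (Equations 6.3 to 6.5), assuming the IMF lists produced by the joint decomposition sketched earlier; the helper names are illustrative:

    import numpy as np

    def mask_weight(M, M_i):
        # Eq. 6.3: correlation coefficient between the mask and one of its IMFs
        m = (M - M.mean()).ravel()
        mi = (M_i - M_i.mean()).ravel()
        return np.mean(m * mi) / (M.std() * M_i.std())

    def reconstruct(I_imfs, M_imfs, M):
        # Eqs. 6.4-6.5: weighted fusion of input and mask IMFs
        R = np.zeros_like(M, dtype=float)
        for I_i, M_i in zip(I_imfs, M_imfs):
            w_mask = mask_weight(M, M_i)  # emphasizes the mask's oscillations
            w_input = 1.0 - w_mask        # Eq. 6.4: emphasizes the input's edges
            R += w_input * I_i + w_mask * M_i
        return R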

Figure 6.4 shows examples of expression reconstructions for a number of images, along with the ground truth, i.e., the actual image of the person displaying the target expression.

6.4 Experimental setup and results

To evaluate the performance of the proposed expression invariant face recognition system, we used the Cohn-Kanade database [43], which is currently the most comprehensive facial expression database. This database contains video sequences from 97 people, each performing a series of 1 to 9 facial expressions.


Figure 6.4: Examples of expression transformation with 2D-BEMD. From top to bottom:

input images, expression masks, transformed faces and ground truth images.

Figure 6.5: The enrollment pipeline. Every gallery image to be enrolled is first used to synthesize 6 expression variants.

Figure 6.6: Verification rate versus false acceptance rate, for gallery of size 10.

Figure 6.7: Verification rate versus false acceptance rate, for gallery of size 20.

Every video sequence starts from a neutral expression and ends at a target expression. We selected the last frame, which corresponds to the most intense expression, of the video sequences displaying any of the following six expressions: happy, surprise, fear, disgust, anger and sadness. We also selected the first frame of one of the video sequences of each person as the neutral expression.

Figure 6.8: Verification rate versus false acceptance rate, for gallery of size 40.

The resulting dataset consists of 500 images of 96 people, where each person has at least two different expressions. We aligned the images using the eye center coordinates, masked them, and cropped them to 273 by 204 pixels. The final images are normalized to have zero mean and unit variance.
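The final normalization step can be sketched as follows (only the zero-mean, unit-variance normalization is specified in the text; alignment and cropping are performed beforehand):

    import numpy as np

    def normalize_face(img):
        # Normalize an aligned, masked and cropped face image
        # to zero mean and unit variance.
        img = img.astype(float)
        return (img - img.mean()) / img.std()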

For face recognition, we consider the following scenario. There is a gallery of neutral images with one image per subject, and a probe set with images of random (non-neutral) expressions. For every image in the gallery set, we synthesize 6 additional expression images using the proposed method. The reconstructed images, along with the gallery image, are used to train the LDA classifier [44]. Figure 6.5 shows the block diagram of the training phase for this system. The resulting discriminant projection directions are used to extract the expression-invariant features of the gallery and probe images for verification.
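A sketch of this enrollment phase, using scikit-learn's LDA as a stand-in for the classifier of [44]; synthesize_expression is a hypothetical wrapper around the fusion procedure described earlier in this chapter:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def enroll(gallery, masks, synthesize_expression):
        # gallery: list of (subject_id, neutral_image) pairs
        # masks: the 6 expression masks
        X, y = [], []
        for subject_id, img in gallery:
            X.append(img.ravel())
            y.append(subject_id)
            for mask in masks:  # 6 synthesized expression variants
                X.append(synthesize_expression(img, mask).ravel())
                y.append(subject_id)
        lda = LinearDiscriminantAnalysis()
        lda.fit(np.asarray(X), np.asarray(y))  # learn discriminant directions
        return lda  # lda.transform() extracts expression-invariant features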

Figures 6.6, 6.7 and 6.8 show the ROC curves for the proposed method and the eigenface method, for performance comparison. For the implementation of the eigenface

method, the PCA coefficients were weighted by their corresponding eigenvalues, as this

improves the recognition performance under expression variation [12]. The number of

eigenfaces that correspond to 99% of the eigenvalue energy was used. The performance

is reported for three different gallery sizes, i.e., 10, 20 and 40. In each case, gallery images

were randomly selected out of 96. The random selection procedure was repeated such

that almost all pairs of images in the dataset were matched against each other. The

cosine distance was used as the similarity measure. Figures 6.6 to 6.8 demonstrate

the robustness of the system to changes of the gallery size.
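For completeness, the cosine similarity used for matching can be sketched as follows (an illustrative helper, applied to the LDA features of a gallery and a probe image):

    import numpy as np

    def cosine_similarity(f_gallery, f_probe):
        # A claim is accepted when the score exceeds the threshold
        # corresponding to the operating false acceptance rate.
        return np.dot(f_gallery, f_probe) / (
            np.linalg.norm(f_gallery) * np.linalg.norm(f_probe))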

6.5 Conclusion

This chapter presented the problem of expression invariant face recognition, under the

scenario of one sample image per gallery subject. We advocate that within-subject expression variability can be learned by the classifier, using synthesized expressions. To that

end, we employ the proposed 2D-BEMD, in order to analyze and separate the anatomical

features of the face from the expression. Expression masks of particular oscillatory activ-

ity are designed to be decomposed simultaneously with the gallery image. A fusion via

fission approach is adopted, to synthesize a new image, where only the subject-specific

information is retained from the gallery, while the expression characteristics are passed

from the mask. When the gallery and its synthesized expressions are used to train the

LDA, significant recognition improvement is observed.

Chapter 7

Conclusion

7.1 Summary

In this thesis, a framework for obtaining unified signal representation using empirical

mode decomposition was proposed. The representation or “uniqueness” problem was

investigated in both the one-dimensional and two-dimensional cases, showing that results

from state of the art algorithms are far from satisfactory.

For the one-dimensional case, current algorithms are capable of obtaining a unified representation for a pair, a triplet, or even a collection of signals. The problem is that the computational complexity grows as the number of signals increases. Furthermore, decomposing a new signal requires re-decomposing all the signals in the collection, which is not practical. To address this problem, we proposed reference EMD (R-EMD), which incorporates a set of reference signals to decompose each signal in the collection, thus obtaining a unified representation for all signals. We also demonstrated that the reference frequencies need to be designed according to certain rules, with small variation allowed, determined by the signal properties.

The proposed R-EMD algorithm was applied in decomposing TEOAE signals in a

biometric system. The unified representation across all the subjects in the system makes


it possible to apply a multi-level correlation score fusion in a systematic way, which is

not feasible by applying any of the current EMD algorithms.

For the two-dimensional case, EMD-related algorithms are only capable of decomposing one image at a time, resulting in a non-unified representation even for a pair of signals. We addressed this problem by introducing two dimensional bivariate EMD (2D-BEMD), which decomposes two images as a complex pair. By doing so, a unified representation can be obtained, with the same number of levels for both images and correspondence between IMFs at the same level.

The proposed 2D-BEMD algorithm was applied in multiresolution image fusion and

expression invariant face recognition. In both applications, 2D-BEMD was used for

information fusion from two images (multi-focus images for the former, face images with different expressions for the latter), which requires a unified representation for both images.

In addition to the 2D-BEMD algorithm, a 2D-R-EMD was also discussed as an extension to more than two images. Although no application was presented, 2D-R-EMD completes the proposed framework.

7.2 Future Directions

As extensions of the thesis work, there are two major topics.

1. To complete the proposed framework, the following work needs to be done:

• 2D-R-EMD needs to be implemented. This algorithm will find application in multichannel data fusion for images.

• For R-EMD, since a reference signal is used, a certain degree of adaptiveness is sacrificed. A quantified measure of the tradeoff between adaptiveness and unified representation would be beneficial.


2. In terms of applications, these are possible directions:

• For using TEOAE for biometric recognition, a dataset with more subjects needs to be collected. Once the dataset is ready, the empirical weights used for combining correlations in the proposed method can be determined through a training procedure. In addition, other operation modes of a biometric system, such as authentication and intruder tests, need to be investigated. Moreover, in order for TEOAE to be established as a biometric modality, its short term and long term variability need to be studied, to justify why this signal is individually unique.

• For image fusion related applications, the proposed method needs to be tested on a broader range of applications, such as the fusion of visual and thermal images. In addition, 2D-BEMD can be combined with more sophisticated fusion schemes to improve the performance.

• For expression invariant face recognition, we need to compare the proposed system with other state-of-the-art methods. Furthermore, instead of assigning a weight to the entire IMF, block-wise weighting can be used to fuse information according to a local rather than global criterion.

With the applications and promising results presented in this thesis, we believe that the proposed framework will benefit many applications in signal and image processing, as well as provide a better understanding of the EMD algorithm.

Appendix A

Demonstration of the sifting process

In order to gain better insight into how the sifting process acts as the core of the EMD algorithm and how the IMFs are extracted, in this appendix a simple synthesized signal is decomposed and every major step of the sifting iterations is shown. As presented in Chapter 2, the EMD algorithm consists of two major loops: the outer loop identifies the IMF and updates the residue, and the inner loop extracts the IMF, which is done by the sifting process.

All plots except Figure A.1 in this appendix are generated using the EMD package by Flandrin [45]. The synthesized signal is shown in Figure A.1; it is generated by creating four uniformly distributed random time series of different lengths, interpolating each of them to the same length, and summing all four. Its IMFs and residue after applying EMD are shown in Figure A.2.

Figure A.1: Signal to be decomposed.

Figure A.2: IMFs and the residue after decomposition.

The first step is to interpolate the local maxima and minima of the original signal, in order to get an upper envelope and a lower envelope. These envelopes are shown as dashed lines in the top plot of Figure A.3. The mean envelope is computed as the average of the upper and lower envelopes, and is shown as the line in between the two. After subtracting the mean envelope from the signal, a candidate IMF is obtained, shown in the third plot of Figure A.3. Subtracting the candidate IMF from the original signal gives us the residue for the current iteration, as shown in the last plot of Figure A.3. In fact, this residue is equal to the mean envelope, but this is only true for the first iteration of the sifting process for every IMF, as we will see later.
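A minimal sketch of this step, assuming cubic spline interpolation of the extrema as in [1]; boundary handling is omitted and the helper names are illustrative:

    import numpy as np
    from scipy.interpolate import CubicSpline
    from scipy.signal import argrelextrema

    def envelopes(x):
        # Upper/lower envelopes via cubic splines through the local extrema
        # (assumes enough extrema exist; end effects are ignored).
        n = np.arange(len(x))
        maxima = argrelextrema(x, np.greater)[0]
        minima = argrelextrema(x, np.less)[0]
        env_u = CubicSpline(maxima, x[maxima])(n)
        env_l = CubicSpline(minima, x[minima])(n)
        return env_u, env_l

    def sift_once(x):
        # One sifting iteration: subtract the mean envelope from the signal.
        env_u, env_l = envelopes(x)
        mean_env = (env_u + env_l) / 2.0
        return x - mean_env, mean_env  # candidate IMF and mean envelope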

Figure A.3: Sifting iteration 0 for Candidate IMF 1. From top to bottom: the signal with its envelopes before sifting, the stop parameter, the candidate IMF after sifting, and the residue.

The stop parameter is checked according to Equation 2.1.4. In the second plot of each figure, the solid line is the amplitude ratio between the upper and lower envelopes, |env_u(n) + env_l(n)| / |env_u(n) - env_l(n)|, and the dashed line is δ1. The following parameters are used: δ1 = 0.05, δ2 = 0.5, δt = 0.05.
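Under these definitions, the stopping rule can be sketched as follows (a simplified reading of the criterion in Equation 2.1.4; the exact handling in [45] may differ):

    import numpy as np

    def sifting_stops(env_u, env_l, d1=0.05, d2=0.5, dt=0.05):
        # The amplitude ratio must stay below d1 on at least a (1 - dt)
        # fraction of the signal, and below d2 everywhere.
        ratio = np.abs(env_u + env_l) / np.abs(env_u - env_l)
        return np.mean(ratio > d1) < dt and np.all(ratio < d2)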

Since large values exist for the envelope amplitude ratio, as can be seen in the second plot in Figure A.3, the stop criterion is not satisfied, so the sifting process needs to continue. By interpolating the local maxima and minima of the residue in Figure A.3, upper and lower envelopes are constructed, as shown in the first plot in Figure A.4. Again, we remove the mean envelope to get a candidate IMF 1 and test whether the stop criterion is satisfied. Note that here the residue is the original signal from iteration 0 minus the candidate IMF 1, so it is equal to the sum of the two mean envelopes from iterations 0 and 1.

Figure A.4: Sifting iteration 1 for Candidate IMF 1.

Since the stop criterion is not satisfied, the sifting process continues. Figure A.5 shows iteration 10, where the stop parameter is small enough that the candidate IMF 1 can be considered a valid IMF. This is where the sifting process stops for IMF 1. Now the termination criterion is checked to see if the current residue is the final residue for the signal. Since the residue has more than 3 local extrema, it is not the final residue; IMF 1 is then removed from the original signal from iteration 0, and the residue acts as the input to the next step, where IMF 2 will be extracted.

Figure A.5: Sifting iteration 10 for Candidate IMF 1.

Upper and lower envelopes are constructed on the current signal, which is the residue after extracting IMF 1. Now candidate IMF 2 is the current signal minus the mean envelope. Similar to the previous iterations for IMF 1, the stop parameter is checked to see if candidate IMF 2 is a valid IMF. Since it is not, the sifting process continues on the residue from this iteration. Similar to the last sifting iteration of IMF 1, at iteration 10 the IMF criterion is satisfied, so candidate IMF 2 is considered valid. This is where the sifting process stops for IMF 2.

Figure A.6: Sifting iteration 0 for Candidate IMF 2.

Now the termination criterion is checked to see if the current residue is the final residue for the signal. Since the residue has more than 3 local extrema, it is not the final residue. By removing IMF 2 from the signal at iteration 0, we get the residue, which is passed on to the next level.

From Figure A.8 to Figure A.11 we show some of the major steps of extracting the remaining IMFs, up to the final residue. Note that in Figure A.11, after the stop criterion is checked to validate the candidate IMF 6, the termination criterion is checked on the residue to determine that it can no longer be decomposed, so it is considered the final residue of the signal. This is where the entire algorithm stops.
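Putting these pieces together, the two nested loops can be sketched as follows, reusing the envelopes and sifting_stops helpers from the earlier snippets; the termination test mirrors the extrema count described above:

    import numpy as np
    from scipy.signal import argrelextrema

    def count_extrema(x):
        return (len(argrelextrema(x, np.greater)[0])
                + len(argrelextrema(x, np.less)[0]))

    def emd(signal, max_sift=100):
        # Outer loop: extract IMFs until the residue has at most 3 extrema.
        imfs, residue = [], signal.astype(float).copy()
        while count_extrema(residue) > 3:
            candidate = residue.copy()
            for _ in range(max_sift):  # inner loop: sifting
                env_u, env_l = envelopes(candidate)
                if sifting_stops(env_u, env_l):
                    break
                candidate = candidate - (env_u + env_l) / 2.0
            imfs.append(candidate)
            residue = residue - candidate  # input to the next level
        return imfs, residue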

Figure A.7: Sifting iteration 10 for Candidate IMF 2.

Figure A.8: Sifting iteration 0 for Candidate IMF 3.

Figure A.9: Sifting iteration 7 for Candidate IMF 3.

Figure A.10: Sifting iteration 0 for Candidate IMF 6.

Figure A.11: Sifting iteration 3 for Candidate IMF 6.

Appendix B

TEOAE dataset

In order to evaluate the proposed method for using transient evoked otoacoustic emission (TEOAE) signals for biometric recognition, a dataset was collected at the Biometric Security Laboratory, University of Toronto. This is the first TEOAE dataset with a moderate number of subjects, created specifically for biometric evaluation purposes.

B.1 Data collection setup

The signal collection sessions were carried out at the Biometric Security Laboratory [46], under University of Toronto protocol #23018. The Vivosonic Integrity [47] system was used with the settings listed in Table 4.1.

Since the TEOAE signal is highly nonlinear and has weak amplitude, raw response data are subjected to pre-processing before they can be collected as output, which is standard in the nonlinear protocol for TEOAE recording. In this protocol, a train of stimuli consisting of identical series of four clicks is sent into the ear. Here we distinguish responses at four different pre-processing stages:

• Per-sweep response

Each stimulus is an 80 µs wide click sound, called a click or a sweep, as depicted in Figure B.1. The per-sweep response is the response collected at the outer ear canal for a duration of 17.2 ms after the stimulus. This response is not available for output in the Integrity system we used.

• Per-series response

Each series of stimuli consists of four stimuli: three identical ones and one with opposite polarity and three times the amplitude. The per-series response is the average of these four responses, which cancels out the linear components of the per-sweep responses. This response is not available for output in the Integrity system we used.

• Per-buffer response

In our setting, per-series responses are divided into groups of 16, with responses 1, 3, 5, 7, 9, 11, 13, 15 going into buffer A and responses 2, 4, 6, 8, 10, 12, 14, 16 going into buffer B. The per-buffer response is the average of 8 per-series responses, and is the one we can collect using the Integrity system. An example of such a response is shown in Figure B.2. To measure whether a stable TEOAE has been detected, the per-buffer responses from A and B are filtered and their correlation is taken as the whole wave reproducibility (WWR); a sketch of this computation is given after this list.

• Buffer averaged response

Finally, the two per-buffer responses from the same group are averaged to get the buffer averaged response. An example of such a response is shown in Figure B.3.

Note that these responses have a low-frequency trend of high amplitude, so that the structure of the signal cannot be observed. Figure B.4 shows the same buffer averaged response as in Figure B.3, with the trend removed for better visual quality.
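As a sketch of the WWR computation described above; the fourth-order band-pass filter and its 500-5000 Hz range are assumptions for illustration, since the Integrity system's internal filtering is not documented here:

    import numpy as np
    from scipy.signal import butter, filtfilt

    def wwr(buffer_a, buffer_b, fs, band=(500.0, 5000.0)):
        # Filter both per-buffer responses, then take their correlation
        # coefficient (in percent) as the whole wave reproducibility.
        b, a = butter(4, band, btype="band", fs=fs)
        fa = filtfilt(b, a, buffer_a)
        fb = filtfilt(b, a, buffer_b)
        return 100.0 * np.corrcoef(fa, fb)[0, 1]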

Each time, three signals are recorded: the stimulus, the per-buffer response in buffer A, and the per-buffer response in buffer B.

Figure B.1: Example of a stimulus.

Figure B.2: Example of two per-buffer responses from the same group.

Figure B.3: Example of a buffer averaged response.

Figure B.4: Example of a buffer averaged response with the low-frequency trend removed.

The length of a recording session depends on how fast the response stabilizes, which varies with the subject, body condition, body movement, ear bud fitting, environmental noise, etc. The person operating the Integrity device monitors the whole wave reproducibility (WWR) measure on the system and stops recording whenever the WWR is high enough (above 90%) or saturates at some particular value. The stabilization times of all the recordings, including those from outliers, are summarized in Figure B.5.

Figure B.5: Histogram of stabilization time for recordings from all subjects, including outliers. Mean = 39.246 s, std = 24.3503 s.

B.2 Intra-subject similarity and inter-subject difference

After applying the proposed R-EMD, intra-subject similarity and inter-subject difference can be observed in the decomposed signals. Figure B.6 shows 4-level decomposed recordings from the left ear of subject 3, with solid lines representing the first session and dashed lines the second session. In Figure B.7, the same first session recording is plotted with solid lines, together with the decomposed second session recording from the left ear of subject 2 in dashed lines.

Figure B.6: Similarity of TEOAE recordings from the same subject after applying R-EMD.

B.3 Outlier removal

The highest WWRs from the left and right ears in both sessions for all subjects are summarized in Figure B.8. We removed the recordings of subject 3, since a wrong testing protocol was used by mistake in session one. A total of 6 outliers were also removed from the dataset because their WWRs were too low compared to the clinical standard of 85% [48]. The subject IDs of the outliers are 23, 37, 40, 44, 55 and 60, and their WWRs are listed in Table B.3.

Figure B.7: Difference of TEOAE recordings from different subjects after applying R-EMD.

Figure B.8: Histogram of WWR for recordings from all subjects, including outliers. Mean = 86.6228, std = 12.2601.

Table B.3: WWRs (%) of the outlier subjects.

Subject ID   Left Session 1   Left Session 2   Right Session 1   Right Session 2
    23            79.82            81.84             80.74             67.87
    37             8.12            29.39             81.81             83.04
    40            22.91            32.91             56.30             58.58
    44            48.52            71.58             22.84             24.17
    55            38.04            40.80             62.61             72.50
    60            63.70            32.78             80.78             68.36

Bibliography

[1] N.E. Huang, Z. Shen, R.R. Long, M.L. Wu, Q. Zheng, N.C. Yen, and C.C. Tung. The empirical mode decomposition and Hilbert spectrum for nonlinear and nonstationary time series analysis. Proc. Roy. Soc. London, 454:903–995, 1998.

[2] K.T. Coughlin and K.K. Tung. 11-year solar cycle in the stratosphere extracted by the empirical mode decomposition method. Advances in Space Research, 34(2):323–329, 2004.

[3] Zhaohua Wu, Edwin K. Schnieder, and Zeng-zhen Hu. The impact of global warming on ENSO variability in climate records. Transform, 2001.

[4] Binwei Weng, M. Blanco-Velasco, and K.E. Barner. ECG denoising based on the empirical mode decomposition. In Engineering in Medicine and Biology Society, 2006. EMBS '06. 28th Annual International Conference of the IEEE, pages 1–4, 2006.

[5] Pengfei Wei, Qiuhua Li, and Guanglin Li. Classifying motor imagery EEG by empirical mode decomposition based on spatial-time-frequency joint analysis approach. In International Conference on Future BioMedical Information Engineering (FBIE 2009), pages 489–492, Dec. 2009.

[6] B. Liu, S. Riemenschneider, and Y. Xu. Gearbox fault diagnosis using empirical mode decomposition and Hilbert spectrum. Mechanical Systems and Signal Processing, 20(3):718–734, 2006.

[7] Kousik Guhathakurta, Indranil Mukherjee, and A. Roy Chowdhury. Empirical mode decomposition analysis of two different financial time series and their comparison. Chaos, Solitons and Fractals, 37(4):1214–1227, 2008.

[8] E. Delechelle, J. Lemoine, and Oumar Niang. Empirical mode decomposition: an analytical approach for sifting process. IEEE Signal Processing Letters, 12(11):764–767, Nov. 2005.

[9] G. Rilling and P. Flandrin. One or two frequencies? The empirical mode decomposition answers. IEEE Transactions on Signal Processing, 56(1):85–95, Jan. 2008.

[10] P. Flandrin, G. Rilling, and P. Gonçalves. Empirical mode decomposition as a filter bank. IEEE Signal Processing Letters, 11(2):112–114, Feb. 2004.

[11] N.E. Huang and S.S. Shen. Hilbert-Huang Transform and Its Applications, volume 5. World Scientific Publishing Co. Pte. Ltd., 2005.

[12] H. Mohammadzade, F. Agrafioti, Jiexin Gao, and D. Hatzinakos. BEMD for expression transformation in face recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1501–1504, May 2011.

[13] F. Agrafioti, Jiexin Gao, H. Mohammadzade, and D. Hatzinakos. A 2D bivariate EMD algorithm for image fusion. In 17th International Conference on Digital Signal Processing (DSP), pages 1–6, July 2011.

[14] Jiexin Gao, F. Agrafioti, S. Wang, and D. Hatzinakos. Transient otoacoustic emissions for biometric recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012.

[15] Jiexin Gao and D. Hatzinakos. Effect of initial phase in two tone separation using empirical mode decomposition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012.

[16] J.C. Nunes, Y. Bouaoune, E. Delechelle, O. Niang, and Ph. Bunel. Image analysis by bidimensional empirical mode decomposition. Image and Vision Computing, 21(12):1019–1026, 2003.

[17] Anna Linderhed. 2D empirical mode decompositions in the spirit of image compression. In Proc. SPIE, volume 4738, pages 1–8, 2002.

[18] S.M.A. Bhuiyan, R.R. Adhami, and J.F. Khan. Fast and adaptive bidimensional empirical mode decomposition using order-statistics filter based envelope estimation. EURASIP Journal on Advances in Signal Processing, 2008.

[19] G. Rilling, P. Flandrin, P. Gonçalves, and J.M. Lilly. Bivariate empirical mode decomposition. IEEE Signal Processing Letters, 14(12):936–939, Dec. 2007.

[20] N. ur Rehman and D.P. Mandic. Empirical mode decomposition for trivariate signals. IEEE Transactions on Signal Processing, 58(3):1059–1068, March 2010.

[21] N. Rehman and D.P. Mandic. Multivariate empirical mode decomposition. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science, 2009.

[22] Matthew A. Swabey, Paul Chambers, Mark E. Lutman, Neil M. White, John E. Chad, Andrew D. Brown, and Stephen P. Beeby. The biometric potential of transient otoacoustic emissions. Int. J. Biometrics, 1:349–364, March 2009.

[23] Jun Yao and Yuan-Ting Zhang. Bionic wavelet transform: a new time-frequency method based on an auditory model. IEEE Transactions on Biomedical Engineering, 48(8):856–863, Aug. 2001.

[24] Ling Zheng, Yuan-Ting Zhang, Fu-Sheng Yang, and Da-Tian Ye. Synthesis and decomposition of transient-evoked otoacoustic emissions based on an active auditory model. IEEE Transactions on Biomedical Engineering, 46(9):1098–1106, Sept. 1999.

[25] Yael Raz. Otoacoustic Emissions: Clinical Applications, 3rd edition, Martin S. Robinette and Theodore J. Glattke, eds., New York: Thieme Medical Publishers, 2007. The Laryngoscope, 117(9):1700–1700, 2007.

[26] D.A. Socolinsky and L.B. Wolff. Image fusion for enhanced visualization of brain imaging. San Diego, CA, Feb. 1999.

[27] D.A. Socolinsky and L.B. Wolff. Multispectral image visualization through first-order fusion. IEEE Transactions on Image Processing, 11(8):923–931, Aug. 2002.

[28] D.A. Fay, A.M. Waxman, M. Aguilar, D.B. Ireland, J.P. Racamato, W.D. Ross, W.W. Streilein, and M.I. Braun. Fusion of multi-sensor imagery for night vision: color visualization, target learning and search. In Proceedings of the Third International Conference on Information Fusion, volume 1, July 2000.

[29] L.P. Yaroslavsky, B. Fishbain, A. Shteinman, and S. Gepshtein. Processing and fusion of thermal and video sequences for terrestrial long range observation systems. In 7th Annual International Conference on Information Fusion, pages 848–855, 2004.

[30] D. Looney and D.P. Mandic. Fusion of visual and thermal images using complex extension of EMD. In Second ACM/IEEE International Conference on Distributed Smart Cameras, pages 1–8, Sept. 2008.

[31] D. Looney and D.P. Mandic. Multiscale image fusion using complex extensions of EMD. IEEE Transactions on Signal Processing, 57(4):1626–1630, Apr. 2009.

[32] H. Hariharan, A. Koschan, B. Abidi, A. Gribok, and M. Abidi. Fusion of visible and infrared images using empirical mode decomposition to improve face recognition. In IEEE International Conference on Image Processing, pages 2049–2052, Oct. 2006.

[33] N. Rehman, D. Looney, T.M. Rutkowski, and D.P. Mandic. Bivariate EMD-based image fusion. In IEEE/SP 15th Workshop on Statistical Signal Processing, pages 57–60, Aug. 2009.

[34] X. Xu, H. Li, and A.N. Wang. The application of BEMD to multispectral image fusion. In Proc. International Conference on Wavelet Analysis and Pattern Recognition, pages 448–452, 2007.

[35] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.

[36] X. Tan, S. Chen, Z.H. Zhou, and F. Zhang. Face recognition from a single image per person: A survey. Pattern Recognition, 39(9):1725–1745, 2006.

[37] A.M. Martinez. Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(6):748–763, June 2002.

[38] X. Li, G. Mori, and H. Zhang. Expression-invariant face recognition with expression classification. In Proceedings of the 3rd Canadian Conference on Computer and Robot Vision, page 77, Washington, DC, USA, 2006.

[39] H.S. Lee and D. Kim. Expression-invariant face recognition by facial expression transformations. Pattern Recognition Letters, 29(13):1797–1805, 2008.

[40] D. Zhang and Y. Tang. Extraction of illumination-invariant features in face recognition by empirical mode decomposition. In M. Tistarelli and M. Nixon, editors, Advances in Biometrics, volume 5558 of Lecture Notes in Computer Science, pages 102–111. Springer Berlin / Heidelberg, 2009.

[41] Y.L. Liu, X.G. Xu, Y.W. Guo, J. Wang, X. Duan, X. Chen, and Q.S. Peng. Pores-preserving face cleaning based on improved empirical mode decomposition. Journal of Computer Science and Technology, 24:557–567.

[42] C. Qing, J. Jiang, and Z. Yang. Empirical mode decomposition-based facial pose estimation inside video sequences. Optical Engineering, 49(3):037401, 2010.

[43] T. Kanade, J.F. Cohn, and Yingli Tian. Comprehensive database for facial expression analysis. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pages 46–53, 2000.

[44] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711–720, July 1997.

[45] http://perso.ens-lyon.fr/patrick.flandrin/emd.html

[46] http://www.comm.utoronto.ca/~biometrics/

[47] http://www.vivosonic.com/

[48] Angela Constance Garinis. Efferent Control of the Human Auditory System. PhD thesis, Department of Speech, Language and Hearing Sciences, The University of Arizona, 2008.