Quaternion-Valued Nonlinear Adaptive
Filters
Prepared by Che Ahmad Bukhari bin Che Ujang
Supervised by Prof. Danilo P. Mandic
A thesis submitted in fulfilment of requirements for the degree of Doctor of Philosophy and Diploma of Imperial College London
Communications and Signal Processing Group
Department of Electrical and Electronic Engineering
Imperial College London
2012
Abstract
Advances in vector sensor technology have created a need for adaptive nonlinear signal
processing in the quaternion domain. The main concern of this thesis lies in the issue of
analyticity of quaternion-valued nonlinear functions. The Cauchy-Riemann-Fueter (CRF)
conditions, which determine analyticity in the quaternion domain, have proved too strict
to be of any practical use. To circumvent this problem, split-quaternion nonlinear
functions, which are analytic componentwise, are commonly employed. However, these
functions do not fully capture the correlations between dimensions and are therefore not suitable for
real-world applications. To address this, the use of fully quaternion nonlinear functions in
the derivation of a completely new class of algorithms which takes into consideration the
non-commutative aspect of quaternion product is proposed. These fully quaternion func-
tions satisfy the local analyticity condition (LAC) that guarantees the first-order differen-
tiability of the function. This provides a unifying framework for the derivation of gradient
based learning algorithms in the quaternion domain which are shown to have the same
generic form as their real- and complex-valued counterparts. Unlike existing approaches,
this new class of algorithms derived is suitable for the processing of signals with strong
component correlations and is further extended to the recurrent neural network (RNN)
architecture. Novel algorithms are also derived to improve the computational complexity
of quaternion-valued adaptive filters which could be easily extended to incorporate non-
linear functions. A rigorous mathematical analysis provides a basis for the understanding
of the convergence and steady-state performance of the proposed algorithms. Simulations
over a range of synthetic and real-world signals support the approach taken in the thesis.
Acknowledgement
Firstly, I would like to express my deepest gratitude to my supervisor, Professor Danilo
Mandic for his guidance and patience. His utmost dedication to his work has set a good
motivation for me to complete my studies. Throughout the years, Danilo has supported
me in my research and has been patient with my shortcomings. He has given me ample
opportunity to grow as a researcher and it was a privilege working under him.
I would like to thank Dr. Clive Cheong Took for mentoring me throughout my
studies. Working together alongside Clive for all these years has taught me so many
things about academics and life. Clive is one of the best researchers I have ever had the
opportunity to know and it has been an honour working with him.
I am greatly indebted to my parents, Che Ujang Che Daud and Noraini Mat Noor,
and my siblings, Che Adam Rashid Che Ujang and Che Roselind Che Ujang, for their
continuous and unwavering support throughout. I would also like to thank my girlfriend,
Nik Nabilah Nik Amiruddin for her warm love during the cold months.
I am thankful to all my friends and colleagues in the EEE department especially
Zaid Omar, Mossaber Ahmed, Naveed Ur Rehman, Ammar Hassan, Bruce Leow, Hana
Fedora Abdul Aziz, Hussein al-Khattab, Lila Izhar, Pradeep Loganathan, Andy Khong,
David Looney, Cheolsoo Park and Xia Yi Li.
I would like to extend my gratitude to the Malaysia Ministry of Higher Education
(MOHE) and Universiti Putra Malaysia (UPM) for giving me the opportunity to further
my studies at Imperial College London. Last but not least, I would like to thank God as
without His blessings, nothing is possible.
Dedicated to Che Ujang Che Daud, Noraini Mat Noor, Che Adam Rashid Che Ujang,
Che Roselind Che Ujang and Nik Nabilah Nik Amiruddin
List of Publications
The following publications support the material given in this thesis.
Journal Publications:
1. B. Che Ujang, C. Cheong Took and D. P. Mandic, “Quaternion valued nonlinear
adaptive filtering”, IEEE Transactions on Neural Networks, vol. 22, no. 8, pp.
1193-1206, 2011.
2. B. Che Ujang, C. Cheong Took and D. P. Mandic, “Split quaternion nonlinear
adaptive filtering”, Neural Networks, vol. 23, no. 3, pp. 426-434, 2010.
3. B. Che Ujang, C. Cheong Took and D. P. Mandic, “Identification of improper quater-
nion processes by fractional tap-length adaptive filters”, submitted to IEEE Trans-
actions on Neural Networks and Learning Systems (special issue on learning in non-
stationary and evolving environments).
Conference Publications:
1. B. Che Ujang, C. Cheong Took and D. P. Mandic, “On quaternion analyticity:
enabling quaternion-valued adaptive filtering”, In Proceedings of IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), March 25-30,
2012, Kyoto, Japan.
2. B. Che Ujang, C. Cheong Took and D. P. Mandic, “Identification of improper pro-
cesses by variable tap-length complex valued adaptive filters”, In Proceedings of
International Joint Conference on Neural Networks (IJCNN), pp. 1-6, July 18-23,
2010, Barcelona, Spain.
3. B. Che Ujang, C. Cheong Took and D. P. Mandic, “A split quaternion nonlin-
ear adaptive filter”, In Proceedings of IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), pp. 1745-1748, April 19-24, 2009, Taipei,
Taiwan.
Contents
Abstract 2
Acknowledgement 3
List of Publications 5
Contents 7
List of Figures 10
List of Tables 12
Statement of Originality 13
Abbreviations 14
Mathematical Notations 16
Chapter 1. Introduction 18
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.2 Motivations and Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.3 Organisation of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Chapter 2. Background Theory 23
2.1 Adaptive Systems Configuration . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Quaternion Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Augmented Quaternion Statistics . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3.1 Cη-circular Quaternion Random Variables . . . . . . . . . . . . . . . 28
2.3.2 H-circular Quaternion Random Variables . . . . . . . . . . . . . . . 29
2.3.3 Augmented Second-Order Statistics of Quaternion Random Vectors 29
2.4 Analyticity in H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5 Review of Nonlinear Functions . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6 Quaternion-valued Adaptive Filtering . . . . . . . . . . . . . . . . . . . . . 35
2.6.1 Derivation of Quaternion Least Mean Square (QLMS) . . . . . . . . 36
2.6.2 Analysis of Quaternion Least Mean Square (QLMS) . . . . . . . . . 37
2.7 Introduction to Quaternion Kalman Filtering . . . . . . . . . . . . . . . . . 39
Chapter 3. A Class of Split Quaternion Nonlinear Adaptive Filters 42
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2 Derivation of Split Quaternion Algorithms . . . . . . . . . . . . . . . . . . . 43
3.2.1 Derivation of Quaternion-valued Finite Impulse Response algorithm 44
3.2.2 Derivation of the Split Quaternion Adaptive Filtering Algorithm
(SQAFA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2.3 Derivation of Adaptive Amplitude Split Quaternion Adaptive Fil-
tering Algorithm (AASQAFA) . . . . . . . . . . . . . . . . . . . . . 47
3.2.4 Convergence Analysis of SQAFA and AASQAFA . . . . . . . . . . . 48
3.3 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3.1 Four-dimensional Saito’s Chaotic Circuit . . . . . . . . . . . . . . . . 53
3.3.2 Wind Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Chapter 4. A Class of Quaternion Valued Nonlinear Adaptive Filters 59
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2 Fully Quaternion Functions in H . . . . . . . . . . . . . . . . . . . . . . . . 61
4.2.1 Quaternion Exponential Function . . . . . . . . . . . . . . . . . . . . 63
4.2.2 Local Analyticity of the Quaternion tanh Function . . . . . . . . . . 64
4.3 Derivation of Fully Quaternion Algorithms . . . . . . . . . . . . . . . . . . . 65
4.3.1 Derivation of Quaternion Nonlinear Gradient Descent (QNGD) . . . 66
4.3.2 Augmented Quaternion Nonlinear Gradient Descent (AQNGD) . . . 67
4.3.3 Convergence Analysis of QNGD and AQNGD . . . . . . . . . . . . . 68
4.4 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4.1 Linear AR (4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.4.2 Four-dimensional Saito’s Chaotic Circuit . . . . . . . . . . . . . . . . 72
4.4.3 Wind Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Chapter 5. Enabling Quaternion Valued Recurrent Neural Networks 82
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.2 Analysis of Quaternion-Valued Functions . . . . . . . . . . . . . . . . . . . 84
5.3 FCRNN Algorithms in H . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.3.1 Derivation of the Split Quaternion-valued RTRL . . . . . . . . . . . 86
5.3.2 Derivation of the Quaternion-Valued RTRL . . . . . . . . . . . . . . 89
5.4 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.4.1 Three-dimensional Lorenz Chaotic Signal . . . . . . . . . . . . . . . 91
5.4.2 Motion Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Chapter 6. Identification of Improper Quaternion Processes by Fractional
Tap-Length Algorithms 94
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.2 Model Order Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.2.1 Filter Weight Updates . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.2.2 Tap Length Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.3 Steady-State Analysis of FT Based Algorithms . . . . . . . . . . . . . . . . 99
6.4 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.4.1 Optimal Tap-Length . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.4.2 Modelling of Quaternion-Valued Systems . . . . . . . . . . . . . . . 107
6.4.3 Nonstationary Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Chapter 7. Conclusions and Future Works 112
7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
7.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Bibliography 116
Appendix A. Derivation of QLMS 124
Appendix B. Derivation of QMLP-FIR 126
Appendix C. Convergence of SQAFA 128
Appendix D. Convergence of AASQAFA 131
Appendix E. Analyticity of the exponential function e^q 134
Appendix F. Local Analyticity of tanh(q) 136
Appendix G. A Local Derivative of tanh(q) 140
Appendix H. Derivation of Split QRTRL 141
Appendix I. Derivation of QRTRL 143
List of Figures
2.1 Adaptive Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2 Linear adaptive finite impulse response (FIR) filter . . . . . . . . . . . . . . 36
3.1 Nonlinear adaptive finite impulse response (FIR) filter . . . . . . . . . . . . 44
3.2 Left: The 4D Saito Signal. Right: The 3D wind signal. . . . . . . . . . . . . 51
3.3 The performance of SQAFA, AASQAFA and QMLP on the prediction of
4D Saito’s Chaotic Signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.4 The performance of SQAFA, AASQAFA and QMLP on the prediction of
3D wind signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5 The performance of SQAFA, QFIR, CNGD and NGD on the prediction of
3D wind signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.6 Prediction gain of AASQAFA for the varying initial amplitude λ(0) and
step size ρ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.1 Learning curves for QMLP-FIR, AASQAFA, QNGD and AQNGD on the
prediction of linear AR (4) signal (4.39) driven by H-circular white Gaussian
noise. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 Learning curves for QMLP-FIR, AASQAFA, QNGD and AQNGD on the
prediction of linear AR (4) signal (4.39) driven by Ci-circular white Gaus-
sian noise. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.3 Learning curves for QMLP-FIR, AASQAFA, QNGD and AQNGD on the
prediction of linear AR (4) signal (4.39) driven by noncircular white Gaus-
sian noise. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.4 Noncircular signals used in simulations. Left: The 4D Saito Signal. Right:
The 3D wind signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.5 The performance of AQNGD, QNGD, AASQAFA and QMLP-FIR on the
prediction of the noncircular 4D Saito signal. . . . . . . . . . . . . . . . . . 76
4.6 The performance of AQNGD, QNGD, AASQAFA and QMLP-FIR on the
prediction of the noncircular 4D Saito signal over a range of filter lengths. . 77
4.7 The performance of AQNGD, QNGD, AASQAFA and QMLP-FIR on the
prediction of a 3D wind signal. . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.8 The performance of QNGD, QMLP and NGD on the prediction of a 3D
wind signal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.9 Prediction gains of QNGD for tan(q), sin(q), arctan(q), arcsin(q), sinh(q),
arctanh(q) and arcsinh(q) for the prediction of 3D wind signal. . . . . . . . 80
5.1 A fully connected recurrent neural network (FCRNN). . . . . . . . . . . . . 86
5.2 Phase space of Lorenz signal . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.3 The performance of QRTRL, split QRTRL and RTRL on the prediction of
motion data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.1 Hybrid filter structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.2 The steady-state MSE for the processes W1 and W2 with respect to tap-length. 104
6.3 The evolution of the optimal filter length parameter p and mixing parameter
λ for the modelling of the linear system W1. . . . . . . . . . . . . . . . . . . 105
6.4 The evolution of the optimal filter length parameter p and mixing parameter
λ for the modelling of the widely linear system W2. . . . . . . . . . . . . . . 106
6.5 The steady-state MSE for the process linear noncircular W1 with respect
to tap-length. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.6 The evolution of the optimal filter length parameter p for the modelling of
the system W1 for the intervals 1 ≤ n ≤ 3000, W2 for 3001 ≤ n ≤ 6000 and
noncircular W1 for 6001 ≤ n ≤ 9000 . . . . . . . . . . . . . . . . . . . . . . 109
6.7 The evolution of the mixing parameter λ for the modelling of the system
W1 for the intervals 1 ≤ n ≤ 3000, W2 for 3001 ≤ n ≤ 6000 and noncircular
W1 for 6001 ≤ n ≤ 9000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
List of Tables
3.1 Computational complexities of the algorithms . . . . . . . . . . . . . . . . . 56
4.1 Classes of Quaternion White Gaussian Noise . . . . . . . . . . . . . . . . . 70
4.2 Prediction Gain Rp for a Linear AR (4) Process With Varying Degree of
Noncircularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3 Computational complexities of the algorithms considered . . . . . . . . . . . 78
5.1 Correlation Coefficients Between Lorenz Attractors . . . . . . . . . . . . . . 91
6.1 Noncircular Quaternion White Gaussian Noise . . . . . . . . . . . . . . . . 108
Statement of Originality
This research is believed to be an original contribution of the author’s work in the field of
quaternion domain signal processing. Any idea or quotations from the works of other peo-
ple are fully acknowledged according to the standard referencing style practiced in the field.
As far as the author is aware, the following aspects of the thesis are believed to be
original contributions:
• Chapter 3: A Class of Split Quaternion Nonlinear Adaptive Filters
• Chapter 4: A Class of Quaternion Valued Nonlinear Adaptive Filters
• Chapter 5: Enabling Quaternion Valued Recurrent Neural Networks
• Chapter 6: Identification of Improper Quaternion Processes by Fractional
Tap-Length Algorithms
Abbreviations
LMS: Least Mean Square
CLMS: Complex Least Mean Square
NN: Neural Network
RNN: Recurrent Neural Network
BP: Backpropagation
RBP: Recurrent Backpropagation
CBP: Complex Backpropagation
CRTRL: Complex Real Time Recurrent Learning
RTRL: Real Time Recurrent Learning
3DV-BP: Three-Dimensional Vector Backpropagation
VP-BP: Vector Product Backpropagation
QMLP: Quaternion valued Multilayer Perceptron
QLMS: Quaternion Least Mean Square
FIR: Finite Impulse Response
NGD: Nonlinear Gradient Descent
SQAFA: Split Quaternion Adaptive Filtering Algorithm
AASQAFA: Adaptive Amplitude Split Quaternion Adaptive Filtering Algorithm
MSE: Mean Squared Error
FT: Fractional Tap-Length
QWGN: Quaternion White Gaussian Noise
QNGD: Quaternion Nonlinear Gradient Descent
AQNGD: Augmented Quaternion Nonlinear Gradient Descent
WL-QLMS: Widely Linear Quaternion Least Mean Square
AR: Autoregressive
QRTRL: Quaternion Real Time Recurrent Learning
QMLP-FIR: QMLP rederived for FIR
IIR: Infinite Impulse Response
CR: Cauchy-Riemann
GCR: Generalized Cauchy Riemann
CRF: Cauchy-Riemann-Fueter
ETF: Elementary Transcendental Function
LAC: Local Analyticity Condition
WGN: White Gaussian Noise
CWGN: Complex White Gaussian Noise
BPTT: Backpropagation Through Time
Pdf: Probability distribution function
CC-QLMS: Collaborative Combination Quaternion Least Mean Square
KF: Kalman Filter
EKF: Extended Kalman Filter
QKF: Quaternion Kalman Filter
QEKF: Quaternion Extended Kalman Filter
Mathematical Notations
x lower case denotes scalar
x boldface lower case denotes vector
xa augmented vectors
R real field
C complex field
H quaternion field
Rn n vector field
[·]T transpose operation
[·]H Hermitian operation
[·]∗ conjugate operation
‖ · ‖2 Euclidean norm
[·]ı,ȷ,κ ı, ȷ and κ involutions
Φ nonlinear activation function
O(·) order of computational complexity
Rp prediction gain
ε white Gaussian noise
Φs Split Quaternion function
Φ Locally Analytic Quaternion function
µ learning rate
R correlation matrix
P cross correlation matrix
Ψ sensitivity
Υ conjugate sensitivity
e exponential function
∇ gradient
qa,b,c,d real, ı, ȷ and κ parts of the quaternion vector
[·]′ first order derivative
R real part of the variable
I imaginary parts of the variable
Q quaternion field
δ Dirac delta function
⌊·⌋ floor operator
Cqq covariance matrix
Pqq pseudocovariance matrix
Cqı ı-covariance matrix
Cqȷ ȷ-covariance matrix
Cqκ κ-covariance matrix
Iı,ȷ,κ ı, ȷ and κ parts of the variable
, equality in terms of probability distribution
∀ for all
∈ is an element of
{, } the set of
≈ approximation
σ variance
→ approaches
Chapter 1
Introduction
The overview of the research topic is presented in Section 1.1. This is followed by an
elaboration on the motivations and aims of the research in Section 1.2. This chapter ends
with a brief organisation of the thesis in Section 1.3.
1.1 Overview
Neural networks (NN) are central to nonlinear adaptive filtering due to their universal ap-
proximation capabilities [1]. This virtue is derived from the choice of nonlinear activation
functions. The original condition established on such nonlinear activation function is that
it needs to be continuous discriminatory [1]. Funahashi proved that sigmoidal functions
also fall under the class of continuous discriminatory functions [2]. One of the earliest
gradient descent algorithms to apply sigmoidal functions is the Backpropagation (BP) al-
gorithm which trains the feedforward NNs layer by layer [3]. The BP algorithm performs
reasonably well but requires a large amount of training data and takes a long time to
converge. This has led to the development of the training algorithm for the recurrent
neural networks (RNNs) which possess the attractive ability to deal with time-varying
input through natural temporal operation. Among the first RNN training algorithms is
the Backpropagation Through Time (BPTT) which unfolds the RNN into a multilayer
feedforward network one layer at each time step. The BPTT offers generality but requires
huge memory for a long training sequence [4]. Recurrent Backpropagation (RBP)
avoids this memory requirement, but at the expense of being complicated
and unstable [5]. In 1989, Williams and Zipser proposed an online gradient descent learning
algorithm for the RNNs called the Real Time Recurrent Learning (RTRL) algorithm [6]
which became hugely popular due to its simple weight update and fast direct-gradient
calculation.
As multidimensional data representation became more prominent, the Complex
Least Mean Square (CLMS) algorithm was the first extension of adaptive filtering algo-
rithms enabling processing in the complex domain C [7]. The ability of CLMS to process
two-dimensional signals in the complex domain C led to improved results over the con-
ventional processing in the real domain R. Unlike linear adaptive filters, the nonlinear
adaptive filtering algorithms faced a major obstacle in finding suitable analytic complex-valued
nonlinear activation functions. This is a direct consequence of
Liouville's theorem, which states that a bounded entire function in C must be constant,
limiting the scope of nonlinear activation functions that were once suitable in R. To circumvent
this, different classes of split-complex functions which are analytic componentwise
were proposed [8–10]. These split-complex functions were utilised in the Split Complex
Backpropagation (CBP) [8] and later extended to the Split Complex Real Time Recurrent
Learning (Split CRTRL) [11]. These split-complex algorithms have been shown to yield
reasonable performance provided that there is no strong coupling between the real and
imaginary parts of the complex signals. Kim and Adali later proved that a class of complex
elementary transcendental functions (ETFs) based on the entire complex exponential func-
tions are suitable for complex-valued nonlinear adaptive filtering applications [12]. These
ETFs satisfied the Cauchy-Riemann (CR) conditions proving to be analytic in C and were
later implemented in the design of the Fully Complex Real Time Recurrent Learning (Fully
CRTRL) algorithm [13]. The Fully CRTRL exploits the correlation between the real and
imaginary parts resulting in an improved performance.
Further advances in sensor technology have highlighted the demand for higher di-
mensional adaptive signal processing algorithms to efficiently process the multidimensional
data. One approach is to represent multidimensional signals as vectors in Rn and to
use general split functions. One of the first multidimensional learning algorithms to utilise
this approach is the Three-Dimensional Vector Back-Propagation (3DV-BP) for NNs [14].
The 3DV-BP utilises a matrix operation which does not take advantage of the couplings
between the dimensions. Improvements include the Vector Product Back-Propagation
(VP-BP), which addresses this issue through the use of vector products [15]. A major
drawback of the VP-BP is that the algorithm cannot update the weights in the presence
of non-zero error due to the nature of the vector spaces not forming a division algebra.
Furthermore, the universal function approximation capabilities for both algorithms have
not been investigated as no density theorem has been proven for real vector spaces [16, pp.
67-71]. This led to the development of multidimensional learning algorithms in other
multidimensional spaces.
A natural platform for dealing with the processing of three- and four-dimensional
signals is the quaternion domain H. Quaternions were first conceived by W. Hamilton
in 1843 when he was posed with the problem to extend complex numbers into higher
dimensions [17]. Frobenius later showed that the quaternions form the highest-dimensional associative division
algebra over the reals, which makes them more attractive to work with than other hypercomplex
spaces [18, 19]. Recently, quaternions have enjoyed a resurgence and have become
popular across many areas of engineering, such as molecular modelling [20], computer
graphics [21] and robotics [22]. In the statistical signal processing field, quaternions have
been employed in adaptive filtering, including Kalman filtering [23] and stochastic gradient
algorithms, such as the Quaternion Least Mean Square (QLMS) [24].
However, they are still relatively underexplored in nonlinear adaptive filtering
mainly due to the lack of analytic nonlinear functions in H. The very stringent Cauchy-
Riemann-Fueter (CRF) conditions [25] ensure that the only globally analytic quaternion-
valued functions are the linear functions and constants. Analogous to C, a split-quaternion
function that treats each channel separately (as a real channel) passed through a real
smooth nonlinearity was proposed and employed in the training of Quaternion Multilayer
Perceptron (QMLP) [16]. The QMLP training algorithm exhibited enhanced performance
over vector based algorithms owing to the power of processing in H. Despite the gain,
the training of QMLP suffers when there exist strong correlations between the dimensions
of the signal. Furthermore, the QMLP training algorithm does not take into account the
non-commutativity aspect of quaternion product in its derivation.
1.2 Motivations and Aims
The main aim of this research is to extend complex-valued nonlinear adaptive filtering
algorithms to the quaternion domain H. Because the quaternion product is
non-commutative [26], the derivation of these algorithms needs to take this explicitly
into account. The main concern lies in the analytic properties of the nonlinear activation
functions. The lack of suitable analytic quaternion nonlinear functions [25] does not per-
mit generalisation of nonlinear signal processing to H. Previous approaches, such as the
training algorithm for the Quaternion Multilayer Perceptron (QMLP) [16], were based on
the split-quaternion functions, whereby the processing is done componentwise, making this
approach unsuitable for real-world signals. The recently introduced local analyticity condition (LAC) [27] has provided
an alternative to the strict Cauchy-Riemann-Fueter (CRF) conditions. Satisfying
the LAC indicates that a nonlinear quaternion-valued function is locally analytic. Such
locally analytic functions are only guaranteed first-order differentiability at the current
operating point. Based on these results, several novel finite impulse response (FIR)
nonlinear quaternion adaptive filtering algorithms were derived by employing these locally
analytic functions. The derived algorithms were then extended to the recurrent neural
networks (RNN) architecture.
Another objective is to extend the current real- and complex-valued algorithms to
H. This enables the tools that were once available in the real domain R and complex
domain C, to be also accessible in H. The algorithm under consideration is the fractional
tap-length (FT) algorithm [28] which was devised for real-valued filters with a recent
extension to complex-valued filters [29]. This extension will open up the possibility of new
applications in H.
1.3 Organisation of the Thesis
The organisation of the thesis is as follows. Chapter 2 introduces the fundamental concepts
behind the subsequently derived algorithms. Chapter 3 presents the derivation of
quaternion-valued nonlinear adaptive filtering algorithms that employ split quaternion
functions and take into account the non-commutativity of the quaternion product. Chapter 4
introduces a class of locally analytic quaternion nonlinear functions, which are then
used in the derivation of a class of fully quaternion nonlinear adaptive filtering
algorithms. Chapter 5 extends the algorithms derived in Chapter 4 to the
recurrent neural network (RNN) architecture. Chapter 6 provides an analysis of the
fractional tap-length (FT) algorithm extended to the quaternion domain H. Conclusions
and future works are given in Chapter 7.
Chapter 2
Background Theory
This chapter begins with an introduction to adaptive systems in Section 2.1. Section
2.2 introduces the basic quaternion operators. Section 2.3 presents the fundamentals of
quaternion domain H second-order statistics. This is followed by a discussion regarding
the analyticity conditions in the quaternion domain H in Section 2.4. The fundamental
characteristics of activation functions for gradient-descent algorithms are detailed in
Section 2.5. Section 2.6 presents the derivation and performance analysis of one of the earliest
quaternion-valued adaptive filtering algorithms, the Quaternion Least Mean Square
(QLMS). An introduction to Quaternion Kalman Filtering is presented in Section 2.7.
2.1 Adaptive Systems Configuration
The basic structure of an adaptive filter is shown in Figure 2.1, where $x(n)$ is the input
signal, $y(n)$ is the filter output and $d(n)$ is the desired output. The instantaneous error
$e(n)$ is the difference between the desired signal and the filter output, that is,
$e(n) = d(n) - y(n)$. The filter output is given by $y(n) = \mathbf{w}^T(n)\mathbf{x}(n)$, where
$\mathbf{x}(n) = [x(n-1), \cdots, x(n-p)]^T$ is the input vector, $p$ is the filter length, $(\cdot)^T$ is the
transpose operator, and $\mathbf{w}(n) = [w_1(n), \cdots, w_p(n)]^T$ is the vector of filter weights. The
adaptive filter adjusts its weights $\mathbf{w}(n)$ using an algorithm which optimises the
cost function $J(n)$, usually defined as the squared instantaneous error $e^2(n)$.
Figure 2.1: Adaptive Systems
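Wiener theory states that the optimal coefficients of an adaptive filter are found
by minimising the expectation of the squared-error cost function $J(n) = e^2(n)$ [30].
Assuming that $e(n)$, $d(n)$ and $x(n)$ are wide sense stationary with zero mean, it can be
shown that
\begin{align}
E\{J(n)\} &= E\{(d(n) - y(n))^2\} \nonumber \\
&= E\{d^2(n) + \mathbf{w}^T(n)\mathbf{x}(n)\mathbf{x}^T(n)\mathbf{w}(n) - 2d(n)\mathbf{x}^T(n)\mathbf{w}(n)\} \nonumber \\
&= E\{d^2(n)\} + \mathbf{w}^T(n)E\{\mathbf{x}(n)\mathbf{x}^T(n)\}\mathbf{w}(n) - 2E\{d(n)\mathbf{x}^T(n)\}\mathbf{w}(n) \nonumber \\
&= E\{d^2(n)\} + \mathbf{w}^T(n)\mathbf{R}\mathbf{w}(n) - 2\mathbf{P}^T\mathbf{w}(n) \tag{2.1}
\end{align}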
where $\mathbf{R} = E\{\mathbf{x}(n)\mathbf{x}^T(n)\}$ is the input correlation matrix and $\mathbf{P} = E\{d(n)\mathbf{x}(n)\}$ is the cross correlation vector between the
desired signal and input signal.
In order to find the optimal weight $\mathbf{w}_{opt}$, differentiate (2.1) with respect to $\mathbf{w}$ and
set the result to zero, which yields the Wiener-Hopf equation
\[ \nabla_{\mathbf{w}} J = 2\mathbf{R}\mathbf{w} - 2\mathbf{P} = \mathbf{0} \tag{2.2} \]
The optimal weight $\mathbf{w}_{opt}$ is therefore
\[ \mathbf{w}_{opt} = \mathbf{R}^{-1}\mathbf{P} \tag{2.3} \]
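As a rough numerical illustration of (2.2) and (2.3), the following Python sketch estimates R and P by sample averages and solves for the optimal weights. The system `w_true`, the filter length, the noise level and the use of sample averages in place of true expectations are all assumptions made for illustration; they are not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "unknown" system to be identified (illustrative values only).
w_true = np.array([0.5, -0.3, 0.2, 0.1])
p = len(w_true)          # filter length
N = 20000                # number of samples used for the sample averages

x = rng.standard_normal(N + p)   # zero-mean white input signal

# Row n of X is the regressor [x(n-1), ..., x(n-p)] used in Section 2.1.
X = np.column_stack([x[p - k: p - k + N] for k in range(1, p + 1)])

# Desired signal: output of the hypothetical system plus a little noise.
d = X @ w_true + 0.01 * rng.standard_normal(N)

R = X.T @ X / N          # sample estimate of R = E{x(n) x^T(n)}
P = X.T @ d / N          # sample estimate of P = E{d(n) x(n)}

# Solve R w = P rather than forming the inverse explicitly, as in (2.3).
w_opt = np.linalg.solve(R, P)

# The gradient (2.2) should vanish at the optimum.
grad = 2 * R @ w_opt - 2 * P

print("w_opt:", w_opt)
print("gradient norm at w_opt:", np.linalg.norm(grad))
```

Solving the linear system with `np.linalg.solve` avoids computing the inverse of R directly, which is usually the numerically safer choice.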
In practice, an exact measurement of the gradient vector in (2.2) is not available,
since that would require prior knowledge of R and P. Therefore, the gradient vector
has to be estimated from the available data, and the weights are made adaptive through a
gradient descent update specified by
\[
\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\nabla_{\mathbf{w}} J(n) \tag{2.4}
\]
where µ is the real-valued learning rate.
Consider the instantaneous estimates of the input correlation matrix R(n) and the cross
correlation vector P(n), given by
\[
\mathbf{R}(n) = \mathbf{x}(n)\mathbf{x}^T(n); \qquad \mathbf{P}(n) = d(n)\mathbf{x}(n) \tag{2.5}
\]
Substituting the instantaneous estimates (2.5) into the gradient (2.2) gives the
instantaneous gradient
\[
\nabla_{\mathbf{w}} J(n) = 2\mathbf{x}(n)\mathbf{x}^T(n)\mathbf{w}(n) - 2d(n)\mathbf{x}(n) \tag{2.6}
\]
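Substituting the instantaneous gradient (2.6) into the gradient descent update
(2.4) yields the final weight update
\[
\begin{aligned}
\mathbf{w}(n+1) &= \mathbf{w}(n) - \mu\big(2\mathbf{x}(n)\mathbf{x}^T(n)\mathbf{w}(n) - 2d(n)\mathbf{x}(n)\big) \\
&= \mathbf{w}(n) - 2\mu\,\mathbf{x}(n)\big(\mathbf{x}^T(n)\mathbf{w}(n) - d(n)\big) \\
&= \mathbf{w}(n) + \mu e(n)\mathbf{x}(n)
\end{aligned}
\tag{2.7}
\]
where $\mathbf{x}^T(n)\mathbf{w}(n) - d(n) = -e(n)$ and the factor 2 is absorbed into µ.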
The final weight update derived in (2.7) is the weight update of the Least Mean
Square (LMS) algorithm. Since (2.7) is a stochastic approximation of the gradient
descent recursion (2.4), the LMS recursion converges in the mean towards the optimal
Wiener solution in (2.3), provided that the learning rate µ is sufficiently small.
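The LMS recursion (2.7) can be illustrated with a short system-identification sketch. This is a minimal example rather than code from the thesis: the function name `lms_identify`, the noise-free desired signal, the uniform input distribution, the step size and the tap vector (which here includes the current sample) are all assumptions chosen for illustration.

```python
import random

def lms_identify(w_true, mu=0.05, n_iter=2000, seed=0):
    """Identify a real-valued FIR system with the LMS update (2.7).

    Illustrative assumptions: noise-free desired signal, white input
    drawn uniformly from [-1, 1], fixed step size mu.
    """
    rng = random.Random(seed)
    p = len(w_true)
    w = [0.0] * p                       # adaptive weights w(n)
    x = [0.0] * p                       # tap-delay input vector x(n)
    for _ in range(n_iter):
        x = [rng.uniform(-1.0, 1.0)] + x[:-1]            # shift in a new sample
        d = sum(wt * xi for wt, xi in zip(w_true, x))    # desired signal d(n)
        y = sum(wi * xi for wi, xi in zip(w, x))         # filter output y(n)
        e = d - y                                        # error e(n) = d(n) - y(n)
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]   # w(n+1) = w(n) + mu e(n) x(n)
    return w

w_hat = lms_identify([0.5, -0.3, 0.1])
print(w_hat)
```

In this noise-free setting the estimated weights approach the unknown system coefficients, which play the role of the Wiener solution (2.3).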
2.2 Quaternion Algebra
Over the years, quaternions have been applied in scientific areas ranging from
computer graphics [21] to wind modelling [31]. Quaternions were first conceived
by W. Hamilton in 1843 [17] and were proven to be the highest-dimensional associative
division algebra over the reals [19], which makes them attractive to work with. The
dilemma of modelling in the quaternion domain versus modelling in R4 has long been
present [32–34], and quaternion-based nonlinear adaptive filtering is still in its infancy.
A basic quaternion variable q ∈ H is defined as having a scalar part and a vector
part, which can be represented as
\[
q = [q_a, \mathbf{q}] = q_a + q_b\imath + q_c\jmath + q_d\kappa \tag{2.8}
\]
where $q_a, q_b, q_c, q_d \in \mathbb{R}$ and ı, ȷ, κ are both imaginary units and orthogonal unit vectors.
The relationships between these imaginary units and orthogonal unit vectors are
\[
\imath\jmath = \kappa; \quad \jmath\kappa = \imath; \quad \kappa\imath = \jmath; \quad \imath\jmath\kappa = \imath^2 = \jmath^2 = \kappa^2 = -1 \tag{2.9}
\]
The addition and subtraction operations in quaternion algebra are defined similarly to
those in real and complex algebra, and are given by
\[
w \pm x = [w_a \pm x_a, \mathbf{w} \pm \mathbf{x}] = (w_a \pm x_a) + (w_b \pm x_b)\imath + (w_c \pm x_c)\jmath + (w_d \pm x_d)\kappa \tag{2.10}
\]
The quaternion product, however, is non-commutative and is given by
\[
wx = [w_a, \mathbf{w}][x_a, \mathbf{x}] = [w_a x_a - \mathbf{w}\cdot\mathbf{x},\; w_a\mathbf{x} + x_a\mathbf{w} + \mathbf{w}\times\mathbf{x}] \tag{2.11}
\]
where the symbols “·” and “×” denote respectively the dot-product and the cross-product.
The non-commutativity of the quaternion product arises due to the presence of the cross-
product. Quaternions are a division algebra as the product of two non-zero quaternion
variables can never be zero.
Due to this inherent non-commutativity, there are two definitions of the quaternion
division operator, which are
\[
\text{Right division: } \frac{w}{x} = wx^{-1}; \qquad \text{Left division: } \frac{w}{x} = x^{-1}w \tag{2.12}
\]
In general $x^{-1}w \neq wx^{-1}$, so the left division and the right division are not
equivalent. For clarity, the default definition of the quaternion division operator used
throughout this thesis is the right division.
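Similar to the complex case, the conjugate of a quaternion q is
\[
q^* = [q_a, \mathbf{q}]^* = [q_a, -\mathbf{q}] = q_a - q_b\imath - q_c\jmath - q_d\kappa \tag{2.13}
\]
and its squared norm is
\[
\|q\|_2^2 = qq^* = q^*q = q_a^2 + q_b^2 + q_c^2 + q_d^2 \tag{2.14}
\]
Other operators important to this work are the three quaternion involutions
(self-inverse mappings) given by
\[
\begin{aligned}
q^{\imath} &= -\imath q\imath = q_a + q_b\imath - q_c\jmath - q_d\kappa \\
q^{\jmath} &= -\jmath q\jmath = q_a - q_b\imath + q_c\jmath - q_d\kappa \\
q^{\kappa} &= -\kappa q\kappa = q_a - q_b\imath - q_c\jmath + q_d\kappa
\end{aligned}
\tag{2.15}
\]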
From this point onwards, all quantities are treated as quaternion valued, unless stated
otherwise.
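The operations of this section can be collected in a few lines of code. The sketch below stores a quaternion as a tuple (qa, qb, qc, qd); the helper names (`qmul`, `qconj`, `qnorm2`, `qinv`, `involution`) are hypothetical and chosen only for this illustration.

```python
def qmul(w, x):
    """Hamilton product (2.11) of quaternions stored as (a, b, c, d) tuples."""
    wa, wb, wc, wd = w
    xa, xb, xc, xd = x
    return (wa*xa - wb*xb - wc*xc - wd*xd,
            wa*xb + wb*xa + wc*xd - wd*xc,
            wa*xc - wb*xd + wc*xa + wd*xb,
            wa*xd + wb*xc - wc*xb + wd*xa)

def qconj(q):
    """Quaternion conjugate (2.13)."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def qnorm2(q):
    """Squared norm (2.14)."""
    return sum(c * c for c in q)

def qinv(q):
    """Inverse q^{-1} = q* / ||q||^2 of a non-zero quaternion."""
    n2 = qnorm2(q)
    return tuple(c / n2 for c in qconj(q))

I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

def involution(q, eta):
    """Involution q^eta = -eta q eta from (2.15)."""
    return tuple(-c for c in qmul(qmul(eta, q), eta))

w, x = (1.0, 2.0, 3.0, 4.0), (0.5, -1.0, 2.0, 1.5)
right_div = qmul(w, qinv(x))   # w x^{-1}
left_div = qmul(qinv(x), w)    # x^{-1} w
print(qmul(I, J), qmul(J, I))  # the product of i and j depends on the order
print(right_div, left_div)
print(involution(w, I))
```

The two printed division results differ, which is the reason a default (right) division has to be fixed in (2.12).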
2.3 Augmented Quaternion Statistics
The concept of augmented statistics in division algebra was first introduced to define
the notion of second-order noncircularity, or improperness, for complex random normal
vectors [35], and was subsequently extended to non-normal vectors [36]. In the complex
domain C, the second-order statistics of a complex random vector z are fully
characterised by its covariance $\mathbf{C}_{zz}$ and pseudocovariance $\mathbf{P}_{zz}$, defined as [35]
\[
\mathbf{C}_{zz} = E\{\mathbf{z}\mathbf{z}^H\}; \qquad \mathbf{P}_{zz} = E\{\mathbf{z}\mathbf{z}^T\} \tag{2.16}
\]
where $(\cdot)^H$ and $(\cdot)^T$ denote respectively the Hermitian and transpose operators, and
z = x + yı where x and y are real-valued. A complex random vector is termed “circular”
if its probability distribution is rotation-invariant. In the second-order sense, this implies
that the real-valued vectors x and y satisfy
\[
\mathbf{C}_{xx} = \mathbf{C}_{yy}; \qquad \mathbf{C}_{xy} = -\mathbf{C}_{xy}^T \tag{2.17}
\]
resulting in a vanishing pseudocovariance $\mathbf{P}_{zz}$ [37]. In the scalar case, this reduces to the
real and imaginary components having equal variance and being uncorrelated [38,39].
2.3.1 Cη-circular Quaternion Random Variables
The concept of augmented statistics was extended to the quaternion domain in [40], albeit
with the restriction of a single rotation axis of either ı, ȷ, or κ. A quaternion random
variable q that obeys this condition is said to be Cη-circular, and is defined as
\[
q \triangleq qe^{\eta\theta}, \quad \forall\theta \tag{2.18}
\]
for one and only one pure imaginary unit η, where η ∈ {ı, ȷ, κ}. The symbol ≜ denotes
equality in terms of the probability distribution function (pdf) and the symbol θ represents
the angle of rotation.
2.3.2 H-circular Quaternion Random Variables
The restriction to a single rotation axis for Cη-circular random variables has proven to
be too rigid in practical scenarios, and a generalisation, in which the pdf is circular with
respect to any two arbitrary axes of rotation, was introduced in [41]. A quaternion random
variable q that satisfies this condition is said to be H-circular, or Q-proper, and is defined
as
\[
q \triangleq qe^{\eta\theta}, \quad \forall\theta \tag{2.19}
\]
for all the pure imaginary units η ∈ {ı, ȷ, κ}. An H-circular quaternion random variable
is circular in all its dimensions, meaning that the scatterplot of any two components of
{1, ı, ȷ, κ} is circular. A Q-proper (second-order circular) random variable q is defined
as one that has equal powers in all the components $q_a$, $q_b$, $q_c$ and $q_d$.
2.3.3 Augmented Second-Order Statistics of Quaternion Random Vectors
Similarly to the complex case, in general the covariance alone is not sufficient to fully
describe the complete second-order information within a quaternion random vector. To
provide a generic framework for second-order statistical modelling of quaternion vectors,
that is, to deal with Q-improper signals, complementary covariance matrices
(pseudocovariances) need to be employed. These complementary covariance matrices are termed
the ı-covariance $\mathbf{C}_{q^\imath}$, ȷ-covariance $\mathbf{C}_{q^\jmath}$ and κ-covariance $\mathbf{C}_{q^\kappa}$, and are given by [42,43]
\[
\mathbf{C}_{q^\imath} = E\{\mathbf{q}\mathbf{q}^{\imath H}\}; \qquad \mathbf{C}_{q^\jmath} = E\{\mathbf{q}\mathbf{q}^{\jmath H}\}; \qquad \mathbf{C}_{q^\kappa} = E\{\mathbf{q}\mathbf{q}^{\kappa H}\} \tag{2.20}
\]
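Thus, the complete second-order characteristics of the quaternion random vector
are described by the augmented covariance matrix $\mathbf{C}^a_q$ of an augmented vector
$\mathbf{q}^a = [\mathbf{q}^T\ \mathbf{q}^{\imath T}\ \mathbf{q}^{\jmath T}\ \mathbf{q}^{\kappa T}]^T$, given by¹
\[
\mathbf{C}^a_q = E\{\mathbf{q}^a\mathbf{q}^{aH}\} =
\begin{bmatrix}
\mathbf{C}_{qq} & \mathbf{C}_{q^\imath} & \mathbf{C}_{q^\jmath} & \mathbf{C}_{q^\kappa} \\
\mathbf{C}^H_{q^\imath} & \mathbf{C}_{q^\imath q^\imath} & \mathbf{C}_{q^\imath q^\jmath} & \mathbf{C}_{q^\imath q^\kappa} \\
\mathbf{C}^H_{q^\jmath} & \mathbf{C}_{q^\jmath q^\imath} & \mathbf{C}_{q^\jmath q^\jmath} & \mathbf{C}_{q^\jmath q^\kappa} \\
\mathbf{C}^H_{q^\kappa} & \mathbf{C}_{q^\kappa q^\imath} & \mathbf{C}_{q^\kappa q^\jmath} & \mathbf{C}_{q^\kappa q^\kappa}
\end{bmatrix}
\tag{2.21}
\]
where the submatrices in (2.21) are calculated according to²
\[
\mathbf{C}_{\delta} = E\{\mathbf{q}\boldsymbol{\delta}^H\},\ \boldsymbol{\delta} \in \{\mathbf{q}^\imath, \mathbf{q}^\jmath, \mathbf{q}^\kappa\}; \qquad
\mathbf{C}_{\alpha\beta} = E\{\boldsymbol{\alpha}\boldsymbol{\beta}^H\},\ \boldsymbol{\alpha}, \boldsymbol{\beta} \in \{\mathbf{q}, \mathbf{q}^\imath, \mathbf{q}^\jmath, \mathbf{q}^\kappa\} \tag{2.22}
\]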
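A quaternion random vector q is said to be Cı-circular when the ȷ-covariance $\mathbf{C}_{q^\jmath}$
and κ-covariance $\mathbf{C}_{q^\kappa}$ vanish [43]. Similar definitions hold for Cȷ-circular and Cκ-circular
quaternion random vectors. The semi-widely linear model, based on the statistics of Cη
circularity, is described in [43]. On the other hand, an H-circular quaternion random
vector q has the property that it is not correlated with its quaternion involutions $\mathbf{q}^\imath$, $\mathbf{q}^\jmath$
and $\mathbf{q}^\kappa$, that is
\[
E\{\mathbf{q}\mathbf{q}^{\imath H}\} = \mathbf{0}; \qquad E\{\mathbf{q}\mathbf{q}^{\jmath H}\} = \mathbf{0}; \qquad E\{\mathbf{q}\mathbf{q}^{\kappa H}\} = \mathbf{0} \tag{2.23}
\]
yielding the augmented covariance matrix $\mathbf{C}^a_q$ in (2.21) of an H-circular random vector in
the form³
\[
\mathbf{C}^a_q = E\{\mathbf{q}^a\mathbf{q}^{aH}\} =
\begin{bmatrix}
\mathbf{C}_{qq} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{C}^{\imath}_{qq} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{C}^{\jmath}_{qq} & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{C}^{\kappa}_{qq}
\end{bmatrix}
\tag{2.24}
\]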
¹ As long as the covariance matrix $\mathbf{C}_{qq}$ is nonsingular, it follows immediately that the other covariance matrices $\mathbf{C}_{q^\imath q^\imath}$, $\mathbf{C}_{q^\jmath q^\jmath}$, $\mathbf{C}_{q^\kappa q^\kappa}$ have inverses. Therefore, the augmented $\mathbf{C}^a_q$ is full rank and hence nonsingular.
² The matrices $\mathbf{C}_{q^\eta q^\eta}$ are an involution of $\mathbf{C}_{qq}$ over η and can therefore be simplified to $\mathbf{C}^{\eta}_{qq}$, where η ∈ {ı, ȷ, κ} [43].
³ Any other basis comprising four combinations out of $\{\mathbf{q}, \mathbf{q}^\imath, \mathbf{q}^\jmath, \mathbf{q}^\kappa\}$ and their conjugates is equally valid. The basis proposed in [42] and used here, $\mathbf{q}^a = [\mathbf{q}^T\ \mathbf{q}^{\imath T}\ \mathbf{q}^{\jmath T}\ \mathbf{q}^{\kappa T}]^T$, provides the most convenient representation, as shown in the augmented covariance structure for H-circular signals in (2.21) and (2.24).
To exploit the complete second-order statistics of quaternion-valued signals, a filtering
model similar to the widely linear model in C needs to be considered [38,44]. The
quaternion widely linear model is based on the augmented basis that builds the matrix $\mathbf{C}^a_q$
in (2.21), and can be described by [42,43,45,46]
\[
y = \mathbf{w}^{aT}\mathbf{x}^a = \mathbf{g}^T\mathbf{x} + \mathbf{h}^T\mathbf{x}^{\imath} + \mathbf{u}^T\mathbf{x}^{\jmath} + \mathbf{v}^T\mathbf{x}^{\kappa} \tag{2.25}
\]
where g, h, u and v are the weight vectors, x is the input signal, $\mathbf{x}^\imath$, $\mathbf{x}^\jmath$ and $\mathbf{x}^\kappa$ are
respectively its ı, ȷ and κ involutions, $\mathbf{w}^a = [\mathbf{g}^T\ \mathbf{h}^T\ \mathbf{u}^T\ \mathbf{v}^T]^T$ is the augmented weight
vector, and $\mathbf{x}^a = [\mathbf{x}^T\ \mathbf{x}^{\imath T}\ \mathbf{x}^{\jmath T}\ \mathbf{x}^{\kappa T}]^T$ is the augmented random input vector. Another
benefit of the quaternion widely linear model is the possibility to determine the degree of
properness of quaternion random vectors [47,48].
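To make the extra freedom of the widely linear model (2.25) concrete, the following sketch evaluates it for a single scalar quaternion input. It is an illustration under stated assumptions, not the thesis' implementation: the helper names are hypothetical, and the weights g = h = u = v = 1/4 are chosen because the input and its three involutions sum to four times the real part, so this particular widely linear filter extracts the real part of its input, a mapping that no single left multiplication y = gx can realise for every x.

```python
def qmul(w, x):
    """Hamilton product of quaternions stored as (a, b, c, d) tuples."""
    wa, wb, wc, wd = w
    xa, xb, xc, xd = x
    return (wa*xa - wb*xb - wc*xc - wd*xd,
            wa*xb + wb*xa + wc*xd - wd*xc,
            wa*xc - wb*xd + wc*xa + wd*xb,
            wa*xd + wb*xc - wc*xb + wd*xa)

def qadd(*qs):
    """Componentwise sum of any number of quaternions."""
    return tuple(sum(parts) for parts in zip(*qs))

I, J, K = (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0)

def involution(q, eta):
    """Involution q^eta = -eta q eta, as in (2.15)."""
    return tuple(-c for c in qmul(qmul(eta, q), eta))

def widely_linear(g, h, u, v, x):
    """Scalar version of (2.25): y = g x + h x^i + u x^j + v x^k."""
    return qadd(qmul(g, x),
                qmul(h, involution(x, I)),
                qmul(u, involution(x, J)),
                qmul(v, involution(x, K)))

quarter = (0.25, 0.0, 0.0, 0.0)
x = (1.0, 2.0, 3.0, 4.0)
y = widely_linear(quarter, quarter, quarter, quarter, x)
print(y)  # only the real part of x survives
```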
2.4 Analyticity in H
In the complex domain C, the notion of analyticity coincides with holomorphy, harmonicity
and conformality, so that any one of these notions implies the others. However, this is not
the case in the quaternion domain H, due to the non-commutativity of the product. Each of the
notions mentioned above needs to be re-evaluated in H. The notion of interest to this
thesis is holomorphy, which means the existence of the derivative of the function. In order
to keep the terminology consistent with the existing literature in R and C, the term analyticity
is adopted to denote the existence of the derivative of the function.
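The analyticity of a complex function f(z) = u(x, y) + v(x, y)ı is governed by the
Cauchy-Riemann (CR) equations given by
\[
\frac{\partial u(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y}; \qquad
\frac{\partial v(x, y)}{\partial x} = -\frac{\partial u(x, y)}{\partial y} \tag{2.26}
\]
For a complex function f(z) to be analytic in C, the derivatives along the real and
imaginary axes have to be equal, that is
\[
\frac{\partial f(z)}{\partial x} + \frac{\partial f(z)}{\partial y}\imath = 0 \;\Leftrightarrow\; \frac{\partial f(z)}{\partial z^*} = 0 \tag{2.27}
\]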
where z = x+ yı.
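By analogy with the complex case, one of the first definitions of analyticity in H is
described by the Generalised Cauchy-Riemann (GCR) conditions. Due to the non-commutative
nature of the quaternion product, there exist two definitions of the GCR, which are given by [49]
\[
\text{Right GCR: } \frac{\partial f(q)}{\partial q_a} = -\frac{\partial f(q)}{\partial q_b}\imath = -\frac{\partial f(q)}{\partial q_c}\jmath = -\frac{\partial f(q)}{\partial q_d}\kappa \tag{2.28}
\]
\[
\text{Left GCR: } \frac{\partial f(q)}{\partial q_a} = -\imath\frac{\partial f(q)}{\partial q_b} = -\jmath\frac{\partial f(q)}{\partial q_c} = -\kappa\frac{\partial f(q)}{\partial q_d} \tag{2.29}
\]
where $q = q_a + q_b\imath + q_c\jmath + q_d\kappa$.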
These two definitions for quaternion analyticity create ambiguity as to which con-
dition to exercise when determining the analyticity of the function. The derivative of the
function obtained through left GCR is called left derivative and right GCR is the right
derivative. These GCRs are only satisfied by a special form of quaternion linear func-
tions and constants proving to be too prohibitive for any practical application such as in
neural networks, where typically nonlinear neuron models are involved. The restrictive
nature of the GCR conditions arises from the fact that they were initially proposed for a
four-dimensional domain, with Clifford algebra as their basis, making them unsuitable for
applications in H [50].
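The restrictiveness of the GCR conditions can be checked numerically. The sketch below approximates the partial derivatives in (2.28) by central differences and reports the largest violation of the right GCR. It is only an illustration: the helper names, the test point and the two test functions (f(q) = aq + b and f(q) = q²) are assumptions made here, not examples taken from the thesis.

```python
def qmul(w, x):
    """Hamilton product of quaternions stored as (a, b, c, d) tuples."""
    wa, wb, wc, wd = w
    xa, xb, xc, xd = x
    return (wa*xa - wb*xb - wc*xc - wd*xd,
            wa*xb + wb*xa + wc*xd - wd*xc,
            wa*xc - wb*xd + wc*xa + wd*xb,
            wa*xd + wb*xc - wc*xb + wd*xa)

I, J, K = (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0)

def partial(f, q, idx, h=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. component idx of q."""
    qp, qm = list(q), list(q)
    qp[idx] += h
    qm[idx] -= h
    fp, fm = f(tuple(qp)), f(tuple(qm))
    return tuple((a - b) / (2.0 * h) for a, b in zip(fp, fm))

def right_gcr_residual(f, q):
    """Largest violation of df/dqa = -(df/dqb)i = -(df/dqc)j = -(df/dqd)k from (2.28)."""
    dfa = partial(f, q, 0)
    worst = 0.0
    for idx, eta in ((1, I), (2, J), (3, K)):
        rhs = qmul(partial(f, q, idx), eta)          # (df/dq_idx) * eta
        worst = max(worst, max(abs(a + r) for a, r in zip(dfa, rhs)))
    return worst

a, b = (0.5, -1.0, 2.0, 0.3), (1.0, 1.0, 1.0, 1.0)
linear = lambda q: tuple(s + t for s, t in zip(qmul(a, q), b))   # f(q) = a q + b
square = lambda q: qmul(q, q)                                     # f(q) = q^2

q0 = (1.0, 2.0, 3.0, 4.0)
res_linear = right_gcr_residual(linear, q0)
res_square = right_gcr_residual(square, q0)
print(res_linear, res_square)
```

For these two functions, the linear map yields a residual at the level of rounding error, whereas the quadratic function violates the condition by a large margin.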
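To circumvent this issue, Fueter relaxed these conditions by redefining them
based on a quaternion basis, resulting in the left and right Cauchy-Riemann-Fueter (CRF)
conditions given by [25]
\[
\text{Right CRF: } \frac{\partial f(q)}{\partial q_a} + \frac{\partial f(q)}{\partial q_b}\imath + \frac{\partial f(q)}{\partial q_c}\jmath + \frac{\partial f(q)}{\partial q_d}\kappa = 0 \tag{2.30}
\]
\[
\text{Left CRF: } \frac{\partial f(q)}{\partial q_a} + \imath\frac{\partial f(q)}{\partial q_b} + \jmath\frac{\partial f(q)}{\partial q_c} + \kappa\frac{\partial f(q)}{\partial q_d} = 0 \tag{2.31}
\]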
Unlike the GCR conditions, the CRF conditions are defined by a single quaternion partial
differential operator, which leads to close analogues of Cauchy’s theorem, Cauchy’s integral
formula and the Laurent expansion [51]. Furthermore, the CRF conditions generalise
the GCR conditions by permitting the canonical complex variable limits as solutions. The
canonical complex limits refer to functions of a complex variable involving only a single
imaginary unit, which are
\[
q_{\imath} = q_a + q_b\imath; \qquad q_{\jmath} = q_a + q_c\jmath; \qquad q_{\kappa} = q_a + q_d\kappa \tag{2.32}
\]
However, similar to the GCR conditions, the notion of analyticity is still ambiguous, as
there exist both a left derivative and a right derivative. It can be shown that only linear
quaternion functions and constants satisfy these CRF conditions [25], limiting the scope
for nonlinear adaptive filtering in H, which requires differentiable nonlinear functions.
2.5 Review of Nonlinear Functions
The choice of nonlinear function has a key influence on the performance of
nonlinear adaptive filters. The fundamentals of choosing a suitable
activation function go back to Hilbert’s 13th problem, which
questions the possibility of expressing a general algebraic equation of a high degree by
using sums and compositions of single-variable functions. Kolmogorov showed that the
conjecture of Hilbert’s 13th problem was incorrect and provided a general representation
theorem stating that any real-valued continuous function f can be represented as
\[
f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\Big(\sum_{p=1}^{n} \psi_{pq}(x_p)\Big) \tag{2.33}
\]
where $\Phi_q$ and $\psi_{pq}$ are nonlinear continuous functions of one variable. Equation (2.33) shows
that any continuous function of several variables can be represented by nonlinear continuous
functions of a single variable.
Kolmogorov’s theorem provided the existence proof for the approximation
capabilities of neural networks (NN). Based on this, the first proof of the universal
approximation capabilities of NNs is given by
\[
f(\mathbf{x}) \approx \sum_{i=1}^{N} w_i\,\sigma(\mathbf{a}_i^T\mathbf{x} + b_i) \tag{2.34}
\]
where finite sums of this form, with parameters $\mathbf{a}_i$, $w_i$ and $b_i$, are dense in the space of
continuous functions defined on $[0, 1]^n$, and σ is a discriminatory function [1]. It was
concluded that any bounded and measurable sigmoidal function is a discriminatory function [1, 2].
For gradient-descent learning algorithms, the sigmoidal functions should be
differentiable and bounded. To put emphasis on the differentiability aspect of the nonlinear
function, the weight update of the Nonlinear Gradient Descent (NGD) algorithm is
presented as [52]
\[
\mathbf{w}(n+1) = \mathbf{w}(n) + \mu e(n)\Phi'\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\mathbf{x}(n) \tag{2.35}
\]
where e(n) is the error, w(n) is the adaptive weight vector, x(n) is the filter input and
$\Phi'(\cdot)$ is the first-order derivative of the nonlinear function.
The differentiability of the nonlinear function proved to be problematic in the
complex domain C. This is a consequence of Liouville’s theorem, which states that a bounded
entire function must be a constant in C. In order to resolve this conflict, split-complex
functions which process the signal componentwise are employed, defined by
\[
\Phi_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big) = \Phi_r\big(\Re(\mathbf{w}^T(n)\mathbf{x}(n))\big) + \Phi_i\big(\Im(\mathbf{w}^T(n)\mathbf{x}(n))\big)\imath \tag{2.36}
\]
where $\Phi_r$ and $\Phi_i$ are real-valued sigmoidal functions. The symbols $\Re(\cdot)$ and $\Im(\cdot)$ correspond
to the real and imaginary components respectively.
The properties of suitable split-complex functions for gradient descent adaptive
filtering applications are specified below [9]
a) f(z) = u(x, y) + v(x, y)ı is nonlinear in x and y;
b) f(z) has no singularities and is always bounded for all values of z;
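c) The partial derivatives $\frac{\partial u}{\partial x}$, $\frac{\partial v}{\partial y}$, $\frac{\partial v}{\partial x}$ and $\frac{\partial u}{\partial y}$ are continuous and bounded;
d) $\frac{\partial u}{\partial x}\frac{\partial v}{\partial y} \neq \frac{\partial v}{\partial x}\frac{\partial u}{\partial y}$, to avoid the error gradient becoming zero for any non-zero inputs,
ensuring continuous learning;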
Despite the strict conditions imposed on the split-complex functions, these functions
do not give accurate gradient measurements, as they do not satisfy the Cauchy-Riemann
(CR) conditions. Furthermore, the split-complex functions perform poorly for signals
that have high correlation between the two dimensions. With this motivation, Kim and
Adali proposed the usage of a class of complex elementary transcendental functions (ETF)
derivable from the entire complex exponential function $e^z$ [12]. These fully complex
functions satisfy the CR conditions and the properties specified in [9], justifying their
suitability for gradient-descent adaptive filtering in C.
The situation in H proved to be more difficult than in C. In H, there is no known
differentiable nonlinear function, as the analyticity is dictated by the strict
Cauchy-Riemann-Fueter (CRF) conditions. The CRF conditions are only satisfied by constants and
linear functions, hindering the growth of nonlinear adaptive filtering in H.
In order to circumvent the issue of analyticity, it was proposed to apply
split-quaternion functions. The split-quaternion function, which processes the signal
componentwise, is given as
\[
\begin{aligned}
\Phi_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big) ={}& \Phi_a\big(\Re(\mathbf{w}^T(n)\mathbf{x}(n))\big) + \Phi_b\big(\Im_{\imath}(\mathbf{w}^T(n)\mathbf{x}(n))\big)\imath \\
&+ \Phi_c\big(\Im_{\jmath}(\mathbf{w}^T(n)\mathbf{x}(n))\big)\jmath + \Phi_d\big(\Im_{\kappa}(\mathbf{w}^T(n)\mathbf{x}(n))\big)\kappa
\end{aligned}
\tag{2.37}
\]
where $\Phi_a$, $\Phi_b$, $\Phi_c$ and $\Phi_d$ are real-valued sigmoidal functions. The symbols $\Im_{\imath}(\cdot)$, $\Im_{\jmath}(\cdot)$ and $\Im_{\kappa}(\cdot)$
correspond to the ı, ȷ and κ components respectively.
Similar to C, the main problem inherent to the split-quaternion function is its
inadequacy in processing signals that have strong correlations between the four dimensions.
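A split-quaternion function of the form (2.37) is straightforward to write down. The sketch below uses tanh for all four real-valued nonlinearities, which is an assumption made for illustration, and the function names are hypothetical. Because each output component depends only on the matching input component, perturbing one component leaves the others unchanged; this is one way to see why such functions cannot exploit coupling between the dimensions.

```python
import math

def split_tanh(q):
    """Split-quaternion nonlinearity: tanh applied to each component separately."""
    return tuple(math.tanh(c) for c in q)

def split_tanh_prime(q):
    """Componentwise derivative 1 - tanh^2 of the split nonlinearity."""
    return tuple(1.0 - math.tanh(c) ** 2 for c in q)

q = (2.0, -1.0, 0.5, 0.0)
y = split_tanh(q)

# Perturb only the i-component: the real, j and k outputs do not react.
q_shifted = (q[0], q[1] + 0.5, q[2], q[3])
y_shifted = split_tanh(q_shifted)

print(y)
print(y_shifted)
print(split_tanh_prime(q))
```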
2.6 Quaternion-valued Adaptive Filtering
The cost function in quaternion-valued adaptive filtering is usually given by a real function
of quaternion variables such as
\[
E(n) = e_a^2(n) + e_b^2(n) + e_c^2(n) + e_d^2(n) = e(n)e^*(n) \tag{2.38}
\]
Figure 2.2: Linear adaptive finite impulse response (FIR) filter
where the terms $e_a(n)$, $e_b(n)$, $e_c(n)$ and $e_d(n)$ denote respectively the error components in
the real part, ı part, ȷ part and κ part.
Based on (2.38), the derivation of one of the earliest quaternion-valued adaptive filtering
algorithms, the Quaternion Least Mean Square (QLMS) [24], is provided in the following
subsection.
2.6.1 Derivation of Quaternion Least Mean Square (QLMS)
The Quaternion Least Mean Square (QLMS) is derived based on the finite impulse response
(FIR) architecture. The basic structure of an FIR filter is depicted in Figure 2.2, with the output
y(n) and conjugate output y*(n) of the filter given by
\[
y(n) = \mathbf{w}^T(n)\mathbf{x}(n); \qquad y^*(n) = \mathbf{x}^H(n)\mathbf{w}^*(n) \tag{2.39}
\]
where w(n) and x(n) correspond to the adaptive weight vector and the filter input.
The QLMS is made adaptive according to a gradient descent update of the
coefficients, given by
\[
\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\nabla_{\mathbf{w}}E(n) \tag{2.40}
\]
where µ is the real-valued learning rate.
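From (2.38), the gradient $\nabla_{\mathbf{w}}E(n)$ is derived to be [24]
\[
\begin{aligned}
\nabla_{\mathbf{w}}E(n) &= e(n)\nabla_{\mathbf{w}}e^*(n) + \nabla_{\mathbf{w}}e(n)\,e^*(n) \\
&= e(n)\big(\nabla_{\mathbf{w}}d^*(n) - \nabla_{\mathbf{w}}y^*(n)\big) + \big(\nabla_{\mathbf{w}}d(n) - \nabla_{\mathbf{w}}y(n)\big)e^*(n) \\
&= -\big(e(n)\nabla_{\mathbf{w}}y^*(n) + \nabla_{\mathbf{w}}y(n)\,e^*(n)\big)
\end{aligned}
\tag{2.41}
\]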
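The terms $\nabla_{\mathbf{w}}y(n)$ and $\nabla_{\mathbf{w}}y^*(n)$ in (2.41) are defined as
\[
\nabla_{\mathbf{w}}y(n) = \nabla_{\mathbf{w}_a}y(n) + \nabla_{\mathbf{w}_b}y(n)\imath + \nabla_{\mathbf{w}_c}y(n)\jmath + \nabla_{\mathbf{w}_d}y(n)\kappa \tag{2.42}
\]
\[
\nabla_{\mathbf{w}}y^*(n) = \nabla_{\mathbf{w}_a}y^*(n) + \nabla_{\mathbf{w}_b}y^*(n)\imath + \nabla_{\mathbf{w}_c}y^*(n)\jmath + \nabla_{\mathbf{w}_d}y^*(n)\kappa \tag{2.43}
\]
The gradients $\nabla_{\mathbf{w}}y(n)$ in (2.42) and $\nabla_{\mathbf{w}}y^*(n)$ in (2.43) are derived in [24]. For convenience,
the full derivation of the gradients is provided in Appendix A. The final gradients are given by
\[
\nabla_{\mathbf{w}}y(n) = -2\mathbf{x}^*(n); \qquad \nabla_{\mathbf{w}}y^*(n) = 4\mathbf{x}^*(n) \tag{2.44}
\]
Substituting (2.44) into the error gradient in (2.41) gives the final QLMS weight update
of [24]
\[
\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\big(2e(n)\mathbf{x}^*(n) - \mathbf{x}^*(n)e^*(n)\big) \tag{2.45}
\]
where the factor 2 is absorbed into µ.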
For the sake of comparison, the weight update of the Complex Least Mean Square
(CLMS) is reproduced here and is given by [7]
\[
\mathbf{w}(n+1) = \mathbf{w}(n) + \mu e(n)\mathbf{x}^*(n) \tag{2.46}
\]
Comparing the weight update structure of the QLMS (2.45) with that of the CLMS (2.46)
shows that the QLMS is not a simple extension of the CLMS. The extra term in the QLMS
weight update is needed to capture the additional statistical information available in the
quaternion domain H.
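The QLMS update (2.45) can be exercised on a small noise-free identification problem. The sketch below is an illustration under several assumptions that are not part of the thesis: the function name `qlms`, a two-tap unknown system, inputs whose four components are drawn independently and uniformly from [-1, 1] for every tap (rather than from a tap-delay line), and a fixed step size.

```python
import random

def qmul(w, x):
    """Hamilton product of quaternions stored as (a, b, c, d) tuples."""
    wa, wb, wc, wd = w
    xa, xb, xc, xd = x
    return (wa*xa - wb*xb - wc*xc - wd*xd,
            wa*xb + wb*xa + wc*xd - wd*xc,
            wa*xc - wb*xd + wc*xa + wd*xb,
            wa*xd + wb*xc - wc*xb + wd*xa)

def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])

def qadd(p, q):
    return tuple(a + b for a, b in zip(p, q))

def qsub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def qscale(s, q):
    return tuple(s * c for c in q)

def qdot(w, x):
    """Filter output y = w^T x = sum_i w_i x_i (weights multiply from the left)."""
    y = (0.0, 0.0, 0.0, 0.0)
    for wi, xi in zip(w, x):
        y = qadd(y, qmul(wi, xi))
    return y

def qlms(w_true, mu=0.02, n_iter=3000, seed=1):
    """Identify w_true with the QLMS update (2.45); all settings are illustrative."""
    rng = random.Random(seed)
    p = len(w_true)
    w = [(0.0, 0.0, 0.0, 0.0) for _ in range(p)]
    for _ in range(n_iter):
        x = [tuple(rng.uniform(-1.0, 1.0) for _ in range(4)) for _ in range(p)]
        e = qsub(qdot(w_true, x), qdot(w, x))            # e(n) = d(n) - y(n)
        for i in range(p):
            xc = qconj(x[i])
            # 2 e(n) x*(n) - x*(n) e*(n): the order of the factors matters
            step = qsub(qscale(2.0, qmul(e, xc)), qmul(xc, qconj(e)))
            w[i] = qadd(w[i], qscale(mu, step))
    return w

w_true = [(0.5, -0.2, 0.1, 0.3), (-0.4, 0.25, 0.0, 0.15)]
w_hat = qlms(w_true)
err = max(abs(a - b) for wt, wh in zip(w_true, w_hat) for a, b in zip(wt, wh))
print(err)
```

Swapping the order of the factors in `step` would give a different algorithm, which is a direct consequence of the non-commutative product.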
2.6.2 Analysis of Quaternion Least Mean Square (QLMS)
In order to analyse the performance of the QLMS, the standard assumption in adaptive
filtering is made, which is
\[
d(n) = \mathbf{w}_{opt}^T\mathbf{x}(n) \tag{2.47}
\]
where $\mathbf{w}_{opt}$ is the optimal weight specified by the Wiener-Hopf solution in (2.3).
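Following the standard analysis of convergence in the mean [30], the weight error
vector v(n) is defined as
\[
\mathbf{v}(n) = \mathbf{w}(n) - \mathbf{w}_{opt} \tag{2.48}
\]
The error e(n) is then rewritten as
\[
\begin{aligned}
e(n) &= d(n) - y(n) \\
&= \mathbf{w}_{opt}^T\mathbf{x}(n) - \mathbf{w}^T(n)\mathbf{x}(n) \\
&= -\mathbf{v}^T(n)\mathbf{x}(n)
\end{aligned}
\tag{2.49}
\]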
The QLMS analysis will be based on the two following observations separately:
a) y(n)∗ = y(n) when I{y(n)} = 0;
b) y(n)∗ = −y(n) when R{y(n)} = 0.
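From the weight update in (2.45), and considering situation (a), the real part of the weight
update is calculated to be
\[
\begin{aligned}
\Re\{\mathbf{w}(n+1)\} &= \Re\{\mathbf{w}(n)\} + \Re\{\mu\big(2e(n)\mathbf{x}^*(n) - \mathbf{x}^*(n)e^*(n)\big)\} \\
&= \Re\{\mathbf{w}(n)\} - 2\Re\{\mu\mathbf{v}^T(n)\mathbf{x}(n)\mathbf{x}^*(n)\} + \Re\{\mu\mathbf{v}^T(n)\mathbf{x}(n)\mathbf{x}(n)\} \\
&= \Re\{\mathbf{w}(n)\} - 2\Re\{\mu\big(\mathbf{v}^T(n)\mathbf{x}(n)\mathbf{x}^H(n)\big)^T\} + \Re\{\mu\big(\mathbf{v}^T(n)\mathbf{x}(n)\mathbf{x}^T(n)\big)^T\}
\end{aligned}
\tag{2.50}
\]
Next, subtracting $\mathbf{w}_{opt}$ from both sides of (2.50) yields
\[
\Re\{\mathbf{v}(n+1)\} = \Re\{\mathbf{v}(n)\} - 2\Re\{\mu\big(\mathbf{v}^T(n)\underbrace{\mathbf{x}(n)\mathbf{x}^H(n)}_{\text{Covariance}}\big)^T\} + \Re\{\mu\big(\mathbf{v}^T(n)\underbrace{\mathbf{x}(n)\mathbf{x}^T(n)}_{\text{Pseudocovariance}}\big)^T\} \tag{2.51}
\]
The recursion for the weight error vector v(n) in (2.51) shows that the QLMS considers both the
covariance $\mathbf{C}_{qq}$ and the pseudocovariance $\mathbf{P}_{qq}$ in its weight updates.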
Similarly, considering the vector part of the QLMS weight update in situation (b)
would lead to
\[
\Im\{\mathbf{v}(n+1)\} = \Im\{\mathbf{v}(n)\} - 2\Im\{\mu\big(\mathbf{v}^T(n)\underbrace{\mathbf{x}(n)\mathbf{x}^H(n)}_{\text{Covariance}}\big)^T\} - \Im\{\mu\big(\mathbf{v}^T(n)\underbrace{\mathbf{x}(n)\mathbf{x}^T(n)}_{\text{Pseudocovariance}}\big)^T\} \tag{2.52}
\]
which shows that the covariance $\mathbf{C}_{qq}$ and the pseudocovariance $\mathbf{P}_{qq}$ are still involved in the
weight updates of the QLMS. This indicates that complex “augmented statistics” are
inherent to this class of algorithms.
2.7 Introduction to Quaternion Kalman Filtering
The Kalman filter (KF) algorithm operates in the state space, as opposed to the Wiener filter,
which minimises a specified cost function. The versatility of the KF stems from the flexible process
and measurement state models, which can be modified according to the application at
hand. The Quaternion Kalman Filter (QKF) algorithm was first derived for attitude
control utilising the q-method based approach [23]. For simplicity, the QKF derived in
this section is based on the basic model given by
\[
\text{Process state: } \mathbf{x}(n+1) = \mathbf{F}(n)\mathbf{x}(n) + \boldsymbol{\varepsilon}_1(n) \tag{2.53}
\]
\[
\text{Measurement state: } \mathbf{y}(n+1) = \mathbf{H}(n)\mathbf{x}(n) + \boldsymbol{\varepsilon}_2(n) \tag{2.54}
\]
where x is the M × 1 state vector, F is the M × M transition matrix, y is the N × 1
observable output vector and H is the N × M measurement matrix. Both $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ are
i.i.d. quadruply white quaternion Gaussian noise (QWGN) vectors, of dimension M × 1 and N × 1
respectively.
The basic operation of the QKF is divided into two distinct steps:
a) the time update step which predicts the a priori state vector x−;
b) the measurement update which corrects the a priori prediction x− upon receiving
the observable output y.
Firstly, consider the time update stage, where the M × 1 a priori state vector $\mathbf{x}^-$ is predicted
by
\[
\mathbf{x}^-(n) = \mathbf{F}(n)\mathbf{x}(n-1) \tag{2.55}
\]
After that, the M × M a priori estimated error covariance matrix $\mathbf{P}^-$ is calculated to be
\[
\mathbf{P}^-(n) = \mathbf{F}(n)\mathbf{P}(n-1)\mathbf{F}^H(n) + \mathbf{Q}_1(n) \tag{2.56}
\]
where $\mathbf{Q}_1$ is the M × M covariance matrix of the process noise $\boldsymbol{\varepsilon}_1$.
Proceeding to the measurement update stage, the previously estimated a priori error
covariance matrix $\mathbf{P}^-$ is used to calculate the M × N Kalman gain matrix K according
to
\[
\mathbf{K}(n) = \mathbf{P}^-(n)\mathbf{H}^H(n)\big[\mathbf{H}(n)\mathbf{P}^-(n)\mathbf{H}^H(n) + \mathbf{Q}_2(n)\big]^{-1} \tag{2.57}
\]
where the symbol $(\cdot)^{-1}$ denotes the matrix inverse and $\mathbf{Q}_2(n)$ is the N × N covariance matrix
of the measurement noise $\boldsymbol{\varepsilon}_2$.
Next, the N × 1 innovations vector $\boldsymbol{\alpha}_i$ is defined as
\[
\boldsymbol{\alpha}_i(n) = \mathbf{y}(n) - \mathbf{H}(n)\mathbf{x}^-(n) \tag{2.58}
\]
Utilising the Kalman gain K and the innovations vector $\boldsymbol{\alpha}_i$, the M × 1 estimated a posteriori
state vector x is corrected by
\[
\mathbf{x}(n) = \mathbf{x}^-(n) + \mathbf{K}(n)\boldsymbol{\alpha}_i(n) \tag{2.59}
\]
Finally, the M × M estimated a posteriori error covariance P is updated according to
\[
\mathbf{P}(n) = \big(\mathbf{I} - \mathbf{K}(n)\mathbf{H}(n)\big)\mathbf{P}^-(n) \tag{2.60}
\]
where I is the M × M identity matrix.
These two update stages are computed at every iteration of the QKF algorithm.
Comparing (2.59) with the QLMS weight update (2.45), it can be seen that the Kalman
gain K plays a role similar to that of the learning rate µ.
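The order of the computations in (2.55)–(2.60) can be followed in a deliberately simplified sketch. The example below is real-valued and scalar (M = N = 1), so every matrix collapses to a number and the Hermitian transpose has no effect; in the quaternion case the same steps would use quaternion-valued matrices and conjugate transposes. The function name `kalman_step` and the default parameter values are assumptions made for illustration.

```python
def kalman_step(x_prev, P_prev, y, F=1.0, H=1.0, Q1=0.0, Q2=1.0):
    """One time update and one measurement update for a real scalar state."""
    x_prior = F * x_prev                          # (2.55) a priori state
    P_prior = F * P_prev * F + Q1                 # (2.56) a priori error covariance
    K = P_prior * H / (H * P_prior * H + Q2)      # (2.57) Kalman gain
    innovation = y - H * x_prior                  # (2.58) innovation
    x_post = x_prior + K * innovation             # (2.59) a posteriori state
    P_post = (1.0 - K * H) * P_prior              # (2.60) a posteriori error covariance
    return x_post, P_post, K

x_post, P_post, K = kalman_step(x_prev=0.0, P_prev=1.0, y=2.0)
print(x_post, P_post, K)
```

With equal prior and measurement uncertainty the gain is one half, so the corrected state lies midway between the prediction and the measurement, in the same way that µ scales the correction term in (2.45).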
In order to model nonlinear dynamics effectively, the Extended Kalman Filter
(EKF) was proposed. The Quaternion Extended Kalman Filter (QEKF) is derived by
considering a simple nonlinear state space given by [53]
\[
\text{Process state: } \mathbf{x}(n+1) = \Phi_P\big(\mathbf{F}(n)\mathbf{x}(n)\big) + \boldsymbol{\varepsilon}_1(n) \tag{2.61}
\]
\[
\text{Measurement state: } \mathbf{y}(n+1) = \Phi_M\big(\mathbf{H}(n)\mathbf{x}(n)\big) + \boldsymbol{\varepsilon}_2(n) \tag{2.62}
\]
where $\Phi_P(\cdot)$ and $\Phi_M(\cdot)$ are the nonlinear functions of the process and measurement states
respectively.
The QEKF is derived in a similar fashion to the QKF, resulting in similar
expressions. One major difference is that the QEKF requires the computation of the derivatives
of the quaternion nonlinear functions, $\Phi_P'(\cdot)$ and $\Phi_M'(\cdot)$, which is problematic.
Despite the enhancement provided by the QEKF, the QKF is a more favourable
approach to attitude estimation. This is because the QEKF is sensitive to initial conditions
and biases in the estimation errors [23]. Furthermore, the derivatives $\Phi_P'(\cdot)$ and $\Phi_M'(\cdot)$
are unstable, as they do not fulfil the Cauchy-Riemann-Fueter (CRF) conditions.
Chapter 3
A Class of Split Quaternion Nonlinear Adaptive Filters
This chapter proposes a class of split quaternion learning algorithms for the training of
nonlinear finite impulse response (FIR) adaptive filters for the processing of three- and
four-dimensional signals. Higher dimensional signals can be represented as a quaternion
vector, similar to the Quaternion Kalman Filter approach [23]. The derivations of these
algorithms explicitly take into consideration the non-commutativity of the quaternion product. The
additional information obtained by this method provides improved performance when
processing hypercomplex processes. A rigorous analysis of the convergence of the proposed
algorithms is also provided. Simulation results on both benchmark and real-world signals
support the proposed approach.
3.1 Introduction
The introduction of Quaternion Multilayer Perceptron (QMLP) has opened up many ap-
plications in the quaternion domain H such as polarized signal classification [54] and
controlling the attitude of a rigid body [55]. Despite reaping the benefits of processing in
H, the performance of the QMLP can still be improved upon. This is because the QMLP
did not explicitly consider the non-commutativity of quaternion product in its derivation.
The aim of this chapter is to introduce a quaternion-valued nonlinear finite impulse
response (FIR) adaptive filter suitable for the processing of nonlinear signals. The
learning algorithms introduced, the Split Quaternion Nonlinear Adaptive Filtering Algorithm
(SQAFA) and the Adaptive Amplitude Split Quaternion Nonlinear Adaptive Filtering
Algorithm (AASQAFA), are derived rigorously in order to explicitly address the
non-commutativity of the quaternion product and to compensate for the large dynamical range
of the quaternion signal.
The chapter is organised as follows. In Section 3.2, the proposed SQAFA and
AASQAFA are derived, followed by the analysis of their convergence properties.
Section 3.3 compares the performance of the SQAFA and AASQAFA algorithms against
the QMLP, QMLP for FIR (QMLP-FIR) and the corresponding complex and
multidimensional real-valued algorithms, through simulations on both benchmark and real-world
multidimensional data. Section 3.4 provides further elaboration of the results obtained.
Finally, the chapter concludes in Section 3.5.
3.2 Derivation of Split Quaternion Algorithms
The cost function in quaternion-valued adaptive filtering is given by
\[
E(n) = e_a^2(n) + e_b^2(n) + e_c^2(n) + e_d^2(n) \tag{3.1}
\]
\[
\phantom{E(n)} = e(n)e^*(n) \tag{3.2}
\]
where the terms $e_a(n)$, $e_b(n)$, $e_c(n)$ and $e_d(n)$ denote respectively the error components in
the real part, ı part, ȷ part and κ part.
The current quaternion-valued nonlinear adaptive filtering algorithms utilise
split quaternion functions in order to circumvent the strict Cauchy-Riemann-Fueter (CRF)
analyticity conditions. The output of a split quaternion function is given as
\[
\Phi_s(q) = \Phi_a(q_a) + \Phi_b(q_b)\imath + \Phi_c(q_c)\jmath + \Phi_d(q_d)\kappa \tag{3.3}
\]
Figure 3.1: Nonlinear adaptive finite impulse response (FIR) filter
where $\Phi_s(\cdot)$ denotes the split quaternion nonlinear function, $\Phi_a(\cdot)$ is a real-valued nonlinear
activation function applied to the real part of q, $\Phi_b(\cdot)$ to the ı part, $\Phi_c(\cdot)$ to the ȷ part
and $\Phi_d(\cdot)$ to the κ part.
The derivative of the split quaternion nonlinear function, $\Phi_s'$, is defined to be
\[
\Phi_s'(q) = \Phi_a'(q_a) + \Phi_b'(q_b)\imath + \Phi_c'(q_c)\jmath + \Phi_d'(q_d)\kappa \tag{3.4}
\]
where the derivatives $\Phi_a'(\cdot)$, $\Phi_b'(\cdot)$, $\Phi_c'(\cdot)$ and $\Phi_d'(\cdot)$ are real-valued derivatives defined
componentwise.
The algorithms derived in the following are based on split quaternion nonlinear functions.
3.2.1 Derivation of Quaternion-valued Finite Impulse Response algorithm
In order to perform a fair comparison with the proposed algorithms, the quaternion-valued
nonlinear algorithm under consideration needs to be of the same nonlinear FIR architecture
shown in Figure 3.1. Therefore, the QMLP is derived for the nonlinear FIR architecture
and aptly named QMLP for Finite Impulse Response (QMLP-FIR) filter.
The output of the QMLP-FIR algorithm is given by
\[
\begin{aligned}
y(n) &= \Phi_s(net(n)) \\
&= \Phi_a(net_a(n)) + \Phi_b(net_b(n))\imath + \Phi_c(net_c(n))\jmath + \Phi_d(net_d(n))\kappa \\
&= y_a(n) + y_b(n)\imath + y_c(n)\jmath + y_d(n)\kappa
\end{aligned}
\tag{3.5}
\]
where net is defined as $net(n) = \mathbf{w}^T(n)\mathbf{x}(n)$, with w(n) and x(n) corresponding to the
adaptive weight vector and the filter input. The symbols $(\cdot)^T$ and $(\cdot)^*$ denote the transpose and
quaternion conjugate operators. The terms $y_a(n)$, $y_b(n)$, $y_c(n)$ and $y_d(n)$ are the
componentwise outputs of the filter.
The terms $net_a$, $net_b$, $net_c$ and $net_d$ are real-valued and defined by
\[
\begin{aligned}
net_a(n) &= \Re\{\mathbf{w}^T(n)\mathbf{x}(n)\}; & net_b(n) &= \Im_{\imath}\{\mathbf{w}^T(n)\mathbf{x}(n)\} \\
net_c(n) &= \Im_{\jmath}\{\mathbf{w}^T(n)\mathbf{x}(n)\}; & net_d(n) &= \Im_{\kappa}\{\mathbf{w}^T(n)\mathbf{x}(n)\}
\end{aligned}
\tag{3.6}
\]
where the symbols $\Re(\cdot)$, $\Im_{\imath}(\cdot)$, $\Im_{\jmath}(\cdot)$ and $\Im_{\kappa}(\cdot)$ correspond to the real, ı, ȷ and κ
components respectively. The full expressions of these terms are presented in Appendix B.
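The QMLP-FIR then minimises the cost function (3.1) through a gradient descent
weight update specified by
\[
\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\nabla_{\mathbf{w}}E(n) \tag{3.7}
\]
where µ is the real-valued learning rate and the gradient $\nabla_{\mathbf{w}}E(n)$ is given by
\[
\begin{aligned}
\nabla_{\mathbf{w}}E(n) &= \frac{\partial e_a^2(n)}{\partial\mathbf{w}} + \frac{\partial e_b^2(n)}{\partial\mathbf{w}} + \frac{\partial e_c^2(n)}{\partial\mathbf{w}} + \frac{\partial e_d^2(n)}{\partial\mathbf{w}} \\
&= -2e_a(n)\frac{\partial y_a(n)}{\partial\mathbf{w}} - 2e_b(n)\frac{\partial y_b(n)}{\partial\mathbf{w}} - 2e_c(n)\frac{\partial y_c(n)}{\partial\mathbf{w}} - 2e_d(n)\frac{\partial y_d(n)}{\partial\mathbf{w}}
\end{aligned}
\tag{3.8}
\]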
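The term $\frac{\partial y_a(n)}{\partial\mathbf{w}}$ is calculated by differentiating $y_a$ with respect to w, which yields
\[
\begin{aligned}
\frac{\partial y_a(n)}{\partial\mathbf{w}} &= \frac{\partial y_a(n)}{\partial\mathbf{w}_a} + \frac{\partial y_a(n)}{\partial\mathbf{w}_b}\imath + \frac{\partial y_a(n)}{\partial\mathbf{w}_c}\jmath + \frac{\partial y_a(n)}{\partial\mathbf{w}_d}\kappa \\
&= \Phi_a'(net_a(n))\mathbf{x}_a(n) - \Phi_a'(net_a(n))\mathbf{x}_b(n)\imath - \Phi_a'(net_a(n))\mathbf{x}_c(n)\jmath - \Phi_a'(net_a(n))\mathbf{x}_d(n)\kappa \\
&= \Phi_a'(net_a(n))\mathbf{x}^*(n)
\end{aligned}
\tag{3.9}
\]
The expressions for the remaining terms $\frac{\partial y_b}{\partial\mathbf{w}}$, $\frac{\partial y_c}{\partial\mathbf{w}}$ and $\frac{\partial y_d}{\partial\mathbf{w}}$ can be calculated similarly and
are derived in Appendix B. Substituting all these terms into (3.8) results in the final
weight update
\[
\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\Big(e(n)\cdot\Phi_s'\big(net(n)\big)\mathbf{x}^*(n)\Big) \tag{3.10}
\]
where the factor 2 is absorbed into µ.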
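The gradient behind (3.8)–(3.10) can be checked numerically for a single quaternion weight. The sketch below makes several assumptions for illustration: tanh is used for all four componentwise nonlinearities, the product "e · Φ′s" is read as the quaternion whose components are the componentwise products of the error and the derivative, and the helper names and test values are hypothetical. Under this reading the analytic gradient is −2(e · Φ′s(net))x*, and it is compared with central differences of the cost (3.1) with respect to the four real components of w.

```python
import math

def qmul(w, x):
    """Hamilton product of quaternions stored as (a, b, c, d) tuples."""
    wa, wb, wc, wd = w
    xa, xb, xc, xd = x
    return (wa*xa - wb*xb - wc*xc - wd*xd,
            wa*xb + wb*xa + wc*xd - wd*xc,
            wa*xc - wb*xd + wc*xa + wd*xb,
            wa*xd + wb*xc - wc*xb + wd*xa)

def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])

def cost(w, x, d):
    """E = e_a^2 + e_b^2 + e_c^2 + e_d^2 with y = split-tanh(w x), cf. (3.1) and (3.5)."""
    y = tuple(math.tanh(c) for c in qmul(w, x))
    return sum((dc - yc) ** 2 for dc, yc in zip(d, y))

def grad_analytic(w, x, d):
    """Gradient -2 (e . Phi_s'(net)) x*, the quantity behind the update (3.10)."""
    y = tuple(math.tanh(c) for c in qmul(w, x))
    e = tuple(dc - yc for dc, yc in zip(d, y))
    g = tuple(ec * (1.0 - yc * yc) for ec, yc in zip(e, y))   # componentwise e . Phi'
    return tuple(-2.0 * c for c in qmul(g, qconj(x)))

def grad_numeric(w, x, d, h=1e-6):
    """Central differences of E with respect to the four real components of w."""
    grad = []
    for idx in range(4):
        wp, wm = list(w), list(w)
        wp[idx] += h
        wm[idx] -= h
        grad.append((cost(tuple(wp), x, d) - cost(tuple(wm), x, d)) / (2.0 * h))
    return tuple(grad)

w = (0.3, -0.2, 0.1, 0.4)
x = (0.5, 1.0, -0.7, 0.2)
d = (0.1, -0.3, 0.2, 0.5)
ga, gn = grad_analytic(w, x, d), grad_numeric(w, x, d)
max_diff = max(abs(a - b) for a, b in zip(ga, gn))
print(ga)
print(gn)
```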
3.2.2 Derivation of the Split Quaternion Adaptive Filtering Algorithm (SQAFA)
For convenience, the derivation of SQAFA considers the cost function (3.2), which
explicitly takes into consideration the non-commutative nature of the quaternion product¹.
The cost function in (3.2) can be rewritten as
\[
\begin{aligned}
E(n) &= \big(d(n) - y(n)\big)\big(d^*(n) - y^*(n)\big) \\
&= d(n)d^*(n) - d(n)y^*(n) - y(n)d^*(n) + y(n)y^*(n)
\end{aligned}
\tag{3.11}
\]
Taking the error gradient $\nabla_{\mathbf{w}}E(n)$ of (3.11) gives
\[
\nabla_{\mathbf{w}}E(n) = -d(n)\nabla_{\mathbf{w}}y^*(n) - \nabla_{\mathbf{w}}y(n)d^*(n) + y(n)\nabla_{\mathbf{w}}y^*(n) + \nabla_{\mathbf{w}}y(n)y^*(n) \tag{3.12}
\]
The error gradient $\nabla_{\mathbf{w}}E(n)$ in (3.12) explicitly considers the non-commutativity of
quaternion algebra. To simplify the derivation of SQAFA, the odd-symmetry property of
elementary transcendental functions (ETF) is applied, given by
\[
\Phi_s'^{*}\big(net(n)\big) = \Phi_a'\big(net_a(n)\big) - \Phi_b'\big(net_b(n)\big)\imath - \Phi_c'\big(net_c(n)\big)\jmath - \Phi_d'\big(net_d(n)\big)\kappa = \Phi_s'\big(net^*(n)\big) \tag{3.13}
\]
Applying the property in (3.13), the derivations to determine $\nabla_{\mathbf{w}}y(n)$ and $\nabla_{\mathbf{w}}y^*(n)$ can
be simplified, resulting in (the derivation is similar to Appendix A)
\[
\nabla_{\mathbf{w}}y(n) = \Phi_s'\big(net(n)\big)\big(-2\mathbf{x}^*(n)\big); \qquad \nabla_{\mathbf{w}}y^*(n) = \Phi_s'^{*}\big(net(n)\big)\big(4\mathbf{x}^*(n)\big) \tag{3.14}
\]
Substituting these gradients into the error gradient $\nabla_{\mathbf{w}}E(n)$ in (3.12) gives the final
SQAFA weight update
\[
\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\Big(2e(n)\Phi_s'\big(net^*(n)\big)\mathbf{x}^*(n) - \Phi_s'\big(net(n)\big)\mathbf{x}^*(n)e^*(n)\Big) \tag{3.15}
\]
¹ In the quaternion domain H, due to the non-commutativity of the quaternion product, $\nabla_{\mathbf{w}}\big(e(n)e^*(n)\big) \neq \nabla_{\mathbf{w}}\big(e^*(n)e(n)\big)$. The gradient $\nabla_{\mathbf{w}}\big(e(n)e^*(n)\big)$ is chosen as it is quaternion-valued.
3.2.3 Derivation of Adaptive Amplitude Split Quaternion Adaptive Filtering Algorithm (AASQAFA)
Architectures with fixed nonlinearities are not suitable for real-world signals with large
dynamical range. One method of addressing the large dynamics of the signal is through
the implementation of an adaptive slope of the activation function. However, the adaptive
slope of the activation function is interchangeable with the time varying step size of the
learning algorithm, rendering it less effective [56]. To overcome this, a trainable amplitude
of the activation function is implemented [57]. This trainable amplitude was applied to a
nonlinear FIR adaptive filter in R [58] and then extended to the recurrent neural network
(RNN) for processing in the complex domain C [59], which yielded superior performance
compared to their counterparts with fixed nonlinearities. Motivated by this, a trainable
amplitude activation function shall be incorporated into the SQAFA, termed the Adaptive
Amplitude Split Quaternion Adaptive Filtering Algorithm (AASQAFA).
The adaptive amplitude of nonlinearity is defined as [57]
$$
\Phi_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big) = \lambda(n)\cdot\bar{\Phi}_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)
\tag{3.16}
$$
where $\lambda(n)$ denotes the time varying amplitude and $\bar{\Phi}_s(\cdot)$ the real nonlinearity with unit amplitude applied componentwise.
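In the context of “split quaternion” filtering, this can be formulated as
$$
\begin{aligned}
\Phi_s\big(\mathrm{net}(n)\big) &= \lambda_a(n)\Phi_a\big(\mathrm{net}_a(n)\big) + \lambda_b(n)\Phi_b\big(\mathrm{net}_b(n)\big)\imath + \lambda_c(n)\Phi_c\big(\mathrm{net}_c(n)\big)\jmath + \lambda_d(n)\Phi_d\big(\mathrm{net}_d(n)\big)\kappa \\
&= y_a(n) + y_b(n)\imath + y_c(n)\jmath + y_d(n)\kappa
\end{aligned}
\tag{3.17}
$$
where $\lambda_a(n)$ is the amplitude of the nonlinearity for the real part of the quaternion, $\lambda_b(n)$ for the $\imath$ part, $\lambda_c(n)$ for the $\jmath$ part and $\lambda_d(n)$ for the $\kappa$ part.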
The update of the adaptive amplitude is derived based on
$$
\lambda(n+1) = \lambda(n) - \rho\nabla_\lambda E(n)
\tag{3.18}
$$
where $\rho$ is a real-valued learning rate.
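The error gradient $\nabla_\lambda E(n)$ is given as
$$
\nabla_\lambda E(n) = \frac{\partial E(n)}{\partial\lambda(n)} = \frac{\partial\big[e(n)e^*(n)\big]}{\partial\lambda(n)} = e(n)\frac{\partial e^*(n)}{\partial\lambda(n)} + \frac{\partial e(n)}{\partial\lambda(n)}e^*(n)
\tag{3.19}
$$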
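From (3.17), since each dimension is treated separately, it is convenient to define the corresponding component-wise errors as
$$
\begin{aligned}
e_a(n) &= d_a(n) - \lambda_a(n)\Phi_a\big(\mathrm{net}_a(n)\big); & e_b(n) &= d_b(n) - \lambda_b(n)\Phi_b\big(\mathrm{net}_b(n)\big) \\
e_c(n) &= d_c(n) - \lambda_c(n)\Phi_c\big(\mathrm{net}_c(n)\big); & e_d(n) &= d_d(n) - \lambda_d(n)\Phi_d\big(\mathrm{net}_d(n)\big)
\end{aligned}
\tag{3.20}
$$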
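As the adaptive amplitude is applied component-wise, the error gradient $\nabla_\lambda E(n)$ for each dimension can be optimised separately. For instance, the error gradient with respect to $\lambda_a$, $\nabla_{\lambda_a}E(n)$, is given as
$$
\nabla_{\lambda_a}E(n) = e_a(n)\frac{\partial e_a^*(n)}{\partial\lambda_a(n)} + \frac{\partial e_a(n)}{\partial\lambda_a(n)}e_a^*(n) = -2e_a(n)\Phi_a\big(\mathrm{net}_a(n)\big)
\tag{3.21}
$$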
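Similar expressions are obtained for the other three dimensions. Finally, with the factor 2 absorbed into $\rho$, the updates for the amplitudes of all four nonlinearities are given by
$$
\begin{aligned}
\lambda_a(n+1) &= \lambda_a(n) + \rho e_a(n)\Phi_a\big(\mathrm{net}_a(n)\big); & \lambda_b(n+1) &= \lambda_b(n) + \rho e_b(n)\Phi_b\big(\mathrm{net}_b(n)\big) \\
\lambda_c(n+1) &= \lambda_c(n) + \rho e_c(n)\Phi_c\big(\mathrm{net}_c(n)\big); & \lambda_d(n+1) &= \lambda_d(n) + \rho e_d(n)\Phi_d\big(\mathrm{net}_d(n)\big)
\end{aligned}
\tag{3.22}
$$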
Although λ is quaternion-valued, it is derived componentwise due to the split-quaternion function, which operates componentwise. In order to derive λ as a whole, an analytic quaternion function needs to be implemented.
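The componentwise amplitude update in (3.22) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: tanh is taken as the unit-amplitude nonlinearity, the four quaternion components are stored as length-4 NumPy arrays, and the function name `aasqafa_amplitude_step` and the toy values are hypothetical rather than part of the algorithm as published.

```python
import numpy as np

def aasqafa_amplitude_step(lam, net, d, rho=0.4):
    """One componentwise amplitude update following (3.22).

    lam, net and d are length-4 arrays holding the (a, b, c, d) components
    of the amplitude, the net input and the desired signal. tanh is an
    assumed choice for the unit-amplitude nonlinearity.
    """
    phi = np.tanh(net)              # unit-amplitude nonlinearity per component
    e = d - lam * phi               # componentwise errors, as in (3.20)
    return lam + rho * e * phi, e   # amplitude update, as in (3.22)

# Toy check: the desired signal is generated with amplitude 2 in every
# component, so the adapted amplitudes should move from 1 towards 2.
net = np.array([0.5, -0.3, 0.8, 1.0])
d = 2.0 * np.tanh(net)
lam = np.ones(4)                    # initial amplitude lambda(0) = 1
for _ in range(1000):
    lam, e = aasqafa_amplitude_step(lam, net, d)
```

In this toy setting the net input is held fixed, so each amplitude follows a scalar recursion of its own; in the full algorithm the weights are adapted at the same time.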
3.2.4 Convergence Analysis of SQAFA and AASQAFA
The convergence analysis of the proposed algorithms is based upon the relationship between the a priori and a posteriori errors, and on deriving the stepsize bound which ensures convergence. Following the approach from [60] and [52], consider the first order Taylor series expansion
$$
\|\bar{e}(n)\|_2^2 = \|e(n)\|_2^2 + \Delta\mathbf{w}^H(n)\frac{\partial\|e(n)\|_2^2}{\partial\mathbf{w}(n)}
\tag{3.23}
$$
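where $\bar{e}(n)$, $e(n)$, $\Delta\mathbf{w}^H(n)$ and $\partial\|e(n)\|_2^2/\partial\mathbf{w}(n)$ are respectively the a posteriori error, the a priori error, the Hermitian of the weight update and the error gradient. The a posteriori output error $\bar{e}$ and the a priori output error $e$ are defined as²
$$
\bar{e}(n) = d(n) - \Phi_s\big(\mathbf{w}^T(n+1)\mathbf{x}(n)\big) + \bar{\varepsilon}(n); \qquad e(n) = d(n) - \Phi_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big) + \varepsilon(n)
\tag{3.24}
$$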
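The symbols $\bar{\varepsilon}$ and $\varepsilon$ denote quaternion quadruply white Gaussian noise (QWGN), defined as
$$
\varepsilon(n) = \varepsilon_a(n) + \varepsilon_b(n)\imath + \varepsilon_c(n)\jmath + \varepsilon_d(n)\kappa
\tag{3.25}
$$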
where εa, εb, εc and εd are realisations of real-valued white Gaussian noises (WGN), inde-
pendent and identically distributed (i.i.d.).
For the filter to converge, the a priori and the a posteriori errors need to satisfy
$$
\|\bar{e}(n)\|_2^2 < \|e(n)\|_2^2
\tag{3.26}
$$
In the following analysis, three standard assumptions are made:
a) the learning rate µ is small to ensure the deterministic behaviour of the ensemble
average learning curves;
b) at convergence, e(n) is statistically independent of x(n) [30];
c) both the a posteriori output error $\bar{e}(n)$ and the a priori output error $e(n)$ are Gaussian.
Applying those assumptions, the final sufficient condition for the convergence of the stepsize µ of SQAFA becomes (full derivation is given in Appendix C)
$$
0 < \mu < \frac{1}{10E\Big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\}}
\tag{3.27}
$$
² The term $\mathbf{w}^T(n)\mathbf{x}(n)$ is maintained instead of $\mathrm{net}(n)$ throughout the derivation in this subsection to explicitly show the difference between the a posteriori error and the a priori error.
In the case of AASQAFA, each parameter λ controls the amplitude of the nonlinearity in its respective dimension; hence, the convergence analysis is conducted separately for each dimension. For AASQAFA to converge, $\lambda_a(n)$, $\lambda_b(n)$, $\lambda_c(n)$ and $\lambda_d(n)$ must each converge. The analysis for $\lambda_a(n)$ is illustrated first.
The scalar component output of the AASQAFA, $y_a$ in (3.17), is considered. This modifies the scalar components of the a priori error $e_a(n)$ and the a posteriori error $\bar{e}_a(n)$ of (3.24) to be
$$
e_a(n) = d_a(n) - \lambda_a(n)\Phi_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big) + \varepsilon(n); \qquad \bar{e}_a(n) = d_a(n) - \lambda_a(n)\Phi_a\big(\mathbf{w}^T(n+1)\mathbf{x}(n)\big) + \bar{\varepsilon}(n)
\tag{3.28}
$$
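Similarly, using the same procedure as for the convergence of SQAFA, the bounds on $\lambda_a(n)$, $\lambda_b(n)$, $\lambda_c(n)$ and $\lambda_d(n)$ can be found as (derivations are given in Appendix D)
$$
\begin{aligned}
0 < \lambda_a^2(n) &< \frac{1}{2\mu E\Big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\}} \\
0 < \lambda_b^2(n) &< \frac{1}{2\mu E\Big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_b\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\}} \\
0 < \lambda_c^2(n) &< \frac{1}{2\mu E\Big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_c\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\}} \\
0 < \lambda_d^2(n) &< \frac{1}{2\mu E\Big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_d\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\}}
\end{aligned}
\tag{3.29}
$$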
3.3 Simulations
Simulations were performed in an M-step prediction setting and provide a comprehensive comparison between the nonlinear FIR filters trained with SQAFA, AASQAFA, QMLP-FIR, Complex Nonlinear Gradient Descent (CNGD) [38], real-valued Nonlinear Gradient Descent (NGD) [38] and the training algorithm for the Quaternion valued Multilayer Perceptron (QMLP) [16]. The SQAFA, AASQAFA, QMLP-FIR, CNGD and NGD were implemented with a filter length p, whereas the QMLP had one hidden layer comprising p inputs, three hidden neurons and one output neuron. The nonlinear function was the tanh function applied component-wise. The original QMLP applied the unipolar logistic function as the nonlinearity, whereas the QMLP algorithm implemented in our simulations applied the bipolar tanh function, which was better suited to the dynamic range of the data. This is justified by [61], which proves that interchanging the nonlinearity would not lead to a significant deviation in performance. In the experiments, the amplitudes of input signals in each dimension were scaled to within the range [-0.8, 0.8]. The step size of the adaptive amplitude was chosen to be ρ = 0.4 with an initial amplitude λ(0) = 1 for all experiments. A total of 20 independent simulation trials were conducted and averaged. These values were chosen to ensure optimal performance of the algorithms considered.

Figure 3.2: Left: The 4D Saito Signal (panel (a); components X1, Y1, X2 and Y2 against time in samples). Right: The 3D wind signal (panel (b); east, north and vertical directions in m/s against time in samples).
Figure 3.3: The performance of SQAFA, AASQAFA and QMLP on the prediction of 4D Saito’s Chaotic Signal. (a) Dependence of prediction gain (dB) on the prediction horizon M and filter length p. (b) Dependence of prediction gain (dB) on the stepsize µ and filter length p.

The standard prediction gain Rp was used as a quantitative measure of performance, defined as [62]
$$
R_p = 10\log_{10}\frac{\sigma_x^2}{\sigma_e^2}
\tag{3.30}
$$
where $\sigma_x^2$ and $\sigma_e^2$ denote the estimated variance of the input and error respectively. The variances were estimated according to
$$
\sigma_x^2 = E\{x_a^2 + x_b^2 + x_c^2 + x_d^2\}; \qquad \sigma_e^2 = E\{e_a^2 + e_b^2 + e_c^2 + e_d^2\}
\tag{3.31}
$$
where $E\{\cdot\}$ denotes the statistical expectation operator, $x_a^2$, $x_b^2$, $x_c^2$, $x_d^2$ are the corresponding squared components of the input signal, and similarly $e_a^2$, $e_b^2$, $e_c^2$, $e_d^2$ are the squared error components. All these values were measured at the steady-state.
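The prediction gain in (3.30) and (3.31) can be computed as in the short Python sketch below. It assumes that the signal and the error are stored as arrays with one row per sample and one column per quaternion component, and that the expectation is replaced by a sample mean over the steady-state segment; the function name `prediction_gain` is hypothetical.

```python
import numpy as np

def prediction_gain(x, e):
    """Prediction gain R_p in dB, following (3.30)-(3.31).

    x and e have shape (N, 4): one row per sample, one column per
    quaternion component. The expectation is replaced by a sample mean.
    """
    x = np.asarray(x, dtype=float)
    e = np.asarray(e, dtype=float)
    var_x = np.mean(np.sum(x**2, axis=1))   # E{x_a^2 + x_b^2 + x_c^2 + x_d^2}
    var_e = np.mean(np.sum(e**2, axis=1))   # E{e_a^2 + e_b^2 + e_c^2 + e_d^2}
    return 10.0 * np.log10(var_x / var_e)

# Toy data: an error ten times smaller than the signal in every component
# gives a power ratio of 100, that is, a gain of 20 dB.
x = np.ones((100, 4))
e = 0.1 * np.ones((100, 4))
gain_db = prediction_gain(x, e)
```

For a pure quaternion signal such as the three-dimensional wind field, the real-part column can simply be filled with zeros.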
Two quaternion valued processes were considered: the synthetic benchmark four-
dimensional Saito’s Chaotic Signal [63] and the real-world three-dimensional wind field
(pure quaternion).
Figure 3.4: The performance of SQAFA, AASQAFA and QMLP on the prediction of 3D wind signal. (a) Dependence of prediction gain (dB) on the prediction horizon M and filter length p. (b) Dependence of prediction gain (dB) on the stepsize µ and filter length p.
3.3.1 Four-dimensional Saito’s Chaotic Circuit
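Saito’s chaotic circuit is governed by four state variables $x_1$, $y_1$, $x_2$, $y_2$ and five parameters $\eta$, $\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$, and is given by [63]
$$
\begin{bmatrix} \dfrac{\partial x_1}{\partial\tau} \\[2mm] \dfrac{\partial y_1}{\partial\tau} \end{bmatrix}
=
\begin{bmatrix} -1 & 1 \\ -\alpha_1 & -\alpha_1\beta_1 \end{bmatrix}
\begin{bmatrix} x_1 - \eta\rho_1 h(z) \\ y_1 - \eta\dfrac{\rho_1}{\beta_1}h(z) \end{bmatrix}
\tag{3.32}
$$
$$
\begin{bmatrix} \dfrac{\partial x_2}{\partial\tau} \\[2mm] \dfrac{\partial y_2}{\partial\tau} \end{bmatrix}
=
\begin{bmatrix} -1 & 1 \\ -\alpha_2 & -\alpha_2\beta_2 \end{bmatrix}
\begin{bmatrix} x_2 - \eta\rho_2 h(z) \\ y_2 - \eta\dfrac{\rho_2}{\beta_2}h(z) \end{bmatrix}
\tag{3.33}
$$
where $\tau$ is the time constant of the chaotic circuit and $h(z)$ is the normalized hysteresis value, which is given as [63]
$$
h(z) =
\begin{cases}
1, & z \geq -1 \\
-1, & z \leq 1
\end{cases}
\tag{3.34}
$$
The symbols $z$, $\rho_1$ and $\rho_2$ are given as
$$
z = x_1 + x_2; \qquad \rho_1 = \frac{\beta_1}{1-\beta_1}; \qquad \rho_2 = \frac{\beta_2}{1-\beta_2}
\tag{3.35}
$$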
The Saito’s chaotic signal used is initialised with the following standard parameters: η = 1.3, α1 = 7.5, α2 = 15, β1 = 0.16 and β2 = 0.097. As chaotic signals are sensitive to initial conditions, these values ensure that the Saito’s chaotic signal exhibits chaotic behaviour. Figure 3.2(a) shows the 4D Saito’s signal dimension-wise.
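The derived constants in (3.35) and the two-branch hysteresis in (3.34) can be illustrated with the following Python sketch. The switching rule is an assumed reading of (3.34): the output stays on the +1 branch while z ≥ −1 and on the −1 branch while z ≤ 1, so it flips only when z leaves the current branch. The function names `saito_rho` and `hysteresis` are hypothetical, and the sketch does not integrate the circuit equations (3.32) and (3.33).

```python
def saito_rho(beta):
    """rho = beta / (1 - beta), as in (3.35)."""
    return beta / (1.0 - beta)

def hysteresis(z, state):
    """Normalised hysteresis h(z) of (3.34), with the previous output as state.

    Assumed rule: stay at +1 while z >= -1 and at -1 while z <= 1.
    """
    if state == 1 and z < -1.0:
        return -1
    if state == -1 and z > 1.0:
        return 1
    return state

# Standard parameters quoted in the text.
eta, alpha1, alpha2, beta1, beta2 = 1.3, 7.5, 15.0, 0.16, 0.097
rho1, rho2 = saito_rho(beta1), saito_rho(beta2)

# Sweep z down and back up to show the two switching thresholds.
state = 1
outputs = []
for z in [0.0, -0.9, -1.1, 0.0, 0.9, 1.1, 0.0]:
    state = hysteresis(z, state)
    outputs.append(state)
```

Note that the output at z = 0 depends on the history of z, which is what distinguishes the hysteresis from a plain sign function.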
Figure 3.3 illustrates the performance of the algorithms considered as a function of the prediction horizon M (with $\mu = 10^{-2}$), and as a function of the stepsize µ (with the prediction horizon M = 1). From Figure 3.3, it can be seen that AASQAFA and SQAFA have similar performance and that both outperform the QMLP.
3.3.2 Wind Forecasting
In the next simulation, a three-dimensional wind field was used as an input.³ The wind
data was initially sampled at 50 Hz, but resampled at 5 Hz for simulation purposes. Figure
3.2(b) shows the three-dimensional wind data dimension-wise.
Figure 3.4 depicts the performance of SQAFA, AASQAFA and QMLP as a function
of the prediction horizon M and stepsize µ. The prediction gain for SQAFA was better
than that of QMLP in both case studies (varying learning rate and prediction horizon),
thus indicating the benefits of fully exploiting the quaternion algebra. The performance of
AASQAFA was superior to that of SQAFA, due to its adaptive amplitude which follows
the dynamics of the wind signal more closely.
Figure 3.5 shows the comparison between SQAFA, the learning algorithm for QMLP applied to the FIR filter (QMLP-FIR), CNGD, and NGD as a function of prediction horizon M and stepsize µ. The prediction gain for SQAFA was higher than that for QMLP-FIR, followed by those of the NGD algorithm and the CNGD algorithm. When using the same FIR architecture, the SQAFA has an improved performance over the QMLP-FIR, highlighting the advantage of taking into consideration the non-commutativity of quaternion algebra. Moreover, both quaternion based algorithms proved to be better than their complex and real-valued counterparts.

³ The wind data is obtained from Prof. K. Aihara and his team at the Institute for Industrial Science, University of Tokyo, in an urban environment.

Figure 3.5: The performance of SQAFA, QMLP-FIR, CNGD and NGD on the prediction of 3D wind signal. (a) Dependence of prediction gain (dB) on the prediction horizon M and filter length p. (b) Dependence of prediction gain (dB) on the stepsize µ and filter length p.
3.4 Discussion
The performance of the SQAFA was generally better than that for QMLP, as it takes into
account more complete information about the statistics of the multidimensional signal.
The AASQAFA, on the other hand, outperformed SQAFA due to its ability to better
track the dynamics of the signal. The QMLP was less affected by the length of the
prediction horizon and the filter length as compared to the SQAFA and AASQAFA. The
deterioration of the QMLP prediction gain Rp with the increase of prediction horizon
M is almost negligible due to the structural richness of the multilayer neural network
(NN) compared to the single layer FIR architecture of SQAFA and AASQAFA. The H domain algorithms outperformed the algorithms in the complex C and real domain R, indicating that quaternion based signal processing is a better choice for the processing of three-dimensional and hypercomplex processes.

Table 3.1: Computational complexities of the algorithms

Algorithm       Additions    Multiplications
1× QMLP         96p+168      108p+216
1× SQAFA        54p+15       68p+24
1× AASQAFA      54p+19       68p+36
1× QMLP-FIR     28p+15       36p+20
2× CNGD         16p+4        24p+8
4× NGD          8p+4         12p+4

Another aspect that needs to be addressed is the computational complexity of the algorithms, which is summarised in Table 3.1. In terms of multiplications, the computational complexities of AASQAFA and SQAFA are both O(68p) and that of QMLP is O(108p). Moreover, the computational complexity of QMLP is more than twice that of SQAFA and AASQAFA when p = 1. Since the computational complexities of SQAFA and AASQAFA are similar, AASQAFA is the preferable choice due to its superior performance. The computational complexity of QMLP-FIR is O(36p), that of CNGD is O(24p), and that of NGD is O(12p). The computational complexity of SQAFA and AASQAFA is less than two times that of QMLP-FIR, nearly three times that of CNGD and almost seven times that of NGD. Hence, there is a trade-off between a higher computational complexity and an increase in performance.
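The multiplication counts of Table 3.1 can be compared for a given filter length p with a short Python sketch. The dictionary below simply transcribes the table rows as (slope, intercept) pairs, taking the CNGD and NGD rows as listed for the pair and the quadruple of filters; the names `MULTIPLICATIONS`, `multiplications` and `ratio_to_sqafa` are hypothetical.

```python
# (slope, intercept) pairs read from Table 3.1: cost = slope * p + intercept.
MULTIPLICATIONS = {
    "QMLP": (108, 216),
    "SQAFA": (68, 24),
    "AASQAFA": (68, 36),
    "QMLP-FIR": (36, 20),
    "CNGD": (24, 8),
    "NGD": (12, 4),
}

def multiplications(name, p):
    """Number of multiplications per iteration for filter length p."""
    slope, intercept = MULTIPLICATIONS[name]
    return slope * p + intercept

def ratio_to_sqafa(name, p):
    """Cost of SQAFA relative to the named algorithm."""
    return multiplications("SQAFA", p) / multiplications(name, p)

# Print the relative cost of SQAFA against every algorithm for a few
# filter lengths.
for p in (1, 10, 100):
    ratios = {name: round(ratio_to_sqafa(name, p), 2) for name in MULTIPLICATIONS}
    print(p, ratios)
```

For p = 1 the table gives 324 multiplications for QMLP against 92 for SQAFA, and for large p the ratios approach the quotients of the leading coefficients.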
The QMLP utilising the split-quaternion function was proven to be a universal approximator in [16]. Specifically, it was shown that a universal approximator for quaternion functions must be of the form [16]
$$
f(x) \approx \sum_{i=1}^{N} C_i\Phi_s\big(\mathbf{w}^T(n)\mathbf{x}(n) + \theta(n)\big)
\tag{3.36}
$$
where $f(x)$ is the quaternion-valued function to be approximated, $C_i$ is a quaternion-valued variable, $\Phi_s(\cdot)$ is a split quaternion sigmoidal function, $\mathbf{w}(n)$ is the quaternion weight vector, $\mathbf{x}(n)$ is the quaternion input vector and $\theta(n)$ is the quaternion-valued bias term.

Figure 3.6: Prediction gain (dB) of AASQAFA for the varying initial amplitude λ(0) and stepsize ρ.

Equation (3.36) conforms to the earlier findings of [1], who stated that any continuous
function can be approximated by the superposition of N sigmoidal functions. In the
context of the SQAFA and AASQAFA, N = 1, and therefore if SQAFA and AASQAFA
are extended to a neural network architecture, their approximation capabilities become
those of a universal approximator.
Figure 3.6 illustrates the dependence of the prediction gain of AASQAFA on the
initial amplitude λ(0) and step size ρ. It is shown that AASQAFA is robust to the initial
state λ(0) and the learning rate ρ for the realistic range of 0 < λ(0) < 3 and 0.1 ≤ ρ ≤ 1.5.
In summary, the advantages of SQAFA and AASQAFA are:
a) Taking into account the non-commutativity of the quaternion product leads to more
efficient use of the available statistics and improved performance;
b) AASQAFA caters for the changes in dynamical range of the signals, resulting in a
performance enhancement;
c) AASQAFA is robust to the choice of initial amplitude λ(0) and learning rate ρ.
3.5 Summary
A class of stochastic gradient algorithms (SQAFA and AASQAFA) for the training of
quaternion valued nonlinear adaptive finite impulse response (FIR) filters has been pro-
posed. The learning algorithm for the training of QMLP proved inadequate for modelling
the hypercomplex processes considered (four-dimensional Saito’s chaotic signal and three-
dimensional wind signal) due to the strong coupling between each dimension. Furthermore,
multiple univariate NGD and a pair of complex NGD (CNGD) were also considered, but
yielded poorer performance compared to both the QMLP and the SQAFA algorithms. The
split-quaternion nonlinear function was next employed, as there are no known analytic ex-
tensions of elementary transcendental functions from C to H, due to the violation of the
Cauchy-Riemann-Fueter (CRF) conditions. The derivations of the SQAFA and AASQAFA
have taken into account the non-commutativity of the quaternion product, and have been
simplified by making use of the odd-symmetry property of elementary transcendental func-
tions applied component-wise. A rigorous stability analysis has provided the range of the
stepsizes for SQAFA and AASQAFA, and has established the relationship between the
adaptive amplitude and the stepsize of the AASQAFA. The proposed algorithms (SQAFA
and AASQAFA) have been shown to exhibit excellent performance on the prediction of
quaternion valued real-world vector fields. The AASQAFA achieved better performance
due to its enhanced ability to track the time varying dynamics of the input signals.
Chapter 4
A Class of Quaternion Valued
Nonlinear Adaptive Filters
In the previous chapter, it has been shown that considering the non-commutativity of
quaternion algebra in deriving a new class of algorithms leads to an improved performance.
However, the nonlinearity used is the split-quaternion nonlinearity which does not take
into full consideration the available correlations between the dimensions. The use of
split-quaternion functions was necessary, as there are no globally analytic quaternion
nonlinear functions, as dictated by the Cauchy-Riemann-Fueter (CRF) equations.
This chapter aims to propose a class of nonlinear quaternion-valued adaptive fil-
tering algorithms based on locally analytic nonlinear activation functions. To circumvent
the stringent standard analyticity conditions of CRF which are prohibitive to the develop-
ment of nonlinear adaptive quaternion-valued estimation models, the fact that stochastic
gradient learning algorithms require only local analyticity at the operating point in the
estimation space is enforced. It is shown that the quaternion-valued exponential function
is locally analytic, and since local analyticity extends to polynomials, products and ratios,
it is shown that a class of transcendental nonlinear functions can serve as activation func-
tions in nonlinear and neural adaptive models. This provides a unifying framework for
the derivation of gradient based learning algorithms in the quaternion domain H, and the
derived algorithms are shown to have the same generic form as their real- and complex-valued counterparts. To make such models second-order optimal for the generality of quaternion signals (both circular and noncircular), recent developments in augmented quaternion statistics are employed to introduce widely linear versions of the proposed nonlinear adaptive quaternion valued filters. This allows second-order information in the data, contained in both the covariance and the pseudocovariances, to be fully exploited, catering rigorously for second-order noncircularity (improperness) and the corresponding power mismatch in the signal components. Simulations over a range of circular and noncircular synthetic processes and a real world three-dimensional noncircular wind signal support the approach.
4.1 Introduction
Although quaternion nonlinear functions have been implemented, for example, in the Quaternion
Independent Component Analysis (ICA) algorithm [64], the analyticity of such functions
has not been rigorously examined. The very stringent Cauchy-Riemann-Fueter (CRF)
conditions [25] ensure that the only globally analytic quaternion-valued functions are linear
functions and constants. This is a serious obstacle as the CRF conditions prevent us from
choosing the standard nonlinear activation functions (tanh, logistic) as the nonlinearities
in nonlinear quaternion-valued adaptive estimation.
It is important to notice that most practical gradient based learning algorithms [14–
16] only require local analyticity at a point. In analogy to the complex domain C, where
so called fully complex nonlinearities (elementary transcendental functions) provide means
for generic extensions of real neural networks [12, 38], our aim is to show that the class
of elementary transcendental functions, such as tanh are locally analytic in H and thus
permit generalisation of neural networks (NN) to the quaternion domain. This is not
possible to achieve using the standard Cauchy-Riemann-Fueter (CRF) conditions [25],
which are too restrictive. To this end, recent results on local analyticity [27] are exploited and, by means of a somewhat cumbersome derivation, the possibility of building generic quaternion-valued nonlinear adaptive filters for the most commonly used activation functions, such as tanh, is shown analytically. The derivation involves proving local analyticity of exponential functions and their ratios, thus enabling the local analyticity for transcendental nonlinear activation functions in H. Based on this set of results, the nonlinear adaptive filtering and neural network paradigms in H are then established, in a similar way to those in R and C [38, 39, 65–69], as a natural generalisation.
In this work, a class of fully quaternion locally analytic nonlinear functions suitable
for quaternion-valued nonlinear adaptive filtering is introduced. It is also shown that
full second-order statistical information in the quaternion domain can be exploited by
combining the proposed nonlinear models with so called augmented quaternion statistics
and the widely linear model [42, 43]. For simplicity, the analysis and derivations are
provided for a single nonlinear perceptron and its widely linear counterpart.
This chapter is organised as follows. Section 4.2 reviews the local analyticity con-
dition (LAC) followed by the analysis of the quaternion exponential function and quater-
nion tanh function. Section 4.3 derives the proposed learning algorithms and their widely
linear counterparts followed by their convergence analyses. Section 4.4 compares the per-
formances of the proposed algorithms against the existing algorithms of the kind. The
results are discussed in Section 4.5 whereas the chapter concludes in Section 4.6.
4.2 Fully Quaternion Functions in H
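Analyticity in H is governed by the Cauchy-Riemann-Fueter (CRF) conditions, given by [25]
$$
\frac{\partial f(q)}{\partial q_a} + \frac{\partial f(q)}{\partial q_b}\imath + \frac{\partial f(q)}{\partial q_c}\jmath + \frac{\partial f(q)}{\partial q_d}\kappa = 0 \;\Leftrightarrow\; \frac{\partial f(q)}{\partial q^*} = 0
\tag{4.1}
$$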
The CRF conditions are too strict and are only satisfied by linear quaternion functions
and constants [25], prohibiting the development of quaternion-valued nonlinear signal
processing.
To relax the CRF, a “local” analyticity condition was proposed in [27], by using a complex representation of a quaternion to give
$$
\frac{\partial f}{\partial q_a} = -\frac{\partial f}{\partial\alpha}\zeta
\tag{4.2}
$$
where $\zeta$ and $\alpha$ are given by
$$
\zeta = \frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}; \qquad \alpha = \sqrt{q_b^2 + q_c^2 + q_d^2}
\tag{4.3}
$$
The term “local” here refers to the fact that this representation uses “imaginary” unit ζ
which depends on the values of qb, qc and qd [27]. The local analyticity condition only
guarantees the first-order differentiability of the single variable quaternion functions at the
current operating point. This is perfectly adequate for quaternion valued gradient descent
adaptive filtering algorithms, as they only require the information about the gradient
value at a point. Furthermore, the local analyticity condition has only a single definition
of analyticity, eliminating the ambiguity of having left and right derivatives from which
the CRF conditions suffer. An attractive aspect of the quaternion function satisfying
the local analyticity condition is that it is also a solution to the Fueter third-order analytic
conditions [27].
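The quantities in (4.3) are straightforward to compute numerically. The Python sketch below stores a quaternion as a length-4 NumPy array `[qa, qb, qc, qd]` (an assumed convention), evaluates α and ζ, and checks that ζ behaves as an imaginary unit, that is, ζ² = −1 under the Hamilton product. The helper names `qmul` and `local_unit` are hypothetical, and the sketch assumes α > 0.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

def local_unit(q):
    """Return (zeta, alpha) of (4.3); assumes a nonzero vector part."""
    alpha = np.linalg.norm(q[1:])
    zeta = np.concatenate(([0.0], q[1:] / alpha))
    return zeta, alpha

q = np.array([0.3, 0.2, -0.1, 0.4])
zeta, alpha = local_unit(q)
zeta_squared = qmul(zeta, zeta)                            # expected: -1
q_rebuilt = np.array([q[0], 0.0, 0.0, 0.0]) + alpha * zeta # q = qa + alpha*zeta
```

The last line recovers q in the form qa + αζ, which is the complex-like representation the local analyticity condition relies on.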
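To provide a rigorous basis for nonlinear quaternion-valued adaptive filtering, the fully quaternion nonlinearities in H are identified. A function that satisfies the local analyticity condition in (4.2) is termed a ‘fully quaternion nonlinearity’, in the sense of local analyticity. Due to the “local” nature of the first-order differentiability, the quaternionic derivative at a point is dependent on the direction of the ζ-plane. The analyticity of a function at a given point is evaluated by analysing the local derivative within the ζ-plane (with ζ fixed) to obtain the relationship [27]
$$
\begin{aligned}
\frac{\partial f}{\partial\alpha} &= \frac{\partial q_b}{\partial\alpha}\frac{\partial f}{\partial q_b} + \frac{\partial q_c}{\partial\alpha}\frac{\partial f}{\partial q_c} + \frac{\partial q_d}{\partial\alpha}\frac{\partial f}{\partial q_d} \\
\alpha\frac{\partial f}{\partial\alpha} &= q_b\frac{\partial f}{\partial q_b} + q_c\frac{\partial f}{\partial q_c} + q_d\frac{\partial f}{\partial q_d}
\end{aligned}
\tag{4.4}
$$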
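Based on this relationship, along with $\zeta$ and $\alpha$ in (4.3), the right hand side of the analyticity condition in (4.2) is expanded along the orthogonal-axis vectors $\imath$, $\jmath$ and $\kappa$ as
$$
-\left(\frac{\partial f}{\partial\alpha}\right)\big(\zeta\big) = -\left(\frac{q_b}{\alpha}\frac{\partial f}{\partial q_b} + \frac{q_c}{\alpha}\frac{\partial f}{\partial q_c} + \frac{q_d}{\alpha}\frac{\partial f}{\partial q_d}\right)\left(\frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}\right)
\tag{4.5}
$$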
By analogy with C, this yields the characteristics of a fully quaternion locally analytic nonlinearity suitable for gradient based learning, given by
a) $f(q) = u(q_a, \alpha) + v(q_a, \alpha)\zeta$ is nonlinear in $q_a$ and $\alpha$;
b) $f(q)$ has no singularities and is always bounded for all values of $q$;
c) The partial derivatives $\partial u/\partial q_a$, $\partial v/\partial\alpha$, $\partial v/\partial q_a$ and $\partial u/\partial\alpha$ are continuous and bounded;
d) $\dfrac{\partial u}{\partial q_a}\dfrac{\partial v}{\partial\alpha} \neq \dfrac{\partial v}{\partial q_a}\dfrac{\partial u}{\partial\alpha}$ to ensure continuous learning.
The next subsection focuses on the analyticity of the quaternion exponential function eq,
as it serves as a building block to construct transcendental nonlinear quaternion functions,
typically used as nonlinear activation functions.
4.2.1 Quaternion Exponential Function
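The notion of an exponential function in H is not straightforward. Due to the non-commutativity of the quaternion product, there exist several definitions of the quaternion exponential [70]; for convenience, the following exponential function is considered (p.9 [71])
$$
e^q = e^{q_a + q_b\imath + q_c\jmath + q_d\kappa} = e^{q_a}e^{q_b\imath + q_c\jmath + q_d\kappa}
\tag{4.6}
$$
Expanding the term $e^q$ using the Euler formula leads to
$$
\begin{aligned}
e^q &= e^{q_a}\big(\cos(\alpha) + \sin(\alpha)\zeta\big) \\
&= e^{q_a}\left(\cos(\alpha) + \frac{q_b\sin(\alpha)\imath}{\alpha} + \frac{q_c\sin(\alpha)\jmath}{\alpha} + \frac{q_d\sin(\alpha)\kappa}{\alpha}\right)
\end{aligned}
\tag{4.7}
$$
where $\alpha$ and $\zeta$ are defined in (4.3).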
To examine whether this quaternion exponential function satisfies the analyticity condition in (4.2), (4.7) is differentiated with respect to $q_a$ to give the left hand side of (4.2), that is
$$
\frac{\partial e^q}{\partial q_a} = e^{q_a}\big(\cos(\alpha) + \sin(\alpha)\zeta\big)
\tag{4.8}
$$
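Next, (4.7) is differentiated with respect to $\alpha$ to obtain the right hand side of (4.2) as
$$
-\frac{\partial e^q}{\partial\alpha}\zeta = -\left(\frac{q_b}{\alpha}\frac{\partial e^q}{\partial q_b} + \frac{q_c}{\alpha}\frac{\partial e^q}{\partial q_c} + \frac{q_d}{\alpha}\frac{\partial e^q}{\partial q_d}\right)\left(\frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}\right)
\tag{4.9}
$$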
The result of such differentiation is given by (see Appendix E for a full derivation)
$$
-\frac{\partial e^q}{\partial\alpha}\zeta = e^{q_a}\big(\cos(\alpha) + \sin(\alpha)\zeta\big)
\tag{4.10}
$$
Therefore, this quaternion exponential function satisfies the analyticity condition in (4.2), giving the local derivative of the exponential function as
$$
\frac{\partial e^q}{\partial q} = e^q
\tag{4.11}
$$
Observe that, as desired, this result represents a generic extension of the real and complex
derivatives of an exponential. In addition, as gradient based learning algorithms are local,
this result provides a basis for introducing other nonlinearities, such as the elementary
transcendental functions (ETF), as a vehicle for a class of fully quaternion nonlinear
adaptive filters.
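The agreement between (4.8) and (4.10) can also be checked numerically. The Python sketch below is an illustration under stated assumptions: quaternions are stored as `[qa, qb, qc, qd]` arrays, both sides of the local analyticity condition (4.2) are estimated by central differences with ζ held fixed, and the helper names `qmul`, `qexp` and `lac_sides` are hypothetical. It is a numerical sanity check rather than a substitute for the derivation in Appendix E.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

def qexp(q):
    """Quaternion exponential in the Euler form of (4.7); assumes alpha > 0."""
    alpha = np.linalg.norm(q[1:])
    zeta = q[1:] / alpha
    return np.exp(q[0]) * np.concatenate(([np.cos(alpha)], np.sin(alpha) * zeta))

def lac_sides(f, q, h=1e-6):
    """Central-difference estimates of both sides of condition (4.2)."""
    alpha = np.linalg.norm(q[1:])
    zeta = q[1:] / alpha

    def point(qa, al):
        # Quaternion qa + al*zeta, with the direction zeta held fixed.
        return np.concatenate(([qa], al * zeta))

    lhs = (f(point(q[0] + h, alpha)) - f(point(q[0] - h, alpha))) / (2 * h)
    df_dalpha = (f(point(q[0], alpha + h)) - f(point(q[0], alpha - h))) / (2 * h)
    rhs = -qmul(df_dalpha, np.concatenate(([0.0], zeta)))
    return lhs, rhs

q = np.array([0.3, 0.2, -0.1, 0.4])
lhs, rhs = lac_sides(qexp, q)   # both should be close to qexp(q), cf. (4.11)
```

The same `lac_sides` helper can be applied to any other candidate nonlinearity written in the form u + vζ.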
4.2.2 Local Analyticity of the Quaternion tanh Function
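Similarly to the complex domain, tanh(q) in H can be defined as
$$
\tanh(q) = \frac{e^{2q} - 1}{e^{2q} + 1}
\tag{4.12}
$$
Proceeding in a similar manner as when addressing the analyticity of $e^q$, tanh(q) is first expanded using the Euler formula in (4.7), leading to
$$
\begin{aligned}
\tanh(q) &= \frac{e^{2q_a}\cos(2\alpha) - 1 + e^{2q_a}\sin(2\alpha)\zeta}{e^{2q_a}\cos(2\alpha) + 1 + e^{2q_a}\sin(2\alpha)\zeta} \\
&= \frac{e^{4q_a}\big(\cos^2(2\alpha) + \sin^2(2\alpha)\big) - 1 + 2e^{2q_a}\sin(2\alpha)\zeta}{e^{4q_a}\big(\cos^2(2\alpha) + \sin^2(2\alpha)\big) + 1 + 2e^{2q_a}\cos(2\alpha)} \\
&= \frac{e^{4q_a} - 1 + 2e^{2q_a}\sin(2\alpha)\zeta}{e^{4q_a} + 1 + 2e^{2q_a}\cos(2\alpha)}
\end{aligned}
\tag{4.13}
$$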
To prove the local analyticity, the left hand side of (4.2) is obtained by differentiating (4.13) with respect to $q_a$, and the right hand side of (4.2) is obtained by differentiating (4.13) with respect to $\alpha$, resulting in (a detailed derivation is given in Appendix F)
$$
\begin{align}
\frac{\partial\tanh(q)}{\partial q_a} &= \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2} + \frac{4e^{2q_a}\sin(2\alpha) - 4e^{6q_a}\sin(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2}\zeta \tag{4.14} \\
&= -\frac{\partial\tanh(q)}{\partial\alpha}\zeta \tag{4.15}
\end{align}
$$
thus illustrating that tanh(q) is a locally analytic quaternion function.
The expression for a local derivative of tanh(q) is obtained analogously to the complex case; sech(q) is first defined as
$$
\mathrm{sech}(q) = \frac{2}{e^q + e^{-q}}
\tag{4.16}
$$
Expanding (4.16) into its Euler form and then squaring (the full derivation can be found in Appendix G) results in
$$
\mathrm{sech}^2(q) = \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2} + \frac{-4e^{6q_a}\sin(2\alpha) + 4e^{2q_a}\sin(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2}\zeta
\tag{4.17}
$$
A comparison of the expression for $\mathrm{sech}^2(q)$ in (4.17) with $\partial\tanh(q)/\partial q_a = -\big(\partial\tanh(q)/\partial\alpha\big)\zeta$ in (4.14) shows that they are equivalent; therefore, a generic extension of the real and complex tanh function has been introduced to the quaternion domain, whose derivative is
$$
\frac{\partial\tanh(q)}{\partial q} = \mathrm{sech}^2(q)
\tag{4.18}
$$
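The result in (4.18) can be checked numerically with a short Python sketch. It relies on the observation that, for q = qa + αζ, the expansion (4.13) coincides with the complex tanh evaluated at qa + iα, with the imaginary part attached to ζ; the helper `lift`, which applies a complex function in the ζ-plane in this way, is a hypothetical construction for illustration, as are the names `qtanh` and `qsech2`.

```python
import numpy as np

def lift(f, q):
    """Apply a complex function f in the zeta-plane of the quaternion q.

    With q = qa + alpha*zeta, the value returned is
    Re f(qa + i*alpha) + Im f(qa + i*alpha) * zeta.
    """
    alpha = np.linalg.norm(q[1:])
    z = f(complex(q[0], alpha))
    if alpha == 0.0:
        return np.array([z.real, 0.0, 0.0, 0.0])
    return np.concatenate(([z.real], z.imag * q[1:] / alpha))

def qtanh(q):
    """Quaternion tanh, equivalent to the closed form (4.13)."""
    return lift(np.tanh, q)

def qsech2(q):
    """Quaternion sech^2, the local derivative claimed in (4.18)."""
    return lift(lambda z: 1.0 / np.cosh(z) ** 2, q)

# Central difference of tanh(q) with respect to the real part qa.
q = np.array([0.3, 0.2, -0.1, 0.4])
h = 1e-6
step = np.array([h, 0.0, 0.0, 0.0])
dtanh_dqa = (qtanh(q + step) - qtanh(q - step)) / (2 * h)
```

Here `dtanh_dqa` should agree with `qsech2(q)` up to the finite-difference error, away from the singularities of tanh.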
4.3 Derivation of Fully Quaternion Algorithms
Similar to the class of algorithms derived in Section 3.2, the cost function that the quaternion-valued adaptive filtering algorithms minimise is given by
$$
E(n) = e(n)e^*(n)
\tag{4.19}
$$
where $e(n) = d(n) - y(n)$, with $d(n)$ and $y(n)$ denoting respectively the desired signal and the output signal. The symbol $(\cdot)^*$ is the conjugate operator.
4.3.1 Derivation of Quaternion Nonlinear Gradient Descent (QNGD)
To introduce the Quaternion Nonlinear Gradient Descent (QNGD) algorithm for the finite impulse response (FIR) filter that employs a fully quaternion nonlinear activation function, consider the output $y(n)$ and its conjugate $y^*(n)$, given by
$$
y(n) = \Phi\big(\mathbf{w}^T(n)\mathbf{x}(n)\big) = \Phi\big(\mathrm{net}(n)\big); \qquad y^*(n) = \Phi\big(\mathbf{x}^H(n)\mathbf{w}^*(n)\big) = \Phi\big(\mathrm{net}^*(n)\big)
\tag{4.20}
$$
where $(\cdot)^T$ is the transpose operator, $(\cdot)^H$ is the Hermitian, and $\Phi(\cdot)$ is the fully quaternion nonlinearity, such as the tanh(q) introduced in Section 4.2.2. Proceeding as in Section 3.2, the cost function (4.19) is expressed as
$$
\begin{aligned}
E(n) &= \big(d(n) - y(n)\big)\big(d^*(n) - y^*(n)\big) \\
&= d(n)d^*(n) - d(n)y^*(n) - y(n)d^*(n) + y(n)y^*(n)
\end{aligned}
\tag{4.21}
$$
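The error gradient $\nabla_{\mathbf{w}}E(n)$ of QNGD is then calculated as
$$
\nabla_{\mathbf{w}}E(n) = -d(n)\nabla_{\mathbf{w}}y^*(n) - \nabla_{\mathbf{w}}y(n)d^*(n) + y(n)\nabla_{\mathbf{w}}y^*(n) + \nabla_{\mathbf{w}}y(n)y^*(n)
\tag{4.22}
$$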
To simplify the derivation of QNGD further, the odd-symmetry property of locally analytic quaternion elementary transcendental functions (ETF) is applied, which is shown to be
$$
\Phi'^{*}\big(\mathrm{net}(n)\big) = \Phi'\big(\mathrm{net}^*(n)\big)
\tag{4.23}
$$
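Using (4.23), the expressions for $\nabla_{\mathbf{w}}y(n)$ and $\nabla_{\mathbf{w}}y^*(n)$ are given by (similar to the derivations in Appendix A)
$$
\nabla_{\mathbf{w}}y(n) = -\Phi'\big(\mathrm{net}(n)\big)2\mathbf{x}^*(n); \qquad \nabla_{\mathbf{w}}y^*(n) = \Phi'^{*}\big(\mathrm{net}(n)\big)4\mathbf{x}^*(n)
\tag{4.24}
$$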
Substituting the terms $\nabla_{\mathbf{w}}y^*(n)$ and $\nabla_{\mathbf{w}}y(n)$ into (4.22) gives the QNGD weight update in the form
$$
\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\Big(2e(n)\Phi'^{*}\big(\mathrm{net}(n)\big)\mathbf{x}^*(n) - \Phi'\big(\mathrm{net}(n)\big)\mathbf{x}^*(n)e^*(n)\Big)
\tag{4.25}
$$
where $\Phi'(\cdot)$ is the local derivative of the fully quaternion function and µ is the real-valued learning rate. Notice that the factor 2 is absorbed into µ.
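A single QNGD iteration following (4.25) might be sketched in Python as below. Several choices here are assumptions made for illustration only: quaternions are stored as `[a, b, c, d]` arrays, the products in (4.25) are read as Hamilton products taken in the written order, tanh is used as the fully quaternion nonlinearity with local derivative sech² as in (4.18), and the helpers `qmul`, `qconj`, `lift` and `qngd_step` are hypothetical names.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

def qconj(q):
    """Quaternion conjugate."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def lift(f, q):
    """Apply a complex function f in the zeta-plane of q = qa + alpha*zeta."""
    alpha = np.linalg.norm(q[1:])
    z = f(complex(q[0], alpha))
    if alpha == 0.0:
        return np.array([z.real, 0.0, 0.0, 0.0])
    return np.concatenate(([z.real], z.imag * q[1:] / alpha))

def qngd_step(w, x, d, mu):
    """One QNGD update following (4.25), with tanh as the nonlinearity.

    w and x have shape (p, 4): p quaternion taps. d is the desired
    quaternion sample and mu the real-valued step size.
    """
    net = sum(qmul(wk, xk) for wk, xk in zip(w, x))      # net = w^T x
    y = lift(np.tanh, net)                               # filter output
    e = d - y                                            # output error
    dphi = lift(lambda z: 1.0 / np.cosh(z) ** 2, net)    # local derivative
    w_new = np.empty_like(w)
    for k, xk in enumerate(x):
        xc = qconj(xk)
        term1 = 2.0 * qmul(qmul(e, qconj(dphi)), xc)     # 2 e Phi'* x*
        term2 = qmul(qmul(dphi, xc), qconj(e))           # Phi' x* e*
        w_new[k] = w[k] + mu * (term1 - term2)
    return w_new, e

# Toy usage with small random quaternion taps (hypothetical values).
rng = np.random.default_rng(0)
w = 0.1 * rng.standard_normal((3, 4))
x = 0.3 * rng.standard_normal((3, 4))
d = np.array([0.2, -0.1, 0.05, 0.1])
w_next, err = qngd_step(w, x, d, mu=0.01)
```

When all quantities are restricted to their real parts, the two terms collapse to µ e Φ′(net) x, which matches the statement that the update has the same generic form as its real-valued counterpart.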
4.3.2 Augmented Quaternion Nonlinear Gradient Descent (AQNGD)
The QNGD is now extended to fully capture the second-order statistics of the signal by
incorporating the quaternion widely linear model [42, 43, 45] into its derivation, resulting
in the Augmented Quaternion Nonlinear Gradient Descent (AQNGD) algorithm¹. The
output y(n) of AQNGD is defined as
$$y(n) = \Phi\big(\mathbf{g}^T(n)\mathbf{x}(n) + \mathbf{h}^T(n)\mathbf{x}^{\imath}(n) + \mathbf{u}^T(n)\mathbf{x}^{\jmath}(n) + \mathbf{v}^T(n)\mathbf{x}^{\kappa}(n)\big) = \Phi\big(\mathbf{w}^{aT}(n)\mathbf{x}^a(n)\big) = \Phi\big(\mathrm{net}^a(n)\big) \tag{4.26}$$
where g, h, u and v are the weight vectors, x is the input signal, x^ı, x^ȷ and x^κ are
respectively its ı, ȷ and κ involutions, w^a = [g^T h^T u^T v^T]^T is the augmented weight
vector, and x^a = [x^T x^ıT x^ȷT x^κT]^T is the augmented random input vector.
The conjugate output y∗(n) is then given as
$$y^*(n) = \Phi\big(\mathrm{net}^{a*}(n)\big) \tag{4.27}$$
The weight updates of the AQNGD are made gradient adaptive according to
g(n+ 1) = g(n)− µ∇gE(n); h(n+ 1) = h(n)− µ∇hE(n)
u(n+ 1) = u(n)− µ∇uE(n); v(n + 1) = v(n) − µ∇vE(n) (4.28)
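The error gradient ∇gE(n) has the same form as ∇wE(n) in (4.22), with net(n) replaced by net^a(n); hence
$$\mathbf{g}(n+1) = \mathbf{g}(n) + \mu\Big(2e(n)\Phi'^*\big(\mathrm{net}^a(n)\big)\mathbf{x}^*(n) - \Phi'\big(\mathrm{net}^a(n)\big)\mathbf{x}^*(n)e^*(n)\Big) \tag{4.29}$$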
The error gradient ∇hE(n) is given by
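$$\nabla_{\mathbf{h}}E(n) = -d(n)\nabla_{\mathbf{h}}y^*(n) - \nabla_{\mathbf{h}}y(n)\,d^*(n) + y(n)\nabla_{\mathbf{h}}y^*(n) + \nabla_{\mathbf{h}}y(n)\,y^*(n) \tag{4.30}$$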
¹ A comprehensive account of widely linear modelling in the complex domain C is given in [38].
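In the same manner, the terms ∇hy(n) and ∇hy∗(n) are calculated as
$$\nabla_{\mathbf{h}}y(n) = -\Phi'\big(\mathrm{net}^a(n)\big)\,2\mathbf{x}^{\imath *}(n); \qquad \nabla_{\mathbf{h}}y^*(n) = \Phi'^*\big(\mathrm{net}^a(n)\big)\,4\mathbf{x}^{\imath *}(n) \tag{4.31}$$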
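Substituting ∇hy(n) and ∇hy∗(n) into the error gradient ∇hE(n) in (4.30) yields
$$\mathbf{h}(n+1) = \mathbf{h}(n) + \mu\Big(2e(n)\Phi'^*\big(\mathrm{net}^a(n)\big)\mathbf{x}^{\imath *}(n) - \Phi'\big(\mathrm{net}^a(n)\big)\mathbf{x}^{\imath *}(n)e^*(n)\Big) \tag{4.32}$$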
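Proceeding in a similar manner, the weight updates for u(n) and v(n) are found to be
$$\begin{aligned}
\mathbf{u}(n+1) &= \mathbf{u}(n) + \mu\Big(2e(n)\Phi'^*\big(\mathrm{net}^a(n)\big)\mathbf{x}^{\jmath *}(n) - \Phi'\big(\mathrm{net}^a(n)\big)\mathbf{x}^{\jmath *}(n)e^*(n)\Big) \\
\mathbf{v}(n+1) &= \mathbf{v}(n) + \mu\Big(2e(n)\Phi'^*\big(\mathrm{net}^a(n)\big)\mathbf{x}^{\kappa *}(n) - \Phi'\big(\mathrm{net}^a(n)\big)\mathbf{x}^{\kappa *}(n)e^*(n)\Big)
\end{aligned} \tag{4.33}$$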
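For convenience, the final weight update of the AQNGD can be written in an augmented
form as²
$$\mathbf{w}^a(n+1) = \mathbf{w}^a(n) + \mu\Big(2e(n)\Phi'^*\big(\mathrm{net}^a(n)\big)\mathbf{x}^{a*}(n) - \Phi'\big(\mathrm{net}^a(n)\big)\mathbf{x}^{a*}(n)e^*(n)\Big) \tag{4.34}$$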
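The augmented input vector used in (4.26) and (4.34) can be built from sign flips, since each involution q^η = −ηqη leaves the real part and the η part unchanged and negates the other two imaginary parts. The following is a minimal sketch assuming quaternions are stored as NumPy rows [a, b, c, d]; the names qmul, INVOLUTION_SIGNS, involution and augment are hypothetical helpers, not part of the thesis.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

# Sign pattern applied to [a, b, c, d] by each involution.
INVOLUTION_SIGNS = {
    "i": np.array([1.0, 1.0, -1.0, -1.0]),
    "j": np.array([1.0, -1.0, 1.0, -1.0]),
    "k": np.array([1.0, -1.0, -1.0, 1.0]),
}

def involution(x, axis):
    """Apply the involution about the given axis to a quaternion (shape (4,))
    or to every row of a quaternion vector (shape (p, 4))."""
    return np.asarray(x, dtype=float) * INVOLUTION_SIGNS[axis]

def augment(x):
    """Stack x with its three involutions into the (4p, 4) augmented input."""
    x = np.asarray(x, dtype=float)
    return np.vstack([x, involution(x, "i"), involution(x, "j"), involution(x, "k")])
```

With this layout, the AQNGD update (4.34) has the same form as the QNGD update (4.25), applied to a weight vector that is four times longer.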
4.3.3 Convergence Analysis of QNGD and AQNGD
As in the convergence analysis in Chapter 3, three widely used general assumptions
are made [52]:
a) the learning rate µ is sufficiently small to ensure the deterministic behaviour of the
ensemble average;
b) at convergence, the a priori output error e(n) is statistically independent of the input
vector x(n), that is E{e(n)x(n)} = 0;
c) both the a posteriori output error e(n) and a priori output error e(n) are Gaussian.
² The QNGD could also be readily extended to incorporate the semi-widely linear model [43]; however, this is beyond the scope of this work.
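Applying these assumptions and proceeding as in Appendix C, the final sufficient
condition for the convergence of QNGD becomes
$$0 < \mu < \frac{1}{10\,E\big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\,\big\|\Phi'\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\big\}} \tag{4.35}$$
whereas the condition for AQNGD is
$$0 < \mu < \frac{1}{10\,E\big\{\mathbf{x}^{aT}(n)\mathbf{x}^{a*}(n)\,\big\|\Phi'\big(\mathbf{w}^{aT}(n)\mathbf{x}^a(n)\big)\big\|_2^2\big\}} \tag{4.36}$$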
Both upper bounds in (4.35) and (4.36) are governed by the expected value of the
input vector power and the gradient of the fully quaternion nonlinearity. Note that the
upper bound of µ for the AQNGD in (4.36) is smaller than that of QNGD in (4.35), due to
the larger size of the augmented input vector x^a(n). The QNGD therefore admits a larger
stepsize µ than the AQNGD, which permits faster convergence for QNGD.
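The size of the gap between the two bounds can be made explicit with a short worked step. Each involution only changes the signs of imaginary components, so it preserves the modulus of every sample, and the power of the augmented input is exactly four times that of the original input:

$$\mathbf{x}^{aT}\mathbf{x}^{a*} = \mathbf{x}^{T}\mathbf{x}^{*} + \mathbf{x}^{\imath T}\mathbf{x}^{\imath *} + \mathbf{x}^{\jmath T}\mathbf{x}^{\jmath *} + \mathbf{x}^{\kappa T}\mathbf{x}^{\kappa *} = 4\,\mathbf{x}^{T}\mathbf{x}^{*}$$

Under the additional assumption (not made in the text) that the derivative magnitude ‖Φ′(·)‖ is of comparable size in the two filters, the AQNGD bound in (4.36) is therefore roughly one quarter of the QNGD bound in (4.35).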
4.4 Simulations
A comprehensive performance comparison is provided between the training algorithm
for the feedforward Quaternion Multilayer Perceptron (QMLP) [16, 72], the
nonlinear FIR filters trained with the QMLP learning algorithm (QMLP-FIR), the Adaptive
Amplitude Split Quaternion Adaptive Filtering Algorithm (AASQAFA), the real-valued
Nonlinear Gradient Descent (NGD) [52], and the proposed algorithms based on fully quaternion
nonlinear functions, QNGD and AQNGD. The QMLP-FIR, AASQAFA, NGD, QNGD
and AQNGD were implemented with a filter length p, whereas the QMLP comprised
p input neurons, one hidden layer of three neurons, and one output neuron. The
tanh(q) nonlinear activation function was used for all the algorithms. The performance
was measured using the prediction gain Rp, defined as [52]
$$R_p = 10\log_{10}\frac{\sigma_x^2}{\sigma_e^2} \tag{4.37}$$
where σ²x and σ²e denote respectively the estimated variances of the input and the error.
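As a small illustration of (4.37), the prediction gain can be estimated from recorded input and error sequences. The sketch below assumes quaternion signals stored as arrays of shape (N, 4) and takes the variance of a quaternion signal to be its mean squared modulus about the mean; the text does not spell out the estimator, so this choice and the function name prediction_gain are assumptions.

```python
import numpy as np

def prediction_gain(x, e):
    """Prediction gain R_p = 10 log10(var(x) / var(e)) in dB, as in (4.37).

    x, e: arrays of shape (N, 4) holding the input and error quaternions.
    The variance is estimated as the mean squared modulus about the mean
    (assumed estimator).
    """
    x = np.asarray(x, dtype=float)
    e = np.asarray(e, dtype=float)
    var_x = np.mean(np.sum((x - x.mean(axis=0)) ** 2, axis=-1))
    var_e = np.mean(np.sum((e - e.mean(axis=0)) ** 2, axis=-1))
    return 10.0 * np.log10(var_x / var_e)
```

An error sequence whose amplitude is one tenth of the input amplitude corresponds to a gain of 20 dB, which gives a sense of scale for the values reported in this chapter.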
[Figure 4.1: plot of the error 10 log10 E(n) (dB) against the number of iterations (n) for AASQAFA, AQNGD, QNGD and QMLP-FIR.]
Figure 4.1: Learning curves for QMLP-FIR, AASQAFA, QNGD and AQNGD on the prediction of linear AR (4) signal (4.39) driven by H-circular white Gaussian noise.
WGN   H-circular   Cı-circular                 Noncircular
εa    N(0, 1)      N(0, 1)                     N(0, 1)
εb    N(0, 1)      N(0, 1)                     −0.6εa + N(0, 1)
εc    N(0, 1)      0.4εa + 0.8εb + N(0, 1)     0.8εb + N(0, 1)
εd    N(0, 1)      0.8εa − 0.4εb + N(0, 1)     0.8εa − 0.4εb + N(0, 1)
Table 4.1: Classes of Quaternion White Gaussian Noise
The three quaternion valued processes considered were the synthetic linear AR (4)
process [38] with a varying degree of circularity, the noncircular chaotic four-dimensional
Saito signal [63], and the real-world three-dimensional wind field.
4.4.1 Linear AR (4)
For this experiment, the input tap length was chosen to be p = 3, the prediction horizon
M = 1 and the learning rate µ = 5 × 10^−3.
In the first set of simulations, the performances of AQNGD, QNGD, AASQAFA and
QMLP-FIR were analysed for a linear AR (4) process with a varying degree of circularity
of the driving quaternion quadruply white Gaussian noise (QWGN) ε(n). The QWGN is
described by
$$\varepsilon(n) = \varepsilon_a(n) + \varepsilon_b(n)\imath + \varepsilon_c(n)\jmath + \varepsilon_d(n)\kappa \tag{4.38}$$
where εa, εb, εc and εd are realisations of real-valued white Gaussian noises (WGN). The
properties of the noises used to generate the different classes of QWGN are shown in Table 4.1.
Note that the properties of Cȷ-circular and Cκ-circular noises are similar to those of the
Cı-circular input noise, and their descriptions are omitted for brevity.
A total of 100 independent simulation trials were conducted and averaged for the
linear AR (4) process given by
$$r(n) = 1.79r(n-1) - 1.85r(n-2) + 1.27r(n-3) - 0.41r(n-4) + \varepsilon(n) \tag{4.39}$$
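The test signal (4.39) with the driving noises of Table 4.1 can be generated as sketched below. This is an illustrative sketch rather than the code used for the reported results: the class labels passed as strings, the zero initial conditions and the function names qwgn and ar4 are assumptions. Because the AR coefficients are real, the recursion acts on each of the four components separately.

```python
import numpy as np

AR4 = np.array([1.79, -1.85, 1.27, -0.41])  # coefficients of (4.39)

def qwgn(n, kind, rng):
    """Draw n samples of quaternion WGN, shape (n, 4), for one class of Table 4.1."""
    a, b, c, d = (rng.standard_normal(n) for _ in range(4))
    if kind == "H-circular":
        pass                                  # four independent N(0, 1) components
    elif kind == "Ci-circular":
        c = 0.4 * a + 0.8 * b + c
        d = 0.8 * a - 0.4 * b + d
    elif kind == "noncircular":
        b = -0.6 * a + b
        c = 0.8 * b + c                       # uses the correlated b component
        d = 0.8 * a - 0.4 * b + d
    else:
        raise ValueError(f"unknown noise class: {kind}")
    return np.stack([a, b, c, d], axis=1)

def ar4(eps):
    """Filter the driving noise through the AR(4) recursion (4.39),
    starting from zero initial conditions (an assumption)."""
    r = np.zeros_like(eps)
    for n in range(len(eps)):
        for k in range(1, 5):
            if n - k >= 0:
                r[n] += AR4[k - 1] * r[n - k]
        r[n] += eps[n]
    return r
```

A typical call would be `r = ar4(qwgn(6000, "noncircular", np.random.default_rng(0)))`, repeated over independent seeds to form the ensemble average.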
Figure 4.1 shows the learning curves for an H-circular quaternion white Gaussian
noise as the driving noise of the linear AR (4) process. Observe that the proposed AQNGD
and QNGD had the fastest convergence, followed by the AASQAFA and QMLP-FIR. It
can be seen that the steady-state performances of AQNGD, QNGD and AASQAFA were
similar, due to the matched power of the components of the H-circular linear AR (4) signal.
Figure 4.2 depicts the learning curves for the input Cı-circular white Gaussian
noise³ for all of the algorithms considered. As in the previous case, the AQNGD and
QNGD had the fastest convergence and, as desired, the steady-state results for AQNGD
and QNGD were equivalent. In the case of Cȷ and Cκ white Gaussian noises, similar
performances were obtained and are omitted in this work for conciseness.
Figure 4.3 shows learning curves for all the algorithms considered using a noncircular
white Gaussian noise as the input; the AQNGD and QNGD had improved performances
over the AASQAFA and QMLP-FIR. It can also be seen that the steady-state error
of AQNGD was lower than that of QNGD, as it was designed to cater for any noncircular
autoregressive (AR) type of process.
³ The notion of Cη circularity refers to the case where only a pair of axes exhibits complex circularity.
[Figure 4.2: plot of the error 10 log10 E(n) (dB) against the number of iterations (n) for QMLP-FIR, AASQAFA, QNGD and AQNGD.]
Figure 4.2: Learning curves for QMLP-FIR, AASQAFA, QNGD and AQNGD on the prediction of linear AR (4) signal (4.39) driven by Cı-circular white Gaussian noise.
Table 4.2 compares the prediction gains Rp of the AQNGD, QNGD, AASQAFA and
QMLP-FIR for the prediction of the linear AR (4) process with varying classes of input
circularity, with µ = 10^−2. The prediction gain was obtained from an average of 100
Monte-Carlo trials. In all the cases, the proposed algorithms, AQNGD and QNGD,
outperformed the AASQAFA and QMLP-FIR, illustrating the power of the fully
quaternion function over the split-quaternion function. Table 4.2 also justifies the use of
the quaternion widely linear model for noncircular data, as indicated by
the higher prediction gain of AQNGD over the QNGD for noncircular sources.
4.4.2 Four-dimensional Saito’s Chaotic Circuit
The Saito chaotic signal was initialised with the following parameters: η = 1.3, α1 = 7.5,
α2 = 15, β1 = 0.16 and β2 = 0.097, and is noncircular, as shown dimension-wise in Figure
4.4(a). These values guarantee the chaotic behaviour of the Saito signal.
[Figure 4.3: plot of the error 10 log10 E(n) (dB) against the number of iterations (n) for QMLP-FIR, AASQAFA, QNGD and AQNGD.]
Figure 4.3: Learning curves for QMLP-FIR, AASQAFA, QNGD and AQNGD on the prediction of linear AR (4) signal (4.39) driven by noncircular white Gaussian noise.
Figure 4.5 depicts the performances of the algorithms considered in terms of prediction
horizon M (with fixed stepsize µ = 10^−2) and stepsize µ (with fixed prediction
horizon M = 1). Observe that the AQNGD outperformed all the other algorithms by a
margin greater than 2 dB. In all cases, increasing the stepsize led to a higher prediction
gain, provided that the upper bounds for QNGD in (4.35) and AQNGD in (4.36) were
satisfied.
Figure 4.6 illustrates the dependence of the prediction gain on the filter length p for all
algorithms, with a fixed prediction horizon M = 1 and stepsize µ = 10^−2. Observe that the
prediction gain of the AQNGD was the largest, followed closely by that of the QNGD. However,
increasing the filter length above p = 80 taps leads to a significant performance degradation
of the AQNGD, whereas the performance of the QNGD remains almost constant
for higher filter lengths p. This is because increasing the filter length proportionally
increases the value of the term x^aT(n)x^a∗(n), which controls the maximum allowable µ,
so that the fixed stepsize violates the upper bound of µ for AQNGD specified in (4.36). The same
stepsize is, however, still within the upper bound of µ for QNGD given in (4.35).

Algorithm   H-circular   Cı-circular   Cȷ-circular   Cκ-circular   Noncircular
AQNGD       20.22 dB     20.93 dB      20.91 dB      20.88 dB      21.58 dB
QNGD        19.46 dB     20.04 dB      19.99 dB      20.01 dB      20.45 dB
AASQAFA     18.09 dB     15.75 dB      15.35 dB      15.66 dB      17.01 dB
QMLP-FIR    16.58 dB     18.11 dB      18.11 dB      18.05 dB      18.04 dB
Table 4.2: Prediction Gain Rp for a Linear AR (4) Process With Varying Degree of Noncircularity
4.4.3 Wind Forecasting
In this set of simulations, a single realisation of a three-dimensional wind field was used
as the input⁴. Figure 4.4(b) shows the wind field signal dimension-wise, and Figure 4.7
illustrates the performances of AQNGD, QNGD, AASQAFA and QMLP-FIR as a function
of prediction horizon M and stepsize µ. The AQNGD performed better than the
QNGD, which was closely followed by AASQAFA, whereas the performance of the
QMLP-FIR was the lowest.
Figure 4.8 shows a comparison of the proposed QNGD with the existing QMLP
and three real-valued NGD filters as a function of prediction horizon M, with a fixed stepsize
µ = 10^−2. From Figure 4.8, observe that the QNGD outperformed the other algorithms
considered. Also observe that the QMLP prediction gain was almost constant as the
prediction horizon increased, due to the structural richness of the feedforward multilayer neural
network, which conforms to the earlier studies in Chapter 3.
4.5 Discussion
The performances of the filters that use the proposed locally analytic fully quaternion
activation functions were generally better than those of the existing AASQAFA and
QMLP-FIR. The widely linear version outperformed the QNGD, due to the implementation of the
quaternion widely linear model that fully captures the second-order statistics of quaternion
signals.

⁴ The wind data were sampled at 32 Hz and recorded by the 3D WindMaster anemometer provided by Gill Instruments.

[Figure 4.4: time-series plots against time (samples). (a) 4D Saito signal: components X1, Y1, X2 and Y2. (b) 3D wind signal: East, North and Vertical directions (m/s).]
Figure 4.4: Noncircular signals used in simulations. Left: The 4D Saito Signal. Right: The 3D wind signal.
In order to create a class of fully quaternion functions suitable for quaternion-valued
adaptive filtering, it is essential to examine the possibility of employing other fully
complex transcendental functions [12] as locally analytic fully quaternion functions. In
Section 4.2, the exponential function e^q was established to be locally analytic. Given
that sums, products and quotients (provided the denominator does not vanish) of analytic
functions are also analytic, the tanh(q) function is locally analytic as well,
because it can be expressed in terms of e^q as
$$\tanh(q) = \frac{\sinh(q)}{\cosh(q)} = \frac{e^{q} - e^{-q}}{e^{q} + e^{-q}} = \frac{e^{2q} - 1}{e^{2q} + 1} \tag{4.40}$$
This was verified by the rigorous derivations given in Appendix E, Appendix F and
Appendix G. By continuity, the other quaternion transcendental functions are also locally
analytic. In the complex domain, it has been shown in [61] that the performances based
on a set of fully analytic transcendental functions were similar. In the same spirit, Figure
4.9 confirms by simulations that the other elementary transcendental functions give similar
performance to that of the locally analytic function tanh(q). It is therefore shown that
the fully complex transcendental activation functions from C can be extended to fully
quaternion functions in H; this is consistent with the observations in [61].

[Figure 4.5: surface plots of prediction gain (dB) for QNGD, AQNGD, AASQAFA and QMLP-FIR. (a) Dependence of the prediction gain on M and p. (b) Dependence of the prediction gain on µ and p.]
Figure 4.5: The performance of AQNGD, QNGD, AASQAFA and QMLP-FIR on the prediction of the noncircular 4D Saito signal.
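The identity (4.40) can be checked numerically. The sketch below builds tanh(q) from the quaternion exponential and compares it with a direct evaluation that treats q as a complex number in the plane spanned by 1 and its unit vector part. Storing quaternions as NumPy arrays [a, b, c, d] and the helper names qmul, qexp, qinv, qtanh_via_exp and qtanh_planar are assumptions made for illustration.

```python
import cmath
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions stored as [a, b, c, d]."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qexp(q):
    """Quaternion exponential e^q = e^a (cos r + u sin r), r = |vector part|."""
    q = np.asarray(q, dtype=float)
    r = np.linalg.norm(q[1:])
    u = q[1:] / r if r > 0 else np.zeros(3)
    return np.exp(q[0]) * np.concatenate(([np.cos(r)], np.sin(r) * u))

def qinv(q):
    """Quaternion inverse q^* / |q|^2."""
    q = np.asarray(q, dtype=float)
    return q * np.array([1.0, -1.0, -1.0, -1.0]) / np.dot(q, q)

def qtanh_via_exp(q):
    """tanh(q) = (e^{2q} - 1)(e^{2q} + 1)^{-1}, the last form in (4.40)."""
    e2q = qexp(2.0 * np.asarray(q, dtype=float))
    one = np.array([1.0, 0.0, 0.0, 0.0])
    return qmul(e2q - one, qinv(e2q + one))

def qtanh_planar(q):
    """tanh evaluated as a complex function in the plane of q."""
    q = np.asarray(q, dtype=float)
    r = np.linalg.norm(q[1:])
    z = cmath.tanh(complex(q[0], r))
    u = q[1:] / r if r > 0 else np.zeros(3)
    return np.concatenate(([z.real], z.imag * u))
```

The quotient in (4.40) is unambiguous here because numerator and denominator both lie in the plane spanned by 1 and the unit vector part of q, and therefore commute.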
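For convenience, the class of locally analytic fully quaternion functions and their
derivatives are given below
$$\tanh(q):\quad \frac{\partial \tanh(q)}{\partial q} = \mathrm{sech}^2(q) \tag{4.41}$$
$$\tan(q):\quad \frac{\partial \tan(q)}{\partial q} = \sec^2(q) \tag{4.42}$$
$$\sin(q):\quad \frac{\partial \sin(q)}{\partial q} = \cos(q) \tag{4.43}$$
$$\arctan(q):\quad \frac{\partial \arctan(q)}{\partial q} = (1 + q^2)^{-1} \tag{4.44}$$
$$\arcsin(q):\quad \frac{\partial \arcsin(q)}{\partial q} = (1 - q^2)^{-1/2} \tag{4.45}$$
$$\sinh(q):\quad \frac{\partial \sinh(q)}{\partial q} = \cosh(q) \tag{4.46}$$
$$\mathrm{arctanh}(q):\quad \frac{\partial\, \mathrm{arctanh}(q)}{\partial q} = (1 - q^2)^{-1} \tag{4.47}$$
$$\mathrm{arcsinh}(q):\quad \frac{\partial\, \mathrm{arcsinh}(q)}{\partial q} = (1 + q^2)^{-1/2} \tag{4.48}$$

[Figure 4.6: plot of prediction gain (dB) against filter length p (10 to 90) for AQNGD, QNGD, QMLP-FIR and AASQAFA.]
Figure 4.6: The performance of AQNGD, QNGD, AASQAFA and QMLP-FIR on the prediction of the noncircular 4D Saito signal over a range of filter lengths.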
[Figure 4.7: surface plots of prediction gain (dB) for AQNGD, AASQAFA, QNGD and QMLP-FIR. (a) Dependence of the prediction gain on M and p. (b) Dependence of the prediction gain on µ and p.]
Figure 4.7: The performance of AQNGD, QNGD, AASQAFA and QMLP-FIR on the prediction of a 3D wind signal.
Another factor to consider is the computational complexity of the algorithms,
summarised in Table 4.3. The computational complexity of both the AASQAFA and the QNGD is O(68p);
the NGD has the lowest computational complexity of O(9p) and the AQNGD has the
highest computational complexity of O(272p). The computational complexity of the
QMLP-FIR is O(36p) and that of the QMLP is O(108p). The QNGD algorithm thus represents an
improvement on the previously proposed AASQAFA in terms of performance
and simplicity, while maintaining similar computational complexity.

Algorithm       Multiplications   Additions
1× QMLP-FIR     36p+20            28p+15
1× AASQAFA      68p+36            54p+19
1× QMLP         108p+216          96p+168
3× NGD          9p+3              6p+3
1× QNGD         68p+36            54p+24
1× AQNGD        272p+144          208p+38
Table 4.3: Computational complexities of the algorithms considered
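To give a feel for the counts in Table 4.3, the expressions can be evaluated for a concrete filter length. The sketch below assumes that the "3×" entry means three parallel real-valued NGD filters whose per-filter counts are multiplied by three; that reading, and the names COSTS and op_counts, are assumptions made for illustration.

```python
# (number of parallel filters, multiplications(p), additions(p)) from Table 4.3
COSTS = {
    "QMLP-FIR": (1, lambda p: 36 * p + 20, lambda p: 28 * p + 15),
    "AASQAFA":  (1, lambda p: 68 * p + 36, lambda p: 54 * p + 19),
    "QMLP":     (1, lambda p: 108 * p + 216, lambda p: 96 * p + 168),
    "NGD":      (3, lambda p: 9 * p + 3, lambda p: 6 * p + 3),
    "QNGD":     (1, lambda p: 68 * p + 36, lambda p: 54 * p + 24),
    "AQNGD":    (1, lambda p: 272 * p + 144, lambda p: 208 * p + 38),
}

def op_counts(name, p):
    """Return (multiplications, additions) per iteration for filter length p."""
    copies, mults, adds = COSTS[name]
    return copies * mults(p), copies * adds(p)

if __name__ == "__main__":
    for name in COSTS:
        print(name, op_counts(name, 3))
```

For p = 3, the tap length used in the AR (4) experiment, the QNGD needs 240 multiplications per iteration and the AQNGD needs 960. The factor of four holds for every p, since 272p + 144 = 4(68p + 36), which reflects the fourfold length of the augmented weight vector.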
[Figure 4.8: surface plot of prediction gain (dB) against prediction horizon M and filter length p for QNGD, NGD and QMLP.]
Figure 4.8: The performance of QNGD, QMLP and NGD on the prediction of a 3D wind signal.
In summary, the advantages of the proposed class of QNGD and AQNGD algorithms,
based on fully quaternion locally analytic nonlinearities, are:
a) The algorithms based on fully quaternion locally analytic functions,
QNGD and AQNGD, performed better than those based on split quaternion
functions, AASQAFA and QMLP-FIR, as the fully quaternion nonlinearities (4.41)
- (4.48) operate directly in the quaternion domain instead of performing channelwise
processing in R;
b) The widely linear model enables the AQNGD to fully capture the quaternion
second-order statistics, making it suitable for noncircular (improper) signals, and hence offers a
further performance enhancement over the standard linear model employed in QNGD,
AASQAFA and QMLP-FIR;
c) The fully quaternion based QNGD is a reasonable choice, as it offers a trade-off
between performance and computational complexity.
[Figure 4.9: surface plot of prediction gain (dB) against filter length p and prediction horizon M, showing the dependence of the prediction gain on the type of nonlinear quaternion function.]
Figure 4.9: Prediction gains of QNGD for tan(q), sin(q), arctan(q), arcsin(q), sinh(q), arctanh(q) and arcsinh(q) for the prediction of 3D wind signal.
4.6 Summary
A class of quaternion-valued nonlinear functions suitable for stochastic gradient based
training of quaternion-valued nonlinear adaptive filters has been proposed. The existing
learning algorithms either completely neglect the non-commutativity of quaternions,
thus proving inadequate for the modelling of three- and four-dimensional processes, or are
unable to provide an accurate estimate due to the use of the split-quaternion function that
applies real nonlinearities component-wise. A class of fully quaternion activation functions
has been derived according to the local analyticity condition (LAC), which enables the
extension of fully complex nonlinear activation functions to the quaternion domain H.
The proposed fully quaternion algorithms (QNGD and AQNGD) have been shown to
exhibit excellent performance on the prediction of four-dimensional synthetic and
three-dimensional real-world vector signals. The widely linear AQNGD has been shown to
achieve enhanced performance due to the utilisation of the quaternion widely linear model
and the associated augmented quaternion statistics, which fully capture the second-order
information within quaternion-valued signals and enable the processing of both
second-order circular (proper) and noncircular (improper) processes. Simulations over a range of
noncircular synthetic signals and real-world three-dimensional wind recordings illustrate
the benefits of the proposed approach.
Chapter 5
Enabling Quaternion Valued
Recurrent Neural Networks
In the previous chapter, it was shown that the fully quaternion functions are suitable
for gradient-descent nonlinear quaternion-valued adaptive filtering applications. The fully
quaternion functions fulfil the local analyticity condition (LAC), which guarantees first-order
differentiability of these functions.
This chapter extends the previously proposed fully
quaternion algorithms to quaternion-valued recurrent neural networks (RNNs). The
strict Cauchy-Riemann-Fueter (CRF) analyticity conditions establish that only linear
quaternion-valued functions are analytic, prohibiting the development of quaternion-valued
nonlinear adaptive filters for the RNN architecture.
In this work, the fact that gradient based learning requires only local analyticity is exploited,
and the LAC is used to introduce quaternion-valued
nonlinear feedback adaptive filters. The introduced class of algorithms makes full use of
quaternion algebra and provides generic extensions of the corresponding real and complex
solutions. Simulations in the prediction setting support the analysis presented.
5.1 Introduction
Quaternion-valued nonlinear filtering algorithms make use of elementary transcendental
functions (ETF) [12], which do not satisfy the Cauchy-Riemann-Fueter (CRF) conditions;
in fact these strict conditions are only met by linear quaternion-valued functions and con-
stants. The local analyticity condition (LAC) [27] is adopted to circumvent the analyticity
problem of the CRF. It treats the quaternion variable similarly to a complex variable and
can only guarantee the first-order differentiability of single variable quaternion functions
at a point. Notice, however, that for most gradient based learning algorithms the first
order derivative is adequate, enabling the derivation of nonlinear algorithms as shown in
Chapter 4.
The class of nonlinear algorithms in Chapter 4 was based on the feedforward
architecture and requires a long filter length for the modelling of systems with long-term
correlations. In such cases, the infinite impulse response (IIR) architecture is more
appropriate, as its feedback can model long-term correlations with a small-scale
model. For completeness, the aim is to investigate the suitability of the LAC in the derivation
of gradient-based learning algorithms for feedback architectures, and to provide a building
block for recurrent neural networks in the context of quaternion-valued signal processing.
The LAC allows the use of the ‘fully’ rather than the ‘split’ quaternion functions.
Similarly to the complex domain C [12,38], these ‘fully’ quaternion functions permit rigorous
treatment of the cross-information across the data channels, in contrast to the
componentwise operation of the ‘split’ quaternion functions. The use of the recurrent architecture
and the fully quaternion functions will thus enhance the generality of the existing class of
quaternion-valued adaptive filtering algorithms [24,73].
This chapter is organised as follows. Section 5.2 presents an analysis that
highlights the differences between split and fully quaternion functions. In Section 5.3, the
quaternion-valued recurrent neural network (RNN) algorithms are derived. The proposed
algorithms are supported by simulations on the synthetic three-dimensional Lorenz attractor
and three-dimensional motion data in Section 5.4. The chapter concludes in Section 5.5.
5.2 Analysis of Quaternion-Valued Functions
This section shows that the fully quaternion functions capture the cross-correlations
between the dimensions better than the split-quaternion functions. Consider
the split quaternion function e^q, which serves as a building block for other
quaternion elementary transcendental functions, that is
$$e^q = e^{q_a} + e^{q_b}\imath + e^{q_c}\jmath + e^{q_d}\kappa = y_a + y_b\imath + y_c\jmath + y_d\kappa \tag{5.1}$$
where ya, yb, yc and yd are the real-valued elements of the componentwise
quaternion-valued output.
From (5.1), it is clear that each output component depends only on the
input component of the same dimension, so that
$$E\{y_a\} = E\{e^{q_a}\}; \quad E\{y_b\} = E\{e^{q_b}\}; \quad E\{y_c\} = E\{e^{q_c}\}; \quad E\{y_d\} = E\{e^{q_d}\} \tag{5.2}$$
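Now consider a “fully” quaternion e^q function, which gives the output
$$\begin{aligned}
e^q &= e^{q_a} e^{q_b\imath + q_c\jmath + q_d\kappa} \\
&= e^{q_a}\Bigg(\cos\sqrt{q_b^2+q_c^2+q_d^2} + \frac{q_b \sin\sqrt{q_b^2+q_c^2+q_d^2}}{\sqrt{q_b^2+q_c^2+q_d^2}}\,\imath + \frac{q_c \sin\sqrt{q_b^2+q_c^2+q_d^2}}{\sqrt{q_b^2+q_c^2+q_d^2}}\,\jmath + \frac{q_d \sin\sqrt{q_b^2+q_c^2+q_d^2}}{\sqrt{q_b^2+q_c^2+q_d^2}}\,\kappa\Bigg) \\
&= y_a + y_b\imath + y_c\jmath + y_d\kappa
\end{aligned} \tag{5.3}$$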
Examining the output ya componentwise shows that
$$E\{y_a\} = E\Big\{e^{q_a}\cos\sqrt{q_b^2+q_c^2+q_d^2}\Big\} \tag{5.4}$$
represents a nonlinear combination of all the input components qa, qb, qc, qd, and therefore
accounts for the internal couplings. This holds true for the other output components, which
are given by
$$\begin{aligned}
E\{y_b\} &= E\Bigg\{\frac{e^{q_a} q_b \sin\sqrt{q_b^2+q_c^2+q_d^2}}{\sqrt{q_b^2+q_c^2+q_d^2}}\Bigg\} \\
E\{y_c\} &= E\Bigg\{\frac{e^{q_a} q_c \sin\sqrt{q_b^2+q_c^2+q_d^2}}{\sqrt{q_b^2+q_c^2+q_d^2}}\Bigg\} \\
E\{y_d\} &= E\Bigg\{\frac{e^{q_a} q_d \sin\sqrt{q_b^2+q_c^2+q_d^2}}{\sqrt{q_b^2+q_c^2+q_d^2}}\Bigg\}
\end{aligned} \tag{5.5}$$
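The difference between (5.1) and (5.3) can be seen numerically by perturbing a single input component and observing which output components respond. The sketch below stores a quaternion as a NumPy array [qa, qb, qc, qd]; the function names qexp_split and qexp_full are hypothetical and chosen only for this illustration.

```python
import numpy as np

def qexp_split(q):
    """Split exponential (5.1): a real exponential applied to each component."""
    return np.exp(np.asarray(q, dtype=float))

def qexp_full(q):
    """Fully quaternion exponential (5.3)."""
    q = np.asarray(q, dtype=float)
    r = np.sqrt(q[1] ** 2 + q[2] ** 2 + q[3] ** 2)
    # sin(r)/r tends to 1 as r tends to 0
    s = np.sin(r) / r if r > 0 else 1.0
    return np.exp(q[0]) * np.array([np.cos(r), q[1] * s, q[2] * s, q[3] * s])

if __name__ == "__main__":
    q1 = np.array([0.1, 0.2, 0.3, 0.4])
    q2 = np.array([0.1, 0.9, 0.3, 0.4])   # only the i-component differs
    print("split:", qexp_split(q1), qexp_split(q2))
    print("full: ", qexp_full(q1), qexp_full(q2))
```

Changing only qb leaves the real, ȷ and κ outputs of the split function untouched, whereas every output component of the fully quaternion function changes, which is the coupling expressed by (5.4) and (5.5).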
5.3 FCRNN Algorithms in H
The fully connected recurrent neural network (FCRNN) consists of N neurons and p
external inputs, as illustrated in Figure 5.1. The network has two distinct layers: a
feedback layer and a layer of processing elements. For consistency with the
recurrent neural network (RNN) literature, yl(n) denotes
the quaternion-valued output of neuron l = 1, . . . , N at time index n, and s(n) the
(1 × p) external quaternion-valued input vector. The overall input to the network z(n)
is the concatenation of the vectors y(n) and s(n) and the bias input (1 + ı + ȷ + κ), and
is given by
$$\mathbf{z}(n) = [s(n-1), \ldots, s(n-p),\ 1+\imath+\jmath+\kappa,\ y_1(n-1), \ldots, y_N(n-1)]^T, \qquad z_l(n) = z_l^a + z_l^b\imath + z_l^c\jmath + z_l^d\kappa \tag{5.6}$$
where z^a_l, z^b_l, z^c_l and z^d_l are the real-valued input components corresponding to the lth
element of the input vector z(n).
The quaternion-valued weight matrix of the network is denoted by W, where for the
lth neuron, wl = [wl,1, . . . , wl,p+N+1]T. In the following subsections, only the
output of the first neuron (recurrent perceptron) y1(n) is considered, resulting in the
cost function
$$E(n) = (e_1^a(n))^2 + (e_1^b(n))^2 + (e_1^c(n))^2 + (e_1^d(n))^2 \tag{5.7}$$
$$\phantom{E(n)} = e_1(n)e_1^*(n) \tag{5.8}$$
where the error e1(n) = d(n) − y1(n), with d(n) being the desired signal. The terms e^a_1, e^b_1,
e^c_1 and e^d_1 denote the error components in the real part, ı part, ȷ part and κ part respectively.
This terminology is used throughout this chapter.

Figure 5.1: A fully connected recurrent neural network (FCRNN).
5.3.1 Derivation of the Split Quaternion-valued RTRL
The split Quaternion-Valued Real-Time Recurrent Learning (Split QRTRL) algorithm for
the FCRNN utilises the split-quaternion function, whose output at the lth neuron yl(n) is
given by
$$y_l(n) = \Phi_s\big(\mathbf{w}_l^T(n)\mathbf{z}(n)\big) = \Phi_a\big(\mathrm{net}_l^a(n)\big) + \Phi_b\big(\mathrm{net}_l^b(n)\big)\imath + \Phi_c\big(\mathrm{net}_l^c(n)\big)\jmath + \Phi_d\big(\mathrm{net}_l^d(n)\big)\kappa \tag{5.9}$$
where Φs(·) denotes the “split” quaternion nonlinearity, Φa is a real-valued nonlinear
activation function applied to the real part of netl, Φb to the ı part, Φc to the ȷ part and
Φd to the κ part. The terms net^a_l, net^b_l, net^c_l and net^d_l are given by
$$\begin{aligned}
\mathrm{net}_l^a(n) &= \Re\{\mathbf{w}_l^T(n)\mathbf{z}(n)\}; & \mathrm{net}_l^b(n) &= \Im_{\imath}\{\mathbf{w}_l^T(n)\mathbf{z}(n)\} \\
\mathrm{net}_l^c(n) &= \Im_{\jmath}\{\mathbf{w}_l^T(n)\mathbf{z}(n)\}; & \mathrm{net}_l^d(n) &= \Im_{\kappa}\{\mathbf{w}_l^T(n)\mathbf{z}(n)\}
\end{aligned} \tag{5.10}$$
where the symbols ℜ(·), ℑı(·), ℑȷ(·) and ℑκ(·) correspond to the real, ı, ȷ and κ components
respectively. The full expansion of the terms is given in Appendix H.
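The Split QRTRL then minimises the cost function (5.7) through a gradient descent
weight update specified by ws,t(n + 1) = ws,t(n) − µ∇ws,tE(n), where µ is the real-valued
learning rate and the gradient ∇ws,tE(n) is given by
$$\nabla_{w_{s,t}}E(n) = \frac{\partial E(n)}{\partial w^a_{s,t}(n)} + \frac{\partial E(n)}{\partial w^b_{s,t}(n)}\imath + \frac{\partial E(n)}{\partial w^c_{s,t}(n)}\jmath + \frac{\partial E(n)}{\partial w^d_{s,t}(n)}\kappa \tag{5.11}$$
Expanding the term ∂E/∂w^a_{s,t} in (5.11) gives
$$\frac{\partial E(n)}{\partial w^a_{s,t}(n)} = -e_1^a(n)\frac{\partial y_l^a(n)}{\partial w^a_{s,t}(n)} - e_1^b(n)\frac{\partial y_l^b(n)}{\partial w^a_{s,t}(n)} - e_1^c(n)\frac{\partial y_l^c(n)}{\partial w^a_{s,t}(n)} - e_1^d(n)\frac{\partial y_l^d(n)}{\partial w^a_{s,t}(n)} \tag{5.12}$$
where the terms ∂y^a_l/∂w^a_{s,t}, ∂y^b_l/∂w^a_{s,t}, ∂y^c_l/∂w^a_{s,t} and ∂y^d_l/∂w^a_{s,t} represent the
real-valued sensitivities of the network.
For convenience, the sensitivity terms in (5.12) are denoted by Ψ^{l,(ηa)}_{s,t} = ∂y^η_l/∂w^a_{s,t}, where
η ∈ {a, b, c, d}, resulting in
$$\frac{\partial E(n)}{\partial w^a_{s,t}(n)} = -e_1^a(n)\Psi^{l,(aa)}_{s,t}(n) - e_1^b(n)\Psi^{l,(ba)}_{s,t}(n) - e_1^c(n)\Psi^{l,(ca)}_{s,t}(n) - e_1^d(n)\Psi^{l,(da)}_{s,t}(n) \tag{5.13}$$
In order to make further calculations feasible, a small stepsize is assumed so that [52,73]
$$\begin{aligned}
\mathbf{w}(n) &\approx \mathbf{w}(n-1) \approx \cdots \approx \mathbf{w}(n-M) \\
\frac{\partial y(n)}{\partial \mathbf{w}(n)} &\approx \frac{\partial y(n)}{\partial \mathbf{w}(n-1)} \approx \cdots \approx \frac{\partial y(n)}{\partial \mathbf{w}(n-M)}
\end{aligned} \tag{5.14}$$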
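The sensitivity Ψ^{l,(aa)}_{s,t} is first calculated by differentiating y^a_l with respect to w^a_{s,t} and
applying the assumptions in (5.14) to yield
$$\begin{aligned}
\Psi^{l,(aa)}_{s,t}(n) &= \frac{\partial y_l^a(n)}{\partial \mathrm{net}_l^a(n)}\,\frac{\partial \mathrm{net}_l^a(n)}{\partial w^a_{s,t}(n)} \\
&= \Phi_s'\big(\mathrm{net}_l^a(n)\big)\Bigg(\delta_{sl}z_l^a(n) + \sum_{q=1}^{N}\frac{\partial y_l(n-1)}{\partial w^a_{s,t}(n)}\Bigg) \\
&= \Phi_s'\big(\mathrm{net}_l^a(n)\big)\Bigg(\delta_{sl}z_l^a(n) + \sum_{q=1}^{N}\Big[w^a_{l,p+1+q}(n)\Psi^{q,(aa)}_{s,t}(n-1) - w^b_{l,p+1+q}(n)\Psi^{q,(ba)}_{s,t}(n-1) \\
&\qquad\qquad - w^c_{l,p+1+q}(n)\Psi^{q,(ca)}_{s,t}(n-1) - w^d_{l,p+1+q}(n)\Psi^{q,(da)}_{s,t}(n-1)\Big]\Bigg)
\end{aligned} \tag{5.15}$$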
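The other 15 sensitivities are derived in a similar manner (the derivation is given in
Appendix H). Following a similar approach to [11], the compact solution is then obtained
by grouping these 16 sensitivity terms together to yield
$$\boldsymbol{\Psi}^l_{s,t}(n) = \boldsymbol{\Phi}_s'(n)\Bigg(\sum_{q=1}^{N}\mathbf{W}(n)\boldsymbol{\Psi}^q_{s,t}(n-1) + \delta_{sl}\mathbf{z}_{\mathrm{split}}(n)\Bigg) \tag{5.16}$$
where δsl is the Kronecker delta. The real-valued matrices are given by (the
time index n has been dropped due to space restrictions)
$$\boldsymbol{\Psi}^l_{s,t} = \begin{bmatrix}
\Psi^{l,(aa)}_{s,t} & \Psi^{l,(ab)}_{s,t} & \Psi^{l,(ac)}_{s,t} & \Psi^{l,(ad)}_{s,t} \\
\Psi^{l,(ba)}_{s,t} & \Psi^{l,(bb)}_{s,t} & \Psi^{l,(bc)}_{s,t} & \Psi^{l,(bd)}_{s,t} \\
\Psi^{l,(ca)}_{s,t} & \Psi^{l,(cb)}_{s,t} & \Psi^{l,(cc)}_{s,t} & \Psi^{l,(cd)}_{s,t} \\
\Psi^{l,(da)}_{s,t} & \Psi^{l,(db)}_{s,t} & \Psi^{l,(dc)}_{s,t} & \Psi^{l,(dd)}_{s,t}
\end{bmatrix}, \qquad
\boldsymbol{\Phi}_s' = \begin{bmatrix}
\Phi_a'(\mathrm{net}_l^a) & 0 & 0 & 0 \\
0 & \Phi_b'(\mathrm{net}_l^b) & 0 & 0 \\
0 & 0 & \Phi_c'(\mathrm{net}_l^c) & 0 \\
0 & 0 & 0 & \Phi_d'(\mathrm{net}_l^d)
\end{bmatrix} \tag{5.17}$$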
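$$\mathbf{W} = \begin{bmatrix}
w^a_{l,p+1+q} & -w^b_{l,p+1+q} & -w^c_{l,p+1+q} & -w^d_{l,p+1+q} \\
w^a_{l,p+1+q} & w^b_{l,p+1+q} & w^c_{l,p+1+q} & -w^d_{l,p+1+q} \\
w^a_{l,p+1+q} & -w^b_{l,p+1+q} & w^c_{l,p+1+q} & w^d_{l,p+1+q} \\
w^a_{l,p+1+q} & w^b_{l,p+1+q} & -w^c_{l,p+1+q} & w^d_{l,p+1+q}
\end{bmatrix}, \qquad
\mathbf{z}_{\mathrm{split}} = \begin{bmatrix}
z_l^a & z_l^b & z_l^c & z_l^d \\
-z_l^b & z_l^a & -z_l^d & z_l^c \\
-z_l^c & z_l^d & z_l^a & -z_l^b \\
-z_l^d & -z_l^c & z_l^b & z_l^a
\end{bmatrix} \tag{5.18}$$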
5.3.2 Derivation of the Quaternion-Valued RTRL
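For the fully Quaternion-Valued RTRL, the output yl(n) is given by
$$y_l(n) = \Phi\big(\mathbf{w}_l^T(n)\mathbf{z}(n)\big) = \Phi\big(\mathrm{net}_l(n)\big) \tag{5.19}$$
Based on the cost function (5.8), the gradient ∇wE(n) of QRTRL can be expressed as
$$\nabla_{\mathbf{w}}E(n) = e_1(n)\nabla_{\mathbf{w}}e_1^*(n) + \nabla_{\mathbf{w}}e_1(n)\,e_1^*(n) = -e_1(n)\boldsymbol{\Upsilon}(n) - \boldsymbol{\Psi}(n)e_1^*(n) \tag{5.20}$$
where Υ(n) and Ψ(n) are respectively the conjugate sensitivities and the sensitivities, defined
by
$$\boldsymbol{\Psi}(n) = \Bigg[\frac{\partial y_1(n)}{\partial w_{1,1}(n)}, \cdots, \frac{\partial y_l(n)}{\partial w_{N,N+p+1}(n)}\Bigg]; \qquad
\boldsymbol{\Upsilon}(n) = \Bigg[\frac{\partial y_1^*(n)}{\partial w_{1,1}(n)}, \cdots, \frac{\partial y_l^*(n)}{\partial w_{N,N+p+1}(n)}\Bigg] \tag{5.21}$$
The sensitivity Ψ^l_{s,t} is calculated by differentiating yl in (5.19) with respect to ws,t, resulting
in
$$\Psi^l_{s,t}(n) = \frac{\partial y_l(n)}{\partial w^a_{s,t}(n)} + \frac{\partial y_l(n)}{\partial w^b_{s,t}(n)}\imath + \frac{\partial y_l(n)}{\partial w^c_{s,t}(n)}\jmath + \frac{\partial y_l(n)}{\partial w^d_{s,t}(n)}\kappa \tag{5.22}$$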
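To find the term ∂yl/∂w^a_{s,t} in (5.22), differentiate yl with respect to w^a_{s,t} to yield
$$\begin{aligned}
\frac{\partial y_l(n)}{\partial w^a_{s,t}(n)} &= \frac{\partial y_l(n)}{\partial \mathrm{net}_l(n)}\,\frac{\partial \mathrm{net}_l(n)}{\partial w^a_{s,t}(n)} \\
&= \Phi'\big(\mathrm{net}_l(n)\big)\Bigg(\delta_{sl}\big(z_l^a(n) + z_l^b(n)\imath + z_l^c(n)\jmath + z_l^d(n)\kappa\big) + \sum_{q=1}^{N} w_{s,t}(n)\frac{\partial y_q(n-1)}{\partial w^a_{s,t}(n)}\Bigg)
\end{aligned} \tag{5.23}$$
The terms ∂yl/∂w^b_{s,t}, ∂yl/∂w^c_{s,t} and ∂yl/∂w^d_{s,t} can be found in the
same manner as ∂yl/∂w^a_{s,t}. From Appendix I, the final expression for the sensitivity Ψ^l_{s,t}(n) is given by
$$\Psi^l_{s,t}(n) = \Phi'\big(\mathrm{net}_l(n)\big)\Bigg(-2\delta_{sl}z_l^*(n) + \sum_{q=1}^{N} w_{s,t}(n)\Psi^q_{s,t}(n-1)\Bigg) \tag{5.24}$$
Similarly, the expression for the conjugate sensitivity Υ^l_{s,t} becomes
$$\Upsilon^l_{s,t}(n) = \Phi'^*\big(\mathrm{net}_l(n)\big)\Bigg(4\delta_{sl}z_l^*(n) + \sum_{q=1}^{N} \Upsilon^q_{s,t}(n-1)\,w^*_{s,t}(n)\Bigg) \tag{5.25}$$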
It is clear that only two quaternion-valued sensitivities Υ and Ψ in (5.24) - (5.25) are
needed to govern the system in contrast with the 16 real-valued sensitivities in the split-
quaternion case shown in (5.18), which results in a reduced computational complexity.
5.4 Simulations
The tanh(q) was chosen as the nonlinear activation function and initial values for Ψ(n)
and Υ(n) were set to zero for both algorithms. The algorithms had the input tap length
of p = 3 and output neurons of N = 2; the performance was assessed in a predictive
setting. For simulation purposes, the three-dimensional Lorenz chaotic signal [52] and
three-dimensional real-world Tai Chi motion recorded from 3D inertial motion sensors
were considered.
Figure 5.2: Phase space of Lorenz signal (axes X, Y, Z). a) Phase space for Lorenz signal with M = 1; b) phase space for Lorenz signal with M = 10. Each row shows the Lorenz attractor, the split QRTRL reconstruction and the QRTRL reconstruction.
Dimension   Split QRTRL (M=1)   QRTRL (M=1)   Split QRTRL (M=10)   QRTRL (M=10)
X           0.884               0.935         0.360                0.636
Y           0.867               0.919         0.351                0.412
Z           0.588               0.719         0.001                0.034
Table 5.1: Correlation Coefficients Between Lorenz Attractors
5.4.1 Three-dimensional Lorenz Chaotic Signal
The Lorenz attractor is a three-dimensional system originally used to model atmospheric turbulence [74], and is now also used to model lasers, dynamos and waterwheels [75]. For this experiment, the learning rate was set to $\mu = 5 \times 10^{-4}$ for both algorithms.
The Lorenz attractor is governed by the coupled ordinary differential equations
\[
\frac{dx}{dt} = \alpha(y - x); \qquad \frac{dy}{dt} = x(\rho - z) - y; \qquad \frac{dz}{dt} = xy - \beta z \tag{5.26}
\]
where α, ρ, β > 0. The parameters of the Lorenz system were chosen as α = 10, ρ = 28 and β = 8/3. These values ensure the existence of the Lorenz attractor.
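As an illustration of how a test signal of this kind can be produced, the following minimal sketch integrates (5.26) with the parameter values stated above. The integration scheme (classical fourth-order Runge-Kutta), the step size `dt`, the initial state and the embedding of each sample as a pure quaternion are assumptions made for the sketch; the thesis does not specify them.

```python
import numpy as np

def lorenz_rhs(state, alpha=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of (5.26) for state = (x, y, z)."""
    x, y, z = state
    return np.array([alpha * (y - x), x * (rho - z) - y, x * y - beta * z])

def simulate_lorenz(n_steps, dt=0.01, state0=(1.0, 1.0, 1.0)):
    """Integrate (5.26) with fourth-order Runge-Kutta steps.

    dt and state0 are illustrative choices, not values from the thesis.
    """
    traj = np.empty((n_steps, 3))
    s = np.asarray(state0, dtype=float)
    for n in range(n_steps):
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj[n] = s
    return traj

def to_pure_quaternion(traj):
    """Embed each 3-D sample (x, y, z) as the pure quaternion (0, x, y, z)."""
    return np.hstack([np.zeros((traj.shape[0], 1)), traj])
```

In practice the trajectory would normally be rescaled before being passed to a filter with a bounded activation such as tanh; that normalisation is left out here.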
Figure 5.3: The performance of QRTRL, split QRTRL and RTRL on the prediction of motion data. The three panels plot the X-, Y- and Z-components against the number of iterations (n); curves: Actual, split QRTRL, QRTRL and RTRL.
Figure 5.2a shows the original Lorenz attractor and the reconstruction of the at-
tractor in the phase space for both algorithms at one step ahead (M = 1) prediction.
Although both algorithms were able to reconstruct the attractor, the QRTRL estimated
a more accurate replica of the attractor than the split QRTRL.
Figure 5.2b depicts the Lorenz attractor and the reconstructed attractor for both
algorithms for ten steps ahead prediction (M = 10). It is apparent that the output of
the QRTRL still resembles the original Lorenz attractor, therefore outperforming the split
QRTRL.
Table 5.1 shows the correlation coefficients between the original Lorenz attractor and the reconstructed attractor for the split QRTRL and QRTRL algorithms for M = 1 and M = 10. The larger values obtained by the QRTRL algorithm in all three dimensions, at both one and ten steps ahead, indicate that its reconstructed attractor was more similar to the original Lorenz attractor than the one produced by the split QRTRL. This supports the advantage of using a fully quaternion function over a split quaternion function.
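A per-dimension comparison of this kind can be sketched as follows. The sketch assumes the entries of Table 5.1 are Pearson correlation coefficients computed separately for X, Y and Z; the thesis does not state the estimator, and the function name is hypothetical.

```python
import numpy as np

def componentwise_correlation(original, reconstructed):
    """Pearson correlation between matching columns of two (n, 3) arrays.

    Assumption: one coefficient per dimension, as in the layout of Table 5.1.
    """
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return np.array([
        np.corrcoef(original[:, k], reconstructed[:, k])[0, 1]
        for k in range(original.shape[1])
    ])
```

A value close to one in every dimension indicates that the predicted trajectory follows the original one closely, while a value near zero (as for the Z-dimension at M = 10) indicates that the shape of the attractor has been lost.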
5.4.2 Motion Estimation
Five 3D gyroscopes were placed on the left arm, left hand, right arm, right hand and waist of an athlete performing Tai Chi movements, and 3D motion data were recorded using the Xsens MTx 3DOF Orientation Tracker. The movement of the left arm was used as a pure quaternion input for this simulation. For a fair comparison, the performance of three parallel real-valued FCRNNs trained with RTRL is also considered [52]. The learning rate was set to $\mu = 1 \times 10^{-3}$ for the quaternion-valued algorithms and $\mu = 1 \times 10^{-2}$ for the RTRL, since the latter performed poorly at smaller learning rates.
Figure 5.3 shows the componentwise performance of the one step ahead prediction (M = 1) of the Tai Chi motion using the split QRTRL, QRTRL and RTRL algorithms. It can be seen that the algorithms performed similarly in the X-component. However, the QRTRL performed better than the split QRTRL and RTRL in the Y- and Z-components. The performance of the split QRTRL and RTRL was similar in all three dimensions.
5.5 Summary
A class of quaternion-valued learning algorithms for recurrent neural networks based on the local gradient has been introduced. The superior performance of the fully quaternion algorithm (QRTRL) compared to the split-quaternion algorithm (split QRTRL) stems from the fact that QRTRL accounts for the interchannel couplings, whereas the split QRTRL does not. The componentwise channel processing performed by both the split QRTRL and RTRL explains their similar performance. Simulations on the chaotic Lorenz attractor and real-world three-dimensional motion data illustrate the advantages of the proposed approach. The same framework can be used to introduce any other nonlinear gradient based learning algorithm, and by removing the nonlinearity, learning algorithms for IIR filters are obtained.
Chapter 6
Identification of Improper
Quaternion Processes by
Fractional Tap-Length Algorithms
In the previous chapter, it was established that locally analytic functions are suitable for the training of quaternion-valued recurrent neural networks (RNN). Owing to its inherent ability to better capture the cross-correlations between dimensions, the algorithm based on the fully quaternion function outperformed its split quaternion counterpart.
This chapter aims to extend the fractional tap-length (FT) adaptive filtering paradigm from the real to the quaternion domain, enabling data-adaptive optimal modelling and identification. This is achieved by combining the FT length optimisation with the recently introduced strictly linear and widely linear quaternion-valued adaptive filtering algorithms, the Quaternion Least Mean Square (QLMS) and Widely Linear Quaternion Least Mean Square (WL-QLMS). A collaborative combination of QLMS and WL-QLMS (CC-QLMS) is shown both to identify the type of process (second-order circular or noncircular) and to track its optimal parameters. Further insight into these algorithms is provided by establishing a relationship between the steady-state error and the tap-length. This is supported by simulations on model order selection and identification of second-order circular (proper) and noncircular (improper) quaternion-valued systems.
6.1 Introduction
A convenient and rigorous method to identify the model order of a quaternion-valued system is to combine quaternion-valued adaptive filters with a variable tap-length algorithm that searches for the optimal filter length [28, 76]. The variable tap-length algorithm considered in this work is the fractional tap-length (FT) algorithm, chosen for its simplicity and robustness [28]. The FT algorithm was designed specifically for real-valued filters and was recently extended to widely linear complex-valued filters [29].
To this end, the FT algorithm is extended to the quaternion domain by considering the second-order augmented quaternion statistics of the signal. The quaternion-valued algorithms considered are the recently introduced Quaternion Least Mean Square (QLMS) [24] and Widely Linear Quaternion Least Mean Square (WL-QLMS) [45] algorithms, which, when combined, provide the necessary tools to identify the model order of a general quaternion-valued system. The WL-QLMS is based on the widely linear model, which has the ability to capture the full second-order statistics of the quaternion signal, characterised by the standard covariance matrix Cq and three complementary covariance matrices termed the ı-covariance Cqı, ȷ-covariance Cqȷ and κ-covariance Cqκ [42, 43]. The collaborative combination of QLMS and WL-QLMS (CC-QLMS) provides a more flexible tool for modelling the generality of quaternion-valued systems. Furthermore, the evolution of the convex mixing parameter illustrates the degree of properness of a given quaternion-valued system.
This chapter is organised as follows. Section 6.2 describes the workings of the proposed model order identification algorithms. This is followed by the steady-state analysis in Section 6.3. In Section 6.4, simulations supporting the proposed approach are presented. The chapter concludes in Section 6.5.
6.2 Model Order Identification
The proposed algorithms, FT-QLMS, FT-WLQLMS and FT-CCQLMS, comprise two parts: the finite impulse response (FIR) filter weight update, which optimises the adaptive weight coefficients, followed by the fractional tap-length (FT) algorithm, which adapts the tap-length of the filter towards its optimal value. The filter weight algorithms are reviewed first, followed by an illustration of how the FT algorithm can be exploited within quaternion-valued adaptive systems.
6.2.1 Filter Weight Updates
The quaternion-valued filter weight algorithms are based on optimising a real-valued cost function of quaternion variables, given by
\[
E(n) = e_a^2(n) + e_b^2(n) + e_c^2(n) + e_d^2(n) = e(n)e^{*}(n) \tag{6.1}
\]
where the error $e(n) = d(n) - y(n)$, with $d(n)$ and $y(n)$ denoting respectively the desired signal and the output signal. The terms $e_a(n)$, $e_b(n)$, $e_c(n)$ and $e_d(n)$ denote respectively the error components in the real part, ı part, ȷ part and κ part.
The QLMS is based on gradient descent and is described by [24] (the derivation is provided in Appendix A)
\[
\begin{aligned}
e_l(n) &= d(n) - y_l(n) \\
y_l(n) &= \mathbf{w}^{T}(n)\mathbf{x}(n) \\
\mathbf{w}(n+1) &= \mathbf{w}(n) + \mu\big(2e_l(n)\mathbf{x}^{*}(n) - \mathbf{x}^{*}(n)e_l^{*}(n)\big)
\end{aligned} \tag{6.2}
\]
where $\mathbf{w}(n)$ is the weight vector, $\mathbf{x}(n)$ is the filter input, $e_l(n)$ is the QLMS error, $y_l(n)$ is the QLMS output, the symbol $(\cdot)^{*}$ denotes the quaternion conjugate operator, and $\mu$ is a real-valued learning rate.
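The recursion in (6.2) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the implementation used for the simulations: quaternions are stored as length-4 real arrays (a, b, c, d), the helper names `qmul`, `qconj`, `qlms_step` and `identify` are hypothetical, and the noise-free identification loop with white Gaussian input is an assumption chosen only to exercise the update.

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions stored as [..., 4] arrays (a, b, c, d)."""
    a0, a1, a2, a3 = np.moveaxis(np.asarray(a, dtype=float), -1, 0)
    b0, b1, b2, b3 = np.moveaxis(np.asarray(b, dtype=float), -1, 0)
    return np.stack([
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
        a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
        a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0,
    ], axis=-1)

def qconj(q):
    """Quaternion conjugate: negate the three imaginary parts."""
    return np.asarray(q, dtype=float) * np.array([1.0, -1.0, -1.0, -1.0])

def qlms_step(w, x, d, mu):
    """One iteration of (6.2); w and x have shape (p, 4), d has shape (4,)."""
    y = qmul(w, x).sum(axis=0)            # y_l(n) = w^T(n) x(n)
    e = d - y                             # e_l(n) = d(n) - y_l(n)
    xc = qconj(x)
    w_next = w + mu * (2.0 * qmul(e, xc) - qmul(xc, qconj(e)))
    return w_next, y, e

def identify(w_true, n_iter=3000, mu=0.01, seed=0):
    """Noise-free system identification driven by white Gaussian input."""
    rng = np.random.default_rng(seed)
    w = np.zeros_like(w_true)
    for _ in range(n_iter):
        x = rng.standard_normal(w_true.shape)
        d = qmul(w_true, x).sum(axis=0)
        w, _, _ = qlms_step(w, x, d, mu)
    return w
```

Because the quaternion product is non-commutative, the order of the factors in `qmul(e, xc)` and `qmul(xc, qconj(e))` matters and follows (6.2) term by term.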
The WL-QLMS utilises the widely linear model and is given by [45] (the derivation is similar to that of the Augmented Quaternion Nonlinear Gradient Descent (AQNGD) algorithm in Section 4.3.2)
\[
\begin{aligned}
e_w(n) &= d(n) - y_w(n) \\
y_w(n) &= \mathbf{w}^{aT}(n)\mathbf{x}^{a}(n) \\
\mathbf{w}^{a}(n+1) &= \mathbf{w}^{a}(n) + \mu\big(2e_w(n)\mathbf{x}^{a*}(n) - \mathbf{x}^{a*}(n)e_w^{*}(n)\big)
\end{aligned} \tag{6.3}
\]
where $e_w(n)$ is the WL-QLMS error, $y_w(n)$ is the WL-QLMS output, $\mathbf{w}^{a}(n)$ is the augmented weight vector and $\mathbf{x}^{a}(n)$ is the augmented filter input.
Figure 6.1: Hybrid filter structure.
The collaborative filter shown in Figure 6.1 consists of two independent subfilters sharing the common filter input $\mathbf{x}(n)$ and desired signal $d(n)$. Similar to [77], the convex combination of the outputs of the QLMS and WL-QLMS (CC-QLMS) forms the overall output $y_{cc}(n)$ given by
\[
y_{cc}(n) = \lambda(n)y_l(n) + \big(1 - \lambda(n)\big)y_w(n) \tag{6.4}
\]
where $\lambda(n)$ is the real-valued convex mixing parameter. The update of the convex mixing parameter $\lambda(n)$ is governed by
\[
\lambda(n+1) = \lambda(n) - \mu_\lambda \nabla_\lambda E(n) \tag{6.5}
\]
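where $\mu_\lambda$ and $\nabla_\lambda E(n)$ represent the real-valued stepsize and the error gradient.
Since $e_{cc}(n) = d(n) - y_{cc}(n)$, it follows from (6.4) that $\partial e_{cc}(n)/\partial\lambda(n) = -\big(y_l(n) - y_w(n)\big)$, and the error gradient $\nabla_\lambda E(n)$ can be evaluated as
\[
\begin{aligned}
\nabla_\lambda E(n) &= e_{cc}(n)\frac{\partial e_{cc}^{*}(n)}{\partial \lambda(n)} + \frac{\partial e_{cc}(n)}{\partial \lambda(n)}e_{cc}^{*}(n) \\
&= -e_{cc}(n)\big(y_l(n) - y_w(n)\big)^{*} - \big(y_l(n) - y_w(n)\big)e_{cc}^{*}(n) \\
&= -2\Re\big\{e_{cc}(n)\big(y_l(n) - y_w(n)\big)^{*}\big\}
\end{aligned} \tag{6.6}
\]
where $e_{cc}(n)$ is the error of the CC-QLMS algorithm and $\Re\{\cdot\}$ denotes the scalar (real) part of a quaternion.
This yields the final update of the convex mixing parameter $\lambda(n)$ in the form
\[
\lambda(n+1) = \lambda(n) + \mu_\lambda \Re\big\{e_{cc}(n)\big(y_l(n) - y_w(n)\big)^{*}\big\} \tag{6.7}
\]
where the factor 2 is absorbed into the learning rate $\mu_\lambda$.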
Due to the convex nature of the CC-QLMS, and given that the mixing parameter λ(n) is kept within [0, 1], the CC-QLMS converges as long as one of the subfilters converges [78]. The value of λ(n) is hard-limited: whenever an update gives λ > 1 or λ < 0, it is clipped to the nearest bound.
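A single step of the convex combination can be sketched as below. The sketch differentiates (6.4) directly, which gives ∂e_cc/∂λ = −(y_l − y_w), so a descent step on E(n) moves λ along +R{e_cc (y_l − y_w)*}; this sign convention, the function name `cc_mix_step` and the use of `np.clip` for the hard bound are assumptions of the sketch. It also uses the identity R{a b*} = a·b (the Euclidean inner product of the two quaternions viewed as 4-vectors).

```python
import numpy as np

def cc_mix_step(lam, y_l, y_w, d, mu_lam):
    """Combine two subfilter outputs and update the mixing parameter.

    y_l, y_w and d are quaternions stored as length-4 arrays (a, b, c, d).
    Returns the clipped lambda(n+1), the combined output and its error.
    """
    y_l = np.asarray(y_l, dtype=float)
    y_w = np.asarray(y_w, dtype=float)
    d = np.asarray(d, dtype=float)
    y_cc = lam * y_l + (1.0 - lam) * y_w           # convex combination (6.4)
    e_cc = d - y_cc
    real_part = float(np.dot(e_cc, y_l - y_w))     # R{e_cc (y_l - y_w)^*}
    lam_next = lam + mu_lam * real_part            # descent step on E(n)
    lam_next = float(np.clip(lam_next, 0.0, 1.0))  # hard bound to [0, 1]
    return lam_next, y_cc, e_cc
```

When the desired signal coincides with the QLMS output the step pushes λ towards one, and when it coincides with the WL-QLMS output the step pushes λ towards zero, which is the behaviour expected of the mixing parameter in a convex combination.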
6.2.2 Tap Length Adaptation
The tap-length adaptation is governed by the FT algorithm given by [28]
\[
\eta_f(n+1) = \big(\eta_f(n) - \alpha\big) - \gamma\Big[E^{(p)}_{p}(n) - E^{(p)}_{p-\Delta}(n)\Big] \tag{6.8}
\]
where $\eta_f$ is the pseudo fractional tap-length, which can take only positive real values, and $\alpha$ and $\gamma$ are the leakage factor and the tap-length learning rate, both small positive real values that satisfy $\alpha \ll \gamma$. The symbols $E^{(p)}_{p}(n)$ and $E^{(p)}_{p-\Delta}(n)$ denote respectively the instantaneous squared errors for the tap-lengths $p$ and $p-\Delta$, the symbol $p$ denotes the "true" tap-length at discrete time instant $n$, and $\Delta$ is a positive integer such that $\min\{p(n) - \Delta\} > 0$.
The instantaneous squared output errors for filters of lengths $p$ and $p-\Delta$ are given by
\[
E^{(p)}_{p}(n) = \big(e^{(p)}_{p}(n)\big)\big(e^{(p)}_{p}(n)\big)^{*}; \qquad E^{(p)}_{p-\Delta}(n) = \big(e^{(p)}_{p-\Delta}(n)\big)\big(e^{(p)}_{p-\Delta}(n)\big)^{*} \tag{6.9}
\]
based on the errors $e^{(p)}_{p}(n)$ and $e^{(p)}_{p-\Delta}(n)$.
These errors can be shown to be
\[
e^{(p)}_{q}(n) = d(n) - y^{(p)}_{q}(n) = d(n) - \mathbf{w}^{(p)T}_{q}(n)\mathbf{x}^{(p)}_{q}(n) \tag{6.10}
\]
where $1 \le q \le p$, while $\mathbf{w}^{(p)}_{q}(n)$ and $\mathbf{x}^{(p)}_{q}(n)$ are vectors consisting of the first $q$ coefficients of $\mathbf{w}^{(p)}(n)$ and $\mathbf{x}^{(p)}(n)$, respectively.
To calculate the optimal tap length, the tap-length parameter $p(n)$ is made adaptive according to [28]
\[
p(n+1) = \begin{cases} \lfloor \eta_f(n) \rfloor, & |p(n) - \eta_f(n)| \ge \delta \\ p(n), & \text{otherwise} \end{cases} \tag{6.11}
\]
where $\delta$ is a predefined integer threshold and $\lfloor\cdot\rfloor$ denotes the floor operator.
The operations of the proposed algorithms are summarised in Algorithm 1.
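Algorithm 1
Filter Weight Algorithms
Initialisation Values: $\lambda(0) = 0.5$, $\eta_f(0) = p(0)$
CC-QLMS, QLMS: $\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\big(2e_l(n)\mathbf{x}^{*}(n) - \mathbf{x}^{*}(n)e_l^{*}(n)\big)$
CC-QLMS, WL-QLMS: $\mathbf{w}^{a}(n+1) = \mathbf{w}^{a}(n) + \mu\big(2e_w(n)\mathbf{x}^{a*}(n) - \mathbf{x}^{a*}(n)e_w^{*}(n)\big)$
CC-QLMS: $\lambda(n+1) = \lambda(n) + \mu_\lambda \Re\big\{e_{cc}(n)\big(y_l(n) - y_w(n)\big)^{*}\big\}$
Fractional Tap-Length Algorithm
$\eta_f(n+1) = \big(\eta_f(n) - \alpha\big) - \gamma\big[E^{(p)}_{p}(n) - E^{(p)}_{p-\Delta}(n)\big]$
$p(n+1) = \lfloor \eta_f(n) \rfloor$ if $|p(n) - \eta_f(n)| \ge \delta$; $p(n+1) = p(n)$ otherwise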
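The tap-length recursion in (6.8) and (6.11) can be sketched as a single function. The default values of `alpha`, `gamma` and `delta` are those quoted in Section 6.4.2; the function name `ft_step` and its argument names are hypothetical, and the squared errors of (6.9) are assumed to have been computed elsewhere.

```python
import math

def ft_step(eta_f, p, err_p_sq, err_pd_sq, alpha=0.03, gamma=1.0, delta=1):
    """One fractional tap-length update.

    eta_f     : pseudo fractional tap-length eta_f(n)
    p         : current integer tap-length p(n)
    err_p_sq  : instantaneous squared error of the length-p filter
    err_pd_sq : instantaneous squared error of the length-(p - Delta) filter
    """
    # (6.8): leak eta_f slightly, then grow it when the longer filter
    # has the smaller squared error.
    eta_next = (eta_f - alpha) - gamma * (err_p_sq - err_pd_sq)
    # (6.11): update the integer length only once eta_f(n) has drifted
    # at least delta away from it.
    p_next = math.floor(eta_f) if abs(p - eta_f) >= delta else p
    return eta_next, p_next
```

For example, starting from η_f = p = 10 with squared errors 0.5 (length p) and 2.0 (length p − Δ), the first call raises η_f to 11.47 while p stays at 10; on the following call the gap exceeds δ = 1 and p becomes 11.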
6.3 Steady-State Analysis of FT Based Algorithms
This section provides a steady-state analysis of the FT-QLMS, FT-WLQLMS and FT-CCQLMS algorithms for two models of teaching signals: linear and widely linear. First consider the case of a widely linear teaching signal and the FT-WLQLMS algorithm. The desired (teaching) signal $d(n)$ is defined as
\[
d(n) = \mathbf{g}^{oT}_{L_{opt}}\mathbf{x}_{L_{opt}}(n) + \mathbf{h}^{oT}_{L_{opt}}\mathbf{x}^{\imath}_{L_{opt}}(n) + \mathbf{u}^{oT}_{L_{opt}}\mathbf{x}^{\jmath}_{L_{opt}}(n) + \mathbf{v}^{oT}_{L_{opt}}\mathbf{x}^{\kappa}_{L_{opt}}(n) + v(n) \tag{6.12}
\]
where $\mathbf{g}^{o}_{L_{opt}}$, $\mathbf{h}^{o}_{L_{opt}}$, $\mathbf{u}^{o}_{L_{opt}}$ and $\mathbf{v}^{o}_{L_{opt}}$ are the optimal weight coefficients of the optimal tap lengths of the widely linear model, and $v(n)$ is an $\mathbb{H}$-circular quaternion white Gaussian noise. The symbols $(\cdot)^{\imath}$, $(\cdot)^{\jmath}$ and $(\cdot)^{\kappa}$ denote the ı, ȷ and κ involutions respectively.
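The output of the FT-WLQLMS algorithm is given as
\[
y_w(n) = \underbrace{\mathbf{g}^{T}(n)\mathbf{x}(n)}_{\text{standard part}} + \underbrace{\mathbf{h}^{T}(n)\mathbf{x}^{\imath}(n) + \mathbf{u}^{T}(n)\mathbf{x}^{\jmath}(n) + \mathbf{v}^{T}(n)\mathbf{x}^{\kappa}(n)}_{\text{augmented part}} \tag{6.13}
\]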
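The widely linear output in (6.13) can be sketched as follows, using the standard definitions of the quaternion involutions, for example q^ı = −ıqı = a + bı − cȷ − dκ. Quaternions are stored as length-4 arrays (a, b, c, d); the helper names are hypothetical and the code is an illustrative sketch only.

```python
import numpy as np

# Sign patterns of the three involutions acting on (a, b, c, d).
_SIGNS = {
    "i": np.array([1.0, 1.0, -1.0, -1.0]),
    "j": np.array([1.0, -1.0, 1.0, -1.0]),
    "k": np.array([1.0, -1.0, -1.0, 1.0]),
}

def involution(q, axis):
    """Return q^i, q^j or q^k for quaternions stored as [..., 4] arrays."""
    return np.asarray(q, dtype=float) * _SIGNS[axis]

def qmul(a, b):
    """Hamilton product of quaternions stored as [..., 4] arrays."""
    a0, a1, a2, a3 = np.moveaxis(np.asarray(a, dtype=float), -1, 0)
    b0, b1, b2, b3 = np.moveaxis(np.asarray(b, dtype=float), -1, 0)
    return np.stack([
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
        a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
        a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0,
    ], axis=-1)

def widely_linear_output(g, h, u, v, x):
    """Evaluate (6.13); g, h, u, v and x all have shape (p, 4)."""
    standard = qmul(g, x)
    augmented = (qmul(h, involution(x, "i"))
                 + qmul(u, involution(x, "j"))
                 + qmul(v, involution(x, "k")))
    return (standard + augmented).sum(axis=0)
```

Setting `h`, `u` and `v` to zero recovers the strictly linear output of (6.2), which is the relationship exploited later when the augmented part is removed.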
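The output error $e(n) = d(n) - y(n)$ is expressed in terms of the optimal tap weights by subtracting (6.13) from (6.12), resulting in
\[
\begin{aligned}
e(n) ={}& \mathbf{g}^{oT}\mathbf{x}_{L_{opt}}(n) + \mathbf{h}^{oT}\mathbf{x}^{\imath}_{L_{opt}}(n) + \mathbf{u}^{oT}\mathbf{x}^{\jmath}_{L_{opt}}(n) + \mathbf{v}^{oT}\mathbf{x}^{\kappa}_{L_{opt}}(n) + v(n) \\
&- \mathbf{g}^{T}(n)\mathbf{x}(n) - \mathbf{h}^{T}(n)\mathbf{x}^{\imath}(n) - \mathbf{u}^{T}(n)\mathbf{x}^{\jmath}(n) - \mathbf{v}^{T}(n)\mathbf{x}^{\kappa}(n)
\end{aligned} \tag{6.14}
\]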
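Proceeding in a manner similar to the analysis in [79], the optimal coefficients of the weight vectors can be split into three parts
\[
\mathbf{g}^{o}_{L_{opt}} = \begin{bmatrix} \mathbf{g}'^{o} \\ \mathbf{g}''^{o} \\ \mathbf{g}'''^{o} \end{bmatrix} \quad
\mathbf{h}^{o}_{L_{opt}} = \begin{bmatrix} \mathbf{h}'^{o} \\ \mathbf{h}''^{o} \\ \mathbf{h}'''^{o} \end{bmatrix} \quad
\mathbf{u}^{o}_{L_{opt}} = \begin{bmatrix} \mathbf{u}'^{o} \\ \mathbf{u}''^{o} \\ \mathbf{u}'''^{o} \end{bmatrix} \quad
\mathbf{v}^{o}_{L_{opt}} = \begin{bmatrix} \mathbf{v}'^{o} \\ \mathbf{v}''^{o} \\ \mathbf{v}'''^{o} \end{bmatrix} \tag{6.15}
\]
where $\mathbf{g}'^{o}$, $\mathbf{h}'^{o}$, $\mathbf{u}'^{o}$, $\mathbf{v}'^{o}$ are the coefficients modelled by the taps $1 : p-\Delta$, $\mathbf{g}''^{o}$, $\mathbf{h}''^{o}$, $\mathbf{u}''^{o}$, $\mathbf{v}''^{o}$ are the coefficients modelled by the taps $p-\Delta+1 : p$, and $\mathbf{g}'''^{o}$, $\mathbf{h}'''^{o}$, $\mathbf{u}'''^{o}$, $\mathbf{v}'''^{o}$ are the undermodelled coefficients.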
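For convenience, the coefficient weight error vectors of the FT-WLQLMS are denoted as
\[
\mathbf{g}(n) = \mathbf{g}^{o} - \mathbf{g}_{p}(n); \quad \mathbf{h}(n) = \mathbf{h}^{o} - \mathbf{h}_{p}(n); \quad \mathbf{u}(n) = \mathbf{u}^{o} - \mathbf{u}_{p}(n); \quad \mathbf{v}(n) = \mathbf{v}^{o} - \mathbf{v}_{p}(n) \tag{6.16}
\]
where $\mathbf{g}_{p}(n)$, $\mathbf{h}_{p}(n)$, $\mathbf{u}_{p}(n)$ and $\mathbf{v}_{p}(n)$ are the FT-WLQLMS weight vectors of length $p$.
Similar to (6.15), the weight error vectors can also be split up into three parts
\[
\mathbf{g}(n) = \begin{bmatrix} \mathbf{g}'(n) \\ \mathbf{g}''(n) \\ \mathbf{g}'''(n) \end{bmatrix} \quad
\mathbf{h}(n) = \begin{bmatrix} \mathbf{h}'(n) \\ \mathbf{h}''(n) \\ \mathbf{h}'''(n) \end{bmatrix} \quad
\mathbf{u}(n) = \begin{bmatrix} \mathbf{u}'(n) \\ \mathbf{u}''(n) \\ \mathbf{u}'''(n) \end{bmatrix} \quad
\mathbf{v}(n) = \begin{bmatrix} \mathbf{v}'(n) \\ \mathbf{v}''(n) \\ \mathbf{v}'''(n) \end{bmatrix} \tag{6.17}
\]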
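The errors $e^{(p)}_{p}(n)$ and $e^{(p)}_{p-\Delta}(n)$ are rewritten as (the time index $n$ has been dropped due to space limitations)
\[
e^{(p)}_{p} =
\begin{bmatrix} \mathbf{g}' \\ \mathbf{g}'' \\ \mathbf{g}'''^{o} \end{bmatrix}^{T}\!
\begin{bmatrix} \mathbf{x}' \\ \mathbf{x}'' \\ \mathbf{x}''' \end{bmatrix}
+ \begin{bmatrix} \mathbf{h}' \\ \mathbf{h}'' \\ \mathbf{h}'''^{o} \end{bmatrix}^{T}\!
\begin{bmatrix} \mathbf{x}'^{\imath} \\ \mathbf{x}''^{\imath} \\ \mathbf{x}'''^{\imath} \end{bmatrix}
+ \begin{bmatrix} \mathbf{u}' \\ \mathbf{u}'' \\ \mathbf{u}'''^{o} \end{bmatrix}^{T}\!
\begin{bmatrix} \mathbf{x}'^{\jmath} \\ \mathbf{x}''^{\jmath} \\ \mathbf{x}'''^{\jmath} \end{bmatrix}
+ \begin{bmatrix} \mathbf{v}' \\ \mathbf{v}'' \\ \mathbf{v}'''^{o} \end{bmatrix}^{T}\!
\begin{bmatrix} \mathbf{x}'^{\kappa} \\ \mathbf{x}''^{\kappa} \\ \mathbf{x}'''^{\kappa} \end{bmatrix}
+ v \tag{6.18}
\]
\[
e^{(p)}_{p-\Delta} =
\begin{bmatrix} \mathbf{g}' \\ \mathbf{g}''^{o} \\ \mathbf{g}'''^{o} \end{bmatrix}^{T}\!
\begin{bmatrix} \mathbf{x}' \\ \mathbf{x}'' \\ \mathbf{x}''' \end{bmatrix}
+ \begin{bmatrix} \mathbf{h}' \\ \mathbf{h}''^{o} \\ \mathbf{h}'''^{o} \end{bmatrix}^{T}\!
\begin{bmatrix} \mathbf{x}'^{\imath} \\ \mathbf{x}''^{\imath} \\ \mathbf{x}'''^{\imath} \end{bmatrix}
+ \begin{bmatrix} \mathbf{u}' \\ \mathbf{u}''^{o} \\ \mathbf{u}'''^{o} \end{bmatrix}^{T}\!
\begin{bmatrix} \mathbf{x}'^{\jmath} \\ \mathbf{x}''^{\jmath} \\ \mathbf{x}'''^{\jmath} \end{bmatrix}
+ \begin{bmatrix} \mathbf{v}' \\ \mathbf{v}''^{o} \\ \mathbf{v}'''^{o} \end{bmatrix}^{T}\!
\begin{bmatrix} \mathbf{x}'^{\kappa} \\ \mathbf{x}''^{\kappa} \\ \mathbf{x}'''^{\kappa} \end{bmatrix}
+ v \tag{6.19}
\]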
In order to ensure mathematical tractability, the following assumptions are made [79]:
a) both the input signal $\mathbf{x}(n)$ and the noise $v(n)$ are i.i.d. zero mean white jointly Gaussian with the respective variances $\sigma_x^2$ and $\sigma_v^2$;
b) at the steady state, the input signal $\mathbf{x}(n)$ is independent of the weight vectors;
c) the tap-length parameter has converged at steady state, hence $E\{\eta_f(n+1)\} = E\{\eta_f(n)\}$, leading to the undermodelled error vectors vanishing.
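Applying the statistical expectation operator to the steady-state MSE in (6.8) yields
\[
E\Big\{E^{(p)}_{p}(n) - E^{(p)}_{p-\Delta}(n)\Big\} < \Big|\frac{\alpha}{\gamma}\Big| \tag{6.20}
\]
Following the definitions of $E^{(p)}_{p}$ and $E^{(p)}_{p-\Delta}$ in (6.9), expanding (6.20) gives
\[
\begin{aligned}
E\Big\{&\|\mathbf{g}''^{T}(n)\mathbf{x}''(n)\|_2^2 + \|\mathbf{h}''^{T}(n)\mathbf{x}''^{\imath}(n)\|_2^2 + \|\mathbf{u}''^{T}(n)\mathbf{x}''^{\jmath}(n)\|_2^2 + \|\mathbf{v}''^{T}(n)\mathbf{x}''^{\kappa}(n)\|_2^2 \\
&- \|\mathbf{g}''^{oT}(n)\mathbf{x}''(n)\|_2^2 - \|\mathbf{h}''^{oT}(n)\mathbf{x}''^{\imath}(n)\|_2^2 - \|\mathbf{u}''^{oT}(n)\mathbf{x}''^{\jmath}(n)\|_2^2 - \|\mathbf{v}''^{oT}(n)\mathbf{x}''^{\kappa}(n)\|_2^2\Big\} < \Big|\frac{\alpha}{\gamma}\Big|
\end{aligned} \tag{6.21}
\]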
Remark#1: The FT-WLQLMS incorporates the errors from both the standard and augmented parts of the quaternion widely linear model in adapting the tap-length, thus ensuring efficient modelling of widely linear quaternion-valued systems.
To obtain the steady state of the FT-QLMS algorithm, the augmented part in (6.13) is set to zero, which gives
\[
y_l(n) = \mathbf{w}^{T}(n)\mathbf{x}(n) \tag{6.22}
\]
Proceeding in a similar fashion to FT-WLQLMS, the final steady-state performance is shown to be
\[
E\Big\{\|\mathbf{w}''^{T}(n)\mathbf{x}''(n)\|_2^2 - \|\mathbf{g}''^{oT}(n)\mathbf{x}''(n)\|_2^2\Big\} < \Big|\frac{\alpha}{\gamma}\Big| \tag{6.23}
\]
Remark#2: The FT-QLMS considers only the error from the standard part of the quaternion widely linear model in adapting the tap-length, which is insufficient for the modelling of widely linear quaternion-valued systems.
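Next, the steady state of the FT-CCQLMS algorithm is derived. Consider the output of FT-CCQLMS given by
\[
y_{cc}(n) = \lambda(n)\mathbf{w}^{T}(n)\mathbf{x}(n) + \big(1 - \lambda(n)\big)\Big(\mathbf{g}^{T}(n)\mathbf{x}(n) + \mathbf{h}^{T}(n)\mathbf{x}^{\imath}(n) + \mathbf{u}^{T}(n)\mathbf{x}^{\jmath}(n) + \mathbf{v}^{T}(n)\mathbf{x}^{\kappa}(n)\Big) \tag{6.24}
\]
Following the same steps as for FT-WLQLMS and FT-QLMS, the final steady state is given by
\[
\begin{aligned}
E\Big\{&\big(1 - \lambda(n)\big)\Big(\|\mathbf{g}''^{T}(n)\mathbf{x}''(n)\|_2^2 + \|\mathbf{h}''^{T}(n)\mathbf{x}''^{\imath}(n)\|_2^2 + \|\mathbf{u}''^{T}(n)\mathbf{x}''^{\jmath}(n)\|_2^2 + \|\mathbf{v}''^{T}(n)\mathbf{x}''^{\kappa}(n)\|_2^2\Big) \\
&+ \lambda(n)\|\mathbf{w}''^{T}(n)\mathbf{x}''(n)\|_2^2 - \|\mathbf{g}''^{oT}(n)\mathbf{x}''(n)\|_2^2 - \|\mathbf{h}''^{oT}(n)\mathbf{x}''^{\imath}(n)\|_2^2 - \|\mathbf{u}''^{oT}(n)\mathbf{x}''^{\jmath}(n)\|_2^2 \\
&- \|\mathbf{v}''^{oT}(n)\mathbf{x}''^{\kappa}(n)\|_2^2\Big\} < \Big|\frac{\alpha}{\gamma}\Big|
\end{aligned} \tag{6.25}
\]
For optimal processing, λ → 0, which simplifies (6.25) to the steady state of FT-WLQLMS in (6.21).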
Remark#3: As λ → 0, the FT-CCQLMS performance will be similar to that of the FT-WLQLMS for the processing of widely linear systems.
Next, consider a linear model given by
\[
d(n) = \mathbf{w}^{oT}_{L_{opt}}\mathbf{x}_{L_{opt}}(n) + v(n) \tag{6.26}
\]
This results in similar steady-state expressions for FT-QLMS and FT-WLQLMS, given by
\[
\text{FT-QLMS}: \quad E\Big\{\|\mathbf{w}''^{T}(n)\mathbf{x}''(n)\|_2^2 - \|\mathbf{w}''^{oT}(n)\mathbf{x}''(n)\|_2^2\Big\} < \Big|\frac{\alpha}{\gamma}\Big| \tag{6.27}
\]
\[
\text{FT-WLQLMS}: \quad E\Big\{\|\mathbf{g}''^{T}(n)\mathbf{x}''(n)\|_2^2 - \|\mathbf{w}''^{oT}(n)\mathbf{x}''(n)\|_2^2\Big\} < \Big|\frac{\alpha}{\gamma}\Big| \tag{6.28}
\]
Remark#4: Both the FT-QLMS and FT-WLQLMS take into account the error from the standard part of the quaternion linear model in adapting their tap-lengths, which makes them suitable for the modelling of linear quaternion-valued systems.
Similarly, the steady-state expression for FT-CCQLMS becomes
\[
\begin{aligned}
E\Big\{&\lambda(n)\|\mathbf{w}''^{T}(n)\mathbf{x}''(n)\|_2^2 + \big(1 - \lambda(n)\big)\Big(\|\mathbf{g}''^{T}(n)\mathbf{x}''(n)\|_2^2 + \|\mathbf{h}''^{T}(n)\mathbf{x}''^{\imath}(n)\|_2^2 + \|\mathbf{u}''^{T}(n)\mathbf{x}''^{\jmath}(n)\|_2^2 \\
&+ \|\mathbf{v}''^{T}(n)\mathbf{x}''^{\kappa}(n)\|_2^2\Big) - \|\mathbf{w}''^{oT}(n)\mathbf{x}''(n)\|_2^2\Big\} < \Big|\frac{\alpha}{\gamma}\Big|
\end{aligned} \tag{6.29}
\]
For optimal processing of the linear model, λ → 1, resulting in an expression similar to the steady state of FT-QLMS.
Remark#5: The FT-CCQLMS will have similar expressions to the FT-QLMS for the modelling of strictly linear quaternion-valued systems when λ → 1.
Figure 6.2: The steady-state MSE for the processes W1 and W2 with respect to tap-length. a) The steady-state MSE for the process W1; b) the steady-state MSE for the process W2. Both panels plot steady-state MSE against tap length p for QLMS, WL-QLMS and CC-QLMS.
6.4 Simulations
Simulations were conducted in the system identification setting, and the performances of FT-QLMS, FT-WLQLMS and FT-CCQLMS were evaluated for a range of systems, with quaternion quadruply circular white Gaussian noise (QWGN) serving as the driving input, given by
\[
\varepsilon(n) = \varepsilon_a(n) + \varepsilon_b(n)\imath + \varepsilon_c(n)\jmath + \varepsilon_d(n)\kappa \tag{6.30}
\]
where $\varepsilon_a$, $\varepsilon_b$, $\varepsilon_c$ and $\varepsilon_d$ are realisations of real-valued white Gaussian noise (WGN).
The QWGN was first passed through a filter defined by $H(n) = 0.35\varepsilon(n) + \varepsilon(n-1) + 0.35\varepsilon(n-2)$ to illustrate a severe condition. The output of $H(n)$ was then fed to the systems defined by
\[
W_1(n) = 1.79W_1(n-1) - 1.85W_1(n-2) + 1.27W_1(n-3) - 0.41W_1(n-4) + \varepsilon(n) \tag{6.31}
\]
\[
\begin{aligned}
W_2(n) ={}& 1.79W_2(n-1) - 1.85W_2(n-2) + 1.27W_2(n-3) - 0.41W_2(n-4) + \varepsilon(n) \\
&+ 0.5\varepsilon^{*}(n) + 0.9\varepsilon^{*}(n-1)
\end{aligned} \tag{6.32}
\]
where $W_1$ is a linear AR(4) system [38] and $W_2$ is a widely linear AR(4) system [29]. System $W_2$ is constructed by combining $W_1$ with the augmented part of $W$ given by [80]
\[
W(n) = e^{\imath}W(n-1) + \varepsilon(n) + 0.5\varepsilon^{*}(n) + 0.9\varepsilon^{*}(n-1) \tag{6.33}
\]
where ı is the imaginary unit.
Figure 6.3: The evolution of the optimal filter length parameter p and mixing parameter λ for the modelling of the linear system W1. a) Modelling of the linear system W1 (tap length p against number of iterations n, for FT-QLMS, FT-WLQLMS and FT-CCQLMS); b) mixing parameter λ of the linear system W1 against number of iterations n.
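The test signals in (6.30) to (6.32) can be generated along the following lines. This is a sketch under stated assumptions: the ε(n) appearing in (6.31) and (6.32) is taken to be the output of the filter H, all initial conditions are zero, and each QWGN component has unit variance. The names `AR4`, `qconj` and `generate_systems` are hypothetical.

```python
import numpy as np

AR4 = (1.79, -1.85, 1.27, -0.41)   # feedback coefficients of (6.31)-(6.32)

def qconj(q):
    """Quaternion conjugate for quaternions stored as [..., 4] arrays."""
    return np.asarray(q, dtype=float) * np.array([1.0, -1.0, -1.0, -1.0])

def generate_systems(n_samples, seed=0):
    """Return (drive, W1, W2), each an (n_samples, 4) array."""
    rng = np.random.default_rng(seed)
    raw = rng.standard_normal((n_samples, 4))   # quadruply circular QWGN (6.30)
    # Colouring filter H(n) = 0.35 e(n) + e(n-1) + 0.35 e(n-2).
    drive = 0.35 * raw
    drive[1:] += raw[:-1]
    drive[2:] += 0.35 * raw[:-2]
    w1 = np.zeros_like(drive)
    w2 = np.zeros_like(drive)
    for n in range(n_samples):
        for k, a in enumerate(AR4, start=1):
            if n - k >= 0:
                w1[n] += a * w1[n - k]
                w2[n] += a * w2[n - k]
        w1[n] += drive[n]                              # linear system (6.31)
        w2[n] += drive[n] + 0.5 * qconj(drive[n])      # widely linear (6.32)
        if n >= 1:
            w2[n] += 0.9 * qconj(drive[n - 1])
    return drive, w1, w2
```

Because the AR coefficients are real, the feedback acts identically on all four quaternion components; the two systems differ only through the conjugate terms, which are what make W2 widely linear.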
Figure 6.4: The evolution of the optimal filter length parameter p and mixing parameter λ for the modelling of the widely linear system W2. a) Modelling of the widely linear system W2 (tap length p against number of iterations n, for FT-QLMS, FT-WLQLMS and FT-CCQLMS); b) mixing parameter λ of the widely linear system W2 against number of iterations n.
6.4.1 Optimal Tap-Length
The optimal tap-lengths for both systems were determined from the steady-state MSE, estimated by [28]
\[
\varepsilon(n) = \lambda_c\,\varepsilon(n-1) + (1 - \lambda_c)E(n) \tag{6.34}
\]
where $\varepsilon$ is the estimated steady-state MSE and $\lambda_c = 0.9$.
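The estimator in (6.34) is a first-order exponential smoother of the instantaneous squared error. A minimal sketch, with a zero initial estimate assumed (the thesis does not state the initial value) and a hypothetical function name:

```python
def smooth_mse(inst_sq_errors, lambda_c=0.9, init=0.0):
    """Running estimate of the steady-state MSE following (6.34).

    inst_sq_errors : iterable of instantaneous squared errors E(n)
    lambda_c       : smoothing factor (0.9 in the text)
    init           : estimate before the first sample (assumed zero)
    """
    estimate = init
    history = []
    for sq_err in inst_sq_errors:
        estimate = lambda_c * estimate + (1.0 - lambda_c) * sq_err
        history.append(estimate)
    return history
```

With λc = 0.9 the estimate has an effective memory of roughly ten samples, so the value read off at the end of a run reflects the most recent errors rather than the initial transient.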
Figure 6.2 depicts the steady-state MSE for both the linear system W1 and the widely linear system W2, using the QLMS, WL-QLMS and CC-QLMS algorithms with $\mu = 10^{-3}$. From Figure 6.2a, it can be seen that all three MSE curves are monotonically non-increasing functions of the tap-length, and as such the optimal tap-length for all algorithms was found to be $p_o = \{15, 16\}$. Figure 6.2b shows that the MSE curve for QLMS does not converge asymptotically, indicating, according to [28], the inability of the strictly linear QLMS to model the widely linear system W2. On the other hand, the MSE curves for the WL-QLMS and CC-QLMS converged, indicating their ability to model W2, for which the optimal tap-length was found to be $p_o = \{21, 22\}$. The optimal tap-lengths for both systems were not a single integer because feedforward filters were used, and these can only approximate the response of the autoregressive (AR) feedback system.
Figure 6.5: The steady-state MSE for the process linear noncircular W1 with respect to tap-length (steady-state MSE against tap length p for QLMS, WL-QLMS and CC-QLMS).
6.4.2 Modelling of Quaternion-Valued Systems
Figure 6.3 depicts the evolution of the optimal tap length parameter p for the FT-QLMS, FT-WLQLMS and FT-CCQLMS algorithms when employed for the modelling of the linear AR(4) system W1, along with the evolution of the mixing parameter λ of FT-CCQLMS. These algorithms were initialised with the following parameters: α = 0.03, γ = 1, δ = 1, ∆ = 4, $\mu = 1 \times 10^{-5}$, $\mu_l = 5 \times 10^{-4}$, the initial mixing parameter λ(0) = 0.5 and the initial tap length p(0) = 10. From Figure 6.3a, it is evident that the performances of all three algorithms were similar, as they converged to the optimal tap-length after around the same number of iterations. This conforms with Remark 4 and Remark 5, which justify their similar performances. Figure 6.3b shows that the mixing parameter λ → 1 for the FT-CCQLMS when modelling the linear system W1, which corroborates Remark 5. The reason is that the QLMS subfilter converges faster than the WL-QLMS, leading the CC-QLMS to favour the QLMS.
WGN    Noncircular
εa     N(0, 1)
εb     −0.6εa + N(0, 1)
εc     0.8εb + N(0, 1)
εd     0.8εa − 0.4εb + N(0, 1)
Table 6.1: Noncircular Quaternion White Gaussian Noise
Similarly, Figure 6.4 shows the modelling of the widely linear system W2. It can be seen from Figure 6.4a that the FT-QLMS was unable to model the widely linear system W2, whereas FT-WLQLMS and FT-CCQLMS converged to the optimal tap-length. This is justified by Remark 1, Remark 2 and Remark 3 in the previous section. Figure 6.4b illustrates that λ → 0 for the modelling of the widely linear system, conforming to Remark 3. This is because the better-performing WL-QLMS subfilter dominates the CC-QLMS algorithm.
6.4.3 Nonstationary Systems
A system consisting of three separate subsystems was considered. The first subsystem is the linear system W1 for the interval 1 ≤ n ≤ 3000, followed by the widely linear system W2 for 3001 ≤ n ≤ 6000. The third subsystem is a linear noncircular W1 for the interval 6001 ≤ n ≤ 9000. The linear noncircular W1 is the system W1 in (6.31) fed with the noncircular QWGN as the driving input. The construction of the noncircular QWGN is described in Table 4.1. For clarity, the characteristics of the noncircular QWGN are reproduced in Table 6.1.
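The construction in Table 6.1 can be sketched as follows. The sketch assumes that each N(0, 1) term in the table is an independent draw from a standard normal distribution; the function name `noncircular_qwgn` is hypothetical.

```python
import numpy as np

def noncircular_qwgn(n_samples, seed=0):
    """Generate noncircular QWGN following Table 6.1.

    Returns an (n_samples, 4) array with columns (eps_a, eps_b, eps_c, eps_d).
    Assumption: every N(0, 1) entry in the table is an independent draw.
    """
    rng = np.random.default_rng(seed)
    eps_a = rng.standard_normal(n_samples)
    eps_b = -0.6 * eps_a + rng.standard_normal(n_samples)
    eps_c = 0.8 * eps_b + rng.standard_normal(n_samples)
    eps_d = 0.8 * eps_a - 0.4 * eps_b + rng.standard_normal(n_samples)
    return np.column_stack([eps_a, eps_b, eps_c, eps_d])
```

The cross-component terms introduce correlation between the four real channels (for instance, the theoretical correlation between εa and εb under these assumptions is −0.6/√1.36 ≈ −0.51), while successive samples remain uncorrelated in time.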
Figure 6.5 shows the steady-state MSE for the noncircular linear system W1 using the QLMS, WL-QLMS and CC-QLMS algorithms with $\mu = 10^{-4}$. All three MSE curves are monotonically non-increasing functions of the tap-length, and the optimal tap length was found to be $p_o = \{21, 22\}$. The CC-QLMS curve exhibits an error spike at the tap length p = 4. This is because p = 3 is a local minimum from which the CC-QLMS struggles to escape, an interpretation supported by the reduced slope of the QLMS and WL-QLMS curves over 3 ≤ p ≤ 5.
Figure 6.6: The evolution of the optimal filter length parameter p for the modelling of the system W1 for the intervals 1 ≤ n ≤ 3000, W2 for 3001 ≤ n ≤ 6000 and noncircular W1 for 6001 ≤ n ≤ 9000 (tap length p against number of iterations n, for FT-QLMS, FT-WLQLMS and FT-CCQLMS).
Figure 6.6 shows the evolution of the optimal filter length parameter p for the FT-QLMS, FT-WLQLMS and FT-CCQLMS employed for the modelling of the system W1 (interval 1 ≤ n ≤ 3000), W2 (interval 3001 ≤ n ≤ 6000) and noncircular W1 (interval 6001 ≤ n ≤ 9000). These algorithms were initialised as follows: α = 0.03, γ = 1, δ = 1, ∆ = 4, $\mu = 1 \times 10^{-5}$, $\mu_l = 5 \times 10^{-4}$, the initial mixing parameter λ(0) = 0.5 and the initial tap length p(0) = 25. From the figure, the FT-WLQLMS converged to the optimal tap-length of the system W1 for the interval 1 ≤ n ≤ 3000 and adapted to the system W2 for 3001 ≤ n ≤ 6000, but was unable to model the noncircular W1 efficiently for the interval 6001 ≤ n ≤ 9000. The FT-QLMS was unable to adapt to the system W2 during the interval 3001 ≤ n ≤ 6000, but was able to model W1 and the noncircular W1. The FT-CCQLMS was able to model all three systems owing to the robust mixing parameter λ.
Figure 6.7 depicts the evolution of the mixing parameter λ of FT-CCQLMS for the modelling of the subsystems W1, W2 and noncircular W1. For the linear system W1, λ → 1 over the interval 1 ≤ n ≤ 3000, making FT-QLMS dominant over FT-WLQLMS. For the widely linear system W2 in the interval 3001 ≤ n ≤ 6000, λ → 0, so that FT-WLQLMS dominates. For the noncircular linear system W1 in the interval 6001 ≤ n ≤ 9000, λ → 1, favouring the linear model of FT-QLMS. This is due to the noncircular input signal being quadruply white, which gives a properness profile of low magnitude [64]. This corroborates earlier findings in [81, 82].
Figure 6.7: The evolution of the mixing parameter λ for the modelling of the system W1 for the intervals 1 ≤ n ≤ 3000, W2 for 3001 ≤ n ≤ 6000 and noncircular W1 for 6001 ≤ n ≤ 9000 (mixing parameter λ against number of iterations n).
6.5 Summary
The fractional tap-length (FT) algorithm has been extended to quaternion-valued adaptive filters trained by the Quaternion Least Mean Square (QLMS) and Widely Linear Quaternion Least Mean Square (WL-QLMS) algorithms, which have demonstrated their capabilities in model order selection. The collaborative combination FT-CCQLMS has been shown to model efficiently both widely linear and strictly linear quaternion-valued systems, owing to its robust mixing ability. The relationship between the steady-state error and tap-length has been established, providing mathematical support for the modelling capabilities of all algorithms. Simulations on model order selection and the identification of quaternion-valued systems support the approach. The results can be easily extended to incorporate nonlinear quaternion-valued algorithms.
Chapter 7
Conclusions and Future Works
Section 7.1 of this chapter presents the conclusions of the thesis. Section 7.2 provides
suggestions for future works in the field.
7.1 Conclusions
This thesis has proposed a novel class of quaternion-valued adaptive filtering algorithms that improve on existing algorithms. The main contributions of this thesis are to:
a) employ a new cost function which takes into account the non-commutative nature of the quaternion product, resulting in a class of nonlinear split-quaternion adaptive filtering algorithms;
b) introduce a new class of nonlinear quaternion-valued adaptive filtering algorithms utilising the locally analytic nonlinear quaternion functions based on the Local Analyticity Condition (LAC);
c) extend the locally analytic nonlinear quaternion functions to the recurrent neural network (RNN) architecture, catering for long-term correlations of the signal;
d) provide a tool to minimise computational complexity and enable system modelling in the quaternion domain H through the use of the fractional tap-length (FT) algorithm.
The first contribution is the derivation of algorithms that take into consideration the non-commutativity of the quaternion product. Due to the restrictive nature of the Cauchy-Riemann-Fueter (CRF) conditions, the componentwise analytic split-quaternion functions are utilised. The simulation results achieved over real- and complex-valued algorithms of the same nature have highlighted the benefits of processing in the quaternion domain H. The higher performance over previous nonlinear quaternion-valued algorithms demonstrates the significance of considering the non-commutativity. These derived algorithms then served as a basis for the subsequent algorithms.
The second contribution is the proposal of a class of locally analytic functions that bypass the strict CRF conditions. This CRF restriction is the sole reason prohibiting further developments of nonlinear quaternion-valued algorithms. Gradient descent based quaternion-valued algorithms require a first-order derivative, which current nonlinear quaternion functions fail to provide. In that respect, the local analyticity condition (LAC) is chosen as an alternative to define analyticity in H. The nonlinear quaternion functions satisfying the LAC are called locally analytic functions; the LAC guarantees their first-order differentiability, making them suitable for gradient descent based algorithms. One convenient aspect of these functions is that they enable a generic extension of the complex-valued elementary transcendental functions (ETF) to the quaternion domain H. Building on the non-commutative split-quaternion algorithms proposed previously, a new class of quaternion-valued adaptive filtering algorithms utilising these functions is introduced. The improved performance of these algorithms demonstrates their advantage over split-quaternion based algorithms.
The third contribution illustrates the versatility of the proposed locally analytic functions. Building on the previous fully quaternion algorithms for finite impulse response filters, these functions are implemented in the recurrent neural network (RNN) architecture. Their ability to better capture the cross-correlations between dimensions and to provide a better estimate of the gradient has led to better performance than the split-quaternion counterpart. Furthermore, the fully quaternion RNN algorithm has significantly lower computational complexity than the split-quaternion RNN, making it very attractive. The flexibility of the locally analytic functions will hopefully draw more researchers to work on fully quaternion based algorithms and their practical applications.
The fourth contribution tackles the extension of the fractional tap-length (FT)
algorithm to the quaternion domain H. The FT algorithms combined with quaternion-valued
adaptive filtering algorithms have demonstrated excellent abilities in model order
selection and in reducing the computational complexity incurred by processing in H. It
is established that the collaborative combination algorithm is able to model general
quaternion-valued systems owing to the flexible adaptive convex mixing parameter. The
results obtained could be easily extended to nonlinear learning algorithms of the same
architecture, which could open up further applications in the quaternion domain H.
Overall, this thesis has opened up new possibilities for nonlinear quaternion-valued
adaptive filtering algorithms. A class of nonlinear quaternion-valued adaptive filtering
algorithms that takes the non-commutativity of quaternion algebra into consideration
is proposed. This class of algorithms is then improved upon by utilising locally
analytic functions, and the resulting algorithms are extended to the recurrent neural network
(RNN) architecture. The fractional tap-length algorithms, previously available only in the
complex domain, are extended to the quaternion domain H, enabling the modelling of general
quaternion-valued systems. The performance of all proposed algorithms has been supported through
rigorous mathematical analysis and simulations on real and synthetic quaternion-valued
signals.
7.2 Future Works
Following the studies of this thesis, several future directions in this research area are
proposed to further improve upon existing algorithms.
Despite the flexibility of the proposed locally analytic functions, the functions are
not suitable for algorithms that are based on second-order derivatives, such as the
Newton method. These second-order derivative algorithms offer faster convergence
at the price of increased sensitivity to initial values. Proposing an analytic quaternion
function that guarantees the existence of its second-order derivative would enable the
extension of these classes of algorithms to the quaternion domain H. This is crucial to the
development of nonlinear quaternion signal processing as a whole.
Another improvement is to extend the quaternion-valued fractional tap-length (FT)
algorithm to encompass architectures with feedback, such as infinite impulse response
(IIR) filters. One setback of the proposed FT algorithm is that it can only model moving
average (MA) systems and provides only an approximation for autoregressive (AR) and
autoregressive moving average (ARMA) systems. The main obstacle in extending the FT algorithm
to the IIR architecture is in determining the order of the feedback. The cost function of the
current FT algorithm is not suited for this task and needs to be modified. Therefore, a new cost
function that considers the impact of the feedback order needs to be constructed, and
the performance of the resulting algorithm in H should be analysed.
An interesting research area is the use of the locally analytic functions
for noise cancellation in optical communication systems. Linear polarization effects such
as Polarization Mode Dispersion (PMD) and Polarization Dependent Loss (PDL) are
a major source of degradation at high bit rates. These effects can be modelled using a
quaternion transfer function, enabling the processing to be done in H instead of the
complex domain C.
116
Bibliography
[1] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathemat-
ics of Control, Signals, and Systems, vol. 2, no. 4, pp. 303–314, 1989.
[2] K. Funahashi, “On the approximate realisation of continuous mappings by neural
networks,” Neural Networks, vol. 2, no. 3, pp. 183–192, 1989.
[3] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by
back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.
[4] P. J. Werbos, “Backpropagation Through Time: what it does and how to do it,”
Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
[5] F. J. Pineda, “Recurrent backpropagation and the dynamical approach to adaptive
neural computation,” Neural Computation, vol. 1, no. 2, pp. 161–172, 1989.
[6] R. J. Williams and D. Zipser, “A learning algorithm for continually running fully
recurrent neural networks,” Neural Computation, vol. 1, no. 2, pp. 270–280, 1989.
[7] B. Widrow, J. McCool, and M. Ball, “The complex LMS algorithm,” Proceedings of
the IEEE, vol. 63, no. 4, pp. 719–720, 1975.
[8] H. Leung and S. Haykin, “The complex backpropagation algorithm,” IEEE Transac-
tions on Signal Processing, vol. 39, no. 9, pp. 2101–2104, 1991.
[9] G. M. Georgiou and C. Koutsougeras, “Complex domain backpropagation,” IEEE
Transactions on Circuits and Systems II, vol. 39, no. 5, pp. 330–334, 1992.
[10] A. Hirose, “Continuous complex-valued backpropagation learning,” IEE Electronics
Letters, vol. 28, no. 20, pp. 1854–1855, 1990.
Bibliography 117
[11] G. Kechriotis and E. S. Manolakos, “Training fully recurrent neural networks with
complex weights,” IEEE Transactions on Circuits and Systems II: Analog and Digital
Signal Processing, vol. 41, no. 3, pp. 235–238, 1994.
[12] T. Kim and T. Adali, “Approximation by fully complex multilayer perceptrons,”
Neural Computation, vol. 15, no. 7, pp. 1641–1666, 2003.
[13] S. L. Goh and D. P. Mandic, “A complex-valued RTRL algorithm for recurrent neural
networks,” Neural Computation, vol. 16, no. 12, pp. 2699–2713, 2004.
[14] T. Nitta and H. de Garis, “A 3D vector version of the back-propagation algorithm,” In
Proceedings of International Joint Conference on Neural Networks (IJCNN), vol. 2,
pp. 511–516, 1992.
[15] T. Nitta, “A back-propagation algorithm for neural networks based on 3D vector prod-
uct,” In Proceedings of International Joint Conference on Neural Networks (IJCNN),
vol. 1, pp. 589–592, 1993.
[16] P. Arena, L. Fortuna, G. Muscato, and M. G. Xibilia, Neural Networks in Multidi-
mensional Domains. Lecture Notes in Control and Information Sciences (Springer
Verlag), Vol. 234, 1998.
[17] W. R. Hamilton, Elements of Quaternions (2nd edition). Longmans, Green and Co,
1899.
[18] F. D. Murnaghan, “The evolution of the concept of number,” The Scientific Monthly,
vol. 68, no. 4, pp. 262–269, 1949.
[19] D. Alfsman, H. G. Gockler, S. J. Sangwine, and T. A. Ell, “Hypercomplex algebras
in digital signal processing: benefits and drawbacks,” In Proceedings EURASIP 15th
European Signal Processing Conference (EUSIPCO), pp. 1322–1326, 2007.
[20] C. F. F. Karney, “Quaternions in molecular modelling,” Journal of Molecular Graph-
ics and Modelling, vol. 25, no. 5, pp. 595–604, 2007.
[21] S. B. Choe and J. J. Faraway, “Modeling head and hand orientation during motion
using quaternions,” Journal of Aerospace, vol. 113, no. 1, pp. 186–192, 2004.
Bibliography 118
[22] J. C. K. Chou, “Quaternions kinematic and dynamic differential equations,” IEEE
Transactions on Robotics and Automation, vol. 8, no. 1, pp. 53–64, 1992.
[23] D. Choukkroun, I. Y. Bar Itzhack, and Y. Ohsman, “Novel quaternion Kalman filter,”
IEEE Transactions on Aerospace and Electronics Systems, vol. 42, no. 1, pp. 174–190,
2006.
[24] C. Cheong Took and D. P. Mandic, “The quaternion LMS algorithm for adaptive
filtering of hypercomplex processes,” IEEE Transactions on Signal Processing, vol. 57,
no. 4, pp. 1316–1327, 2009.
[25] A. Sudbery, “Quaternionic analysis,” Mathematical Proceedings of the Cambridge
Philosophical Society, vol. 85, no. 2, pp. 199–225, 1979.
[26] W. R. Hamilton, “Elements of quaternions,” Chelsea Publication, 1969.
[27] S. De Leo and P. Rotelli, “Quaternion analyticity,” Applied Mathematics Letters,
vol. 16, no. 7, pp. 1077–1081, 2003.
[28] Y. Gong and C. F. N. Cowan, “An LMS style variable tap-length algorithm for struc-
ture adaptation,” IEEE Transactions on Signal Processing, vol. 53, no. 7, pp. 2400–
2407, 2005.
[29] B. Che Ujang, C. Cheong Took, and D. P. Mandic, “Identification of improper pro-
cesses by variable tap-length complex valued adaptive filters,” In Proceedings of In-
ternational Joint Conference on Neural Networks (IJCNN), pp. 1–6, 2010.
[30] S. Haykin, Adaptive filter theory (4th edition). Prentice Hall, 2002.
[31] C. C. Took, G. Strbac, K. Aihara, and D. P. Mandic, “Quaternion-valued short-term
joint forecasting of three-dimensional wind and atmospheric parameters,” Renewable
Energy, vol. 36, no. 6, pp. 1754–1760, 2011.
[32] O. Heaviside, “Vectors versus quaternions,” Nature, vol. 47, pp. 533–534, 1893.
[33] A. MacFarlane, “Vectors versus quaternions,” Nature, vol. 48, no. 1230, pp. 75–76,
1893.
Bibliography 119
[34] C. C. Silva and R. D. A. Martins, “Polar and axial vectors versus quaternions,”
American Association of Physics Teachers, vol. 70, no. 9, pp. 958–963, 2002.
[35] F. D. Neeser and J. L. Massey, “Proper complex random processes with applications
to information theory,” IEEE Transactions on Information Theory, vol. 39, no. 4,
pp. 1293–1302, 1993.
[36] B. Picinbono, “On circularity,” IEEE Signal Processing Letters, vol. 42, no. 12,
pp. 3473–3482, 1994.
[37] A. Walden and P. Rubin-Delanchy, “On testing for impropriety of complex-valued
Gaussian vectors,” IEEE Transactions on Signal Processing, vol. 57, no. 3, pp. 825–
834, 2009.
[38] D. P. Mandic and V. S. L. Goh, Complex valued nonlinear adaptive filters: noncircu-
larity, widely linear and neural models. Wiley, 2009.
[39] D. P. Mandic, S. Javidi, S. L. Goh, A. Kuh, and K. Aihara, “Complex valued predic-
tion of wind profile using augmented complex statistics,” Renewable Energy, vol. 34,
no. 1, pp. 196–210, 2009.
[40] N. N. Vakhania, “Random vectors with values in quaternion Hilbert spaces,” Theories
of Probability and its Applications, vol. 43, no. 1, pp. 99–115, 1999.
[41] P. O. Amblard and N. Le Bihan, “On properness of quaternion valued random vari-
ables,” In Proceedings of International Conference on Mathemathics (IMA) in Signal
Processing, pp. 23–26, 2004.
[42] C. Cheong Took and D. P. Mandic, “Augmented second-order statistics of quaternion
random process,” Signal Processing, vol. 91, no. 2, pp. 214–224, 2011.
[43] J. Via, D. Ramirez, and I. Santamaria, “Properness and widely linear processing
of quaternion random vectors,” IEEE Transactions on Information Theory, vol. 56,
no. 7, pp. 3502–3515, 2010.
[44] B. Picinbono and P. Chevalier, “Widely linear estimation with complex data,” IEEE
Transactions on Signal Processing, vol. 43, no. 8, pp. 2030–2033, 1995.
Bibliography 120
[45] C. Cheong Took and D. P. Mandic, “A quaternion widely linear adaptive filter,” IEEE
Transactions on Signal Processing, vol. 58, no. 8, pp. 4427–4431, 2010.
[46] J. Via, D. Ramirez, I. Santamaria, and L. Vielva, “Widely and semi-widely linear
processing of quaternion vectors,” In Proceedings of IEEE International Conference
on Acoustics, Speech and Signal Processing (ICASSP), pp. 3946–3949, 2010.
[47] J. Via, D. P. Palomar, and L. Vielva, “Generalized likelihood ratios for testing the
properness of quaternion gaussian vectors ,” IEEE Transactions on Signal Processing,
vol. 59, no. 4, pp. 1356–1370, 2011.
[48] J. Via, L. Vielva, I. Santamaria, and D. P. Palomar, “Independent component analysis
of quaternion gaussian vectors,” In Proceedings of IEEE Sensor Array and Multichan-
nel Signal Processing Workshop (SAM), pp. 145–148, 2010.
[49] E. Stiefel, “On Cauchy-Riemann equations in higher dimensions,” Journal of Research
of the National Bureau of Standards, vol. 48, no. 5, pp. 395–398, 1952.
[50] R. E. S. Watson, “The generalized Cauchy-Riemann-Fueter equation and handed-
ness,” Complex Variables, vol. 48, no. 7, pp. 555–568, 2003.
[51] C. A. Deavours, “The quaternion calculus,” The American Mathematical Monthly,
vol. 80, no. 9, pp. 995–1008, 1973.
[52] D. P. Mandic and J. A. Chambers, Recurrent Neural Networks for Prediction: Learn-
ing Algorithms, Architectures and Stability. Wiley, 2001.
[53] A. M. Sabatini, “Quaternion-based extended Kalman filter for determining orienta-
tion by inertial and magnetic sensing,” IEEE Transactions on Biomedical Engineer-
ing, vol. 53, no. 7, pp. 1346–1356, 2006.
[54] S. Buchholz and N. L. Bihan, “Polarized signal classification by complex and quater-
nionic multi-layer perceptrons,” International Journal of Neural Systems, vol. 18,
no. 2, pp. 75–85, 2008.
Bibliography 121
[55] L. Fortuna, G. Muscato, and M. G. Xibilia, “A comparison between HMLP and
HRBF for attitude control,” IEEE Transactions on Neural Networks, vol. 12, no. 2,
pp. 318–328, 2001.
[56] D. P. Mandic and J. A. Chambers, “Relating the slope of the activation function and
the learning rate within a recurrent neural network,” Neural Computation, vol. 11,
no. 5, pp. 1069–1077, 1999.
[57] E. Trentin, “Networks with trainable amplitude of activation functions,” Neural Net-
works, vol. 14, no. 4-5, pp. 471–493, 2001.
[58] A. I. Hanna and D. P. Mandic, “Nonlinear FIR adaptive filters with a gradient adap-
tive amplitude in the nonlinearity,” IEEE Signal Processing Letters, vol. 9, no. 8,
pp. 253–255, 2002.
[59] S. L. Goh and D. P. Mandic, “Recurrent neural networks with trainable amplitude
of activation functions,” Neural Networks, vol. 16, no. 8, pp. 1095–1100, 2003.
[60] E. Soria Olivas, J. Maravilla, J. F. Guerrero Martinez, M. Martinez Sober, and
J. Espi Lopez, “An easy demonstration of the optimum value of the adaptation con-
stant in the LMS algorithm,” IEEE Transactions on Education, vol. 41, no. 1, p. 81,
1998.
[61] W. Duch and N. Jankowski, “Survey of neural transfer functions,” Neural Computing
Survey, vol. 2, pp. 163–212, 1999.
[62] S. Haykin and L. Li, “Nonlinear adaptive prediction of nonstationary signals,” IEEE
Transactions on Signal Processing, vol. 43, no. 2, pp. 526–535, 1995.
[63] K. Mitsubori and T. Saito, “Torus doubling and hyperchaos in a five dimensional
hysteresis circuit,” In Proceedings of 1994 IEEE International Symposium on Circuit
and Systems (ISCAS), vol. 6, pp. 113–116, 1994.
[64] J. Via, D. P. Palomar, L. Vielva, and I. Santamaria, “Quaternion ICA from second-
order statistics,” IEEE Transactions on Signal Processing, vol. 59, no. 4, pp. 1586–
1600, 2011.
Bibliography 122
[65] A. Hirose, Complex-valued neural networks: theories and applications. World Scien-
tific Publishing, 2003.
[66] A. Hirose and H. Onishi, “Proposal of relative-minimization learning for behavior
stabilization of complex-valued recurrent neural networks,” Neurocomputing, vol. 24,
no. 1-3, pp. 163–171, 1999.
[67] I. Aizenberg and C. Moraga, “Multilayer feedforward neural network based on multi-
valued neurons (MLMVN) and a backpropagation learning algorithm,” Soft Comput-
ing, vol. 11, no. 2, pp. 169–183, 2007.
[68] I. Aizenberg, N. N. Aizenberg, and J. P. L. Vandewalle, Multi-valued and universal
binary neurons. Springer-Verlag New York, 2000.
[69] S. L.Goh, M. Chen, D. H. Popovic, K. Aihara, D. Obradovic, and D. P. Mandic,
“Complex valued forecasting of wind profile,” Renewable Energy, vol. 31, no. 11,
pp. 1733–1750, 2006.
[70] F. F. Brackx, “The exponential function of a quaternion variable,” Applicable Anal-
ysis, vol. 8, pp. 265–276, 1979.
[71] L. Shi, “Exploration in quaternion colour,” Master’s thesis, Computer Science, Simon
Fraser University, 2005.
[72] S. Buchholz and N. Le Bihan, “Polarized signal classification by complex and quater-
nionic multilayer perceptrons,” International Journal of Neural Systems, vol. 18,
no. 2, pp. 75–85, 2008.
[73] C. Cheong Took and D. P. Mandic, “Quaternion-valued stochastic gradient-based
adaptive IIR filtering,” IEEE Transactions on Signal Processing, vol. 58, no. 7,
pp. 3895–3901, 2010.
[74] E. N. Lorenz, “Deterministic nonperiodic flow,” Journal of the Atmospheric Sciences,
vol. 20, no. 2, pp. 130–141, 1963.
Bibliography 123
[75] S. H. Strogartz, Nonlinear dynamics and chaos: with applications to physics, biol-
ogy, chemistry and engineering (studies in nonlinearity) 1st edition. Boulder , CO:
Westview Press, 2001.
[76] Z. Pritzker and A. Feuer, “Variable length stochastic gradient algorithm,” IEEE
Transactions on Signal Processing, vol. 39, no. 4, pp. 997–1001, 1991.
[77] B. Jelfs, D. P. Mandic, and S. C. Douglas, “An adaptive approach for the identification
of improper complex signals,” Signal Processing, vol. 92, no. 2, pp. 335–344, 2012.
[78] J. Arenas-Garcia and A. H. S. A. R. Figueiras-Vidal, “Mean-square performance of a
convex combination of two adaptive filters,” IEEE Transactions on Signal Processing,
vol. 54, no. 3, pp. 1078–1090, 2006.
[79] Y. Zhang, N. Li, J. A. Chambers, and A. H. Sayed, “Steady-state performance anal-
ysis of variable tap-length LMS algorithm,” IEEE Transactions on Signal Processing,
vol. 56, no. 2, pp. 839–845, 2008.
[80] J. Navarro-Moreno, “ARMA prediction of widely linear systems by using the innova-
tions algorithm,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3061–
3068, 2008.
[81] B. Jelfs, S. Javidi, P. Vayanos, and D. P. Mandic, “Characterisation of signal modality:
exploiting signal nonlinearity in machine learning and signal processing,” Journal of
Signal Processing Systems, vol. 61, no. 1, pp. 105–115, 2010.
[82] E. Ollila, “On the circularity of a complex random variable,” IEEE Signal Processing
Letters, vol. 15, pp. 841–844, 2008.
124
Appendix A
Derivation of QLMS
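To calculate $\nabla_{\mathbf{w}}y(n)$ and $\nabla_{\mathbf{w}}y^*(n)$, the terms $\mathbf{w}^T(n)\mathbf{x}(n)$ and $\mathbf{x}^H(n)\mathbf{w}^*(n)$ are first
expanded as (due to space limitation, the time index “n” has been dropped):
\[
\mathbf{w}^T(n)\mathbf{x}(n) =
\begin{bmatrix}
\mathbf{w}_a^T\mathbf{x}_a - \mathbf{w}_b^T\mathbf{x}_b - \mathbf{w}_c^T\mathbf{x}_c - \mathbf{w}_d^T\mathbf{x}_d \\
\mathbf{w}_a^T\mathbf{x}_b + \mathbf{w}_b^T\mathbf{x}_a + \mathbf{w}_c^T\mathbf{x}_d - \mathbf{w}_d^T\mathbf{x}_c \\
\mathbf{w}_a^T\mathbf{x}_c + \mathbf{w}_c^T\mathbf{x}_a + \mathbf{w}_d^T\mathbf{x}_b - \mathbf{w}_b^T\mathbf{x}_d \\
\mathbf{w}_a^T\mathbf{x}_d + \mathbf{w}_d^T\mathbf{x}_a + \mathbf{w}_b^T\mathbf{x}_c - \mathbf{w}_c^T\mathbf{x}_b
\end{bmatrix}
\tag{A.1}
\]
\[
\mathbf{x}^H(n)\mathbf{w}^*(n) =
\begin{bmatrix}
\mathbf{w}_a^T\mathbf{x}_a - \mathbf{w}_b^T\mathbf{x}_b - \mathbf{w}_c^T\mathbf{x}_c - \mathbf{w}_d^T\mathbf{x}_d \\
-\mathbf{w}_a^T\mathbf{x}_b - \mathbf{w}_b^T\mathbf{x}_a - \mathbf{w}_c^T\mathbf{x}_d + \mathbf{w}_d^T\mathbf{x}_c \\
-\mathbf{w}_a^T\mathbf{x}_c - \mathbf{w}_c^T\mathbf{x}_a - \mathbf{w}_d^T\mathbf{x}_b + \mathbf{w}_b^T\mathbf{x}_d \\
-\mathbf{w}_a^T\mathbf{x}_d - \mathbf{w}_d^T\mathbf{x}_a - \mathbf{w}_b^T\mathbf{x}_c + \mathbf{w}_c^T\mathbf{x}_b
\end{bmatrix}
\tag{A.2}
\]
where the four rows are the real, $\imath$, $\jmath$ and $\kappa$ components respectively, and the gradients
$\nabla_{\mathbf{w}}y(n)$ and $\nabla_{\mathbf{w}}y^*(n)$ are defined as:
\[
\nabla_{\mathbf{w}}y(n) = \nabla_{\mathbf{w}_a}y(n) + \nabla_{\mathbf{w}_b}y(n)\imath + \nabla_{\mathbf{w}_c}y(n)\jmath + \nabla_{\mathbf{w}_d}y(n)\kappa
\tag{A.3}
\]
\[
\nabla_{\mathbf{w}}y^*(n) = \nabla_{\mathbf{w}_a}y^*(n) + \nabla_{\mathbf{w}_b}y^*(n)\imath + \nabla_{\mathbf{w}_c}y^*(n)\jmath + \nabla_{\mathbf{w}_d}y^*(n)\kappa
\tag{A.4}
\]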
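Based on the expansions (A.1) and (A.2), the derivatives in (A.3) can be computed as:
\[
\begin{aligned}
\nabla_{\mathbf{w}_a}y(n) &= \mathbf{x}_a(n) + \mathbf{x}_b(n)\imath + \mathbf{x}_c(n)\jmath + \mathbf{x}_d(n)\kappa \\
\nabla_{\mathbf{w}_b}y(n)\imath &= \big(-\mathbf{x}_b(n) + \mathbf{x}_a(n)\imath - \mathbf{x}_d(n)\jmath + \mathbf{x}_c(n)\kappa\big)\imath \\
&= -\mathbf{x}_a(n) - \mathbf{x}_b(n)\imath + \mathbf{x}_c(n)\jmath + \mathbf{x}_d(n)\kappa \\
\nabla_{\mathbf{w}_c}y(n)\jmath &= \big(-\mathbf{x}_c(n) + \mathbf{x}_d(n)\imath + \mathbf{x}_a(n)\jmath - \mathbf{x}_b(n)\kappa\big)\jmath \\
&= -\mathbf{x}_a(n) + \mathbf{x}_b(n)\imath - \mathbf{x}_c(n)\jmath + \mathbf{x}_d(n)\kappa \\
\nabla_{\mathbf{w}_d}y(n)\kappa &= \big(-\mathbf{x}_d(n) - \mathbf{x}_c(n)\imath + \mathbf{x}_b(n)\jmath + \mathbf{x}_a(n)\kappa\big)\kappa \\
&= -\mathbf{x}_a(n) + \mathbf{x}_b(n)\imath + \mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa
\end{aligned}
\tag{A.5}
\]
Similarly to the above, the derivatives in (A.4) are obtained as
\[
\begin{aligned}
\nabla_{\mathbf{w}_a}y^*(n) &= \mathbf{x}_a(n) - \mathbf{x}_b(n)\imath - \mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa \\
\nabla_{\mathbf{w}_b}y^*(n)\imath &= \big(-\mathbf{x}_b(n) - \mathbf{x}_a(n)\imath + \mathbf{x}_d(n)\jmath - \mathbf{x}_c(n)\kappa\big)\imath \\
&= \mathbf{x}_a(n) - \mathbf{x}_b(n)\imath - \mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa \\
\nabla_{\mathbf{w}_c}y^*(n)\jmath &= \big(-\mathbf{x}_c(n) - \mathbf{x}_d(n)\imath - \mathbf{x}_a(n)\jmath + \mathbf{x}_b(n)\kappa\big)\jmath \\
&= \mathbf{x}_a(n) - \mathbf{x}_b(n)\imath - \mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa \\
\nabla_{\mathbf{w}_d}y^*(n)\kappa &= \big(-\mathbf{x}_d(n) + \mathbf{x}_c(n)\imath - \mathbf{x}_b(n)\jmath - \mathbf{x}_a(n)\kappa\big)\kappa \\
&= \mathbf{x}_a(n) - \mathbf{x}_b(n)\imath - \mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa
\end{aligned}
\tag{A.6}
\]
Substituting (A.5) into the gradient $\nabla_{\mathbf{w}}y(n)$ in (A.3) and (A.6) into $\nabla_{\mathbf{w}}y^*(n)$ in (A.4) yields
\[
\nabla_{\mathbf{w}}y(n) = -2\mathbf{x}^*(n); \qquad \nabla_{\mathbf{w}}y^*(n) = 4\mathbf{x}^*(n)
\tag{A.7}
\]
which is employed in the derivation of the QLMS.
The derivations for the other nonlinear quaternion algorithms, SQAFA, AASQAFA,
QNGD and AQNGD, follow a similar approach.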
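The result (A.7) can be checked numerically for a single tap. The following is a minimal sketch, assuming a scalar weight and input stored as `(a, b, c, d)` tuples; the helper names `qmul`, `qconj`, `qsum`, `grad_y` and `grad_y_conj` are illustrative and not part of the thesis. Since y = wx is linear in w, the partial derivative with respect to each real component of w is the corresponding unit times x, which is then right-multiplied by that unit as in (A.3) and (A.4).

```python
def qmul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qconj(q):
    """Quaternion conjugate."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def qsum(terms):
    """Componentwise sum of a list of quaternions."""
    return tuple(sum(parts) for parts in zip(*terms))

# The units 1, i, j, k.
UNITS = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

def grad_y(x):
    # y = w x is linear in w, so dy/dw_m = u_m x; (A.3) right-multiplies by u_m.
    return qsum([qmul(qmul(u, x), u) for u in UNITS])

def grad_y_conj(x):
    # y* = x* w*, so dy*/dw_m = x* conj(u_m); (A.4) right-multiplies by u_m.
    return qsum([qmul(qmul(qconj(x), qconj(u)), u) for u in UNITS])

x = (0.5, -1.25, 2.0, 0.75)     # arbitrary test input
print(grad_y(x))                # compare with -2 x*
print(grad_y_conj(x))           # compare with  4 x*
```

The two printed quaternions can be compared by hand against -2x* and 4x* for the chosen x.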
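Appendix B
Derivation of QMLP-FIR
Before proceeding, the componentwise output $net(n) = \mathbf{w}^T(n)\mathbf{x}(n)$ is given as (the time
index “n” is dropped due to space limitation)
\[
\begin{bmatrix} net_a \\ net_b \\ net_c \\ net_d \end{bmatrix}
=
\begin{bmatrix}
\mathbf{w}_a^T\mathbf{x}_a - \mathbf{w}_b^T\mathbf{x}_b - \mathbf{w}_c^T\mathbf{x}_c - \mathbf{w}_d^T\mathbf{x}_d \\
\mathbf{w}_a^T\mathbf{x}_b + \mathbf{w}_b^T\mathbf{x}_a + \mathbf{w}_c^T\mathbf{x}_d - \mathbf{w}_d^T\mathbf{x}_c \\
\mathbf{w}_a^T\mathbf{x}_c + \mathbf{w}_c^T\mathbf{x}_a + \mathbf{w}_d^T\mathbf{x}_b - \mathbf{w}_b^T\mathbf{x}_d \\
\mathbf{w}_a^T\mathbf{x}_d + \mathbf{w}_d^T\mathbf{x}_a + \mathbf{w}_b^T\mathbf{x}_c - \mathbf{w}_c^T\mathbf{x}_b
\end{bmatrix}
\tag{B.1}
\]
For clarity, the gradient of the QMLP-FIR is shown to be
\[
\nabla_{\mathbf{w}}E(n) = -2e_a(n)\frac{\partial y_a(n)}{\partial\mathbf{w}} - 2e_b(n)\frac{\partial y_b(n)}{\partial\mathbf{w}} - 2e_c(n)\frac{\partial y_c(n)}{\partial\mathbf{w}} - 2e_d(n)\frac{\partial y_d(n)}{\partial\mathbf{w}}
\tag{B.2}
\]
From Section 3.2.1, the term $\partial y_a(n)/\partial\mathbf{w}$ is calculated to be
\[
\frac{\partial y_a(n)}{\partial\mathbf{w}} = \Phi'_a\big(net_a(n)\big)\mathbf{x}^*(n)
\tag{B.3}
\]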
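Similarly, the other terms $\partial y_b(n)/\partial\mathbf{w}$, $\partial y_c(n)/\partial\mathbf{w}$ and $\partial y_d(n)/\partial\mathbf{w}$ are derived to be
\[
\begin{aligned}
\frac{\partial y_b(n)}{\partial\mathbf{w}} &= \Phi'_b(net_b(n))\mathbf{x}_b(n) + \Phi'_b(net_b(n))\mathbf{x}_a(n)\imath + \Phi'_b(net_b(n))\mathbf{x}_d(n)\jmath - \Phi'_b(net_b(n))\mathbf{x}_c(n)\kappa \\
&= \Phi'_b(net_b(n))\,\imath\big(-\mathbf{x}_b(n)\imath + \mathbf{x}_a(n) - \mathbf{x}_d(n)\kappa - \mathbf{x}_c(n)\jmath\big) \\
&= \Phi'_b(net_b(n))\,\imath\,\mathbf{x}^*(n) \\
\frac{\partial y_c(n)}{\partial\mathbf{w}} &= \Phi'_c(net_c(n))\mathbf{x}_c(n) - \Phi'_c(net_c(n))\mathbf{x}_d(n)\imath + \Phi'_c(net_c(n))\mathbf{x}_a(n)\jmath + \Phi'_c(net_c(n))\mathbf{x}_b(n)\kappa \\
&= \Phi'_c(net_c(n))\,\jmath\big(-\mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa + \mathbf{x}_a(n) - \mathbf{x}_b(n)\imath\big) \\
&= \Phi'_c(net_c(n))\,\jmath\,\mathbf{x}^*(n) \\
\frac{\partial y_d(n)}{\partial\mathbf{w}} &= \Phi'_d(net_d(n))\mathbf{x}_d(n) + \Phi'_d(net_d(n))\mathbf{x}_c(n)\imath - \Phi'_d(net_d(n))\mathbf{x}_b(n)\jmath + \Phi'_d(net_d(n))\mathbf{x}_a(n)\kappa \\
&= \Phi'_d(net_d(n))\,\kappa\big(-\mathbf{x}_d(n)\kappa - \mathbf{x}_c(n)\jmath - \mathbf{x}_b(n)\imath + \mathbf{x}_a(n)\big) \\
&= \Phi'_d(net_d(n))\,\kappa\,\mathbf{x}^*(n)
\end{aligned}
\tag{B.4}
\]
Substituting the terms defined in (B.3) and (B.4) into the gradient $\nabla_{\mathbf{w}}E(n)$ in (B.2) yields
\[
\begin{aligned}
\nabla_{\mathbf{w}}E(n) &= -2e_a(n)\Phi'_a(net_a(n))\mathbf{x}^*(n) - 2e_b(n)\Phi'_b(net_b(n))\imath\,\mathbf{x}^*(n) - 2e_c(n)\Phi'_c(net_c(n))\jmath\,\mathbf{x}^*(n) \\
&\quad - 2e_d(n)\Phi'_d(net_d(n))\kappa\,\mathbf{x}^*(n) \\
&= -2e(n).\Phi'_s(net(n))\,\mathbf{x}^*(n)
\end{aligned}
\tag{B.5}
\]
where “.” denotes the dot product, that is, the componentwise product
$e_a\Phi'_a + e_b\Phi'_b\imath + e_c\Phi'_c\jmath + e_d\Phi'_d\kappa$.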
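The closed form (B.5) can be compared against a finite-difference gradient. The sketch below makes several assumptions that are not taken from the thesis: a single tap, tanh as the split nonlinearity Φ applied to each component, the cost E = e_a² + e_b² + e_c² + e_d², and arbitrary test values for w, x and d. The function names are illustrative.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qconj(q):
    a, b, c, d = q
    return (a, -b, -c, -d)

UNITS = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

def split_error(w, x, d):
    """Componentwise error d - tanh(net) and net = w x (single tap)."""
    net = qmul(w, x)
    e = tuple(dm - math.tanh(nm) for dm, nm in zip(d, net))
    return e, net

def cost(w, x, d):
    e, _ = split_error(w, x, d)
    return sum(v * v for v in e)

def grad_closed_form(w, x, d):
    """-2 * sum_m e_m Phi'_m(net_m) u_m x*, as in (B.5)."""
    e, net = split_error(w, x, d)
    xc = qconj(x)
    g = [0.0, 0.0, 0.0, 0.0]
    for em, nm, u in zip(e, net, UNITS):
        dphi = 1.0 - math.tanh(nm) ** 2          # derivative of tanh
        term = qmul(u, xc)                        # u_m x*
        for k in range(4):
            g[k] -= 2.0 * em * dphi * term[k]
    return tuple(g)

def grad_numeric(w, x, d, h=1e-6):
    """Central differences of E with respect to (w_a, w_b, w_c, w_d)."""
    g = []
    for m in range(4):
        wp, wm = list(w), list(w)
        wp[m] += h
        wm[m] -= h
        g.append((cost(tuple(wp), x, d) - cost(tuple(wm), x, d)) / (2 * h))
    return tuple(g)

w = (0.2, -0.1, 0.3, 0.05)
x = (0.5, -1.25, 2.0, 0.75)
d = (0.1, 0.2, -0.3, 0.4)
print(grad_closed_form(w, x, d))
print(grad_numeric(w, x, d))
```

The two printed gradients should agree up to the finite-difference error, which supports the factorisations ıx*, ȷx* and κx* used in (B.4).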
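Appendix C
Convergence of SQAFA
The convergence criterion employed in this work is given by
\[
E\{\|\bar{e}(n)\|_2^2\} \le E\{\|e(n)\|_2^2\}
\tag{C.1}
\]
where $\bar{e}$ and $e$ are respectively the a posteriori and the a priori output error, given by
\[
\bar{e}(n) = d(n) - \Phi_s\big(\mathbf{w}^T(n+1)\mathbf{x}(n)\big) + \bar{\varepsilon}(n); \qquad
e(n) = d(n) - \Phi_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big) + \varepsilon(n)
\tag{C.2}
\]
The symbols $\bar{\varepsilon}$ and $\varepsilon$ denote quaternion quadruply white Gaussian noise (QWGN) defined
as
\[
\varepsilon(n) = \varepsilon_a(n) + \varepsilon_b(n)\imath + \varepsilon_c(n)\jmath + \varepsilon_d(n)\kappa
\tag{C.3}
\]
where $\varepsilon_a$, $\varepsilon_b$, $\varepsilon_c$ and $\varepsilon_d$ are realisations of real-valued white Gaussian noise (WGN),
independent and identically distributed (i.i.d.).
The terms $\bar{e}$ and $e$ in (C.2) can be related by the first order Taylor series expansion as
\[
\|\bar{e}(n)\|_2^2 = \|e(n)\|_2^2 + \Delta\mathbf{w}^H(n)\frac{\partial\|e(n)\|_2^2}{\partial\mathbf{w}^*(n)}
\tag{C.4}
\]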
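where $\partial\|e(n)\|_2^2/\partial\mathbf{w}^*(n)$ is effectively the error gradient of the cost function.
The term $\|e(n)\|_2^2$ is first evaluated as
\[
\begin{aligned}
\|e(n)\|_2^2 &= \big(d(n) - y(n) + \varepsilon(n)\big)\big(d^*(n) - y^*(n) + \varepsilon^*(n)\big) \\
&= d(n)d^*(n) - d(n)y^*(n) + d(n)\varepsilon^*(n) - y(n)d^*(n) + y(n)y^*(n) - y(n)\varepsilon^*(n) + \varepsilon(n)d^*(n) \\
&\quad - \varepsilon(n)y^*(n) + \varepsilon(n)\varepsilon^*(n)
\end{aligned}
\tag{C.5}
\]
Then, the error gradient $\partial\|e(n)\|_2^2/\partial\mathbf{w}^*(n)$ can be calculated as
\[
\begin{aligned}
\frac{\partial\|e(n)\|_2^2}{\partial\mathbf{w}^*(n)} &= -d(n)\nabla_{\mathbf{w}}y^*(n) - \nabla_{\mathbf{w}}y(n)d^*(n) + y(n)\nabla_{\mathbf{w}}y^*(n) + \nabla_{\mathbf{w}}y(n)y^*(n) - \nabla_{\mathbf{w}}y(n)\varepsilon^*(n) \\
&\quad - \varepsilon(n)\nabla_{\mathbf{w}}y^*(n) \\
&= \big(-d(n) + y(n) - \varepsilon(n)\big)\nabla_{\mathbf{w}}y^*(n) + \nabla_{\mathbf{w}}y(n)\big(-d^*(n) + y^*(n) - \varepsilon^*(n)\big) \\
&= -e(n)\nabla_{\mathbf{w}}y^*(n) - \nabla_{\mathbf{w}}y(n)e^*(n) \\
&= -\Big[4e(n)\Phi'_s\big(\mathbf{x}^H(n)\mathbf{w}^*(n)\big)\mathbf{x}^*(n) - 2\Phi'_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\mathbf{x}^*(n)e^*(n)\Big]
\end{aligned}
\tag{C.6}
\]
The term $\Delta\mathbf{w}^H(n) = -\mu\big(\partial\|e(n)\|_2^2/\partial\mathbf{w}^*(n)\big)^H$, where $\partial\|e(n)\|_2^2/\partial\mathbf{w}^*(n)$ is given in (C.6), can be
calculated as
\[
\Delta\mathbf{w}^H = \mu\Big[2\mathbf{x}^T(n)\Phi'^{*}_s\big(\mathbf{x}^H(n)\mathbf{w}^*(n)\big)e^*(n) - e(n)\mathbf{x}^T(n)\Phi'^{*}_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\Big]
\tag{C.7}
\]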
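Substituting (C.6) and (C.7) into the Taylor series expansion (C.4) and applying the expectation
operator on both sides yields
\[
\begin{aligned}
E\{\|\bar{e}(n)\|_2^2\} = E\Big\{\|e(n)\|_2^2 - \mu\Big(&\Big[2\mathbf{x}^T(n)\Phi'^{*}_s\big(\mathbf{x}^H(n)\mathbf{w}^*(n)\big)e^*(n) - e(n)\mathbf{x}^T(n)\Phi'^{*}_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\Big] \\
&\Big[4e(n)\Phi'_s\big(\mathbf{x}^H(n)\mathbf{w}^*(n)\big)\mathbf{x}^*(n) - 2\Phi'_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\mathbf{x}^*(n)e^*(n)\Big]\Big)\Big\}
\end{aligned}
\tag{C.8}
\]
Applying the assumptions of small $\mu$ and statistical independence between $e(n)$ and
$\mathbf{x}(n)$, followed by the factorization of the term $\|e(n)\|_2^2$, gives
\[
\begin{aligned}
E\{\|\bar{e}(n)\|_2^2\} &= E\Big\{\|e(n)\|_2^2\Big[1 - 10\mu\,\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big]\Big\} \\
&= E\{\|e(n)\|_2^2\}\,E\Big\{\Big[1 - 10\mu\,\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big]\Big\}
\end{aligned}
\tag{C.9}
\]
The two terms can be separated since they are independent of each other, owing
to the statistical independence between $e(n)$ and $\mathbf{x}(n)$. Therefore, the
condition for convergence in (C.1) is satisfied for
\[
0 < 10\mu E\Big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\} < 1
\tag{C.10}
\]
Solving for $\mu$, we obtain the range of the stepsize for SQAFA to converge
\[
0 < \mu < \frac{1}{10E\Big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_s\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\}}
\tag{C.11}
\]
The ranges of the stepsize for QNGD and AQNGD are derived in the same manner.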
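As an illustration of how the bound (C.11) might be evaluated in practice, the following sketch replaces the expectation by a sample mean. It rests on assumptions that are not stated in the thesis: tanh is used as the split nonlinearity, ‖Φ'_s(·)‖² is read as the sum of the four squared componentwise derivatives, and the function name `step_size_bound` and the data layout (each input vector as a list of quaternion taps) are hypothetical.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def step_size_bound(w, samples):
    """Sample estimate of the upper bound on mu in (C.11).

    w       : list of quaternion weights (one per tap)
    samples : list of input vectors, each a list of quaternion taps
    """
    total = 0.0
    for x in samples:
        # net = w^T x, accumulated tap by tap
        net = (0.0, 0.0, 0.0, 0.0)
        for wi, xi in zip(w, x):
            net = tuple(n + t for n, t in zip(net, qmul(wi, xi)))
        # x^T x^* is the real-valued input energy
        energy = sum(c * c for xi in x for c in xi)
        # assumed reading of ||Phi'_s(net)||^2 for a split tanh nonlinearity
        dphi_sq = sum((1.0 - math.tanh(v) ** 2) ** 2 for v in net)
        total += energy * dphi_sq
    return 1.0 / (10.0 * total / len(samples))

# Two-tap example with zero weights, so every componentwise derivative is 1.
w0 = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
data = [[(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]]
print(step_size_bound(w0, data))
```

In this toy case the input energy is 2 and the assumed derivative norm is 4, so the estimate reduces to 1/(10 · 2 · 4). Larger input power or a steeper nonlinearity tightens the admissible range of µ.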
Appendix D
Convergence of AASQAFA
Similar to the convergence of SQAFA, the convergence criterion employed is
\[
E\{\|\bar{e}(n)\|_2^2\} \le E\{\|e(n)\|_2^2\}
\tag{D.1}
\]
The a priori error in the real part $e_a(n)$, and the a posteriori error in the real part $\bar{e}_a(n)$,
are given by
\[
e_a(n) = d_a(n) - \lambda_a(n)\Phi_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big) + \varepsilon(n); \qquad
\bar{e}_a(n) = d_a(n) - \lambda_a(n)\Phi_a\big(\mathbf{w}^T(n+1)\mathbf{x}(n)\big) + \bar{\varepsilon}(n)
\tag{D.2}
\]
The symbols $\varepsilon$ and $\bar{\varepsilon}$ denote quaternion quadruply white Gaussian noise (QWGN) defined
as
\[
\varepsilon(n) = \varepsilon_a(n) + \varepsilon_b(n)\imath + \varepsilon_c(n)\jmath + \varepsilon_d(n)\kappa
\tag{D.3}
\]
where $\varepsilon_a$, $\varepsilon_b$, $\varepsilon_c$ and $\varepsilon_d$ are realisations of real-valued white Gaussian noise (WGN),
independent and identically distributed (i.i.d.).
Since $\lambda_a$ corresponds to the real part of a quaternion quantity, we shall consider
only the real part of the Taylor series expansion. From (C.4) we have
\[
\|\bar{e}_a(n)\|_2^2 = \|e_a(n)\|_2^2 + \Delta_a\mathbf{w}(n)\frac{\partial\|e_a(n)\|_2^2}{\partial\mathbf{w}(n)}
\tag{D.4}
\]
where the term $\Delta_a\mathbf{w}^H(n)$ refers to the Hermitian of the weight update in the real part.
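The term $\partial\|e_a(n)\|_2^2/\partial\mathbf{w}(n)$ is equivalent to $\nabla_{\mathbf{w}}E_a(n)$ and is given by
\[
\nabla_{\mathbf{w}}E_a = e_a(n)\frac{\partial e_a^*(n)}{\partial\mathbf{w}(n)} + \frac{\partial e_a(n)}{\partial\mathbf{w}(n)}e_a^*(n) = 2e_a(n)\frac{\partial e_a(n)}{\partial\mathbf{w}(n)}
\tag{D.5}
\]
From the previous results, the term $\Re\{\mathbf{w}^T(n)\mathbf{x}(n)\} = net_a(n)$ is given as
\[
net_a(n) = \mathbf{w}_a^T(n)\mathbf{x}_a(n) - \mathbf{w}_b^T(n)\mathbf{x}_b(n) - \mathbf{w}_c^T(n)\mathbf{x}_c(n) - \mathbf{w}_d^T(n)\mathbf{x}_d(n)
\tag{D.6}
\]
Using (D.6), the real part of the nonlinear function $\Phi_a(\cdot)$ can be expanded into
\[
\Phi_a\big(net_a(n)\big) = \Phi_a\big(\mathbf{w}_a^T(n)\mathbf{x}_a(n) - \mathbf{w}_b^T(n)\mathbf{x}_b(n) - \mathbf{w}_c^T(n)\mathbf{x}_c(n) - \mathbf{w}_d^T(n)\mathbf{x}_d(n)\big)
\tag{D.7}
\]
Substituting the expression for the real part of the nonlinear function (D.7) into the a priori
error (D.2) and then differentiating with respect to $\mathbf{w}(n)$ gives
\[
\begin{aligned}
\frac{\partial e_a(n)}{\partial\mathbf{w}(n)} &= -\lambda_a(n)\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big(\mathbf{x}_a(n) - \mathbf{x}_b(n)\imath - \mathbf{x}_c(n)\jmath - \mathbf{x}_d(n)\kappa\big) \\
&= -\lambda_a(n)\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\mathbf{x}^*(n)
\end{aligned}
\tag{D.8}
\]
Substituting (D.8) into the error gradient $\nabla_{\mathbf{w}}E_a(n)$ in (D.5) gives
\[
\nabla_{\mathbf{w}}E_a(n) = -2e_a(n)\lambda_a(n)\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\mathbf{x}^*(n)
\tag{D.9}
\]
The term $\Delta_a\mathbf{w}(n)$ is obtained from (D.9) and is given by
\[
\Delta_a\mathbf{w}(n) = \mu\Big(\lambda_a(n)\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\mathbf{x}^T(n)e_a(n)\Big)
\tag{D.10}
\]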
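Substituting the error gradient $\nabla_{\mathbf{w}}E_a(n)$ from (D.9) and $\Delta_a\mathbf{w}(n)$ from (D.10) into the real
part of the Taylor series expansion (D.4) yields
\[
E\{\|\bar{e}_a(n)\|_2^2\} = E\Big\{\|e_a(n)\|_2^2 - \Big[2\mu\lambda_a^2(n)\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\|e_a(n)\|_2^2\Big]\Big\}
\tag{D.11}
\]
Now, in order to satisfy the convergence condition (D.1), (D.11) requires
\[
0 < E\Big\{1 - 2\mu\lambda_a^2(n)\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\} < 1
\tag{D.12}
\]
Solving for $\lambda_a(n)$ gives the stability bounds on the adaptive amplitude parameter, in the
form
\[
0 < \lambda_a^2(n) < \frac{1}{2\mu E\Big\{\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_a\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2\Big\}}
\tag{D.13}
\]
which also reveals the relationship between the amplitude of the quaternion
nonlinearity and the stepsize parameter.
Similarly, using the same procedure, the bounds on $\lambda_b(n)$, $\lambda_c(n)$ and $\lambda_d(n)$ can be found
as
\[
0 < \lambda_b^2(n) < \frac{1}{2\mu\,\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_b\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2}
\tag{D.14}
\]
\[
0 < \lambda_c^2(n) < \frac{1}{2\mu\,\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_c\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2}
\tag{D.15}
\]
\[
0 < \lambda_d^2(n) < \frac{1}{2\mu\,\mathbf{x}^T(n)\mathbf{x}^*(n)\big\|\Phi'_d\big(\mathbf{w}^T(n)\mathbf{x}(n)\big)\big\|_2^2}
\tag{D.16}
\]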
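The inverse relationship between the stepsize and the admissible amplitude in (D.13) can be made concrete with a small sample-based sketch. The assumptions here are illustrative only: tanh is used for Φ_a, its derivative is evaluated at the real part net_a of w^T x as in (D.6) and (D.7), the expectation is replaced by a sample mean, and the name `amplitude_bound_sq` is hypothetical.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def amplitude_bound_sq(mu, w, samples):
    """Sample estimate of the upper bound on lambda_a^2 in (D.13)."""
    total = 0.0
    for x in samples:
        # net_a: real part of w^T x, see (D.6)
        net_a = sum(qmul(wi, xi)[0] for wi, xi in zip(w, x))
        energy = sum(c * c for xi in x for c in xi)      # x^T x^*
        dphi_a = 1.0 - math.tanh(net_a) ** 2             # assumed Phi'_a
        total += energy * dphi_a ** 2
    return 1.0 / (2.0 * mu * total / len(samples))

w0 = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
data = [[(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]]
print(amplitude_bound_sq(0.01, w0, data))
print(amplitude_bound_sq(0.005, w0, data))
```

Halving µ doubles the bound on λ_a², which is the trade-off between the amplitude of the nonlinearity and the stepsize that (D.13) expresses.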
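Appendix E
Analyticity of the exponential
function $e^q$
The quaternion exponential function $e^q$ in its Euler form is given by
\[
e^q = e^{q_a}\left(\cos(\alpha) + \frac{q_b\sin(\alpha)\imath}{\alpha} + \frac{q_c\sin(\alpha)\jmath}{\alpha} + \frac{q_d\sin(\alpha)\kappa}{\alpha}\right)
\tag{E.1}
\]
The derivative to be evaluated is defined as
\[
-\frac{\partial e^q}{\partial\alpha}\zeta = -\left(\frac{q_b}{\alpha}\frac{\partial e^q}{\partial q_b} + \frac{q_c}{\alpha}\frac{\partial e^q}{\partial q_c} + \frac{q_d}{\alpha}\frac{\partial e^q}{\partial q_d}\right)\left(\frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}\right)
\tag{E.2}
\]
To calculate the term $-\frac{\partial e^q}{\partial\alpha}\zeta$ in (E.2), the terms $\frac{\partial e^q}{\partial q_b}$, $\frac{\partial e^q}{\partial q_c}$ and $\frac{\partial e^q}{\partial q_d}$ are first evaluated. The
term $\frac{\partial e^q}{\partial q_b}$ is derived by differentiating (E.1) with respect to $q_b$ to yield
\[
\begin{aligned}
\frac{\partial e^q}{\partial q_b} &= e^{q_a}\frac{\partial}{\partial q_b}\left(\cos(\alpha) + \frac{q_b\sin(\alpha)\imath}{\alpha} + \frac{q_c\sin(\alpha)\jmath}{\alpha} + \frac{q_d\sin(\alpha)\kappa}{\alpha}\right) \\
&= e^{q_a}\left(-\frac{q_b\sin(\alpha)}{\alpha} + \frac{q_b^2\cos(\alpha)\imath}{\alpha^2} + \frac{(q_c^2 + q_d^2)\sin(\alpha)\imath}{\alpha^3}\right. \\
&\qquad\left. + \frac{q_bq_c\cos(\alpha)\jmath}{\alpha^2} - \frac{q_bq_c\sin(\alpha)\jmath}{\alpha^3} + \frac{q_bq_d\cos(\alpha)\kappa}{\alpha^2} - \frac{q_bq_d\sin(\alpha)\kappa}{\alpha^3}\right)
\end{aligned}
\tag{E.3}
\]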
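Proceeding in the same manner, the terms $\frac{\partial e^q}{\partial q_c}$ and $\frac{\partial e^q}{\partial q_d}$ are calculated as
\[
\begin{aligned}
\frac{\partial e^q}{\partial q_c} = e^{q_a}\left(-\frac{q_c\sin(\alpha)}{\alpha} + \frac{q_bq_c\cos(\alpha)\imath}{\alpha^2} - \frac{q_bq_c\sin(\alpha)\imath}{\alpha^3}\right. \\
\left. + \frac{q_c^2\cos(\alpha)\jmath}{\alpha^2} + \frac{(q_b^2 + q_d^2)\sin(\alpha)\jmath}{\alpha^3} + \frac{q_cq_d\cos(\alpha)\kappa}{\alpha^2} - \frac{q_cq_d\sin(\alpha)\kappa}{\alpha^3}\right)
\end{aligned}
\tag{E.4}
\]
\[
\begin{aligned}
\frac{\partial e^q}{\partial q_d} = e^{q_a}\left(-\frac{q_d\sin(\alpha)}{\alpha} + \frac{q_bq_d\cos(\alpha)\imath}{\alpha^2} - \frac{q_bq_d\sin(\alpha)\imath}{\alpha^3}\right. \\
\left. + \frac{q_cq_d\cos(\alpha)\jmath}{\alpha^2} - \frac{q_cq_d\sin(\alpha)\jmath}{\alpha^3} + \frac{q_d^2\cos(\alpha)\kappa}{\alpha^2} + \frac{(q_b^2 + q_c^2)\sin(\alpha)\kappa}{\alpha^3}\right)
\end{aligned}
\tag{E.5}
\]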
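Substituting the terms defined in (E.3), (E.4) and (E.5) into the analyticity condition
specified in (E.2) yields
\[
-\frac{\partial e^q}{\partial\alpha}\zeta = e^{q_a}\left(-\frac{\sin(\alpha)}{\alpha^2}\big(q_b^2 + q_c^2 + q_d^2\big) + \frac{\cos(\alpha)}{\alpha^3}\big(q_b^2 + q_c^2 + q_d^2\big)\big(q_b\imath + q_c\jmath + q_d\kappa\big)\right)\big(-\zeta\big)
\tag{E.6}
\]
From Section 4.3, $\zeta$ and $\alpha$ are given by
\[
\zeta = \frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}; \qquad \alpha = \sqrt{q_b^2 + q_c^2 + q_d^2}
\tag{E.7}
\]
(E.6) is simplified further by substituting the definitions of $\zeta$ and $\alpha$ in (E.7) to give
\[
\begin{aligned}
-\frac{\partial e^q}{\partial\alpha}\zeta &= e^{q_a}\left(-\sin(\alpha) + \frac{q_b\cos(\alpha)\imath}{\alpha} + \frac{q_c\cos(\alpha)\jmath}{\alpha} + \frac{q_d\cos(\alpha)\kappa}{\alpha}\right)\big(-\zeta\big) \\
&= e^{q_a}\big(\cos(\alpha) + \sin(\alpha)\zeta\big)
\end{aligned}
\tag{E.8}
\]
which equals $\partial e^q/\partial q_a = e^q$, so the exponential function satisfies the local analyticity condition.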
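The identity derived in (E.2) to (E.8) can be checked numerically with central differences. This is a minimal sketch, assuming α ≠ 0 and an arbitrary test point q; `qexp`, `partial` and `lac_sides` are illustrative names, not functions from the thesis.

```python
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qexp(q):
    """Quaternion exponential in the Euler form (E.1); requires alpha != 0."""
    a, b, c, d = q
    alpha = math.sqrt(b*b + c*c + d*d)
    s = math.exp(a) * math.sin(alpha) / alpha
    return (math.exp(a) * math.cos(alpha), s*b, s*c, s*d)

def partial(f, q, m, h=1e-6):
    """Central-difference partial derivative of f with respect to component m."""
    qp, qm = list(q), list(q)
    qp[m] += h
    qm[m] -= h
    fp, fm = f(tuple(qp)), f(tuple(qm))
    return tuple((u - v) / (2*h) for u, v in zip(fp, fm))

def lac_sides(f, q):
    """Return (df/dq_a, -(df/dalpha) zeta), the two sides of the condition."""
    a, b, c, d = q
    alpha = math.sqrt(b*b + c*c + d*d)
    zeta = (0.0, b/alpha, c/alpha, d/alpha)
    lhs = partial(f, q, 0)
    grads = [partial(f, q, m) for m in (1, 2, 3)]
    # df/dalpha = sum_m (q_m / alpha) df/dq_m, as in (E.2)
    dalpha = tuple(sum(q[m+1]/alpha * grads[m][k] for m in range(3))
                   for k in range(4))
    rhs = qmul(tuple(-v for v in dalpha), zeta)
    return lhs, rhs

q = (0.3, 0.5, -0.2, 0.7)
lhs, rhs = lac_sides(qexp, q)
print(lhs)
print(rhs)
print(qexp(q))
```

All three printed quaternions should agree up to the finite-difference error, consistent with (E.8).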
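Appendix F
Local Analyticity of tanh(q)
From Section 4.3.2, the Euler expression for tanh(q) is derived to be
\[
\tanh(q) = \frac{e^{4q_a} - 1 + 2e^{2q_a}\sin(2\alpha)\zeta}{e^{4q_a} + 1 + 2e^{2q_a}\cos(2\alpha)}
\tag{F.1}
\]
To examine the local analyticity of tanh(q), the quaternion local analyticity condition of
tanh(q) is given as
\[
\frac{\partial\tanh(q)}{\partial q_a} = -\left(\frac{q_b}{\alpha}\frac{\partial\tanh(q)}{\partial q_b} + \frac{q_c}{\alpha}\frac{\partial\tanh(q)}{\partial q_c} + \frac{q_d}{\alpha}\frac{\partial\tanh(q)}{\partial q_d}\right)\left(\frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}\right)
\tag{F.2}
\]
Similarly to the case of the quaternion exponential function, the term $\frac{\partial\tanh(q)}{\partial q_a}$ can be obtained
by differentiating (F.1) with respect to $q_a$, to give
\[
\begin{aligned}
\frac{\partial\tanh(q)}{\partial q_a} &= \frac{\partial}{\partial q_a}\left(\frac{e^{4q_a} - 1}{e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1} + \frac{2e^{2q_a}\sin(2\alpha)\zeta}{e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1}\right) \\
&= \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2} + \frac{4e^{2q_a}\sin(2\alpha) - 4e^{6q_a}\sin(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2}\zeta
\end{aligned}
\tag{F.3}
\]
In order to determine the remaining terms in (F.2), define
\[
u = 2e^{2q_a}\sin(2\alpha); \qquad v = e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1
\tag{F.4}
\]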
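Furthermore, $\zeta$ and $\alpha$ are defined by
\[
\zeta = \frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}; \qquad \alpha = \sqrt{q_b^2 + q_c^2 + q_d^2}
\tag{F.5}
\]
Substitute $u$ and $v$ into (F.1) and expand $\zeta$ according to (F.5) to yield
\[
\tanh(q) = \frac{e^{4q_a} - 1 + u\zeta}{v} = \frac{e^{4q_a} - 1}{v} + \frac{uq_b\imath}{v\alpha} + \frac{uq_c\jmath}{v\alpha} + \frac{uq_d\kappa}{v\alpha}
\tag{F.6}
\]
Proceeding in a manner similar to that used to determine the analyticity of $e^q$, the term $\frac{\partial\tanh(q)}{\partial q_b}$
is obtained by differentiating (F.6) with respect to $q_b$, resulting in
\[
\begin{aligned}
\frac{\partial\tanh(q)}{\partial q_b} &= \frac{\partial}{\partial q_b}\left(\frac{e^{4q_a} - 1}{v} + \frac{uq_b\imath}{v\alpha} + \frac{uq_c\jmath}{v\alpha} + \frac{uq_d\kappa}{v\alpha}\right) \\
&= \frac{\big(e^{4q_a} - 1\big)\big(4e^{2q_a}q_b\sin(2\alpha)\big)}{v^2\alpha}
+ \frac{(v\alpha)\frac{\partial(uq_b)}{\partial q_b} - (uq_b)\frac{\partial(v\alpha)}{\partial q_b}}{(v\alpha)^2}\imath \\
&\quad + \frac{(v\alpha)\frac{\partial(uq_c)}{\partial q_b} - (uq_c)\frac{\partial(v\alpha)}{\partial q_b}}{(v\alpha)^2}\jmath
+ \frac{(v\alpha)\frac{\partial(uq_d)}{\partial q_b} - (uq_d)\frac{\partial(v\alpha)}{\partial q_b}}{(v\alpha)^2}\kappa \\
&= \frac{\big(e^{4q_a} - 1\big)\big(4e^{2q_a}q_b\sin(2\alpha)\big)}{v^2\alpha}
+ \left(\frac{v\alpha u + v4e^{2q_a}q_b^2\cos(2\alpha) - \frac{uvq_b^2}{\alpha} + uq_b^24e^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\imath \\
&\quad + \left(\frac{v4e^{2q_a}q_bq_c\cos(2\alpha) - \frac{uvq_bq_c}{\alpha} + uq_bq_c4e^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\jmath \\
&\quad + \left(\frac{v4e^{2q_a}q_bq_d\cos(2\alpha) - \frac{uvq_bq_d}{\alpha} + uq_bq_d4e^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\kappa
\end{aligned}
\tag{F.7}
\]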
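Noticing that $u$, $v$ and $\alpha$ are functions of the variables $q_b$, $q_c$ and $q_d$, the terms $\frac{\partial\tanh(q)}{\partial q_c}$
and $\frac{\partial\tanh(q)}{\partial q_d}$ become
\[
\begin{aligned}
\frac{\partial\tanh(q)}{\partial q_c} &= \frac{\big(e^{4q_a} - 1\big)\big(4e^{2q_a}q_c\sin(2\alpha)\big)}{v^2\alpha}
+ \left(\frac{v4e^{2q_a}q_bq_c\cos(2\alpha) - \frac{uvq_bq_c}{\alpha} + uq_bq_c4e^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\imath \\
&\quad + \left(\frac{v\alpha u + v4e^{2q_a}q_c^2\cos(2\alpha) - \frac{uvq_c^2}{\alpha} + uq_c^24e^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\jmath \\
&\quad + \left(\frac{v4e^{2q_a}q_cq_d\cos(2\alpha) - \frac{uvq_cq_d}{\alpha} + uq_cq_d4e^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\kappa
\end{aligned}
\tag{F.8}
\]
\[
\begin{aligned}
\frac{\partial\tanh(q)}{\partial q_d} &= \frac{\big(e^{4q_a} - 1\big)\big(4e^{2q_a}q_d\sin(2\alpha)\big)}{v^2\alpha}
+ \left(\frac{v4e^{2q_a}q_bq_d\cos(2\alpha) - \frac{uvq_bq_d}{\alpha} + uq_bq_d4e^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\imath \\
&\quad + \left(\frac{v4e^{2q_a}q_cq_d\cos(2\alpha) - \frac{uvq_cq_d}{\alpha} + uq_cq_d4e^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\jmath \\
&\quad + \left(\frac{v\alpha u + v4e^{2q_a}q_d^2\cos(2\alpha) - \frac{uvq_d^2}{\alpha} + uq_d^24e^{2q_a}\sin(2\alpha)}{(v\alpha)^2}\right)\kappa
\end{aligned}
\tag{F.9}
\]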
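Substituting (F.7), (F.8) and (F.9) into the right hand side of (F.2) yields
\[
\begin{aligned}
-\frac{\partial\tanh(q)}{\partial\alpha}\zeta = \Bigg(&\frac{\big(e^{4q_a} - 1\big)\big(4e^{2q_a}\sin(2\alpha)\big(q_b^2 + q_c^2 + q_d^2\big)\big)}{(v\alpha)^2}
+ \frac{v4q_be^{2q_a}\cos(2\alpha) + u4q_be^{2q_a}\sin(2\alpha)}{v^2\alpha}\imath \\
&+ \frac{v4q_ce^{2q_a}\cos(2\alpha) + u4q_ce^{2q_a}\sin(2\alpha)}{v^2\alpha}\jmath
+ \frac{v4q_de^{2q_a}\cos(2\alpha) + u4q_de^{2q_a}\sin(2\alpha)}{v^2\alpha}\kappa\Bigg)\big(-\zeta\big)
\end{aligned}
\tag{F.10}
\]
Next, the terms $u$ and $v$ in (F.4) are expanded to give
\[
\begin{aligned}
-\frac{\partial\tanh(q)}{\partial\alpha}\zeta = \Bigg(&\frac{\big(e^{4q_a} - 1\big)\big(4e^{2q_a}\sin(2\alpha)\big(q_b^2 + q_c^2 + q_d^2\big)\big)}{\big(\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)\alpha\big)^2} \\
&+ \frac{4q_be^{6q_a}\cos(2\alpha) + 4q_be^{2q_a}\cos(2\alpha) + 8q_be^{4q_a}\big(\cos^2(2\alpha) + \sin^2(2\alpha)\big)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2\alpha}\imath \\
&+ \frac{4q_ce^{6q_a}\cos(2\alpha) + 4q_ce^{2q_a}\cos(2\alpha) + 8q_ce^{4q_a}\big(\cos^2(2\alpha) + \sin^2(2\alpha)\big)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2\alpha}\jmath \\
&+ \frac{4q_de^{6q_a}\cos(2\alpha) + 4q_de^{2q_a}\cos(2\alpha) + 8q_de^{4q_a}\big(\cos^2(2\alpha) + \sin^2(2\alpha)\big)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2\alpha}\kappa\Bigg)\big(-\zeta\big)
\end{aligned}
\tag{F.11}
\]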
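Simplify (F.11) further by employing $\sin^2(2\alpha) + \cos^2(2\alpha) = 1$ to yield
\[
\begin{aligned}
-\frac{\partial\tanh(q)}{\partial\alpha}\zeta = \Bigg(&\frac{4e^{6q_a}\sin(2\alpha) - 4e^{2q_a}\sin(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2} \\
&+ \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2}\left(\frac{q_b\imath + q_c\jmath + q_d\kappa}{\alpha}\right)\Bigg)\big(-\zeta\big)
\end{aligned}
\tag{F.12}
\]
Further substituting $\zeta$ and $\alpha$ from (F.5) gives
\[
\begin{aligned}
-\frac{\partial\tanh(q)}{\partial\alpha}\zeta &= \left(\frac{4e^{6q_a}\sin(2\alpha) - 4e^{2q_a}\sin(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2} + \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2}\zeta\right)\big(-\zeta\big) \\
&= \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2} + \frac{4e^{2q_a}\sin(2\alpha) - 4e^{6q_a}\sin(2\alpha)}{\big(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1\big)^2}\zeta
\end{aligned}
\tag{F.13}
\]
which is identical to (F.3); hence tanh(q) satisfies the local analyticity condition (F.2).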
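The agreement between (F.3) and (F.13) can also be checked numerically. The sketch below evaluates tanh(q) through the Euler expression (F.1) and compares both sides of (F.2) by central differences. It assumes α ≠ 0 and an arbitrary test point; the names `qtanh`, `partial` and `lac_sides` are illustrative. As an additional sanity check, the case q_c = q_d = 0 is compared with the complex tanh from the standard library.

```python
import cmath
import math

def qmul(p, q):
    """Hamilton product of two quaternions stored as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def qtanh(q):
    """tanh(q) through the Euler expression (F.1); requires alpha != 0."""
    a, b, c, d = q
    alpha = math.sqrt(b*b + c*c + d*d)
    u = 2 * math.exp(2*a) * math.sin(2*alpha)                       # (F.4)
    v = math.exp(4*a) + 2 * math.exp(2*a) * math.cos(2*alpha) + 1   # (F.4)
    return ((math.exp(4*a) - 1) / v,
            u*b / (v*alpha), u*c / (v*alpha), u*d / (v*alpha))

def partial(f, q, m, h=1e-6):
    """Central-difference partial derivative of f with respect to component m."""
    qp, qm = list(q), list(q)
    qp[m] += h
    qm[m] -= h
    fp, fm = f(tuple(qp)), f(tuple(qm))
    return tuple((s - t) / (2*h) for s, t in zip(fp, fm))

def lac_sides(f, q):
    """Return the left and right hand sides of the condition (F.2)."""
    a, b, c, d = q
    alpha = math.sqrt(b*b + c*c + d*d)
    zeta = (0.0, b/alpha, c/alpha, d/alpha)
    lhs = partial(f, q, 0)
    grads = [partial(f, q, m) for m in (1, 2, 3)]
    dalpha = tuple(sum(q[m+1]/alpha * grads[m][k] for m in range(3))
                   for k in range(4))
    rhs = qmul(tuple(-s for s in dalpha), zeta)
    return lhs, rhs

q = (0.3, 0.5, -0.2, 0.7)
lhs, rhs = lac_sides(qtanh, q)
print(lhs)
print(rhs)

# With q_c = q_d = 0 the quaternion reduces to a complex number.
ref = cmath.tanh(complex(0.3, 0.5))
val = qtanh((0.3, 0.5, 0.0, 0.0))
print(ref, val)
```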
140
Appendix G
A Local Derivative of tanh(q)
sech(q) is first expanded into its Euler formula to give
sech(q) =2
eq + e−q
=2
eqa(cos(α) + sin(α)ζ
)+ e−qa
(cos(α)− sin(α)ζ
)
=2e3qa
(cos(α)− sin(α)ζ
)+ 2eqa
(cos(α) + sin(α)ζ
)
e4qa + 2e2qa(cos2(α)− sin2(α)
)+ 1
(G.1)
and apply the identity cos2(α)− sin2(α) = cos(2α) to give
sech(q) =2e3qa
(cos(α)− sin(α)ζ
)+ 2eqa
(cos(α) + sin(α)ζ
)
e4qa + 2e2qa cos(2α) + 1(G.2)
Upon squaring (G.2) results in
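$$
\begin{aligned}
\operatorname{sech}^2(q) = {} &\frac{4e^{6q_a}\big(\cos^2(\alpha) - \sin^2(\alpha)\big) + 4e^{4q_a}\big(2\cos^2(\alpha) + 2\sin^2(\alpha)\big) + 4e^{2q_a}\big(\cos^2(\alpha) - \sin^2(\alpha)\big)}{(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1)^2} \\
&+ \frac{-8e^{6q_a}\sin(\alpha)\cos(\alpha) + 8e^{2q_a}\sin(\alpha)\cos(\alpha)}{(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1)^2}\,\zeta
\end{aligned}
\tag{G.3}
$$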
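and substituting $2\sin(\alpha)\cos(\alpha) = \sin(2\alpha)$, together with $\cos^2(\alpha) - \sin^2(\alpha) = \cos(2\alpha)$ and $\cos^2(\alpha) + \sin^2(\alpha) = 1$, yields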
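$$
\operatorname{sech}^2(q) = \frac{4e^{6q_a}\cos(2\alpha) + 8e^{4q_a} + 4e^{2q_a}\cos(2\alpha)}{(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1)^2} + \frac{-4e^{6q_a}\sin(2\alpha) + 4e^{2q_a}\sin(2\alpha)}{(e^{4q_a} + 2e^{2q_a}\cos(2\alpha) + 1)^2}\,\zeta
\tag{G.4}
$$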
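The right-hand side of (G.4) is the same expression as the right-hand side of (F.13). The closed forms (G.2) and (G.4) can also be checked numerically. The sketch below does so under the assumption that q = q_a + ζα may be represented by the complex number q_a + iα (valid for a fixed unit pure quaternion ζ, since ζ² = −1); the function names and test point are illustrative.

```python
import cmath
import math


def sech_g2(qa, alpha):
    """sech(q) from (G.2); the imaginary part stands for the zeta coefficient."""
    denom = math.exp(4 * qa) + 2 * math.exp(2 * qa) * math.cos(2 * alpha) + 1
    num = (2 * math.exp(3 * qa) * complex(math.cos(alpha), -math.sin(alpha))
           + 2 * math.exp(qa) * complex(math.cos(alpha), math.sin(alpha)))
    return num / denom


def sech2_g4(qa, alpha):
    """sech^2(q) from (G.4); the imaginary part stands for the zeta coefficient."""
    e2, e4, e6 = math.exp(2 * qa), math.exp(4 * qa), math.exp(6 * qa)
    denom = (e4 + 2 * e2 * math.cos(2 * alpha) + 1) ** 2
    real = (4 * e6 * math.cos(2 * alpha) + 8 * e4 + 4 * e2 * math.cos(2 * alpha)) / denom
    zeta = (-4 * e6 * math.sin(2 * alpha) + 4 * e2 * math.sin(2 * alpha)) / denom
    return complex(real, zeta)


if __name__ == "__main__":
    qa, alpha = 0.3, 0.7  # arbitrary test point
    reference = 1 / cmath.cosh(complex(qa, alpha))  # sech via the standard library
    print(abs(sech_g2(qa, alpha) - reference))
    print(abs(sech2_g4(qa, alpha) - reference ** 2))
```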
Appendix H
Derivation of Split QRTRL
The term $\mathbf{w}_l^T(n)\mathbf{z}(n) = \mathrm{net}_l$ is expanded into its componentwise terms, given by (the time index $n$ is dropped due to space limitations)
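$$
\begin{bmatrix}
\mathrm{net}^a_l \\
\mathrm{net}^b_l \\
\mathrm{net}^c_l \\
\mathrm{net}^d_l
\end{bmatrix}
=
\begin{bmatrix}
(\mathbf{w}^a_l)^T\mathbf{z}^a - (\mathbf{w}^b_l)^T\mathbf{z}^b - (\mathbf{w}^c_l)^T\mathbf{z}^c - (\mathbf{w}^d_l)^T\mathbf{z}^d \\
(\mathbf{w}^a_l)^T\mathbf{z}^b + (\mathbf{w}^b_l)^T\mathbf{z}^a + (\mathbf{w}^c_l)^T\mathbf{z}^d - (\mathbf{w}^d_l)^T\mathbf{z}^c \\
(\mathbf{w}^a_l)^T\mathbf{z}^c + (\mathbf{w}^c_l)^T\mathbf{z}^a + (\mathbf{w}^d_l)^T\mathbf{z}^b - (\mathbf{w}^b_l)^T\mathbf{z}^d \\
(\mathbf{w}^a_l)^T\mathbf{z}^d + (\mathbf{w}^d_l)^T\mathbf{z}^a + (\mathbf{w}^b_l)^T\mathbf{z}^c - (\mathbf{w}^c_l)^T\mathbf{z}^b
\end{bmatrix}
\tag{H.1}
$$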
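The four rows of (H.1) are the real, ı, ȷ and κ components of a sum of Hamilton products between weights and inputs. A minimal sketch of this expansion is given below; the function names and the list-based representation of the real-valued component vectors are illustrative assumptions, not notation from the thesis.

```python
def dot(u, v):
    """Real inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def net_h1(wa, wb, wc, wd, za, zb, zc, zd):
    """Componentwise net input, one tuple entry per row of (H.1)."""
    return (
        dot(wa, za) - dot(wb, zb) - dot(wc, zc) - dot(wd, zd),
        dot(wa, zb) + dot(wb, za) + dot(wc, zd) - dot(wd, zc),
        dot(wa, zc) + dot(wc, za) + dot(wd, zb) - dot(wb, zd),
        dot(wa, zd) + dot(wd, za) + dot(wb, zc) - dot(wc, zb),
    )


if __name__ == "__main__":
    # Single tap: (1 + 2i + 3j + 4k)(5 + 6i + 7j + 8k)
    print(net_h1([1], [2], [3], [4], [5], [6], [7], [8]))
    # Non-commutativity: i*j = k, whereas j*i = -k
    print(net_h1([0], [1], [0], [0], [0], [0], [1], [0]))
    print(net_h1([0], [0], [1], [0], [0], [1], [0], [0]))
```

The last two calls illustrate why the order of the weight and input factors matters in the quaternion domain.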
The gradient for the split QRTRL is given as
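$$
\nabla_{\mathbf{w}}E(n) = \frac{\partial E(n)}{\partial \mathbf{w}^a(n)} + \frac{\partial E(n)}{\partial \mathbf{w}^b(n)}\,\imath + \frac{\partial E(n)}{\partial \mathbf{w}^c(n)}\,\jmath + \frac{\partial E(n)}{\partial \mathbf{w}^d(n)}\,\kappa
\tag{H.2}
$$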
Expanding the term $\partial E/\partial w^a_{s,t}$ in (H.2) gives
$$
\frac{\partial E(n)}{\partial w^a_{s,t}(n)} = -e^a_l(n)\Psi^{l,(aa)}_{s,t}(n) - e^b_l(n)\Psi^{l,(ba)}_{s,t}(n) - e^c_l(n)\Psi^{l,(ca)}_{s,t}(n) - e^d_l(n)\Psi^{l,(da)}_{s,t}(n)
\tag{H.3}
$$
From Section 5.3.1, the sensitivity $\Psi^{l,(aa)}_{s,t}$ is given as
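$$
\begin{aligned}
\Psi^{l,(aa)}_{s,t}(n) = \Phi'_s\big(\mathrm{net}^a_l(n)\big)\Big(&\delta_{sl}z^a_l(n) + \sum_{q=1}^{N}\big[w^a_{l,p+1+q}(n)\Psi^{q,(aa)}_{s,t}(n-1) - w^b_{l,p+1+q}(n)\Psi^{q,(ba)}_{s,t}(n-1) \\
&- w^c_{l,p+1+q}(n)\Psi^{q,(ca)}_{s,t}(n-1) - w^d_{l,p+1+q}(n)\Psi^{q,(da)}_{s,t}(n-1)\big]\Big)
\end{aligned}
\tag{H.4}
$$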
Similar to the derivation of $\Psi^{l,(aa)}_{s,t}$, the other three sensitivities in (H.3) are determined to be
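$$
\begin{aligned}
\Psi^{l,(ba)}_{s,t}(n) = \Phi'_s\big(\mathrm{net}^b_l(n)\big)\Big(&\delta_{sl}z^b_l(n) + \sum_{q=1}^{N}\big[w^a_{l,p+1+q}(n)\Psi^{q,(ba)}_{s,t}(n-1) + w^b_{l,p+1+q}(n)\Psi^{q,(aa)}_{s,t}(n-1) \\
&+ w^c_{l,p+1+q}(n)\Psi^{q,(da)}_{s,t}(n-1) - w^d_{l,p+1+q}(n)\Psi^{q,(ca)}_{s,t}(n-1)\big]\Big) \\
\Psi^{l,(ca)}_{s,t}(n) = \Phi'_s\big(\mathrm{net}^c_l(n)\big)\Big(&\delta_{sl}z^c_l(n) + \sum_{q=1}^{N}\big[w^a_{l,p+1+q}(n)\Psi^{q,(ca)}_{s,t}(n-1) - w^b_{l,p+1+q}(n)\Psi^{q,(da)}_{s,t}(n-1) \\
&+ w^c_{l,p+1+q}(n)\Psi^{q,(aa)}_{s,t}(n-1) + w^d_{l,p+1+q}(n)\Psi^{q,(ba)}_{s,t}(n-1)\big]\Big) \\
\Psi^{l,(da)}_{s,t}(n) = \Phi'_s\big(\mathrm{net}^d_l(n)\big)\Big(&\delta_{sl}z^d_l(n) + \sum_{q=1}^{N}\big[w^a_{l,p+1+q}(n)\Psi^{q,(da)}_{s,t}(n-1) + w^b_{l,p+1+q}(n)\Psi^{q,(ca)}_{s,t}(n-1) \\
&- w^c_{l,p+1+q}(n)\Psi^{q,(ba)}_{s,t}(n-1) + w^d_{l,p+1+q}(n)\Psi^{q,(aa)}_{s,t}(n-1)\big]\Big)
\end{aligned}
\tag{H.5}
$$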
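The summands in (H.4) and (H.5) have the same sign pattern as the four components of the Hamilton product between a feedback weight and the quaternion formed from the previous sensitivities $(\Psi^{q,(aa)}, \Psi^{q,(ba)}, \Psi^{q,(ca)}, \Psi^{q,(da)})$. The sketch below uses this observation to compute one time step of the four sensitivities. It is an illustration under stated assumptions, not the thesis implementation: the function names and argument layout are hypothetical, and the default split activation derivative assumes a real-valued tanh applied to each component.

```python
import math


def hamilton(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    pa, pb, pc, pd = p
    qa, qb, qc, qd = q
    return (
        pa * qa - pb * qb - pc * qc - pd * qd,
        pa * qb + pb * qa + pc * qd - pd * qc,
        pa * qc + pc * qa + pd * qb - pb * qd,
        pa * qd + pd * qa + pb * qc - pc * qb,
    )


def tanh_derivative(x):
    """Derivative of the real tanh, assumed here as the split activation."""
    return 1.0 - math.tanh(x) ** 2


def split_sensitivity_step(net, z, delta_sl, w_fb, psi_prev, dphi=tanh_derivative):
    """One step of (H.4)-(H.5) for the sensitivities with respect to w^a_{s,t}.

    net      : (net^a, net^b, net^c, net^d) of neuron l at time n
    z        : (z^a, z^b, z^c, z^d), the components in the direct terms
    delta_sl : Kronecker delta, 1 if s == l else 0
    w_fb     : list of N feedback weights w_{l,p+1+q}(n) as (a, b, c, d) tuples
    psi_prev : list of N tuples (Psi^(aa), Psi^(ba), Psi^(ca), Psi^(da)) at n-1
    """
    acc = [delta_sl * zi for zi in z]
    for wq, pq in zip(w_fb, psi_prev):
        for i, term in enumerate(hamilton(wq, pq)):
            acc[i] += term
    # Each component is scaled by the activation derivative at its own net input.
    return tuple(dphi(n_i) * a_i for n_i, a_i in zip(net, acc))
```

For example, `split_sensitivity_step((0, 0, 0, 0), (1, 0, 0, 0), 1, [(1, 2, 3, 4)], [(5, 6, 7, 8)])` combines the direct term with a single feedback term.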
Next, expanding the remaining terms $\partial E/\partial w^b_{s,t}$, $\partial E/\partial w^c_{s,t}$ and $\partial E/\partial w^d_{s,t}$ in (H.2) results in
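$$
\begin{aligned}
\frac{\partial E(n)}{\partial w^b_{s,t}(n)} &= -e^a_l(n)\Psi^{l,(ab)}_{s,t}(n) - e^b_l(n)\Psi^{l,(bb)}_{s,t}(n) - e^c_l(n)\Psi^{l,(cb)}_{s,t}(n) - e^d_l(n)\Psi^{l,(db)}_{s,t}(n) \\
\frac{\partial E(n)}{\partial w^c_{s,t}(n)} &= -e^a_l(n)\Psi^{l,(ac)}_{s,t}(n) - e^b_l(n)\Psi^{l,(bc)}_{s,t}(n) - e^c_l(n)\Psi^{l,(cc)}_{s,t}(n) - e^d_l(n)\Psi^{l,(dc)}_{s,t}(n) \\
\frac{\partial E(n)}{\partial w^d_{s,t}(n)} &= -e^a_l(n)\Psi^{l,(ad)}_{s,t}(n) - e^b_l(n)\Psi^{l,(bd)}_{s,t}(n) - e^c_l(n)\Psi^{l,(cd)}_{s,t}(n) - e^d_l(n)\Psi^{l,(dd)}_{s,t}(n)
\end{aligned}
\tag{H.6}
$$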
Since the expressions in (H.6) have the same form as (H.3), the remaining 12 sensitivities take a form similar to (H.4) and (H.5). These remaining sensitivities are derived in the same manner, and the full expressions are given in Section 5.3.1.
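Once all 16 sensitivities are available, (H.3) and (H.6) combine them with the four error components to give the components of the gradient in (H.2). A minimal sketch for a single output neuron l, as written in (H.3), is shown below; the nested-list layout `psi[y][x]` (output component y, weight component x) and the function name are hypothetical conventions chosen for illustration.

```python
def split_gradient(e, psi):
    """Gradient components (dE/dw^a, dE/dw^b, dE/dw^c, dE/dw^d) from (H.3) and (H.6).

    e   : error components (e^a, e^b, e^c, e^d) of neuron l
    psi : 4x4 nested list, psi[y][x] holds Psi^{l,(yx)}_{s,t}(n), with the
          indices 0..3 standing for the components a, b, c, d
    """
    return tuple(-sum(e[y] * psi[y][x] for y in range(4)) for x in range(4))


if __name__ == "__main__":
    identity = [[1 if y == x else 0 for x in range(4)] for y in range(4)]
    print(split_gradient((1, 2, 3, 4), identity))
```

In (H.2) the four returned values are the real, ı, ȷ and κ parts of the quaternion-valued gradient.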
Appendix I
Derivation of QRTRL
The sensitivity $\Psi^l_{s,t}$ is shown to be
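$$
\Psi^l_{s,t}(n) = \frac{\partial y_l(n)}{\partial w^a_{s,t}(n)} + \frac{\partial y_l(n)}{\partial w^b_{s,t}(n)}\,\imath + \frac{\partial y_l(n)}{\partial w^c_{s,t}(n)}\,\jmath + \frac{\partial y_l(n)}{\partial w^d_{s,t}(n)}\,\kappa
\tag{I.1}
$$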
In order to derive the terms in (I.1), the term $\mathbf{w}_l^T(n)\mathbf{z}(n) = \mathrm{net}_l(n)$ is defined as (the time index $n$ has been dropped due to space limitations)
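$$
\mathrm{net}_l =
\begin{bmatrix}
(\mathbf{w}^a_l)^T\mathbf{z}^a - (\mathbf{w}^b_l)^T\mathbf{z}^b - (\mathbf{w}^c_l)^T\mathbf{z}^c - (\mathbf{w}^d_l)^T\mathbf{z}^d \\
(\mathbf{w}^a_l)^T\mathbf{z}^b + (\mathbf{w}^b_l)^T\mathbf{z}^a + (\mathbf{w}^c_l)^T\mathbf{z}^d - (\mathbf{w}^d_l)^T\mathbf{z}^c \\
(\mathbf{w}^a_l)^T\mathbf{z}^c + (\mathbf{w}^c_l)^T\mathbf{z}^a + (\mathbf{w}^d_l)^T\mathbf{z}^b - (\mathbf{w}^b_l)^T\mathbf{z}^d \\
(\mathbf{w}^a_l)^T\mathbf{z}^d + (\mathbf{w}^d_l)^T\mathbf{z}^a + (\mathbf{w}^b_l)^T\mathbf{z}^c - (\mathbf{w}^c_l)^T\mathbf{z}^b
\end{bmatrix}
\tag{I.2}
$$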
Utilizing (I.2), the term $\partial y_l/\partial w^a_{s,t}$ is shown to be
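$$
\frac{\partial y_l(n)}{\partial w^a_{s,t}(n)} = \Phi'\big(\mathrm{net}_l(n)\big)\Bigg(\delta_{sl}\big(z^a_l(n) + z^b_l(n)\imath + z^c_l(n)\jmath + z^d_l(n)\kappa\big) + \sum_{q=1}^{N} w_{s,t}(n)\frac{\partial y_q(n-1)}{\partial w^a_{s,t}(n)}\Bigg)
\tag{I.3}
$$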
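Similar to the derivation of $\partial y_l/\partial w^a_{s,t}$ in Section 5.3.2, the terms $\partial y_l/\partial w^b_{s,t}$, $\partial y_l/\partial w^c_{s,t}$ and $\partial y_l/\partial w^d_{s,t}$ are given by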
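$$
\begin{aligned}
\frac{\partial y_l(n)}{\partial w^b_{s,t}(n)}\,\imath &= \Phi'\big(\mathrm{net}_l(n)\big)\Bigg(\delta_{sl}\big(-z^a_l(n) - z^b_l(n)\imath + z^c_l(n)\jmath + z^d_l(n)\kappa\big) + \sum_{q=1}^{N} w_{s,t}(n)\frac{\partial y_q(n-1)}{\partial w^b_{s,t}(n)}\,\imath\Bigg) \\
\frac{\partial y_l(n)}{\partial w^c_{s,t}(n)}\,\jmath &= \Phi'\big(\mathrm{net}_l(n)\big)\Bigg(\delta_{sl}\big(-z^a_l(n) + z^b_l(n)\imath - z^c_l(n)\jmath + z^d_l(n)\kappa\big) + \sum_{q=1}^{N} w_{s,t}(n)\frac{\partial y_q(n-1)}{\partial w^c_{s,t}(n)}\,\jmath\Bigg) \\
\frac{\partial y_l(n)}{\partial w^d_{s,t}(n)}\,\kappa &= \Phi'\big(\mathrm{net}_l(n)\big)\Bigg(\delta_{sl}\big(-z^a_l(n) + z^b_l(n)\imath + z^c_l(n)\jmath - z^d_l(n)\kappa\big) + \sum_{q=1}^{N} w_{s,t}(n)\frac{\partial y_q(n-1)}{\partial w^d_{s,t}(n)}\,\kappa\Bigg)
\end{aligned}
\tag{I.4}
$$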
Adding up these terms to determine the sensitivity $\Psi^l_{s,t}$ gives
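$$
\begin{aligned}
\Psi^l_{s,t}(n) &= \Phi'\big(\mathrm{net}_l(n)\big)\Bigg(-2\delta_{sl}\big(z^a_l(n) - z^b_l(n)\imath - z^c_l(n)\jmath - z^d_l(n)\kappa\big) \\
&\qquad + \sum_{q=1}^{N} w_{s,t}(n)\bigg(\frac{\partial y_q(n-1)}{\partial w^a_{s,t}(n)} + \frac{\partial y_q(n-1)}{\partial w^b_{s,t}(n)}\,\imath + \frac{\partial y_q(n-1)}{\partial w^c_{s,t}(n)}\,\jmath + \frac{\partial y_q(n-1)}{\partial w^d_{s,t}(n)}\,\kappa\bigg)\Bigg) \\
&= \Phi'\big(\mathrm{net}_l(n)\big)\Bigg(-2\delta_{sl}z^*_l(n) + \sum_{q=1}^{N} w_{s,t}(n)\Psi^q_{s,t}(n-1)\Bigg)
\end{aligned}
\tag{I.5}
$$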
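The factor of −2 in (I.5) comes from adding the four bracketed input terms of (I.3) and (I.4). Those terms coincide with z, ızı, ȷzȷ and κzκ, whose sum is −2z*. The sketch below checks this identity numerically; the tuple representation of a quaternion and the helper names are illustrative assumptions.

```python
def hamilton(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    pa, pb, pc, pd = p
    qa, qb, qc, qd = q
    return (
        pa * qa - pb * qb - pc * qc - pd * qd,
        pa * qb + pb * qa + pc * qd - pd * qc,
        pa * qc + pc * qa + pd * qb - pb * qd,
        pa * qd + pd * qa + pb * qc - pc * qb,
    )


I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)


def conj(q):
    """Quaternion conjugate."""
    return (q[0], -q[1], -q[2], -q[3])


def sandwich(unit, z):
    """Compute unit * z * unit, e.g. i z i."""
    return hamilton(hamilton(unit, z), unit)


def involution_sum(z):
    """Compute z + i z i + j z j + k z k."""
    total = list(z)
    for unit in (I, J, K):
        total = [t + x for t, x in zip(total, sandwich(unit, z))]
    return tuple(total)


if __name__ == "__main__":
    z = (1, 2, 3, 4)
    print(sandwich(I, z))      # bracketed input term of the first line of (I.4)
    print(involution_sum(z))   # compare with -2 * conj(z)
    print(tuple(-2 * c for c in conj(z)))
```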
The conjugate sensitivity $\Upsilon^l_{s,t}$ is derived in a similar fashion. The sensitivity $\Upsilon^l_{s,t}$ is first shown to be
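$$
\Upsilon^l_{s,t}(n) = \frac{\partial y^*_l(n)}{\partial w^a_{s,t}(n)} + \frac{\partial y^*_l(n)}{\partial w^b_{s,t}(n)}\,\imath + \frac{\partial y^*_l(n)}{\partial w^c_{s,t}(n)}\,\jmath + \frac{\partial y^*_l(n)}{\partial w^d_{s,t}(n)}\,\kappa
\tag{I.6}
$$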
Since $y^*_l(n) = \Phi\big(\mathbf{z}^H(n)\mathbf{w}^*(n)\big)$, the term $\mathbf{z}^H(n)\mathbf{w}^*(n) = \mathrm{net}^*_l(n)$ is first defined as (the time index $n$ is dropped due to space limitations)
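$$
\mathrm{net}^*_l =
\begin{bmatrix}
(\mathbf{w}^a_l)^T\mathbf{z}^a - (\mathbf{w}^b_l)^T\mathbf{z}^b - (\mathbf{w}^c_l)^T\mathbf{z}^c - (\mathbf{w}^d_l)^T\mathbf{z}^d \\
-(\mathbf{w}^a_l)^T\mathbf{z}^b - (\mathbf{w}^b_l)^T\mathbf{z}^a - (\mathbf{w}^c_l)^T\mathbf{z}^d + (\mathbf{w}^d_l)^T\mathbf{z}^c \\
-(\mathbf{w}^a_l)^T\mathbf{z}^c - (\mathbf{w}^c_l)^T\mathbf{z}^a - (\mathbf{w}^d_l)^T\mathbf{z}^b + (\mathbf{w}^b_l)^T\mathbf{z}^d \\
-(\mathbf{w}^a_l)^T\mathbf{z}^d - (\mathbf{w}^d_l)^T\mathbf{z}^a - (\mathbf{w}^b_l)^T\mathbf{z}^c + (\mathbf{w}^c_l)^T\mathbf{z}^b
\end{bmatrix}
\tag{I.7}
$$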
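The rows of (I.7) are the rows of (I.2) with the signs of the three imaginary components reversed, which matches the quaternion identity (wz)* = z*w* with the order of the factors swapped. A short numerical check for a single weight and input pair is sketched below; the tuple representation and helper names are illustrative assumptions.

```python
def hamilton(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples."""
    pa, pb, pc, pd = p
    qa, qb, qc, qd = q
    return (
        pa * qa - pb * qb - pc * qc - pd * qd,
        pa * qb + pb * qa + pc * qd - pd * qc,
        pa * qc + pc * qa + pd * qb - pb * qd,
        pa * qd + pd * qa + pb * qc - pc * qb,
    )


def conj(q):
    """Quaternion conjugate."""
    return (q[0], -q[1], -q[2], -q[3])


if __name__ == "__main__":
    w = (1, 2, 3, 4)
    z = (5, 6, 7, 8)
    print(conj(hamilton(w, z)))            # conjugate of net = w z
    print(hamilton(conj(z), conj(w)))      # z* w*, factors in reversed order
    print(hamilton(conj(w), conj(z)))      # w* z*, differs in general
```

The third line is included only to show that keeping the original order of the factors does not, in general, reproduce the conjugate.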
Following a method similar to that used to determine the sensitivity $\Psi^l_{s,t}$, the differential terms in (I.6) are derived to be
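$$
\begin{aligned}
\frac{\partial y^*_l(n)}{\partial w^a_{s,t}(n)} &= \Phi'\big(\mathrm{net}^*_l(n)\big)\Bigg(\delta_{sl}\big(z^a_l(n) - z^b_l(n)\imath - z^c_l(n)\jmath - z^d_l(n)\kappa\big) + \sum_{q=1}^{N}\frac{\partial y^*_q(n-1)}{\partial w^a_{s,t}(n)}\,w^*_{s,t}(n)\Bigg) \\
\frac{\partial y^*_l(n)}{\partial w^b_{s,t}(n)} &= \Phi'\big(\mathrm{net}^*_l(n)\big)\Bigg(\delta_{sl}\big(z^a_l(n) - z^b_l(n)\imath - z^c_l(n)\jmath - z^d_l(n)\kappa\big) + \sum_{q=1}^{N}\frac{\partial y^*_q(n-1)}{\partial w^b_{s,t}(n)}\,w^*_{s,t}(n)\Bigg) \\
\frac{\partial y^*_l(n)}{\partial w^c_{s,t}(n)} &= \Phi'\big(\mathrm{net}^*_l(n)\big)\Bigg(\delta_{sl}\big(z^a_l(n) - z^b_l(n)\imath - z^c_l(n)\jmath - z^d_l(n)\kappa\big) + \sum_{q=1}^{N}\frac{\partial y^*_q(n-1)}{\partial w^c_{s,t}(n)}\,w^*_{s,t}(n)\Bigg) \\
\frac{\partial y^*_l(n)}{\partial w^d_{s,t}(n)} &= \Phi'\big(\mathrm{net}^*_l(n)\big)\Bigg(\delta_{sl}\big(z^a_l(n) - z^b_l(n)\imath - z^c_l(n)\jmath - z^d_l(n)\kappa\big) + \sum_{q=1}^{N}\frac{\partial y^*_q(n-1)}{\partial w^d_{s,t}(n)}\,w^*_{s,t}(n)\Bigg)
\end{aligned}
\tag{I.8}
$$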
Next, these differential terms are added up to yield the conjugate sensitivity $\Upsilon^l_{s,t}$ as
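$$
\begin{aligned}
\Upsilon^l_{s,t}(n) &= \Phi'\big(\mathrm{net}^*_l(n)\big)\Bigg(4\delta_{sl}\big(z^a_l(n) - z^b_l(n)\imath - z^c_l(n)\jmath - z^d_l(n)\kappa\big) \\
&\qquad + \sum_{q=1}^{N}\bigg(\frac{\partial y^*_q(n-1)}{\partial w^a_{s,t}(n)} + \frac{\partial y^*_q(n-1)}{\partial w^b_{s,t}(n)}\,\imath + \frac{\partial y^*_q(n-1)}{\partial w^c_{s,t}(n)}\,\jmath + \frac{\partial y^*_q(n-1)}{\partial w^d_{s,t}(n)}\,\kappa\bigg)w^*_{s,t}(n)\Bigg) \\
&= \Phi'\big(\mathrm{net}^*_l(n)\big)\Bigg(4\delta_{sl}z^*_l(n) + \sum_{q=1}^{N}\Upsilon^q_{s,t}(n-1)\,w^*_{s,t}(n)\Bigg)
\end{aligned}
\tag{I.9}
$$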