A Geometric Approach to Biomedical Time Series Analysis
by
John Rajiv Vijh Malik
Department of Mathematics
Duke University
Date:
Approved:
Hau-tieng Wu, Advisor
David Dunson
Sayan Mukherjee
James Nolen
Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
in the Department of Mathematics in the Graduate School of
Duke University
2020
Copyright © 2020 by John Rajiv Vijh Malik
All rights reserved
Abstract
Biomedical time series are non-invasive windows through which we may observe
human systems. Although a vast amount of information is hidden in the medical
field’s growing collection of long-term, high-resolution, and multi-modal biomedical
time series, effective algorithms for extracting that information have not yet been
developed. We are particularly interested in the physiological dynamics of a human
system, namely the changes in state that the system experiences over time (which
may be intrinsic or extrinsic in origin). We introduce a mathematical model for a
particular class of biomedical time series, called the wave-shape oscillatory model,
which quantifies the sense in which dynamics are hidden in those time series. There are
two key ideas behind the new model. First, instead of viewing a biomedical time series
as a sequence of measurements made at the sampling rate of the signal, we can often
view it as a sequence of cycles occurring at irregularly-sampled time points. Second,
the “shape” of an individual cycle is assumed to have a one-to-one correspondence with
the state of the system being monitored; as such, changes in system state (dynamics)
can be inferred by tracking changes in cycle shape. Since physiological dynamics are
not random but are well-regulated (except in the most pathological of cases), we can
assume that all of the system’s states lie on a low-dimensional, abstract Riemannian
manifold called the phase manifold. When we model the correspondence between
the hidden system states and the observed cycle shapes using a diffeomorphism, we
allow the topology of the phase manifold to be recovered by methods belonging to the
field of unsupervised manifold learning. In particular, we prove that the physiological
dynamics hidden in a time series adhering to the wave-shape oscillatory model can
be well-recovered by applying the diffusion maps algorithm to the time series’ set
of oscillatory cycles. We provide several applications of the wave-shape oscillatory
model and the associated algorithm for dynamics recovery, including unsupervised and
supervised heartbeat classification, derived respiratory monitoring, intra-operative
cardiovascular monitoring, supervised and unsupervised sleep stage classification, and
f-wave extraction (a single-channel blind source separation problem).
Acknowledgements
Thank you to my supervisor, Hau-tieng Wu; my committee members; my collaborators
Yu-Ting Lin, Yu-Lun Lo, Chun-Li Wang, Elsayed Z. Soliman, Gi-Ren Liu, Nan
Wu, Chao Shen, and Yao Lu; my fellow graduate students at Duke University, the
University of Toronto, and the University of Western Ontario; the faculty members at
the University of Western Ontario; my family in Canada; and my girlfriend.
Contents
Abstract
Acknowledgements
List of Tables
List of Figures
1 Introduction
1.1 Biomedical Time Series
1.2 Dynamics
1.3 The Wave-shape Oscillatory Model
1.4 Dynamics in Chaotic Time Series
2 Diffusion Maps
2.1 Diffusion on a Measure Space
2.2 Diffusion on a Compact Domain
2.3 Diffusion on a Manifold
2.3.1 Finite Samples
2.3.2 Robustness to Noise
3 Cardiovascular Waveform Analysis
3.1 Heartbeat Classification
3.1.1 Unsupervised Extraction of Morphological Dynamics
3.1.2 Supervised Ventricular Ectopy Detection
3.1.3 Methods
3.1.4 Results
3.1.5 Discussion
3.2 Electrocardiogram-derived Respiration
3.2.1 Modeling Respiratory Dynamics in the ECG
3.2.2 DM-EDR
3.2.3 Material and Results
3.3 Intraoperative Cardiac Monitoring
4 Decoding Sleep Dynamics using Multiple EEG Channels
4.1 Background and Significance
4.2 Unsupervised Extraction of Sleep Dynamics
4.3 Methods
4.3.1 Proposed Algorithm
4.3.2 Annotated Polysomnograms
4.3.3 Evaluation Metrics
4.4 Results
4.5 Discussion and Conclusion
4.5.1 Limitations
5 Supervised Extraction of Sleep Dynamics using Instantaneous Heart Rate
5.1 Introduction
5.1.1 Heart Rate Variability and Sleep
5.2 Materials and Methods
5.2.1 Data
5.2.2 Instantaneous Heart Rate
5.2.3 Convolutional Neural Network
5.2.4 Performance Evaluation
5.2.5 Robustness Check
5.3 Results
5.3.1 Model Validation
5.3.2 Transfer Proficiency Among Monitoring Devices
5.3.3 Sensitivity Analysis
5.4 Further Exploration of the CNN via Data and Feature Visualization
5.5 Discussion
5.5.1 Comparison with Previous Work
5.5.2 Limitations and Future Work
5.6 Conclusion
5.7 Supplementary Results
6 Blind Source Separation in Atrial Fibrillation
6.1 Methods
6.1.1 DD-NLEM
6.1.2 A Variation of DD-NLEM
6.1.3 Material
6.1.4 Other Algorithms
6.1.5 Evaluation Metrics
6.2 Results
6.2.1 Comparison with Previous Algorithms
6.2.2 Validation of the Proposed mVR Index
6.3 Discussion and Conclusion
Bibliography
Biography
List of Tables
3.1 Annotated single-lead ECG databases
3.2 Performance of the proposed ventricular ectopy detector
3.3 Comparison with previous results on DS2
4.1 Comparison of linear sensor fusion methods
4.2 F1 scores obtained by the sleep stage classifier when using different pairings of EEG channels
4.3 Performance of the proposed automatic sleep stage classifier
5.1 Performance statistics for the CNN model trained on CGMH-training
5.2 Transfer proficiency of the CNN model among different monitoring devices
5.3 Sensitivity analysis for the CNN model trained on CGMH-training with various input sizes
5.4 Performance statistics for the CNN models trained on the DREAMS Subjects Database and the UCDSADB
5.5 Performance statistics for the REM vs. non-REM model trained on CGMH-training (REM)
5.6 Performance statistics for the scattering transform-based model trained on CGMH-training
6.1 Comparison of existing single-lead f-wave extraction algorithms from three viewpoints
6.2 Comparison of f-wave extraction algorithms on simulated one-hour ECG signals
6.3 Comparison of f-wave extraction algorithms on real one-hour Holter signals
List of Figures
1.1 Simultaneously recorded 10-second ECG, PPG, and ABP signals
1.2 The instantaneous amplitude of an ECG signal gives surrogate respiratory information
1.3 The instantaneous frequency of an ECG signal is known as the instantaneous heart rate
1.4 Landmarks on the enigmatic ECG template correspond to distinct stages of a healthy heart contraction
1.5 Atrial fibrillation is characterized by a lack of P waves, irregularly-occurring ventricular contractions, and a fibrillatory wave
1.6 ECG signals featuring premature ventricular contractions are difficult to model because the wave-shape is not slowly varying
1.7 An illustration of the wave-shape manifold in Theorem 1.3.3
3.1 A pathological ECG signal featuring morphologically distinct premature ventricular contractions and fusion beats
3.2 A mapping of N = 2953 cardiac cycles into R² afforded by the top two coordinates of the time-1 diffusion map
3.3 A projection of the morphological dynamics on the recovered wave-shape manifold
3.4 A mapping of the set of QRS complexes to R² using the top two non-trivial coordinates of the time-1 diffusion map
3.5 The EDR signal (obtained from either the traditional method or the DM-EDR algorithm) well-approximates the respiratory flow signal
3.6 Morphological differences between normal and PVC beats are reflected in the top non-trivial eigenvector of the diffusion kernel
3.7 ECG pulses preceding, during, and following the MI event
3.8 A mapping of the ECG waveforms to R³ afforded by the first three non-trivial coordinates of the diffusion map
3.9 ABP pulses preceding, during, and following the MI event
3.10 A mapping of the ABP pulse waveforms to R² afforded by the first and third non-trivial coordinates of the diffusion map
4.1 A mapping of N = 734 locally averaged, time-localized power spectra into R³ afforded by the top three non-trivial coordinates of the time-1 diffusion map
4.2 A projection of the morphological dynamics on the recovered phase manifold
5.1 The architecture of the 1-dimensional convolutional neural network
5.2 The architecture of a single convolution block
5.3 We apply the trained CNN model to each subject in CGMH-validation individually
5.4 We apply the trained CNN model to each subject in the UCDSADB individually
5.5 We construct input signals of various lengths by additionally considering data which precedes the labeled epoch
5.6 We illustrate the features captured by the trained CNN model
5.7 We assess the predictive potential of the 10 features learned by the CNN model
5.8 Typical activations in the last convolutional layer of the CNN model
6.1 An ECG signal featuring atrial fibrillation
6.2 After running each of the seven algorithms on a simulated, one-hour ECG signal featuring Af, we plot the extracted f-waves
6.3 After running each of the five previous algorithms on a real, one-hour Holter recording featuring Af, we plot the extracted f-waves
6.4 After running DD-NLEM and its variation NLEM on a real, one-hour Holter recording featuring Af, we plot the extracted f-waves
6.5 A scatter plot of the mVR (mean modified ventricular residue) index versus the normalized mean squared error (NMSE)
Chapter 1
Introduction
1.1 Biomedical Time Series
Biomedical time series (vital signs) are obtained when energy emitted by the human
body is passed through a transducer to a digital recording device. A variety of
energy mediums can be digitized, including electrical energy, fluid pressure, vibration,
gas partial pressure, and sound. Examples of biomedical time series include the
electrocardiogram (ECG), the electroencephalogram (EEG), the photoplethysmogram
(PPG), the electrooculogram (EOG), and the electromyogram (EMG). Some recording
tools are invasive, but others are non-invasive; some tools are active, but others observe
passively. Vital signs are windows through which we may view the physiological
state of systems within the human body. The ECG observes the cardiac system, the
EEG observes the nervous system, the PPG observes the venous system, the EOG
documents eye movement, and the EMG observes the musculoskeletal system.
In most cases, a fixed number of measurements (called the sampling rate of the
signal) are made per second. Vital signs recorded at higher sampling rates contain
larger amounts of information but require increased processing power and more storage
space. Vital signs can be recorded for any amount of time, provided enough storage
space is available. Multiple recordings may be made simultaneously, in which case
information from a variety of viewpoints can be fused. On one hand, the resolution,
length, and number of modes in a biomedical time series all add richness to the
amount of information documented by the time series, but on the other hand, these
factors add complexity to the task of information extraction. The task of information
extraction is also complicated by the fact that recordings are noisy, contain nuisance
background information, are rarely of pristine quality, and lack spatial resolution.
Biomedical time series are used to guide medical decision-making. Time series are
typically assessed by the eye of a trained professional. This process is expensive and
error-prone. When appropriate, the task of signal annotation should be diverted to
an intelligent, computer-based algorithm. In fact, as technological advancements in
hardware and processing power make obtaining physiological data much easier, the
growing wealth of available physiological data is driving the necessity of automatic
annotation algorithms. A plethora of data exists, but effective analysis algorithms do
not. Finally, computer-based algorithms are necessary as the human eye is not capable
of detecting finer properties of high-resolution signals and compiling information from
a growing number of sources.
Biomedical time series are used in a variety of settings. The iconic 12-lead
surface ECG is used to diagnose cardiac arrhythmia and assess the need for a
rhythm intervention. These recordings are typically 10 seconds long, and aggregating
the unique perspectives provided by the individual recordings (made from different
locations on the body) allows an electrophysiologist to pinpoint the location of a
failing heart structure. The cardiac system is also studied in long-term settings; heart
rate information is collected over a number of days using compact mobile monitoring
devices, and scientists connect the presence of low-frequency oscillations in heart rate
(heart rate variability, or HRV) to patient outcomes. Large databases (with thousands
of subjects) are analyzed offline to identify modifiable risk factors present in the ECG
signal. In the operating room, the EEG-derived bispectral index (BIS) is used to
monitor depth of anesthesia. The arterial blood pressure signal (ABP, an invasive
measurement of the fluid pressure in the arteries) monitors cardiovascular condition
in the operating room, the emergency room, and the intensive care unit (ICU). In
the sleep laboratory, a collection of EEG, EOG, and EMG signals mark transitions
between sleep stages during a 6-8 hour night of sleep. Finally, the market for wearable
health monitoring devices has exploded in the past decade, and these inexpensive
devices are additionally being used to screen patients in developing countries, where
access to trained professionals may be limited.
Broadly speaking, biomedical time series can be divided into two classes. In the
first class lie signals such as the ECG, PPG, and ABP that are sequences of disjoint
cycles (see Figure 1.1). Instead of viewing such signals as a series of measurements
made at the sampling rate of the signal, we can view them as a sequence of heartbeats
occurring at irregularly-sampled time points. From this viewpoint, we can make
precise the principles guiding traditional approaches to electrophysiology; the shape of
each individual cycle corresponds to the state of the system during its appearance in
the time series, and changes in cycle shape are reflections of changes in system state.
The contributions of this thesis include a model describing this class of oscillatory
biomedical time series, called the wave-shape oscillatory model, and a model for
the correspondence between cycle shape and system state called the wave-shape
manifold model. The second class of signals includes the EEG, EOG, and EMG
which are chaotic in nature and do not feature clear cycles. Nevertheless, their spectral
properties reflect the state of the systems being monitored, and we can analyze how
their spectral properties change from time to time using time-frequency analysis
techniques such as the short-time Fourier transform (STFT) and the continuous
wavelet transform (CWT). Beginning in the third chapter, each succeeding chapter
in this thesis is devoted to a particular subset of biomedical time series. We analyze
oscillatory signals in the time domain, and we analyze chaotic signals in the frequency
domain. The remainder of this chapter will feature a discussion of dynamics in
biomedical time series (the primary information we are interested in assessing),
the wave-shape oscillatory model, and the wave-shape manifold model. This work
originally appeared in [LMW19]. The second chapter will review a manifold learning
technique, diffusion maps (DM), which allows us to perform dynamics extraction in
the third chapter and onward.
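As a rough illustration of the time-frequency viewpoint mentioned above, the following sketch computes a short-time Fourier transform of a synthetic signal whose dominant frequency changes halfway through the recording. The test signal, the sampling rate `fs`, the window length, and the helper `stft_magnitude` are illustrative assumptions and are not taken from this thesis.

```python
import numpy as np

def stft_magnitude(x, fs, win_len, hop):
    """Magnitude of a Hann-windowed short-time Fourier transform.

    Hypothetical helper written for illustration only. Returns frame
    centre times (s), frequencies (Hz), and an array of magnitudes with
    shape (n_frames, n_freqs).
    """
    window = np.hanning(win_len)
    starts = np.arange(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[k:k + win_len] * window for k in starts])
    mags = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (starts + win_len / 2) / fs
    return times, freqs, mags

fs = 100.0                      # assumed sampling rate in Hz
t = np.arange(2000) / fs        # 20 seconds of samples
# Synthetic signal: 5 Hz during the first 10 s, 20 Hz afterwards.
x = np.where(t < 10, np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 20 * t))

times, freqs, mags = stft_magnitude(x, fs, win_len=200, hop=100)
dominant = freqs[np.argmax(mags, axis=1)]   # dominant frequency in each frame
```

Tracking `dominant` over `times` gives a crude view of how the spectral content of the signal changes over time; the window length trades time resolution against frequency resolution.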
Figure 1.1: Simultaneously recorded 10-second ECG (top), PPG (middle), and ABP (bottom) signals. Cardiac cycles are delineated to the left and to the right of the coloured markers. Note that the cycles in the three time series are not precisely aligned because these signals measure cardiovascular activity from a variety of places in the body and through different mediums.
1.2 Dynamics
The dynamics of a human system are the changes in state it experiences due to
interactions with an internal regulator such as the autonomic nervous system, peripheral
systems, and the environment outside the body. For example, a healthy sleeper
naturally oscillates between the REM, N1, N2, and N3 sleep stages but wakes up
when externally stimulated. We are interested in detecting and quantifying these
changes in state. To build a rigorous algorithm which extracts dynamical information
from a biomedical time series, we need to fix a reasonable model for physiological
dynamics, and we need to clarify the sense in which dynamics are manifested in the
time series.
A model for physiological dynamics is a collection of properties which any real set
of physiological dynamics could, within reason, be assumed to possess. This collection
of properties should be general enough to encapsulate as many types of physiological
dynamics as possible. Having fixed a system of interest, we assume the state-space
model for dynamics, in which
the collection of system states is a smooth, compact, low-dimensional Riemannian
manifold N called the phase manifold;
the state of the system at time t ∈ R is described by ψ(t), where ψ:R→ N .
We assume no additional properties for ψ. However, in some cases, it will be useful to
assign additional regularity to ψ. For example, we could consider ψ to be driven by
a vector field X on N, in the sense that ψ′(t) = X_{ψ(t)} (in which case ψ is an integral
curve), or ψ′(t) = X_{ψ(t)} + dω_t, where dω is the derivative of Brownian motion. Note
that N is not assumed to be connected or to lack boundary. The state-space model for
dynamics was developed by Boltzmann, Poincaré, and Gibbs, and it is commonly
considered in dynamical systems theory to model and predict physical phenomena.
From a physiological viewpoint, the state-space model for dynamics, including the
low dimensionality of the phase manifold, is reasonable due to the phenomenon of
homeostasis, wherein a human system remains at equilibrium when unperturbed. As a
consequence, the states we observe are not random but are well-constrained. On the
other hand, when a system is perturbed, the response of the perturbed system is to
initiate a feedback mechanism through which the system may return to an equilibrium.
Negative feedback mechanisms bring the system back to the same equilibrium, and
positive feedback mechanisms bring the system to a new equilibrium. In this sense,
transitions between equilibria are also well-constrained by the scope of available
feedback mechanisms.
A full recovery of the dynamics is an isometric embedding Θ from the abstract
phase manifold N to a Hilbert space. In this case, the geodesic distance between two
points on N is equal to the ℓ² distance between their images in the Hilbert space.
Moreover, the mapping Θ pushes the time-dependent ψ to a curve Θ ∘ ψ on the
embedded phase manifold, which is a new time series summarizing the evolution of
system state over time. Our recovery is not required to quantify system states in
an absolute sense, but only in a relative sense, as our recovery is unique only up to
isometry. A partial recovery of the dynamics occurs when Θ is simply an embedding of
the phase manifold; in many cases, the observation process distorts the phase manifold,
making a recovery of the metric almost impossible. Nevertheless, an embedding of
the phase manifold will preserve topological information.
1.3 The Wave-shape Oscillatory Model
In this section, we make precise the signals that belong to our oscillatory class by
defining the wave-shape oscillatory model. We describe how dynamics are manifested
in these oscillatory biomedical time series. In particular, we define the wave-shape
manifold, through which we may access the phase manifold.
Definition 1.3.1. A biomedical time series f: R → R adheres to the wave-shape
oscillatory model with wave-shape manifold M ⊂ L²(R) if
$$f(t) = \sum_{j \geq 1} s_j(t - t_j) + \Phi(t), \qquad t \in \mathbb{R}, \tag{1.1}$$
where t_j ∈ R is the time at which the j-th cycle occurs, s_j ∈ M is the j-th cycle, and
Φ: R → R is stationary random noise.
Definition 1.3.2. Let H be the subspace of C²(R) consisting of functions whose
support is a subset of [−1/2, 1/2]. A wave-shape manifold is a smooth, compact, and
low-dimensional Riemannian manifold embedded in H.
We describe the correspondence between system state and cycle shape using a
diffeomorphism ρ: N → M which encodes signal-specific distortions incurred during
the observation process. The wave-shape manifold takes an approach to modeling
signals like the ECG, PPG, and ABP that generalizes previous mathematical models.
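To make the two definitions and the role of ρ concrete, the following sketch synthesizes a signal of the form (1.1). Every modeling choice in it is an assumption made for illustration: the phase manifold is taken to be the unit circle, the state ψ(t) rotates slowly around it, the map `rho` sends a state θ to a bump-shaped cycle supported on [−1/2, 1/2], and the noise Φ is white Gaussian noise. None of these choices comes from the thesis.

```python
import numpy as np

def window(u):
    """A C^2 bump supported on [-1/2, 1/2] (illustrative choice)."""
    return np.where(np.abs(u) <= 0.5, np.cos(np.pi * u) ** 4, 0.0)

def rho(theta, u):
    """Hypothetical map from a state theta on the circle to a cycle shape.

    The cosine of theta scales the cycle and the sine of theta skews it,
    so distinct states on the circle give distinct shapes.
    """
    skew = np.sin(2 * np.pi * u)
    return window(u) * (1.0 + 0.3 * np.cos(theta) + 0.3 * np.sin(theta) * skew)

def synthesize(n_cycles=50, fs=200.0, noise_std=0.01, seed=0):
    """Sample f(t) = sum_j s_j(t - t_j) + Phi(t) on a uniform grid."""
    rng = np.random.default_rng(seed)
    # Irregular cycle times; gaps of at least 1 keep the cycles disjoint.
    t_j = np.cumsum(1.0 + 0.2 * rng.random(n_cycles))
    theta_j = 2 * np.pi * 0.05 * t_j        # assumed state psi(t_j), slowly rotating
    t = np.arange(int((t_j[-1] + 1.0) * fs)) / fs
    f = np.zeros_like(t)
    for tj, th in zip(t_j, theta_j):
        f += rho(th, t - tj)                # the j-th cycle s_j = rho(psi(t_j))
    f += noise_std * rng.standard_normal(t.size)   # white noise standing in for Phi
    return t, f, t_j, theta_j

t, f, t_j, theta_j = synthesize()
```

In this toy setting, recovering the dynamics amounts to recovering the angles `theta_j` (up to rotation and reflection) from the cycles cut out of `f` around the times `t_j`.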
The Phenomenological Model
To begin describing these previous mathematical models and their relationships to
the wave-shape oscillatory model, consider a time series of the form
$$f(t) = s(t), \qquad t \in \mathbb{R}, \tag{1.2}$$
where s ∈ C^{1,α}(R) is 1-periodic (in the sense that s(t) = s(t + 1) for all t ∈ R)
and possibly non-sinusoidal. In particular, the function s on the interval [−1/2, 1/2]
describes the “shape” of the cycles appearing in the time series. This simple model is
insufficient for describing physiological time series because physiological time series
almost always vary in frequency and amplitude over time (due to the dynamics of the
underlying system). The phenomenological model proposed in [Wu13, Yan15, HS16]
takes amplitude and frequency modulation into account. The oscillatory time series
f: R → R is modeled as
$$f(t) = a(t)\,s(\phi(t)) + T(t) + \Phi(t), \qquad t \in \mathbb{R}, \tag{1.3}$$
where
a ∈ C¹(R) ∩ L^∞(R) is a positive, smooth function describing amplitude modulation;
s ∈ C^{1,α}(R) is 1-periodic and describes the “shape” of the oscillations in the time series (coined the wave-shape function [Wu13]);
φ ∈ C²(R) is a smooth, monotonically-increasing function describing phase modulation (φ′ ∈ L^∞(R) is termed the instantaneous frequency of f);
T ∈ C¹(R) is the trend that intuitively is “locally constant” [CCW14, (6)];
Φ: R → R models the inevitable random noise, which we assume to be stationary.
In this work, we always assume that the trend has been removed. For this reason, we
do not mention a trend term in the definition of the wave-shape oscillatory model.
Since a and φ′ are not required to be constant, this model is suitable for describing
physiological time series that change in amplitude and frequency over time (with
some degree of regularity). Under this phenomenological model, the usual mission in
time series analysis is estimating a, s, φ, and the statistical properties of Φ from one
realization of f . To ensure identifiability, we need to assume that a and φ′ are slowly
varying. To be specific, we say that a and φ′ are slowly varying with parameter ε > 0
if the following conditions hold.
|a′(t)| ≤ ε φ′(t) for all t ∈ R;
|φ′′(t)| ≤ ε φ′(t) for all t ∈ R.
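The phenomenological model (1.3) and the two slow-variation conditions can be sketched numerically as follows. The particular functions `a`, `phi`, and `s`, the noise level, and the time grid are hypothetical choices made only for illustration, and the trend T is taken to be zero as assumed in the text.

```python
import numpy as np

def s(u):
    """A 1-periodic, non-sinusoidal wave-shape function (illustrative choice)."""
    return np.cos(2 * np.pi * u) + 0.5 * np.cos(4 * np.pi * u + 0.3)

# Hypothetical amplitude a(t) and phase phi(t), with closed-form derivatives.
def a(t):
    return 1.0 + 0.1 * np.sin(2 * np.pi * 0.05 * t)

def a_prime(t):
    return 0.1 * (2 * np.pi * 0.05) * np.cos(2 * np.pi * 0.05 * t)

def phi(t):
    return 1.2 * t + 0.5 * np.sin(2 * np.pi * 0.02 * t)

def phi_prime(t):
    return 1.2 + 0.5 * (2 * np.pi * 0.02) * np.cos(2 * np.pi * 0.02 * t)

def phi_second(t):
    return -0.5 * (2 * np.pi * 0.02) ** 2 * np.sin(2 * np.pi * 0.02 * t)

t = np.linspace(0.0, 100.0, 20001)
rng = np.random.default_rng(0)
noise = 0.05 * rng.standard_normal(t.size)   # white noise standing in for Phi
f = a(t) * s(phi(t)) + noise                 # model (1.3) with the trend removed

# Smallest epsilon on this grid for which both slow-variation conditions hold.
eps = max(np.max(np.abs(a_prime(t)) / phi_prime(t)),
          np.max(np.abs(phi_second(t)) / phi_prime(t)))
```

For these choices `eps` is small (a few hundredths), which is the regime in which a and φ′ change little over the course of a single oscillation.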
When we recover a and φ, we are recovering dynamical information about the system
being monitored in a limited (but in many cases sufficient) sense. When we recover s,
we are recovering a representation of the “mean” state for the system. This statement
will be made more precise later, but it corresponds well with traditional approaches
to electrophysiology. We now provide some specific examples of how a and φ encode
the changing physiological state of the system being monitored.
Example 1. The instantaneous amplitude of a single-lead ECG signal is directly
related to respiratory volume (see Figure 1.2). When the lung is full of air, the ECG
electrode moves further from the heart and thoracic impedance increases, causing
the amplitude of the recorded ECG signal to decrease. On the other hand, when the
lung is empty, the ECG electrode moves closer to the heart and thoracic impedance
decreases [CAM+06, Chapter 8]. This relationship between respiratory volume and
the amplitude of the ECG signal has led to the design of algorithms which estimate
the respiratory signal from the ECG signal. The estimated respiratory signal is called
the ECG-derived respiration (EDR) signal. We discuss EDR signals in Section 3.2.
Figure 1.2: The instantaneous amplitude of an ECG signal gives surrogate respiratory information. Top: we plot a respiratory flow signal in blue; bottom: we plot the simultaneously-recorded ECG signal and illustrate its amplitude modulation using the dotted pink lines.
Example 2. While the pulse rate of the sinoatrial (SA) node is constant, heart rate
is generally not constant [SMZ14, Gla09]. This discrepancy is caused by neural and
neuro-chemical influences on the pathway from the SA node to the cardiac muscles.
Heart rate can be modeled as the instantaneous frequency of the ECG signal (see
Figure 1.3); variations in heart rate (which correspond to frequency modulation of the
ECG signal) are the main object of study in the field of heart rate variability (HRV)
analysis. HRV analysis will be studied in Chapter 5.
Figure 1.3: The instantaneous frequency of an ECG signal is known as the instantaneous heart rate. In red, we plot the instantaneous heart rate signal corresponding to the ECG signal in black.
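One simple way to approximate the instantaneous heart rate from detected beat times is sketched below. It is a common textbook estimator (the reciprocal of each beat-to-beat interval, interpolated onto a uniform grid) and is only an assumption here; the estimator actually used in Chapter 5 may differ. The function name, the output rate `fs_out`, and the example beat times are hypothetical.

```python
import numpy as np

def instantaneous_heart_rate(beat_times, fs_out=4.0):
    """Estimate heart rate (beats per minute) on a uniform time grid.

    beat_times: increasing beat (e.g. R-peak) times in seconds.
    fs_out: sampling rate of the output grid in Hz (assumed value).
    """
    beat_times = np.asarray(beat_times, dtype=float)
    rr = np.diff(beat_times)          # beat-to-beat intervals in seconds
    ihr_at_beats = 60.0 / rr          # rate assigned to the later beat of each pair
    grid = np.arange(beat_times[1], beat_times[-1], 1.0 / fs_out)
    return grid, np.interp(grid, beat_times[1:], ihr_at_beats)

# Made-up beat times: two 1.0 s intervals followed by three 0.8 s intervals.
beats = np.array([0.0, 1.0, 2.0, 2.8, 3.6, 4.4])
grid, ihr = instantaneous_heart_rate(beats)
```

Linear interpolation is used here for simplicity; smoother interpolants are often preferred when the resulting signal is analyzed in the frequency domain.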
The wave-shape function s deserves some more discussion. In particular, we
discuss the physiological information it encodes and how recovering s is akin to
traditional electrophysiological practices. When analyzing ECG, electrocardiologists
derive physiological information from the cardiac waveform independently of the
amplitude and frequency modulation inherent in the signal. Their approach could be
called a landmark approach to biomedical time series analysis, wherein they notice
that the components of the cardiac waveform (as manifested in the ECG) correspond
to different stages of heart contraction (and hence the heart substructures that are
active during those stages). Landmarks on the enigmatic template are conventionally
labeled as shown in Figure 1.4, and each deflection corresponds to a particular stage
of heart contraction.
Figure 1.4: Landmarks (P, Q, R, S, T, and U) on the enigmatic ECG template correspond to distinct stages of a healthy heart contraction. The P wave corresponds to the depolarization of the atria, the QRS complex corresponds to the depolarization of the ventricles, the T wave corresponds to the repolarization of the ventricles, and the U wave corresponds to the repolarization of the papillary muscles.
Example 3. When diagnosing atrial fibrillation (a cardiac arrhythmia associated
with heart failure and stroke) using the ECG signal, cardiologists look for the absence
of P waves, among other considerations (see Figure 1.5). The cardiac cycle is assessed
independently of the frequency modulation caused by heart rate, the amplitude
modulation caused by respiration, and the noise in the signal. In atrial fibrillation, an
additional confounding factor when recovering the cardiac waveform s is the presence
of a noise-like component called the fibrillatory wave (f-wave). Atrial fibrillation,
including its modeling and analysis, is further discussed in Chapter 6.
Example 4. When diagnosing myocardial infarction (blockage of the coronary
artery), electrocardiologists look for a phenomenon called ST segment elevation,
Figure 1.5: Atrial fibrillation is characterized by a lack of P waves, irregularly-occurring ventricular contractions, and a rapid component called the fibrillatory wave.
wherein the cardiac waveform in any precordial lead reaches an electric potential of
approximately 0.2 mV between the S and T landmarks. While a high heart rate
usually accompanies a heart attack, assessing the cardiac waveform independently of
the heart rate allows physicians to detect the difference between, for example, high
heart rates that accompany physical activity and high heart rates that accompany
adverse cardiac events. This phenomenon will be studied in Section 3.3.
Example 5. In ECG-based cardiac waveform analysis, the QT interval is the length
of time between the start of the Q wave and the end of the T wave. Since heart rate
is known to be negatively correlated with cardiac cycle duration, QT interval lengths
are commonly “corrected” so that the QT interval lengths of subjects with different
heart rates can be effectively compared. This correction process roughly amounts to
recovering s independently of any frequency modulation. An abnormal QT interval is
associated with an increased risk for sudden cardiac death. In clinical trials for new
medications, pharmaceutical companies use the corrected QT interval to assess the
increased risk for sudden cardiac death that a patient taking the new drug will incur.
We return to a mathematical discussion of the wave-shape function. Due to the
smoothness of s, it has a point-wise Fourier representation
\[ s(t) = \alpha_0 + \sum_{k=1}^{\infty} \alpha_k \cos(2\pi k t + \beta_k), \qquad t \in \mathbb{R}, \tag{1.4} \]
where α0 ∈ R and αk ≥ 0 are associated with the Fourier coefficients of s, and
βk ∈ [0, 2π). Hence, we have the following expansion for a(t)s(φ(t)) in (1.3):
\[ a(t)s(\phi(t)) = a(t)\Big[\alpha_0 + \sum_{k=1}^{\infty} \alpha_k \cos(2\pi k \phi(t) + \beta_k)\Big], \qquad t \in \mathbb{R}. \tag{1.5} \]
The signal a(t)s(φ(t)) can be interpreted in two different ways. First, we could view
it as an oscillatory signal with one oscillatory component; in this case, the oscillation
is non-sinusoidal. Second, we could view it as an oscillatory signal with multiple
oscillatory components, each having a cosine oscillatory pattern (1.5); in this case, we
call the first oscillatory component a(t)α1 cos(2πφ(t) + β1) the fundamental compo-
nent and a(t)αk cos(2πkφ(t) + βk) (for k ≥ 2) the k-th multiple of the fundamental
component. Clearly, α0 is the zero-frequency term of the wave-shape function, and the
instantaneous frequency of the k-th multiple is k-times that of the fundamental compo-
nent. The second viewpoint is easier to analyze theoretically (e.g. via time-frequency
methods such as the short-time Fourier transform and its associated synchrosqueezed
representation). However, as described in the above examples, the first viewpoint is
more physiological and coincides well with traditional electrophysiological practices.
Physicians diagnose various cardiac diseases by reading the wave-shape function.
However, physicians read the waveform in the time domain and not in terms of
the waveform’s Fourier decomposition. We have thus confirmed the utility of the
wave-shape function for modeling and studying an oscillatory biomedical time series.
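The two viewpoints on a(t)s(φ(t)) can be compared numerically. The following is a minimal sketch, assuming a hypothetical wave-shape function with three harmonics and illustrative choices of a(t) and φ(t); none of these values come from the text. It evaluates the signal once as a single non-sinusoidal component and once as a sum of cosine multiples of the fundamental component, as in (1.5).

```python
import numpy as np

# Hypothetical Fourier data for a wave-shape function s, as in (1.4).
alpha = np.array([0.2, 1.0, 0.5, 0.25])  # alpha_0, ..., alpha_3 (illustrative)
beta = np.array([0.0, 0.3, 1.1, 2.0])    # beta_1, ..., beta_3 (beta[0] is unused)

def wave_shape(u):
    """Evaluate s(u) = alpha_0 + sum_k alpha_k cos(2 pi k u + beta_k)."""
    u = np.asarray(u, dtype=float)
    k = np.arange(1, len(alpha))[:, None]
    terms = alpha[1:, None] * np.cos(2 * np.pi * k * u[None, :] + beta[1:, None])
    return alpha[0] + terms.sum(axis=0)

t = np.linspace(0.0, 10.0, 5000)
a = 1.0 + 0.1 * np.cos(2 * np.pi * 0.05 * t)       # slowly varying amplitude a(t)
phi = 1.2 * t + 0.1 * np.sin(2 * np.pi * 0.1 * t)  # phase phi(t) with phi'(t) > 0

# Viewpoint 1: a single non-sinusoidal oscillatory component.
single_component = a * wave_shape(phi)

# Viewpoint 2: zero-frequency term plus cosine multiples of the fundamental.
multiples = [a * alpha[k] * np.cos(2 * np.pi * k * phi + beta[k])
             for k in range(1, len(alpha))]
multi_component = a * alpha[0] + np.sum(multiples, axis=0)

max_difference = np.max(np.abs(single_component - multi_component))
```

The two arrays agree up to floating-point round-off, which reflects that the viewpoints describe the same signal and differ only in how it is read.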
Generalizing the Phenomenological Model
While the phenomenological model (1.3) is able to capture the non-sinusoidal nature
of cycles in an oscillatory biomedical time series, it does not capture time-dependent
changes in oscillatory morphology (aside from those deviations due to amplitude and
frequency modulation). Physiologically, this time-varying oscillatory morphology is
important. Modeling changes in oscillatory morphology is especially relevant due to
the increased prevalence of long-term, mobile cardiac monitoring.
Example 6. During a surgical procedure in which a general anesthetic is required,
dosage is varied and administered based on the discretion of the supervising anesthesi-
ologist or nurse anesthetist; vital signs are recorded for the duration of the procedure,
which may be several hours. The changing concentration of anesthetic in the body will
continuously modulate the cardiac waveform of the patient. The dynamics underlying
general anesthesia are discussed in [WWH+19], and the analysis is extended to ABP
signals.
Example 7. In the mobile monitoring of patients suspected of having heart disease,
recordings are many hours long. However, adverse cardiac events such as paroxys-
mal supraventricular tachycardia or myocardial ischemia are only a few minutes in
duration; the rest of the time, the patient appears normal. In this case, the cardiac
waveform changes shape depending on the modulating cardiac health of the patient
being monitored.
Motivated by the above examples, the phenomenological model is generalized
to fully capture the time-varying morphology of cycles [LSW18]. A similar model
is considered in the followup research articles [XYD18, Yan17]. The main idea is
intuitive. We generalize $\{\alpha_k\}_{k=0}^{\infty}$ in (1.4) to be time-varying: for each k ≥ 0 we let
$a(t)\alpha_k$ be a new function $A_k(t)$. We also let each $k\phi(t) + \beta_k/(2\pi)$ be a new function
$\phi_k(t)$. We then have
\[ a(t)s(\phi(t)) = A_0(t) + \sum_{k=1}^{\infty} A_k(t) \cos(2\pi \phi_k(t)), \qquad t \in \mathbb{R}, \tag{1.6} \]
where Ak and φk satisfy a few conditions. In addition to the slowly varying assumption
imposed in the phenomenological model, we have
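\[ |\phi_k'(t) - k\phi_1'(t)| \le \varepsilon \phi_1'(t) \quad \text{for all } k \in \mathbb{N}^+, \text{ where } \phi_1' > 0; \]
\[ A_k(t) \le c_k A_1(t) \quad \text{for all } k \in \mathbb{N}^+, \text{ where } A_1 > 0 \text{ and } \{c_k\}_{k=1}^{\infty} \text{ is an } \ell^1 \text{ sequence}; \]
\[ |\phi_k''(t)| \le \varepsilon k \phi_1'(t) \quad \text{for all } k \in \mathbb{N}^+; \]
\[ |A_k'(t)| \le \varepsilon c_k \phi_1'(t) \quad \text{for all } k \in \mathbb{N}^+. \]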
Other regularity conditions listed in [LSW18, Definition 2.1] also hold. Here, the
conditions |φ′k(t) − kφ′1(t)|≤ εφ′1(t) and Ak(t) ≤ ckA1(t) capture the fact that the
wave-shape is not fixed. The conditions |φ′′k(t)|≤ εkφ′1(t) and |A′k(t)|≤ εckφ′1(t) capture
the fact that the wave-shape may not change dramatically from one cycle to the next.
While this model has been used to design algorithms which handle various phys-
iological problems such as fetal ECG analysis [SW17b], fetal magnetocardiography
[EVWFE18], simultaneous heart rate and respiratory rate estimation from the PPG
[CW17], and cardiogenic artifact recycling [LWM19], its dependence on the “slowly
varying wave-shape” assumption limits its applicability to physiological time series.
Moreover, since the generalized phenomenological model [LSW18] encodes the time-
varying nature of the wave-shape function in the frequency domain, the wave-shape
function interacts with the instantaneous frequency in a non-trivial way, which further
limits its ability to independently quantify dynamics encoded by the instantaneous
frequency and the time-varying wave-shape. Specifically, when the subject is not of
normal physiology, the model is limited.
Example 8. Premature ventricular contractions are heartbeats which are not trig-
gered by the SA node but instead originate in the ventricles (the lower compartment
of the heart). The frequency at which these beats occur is associated with congestive
heart failure [DDV+15]. Premature ventricular contractions manifest as the heart
“skipping a beat” and appear at seemingly random times. Morphologically, they appear
very different from beats triggered by the SA node, and a signal featuring premature
ventricular contractions will not adhere to the “slowly varying wave-shape” assump-
tion (see Figure 1.6). A patient may have multiple morphologically distinct types of
premature ventricular contractions, in which case his or her risk for cardiac disease
increases. Premature ventricular contractions will be discussed later in Section 3.1.2.
Figure 1.6: ECG signals featuring premature ventricular contractions are difficult to model because the wave-shape is not slowly varying. Premature ventricular contractions are morphologically distinct from normal beats and are not triggered by the SA node.
We refer readers with interest in the generalized phenomenological model to
[LSW18, XYD18, Yan17] for its theoretical details and to [SW17b, EVWFE18, CW17,
LWM19] for applications. As useful as this model is, it cannot capture the (non-
slowly) time-varying morphology of cycles in an oscillatory biomedical time series.
Specifically, we have interest in those dynamics which are encoded by variations in
cycle morphology that are independent of amplitude and frequency modulation.
Connecting the Phenomenological Model to the Wave-shape Oscillatory
Model
We show that for a given physiological signal satisfying the phenomenological model,
the collection of all oscillatory patterns can be well-approximated by a one-chart
manifold. The slowly varying assumption is used to show that each oscillatory
pattern in the physiological signal is “close” to one whose amplitude and frequency
are constant. The following theorem says that the collection of all such constant-
amplitude, constant-frequency oscillatory patterns is a one-chart manifold. We then
use this opportunity to simulate such a one-chart wave-shape manifold and visualize
its structure.
Theorem 1.3.3. Suppose $s \in C^2(\mathbb{R})$ is a wave-shape function whose support is a
subset of [−1/2, 1/2]. Let $U = I_1 \times I_2$, where $I_1 \subset (0,\infty)$ and $I_2 \subset (1,\infty)$ are two open
intervals of finite lengths. Define a map $\Phi: U \to \mathcal{H} \subset L^2(\mathbb{R})$ by sending an ordered
pair $(a, f) \in U$ to the function $\mathbb{R} \ni t \mapsto a s(ft)$. Then Φ is a $C^1$ diffeomorphism onto
an open subset $M := \Phi(U) \subset L^2(\mathbb{R})$; that is, M is a manifold with one chart.
Proof. Note that since $s \in C_c^2(\mathbb{R})$, we have $\Phi(U) \subset C_c^2(\mathbb{R})$. Suppose $(a_1, f_1) \neq (a_2, f_2)$ while
\[ \|\Phi(a_1, f_1) - \Phi(a_2, f_2)\|_2 = 0. \tag{1.7} \]
We abuse the notation and denote $\Phi(a_1, f_1) = a_1 s(f_1 t)$ to simplify the discussion
when there is no danger of confusion. Since s is continuous, we have
\[ a_1 s(f_1 t) = a_2 s(f_2 t) \tag{1.8} \]
for all t ∈ R. Suppose $a_1 s(f_1 t)$ has support $[a, b] \subset [-1/(2f_1), 1/(2f_1)]$. Without loss
of generality, assume $f_2 > f_1$, a < 0, and b > 0. The support of $a_2 s(f_2 t)$ is thus
$[a f_1/f_2, b f_1/f_2] \subset [a, b]$. Hence, we can find $x \in [a, a f_1/f_2] \cup [b f_1/f_2, b]$ so that
$a_1 s(f_1 x) \neq 0$ but $a_2 s(f_2 x) = 0$, which contradicts (1.8). As a result, we have shown
that Φ is one-to-one and onto $M := \Phi(U)$.
We claim that the total differential of Φ is
\[ D\Phi|_{(a_0,f_0)}(h_1, h_2) = h_1 s(f_0 t) + h_2 a_0 t s'(f_0 t), \tag{1.9} \]
where $(a_0, f_0) \in U$ and $(h_1, h_2) \in \mathbb{R}^2$. Note that Φ is a Hilbert space-valued function.
Indeed, when $(h_1, h_2) = (a - a_0, f - f_0)$, we have
\[ \begin{aligned}
&\Phi(a, f) - \Phi(a_0, f_0) - D\Phi|_{(a_0,f_0)}(a - a_0, f - f_0) \\
&\quad = a s(ft) - a_0 s(f_0 t) - [(a - a_0)s(f_0 t) + (f - f_0) a_0 t s'(f_0 t)] \\
&\quad = a_0[s(ft) - s(f_0 t) - (f - f_0) t s'(f_0 t)] + (a - a_0)(s(ft) - s(f_0 t)).
\end{aligned} \tag{1.10} \]
By the integral form of Taylor’s expansion, we have
\[ s(ft) - s(f_0 t) = \int_{f_0}^{f} t s'(zt)\, dz \tag{1.11} \]
and
\[ s(ft) - s(f_0 t) - (f - f_0) t s'(f_0 t) = \int_{f_0}^{f} (f - z) t^2 s''(zt)\, dz. \tag{1.12} \]
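Here, we view s(ft) as a function of f. Next, we bound the $L^2$ norm of (1.10). Indeed,
we have
\[ \begin{aligned}
\Big\| \int_{f_0}^{f} (f - z) t^2 s''(zt)\, dz \Big\|_{L^2}^2
&= \int \int_{f_0}^{f} \int_{f_0}^{f} (f - z) t^2 s''(zt) (f - z') t^2 s''(z't)\, dz\, dz'\, dt \\
&= \int_{f_0}^{f} \int_{f_0}^{f} (f - z)(f - z') \Big[ \int t^2 s''(zt)\, t^2 s''(z't)\, dt \Big] dz\, dz'.
\end{aligned} \tag{1.13} \]
Note that since $s \in C_c^2(\mathbb{R})$, the Cauchy-Schwarz inequality and the change of variables $u = zt$ give
\[ \Big| \int t^2 s''(zt)\, t^2 s''(z't)\, dt \Big| \le \Big( \int t^4 |s''(zt)|^2\, dt \Big)^{1/2} \Big( \int t^4 |s''(z't)|^2\, dt \Big)^{1/2} = \frac{C}{(zz')^{5/2}} \le \frac{C}{zz'} \]
for some C > 0 depending only on the fourth moment of $|s''|^2$, where the last inequality holds because $z, z' > 1$. As a result,
\[ \begin{aligned}
\Big\| \int_{f_0}^{f} (f - z) t^2 s''(zt)\, dz \Big\|_{L^2}^2
&\le C \int_{f_0}^{f} \int_{f_0}^{f} \frac{|(f - z)(f - z')|}{zz'}\, dz\, dz' \\
&\le \frac{C}{f_0^2} \Big( \int_{f_0}^{f} |f - z|\, dz \Big)^2 = \frac{C |f - f_0|^4}{4 f_0^2}.
\end{aligned} \tag{1.14} \]
Similarly, we can bound $\int_{f_0}^{f} t s'(zt)\, dz$ in $L^2$ by a constant multiple of $|f - f_0|$. Therefore,
\[ \|\Phi(a, f) - \Phi(a_0, f_0) - D\Phi|_{(a_0,f_0)}(a - a_0, f - f_0)\|_{L^2} \le C\big(|f - f_0|^2 + |a - a_0|\,|f - f_0|\big) \tag{1.15} \]
for a constant C > 0 (possibly different from the one above) depending on $a_0$, $f_0$, and s, which leads to
\[ \frac{\|\Phi(a, f) - \Phi(a_0, f_0) - D\Phi|_{(a_0,f_0)}(a - a_0, f - f_0)\|_{L^2}}{\|(a - a_0, f - f_0)\|} \to 0 \tag{1.16} \]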
when ‖(a− a0, f − f0)‖→ 0. To finish the proof, we show that the total differential of
Φ at (a, f) ∈ U , DΦ|(a,f), is of full rank for any (a, f) ∈ U . It suffices to show that
s(ft) and ats′(ft) are linearly independent in L2(R). Suppose there are constants
c1, c2 ∈ R such that for all t ∈ R,
c1s(ft) = c2ats′(ft). (1.17)
Suppose c1 ≠ 0. In this case, c2 ≠ 0; otherwise, we have a contradiction. Since
$s \in C_c^2(\mathbb{R})$, there exists t ≠ 0 such that s(ft) ≠ 0 is the extremal value of s; that
is, fs′(ft) = 0. Since f > 0, s′(ft) = 0. Therefore, we have c1s(ft) ≠ 0 but
c2ats′(ft) = 0, which is a contradiction. As a result, we conclude that c1 = 0. In this
case, c2 must be 0; otherwise, we have a contradiction. This concludes the claim that
DΦ|(a,f) is of full rank. We conclude that Φ: U → M is a diffeomorphism.
See Figure 1.7 for an example of the wave-shape manifold in Theorem 1.3.3. We can
clearly see the nonlinear structure of the one-chart wave-shape manifold determined
by the 1-periodic wave-shape function. Next, we show that for a signal satisfying the
phenomenological model (1.3), it can be well-approximated by the manifold indicated
in Theorem 1.3.3.
Figure 1.7: An illustration of the wave-shape manifold in Theorem 1.3.3. The wave-shape function (left) is the db4 wavelet. We scale and dilate s by independently sampling amplitudes uniformly from I1 = [3/4, 5/4] and frequencies uniformly from I2 = [1, 9]. On the right, we visualize the wave-shape manifold by linearly projecting a set of N = 2500 discretized wave-shape functions from $\mathbb{R}^{7169}$ to $\mathbb{R}^3$. Each point is a wave-shape function $t \mapsto a s(ft)$; red corresponds to high frequencies, and blue corresponds to low frequencies.
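A simulation in the spirit of Figure 1.7 can be sketched as follows. This is an illustration under stated assumptions and not the code used to produce the figure: we replace the db4 wavelet with a hypothetical smooth function supported in [−1/2, 1/2], use fewer and shorter discretized cycles, and take the linear projection to be the one given by the three leading principal directions.

```python
import numpy as np

rng = np.random.default_rng(0)

def wave_shape(u):
    """A smooth stand-in for s, supported in [-1/2, 1/2] (not the db4 wavelet)."""
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    inside = np.abs(u) < 0.5
    v = u[inside]
    out[inside] = np.exp(-1.0 / (1.0 - 4.0 * v**2)) * np.cos(6.0 * np.pi * v)
    return out

n_points = 500
t = np.linspace(-0.5, 0.5, 1025)
amplitudes = rng.uniform(0.75, 1.25, size=n_points)  # samples from I_1
frequencies = rng.uniform(1.0, 9.0, size=n_points)   # samples from I_2

# Each row is a discretized wave-shape function t -> a s(f t).
cycles = amplitudes[:, None] * wave_shape(frequencies[:, None] * t[None, :])

# Linear projection onto the three leading principal directions.
centered = cycles - cycles.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
projection = centered @ vt[:3].T
explained = np.sum(singular_values[:3] ** 2) / np.sum(singular_values ** 2)
```

The rows of `projection` can be drawn as a three-dimensional scatter plot colored by `frequencies`. The fact that `explained` is strictly less than one indicates that the two-parameter family does not lie in a three-dimensional linear subspace, which is one way to see the nonlinearity of the manifold.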
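Theorem 1.3.4. Take ε > 0 to be sufficiently small. Consider f: R → R satisfying
the phenomenological model (1.3):
\[ f(t) = A(t)\, s(\phi(t)). \tag{1.18} \]
Assume without loss of generality that $\inf_{t \in \mathbb{R}} A(t) > 0$ and $\inf_{t \in \mathbb{R}} \phi'(t) = 1$. Define
$t_n := \phi^{-1}(n)$, where n ∈ Z, and define functions on R as
\[ g_n(t) := \begin{cases} A(t_n)\, s(\phi'(t_n) t) & \text{if } -\frac{1}{2\phi'(t_n)} \le t \le \frac{1}{2\phi'(t_n)}, \\ 0 & \text{otherwise,} \end{cases} \tag{1.19} \]
and
\[ f_n(t) := \begin{cases} A(t_n + t)\, s(\phi(t_n + t)) & \text{if } -\frac{1}{2\phi'(t_n)} \le t \le \frac{1}{2\phi'(t_n)}, \\ 0 & \text{otherwise.} \end{cases} \tag{1.20} \]
We then have uniformly over n that
\[ \|g_n - f_n\|_\infty \le C\varepsilon, \tag{1.21} \]
where $C = C(\|A\|_\infty, \|\phi'\|_\infty, \|s\|_{C^1}, \|\phi''\|_\infty)$.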
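Proof. Note that for each n ∈ Z, $g_n$ and $f_n$ are all supported in [−1/2, 1/2]. By a direct
calculation, we have
\[ \begin{aligned}
|g_n(t) - f_n(t)| \le{}& |A(t_n) - A(t_n + t)|\, |s(\phi'(t_n) t)| \\
&+ A(t_n + t)\, |s(\phi'(t_n) t) - s(\phi(t_n + t))|
\end{aligned} \tag{1.22} \]
for all t ∈ [−1/2, 1/2], and $|g_n(t) - f_n(t)| = 0$ for all t ∉ [−1/2, 1/2]. By a direct bound,
we have
\[ \begin{aligned}
|A(t_n) - A(t_n + t)| &\le \int_{t_n}^{t_n + t} |A'(x)|\, dx \le \varepsilon \int_{t_n}^{t_n + t} \Big[ \phi'(t_n) + \int_{t_n}^{x} \phi''(z)\, dz \Big] dx \\
&\le \varepsilon \phi'(t_n) |t| + \varepsilon \tfrac{1}{2} \|\phi''\|_\infty |t|^2 \le \tfrac{1}{2} \varepsilon \Big[ \phi'(t_n) + \tfrac{1}{4} \|\phi''\|_\infty \Big],
\end{aligned} \tag{1.23} \]
where the last bound holds since we only need to control t ∈ [−1/2, 1/2]. For the other
term, note that
\[ \begin{aligned}
s(\phi(t_n + t)) &= s\Big( \phi(t_n) + \phi'(t_n) t + \int_{t_n}^{t_n + t} \phi''(z) z\, dz \Big) \\
&= s\Big( \phi'(t_n) t + \int_{t_n}^{t_n + t} \phi''(z) z\, dz \Big)
\end{aligned} \tag{1.24} \]
since $\phi(t_n) \in \mathbb{Z}$ and s is 1-periodic. Therefore, by denoting $J := \int_{t_n}^{t_n + t} \phi''(z) z\, dz$, we
have
\[ |s(\phi'(t_n) t) - s(\phi(t_n + t))| \le \Big| \int_{\phi'(t_n) t}^{\phi'(t_n) t + J} |s'(x)|\, dx \Big| \le |J|\, \|s'\|_\infty. \tag{1.25} \]
J is controlled in the same way:
\[ \begin{aligned}
|J| &\le \varepsilon \int_{t_n}^{t_n + t} \phi'(z) z\, dz \le \varepsilon \int_{t_n}^{t_n + t} z \Big[ \phi'(t_n) + \int_{t_n}^{z} \phi''(w)\, dw \Big] dz \\
&\le \varepsilon \Big( \phi'(t_n) |t|^2 + \tfrac{1}{3} \|\phi''\|_\infty |t|^3 \Big) \le \tfrac{1}{4} \varepsilon \Big( \phi'(t_n) + \tfrac{1}{6} \|\phi''\|_\infty \Big).
\end{aligned} \]
As a result, we obtain the claim with $C = C(\|A\|_\infty, \|\phi'\|_\infty, \|s\|_{C^1}, \|\phi''\|_\infty)$.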
A direct consequence of this theorem is that $\{g_n\}_{n \in \mathbb{Z}}$ is a subset of the one-chart
manifold $M := \Phi(U)$ described in Theorem 1.3.3, where $U = I_a \times I_f$,
$I_a = (\inf_n A(t_n), \sup_n A(t_n))$ and $I_f = (1, \sup_n \phi'(t_n))$. As a result, the collection
of oscillatory cycles, $\{f_n\}_{n \in \mathbb{Z}}$, can be parametrized by M up to a controllable error
depending on $\varepsilon C(\|A\|_\infty, \|\phi'\|_\infty, \|s\|_{C^1}, \|\phi''\|_\infty)$. We mention that a similar argument
can be applied to the signal satisfying the generalized phenomenological model sum-
marized in Section 1.3 with more tedious notation and calculation. Since it does not
shed more light upon the topic, we omit the details.
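The bound (1.21) can be explored numerically. The sketch below is an illustration only: the wave-shape function, the amplitude A, and the phase φ are hypothetical choices, with φ′(t) = 1 + δ(1 + sin(2πνt)) so that the infimum of φ′ is 1, and the inverse phase is computed by interpolation on a fine grid. Smaller values of δ and ν correspond to slower variation, that is, to a smaller ε.

```python
import numpy as np

def wave_shape(u):
    """A hypothetical 1-periodic wave-shape function."""
    return np.cos(2 * np.pi * u) + 0.5 * np.cos(4 * np.pi * u + 1.0)

def max_cycle_error(delta, nu, n_cycles=20):
    """Return max over n of ||g_n - f_n||_inf for a synthetic A(t) s(phi(t))."""
    def phi(t):
        # Antiderivative of dphi with phi(0) = 0.
        return (1 + delta) * t + delta * (1 - np.cos(2 * np.pi * nu * t)) / (2 * np.pi * nu)

    def dphi(t):
        return 1 + delta * (1 + np.sin(2 * np.pi * nu * t))

    def amp(t):
        return 1 + 0.2 * np.sin(2 * np.pi * nu * t)

    # Invert the increasing phase numerically to get t_n = phi^{-1}(n).
    grid = np.linspace(-1.0, n_cycles + 1.0, 200001)
    t_n = np.interp(np.arange(n_cycles), phi(grid), grid)

    worst = 0.0
    for tn in t_n:
        half_width = 1.0 / (2.0 * dphi(tn))
        t = np.linspace(-half_width, half_width, 2001)
        g = amp(tn) * wave_shape(dphi(tn) * t)    # g_n as in (1.19)
        f = amp(tn + t) * wave_shape(phi(tn + t))  # f_n as in (1.20)
        worst = max(worst, float(np.max(np.abs(g - f))))
    return worst

error_slow = max_cycle_error(delta=0.01, nu=0.01)
error_fast = max_cycle_error(delta=0.1, nu=0.1)
```

In this example the discrepancy between each observed cycle and its constant-amplitude, constant-frequency surrogate is smaller for the slowly varying signal than for the faster one, in line with the dependence of the bound on ε.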
1.4 Dynamics in Chaotic Time Series
Biomedical time series such as the electroencephalogram (EEG), the electromyogram
(EMG), and the electrooculogram (which measure brain activity, muscle activity,
and eye movement, respectively) do not adhere to the wave-shape oscillatory model.
Nevertheless, there is still a sense in which they are observations of a process on a
manifold hosting physiological states. For the simple reason that these signals are
exceptionally challenging to model, we call them chaotic. EEG signals in particular
appear to be summations of numerous intermittent oscillatory “bursts” of electrical
energy, each of which has a well-localized fundamental frequency. These bursts of
energy (called neural oscillations or brainwaves) correspond to neuronal activity and
are associated with cognitive processing. The spectral content of a subject’s EEG
signal is known to vary with his or her state of consciousness. Activity in the delta
band (0-4 Hz) is associated with unconsciousness or deep sleep, activity in the theta
band (4-8 Hz) is associated with meditation and drowsiness, activity in the alpha
band (8-16 Hz) is associated with wakeful relaxation, the beta band (16-32 Hz) is
associated with active thinking, and the gamma band (32-100 Hz) is associated with
a phenomenon termed unified perception, in which different parts of the brain work
together to accomplish demanding cognitive tasks. In this sense, the time-localized
spectrum of the EEG signal is a non-negative, integrable function on the real line
which provides some coarse information about the state of the brain. In this work, we
will assume that the space of such functions is a smooth, compact, low-dimensional
Riemannian manifold embedded in L2(R) which is diffeomorphic to the phase manifold
(as in the wave-shape manifold model). In principle, the time-varying spectrum of the
EEG signal is akin to the time-varying morphology of the cardiac waveform. Localizing
the spectral content of the EEG signal in time can be accomplished using techniques
such as the short-time Fourier transform or the continuous wavelet transform. We
can use the resulting time-frequency representation to observe neural dynamics. One
clear advantage to this approach is that we can localize the spectrum of the EEG
signal at any point in time (as opposed to at the incidence of certain landmarks), and
we take the opportunity to observe the spectrum at regular sampling intervals.
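As a minimal sketch of how the spectrum can be localized in time and observed at regular intervals, the following computes a short-time Fourier transform power spectrum with NumPy. The signal is a synthetic stand-in (a 10 Hz burst followed by a 2 Hz burst) and not an EEG recording; the sampling rate, window length, hop size, and the helper names `stft_power` and `band_power` are assumptions made for illustration. The band edges follow the ranges quoted above.

```python
import numpy as np

fs = 100.0                # sampling rate in Hz (assumed)
t = np.arange(2000) / fs  # 20 seconds of samples
# Synthetic stand-in for one channel: 10 Hz activity, then 2 Hz activity.
x = np.where(t < 10.0, np.sin(2 * np.pi * 10.0 * t), np.sin(2 * np.pi * 2.0 * t))

def stft_power(signal, fs, win_len=200, hop=100):
    """Power of the Hann-windowed short-time Fourier transform."""
    window = np.hanning(win_len)
    starts = np.arange(0, len(signal) - win_len + 1, hop)
    frames = np.stack([signal[s:s + win_len] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (starts + win_len / 2) / fs  # center of each window in seconds
    return times, freqs, power

times, freqs, power = stft_power(x, fs)
dominant = freqs[np.argmax(power, axis=1)]  # dominant frequency per window

def band_power(lo, hi):
    """Total power per window in the frequency band [lo, hi)."""
    mask = (freqs >= lo) & (freqs < hi)
    return power[:, mask].sum(axis=1)

delta_power = band_power(0.0, 4.0)   # delta band
alpha_power = band_power(8.0, 16.0)  # alpha band
```

Each row of `power` is a non-negative function of frequency observed at a regularly spaced time, which is the kind of object treated above as a point on a manifold.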
In comparison with signals adhering to the wave-shape oscillatory model, the
time-domain structure of the EEG signal is very difficult to assess. In the field of
cognitive neuroscience, neural oscillations with specific time-domain features that
are characteristic of certain cognitive tasks (event-related potentials) are obtained
by averaging together multiple aligned recordings of a subject repeatedly performing
that task. The difficulty of obtaining an event-related potential arises because of the
high level of background neural activity; in cardiac vital signs, background activity
is minimal. In Chapter 4, we discuss the problem of sleep stage classification using
the EEG signal, and we handle the background activity by taking two simultaneously
recorded EEG channels into account.
Chapter 2
Diffusion Maps
To recover the wave-shape manifold M and the process ρ∗ψ which is a surrogate for
the underlying physiological dynamics, we go to the field of manifold learning. In
manifold learning, high-dimensional or abstract data sets (viewed as point clouds
sampled from a low-dimensional manifold) are mapped to a Hilbert space so that
the geodesic distance between any two points on the manifold is approximately
equal to the ℓ² distance between their images. The main challenges facing dynamics
extraction are the nonlinearity of the wave-shape manifold and the possible noise
that deviates observed oscillatory patterns from the wave-shape manifold. To handle
these challenges, we use the diffusion maps algorithm. We choose the diffusion maps
algorithm because of its sound theoretical support. Since its appearance in 2006
[CL06] as a generalization of the Laplacian Eigenmap [BN03], a series of theoretical
works have been carried out which study the graph Laplacian (the main tool used
by the diffusion maps algorithm) and the behavior of diffusion maps in the manifold
setting. However, the diffusion maps algorithm has meaning in settings where the
data set has much less regularity. We begin in the most general setting.
2.1 Diffusion on a Measure Space
This section roughly follows the exposition of [Laf04]. Let (X,A, µ) be a measure space.
Let µ be a finite measure. Our definition of measure includes countable additivity.
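Assume that the local geometry of X is informed by a kernel k: X × X → C which is
an element of $L^2(X \times X, \mu \times \mu)$,
\[ \int_X \int_X k(x, y) \overline{k(x, y)}\, d\mu(x)\, d\mu(y) < \infty; \tag{2.1} \]
Hermitian,
\[ k(x, y) = \overline{k(y, x)}, \qquad x, y \in X; \tag{2.2} \]
and positive definite,
\[ \int_X \int_X \overline{f(x)}\, k(x, y) f(y)\, d\mu(x)\, d\mu(y) \ge 0, \qquad f \in L^2(X, \mu). \tag{2.3} \]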
The simplest example of such a measure space is a finite set of nodes in a graph; the
accompanying kernel describes node adjacency. Define the degree function as
\[ d(x) = \int_X k(x, y)\, d\mu(y), \qquad x \in X, \tag{2.4} \]
and assume k is such that d(x) ≠ 0 for all x ∈ X. Note that we achieve this condition
in particular if k is non-negative and k(x, x) > 0 for all x ∈ X. This condition is
needed to avoid dividing by zero in the following definition. Define the diffusion kernel
p:X ×X → C by
\[ p(x, x') = \frac{k(x, x')}{d(x)}, \qquad x, x' \in X, \tag{2.5} \]
and call the corresponding bounded linear map
\[ Pf(x) := \int_X p(x, y) f(y)\, d\mu(y), \qquad f \in L^2(X, \mu),\ x \in X \tag{2.6} \]
the diffusion operator. When k is non-negative, p(x, · ) can be thought of as a
probability density function, and taking powers of P amounts to iterating a Markov
process over X, where the transition probabilities are informed by k. For an integer
t > 0 which we term the diffusion time, let $p_t: X \times X \to \mathbb{R}$ denote the integral kernel
associated to $P^t$. Define the diffusion distance as
\[ D_t(x, x')^2 = \int_X [p_t(x, y) - p_t(x', y)]^2\, \frac{d\mu(y)}{d(y)}, \qquad x, x' \in X. \tag{2.7} \]
The notion in which this quantity is a distance will be made clear in the following
analysis. Moreover, the notion in which this distance provides geometric information
intrinsic to the space X will be made more precise when we consider the case where
X ⊂ Rn is a submanifold. For now, our goal is to find a mapping $\Phi_t: X \to \mathcal{H}$ into a
Hilbert space $\mathcal{H}$ such that
\[ D_t(x, x') = \|\Phi_t(x) - \Phi_t(x')\|_{\mathcal{H}}. \tag{2.8} \]
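Define the isotropic diffusion kernel
\[ \tilde{k}(x, x') = \frac{k(x, x')}{\sqrt{d(x) d(x')}}, \qquad x, x' \in X. \tag{2.9} \]
Then $\tilde{k} \in L^2(X \times X, \mu \times \mu)$ is also Hermitian and positive definite. The isotropic
diffusion operator
\[ Kf(x) := \int_X \frac{k(x, y)}{\sqrt{d(x) d(y)}} f(y)\, d\mu(y), \qquad f \in L^2(X, \mu),\ x \in X \tag{2.10} \]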
is bounded with norm 1. Indeed, by the Cauchy-Schwarz inequality,
〈Kf, f〉 ≤√‖Kf‖‖f‖ (2.11)
Since $K\sqrt{d(x)} = \sqrt{d(x)}$, we see that ‖K‖ = 1. We also know that K is Hilbert-Schmidt
(cf. [Con90, Page 267] or [RS80, Page 210]). Using the Hilbert-Schmidt
Theorem and the fact that k is Hermitian, we see that
\[ \tilde{k}(x, y) = \frac{k(x, y)}{\sqrt{d(x) d(y)}} = \sum_n \lambda_n \phi_n(x) \phi_n(y), \tag{2.12} \]
where $\{\phi_n\}$ is the complete orthonormal basis for $L^2(X, \mu)$ consisting of eigenfunctions
for K, and 1 = λ1 ≥ λ2 ≥ · · · are the corresponding eigenvalues. Recall that
eigenfunctions of Hermitian operators are orthogonal and that their eigenvalues are
real. Moreover, since k is positive definite, λn ≥ 0 for all n. The convergence of (2.12)
is in terms of the norm of L2(X×X,µ×µ). We can use (2.12) to get a decomposition
of the diffusion kernel. We have
\[ p_t(x, y) = \sum_n \lambda_n^t\, \frac{\phi_n(x)}{\sqrt{d(x)}}\, \phi_n(y) \sqrt{d(y)}. \tag{2.13} \]
Note that P t now has meaning when t > 0 is not an integer. We can then show that
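\[ D_t(x, x')^2 = \sum_n \lambda_n^{2t} \Big( \frac{\phi_n(x)}{\sqrt{d(x)}} - \frac{\phi_n(x')}{\sqrt{d(x')}} \Big)^2, \tag{2.14} \]
which implies that the diffusion distance is a metric when λn > 0 for all n ∈ N+.
Moreover, the diffusion distance is evidently the $\ell^2(\mathbb{N}^+)$-distance between sequences
\[ D_t(x, x')^2 = \left\| \begin{pmatrix} \lambda_1^t \varphi_1(x) \\ \lambda_2^t \varphi_2(x) \\ \vdots \end{pmatrix} - \begin{pmatrix} \lambda_1^t \varphi_1(x') \\ \lambda_2^t \varphi_2(x') \\ \vdots \end{pmatrix} \right\|_{\ell^2(\mathbb{N}^+)}^2, \tag{2.15} \]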
where we have made the substitution ϕn(x) = φn(x)/√d(x). Note that
\[ P\varphi_n(x) = \frac{1}{\sqrt{d(x)}} K\phi_n(x) = \frac{1}{\sqrt{d(x)}} \lambda_n \phi_n(x) = \lambda_n \varphi_n(x), \tag{2.16} \]
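so ϕn is an unnormalized eigenfunction of the diffusion operator P with eigenvalue
λn. This observation motivates the definition of the diffusion map $\Phi_t: X \to \ell^2(\mathbb{N}^+)$:
\[ \Phi_t(x) = \begin{pmatrix} \lambda_1^t \varphi_1(x) \\ \lambda_2^t \varphi_2(x) \\ \vdots \end{pmatrix}, \tag{2.17} \]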
which satisfies the property (2.8) in which we had interest.
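The construction of this section can be checked numerically on the simplest measure space, a finite set. The following is a minimal sketch, assuming X consists of 30 hypothetical points in the plane equipped with the counting measure and a Gaussian kernel (our choices for illustration). It builds the diffusion map from the eigendecomposition of the isotropic diffusion operator and compares the resulting Euclidean distances with the diffusion distance (2.7) computed directly from powers of P.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))  # hypothetical finite set; mu is counting measure

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
k = np.exp(-sq_dists)           # Gaussian kernel (positive definite)
d = k.sum(axis=1)               # degree function, as in (2.4)
P = k / d[:, None]              # diffusion kernel, as in (2.5)
K_iso = k / np.sqrt(np.outer(d, d))  # isotropic diffusion kernel, as in (2.9)

# Eigendecomposition of the symmetric matrix, sorted so that lambda_1 = 1 comes first.
lam, phi = np.linalg.eigh(K_iso)
order = np.argsort(lam)[::-1]
lam, phi = lam[order], phi[:, order]
lam = np.clip(lam, 0.0, None)   # remove tiny negative round-off

varphi = phi / np.sqrt(d)[:, None]  # unnormalized eigenvectors of P
t = 3                               # diffusion time
Phi_t = varphi * lam ** t           # row i is the diffusion map at x_i, as in (2.17)

# Squared diffusion distance from the definition (2.7).
P_t = np.linalg.matrix_power(P, t)
D2_direct = np.sum((P_t[:, None, :] - P_t[None, :, :]) ** 2 / d[None, None, :], axis=-1)

# Squared Euclidean distance between images under the diffusion map.
D2_embed = np.sum((Phi_t[:, None, :] - Phi_t[None, :, :]) ** 2, axis=-1)
```

Up to round-off, `D2_direct` and `D2_embed` coincide, and the leading eigenvalue equals one.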
2.2 Diffusion on a Compact Domain
This section follows the exposition of [MNY06]. Mercer theory allows us to strengthen
the decomposition of the diffusion kernel provided by the Hilbert-Schmidt theorem,
under suitable assumptions. The easiest way to ensure all assumptions are met is the
following. Let X be a compact subset of Rn. Let µ be a finite Borel measure which is
supported on X. This means that if U ⊂ X is non-empty and open, then µ(U) > 0.
Note that our definition of measure includes countable additivity. Take a continuous,
Hermitian, positive definite kernel k: X × X → C on X. Then k necessarily satisfies
\[ \int_X \int_X k(x, y) \overline{k(x, y)}\, d\mu(x)\, d\mu(y) < \infty. \tag{2.18} \]
Define the degree function as
\[ d(x) = \int_X k(x, y)\, d\mu(y), \qquad x \in X, \tag{2.19} \]
and assume k is such that d(x) > 0 for all x ∈ X. Since X ⊂ Rn, we have an obvious
metric on X, namely the Euclidean distance on Rn. Bochner’s theorem says that the
kernel
\[ k(x, y) = h(\|x - y\|^2) \tag{2.20} \]
is positive definite provided h(0) = 1 and there exists a finite positive measure ν such
that
\[ h(u^2) = \int_{\mathbb{R}} e^{-2\pi i \xi u}\, d\nu(\xi). \tag{2.21} \]
For example, we often take the Gaussian kernel
\[ k(x, y) = e^{-\|x - y\|^2}. \tag{2.22} \]
As before, the diffusion kernel p:X ×X → C is
\[ p(x, x') = \frac{k(x, x')}{d(x)}, \qquad x, x' \in X, \tag{2.23} \]
and the diffusion operator is
\[ Pf(x) := \int_X p(x, y) f(y)\, d\mu(y), \qquad f \in L^2(X, \mu),\ x \in X. \tag{2.24} \]
The diffusion distance is
\[ D_t(x, x')^2 = \int_X [p_t(x, y) - p_t(x', y)]^2\, \frac{d\mu(y)}{d(y)}, \qquad x, x' \in X, \tag{2.25} \]
where t > 0, and $p_t$ is the integral kernel associated to the operator $P^t$. The isotropic
diffusion kernel
\[ \tilde{k}(x, x') = \frac{k(x, x')}{\sqrt{d(x) d(x')}}, \qquad x, x' \in X, \tag{2.26} \]
is also continuous (and hence L2-bounded), Hermitian, and positive definite. By
the Hilbert-Schmidt theorem, there are eigenfunctions φn (which form a complete
orthonormal basis of L2(X,µ)) and corresponding eigenvalues 1 = λ1 ≥ λ2 ≥ · · · ≥ 0
such that
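\[ \tilde{k}(x, y) = \frac{k(x, y)}{\sqrt{d(x) d(y)}} = \sum_n \lambda_n \phi_n(x) \phi_n(y). \tag{2.27} \]
Since k also satisfies Mercer's condition, which states that
\[ \int_X \int_X \overline{f(x)}\, k(x, y) f(y)\, d\mu(x)\, d\mu(y) < \infty, \qquad f \in L^2(X, \mu), \tag{2.28} \]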
Mercer’s theorem says that the convergence of (2.27) is point-wise and uniform on
X. We also see that the eigenfunctions corresponding to non-zero eigenvalues are
continuous. We follow the procedure above to obtain a decomposition for pt, an
expression for the diffusion distance involving the unnormalized eigenvectors of P ,
and the diffusion map Φt.
2.3 Diffusion on a Manifold
We now proceed to the analysis of diffusion on a manifold. In this setting, we
assume that M is an m-dimensional, closed (compact without boundary), and smooth
manifold embedded in Rn. We take the induced Riemannian metric g on M (from the
canonical metric on Rn), and we assume that m ≤ n. Write dVg for the Riemannian
volume measure. Note that we do not have access to the metric g or a topological
description ofM (even its dimension). We have only a list of its elements. Nevertheless,
by applying the diffusion maps algorithm, we will be able to recover the metric. We
assume the existence of a family of kernels
\[ k_h(x, y) := k\Big( \frac{\|x - y\|}{\sqrt{h}} \Big), \tag{2.29} \]
where k: [0,∞) → R+ is a positive C²-function such that both k and k′ decay
exponentially fast at +∞. The Gaussian $k(u) = e^{-u^2}$ is one common example. Assume h
is chosen so that $0 < \sqrt{h} < \min(\tau, \iota)$, where ι is the injectivity radius of M and τ is the
condition number [HWB08]. Define the degree function
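\[ d(x) = \int_{\mathcal{M}} k_h(x, y)\, dV_g(y), \qquad x \in \mathcal{M}, \tag{2.30} \]
the diffusion kernel
\[ p(x, y) = \frac{k_h(x, y)}{d(x)}, \qquad x, y \in \mathcal{M}, \tag{2.31} \]
and the diffusion operator
\[ Pf(x) = \int_{\mathcal{M}} p(x, y) f(y)\, dV_g(y), \qquad f \in L^2(\mathcal{M}, dV_g),\ x \in \mathcal{M}. \tag{2.32} \]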
The diffusion maps algorithm generalizes a family of methods for the organization
of large data sets, which includes the Laplacian Eigenmap. The algorithms in this
family derive their similarity from their use of eigenfunctions of the operator
\[ L_h f := \frac{Pf - f}{h}, \qquad f \in L^2(\mathcal{M}, dV_g), \tag{2.33} \]
called the graph Laplacian. Diffusion is carried out when we diagonalize P , to which
diagonalizing Lh is equivalent. This operator derives its name from the Laplace-
Beltrami partial differential operator ∆g. Indeed, Lh converges point-wise as h→ 0+
to a scalar multiple (depending on the kernel k, the dimension m, and the volume of
M) of ∆g within an error of order O(h) [SW17a, Theorem 5.2]. Moreover, we obtain
spectral convergence; the eigenvalues and normalized eigenvectors of Lh converge
in probability (as h → 0+) to the eigenvalues and normalized eigenfunctions of ∆g
[BN07]. The spectral embedding introduced by Berard, Besson, and Gallot explains
the sense in which taking an eigendecomposition of ∆g reveals the geometry of the
manifold. Suppose
\[ \{a_k\}_{k=1}^{\infty} \subset L^2(\mathcal{M}, dV_g), \qquad 0 \le \mu_1 \le \mu_2 \le \cdots \tag{2.34} \]
is an orthonormal basis of real eigenfunctions for the operator ∆g. For each t > 0,
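the mapping $\Psi_t: \mathcal{M} \to \ell^2(\mathbb{N}^+)$ given by
\[ x \mapsto \sqrt{2}\, (4\pi)^{m/4} (2t)^{(m+2)/4} \{ e^{-\mu_j t} a_j(x) \}_{j=1}^{\infty} \tag{2.35} \]
is an embedding. In particular, as t → 0+, the pullback of the canonical metric on
$\ell^2(\mathbb{N}^+)$ is
\[ g + \frac{2t}{3} \Big( \frac{1}{2} \mathrm{Scal}_g\, g + \mathrm{Ric}_g \Big) + O(t^2), \tag{2.36} \]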
indicating that Ψt is an approximate isometry. An important first observation is that a
large number of eigenfunctions appear to be needed in order to obtain the Riemannian
metric. Moreover, it is not clear how many eigenfunctions can be discarded while
maintaining the embedding property (which guarantees that the topology of the
manifold is preserved). For these reasons, the truncated spectral embedding has
recently been studied [JMS08, Bat14, Por16]; one result is as follows [Por16, Theorem
5.1]. Fix ε > 0. If κ is a lower bound on the Ricci curvature ofM and ι is the injectivity
radius, then there exists a t0 = t0(m, ε, κ, ι) such that for all 0 < t < t0, there exists
an nE = nE(m, ε, t, κ, ι,Vol(M)) such that if q ≥ nE, the finite-dimensional mapping
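\[ \Phi_t^{(q)}: x \mapsto \sqrt{2}\, (4\pi)^{m/4} (2t)^{(m+2)/4} \{ e^{-t\mu_j} a_j(x) \}_{j=1}^{q} \in \mathbb{R}^q \tag{2.37} \]
is an embedding and satisfies
\[ 1 - \varepsilon \le \|(d\Phi_t^{(q)})_x\| \le 1 + \varepsilon. \tag{2.38} \]
Note that $\Phi_t^{(q)}$ is an isometry when the operator norm $\|(d\Phi_t^{(q)})_x\|$ is 1. We note also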
that although the operator Lh is asymptotically ∆g when M is a compact manifold
without boundary [SW17a], a corresponding embedding result for manifolds with
boundary does not exist.
The eigendecomposition of ∆g affords us an extra piece of information: the
ordering imposed by the magnitude of the corresponding eigenvalues. While algebraic
topologists organize the geometry of a manifold in terms of homotopy, homology, and
cohomology, harmonic analysts organize the geometry of a manifold in terms of a
generalized notion of “frequency” afforded by the ordered eigenvalues of ∆g. In the
same way that functions on S1 can be decomposed using the Fourier basis for L2(S1),
functions on the manifoldM can be decomposed using the eigenfunctions of ∆g which
serve as an orthonormal basis for L2(M, dVg). Each eigenfunction is oscillatory in
nature, and its oscillatory behaviour can be characterized by the magnitude of the
corresponding eigenvalue. To obtain coarse information about the geometry of M,
we should consider the lowest-order eigenfunctions, but to obtain finer information
about the geometry of M (such as geodesic distances and community structure), we
should consider the higher-order eigenfunctions.
2.3.1 Finite Samples
In practice, we are provided with only a finite number of points $X = \{x_i\}_{i=1}^{N} \subset \mathcal{M}$,
which we assume are sampled independently and identically from a random vector
$Y: (\Omega, \mathcal{F}, \mathbb{P}) \to \mathbb{R}^p$ whose range is M. To make the notation clear, Ω is the domain of
Y, $\mathcal{F}$ is a Borel σ-algebra on Ω, and $\mathbb{P}$ is a probability measure on $\mathcal{F}$. We assume
that the pushforward measure Y∗P on M is absolutely continuous with respect to the
Riemannian volume measure dVg, and that the function
\[ p := \frac{dY_*\mathbb{P}}{dV_g}: \mathcal{M} \to \mathbb{R}^+ \tag{2.39} \]
given by the Radon-Nikodym theorem is uniformly bounded away from both zero and
infinity and is of class C4(M). We call p the probability density function on M
associated with Y ; when p is constant, we say that the sampling is uniform. In
general, a non-uniform density function poses a significant challenge to recovering
the operator ∆g. However, a simple normalization step called α-normalization can
overcome this challenge. The algorithm (in the continuous setting) is as follows.
Choose 0 ≤ α ≤ 1, and set
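\[ \pi(x) = \int_{\mathcal{M}} k_h(x, y)\, p(y)\, dV_g(y), \qquad x \in \mathcal{M} \tag{2.40} \]
\[ k_{h,\alpha}(x, y) = \frac{k_h(x, y)}{\pi(x)^{\alpha} \pi(y)^{\alpha}}, \qquad x, y \in \mathcal{M} \tag{2.41} \]
\[ d_{h,\alpha}(x) = \int_{\mathcal{M}} k_{h,\alpha}(x, y)\, p(y)\, dV_g(y), \qquad x \in \mathcal{M} \tag{2.42} \]
\[ p_{h,\alpha}(x, y) = \frac{k_{h,\alpha}(x, y)}{d_{h,\alpha}(x)}, \qquad x, y \in \mathcal{M}. \tag{2.43} \]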
We use the α-normalized diffusion kernel ph,α to obtain the α-normalized diffusion
operator
\[ P_{h,\alpha} f(x) = \int_M p_{h,\alpha}(x, y)\, f(y)\, p(y)\, dV_g(y), \qquad f \in L^2(M, dV_g),\ x \in M. \tag{2.44} \]
Pointwise on M, we then have the asymptotic result [SW17a, Theorem 5.2] stating
that
\[ \frac{(P_{h,\alpha} f - f)(x)}{h} = C(k, m) \left( \Delta_g f(x) + \frac{2 \nabla f(x) \cdot \nabla (p^{1-\alpha})(x)}{p^{1-\alpha}(x)} \right) + O(h), \tag{2.45} \]
where C(k,m) is a constant depending on the kernel k and the dimension m of M.
Evidently, taking α = 1 causes the potential term to become zero, leaving the Laplace-
Beltrami operator in the zeroth order. In the discrete setting, we can approximate
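the α-normalized diffusion kernel using the set $\{x_i\}_{i=1}^N$ as follows.
\begin{align}
\pi(x) &= \frac{1}{N} \sum_{i=1}^{N} k_h(x, x_i), & x &\in M \tag{2.46} \\
k_{h,\alpha}(x, y) &= \frac{k_h(x, y)}{\pi(x)^{\alpha}\, \pi(y)^{\alpha}}, & x, y &\in M \tag{2.47} \\
d_{h,\alpha}(x) &= \frac{1}{N} \sum_{i=1}^{N} k_{h,\alpha}(x, x_i), & x &\in M \tag{2.48} \\
p_{h,\alpha}(x, y) &= \frac{k_{h,\alpha}(x, y)}{d_{h,\alpha}(x)}, & x, y &\in M \tag{2.49} \\
P_{h,\alpha} f(x) &= \frac{1}{N} \sum_{i=1}^{N} p_{h,\alpha}(x, x_i)\, f(x_i), & f &\in L^2(M, dV_g),\ x \in M. \tag{2.50}
\end{align}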
Having done so, we recall the recent pointwise [SW17a, Corollary 5.1] and spectral
[SW17a, Theorem 5.4] convergence results associated to these operators. Suppose
α ∈ (0, 1], and that h = h(N) so that both $h \xrightarrow{N \to \infty} 0$ and
\[ \frac{1}{h^{m/4+1}} \sqrt{\frac{\log N}{N}} \xrightarrow{N \to \infty} 0. \tag{2.51} \]
With probability higher than 1− 1/N2, we have
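\[ \frac{(P_{h,\alpha} f - f)(x_i)}{h} = C(k, m) \left( \Delta_g f(x_i) + \frac{2 \nabla f(x_i) \cdot \nabla (p^{1-\alpha})(x_i)}{p^{1-\alpha}(x_i)} \right) + O(h) + O\!\left( \frac{1}{h^{m/4+1}} \sqrt{\frac{\log N}{N}} \right) \tag{2.52} \]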
for all 1 ≤ i ≤ N, where f ∈ C4(M). Finally, if t > 0 is fixed, $1 = \lambda_{1,N} \ge \lambda_{2,N} \ge \cdots$ are
the eigenvalues of $(P_{h,1})^{t/h}$, $\{\varphi_{j,N}\}_{j=1}^{\infty}$ are the associated L2-normalized eigenfunctions,
and i ≥ 1 is a fixed integer, then there is a sequence $\{h_N\}_{N=1}^{\infty}$ such that
\begin{align}
\lim_{N \to \infty} \lambda_{i,N} &= \mu_i \tag{2.53} \\
\lim_{N \to \infty} \left\| \varphi_{i,N} - a_i \right\|_{L^2(M, dV_g)} &= 0 \tag{2.54}
\end{align}
in probability, where $\{\mu_j\}_{j=1}^{\infty}$ are the eigenvalues of the heat kernel operator $e^{t\Delta_g}$, and
$\{a_j\}_{j=1}^{\infty}$ are the associated L2-normalized eigenfunctions. As a result, we can approximate
the truncated spectral embedding by choosing α = 1, choosing h appropriately,
and setting
\[ \Phi_t^{(q)}(x_i) = \left( \lambda_1 \varphi_1(x_i), \ldots, \lambda_q \varphi_q(x_i) \right). \tag{2.55} \]
Compact manifolds with boundary are also handled in [SW17a], and non-compact
manifolds with boundary are discussed in [HAVL05].
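As an illustration of the discrete α-normalization in (2.46)-(2.50), the following is a minimal NumPy sketch. It assumes a Gaussian kernel k_h(x, y) = exp(−‖x − y‖²/h), which the text does not fix, and the function name `alpha_normalized_operator` is hypothetical. The returned matrix acts on the vector of sampled values of f exactly as the sum in (2.50) does.

```python
import numpy as np

def alpha_normalized_operator(X, h, alpha=1.0):
    """Matrix form of the discrete alpha-normalized diffusion operator.

    X     : (N, p) array of sample points x_1, ..., x_N.
    h     : kernel bandwidth (assumption: Gaussian kernel exp(-|x - y|^2 / h)).
    alpha : normalization parameter in [0, 1].
    Returns an (N, N) matrix P with P[i, j] = p_{h,alpha}(x_i, x_j) / N,
    so that (P @ f)[i] equals the sum in (2.50) evaluated at x_i.
    """
    N = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / h)                  # k_h(x_i, x_j)
    pi = K.mean(axis=1)                        # (2.46)
    K_alpha = K / np.outer(pi, pi) ** alpha    # (2.47)
    d = K_alpha.mean(axis=1)                   # (2.48)
    return K_alpha / d[:, None] / N            # (2.49), with the 1/N of (2.50)

# Non-uniformly sampled points on the unit circle (synthetic data).
rng = np.random.default_rng(0)
theta = 2.0 * np.pi * rng.beta(2.0, 5.0, size=200)
points = np.column_stack([np.cos(theta), np.sin(theta)])
P = alpha_normalized_operator(points, h=0.05, alpha=1.0)
f = points[:, 0]                               # a smooth test function
approx_generator = (P @ f - f) / 0.05          # left-hand side of (2.52)
```

Each row of the returned matrix sums to one by construction, so it can be read as a Markov transition matrix on the sample points. The pairwise-difference array costs O(N²p) memory, which is acceptable for a sketch but not for large N.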
2.3.2 Robustness to Noise
Real data is inevitably noisy. However, under some mild assumptions, the diffusion
maps algorithm is robust to noise [EK+10, EKW+16]. We summarize the result here.
Let W and W0 denote the N × N affinity matrices associated with the noisy and
clean data sets, respectively. Suppose ε > 0 is such that
\[ \sup_{1 \le i, j \le N} \left| W(i, j) - W_0(i, j) \right| \le \varepsilon. \tag{2.56} \]
Note that ε quantifies the control that we have over the noise. If the affinity matrix
W (for the noisy data) satisfies
\[ \inf_{1 \le i \le N} \sum_{j \ne i} \frac{W(i, j)}{N} > \gamma \tag{2.57} \]
for some γ > ε (meaning in particular that the noise has not introduced isolated
points), then
\[ \left\| D^{-1} W - D_0^{-1} W_0 \right\| \le \frac{\varepsilon}{\gamma} \left( 1 + \frac{1}{\gamma - \varepsilon} \right), \tag{2.58} \]
where ‖ · ‖ is the operator norm. By Weyl’s inequality (for the spectra of perturbed
matrices) and the Davis-Kahan sin θ theorem, the first q′ eigenvectors and eigenvalues
are well-reconstructed up to a controllable error, where the number q′ depends on the
noise level. When q′ is large enough so that $q' \ge n_E(m, \varepsilon, t, \kappa, \iota, \mathrm{Vol}(M))$ as in (2.37),
we are guaranteed a reconstruction of the manifold.
Chapter 3
Cardiovascular Waveform Analysis
Heart rate is the most basic and widely known measure of the cardiovascular system.
On the other hand, a surrogate for heart rate known as pulse rate can be estimated
by measuring blood flow at nearly any place in the human body because of the
transportation efficiency of the human circulatory system. Besides heart rate and
pulse rate, the analysis of cardiac and pulse waveforms provides further information
regarding the cardiovascular system. Deflections in the cardiac waveform are known
to correspond to the stages of heart contraction, and factors such as vascular wall
tension, blood volume, and cardiac contractility all affect the morphology of the pulse
waveform [O’R11]. Due to the abundant information hidden in both the cardiac and
pulse waveforms, waveform analysis has been widely applied in clinics. On the one
hand, indices are derived from a patient’s “mean” waveform and compared with the
indices of other patients for the purpose of diagnosis. On the other hand, indices are
derived from a sequence of waveforms and examined over time to summarize a patient’s
progress during surgery or treatment. For example, the ECG waveform is used to
diagnose cardiac arrhythmia and adverse cardiac events such as myocardial infarctions
(heart attacks). Commercial monitoring instruments use pulse waveform analysis to
bring physicians a relatively non-invasive assessment of cardiac output [vDvBH+17].
The FloTrac system marketed by Edwards Lifesciences Inc. (Irvine, CA, USA) uses
the arterial blood pressure (ABP) waveform to derive various circulation-related
indices to facilitate clinical management [SMG14, TSC+16].
The waveform analysis techniques that currently exist can be classified as either
landmark methods (time-domain) or spectral methods (frequency-domain). Unifying
these approaches is the implicit assumption that there exists a template guiding
the oscillation. For example, an ECG waveform is quantified in terms of its P, Q,
R, S, and T deflections, and an ABP waveform is often quantified in terms of its
forward-propagating and reflected components [AVBB+09]. In general, each waveform
is classified with reference to the assumed template, and one obtains indices useful for
clinical purposes such as diagnosis. Time-domain parameters quantifying the relative
positions of landmarks in the blood pressure waveform include augmentation pressure
[AVBB+09, WAO+04] and the augmentation index [OA05]; both have been proposed
as indicators of cardiovascular status. Spectral methods interpret cardiac and pulse
waveforms in the frequency domain, providing information about the inflection of
the waveform. The spectral content of the ECG waveform has been found to be
useful for classifying abnormal heartbeat morphologies [QA17]. The spectral content
of the ABP waveform has been found to be related to the physical properties of
the artery [WLH+00]. Moreover, it has been shown that by arbitrarily assembling a
million features from the pulse waveform (including both time- and frequency-domain
features), researchers can predict hypotension [HJB+18].
While these template-based approaches have been widely applied, they are chal-
lenged when we encounter long-term physiological time series or when the subject
is pathological; in both cases, one template is not enough to model the oscillatory
signal. See Figure 3.1 for an ECG signal belonging to a subject possessing premature
ventricular contractions (PVCs), wherein one template is insufficient to model the
time series. Paying mind to long-term physiological time series, the oscillatory pattern
may exhibit changes over larger time scales, in which case one template is again
insufficient for summarizing all of the observed activity. For example, the cardiac cycle
violently changes during a myocardial infarction or a paroxysmal supraventricular
tachycardia event. As discussed in Chapter 1, we designed the wave-shape oscillatory
model to overcome these challenges. In this chapter, we discuss applications of the
wave-shape oscillatory model to vital signs which measure the cardiovascular system.
We extract dynamical information from the electrocardiogram (ECG) and the arterial
blood pressure (ABP) signal.
Our novel algorithm for exploring the dynamics encoded by a time series adhering
to the wave-shape oscillatory model is as follows. The algorithm is composed of four
main steps, and the details of each step depend on the physiological time series that
is under consideration. Suppose the time series is sampled uniformly at fs Hz for
T > 0 seconds; that is, there are in total $n := \lfloor T \times f_s \rfloor$ sampling points. Denote the
discretized time series as x ∈ Rn.
1. Apply any suitable beat tracking algorithm [MSW20] to determine the locations
of all oscillatory cycles in x. Suppose there are N resulting cycles. Denote the
location (in samples) of the i-th oscillation as ti, where t1 < t2 < . . . < tN .
2. Extract the individual oscillations from x. Denote the i-th oscillatory cycle as
xi ∈ Rp, where p > 0 depends on the frequency of the oscillation. Note that
these segments may overlap, in general. Define $X := \{x_i\}_{i=1}^N \subset \mathbb{R}^p$. Based
on our model, X is a set of points sampled (possibly with noise) from the
wave-shape manifold M associated with x.
3. Apply the diffusion maps algorithm to the set X in order to recover the manifold
M. Write $\Phi_t^{(q)} \colon X \to \mathbb{R}^q$ for the truncated time-t diffusion map, where t, q > 0
are chosen appropriately.
4. Recover the dynamics on the wave-shape manifold by constructing the time
series $t_i/f_s \mapsto \Phi_t^{(q)}(x_i)$. Optionally, interpolate this time series over the duration
T of the original recording.
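Steps 2 and 4 of the algorithm above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the cycle locations are taken as given (step 1), the embedding is taken as given (step 3), a fixed window around each location stands in for the cycle, and cycles whose window leaves the recording are dropped. The function names `extract_cycles` and `dynamics_time_series` are hypothetical.

```python
import numpy as np

def extract_cycles(x, locations, left, right):
    """Step 2: cut `left` samples before and `right` samples after each location.

    Returns the (N, left + right + 1) matrix of cycles and the locations kept.
    Cycles whose window would leave the recording are dropped (an assumption).
    """
    locations = np.asarray(locations)
    keep = (locations - left >= 0) & (locations + right < len(x))
    kept = locations[keep]
    offsets = np.arange(-left, right + 1)
    return x[kept[:, None] + offsets[None, :]], kept

def dynamics_time_series(embedding, locations, fs, n):
    """Step 4: place embedding row i at time t_i / fs, then interpolate linearly
    onto the n sampling times of the original recording."""
    beat_times = np.asarray(locations) / fs
    grid = np.arange(n) / fs
    columns = [np.interp(grid, beat_times, embedding[:, k])
               for k in range(embedding.shape[1])]
    return np.column_stack(columns)

# Synthetic example: a 1 Hz sinusoid sampled at 100 Hz, one "cycle" per second.
fs = 100
x = np.sin(2.0 * np.pi * np.arange(1000) / fs)
locations = np.arange(50, 1000, 100)
cycles, kept = extract_cycles(x, locations, left=10, right=10)
```

Outside the first and last cycle times, `np.interp` holds the boundary values constant, which is one simple way of extending the recovered dynamics over the whole duration T.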
In the remainder of this chapter, we provide explicit testbeds featuring several suc-
cessful applications of the wave-shape oscillatory model and the associated algorithm
for recovering physiological dynamics to cardiovascular time series. The second and
third sections of this chapter continue work originally presented in [LMW19].
3.1 Heartbeat Classification
We provide a comfortable and systematic demonstration of how the wave-shape
oscillatory model can be applied to extract dynamical information from a single-lead
ECG signal. This section is organized in the following way. First, we provide an
exploration of how the wave-shape oscillatory model and the wave-shape manifold
model can be used to assess cardiac arrhythmia from an unsupervised perspective.
We show a concrete example of recovering the wave-shape manifold and visualizing
the dynamics on the recovered manifold. Then, we leverage the wave-shape oscillatory
model to train an automatic ventricular ectopy detector which achieves state-of-the-art
performance.
3.1.1 Unsupervised Extraction of Morphological Dynamics
The existing mathematical models for morphological dynamics in single-lead ECG
signals are limited, especially when dealing with pathological cases. In particular,
rapid changes in wave-shape pose serious problems to those models. The ECG signal
in Figure 3.1 is such a pathological signal. (It is record 208 from the MIT-BIH
Arrhythmia database available at PhysioNet.) The wave-shapes (delineated 250 ms
to the left and 400 ms to the right of each demarcated R wave) do not satisfy the
slowly varying assumption required by the generalized phenomenological model. A
model incorporating multiple components would require knowledge about the number
of distinct components in the signal, which a priori is not available to us. The
wave-shape oscillatory model is intended to overcome these difficulties. We use this
particular ECG signal (which features arrhythmic beats called premature ventricular
contractions) to provide a brief discussion of how the wave-shape oscillatory model
should be applied in practice before moving to a clinical application of these principles.
In Figure 3.2, we show the recovery of the wave-shape manifold obtained by
running the diffusion maps algorithm on the set of excised cardiac cycles. The
visualization we obtain amounts to a spectral clustering of the cardiac cycles in the
recording based on morphology. (There are 2953 annotated beats in the recording.)
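To obtain this embedding, we perform the following computations. Write the raw
ECG signal as x ∈ Rn, where fs is the sampling rate and n = 30 × 60 × fs is the
number of samples in this 30-minute recording. There are N = 2953 annotated R
waves provided by the expert annotator; write $r_i$ for the location (in samples) of the
i-th annotated R wave. Let $X \in \mathbb{R}^{N \times p}$ be the matrix of beats, where
$p = \lfloor 0.25 \times f_s \rfloor + \lfloor 0.4 \times f_s \rfloor + 1$ and
\[ X(i, j + \lfloor 0.25 \times f_s \rfloor + 1) = x(r_i + j), \qquad -\lfloor 0.25 \times f_s \rfloor \le j \le \lfloor 0.4 \times f_s \rfloor. \tag{3.1} \]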
We build a self-tuning affinity matrix W ∈ RN×N by setting
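\[ W(i, j) = \frac{1}{2} \left[ \exp\left( -\frac{\sum_{k=1}^{p} \left( X(i, k) - X(j, k) \right)^2}{\sigma_i^2} \right) + \exp\left( -\frac{\sum_{k=1}^{p} \left( X(i, k) - X(j, k) \right)^2}{\sigma_j^2} \right) \right], \tag{3.2} \]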
where σi is the median Euclidean distance from the i-th row of X to all other rows.
We then perform α-normalization with α = 1 to handle the (possibly) non-uniform
density function. We calculate
\[ W^{(\alpha)} = D^{-1} W D^{-1}, \tag{3.3} \]
where D ∈ RN×N is a diagonal matrix satisfying
\[ D(i, i) = \sum_{j=1}^{N} W(i, j). \tag{3.4} \]
We then calculate the isotropic diffusion kernel by setting
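\[ P = \left( D^{(\alpha)} \right)^{-1/2} W^{(\alpha)} \left( D^{(\alpha)} \right)^{-1/2}, \tag{3.5} \]
where D(α) ∈ RN×N is a diagonal matrix satisfying
\[ D^{(\alpha)}(i, i) = \sum_{j=1}^{N} W^{(\alpha)}(i, j). \tag{3.6} \]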
We obtain the top two non-trivial eigenvectors of P with their corresponding eigenval-
ues, namely
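\[ \phi_1, \phi_2 \in \mathbb{R}^N, \qquad 1 > \lambda_1 \ge \lambda_2. \tag{3.7} \]
Finally, we set the truncated time-1 diffusion map to be the matrix
\[ \Phi_1 = \begin{bmatrix} \lambda_1 \varphi_1 & \lambda_2 \varphi_2 \end{bmatrix} \in \mathbb{R}^{N \times 2}, \tag{3.8} \]
where $\varphi_i = \left( D^{(\alpha)} \right)^{-1/2} \phi_i \in \mathbb{R}^N$ are the top non-trivial (unnormalized) eigenvectors
of the diffusion kernel. To visualize the dynamics on the recovered wave-shape
manifold, we plot the time series $r_i/f_s \mapsto -\varphi_1(i)$, where $\varphi_1$ is the first non-trivial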
eigenvector of the diffusion kernel. In Figure 3.3, we have additionally performed
linear interpolation over the duration of the recording. To verify the meaning of the
extracted dynamics, we compare this time series with the interpolated time series
of manual beat annotations. The negative sign in front of ϕ1 is placed so that the
time series are similarly oriented. Evidently, they are highly correlated. From this
compressed representation of the information in our pathological ECG signal, we
can ascertain a number of quantities useful to physicians. First, we can assess the
frequency at which premature ventricular contractions occur. We can also assess any
temporal coupling between the triggering of normal and abnormal beats. Finally, we
can determine the number of distinct cardiac cycle morphologies present in the signal,
as well as the geometric relationship between those morphologies. In particular, we
Figure 3.1: A pathological ECG signal featuring morphologically distinct premature ventricular contractions and fusion beats. In this case, the wave-shape is not slowly varying.
notice in Figure 3.2 that the fusion beats are located in between the normal beats
and the premature ventricular contractions, as expected from the definition. This
time series is rarely studied over the long term, partially due to the cost and difficulty
of obtaining a correct sequence of beat annotations. However, with the wave-shape
oscillatory model and its associated algorithm for dynamics recovery, we are making
a step in this direction. Below, we present a clinical application of these principles.
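The computations in (3.2)-(3.8) can be sketched in NumPy as follows. This is a minimal illustration, not the code used to produce the figures: the function name `diffusion_map` is hypothetical, the squared distances are computed through the Gram matrix to keep memory at O(N²), and a dense symmetric eigensolver is assumed to be affordable for N of a few thousand.

```python
import numpy as np

def diffusion_map(X, q=2):
    """Truncated time-1 diffusion map of the rows of X, following (3.2)-(3.8).

    X : (N, p) matrix whose rows are the excised cycles.
    q : number of non-trivial coordinates to keep.
    Returns the (N, q) embedding and the q non-trivial eigenvalues.
    """
    norms = np.sum(X ** 2, axis=1)
    sq = np.maximum(norms[:, None] + norms[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(sq, 0.0)
    dist = np.sqrt(sq)
    np.fill_diagonal(dist, np.nan)                  # exclude the row itself
    sigma = np.nanmedian(dist, axis=1)              # median distance to other rows
    W = 0.5 * (np.exp(-sq / sigma[:, None] ** 2)
               + np.exp(-sq / sigma[None, :] ** 2))  # self-tuning affinity (3.2)
    D = W.sum(axis=1)                               # (3.4)
    W_alpha = W / np.outer(D, D)                    # alpha = 1 normalization (3.3)
    D_alpha = W_alpha.sum(axis=1)                   # (3.6)
    P = W_alpha / np.sqrt(np.outer(D_alpha, D_alpha))   # symmetric kernel (3.5)
    eigvals, eigvecs = np.linalg.eigh(P)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    lam = eigvals[order][1:q + 1]                   # drop the trivial eigenvalue 1
    phi = eigvecs[:, order][:, 1:q + 1]
    varphi = phi / np.sqrt(D_alpha)[:, None]        # (D_alpha)^(-1/2) phi
    return lam * varphi, lam                        # columns lambda_k * varphi_k (3.8)

# Synthetic stand-in for a matrix of beats (random data, for shape checking only).
rng = np.random.default_rng(0)
beats = rng.normal(size=(60, 16))
embedding, eigenvalues = diffusion_map(beats, q=2)
```

Working with the symmetric matrix P rather than the row-stochastic matrix lets the sketch use `numpy.linalg.eigh`, which returns real eigenvalues and orthonormal eigenvectors; the rescaling by (D^(α))^(-1/2) then recovers the eigenvectors of the diffusion kernel.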
3.1.2 Supervised Ventricular Ectopy Detection
We design an automatic algorithm for detecting ventricular ectopic beats in single-lead
ECG recordings. The classifier leverages the wave-shape manifold of each subject.
We build six features from each cardiac cycle, including a novel morphological feature
which describes the distance to the median beat in the recording. After an unsupervised
subject-specific normalization procedure, we train an ensemble binary classifier using
the AdaBoost algorithm. After training our classifier on subset DS1 of the MIT-BIH
Arrhythmia database, our classifier obtains a sensitivity of 94.75%, a positive predictive
Figure 3.2: A mapping of N = 2953 cardiac cycles into R2 afforded by the top two coordinates of the time-1 diffusion map. We color the embedded points using their associated annotations. Fuchsia points correspond to premature ventricular contractions, orange points correspond to normal beats, and teal points correspond to fusion beats.
Figure 3.3: A projection of the morphological dynamics on the recovered wave-shape manifold. In red, we plot the sequence of annotations provided by the expert annotator. In blue, we plot the time series $r_i/f_s \mapsto -\varphi_1(i)$, the first non-trivial eigenvector of the diffusion kernel. The abbreviation PVC refers to premature ventricular contractions.
value of 95.49%, and an F1 score of 95.12% on subset DS2 of the same database.
Our classifier achieves an F1 score of 91.23% on the St. Petersburg INCART 12-lead
Arrhythmia database. The six features (as well as the normalization method) we use
allow our ventricular ectopy detector to achieve high precision on unseen subjects
and databases. Our ventricular ectopy detector will be used to study the relationship
between premature ventricular contraction burden and undesirable patient outcomes
such as congestive heart failure and mortality.
Background and Significance
Ventricular ectopy is known to be a modifiable risk factor for congestive heart failure
(CHF) among certain populations. The exact relationship between PVC frequency
and both CHF and mortality is still being studied [DDV+15]. It is also known that
polymorphic PVCs are associated with a higher incidence of mortality, hospitalization,
transient ischemic attack, new-onset atrial fibrillation, and new-onset heart
failure independent of other clinical risk factors [LCL+15]. Moreover, PVC coupling
interval variability may lead to cardiomyopathy [HRC+17]. While quantities such
as PVC frequency are commonly encountered both in the clinic and in large-scale
epidemiological studies, the temporal coupling between PVCs and non-PVCs is less
frequently considered in the literature. On the other hand, it is well known that the
existence of PVCs has a significant impact on heart rate variability (HRV) analysis
[SHS01], and a correction must be carried out before analyzing HRV. In practice, PVC
frequency is assessed using long-term, mobile electrocardiogram (ECG) recordings.
These recordings are usually read by automatic ventricular ectopy detectors using
hardware such as the MARS 8000 PC Holter Monitoring System (GE Medical Systems,
Milwaukee, Wisconsin) and the PVC information is provided by the accompanying
proprietary algorithms.
In the literature, a variety of heartbeat classification algorithms have been proposed
to automatically annotate heartbeat type from the available ECG recordings (following
the AAMI standard [ECA87, EC598]), and the majority of these algorithms partition
an individual’s heartbeat set into four classes, namely normal beats, supraventricular
ectopic beats, ventricular ectopic beats, and fusion beats (i.e., fusions of normal and
ventricular ectopic beats). Traditionally, researchers design automatic classification
algorithms based on carefully chosen morphological features and heart rate [DCOR04,
LM10, LM12, YKC12, HFSW17]. Recently, the trend has been to use deep neural
networks (DNNs) to classify cardiac rhythm. Although DNNs are still black boxes,
various classification results have been published [AOH+17, WH17, XNC+18]. While
existing algorithms have the potential to help physicians identify various types of
arrhythmia, the classification accuracy for each individual beat type (PVCs, in
particular) is usually not ideal. To the best of our knowledge, only a few works focus
Table 3.1: Annotated single-lead ECG databases

Database         MIT-BIH Arrhythmia DS1    MIT-BIH Arrhythmia DS2    St. Petersburg INCART Arrhythmia
Non-VEB          47233                     46491                     155888
VEB              3788                      3221                      20011
Sampling Rate    360 Hz                    360 Hz                    257 Hz
Recordings       22                        22                        75
Purpose          Training                  Validation                Validation
Length           30 mins                   30 mins                   30 mins

on providing an accurate and efficient PVC detection algorithm that operates on
single-channel ECG recordings [ASO19, MAR10, MAR11, MAR13].
In this work, we propose a novel algorithm to identify whether a heartbeat is
a ventricular ectopic beat (i.e., a PVC or a ventricular escape beat). Behind our
algorithm is the principle (supported by a priori cardiac electrophysiological knowledge)
that ventricular ectopic beats are “distant” from the median beat in the recording.
The novelty is twofold. First, to handle the difficulty of inter-individual rhythm
classification, we introduce the notion of subject-specific normalization of features.
This means that our classifier is agnostic to each beat’s subject of origin. Second,
we introduce a novel beat morphology feature that is based on the quadratic form
distance [HSE+95] between histograms of heart beats. By convention, in most existing
works, training and validation of these classifiers is performed on the publicly-available
MIT-BIH Arrhythmia database [MM01]. In this work, we adopt this convention and
consider an additional database available at PhysioNet [GAG+00], namely the St.
Petersburg INCART 12-lead Arrhythmia database.
3.1.3 Methods
Annotated Databases
Two annotated databases (see Table 3.1) were used for the purposes of training and
validation. Forty-four recordings from the MIT-BIH Arrhythmia database [MM01] were
divided into two groups, labeled DS1 and DS2 [DCOR04]. (The database actually
contains 48 records, but four of them are conventionally discarded because they
contain paced beats.) Only the first available lead (a modified limb lead II) was
considered in this study, as our algorithm is designed to operate on a single channel.
All 44 recordings were 30 minutes in length and were sampled at 360 Hz. We also
considered all 75 recordings from the St. Petersburg INCART 12-lead Arrhythmia
Database. In this database, we used the second available lead (the limb lead II)
because of the low quality of the first available lead. Recordings were 30 minutes
long and were sampled at 257 Hz. Both databases are available online at PhysioNet
[GAG+00]. PhysioNet’s annotation system divides cardiac cycles into 19 subgroups
following the AAMI standard [ECA87, EC598]. Our algorithm distinguishes PVCs
(annotation code V) and ventricular escape beats (annotation code E) from all other
beat types. This is a binary classification problem; we consider the positive class to
be the class consisting of the PVCs and the ventricular escape beats.
Proposed Algorithm
Our proposed algorithm takes as input a raw electrocardiogram x ∈ Rn in units of
mV, its sampling rate fs, and the locations (in samples) of all QRS complexes in
the recording. Suppose there are N detected QRS complexes in x, and that the
i-th QRS complex is located at sample ri. Our algorithm determines which of the
observed QRS complexes correspond to either a premature ventricular contraction
or a ventricular escape beat. We apply a standard high-accuracy QRS detector to x.
If the i-th QRS complex location is matched to any of the predicted QRS complex
locations provided by the QRS detector, we replace ri with this value. This step is
to ensure that the algorithm is not confounded by the method used to obtain the
QRS complex locations. We apply a bi-directional 3rd order Butterworth bandpass
filter to x with cutoff frequencies 0.5 Hz and 40 Hz to remove baseline wandering and
to suppress noise. We then upsample the signal to a sampling rate of fs = 2000 Hz.
We use the same notation ri to denote the location of the i-th QRS complex in the
upsampled signal. Denote the filtered and upsampled signal as x ∈ Rn. We associate
six features to each QRS complex. We build six vectors Y1, ..., Y6 ∈ RN so that Yi(j)
is the value of the i-th feature at the j-th QRS complex. We begin by calculating
• the time difference between the current beat and the previous beat, specifically
\[ X_1(i) := \frac{r_i - r_{i-1}}{f_s}; \tag{3.9} \]
• the time difference between the current beat and the next beat, specifically
\[ X_2(i) := \frac{r_{i+1} - r_i}{f_s}; \tag{3.10} \]
• the electric potential (in the filtered ECG) at the location of the detected QRS
complex, specifically
\[ X_3(i) := x(r_i). \tag{3.11} \]
These features are normalized by a procedure we call “subject-specific normalization.”
This normalization is also time-specific. In particular, for each 1 ≤ i ≤ 3, we define
the normalized features for the current recording to be
\[ Y_i(j) = X_i(j) - \operatorname{med}\{ X_i(k) : |j - k| \le 500 \}, \tag{3.12} \]
where med denotes taking the median. Note that this operation corresponds to
removing the output of a median filter (with window size 1001 QRS complexes) from
the signal Xi. The final three features are morphological. To calculate these features,
we first splice a portion of each cardiac cycle from the recording. Set $a = \lfloor 0.125 f_s \rfloor$
and $b = \lfloor 0.2 f_s \rfloor$. Obtain a matrix $B \in \mathbb{R}^{N \times (a+b+1)}$ by setting
\[ B(i, t + a + 1) = x(r_i - t), \tag{3.13} \]
where i is the beat index and t is an integer satisfying −a ≤ t ≤ b. The intention is
that the i-th row of B corresponds to a large portion of the i-th cardiac cycle. We
now associate three distances with each cardiac cycle, namely
• the squared Euclidean distance to the median beat, specifically
\[ Y_4(i) = \sum_{t=1}^{a+b+1} \left[ B(i, t) - \operatorname{med}\{ B(j, t) : |i - j| \le 500 \} \right]^2; \tag{3.14} \]
• the squared diffusion distance [CL06] to the median embedded beat, which we
will define shortly;
• the quadratic form [HSE+95] (or cross-bin) distance from the current beat’s
histogram to the median histogram, which we will define shortly.
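The first four features can be sketched as follows. This is an illustrative NumPy version under several assumptions that the text does not specify: the moving median is truncated at the ends of the recording, the first and last beats reuse the nearest available RR interval, and every excised window is assumed to lie inside the recording. The names `rolling_median` and `first_four_features` are hypothetical.

```python
import numpy as np

def rolling_median(values, half_window=500):
    """Median over indices k with |j - k| <= half_window, truncated at the ends.
    Works on a 1-D signal or, row by row, on a 2-D matrix (median per column)."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for j in range(len(values)):
        lo, hi = max(0, j - half_window), min(len(values), j + half_window + 1)
        out[j] = np.median(values[lo:hi], axis=0)
    return out

def first_four_features(x, r, fs, half_window=500):
    """Features Y1, ..., Y4 for a filtered signal x with QRS locations r (samples)."""
    r = np.asarray(r)
    rr = np.diff(r) / fs
    X1 = np.concatenate([[rr[0]], rr])     # (3.9); first beat reuses next interval (assumption)
    X2 = np.concatenate([rr, [rr[-1]]])    # (3.10); last beat reuses previous interval (assumption)
    X3 = x[r]                              # (3.11)
    Y123 = [X - rolling_median(X, half_window) for X in (X1, X2, X3)]   # (3.12)
    a, b = int(0.125 * fs), int(0.2 * fs)
    t = np.arange(-a, b + 1)
    B = x[r[:, None] - t[None, :]]         # (3.13), with the index r_i - t as written
    Y4 = np.sum((B - rolling_median(B, half_window)) ** 2, axis=1)      # (3.14)
    return np.column_stack(Y123 + [Y4])

# Synthetic example: unit impulses every 0.8 s at fs = 100 Hz, one beat with amplitude 3.
x = np.zeros(5000)
r = np.arange(100, 4900, 80)
x[r] = 1.0
x[r[10]] = 3.0
features = first_four_features(x, r, fs=100)
```

In this toy signal only the tenth beat differs from the rest, so it is the only beat with non-zero normalized amplitude and non-zero distance to the median beat. The explicit loop in `rolling_median` favors clarity over speed.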
To calculate the fifth feature (the diffusion distance to the median embedded
beat), we carry out the following steps. Build an affinity matrix W ∈ RN×N using
the formula
\[ W(i, j) = \exp\left[ -\frac{\sum_{k=1}^{a+b+1} \left( B(i, k) - B(j, k) \right)^2}{200} \right]. \tag{3.15} \]
We α-normalize the affinity matrix (with α = 1) by setting
\[ W^{(\alpha)} = D^{-1} W D^{-1}, \tag{3.16} \]
where D ∈ RN×N is a diagonal matrix satisfying
\[ D(i, i) = \sum_{j=1}^{N} W(i, j). \tag{3.17} \]
We then calculate the isotropic diffusion kernel by setting
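\[ P = \left( D^{(\alpha)} \right)^{-1/2} W^{(\alpha)} \left( D^{(\alpha)} \right)^{-1/2}, \tag{3.18} \]
where D(α) ∈ RN×N is a diagonal matrix satisfying
\[ D^{(\alpha)}(i, i) = \sum_{j=1}^{N} W^{(\alpha)}(i, j). \tag{3.19} \]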
We obtain the top 80 non-trivial eigenvectors of P with their corresponding eigenvalues,
namely
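\[ \{\phi_i\}_{i=1}^{80} \subset \mathbb{R}^N, \qquad 1 > \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{80}. \tag{3.20} \]
Finally, we set the truncated time-1 diffusion map to be the matrix
\[ \Phi_1 = \begin{bmatrix} \lambda_1 \left( D^{(\alpha)} \right)^{-1/2} \phi_1 & \cdots & \lambda_{80} \left( D^{(\alpha)} \right)^{-1/2} \phi_{80} \end{bmatrix} \in \mathbb{R}^{N \times 80}. \tag{3.21} \]
The fifth feature (associated with the i-th cardiac cycle) is obtained by setting
\[ Y_5(i) = \sum_{t=1}^{80} \left[ \Phi_1(i, t) - \operatorname{med}\{ \Phi_1(j, t) : |i - j| \le 500 \} \right]^2. \tag{3.22} \]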
To obtain the sixth feature, we first calculate the histogram of each cardiac cycle.
Define a vector of bin edges
\[ E = (-2, -11/6, \ldots, -1/6, 0, 1/6, \ldots, 11/6, 2) \in \mathbb{R}^{25} \tag{3.23} \]
and a matrix of histograms H ∈ RN×24 by setting
\[ H(i, j) = \left| \{ B(i, t) : 1 \le t \le a + b + 1 \} \cap [E_j, E_{j+1}) \right| \tag{3.24} \]
for each 1 ≤ i ≤ N and each 1 ≤ j ≤ 24. Note that H(i, j) counts the number of
samples making up the i-th beat which are greater than or equal to Ej but less than
Ej+1. Note that the sum of each row of H is a + b + 1, and it is unnecessary to
normalize the histograms so that each sums to 1. We locally estimate the median
histogram by introducing a new matrix G ∈ RN×24 whose entries are
\[ G(i, j) = \operatorname{med}\{ H(k, j) : |i - k| \le 500 \}. \tag{3.25} \]
In fact, we obtain G by applying a moving median filter to the columns of H. We
define a matrix of bin similarities M ∈ R24×24 by setting
\[ M(p, q) = \exp\left[ -\frac{(E_p - E_{p+1} - E_q + E_{q+1})^2}{2} \right] \tag{3.26} \]
for all 1 ≤ p, q ≤ 24. The sixth and final feature, namely the quadratic form distance
from the current beat’s histogram rowi(H) to the median histogram rowi(G), is
calculated by the formula
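\[ Y_6(i) = \left( \operatorname{row}_i(H) - \operatorname{row}_i(G) \right) M \left( \operatorname{row}_i(H) - \operatorname{row}_i(G) \right)^{\top}. \tag{3.27} \]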
The six features Y1(i), ..., Y6(i) are fed into an ensemble binary classifier to determine if
the i-th cardiac cycle is of the positive class (either a premature ventricular contraction
or a ventricular escape beat). The classifier consists of 50 weak binary decision tree
learners each with a maximum of 4 splits, and the ensembling method was the
AdaBoost algorithm [FS95, Fri01, FHT+00, SFB+98]. In particular, we used the
AdaBoostM1 functionality in MATLAB 2019a with the default parameters.
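For the sixth feature described above, the histogram and quadratic form computations in (3.23)-(3.27) can be sketched as follows. The function name `quadratic_form_feature` is hypothetical, and the bin-similarity matrix is passed in as an argument; the matrix built in the example uses a Gaussian similarity between bin centres, which is an assumption of this sketch rather than a transcription of (3.26). The moving median is truncated at the ends of the recording, which is also an assumption.

```python
import numpy as np

def quadratic_form_feature(B, M, edges, half_window=500):
    """Quadratic form distance between each beat's histogram and the local
    median histogram.

    B     : (N, L) matrix of excised beats.
    M     : (n_bins, n_bins) bin-similarity matrix.
    edges : (n_bins + 1,) increasing bin edges.
    """
    # np.histogram uses half-open bins [E_j, E_{j+1}), except that the last bin
    # also includes its right edge; samples outside the edges are ignored.
    H = np.stack([np.histogram(row, bins=edges)[0] for row in B]).astype(float)
    N = H.shape[0]
    G = np.empty_like(H)
    for i in range(N):
        lo, hi = max(0, i - half_window), min(N, i + half_window + 1)
        G[i] = np.median(H[lo:hi], axis=0)       # moving median down each column of H
    diff = H - G
    return np.einsum("ip,pq,iq->i", diff, M, diff)   # (3.27), one value per beat

edges = np.linspace(-2.0, 2.0, 25)                # 25 equally spaced edges, 24 bins
centres = 0.5 * (edges[:-1] + edges[1:])
# Assumed similarity for illustration: Gaussian in the distance between bin centres.
M = np.exp(-0.5 * (centres[:, None] - centres[None, :]) ** 2)

# Synthetic beats: nineteen identical rows and one row with a narrower amplitude range.
B = np.tile(np.linspace(-1.0, 1.0, 33), (20, 1))
B[5] = np.linspace(-0.2, 0.2, 33)
Y6 = quadratic_form_feature(B, M, edges)
```

A cross-bin similarity of this kind makes the distance tolerant to small amplitude shifts, since mass moving between neighbouring bins is penalized less than mass moving between distant bins.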
Evaluation
We carry out the following training procedure. First, we train the ensemble binary
classifier on all of the cardiac cycles in DS1. Then, we validate the trained classifier
on the cardiac cycles in DS2. We did not use the DS2 database during any point
of the training stage. Note that beats from the same subject are not distributed
across both DS1 and DS2. We carry out a further validation by applying the trained
classifier to all of the cardiac cycles in the St. Petersburg INCART database.
We use a number of evaluation metrics derived from the number of true positives
(TP), false positives (FP), true negatives (TN), and false negatives (FN). We calculate
the sensitivity (SEN), specificity (SPE), and positive predictive value (PPV) using
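the formulas
\begin{align}
\mathrm{SEN} &= 100\% \times \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, & \mathrm{SPE} &= 100\% \times \frac{\mathrm{TN}}{\mathrm{FP} + \mathrm{TN}}, \tag{3.28} \\
\mathrm{PPV} &= 100\% \times \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}. \tag{3.29}
\end{align}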
The F1 score is the harmonic mean of SEN and PPV and serves as our main summary
statistic. Due to the strong imbalance between the positive and negative classes, SPE
is not as informative as SEN and PPV for assessing model performance.
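As a worked check of these definitions, the short Python sketch below evaluates them on the DS2 confusion counts reported in Table 3.2. The function name `detection_metrics` is hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, positive predictive value and F1 score, in percent."""
    sen = 100.0 * tp / (tp + fn)
    spe = 100.0 * tn / (fp + tn)
    ppv = 100.0 * tp / (tp + fp)
    f1 = 2.0 * sen * ppv / (sen + ppv)   # harmonic mean of SEN and PPV
    return {"SEN": sen, "SPE": spe, "PPV": ppv, "F1": f1}

# Counts for DS2 taken from Table 3.2.
ds2 = detection_metrics(tp=3052, fp=144, tn=46347, fn=169)
print({name: round(value, 2) for name, value in ds2.items()})
# {'SEN': 94.75, 'SPE': 99.69, 'PPV': 95.49, 'F1': 95.12}
```

With 46491 negative beats against 3221 positive beats in DS2, the 144 false positives barely move SPE away from 100%, while the same errors lower PPV by several points; this is the imbalance effect noted above.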
3.1.4 Results
After training the ensemble binary classifier on DS1, we report in Table 3.2 a variety
of evaluation metrics which describe the classifier’s performance on DS2 and the
St. Petersburg INCART database. We also report the performance on the training
database (DS1) to show that our model has not been over-fit to the training data.
A SEN of 94.75% on DS2 means that our classifier correctly detects roughly 95 out of every
100 ventricular ectopic beats. A PPV of 95.49% means that every ventricular ectopy
detection made by our algorithm has a 95.49% chance of being a ventricular ectopic
beat. We do notice, however, that the F1 score on the INCART database is lower
than the F1 score on DS2. This result could be due to the fact that the recordings
were made on different devices and annotated by different individuals. This result
could also be due to differences in the densities of certain pathologies within each
database.
Comparison with Previous Results
We compare our algorithm’s performance to the reported performance of previous
algorithms (see Table 3.3). Llamedo and Martinez [LM10] design an automatic heartbeat
classifier which divides cardiac cycles into four categories, one of which is the ventricular
Table 3.2: Performance of the proposed ventricular ectopy detector

Database     MIT-BIH Arrhythmia DS1    MIT-BIH Arrhythmia DS2    St. Petersburg INCART Arrhythmia
Purpose      Training                  Validation                Validation
TP           3530                      3052                      17648
FP           213                       144                       1028
TN           47020                     46347                     154860
FN           258                       169                       2363
SEN (%)      93.19                     94.75                     88.19
SPE (%)      99.55                     99.69                     99.34
PPV (%)      94.31                     95.49                     94.50
F1 (%)       93.75                     95.12                     91.23
ACC (%)      99.08                     99.37                     98.07
AUC          0.9989                    0.9981                    99.31

TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; SEN: sensitivity; SPE: specificity; PPV: positive predictive value; F1: F1 score; ACC: accuracy; AUC: area under the ROC curve.

ectopy class we consider in this work. They consider two-lead recordings. They select
eight features based on the RR interval time series and the autocorrelation of the
discrete wavelet transform of each ECG lead. They employ a linear discriminant
classifier, and they report a SEN of 82.9% and a PPV of 88.0% on DS2 when identifying
the ventricular ectopy class (although three ventricular ectopic beats appear to be
missing from the reference set). While their algorithm’s performance is certainly
below the performance of our algorithm, the fact that their algorithm was designed to
solve a multiclass problem indicates that a lower performance on the ventricular class
is to be expected. Llamedo and Martinez also consider a semi-automatic approach
to heartbeat classification [LM12]. Their previously considered model is modified to
allow expert assistance. After manually annotating 12 carefully selected beats from
each recording, they achieve a SEN of 92.2% and a PPV of 94.8% when detecting
premature ventricular contractions, ventricular escape beats, and fusions of ventricular
Table 3.3: Comparison with previous results on DS2

Reference                                              SEN (%)   PPV (%)   F1 (%)   Notes
de Chazal et al. [DCOR04, Table 8]                     77.7      81.6      79.6     Multi-class, Multi-lead
Llamedo & Martinez [LM10, Table 6]                     82.9      88.0      85.4     Multi-class, Multi-lead
Llamedo & Martinez [LM12, Discussion]                  92.2      94.8      93.5     Expert-assisted, Multi-class, Multi-lead
Oster et al. [OBS+15, Table 4]                         92.7      96.2      94.5     Expert-assisted, Multi-lead
Alfaras et al. [ASO19, Table 7]                        92.7      95.7      94.2
Sultan Qurraie & Ghorbani Afkhami [QA17, Table 4]      95.4      94.1      94.8     Multi-class
This work                                              94.8      95.5      95.1

and normal beats (PhysioNet annotation code F). Since our ventricular ectopy class
does not contain the fusion class, this result may not be comparable. Another expert-
assisted approach which features manual annotation of an average of three beats from
each 30-minute recording in DS2 achieved an F1 score of 94.5% [OBS+15] on the
same binary classification problem we consider in this work. Oster et al. consider
two leads, and they report that discarding noisy beats (3.2% of all beats in DS2) can
improve the F1 score to 98.6%. For ultra-long-term signals (≥ 14 days) we expect
that the number of beats which need to be manually annotated in order to achieve
the same performance would increase.
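The F1 column of Table 3.3 can be cross-checked against the SEN and PPV columns, since the F1 score is the harmonic mean of sensitivity and positive predictive value. The short Python sketch below recomputes it for a few rows; the function name `f1_from_sen_ppv` is ours and purely illustrative. Discrepancies in the last digit are possible because the tabulated SEN and PPV are themselves rounded.

```python
def f1_from_sen_ppv(sen, ppv):
    """Harmonic mean of sensitivity and positive predictive value (both in %)."""
    return 2.0 * sen * ppv / (sen + ppv)

# (SEN, PPV, reported F1) copied from Table 3.3.
rows = {
    "de Chazal et al.": (77.7, 81.6, 79.6),
    "Llamedo & Martinez (LM10)": (82.9, 88.0, 85.4),
    "Alfaras et al.": (92.7, 95.7, 94.2),
    "This work": (94.8, 95.5, 95.1),
}

for name, (sen, ppv, reported) in rows.items():
    recomputed = f1_from_sen_ppv(sen, ppv)
    print(f"{name}: recomputed F1 = {recomputed:.1f}, reported = {reported}")
```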
A recent single-lead automatic ventricular ectopy detector which uses ring-topology
reservoir computing (echo state networks) achieved a SEN of 92.7% and a PPV of
95.7% when detecting the same set of ventricular ectopic beats that we consider in this
work [ASO19, Table 7]. Alfaras et al. also report performance on the second available
lead in the MIT-BIH Arrhythmia database; they obtain a SEN of 86.1% and a PPV of
75.1% on DS2, but we obtain a SEN of 79.3% and a PPV of 89.6%. In our experiments,
their ensemble of 30 networks (with 1000 nodes each) ironically suffers from long
computation times. Nevertheless, their reported performance is not as strong as what
we are able to achieve. We report one final heartbeat classifier by Sultan Qurraie and
Ghorbani Afkhami which uses time-frequency analysis (the Wigner-Ville distribution)
[QA17]. They achieve a SEN of 95.4% and a PPV of 94.1% on DS2 using only the first
lead (although three ventricular ectopic beats appear to be missing from DS2). While
this result is roughly equivalent to ours, the algorithm is undoubtedly numerically
inefficient due to its use of time-frequency methods. Moreover, since the authors do
not validate on multiple databases, their parameters may be over-tuned to perform
well on DS2.
3.1.5 Discussion
In this study, we design and validate an automatic, single-lead ventricular ectopy
detector which achieves an F1 score of 95.12% on the validation database DS2. The
algorithm uses subject-specific normalization of six features derived from the RR
interval time series and beat morphology. We ensure that our algorithm is not
confounded by the QRS complex annotation method (whether it be automatic or
manual) by, when possible, replacing each QRS complex annotation with one generated
by our own high-accuracy QRS detector. We also propose a novel feature to describe
beat morphology which involves comparing heartbeat histograms using the cross-bin
distance. The final classifier we use to solve the binary classification problem is an
ensemble of binary decision trees, aggregated using the AdaBoost algorithm with 50
learning cycles. Our detector generalizes well to other databases, achieving an F1
score of 91.23% on the St. Petersburg INCART Arrhythmia database.
Our algorithm is intended to be able to perform on new systems in mobile health
monitoring such as the ZIO Patch cardiac monitor (iRhythm Technologies, Inc., San
Francisco, California, USA), so it is not designed to leverage all of the information
present in a multi-lead ECG recording. However, it appears that for ventricular ectopy
detection, one lead is enough to achieve a satisfactory result. In the future, we will
report work which leverages multiple leads to obtain information from ECG recordings
with higher precision. We note that since we use the diffusion maps algorithm in
this work (which does not scale well with the number of beats in a recording), long
recordings should be analyzed in 30-minute segments.
We provide a brief discussion of technical details in this work. The first one is the
use of histograms, as well as the so-called quadratic form distance between histograms.
Niblack et al. describe the use of histograms and the quadratic form distance for
searching large databases of images [NBE+93]. Hafner et al. describe the use of
histograms and the quadratic form distance for performing color-based indexing of
images [HSE+95]. Rubner et al. compare the cross-bin distance to the earth mover’s
distance [RPTB01, RTG00]. Histograms have been used for classifying images and
textures in images [HGN04, LO06]. The quadratic form distance we use in this work
is a Bregman divergence with a positive-definite kernel, but as far as we know, its
specific theoretical properties have not been studied. The properties it apparently
enjoys are robustness to translation (of the beat, due to possible misalignment of the
R peak) and robustness to measurement noise. Further work is required to verify the
properties of this comparison tool, but we find its usefulness evident in this application.
Since our algorithm uses the distance to the median beat to detect ventricular ectopy,
we are implicitly assuming that the most common beat in each recording is not a
ventricular ectopic beat. We believe that this assumption is warranted, as the subject
in DS2 with the highest percentage of ventricular ectopic beats had only 32%.
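To illustrate why a cross-bin (quadratic form) distance tolerates small translations, the following Python sketch compares two one-bin histograms under a bin-by-bin distance and under a quadratic form distance. The similarity matrix `A` used here (a Gaussian kernel on bin indices) and the bin count are assumptions chosen for illustration; they are not the matrix or the histograms used in this work.

```python
import numpy as np

def quadratic_form_distance(h, g, A):
    """sqrt((h - g)^T A (h - g)) for histograms h, g and a similarity matrix A."""
    d = np.asarray(h, dtype=float) - np.asarray(g, dtype=float)
    return float(np.sqrt(d @ A @ d))

n_bins = 10
idx = np.arange(n_bins)
# Hypothetical bin-similarity matrix: Gaussian kernel on bin indices (positive definite).
A = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2.0 * 2.0 ** 2))
identity = np.eye(n_bins)  # the identity recovers the bin-by-bin Euclidean distance

h = np.zeros(n_bins)
h[3] = 1.0                 # all mass in bin 3
g_near = np.zeros(n_bins)
g_near[4] = 1.0            # same shape shifted by one bin
g_far = np.zeros(n_bins)
g_far[8] = 1.0             # same shape shifted by five bins

# Bin-by-bin: both shifts look equally far away.
print(quadratic_form_distance(h, g_near, identity), quadratic_form_distance(h, g_far, identity))
# Cross-bin: the small shift is penalized much less than the large one.
print(quadratic_form_distance(h, g_near, A), quadratic_form_distance(h, g_far, A))
```

With the identity matrix a one-bin shift is indistinguishable from a five-bin shift, whereas the cross-bin version ranks them sensibly, which is the behaviour one wants when the R peak may be slightly misaligned.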
In the algorithm of Llamedo and Martinez [LM10, LM12], both the current RR
interval and the average of the previous RR intervals are considered as features (after
applying the natural logarithm). Of course, this approach differs from ours in two
significant ways. First, the average is used instead of the median; while the median
operator is insensitive to outliers, the mean operator is not. Second, we (roughly
speaking) consider the difference between these features, while Llamedo and Martinez
feed them separately into the classifier. We believe Llamedo and Martinez’s approach
leads to over-fitting, as the classifier will attempt to draw connections between baseline
heart rate and PVC frequency. In general, it is very easy for a classifier which relies on
non-relative features such as heart rate or beat morphology to over-fit to the training
data. In the case of heart rate, the reason is that the classifier is provided with a
large number of PVCs with similar RR intervals. In the case of beat morphology, the
reason is that the classifier is provided with a large number of morphologically similar
PVCs. If the resolving power of the classifier is high enough (such as with a deep
convolutional neural network [LBBH98]), the model will be able to recognize only
those ventricular ectopic beats which demonstrate high morphological similarity to
at least one PVC morphology in the training database. However, since the training
database is small (in terms of the number of distinct PVC morphologies available
to us), and due to the large variety of distinct PVC morphologies encountered in
practice, this kind of classifier is undesirable.
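The difference between a median-based and a mean-based baseline can be seen on a toy RR series. The sketch below is only a rough illustration of a relative RR feature (log of the current interval minus log of a baseline over the preceding intervals); the window length, the function name `relative_rr`, and the data are assumptions, not the feature definition used in this work.

```python
import math
import statistics

def relative_rr(rr, i, window=5, center=statistics.median):
    """log(RR_i) minus the log of the centre of the preceding `window` RR intervals."""
    previous = rr[max(0, i - window):i]
    return math.log(rr[i]) - math.log(center(previous))

# Toy RR intervals in seconds: a steady rhythm with one spuriously long
# interval at index 3 (for example, a missed beat detection).
rr = [0.80, 0.82, 0.79, 1.60, 0.81, 0.80, 0.80]

i = 6  # a normal beat shortly after the outlier
with_median = relative_rr(rr, i, center=statistics.median)
with_mean = relative_rr(rr, i, center=statistics.mean)
print(with_median, with_mean)
```

The median-based value stays close to zero for the normal beat, while the mean-based value is pulled away from zero by the single outlier in the window.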
We had the option to use the temporal coupling between PVCs and non-PVCs as
a feature in our classifier (via, for example, long short-term memory networks [HS97]).
Doing so led to an increased F1 score on DS2 but a decreased F1 score on the INCART
database. Since these scores were only slightly better than what we obtained using a
boosted ensemble of decision trees (except on the INCART database), we decided
that the increased computational load was not worth the improvement. An advantage
to using decision trees to perform classification is that we do not have to map features
into a small interval by a process such as z-scoring or min-max scaling before training
the classifier. As the problem of ventricular ectopy detection is largely a problem of
anomaly detection in which we cannot assume our features are distributed normally
(or uniformly) and the number of outliers is high, this kind of normalization step
would be inappropriate. All parameters used in the training algorithm were chosen
ad hoc and without performing any hyperparameter optimization. Our algorithm
is very fast and is suitable for analyzing long-term signals, but since we need to
resample the signal to 2000 Hz and apply the diffusion maps algorithm, the signal
should be analyzed in chunks in order to minimize memory usage. Without including
the QRS detection stage, the feature building and classification stages took only 330
milliseconds per 30-minute recording. All calculations were performed in MATLAB
2019a on an Intel i7-4790K processor at 3.60 GHz with 24 GB of RAM.
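The remark that tree-based classifiers do not need z-scoring or min-max scaling can be checked on the smallest possible tree. The sketch below fits a one-feature decision stump by exhaustive search and shows that its predictions are unchanged when the feature is rescaled by an increasing affine map. It is a toy stand-in written for illustration, not the boosted ensemble used in this work, and the data are made up.

```python
def fit_stump(x, y):
    """Exhaustively fit a one-feature decision stump; returns (threshold, label_above)."""
    values = sorted(set(x))
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0  # candidate split halfway between neighbours
        for label_above in (0, 1):
            errors = sum(
                (label_above if v > threshold else 1 - label_above) != t
                for v, t in zip(x, y)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    return best[1], best[2]

def predict(x, threshold, label_above):
    return [label_above if v > threshold else 1 - label_above for v in x]

# Toy feature with a heavy outlier, which would dominate a min-max scaling.
x = [0.10, 0.20, 0.25, 0.90, 1.40, 30.0]
y = [0, 0, 0, 1, 1, 1]
x_scaled = [1000.0 * v + 5.0 for v in x]

pred_raw = predict(x, *fit_stump(x, y))
pred_scaled = predict(x_scaled, *fit_stump(x_scaled, y))
print(pred_raw == pred_scaled)
```

Because a split only depends on the ordering of the feature values, any strictly increasing transformation leaves the partition of the training data unchanged.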
This study has several limitations. First, our algorithm suffers from limited
training data. While we follow standard procedure in the literature and train our
algorithm on DS1, an algorithm intended for clinical deployment should be trained
on a larger database or aggregation of databases. Another limitation of our algorithm
is that it requires an automatic signal quality classifier to identify when the ECG is
readable. The databases used in this study are well-behaved with respect to readability, so
this is not a serious issue here, but it should be considered when applying the algorithm in a
real-world setting.
3.2 Electrocardiogram-derived Respiration
Respiratory signals can be used for a variety of purposes; respiratory rate variability
and the variability inherent in the respiratory waveform are used to monitor respiratory
health and to predict respiratory failure [FCEH03]. In the pulmonary clinic, respiratory
rate is measured by counting the number of breaths a patient takes in a one-minute
period. Unfortunately, this process is error-prone [STSG85, SRB+91]. It also provides
minimal information about the respiratory system; the respiratory waveform holds
crucial information about the structure of the pulmonary system [Tob88]. Existing
methods for extracting this information are uncomfortable for the patient and not
ideal for long-term, mobile monitoring [FGHS02]. We are interested in estimating
respiratory rate and the respiratory waveform through easy-to-use, mobile-capable,
and non-invasive methods.
The respiratory signal can be extracted as the amplitude modulation of the ECG
signal. When the patient inhales, transthoracic electrical impedance increases, and the
ECG electrode moves further from the heart; consequently, the amplitude of the ECG
signal decreases. On the other hand, the amplitude of the ECG signal increases when
the patient exhales via the complementary mechanism. Electrocardiogram-derived
respiration (EDR) is an old technique which uses a multiple- or single-channel ECG
to estimate respiratory activity. Deriving the respiratory signal from a single-channel
ECG has been attempted using a number of methods. Traditionally, the EDR signal
is obtained by interpolating the time series of RS heights, defined as
\[ \mathrm{EDR}(r_i/f_s) := E(r_i) - E(s_i), \tag{3.30} \]
where $E$ is the discretized ECG signal, $f_s$ is the sampling rate, $r_i$ is the location of
the $i$-th R peak, and $s_i$ is the location of the following S peak; the interpolation uses piecewise cubic
splines [CAM+06, Chapter 8]. Another landmark-based method involves
estimating the area of each QRS complex [CAM+06, Chapter 8]. In [LBM10], principal
component analysis (PCA) is used to obtain an EDR signal, and in [WVD+12], kernel
principal component analysis (kPCA) is used. Since respiratory dynamics are known
to manifest as the amplitude modulation in the cardiac waveform, recovering the
respiratory signal via the wave-shape oscillatory model should be achievable. In this
section, we present a proof-of-concept approach to obtaining an EDR signal from a
non-traditional perspective.
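As a point of reference for the traditional approach in (3.30), the following Python sketch computes the RS heights from known R peak locations. The S peak is located here as the minimum within 60 ms after the R peak, the convention used for the traditional EDR in Section 3.2.2. The function name `rs_amplitudes` and the synthetic signal are illustrative assumptions, and the final cubic spline interpolation step is omitted.

```python
import numpy as np

def rs_amplitudes(ecg, r_peaks, fs, search_ms=60):
    """Return (times in s, E(r_i) - E(s_i)), with s_i the minimum of ecg just after each R peak."""
    search = int(round(search_ms * fs / 1000.0))
    times, heights = [], []
    for r in r_peaks:
        if r + search >= len(ecg):
            continue                        # skip beats whose search window leaves the record
        seg = ecg[r + 1 : r + search + 1]   # samples r+1, ..., r+search
        s = r + 1 + int(np.argmin(seg))
        times.append(r / fs)
        heights.append(ecg[r] - ecg[s])
    return np.array(times), np.array(heights)

# Synthetic ECG: Gaussian R waves and smaller negative S waves whose
# amplitude is modulated by a toy 0.25 Hz respiratory rhythm.
fs = 1000
t = np.arange(10 * fs)
r_peaks = np.arange(400, 10 * fs - 400, 800)
amp = 1.0 + 0.2 * np.sin(2 * np.pi * 0.25 * r_peaks / fs)
ecg = np.zeros(t.size)
for r, a in zip(r_peaks, amp):
    ecg += a * np.exp(-0.5 * ((t - r) / 5.0) ** 2)             # R wave
    ecg -= 0.3 * a * np.exp(-0.5 * ((t - r - 30) / 5.0) ** 2)  # S wave, 30 ms later

times, heights = rs_amplitudes(ecg, r_peaks, fs)
print(np.round(heights / amp, 3))  # roughly constant: RS height tracks the modulation
```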
3.2.1 Modeling Respiratory Dynamics in the ECG
Denote the ECG signal as $E : \mathbb{R} \to \mathbb{R}$, and assume it adheres to the wave-shape
oscillatory model. We assume that the phase manifold $\mathcal{N}$ hosts respiratory dynamics
in the following sense. Suppose there is a smooth vector field $X$ on $\mathcal{N}$ that describes
the respiratory drive (the instantaneous velocity of the respiratory center's input).
We let
\[ \theta_t : \mathcal{N} \to \mathcal{N} \tag{3.31} \]
be the flow of $X$. More generally, we could assume that $\theta_t$ satisfies the following
stochastic differential equation defined on $\mathcal{N}$:
\[ d\theta_t = X_{\theta_t}\,dt + d\omega_t, \tag{3.32} \]
where $\omega$ is Brownian motion on $\mathcal{N}$. In either case, $\theta_t'(p)$ describes the respiratory
drive after time $t$ with initial state $p \in \mathcal{N}$. In general, the phase manifold $\mathcal{N}$ is
not accessible. However, as discussed above, the ECG signal is a sensor which can
occasionally observe the phase manifold. To model the observation process, we assume
the existence of a diffeomorphism $\rho : \mathcal{N} \to \mathcal{M}$ that maps $\mathcal{N}$ to the wave-shape manifold
$\mathcal{M}$ associated with the ECG signal. In this case, the flow $\theta_t$ is mapped to a flow
$\rho_*\theta_t = \rho\theta_t$ defined on $\mathcal{M}$. Clearly, under the model (3.31), $\rho_*\theta_t$ is the flow associated
with the vector field $\rho_*X$ defined on $\mathcal{M}$. Under the model (3.32), by Itô's formula,
$\rho_*\theta_t$ satisfies the following stochastic differential equation defined on $\mathcal{M}$:
\[ d(\rho_*\theta_t) = \left( \tfrac{1}{2}\Delta\rho|_{\theta_t} + \nabla\rho|_{\theta_t} X_{\theta_t} \right) dt + \nabla\rho|_{\theta_t}\, d\omega_t. \tag{3.33} \]
This model can be generalized to other physiological oscillatory signals, although the
interpretation of $\theta_t$, $\mathcal{N}$, and $\rho$ will be different. In general, $\theta_t$ represents the intrinsic
respiratory dynamics, and $\rho_*\theta_t$ represents the observable respiratory dynamics. We
now extend this discussion to obtain a new ECG-derived respiration algorithm.
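To make (3.33) concrete, it may help to look at a deliberately simple, hypothetical one-dimensional case: take the phase space to be the real line, the drift to be a constant $c$, and $\rho$ a smooth increasing function. (This setting is an assumption made purely for illustration; it ignores compactness.) Itô's formula then reads

\[
d\theta_t = c\,dt + d\omega_t
\quad\Longrightarrow\quad
d\rho(\theta_t) = \left( \tfrac{1}{2}\rho''(\theta_t) + c\,\rho'(\theta_t) \right) dt + \rho'(\theta_t)\, d\omega_t ,
\]

which is (3.33) with $\Delta\rho = \rho''$ and $\nabla\rho = \rho'$. For instance, with $\rho(x) = e^x$ the observed process $Y_t = \rho(\theta_t)$ satisfies $dY_t = (\tfrac{1}{2} + c)\,Y_t\,dt + Y_t\,d\omega_t$, a geometric Brownian motion. The second-order term shows that the observed drift is not simply the pushforward of the intrinsic drift once noise is present.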
3.2.2 DM-EDR
We provide a detailed description of our proposed algorithm (coined DM-EDR) for
extracting respiratory dynamics from a single-lead ECG signal. The raw ECG signal
is first upsampled to a sampling rate of 1000 Hz. Noise in the ECG signal is suppressed,
baseline wandering is removed, and the P and T waves are suppressed by applying
a bi-directional third-order Butterworth bandpass filter with cutoff frequencies 3-40
Hz. We detect the R peaks in the recording by applying a standard high-accuracy
QRS detector [Elg13]. This step allows us to delineate cardiac cycles. Write the
pre-processed ECG signal as a vector $E \in \mathbb{R}^n$, where $n = f_s \times T$ is the number of
samples, $f_s = 1000$ Hz is the sampling rate of the signal, and $T$ is the duration of
the recording in seconds. Suppose there are $N$ detected R peaks in the ECG signal.
Write the location in samples of the $i$-th R peak as $r_i$. For the purpose of analyzing
the traditional EDR signal, the S peak following each R peak is determined by
\[ s_i = \arg\min_{r_i + 1 \le t \le r_i + 60} E(t), \tag{3.34} \]
where the 60-sample (60 ms) search window is chosen according to normal electrophysiology. By defining a window
surrounding each R peak that extends 100 ms to the left and 100 ms to the right,
we are able to excise each QRS complex (recall that the P and T waves have been
suppressed):
\[ X(i, j + 101) := E(r_i + j), \qquad -100 \le j \le 100. \tag{3.35} \]
Clearly, $X \in \mathbb{R}^{N \times p}$, where $p = 201$. According to the wave-shape manifold model,
the set of QRS complexes lies on a low-dimensional manifold embedded in $\mathbb{R}^p$. Note
that the row-ordering of $X$ implicitly encodes the temporal information, but we will
not use it to recover the wave-shape manifold. To recover the wave-shape manifold
and the respiratory dynamics it hosts, we apply the diffusion maps algorithm as
follows. We build a self-tuning affinity matrix $W \in \mathbb{R}^{N \times N}$ by setting
\[ W(i,j) = \frac{1}{2}\left[ \exp\left( -\frac{\sum_{k=1}^{p} \big(X(i,k) - X(j,k)\big)^2}{\sigma_i^2} \right) + \exp\left( -\frac{\sum_{k=1}^{p} \big(X(i,k) - X(j,k)\big)^2}{\sigma_j^2} \right) \right], \tag{3.36} \]
where $\sigma_i$ is the median Euclidean distance from the $i$-th row of $X$ to all other rows.
We then perform $\alpha$-normalization with $\alpha = 1$ to handle the (possibly) non-uniform
density function. We calculate
\[ W^{(\alpha)} = D^{-1} W D^{-1}, \tag{3.37} \]
where $D \in \mathbb{R}^{N \times N}$ is a diagonal matrix satisfying
\[ D(i,i) = \sum_{j=1}^{N} W(i,j). \tag{3.38} \]
We then calculate the isotropic diffusion kernel by setting
\[ P = \big(D^{(\alpha)}\big)^{-1/2} W^{(\alpha)} \big(D^{(\alpha)}\big)^{-1/2}, \tag{3.39} \]
where $D^{(\alpha)} \in \mathbb{R}^{N \times N}$ is a diagonal matrix satisfying
\[ D^{(\alpha)}(i,i) = \sum_{j=1}^{N} W^{(\alpha)}(i,j). \tag{3.40} \]
We obtain the top non-trivial eigenvector $\varphi \in \mathbb{R}^N$ of $P$, and we set $\phi \in \mathbb{R}^N$ to be the
vector
\[ \phi = \big(D^{(\alpha)}\big)^{-1/2} \varphi. \tag{3.41} \]
Finally, the DM-EDR is obtained by interpolating the function
\[ \text{DM-EDR}(r_i/f_s) := \phi(i) \tag{3.42} \]
over the duration of the recording at a uniform sampling rate of 10 Hz. (We use
piecewise cubic spline interpolation.) Note that, in the case of EDR, due to the
interactions between the eigenfunctions and the dynamics on the manifold, it is not
guaranteed which eigenfunction will most accurately capture the respiratory dynamics.
To reduce the chance of confounding (following a previous approach [WVD+12]), we
force our algorithm to rely solely on QRS complex morphology. In general, automating
a selection of the best eigenfunction requires further ingenuity. In the worst case, the
respiratory information may be spread non-linearly across multiple eigenfunctions. In
the following section, we provide an explicit example in which we need to adjust the
algorithm when a subject exhibits premature ventricular contractions (PVCs).
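The kernel construction in (3.36) to (3.41) can be sketched in a few lines of NumPy. The code below is a minimal illustration under stated assumptions, not the implementation used in this work (which was written in MATLAB): the function name `top_diffusion_eigenvector`, the synthetic "QRS complexes", and the toy respiratory modulation are all hypothetical, and filtering, R peak detection, and the final spline interpolation are omitted.

```python
import numpy as np

def top_diffusion_eigenvector(X):
    """Sketch of (3.36)-(3.41): rows of X are QRS complexes; returns phi in R^N."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)  # squared distances
    dist = np.sqrt(d2)
    n = X.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    # sigma_i: median Euclidean distance from row i to all other rows
    sigma = np.array([np.median(dist[i, off_diag[i]]) for i in range(n)])
    W = 0.5 * (np.exp(-d2 / sigma[:, None] ** 2) + np.exp(-d2 / sigma[None, :] ** 2))
    d = W.sum(axis=1)
    W_alpha = W / np.outer(d, d)                        # alpha-normalization with alpha = 1
    d_alpha = W_alpha.sum(axis=1)
    P = W_alpha / np.sqrt(np.outer(d_alpha, d_alpha))   # symmetric kernel, as in (3.39)
    evals, evecs = np.linalg.eigh(P)                    # eigenvalues in ascending order
    varphi = evecs[:, -2]                               # skip the trivial top eigenvector
    return varphi / np.sqrt(d_alpha)                    # as in (3.41)

# Synthetic beats: one fixed shape whose amplitude follows a toy 0.25 Hz respiration.
rng = np.random.default_rng(0)
n_beats = 150
beat_times = np.cumsum(0.8 + 0.05 * rng.standard_normal(n_beats))  # seconds
resp = np.sin(2 * np.pi * 0.25 * beat_times)
j = np.arange(-100, 101)                                # 201 samples per window
template = np.exp(-0.5 * (j / 10.0) ** 2)
X = (1.0 + 0.2 * resp)[:, None] * template[None, :]
X += 0.002 * rng.standard_normal(X.shape)

phi = top_diffusion_eigenvector(X)
print(abs(np.corrcoef(phi, resp)[0, 1]))  # close to 1 if phi tracks the modulation
```

The sign of an eigenvector is arbitrary, so only the absolute correlation with the toy respiratory signal is meaningful here. In this idealized example the wave-shape manifold is a curve parametrized by amplitude alone; real recordings are less clean, which is why the choice of eigenfunction discussed above matters.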
3.2.3 Material and Results
The ECG signals featured in our experiment are from a set of standard overnight
polysomnograms which were recorded to confirm the presence of sleep apnea syndrome
in subjects suspected of sleep apnea at the sleep center in Chang Gung Memorial
Hospital (CGMH), Linkou, Taoyuan, Taiwan. The Institutional Review Board of
CGMH approved the study protocol (No. 101-4968A3). All recordings were acquired
on the Alice 5 data acquisition system (Philips Respironics, Murrysville, PA). One
subject with PVC arrhythmia is chosen manually from the database. We focus on
the first lead of the ECG recording, which, before upsampling, is sampled at 200
Hz. An automated signal quality index called rSQI [BOLC13] is used to identify
15-minute segments of each ECG signal which can be trusted for analysis. We visualize
the dynamics in each 15-minute ECG signal in two ways. First, we visualize the
wave-shape manifold by showing the image of the time-1 diffusion map. In Figure 3.4,
we show that one subject has a wave-shape manifold roughly characterized in terms
of R peak height and the interval between the R peak and the S peak.
Figure 3.4: A mapping of the set of QRS complexes to $\mathbb{R}^2$ using the top two non-trivial coordinates of the time-1 diffusion map. Each point corresponds to one QRS complex in the recording. Left: we colour each point by the amplitude of the R peak; right: we colour each point by the length of the corresponding R-to-S interval, measured in time from the R peak to the S peak. The two embeddings are otherwise identical.
To visualize the dynamics on the wave-shape manifold, we turn to the interpolated
eigenfunctions. In Figure 3.5, we show that respiratory dynamics are encoded and well-
captured by the top non-trivial eigenvector of the diffusion kernel. The accompanying
airflow signal (Flow) from the PSG is used to confirm the accuracy of the recovery.
The EDR signal obtained using the traditional method appears almost identical to
the DM-EDR signal. In Figure 3.6, we show that the top non-trivial eigenvector of the
diffusion kernel is sensitive to PVC activity. This result comes from the mixing effect
of the eigenvectors. While PVCs lie on a disconnected component of the wave-shape
manifold, we do not have this information a priori, and in the DM-EDR algorithm, a
complete graph is constructed. As a result, the eigenvectors are distorted. To obtain
the respiratory dynamics of a patient with PVC arrhythmia, we should apply the DM
to only those cardiac cycles which are classified as normal (by a PVC detector such
as the one described in Section ). The result is shown as ϕ in Figure 3.6, wherein
we additionally ignore the beats that are directly adjacent to any PVC beat when
calculating the diffusion map.
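The beat selection described above (keep only normal beats, and additionally drop beats directly adjacent to a PVC) amounts to a simple mask. The sketch below assumes that per-beat PVC labels are already available from some detector; the function name `usable_beats` and the label sequence are made up for illustration.

```python
import numpy as np

def usable_beats(is_pvc):
    """True for beats that are neither PVCs nor directly adjacent to a PVC."""
    is_pvc = np.asarray(is_pvc, dtype=bool)
    near = is_pvc.copy()
    near[1:] |= is_pvc[:-1]    # the beat following a PVC
    near[:-1] |= is_pvc[1:]    # the beat preceding a PVC
    return ~near

# Toy labels: PVCs at beats 3, 8, and 9.
labels = np.array([0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0], dtype=bool)
keep = usable_beats(labels)
print(np.flatnonzero(keep))  # indices of beats passed on to the diffusion map
```

Only the rows of the beat matrix selected by `keep` would then enter the affinity matrix, and the interpolation would use the corresponding R peak times.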
Figure 3.5: The EDR signal (obtained from either the traditional method or the DM-EDR algorithm) well-approximates the respiratory flow signal.
While no quantitative performance measures are provided, this series of results
demonstrates that, based on the wave-shape oscillatory model, we can extract res-
piratory information from a non-traditional viewpoint. We do not claim that the
proposed approach outperforms the traditional one, but this result suggests that we
may combine the new approach with the traditional one to obtain a better estimate
for the respiratory information. We will systematically explore this possibility in
future work.
3.3 Intraoperative Cardiac Monitoring
In this section, we provide an example showcasing how the wave-shape oscillatory
model can be used to visualize the response of the cardiovascular system to a myocardial
Figure 3.6: Morphological differences between normal and PVC beats are reflected in the top non-trivial eigenvector of the diffusion kernel. Because of the mixing of eigenvectors, compensating for the presence of abnormal beats by ignoring them during interpolation does not yield respiratory information (ϕ_normal). Alternatively, using the diffusion maps algorithm to recover each connected component of the wave-shape manifold individually succeeds (ϕ).
infarction event. In the operating room, we have at least two methods for monitoring
the cardiovascular system: the ECG signal records the electrical activity of the heart,
and the arterial blood pressure (ABP) signal records the fluid pressure in either the
radial or the femoral artery [KJH+13]. We show that both depictions of the cardiac
cycle present the same dynamical information from different viewpoints. To this
end, we study a particular intraoperative phenomenon. We analyze a single case of
intraoperative myocardial infarction, wherein the patient experienced a heart attack
during his or her open heart surgical procedure. During the surgery, there was a
sudden appearance of ST segment elevation in the ECG signal (see Figure 3.7). An
immediate trans-esophageal echocardiogram examination and a direct assessment of
the open chest by the surgeon showed an abnormal heart muscle movement which
Figure 3.7: On the left, we plot every tenth ECG pulse preceding the MI event. In the middle, we plot pulses occurring during the MI event. On the right, we plot pulses following the MI event. The beats are coloured according to their order of appearance; blue is early, and red is late. An increase in ST segment elevation is noticed during and following the MI event.
fulfilled the diagnostic criteria for a myocardial infarction (MI) event. The ST segment
elevation in the ECG waveform is obvious to the trained eye, but morphological
changes to the ABP waveform are less evident.
We are broadly interested in the physiological response of the cardiovascular system
during myocardial infarction events. To this end, we analyze pulse waveforms obtained
from both the ECG and the ABP signal. The physiological time series were collected
using the Philips IntelliVue patient monitor. The signals were collected from the
patient monitor via a third-party data dumping system called ixTrend Express 2.1
(ixellence GmbH, Wildau, Germany). The ECG signal (lead II of the EASI Lead
System) was sampled at 500 Hz, and the ABP signal was sampled at 125 Hz. The
signals used are from the study approved by Shin Kong Wu Ho-Su Memorial Hospital,
Taipei, Taiwan. The Institutional Review Board approved the study protocol (No.
20160106R). Written informed consent was obtained from the patient. Our recordings
are 16 minutes long.
Dynamics in the ECG Signal
To demonstrate the proposed algorithm and obtain the desired dynamical information
from the ECG signal, we carry out the following calculations. Noise in the ECG
signal is suppressed and baseline wandering is removed by applying a bi-directional,
third-order Butterworth bandpass filter with cutoff frequencies 0.5-40 Hz. We detect
the R peaks in the recording by applying a standard high-accuracy QRS detector
[Elg13]. Write the pre-processed ECG signal as a vector $E \in \mathbb{R}^n$, where $n = f_s \times T$
is the number of samples, $f_s = 500$ Hz is the sampling rate, and $T$ is the duration of
the recording in seconds. Suppose that there are $N$ detected R peaks in the ECG
signal. Write the location in samples of the $i$-th R peak as $r_i$. By defining a window
surrounding each R peak which stretches from the beginning of the P wave to the end
of the T wave, we are able to extract each cardiac cycle. In this example, we extract
the $i$-th beat by setting
\[ X(i, j + 176) := E(r_i + j), \qquad -175 \le j \le 175. \tag{3.43} \]
Consequently, $X$ is an $N \times p$ matrix, where $p = 351$. According to Section 1.3, we
assume the set of heartbeats lies on a low-dimensional manifold embedded in $\mathbb{R}^p$. We
recover this manifold by approximating the eigenfunctions of its Laplace-Beltrami
operator. Indeed, we consider the self-tuning Gaussian kernel
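\[ W(i,j) = \frac{1}{2}\left[ \exp\left( -\frac{\sum_{k=1}^{p} \big(X(i,k) - X(j,k)\big)^2}{\sigma_i^2} \right) + \exp\left( -\frac{\sum_{k=1}^{p} \big(X(i,k) - X(j,k)\big)^2}{\sigma_j^2} \right) \right], \tag{3.44} \]
where $\sigma_i$ is the median Euclidean distance from the $i$-th row of $X$ to all other rows.
We then perform $\alpha$-normalization with $\alpha = 1$ to handle the (possibly) non-uniform
density function. We calculate
\[ W^{(\alpha)} = D^{-1} W D^{-1}, \tag{3.45} \]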
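where $D \in \mathbb{R}^{N \times N}$ is a diagonal matrix satisfying
\[ D(i,i) = \sum_{j=1}^{N} W(i,j). \tag{3.46} \]
We then calculate the isotropic diffusion kernel by setting
\[ P = \big(D^{(\alpha)}\big)^{-1/2} W^{(\alpha)} \big(D^{(\alpha)}\big)^{-1/2}, \tag{3.47} \]
where $D^{(\alpha)} \in \mathbb{R}^{N \times N}$ is a diagonal matrix satisfying
\[ D^{(\alpha)}(i,i) = \sum_{j=1}^{N} W^{(\alpha)}(i,j). \tag{3.48} \]
We obtain the top three non-trivial eigenvectors of $P$ with their corresponding
eigenvalues, namely
\[ \varphi_1, \varphi_2, \varphi_3 \in \mathbb{R}^N, \qquad 1 > \lambda_1 \ge \lambda_2 \ge \lambda_3. \tag{3.49} \]
Finally, we set the truncated time-1 diffusion map to be the matrix
\[ \Phi_1 = \begin{bmatrix} \lambda_1\phi_1 & \lambda_2\phi_2 & \lambda_3\phi_3 \end{bmatrix} \in \mathbb{R}^{N \times 3}, \tag{3.50} \]
where $\phi_i = \big(D^{(\alpha)}\big)^{-1/2}\varphi_i \in \mathbb{R}^N$ are the top non-trivial (unnormalized) eigenvectors of
the diffusion kernel.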
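The terms "non-trivial" and "unnormalized" can be unpacked with a short calculation. This is a standard observation about diffusion maps rather than something specific to this work, and the symbol $A$ below is introduced only for illustration. Let $A$ denote the row-stochastic matrix obtained by dividing each row of the $\alpha$-normalized affinity matrix by its row sum, and let $D_\alpha$ denote the diagonal matrix of those row sums. Then the symmetric kernel $P$ is conjugate to $A$:

\[
P = D_\alpha^{1/2}\, A\, D_\alpha^{-1/2},
\qquad\text{so}\qquad
P v = \lambda v
\;\Longrightarrow\;
A\,\big(D_\alpha^{-1/2} v\big) = \lambda\,\big(D_\alpha^{-1/2} v\big).
\]

In other words, rescaling an eigenvector of $P$ by $D_\alpha^{-1/2}$ gives a right eigenvector of the Markov matrix $A$ with the same eigenvalue. Since every row of $A$ sums to one, the constant vector is an eigenvector of $A$ with eigenvalue 1; it carries no information about the point cloud and is the trivial eigenvector that is discarded. Working with the symmetric matrix $P$ is a numerical convenience, since symmetric eigensolvers can be used.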
Dynamics in the ABP Signal
We now show that the same dynamical information is available in the simultaneously
recorded ABP signal, even though it may not be apparent to the human eye (see
Figure 3.9). First, we upsample the ABP signal to 500 Hz. To obtain all pulses from
the ABP signal, we perform peak detection using an algorithm designed for PPG
signals [ENB+13]. Suppose the ABP time series is discretized as $A \in \mathbb{R}^n$, where
$n = f_s \times T$, $f_s = 500$ Hz is the sampling rate, and $T$ is the duration of the recording
Figure 3.8: A mapping of the ECG waveforms to $\mathbb{R}^3$ afforded by the first three non-trivial coordinates of the diffusion map. The embedded points are coloured according to their order of appearance; blue is early, and red is late. Beats featuring ST segment elevation appear later in the recording.
in seconds. Suppose there are $N$ detected peaks, and their locations in samples are
$\{a_i\}_{i=1}^{N}$. Delineate the $i$-th cycle in the ABP signal as
\[ X(i, j + \lfloor 0.13 \times f_s \rfloor + 1) := A(a_i + j), \qquad -\lfloor 0.13 \times f_s \rfloor \le j \le \lfloor 0.5 \times f_s \rfloor. \tag{3.51} \]
Clearly, $X \in \mathbb{R}^{N \times p}$, where $p := 316$. Next, we remove the blood pressure information
by normalizing each row of $X$. Set
\[ \tilde{X}(i,j) := \sigma_i^{-1}\,\big(X(i,j) - \mu_i\big), \qquad j = 1, \ldots, p, \tag{3.52} \]
where $\mu_i$ is the mean of the $i$-th row of $X$, and $\sigma_i$ is its standard deviation. The point
cloud $\{\tilde{X}_i\}_{i=1}^{N}$ formed by the rows of $\tilde{X}$ encodes the morphological information of the ABP waveform, and the
temporal information is preserved in the sequence $\{a_i\}_{i=1}^{N}$. We carry out the diffusion
maps algorithm precisely as in the case of the ECG signal (with $\tilde{X}$ in place of $X$).
The result is depicted in Figure 3.10. It is worth mentioning that since the blood
pressure information has been removed (detrended), the embedding only reflects the
dynamics encoded in the time-varying oscillatory morphology of the ABP pulses.
This finding coincides with our physiological knowledge: the stimulus invokes a series
of physiological responses including vasoconstriction, increased heart contractility,
and their subsequent interactions. The impact of the stimulus decays after a while,
and the physiological status gradually returns to normal. This example serves to
demonstrate the usefulness of the wave-shape oscillatory model in decoding dynamics
that are difficult to observe with the human eye.
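The cycle extraction and row-wise normalization used for the ABP signal in (3.51) and (3.52) can be sketched as follows. The function name `abp_cycles`, the synthetic pressure trace, and the peak locations are illustrative assumptions; in particular, the population standard deviation is used here, and cycles cut off by the record boundaries are simply dropped.

```python
import math
import numpy as np

def abp_cycles(abp, peaks, fs):
    """Rows span floor(0.13*fs) samples before to floor(0.5*fs) after each peak, z-scored per row."""
    left, right = math.floor(0.13 * fs), math.floor(0.5 * fs)
    rows = [abp[a - left : a + right + 1] for a in peaks
            if a - left >= 0 and a + right < len(abp)]  # drop cycles cut off by the record edges
    X = np.asarray(rows, dtype=float)
    mu = X.mean(axis=1, keepdims=True)       # per-cycle mean pressure
    sigma = X.std(axis=1, keepdims=True)     # per-cycle standard deviation
    return (X - mu) / sigma

# Toy pressure trace at 500 Hz: an oscillation on top of a slow upward drift.
fs = 500
t = np.arange(10 * fs) / fs
abp = 90.0 + 20.0 * np.sin(2 * np.pi * 1.2 * t) + 5.0 * t
peaks = [20] + list(range(300, 4700, 400))   # the first "peak" is too close to the start

Xn = abp_cycles(abp, peaks, fs)
print(Xn.shape)
```

After this step each row has zero mean and unit variance, so absolute pressure level and pulse pressure no longer influence the pairwise distances; only the shape of each pulse does.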
Figure 3.9: On the left, we plot every tenth ABP pulse preceding the MI event. In the middle, we plot pulses occurring during the MI event. On the right, we plot pulses following the MI event. The beats are coloured according to their order of appearance; blue is early, and red is late. It is difficult to pinpoint the morphological differences between these waveforms.
Figure 3.10: A mapping of the ABP pulse waveforms to $\mathbb{R}^2$ afforded by the first and third non-trivial coordinates of the diffusion map. The embedded points are coloured according to their order of appearance; blue is early, and red is late. The yellow points correspond to pulses occurring during the MI event.
Chapter 4
Decoding Sleep Dynamics using Multiple
EEG Channels
The electroencephalogram (EEG) does not adhere to the wave-shape oscillatory model,
but there is still a sense in which EEG signals are observations of a process on a
manifold hosting physiological states. In this chapter, we will consider sleep dynamics;
a healthy sleeper naturally oscillates between the REM (rapid eye movement) sleep
stage and the non-REM sleep stages: N1, N2, and N3. We are interested in detecting
and quantifying these changes in state. While transitions in sleep stage are marked by
more than just brain activity, we will nevertheless use only the EEG to assess these
dynamics.
For the simple reason that EEG signals are exceptionally challenging to model, we
call them chaotic. EEG signals appear to be summations of numerous intermittent
oscillatory “bursts” of electrical energy, each of which has a well-localized fundamental
frequency. These bursts of energy (called neural oscillations or brainwaves) correspond
to neuronal activity and are associated with cognitive processing. The spectral content
of a subject’s EEG signal is known to vary with his or her state of consciousness.
Activity in the delta band (0-4 Hz) is associated with unconsciousness or deep sleep,
activity in the theta band (4-8 Hz) is associated with meditation and drowsiness,
activity in the alpha band (8-16 Hz) is associated with wakeful relaxation, the beta
band (16-32 Hz) is associated with active thinking, and the gamma band (32-100 Hz)
is associated with a phenomenon termed unified perception, in which different parts of
the brain work together to accomplish demanding cognitive tasks. In this sense, the
time-localized spectrum of the EEG signal is a non-negative function on the real line
which provides some coarse information about the state of the brain. In this work, we
will assume that the space of such functions is a smooth, compact, low-dimensional
Riemannian manifold embedded in $L^2(\mathbb{R})$ which is diffeomorphic to the phase manifold
(as in the wave-shape manifold model). In principle, the time-varying power spectrum
of the EEG signal is akin to the time-varying morphology of the cardiac waveform.
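As a rough illustration of how spectral content separates the bands listed above, the following sketch computes the fraction of power falling in each band for a synthetic signal. It uses a plain FFT periodogram over the whole record rather than the continuous wavelet transform employed in this work, and treating the band edges as half-open intervals is our assumption; the signal is a toy, not EEG data.

```python
import numpy as np

# Band edges in Hz as listed in the text; half-open intervals [lo, hi) are an assumption.
BANDS = {"delta": (0, 4), "theta": (4, 8), "alpha": (8, 16), "beta": (16, 32), "gamma": (32, 100)}

def band_powers(x, fs):
    """Fraction of spectral power in each band, from a plain periodogram."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                  # remove the DC offset
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    total = power[freqs < 100].sum()
    return {name: power[(freqs >= lo) & (freqs < hi)].sum() / total
            for name, (lo, hi) in BANDS.items()}

fs = 250
t = np.arange(4 * fs) / fs                            # 4 seconds of samples
x = np.sin(2 * np.pi * 10 * t) + 0.3 * np.sin(2 * np.pi * 2 * t)  # strong 10 Hz, weak 2 Hz
p = band_powers(x, fs)
print(max(p, key=p.get), {k: round(v, 3) for k, v in p.items()})
```

A periodogram of the whole record discards timing; localizing the same computation in time is what the time-frequency representation is for.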
In this work, we localize the spectral content of the EEG signal in time using the
continuous wavelet transform. We can use the resulting time-frequency representation
to observe sleep dynamics. One clear advantage to this approach is that we can
localize the spectrum of the EEG signal at any point in time (as opposed to only
at the incidence of certain landmarks), and we take the opportunity to observe the
spectrum at regularly sampled time points.
This chapter is organized in the following way. First, we apply the diffusion maps
algorithm to visualize the neural dynamics encoded by the changing power spectrum
of a sleeping subject’s EEG signal. Then, we describe a clinical application of these
principles; an automatic sleep stage classifier is built and validated on an annotated
database of EEG signals. We discuss how this algorithm is designed to leverage
information from multiple EEG channels (and hence multiple locations in the brain);
canonical correlation analysis is applied to preserve information that is intrinsic to
the current sleep stage and to ignore nuisance information. We provide a geometric
interpretation of the algorithm. The dual-channel automatic sleep stage classifier
that we build achieves an accuracy of 80.27% and a macro F1 score of 72.88% on our
database of 60 subjects without sleep apnea.
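Accuracy and macro F1 can differ noticeably when classes are imbalanced, which is typical for sleep stages. The sketch below computes both from a confusion matrix; the three-class matrix is made up for illustration and is unrelated to the results reported in this chapter.

```python
import numpy as np

def accuracy_and_macro_f1(conf):
    """conf[i, j] = number of epochs with true class i and predicted class j."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp           # predicted as the class, but belonging to another
    fn = conf.sum(axis=1) - tp           # belonging to the class, but predicted as another
    f1 = 2 * tp / (2 * tp + fp + fn)     # per-class F1 score
    return tp.sum() / conf.sum(), f1.mean()

# Hypothetical confusion matrix with a poorly recognized minority class (row 1).
toy = [[50, 5, 5],
       [10, 20, 10],
       [5, 5, 90]]
acc, macro_f1 = accuracy_and_macro_f1(toy)
print(round(acc, 4), round(macro_f1, 4))
```

Macro F1 averages the per-class scores with equal weight, so a small class that is classified poorly lowers it more than it lowers the overall accuracy.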
4.1 Background and Significance
The physiological importance of sleep is undeniable. During sleep, the body enters
a restorative state, secreting anabolic hormones which help to repair damaged cells
and remove metabolic waste products [GSK+04, XKX+13]. Sleep deprivation is
associated with immune system impairment and impaired cognitive function, among
other negative effects [PH96, CA+06]. Healthy sleep additionally refers to achieving an
appropriate balance between time spent in the REM and non-REM stages [KTR+94].
Slow-wave sleep is especially important as 70% of all human growth hormone secretion
occurs during this phase [VCP96, VCLP00]. This means that, for some sleep disorders,
sleeping for longer periods of time will have no effect on symptoms. Sleep disorders are
placed into one of three categories, namely: dyssomnias (e.g. insomnia or obstructive
sleep apnea); parasomnias (e.g. night terrors or sleep walking); and circadian rhythm
sleep disorders (that are symptomatic of biological clock disorientation).
Sleep disorders are assessed in a sleep lab. A subject suspected of having a
sleep disorder sleeps in the lab overnight, and his or her vital signs are continuously
monitored. A polysomnogram (PSG) includes multiple EEG, EOG, and EMG channels.
The respiratory and cardiac systems are also monitored. In total, subjects can be
monitored using as many as 25 different sensors. The 6-8 hour PSG is then manually
annotated by a trained sleep technologist. This process is a time-consuming and
repetitive task; as such, it is expensive and error-prone. Inter-rater agreement is often
low, even when 25 channels are available. An automatic algorithm for sleep stage
annotation is needed. The challenges facing the development of an automatic sleep
stage classifier include the lack of inter-rater agreement and confounding factors such
as sleep apnea events. When dealing with EEG recordings in particular, one must
overcome the high level of background noise in the EEG and the low spatial resolution
of a single EEG channel.
4.2 Unsupervised Extraction of Sleep Dynamics
An EEG recording x ∈ L2loc(R) is a chaotic time series whose spectral content describes
the cognitive state of the subject being monitored. To access the spectral content
at each point in time, we apply the continuous wavelet transform (CWT) which is
commonly considered in the neuroscience community [TO19, MFMTH+14, TMG16,
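Aki02]. In particular, if $\psi \in L^2(\mathbb{R})$ is a mother wavelet satisfying $\hat{\psi}(0) = 0$ with center
frequency 1, the CWT of x is defined as
\[ W_x^{(\psi)}(a, b) = \frac{1}{\sqrt{|a|}} \int_{\mathbb{R}} x(t)\, \overline{\psi\!\left(\frac{t - b}{a}\right)} \, dt, \qquad a, b \in \mathbb{R}, \tag{4.1} \]
where the overline denotes complex conjugation. The (non-conventional) power spectrum is
then $|W_x^{(\psi)}(a, b)|$ and represents the energy that x has near the center frequency a at
time b ∈ R. An important property of the CWT is that the bandwidth of the filter
response
\[ \widehat{\tfrac{1}{\sqrt{|a|}} \psi(\cdot / a)}(\xi) = \sqrt{|a|}\, \hat{\psi}(a \xi), \qquad \xi \in \mathbb{R}, \tag{4.2} \]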
grows with increasing |a|. As a result, we can obtain a covering of the positive
frequency axis using a discrete set of center frequencies $\{a_i = 2^{i/Q}\}_{i=1}^{\infty}$ that are
linearly distributed in the logarithmic scale. (The parameter Q is called the number
of wavelets per octave.) This property is evidently beneficial for EEG analysis because
of how the delta, theta, alpha, beta, and gamma frequency bands are also linearly
distributed in the logarithmic scale. In our model for sleep dynamics, we will assume
the existence of a phase manifold as in Section 1.2. We additionally assume that the
locally-averaged power spectra
\[ a \mapsto \int_{\mathbb{R}} |W_x^{(\psi)}(a, b - t)|\, \phi(t) \, dt \in L^2(\mathbb{R}) \tag{4.3} \]
(where φ is the scaling function associated to ψ) are sampled from a smooth, compact,
low-dimensional (Riemannian) manifold M that is diffeomorphic to the phase manifold,
where the lack of isometry indicates the distortion incurred during the observation
process. To recover M and the dynamics on its surface, we take the opportunity to
calculate a finite number of these power spectra at a sampling interval of δ > 0, in
which case the sampling points are
\[ \{\delta - t_0,\, 2\delta - t_0,\, \ldots,\, N\delta - t_0\} \subset \mathbb{R}, \tag{4.4} \]
where t0 is an appropriate time shift, and N is the number of samples. We then apply
the diffusion maps algorithm to this finite set of locally-averaged power spectra to
recover the underlying manifold M. The result for one EEG recording is shown in
Figure 4.1.
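
As an illustration of the embedding step, the following is a minimal sketch of a diffusion map computed from a point cloud whose rows play the role of the locally-averaged power spectra. It assumes a Gaussian affinity kernel with a hand-picked bandwidth `epsilon` and the plain random-walk normalization; the function name `diffusion_map`, the bandwidth, and the toy circle data are hypothetical choices made for this example and are not taken from the text.

```python
import numpy as np

def diffusion_map(points, epsilon, n_coords=3, t=1):
    """Embed the rows of `points` using the first non-trivial diffusion coordinates.

    Assumptions of this sketch: Gaussian kernel exp(-d^2 / epsilon) and the
    random-walk normalization D^{-1} K (no density correction).
    """
    # Pairwise squared Euclidean distances between the rows.
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    sq_dists = np.maximum(sq_dists, 0.0)

    # Gaussian affinity matrix and its degree vector.
    kernel = np.exp(-sq_dists / epsilon)
    degree = kernel.sum(axis=1)

    # Symmetric conjugate of the Markov matrix D^{-1} K (same eigenvalues).
    sym = kernel / np.sqrt(np.outer(degree, degree))
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Right eigenvectors of D^{-1} K; the first one is constant (trivial).
    right = eigvecs / np.sqrt(degree)[:, None]
    coords = right[:, 1:n_coords + 1] * eigvals[1:n_coords + 1] ** t
    return eigvals, coords

# Toy usage: points on a circle sitting in R^3 stand in for power spectra.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
cloud = np.column_stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)])
eigvals, coords = diffusion_map(cloud, epsilon=0.5, n_coords=3, t=1)
```

The dense N × N kernel in this sketch is also what limits scalability: memory and eigendecomposition cost grow quickly with the number of epochs.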
The recovery of the phase manifold (up to diffeomorphism) provided by the
diffusion maps algorithm shows embedded power spectra that are implicitly organized
according to sleep stage. The embedding has not fully recovered the difference between
the REM and N1 stage, but this result is to be expected because an EOG channel
(which tracks eye movement) is additionally required to differentiate between these
sleep stages. In the first non-trivial coordinate of the time-1 diffusion map (see
Figure 4.2), we see a recovery of the oscillation between the REM and NREM stages.
The second non-trivial coordinate also exhibits periodicity.
This method evidently has the potential for a clinical application, and in the
rest of this chapter, we make progress in that direction. In short, we use the CWT
features extracted from the EEG channel to train a supervised automatic sleep stage
classifier. To this end, we need to make a few considerations. First, the algorithm
should be trained on a large number of annotated EEG recordings. Incorporating
EEG recordings (and their associated power spectra) from a larger number of subjects
into the embedding would serve to stabilize it. However, the lack of scalability of
kernel-based methods for manifold learning means that the diffusion maps algorithm
will have to be set aside in favor of a simpler method. This method should further
allow yet unseen recordings to be easily read, which kernel-based methods have a hard
time achieving. Second, to increase the accuracy of our classifier and mimic what
is done by sleep technologists, we should take multiple EEG channels into account.
After all, a severe limitation of the EEG device is the high level of background noise
it contains, and combining information from multiple EEG channels is a common
approach to signal de-noising (as often done when studying event-related potentials).
A single EEG channel is furthermore limited spatially; it can only “see” a small
portion of the brain. We will take these considerations into account when designing
our automatic sleep stage classifier.
Figure 4.1: A mapping of N = 734 locally averaged, time-localized power spectra into R3 afforded by the top three non-trivial coordinates of the time-1 diffusion map. We color the embedded points using their associated annotations. The embedding has been rotated in R3 to provide a better view.
Figure 4.2: A projection of the morphological dynamics on the recovered phase manifold. In green, we plot the sequence of annotations provided by the expert annotator. In red, we plot the first non-trivial eigenvector of the diffusion kernel. In teal, we plot the second non-trivial eigenvector of the diffusion kernel.
4.3 Methods
We describe the process of building and evaluating an automatic sleep stage
classifier that is suitable for clinical deployment. We begin with a description of our
proposed algorithm and its variations. We then discuss the database used to validate
our model, as well as the metrics used to assess our model’s performance. We briefly
describe the inter-subject cross-validation procedure we employ which assesses how
well our model can generalize to unseen subjects.
4.3.1 Proposed Algorithm
Our automatic sleep stage classification algorithm takes as input two simultaneously
recorded EEG channels x1, x2 ∈ Rn with sampling rate fs = 100 Hz, so that the
recording duration is 0.01n seconds. (If the native sampling rate is not 100 Hz, we
should resample the signals.) We will assume that 0.01n is an integer multiple of 30, so
that there are N = 0.01n/30 30-second epochs in the recording. (By convention, the sleep
technologist makes sleep stage annotations every 30 seconds.) We take an unsupervised
approach to feature extraction. We apply the continuous wavelet transform to each
signal. We provide the technical details here. Take an analytic Morlet mother wavelet
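$\psi: \mathbb{R} \to \mathbb{R}$ with its dilations $\psi_a(t) = \psi(t/a)$ and their discretizations $\hat{\psi}_a \in \mathbb{R}^T$ defined
in the frequency domain as
\[ \hat{\psi}_a(j) = 2 \exp\!\left[ -\frac{(a\,\omega(j) - \omega_0)^2}{2} \right] U(a\,\omega(j)), \qquad j = 1, \ldots, N, \tag{4.5} \]
where the center frequency of ψ is ω0 = 6 radians per second, U: R → R is defined as
\[ U(t) := \begin{cases} 1 & \text{if } t > 0 \\ 0 & \text{otherwise,} \end{cases} \tag{4.6} \]
and ω is the discretized Fourier domain:
\[ \omega(j) = \begin{cases} \dfrac{2\pi(j-1)}{N} & \text{if } 1 \le j \le N/2 + 1 \\[1ex] \dfrac{2\pi(j-1)}{N} - 2\pi & \text{if } N/2 + 1 < j \le N. \end{cases} \tag{4.7} \]
We are interested in capturing frequencies between 0.2 and 40 Hz. As such, we set
\[ a_0 = \frac{\omega_0 f_s}{40 \times 2\pi} \tag{4.8} \]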
to be the smallest wavelet scale. To obtain the other scales, we set Q = 8 wavelets
per octave, and for each j = 1, ..., 61, we set $a_j = a_0 2^{j/Q}$. The largest wavelet scale
$a_{61}$ corresponds to a center frequency of approximately 0.2026 Hz. We perform
convolution of the signal $x_i$ with the dilated wavelet $\psi_{a_j}$ by multiplying in the Fourier
domain:
\[ y_j = \mathrm{FFT}(x_i) \odot \hat{\psi}_{a_j}, \qquad 0 \le j \le 61, \tag{4.9} \]
where FFT denotes the n-point fast Fourier transform. We then calculate the discrete
time-frequency representation $W_i \in \mathbb{R}^{62 \times n}$ by performing
\[ W_i(j, :) = \mathrm{IFFT}(y_j). \tag{4.10} \]
To get the locally-averaged power spectrum $P_i \in \mathbb{R}^{62 \times 3N}$, we perform
\[ P_i(j, k) = \operatorname{mean}\{ |W_i(j, t)| : |t - k| \le 10 f_s \}, \qquad k = 10 f_s, 20 f_s, \ldots, 3N. \tag{4.11} \]
Each column of Pi can be thought of as a non-negative signal sampled at 0.1 Hz,
and Pi(j, k) represents the average spectral information of xi during the time period
0.1k ± 0.1 seconds near the center frequency $\omega_0 f_s / (2\pi a_j)$ Hz. We build a 186 × N matrix
$X_i$ satisfying
\[ X_i(:, k) = \begin{bmatrix} \log(P_i(:, 3k-2) + \varepsilon) \\ \log(P_i(:, 3k-1) + \varepsilon) \\ \log(P_i(:, 3k-0) + \varepsilon) \end{bmatrix}, \qquad k = 1, \ldots, N, \tag{4.12} \]
where $\varepsilon = 2^{-52}$. This matrix $X_i$ represents a point cloud in $\mathbb{R}^{186}$, where the k-th point
(the k-th column of Xi) corresponds to the k-th 30-second epoch of the time series xi.
Uncovering the low-dimensional structure underlying this point cloud will allow us to
demarcate sleep dynamics.
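
The feature extraction pipeline described in this subsection can be sketched in NumPy as follows. This is a simplified illustration rather than the exact implementation: it assumes that the frequency grid is expressed in radians per sample with as many bins as the signal has samples, and it replaces the local averaging window by non-overlapping 10-second block means. The names `wavelet_scales`, `center_frequencies_hz`, and `cwt_features` are hypothetical.

```python
import numpy as np

FS = 100.0      # sampling rate in Hz, as stated in the text
OMEGA0 = 6.0    # center frequency of the Morlet wavelet
Q = 8           # wavelets per octave
N_SCALES = 62   # scale indices j = 0, ..., 61

def wavelet_scales():
    """Scales a_j = a_0 * 2^(j/Q), with a_0 chosen so that the top frequency is 40 Hz."""
    a0 = OMEGA0 * FS / (40.0 * 2.0 * np.pi)
    return a0 * 2.0 ** (np.arange(N_SCALES) / Q)

def center_frequencies_hz(scales):
    """Center frequency (in Hz) associated with each scale."""
    return OMEGA0 * FS / (2.0 * np.pi * scales)

def cwt_features(x):
    """Return a (3 * N_SCALES) x (number of epochs) matrix of log power features."""
    n = x.size
    n_epochs = n // int(30 * FS)
    block = int(10 * FS)          # assumption: non-overlapping 10-second blocks
    n_blocks = 3 * n_epochs

    # Discretized Fourier domain: non-negative frequencies first, then negative ones.
    k = np.arange(n)
    omega = 2.0 * np.pi * k / n
    omega[k > n // 2] -= 2.0 * np.pi

    x_hat = np.fft.fft(x)
    power = np.empty((N_SCALES, n_blocks))
    for j, a in enumerate(wavelet_scales()):
        arg = a * omega
        # Analytic Morlet filter: Gaussian bump around OMEGA0, zero for arg <= 0.
        psi_hat = 2.0 * np.exp(-0.5 * (arg - OMEGA0) ** 2) * (arg > 0)
        w = np.fft.ifft(x_hat * psi_hat)      # one row of the time-frequency representation
        magnitude = np.abs(w[: n_blocks * block])
        power[j] = magnitude.reshape(n_blocks, block).mean(axis=1)

    log_power = np.log(power + 2.0 ** -52)
    # Stack the three 10-second spectra of each 30-second epoch into one column.
    stacked = log_power.reshape(N_SCALES, n_epochs, 3).transpose(2, 0, 1)
    return stacked.reshape(3 * N_SCALES, n_epochs)

# Toy usage: one minute (two epochs) of a pure 10 Hz oscillation.
time_axis = np.arange(int(60 * FS)) / FS
features = cwt_features(np.sin(2.0 * np.pi * 10.0 * time_axis))
freqs = center_frequencies_hz(wavelet_scales())
```

For the toy signal, the largest entry in the first 62 rows of a column sits at the scale whose center frequency is 10 Hz, which gives a quick sanity check of the scale grid.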
Sensor Fusion
We are provided with two matrices $X_1, X_2 \in \mathbb{R}^{186 \times N}$ which describe observations
of a latent space (the phase manifold). A variety of models (with varying degrees
of abstraction) can be considered for the sensor fusion problem. In all cases, the
dynamics of interest will be hosted by the latent manifold N , and the information
observed by the two sensors will be hosted on the manifolds M1 and M2. Now N
represents the common information between M1 and M2 in the sense that there are
surjective submersions ρ1:M1 → N and ρ2:M2 → N . The observed dynamics in the
first sensor could be described by a function ϕ1:R→M1, and likewise in the second
sensor by a function ϕ2:R → M2. (The dynamics can be thought of as a particle
traversing each manifold.) The dynamics of interest are then $\rho_1 \circ \varphi_1 = \rho_2 \circ \varphi_2$, where
we have equality because of the synchronization in time. We observe the dynamics in
both sensors at a finite set of time points t1 < · · · < tN . Supposing M1 and M2 are
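embedded in $\mathbb{R}^{186}$, the information we are provided with is
\[ X_1 = \begin{bmatrix} \varphi_1(t_1)^\top \\ \vdots \\ \varphi_1(t_N)^\top \end{bmatrix}, \qquad X_2 = \begin{bmatrix} \varphi_2(t_1)^\top \\ \vdots \\ \varphi_2(t_N)^\top \end{bmatrix}, \tag{4.13} \]
and we want to use both point clouds to recover the manifold N (and hence the
dynamics $\rho_1 \circ \varphi_1 = \rho_2 \circ \varphi_2$). A recovery of N is an embedding (isometrically, if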
possible) of N into some (low-dimensional) Euclidean space. A special case which
has been previously analyzed is when ρ1 and ρ2 are diffeomorphisms [TW19]. In
this general setting, alternating diffusion or multi-view diffusion maps can obtain the
desired recovery. However, due to the problems with these methods outlined above,
we approach the task of recovery from a linear point of view.
The linear method we will employ relies on the following simplified model. Suppose
that N is embedded in $\mathbb{R}^p$ for p ≤ 186. Suppose ρ1 is a 186 × p matrix with orthonormal
columns, and ρ2 is a 186× p matrix with orthonormal columns. When p = 186, we
can obtain M1 and M2 from N by simply rotating the ambient space R186. When
p < 186, there is an opportunity for N to be of strictly smaller dimension than both
M1 and M2. In particular, Mi could be isomorphic to the product of N and some
nuisance structure of non-trivial dimension. In this case, the orthogonal projection
ρi truncates the nuisance information. Given X1 and X2 that are centered at the
origin (in the sense that $\mathbf{1}_N^\top X_1 = \mathbf{1}_N^\top X_2 = \mathbf{0}_{186}$), we can estimate ρ1 and ρ2 (and
hence N ) via the following linear method which is approximately traditional canonical
correlation analysis. We perform
\[ (\rho_1, \rho_2) = \operatorname*{argmax}_{U \in O(186, p),\; V \in O(186, p)} \langle X_1 U, X_2 V \rangle \tag{4.14} \]
where $\langle X_1 U, X_2 V \rangle = \operatorname{trace}(U^\top X_1^\top X_2 V)$ is the covariance between the projected and
rotated point clouds $X_1 U$ and $X_2 V$. Of course, we can solve this optimization problem
using the singular value decomposition of $X_1^\top X_2$. Indeed if $U \Sigma V^\top = X_1^\top X_2$, then
ρ1 should be the p columns of U which correspond to the largest singular values.
Similarly, ρ2 should be the p columns of V which correspond to the largest singular
values. A recovery of the points on N is then X1ρ1 ≈ X2ρ2, and the time information
is preserved in the ordering of the columns. The features we extract by this sensor
fusion method are
\[ \mathrm{CCA} = \begin{bmatrix} X_1 \rho_1 & X_2 \rho_2 \end{bmatrix} \in \mathbb{R}^{N \times 2p}. \tag{4.15} \]
These features will be used to classify the sleep stages. Other approaches to sensor
fusion that we compare with are concatenation (CON), averaging (AV), co-dimension
reduction (PCA), and the concatenation of dimension-reduced representations (DIM).
These approaches are defined as follows.
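\[ \mathrm{CON} = \begin{bmatrix} X_1 & X_2 \end{bmatrix} \tag{4.16} \]
\[ \mathrm{AV} = \frac{X_1 + X_2}{2} \tag{4.17} \]
\[ \mathrm{PCA} = U S \tag{4.18} \]
\[ \mathrm{DIM} = \begin{bmatrix} U_1 S_1 & U_2 S_2 \end{bmatrix} \tag{4.19} \]
where we use the singular value decomposition to get the low-rank approximations
\[ U S V^\top = \mathrm{CON}^\top \mathrm{CON} \quad \text{to rank } 2p \tag{4.20} \]
\[ U_1 S_1 V_1^\top = X_1^\top X_1 \quad \text{to rank } p \tag{4.21} \]
\[ U_2 S_2 V_2^\top = X_2^\top X_2 \quad \text{to rank } p \tag{4.22} \]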
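
To make the linear fusion step concrete, here is a small NumPy sketch of the SVD-based method together with the two simplest baselines (concatenation and averaging). It stores the data with one epoch per row, so each point cloud is an N × d array, and it uses synthetic data in which both sensors observe the same latent trajectory through different orthonormal maps. The function names and the toy dimensions (d = 6, p = 2) are assumptions of this example only.

```python
import numpy as np

def center(X):
    """Subtract the column means so that the point cloud is centered at the origin."""
    return X - X.mean(axis=0, keepdims=True)

def cca_fusion(X1, X2, p):
    """Fuse two centered N x d point clouds using the top-p singular vectors of X1^T X2."""
    U, _, Vt = np.linalg.svd(X1.T @ X2)   # singular values are returned in descending order
    rho1 = U[:, :p]                        # d x p, orthonormal columns
    rho2 = Vt[:p, :].T                     # d x p, orthonormal columns
    return np.hstack([X1 @ rho1, X2 @ rho2])

def concatenation(X1, X2):
    """CON baseline: place the two feature sets side by side."""
    return np.hstack([X1, X2])

def averaging(X1, X2):
    """AV baseline: average the two feature sets entrywise."""
    return 0.5 * (X1 + X2)

# Toy usage: a shared latent trajectory seen through two different orthonormal maps.
rng = np.random.default_rng(0)
n_epochs, d, p = 500, 6, 2
latent = center(rng.standard_normal((n_epochs, p)) * np.array([3.0, 1.0]))
R1 = np.linalg.qr(rng.standard_normal((d, p)))[0]
R2 = np.linalg.qr(rng.standard_normal((d, p)))[0]
X1 = latent @ R1.T
X2 = latent @ R2.T
fused = cca_fusion(X1, X2, p)
```

In this noise-free toy setting the two halves of `fused` coincide, which mirrors the approximate equality of the two projected point clouds; with real recordings the two halves would only agree approximately.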
The classification method we employ is a set of kernel support vector machines
(kSVMs) aggregated using error-correcting output coding (ECOC). A kSVM is trained
to differentiate between all pairs of sleep stages so that there are ten kSVMs in each
ECOC ensemble. We standardize all input features by z-scoring before passing them
to the ten kSVMs. In each kSVM, we use a Gaussian kernel $u \mapsto e^{-u^2/\sigma^2}$ with σ = 10
and a box constraint of 1.
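
The ingredients of the classifier can be illustrated without committing to a particular SVM library. The sketch below enumerates the ten pairwise problems, z-scores features, and evaluates the Gaussian kernel with σ = 10. It assumes that u denotes the Euclidean distance between two feature vectors and that the z-scoring statistics are estimated on the training set only; both are assumptions of this example, as are the function names.

```python
import numpy as np
from itertools import combinations

STAGES = ["Wake", "REM", "N1", "N2", "N3"]
SIGMA = 10.0

def zscore(train, test):
    """Standardize both sets using the mean and standard deviation of the training set."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0)
    sd = np.where(sd == 0.0, 1.0, sd)   # guard against constant features
    return (train - mu) / sd, (test - mu) / sd

def gaussian_kernel(A, B, sigma=SIGMA):
    """Kernel matrix with entries exp(-||a - b||^2 / sigma^2) for rows a of A and b of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma ** 2)

# One binary problem per unordered pair of sleep stages.
pairs = list(combinations(STAGES, 2))

# Toy usage with random features standing in for fused CWT features.
rng = np.random.default_rng(0)
train_raw = rng.standard_normal((40, 8))
test_raw = rng.standard_normal((10, 8))
train, test = zscore(train_raw, test_raw)
K_train = gaussian_kernel(train, train)
K_test = gaussian_kernel(test, train)
```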
4.3.2 Annotated Polysomnograms
We use EEG recordings from a database of annotated polysomnograms. Standard
overnight PSG studies were performed to confirm the presence of sleep apnea syndrome
in clinical subjects suspected of sleep apnea at the sleep center in Chang Gung
Memorial Hospital (CGMH), Linkou, Taoyuan, Taiwan. The Institutional Review
Board of CGMH approved the study protocol (No. 101-4968A3). All recordings were
acquired on the Alice 5 data acquisition system (Philips Respironics, Murrysville,
PA). The sleep stages were scored by two experienced sleep technologists abiding by
the AASM 2007 guidelines [IAIC+07], and a consensus was reached. Each recording
is at least 5 hours long. We focus only on the EEG channels in this work (C3-A2,
C4-A1, O1-A2, and O2-A1), which were all sampled at 200 Hz. As suggested in
[VSKKVdV90], we start by considering the pairing C3-A2 and C4-A1. In the results
section, we report the performance of our model when evaluated on all pairings of
EEG channels. The database contains one-night recordings from 60 adult subjects
(each with apnea-hypopnea index less than 5). We did not exclude subjects based on
any other criterion such as gender, age, race, current medications, or previous medical
conditions. This database is called the CGMH database.
4.3.3 Evaluation Metrics
To evaluate our automatic sleep stage classifier, we consider a variety of multi-class
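performance metrics. Consider the confusion matrix $M \in \mathbb{N}^{5 \times 5}$, where $M_{ij}$ represents
the number of epochs belonging to the i-th class that were predicted to belong to the
j-th class. Ideally, $M_{ij} = 0$ if $i \neq j$. We define the recall ($\mathrm{RE}_i$), precision ($\mathrm{PR}_i$), and
$\mathrm{F1}_i$ score associated with the i-th class as
\[ \mathrm{RE}_i = 100\% \times \frac{M_{ii}}{\sum_{j=1}^{5} M_{ij}}, \qquad \mathrm{PR}_i = 100\% \times \frac{M_{ii}}{\sum_{j=1}^{5} M_{ji}}, \tag{4.23} \]
\[ \mathrm{F1}_i = \left( \frac{\mathrm{RE}_i^{-1} + \mathrm{PR}_i^{-1}}{2} \right)^{-1}. \tag{4.24} \]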
Note that REi is the percentage of epochs belonging to the i-th class which are
predicted to belong to the i-th class, PRi is the percentage of epochs predicted to
belong to the i-th class which actually belong to the i-th class, and F1i is the harmonic
mean of REi and PRi. We want all of these metrics to be as close to 100% as possible.
We also report accuracy (AC) and the macro F1 (F1) score, defined as
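\[ \mathrm{AC} = 100\% \times \frac{\sum_{i=1}^{5} M_{ii}}{\sum_{i=1}^{5} \sum_{j=1}^{5} M_{ij}}, \qquad \mathrm{F1} = \frac{1}{5} \sum_{i=1}^{5} \mathrm{F1}_i. \tag{4.25} \]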
The macro F1 score represents the average (binary) F1 score over all five classes
and will be used as our final evaluation metric. We will not weight AC heavily
because each of the five classes does not have the same number of epochs belonging
to it. As described above, we build several models for the purpose of showing the
improvement gained by adopting our proposed sensor fusion method. We run all
models through the same inter-individual cross-validation procedure. We apply 10-fold
cross-validation; the 60 subjects in our database are randomly partitioned into 10
non-overlapping subsets. (This partition is the same for all cross-validation events to
maintain comparability between the different models and feature extraction methods.)
By performing inter-individual cross-validation, we can get a better idea of how our
model will perform on epochs coming from subjects the model has never seen, which
is critical for clinical applicability.
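
A subject-wise partition of this kind can be sketched as follows. The random seed and the function name `subject_folds` are hypothetical; the point of the example is that whole subjects, not individual epochs, are assigned to folds, and that the same partition is reused for every model.

```python
import numpy as np

def subject_folds(n_subjects=60, n_folds=10, seed=0):
    """Randomly split subject indices into disjoint folds of (nearly) equal size."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_subjects)
    return np.array_split(order, n_folds)

# The partition is computed once and shared by all models being compared.
folds = subject_folds()

splits = []
for held_out in folds:
    # All epochs of the held-out subjects are used for testing only.
    train_subjects = np.setdiff1d(np.arange(60), held_out)
    splits.append((train_subjects, np.sort(held_out)))
```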
4.4 Results
In our database of 60 subjects without sleep apnea, there were 45,339 thirty-second
epochs. Of these, 6,697 (14.77%) were in the wake class, 6,423 (14.17%) were in
the REM class, 5,288 (11.66%) were in the N1 class, 22,766 (50.21%) were in the
N2 class, and 4,165 (9.19%) were in the N3 class. We begin with a comparison of
sensor fusion methods. We fuse the spectral information obtained from the C3-A2
and C4-A1 channels. In Table 4.1, we report the performance statistics. We find that
canonical correlation analysis effectively fuses information from both EEG leads. We
achieve increases in ACC and F1 of 1.60% and 2.25% over the next best sensor fusion
method, which was DIM. Interestingly, not all sensor fusion methods perform better than
both single-channel methods. Next, using the four EEG channels available in each record
of the CGMH database, we explore the effectiveness of our algorithm on different
pairings of EEG channels. (We use canonical correlation analysis as the sensor fusion
method.) In Table 4.2, we report that the optimal pairing of leads was C3-A2 and
O1-A2. Finally, we show the full confusion matrix for the optimal pairing in Table 4.3.
Evidently, the classifier has difficulty differentiating the N2 and N3 stages, as well
as differentiating the N1 stage from all other stages except N3. Nevertheless, the
performance on the Wake, REM, and N2 stages is very good.
Table 4.1: Comparison of linear sensor fusion methods
            X1 (C4-A1)   X2 (C3-A2)   CON     AV      PCA     DIM     CCA
F1_1 (%)    80.48        84.15        83.75   80.60   83.93   84.87   86.51
F1_2 (%)    67.99        74.42        71.05   72.71   73.15   74.10   78.64
F1_3 (%)    41.19        44.59        44.46   43.96   47.04   47.75   51.89
F1_4 (%)    85.67        86.37        86.54   86.10   87.32   87.31   87.75
F1_5 (%)    49.35        49.21        49.73   50.64   52.46   53.24   53.73
F1 (%)      64.84        67.75        67.10   66.80   68.78   69.45   71.70
ACC (%)     74.84        77.13        76.55   76.10   77.75   78.12   79.72
CON: concatenation; AV: averaging; PCA: co-dimension reduction; DIM: concatenation of dimension-reduced representations; CCA: canonical correlation analysis; F1_1: F1 score for the wake stage; F1_2: F1 score for the REM stage; F1_3: F1 score for the N1 stage; F1_4: F1 score for the N2 stage; F1_5: F1 score for the N3 stage; F1: average F1 score across all five stages; ACC: accuracy.
Table 4.2: F1 scores obtained by the sleep stage classifier when using different pairings of EEG channels
          C3-A2   C4-A1   O1-A2   O2-A1
C3-A2             71.70   72.88   71.66
C4-A1                     72.61   70.21
O1-A2                             68.76
O2-A1
Table 4.3: Performance of the proposed automatic sleep stage classifier
                Predicted Class
Target Class    Wake    REM     N1      N2      N3      PR (%)   RE (%)   F1 (%)
Wake            5953    48      604     92      0       87.93    88.89    88.41
REM             86      5173    773     391     0       79.30    80.54    79.92
N1              559     963     2771    994     1       57.12    52.40    54.66
N2              170     339     700     20609   948     84.61    90.53    87.47
N3              2       0       3       2272    1888    66.55    45.33    53.93
PR: precision; RE: recall; F1: F1 score. The overall accuracy was ACC = 80.27%, and the average F1 score across all five classes was F1 = 72.88%.
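
As a check on the reported figures, the per-class and aggregate metrics can be recomputed from the confusion matrix in Table 4.3. The short sketch below takes recall as the diagonal entry divided by its row sum (target class) and precision as the diagonal entry divided by its column sum (predicted class), which is the convention that reproduces the PR and RE columns of the table. The function name `per_class_metrics` is hypothetical.

```python
import numpy as np

# Confusion matrix from Table 4.3: rows are target classes, columns are predicted
# classes, both in the order Wake, REM, N1, N2, N3.
M = np.array([
    [5953,   48,  604,    92,    0],
    [  86, 5173,  773,   391,    0],
    [ 559,  963, 2771,   994,    1],
    [ 170,  339,  700, 20609,  948],
    [   2,    0,    3,  2272, 1888],
])

def per_class_metrics(confusion):
    """Return recall, precision, and F1 (all in percent) for every class."""
    diag = np.diag(confusion).astype(float)
    recall = 100.0 * diag / confusion.sum(axis=1)      # divide by row sums
    precision = 100.0 * diag / confusion.sum(axis=0)   # divide by column sums
    f1 = 2.0 / (1.0 / recall + 1.0 / precision)        # harmonic mean
    return recall, precision, f1

recall, precision, f1 = per_class_metrics(M)
accuracy = 100.0 * np.trace(M) / M.sum()
macro_f1 = f1.mean()
```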
4.5 Discussion and Conclusion
In this chapter, we discuss the development of an automatic sleep stage classifier which
fuses spectral information from two EEG channels to remove nuisance information.
Our geometric interpretation of sleep dynamics and the sensor fusion method provides
a reasonable guarantee for the effectiveness of our algorithm. We numerically validated
our algorithm using two databases of annotated PSGs. We examined the effectiveness
of our algorithm on a variety of pairs of EEG channels. While this algorithm is a step
toward a working, robust model that can be directly applied in the sleep laboratory,
there is still a large amount of future work which must be done.
4.5.1 Limitations
We do not consider subjects with mild to severe sleep apnea. Sleep apnea is a
common sleep disorder, and a sleep stage classification algorithm designed for clinical
application should be able to demonstrate strong performance on subjects having the
disorder. However, patients with sleep apnea are excluded from this study for the
following reasons. Sleep apnea events cause very fast changes in sleep stage. The
AASM guidelines for sleep staging (which were made with normal physiology in mind)
do not allow sleep technologists to change the frequency at which they make sleep
stage annotations; annotations are made exactly once every 30 seconds. In normal
subjects, sleep stages transition slowly and in a canonical order. The sleep technologist
is put in a difficult situation when one 30-second epoch contains features indicative of
multiple sleep stages (which is generally the case when an apnea event occurs during
that epoch). It would seem that current regulations are insufficient for simultaneously
assessing sleep stages and sleep apnea events.
Reiterating, the characterization of sleep dynamics developed by Rechtschaffen
& Kales and modified by the AASM was made with normal sleep physiology in
mind. As such, it is severely limited when assessing pathological subjects. There is a
growing body of evidence that sleep pathology in humans can manifest in a variety
of previously uncharacterized ways. For example, sleep stages may be dissociated
across different regions of the brain [NFM+11]. There may be extra “transitional”
sleep stages that exist between the canonical six stages, and it has been suggested to
view the time spent awake before falling asleep as distinct from arousal events that
occur during the night [PDMB15, MS92].
The discipline of sleep staging is in general plagued by a lack of inter-rater
agreement [NPS+00, DhAZ+09]. Not only is sleep staging often an ambiguous endeavour,
but sleep technologists are under pressure to perform the repetitive task of epoch
annotation quickly and for extended periods of time; errors will almost certainly be
made. The lack of inter-rater agreement means that our labels are very noisy. On
the other hand, we may be over-fitting to the preferences of one annotator (or group
of co-annotators). Moving from one database of PSG recordings to another presents
additional challenges which we did not address in this study. EEG channels recorded
on different devices (at different sampling and bit rates), using different electrodes
(in a hardware sense), and placed on the subject by different individuals can yield
drastic changes in both the quality and content of the signal. In this work, we did
not assess the transfer proficiency of our model between lead types, but this presents
another challenge.
We did not directly attempt to assess inter-individual variability. It may be the
case that the spectral content of one subject’s N2 epochs is, on average, slightly
different (via a transposition, perhaps) than the spectral content of another subject’s
N2 epochs. For example, we know that sleep spindles (bursts of activity between
12 and 14 Hz that are associated with transitions to the N3 stage) have different
fundamental frequencies in different individuals, and that the rate of occurrence of
sleep spindles varies from person to person. We did not take into account variations in
the amplitude of the EEG waveform that could arise when moving from one subject
to another.
Finally, a vast amount of structure exists inside the CWT representation of an
EEG waveform, and we did not take advantage of this structure. For example, we did
not directly take into account the geometry inherent in the ordering and spacing of
the center frequencies of the bandpass filters used to compute the discretized CWT.
On another level, we did not take into account how pairs or groups of frequency
bands work together to differentiate sleep stages. We also did not take into account
transition probabilities among sleep stages or any temporal information. However,
we note that including this information in the model may lead to poor performance
on pathological subjects who, by definition, would not exhibit the same transition
probabilities between sleep stages.
Chapter 5
Supervised Extraction of Sleep Dynamics
using Instantaneous Heart Rate
Fluctuations in heart rate are intimately tied to changes in the physiological state
of the organism. We examine and exploit this relationship by classifying a human
subject’s wake/sleep status using his instantaneous heart rate (IHR) series. We use a
convolutional neural network (CNN) to build features from the IHR series extracted
from a whole-night electrocardiogram (ECG) and predict every 30 seconds whether the
subject is awake or asleep. Our training database consists of 56 normal subjects, and
we consider three different databases for validation; one is private, and two are public
with different races and apnea severities. On our private database of 27 subjects, our
accuracy, sensitivity, specificity, and AUC values for predicting the wake stage are
83.1%, 52.4%, 89.4%, and 0.83, respectively. Validation performance is similar on our
two public databases. When we use the photoplethysmography instead of the ECG
to obtain the IHR series, the performance is also comparable. A robustness check is
carried out to confirm the obtained performance statistics. This result advocates for
an effective and scalable method for recognizing changes in physiological state using
non-invasive heart rate monitoring. The CNN model detects fluctuations in IHR (as
well as their locations in the signal), and this information is suitable for differentiating
between the wake and sleep stages. This chapter is the peer-reviewed publication
[MLW18].
5.1 Introduction
Sleep is a key ingredient to our well-being, and disrupted sleep may lead to catastrophes
in personal medicine or public health [KTR+94, KLB+09, CA+06]. Understanding
sleep is critical for the whole healthcare system. The gold standard for quantifying
sleep dynamics is the polysomnography (PSG). However, it is bulky, complicated to
install, labor-intensive, and expensive. It is also possibly error-prone if an overnight
PSG recording is manually examined by experts [NPS+00]. An automatic, convenient,
and accurate way to study sleep is therefore in high demand. Efforts made in the past
few decades to accomplish these tasks have been assisted by advancements such as
bio-sensors, wearable technology, and mobile health devices.
The assessment of sleep stages using vital signs is most commonly done using the
electroencephalogram (EEG). The photoplethysmography (PPG) and the
electrocardiogram (ECG) are also widely considered. Compared to a PSG, these signals are easy
to install and cheap to obtain, and long-term monitoring is possible. The challenge
with these signals is that the amount of available information is limited. Neural
networks lend themselves advantageously to this task, as they excel at extracting and
compressing information into representations that are useful for the application at
hand [LBH15]. Motivated by the needs and challenges associated with predicting
sleep dynamics using mobile health devices, we focus on analyzing ECG and PPG
signals in this work, and we propose an effective and scalable approach to analyzing
heart rate variability (HRV) as it pertains to physiological events and outcomes
[MC90, otESoC+96, BLS11].
5.1.1 Heart Rate Variability and Sleep
HRV is quantified by studying the time series consisting of intervals between
consecutive pairs of heart beats, which is usually called the R-to-R interval (RRI)
time series. Interest in HRV has a long history [Bil11], and it has been
extensively studied in the past few decades. See, for example, a far-from-complete
list of review articles [VPH+09, SMZ14]. The rhythm of the heart is regulated
by the autonomic nervous system (ANS). Changes in heart rate are the result
of complicated interactions between physiological systems and external stimuli.
They reflect the integrity of the physiological systems. In short, analyzing HRV
amounts to observing through a non-invasive window the physiological dynamics of
an organism. It has long been believed that a correct quantification of HRV will
yield a deeper understanding of a variety of physiological systems, including sleep
[SHMG64, ZVS84, SDMA93, VQMR95, TGP+96, BA97, EHO99, CD14, PKB+16].
For example, while a subject is awake, the sympathetic tone of the ANS is
dominant. The heart rate is higher due to daily activity and external stimuli, and the
rhythm of the heart is less stable [SDMA93]. The heart rate is relatively lower
when the subject is asleep. It reaches its lowest value during slow wave (deep) sleep
[SHMG64]. During non-REM (rapid eye movement) sleep, the parasympathetic
nervous system dominates, the sympathetic tone becomes less dominant, and the
energy restoration and metabolic rates reach their lowest levels. The heart rate
decreases, and the rhythm of the heart stabilizes [SDMA93]. These physiological
facts indicate that we can distinguish between the wake and sleep stages by taking
heart rate into account. There have been several works in this direction, such as
[LSCS05, LSC+07, MMC+10, LFH+12, XYS+13, AMT+15, YYJG16, VLBB16].
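
For readers unfamiliar with the RRI series, the following sketch shows how it can be computed from R-peak times and converted into a uniformly sampled heart rate signal. The conversion used here (60 divided by the RRI, followed by linear interpolation onto a 4 Hz grid) is an assumption chosen for illustration; the function names and the resampling rate are hypothetical.

```python
import numpy as np

def rri_series(r_peak_times):
    """Intervals (in seconds) between consecutive R peaks."""
    return np.diff(np.asarray(r_peak_times, dtype=float))

def instantaneous_heart_rate(r_peak_times, fs_out=4.0):
    """Heart rate in beats per minute, linearly interpolated onto a uniform grid.

    Assumption of this sketch: each rate value is placed at the time of the
    later of the two beats that define the interval.
    """
    peaks = np.asarray(r_peak_times, dtype=float)
    bpm = 60.0 / rri_series(peaks)
    grid = np.arange(peaks[1], peaks[-1], 1.0 / fs_out)
    return grid, np.interp(grid, peaks[1:], bpm)

# Toy usage: a perfectly regular rhythm with one beat per second.
beat_times = np.arange(0.0, 11.0, 1.0)
rri = rri_series(beat_times)
grid, ihr = instantaneous_heart_rate(beat_times)
```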
With appropriate HRV quantities, physicians can better understand the ANS and
hence improve both diagnostic accuracy and treatment quality [VPH+09]. However,
despite a lot of research in this direction and many quantitative indices proposed by
experts, there is limited consensus in the universality of HRV analysis. The
lack of extensive success in this field is largely due to the non-stationary nature of
the system. See, for example, some published discussions on this topic [PG94, Gla09].
We do not expect to sort out this difficult problem in this exploratory work which
focuses on the sleep stage classification problem. Instead, we propose the use of a
convolutional neural network (CNN) on the IHR series as a scalable approach to
adaptively quantifying heart rate fluctuation from a variety of monitoring sources. A
CNN [LBBH98] is a special kind of neural network (NN) capable of building features
from time series and images. A CNN detects not only the presence of structured
features which are important for classification, but also their temporal (spatial)
location. CNN design is motivated by the hypothesized mechanism for the animal
visual cortex. An important property of the CNN is that it behaves equivariantly
with respect to translations of input features. This design is favorable for sleep
stage classification because the precise locations of fluctuations which correspond
to sympathetic (“fight or flight”) tone dominance provide physiological information
about sleep stage. We refer readers with interest in CNN architecture to the review
article [LBH15]. There have been several studies using NNs and manually designed
heart rate features to predict sleep stages [LSCS05, AMT+15], but to the best of our
knowledge, CNNs have not been considered in the field. We show that the CNN
approach is effective for accomplishing this task, providing evidence that it may lend
itself favorably to other HRV applications.
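
The translation equivariance property mentioned above can be checked numerically for a single convolutional filter. The sketch below uses circular boundary handling, which is an assumption made so that the identity holds exactly; with the zero padding typically used in practice, the identity holds only away from the ends of the signal. The function name `circular_conv1d` is hypothetical.

```python
import numpy as np

def circular_conv1d(signal, kernel):
    """Slide a short filter along a 1-D signal with wrap-around at the boundary."""
    out = np.zeros(signal.size)
    for lag, weight in enumerate(kernel):
        # np.roll(signal, -lag)[i] equals signal[(i + lag) mod n].
        out += weight * np.roll(signal, -lag)
    return out

rng = np.random.default_rng(0)
signal = rng.standard_normal(64)   # stands in for a segment of a heart rate series
kernel = rng.standard_normal(5)    # stands in for one learned filter
shift = 7

# Shifting the input and then filtering gives the same result as
# filtering first and then shifting the output.
shift_then_filter = circular_conv1d(np.roll(signal, shift), kernel)
filter_then_shift = np.roll(circular_conv1d(signal, kernel), shift)
```

Because the response moves with the input, the position of a detected fluctuation within a window is preserved in the feature map.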
5.2 Materials and Methods
5.2.1 Data
Standard overnight PSG studies were performed to confirm the presence of sleep
apnea syndrome in clinical subjects suspected of sleep apnea at the sleep center in
Chang Gung Memorial Hospital (CGMH), Linkou, Taoyuan, Taiwan. The Institu-
tional Review Board of CGMH approved the study protocol (No. 101-4968A3). All
recordings were acquired on the Alice 5 data acquisition system (Philips Respironics,
Murrysville, PA). The sleep/wake stages were defined and scored by two experienced
sleep technologists abiding by the AASM 2007 guidelines [IAIC+07], and a consensus
was reached. Each recording is at least 5 hours long. We focus only on the PPG
recording and on the second lead of the ECG recording, which were both sampled at
200 Hz. There are 90 healthy subjects (each with apnea-hypopnea index less than
5) in the training database, among which we consider only 56 subjects who were
labeled as awake for at least 10% of the recording duration. This database is called
CGMH-training.
We consider three validation databases. The validation databases were not used
to tune the model’s parameters, and were not subjected to any rejection. The first
one consists of 27 subjects and was acquired independently of CGMH-training from
the same sleep laboratory. This database is called CGMH-validation. The other two
databases are publicly available. The DREAMS Subjects Database1 consists of 20
recordings from healthy subjects. The recordings were selected by the TCTS Lab to
be of high clarity and to contain few artifacts. The technologists used the Brainnet
system (Medatec, Brussels, Belgium) to obtain the ECG recordings. The sampling
rate is 200 Hz, and the minimum recording duration is 7 hours. Although the race
information is not provided, we may assume that since the database is collected from
Belgium, its population constitution is different from that of the CGMH databases.
The third database is the St. Vincent’s University Hospital/University College
Dublin Sleep Apnea Database (UCDSADB) that is publicly available at Physionet
[GAG+00]2 and which consists of 25 subjects with sleep apnea of various severities.
The technologists used Holter monitors to obtain the ECG recordings. The sampling
rate is 128 Hz, and the minimum recording duration is 6 hours. We focus on the first
ECG lead in this study. The DREAMS Subjects Database is chosen to assess the
model’s performance on recordings which come from subjects of a different race, and
the UCDSADB is chosen to assess the model’s performance on recordings which come
from subjects with sleep disorders.
Footnote 1: http://www.tcts.fpms.ac.be/~devuyst/Databases/DatabaseSubjects
Footnote 2: https://physionet.org/pn3/ucddb
5.2.2 Instantaneous Heart Rate
We apply a standard automatic R peak detection algorithm to each ECG recording,
which is a modification of that found in [Elg13]. Let $\{r_i\}_{i=1}^{n}$ denote the locations in
time (sec) of the n detected R peaks. We ensure that there is no artifact in the
detected R peaks by ignoring beats which are either too close or too far from their
preceding beats. (The mechanism for detecting false positive or missing beats is
based on a 5-beat median filter.) We view the instantaneous heart rate (IHR) as a
continuous and positive function. We estimate the IHR at time $r_i$ in beats-per-minute
(bpm) as
$$\mathrm{IHR}(r_i) = \frac{60}{r_i - r_{i-1}}, \qquad i = 2, \ldots, n. \tag{5.1}$$
Using these estimates, we follow the Task Force standard [otESoC+96] and obtain
the IHR series over the duration of the recording at a sampling rate of 4 Hz. The
interpolation method is shape-preserving piecewise cubic interpolation. When the
PPG signal is considered, the same procedure is applied: we view the peaks in each
PPG recording as surrogate heart beats, and these peaks are automatically detected
using the method of [ENB+13]. With the same interpolation scheme, the resulting time series is called
IHR-PPG (for the sake of distinguishing it from the IHR series extracted from the
ECG signal). We break the IHR (or IHR-PPG) signal into 30-second epochs, abiding
by the boundaries originally assigned by the sleep technologists. We discard an epoch
if fewer than five R peaks are detected within it.
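The estimate in (5.1) and the resampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, the R peak times are made-up values, artifact rejection is omitted, and plain linear interpolation is swapped in for the shape-preserving piecewise cubic interpolation so that the sketch needs only the standard library.

```python
from bisect import bisect_right


def ihr_at_peaks(r_times):
    """Apply (5.1): IHR(r_i) = 60 / (r_i - r_{i-1}) in bpm, for i = 2, ..., n."""
    return [
        (r_times[i], 60.0 / (r_times[i] - r_times[i - 1]))
        for i in range(1, len(r_times))
    ]


def resample_linear(samples, fs=4.0):
    """Resample (time, value) pairs onto a uniform grid at fs Hz.

    Linear interpolation is a stand-in (assumption) for the shape-preserving
    piecewise cubic interpolation described in the text. Needs >= 2 samples.
    """
    times = [t for t, _ in samples]
    values = [v for _, v in samples]
    n_out = int((times[-1] - times[0]) * fs) + 1
    out = []
    for k in range(n_out):
        t = times[0] + k / fs
        # Index of the right-hand knot of the interval containing t.
        j = max(1, min(bisect_right(times, t), len(times) - 1))
        t0, t1 = times[j - 1], times[j]
        w = (t - t0) / (t1 - t0)
        out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return out


# Hypothetical R peak times in seconds (not taken from any database).
r_peaks = [0.0, 1.0, 1.5]
ihr = ihr_at_peaks(r_peaks)        # IHR estimates at the 2nd and 3rd peaks
ihr_4hz = resample_linear(ihr)     # uniform 4 Hz series between those peaks
```

For these made-up peaks, the beat intervals of 1.0 s and 0.5 s give 60 bpm and 120 bpm, and the 4 Hz grid between them contains three samples.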
5.2.3 Convolutional Neural Network
We implement the 1-dimensional CNN using TensorFlow 1.5 [AAe15]. The feed-
forward network architecture is shown in Figure 5.1. The input signal is first passed
through five convolution blocks. Following common practice in HRV analysis [Eng98,
CdlHMFAL06], we consider input signals 5 minutes in length. (Building an input
signal 5 minutes in length amounts to concatenating the labeled 30-second epoch with
the preceding 4 minutes and 30 seconds.) We normalize each 5-minute input signal
by subtracting its median value.
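As a rough sketch of the input construction just described, the snippet below cuts a 5-minute window ending with a given labeled epoch from a 4 Hz IHR series and subtracts the window median. The function name and the convention of returning `None` when fewer than 4 minutes and 30 seconds of history are available are assumptions made for illustration; the text does not say how such epochs are handled.

```python
from statistics import median

FS = 4                  # IHR sampling rate in Hz (from the text)
EPOCH_LEN = 30 * FS     # samples per 30-second epoch
INPUT_LEN = 300 * FS    # samples per 5-minute input signal


def build_input(ihr, epoch_index):
    """Return the median-centred 5-minute input ending with the given epoch.

    ihr is the full 4 Hz IHR series; epoch_index counts 30-second epochs from 0.
    Returns None when too little preceding data exists (assumed convention).
    """
    end = (epoch_index + 1) * EPOCH_LEN
    start = end - INPUT_LEN
    if start < 0:
        return None
    window = ihr[start:end]
    m = median(window)
    return [v - m for v in window]


# Hypothetical series: ten epochs (1200 samples) of made-up values.
series = [float(k % 7) for k in range(10 * EPOCH_LEN)]
x = build_input(series, 9)          # uses epochs 0-9, labeled by epoch 9
too_early = build_input(series, 8)  # only 4.5 minutes available
```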
The architecture of a single convolution block is shown in Figure 5.2. Each
convolutional layer in the block has 10 filters with kernel size 8. A bias is added to the
output of each filter, and the result is passed through a rectified linear unit (ReLU)
activation function. The convolution blocks are followed by two fully-connected
(dense) layers of 20 nodes each. The fully-connected nodes also have associated
biases and ReLU activation functions. We apply dropout with probability 0.5 to the
last convolutional layer and to both fully-connected layers. The output of the last
fully-connected layer is fed with biases into a 2-node output layer. We normalize the
output of the network using the softmax function. We predict that a subject is awake
during a given epoch if the output of the “wake” node is greater than or equal to the
output of the “sleep” node. We train the network for 180 training epochs (passes over the
training set) using mini-batch gradient descent with a learning rate of $10^{-3}$, a batch size of 24, and cross-entropy as
the loss function.
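To make the layer sizes concrete, the following sketch traces the signal length through the convolution blocks and counts trainable parameters. It is a bookkeeping aid under stated assumptions, not the TensorFlow implementation: it assumes "same" padding (so a stride-s layer maps length n to ceil(n / s)) and assumes that the output of the last block is flattened before the first fully-connected layer. All function names are hypothetical.

```python
from math import ceil


def conv_len(n, stride):
    """Output length of a 1-D convolution with 'same' padding (assumption)."""
    return ceil(n / stride)


def trace_shapes(input_len, n_blocks, n_filters=10):
    """Return the (length, channels) shape after each convolution block."""
    shapes = [(input_len, 1)]
    n = input_len
    for _ in range(n_blocks):
        n = conv_len(n, 1)  # (10, 8, 1)-convolution keeps the length
        n = conv_len(n, 2)  # (10, 8, 2)-convolution halves the length
        shapes.append((n, n_filters))
    return shapes


def count_parameters(input_len, n_blocks, n_filters=10, kernel=8,
                     dense=(20, 20), n_out=2):
    """Count weights and biases, assuming the last block's output is flattened."""
    total, channels = 0, 1
    for _ in range(2 * n_blocks):  # two convolutional layers per block
        total += kernel * channels * n_filters + n_filters
        channels = n_filters
    length, _ = trace_shapes(input_len, n_blocks, n_filters)[-1]
    width = length * n_filters
    for nodes in dense + (n_out,):
        total += width * nodes + nodes
        width = nodes
    return total


# A 5-minute input sampled at 4 Hz has 1200 samples and uses five blocks.
shapes = trace_shapes(1200, 5)
n_params = count_parameters(1200, 5)
```

Under these assumptions the lengths run 1200, 600, 300, 150, 75, 38, so the last block outputs 38 samples for each of the 10 filters, and the network has 15,462 trainable parameters.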
[Figure 5.1: flow diagram. Input → Convolution Blocks → 20-Dense → 20-Dense → Output.]
Figure 5.1: The architecture of the 1-dimensional convolutional neural network. The
notation m-Dense means that the fully connected layer possesses m nodes. For input
signals 5 minutes in length, we use five convolution blocks.
[Figure 5.2: flow diagram. (10, 8, 1)-Convolution → (10, 8, 2)-Convolution.]
Figure 5.2: The architecture of a single convolution block. The notation
(f, k, s)-convolution means that the convolutional layer has f filters with kernel
size k and stride s. The output of the block is half the size of the input.
5.2.4 Performance Evaluation
We train the CNN model to differentiate between the wake and sleep stages by
fitting the model’s parameters to the IHR series extracted from CGMH-training.
We then apply the trained model to three different validation databases: CGMH-validation,
the DREAMS Subjects Database, and the UCDSADB. We report a variety
of performance measures, namely sensitivity (recall), specificity, accuracy, AUC (area
under the receiver operating characteristic curve), precision (PPV), the F1 score, and
Cohen’s kappa coefficient. We emphasize that our model assessment is performed on
an inter-individual basis. We further assess how effectively a model trained on signals
from one monitoring device might be applied to recordings made on another device.
The model trained on IHR series is applied to IHR-PPG series, and vice versa. (PPG
sensors are commonly installed in modern mobile devices.) We compare the PPG
performance statistics with those obtained using the corresponding ECG recordings.
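The count-based measures listed above follow directly from the four entries of the confusion matrix. The sketch below (the function name is hypothetical) evaluates them for the CGMH-validation counts reported in Table 5.1; AUC is omitted because it requires the continuous network outputs rather than the counts.

```python
def performance_measures(tp, fp, tn, fn):
    """Compute count-based performance measures from a 2x2 confusion matrix."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "SE": tp / (tp + fn),                 # sensitivity (recall)
        "SP": tn / (tn + fp),                 # specificity
        "ACC": accuracy,
        "PR": tp / (tp + fp),                 # precision (PPV)
        "F1": 2 * tp / (2 * tp + fp + fn),
        "Kappa": (accuracy - p_chance) / (1 - p_chance),
    }


# Counts for CGMH-validation as reported in Table 5.1.
m = performance_measures(tp=1800, fp=1763, tn=14906, fn=1633)
```

Rounded as in the table, these counts give SE 52.4%, SP 89.4%, ACC 83.1%, PR 50.5%, F1 0.51, and Kappa 0.41.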
5.2.5 Robustness Check
To check the robustness of the considered CNN model, we perform two sensitivity
tests. First, we consider different input sizes, namely 30 seconds, 2 minutes, and
10 minutes. We assign a different number of convolution blocks to each input size.
When the input size is 30 seconds (2 minutes and 10 minutes, respectively), we build
up 3 (4 and 6, respectively) convolution blocks. In Figure 5.5, we show that inputs
of various sizes are constructed by additionally considering a length of time before
the labeled epoch. In our second sensitivity test, we train new CNN models on the
DREAMS Subjects Database and the UCDSADB. We validate these two models on
our three validation databases. Note that the sampling rate for the UCDSADB is
unique, and that the data collection system is heterogeneous across all databases;
these observations allow us to test the robustness of the established CNN model.
5.3 Results
Our training database from CGMH consists of recordings from 30 healthy males aged
36.8± 14.0 years, 23 healthy females aged 43.8± 13.6 years, and 3 healthy subjects
of unknown gender and age. The AHI and BMI of the male group are 2.7± 1.4 and
23.1±2.9, and the AHI and BMI of the female group are 2.6±1.4 and 23.6±4.6. The
AHI and BMI of the remaining subjects are not known. CGMH-validation consists
of recordings from 10 healthy males aged 42.3± 12.6 years, 16 healthy females aged
47.1± 17.9 years, and one healthy subject of unknown gender and age. The AHI and
BMI of the male group are 3.0 ± 1.0 and 24.4 ± 3.4, and the AHI and BMI of the
female group are 2.4± 1.6 and 24.2± 5.0. The AHI and BMI of the remaining subject
are not known. We dismiss 203 noisy epochs from the training set and 199 from the
validation set because they contain fewer than five detected R peaks. After dismissal, there
are a total of 41,472 points in the training set and 20,102 points in the validation set.
The percent of wake labels is 18.8% in the training set and 17.1% in the validation set.
All processing, training, and validation calculations are performed on an Intel® Core
i7-4790K Processor at 3.60 GHz with 24 GB of RAM and a Microsoft® Windows®
10 Home operating system (version 1709). In all tables, the positive class corresponds
to the “wake” label. Because of the imbalance between the positive and negative
classes, we should take care when assessing the reported performance measures.
5.3.1 Model Validation
In Table 5.1, we show the performance measures associated with fitting the model
to CGMH-training and applying it to the three validation databases. We obtain
a good AUC value of 0.83 and a moderate Cohen’s kappa coefficient of 0.41 on
CGMH-validation. The F1 score is 0.51, which is not high due to the low precision.
To examine the effect of imbalanced classes on the F1 and precision scores, we apply
the trained CNN model to each subject in CGMH-validation individually, and we
compare the associated F1 and precision values to the percentage of “wake” labels in
the given recording (see Figure 5.3). If w denotes the percentage of “wake” labels
in a given recording, a linear regression analysis shows that F1 = 0.32 + 0.0086w,
where the slope is statistically significant under a t-test at the 0.05 significance
level, and precision = 40.16 + 1.03w (with precision in percent), where the slope is
also statistically significant.
We show that our model performs well when applied to external recordings that
are associated with healthy subjects of a different race. The DREAMS Subjects
Database has recordings from 16 healthy females aged 35.8± 15.5 years and 4 healthy
males aged 20, 23, 27, and 27 years. We dismiss 25 noisy epochs from the database
before evaluation because they are labeled as such or because they contain fewer than 5
detected R peaks. After dismissal, there are 20,032 epochs remaining, and the percent
of wake labels is 16.7%. The AUC value is strong at 0.81, compared to the value 0.83
on the internal CGMH-validation database. The sensitivities and specificities are also
similar. The slight difference in performance could be attributed to the fact that the
sleep technologists supervising the DREAMS Subjects Database selected recordings
which had few artifacts, whereas such artifacts and their physiological correlates were
relevant in the CGMH-training data for classification.
We show that our model struggles to achieve convincing results when applied
to a database of subjects whose sleep physiology is different. The UCDSADB has
recordings from 21 males aged 48.3 ± 8.4 years and 4 females aged 41, 62, 63, and
68 years. The AHI and BMI of the male group are 25.4± 21.3 and 31.5± 4.3, and
the AHI and BMI of the female group are 18.3± 14.6 and 32.2± 2.6. We dismiss 91
noisy epochs from the database before evaluation because they are labeled as such
or because they contain fewer than 5 detected R peaks. After dismissal, there are
19,974 epochs remaining, and the percent of wake labels is 21.2%. The performance
measures for the UCDSADB, while acceptable, are significantly lower than those
observed in the previous model validation steps. To explore the reason for this drop
in performance, we evaluate the CNN model on each subject from the UCDSADB
individually. We calculate the AUC and Cohen’s kappa coefficient for each subject
and plot these numbers in Figure 5.4 against his or her apnea-hypopnea index (AHI).
A subject with a high apnea-hypopnea index experiences frequent interruptions in
sleep that arise from breathing difficulties. Since we have trained our model on a
database of subjects who do not experience such interruptions, we expect to see a
correlation between AHI and model performance measures. We choose AUC and
Cohen’s kappa coefficient because they are relatively stable under changes in class
size. A linear regression analysis shows that AUC = 0.8 − 0.0034 AHI, where the slope
is statistically significant under t-test when the significance level is set to p = 0.05,
and Kappa = 0.34 − 0.0047 AHI, where the slope is also statistically significant.
Table 5.1: Performance statistics for the CNN model trained on CGMH-training
           CGMH-training   CGMH-validation   DREAMS Subjects   UCDSADB
TP         4,464           1,800             1,777             1,838
FP         2,143           1,763             2,151             2,853
TN         31,550          14,906            14,532            12,883
FN         3,315           1,633             1,572             2,400
SE (%)     57.4            52.4              53.1              43.4
SP (%)     93.6            89.4              87.1              81.9
ACC (%)    86.8            83.1              81.4              73.7
PR (%)     67.6            50.5              45.2              39.2
F1         0.62            0.51              0.49              0.41
AUC        0.90            0.83              0.81              0.72
Kappa      0.54            0.41              0.38              0.24
TP: true positive; FP: false positive; TN: true negative; FN: false negative; ACC: accuracy; AUC:
area under the ROC curve; PR: precision; SE: sensitivity; SP: specificity.
5.3.2 Transfer Proficiency Among Monitoring Devices
We extract IHR-PPG series from the PPG recordings in CGMH-training and call
this database CGMH-training-PPG. Similarly, we extract IHR-PPG series from the PPG
recordings in CGMH-validation and call this database CGMH-validation-PPG. We
then train two models: one on CGMH-training, and one on CGMH-training-PPG. In
Table 5.2, we show the performance measures associated with applying both models to
CGMH-validation and CGMH-validation-PPG. We see that the overall performance
drops slightly when we validate the model on a different modality. When the model
is both trained and validated on IHR-PPG series, the performance is equivalent to, or
slightly better than, that of the model trained and validated on the ECG-derived IHR.
[Figure 5.3: plot of F1 score and precision (each on an axis from 0 to 1) against percent wake (0 to 60).]
Figure 5.3: We apply the trained CNN model to each subject in CGMH-validation
individually. We plot the associated F1 and precision values against the percentage of
“wake” labels in the given recording. We see that the model’s performance statistics
increase when the percent of “wake” labels increases.
[Figure 5.4: plot of AUC (axis from 0.4 to 1) and kappa coefficient (axis from −0.5 to 1) against AHI (0 to 100).]
Figure 5.4: We apply the trained CNN model to each subject in the UCDSADB
individually. We plot the associated AUC and Cohen’s kappa values against each
subject’s apnea-hypopnea index. We see that the model performs well on healthy
subjects.
Table 5.2: Transfer proficiency of the CNN model among different monitoring devices
Training Database     CGMH-training          CGMH-training-PPG   CGMH-training-PPG
Validation Database   CGMH-validation-PPG    CGMH-validation     CGMH-validation-PPG
TP                    2,085                  1,217               1,760
FP                    3,335                  1,042               1,522
TN                    13,324                 15,627              15,137
FN                    1,331                  2,216               1,656
SE (%)                61.0                   35.5                51.5
SP (%)                80.0                   93.8                90.9
ACC (%)               76.8                   83.8                84.2
PR (%)                38.5                   53.9                53.6
F1                    0.47                   0.43                0.53
AUC                   0.79                   0.81                0.84
Kappa                 0.33                   0.34                0.43
TP: true positive; FP: false positive; TN: true negative; FN: false negative; ACC: accuracy; AUC:
area under the ROC curve; PR: precision; SE: sensitivity; SP: specificity.
[Figure 5.5: diagram of four inputs. Each ends with the 30-second labeled epoch, preceded by no additional input or by 90, 270, or 570 seconds of additional input.]
Figure 5.5: We construct input signals of various lengths by additionally considering
data which precedes the labeled epoch.
5.3.3 Sensitivity Analysis
We examine the effect of input size on the performance of the CNN model. (See
Figure 5.5 for a depiction of the considered input sizes.) In Table 5.3, we show
the performance measures associated with fitting the model to CGMH-training and
applying it to the three validation databases. When the model is trained using an
input size of 30 seconds, the validation performance is lower than when the model
is trained using any amount of the preceding information. When the input size is 2
minutes, the performance is closer to the performance obtained using an input size of
5 minutes. Note, however, that the benefits of including large amounts of preceding
information are not unlimited. We observe no significant improvement in validation
AUC by considering 10 minutes instead of 5 minutes, whereas the improvement
obtained by considering 5 minutes instead of 30 seconds is large. Evidently, the lack of
improvement when considering an input size of 10 minutes is the result of over-fitting
to the training database. Whereas the training process for input signals of length
30 seconds takes approximately 20 minutes, the same process takes over 5 hours for
input signals of length 10 minutes. We observe a trade-off between computational
efficiency and classification proficiency.
Table 5.3: Sensitivity analysis for the CNN model trained on CGMH-training with
various input sizes
Evaluated On       Measure    30 Sec    2 Min     10 Min
CGMH-training      TP         2,860     4,240     6,387
                   FP         1,590     1,416     2,419
                   TN         32,127    32,294    31,198
                   FN         5,399     3,858     908
                   SE (%)     34.6      52.4      87.6
                   SP (%)     95.3      95.8      92.8
                   ACC (%)    83.4      87.4      91.9
                   PR (%)     64.3      75.0      74.3
                   F1         0.45      0.61      0.79
                   AUC        0.81      0.90      0.96
                   Kappa      0.36      0.54      0.74
CGMH-validation    TP         1,067     1,619     2,012
                   FP         1,164     1,705     2,527
                   TN         15,520    14,977    14,072
                   FN         2,594     1,963     1,221
                   SE (%)     29.2      45.2      62.2
                   SP (%)     93.0      89.9      84.8
                   ACC (%)    81.5      81.9      81.1
                   PR (%)     47.8      48.7      44.3
                   F1         0.36      0.47      0.52
                   AUC        0.75      0.81      0.83
                   Kappa      0.26      0.36      0.40
DREAMS Subjects    TP         998       1,458     1,775
                   FP         1,197     1,891     2,267
                   TN         15,486    14,792    14,404
                   FN         2,531     2,011     1,386
                   SE (%)     28.3      42.0      56.2
                   SP (%)     92.8      88.7      86.4
                   ACC (%)    81.6      80.6      81.6
                   PR (%)     45.5      43.5      43.9
                   F1         0.35      0.43      0.49
                   AUC        0.69      0.76      0.81
                   Kappa      0.25      0.31      0.38
UCDSADB            TP         1,344     1,789     2,199
                   FP         2,293     2,874     3,963
                   TN         13,446    12,865    11,750
                   FN         3,116     2,596     1,812
                   SE (%)     30.1      40.8      54.8
                   SP (%)     85.4      81.7      74.8
                   ACC (%)    73.2      72.8      70.7
                   PR (%)     37.0      38.4      35.7
                   F1         0.33      0.39      0.43
                   AUC        0.61      0.68      0.69
                   Kappa      0.17      0.22      0.25
TP: true positive; FP: false positive; TN: true negative; FN: false negative; ACC: accuracy; AUC:
area under the ROC curve; PR: precision; SE: sensitivity; SP: specificity.
Our sensitivity analysis proceeds. We fit the CNN model to our external validation
databases, one by one, so that we have two trained models. We use an input size of 5
minutes. In Table 5.4, we report the performance measures associated with applying
these models to our three validation databases. Note that some of the performance
measures are inflated because the model has been applied to its training database.
We notice that the model trained on the DREAMS Subjects Database does not
generalize well to all other databases. Specifically, the associated model obtained an
AUC value of 0.67 on the UCDSADB. This poor performance could be attributed to
the size of the DREAMS Subjects Database or the relative homogeneity of the subjects
compared to the UCDSADB. However, the model trained on the UCDSADB does
transfer effectively to the other databases, and we suggest that the reason involves the
fact that the number of “wake” labels is relatively high. This result is surprising because
the number of subjects in this database is less than half the number of subjects in
CGMH-training.
Table 5.4: Performance statistics for the CNN models trained on the DREAMS
Subjects Database and the UCDSADB
Training           Measure    Validation:        Validation:        Validation:
                              CGMH-validation    DREAMS Subjects    UCDSADB
DREAMS Subjects    TP         1,517              2,095              1,166
                   FP         1,928              1,396              1,713
                   TN         14,741             15,287             14,023
                   FN         1,916              1,254              3,072
                   SE (%)     44.2               62.6               27.5
                   SP (%)     88.4               91.6               89.1
                   ACC (%)    80.9               86.8               76.0
                   PR (%)     44.0               60.0               40.5
                   F1         0.44               0.61               0.33
                   AUC        0.77               0.89               0.67
                   Kappa      0.33               0.53               0.19
UCDSADB            TP         1,579              1,723              2,169
                   FP         2,271              3,000              911
                   TN         14,398             13,706             14,825
                   FN         1,854              1,626              2,069
                   SE (%)     46.0               51.5               51.2
                   SP (%)     86.4               82.0               94.2
                   ACC (%)    79.5               76.9               85.1
                   PR (%)     41.0               36.5               70.4
                   F1         0.43               0.43               0.59
                   AUC        0.75               0.75               0.87
                   Kappa      0.31               0.29               0.50
TP: true positive; FP: false positive; TN: true negative; FN: false negative; ACC: accuracy; AUC:
area under the ROC curve; PR: precision; SE: sensitivity; SP: specificity.
5.4 Further Exploration of the CNN via Data and Feature Visualization
Although there have been several works trying to theoretically understand CNNs, like
[Mal12, LTR17, WB17], our knowledge about the CNN framework is still limited. We
should take care to observe that the behaviour of our model is not unreasonable. At
the very least, viewing the activity of the model provides insight into how the network
captures HRV quantities. We introduce some notation to ease the exposition. Write
$\{x_i\}_{i=1}^{20102} \subset \mathbb{R}^{1200}$ for the 5-minute input signals from CGMH-validation. For each i,
let $y_i = f(x_i) \in \mathbb{R}^{38 \times 10}$ denote the output of the last convolution block in the network,
where $f : \mathbb{R}^{1200} \to \mathbb{R}^{38 \times 10}$ is used to denote applying the convolutional section of the
network in a feed-forward fashion. The size of each $y_i$ is determined by the network
architecture: there are 10 filters in the last convolutional layer, and we think of the
$38 \times 10$ matrix $y_i$ as 10 filtered and down-sampled versions of the original $x_i$. The
intuition is that if $y_i^j \in \mathbb{R}^{38}$ denotes the j-th column of $y_i$, i.e. the j-th filtered and
down-sampled version of $x_i$, then the probability that $x_i$ contains some feature $\psi_j$
at a time corresponding to the t-th sample ($1 \le t \le 38$) is $y_i^j(t)$. Note that f is a
non-negative matrix-valued function because of the ReLU activation function, which
maps all negative numbers to zero. We discover the morphology of the features $\psi_j$ as
follows. Fix one of the 38 samples t, and fix $j \in \{1, \ldots, 10\}$, one of the 10 non-linearly
filtered signals outputted by the final convolution block. Consider the subset of 400
indices defined sequentially as
$$i_k := \operatorname*{arg\,max}_{i \notin \{i_1, \ldots, i_{k-1}\}} y_i^j(t), \qquad 1 \le k \le 400, \tag{5.2}$$
and set
$$\psi_j(t) := \operatorname{median}\{x_{i_k} : 1 \le k \le 400\} \in \mathbb{R}^{1200}. \tag{5.3}$$
In Figure 5.6, we plot $\psi_j(t)$ for $1 \le j \le 10$ and $t = 18$. These plots represent the
common information held among input signals with similar activations. Note that
choosing a different sample t centers the main activity of the plot at a different point
in time. The captured features appear to be related to low-frequency variability
in the IHR series, but this result could be merely a consequence of the fact that
the averaging process cancels out the detailed information.
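The selection in (5.2) and the pointwise median in (5.3) can be sketched in a few lines. This is an illustration with hypothetical names and toy data: `inputs` stands for the signals $x_i$ and `activations` for the scalars $y_i^j(t)$ at one fixed pair (j, t). Taking the `top_k` largest activations at once selects the same index set as the sequential arg max in (5.2), up to tie-breaking.

```python
from heapq import nlargest
from statistics import median


def typical_feature(inputs, activations, top_k=400):
    """Pointwise median of the top_k inputs with the largest activations.

    inputs:      list of equal-length signals x_i
    activations: list of scalars y_i^j(t) for one fixed (j, t)
    """
    # Indices i_1, ..., i_k of the largest activations, as in (5.2).
    top = nlargest(top_k, range(len(inputs)), key=lambda i: activations[i])
    n_samples = len(inputs[0])
    # Median across the selected signals at every sample, as in (5.3).
    return [median(inputs[i][s] for i in top) for s in range(n_samples)]


# Toy data (made up): four 2-sample signals and their activations.
toy_inputs = [[0, 0], [1, 2], [3, 4], [5, 6]]
toy_activations = [0.9, 0.1, 0.8, 0.7]
psi = typical_feature(toy_inputs, toy_activations, top_k=3)
```

In the toy data the three largest activations belong to signals 0, 2, and 3, so the result is the pointwise median of those three signals.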
[Figure 5.6: ten panels, each plotting median HR fluctuation (axis from 0 to 15) against time in seconds.]
Figure 5.6: We illustrate the features captured by the trained CNN model. The
plots in the first row are $\psi_1(18)$-$\psi_4(18)$, the plots in the second row are $\psi_5(18)$-$\psi_8(18)$,
and the plots in the third row are $\psi_9(18)$-$\psi_{10}(18)$.
According to our analysis, the features in plots 2, 3, 5, 8, 9, and 10 are indicative of
the wake stage, whereas the other plots are indicative of the sleep stage. The procedure
is as follows. For each sample $t \in \{1, \ldots, 38\}$ and each output signal $j \in \{1, \ldots, 10\}$,
we perform two statistical tests using the labeled data from CGMH-validation. For
each i, let $x_i(l)$ denote the label (1 or 0 for “wake” or “sleep”, respectively) associated
to the input signal $x_i$. In Figure 5.7 (a), the (j, t)-entry is $-\log p$, where p is the
probability that the set
$$\{y_i^j(t) : x_i(l) = 1\} \tag{5.4}$$
is sampled from a distribution with median less than or equal to the median of the
distribution from which the set
$$\{y_i^j(t) : x_i(l) = 0\} \tag{5.5}$$
is sampled. The colour black is associated with large values of $-\log p$ and the
proposition that if $x_i$ possesses feature $\psi_j$ at a time corresponding to sample t, then
$x_i(l)$ is likely equal to 1. In Figure 5.7 (b), the (j, t)-entry is $-\log p$, where p is the
probability that the set
$$\{y_i^j(t) : x_i(l) = 0\} \tag{5.6}$$
is sampled from a distribution with median less than or equal to the median of the
distribution from which the set
$$\{y_i^j(t) : x_i(l) = 1\} \tag{5.7}$$
is sampled. The colour black is associated with large values of $-\log p$ and the
proposition that if $x_i$ possesses feature $\psi_j$ at a time corresponding to sample t, then
$x_i(l)$ is likely equal to 0. Remember that the label associated to $x_i$ is based on the
last 30 seconds of $x_i$. What we see is expected: detecting a feature closer to the end
of the signal ($t \to 38$) is more useful for determining the true label of the input signal.
[Figure 5.7: two panels, (a) Features indicative of the wake stage and (b) Features indicative of the sleep stage, each with axes output signal j and sample t.]
Figure 5.7: At each time t, we assess the predictive potential of the 10 features $\psi_j$
learned by the CNN model. The black colour in the (j, t)-entry of (a) (resp. (b))
indicates that the feature $\psi_j$ is useful for identifying the wake (resp. sleep) stage when
it occurs at sample t.
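The text does not name the statistical test behind each $-\log p$ entry. Purely as an assumption for illustration, the sketch below uses a one-sided Mann-Whitney (rank-sum) test with a normal approximation and no tie or continuity correction, which is one common way to compare the medians of two samples. The function name and the toy activations are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def rank_sum_p_greater(a, b):
    """One-sided p-value; small when values in a tend to exceed values in b.

    Mann-Whitney U with a normal approximation (assumed test, see lead-in).
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a, k = 0.0, 0
    while k < len(pooled):
        m = k
        while m < len(pooled) and pooled[m][0] == pooled[k][0]:
            m += 1
        avg_rank = (k + 1 + m) / 2  # average of the tied ranks k+1, ..., m
        rank_sum_a += avg_rank * sum(1 for _, g in pooled[k:m] if g == 0)
        k = m
    na, nb = len(a), len(b)
    u = rank_sum_a - na * (na + 1) / 2
    z = (u - na * nb / 2) / sqrt(na * nb * (na + nb + 1) / 12)
    return 1.0 - NormalDist().cdf(z)


# Toy activations (made up) for one (j, t): wake-labeled vs. sleep-labeled inputs.
wake_act = [5.0, 6.0, 7.0, 8.0]
sleep_act = [1.0, 2.0, 3.0, 4.0]
p_wake = rank_sum_p_greater(wake_act, sleep_act)
entry_a = -log(max(p_wake, 1e-300))  # guard against log(0) for extreme samples
```

A feature that fires more strongly on wake-labeled inputs yields a small p and hence a large entry in panel (a); swapping the two samples gives the corresponding entry for panel (b).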
In Figure 5.8, we feed input signals 5 minutes in length through the trained CNN.
Given an input signal $x_i$, we plot the output signals
$$\sigma_i^{\mathrm{wake}}(t) := \max\{y_i^j(t) : j = 2, 3, 5, 8, 9, 10\}; \tag{5.8}$$
$$\sigma_i^{\mathrm{sleep}}(t) := \max\{y_i^j(t) : j = 1, 4, 6, 7\}. \tag{5.9}$$
Note that we previously established which features $\psi_j$ in our model correspond to
the wake and sleep stages (see Figure 5.7). Maximum activations are considered
to enhance visibility. What we see is expected: in the first three examples (from
CGMH-validation), when the subject is awake, the network detects the fluctuations
which, based on our physiological knowledge [SHMG64, SDMA93], should correspond
to sympathetic tone dominance and line up with the subject’s arousal. For the first
input signal, we can visualize a clear change in heart rate during the wake stage. For
the second and third signals, $\sigma_i^{\mathrm{wake}}$ appears to be correlated with the irregularity of
the IHR series, whereas for the last signal, $\sigma_i^{\mathrm{sleep}}$ appears to be correlated with the
regularity of the IHR series. It is not always visually clear which properties of the
IHR series the network is recognizing, but the results in this paper indicate that the
network is successful at capturing these hidden dynamics.
[Figure 5.8: three rows of four panels plotted against time in seconds. Top row: heart rate fluctuation $x_i$. Middle row: maximum activation $\sigma_i^{\mathrm{wake}}$. Bottom row: maximum activation $\sigma_i^{\mathrm{sleep}}$.]
Figure 5.8: We present typical activations in the last convolutional layer of the CNN
model. The top row shows four typical input IHR signals, and the corresponding
activations in the last convolutional layer of the CNN model are shown in the bottom
two rows. See (5.8) and (5.9) for definitions of $\sigma_i^{\mathrm{wake}}$ and $\sigma_i^{\mathrm{sleep}}$. The red colour indicates that
the subject is awake, whereas the black colour indicates that the subject is asleep.
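The two summary signals in (5.8) and (5.9) are simply maxima over two groups of output channels. A minimal sketch, with a hypothetical function name and a made-up two-sample activation matrix in place of the $38 \times 10$ matrix $y_i$:

```python
WAKE_FEATURES = (2, 3, 5, 8, 9, 10)   # 1-based channel indices j in (5.8)
SLEEP_FEATURES = (1, 4, 6, 7)         # 1-based channel indices j in (5.9)


def max_activation(y, features):
    """Return the per-sample maximum over the given channels.

    y is a list of rows, one per sample t; each row holds the 10 values y_i^j(t).
    """
    return [max(row[j - 1] for j in features) for row in y]


# Made-up activations for two samples t (a real y_i has 38 rows).
y_toy = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [9, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
sigma_wake = max_activation(y_toy, WAKE_FEATURES)
sigma_sleep = max_activation(y_toy, SLEEP_FEATURES)
```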
5.5 Discussion
In the field of HRV analysis, an HRV index is usually calculated on a window (subset)
of the IHR or RRI series. For example, it is suggested to estimate HRV indices
such as approximate entropy [Eng98] or one of the many Poincaré map-based indices
[CdlHMFAL06] using a window containing at least 300 beats. The choice of window
size strongly affects the quality of information extracted by the chosen HRV index.
If the window size is large, and the IHR can be well-approximated by a stationary
process, the HRV index is stable. But in the presence of non-stationarity (as is usually
the case for large windows), the index becomes unreliable (e.g. the standard deviation
of a time series is strongly affected by baseline wandering). If the window size is small,
under an assumption of local stationarity, the impact of non-stationarity becomes
negligible. However, shortening the window destabilizes most indices because they
are calculated using too little information. Hence, a type of “uncertainty principle”
exists in HRV analysis, just as in other non-stationary time series analysis fields.
The traditional approach to HRV index calculation is to associate a single number to
each window of the IHR or RRI series. This number is effectively invariant under cyclic
permutations of the window, and the location of a detected feature is not preserved.
On the other hand, a 1-dimensional convolutional layer behaves equivariantly under
translations of the input, i.e. if the inputted time series is cyclically permuted, so is
the outputted time series. In Figure 5.8, we see a temporal correspondence between
fluctuation morphologies in the inputted time series and large activations in the
outputted time series. The binary classifier appended to the final convolution block is
informed not only of the presence of certain HRV features, but also of their location
in time relative to the labeled epoch.
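The contrast between an invariant index and an equivariant layer can be checked numerically. The sketch below is an idealized illustration: it uses a circular convolution (so that equivariance under cyclic permutation is exact) with a hypothetical two-tap filter and made-up data, and it uses the standard deviation as the example of a single-number index. A convolutional layer with zero padding behaves this way only away from the boundaries of the signal.

```python
from statistics import pstdev


def circular_conv(x, kernel):
    """Circular convolution of x with kernel (indices wrap around)."""
    n = len(x)
    return [
        sum(kernel[m] * x[(t - m) % n] for m in range(len(kernel)))
        for t in range(n)
    ]


def shift(x, s):
    """Cyclically shift x to the right by s samples."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)


x = [0, 0, 5, 9, 4, 0, 0, 0]   # made-up fluctuation in an otherwise flat series
kernel = [1, -1]               # hypothetical filter (first difference)

# Equivariance: filtering a shifted input equals shifting the filtered output.
filtered_then_shifted = shift(circular_conv(x, kernel), 3)
shifted_then_filtered = circular_conv(shift(x, 3), kernel)

# Invariance: the single-number index cannot tell where the fluctuation occurred.
index_original = pstdev(x)
index_shifted = pstdev(shift(x, 3))
```

The filtered output moves with the fluctuation, so its location is preserved for the classifier, whereas the standard deviation is identical for the original and shifted series.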
The pathway through which input size affects the classification result is unclear,
but we provide some interpretation. Sleep stage classification using heart rate relies on
precisely locating time-varying fluctuation in the IHR series. A large-enough window
is required to capture these fluctuations. This high-level interpretation is consistent
with baseline calibration techniques used in the field of anaesthesiology. For example,
in [SKVG+05], the authors use HRV indices to determine if a subject has felt the
pain of a surgical incision during anaesthesia. Post-incision (0-120 sec) fluctuation
of the IHR series indicates a painful experience for the patient. The authors detect
this fluctuation by comparing the post-incision HRV indices to the pre-incision HRV
indices, while a normalization scheme is used to account for inter-individual variance.
The crucial information is not the raw HRV indices but whether they begin to fluctuate
after the incision compared with the baseline. A similar study using thermal stimuli
is [HKH+12].
In the robustness check, we carry out a kind of “cross-validation” at the database
level (see Table 5.4). From the statistical viewpoint, this check provides evidence
that the model structure and training process are effective independently of the chosen
database. Without any theory to back up the effectiveness of the CNN framework, we
need reassurance that the choices of parameters such as the number of convolution
blocks, the number of filters, the kernel size, the number of epochs, and the learning
rate are not over-fitted to one use-case. We take further care to make the pre-processing
steps as weak as possible: we do not manually correct the R peak detection algorithm
outcome, but simply reject epochs with a very loose criterion. Our database-level
cross-validation also shows that the selected CNN-IHR series pairing is robust to
factors which should be irrelevant to the classification problem, such as the monitoring
machine, the original sampling rate, and the presence of measurement noise. An
interesting phenomenon we observe in Table 5.4 is that although the UCDSADB
contains subjects with sleep apnea of various severities, the model established from it
performs reasonably well on CGMH-validation and the DREAMS Subjects Database.
On the other hand, when the model is trained on CGMH-training or the DREAMS
Subjects Database (which are composed of normal subjects), we see that it does
not perform well on the UCDSADB, particularly on those subjects with severe sleep
apnea (see Figure 5.4). This result suggests an interesting conjecture: the capacity
of the CNN framework is large enough to include features from both normal and
abnormal subjects. In other words, the CNN efficiently learns features for both normal
and abnormal subjects from the UCDSADB, but it does not learn anything about
abnormal subjects from the DREAMS Subjects Database, and this conjecture explains
the asymmetric validation outcomes shown in Tables 5.1 and 5.4.
The transfer proficiency test deserves more discussion. In addition to showing that
the CNN model trained on one modality can be applied to another modality, we see
that the CNN model which is both trained and validated on IHR-PPG series performs
slightly better than the CNN model trained and validated on IHR series (see Tables
5.1 and 5.2). At first glance, this result seems unrealistic because the link between the
ECG signal and the ANS is more direct than the link between the PPG signal and the
ANS. However, our simplest explanation is that this slight difference is a consequence
of the sensitivity of the PPG signal to movement. A moving subject is likely awake,
and when a subject moves, his PPG signal will experience more noise than his ECG
signal. Hence, on average, the IHR-PPG series will experience more distortions than
the IHR series. We could hypothesize that the CNN associates such distortions with
the wake stage, but other interpretations are available: note that although the IHR
and IHR-PPG series both aim to capture heart rate, they are different. The main
difference comes from the pulse transit time (PTT), which describes the time it takes
for pumped blood to travel from the heart to the periphery, where the PPG signal is
recorded. It is well known that the PTT contains blood pressure information and more
general information about hemodynamics [MHI+15]. In other words, the IHR-PPG
contains both heart rate information and hemodynamics information. A large-scale
empirical study is needed to confirm whether the CNN framework is sensitive enough
to exploit this additional information, and a theoretical analysis is recommended.
Finally, we discuss our exploration of the CNN framework provided in Section 5.4.
The analysis carried out is applicable to only one version of the trained model. There
is no theoretical basis for expecting the model to be the same after re-training. Besides
exhibiting a simple permutation of the features, the network could converge to an
entirely different local minimum, causing the new feature set to be distinct from the
old one. It is important to realize that although we have shown the robustness of our
model from various viewpoints, this lack of explicit reproducibility experienced with
NNs is troubling.
5.5.1 Comparison with Previous Work
We compare our result to previous studies in the field that feature inter-individual
validation or cross-validation. In [XYS+13], 41 features were used to build a random
forest model and differentiate between the wake, REM, and non-REM stages. The
database was composed of healthy subjects between the ages of 16 and 61. Based
on the confusion matrix provided in [XYS+13], the sensitivity, specificity, accuracy,
and F1 for detecting the wake stage are 51.2%, 90.2%, 84.0%, and 0.50, respectively. However, we
cannot make a direct comparison between our work and the work in [XYS+13] because
the authors only analyzed those data labeled with “stationary,” whereas we analyze the
whole database.
In [MMC+10], the REM and non-REM stages were differentiated. The authors
take the temporal information into account by using a time-varying auto-regressive
model, and they use the phase and magnitude of the “sleepy pole” as new features.
The database was composed of 24 subjects between the ages of 40 and 50 with body
mass index less than 29 kg/m² and apnea-hypopnea index 0. The reported sensitivity,
specificity, and accuracy of the trained hidden Markov model were 70.2%, 85.1%, and
79.3%, respectively. Although the HRV properties of the REM stage are similar to
those of the wake stage, they are physiologically different and we cannot achieve a
fair comparison.
In [LSCS05], a variety of features and classification algorithms were considered
and evaluated on a database composed of 190 infants. The wake and sleep classes
were balanced for the analysis. The sensitivity and specificity of their multi-layer
perceptron model (without rejection) were 79.0% and 77.5%, respectively. Because
there are physiological variations between the sleep dynamics of infants and adults, a
comparison might not be meaningful.
In [AMT+15], a feed-forward NN was applied to various time-domain, frequency-
domain, and regularity features to differentiate between the wake and sleep stages.
Detrended fluctuation analysis was also used. Various epoch lengths were considered,
and the highest performance was recorded on an epoch length of 5 minutes. The ECG
recordings came from 20 subjects aged 49-68 years with varying degrees of sleep apnea.
The authors performed two cross-validation schemes: in the inter-individual leave-one-
subject-out scheme, the accuracy, sensitivity, specificity, and kappa coefficient were
71.9 ± 18.2%, 43.7 ± 27.3%, 89.0 ± 7.8%, and 0.29 ± 0.24, respectively. (The other
validation method was not inter-individual.) In comparison with our approach, the
lower performance of their model could be attributed to the physiological heterogeneity
of their subjects.
In [LFH+12], an adaptive method was used to extract spectral HRV features
from the IHR series. Fifteen subjects aged 31.0 ± 10.4 years with Pittsburgh Sleep
Quality Index less than 6 were considered in a leave-one-subject-out cross-validation
scheme. The linear discriminant-based classifier achieved a Cohen’s kappa coefficient
of 0.48 ± 0.24 and an AUC of 0.54. The sensitivity was 49.7 ± 19.2%, and the specificity
was 96 ± 3.3%. Further publications which perform inter-individual classification are
[TMPG05, KMF09, FLR+15]. However, they are not compared here because they
additionally report use of the respiratory signal.
Overall, due to the heterogeneity of the data sets used in different publications, it
is not easy to make direct comparisons, but we recognize from the reported numbers
the standard to which we should adhere.
5.5.2 Limitations and Future Work
From the database perspective, the chief limitation in this study is the class imbalance.
As a result of this imbalance, the training procedure is more challenging, and the
algorithm’s performance is skewed by the size difference between the two classes. This
can be seen in the low sensitivity of all validation checks. Another limitation is the
number of cases. It is widely believed that a larger high-quality database yields
a better trained model. We show that a small training set from a few
subjects is enough to obtain a network which generalizes well, but, as demonstrated
by our results on the UCDSADB in Table 5.3, physiological heterogeneity related
to sleep apnea contributes strongly to adverse classification results. With a larger
database, we might tune the training process to accommodate physiological variability
among subjects and databases.
From the signal processing perspective, in this study we focus mainly on the IHR
and IHR-PPG series, which are extracted from the raw ECG and PPG signals. It is
clear that the IHR and IHR-PPG series are “dimension reduced” versions of the raw
ECG and PPG signals, which contain more information than simply heart rate; e.g.
respiratory information is usually hidden in the ECG and PPG signals, and blood
pressure and hemodynamics information is contained in the PPG signal. We might
obtain an improvement by taking the raw time series into account. Doing so comes at a
cost, because both the size of the network and the size of the training set would need to
grow. On the other hand, if there are more recorded vital signs available, taking them
into account may improve classification performance. For example, we might take
multiple ECG leads into account, or we might consider an ECG recording and a PPG
recording simultaneously. It has been shown in [TMPG05, KMF09, FLR+15] that
the respiratory flow signal contains abundant information related to sleep dynamics.
From the algorithmic and theoretical perspective, the network architecture employed
in this paper is relatively simple compared to other works in the field (e.g.
[YNS+15]) and could be augmented to improve classification performance at the
cost of computational resources. We might introduce a recurrent structure to
the network, as in long short-term memory networks [HS97], to exploit
the sequential nature of the problem. Although there have been several efforts
[Mal12, PNB16, LTR17, WB17] to understand how CNNs and general deep NNs
work, this remains an open question. Understanding this “black box” is an
urgent mission for upcoming applications.
From the clinical perspective, accurately assessing the wake stage has an important
application in the field of sleep apnea screening. The apnea-hypopnea index (AHI) is
obtained by counting the apnea and hypopnea events that occur while the subject
is asleep. If a patient is labeled as awake while he is asleep, then his AHI will be
underestimated. Without a proper diagnosis, an unhealthy patient will continue
to be exposed to the dangers associated with severe sleep apnea. Our established
model can be applied to the problem of screening for sleep apnea using home-care
class IV equipment. There are several open problems listed above, ranging from data
collection, to theoretical work, to clinical applications. We will report the result of
resolving these problems in future work.
5.6 Conclusion
In this study, we build a CNN model to predict every 30 seconds whether a subject is
awake or asleep using his overnight ECG or PPG recording. The considered CNN not
only detects useful HRV features from the interpolated IHR series but also pinpoints
their location in time. Since the DREAMS Subjects Database is composed of subjects of different
races, our results suggest that the model is flexible enough to be applied to subjects of different
races. The acceptable performance on the UCDSADB shows that the model is flexible
enough to be applied to subjects with various severities of sleep apnea. Our series of
robustness checks provides evidence that although the CNN framework is minimally
understood from a theoretical viewpoint, its effectiveness in our chosen application is
not accidental.
5.7 Supplementary Results
For the sake of completeness, we present results in connection with the main
result which further validate the use of this method in the particular field of sleep
stage classification. Although it is not the main focus of this work, we are interested
in the binary problem of differentiating between the REM and non-REM stages while
a subject is asleep [MMC+10, AMT+15]. As in [AMT+15], we discard all epochs
labeled with “wake.” Since we are no longer concerned with the wake stage, we return
to our original database of 90 subjects from CGMH and reject a subject if his ratio of
“REM” to “non-REM” labels is less than 0.1. There are 77 subjects remaining, and
we call this database CGMH-training (REM). (Besides removing the “wake” epochs,
we do not change CGMH-validation for this experiment.) We train the CNN model
on signals of length 5 minutes from CGMH-training (REM), and report in Table 5.5
the performance measures associated with applying the trained model to the three
validation databases. In this table, the positive class is associated with the “REM”
label. After disregarding the “wake” labels, the percentage of “REM” labels is 17.8% in
CGMH-training (REM), 14.8% in CGMH-validation, 18.1% in the DREAMS Subjects
Database, and 19.0% in the UCDSADB. Overall, the performance of the CNN model
appears to exceed the performances reported in [MMC+10, AMT+15]. Note that the
performance is lower in the UCDSADB, which is expected due to sleep disturbances
caused by sleep apnea.
The scattering transform (ST) [Mal12] is a theoretical framework based on the
wavelet transform that builds representations of time series and images that are useful
for classification. The ST is inspired by the success of the CNN framework and builds
features using a similar succession of operations. The crucial difference between the
ST and the CNN framework is that the filters employed in the ST are pre-designed
wavelets and are not learned. The theoretical properties of the ST are well-understood
and are discussed in [Mal12] with several generalizations [CL19, QCCS18]. We replace
the feature design step in our CNN model with the ST in order to provide evidence
that the effectiveness of our CNN model is not accidental. We compute the second
order (m = 2) ST of each input time series using Morlet wavelets, one wavelet per
octave (Q = 1), and a local averaging parameter of T = 27. A support vector machine
(SVM) is commonly used to perform the classification step. However, in our work,
the features designed by the ST are fed through a neural network classifier consisting
of two dense layers of 20 nodes; each node employs a bias and a sigmoid activation
function. The output of the second dense layer is fed with biases into two output
nodes corresponding to the wake and sleep stages. We train the network using scaled
conjugate gradient back-propagation for 200 epochs, and we use cross-entropy as the
loss function. We implement all of the ST and classification steps using MATLAB
R2015a, ScatNet-0.2 (http://www.di.ens.fr/data/software/scatnet/), and the Neural Network Toolbox.
The performance statistics of the ST-based model trained on CGMH-training are shown in Table 5.6. The
results obtained on CGMH-validation are close to the corresponding results obtained
using the CNN model. The results obtained on the DREAMS Subjects Database
and the UCDSADB are acceptable but not as strong as the corresponding results
obtained using the CNN model.
Table 5.5: Performance statistics for the REM vs. non-REM model trained on CGMH-training (REM)
           CGMH-training (REM)   CGMH-validation   DREAMS Subjects   UCDSADB
TP         5,822                 1,563             1,363             1,626
FP         3,154                 1,122             713               1,222
TN         37,632                13,083            12,974            11,527
FN         3,017                 901               1,656             1,361
SE (%)     65.9                  63.4              45.2              54.4
SP (%)     92.3                  92.1              94.8              90.4
ACC (%)    87.6                  87.9              85.8              83.6
PR (%)     64.9                  58.2              65.7              57.1
F1         0.65                  0.61              0.54              0.56
AUC        0.91                  0.89              0.87              0.83
Kappa      0.58                  0.54              0.45              0.47
TP: true positive; FP: false positive; TN: true negative; FN: false negative; ACC: accuracy; AUC: area under the ROC curve; PR: precision; SE: sensitivity; SP: specificity.
Table 5.6: Performance statistics for the scattering transform-based model trained on CGMH-training
           CGMH-training   CGMH-validation   DREAMS    UCDSADB
TP         4,026           1,709             1,569     1,773
FP         3,874           1,684             2,560     2,873
TN         29,819          14,985            14,123    12,863
FN         3,753           1,724             1,780     2,465
SE (%)     51.8            49.8              46.8      41.8
SP (%)     88.5            89.9              84.7      81.7
ACC (%)    81.6            83.0              78.3      73.3
PR (%)     51.0            50.4              38.0      38.2
F1         0.51            0.50              0.42      0.40
AUC        0.83            0.82              0.75      0.68
Kappa      0.40            0.40              0.29      0.23
TP: true positive; FP: false positive; TN: true negative; FN: false negative; ACC: accuracy; AUC: area under the ROC curve; PR: precision; SE: sensitivity; SP: specificity.
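The summary rows of Tables 5.5 and 5.6 follow from the four raw counts. The sketch below is not part of the original MATLAB pipeline; it uses the standard definitions of sensitivity, specificity, accuracy, precision, F1, and Cohen's kappa, and the function name `binary_metrics` is a hypothetical helper introduced only for illustration. It is applied to the CGMH-validation column of Table 5.5. The AUC is omitted because it cannot be recovered from a single confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return summary statistics of a binary classifier from raw counts."""
    n = tp + fp + tn + fn
    se = tp / (tp + fn)                 # sensitivity (recall of the positive class)
    sp = tn / (tn + fp)                 # specificity
    acc = (tp + tn) / n                 # accuracy
    pr = tp / (tp + fp)                 # precision
    f1 = 2 * tp / (2 * tp + fp + fn)    # harmonic mean of precision and sensitivity
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (acc - p_chance) / (1 - p_chance)
    return {"SE": se, "SP": sp, "ACC": acc, "PR": pr, "F1": f1, "Kappa": kappa}


# Counts taken from the CGMH-validation column of Table 5.5.
stats = binary_metrics(tp=1563, fp=1122, tn=13083, fn=901)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Up to rounding, the printed values should agree with the corresponding entries of the table.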
Chapter 6
Blind Source Separation in Atrial
Fibrillation
Electrocardiograms often feature an arrhythmia called atrial fibrillation (Af) that is
associated with heart failure and stroke [BCB12]. During Af, the upper (atrial) and
lower (ventricular) compartments of the heart are de-synchronized, and the recorded
ECG is a summation of two independent time series (see Figure 6.1). Isolating
these time series is a single-channel blind source separation problem. Cardiologists
are interested in leveraging the ECG recordings of Af patients to predict adverse
cardiac events and thus determine the necessity of a rhythm intervention. The
relationship between a patient’s ventricular activity and his or her susceptibility to
negative outcomes has been extensively studied (less so for Af patients in particular)
[MC90], but a rigorous analysis of the atrial activity (called the fibrillatory wave,
or the f-wave) has been elusive due in part to the lack of an effective extraction
algorithm. In this chapter, we discuss the development of an f-wave extraction
algorithm which improved on a number of existing algorithms; previous algorithms
would leave ventricular residua in the extracted f-wave [MRWW17, Figure 4]. The
algorithm was designed to work on single-channel ECG signals of arbitrary length
in order to leverage the increasing prevalence of mobile health monitoring devices
[BKH+14] and because of the need to screen for diurnal patterns in the f-wave. The
algorithm does not depend on expert interpretation or fine instrumentation, and thus
can increase the diagnostic capabilities of medical teams in rural areas or developing
countries.
There are several challenges we must face when performing f-wave extraction.
Figure 6.1: We show an ECG signal featuring atrial fibrillation. To aid physicians in medical decision-making, the atrial activity (the f-wave) must be separated from the ventricular activity.
The first challenge is the spectral overlap between the ventricular activity and the f-wave, which guarantees
that a filtering approach will fail to fully separate the f-wave from the ventricular
activity. The second challenge is that due to the dynamics of the cardiac system,
the shape of each ventricular activation changes from beat to beat. In Af patients,
the situation is further complicated by frequent morphologically distinct premature
ventricular contractions (PVCs). The third challenge is the lack of ground truth.
In general, we do not have an explicit characterization of the separated atrial and
ventricular activities. Such a characterization could be obtained from an intracardiac
electrophysiology study. However, the relationship between the intracardiac atrial
activity and the f-wave that we observe in a surface ECG is not trivial. This lack of
ground truth makes validating a proposed algorithm difficult. To handle this difficulty,
we need a set of simulated recordings that approximate real Af cases well, as well as a
new metric to evaluate the performance of the algorithm in a real signal. Having a
systematic, reliable, and accurate method to extract the f-wave from a single-lead
ECG for the purposes of monitoring, diagnosis, and treatment is not only a challenging
and interesting signal processing problem, but also an important clinical endeavor.
This chapter is a modification of the work presented in [MRWW17] and is organized
in the following way. In Section 6.1, we introduce the proposed single-lead f-wave
extraction algorithm. We specify our model for an ECG featuring Af, and we
provide a theoretical guarantee for the quality of the extracted f-wave under suitable
assumptions. In Section 6.2, we validate our proposed algorithm on simulated signals,
as well as on real Holter recordings. We compare our algorithm to other common
algorithms in the literature. The simulation methods are detailed for the purpose of
reproducibility. We provide a discussion of the result in Section 6.3.
6.1 Methods
In this section, we describe our new single-lead f-wave extraction algorithm, the
simulated database of Af signals, and the metrics used to evaluate our algorithm. We
call our algorithm DD-NLEM.
6.1.1 DD-NLEM
We begin by providing an overview of the theoretical aspects of our algorithm. An Af
patient’s ECG signal $E : \mathbb{R} \to \mathbb{R}$ can be written as
\[ E = V + A + \Phi, \tag{6.1} \]
where $V : \mathbb{R} \to \mathbb{R}$ is the ventricular activity, $A : \mathbb{R} \to \mathbb{R}$ is the atrial activity, and
$\Phi : \mathbb{R} \to \mathbb{R}$ is measurement noise. The input to our single-lead f-wave extraction
algorithm is the signal $E$, and the desired output is the signal $A + \Phi$. Due to the
rapid nature of the f-wave, the signal $A$ will be periodic with frequency 3-12 Hz. A
bandpass filter $h : \mathbb{R} \to \mathbb{R}$ can be applied so that
\[ h(E) = h(V) + \varepsilon, \tag{6.2} \]
where $\varepsilon : \mathbb{R} \to \mathbb{R}$ is a small noise term describing the difference between $h(V)$ and
$h(V + A + \Phi)$. However, the filtered $h(V)$ is a distortion of the original $V$, and if
we simply attempt to remove $h(E)$ from $E$, the resulting depiction of the f-wave
will have significant residual artifacts because the signals $V$ and $h(V)$ can be quite
different. The remedy we propose is as follows. The ventricular activity during the
first $N$ ventricular contractions (heart beats) can be written as
\[ V(t) = \sum_{i=1}^{N} v_i(t - t_i), \tag{6.3} \]
where $v_i \in \mathcal{V}$ is sampled from a wave-shape manifold $\mathcal{V}$ embedded in $L^2(\mathbb{R})$, and there
exists a positive number $b > 0$ such that each $v_i$ is supported on $[-b, b]$. In other
words, $V$ adheres to the wave-shape oscillatory model with wave-shape manifold $\mathcal{V}$.
The manifold $\mathcal{V}$ is isomorphic to the implicit phase manifold describing the dynamics
of the ventricles. Each $v_i$ represents the $i$-th ventricular contraction, and $t_i \in \mathbb{R}$ is the
time at which the $i$-th contraction occurs.
The time points $t_i$ can be recovered from the signal $E$ using a standard peak
detection algorithm. Then, a set of patches $X := \{x_i\}_{i=1}^{N} \subset L^2[-b, b]$ is constructed
using the formula
\[ x_i(t) = E(t_i - t). \tag{6.4} \]
The corresponding filtered patches are also assembled: $h(X) := \{h(x_i)\}_{i=1}^{N}$. Now
\[ x_i = v_i + a_i + \varphi_i, \tag{6.5} \]
and
\[ h(x_i) = h(v_i) + \varepsilon_i, \tag{6.6} \]
where $a_i(t) := A(t_i - t)$ and $\varphi_i(t) := \Phi(t_i - t)$ for $t \in [-b, b]$, and $\varepsilon_i$ is a small noise
term describing the remainder $h(v_i + a_i + \varphi_i) - h(v_i)$. We assume that $h(v_i)$ has been
sampled from a submanifold $\mathcal{W}$ of $L^2[-b, b]$ which is diffeomorphic to $\mathcal{V}$.
The truncated diffusion map (an unsupervised manifold learning algorithm) yields
a mapping $\Theta_p : h(X) \to \mathbb{R}^p$ which approximates an isometry of $\mathcal{W}$ (and hence an
embedding of $\mathcal{V}$, via the diffeomorphism) into $\mathbb{R}^p$ for some $p$ that is large enough. Most
importantly, the mapping $\Theta_p$ can be made agnostic to the noise $\varepsilon_i$ by ignoring the
eigenvectors of the diffusion operator which correspond to smaller eigenvalues. Given
a patch $x_i$, we can estimate $v_i$ by an approach called the non-local median. First,
find the $K$ nearest neighbours to $\Theta_p \circ h(x_i)$ from the set $\Theta_p \circ h(X)$, and label them
$\Theta_p \circ h(x_{i_1}), \ldots, \Theta_p \circ h(x_{i_K})$. Due to the embedding which preserves local neighbourhoods,
we can expect that
\[ \tilde{v}_i := \operatorname{median}\{x_{i_1}, \ldots, x_{i_K}\} = v_i + \operatorname{median}\{a_{i_1} + \varphi_{i_1}, \ldots, a_{i_K} + \varphi_{i_K}\}, \tag{6.7} \]
where the median is taken point-wise in $t$. Since the atrial activity and measurement
noise in each patch are independent of the ventricular activity in that patch, we have
\[ \operatorname{median}\{a_{i_1} + \varphi_{i_1}, \ldots, a_{i_K} + \varphi_{i_K}\} = \operatorname{median}_{1 \le j \le N}\{a_j + \varphi_j\}, \tag{6.8} \]
which we assume is the zero function on $[-b, b]$. Consequently, $\tilde{v}_i$ is an accurate
estimate for $v_i$. Removing
\[ \tilde{V}(t) := \sum_{i=1}^{N} \tilde{v}_i(t - t_i) \tag{6.9} \]
from $E$ yields the signal of interest, namely $A + \Phi$. Having provided an overview of
the theory behind DD-NLEM, in the remainder of this section, we discuss its numerical
implementation.
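To illustrate the non-local median of (6.7), the following sketch takes the point-wise median of a small set of toy patches that share a common template. The template values, the Gaussian perturbation (a stand-in for the atrial activity plus noise, chosen so that its median is zero), and the helper name `pointwise_median` are assumptions made for illustration; they are not taken from the recordings or from the DD-NLEM implementation.

```python
import random
from statistics import median


def pointwise_median(patches):
    """Median across patches, taken independently at each sample index."""
    return [median(column) for column in zip(*patches)]


random.seed(0)
template = [0.0, 0.2, 1.0, -0.4, 0.1]   # toy stand-in for a ventricular wave-shape
# 31 neighbouring patches: the shared template plus a zero-median perturbation.
patches = [[v + random.gauss(0.0, 0.05) for v in template] for _ in range(31)]

estimate = pointwise_median(patches)
max_error = max(abs(e - v) for e, v in zip(estimate, template))
print(max_error)
```

Under the zero-median assumption, the estimate stays close to the template even though every individual patch is perturbed.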
Pre-processing
The pre-processing step contains several commonly applied methods in the field. The
recorded surface ECG is sampled at a rate of $f_s$ samples per second and is saved as an
$n$-dimensional vector $\mathbf{E} \in \mathbb{R}^n$, where for each $i \in \{1, \ldots, n\}$, we have
\[ \mathbf{E}(i) = E(i/f_s). \tag{6.10} \]
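The signal lasts for $T = n/f_s$ seconds, and the collected samples satisfy
\[ \mathbf{E} = \mathbf{V} + \mathbf{A} + \boldsymbol{\Phi}, \tag{6.11} \]
where $\mathbf{V}, \mathbf{A}, \boldsymbol{\Phi} \in \mathbb{R}^n$ are discretizations of $V$, $A$, and $\Phi$, respectively. The goal of the
implementation is to cancel $\mathbf{V}$ from $\mathbf{E}$ so that we obtain the discretized f-wave $\mathbf{A} + \boldsymbol{\Phi}$.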
If $f_s$ is lower than 1000 Hz, the signal is upsampled to $f_s = 1000$ Hz to enhance
the resolution of the R wave detection step that will follow. We remove baseline
wandering and attenuate noise by applying an order-20 bi-directional IIR bandpass
filter with half power frequencies 3 and 40 Hz. We apply the QRS complex detection
algorithm proposed by Elgendi [Elg13] to find all ventricular activations in the signal.
We assume that the $N$ R waves in $\mathbf{E}$ are correctly recovered. The sample index $r_i$ of
the $i$-th R wave is the integer part of $1000 \times t_i$. At this stage, we can delineate the
$i$-th cardiac cycle. Define
\[ \mathbf{x}_i(t + 301) = \mathbf{E}(r_i + t), \qquad -300 \le t \le 700. \tag{6.12} \]
The intuition is that $\mathbf{x}_i$ represents the discretization of $x_i$.
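The delineation in (6.12) amounts to cutting a 1001-sample window around each R wave. A minimal sketch follows, assuming 0-based sample indices (the text indexes from 1) and a signal already sampled at 1000 Hz. The function name `extract_patches` and the decision to skip beats whose window leaves the recording are assumptions of this sketch, not rules stated in the text.

```python
def extract_patches(ecg, r_peaks, before=300, after=700):
    """Cut one patch per R peak, covering samples r - before through r + after.

    `ecg` is a sequence sampled at 1000 Hz and `r_peaks` holds 0-based sample
    indices. Beats whose window would leave the recording are skipped.
    """
    patches, kept = [], []
    for r in r_peaks:
        start, stop = r - before, r + after + 1
        if start < 0 or stop > len(ecg):
            continue
        patches.append(list(ecg[start:stop]))
        kept.append(r)
    return patches, kept


# Toy recording: five seconds of zeros with unit spikes standing in for R waves.
r_peaks = [100, 1000, 1800, 4900]
ecg = [0.0] * 5000
for r in r_peaks:
    ecg[r] = 1.0

patches, kept = extract_patches(ecg, r_peaks)
print(kept, len(patches[0]))
```

In each retained patch, the R wave sits at 0-based position 300, which corresponds to entry 301 in the notation of (6.12).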
Metric Design
We now design the filter h that will remove the atrial activity from the recording
(while distorting the ventricular activity). The filtered signal will allow us to recover
the wave-shape manifold V underlying the set of ventricular contractions. We perform
an adaptation of the filtering performed in Elgendi’s QRS detection algorithm to yield
a signal which contains primarily QRS complex activity. We apply a bidirectional 3rd
order Butterworth bandpass filter to $\mathbf{E}$ with cutoff frequencies 8 and 40 Hz. Write
$\mathbf{s}$ for the filtered signal. We let $\mathbf{q}$ be the result of applying a moving average filter
with window length 100 ms to $\mathbf{s} \odot \mathbf{s}$, where $\odot$ is the Hadamard element-wise product.
The purpose of taking the coordinate-wise square is to emphasize the high-frequency
and high-amplitude content in $\mathbf{s}$. Set $\mathbf{y} = \mathbf{q} \odot \mathbf{s}$ to be the discretization of $h(E)$. We
excise from $\mathbf{y}$ a collection of surrogate patches $\{\mathbf{y}_i\}_{i=1}^{N} \subset \mathbb{R}^{161}$ defined as
\[ \mathbf{y}_i(t + 81) = \mathbf{y}(r_i + t), \qquad -80 \le t \le 80. \tag{6.13} \]
The intuition is that $\mathbf{y}_i$ is a discretization of the “center” of $h(x_i)$. Note that the T
wave has been suppressed by the 3 Hz highpass filter, and the ventricular activity
remaining in $\mathbf{y}_i$ is a deformation of the $i$-th QRS complex.
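The squaring, moving-average, and re-weighting steps described above can be sketched as follows. The Butterworth filtering is assumed to have been applied already, so the input `s` plays the role of the filtered signal. The centered, zero-padded form of the moving average and the helper names are assumptions of this sketch; the text only specifies a 100 ms window.

```python
def moving_average(x, window):
    """Centered moving average; samples beyond the edges count as zero."""
    half = window // 2
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i - half + window)
        out.append((prefix[hi] - prefix[lo]) / window)
    return out


def surrogate_signal(s, fs=1000, window_ms=100):
    """Element-wise product of s with the moving average of its square."""
    window = int(round(window_ms * fs / 1000))
    q = moving_average([v * v for v in s], window)
    return [qi * si for qi, si in zip(q, s)]


def surrogate_patches(y, r_peaks, half_width=80):
    """161-sample patches centered on each 0-based R wave index."""
    return [y[r - half_width: r + half_width + 1] for r in r_peaks
            if r - half_width >= 0 and r + half_width < len(y)]


s = [0.0] * 1000
s[500] = 2.0   # toy high-amplitude "QRS" sample
s[200] = 0.1   # toy low-amplitude activity
y = surrogate_signal(s)
patches = surrogate_patches(y, [500])
print(y[500], y[200], len(patches[0]))
```

In this toy signal the two spikes differ by a factor of 20 before the transformation and by a factor of 8000 afterwards, which is the sense in which high-amplitude content is emphasized.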
Ideally, the surrogate patches $\mathbf{y}_i$ and $\mathbf{y}_j$ are similar if and only if $v_i$ is similar to $v_j$.
However, since the patch $\mathbf{y}_i$ contains a discretized deformation of $v_i$ as well as noise,
we need a metric besides the Euclidean distance in $\mathbb{R}^{161}$ which is robust to that noise.
The metric we will use is the diffusion distance on $\{\mathbf{y}_i\}$. To be specific, we perform
the complete-graph diffusion map as recommended by El Karoui and Wu. Define the
self-tuning affinity matrix $W \in \mathbb{R}^{N \times N}$ as
\[ W_{ij} = \frac{1}{2}\left[ \exp\left( -\frac{\|\mathbf{y}_i - \mathbf{y}_j\|^2}{\sigma_i^2} \right) + \exp\left( -\frac{\|\mathbf{y}_i - \mathbf{y}_j\|^2}{\sigma_j^2} \right) \right], \tag{6.14} \]
where $\sigma_i$ is the median Euclidean distance from $\mathbf{y}_i$ to the set $\{\mathbf{y}_k\}_{k=1}^{N}$. Define the
diagonal degree matrix $D \in \mathbb{R}^{N \times N}$ as
\[ D_{ii} = \sum_{j \ge 1} W_{ij}. \tag{6.15} \]
We carry out the α-normalization step with $\alpha = 1$ to handle the (possibly) non-uniform
density function:
\[ W^{(\alpha)} = D^{-1} W D^{-1}. \tag{6.16} \]
Define the isotropic diffusion kernel $P$ as
\[ P = \left(D^{(\alpha)}\right)^{-1/2} W^{(\alpha)} \left(D^{(\alpha)}\right)^{-1/2}, \tag{6.17} \]
where $D^{(\alpha)}_{ii} = \sum_{j \ge 1} W^{(\alpha)}_{ij}$. Write the eigenvectors of $P$ as $\phi_1, \ldots, \phi_N$, with their
corresponding eigenvalues $\lambda_1 \ge \cdots \ge \lambda_N$. The truncated time-1 diffusion map
restricted to $\{\mathbf{y}_i\}$ is
\[ \Theta_p : \mathbf{y}_i \mapsto (\lambda_2 \varphi_2(i), \ldots, \lambda_{p+1} \varphi_{p+1}(i))^\top \in \mathbb{R}^p, \tag{6.18} \]
where $p = 80$ is chosen by the user, and $\phi_j = \left(D^{(\alpha)}\right) \varphi_j$ for each $1 \le j \le N$. Finally,
the diffusion distance (which approximates the geodesic distance between points on $\mathcal{V}$) is the
pullback of the canonical metric on $\mathbb{R}^p$. This diffusion distance $d_{DD}$ depends solely on
the ventricular activity and is robust to the residual noise in the surrogate patches.
In particular, we have
\[ d_{DD}(\mathbf{y}_i, \mathbf{y}_j) = \|\Theta_p(\mathbf{y}_i) - \Theta_p(\mathbf{y}_j)\|. \tag{6.19} \]
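The construction of the kernel in (6.14) through (6.17) can be sketched with the standard library alone. The function name `diffusion_kernel` and the two-dimensional toy patches are assumptions made for illustration. The eigendecomposition that produces the embedding in (6.18) is deliberately left out, since in practice it would be delegated to a numerical linear algebra routine.

```python
import math
from statistics import median


def diffusion_kernel(patches):
    """Return the symmetric kernel P of (6.17) and the degrees of W^(alpha)."""
    n = len(patches)
    d2 = [[sum((a - b) ** 2 for a, b in zip(patches[i], patches[j]))
           for j in range(n)] for i in range(n)]
    # sigma_i: median Euclidean distance from patch i to the whole set (6.14).
    sigma = [median(math.sqrt(v) for v in row) for row in d2]
    w = [[0.5 * (math.exp(-d2[i][j] / sigma[i] ** 2)
                 + math.exp(-d2[i][j] / sigma[j] ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]                                     # (6.15)
    w_alpha = [[w[i][j] / (deg[i] * deg[j]) for j in range(n)]       # (6.16)
               for i in range(n)]
    deg_alpha = [sum(row) for row in w_alpha]
    p = [[w_alpha[i][j] / math.sqrt(deg_alpha[i] * deg_alpha[j])     # (6.17)
          for j in range(n)] for i in range(n)]
    return p, deg_alpha


# Toy patches forming two well-separated clusters.
pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]
P, deg_alpha = diffusion_kernel(pts)
```

A useful sanity check on any implementation is that $P$ is symmetric and has eigenvalue 1 with eigenvector proportional to the square roots of the degrees of $W^{(\alpha)}$, which is why the embedding starts from the second eigenvector.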
Template Calculation and Removal
With the designed metric $d_{DD}$, we are able to estimate the ventricular activity in
each patch $\mathbf{x}_i$. We apply a technique called the non-local Euclidean median (NLEM)
to achieve this goal. The NLEM is computed using an iteratively reweighted least
squares-type algorithm as described in [CS12]. Choose a number of neighbours $k = 30$.
For each patch $\mathbf{y}_i$, choose the $k$ nearest patches, denoted $\mathbf{y}_{i_1}, \ldots, \mathbf{y}_{i_k}$, using the
metric $d_{DD}$. Note that $\mathbf{y}_{i_1} = \mathbf{y}_i$. Let $\tilde{\mathbf{v}}_i \in \mathbb{R}^{1001}$ be the estimated $i$-th ventricular
activation, given by
\[ \tilde{\mathbf{v}}_i = \operatorname*{argmin}_{z \in \mathbb{R}^{1001}} \sum_{j=1}^{k} \|\mathbf{x}_{i_j} - z\|_2. \tag{6.20} \]
The final step in DD-NLEM is removing the estimated ventricular activity from $\mathbf{E}$.
Initially, set $\mathbf{a} = \mathbf{E}$, and then iteratively substitute
\[ \mathbf{a}(r_i - 300, \ldots, r_i + 700) = \mathbf{x}_i - \zeta \odot \tilde{\mathbf{v}}_i \tag{6.21} \]
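for every $i = 1, \ldots, N$, where the taper function $\zeta \in \mathbb{R}^{1001}$ is, for $c = 100$,
\[ \zeta(i) = \begin{cases} \sin^2\left[\dfrac{\pi(i-1)}{2c}\right] & \text{if } 1 \le i \le c, \\ 1 & \text{if } c < i < 1001 - c, \\ \sin^2\left[\dfrac{\pi(i-1001)}{2c}\right] & \text{if } 1001 - c \le i \le 1001. \end{cases} \tag{6.22} \]
In other words, we obtain a blending of the estimated f-wave with the patch $\mathbf{x}_i$ near
the edges of $\mathbf{x}_i$. Removal of the patches $\tilde{\mathbf{v}}_i$ should be done in chronological order. The
output of our algorithm is $\mathbf{a}$.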
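The taper in (6.22) and the right-hand side of (6.21) can be sketched as follows. Indices are stored 0-based, so entry `i - 1` of the list holds the value written $\zeta(i)$ in the text. The helper names `taper` and `remove_template`, and the constant toy patch, are assumptions of this sketch.

```python
import math


def taper(length=1001, c=100):
    """Taper of (6.22); zeta[i - 1] holds the value written zeta(i) in the text."""
    zeta = []
    for i in range(1, length + 1):
        if i <= c:
            zeta.append(math.sin(math.pi * (i - 1) / (2 * c)) ** 2)
        elif i < length - c:
            zeta.append(1.0)
        else:
            zeta.append(math.sin(math.pi * (i - length) / (2 * c)) ** 2)
    return zeta


def remove_template(patch, template, zeta):
    """Right-hand side of (6.21): the patch minus the tapered template."""
    return [x - z * v for x, z, v in zip(patch, zeta, template)]


zeta = taper()
# Toy case in which the patch equals the estimated template exactly.
patch = [1.0] * 1001
residual = remove_template(patch, patch, zeta)
print(residual[0], residual[500])
```

The taper vanishes at both ends of the patch and equals one in the interior, so the template is fully subtracted around the QRS complex while the patch edges are left untouched.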
6.1.2 A Variation of DD-NLEM
In some cases, we may only have a short ECG recording, and running the DM
algorithm might not be feasible. In this situation, we could set $p = 161$ and $\Theta_p = \mathrm{id}_{\mathbb{R}^{161}}$,
and replace (6.20) by
\[ \tilde{\mathbf{v}}_i = \operatorname*{argmin}_{z \in \mathbb{R}^{1001}} \sum_{j=1}^{k} \|\mathbf{x}_{i_j} - z\|_2. \tag{6.23} \]
Instead of using the embedded patch space, we directly use the (possibly noisy)
patches $\mathbf{y}_i$ to estimate the ventricular activity. We call this variation NLEM.
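The Euclidean median appearing in (6.20) and (6.23) has no closed form, but it can be approximated by repeatedly averaging the points with weights inversely proportional to their current distance from the estimate. The sketch below uses the classical Weiszfeld iteration as a simple stand-in for the iteratively reweighted least squares-type algorithm of [CS12]; it is not claimed to reproduce that algorithm. The function name, the iteration count, and the two-dimensional toy patches are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math


def euclidean_median(points, iterations=100, eps=1e-9):
    """Approximate the minimizer of the summed Euclidean distances to `points`."""
    dim = len(points[0])
    # Start from the ordinary mean.
    z = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iterations):
        # Weight each point by the inverse of its current distance to z;
        # eps guards against division by zero when z hits a data point.
        weights = [1.0 / max(math.dist(p, z), eps) for p in points]
        total = sum(weights)
        z = [sum(w * p[d] for w, p in zip(weights, points)) / total
             for d in range(dim)]
    return z


# Four similar toy "patches" and one morphologically distinct outlier.
patches = [[1.1, 0.0], [0.9, 0.0], [1.0, 0.1], [1.0, -0.1], [10.0, 10.0]]
z = euclidean_median(patches)
mean = [sum(p[d] for p in patches) / len(patches) for d in range(2)]
print(z, mean)
```

The median stays near the cluster of similar patches while the mean is pulled far towards the outlier, which illustrates why a median-type template is less sensitive to an occasional mismatched neighbour.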
6.1.3 Material
To validate the proposed algorithm, we consider simulated signals and a set of real
Holter signals. We prepare simulated signals by adding an atrial wave and noise to a
purely ventricular signal.
Simulated f-wave
Our simulated f-wave is generated using a generalization of the sawtooth model
[SS01, PMSL12]. The sawtooth model is as follows. Define the atrial wave $a : \mathbb{R} \to \mathbb{R}$
to be
\[ a(t) = -\frac{\alpha}{\pi} \sum_{m=1}^{M} (-1)^m \frac{\sin(2\pi m \xi t)}{m}, \qquad t \in \mathbb{R}, \tag{6.24} \]
where $\alpha = 0.25 \pm 0.025$ is the amplitude of the wave, $\xi = 6 \pm 1$ is the frequency, and
$M = 4$ is the number of sinusoidal harmonics in the sawtooth wave. The above model
is discretized at the sampling rate $f_s$ as
\[ a(j) = \frac{\alpha}{\pi} \sum_{m=1}^{M} \frac{(-1)^{m+1}}{m} \sin\left( \frac{2\pi m \xi j}{f_s} \right), \qquad j \in \mathbb{Z}. \tag{6.25} \]
To obtain a more realistic f-wave, we generalize the sawtooth model by taking the
adaptive non-harmonic model [LSW18] into account. In particular, we allow the
amplitude factor and the frequency to vary over time. To model this variation, we
simulate AR(1) processes $\alpha : \mathbb{N} \to \mathbb{R}_+$ and $\xi : \mathbb{N} \to \mathbb{R}_+$ as follows:
\[ \alpha(j+1) = \alpha(j) + (1 - \theta_\alpha)(\mu_\alpha - \alpha(j)) + \varepsilon_\alpha(j), \tag{6.26} \]
\[ \xi(j+1) = \xi(j) + (1 - \theta_\xi)(\mu_\xi - \xi(j)) + \varepsilon_\xi(j), \tag{6.27} \]
where $\theta_\alpha = \theta_\xi = 0.99$ are constants, $\mu_\alpha = 0.1 \pm 0.01$ is the mean amplitude, $\mu_\xi =
\exp(\log(4) \pm 0.2)$ is the mean frequency, $\varepsilon_\alpha : \mathbb{N} \to \mathbb{R}$ is normal with standard deviation
$\sigma_\alpha = 0.001$, and $\varepsilon_\xi : \mathbb{N} \to \mathbb{R}$ is normal with standard deviation $\sigma_\xi = 0.1$. The atrial
wave is then
\[ a(j) = \frac{\alpha(j)}{\pi} \sum_{m=1}^{M} \frac{(-1)^{m+1}}{m} \sin\left( \frac{2\pi m}{f_s} \sum_{i \le j} \xi(i) \right), \qquad j \in \mathbb{N}. \tag{6.28} \]
The noise of the simulated ECG is created by applying a 3-40 Hz bandpass filter to
Gaussian white noise and scaling the standard deviation of the filtered noise to be
$0.05\mu_\alpha$. Finally, we invert the simulated f-wave over the horizontal axis with probability
0.5. The generalization of the model in [SS01, PMSL12] lies in the randomness
of the modulation. Note that in our model, the amplitudes and frequencies are
varied according to a random process, which results in less regularity in the signal.
Furthermore, the oscillations generated in the sawtooth model are not symmetric, so
inverting over the horizontal axis yields a different signal. Amplitude and frequency
modulation is especially relevant when simulating long f-wave signals.
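A minimal sketch of the modulated sawtooth model in (6.26) through (6.28) is given below. It fixes the mean amplitude and mean frequency at 0.1 and 4 rather than drawing them from the stated ranges, starts both AR(1) processes at their means, and omits the additive noise and the random inversion; these simplifications, the function name `simulate_f_wave`, and the seed handling are assumptions of this sketch rather than details of the original simulation code.

```python
import math
import random


def simulate_f_wave(n, fs=1000, M=4, theta=0.99, mu_alpha=0.1, mu_xi=4.0,
                    sd_alpha=0.001, sd_xi=0.1, seed=0):
    """Sawtooth f-wave with AR(1) amplitude and frequency modulation."""
    rng = random.Random(seed)
    alpha, xi = mu_alpha, mu_xi   # start both processes at their means
    phase = 0.0                   # running sum of xi(i) / fs over i <= j
    wave = []
    for _ in range(n):
        phase += xi / fs
        harmonics = sum((-1) ** (m + 1) / m * math.sin(2 * math.pi * m * phase)
                        for m in range(1, M + 1))
        wave.append(alpha / math.pi * harmonics)
        # AR(1) updates of (6.26) and (6.27): relax towards the mean, add noise.
        alpha += (1 - theta) * (mu_alpha - alpha) + rng.gauss(0.0, sd_alpha)
        xi += (1 - theta) * (mu_xi - xi) + rng.gauss(0.0, sd_xi)
    return wave


wave = simulate_f_wave(5000)          # five seconds at 1000 Hz
peak = max(abs(v) for v in wave)
print(peak)
```

Because the phase is accumulated from the instantaneous frequency, the oscillation remains continuous even though the frequency changes at every sample.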
Simulated Ventricular Activity
We obtain ventricular activity simulations using the dynamical model ECGSYN
[MCTS03]. We generate signals lacking P waves, and we vary the amplitudes, positions,
and widths of the Q, R, S, and T waves from signal to signal. The implementation of
ECGSYN that we use performs QT interval stretching according to Bazett’s formula.
In practice, the relationship between RR interval length and QRST complex amplitude
is noisy. The ventricular signals are thus integrated beat-wise, and the amplitude
of each QRST complex is scaled by a factor of 1 ± 0.05. (In some cases, T wave
amplitude is non-linearly related to QRS complex amplitude, but we do not simulate
this relationship.) To simulate small perturbations in the cardiac axis from beat to
beat, we modify the z-positions of the Q, R, and S peaks in each beat by adding 0± 2.
Note that our synthetic ventricular signals do not contain ectopic beats, and that
after adding the f -wave, we apply the pre-processing steps of DD-NLEM. We assemble
the final ECG signals by combining the simulated f -waves with these simulated
ventricular signals.
The parameters used for the simulations in ECGSYN are summarized here. The
measurement noise is set to zero, the mean heart rate is set to 80 ± 8 bpm with
standard deviation 10 bpm, the low frequency-to-high frequency ratio is set to 0.5,
the angle, z-position, and Gaussian width parameters are −70, 0, and 0 for the P
wave; −12± 2, −5± 10, and 0.05± 0.01 for the Q wave; 0, 20± 8, and 0.08± 0.01 for
the R wave; 12± 2, −15± 8, and 0.07± 0.01 for the S wave; and 90± 10, 0.5± 0.2,
and 0.12± 0.02 for the T wave.
Real Holter Signals
To evaluate the potential of applying the proposed algorithm to real data, we collect
continuous, one-hour ECG signals from 47 patients with persistent Af visiting Chang
Gung Memorial Hospital during the year 2011. Patients received 24-hour Holter
recordings in order to screen for cardiac arrhythmia, examine rate control therapy in
atrial fibrillation, or survey symptoms related to rhythm disturbances. This study is
retrospective, and the Chang Gung Medical Foundation Institutional Review Board
approved the study protocol (No. 103-7449B and No. 104-7769C). The data is
collected from a DigiTrak XT (Philips), and we focus on the first channel under the
EASI lead system.
6.1.4 Other Algorithms
Our algorithm seeks to improve upon the performance of several existing algorithms,
including average beat subtraction (ABS) [SBM+85, SSS92, HPIe98, SYCH04], lo-
cal principal component analysis (PCA) [CMR+05], local adaptive singular value
cancellation (aSVC) [AR08] and non-local aSVC [AR08, Section 3.3]. We describe
these algorithms in detail. In each of these algorithms, we perform pre-processing as
described in Section 6.1.1. These algorithms differ only when it comes to ventricular
template selection. The ABS, local PCA, and local aSVC algorithms obtain tempo-
rally local estimates for the ventricular activity in each cardiac cycle. In ABS, the
ventricular template is the average of the nearest k = 30 patches to xi. In local PCA,
for each patch xi, we evaluate the singular value decomposition of the data matrix
$X = U S V^\top$, where
\[ X = \begin{bmatrix} x_{i-14}^\top \\ \vdots \\ x_{i+15}^\top \end{bmatrix} \in \mathbb{R}^{30 \times 1001}. \tag{6.29} \]
We then set the template vi to be
\[ v_i^\top = U(15, 1)\, S(1, 1)\, \mathrm{col}_1(V)^\top. \tag{6.30} \]
In local aSVC, we set the template to be
\[ v_i^\top = RS_i\, U(15, 1)\, \mathrm{col}_1(V)^\top, \tag{6.31} \]
where RSi is a constant that intends to scale the singular vector col1(V) so that its
height matches the height of the patch xi. To this end, we define
\[ RS_i = \frac{x_i(301) - \min_{1 \le j \le 60} x_i(300 + j)}{U(15, 1) V(301, 1) - \min_{1 \le j \le 60} U(15, 1) V(300 + j, 1)}. \tag{6.32} \]
We clip the denominator so that it is at least 0.01 to prevent RSi from exploding.
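As an illustration of (6.29)-(6.32), the following sketch computes the local PCA and local aSVC templates for one beat. It assumes the patches are stored as rows of a NumPy array with 1001 samples each and that 14 ≤ i ≤ n − 16 so the 30-row window exists; the function name `local_templates` and the 0-based index shifts are choices made here, not part of the original implementation.

```python
import numpy as np

def local_templates(patches, i):
    """Sketch of the local PCA and local aSVC templates for patch i (0-based)."""
    # (6.29): stack patches i-14, ..., i+15 as the rows of X (30 x 1001)
    X = patches[i - 14 : i + 16]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u = U[14, 0]   # U(15, 1) in the text's 1-based indexing: the row of patch i
    v1 = Vt[0]     # col_1(V), the top right singular vector
    # (6.30): local PCA template
    v_pca = u * s[0] * v1
    # (6.32): scale so that the template height matches the patch height.
    # Samples 301..360 (1-based) are indices 300..359 here.
    x = patches[i]
    num = x[300] - x[300:360].min()
    den = u * v1[300] - (u * v1[300:360]).min()
    rs = num / max(den, 0.01)  # clip the denominator at 0.01, as in the text
    # (6.31): local aSVC template
    v_asvc = rs * u * v1
    return v_pca, v_asvc
```

The product `u * v1` is invariant to the sign ambiguity of the singular value decomposition, so no sign correction is needed. When all beats in the window are scaled copies of one shape, both templates reproduce the patch itself.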
In non-local PCA, the cosine affinity is used to determine the neighbours used for
template calculation. Let xi1, ...,xi30 be the 30-nearest patches to xi in terms of
the metric
\[ d_{\cos}(x_i, x_j) = \frac{\sum_{|k| \le 80} x_i(300 + k)\, x_j(300 + k)}{\sqrt{\sum_{|k| \le 80} x_i(300 + k)^2}\, \sqrt{\sum_{|k| \le 80} x_j(300 + k)^2}} \tag{6.33} \]
known as the cosine affinity. Then, estimate the ventricular activity in the patch xi
by taking the singular value decomposition of $X = U S V^\top$, where
\[ X = \begin{bmatrix} x_{i_1}^\top \\ \vdots \\ x_{i_{30}}^\top \end{bmatrix} \in \mathbb{R}^{30 \times 1001}, \tag{6.34} \]
and setting
\[ v_i^\top = U(1, 1)\, S(1, 1)\, \mathrm{col}_1(V)^\top. \tag{6.35} \]
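The non-local PCA template in (6.33)-(6.35) can be sketched as follows. Two points are assumptions of this sketch: the patch x_i itself is taken to be the first of its 30 neighbours (so that U(1, 1) refers to its row), and the window |k| ≤ 80 around sample 300 (1-based) is mapped to indices 219..379. The function names are hypothetical.

```python
import numpy as np

def cosine_affinity(patches, i):
    """Cosine affinity (6.33) between patch i and every patch."""
    # Samples 220..380 in 1-based indexing, i.e. 300 + k for |k| <= 80
    W = patches[:, 219:380]
    norms = np.linalg.norm(W, axis=1)  # assumes no patch is identically zero here
    return (W @ W[i]) / (norms * norms[i])

def nonlocal_pca_template(patches, i, k=30):
    """Sketch of the non-local PCA template for patch i (0-based)."""
    aff = cosine_affinity(patches, i)
    aff[i] = np.inf                   # assumption: patch i occupies the first row
    idx = np.argsort(-aff)[:k]        # the k patches with the largest affinity
    # (6.34)-(6.35): rank-one approximation of the neighbour matrix, first row
    U, s, Vt = np.linalg.svd(patches[idx], full_matrices=False)
    return U[0, 0] * s[0] * Vt[0], idx
```

Because the affinity is a similarity rather than a distance, "nearest" is read here as "largest affinity".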
Table 6.1: Comparison of existing single-lead f-wave extraction algorithms from three viewpoints

                  ABS [SBM+85]   local PCA [CMR+05]          non-local PCA                    NLEM               DD-NLEM
                                 and aSVC [AR08]             and aSVC [AR08]
Neighbours        local          local                       non-local                        non-local          non-local
Metric design     time           time                        cosine affinity or correlation   ‖yi − yj‖          diffusion distance dDD
Template design   mean           first principal component   first principal component        Euclidean median   Euclidean median

The algorithms we consider are average beat subtraction (ABS), principal component analysis (PCA), adaptive singular value cancellation (aSVC), and the two algorithms introduced in this
paper, namely NLEM and DD-NLEM.
The approach in non-local aSVC is similar, but the neighbours are calculated using
the correlation coefficient instead of the cosine affinity, and each template vi is scaled
so that its height matches the height of the patch xi. Once the estimated templates
have been obtained, we remove them from the patches xi using (6.21) as described
in the last step of our algorithm. A summary of the algorithms we consider is
provided in Table 6.1.
6.1.5 Evaluation Metrics
We use multiple metrics to quantify the cancellation of the ventricular activity. In the
case of simulated signals, we have an explicit characterization of the atrial activity.
The normalized mean squared error (NMSE) [CRMZ05, Equation (16)] between the
simulated f-wave a ∈ Rn and the extracted f-wave â ∈ Rn is defined as
\[ \mathrm{NMSE} = \frac{\sum_{i=1}^{n} (a(i) - \hat{a}(i))^2}{\sum_{i=1}^{n} a(i)^2}. \tag{6.36} \]
We also consider the cross-correlation ρ between a and â [AR08, Equation (16)]. We
further evaluate the signal-to-noise ratio (SNR) and the peak signal-to-noise ratio
(PSNR) commonly considered in the literature using the formulas
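\[ \mathrm{SNR} = 20 \log_{10} \left[ \frac{\mathrm{std}(a)}{\mathrm{rmse}(a, \hat{a})} \right]; \qquad \mathrm{PSNR} = 20 \log_{10} \left[ \frac{\max_i |a(i)|}{\mathrm{rmse}(a, \hat{a})} \right], \tag{6.37} \]
where rmse is the root mean squared error. Note that if a and â have the same mean,
then rmse(a, â) = std(a − â). When dealing with real ECG signals, where an explicit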
characterization of the f -wave in each recording is not available, we consider the
ventricular residue (VR) index proposed in [AR08, Equation (18)] to evaluate how
well the QRS complex is canceled. (It is not designed to assess T wave cancellation.)
The VR index for the ith ventricular contraction is defined as
\[ \mathrm{VR}_i = \frac{N}{\sum_{l=1}^{N} \hat{a}(l)^2} \sqrt{\sum_{l=t_i-H}^{t_i+H} \hat{a}(l)^2}\; \max_{t_i-H \le l \le t_i+H} |\hat{a}(l)|, \tag{6.38} \]
where H = 50. When the ventricular activity is not accurately removed and the
ℓ∞-norm of the extracted f-wave is high, the VR index is large. While the VR index
provides information about ventricular residua, it has several limitations. The most
problematic one is its tendency to accept “over-smoothing.” If we remove all of the
cardiac activity from the ECG signal, then the VR index is zero. Hence, we still need
an index that can simultaneously measure the cancellation of the ventricular activity
and the preservation of the atrial activity. Inspired by the VR index, we propose the
following modified VR (mVR) index. For the ith ventricular contraction, define
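\[ \mathrm{mVR}_i := \frac{1}{2} \left[ \frac{\mathrm{med}|\hat{a}_i - \mathrm{med}(\hat{a}_i)|}{\mathrm{med}|b_i - \mathrm{med}(b_i)|} + \frac{\mathrm{med}|b_i - \mathrm{med}(b_i)|}{\mathrm{med}|\hat{a}_i - \mathrm{med}(\hat{a}_i)|} \right] \times \frac{1}{2} \left[ \frac{\max|\hat{a}_i - \mathrm{med}(\hat{a}_i)|}{q_{95}|b_i - \mathrm{med}(b_i)|} + \frac{q_{95}|b_i - \mathrm{med}(b_i)|}{\max|\hat{a}_i - \mathrm{med}(\hat{a}_i)|} \right], \tag{6.39} \]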
where med is the median operator, q95 w is the 95th percentile of the vector w,
\[ \hat{a}_i(t + 300) = \hat{a}(r_i + t), \qquad -300 \le t \le 300, \tag{6.40} \]
and bi is obtained by concatenating segments of (pre-processed) E that are within
10 seconds of ti but not within 300 ms of any R wave. Note that the first term
measures how well the “spectrum” of the f -wave is preserved. This term is introduced
to prevent over-smoothing. The second term measures the presence of ventricular
residua. This index captures simultaneously how well the ventricular activity is
removed and how well the atrial activity is preserved. To provide a summary of all
mVRi values obtained on one recording, we define
\[ \mathrm{mVR} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{mVR}_i. \tag{6.41} \]
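The indices in (6.36), (6.37), and (6.39) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, `a` denotes the simulated f-wave, `a_hat` the extracted one, and `a_i` and `b_i` are assumed to have been cut out beforehand as described in the text.

```python
import numpy as np

def nmse(a, a_hat):
    """Normalized mean squared error (6.36)."""
    return np.sum((a - a_hat) ** 2) / np.sum(a ** 2)

def rmse(a, a_hat):
    """Root mean squared error used in (6.37)."""
    return np.sqrt(np.mean((a - a_hat) ** 2))

def snr_db(a, a_hat):
    """Signal-to-noise ratio (6.37), in decibels."""
    return 20 * np.log10(np.std(a) / rmse(a, a_hat))

def psnr_db(a, a_hat):
    """Peak signal-to-noise ratio (6.37), in decibels."""
    return 20 * np.log10(np.max(np.abs(a)) / rmse(a, a_hat))

def mvr_i(a_i, b_i):
    """Modified ventricular residue index (6.39) for one beat."""
    dev_a = np.abs(a_i - np.median(a_i))
    dev_b = np.abs(b_i - np.median(b_i))
    # First bracket: ratio of median absolute deviations (guards against over-smoothing)
    r1 = np.median(dev_a) / np.median(dev_b)
    # Second bracket: peak deviation near the R peak against the 95th percentile away from it
    r2 = np.max(dev_a) / np.percentile(dev_b, 95)
    return 0.5 * (r1 + 1 / r1) * 0.5 * (r2 + 1 / r2)
```

Each bracket in (6.39) has the form (r + 1/r)/2, which is at least 1 with equality only when r = 1, so the index is bounded below by 1 and penalizes both ventricular residua (ratios above 1) and over-smoothing (ratios below 1).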
Finally, we also consider the spectral concentration (SC) index proposed in [CRMZ05,
Equation (16)]. The one-sided power spectral density P ∈ R4001 of â is calculated using
Welch’s method, featuring 8000 Fourier modes, Hamming windows of 400 samples,
and 50% overlap. The SC index for a signal sampled at 1000 Hz is defined as
\[ \mathrm{SC} = \frac{\sum_{i=24}^{96} P(i)}{\sum_{i=1}^{4001} P(i)} \tag{6.42} \]
and is the proportion of energy in the frequency range 3-12 Hz. This band is chosen
to be sufficiently wide to cover the f-wave spectrum.
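A rough sketch of the SC index in (6.42) is given below, with Welch's method written out in NumPy so that the stated settings (400-sample Hamming windows, 50% overlap, 8000-point transform) are explicit. The function names are hypothetical, the overall normalization constant of the spectral density is omitted because it cancels in the ratio, and the input is assumed to contain at least 400 samples at 1000 Hz.

```python
import numpy as np

def welch_psd(x, nperseg=400, nfft=8000):
    """Unnormalized one-sided Welch spectrum with Hamming windows and 50% overlap."""
    step = nperseg // 2
    w = np.hamming(nperseg)
    starts = range(0, len(x) - nperseg + 1, step)
    P = np.zeros(nfft // 2 + 1)
    for s in starts:
        # Zero-padded periodogram of one windowed segment
        P += np.abs(np.fft.rfft(w * x[s:s + nperseg], n=nfft)) ** 2
    P /= len(starts)
    P[1:-1] *= 2  # fold negative frequencies into the one-sided spectrum
    return P

def spectral_concentration(x):
    """SC index (6.42): share of spectral energy in bins 24..96 (1-based)."""
    P = welch_psd(x)
    return P[23:96].sum() / P.sum()
```

With an 8000-point transform at 1000 Hz the bins are 0.125 Hz apart, so bins 24 to 96 span roughly 2.9 Hz to 11.9 Hz. A pure 6 Hz tone therefore yields an SC close to 1, while a 50 Hz tone yields an SC close to 0.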
6.2 Results
We apply the algorithm to a variety of real and simulated ECG signals, compare our
algorithm to the previous algorithms described in Section 6.1.4, and use the evaluation
metrics in Section 6.1.5 to measure performance. We visually and quantitatively
justify our use of the diffusion distance. All of our experiments are conducted in
Mathworks MATLAB R2019a using a 4th Generation Intel Core i7 processor and 24
GB of RAM.
6.2.1 Comparison with Previous Algorithms
We validate DD-NLEM and compare its performance with that of ABS [SBM+85,
SSS92, HPIe98, SYCH04], local PCA [CMR+05], non-local PCA, local aSVC [AR08]
and non-local aSVC [AR08]. We also show the performance of NLEM. We carry out
this comparison in two settings. In the first setting, we evaluate the performance
of our algorithm using synthesized one-hour signals generated by our generalized
sawtooth model. We generate a total of 26 simulated signals and apply the
above-mentioned algorithms to each signal. Instead of running the suggested R peak
detection algorithm, we use the R peak locations provided by the simulation software
ECGSYN. A summary of results for the first setting is shown in Table 6.2. In this
setting, we see that both NLEM and DD-NLEM outperform other methods in terms of
all considered evaluation indices except computational time, and DD-NLEM performs
the best. Due to the nearest neighbour search, the non-local algorithms take longer
than their local versions. On the other hand, NLEM and DD-NLEM are slightly more
efficient than non-local PCA and non-local aSVC since we evaluate the median (a
first order statistic) instead of the singular value decomposition. In Figure 6.2, we
show how the various algorithms perform on one of the simulated signals. In the
second setting, we apply each of the seven algorithms to 47 real Holter signals and
display the corresponding mVR and SC indices in Table 6.3. The proposed DD-NLEM
performs the best out of the seven algorithms. In Figures 6.3 and 6.4, we show how the
various algorithms perform on one of the real Holter signals.
Overall, we can see that the ABS, local PCA, and local aSVC algorithms tend to
provide accurate reconstructions of the f -wave in regions which contain no ventricular
activity, but they present larger QRS complex residua than their non-local counterparts.
On the other hand, while non-local PCA and non-local aSVC remove QRS complexes
quite well, they tend to provide an “over-smoothed” f -wave estimation. One reason
for this over-smoothing is that the f -wave is not suppressed before evaluating the
cosine or correlation affinity between cardiac cycles. When the signal is short, these
approaches have an opportunity to perform well. However, when the signal is long
(which is the situation we encounter in Holter monitoring), the probability of finding
two patches with similar ventricular activations and similar f -waves is high. As a
result, the top singular vector is contaminated by the f -wave and most often is not a
good estimate for the ventricular activity. Another problem encountered by the PCA
and aSVC approaches is the large p and large n issue, which in the past decade has
been widely discussed [Joh07]. Under our assumed model, the f -wave behaves like
high dimensional “random noise,” while the ventricular activity can be viewed as the
signal we want to recover. Taking the singular value decomposition in the large p and
large n setting leads to the estimated ventricular activity being biased; in this case, a
correction is needed [SW13, DGJ18]. The problems encountered by these algorithms
are avoided by the proposed NLEM and DD-NLEM algorithms; the f -wave is attenuated
before attempting to find matching ventricular contractions, the diffusion distance
is applied to reduce the influence of noise, and the unbiased Euclidean median is
considered when computing the ventricular template.
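To illustrate why a Euclidean median is less sensitive to mismatched neighbours than a mean, the sketch below computes the Euclidean (geometric) median of a set of patches. The text above does not say how the median is computed; Weiszfeld's fixed-point iteration is used here purely as an assumed, standard choice, and the function name and tolerances are hypothetical.

```python
import numpy as np

def euclidean_median(points, n_iter=100, tol=1e-8):
    """Weiszfeld iteration for the point minimizing the sum of Euclidean distances."""
    y = points.mean(axis=0)  # start from the ordinary mean
    for _ in range(n_iter):
        d = np.linalg.norm(points - y, axis=1)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        w = 1.0 / d
        # Re-weighted average: far-away points receive small weights
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```

In a set of 30 neighbouring patches where one is grossly different (for example, an ectopic beat matched by mistake), the mean is shifted by 1/30 of the discrepancy, whereas the Euclidean median stays close to the majority shape.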
6.2.2 Validation of the Proposed mVR Index
To validate the meaningfulness of the proposed mVR index in real ECG signals, we
examine carefully the relationship between the mVR index and the NMSE in our set
of simulated signals. In Figure 6.5, we plot mVR values against the corresponding
NMSE values; we aggregate results from all seven algorithms and all 26 simulated
signals. It is clear that when the NMSE is larger than 0.3, the mVR index is positively
correlated with the NMSE. This result indicates that, while mVR might be insensitive
to small errors, the mVR index can detect large errors in f -wave recovery. In addition
Figure 6.2: After running each of the seven algorithms on a simulated, one-hour ECG signal featuring Af, we plot the extracted f-waves in fuchsia. The true f-wave (built using the generalized sawtooth model) is shown in gray.
Figure 6.3: After running each of the five previous algorithms on a real, one-hour Holter recording featuring Af, we plot the extracted f-waves in teal.
Table 6.2: Comparison of f-wave extraction algorithms on simulated one-hour ECG signals

Method           NMSE          ρ             SNR (dB)      PSNR (dB)
ABS              0.79 ± 0.44   0.75 ± 0.09   1.65 ± 2.48   10.22 ± 2.42
PCA              0.49 ± 0.29   0.81 ± 0.07   3.71 ± 2.28   12.28 ± 2.23
aSVC             0.53 ± 0.34   0.81 ± 0.08   3.42 ± 2.49   11.99 ± 2.44
non-local PCA    0.61 ± 0.09   0.63 ± 0.06   2.21 ± 0.60   10.78 ± 0.70
non-local aSVC   0.47 ± 0.06   0.72 ± 0.04   3.27 ± 0.52   11.84 ± 0.64
NLEM             0.24 ± 0.08   0.87 ± 0.04   6.34 ± 1.45   14.90 ± 1.50
DD-NLEM          0.16 ± 0.04   0.91 ± 0.02   8.01 ± 1.23   16.58 ± 1.26

Method           mVR           SC            Time (sec)
ABS              1.39 ± 0.21   0.67 ± 0.05   0.24 ± 0.03
PCA              1.27 ± 0.14   0.59 ± 0.07   12.28 ± 2.78
aSVC             1.29 ± 0.16   0.62 ± 0.06   12.21 ± 2.70
non-local PCA    1.73 ± 0.08   0.64 ± 0.04   13.86 ± 2.97
non-local aSVC   1.68 ± 0.07   0.66 ± 0.05   13.89 ± 2.96
NLEM             1.24 ± 0.11   0.69 ± 0.06   7.48 ± 1.53
DD-NLEM          1.12 ± 0.05   0.70 ± 0.07   11.00 ± 2.39

We provide a summary of the performance of the seven algorithms when extracting f-waves from simulated Af signals. The results over all simulations are summarized as mean ± the standard
deviation. For each evaluation metric, the best method is marked in bold. The abbreviations are NMSE (normalized mean squared error), ρ (cross-correlation), SNR (signal-to-noise ratio), PSNR
(peak signal-to-noise ratio), mVR (mean modified ventricular residue index), and SC (spectral concentration).
Figure 6.4: After running DD-NLEM and its variation NLEM on a real, one-hour Holter recording featuring Af, we plot the extracted f-waves in teal.
Table 6.3: Comparison of f-wave extraction algorithms on real one-hour Holter signals

Method           mVR           SC            Time (sec)
ABS              1.81 ± 1.37   0.69 ± 0.11   0.26 ± 0.06
local PCA        1.72 ± 1.14   0.66 ± 0.10   13.45 ± 3.64
local aSVC       1.98 ± 1.37   0.69 ± 0.10   13.48 ± 3.83
non-local PCA    1.86 ± 1.32   0.65 ± 0.09   15.88 ± 4.69
non-local aSVC   1.93 ± 1.38   0.67 ± 0.09   15.74 ± 4.71
NLEM             1.65 ± 1.20   0.71 ± 0.09   8.54 ± 2.74
DD-NLEM          1.59 ± 1.09   0.72 ± 0.09   13.35 ± 4.99

We provide a summary of the performance of the seven algorithms when extracting f-waves from real one-hour Holter signals featuring Af. The results over all subjects are summarized as mean ±
the standard deviation. For each evaluation metric, the best method is marked in bold. The abbreviations are mVR (mean modified ventricular residue index) and SC (spectral concentration).
to the philosophy behind the design of the mVR index discussed above, these results
support the usefulness of the proposed mVR index when evaluating the performance
of an f -wave extraction algorithm on real data.
Figure 6.5: A scatter plot of the mVR (mean modified ventricular residue) index versus the normalized mean squared error (NMSE). We include the results from all seven algorithms evaluated on all of the simulated signals generated by the generalized sawtooth model. We see that when the mVR index is low, so is the NMSE.
6.3 Discussion and Conclusion
Designing a correct metric is at the core of data analysis. In this work, our metric
design process involves two steps. First, inspired by Elgendi’s QRS detection algorithm
[Elg13] and based on a priori physiological knowledge about the f -wave, we design a
series of operations which succeed in suppressing the f -wave. However, the filter is not
perfect, and since the residual f -wave behaves like noise, we apply the diffusion maps
algorithm to stabilize the manifold parametrizing ventricular activations. The diffusion
maps algorithm and the diffusion distance have been theoretically studied and shown
to be robust to errors in affinity estimation. The proposed DD-NLEM algorithm has two
novelties when compared with traditional single-lead f -wave extraction algorithms.
First, the non-local nature of the algorithm allows us to leverage information from
the entire signal to robustly and accurately estimate a template for the ventricular
activity in each beat; second, our filtering process and our use of the diffusion distance
prevent us from over-smoothing the extracted f-wave.
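For readers unfamiliar with the diffusion distance, the following sketch shows one common way to approximate it from a point cloud, following the general construction of diffusion maps [CL06]. It is not the implementation used in this work: the Gaussian kernel, the bandwidth `eps`, the absence of any density normalization, and the function name `diffusion_map` are all assumptions made for illustration.

```python
import numpy as np

def diffusion_map(Y, eps, n_coords=10, t=1):
    """Embed the rows of Y so that Euclidean distances approximate diffusion distances."""
    # Pairwise squared Euclidean distances and Gaussian affinities
    sq = np.sum(Y ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * Y @ Y.T, 0.0)
    W = np.exp(-D2 / eps)
    d = W.sum(axis=1)
    # Symmetric matrix with the same spectrum as the random walk D^{-1} W
    S = W / np.sqrt(d[:, None] * d[None, :])
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Right eigenvectors of D^{-1} W
    phi = evecs / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector and weight by eigenvalue^t
    emb = (evals[1:n_coords + 1] ** t) * phi[:, 1:n_coords + 1]
    return emb, evals
```

The diffusion distance between points i and j is then approximated (up to a constant factor and the truncation to `n_coords` coordinates) by `np.linalg.norm(emb[i] - emb[j])`. Because the small eigenvalues that mostly carry noise are down-weighted or discarded, this distance is less sensitive to perturbations of individual affinities than the raw Euclidean distance.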
There are several issues that we should address. The most important one is the
issue of outliers. Specifically, when there exist beats that are morphologically far away
from all other beats, the algorithm may not perform well. The commonly-encountered
ectopic beat is an example. The existence of ectopic beats poses a unique problem in a
number of ways. They vary significantly in morphology when compared to the signal’s
normal beats. They can also vary significantly among themselves, and the number
of ectopic beats in a 1-hour recording may be as low as 1. As such, local ABS-type
algorithms are severely limited, and a non-local algorithm is needed. In this work,
we assume that if a recording has an ectopic beat, there are enough morphologically
similar beats in the recording to guarantee that we can effectively remove it using the
proposed algorithm. While we see reasonable performance in the database of 47 real
Holter signals, there is still room for improvement. For example, we could pool beats
from different subjects to more accurately match the morphology of any ectopic beat.
Nevertheless, using longer signals would likely alleviate this issue, as the probability of
finding morphologically similar beats later in the recording becomes higher. Another
issue is the inevitable noise. Note that removing the ventricular activity results
in a signal which consists of the atrial activity and noise; post-processing may be
required. We do not attempt to solve this problem in this work because of limited
knowledge concerning f -wave regularity and its clinical significance. Nevertheless,
there are several post-processing approaches which may be useful, such as adaptive
recurrent filtering algorithms [SK85]. The final issue is that the T wave is not handled
separately in this work. First, we assume that the T wave has been suppressed by
the highpass filter at 3 Hz (in the pre-processing step). Nevertheless, there may be
residual information from the T wave (particularly when dealing with other lead
types). In this case, we assume that the T wave and the QRS complex have related
morphologies. However, some works in the field show the benefit of dealing with the
T wave separately [LJF+05], and we may consider a separate T wave cancellation
procedure in future work.
We mention several future directions. First, in this work, we focus on patients
with persistent Af. The analysis of patients with paroxysmal Af is also important
and will be explored in upcoming research. For example, how can one detect the
spontaneous initiation and termination of Af? Second, while we model the manifold
underlying the patch space xi by a fiber bundle structure, the topological structure
of the fiber bundle is not fully utilized. Moreover, we do not take the manifold
structure into account when performing the Euclidean median; a geometric median
could be considered. Third, having designed an appropriate algorithm for extracting
the f-wave signal, we need an informative feature which derives from the f-wave
useful electrophysiological properties of the atrium [LZCS14, BZK+14]; such a feature
could reflect the patient’s disease status and be relevant for the purposes of diagnosis,
treatment, and prognosis. Several indices have been proposed to quantify the intrinsic
features underlying the f-wave [LZCS14, BZK+14]. While these indices are informative
for different clinical applications [LZS+16], the dynamics of the f-wave remain
unstudied. We will apply other non-linear algorithms to extract these dynamical
features and study clinical problems. Fourth, although we confirm the performance
of our algorithm using simulated data, we have not provided a full proof of efficacy.
Since it is now relatively easy to obtain intracardiac signals, we plan to study the
relationship between the f -wave observed in the surface ECG and the corresponding
intracardiac signal; an understanding of this relationship would allow us to better
understand the f -wave and hence design a better algorithm to decouple the f -wave
from the ventricular activity in a single-lead recording.
Bibliography
[AAe15] M. Abadi, A. Agarwal, et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[AHP+01] G. Alan, E. Hylek, K. Phillips, Y. Chang, L. Henault, J. Selby, and D. Singer. Prevalence of diagnosed atrial fibrillation in adults: national implications for rhythm management and stroke prevention: the anticoagulation and risk factors in atrial fibrillation (ATRIA) study. JAMA, 2370-2375:157–168, 2001.
[AJY12] Mourad Adnane, Zhongwei Jiang, and Zhonghong Yan. Sleep–wake stages classification and sleep efficiency estimation using single-lead electrocardiogram. Expert Systems with Applications, 39(1):1401–1413, 2012.
[Aki02] M. Akin. Comparison of wavelet transform and FFT methods in the analysis of EEG signals. Journal of medical systems, 26(3):241–247, 2002.
[AL12] Majd AlGhatrif and Joseph Lindsay. A brief review: history to understand fundamentals of electrocardiography. Journal of community hospital internal medicine perspectives, 2(1):14383, 2012.
[ALG+13] A. Arza, J. Lazaro, E. Gil, P. Laguna, J. Aguilo, and R. Bailon. Pulse transit time and pulse width as potential measure for estimating beat-to-beat systolic and diastolic blood pressure. In Computing in Cardiology Conference, pages 887–890. IEEE, 2013.
[AM14] J. Anden and S. Mallat. Deep scattering spectrum. IEEE Transactions on Signal Processing, 62(16):4114–4128, 2014.
[AMT+15] Md Aktaruzzaman, Matteo Migliorini, Mirja Tenhunen, Sari-Leena Himanen, Anna M Bianchi, and Roberto Sassi. The addition of entropy-based regularity parameters improves sleep stage classification based on heart rate variability. Medical & biological engineering & computing, 53(5):415–425, 2015.
[AOH+17] U Rajendra Acharya, Shu Lih Oh, Yuki Hagiwara, Jen Hong Tan, Muhammad Adam, Arkadiusz Gertych, and Ru San Tan. A deep convolutional neural network model to classify heartbeats. Computers in biology and medicine, 89:389–396, 2017.
[AR07] R. Alcaraz and J.J. Rieta. Adaptive singular value QRST cancellation for the analysis of short single lead atrial fibrillation electrocardiograms. Computers in Cardiology, pages 513–516, 2007.
[AR08] R. Alcaraz and J. J. Rieta. Adaptive singular value cancelation of ventricular activity in single-lead atrial fibrillation electrocardiograms. Physiological measurement, 29(12):1351–1369, 2008.
[ASFW18] Sankaraleengam Alagapan, Hae Won Shin, Flavio Frohlich, and Hau-tieng Wu. Diffusion geometry approach to efficiently remove electrical stimulation artifacts in intracranial electroencephalography. Journal of neural engineering, 2018.
[ASO19] Miquel Alfaras, Miguel Cornelles Soriano, and Silvia Ortín. A fast machine learning model for ecg-based heartbeat classification and arrhythmia detection. Frontiers in Physics, 7:103, 2019.
[ATN+09] Saif Ahmad, Anjali Tejuja, Kimberley D Newman, Ryan Zarychanski, and Andrew JE Seely. Clinical review: a review and analysis of heart rate variability and the diagnosis and prognosis of infection. Critical Care, 13(6):232, 2009.
[AVBB+09] Alberto P Avolio, Luc M Van Bortel, Pierre Boutouyrie, John R Cockcroft, Carmel M McEniery, Athanase D Protogerou, Mary J Roman, Michel E Safar, Patrick Segers, and Harold Smulyan. Role of pulse pressure amplification in arterial hypertension: experts opinion and review of the data. Hypertension, 54(2):375–383, 2009.
[BA97] MH Bonnet and DL Arand. Heart rate variability: sleep stage, time of night, and arousal influences. Electroencephalography and clinical neurophysiology, 102(5):390–396, 1997.
[Bat14] Jonathan Bates. The embedding dimension of laplacian eigenfunction maps. Applied and Computational Harmonic Analysis, 37(3):516–530, 2014.
[BBe12] R. B. Berry, D. G. Budhiraja, et al. Rules for scoring respiratory events in sleep: update of the 2007 AASM manual for the scoring of sleep and associated events. J Clin Sleep Med, 8(5):597–619, 2012.
[BBG94] Pierre Berard, Gerard Besson, and Sylvain Gallot. Embedding riemannian manifolds by their heat kernel. Geometric & Functional Analysis GAFA, 4(4):373–398, 1994.
[BCB12] Stefano Bordignon, Maria Chiara Corti, and Claudio Bilato. Atrial fibrillation associated with heart failure, stroke and mortality. Journal of atrial fibrillation, 5(1), 2012.
[BCM05] Antoni Buades, Bartomeu Coll, and J-M Morel. A non-local algorithm for image denoising. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 60–65. IEEE, 2005.
[BD16] Peter J Brockwell and Richard A Davis. Introduction to time series and forecasting. Springer, 2016.
[Ber06] Pierre H Berard. Spectral geometry: direct and inverse problems, volume 1207. Springer, 2006.
[BGGM80] A. Buzo, A. Gray, R. Gray, and J. Markel. Speech coding based upon vector quantization. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(5):562–574, 1980.
[BHM+06] A. Bollmann, D. Husser, L. Mainardi, F. Lombardi, P. Langley, A. Murray, J. J. Rieta, J. Millet, S. B. Olsson, M. Stridh, and L. Sornmo. Analysis of surface electrocardiograms in atrial fibrillation: techniques, research, and clinical applications. Europace: European pacing, arrhythmias, and cardiac electrophysiology, 8(11):911–26, 2006.
[Bil11] George E Billman. Heart rate variability–a historical perspective. Frontiers in physiology, 2:86, 2011.
[BKH+14] Paddy M Barrett, Ravi Komatireddy, Sharon Haaser, Sarah Topol, Judith Sheard, Jackie Encinas, Angela J Fought, and Eric J Topol. Comparison of 24-hour holter monitoring with 14-day novel adhesive patch electrocardiographic monitoring. The American journal of medicine, 127(1):95–e11, 2014.
[BKR08] Peter Bickel, Bas Kleijn, and John Rice. Event-weighted tests for detecting periodicity in photon arrival times. The Astrophysical Journal, 685(1):384, 2008.
[BLS11] Andrea Bravi, Andre Longtin, and Andrew JE Seely. Review and classification of variability analysis techniques with clinical applications. Biomedical engineering online, 10(1):90, 2011.
[BM13] Joan Bruna and Stephane Mallat. Invariant scattering convolution networks. IEEE transactions on pattern analysis and machine intelligence, 35(8):1872–1886, 2013.
[BMAB05] Riccardo Barbieri, Eric C Matten, AbdulRasheed A Alabi, and Emery N Brown. A point-process model of human heartbeat intervals: new definitions of heart rate and heart rate variability. American Journal of Physiology-Heart and Circulatory Physiology, 288(1):H424–H435, 2005.
[BMB+15] J. Bruna, S. Mallat, E. Bacry, J. Muzy, et al. Intermittent process analysis with scattering moments. The Annals of Statistics, 43(1):323–351, 2015.
[BML+85] AA Borbely, P Mattmann, M Loepfe, I Strauch, and D Lehmann. Effect of benzodiazepine hypnotics on all-night sleep eeg spectra. Human neurobiology, 4(3):189–194, 1985.
[BN03] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6):1373–1396, 2003.
[BN07] Mikhail Belkin and Partha Niyogi. Convergence of laplacian eigenmaps. In Advances in Neural Information Processing Systems, pages 129–136, 2007.
[BN08] Mikhail Belkin and Partha Niyogi. Towards a theoretical foundation for laplacian-based manifold methods. Journal of Computer and System Sciences, 74(8):1289–1308, 2008.
[BOLC13] Joachim Behar, Julien Oster, Qiao Li, and Gari D Clifford. Ecg signal quality during arrhythmia and its application to false alarm reduction. IEEE transactions on biomedical engineering, 60(6):1660–1666, 2013.
[BP11] Julius S Bendat and Allan G Piersol. Random data: analysis and measurement procedures, volume 729. John Wiley & Sons, 2011.
[BPST98] Mirella Boselli, Liborio Parrino, Arianna Smerieri, and Mario Giovanni Terzano. Effect of age on eeg arousals in normal sleep. Sleep, 21(4):361–367, 1998.
[BS17] Alexander Berrian and Naoki Saito. Adaptive synchrosqueezing based on a quilted short-time fourier transform. In Wavelets and Sparsity XVII, volume 10394, page 1039420. International Society for Optics and Photonics, 2017.
[BWB+11] Martin C Baruch, Darren ER Warburton, Shannon SD Bredin, Anita Cote, David W Gerdt, and Charles M Adkins. Pulse decomposition analysis of the digital arterial pulse during hemorrhage simulation. Nonlinear biomedical physics, 5(1):1, 2011.
[BZK+14] P. Bonizzi, S. Zeemering, J. M. H. Karel, L. Y. Di Marco, L. Uldry, J. Van Zaen, J. M. Vesin, and U. Schotten. Systematic comparison of non-invasive measures for the assessment of atrial fibrillation complexity: A step forward towards standardization of atrial fibrillation electrogram analysis. Europace, 17(2):318–325, 2014.
[CA+06] Harvey R Colten, Bruce M Altevogt, et al. Functional and economic impact of sleep loss and sleep-related disorders. In Sleep disorders and sleep deprivation: An unmet public health problem. National Academies Press (US), 2006.
[CAM+06] Gari D Clifford, Francisco Azuaje, Patrick McSharry, et al. Advanced methods and tools for ECG data analysis. Artech house Boston, 2006.
[CBLR12] GD Clifford, J Behar, Q Li, and Iead Rezek. Signal quality indices and data fusion for determining clinical acceptability of electrocardiograms. Physiological measurement, 33(9):1419, 2012.
[CBWG12] E. J. Ciaccio, A. B. Biviano, W. Whang, and H. Garan. A new LMS algorithm for analysis of atrial fibrillation signals. Biomedical engineering online, 11(1):15, 2012.
[CCW14] Yu-Chun Chen, Ming-Yen Cheng, and Hau-Tieng Wu. Non-parametric and adaptive modelling of dynamic periodicity and trend with heteroscedastic and dependent errors. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(3):651–682, 2014.
[CD14] Florian Chouchou and Martin Desseilles. Heart rate variability: a tool to explore the sleeping brain? Frontiers in neuroscience, 8:402, 2014.
[CdlHMFAL06] Pablo Casaseca-de-la Higuera, Marcos Martín-Fernandez, and Carlos Alberola-Lopez. Weaning from mechanical ventilation: a retrospective analysis leading to a multimodal perspective. IEEE transactions on biomedical engineering, 53(7):1330–1345, 2006.
[CDT+10] Ferran Roche Campo, Xavier Drouot, Arnaud W Thille, Fabrice Galia, Belen Cabello, Marie-Pia d’Ortho, and Laurent Brochard. Poor sleep quality is associated with late noninvasive ventilation failure in patients with acute hypercapnic respiratory failure. Critical care medicine, 38(2):477–485, 2010.
[CG97] Fan RK Chung and Fan Chung Graham. Spectral graph theory. Number 92. American Mathematical Soc., 1997.
[CL06] Ronald R Coifman and Stephane Lafon. Diffusion maps. Applied and computational harmonic analysis, 21(1):5–30, 2006.
[CL19] Wojciech Czaja and Weilin Li. Analysis of time-frequency scat-tering transforms. Applied and Computational Harmonic Analysis,47(1):149–171, 2019.
[CLW16a] C. K. Chui, Y.-T. Lin, and H.-T. Wu. Real-time dynamics acquisitionfrom irregular samples – with application to anesthesia evaluation.Analysis and Applications, 14(4):537–590, 2016.
[CLW16b] Charles K Chui, Yu-Ting Lin, and Hau-Tieng Wu. Real-time dynam-ics acquisition from irregular sampleswith application to anesthesiaevaluation. Analysis and Applications, 14(04):537–590, 2016.
[CMR+05] F. Castells, C. Mora, J. J. Rieta, D. Moratal-Perez, and J. Millet. Estimation of atrial fibrillatory wave from single-lead atrial fibrillation electrocardiograms using principal component analysis concepts. Medical and Biological Engineering and Computing, 43(5):557–560, 2005.
[CNF+97] Chen-Huan Chen, Erez Nevo, Barry Fetics, Peter H Pak, Frank CP Yin, W Lowell Maughan, and David A Kass. Estimation of central aortic pressure waveform by mathematical transformation of radial tonometry pressure: validation of generalized transfer function. Circulation, 95(7):1827–1836, 1997.
[Coh95] L. Cohen. Uncertainty principles of the short-time fourier transform. In Advanced Signal Processing Algorithms, volume 2563, pages 80–91. International Society for Optics and Photonics, 1995.
[Coh14] M. X. Cohen. Analyzing neural time series data: theory and practice. MIT Press, 2014.
[Con90] John B Conway. A course in functional analysis. Springer, 1990.
[CRMZ05] F. Castells, J. Rieta, J. Millet, and V. Zarzoso. Spatio-temporal blind source separation approach to atrial activity estimation in atrial tachyarrhythmias. IEEE Transactions on Biomedical Engineering, 52:258–267, 2005.
[CRV+06] S. Cesar, J. J. Rieta, C. Vaya, D. M. Perez, R. Zangroniz, and J. Millet. Wavelet Denoising as Preprocessing Stage to Improve ICA performance in Atrial Fibrillation Analysis. In Justinian Rosca, Deniz Erdogmus, Jose C. Príncipe, and Simon Haykin, editors, LNCS 3889, volume 3889 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2006.
[CS12] K. N. Chaudhury and A. Singer. Non-local euclidean medians. IEEE Signal Processing Letters, 19(11):745–748, 2012.
[CTL+95] Chen-Huan Chen, Chih-Tai Ting, Shing-Jong Lin, Tsui-Lieh Hsu, Frank CP Yin, Cynthia O Siu, Pesus Chou, Shih-Pu Wang, and Mau-Song Chang. Different effects of fosinopril and atenolol on wave reflections in hypertensive patients. Hypertension, 25(5):1034–1041, 1995.
[CTW91] T. Chou, Y. Tamura, and I. Wong. Detection of atrial fibrillation in ECGs. In Computers in Cardiology, 1991.
[CW17] Antonio Cicone and Hau-Tieng Wu. How nonlinear-type time-frequency analysis can help in sensing instantaneous heart rate and instantaneous respiratory rate from photoplethysmography in a reliable way. Frontiers in physiology, 8:701, 2017.
[CZC15] Ying Chen, Xin Zhu, and Wenxi Chen. Automatic sleep staging based on ecg signals using hidden markov models. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 530–533. IEEE, 2015.
[D+97] Rainer Dahlhaus et al. Fitting time series models to nonstationary processes. The annals of Statistics, 25(1):1–37, 1997.
[Dau92] I. Daubechies. Ten lectures on wavelets. SIAM, 1992.
[DCOR04] Philip De Chazal, Maria O’Dwyer, and Richard B Reilly. Automatic classification of heartbeats using ecg morphology and heartbeat interval features. IEEE transactions on biomedical engineering, 51(7):1196–1206, 2004.
[DDV+15] Jonathan W Dukes, Thomas A Dewland, Eric Vittinghoff, Mala C Mandyam, Susan R Heckbert, David S Siscovick, Phyllis K Stein, Bruce M Psaty, Nona Sotoodehnia, John S Gottdiener, et al. Ventricular ectopy as a predictor of heart failure and death. Journal of the American College of Cardiology, 66(2):101–109, 2015.
[DFL+13] F. I. Donoso, R. L. Figueroa, E. A Lecannelier, E. J. Pino, and A. J. Rojas. Atrial activity selection for atrial fibrillation ECG recordings. Computers in biology and medicine, 43(10):1628–36, 2013.
[DG05] David L Donoho and Carrie Grimes. Image manifolds which are isometric to euclidean space. Journal of mathematical imaging and vision, 23(1):5–24, 2005.
[DGJ18] David L Donoho, Matan Gavish, and Iain M Johnstone. Optimal shrinkage of eigenvalues in the spiked covariance model. Annals of statistics, 46(4):1742, 2018.
[DhAZ+09] Heidi Danker-hopfe, Peter Anderer, Josef Zeitlhofer, Marion Boeck, Hans Dorn, Georg Gruber, Esther Heller, Erna Loretz, Doris Moser, Silvia Parapatics, et al. Interrater reliability for sleep scoring according to the rechtschaffen & kales and the new aasm standard. Journal of sleep research, 18(1):74–84, 2009.
[Dhi01] Inderjit S Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 269–274, 2001.
[DKS07] L. G. Doroshenkov, V. A. Konyshev, and S. V. Selishchev. Classification of human sleep stages based on EEG processing using hidden Markov models. Biomedical Engineering, 41(1):25–28, 2007.
[DL14] Y. Dong and D. Li. Automatic speech recognition: A deep learning approach. Springer, 2014.
[DLHS11] Alysha M De Livera, Rob J Hyndman, and Ralph D Snyder. Forecasting time series with complex seasonal patterns using exponential smoothing. Journal of the American statistical association, 106(496):1513–1527, 2011.
[DLW11] Ingrid Daubechies, Jianfeng Lu, and Hau-Tieng Wu. Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool. Applied and computational harmonic analysis, 30(2):243–261, 2011.
[Don92] David L Donoho. Superresolution via sparsity constraints. SIAM journal on mathematical analysis, 23(5):1309–1331, 1992.
[DS04] I. Dotsinsky and T. Stoyanov. Optimization of bi-directional digital filtering for drift suppression in electrocardiogram signals. Journal of Medical Engineering & Technology, 28(4):178–180, 2004.
[DTZC13] Dominique Duncan, Ronen Talmon, Hitten P Zaveri, and Ronald R Coifman. Identifying preseizure state in intracranial eeg data using diffusion kernels. Math. Biosci. Eng, 10(3):579–590, 2013.
[DWW16] Ingrid Daubechies, Yi Wang, and Hau-tieng Wu. Conceft: Concentration of frequency and time via a multitapered synchrosqueezed transform. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065):20150193, 2016.
[EC598] ANSI-AAMI EC57. Testing and reporting performance results of cardiac rhythm and st segment measurement algorithms. Association for the Advancement of Medical Instrumentation, Arlington, VA, 1998.
[ECA87] AAMI ECAR. Recommended practice for testing and reporting performance results of ventricular arrhythmia detection algorithms. Association for the Advancement of Medical Instrumentation, page 69, 1987.
[EHO99] Sigrid Elsenbruch, Michael J Harnish, and William C Orr. Heart rate variability during waking and sleep in healthy males and females. Sleep, 22(8):1067–1071, 1999.
[EK+10] Noureddine El Karoui et al. On information plus noise kernel random matrices. The Annals of Statistics, 38(5):3191–3216, 2010.
[EKW+16] Noureddine El Karoui, Hau-Tieng Wu, et al. Graph connection laplacian methods can be made robust to noise. The Annals of Statistics, 44(1):346–372, 2016.
[Elg13] Mohamed Elgendi. Fast qrs detection with an optimized knowledge-based method: Evaluation on 11 standard ecg databases. PloS one, 8(9):e73557, 2013.
[ENB+13] Mohamed Elgendi, Ian Norton, Matt Brearley, Derek Abbott, and Dale Schuurmans. Systolic peak detection in acceleration photoplethysmograms measured from emergency responders in tropical conditions. PLoS One, 8(10), 2013.
[Eng98] Milo Engoren. Approximate entropy of respiratory rate and tidal volume during weaning from mechanical ventilation. Critical care medicine, 26(11):1817–1823, 1998.
[EPD49] K.F. Eather, L.H. Peterson, and R.D. Dripps. Studies of the circulation of anesthetized patients by a new method for recording arterial pressure and pressure pulse contours. Anesthesiology, 10(2):125, 1949.
[ESAMN13] Farideh Ebrahimi, Seyed-Kamaledin Setarehdan, Jose Ayala-Moyeda, and Homer Nazeran. Automatic sleep staging using empirical mode decomposition, discrete wavelet transform, time-domain, and nonlinear dynamics features of heart rate variability signals. Computer methods and programs in biomedicine, 112(1):47–57, 2013.
[EVWFE18] Diana Escalona-Vargas, Hau-Tieng Wu, Martin G Frasch, and Hari Eswaran. A comparison of five algorithms for fetal magnetocardiography signal extraction. Cardiovascular engineering and technology, pages 1–5, 2018.
[FCEH03] Mia Folke, L Cernerud, M Ekstrom, and Bertil Hok. Critical review of non-invasive respiratory monitoring in medical care. Medical and Biological Engineering and Computing, 41(4):377–383, 2003.
[FdMM17] Matthias Franz, Santiago Lopez de Medrano, and John Malik. Mutants of compactified representations revisited. Boletín de la Sociedad Matematica Mexicana, 23(1):511–526, 2017.
[FDSR02] A. Flexer, G. Dorffner, P. Sykacek, and I. Rezek. An automatic, continuous and probabilistic sleep stager based on a hidden Markov model. Applied Artificial Intelligence, 16(3):199–207, 2002.
[FGD05] A. Flexer, G. Gruber, and G. Dorffner. A reliable probabilistic sleep stager based on a single EEG signal. Artificial Intelligence in Medicine, 33(3):199–207, 2005.
[FGHS02] Mia Folke, Fredrik Granstedt, Bertil Hok, and Hakan Scheer. Comparative provocation test of respiratory monitoring methods. Journal of clinical monitoring and computing, 17(2):97–103, 2002.
[FHT+00] Jerome Friedman, Trevor Hastie, Robert Tibshirani, et al. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The annals of statistics, 28(2):337–407, 2000.
[Fla99] P Flandrin. Wavelet analysis and its applications. Time–Frequency/Time-Scale Analysis, pages 183–307, 1999.
[FLR+15] Pedro Fonseca, Xi Long, Mustafa Radha, Reinder Haakma, Ronald M Aarts, and Jerome Rolink. Sleep stage classification with ecg and respiratory effort. Physiological measurement, 36(10):2027, 2015.
[Fra08] A. M. Fraser. Hidden Markov models and dynamical systems. SIAM, 2008.
[Fri01] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.
[FS95] Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In European conference on computational learning theory, pages 23–37. Springer, 1995.
[Fun13] Yuan-cheng Fung. Biomechanics: circulation. Springer Science & Business Media, 2013.
[GAG+00] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. circulation, 101(23):e215–e220, 2000.
[GDDGG14] Dragoljub Gajic, Zeljko Djurovic, Stefano Di Gennaro, and Fredrik Gustafsson. Classification of eeg signals for detection of epileptic seizures based on wavelets and statistical pattern recognition. Biomedical Engineering: Applications, Basis and Communications, 26(02):1450021, 2014.
[GDG+15] Dragoljub Gajic, Zeljko Djurovic, Jovan Gligorijevic, Stefano Di Gennaro, and Ivana Savic-Gajic. Detection of epileptiform activity in eeg signals based on time-frequency and non-linear analysis. Frontiers in computational neuroscience, 9:38, 2015.
[GH00] A. Guyton and J. Hall. Textbook of Medical Physiology. Saunders, 2000.
[GH07] Marc G Genton and Peter Hall. Statistical inference for evolving periodic functions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4):643–657, 2007.
[GHA+02] D. Gorur, U. Halici, H. Aydin, G. Ongun, F. Ozgen, and K. Leblebicioglu. Sleep spindles detection using short time fourier transform and neural networks. In IJCNN’02, volume 2, pages 1631–1636. IEEE, 2002.
[GK+06] Evarist Gine, Vladimir Koltchinskii, et al. Empirical graph laplacian approximation of laplace–beltrami operators: Large sample results. In High dimensional probability, pages 238–259. Institute of Mathematical Statistics, 2006.
[Gla09] Leon Glass. Introduction to controversial topics in nonlinear science: Is the normal heart rate chaotic?, 2009.
[Gol06] A. L. Goldberger. Clinical Electrocardiography: A Simplified Approach. Mosby, 2006.
[GRAR16] M. García, J. Rodenas, R. Alcaraz, and J. J. Rieta. Application of the relative wavelet energy to heart rate independent detection of atrial fibrillation. Computer Methods and Programs in Biomedicine, 131(April):157–168, 2016.
[GRS05] Steinn Gudmundsson, Thomas Philip Runarsson, and Sven Sigurdsson. Automatic sleep staging using support vector machines with posterior probability estimates. In International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC’06), volume 2, pages 366–372. IEEE, 2005.
[GS64] Izrail Moiseevich Gelfand and Georgii Evgenevich Shilov. Generalized functions, Vol. 4: applications of harmonic analysis. Academic Press, 1964.
[GS02] S. Goodacre et al. Abc of clinical electrocardiography: Atrial arrhythmias. BMJ, 324(7344):594–597, 2002.
[GSK+04] Kenan Gumustekin, Bedri Seven, Nezihe Karabulut, Omer Aktas, Nesrin Gursan, Sahin Aslan, Mustafa Keles, Erhan Varoglu, and Senol Dane. Effects of sleep deprivation, nicotine, and selenium on wound healing in rats. International journal of neuroscience, 114(11):1433–1442, 2004.
[Hal78] Marc Hallin. Mixed autoregressive-moving average multivariate processes with time-dependent coefficients. Journal of Multivariate Analysis, 8(4):567–572, 1978.
[Hal80] Marc Hallin. Invertibility and generalized invertibility of time series models. Journal of the Royal Statistical Society: Series B (Methodological), 42(2):210–212, 1980.
[HAVL05] Matthias Hein, Jean-Yves Audibert, and Ulrike Von Luxburg. From graphs to manifolds–weak and strong pointwise consistency of graph laplacians. In International Conference on Computational Learning Theory, pages 470–485. Springer, 2005.
[HCH+06] Tse Lin Hsu, Pin Tsun Chao, Hsin Hsiu, Wei Kung Wang, Sai Ping Li, and Yuh Ying Lin Wang. Organ-specific ligation-induced changes in harmonic components of the pulse spectrum and regional vasoconstrictor selectivity in wistar rats. Experimental physiology, 91(1):163–170, 2006.
[HCKM04] J. Healey, G.D. Clifford, L. Kontothanassis, and P.E. McSharry. An open-source method for simulating atrial fibrillation using ECGSYN. Computers in Cardiology, (September):19–22, 2004.
[HFSW17] Christophe L Herry, Martin Frasch, Andrew JE Seely, and Hau-tieng Wu. Heart beat classification from single-lead ecg using the synchrosqueezing transform. Physiological measurement, 38(2):171, 2017.
[HGN04] Efstathios Hadjidemetriou, Michael D Grossberg, and Shree K Nayar. Multiresolution histograms and their use for recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(7):831–847, 2004.
[HJB+18] Feras Hatib, Zhongping Jian, Sai Buddi, Christine Lee, Jos Settels, Karen Sibert, Joseph Rinehart, and Maxime Cannesson. Machine-learning algorithm to predict hypotension based on high-fidelity arterial pressure waveform analysis. Anesthesiology, 129(4):663–674, 2018.
[HKF+05] B. L. Hoppe, A. M. Kahn, G. K. Feld, A. Hassankhani, and S. M. Narayan. Separating Atrial Flutter From Atrial Fibrillation With Apparent Electrocardiographic Organization Using Dominant and Narrow F-Wave Spectra. Journal of the American College of Cardiology, 46(11):2079–2087, 2005.
[HKH+12] K Hamunen, V Kontinen, Elina Hakala, P Talke, M Paloheimo, and E Kalso. Effect of pain on autonomic nervous system indices derived from photoplethysmography in healthy volunteers. British journal of anaesthesia, 108(5):838–844, 2012.
[HMC04] SA Hope, IT Meredith, and JD Cameron. Effect of non-invasive calibration of radial waveforms on error in transfer-function-derived central aortic waveform characteristics. Clinical science (London, England: 1979), 107(2):205, 2004.
[HOH13] Abdulnasir Hossen, H Ozer, and Ulrich Heute. Identification of sleep stages from heart rate variability using a soft-decision wavelet-based technique. Digital Signal Processing, 23(1):218–229, 2013.
[Hor90] L. Hormander. The analysis of linear partial differential operators I. Springer, 1990.
[HPIe98] M. Holm, S. Pehrson, M. Ingemansson, et al. Non-invasive assessment of the atrial cycle length during atrial fibrillation in man: introducing, validating and illustrating a new ECG method. Cardiovasc Res, 38:69–81, 1998.
[HRC+17] David Hamon, Pradeep S Rajendran, Ray W Chui, Olujimi A Ajijola, Tadanobu Irie, Ramin Talebi, Siamak Salavatian, Marmar Vaseghi, Jason S Bradfield, J Andrew Armour, et al. Premature ventricular contraction coupling interval variability destabilizes cardiac neuronal and electrophysiological control: insights from simultaneous cardioneural mapping. Circulation: Arrhythmia and Electrophysiology, 10(4):e004937, 2017.
[HRPH10] A. Hyvarinen, P. Ramkumar, L. Parkkonen, and R. Hari. Independent component analysis of short-time fourier transforms for spontaneous EEG/MEG analysis. NeuroImage, 49(1):257–271, 2010.
[HRR00] Peter Hall, James Reimann, and John Rice. Nonparametric estimation of a periodic function. Biometrika, 87(3):545–557, 2000.
[HS97] Sepp Hochreiter and Jurgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
[HS16] Thomas Y Hou and Zuoqiang Shi. Extracting a shape function for a signal with intra-wave frequency modulation. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065):20150194, 2016.
[HSE+95] James Hafner, Harpreet S. Sawhney, William Equitz, Myron Flickner, and Wayne Niblack. Efficient color histogram indexing for quadratic form distance functions. IEEE transactions on pattern analysis and machine intelligence, 17(7):729–736, 1995.
[HTPB04] Peter Halasz, Mario Terzano, Liborio Parrino, and Robert Bodizs. The nature of arousal in sleep. Journal of sleep research, 13(1):1–23, 2004.
[HWB08] Chinmay Hegde, Michael Wakin, and Richard Baraniuk. Random projections for manifold learning. In Advances in neural information processing systems, pages 641–648, 2008.
[IAIC+07] Conrad Iber, Sonia Ancoli-Israel, Andrew L Chesson, Stuart F Quan, et al. The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specifications, volume 1. American Academy of Sleep Medicine Westchester, IL, 2007.
[IRV14] S. A. Imtiaz and Esther R.-V. Recommendations for performance assessment of automatic sleep staging algorithms. In Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE, pages 5044–5047. IEEE, 2014.
[JAR15] M. Julian, R. Alcaraz, and J. J. Rieta. Application of Hurst exponents to assess atrial reverse remodeling in paroxysmal atrial fibrillation. Physiological measurement, 36:2231–2246, 2015.
[Je14] C. T. January et al. 2014 aha/acc/hrs guideline for the management of patients with atrial fibrillation: Executive summary. Journal of the American College of Cardiology, 64(21):2246–2280, 2014.
[JHH+00] Ming-Yie Jan, Hsin Hsiu, Tse-Lin Hsu, Yuh-Ying Lin Wang, and Wei-Kung Wang. The importance of pulsatile microcirculation in relation to hypertension. IEEE Engineering in Medicine and Biology Magazine, 19(3):106–111, 2000.
[JMS08] Peter W Jones, Mauro Maggioni, and Raanan Schul. Manifold parametrizations by eigenfunctions of the laplacian and heat kernels. Proceedings of the National Academy of Sciences, 105(6):1803–1808, 2008.
[Joh07] I. M. Johnstone. High dimensional statistical inference and random matrices. In Proceedings of the International Congress of Mathematicians, 2007.
[Kam12] Chandrakar Kamath. A new approach to detect congestive heart failure using teager energy nonlinear scatter plot of r–r interval series. Medical engineering & physics, 34(7):841–848, 2012.
[Ke16] P. Kirchhof et al. 2016 esc guidelines for the management of atrial fibrillation developed in collaboration with eacts. European Heart Journal, 37(38):2893, 2016.
[Kee98] J. Keener. Mathematical Physiology. Springer, 1998.
[KJH+13] Won Young Kim, Jong Hun Jun, Jin Won Huh, Sang Bum Hong, Chae-Man Lim, and Younsuck Koh. Radial to femoral arterial blood pressure differences in septic shock patients receiving high-dose norepinephrine therapy. Shock, 40(6):527–531, 2013.
[KLB+09] Jae-Eun Kang, Miranda M Lim, Randall J Bateman, James J Lee, Liam P Smyth, John R Cirrito, Nobuhiro Fujiki, Seiji Nishino, and David M Holtzman. Amyloid-β dynamics are regulated by orexin and the sleep-wake cycle. Science, 326(5955):1005–1007, 2009.
[KLR+09] Yongsam Kim, Sookkyung Lim, Subha V Raman, Orlando P Simonetti, and Avner Friedman. Blood flow in a compliant vessel by the immersed boundary method. Annals of biomedical engineering, 37(5):927–942, 2009.
[KMF09] Walter Karlen, Claudio Mattiussi, and Dario Floreano. Sleep and wake classification with ecg and respiratory effort signals. IEEE Transactions on Biomedical Circuits and Systems, 3(2):71–78, 2009.
[KSK+01] D.A. Kass, E.P. Shapiro, M. Kawaguchi, A.R. Capriotti, A. Scuteri, E.G. Lakatta, et al. Improved arterial compliance by a novel advanced glycation end-product crosslink breaker. Circulation, 104(13):1464–1470, 2001.
[KSP+13] Sung-Hoon Kim, Jun-Gol Song, Ji-Hyun Park, Jung-Won Kim, Yong-Seok Park, and Gyu-Sam Hwang. Beat-to-beat tracking of systolic blood pressure using noninvasive pulse transit time during anesthesia induction in hypertensive patients. Anesthesia & Analgesia, 116(1):94–100, 2013.
[KTK+16] Takeshi Kanda, Natsuko Tsujino, Eriko Kuramoto, Yoshimasa Koyama, Etsuo A Susaki, Sachiko Chikahisa, and Hiromasa Funato. Sleep as a biological problem: an overview of frontiers in sleep research. The Journal of Physiological Sciences, 66(1):1–13, 2016.
[KTLW17] Ori Katz, Ronen Talmon, Yu-Lun Lo, and Hau-Tieng Wu. Diffusion-based nonlinear filtering for multimodal data fusion with application to sleep stage assessment. arXiv preprint arXiv:1701.03619, 2017.
[KTR+94] Avi Karni, David Tanne, Barton S Rubenstein, JJ Askenasy, and Dov Sagi. Dependence on rem sleep of overnight improvement of a perceptual skill. Science, 265(5172):679–682, 1994.
[Ku97] David N Ku. Blood flow in arteries. Annual review of fluid mechanics, 29(1):399–434, 1997.
[Laf04] Stephane S Lafon. Diffusion maps and geometric harmonics. PhD thesis, Yale University, 2004.
[LB05] E. Levina and P. J. Bickel. Maximum likelihood estimation of intrinsic dimension. In Advances in neural information processing systems, pages 777–784, 2005.
[LBBH98] Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[LBH15] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436–444, 2015.
[LBLP12] Damien Leger, Virginie Bayon, Jean Pierre Laaban, and Pierre Philip. Impact of sleep apnea on economics. Sleep medicine reviews, 16(5):455–462, 2012.
[LBM00] P. Langley, J. Bourke, and A. Murray. Frequency analysis of atrial fibrillation. In Proc. Int. Conf. IEEE Computers in Cardiology, volume 27, pages 65–68. IEEE, 2000.
[LBM10] Philip Langley, Emma J Bowers, and Alan Murray. Principal component analysis as a tool for analysing beat-to-beat changes in electrocardiogram features: Application to electrocardiogram derived respiration. IEEE Trans. Biomed. Eng, 7:7, 2010.
[LCL+15] Chin-Yu Lin, Shih-Lin Chang, Yenn-Jiang Lin, Li-Wei Lo, Fa-Po Chung, Yun-Yu Chen, Tze-Fan Chao, Yu-Feng Hu, Ta-Chuan Tuan, Jo-Nan Liao, et al. Long-term outcome of multiform premature ventricular complexes in structurally normal heart. International journal of cardiology, 180:80–85, 2015.
[LFH+12] Xi Long, Pedro Fonseca, Reinder Haakma, Ronald M Aarts, and Jerome Foussier. Time-frequency analysis of heart rate variability for sleep and wake classification. In 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), pages 85–90. IEEE, 2012.
[LFW17] Ruilin Li, Martin G Frasch, and Hau-Tieng Wu. Efficient fetal-maternal ecg signal separation from two channel maternal abdominal ecg via diffusion-based channel selection. Frontiers in physiology, 8:277, 2017.
[LGT14] James R Lee, Shayan Oveis Gharan, and Luca Trevisan. Multiway spectral partitioning and higher-order cheeger inequalities. Journal of the ACM (JACM), 61(6):1–30, 2014.
[LI09] R. Llinares and J. Igual. Application of constrained independent component analysis algorithms in electrocardiogram arrhythmias. Artificial Intelligence in Medicine, 47(2):121–133, 2009.
[LIMB10] R. Llinares, J. Igual, and J. Miro-Borras. A fixed point algorithm for extracting the atrial activity in the frequency domain. Computers in Biology and Medicine, 40(11-12):943–949, 2010.
[LJF+05] M. Lemay, V. Jacquemet, A. Forclaz, J. M. Vesin, and L. Kappenberger. Spatiotemporal QRST cancellation method using separate QRS and T-waves templates. Computers in Cardiology, 32:611–614, 2005.
[LLM+20] Gi-Ren Liu, Yu-Lun Lo, John Malik, Yuan-Chung Sheu, and Hau-Tieng Wu. Diffuse to fuse eeg spectra–intrinsic geometry of sleep dynamics for classification. Biomedical Signal Processing and Control, 55:101576, 2020.
[LM10] Mariano Llamedo and Juan Pablo Martínez. Heartbeat classification using feature selection driven by database generalization criteria. IEEE Transactions on Biomedical Engineering, 58(3):616–625, 2010.
[LM12] Mariano Llamedo and Juan Pablo Martínez. An automatic patient-adapted ecg heartbeat classifier allowing expert assistance. IEEE Transactions on Biomedical Engineering, 59(8):2312–2320, 2012.
[LMW19] Yu-Ting Lin, John Malik, and Hau-Tieng Wu. Wave-shape oscillatory model for biomedical time series with applications, 2019.
[LNS16] Seungmin Lee, Seungyoon Nam, and Hyunsoon Shin. The analysis of sleep stages with motion and heart rate signals from a handheld wearable device. In 2016 International Conference on Information and Communication Technology Convergence (ICTC), pages 1135–1137. IEEE, 2016.
[LO06] Haibin Ling and Kazunori Okada. Diffusion distance for histogram comparison. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), volume 1, pages 246–253. IEEE, 2006.
[LRS+03] P. Langley, J. Rieta, M. Stridh, J. Millet, L. Sornmo, and A. Murray. Reconstruction of atrial signals derived from the 12-lead ECG using atrial signal extraction techniques. In Proc. Int. Conf. IEEE Computers in Cardiology, pages 129–132, 2003.
[LRS+06] P. Langley, J. J. Rieta, M. Stridh, J. Millet, L. Sornmo, and A. Murray. Comparison of atrial signal extraction algorithms in 12-lead ECGs with atrial fibrillation. IEEE Transactions on Biomedical Engineering, 53(2):343–6, 2006.
[LS00] P. Laguna and L. Sornmo. Sampling rate and the estimation of ensemble variability for repetitive signals. Medical & biological engineering & computing, 38(5):540–6, 2000.
[LSC+07] Aaron Lewicke, Edward Sazonov, Michael J Corwin, Michael Neuman, and Stephanie Schuckers. Sleep versus wake classification from heart rate variability using computational intelligence: consideration of rejection in classification models. IEEE Transactions on Biomedical Engineering, 55(1):108–118, 2007.
[LSCS05] AT Lewicke, ES Sazonov, MJ Corwin, and SAC Schuckers. Reliable determination of sleep versus wake from heart rate variability using neural networks. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., volume 4, pages 2394–2399. IEEE, 2005.
[LSW18] Chen-Yun Lin, Li Su, and Hau-Tieng Wu. Wave-shape function analysis. Journal of Fourier Analysis and Applications, 24(2):451–505, 2018.
[LT15] R. R. Lederman and R. Talmon. Learning the geometry of common latent variables using alternating-diffusion. Appl. Comp. Harmon. Anal., in press (2015), 2015.
[LTR17] Henry W Lin, Max Tegmark, and David Rolnick. Why does deep and cheap learning work so well? Journal of Statistical Physics, 168(6):1223–1247, 2017.
[LW83] CT Lee and LY Wei. Spectrum analysis of human pulse. IEEE transactions on bio-medical engineering, 30(6):348, 1983.
[LWM19] Yao Lu, Hau-Tieng Wu, and John Malik. Recycling cardiogenic artifacts in impedance pneumography. Biomedical Signal Processing and Control, 51:162–170, 2019.
[LYSA20] Ofir Lindenbaum, Arie Yeredor, Moshe Salhov, and Amir Averbuch. Multi-view diffusion maps. Information Fusion, 55:127–149, 2020.
[LZCS14] T. A. R. Lankveld, S. Zeemering, H. J. G. M. Crijns, and U. Schotten. The ECG as a tool to determine atrial fibrillation complexity. Heart, 100(14):1077–84, 2014.
[LZS+16] T. Lankveld, S. Zeemering, D. Scherr, P. Kuklik, B. A. Hoffmann, S. Willems, B. Pieske, M. Haïssaguerre, P. Jaïs, H. J. Crijns, and U. Schotten. Atrial Fibrillation Complexity Parameters Derived from Surface ECGs Predict Procedural Outcome and Long-Term Follow-Up of Stepwise Catheter Ablation for Atrial Fibrillation. Circulation: Arrhythmia and Electrophysiology, 9(2):e003354, 2016.
[Mal01] Marek Malik. Problems of heart rate correction in assessment of drug-induced qt interval prolongation. Journal of cardiovascular electrophysiology, 12(4):411–420, 2001.
[Mal12] S. Mallat. Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10):1331–1398, 2012.
[Mar98] J. B. Mark. Atlas of cardiovascular monitoring. Churchill Livingstone, 1998.
[MAR10] A Martinez, Raul Alcaraz, and Jose Joaquín Rieta. Ectopic beats canceler for improved atrial activity extraction from holter recordings of atrial fibrillation. In 2010 Computing in Cardiology, pages 1015–1018. IEEE, 2010.
[MAR11] Arturo Martínez, Raul Alcaraz, and Jose J Rieta. Detection and removal of ventricular ectopic beats in atrial fibrillation recordings via principal component analysis. In 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 4693–4696. IEEE, 2011.
[MAR12] A Martínez, R Alcaraz, and JJ Rieta. Optimal cancellation template analysis for ectopic beats removal in atrial fibrillation recordings. In 2012 Computing in Cardiology, pages 801–804. 2012.
[MAR13] Arturo Martínez, Raul Alcaraz, and Jose J Rieta. Ventricular activity morphological characterization: Ectopic beats removal in long term atrial fibrillation recordings. Computer methods and programs in biomedicine, 109(3):283–292, 2013.
[MC90] Marek Malik and A John Camm. Heart rate variability. Clinical cardiology, 13(8):570–576, 1990.
[McD55] DA McDonald. The relation of pulsatile pressure to flow in arteries. The Journal of physiology, 127(3):533–552, 1955.
[MCTS03] P. E. McSharry, G. D. Clifford, L. Tarassenko, and L. A. Smith. A dynamical model for generating synthetic electrocardiogram signals. IEEE Transactions on Biomedical Engineering, 50(3):289–294, 2003.
[MF17] Pejman Memar and Farhad Faradji. A novel multi-class eeg-based sleep stage classification system. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(1):84–95, 2017.
[MFMTH+14] Shayan Motamedi-Fakhr, Mohamed Moshrefi-Torbati, Martyn Hill, Catherine M Hill, and Paul R White. Signal processing techniques applied to human sleep eeg signals – a review. Biomedical Signal Processing and Control, 10:21–33, 2014.
[MHI+15] Ramakrishna Mukkamala, Jin-Oh Hahn, Omer T Inan, Lalit K Mestha, Chang-Sei Kim, Hakan Toreyin, and Survi Kyal. Toward ubiquitous blood pressure monitoring via pulse transit time: theory and practice. IEEE Transactions on Biomedical Engineering, 62(8):1879–1901, 2015.
[MJC89] S Lawrence Marple Jr and William M Carey. Digital spectral analysis with applications, 1989.
[MLW18] John Malik, Yu-Lun Lo, and Hau-tieng Wu. Sleep-wake classification via quantifying heart rate variability by convolutional neural network. Physiological measurement, 39(8):085004, 2018.
[MM89] G. Moody and R. Mark. Qrs morphology representation and noise estimation using the karhunen-loeve transform. In Proc. Int. Conf. IEEE Computers in Cardiology, pages 269–272. IEEE, 1989.
[MM01] George B Moody and Roger G Mark. The impact of the mit-bih arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20(3):45–50, 2001.
[MMC+10] Martin Oswaldo Mendez, Matteo Matteucci, Vincenza Castronovo, Luigi Ferini-Strambi, Sergio Cerutti, and Anna Bianchi. Sleep staging from heart rate variability: time-varying spectral features and hidden markov models. International Journal of Biomedical Engineering and Technology, 3(3-4):246–263, 2010.
[MMZM85] George B Moody, Roger G Mark, Andrea Zoccola, and Sara Mantero. Derivation of respiratory signals from multi-lead ecgs. Computers in cardiology, 12(1985):113–116, 1985.
[MNY06] Ha Quang Minh, Partha Niyogi, and Yuan Yao. Mercer's theorem, feature maps, and smoothing. In International Conference on Computational Learning Theory, pages 154–168. Springer, 2006.
[MRWW17] John Malik, Neil Reed, Chun-Li Wang, and Hau-Tieng Wu. Single-lead f-wave extraction using diffusion geometry. Physiological mea-surement, 38(7):1310, 2017.
[MS92] Mark W Mahowald and Carlos H Schenck. Dissociated states ofwakefulness and sleep. Neurology, 1992.
[MSW20] John Malik, Elsayed Z. Soliman, and Hau-Tieng Wu. An adaptiveqrs detection algorithm for ultra-long-term ecg recordings. Journalof Electrocardiology, 2020.
[MSWW19] John Malik, Chao Shen, Hau-Tieng Wu, and Nan Wu. Connectingdots: from local covariance to empirical intrinsic geometry and locallylinear embedding. Pure and Applied Analysis, 1(4):515–542, 2019.
[MT01] CL Mason and L Tarassenko. Quantitative assessment of respiratory derivation algorithms. In 2001 Conference Proceedings of the 23rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, volume 2, pages 1998–2001. IEEE, 2001.
[Nat02] S. Nattel. New ideas about atrial fibrillation 50 years on. Nature, 415(6868):219–26, 2002.
[NBE+93] Carlton Wayne Niblack, Ron Barber, Will Equitz, Myron D Flickner, Eduardo H Glasman, Dragutin Petkovic, Peter Yanker, Christos Faloutsos, and Gabriel Taubin. QBIC project: querying images by content, using color, texture, and shape. In Storage and retrieval for image and video databases, volume 1908, pages 173–187. International Society for Optics and Photonics, 1993.
[ND02] David J Nott and William TM Dunsmuir. Estimation of nonstationary spatial covariance structure. Biometrika, 89(4):819–829, 2002.
[NDKW01] David J Nott, William T M Dunsmuir, Robert Kohn, and Frank Woodcock. Statistical correction of a deterministic numerical weather prediction model. Journal of the American Statistical Association, 96(455):794–804, 2001.
[NFM+11] Lino Nobili, Michele Ferrara, Fabio Moroni, Luigi De Gennaro, Giorgio Lo Russo, Claudio Campus, Francesco Cardinale, and Fabrizio De Carli. Dissociated wake-like and sleep-like electro-cortical activity during sleep. Neuroimage, 58(2):612–619, 2011.
[NLCK06] Boaz Nadler, Stephane Lafon, Ronald R Coifman, and Ioannis G Kevrekidis. Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Applied and Computational Harmonic Analysis, 21(1):113–127, 2006.
[NPS+00] Robert G Norman, Ivan Pal, Chip Stewart, Joyce A Walsleben, and David M Rapoport. Interobserver agreement among sleep scorers from different centers in a large dataset. Sleep, 23(7):901–908, 2000.
[OA05] MF O’Rourke and A Adji. An updated clinical primer on large artery mechanics: implications of pulse waveform analysis and arterial tonometry. Current opinion in cardiology, 20(4):275, 2005.
[OBS+15] Julien Oster, Joachim Behar, Omid Sayadi, Shamim Nemati, Alistair EW Johnson, and Gari D Clifford. Semisupervised ECG ventricular beat classification with novelty detection based on switching Kalman filters. IEEE Transactions on Biomedical Engineering, 62(9):2125–2134, 2015.
[OG96] Michael F O’Rourke and David E Gallagher. Pulse wave analysis. Journal of hypertension. Supplement: official journal of the International Society of Hypertension, 14(5):S147–57, 1996.
[ON04] Michael F O’Rourke and Wilmer W Nichols. Changes in wave reflection with advancing age in normal subjects. Hypertension (Dallas, Tex.: 1979), 44(6):e10–1, 2004.
[ONBC04] Hee-Seok Oh, Doug Nychka, Tim Brown, and Paul Charbonneau. Period analysis of variable stars by robust smoothing. Journal of the Royal Statistical Society: Series C (Applied Statistics), 53(1):15–30, 2004.
[O’R11] Michael F O’Rourke. McDonald’s Blood Flow in Arteries 6th Edition: Theoretical, Experimental and Clinical Principles. Hodder Arnold, 2011.
[otESoC+96] Task Force of the European Society of Cardiology et al. Heart rate variability, standards of measurement, physiological interpretation, and clinical use. Circulation, 93:1043–1065, 1996.
[PAHJ11] Cheolwoo Park, Jeongyoun Ahn, Martin Hendry, and Woncheol Jang. Analysis of long period variable stars with nonparametric tests for trend detection. Journal of the American Statistical Association, 106(495):832–845, 2011.
[PDMB15] Laure Peter-Derex, Michel Magnin, and Helene Bastuji. Heterogeneity of arousals in human sleep: a stereo-electroencephalographic study. Neuroimage, 123:229–244, 2015.
[Pea01] Karl Pearson. LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901.
[Pey09] Gabriel Peyre. Manifold models for signals and images. Computer Vision and Image Understanding, 113(2):249–260, 2009.
[PG94] Steven M Pincus and Ary L Goldberger. Physiological time-series analysis: what does regularity quantify? American Journal of Physiology-Heart and Circulatory Physiology, 266(4):H1643–H1656, 1994.
[PH96] June J Pilcher and Allen I Huffcutt. Effects of sleep deprivation on performance: a meta-analysis. Sleep, 19(4):318–326, 1996.
[PHSG95] C-K Peng, Shlomo Havlin, H Eugene Stanley, and Ary L Goldberger. Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. Chaos: An Interdisciplinary Journal of Nonlinear Science, 5(1):82–87, 1995.
[PKB+16] Thomas Penzel, Jan W Kantelhardt, Ronny P Bartsch, Maik Riedl, Jan F Kraemer, Niels Wessel, Carmen Garcia, Martin Glos, Ingo Fietze, and Christoph Schobel. Modulations of heart rate, ECG, and cardio-respiratory coupling observed in polysomnography. Frontiers in physiology, 7:460, 2016.
[PKZL12] S.-T. Pan, C.-E. Kuo, J.-H. Zeng, and S.-F. Liang. A transition-constrained discrete hidden Markov model for automatic sleep staging. Biomedical engineering online, 11(1):52, 2012.
[PMR00] Hamilton P.S., Curley M., and Aimi R. Effect of adaptive motion-artifact reduction on QRS detection. Biomed Instrum Technol, 34(3):197–202, 2000.
[PMSL12] Andrius Petrenas, Vaidotas Marozas, Leif Sornmo, and Arunas Lukosevicius. An echo state neural network for QRST cancellation during atrial fibrillation. IEEE transactions on biomedical engineering, 59(10):2950–2957, 2012.
[PNB16] Ankit B Patel, Minh Tan Nguyen, and Richard Baraniuk. A probabilistic framework for deep learning. In Advances in neural information processing systems, pages 2558–2566, 2016.
[PNN+06] S. Petrutiu, J. Ng, G. Nijm, H. Al-Angari, S. Swiryn, and A. Sahakian. Atrial fibrillation and waveform characterization. IEEE Engineering in Medicine and Biology Magazine, 25(6):24–30, 2006.
[Pol09] DSG Pollock. Investigating economic trends and cycles. In Palgrave Handbook of Econometrics, pages 243–307. Springer, 2009.
[Por16] Jacobus W Portegies. Embeddings of Riemannian manifolds with heat kernels and eigenfunctions. Communications on Pure and Applied Mathematics, 69(3):478–518, 2016.
[PSH02] Edward F Pace-Schott and J Allan Hobson. The neurobiology of sleep: genetics, cellular physiology and subcortical networks. Nature Reviews Neuroscience, 3(8):591–605, 2002.
[PT85] J. Pan and W. J. Tompkins. A real-time QRS detection algorithm. IEEE Trans. Biomed. Eng., 32(3):230–6, 1985.
[QA17] Safa Sultan Qurraie and Rashid Ghorbani Afkhami. ECG arrhythmia classification using time frequency distribution techniques. Biomedical engineering letters, 7(4):325–332, 2017.
[QCCS18] Qiang Qiu, Xiuyuan Cheng, Robert Calderbank, and Guillermo Sapiro. DCFNet: Deep neural network with decomposed convolutional filters. arXiv preprint arXiv:1802.04145, 2018.
[Ram04] James O Ramsay. Functional data analysis. Encyclopedia of Statistical Sciences, 4, 2004.
[RCS+04] J. J. Rieta, F. Castells, C. Sanchez, V. Zarzoso, and J. Millet. Atrial activity extraction for atrial fibrillation analysis using blind source separation. IEEE Transactions on Biomedical Engineering, 51(7):1176–86, 2004.
[Rec68] Allan Rechtschaffen. A manual for standardized terminology, techniques and scoring system for sleep stages in human subjects. Brain information service, 1968.
[RFJ11] Alcaraz R., Hornero F., and Rieta J.J. Assessment of non-invasive time and frequency atrial fibrillation organization markers with unipolar atrial electrograms. Physiol. Meas., 32:99–114, 2011.
[RGAR15] J. Rodenas, M. García, R. Alcaraz, and J. J. Rieta. Wavelet entropy automatically detects episodes of atrial fibrillation from single-lead electrocardiograms. Entropy, 17(9):6179–6199, 2015.
[RH07] J. J. Rieta and F. Hornero. Comparative study of methods for ventricular activity cancellation in atrial electrograms of atrial fibrillation. Physiological measurement, 28(8):925–36, 2007.
[RK04] Ryan Rifkin and Aldebaro Klautau. In defense of one-vs-all classification. Journal of machine learning research, 5(Jan):101–141, 2004.
[RMHA07] BR Ribeiro, AM Marques, JH Henriques, and MA Antunes. Premature ventricular beat detection by using spectral clustering methods. In 2007 Computers in Cardiology, pages 149–152. IEEE, 2007.
[RPTB01] Yossi Rubner, Jan Puzicha, Carlo Tomasi, and Joachim M Buhmann. Empirical evaluation of dissimilarity measures for color and texture. Computer vision and image understanding, 84(1):25–43, 2001.
[RS80] Michael Reed and Barry Simon. Methods of modern mathematical physics. Volume I: functional analysis. Academic press, 1980.
[RS00] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
[RSW09] Ori Rosen, David S Stoffer, and Sally Wood. Local spectral analysis via a Bayesian mixture of smoothing splines. Journal of the American Statistical Association, 104(485):249–262, 2009.
[RTG00] Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. The earth mover’s distance as a metric for image retrieval. International journal of computer vision, 40(2):99–121, 2000.
[SBDH97] Jaroslav Stark, DS Broomhead, ME Davies, and J Huke. Takens embedding theorems for forced and stochastic systems. Nonlinear Analysis: Theory, Methods & Applications, 30(8):5303–5314, 1997.
[SBM+85] J. Slocum, E. Byrom, L. McCarthy, A. Sahakian, and S. Swiryn. Computer detection of atrioventricular dissociation from the surface electrocardiogram during wide QRS tachycardias. Circulation, 72:1028–1036, 1985.
[SC08] Amit Singer and Ronald R Coifman. Non-linear independent component analysis with diffusion maps. Applied and Computational Harmonic Analysis, 25(2):226–239, 2008.
[SCO92] EEG AROUSAL SCORING. EEG arousals: scoring rules and examples: a preliminary report from the Sleep Disorders Atlas Task Force of the American Sleep Disorders Association. Sleep, 15(2), 1992.
[SD] Myriam Kerkhofs Stéphanie Devuyst, Thierry Dutoit. DREAMS subjects database. University of MONS - TCTS Laboratory and Université Libre de Bruxelles - CHU de Charleroi Sleep Laboratory.
[SDMA93] Virend K Somers, Mark E Dyken, Allyn L Mark, and Francois M Abboud. Sympathetic-nerve activity during sleep in normal subjects. New England Journal of Medicine, 328(5):303–307, 1993.
[SDWG17] Akara Supratak, Hao Dong, Chao Wu, and Yike Guo. DeepSleepNet: a model for automatic sleep stage scoring based on raw single-channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(11):1998–2008, 2017.
[SFB+98] Robert E Schapire, Yoav Freund, Peter Bartlett, Wee Sun Lee, et al. Boosting the margin: A new explanation for the effectiveness of voting methods. The annals of statistics, 26(5):1651–1686, 1998.
[She07] Kirk H Shelley. Photoplethysmography: beyond the calculation of arterial oxygen saturation and heart rate. Anesthesia & Analgesia, 105(6):S31–S36, 2007.
[SHMG64] Frederick Snyder, J Allan Hobson, Donald F Morrison, and Frederick Goldfrank. Changes in respiration, heart rate, and systolic blood pressure in human sleep. Journal of applied physiology, 19(3):417–422, 1964.
[SHS01] Mirja A Salo, Heikki V Huikuri, and Tapio Seppanen. Ectopic beats in heart rate variability analysis: effects of editing on time and frequency domain measures. Annals of noninvasive electrocardiology, 6(1):5–17, 2001.
[Sin06] Amit Singer. From graph to manifold Laplacian: The convergence rate. Applied and Computational Harmonic Analysis, 21(1):128–134, 2006.
[SK85] A.V. Sahakian and K.H. Kuo. Canceling the cardiogenic artifact in impedance pneumography. In Proc. Annu. Int. Conf. IEEE Engineering Medicine Biology Society, volume 7, pages 855–859. IEEE, 1985.
[SKVG+05] ERJ Seitsonen, IKJ Korhonen, MJ Van Gils, M Huiku, JMP Lotjonen, KT Korttila, and AM Yli-Hankala. EEG spectral entropy, heart rate, photoplethysmography and motor responses to skin incision during sevoflurane anaesthesia. Acta Anaesthesiologica Scandinavica, 49(3):284–292, 2005.
[SMG14] C Slagt, I Malagon, and AB Groeneveld. Systematic review of uncalibrated arterial pressure waveform analysis to determine cardiac output and stroke volume variation. British journal of anaesthesia, 112(4):626, 2014.
[SMZ14] Fred Shaffer, Rollin McCraty, and Christopher L Zerr. A healthy heart is not a metronome: an integrative review of the heart’s anatomy and heart rate variability. Frontiers in psychology, 5:1040, 2014.
[SRB+91] EA Simoes, R Roark, S Berman, LL Esler, and J Murphy. Respiratory rate: measurement of variability over time and accuracy at different counting periods. Archives of disease in childhood, 66(10):1199–1203, 1991.
[SS01] M. Stridh and L. Sornmo. Spatiotemporal QRST cancellation techniques for analyses of atrial fibrillation. IEEE Transactions on Biomedical Engineering, 51:105–111, 2001.
[SSB+02] Bernhard Scholkopf, Alexander J Smola, Francis Bach, et al. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press, 2002.
[SSS92] J. Slocum, A. Sahakian, and S. Swiryn. Diagnosis of atrial fibrillation from surface electrocardiograms based on computer-detected atrial activity. J Electrocardiol, 25:1–8, 1992.
[SSS98] Sergio Shkurovich, Alan V Sahakian, and Steven Swiryn. Detection of atrial activity from high-voltage leads of implantable ventricular defibrillators using a cancellation technique. IEEE Transactions on Biomedical Engineering, 45(2):229–234, 1998.
[STS16] Tal Shnitzer, Ronen Talmon, and Jean-Jacques Slotine. Manifold learning with contracting observers for data-driven time-series analysis. IEEE Transactions on Signal Processing, 65(4):904–918, 2016.
[STSG85] Benedict J Semmes, Martin J Tobin, James V Snyder, and Ake Grenvik. Subjective and objective measurement of tidal volume in critically ill patients. Chest, 87(5):577–579, 1985.
[SW12] Amit Singer and H-T Wu. Vector diffusion maps and the connection Laplacian. Communications on pure and applied mathematics, 65(8):1067–1144, 2012.
[SW13] Amit Singer and H-T Wu. Two-dimensional tomography from noisy projections taken at unknown random directions. SIAM journal on imaging sciences, 6(1):136–175, 2013.
[SW17a] Amit Singer and Hau-Tieng Wu. Spectral convergence of the connection Laplacian from random samples. Information and Inference: A Journal of the IMA, 6(1):58–123, 2017.
[SW17b] Li Su and Hau-Tieng Wu. Extract fetal ECG from single-lead abdominal ECG by de-shape short time Fourier transform and nonlocal median. Frontiers in Applied Mathematics and Statistics, 3:2, 2017.
[SWaHC02] L. Senhadji, F. Wang, A.I. Hernandez, and G. Carrault. Wavelets extrema representation for QRS-T cancellation and P wave detection. Computers in Cardiology, pages 0–3, 2002.
[SWW07] Oleg G Smolyanov, Heinrich v Weizsacker, and Olaf Wittich. Chernoff’s theorem and discrete time approximations of Brownian motion on manifolds. Potential Analysis, 26(1):1–29, 2007.
[SYC91] Tim Sauer, James A Yorke, and Martin Casdagli. Embedology. Journal of statistical Physics, 65(3-4):579–616, 1991.
[SYCH04] D. C. Shah, T. Yamane, K. J. Choi, and M. Haissaguerre. QRS subtraction and the ECG analysis of atrial ectopics. Annals of Noninvasive Electrocardiology, 9(4):389–398, 2004.
[TAH01] Stephen R Thompson, Uwe Ackermann, and Richard L Horner. Sleep as a teaching tool for integrating respiratory physiology and motor control. Advances in physiology education, 25(2):29–44, 2001.
[Tak81] Floris Takens. Detecting strange attractors in turbulence. In Dynamical systems and turbulence, Warwick 1980, pages 366–381. Springer, 1981.
[Tay66a] MG Taylor. The input impedance of an assembly of randomly branching elastic tubes. Biophysical journal, 6(1):29–51, 1966.
[Tay66b] Michael G Taylor. Use of random excitation and spectral analysis in the study of frequency-dependent parameters of the cardiovascular system. Circulation research, 18(5):585–595, 1966.
[TBFW13] G. Thakur, E. Brevdo, N. S. Fuckar, and H.-T. Wu. The synchrosqueezing algorithm for time-varying spectral analysis: Robustness properties and new paleoclimate applications. Signal Processing, 93(5):1079–1094, 2013.
[TC13] Ronen Talmon and Ronald R Coifman. Empirical intrinsic geometry for nonlinear modeling and time series filtering. Proceedings of the National Academy of Sciences, 110(31):12535–12540, 2013.
[TdL00] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290(5500):2319–2323, 2000.
[Teo08] L.-C. Teofilo. Sleep medicine: Essentials and review. Oxford University Press, 2008.
[TGHS18] N. G. Trillos, M. Gerlach, M. Hein, and D. Slepcev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs towards the Laplace–Beltrami operator. arXiv preprint arXiv:1801.10108, 2018.
[TGP+96] L Toscani, PF Gangemi, A Parigi, R Silipo, P Ragghianti, E Sirabella, M Morelli, L Bagnoli, R Vergassola, and G Zaccara. Human heart rate variability and sleep stages. The Italian Journal of Neurological Sciences, 17(6):437–439, 1996.
[TMG16] Orestis Tsinalis, Paul M Matthews, and Yike Guo. Automatic sleep stage scoring using time-frequency analysis and stacked sparse autoencoders. Annals of biomedical engineering, 44(5):1587–1597, 2016.
[TMPG05] Robert Joseph Thomas, Joseph E Mietus, Chung-Kang Peng, and Ary L Goldberger. An electrocardiogram-based technique to assess cardiopulmonary coupling during sleep. Sleep, 28(9):1151–1161, 2005.
[TMZC15] Ronen Talmon, Stephane Mallat, Hitten Zaveri, and Ronald R Coifman. Manifold learning for latent variable inference in dynamical systems. IEEE Transactions on Signal Processing, 63(15):3843–3856, 2015.
[TO19] Omer Turk and Mehmet Sirac Ozerdem. Epilepsy detection by using scalogram based convolutional neural network from EEG signals. Brain sciences, 9(5):115, 2019.
[Tob88] Martin J Tobin. Respiratory monitoring in the intensive care unit. Am Rev Respir Dis, 138:1625–1642, 1988.
[TS18] Nicolas García Trillos and Dejan Slepcev. A variational approach to the consistency of spectral clustering. Applied and Computational Harmonic Analysis, 45(2):239–281, 2018.
[TSC+16] JL Teboul, B Saugel, M Cecconi, D De Backer, CK Hofer, X Monnet, A Perel, MR Pinsky, DA Reuter, A Rhodes, et al. Less invasive hemodynamic monitoring in critically ill patients. Intensive care medicine, 42(9):1350, 2016.
[TW11] G. Thakur and H.-T. Wu. Synchrosqueezing-based recovery of instantaneous frequency from nonuniform samples. SIAM Journal on Mathematical Analysis, 43(5):2078–2095, 2011.
[TW19] Ronen Talmon and Hau-Tieng Wu. Latent common manifold learning with alternating diffusion: analysis and applications. Applied and Computational Harmonic Analysis, 47(3):848–892, 2019.
[TZ91] N.V. Thakor and Y.S. Zhu. Applications of adaptive filtering to ECG analysis: noise cancellation and arrhythmia detection. IEEE Transactions on Biomedical Engineering, 38(8):785–794, 1991.
[TZ15] Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW), pages 1–5. IEEE, 2015.
[UBBP18] Muhammed Kursad Ucar, Mehmet Recep Bozkurt, Cahit Bilgin, and Kemal Polat. Automatic sleep staging in obstructive sleep apnea patients using photoplethysmography, heart rate variability signal and machine learning techniques. Neural Computing and Applications, 29(8):1–16, 2018.
[VCLP00] Eve Van Cauter, Rachel Leproult, and Laurence Plat. Age-related changes in slow wave sleep and REM sleep and relationship with growth hormone and cortisol levels in healthy men. JAMA, 284(7):861–868, 2000.
[VCP96] Eve Van Cauter and Laurence Plat. Physiology of growth hormone secretion during sleep. The Journal of pediatrics, 128(5):S32–S37, 1996.
[VDMPVdH09] Laurens Van Der Maaten, Eric Postma, and Jaap Van den Herik. Dimensionality reduction: a comparative. J Mach Learn Res, 10(66-71):13, 2009.
[vdSWSvL18] Bjorn JP van der Ster, Berend E Westerhof, Wim J Stok, and Johannes J van Lieshout. Detecting central hypovolemia in simulated hypovolemic shock by automated feature extraction with principal component analysis. Physiological reports, 6(22):e13895, 2018.
[vDvBH+17] A van Drumpt, J van Bommel, S Hoeks, F Grune, T Wolvetang, J Bekkers, and M ter Horst. The value of arterial pressure waveform cardiac output measurements in the radial and femoral artery in major cardiac surgery patients. BMC Anesthesiology, 17, 2017.
[VDWHH11] Alexander TM Van De Water, Alison Holmes, and Deirdre A Hurley. Objective measurements of sleep for non-laboratory settings as alternatives to polysomnography–a systematic review. Journal of sleep research, 20(1pt2):183–200, 2011.
[VHM+01] C. Vasquez, A. Hernandez, F. Mora, G. Carrault, and G. Passariello. Atrial activity enhancement by Wiener filtering using an artificial neural network. IEEE Transactions on Biomedical Engineering, 48(8):940–4, 2001.
[vIJH07] A. van Oosterom, Z. Ihara, V. Jacquemet, and R. Hoekema. Vectorcardiographic lead systems for the characterization of atrial fibrillation. Journal of electrocardiology, 40(4):343.e1–11, 2007.
[VLBB08] Ulrike Von Luxburg, Mikhail Belkin, and Olivier Bousquet. Consistency of spectral clustering. The Annals of Statistics, pages 555–586, 2008.
[VLBB16] Jose Vicente, Pablo Laguna, Ariadna Bartra, and Raquel Bailon. Drowsiness detection using heart rate variability. Medical & biological engineering & computing, 54(6):927–937, 2016.
[VLM04] Michael V Vitiello, Lawrence H Larsen, and Karen E Moe. Age-related sleep change: gender and estrogen effects on the subjective–objective sleep quality relationships of healthy, noncomplaining older men and women. Journal of psychosomatic research, 56(5):503–510, 2004.
[VMH17] Albert Vilamala, Kristoffer H Madsen, and Lars K Hansen. Deep convolutional neural networks for interpretable analysis of EEG sleep stage scoring. In 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6. IEEE, 2017.
[VPH+09] Luiz Carlos Marques Vanderlei, Carlos Marcelo Pastre, Rosangela Akemi Hoshi, Tatiana Dias de Carvalho, and Moacir Fernandes de Godoy. Basic notions of heart rate variability and its clinical applicability. Brazilian Journal of Cardiovascular Surgery, 24(2):205–217, 2009.
[VQMR95] Bradley V Vaughn, SR Quint, JA Messenheimer, and KR Robertson. Heart period variability in sleep. Electroencephalography and clinical Neurophysiology, 94(3):155–162, 1995.
[VSKKVdV90] B Van Sweden, B Kemp, HAC Kamphuisen, and EA Van der Velde. Alternative electrode placement in (automatic) sleep scoring (Fpz-Cz/Pz-Oz versus C4-At). Sleep, 13(3):279–283, 1990.
[Wan15] Xu Wang. Spectral convergence rate of graph Laplacian. arXiv preprint arXiv:1510.08110, 2015.
[WAO+04] Thomas Weber, Johann Auer, Michael F O’Rourke, Erich Kvas, Elisabeth Lassnig, Robert Berent, and Bernd Eber. Arterial stiffness, wave reflections, and the risk of coronary artery disease. Circulation, 109(2):184–189, 2004.
[WB17] Thomas Wiatowski and Helmut Bolcskei. A mathematical theory of deep convolutional neural networks for feature extraction. IEEE Transactions on Information Theory, 64(3):1845–1866, 2017.
[WC85] Ling Y Wei and Peter Chow. Frequency distribution of human pulse spectra. IEEE Transactions on Biomedical Engineering, 32(3):245–246, 1985.
[WCC+97] Yuh-Ying Lin Wang, CC Chang, JC Chen, H Hsiu, and WK Wang. Pressure wave propagation in arteries. A model with radial dilatation for simulating the behavior of a real artery. IEEE Engineering in Medicine and Biology Magazine, 16(1):51–54, 1997.
[WCLY14] Hau-Tieng Wu, Yi-Hsin Chan, Yu-Ting Lin, and Yung-Hsin Yeh. Using synchrosqueezing transform to discover breathing dynamics from ECG signals. Applied and Computational Harmonic Analysis, 36(2):354–359, 2014.
[WCW+91] Yuh Ying Wang, SL Chang, YE Wu, TL Hsu, and WK Wang. Resonance. The missing phenomenon in hemodynamics. Circulation research, 69(1):246–249, 1991.
[WCW98] Ian B Wilkinson, John R Cockcroft, and David J Webb. Pulse wave analysis and arterial stiffness. Journal of cardiovascular pharmacology, 32:S33–7, 1998.
[WH17] Philip Warrick and Masun Nabhan Homsi. Cardiac arrhythmia detection from ECG combining convolutional and long short-term memory networks. In 2017 Computing in Cardiology (CinC), pages 1–4. IEEE, 2017.
[WJS+04] Yuh-Ying Lin Wang, Ming-Yie Jan, Ching-Show Shyu, Chi-Ang Chiang, and Wei-Kung Wang. The natural frequencies of the arterial system and their relation to the heart rate. IEEE Transactions on Biomedical Engineering, 51(1):193–195, 2004.
[WLD+15] H.-T. Wu, G. F. Lewis, M. I. Davila, I. Daubechies, and S. W. Porges. Optimizing heart rate variability evaluation from pulse wave signals with the synchrosqueezing transform. Methods of Information in Medicine, in press, 2015.
[WLH+00] Yuh Ying Lin Wang, WC Lia, Hsin Hsiu, Ming-Yie Jan, and Wei Kung Wang. Effect of length on the fundamental resonance frequency of arterial models having radial dilatation. IEEE transactions on biomedical engineering, 47(3):313–318, 2000.
[WLW09] Nico Westerhof, Jan-Willem Lankhaar, and Berend E Westerhof. The arterial windkessel. Medical & biological engineering & computing, 47(2):131–141, 2009.
[Wom55] John R Womersley. Method for the calculation of velocity, rate of flow and viscous drag in arteries when the pressure gradient is known. The Journal of physiology, 127(3):553–563, 1955.
[WRSV08] Eric P Widmaier, Hershel Raff, Kevin T Strang, and Arthur J Vander. Vander’s Human physiology: the mechanisms of body function. Boston: McGraw-Hill Higher Education, 2008.
[WS01] Christopher KI Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Advances in neural information processing systems, pages 682–688, 2001.
[WTL14] Hau-tieng Wu, Ronen Talmon, and Yu-Lun Lo. Assess sleep stage by modern signal processing techniques. IEEE Transactions on Biomedical Engineering, 62(4):1159–1168, 2014.
[Wu11] Hau-Tieng Wu. Adaptive analysis of complex data sets. PhD thesis, Princeton University, 2011.
[Wu13] Hau-Tieng Wu. Instantaneous frequency and wave shape functions (I). Applied and Computational Harmonic Analysis, 35(2):181–199, 2013.
[WVD+12] Devy Widjaja, Carolina Varon, Alexander Dorado, Johan AK Suykens, and Sabine Van Huffel. Application of kernel principal component analysis for single-lead-ECG-derived respiration. IEEE Transactions on Biomedical Engineering, 59(4):1169–1176, 2012.
[WW+18] Hau-Tieng Wu, Nan Wu, et al. Think globally, fit locally under the manifold setup: Asymptotic analysis of locally linear embedding. The Annals of Statistics, 46(6B):3805–3837, 2018.
[WWH+19] Shen-Chih Wang, Hau-Tieng Wu, Po-Hsun Huang, Cheng-Hsi Chang, Chien-Kun Ting, and Yu-Ting Lin. Novel imaging revealing inner dynamics for cardiovascular waveform analysis via unsupervised manifold learning. arXiv preprint arXiv:1909.04206, 2019.
[XKX+13] Lulu Xie, Hongyi Kang, Qiwu Xu, Michael J Chen, Yonghong Liao, Meenakshisundaram Thiyagarajan, John O’Donnell, Daniel J Christensen, Charles Nicholson, Jeffrey J Iliff, et al. Sleep drives metabolite clearance from the adult brain. Science, 342(6156):373–377, 2013.
[XNC+18] Zhaohan Xiong, Martyn P Nash, Elizabeth Cheng, Vadim V Fedorov, Martin K Stiles, and Jichao Zhao. ECG signal classification for the detection of cardiac arrhythmias using a convolutional recurrent neural network. Physiological measurement, 39(9):094006, 2018.
[XSG+01] Xueyan Xu, Stephanie Schuckers, CHIME Study Group, et al. Automatic detection of artifacts in heart period data. Journal of electrocardiology, 34(4):205–210, 2001.
[XSS03] Q. Xi, A. V. Sahakian, and S. Swiryn. The Effect of QRS Cancellation on Atrial Fibrillatory Wave Signal Characteristics in the Surface Electrocardiogram. J. Electrocardiol., 36(3):243–249, 2003.
[XYD18] Jieren Xu, Haizhao Yang, and Ingrid Daubechies. Recursive diffeomorphism-based regression for shape functions. SIAM Journal on Mathematical Analysis, 50(1):5–32, 2018.
[XYS+13] Meng Xiao, Hong Yan, Jinzhong Song, Yuzhou Yang, and Xianglin Yang. Sleep stages classification based on heart rate variability and random forest. Biomedical Signal Processing and Control, 8(6):624–633, 2013.
[YAA+10] Bulent Yılmaz, Musa H Asyalı, Eren Arıkan, Sinan Yetkin, and Fuat Ozgen. Sleep stage and obstructive apneaic epoch classification using single-lead ECG. Biomedical engineering online, 9(1):39, 2010.
[Yan15] Haizhao Yang. Synchrosqueezed wave packet transforms and diffeomorphism based spectral analysis for 1D general mode decompositions. Applied and Computational Harmonic Analysis, 39(1):33–66, 2015.
[Yan17] Haizhao Yang. Multiresolution mode decomposition for adaptive time series analysis. arXiv preprint arXiv:1709.06880, 2017.
[YKC12] Can Ye, BVK Vijaya Kumar, and Miguel Tavares Coimbra. Heartbeat classification using morphological and dynamic features of ECG signals. IEEE Transactions on Biomedical Engineering, 59(10):2930–2941, 2012.
[YL89] FC Yin and ZR Liu. Estimating arterial resistance and compliance during transient conditions in humans. American Journal of Physiology-Heart and Circulatory Physiology, 257(1):H190–H197, 1989.
[YNS+15] J. Yang, M. N. Nguyen, P. P. San, X. Li, and S. Krishnaswamy. Deep convolutional neural networks on multichannel time series for human activity recognition. In IJCAI, pages 3995–4001, 2015.
[YYJG16] Yanqing Ye, Kewei Yang, Jiang Jiang, and Bingfeng Ge. Automatic sleep and wake classifier with heart rate and pulse oximetry: Derived dynamic time warping features and logistic model. In 2016 Annual IEEE Systems Conference (SysCon), pages 1–6. IEEE, 2016.
[ZBH+16] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530, 2016.
[ZG96] W. Zeng and L. Glass. Statistical properties of heartbeat intervals during atrial fibrillation. Physical Review E, 54(2):1779–1784, 1996.
[ZLB+14] S. Zeemering, T. A. R. Lankveld, P. Bonizzi, H. J. G. M. Crijns, and U. Schotten. Principal Component Analysis of Body Surface Potential Mapping in Atrial Fibrillation Patients Suggests Additional ECG Lead Locations. Computing in Cardiology, 41:893–896, 2014.
[ZVS84] Danguole Zemaityte, Giedrius Varoneckas, and Eugene Sokolov. Heart rhythm control during sleep. Psychophysiology, 21(3):279–289, 1984.
[ZW17a] J. Zhang and Y. Wu. Automatic sleep stage classification of single-channel EEG by using complex-valued convolutional neural network. Biomedical Engineering/Biomedizinische Technik, 2017.
[ZW17b] J. Zhang and Y. Wu. A new method for automatic sleep stage classification. IEEE Transactions on Biomedical Circuits and Systems, 2017.
[ZZ04] Z. Zhang and H. Zha. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Sci. Comput., 26:313–338, 2004.
Biography
John Malik is a Canadian PhD candidate in the Department of Mathematics at Duke
University with a completion date of May, 2020. He completed his undergraduate
and master’s degrees in mathematics at the University of Western Ontario. He
develops and uses tools from applied harmonic analysis, differential geometry, and
nonlinear time-frequency analysis to extract clinically relevant information from
biomedical time series. He is currently funded by the Myra and William Waldo
Boone Fellowship. Journal publications by the author include [MSW20] [MRWW17]
[MLW18] [MSWW19] [LLM+20] [LWM19] [FdMM17].