a geometric approach to biomedical time series analysis

A Geometric Approach to Biomedical Time Series Analysis

by

John Rajiv Vijh Malik

Department of MathematicsDuke University

Date:

Approved:

Hau-tieng Wu, Advisor

David Dunson

Sayan Mukherjee

James Nolen

Dissertation submitted in partial fulfillment of therequirements for the degree of Doctor of Philosophy

in the Department of Mathematicsin the Graduate School of

Duke University

2020

ABSTRACT

A Geometric Approach to Biomedical Time Series Analysis

by

John Rajiv Vijh Malik

Department of MathematicsDuke University

Date:

Approved:

Hau-tieng Wu, Advisor

David Dunson

Sayan Mukherjee

James Nolen

An abstract of a dissertation submitted in partial fulfillment of therequirements for the degree of Doctor of Philosophy

in the Department of Mathematicsin the Graduate School of

Duke University

2020

Abstract

Biomedical time series are non-invasive windows through which we may observe

human systems. Although a vast amount of information is hidden in the medical

field’s growing collection of long-term, high-resolution, and multi-modal biomedical

time series, effective algorithms for extracting that information have not yet been

developed. We are particularly interested in the physiological dynamics of a human

system, namely the changes in state that the system experiences over time (which

may be intrinsic or extrinsic in origin). We introduce a mathematical model for a

particular class of biomedical time series, called the wave-shape oscillatory model,

which quantifies the sense in which dynamics are hidden in those time series. There are

two key ideas behind the new model. First, instead of viewing a biomedical time series

as a sequence of measurements made at the sampling rate of the signal, we can often

view it as a sequence of cycles occurring at irregularly-sampled time points. Second,

the “shape” of an individual cycle is assumed to have a one-to-one correspondence with

the state of the system being monitored; as such, changes in system state (dynamics)

can be inferred by tracking changes in cycle shape. Since physiological dynamics are

not random but are well-regulated (except in the most pathological of cases), we can

assume that all of the system’s states lie on a low-dimensional, abstract Riemannian

manifold called the phase manifold. When we model the correspondence between

the hidden system states and the observed cycle shapes using a diffeomorphism, we

allow the topology of the phase manifold to be recovered by methods belonging to the

field of unsupervised manifold learning. In particular, we prove that the physiological

dynamics hidden in a time series adhering to the wave-shape oscillatory model can

be well-recovered by applying the diffusion maps algorithm to the time series’ set

of oscillatory cycles. We provide several applications of the wave-shape oscillatory

iv

model and the associated algorithm for dynamics recovery, including unsupervised and

supervised heartbeat classification, derived respiratory monitoring, intra-operative

cardiovascular monitoring, supervised and unsupervised sleep stage classification, and

f -wave extraction (a single-channel blind source separation problem).

v

Acknowledgements

Thank you to my supervisor, Hau-tieng Wu; my committee members; my collaborators

Yu-Ting Lin, Yu-Lun Lo, Chun-Li Wang, Elsayed Z. Soliman, Gi-Ren Liu, Nan

Wu, Chao Shen, and Yao Lu; my fellow graduate students at Duke University, the

University of Toronto, and the University of Western Ontario; the faculty members at

the University of Western Ontario; my family in Canada; and my girlfriend.

vi

Contents

Abstract iv

Acknowledgements vi

List of Tables xi

List of Figures xii

1 Introduction 1

1.1 Biomedical Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 The Wave-shape Oscillatory Model . . . . . . . . . . . . . . . . . . . 6

1.4 Dynamics in Chaotic Time Series . . . . . . . . . . . . . . . . . . . . 22

2 Diffusion Maps 24

2.1 Diffusion on a Measure Space . . . . . . . . . . . . . . . . . . . . . . 24

2.2 Diffusion on a Compact Domain . . . . . . . . . . . . . . . . . . . . . 27

2.3 Diffusion on a Manifold . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.3.1 Finite Samples . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.3.2 Robustness to Noise . . . . . . . . . . . . . . . . . . . . . . . 35

3 Cardiovascular Waveform Analysis 37

3.1 Heartbeat Classification . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.1.1 Unsupervised Extraction of Morphological Dynamics . . . . . 40

3.1.2 Supervised Ventricular Ectopy Detection . . . . . . . . . . . . 43

3.1.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.1.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

vii

3.1.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

3.2 Electrocardiogram-derived Respiration . . . . . . . . . . . . . . . . . 59

3.2.1 Modeling Respiratory Dynamics in the ECG . . . . . . . . . . 61

3.2.2 DM-EDR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

3.2.3 Material and Results . . . . . . . . . . . . . . . . . . . . . . . 64

3.3 Intraoperative Cardiac Monitoring . . . . . . . . . . . . . . . . . . . . 66

4 Decoding Sleep Dynamics using Multiple EEG Channels 74

4.1 Background and Significance . . . . . . . . . . . . . . . . . . . . . . . 75

4.2 Unsupervised Extraction of Sleep Dynamics . . . . . . . . . . . . . . 77

4.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

4.3.1 Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . . 81

4.3.2 Annotated Polysomnograms . . . . . . . . . . . . . . . . . . . 85

4.3.3 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . 86

4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

4.5 Discussion and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 88

4.5.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

5 Supervised Extraction of Sleep Dynamics using InstantaneousHeart Rate 92

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5.1.1 Heart Rate Variability and Sleep . . . . . . . . . . . . . . . . 93

5.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 95

5.2.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

5.2.2 Instantaneous Heart Rate . . . . . . . . . . . . . . . . . . . . 97

5.2.3 Convolutional Neural Network . . . . . . . . . . . . . . . . . . 98

viii

5.2.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . 98

5.2.5 Robustness Check . . . . . . . . . . . . . . . . . . . . . . . . . 100

5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5.3.1 Model Validation . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.3.2 Transfer Proficiency Among Monitoring Devices . . . . . . . . 103

5.3.3 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . 106

5.4 Further Exploration of the CNN via Data and Feature Visualization . 108

5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

5.5.1 Comparison with Previous Work . . . . . . . . . . . . . . . . 117

5.5.2 Limitations and Future Work . . . . . . . . . . . . . . . . . . 119

5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

5.7 Supplementary Results . . . . . . . . . . . . . . . . . . . . . . . . . . 121

6 Blind Source Separation in Atrial Fibrillation 125

6.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

6.1.1 DD-NLEM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

6.1.2 A Variation of DD-NLEM . . . . . . . . . . . . . . . . . . . . . 133

6.1.3 Material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

6.1.4 Other Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 136

6.1.5 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . 138

6.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

6.2.1 Comparison with Previous Algorithms . . . . . . . . . . . . . 141

6.2.2 Validation of the Proposed mVR Index . . . . . . . . . . . . . 142

6.3 Discussion and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 147

Bibliography 151

ix

Biography 187

x

List of Tables

3.1 Annotated single-lead ECG databases . . . . . . . . . . . . . . . . . . 47

3.2 Performance of the proposed ventricular ectopy detector . . . . . . . 54

3.3 Comparison with previous results on DS2 . . . . . . . . . . . . . . . . 55

4.1 Comparison of linear sensor fusion methods . . . . . . . . . . . . . . 88

4.2 F1 scores obtained by the sleep stage classifier when using differentpairings of EEG channels . . . . . . . . . . . . . . . . . . . . . . . . . 88

4.3 Performance of the proposed automatic sleep stage classifier . . . . . 88

5.1 Performance statistics for the CNN model trained on CGMH-training 103

5.2 Transfer proficiency of the CNN model among different monitoringdevices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

5.3 Sensitivity analysis for the CNN model trained on CGMH-trainingwith various input sizes . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.4 Performance statistics for the CNN models trained on the DREAMSSubjects Database and the UCDSADB . . . . . . . . . . . . . . . . . 109

5.5 Performance statistics for the REM vs. non-REM model trained onCGMH-training (REM) . . . . . . . . . . . . . . . . . . . . . . . . . 123

5.6 Performance statistics for the scattering transform-based model trainedon CGMH-training . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

6.1 Comparison of existing single-lead f -wave extraction algorithms fromthree viewpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

6.2 Comparison of f -wave extraction algorithms on simulated one-hourECG signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.3 Comparison of f -wave extraction algorithms on real one-hour Holtersignals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

xi

List of Figures

1.1 Simultaneously recorded 10-second ECG, PPG, and ABP signals . . . 4

1.2 The instantaneous amplitude of an ECG signal gives surrogate respira-tory information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3 The instantaneous frequency of an ECG signal is known as the instan-taneous heart rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.4 Landmarks on the enigmatic ECG template correspond to distinctstages of a healthy heart contraction . . . . . . . . . . . . . . . . . . 11

1.5 Atrial fibrillation is characterized by a lack of P waves, irregularly-occurring ventricular contractions, and a fibrillatory wave . . . . . . . 12

1.6 ECG signals featuring premature ventricular contractions are difficultto model because the wave-shape is not slowly varying . . . . . . . . 16

1.7 An illustration of the wave-shape manifold in Theorem 1.3.3 . . . . . 20

3.1 A pathological ECG signal featuring morphologically distinct prematureventricular contractions and fusion beats . . . . . . . . . . . . . . . . 43

3.2 A mapping of N = 2953 cardiac cycles into R2 afforded by the top twocoordinates of the time-1 diffusion map . . . . . . . . . . . . . . . . . 44

3.3 A projection of the morphological dynamics on the recovered wave-shape manifold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.4 A mapping of the set of QRS complexes to R2 using the top twonon-trivial coordinates of the time-1 diffusion map . . . . . . . . . . . 65

3.5 The EDR signal (obtained from either the traditional method or theDM-EDR algorithm) well-approximates the respiratory flow signal . . . 66

3.6 Morphological differences between normal and PVC beats are reflectedin the top non-trivial eigenvector of the diffusion kernel . . . . . . . . 67

3.7 ECG pulses occuring preceding, during, and following the MI event . 68

xii

3.8 A mapping of the ECG waveforms to R3 afforded by the first threenon-trivial coordinates of the diffusion map . . . . . . . . . . . . . . . 71

3.9 ABP pulses preceding, during, and following the MI event . . . . . . 73

3.10 A mapping of the ABP pulse waveforms to R2 afforded by the first andthird non-trivial coordinates of the diffusion map . . . . . . . . . . . 73

4.1 A mapping of N = 734 locally averaged, time-localized power spectrainto R3 afforded by the top three non-trivial coordinates of the time-1diffusion map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

4.2 A projection of the morphological dynamics on the recovered phasemanifold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5.1 The architecture of the 1-dimensional convolutional neural network . 99

5.2 The architecture of a single convolution block . . . . . . . . . . . . . 99

5.3 We apply the trained CNN model to each subject in CGMH-validationindividually . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

5.4 We apply the trained CNN model to each subject in the UCDSADBindividually . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

5.5 We construct input signals of various lengths by additionally consideringdata which precedes the labeled epoch . . . . . . . . . . . . . . . . . 106

5.6 We illustrate the features captured by the trained CNN model . . . . 111

5.7 We assess the predictive potential of the 10 features learned by theCNN model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

5.8 Typical activations in the last convolutional layer of the CNN model . 113

6.1 An ECG signal featuring atrial fibrillation . . . . . . . . . . . . . . . 126

6.2 After running each of the seven algorithms on a simulated, one-hourECG signal featuring Af, we plot the extracted f -waves . . . . . . . . 143

xiii

6.3 After running each of the five previous algorithms on a real, one-hourHolter recording featuring Af, we plot the extracted f -waves . . . . . 144

6.4 After running DD-NLEM and its variation NLEM on a real, one-hour Holterrecording featuring Af, we plot the extracted f -waves . . . . . . . . . 146

6.5 A scatter plot of the mVR (mean modified ventricular residue) indexversus the normalized mean squared error (NMSE) . . . . . . . . . . 147

xiv

Chapter 1

Introduction

1.1 Biomedical Time Series

Biomedical time series (vital signs) are obtained when energy emitted by the human

body is passed through a transducer to a digital recording device. A variety of

energy mediums can be digitized, including electrical energy, fluid pressure, vibration,

gas partial pressure, and sound. Examples of biomedical time series include the

electrocardiogram (ECG), the electroencephalogram (EEG), the photoplethysmogram

(PPG), the electrooculogram (EOG), and the electromyogram (EMG). Some recording

tools are invasive, but others are non-invasive; some tools are active, but others observe

passively. Vital signs are windows through which we may view the physiological

state of systems within the human body. The ECG observes the cardiac system, the

EEG observes the nervous system, the PPG observes the venous system, the EOG

documents eye movement, and the EMG observes the musculoskeletal system.

In most cases, a fixed number of measurements (called the sampling rate of the

signal) are made per second. Vital signs recorded at higher sampling rates contain

larger amounts of information but require increased processing power and more storage

space. Vital signs can be recorded for any amount of time, provided enough storage

space is available. Multiple recordings may be made simultaneously, in which case

information from a variety of viewpoints can be fused. On one hand, the resolution,

length, and number of modes in a biomedical time series all add richness to the

amount of information documented by the time series, but on the other hand, these

factors add complexity to the task of information extraction. The task of information

extraction is also complicated by the fact that recordings are noisy, contain nuisance

1

background information, are rarely of pristine quality, and lack spatial resolution.

Biomedical time series are used to guide medical decision-making. Time series are

typically assessed by the eye of a trained professional. This process is expensive and

error-prone. When appropriate, the task of signal annotation should be diverted to

an intelligent, computer-based algorithm. In fact, as technological advancements in

hardware and processing power make obtaining physiological data much easier, the

growing wealth of available physiological data is driving the necessity of automatic

annotation algorithms. A plethora of data exists, but effective analysis algorithms do

not. Finally, computer-based algorithms are necessary as the human eye is not capable

of detecting finer properties of high-resolution signals and compiling information from

a growing number of sources.

Biomedical time series are used in a variety of settings. The iconic 12-lead

surface ECG is used to diagnose cardiac arrhythmia and assess the need for a

rhythm intervention. These recordings are typically 10 seconds long, and aggregating

the unique perspectives provided by the individual recordings (made from different

locations on the body) allows an electrophysiologist to pinpoint the location of a

failing heart structure. The cardiac system is also studied in long-term settings; heart

rate information is collected over a number of days using compact mobile monitoring

devices, and scientists connect the presence of low-frequency oscillations in heart rate

(heart rate variability, or HRV) to patient outcomes. Large databases (with thousands

of subjects) are analyzed offline to identify modifiable risk factors present in the ECG

signal. In the operating room, the EEG-derived bispectral index (BIS) is used to

monitor depth of anesthesia. The arterial blood pressure signal (ABP, an invasive

measurement of the fluid pressure in the arteries) monitors cardiovascular condition

in the operating room, the emergency room, and the intensive care unit (ICU). In

the sleep laboratory, a collection of EEG, EOG, and EMG signals mark transitions

2

between sleep stages during a 6-8 hour night of sleep. Finally, the market for wearable

health monitoring devices has exploded in the past decade, and these inexpensive

devices are additionally being used to screen patients in developing countries, where

access to trained professionals may be limited.

Broadly speaking, biomedical time series can be divided into two classes. In the

first class lies signals such as the ECG, PPG, and ABP that are sequences of disjoint

cycles (see Figure 1.1). Instead of viewing such signals as a series of measurements

made at the sampling rate of the signal, we can view them as a sequence of heartbeats

occurring at irregularly-sampled time points. From this viewpoint, we can make

precise the principles guiding traditional approaches to electrophysiology; the shape of

each individual cycle corresponds to the state of the system during its appearance in

the time series, and changes in cycle shape are reflections of changes in system state.

The contributions of this thesis include a model describing this class of oscillatory

biomedical time series, called the wave-shape oscillatory model, and a model for

the correspondence between cycle shape and system state called the wave-shape

manifold model. The second class of signals includes the EEG, EOG, and EMG

which are chaotic in nature and do not feature clear cycles. Nevertheless, their spectral

properties reflect the state of the systems being monitored, and we can analyze how

their spectral properties change from time to time using time-frequency analysis

techniques such as the short-time Fourier transform (STFT) and the continuous

wavelet transform (CWT). Beginning in the third chapter, each succeeding chapter

in this thesis is devoted to a particular subset of biomedical time series. We analyze

oscillatory signals in the time domain, and we analyze chaotic signals in the frequency

domain. The remainder of this chapter will feature a discussion of dynamics in

biomedical time series (the primary information we are interested in assessing),

the wave-shape oscillatory model, and the wave-shape manifold model. This work

3

originally appeared in [LMW19]. The second chapter will review a manifold learning

technique, diffusion maps (DM), which allows us to perform dynamics extraction in

the third chapter and onward.

Figure 1.1: Simultaneously recorded 10-second ECG (top), PPG (middle), and ABP(bottom) signals. Cardiac cycles are delineated to the left and to the right of thecoloured markers. Note that the cycles in the three time series are not preciselyaligned because these signals measure cardiovascular activity from a variety of placesin the body and through different mediums.

1.2 Dynamics

The dynamics of a human system are the changes in state it experiences due to

interactions with an internal regulator such as the autonomic nervous system, periph-

eral systems, and the environment outside the body. For example, a healthy sleeper

4

naturally oscillates between the REM, N1, N2, and N3 sleep stages but wakes up

when externally stimulated. We are interested in detecting and quantifying these

changes in state. To build a rigorous algorithm which extracts dynamical information

from a biomedical time series, we need to fix a reasonable model for physiological

dynamics, and we need to clarify the sense in which dynamics are manifested in the

time series.

A model for physiological dynamics is a collection of properties which any real set

of physiological dynamics could, within reason, be assumed to possess. This collection

of properties should be general enough to encapsulate as many types of physiological

dynamics as possible. Having fixed a system of interest, we assume the state-space

model for dynamics, in which

the collection of system states is a smooth, compact, low-dimensional Riemannian

manifold N called the phase manifold;

the state of the system at time t ∈ R is described by ψ(t), where ψ:R→ N .

We assume no additional properties for ψ. However, in some cases, it will be useful to

assign additional regularity to ψ. For example, we could consider ψ to be driven by

a vector field on N , in the sense that ψ′(t) = Xψ(t) (in which case ψ is an integral

curve), or ψ′(t) = Xψ(t) + dωt, where dω is the derivative of Brownian motion. Note

that N is not assumed to be connected or lack boundary. The state-space model for

dynamics was developed by Boltzmann, Poincare, and Gibbs, and it is commonly

considered in dynamical systems theory to model and predict physical phenomena.

From a physiological viewpoint, the state-space model for dynamics, including the

low dimensionality of the phase manifold, is reasonable due to the phenomenon of

homeostasis, wherein a human system remains at equilibrium when perturbed. As a

consequence, the states we observe are not random but are well-constrained. On the

5

other hand, when a system is perturbed, the response of the perturbed system is to

initiate a feedback mechanism through which the system may return to an equilibrium.

Negative feedback mechanisms bring the system back to the same equilibrium, and

positive feedback mechanisms bring the system to a new equilibrium. In this sense,

transitions between equilibria are also well-constrained by the scope of available

feedback mechanisms.

A full recovery of the dynamics is an isometric embedding Θ from the abstract

phase manifold N to a Hilbert space. In this case, the geodesic distance between two

points on N is equal to the `2 distance between their images in the Hilbert space.

Moreover, the mapping Θ pushes the time-dependent ψ to a curve Θ ψ on the

embedded phase manifold, which is a new time series summarizing the evolution of

system state over time. Our recovery is not required to quantify system states in

an absolute sense, but only in a relative sense, as our recovery is unique only up to

isometry. A partial recovery of the dynamics occurs when Θ is simply an embedding of

the phase manifold; in many cases, the observation process distorts the phase manifold,

making a recovery of the metric almost impossible. Nevertheless, an embedding of

the phase manifold will preserve topological information.

1.3 The Wave-shape Oscillatory Model

In this section, we make precise the signals that belong to our oscillatory class by

defining the wave-shape oscillatory model. We describe how dynamics are manifested

in these oscillatory biomedical time series. In particular, we define the wave-shape

manifold, through which we may access the phase manifold.

Definition 1.3.1. A biomedical time series f :R → R adheres to the wave-shape

6

oscillatory model with wave-shape manifold M⊂ L2(R) if

f(t) =∑j≥1

sj(t− tj) + Φ(t) t ∈ R, (1.1)

where tj ∈ R is the time at which the j-th cycle occurs, sj ∈M is the j-th cycle, and

Φ:R→ R is stationary random noise.

Definition 1.3.2. Let H be the subspace of C2(R) consisting of functions whose

support is a subset of [−1/2, 1/2]. A wave-shape manifold is a smooth, compact, and

low-dimensional Riemannian manifold embedded in H.

We describe the correspondence between system state and cycle shape using a

diffeomorphism ρ:N →M which encodes signal-specific distortions incurred during

the observation process. The wave-shape manifold takes an approach to modeling

signals like the ECG, PPG, and ABP that generalizes previous mathematical models.

The Phenomenological Model

To begin describing these previous mathematical models and their relationships to

the wave-shape oscillatory model, consider a time series of the form

f(t) = s(t) t ∈ R, (1.2)

where s ∈ C1,α(R) is 1-periodic (in the sense that s(t) = s(t + 1) for all t ∈ R)

and possibly non-sinusoidal. In particular, the function s on the interval [−1/2, 1/2]

describes the “shape” of the cycles appearing in the time series. This simple model is

yet insufficient for describing physiological time series because physiological time series

almost always vary in frequency and amplitude over time (due to the dynamics of the

underlying system). The phenomenological model proposed in [Wu13, Yan15, HS16]

takes amplitude and frequency modulation into account. The oscillatory time series

f :R→ R is modeled as

f(t) = a(t)s(φ(t)) + T (t) + Φ(t) t ∈ R, (1.3)

7

where

a ∈ C1(R) ∩ L∞(R) is a positive, smooth function describing amplitude modu-

lation;

s ∈ C1,α(R) is 1-periodic and describes the “shape” of the oscillations in the

time series (coined the wave-shape function [Wu13]);

φ ∈ C2(R) is a smooth, monotonically-increasing function describing phase

modulation (φ′ ∈ L∞(R) is termed the instantaneous frequency of f);

T ∈ C1(R) is the trend that intuitively is “locally constant” [CCW14, (6)];

Φ:R→ R models the inevitable random noise, which we assume to be stationary.

In this work, we always assume that the trend has been removed. For this reason, we

do not mention a trend term in the definition of the wave-shape oscillatory model.

Since a and φ′ are not required to be constant, this model is suitable for describing

physiological time series that change in amplitude and frequency over time (with

some degree of regularity). Under this phenomenological model, the usual mission in

time series analysis is estimating a, s, φ, and the statistical properties of Φ from one

realization of f . To ensure identifiability, we need to assume that a and φ′ are slowly

varying. To be specific, we say that a and φ′ are slowly varying with parameter ε > 0

if the following conditions hold.

|a′(t)|≤ εφ′(t) for all t ∈ R;

|φ′′(t)|≤ εφ′(t) for all t ∈ R.

When we recover a and φ, we are recovering dynamical information about the system

being monitored in a limited (but in many cases sufficient) sense. When we recover s,

we are recovering a representation of the “mean” state for the system. This statement

8

will be made more precise later, but it corresponds well with traditional approaches

to electrophysiology. We now provide some specific examples of how a and φ encode

the changing physiological state of the system being monitored.

Example 1. The instantaneous amplitude of a single-lead ECG signal is directly

related to respiratory volume (see Figure 1.2). When the lung is full of air, the ECG

electrode moves further from the heart and thoracic impedance increases, causing

the amplitude of the recorded ECG signal to decrease. On the other hand, when the

lung is empty, the ECG electrode moves closer to the heart and thoracic impedance

decreases [CAM+06, Chapter 8]. This relationship between respiratory volume and

the amplitude of the ECG signal has lead to the design of algorithms which estimate

the respiratory signal from the ECG signal. The estimated respiratory signal is called

the ECG-derived respiration (EDR) signal. We discuss EDR signals in Section 3.2.

Figure 1.2: The instantaneous amplitude of an ECG signal gives surrogate respiratoryinformation. Top: we plot a respiratory flow signal in blue; bottom: we plot thesimultaneously-recorded ECG signal and illustrate its amplitude modulation usingthe dotted pink lines.

9

Example 2. While the pulse rate of the sinoatrial (SA) node is constant, heart rate

is generally not constant [SMZ14, Gla09]. This discrepancy is caused by neural and

neuro-chemical influences on the pathway from the SA node to the cardiac muscles.

Heart rate can be modeled as the instantaneous frequency of the ECG signal (see

Figure 1.3); variations in heart rate (which correspond to frequency modulation of the

ECG signal) are the main object of study in the field of heart rate variability (HRV)

analysis. HRV analysis will be studied in Chapter 5.

Figure 1.3: The instantaneous frequency of an ECG signal is known as the instanta-neous heart rate. In red, we plot the instantaneous heart rate signal corresponding tothe ECG signal in black.

The wave-shape function s deserves some more discussion. In particular, we

discuss the physiological information it encodes and how recovering s is akin to

traditional electrophysiological practices. When analyzing ECG, electrocardiologists

derive physiological information from the cardiac waveform independently of the

amplitude and frequency modulation inherent in the signal. Their approach could be

called a landmark approach to biomedical time series analysis, wherein they notice

that the components of the cardiac waveform (as manifested in the ECG) correspond

10

to different stages of heart contraction (and hence the heart substructures that are

active during those stages). Landmarks on the enigmatic template are conventionally

labeled as shown in Figure 1.4, and each deflection corresponds to a particular stage

of heart contraction.

P

Q

R

S

T

U

Figure 1.4: Landmarks on the enigmatic ECG template correspond to distinct stagesof a healthy heart contraction. The P wave corresponds to the depolarization of theatria, the QRS complex corresponds to the depolarization of the ventricles, the Twave corresponds to the repolarization of the ventricles, and the U wave correspondsto the repolarization of the papillary muscles.

Example 3. When diagnosing atrial fibrillation (a cardiac arrhythmia associated

with heart failure and stroke) using the ECG signal, cardiologists look for the absence

of P waves, among other considerations (see Figure 1.5). The cardiac cycle is assessed

independently of the frequency modulation caused by heart rate, the amplitude

modulation caused by respiration, and the noise in the signal. In atrial fibrillation, an

additional confounding factor when recovering the cardiac waveform s is the presence

of a noise-like component called the fibrillatory wave (f -wave). Atrial fibrillation,

including its modeling and analysis, is further discussed in Chapter 6.

Example 4. When diagnosing myocardial infarction (blockage of the coronary

artery), electrocardiologists look for a phenomenon called ST segment elevation,

11

Figure 1.5: Atrial fibrillation is characterized by a lack of P waves, irregularly-oc-curring ventricular contractions, and a rapid component called the fibrillatory wave.

wherein the cardiac waveform in any precordial lead reaches an electric potential of

approximately 0.2 mV between the S and T landmarks. While a high heart rate

usually accompanies a heart attack, assessing the cardiac waveform independently of

the heart rate allows physicians to detect the difference between, for example, high

heart rates that accompany physical activity and high heart rates that accompany

adverse cardiac events. This phenomenon will be studied in Section 3.3.

Example 5. In ECG-based cardiac waveform analysis, the QT interval is the length

of time between the start of the Q wave and the end of the T wave. Since heart rate

is known to be negatively correlated with cardiac cycle duration, QT interval lengths

are commonly “corrected” so that the QT interval lengths of subjects with different

heart rates can be effectively compared. This correction process roughly amounts to

recovering s independently of any frequency modulation. An abnormal QT interval is

associated with an increased risk for sudden cardiac death. In clinical trials for new

medications, pharmaceutical companies use the corrected QT interval to assess the

increased risk for sudden cardiac death that a patient taking the new drug will incur.

We return to a mathematical discussion of the wave-shape function. Due to the

12

smoothness of s, it has a point-wise Fourier representation

s(t) = α0 +∞∑k=1

αk cos(2πkt+ βk) t ∈ R, (1.4)

where α0 ∈ R and αk ≥ 0 are associated with the Fourier coefficients of s, and

βk ∈ [0, 2π). Hence, we have the following expansion for a(t)s(φ(t)) in (1.3):

a(t)s(φ(t)) = a(t)[α0 +

∞∑k=1

αk cos(2πkφ(t) + βk)]

t ∈ R. (1.5)

The signal a(t)s(φ(t)) can be interpreted in two different ways. First, we could view

it as an oscillatory signal with one oscillatory component; in this case, the oscillation

is non-sinusoidal. Second, we could view it as an oscillatory signal with multiple

oscillatory components, each having a cosine oscillatory pattern (1.5); in this case, we

call the first oscillatory component a(t)α1 cos(2πφ(t) + β1) the fundamental compo-

nent and a(t)αk cos(2πkφ(t) + βk) (for k ≥ 2) the k-th multiple of the fundamental

component. Clearly, α0 is the zero-frequency term of the wave-shape function, and the

instantaneous frequency of the k-th multiple is k-times that of the fundamental compo-

nent. The second viewpoint is easier to analyze theoretically (e.g. via time-frequency

methods such as the short-time Fourier transform and its associated synchrosqueezed

representation). However, as described in the above examples, the first viewpoint is

more physiological and coincides well with traditional electrophysiological practices.

Physicians diagnose various cardiac diseases by reading the wave-shape function.

However, physicians read the waveform in the time domain and not in terms of

the waveform’s Fourier decomposition. We have thus confirmed the utility of the

wave-shape function for modeling and studying an oscillatory biomedical time series.

13

Generalizing the Phenomenological Model

While the phenomenological model (1.3) is able to capture the non-sinusoidal nature

of cycles in an oscillatory biomedical time series, it does not capture time-dependent

changes in oscillatory morphology (aside from those deviations due to amplitude and

frequency modulation). Physiologically, this time-varying oscillatory morphology is

important. Modeling changes in oscillatory morphology is especially relevant due to

the increased prevalence of long-term, mobile cardiac monitoring.

Example 6. During a surgical procedure in which a general anesthetic is required,

dosage is varied and administered based on the discretion of the supervising anesthesi-

ologist or nurse anesthetist; vital signs are recorded for the duration of the procedure,

which may be several hours. The changing concentration of anesthetic in the body will

continuously modulate the cardiac waveform of the patient. The dynamics underlying

general anesthesia are discussed in [WWH+19], and the analysis is extended to ABP

signals.

Example 7. In the mobile monitoring of patients suspected of having heart disease,

recordings are many hours long. However, adverse cardiac events such as paroxys-

mal supraventricular tachycardia or myocardial ischemia are only a few minutes in

duration; the rest of the time, the patient appears normal. In this case, the cardiac

waveform changes shape depending on the modulating cardiac health of the patient

being monitored.

Motivated by the above examples, the phenomenological model is generalized

to fully capture the time-varying morphology of cycles [LSW18]. A similar model

is considered in the followup research articles [XYD18, Yan17]. The main idea is

intuitive. We generalize αk∞k=0 in (1.4) to be time-varying: for each k ≥ 0 we let

14

a(t)αk be a new function Ak(t). We also let each kφ(t) + βk/(2π) be a new function

φk(t). We then have

a(t)s(φ(t)) = A0(t) +∞∑k=1

Ak(t) cos(2πφk(t)) t ∈ R, (1.6)

where Ak and φk satisfy a few conditions. In addition to the slowly varying assumption

imposed in the phenomenological model, we have

|φ′k(t)− kφ′1(t)|≤ εφ′1(t) for all k ∈ N+, where φ′1 > 0;

Ak(t) ≤ ckA1(t) for all k ∈ N+, where A1 > 0 and ck∞k=1 is an `1 sequence;

|φ′′k(t)|≤ εkφ′1(t) for all k ∈ N+;

|A′k(t)|≤ εckφ′1(t) for all k ∈ N+.

Other regularity conditions listed in [LSW18, Definition 2.1] also hold. Here, the

conditions |φ′k(t) − kφ′1(t)|≤ εφ′1(t) and Ak(t) ≤ ckA1(t) capture the fact that the

wave-shape is not fixed. The conditions |φ′′k(t)|≤ εkφ′1(t) and |A′k(t)|≤ εckφ′1(t) capture

the fact that the wave-shape may not change dramatically from one cycle to the next.

While this model has been used to design algorithms which handle various phys-

iological problems such as fetal ECG analysis [SW17b], fetal magnetocardiography

[EVWFE18], simultaneous heart rate and respiratory rate estimation from the PPG

[CW17], and cardiogenic artifact recycling [LWM19], its dependence on the “slowly

varying wave-shape” assumption limits its applicability to physiological time series.

Moreover, since the generalized phenomenological model [LSW18] encodes the time-

varying nature of the wave-shape function in the frequency domain, the wave-shape

function interacts with the instantaneous frequency in a non-trivial way, which further

limits its ability to independently quantify dynamics encoded by the instantaneous

frequency and the time-varying wave-shape. Specifically, when the subject is not of

normal physiology, the model is limited.

15

Example 8. Premature ventricular contractions are heartbeats which are not trig-

gered by the SA node but instead originate in the ventricles (the lower compartment

of the heart). The frequency at which these beats occur is associated with congestive

heart failure [DDV+15]. Premature ventricular contractions manifest as the heart

“skipping a beat” and appear at seemingly random times. Morphologically, they appear

very different from beats triggered by the SA node, and a signal featuring premature

ventricular contractions will not adhere to the “slowly varying wave-shape” assump-

tion (see Figure 1.6). A patient may have multiple morphologically distinct types of

premature ventricular contractions, in which case his or her risk for cardiac disease

increases. Premature ventricular contractions will be discussed later in Section 3.1.2.

Figure 1.6: ECG signals featuring premature ventricular contractions are difficultto model because the wave-shape is not slowly varying. Premature ventricularcontractions are morphologically distinct from normal beats and are not triggered bythe SA node.

We refer readers with interest in the generalized phenomenological model to

[LSW18, XYD18, Yan17] for its theoretical details and to [SW17b, EVWFE18, CW17,

LWM19] for applications. As useful as this model is, it cannot capture the (non-

slowly) time-varying morphology of cycles in an oscillatory biomedical time series.

Specifically, we have interest in those dynamics which are encoded by variations in

cycle morphology that are independent of amplitude and frequency modulation.

16

Connecting the Phenomenological Model to the Wave-shape Oscillatory

Model

We show that for a given physiological signal satisfying the phenomenological model,

the collection of all oscillatory patterns can be well-approximated by a one-chart

manifold. The slowly varying assumption is used to show that each oscillatory

pattern in the physiological signal is “close” to one whose amplitude and frequency

are constant. The following theorem says that the collection of all such constant-

amplitude, constant-frequency oscillatory patterns is a one-chart manifold. We then

use this opportunity to simulate such a one-chart wave-shape manifold and visualize

its structure.

Theorem 1.3.3. Suppose s ∈ C2(R) is a wave-shape function whose support is a

subset of [−1/2, 1/2]. Let U = I1 × I2, where I1 ⊂ (0,∞) and I2 ⊂ (1,∞) are two open

intervals of finite lengths. Define a map Φ:U → H ⊂ L2(R) by sending an ordered

pair (a, f) ∈ U to the function R 3 t 7→ as(ft). Then Φ is a C1 diffeomorphism onto

an open subset M := Φ(U) ⊂ L2(R); that is, M is a manifold with one chart.

Proof. Note that since s ∈ C2c (R), Φ(U) ⊂ C2

c (R). Suppose (a1, f1) 6= (a2, f2) while

‖Φ(a1, f1)− Φ(a2, f2)‖2= 0. (1.7)

We abuse the notation and denote Φ(a1, f1) = a1s(f1t) to simplify the discussion

when there is no danger of confusion. Since s continuous, we have

a1s(f1t) = a2s(f2t) (1.8)

for all t ∈ R. Suppose a1s(f1t) has support [a, b] ⊂ [−1/2f1, 1/2f1]. Without loss

of generality, assume f2 > f1, a < 0, and b > 0. The support of a2s(f2t) is thus

[af1/f2, bf1/f2] ⊂ [a, b]. Hence, we can find x ∈ [a, af1/f2] ∪ [bf1/f2, b] so that

17

a1s(f1x) 6= 0 but a2s(f2x) = 0, which contradicts (1.8). As a result, we have shown

that Φ is one-to-one and onto M := Φ(U).

We claim that the total differential of Φ is

DΦ|(a0,f0)(h1, h2) = h1s(f0t) + h2a0ts′(f0t), (1.9)

where (a0, f0) ∈ U and (h1, h2) ∈ R2. Note that Φ is a Hilbert space-valued function.

Indeed, when (h1, h2) = (a− a0, f − f0), we have

Φ(a, f)− Φ(a0, f0)−DΦ|(a0,f0)(a− a0, f − f0) (1.10)

= as(ft)− a0s(f0t)− [(a− a0)s(f0t) + (f − f0)a0ts′(f0t)]

= a0[s(ft)− s(f0t)− (f − f0)ts′(f0t)] + (a− a0)(s(ft)− s(f0t)).

By the integral form of Taylor’s expansion, we have

s(ft)− s(f0t) =

∫ f

f0

ts′(zt) dz (1.11)

and

s(ft)− s(f0t)− (f − f0)ts′(f0t) =

∫ f

f0

(f − z)t2s′′(zt) dz. (1.12)

Here, we view s(ft) as a function of f . Next, we bound the L2 norm of (1.10). Indeed,

we have ∥∥∥∫ f

f0

(f − z)t2s′′(zt) dz∥∥∥L2

(1.13)

=

∫ ∫ f

f0

∫ f

f0

(f − z)t2s′′(zt)(f − z′)t2s′′(z′t) dzdz′dt

=

∫ f

f0

∫ f

f0

(f − z)(f − z′)[ ∫

t2s′′(zt)t2s′′(z′t) dt]dzdz′.

Note that since s ∈ C2c (R), we have∣∣∣ ∫ t2s′′(zt)t2s′′(z′t) dt

∣∣∣ ≤ (∫ t4|s′′(zt)|2 dt)1/2(∫

t4|s′′(z′t)|2 dt)1/2

=C

zz′

18

for some C > 0 depending on the fourth absolute moment of s′′ only. As a result,∥∥∥∫ f

f0

(f − z)t2s′′(zt) dz∥∥∥L2≤ C

∫ f

f0

∫ f

f0

|(f − z)(f − z′)|zz′

dzdz′ (1.14)

≤ C

f 20

(∫ f

f0

|f − z| dz)2

=C|f − f0|4

4f 20

.

Similarly, we can bound∫ ff0ts′(zt) dz. Therefore,

‖Φ(a, f)− Φ(a0, f0)−DΦ|(a0,f0)(a− a0, f − f0)‖L2≤ C|f − f0|4, (1.15)

which leads to

‖Φ(a, f)− Φ(a0, f0)−DΦ|(a0,f0)(a− a0, f − f0)‖L2

‖(a− a0, f − f0)‖→ 0 (1.16)

when ‖(a− a0, f − f0)‖→ 0. To finish the proof, we show that the total differential of

Φ at (a, f) ∈ U , DΦ|(a,f), is of full rank for any (a, f) ∈ U . It suffices to show that

s(ft) and ats′(ft) are linearly independent in L2(R). Suppose there are constants

c1, c2 ∈ R such that for all t ∈ R,

c1s(ft) = c2ats′(ft). (1.17)

Suppose c1 6= 0. In this case, c2 6= 0; otherwise, we have a contradiction. Since

s ∈ C2c (R), there exists t 6= 0 such that s(ft) 6= 0 is the extremal value of s; that

is, fs′(ft) = 0. Since f > 0, s′(ft) = 0. Therefore, we have c1s(ft) 6= 0 but

c2ats′(ft) = 0, which is a contradiction. As a result, we conclude that c1 = 0. In this

case, c2 must be 0; otherwise, we have a contradiction. This concludes the claim that

DΦ|(a,f) is of full rank. We conclude that Φ:U →M is a differomorphism.

See Figure 1.7 for an example of the wave-shape manifold in Theorem 1.3.3. We can

clearly see the nonlinear structure of the one-chart wave-shape manifold determined

by the 1-periodic wave-shape function. Next, we show that for a signal satisfying the

phenomenological model (1.3), it can be well-approximated by the manifold indicated

in Theorem 1.3.3.

19

Figure 1.7: An illustration of the wave-shape manifold in Theorem 1.3.3. Thewave-shape function (left) is the db4 wavelet. We scale and dilate s by independentlysampling amplitudes uniformly from I1 = [3/4, 5/4] and frequencies uniformly fromI2 = [1, 9]. On the right, we visualize the wave-shape manifold by linearly projectinga set of N = 2500 discretized wave-shape functions from R7169 to R3. Each pointis a wave-shape function t 7→ as(ft); red corresponds to high frequencies, and bluecorresponds to low frequencies.

Theorem 1.3.4. Take ε > 0 to be sufficiently small. Consider f :R→ R satisfying

the phenomenological model (1.3):

f(t) = A(t) s (φ(t)) . (1.18)

Assume without loss of generality that inft∈RA(t) > 0 and inft∈R φ′(t) = 1. Define

tn := φ−1(n), where n ∈ Z, and define functions on R as

gn(t) :=

A(tn)s(φ′(tn)t) if −1

2φ′(tn)≤ t ≤ 1

2φ′(tn)

0 otherwise,(1.19)

and

fn(t) :=

A(tn + t)s(φ(tn + t)) if −1

2φ′(tn)≤ t ≤ 1

2φ′(tn)

0 otherwise.(1.20)

We then have uniformly over n that

‖gn − fn‖∞≤ Cε , (1.21)

where C = C(‖A‖∞, ‖φ′‖∞, ‖s‖C1 , ‖φ′′‖∞).

20

Proof. Note that for each n ∈ Z, gn and fn are all supported in [−1/2, 1/2]. By a direct

calculation, we have

|gn(t)− fn(t)|≤|A(tn)− A(tn + t)||s(φ′(tn)t)| (1.22)

+ A(tn + t)|s(φ′(tn)t)− s(φ(tn + t))| .

for all t ∈ [−1/2, 1/2], and |gn(t)− fn(t)|= 0 for all t /∈ [−1/2, 1/2]. By a direct bound,

we have

|A(tn)− A(tn + t)| ≤∫ tn+t

tn

|A′(x)| dx ≤ ε

∫ tn+t

tn

[φ′(tn) +

∫ x

tn

φ′′(z) dx] dx

≤ εφ′(tn)|t|+ε12‖φ′′‖∞|t|2≤

1

2ε[φ′(tn) +

1

4‖φ′′‖∞

], (1.23)

where the last bound holds since we only need to control t ∈ [−1/2, 1/2]. For the other

term, note that

s(φ(tn + t)) = s

(φ(tn) + φ′(tn)t+

∫ tn+t

tn

φ′′(z)z dz

)(1.24)

= s

(φ′(tn)t+

∫ tn+t

tn

φ′′(z)z dz

)

since φ(tn) ∈ Z and s is 1-periodic. Therefore, by denoting J :=∫ tn+t

tnφ′′(z)z dz, we

have

|s(φ′(tn)t)− s(φ(tn + t))| ≤∫ tn+J

tn

|s′(x)| dx = J‖s′‖∞. (1.25)

J is controlled in the same way:

|J | ≤ ε

∫ tn+t

tn

φ′(z)z dz ≤ ε

∫ tn+t

tn

z[φ′(tn) +

∫ z

tn

φ′′(w) dw]dz

≤ ε(φ′(tn)|t|2+

1

3‖φ′′‖∞|t|3

)≤ 1

4ε(φ′(tn) +

1

6‖φ′′‖∞

).

As a result, we obtain the claim with C = C(‖A‖∞, ‖φ′‖∞, ‖s‖C1 , ‖φ′′‖∞).

21

A direct consequence of this theorem is that gnn∈Z is a subset of the one-

chart manifold M := Φ(U) described in Theorem 1.3.3, where U = Ia × If ,

Ia = (infnA(tn), supnA(tn)) and If = (1, supn φ′(tn)). As a result, the collection

of oscillatory cycles, fnn∈Z, can be parametrized by M up to a controllable error

depending on εC(‖A‖∞, ‖φ′‖∞, ‖s‖C1 , ‖φ′′‖∞). We mention that a similar argument

can be applied to the signal satisfying the generalized phenomenological model sum-

marized in Section 1.3 with more tedious notation and calculation. Since it does not

shed more light upon the topic, we omit the details.

1.4 Dynamics in Chaotic Time Series

Biomedical time series such as the electroencephalogram (EEG), the electromyogram

(EMG), and the electrooculogram (which measure brain activity, muscle activity,

and eye movement, respectively) do not adhere to the wave-shape oscillatory model.

Nevertheless, there is still a sense in which they are observations of a process on a

manifold hosting physiological states. For the simple reason that these signals are

exceptionally challenging to model, we call them chaotic. EEG signals in particular

appear to be summations of numerous intermittent oscillatory “bursts” of electrical

energy, each of which has a well-localized fundamental frequency. These bursts of

energy (called neural oscillations or brainwaves) correspond to neuronal activity and

are associated with cognitive processing. The spectral content of a subject’s EEG

signal is known to vary with his or her state of consciousness. Activity in the delta

band (0-4 Hz) is associated with unconsciousness or deep sleep, activity in the theta

band (4-8 Hz) is associated with meditation and drowsiness, activity in the alpha

band (8-16 Hz) is associated with wakeful relaxation, the beta band (16-32 Hz) is

associated with active thinking, and the gamma band (32-100 Hz) is associated with

a phenomenon termed unified perception, in which different parts of the brain work

22

together to accomplish demanding cognitive tasks. In this sense, the time-localized

spectrum of the EEG signal is a non-negative, integrable function on the real line

which provides some coarse information about the state of the brain. In this work, we

will assume that the space of such functions is a smooth, compact, low-dimensional

Riemannian manifold embedded in L2(R) which is diffeomorphic to the phase manifold

(as in the wave-shape manifold model). In principle, the time-varying spectrum of the

EEG signal is akin to the time-varying morphology of the cardiac waveform. Localizing

the spectral content of the EEG signal in time can be accomplished using techniques

such as the short-time Fourier transform or the continuous wavelet transform. We

can use the resulting time-frequency representation to observe neural dynamics. One

clear advantage to this approach is that we can localize the spectrum of the EEG

signal at any point in time (as opposed to at the incidence of certain landmarks), and

we take the opportunity to observe the spectrum at regular sampling intervals.

In comparison with signals adhering to the wave-shape oscillatory model, the

time-domain structure of the EEG signal is very difficult to assess. In the field of

cognitive neuroscience, neural oscillations with specific time-domain features that

are characteristic of certain cognitive tasks (event-related potentials) are obtained

by averaging together multiple aligned recordings of a subject repeatedly performing

that task. The difficulty of obtaining an event-related potential arises because of the

high level of background neural activity; in cardiac vital signs, background activity

is minimal. In Chapter 4, we discuss the problem of sleep stage classification using

the EEG signal, and we handle the background activity by taking two simultaneously

recorded EEG channels into account.

23

Chapter 2

Diffusion Maps

To recover the wave-shape manifold M and the process ρ∗ψ which is a surrogate for

the underlying physiological dynamics, we go to the field of manifold learning. In

manifold learning, high-dimensional or abstract data sets (viewed as point clouds

sampled from a low-dimensional manifold) are mapped to a Hilbert space so that

the geodesic distance between any two points on the manifold is approximately

equal to the `2 distance between their images. The main challenges facing dynamics

extraction are the nonlinearity of the wave-shape manifold and the possible noise

that deviates observed oscillatory patterns from the wave-shape manifold. To handle

these challenges, we use the diffusion maps algorithm. We choose the diffusion maps

algorithm because of its sound theoretical support. Since its appearance in 2006

[CL06] as a generalization of the Laplacian Eigenmap [BN03], a series of theoretical

works have been carried out which study the graph Laplacian (the main tool used

by the diffusion maps algorithm) and the behavior of diffusion maps in the manifold

setting. However, the diffusion maps algorithm has meaning in settings where the

data set has much less regularity. We begin in the most general setting.

2.1 Diffusion on a Measure Space

This section roughly follows the exposition of [Laf04]. Let (X,A, µ) be a measure space.

Let µ be a finite measure. Our definition of measure includes countable additivity.

Assume that the local geometry of X is informed by a kernel k:X ×X → C which is

24

an element of L2(X ×X,µ× µ)∫X

∫X

k(x, y)k(x, y) dµ(x)dµ(y) <∞; (2.1)

Hermitian

k(x, y) = k(y, x) x, y ∈ X; (2.2)

positive definite∫X

∫X

f(x)k(x, y)f(y) dµ(x)dµ(y) ≥ 0 f ∈ L2(X,µ). (2.3)

The simplest example of such a measure space is a finite set of nodes in a graph; the

accompanying kernel describes node adjacency. Define the degree function as

d(x) =

∫X

k(x, y) dµ(y) x ∈ X, (2.4)

and assume k is such that d(x) 6= 0 for all x ∈ X. Note that we achieve this condition

in particular if k is non-negative and k(x, x) > 0 for all x ∈ X. This condition is

needed to avoid dividing by zero in the following definition. Define the diffusion kernel

p:X ×X → C by

p(x, x′) =k(x, x′)

d(x)x, x′ ∈ X, (2.5)

and call the corresponding bounded linear map

Pf(x) :=

∫X

p(x, y)f(y) dµ(y) f ∈ L2(X,µ), x ∈ X (2.6)

the diffusion operator. When k is non-negative, p(x, · ) can be thought of as a

probability density function, and taking powers of P amounts to iterating a Markov

process over X, where the transition probabilities are informed by k. For an integer

25

t > 0 which we term the diffusion time, let pt:X ×X → R denote the integral kernel

associated to P t. Define the diffusion distance as

Dt(x, x′)2 =

∫X

[pt(x, y)− pt(x′, y)]2dµ(y)

d(y)x, x′ ∈ X. (2.7)

The notion in which this quantity is a distance will be made clear in the following

analysis. Moreover, the notion in which this distance provides geometric information

intrinsic to the space X will be made more precise when we consider the case where

X ⊂ Rn is a submanifold. For now, our goal is find a mapping Φt:X → H into a

Hilbert space H such that

Dt(x, x′) = ‖Φt(x)− Φt(x

′)‖H. (2.8)

Define the isotropic diffusion kernel

k(x, x′) =k(x, x′)√d(x)d(x′)

x, x′ ∈ X. (2.9)

Then k ∈ L2(X × X,µ × µ) is also Hermitian and positive definite. The isotopic

diffusion operator

Kf(x) :=

∫X

k(x, y)√d(x)d(y)

f(y) dµ(y) f ∈ L2(X,µ), x ∈ X (2.10)

is bounded with norm 1. Indeed, by the Cauchy-Schwarz inequality,

〈Kf, f〉 ≤√‖Kf‖‖f‖ (2.11)

Since K√d(x) =

√d(x), we see that ‖K‖= 1. We also know that K is Hilbert-

Schmidt (c.f. [Con90, Page 267] or [RS80, Page 210]). Using the Hilbert-Schmidt

Theorem and the fact that k is Hermitian, we see that

k(x, y) =∑

λnφn(x)φn(y), (2.12)

where φn is the complete orthonormal basis for L2(X,µ) consisting of eigenfunctions

for K, and 1 = λ1 ≥ λ2 ≥ · · · are the corresponding eigenvalues. Recall that

26

eigenfunctions of Hermitian operators are orthogonal and that their eigenvalues are

real. Moreover, since k is positive definite, λn ≥ 0 for all n. The convergence of (2.12)

is in terms of the norm of L2(X×X,µ×µ). We can use (2.12) to get a decomposition

of the diffusion kernel. We have

pt(x, y) =∑

λtnφn(x)√d(x)

φn(y)√d(y). (2.13)

Note that P t now has meaning when t > 0 is not an integer. We can then show that

Dt(x, x′) =

∑λ2tn

(φn(x)√d(x)

− φn(x′)√d(x′)

)2

, (2.14)

which implies that the diffusion distance is a metric when λn > 0 for all n ∈ N+.

Moreover, the diffusion distance is evidently the `2(N+)-distance between sequences

Dt(x, x′)2 =

∥∥∥∥∥∥∥λ

t1ϕ1(x)λt2ϕ2(x)

...

−λ

t1ϕ1(x′)λt2ϕ2(x′)

...

∥∥∥∥∥∥∥

2

`2(N+)

, (2.15)

where we have made the substitution ϕn(x) = φn(x)/√d(x). Note that

Pϕn(x) =1√d(x)

Kφn(x) =1√d(x)

λnφn(x) = λnϕn(x), (2.16)

so ϕn is an unnormalized eigenfunction of the diffusion operator P with eigenvalue

λn. This observation provokes the definition of the diffusion map Φt:X → `2(N+):

Φt(x) =

λ1ϕ1(x)λ2ϕ2(x)

...

, (2.17)

which satisfies the property (2.8) in which we had interest.

2.2 Diffusion on a Compact Domain

This section follows the exposition of [MNY06]. Mercer theory allows us to strengthen

the decomposition of the diffusion kernel provided by the Hilbert-Schmidt theorem,

27

under suitable assumptions. The easiest way to ensure all assumptions are met is the

following. Let X be a compact subset of Rn. Let µ be a finite Borel measure which is

supported on X. This means that if U ⊂ X is non-empty and open, then µ(X) > 0.

Note that our definition of measure includes countable additivity. Take a continuous,

Hermitian, positive definite kernel k:X ×X → C on X. Then k necessarily satisfies∫X

∫X

k(x, y)k(x, y) dµ(x)dµ(y) <∞. (2.18)

Define the degree function as

d(x) =

∫X

k(x, y) dµ(y) x ∈ X, (2.19)

and assume k is such that d(x) > 0 for all x ∈ X. Since X ⊂ Rn, we have an obvious

metric on X, namely the Euclidean distance on Rn. Bochner’s theorem says that the

kernel

k(x, y) = h(‖x− y‖2) (2.20)

is positive definite provided h(0) = 1 and there exists a finite positive measure ν such

that

h(u2) =

∫Re−2πiξu dν(ξ). (2.21)

For example, we often take the Gaussian kernel

k(x, y) = e−‖x−y‖2

. (2.22)

As before, the diffusion kernel p:X ×X → C is

p(x, x′) =k(x, x′)

d(x)x, x′ ∈ X, (2.23)

and the diffusion operator is

Pf(x) :=

∫X

p(x, y)f(y) dµ(y) f ∈ L2(X,µ), x ∈ X. (2.24)

28

The diffusion distance is

Dt(x, x′)2 =

∫X

[pt(x, y)− pt(x′, y)]2dµ(y)

d(y)x, x′ ∈ X, (2.25)

where t > 0, and pt is the integral kernel associated to the operator P t. The isotropic

diffusion kernel

k(x, x′) =k(x, x′)√d(x)d(x′)

x, x′ ∈ X. (2.26)

is also continuous (and hence L2-bounded), Hermitian, and positive definite. By

the Hilbert-Schmidt theorem, there are eigenfunctions φn (which form a complete

orthonormal basis of L2(X,µ)) and corresponding eigenvalues 1 = λ1 ≥ λ2 ≥ · · · ≥ 0

such that

k(x, y) =∑

λnφn(x)φn(y). (2.27)

Since k also satisfies Mercer’s condition, which states that∫X

∫X

f(x)k(x, y)f(y) dµ(x)dµ(y) <∞ f ∈ L2(X,µ), (2.28)

Mercer’s theorem says that the convergence of (2.27) is point-wise and uniform on

X. We also see that the eigenfunctions corresponding to non-zero eigenvalues are

continuous. We follow the procedure above to obtain a decomposition for pt, an

expression for the diffusion distance involving the unnormalized eigenvectors of P ,

and the diffusion map Φt.

2.3 Diffusion on a Manifold

We now proceed to the analysis of diffusion on a manifold. In this setting, we

assume thatM is an m-dimensional, closed (compact without boundary), and smooth

manifold embedded in Rn. We take the induced Riemmanian metric g onM (from the

29

canonical metric on Rn), and we assume that m ≤ n. Write dVg for the Riemannian

volume measure. Note that we do not have access to the metric g or a topological

description ofM (even its dimension). We have only a list of its elements. Nevertheless,

by applying the diffusion maps algorithm, we will be able to recover the metric. We

assume the existence of a family of kernels

kh(x, y) := k

(‖x− y‖√

h

), (2.29)

where k: [0,∞)→ R+ is a positive C2-function such that both k and k′ decay expo-

nentially fast at +∞. The Gaussian k(u) = e−u2

is one common example. Assume h

is chosen so that 0 <√h < τ, ι, where ι is the injectivity radius of M and τ is the

condition number [HWB08]. Define the degree function

d(x) =

∫Mkh(x, y) dVg(y) x ∈M, (2.30)

the diffusion kernel

p(x, y) =kh(x, y)

d(x)x, y ∈M, (2.31)

and the diffusion operator

Pf(x) =

∫Mp(x, y)f(y) dVg(y) f ∈ L2(M, dVg), x ∈M. (2.32)

The diffusion maps algorithm generalizes a family of methods for the organization

of large data sets, which includes the Laplacian Eigenmap. The algorithms in this

family derive similarity from their use of eigenfunctions of the operator

Lh :=Pf − f

hf ∈ L2(M, dVg) (2.33)

called the graph Laplacian. Diffusion is carried out when we diagonalize P , to which

diagonalizing Lh is equivalent. This operator derives its name from the Laplace-

Beltrami partial differential operator ∆g. Indeed, Lh converges point-wise as h→ 0+

30

to a scalar multiple (depending on the kernel k, the dimension m, and the volume of

M) of ∆g within an error of order O(h) [SW17a, Theorem 5.2]. Moreover, we obtain

spectral convergence; the eigenvalues and normalized eigenvectors of Lh converge

in probability (as h → 0+) to the eigenvalues and normalized eigenfunctions of ∆g

[BN07]. The spectral embedding introduced by Berard, Besson, and Gallot explains

the sense in which taking an eigendecomposition of ∆g reveals the geometry of the

manifold. Suppose

ak∞k=1 ⊂ L2(M, dVg) 0 ≤ µ1 ≤ µ2 ≤ · · · (2.34)

is an orthonormal basis of real eigenfunctions for the operator ∆g. For each t > 0,

the mapping Ψt:M→ `2(N+) given by

x 7→√

2(4π)m4 (2t)

m+24 e−µjtaj(x)∞j=1 (2.35)

is an embedding. In particular, as t 7→ 0+, the pullback of the canonical metric on

`2(N+) is

g +2t

3

(1

2Scalgg + Ricg

)+O(t2), (2.36)

indicating that Ψt is an approximate isometry. An important first observation is that a

large number of eigenfunctions appear to be needed in order to obtain the Riemannian

metric. Moreover, it is not clear how many eigenfunctions can be discarded while

maintaining the embedding property (which guarantees that the topology of the

manifold is preserved). For these reasons, the truncated spectral embedding has

recently been studied [JMS08, Bat14, Por16]; one result is as follows [Por16, Theorem

5.1]. Fix ε > 0. If κ is a lower bound on the Ricci curvature ofM and ι is the injectivity

radius, then there exists a t0 = t0(m, ε, κ, ι) such that for all 0 < t < t0, there exists

an nE = nE(m, ε, t, κ, ι,Vol(M)) such that if q ≥ nE, the finite-dimensional mapping

Φ(q)t : x 7→

√2(4π)

m4 (2t)

m+24 e−tµjaj(x)qj=1 ∈ Rq (2.37)

31

is an embedding and satisfies

1− ε ≤ ‖(dΨ(q)t )x‖≤ 1 + ε. (2.38)

Note that Φ(q)t is an isometry when the operator norm ‖(dΨ

(q)t )x‖ is 1. We note also

that although the operator Lh is asymptotically ∆g when M is a compact manifold

without boundary [SW17a], a corresponding embedding result for manifolds with

boundary does not exist.

The eigendecomposition of ∆g affords us an extra piece of information: the

ordering imposed by the magnitude of the corresponding eigenvalues. While algebraic

topologists organize the geometry of a manifold in terms of homotopy, homology, and

cohomology, harmonic analysts organize the geometry of a manifold in terms of a

generalized notion of “frequency” afforded by the ordered eigenvalues of ∆g. In the

same way that functions on S1 can be decomposed using the Fourier basis for L2(S1),

functions on the manifoldM can be decomposed using the eigenfunctions of ∆g which

serve as an orthonormal basis for L2(M, dVg). Each eigenfunction is oscillatory in

nature, and its oscillatory behaviour can be characterized by the magnitude of the

corresponding eigenvalue. To obtain coarse information about the geometry of M,

we should consider the lowest-order eigenfunctions, but to obtain finer information

about the geometry of M (such as geodesic distances and community structure), we

should consider the higher-order eigenfunctions.

2.3.1 Finite Samples

In practice, we are provided with only a finite number of points X = xiNi=1 ⊂M,

which we assume are sampled independently and identically from a random vector

Y : (Ω,F ,P)→ Rp whose range isM. To make the notation clear, Ω is the domain of

Y , F is a Borel σ-algebra on Ω, and P is a probability measure on F . We assume

32

that the pushforward measure Y∗P on M is absolutely continuous with respect to the

Riemannian volume measure dVg, and that the function

p :=Y∗PdVg

:M→ R+ (2.39)

given by Radon-Nikodym theorem is uniformly bounded away from both zero and

infinity and is of class C4(M). We call p the probability density function on M

associated with Y ; when p is constant, we say that the sampling is uniform. In

general, a non-uniform density function poses a significant challenge to recovering

the operator ∆g. However, a simple normalization step called α-normalization can

overcome this challenge. The algorithm (in the continuous setting) is as follows.

Choose 0 ≤ α ≤ 1, and set

π(x) =

∫Mkh(x, y) p(y)dVg(y) x ∈M (2.40)

kh,α(x, y) =kh(x, y)

π(x)απ(y)αx, y ∈M (2.41)

dh,α(x) =

∫Mkh,α(x, y) p(y)dVg(y) x ∈M (2.42)

ph,α(x, y) =kh,α(x, y)

dh,α(x)x, y ∈M. (2.43)

We use the α-normalized diffusion kernel ph,α to obtain the α-normalized diffusion

operator

Ph,αf(x) =

∫Mph,α(x, y)f(y) p(y)dVg(y) f ∈ L2(M, dVg), x ∈M. (2.44)

Pointwise on M, we then have the asymptotical result [SW17a, Theorem 5.2] saying

that

(Ph,αf − f) (x)

h= C(k,m)

(∆gf(x) +

2∇f(x) · ∇ (p1−α) (x)

p1−α(x)

)+O(h), (2.45)

33

where C(k,m) is a constant depending on the kernel k and the dimension m of M.

Evidently, taking α = 1 causes the potential term to become zero, leaving the Laplace-

Beltrami operator in the zeroth order. In the discrete setting, we can approximate

the α-normalized diffusion kernel using the set xiNi=1 as follows.

π(x) =1

N

N∑i=1

kh(x, xi) x ∈M (2.46)

kh,α(x, y) =kh(x, y)

π(x)απ(y)αx, y ∈M (2.47)

dh,α(x) =1

N

N∑i=1

kh,α(x, xi) x ∈M (2.48)

ph,α(x, y) =kh,α(x, y)

dh,α(x)x, y ∈M (2.49)

Ph,αf(x) =1

N

N∑i=1

ph,α(x, xi)f(xi) f ∈ L2(M, dVg), x ∈M. (2.50)

Having done so, we recall the recent pointwise [SW17a, Corollary 5.1] and spectral

[SW17a, Theorem 5.4] convergence results associated to these operators. Suppose

α ∈ (0, 1], and that h = h(N) so that both hN→∞−−−→ 0 and

1

hm/4+1

√logN

N

N→∞−−−→ 0. (2.51)

With probability higher than 1− 1/N2, we have

(Ph,αf − f)(xi)

h= C(k,m)

(∆gf(xi) +

2∇f(xi) · ∇ (p1−α) (xi)

p1−α(xi)

)+ · · ·

· · ·+O(h) +O

(1

hm/4+1

√logN

N

)(2.52)

for all 1 ≤ i ≤ N , where f ∈ C4(M). Finally, if t > 0 is fixed, 1 = λ1,N ≥ λ2,N · · · are

the eigenvalues of(Ph,1

) th, ϕj,N∞j=1 are the associated L2-normalized eigenfunctions,

34

and i ≥ 1 is a fixed integer, then there is a sequence hN∞N=1 such that

limN→∞

λi,N = µi (2.53)

limN→∞

‖ϕi,N − ai‖L2(M,dVg)= 0 (2.54)

in probability, where µj∞j=1 are the eigenvalues of the heat kernel operator et∆g , and

aj∞j=1 are the associated L2-normalized eigenfunctions. As a result, we can approxi-

mate the truncated spectral embedding by choosing α = 1, choosing h appropriately,

and setting

Φ(q)t (xi) = λ1ϕ1(xi), ..., λqϕq(xi). (2.55)

Compact manifolds with boundary are also handled in [SW17a], and non-compact

manifolds with boundary are discussed in [HAVL05].

2.3.2 Robustness to Noise

Real data is inevitably noisy. However, under some mild assumptions, the diffusion

maps algorithm is robust to noise [EK+10, EKW+16]. We summarize the result here.

Let W and W0 denote the N × N affinity matrices associated with the noisy and

clean data sets, respectively. Suppose ε > 0 is such that

sup1≤i,j≤N

|W (i, j)−W0(i, j)|≤ ε (2.56)

Note that ε quantifies the control that we have over the noise. If the affinity matrix

W (for the noisy data) satisfies

inf1≤i≤N

∑j 6=i

W (i, j)

N> γ (2.57)

for some γ > ε (meaning in particular that the noise has not introduced isolated

points), then ∥∥D−1W −D−10 W0

∥∥ ≤ ε

γ

(1 +

1

γ − ε

), (2.58)

35

where ‖ · ‖ is the operator norm. By Weyl’s inequality (for the spectra of perturbed

matrices) and the Davis-Kahan sin θ theorem, the first q′ eigenvectors and eigenvalues

are well-reconstructed up to a controllable error, where the number q′ depends on the

noise level. When q′ is large enough so that q′ ≥ nE (m, ε, t, κ, ι,Vol(M)) as in (2.37),

we are guaranteed a reconstruction of the manifold.

36

Chapter 3

Cardiovascular Waveform Analysis

Heart rate is the most basic and widely known measure of the cardiovascular system.

On the other hand, a surrogate for heart rate known as pulse rate can be estimated

by measuring blood flow at nearly any place in the human body because of the

transportation efficiency of the human circulatory system. Besides heart rate and

pulse rate, the analysis of cardiac and pulse waveforms provides further information

regarding the cardiovascular system. Deflections in the cardiac waveform are known

to correspond to the stages of heart contraction, and factors such as vascular wall

tension, blood volume, and cardiac contractility all affect the morphology of the pulse

waveform [O’R11]. Due to the abundant information hidden in both the cardiac and

pulse waveforms, waveform analysis has been widely applied in clinics. On the one

hand, indices are derived from a patient’s “mean” waveform and compared with the

indices of other patients for the purpose of diagnosis. On the other hand, indices are

derived from a sequence of waveforms and examined over time to summarize a patient’s

progress during surgery or treatment. For example, the ECG waveform is used to

diagnose cardiac arrhythmia and adverse cardiac events such as myocardial infarctions

(heart attacks). Commercial monitoring instruments use pulse waveform analysis to

bring physicians a relatively non-invasive assessment of cardiac output [vDvBH+17].

The FloTracsystem marketed by Edwards Lifesciences Inc. (Irvine, CA, USA) uses

the arterial blood pressure (ABP) waveform to derive various circulation-related

indices to facilitate clinical management [SMG14, TSC+16].

The waveform analysis techniques that currently exist can be classified as either

landmark methods (time-domain) or spectral methods (frequency-domain). Unifying

37

these approaches is the implicit assumption that there exists a template guiding

the oscillation. For example, an ECG waveform is quantified in terms of its P, Q,

R, S, and T deflections, and an ABP waveform is often quantified in terms of its

forward-propogating and reflected components [AVBB+09]. In general, each waveform

is classified with reference to the assumed template, and one obtains indices useful for

clinical purposes such as diagnosis. Time-domain parameters quantifying the relative

positions of landmarks in the blood pressure waveform include augmentation pressure

[AVBB+09, WAO+04] and the augmentation index [OA05]; both have been proposed

as indicators of cardiovascular status. Spectral methods interpret cardiac and pulse

waveforms in the frequency domain, providing information about the inflection of

the waveform. The spectral content of the ECG waveform has been found to be

useful for classifying abnormal heartbeat morphologies [QA17]. The spectral content

of the ABP waveform has been found to be related to the physical properties of

the artery [WLH+00]. Moreover, it has been shown that by arbitrarily assembling a

million features from the pulse waveform (including both time- and frequency-domain

features), researchers can predict hypotension [HJB+18].

While these template-based approaches have been widely applied, they are chal-

lenged when we encounter long-term physiological time series or when the subject

is pathological; in both cases, one template is not enough to model the oscillatory

signal. See Figure 3.1 for an ECG signal belonging to a subject possessing premature

ventricular contractions (PVCs), wherein one template is insufficient to model the

time series. Paying mind to long-term physiological time series, the oscillatory pattern

may exhibit changes over larger time scales, in which case one template is again

insufficient for summarizing all of the observed activity. For example, the cardiac cycle

violently changes during a myocardial infarction or a paroxysmal supraventricular

tachycardia event. As discussed in Chapter 1, we designed the wave-shape oscillatory

38

model to overcome these challenges. In this chapter, we discuss applications of the

wave-shape oscillatory model to vital signs which measure the cardiovascular system.

We extract dynamical information from the electrocardiogram (ECG) and the arterial

blood pressure (ABP) signal.

Our novel algorithm for exploring the dynamics encoded by a time series adhering

to the wave-shape oscillatory model is as follows. The algorithm is composed of four

main steps, and the details of each step depend on the physiological time series that

is under consideration. Suppose the time series is sampled uniformly at fs Hz for

T > 0 seconds; that is, there are in total n := bT × fsc sampling points. Denote the

discretized time series as x ∈ Rn.

1. Apply any suitable beat tracking algorithm [MSW20] to determine the locations

of all oscillatory cycles in x. Suppose there are N resulting cycles. Denote the

location (in samples) of the i-th oscillation as ti, where t1 < t2 < . . . < tN .

2. Extract the individual oscillations from x. Denote the i-th oscillatory cycle as

xi ∈ Rp, where p > 0 depends on the frequency of the oscillation. Note that

these segments may overlap, in general. Define X := xiNi=1 ⊂ Rp. Based

on our model, X is a set of points sampled (possibly with noise) from the

wave-shape manifold M associated with x.

3. Apply the diffusion maps algorithm to the set X in order to recover the manifold

M. Write Φ(q)t :X → Rq for the truncated time-t diffusion map, where t, q > 0

are chosen appropriately.

4. Recover the dynamics on the wave-shape manifold by constructing the time

series ti/fs 7→ Φ(q)t (i). Optionally, interpolate this time series over the duration

T of the original recording.

39

In the remainder of this chapter, we provide explicit testbeds featuring several suc-

cessful applications of the wave-shape oscillatory model and the associated algorithm

for recovering physiological dynamics to cardiovascular time series. The second and

third sections of this chapter continue work originally presented in [LMW19].

3.1 Heartbeat Classification

We provide a comfortable and systematic demonstration of how the wave-shape

oscillatory model can be applied to extract dynamical information from a single-lead

ECG signal. This section is organized in the following way. First, we provide an

exploration of how the wave-shape oscillatory model and the wave-shape manifold

model can be used to assess cardiac arrhythmia from an unsupervised perspective.

We show a concrete example of recovering the wave-shape manifold and visualizing

the dynamics on the recovered manifold. Then, we leverage the wave-shape oscillatory

model to train an automatic ventricular ectopy detector which achieves state-of-the-art

performance.

3.1.1 Unsupervised Extraction of Morphological Dynamics

The existing mathematical models for morphological dynamics in single-lead ECG

signals are limited, especially when dealing with pathological cases. In particular,

rapid changes in wave-shape pose serious problems to those models. The ECG signal

in Figure 3.1 is such a pathological signal. (It is record 208 from the MIT-BIH

Arrhythmia database available at PhysioNet.) The wave-shapes (delineated 250 ms

to the left and 400 ms to the right of each demarcated R wave) do not satisfy the

slowly varying assumption required by the generalized phenomenological model. A

model incorporating multiple components would require knowledge about the number

of distinct components in the signal, which a priori is not available to us. The

40

wave-shape oscillatory model is intended to overcome these difficulties. We use this

particular ECG signal (which features arrhythmic beats called premature ventricular

contractions) to provide a brief discussion of how the wave-shape oscillatory model

should be applied in practice before moving to a clinical application of these principles.

In Figure 3.2, we show the recovery of the wave-shape manifold obtained by

running the diffusion maps algorithm on the set of excised cardiac cycles. The

visualization we obtain amounts to a spectral clustering of the cardiac cycles in the

recording based on morphology. (There are 2953 annotated beats in the recording.)

To obtain this embedding, we perform the following computations. Write the raw

ECG signal as x ∈ Rn, where fs is the sampling rate and n = 30 × 60 × fs is the

number of samples in this 30-minute recording. There are N = 2953 annotated R

waves provided by the expert annotator. Let X ∈ RN×p be the matrix of beats, where

p = b0.25× fsc+ b0.4× fsc+ 1 and

X(i, j + b0.25× fsc+ 1) = x(ri + j) − b0.25× fsc ≤ j ≤ b0.4× fsc. (3.1)

We build a self-tuning affinity matrix W ∈ RN×N by setting

W(i, j) =1

2

[exp

[−

p∑k=1

(X((i, k)−X((j, k))2

σ2i

]+ exp

[−

p∑k=1

(X((i, k)−X((j, k))2

σ2j

]],

(3.2)

where σi is the median Euclidean distance from the i-th row of X to all other rows.

We then perform α-normalization with α = 1 to handle the (possibly) non-uniform

density function. We calculate

W(α) = D−1WD−1, (3.3)

where D ∈ RN×N is a diagonal matrix satisfying

D(i, i) =N∑j=1

W(i, j). (3.4)

41

We then calculate the isotropic diffusion kernel by setting

P =(D(α)

)−1/2W(α)

(D(α)

)−1/2, (3.5)

where D(α) ∈ RN×N is a diagonal matrix satisfying

D(α)(i, i) =N∑j=1

W(α)(i, j). (3.6)

We obtain the top two non-trivial eigenvectors of P with their corresponding eigenval-

ues, namely

φ1, φ2 ∈ RN 1 > λ1 ≥ λ2. (3.7)

Finally, we set the truncated time-1 diffusion map to be the matrix

Φ1 =[λ1ϕ1 λ2ϕ2

]∈ RN×2, (3.8)

where ϕi =(D(α)

)−1/2φi ∈ RN are the top non-trivial (unnormalized) eigenvectors

of the diffusion kernel. To visualize the dynamics on the recovered wave-shape

manifold, we plot the time series ri/fs 7→ −ϕ1(i), where ϕ1 is the first non-trivial

eigenvector of the diffusion kernel. In Figure 3.3, we have additionally performed

linear interpolation over the duration of the recording. To verify the meaning of the

extracted dynamics, we compare this time series with the interpolated time series

of manual beat annotations. The negative sign in front of ϕ1 is placed so that the

time series are similarly oriented. Evidently, they are highly correlated. From this

compressed representation of the information in our pathological ECG signal, we

can ascertain a number of quantities useful to physicians. First, we can assess the

frequency at which premature ventricular contractions occur. We can also assess any

temporal coupling between the triggering of normal and abnormal beats. Finally, we

can determine the number of distinct cardiac cycle morphologies present in the signal,

as well as the geometric relationship between those morphologies. In particular, we

42

Figure 3.1: A pathological ECG signal featuring morphologically distinct prematureventricular contractions and fusion beats. In this case, the wave-shape is not slowlyvarying.

notice in Figure 3.2 that the fusion beats are located in between the normal beats

and the premature ventricular contractions, as expected from the definition. This

time series is rarely studied over the long term, partially due to the cost and difficulty

of obtaining a correct sequence of beat annotations. However, with the wave-shape

oscillatory model and its associated algorithm for dynamics recovery, we are making

a step in this direction. Below, we present a clinical application of these principles.

3.1.2 Supervised Ventricular Ectopy Detection

We design an automatic algorithm for detecting ventricular ectopic beats in single-lead

ECG recordings. The classifier leverages the wave-shape manifold of each subject.

We build six features from each cardiac cycle, including a novel morphological feature

which describes the distance to the median beat in the recording. After an unsupervised

subject-specific normalization procedure, we train an ensemble binary classifier using

the AdaBoost algorithm. After training our classifier on subset DS1 of the MIT-BIH

Arrhythmia database, our classifier obtains a sensitivity of 94.75%, a positive predictive

43

Figure 3.2: A mapping of N = 2953 cardiac cycles into R2 afforded by the toptwo coordinates of the time-1 diffusion map. We color the embedded points usingtheir associated annotations. Fuchsia points correspond to premature ventricularcontractions, orange points correspond to normal beats, and teal points correspond tofusion beats.

44

Figure 3.3: A projection of the morphological dynamics on the recovered wave-shapemanifold. In red, we plot the sequence of annotations provided by the expert annotator.In blue, we plot the time series ri/fs 7→ −ϕ1(i), the first non-trivial eigenvector of thediffusion kernel. The abbreviation PVC refers to premature ventricular contractions.

value of 95.49%, and an F1 score of 95.12% on subset DS2 of the same database.

Our classifier achieves an F1 score of 91.23% on the St. Petersburg INCART 12-lead

Arrhythmia database. The six features (as well as the normalization method) we use

allow our ventricular ectopy detector to achieve high precision on unseen subjects

and databases. Our ventricular ectopy detector will be used to study the relationship

between premature ventricular contraction burden and undesirable patient outcomes

such as congestive heart failure and mortality.

Background and Significance

Ventricular ectopy is known to be a modifiable risk factor for congestive heart failure

(CHF) among certain populations. The exact relationship between PVC frequency

and both CHF and mortality is still being studied [DDV+15]. It is also known that

polymorphic PVCs are associated with a higher incidence of mortality, hospitaliza-

45

tion, transient ischemic attack, new-onset atrial fibrillation, and new-onset heart

failure independent of other clinical risk factors [LCL+15]. Moreover, PVC coupling

interval variability may lead to cardiomyopathy [HRC+17]. While quantities such

as PVC frequency are commonly encountered both in the clinic and in large-scale

epidemiological studies, the temporal coupling between PVCs and non-PVCs is less

frequently considered in the literature. On the other hand, it is well known that the

existence of PVCs has a significant impact on heart rate variability (HRV) analysis

[SHS01], and a correction must be carried out before analyzing HRV. In practice, PVC

frequency is assessed using long-term, mobile electrocardiogram (ECG) recordings.

These recordings are usually read by automatic ventricular ectopy detectors using

hardware such as the MARS 8000 PC Holter Monitoring System (GE Medical Systems,

Milwaukee, Wisconsin) and the PVC information is provided by the accompanying

proprietary algorithms.

In the literature, a variety of heartbeat classification algorithms have been proposed

to automatically annotate heartbeat type from the available ECG recordings (following

the AAMI standard [ECA87, EC598]), and the majority of these algorithms partition

an individual’s heartbeat set into four classes, namely normal beats, supraventricular

ectopic beats, ventricular ectopic beats, and fusion beats (i.e., fusions of normal and

ventricular ectopic beats). Traditionally, researchers design automatic classification

algorithms based on carefully chosen morphological features and heart rate [DCOR04,

LM10, LM12, YKC12, HFSW17]. Recently, the trend has been to use deep neural

networks (DNNs) to classify cardiac rhythm. Although DNNs are still black boxes,

various classification results have been published [AOH+17, WH17, XNC+18]. While

existing algorithms have the potential to help physicians identify various types of

arrhythmia, the classification accuracy for each individual beat type (PVCs, in

particular) is usually not ideal. To the best of our knowledge, only a few works focus

46

Table 3.1: Annotated single-lead ECG databases

DatabaseMIT-BIH

ArrhythmiaDS1

MIT-BIHArrhythmia

DS2

St.PetersburgINCART

Arrhythmia

Non-VEB 47233 46491 155888VEB 3788 3221 20011

Sampling Rate 360 Hz 360 Hz 257 HzRecordings 22 22 75

Purpose Training Validation ValidationLength 30 mins 30 mins 30 mins

on providing an accurate and efficient PVC detection algorithm that operates on

single-channel ECG recordings [ASO19, MAR10, MAR11, MAR13].

In this work, we propose a novel algorithm to identify whether a heartbeat is

a ventricular ectopic beat (i.e., a PVC or a ventricular escape beat). Behind our

algorithm is the principle (supported by a priori cardiac electrophysiological knowledge)

that ventricular ectopic beats are “distant” from the median beat in the recording.

The novelty is twofold. First, to handle the difficulty of inter-individual rhythm

classification, we introduce the notion of subject-specific normalization of features.

This means that our classifier is agnostic to each beat’s subject of origin. Second,

we introduce a novel beat morphology feature that is based on the quadratic form

distance [HSE+95] between histograms of heart beats. By convention, in most existing

works, training and validation of these classifiers is performed on the publicly-available

MIT-BIH Arrhythmia database [MM01]. In this work, we adopt this convention and

consider an additional database available at Physionet [GAG+00], namely the St.

Petersburg INCART 12-lead Arrhythmia database.

47

3.1.3 Methods

Annotated Databases

Two annotated databases (see Table 1) were used for the purposes of training and

validation. Forty-four recordings from the MIT-BIH Arrythmia database [MM01] were

divided into two groups, labeled DS1 and DS2 [DCOR04]. (The database actually

contains 48 records, but four of them are conventionally discarded because they

contain paced beats.) Only the first available lead (a modified limb lead II) was

considered in this study, as our algorithm is designed to operate on a single channel.

All 44 recordings were 30 minutes in length and were sampled at 360 Hz. We also

considered all 75 recordings from the St Petersburg INCART 12-lead Arrhythmia

Database. In this database, we used the second available lead (the limb lead II)

because of the low quality of the first available lead. Recordings were 30 minutes

long and were sampled at 257 Hz. Both databases are available online at PhysioNet

[GAG+00]. PhysioNet’s annotation system divides cardiac cycles into 19 subgroups

following the AAMI standard [ECA87, EC598]. Our algorithm distinguishes PVCs

(annotation code V) and ventricular escape beats (annotation code E) from all other

beat types. This is a binary classification problem; we consider the positive class to

be class consisting of the PVCs and the ventricular escape beats.

Proposed Algorithm

Our proposed algorithm takes as input a raw electrocardiogram x ∈ Rn in units of

mV, its sampling rate fs, and the locations (in samples) of all QRS complexes in

the recording. Suppose there are N detected QRS complexes in x, and that the

i-th QRS complex is located at sample ri. Our algorithm determines which of the

observed QRS complexes correspond to either a premature ventricular contraction

or a ventricular escape beat. We apply a standard high-accuracy QRS detector to x.

48

If the i-th QRS complex location is matched to any of the predicted QRS complex

locations provided by the QRS detector, we replace ri with this value. This step is

to ensure that the algorithm is not confounded by the method used to obtain the

QRS complex locations. We apply a bi-directional 3rd order Butterworth bandpass

filter to x with cutoff frequencies 0.5 Hz and 40 Hz to remove baseline wandering and

to suppress noise. We then upsample the signal to a sampling rate of fs = 2000 Hz.

We use the same notation ri to denote the location of the i-th QRS complex in the

upsampled signal. Denote the filtered and upsampled signal as x ∈ Rn. We associate

six features to each QRS complex. We build six vectors Y1, ..., Y6 ∈ RN so that Yi(j)

is the value of the i-th feature at the j-th QRS complex. We begin by calculating

the time difference between the current beat and the previous beat, specifically

X1(i) :=ri − ri−1

fs(3.9)

the time difference between the current beat and the next beat, specifically

X2(i) :=ri+1 − ri

fs(3.10)

the electric potential (in the filtered ECG) at the location of the detected QRS

complex, specifically

X3(i) := x(ri) (3.11)

These features are normalized by a procedure we call “subject-specific normalization.”

This normalization is also time-specific. In particular, for each 1 ≤ i ≤ 3, we define

the normalized features for the current recording to be

Yi(j) = Xi(j)− medXi(k) : |j − k|≤ 500, (3.12)

where med denotes taking the median. Note that this operation corresponds to

removing the output of a median filter (with window size 1001 QRS complexes) from

49

the signal Xi. The final three features are morphological. To calculate these features,

we first splice a portion of each cardiac cycle from the recording. Set a = b0.125fsc

and b = b0.2fsc. Obtain a matrix B ∈ RN×(a+b+1) by setting

B(i, t+ a+ 1) = x(ri − t), (3.13)

where i is the beat index and t is an integer satisfying −a ≤ t ≤ b. The intention is

that the i-th row of B corresponds to a large portion of the i-th cardiac cycle. We

now associate three distances with each cardiac cycle, namely

the squared Euclidean distance to the median beat, specifically

Y4(i) =a+b+1∑t=1

[B(i, t)− medB(j, t) : |i− j|≤ 500]2 (3.14)

the squared diffusion distance [CL06] to the median embedded beat, which we

will define shortly;

the quadratic form [HSE+95] (or cross-bin) distance from the current beat’s

histogram to the median histogram, which we will define shortly.

To calculate the fifth feature (the diffusion distance to the median embedded

beat), we carry out the following steps. Build an affinity matrix W ∈ RN×N using

the formula

W (i, j) = exp

[−

a+b+1∑k=1

(B(i, k)−B(j, k))2

200

]. (3.15)

We α-normalize the affinity matrix (with α = 1) by setting

W (α) = D−1WD−1, (3.16)


D(i, i) =N∑j=1

W (i, j). (3.17)

50


P =(D(α)

)−1/2W (α)

(D(α)

)−1/2, (3.18)


D(α)(i, i) =N∑j=1

W (α)(i, j). (3.19)

We obtain the top 80 non-trivial eigenvectors of P with their corresponding eigenvalues,

namely

φi80i=1 ⊂ RN 1 > λ1 ≥ λ2 ≥ · · · ≥ λ80. (3.20)


Φ1 =[λ1

(D(α)

)−1/2φ1 · · · λ80

(D(α)

)−1/2φ80

]∈ RN×80. (3.21)

The fifth feature (associted to the i-th cardiac cycle) is obtained by setting

Y5(i) =80∑t=1

[Φ1(i, t)− medΦ1(j, t) : |i− j|≤ 500]2 . (3.22)

To obtain the sixth feature, we first calculate the histogram of each cardiac cycle.

Define a vector of bin edges

E = (−2,−11/6, ...,−1/3, 0, 1/3, ..., 11/6, 2) ∈ R25 (3.23)

and a matrix of histograms H ∈ RN×24 by setting

H(i, j) = |B(i, t) : 1 ≤ t ≤ a+ b+ 1 ∩ [Ej, Ej+1)| (3.24)

for each 1 ≤ i ≤ N and each 1 ≤ j ≤ 24. Note that H(i, j) counts the number of

samples making up the i-th beat which are greater than or equal to Ej but less than

Ej+1. Note that the sum of each row of H is a + b + 1, and it is unnecessary to

normalize the histograms so that each sums to 1. We locally estimate the median

histogram by introducing a new matrix G ∈ RN×24 whose entries are

G(i, j) = medH(i, k) : |j − k|≤ 500. (3.25)

51

In fact, we obtain G by applying a moving median filter to the columns of H. We

define a matrix of bin similarities M ∈ R24×24 by setting

M(p, q) = exp

[−(Ep − Ep+1 − Eq + Eq+1)2

2

](3.26)

for all 1 ≤ p, q ≤ 24. The sixth and final feature, namely the quadratic form distance

from the current beat’s histogram rowi(H) to the median histogram rowi(G), is

calculated by the formula

Y6(i) = (rowi(H)− rowi(G))M (rowi(H)− rowi(G))> . (3.27)

The six features Y1(i), ..., Y6(i) are fed into an ensemble binary classifier to determine if

the i-th cardiac cycle is of the positive class (either a premature ventricular contraction

or a ventricular escape beat). The classifier consists of 50 weak binary decision tree

learners each with a maximum of 4 splits, and the ensembling method was the

AdaBoost algorithm [FS95, Fri01, FHT+00, SFB+98]. In particular, we used the

AdaBoostM1 functionality in MATLAB 2019a with the default parameters.

Evaluation

We carry out the following training procedure. First, we train the ensemble binary

classifier on all of the cardiac cycles in DS1. Then, we validate the trained classifier

on the cardiac cycles in DS2. We did not use the DS2 database during any point

of the training stage. Note that beats from the same subject are not distributed

across both DS1 and DS2. We carry out a further validation by applying the trained

classifier to all of the cardiac cycles in both the St. Petersburg INCART database.

We use a number of evaluation metrics derived from the number of true positives

(TP), false positives (FP), true negatives (TN), and false negatives (FN). We calculate

the sensitivity (SEN), specificity (SPE), and positive predictive value (PPV) using

52

the formulas

SEN = 100%× TP

TP + FNSPE = 100%× TN

FP + TN(3.28)

PPV = 100%× TP

TP + FP(3.29)

The F1 score is the harmonic mean of SEN and PPV and serves as our main summary

statistic. Due to the strong imbalance between the positive and negative classes, SPE

is not as informative as SEN and PPV for assessing model performance.

3.1.4 Results

After training the ensemble binary classifier on DS1, we report in Table 2 a variety

of evaluation metrics which describe the classifier’s performance on DS2 and the

St. Petersburg INCART database. We also report the performance on the training

database (DS1) to show that our model has not been over-fit to the training data.

A SEN of 94.75% on DS2 means that our classifier correctly detects 95 out of every

100 ventricular ectopic beats. A PPV of 95.49% means that every ventricular ectopy

detection made by our algorithm has a 95.49% chance of being a ventricular ectopic

beat. We do notice, however, that the F1 score on the INCART database is lower

than the F1 score on DS2. This result could be due to the fact that the recordings

were made on different devices and annotated by different individuals. This result

could also be due to differences in the densities of certain pathologies within each

database.

Comparison with Previous Results

We compare our algorithm’s performance to the reported performance of previous

algorithms (see Table 3). Llamedo and Martinez [LM10] design an automatic heartbeat

classifier which divides cardiac cycles into four categories, one of which is the ventricular

53

Table 3.2: Performance of the proposed ventricular ectopy detector

DatabaseMIT-BIH

ArrhythmiaDS1

MIT-BIHArrhythmia

DS2

St.PetersburgINCART

Arrhythmia

Purpose Training Validation ValidationTP 3530 3052 17648FP 213 144 1028TN 47020 46347 154860FN 258 169 2363

SEN (%) 93.19 94.75 88.19SPE (%) 99.55 99.69 99.34PPV (%) 94.31 95.49 94.50F1 (%) 93.75 95.12 91.23

ACC (%) 99.08 99.37 98.07AUC 0.9989 0.9981 99.31

TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; SEN: sensitivity;SPE: specificity; PPV: positive predictive value; F1: F1 score; ACC: accuracy; AUC: area under the

ROC curve.

ectopy class we consider in this work. They consider two-lead recordings. They select

eight features based on the RR interval time series and the autocorrelation of the

discrete wavelet transform of each ECG lead. They employ a linear discriminant

classifier, and they report a SEN of 82.9% and a PPV of 88.0% on DS2 when identifying

the ventricular ectopy class (although three ventricular ectopic beats appear to be

missing from the reference set). While their algorithm’s performance is certainly

below the performance of our algorithm, the fact that their algorithm was designed to

solve a multiclass problem indicates that a lower performance on the ventricular class

is to be expected. Llamedo and Martinez also consider a semi-automatic approach

to heartbeat classification [LM12]. Their previously considered model is modified to

allow expert assistance. After manually annotating 12 carefully selected beats from

each recording, they achieve a SEN of 92.2% and a PPV of 94.8% when detecting

premature ventricular contractions, ventricular escape beats, and fusions of ventricular

54

Table 3.3: Comparison with previous results on DS2

Reference SEN (%) PPV (%) F1 (%) Notes

de Chazal et al. [DCOR04, Table 8] 77.7 81.6 79.6Multi-classMulti-lead

Llamedo & Martinez[LM10, Table 6]

82.9 88.0 85.4Multi-classMulti-lead

Llamedo & Martinez[LM12, Discussion]

92.2 94.8 93.5Expert-assisted

Multi-classMulti-lead

Oster et al. [OBS+15, Table 4] 92.7 96.2 94.5Expert-assisted

Multi-lead

Alfaras et al. [ASO19, Table 7] 92.7 95.7 94.2

Sultan Qurraie& Ghorbani Afkhami

[QA17, Table 4]

95.4 94.1 94.8 Multi-class

This work 94.8 95.5 95.1

and normal beats (Physionet annotation code F). Since our ventricular ectopy class

does not contain the fusion class, this result may not be comparable. Another expert-

assisted approach which features manual annotation of an average of three beats from

each 30-minute recording in DS2 achieved an F1 score of 94.5% [OBS+15] on the

same binary classification problem we consider in this work. Oster et al. consider

two leads, and they report that discarding noisy beats (3.2% of all beats in DS2) can

improve the F1 score to 98.6%. For ultra-long-term signals (≥ 14 days) we expect

that the number of beats which need to be manually annotated in order to achieve

the same performance would increase.

A recent single-lead automatic ventricular ectopy detector which uses ring-topology

reservoir computing (echo state networks) achieved a SEN of 92.7% and a PPV of

95.7% when detecting the same set of ventricular ectopic beats that we consider in this

55

work [ASO19, Table 7]. Alfaras et al. also report performance on the second available

lead in the MIT-BIH Arrhythmia database; they obtain a SEN of 86.1% and a PPV of

75.1% on DS2, but we obtain a SEN of 79.3% and a PPV of 89.6%. In our experiments,

their ensemble of 30 networks (with 1000 nodes each) ironically suffers from long

computation times. Nevertheless, their reported performance is not as strong as what

we are able to achieve. We report one final heartbeat classifier by Sultan Qurraie and

Ghorbani Afkhami which uses time-frequency analysis (the Wigner-Ville distribution)

[QA17]. They achieve a SEN of 95.4% and a PPV of 94.1% on DS2 using only the first

lead (although three ventricular ectopic beats appear to be missing from DS2). While

this result is roughly equivalent to ours, the algorithm is undoubtedly numerically

inefficient due to its use of time-frequency methods. Moreover, since the authors do

not validate on multiple databases, their parameters may be over-tuned to perform

well on DS2.

3.1.5 Discussion

In this study, we design and validate an automatic, single-lead ventricular ectopy

detector which achieves an F1 score of 95.12% on the validation database DS2. The

algorithm uses subject-specific normalization of six features derived from the RR

interval time series and beat morphology. We ensure that our algorithm is not

confounded by the QRS complex annotation method (whether it be automatic or

manual) by, when possible, replacing each QRS complex annotation with one generated

by our own high-accuracy QRS detector. We also propose a novel feature to describe

beat morphology which involves comparing heartbeat histograms using the cross-bin

distance. The final classifier we use to solve the binary classification problem is an

ensemble of binary decision trees, aggregated using the AdaBoost algorithm with 50

learning cycles. Our detector generalizes well to other databases, achieving an F1

56

score of 91.23% on the St. Petersburg INCART Arrhythmia database.

Our algorithm is intended to be able to perform on new systems in mobile health

monitoring such as the ZIO Patch cardiac monitor (iRhythm Technologies, Inc., San

Fransisco, California, USA), so it is not designed to leverage all of the information

present in a multi-lead ECG recording. However, it appears that for ventricular ectopy

detection, one lead is enough to achieve a satisfactory result. In the future, we will

report work which leverages multiple leads to obtain information from ECG recordings

with higher precision. We note that since we use the diffusion maps algorithm in

this work (which does not scale well with the number of beats in a recording), long

recordings should be analyzed in 30-minute segments.

We provide a brief discussion of technical details in this work. The first one is the

use of histograms, as well as the so-called quadratic form distance between histograms.

Niblack et al. describe the use of histograms and the quadratic form distance for

searching large databases of images [NBE+93]. Hafner et al. describe the use of

histograms and the quadratic form distance for performing color-based indexing of

images [HSE+95]. Rubner et al. compare the cross-bin distance to the earth mover’s

distance [RPTB01, RTG00]. Histograms have been used for classifying images and

textures in images [HGN04, LO06]. The quadratic form distance we use in this work

is a Bregman divergence with a positive-definite kernel, but as far as we know, its

specific theoretical properties have not been studied. The properties it apparently

enjoys is a robustness to translation (of the beat, due to possible misalignment of the

R peak) and a robustness to measurement noise. Further work is required to verify the

properties of this comparison tool, but we find its usefulness evident in this application.

Since our algorithm uses the distance to the median beat to detect ventricular ectopy,

we are implicitly assuming that the most common beat in each recording is not a

ventricular ectopic beat. We believe that this assumption is warranted, as the subject

57

in DS2 with the highest percentage of ventricular ectopic beats had only 32%.

In the algorithm of Llamedo and Martinez [LM10, LM12], both the current RR

interval and the average of the previous RR intervals are considered as features (after

applying the natural logarithm). Of course, this approach differs from ours in two

significant ways. First, the average is used instead of the median; while the median

operator is insensitive to outliers, the mean operator is not. Second, we (roughly

speaking) consider the difference between these features, while Llamedo and Martinez

feed them separately into the classifier. We believe Llamedo and Martinez’s approach

leads to over-fitting, as the classifier will attempt to draw connections between baseline

heart rate and PVC frequency. In general, it is very easy for a classifier which relies on

non-relative features such as heart rate or beat morphology to over-fit to the training

data. In the case of heart rate, the reason is that the classifier is provided with a

large number of PVCs with similar RR intervals. In the case of beat morphology, the

reason is that the classifier is provided with a large number of morphologically similar

PVCs. If the resolving power of the classifier is high enough (such as with a deep

convolutional neural network [LBBH98]), the model will be able to recognize only

those ventricular ectopic beats which demonstrate high morphological similarity to

at least one PVC morphology in the training database. However, since the training

database is small (in terms of the number of distinct PVC morphologies available

to us), and due to the large variety of distinct PVC morphologies encountered in

practice, this kind of classifier is undesirable.

We had the option to use the temporal coupling between PVCs and non-PVCs as

a feature in our classifier (via, for example, long short-term memory networks [HS97]).

Doing so led to an increased F1 score on DS2 but a decreased F1 score on the INCART

database. Since these scores were only slightly better than what we obtained using a

boosted ensemble of decision trees (except on the INCART database), we decided

58

that the increased computational load was not worth the improvement. An advantage

to using decision trees to perform classification is that we do not have to map features

into a small interval by a process such as z-scoring or min-max scaling before training

the classifier. As the problem of ventricular ectopy detection is largely a problem of

anomaly detection in which we cannot assume our features are distributed normally

(or uniformly) and the number of outliers is high, this kind of normalization step

would be inappropriate. All parameters used in the training algorithm were chosen

ad hoc and without performing any hyperparameter optimization. Our algorithm

is very fast and is suitable for analyzing long-term signals, but since we need to

resample the signal to 2000 Hz and apply the diffusion maps algorithm, the signal

should be analyzed in chunks in order to minimize memory usage. Without including

the QRS detection stage, the feature building and classification stages took only 330

milliseconds per 30-minute recording. All calculations were performed in MATLAB

2019a on an Intel i7-4790K processor at 3.60 GHz with 24 GB of RAM.

This study has several limitations. First, our algorithm suffers from limited

training data. While we follow standard procedure in the literature and train our

algorithm on DS1, an algorithm intended for clinical deployment should be trained

on a larger database or aggregation of databases. Another limitation of our algorithm

is that it requires an automatic signal quality classifier to identify when the ECG is

readable. The databases used in this study are well-behaved as far as readability, so

this is not a serious issue but should be considered when applying the algorithm in a

real-world setting.

3.2 Electrocardiogram-derived Respiration

Respiratory signals can be used for a variety of purposes; respiratory rate variability

and the variability inherent in the respiratory waveform is used to monitor respiratory

59

health and to predict respiratory failure [FCEH03]. In the pulmonary clinic, respiratory

rate is measured by counting the number of breaths a patient takes in a one-minute

period. Unfortunately, this process is error-prone [STSG85, SRB+91]. It also provides

minimal information about the respiratory system; the respiratory waveform holds

crucial information about the structure of the pulmonary system [Tob88]. Existing

methods for extracting this information are uncomfortable for the patient and not

ideal for long-term, mobile monitoring [FGHS02]. We are interested in estimating

respiratory rate and the respiratory waveform through easy-to-use, mobile-capable,

and non-invasive methods.

The respiratory signal can be extracted as the amplitude modulation of the ECG

signal. When the patient inhales, transthoracic electrical impedance increases, and the

ECG electrode moves further from the heart; consequently, the amplitude of the ECG

signal decreases. On the other hand, the amplitude of the ECG signal increases when

the patient exhales via the complementary mechanism. Electrocardiogram-derived

respiration (EDR) is an old technique which uses a multiple- or single-channel ECG

to estimate respiratory activity. Deriving the respiratory signal from a single-channel

ECG has been attempted using a number of methods. Traditionally, the EDR signal

is obtained by interpolating the time series of RS heights as follows:

EDR (ri/fs) := E(ri)− E(si), (3.30)

where E is the discretized ECG signal, fs is the sampling rate, ri is the location of

the i-th R peak, and si is the location of the following S peak, using piecewise cubic

spline interpolation [CAM+06, Chapter 8]. Another landmark-based method involves

estimating the area of each QRS complex [CAM+06, Chapter 8]. In [LBM10], principal

component analysis (PCA) is used to obtain an EDR signal, and in [WVD+12], kernel

principal component analysis (kPCA) is used. Since respiratory dynamics are known

to manifest as the amplitude modulation in the cardiac waveform, recovering the

60

respiratory signal via the wave-shape oscillatory model should be achievable. In this

section, we present a proof-of-concept approach to obtaining an EDR signal from a

non-traditional perspective.

3.2.1 Modeling Respiratory Dynamics in the ECG

Denote the ECG signal as E : R → R, and assume it adheres to the wave-shape

oscillatory model. We assume that the phase manifold N hosts respiratory dynamics

in the following sense. Suppose there is a smooth vector field X on N that describes

the respiratory drive (the instantaneous velocity of the respiratory center’s input).

We let

θt : N → N (3.31)

be the flow of X. More generally, we could assume that θt satisfies the following

stochastic differential equation defined on N :

dθt = Xθtdt+ dωt, (3.32)

where ω is Brownian motion on N . In either case, θ′t(p) describes the respiratory

drive after time t with initial state p ∈ N . In general, the phase manifold N is

not accessible. However, as discussed above, the ECG signal is a sensor which can

occasionally observe the phase manifold. To model the observation process, we assume

the existence of a diffeomorphism ρ:N →M that maps N to the wave-shape manifold

M associated with the ECG signal. In this case, the flow θt is mapped to a flow

ρ∗θt = ρθt defined onM. Clearly, under the model (3.31), ρ∗θt is the flow associated

with the vector field ρ∗X defined on M. Under the model (3.32), by Ito’s formula,

ρ∗θt satisfies the following stochastic differential equation defined on M:

d(ρ∗θt) =

(1

2∆ρ|θt+∇ρ|θtXθt

)dt+∇ρ|θtdωt. (3.33)

61

This model can be generalized to other physiological oscillatory signals, while the

interpretation of θt, N , and ρ will be different. In general, θt represents the intrinsic

respiratory dynamics, and ρ∗θt represents the observable respiratory dynamics. We

now extend this discussion to obtain a new ECG-derived respiration algorithm.

3.2.2 DM-EDR

We provide a detailed description of our proposed algorithm (coined DM-EDR) for

extracting respiratory dynamics from a single-lead ECG signal. The raw ECG signal

is first upsampled to a sampling rate of 1000 Hz. Noise in the ECG signal is surpressed,

baseline wandering is removed, and the P and T waves are suppressed by applying

a bi-directional third-order Butterworth bandpass filter with cutoff frequencies 3-40

Hz. We detect the R peaks in the recording by applying a standard high-accuracy

QRS detector [Elg13]. This step allows us to delineate cardiac cycles. Write the

pre-processed ECG signal as a vector E ∈ Rn, where n = fs × T is the number of

samples, fs = 1000 Hz is the sampling rate of the signal, and T is the duration of

the recording in seconds. Suppose there are N detected R peaks in the ECG signal.

Write the location in samples of the ith R peak as ri. For the purpose of analyzing

the traditional EDR signal, the S peak following each R peak is determined by

si = arg minri+1≤t≤ri+60E(t) , (3.34)

where 60 ms is chosen according to normal electrophysiology. By defining a window

surrounding each R peak that extends 100 ms to the left and 100 ms to the right,

we are able to excise each QRS complex. (Recall that the P and T waves have been

suppressed.)

X(i, j + 101) := E (ri + j) −100 ≤ j ≤ 100, (3.35)

Clearly, X(i, j) ∈ RN×p, where p = 201. According to the wave-shape manifold model,

the set of QRS complexes lie on a low-dimensional manifold embedded in Rp. Note

62

that the row-ordering of X subliminally encodes the temporal information, but we will

not use it to recover the wave-shape manifold. To recover the wave-shape manifold

and the respiratory dynamics it hosts, we apply the diffusion maps algorithm as

follows. We build a self-tuning affinity matrix W ∈ RN×N by setting

W(i, j) =1

2

[exp

[−

p∑k=1

(X((i, k)−X((j, k))2

σ2i

]+ exp

[−

p∑k=1

(X((i, k)−X((j, k))2

σ2j

]],

(3.36)




W(α) = D−1WD−1, (3.37)


D(i, i) =N∑j=1

W(i, j). (3.38)


P =(D(α)

)−1/2W(α)

(D(α)

)−1/2, (3.39)


D(α)(i, i) =N∑j=1

W(α)(i, j). (3.40)

We obtain the top non-trivial eigenvector φ ∈ RN of P, and we set ϕ ∈ RN to be the

vector

ϕ =(D(α)

)−1/2φ. (3.41)

Finally the DM-EDR is obtained by interpolating the function

DM-EDR (ri/fs) := ϕ(i) (3.42)

63

over the duration of the recording at a uniform sampling rate of 10 Hz. (We use

piecewise cubic spline interpolation.) Note that, in the case of EDR, due to the

interactions between the eigenfunctions and the dynamics on the manifold, it is not

guaranteed which eigenfunction will most accurately capture the respiratory dynamics.

To reduce the chance of confoundment (following a previous approach [WVD+12]), we

force our algorithm to rely solely on QRS complex morphology. In general, automating

a selection of the best eigenfunction requires further ingenuity. In the worst case, the

respiratory information may be spread non-linearly across multiple eigenfunctions. In

the following section, we provide an explicit example in which we need to adjust the

algorithm when a subject exhibits premature ventricular contractions (PVCs).

3.2.3 Material and Results

The ECG signals featured in our experiment are from a set of standard overnight

polysomnograms which were recorded to confirm the presence of sleep apnea syndrome

in subjects suspected of sleep apnea at the sleep center in Chang Gung Memorial

Hospital (CGMH), Linkou, Taoyuan, Taiwan. The Institutional Review Board of

CGMH approved the study protocol (No. 101-4968A3). All recordings were acquired

on the Alice 5 data acquisition system (Philips Respironics, Murrysville, PA). One

subject with PVC arrhythmia is chosen manually from the database. We focus on

the first lead of the ECG recording, which, before upsampling, is sampled at 200

Hz. An automated signal quality index called rSQI [BOLC13] is used to identify

15-minute segments of each ECG signal which can be trusted for analysis. We visualize

the dynamics in each 15-minute ECG signal in two ways. First, we visualize the

wave-shape manifold by showing the image of the time-1 diffusion map. In Figure 3.4,

we show that one subject has a wave-shape manifold roughly characterized in terms

of R peak height and the interval between the R peak and the S peak.

64

Figure 3.4: A mapping of the set of QRS complexes to R2 using the top twonon-trivial coordinates of the time-1 diffusion map. Each point corresponds to oneQRS complex in the recording. Left: we colour each point by the amplitude of the Rpeak; right: we colour each point by the length of the corresponding R-to-S interval,measured in time from the R peak to the S peak. The two embeddings are otherwiseidentical.

To visualize the dynamics on the wave-shape manifold, we turn to the interpolated

eigenfunctions. In Figure 3.5, we show that respiratory dynamics are encoded and well-

captured by the top non-trivial eigenvector of the diffusion kernel. The accompanying

airflow signal (Flow) from the PSG is used to confirm the accuracy of the recovery.

The EDR signal obtained using the traditional method appears almost identical to

the DM-EDR signal. In Figure 3.6, we show that the top non-trivial eigenvector of the

diffusion kernel is sensitive to PVC activity. This result comes from the mixing effect

of the eigenvectors. While PVCs lie on a disconnected component of the wave-shape

manifold, we do not have this information a priori, and in the DM-EDR algorithm, a

complete graph is constructed. As a result, the eigenvectors are distorted. To obtain

the respiratory dynamics of a patient with PVC arrhythmia, we should apply the DM

to only those cardiac cycles which are classified as normal (by a PVC detector such

as the one described in Section ). The result is shown as ϕ in Figure 3.6, wherein

65

we additionally ignore the beats that are directly adjacent to any PVC beat when

calculating the diffusion map.

Figure 3.5: The EDR signal (obtained from either the traditional method or theDM-EDR algorithm) well-approximates the respiratory flow signal.

While no quantitative performance measures are provided, this series of results

demonstrates that, based on the wave-shape oscillatory model, we can extract res-

piratory information from a non-traditional viewpoint. We do not claim that the

proposed approach outperforms the traditional one, but this result suggests that we

may combine the new approach with the traditional one to obtain a better estimate

for the respiratory information. We will systematically explore this possibility in

future work.

3.3 Intraoperative Cardiac Monitoring

In this section, we provide an example showcasing how the wave-shape oscillatory

model can be used to visualize the response of the cardiovascular system to a myocardial

66

Figure 3.6: Morphological differences between normal and PVC beats are reflectedin the top non-trivial eigenvector of the diffusion kernel. Because of the mixing ofeigenvectors, compensating for the presence of abnormal beats by ignoring them duringinterpolation does not yield respiratory information (ϕnormal). Alternatively, usingthe diffusion maps algorithm to recover each connected component of the wave-shapemanifold individually succeeds (ϕ).

infarction event. In the operating room, we have at least two methods for monitoring

the cardiovascular system: the ECG signal records the electrical activity of the heart,

and the arterial blood pressure (ABP) signal records the fluid pressure in either the

radial or the femoral artery [KJH+13]. We show that both depictions of the cardiac

cycle present the same dynamical information from different viewpoints. To this

end, we study a particular intraoperative phenomenon. We analyze a single case of

intraoperative myocardial infarction, wherein the patient experienced a heart attack

during his or her open heart surgical procedure. During the surgery, there was a

sudden appearance of ST segment elevation in the ECG signal (see Figure 3.7). An

immediate trans-esophageal echocardiogram examination and a direct assessment of

the open chest by the surgeon showed an abnormal heart muscle movement which

67

Figure 3.7: On the left, we plot every tenth ECG pulse preceding the MI event. Inthe middle, we plot pulses occuring during the MI event. On the right, we plot pulsesfollowing the MI event. The beats are coloured according to their order of appearance;blue is early, and red is late. An increase in ST segment elevation is noticed duringand following the MI event.

fulfilled the diagnostic criteria for a myocardial infarction (MI) event. The ST segment

elevation in the ECG waveform is obvious to the trained eye, but morphological

changes to the ABP waveform are less evident.

We are broadly interested in the physiological response of the cardiovascular system

during myocardial infarction events. To this end, we analyze pulse waveforms obtained

from both the ECG and the ABP signal. The physiological time series were collected

using the Philips IntelliVue patient monitor. The signals were collected from the

patient monitor via a third-party data dumping system called ixTrend Express 2.1

(ixellence GmbH, Wildau, Germany). The ECG signal (lead II of the EASI Lead

System) was sampled at 500 Hz, and the ABP signal was sampled at 125 Hz. The

signals used are from the study approved by Shin Kong Wu Ho-Su Memorial Hospital,

Taipei, Taiwan. The Institutional Review Board approved the study protocol (No.

20160106R). Written informed consent was obtained from the patient. Our recordings

are 16 minutes long.

68

Dynamics in the ECG Signal

To demonstrate the proposed algorithm and obtain the desired dynamical information

from the ECG signal, we carry out the following calculations. Noise in the ECG

signal is supressed and baseline wandering is removed by applying a bi-directional,

third-order Butterworth bandpass filter with cutoff frequencies 0.5-40 Hz. We detect

the R peaks in the recording by applying a standard high-accuracy QRS detector

[Elg13]. Write the pre-processed ECG signal as a vector E ∈ Rn, where n = fs × T

is the number of samples, fs = 500 is the sampling rate, and T is the duration of

the recording in seconds. Suppose that there are N detected R peaks in the ECG

signal. Write the location in samples of the i-th R peak as ri. By defining a window

surrounding each R peak which stretches from the beginning of the P wave to the end

of the T wave, we are able to extract each cardiac cycle. In this example, we extract

the i-th beat by performing

X(i, j + 176) := E (ri + j) −175 ≤ j ≤ 175 (3.43)

Consequently, X is an N × p matrix, where p = 351. According to Section 1.3, we

assume the set of heartbeats lies on a low-dimensional manifold embedded in Rp. We

recover this manifold by approximating the eigenfunctions of its Laplace-Beltrami

operator. Indeed, we consider the self-tuning Gaussian kernel

W(i, j) =1

2

[exp

[−

p∑k=1

(X((i, k)−X((j, k))2

σ2i

]+ exp

[−

p∑k=1

(X((i, k)−X((j, k))2

σ2j

]],

(3.44)




W(α) = D−1WD−1, (3.45)

69


D(i, i) =N∑j=1

W(i, j). (3.46)


P =(D(α)

)−1/2W(α)

(D(α)

)−1/2, (3.47)


D(α)(i, i) =N∑j=1

W(α)(i, j). (3.48)

We obtain the top three non-trivial eigenvectors of P with their corresponding

eigenvalues, namely

φ1, φ2, φ3 ∈ RN 1 > λ1 ≥ λ2 ≥ λ3. (3.49)


Φ1 =[λ1ϕ1 λ2ϕ2 λ3ϕ3

]∈ RN×3, (3.50)

where ϕi =(D(α)

)−1/2φi ∈ RN are the top non-trivial (unnormalized) eigenvectors of

the diffusion kernel.

Dynamics in the ABP Signal

We now show that the same dynamical information is available in the simultaneously

recorded ABP signal, even though it may not be apparent to the human eye (see

Figure 3.9). First, we upsample the ABP signal to 500 Hz. To obtain all pulses from

the ABP signal, we perform peak detection using an algorithm designed for PPG

signals [ENB+13]. Suppose the ABP time series is discretized as A ∈ Rn, where

n = fs × T , fs = 500 is the sampling rate, and T is the duration of the recording

70

Figure 3.8: A mapping of the ECG waveforms to R3 afforded by the first threenon-trivial coordinates of the diffusion map. The embedded points are colouredaccording to their order of appearance; blue is early, and red is late. Beats featuringST segment elevation appear later in the recording.

71

in seconds. Suppose there are N detected peaks, and their locations in samples are

aiNi=1. Delineate the i-th cycle in the ABP signal as

X(i, j + b0.13× fsc+ 1) := A(ai + j) −b0.13× fsc ≤ j ≤ b0.5× fsc. (3.51)

Clearly, X ∈ RN×p, where p := 316. Next, we remove the blood pressure information

by normalizing each row of X. Set

X(i, j) := σ−1i (X(i, j)− µi) j = 1, . . . , p, (3.52)

where µi is the mean of the i-th row of X, and σi is its standard deviation. The point

cloud Xini=1 encodes the morphological information of the ABP waveform, and the

temporal information is preserved in the sequence aiNi=1. We carry out the diffusion

maps algorithm precisely as in the case of the ECG signal (with X in place of X).

The result is depicted in Figure 3.10. It is worth mentioning that since the blood

pressure information has been removed (detrended), the embedding only reflects the

dynamics encoded in the time-varying oscillatory morphology of the ABP pulses.

This finding coincides with our physiological knowledge: the stimulus invokes a series

of physiological responses including vasoconstriction, increased heart contractility,

and their subsequent interactions. The impact of the stimulus decays after a while,

and the physiological status gradually returns to normal. This example serves to

demonstrate the usefulness of the wave-shape oscillatory model in decoding dynamics

that are difficult to observe with the human eye.

72

Figure 3.9: On the left, we plot every tenth ABP pulse preceding the MI event. Inthe middle, we plot pulses occuring during the MI event. On the right, we plot pulsesfollowing the MI event. The beats are coloured according to their order of appearance;blue is early, and red is late. It is difficult to pinpoint the morphological differencesbetween these waveforms.

Figure 3.10: A mapping of the ABP pulse waveforms to R2 afforded by the first andthird non-trivial coordinates of the diffusion map. The embedded points are colouredaccording to their order of appearance; blue is early, and red is late. The yellow pointscorrespond to pulses occuring during the MI event.

73

Chapter 4

Decoding Sleep Dynamics using Multiple

EEG Channels

The electroencephalogram (EEG) does not adhere to the wave-shape oscillatory model,

but there is still a sense in which EEG signals are observations of a process on a

manifold hosting physiological states. In this chapter, we will consider sleep dynamics;

a healthy sleeper naturally oscillates between the REM (rapid eye movement) sleep

stage and the non-REM sleep stages: N1, N2, and N3. We are interested in detecting

and quantifying these changes in state. While transitions in sleep stage are marked by

more than just brain activity, we will nevertheless use only the EEG to assess these

dynamics.

For the simple reason that EEG signals are exceptionally challenging to model, we

call them chaotic. EEG signals appear to be summations of numerous intermittent

oscillatory “bursts” of electrical energy, each of which has a well-localized fundamental

frequency. These bursts of energy (called neural oscillations or brainwaves) correspond

to neuronal activity and are associated with cognitive processing. The spectral content

of a subject’s EEG signal is known to vary with his or her state of consciousness.

Activity in the delta band (0-4 Hz) is associated with unconsciousness or deep sleep,

activity in the theta band (4-8 Hz) is associated with meditation and drowsiness,

activity in the alpha band (8-16 Hz) is associated with wakeful relaxation, the beta

band (16-32 Hz) is associated with active thinking, and the gamma band (32-100 Hz)

is associated with a phenomenon termed unified perception, in which different parts of

the brain work together to accomplish demanding cognitive tasks. In this sense, the

time-localized spectrum of the EEG signal is a non-negative function on the real line

74

which provides some coarse information about the state of the brain. In this work, we

will assume that the space of such functions is a smooth, compact, low-dimensional

Riemannian manifold embedded in L2(R) which is diffeomorphic to the phase manifold

(as in the wave-shape manifold model). In principle, the time-varying power spectrum

of the EEG signal is akin to the time-varying morphology of the cardiac waveform.

In this work, we localize the spectral content of the EEG signal in time using the

continuous wavelet transform. We can use the resulting time-frequency representation

to observe sleep dynamics. One clear advantage to this approach is that we can

localize the spectrum of the EEG signal at any point in time (as opposed to only

at the incidence of certain landmarks), and we take the opportunity to observe the

spectrum at regularly sampled time points.

This chapter is organized in the following way. First, we apply the diffusion maps

algorithm to visualize the neural dynamics encoded by the changing power spectrum

of a sleeping subject’s EEG signal. Then, we describe a clinical application of these

principles; an automatic sleep stage classifier is built and validated on an annotated

database of EEG signals. We discuss how this algorithm is designed to leverage

information from multiple EEG channels (and hence multiple locations in the brain);

canonical correlation analysis is applied to preserve information that is intrinsic to

the current sleep stage and to ignore nuisance information. We provide a geometric

interpretation of the algorithm. The dual-channel automatic sleep stage classifier

that we build achieves an accuracy of 80.27% and a macro F1 score of 72.88% on our

database of 60 subjects without sleep apnea.

4.1 Background and Significance

The physiological importance of sleep in undeniable. During sleep, the body enters

a restorative state, secreting anabolic hormones which help to repair damaged cells

75

and remove metabolic waste products [GSK+04, XKX+13]. Sleep deprivation is

associated with immune system impairment and impaired cognitive function, among

other negative effects [PH96, CA+06]. Healthy sleep additionally refers to achieving an

appropriate balance between time spent in the REM and non-REM stages [KTR+94].

Slow-wave sleep is especially important as 70% of all human growth hormone secretion

occurs during this phase [VCP96, VCLP00]. This means that, for some sleep disorders,

sleeping for longer periods of time will have no effect on symptoms. Sleep disorders are

placed into one of three categories, namely: dyssomnias (e.g. insomnia or obstructive

sleep apnea); parasomnias (e.g. night terrors or sleep walking); and circadian rhythm

sleep disorders (that are symptomatic of biological clock disorientation).

Sleep disorders are assessed in a sleep lab. A subject suspected of having a

sleep disorder sleeps in the lab overnight, and his or her vital signs are continuously

monitored. A polysomnogram (PSG) includes multiple EEG, EOG, and EMG channels.

The respiratory and cardiac systems are also monitored. In total, subjects can be

monitored using as many as 25 different sensors. The 6-8 hour PSG is then manually

annotated by a trained sleep technologist. This process is a time-consuming and

repetitive task; as such, it is expensive and error-prone. Inter-rater agreement is often

low, even when 25 channels are available. An automatic algorithm for sleep stage

annotation is needed. The challenges facing the development of an automatic sleep

stage classifier include the lack of inter-rater agreement and confounding factors such

as sleep apnea events. When dealing with EEG recordings in particular, one must

overcome the high level of background noise in the EEG and the low spatial resolution

of a single EEG channel.

76

4.2 Unsupervised Extraction of Sleep Dynamics

An EEG recording x ∈ L2loc(R) is a chaotic time series whose spectral content describes

the cognitive state of the subject being monitored. To access the spectral content

at each point in time, we apply the continuous wavelet transform (CWT) which is

commonly considered in the neuroscience community [TO19, MFMTH+14, TMG16,

Aki02]. In particular, if ψ ∈ L2(R) is a mother wavelet satisfying ψ(0) = 0 with center

frequency 1, the CWT of x is defined as

W (ψ)x (a, b) =

1√|a|

∫Rx(t)ψ

(t− ba

)dt a, b ∈ R, (4.1)

where ψ denotes complex conjugation. The (non-conventional) power spectrum is

then |W (ψ)x (a, b)| and represents the energy that x has near the center frequency a at

time b ∈ R. An important property of the CWT is that the bandwidth of the filter

response

1√|a|ψ (•/a) (ξ) =

√|a|ψ(aξ) ξ ∈ R (4.2)

grows with increasing |a|. As a result, we can obtain a covering of the positive

frequency axis using a discrete set of center frequencies ai = 2i/Q∞i=1 that are are

linearly distributed in the logarithmic scale. (The parameter Q is called the number

of wavelets per octave.) This property is evidently beneficial for EEG analysis because

of how the delta, theta, beta, alpha, and gamma frequency bands are also linearly

distributed in the logarithmic scale. In our model for sleep dynamics, we will assume

the existence of a phase manifold as in Section 1.2. We additionally assume that the

locally-averaged power spectra

a 7→∫R|W (ψ)

x (a, b− t)|φ(t) dt ∈ L2(R) (4.3)

(where φ is the scaling function associated to ψ) are sampled from a smooth, compact,

low-dimensional (Riemannian) manifoldM that is diffeomorphic to the phase manifold,

77

where the lack of isometry indicates the distortion incurred during the observation

process. To recover M and the dynamics on its surface, we take the opportunity to

calculate a finite number of these power spectra at a sampling interval of δ > 0, in

which case the sampling points are

δ − t0, 2δ − t0, ..., Nδ − t0 ⊂ R, (4.4)

where t0 is an appropriate time shift, and N is the number of samples. We then apply

the diffusion maps algorithm to this finite set of locally-averaged power spectra to

recover the underlying manifold M. The result for one EEG recording is shown in

Figure 4.1.

The recovery of the phase manifold (up to diffeomorphism) provided by the

diffusion maps algorithm shows embedded power spectra that are implicitly organized

according to sleep stage. The embedding has not fully recovered the difference between

the REM and N1 stage, but this result is to be expected because an EOG channel

(which tracks eye movement) is additionally required to differentiate between these

sleep stages. In the first non-trivial coordinate of the time-1 diffusion map (see

Figure 4.2), we see a recovery of the oscillation between the REM and NREM stages.

The second non-trivial coordinate also exhibits periodicity.

This method evidently has the potential for a clinical application, and in the

rest of this chapter, we make progress in that direction. In short, we use the CWT

features extracted from the EEG channel to train a supervised automatic sleep stage

classifier. To this end, we need to make a few considerations. First, the algorithm

should be trained on a large number of annotated EEG recordings. Incorporating

EEG recordings (and their associated power spectra) from a larger number of subjects

into the embedding would serve to stabilize it. However, the lack of scalability of

kernel-based methods for manifold learning means that the diffusion maps algorithm

will have to be set aside in favor of a simpler method. This method should further

78

allow yet unseen recordings to be easily read, which kernel-based methods have a hard

time achieving. Second, to increase the accuracy of our classifier and mimic what

is done by sleep technologists, we should take multiple EEG channels into account.

After all, a severe limitation of the EEG device is the high level of background noise

it contains, and combining information from multiple EEG channels is a common

approach to signal de-noising (as often done when studying event-related potentials).

A single EEG channel is furthermore limited spatially; it can only “see” a small

portion of the brain. We will take these considerations into account when designing

our automatic sleep stage classifier.

Figure 4.1: A mapping of N = 734 locally averaged, time-localized power spectrainto R3 afforded by the top three non-trivial coordinates of the time-1 diffusion map.We color the embedded points using their associated annotations. The embeddinghas been rotated in R3 to provide a better view.

79

Figure 4.2: A projection of the morphological dynamics on the recovered phasemanifold. In green, we plot the sequence of annotations provided by the expertannotator. In red, we plot the first non-trivial eigenvector of the diffusion kernel. Inteal, we plot the second non-trivial eigenvector of the diffusion kernel.

4.3 Methods

We describe the the process of building and evaluating an automatic sleep stage

classifier that is suitable for clinical deployment. We begin with a description of our

proposed algorithm and its variations. We then discuss the database used to validate

our model, as well as the metrics used to assess our model’s performance. We briefly

describe the inter-subject cross-validation procedure we employ which assesses how

80

well our model can generalize to unseen subjects.

4.3.1 Proposed Algorithm

Our automatic sleep stage classification algorithm takes as input two simultaneously

recorded EEG channels x1, x2 ∈ Rn with sampling rate fs = 100 Hz, so that the

recording duration is 0.01n seconds. (If the native sampling rate is not 100 Hz, we

should resample the signals.) We will assume that 0.01n is an integer multiple of 30, so

that there are N = 0.01n/30 30-second epochs in the recording. (By convention, the sleep

technologist makes sleep stage annotations every 30 seconds.) We take an unsupervised

approach to feature extraction. We apply the continuous wavelet transform to each

signal. We provide the technical details here. Take an analytic Morlet mother wavelet

ψ:R→ R with its dilations ψa(t) = ψ(t/a) and their discretizations ψa ∈ RT defined

in the frequency domain as

ψa(j) = 2 exp

[−(aω(j)− ω0)2

2

]U(aω(j)) j = 1, ..., N, (4.5)

where the center frequency of ψ is ω0 = 6 radians per second, U :R→ R is defined as

U(t) :=

1 if t > 00 otherwise,

(4.6)

and ω is the discretized Fourier domain:

ω(j) =

2π(j−1)

Nif 1 ≤ j ≤ N/2 + 1

2π(j−1)N− 2π if N/2 + 1 < j ≤ N.

(4.7)

We are interested in capturing frequencies between 0.2 and 40 Hz. As such, we set

a0 =ω0fs

40× 2π(4.8)

to be the smallest wavelet scale. To obtain the other scales, we set Q = 8 wavelets

per octave, and for each j = 1, ..., 61, we set aj = a02j/Q. The largest wavelet scale

81

a61 corresponds to a center frequency of approximately 0.2026 Hz. We perform

convolution of the signal xi with the dilated wavelet ψajby multiplying in the Fourier

domain:

yj = FFT(xi) ψaj0 ≤ j ≤ 61, (4.9)

where FFT denotes the n-point fast Fourier transform. We then calculate the discrete

time-frequency representation Wi ∈ R62×n by performing

Wi (j, :) = IFFT(yj). (4.10)

To get the locally-averaged power spectrum Pi ∈ R62×3N , we perform

Pi(j, k) = mean|Wi(j, t)| : |t− k|≤ 10fs k = 10fs, 20fs, ..., 3N. (4.11)

Each column of Pi can be thought of as a non-negative signal sampled at 0.1 Hz,

and Pi(j, k) represents the average spectral information of xi during the time period

0.1k ± 0.1 seconds near the center frequency ω0fs/2πaj Hz. We build a 186×N matrix

Xi satisfying

Xi(: , k) =

log (Pi(: , 3k − 2) + ε)log (Pi(: , 3k − 1) + ε)log (Pi(: , 3k − 0) + ε)

k = 1, ..., N, (4.12)

where ε = 2−52. This matrix Xi represents a point cloud in R186, where the k-th point

(the k-th column of Xi) corresponds to the k-th 30-second epoch of the time series xi.

Uncovering the low-dimensional structure underlying this point cloud will allow us to

demarcate sleep dynamics.

Sensor Fusion

We are provided with two matrices X1,X2 ∈ R186×N which describe observations

of a latent space (the phase manifold). A variety of models (with varying degrees

of abstraction) can be considered for the sensor fusion problem. In all cases, the

82

dynamics of interest will be hosted by the latent manifold N , and the information

observed by the two sensors will be hosted on the manifolds M1 and M2. Now N

represents the common information between M1 and M2 in the sense that there are

surjective submersions ρ1:M1 → N and ρ2:M2 → N . The observed dynamics in the

first sensor could be described by a function ϕ1:R→M1, and likewise in the second

sensor by a function ϕ2:R → M2. (The dynamics can be thought of as a particle

traversing each manifold.) The dynamics of interest are then ρ1 ϕ1 = ρ2 ϕ2, where

we have equality because of the synchronization in time. We observe the dynamics in

both sensors at a finite set of time points t1 < · · · < tN . Supposing M1 and M2 are

embedded in R186, the information we are provided with is

X1 =

ϕ1(t1)>

...ϕ1(tN)>

X2 =

ϕ2(t1)>

...ϕ2(tN)>

, (4.13)

and we want to use both point clouds to recover the manifold N (and hence the

dynamics ρ1 ϕ1 = ρ2 ϕ2). A recovery of N is an embedding (isometrically, if

possible) of N into some (low-dimensional) Euclidean space. A special case which

has been previously analyzed is when ρ1 and ρ2 are diffeomorphisms [TW19]. In

this general setting, alternating diffusion or multi-view diffusion maps can obtain the

desired recovery. However, due to the problems with these methods outlined above,

we approach the task of recovery from a linear point of view.

The linear method we will employ relies on the following simplified model. Suppose

thatN is embedded in Rp for p ≤ 186. Suppose ρ1 is a 186×p matrix with orthonormal

columns, and ρ2 is a 186× p matrix with orthonormal columns. When p = 186, we

can obtain M1 and M2 from N by simply rotating the ambient space R186. When

p < 186, there is an opportunity for N to be of strictly smaller dimension than both

M1 and M2. In particular, Mi could be isomorphic to the product of N and some

nuisance structure of non-trivial dimension. In this case, the orthogonal projection

83

ρi truncates the nuisance information. Given X1 and X2 that are centered at the

origin (in the sense that 1>NX1 = 1>NX2 = 0186), we can estimate ρ1 and ρ2 (and

hence N ) via the following linear method which is approximately traditional canonical

correlation analysis. We perform

(ρ1, ρ2) = argmaxU∈O(186,p),V ∈O(186,p)〈X1U,X2V 〉 (4.14)

where 〈X1U,X2V 〉 = trace(U>X>1 X2V ) is the covariance between the projected and

rotated point clouds X1U and X2V . Of course, we can solve this optimization problem

using the singular value decomposition of X>1 X2. Indeed if UΣV > = X>1 X2, then

ρ1 should be the p columns of U which correspond to the largest singular values.

Similarly, ρ2 should be the p columns of V which correspond to the largest singular

values. A recovery of the points on N is then X1ρ1 ≈ X2ρ2, and the time information

is preserved in the ordering of the columns. The features we extract by this sensor

fusion method are

CCA =[X1ρ1 X2ρ2

]∈ RN×2p, (4.15)

These features will be used to classify the sleep stages. Other approaches to sensor

fusion that we compare with are concatenation (CON), averaging (AV), co-dimension

reduction (PCA), and the concatenation of dimension-reduced representations (DIM).

These approaches are defined as follows.

CON =[X1 X2

](4.16)

AV =X1 + X2

2(4.17)

PCA = US (4.18)

DIM =[U1S1 U2S2

](4.19)

84

where we use the singular value decomposition to get the low-rank approximations

USV> = CON>CON to rank 2p (4.20)

U1S1V>1 = X>1 X1 to rank p (4.21)

U2S2V>2 = X>2 X2 to rank p (4.22)

The classification method we employ is a set of kernel support vector machines

(kSVMs) aggregated using error-correcting output coding (ECOC). A kSVM is trained

to differentiate between all pairs of sleep stages so that there are ten kSVMs in each

ECOC ensemble. We standardize all input features by z-scoring before passing them

to the ten kSVMs. In each kSVM, we use a Gaussian kernel u 7→ e−u2/σ2

with σ = 10

and a box constraint of 1.

4.3.2 Annotated Polysomnograms

We use EEG recordings from a database of annotated polysomnograms. Standard

overnight PSG studies were performed to confirm the presence of sleep apnea syndrome

in clinical subjects suspected of sleep apnea at the sleep center in Chang Gung

Memorial Hospital (CGMH), Linkou, Taoyuan, Taiwan. The Institutional Review

Board of CGMH approved the study protocol (No. 101-4968A3). All recordings were

acquired on the Alice 5 data acquisition system (Philips Respironics, Murrysville,

PA). The sleep stages were scored by two experienced sleep technologists abiding by

the AASM 2007 guidelines [IAIC+07], and a consensus was reached. Each recording

is at least 5 hours long. We focus only on the EEG channels in this work (C3-A2,

C4-A1, O1-A2, and O2-A1), which were all sampled at 200 Hz. As suggested in

[VSKKVdV90], we start by considering the pairing C3-A2 and C4-A1. In the results

section, we report the performance of our model when evaluated on all pairings of

EEG channels. The database contains one-night recordings from 60 adult subjects

(each with apnea-hypopnea index less than 5). We did not exclude subjects based on

85

any other criterion such as gender, age, race, current medications, or previous medical

conditions. This database is called the CGMH database.

4.3.3 Evaluation Metrics

To evaluate our automatic sleep stage classifier, we consider a variety of multi-class

performance metrics. Consider the confusion matrix M ∈ N5×5, where Mij represents

the number of epochs belonging to the i-th class that were predicted to belong to the

j-th class. Ideally, Mij = 0 if i 6= j. We define the recall (REi), precision (PRi), and

F1i score associated with the i-th class as

REi = 100%× Mii∑5j=1Mji

PRi = 100%× Mii∑5j=1Mij

(4.23)

F1i =

(RE−1

i + PR−1i

2

)−1

. (4.24)

Note that REi is the percentage of epochs belonging to the i-th class which are

predicted to belong to the i-th class, PRi is the percentage of epochs predicted to

belong to the i-th class which actually belong to the i-th class, and F1i is the harmonic

mean of REi and PRi. We want all of these metrics to be as close to 100% as possible.

We also report accuracy (AC) and the macro F1 (F1) score, defined as

AC = 100%×∑5

i=1Mii∑5i=1

∑5j=1Mij

F1 =1

5

5∑i=1

F1i. (4.25)

The macro F1 score represents the average (binary) F1 score over all five classes

and will be used as our final evaluation metric. We will not weight AC heavily

because each of the five classes does not have the same number of epochs belonging

to it. As described above, we build several models for the purpose of showing the

improvement gained by adopting our proposed sensor fusion method. We run all

models through the same inter-individual cross-validation procedure. We apply 10-fold

86

cross-validation; the 60 subjects in our database are randomly partitioned into 10

non-overlapping subsets. (This partition is the same for all cross-validation events to

maintain comparability between the different models and feature extraction methods.)

By performing inter-individual cross-validation, we can get a better idea of how our

model will perform on epochs coming from subjects the model has never seen, which

is critical for clinical applicability.

4.4 Results

In our database of 60 subjects without sleep apnea, there were 45,339 thirty-second

epochs. Of these, 6,697 (14.77%) were in the wake class, 6,423 (14.17%) were in

the REM class, 5,288 (11.66%) were in the N1 class, 22,766 (50.21%) were in the

N2 class, and 4,165 (9.19%) were in the N3 class. We begin with a comparison of

sensor fusion methods. We fuse the spectral information obtained from the C3-A2

and C4-A1 channels. In Table 4.1, we report the performance statistics. We find that

canonical correlation analysis effectively fuses information from both EEG leads. We

achieve increases in ACC and F1 of 1.60% and 2.25% over the next best sensor fusion

method, which was . Interestingly, not all sensor fusion methods perform better than

both single-channel methods. Next, using the four EEG channels available in record

of the CGMH database, we explore the effectiveness of our algorithm on different

pairings of EEG channels. (We use canonical correlation analysis as the sensor fusion

method.) In Table 4.2, we report that the optimal pairing of leads was C3-A2 and

O1-A2. Finally, we show the full confusion matrix for the optimal pairing in Table 4.3.

Evidently, the classifier has difficulty differentiating the N2 and N3 stages, as well

as differentiating the N1 stage from all other stages except N3. Nevertheless, the

performance on the Wake, REM, and N2 stages is very good.

87

Table 4.1: Comparison of linear sensor fusion methods

X1 (C4-A1) X2 (C3-A2) CON AV PCA DIM CCA

F11 (%) 80.48 84.15 83.75 80.60 83.93 84.87 86.51F12 (%) 67.99 74.42 71.05 72.71 73.15 74.10 78.64F13 (%) 41.19 44.59 44.46 43.96 47.04 47.75 51.89F14 (%) 85.67 86.37 86.54 86.10 87.32 87.31 87.75F15 (%) 49.35 49.21 49.73 50.64 52.46 53.24 53.73F1 (%) 64.84 67.75 67.10 66.80 68.78 69.45 71.70

ACC (%) 74.84 77.13 76.55 76.10 77.75 78.12 79.72

CON: concatenation; AV: averaging; PCA: co-dimension reduction; DIM: concatenation ofdimension-reduced representations; CCA: canonical correlation analysis; F11: F1 score for the wakestage; F12: F1 score for the REM stage; F13: F1 score for the N1 stage; F14: F1 score for the N2stage; F15: F1 score for the N3 stage; F1: average F1 score across all five stages; ACC: accuracy.

Table 4.2: F1 scores obtained by the sleep stage classifier when using differentpairings of EEG channels

C3-A2 C4-A1 O1-A2 O2-A1

C3-A2 71.70 72.88 71.66C4-A1 72.61 70.21O1-A2 68.76O2-A1

Table 4.3: Performance of the proposed automatic sleep stage classifier

Predicted Class

Wake REM N1 N2 N3 PR (%) RE (%) F1 (%)

Tar

get

Cla

ss Wake 5953 48 604 92 0 87.93 88.89 88.41REM 86 5173 773 391 0 79.30 80.54 79.92N1 559 963 2771 994 1 57.12 52.40 54.66N2 170 339 700 20609 948 84.61 90.53 87.47N3 2 0 3 2272 1888 66.55 45.33 53.93

PR: precision; RE: recall; F1: F1 score. The overall accuracy was ACC = 80.27%, and the averageF1 score across all five classes was F1 = 72.88%.

4.5 Discussion and Conclusion

In this chapter, we discuss the development of an automatic sleep stage classifier which

fuses spectral information from two EEG channels to remove nuisance information.88

Our geometric interpretation of sleep dynamics and the sensor fusion method provides

a reasonable guarantee for the effectiveness of our algorithm. We numerically validated

our algorithm using two databases of annotated PSGs. We examined the effectiveness

of our algorithm on a variety of pairs of EEG channels. While this algorithm is a step

toward a working, robust model that can be directly applied in the sleep laboratory,

there is still a large amount future work which must be done.

4.5.1 Limitations

We do not consider subjects with mild to severe sleep apnea. Sleep apnea is a

common sleep disorder, and a sleep stage classification algorithm designed for clinical

application should be able to demonstrate strong performance on subjects having the

disorder. However, patients with sleep apnea are excluded from this study for the

following reasons. Sleep apnea events cause very fast changes in sleep stage. The

AASM guidelines for sleep staging (which were made with normal physiology in mind)

do not allow sleep technologists to change the frequency at which they make sleep

stage annotations; annotations are made exactly once every 30 seconds. In normal

subjects, sleep stages transition slowly and in a canonical order. The sleep technologist

is put in a difficult situation when one 30-second epoch contains features indicative of

multiple sleep stages (which is generally the case when an apnea event occurs during

that epoch). It would seem that current regulations are insufficient for simultaneously

assessing sleep stages and sleep apnea events.

Reiterating, the characterization of sleep dynamics developed by Rechtschaffen

& Kales and modified by the AASM was made with normal sleep physiology in

mind. As such, it is severely limited when assessing pathological subjects. There is a

growing body of evidence that sleep pathology in humans can manifest in a variety

of previously uncharacterized ways. For example, sleep stages may be dissociated

89

across different regions of the brain [NFM+11]. There may be extra “transitional”

sleep stages that exist between the canonical six stages, and it has been suggested to

view the time spent awake before falling asleep as distinct from arousal events that

occur during the night [PDMB15, MS92].

The discipline of sleep staging is in general plagued by a lack of inter-rater agree-

ment [NPS+00, DhAZ+09]. Not only is sleep staging often an ambiguous endeavour,

but sleep technologists are under pressure to perform the repetitive task of epoch

annotation quickly and for extended periods of time; errors will almost certainly be

made. The lack of inter-rater agreement means that our labels are very noisy. On

the other hand, we may be over-fitting to the preferences of one annotator (or group

of co-annotators). Moving from one database of PSG recordings to another presents

additional challenges which we did not address in this study. EEG channels recorded

on different devices (at different sampling and bit rates), using different electrodes

(in a hardware sense), and placed on the subject by different individuals can yield

drastic changes in both the quality and content of the signal. In this work, we did

not assess the transfer proficiency of our model between lead types, but this presents

another challenge.

We did not directly attempt to assess inter-individual variability. It may be the

case that the spectral content of one subject’s N2 epochs is, on average, slightly

different (via a transposition, perhaps) than the spectral content of another subject’s

N2 epochs. For example, we know that sleep spindles (bursts of activity between

12 and 14 Hz that are associated with transitions to the N3 stage) have different

fundamental frequencies in different individuals, and that the rate of occurrence of

sleep spindles varies from person to person. We did not take into account variations in

the amplitude of the EEG waveform that could arise when moving from one subject

to another.

90

Finally, a vast amount of structure exists inside the CWT representation of an

EEG waveform, and we did not take advantage of this structure. For example, we did

not directly take into account the geometry inherent in the ordering and spacing of

the center frequencies of the bandpass filters used to compute the discretized CWT.

On another level, we did not take into account how pairs or groups of frequency

bands work together to differentiate sleep stages. We also did not take into account

transition probabilities among sleep stages or any temporal information. However,

we note that including this information in the model may lead to poor performance

on pathological subjects who, by definition, would not exhibit the same transition

probabilities between sleep stages.

91

Chapter 5

Supervised Extraction of Sleep Dynamics

using Instantaneous Heart Rate

Fluctuations in heart rate are intimately tied to changes in the physiological state

of the organism. We examine and exploit this relationship by classifying a human

subject’s wake/sleep status using his instantaneous heart rate (IHR) series. We use a

convolutional neural network (CNN) to build features from the IHR series extracted

from a whole-night electrocardiogram (ECG) and predict every 30 seconds whether the

subject is awake or asleep. Our training database consists of 56 normal subjects, and

we consider three different databases for validation; one is private, and two are public

with different races and apnea severities. On our private database of 27 subjects, our

accuracy, sensitivity, specificity, and AUC values for predicting the wake stage are

83.1%, 52.4%, 89.4%, and 0.83, respectively. Validation performance is similar on our

two public databases. When we use the photoplethysmography instead of the ECG

to obtain the IHR series, the performance is also comparable. A robustness check is

carried out to confirm the obtained performance statistics. This result advocates for

an effective and scalable method for recognizing changes in physiological state using

non-invasive heart rate monitoring. The CNN model detects fluctuations in IHR (as

well as their locations in the signal), and this information is suitable for differentiating

between the wake and sleep stages. This chapter is the peer-reviewed publication

[MLW18].

92

5.1 Introduction

Sleep is a key ingredient to our well-being, and disrupted sleep may lead to catastrophes

in personal medicine or public health [KTR+94, KLB+09, CA+06]. Understanding

sleep is critical for the whole healthcare system. The gold standard for quantifying

sleep dynamics is the polysomnography (PSG). However, it is bulky, complicated to

install, labor-intensive, and expensive. It is also possibly error-prone if an overnight

PSG recording is manually examined by experts [NPS+00]. An automatic, convenient,

and accurate way to study sleep is therefore in high demand. Efforts made in the past

few decades to accomplish these tasks have been assisted by advancements such as

bio-sensors, wearable technology, and mobile health devices.

The assessment of sleep stages using vital signs is most commonly done using the

electroencephalogram (EEG). The photoplethysmography (PPG) and the electrocar-

diogram (ECG) are also widely considered. Compared to a PSG, these signals are easy

to install and cheap to obtain, and long-term monitoring is possible. The challenge

with these signals is that the amount of available information is limited. Neural

networks lend themselves advantageously to this task, as they excel at extracting and

compressing information into representations that are useful for the application at

hand [LBH15]. Motivated by the needs and challenges associated with predicting

sleep dynamics using mobile health devices, we focus on analyzing ECG and PPG

signals in this work, and we propose an effective and scalable approach to analyzing

heart rate variability (HRV) as it pertains to physiological events and outcomes

[MC90, otESoC+96, BLS11].

5.1.1 Heart Rate Variability and Sleep

HRV is quantified by studying the time series consisting of intervals between con-

secutive pairs of heart beats, which is usually called the R-to-R interval (RRI)

93

time series. Interest in HRV has a long history [Bil11], and it has been exten-

sively studied in the past few decades. See, for example, a far-from-complete

list of review articles [VPH+09, SMZ14]. The rhythm of the heart is regulated

by the autonomic nervous systems (ANS). Changes in heart rate are the result

of complicated interactions between physiological systems and external stimuli.

They reflect the integrity of the physiological systems. In short, analyzing HRV

amounts to observing through a non-invasive window the physiological dynamics of

an organism. It has long been believed that a correct quantification of HRV will

yield a deeper understanding of a variety of physiological systems, including sleep

[SHMG64, ZVS84, SDMA93, VQMR95, TGP+96, BA97, EHO99, CD14, PKB+16].

For example, while a subject is awake, the sympathetic tone of the ANS is domi-

nant. The heart rate is higher due to daily activity and external stimuli, and the

rhythm of the heart is less stable [SDMA93]. The heart rate is relatively lower

when the subject is asleep. It reaches its lowest value during slow wave (deep) sleep

[SHMG64]. During non-REM (rapid eye movement) sleep, the parasympathetic

nervous system dominates, the sympathetic tone becomes less dominant, and the

energy restoration and metabolic rates reach their lowest levels. The heart rate

decreases, and the rhythm of the heart stabilizes [SDMA93]. These physiological

facts indicate that we can distinguish between the wake and sleep stages by taking

heart rate into account. There have been several works in this direction, such as

[LSCS05, LSC+07, MMC+10, LFH+12, XYS+13, AMT+15, YYJG16, VLBB16].

With appropriate HRV quantities, physicians can better understand the ANS and

hence improve both diagnostic accuracy and treatment quality [VPH+09]. However,

despite a lot of research in this direction and many quantitative indices proposed by

experts, there is limited consensus in the universality of HRV analysis. The reason for

the lack of extensive success in this field is largely due to the non-stationary nature of

94

the system. See, for example, some published discussions on this topic [PG94, Gla09].

We do not expect to sort out this difficult problem in this exploratory work which

focuses on the sleep stage classification problem. Instead, we propose the use of a

convolutional neural network (CNN) on the IHR series as a scalable approach to

adaptively quantifying heart rate fluctuation from a variety of monitoring sources. A

CNN [LBBH98] is a special kind of neural network (NN) capable of building features

from time series and images. A CNN detects not only the presence of structured

features which are important for classification, but also their temporal (spatial)

location. CNN design is motivated by the hypothesized mechanism for the animal

visual cortex. An important property of the CNN is that it behaves equivariantly

with respect to translations of input features. This design is favorable for sleep

stage classification because the precise locations of fluctuations which correspond

to sympathetic (“fight or flight”) tone dominance provide physiological information

about sleep stage. We refer readers with interest in CNN architecture to the review

article [LBH15]. There have been several studies using NNs and manually designed

heart rate features to predict sleep stages [LSCS05, AMT+15], but to the best of our

knowledge, CNNs have not been considered in the field. We show that the CNN

approach is effective for accomplishing this task, providing evidence that it may lend

itself favorably to other HRV applications.

5.2 Materials and Methods

5.2.1 Data

Standard overnight PSG studies were performed to confirm the presence of sleep

apnea syndrome in clinical subjects suspected of sleep apnea at the sleep center in

Chang Gung Memorial Hospital (CGMH), Linkou, Taoyuan, Taiwan. The Institu-

tional Review Board of CGMH approved the study protocol (No. 101-4968A3). All

95

recordings were acquired on the Alice 5 data acquisition system (Philips Respironics,

Murrysville, PA). The sleep/wake stages were defined and scored by two experienced

sleep technologists abiding by the AASM 2007 guidelines [IAIC+07], and a consensus

was reached. Each recording is at least 5 hours long. We focus only on the PPG

recording and on the second lead of the ECG recording, which were both sampled at

200 Hz. There are 90 healthy subjects (each with apnea-hypopnea index less than

5) in the training database, among which we consider only 56 subjects who were

labeled as awake for at least 10% of the recording duration. This database is called

CGMH-training.

We consider three validation databases. The validation databases were not used

to tune the model’s parameters, and were not subjected to any rejection. The first

one consists of 27 subjects and was acquired independently of CGMH-training from

the same sleep laboratory. This database is called CGMH-validation. The other two

databases are publicly available. The DREAMS Subjects Database1, consists of 20

recordings from healthy subjects. The recordings were selected by the TCTS Lab to

be of high clarity and to contain few artifacts. The technologists used the Brainnet

system (Medatec, Brussels, Belgium) to obtain the ECG recordings. The sampling

rate is 200 Hz, and the minimum recording duration is 7 hours. Although the race

information is not provided, we may assume that since the database is collected from

Belgium, its population constitution is different from that of the CGMH databases.

The third database is the St. Vincent’s University Hospital/University College

Dublin Sleep Apnea Database (UCDSADB) that is publicly available at Physionet

[GAG+00]2 and which consists of 25 subjects with sleep apnea of various severities.

The technologists used Holter monitors to obtain the ECG recordings. The sampling

rate is 128 Hz, and the minimum recording duration is 6 hours. We focus on the first

1http://www.tcts.fpms.ac.be/~devuyst/Databases/DatabaseSubjects

2https://physionet.org/pn3/ucddb

96

ECG lead in this study. The DREAMS Subjects Database is chosen to assess the

model’s performance on recordings which come from subjects of a different race, and

the UCDSADB is chosen to assess the model’s performance on recordings which come

from subjects with sleep disorders.

5.2.2 Instantaneous Heart Rate

We apply a standard automatic R peak detection algorithm to each ECG recording,

which is a modification of that found in [Elg13]. Let rini=1 denote the location in

time (sec) of the n detected R peaks. We ensure that there is no artifact in the

detected R peaks by ignoring beats which are either too close or too far from their

preceding beats. (The mechanism for detecting false positive or missing beats is

based on a 5-beat median filter.) We view the instantaneous heart rate (IHR) as a

continuous and positive function. We estimate the IHR at time ri in beats-per-minute

(bpm) as

IHR (ri) =60

ri − ri−1

i = 2, ..., n. (5.1)

Using these estimates, we follow the Task Force standard [otESoC+96] and obtain

the IHR series over the duration of the recording at a sampling rate of 4 Hz. The

interpolation method is shape-preserving piecewise cubic interpolation. When the

PPG signal is considered, the same procedure is applied: we view the peaks in each

PPG recording as surrogate heart beats, and these peaks are automatically detected

by [ENB+13]. With the same interpolation scheme, the resulting time series is called

IHR-PPG (for the sake of distinguishing it from the IHR series extracted from the

ECG signal). We break the IHR (or IHR-PPG) signal into 30-second epochs, abiding

by the boundaries originally assigned by the sleep technologists. We discard an epoch

if fewer than five R peaks are detected within it.

97

5.2.3 Convolutional Neural Network

We implement the 1-dimensional CNN using TensorFlow 1.5 [AAe15]. The feed-

forward network architecture is shown in Figure 5.1. The input signal is first passed

through five convolution blocks. Following common practice in HRV analysis [Eng98,

CdlHMFAL06], we consider input signals 5 minutes in length. (Building an input

signal 5 minutes in length amounts to concatenating the labeled 30-second epoch with

the preceding 4 minutes and 30 seconds.) We normalize each 5-minute input signal

by subtracting its median value.

The architecture of a single convolution block is shown in Figure 5.2. Each

convolutional layer in the block has 10 filters with kernel size 8. A bias is added to the

output of each filter, and the result is passed through a rectified linear unit (ReLU)

activation function. The convolution blocks are followed by two fully-connected

(dense) layers of 20 nodes each. The fully-connected nodes also have associated

biases and ReLU activation functions. We apply dropout with probability 0.5 to the

last convolutional layer and to both fully-connected layers. The output of the last

fully-connected layer is fed with biases into a 2-node output layer. We normalize the

output of the network using the softmax function. We predict that a subject is awake

during a given epoch if the output of the “wake” node is greater than or equal to the

output of the “sleep” node. We train the network for 180 epochs using mini-batch

gradient descent with a learning rate of 10−3, a batch size of 24, and cross-entropy as

the loss function.

5.2.4 Performance Evaluation

We train the CNN model to differentiate between the wake and sleep stages by

fitting the model’s parameters to the IHR series extracted from CGMH-training.

We then apply the trained model to three different validation databases: CGMH-

98

Input

Convolution Blocks

20-Dense

20-Dense

Output

Figure 5.1: The architecture of the 1-dimensional convolutional neural network. Thenotation m-Dense means that the fully connected layer possesses m nodes. For inputsignals 5 minutes in length, we use five convolution blocks.

(10, 8, 1)-Convolution

(10, 8, 2)-Convolution

Figure 5.2: The architecture of a single convolution block. The notation(f, k, s)-convolution means that the convolutional layer has f filters with kernelsize k and stride s. The output of the block is half the size of the input.

validation, the DREAMS Subjects Database, and the UCDSADB. We report a variety

of performance measures, namely sensitivity (recall), specificity, accuracy, AUC (area

under the receiver operating characteristic curve), precision (PPV), the F1 score, and

Cohen’s kappa coefficient. We emphasize that our model assessment is performed on

an inter-individual basis. We further access how effectively a model trained on signals

from one monitoring device might be applied to recordings made on another device.

The model trained on IHR series is applied to IHR-PPG series, and vice versa. (The

PPG signal is commonly installed in modern mobile devices.) We compare the PPG

performance statistics with those obtained using the corresponding ECG recordings.

99

5.2.5 Robustness Check

To check the robustness of the considered CNN model, we perform two sensitivity

tests. First, we consider different input sizes, namely 30 seconds, 2 minutes, and

10 minutes. We assign a different number of convolution blocks to each input size.

When the input size is 30 seconds (2 minutes and 10 minutes, respectively), we build

up 3 (4 and 6, respectively) convolution blocks. In Figure 5.5, we show that inputs

of various sizes are constructed by additionally considering a length of time before

the labeled epoch. In our second sensitivity test, we train new CNN models on the

DREAMS Subjects Database and the UCDSADB. We validate these two models on

our three validation databases. Note that the sampling rate for the UCDSADB is

unique, and that the data collection system is heterogeneous across all databases;

these observations allow us to test the robustness of the established CNN model.

5.3 Results

Our training database from CGMH consists of recordings from 30 healthy males aged

36.8± 14.0 years, 23 healthy females aged 43.8± 13.6 years, and 3 healthy subjects

of unknown gender and age. The AHI and BMI of the male group are 2.7± 1.4 and

23.1±2.9, and the AHI and BMI of the female group are 2.6±1.4 and 23.6±4.6. The

AHI and BMI of the remaining subjects are not known. CGMH-validation consists

of recordings from 10 healthy males aged 42.3± 12.6 years, 16 healthy females aged

47.1± 17.9 years, and one healthy subject of unknown gender and age. The AHI and

BMI of the male group are 3.0 ± 1.0 and 24.4 ± 3.4, and the AHI and BMI of the

female group are 2.4± 1.6 and 24.2± 5.0. The AHI and BMI of the remaining subject

are not known. We dismiss 203 noisy epochs from the training set and 199 from the

testing set because they contain less than five detected R peaks. After dismissal, there

are a total of 41, 472 points in the training set and 20, 102 points in the validation set.

100

The percent of wake labels is 18.8% in the training set and 17.1% in the validation set.

All processing, training, and validation calculations are performed on an Intel® Core

i7-4790K Processor at 3.60 GHz with 24 GB of RAM and a Microsoft® Windows®

10 Home operating system (version 1709). In all tables, the positive class corresponds

to the “wake” label. Because of the imbalance between the positive and negative

classes, we should take care when assessing the reported performance measures.

5.3.1 Model Validation

In Table 5.1, we show the performance measures associated with fitting the model

to CGMH-training and applying it to the three validation databases. We obtain

a good AUC value of 0.83 and a moderate Cohen’s kappa coefficient of 0.41 on

CGMH-validation. The F1 score is 0.51, which is not high due to the low precision.

To examine the effect of imbalanced classes on the F1 and precision scores, we apply

the trained CNN model to each subject in CGMH-validation individually, and we

compare the associated F1 and precision values to the percentage of “wake” labels in

the given recording (see Figure 5.3). If w denotes the percentage of ”wake” labels

in a given recording, a linear regression analysis shows that F1 = 0.32 + 0.0086w,

where the slope is statistically significant under t-test when the significance level is

set to p = 0.05, and precision = 40.16 + 1.03w, where the slope is also statistically

significant.

We show that our model performs well when applied to external recordings that

are associated with healthy subjects of a different race. The DREAMS Subjects

Database has recordings from 16 healthy females aged 35.8± 15.5 years and 4 healthy

males as aged 20, 23, 27, and 27 years. We dismiss 25 noisy epochs from the database

before evaluation because they are labeled as such or because they contain less than 5

detected R peaks. After dismissal, there are 20, 032 epochs remaining, and the percent

101

of wake labels is 16.7%. The AUC value is strong at 0.81, compared to the value 0.83

on the internal CGMH-validation database. The sensitivities and specificities are also

similar. The slight difference in performance could be attributed to the fact that the

sleep technologists supervising the DREAMS Subjects Database selected recordings

which had few artifacts, whereas such artifacts and their physiological correlates were

relevant in the CGMH-training data for classification.

We show that our model struggles to achieve convincing results when applied

to a database of subjects whose sleep physiology is different. The UCDSADB has

recordings from 21 males aged 48.3 ± 8.4 years and 4 females aged 41, 62, 63, and

68 years. The AHI and BMI of the male group are 25.4± 21.3 and 31.5± 4.3, and

the AHI and BMI of the female group are 18.3± 14.6 and 32.2± 2.6. We dismiss 91

noisy epochs from the database before evaluation because they are labeled as such

or because they contain less than 5 detected R peaks. After dismissal, there are

19, 974 epochs remaining, and the percent of wake labels is 21.2%. The performance

measures for the UCDSADB, while acceptable, are significantly lower than those

observed in the previous model validation steps. To explore the reason for this drop

in performance, we evaluate the CNN model on each subject from the UCDSADB

individually. We calculate the AUC and Cohen’s kappa coefficient for each subject

and plot these numbers in Figure 5.4 against his or her apnea-hypopnea index (AHI).

A subject with a high apnea-hypopnea index experiences frequent interruptions in

sleep that arise from breathing difficulties. Since we have trained our model on a

database of subjects who do not experience such interruptions, we expect to see a

correlation between AHI and model performance measures. We choose AUC and

Cohen’s kappa coefficient because they are relatively stable under changes in class

size. A linear regression analysis shows that AUC = 0.8− 0.0034AHI, where the slope

is statistically significant under t-test when the significance level is set to p = 0.05,

102

Table 5.1: Performance statistics for the CNN model trained on CGMH-training

CGMH-training

CGMH-validation

DREAMSSubjects

UCDSADB

TP 4, 464 1, 800 1, 777 1, 838FP 2, 143 1, 763 2, 151 2, 853TN 31, 550 14, 906 14, 532 12, 883FN 3, 315 1, 633 1, 572 2, 400

SE (%) 57.4 52.4 53.1 43.4SP (%) 93.6 89.4 87.1 81.9

ACC (%) 86.8 83.1 81.4 73.7

PR (%) 67.6 50.5 45.2 39.2F1 0.62 0.51 0.49 0.41

AUC 0.90 0.83 0.81 0.72Kappa 0.54 0.41 0.38 0.24

TP: true positive; FP: false positive; TN: true negative; FN: false negative; ACC: accuracy; AUC:area under the ROC curve; PR: precision; SE: sensitivity; SP: specificity.

and Kappa = 0.34− 0.0047AHI, where the slope is also statistically significant.

5.3.2 Transfer Proficiency Among Monitoring Devices

We extract IHR-PPG series from the PPG recordings in CGMH-training and call

this database CGMH-training-PPG. Similarly, we extract IHR series from the PPG

recordings in CGMH-validation and call this database CGMH-validation-PPG. We

then train two models: one on CGMH-training, and one on CGMH-training-PPG. In

Table 5.2, we show the performance measures associated with applying both models to

CGMH-validation and CGMH-validation-PPG. We see that the overall performance

slightly drops when we validate the model on different modality. When the model

is both trained and validated on IHR-PPG series, the performance is equivalent, or

slightly better, to that of the model trained and validated on the ECG-derived IHR.

103

Percent Wake

0 10 20 30 40 50 60

F1Score

0

0.2

0.4

0.6

0.8

1

Precision

0

0.2

0.4

0.6

0.8

1

F1 Score

Precision

Figure 5.3: We apply the trained CNN model to each subject in CGMH-validationindividually. We plot the associated F1 and precision values against the percentage of“wake” labels in the given recording. We see that the model’s performance statisticsincrease when the percent of “wake” labels increases.

AHI

0 20 40 60 80 100

AUC

0.4

0.6

0.8

1

KappaCoeffi

cient

-0.5

0

0.5

1

AUC

Kappa Coefficient

Figure 5.4: We apply the trained CNN model to each subject in the UCDSADBindividually. We plot the associated AUC and Cohen’s kappa values against eachsubject’s apnea-hypopnea index. We see that the model performs well on healthysubjects.

104

Table 5.2: Transfer proficiency of the CNN model among different monitoring devices

TrainingDatabase

CGMH-training

CGMH-training-PPG

ValidationDatabase

CGMH-validation

-PPG

CGMH-validation

CGMH-validation

-PPG

TP 2, 085 1, 217 1, 760FP 3, 335 1, 042 1, 522TN 13, 324 15, 627 15, 137FN 1, 331 2, 216 1, 656

SE (%) 61.0 35.5 51.5SP (%) 80.0 93.8 90.9

ACC (%) 76.8 83.8 84.2

PR (%) 38.5 53.9 53.6F1 0.47 0.43 0.53

AUC 0.79 0.81 0.84Kappa 0.33 0.34 0.43


105

30 sec

30 sec

30 sec

30 sec

Labeled EpochAdditional Input

90 sec

270 sec

570 sec

Figure 5.5: We construct input signals of various lengths by additionally consideringdata which precedes the labeled epoch.

5.3.3 Sensitivity Analysis

We examine the effect of input size on the performance of the CNN model. (See

Figure 5.5 for a depiction of the considered input sizes.) In Table 5.3, we show

the performance measures associated with fitting the model to CGMH-training and

applying it to the three validation databases. When the model is trained using an

input size of 30 seconds, the validation performance is lower than when the model

is trained using any amount of the preceding information. When the input size is 2

minutes, the performance is closer to the performance obtained using an input size of

5 minutes. Note, however, that the benefits of including large amounts of preceding

information are not unlimited. We observe no significant improvement in validation

AUC by considering 10 minutes instead of 5 minutes, whereas the improvement

obtained by considering 5 minutes instead of 30 seconds is large. Evidently, the lack of

improvement when considering an input size of 10 minutes is the result of over-fitting

to the training database. Whereas the training process for input signals of length

30 seconds takes approximately 20 minutes, the same process takes over 5 hours for

input signals of length 10 minutes. We declare a trade-off between computational

efficiency and classification proficiency.

Our sensitivity analysis proceeds. We fit the CNN model to our external validation

106

Table 5.3: Sensitivity analysis for the CNN model trained on CGMH-training withvarious input sizes

Evaluated On Input Size 30 Sec 2 Min 10 Min

CGMH-training

TP 2, 860 4, 240 6, 387FP 1, 590 1, 416 2, 419TN 32, 127 32, 294 31, 198FN 5, 399 3, 858 908

SE (%) 34.6 52.4 87.6SP (%) 95.3 95.8 92.8ACC (%) 83.4 87.4 91.9

PR (%) 64.3 75.0 74.3F1 0.45 0.61 0.79AUC 0.81 0.90 0.96Kappa 0.36 0.54 0.74

CGMH-validation

TP 1, 067 1, 619 2, 012FP 1, 164 1, 705 2, 527TN 15, 520 14, 977 14, 072FN 2, 594 1, 963 1, 221

SE (%) 29.2 45.2 62.2SP (%) 93.0 89.9 84.8ACC (%) 81.5 81.9 81.1

PR (%) 47.8 48.7 44.3F1 0.36 0.47 0.52AUC 0.75 0.81 0.83Kappa 0.26 0.36 0.40

DREAMSSubjects

TP 998 1, 458 1, 775FP 1, 197 1, 891 2, 267TN 15, 486 14, 792 14, 404FN 2, 531 2, 011 1, 386

SE (%) 28.3 42.0 56.2SP (%) 92.8 88.7 86.4ACC (%) 81.6 80.6 81.6

PR (%) 45.5 43.5 43.9F1 0.35 0.43 0.49AUC 0.69 0.76 0.81Kappa 0.25 0.31 0.38

UCDSADB

TP 1, 344 1, 789 2, 199FP 2, 293 2, 874 3, 963TN 13, 446 12, 865 11, 750FN 3, 116 2, 596 1, 812

SE (%) 30.1 40.8 54.8SP (%) 85.4 81.7 74.8ACC (%) 73.2 72.8 70.7

PR (%) 37.0 38.4 35.7F1 0.33 0.39 0.43AUC 0.61 0.68 0.69Kappa 0.17 0.22 0.25


107

databases, one by one, so that we have two trained models. We use an input size of 5

minutes. In Table 5.4, we report the performance measures associated with applying

these models to our three validation databases. Note that some of the performance

measures are inflated because the model has been applied to its training database.

We notice that the DREAMS Subjects Database does not provide a result which

generalizes well to all other databases. Specifically, the associated model obtained an

AUC value of 0.67 on the UCDSADB. This poor performance could be attributed to

the size of the DREAMS Subjects Database or the relative homogeneity of the subjects

compared to the UCDSADB. However, the model trained on the UCDSADB does

transfer effectively to the other databases, and we suggest that the reason involves the

fact that number of “wake” labels is relatively high. This result is surprising because

the number of subjects in this database is less than half the number of subjects in

CGMH-training.

5.4 Further Exploration of the CNN via Data and

Feature Visualization

Although there have been several works trying to theoretically understand CNNs, like

[Mal12, LTR17, WB17], our knowledge about the CNN framework is still limited. We

should take care to observe that the behaviour of our model is not unreasonable. At

the very least, viewing the activity of the model provides insight into how the network

captures HRV quantities. We introduce some notation to ease the exposition. Write

xi20102i=1 ⊂ R1200 for the 5-minute input signals from CGMH-validation. For each i,

let yi = f(xi) ∈ R38×10 denote the output of the last convolution block in the network,

where f :R1200 → R38×10 is used to denote applying the convolutional section of the

network in a feed-forward fashion. The size of each yi is determined by the network

architecture: there are 10 filters in the last convolutional layer, and we think of the

108

Table 5.4: Performance statistics for the CNN models trained on the DREAMSSubjects Database and the UCDSADB

TrainingValidation CGMH

-validationDREAMSSubjects

UCDSADB

DREAMSSubjects

TP 1, 517 2, 095 1, 166FP 1, 928 1, 396 1, 713TN 14, 741 15, 287 14, 023FN 1, 916 1, 254 3, 072

SE (%) 44.2 62.6 27.5SP (%) 88.4 91.6 89.1ACC (%) 80.9 86.8 76.0

PR (%) 44.0 60.0 40.5F1 (%) 0.44 0.61 0.33AUC 0.77 0.89 0.67Kappa 0.33 0.53 0.19

UCDSADB

TP 1, 579 1, 723 2, 169FP 2, 271 3, 000 911TN 14, 398 13, 706 14, 825FN 1, 854 1, 626 2, 069

SE (%) 46.0 51.5 51.2SP (%) 86.4 82.0 94.2ACC (%) 79.5 76.9 85.1

PR (%) 41.0 36.5 70.4F1 (%) 0.43 0.43 0.59AUC 0.75 0.75 0.87Kappa 0.31 0.29 0.50


109

38 × 10 matrix yi as 10 filtered and down-sampled versions of the original xi. The

intuition is that if yji ∈ R38 denotes the j-th column of yi, i.e. the j-th filtered and

down-sampled version of xi, then the probability that xi contains some feature ψj

at a time corresponding to the t-th sample (1 ≤ t ≤ 38) is yji (t). Note that f is a

non-negative matrix-valued function because of the ReLU activation function, which

maps all negative numbers to zero. We discover the morphology of the features ψj as

follows. Fix one of the 38 samples t, and fix j ∈ 1, ..., 10, one of the 10 non-linearly

filtered signals outputted by the final convolution block. Consider the subset of 400

indices defined sequentially as

ik := arg maxyji (t) : i /∈ i1, ..., ik−1 1 ≤ k ≤ 400, (5.2)

and set

ψj(t) := medianxik : 1 ≤ k ≤ 400 ∈ R1200. (5.3)

In Figure 5.4, we plot ψj(t) for 1 ≤ j ≤ 10 and t = 18. These plots represent the

common information held among input signals with similar activations. Note that

choosing a different sample t centers the main activity of the plot at a different point

in time. The captured features appear to be related to low-frequency variabilities

inside the IHR series, but this result could be merely a consequence of the fact that

the averaging process cancels out the detailed information.

According to our analysis, the features in plots 2, 3, 5, 8, 9, and 10 are indicative of

the wake stage, whereas the other plots are indicative of the sleep stage. The procedure

is as follows. For each sample t ∈ 1, ..., 38 and each output signal j ∈ 1, ..., 10,

we perform two statistical tests using the labeled data from CGMH-validation. For

each i, let xi(l) denote the label (1 or 0 for “wake” or “sleep”, respectively) associated

to the input signal xi. In Figure 5.4 (a), the (j, t)-entry is − logp, where p is the

probability that the set

yji (t) : xi(l) = 1 (5.4)

110

Time (seconds)-200 -100 0

Med

ianHR

Fluctuation

0

5

10

15


Med

ianHR

Fluctuation

0

5

10

15


Med

ianHR

Fluctuation

0

5

10

15


Med

ianHR

Fluctuation

0

5

10

15


Med

ianHR

Fluctuation

0

5

10

15


Med

ianHR

Fluctuation

0

5

10

15


Med

ianHR

Fluctuation

0

5

10

15


Med

ianHR

Fluctuation

0

5

10

15


Med

ianHR

Fluctuation

0

5

10

15


Med

ianHR

Fluctuation

0

5

10

15

Figure 5.6: We illustrate the features captured by the trained CNN model. Theplots in the first row are ψ1(18)-ψ4(18), the plots in the second row are ψ5(18)-ψ8(18),and the plots in the third row are ψ9(18)-ψ10(18).

is sampled from a distribution with median less or equal to the median of the

distribution from which the set

yji (t) : xi(l) = 0 (5.5)

is sampled. The colour black is associated with large values of − log p and the

proposition that if xi possesses feature ψj at a time corresponding to sample t, then

xi(l) is likely equal to 1. In Figure 5.4 (b), the (j, t)-entry is − logp, where p is the

probability that the set

yji (t) : xi(l) = 0 (5.6)

is sampled from a distribution with median less or equal to the median of the

distribution from which the set

yji (t) : xi(l) = 1 (5.7)

111

Output Signal j

2 4 6 8 10

Sample

t

10

20

30

(a) Features indicative of the wake stageOutput Signal j

2 4 6 8 10

Sample

t

10

20

30

(b) Features indicative of the sleep stage

Figure 5.7: At each time t, we assess the predictive potential of the 10 features ψjlearned by the CNN model. The black colour in the (j, t)-entry of (a) (resp. (b))indicates that the feature ψj is useful for identifying the wake (resp. sleep) stage whenit occurs at sample t.

is sampled. The colour black is associated with large values of − log p and the

proposition that if xi possesses feature ψj at a time corresponding to sample t, then

xi(l) is likely equal to 0. Remember that the label associated to xi is based on the

last 30 seconds of xi. What we see is expected: detecting a feature closer to the end

of the signal (t→ 38) is more useful for determining the true label of the input signal.

In Figure 5.4, we feed input signals 5 minutes in length through the trained CNN.

Given an input signal xi, we plot the output signals

σwakei (t) := maxyji (t) : j = 2, 3, 5, 8, 9, 10; (5.8)

σsleepi (t) := maxyji (t) : j = 1, 4, 6, 7. (5.9)

Note that we previously established which features ψj in our model correspond to

the wake and sleep stages (see Figure 5.4). Maximum activations are considered

to enhance visibility. What we see is expected: in the first three examples (from

CGMH-validation), when the subject is awake, the network detects the fluctuations

which, based on our physiological knowledge [SHMG64, SDMA93], should correspond

to sympathetic tone dominance and line up with the subject’s arousal. For the first

input signal, we can visualize a clear change in heart rate during the wake stage. For

112


Heart

RateFluctuation

-10

0

10

20

30

40

xi


Heart

RateFluctuation

-10

0

10

20

30

40

xi


Heart

RateFluctuation

-10

0

10

20

30

40

xi


Heart

RateFluctuation

-10

0

10

20

30

40

xi


Max

imum

Activation

0

0.5

1

1.5

σwakei


Max

imum

Activation

0

0.5

1

1.5

σwakei


Max

imum

Activation

0

0.5

1

1.5

σwakei


Max

imum

Activation

0

0.5

1

1.5

σwakei


Maxim

um

Activation

0

0.5

1

1.5

σsleepi


Maxim

um

Activation

0

0.5

1

1.5

σsleepi


Maxim

um

Activation

0

0.5

1

1.5

σsleepi


Maxim

um

Activation

0

0.5

1

1.5

σsleepi

Figure 5.8: We present typical activations in the last convolutional layer of the CNNmodel. The top row shows four typical input IHR signals, and the correspondingactivations in the last convolutional layer of the CNN model are shown in the bottomtwo rows. See (5.8) for definitions of σwake

i and σsleepi . The red colour indicates that

the subject is awake, whereas the black colour indicates that the subject is asleep.

the second and third signals, σwakei appears to be correlated with the irregularity of

the IHR series, whereas for the last signal, σsleepi appears to be correlated with the

regularity of the IHR series. It is not always visually clear which properties of the

IHR series the network is recognizing, but the results in this paper indicate that the

network is successful at capturing these hidden dynamics.

5.5 Discussion

In the field of HRV analysis, an HRV index is usually calculated on a window (subset)

of the IHR or RRI series. For example, it is suggested to estimate HRV indices

such as approximate entropy [Eng98] or one of the many Poincare map-based indices

[CdlHMFAL06] using a window containing at least 300 beats. The choice of window

113

size strongly affects the quality of information extracted by the chosen HRV index.

If the window size is large, and the IHR can be well-approximated by a stationary

process, the HRV index is stable. But in the presence of non-stationarity (as is usually

the case for large windows), the index becomes unreliable (e.g. the standard deviation

of a time series is strongly affected by baseline wandering). If the window size is small,

under an assumption of local stationarity, the impact of non-stationarity becomes

negligible. However, shortening the window destabilizes most indices because they

are calculated using too little information. Hence, a type of “uncertainty principle”

exists in HRV analysis, just as in other non-stationary time series analysis fields.

The traditional approach to HRV index calculation is to associate a single number to

each window of the IHR or RRI series. This number is effectively invariant under cyclic

permutations of the window, and the location of a detected feature is not preserved.

On the other hand, a 1-dimensional convolutional layer behaves equivariantly under

translations of the input, i.e. if the inputted time series is cyclically permuted, so is

the outputted time series. In Figure 5.4, we see a temporal correspondence between

fluctuation morphologies in the inputted time series and large activations in the

outputted time series. The binary classifier appended to the final convolution block is

informed not only of the presence of certain HRV features, but also of their location

in time relative to the labeled epoch.

The pathway through which input size affects the classification result is unclear,

but we provide some interpretation. Sleep stage classification using heart rate relies on

precisely locating time-varying fluctuation in the IHR series. A large-enough window

is required to capture these fluctuations. This high-level interpretation is consistent

with baseline calibration techniques used in the field of anaesthesiology. For example,

in [SKVG+05], the authors use HRV indices to determine if a subject has felt the

pain of a surgical incision during anaesthesia. Post-incision (0-120 sec) fluctuation

114

of the IHR series indicates a painful experience for the patient. The authors detect

this fluctuation by comparing the post-incision HRV indices to the pre-incision HRV

indices, while a normalization scheme is used to account for inter-individual variance.

The crucial information is not the raw HRV indices but whether they begin to fluctuate

after the incision compared with the baseline. A similar study using thermal stimuli

is [HKH+12].

In the robustness check, we carry out a kind of “cross-validation” at the database

level (see Table 5.4). From the statistical viewpoint, this check provides evidence

that the model structure and training process is effective independently of the chosen

database. Without any theory to back up the effectiveness of the CNN framework, we

need reassurance that the choices of parameters such as the number of convolution

blocks, the number of filters, the kernel size, the number of epochs, and the learning

rate are not over-fitted to one use-case. We take further care to make the pre-processing

steps as weak as possible: we do not manually correct the R peak detection algorithm

outcome, but simply reject epochs with a very loose criterion. Our database-level

cross-validation also shows that the selected CNN-IHR series pairing is robust to

factors which should be irrelevant to the classification problem, such as the monitoring

machine, the original sampling rate, and the presence of measurement noise. An

interesting phenomenon we observe in Table 5.4 is that although the UCDSADB

contains subjects with sleep apnea of various severities, the model established from it

performs reasonably well on CGMH-validation and the DREAMS Subjects Database.

On the other hand, when the model is trained on CGMH-training or the DREAMS

Subjects Database (which are composed of normal subjects), we see that it does

not perform well on the UCDSADB, particularly on those subjects with severe sleep

apnea (see Figure 5.4). This result suggests an interesting conjecture: the capacity

of the CNN framework is large enough to include features from both normal and

115

abnormal subjects. In other words, the CNN efficiently learns features for both normal

and abnormal subjects from the UCDSADB, but it does not learn anything about

abnormal subjects from the DREAMS Subjects Database, and this conjecture explains

the asymmetric validation outcomes shown in Tables 5.1 and 5.4.

The transfer proficiency test deserves more discussion. In addition to showing that

the CNN model trained on one modality can be applied to another modality, we see

that the CNN model which is both trained and validated on IHR-PPG series performs

slightly better than the CNN model trained and validated on IHR series (see Tables

5.1 and 5.2). At first glance, this result seems unrealistic because the link between the

ECG signal and the ANS is more direct than the link between the PPG signal and the

ANS. However, our least obliging intuition is that this slight difference is a consequence

of the sensitivity of the PPG signal to movement. A moving subject is likely awake,

and when a subject moves, his PPG signal will experience more noise than his ECG

signal. Hence, on average, the IHR-PPG series will experience more distortions than

the IHR series. We could hypothesize that the CNN associates such distortions with

the wake stage, but other interpretations are available: note that although the IHR

and IHR-PPG series both aim to capture heart rate, they are different. The main

difference comes from the pulse transit time (PTT), which describes the time it takes

for pumped blood to travel from the heart to the peripheral, where the PPG signal is

recorded. It is well known that the PTT contains blood pressure information and more

general information about hemodynamics [MHI+15]. In other words, the IHR-PPG

contains both heart rate information and hemodynamics information. A large scale

empirical study is needed to confirm whether the CNN framework is sensitive enough

to exploit this additional information, and a theoretical analysis is recommended.

Finally, we discuss our exploration of the CNN framework provided in Section 5.4.

The analysis carried out is applicable to only one version of the trained model. There

116

is no theoretical basis for expecting the model to be the same after re-training. Besides

exhibiting a simple permutation of the features, the network could converge to an

entirely different local minimum, causing the new feature set to be distinct from the

old one. It is important to realize that although we have shown the robustness of our

model from various viewpoints, this lack of explicit reproducibility experienced with

NNs is troubling.

5.5.1 Comparison with Previous Work

We compare our result to previous studies in the field that feature inter-individual

validation or cross-validation. In [XYS+13], 41 features were used to build a random

forest model and differentiate between the wake, REM, and non-REM stages. The

database was composed of healthy subjects between the ages of 16 and 61. Based

on the confusion matrix provided in [XYS+13], the sensitivity, specificity, accuracy,

and F1 for detecting the wake stage are 51.2%, 90.2%, 84.0%, and 0.50. However, we

cannot make a direct comparison between our work and the work in [XYS+13] because

the authors only analyzed those data labeled with “stationary,” but we analyze the

whole database.

In [MMC+10], the REM and non-REM stages were differentiated. The authors

take the temporal information into account by using a time-varying auto-regressive

model, and they use the phase and magnitude of the “sleepy pole” as new features.

The database was composed of 24 subjects between the ages of 40 and 50 with body

mass index less than 29 kg/m2 and apnea-hypopnea index 0. The reported sensitivity,

specificity, and accuracy of the trained hidden Markov model was 70.2%, 85.1%, and

79.3%, respectively. Although the HRV properties of the REM stage are similar to

those of the wake stage, they are physiologically different and we cannot achieve a

fair comparison.

117

In [LSCS05], a variety of features and classification algorithms were considered

and evaluated on a database composed of 190 infants. The wake and sleep classes

were balanced for the analysis. The sensitivity and specificity of their multi-layer

perceptron model (without rejection) was 79.0% and 77.5%, respectively. Because

there are physiological variations between the sleep dynamics of infants and adults, a

comparison might not be meaningful.

In [AMT+15], a feed-forward NN was applied to various time-domain, frequency-

domain, and regularity features to differentiate between the wake and sleep stages.

Detrended fluctuation analysis was also used. Various epoch lengths were considered,

and the highest performance was recorded on an epoch length of 5 minutes. The ECG

recordings came from 20 subjects aged 49-68 years with varying degrees of sleep apnea.

The authors performed two cross-validation schemes: in the inter-individual leave-one-

subject-out scheme, the accuracy, sensitivity, specificity, and kappa coefficient was

71.9± 18.2%, 43.7± 27.3%, 89.0± 7.8%, and 0.29± 0.24, respectively. (The other

validation method was not inter-individual.) In comparison with our approach, the

lower performance of their model could be attributed to the physiological heterogeneity

of their subjects.

In [LFH+12], an adaptive method was used to extract spectral HRV features

from the IHR series. Fifteen subjects aged 31.0± 10.4 years with Pittsburgh Sleep

Quality Index less than 6 were considered in a leave-one-subject-out cross-validation

scheme. The linear discriminant-based classifier achieved a Cohen’s kappa coefficient

of 0.48±0.24 and an AUC of 0.54. The sensitivity was 49.7±19.2%, and the specificity

was 96± 3.3%. Further publications which perform inter-individual classification are

[TMPG05, KMF09, FLR+15]. However, they are not compared here because they

additionally report use of the respiratory signal.

Overall, due to the heterogeneity of the data sets used in different publications, it

118

is not easy to make direct comparisons, but we recognize from the reported numbers

the standard to which we should adhere.

5.5.2 Limitations and Future Work

From the database perspective, the chief limitation in this study is the class imbalance.

As a result of this imbalance, the training procedure is more challenging, and the

algorithm’s performance is skewed by the size difference between the two classes. This

can be seen in the low sensitivity of all validation checks. Another limitation is the

case number. It is widely believed that the larger the high quality database we have,

the better the model we can train. We show that a small training set from a few

subjects is enough to obtain a network which generalizes well, but, as demonstrated

by our results on the UCDSADB in Table 5.3, physiological heterogeneity related

to sleep apnea contributes strongly to adverse classification results. With a larger

database, we might tune the training process to accommodate physiological variability

among subjects and databases.

From the signal processing perspective, in this study we focus mainly on the IHR

and IHR-PPG series, which are extracted from the raw ECG and PPG signals. It is

clear that the IHR and IHR-PPG series are “dimension reduced” versions of the raw

ECG and PPG signals, which contain more information than simply heart rate; e.g.

respiratory information is usually hidden in the ECG and PPG signals, and blood

pressure and hemodynamics information is contained in the PPG signal. We might

obtain an improvement by taking the raw time series into account. The cost of doing so

increases as both the size of the network and the size of the training set would need to

grow. On the other hand, if there are more recorded vital signs available, taking them

into account may improve classification performance. For example, we might take

multiple ECG leads into account, or we might consider an ECG recording and a PPG

119

recording simultaneously. It has been shown in [TMPG05, KMF09, FLR+15] that

the respiratory flow signal contains abundant information related to sleep dynamics.

From the algorithmic and theoretical perspective, the network architecture em-

ployed in this paper is relatively simple compared to other works in the field (e.g.

[YNS+15]) and could be augmented to improve classification performance at the

cost of computational resources. We might introduce a recurrent structure to

the network, as in long short-term memory networks [HS97], to beneficially uti-

lize the sequential nature of the problem. Although there have been several efforts

[Mal12, PNB16, LTR17, WB17] to understand how CNNs and general deep NNs

work, it is still a quite open question. Again, understanding this “black box” is an

urgent mission for upcoming applications.

From the clinical perspective, accurately assessing the wake stage has an important

application in the field of sleep apnea screening. The apnea-hypopnea index (AHI) is

obtained by counting the apnea and hypopnea events that occur while the subject

is asleep. If a patient is labeled as awake while he is asleep, then his AHI will be

underestimated. Without a proper diagnosis, an unhealthy patient will continue

to be exposed to the dangers associated with severe sleep apnea. Our established

model can be applied to the problem of screening for sleep apnea using home-care

class IV equipment. There are several open problems listed above, ranging from data

collection, to theoretical work, to clinical applications. We will report the result of

resolving these problems in future work.

5.6 Conclusion

In this study, we build a CNN model to predict every 30 seconds whether a subject is

awake or asleep using his overnight ECG or PPG recording. The considered CNN not

only detects useful HRV features from the interpolated IHR series but also pinpoints

120

their location in time. Since the DREAMS Subjects Database is composed of different

races, our results suggest that the model is flexible enough to be applied to different

races. The acceptable performance on the UCDSADB shows that the model is flexible

enough to be applied to subjects with various severities of sleep apnea. Our series of

robustness checks provides evidence that although the CNN framework is minimally

understood from a theoretical viewpoint, its effectiveness in our chosen application is

not accidental.

5.7 Supplementary Results

For the sake of self-containedness, we present results in connection with the main

result which further validate the use of this method in the particular field of sleep

stage classification. Although it is not the main focus of this work, we have interest

in the binary problem of differentiating between the REM and non-REM stages while

a subject is asleep [MMC+10, AMT+15]. As in [AMT+15], we discard all epochs

labeled with “wake.” Since we are no longer concerned with the wake stage, we return

to our original database of 90 subjects from CGMH and reject a subject if his ratio of

“REM” to “non-REM” labels is less than 0.1. There are 77 subjects remaining, and

we call this database CGMH-training (REM). (Besides removing the “wake” epochs,

we do not change CGMH-validation for this experiment.) We train the CNN model

on signals of length 5 minutes from CGMH-training (REM), and report in Table 5.5

the performance measures associated with applying the trained model to the three

validation databases. In this table, the positive class is associated to the “REM”

label. After disregarding the “wake” labels, the percent of “REM” labels is 17.8% in

CGMH-training (REM), 14.8% in CGMH-validation, 18.1% in the DREAMS Subjects

Database, and 19.0% in the UCDSADB. Overall, the performance of the CNN model

appears to exceed the performances reported in [MMC+10, AMT+15]. Note that the

121

performance is lower in the UCDSADB, which is expected due to sleep disturbances

caused by sleep apnea.

The scattering transform (ST) [Mal12] is a theoretical framework based on the

wavelet transform that builds representations of time series and images that are useful

for classification. The ST is inspired by the success of the CNN framework and builds

features using a similar succession of operations. The crucial difference between the

ST and the CNN framework is that the filters employed in the ST are pre-designed

wavelets and are not learned. The theoretical properties of the ST are well-understood

and are discussed in [Mal12] with several generalizations [CL19, QCCS18]. We replace

the feature design step in our CNN model with the ST in order to provide evidence

that the effectiveness of our CNN model is not accidental. We compute the second

order (m = 2) ST of each input time series using Morlet wavelets, one wavelet per

octave (Q = 1), and a local averaging parameter of T = 27. A support vector machine

(SVM) is commonly used to perform the classification step. However, in our work,

the features designed by the ST are fed through a neural network classifier consisting

of two dense layers of 20 nodes; each node employs a bias and a sigmoid activation

function. The output of the second dense layer is fed with biases into two output

nodes corresponding to the wake and sleep stages. We train the network using scaled

conjugate gradient back-propagation for 200 epochs, and we use cross-entropy as the

loss function. We implement all of the ST and classification steps using MATLAB

R2015a, ScatNet-0.2,3 and the Neural Network Toolbox. After training the ST-based

model on CGMH-training, the performance statistics are shown in Table 5.6. The

results obtained on CGMH-validation are close to the corresponding results obtained

using the CNN model. The results obtained on the DREAMS Subjects Database

and the UCDSADB are acceptable but not as strong as the corresponding results

3http://www.di.ens.fr/data/software/scatnet/

122

Table 5.5: Performance statistics for the REM vs. non-REM model trained onCGMH-training (REM)

CGMH-training(REM)

CGMH-validation

DREAMSSubjects

UCDSADB

TP 5, 822 1, 563 1, 363 1, 626FP 3, 154 1, 122 713 1, 222TN 37, 632 13, 083 12, 974 11, 527FN 3, 017 901 1, 656 1, 361

SE (%) 65.9 63.4 45.2 54.4SP (%) 92.3 92.1 94.8 90.4

ACC (%) 87.6 87.9 85.8 83.6

PR (%) 64.9 58.2 65.7 57.1F1 0.65 0.61 0.54 0.56

AUC 0.91 0.89 0.87 0.83Kappa 0.58 0.54 0.45 0.47


obtained using the CNN model.

123

Table 5.6: Performance statistics for the scattering transform-based model trainedon CGMH-training

CGMH-training

CGMH-validation

DREAMS UCDSADB

TP 4, 026 1, 709 1, 569 1, 773FP 3, 874 1, 684 2, 560 2, 873TN 29, 819 14, 985 14, 123 12, 863FN 3, 753 1, 724 1, 780 2, 465

SE (%) 51.8 49.8 46.8 41.8SP (%) 88.5 89.9 84.7 81.7

ACC (%) 81.6 83.0 78.3 73.3

PR (%) 51.0 50.4 38.0 38.2F1 0.51 0.50 0.42 0.40

AUC 0.83 0.82 0.75 0.68Kappa 0.40 0.40 0.29 0.23


124

Chapter 6

Blind Source Separation in Atrial

Fibrillation

Electrocardiograms often feature an arrhythmia called atrial fibrillation (Af) that is

associated with heart failure and stroke [BCB12]. During Af, the upper (atrial) and

lower (ventricular) compartments of the heart are de-synchronized, and the recorded

ECG is a summation of two independent time series (see Figure 6.1). Isolating

these time series is a single-channel blind source separation problem. Cardiologists

are interested in leveraging the ECG recordings of Af patients to predict adverse

cardiac events and thus determine the necessity of a rhythm intervention. The

relationship between a patient’s ventricular activity and his or her susceptibility to

negative outcomes has been extensively studied (less so for Af patients in particular)

[MC90], but a rigorous analysis of the atrial activity (called the fibrillatory wave,

or the f -wave) has been elusive due in part to the lack of an effective extraction

algorithm. In this chapter, we discuss the development of an f -wave extraction

algorithm which improved on a number of existing algorithms; previous algorithms

would leave ventricular residua in the extracted f -wave [MRWW17, Figure 4]. The

algorithm was designed to work on single-channel ECG signals of arbitrary length

in order to leverage the increasing prevalence of mobile health monitoring devices

[BKH+14] and because of the need to screen for diurnal patterns in the f -wave. The

algorithm does not depend on expert interpretation or fine instrumentation, and thus

can increase the diagnostic capabilities of medical teams in rural areas or developing

countries.

There are several challenges we must face when performing f -wave extraction.

125

Figure 6.1: We show an ECG signal featuring atrial fibrillation. To aid physiciansin medical decision-making, the atrial activity (the f -wave) must be separated fromthe ventricular activity.

The spectral overlap between the ventricular activity and the f -wave guarantees

that a filtering approach will fail to fully separate the f -wave from the ventricular

activity. The second challenge is that due to the dynamics of the cardiac system,

the shape of each ventricular activation changes from beat to beat. In Af patients,

the situation is further complicated by frequent morphologically distinct premature

ventricular contractions (PVCs). The third challenge is the lack of ground truth.

In general, we do not have an explicit characterization of the separated atrial and

ventricular activities. Such a characterization could be obtained from an intracardiac

electrophysiology study. However, the relationship between the intracardiac atrial

activity and the f -wave that we observe in a surface ECG is not trivial. This lack of

ground truth makes validating a proposed algorithm difficult. To handle this difficulty,

we need a set of simulated recordings that well-approximate real Af cases, as well as a

new metric to evaluate the performance of the algorithm in a real signal. Having a

systematic, reliable, and accurate method to extract the f -wave from a single-lead

ECG for the purposes of monitoring, diagnosis, and treatment is not only a challenging

126

and interesting signal processing problem, but also an important clinical endeavor.

This chapter is a modification of the work presented in [MRWW17] and is organized

in the following way. In Section 6.1, we introduce the proposed single-lead f -wave

extraction algorithm. We specify our model for an ECG featuring Af, and we

provide a theoretical guarantee for the quality of the extracted f -wave under suitable

assumptions. In Section 6.2, we validate our proposed algorithm on simulated signals,

as well as on real Holter recordings. We compare our algorithm to other common

algorithms in the literature. The simulation methods are detailed for the purpose of

reproducibility. We provide a discussion of the result in Section 6.3.

6.1 Methods

In this section, we describe our new single-lead f -wave extraction algorithm, the

simulated database of Af signals, and the metrics used to evaluate our algorithm. We

call our algorithm DD-NLEM.

6.1.1 DD-NLEM

We begin by providing an overview of the theoretical aspects of our algorithm. An Af

patient’s ECG signal E:R→ R can be written as

E = V + A+ Φ, (6.1)

where V :R → R is the ventricular activity, A:R → R is the atrial activity, and

Φ:R → R is measurement noise. The input to our single-lead f -wave extraction

algorithm is the signal E, and the desired output is the signal A + Φ. Due to the

rapid nature of the f -wave, the signal A will be periodic with frequency 3-12 Hz. A

bandpass filter h:R→ R can be applied so that

h(E) = h(V ) + ε, (6.2)

127

where ε:R → R is a small noise term describing the difference between h(V ) and

h(V + A + Φ). However, the filtered h(V ) is a distortion of the original V , and if

we simply attempt to remove h(E) from E, the resulting depiction of the f -wave

will have significant residual artifacts because the signals V and h(V ) can be quite

different. The remedy we propose is as follows. The ventricular activity during the

first N ventricular contractions (heart beats) can be written as

V (t) =N∑i=1

vi(t− ti), (6.3)

where vi ∈ V is sampled from a wave-shape manifold V embedded in L2(R), and there

exists a positive number b > 0 such that each vi is supported on [−b, b]. In other

words, V adheres to the wave-shape oscillatory model with wave-shape manifold V.

The manifold V is isomorphic to the implicit phase manifold describing the dynamics

of the ventricles. Each vi represents the i-th ventricular contraction, and ti ∈ R is the

time at which the i-th contraction occurs.

The time points ti can be recovered from the signal E using a standard peak

detection algorithm. Then, a set of patches X := xiNi=1 ⊂ L2[−b, b] is constructed

using the formula

xi(t) = E(ti − t). (6.4)

The corresponding filtered patches are also assembled: h(X) := h(xi)Ni=1. Now

xi = vi + ai + φi, (6.5)

and

h(xi) = h(vi) + εi, (6.6)

where ai(t) := A(ti − t) and φi(t) := Φ(ti − t) for t ∈ [−b, b], and εi is a small noise

term describing the remainder h(vi + ai + φi)− h(vi). We assume that h(vi) has been

sampled from a submanifold W of L2[−b, b] which is diffeomorphic to V .

128

The truncated diffusion map (an unsupervised manifold learning algorithm) yields

a mapping Θp : h(X) → Rp which approximates an isometry of W (and hence an

embedding of V , via the diffeomorphism) into Rp for some p that is large enough. Most

importantly, the mapping Θp can be made agnostic to the noise εi by ignoring the

eigenvectors of the diffusion operator which correspond to smaller eigenvalues. Given

a patch xi, we can estimate vi by an approach called the non-local median. First,

find the k-nearest neighbours to Θp h(xi) from the set Θp h(X), and label them

Θph(xi1), ...,Θph(xiK). Due to the embedding which preserves local neighbourhoods,

we can expect that

vi := medianxi1, ..., xiK = vi + medianai1 + φi1, ..., aiK + φiK, (6.7)

where the median is taken point-wise in t. Since the atrial activity and measurement

noise in each patch are independent of the ventricular activity in that patch, we have

medianai1 + φi1, ..., aiK + φiK = median1≤j≤Naj + φj, (6.8)

which we assume is the zero function on [−b, b]. Consequently, vi is an accurate

estimate for vi. Removing

V (t) :=N∑i=1

vi(t− ti) (6.9)

from E yields the signal of interest, namely A+ Φ. Having provided an overview of

the theory behind DD-NLEM, in the remainder of this section, we discuss its numerical

implementation.

Pre-processing

The pre-processing step contains several commonly applied methods in the field. The

recorded surface ECG is sampled at a rate of fs samples per second and is saved as a

n-dimensional vector E ∈ Rn, where for each i ∈ 1, ..., n, we have

E(i) = E (i/fs) . (6.10)

129

The signal lasts for T = n/fs seconds, and the collected samples satisfy

E = V + A + Φ , (6.11)

where V,A,Φ ∈ Rn are discretizations of V , A, and Φ, respectively. The goal of the

implementation is to cancel V from E so that we obtain the discretized f -wave A+Φ.

If fs is lower than 1000 Hz, the signal is upsampled to fs = 1000 Hz to enhance

the resolution of the R wave detection step that will follow. We remove baseline

wandering and attenuate noise by applying an order-20 bi-directional IIR bandpass

filter with half power frequencies 3 and 40 Hz. We apply the QRS complex detection

algorithm proposed by Elgendi [Elg13] to find all ventricular activations in the signal.

We assume that the N R waves in E are correctly recovered. The sample index ri of

the ith R wave is the integer part of 1000× ti. At this stage, we can delineate the

i-th cardiac cycle. Define

xi(t+ 301) = E(ri + t) − 300 ≤ t ≤ 700. (6.12)

The intuition is that xi represents the discretization of xi.

Metric Design

We now design the filter h that will remove the atrial activity from the recording

(while distorting the ventricular activity). The filtered signal will allow us to recover

the wave-shape manifold V underlying the set of ventricular contractions. We perform

an adaptation of the filtering performed in Elgendi’s QRS detection algorithm to yield

a signal which contains primarily QRS complex activity. We apply a bidirectional 3rd

order Butterworth bandpass filter to E with cutoff frequencies 8 and 40 Hz. Write

s for the filtered signal. We let q be the result of applying a moving average filter

with window length 100 ms to s s, where is the Hadamard element-wise product.

The purpose of taking the coordinate-wise square is to emphasize the high-frequency

130

and high-amplitude content in s. Set y = q s to be the discretization of h(E). We

excise from y a collection of surrogate patches yiNi=1 ⊂ R161 defined as

yi(t+ 81) = y (ri + t) − 80 ≤ t ≤ 80. (6.13)

The intuition is that yi is a discretization of the “center” of h(xi). Note that the T

wave has been suppressed by the 3 Hz highpass filter, and the ventricular activity

remaining in yi is a deformation of the i-th QRS complex.

Ideally, the surrogate patches yi and yj are similar if and only if vi is similar to vj .

However, since the patch yi contains a discretized deformation of vi as well as noise,

we need a metric besides the Euclidean distance in R161 which is robust to that noise.

The metric we will use is the diffusion distance on yi. To be specific, we perform

the complete-graph diffusion map as recommended by El Karoui and Wu. Define the

self-tuning affinity matrix W ∈ RN×N as

Wij =1

2

[exp

(−‖yi − yj‖2

σ2i

)+ exp

(−‖yi − yj‖2

σ2j

)], (6.14)

where σi is the median Euclidean distance from yi to the set ykNk=1. Define the

diagonal degree matrix D ∈ RN×N as

Dii =∑j≥1

Wij. (6.15)

We carry out the α-normalization step with α = 1 to handle the (possibly) non-uniform

density function:

W (α) = D−1WD−1. (6.16)

Define the isotropic diffusion kernel P as

P =(D(α)

)−1/2W (α)

(D(α)

)−1/2, (6.17)

131

where D(α)ii =

∑j≥1W

(α)ij . Write the eigenvectors of P as ϕ1, ..., ϕN , with their

corresponding eigenvalues λ1 ≥ · · · ≥ λN . The truncated time-1 diffusion map

restricted to yi is

Θp : yi 7→ (λ2φ2(i), . . . , λp+1φp+1(i))> ∈ Rp, (6.18)

where p = 80 is chosen by the user, and ϕj =(D(α)

)φj for each 1 ≤ j ≤ N . Finally,

the diffusion distance (which approximates the geodesic between points on V) is the

pullback of the canonical metric on Rp. This diffusion distance dDD depends solely on

the ventricular activity and is robust to the residual noise in the surrogate patches.

In particular, we have

dDD(yi,yj) = ‖Θp(yi)−Θp(yj)‖. (6.19)

Template Calculation and Removal

With the designed metric dDD, we are able to estimate the ventricular activity in

each patch xi. We apply a technique called the non-local Euclidean median (NLEM)

to achieve this goal. The NLEM is computed using an iteratively reweighted least

squares-type algorithm as described in [CS12]. Choose a number of neighbours k = 30.

For each patch yi, choose the k-nearest patches, denoted yi1, ...,yik, using the

metric dDD. Note that yi1 = yi. Let vi ∈ R1001 be the estimated i-th ventricular

activation, given by

vi = argminz∈R1001

k∑j=1

‖xij − z‖2. (6.20)

The final step in DD-NLEM is removing the estimated ventricular activity from E.

Initially, set a = E, and then iteratively substitute

a(ri − 300, ..., ri + 700) = xi − ζ vi (6.21)

132

for every i = 1, . . . , N , where the taper function ζ ∈ R1001 is, for c = 100,

ζ(i) =

sin2

[π(i−1)

2c

]if 1 ≤ i ≤ c

1 if c < i < 1001− csin2

[π(i−1001)

2c

]if 1001− c ≤ i ≤ 1001.

(6.22)

In other words, we obtain a blending of the estimated f -wave with the patch xi near

the edges of xi. Removal of the patches vi should be done in chronological order. The

output of our algorithm is a.

6.1.2 A Variation of DD-NLEM

In some cases, we may only have a short ECG recording. In this case, running the DM

algorithm might not be feasible. In this situation, we could set p = 161, Θp = idR161

and replace (6.20) by

vi = argminz∈R1001

k∑j=1

‖xij − z‖2. (6.23)

Instead of using the embedded patch space, we directly use the (possibly noisy)

patches yi to estimate the ventricular activity. We call this variation NLEM.

6.1.3 Material

To validate the proposed algorithm, we consider simulated signals and a set of real

Holter signals. We prepare simulated signals by adding an atrial wave and noise to a

purely ventricular signal.

Simulated f-wave

Our simulated f -wave is generated using a generalization of the sawtooth model

[SS01, PMSL12]. The sawtooth model is as follows. Define the atrial wave a:R→ R

133

to be

a(t) = −απ

M∑m=1

(−1)msin (2πmξt)

mt ∈ R, (6.24)

where α = 0.25± 0.025 is the amplitude of the wave, ξ = 6± 1 is the frequency, and

M = 4 is the number of sinusoidal harmonics in the sawtooth wave. The above model

is discretized at the sampling rate fs as

a(j) =α

π

M∑m=1

(−1)m+1

msin

(2πmξj

fs

)j ∈ Z. (6.25)

To obtain a more realistic f -wave, we generalize the sawtooth model by taking the

adaptive non-harmonic model [LSW18] into account. In particular, we allow the

amplitude factor and the frequency to vary over time. To model this variation, we

simulate AR(1) processes α:N→ R+ and ξ:N→ R+ as follows.

α(j + 1) = α(j) + (1− θα)(µα − α(j)) + εα(j) (6.26)

ξ(j + 1) = ξ(j) + (1− θξ)(µξ − ξ(j)) + εξ(j), (6.27)

where θα = θξ = 0.99 are constants, µα = 0.1 ± 0.01 is the mean amplitude, µξ =

exp(log(4)± 0.2) is the mean frequency, εα:N→ R is normal with standard deviation

σα = 0.001, and εξ:N → R is normal with standard deviation σξ = 0.1. The atrial

wave is then

a(j) =α(j)

π

M∑m=1

(−1)m+1

msin

(2πm

fs

∑i≤j

ξ(i)

)j ∈ N. (6.28)

The noise of the simulated ECG is created by applying a 3-40 Hz bandpass filter to

Gaussian white noise and scaling the standard deviation of the filtered noise to be

.05µα. Finally, we invert the simulated f -wave over the horizontal axis with probability

0.5. The generalization of the model in [SS01, PMSL12] lies in the randomness

134

of the modulation. Note that in our model, the amplitudes and frequencies are

varied according to a random process, which results in less regularity in the signal.

Furthermore, all oscillations generated in the sawtooth model are not symmetric, and

inverting over the horizontal axis yields a different signal. Amplitude and frequency

modulation is especially relevant when simulating long f -wave signals.

Simulated Ventricular Activity

We obtain ventricular activity simulations using the dynamical model ECGSYN

[MCTS03]. We generate signals lacking P waves, and we vary the amplitudes, positions,

and widths of the Q, R, S, and T waves from signal to signal. The implementation of

ECGSYN that we use performs QT interval stretching according to Bazett’s formula.

In practice, the relationship between RR interval length and QRST complex amplitude

is noisy. The ventricular signals are thus integrated beat-wise, and the amplitude

of each QRST complex is scaled by a factor of 1 ± 0.05. (In some cases, T wave

amplitude is non-linearly related to QRS complex amplitude, but we do not simulate

this relationship.) To simulate small perturbations in the cardiac axis from beat to

beat, we modify the z-positions of the Q, R, and S peaks in each beat by adding 0± 2.

Note that our synthetic ventricular signals do not contain ectopic beats, and that

after adding the f -wave, we apply the pre-processing steps of DD-NLEM. We assemble

the final ECG signals by combining the simulated f -waves with these simulated

ventricular signals.

The parameters used for the simulations in ECGSYN are summarized here. The

measurement noise is set to zero, the mean heart rate is set to 80 ± 8 bpm with

standard deviation 10 bpm, the low frequency-to-high frequency ratio is set to 0.5,

the angle, z-position, and Gaussian width parameters are −70, 0, and 0 for the P

wave; −12± 2, −5± 10, and 0.05± 0.01 for the Q wave; 0, 20± 8, and 0.08± 0.01 for

135

the R wave; 12± 2, −15± 8, and 0.07± 0.01 for the S wave; and 90± 10, 0.5± 0.2,

and 0.12± 0.02 for the T wave.

Real Holter Signals

To evaluate the potential of applying the proposed algorithm to real data, we collect

continuous, one-hour ECG signals from 47 patients with persistent Af visiting Chang

Gung Memorial Hospital during the year 2011. Patients received 24-hour Holter

recordings in order to screen for cardiac arrhythmia, examine rate control therapy in

atrial fibrillation, or survey symptoms related to rhythm disturbances. This study is

retrospective, and the Chang Gung Medical Foundation Institutional Review Board

approved the study protocol (No. 103-7449B and No. 104-7769C). The data is

collected from a DigiTrak XT (Philips), and we focus on the first channel under the

EASI lead system.

6.1.4 Other Algorithms

Our algorithm seeks to improve upon the performance of several existing algorithms,

including average beat subtraction (ABS) [SBM+85, SSS92, HPIe98, SYCH04], lo-

cal principal component analysis (PCA) [CMR+05], local adaptive singular value

cancellation (aSVC) [AR08] and non-local aSVC [AR08, Section 3.3]. We describe

these algorithms in detail. In each of these algorithms, we perform pre-processing as

described in Section 6.1.1. These algorithms differ only when it comes to ventricular

template selection. The ABS, local PCA, and local aSVC algorithms obtain tempo-

rally local estimates for the ventricular activity in each cardiac cycle. In ABS, the

ventricular template is the average of the nearest k = 30 patches to xi. In local PCA,

for each patch xi, we evaluate the singular value decomposition of the data matrix

136

X = USV>, where

X =

x>i−14...

x>i+15

∈ R30×1001. (6.29)

We then set the template vi to be

v>i = U(15, 1)S(1, 1) col1(V)>. (6.30)

In local aSVC, we set the template to be

vi = RSiU(15, 1) col1(V)>, (6.31)

where RSi is a constant that intends to scale the singular vector col1(V) so that its

height matches the height of the patch xi. To this end, we define

RSi =xi(301)−min1≤j≤60 xi(300 + j)

U(15, 1)V(301, 1)−min1≤j≤60 U(15, 1)V(300 + j, 1). (6.32)

We clip the denominator so that it is at least 0.01 to prevent RSi from exploding.

In non-local PCA, the cosine affinity is used to determine the neighbours used for

template calculation. Let xi1, ...,xi30 be the 30-nearest patches to xi in terms of

the metric

dcos(xi,xj) =

∑|k|≤80 xi(300 + k)xj(300 + k)√∑

|k|≤80 xi(300 + k)2√∑

|k|≤80 xj(300 + k)2(6.33)

known as the cosine affinity. Then, estimate the ventricular activity in the patch xi

by taking the singular value decomposition of X = USV>, where

X =

x>i1...

x>i30

∈ R30×1001, (6.34)

and setting

v>i = U(1, 1)S(1, 1) col1(V)>. (6.35)

137

Table 6.1: Comparison of existing single-lead f -wave extraction algorithms fromthree viewpoints

ABS [SBM+85] local PCA [CMR+05] non-local PCA NLEM DD-NLEM

and aSVC [AR08] and aSVC [AR08]

Neighbours local local non-local non-local non-local

Metric time time cosine affinity ‖yi − yj‖ diffusiondesign or correlation distance dDD

Template mean first principal first principal Euclidean Euclideandesign component component median median

The algorithms we consider are average beat subtraction (ABS), principal component analysis(PCA), adaptive singular value cancellation (aSVC), and the two algorithms introduced in this

paper, namely NLEM and DD-NLEM.

The approach in non-local aSVC is similar, but the neighbours are calculated using

the correlation coefficient instead of the cosine distance, and each template vi is scaled

so that its height matches the height of the patch xi. Once the estimated templates

have been obtained, we remove them from the patches xi using (6.21) as described

in the last step our algorithm. A graphical overview of the algorithms we consider is

provided in Table 6.1.

6.1.5 Evaluation Metrics

We use multiple metrics to quantify the cancellation of the ventricular activity. In the

case of simulated signals, we have an explicit characterization of the atrial activity.

The normalized mean squared error (NMSE) [CRMZ05, Equation (16)] between the

simulated f -wave a ∈ Rn and the extracted f -wave a ∈ Rn is defined as

NMSE =

∑ni=1(a(i)− a(i))2∑n

i=1 a(i)2. (6.36)

We also consider the cross-correlation ρ between a and a [AR08, Equation (16)]. We

further evaluate the signal-to-noise ratio (SNR) and the peak signal-to-noise ratio

138

(PSNR) commonly considered in the literature using the formulas

SNR = 20 log10

[std(a)

rmse(a, a)

]; PSNR = 20 log10

[maxi|a(i)|rmse(a, a)

], (6.37)

where rmse is the root mean squared error. Note that if a and a have the same mean,

then rmse(a, a) = std(a− a). When dealing with real ECG signals, where an explicit

characterization of the f -wave in each recording is not available, we consider the

ventricular residue (VR) index proposed in [AR08, Equation (18)] to evaluate how

well the QRS complex is canceled. (It is not designed to assess T wave cancellation.)

The VR index for the ith ventricular contraction is defined as

VRi =N∑N

l=1 a(l)2

√√√√ ti+H∑l=ti−H

a(l)2 maxti−H≤l≤ti+H

|a(l)|, (6.38)

where H = 50. When the ventricular activity is not accurately removed and the

`∞-norm of the extracted f -wave is high, the VR index is large. While the VR index

provides information about ventricular residua, it has several limitations. The most

problematic one is its tendency to accept “over-smoothing.” If we remove all of the

cardiac activity from the ECG signal, then the VR index is zero. Hence, we still need

an index that can simultaneously measure the cancellation of the ventricular activity

and the preservation of the atrial activity. Inspired by the VR index, we propose the

following modified VR (mVR) index. For the ith ventricular contraction, define

mVRi :=1

2

[med|ai − med(ai)|med|bi − med(bi)|

+med|bi − med(bi)|med|ai − med(ai)|

]

× 1

2

[max|ai − med(ai)|q95|bi − med(bi)|

+q95|bi − med(bi)|max|ai − med(ai)|

], (6.39)

where med is the median operator, q95w is the 95th quantile of the vector w,

ai(t+ 300) = a(ri + t) − 300 ≤ t ≤ 300, (6.40)

139

and bi is obtained by concatenating segments of (pre-processed) E that are within

10 seconds of ti but not within 300 ms of any R wave. Note that the first term

measures how well the “spectrum” of the f -wave is preserved. This term is introduced

to prevent over-smoothing. The second term measures the presence of ventricular

residua. This index captures simultaneously how well the ventricular activity is

removed and how well the atrial activity is preserved. To provide a summary of all

mVRi values obtained on one recording, we define

mVR =1

N

N∑i=1

mVRi. (6.41)

Finally, we also consider the spectral concentration (SC) index proposed in [CRMZ05,

Equation (16)]. The one-sided power spectral density P ∈ R4001 of a is calculated using

Welch’s method, featuring 8000 Fourier modes, Hamming windows of 400 samples,

and 50% overlapping. The SC index for a signal sampled at 1000 Hz is defined as

SC =

∑96i=24 P (i)∑4001i=1 P (i)

(6.42)

and is the percentage of energy in the frequency range 3-12 Hz. This band is chosen

to be sufficiently wide to cover the f -wave spectrum.

6.2 Results

We apply the algorithm to a variety of real and simulated ECG signals, compare our

algorithm to the previous algorithms described in Section 6.1.4, and use the evaluation

metrics in Section 6.1.5 to measure performance. We visually and quantitatively

justify our use of the diffusion distance. All of our experiments are conducted in

Mathworks MATLAB R2019a using a 4th Generation Intel Core i7 processor and 24

GB of RAM.

140

6.2.1 Comparison with Previous Algorithms

We validate DD-NLEM and compare its performance with that of ABS [SBM+85,

SSS92, HPIe98, SYCH04], local PCA [CMR+05], non-local PCA, local aSVC [AR08]

and non-local aSVC [AR08]. We also show the performance of NLEM. We facilitate

this comparison in two settings. In the first setting, we evaluate the performance

of our algorithm using synthesized one-hour signals generated by our generalized

sawtooth model. We generate a total of 26 simulated signals and apply the above-

mentioned algorithms to each signal. Instead of running the suggested R peak

detection algorithm, we used the R peak locations provided by the simulation software

ECGSYN. A summary of results for the first event is shown in Table 6.2. In this

setting, we see that both NLEM and DD-NLEM outperform other methods in terms of

all considered evaluation indices except computational time, and DD-NLEM performs

the best. Due to the nearest neighbour search, the non-local algorithms take longer

than their local versions. On the other hand, NLEM and DD-NLEM are slightly more

efficiently than non-local PCA and non-local aSVC since we evaluate the median (a

first order statistic) instead of the singular value decomposition. In Figure 6.2, we

show how the various algorithms perform on one of the simulated signals. In the

second setting, we apply each of the six algorithms to 47 real Holter signals and

display the corresponding mVR and SC indices in Table 6.3. The proposed DD-NLEM

performs the best out of the six algorithms. In Figures 6.3 and 6.4, we show how the

various algorithms perform on one of the real Holter signals.

Overall, we can see that the ABS, local PCA, and local aSVC algorithms tend to

provide accurate reconstructions of the f -wave in regions which contain no ventricular

activity, but they present larger QRS complex residua than their non-local counterparts.

On the other hand, while non-local PCA and non-local aSVC remove QRS complexes

quite well, they tend to provide an “over-smoothed” f -wave estimation. One reason

141

for this over-smoothing is that the f -wave is not suppressed before evaluating the

cosine or correlation affinity between cardiac cycles. When the signal is short, these

approaches have an opportunity to perform well. However, when the signal is long

(which is the situation we encounter in Holter monitoring), the probability of finding

two patches with similar ventricular activations and similar f -waves is high. As a

result, the top singular vector is contaminated by the f -wave and most often is not a

good estimate for the ventricular activity. Another problem encountered by the PCA

and aSVC approaches is the large p and large n issue, which in the past decade has

been widely discussed [Joh07]. Under our assumed model, the f -wave behaves like

high dimensional “random noise,” while the ventricular activity can be viewed as the

signal we want to recover. Taking the singular value decomposition in the large p and

large n setting leads to the estimated ventricular activity being biased; in this case, a

correction is needed [SW13, DGJ18]. The problems encountered by these algorithms

are avoided by the proposed NLEM and DD-NLEM algorithms; the f -wave is attenuated

before attempting to find matching ventricular contractions, the diffusion distance

is applied to reduce the influence of noise, and the unbiased Euclidean median is

considered when computing the ventricular template.

6.2.2 Validation of the Proposed mVR Index

To validate the meaningfulness of the proposed mVR index in real ECG signals, we

examine carefully the relationship between the mVR index and the NMSE in our set

of simulated signals. In Figure 6.5, we plot mVR values against the corresponding

NMSE values; we aggregate results from all seven algorithms and all 26 simulated

signals. It is clear that when the NMSE is larger than 0.3, the mVR index is positively

correlated with the NSME. This result indicates that, while mVR might be insensitive

to small errors, the mVR index can detect large errors in f -wave recovery. In addition

142

Figure 6.2: After running each of the seven algorithms on a simulated, one-hourECG signal featuring Af, we plot the extracted f -waves in fuchsia. The true f -wave(built using the generalized sawtooth model) is shown in gray.

143

Figure 6.3: After running each of the five previous algorithms on a real, one-hourHolter recording featuring Af, we plot the extracted f -waves in teal.

144

Table 6.2: Comparison of f -wave extraction algorithms on simulated one-hour ECGsignals

Method NMSE ρ SNR PSNR

ABS 0.79± 0.44 0.75± 0.09 1.65± 2.48 10.22± 2.42PCA 0.49± 0.29 0.81± 0.07 3.71± 2.28 12.28± 2.23aSVC 0.53± 0.34 0.81± 0.08 3.42± 2.49 11.99± 2.44

non-local PCA 0.61± 0.09 0.63± 0.06 2.21± 0.60 10.78± 0.70non-local aSVC 0.47± 0.06 0.72± 0.04 3.27± 0.52 11.84± 0.64

NLEM 0.24± 0.08 0.87± 0.04 6.34± 1.45 14.90± 1.50DM-NLEM 0.16± 0.04 0.91± 0.02 8.01± 1.23 16.58± 1.26

Method mVR SC Time (sec)

ABS 1.39± 0.21 0.67± 0.05 0.24± 0.03PCA 1.27± 0.14 0.59± 0.07 12.28± 2.78aSVC 1.29± 0.16 0.62± 0.06 12.21± 2.70

non-local PCA 1.73± 0.08 0.64± 0.04 13.86± 2.97non-local aSVC 1.68± 0.07 0.66± 0.05 13.89± 2.96

NLEM 1.24± 0.11 0.69± 0.06 7.48± 1.53DD-NLEM 1.12± 0.05 0.70± 0.07 11.00± 2.39

We provide a summary of the performance of the seven algorithms when extracting f -waves fromsimulated Af signals. The results over all simulations are summarized as mean ± the standard

deviation. For each evaluation metric, the best method is marked in bold. The abbreviations areNMSE (normalized mean squared error), ρ (cross-correlation), SNR (signal-to-noise ratio), PSNR

(peak signal-to-noise ratio), mVR (mean modified ventricular residue index), and SC (spectralconcentration).

145

Figure 6.4: After running DD-NLEM and its variation NLEM on a real, one-hour Holterrecording featuring Af, we plot the extracted f -waves in teal.

Table 6.3: Comparison of f -wave extraction algorithms on real one-hour Holtersignals

Method mVR SC Time (sec)

ABS 1.81± 1.37 0.69± 0.11 0.26± 0.06local PCA 1.72± 1.14 0.66± 0.10 13.45± 3.64local aSVC 1.98± 1.37 0.69± 0.10 13.48± 3.83

non-local PCA 1.86± 1.32 0.65± 0.09 15.88± 4.69non-local aSVC 1.93± 1.38 0.67± 0.09 15.74± 4.71

NLEM 1.65± 1.20 0.71± 0.09 8.54± 2.74DD-NLEM 1.59± 1.09 0.72± 0.09 13.35± 4.99

We provide a summary of the performance of the seven algorithms when extracting f -waves fromreal one-hour Holter signals featuring Af. The results over all subjects are summarized as mean ±

the standard deviation. For each evaluation metric, the best method is marked in bold. Theabbreviations are mVR (mean modified ventricular residue index) and SC (spectral concentration).

146

to the philosophy behind the design of mVR index discussed above, these results

support the usefulness of the proposed mVR index when evaluating the performance

of an f -wave extraction algorithm on real data.

Figure 6.5: A scatter plot of the mVR (mean modified ventricular residue) indexversus the normalized mean squared error (NMSE). We include the results from allseven algorithms evaluated on all of the simulated signals generated by the generalizedsawtooth model. We see that when the mVR index is low, so is the NMSE.

6.3 Discussion and Conclusion

Designing a correct metric is at the core of data analysis. In this work, our metric

design process involves two steps. First, inspired by Elgendi’s QRS detection algorithm

[Elg13] and based on a priori physiological knowledge about the f -wave, we design a

series of operations which succeed in suppressing the f -wave. However, the filter is not

perfect, and since the residual f -wave behaves like noise, we apply the diffusion maps

algorithm to stabilize the manifold parametrizing ventricular activations. The diffusion

maps algorithm and the diffusion distance have been theoretically studied and shown

147

to be robust to errors in affinity estimation. The proposed DD-NLEM algorithm has two

novelties when compared with traditional single-lead f -wave extraction algorithms.

First, the non-local nature of the algorithm allows us to leverage information from

the entire signal to robustly and accurately estimate a template for the ventricular

activity in each beat; second, our filtering process and our use of the diffusion distance

prevents us from over-smoothing the extracted f -wave.

There are several issues that we should address. The most important one is the

issue of outliers. Specifically, when there exist beats that are morphologically far away

from all other beats, the algorithm may not perform well. The commonly-encountered

ectopic beat is an example. The existence of ectopic beats poses a unique problem in a

number of ways. They vary significantly in morphology when compared to the signal’s

normal beats. They can also vary significantly among themselves, and the number

of ectopic beats in a 1-hour recording may be as low as 1. As such, local ABS-type

algorithms are severely limited, and a non-local algorithm is needed. In this work,

we assume that if a recording has an ectopic beat, there are enough morphologically

similar beats in the recording to guarantee that we can effectively remove it using the

proposed algorithm. While we see reasonable performance in the database of 47 real

Holter signals, there is still room for improvement. For example, we could pool beats

from different subjects to more accurately match the morphology of any ectopic beat.

Nevertheless, using longer signals would likely alleviate this issue, as the probability of

finding morphologically similar beats later in the recording becomes higher. Another

issue is the inevitable noise. Note that removing the ventricular activity results

in a signal which consists of the atrial activity and noise; post-processing may be

required. We do not attempt to solve this problem in this work because of limited

knowledge concerning f -wave regularity and its clinical significance. Nevertheless,

there are several post-processing approaches which may be useful, such as adaptive

148

recurrent filtering algorithms [SK85]. The final issue is that the T wave is not handled

separately in this work. First, we assume that the T wave has been suppressed by

the highpass filter at 3 Hz (in the pre-processing step). Nevertheless, there may be

residual information from the T wave (particularly when dealing with other lead

types). In this case, we assume that the T wave and the QRS complex have related

morphologies. However, some works in the field show the benefit of dealing with the

T wave separately [LJF+05], and we may consider a separate T wave cancellation

procedure in future work.

We mention several future directions. First, in this work, we focus on patients

with persistent Af. The analysis of patients with paroxysmal Af is of its importance

and will be explored in upcoming research. For example, how can one detect the

spontaneous initiation and termination of Af? Second, while we model the manifold

underlying the patch space xi by a fiber bundle structure, the topological structure

of the fiber bundle is not fully utilized. Moreover, we do not take the manifold

structure into account when performing the Euclidean median; a geometric median

could be considered. Third, having designed an appropriate algorithm for extracting

the f -wave signal, we need an informative feature which derives from the f -wave

useful electrophysiological properties of the atrium [LZCS14, BZK+14]; such a feature

could reflect the patient’s disease status and see relevance for the purposes diagnosis,

treatment, and prognosis. Several indices have been proposed to quantify the intrinsic

features underlying the f -wave [LZCS14, BZK+14]. While these indices are informa-

tive for different clinical applications [LZS+16], the dynamics of the f -wave remain

unstudied. We will apply other non-linear algorithms to extract these dynamical

features and study clinical problems. Fourth, although we confirm the performance

of our algorithm using simulated data, we have not provided a full proof of efficacy.

Since it is now relatively easy to obtain intracardiac signals, we plan to study the

149

relationship between the f -wave observed in the surface ECG and the corresponding

intracardiac signal; an understanding of this relationship would allow us to better

understand the f -wave and hence design a better algorithm to decouple the f -wave

from the ventricular activity in a single-lead recording.

150

Bibliography

[AAe15] M. Abadi, A. Agarwal, and et. al. TensorFlow: Large-scale machinelearning on heterogeneous systems, 2015. Software available fromtensorflow.org.

[AHP+01] G. Alan, E. Hylek, K. Phillips, Y. Chang, L. Henault, J. Selby,and D. Singer. Prevalence of diagnosed atrial fibrillation in adults:national implications for rhythm management and stroke prevention:the nticoagulation and risk factors in atrial fibrillation (ATRIA)study. JAMA, 2370-2375:157–168, 2001.

[AJY12] Mourad Adnane, Zhongwei Jiang, and Zhonghong Yan. Sleep–wakestages classification and sleep efficiency estimation using single-leadelectrocardiogram. Expert Systems with Applications, 39(1):1401–1413, 2012.

[Aki02] M. Akin. Comparison of wavelet transform and FFT methods in theanalysis of EEG signals. Journal of medical systems, 26(3):241–247,2002.

[AL12] Majd AlGhatrif and Joseph Lindsay. A brief review: history to un-derstand fundamentals of electrocardiography. Journal of communityhospital internal medicine perspectives, 2(1):14383, 2012.

[ALG+13] A. Arza, J. Lazaro, E. Gil, P. Laguna, J. Aguilo, and R. Bailon. Pulsetransit time and pulse width as potential measure for estimatingbeat-to-beat systolic and diastolic blood pressure. In Computing inCardiology Conference, pages 887–890. IEEE, 2013.

[AM14] J. Anden and S. Mallat. Deep scattering spectrum. IEEE Transac-tions on Signal Processing, 62(16):4114–4128, 2014.

[AMT+15] Md Aktaruzzaman, Matteo Migliorini, Mirja Tenhunen, Sari-LeenaHimanen, Anna M Bianchi, and Roberto Sassi. The addition ofentropy-based regularity parameters improves sleep stage classifica-tion based on heart rate variability. Medical & biological engineering& computing, 53(5):415–425, 2015.

[AOH+17] U Rajendra Acharya, Shu Lih Oh, Yuki Hagiwara, Jen Hong Tan,Muhammad Adam, Arkadiusz Gertych, and Ru San Tan. A deepconvolutional neural network model to classify heartbeats. Computersin biology and medicine, 89:389–396, 2017.

151

[AR07] R. Alcaraz and J.J. Rieta. Adaptive singular value QRST cancellationfor the analysis of short single lead atrial fibrillation electrocardio-grams. Computers in Cardiology, pages 513–516, 2007.

[AR08] R. Alcaraz and J. J. Rieta. Adaptive singular value cancelation ofventricular activity in single-lead atrial fibrillation electrocardiograms.Physiological measurement, 29(12):1351–1369, 2008.

[ASFW18] Sankaraleengam Alagapan, Hae Won Shin, Flavio Frohlich, and Hau-tieng Wu. Diffusion geometry approach to efficiently remove electricalstimulation artifacts in intracranial electroencephalography. Journalof neural engineering, 2018.

[ASO19] Miquel Alfaras, Miguel Cornelles Soriano, and Silvia Ortın. A fastmachine learning model for ecg-based heartbeat classification andarrhythmia detection. Frontiers in Physics, 7:103, 2019.

[ATN+09] Saif Ahmad, Anjali Tejuja, Kimberley D Newman, Ryan Zarychanski,and Andrew JE Seely. Clinical review: a review and analysis of heartrate variability and the diagnosis and prognosis of infection. CriticalCare, 13(6):232, 2009.

[AVBB+09] Alberto P Avolio, Luc M Van Bortel, Pierre Boutouyrie, John RCockcroft, Carmel M McEniery, Athanase D Protogerou, Mary JRoman, Michel E Safar, Patrick Segers, and Harold Smulyan. Role ofpulse pressure amplification in arterial hypertension: experts opinionand review of the data. Hypertension, 54(2):375–383, 2009.

[BA97] MH Bonnet and DL Arand. Heart rate variability: sleep stage, timeof night, and arousal influences. Electroencephalography and clinicalneurophysiology, 102(5):390–396, 1997.

[Bat14] Jonathan Bates. The embedding dimension of laplacian eigenfunctionmaps. Applied and Computational Harmonic Analysis, 37(3):516–530,2014.

[BBe12] R. B. Berry, D. G. Budhiraja, and et al. Rules for scoring respiratoryevents in sleep: update of the 2007 AASM manual for the scoring ofsleep and associated events. J Clin Sleep Med, 8(5):597–619, 2012.

[BBG94] Pierre Berard, Gerard Besson, and Sylvain Gallot. Embedding rie-mannian manifolds by their heat kernel. Geometric & FunctionalAnalysis GAFA, 4(4):373–398, 1994.

[BCB12] Stefano Bordignon, Maria Chiara Corti, and Claudio Bilato. Atrialfibrillation associated with heart failure, stroke and mortality. Journalof atrial fibrillation, 5(1), 2012.

152

[BCM05] Antoni Buades, Bartomeu Coll, and J-M Morel. A non-local algo-rithm for image denoising. In 2005 IEEE Computer Society Con-ference on Computer Vision and Pattern Recognition (CVPR’05),volume 2, pages 60–65. IEEE, 2005.

[BD16] Peter J Brockwell and Richard A Davis. Introduction to time seriesand forecasting. springer, 2016.

[Ber06] Pierre H Berard. Spectral geometry: direct and inverse problems,volume 1207. Springer, 2006.

[BGGM80] A. Buzo, A. Gray, R. Gray, and J. Markel. Speech coding based uponvector quantization. IEEE Transactions on Acoustics, Speech, andSignal Processing, 28(5):562–574, 1980.

[BHM+06] A. Bollmann, D. Husser, L. Mainardi, F. Lombardi, P. Langley,A. Murray, J. J. Rieta, J. Millet, S. B. Olsson, M. Stridh, andL. Sornmo. Analysis of surface electrocardiograms in atrial fibrillation:techniques, research, and clinical applications. Europace: Europeanpacing, arrhythmias, and cardiac electrophysiology, 8(11):911–26,2006.

[Bil11] George E Billman. Heart rate variability–a historical perspective.Frontiers in physiology, 2:86, 2011.

[BKH+14] Paddy M Barrett, Ravi Komatireddy, Sharon Haaser, Sarah Topol,Judith Sheard, Jackie Encinas, Angela J Fought, and Eric J Topol.Comparison of 24-hour holter monitoring with 14-day novel adhesivepatch electrocardiographic monitoring. The American journal ofmedicine, 127(1):95–e11, 2014.

[BKR08] Peter Bickel, Bas Kleijn, and John Rice. Event-weighted tests fordetecting periodicity in photon arrival times. The AstrophysicalJournal, 685(1):384, 2008.

[BLS11] Andrea Bravi, Andre Longtin, and Andrew JE Seely. Review and clas-sification of variability analysis techniques with clinical applications.Biomedical engineering online, 10(1):90, 2011.

[BM13] Joan Bruna and Stephane Mallat. Invariant scattering convolutionnetworks. IEEE transactions on pattern analysis and machine intel-ligence, 35(8):1872–1886, 2013.

[BMAB05] Riccardo Barbieri, Eric C Matten, AbdulRasheed A Alabi, andEmery N Brown. A point-process model of human heartbeat intervals:new definitions of heart rate and heart rate variability. American

153

Journal of Physiology-Heart and Circulatory Physiology, 288(1):H424–H435, 2005.

[BMB+15] J. Bruna, S. Mallat, E. Bacry, J. Muzy, et al. Intermittent processanalysis with scattering moments. The Annals of Statistics, 43(1):323–351, 2015.

[BML+85] AA Borbely, P Mattmann, M Loepfe, I Strauch, and D Lehmann.Effect of benzodiazepine hypnotics on all-night sleep eeg spectra.Human neurobiology, 4(3):189–194, 1985.

[BN03] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimen-sionality reduction and data representation. Neural computation,15(6):1373–1396, 2003.

[BN07] Mikhail Belkin and Partha Niyogi. Convergence of laplacian eigen-maps. In Advances in Neural Information Processing Systems, pages129–136, 2007.

[BN08] Mikhail Belkin and Partha Niyogi. Towards a theoretical foundationfor laplacian-based manifold methods. Journal of Computer andSystem Sciences, 74(8):1289–1308, 2008.

[BOLC13] Joachim Behar, Julien Oster, Qiao Li, and Gari D Clifford. Ecg signalquality during arrhythmia and its application to false alarm reduction.IEEE transactions on biomedical engineering, 60(6):1660–1666, 2013.

[BP11] Julius S Bendat and Allan G Piersol. Random data: analysis andmeasurement procedures, volume 729. John Wiley & Sons, 2011.

[BPST98] Mirella Boselli, Liborio Parrino, Arianna Smerieri, and Mario Gio-vanni Terzano. Effect of age on eeg arousals in normal sleep. Sleep,21(4):361–367, 1998.

[BS17] Alexander Berrian and Naoki Saito. Adaptive synchrosqueezing basedon a quilted short-time fourier transform. In Wavelets and SparsityXVII, volume 10394, page 1039420. International Society for Opticsand Photonics, 2017.

[BWB+11] Martin C Baruch, Darren ER Warburton, Shannon SD Bredin, AnitaCote, David W Gerdt, and Charles M Adkins. Pulse decompositionanalysis of the digital arterial pulse during hemorrhage simulation.Nonlinear biomedical physics, 5(1):1, 2011.

[BZK+14] P. Bonizzi, S. Zeemering, J. M. H. Karel, L. Y. Di Marco, L. Uldry,J. Van Zaen, J. M. Vesin, and U. Schotten. Systematic comparison

154

of non-invasive measures for the assessment of atrial fibrillation com-plexity: A step forward towards standardization of atrial fibrillationelectrogram analysis. Europace, 17(2):318–325, 2014.

[CA+06] Harvey R Colten, Bruce M Altevogt, et al. Functional and economicimpact of sleep loss and sleep-related disorders. In Sleep disordersand sleep deprivation: An unmet public health problem. NationalAcademies Press (US), 2006.

[CAM+06] Gari D Clifford, Francisco Azuaje, Patrick McSharry, et al. Advancedmethods and tools for ECG data analysis. Artech house Boston, 2006.

[CBLR12] GD Clifford, J Behar, Q Li, and Iead Rezek. Signal quality indicesand data fusion for determining clinical acceptability of electrocar-diograms. Physiological measurement, 33(9):1419, 2012.

[CBWG12] E. J. Ciaccio, A. B. Biviano, W. Whang, and H. Garan. A newLMS algorithm for analysis of atrial fibrillation signals. Biomedicalengineering online, 11(1):15, 2012.

[CCW14] Yu-Chun Chen, Ming-Yen Cheng, and Hau-Tieng Wu. Non-parametric and adaptive modelling of dynamic periodicity and trendwith heteroscedastic and dependent errors. Journal of the RoyalStatistical Society: Series B (Statistical Methodology), 76(3):651–682,2014.

[CD14] Florian Chouchou and Martin Desseilles. Heart rate variability: atool to explore the sleeping brain? Frontiers in neuroscience, 8:402,2014.

[CdlHMFAL06] Pablo Casaseca-de-la Higuera, Marcos Martın-Fernandez, and CarlosAlberola-Lopez. Weaning from mechanical ventilation: a retrospec-tive analysis leading to a multimodal perspective. IEEE transactionson biomedical engineering, 53(7):1330–1345, 2006.

[CDT+10] Ferran Roche Campo, Xavier Drouot, Arnaud W Thille, FabriceGalia, Belen Cabello, Marie-Pia d’Ortho, and Laurent Brochard.Poor sleep quality is associated with late noninvasive ventilationfailure in patients with acute hypercapnic respiratory failure. Criticalcare medicine, 38(2):477–485, 2010.

[CG97] Fan RK Chung and Fan Chung Graham. Spectral graph theory.Number 92. American Mathematical Soc., 1997.

[CL06] Ronald R Coifman and Stephane Lafon. Diffusion maps. Appliedand computational harmonic analysis, 21(1):5–30, 2006.

155

[CL19] Wojciech Czaja and Weilin Li. Analysis of time-frequency scat-tering transforms. Applied and Computational Harmonic Analysis,47(1):149–171, 2019.

[CLW16a] C. K. Chui, Y.-T. Lin, and H.-T. Wu. Real-time dynamics acquisitionfrom irregular samples – with application to anesthesia evaluation.Analysis and Applications, 14(4):537–590, 2016.

[CLW16b] Charles K Chui, Yu-Ting Lin, and Hau-Tieng Wu. Real-time dynam-ics acquisition from irregular sampleswith application to anesthesiaevaluation. Analysis and Applications, 14(04):537–590, 2016.

[CMR+05] F. Castells, C. Mora, J. J. Rieta, D. Moratal-Perez, and J. Millet.Estimation of atrial fibrillatory wave from single-lead atrial fibrilla-tion electrocardiograms using principal component analysis concepts.Medical and Biological Engineering and Computing, 43(5):557–560,2005.

[CNF+97] Chen-Huan Chen, Erez Nevo, Barry Fetics, Peter H Pak, Frank CPYin, W Lowell Maughan, and David A Kass. Estimation of cen-tral aortic pressure waveform by mathematical transformation ofradial tonometry pressure: validation of generalized transfer function.Circulation, 95(7):1827–1836, 1997.

[Coh95] L. Cohen. Uncertainty principles of the short-time fourier transform.In Advanced Signal Processing Algorithms, volume 2563, pages 80–91.International Society for Optics and Photonics, 1995.

[Coh14] M. X. Cohen. Analyzing neural time series data: theory and practice.MIT Press, 2014.

[Con90] John B Conway. A course in functional analysis. Springer, 1990.

[CRMZ05] F. Castells, J. Rieta, J. Millet, and V. Zarzoso. Spatio-temporalblind source separation approach to atrial activity estimation in atrialtachyarrhythmias. IEEE Transactions on Biomedical Engineering,52:258–267, 2005.

[CRV+06] S. Cesar, J. J. Rieta, C. Vaya, D. M. Perez, R. Zangroniz, andJ. Millet. Wavelet Denoising as Preprocessing Stage to Improve ICAperformance in Atrial Fibrillation Analysis. In Justinian Rosca, DenizErdogmus, Jose C. Prıncipe, and Simon Haykin, editors, LNCS 3889,volume 3889 of Lecture Notes in Computer Science. Springer BerlinHeidelberg, 2006.

156

[CS12] K. N. Chaudhury and A. Singer. Non-local euclidean medians. IEEESignal Processing Letters, 19(11):745–748, 2012.

[CTL+95] Chen-Huan Chen, Chih-Tai Ting, Shing-Jong Lin, Tsui-Lieh Hsu,Frank CP Yin, Cynthia O Siu, Pesus Chou, Shih-Pu Wang, andMau-Song Chang. Different effects of fosinopril and atenolol on wavereflections in hypertensive patients. Hypertension, 25(5):1034–1041,1995.

[CTW91] T. Chou, Y. Tamura, and I. Wong. Detection of atrial fibrillation inECGs. In Computers in Cardiology, 1991.

[CW17] Antonio Cicone and Hau-Tieng Wu. How nonlinear-type time-frequency analysis can help in sensing instantaneous heart rate andinstantaneous respiratory rate from photoplethysmography in a reli-able way. Frontiers in physiology, 8:701, 2017.

[CZC15] Ying Chen, Xin Zhu, and Wenxi Chen. Automatic sleep staging basedon ecg signals using hidden markov models. In 2015 37th AnnualInternational Conference of the IEEE Engineering in Medicine andBiology Society (EMBC), pages 530–533. IEEE, 2015.

[D+97] Rainer Dahlhaus et al. Fitting time series models to nonstationaryprocesses. The annals of Statistics, 25(1):1–37, 1997.

[Dau92] I. Daubechies. Ten lectures on wavelets. SIAM, 1992.

[DCOR04] Philip De Chazal, Maria O’Dwyer, and Richard B Reilly. Auto-matic classification of heartbeats using ecg morphology and heart-beat interval features. IEEE transactions on biomedical engineering,51(7):1196–1206, 2004.

[DDV+15] Jonathan W Dukes, Thomas A Dewland, Eric Vittinghoff, Mala CMandyam, Susan R Heckbert, David S Siscovick, Phyllis K Stein,Bruce M Psaty, Nona Sotoodehnia, John S Gottdiener, et al. Ven-tricular ectopy as a predictor of heart failure and death. Journal ofthe American College of Cardiology, 66(2):101–109, 2015.

[DFL+13] F. I. Donoso, R. L. Figueroa, E. A Lecannelier, E. J. Pino, and A. J.Rojas. Atrial activity selection for atrial fibrillation ECG recordings.Computers in biology and medicine, 43(10):1628–36, 2013.

[DG05] David L Donoho and Carrie Grimes. Image manifolds which areisometric to euclidean space. Journal of mathematical imaging andvision, 23(1):5–24, 2005.

157

[DGJ18] David L Donoho, Matan Gavish, and Iain M Johnstone. Optimalshrinkage of eigenvalues in the spiked covariance model. Annals ofstatistics, 46(4):1742, 2018.

[DhAZ+09] Heidi Danker-hopfe, Peter Anderer, Josef Zeitlhofer, Marion Boeck,Hans Dorn, Georg Gruber, Esther Heller, Erna Loretz, Doris Moser,Silvia Parapatics, et al. Interrater reliability for sleep scoring accord-ing to the rechtschaffen & kales and the new aasm standard. Journalof sleep research, 18(1):74–84, 2009.

[Dhi01] Inderjit S Dhillon. Co-clustering documents and words using bipartitespectral graph partitioning. In Proceedings of the seventh ACMSIGKDD international conference on Knowledge discovery and datamining, pages 269–274, 2001.

[DKS07] L. G. Doroshenkov, V. A. Konyshev, and S. V. Selishchev. Classifi-cation of human sleep stages based on EEG processing using hiddenMarkov models. Biomedical Engineering, 41(1):25–28, 2007.

[DL14] Y. Dong and D. Li. Automatic speech recognition: A deep learningapproach. Springer, 2014.

[DLHS11] Alysha M De Livera, Rob J Hyndman, and Ralph D Snyder. Fore-casting time series with complex seasonal patterns using exponen-tial smoothing. Journal of the American statistical association,106(496):1513–1527, 2011.

[DLW11] Ingrid Daubechies, Jianfeng Lu, and Hau-Tieng Wu. Synchrosqueezedwavelet transforms: An empirical mode decomposition-like tool. Ap-plied and computational harmonic analysis, 30(2):243–261, 2011.

[Don92] David L Donoho. Superresolution via sparsity constraints. SIAMjournal on mathematical analysis, 23(5):1309–1331, 1992.

[DS04] I. Dotsinsky and T. Stoyanov. Optimization of bi-directional digitalfiltering for drift suppression in electrocardiogram signals. Journalof Medical Engineering & Technology, 28(4):178–180, 2004.

[DTZC13] Dominique Duncan, Ronen Talmon, Hitten P Zaveri, and Ronald RCoifman. Identifying preseizure state in intracranial eeg data usingdiffusion kernels. Math. Biosci. Eng, 10(3):579–590, 2013.

[DWW16] Ingrid Daubechies, Yi Wang, and Hau-tieng Wu. Conceft: Concen-tration of frequency and time via a multitapered synchrosqueezedtransform. Philosophical Transactions of the Royal Society A: Math-ematical, Physical and Engineering Sciences, 374(2065):20150193,2016.

158

[EC598] ANSI-AAMI EC57. Testing and reporting performance results ofcardiac rhythm and st segment measurement algorithms. Associationfor the Advancement of Medical Instrumentation, Arlington, VA,1998.

[ECA87] AAMI ECAR. Recommended practice for testing and reportingperformance results of ventricular arrhythmia detection algorithms.Association for the Advancement of Medical Instrumentation, page 69,1987.

[EHO99] Sigrid Elsenbruch, Michael J Harnish, and William C Orr. Heartrate variability during waking and sleep in healthy males and females.Sleep, 22(8):1067–1071, 1999.

[EK+10] Noureddine El Karoui et al. On information plus noise kernel randommatrices. The Annals of Statistics, 38(5):3191–3216, 2010.

[EKW+16] Noureddine El Karoui, Hau-Tieng Wu, et al. Graph connectionlaplacian methods can be made robust to noise. The Annals ofStatistics, 44(1):346–372, 2016.

[Elg13] Mohamed Elgendi. Fast qrs detection with an optimized knowledge-based method: Evaluation on 11 standard ecg databases. PloS one,8(9):e73557, 2013.

[ENB+13] Mohamed Elgendi, Ian Norton, Matt Brearley, Derek Abbott, andDale Schuurmans. Systolic peak detection in acceleration photo-plethysmograms measured from emergency responders in tropicalconditions. PLoS One, 8(10), 2013.

[Eng98] Milo Engoren. Approximate entropy of respiratory rate and tidalvolume during weaning from mechanical ventilation. Critical caremedicine, 26(11):1817–1823, 1998.

[EPD49] K.F. Eather, L.H. Peterson, and R.D. Dripps. Studies of the circula-tion of anesthetized patients by a new method for recording arterialpressure and pressure pulse contours. Anesthesiology, 10(2):125, 1949.

[ESAMN13] Farideh Ebrahimi, Seyed-Kamaledin Setarehdan, Jose Ayala-Moyeda,and Homer Nazeran. Automatic sleep staging using empirical modedecomposition, discrete wavelet transform, time-domain, and non-linear dynamics features of heart rate variability signals. Computermethods and programs in biomedicine, 112(1):47–57, 2013.

159

[EVWFE18] Diana Escalona-Vargas, Hau-Tieng Wu, Martin G Frasch, and HariEswaran. A comparison of five algorithms for fetal magnetocardiog-raphy signal extraction. Cardiovascular engineering and technology,pages 1–5, 2018.

[FCEH03] Mia Folke, L Cernerud, M Ekstrom, and Bertil Hok. Critical reviewof non-invasive respiratory monitoring in medical care. Medical andBiological Engineering and Computing, 41(4):377–383, 2003.

[FdMM17] Matthias Franz, Santiago Lopez de Medrano, and John Malik. Mu-tants of compactified representations revisited. Boletın de la SociedadMatematica Mexicana, 23(1):511–526, 2017.

[FDSR02] A. Flexerand, G. Dorffner, P. Sykacekand, and I. Rezek. An auto-matic, continuous and probabilistic sleep stager based on a hiddenMarkov model. Applied Artificial Intelligence, 16(3):199–207, 2002.

[FGD05] A. Flexer, G. Gruber, and G. Dorffner. A reliable probabilisticsleep stager based on a single EEG signal. Artificial Intelligence inMedicine, 33(3):199–207, 2005.

[FGHS02] Mia Folke, Fredrik Granstedt, Bertil Hok, and Hakan Scheer. Com-parative provocation test of respiratory monitoring methods. Journalof clinical monitoring and computing, 17(2):97–103, 2002.

[FHT+00] Jerome Friedman, Trevor Hastie, Robert Tibshirani, et al. Additivelogistic regression: a statistical view of boosting (with discussion anda rejoinder by the authors). The annals of statistics, 28(2):337–407,2000.

[Fla99] P Flandrin. Wavelet analysis and its applications. Time–Frequency/Time-Scale Analysis, pages 183–307, 1999.

[FLR+15] Pedro Fonseca, Xi Long, Mustafa Radha, Reinder Haakma, Ronald MAarts, and Jerome Rolink. Sleep stage classification with ecg andrespiratory effort. Physiological measurement, 36(10):2027, 2015.

[Fra08] A. M. Fraser. Hidden Markov models and dynamical systems. SIAM,2008.

[Fri01] Jerome H Friedman. Greedy function approximation: a gradientboosting machine. Annals of statistics, pages 1189–1232, 2001.

[FS95] Yoav Freund and Robert E Schapire. A desicion-theoretic generaliza-tion of on-line learning and an application to boosting. In Europeanconference on computational learning theory, pages 23–37. Springer,1995.

160

[Fun13] Yuan-cheng Fung. Biomechanics: circulation. Springer Science &Business Media, 2013.

[GAG+00] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff,Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George BMoody, Chung-Kang Peng, and H Eugene Stanley. Physiobank,physiotoolkit, and physionet: components of a new research resourcefor complex physiologic signals. circulation, 101(23):e215–e220, 2000.

[GDDGG14] Dragoljub Gajic, Zeljko Djurovic, Stefano Di Gennaro, and FredrikGustafsson. Classification of eeg signals for detection of epilep-tic seizures based on wavelets and statistical pattern recognition.Biomedical Engineering: Applications, Basis and Communications,26(02):1450021, 2014.

[GDG+15] Dragoljub Gajic, Zeljko Djurovic, Jovan Gligorijevic, Stefano Di Gen-naro, and Ivana Savic-Gajic. Detection of epileptiform activity in eegsignals based on time-frequency and non-linear analysis. Frontiers incomputational neuroscience, 9:38, 2015.

[GH00] A. Guyton and J. Hall. Textbook of Medical Physiology. Saunders,2000.

[GH07] Marc G Genton and Peter Hall. Statistical inference for evolvingperiodic functions. Journal of the Royal Statistical Society: Series B(Statistical Methodology), 69(4):643–657, 2007.

[GHA+02] D. Gorur, U. Halici, H. Aydin, G. Ongun, F. Ozgen, and K. Leblebi-cioglu. Sleep spindles detection using short time fourier transformand neural networks. In IJCNN’02, volume 2, pages 1631–1636.IEEE, 2002.

[GK+06] Evarist Gine, Vladimir Koltchinskii, et al. Empirical graph laplacianapproximation of laplace–beltrami operators: Large sample results.In High dimensional probability, pages 238–259. Institute of Mathe-matical Statistics, 2006.

[Gla09] Leon Glass. Introduction to controversial topics in nonlinear science:Is the normal heart rate chaotic?, 2009.

[Gol06] A. L. Goldberger. Clinical Electrocardiography: A Simplified Ap-proach. Mosby, 2006.

[GRAR16] M. Garcıa, J. Rodenas, R. Alcaraz, and J. J. Rieta. Application ofthe relative wavelet energy to heart rate independent detection ofatrial fibrillation. Computer Methods and Programs in Biomedicine,131(April):157–168, 2016.

161

[GRS05] Steinn Gudmundsson, Thomas Philip Runarsson, and Sven Sigurds-son. Automatic sleep staging using support vector machines withposterior probability estimates. In International Conference on Com-putational Intelligence for Modelling, Control and Automation andInternational Conference on Intelligent Agents, Web Technologies andInternet Commerce (CIMCA-IAWTIC’06), volume 2, pages 366–372.IEEE, 2005.

[GS64] Izrail Moiseevich Gelfand and Georgii Evgenevich Shilov. Generalizedfunctions, Vol. 4: applications of harmonic analysis. Academic Press,1964.

[GS02] et al. Goodacre S. Abc of clinical electrocardiography: Atrial ar-rhythmias. BMJ, 324(7344):594–597, 2002.

[GSK+04] Kenan Gumustekin, Bedri Seven, Nezihe Karabulut, Omer Aktas,Nesrin Gursan, Sahin Aslan, Mustafa Keles, Erhan Varoglu, andSenol Dane. Effects of sleep deprivation, nicotine, and seleniumon wound healing in rats. International journal of neuroscience,114(11):1433–1442, 2004.

[Hal78] Marc Hallin. Mixed autoregressive-moving average multivariate pro-cesses with time-dependent coefficients. Journal of MultivariateAnalysis, 8(4):567–572, 1978.

[Hal80] Marc Hallin. Invertibility and generalized invertibility of time seriesmodels. Journal of the Royal Statistical Society: Series B (Method-ological), 42(2):210–212, 1980.

[HAVL05] Matthias Hein, Jean-Yves Audibert, and Ulrike Von Luxburg. Fromgraphs to manifolds–weak and strong pointwise consistency of graphlaplacians. In International Conference on Computational LearningTheory, pages 470–485. Springer, 2005.

[HCH+06] Tse Lin Hsu, Pin Tsun Chao, Hsin Hsiu, Wei Kung Wang, Sai Ping Li,and Yuh Ying Lin Wang. Organ-specific ligation-induced changes inharmonic components of the pulse spectrum and regional vasoconstric-tor selectivity in wistar rats. Experimental physiology, 91(1):163–170,2006.

[HCKM04] J. Healey, G.D. Clifford, L. Kontothanassis, and P.E. McSharry. Anopen-source method for simulating atrial fibrillation using ECGSYN.Computers in Cardiology,, (September):19–22, 2004.

[HFSW17] Christophe L Herry, Martin Frasch, Andrew JE Seely, and Hau-tieng Wu. Heart beat classification from single-lead ecg using the

162

synchrosqueezing transform. Physiological measurement, 38(2):171,2017.

[HGN04] Efstathios Hadjidemetriou, Michael D Grossberg, and Shree K Nayar.Multiresolution histograms and their use for recognition. IEEETransactions on Pattern Analysis and Machine Intelligence, 26(7):831–847, 2004.

[HJB+18] Feras Hatib, Zhongping Jian, Sai Buddi, Christine Lee, Jos Settels,Karen Sibert, Joseph Rinehart, and Maxime Cannesson. Machine-learning algorithm to predict hypotension based on high-fidelityarterial pressure waveform analysis. Anesthesiology, 129(4):663–674,2018.

[HKF+05] B. L. Hoppe, A. M. Kahn, G. K. Feld, A. Hassankhani, and S. M.Narayan. Separating Atrial Flutter From Atrial Fibrillation WithApparent Electrocardiographic Organization Using Dominant andNarrow F-Wave Spectra. Journal of the American College of Cardi-ology, 46(11):2079–2087, 2005.

[HKH+12] K Hamunen, V Kontinen, Elina Hakala, P Talke, M Paloheimo, andE Kalso. Effect of pain on autonomic nervous system indices derivedfrom photoplethysmography in healthy volunteers. British journal ofanaesthesia, 108(5):838–844, 2012.

[HMC04] SA Hope, IT Meredith, and JD Cameron. Effect of non-invasivecalibration of radial waveforms on error in transfer-function-derivedcentral aortic waveform characteristics. Clinical science (London,England: 1979), 107(2):205, 2004.

[HOH13] Abdulnasir Hossen, H Ozer, and Ulrich Heute. Identification of sleepstages from heart rate variability using a soft-decision wavelet-basedtechnique. Digital Signal Processing, 23(1):218–229, 2013.

[Hor90] L. Hormander. The analysis of linear partial differential operators I.Springer, 1990.

[HPIe98] M. Holm, S. Pehrson, M. Ingemansson, and et al. Non-invasiveassessment of the atrial cycle length during atrial fibrillation inman: introducing, validating and illustrating a new ECG method.Cardiovasc Res, 38:69–81, 1998.

[HRC+17] David Hamon, Pradeep S Rajendran, Ray W Chui, Olujimi A Ajijola,Tadanobu Irie, Ramin Talebi, Siamak Salavatian, Marmar Vaseghi,Jason S Bradfield, J Andrew Armour, et al. Premature ventricularcontraction coupling interval variability destabilizes cardiac neuronal

163

and electrophysiological control: insights from simultaneous car-dioneural mapping. Circulation: Arrhythmia and Electrophysiology,10(4):e004937, 2017.

[HRPH10] A. Hyvarinen, P. Ramkumar, L. Parkkonen, and R. Hari. Independentcomponent analysis of short-time fourier transforms for spontaneousEEG/MEG analysis. NeuroImage, 49(1):257–271, 2010.

[HRR00] Peter Hall, James Reimann, and John Rice. Nonparametric estima-tion of a periodic function. Biometrika, 87(3):545–557, 2000.

[HS97] Sepp Hochreiter and Jurgen Schmidhuber. Long short-term memory.Neural computation, 9(8):1735–1780, 1997.

[HS16] Thomas Y Hou and Zuoqiang Shi. Extracting a shape functionfor a signal with intra-wave frequency modulation. PhilosophicalTransactions of the Royal Society A: Mathematical, Physical andEngineering Sciences, 374(2065):20150194, 2016.

[HSE+95] James Hafner, Harpreet S. Sawhney, William Equitz, Myron Flickner,and Wayne Niblack. Efficient color histogram indexing for quadraticform distance functions. IEEE transactions on pattern analysis andmachine intelligence, 17(7):729–736, 1995.

[HTPB04] Peter Halasz, Mario Terzano, Liborio Parrino, and Robert Bodizs.The nature of arousal in sleep. Journal of sleep research, 13(1):1–23,2004.

[HWB08] Chinmay Hegde, Michael Wakin, and Richard Baraniuk. Randomprojections for manifold learning. In Advances in neural informationprocessing systems, pages 641–648, 2008.

[IAIC+07] Conrad Iber, Sonia Ancoli-Israel, Andrew L Chesson, Stuart F Quan,et al. The AASM manual for the scoring of sleep and associatedevents: rules, terminology and technical specifications, volume 1.American Academy of Sleep Medicine Westchester, IL, 2007.

[IRV14] S. A. Imtiaz and Esther R.-V. Recommendations for performanceassessment of automatic sleep staging algorithms. In Engineering inMedicine and Biology Society (EMBC), 2014 36th Annual Interna-tional Conference of the IEEE, pages 5044–5047. IEEE, 2014.

[JAR15] M. Julian, R. Alcaraz, and J. J. Rieta. Application of Hurst exponentsto assess atrial reverse remodeling in paroxysmal atrial fibrillation.Physiological measurement, 36:2231–2246, 2015.

164

[Je14] C. T. January and et. al. 2014 aha/acc/hrs guideline for the manage-ment of patients with atrial fibrillation: Executive summary. Journalof the American College of Cardiology, 64(21):2246–2280, 2014.

[JHH+00] Ming-Yie Jan, Hsin Hsiu, Tse-Lin Hsu, Yuh-Ying Lin Wang, andWei-Kung Wang. The importance of pulsatile microcirculation inrelation to hypertension. IEEE Engineering in Medicine and BiologyMagazine, 19(3):106–111, 2000.

[JMS08] Peter W Jones, Mauro Maggioni, and Raanan Schul. Manifoldparametrizations by eigenfunctions of the laplacian and heat kernels.Proceedings of the National Academy of Sciences, 105(6):1803–1808,2008.

[Joh07] I. M. Johnstone. High dimensional statistical inference and randommatrices. In Proceedings of the International Congress of Mathemati-cians, 2007.

[Kam12] Chandrakar Kamath. A new approach to detect congestive heartfailure using teager energy nonlinear scatter plot of r–r interval series.Medical engineering & physics, 34(7):841–848, 2012.

[Ke16] P. Kirchhof and et. al. 2016 esc guidelines for the management ofatrial fibrillation developed in collaboration with eacts. EuropeanHeart Journal, 37(38):2893, 2016.

[Kee98] J. Keener. Mathematical Physiology. Springer, 1998.

[KJH+13] Won Young Kim, Jong Hun Jun, Jin Won Huh, Sang Bum Hong,Chae-Man Lim, and Younsuck Koh. Radial to femoral arterial bloodpressure differences in septic shock patients receiving high-dose nore-pinephrine therapy. Shock, 40(6):527–531, 2013.

[KLB+09] Jae-Eun Kang, Miranda M Lim, Randall J Bateman, James J Lee,Liam P Smyth, John R Cirrito, Nobuhiro Fujiki, Seiji Nishino, andDavid M Holtzman. Amyloid-β dynamics are regulated by orexinand the sleep-wake cycle. Science, 326(5955):1005–1007, 2009.

[KLR+09] Yongsam Kim, Sookkyung Lim, Subha V Raman, Orlando P Si-monetti, and Avner Friedman. Blood flow in a compliant vessel bythe immersed boundary method. Annals of biomedical engineering,37(5):927–942, 2009.

[KMF09] Walter Karlen, Claudio Mattiussi, and Dario Floreano. Sleep andwake classification with ecg and respiratory effort signals. IEEETransactions on Biomedical Circuits and Systems, 3(2):71–78, 2009.

165

[KSK+01] D.A. Kass, E.P. Shapiro, M. Kawaguchi, A.R. Capriotti, A. Scuteri,E.G. Lakatta, et al. Improved arterial compliance by a novel advancedglycation end-product crosslink breaker. Circulation, 104(13):1464–1470, 2001.

[KSP+13] Sung-Hoon Kim, Jun-Gol Song, Ji-Hyun Park, Jung-Won Kim, Yong-Seok Park, and Gyu-Sam Hwang. Beat-to-beat tracking of systolicblood pressure using noninvasive pulse transit time during anesthesiainduction in hypertensive patients. Anesthesia & Analgesia, 116(1):94–100, 2013.

[KTK+16] Takeshi Kanda, Natsuko Tsujino, Eriko Kuramoto, YoshimasaKoyama, Etsuo A Susaki, Sachiko Chikahisa, and Hiromasa Fu-nato. Sleep as a biological problem: an overview of frontiers in sleepresearch. The Journal of Physiological Sciences, 66(1):1–13, 2016.

[KTLW17] Ori Katz, Ronen Talmon, Yu-Lun Lo, and Hau-Tieng Wu. Diffusion-based nonlinear filtering for multimodal data fusion with applicationto sleep stage assessment. arXiv preprint arXiv:1701.03619, 2017.

[KTR+94] Avi Karni, David Tanne, Barton S Rubenstein, JJ Askenasy, andDov Sagi. Dependence on rem sleep of overnight improvement of aperceptual skill. Science, 265(5172):679–682, 1994.

[Ku97] David N Ku. Blood flow in arteries. Annual review of fluid mechanics,29(1):399–434, 1997.

[Laf04] Stephane S Lafon. Diffusion maps and geometric harmonics. PhDthesis, Yale University, 2004.

[LB05] E. Levina and P. J. Bickel. Maximum likelihood estimation of intrinsicdimension. In Advances in neural information processing systems,pages 777–784, 2005.

[LBBH98] Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner.Gradient-based learning applied to document recognition. Proceedingsof the IEEE, 86(11):2278–2324, 1998.

[LBH15] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning.nature, 521(7553):436–444, 2015.

[LBLP12] Damien Leger, Virginie Bayon, Jean Pierre Laaban, and PierrePhilip. Impact of sleep apnea on economics. Sleep medicine re-views, 16(5):455–462, 2012.

166

[LBM00] P. Langley, J. Bourke, and A. Murray. Frequency analysis of atrialfibrillation. In Proc. Int. Conf. IEEE Computers in Cardiology,volume 27, pages 65–68. IEEE, 2000.

[LBM10] Philip Langley, Emma J Bowers, and Alan Murray. Principal com-ponent analysis as a tool for analysing beat-to-beat changes in elec-trocardiogram features: Application to electrocardiogram derivedrespiration. IEEE Trans. Biomed. Eng, 7:7, 2010.

[LCL+15] Chin-Yu Lin, Shih-Lin Chang, Yenn-Jiang Lin, Li-Wei Lo, Fa-PoChung, Yun-Yu Chen, Tze-Fan Chao, Yu-Feng Hu, Ta-Chuan Tuan,Jo-Nan Liao, et al. Long-term outcome of multiform prematureventricular complexes in structurally normal heart. Internationaljournal of cardiology, 180:80–85, 2015.

[LFH+12] Xi Long, Pedro Fonseca, Reinder Haakma, Ronald M Aarts, andJerome Foussier. Time-frequency analysis of heart rate variabilityfor sleep and wake classification. In 2012 IEEE 12th InternationalConference on Bioinformatics & Bioengineering (BIBE), pages 85–90.IEEE, 2012.

[LFW17] Ruilin Li, Martin G Frasch, and Hau-Tieng Wu. Efficient fetal-maternal ecg signal separation from two channel maternal abdominalecg via diffusion-based channel selection. Frontiers in physiology,8:277, 2017.

[LGT14] James R Lee, Shayan Oveis Gharan, and Luca Trevisan. Multiwayspectral partitioning and higher-order cheeger inequalities. Journalof the ACM (JACM), 61(6):1–30, 2014.

[LI09] R. Llinares and J. Igual. Application of constrained independentcomponent analysis algorithms in electrocardiogram arrhythmias.Artificial Intelligence in Medicine, 47(2):121–133, 2009.

[LIMB10] R. Llinares, J. Igual, and J. Miro-Borras. A fixed point algorithm forextracting the atrial activity in the frequency domain. Computers inBiology and Medicine, 40(11-12):943–949, 2010.

[LJF+05] M. Lemay, V. Jacquemet, a. Forclaz, J. M. Vesin, and L. Kappen-berger. Spatiotemporal QRST cancellation method using separateQRS and T-waves templates. Computers in Cardiology, 32:611–614,2005.

[LLM+20] Gi-Ren Liu, Yu-Lun Lo, John Malik, Yuan-Chung Sheu, and Hau-Tieng Wu. Diffuse to fuse eeg spectra–intrinsic geometry of sleep

167

dynamics for classification. Biomedical Signal Processing and Control,55:101576, 2020.

[LM10] Mariano Llamedo and Juan Pablo Martınez. Heartbeat classificationusing feature selection driven by database generalization criteria.IEEE Transactions on Biomedical Engineering, 58(3):616–625, 2010.

[LM12] Mariano Llamedo and Juan Pablo Martınez. An automatic patient-adapted ecg heartbeat classifier allowing expert assistance. IEEETransactions on Biomedical Engineering, 59(8):2312–2320, 2012.

[LMW19] Yu-Ting Lin, John Malik, and Hau-Tieng Wu. Wave-shape oscillatorymodel for biomedical time series with applications, 2019.

[LNS16] Seungmin Lee, Seungyoon Nam, and Hyunsoon Shin. The analysisof sleep stages with motion and heart rate signals from a handheldwearable device. In 2016 International Conference on Informationand Communication Technology Convergence (ICTC), pages 1135–1137. IEEE, 2016.

[LO06] Haibin Ling and Kazunori Okada. Diffusion distance for histogramcomparison. In 2006 IEEE Computer Society Conference on Com-puter Vision and Pattern Recognition (CVPR’06), volume 1, pages246–253. IEEE, 2006.

[LRS+03] P. Langley, J. Rieta, M. Stridh, J. Millet, L. Sornmo, and A. Mur-ray. Reconstruction of atrial signals derived from the 12-lead ECGusing atrial signal extraction techniques. In Proc. Int. Conf. IEEEComputers in Cardiology, pages 129–132, 2003.

[LRS+06] P. Langley, J. J. Rieta, M. Stridh, J. Millet, L. Sornmo, and A. Murray.Comparison of atrial signal extraction algorithms in 12-lead ECGswith atrial fibrillation. IEEE Transactions on Biomedical Engineering,53(2):343–6, 2006.

[LS00] P. Laguna and L. Sornmo. Sampling rate and the estimation of ensem-ble variability for repetitive signals. Medical & biological engineering& computing, 38(5):540–6, 2000.

[LSC+07] Aaron Lewicke, Edward Sazonov, Michael J Corwin, Michael Neuman,and Stephanie Schuckers. Sleep versus wake classification from heartrate variability using computational intelligence: consideration ofrejection in classification models. IEEE Transactions on BiomedicalEngineering, 55(1):108–118, 2007.

168

[LSCS05] AT Lewicke, ES Sazonov, MJ Corwin, and SAC Schuckers. Reliabledetermination of sleep versus wake from heart rate variability usingneural networks. In Proceedings. 2005 IEEE International JointConference on Neural Networks, 2005., volume 4, pages 2394–2399.IEEE, 2005.

[LSW18] Chen-Yun Lin, Li Su, and Hau-Tieng Wu. Wave-shape functionanalysis. Journal of Fourier Analysis and Applications, 24(2):451–505, 2018.

[LT15] R. R. Lederman and R. Talmon. Learning the geometry of commonlatent variables using alternating-diffusion. Appl. Comp. Harmon.Anal., in press (2015), 2015.

[LTR17] Henry W Lin, Max Tegmark, and David Rolnick. Why does deepand cheap learning work so well? Journal of Statistical Physics,168(6):1223–1247, 2017.

[LW83] CT Lee and LY Wei. Spectrum analysis of human pulse. IEEEtransactions on bio-medical engineering, 30(6):348, 1983.

[LWM19] Yao Lu, Hau-Tieng Wu, and John Malik. Recycling cardiogenicartifacts in impedance pneumography. Biomedical Signal Processingand Control, 51:162–170, 2019.

[LYSA20] Ofir Lindenbaum, Arie Yeredor, Moshe Salhov, and Amir Averbuch.Multi-view diffusion maps. Information Fusion, 55:127–149, 2020.

[LZCS14] T. A. R. Lankveld, S. Zeemering, H. J. G. M. Crijns, and U. Schotten.The ECG as a tool to determine atrial fibrillation complexity. Heart,100(14):1077–84, 2014.

[LZS+16] T. Lankveld, S. Zeemering, D. Scherr, P. Kuklik, B. A. Hoffmann,S. Willems, B. Pieske, M. Haıssaguerre, P. Jaıs, H. J. Crijns, andU. Schotten. Atrial Fibrillation Complexity Parameters Derived fromSurface ECGs Predict Procedural Outcome and Long-Term Follow-Up of Stepwise Catheter Ablation for Atrial Fibrillation. Circulation:Arrhythmia and Electrophysiology, 9(2):e003354, 2016.

[Mal01] Marek Malik. Problems of heart rate correction in assessment ofdrug-induced qt interval prolongation. Journal of cardiovascularelectrophysiology, 12(4):411–420, 2001.

[Mal12] S. Mallat. Group invariant scattering. Communications on Pure andApplied Mathematics, 65(10):1331–1398, 2012.

169

[Mar98] J. B. Mark. Atlas of cardiovascular monitoring. Churchill Livingstone,1998.

[MAR10] A Martinez, Raul Alcaraz, and Jose Joaquın Rieta. Ectopic beatscanceler for improved atrial activity extraction from holter recordingsof atrial fibrillation. In 2010 Computing in Cardiology, pages 1015–1018. IEEE, 2010.

[MAR11] Arturo Martınez, Raul Alcaraz, and Jose J Rieta. Detection andremoval of ventricular ectopic beats in atrial fibrillation recordingsvia principal component analysis. In 2011 Annual InternationalConference of the IEEE Engineering in Medicine and Biology Society,pages 4693–4696. IEEE, 2011.

[MAR12] A Martınez, R Alcaraz, and JJ Rieta. Optimal cancellation templateanalysis for ectopic beats removal in atrial fibrillation recordings. In2012 Computing in Cardiology, pages 801–804. 2012.

[MAR13] Arturo Martınez, Raul Alcaraz, and Jose J Rieta. Ventricular activitymorphological characterization: Ectopic beats removal in long termatrial fibrillation recordings. Computer methods and programs inbiomedicine, 109(3):283–292, 2013.

[MC90] Marek Malik and A John Camm. Heart rate variability. Clinicalcardiology, 13(8):570–576, 1990.

[McD55] DA McDonald. The relation of pulsatile pressure to flow in arteries.The Journal of physiology, 127(3):533–552, 1955.

[MCTS03] P. E. McSharry, G. D. Clifford, L. Tarassenko, and L. A. Smith. Adynamical model for generating synthetic electrocardiogram signals.IEEE Transactions on Biomedical Engineering, 50(3):289–294, 2003.

[MF17] Pejman Memar and Farhad Faradji. A novel multi-class eeg-basedsleep stage classification system. IEEE Transactions on Neural Sys-tems and Rehabilitation Engineering, 26(1):84–95, 2017.

[MFMTH+14] Shayan Motamedi-Fakhr, Mohamed Moshrefi-Torbati, Martyn Hill,Catherine M Hill, and Paul R White. Signal processing techniquesapplied to human sleep eeg signalsa review. Biomedical Signal Pro-cessing and Control, 10:21–33, 2014.

[MHI+15] Ramakrishna Mukkamala, Jin-Oh Hahn, Omer T Inan, Lalit KMestha, Chang-Sei Kim, Hakan Toreyin, and Survi Kyal. Towardubiquitous blood pressure monitoring via pulse transit time: the-ory and practice. IEEE Transactions on Biomedical Engineering,62(8):1879–1901, 2015.

170

[MJC89] S Lawrence Marple Jr and William M Carey. Digital spectral analysiswith applications, 1989.

[MLW18] John Malik, Yu-Lun Lo, and Hau-tieng Wu. Sleep-wake classificationvia quantifying heart rate variability by convolutional neural network.Physiological measurement, 39(8):085004, 2018.

[MM89] G. Moody and R. Mark. Qrs morphology representation and noiseestimation using the karhunen-loeve transform. In Proc. Int. Conf.IEEE Computers in Cardiology, pages 269–272. IEEE, 1989.

[MM01] George B Moody and Roger G Mark. The impact of the mit-biharrhythmia database. IEEE Engineering in Medicine and BiologyMagazine, 20(3):45–50, 2001.

[MMC+10] Martin Oswaldo Mendez, Matteo Matteucci, Vincenza Castronovo,Luigi Ferini-Strambi, Sergio Cerutti, and Anna Bianchi. Sleep stagingfrom heart rate variability: time-varying spectral features and hiddenmarkov models. International Journal of Biomedical Engineeringand Technology, 3(3-4):246–263, 2010.

[MMZM85] George B Moody, Roger G Mark, Andrea Zoccola, and Sara Mantero.Derivation of respiratory signals from multi-lead ecgs. Computers incardiology, 12(1985):113–116, 1985.

[MNY06] Ha Quang Minh, Partha Niyogi, and Yuan Yao. Mercers theorem,feature maps, and smoothing. In International Conference on Com-putational Learning Theory, pages 154–168. Springer, 2006.

[MRWW17] John Malik, Neil Reed, Chun-Li Wang, and Hau-Tieng Wu. Single-lead f-wave extraction using diffusion geometry. Physiological mea-surement, 38(7):1310, 2017.

[MS92] Mark W Mahowald and Carlos H Schenck. Dissociated states ofwakefulness and sleep. Neurology, 1992.

[MSW20] John Malik, Elsayed Z. Soliman, and Hau-Tieng Wu. An adaptiveqrs detection algorithm for ultra-long-term ecg recordings. Journalof Electrocardiology, 2020.

[MSWW19] John Malik, Chao Shen, Hau-Tieng Wu, and Nan Wu. Connectingdots: from local covariance to empirical intrinsic geometry and locallylinear embedding. Pure and Applied Analysis, 1(4):515–542, 2019.

[MT01] CL Mason and L Tarassenko. Quantitative assessment of respiratoryderivation algorithms. In 2001 Conference Proceedings of the 23rd

171

Annual International Conference of the IEEE Engineering in Medicineand Biology Society, volume 2, pages 1998–2001. IEEE, 2001.

[Nat02] S. Nattel. New ideas about atrial fibrillation 50 years on. Nature,415(6868):219–26, 2002.

[NBE+93] Carlton Wayne Niblack, Ron Barber, Will Equitz, Myron D Flickner,Eduardo H Glasman, Dragutin Petkovic, Peter Yanker, ChristosFaloutsos, and Gabriel Taubin. Qbic project: querying images bycontent, using color, texture, and shape. In Storage and retrieval forimage and video databases, volume 1908, pages 173–187. InternationalSociety for Optics and Photonics, 1993.

[ND02] David J Nott and William TM Dunsmuir. Estimation of nonstation-ary spatial covariance structure. Biometrika, 89(4):819–829, 2002.

[NDKW01] David J Nott, William T M Dunsmuir, Robert Kohn, and FrankWoodcock. Statistical correction of a deterministic numerical weatherprediction model. Journal of the American Statistical Association,96(455):794–804, 2001.

[NFM+11] Lino Nobili, Michele Ferrara, Fabio Moroni, Luigi De Gennaro, Gior-gio Lo Russo, Claudio Campus, Francesco Cardinale, and FabrizioDe Carli. Dissociated wake-like and sleep-like electro-cortical activityduring sleep. Neuroimage, 58(2):612–619, 2011.

[NLCK06] Boaz Nadler, Stephane Lafon, Ronald R Coifman, and Ioannis GKevrekidis. Diffusion maps, spectral clustering and reaction coordi-nates of dynamical systems. Applied and Computational HarmonicAnalysis, 21(1):113–127, 2006.

[NPS+00] Robert G Norman, Ivan Pal, Chip Stewart, Joyce A Walsleben, andDavid M Rapoport. Interobserver agreement among sleep scorersfrom different centers in a large dataset. Sleep, 23(7):901–908, 2000.

[OA05] MF O’Rourke and A Adji. An updated clinical primer on large arterymechanics: implications of pulse waveform analysis and arterialtonometry. Current opinion in cardiology, 20(4):275, 2005.

[OBS+15] Julien Oster, Joachim Behar, Omid Sayadi, Shamim Nemati, Alis-tair EW Johnson, and Gari D Clifford. Semisupervised ecg ventricularbeat classification with novelty detection based on switching kalmanfilters. IEEE Transactions on Biomedical Engineering, 62(9):2125–2134, 2015.

172

[OG96] Michael F O’Rourke and David E Gallagher. Pulse wave analysis.Journal of hypertension. Supplement: official journal of the Interna-tional Society of Hypertension, 14(5):S147–57, 1996.

[ON04] Michael F O’Rourke and Wilmer W Nichols. Changes in wavereflection with advancing age in normal subjects. Hypertension(Dallas, Tex.: 1979), 44(6):e10–1, 2004.

[ONBC04] Hee-Seok Oh, Doug Nychka, Tim Brown, and Paul Charbonneau.Period analysis of variable stars by robust smoothing. Journal of theRoyal Statistical Society: Series C (Applied Statistics), 53(1):15–30,2004.

[O’R11] Michael F O’Rourke. McDonald’s Blood Flow in Arteries 6th Edition:Theoretical, Experimental and Clinical Principles. Hodder Arnold,2011.

[otESoC+96] Task Force of the European Society of Cardiology et al. Heart ratevariability, standards of measurement, physiological interpretation,and clinical use. circulation, 93:1043–1065, 1996.

[PAHJ11] Cheolwoo Park, Jeongyoun Ahn, Martin Hendry, and Woncheol Jang.Analysis of long period variable stars with nonparametric tests fortrend detection. Journal of the American Statistical Association,106(495):832–845, 2011.

[PDMB15] Laure Peter-Derex, Michel Magnin, and Helene Bastuji. Heterogene-ity of arousals in human sleep: a stereo-electroencephalographic study.Neuroimage, 123:229–244, 2015.

[Pea01] Karl Pearson. Liii. on lines and planes of closest fit to systems ofpoints in space. The London, Edinburgh, and Dublin PhilosophicalMagazine and Journal of Science, 2(11):559–572, 1901.

[Pey09] Gabriel Peyre. Manifold models for signals and images. ComputerVision and Image Understanding, 113(2):249–260, 2009.

[PG94] Steven M Pincus and Ary L Goldberger. Physiological time-seriesanalysis: what does regularity quantify? American Journal ofPhysiology-Heart and Circulatory Physiology, 266(4):H1643–H1656,1994.

[PH96] June J Pilcher and Allen I Huffcutt. Effects of sleep deprivation onperformance: a meta-analysis. Sleep, 19(4):318–326, 1996.

173

[PHSG95] C-K Peng, Shlomo Havlin, H Eugene Stanley, and Ary L Goldberger.Quantification of scaling exponents and crossover phenomena innonstationary heartbeat time series. Chaos: An InterdisciplinaryJournal of Nonlinear Science, 5(1):82–87, 1995.

[PKB+16] Thomas Penzel, Jan W Kantelhardt, Ronny P Bartsch, Maik Riedl,Jan F Kraemer, Niels Wessel, Carmen Garcia, Martin Glos, IngoFietze, and Christoph Schobel. Modulations of heart rate, ecg, andcardio-respiratory coupling observed in polysomnography. Frontiersin physiology, 7:460, 2016.

[PKZL12] S.-T. Pan, C.-E. Kuo, J.-H. Zeng, and S.-F. Liang. A transition-constrained discrete hidden Markov model for automatic sleep staging.Biomedical engineering online, 11(1):52, 2012.

[PMR00] Hamilton P.S., Curley M., and Aimi R. Effect of adaptive motion-artifact reduction on QRS detection. Biomed Instrum Technol,34(3):197–202, 2000.

[PMSL12] Andrius Petrenas, Vaidotas Marozas, Leif Sornmo, and Arunas Luko-sevicius. An echo state neural network for qrst cancellation dur-ing atrial fibrillation. IEEE transactions on biomedical engineering,59(10):2950–2957, 2012.

[PNB16] Ankit B Patel, Minh Tan Nguyen, and Richard Baraniuk. A proba-bilistic framework for deep learning. In Advances in neural informa-tion processing systems, pages 2558–2566, 2016.

[PNN+06] S. Petrutiu, J. Ng, H. Nijm, G.and Al-Angari, S. Swiryn, and A. Sa-hakian. Atrial fibrillation and waveform characterization. IEEEEngineering in Medicine and Biology Magazine, 25(6):24–30, 2006.

[Pol09] DSG Pollock. Investigating economic trends and cycles. In PalgraveHandbook of Econometrics, pages 243–307. Springer, 2009.

[Por16] Jacobus W Portegies. Embeddings of riemannian manifolds with heatkernels and eigenfunctions. Communications on Pure and AppliedMathematics, 69(3):478–518, 2016.

[PSH02] Edward F Pace-Schott and J Allan Hobson. The neurobiology ofsleep: genetics, cellular physiology and subcortical networks. NatureReviews Neuroscience, 3(8):591–605, 2002.

[PT85] J. Pan and W. J. Tompkins. A real-time QRS detection algorithm.IEEE Trans. Biomed. Eng., 32(3):230–6, 1985.

174

[QA17] Safa Sultan Qurraie and Rashid Ghorbani Afkhami. Ecg arrhythmiaclassification using time frequency distribution techniques. Biomedi-cal engineering letters, 7(4):325–332, 2017.

[QCCS18] Qiang Qiu, Xiuyuan Cheng, Robert Calderbank, and GuillermoSapiro. Dcfnet: Deep neural network with decomposed convolutionalfilters. arXiv preprint arXiv:1802.04145, 2018.

[Ram04] James O Ramsay. Functional data analysis. Encyclopedia of Statisti-cal Sciences, 4, 2004.

[RCS+04] J. J. Rieta, F. Castells, C. Sanchez, V. Zarzoso, and J. Millet. Atrialactivity extraction for atrial fibrillation analysis using blind source sep-aration. IEEE Transactions on Biomedical Engineering, 51(7):1176–86, 2004.

[Rec68] Allan Rechtschaffen. A manual for standardized terminology, tech-niques and scoring system for sleep stages in human subjects. Braininformation service, 1968.

[RFJ11] Alcaraz R., Hornero F., and Rieta J.J. Assessment of non-invasivetime and frequency atrial fibrillation organization markers with unipo-lar atrial electrograms. Physiol. Meas., 32:99–114, 2011.

[RGAR15] J. Rodenas, M. Garcıa, R. Alcaraz, and J. J. Rieta. Wavelet entropyautomatically detects episodes of atrial fibrillation from single-leadelectrocardiograms. Entropy, 17(9):6179–6199, 2015.

[RH07] J. J. Rieta and F. Hornero. Comparative study of methods for ventric-ular activity cancellation in atrial electrograms of atrial fibrillation.Physiological measurement, 28(8):925–36, 2007.

[RK04] Ryan Rifkin and Aldebaro Klautau. In defense of one-vs-all clas-sification. Journal of machine learning research, 5(Jan):101–141,2004.

[RMHA07] BR Ribeiro, AM Marques, JH Henriques, and MA Antunes. Prema-ture ventricular beat detection by using spectral clustering methods.In 2007 Computers in Cardiology, pages 149–152. IEEE, 2007.

[RPTB01] Yossi Rubner, Jan Puzicha, Carlo Tomasi, and Joachim M Buhmann.Empirical evaluation of dissimilarity measures for color and texture.Computer vision and image understanding, 84(1):25–43, 2001.

[RS80] Michael Reed and Barry Simon. Methods of modern mathematicalphysics. Volume I: functional analysis. Academic press, 1980.

175

[RS00] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionalityreduction by locally linear embedding. science, 290(5500):2323–2326,2000.

[RSW09] Ori Rosen, David S Stoffer, and Sally Wood. Local spectral analysisvia a bayesian mixture of smoothing splines. Journal of the AmericanStatistical Association, 104(485):249–262, 2009.

[RTG00] Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. The earthmover’s distance as a metric for image retrieval. International journalof computer vision, 40(2):99–121, 2000.

[SBDH97] Jaroslav Stark, DS Broomhead, ME Davies, and J Huke. Takensembedding theorems for forced and stochastic systems. NonlinearAnalysis: Theory, Methods & Applications, 30(8):5303–5314, 1997.

[SBM+85] J. Slocum, E. Byrom, L. McCarthy, A. Sahakian, and S. Swiryn.Computer detection of atrioventricular dissociation from the sur-face electrocardiogram during wide QRS tachycardias. Circulation,72:1028–1036, 1985.

[SC08] Amit Singer and Ronald R Coifman. Non-linear independent com-ponent analysis with diffusion maps. Applied and ComputationalHarmonic Analysis, 25(2):226–239, 2008.

[SCO92] EEG AROUSAL SCORING. Eeg arousals: scoring rules and exam-ples a preliminary report from the sleep disorders atlas task force ofthe american sleep disorders association. Sleep, 15(2), 1992.

[SD] Myriam Kerkhofs Stphanie Devuyst, Thierry Dutoit. Dreams subjectsdatabase. University of MONS - TCTS Laboratory and UniversitLibre de Bruxelles - CHU de Charleroi Sleep Laboratory.

[SDMA93] Virend K Somers, Mark E Dyken, Allyn L Mark, and Francois MAbboud. Sympathetic-nerve activity during sleep in normal subjects.New England Journal of Medicine, 328(5):303–307, 1993.

[SDWG17] Akara Supratak, Hao Dong, Chao Wu, and Yike Guo. Deepsleepnet:a model for automatic sleep stage scoring based on raw single-channeleeg. IEEE Transactions on Neural Systems and Rehabilitation Engi-neering, 25(11):1998–2008, 2017.

[SFB+98] Robert E Schapire, Yoav Freund, Peter Bartlett, Wee Sun Lee, et al.Boosting the margin: A new explanation for the effectiveness ofvoting methods. The annals of statistics, 26(5):1651–1686, 1998.

176

[She07] Kirk H Shelley. Photoplethysmography: beyond the calculation ofarterial oxygen saturation and heart rate. Anesthesia & Analgesia,105(6):S31–S36, 2007.

[SHMG64] Frederick Snyder, J Allan Hobson, Donald F Morrison, and FrederickGoldfrank. Changes in respiration, heart rate, and systolic bloodpressure in human sleep. Journal of applied physiology, 19(3):417–422,1964.

[SHS01] Mirja A Salo, Heikki V Huikuri, and Tapio Seppanen. Ectopic beatsin heart rate variability analysis: effects of editing on time andfrequency domain measures. Annals of noninvasive electrocardiology,6(1):5–17, 2001.

[Sin06] Amit Singer. From graph to manifold laplacian: The convergencerate. Applied and Computational Harmonic Analysis, 21(1):128–134,2006.

[SK85] A.V. Sahakian and K.H. Kuo. Canceling the cardiogenic artifact inimpedance pneumography. In Proc. Annu. Int. Conf. IEEE Engi-neering Medicine Biology Society, volume 7, pages 855–859. IEEE,1985.

[SKVG+05] ERJ Seitsonen, IKJ Korhonen, MJ Van Gils, M Huiku, JMP Lotjonen,KT Korttila, and AM Yli-Hankala. Eeg spectral entropy, heartrate, photoplethysmography and motor responses to skin incisionduring sevoflurane anaesthesia. Acta Anaesthesiologica Scandinavica,49(3):284–292, 2005.

[SMG14] C Slagt, I Malagon, and AB Groeneveld. Systematic review ofuncalibrated arterial pressure waveform analysis to determine cardiacoutput and stroke volume variation. British journal of anaesthesia,112(4):626, 2014.

[SMZ14] Fred Shaffer, Rollin McCraty, and Christopher L Zerr. A healthyheart is not a metronome: an integrative review of the heart’s anatomyand heart rate variability. Frontiers in psychology, 5:1040, 2014.

[SRB+91] EA Simoes, R Roark, S Berman, LL Esler, and J Murphy. Respiratoryrate: measurement of variability over time and accuracy at differentcounting periods. Archives of disease in childhood, 66(10):1199–1203,1991.

[SS01] M. Stridh and L. Sornmo. Spatiotemporal QRST cancellation tech-niques for analyses of atrial fibrillation. IEEE Transactions onBiomedical Engineering, 51:105–111, 2001.

177

[SSB+02] Bernhard Scholkopf, Alexander J Smola, Francis Bach, et al. Learningwith kernels: support vector machines, regularization, optimization,and beyond. MIT press, 2002.

[SSS92] J. Slocum, A. Sahakian, and S. Swiryn. Diagnosis of atrial fibrillationfrom surface electrocardiograms based on computer-detected atrialactivity. J Electrocardiol, 25:1–8, 1992.

[SSS98] Sergio Shkurovich, Alan V Sahakian, and Steven Swiryn. Detectionof atrial activity from high-voltage leads of implantable ventriculardefibrillators using a cancellation technique. IEEE Transactions onBiomedical Engineering, 45(2):229–234, 1998.

[STS16] Tal Shnitzer, Ronen Talmon, and Jean-Jacques Slotine. Manifoldlearning with contracting observers for data-driven time-series analy-sis. IEEE Transactions on Signal Processing, 65(4):904–918, 2016.

[STSG85] Benedict J Semmes, Martin J Tobin, James V Snyder, and AkeGrenvik. Subjective and objective measurement of tidal volume incritically iii patients. Chest, 87(5):577–579, 1985.

[SW12] Amit Singer and H-T Wu. Vector diffusion maps and the connec-tion laplacian. Communications on pure and applied mathematics,65(8):1067–1144, 2012.

[SW13] Amit Singer and H-T Wu. Two-dimensional tomography from noisyprojections taken at unknown random directions. SIAM journal onimaging sciences, 6(1):136–175, 2013.

[SW17a] Amit Singer and Hau-Tieng Wu. Spectral convergence of the connec-tion laplacian from random samples. Information and Inference: AJournal of the IMA, 6(1):58–123, 2017.

[SW17b] Li Su and Hau-Tieng Wu. Extract fetal ecg from single-lead abdomi-nal ecg by de-shape short time fourier transform and nonlocal median.Frontiers in Applied Mathematics and Statistics, 3:2, 2017.

[SWaHC02] L. Senhadji, F. Wang, a.I. Hernandez, and G. Carrault. Waveletsextrema representation for QRS-T cancellation and P wave detection.Computers in Cardiology, pages 0–3, 2002.

[SWW07] Oleg G Smolyanov, Heinrich v Weizsacker, and Olaf Wittich. Cher-noff’s theorem and discrete time approximations of brownian motionon manifolds. Potential Analysis, 26(1):1–29, 2007.

[SYC91] Tim Sauer, James A Yorke, and Martin Casdagli. Embedology.Journal of statistical Physics, 65(3-4):579–616, 1991.

178

[SYCH04] D. C. Shah, T. Yamane, K. J. Choi, and M. Haissaguerre. QRSsubtraction and the ECG analysis of atrial ectopics. Annals ofNoninvasive Electrocardiology, 9(4):389–398, 2004.

[TAH01] Stephen R Thompson, Uwe Ackermann, and Richard L Horner. Sleepas a teaching tool for integrating respiratory physiology and motorcontrol. Advances in physiology education, 25(2):29–44, 2001.

[Tak81] Floris Takens. Detecting strange attractors in turbulence. In Dynam-ical systems and turbulence, Warwick 1980, pages 366–381. Springer,1981.

[Tay66a] MG Taylor. The input impedance of an assembly of randomlybranching elastic tubes. Biophysical journal, 6(1):29–51, 1966.

[Tay66b] Michael G Taylor. Use of random excitation and spectral analysis inthe study of frequency-dependent parameters of the cardiovascularsystem. Circulation research, 18(5):585–595, 1966.

[TBFW13] G. Thakur, E. Brevdo, N. S. Fuckar, and H.-T. Wu. The syn-chrosqueezing algorithm for time-varying spectral analysis: Robust-ness properties and new paleoclimate applications. Signal Processing,93(5):1079–1094, 2013.

[TC13] Ronen Talmon and Ronald R Coifman. Empirical intrinsic geometryfor nonlinear modeling and time series filtering. Proceedings of theNational Academy of Sciences, 110(31):12535–12540, 2013.

[TdL00] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A Global Geo-metric Framework for Nonlinear Dimensionality Reduction. Science,290(5500):2319–2323, 2000.

[Teo08] L.-C. Teofilo. Sleep medicine: Essentials and review. Oxford Univer-sity Press, 2008.

[TGHS18] N. G. Trillos, M. Gerlach, M. Hein, and D. Slepcev. Error estimatesfor spectral convergence of the graph Laplacian on random geomet-ric graphs towards the Laplace–Beltrami operator. arXiv preprintarXiv:1801.10108, 2018.

[TGP+96] L Toscani, PF Gangemi, A Parigi, R Silipo, P Ragghianti, E Sirabella,M Morelli, L Bagnoli, R Vergassola, and G Zaccara. Human heartrate variability and sleep stages. The Italian Journal of NeurologicalSciences, 17(6):437–439, 1996.

179

[TMG16] Orestis Tsinalis, Paul M Matthews, and Yike Guo. Automaticsleep stage scoring using time-frequency analysis and stacked sparseautoencoders. Annals of biomedical engineering, 44(5):1587–1597,2016.

[TMPG05] Robert Joseph Thomas, Joseph E Mietus, Chung-Kang Peng, andAry L Goldberger. An electrocardiogram-based technique to assesscardiopulmonary coupling during sleep. Sleep, 28(9):1151–1161, 2005.

[TMZC15] Ronen Talmon, Stephane Mallat, Hitten Zaveri, and Ronald R Coif-man. Manifold learning for latent variable inference in dynamicalsystems. IEEE Transactions on Signal Processing, 63(15):3843–3856,2015.

[TO19] Omer Turk and Mehmet Sirac Ozerdem. Epilepsy detection by usingscalogram based convolutional neural network from eeg signals. Brainsciences, 9(5):115, 2019.

[Tob88] Martin J Tobin. Respiratory monitoring in the intensive care unit.Am Rev Respir Dis, 138:1625–1642, 1988.

[TS18] Nicolas Garcıa Trillos and Dejan Slepcev. A variational approach tothe consistency of spectral clustering. Applied and ComputationalHarmonic Analysis, 45(2):239–281, 2018.

[TSC+16] JL Teboul, B Saugel, M Cecconi, D De Backer, CK Hofer, X Monnet,A Perel, MR Pinsky, DA Reuter, A Rhodes, et al. Less invasivehemodynamic monitoring in critically ill patients. Intensive caremedicine, 42(9):1350, 2016.

[TW11] G. Thakur and H.-T. Wu. Synchrosqueezing-based recovery of in-stantaneous frequency from nonuniform samples. SIAM Journal onMathematical Analysis, 43(5):2078–2095, 2011.

[TW19] Ronen Talmon and Hau-Tieng Wu. Latent common manifold learningwith alternating diffusion: analysis and applications. Applied andComputational Harmonic Analysis, 47(3):848–892, 2019.

[TZ91] N.V. Thakor and Y.S. Zhu. Applications of adaptive filtering toECG analysis: noise cancellation and arrhythmia detection. IEEETransactions on Biomedical Engineering, 38(8):785–794, 1991.

[TZ15] Naftali Tishby and Noga Zaslavsky. Deep learning and the infor-mation bottleneck principle. In 2015 IEEE Information TheoryWorkshop (ITW), pages 1–5. IEEE, 2015.

180

[UBBP18] Muhammed Kursad Ucar, Mehmet Recep Bozkurt, Cahit Bilgin, andKemal Polat. Automatic sleep staging in obstructive sleep apneapatients using photoplethysmography, heart rate variability signal andmachine learning techniques. Neural Computing and Applications,29(8):1–16, 2018.

[VCLP00] Eve Van Cauter, Rachel Leproult, and Laurence Plat. Age-relatedchanges in slow wave sleep and rem sleep and relationship with growthhormone and cortisol levels in healthy men. Jama, 284(7):861–868,2000.

[VCP96] Eve Van Cauter and Laurence Plat. Physiology of growth hormonesecretion during sleep. The Journal of pediatrics, 128(5):S32–S37,1996.

[VDMPVdH09] Laurens Van Der Maaten, Eric Postma, and Jaap Van den Herik.Dimensionality reduction: a comparative. J Mach Learn Res, 10(66-71):13, 2009.

[vdSWSvL18] Bjorn JP van der Ster, Berend E Westerhof, Wim J Stok, andJohannes J van Lieshout. Detecting central hypovolemia in simulatedhypovolemic shock by automated feature extraction with principalcomponent analysis. Physiological reports, 6(22):e13895, 2018.

[vDvBH+17] A van Drumpt, J van Bommel, S Hoeks, F Grune, T Wolvetang,J Bekkers, and M ter Horst. The value of arterial pressure waveformcardiac output measurements in the radial and femoral artery inmajor cardiac surgery patients. BMC Anesthesiology, 17, 2017.

[VDWHH11] Alexander TM Van De Water, Alison Holmes, and Deirdre A Hurley.Objective measurements of sleep for non-laboratory settings as alter-natives to polysomnography–a systematic review. Journal of sleepresearch, 20(1pt2):183–200, 2011.

[VHM+01] C. Vasquez, A. Hernandez, F. Mora, G. Carrault, and G. Passariello.Atrial activity enhancement by Wiener filtering using an artificialneural network. IEEE Transactions on Biomedical Engineering,48(8):940–4, 2001.

[vIJH07] A. van Oosterom, Z. Ihara, V. Jacquemet, and R. Hoekema. Vector-cardiographic lead systems for the characterization of atrial fibrilla-tion. Journal of electrocardiology, 40(4):343.e1–11, 2007.

[VLBB08] Ulrike Von Luxburg, Mikhail Belkin, and Olivier Bousquet. Consis-tency of spectral clustering. The Annals of Statistics, pages 555–586,2008.

181

[VLBB16] Jose Vicente, Pablo Laguna, Ariadna Bartra, and Raquel Bailon.Drowsiness detection using heart rate variability. Medical & biologicalengineering & computing, 54(6):927–937, 2016.

[VLM04] Michael V Vitiello, Lawrence H Larsen, and Karen E Moe. Age-related sleep change: gender and estrogen effects on the subjective–objective sleep quality relationships of healthy, noncomplaining oldermen and women. Journal of psychosomatic research, 56(5):503–510,2004.

[VMH17] Albert Vilamala, Kristoffer H Madsen, and Lars K Hansen. Deepconvolutional neural networks for interpretable analysis of eeg sleepstage scoring. In 2017 IEEE 27th International Workshop on MachineLearning for Signal Processing (MLSP), pages 1–6. IEEE, 2017.

[VPH+09] Luiz Carlos Marques Vanderlei, Carlos Marcelo Pastre,Rosangela Akemi Hoshi, Tatiana Dias de Carvalho, and MoacirFernandes de Godoy. Basic notions of heart rate variability and itsclinical applicability. Brazilian Journal of Cardiovascular Surgery,24(2):205–217, 2009.

[VQMR95] Bradley V Vaughn, SR Quint, JA Messenheimer, and KR Robertson.Heart period variability in sleep. Electroencephalography and clinicalNeurophysiology, 94(3):155–162, 1995.

[VSKKVdV90] B Van Sweden, B Kemp, HAC Kamphuisen, and EA Van der Velde.Alternative electrode placement in (automatic) sleep scoring (f pz-cz/p z-oz versus c4-at). Sleep, 13(3):279–283, 1990.

[Wan15] Xu Wang. Spectral convergence rate of graph laplacian. arXivpreprint arXiv:1510.08110, 2015.

[WAO+04] Thomas Weber, Johann Auer, Michael F ORourke, Erich Kvas,Elisabeth Lassnig, Robert Berent, and Bernd Eber. Arterial stiffness,wave reflections, and the risk of coronary artery disease. Circulation,109(2):184–189, 2004.

[WB17] Thomas Wiatowski and Helmut Bolcskei. A mathematical theoryof deep convolutional neural networks for feature extraction. IEEETransactions on Information Theory, 64(3):1845–1866, 2017.

[WC85] Ling Y Wei and Peter Chow. Frequency distribution of human pulsespectra. IEEE Transactions on Biomedical Engineering, 32(3):245–246, 1985.

182

[WCC+97] Yuh-Ying Lin Wang, CC Chang, JC Chen, H Hsiu, and WK Wang.Pressure wave propagation in arteries. a model with radial dilatationfor simulating the behavior of a real artery. IEEE Engineering inMedicine and Biology Magazine, 16(1):51–54, 1997.

[WCLY14] Hau-Tieng Wu, Yi-Hsin Chan, Yu-Ting Lin, and Yung-Hsin Yeh.Using synchrosqueezing transform to discover breathing dynamicsfrom ecg signals. Applied and Computational Harmonic Analysis,36(2):354–359, 2014.

[WCW+91] Yuh Ying Wang, SL Chang, YE Wu, TL Hsu, and WK Wang.Resonance. the missing phenomenon in hemodynamics. Circulationresearch, 69(1):246–249, 1991.

[WCW98] Ian B Wilkinson, John R Cockcroft, and David J Webb. Pulse waveanalysis and arterial stiffness. Journal of cardiovascular pharmacology,32:S33–7, 1998.

[WH17] Philip Warrick and Masun Nabhan Homsi. Cardiac arrhythmiadetection from ecg combining convolutional and long short-termmemory networks. In 2017 Computing in Cardiology (CinC), pages1–4. IEEE, 2017.

[WJS+04] Yuh-Ying Lin Wang, Ming-Yie Jan, Ching-Show Shyu, Chi-AngChiang, and Wei-Kung Wang. The natural frequencies of the arterialsystem and their relation to the heart rate. IEEE Transactions onBiomedical Engineering, 51(1):193–195, 2004.

[WLD+15] H.-T. Wu, G. F. Lewis, M. I. Davila, I. Daubechies, and S. W. Porges.Optimizing heart rate variability evaluation from pulse wave signalswith the synchrosqueezing transform. Methods of Information inMedicine, in press, 2015.

[WLH+00] Yuh Ying Lin Wang, WC Lia, Hsin Hsiu, Ming-Yie Jan, and Wei KungWang. Effect of length on the fundamental resonance frequencyof arterial models having radial dilatation. IEEE transactions onbiomedical engineering, 47(3):313–318, 2000.

[WLW09] Nico Westerhof, Jan-Willem Lankhaar, and Berend E Westerhof. Thearterial windkessel. Medical & biological engineering & computing,47(2):131–141, 2009.

[Wom55] John R Womersley. Method for the calculation of velocity, rate offlow and viscous drag in arteries when the pressure gradient is known.The Journal of physiology, 127(3):553–563, 1955.

183

[WRSV08] Eric P Widmaier, Hershel Raff, Kevin T Strang, and Arthur J Vander.Vander’s Human physiology: the mechanisms of body function. Boston:McGraw-Hill Higher Education,, 2008.

[WS01] Christopher KI Williams and Matthias Seeger. Using the nystrommethod to speed up kernel machines. In Advances in neural informa-tion processing systems, pages 682–688, 2001.

[WTL14] Hau-tieng Wu, Ronen Talmon, and Yu-Lun Lo. Assess sleep stageby modern signal processing techniques. IEEE Transactions onBiomedical Engineering, 62(4):1159–1168, 2014.

[Wu11] Hau-Tieng Wu. Adaptive analysis of complex data sets. PhD thesis,Princeton University, 2011.

[Wu13] Hau-Tieng Wu. Instantaneous frequency and wave shape functions(i). Applied and Computational Harmonic Analysis, 35(2):181–199,2013.

[WVD+12] Devy Widjaja, Carolina Varon, Alexander Dorado, Johan AKSuykens, and Sabine Van Huffel. Application of kernel principalcomponent analysis for single-lead-ecg-derived respiration. IEEETransactions on Biomedical Engineering, 59(4):1169–1176, 2012.

[WW+18] Hau-Tieng Wu, Nan Wu, et al. Think globally, fit locally under themanifold setup: Asymptotic analysis of locally linear embedding. TheAnnals of Statistics, 46(6B):3805–3837, 2018.

[WWH+19] Shen-Chih Wang, Hau-Tieng Wu, Po-Hsun Huang, Cheng-Hsi Chang,Chien-Kun Ting, and Yu-Ting Lin. Novel imaging revealing in-ner dynamics for cardiovascular waveform analysis via unsupervisedmanifold learning. arXiv preprint arXiv:1909.04206, 2019.

[XKX+13] Lulu Xie, Hongyi Kang, Qiwu Xu, Michael J Chen, Yonghong Liao,Meenakshisundaram Thiyagarajan, John ODonnell, Daniel J Chris-tensen, Charles Nicholson, Jeffrey J Iliff, et al. Sleep drives metaboliteclearance from the adult brain. science, 342(6156):373–377, 2013.

[XNC+18] Zhaohan Xiong, Martyn P Nash, Elizabeth Cheng, Vadim V Fedorov,Martin K Stiles, and Jichao Zhao. Ecg signal classification for thedetection of cardiac arrhythmias using a convolutional recurrentneural network. Physiological measurement, 39(9):094006, 2018.

[XSG+01] Xueyan Xu, Stephanie Schuckers, CHIME Study Group, et al. Auto-matic detection of artifacts in heart period data. Journal of electro-cardiology, 34(4):205–210, 2001.

184

[XSS03] Q. Xi, A. V. Sahakian, and S. Swiryn. The Effect of QRS Cancellationon Atrial Fibrillatory Wave Signal Characteristics in the SurfaceElectrocardiogram. J. Electrocardiol., 36(3):243–249, 2003.

[XYD18] Jieren Xu, Haizhao Yang, and Ingrid Daubechies. Recursivediffeomorphism-based regression for shape functions. SIAM Journalon Mathematical Analysis, 50(1):5–32, 2018.

[XYS+13] Meng Xiao, Hong Yan, Jinzhong Song, Yuzhou Yang, and XianglinYang. Sleep stages classification based on heart rate variability andrandom forest. Biomedical Signal Processing and Control, 8(6):624–633, 2013.

[YAA+10] Bulent Yılmaz, Musa H Asyalı, Eren Arıkan, Sinan Yetkin, and FuatOzgen. Sleep stage and obstructive apneaic epoch classification usingsingle-lead ecg. Biomedical engineering online, 9(1):39, 2010.

[Yan15] Haizhao Yang. Synchrosqueezed wave packet transforms and diffeo-morphism based spectral analysis for 1d general mode decompositions.Applied and Computational Harmonic Analysis, 39(1):33–66, 2015.

[Yan17] Haizhao Yang. Multiresolution mode decomposition for adaptivetime series analysis. arXiv preprint arXiv:1709.06880, 2017.

[YKC12] Can Ye, BVK Vijaya Kumar, and Miguel Tavares Coimbra. Heartbeatclassification using morphological and dynamic features of ecg signals.IEEE Transactions on Biomedical Engineering, 59(10):2930–2941,2012.

[YL89] FC Yin and ZR Liu. Estimating arterial resistance and compli-ance during transient conditions in humans. American Journalof Physiology-Heart and Circulatory Physiology, 257(1):H190–H197,1989.

[YNS+15] J. Yang, M. N. Nguyen, P. P. San, X. Li, and S. Krishnaswamy. Deepconvolutional neural networks on multichannel time series for humanactivity recognition. In IJCAI, pages 3995–4001, 2015.

[YYJG16] Yanqing Ye, Kewei Yang, Jiang Jiang, and Bingfeng Ge. Automaticsleep and wake classifier with heart rate and pulse oximetry: Deriveddynamic time warping features and logistic model. In 2016 AnnualIEEE Systems Conference (SysCon), pages 1–6. IEEE, 2016.

[ZBH+16] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals. Un-derstanding deep learning requires rethinking generalization. arXivpreprint arXiv:1611.03530, 2016.

185

[ZG96] W. Zeng and L. Glass. Statistical properties of heartbeat intervalsduring atrial fibrillation. Physical Review E, 54(2):1779–1784, 1996.

[ZLB+14] S. Zeemering, T. A. R. Lankveld, P. Bonizzi, H. J. G. M. Crijns,and U. Schotten. Principal Component Analysis of Body SurfacePotential Mapping in Atrial Fibrillation Patients Suggests AdditionalECG Lead Locations. Computing in Cardiology, 41:893–896, 2014.

[ZVS84] Danguole ZEmaityte, Giedrius Varoneckas, and Eugene Sokolov.Heart rhythm control during sleep. Psychophysiology, 21(3):279–289,1984.

[ZW17a] J. Zhang and Y. Wu. Automatic sleep stage classification of single-channel eeg by using complex-valued convolutional neural network.Biomedical Engineering/Biomedizinische Technik, 2017.

[ZW17b] J. Zhang and Y. Wu. A new method for automatic sleep stage clas-sification. IEEE Transactions on Biomedical Circuits and Systems,2017.

[ZZ04] Z. Zhang and H. Zha. Principal manifolds and nonlinear dimension-ality reduction via tangent space alignment. SIAM J. Sci. Comput.,26:313 – 338, 2004.

186

Biography

John Malik is a Canadian PhD candidate in the Department of Mathematics at Duke

University with a completion date of May, 2020. He completed his undergraduate

and master’s degrees in mathematics at the University of Western Ontario. He

develops and uses tools from applied harmonic analysis, differential geometry, and

nonlinear time-frequency analysis to extract clinically relevant information from

biomedical time series. He is currently funded by the Myra and William Waldo

Boone Fellowship. Journal publications by the author include [MSW20] [MRWW17]

[MLW18] [MSWW19] [LLM+20] [LWM19] [FdMM17].

187

a geometric approach to biomedical time series analysis

Documents