Multi-Modal Heart-Beat Estimation On an iPhone
by
Narges Norouzi
A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto

Copyright © 2014 by Narges Norouzi
Abstract
Multi-Modal Heart-Beat Estimation On an iPhone
Narges Norouzi
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2014
Current-generation smartphone video cameras and microphones enable photoplethysmography (PPG) and phonocardiography (PCG) acquisition. In this thesis, the iPhone microphone and camera are used to measure heart rate. We developed a heart rate measurement system with three sensing mechanisms (finger color changes, face color changes, and heart sound measurement), all on the iPhone. The three measurement systems each provide an independent heart rate estimate, and a combined estimate is produced by fusing the individual sensors.

The proposed algorithm estimates the heart rate by (1) analyzing the heart pulse to compute the user's heart rate using our version of the EMD algorithm, which is widely used in advanced biomedical signal processing, (2) assessing the quality of the PPG and PCG waveforms using a Support Vector Machine (SVM) classifier, and (3) combining heart rate information from the three different modalities based on the assessed quality of the waveforms.
I dedicate my MASc thesis to my dear parents, Malek Norouzi and
Fakhrolsadat Nabavi, for the advice, guidance, and opportunities they have
provided me throughout my personal and professional life.
Acknowledgements
I would like to give special thanks to Prof. Parham Aarabi, who has supported me
throughout my Master's degree: keeping me going when times were tough, asking
insightful questions, and offering invaluable advice.
I would also like to thank my colleagues Mike and Gary for helping with data
collection and with the development and implementation of the iPhone application.
Contents

1 Introduction

2 Background
   2.1 Cardiovascular System
   2.2 Prior Work on Heart Rate Monitoring
       2.2.1 Photoplethysmography
             2.2.1.1 Photoplethysmography by Smartphone's Video Camera
       2.2.2 Phonocardiography
       2.2.3 Heart Rate Monitoring on Smartphones
       2.2.4 Signal Processing Algorithms
             2.2.4.1 Empirical Mode Decomposition
                     2.2.4.1.1 Applications of the EMD Algorithm
             2.2.4.2 Ensemble Empirical Mode Decomposition
             2.2.4.3 The Fourier Transform and STFT
             2.2.4.4 Wavelet Transform (WT)

3 Application Architecture
   3.1 Fingertip Processing Unit
   3.2 Face Processing Unit
   3.3 Audio Processing Unit
   3.4 Heartbeat Estimation Algorithm
   3.5 Signal Filtering Algorithms
       3.5.1 FIR Filtering
       3.5.2 EMD Algorithm
             3.5.2.1 Decomposition
             3.5.2.2 Reconstruction
       3.5.3 Wavelet Transform
   3.6 Peak Detection Algorithm
   3.7 Learning System for Heart Rate Estimation Based on Support Vector Machines (SVMs)
       3.7.1 SVM Classifier
       3.7.2 SVM Classifier Implementation
   3.8 Multi-Channel Heart Rate Estimation

4 Experimental Results
   4.1 Heartbeat Detection Accuracy without the Use of the SVM Classifier
   4.2 SVM Classifier Sensitivity
   4.3 Heartbeat Detection Accuracy Using the SVM Classifier

5 Conclusion

Bibliography

Appendices

A Acronyms
List of Tables

2.1 Heart rate for different ages
4.1 Root mean square error between heart rate measured by the pulse oximeter and heart rate estimated using FIR, DWT, and EMD filtering for all 70 subjects
4.2 Average percentage error between heart rate measured by the pulse oximeter and heart rate estimated using FIR, DWT, and EMD filtering for all 70 subjects
4.3 Sensitivity and specificity of the SVM classifier for 84 de-noised 5-second waveform segments of the PPG signal recorded from the fingertip
4.4 Sensitivity and specificity of the SVM classifier for 84 de-noised 5-second waveform segments of the PPG signal recorded from the face
4.5 Sensitivity and specificity of the SVM classifier for 84 de-noised 5-second waveform segments of the PCG signal
4.6 Root mean square error between the heart rate measured by the pulse oximeter and heart rate estimated using the SVM classifier and FIR, DWT, and EMD filtering for all 70 subjects
4.7 Average percentage error between heart rate measured by the pulse oximeter and heart rate estimated using the SVM classifier and FIR, DWT, and EMD filtering for all 70 subjects
List of Figures

2.1 A typical ECG signal
2.2 General scheme to record video for PPG acquisition
2.3 Phonocardiogram, copied from [19]
2.4 Auscultation areas on the chest
2.5 Sifting process and envelopes
2.6 Decomposition of a sample ECG signal into its first 12 IMFs
2.7 Discrete Wavelet Transform decomposition
3.1 Block diagram of the application architecture
3.2 Video recording from the fingertip using the back camera
3.3 Region of interest in each frame
3.4 Example of fingertip data obtained by the described capture method
3.5 Video recording from the face using the front-facing camera
3.6 Example of a PPG signal from the face obtained by the described capture method
3.7 Example of a PCG recorded by the microphone, with identifiable heart sounds
3.8 Elements of the proposed algorithm
3.9 FIR filtering module
3.10 Sample data captured from the device, already filtered and down-sampled as specified
3.11 Sample PPG recording with baseline wander, the baseline, and the clean PPG after removing the baseline
3.12 Original PPG recorded from the fingertip and the clean PPG after applying FIR filtering
3.13 Original PPG recorded from the face and the clean PPG after applying FIR filtering
3.14 Original PCG recorded from the user's chest and the clean PCG after applying FIR filtering
3.15 Original PPG recorded from the fingertip and the decomposed IMFs using the EMD algorithm
3.16 Decomposition of the PPG signal into IMFs using the EMD algorithm
3.17 Power spectral density of the decomposed IMFs
3.18 Original PPG recorded from the fingertip and the clean signal after applying the EMD algorithm and reconstructing the signal based on the power spectral density of the IMFs
3.19 Block diagram of the wavelet decomposition and reconstruction
3.20 An illustration of the peak detection algorithm
3.21 Histogram of peak-to-peak distances from the fingertip recording
3.22 Histogram of peak-to-peak distances from the face recording
3.23 Histogram of peak-to-peak distances from the audio recording
3.24 Histogram of peak-to-peak distances from the three modalities
4.1 Comparison of the accuracy of the proposed heartbeat rate estimation system with and without the SVM classifier, in terms of RMSE
4.2 Comparison of the accuracy of the proposed heartbeat rate estimation system with and without the SVM classifier, in terms of average percentage error
Chapter 1
Introduction
Advancements in sensor technology allow for new models of automated healthcare
monitoring. Currently, specialized devices such as electrocardiographs, pulse oximeters,
and phonocardiographs are used to measure heart rate. Furthermore, wireless heart
rate monitors are widely available and provide users with real-time estimates of their
HR at rest, as well as during and after physical activity [1]. However, wireless HR monitors
often require wearing a strap around the chest or arm. As heart rate monitors become
widely distributed as low-cost physiological measurement solutions, the alternative
idea of using smartphones as heart rate monitors has emerged. With heart rate
monitoring applications on smartphones, people no longer need to carry dedicated heart
rate monitors, which is much more convenient.
In recent years, automated health monitoring with mobile smartphones has become
a subject of great interest [2]. In particular, there has been significant interest in
accurately estimating heartbeat frequency using the smartphone's built-in camera,
accelerometer, gyroscope, and microphone. This is a relatively easy way to measure a
user's heart rate, since it requires neither special skills nor the purchase of special
devices; all that is needed is a smartphone with on-board sensors.
Monitoring heart rate using a smartphone is important as a non-invasive, remote
health care monitoring option. Monitoring heart rate is important both during exercise
and at rest. A low resting heart rate is a good indicator of aerobic fitness and is
associated with a reduced risk of heart attack. Likewise, measuring heart rate before,
during, and after exercise improves the quality of a workout and helps ensure the safety
of the fitness program.
Currently, there are many applications in the App Store that measure the user's
heart rate using either auscultation or pulse oximetry. Auscultation is used in "Heart
Monitor for iPhone" [21] and "Heart Record" [22]. Pulse oximetry is performed through
the finger in applications such as "Instant Heart Rate" [24] and "Heart Beat Rate" [25],
and through the face in "Cardiio" [28] and "Touchless Pulse Monitor" [29].
In this thesis, an iPhone application is developed and tested in order to demonstrate
the potential of the iPhone for measuring heartbeat rate in real time. This application
makes use of the iPhone's front-facing and back cameras and its microphone for PPG
and PCG acquisition in order to provide an estimate of the user's heart rate. The goal of
this project is to make the application fully functional, providing people with precise and
useful wellness information based on their pulse, using more advanced signal processing
techniques such as EMD and machine learning algorithms.

The heartbeat estimation system proposed in this thesis estimates the
user's HR by (1) computing the HR of the input PPG and PCG signals using our version
of the Empirical Mode Decomposition (EMD) algorithm, (2) assessing the quality of the
input signals in order to distinguish between good and bad waveform segments using an
SVM classifier, and (3) combining heart rate information from the three different
modalities based on the assessed quality of the signals.
Finally, a quantitative comparison between EMD-based filtering and other filtering
algorithms, such as the Discrete Wavelet Transform (DWT) and FIR filtering, is conducted.
Chapter 2
Background
2.1 Cardiovascular System
The circulatory system is responsible for delivering nutrients, water, and oxygen to
the body's cells and for carrying away waste products, such as the carbon dioxide that
cells produce. The heart serves as a pump that delivers blood to the body's tissues. It
does so by undergoing a repeated cycle of contraction and relaxation called the cardiac
cycle. The heart comprises atria and ventricles that pump blood in each cardiac cycle.

The heart rate, or pulse rate, is typically expressed as the number of beats per minute
(bpm). The pulse rate varies with the body's physical and psychological state and with
factors such as age, physical exercise, anxiety, stress level, and drugs. Although a high
pulse rate can indicate abnormal heart activity and can help identify various problems
within the body, it cannot be used on its own to diagnose an abnormality. Table 2.1
shows the heartbeat range for different ages.
Table 2.1: Heart rate for different ages

  Age            Heart rate (bpm)
  Newborn        100-160
  0-5 months     90-150
  6-12 months    80-140
  1-3 years      80-130
  3-5 years      80-120
  6-10 years     70-110
  11-14 years    60-105
  14+ years      60-100
2.2 Prior Work on Heart Rate Monitoring

Heart rate is the rate at which the heart beats, typically measured at the wrist or
neck and expressed in beats per minute. The pulse can be felt directly at the wrist or
neck by pressing with the index and middle fingers. A more precise method of determining
heart rate involves the use of an electrocardiograph (ECG) or a photoplethysmograph
(PPG). The ECG monitors the electrical changes occurring during the cardiac cycle from
the surface of the body. A normal ECG recording associated with a single cardiac cycle
contains three waveforms (Figure 2.1).
The P wave shows the sequential activation of the right and left atria. The QRS
complex (which consists of the Q, R, and S waves) represents the simultaneous activation
of the right and left ventricles. The last waveform, the T wave, is triggered by the
repolarization of the ventricles.
Figure 2.1: A typical ECG signal
In the rest of this section, the two other non-invasive methods for measuring heart
rate, photoplethysmography and phonocardiography, are described in detail.
2.2.1 Photoplethysmography
Non-invasive measurement of temporal variations in blood volume by pulse oximetry
is acknowledged to be one of the most important technological advances in monitoring
a patient's heart rate in clinical settings [3]. The photoplethysmograph, first
introduced by Hertzman [4], is composed of a light source and a photo-detector. In
photoplethysmography, a sensor is placed on a thin part of the patient's body, such as
the ear lobes, fingertips, or toes, where a high degree of superficial vasculature exists.
The photoplethysmogram (PPG) waveform is formed by measuring the amount of
light passing through the skin and represents the changes in the shape of the pulse. This
phenomenon is caused by the absorption of light by the capillaries, which fill with blood
during each heartbeat cycle, so that less light can pass through them. The PPG obtained
from pulse oximetry has been shown to be effective for estimating other important
physiological features, such as blood oxygen saturation and breathing rate [5].
2.2.1.1 Photoplethysmography by Smartphone's Video Camera
Two popular portable devices for measuring the heart rate are pulse oximeters (which
attach to one of the fingers) and heart rate monitors (which use a belt to electronically
detect the heart rate and relay that information to a specially designed watch). For
these types of "standalone" products to reach the public, significant cost is involved,
and the user must purchase a device designed for a single purpose, which is inconvenient.

A more convenient alternative is to create a smartphone application that uses the
hardware of the phone to capture the heart rate. Most recent smartphones are equipped
with high-resolution cameras and LEDs, a configuration very similar to the construction
of a pulse oximeter. Users place their finger on the smartphone's camera, covering both
the camera and the LED. A schematic of video recording for PPG acquisition on a
smartphone is shown in Figure 2.2.
[6] proposed using the smartphone's camera for PPG acquisition. The waveform
acquisition was done on a Nokia E63, and it was reported that the green channel signal
is more informative than the red channel signal. However, [7] showed that the
distribution of pixel intensities in the green channel is not uniform across different
smartphones (such as the HTC HD2, iPhone 4, Nokia, and Samsung devices), while the
red channel characteristics are similar across smartphones.
Figure 2.2: General scheme to record video for PPG acquisition

The PPG signal acquisition on smartphones utilizes the same image acquisition
concept that is used in pulse oximeters. In order to distinguish oxygenated from
deoxygenated blood based on blood opacity, the average of the red channel intensities
in each frame is calculated, and the plot of average red channel intensity over time
serves as the PPG.
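As a concrete illustration, the per-frame red-channel averaging described above can be sketched as follows (a minimal sketch; the frame-array layout and RGB channel ordering are assumptions for illustration, not part of the original system):

```python
import numpy as np

def ppg_from_frames(frames):
    """Raw PPG trace from video frames.

    `frames` is assumed to have shape (n_frames, height, width, 3) with RGB
    channel order; the mean red-channel intensity of each frame becomes one
    PPG sample.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Channel 0 is red under the assumed RGB ordering.
    return frames[:, :, :, 0].mean(axis=(1, 2))

# Synthetic 30-frame clip whose red channel pulses over time.
t = np.arange(30)
red = 128 + 20 * np.sin(2 * np.pi * t / 15)
clip = np.zeros((30, 4, 4, 3))
clip[:, :, :, 0] = red[:, None, None]
trace = ppg_from_frames(clip)
```

The peaks of such a trace correspond to cardiac pulses, so the heart rate follows from the peak-to-peak intervals and the camera's frame rate.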
To estimate the heart rate reliably from a PPG signal recorded by a smartphone,
the effect of finger pressure on the camera lens, finger movement during recording,
and the illumination level of the environment must be taken into account. Several
methods that consider these factors have been proposed in the PPG processing literature.
[8] introduced the idea of removing motion artifacts from the PPG signal for the
accurate measurement of arterial oxygen saturation during movement, using a combination
of Independent Component Analysis (ICA) and block interleaving with low-pass filtering
to reduce the motion artifact. Enriquez et al. [9] studied the plethysmographic signal
using Principal Component Analysis (PCA) and claimed that clinically relevant
parameters can be obtained from the PPG when PCA is used. Furthermore, [10] presented
a real-time de-noising algorithm for PPG and ECG signals for measuring pulse rate and
blood pressure using the Discrete Wavelet Transform (DWT). Additionally, [11] reduced
the influence of force variation on heart rate estimation by means of the Continuous
Wavelet Transform (CWT); in their study, the experiment was conducted under three
different force conditions: low, medium, and high.
In another category of PPG processing algorithms, the idea of decomposing the signal
into Intrinsic Mode Functions (IMFs) using Empirical Mode Decomposition (EMD) is
introduced [12].
Finally, several data-driven decision-support systems have been developed in order to
produce meaningful results from physiological data, mainly PPG signals. [13] and [14]
used a Support Vector Machine (SVM) and a Neural Network (NN), respectively, to
assess the PPG signal and extract heart rate information.
2.2.2 Phonocardiography
Heart sounds are an essential tool in the clinical setting and provide clinicians with
valuable diagnostic information on heart disease. However, the phonocardiogram (PCG)
is a complex signal to analyze visually, and heart auscultation can take several years to
learn; it also involves a high degree of subjectivity. Nevertheless, its low cost keeps
phonocardiography among the most widely used clinical techniques.

Phonocardiography breaks the heartbeat into four distinct sections. The first sound
("S1") occurs at the onset of systole and is produced by the closure of the
atrioventricular valves, audibly heard as the "lub" in the popular "lub-dub" description
of the heart [15]. The second sound ("S2") occurs at the onset of diastole and is
produced by the closure of the semilunar valves; this is the "dub". These are considered
the normal heart sounds [16]. S1 and S2 can be clearly heard when listening to a
patient's heart with a stethoscope.
The next two possible sounds (S3 and S4) are generally abnormal in adults and
produce a distinct "galloping" heartbeat [17]. Finally, there is a class of sounds called
"murmurs" that can occur during any of the four phases and are caused by various
abnormalities of the heart valves. Detection and analysis of these murmurs is often
critical in the diagnosis of heart problems [18]. Figure 2.3 illustrates normal and
abnormal PCG signals and their corresponding ECG signal.
Figure 2.3: Phonocardiogram copied from [19]
There are four main areas on the patient's chest (Figure 2.4) that are optimal sites
for auscultation. At these sites the intensity of the heart sound is highest, because the
sound is transmitted through solid tissue or through a minimal thickness of inflated
lung.
Figure 2.4: Auscultation areas on the chest
2.2.3 Heart Rate Monitoring on Smartphones

For many years, people measured heart rate by listening to the heart's sounds through
the patient's chest. At the start of the 20th century, Einthoven developed the
electrocardiograph (ECG). With an ECG, it is possible to record the electrical changes
during each heartbeat cycle and make a graphic recording of this activity.
In the 1980s, the first wireless Heart Rate Monitor (HRM) consisting of a transmitter
and a receiver was developed. The transmitter could be attached to the chest using
either disposable electrodes or an elastic electrode belt. The receiver was a watch-like
monitor worn on the wrist [20]. The development of this relatively small wireless monitor
resulted in increased utilization of HRMs by sportsmen. As a consequence, the objective
measure of HR replaced the more subjective perceived exertion as an indicator of exercise
intensity. Another relatively recent development in HR monitoring is the measurement
of Heart Rate Variability (HRV) that may have various applications. These features and
their reliability and validity will be discussed in the following sections.
The latest category of devices for monitoring heart rate is the smartphone. Currently,
the App Store contains applications that measure the user's heart rate using either
auscultation or pulse oximetry. Auscultation is used in "Heart Monitor for iPhone" [21]
and "Heart Record" [22]. Pulse oximetry is performed through the finger in "Heart Rate
- Free" [23], "Instant Heart Rate" [24], "Heart Beat Rate" [25], "Runtastic" [26], and
"HeartTracker" [27].
Pulse oximetry can also be done through the face, as in "Cardiio" [28], "Touchless
Pulse Monitor" [29], and "What's My Heart Rate" [30]. Users hold the iPhone roughly
six inches (15 cm) in front of them and line up their face inside a guiding box. Cardiio
uses only the front-facing camera for video recording and claims an accuracy within
3 beats per minute of a clinical pulse oximeter [31]; it can estimate heart rate through
either the face or the fingertip. In the "What's My Heart Rate" application, users can
switch between the front-facing and back cameras in order to measure the heart rate of
others, and the premium version also measures breathing rate.
Another category of mobile applications uses an external heart rate monitor, such as
a chest strap, to measure the heartbeat rate; the data is then transferred to the
smartphone and recorded in the user's history. This type of application is mostly used
for tracking workouts and monitoring the heart rate before, during, and after each
workout. The "Digifit iCardio Multi-Sport Heart Rate Monitor Training" application is
an example that uses a heart rate monitor strap to monitor the heart pulse during
workouts [32]. Another example is "Fitbeat Heart Rate Monitor", which works with a
5.3 kHz un-coded heart rate belt or with Bluetooth smart devices.

In addition to the above heart rate monitors, there is currently a paid app developed
by Azumio, "Stress Check Pro", that uses a pulse oximetry technique to estimate the
user's stress level [33].
One note is that most of these applications do not try to infer any information from
the heartbeat other than the heart rate, and they all use only one method to obtain
that information. It should also be pointed out that most developers of these mobile
applications state that their apps are intended for "informational and entertainment
purposes only" and should not be used in place of professional medical equipment.
Different studies have explored the potential of the smartphone to estimate heart
rate. [34] used video recorded from the user's face with the front-facing camera of an
iPhone 4 as an indication of the PPG signal. They then detected the facial region in
each frame and extracted the cardiac pulse signal using frequency analysis of the raw
trace signal and of the signal analyzed with ICA.

Laure et al. analyzed the PPG recorded from the user's fingertip and introduced two
different peak detection algorithms for HR estimation [35], [36]. They applied the two
proposed peak detection algorithms to a set of 50 test measurements. In 20% of the
calculations using the peak detection algorithm in [35], the estimated values differed
from the real heart rate by more than 5%. Applying the peak detection algorithm
proposed in [36] to the same data yielded 8% incorrect calculations.
2.2.4 Signal Processing Algorithms
Several signal processing algorithms can be used to remove the noise added to the
original PPG and PCG signals recorded by a smartphone. The noise in the PPG signal
normally arises from varying illumination levels during video recording, motion artifacts
introduced by face or finger movement, varying finger pressure on the camera, and
objects covering part of the face. The noise present in the recorded PCG arises from
background noise in the environment and from movement of the phone on the chest
during recording.

This section provides an overview of the different signal processing algorithms used
for processing PPG and PCG signals in the literature.
2.2.4.1 Empirical Mode Decomposition
Empirical Mode Decomposition (EMD) [37] is a method of breaking down a signal
without leaving the time domain. EMD is a recursive method introduced to analyze
non-linear and non-stationary signals, such as biomedical signals [39]. The algorithm
decomposes the original signal into a collection of Intrinsic Mode Functions (IMFs)
using a numerical sifting process.

An IMF must fulfill two conditions: (i) the number of extrema and the number of
zero crossings must be equal or differ at most by one; and (ii) the mean of the upper
and lower envelopes is zero everywhere.
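These two conditions can be checked numerically. The sketch below verifies condition (i) by counting sign changes, and approximates condition (ii) by the signal mean rather than by a full envelope computation, which is a deliberate simplification for illustration:

```python
import numpy as np

def satisfies_imf_conditions(h, tol=1e-12):
    """Check the two IMF conditions on a sampled signal `h`:
    (i) the numbers of extrema and zero crossings differ by at most one;
    (ii) zero-mean envelopes, approximated here by the signal mean
    (a simplification; a full check would construct both envelopes).
    """
    h = np.asarray(h, dtype=np.float64)
    d = np.diff(h)
    # Extrema: sign changes in the first difference.
    extrema = int(np.sum(np.sign(d[:-1]) * np.sign(d[1:]) < 0))
    # Zero crossings: sign changes in the signal itself.
    s = np.sign(h)
    zero_crossings = int(np.sum(s[:-1] * s[1:] < 0))
    return abs(extrema - zero_crossings) <= 1 and abs(h.mean()) < tol
```

A pure sine sampled over whole periods passes both checks, while the same sine with a DC offset fails the zero-mean check.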
The sifting process can be separated into the following steps:

• For a signal x(t), let m_1 be the mean of its upper and lower envelopes, as
determined from a cubic-spline interpolation of the local maxima and minima. The
locality is determined by a parameter on which the calculation time and the
effectiveness of the EMD depend greatly.
• The first component h_1 is computed:

h_1 = x(t) - m_1. (2.1)
• In the second sifting iteration, h_1 is treated as the data, and m_{11} is the mean
of h_1's upper and lower envelopes:

h_{11} = h_1 - m_{11}. (2.2)
• This sifting procedure is repeated k times, until h_{1k} is an IMF, that is:

h_{1(k-1)} - m_{1k} = h_{1k}. (2.3)
• Then c_1 = h_{1k} is designated the first IMF component of the data, which
contains the shortest-period component of the signal. We separate it from the rest
of the data:

x(t) - c_1 = r_1. (2.4)
• The procedure is repeated on the residues r_j: r_1 - c_2 = r_2, ..., r_{n-1} - c_n = r_n.
• Thus the original signal x(t) can be expressed as

x(t) = \sum_{j=1}^{n} c_j(t) + r_n(t), (2.5)

where c_j(t) is the j-th IMF and r_n(t) is the residue.
The sifting process, with the data and its upper, lower, and mean envelopes, is shown
in Figure 2.5.
Figure 2.5: Sifting process and envelopes (data, upper envelope, lower envelope, and mean)
The stoppage criterion determines the number of sifting steps to produce an IMF.
Two different stoppage criteria have been used traditionally:
1. The first criterion was proposed by Huang et al. [38] and is defined as a sum of the
differences, SD:

SD_k = \frac{\sum_{t=0}^{T} |h_{k-1}(t) - h_k(t)|^2}{\sum_{t=0}^{T} h_{k-1}^2(t)}. (2.6)

The sifting process stops when SD is smaller than a pre-given value.
2. The second criterion is based on the S-number, defined as the number of consecutive
sifting iterations for which the numbers of zero-crossings and extrema are equal or
differ at most by one. Specifically, an S-number is pre-selected, and the sifting
process stops only if, for S consecutive iterations, the numbers of zero-crossings
and extrema stay the same and are equal or differ at most by one.
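Putting the sifting steps (Eqs. 2.1-2.5) and the SD stoppage criterion (Eq. 2.6) together, a minimal EMD sketch in Python/SciPy might look as follows; the endpoint handling of the envelopes, the fixed iteration cap, and the extrema-based stopping rule for the residue are simplifying assumptions, not part of the formal algorithm:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def mean_envelope(x):
    """Mean of cubic-spline upper and lower envelopes; the signal endpoints
    are appended to both extrema sets as a crude boundary treatment."""
    t = np.arange(len(x))
    maxima = np.concatenate(([0], argrelextrema(x, np.greater)[0], [len(x) - 1]))
    minima = np.concatenate(([0], argrelextrema(x, np.less)[0], [len(x) - 1]))
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    return 0.5 * (upper + lower)

def sift_imf(x, sd_tol=0.3, max_iter=50):
    """Extract one IMF by repeated sifting, stopping when the SD
    criterion of Eq. 2.6 drops below sd_tol."""
    h = x.astype(np.float64)
    for _ in range(max_iter):
        h_new = h - mean_envelope(h)                    # Eqs. 2.1-2.3
        sd = np.sum((h - h_new) ** 2) / np.sum(h ** 2)  # Eq. 2.6
        h = h_new
        if sd < sd_tol:
            break
    return h

def emd(x, n_imfs=4):
    """Decompose x into IMFs and a residue (Eqs. 2.4-2.5)."""
    imfs, r = [], np.asarray(x, dtype=np.float64)
    for _ in range(n_imfs):
        # Stop when the residue has too few extrema to form envelopes.
        if argrelextrema(r, np.greater)[0].size == 0 or \
           argrelextrema(r, np.less)[0].size == 0:
            break
        c = sift_imf(r)
        imfs.append(c)
        r = r - c
    return imfs, r

# Two-tone demo signal: a fast and a slow oscillation.
t = np.linspace(0.0, 2.0, 400)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 0.5 * t)
imfs, residue = emd(x, n_imfs=2)
```

By construction, the extracted IMFs and the residue sum back exactly to the original signal, which is the completeness property noted below.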
The EMD decomposes non-stationary signals into narrow-band components of
decreasing frequency. The decomposition is complete, almost orthogonal, local, and
adaptive. All IMFs together form a complete and nearly orthogonal basis for the original
signal. Because the basis comes directly from the signal, it preserves the inherent
characteristics of the signal and avoids the diffusion and leakage of signal energy. The
sifting process eliminates riding waves, so each IMF is more symmetrical and is in fact
a zero-mean AM-FM component. An example of the decomposition of an ECG signal
into its first 12 IMFs is shown in Figure 2.6.
Figure 2.6: Decomposition of sample ECG signal into its first 12 IMFs.
Mode mixing appears to be the most significant drawback of the EMD algorithm. It
refers either to a single IMF containing components of dramatically disparate scales, or
to components of the same scale appearing in different IMFs, and it is usually caused
by intermittency in the analyzed signal.
2.2.4.1.1 Applications of EMD Algorithm Nimunkar in [39] implemented the
EMD algorithm for filtering noisy ECG signals and compared the result with a
traditional low-pass filtering approach. Tong et al. in [40] used empirical mode
decomposition to filter power-line noise in electrocardiogram signals. They added
pseudo-noise at a frequency higher than the highest frequency of the signal, so that
the power-line noise could be filtered out in the first IMF alone. They also compared
the results with traditional IIR-based bandstop filtering. This technique can also be
used to filter power-line noise when enhancing stress ECG signals. Furthermore, [41]
used EMD and PCA algorithms to obtain cardiovascular signals from sensing hardware
embedded in a chair.
In another study, [42] showed that its proposed EMD-based method provides better noise
reduction than wavelet-thresholding de-noising methods, both in preserving the
geometrical characteristics of the ECG signal and in signal-to-noise ratio (SNR).
The steps for de-noising the ECG signal using the EMD, as proposed by [42], are:
• Transform the noisy ECG signal s(k) by EMD; let ci denote the IMF of the EMD
at scale i, where i = 1, 2, ..., n
• Calculate the mean square value δi at scale i; the threshold ti can then be
determined by the 3δ rule
• Apply the hard-thresholding method to obtain the estimated IMFs c̃i as follows:
\tilde{c}_i(k) = \begin{cases} c_i(k) & \text{if } |c_i(k)| \geq t_i \\ 0 & \text{if } |c_i(k)| < t_i \end{cases} \quad (2.7)
• Reconstruct the de-noised ECG signal s(k) from c̃i(k)
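The de-noising steps above can be sketched in Python. This is a minimal illustration that assumes the IMFs are already available as NumPy arrays, and takes one common reading of the 3δ rule, namely t_i = 3 · RMS(c_i):

```python
import numpy as np

def emd_denoise(imfs):
    """Hard-threshold each IMF per the 3-delta rule, then sum (sketch).

    `imfs` is a list of NumPy arrays c_i(k) from an EMD decomposition
    (assumed given); here t_i = 3 * sqrt(mean(c_i^2)), one reading of
    the 3-delta rule."""
    clean = np.zeros_like(imfs[0])
    for c in imfs:
        t = 3.0 * np.sqrt(np.mean(c ** 2))       # t_i from the 3-delta rule
        clean += np.where(np.abs(c) >= t, c, 0)  # Eq. (2.7): hard thresholding
    return clean
```

Small-amplitude noise within each IMF is zeroed, while the large excursions (e.g. QRS-like peaks) survive into the reconstruction.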
2.2.4.2 Ensemble Empirical Mode Decomposition
Ensemble EMD (EEMD) was introduced to remove the mode-mixing effect. The EEMD
largely overcomes the mode-mixing problem of the original EMD by repeatedly adding white
noise to the target signal, and provides physically unique decompositions when
applied to data with mixed and intermittent scales.
The EEMD decomposing process can be separated into the following steps:
• Add a white noise series w(t) to the target data x(t); the noise must be zero-mean
with constant variance, so X(t) = x(t) + w(t).
• Decompose the data with added white noise into Intrinsic Mode Functions (IMFs) cj
and residue rn:

X(t) = \sum_{j=1}^{n-1} c_j + r_n \quad (2.8)
• Repeat steps 1 and 2 N times, each time with a different white noise series wi(t).
So,

X_i(t) = \sum_{j=1}^{n-1} c_{ij} + r_{in} \quad (2.9)
• Obtain the ensemble means of the corresponding IMFs of the decompositions as the
final result; each final IMF is the average of the corresponding IMF over the N trials:

c_j = \frac{1}{N} \sum_{i=1}^{N} c_{ij} \quad (2.10)
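The ensemble-averaging loop can be sketched as follows. This is a minimal illustration: `emd` stands for any EMD routine with a fixed-length IMF output, a simplifying assumption, since a real EMD may return a varying number of IMFs per trial:

```python
import numpy as np

def eemd(x, emd, n_trials=100, noise_std=0.2, rng=None):
    """EEMD sketch: average IMFs over noisy trials, Eq. (2.10).

    `emd` is a hypothetical interface: a function returning a fixed-length
    list of IMFs for a signal the same length as x."""
    rng = np.random.default_rng(rng)
    acc = None
    for _ in range(n_trials):
        noisy = x + noise_std * x.std() * rng.standard_normal(len(x))
        imfs = np.asarray(emd(noisy))     # decompose each noisy trial
        acc = imfs if acc is None else acc + imfs
    return acc / n_trials                 # ensemble mean of corresponding IMFs
```

As the number of trials grows, the added noise averages out in each IMF, leaving only the signal's own components.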
This new approach exploits the statistical characteristics of white noise, whose energy
is uniformly distributed across frequencies, to improve the EMD method. By adding white
noise to the target signal, all scales are populated uniformly, which avoids the
mode-mixing phenomenon. Comparing IMF components at the same level, EEMD produces more
concentrated and band-limited components.
2.2.4.3 The Fourier Transform and STFT
The Fourier Transform (FT), X(ω), of a signal x(t) is defined as:
X(\omega) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t} dt \quad (2.11)
where t and ω are the time and frequency variables, respectively. It defines the
spectrum of x(t), which consists of components at all frequencies over the range where
it is nonzero.
Historically, Fourier spectral analysis has provided a general method for examining
the global energy-frequency distribution, and it has dominated data analysis since soon
after its introduction because of its power and simplicity. The Fourier transform
belongs to the class of orthogonal transformations that use fixed harmonic basis
functions; its result can be viewed as a decomposition of the initial signal into
harmonic functions with fixed frequencies and amplitudes.
For many signals, Fourier analysis is useful because the signal's frequency content
is important. However, Fourier analysis has a serious drawback: information about the
timing of events is lost when the signal is transformed to the frequency domain.
Moreover, it is strictly valid only under restrictive conditions (the system must be
linear, and the data must be strictly periodic or stationary); otherwise the resulting
spectrum makes little physical sense.
Dennis Gabor in 1946 adapted the Fourier transform to analyze only a small section of
the signal at a time; the result is called the Short-Time Fourier Transform (STFT). The
STFT is obtained from the usual FT by multiplying the time-domain signal x(t) by an
appropriate sliding time window w(t). Thus, instead of the usual FT expression, one
gets a time-frequency expression of the form:
X(\tau, \omega) = \int_{-\infty}^{\infty} x(t) w(t - \tau) e^{-j\omega t} dt \quad (2.12)
where w(t) is the time window applied to the signal.
The time-frequency information the STFT provides has limited precision, determined by
the size of the window.
2.2.4.4 Wavelet Transform (WT)
The Wavelet Transform (WT) is used to analyze a signal in the time and frequency
domains. The WT describes the properties of a waveform that change over time, with the
waveform divided into segments of scale. It represents a time function in terms of
simple, fixed building blocks, termed wavelets. These building blocks are a family of
functions derived from a single generating function, called the mother wavelet, by
translation and dilation operations.
The WT comes in two types: continuous and discrete. The Continuous Wavelet Transform
(CWT) is used to divide a continuous-time function into wavelets. The CWT of a
continuous, square-integrable function x(t) at a scale a > 0 and translational value
b ∈ R is defined by:
W_x(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} x(t) \, g^*\!\left(\frac{t - b}{a}\right) dt \quad (2.13)
where ∗ denotes the complex conjugate and g(t) is the analyzing wavelet, a continuous
function in both the time domain and the frequency domain, called the mother wavelet.
To recover the original signal x(t), the inverse continuous wavelet transform can be
used:
x(t) = \int_{0}^{+\infty} \int_{-\infty}^{+\infty} \frac{1}{a^2} W_x(a, b) \frac{1}{\sqrt{|a|}} \tilde{g}\!\left(\frac{t - b}{a}\right) db \, da \quad (2.14)
where g̃(t) is the dual function of g(t). The analyzing wavelet g(t) should satisfy a
number of properties; the most important are integrability and square integrability.
The wavelet should also be as concentrated in time and frequency as possible.
However, calculating wavelet coefficients for every possible scale represents
considerable effort and results in a vast amount of data, so the Discrete Wavelet
Transform (DWT) is often used. The WT can be thought of as an extension of the classic
Fourier transform, except that, instead of working on a single scale (time or
frequency), it works on a multi-scale basis. This multi-scale feature of the WT allows
the decomposition of a signal into a number of scales, each scale representing a
particular coarseness of the signal under study [43].
The DWT of a signal x[n] is calculated by passing it through a series of filters. Each
stage consists of two digital filters and two downsamplers by 2, as shown in Figure
2.7. g[n] is the discrete mother wavelet, high-pass in nature, and h[n] is its mirror
version, low-pass in nature.
The outputs, giving the detail coefficients (from the high-pass filter) and
approximation coefficients (from the low-pass filter), are computed as follows:

y_{low}[n] = \sum_{k=-\infty}^{+\infty} x[k] \, h[2n - k], \qquad y_{high}[n] = \sum_{k=-\infty}^{+\infty} x[k] \, g[2n - k] \quad (2.15)
Figure 2.7: Discrete Wavelet Transform decomposition (x[n] is passed through h[n] and
g[n] with downsampling by 2 at each stage, yielding Level 1, Level 2, and Level 3 DWT
coefficients).
The wavelet transform is often compared with the Fourier transform. The Fourier
transform is a powerful tool for processing stationary signals (signals whose
properties do not change over time). To avoid the constraints associated with
non-stationary signals, the wavelet transform was introduced. Like the Fourier
transform, it performs decomposition in a fixed basis of functions; however, unlike
the FT, it expands the signal in terms of wavelet functions that are localized in both
time and frequency [44].
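A single analysis stage of the filter bank in Figure 2.7 can be sketched with the Haar filter pair (db1, chosen here for brevity; the thesis itself uses db2). The odd-phase downsampling below is one convention for indexing Eq. (2.15) so that the Haar coefficient pairs align:

```python
import numpy as np

# Haar analysis filters (assumption: db1 for brevity; the thesis uses db2)
h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass h[n]
g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass g[n]

def dwt_level(x):
    """One DWT stage per Eq. (2.15): filter, then downsample by 2."""
    low = np.convolve(x, h)[1::2]   # approximation coefficients
    high = np.convolve(x, g)[1::2]  # detail coefficients
    return low, high
```

Because the Haar pair is orthogonal, one stage conserves energy: the squared coefficients of `low` and `high` sum to the squared samples of the input. Repeating the stage on `low` gives the Level 2 and Level 3 coefficients of Figure 2.7.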
Chapter 3
Application Architecture
We developed and tested an Apple iOS application to demonstrate the iOS device's
potential for measuring user heart rates in real time. This application makes use of
the iPhone's front-facing and back cameras, as well as its microphone, for PPG and PCG
acquisition, in order to provide an estimate of the user's heart rate. Once the
measurements are obtained, the app analyzes the signals to compute the user's heart
rate.
At a high level, the core algorithm can be represented by the block diagram in Figure
3.1. Testing the iDevice sensors' capability for retrieving heart-pulse information is
performed in four steps. The video and audio processing units take in their inputs in
the first three steps, and in the last step, signal processing and machine learning
algorithms are used to estimate the heart rate.
3.1 Fingertip Processing Unit
The application records video from the fingertip of the user, using the back camera
for 10 seconds. The user needs to gently press the camera lens and its LED with his
index finger as previously shown in Figure 2.2. When the user presses the camera lens
of the device and its LED simultaneously, the ambient light travels through the finger
-
Chapter 3. Application Architecture 24
Figure 3.1: Block diagram of the application architecture
and reaches the camera sensor. A sample frame of the video recorded from the index
fingertip is shown in Figure 3.2.
Our application utilizes the same image-acquisition concept used in pulse oximeters.
Since blood opacity differs between oxygenated and deoxygenated blood, we measure the
brightness of the skin over time. To compute the brightness variation of the skin, we
calculate the average red-channel intensity of the pixels in a region of interest in
each frame: we divide each frame into 9 cells, and the PPG waveform extraction
considers only the central cell, as shown in Figure 3.3.
Figure 3.2: Video recording from the fingertip using back camera
Figure 3.3: Region of interest in each frame
The average red-channel intensity is calculated by Equation 3.1 to determine the PPG
signal.
PPG_1(t) = \frac{\sum_{x,y} R(x, y, t)}{WH} \quad (3.1)
where R(x, y, t) is the red-channel intensity at pixel (x, y) of the frame at time t,
and hence 0 ≤ R(x, y, t) ≤ 255. WH is the number of pixels in the region of interest,
which is 192 × 144 in our application.
A sample PPG signal from the video recorded from the index fingertip is shown in
Figure 3.4. The data is quite clean, and peak-to-peak distances are visually
identifiable.
Figure 3.4: Example of fingertip data obtained by the described capture method
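Equation 3.1 amounts to averaging the red channel over the central cell of a 3×3 grid, one value per frame. A sketch, assuming frames arrive as a (T, H, W, 3) array with red in channel 0 (an assumption about the pixel layout, not the thesis's exact API):

```python
import numpy as np

def ppg_from_frames(frames):
    """Eq. (3.1): mean red-channel intensity over the central ROI per frame.

    `frames` is a (T, H, W, 3) uint8 array; red in channel 0 is an
    assumption for this sketch."""
    T, H, W, _ = frames.shape
    h3, w3 = H // 3, W // 3
    roi = frames[:, h3:2 * h3, w3:2 * w3, 0].astype(float)  # central cell of 3x3 grid
    return roi.mean(axis=(1, 2))  # PPG(t): one average per frame
```

Averaging over the whole central cell suppresses per-pixel sensor noise while keeping the slow brightness variation caused by blood volume changes.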
3.2 Face Processing Unit
Another mechanism to sense the color changes of the skin during a cardiac cycle is
recording video from the face. The application records video from the face for 10
seconds. To record properly, the face should be placed in front of the front-facing
camera in a well-lit environment. In this application, the user should place his/her
forehead in a pre-determined area displayed on the screen (Figure 3.5).
Figure 3.5: Video recording from the face using front-facing camera
To capture the PPG signal from the face, we again apply Equation 3.2 to the region of
interest in each frame of the recorded video. The PPG signal computed from the face may
contain additional noise due to illumination levels in the room, objects covering the
forehead, or movement of the device during recording.
PPG_2(t) = \frac{\sum_{x,y} R(x, y, t)}{WH} \quad (3.2)
where again R(x, y, t) is the red-channel intensity at pixel (x, y) of the frame at
time t, and hence 0 ≤ R(x, y, t) ≤ 255, and WH is the number of pixels in the region of
interest, which is 192 × 144 in our application.
A sample PPG signal from the video recorded from the face is shown in Figure 3.6.
The data is still clean, but it contains additional noise compared to the PPG signal
recorded from the fingertip.
Figure 3.6: Example of PPG signal from the face obtained by the described capture
method
For both fingertip and face recording, the exposure settings on the device are locked
to eliminate the effect of auto-exposure on the captured results. For example, during
the fingertip test the finger completely covers the camera, so the iOS device, assuming
a low-light condition, over-exposes the capture. Over-exposure tends to drop the frame
rate and saturate the red intensity of the captured data.
3.3 Audio Processing Unit
In the final step, the user should place the microphone directly on his chest,
preferably on one of the auscultation sites shown in Figure 2.4. The audio is recorded
for 10 seconds using the primary microphone of the device at a sampling frequency of
44.1 kHz. The recorded audio represents the PCG signal, and its two main sounds, S1 and
S2, are identifiable. A sample PCG signal recorded using the primary microphone of the
iDevice is shown in Figure 3.7.
Figure 3.7: Example of PCG recorded by the microphone and identifiable heart sounds
From the PCG signal shown in Figure 3.7, the two heart sounds S1 and S2 are quite
identifiable; however, like any signal, the audio recorded by the microphone may
contain noise due to movement of the phone on the chest during recording or background
noise in the room.
In the next section, our proposed method for filtering the three captured signals and
estimating the user's heart rate is presented.
3.4 Heartbeat Estimation Algorithm
The heartbeat estimation algorithm proposed in this thesis provides an estimate of
the user's HR by (1) computing the HR from the input PPG and PCG signals using our
version of the Empirical Mode Decomposition (EMD) algorithm, (2) assessing the quality
of the input signals in order to distinguish between good and bad waveform segments
using an SVM classifier, and (3) precisely combining heart rate information from the
three modalities based on the quality of the signals.
Figure 3.8 illustrates the components of this approach. In the first component, we
use a signal filtering algorithm (EMD, DWT, or FIR filtering) to remove the noise
artifacts in each waveform. Second, we apply the peak detection algorithm to compute
the heart rate in each segment independently. Third, we separately qualify PPG and PCG
waveform segments (each segment is 5 seconds long) as either good or bad using a
machine learning algorithm in the form of SVMs. In the fourth and final component, a
decision-logic algorithm precisely combines the results of the three previous
components to provide the final heart rate estimate.
3.5 Signal Filtering Algorithms
In this section we discuss three signal filtering algorithms applied to our dataset.
Our main contribution on the signal processing side is a version of the EMD algorithm
that reduces noise in PPG and PCG signals. We also applied the Wavelet Transform and
FIR filtering to our dataset, in order to compare their results with our proposed EMD
algorithm.
Figure 3.8: Elements of the proposed algorithm
3.5.1 FIR Filtering
An overview of the designed FIR filtering algorithm is shown in Figure 3.9. We first
down-sample the audio before transferring it off the device; in the next phase we
remove baseline wander and filter out noise outside the heartbeat range of interest.
Finally, we apply a moving average so that peaks can be detected efficiently.
In the first step we filter and down-sample the audio so that the recorded data can be
transferred off the device in a reasonable amount of time (i.e., at a reasonable size).
The raw data is sampled at 44.1 kHz for 10 seconds. The data is then filtered with a
6th-order low-pass Butterworth filter and down-sampled, ensuring that the Nyquist
requirement is still met. An example of the data at this stage is illustrated in
Figure 3.10.
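This acquisition front-end can be sketched with SciPy. The downsampling factor of 100 is an illustrative assumption, not the thesis's exact value; the cutoff is placed at the new Nyquist rate so that no aliasing occurs:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_downsample(x, fs=44100, factor=100):
    """6th-order low-pass Butterworth, then downsampling (sketch).

    The factor of 100 (44.1 kHz -> 441 Hz) is an assumption for
    illustration."""
    new_nyquist = fs / factor / 2.0
    sos = butter(6, new_nyquist / (fs / 2.0), output='sos')
    y = sosfiltfilt(sos, x)   # zero-phase filtering below the new Nyquist rate
    return y[::factor]        # keep every factor-th sample
```

Second-order sections (`sos`) are used because a 6th-order filter with a very low normalized cutoff is numerically fragile in transfer-function form.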
Figure 3.9: FIR filtering module
Figure 3.10: Sample data captured from the device. It has already been filtered and
down-sampled as specified.
After the signal acquisition, a band-pass filter attenuates frequencies outside the
band of interest. This reduces the noise in later processing steps and makes the
resulting heart rate signal smoother. We first remove the baseline wander to obtain
signals with zero mean. Because the average of the input signals can shift over time
due to sensor drift, we divide each of the three vital signals into 10 intervals and
subtract a linear trend from each interval to remove the baseline wander. A sample of
the original PPG signal, its baseline wander, and the clean signal after removing the
baseline is shown in Figure 3.11.
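The per-interval linear detrending can be sketched with SciPy's piecewise `detrend`, using the interval boundaries as breakpoints:

```python
import numpy as np
from scipy.signal import detrend

def remove_baseline(x, n_intervals=10):
    """Remove baseline wander by subtracting a linear trend per interval
    (a sketch of the piecewise detrending described above)."""
    bp = np.arange(1, n_intervals) * (len(x) // n_intervals)  # interval breakpoints
    return detrend(x, type='linear', bp=bp)
```

For a signal whose drift is itself piecewise-close-to-linear, this leaves each interval zero-mean and trend-free while preserving the faster heartbeat oscillation.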
Second, we apply a fourth-order Butterworth band-pass filter to each input signal,
with cutoff frequencies of 0.8 Hz and 3.0 Hz, to reject noise outside the heart rate
range of 48 to 180 beats per minute.
Figure 3.11: Sample PPG recording with baseline wander, baseline, and clean PPG after
removing the baseline
Finally, a moving average is applied to the filtered data. The equation is:
y[n] = \frac{1}{2L + 1} \sum_{m=n-L}^{n+L} |x[m]| \quad (3.3)
where L is the length of the window used for averaging. The study in [45] suggests that
the shorter heart sound is approximately 67-87 ms long, so we applied a window
of 63 ms.
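Equation 3.3 is a simple convolution of the rectified signal with a box kernel. A sketch, where the sampling rate is an illustrative assumption for converting the 63 ms window into samples:

```python
import numpy as np

def moving_average_abs(x, fs=441, window_s=0.063):
    """Eq. (3.3): moving average of |x| over a (2L+1)-sample window.

    fs = 441 Hz is an assumed post-downsampling rate for illustration."""
    L = max(1, int(round(window_s * fs / 2)))    # half-window in samples
    kernel = np.ones(2 * L + 1) / (2 * L + 1)
    return np.convolve(np.abs(x), kernel, mode='same')
```

Rectifying before averaging turns each oscillatory heart-sound burst into a smooth positive hump, which makes the subsequent peak detection easier.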
Figures 3.12 and 3.13 illustrate the result of applying the FIR filtering module
described above to the PPG recorded from the fingertip and the PPG recorded from the
face, respectively.
Figure 3.12: Original PPG recorded from the fingertip and clean PPG after applying
FIR filtering
Figure 3.13: Original PPG recorded from the face and clean PPG after applying FIR
filtering
Finally, since we need to correlate the three inputs to estimate the user's heart rate,
the sampling rate of the audio recording from the microphone must be made proportional
to the sampling rate of our camera, which is 30 fps. We therefore down-sample the
audio, using a Butterworth low-pass filter with an appropriate cutoff frequency to
avoid aliasing. Figure 3.14 illustrates the PCG recorded from the chest of the user and
its down-sampled, filtered, and smoothed result after applying the algorithm.
Figure 3.14: Original PCG recorded from the chest of the User and clean PCG after
applying FIR filtering
3.5.2 EMD Algorithm
We now discuss the EMD algorithm proposed for filtering out noise from the two PPG
signals and the PCG signal, which consists of two parts:
3.5.2.1 Decomposition
The Empirical Mode Decomposition algorithm was introduced to analyze nonlinear and
non-stationary signals, such as biomedical signals, and is based on decomposing the
signal into a collection of IMFs. These IMFs should fulfill the two conditions that
were discussed previously.
In the first step of signal filtering using EMD, we decompose the original PPG and PCG
signals as follows:
1. Initialize h1(t) with the original signal.
2. Identify the extrema of the signal hi(t).
3. Generate the upper and lower envelopes by interpolating the maxima and minima
points identified in the previous step.
4. Calculate the mean of the two envelopes to determine the local mean value, m(t).
5. Calculate d(t) = hi(t) − m(t).
6. If d(t) is a zero-mean signal, it is taken as the next IMF, hi+1(t) = d(t).
Otherwise, replace hi(t) with d(t) and repeat from step (2).
7. Update the residue as r = r − hi(t) and set i = i + 1. Repeat steps (2) to (6),
sifting the residual signal. The process stops when the final residual signal is a
monotonic function.
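The sifting loop above can be sketched as follows. This is a simplified illustration: cubic-spline envelopes and a crude zero-mean test stand in for the full stopping criteria, and the minimum-extrema guard plays the role of the monotonic-residue check:

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline

def sift(h, max_iter=50):
    """Extract one IMF via steps (2)-(6); returns None when h has too few
    extrema to build envelopes, i.e. h is the final residue."""
    t = np.arange(len(h))
    for _ in range(max_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 4 or len(minima) < 4:
            return None
        upper = CubicSpline(maxima, h[maxima])(t)   # envelope of maxima
        lower = CubicSpline(minima, h[minima])(t)   # envelope of minima
        m = 0.5 * (upper + lower)                   # local mean, step (4)
        d = h - m                                   # step (5)
        if abs(d.mean()) < 1e-6 * np.abs(d).max():  # crude zero-mean test
            return d
        h = d
    return h

def emd(x, max_imfs=10):
    """Steps (1)-(7): peel off IMFs until the residue has too few extrema."""
    imfs, r = [], np.asarray(x, float).copy()
    for _ in range(max_imfs):
        imf = sift(r)
        if imf is None:
            break
        imfs.append(imf)
        r = r - imf
    return imfs, r
```

Because each IMF is subtracted from the running residue, the IMFs plus the final residue always sum back to the original signal exactly.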
A sample decomposition of the PPG signal recorded from the fingertip into its IMFs
is illustrated in Figure 3.15.
3.5.2.2 Reconstruction
After applying the EMD algorithm to the input signal, the signal is decomposed into
a residue and a collection of IMFs, and hence can be expressed as:
x(t) = \sum_{i=1}^{n} h_i(t) + r \quad (3.4)
where n is the number of IMFs.
Figure 3.15: Original PPG recorded from the fingertip and the decomposed IMFs using
the EMD algorithm.
We know from the literature on EMD applications that the last IMFs represent baseline
wander, while high-frequency noise components lie in the first IMFs. To reconstruct
the clean signal from the decomposed IMFs, we need to determine the noise level in the
signal. To do so and recover the heartbeat signal, the IMFs corresponding to the
heartbeat are identified according to their peak frequencies: we compute the Power
Spectral Density of each IMF, which reveals its dominant frequency. In our algorithm,
IMFs whose peak frequency Fi lies in the range 0.8 Hz - 3.0 Hz are classified as
components of the heartbeat signal; [46] tested these cutoff limits on the output of
sensors in a designed “HeartPhone”. Therefore, we can reconstruct the heartbeat
signal as:
H_{clean}(t) = \sum_{i : F_i \in [0.8, 3.0]\,\text{Hz}} h_i(t) \quad (3.5)
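The PSD-based selection of Equation 3.5 can be sketched with SciPy's Welch estimator (a minimal illustration; the IMFs are assumed given as equal-length arrays):

```python
import numpy as np
from scipy.signal import welch

def reconstruct_heartbeat(imfs, fs=30.0, band=(0.8, 3.0)):
    """Eq. (3.5): keep only the IMFs whose Welch-PSD peak frequency F_i
    falls in the heartbeat band, then sum them (sketch)."""
    kept = []
    for h in imfs:
        f, pxx = welch(h, fs=fs, nperseg=min(256, len(h)))
        f_peak = f[np.argmax(pxx)]          # dominant frequency of this IMF
        if band[0] <= f_peak <= band[1]:
            kept.append(h)
    return np.sum(kept, axis=0) if kept else np.zeros_like(imfs[0])
```

IMFs dominated by high-frequency noise or by slow baseline wander fall outside the band and are simply discarded.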
Figures 3.16 and 3.17 illustrate the decomposition of the PPG signal recorded from
the fingertip into five IMFs and corresponding power spectral densities.
Figure 3.16: Decomposition of the PPG signal into IMFs using the EMD algorithm.
Figure 3.17: Power Spectral Density of the decomposed IMFs.
A comparison between the original PPG signal recorded from the fingertip and the
signal reconstructed via the EMD algorithm is shown in Figure 3.18. The reconstruction
was based on the IMFs whose dominant frequency components lie in the range
0.8 Hz - 3.0 Hz; by this criterion, we used the second and third IMFs for partial
reconstruction of the signal.
Figure 3.18: Original PPG recorded from the fingertip and the clean signal after applying
the EMD algorithm and reconstructing the signal based on the Power Spectral Density
of the IMFs.
3.5.3 Wavelet Transform
The wavelet transform can be used for data decomposition and reconstruction. By
decomposing the original signal, we can eliminate the wavelets corresponding to the
noise and reconstruct a clean signal. To implement the WT for filtering the recorded
data, we use Multi-Resolution Analysis (MRA).
In wavelet analysis, the approximations are the high-scale, low-frequency components of
the signal and the details are the low-scale, high-frequency components. At each level
of decomposition, a threshold is needed to determine which components should be
eliminated.
The selection of an appropriate wavelet and number of decomposition levels is very
important when analyzing signals with the WT. The number of decomposition levels is
chosen based on the dominant frequency component of the signal: the levels are chosen
so that the parts of the signal that correlate with the frequencies required for
classifying the signal are retained in the wavelet coefficients. In our algorithm, the
level of decomposition was chosen to be 4 [43]. Thus the PPG and PCG signals were
decomposed into the details D1-D4 and one final approximation, A4. A4 contains the
dominant frequencies in the [0, 3.75] Hz band, which covers the heart pulse.
Usually, tests are performed with different types of wavelets, and the one that gives
the maximum efficiency is selected for the particular application. [43] suggests using
the Daubechies wavelet of order 2 for PPG, ECG, and EEG signals, so we performed our
analysis with db2 at level 4. The block diagram of the wavelet transform algorithm is
illustrated below.
Figure 3.19: Block diagram of the wavelet decomposition and reconstruction.
3.6 Peak Detection Algorithm
The peak detection algorithm used in our system is a version of the Adaptive Peak
Identification Technique (ADAPIT) introduced by [13]. This algorithm detects peaks and
computes the heart rate for each waveform segment. The main steps of the peak
detection algorithm are as follows:
1. To detect peaks precisely, we need to remove the baseline of the signal, which we
have already done in the previous section.
2. In this step the first estimate of the actual peaks is obtained:
• Two thresholds, T1 and T2, are computed. T1 is set to 2σ1, where σ1 denotes the
standard deviation of all data points of the waveform, and defines the waveform's
baseline range [−T1, T1]. T2 is set to 3σ2, σ2 being the standard deviation of the
baseline. The peaks greater than T2 are taken as the first estimate of the actual
peaks.
• The lower bound on peak amplitude is set to one half of the median amplitude of all
peaks identified in the previous step.
3. To refine the peaks retained from the previous step, strings of markers with period
P are iteratively generated and moved along the timeline to align with the retained
peaks. Through this iterative process, P is varied over a range equivalent to HRs
between 48 and 180 bpm, and the P that aligns with the largest number of peaks is
selected.
4. Each unaligned marker of the selected P is allowed to move back and forth along the
timeline by as much as one half of P, in an attempt to line up with any unaligned
peak.
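Steps 1 and 2 can be sketched as follows (the marker-alignment steps 3-4 are omitted; SciPy's `find_peaks` stands in for the local-maximum search):

```python
import numpy as np
from scipy.signal import find_peaks

def initial_peak_estimate(x):
    """Steps 1-2 of the peak detection sketch: baseline removal, the
    T1/T2 thresholds, and the median-based amplitude floor."""
    x = x - x.mean()                       # step 1: zero-mean signal
    t1 = 2.0 * x.std()                     # T1 = 2 * sigma_1
    baseline = x[np.abs(x) <= t1]          # samples inside [-T1, T1]
    t2 = 3.0 * baseline.std()              # T2 = 3 * sigma_2
    peaks, _ = find_peaks(x, height=t2)    # first estimate of the actual peaks
    if len(peaks):
        floor = 0.5 * np.median(x[peaks])  # lower bound on peak amplitude
        peaks = peaks[x[peaks] >= floor]
    return peaks
```

The two-stage threshold first measures the spread of the whole waveform, then re-measures the spread of only the baseline samples, so T2 adapts to the noise floor rather than to the peaks themselves.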
Figure 3.20 illustrates the peak detection algorithm. Panels (a) and (b) show the
original signal and its baseline wander, and panel (c) shows the clean signal without
baseline wander, together with the thresholds T1 and T2. Panel (d) shows the primary
peaks detected using T2, and (e) and (f) show the peak-to-peak intervals and the
detected peaks, respectively.
Figure 3.20: An illustration of the peak detection algorithm
3.7 Learning System for Heart Rate Estimation Based on Support Vector Machines (SVMs)
3.7.1 SVM Classifier
SVM is a commonly used method for statistical pattern recognition. Consider the
problem of separating input vectors belonging to two categories,
V = \{(x_1, y_1), ..., (x_m, y_m)\}, x_i \in R^n, y_i \in \{\pm 1\}, with a hyper-plane
w^T x + b = 0, where the x_i are the patterns to be classified, the y_i are their
categories, w is a normal vector, and b is a bias term.
-
Chapter 3. Application Architecture 44
The goal of the SVM classifier is to find the optimal separating hyper-plane, which
separates the two classes and maximizes the distance between the hyper-plane and the
vectors closest to it. Training the classifier involves minimizing the error function:
\frac{1}{2} w^T w = \frac{1}{2} \|w\|^2 \quad (3.6)
subject to the constraints:

y_i(w^T x_i + b) \geq 1, \quad i = 1, 2, ..., m \quad (3.7)

From Equation 3.7 we see that w^T x_i + b \leq -1 for y_i = -1, while
w^T x_i + b \geq 1 for y_i = 1.
The optimization problem can be formulated as follows:

\min J(w, \zeta) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \zeta_i \quad (3.8)

such that

y_i(w^T \varphi(x_i) + b) \geq 1 - \zeta_i \quad (3.9)

\zeta_i \geq 0, \quad i = 1, ..., N \quad (3.10)
where C is a positive regularization constant chosen empirically, w is the weight
vector of training parameters, ζi is a positive slack variable indicating the distance
of xi from the decision boundary, and φ is a nonlinear mapping function used to map
the input data point xi into a higher-dimensional space.
The SVM solution can be written using Lagrange multipliers α_i ≥ 0, which are obtained
by solving a quadratic programming problem. The SVM decision function can be expressed
as:
g(x) = \sum_{x_i \in SV} \alpha_i y_i K(x, x_i) + b \quad (3.11)
where K(x, x_i) is the kernel function, defined as:

K(x, x_i) = \varphi(x)^T \varphi(x_i) \quad (3.12)
In this work the linear kernel function is used, defined as K(x, x_i) = x^T x_i.
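As an illustration, a linear-kernel classifier on two toy features standing in for FW and PV might look like the following. scikit-learn is assumed here as the SVM implementation; the thesis does not name a library, and the feature values are fabricated for the sketch:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D features standing in for (FW, PV); labels +1 = good, -1 = bad.
X = np.array([[0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
              [0.20, 0.70], [0.30, 0.80], [0.25, 0.75]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)  # linear kernel K(x, x_i) = x^T x_i
pred = clf.predict(np.array([[0.88, 0.12], [0.22, 0.78]]))
```

The regularization constant C plays the role described above: larger values penalize slack more heavily and fit the training set more tightly.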
3.7.2 SVM Classifier Implementation
The SVM classifier is used for post-processing analysis in our heart rate estimation
system. The presented heartbeat estimation system is based on the filtering and peak
detection algorithms discussed in previous sections, and the SVM classifier is used to
distinguish between good and bad recordings. The results from the filtering and peak
detection modules are used to provide an estimate of the heart rate based on the
waveforms classified as good by the SVM.
In our proposed method, we first apply our version of the EMD algorithm. Then we apply
the peak detection module to the clean signal to detect its peaks, from which the
features of the classifier are computed. We then apply the SVM classifier, a supervised
machine learning algorithm, to distinguish between good and bad waveforms. [48] has
shown the SVM to be an effective classifier in a wide variety of applications,
including the characterization of PPG and PCG signals.
This component of the heartbeat detection system implements our premise that the
reliability of a heart rate estimate depends strongly on the quality of the underlying
waveforms from which it is derived. A machine learning classifier, implemented with an
SVM, automates the categorization of the waveforms by attempting to mimic the
performance of a human relying on visual inspection. The classifier learns its rules
by finding coefficients that optimize the correlations between a set of
waveform-extracted features and the waveform quality obtained from manually
categorized waveform samples.
There are five steps in the development of a Support Vector Machine learning
classifier:
1. Manually classify and categorize sample waveform segments as good or bad.
2. Define candidate waveform features that distinguish good and bad waveforms.
3. Select the most informative features.
4. Train the classifier.
5. Test the classifier.
As a supervised learning algorithm, the development of an SVM requires a set of
input/output learning samples, where the input consists of a list of discriminatory
features and the output consists of labeled binary classes.
To manually categorize waveforms, each recording is divided into two 5-second
segments, and each segment is visually examined by a person. We examined 56 five-second
segments for each of the PPG and PCG recordings. A segment is ranked as bad if more
than 2 expected peaks are not observed or cannot be distinguished; otherwise, it is
ranked as good.
The success of an SVM classifier depends heavily on good feature selection. For the feature selection part of the algorithm, we therefore used the two features validated by [13] for heart pulse signals: the Fraction of Aligned Waves (FW) and the Pulse Wave Variability (PV). Both are time-domain features. FW measures the temporal regularity of the potential heartbeat signal, and PV measures the variability of the time interval between adjacent pulse waves.
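The exact FW and PV definitions are given in [13]; as an illustrative stand-in only, the idea behind PV can be sketched as the coefficient of variation of the peak-to-peak intervals:

```python
import numpy as np

def pulse_wave_variability(peak_times):
    """Illustrative stand-in for the PV feature: variability of the
    time interval between adjacent pulse waves, computed here as the
    coefficient of variation of peak-to-peak intervals."""
    intervals = np.diff(np.asarray(peak_times, dtype=float))
    return float(np.std(intervals) / np.mean(intervals))
```

A perfectly regular pulse yields a PV of zero, and irregular peak spacing drives the value up, which is what makes it a useful quality discriminator.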
The training phase of the SVM classifier uses the FW and PV features computed on a set of waveforms. The performance of the classifier can vary with the number of waveforms used in training and with their quality distribution. For any new data collected from the iDevice, we first filter the signals and run the peak detection algorithm, and then use the trained SVM to assess the quality of the six waveform segments (two per modality). The segments classified as good contribute to the final heartbeat estimate of our algorithm, which is computed using the equation:
Hr = \frac{\sum_{S \in W_{good}} \theta(S)}{\sum_{S \in W_{good}} T(S)} \times 60    (3.13)

where W_{good} is the class of waveforms that are classified as a good pulse signal, \theta(S) is the number of peaks in the waveform S, and T(S) is the duration of the waveform S in seconds.
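Equation (3.13) can be computed directly from the good segments; a minimal sketch, representing each segment as a (peak count, duration) pair:

```python
def heart_rate_bpm(good_segments):
    """Equation (3.13): total number of peaks over total duration of
    the segments classified as good, scaled to beats per minute.
    Each segment is a (num_peaks, duration_seconds) pair."""
    total_peaks = sum(peaks for peaks, _ in good_segments)
    total_seconds = sum(duration for _, duration in good_segments)
    return total_peaks / total_seconds * 60.0
```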
3.8 Multi-Channel Heart Rate Estimation
The multi-channel heart rate estimation module uses the peak locations found by the peak detection module. The intervals between successive detected peaks are calculated for each of the waveforms, and the mean value of the histogram of peak-to-peak distances is used as the estimate of the user's heart rate. Figures 3.21 to 3.23 show the histograms of peak-to-peak distances for the fingertip, face, and audio recordings, respectively.
The main idea behind using three different modalities to estimate the heart rate is to make the final estimate more reliable. To achieve this, we assume that the user's heart rate changes negligibly during the test, so that the three modalities can be fused into a single estimate. We therefore combine the three histograms into a single histogram of the fused data, illustrated in Figure 3.24. The final heart rate estimate of the system has a peak-to-peak distance of 0.93 s, which corresponds to 64 bpm.
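The fusion step can be sketched as pooling the peak-to-peak intervals from all three modalities and converting the mean interval to beats per minute; this is an illustrative simplification that assumes, as above, a roughly constant heart rate during the test:

```python
import numpy as np

def fused_heart_rate(intervals_per_modality):
    """Pool the peak-to-peak intervals from the three modalities into
    one collection (the fused histogram of Figure 3.24) and use the
    mean interval as the heart-rate estimate in bpm."""
    pooled = np.concatenate([np.asarray(iv, dtype=float)
                             for iv in intervals_per_modality])
    return 60.0 / pooled.mean()
```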
[Figure 3.21: Histogram of peak-to-peak distances (in seconds) for the fingertip recording; annotated mean interval 0.8358 s]
[Figure 3.22: Histogram of peak-to-peak distances (in seconds) for the face recording; annotated mean interval 0.8870 s]
[Figure 3.23: Histogram of peak-to-peak distances (in seconds) for the audio recording; annotated mean interval 0.9364 s]
[Figure 3.24: Histogram of peak-to-peak distances (in seconds) for the fused data from the three modalities; annotated mean interval 0.8747 s]
Chapter 4
Experimental Results
Our iPhone application was developed with data collected from 70 adults, aged 19 to 62, all without any known history of cardiovascular abnormalities. The sample of participants consisted of 37 females and 33 males, and 11 of the participants had dark skin.
The experiments were conducted in a quiet and well-lit environment. Throughout the experiment, subjects were comfortably seated, holding an iPod in their right hand, with a CMS 50-E pulse oximeter connected to the index finger of their left hand.
To assess the accuracy of our proposed algorithm, we simultaneously recorded the user's heart rate with a pulse oximeter. We chose a pulse oximeter because it is the easiest non-invasive way to measure a user's heart rate, with a known error rate that does not exceed 2% [49]. The heart rate measured by the pulse oximeter during the experiment was recorded and used as the reference against which the results of our proposed algorithm were compared.
4.1 Heartbeat Detection Accuracy without the SVM Classifier
In this section, the accuracy of FIR filtering, the Discrete Wavelet Transform (DWT), and our proposed EMD algorithm is presented and compared. To evaluate each method, we apply the corresponding signal processing algorithm to remove high-frequency noise and baseline wander, then apply the peak detection algorithm to the de-noised signal and fuse the histograms of peak-to-peak distances to retrieve the corresponding heart rate.
To quantify the agreement between the actual heart rate measured by the pulse oximeter and the heart rate estimated by each signal processing algorithm, we compute the Root Mean Square Error (RMSE). Table 4.1 shows the RMSE between the heart rate measured by the pulse oximeter and the heart rate estimated using the FIR, DWT, and EMD filtering algorithms for all 70 subjects. The RMSE is computed for each of the three modalities and for the estimate obtained from their fusion.
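The RMSE over all subjects is the standard quantity; a minimal sketch:

```python
import numpy as np

def rmse(reference_bpm, estimated_bpm):
    """Root Mean Square Error between the pulse-oximeter reference
    heart rates and the algorithm's estimates, over all subjects."""
    ref = np.asarray(reference_bpm, dtype=float)
    est = np.asarray(estimated_bpm, dtype=float)
    return float(np.sqrt(np.mean((ref - est) ** 2)))
```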
From Table 4.1 we can see that, although each of the modalities contains valuable information about the heart rate, the fused data provides a more accurate estimate of the user's heart rate. Moreover, the PPG signals are more informative than the PCG signal, owing to the unavoidable background sounds in the experimental environment. Furthermore, the video recorded from the fingertip yields a more accurate estimate than the video of the face, since the illumination level of the environment and objects covering the face have less effect on it.
Table 4.1: Root Mean Square Error (bpm) between the heart rate measured by the pulse oximeter and the heart rate estimated using FIR, DWT, and EMD filtering for all 70 subjects

                 Fingertip   Face        Audio       Fused
                 Recording   Recording   Recording   Estimation
FIR Filtering    4.8         5.2         6.1         4.7
DWT Filtering    4.7         4.8         5.1         4.5
EMD Filtering    4.1         4.3         5.2         3.8
Table 4.2 shows the average percentage error between the actual heart rate measured by the pulse oximeter and the heart rate estimated using the FIR, DWT, and EMD filtering algorithms.
Table 4.2: Average percentage error between the heart rate measured by the pulse oximeter and the heart rate estimated using FIR, DWT, and EMD filtering for all 70 subjects

                 Fingertip   Face        Audio       Fused
                 Recording   Recording   Recording   Estimation
FIR Filtering    4.4%        4.8%        5.6%        4.3%
DWT Filtering    4.2%        4.5%        4.8%        3.8%
EMD Filtering    3.7%        3.8%        4.7%        3.6%
As the results show, heart rate estimation using our EMD algorithm is more accurate than with the FIR filtering or DWT algorithms. Notably, the EMD algorithm performs well on all three modalities, and heart rate estimation using EMD on the fused data is the most accurate of all: the RMSE between this estimate and the heart rate recorded by the pulse oximeter is 3.8 bpm.
4.2 SVM Classifier Sensitivity
The level of noise in the PPGs recorded from the face and fingertip and in the PCG can distort the heart pulse signal. To determine the sensitivity of the SVM classifier discussed in Chapter 3, we tested it through 20 cross-validation repetitions. For each repetition, we visually assessed the quality of the recordings and categorized them accordingly; 40% of the waveforms were used for training and the remaining 60% for testing the classifier, with the training waveforms chosen at random in each repetition. All simulations used the same SVM model with a linear kernel function, and at the end of the 20 repetitions the average performance measures, sensitivity (Se) and specificity (Sp), were computed using the human classification as ground truth. With good segments treated as the positive class, sensitivity measures the fraction of good segments correctly classified as good, whereas specificity measures the fraction of bad segments correctly classified as bad. The sensitivity (Se) and specificity (Sp) are defined as:
Se = \frac{TP}{TP + FN}    (4.1)

Sp = \frac{TN}{TN + FP}    (4.2)
where TP is the number of good waveform segments correctly identified as good, TN is the number of bad waveform segments correctly classified as bad, FP is the number of bad waveform segments incorrectly classified as good, and FN is the number of good waveform segments incorrectly classified as bad.
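Given the per-segment ground-truth and predicted labels, equations (4.1) and (4.2) can be computed directly; a minimal sketch with "good" as the positive class:

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Equations (4.1) and (4.2): TP = good predicted good,
    TN = bad predicted bad, FP = bad predicted good,
    FN = good predicted bad."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == "good" and p == "good")
    tn = sum(1 for t, p in pairs if t == "bad" and p == "bad")
    fp = sum(1 for t, p in pairs if t == "bad" and p == "good")
    fn = sum(1 for t, p in pairs if t == "good" and p == "bad")
    return tp / (tp + fn), tn / (tn + fp)
```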
For classification purposes, as described earlier, each waveform is divided into two 5-second segments, giving 56 waveform segments per modality for training the classifier. Tables 4.3, 4.4, and 4.5 report the results of testing the SVM classifier on the remaining 84 waveform segments of the PPG signal recorded from the fingertip, the PPG signal recorded from the face, and the PCG signal, respectively.
For each training-testing split, we ran the simulation three times, using the PPG signal de-noised by FIR filtering, by the DWT de-noising algorithm, and by our proposed EMD algorithm. Averaged over the 20 cross-validation repetitions, Table 4.3 shows the sensitivity and specificity of the SVM classifier on each of the three de-noised signals.
Table 4.3: Sensitivity and specificity of the SVM classifier for 84 de-noised five-second waveform segments of the PPG signal recorded from the fingertip

                 Average Sensitivity   Average Specificity
FIR Filtering    85%                   86%
DWT De-noising   82%                   87%
EMD Algorithm    89%                   95%
From Table 4.3 we can see that the SVM classifier performs best on the waveform segments de-noised by the EMD algorithm, with sensitivity and specificity of 89% and 95%, respectively. Also, the sensitivity of the SVM classifier is higher on signals de-noised by FIR filtering than on those de-noised by the DWT algorithm,
whereas the specificity of the SVM classifier on the latter is better than on the former.
A second observation from Table 4.3 is that the average specificity exceeds the average sensitivity for all three types of de-noised waveform segments. This indicates that the SVM classifier identifies bad waveforms more reliably than good ones. This occurs because the peak detection algorithm cannot detect peaks as reliably as the human eye; hence more good waveform segments are classified as bad, lowering the sensitivity of the SVM classifier.
Table 4.4 shows the sensitivity and specificity of the SVM classifier on the three types of de-noised PPG signals recorded from the face (FIR filtering, DWT, and EMD), obtained with the same cross-validation procedure as before. Here again, the sensitivity and specificity are highest on the segments de-noised by our EMD algorithm, and in all three cases the specificity exceeds the sensitivity.
Table 4.4: Sensitivity and specificity of the SVM classifier for 84 de-noised five-second waveform segments of the PPG signal recorded from the face

                 Average Sensitivity   Average Specificity
FIR Filtering    72%                   81%
DWT De-noising   69%                   73%
EMD Algorithm    74%                   92%
Furthermore, the performance of the SVM classifier on PPG signals recorded from the fingertip is better than on PPG signals recorded from the face. As expected, this shows that when noise contaminates the PPG signal recorded from the face, it is intense enough to distort the pulse signal. The higher noise level in the face PPG is likely due to varying illumination levels in the environment during the experiment and to movement of the device during recording.
Table 4.5 shows the sensitivity and specificity of the SVM classifier for the de-noised PCG signals, using the previously described cross-validation procedure. The specificity of the SVM classifier is highest on the PCG de-noised by the EMD algorithm, at 91%. The highest sensitivity, however, is obtained on the PCG de-noised by the DWT (86%), slightly above that of the EMD algorithm (84%). The performance of the SVM classifier on the PCG waveform segments is better than its performance on the PPG signal recorded from the face, but still worse than its performance on the PPG signal recorded from the fingertip. Again, this reflects the fact that with no background noise in the environment, the device records the heart sound well, but when background noise is present the heart pulse signal may be distorted, leading to a poor heart rate estimate.
Table 4.5: Sensitivity and specificity of the SVM classifier for 84 de-noised five-second waveform segments of the PCG signal

                 Average Sensitivity   Average Specificity
FIR Filtering    78%                   82%
DWT De-noising   86%                   88%
EMD Algorithm    84%                   91%
4.3 Heartbeat Detection Accuracy Using the SVM Classifier
In this section we explore the accuracy of the heartbeat detection algorithms when the SVM classifier is used as a post-processing step. We first filtered the signal using one of the FIR, DWT, or EMD filtering algorithms. Then, we ran the peak detection algorithm to detect the heart pulse peaks in the recorded signals. From the peak locations and the per-modality heart rate estimates, we extracted the classification features of the SVM classifier: as described in the previous chapter, the Fraction of Aligned Waves and the Pulse Wave Variability. Next, we applied the SVM classifier with a linear kernel to classify the waveform segments as good or bad using these two features. Finally, we computed the heart rate from the waveform segments classified as good.
To measure the performance of the proposed heartbeat detection system with the SVM classifier, we tested it through 20 cross-validation repetitions employing the manually categorized waveform samples. In each repetition, 40% of the samples were used for training and the remaining 60% for testing the classifier, with the training waveforms chosen at random. For each training-testing split, we ran the simulation three times, using the signals de-noised by FIR filtering, by the DWT de-noising algorithm, and by our proposed EMD algorithm.
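One repetition of the random 40/60 split above can be sketched as follows; with the 140 segments per modality described earlier this yields 56 training and 84 testing segments. The function name and seed handling are illustrative; in the actual system each split would be followed by training a linear-kernel SVM on the training indices:

```python
import random

def split_train_test(num_segments, train_fraction=0.4, seed=0):
    """One cross-validation repetition: draw a random 40% of the
    waveform segments for training and hold out the remaining 60%
    for testing."""
    indices = list(range(num_segments))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * num_segments)
    return indices[:n_train], indices[n_train:]
```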
The results of the heartbeat detection algorithm with the SVM classifier, for each of the signal filtering methods, are shown in Table 4.6. As the results show, adding the SVM classifier to the heartbeat estimation system improves performance in terms of the RMSE between the estimated heart rate and the heart rate measured by the pulse oximeter. The major effect of adding the SVM classifier as a post-processing analysis