Multi-Modal Heart-Beat Estimation On an iPhone
by
Narges Norouzi
A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto

Copyright © 2014 by Narges Norouzi
Abstract
Multi-Modal Heart-Beat Estimation On an iPhone
Narges Norouzi
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2014
Current-generation smartphone video cameras and microphones enable photoplethysmography (PPG) and phonocardiography (PCG) acquisition. In this thesis, the iPhone microphone and camera are used to measure heart rate. We developed a heart rate measurement system with three sensing mechanisms (finger color changes, face color changes, and heart sound measurement), all on the iPhone. The three measurement systems each provide an independent heart rate estimate, and a combined estimate is produced by fusing the individual sensors.

The proposed algorithm estimates the heart rate by (1) analyzing the heart pulse to compute the user's heart rate using our version of the EMD algorithm, which is widely used in advanced biomedical signal processing, (2) assessing the quality of the PPG and PCG waveforms using a Support Vector Machine (SVM) classifier, and (3) combining heart rate information from the three different modalities based on the assessed quality of the waveforms.
I dedicate my MASc thesis to my dear parents, Malek Norouzi and
Fakhrolsadat Nabavi, for the advice, guidance, and opportunities they have
provided me throughout my personal and professional life.
Acknowledgements
I would like to give special thanks to Prof. Parham Aarabi, who has supported me
throughout my Master's degree: keeping me going when times were tough, asking
insightful questions, and offering invaluable advice.
I would also like to thank my colleagues Mike and Gary for helping with data
collection and with the development and implementation of the iPhone application.
Contents

1 Introduction

2 Background
   2.1 Cardiovascular System
   2.2 Prior Work on Heart Rate Monitoring
       2.2.1 Photoplethysmography
             2.2.1.1 Photoplethysmography by Smartphone's Video Camera
       2.2.2 Phonocardiography
       2.2.3 Heart Rate Monitoring on Smartphones
       2.2.4 Signal Processing Algorithms
             2.2.4.1 Empirical Mode Decomposition
                     2.2.4.1.1 Applications of the EMD Algorithm
             2.2.4.2 Ensemble Empirical Mode Decomposition
             2.2.4.3 The Fourier Transform and STFT
             2.2.4.4 Wavelet Transform (WT)

3 Application Architecture
   3.1 Fingertip Processing Unit
   3.2 Face Processing Unit
   3.3 Audio Processing Unit
   3.4 Heartbeat Estimation Algorithm
   3.5 Signal Filtering Algorithms
       3.5.1 FIR Filtering
       3.5.2 EMD Algorithm
             3.5.2.1 Decomposition
             3.5.2.2 Reconstruction
       3.5.3 Wavelet Transform
   3.6 Peak Detection Algorithm
   3.7 Learning System for Heart Rate Estimation Based on Support Vector Machines (SVMs)
       3.7.1 SVM Classifier
       3.7.2 SVM Classifier Implementation
   3.8 Multi-Channel Heart Rate Estimation

4 Experimental Results
   4.1 Heartbeat Detection Accuracy without the Use of the SVM Classifier
   4.2 SVM Classifier Sensitivity
   4.3 Heartbeat Detection Accuracy Using the SVM Classifier

5 Conclusion

Bibliography

Appendices

A Acronyms
List of Tables

2.1 Heart rate for different ages
4.1 Root mean square error between heart rate measured by the pulse oximeter and heart rate estimated using FIR, DWT, and EMD filtering for all 70 subjects
4.2 Average percentage error between heart rate measured by the pulse oximeter and heart rate estimated using FIR, DWT, and EMD filtering for all 70 subjects
4.3 Sensitivity and specificity of the SVM classifier for 84 de-noised 5-second waveform segments of the PPG signal recorded from the fingertip
4.4 Sensitivity and specificity of the SVM classifier for 84 de-noised 5-second waveform segments of the PPG signal recorded from the face
4.5 Sensitivity and specificity of the SVM classifier for 84 de-noised 5-second waveform segments of the PCG signal
4.6 Root mean square error between the heart rate measured by the pulse oximeter and heart rate estimated using the SVM classifier and FIR, DWT, and EMD filtering for all 70 subjects
4.7 Average percentage error between heart rate measured by the pulse oximeter and heart rate estimated using the SVM classifier and FIR, DWT, and EMD filtering for all 70 subjects
List of Figures

2.1 A typical ECG signal
2.2 General scheme to record video for PPG acquisition
2.3 Phonocardiogram, copied from [19]
2.4 Auscultation areas on the chest
2.5 Sifting process and envelopes
2.6 Decomposition of a sample ECG signal into its first 12 IMFs
2.7 Discrete Wavelet Transform decomposition
3.1 Block diagram of the application architecture
3.2 Video recording from the fingertip using the back camera
3.3 Region of interest in each frame
3.4 Example of fingertip data obtained by the described capture method
3.5 Video recording from the face using the front-facing camera
3.6 Example of a PPG signal from the face obtained by the described capture method
3.7 Example of a PCG recorded by the microphone, with identifiable heart sounds
3.8 Elements of the proposed algorithm
3.9 FIR filtering module
3.10 Sample data captured from the device, already filtered and down-sampled as specified
3.11 Sample PPG recording with baseline wander, the baseline, and the clean PPG after removing the baseline
3.12 Original PPG recorded from the fingertip and the clean PPG after applying FIR filtering
3.13 Original PPG recorded from the face and the clean PPG after applying FIR filtering
3.14 Original PCG recorded from the user's chest and the clean PCG after applying FIR filtering
3.15 Original PPG recorded from the fingertip and the decomposed IMFs using the EMD algorithm
3.16 Decomposition of the PPG signal into IMFs using the EMD algorithm
3.17 Power spectral density of the decomposed IMFs
3.18 Original PPG recorded from the fingertip and the clean signal after applying the EMD algorithm and reconstructing the signal based on the power spectral density of the IMFs
3.19 Block diagram of the wavelet decomposition and reconstruction
3.20 An illustration of the peak detection algorithm
3.21 Histogram of peak-to-peak distances from the fingertip recording
3.22 Histogram of peak-to-peak distances from the face recording
3.23 Histogram of peak-to-peak distances from the audio recording
3.24 Histogram of peak-to-peak distances from the three modalities
4.1 Comparison of the accuracy of the proposed heartbeat rate estimation system with and without the SVM classifier, in terms of RMSE
4.2 Comparison of the accuracy of the proposed heartbeat rate estimation system with and without the SVM classifier, in terms of average percentage error
Chapter 1
Introduction
Advancements in sensor technology allow for new models of automated healthcare
monitoring. Currently, specialized devices such as electrocardiographs, pulse oximeters,
and phonocardiographs are used to measure heart rate. Furthermore, wireless heart
rate monitors are widely available and provide users with real-time estimates of their
HR at rest, as well as during and after physical activity [1]. However, wireless HR monitors
often require wearing a strap around the chest or arm. As heart rate monitors become
widely distributed as low-cost physiological measurement solutions, the alternative
idea of using smartphones as heart rate monitors has emerged. With heart rate
monitoring applications on smartphones, people no longer need to carry dedicated heart
rate monitors, which is much more convenient.
In recent years, automated health monitoring with mobile smartphones has become
a subject of great interest [2]. In particular, there has been significant interest in
accurately estimating heartbeat frequency using the smartphone's built-in camera,
accelerometer, gyroscope, and microphone. This is a relatively easy way to measure a
user's heart rate, since it requires neither special skills nor the purchase of special
devices; all that is needed is a smartphone with on-board sensors.
Monitoring heart rate using a smartphone is important as a non-invasive, remote
health care monitoring option. Monitoring heart rate is important both during exercise
and at rest. A low resting heart rate is a good indicator of aerobic fitness and is
associated with a reduced risk of heart attack. Likewise, measuring heart rate before,
during, and after exercise improves the quality of a workout and helps ensure the safety
of the fitness program.
Currently, there are many applications in the App Store that measure the user's
heart rate using either auscultation or pulse oximetry. Auscultation is used in "Heart
Monitor for iPhone" [21] and "Heart Record" [22]. Pulse oximetry is performed through
the finger in applications such as "Instant Heart Rate" [24] and "Heart Beat Rate" [25],
and through the face in "Cardiio" [28] and "Touchless Pulse Monitor" [29].
In this thesis, an iPhone application is developed and tested in order to demonstrate
the potential of the iPhone for measuring heartbeat rate in real time. This application
makes use of the iPhone's front-facing and back cameras and its microphone for PPG
and PCG acquisition in order to provide an estimate of the user's heart rate. The goal of
this project is to make the application fully functional, providing people with precise and
useful wellness information based on their pulse, using more advanced signal processing
techniques such as EMD and machine learning algorithms.

The heartbeat estimation system proposed in this thesis estimates the
user's HR by (1) computing the HR of the input PPG and PCG signals using our version
of the Empirical Mode Decomposition (EMD) algorithm, (2) assessing the quality of the
input signals in order to distinguish between good and bad waveform segments using an
SVM classifier, and (3) combining heart rate information from the three different
modalities based on the assessed quality of the signals.
Finally, a quantitative comparison between EMD-based filtering and other filtering
algorithms, such as the Discrete Wavelet Transform (DWT) and FIR filtering, is conducted.
Chapter 2
Background
2.1 Cardiovascular System
The circulatory system is responsible for delivering nutrients, water, and oxygen to
the body's cells and for carrying away waste products, such as the carbon dioxide that
cells produce. The heart serves as a pump that delivers blood to the body's tissues. It
does so by undergoing a repeated cycle of contraction and relaxation called the cardiac
cycle. The heart comprises atria and ventricles that pump blood in each cardiac cycle.

The heart rate, or pulse rate, is typically expressed as the number of beats per minute
(bpm). The pulse rate varies with the body's physical and psychological state and with
factors such as age, physical exercise, anxiety, stress level, and drugs. Although a high
pulse rate can indicate abnormal heart activity and can help identify various problems
within the body, it cannot be used on its own to diagnose an abnormality. Table 2.1
shows the heartbeat range for different ages.
Table 2.1: Heart rate for different ages

  Age            Heart rate (bpm)
  Newborn        100-160
  0-5 months     90-150
  6-12 months    80-140
  1-3 years      80-130
  3-5 years      80-120
  6-10 years     70-110
  11-14 years    60-105
  14+ years      60-100
2.2 Prior Work on Heart Rate Monitoring

Heart rate is the rate at which the heart beats, typically measured at the wrist or
neck and expressed in beats per minute. The pulse can be felt directly at the wrist or
neck by pressing with the index and middle fingers. A more precise method of determining
heart rate involves the use of an electrocardiograph (ECG) or a photoplethysmograph
(PPG). The ECG monitors the electrical changes occurring during the cardiac cycle from
the surface of the body. A normal ECG recording associated with a single cardiac cycle
contains three waveforms (Figure 2.1).
The P wave shows the sequential activation of the right and left atria. The QRS
complex (which consists of the Q, R, and S waves) represents the simultaneous activation
of the right and left ventricles. The last waveform, the T wave, is triggered by the
repolarization of the ventricles.
Figure 2.1: A typical ECG signal
In the rest of this section, the two other non-invasive methods for measuring heart
rate, photoplethysmography and phonocardiography, are described in detail.
2.2.1 Photoplethysmography
Non-invasive measurement of temporal variations in blood volume by pulse oximetry
is acknowledged to be one of the most important technological advances in monitoring
a patient's heart rate in clinical settings [3]. The photoplethysmograph, first
introduced by Hertzman [4], is composed of a light source and a photo-detector. In
photoplethysmography, a sensor is placed on a thin part of the patient's body, such as
the ear lobes, fingertips, or toes, where a high degree of superficial vasculature exists.
The photoplethysmogram (PPG) waveform is formed by measuring the amount of
light passing through the skin and represents the changes in the shape of the pulse. This
phenomenon is caused by the absorption of light by the capillaries, which fill with blood
during each heartbeat cycle, so that less light can pass through them. The PPG obtained
from pulse oximetry has been shown to be effective for estimating other important
physiological features, such as blood oxygen saturation and breathing rate [5].
2.2.1.1 Photoplethysmography by Smartphone's Video Camera
Two popular portable devices for measuring the heart rate are pulse oximeters (which
attach to one of the fingers) and heart rate monitors (which use a belt to electronically
detect the heart rate and relay that information to a specially designed watch). For
these types of "standalone" products to reach the public, significant cost is involved,
and the user must purchase a device designed for a single purpose, which is inconvenient.

A more convenient alternative is to create a smartphone application that uses the
hardware of the phone to capture the heart rate. Most recent smartphones are equipped
with high-resolution cameras and LEDs, a configuration very similar to the construction
of a pulse oximeter. Users place their finger on the smartphone's camera, covering both
the camera and the LED. A schematic of video recording for PPG acquisition on a
smartphone is shown in Figure 2.2.
[6] proposed using the smartphone's camera for PPG acquisition. The waveform
acquisition was done on a Nokia E63, and it was reported that the green channel signal
is more informative than the red channel signal. However, [7] showed that the
distribution of pixel intensities in the green channel is not uniform across different
smartphones (such as the HTC HD2, iPhone 4, Nokia, and Samsung devices), while the
red channel characteristics are similar across smartphones.
Figure 2.2: General scheme to record video for PPG acquisition

The PPG signal acquisition on smartphones utilizes the same image acquisition
concept that is used in pulse oximeters. In order to distinguish oxygenated from
deoxygenated blood based on blood opacity, the average of the red channel intensities
in each frame is calculated, and the plot of average red channel intensity over time
serves as the PPG.
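As a concrete illustration, the per-frame red-channel averaging described above can be sketched as follows (a minimal sketch; the frame-array layout and RGB channel ordering are assumptions for illustration, not part of the original system):

```python
import numpy as np

def ppg_from_frames(frames):
    """Raw PPG trace from video frames.

    `frames` is assumed to have shape (n_frames, height, width, 3) with RGB
    channel order; the mean red-channel intensity of each frame becomes one
    PPG sample.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Channel 0 is red under the assumed RGB ordering.
    return frames[:, :, :, 0].mean(axis=(1, 2))

# Synthetic 30-frame clip whose red channel pulses over time.
t = np.arange(30)
red = 128 + 20 * np.sin(2 * np.pi * t / 15)
clip = np.zeros((30, 4, 4, 3))
clip[:, :, :, 0] = red[:, None, None]
trace = ppg_from_frames(clip)
```

The peaks of such a trace correspond to cardiac pulses, so the heart rate follows from the peak-to-peak intervals and the camera's frame rate.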
To estimate the heart rate reliably from a PPG signal recorded by a smartphone,
the effect of finger pressure on the camera lens, finger movement during recording,
and the illumination level of the environment must be taken into account. Several
methods that consider these factors have been proposed in the PPG processing literature.
[8] introduced the idea of removing motion artifacts from the PPG signal for the
accurate measurement of arterial oxygen saturation during movement, using a combination
of Independent Component Analysis (ICA) and block interleaving with low-pass filtering
to reduce the motion artifact. Enriquez et al. [9] studied the plethysmographic signal
using Principal Component Analysis (PCA) and claimed that clinically relevant
parameters can be obtained from the PPG when PCA is used. Furthermore, [10] presented
a real-time de-noising algorithm for PPG and ECG signals for measuring pulse rate and
blood pressure using the Discrete Wavelet Transform (DWT). Additionally, [11] reduced
the influence of force variation on heart rate estimation by means of the Continuous
Wavelet Transform (CWT); in their study, the experiment was conducted under three
different force conditions: low, medium, and high.
In another category of PPG processing algorithms, the idea of decomposing the signal
into Intrinsic Mode Functions (IMFs) using Empirical Mode Decomposition (EMD) is
introduced [12].
Finally, several data-driven decision-support systems have been developed in order to
produce meaningful results from physiological data, mainly PPG signals. [13] and [14]
used a Support Vector Machine (SVM) and a Neural Network (NN), respectively, to
assess the PPG signal and extract heart rate information.
2.2.2 Phonocardiography
Heart sounds are an essential tool in the clinical setting and provide clinicians with
valuable diagnostic information on heart disease. However, the phonocardiogram (PCG)
is a complex signal to analyze visually, and heart auscultation can take several years to
learn; it also involves a high degree of subjectivity. Nevertheless, its low cost keeps
phonocardiography among the most widely used clinical techniques.

Phonocardiography breaks the heartbeat into four distinct sections. The first sound
("S1") occurs at the onset of systole and is produced by the closure of the
atrioventricular valves, audibly heard as the "lub" in the popular "lub-dub" description
of the heart [15]. The second sound ("S2") occurs at the onset of diastole and is
produced by the closure of the semilunar valves; this is the "dub". These are considered
the normal heart sounds [16]. S1 and S2 can be clearly heard when listening to a
patient's heart with a stethoscope.
The next two possible sounds (S3 and S4) are generally abnormal in adults and
produce a distinct "galloping" heartbeat [17]. Finally, there is a class of sounds called
"murmurs" that can occur during any of the four phases and are caused by various
abnormalities of the heart valves. Detection and analysis of these murmurs is often
critical in the diagnosis of heart problems [18]. Figure 2.3 illustrates normal and
abnormal PCG signals and their corresponding ECG signal.
Figure 2.3: Phonocardiogram copied from [19]
There are four main areas on the patient's chest (Figure 2.4) that are optimal sites
for auscultation. At these sites the intensity of the heart sound is highest, because the
sound is transmitted through solid tissue or through a minimal thickness of inflated
lung.
Figure 2.4: Auscultation areas on the chest
2.2.3 Heart Rate Monitoring on Smartphones

For many years, people measured heart rate by listening to the heart's sounds through
the patient's chest. At the start of the 20th century, Einthoven developed the
electrocardiograph (ECG). With an ECG, it is possible to record the electrical changes
during each heartbeat cycle and make a graphic recording of this activity.
In the 1980s, the first wireless Heart Rate Monitor (HRM) consisting of a transmitter
and a receiver was developed. The transmitter could be attached to the chest using
either disposable electrodes or an elastic electrode belt. The receiver was a watch-like
monitor worn on the wrist [20]. The development of this relatively small wireless monitor
resulted in increased utilization of HRMs by sportsmen. As a consequence, the objective
measure of HR replaced the more subjective perceived exertion as an indicator of exercise
intensity. Another relatively recent development in HR monitoring is the measurement
of Heart Rate Variability (HRV) that may have various applications. These features and
their reliability and validity will be discussed in the following sections.
The latest category of devices for monitoring heart rate is the smartphone. Currently,
the App Store contains applications that measure the user's heart rate using either
auscultation or pulse oximetry. Auscultation is used in "Heart Monitor for iPhone" [21]
and "Heart Record" [22]. Pulse oximetry is performed through the finger in "Heart Rate
- Free" [23], "Instant Heart Rate" [24], "Heart Beat Rate" [25], "Runtastic" [26], and
"HeartTracker" [27].
Pulse oximetry can also be done through the face, as in "Cardiio" [28], "Touchless
Pulse Monitor" [29], and "What's My Heart Rate" [30]. Users hold the iPhone roughly
six inches (15 cm) in front of them and line up their face inside a guiding box. Cardiio
uses only the front-facing camera for video recording and claims an accuracy within
3 beats per minute of a clinical pulse oximeter [31]; it can estimate heart rate through
either the face or the fingertip. In the "What's My Heart Rate" application, users can
switch between the front-facing and back cameras in order to measure the heart rate of
others, and the premium version also measures breathing rate.
Another category of mobile applications uses an external heart rate monitor, such as
a chest strap, to measure the heartbeat rate; the data is then transferred to the
smartphone and recorded in the user's history. This type of application is mostly used
for tracking workouts and monitoring the heart rate before, during, and after each
workout. The "Digifit iCardio Multi-Sport Heart Rate Monitor Training" application is
an example that uses a heart rate monitor strap to monitor the heart pulse during
workouts [32]. Another example is "Fitbeat Heart Rate Monitor", which works with a
5.3 kHz un-coded heart rate belt or with Bluetooth smart devices.

In addition to the above heart rate monitors, there is currently a paid app developed
by Azumio, "Stress Check Pro", that uses a pulse oximetry technique to estimate the
user's stress level [33].
One note is that most of these applications do not try to infer any information from
the heartbeat other than the heart rate, and they all use only one method to obtain
that information. It should also be pointed out that most developers of these mobile
applications state that their apps are intended for "informational and entertainment
purposes only" and should not be used in place of professional medical equipment.
Different studies have explored the potential of the smartphone to estimate heart
rate. [34] used video recorded from the user's face with the front-facing camera of an
iPhone 4 as an indication of the PPG signal. They then detected the facial region in
each frame and extracted the cardiac pulse signal using frequency analysis of the raw
trace signal and of the signal analyzed with ICA.

Laure et al. analyzed the PPG recorded from the user's fingertip and introduced two
different peak detection algorithms for HR estimation [35], [36]. They applied the two
proposed peak detection algorithms to a set of 50 test measurements. In 20% of the
calculations using the peak detection algorithm in [35], the estimated values differed
from the real heart rate by more than 5%. Applying the peak detection algorithm
proposed in [36] to the same data yielded 8% incorrect calculations.
2.2.4 Signal Processing Algorithms
Several signal processing algorithms can be used to remove the noise added to the
original PPG and PCG signals recorded by a smartphone. The noise in the PPG signal
normally arises from varying illumination levels during video recording, motion artifacts
introduced by face or finger movement, varying finger pressure on the camera, and
objects covering part of the face. The noise present in the recorded PCG arises from
background noise in the environment and from movement of the phone on the chest
during recording.

This section provides an overview of the different signal processing algorithms used
for processing PPG and PCG signals in the literature.
2.2.4.1 Empirical Mode Decomposition
Empirical Mode Decomposition (EMD) [37] is a method of breaking down a signal
without leaving the time domain. EMD is a recursive method introduced to analyze
non-linear and non-stationary signals, such as biomedical signals [39]. The algorithm
decomposes the original signal into a collection of Intrinsic Mode Functions (IMFs)
using a numerical sifting process.

An IMF must fulfill two conditions: (i) the number of extrema and the number of
zero crossings must be equal or differ at most by one; and (ii) the mean of the upper
and lower envelopes is zero everywhere.
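These two conditions can be checked numerically. The sketch below verifies condition (i) by counting sign changes, and approximates condition (ii) by the signal mean rather than by a full envelope computation, which is a deliberate simplification for illustration:

```python
import numpy as np

def satisfies_imf_conditions(h, tol=1e-12):
    """Check the two IMF conditions on a sampled signal `h`:
    (i) the numbers of extrema and zero crossings differ by at most one;
    (ii) zero-mean envelopes, approximated here by the signal mean
    (a simplification; a full check would construct both envelopes).
    """
    h = np.asarray(h, dtype=np.float64)
    d = np.diff(h)
    # Extrema: sign changes in the first difference.
    extrema = int(np.sum(np.sign(d[:-1]) * np.sign(d[1:]) < 0))
    # Zero crossings: sign changes in the signal itself.
    s = np.sign(h)
    zero_crossings = int(np.sum(s[:-1] * s[1:] < 0))
    return abs(extrema - zero_crossings) <= 1 and abs(h.mean()) < tol
```

A pure sine sampled over whole periods passes both checks, while the same sine with a DC offset fails the zero-mean check.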
The sifting process can be separated into the following steps:

• For a signal x(t), let m_1 be the mean of its upper and lower envelopes, as
determined from a cubic-spline interpolation of the local maxima and minima. The
locality is determined by a parameter on which the calculation time and the
effectiveness of the EMD depend greatly.
• The first component h_1 is computed:

h_1 = x(t) - m_1. (2.1)
• In the second sifting iteration, h_1 is treated as the data, and m_{11} is the mean
of h_1's upper and lower envelopes:

h_{11} = h_1 - m_{11}. (2.2)
• This sifting procedure is repeated k times, until h_{1k} is an IMF, that is:

h_{1(k-1)} - m_{1k} = h_{1k}. (2.3)
• Then c_1 = h_{1k} is designated the first IMF component of the data, which
contains the shortest-period component of the signal. We separate it from the rest
of the data:

x(t) - c_1 = r_1. (2.4)
• The procedure is repeated on the residues r_j: r_1 - c_2 = r_2, ..., r_{n-1} - c_n = r_n.
• Thus the original signal x(t) can be expressed as

x(t) = \sum_{j=1}^{n} c_j(t) + r_n(t), (2.5)

where c_j(t) is the j-th IMF and r_n(t) is the residue.
The sifting process, with the data and its upper, lower, and mean envelopes, is shown
in Figure 2.5.
Figure 2.5: Sifting process and envelopes (data, upper envelope, lower envelope, and mean)
The stoppage criterion determines the number of sifting steps to produce an IMF.
Two different stoppage criteria have been used traditionally:
1. The first criterion was proposed by Huang et al. [38] and is defined as a sum of the
differences, SD:

SD_k = \frac{\sum_{t=0}^{T} |h_{k-1}(t) - h_k(t)|^2}{\sum_{t=0}^{T} h_{k-1}^2(t)}. (2.6)

The sifting process stops when SD is smaller than a pre-given value.
2. The second criterion is based on the S-number, defined as the number of consecutive
sifting iterations for which the numbers of zero-crossings and extrema are equal or
differ at most by one. Specifically, an S-number is pre-selected, and the sifting
process stops only if, for S consecutive iterations, the numbers of zero-crossings
and extrema stay the same and are equal or differ at most by one.
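Putting the sifting steps (Eqs. 2.1-2.5) and the SD stoppage criterion (Eq. 2.6) together, a minimal EMD sketch in Python/SciPy might look as follows; the endpoint handling of the envelopes, the fixed iteration cap, and the extrema-based stopping rule for the residue are simplifying assumptions, not part of the formal algorithm:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def mean_envelope(x):
    """Mean of cubic-spline upper and lower envelopes; the signal endpoints
    are appended to both extrema sets as a crude boundary treatment."""
    t = np.arange(len(x))
    maxima = np.concatenate(([0], argrelextrema(x, np.greater)[0], [len(x) - 1]))
    minima = np.concatenate(([0], argrelextrema(x, np.less)[0], [len(x) - 1]))
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    return 0.5 * (upper + lower)

def sift_imf(x, sd_tol=0.3, max_iter=50):
    """Extract one IMF by repeated sifting, stopping when the SD
    criterion of Eq. 2.6 drops below sd_tol."""
    h = x.astype(np.float64)
    for _ in range(max_iter):
        h_new = h - mean_envelope(h)                    # Eqs. 2.1-2.3
        sd = np.sum((h - h_new) ** 2) / np.sum(h ** 2)  # Eq. 2.6
        h = h_new
        if sd < sd_tol:
            break
    return h

def emd(x, n_imfs=4):
    """Decompose x into IMFs and a residue (Eqs. 2.4-2.5)."""
    imfs, r = [], np.asarray(x, dtype=np.float64)
    for _ in range(n_imfs):
        # Stop when the residue has too few extrema to form envelopes.
        if argrelextrema(r, np.greater)[0].size == 0 or \
           argrelextrema(r, np.less)[0].size == 0:
            break
        c = sift_imf(r)
        imfs.append(c)
        r = r - c
    return imfs, r

# Two-tone demo signal: a fast and a slow oscillation.
t = np.linspace(0.0, 2.0, 400)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 0.5 * t)
imfs, residue = emd(x, n_imfs=2)
```

By construction, the extracted IMFs and the residue sum back exactly to the original signal, which is the completeness property noted below.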
The EMD decomposes non-stationary signals into narrow-band components of
decreasing frequency. The decomposition is complete, almost orthogonal, local, and
adaptive. All IMFs together form a complete and nearly orthogonal basis for the original
signal. Because the basis comes directly from the signal, it preserves the inherent
characteristics of the signal and avoids the diffusion and leakage of signal energy. The
sifting process eliminates riding waves, so each IMF is more symmetrical and is in fact
a zero-mean AM-FM component. An example of the decomposition of an ECG signal
into its first 12 IMFs is shown in Figure 2.6.
Figure 2.6: Decomposition of sample ECG signal into its first 12 IMFs.
Mode mixing appears to be the most significant drawback of the EMD algorithm. It
refers either to a single IMF containing components of dramatically disparate scales, or
to components of the same scale appearing in different IMFs, and it is usually caused
by intermittency in the analyzed signal.
2.2.4.1.1 Applications of EMD Algorithm Nimunkar in [39] implemented the
EMD algorithm for filtering noisy ECG signals and compared the result with a
traditional low-pass filtering approach. Tong et al. in [40] used empirical mode
decomposition to filter power-line noise in electrocardiogram signals. They added
pseudo-noise at a frequency higher than the highest frequency of the signal, so that
the power-line noise could be filtered out in the first IMF alone. They also compared
the results with traditional IIR-based bandstop filtering. This technique can also be
used to filter power-line noise when enhancing stress ECG signals. Furthermore, [41]
used EMD and PCA algorithms to obtain cardiovascular signals from sensing hardware
embedded in a chair.
In another study, [42] showed that its proposed EMD-based method provides better noise
reduction than wavelet-thresholding de-noising methods, both in preserving the
geometrical characteristics of the ECG signal and in signal-to-noise ratio (SNR).
The steps for de-noising the ECG signal using the EMD, as proposed by [42], are:
• Transform the noisy ECG signal s(k) by EMD; let ci denote the IMF of the EMD
at scale i, where i = 1, 2, ..., n
• Calculate the mean square value δi at scale i; the threshold ti can then be
determined by the 3δ rule
• Apply the hard-thresholding method to obtain the estimated IMFs c̃i as follows:
\tilde{c}_i(k) = \begin{cases} c_i(k) & \text{if } |c_i(k)| \geq t_i \\ 0 & \text{if } |c_i(k)| < t_i \end{cases} \quad (2.7)
• Reconstruct the de-noised ECG signal s(k) from c̃i(k)
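The de-noising steps above can be sketched in Python. This is a minimal illustration that assumes the IMFs are already available as NumPy arrays, and takes one common reading of the 3δ rule, namely t_i = 3 · RMS(c_i):

```python
import numpy as np

def emd_denoise(imfs):
    """Hard-threshold each IMF per the 3-delta rule, then sum (sketch).

    `imfs` is a list of NumPy arrays c_i(k) from an EMD decomposition
    (assumed given); here t_i = 3 * sqrt(mean(c_i^2)), one reading of
    the 3-delta rule."""
    clean = np.zeros_like(imfs[0])
    for c in imfs:
        t = 3.0 * np.sqrt(np.mean(c ** 2))       # t_i from the 3-delta rule
        clean += np.where(np.abs(c) >= t, c, 0)  # Eq. (2.7): hard thresholding
    return clean
```

Small-amplitude noise within each IMF is zeroed, while the large excursions (e.g. QRS-like peaks) survive into the reconstruction.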
2.2.4.2 Ensemble Empirical Mode Decomposition
Ensemble EMD (EEMD) was introduced to remove the mode-mixing effect. The EEMD
largely overcomes the mode-mixing problem of the original EMD by repeatedly adding white
noise to the target signal, and provides physically unique decompositions when
applied to data with mixed and intermittent scales.
The EEMD decomposing process can be separated into the following steps:
• Add a white noise series w(t) to the target data x(t); the noise must be zero-mean
with constant variance, so X(t) = x(t) + w(t).
• Decompose the data with added white noise into Intrinsic Mode Functions (IMFs) cj
and residue rn:

X(t) = \sum_{j=1}^{n-1} c_j + r_n \quad (2.8)
• Repeat steps 1 and 2 N times, each time with a different white noise series wi(t).
So,

X_i(t) = \sum_{j=1}^{n-1} c_{ij} + r_{in} \quad (2.9)
• Obtain the ensemble means of the corresponding IMFs of the decompositions as the
final result; each final IMF is the average of the corresponding IMF over the N trials:

c_j = \frac{1}{N} \sum_{i=1}^{N} c_{ij} \quad (2.10)
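The ensemble-averaging loop can be sketched as follows. This is a minimal illustration: `emd` stands for any EMD routine with a fixed-length IMF output, a simplifying assumption, since a real EMD may return a varying number of IMFs per trial:

```python
import numpy as np

def eemd(x, emd, n_trials=100, noise_std=0.2, rng=None):
    """EEMD sketch: average IMFs over noisy trials, Eq. (2.10).

    `emd` is a hypothetical interface: a function returning a fixed-length
    list of IMFs for a signal the same length as x."""
    rng = np.random.default_rng(rng)
    acc = None
    for _ in range(n_trials):
        noisy = x + noise_std * x.std() * rng.standard_normal(len(x))
        imfs = np.asarray(emd(noisy))     # decompose each noisy trial
        acc = imfs if acc is None else acc + imfs
    return acc / n_trials                 # ensemble mean of corresponding IMFs
```

As the number of trials grows, the added noise averages out in each IMF, leaving only the signal's own components.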
This new approach exploits the statistical characteristics of white noise, whose energy
is uniformly distributed across frequencies, to improve the EMD method. By adding white
noise to the target signal, all scales are populated uniformly, which avoids the
mode-mixing phenomenon. Comparing IMF components at the same level, EEMD produces more
concentrated and band-limited components.
2.2.4.3 The Fourier Transform and STFT
The Fourier Transform (FT), X(ω), of a signal x(t) is defined as:
X(\omega) = \int_{-\infty}^{\infty} x(t) e^{-j\omega t} dt \quad (2.11)
where t and ω are the time and frequency variables, respectively. It defines the
spectrum of x(t), which consists of components at all frequencies over the range where
it is nonzero.
Historically, Fourier spectral analysis has provided a general method for examining
the global energy-frequency distribution, and it has dominated data analysis since soon
after its introduction because of its power and simplicity. The Fourier transform
belongs to the class of orthogonal transformations that use fixed harmonic basis
functions; its result can be viewed as a decomposition of the initial signal into
harmonic functions with fixed frequencies and amplitudes.
For many signals, Fourier analysis is useful because the signal's frequency content
is important. However, Fourier analysis has a serious drawback: information about the
timing of events is lost when the signal is transformed to the frequency domain.
Moreover, it is strictly valid only under restrictive conditions (the system must be
linear, and the data must be strictly periodic or stationary); otherwise the resulting
spectrum makes little physical sense.
Dennis Gabor in 1946 adapted the Fourier transform to analyze only a small section of
the signal at a time; the result is called the Short-Time Fourier Transform (STFT). The
STFT is obtained from the usual FT by multiplying the time-domain signal x(t) by an
appropriate sliding time window w(t). Thus, instead of the usual FT expression, one
gets a time-frequency expression of the form:
X(\tau, \omega) = \int_{-\infty}^{\infty} x(t) w(t - \tau) e^{-j\omega t} dt \quad (2.12)
where w(t) is the time window applied to the signal.
The time-frequency information the STFT provides has limited precision, determined by
the size of the window.
2.2.4.4 Wavelet Transform (WT)
The Wavelet Transform (WT) is used to analyze a signal in the time and frequency
domains. The WT describes the properties of a waveform that change over time, with the
waveform divided into segments of scale. It represents a time function in terms of
simple, fixed building blocks, termed wavelets. These building blocks are a family of
functions derived from a single generating function, called the mother wavelet, by
translation and dilation operations.
The WT comes in two types: continuous and discrete. The Continuous Wavelet Transform
(CWT) is used to divide a continuous-time function into wavelets. The CWT of a
continuous, square-integrable function x(t) at a scale a > 0 and translational value
b ∈ R is defined by:
W_x(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} x(t) \, g^*\!\left(\frac{t - b}{a}\right) dt \quad (2.13)
where ∗ denotes the complex conjugate and g(t) is the analyzing wavelet, a continuous
function in both the time domain and the frequency domain, called the mother wavelet.
To recover the original signal x(t), the inverse continuous wavelet transform can be
used:
x(t) = \int_{0}^{+\infty} \int_{-\infty}^{+\infty} \frac{1}{a^2} W_x(a, b) \frac{1}{\sqrt{|a|}} \tilde{g}\!\left(\frac{t - b}{a}\right) db \, da \quad (2.14)
where g̃(t) is the dual function of g(t). The analyzing wavelet g(t) should satisfy a
number of properties; the most important are integrability and square integrability.
The wavelet should also be as concentrated in time and frequency as possible.
However, calculating wavelet coefficients for every possible scale represents
considerable effort and results in a vast amount of data, so the Discrete Wavelet
Transform (DWT) is often used. The WT can be thought of as an extension of the classic
Fourier transform, except that, instead of working on a single scale (time or
frequency), it works on a multi-scale basis. This multi-scale feature of the WT allows
the decomposition of a signal into a number of scales, each scale representing a
particular coarseness of the signal under study [43].
The DWT of a signal x[n] is calculated by passing it through a series of filters. Each
stage consists of two digital filters and two downsamplers by 2, as shown in Figure
2.7. g[n] is the discrete mother wavelet, high-pass in nature, and h[n] is its mirror
version, low-pass in nature.
The outputs, giving the detail coefficients (from the high-pass filter) and
approximation coefficients (from the low-pass filter), are computed as follows:

y_{low}[n] = \sum_{k=-\infty}^{+\infty} x[k] \, h[2n - k], \qquad y_{high}[n] = \sum_{k=-\infty}^{+\infty} x[k] \, g[2n - k] \quad (2.15)
Figure 2.7: Discrete Wavelet Transform decomposition (x[n] is passed through h[n] and
g[n] with downsampling by 2 at each stage, yielding Level 1, Level 2, and Level 3 DWT
coefficients).
The wavelet transform is often compared with the Fourier transform. The Fourier
transform is a powerful tool for processing stationary signals (signals whose
properties do not change over time). To avoid the constraints associated with
non-stationary signals, the wavelet transform was introduced. Like the Fourier
transform, it performs decomposition in a fixed basis of functions; however, unlike
the FT, it expands the signal in terms of wavelet functions that are localized in both
time and frequency [44].
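A single analysis stage of the filter bank in Figure 2.7 can be sketched with the Haar filter pair (db1, chosen here for brevity; the thesis itself uses db2). The odd-phase downsampling below is one convention for indexing Eq. (2.15) so that the Haar coefficient pairs align:

```python
import numpy as np

# Haar analysis filters (assumption: db1 for brevity; the thesis uses db2)
h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass h[n]
g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass g[n]

def dwt_level(x):
    """One DWT stage per Eq. (2.15): filter, then downsample by 2."""
    low = np.convolve(x, h)[1::2]   # approximation coefficients
    high = np.convolve(x, g)[1::2]  # detail coefficients
    return low, high
```

Because the Haar pair is orthogonal, one stage conserves energy: the squared coefficients of `low` and `high` sum to the squared samples of the input. Repeating the stage on `low` gives the Level 2 and Level 3 coefficients of Figure 2.7.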
Chapter 3
Application Architecture
We developed and tested an Apple iOS application to demonstrate the iOS device's
potential for measuring user heart rates in real time. This application makes use of
the iPhone's front-facing and back cameras, as well as its microphone, for PPG and PCG
acquisition, in order to provide an estimate of the user's heart rate. Once the
measurements are obtained, the app analyzes the signals to compute the user's heart
rate.
At a high level, the core algorithm can be represented by the block diagram in Figure
3.1. Testing the iDevice sensors' capability for retrieving heart-pulse information is
performed in four steps. The video and audio processing units take in their inputs in
the first three steps, and in the last step, signal processing and machine learning
algorithms are used to estimate the heart rate.
3.1 Fingertip Processing Unit
The application records video from the fingertip of the user, using the back camera
for 10 seconds. The user needs to gently press the camera lens and its LED with his
index finger as previously shown in Figure 2.2. When the user presses the camera lens
of the device and its LED simultaneously, the ambient light travels through the finger
-
Chapter 3. Application Architecture 24
Figure 3.1: Block diagram of the application architecture
and reaches the camera sensor. A sample frame of the video recorded from the index
fingertip is shown in Figure 3.2.
Our application utilizes the same image-acquisition concept used in pulse oximeters.
Since blood opacity differs between oxygenated and deoxygenated blood, we measure the
brightness of the skin over time. To compute the brightness variation of the skin, we
calculate the average red-channel intensity of the pixels in a region of interest in
each frame: we divide each frame into 9 cells, and the PPG waveform extraction
considers only the central cell, as shown in Figure 3.3.
Figure 3.2: Video recording from the fingertip using back camera
Figure 3.3: Region of interest in each frame
The average red-channel intensity is calculated by Equation 3.1 to determine the PPG
signal.
PPG_1(t) = \frac{\sum_{x,y} R(x, y, t)}{WH} \quad (3.1)
where R(x, y, t) is the red-channel intensity at pixel (x, y) of the frame at time t,
and hence 0 ≤ R(x, y, t) ≤ 255. WH is the number of pixels in the region of interest,
which is 192 × 144 in our application.
A sample PPG signal from the video recorded from the index fingertip is shown in
Figure 3.4. The data is quite clean, and peak-to-peak distances are visually
identifiable.
Figure 3.4: Example of fingertip data obtained by the described capture method
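Equation 3.1 amounts to averaging the red channel over the central cell of a 3×3 grid, one value per frame. A sketch, assuming frames arrive as a (T, H, W, 3) array with red in channel 0 (an assumption about the pixel layout, not the thesis's exact API):

```python
import numpy as np

def ppg_from_frames(frames):
    """Eq. (3.1): mean red-channel intensity over the central ROI per frame.

    `frames` is a (T, H, W, 3) uint8 array; red in channel 0 is an
    assumption for this sketch."""
    T, H, W, _ = frames.shape
    h3, w3 = H // 3, W // 3
    roi = frames[:, h3:2 * h3, w3:2 * w3, 0].astype(float)  # central cell of 3x3 grid
    return roi.mean(axis=(1, 2))  # PPG(t): one average per frame
```

Averaging over the whole central cell suppresses per-pixel sensor noise while keeping the slow brightness variation caused by blood volume changes.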
3.2 Face Processing Unit
Another mechanism to sense the color changes of the skin during a cardiac cycle is
recording video from the face. The application records video from the face for 10
seconds. To record properly, the face should be placed in front of the front-facing
camera in a well-lit environment. In this application, the user should place his/her
forehead in a pre-determined area displayed on the screen (Figure 3.5).
Figure 3.5: Video recording from the face using front-facing camera
To capture the PPG signal from the face, we again apply Equation 3.2 to the region of
interest in each frame of the recorded video. The PPG signal computed from the face may
contain additional noise due to illumination levels in the room, objects covering the
forehead, or movement of the device during recording.
PPG_2(t) = \frac{\sum_{x,y} R(x, y, t)}{WH} \quad (3.2)
where again R(x, y, t) is the red-channel intensity at pixel (x, y) of the frame at
time t, and hence 0 ≤ R(x, y, t) ≤ 255, and WH is the number of pixels in the region of
interest, which is 192 × 144 in our application.
A sample PPG signal from the video recorded from the face is shown in Figure 3.6.
The data is still clean, but it contains additional noise compared to the PPG signal
recorded from the fingertip.
Figure 3.6: Example of PPG signal from the face obtained by the described capture
method
For both fingertip and face recording, the exposure settings on the device are locked
to eliminate the effect of auto-exposure on the captured results. For example, during
the fingertip test the finger completely covers the camera, so the iOS device, assuming
a low-light condition, over-exposes the capture. Over-exposure tends to drop the frame
rate and saturate the red intensity of the captured data.
3.3 Audio Processing Unit
In the final step, the user should place the microphone directly on his chest,
preferably on one of the auscultation sites shown in Figure 2.4. The audio is recorded
for 10 seconds using the primary microphone of the device at a sampling frequency of
44.1 kHz. The recorded audio represents the PCG signal, and its two main sounds, S1 and
S2, are identifiable. A sample PCG signal recorded using the primary microphone of the
iDevice is shown in Figure 3.7.
Figure 3.7: Example of PCG recorded by the microphone and identifiable heart sounds
From the PCG signal shown in Figure 3.7, the two heart sounds S1 and S2 are quite
identifiable; however, like any signal, the audio recorded by the microphone may
contain noise due to movement of the phone on the chest during recording or background
noise in the room.
In the next section, our proposed method for filtering the three captured signals and
estimating the user's heart rate is presented.
3.4 Heartbeat Estimation Algorithm
The heartbeat estimation algorithm proposed in this thesis provides an estimate of
the user's HR by (1) computing the HR from the input PPG and PCG signals using our
version of the Empirical Mode Decomposition (EMD) algorithm, (2) assessing the quality
of the input signals in order to distinguish between good and bad waveform segments
using an SVM classifier, and (3) precisely combining heart rate information from the
three modalities based on the quality of the signals.
Figure 3.8 illustrates the components of this approach. In the first component, we
use a signal filtering algorithm (EMD, DWT, or FIR filtering) to remove the noise
artifacts in each waveform. Second, we apply the peak detection algorithm to compute
the heart rate in each segment independently. Third, we separately qualify PPG and PCG
waveform segments (each segment is 5 seconds long) as either good or bad using a
machine learning algorithm in the form of SVMs. In the fourth and final component, a
decision-logic algorithm precisely combines the results of the three previous
components to provide the final heart rate estimate.
3.5 Signal Filtering Algorithms
In this section we discuss three signal filtering algorithms applied to our dataset.
Our main contribution on the signal processing side is a version of the EMD algorithm
that reduces noise in PPG and PCG signals. We also applied the Wavelet Transform and
FIR filtering to our dataset, in order to compare their results with our proposed EMD
algorithm.
Figure 3.8: Elements of the proposed algorithm
3.5.1 FIR Filtering
An overview of the designed FIR filtering algorithm is shown in Figure 3.9. We first
down-sample the audio before transferring it off the device; in the next phase we
remove baseline wander and filter out noise outside the heartbeat range of interest.
Finally, we apply a moving average so that peaks can be detected efficiently.
In the first step we filter and down-sample the audio so that the recorded data can be
transferred off the device in a reasonable amount of time (i.e., at a reasonable size).
The raw data is sampled at 44.1 kHz for 10 seconds. The data is then filtered with a
6th-order low-pass Butterworth filter and down-sampled, ensuring that the Nyquist
requirement is still met. An example of the data at this stage is illustrated in
Figure 3.10.
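This acquisition front-end can be sketched with SciPy. The downsampling factor of 100 is an illustrative assumption, not the thesis's exact value; the cutoff is placed at the new Nyquist rate so that no aliasing occurs:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_downsample(x, fs=44100, factor=100):
    """6th-order low-pass Butterworth, then downsampling (sketch).

    The factor of 100 (44.1 kHz -> 441 Hz) is an assumption for
    illustration."""
    new_nyquist = fs / factor / 2.0
    sos = butter(6, new_nyquist / (fs / 2.0), output='sos')
    y = sosfiltfilt(sos, x)   # zero-phase filtering below the new Nyquist rate
    return y[::factor]        # keep every factor-th sample
```

Second-order sections (`sos`) are used because a 6th-order filter with a very low normalized cutoff is numerically fragile in transfer-function form.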
Figure 3.9: FIR filtering module
Figure 3.10: Sample data captured from the device. It has already been filtered and
down-sampled as specified.
After the signal acquisition, a band-pass filter attenuates frequencies outside the
band of interest. This reduces the noise in later processing steps and makes the
resulting heart rate signal smoother. We first remove the baseline wander to obtain
signals with zero mean. Because the average of the input signals can shift over time
due to sensor drift, we divide each of the three vital signals into 10 intervals and
subtract a linear trend from each interval to remove the baseline wander. A sample of
the original PPG signal, its baseline wander, and the clean signal after removing the
baseline is shown in Figure 3.11.
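The per-interval linear detrending can be sketched with SciPy's piecewise `detrend`, using the interval boundaries as breakpoints:

```python
import numpy as np
from scipy.signal import detrend

def remove_baseline(x, n_intervals=10):
    """Remove baseline wander by subtracting a linear trend per interval
    (a sketch of the piecewise detrending described above)."""
    bp = np.arange(1, n_intervals) * (len(x) // n_intervals)  # interval breakpoints
    return detrend(x, type='linear', bp=bp)
```

For a signal whose drift is itself piecewise-close-to-linear, this leaves each interval zero-mean and trend-free while preserving the faster heartbeat oscillation.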
Second, we apply a fourth-order Butterworth band-pass filter to each input signal,
with cutoff frequencies of 0.8 Hz and 3.0 Hz, to reject noise outside the heart rate
range of 48 to 180 beats per minute.
Figure 3.11: Sample PPG recording with baseline wander, baseline, and clean PPG after
removing the baseline
Finally, a moving average is applied to the filtered data. The equation is:
y[n] = \frac{1}{2L + 1} \sum_{m=n-L}^{n+L} |x[m]| \quad (3.3)
where L is the length of the window used for averaging. The study in [45] suggests that
the shorter heart sound is approximately 67-87 ms long, so we applied a window
of 63 ms.
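Equation 3.3 is a simple convolution of the rectified signal with a box kernel. A sketch, where the sampling rate is an illustrative assumption for converting the 63 ms window into samples:

```python
import numpy as np

def moving_average_abs(x, fs=441, window_s=0.063):
    """Eq. (3.3): moving average of |x| over a (2L+1)-sample window.

    fs = 441 Hz is an assumed post-downsampling rate for illustration."""
    L = max(1, int(round(window_s * fs / 2)))    # half-window in samples
    kernel = np.ones(2 * L + 1) / (2 * L + 1)
    return np.convolve(np.abs(x), kernel, mode='same')
```

Rectifying before averaging turns each oscillatory heart-sound burst into a smooth positive hump, which makes the subsequent peak detection easier.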
Figures 3.12 and 3.13 illustrate the result of applying the FIR filtering module
described above to the PPG recorded from the fingertip and the PPG recorded from the
face, respectively.
Figure 3.12: Original PPG recorded from the fingertip and clean PPG after applying
FIR filtering
Figure 3.13: Original PPG recorded from the face and clean PPG after applying FIR
filtering
Finally, since we need to correlate the three inputs to estimate the user's heart rate,
the sampling rate of the audio recording from the microphone must be made proportional
to the sampling rate of our camera, which is 30 fps. We therefore down-sample the
audio, using a Butterworth low-pass filter with an appropriate cutoff frequency to
avoid aliasing. Figure 3.14 illustrates the PCG recorded from the chest of the user and
its down-sampled, filtered, and smoothed result after applying the algorithm.
Figure 3.14: Original PCG recorded from the chest of the User and clean PCG after
applying FIR filtering
3.5.2 EMD Algorithm
We now discuss the EMD algorithm proposed for filtering out noise from the two PPG
signals and the PCG signal, which consists of two parts:
3.5.2.1 Decomposition
The Empirical Mode Decomposition algorithm was introduced to analyze nonlinear and
non-stationary signals, such as biomedical signals, and is based on decomposing the
signal into a collection of IMFs. These IMFs should fulfill the two conditions that
were discussed previously.
In the first step of signal filtering using EMD, we decompose the original PPG and PCG
signals as follows:
1. Initialize h1(t) with the original signal.
2. Identify the extrema of the signal hi(t).
3. Generate the upper and lower envelopes by interpolating the maxima and minima
points identified in the previous step.
4. Calculate the mean of the two envelopes to determine the local mean value, m(t).
5. Calculate d(t) = hi(t) − m(t).
6. If d(t) is a zero-mean signal, it is taken as the next IMF, hi+1(t) = d(t).
Otherwise, replace hi(t) with d(t) and repeat from step (2).
7. Update the residue as r = r − hi(t) and set i = i + 1. Repeat steps (2) to (6),
sifting the residual signal. The process stops when the final residual signal is a
monotonic function.
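The sifting loop above can be sketched as follows. This is a simplified illustration: cubic-spline envelopes and a crude zero-mean test stand in for the full stopping criteria, and the minimum-extrema guard plays the role of the monotonic-residue check:

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline

def sift(h, max_iter=50):
    """Extract one IMF via steps (2)-(6); returns None when h has too few
    extrema to build envelopes, i.e. h is the final residue."""
    t = np.arange(len(h))
    for _ in range(max_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 4 or len(minima) < 4:
            return None
        upper = CubicSpline(maxima, h[maxima])(t)   # envelope of maxima
        lower = CubicSpline(minima, h[minima])(t)   # envelope of minima
        m = 0.5 * (upper + lower)                   # local mean, step (4)
        d = h - m                                   # step (5)
        if abs(d.mean()) < 1e-6 * np.abs(d).max():  # crude zero-mean test
            return d
        h = d
    return h

def emd(x, max_imfs=10):
    """Steps (1)-(7): peel off IMFs until the residue has too few extrema."""
    imfs, r = [], np.asarray(x, float).copy()
    for _ in range(max_imfs):
        imf = sift(r)
        if imf is None:
            break
        imfs.append(imf)
        r = r - imf
    return imfs, r
```

Because each IMF is subtracted from the running residue, the IMFs plus the final residue always sum back to the original signal exactly.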
A sample decomposition of the PPG signal recorded from the fingertip into its IMFs
is illustrated in Figure 3.15.
3.5.2.2 Reconstruction
After applying the EMD algorithm to the input signal, the signal is decomposed into
a residue and a collection of IMFs, and hence can be expressed as:
x(t) = \sum_{i=1}^{n} h_i(t) + r \quad (3.4)
where n is the number of IMFs.
Figure 3.15: Original PPG recorded from the fingertip and the decomposed IMFs using
the EMD algorithm.
We know from the literature on EMD applications that the last IMFs represent baseline
wander, while high-frequency noise components lie in the first IMFs. To reconstruct
the clean signal from the decomposed IMFs, we need to determine the noise level in the
signal. To do so and recover the heartbeat signal, the IMFs corresponding to the
heartbeat are identified according to their peak frequencies: we compute the Power
Spectral Density of each IMF, which reveals its dominant frequency. In our algorithm,
IMFs whose peak frequency Fi lies in the range 0.8 Hz - 3.0 Hz are classified as
components of the heartbeat signal; [46] tested these cutoff limits on the output of
sensors in a designed “HeartPhone”. Therefore, we can reconstruct the heartbeat
signal as:
H_{clean}(t) = \sum_{i : F_i \in [0.8, 3.0]\,\text{Hz}} h_i(t) \quad (3.5)
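The PSD-based selection of Equation 3.5 can be sketched with SciPy's Welch estimator (a minimal illustration; the IMFs are assumed given as equal-length arrays):

```python
import numpy as np
from scipy.signal import welch

def reconstruct_heartbeat(imfs, fs=30.0, band=(0.8, 3.0)):
    """Eq. (3.5): keep only the IMFs whose Welch-PSD peak frequency F_i
    falls in the heartbeat band, then sum them (sketch)."""
    kept = []
    for h in imfs:
        f, pxx = welch(h, fs=fs, nperseg=min(256, len(h)))
        f_peak = f[np.argmax(pxx)]          # dominant frequency of this IMF
        if band[0] <= f_peak <= band[1]:
            kept.append(h)
    return np.sum(kept, axis=0) if kept else np.zeros_like(imfs[0])
```

IMFs dominated by high-frequency noise or by slow baseline wander fall outside the band and are simply discarded.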
Figures 3.16 and 3.17 illustrate the decomposition of the PPG signal recorded from
the fingertip into five IMFs and corresponding power spectral densities.
Figure 3.16: Decomposition of the PPG signal into IMFs using the EMD algorithm.
Figure 3.17: Power Spectral Density of the decomposed IMFs.
A comparison between the original PPG signal recorded from the fingertip and the
signal reconstructed via the EMD algorithm is shown in Figure 3.18. The reconstruction
was based on the IMFs whose dominant frequency components lie in the range
0.8 Hz - 3.0 Hz; by this criterion, we used the second and third IMFs for partial
reconstruction of the signal.
Figure 3.18: Original PPG recorded from the fingertip and the clean signal after applying
the EMD algorithm and reconstructing the signal based on the Power Spectral Density
of the IMFs.
3.5.3 Wavelet Transform
The wavelet transform can be used for data decomposition and reconstruction. By
decomposing the original signal, we can eliminate the wavelets corresponding to the
noise and reconstruct a clean signal. To implement the WT for filtering the recorded
data, we use Multi-Resolution Analysis (MRA).
In wavelet analysis, the approximations are the high-scale, low-frequency components of
the signal and the details are the low-scale, high-frequency components. At each level
of decomposition, a threshold is needed to determine which components should be
eliminated.
The selection of an appropriate wavelet and number of decomposition levels is very
important when analyzing signals with the WT. The number of decomposition levels is
chosen based on the dominant frequency component of the signal: the levels are chosen
so that the parts of the signal that correlate with the frequencies required for
classifying the signal are retained in the wavelet coefficients. In our algorithm, the
level of decomposition was chosen to be 4 [43]. Thus the PPG and PCG signals were
decomposed into the details D1-D4 and one final approximation, A4. A4 contains the
dominant frequencies in the [0, 3.75] Hz band, which covers the heart pulse.
Usually, tests are performed with different types of wavelets, and the one that gives
the maximum efficiency is selected for the particular application. [43] suggests using
the Daubechies wavelet of order 2 for PPG, ECG, and EEG signals, so we performed our
analysis with db2 at level 4. The block diagram of the wavelet transform algorithm is
illustrated below.
Figure 3.19: Block diagram of the wavelet decomposition and reconstruction.
3.6 Peak Detection Algorithm
The peak detection algorithm used in our system is a version of the Adaptive Peak
Identification Technique (ADAPIT) introduced by [13]. This algorithm detects peaks and
computes the heart rate for each waveform segment. The main steps of the peak
detection algorithm are as follows:
1. To detect peaks precisely, we need to remove the baseline of the signal, which we
have already done in the previous section.
2. In this step the first estimate of the actual peaks is obtained:
• Two thresholds, T1 and T2, are computed. T1 is set to 2σ1, where σ1 denotes the
standard deviation of all data points of the waveform, and defines the waveform's
baseline range [−T1, T1]. T2 is set to 3σ2, σ2 being the standard deviation of the
baseline. The peaks greater than T2 are taken as the first estimate of the actual
peaks.
• The lower bound on peak amplitude is set to one half of the median amplitude of all
peaks identified in the previous step.
3. To refine the peaks retained from the previous step, strings of markers with period
P are iteratively generated and moved along the timeline to align with the retained
peaks. Through this iterative process, P is varied over a range equivalent to HRs
between 48 and 180 bpm, and the P that aligns with the largest number of peaks is
selected.
4. Each unaligned marker of the selected P is allowed to move back and forth along the
timeline by as much as one half of P, in an attempt to line up with any unaligned
peak.
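Steps 1 and 2 can be sketched as follows (the marker-alignment steps 3-4 are omitted; SciPy's `find_peaks` stands in for the local-maximum search):

```python
import numpy as np
from scipy.signal import find_peaks

def initial_peak_estimate(x):
    """Steps 1-2 of the peak detection sketch: baseline removal, the
    T1/T2 thresholds, and the median-based amplitude floor."""
    x = x - x.mean()                       # step 1: zero-mean signal
    t1 = 2.0 * x.std()                     # T1 = 2 * sigma_1
    baseline = x[np.abs(x) <= t1]          # samples inside [-T1, T1]
    t2 = 3.0 * baseline.std()              # T2 = 3 * sigma_2
    peaks, _ = find_peaks(x, height=t2)    # first estimate of the actual peaks
    if len(peaks):
        floor = 0.5 * np.median(x[peaks])  # lower bound on peak amplitude
        peaks = peaks[x[peaks] >= floor]
    return peaks
```

The two-stage threshold first measures the spread of the whole waveform, then re-measures the spread of only the baseline samples, so T2 adapts to the noise floor rather than to the peaks themselves.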
Figure 3.20 illustrates the peak detection algorithm. Panels (a) and (b) show the
original signal and its baseline wander, and panel (c) shows the clean signal without
baseline wander, together with the thresholds T1 and T2. Panel (d) shows the primary
peaks detected using T2, and (e) and (f) show the peak-to-peak intervals and the
detected peaks, respectively.
Figure 3.20: An illustration of the peak detection algorithm
3.7 Learning System for Heart Rate Estimation Based on Support Vector Machines (SVMs)
3.7.1 SVM Classifier
SVM is a commonly used method for statistical pattern recognition. Consider the
problem of separating input vectors belonging to two categories,
V = \{(x_1, y_1), ..., (x_m, y_m)\}, x_i \in R^n, y_i \in \{\pm 1\}, with a hyper-plane
w^T x + b = 0, where the x_i are the patterns to be classified, the y_i are their
categories, w is a normal vector, and b is a bias term.
-
Chapter 3. Application Architecture 44
The goal of the SVM classifier is to find the optimal separating hyper-plane, which
separates the two classes and maximizes the distance between the hyper-plane and the
vectors closest to it. Training the classifier involves minimizing the error function:
\frac{1}{2} w^T w = \frac{1}{2} \|w\|^2 \quad (3.6)
subject to the constraints:

y_i(w^T x_i + b) \geq 1, \quad i = 1, 2, ..., m \quad (3.7)

From Equation 3.7 we see that w^T x_i + b \leq -1 for y_i = -1, while
w^T x_i + b \geq 1 for y_i = 1.
The optimization problem can be formulated as follows:

\min J(w, \zeta) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \zeta_i \quad (3.8)

such that

y_i(w^T \varphi(x_i) + b) \geq 1 - \zeta_i \quad (3.9)

\zeta_i \geq 0, \quad i = 1, ..., N \quad (3.10)
where C is a positive regularization constant chosen empirically, w is the weight
vector of training parameters, ζi is a positive slack variable indicating the distance
of xi from the decision boundary, and φ is a nonlinear mapping function used to map
the input data point xi into a higher-dimensional space.
The SVM solution can be written using Lagrange multipliers α_i ≥ 0, which are obtained
by solving a quadratic programming problem. The SVM decision function can be expressed
as:
g(x) = \sum_{x_i \in SV} \alpha_i y_i K(x, x_i) + b \quad (3.11)
where K(x, x_i) is the kernel function, defined as:

K(x, x_i) = \varphi(x)^T \varphi(x_i) \quad (3.12)
In this work the linear kernel function is used, defined as K(x, x_i) = x^T x_i.
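As an illustration, a linear-kernel classifier on two toy features standing in for FW and PV might look like the following. scikit-learn is assumed here as the SVM implementation; the thesis does not name a library, and the feature values are fabricated for the sketch:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D features standing in for (FW, PV); labels +1 = good, -1 = bad.
X = np.array([[0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
              [0.20, 0.70], [0.30, 0.80], [0.25, 0.75]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)  # linear kernel K(x, x_i) = x^T x_i
pred = clf.predict(np.array([[0.88, 0.12], [0.22, 0.78]]))
```

The regularization constant C plays the role described above: larger values penalize slack more heavily and fit the training set more tightly.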
3.7.2 SVM Classifier Implementation
The SVM classifier is used for post-processing analysis in our heart rate estimation
system. The presented heartbeat estimation system is based on the filtering and peak
detection algorithms discussed in previous sections, and the SVM classifier is used to
distinguish between good and bad recordings. The results from the filtering and peak
detection modules are used to provide an estimate of the heart rate based on the
waveforms classified as good by the SVM.
In our proposed method, we first apply our version of the EMD algorithm. Then we apply
the peak detection module to the clean signal to detect its peaks, from which the
features of the classifier are computed. We then apply the SVM classifier, a supervised
machine learning algorithm, to distinguish between good and bad waveforms. [48] has
shown the SVM to be an effective classifier in a wide variety of applications,
including the characterization of PPG and PCG signals.
This component of the heartbeat detection system implements our premise that the
reliability of a heart rate estimate depends strongly on the quality of the underlying
waveforms from which it is derived. A machine learning classifier, implemented with an
SVM, automates the categorization of the waveforms by attempting to mimic the
performance of a human relying on visual inspection. The classifier learns its rules
by finding coefficients that optimize the correlations between a set of
waveform-extracted features and the waveform quality obtained from manually
categorized waveform samples.
There are five steps in the development of a Support Vector Machine learning
classifier:
1. Manually classify and categorize sample waveform segments as good or bad.
2. Define candidate waveform features that distinguish good and bad waveforms.
3. Select the most informative features.
4. Train the classifier.
5. Test the classifier.
As a supervised learning algorithm, the development of an SVM requires a set of
input/output learning samples, where the input consists of a list of discriminatory
features and the output consists of labeled binary classes.
To manually categorize waveforms, each recording is divided into two 5-second
segments, and each segment is visually examined by a person. We examined 56 five-second
segments for each of the PPG and PCG recordings. A segment is ranked as bad if more
than 2 expected peaks are not observed or cannot be distinguished; otherwise, it is
ranked as good.
The success of an SVM classifier depends heavily on good feature selection. For the feature selection part of the algorithm, we therefore used the two features validated by [13] for heart pulse signals: the Fraction of Aligned Waves (FW) and the Pulse Wave Variability (PV). Both are time-domain features. FW measures the temporal regularity of the potential heartbeat signal, and PV measures the variability of the time interval between adjacent pulse waves.
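The exact FW and PV definitions are given in [13]; as an illustrative stand-in only, the idea behind PV can be sketched as the coefficient of variation of the peak-to-peak intervals:

```python
import numpy as np

def pulse_wave_variability(peak_times):
    """Illustrative stand-in for the PV feature: variability of the
    time interval between adjacent pulse waves, computed here as the
    coefficient of variation of peak-to-peak intervals."""
    intervals = np.diff(np.asarray(peak_times, dtype=float))
    return float(np.std(intervals) / np.mean(intervals))
```

A perfectly regular pulse yields a PV of zero, and irregular peak spacing drives the value up, which is what makes it a useful quality discriminator.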
The training phase of the SVM classifier uses the FW and PV features computed on a set of waveforms. The performance of the classifier can vary with the number of waveforms used in training and with their quality distribution. For any new data collected from the iDevice, we first filter the signals and run the peak detection algorithm, and then use the trained SVM to assess the quality of the six waveform segments (two per modality). The segments classified as good contribute to the final heartbeat estimate of our algorithm, which is computed using the equation:
Hr = \frac{\sum_{S \in W_{good}} \theta(S)}{\sum_{S \in W_{good}} T(S)} \times 60    (3.13)

where W_{good} is the class of waveforms that are classified as a good pulse signal, \theta(S) is the number of peaks in the waveform S, and T(S) is the duration of the waveform S in seconds.
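Equation (3.13) can be computed directly from the good segments; a minimal sketch, representing each segment as a (peak count, duration) pair:

```python
def heart_rate_bpm(good_segments):
    """Equation (3.13): total number of peaks over total duration of
    the segments classified as good, scaled to beats per minute.
    Each segment is a (num_peaks, duration_seconds) pair."""
    total_peaks = sum(peaks for peaks, _ in good_segments)
    total_seconds = sum(duration for _, duration in good_segments)
    return total_peaks / total_seconds * 60.0
```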
3.8 Multi-Channel Heart Rate Estimation
The multi-channel heart rate estimation module uses the peak locations found by the peak detection module. The intervals between successive detected peaks are calculated for each of the waveforms, and the mean value of the histogram of peak-to-peak distances is used as the estimate of the user's heart rate. Figures 3.21 to 3.23 show the histograms of peak-to-peak distances for the fingertip, face, and audio recordings, respectively.
The main idea behind using three different modalities to estimate the heart rate is to make the final estimate more reliable. To achieve this, we assume that the user's heart rate changes negligibly during the test, so that the three modalities can be fused into a single estimate. We therefore combine the three histograms into a single histogram of the fused data, illustrated in Figure 3.24. The final heart rate estimate of the system has a peak-to-peak distance of 0.93 s, which corresponds to 64 bpm.
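The fusion step can be sketched as pooling the peak-to-peak intervals from all three modalities and converting the mean interval to beats per minute; this is an illustrative simplification that assumes, as above, a roughly constant heart rate during the test:

```python
import numpy as np

def fused_heart_rate(intervals_per_modality):
    """Pool the peak-to-peak intervals from the three modalities into
    one collection (the fused histogram of Figure 3.24) and use the
    mean interval as the heart-rate estimate in bpm."""
    pooled = np.concatenate([np.asarray(iv, dtype=float)
                             for iv in intervals_per_modality])
    return 60.0 / pooled.mean()
```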
[Figure 3.21: Histogram of peak-to-peak distances (in seconds) for the fingertip recording; annotated mean interval 0.8358 s]
[Figure 3.22: Histogram of peak-to-peak distances (in seconds) for the face recording; annotated mean interval 0.8870 s]
[Figure 3.23: Histogram of peak-to-peak distances (in seconds) for the audio recording; annotated mean interval 0.9364 s]
[Figure 3.24: Histogram of peak-to-peak distances (in seconds) for the fused data from the three modalities; annotated mean interval 0.8747 s]
Chapter 4
Experimental Results
Our iPhone application was developed with data collected from 70 adults, aged 19 to 62, all without any known history of cardiovascular abnormalities. The sample of participants consisted of 37 females and 33 males, and 11 of the participants had dark skin.
The experiments were conducted in a quiet and well-lit environment. Throughout the experiment, subjects were comfortably seated, holding an iPod in their right hand, with a CMS 50-E pulse oximeter connected to the index finger of their left hand.
To assess the accuracy of our proposed algorithm, we simultaneously recorded the user's heart rate with a pulse oximeter. We chose a pulse oximeter because it is the easiest non-invasive way to measure a user's heart rate, with a known error rate that does not exceed 2% [49]. The heart rate measured by the pulse oximeter during the experiment was recorded and used as the reference against which the results of our proposed algorithm were compared.
4.1 Heartbeat Detection Accuracy without the SVM Classifier
In this section, the accuracy of FIR filtering, the Discrete Wavelet Transform (DWT), and our proposed EMD algorithm is presented and compared. To evaluate each method, we apply the corresponding signal processing algorithm to remove high-frequency noise and baseline wander, then apply the peak detection algorithm to the de-noised signal and fuse the histograms of peak-to-peak distances to retrieve the corresponding heart rate.
To quantify the agreement between the actual heart rate measured by the pulse oximeter and the heart rate estimated by each signal processing algorithm, we compute the Root Mean Square Error (RMSE). Table 4.1 shows the RMSE between the heart rate measured by the pulse oximeter and the heart rate estimated using the FIR, DWT, and EMD filtering algorithms for all 70 subjects. The RMSE is computed for each of the three modalities and for the estimate obtained from their fusion.
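The RMSE over all subjects is the standard quantity; a minimal sketch:

```python
import numpy as np

def rmse(reference_bpm, estimated_bpm):
    """Root Mean Square Error between the pulse-oximeter reference
    heart rates and the algorithm's estimates, over all subjects."""
    ref = np.asarray(reference_bpm, dtype=float)
    est = np.asarray(estimated_bpm, dtype=float)
    return float(np.sqrt(np.mean((ref - est) ** 2)))
```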
From Table 4.1 we can see that, although each of the modalities contains valuable information about the heart rate, the fused data provides a more accurate estimate of the user's heart rate. Moreover, the PPG signals are more informative than the PCG signal, owing to the unavoidable background sounds in the experimental environment. Furthermore, the video recorded from the fingertip yields a more accurate estimate than the video of the face, since the illumination level of the environment and objects covering the face have less effect on it.
Table 4.1: Root Mean Square Error (bpm) between the heart rate measured by the pulse oximeter and the heart rate estimated using FIR, DWT, and EMD filtering for all 70 subjects

                 Fingertip   Face        Audio       Fused
                 Recording   Recording   Recording   Estimation
FIR Filtering    4.8         5.2         6.1         4.7
DWT Filtering    4.7         4.8         5.1         4.5
EMD Filtering    4.1         4.3         5.2         3.8
Table 4.2 shows the average percentage error between the actual heart rate measured by the pulse oximeter and the heart rate estimated using the FIR, DWT, and EMD filtering algorithms.
Table 4.2: Average percentage error between the heart rate measured by the pulse oximeter and the heart rate estimated using FIR, DWT, and EMD filtering for all 70 subjects

                 Fingertip   Face        Audio       Fused
                 Recording   Recording   Recording   Estimation
FIR Filtering    4.4%        4.8%        5.6%        4.3%
DWT Filtering    4.2%        4.5%        4.8%        3.8%
EMD Filtering    3.7%        3.8%        4.7%        3.6%
As the results show, heart rate estimation using our EMD algorithm is more accurate than with the FIR filtering or DWT algorithms. Notably, the EMD algorithm performs well on all three modalities, and heart rate estimation using EMD on the fused data is the most accurate of all: the RMSE between this estimate and the heart rate recorded by the pulse oximeter is 3.8 bpm.
4.2 SVM Classifier Sensitivity
The level of noise in the PPGs recorded from the face and fingertip and in the PCG can distort the heart pulse signal. To determine the sensitivity of the SVM classifier discussed in Chapter 3, we tested it through 20 cross-validation repetitions. For each repetition, we visually assessed the quality of the recordings and categorized them accordingly; 40% of the waveforms were used for training and the remaining 60% for testing the classifier, with the training waveforms chosen at random in each repetition. All simulations used the same SVM model with a linear kernel function, and at the end of the 20 repetitions the average performance measures, sensitivity (Se) and specificity (Sp), were computed using the human classification as ground truth. With good segments treated as the positive class, sensitivity measures the fraction of good segments correctly classified as good, whereas specificity measures the fraction of bad segments correctly classified as bad. The sensitivity (Se) and specificity (Sp) are defined as:
Se = \frac{TP}{TP + FN}    (4.1)

Sp = \frac{TN}{TN + FP}    (4.2)
where TP is the number of good waveform segments correctly identified as good, TN is the number of bad waveform segments correctly classified as bad, FP is the number of bad waveform segments incorrectly classified as good, and FN is the number of good waveform segments incorrectly classified as bad.
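Given the per-segment ground-truth and predicted labels, equations (4.1) and (4.2) can be computed directly; a minimal sketch with "good" as the positive class:

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Equations (4.1) and (4.2): TP = good predicted good,
    TN = bad predicted bad, FP = bad predicted good,
    FN = good predicted bad."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == "good" and p == "good")
    tn = sum(1 for t, p in pairs if t == "bad" and p == "bad")
    fp = sum(1 for t, p in pairs if t == "bad" and p == "good")
    fn = sum(1 for t, p in pairs if t == "good" and p == "bad")
    return tp / (tp + fn), tn / (tn + fp)
```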
For classification purposes, as described earlier, each waveform is divided into two 5-second segments, giving 56 waveform segments per modality for training the classifier. Tables 4.3, 4.4, and 4.5 report the results of testing the SVM classifier on the remaining 84 waveform segments of the PPG signal recorded from the fingertip, the PPG signal recorded from the face, and the PCG signal, respectively.
For each training-testing split, we ran the simulation three times, using the PPG signal de-noised by FIR filtering, by the DWT de-noising algorithm, and by our proposed EMD algorithm. Averaged over the 20 cross-validation repetitions, Table 4.3 shows the sensitivity and specificity of the SVM classifier on each of the three de-noised signals.
Table 4.3: Sensitivity and specificity of the SVM classifier for 84 de-noised five-second waveform segments of the PPG signal recorded from the fingertip

                 Average Sensitivity   Average Specificity
FIR Filtering    85%                   86%
DWT De-noising   82%                   87%
EMD Algorithm    89%                   95%
From Table 4.3 we can see that the SVM classifier performs best on the waveform segments de-noised by the EMD algorithm, with sensitivity and specificity of 89% and 95%, respectively. Also, the sensitivity of the SVM classifier is higher on signals de-noised by FIR filtering than on those de-noised by the DWT algorithm,
whereas the specificity of the SVM classifier on the latter is better than on the former.
A second observation from Table 4.3 is that the average specificity exceeds the average sensitivity for all three types of de-noised waveform segments. This indicates that the SVM classifier identifies bad waveforms more reliably than good ones. This occurs because the peak detection algorithm cannot detect peaks as reliably as the human eye; hence more good waveform segments are classified as bad, lowering the sensitivity of the SVM classifier.
Table 4.4 shows the sensitivity and specificity of the SVM classifier on the three types of de-noised PPG signals recorded from the face (FIR filtering, DWT, and EMD), obtained with the same cross-validation procedure as before. Here again, the sensitivity and specificity are highest on the segments de-noised by our EMD algorithm, and in all three cases the specificity exceeds the sensitivity.
Table 4.4: Sensitivity and specificity of the SVM classifier for 84 de-noised five-second waveform segments of the PPG signal recorded from the face

                 Average Sensitivity   Average Specificity
FIR Filtering    72%                   81%
DWT De-noising   69%                   73%
EMD Algorithm    74%                   92%
Furthermore, the performance of the SVM classifier on PPG signals recorded from the fingertip is better than on PPG signals recorded from the face. As expected, this shows that when noise contaminates the PPG signal recorded from the face, it is intense enough to distort the pulse signal. The higher noise level in the face PPG is likely due to varying illumination levels in the environment during the experiment and to movement of the device during recording.
Table 4.5 shows the sensitivity and specificity of the SVM classifier for the de-noised PCG signals, using the previously described cross-validation procedure. The specificity of the SVM classifier is highest on the PCG de-noised by the EMD algorithm, at 91%. The highest sensitivity, however, is obtained on the PCG de-noised by the DWT (86%), slightly above that of the EMD algorithm (84%). The performance of the SVM classifier on the PCG waveform segments is better than its performance on the PPG signal recorded from the face, but still worse than its performance on the PPG signal recorded from the fingertip. Again, this reflects the fact that with no background noise in the environment, the device records the heart sound well, but when background noise is present the heart pulse signal may be distorted, leading to a poor heart rate estimate.
Table 4.5: Sensitivity and specificity of the SVM classifier for 84 de-noised five-second waveform segments of the PCG signal

                 Average Sensitivity   Average Specificity
FIR Filtering    78%                   82%
DWT De-noising   86%                   88%
EMD Algorithm    84%                   91%
4.3 Heartbeat Detection Accuracy Using the SVM Classifier
In this section we explore the accuracy of the heartbeat detection algorithms when the SVM classifier is used as a post-processing step. We first filtered the signal using one of the FIR, DWT, or EMD filtering algorithms. Then, we ran the peak detection algorithm to detect the heart pulse peaks in the recorded signals. From the peak locations and the per-modality heart rate estimates, we extracted the classification features of the SVM classifier: as described in the previous chapter, the Fraction of Aligned Waves and the Pulse Wave Variability. Next, we applied the SVM classifier with a linear kernel to classify the waveform segments as good or bad using these two features. Finally, we computed the heart rate from the waveform segments classified as good.
To measure the performance of the proposed heartbeat detection system with the SVM classifier, we tested it through 20 cross-validation repetitions employing the manually categorized waveform samples. In each repetition, 40% of the samples were used for training and the remaining 60% for testing the classifier, with the training waveforms chosen at random. For each training-testing split, we ran the simulation three times, using the signals de-noised by FIR filtering, by the DWT de-noising algorithm, and by our proposed EMD algorithm.
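One repetition of the random 40/60 split above can be sketched as follows; with the 140 segments per modality described earlier this yields 56 training and 84 testing segments. The function name and seed handling are illustrative; in the actual system each split would be followed by training a linear-kernel SVM on the training indices:

```python
import random

def split_train_test(num_segments, train_fraction=0.4, seed=0):
    """One cross-validation repetition: draw a random 40% of the
    waveform segments for training and hold out the remaining 60%
    for testing."""
    indices = list(range(num_segments))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * num_segments)
    return indices[:n_train], indices[n_train:]
```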
The results of the heartbeat detection algorithm with the SVM classifier, for each of the signal filtering methods, are shown in Table 4.6. As the results show, adding the SVM classifier to the heartbeat estimation system improves performance in terms of the RMSE between the estimated heart rate and the heart rate measured by the pulse oximeter. The major effect of adding the SVM classifier as a post-processing analysis