Real Time Extraction of ECG Fiducial Points Using Shape Based Detection
This thesis is
presented to the
School of Computer Science & Software Engineering
for the degree of
Doctor of Philosophy
of
The University of Western Australia
By
John Mark Darrington, BSc (hons)
April 2009
© Copyright 2009
by
John Mark Darrington, BSc (hons)
To Martina.
Abstract
The electrocardiograph (ECG) is a common clinical and biomedical research
tool used for both diagnostic and prognostic purposes. In recent years com-
puter aided analysis of the ECG has enabled cardiographic patterns to be found
which were hitherto not apparent. Many of these analyses rely upon the segmenta-
tion of the ECG into separate time delimited waveforms. The instants delimiting
these segments are called the “fiducial points”.
Rapid automatic identification of the fiducial points is a task which the biomedical
engineer needs to perform as a prerequisite to further analysis. To the researcher,
post-processing of pre-recorded results is an acceptable mode of analysis. However,
clinical staff require an immediate real-time diagnosis. This thesis is concerned with
the detection of fiducial points using methods which lend themselves to real-time
analysis, suitable for use in the development of systems intended for environments
where immediate results are called for.
By way of introduction, an examination of contemporary methods of fiducial point
detection is presented, followed by a discussion of the conventional methods of
assessing their performance. It is found that these assessment methods, whilst
widely used, are mathematically imprecise and are formed in such a way as to be
useful to the biomedical engineer but not to the clinician. An alternative
assessment measure which overcomes these problems is proposed, and examples
presented to demonstrate how the proposed new measure can be used.
Secondly, a novel method of ECG peak detection is presented. Since real-time
detection is of interest, the method is developed with emphasis on optimising its
performance without the use of filtering or other pre-processing stages. The method
relies upon examination of the shape of the signal’s peaks rather than its spectral
analysis. This approach brings benefits in terms of noise immunity, particularly
where the spectral response of the feature is similar to that of the noise signal.
Initially, the method is applied to the task of detecting R peaks, the most prominent
feature of the ECG. Results are presented which demonstrate how the method is
not only effective at discriminating between these peaks, and other maxima in the
ECG, but also how the speed of execution suggests that the method lends itself to
real-time applications in a clinical setting.
Later in the thesis, the method is extended and a hybrid approach developed which
uses the advantages of the new method in conjunction with conventional linear
signal processing techniques to detect another two of the most important features,
namely the P and T waves.
Finally, since the basic method encapsulates the geometry of peaks in the signal, the
author presents a discussion of the opportunities that the method holds for further
uses. In particular, it is proposed that the onset and offset of detected
waveforms can be extracted without significant extra time penalty.
Overall, the thesis presents an alternative basis for ECG analysis and an alternative
methodology for assessing the performance of such systems. It is hoped that the
reader will find these methods more intuitive than their existing counterparts,
and that they will inspire further research.
Preface
This thesis contains material derived from published work which has been co-
authored. The details of these works are set out below.
• John Darrington. Towards real time QRS detection: A fast method using
minimal pre-processing. Biomedical Signal Processing and Control, Volume 1,
Number 2, pages 169–176, April 2006.
The research presented in this paper forms the basis of chapter 4.
• John Darrington and Livia Hool. A new methodology for assessment of the
performance of heartbeat classification systems. BMC Medical Informatics
and Decision Making, Volume 8, Number 7, 2008.
The results presented in appendix A were taken from those presented in this
publication and the research presented in chapter 3 was, in part, derived from
work included in this paper.
Other papers published during the course of the PhD. candidature, which provided
information, insight or inspiration include:
• John Darrington and Livia Hool. EKG Beat Analysis: Minimising Redun-
dancy between Detection and Classification. Proceedings of the 5th IASTED
International Conference on Biomedical Engineering, February 2007.
• John Darrington. Devising a Multidimensional Significant Maxima Detec-
tion Algorithm. Proceedings of the 14th UWA CSSE Research Conference,
September 2005.
Where a number of authors are present, the co-authors acted in a supervisory capacity
only. The percentage of effort towards the thesis by the candidate is 85%. The can-
didate is the primary author of the above publications. The candidate is responsible
for originality of the research presented in this thesis.
John Darrington
Amitava Datta
Livia Hool
Acknowledgements
I would like to express my sincere thanks to Amitava Datta who supervised me
throughout the entire study. I respect Amitava as an academic and a person of
considerable intellect and understanding. His encouragement and trust was instru-
mental to successful completion of the thesis. It has been an honour to study under
his supervision. I humbly offer him my profound gratitude and appreciation.
Livia Hool is a professional academic who is very knowledgeable in her field. She
has been exceedingly patient and helpful when explaining to me aspects of car-
diology and electrophysiology. She kindly agreed to co-supervise my candidature
and introduced to me aspects of academia I had not previously encountered. Her
assistance to me was immeasurable.
I would also like to extend my gratitude to the students and staff at the University
of Western Australia, School of Computer Science and Software Engineering for the
fun times, and the staff at Student Services for their support in the others.
Finally I would like to thank my partner Martina Faust to whom I dedicate this
thesis. Without her patience, compassion and extraordinary stamina this work
would not have been possible.
Contents
1 Introduction
  1.1 Overview of the ECG
  1.2 Physiological interpretation
  1.3 The fiducial points
  1.4 Multiple channel electrocardiograph
  1.5 Noise in the ECG
  1.6 Real time aspects
  1.7 Contribution of this thesis

2 Literature Review
  2.1 Filtering
    2.1.1 Linear filters
    2.1.2 Non-linear filters
  2.2 Non-linear transforms
    2.2.1 Morphological transforms
    2.2.2 The wavelet transform
    2.2.3 Kupeev's algorithm
  2.3 The decision rule
    2.3.1 Adaptive thresholding
    2.3.2 Stochastic methods
  2.4 Summary

3 Assessment of Performance
  3.1 Reference sources
  3.2 Reference matching
    3.2.1 Matching tolerance
    3.2.2 Matching algorithms
  3.3 Reporting and visualising performance
    3.3.1 The receiver operating characteristic
    3.3.2 Problems with traditional definitions of sensitivity and specificity
    3.3.3 Alternative definition of sensitivity and specificity
  3.4 Time taken for detection
  3.5 Measurement of precision
  3.6 Assessing feature classifiers
    3.6.1 Problems with current classification methods
    3.6.2 Proposed methodology
    3.6.3 Worked example
    3.6.4 Summary of classifier assessment
  3.7 Conclusion

4 Detection of R peaks
  4.1 Overview
  4.2 Algorithm implementation
    4.2.1 Glossary
    4.2.2 Setup and initialisation
    4.2.3 Identifying the peaks
    4.2.4 Constructing the tree
    4.2.5 Extracting the maxima
    4.2.6 Shifting window
  4.3 Pre-processing
  4.4 Parameterisation
    4.4.1 Window size
    4.4.2 Weight threshold
  4.5 Effect of filtering
  4.6 Experimental results
    4.6.1 Observations
    4.6.2 Analysis
    4.6.3 Time complexity
  4.7 Summary

5 Detection of P and T waves
  5.1 Introduction
  5.2 Background
    5.2.1 Review of the wavelet transform
  5.3 Algorithm implementation
  5.4 Experimental results
    5.4.1 Tests using synthetically generated signals
    5.4.2 Performance tests against the QT database
  5.5 Discussion
  5.6 Conclusion

6 General conclusions and future research
  6.1 Existing techniques
  6.2 Assessment of performance
  6.3 R peak detection
  6.4 P wave detection
  6.5 Future research
    6.5.1 Detection of minima
    6.5.2 Continuous real-time operation
    6.5.3 Detection of waveform onset and offset
    6.5.4 Other applications

A Costs of beat misclassification
  A.1 Extrapolation of data
  A.2 Costs of incorrect classification
    A.2.1 Costs of misclassification of a normal beat as abnormal
    A.2.2 Erroneous classification of beats as supra-ventricular ectopic beats

Bibliography
List of Tables
3.1 A priori probabilities derived from the MIT-BIH Arrhythmia database.

3.2 Comparative performance of two classifiers.

4.3 Sensitivity and Specificity for record 207 of the MIT-BIH Arrhythmia database. This record is a particularly difficult one to analyse.

4.4 Elapsed time to process a record, demonstrating the benefits of low pass filtering of the complex lead. Despite the overhead of the filter, there is considerable nett benefit to be gained by filtering.

5.5 Distribution of detection error versus P wave position. θ indicates the phase referenced to the R peak. The default value is −60°. All detections performed with SNR: 10 dB, β: 1.

5.6 Distribution of detection error versus P wave amplitude. a indicates the amplitude of the P wave. The default value is 0.75 mV. All detections performed with SNR: 10 dB, β: 1.

5.7 Distribution of detection error versus P wave width. b indicates the width of the P wave. The default value is 0.25 s. All detections performed with SNR: 10 dB, β: 1.

5.8 Table of beat by beat comparisons of detected P and T waves against annotated records of the QT database. The ‘O’ column indicates beats annotated as something other than P or T waves.

A.9 Costs of false classification in 1000s of AUD.

A.10 Expected loss of life due to Atrial Fibrillation.

A.11 Expected loss of life due to Ventricular Fibrillation.

A.12 Expected loss of life due to Ventricular Ectopic Beats.
List of Figures

1.1 Anterior view of the human heart, illustrating the two atria and ventricles. The arrows indicate the direction of blood flow from atria to ventricles and through the pulmonary and aortic valves.

1.2 A single cardiac cycle in an ECG lead representing normal sinus rhythm.

1.3 Configuration of the 12 lead electrocardiograph. These diagrams show an anterior view of the patient. The right leg is normally connected to signal ground (not indicated here).

2.4 An example of peak detection by Kupeev's algorithm. Excised nodes are placed into an external stack which will contain peaks sorted by weight.

3.5 Two different permutations for matching reference features to detected features. Reference features are annotated $r_n$ and detected features $s_n$.

3.6 An example of several receiver operating characteristic curves plotted on the same axes.

4.7 A visualisation of Kupeev's algorithm applied to an ECG signal. Each peak becomes a node in the binary tree. Subsequent processing will extract the nodes of maximal height to select R peaks whilst discarding P & T waves and high frequency noise.

4.8 The geometry of a peak. This example illustrates a peak P0 containing two child peaks, P1 and P2.

4.9 Detection of turning points in the signal. The input signal f(x) is differentiated to get f′(x), which is clamped to ±1, resulting in g(x). g(x) is differentiated to get g′(x). The negative values of g′(x) indicate the maxima in f(x), and the positive values indicate the minima.

4.10 Creating the final stack S from the tree K. Node d is the leaf with the smallest weight, and is therefore lowest in the stack.

4.11 A shifting window provides quasi real time response, and high pass filtering. The overlapping regions are necessary, since peaks not completely contained within a window will otherwise go undetected.

4.12 ROC curves for R-peak detection. Each curve plots data at σ = 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, & 0.6. Smaller values are plotted towards the lower left hand corner.

4.13 Comparison of two records from the MIT-BIH Arrhythmia database demonstrating the respective directions of R peaks in normal and arrhythmic beats.

4.14 ECG channels presented to the detection algorithm. The R peaks manifest themselves at slightly different temporal locations in channels 0 and 1, leading to double peaks in the complex lead. Low pass filtering the complex lead combines the double peaks into a single one.

5.15 ECG signal and corresponding discrete wavelet transform. The horizontal axis represents time, and the vertical axis the wavelet level. This plot shows wavelet levels 1 through to 5; level 1 appearing at the top. Coefficients of high magnitude are shown as white, and those of low magnitude as black.

5.16 Intersection of wavelet coefficients and a window N. The window corresponds to the range of a single node in the binary tree of Kupeev's algorithm. Each coefficient domain which wholly or partially falls within the range (u, v) contributes to the weight of N.

5.17 Detection results for different noise colours. Each curve is plotted by running the detection algorithm with the following weight threshold values: 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9; smaller values result in a more promiscuous detector and are plotted towards the upper right hand corner.

5.18 Standard deviation of the difference between the actual and detected locations vs. signal to noise ratio for different noise colours. The weight threshold used for each experiment was 0.07.

5.19 The result of applying the wavelet weighted Kupeev detector to a synthetically generated ECG. Undetected P waves were observed to be preceded by a flat TP segment, whereas correctly detected P waves are generally associated with a positive slope in this region and noticeably greater negative deflection.

5.20 The Daubechies-4 mother wavelet. The shape of this wavelet is similar to that of the most commonly manifested P waves, and explains the efficacy of this wavelet in their detection.

6.21 A sliding window would require a dynamically modifiable tree. Leaf nodes occurring before the window are excised from the tree. New nodes entering the window require the tree to be re-rooted. The new nodes become the right hand child of the new root.

6.22 Principle of a possible system to determine the onset of a feature. The onset will occur in the region where there is a change in ∆f(t)/∆t.
Chapter 1
Introduction
An electrocardiograph (ECG) is a Cartesian representation of the electrical
potential generated by the heart. Since its invention in 1887, it has been an
invaluable diagnostic tool for the clinician. Traditionally, the ECG is recorded in
a hospital setting, or by an ambulatory device[41], and the analysis is done offline
by trained clinical personnel. However, recent applications demand analysis online,
where no skilled persons are available, or without any manual intervention.
One such application is the automatic external defibrillator. A defibrillator is a de-
vice which administers a therapeutic electrical shock to a patient exhibiting ventric-
ular fibrillation or other form of ventricular tachyarrhythmia, and if used correctly
will induce the heart to resume normal rhythm. Such equipment has traditionally
been restricted to hospitals since the shock must be applied at the correct instant
with respect to the fibrillating rhythm. However, recent advances in automatic ECG
analysis have led to the development of automatic external defibrillators which detect
the appropriate instant to deliver a shock, and can thus be used by persons with no
more than basic first aid training[79]. Patients susceptible to ventricular fibrillation
are sometimes fitted with an implantable cardioverter-defibrillator. Due to the im-
plantable nature of such a device, it must be capable of quickly analysing the ECG
rhythm, using a minimum of hardware. There have however been reports of these
devices erroneously detecting arrhythmias and delivering inappropriate shocks to
the wearer[75]. Inappropriate shocks can induce arrhythmias and reducing the like-
lihood of such occurrences is vital since induced arrhythmias expose the patient to
the risk of permanent injury or death.
Offline analysis of ECG can sometimes be useful as a prognostic tool. However, the
patterns useful for prognostic purposes are generally very subtle and impossible to
detect by direct human observation[10]. As a diagnostic tool, offline analysis is not
preferred, since it is costly and the delay in diagnosis can be life threatening.
Whilst computer analysis of ECG signals is becoming acknowledged by the medical
profession, the quality of current automatic analysis is questioned in some areas[37,
48], which demonstrates the need for improved techniques. Other novel applications
of ECG analysis outside the medical field also appear in the literature[11, 33] and
continue to attract interest.
1.1 Overview of the ECG.
An elementary understanding of the ECG and the physiological process it repre-
sents is necessary in order to appreciate the motives for ECG analysis and the
processes involved. The heart comprises four chambers, as depicted in figure 1.1.
The two upper chambers are called the atria and lower two the ventricles. The
walls of each chamber consist of myocardial muscles, which themselves comprise
cells called myocytes. Normally, the interior of the cell has a potential of about
−70 mV with respect to the extracellular fluids surrounding the cell. Thus, there
exists an electrical potential across the cell membrane. A cell in such a state is said
to be polarised. A cell with a trans-membrane potential of zero (or slightly positive)
is said to be depolarised. Ions passing through the cell membrane can alter the po-
tential difference. Similarly, particular ranges of membrane potential are favourable
for the passage of the ions of particular molecules. This inter-relationship between
ion channels and the membrane potential is complex (see chapter 18 of [38]), but in
[Figure 1.1: Anterior view of the human heart, illustrating the two atria and ventricles (with the sinus node, right atrium, right ventricle, left atrium and left ventricle labelled). The arrows indicate the direction of blood flow from atria to ventricles and through the pulmonary and aortic valves.]
[Figure 1.2: A single cardiac cycle in an ECG lead representing normal sinus rhythm, with the P, Q, R, S and T points marked, together with the cardiac cycle, PP, RR and QT intervals, the ST segment, and the P and T waves.]
normal healthy subjects, the result is a regular, periodic change in trans-membrane
potential versus time. The sinus node is said to be the primary “pacemaker” of the
heart. Under normal conditions it spontaneously produces an action potential, to
which the rest of the myocardial muscles sympathetically respond. A unique prop-
erty of myocytes is their ability to propagate the trans-membrane potential from
one cell to an adjacent cell. Hence, in normal healthy tissue, a wave of depolari-
sation may be observed moving across the heart. Another property is that sudden
depolarisation causes the myocardium to contract. These two properties result in a
wave of contraction, starting from the sinus node, spreading across the heart in the
inferior direction. In general, the myocardial polarisation is a deterministic, peri-
odic process and this gives rise to the regular heartbeat and ECG signal illustrated
in figure 1.2.
1.2 Physiological interpretation
An ECG recorded for offline analysis is conventionally written to graph paper, with
a horizontal scale of 40ms per division and a vertical scale of 0.1mV per division,
with divisions occurring at 1mm intervals. As a depolarised region moves towards
an electrode, a positive deflection will be recorded on the ECG, and a negative de-
flection will be recorded as the regions closest to the electrode become repolarised.
For historical reasons, the turning points of a normal ECG are conventionally la-
belled P, Q, R, S and T (see figure 1.2). Some texts also include a U wave, but U
is often of very low amplitude or absent altogether. The P wave occurs as the atria
are depolarising, and hence contracting. This typically takes approximately 120ms.
Following the P wave, comes the QRS complex. This represents the ventricles de-
polarising, and completes in about 100ms. Since the ventricles are much larger
cavities than the atria, a larger electrical potential results from their depolarisation
and the QRS complex is of larger magnitude than the P wave. During the T wave,
the ventricles are repolarising. In the human heart, the repolarisation takes place
in the direction of the endocardium to epicardium (i.e. in the opposite direction to
polarisation). Hence the T wave extends in the same direction as the R peak.
1.3 The fiducial points
A simple example of ECG analysis is the measurement of heart rate. This involves
detecting the R peaks and measuring the RR intervals (the time between each ven-
tricular contraction). Normally, this will be identical to the PP intervals, however
under pathological conditions, the two may become independent. Hence detection
of the P wave provides useful data. Advanced ECG analysis typically calls for
segmentation of each beat into its component waves. The relative durations and
amplitudes of each wave are often indicative of certain clinical conditions. For exam-
ple, an abnormally wide P wave is known to be a predictor of atrial fibrillation[22].
Similarly, an elevated voltage in the ST segment is commonly associated with acute
transmural myocardial ischemia (loss of blood supply to the cardiac muscles)[46].
Other studies have used the relative positions and magnitudes of the segmented
components in stochastic learning tools to perform beat classification[20]. Con-
temporary research, thus frequently requires detection not only of the peaks, but
also an accurate location of the onset and offset of each “complex”, and the points
delimiting them. These are generally known as the “fiducial” points. There is no
consensus as to which points are the most useful in ECG analysis, but many studies
have been concerned with the peaks of P, Q, R, S and T as well as the onset and
offset of P and T[45, 34, 70].
1.4 Multiple channel electrocardiograph
The heart is a 3 dimensional organ and a single channel does not contain sufficient
information to fully represent its electrophysiology. In modern clinical use, the elec-
trocardiograph provides 12 channels (called leads). This is achieved by connecting a
total of 10 electrodes to the patient: one to each limb, and six to the chest, as shown
in figure 1.3. The electrode connected to the right leg is used as a signal ground
only. The six chest electrodes, labelled {V1, V2 . . . V6}, have a standard anatomical
placement (see chapter 20 of [38]). V1 is placed just to the right of the sternum
in the fourth intercostal space. Each successive V electrode is placed in successive
positions towards the left of and slightly inferior to the previous electrode, with
V6 located on the midaxillary line at the level of the sixth intercostal space. The
monitoring instrument displays the 12 leads by use of differential amplifiers be-
tween the indicated electrodes. The I, II and III leads are commonly referred to
as Einthoven’s leads, whilst aVR, aVL and aVF are referred to as Wilson’s leads,
after their respective inventors[6].
Whilst the 12-lead ECG is the classical form of the electrocardiograph, the practical-
ities of preparing one, especially in emergency situations, often dictate a simplified
[Figure 1.3: Configuration of the 12 lead electrocardiograph: (a) Wilson’s leads (aVR, aVL, aVF); (b) Einthoven’s leads (I, II, III) together with the chest electrodes V1 . . . V6. These diagrams show an anterior view of the patient. The right leg is normally connected to signal ground (not indicated here).]
form. Thus, in ECGs recorded from clinical situations, it is common to find only
one or two channels of information. Conversely, in a laboratory environment, ECGs
using up to 64 electrodes are used where a detailed map of body surface potentials
is needed[81].
Features which occur in one channel may be attenuated or absent in another. For
this reason, it is sometimes desirable to combine the available channels. One method
of doing this is the so called “complex lead”. The complex lead formula is cited by
Christov[15] as
$$f(x_i) = \frac{1}{L} \sum_{j=1}^{L} \left| f_j(x_{i+1}) - f_j(x_{i-1}) \right| \qquad (1.1)$$

where:

$L$ = the number of leads available,
$f_j(x_i)$ = the $i$th sample from lead $j$.
Since the complex lead is a union of the features from each of its components, it
gives a marked increase in the sensitivity. However, features which occur in more
than one channel typically appear with a small but significant phase delay. Also,
high frequency noise is amplified since (1.1) implements a derivative. Subsequent
application of a low pass filter is required in order to alleviate these effects.
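Equation (1.1) translates directly into array operations. The sketch below is an illustration only (not taken from the thesis); it assumes the digitised ECG is held as a NumPy array of shape (L, n):

```python
import numpy as np

def complex_lead(ecg: np.ndarray) -> np.ndarray:
    """Combine the available leads into a single 'complex lead', eq. (1.1).

    ecg: array of shape (L, n) holding L leads of n samples each.
    Returns n samples; the two endpoints, where no central difference
    exists, are left at zero.
    """
    num_leads, n = ecg.shape
    out = np.zeros(n)
    # Mean over the leads of |f_j(x_{i+1}) - f_j(x_{i-1})|
    out[1:-1] = np.abs(ecg[:, 2:] - ecg[:, :-2]).sum(axis=0) / num_leads
    return out
```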
1.5 Noise in the ECG
Like any physical signal, ECGs suffer from various forms of noise. Noise can arise
from
• Power line interference and other electrical environmental noise. A carefully
prepared ECG can significantly reduce the magnitude of this kind of noise.
However ECGs recorded during emergency situations may not have the benefit
of such careful preparation. Fortunately, this type of noise is generally of a
higher frequency (≥ 50Hz) than the components normally of interest in ECG
analysis.
• Electrode movement. As the patient’s body moves, electrodes can lift away
from contact with the skin. This type of noise manifests itself as near or
complete saturation lasting for up to one second. In some applications where
a multi-channel ECG is used, alternative channels may be used during periods
where this noise is present.
• Respiration noise. This is caused by the patient’s normal respiratory function
giving rise to electrical activity in the intercostal muscles. It manifests itself
as low frequency “baseline shift” with a frequency of less than 0.4 Hz.
• Muscle noise. The ECG from any conscious patient will exhibit noise due to
muscle contractions. These can be particularly troublesome for ECG analysis
since their spectrum and waveform can closely match that of the wanted
signal.
The various noise sources produce a spectral distribution of the form $1/f^\beta$,
where $\beta$ is constant. When $\beta = 0$ the power distribution is uniform; this is
called “white noise” and arises from external sources or from within the measuring
equipment. In modern equipment this kind of noise is not of sufficient magnitude
to be troublesome. Noise for which $\beta = 1$, called “pink noise”, is most typical of
the ambient background generated within the patient, as it naturally occurs in all
stable biological systems[73]. Electrode movement results in “brown noise” ($\beta = 2$),
which is very much richer in low frequency components.
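When testing across noise colours, synthetic $1/f^\beta$ noise can be generated by shaping the spectrum of white noise. The following is a minimal sketch of that standard technique (not a method prescribed by this thesis):

```python
import numpy as np

def coloured_noise(n: int, beta: float, seed=None) -> np.ndarray:
    """Generate n samples of noise with a 1/f**beta power spectrum.

    beta = 0 gives white noise, beta = 1 pink, beta = 2 brown.
    """
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]                # avoid dividing by zero at DC
    spectrum *= freqs ** (-beta / 2)   # power ~ 1/f**beta => amplitude ~ f**(-beta/2)
    noise = np.fft.irfft(spectrum, n)
    return noise / noise.std()         # unit variance, for easy SNR scaling
```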
Because of the various sources and forms of noise, when measuring and reporting
the performance of ECG analysis techniques it is desirable to use test data across
a wide range of signal to noise levels and across all common noise “colours”. Mea-
surement of noise in a recorded ECG can be difficult. Techniques for doing so
involve estimation of the wanted signal using principal component analysis, and
subtraction from the recorded sample[17]. For this reason, testing using syntheti-
cally generated ECG signals, with noise added post hoc is a common alternative to
testing using recorded signals.
There are several approaches to dealing with the problem of noise. One approach is
to filter the signal. This is feasible where the noise is outside of the frequency range
of the wanted signal, or confined to a narrow, well defined band (such as in the case
of power line interference). Filtering however introduces a phase delay, especially
for low pass filters, which may be unacceptable in certain applications.
1.6 Real time aspects
In many situations analysis of the ECG needs to be performed online, and in “real
time”; i.e. within a finite predetermined time frame. For example, the automatic
cardioverter-defibrillator needs to quickly identify a fibrillating rhythm, and identify
the correct instant to deliver a therapeutic shock. Clearly, in these applications,
the online nature puts limits on the extent to which pre-processing can be done.
Filters, for example, have a finite phase delay — the output of the filter lags the
input by a certain time. It can be seen then, that the correct performance of a
ECG analysis algorithm depends not only on the quality of the output it produces,
but also on the time taken to produce that output. Minimising the amount of pre-
processing required, is therefore a major consideration when designing a system for
online analysis.
1.7 Contribution of this thesis
A review of common fiducial point detection techniques found in the literature to
date is presented in chapter 2. In subsequent chapters of this thesis, chapter 3
provides a critical examination of the current metrics used to report the performance
of beat detection and classification algorithms. Some problems with these metrics
are discussed and some alternative metrics proposed. Chapter 4 introduces a novel
algorithm for the detection of QRS complexes, with an emphasis on real time be-
haviour. Despite not having been used previously for ECG analysis, the method’s
precision is comparable to others from the literature. Furthermore, there is a known
upper bound on the running time. An extension to the method is presented in chap-
ter 5, where a combination of a morphological approach with a traditional linear
signal processing technique is used to detect the P and T waves; a problem which
is difficult due to the low amplitude of these waves. Finally in chapter 6, a concise
summary of previous chapters is given and a discussion of possible research which
might follow from the findings.
Chapter 2
Literature Review
With the advent of cheap computer technology, the limits on the complexity
of ECG analysis algorithms have been greatly lifted, and this has resulted
in a surge of interest in the subject. A comprehensive review of the techniques used
in fiducial point detection may be found in [40]. Whilst this review concentrates
on QRS detection, the general techniques are also applicable to detection of the
other fiducial points. However since the QRS complex is the most prominent fea-
ture in the ECG, extracting the less prominent ones often involves thresholding or
other classification techniques as a post processing stage[32]. The QRS is often of
primary interest, since it corresponds to ventricular activity. Sustained abnormal
ventricular beats are often fatal. P waves are of interest, not only for the purpose
of segmentation, but because they represent atrial activity, so patterns in P wave
amplitude and timing can be a useful predictor of atrial fibrillation[68, 22]. P wave
detection is a difficult problem, because the magnitude is often low, and the spec-
trum and morphology can be similar to other features. In early studies, the only
reliable means of detecting P waves used invasive sensors, such as the “oesophageal
pill electrode” swallowed by the patient[35] or electrodes surgically implanted inside
the body[74]. Typical P wave detectors first detect the QRS complex, then use a
backward searching technique over a localised area to find a maximum which is as-
sumed to be the P wave[27], or by ventricular cancellation where the QRS complex
is first isolated and then subtracted from the input signal[80]. The problem with
these approaches, is that they both depend upon reliable QRS detection. Cascading
two detection algorithms, means that the precision of the union is the product of
the precision of the individual stages. A largely unsolved problem in ECG analysis
is the detection of secondary features (those other than the QRS), without having
first determined the QRS location.
For an ECG signal f(t) (in the case of multi-channel ECG, f is a vector function),
detection and classification of fiducial points is generally achieved by a pipeline
system comprising three consecutive stages:
Filtering to remove noise: Noise in the ECG is described in section 1.5. The
output from this stage is a function in the time domain, but not necessarily
the same number of dimensions as the input.
A non-linear transform: This enhances the desired features, whilst attenuating
those parts of the signal which are not of interest. The transform’s output
need not be a function of t.
A decision rule: In the case of a detector, the decision rule has a binary output
(“feature” or “no feature”). For a classifier, many output states are possible.
In some algorithms, one or more stages may be trivial, relying upon the complexity
of other stages to achieve performance. Others may have additional intermediate
stages. In the literature, some algorithms are presented such that the distinction
between the stages is unclear. For example, authors may describe their non-linear
transform as a “filter”[13, 55]. In the context of this chapter, however, filtering refers
to the process of removing unwanted parts of the signal, which would otherwise
interfere with subsequent processes, whereas the non-linear transform’s purpose is
to convert the signal into a form suitable for discrimination by the decision rule.
2.1 Filtering
Most algorithms for fiducial point extraction use filtering as a pre-processing stage,
to remove unwanted components. Some algorithms use differentiation in their non-
linear transform stage, which re-introduces high frequency noise, in which case it
is necessary to postpone filtering or to add a second filter stage to be performed
after the transform. Filters are optimised to a particular noise characteristic. The
most common sources of noise have been described in section 1.5, and the following
sections discuss common filter technologies used to address them.
2.1.1 Linear filters
The traditional “moving average” filter can be used to remove frequency components
within certain ranges. It can be implemented by convolving a filter mask with the
signal. For a signal $f(t)$ and a mask $g(t)$, the convolution operation is defined as

$$(f * g)(t) \triangleq \int_{-\infty}^{\infty} f(x)\,g(t-x)\,dx. \qquad (2.1)$$

However, since this is an integral over infinite bounds, it cannot, in the general case,
be implemented in practical applications. Instead, it is necessary to choose $g(t)$
such that

$$\lim_{|t| \to \infty} g(t) = 0. \qquad (2.2)$$

Then one may choose a value $\kappa$ such that

$$g(t) \approx 0 \quad \text{where } |t| > \kappa. \qquad (2.3)$$

Now it is necessary to integrate only between $t - \kappa$ and $t + \kappa$:

$$(f * g)(t) = \int_{t-\kappa}^{t+\kappa} f(x)\,g(t-x)\,dx. \qquad (2.4)$$
In real-time ECG applications, such as cardioverter-defibrillators, it is desirable to
keep κ as small as possible, since it represents a delay between the time when a
feature occurs, and the earliest time it can be detected. However, effective filter
masks with small values of κ are difficult to design (especially for high pass filters),
and their design is a specialist topic[78].
An alternative implementation is to perform a fast Fourier transform on the signal,
multiply the transformed signal by the Fourier transform of the mask, and then
perform the inverse Fourier transform on the result. This avoids the expense of
convolution, but requires forward and inverse Fourier transform operations, which
are themselves expensive. It also cannot be performed in real time, since the Fourier
transform must be performed on the signal over the whole time domain. A detailed
discussion of various types of linear filter used in ECG analysis may be found in
chapter 5 of [19].
The Gaussian filter

A commonly used mask is the Gaussian function:

$$g(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{t^2}{2\sigma^2}} \qquad (2.5)$$

which has the familiar “bell” shape, produces a smooth result, and satisfies
condition (2.2). The problem that the algorithm designer faces is that of choosing
the most appropriate value of σ. If σ is too large, then not only are wanted features
inadvertently filtered out, but the approximation of (2.3) becomes invalid, and
artifacts in the signal result, triggering false positives in the detection stages. If σ
is too small then the unwanted features will not be sufficiently attenuated.
Typically, Gaussian filters are used to remove the high frequency noise components
and/or the very low frequency components, such as those caused by the patient’s
respiration.
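As an illustration of equations (2.4) and (2.5) together, the sketch below builds a Gaussian mask truncated at $\pm\kappa$ and applies it by discrete convolution; the renormalisation compensates for the mass lost by truncation. The function names are hypothetical:

```python
import numpy as np

def gaussian_mask(sigma: float, kappa: int) -> np.ndarray:
    """Sample the Gaussian of equation (2.5) on [-kappa, kappa]."""
    t = np.arange(-kappa, kappa + 1)
    g = np.exp(-t**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return g / g.sum()    # renormalise so the truncated mask has unit gain

def smooth(f: np.ndarray, sigma: float, kappa: int) -> np.ndarray:
    """Filter a sampled signal by the truncated convolution of eq. (2.4)."""
    return np.convolve(f, gaussian_mask(sigma, kappa), mode="same")
```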
Wiener filtering
If the characteristics of the noise are known, then it is possible to design a mask
optimised to that type of noise. The result is known as the Wiener filter[60] and is a
popular filter in telemetry and image processing. Several studies have used Wiener
filters, with varying degrees of success[54, 44]. The drawback of the Wiener filter is
that it must be optimised to a particular type of noise. A model of the noise has
to be prepared when designing the filter. In different situations, the nature of the
noise can change, rendering the noise model invalid.
2.1.2 Non-linear filters
For practical reasons, modern signal processing methods involve digitising the signal
into discrete samples. In the discrete world, equations such as (2.4) become:
$$(f * g)(t) = \sum_{x=t-\kappa}^{t+\kappa} f(x)\,g(t-x), \qquad (2.6)$$

which may be equivalently represented in vector notation:

$$(f * g)(t) = \mathbf{g} \cdot [f(t-\kappa),\, f(t-\kappa+1),\, \ldots,\, f(t+\kappa)]^T, \qquad (2.7)$$

where $\mathbf{g}$ is a $(2\kappa+1)$-element column vector containing the discrete sampled
values of $g(t)$ and $T$ indicates the transpose. As $\sigma \to \infty$,
$\mathbf{g} \to [1/(2\kappa+1),\, 1/(2\kappa+1),\, \ldots,\, 1/(2\kappa+1)]$; in other words, the
filtered signal becomes the mean of the signal values in the region of the mask.
An alternative type of filter uses the median instead of the mean. Such a
filter is an example of a non-linear filter. General non-linear filters are based upon
order statistics instead of arithmetic statistics. Uses of non-linear filters in ECG
analysis are discussed in chapter 6 of [19]. In the literature, specially optimised
filters and filters employing alternative techniques can be found[59].
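A brief illustration of the difference (my example, not from the cited literature): on a signal containing an impulsive artifact, the moving-average (mean) filter smears the spike, whereas a median filter of the same width removes it outright while leaving the step edge sharp.

```python
import numpy as np
from scipy.signal import medfilt

# A synthetic test signal: a step with a single impulsive artifact.
signal = np.where(np.arange(200) > 100, 1.0, 0.0)
signal[50] += 5.0

# Mean (moving average) filtering smears the spike across the window;
# median filtering of the same width removes it entirely.
mean_filtered = np.convolve(signal, np.ones(9) / 9, mode="same")
median_filtered = medfilt(signal, kernel_size=9)
```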
2.2 Non-linear transforms
A non-linear transform is vital to the fiducial point extraction process. A useful
transform will accentuate the desired features and attenuate those features which
are not of interest. It follows then that the most effective transforms will be opti-
mised to a particular set of fiducial points and to a particular set of input data. In
the Pan-Tompkins detector (see section 2.3.1) the transform used is $f(t) \mapsto f(t)^2$,
which has the effect of exaggerating regions of high amplitude. More complex
transforms may not be representable by simple algebra.
2.2.1 Morphological transforms
Morphological transforms are commonly used in image processing, in order to iden-
tify the peaks and valleys in a signal. The two basic morphological operations are
Minkowski addition $\oplus$ and Minkowski subtraction $\ominus$, defined as:

$$A \oplus B = \bigcup_{\beta \in B} (A + \beta) \qquad (2.8)$$

$$A \ominus B = \bigcap_{\beta \in B} (A + \beta) \qquad (2.9)$$
A common use of morphological operations is found in the field of digital image
processing, where A is the set of pixels comprising an image component and B is an
arbitrary “structuring element”. In signal processing, such as ECG analysis, A is
the signal of interest, and B the structuring element. For ease of use, it is common
to define the terms “dilation”:

$$D(A,B) = A \oplus B \qquad (2.10)$$

and “erosion”:

$$E(A,B) = A \ominus (-B). \qquad (2.11)$$

From these, we further define “opening”:

$$A \circ B = D(E(A,B),\, B) \qquad (2.12)$$

and “closure”:

$$A \bullet B = E(D(A,-B),\, -B). \qquad (2.13)$$
The application of morphological operators for QRS detection was first proposed
by Trahanias[76], who suggested the transforms:

$$PE(f) = f - (f \circ B) \qquad (2.14)$$

and

$$VE(f) = f - (f \bullet B) \qquad (2.15)$$

to extract peaks and valleys respectively.
When applied to a real, single dimensional signal, such as an ECG, B is a function
$$b(t) = \begin{cases} 1 & |t| < K \\ 0 & \text{otherwise} \end{cases}$$
where K is constant. One of the problems of the morphological approach is determining the optimal value of K: if it is too small, the transform will be sensitive to high
frequency noise, and will erroneously emphasise spikes of short duration; however,
K must be less than the width of the components one wishes to detect. Studies
have reported successful results using values between 55–60 ms[70].
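For a sampled one-dimensional signal with the flat structuring element $b(t)$ above, the peak extractor of (2.14) reduces to grey-scale opening, which SciPy provides directly. The sketch below is my illustration, not the implementation used in the studies cited:

```python
import numpy as np
from scipy.ndimage import grey_opening

def peak_extract(f: np.ndarray, k: int) -> np.ndarray:
    """Trahanias peak extractor PE(f) = f - (f o B) of equation (2.14).

    k is the width of the flat structuring element in samples; at a
    sampling rate fs, the 55-60 ms reported above corresponds to
    roughly k = int(0.055 * fs).
    """
    return f - grey_opening(f, size=k)
```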
2.2.2 The wavelet transform
In recent years there has been much research into ECG analysis using wavelet
transforms. Like the Fourier transform, the wavelet transform produces an output
indicating the amplitude of the signal in the frequency domain. However, the Fourier
transform cannot explicitly indicate which temporal range(s) of the signal contain
which frequency components. One way to address this shortcoming is the windowed
Fourier transform, where the signal is first divided into windows of equal duration,
and the Fourier transform applied to each. There are two issues with this approach.
The first is that of deciding the optimal size of the window: if the window is
too large, then no temporal information will be obtained (in the limit it degenerates
to the non-windowed Fourier transform); conversely, the window must be larger
than the period of the components one desires to identify. The second issue is that
of artifacts at the window boundaries. The wavelet transform aspires to overcome
these problems.
Formally, the wavelet transform is defined as

$$w(s, \tau) = \int_{-\infty}^{\infty} f(t)\, \psi^{*}_{s,\tau}(t)\, dt \qquad (2.16)$$

where the basis functions $\psi_{s,\tau}$ are obtained by scaling and translation of a single
function $\psi$, called the mother wavelet:

$$\psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t-\tau}{s}\right). \qquad (2.17)$$
The mother wavelet may be chosen from any function which satisfies a number of
conditions. Details of these conditions and a general description of wavelet trans-
forms may be found in [77] and in many textbooks.
The convolution operator in (2.16) means that the simple application of a wavelet
transform is an expensive operation. Fortunately, however, there is a special case
of the wavelet transform called the discrete wavelet transform (DWT), somewhat
analogous to the fast Fourier transform, which, when applied to a discrete signal
with n samples, operates in time O(n).
The DWT imposes a condition on (2.17) such that $s = 2^j$ and $\tau = k\,2^j$, where $j$ and
$k$ are non-negative integers:

$$\psi_{j,k}(t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{t - k\,2^j}{2^j}\right) \qquad [j \in \mathbb{N},\ k \in \mathbb{N}]. \qquad (2.18)$$
In the literature, $j$ is often called the ‘level’ of the wavelet. For a signal with $n$
samples, meaningful values of $j$ are in the range $[0, \log_2 n]$ and $k$ has the range
$[0, 2^j - 1]$. At $j = 0$, the wavelet transform gives no temporal information, but a high
resolution of frequency information. Conversely, at $j = \log_2 n$, temporal resolution
is high, but frequency resolution is low. The complete DWT is the combination of
$\psi_{j,k}(t)$ for all valid values of $j$ and $k$.
Wavelet transforms are an attractive tool for ECG fiducial point detection since they
allow frequency components in a time series to be identified. Successful fiducial
point detection systems based on wavelet transforms have been implemented offline[45]
or online using purpose-built hardware[63].
discrete wavelet transform[36, 50] because it operates in linear time. The questions
raised when designing a wavelet based detection system include the choice of mother
wavelet, and what level of wavelet coefficients are to be regarded as containing useful
information.
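As a purely illustrative sketch, the multi-level DWT of a signal can be computed with the PyWavelets library; the Daubechies-4 mother wavelet (discussed further in chapter 5) and the sampling rate used here are assumptions, not prescriptions:

```python
import numpy as np
import pywt  # PyWavelets

fs = 360                                    # assumed sampling rate, Hz
t = np.arange(0, 2, 1 / fs)
signal = np.sin(2 * np.pi * 1.2 * t)        # placeholder for an ECG trace

# Five-level discrete wavelet decomposition with the Daubechies-4 wavelet.
coeffs = pywt.wavedec(signal, "db4", level=5)
# coeffs[0] is the coarsest approximation; coeffs[1:] are the detail
# coefficients, ordered from the coarsest level down to the finest.
for level, c in enumerate(coeffs[1:], start=1):
    print(f"detail band {level}: {len(c)} coefficients")
```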
2.2.3 Kupeev’s algorithm
This thesis draws upon the work of Kupeev[42] who presents a general algorithm for
the detection of maxima in a single dimensional signal. It uses a discrete approach,
and does not rely upon filtering, nor on any direct thresholding techniques. The
algorithm provides reliable detection of maxima in an arbitrary signal. The
algorithm was not presented with a view to ECG analysis, but rather, applications
in image processing were envisaged.
Kupeev describes two variants of his algorithm, one for detecting a predefined num-
ber of maxima in a signal, and another for detecting all the maxima which exceed
a given level of “significance”. The “significance” of a maximum is a scalar value
normalised to the range (0, 1). The unnormalised significance is called the “weight”.
Kupeev does not prescribe the definition of “weight”, but suggests the area and the
height of the peaks as candidates. In this context, a ‘peak’ of a signal is an interval
[t1, t2] on the function domain t, (see figure 2.4a) which satisfies the properties:
1. f(t1) = f(t2), and
2. At either t = t1 or t = t2, f(t) satisfies the conditions:
(a) f ′(t) = 0, and
(b) f ′′(t) > 0.
The algorithm creates a binary tree, where each node represents a peak in the
signal. It is a property of this tree that all nodes have as their parent a node
representing the peak formed by the union of its children. Figure 2.4 illustrates the
process. Next, the algorithm iterates over the leaves of the tree, and removes the peaks
of smallest weight, placing them onto a stack. The process continues until all nodes
have been removed from the tree. The result is a stack of leaf nodes sorted by
weight. In a noisy signal many of the nodes will be of very low weight, and can be
easily discarded by setting a suitable threshold. Performing this process efficiently
is non-trivial. However if implemented optimally, the complexity of the algorithm
is O(n log n), where n is the number of samples.
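Kupeev's tree construction is not reproduced here, but the second variant (extracting all maxima exceeding a significance level) can be roughly approximated in a few lines if the weight is taken to be a peak's topographic prominence, which is closely related to the peak-height weight suggested above. This is an analogue for illustration, not Kupeev's algorithm itself:

```python
import numpy as np
from scipy.signal import find_peaks

def significant_maxima(f: np.ndarray, significance: float) -> np.ndarray:
    """Indices of maxima whose normalised weight exceeds `significance`.

    The 'weight' of each peak is taken to be its topographic prominence,
    and the normalisation to (0, 1) divides by the largest weight found.
    """
    peaks, props = find_peaks(f, prominence=0)   # all maxima, with prominences
    if peaks.size == 0:
        return peaks
    weights = props["prominences"]
    return peaks[weights / weights.max() > significance]
```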
2.3 The decision rule
The decision rule is a classification algorithm. It takes each time sample and decides
which feature, if any, occurs at that instant, based on the output of the
non-linear transform. In the case of a simple detector, the classifier is a binary
classifier, and the possible classes are (“feature” and “no feature”). More general
classifiers will have a number of different feature types.
2.3.1 Adaptive thresholding
Thresholding is the most trivial decision rule. This rule states that if the output
from the non-linear transformation exceeds a certain value, then a feature exists,
otherwise no feature exists. In its unmodified form, thresholding is vulnerable to
noise: both low frequency noise (baseline shift) and high frequency interference.
[Figure 2.4: An example of peak detection by Kupeev’s algorithm. (a) Initially, the tree contains one node for each peak in the signal; peak ‘a’ is defined on the range [t1, t2], and the other peaks are defined similarly. (b), (c) The leaf node with the smallest weight is excised from the tree; excising nodes may uncover new parent nodes, which become new leaves. (d) The final peak represents the most significant peak in the signal. Excised nodes are placed into an external stack which will contain peaks sorted by weight.]
For this reason, practical implementations use an adaptive heuristic where the value of
the threshold varies according to data gathered over a predetermined time window
immediately preceding the input time.
Pan-Tompkins detector
A widely cited algorithm by Pan and Tompkins[56] uses a squaring function as its
non-linear transform and an adaptive dual threshold as the decision rule. This is one of
the earliest reliable software based detectors, from which much subsequent work is
derived. The algorithm has the following successive steps:
1. Filtering, for noise removal.

2. Differentiation, to select regions of high slope.

3. Squaring, which enhances the detected slope and discards its sign.

4. Integration over a moving window of 150 ms duration. This helps to select
features which have both large slope and width, reducing false detections
caused by spikes.

5. An adaptive dual threshold, applied to the output of both steps 1 and 4. A
QRS complex is considered to be present if and only if the signal exceeds the
upper threshold in both cases. If a QRS is not detected within 1.66 times
the running average of the RR period, then the thresholding stage is applied
again, but using the lower of the two thresholds.
The Pan-Tompkins detector suffers from a number of problems:
• The algorithm tends to misidentify T waves as QRS complexes, so it is necessary
to compare the slope of the detected feature with that of the previously
detected QRS complex. If the slope is less than one half of that of the previous
QRS, the feature is declared to be a T wave and discarded.
• Any QRS occurring within 200ms of the previous one is not detected. Further,
the dual thresholds are lowered, depending upon the time from the previous
detection.
• The complex nature of the algorithm involves many parameters, e.g. the ratio
between the upper and lower thresholds, the period over which the first threshold
stage applies, the size of the moving window, etc. The authors present results
based on empirical optimisations of these parameters. Whilst the results seem
impressive, it cannot be determined to what extent the parameters have been
over-optimised to suit the test data.
These problems mean that abrupt changes in rhythm or ectopic beats can be missed
or delayed. It also means that the output lags the input by 200ms.
2.3.2 Stochastic methods
One family of classifiers may be jointly called stochastic classifiers. In these types
of classifiers, a generalised underlying model of the process is assumed. However
the parameters of that model are initially unknown. Sampled output from the
system is collected over a period of time. These data are used in conjunction with
the known ideal output, obtained by some other reliable method of classification,
in order to “train” the model with the necessary parameters to perform optimal
classification. Training is normally an iterative process, which consumes significant
resources. However, once trained, a classifier can perform rapid classifications. A
well trained model will then be able to perform classifications on hitherto unseen
data.
There are two general problems with the stochastic approach:
• The training process is subject to “overfitting”; if the model is trained too
well, then it produces good results on the training set, but poor results on
other data. In the case of ECG analysis therefore it is necessary to use a very
generalised training set, or to restrict the application of the system to a
narrow class of input. For example a system trained on ECGs recorded from
healthy subjects might perform poorly when presented with an ECG from a
patient exhibiting premature ventricular beats. It is hard to be sure that a
system has not been over optimised to a particular data set.
• The models are “non-parametric”; their parameters do not contain values which can be associated with any physical or mathematical quantity. This means that, even if a model performs well, it does not contribute to understanding of the physiological process. Neither does it lend itself to easy modification; in general, adding a new record to the training set cannot be done independently, and it is necessary to completely retrain the system.
Neural networks
Neural networks are a mathematical approach to learning problems inspired by the
physiology of the brain. For a general introduction to neural networks see [8]. A
network comprising several “neurons” is used to produce an output from a given input. Each neuron has a single output and many inputs, and computes its output as a non-linear function of the weighted sum of its inputs. Each connection between neurons has a “weight” parameter. The weights are initially set to random values, and progressive training steps mutate the weights into values which produce optimal output. There are many different possible arrangements of the neurons and many different training schemes.
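As an illustration only (a single logistic neuron; the networks cited below differ in their arrangements, activations and training schemes), the computation performed by one neuron might be sketched as:

```python
import numpy as np

def neuron(inputs, weights, bias):
    """One artificial neuron: a non-linear function (here the logistic
    sigmoid) applied to the weighted sum of its inputs."""
    return 1.0 / (1.0 + np.exp(-(np.dot(weights, inputs) + bias)))
```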
In the field of ECG fiducial point detection, neural networks have been used in
conjunction with wavelet transforms[69], regression coefficients[82] and directly on
the sampled data in the time domain[71]. One popular configuration, the “multi-layer perceptron”, uses a layer of “hidden” neurons. Choosing the optimal number of hidden neurons, the most appropriate configuration, and the best representation of input data is crucial to the success of neural network based systems.
Genetic algorithms
In a genetic algorithm, the decision rule is regarded as a parameterised function
of the non-linear transform’s output. The parameters are represented by a string
of binary digits, called a genotype. A population of several hundred genotypes is
created, which are initially set to random values. The genetic algorithm tests each
genotype in turn, and discards those which perform badly. Those which perform
better are selected to form the basis of the population of the next “generation”.
The new generation comprises new genotypes formed by “crossover combination”
of the better genotypes from the previous generation. After a number of generations, the population will have converged, comprising genotypes of optimal performance. Issues to be solved in genetic algorithms include selection
of appropriate function parameters, and the criteria upon which to halt training so
as to avoid overfitting. Successful QRS detectors have been implemented using a
genetic algorithm based system where the parameters are the amplitude thresholds
and time onset and offset of the windows within which a peak is deemed to have
occurred[58]. In the literature there do not appear to be any reports of successful
detection of other features using genetic algorithms.
Hidden Markov models

Markov models consider the ECG to be generated from a probabilistic function according to the state of an underlying state sequence. If the state sequence can be determined, then the model can be used to predict features. An overview of hidden Markov models and their application to signal processing may be found in [61], and a similar, but more concise, tutorial in [26]. The ECG may be modelled by a Markov chain where each feature (or lack thereof) corresponds to a state in the state space[4].
2.4 Summary
Algorithms from the literature can generally be described by the filter—non-linear
transform—decision rule pipeline model, and this can be a useful schema to organise
different techniques. However not all authors choose to describe their work accord-
ing to this model, so some interpretation is required on the part of the reader.
Nevertheless, for the purposes of this thesis, the pipeline model serves well as a
guide to understanding different approaches to fiducial point extraction.
Chapter 3
Assessment of Performance
To determine the efficacy of a system intended to detect or classify features in
an ECG, the system will need to be tested and its performance reported. How
to test it, what parameters to report, and which metrics should be used depends
upon what the system is supposed to do, and the perceived application. Quanti-
tative measures are preferred so as to allow comparison between systems. In the
literature various metrics based upon the number of correctly and incorrectly detected or classified features are customarily used. However they are poorly defined, which raises uncertainty when comparing systems.
3.1 Reference sources
Systems intended to detect or classify heartbeats or features in an ECG are generally assessed by comparison with results from a nominated reference source. For
this purpose, there are several publicly available databases containing pre-recorded
ECG signals and annotations indicating the temporal position and other properties of the salient features. One well known database is the MIT-BIH Arrhythmia
Database[67]. This database has two data channels and an annotation channel
indicating the temporal position of each R peak, the class of each beat and some
auxiliary information. The QT database[2] annotates not only the R peaks, but also
the intra-beat features, for example the P wave and T wave. However, each record
in the database is annotated only for a short period, which limits its usefulness.
Comparison against annotated databases has a number of problems:
• The database may not contain a representative sample of the type of ECG data
which one desires to analyse. This is particularly true for clinical conditions
which are very rare. Conversely, since most databases comprise records taken
from hospital patients, it is unsafe to assume them to be a representative
sample of the general population.
• The annotations are created by human judgement. Typically, two or more
expert cardiologists are asked to indicate where features occur in the records,
and to classify that feature. The annotated temporal location is then deemed
to be the mean average of those indications. Investigation has shown that
disagreement between experts is common, and that the same cardiologist can
have a different interpretation of the same ECG on different occasions[64].
An alternative reference source may be found in the form of synthetically generated
signals. Such a reference has the following advantages:
• The temporal location of the features are precise and unambiguous.
• Signals satisfying particular criteria may be generated on demand.
• Noise free signals may be generated. Alternatively, noise of a particular magnitude and of a particular type can be added.
• The sampling rate can be chosen to suit the application.
However, the artificial nature of such signals leads one to question how well they
model a signal found in clinical practice. Testing a system using both synthetic and
database reference sources would provide the most comprehensive evaluation of the
system, but in the literature this is rarely done.
3.2 Reference matching
Before any metric can be calculated, it is necessary to match each beat or feature
from the system’s output against the reference source. A number of problems arise
here:
• The system may not define its feature set or beat class according to the same
rules as the reference source. In general, some mapping from classes in the
system’s domain to the reference domain is necessary. For example, a database
may consider bundle-branch-block beats as a separate class, whereas they
would be regarded as “normal” by a system interested only in discriminating
ventricular ectopic beats.
• Differences in sampling rate, and the accuracy of both the reference source
and the system will mean that no feature will occur at precisely the same
instant. The researcher must choose a tolerance, between the reference time
and the system time, within which an annotation from the reference and a
detection from the system are deemed to refer to the same feature.
• However refined a system, it is unreasonable to expect that every feature
present in the reference source will be detected or that all features detected
will have corresponding features in the reference source, or that both sources
will contain the same number of features. Hence some algorithm for matching
features from the reference source to those from the system must be defined,
and this algorithm must be tolerant to features which may be present in one
source, but not the other.
These issues are discussed in greater length below.
3.2.1 Matching tolerance
The length of time that might be considered a reasonable matching tolerance depends, amongst other factors, upon the type of features being matched. Clearly, as the tolerance increases, the number of “false negatives” decreases, giving a false appearance of improved performance. The AAMI/ANSI standard EC57:1998[1]
recommends 150ms. Whilst this figure is reasonable for the detection of individual
beats in a healthy patient, it is clearly too wide for a system detecting P, Q and R
peaks, or when detecting beats during periods of fibrillation. In these circumstances,
features may be as little as 120ms apart. In the literature, many authors have published figures for Se and Sp (see section 3.3) which are quite high. However, in many
of these papers (both those published before and after AAMI/ANSI EC57:1998) the
matching tolerance is either not mentioned or appears to be arbitrarily selected. In
at least two studies, different asymmetric tolerances are used[23, 28] with no mention of why these figures were chosen. Another study[15] uses two tolerances: beats which fall outside the lower tolerance but within the higher are regarded as “shifted errors” and used to modify the definitions of Se and Sp in a way which the authors justify in the paper, but which is nevertheless not in general use. According to one report[83], opinions on the acceptable tolerance for QRS duration (and by implication, the position of the Q and S features) vary from ±7ms to as low as ±2ms.
It should be noted however, that most reference databases are available only in
digitised format with a sampling period of between 2.7ms and 4ms. Further, in the
case of a 12-lead ECG, a propagation delay between leads will be observed, making
the precise instant of a feature’s occurrence ambiguous.
3.2.2 Matching algorithms
Having decided upon an appropriate acceptance tolerance, there are a number of
ways of matching the reference annotations to the features detected by the system.
By way of example, figure 3.5 illustrates two plausible ways of matching a short
sequence of reference features to a detected sequence.

Figure 3.5: Two different permutations for matching reference features to detected features. Reference features are annotated $r_n$ and detected features $s_n$.

Formally, the problem may be described as follows:
Given a series of reference features
$$R = r_1, r_2, \ldots, r_i, \ldots, r_{|R|}$$
and a series of features as detected by the system
$$S = s_1, s_2, \ldots, s_j, \ldots, s_{|S|},$$
one desires to find a series of pairs
$$W = w_1, w_2, \ldots, w_k, \ldots, w_{|W|}, \qquad \max(|R|, |S|) \le |W| < |R| + |S|,$$
where each element takes the form $w_k = (i, j)_k$, $i, j \in (\mathbb{N} \cup X)$, where $X$ indicates a missing value. The series $W$ matches each feature in the reference source to a feature detected by the system. There is a constraint on $W$:
$$w_k = (i, j),\; w_{k+1} = (i', j') \implies i \le i' \le i + 1 \;\forall i \ne X, \quad j \le j' \le j + 1 \;\forall j \ne X,$$
which ensures that all elements from both the reference and the system are matched, and that no pair of matched features is straddled by another.
Dynamic time warping
There are many possible combinations for W but there is no obvious optimal choice.
Finding such a combination is a process known as “dynamic time warping”[39]. One
approach is to choose a combination such that the total of the differences between a feature in the reference and its matched feature in the system is minimised; in other words, to minimise
$$\sum_{k=1}^{|W|} |w_{k,i} - w_{k,j}|. \tag{3.1}$$
Another choice would be to minimise the sum of squares. These choices would find
the “best fit” of the system to the reference source, but would not be optimal in
terms of reducing the number of false positives and false negatives.
Instead, if we were to maximise
$$\sum_{k=1}^{|W|} x_k, \tag{3.2}$$
where
$$x_k = \begin{cases} 0 & \text{if } w_{k,i} = X \text{ or } w_{k,j} = X \\ 1 & \text{if } |w_{k,i} - w_{k,j}| < \theta \\ 0 & \text{otherwise,} \end{cases}$$
and θ is the chosen acceptance tolerance, then false detections will occur with
the lowest possible frequency. However, this approach is arguably inappropriate
since it deliberately optimises the test metric in order to find a figure which most
closely matches the desired result, rather than objectively reporting the results of
the system.
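Given a matched series $W$ in the representation used above, the objective of equation (3.2) reduces to a simple count. A minimal Python sketch, in which None plays the role of the missing value $X$ and times in R and S share the units of theta:

```python
def matched_count(W, R, S, theta):
    """Objective of equation (3.2): count matched pairs whose time
    difference lies within the acceptance tolerance theta."""
    return sum(1 for (i, j) in W
               if i is not None and j is not None
               and abs(R[i] - S[j]) < theta)
```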
General solutions to the dynamic time warping problem require quadratic time
to find, although methods for obtaining faster solutions are an area of active
research[65]. The quadratic time requirement may be one reason why general dynamic time warping techniques are not used in the assessment of ECG feature detection/classification systems. However, this need not be an issue, since assessment can be performed offline, and need not affect the speed of the system itself.
Furthermore, a typical database record is of 30 minutes duration, which for a system detecting R peaks corresponds to approximately 1800 features, which is well
within the capabilities of modern computer hardware. Despite this, AAMI/ANSI
EC57:1998 recommends a very simplified form of the general time warp matching
algorithm. The algorithm is described in verbose terms by section 4.3.2 of the standard but, using the notation of this chapter, may be written:
1. Let $i = j = k = 1$.

2. If $s_j < r_i$ then
   if $|s_j - r_i| < |s_{j+1} - r_i|$ and $|s_j - r_i| < \theta$ then
   $w_k = (i, j)$; increment $i$; increment $j$;
   else $w_k = (X, j)$; increment $j$;
   else
   if $|s_j - r_i| < |s_j - r_{i+1}|$ and $|s_j - r_i| < \theta$ then
   $w_k = (i, j)$; increment $i$; increment $j$;
   else $w_k = (i, X)$; increment $i$.

3. Increment $k$.

4. Goto 2.
Elements of the form (i, X) indicate a feature is present in R, but a matching feature
cannot be found in S — a false negative. Elements of the form (X, j) indicate a
feature which is present in S, but a matching feature does not exist in R — a false
positive.
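For concreteness, a direct transcription of this procedure into Python is sketched below. It is an illustrative rendering only, not a reference implementation distributed with the standard: R and S are lists of feature times, theta is the acceptance tolerance in the same units, and None plays the role of the missing value X.

```python
def match_annotations(R, S, theta):
    """Simplified AAMI/ANSI EC57-style matching of reference features R
    to detected features S.  Returns the series W of matched pairs of
    indices; (i, None) marks a false negative, (None, j) a false positive."""
    W, i, j = [], 0, 0
    INF = float("inf")
    while i < len(R) or j < len(S):
        r = R[i] if i < len(R) else INF
        s = S[j] if j < len(S) else INF
        r_next = R[i + 1] if i + 1 < len(R) else INF
        s_next = S[j + 1] if j + 1 < len(S) else INF
        if s < r:
            # Detection precedes the reference: pair them only if this
            # detection is the closest candidate and within tolerance.
            if abs(s - r) < abs(s_next - r) and abs(s - r) < theta:
                W.append((i, j)); i += 1; j += 1
            else:
                W.append((None, j)); j += 1
        else:
            if abs(s - r) < abs(s - r_next) and abs(s - r) < theta:
                W.append((i, j)); i += 1; j += 1
            else:
                W.append((i, None)); i += 1
    return W
```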
This matching algorithm, whilst simple, considers only those features which are
closest to each other as matching pairs, whereas the general dynamic time warping
problem permits features at opposite ends of the series to be matched. Whilst it
would be unreasonable to match a feature to a reference annotation several minutes
apart, it is plausible that more than one intermediate feature might exist in either
series between the matched pairs.
3.3 Reporting and visualising performance
Performance is often reported by quoting the sensitivity and specificity when matching against a reference source with some acceptance tolerance. In the literature sensitivity is defined as
$$Se \triangleq \frac{TP}{TP + FN} \tag{3.3}$$
and specificity as
$$Sp \triangleq \frac{TN}{TN + FP}, \tag{3.4}$$
where the following definitions apply:

TP the number of true positives,
FN the number of false negatives,
TN the number of true negatives,
FP the number of false positives.
Both metrics are real numbers in the range (0, 1) and are often expressed as percentages. Sensitivity expresses the ability of a system to detect features, whereas specificity represents the ability to reject false detections. It is trivial, but not useful, to produce a system which has a sensitivity of 100% (by declaring a detection at every instant) or a specificity of 100% (by never declaring any detection at all). In a useful system, both figures will approach 100%.
In practice, a compromise between high sensitivity and high specificity is necessary.
It is commonly found that adjusting a parameter of the system increases one figure
but reduces the other. For example, reducing a decision rule threshold would result
in a more promiscuous detector, hence a higher sensitivity but a lower specificity.
3.3.1 The receiver operating characteristic
One way of concurrently representing the specificity and sensitivity is the receiver
operating characteristic (commonly abbreviated to ROC). This is a Cartesian plot indicating how Se varies with Sp. By convention, Se is plotted on the vertical scale and $1 - Sp$ on the horizontal scale, as shown in figure 3.6. For an ideal detector, the entire plot would be clustered in the upper left hand corner, whereas a “detector” which randomly “guesses” features would result in a diagonal line running from bottom-left to top-right. The ROC representation is useful because several curves can be plotted on the same axes, thus giving a good visualisation of how a detector performs when various parameters are changed.

Figure 3.6: An example of several receiver operating characteristic curves plotted on the same axes.
3.3.2 Problems with traditional definitions of sensitivity
and specificity
The receiver operating characteristic was developed in the 1950s during research
into interference in radar equipment. Today, it is commonly used in the reporting
of medical decision making, including ECG analysis. However, there are aspects of
the definition of the ROC and the metrics it represents which are misunderstood in
the literature and this leads to inappropriate use.
The definitive texts [57, 72] on the ROC do not mention “sensitivity” or “specificity”. Instead, the ROC is defined in terms of probabilities. Let the following
events assume the indicated definitions:
α0 the event that the system indicates a feature is present,
ω0 the event that a feature is actually present at the system’s input,
and
ω1 the event that no feature is present at the system’s input.
The ROC curve plots P (α0|ω0) on the vertical scale and P (α0|ω1) on the horizontal
scale. Now P (α0|ω0) is the probability of detecting a feature which is known to be
present and P (α0|ω1) is the probability of a “false alarm”.
For digitally sampled signals, one could write:
$$P(\alpha_0|\omega_0) = \frac{TP}{TP + FN} \tag{3.5}$$
and
$$P(\alpha_0|\omega_1) = \frac{FP}{FP + TN}. \tag{3.6}$$
Hence,
$$1 - P(\alpha_0|\omega_1) = 1 - \frac{FP}{FP + TN} = \frac{FP + TN}{FP + TN} - \frac{FP}{FP + TN} = \frac{FP + TN - FP}{FP + TN} = \frac{TN}{FP + TN}. \tag{3.7}$$
Equations (3.5) and (3.7) appear identical to the definitions of Se and Sp, which has prompted their adoption in the assessment of ECG analysis systems.
However most features of interest in an ECG are discrete events, for example, the
R peak, the P wave onset/offset etc. Careful examination of equations (3.5) and
(3.7) reveals that they are valid for the measurement of continuous events, but not
for discrete events occurring in a continuous medium. A “positive” (either true or
false), since it is instantaneous, does not occupy the entire duration of one sample,
whereas “negatives” have a duration equal to the sampling period. Hence addition
and division of these heterogeneous values have no meaningful result. The problem affects Sp most severely, and is most easily demonstrated by way of example.
Example 1: A 30 minute ECG with a mean heart rate of 1 beat per second will contain 1800 beats. Consider a system which correctly detects all 1800 beats, but erroneously reports an additional 1800 beats. Thus FN = 0, TP = 1800 and FP = 1800. If the system is sampled at 100Hz, then the total number of samples is $30 \times 60 \times 100 = 180{,}000$, so $TN = 180{,}000 - 0 - 1800 - 1800 = 176{,}400$. Hence,
$$Sp = \frac{176400}{176400 + 1800} = 98.9898\%. \tag{3.8}$$

Example 2: If however, the same record is sampled at 500Hz, and tested with the same system, then the total number of samples is $30 \times 60 \times 500 = 900{,}000$. Hence $TN = 900{,}000 - 0 - 1800 - 1800 = 896{,}400$, and
$$Sp = \frac{896400}{896400 + 1800} = 99.7995\%. \tag{3.9}$$
From these examples two problems are clear:
1. Despite the fact that there are just as many erroneous detections as correct detections, Sp is very close to 100%.
2. Increasing the sampling rate gives the appearance of improving performance.
Problem 1 means that specificity, as defined by equation (3.4) is of limited usefulness
when visualising the performance of a system, since in almost all systems it will be
close to 100%. Problem 2 is a particular concern since it means that a parameter
which is independent of the system affects the measure of the system’s performance.
The underlying problem is that features of interest are not continuous but instantaneous. Hence, increasing the sampling rate does not in general increase FP but
will most certainly result in an increase of TN.
3.3.3 Alternative definition of sensitivity and specificity
Despite these problems, Se and Sp have been freely used in the literature to report
the performance of ECG detection systems. One author[15] alludes to the problem
with the words:
The logic of using shifted errors [. . . ] is that thus the total number of
beats in a record retains its value. Otherwise it would change depending
on the type and number of errors and thus impede correct computation
of Se and Sp.
However, the Se and Sp metrics continue to be used, it would appear, for no other
reason than convention. What is required is a calculation of the probabilities of correct detection and of “false alarm” based on the digitised results. Here, a modified calculation of Se and Sp is proposed. Let
$$Se' \triangleq \frac{TP \cdot \theta}{TP \cdot \theta + FN/s} \tag{3.10}$$
and
$$Sp' \triangleq \frac{TN/s}{TN/s + FP \cdot \theta}, \tag{3.11}$$
where $s$ is the sampling rate of the digitised ECG record and $\theta$ is the acceptance tolerance discussed in section 3.2.1. Note that the dimensions of both the numerator and denominator in these definitions are time. By the following examples, it can be seen that the metric is largely independent of the sampling rate and, for cases where FP is high, has a value significantly less than 100%.
Example 3: Using the figures from the previous examples, the value for Sp′ when sampled at 100Hz, and using an acceptance tolerance of 150ms, is
$$Sp' = \frac{176400/100}{176400/100 + 1800 \times 0.15} = \frac{1764}{1764 + 270} = 86.7257\%. \tag{3.12}$$

Example 4: However, increasing the sampling rate to 500Hz does not have a significant effect on the value of Sp′:
$$Sp' = \frac{896400/500}{896400/500 + 1800 \times 0.15} = \frac{1792.8}{1792.8 + 270} = 86.911\%. \tag{3.13}$$
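The arithmetic of examples 1–4 can be reproduced in a few lines of Python (a sketch; the function names are illustrative):

```python
def sp(tn, fp):
    """Traditional specificity, equation (3.4)."""
    return tn / (tn + fp)

def sp_prime(tn, fp, s, theta):
    """Modified specificity, equation (3.11): s is the sampling rate in
    Hz, theta the acceptance tolerance in seconds."""
    return (tn / s) / (tn / s + fp * theta)

# 30 minute record, TP = FP = 1800, FN = 0, theta = 0.15 s.
for s in (100, 500):
    tn = 30 * 60 * s - 1800 - 1800
    print(s, sp(tn, 1800), sp_prime(tn, 1800, s, 0.15))
```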
Using Se′ and Sp′ instead of Se and Sp has the advantages that
• The figures more closely represent the historical definitions, and have mean-
ingful mathematical interpretations.
• They are independent of the sampling rate.
• Large numbers of false positives and/or false negatives are more easily seen.
3.4 Time taken for detection
In real time applications, a system’s performance is measured not only by the quality
of the results it produces, but also the time taken to produce those results. Hence,
for completeness, studies should report the time between a feature’s occurrence
and the feature being detected by the system. For many systems, especially those
which employ windowed techniques, this time will not be constant. In these cases,
the maximum time or parameters indicating the distribution of the detection time
should be reported.
3.5 Measurement of precision
Whilst many applications are not overly concerned with the difference between the
detected time and the event time, it can be important for cardioversion and other
therapeutic purposes. Biomedical engineers working in such areas will therefore
be interested in the mean and standard deviation of the differences. However, in
the literature, very few studies report both the mean and standard deviation of
the differences. A system with systematic error will have a non-zero mean, and
applications can subtract this value to improve results. Knowing the standard
deviation will allow estimates, using the central limit theorem, of the proportion of
detections falling outside any particular tolerance to be made.
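For example, under the assumption of normally distributed timing errors, such an estimate could be computed as follows (a minimal sketch; the function name is illustrative):

```python
from math import erf, sqrt

def fraction_outside(tolerance, mean, std):
    """Estimate the proportion of detection-time errors falling outside
    +/- tolerance, assuming errors are normally distributed with the
    given mean (systematic error) and standard deviation."""
    phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))  # standard normal CDF
    inside = phi((tolerance - mean) / std) - phi((-tolerance - mean) / std)
    return 1.0 - inside
```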
3.6 Assessing feature classifiers
So far, this chapter has discussed only the assessment of systems designed to detect features, and not those designed to classify them. A feature classifier can be
considered as the generalised case of a detector. Whereas a detector makes only
two possible decisions, α0 or α1, a classifier’s decision has a number of possibilities
{α1, α2 . . . αn}, where one possible decision (say αn) is “no feature”. A classifier
might be designed to decide the waveform of a feature (i.e. P, Q, R, S or T) or to classify beats according to their clinical diagnosis: normal beat, premature ventricular beat, supra-ventricular beat, bundle branch block beat etc.
The literature to date compares techniques on a low-level, parameter-by-parameter basis[14, 21, 16]. These comparisons are of interest to those working on the development of new algorithms or the enhancement of existing ones, but are of little
interest to a clinician when making a decision about which algorithm suits a par-
ticular purpose.
3.6.1 Problems with current classification methods
Performance is reported either by a table giving counts of correctly and incorrectly
classified beats, or by way of statistics inferred from such a table. Recognising
that classification is the generalised case of detection, researchers have attempted
to generalise the traditional definitions of Se and Sp. However these values are
defined only for binary classification, and do not readily lend themselves to problems
involving more than two classes. Nevertheless, in the literature, one often sees beat
classifier performance reports where sensitivity and specificity are freely quoted.
Whilst the definitions of these measures for the context are normally not given,
they appear to use the following extended definitions:
$TP_j$ the number of instances where the system decides $\alpha_j$ and the feature belongs to class $\omega_j$,

$FP_j$ the number of instances where the system decides $\alpha_j$ and the feature belongs to a class other than $\omega_j$,

$TN_j$ the number of instances where the system makes a decision other than $\alpha_j$ and the feature belongs to a class other than $\omega_j$,

$FN_j$ the number of instances where the system makes a decision other than $\alpha_j$ and the feature belongs to class $\omega_j$.

Hence $Se_j$ and $Sp_j$ for each value of $j$ can be defined accordingly.
In addition to the concerns of section 3.3.2, further problems become apparent when
using such statistics to evaluate the performance of a classifier:
1. They do not take into account the a priori probabilities of the features.
2. They do not take into account the relative costs of false classification.
3. They can be presented only as a multi-dimensional value, even where only
two classes are being considered. There is no obvious single ordinal value.
Problem 1 has been recognised in the medical literature[3]. Problems 2 and 3, however, are rarely mentioned, possibly because the metrics are devised by biomedical engineers and not clinicians. Problem 3 makes reports particularly unhelpful from the point of view of the clinician trying to compare systems with a view to adopting one for use. For an $n$ class classifier, there are $2n$ scalar quantities, so ranking classifiers using these quantities is not possible.
In the following section a new method is proposed, which overcomes these problems
and aims to be generally useful for the quantitative comparison of beat classification
systems.
3.6.2 Proposed methodology
A system’s utility as a prognostic medical tool is a measure of the benefit afforded
by selecting it against other alternatives. Choosing a system involves maximising
the benefit, or alternatively, minimising the risk. A measure for the overall risk
associated with making a decision based upon the output of a classifier is a useful
measure of its performance. Risk is characterised by the probability of error and
the costs associated with making a decision based upon the erroneous classification.
This proposal uses Bayesian decision theory to determine a method of calculating
the associated risk.
Review of Bayesian risk
Bayesian decision theory is presented in many texts on statistics and classification
theory [66, 25, 72] and will be introduced here only briefly. In a system which is
claimed to recognise n different classes of beats {ω1, ω2 . . . ωn}, there are n possible
outputs, $\{\alpha_1, \alpha_2 \ldots \alpha_n\}$. Bayes' rule states:
$$P(\omega_j|\alpha_k) = \frac{P(\alpha_k|\omega_j)\,P(\omega_j)}{P(\alpha_k)}. \tag{3.14}$$
Since $\{\alpha_1, \alpha_2 \ldots \alpha_n\}$ are mutually exclusive and $P(\bigcup_{i=1}^{n} \alpha_i) = 1$, then
$$P(\alpha_k) = \sum_{i=1}^{n} P(\alpha_k|\omega_i)\,P(\omega_i). \tag{3.15}$$
The quantity $P(\omega_j)$ is called the a priori probability, $P(\alpha_k|\omega_j)$ the likelihood or class conditional probability, and $P(\omega_j|\alpha_k)$ the a posteriori probability. From (3.14) and (3.15) one can write:
$$P(\omega_j|\alpha_k) = \frac{P(\alpha_k|\omega_j)\,P(\omega_j)}{\sum_{i=1}^{n} P(\alpha_k|\omega_i)\,P(\omega_i)}. \tag{3.16}$$
Let $\lambda(\alpha_k|\omega_j)$ be the cost incurred for making decision $\alpha_k$ when $\omega_j$ is the true class. Therefore, the risk of making decision $\alpha_k$ based upon the classifier's output (the risk of reliance) is:
$$R(\alpha_k) = \sum_{j=1}^{n} \lambda(\alpha_k|\omega_j)\,P(\omega_j|\alpha_k). \tag{3.17}$$
Combining the above gives
$$R(\alpha_k) = \frac{\sum_{j=1}^{n} \lambda(\alpha_k|\omega_j)\,P(\alpha_k|\omega_j)\,P(\omega_j)}{\sum_{i=1}^{n} P(\alpha_k|\omega_i)\,P(\omega_i)}. \tag{3.18}$$
The overall risk of relying on a classifier is
$$R = \sum_{k=1}^{n} R(\alpha_k)\,P(\alpha_k), \tag{3.19}$$
or equivalently
$$R = \sum_{k=1}^{n} \sum_{j=1}^{n} \lambda(\alpha_k|\omega_j)\,P(\alpha_k|\omega_j)\,P(\omega_j). \tag{3.20}$$
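Equation (3.20) translates directly into a few lines of code. The sketch below assumes the class conditional probabilities, priors and costs are held as NumPy arrays indexed as indicated in the comments:

```python
import numpy as np

def overall_risk(p_alpha_given_omega, priors, costs):
    """Overall risk R of equation (3.20).  p_alpha_given_omega[k, j] is
    P(alpha_k | omega_j), priors[j] is P(omega_j), and costs[k, j] is
    lambda(alpha_k | omega_j)."""
    return float(np.sum(costs * p_alpha_given_omega * priors[None, :]))
```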
A proposed new measure of classifier utility

In this proposal $\{R(\alpha_1), R(\alpha_2) \ldots R(\alpha_n)\}$ is used in the consideration of a classifier's utility, and $R$ is used for overall rating. $R$ has the range $(0, \infty)$ and its units are the same as those chosen for $\lambda(\alpha_k|\omega_j)$. In many circumstances a unitless measure, having the range $(0, 1)$, will be more useful. Accordingly, a normalised metric
$$\bar{R} = \frac{R}{R_{max}}, \tag{3.21}$$
where $R_{max}$ is the value obtained from equation (3.18) when the class conditional probabilities are set to
$$P(\alpha_k|\omega_j) = \begin{cases} 1 & \text{if } \lambda(\alpha_k|\omega_j) = \max_i \lambda(\alpha_i|\omega_j) \\ 0 & \text{otherwise} \end{cases} \tag{3.22}$$
is suggested. Thus, a perfect classifier has a $\bar{R}$ value of zero and, at the opposite extreme, unity.

Equation (3.18) comprises the terms $P(\alpha_k|\omega_i)$, $P(\omega_j)$ and $\lambda(\alpha_k|\omega_j)$. $P(\alpha_k|\omega_i)$ are parameters of the classifier and can be tested experimentally. $P(\omega_j)$ and $\lambda(\alpha_k|\omega_j)$ are parameters of the classes of interest. They are respectively the a priori probabilities and the costs of making decisions.
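A sketch of the normalised metric follows, with $R_{max}$ obtained by substituting the degenerate class conditional probabilities of equation (3.22); array conventions are as in the previous sketch:

```python
import numpy as np

def normalised_risk(p_alpha_given_omega, priors, costs):
    """Normalised risk of equation (3.21)."""
    risk = float(np.sum(costs * p_alpha_given_omega * priors[None, :]))
    # Equation (3.22): a degenerate classifier that always makes the
    # costliest decision for each true class.
    worst = np.zeros_like(costs)
    worst[np.argmax(costs, axis=0), np.arange(costs.shape[1])] = 1.0
    r_max = float(np.sum(costs * worst * priors[None, :]))
    return risk / r_max
```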
3.6.3 Worked example
AAMI/ANSI EC57:1998 section 4.3 identifies 5 classes of beats which are recommended in performance reports, viz: normal beats, supra-ventricular ectopic beats, ventricular ectopic beats, fusions of normal and ventricular ectopic beats and other unclassified beats. This section shows how a value for $R$ can be calculated, using secondary sources and published figures from each of the systems.
Calculation of a priori probabilities
Table 3.1 shows the a priori probabilities. These were calculated using the beat counts from the MIT-BIH Arrhythmia database. The records chosen were the first group of records (numbers 100–124) from the database. The second group (numbers 200–234) were disregarded since they are not randomly selected data but were deliberately selected by the authors of the database to contain “rare but clinically important phenomena”. The first group, however, was randomly selected so as to “serve as a representative sample of the variety of waveforms and artifact that an arrhythmia detector might encounter in routine clinical use”.

 j   ω_j                                       c_j      P(ω_j)
 0   Not a Beat                                 577        -
 1   (n) Normal Beat                          46097      0.9670
 2   (sveb) Supra-Ventricular Ectopic Beat      192      0.0040
 3   (veb) Ventricular Ectopic Beat            1345      0.0282
 4   (f) Fusion of Normal and VEB                13      0.0002
 5   (q) Unclassified Beat                        0      0.00

Table 3.1: A priori probabilities derived from the MIT-BIH Arrhythmia database.
Class 0 was disregarded, since these annotations are not beats, but are used to mark other interesting features in the signal. $c_j$ are the counts of beats of class $\omega_j$. $P(\omega_j)$ was calculated by dividing $c_j$ by $\sum_{j=1}^{5} c_j$. Note that $c_5 = 0$; we therefore conclude that beat classes other than 1–4 are sufficiently rare to have a negligible effect on the utility of a system.
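This calculation is easily reproduced; a minimal Python sketch using the counts of table 3.1:

```python
# Beat counts for classes 1-5, as in table 3.1 (class 0 disregarded).
counts = {"n": 46097, "sveb": 192, "veb": 1345, "f": 13, "q": 0}
total = sum(counts.values())
priors = {k: c / total for k, c in counts.items()}
# priors["n"] is approximately 0.967, priors["veb"] approximately 0.028
```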
Calculation of costs
Cost is a subjective concept, and will depend upon the objectives of the researcher.
Table A.9 shows a set of costs of making an incorrect classification calculated on
a particular set of propositions and criteria. The derivation of this table, and the
propositions and criteria used, are given in appendix A.
               de Chazal et al.   Melo et al.
 R(α_n)             2.05             1.67
 R(α_sveb)         65.46            17.91
 R(α_veb)           0.07             0.03
 R(α_f)             1.79             0.71
 R                  6.67             1.71
 R̄                 0.126            0.033

Table 3.2: Comparative performance of two classifiers.
Calculation of risk
Together with the values of $P(\alpha_k|\omega_j)$, tables 3.1 and A.9 enable the risk to be calculated. Unfortunately, in many cases the literature presents neither the values for $P(\alpha_k|\omega_j)$, nor the table of beat-by-beat comparisons from which they could be deduced. A search of the literature revealed only two classifiers for which this data was reported. These are the classifiers of de Chazal et al. [20] and of Melo et al. [53]. Melo et al. publish separate results for aberrated and non-aberrated atrial premature beats. For the purposes of comparison, aberrated and non-aberrated atrial premature beats are regarded as a single class ($\omega_{sveb}$).
From tables 3.1 and A.9, the value of $R_{max}$ was calculated, as described in section 3.6.2, as $52,817.
The results are shown in table 3.2. In these results, the overall risk $R$ is significantly lower for the Melo classifier, and the risks of reliance $R(\alpha_i)$ are also lower for all $i$. In
other words, this classifier dominates in all respects. In general however this may
not be the case, and one classifier may have a lower risk of reliance for one decision
whilst having a higher risk of reliance for another.
3.6.4 Summary of classifier assessment
A system designed to classify beats into more than two classes is not a binary classifier, and performance should not be reported as if it were. The utility of the classifier cannot be fully quantified in terms of the number of correct and incorrect beats. Instead, the number of misclassifications for each class is required. This may be reported as an $n \times n$ matrix of beat classifications (the class conditional frequencies). Together with the a priori probabilities and the costs of misclassification,
quantitative measures of a classifier’s utility can be calculated.
AAMI/ANSI EC57:1998 describes how to compile such a matrix, but makes no recommendation for its publication. This thesis recommends publication of the matrix, either within the text of the paper or by reference to an external source. It is trivial to calculate sensitivity and specificity from such a matrix if desired, and it allows for more useful measures of performance as described above. Clinicians wishing to assess a classifier need to obtain estimates for the costs of misclassification, and calculate the overall risk of reliance.
3.7 Conclusion
Fiducial point extraction algorithms are assessed by comparing their results with features from a nominated reference source. Features which fall within a certain tolerance of a reference feature are considered to be matched, and the others mismatched. There is no general consensus as to the most appropriate matching tolerance to use, and there are variants on the matching process. The precise details of the matching algorithm used to assess a system are seldom mentioned in the literature, but are nevertheless important for repeatability of results and meaningful comparison.
The commonly used calculations for sensitivity and specificity do not result in a true
figure for the probabilities intended by their historical definitions. Misunderstandings are likely to occur if this detail is not appreciated by the research community.
The proposed alternative calculations given in (3.10) and (3.11) may help to avoid
such misunderstanding.
In order to make performance results pertinent to a real-time system, the total time taken to produce results and the phase delay must at least be mentioned. For time and precision results, the mean provides only the bare minimum of information that the biomedical engineer requires. If the full table of results is not published, then the mean and standard deviation, or some other description of the probability density function, should be provided in order for results to be useful in the implementation of larger systems.
Chapter 4
Detection of R peaks
The most prominent feature of the ECG is the QRS complex. It corresponds to
the contraction of the ventricles, which are the largest chambers of the heart.
At the centre of the QRS complex is the R peak. Accurate measurement of the
RR interval is a fundamental aspect of ECG analysis. In this chapter a method of detecting R peaks using Kupeev's algorithm[42] is introduced. This method performs well in the presence of noise, even without filtering or any other pre-processing steps. However, low-pass filtering not only improves results but, despite the overhead, yields an overall improvement in the speed of the algorithm.
4.1 Overview
The method uses a window of fixed time and applies Kupeev’s algorithm to the
windowed portion of the signal. The algorithm approximates the ECG signal with
a binary tree where each node of the tree represents a peak. Initially, all regions
between successive minima are considered to be peaks, as illustrated in figure 4.7.
Successive pruning of leaf node peaks with the smallest weight results in only the
most prominent ones remaining. These remaining peaks indicate the R peaks in
the ECG signal.
A general discussion of Kupeev's algorithm is presented in section 2.2.3. There are several aspects of the algorithm used in this chapter which are not covered by [42], and these are listed below.
• Kupeev describes two variants of his algorithm. The first variant detects the n
most significant peaks of a signal, where n is chosen beforehand. The second
variant detects all peaks which exceed a given magnitude. In this chapter
only the second variant is used.
• This algorithm allocates a “weight” to every peak of the signal and isolates
those peaks of greatest weight. Kupeev suggests that either the height or
the area of each peak be used as the “weight”. QRS complexes are typically
of high amplitude and short duration, which makes the area an unsuitable
choice in this application. Therefore, throughout this chapter, the height of
a peak is used as the weight, and except where indicated otherwise the terms
“height” and “weight” are used synonymously.
• Kupeev states the optimal complexity of the algorithm, but does not describe
how to implement an algorithm with that complexity. This chapter presents
such a description.
• The algorithm is used in conjunction with the shifting window approach described in section 4.2.6.
4.2 Algorithm implementation
There are a number of ways to implement the algorithm. However, because of the
emphasis on speed, the most obvious ones are not suitable. To implement the algorithm efficiently one must avoid repetitive iteration over the same regions of the signal. This requires the creation and maintenance of a number of data structures
and a somewhat non-intuitive sequence of processes. However the resulting algorithm performs in $O(n \log n)$ time, where $n$ is the number of samples, whereas a naïve implementation has a complexity of $O(n^2)$.

Figure 4.7: A visualisation of Kupeev's algorithm applied to an ECG signal: (a) the ECG signal with baselines at each minimum; (b) nodes are allocated for each peak; (c) peaks are formed into a binary tree, and heights are added; (d) the resultant binary tree. Each peak becomes a node in the binary tree. Subsequent processing will extract the nodes of maximal height to select R peaks whilst discarding P & T waves and high frequency noise.
Implementation of the algorithm can be divided into four distinct processes:
1. Setup and initialisation.
2. Identification of the peaks.
3. Creation of the binary tree.
4. Iteration and pruning of the tree, to form a list of maxima.
Details of each of these steps are discussed in the following sections.
4.2.1 Glossary
The following additional terminology is used to describe the algorithm. Figure 4.8
should be consulted for examples.
Peak Every region of the signal which is bounded by a point with negative slope
at its lower extreme and positive slope at its upper extreme. Example: The
regions P0, P1 and P2 are peaks.
Upper Bound The upper extreme of the peak, ie. the largest value of x which
comprises the peak. Example: xc is the upper bound of P1.
Lower Bound The lower extreme of the peak, ie. the smallest value of x which
comprises the peak. Example: xc is the lower bound of P2.
Base Value The value of the signal at the peak’s upper (and its lower) bound.
Elevation A peak's elevation is the difference between its base value and the highest point in the signal between its upper and lower bounds. A peak's elevation is always less than the elevation of its parent. Example: The elevation of P0 is f(xb) − f(xa).
Height For a peak with no child, the height takes the same value as the elevation.
For any other peak, the height is its elevation, minus the elevation of the child
with the largest elevation. Example: The height of P0 is f(xc)− f(xa).
Left hand peak A peak whose upper bound occurs at a minimum. Example: P1
is a left hand peak.
Right hand peak A peak whose lower bound occurs at a minimum. Example: P2
is a right hand peak.
Edge An edge is a series of points in the signal,
$$\{(x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_{n-1}, f(x_{n-1}))\},$$
for which $\{x_0, x_1 \ldots x_{n-1}\}$ and $\{f(x_0), f(x_1) \ldots f(x_{n-1})\}$ are monotonically increasing. Example: The segments $\overrightarrow{AB}$ and $\overrightarrow{CD}$ each form an edge.
4.2.2 Setup and initialisation
The optimal implementation described herein is designed to be fast. It does not
attempt to minimise memory usage. A number of data structures must be maintained, which imposes significant memory requirements. In modern devices, however, memory is cheap, so this requirement is not considered a serious drawback. The structures involved, and their initialisation, are shown in Process 1.
4.2.3 Identifying the peaks
In order to identify the upper and lower bounds of each peak, it is necessary to first
identify all the minima in the signal regardless of their significance. This is done
by iterating the data points and noting the change in direction of the signal. The precise steps are given in Process 2.

Figure 4.8: The geometry of a peak. This example illustrates a peak $P_0$ containing two child peaks, $P_1$ and $P_2$.

• Define $S$ to be a stack of edges $\{E_0, E_1 \ldots E_{k-1}\}$, which is initially empty. Denote the top element of $S$, $E_{k-1}$, by $E_{top}$.

• Represent a peak $J(x_{lower}, x_{upper}, y)$ by a structure with the following elements: $x_{lower}$, the lower bound of the peak; $x_{upper}$, the upper bound of the peak; and $y$, the value of the peak's baseline.

• Define $L$ to be an ordered pair of peaks, $(J_{left}, J_{right})$.

• Define $\mathcal{L}$ to be a stack of peak pairs, $\{L_0, L_1 \ldots L_{\kappa-1}\}$. Denote the top element of $\mathcal{L}$, $L_{\kappa-1}$, by $L_{top}$. $\mathcal{L}$ is initially empty.

Process 1: Initialisation of the structures involved in peak detection.

Each data point in the signal is considered in turn. Comparison between a datum and the preceding datum enables the local first derivative, $f'(x)$, at $x$ to be found. A new function $g(x)$ (refer to figure 4.9) is defined:
$$g(x) \triangleq \begin{cases} 1 & \text{if } f'(x) > 0 \\ 0 & \text{if } f'(x) = 0 \\ -1 & \text{if } f'(x) < 0. \end{cases} \tag{4.1}$$
Now the derivative of $g(x)$ determines whether $f(x)$ is locally maximum or minimum at $x$.
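In a vectorised language the turning points can be located directly from equation (4.1); a minimal sketch, assuming the sampled signal is held in a NumPy array:

```python
import numpy as np

def turning_points(f):
    """Locate the maxima and minima of f via the clamped derivative
    g(x) of equation (4.1)."""
    g = np.sign(np.diff(f))            # g(x): +1 rising, -1 falling, 0 flat
    dg = np.diff(g)                    # g'(x)
    maxima = np.where(dg < 0)[0] + 1   # negative steps in g mark maxima
    minima = np.where(dg > 0)[0] + 1   # positive steps in g mark minima
    return maxima, minima
```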
Within regions of positive slope, such as $\overrightarrow{AB}$ and $\overrightarrow{CD}$ in figure 4.8, every point contributes to an edge. Maximal points such as B and D indicate the termination of an edge, whereas minimal points such as A and C indicate the start of a new edge and the presence of a left hand peak. At this instant, the corresponding right hand peak has not been identified, so a placeholder is reserved for this peak to be filled subsequently.

Data points within regions of negative slope such as $\overrightarrow{BC}$ and $\overrightarrow{DE}$ are compared to $f(x)$ at the start of the most recently completed edge. If the datum falls below this value, then this indicates the existence of a right hand peak. This right hand peak belongs to the most recently identified left hand peak which has not already had a right hand peak associated with it.
The result of this process is the stack $\mathcal{L}$, containing pairs of left and right hand peaks. The next step in the algorithm forms these pairs into the binary tree.
4.2.4 Constructing the tree
The root node of the tree is the only node without a sibling, and its creation is a special case. Its lower bound is set to the minimum value of $x$ and the upper bound to the maximum value of $x$, or, in the case of a windowed system (see section 4.2.6), the lower and upper bounds of the window. This means that the bounds of every other node fall within the bounds of the root node. It is also necessary to ensure that the root node's baseline value is no greater than the baseline of any other node. Setting it to $-\infty$ is a simple way to achieve this.

Figure 4.9: Detection of turning points in the signal. The input signal $f(x)$ is differentiated to get $f'(x)$, which is clamped to $\pm 1$, resulting in $g(x)$. $g(x)$ is differentiated to get $g'(x)$. The negative values of $g'(x)$ indicate the maxima in $f(x)$, and the positive values indicate the minima.

1. If $f'(x) > 0$ or $g'(x) < 0$ then append $(x, f(x))$ to $E_{top}$.

2. If $g'(x) > 0$ then
   (a) Let $i = |S|$.
   (b) Let $(x_0, f(x_0))$ be the first element of $E_{i-1}$.
   (c) If $f(x) \le f(x_0)$ then: i. decrement $i$; ii. goto 2b.
   (d) Let $j = |E_{top}| - 1$.
   (e) Let $(x_j, f(x_j))$ be the $j$th element of $E_{top}$.
   (f) If $f(x) \le f(x_j)$ then: i. decrement $j$; ii. goto 2e.
   (g) Pop $S$.
   (h) Let $L = ((x_j, x, f(x)), \emptyset)$, where $\emptyset$ indicates an undefined element (a placeholder).
   (i) Push $L$ onto $\mathcal{L}$.
   (j) Let $E = \{(x, f(x))\}$.
   (k) Push $E$ onto $S$.

3. If $f'(x) < 0$ then, while $S$ is not empty:
   (a) Let $(x_0, f(x_0))$ be the first element of $E_{top}$.
   (b) If $f(x) \le f(x_0)$ then:
      i. set the second member of $L_{top}$ (currently occupied by $\emptyset$) to $(x_0, x, f(x))$;
      ii. pop the element $E_{top}$ from $S$.

Process 2: Actions taken for each point in the signal in order to identify peaks. These actions populate $\mathcal{L}$ using $S$ as a temporary storage structure. As peaks are identified they are allocated into pairs. In subsequent processes, the members of a pair become siblings in the binary tree.

1. Sort $\mathcal{L}$ by ascending value of $J_{right}[y]$.
2. Initially, $K$ has a single node $k = (\min(x), \max(x), -\infty)$.
3. Let $i = 0$.
4. Let $(J_{left}, J_{right})$ be the $i$th element of $\mathcal{L}$.
5. Find the leaf node $k$ in $K$ for which $k[x_{lower}] \le J_{left}[x_{upper}] \le k[x_{upper}]$.
6. Insert $J_{left}$ and $J_{right}$ as the left and right children of $k$ respectively.
7. Increment $i$.
8. If $i < |\mathcal{L}|$ then goto 4.

Process 3: Procedure to create the binary tree $K$ from $\mathcal{L}$, the stack of peak pairs.
Next, the list of pairs $\mathcal{L}$ is sorted by ascending order of their baselines. The baseline is the value of $f(x_m)$, where $x_m$ is the common bound (the upper bound of the left hand peak and the lower bound of the right hand peak).

So far, the tree contains only the root node. The pair at the top of $\mathcal{L}$ is now removed from $\mathcal{L}$, and inserted into the tree as the children of the root node. Now there are three nodes; a root node, and two leaf nodes. The process continues by examining the upper and lower bounds of the top pair from $\mathcal{L}$, and each of the leaf nodes of the partially built tree. One of the leaves will have upper and lower bounds which encompass the pair, and this leaf becomes the parent of that pair. The process is presented formally in Process 3. Having constructed the binary tree, $\mathcal{L}$ is no longer required, and may be discarded.
4.2. ALGORITHM IMPLEMENTATION 61
1. Let $w_{max} = 0$.
2. Pick the leaf $k$ in $K$ of smallest weight.
3. Push the node $k$ onto the stack $SS$.
4. If the number of leaves exceeds 1, then
   (a) Remove $k$ from $K$.
   (b) Let $w_{max} = \max(w_{max}, w_k)$, where $w_k$ is the weight of node $k$.
   (c) Goto 2.
5. Let $k$ be the top element of $SS$.
6. If $k$ has a weight greater than or equal to $\sigma \times w_{max}$ then output $k$.
7. Pop $SS$.
8. If $SS$ is not empty then goto 5.
9. Stop.

Process 4: Iteration of the binary tree $K$ to determine all the significant maxima.
4.2.5 Extracting the maxima
The final phase orders the leaves of the tree by their significance value, and outputs
all the peaks whose significance exceeds a predetermined fraction σ, where 0 ≤ σ ≤ 1.
Significance of the $i$th maximum is defined as $\sigma_i = w_i / w_{max}$, where $w_{max}$ is the maximum value of $w_i$. A new stack $SS$ is required, which is initially empty. The leaf node with the smallest weight is placed onto $SS$, and that node is excised from $K$. The process is repeated until $K$ is empty. The process is illustrated in figure 4.10 and presented formally in Process 4.
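A compact (though not optimally efficient) Python rendering of Process 4 is sketched below; the tree interface (leaves(), remove() and a weight attribute) is assumed for illustration only:

```python
def significant_maxima(tree, sigma):
    """Sketch of Process 4: prune the lightest leaves of the tree onto
    a stack, then output those whose weight is at least sigma times
    the largest weight encountered during pruning."""
    stack, w_max = [], 0.0
    while True:
        leaf = min(tree.leaves(), key=lambda n: n.weight)
        stack.append(leaf)
        if len(tree.leaves()) <= 1:
            break
        w_max = max(w_max, leaf.weight)
        tree.remove(leaf)
    return [n for n in reversed(stack) if n.weight >= sigma * w_max]
```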
4.2.6 Shifting window
The algorithm as described so far takes a digitised set of data points from an input signal and, if all the data are present, produces a list of maxima. This post hoc mode of operation makes its raw application unsuitable for real time analysis.
Figure 4.10: Creating the final stack $SS$ from the tree $K$. Node d is the leaf with the smallest weight, and is therefore lowest in the stack.
Therefore, a shifting window approach is used in this study. The signal is divided
into windows of a constant duration W as shown in figure 4.11. Kupeev’s second
algorithm is applied successively to each window. It is necessary that each window
includes a finite amount of data V from the previous window, otherwise maxima
could be missed if the lower and upper bounds of a peak fell in different windows.
Such errors would attribute undue significance to the less significant maxima. The
optimal size of the window and the overlap are discussed below. The overlap, however, introduces a problem. Maxima situated in the region overlapping two
adjacent windows may be detected twice. To overcome this problem, in this study
all maxima detected within the leading overlap region were discarded, regardless of whether a corresponding maximum had been detected in the previous window.
A more rigorous method of identifying duplicated detections, perhaps based upon
dynamic time warping (see section 3.2.2), would give fewer false negative results
and a corresponding increase in sensitivity.
This windowing process has the secondary benefit of implicitly high pass filtering the
signal which, in the results presented below, almost completely eliminated baseline
shift. In general, any frequencies below $\frac{1}{2W}$ will be blocked.

Figure 4.11: A shifting window provides quasi real time response, and high pass filtering. The overlapping regions are necessary, since peaks not completely contained within a window will otherwise go undetected.
4.3 Pre-processing
The algorithm proves to give good results without any pre-processing. However,
addition of the complex lead and low pass filtering increases the sensitivity and
specificity to figures comparable to those found in the literature. The shifting
window approach described above implicitly effects a high pass filter. This can be
seen by recognising that a peak is formed by one half of the period of a sinusoid,
and that the peak will not be detected if it does not fit within the width of a single
window. Hence no specific measures are necessary to combat baseline shift, because
the windowing removes the low frequencies.
4.4 Parameterisation
A major advantage of the method is that it requires very few quantitative parameters
to be chosen. Those few parameters required are discussed in this section. The
analytic and experimental evidence suggests that the algorithm is robust to small or
moderate changes in these parameters. This property implies that deployment of the
algorithm in larger systems would require minimal application specific optimisation.
4.4.1 Window size
The window size W and the size of the overlap between windows V need to be set.
It is necessary that $V$ exceeds the RR interval $T_{RR}$, otherwise beats can be lost. $W$
must be shorter than half the period of any baseline shift $T_{Baseline}$ present on the
signal. In summary, the following conditions must hold:

$$W \geq V \qquad (4.2)$$
$$V \geq T_{RR} \qquad (4.3)$$
$$W \leq T_{Baseline}. \qquad (4.4)$$
Since the overlap causes data to be processed twice, the algorithm is redundant
by $V/W$. A large window size imposes a corresponding detection latency. Thus,
application constraints will dictate the optimum window size. The RR interval is,
of course, unknown. However, in a healthy human subject at rest a value of 1
second may be expected. Setting V to a value of 3 seconds is unlikely to violate
condition (4.3). Baseline shift is dependent upon external factors, but except during
patient motion normally does not have a period of less than 10 seconds.
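As a worked check of these constraints with the figures just given: taking V = 3 s and W = 9 s, conditions (4.2) and (4.3) hold for any RR interval up to 3 seconds, the redundancy V/W is 1/3, and condition (4.4) is met so long as baseline shift is slower than the assumed 10 second period. A trivial guard might read (a sketch; the default arguments merely restate the text's assumptions):

```python
def window_params_valid(W, V, t_rr=1.0, t_baseline=10.0):
    """Conditions (4.2)-(4.4): W >= V >= T_RR and W <= T_Baseline."""
    return W >= V >= t_rr and W <= t_baseline

assert window_params_valid(W=9.0, V=3.0)   # the values used in this study
```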
4.4.2 Weight threshold
The significance threshold σ described in section 4.2.5 is the ratio of the least
significant peak which is to be considered a maximum, to the most significant peak
in the signal. The results presented below suggest that 0.3 is a stable figure which
produces optimal results.
4.5 Effect of filtering
In this system filtering is generally unnecessary. The shifting window implicitly
serves as a high pass filter, removing the baseline shift, whilst the nature of
Kupeev's algorithm makes it immune to high frequency noise except in the case of
spikes of very high amplitude. High frequencies serve only to increase the number of
child peaks (and hence the size of the tree) but since they will be of low magnitude
they will not be detected as significant. However, application of a low pass filter
can be beneficial in several respects:
• Removal of the high frequency components reduces the number of peaks in
the signal and therefore the size of the tree produced. Removal of such com-
ponents can improve speed even after the overhead of applying the filter is
considered.
• Due to the propagation delay through the tissues, combination of two or more
channels, such as with the complex lead formula (1.1), from a multichannel
surface ECG can produce multiple peaks for each beat (see figure 4.14(c)).
Low pass filtering can help resolve these into a single peak, thus avoiding false
positives.
Filtering has the disadvantage of introducing phase delay, which may be unaccept-
able in real time applications.
4.6 Experimental results
To investigate the efficacy of this detection method, experiments were performed
on pre-recorded ECG data, using a software implementation of the algorithm de-
scribed. The software was written in ANSI C, and run on a single processor Intel
Pentium IV 2.8GHz machine using the GNU/Linux operating system. The WFDB
library from Physionet[31] was incorporated into the software in order to extract
test data and their annotations, but the algorithm itself depends upon no special
external libraries. For this study, V and W were fixed at 3 seconds and 9 seconds
respectively.
4.6.1 Observations
The algorithm was applied to all 48 records of the MIT-BIH Arrhythmia database
[67]. The database provides two channels of ECG data sampled from patients at a
rate of 360Hz, and cardiologists’ annotations indicating the positions of beats and
other features in the data.
Results from four experiments were noted:
1. The algorithm was run on channel 0 of the database. For all but two records
this corresponds to modified limb lead II. In records 102 and 104 (which have
no II lead) it corresponds to lead V5.
2. The “complex” combination of both channels, as described in section 1.4.
3. The “complex” combination of both channels, subsequently filtered by a low
pass FIR filter of order 60, having its first zero at 7Hz.
4. Channel 1 of the database. In most records, this channel records V1, but in
some V2 or V5.
Each experiment used detection based purely on Kupeev's algorithm and the shifting
window as described above. Each experiment was conducted with an acceptance
tolerance of 30ms. This tolerance was chosen so as to be ten times smaller than
the RR interval encountered even in the worst case of ventricular tachycardia, thus
eliminating the possibility of detected P waves or other features being erroneously
matched with R peak annotations and giving an inflated impression of performance.
The experiments were run for values of σ equal to 0.05, 0.1, 0.2, 0.3, 0.4, 0.5 and 0.6,
where σ is the weight threshold described in section 4.4.2. For each of these values,
the algorithm was run on all 48 records of the MIT-BIH Arrhythmia database, and
matched with the annotated positions of R peaks, according to the method specified
by ANSI/AAMI. The numbers of true positives, false positives and false negatives
were recorded, along with the total number of samples in each record. From these,
the number of true negatives was calculated.
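The tallying can be sketched as follows. This is a simplified greedy pairing for illustration only; the study itself matched beats according to the ANSI/AAMI method:

```python
def tally(detections, annotations, n_samples, fs, tol=0.030):
    """Count true/false positives and negatives, pairing each annotated R peak
    with at most one detection within the +/-30 ms acceptance tolerance."""
    tol_samples = int(tol * fs)
    unmatched = sorted(detections)
    tp = 0
    for a in sorted(annotations):
        hit = next((d for d in unmatched if abs(d - a) <= tol_samples), None)
        if hit is not None:
            tp += 1
            unmatched.remove(hit)        # a detection may match only one beat
    fp = len(unmatched)                  # detections with no nearby annotation
    fn = len(annotations) - tp           # annotations that were never matched
    tn = n_samples - tp - fp - fn        # remaining samples, as in the text
    return tp, fp, fn, tn
```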
4.6.2 Analysis
Figure 4.12 shows ROC curves plotting the mean average of the Se′ and Sp′ values,
calculated per (3.10) and (3.11). It can be seen that the optimum value of σ falls in
the region of 0.3. At this value of σ, both Se′ and Sp′ exceed 0.98 for all experiments
except that using data from channel 1. Further, with the exception of the channel 1
experiments, the results are stable at this value of σ; a change of ±0.1 results in less
than 0.1 change in sensitivity and less than 0.05 in specificity. The poorer results in
the case of channel 1 can be attributed to the fact that this channel often encounters
bi-phasic or negative going R peaks (see figure 4.13).

Figure 4.12: ROC curves for R-peak detection. (a) Small scale ROC plot, showing comparison of Channel 1 data with the other curves. (b) Large scale version of 4.12(a), showing details of the higher performing data (Channel 0, Complex lead and Filtered Complex lead). Each curve plots data at σ = 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, & 0.6. Smaller values are plotted towards the lower left hand corner.
Analysis of poorly performing records
It is instructive to examine poorly performing records from the database. The worst
performing record is number 207, which is annotated with the words:
This is an extremely difficult record. The predominant rhythm is
normal sinus with first degree AV block and left bundle branch block.
There are periods when the conduction block changes to a right bundle
branch block pattern. The PVCs are multiform. Idioventricular rhythm
(a) Excerpt from record 105 showing normal sinus rhythm with positive peaks in the II lead and negative peaks in the V1 lead.

(b) Excerpt from record 207 showing bundle branch block beats. The II lead has negative peaks whilst the V1 has bi-phasic features.

Figure 4.13: Comparison of two records from the MIT-BIH Arrhythmia database demonstrating the respective directions of R peaks in normal and arrhythmic beats.
appears following the longest episode of ventricular flutter. The record
ends during the episode of SVTA.
In the results obtained without pre-processing, this record performs particularly
poorly. Examination shows this is due to the presence of left bundle branch block
beats, which manifest themselves in modified limb lead II as negative going peaks.
As can be seen from figure 4.13, the opposite is generally true of normal sinus
rhythm beats.
From table 4.3 it can be seen that sensitivity for channel 0 (the II lead) is particularly
poor. This can be attributed to the large number of left bundle branch block
beats which manifest themselves as negative going peaks rather than the positive
going peaks one normally expects from this lead. Acceptable figures are seen from
Channel 1, but this is not often true for other records, so cannot be relied upon
in the general case.

(a) Channel 0  (b) Channel 1  (c) Complex Lead  (d) Filtered Complex Lead

Figure 4.14: ECG channels presented to the detection algorithm. The R peaks manifest themselves at slightly different temporal locations in channels 0 and 1, leading to double peaks in the complex lead. Low pass filtering the complex lead combines the double peaks into a single one.

Source                  Se′        Sp′
Channel 0 (II)          0.720854   0.971033
Channel 1 (V5)          0.990553   0.983645
Complex Lead            0.988836   0.921509
Filtered Complex Lead   0.993688   0.987915

Table 4.3: Sensitivity and Specificity for record 207 of the MIT-BIH Arrhythmia database. This record is a particularly difficult one to analyse.

The complex lead, which combines both channels, provides
acceptable sensitivity but poorer specificity. This can be attributed to the phase
delay between the two channels, resulting in a double peak, where only one is
actually present, hence increasing the false positive count. Figure 4.14 illustrates
the problem, and demonstrates how applying the low pass filter before detection
alleviates this problem by combining the two peaks into one. Despite the poor
results in the cases without pre-processing, use of the complex lead, either with or
without filtering, provides results in excess of 98% sensitivity and 92% specificity. In
this study, the filtered complex lead returned figures close to 99% for both sensitivity
and specificity.
4.6. EXPERIMENTAL RESULTS 71
Performance in the presence of noise
Low frequency noise, such as baseline shift, is eliminated by the shifting window.
In general, Kupeev’s algorithm is immune to high frequency random noise which
merely increases the complexity (see section 4.6.3). High frequency noise with
phase and bandwidth similar to that of the wanted signal may cause problems.
One notable source of noise which plagues QRS detection is muscle noise, but the
results do not show significant evidence that this algorithm is affected.
4.6.3 Time complexity
The time taken to process each record (30 minutes of data sampled at 360Hz) was
measured at 1.5s without pre-processing, and 1.8s with pre-processing. Using this
method, output will be delayed by a time equal to the window size W. However,
W may be chosen at any value which satisfies (4.2) and (4.4). In the limit, where
W = V , the total time could increase by a factor of 50% from the values observed
above. Thus we can trade off phase delay against speed of execution.
It is interesting to note that whilst low pass filtering imposes a time overhead, it
also has the potential to improve the time of Kupeev’s algorithm since it reduces
the total number of peaks. From [42] the complexity of the algorithm presented in
this chapter is $O(M + P \log P)$ where $M$ is the number of samples, and $P$ is the
number of peaks. Random noise on an ECG signal will set $P = M/2$. Thus, low pass
filtering changes the complexity from effectively $O(M + M \log M)$ to near $O(M)$.
Table 4.4 shows the time taken to process a single record, both with and without
filtering. It can be seen from these figures that, in this example, filtering adds a
time overhead of 36% but results in a net benefit of 33%.

Pre-Processing                                  Time / seconds
Not filtered                                    2.7
Filtered; detection using unfiltered results    3.7
Filtered; detection using filtered results      1.8

Table 4.4: Elapsed time to process a record, demonstrating the benefits of low pass filtering of the complex lead. Despite the overhead of the filter, there is considerable net benefit to be gained by filtering.
4.7 Summary
This method of detecting R peaks is a novel application of Kupeev’s algorithm. It
requires no pre-processing of the input signal although low pass filtering improves
results and can increase speed of execution. The method requires only a few user
defined parameters to be set, and proves to be robust in the presence of small to
moderate variation of these parameters. The method operates well on a single lead,
and better if more than one lead is available. If pre-processing can be afforded,
even the most difficult cases can be analysed with acceptable results. The speed of
execution is fast enough that it could be used in real-time applications as well as
off line processing. However, some researchers may consider the method not to be a
“true” real-time system, in the sense that a feature occurring at time t may not be
detected until t + W. This may be a concern since, from equations (4.3) and (4.4),
W is typically much larger than one RR interval. Extension of the method to true
real-time application would be possible by substituting the shifting window with
a sliding window. Whilst this would complicate the algorithm for generating the
binary tree, which would need to be dynamically adjusted, there would appear to
be no fundamental reason why such an extension would not be feasible.
The method does not rely on spectral analysis of the signal. Instead, it examines
the shape of the peaks as they are encountered. This approach has the advantage
of making the method largely immune to noise, except in the case of high frequency
spikes of very large amplitude. Whilst this study is concerned with ECG peak de-
tection, the method’s robustness suggests that it may be used for any physiological
signal where detection of the significant maxima is required, or indeed any single
dimensioned physical signal.
In the implementation described in this chapter, only the height of the peaks is con-
sidered. However, a considerably greater amount of information is contained within
the peaks, and methods of using Kupeev’s algorithm to exploit this information is
the focus of the next chapter.
Chapter 5
Detection of P and T waves
Although the QRS complex is the most prominent feature of the ECG, the
P waves and the T waves are of considerable interest. The P wave occurs
as the atria are depolarising (hence contracting) and filling the ventricles with
blood. The T wave is a result of the ventricles repolarising; returning to their relaxed
state. Analysis of P and T waves is often useful in the prediction of clinical conditions.
For example, P wave dispersion (the differences in width between P waves
recorded on different surface leads) has been shown to be indicative of susceptibility
to atrial fibrillation[22]. Similarly, the existence of T wave alternans (periodic fluctuations
in morphology or position) is often a predictor of malignant ventricular
arrhythmias[10]. A system to quickly and reliably identify P and T waves is therefore
important in ECG analysis.
5.1 Introduction
The previous chapter described the application of Kupeev’s algorithm to the prob-
lem of detecting R peaks. Kupeev’s algorithm is a non-linear approach which iden-
tifies peaks based upon their shape. In this chapter the approach is extended so as
to incorporate a conventional linear signal processing technique, namely the wavelet
transform, whilst retaining the basis of the original algorithm. The first stage of the
algorithm is identical to that used for R peak detection, however it does not depend
upon previously detected R peaks, so detection of one type of peak is independent
of the existence or detection of another. If a system for R peak detection has a
probability of error $P_{error}(R)$, and a system to detect P waves has an error probability
of $P_{error}(P)$ given correct input, then for a combined system, where the output of
R is used as the input of P, the probability of error for the combination is
$$P_{error}(S) = 1 - (1 - P_{error}(R))(1 - P_{error}(P)), \qquad (5.1)$$

and in general, for a system comprising a chain of $n$ dependent subsystems, the probability will be

$$P_{error}(S) = 1 - \prod_{i=1}^{n} (1 - P_{error}(X_i)). \qquad (5.2)$$
It is easy to see that for $0 < P_{error}(X_i) < 1$, $P_{error}(S)$ increases as $n$ increases. In other
words, minimising the number of dependent subsystems is a worthwhile goal, since
it minimises the overall error.
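Equation (5.2) in executable form (a trivial sketch; the error rates in the example are purely illustrative):

```python
import math

def chained_error(p_errors):
    """Equation (5.2): overall error of a chain of dependent subsystems."""
    return 1.0 - math.prod(1.0 - p for p in p_errors)

# Two subsystems at 2% error each compound to about 3.96% overall.
print(chained_error([0.02, 0.02]))
```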
In the following sections a description of the algorithm is presented in conjunction
with some results to demonstrate its performance and tolerance to noise, baseline
wander and variations in P and T wave morphology. Since the first part of the
algorithm is substantially identical to that used for R peak detection, optimisation
can be made to efficiently detect both kinds of peaks concurrently. The resulting
algorithm proves to be effective as a tool to isolate the P and T waves from other
features in the signal. If the locations of the R peaks are known, they may be
used to distinguish the P waves from the T waves. Experimental results using this
approach are presented later in this chapter.
5.2 Background
The wavelet transform is a popular tool for the analysis of ECG signals. Like the
Fourier transform, it is useful for determining the frequencies which comprise a signal.
Unlike Fourier analysis, however, it offers a means of analysing the non-periodic
components. Many applications use the wavelet transform as a pre-processing stage,
merely to implement a linear filter[18]. Typically, the forward transform is per-
formed, coefficients of low magnitude are discarded, and then the inverse transform
is performed. The method introduced here makes explicit use of the wavelet coeffi-
cients and does not use the inverse transformation.
Li et al.[45] recognised that modulus maxima in the wavelet transform of a signal
correspond to singularities in the signal and first used this observation to identify
fiducial points in an ECG. Subsequently, the technique has been used in systems
with various degrees of success [63, 49]. However, since the most fundamental
singularities of interest are themselves maxima, it merely shifts the problem of
maxima detection from one function to another. Furthermore, it is unclear which
wavelet level is optimal for detecting a particular feature. Since Kupeev’s algorithm
has been shown to be an effective and fast tool for maxima detection, it is logical to
use it in conjunction with the wavelet transforms of ECGs in order to find features.
As in the case of R peak detection however, keeping complexity low and the number
of parameters to a minimum is desirable, so a means of applying Kupeev’s algorithm
concurrently across all wavelet levels is sought.
5.2.1 Review of the wavelet transform
A definition and overview of the wavelet transform is given in section 2.2.2. The
convolution operator in (2.16) means that simple application of the continuous
wavelet transform is an expensive operation. However, the discrete wavelet transform
(DWT) operates in time O(n) and is therefore a more attractive choice for
real time ECG analysis.
Figure 5.15: ECG signal and corresponding discrete wavelet transform. The horizontal axis represents time, and the vertical axis the wavelet level. This plot shows wavelet levels 1 through to 5; level 1 appearing at the top. Coefficients of high magnitude are shown as white, and those of low magnitude as black.
The DWT of a discrete signal with n samples takes the form of n real valued
coefficients. The first value is the “smoothing” coefficient and is of little interest
here. Subsequent values are the “detail” coefficients and their magnitudes describe
the “energy” of the signal at a particular translation. The sign of the coefficient
contains information describing the phase of the signal and is therefore pertinent
for the purposes of P wave and T wave detection.
A common method of visualizing the DWT is to display time on the horizontal
scale, the wavelet level on the vertical scale and the magnitude of the wavelet co-
efficient as the intensity. Figure 5.15 shows a single channel ECG signal and a plot
of its corresponding discrete wavelet transform using the Haar mother wavelet to a
maximum level of 5. At level 0 (not shown in figure 5.15), there is a single coefficient
$d_{0,0}$ which describes the frequencies covered by all the samples $[0, n-1]$. At level
1, there are two coefficients, $d_{1,0}$ and $d_{1,1}$, which cover the ranges $[0, n/2 - 1]$ and
$[n/2, n-1]$ respectively. In general, at level $j$ there are $2^j$ coefficients,
$d_{j,0}, d_{j,1}, \dots, d_{j,2^j-1}$, covering the ranges $[0, n/2^j - 1], [n/2^j, 2n/2^j - 1], \dots, [n - n/2^j, n - 1]$. Throughout
this chapter, these ranges are called the “domains” of the coefficients.
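The domain structure can be made concrete with a few lines of code. This sketch uses the pywt package (an assumed library choice) with the Haar wavelet, as in figure 5.15; pywt returns the coarsest detail level first, so the chapter's level index is recovered by enumerating from zero:

```python
import numpy as np
import pywt  # assumed DWT implementation

n = 16
signal = np.random.randn(n)
details = pywt.wavedec(signal, "haar")[1:]   # detail coefficients only
for j, d in enumerate(details):              # j: this chapter's level index
    width = n // len(d)                      # samples covered per coefficient
    domains = [(k * width, (k + 1) * width - 1) for k in range(len(d))]
    print(f"level {j}: {len(d)} coefficient(s), domains {domains}")
```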
5.3 Algorithm implementation
In chapter 4, peak height was used as the weight parameter since R peaks are
typically tall and of short duration. Clearly this is not true for P or T waves,
so an alternative parameter is sought. Kupeev suggests the area under the peak.
However this proves to be ineffective since in many cases the waves are of very
low amplitude. In the method described below, the weight parameter is a value
determined from the wavelet coefficients whose domains coincide with the temporal
range of each node. This approach awards greater weight to nodes whose shape
matches the mother wavelet, and is independent of the node’s amplitude. Hence,
the method is effective for the detection of P or T waves of very low amplitude and
in noisy signals.
The weight is calculated from the wavelet coefficients whose domains overlap the
range of the node N , as depicted in figure 5.16. The most straightforward method
of calculation would be a simple algebraic sum of the coefficients. Here, a slightly
less direct method is used because of two concerns:
• Wavelet coefficients are real numbers, possessing a sign and a magnitude com-
ponent. Both components are required. Scalar addition would discard useful
information.
• In order for the algorithm to operate with optimal speed, repetitive iteration
over the wavelet coefficients must be avoided.
In view of these concerns, the algorithm has been devised thus:
1. Perform a discrete wavelet transform on the signal. Designate the detail coefficients as $\{d_{0,0}, d_{1,0}, d_{1,1}, d_{2,0}, \dots, d_{j,k}, \dots, d_{J-1,2^{J-1}-1}\}$.¹

2. For each level $j$, define two arrays, $a^{+}_{j}$ and $a^{-}_{j}$, each of length $1 + \log_2 j$.

¹ Many software implementations of the discrete wavelet transform omit the $\sqrt{2^j}$ divisor (see equation (2.18)). For this algorithm it is important to ensure the coefficients use this divisor so that the correct energy is attributed to each level.
Figure 5.16: Intersection of wavelet coefficients and a window N. The window corresponds to the range of a single node in the binary tree of Kupeev's algorithm. Each coefficient domain which wholly or partially falls within the range (u, v) contributes to the weight of N.
3. Let $a^{+}_{j,0} = 0$ and $a^{-}_{j,0} = 0$.

4. Define

$$c^{+}_{j,k} = \begin{cases} |d_{j,k}| & \text{if } d_{j,k} > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Similarly, define

$$c^{-}_{j,k} = \begin{cases} |d_{j,k}| & \text{if } d_{j,k} < 0 \\ 0 & \text{otherwise.} \end{cases}$$

5. Let $a^{+}_{j,k+1} = \sum_{i=0}^{k} c^{+}_{j,i}$ and $a^{-}_{j,k+1} = \sum_{i=0}^{k} c^{-}_{j,i}$, $\forall k \in \{0, 1, \dots, 2^{j-1} - 1\}$.

6. Create the structure of the binary tree for Kupeev's algorithm as described in sections 4.2.3 and 4.2.4. For each node $\mathcal{N}$ with range $(u, v)$, where $u$ and $v$ both lie in $[0, n]$ (see figure 5.16), let

$I_{u,j}(\mathcal{N})$ be the integer part of $u 2^{j}/n$,
$F_{u,j}(\mathcal{N})$ be the fractional part of $u 2^{j}/n$,
$I_{v,j}(\mathcal{N})$ be the integer part of $v 2^{j}/n$,
$F_{v,j}(\mathcal{N})$ be the fractional part of $v 2^{j}/n$.

7. Let

$$w^{+}_{\mathcal{N}} = \sum_{j=1}^{J-1} \left( a^{+}_{j,I_{v,j}(\mathcal{N})} + F_{v,j}(\mathcal{N})\, a^{+}_{j,I_{v,j}(\mathcal{N})+1} - a^{+}_{j,I_{u,j}(\mathcal{N})} - F_{u,j}(\mathcal{N})\, a^{+}_{j,I_{u,j}(\mathcal{N})+1} \right)$$

and

$$w^{-}_{\mathcal{N}} = \sum_{j=1}^{J-1} \left( a^{-}_{j,I_{v,j}(\mathcal{N})} + F_{v,j}(\mathcal{N})\, a^{-}_{j,I_{v,j}(\mathcal{N})+1} - a^{-}_{j,I_{u,j}(\mathcal{N})} - F_{u,j}(\mathcal{N})\, a^{-}_{j,I_{u,j}(\mathcal{N})+1} \right).$$

8. Assign the weight $w_{\mathcal{N}} = \max(w^{+}_{\mathcal{N}}, w^{-}_{\mathcal{N}})$.
Note that step 1 and step 5 each have complexity $O(n)$, step 6 has complexity
$O(n \log n)$, step 7 $O(\log n)$ and the other steps have constant complexity. The
dominating complexity is therefore $O(n \log n)$. Whilst this process requires additional
workspace in which to store intermediate data, it avoids repeated iteration
over the wavelet coefficients and so provides considerable speed improvement over
the most obvious implementation.
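A compact sketch of steps 1 to 8 follows. It is an illustrative Python rendering rather than the thesis's ANSI C implementation; the pywt package is an assumed stand-in for the DWT, and the fractional end-point terms are coded exactly as written in step 7:

```python
import numpy as np
import pywt  # assumed DWT library

def wavelet_prefix_sums(signal, wavelet="db4"):
    """Steps 1-5: transform the signal, split each detail level into its
    positive- and negative-phase magnitudes (the c+ and c- terms), and store
    their cumulative sums so no later step re-iterates over coefficients."""
    details = pywt.wavedec(np.asarray(signal, float), wavelet)[1:]
    levels = []
    for d in details:
        c_pos = np.maximum(d, 0.0)       # |d| where d > 0, else 0
        c_neg = np.maximum(-d, 0.0)      # |d| where d < 0, else 0
        levels.append((np.concatenate(([0.0], np.cumsum(c_pos))),
                       np.concatenate(([0.0], np.cumsum(c_neg)))))
    return levels

def node_weight(levels, n, u, v):
    """Steps 6-8 for a node spanning samples (u, v) of an n-sample signal:
    per level, take the prefix-sum difference with the fractional end-point
    terms of step 7, then return the larger of the two phase totals."""
    w_pos = w_neg = 0.0
    for a_pos, a_neg in levels:
        m = len(a_pos) - 1               # number of coefficients at this level
        iu, fu = divmod(u * m / n, 1.0)
        iv, fv = divmod(v * m / n, 1.0)
        iu, iv = int(iu), int(iv)
        w_pos += (a_pos[iv] + fv * a_pos[min(iv + 1, m)]
                  - a_pos[iu] - fu * a_pos[min(iu + 1, m)])
        w_neg += (a_neg[iv] + fv * a_neg[min(iv + 1, m)]
                  - a_neg[iu] - fu * a_neg[min(iu + 1, m)])
    return max(w_pos, w_neg)
```

Keeping the two phases separate until the final maximum in step 8 is what allows a feature to match the mother wavelet in either polarity, without the cancellation a plain algebraic sum of the coefficients would cause.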
At this point in the algorithm there exists a binary tree, similar to that depicted in
figure 4.7. In this tree however, each node contains a value for wN instead of (or in
addition to) the height of the peak. The algorithm continues exactly as described
in section 4.2.5; the leaf node of smallest weight is repeatedly excised from the tree
and placed onto a stack, and elements of the stack above the threshold contain
the peaks of significance. This wavelet weighting method awards higher values to
those peaks which are similar in shape to the P waves, so the “significant” peaks
correspond to those most closely matching the P wave shape.
5.4 Experimental results
Two separate methods were used to assess the performance of the algorithm. The
first method used synthetically generated ECG signals from the ECGSYN software
described in [51, 52]. The motivation for using synthetic signals arises from the
need to control the signal morphology in order to examine how the method per-
forms when the P wave or T wave is of low magnitude or appears in an unusual
position with respect to other features, and to obtain reliable reference annotations
of each feature. The second method used signals obtained by clinical measurements;
specifically, the data published in the QT database[2, 43]. Since this database an-
notates only a small sample of the features contained within it, only limited results
can be obtained from it. In particular, no meaningful specificity figures can be
obtained. Nevertheless, it is a useful test to indicate how well the method performs
upon natural signals.
The first test method allows fine control of the test data, and provides exact refer-
ence points, but the synthetic nature of the signals gives rise to lack of confidence in
their applicability. Conversely, the QT database provides real-life signals, but the
reference points are incompletely annotated, which means that one cannot perform a
complete quantitative analysis of the algorithm's performance. Experiments showed
that best results were obtained using the Daubechies-4 mother wavelet (2 vanishing
moments), and this wavelet was used for all results presented here.
5.4.1 Tests using synthetically generated signals
A modified version of the ECGSYN software was used to test the algorithm. The
modifications were devised in order to allow easy control of the wave morphology,
to conform to modern coding conventions, to accommodate a readily available fast
fourier transform library[30] and to allow control of the injected noise. However,
the algorithms of the software were unchanged from the authors’ original program.
The synthetic signals produced results where the algorithm consistently accepted
the P waves, whilst rejecting the T waves (a result not upheld by the experiments
with the QT database). In this section therefore, results are presented exclusively
on a two class basis with respect to P wave detection.
Noise tolerance
The input signals were generated, specifying 1024 RR periods and all other param-
eters at their default values. Noise was added post-hoc, after measuring the total
power contained in the signal. As noted in section 1.5, noise in ECG signals is
typically not uniform, but has a power spectrum of the form 1/fβ, where β is a
non-negative constant. The algorithm was tested in the presence of three different
types of noise: white (uniform) noise (β = 0), pink noise (β = 1) and brown noise
(β = 2). Figure 5.17 shows the measured receiver operator characteristic for these
noise spectra at various signal to noise levels. These were created by varying the
threshold σ, and calculating the sensitivity and specificity for each value of σ. The
high density of points around the upper left hand corner of the plots occur in the
region of σ = 0.07, which suggests that this is an appropriate value to use when
applying the method.
(a) White noise (β = 0)   (b) Pink noise (β = 1)   (c) Brown noise (β = 2)

Figure 5.17: Detection results for different noise colours; each panel plots Se against 1 − Sp for signal to noise ratios between 30dB and −10dB. Each curve is plotted by running the detection algorithm with the following weight threshold values: 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9; smaller values result in a more promiscuous detector and are plotted towards the upper right hand corner.
Figure 5.18: Standard deviation of the difference between the actual and detected locations vs. signal to noise ratio, for noise colours β = 0, 1 and 2. The weight threshold used for each experiment was 0.07.
In these tests, an acceptance tolerance of ±13ms was used. This value was chosen
because research has shown that this value approximates the standard deviation of
known detection methods[83], and other studies have used this tolerance[34]. This
figure, however, is not universally accepted. As mentioned previously, ANSI/AAMI
EC57[1] recommends that beats occurring within ±150ms of the annotated time be
considered true positives. Ultimately, the application will determine the acceptable
tolerance. Hence, figure 5.18 shows how the standard deviation of the error depends
upon signal to noise ratio for different noise colours. These results were obtained
by running the detector with the weight threshold fixed at 0.07, and recording
the differences between the predicted and detected P wave peaks which did not
exceed ±150ms. Since the errors are normally distributed, this graph can be used
to calculate the proportion of correct detections which fall within an arbitrary
tolerance. Note that the sampling period for the signal is approximately 4ms.
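Since the errors are normal with negligible mean, the proportion of detections inside a tolerance ±t is erf(t/(σ√2)); for example (the σ value below is illustrative, not a figure taken from the study's results):

```python
import math

def fraction_within(tol_ms, sigma_ms):
    """Share of zero-mean, normally distributed errors within +/-tol."""
    return math.erf(tol_ms / (sigma_ms * math.sqrt(2.0)))

# With sigma = 7 ms, about 94% of detections fall inside +/-13 ms.
print(fraction_within(13.0, 7.0))
```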
θ/°       std. dev./ms   mean/ms
−100.00   22.24          2.62
−90.00    12.92          0.62
−80.00    12.34          0.15
−70.00    6.95           0.14
−60.00    6.94           −0.30
−50.00    6.92           −0.23
−40.00    8.16           0.45
−30.00    7.84           0.02
−20.00    37.05          26.44
−10.00    22.69          −8.19

Table 5.5: Distribution of detection error versus P wave position. θ indicates the phase referenced to the R peak. The default value is −60°. All detections performed with SNR: 10dB, β: 1.
Wave morphology tolerance
Any useful P or T wave detector needs to perform well when presented with signals
which represent non-ideal ECGs. In order to evaluate this aspect of the detector,
the modified ECGSYN program was used to vary the temporal position, height and
width of the P wave. The temporal position of the P wave is called its “timing”
and is measured in degrees, where −60◦ (the default timing) means that the P wave
occurs one sixth of a full RR period before the R peak. Tables 5.5, 5.6 and 5.7 show
how the distribution of the detection error varies with respect to each of these
parameters whilst keeping the others constant. These tables follow the notation of
[52] and denote the timing, height and width parameters as θ, a and b respectively.
In each case, a signal to noise ratio of 10dB, with β = 1 was used.
Examination of table 5.5 shows that variations in the P wave’s temporal position
up to ±30◦ have negligible effect on the detector’s performance. Beyond this range,
the P wave starts to coincide with the T waves and R peaks.
a/mV   std. dev./ms   mean/ms
0.10   5.94           −0.39
0.20   4.29           −0.34
0.40   3.27           −0.14
0.60   3.19           −0.20
0.80   3.19           −0.15
1.00   7.03           −0.48
1.20   7.17           −0.50
1.40   7.28           −0.46
1.60   9.68           −0.86
1.80   7.18           −0.53

Table 5.6: Distribution of detection error versus P wave amplitude. a indicates the amplitude of the P wave. The default value is 0.75mV. All detections performed with SNR: 10dB, β: 1.

b/s    std. dev./ms   mean/ms
0.05   53.88          5.36
0.10   29.18          −2.58
0.20   8.56           −0.71
0.30   8.69           0.13
0.40   13.04          0.31
0.50   17.38          1.62
0.60   22.04          1.98
0.70   26.09          3.69
0.80   30.88          5.71

Table 5.7: Distribution of detection error versus P wave width. b indicates the width of the P wave. The default value is 0.25s. All detections performed with SNR: 10dB, β: 1.
88 CHAPTER 5. DETECTION OF P AND T WAVES
Execution time
Each of the signals used in these tests contained approximately 1024 × 256 samples
(representing an ECG of 17 minutes duration). The mean execution time for each
signal was 1.4s when performed on a 32bit 2.8GHz Pentium IV machine.
5.4.2 Performance tests against the QT database
Whilst synthetically generated signals permit detailed examination of the algo-
rithm’s performance, recorded ECGs provide better confidence in its applicability.
Clinical recordings of ECG signals from the QT database were used to assess the
combination of the P/T wave detection algorithm and a classification heuristic based
on the temporal relationship to the R peak. This heuristic is necessary since, on
the recorded signals, good discrimination between P and T waves was not observed.
The R peaks were detected as a pre-processing stage, using the method described in
chapter 4. Since the first stage of that method is identical to that described above,
it is easy to recognise that optimisation opportunities exist to improve the overall
running speed. The R peaks were detected using 0.2 as the weight threshold.
Discriminating between the P and T waves was performed as follows. Given $m$
detected R peaks and $n$ P/T waves, an $m \times n$ matrix $M$ was constructed, where
each element of $M$ was the temporal difference between the R peak and the P/T wave. Two
additional matrices $M^{+}$ and $M^{-}$ were defined, where each element is given by

$$M^{+}_{i} = \begin{cases} M_{i} & \text{if } M_{i} > 0 \\ +\infty & \text{otherwise} \end{cases} \qquad (5.3)$$

and

$$M^{-}_{i} = \begin{cases} M_{i} & \text{if } M_{i} < 0 \\ -\infty & \text{otherwise.} \end{cases} \qquad (5.4)$$
The method then proceeds as follows:
1. Define an array W of size m; all elements initially null.
2. Find d, the element of M+ whose value is closest to zero.
3. Let Wc = d, where c is the column of d.
4. Delete the row and column of M+ which contains d.
5. If the number of columns of M+ is greater than zero, goto 2.
6. Find d′, the element of M− whose value is closest to zero.
7. If |d′| < |Wc|, then let Wc = d′, where c is the column of d′.
8. Delete the row and column of M− which contains d′.
9. If the number of columns of M− is greater than zero, goto 6.
10. For each element Wc of W which is greater than 0, wave c represents a P wave.
Each element less than or equal to 0 represents a T wave.
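A sketch of steps 1 to 10 (Python; representing the deleted rows and columns as shrinking index sets, rather than physically deleting them from the matrices, is an implementation choice not specified above):

```python
import numpy as np

def classify_pt(r_peaks, waves):
    """Classify each detected P/T wave by its signed temporal difference to a
    greedily assigned R peak: positive (wave precedes the R peak) -> P wave."""
    M = np.subtract.outer(np.asarray(r_peaks, float),
                          np.asarray(waves, float))   # M[i, j] = R_i - wave_j
    W = np.full(M.shape[1], np.nan)                   # step 1

    def greedy_pass(positive):
        D = np.where(M > 0 if positive else M < 0, M, np.nan)
        rows, cols = set(range(M.shape[0])), set(range(M.shape[1]))
        while rows and cols:
            best = min(((abs(D[i, j]), i, j)          # element closest to zero
                        for i in rows for j in cols if not np.isnan(D[i, j])),
                       default=None)
            if best is None:
                break
            _, i, j = best
            if np.isnan(W[j]) or abs(D[i, j]) < abs(W[j]):
                W[j] = D[i, j]                        # steps 3 and 7
            rows.discard(i)                           # steps 4 and 8
            cols.discard(j)

    greedy_pass(True)                                 # steps 2-5
    greedy_pass(False)                                # steps 6-9
    return ["P" if w > 0 else "T" for w in W]         # step 10
```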
This method of classifying beats was used in conjunction with the algorithm to
construct a table of beat by beat comparisons (table 5.8), according to the rec-
ommendations of [1]. The table considers three classes: P, T and O, representing
P waves, T waves and unclassifiable or false detections respectively. When col-
lecting these data, all records in the QT database for which both P and T wave
annotations exist were used — a total of 96 records. Detected beats were matched,
using the method described in ANSI/AAMI EC57, to beats from the “q1c” channel
of the database, with a tolerance of ±150ms. For detecting the P/T waves, the
algorithm was run using 0.02 as the weight threshold.
The large numbers in the ‘O’ column appear because the QT database is only par-
tially annotated. Hence, beats in unannotated areas of the database show as false
detections. No attempt was made to classify detected P/T waves which were
annotated as neither P nor T waves. On a 2 class basis, the sensitivity from these
figures is 93.7%. The incomplete annotation of the QT database prevents calculation
of the specificity.

                 Annotated Beats
Detected Beats   P      T      O
P                2363   276    0
T                229    2102   0
R                484    849    248671

Table 5.8: Table of beat by beat comparisons of detected P and T waves against annotated records of the QT database. The ‘O’ column indicates beats annotated as something other than P or T waves.
5.5 Discussion
Visual examination of cases where the algorithm fails to identify the P wave suggests
that the region immediately preceding P is pertinent to successful detection. Fig-
ure 5.19(a) shows several cycles of a synthetically generated ECG, and the results
of applying the wavelet weighted Kupeev algorithm. Observation shows that the
majority of P waves are preceded by a TP segment with a distinct positive slope and
significant deflection below the isoelectric line like that illustrated in figure 5.19(b).
However in cases where the P wave could not be detected, the TP segment had
a negative or zero slope and an average potential at or near zero (figure 5.19(c)).
This observation can be explained by noting the similarity between the shape of this
feature and that of the Daubechies-4 mother wavelet (figure 5.20). In the majority
of correctly detected cases, a similarity between the mother wavelet and the P wave
shape can be noticed. However, in the instances where a positively sloped TP seg-
ment is absent, the wavelet has greater similarity to the T wave, which causes false
detections. The negative or zero sloped TP segment most frequently occurs on the
downslope of the underlying low frequency baseline shift, suggesting that high pass
filtering may improve the results.

(a) A sequence of ECG cycles with detected P waves indicated. One T wave is erroneously detected as a P wave.

(b) Detail of the TP segment preceding a correctly detected P wave.

(c) The TP segment preceding a P wave which the algorithm failed to detect.

Figure 5.19: The result of applying the wavelet weighted Kupeev detector to a synthetically generated ECG. Undetected P waves were observed to be preceded by a flat TP segment, whereas correctly detected P waves are generally associated with a positive slope in this region and noticeably greater negative deflection.

Figure 5.20: The Daubechies-4 mother wavelet. The shape of this wavelet is similar to that of the most commonly manifested P waves, and explains the efficacy of this wavelet in their detection.
5.6 Conclusion
The T and P waves are the most important features of the ECG after the QRS
complex. Identification and delimitation of these features is desirable in order to
predict clinical conditions and for diagnostic purposes. The wavelet transform has
previously been used successfully to identify P and T waves, but required a choice
of wavelet level to be made when analysing the result.
The algorithm presented in this chapter is a hybrid approach, combining the wavelet
transform with the shape based approach of Kupeev. The algorithm requires no
extra heuristics beyond those of the base Kupeev algorithm, and operates with a
complexity no greater than O(n logn). Because the first stage of the algorithm is
identical to that used for R peak detection, applications which need to detect both
R and P can share the results from this stage, thus reducing the overall time to
complete. The P wave detection does not require that the R peak also be detected,
but if it is available, then this information can be used to improve P/T wave dis-
crimination. Experiments showed that the algorithm is generally tolerant to noise,
although elimination of baseline shift may improve P wave detection.
Chapter 6
General conclusions and future
research
It is evident that fiducial point detection and classification are important aspects
of biomedical research. Applications may be found in cardiographic instrumen-
tation, the development of therapeutic cardiological devices, prognostic tools and
other areas. In many of these applications it is essential or desirable for the analysis
to be performed whilst connected to the patient (either physically or wirelessly) and
for each point to be detected within some specified finite time from its occurrence.
That is to say, it must operate in real-time. When analysing an ECG, various types
of noise are inevitable and the method of analysis must tolerate this noise. Useful
algorithms are fast, robust and reliable.
6.1 Existing techniques
The literature provides many existing techniques for detecting and/or classifying
fiducial points, and the advent of computer technology has made available many
new and interesting applications for them. Certain techniques rely upon filtering
and several levels of pre-processing, which will conflict with any requirement for the
analysis to be performed in real-time. Some algorithms are stochastic in nature.
The applicability of stochastic algorithms is limited to ECGs drawn from a set of
subjects similar to that used to train the algorithm. There are two general
requirements of a system intended for real-time fiducial point analysis:
• The number of pre-processing stages should be kept to a minimum.
• If the method is based on stochastic techniques, the demographics of the data
to which the system is optimised must be available and clearly understood.
In the literature, reports tend to focus only on the precision of the algorithm and
generally do not discuss its applicability to real-time use.
6.2 Assessment of performance
Although the literature presents a wide range of fiducial point extraction algorithms,
comparison between them is not straightforward. This thesis has examined the ef-
ficacy of established methods of assessment and made recommendations for their
improvement. Not only is there doubt over the most appropriate tolerance to be
used when matching features to a reference source, but there are many possible ways
of matching reference features to features from a system under test. There is no
reason why the method suggested by [1] should not be used, but many publications
make no reference to it, nor to any other method. Furthermore, whilst sensitiv-
ity and specificity are common metrics used to describe a system’s performance,
these are inappropriate when applied to digitally sampled ECG signals, since the
method of calculation makes the metric dependent upon the sampling rate. Two
recommendations of this thesis are:
• A new method of calculating sensitivity and specificity is required. The
method should not depend upon the manner in which the ECG was recorded
and should faithfully represent the probabilities of correct detection and false
detection respectively. One such method has been proposed in section 3.3.3.
• When quoting sensitivity and specificity, the matching tolerance and the al-
gorithm used to perform the matching must be explicitly stated.
In addition, the literature often contains statistics claiming to describe the perfor-
mance of classifiers. Whilst those statistics are useful for the purpose of identifying
areas where a technique performs well or where improvement is required, they are
of little use when choosing a classifier for use within a larger system. In order to
facilitate informed judgement when comparing fiducial point extraction systems,
this thesis recommends that authors make available the table of beat by beat com-
parisons. Such a table can be published either within the text of a paper or by
reference to an external source.
Finally, in many publications the time taken to produce output is not given. Whilst
the differing nature of algorithms might make uniform reporting metrics difficult,
the worst case time and the mean time would be useful to researchers.
6.3 R peak detection
This thesis has introduced a novel approach for fiducial point detection, using a
maxima detection algorithm based upon the shape of features in the ECG. A
demonstrated advantage of this approach is that little or no filtering or other pre-
processing is required, although an improvement in speed can result if the phase
delay can be accepted. A second advantage is that the method requires very few
parameters to be chosen. This means that it does not become over-optimised to a
specific input data set, but is effective at detecting features from ECGs both of
healthy patients and of those exhibiting arrhythmias. A simple appli-
cation of the algorithm using the height of peaks as the “weight” parameter proves
to be an effective tool for detecting the R-peaks of the QRS complex. The algo-
rithm’s complexity has an upper bound of O(n logn) and this approaches O(n) as
the signal’s high frequency components are filtered out.
6.4 P wave detection
The R peak detection algorithm can be extended by combining it with a wavelet
transform approach, to produce an algorithm suitable for detecting P and T waves.
In general, these features do not have large amplitude and they may be indistinct.
However, if present, they do possess a distinct morphology. Drawing upon this obser-
vation, this thesis investigates a method where the system used earlier for R peak
detection is modified by choosing the weight value such that higher weights are
awarded to peaks which match the shape of the P wave. The wavelet transform
was used in this role, since it is known to have properties for matching a signal to
the mother wavelet’s shape. In order to maintain emphasis upon real-time detection
however, care has been taken to find a method of calculating the weight parameter
which does not increase the overall complexity of the algorithm.
6.5 Future research
This thesis has introduced a technique for fiducial point extraction and has demon-
strated two separate ways in which it can be applied. Researchers are also interested
in many additional fiducial points which have not been considered herein. The bi-
nary tree used by both variants of the algorithm is rich in geometrical data extracted
from the ECG, and further exploitation of this data could form the impetus for re-
search into the extraction of these other fiducial points.
6.5.1 Detection of minima
Chapters 4 and 5 presented methods involving maxima detection. Clearly, these
methods will also be effective at detecting minima simply by inverting the input
signal. In most instances the Q and S points, which approximate the onset and offset
of the QRS complex, are prominent and would make good candidates for detection
by Kupeev’s algorithm. Such a process would be independent of R peak detection,
and could therefore be performed concurrently with it.

Figure 6.21: A sliding window would require a dynamically modifiable tree. Leaf nodes occurring before the window are excised from the tree. New nodes entering the window require the tree to be re-rooted. The new node becomes the right hand child of the new root.

If parallel processors are
available, this could be done without reduction in speed. As an implementation
detail, it would be unnecessary to actually invert the data. Instead the algorithm
could be generalised to detect either minima or maxima depending on a preset
parameter.
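The inversion itself is a one-liner; equivalently, the comparisons inside the tree-building stage could be flipped by the suggested parameter (`detect_maxima` again being a hypothetical stand-in for the detector):

```python
def detect_extrema(signal, detect_maxima, find_minima=False):
    """Detect minima by running the maxima detector on the negated signal."""
    return detect_maxima([-x for x in signal] if find_minima else signal)
```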
6.5.2 Continuous real-time operation
Kupeev’s algorithm does not produce output until the entire data set has been read
and processed. In section 4.2.6 a shifting window approach was presented which
gives an apparent continuous output. In fact however, the output is delayed for as
much as one window length. Instead of creating windows with discrete positions, it
would be possible to use a single window whose position continuously changes. An
additional advantage of this approach would be the elimination of the need for an
overlap between adjacent windows.
It would be necessary to dynamically add nodes to the right hand side of the tree,
whilst trimming them from the left hand side, as depicted in figure 6.21. The
deleted nodes would be leaf nodes which fall outside the window. Adding nodes
would involve creating a new root for the tree. The existing root would become
the left hand child of the new root, and a node corresponding to the new peak
would become the right hand child. Whilst it would be possible to create a new
stack and significance threshold every time the tree changes, doing so would be
computationally expensive. A better approach would be to dynamically insert and
delete values from the stack, as the tree changes. That is to say, the “stack” would
no longer be a stack but a hash table where nodes could be inserted and removed
from arbitrary positions.
6.5.3 Detection of waveform onset and offset
This thesis has concentrated on detecting the maximal points of features. In many
applications the researcher wishes to identify the onset and/or offset of a feature.
In Kupeev’s algorithm, since each node in the binary tree contains the upper and
lower bounds of the peak, it is reasonable to believe that a similar technique could
be used as the basis for determining the onset and offset. The onset of a waveform
is determined by the change of slope of the signal. One indicator of the slope of the
signal is the difference between the lower bound of a node and that of its parent
expressed as a ratio of the difference between the baselines (see figure 6.22). Thus,
identifying where a change in slope occurs could provide an indication of the onset
of the waveform.
The binary tree represents the entire signal, or windowed portion thereof, and con-
tains many nodes. Typically, each node contains not only the height of each peak,
but also the lower and upper bounds. It is invariant that a parent node has a
smaller lower bound than that of its child. Similarly it has a greater upper bound.
It is reasonable to assume that, in the region of a waveform's onset, there will be
a larger difference between the bounds of parent and child, compared to the
differences within other parts of the waveform. This observation could be used as the
basis of a system to detect the onset and/or offset of a waveform. Because the data
structures are already present, having found the maximal point of a waveform there
will be little extra complexity in finding the onset by this method.

Figure 6.22: Principle of a possible system to determine the onset of a feature. The onset will occur in the region where there is a change in ∆f(t)/∆t.
6.5.4 Other applications
Whilst the focus of this thesis is concerned with the extraction of features from
ECG signals, the techniques described might well be applied to other physiological
signals, or indeed to any single dimensional physical signal from which maxima are
required to be extracted. Whilst the details of the techniques’ optimisation will
differ, there is no reason to believe that hitherto unresearched applications cannot
be found for the work.
Appendix A
Costs of beat misclassification
This appendix demonstrates one method of assigning costs to incorrect beat clas-
sifications. The propositions used in the calculation are not universally applicable.
Given the class conditional matrix, an ordinal measure for a beat classifier may be
determined, applicable to any particular situation. The application might demand
consideration for classes other than those investigated here. This calculation uses
monetary units to measure costs, which may raise ethical concerns. However, the
methodology imposes no requirement on the nature of the units of cost. Any unit
acceptable to the community of interest may be used.
Each application may have different ancillary parameters, and these may affect
the costs involved. The methods and figures provided in this study reflect the most
general situation, as best as could be determined. Propositions made in calculations
of costs should be examined when applying the figures.
A.1 Extrapolation of data
Where possible, longitudinal studies, giving data gathered over a period of 10 years
or longer have been used. In many instances, the data was available only in graphical
form in which case trapezoidal approximation of integrals was used. Where data
over a 10 year period was not available, it was necessary to use data gathered over
a shorter period and extrapolated by the following method.
Presume survival to be described by an exponential expression

$$s = \kappa e^{\beta t} \qquad (A.1)$$

where κ and β are constants for which β < 0 and 0 < κ ≤ 1. Equation (A.1) implies
that the mortality fraction m is

$$m = 1 - \kappa e^{\beta t}. \qquad (A.2)$$

Integrating m over 10 years gives the total loss:

$$\int_{0}^{10} 1 - \kappa e^{\beta t} \, dt, \qquad (A.3)$$

which can be expressed as

$$10 - \frac{e^{X}}{\beta}\left[e^{10\beta} - 1\right], \qquad (A.4)$$

where X = ln κ. Thus equation (A.1) implies

$$\ln(s) = X + \beta t. \qquad (A.5)$$

Hence, X and β can be found from the data by linear regression of ln(s) against t,
and the total expected mortality over 10 years from equation (A.4).
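The whole extrapolation fits in a few lines (a numpy sketch; the survival points in the example are placeholders, not data from the cited studies):

```python
import numpy as np

def ten_year_loss(t_years, survival):
    """Fit ln(s) = X + beta*t, equation (A.5), then evaluate the projected
    10-year mortality integral of equation (A.4)."""
    beta, X = np.polyfit(t_years, np.log(survival), 1)   # slope, intercept
    return 10.0 - (np.exp(X) / beta) * (np.exp(beta * 10.0) - 1.0)

# Placeholder survival curve sampled at 0, 1 and 2 years:
print(ten_year_loss([0.0, 1.0, 2.0], [1.00, 0.90, 0.81]))
```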
A.2 Costs of incorrect classification
Standard clinical treatment of abnormal beats, or non-treatment of normal beats
according to the system’s output is presumed. Costs, in monetary terms, of taking
that course of action under each state of nature are investigated. In this context,
‘costs’ are interpreted to mean the general cost to society, rather than the price
paid by any particular entity. A summary of these figures is presented in Table A.9.
All costs have been normalised to the year 2006, and are in Australian dollars except
where otherwise noted.
λ       ωn      ωsveb    ωveb      ωf        ωq
αn      0       38.63ᵃ   170.19ᵇ   170.19ᵇ   0
αsveb   0       0        170.19ᵇ   170.19ᵇ   0
αveb    2.15ᶜ   2.15ᶜ    0         0         0
αf      2.15ᶜ   2.15ᶜ    0         0         0
αq      0       0        0         0         0

ᵃ Datum 5.  ᵇ Datum 13.  ᶜ Datum 14.

Table A.9: Costs of false classification in 1000s of AUD.
Proposition 1 The clinical treatment for fusions of normal and ventricular ectopic
beats (class ωf) is identical to that for ventricular ectopic beats.
This proposition is made on the basis that sustained ventricular ectopic beats are
potentially life threatening, and must be treated. To a clinician, the fact that the
polarisation of the waveform coincides with the preceding beat is merely incidental.
Proposition 2 The maximum future projection which may affect costs of misclas-
sification is 10 years.
10 years is chosen as a reasonable period beyond which the advances in medical
technology can be expected to invalidate the results of future prediction.
Datum 1 The expected loss of life of a healthy subject, projected over the next 10
years, is 1.21 years.
This datum is from the control group of Benjamin et al.[9]. This was a longitudinal
study which investigated the mortality of subjects who had developed atrial fibril-
lation. The survival results presented by Figure A of that paper are integrated to
determine the expected number of years of life lost by a healthy subject.
Women without atrial fibrillation   0.87
Men without atrial fibrillation     1.55
Mean Average                        1.21

Women with atrial fibrillation      3.75
Men with atrial fibrillation        4.05
Mean Average                        3.90

Expected Loss Due to AF             2.69

Table A.10: Expected loss of life (years) due to Atrial Fibrillation.
Datum 2 The probability that a subject with supra-ventricular ectopic beats will
develop atrial fibrillation is 0.324.
Datum 2 comes from the results of Frost et al.[29].
Datum 3 The expected loss of life, attributable to atrial fibrillation, by a subject
who suffers from atrial fibrillation is 2.69 years (projected over 10 years).
Datum 3 is determined from Benjamin et al. in a similar fashion to Datum 1; the
calculations are presented in Table A.10.
Datum 4 A person’s contribution to society is $44,320 per annum.
Datum 4 is the mean average wage in Australia for the year 2006[5].
Datum 5 The cost of misdiagnosing a supra-ventricular ectopic beat as a normal
beat, λ(αn|ωsveb) is $38,627.
This figure is the product of Data 2, 3 and 4.
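Written out explicitly (the factors being Data 2, 3 and 4 respectively):
\[
\lambda(\alpha_{\mathrm{n}}|\omega_{\mathrm{sveb}}) = 0.324 \times 2.69 \times \$44{,}320 \approx \$38{,}627.
\]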
Datum 6 The probability of initially surviving ventricular fibrillation is
146/886 ≈ 0.164.
Group             P          Loss of life      Expectation
Survival          0.16^a  ×  6.36^b         =  1.04
Non-survival      0.84    ×  10             =  8.40
Healthy subject                                (1.21)^c

Total loss due to VF                           8.19

^a Datum 6.   ^b Datum 7.   ^c Datum 1.

Table A.11: Expected loss of life (in years) due to ventricular fibrillation
Baum et al.[7] report that, in a 3 year study of ventricular fibrillation cases, 146
patients out of 886 initially survived.
Datum 7 The total expected loss of life by a person who initially survives ventric-
ular fibrillation is 6.33 years (projected over 10 years).
To derive Datum 7 the results of Baum et al.[7] are used. In Figure 2 of that
paper, survival curves are presented for subjects who initially survived ventricular
fibrillation. Survival data are presented for only 24 months. Survival data over a
10 year period are extrapolated by the method described above.
Datum 8 The expected loss of life, attributable to ventricular fibrillation, by a
subject who suffers ventricular fibrillation is 8.19 years (projected over 10 years).
From Datum 1 it can be seen that the expected loss of life of a healthy subject is
1.21 years. Hence one can calculate the loss due to ventricular fibrillation as per
Table A.11.
Datum 9 The expected loss of life, attributable to ventricular tachycardia by a
subject who suffers ventricular tachycardia is 2.59 years (projected over 10 years).
This datum was obtained from Doval et al.[24] by the data extrapolation method
described previously. In that study of 516 subjects both with and without non-
sustained ventricular tachycardia, the projected loss over 10 years for the group
with ventricular tachycardia was 9.24 years, whereas the projected loss for the
group without ventricular tachycardia was 6.65 years. The difference is 2.59 years.
Datum 10 The probability that a subject who experiences one or more episodes of
ventricular ectopic beats, will develop ventricular fibrillation is 0.34.
Datum 11 The probability that a subject who experiences one or more episodes of
ventricular ectopic beats, will develop ventricular tachycardia is 0.41.
Datum 10 and Datum 11 are implied by Carrim and Khan[12]. In that study of
44 subjects exhibiting ventricular ectopic beats, let \(E\) denote the set of all
subjects, \(VT\) the set of subjects developing ventricular tachycardia, and \(VF\)
the set of subjects developing ventricular fibrillation. From their data:
\[
|VT \cup VF| = 21, \tag{A.6}
\]
\[
|VT \cap \overline{VF}| = 6, \tag{A.7}
\]
and
\[
|VF \cap \overline{VT}| = 3, \tag{A.8}
\]
from which one can deduce that \(|VT \cap VF| = 21 - 6 - 3 = 12\), \(|VT| = 6 + 12 = 18\)
and \(|VF| = 3 + 12 = 15\). Thus,
\[
P(VT \mid \mathrm{veb}) = \frac{|VT|}{|E|} = \frac{18}{44} \tag{A.9}
\]
and
\[
P(VF \mid \mathrm{veb}) = \frac{|VF|}{|E|} = \frac{15}{44}. \tag{A.10}
\]
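As a quick check of the deduction, the arithmetic can be verified in a few lines; this snippet is an editor's illustration, not part of the original analysis.

    # Verify the set arithmetic implied by equations (A.6)-(A.8).
    union, vt_only, vf_only = 21, 6, 3       # |VT u VF|, |VT n ~VF|, |VF n ~VT|
    both = union - vt_only - vf_only         # |VT n VF| = 12
    vt, vf = vt_only + both, vf_only + both  # |VT| = 18, |VF| = 15
    print(round(vt / 44, 2), round(vf / 44, 2))  # 0.41 and 0.34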
Datum 12 The expected loss of life, when a ventricular ectopic beat is misdiagnosed
as a normal beat is 3.84 years.
Datum 12 is derived from Data 10, 11, 8 and 9. The derivation is shown in Ta-
ble A.12.
Group            P          Loss of life      Expectation
Tachycardia      0.41^a  ×  2.59^b         =  1.06
Fibrillation     0.34^c  ×  8.19^d         =  2.78

Total                                         3.84

^a Datum 11.   ^b Datum 9.   ^c Datum 10.   ^d Datum 8.

Table A.12: Expected loss of life (in years) due to ventricular ectopic beats
Datum 13 The cost of misdiagnosing a ventricular ectopic beat as a normal beat,
λ(αn|ωveb) is $170,189.
This is the product of Datum 12 and Datum 4. By Proposition 1 the same cost is
attributed to λ(αn|ωf).
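Explicitly:
\[
\lambda(\alpha_{\mathrm{n}}|\omega_{\mathrm{veb}}) = 3.84 \times \$44{,}320 \approx \$170{,}189.
\]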
A.2.1 Costs of misclassification of a normal beat as abnormal
Ventricular ectopic beats are of potential concern to a physician. If a system misclas-
sifies a normal beat (ωn) or a supra-ventricular ectopic beat (ωsveb) as a ventricular
ectopic beat or a fusion beat (decisions αveb and αf), the likely result is that the
patient will be unnecessarily detained in observation, pending further examination.
Thus, the cost of misclassification, in this case, is the cost of retaining a patient in
intensive care for one day.
Datum 14 The cost of retaining a patient in intensive care for 1 day is $2146.
This datum is the mean average of figures obtained from two independent Australian
health insurance providers. A case study by Rechner and Lipman[62] for the year
2003 cites the figure $2670. That study, however, was conducted within a teaching
hospital, where costs can be expected to be higher than average.
The cost of retaining a patient in intensive care for one day is the cost of mis-
classification of a normal beat. Thus λ(αveb|ωn) = λ(αf|ωn) = λ(αveb|ωsveb) =
λ(αf|ωsveb) = $2146.
From Proposition 1 one can conclude that λ(αf|ωveb) = λ(αveb|ωf) = 0.
A.2.2 Erroneous classification of beats as supra-ventricular ectopic beats
Unlike ventricular ectopic beats, supra-ventricular ectopic beats are typically not life
threatening, and are therefore not normally treated unless recurrent[47]. Thus, the
cost of misclassification as a normal beat (λ(αsveb|ωn)) is zero. By the same token,
misclassification of ventricular ectopic beats or fusion beats as supra-ventricular
ectopic beats bears the same penalty as an erroneous classification as a normal
beat. Thus λ(αsveb|ωveb) = λ(αn|ωveb) and λ(αsveb|ωf) = λ(αn|ωf).
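As a final cross-check, Table A.9 can be reassembled programmatically from the Data above. The following Python sketch is the editor's illustration; the class labels, ordering and rounding conventions are assumptions, and the output reproduces the table in whole dollars rather than thousands.

    # Reconstruct the loss matrix of Table A.9 from the Data of this appendix.
    WAGE = 44_320                    # Datum 4: annual contribution to society (AUD)
    ICU_DAY = 2_146                  # Datum 14: one day of intensive care (AUD)
    P_AF, LOSS_AF = 0.324, 2.69      # Data 2 and 3
    LOSS_VEB = 1.06 + 2.78           # Datum 12, as tabulated in Table A.12

    cost_sveb_as_n = P_AF * LOSS_AF * WAGE   # Datum 5:  ~ $38,627
    cost_veb_as_n = LOSS_VEB * WAGE          # Datum 13: ~ $170,189

    classes = ["n", "sveb", "veb", "f", "q"]
    loss = {(a, w): 0.0 for a in classes for w in classes}
    loss[("n", "sveb")] = cost_sveb_as_n
    for w in ("veb", "f"):                   # Proposition 1: fusion treated as VEB
        loss[("n", w)] = loss[("sveb", w)] = cost_veb_as_n
    for a in ("veb", "f"):                   # a false alarm costs one ICU day
        loss[(a, "n")] = loss[(a, "sveb")] = ICU_DAY

    for a in classes:                        # rows: decisions; columns: true classes
        print(a.rjust(4), [round(loss[(a, w)]) for w in classes])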
Bibliography
[1] Association for the Advancement of Medical Instrumentation. Testing and
reporting performance results of cardiac rhythm and ST-segment measuring
algorithms. Arlington, VA, USA, 1988. Published as American National
Standard ANSI/AAMI EC57:1988.
[2] The QT database. Online database, 1997.
http://www.physionet.org/physiobank/database/html/mitdbdir/qtdb/.
[3] Douglas G. Altman and J. Martin Bland. Statistics notes: Diagnostic tests 2:
predictive values. British Medical Journal, Volume 309, Number 6947, page
102, 1994.
[4] Rodrigo V. Andreao, Bernadette Dorizzi and Jerome Boudy. ECG signal anal-
ysis through hidden Markov models. IEEE Transactions on Biomedical Engi-
neering, Volume 53, Number 8, pages 1541–1549, August 2006.
[5] Australian Bureau of Statistics. 6306.0 — employee earnings and hours, Aus-
tralia, May 2006, February 2007.
http://www.abs.gov.au/ausstats/abs@.nsf/mf/6306.0/.
[6] S. Serge Barold. Willem Einthoven and the birth of clinical electrocardiography
a hundred years ago. Cardiac Electrophysiology Review, Volume 7, pages 99–
104, 2003.
[7] Robert S. Baum, Hernan Alvarez III and Leonard A. Cobb. Survival after re-
suscitation from out-of-hospital ventricular fibrillation. Circulation, Volume 50,
pages 1231–1235, December 1974.
[8] Russell Beale and T. Jackson. Neural Computing: An Introduction. CRC
Press, 1990.
[9] Emelia J. Benjamin, Philip A. Wolf, Ralph B. D’Agostino, Halit Silbershatz,
William B. Kannel and Daniel Levy. Impact of atrial fibrillation on the risk of
death: The Framingham heart study. Circulation, Volume 98, pages 946–952,
May 1998.
[10] Daniel M. Bloomfield, Stefan H. Hohnloser and Richard J. Cohen. Interpreta-
tion and classification of microvolt T wave alternans tests. Journal of Cardio-
vascular Electrophysiology, Volume 13, Number 5, pages 502–512, May 2002.
[11] A. S. Bugaev, V. V. Chapursky, S. I. Ivashov, V. V. Razevig, A. P. Sheyko and
I. A. Vasilyev. Through wall sensing of human breathing and heart beating by
monochromatic radar. In Proceedings of the Tenth International Conference
on Ground Penetrating Radar, pages 291–294, June 2004.
[12] Zia I. Carrim and Ashraf A. Khan. Mean frequency of premature ventricular
complexes as predictor of malignant ventricular arrhythmias. The Mount Sinai
Journal of Medicine, Volume 72, Number 6, pages 374–380, March 2005.
[13] H. C. Chen and S. W. Chen. A moving average based filtering system with its
application to real-time QRS detection. Computers in Cardiology, Volume 30,
pages 585–588, 2003.
[14] Ivaylo Christov and G. Bortolan. Ranking of pattern recognition parameters
for premature ventricular contractions classification by neural networks. Phys-
iological Measurement, Volume 25, pages 1281–1290, 2004.
[15] Ivaylo Christov. Real time electrocardiogram QRS detection using combined
adaptive threshold. BioMedical Engineering OnLine, Volume 3, Number 28,
August 2004.
[16] Ivaylo Christov, German Gomez-Herrero, Vessela Krasteva, Irena Jekova,
Atanas Gotchev and Karen Egiazarian. Comparative study of morphologi-
cal and time-frequency ECG descriptors for heartbeat classification. Medical
Engineering and Physics, Volume 28, Number 9, pages 876–887, November
2006.
[17] Gari D. Clifford. ECG statistics, noise, artifacts and missing data. In Gari D.
Clifford, Francisco Azuaje and Patrick E. McSharry (editors), Advanced Meth-
ods and Tools for ECG Data Analysis, Chapter 3, pages 55–99. Artech House
Inc, 685 Canton St. Norwood, MA 02062, 2006.
[18] Gari D. Clifford. Linear filtering methods. In Gari D. Clifford, Francisco
Azuaje and Patrick E. McSharry (editors), Advanced Methods and Tools for
ECG Data Analysis, Chapter 5, pages 135–170. Artech House Inc, 685 Canton
St. Norwood, MA 02062, 2006.
[19] Gari D. Clifford, Francisco Azuaje and Patrick E. McSharry (editors). Ad-
vanced Methods and Tools for ECG Data Analysis. Artech House Inc, 685
Canton St. Norwood, MA 02062, 2006.
[20] Philip de Chazal, Maria O’Dwyer and Richard B. Reilly. Automatic classifi-
cation of heartbeats using ECG morphology and heartbeat interval features.
IEEE Transactions on Biomedical Engineering, Volume 51, Number 7, pages
1196–1206, July 2004.
[21] Philip de Chazal and Richard B. Reilly. A comparison of the ECG classification
performance of different feature sets. Computers in Cardiology, Volume 27,
pages 327–330, 2000.
[22] Polychronis E. Dilaveris and John E. Gialofos. P-wave dispersion: A novel
predictor of paroxysmal atrial fibrillation. Annals of Noninvasive Electrocardi-
ology, Volume 6, Number 2, pages 159–165, 2001.
[23] Ivan A. Dotsinsky and Todor V. Stoyanov. Ventricular beat detection in single
channel electrocardiogram. BioMedical Engineering OnLine, Volume 3, Num-
ber 3, January 2004.
[24] Hernan Doval, Daniel Nul, Hugo Grancelli, Sergio Varini, Saul Soifer, Gianni
Corrado, Sergio Dubner, Omar Scapin and Sergio Perrone. Nonsustained ven-
tricular tachycardia in severe heart failure: Independent marker of increased
mortality due to sudden death. Circulation, Volume 94, Number 12, pages
3198–3203, December 1996.
[25] Richard O. Duda and Peter E. Hart. Pattern Classification and Scene Analysis,
Chapter 2, pages 10–39. Kluwer Academic Publishers, 1st edition, 1973.
[26] Rakesh Dugad and U. B. Desai. A tutorial on hidden Markov models. Tech-
nical Report SPANN-96-1, Signal Processing and Artificial Neural Networks
Laboratory, Dept. of Electrical Engineering, Indian Institute of Technology,
Powai, Mumbai 400 076 India, May 1996.
[27] Ken Freeman and Avtar Singh. P wave detection of ambulatory ECG. In
Annual International Conference of the IEEE Engineering in Medicine and
Biology Society, Volume 13. IEEE, 1991.
[28] Gary M. Friesen, Thomas C. Jannett, Stanford L. Yates, Stephen R. Quint
and H. Troy Nagle. A comparison of the noise sensitivity of nine QRS detec-
tion algorithms. IEEE Transactions on Biomedical Engineering, Volume 37,
Number 1, pages 85–98, January 1990.
[29] L. Frost, H. Mølgaard, E. H. Christiansen, C-J. Jacobsen, H. Allermand and
P. E. B. Thomsen. Low vagal tone and supraventricular ectopic activity predict
atrial fibrillation and flutter after coronary artery bypass grafting. European
Heart Journal, Volume 16, pages 825–831, 1995.
[30] M. Galassi, J. Davies, J. Theiler, B. Gough, G. Jungman, M. Booth and
F. Rossi. GNU Scientific Library Reference Manual. Network Theory Ltd.,
Bristol, United Kingdom, 2nd edition, February 2007. ISBN 0954161734.
[31] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. Ch.
Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng and H. E.
Stanley. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new
research resource for complex physiologic signals. Circulation, Volume 101,
Number 23, pages e215–e220, 2000 (June 13). Circulation Electronic Pages:
http://circ.ahajournals.org/cgi/content/full/101/23/e215.
[32] A. I. Hernandez, G. Carrault and F. Mora. Improvement of a P-wave detector
by a bivariate classification stage. Transactions of the Institute of Measurement
and Control, Volume 22, Number 3, pages 231–242, 2000.
[33] Steven A. Israel, John M. Irvine, Andrew Cheng, Mark D. Wiederhold and
Brenda K. Wiederhold. ECG to identify individuals. Pattern Recognition,
Volume 38, pages 133–142, 2005.
[34] R. Jané, A. Blasi, J. García and P. Laguna. Evaluation of an automatic thresh-
old based detector of waveform limits in Holter ECG with the QT database.
Computers in Cardiology, Volume 24, pages 295–298, 1997.
[35] Janice M. Jenkins, Delon Wu and Robert C. Arzbaecher. Computer diagno-
sis of supraventricular and ventricular arrhythmias. Circulation, Volume 60,
Number 5, pages 977–987, 1979.
[36] Shubha Kadambe. Wavelet transform-based QRS complex detector. IEEE
Transactions on Biomedical Engineering, Volume 46, Number 7, pages 838–
848, July 1999.
[37] Alan H. Kadish, Alfred E. Buxton, Harold L. Kennedy, Bradley P. Knight,
Jay W. Mason, Claudio D. Schuger and Cynthia M. Tracy. ACC/AHA clinical
competence statement on electrocardiography and ambulatory electrocardiog-
raphy. Circulation, Volume 104, pages 3169–3178, 2001.
[38] Arnold M. Katz. Physiology of the Heart. Lippincott Williams & Wilkins,
Philadelphia, PA 19106, USA, 3rd edition, 2001.
[39] Eamonn Keogh and Chotirat Ann Ratanamahatana. Exact indexing of dynamic
time warping. Knowledge and Information Systems, Volume 7, pages 358–386,
2005.
[40] Bert-Uwe Köhler, Carsten Hennig and Reinhold Orglmeister. The principles
of software QRS detection. IEEE Engineering in Medicine and Biology, 2002.
[41] Peter R. Kowey and Dusan Z. Kocovic. Ambulatory electrocardiographic
recording. Circulation, Volume 108, pages e31–e33, 2003.
[42] Konstantin Y. Kupeev. On significant maxima detection: A fine-to-coarse al-
gorithm. In Proceedings of the 13th International IEEE Conference on Pattern
Recognition, Volume 2, pages 270–274, Los Alamitos, CA, USA, August 1996.
IEEE.
[43] P. Laguna, R. G. Mark, A. Goldberger and G. B. Moody. A database for eval-
uation of algorithms for measurement of QT and other waveform intervals in
the ECG. Computers in Cardiology, Volume 24, pages 673–676, 1997.
[44] Paul Lander and Edward J. Berbari. Time-frequency plane Wiener filtering
of the high-resolution ECG: Development and application. IEEE Transactions
on Biomedical Engineering, Volume 44, Number 4, pages 256–265, April 1997.
[45] Cuiwei Li, Chongxun Zheng and Changfeng Tai. Detection of ECG charac-
teristic points using wavelet transforms. IEEE Transactions on Biomedical
Engineering, Volume 42, Number 1, pages 21–28, January 1995.
[46] Ronald A. Li, Michelle Leppo, Takashi Miki, Susumu Seino and Eduardo
Marban. Molecular basis of electrocardiographic ST-segment elevation. Circu-
lation Research, Volume 87, pages 837–839, 2000.
[47] Carina Blomstrom Lundqvist. ACC/AHA/ESC guidelines for the management
of patients with supraventricular arrhythmias — executive summary a report
of the American College of Cardiology/American Heart Association task force
on practice guidelines and the European Society of Cardiology committee for
practice guidelines (writing committee to develop guidelines for the manage-
ment of patients with supraventricular arrhythmias). Journal of the American
College of Cardiology, Volume 42, Number 8, pages 1493–1531, 2003. Devel-
oped in Collaboration with NASPE-Heart Rhythm Society.
[48] Marek Malik. Errors and misconceptions in ECG measurement used for the de-
tection of drug induced QT interval prolongation. Journal of Electrocardiology,
Volume 37, pages 25–33, 2004.
[49] Juan Pablo Martínez, Rute Almeida, Salvador Olmos, Ana Paula Rocha and
Pablo Laguna. A wavelet-based ECG delineator: Evaluation on standard
databases. IEEE Transactions on Biomedical Engineering, Volume 51, Num-
ber 4, pages 570–581, April 2004.
[50] Juan Pablo Martínez, Salvador Olmos and Pablo Laguna. Evaluation of a
wavelet-based ECG waveform detector on the QT database. Computers in
Cardiology, Volume 27, pages 81–84, 2000.
[51] Patrick E. McSharry and Gari D. Clifford. ECGSYN - a realistic ECG wave-
form generator. Free software, published on the Internet.
http://www.physionet.org/physiotools/ecgsyn/ Accessed September 2007.
[52] Patrick E. McSharry, Gari D. Clifford, Lionel Tarassenko and Leonard A.
Smith. A dynamical model for generating synthetic electrocardiogram sig-
nals. IEEE Transactions on Biomedical Engineering, Volume 50, Number 3,
pages 289–294, March 2003.
[53] S. L. Melo, L. P. Caloba and J. Nadal. Arrhythmia analysis using artificial
neural network and decimated electrocardiographic data. Computers in Car-
diology, Volume 27, pages 73–76, 2000.
[54] N. Nikolaev, Z. Nikolov, A. Gotchev and K. Egiazarian. Wavelet domain
Wiener filtering for ECG denoising using improved signal estimate. In IEEE In-
ternational Conference on Acoustics, Speech, and Signal Processing, Volume 6,
pages 3578–3581. ICASSP, IEEE, 2000.
[55] Masahiko Okada. A digital filter for the QRS complex detection. IEEE Trans-
actions on Biomedical Engineering, Volume 26, Number 12, pages 700–703,
December 1979.
[56] Jiapu Pan and Willis J. Tompkins. A real-time QRS detection algorithm.
IEEE Transactions on Biomedical Engineering, Volume 32, Number 3, pages
230–236, March 1985.
[57] W. W. Peterson, T. G. Birdsall and W. C. Fox. The theory of signal detectabil-
ity. IEEE Transactions on Information Theory, Volume 4, Number 4, pages
171–212, September 1954.
[58] Riccardo Poli, Stefano Cagnoni and Guido Valli. Genetic design of optimum
linear and nonlinear QRS detectors. IEEE Transactions on Biomedical Engi-
neering, Volume 42, Number 11, pages 1137–1141, November 1995.
[59] M. Potter, N. Gadhok and W. Kinsner. Separation performance of ICA on
simulated EEG and ECG signals contaminated by noise. In Proceedings of
the 2002 IEEE Canadian Conference on Electrical & Computer Engineering,
pages 1099–1104. IEEE, 2002.
[60] William K. Pratt. Generalized Wiener filtering techniques. IEEE Transactions
on Computers, Volume C-21, Number 7, pages 636–641, July 1972.
[61] Lawrence R. Rabiner and Biing-Hwang Juang. An introduction to hidden
Markov models. IEEE ASSP Magazine, Volume 3, Number 1, pages 4–16,
January 1986.
[62] I. J. Rechner and J. Lipman. The costs of caring for patients in a tertiary
referral Australian intensive care unit. Anaesthesia and Intensive Care, Vol-
ume 33, Number 4, pages 477–482, August 2005.
[63] J. S. Sahambi, S. N. Tandon and R. K. P. Bhatt. Using wavelet transforms
for ECG characterization. IEEE Engineering in Medicine and Biology, pages
77–83, January/February 1997.
[64] Stephen M. Salerno, Patrick C. Alguire and Herbert S. Waxman. Competency
in interpretation of 12-lead electrocardiograms: A summary and appraisal of
published evidence. Annals of Internal Medicine, Volume 138, Number 9, pages
751–761, 2003.
[65] Stan Salvador and Philip Chan. Toward accurate dynamic time warping in
linear time and space. Intelligent Data Analysis, Volume 11, Number 5, pages
561–580, 2007.
[66] Michail I. Schlesinger and Vaclav Hlavac. Ten Lectures on Statistical and
Structural Pattern Recognition, Volume 24, Chapter 1, pages 1–22. Kluwer
Academic Publishers, 2002.
[67] Paul Schluter, Scott Peterson, George Moody, Larry Siegal, Cheryl Jack-
son, Diane Perry, Esmerey Acarturk, John Aumiller, Sidney Blake, Alvin
Blaustein, Chester Conrad, Gary Heller, Michael Malagold, Roger Mark and
Candice Miklozek. MIT-BIH arrhythmia database directory. Online database,
1987. http://www.physionet.org/physiobank/database/html/mitdbdir/mitdbdir.htm.
[68] Jonathan S. Steinberg, Steven Zelenkofske, Shing-Chiu Wong, Mark Gelernt,
Robert Sciacca and Edith Menchavez. Value of the P-wave signal averaged
ECG for predicting atrial fibrillation. Circulation, Volume 88, pages 2618–
2622, 1993.
[69] Karsten Sternickel. Automatic pattern recognition in ECG time series. Com-
puter Methods and Programs in Biomedicine, Volume 68, pages 109–115, 2002.
[70] Yan Sun, Kap Luk Chan and Shankar Muthu Krishnan. Characteristic wave
detection in ECG signal using morphological transform. BMC Cardiovascular
Disorders, Volume 5, Number 28, September 2005.
[71] Yukinori Suzuki. Self-organizing QRS-wave recognition in ECG using neural
networks. IEEE Transactions on Neural Networks, Volume 6, Number 6, pages
1469–1477, November 1995.
[72] John A. Swets, Wilson P. Tanner and Theodore G. Birdsall. Decision processes
in perception. In John A. Swets (editor), Signal Detection and Recognition by
Human Observers, Chapter 1, pages 3–57. John Wiley & Sons, 1964.
[73] P. Szendro, G. Vincze and A. Szasz. Pink-noise behaviour of biosystems. Eu-
ropean Biophysics Journal, Volume 30, pages 227–231, 2001.
[74] Heinz Theres, Weimin Sun, William Combs, Eric Panken, Hardwin Mead, Gert
Baumann and Karl Stangl. P wave and far-field R wave detection in pacemaker
patient atrial electrograms. Pacing and Clinical Electrophysiology, Volume 23,
pages 434–440, April 2000.
[75] Dominic A.M.J. Theuns, A. Peter J. Klootwijk, Dick M. Goedhart and
Luc J.L.M. Jordaens. Prevention of inappropriate therapy in implantable
cardioverter-defibrillators: Results of a prospective, randomized study of tach-
yarrhythmia detection algorithms. Journal of the American College of Cardi-
ology, Volume 44, Number 12, pages 2362–2367, 2004.
[76] P. E. Trahanias. An approach to QRS complex detection using mathemati-
cal morphology. IEEE Transactions on Biomedical Engineering, Volume 40,
Number 2, pages 201–205, February 1993.
[77] C. Valens. A really friendly guide to wavelets.
http://citeseer.ist.psu.edu/valens99really.html Accessed June 2007.
[78] Ba-Ngu Vo, Antonio Cantoni and Kok Lay Teo. Filter Design with Time
Domain Mask Constraints: Theory and Applications, Volume 56 of Applied
Optimization. Kluwer Academic Press, 2001.
[79] Malcolm Woollard, Richard Whitfield, Anna Smith, Michael Colquhoun,
Robert G. Newcombe, Norman Vetter and Douglas Chamberlain. Skill acqui-
sition and retention in automated external defibrillator (AED) use and CPR
by lay responders: a prospective study. Resuscitation, Volume 60, Number 1,
pages 17–28, January 2004.
[80] Qin Xi, Alan V. Sahakian and Steven Swiryn. The effect of QRS cancellation on
atrial fibrillatory wave signal characteristics in the surface electrocardiogram.
Journal of Electrocardiology, Volume 36, Number 3, pages 243–249, 2003.
[81] Wu Xiaomei, Yang Cuiwei and Fang Zuxiang. The simultaneous multiple sites
epicardial mapping of ventricular fibrillation. In Engineering in Medicine and
Biology 27th Annual Conference, pages 3861–3863. IEEE, September 2005.
[82] Qiuzhen Xue, Yu Hen Hu and Willis J. Tompkins. Neural-network-based adap-
tive matched filtering for QRS detection. IEEE Transactions on Biomedical
Engineering, Volume 39, Number 4, pages 317–329, April 1992.
[83] C. Zywietz and D. Celikag. Testing results and derivation of minimum per-
formance criteria for computerized ECG-analysis. Computers in Cardiology,
pages 97–100, 1992.