Statistical Models of Mammographic
Texture and Appearance
A thesis submitted to the University of Manchester for the
degree of Doctor of Philosophy in the Faculty of Medical
and Human Sciences
2005
Christopher J. Rose
School of Medicine
Contents

1 Introduction
  1.1 Introduction
  1.2 Breast cancer
  1.3 Computer-aided mammography
  1.4 Novelty detection
  1.5 Generative models
  1.6 Overview of the thesis
  1.7 Summary
2 Breast cancer
  2.1 Introduction
  2.2 Anatomy of the breast
  2.3 Breast cancer
    2.3.1 What is breast cancer?
    2.3.2 Predictive factors
    2.3.3 Prevention
    2.3.4 Clinical detection
    2.3.5 Treatment
    2.3.6 Survival
  2.4 Breast imaging
    2.4.1 X-ray mammography
    2.4.2 Ultrasonography
    2.4.3 Magnetic resonance imaging
    2.4.4 Computed tomography
    2.4.5 Thermography
  2.5 Summary
3 Computer-aided mammography
  3.1 Introduction
  3.2 Computer-aided mammography
  3.3 Image enhancement
  3.4 Breast segmentation
  3.5 Breast density and risk estimation
  3.6 Microcalcification detection
  3.7 Masses
  3.8 Spiculated lesions
  3.9 Asymmetry
  3.10 Clinical decision support
  3.11 Evaluation of computer-based methods
  3.12 Image databases
  3.13 Commercial systems
  3.14 Prompting
  3.15 Discussion
  3.16 Summary
4 Scale-orientation pixel signatures
  4.1 Introduction
  4.2 Mathematical morphology
    4.2.1 Dilation and erosion
    4.2.2 Opening and closing
    4.2.3 M- and N-filters
  4.3 Pixel signatures
    4.3.1 Local scale-orientation descriptors
    4.3.2 Constructing pixel signatures
    4.3.3 Metric properties
  4.4 Analysis of the current implementation
    4.4.1 Structuring element length
    4.4.2 Local coverage
  4.5 An information theoretic measure of signature quality
    4.5.1 Aims
    4.5.2 Method
    4.5.3 Results
    4.5.4 Discussion
  4.6 Classification-based evaluation
    4.6.1 Aims
    4.6.2 Method
    4.6.3 Results
    4.6.4 Discussion
  4.7 Summary
5 Modelling distributions with mixtures of Gaussians
  5.1 Introduction
  5.2 Background
  5.3 Density estimation
  5.4 Gaussian mixture models
    5.4.1 Learning the parameters
    5.4.2 The k-means clustering algorithm
    5.4.3 The Expectation Maximisation algorithm for Gaussian mixtures
  5.5 Useful properties of multivariate normal distributions
    5.5.1 Marginal distributions
    5.5.2 Conditional distributions
    5.5.3 Sampling from a Gaussian mixture model
  5.6 Learning from large datasets
  5.7 Summary
6 Modelling mammographic texture for image synthesis and analysis
  6.1 Introduction
  6.2 Background
  6.3 Non-parametric sampling for texture synthesis
  6.4 A generative parametric model of texture
  6.5 Generating synthetic textures
    6.5.1 Pixel-wise texture synthesis
    6.5.2 Patch-wise texture synthesis
    6.5.3 The advantages and disadvantages of a parametric statistical approach
  6.6 Some texture models and synthetic textures
    6.6.1 A model of fractal mammographic texture
    6.6.2 A model of real mammographic texture
    6.6.3 The quality of the synthetic textures
    6.6.4 Time and space requirements of the parametric method
  6.7 Novelty detection
  6.8 Summary
7 Evaluating the texture model
  7.1 Introduction
  7.2 Psychophysical evaluation of synthetic textures
    7.2.1 Aims
    7.2.2 Method
    7.2.3 Results
    7.2.4 Discussion
  7.3 Initial validation of the novelty detection method
    7.3.1 Aim
    7.3.2 Method
    7.3.3 Results
    7.3.4 Discussion
  7.4 Evaluation of novelty detection performance
    7.4.1 Introduction
    7.4.2 Aims
    7.4.3 Method
    7.4.4 Results
    7.4.5 Discussion
  7.5 Summary
8 GMMs in principal components spaces and low-dimensional texture models
  8.1 Introduction
  8.2 Dimensionality reduction
  8.3 Gaussian mixtures in principal components spaces
    8.3.1 A numerical issue
  8.4 Texture synthesis in principal components spaces
  8.5 Discussion
  8.6 Summary
9 A generative statistical model of entire mammograms
  9.1 Introduction
  9.2 Background
    9.2.1 Why are mammograms hard to model?
    9.2.2 Approaches to modelling the appearance of entire mammograms
  9.3 Modelling and synthesising entire mammograms
    9.3.1 Breast shape and the correspondence problem
    9.3.2 Approximate appearance
    9.3.3 Detailed appearance
    9.3.4 Generating synthetic mammograms
  9.4 Example synthetic mammograms
  9.5 Summary
10 Evaluating the synthetic mammograms
  10.1 Introduction
  10.2 Qualitative evaluation by a mammography expert
  10.3 A quantitative psychophysical evaluation
    10.3.1 Aims
    10.3.2 Method
    10.3.3 Results
    10.3.4 Discussion
  10.4 Evaluating the detailing model
  10.5 Summary
11 Summary and conclusions
  11.1 Introduction
  11.2 Summary
  11.3 Conclusions
  11.4 Final statement
A The expectation maximisation algorithm
  A.1 Introduction
  A.2 The algorithm
  A.3 Proof of convergence
List of Figures

2.1 Basic anatomy of the normal developed female breast.
2.2 Incidence of breast cancer in England.
2.3 The mediolateral-oblique and cranio-caudal views.
3.1 An example microcalcification cluster.
3.2 An example circumscribed mass.
3.3 An example spiculated lesion.
4.1 Dilation.
4.2 A sieved mammographic image.
4.3 Example pixel signatures.
4.4 An illustration of the two limitations of the existing implementation.
4.5 Incremental approximations of the bow tie structuring element.
4.6 Rotating the “rectangular” structuring elements.
4.7 An “improved” pixel signature from the centre of a Gaussian blob.
4.8 Regions of increased Shannon entropy.
4.9 An example region of interest and its groundtruth.
5.1 An illustration of the expectation maximisation algorithm.
5.2 A two-dimensional distribution marginalised over one dimension.
5.3 A conditional distribution.
5.4 The divide-and-conquer clustering algorithm.
6.1 Unconditional samples from the fractal model.
6.2 Fractal training and synthetic textures.
6.3 Unconditional samples from the real mammographic texture model.
6.4 Real training and synthetic textures.
6.5 Examples of synthesis failure using patch-wise synthesis with a model of real mammographic appearance.
7.1 A screenshot of one of the trials.
7.2 Fractal and scrambled textures.
7.3 ROC curve for texture discrimination.
7.4 The circle chord attenuation function.
7.5 The sigmoid attenuation function.
7.6 Examples of simulated masses using the three methods.
7.7 Example log-likelihood image and ROC curve for simulated microcalcifications.
7.8 Example log-likelihood image and ROC curve for a simulated mass.
7.9 ROC curve for simulated masses and microcalcifications (combined).
7.10 Example log-likelihood image and ROC curve for a real microcalcification cluster.
7.11 ROC curve for real masses.
7.12 ROC curve for real microcalcifications and masses (combined).
8.1 Synthesis using a principal components model.
9.1 Examples of mammographic variation.
9.2 Overview of the Active Appearance Model.
9.3 Samples from two shape models, illustrating the need for good correspondences.
9.4 Values of the Kotcheff and Taylor objective function.
9.5 Values of the MDL objective function.
9.6 The initial and final correspondences for the mammogram shape model.
9.7 Block diagram for the steerable pyramid decomposition.
9.8 The coefficients in the top three levels of a steerable pyramid decomposition of a mammogram.
9.9 Synthetic mammograms generated using the model.
9.10 Real and synthetic mammograms.
10.1 Contributions of detailing coefficients to real and synthetic mammograms.
List of Algorithms

1 The non-iterative k-means algorithm.
2 The iterative k-means algorithm.
3 The EM algorithm for fitting a GMM with two components to one-dimensional data.
4 The EM algorithm for fitting a GMM with multiple components to multivariate data.
5 Efros and Leung’s texture synthesis algorithm.
6 Pixel-wise texture synthesis with a Gaussian mixture model of local textural appearance.
7 Patch-wise texture synthesis with a Gaussian mixture model of local textural appearance.
8 Fractal mammographic texture algorithm.
9 Novelty detection using a Gaussian mixture model of texture.
10 Simulating microcalcification clusters.
11 Generating a synthetic mammogram.
List of Tables

4.1 Classification results for the two signature types.
7.1 Results for the psychophysical experiment.
Abstract
Breast cancer is the most common cancer in women. Many countries—including the UK—offer asymptomatic screening for the disease. The interpretation of mammograms is a visual task and is subject to human error. Computer-aided image interpretation has been proposed as a way of helping radiologists perform this difficult task. In typical systems, shape and texture features are classified as true or false detections of specific signs of breast cancer. This thesis promotes an alternative approach in which any deviation from normal appearance is marked as suspicious, automatically encompassing all signs of breast cancer. This approach requires a model of normal mammographic appearance. Statistical models allow deviation from normality to be measured within a rigorous mathematical framework. Generative models make it possible to determine how and why a model is successful or unsuccessful. This thesis presents two generative statistical models. The first treats mammographic appearance as a stationary texture. The second models the appearance of entire mammograms. Psychophysical experiments were used to evaluate synthetic textures and mammograms generated using these models. A novelty detection experiment on real and simulated data shows how the model of local texture may be used to detect abnormal features.
Declaration
No portion of the work referred to in the thesis has been submitted in support of
an application for another degree or qualification of this or any other university
or other institute of learning.
Copyright
1. Copyright in text of this thesis rests with the Author. Copies (by any process) either in full, or of extracts, may be made only in accordance with instructions given by the Author and lodged in the John Rylands University Library of Manchester. Details may be obtained from the Librarian. This page must form part of any such copies made. Further copies (by any process) of copies made in accordance with such instructions may not be made without the permission (in writing) of the Author.

2. The ownership of any intellectual property rights which may be described in this thesis is vested in the University of Manchester, subject to any prior agreement to the contrary, and may not be made available for use by third parties without the written permission of the University, which will prescribe the terms and conditions of any such agreement.

3. Further information on the conditions under which disclosures and exploitation may take place is available from the Head of the School of Medicine.
Dedication
This thesis is dedicated to the memory of Gareth Jones.
In addition to being an excellent office mate, Gareth made a substantial contribution to my PhD research. With his dry sense of humour, willingness to help and pragmatic perfectionism—and despite his admirable unwillingness to bend to the stupidity of others—he motivated me to learn how to prepare documents using the LaTeX typesetting system, contributed to discussions on mathematical matters, helped me with various aspects of MATLAB and UNIX, and radically altered my view of computers and programming. It is a pleasure to have known him, and I wish I had known him better.
Friday 21 November 2003.
Acknowledgements
The author would like to thank the following people:
• My mother, Anne, who has put myself and my brothers first in everything
she has done.
• My girlfriend, Chris, for uncountable reasons.
• My PhD supervisor, Prof. Chris Taylor OBE, who is patient, supportive,
giving and hard-working.
• Anthony Holmes, for his generosity in getting me started.
• Special thanks go to Andrew Bentley, who employed a spotty teenage geek
and taught him electronics and computer programming. This thesis would
not exist without his support—thank you! Thanks also to Richard, David,
Keith and Martin for all their assistance.
• My friends, for their support over the last few years: Stuart, Rick, Rob,
Jimi, Elios, Alan, Caroline, Harpreet, Karen, Ruth and Sian.
• My office mates: Gareth, Craig, Mike, Kaiyan, Basma, Tamader, John and
Rob.
• Other members of ISBE, including Tim Cootes, Carole Twining, Sue Astley,
Paul Beatty, Jim Graham, Ian Scott and Tomos Williams, for their help at
various times during my time as a PhD student.
• The ISBE information technology support team for keeping things ticking.
• Alexandre Nasrallah for proof-reading some of the chapters in this thesis.
Funding
The work described in this thesis was supported by the EPSRC as part of the MIAS-IRC project From Medical Images and Signals to Clinical Information (EPSRC GR/N14248/01 and UK Medical Research Council Grant No. D2025/31).
About the Author
In holiday time during his A-level studies and first degree, Chris Rose worked
for Kraft Jacobs Suchard on a range of electronic and software projects. He
graduated from The University of Manchester in 1999 with a 2.1 BEng (Hons)
degree in Electronic Systems Engineering. He then worked for a small software
house where he developed software and produced training materials for Ericsson.
In 2000, he returned to The University of Manchester to begin a PhD in the
Division of Imaging Science and Biomedical Engineering, under the supervision
of Prof. Chris Taylor OBE. During this period he published the following papers
related to the work in this thesis.
• C. J. Rose and C. J. Taylor. An Improved Method of Computing Scale-Orientation Signatures. In Medical Image Understanding and Analysis, pages 5–8, July 2001.

• C. J. Rose and C. J. Taylor. A Statistical Model of Texture for Medical Image Synthesis and Analysis. In Medical Image Understanding and Analysis, pages 1–4, July 2003.

• C. J. Rose and C. J. Taylor. A Model of Mammographic Appearance. In British Journal of Radiology Congress Series: Proceedings of UK Radiological Congress 2004, pages 34–35, Manchester, United Kingdom, June 2004.

• C. J. Rose and C. J. Taylor. A Statistical Model of Mammographic Appearance for Synthesis and Analysis. In International Workshop on Digital Mammography, 2004. (Accepted, pending.)

• C. J. Rose and C. J. Taylor. A Generative Statistical Model of Mammographic Appearance. In D. Rueckert, J. Hajnal, and G.-Z. Yang, editors, Medical Image Understanding and Analysis 2004, pages 89–92, Imperial College London, UK, September 2004.

• C. J. Rose and C. J. Taylor. A Holistic Approach to the Detection of Abnormalities in Mammograms. In British Journal of Radiology Congress Series: Proceedings of UK Radiological Congress 2005, page 29, Manchester, United Kingdom, June 2005.

• A. S. Holmes, C. J. Rose, and C. J. Taylor. Measuring Similarity between Pixel Signatures. Image and Vision Computing, 20(5–6):331–340, April 2002.

• A. S. Holmes, C. J. Rose, and C. J. Taylor. Transforming Pixel Signatures into an Improved Metric Space. Image and Vision Computing, 20(9–10):701–707, August 2002.
‘As many truths as men. Occasionally, I glimpse a truer Truth, hiding in imperfect simulacrums of itself, but as I approach, it bestirs itself and moves deeper into the thorny swamp of dissent.’
From Cloud Atlas by David Mitchell.
Chapter 1
Introduction
1.1 Introduction
Since work for this thesis began, approximately 64 000 British women have died
from breast cancer [24]. Computer-aided X-ray mammography has been proposed as a way to help radiologists detect breast cancer at an early stage. This
thesis describes work on generative statistical models of normal mammographic
appearance. The ultimate aim of this strand of research is to be able to detect
breast cancer as a deviation from normal appearance. The generative property
enables insight into what has been modelled successfully and where improvement
is needed. Two generative statistical models of mammographic appearance are
described.
This chapter presents a brief overview of the main subjects and motivations of
this thesis. The chapter presents:
• An overview of breast cancer.
• An overview of computer-aided mammography.
• A description of novelty detection, the approach to breast cancer detection
that motivates this thesis.
• A description of generative models, and an explanation of why this property
is vital to developing accurate models.
• An overview of the organisation of the thesis.
1.2 Breast cancer
Approximately 11 500 women die from breast cancer each year in England and Wales and it is the most common cancer in women (both in the UK and worldwide) [82]. It is possible to detect breast cancer at an early stage using X-ray
mammography; treatments are available and survival rates are good [82]. The UK
National Health Service Breast Screening Programme (NHSBSP) was initiated in
1988 as a result of the Forrest report [66], published in 1987. All asymptomatic
women aged 50–69 are invited for X-ray mammographic screening every three
years. Radiologists visually inspect these X-ray images for signs of breast cancer
and other problems. A more detailed background to breast cancer and screening
is presented in Chapter 2.
1.3 Computer-aided mammography
Research into the use of computers to detect breast cancer in mammograms has
been underway for about thirty years. In the most common approach, a computer automatically analyses a digitised mammogram and attempts to locate
signs of cancer. Detections are displayed to clinicians as prompts on a computer
screen or paper printout. Computer-aided mammography research has matured
to the point where, in 1998, the US Food and Drug Administration (FDA) gave
pre-market approval to the ImageChecker system, developed by R2 Technology
Incorporated. Three other systems have since been given FDA approval. How-
ever, results from research into the effectiveness of these systems in the clinical
environment are mixed. A large prospective study recently showed that expert
screening radiologist performance in one academic practice was not improved by
the use of a computer-aided mammography system [76] (see Section 3.14 for a
more detailed discussion). Other studies have indicated that such systems can
help radiologists detect breast cancer earlier [8]. Psychophysical experiments
that have studied the effect of the false prompt rate (i.e. incorrect detections of
cancer) on radiologist performance indicate that the number of true and false
prompts must be approximately equal if radiologist performance is to be improved [95]. Only 5% of screening mammograms have any form of abnormality.
This suggests that a target rate should be approximately 0.0125 false positives
per image (see Chapter 3). Commercial systems operate at much higher false
positive rates. For example, R2 Technology Incorporated claim that version 8.0
of their ImageChecker algorithm achieves ‘1.5 false positive marks per normal
case at the 91 percent sensitivity level’ [149]. This perhaps explains why the
commercial computer-aided mammography systems do not appear to improve
radiologist performance. Research is needed to determine how computer-aided
mammography systems can be improved and how the false positive rate can be
reduced to the target level. It is likely that much more sophisticated approaches
will be required. This thesis investigates one such approach, which is described
briefly in the next section.
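The target rate quoted above can be reproduced with simple arithmetic. The sketch below is illustrative rather than the thesis's own derivation (which appears in Chapter 3); the assumption that a screening case comprises four films (two views of each breast) is mine:

```python
# Back-of-envelope derivation of the target false-positive rate.
# Assumed: ~5% of screening cases show some abnormality, and a case
# comprises 4 films (mediolateral-oblique and cranio-caudal views of
# each breast), so true prompts occur at roughly 0.05 / 4 per image.
abnormal_case_fraction = 0.05   # ~5% of screening mammograms are abnormal
films_per_case = 4              # assumption: two views of each breast

true_prompts_per_image = abnormal_case_fraction / films_per_case

# For prompting to help, false prompts should roughly match true ones [95],
# so the target false-positive rate equals the true-prompt rate.
target_false_positives_per_image = true_prompts_per_image
print(target_false_positives_per_image)  # 0.0125
```

Compare this with the roughly 1.5 false positives per normal case quoted for commercial systems: the gap is around two orders of magnitude.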
1.4 Novelty detection
Breast cancer, as imaged in mammograms, can manifest itself in a number of
different ways. Masses appear as “blob”-like features, microcalcifications appear
as very small specks, architectural distortions subtly change the appearance of
the breast tissue and spiculated masses have radiating linear structures. Each of
these can be extremely subtle. Current computer-aided mammography methods
typically target only microcalcifications and masses (including spiculated masses),
and treat each type of abnormality separately. A common approach is to locate
candidate abnormalities (often using ad hoc methods), compute measurements of
shape and texture (called features) and then use a classifier to classify the features
into clinically meaningful classes (e.g. malignant or benign). The approach has a
number of drawbacks:
• Different features and classifiers are required for each type of abnormality.
• The features and classifiers implicitly and incompletely model the appearance of normal and breast cancer tissue. These tissue types are subject to significant variation.
• It is often difficult to justify why a particular measure of texture or shape
is better than another and what it actually represents.
• The use of ad hoc methods risks the accidental adoption of assumptions
about the data.
The approach advocated in this thesis is novelty detection, which is motivated
by the fact that signs indicative of breast cancer are not found in pathology-free
mammograms. If deviation from normality could be detected, then all types of
abnormality would automatically be detectable. This approach requires a model
of what normal mammograms look like. Mammograms vary dramatically, both
between women and between screening sessions, so such a model must be able to
cope with this variability. Statistical models capture variability and are suited to
novelty detection problems because deviation from normality can be measured in
a meaningful way within a rigorous mathematical framework. Abnormal mammograms are relatively rare in the screening environment, so there is much more
data with which to train a model of normality than there is to train a classifier
that has an “abnormal” class.
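The essence of the approach can be sketched in a few lines. The toy below stands in for the thesis's texture models: it fits a single one-dimensional Gaussian to "normal" training measurements and flags any value whose log-likelihood falls below a threshold as novel. (The thesis uses Gaussian mixture models over high-dimensional texture descriptors; the single Gaussian, the feature values and the thresholding rule here are illustrative assumptions only.)

```python
import math

def fit_gaussian(samples):
    """Estimate the mean and variance of pathology-free training data."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

def log_likelihood(x, mean, var):
    """Log density of x under the fitted Gaussian model of normality."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Train on "normal" measurements only; no abnormal class is required.
normal_train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.05, 9.95, 10.1]
mean, var = fit_gaussian(normal_train)

# One simple choice of threshold: the lowest log-likelihood seen in training.
threshold = min(log_likelihood(x, mean, var) for x in normal_train)

for x in [10.0, 13.5]:
    novel = log_likelihood(x, mean, var) < threshold
    print(x, "novel" if novel else "normal")
```

Note that the detector never sees abnormal examples during training, which is exactly why the approach suits screening, where abnormal data are scarce.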
1.5 Generative models
If novelty detection is to be used, then the underlying model must be able to
“legally” represent any pathology-free instance and be unable to legally represent
abnormal instances. The only way to verify this is to be able to generate instances
from the model; thus the model must be generative. Further, generative models
make it relatively easy to visualise what has been modelled successfully and what
has not. The generative property makes progress towards a model that accurately
explains mammographic appearance tractable. The aim of the research presented
in this thesis was to develop and evaluate generative statistical models of normal
mammographic appearance with the ultimate aim of being able to detect breast
cancer via novelty detection. Two models have been developed and evaluated.
The first assumes that mammograms are textures and neglects the shape of the
breast and the spatial variability in mammographic texture. The model allows
synthetic textures to be generated and can be used in an analytical mode to
perform novelty detection. The second is a generative statistical model of entire
mammograms and addresses many of the problems associated with modelling
mammographic appearance.
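The generative/analytical duality can be made concrete with a small sketch (using invented two-dimensional "feature vectors", not the models of later chapters): a single multivariate Gaussian is fitted to training data, synthetic instances are drawn from it for visual verification, and the same density scores how normal a new instance is.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training data: feature vectors from normal mammogram patches.
train = rng.multivariate_normal([100.0, 20.0], [[25.0, 5.0], [5.0, 9.0]], size=500)
mu = train.mean(axis=0)
cov = np.cov(train, rowvar=False)

# Generative mode: draw synthetic instances from the fitted model; inspecting
# them shows what has (and has not) been modelled successfully.
synthetic = rng.multivariate_normal(mu, cov, size=5)

# Analytical mode: the same model scores an instance; a low log-density
# indicates a potentially abnormal instance.
def log_density(x):
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.inv(cov) @ d)

print(log_density(np.array([100.0, 20.0])) > log_density(np.array([200.0, 80.0])))  # True
```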
1.6 Overview of the thesis
• Chapter 2 presents background information on breast cancer, the clinical
problem and the various imaging modalities that are used to diagnose the
disease.
• Chapter 3 presents a review of the computer-aided mammography litera-
ture.
• Chapter 4 describes work on improving the way that scale-orientation
pixel signatures (a type of texture feature) are computed. A measure of
signature quality, based upon information theory, is developed and a simple
classification experiment is presented.
• Chapter 5 presents background information on the multivariate normal
distribution and the Gaussian mixture model. These models are used ex-
tensively in this thesis.
• Chapter 6 presents Efros and Leung’s algorithm for texture synthesis and
develops the method into a parametric statistical model of texture that can
be used in both generative and analytical modes. Synthetic textures are
presented.
• Chapter 7 presents a psychophysical evaluation of synthetic mammo-
graphic textures produced by the model developed in Chapter 6. A novelty
detection experiment using simulated and real data is presented.
• Chapter 8 presents an investigation into how Gaussian mixture models
(and hence the class of texture model presented in Chapter 6) may be
learned in low-dimensional principal components spaces. Texture synthesis
and analysis using such models is discussed.
• Chapter 9 describes a generative statistical model of entire mammograms
and shows how synthetic mammograms may be generated.
• Chapter 10 presents three evaluations of the synthetic mammograms gen-
erated using the model of entire mammograms.
• Chapter 11 summarises the work presented in the thesis.
1.7 Summary
This chapter presented a brief overview of the subjects, motivations and structure
of this thesis. The next chapter presents an introduction to breast cancer and
the imaging modalities used to detect the disease.
Chapter 2
Breast cancer
2.1 Introduction
This chapter introduces the clinical problem of breast cancer and describes how
medical imaging is used to detect the disease. The chapter discusses:
• The anatomy of the breast.
• Breast cancer and its risk factors, prevention, detection, treatment and
survival.
• The various medical imaging modalities used to detect breast cancer, par-
ticularly X-ray mammography.
2.2 Anatomy of the breast
The main purpose of the female breast is to produce and deliver milk to offspring.
Additionally, breasts are a secondary sexual characteristic and serve to indicate
sexual maturity. A brief description of the basic anatomy of the breast follows,
but the interested reader is directed to [172] for a comprehensive description
within the context of mammography.
The breast itself is a modified sweat gland and is composed of several structures,
illustrated in Figure 2.1. Above the ribcage is the pectoral muscle. At the front
of the breast, and externally visible, is the nipple. Milk is produced in lobes and
delivered to the nipple by ducts. These are collectively referred to as parenchymal
or glandular tissue; they are the functional structures of the breast, as opposed to
being connective or supporting tissues. The areola exposes glands that lubricate
the nipple during breastfeeding. Circular radiating muscles behind the areola
cause the nipple to become erect upon tactile stimulation, facilitating suckling.
The lymphatic system is responsible for protecting the body from infection from
microorganisms and antigens. This is achieved by transporting the microorgan-
isms and antigens to the lymph nodes where they are dealt with by the body’s
cellular immune system. Blood is transported to and from the breast by the vas-
culature. Blood delivers oxygen and nutrients and removes waste products. The
structure of breast is supported by Cooper’s ligaments and also contains adipose
(fatty) tissue, neither of which are shown in Figure 2.1.
Figure 2.1: Basic anatomy of the normal developed female breast. Key: A, pectoral muscle; B, vasculature; C, lobe; D, duct; E, lymph node and lymphatic system; F, nipple; G, areola.
2.3 Breast cancer
Breast cancer is almost exclusively a disease that affects women: 11 491 women
and 82 men died from breast cancer in England and Wales in 2002 [82]. We will
now briefly examine the background to the disease.
2.3.1 What is breast cancer?
We will now briefly discuss the cellular basis of cancer1. Our bodies are composed
of cells, which typically carry all of the genetic information required to determine
how we will grow. Cancer is an umbrella term for a group of diseases that cause
cells in the body to reproduce in an uncontrolled manner.
Cells have several abilities, one of which is reproduction. Reproduction is achieved
via cell division. At each cell division, the genetic material contained within the
mother cell is copied to the daughter cells via a robust mechanism. This robust
mechanism can detect errors in the genetic material contained within the cell and
can instruct the cell to “commit suicide” via programmed cell death (PCD)2 to
prevent the erroneous information from being propagated.
Recent cancer research has suggested that an enzyme called telomerase [77] plays
an important role. At each normal cell division, genetic material at the ends
of the chromosomes is lost. To prevent useful genetic material from being de-
stroyed, the ends of chromosomes have redundant repeating genetic sequences
1. The interested reader is directed to [36] for background material on cellular biology.
2. PCD is also referred to as apoptosis.
called telomeres. Part of these sequences are lost at each cell division, but the
genetic information specific to the organism is preserved. If telomeres become
too short, or are deleted entirely, the body interprets the genetic sequence as
being broken. In this situation, the cell can be instructed to perform PCD, or
reparative mechanisms can be employed. These reparative mechanisms can intro-
duce genetic mutations. Cancer cells are “immortal” in that they do not respond
to PCD instructions. Telomerase—an enzyme that builds new telomeres—is ex-
pressed in approximately 90% of cancers, and the telomeres in cancer cells do not
shorten. It is believed that telomerase may be the reason why cancer cells are
immortal. Cancer cells divide rapidly until they are forcefully destroyed (e.g. by
medical intervention or the death of the host organism). Cancer cells are there-
fore genetically abnormal, but the exact genetic nature of cancer is not yet fully
understood.
Cancers are named after their originating organ (i.e. breast cancer originates in
the breast and is composed of pathological breast tissue). Cancer cells can break
away from their original location and travel through the vascular or lymphatic
systems. These cells may lodge to form secondary cancers in other parts of the
body. This process is called metastasis. The new cancer is named after the
originating tissue and new location, for example secondary breast cancer of the
brain. Breast cancer generally develops in the ducts (ductal cancer), but may
also develop in the lobes (lobular cancer).
The terms cancer and tumour are not synonymous. A tumour may be benign or
malignant. Benign tumours are abnormal growths, but do not grow uncontrol-
lably or metastasise, and are not necessarily life-threatening. The word cancer
is synonymous with the phrase malignant tumour. Benign tumours can become
malignant, but malignant tumours do not become benign. Cancer is caused by a
number of factors that can act individually or in combination [5]. These include:
• External factors, e.g. exposure to:
– Chemicals—particularly tobacco use
– Infectious organisms
– Radiation
• Internal factors, e.g.:
– Inherited and metabolic genetic mutations
– Hormones
– Immunity responses
Breast cancers can be described as being in situ (i.e. they have not spread from
their originating duct or lobule), and are often cured [4]. Alternatively, breast
cancers can be described as being invasive or infiltrating (i.e. they have broken
into the surrounding fatty tissue of the breast). The severity of an invasive breast
cancer is related to the stage of the disease, which describes how far it has spread
(e.g. it is confined to the breast, or surrounding tissue, or has metastasised to
distant organs). The following terms are often used to describe the stage of the
disease [37]:
• Stage 1
– The tumour is no larger than 2 cm in diameter.
– The lymph nodes in the armpit are unaffected.
– The cancer has not metastasised.
• Stage 2
– The tumour is between 2 cm and 5 cm in diameter, and/or the cancer
has spread to the lymph nodes under the armpit.
– The cancer has not spread elsewhere in the body.
• Stage 3
– The tumour is larger than 5 cm in diameter.
– The cancer has spread to the lymph nodes under the armpit.
– The cancer has not spread elsewhere in the body.
• Stage 4
– The tumour may be any size.
– The lymph nodes in the armpit are often affected.
– The cancer has spread to other parts of the body.
2.3.2 Predictive factors
The risk of developing breast cancer increases with age, as Figure 2.2 illustrates.
In the USA, 95% of new cases and 96% of breast cancer deaths in the period
Figure 2.2: Incidence of breast cancer in England. The incidence of breast cancer in English women in 2001 per 100 000 population as a function of age. Linear interpolation is used between data points. Source of data: National Statistics [21].
1996–2000 occurred in women aged 40 and older [4].
Risk factors can be grouped by relative risk3 [4]:
• Relative risk > 4.0
– Inherited genetic mutations (particularly BRCA1 and/or BRCA2).
– Two or more first-degree relatives4 diagnosed with breast cancer at an
early age.
3. Relative risk is defined as the ratio of the probability of the disease in the group exposed to the risk, to the probability of the disease in a control group.
4. A first-degree relative is a mother, father, sister, brother, daughter or son.
– Post-menopausal breast density.
• Relative risk > 2.0 and ≤ 4.0
– One first-degree relative with breast cancer.
– High dose of radiation to the chest.
– High post-menopausal bone density.
• Relative risk > 1.0 and ≤ 2.0
– Late age at first full-term pregnancy (> 30 years).
– Early menarche (< 12 years).
– Late menopause (> 55 years).
– No full-term pregnancies.
– Recent oral contraceptive use.
– Recent and long-term hormone replacement therapy.
– Tall stature.
– High socioeconomic status.
– Post-menopausal obesity.
Tobacco use is not necessarily linked to breast cancer. Some studies have shown
that smoking is not associated with the disease, while others have indicated a
link [43]. Effects due to smoking are confounded by alcohol use, which correlates
with both tobacco use and increased breast cancer risk. Alcohol is the dietary
factor most consistently associated with increased breast cancer risk [4] and breast
cancer risk increases by about 7% per alcoholic drink consumed per day [112].
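For concreteness: relative risk is a simple ratio of probabilities, and a per-drink increase of about 7% compounds multiplicatively across drinks. A small arithmetic sketch (the probabilities are invented for illustration):

```python
# Relative risk: P(disease | exposed) / P(disease | control group).
def relative_risk(p_exposed, p_control):
    return p_exposed / p_control

print(relative_risk(0.02, 0.01))  # 2.0

# The reported ~7% increase per daily alcoholic drink [112] compounds:
def risk_multiplier(drinks_per_day, increase=0.07):
    return (1 + increase) ** drinks_per_day

print(round(risk_multiplier(1), 2))  # 1.07
print(round(risk_multiplier(3), 2))  # 1.23
```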
2.3.3 Prevention
Breast cancer cannot be prevented entirely, owing to environmental and inherited risk
factors. However, it should be possible to reduce the incidence of cancers that can
be attributed to lifestyle factors via behavioural modification.
One of the most important lifestyle changes that can be made is the management
of alcohol consumption: even moderate alcohol use is associated with increased
breast cancer risk [4]. Moderate alcohol consumption has a cardio-protective
effect, so advice on alcohol consumption must consider more than just breast
cancer risk [182]. Women who are not known to have an increased risk of breast
cancer are advised to adopt a healthy lifestyle by limiting alcohol, avoiding to-
bacco use and by maintaining a healthy weight through regular exercise and a
diet that is low in fats and high in fruit and vegetables. However, this advice
is not specific to breast cancer, and instead considers evidence for all common
diseases [182]. Women who are known to have an increased risk of breast cancer
should be advised accordingly.
There is debate within the clinical community about how women should be ad-
vised regarding tobacco use and its effect on breast cancer risk. Some favour
honest advice that states that the balance of evidence shows no or little increased
risk, while others favour advice that emphasises the evidence that indicates that
there is an increased risk in some circumstances, and that women should be dis-
couraged from smoking because of other associated risks (e.g. lung cancer) [43].
General practitioners should consider the risk of breast cancer when prescrib-
ing hormonal medications such as hormone replacement therapy or oral contraceptives.
Women at very high risk may be offered prophylactic mastectomy or
treatment with a drug such as Tamoxifen [4].
2.3.4 Clinical detection
Breast cancer is most successfully treated at an early stage and it has been rec-
ommended for the past 30 years or so that women perform regular breast self-
examination (BSE). In recent years this advice has been challenged. A Canadian
meta-analysis failed to find evidence that BSE reduces breast cancer mortality,
but found that BSE results in more benign breast biopsies and increased patient
distress [12]. The study recommended that women should not be taught BSE, but
the author stresses the difference between BSE and breast self-awareness, and en-
courages the latter [118]. An American study found that women who had benign
biopsies after performing BSE tended to perform BSE less frequently as a result
[13]. Advice on BSE and breast self-awareness needs to be informed by evidence
that weighs the risks of increased biopsy rate and distress against the potential benefits. The
American Cancer Society currently recommends that women optionally perform
monthly BSE [4].
Some countries have implemented national screening programmes—where women
are invited for asymptomatic X-ray imaging of the breast (mammography) to
detect cancer at an early stage. The International Breast Cancer Screening Net-
work currently has 27 member countries who have pilot or established national
or subnational screening programmes [101]. These members are predominantly
developed countries in North America, Western Europe and the Far East. The
UK National Health Service Breast Screening Programme (NHSBSP) was initi-
ated in 1988 as a result of the Forrest report [66]. Women between the ages of
50 and 70 (formerly 65) are invited for screening every three years. Women now
have two views of each of their breasts imaged at each screening session, resulting
in 13% more breast cancers being detected in 2002/3 compared with the previ-
ous 12 months when a single view was used [133]. A 14 year follow-up of the
Edinburgh randomised trial of breast screening, published in 1999, showed that
breast screening reduced breast cancer mortality by 13% [2]; the NHSBSP an-
nual review for 2004 [133] claims that mortality dropped by 30% in the preceding
decade, though this success cannot be attributed to breast screening alone.
The benefits of asymptomatic breast screening are disputed and some argue that
screening may even be detrimental to the health of women. Gøtzsche and Olsen
argue that there is no reliable evidence that screening mammography reduces
mortality and that screening may result in distress and unnecessarily aggressive
treatment [73, 136]. However, their conclusions are largely based upon meta-
analyses which debunk studies that show that screening has a positive effect,
rather than upon data that show that screening has a negative effect. Another
criticism of screening mammography is economic. While the cost per woman
screened is low (approximately £40 [134] in the UK), another picture emerges
when one looks at the cost per life saved. The UK NHSBSP currently costs
approximately £52M per year and is estimated to save approximately 300 lives
per year [134]. This equates to an approximate average cost of £173 300 per
life saved. By the year 2010, it is estimated that the NHSBSP will save 1 250
lives per year; this will bring the cost per life saved down to approximately
£41 600 (assuming other factors do not change). In 1995, the cost per life saved
by the Ontario, Canada screening programme was estimated to be £558 000,
based upon the cost of a single mammography examination and the estimated
number of women who would need to be screened in order for one life to be saved
[186]. Variation in the cost of screening can be attributed to the environment
and manner in which screening and treatment are implemented. It is a matter
for those responsible for public health policy to determine the best use of available
resources given the evidence for and against screening mammography.
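The per-life figures above follow directly from the quoted totals; checking the arithmetic:

```python
annual_cost_gbp = 52_000_000   # approximate NHSBSP annual cost [134]
lives_saved_per_year = 300     # current estimate [134]
lives_saved_2010 = 1_250       # projected by 2010

cost_per_life_now = annual_cost_gbp / lives_saved_per_year
cost_per_life_2010 = annual_cost_gbp / lives_saved_2010

print(round(cost_per_life_now))   # 173333, i.e. approximately £173 300
print(round(cost_per_life_2010))  # 41600
```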
Molecular tests are now available that can detect some of the BRCA genetic
mutations [4] and these may be used routinely in the future. Consideration is
being given to a UK-wide programme to use magnetic resonance imaging to screen
pre-menopausal women at high genetic risk of breast cancer [28].
2.3.5 Treatment
Treatment for breast cancer is dependent upon several factors: the stage of dis-
ease and its biological characteristics, patient age and the risks and benefits as
determined by clinicians and the patient [4]. Surgery to remove the cancerous
tissue is common, and the type of surgery is chosen to balance the need to re-
move the cancer with the disfigurement that the surgery will cause. Surgery may
involve (in order of increasing disfigurement):
• Lumpectomy—which can be employed when the cancer is localised—involves
removing the “lump” and a border of “normal” tissue which is checked to
ensure that all cancerous tissue has been removed.
• Simple mastectomy (or total mastectomy) involves the removal of the entire
breast.
• Modified radical mastectomy involves the removal of the entire breast and
underarm lymph nodes.
• Radical mastectomy involves the removal of the breast, underarm lymph
nodes and chest wall muscle. This type of surgery is now used less frequently
as less disfiguring approaches have proved to be effective [4].
Surgery is often used alongside chemotherapy, hormone therapy, biologic (also
called immune and antibody) therapy or radiotherapy. Chemotherapy, hormone
and biologic therapies are systemic treatments in that they are applied to the
entire body—rather than a specific organ—with the intention of killing cancer
cells that may have metastasised.
Chemotherapy is a drug treatment that kills rapidly dividing cells. This includes
cancer cells as well as some types of normal cells, such as blood and hair cells.
Chemotherapy, in combination with surgery, has been shown to deliver five year
survival rates of between 50% and 70% [25].
Hormone therapy attempts to prevent the growth of metastasised cancer cells
by blocking the effects of hormones (such as oestrogen) that can promote their
growth. An anti-oestrogen drug called Tamoxifen has been used successfully,
but recent research shows that the aromatase inhibitor anastrozole significantly
increases disease-free survival over five years compared to Tamoxifen [75].
Trastuzumab (marketed under the name Herceptin) is a biologic therapy that
targets cancer cells which produce an excess of a protein called HER2. When
combined with chemotherapy, trastuzumab treatment can reduce the relative
risk of mortality by 20%, but can increase the risk of heart failure [119].
In contrast to the systemic treatments, radiotherapy (also called radiation ther-
apy) is targeted at specific locations. High energy radiation is focused on areas of
the body affected with cancer (such as the breast, chest wall or underarm area).
Alternatively, small radiation sources, called pellets, can be implanted into the
cancer. There is no significant difference in survival between women who have
small breast tumours removed by lumpectomy compared to those who also re-
ceive radiotherapy, but women who receive radiotherapy have a reduced risk of
their cancer returning and therefore require less additional treatment [64].
2.3.6 Survival
The one and five year survival rates for English women diagnosed with breast
cancer between 1993 and 2000 were 92.6% and 75.9% respectively [148]. For
comparison, in the same period the mean one and five year survival rates in
both sexes for lung cancer—the second most common cancer in women and most
common cancer in men—were 21.6% and 5.5% respectively. In the USA, the
five year survival rate for women with breast cancer is 87% [4]. There is also an
association between low socioeconomic status, poor access to medical care and
additional illness and low survival rates [4].
2.4 Breast imaging
This section introduces X-ray mammography—the most common form of clinical
imaging used to detect breast cancer—and briefly discusses the other imaging
modalities that may be used.
2.4.1 X-ray mammography
X-rays were discovered by Wilhelm Conrad Röntgen in 1895, who was awarded
the first Nobel Prize in Physics for his discovery. X-rays are high-frequency
electromagnetic radiation (30 PHz–60 EHz) and are useful in diagnostic imaging
because the dense tissues in the body are more likely to absorb X-rays (i.e. they
are radio-opaque) while the soft tissues are less likely to absorb X-rays (i.e. they
are radiolucent). X-rays are formed by accelerating electrons from a heated cath-
ode filament towards an anode. The interaction of the high energy electrons with
the anode emits radiation in the X-ray spectrum. This radiation is then directed
towards the patient.
X-rays are detected using photographic film or digitally (e.g. using a charge-
coupled device). By placing a body part between the X-ray source and detector,
it is possible to form an image that spatially describes the X-ray absorption of
the body part. This image will be a two-dimensional projection of the three-
dimensional structure.
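The absorption behaviour described above is conventionally modelled by the Beer–Lambert law, I = I₀ exp(−Σᵢ μᵢtᵢ), where μᵢ is the attenuation coefficient of tissue i along the ray and tᵢ its thickness. A sketch with invented coefficients (not calibrated values):

```python
import numpy as np

def transmitted_intensity(i0, mus, thicknesses):
    """Beer-Lambert attenuation along a single ray through layered tissues."""
    return i0 * np.exp(-np.sum(np.asarray(mus) * np.asarray(thicknesses)))

# Illustrative (made-up) coefficients: dense tissue attenuates more than fat.
i_fat = transmitted_intensity(1.0, mus=[0.2], thicknesses=[4.0])    # radiolucent
i_dense = transmitted_intensity(1.0, mus=[0.8], thicknesses=[4.0])  # radio-opaque

# More X-rays reach the detector through fatty tissue than through dense tissue.
print(i_fat > i_dense)  # True
```

Each pixel of a mammogram records such a line integral, which is why the image is a 2-D projection of the 3-D structure.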
X-rays were first used to investigate breast cancer almost a hundred years ago
[160]. An X-ray mammogram is obtained by imaging the breast compressed
Figure 2.3: The mediolateral-oblique and cranio-caudal views. The diagram illustrates the directions of compression used in the mediolateral-oblique (MLO) and cranio-caudal (CC) views. The MLO view is illustrated on the left in blue and the CC view is illustrated on the right in red.
between two parallel radiolucent plates. Different directions of compression allow
clinicians to view the three-dimensional structure of the breast in more than one
way. This allows ambiguities caused by occlusion or other perspective effects to
be minimised. Two common views are the cranio-caudal view (CC—“head to
tail”) and the mediolateral-oblique view (MLO; where the compression is angled
approximately 45◦ to the CC view). These are illustrated in Figure 2.3.
X-ray mammography is the imaging modality of choice for breast cancer investigation,
and the UK National Health Service Breast Screening Programme generates
hundreds of thousands of mammograms each year [133]. X-ray mammography
is favoured because of its high resolution (required to image microcalcifications)
and low cost (approximately £40 per woman screened [134]).
Fully digital systems are increasing in quality and popularity. The advantages of
fully digital systems may include:
• Direct digital image acquisition.
• Increased sensitivity compared to film-based methods, permitting lower ra-
diation dosage.
• Immediate image display and enhancement.
• Improved archival and transmission possibilities (including remote image
analysis by human or computer).
It is expected that fully digital mammography will soon supersede film-based
mammography. Fully digital mammography is likely to benefit the computer-
aided mammography research community, as the digitisation step required for
film-based mammography is an impediment to the collection of useful image
data.
Although X-ray mammography remains the most useful imaging modality for
breast cancer, it is dependent upon the use of radiation, which itself can cause
cancer. It is likely that some cancers are caused by the screening programme.
Efforts are made to monitor and minimise radiation dose.
Mammograms are most commonly read visually as X-ray films, although com-
mercial computer-aided mammography and digital systems are being used—
particularly in the USA (see Section 3.13 for a discussion of commercial sys-
tems). In the screening environment, dedicated viewing stations are loaded with
a batch of mammograms. The mammograms are positioned so that left and
right breasts—and CC and MLO views, if both are available—can be compared
directly. Radiologists use strategies to try to ensure that ‘danger zones’ are al-
ways examined. In the UK screening environment, it is typical that a radiologist
will take an average of 30 s to read each patient’s mammograms. Radiologists
record their assessments and difficult cases are likely to be discussed with col-
leagues. If double reading is used—where two radiologists independently read
each mammogram—a protocol will be followed to combine the assessments of
each radiologist.
Women for whom screening indicates abnormality are recalled for further investi-
gation such as a magnification X-ray or ultrasound. The diagnosis of breast cancer
may be confirmed by analysing a tissue sample extracted by biopsy. Because the
interpretation of mammograms is a difficult task and is subject to human error,
biopsies are sometimes performed on women who do not have cancer. The recall
process is traumatic and biopsy—like any surgery—causes discomfort and worry.
The benign biopsy rate in 2002/3 was 1.20 per 100 000 women screened [133].
The benign biopsy rate has improved with advances in diagnostic technique.
The radiological signs of breast cancer are described in Chapter 3; example images
are given for the most common indicative signs.
2.4.2 Ultrasonography
Ultrasound imaging works by sending high-frequency sound pulses into the tissues
of a patient using an array of transceivers that is placed on the patient’s skin.
When these sounds encounter tissue interfaces, some of the sound is reflected back
to the array. The distances from the skin surface to the tissue interfaces are then
computed based upon the time between the pulses being sent and received and
the speed of the sound wave. One-dimensional transceiver arrays produce image
slices, while two-dimensional arrays produce volumes. These are presented to
the ultrasonographer on a computer display. Ultrasound images are generated in
real-time and are useful in breast cancer investigation when a suspicious feature
has been identified by X-ray mammography or when a patient has reported with
symptoms [63]. Ultrasound is particularly useful for differentiating between cysts
(which are benign) and malignant masses.
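The depth computation reduces to halving the round-trip time of flight. A sketch, assuming the commonly quoted average speed of sound in soft tissue of about 1540 m/s (a standard textbook figure, not taken from this chapter):

```python
# Pulse-echo ranging: the pulse travels to the tissue interface and back,
# so depth = speed * round_trip_time / 2.
SPEED_OF_SOUND_TISSUE = 1540.0  # m/s, average for soft tissue (assumed)

def interface_depth_mm(echo_time_s):
    return SPEED_OF_SOUND_TISSUE * echo_time_s / 2 * 1000

# A 26 microsecond round trip corresponds to an interface about 20 mm deep:
print(round(interface_depth_mm(26e-6), 1))  # 20.0
```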
2.4.3 Magnetic resonance imaging
The human body is composed largely of water, which in turn is composed largely
of hydrogen. A hydrogen atom has an unpaired proton, and so has a non-zero
nuclear spin. In magnetic resonance imaging (MRI), the patient is placed in a
strong uniform magnetic field (usually between 0.23 T and 3.0 T). This forces the
spins of the protons in the hydrogen atoms to align with the field. Almost all
protons will be paired, in that each member of a pair will be oriented at 180◦ to
the other, but some will not. A radio frequency pulse can temporarily deflect the
unpaired protons.
The imparted energy is released as electromagnetic radiation as the spins realign
with the field. The realignment signal is characteristic of tissue type and can
be measured. By applying an additional graduated magnetic field it is possible
to localise the signals, since their frequency is related to their position in the
graduated field. The received signals are recorded in a frequency space called
K-space. An inverse Fourier transform is applied to form the corresponding
spatial volumetric data. Voxel values represent tissue type and hence the patient’s
anatomy.
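The K-space reconstruction step can be sketched by treating the 2-D Fourier transform of a toy image as stand-in "measured" K-space data and recovering the image with an inverse FFT (a simplification: real acquisitions sample K-space line by line and include noise):

```python
import numpy as np

# A tiny stand-in "anatomy" image: a square tissue region on empty background.
phantom = np.zeros((64, 64))
phantom[24:40, 24:40] = 1.0

# Forward transform stands in for the signals recorded in K-space.
k_space = np.fft.fft2(phantom)

# Reconstruction: inverse Fourier transform back to the spatial domain.
reconstruction = np.fft.ifft2(k_space).real

print(np.allclose(reconstruction, phantom))  # True: the image is recovered
```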
The spatial resolution of current clinical MRI systems is not as good as that of X-
ray mammography, so microcalcifications cannot be imaged. However, MRI has
several advantages over X-ray imaging: patients are exposed to little radiation,
three-dimensional data can be acquired and contrast agents can be used. Un-
fortunately, MRI is currently too expensive for routine asymptomatic screening
for breast cancer, but may be useful for screening younger women whose family
history and/or genetic status suggest that they are at increased risk of breast cancer
[28].
2.4.4 Computed tomography
In computed tomography (CT), an X-ray source is rotated around the patient’s
major axis. Whereas the beam of a conventional X-ray can be considered to be
conical (i.e. 3-D), CT typically uses a fan-shaped (“triangular”) beam, i.e. a very thin slice of the cone.
The attenuation of the beam as it passes through the patient is recorded by an
X-ray detector positioned opposite to the source. The attenuation data from all
orientations can be combined to compute a 2-D image “slice”, where each location
in the slice represents the X-ray attenuation of the tissue to which it corresponds.
By slowly passing the patient through the rotating mechanism, 3-D data can be
acquired. Although it is possible to use CT for breast imaging, it is rarely used
to diagnose breast cancer [117]. The technique can be useful for surgical planning
and to assess the patient’s response to treatment.
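The combination step can be sketched with unfiltered backprojection at just two orientations (a drastic simplification: real CT uses many orientations and filters the projections before backprojecting):

```python
import numpy as np

# A tiny phantom slice: one dense region in otherwise empty tissue.
slice_ = np.zeros((8, 8))
slice_[2, 5] = 1.0

# Projections at two orientations: line sums along rows (0 deg) and columns (90 deg).
p0 = slice_.sum(axis=1)   # one attenuation value per row
p90 = slice_.sum(axis=0)  # one attenuation value per column

# Unfiltered backprojection: smear each projection back across the slice and
# sum; the true location accumulates the most attenuation.
back = p0[:, None] + p90[None, :]

row, col = np.unravel_index(back.argmax(), back.shape)
print(int(row), int(col))  # 2 5: the dense region is localised
```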
2.4.5 Thermography
Advanced cancers promote angiogenesis—the development of a blood supply to
the tumour. Regions containing more blood are hotter than others, and this heat
may be detectable on the skin surface. Thermography is an imaging technique
that forms maps of the emission of infrared radiation [117]. These maps enable
clinicians to look for asymmetries in the heat patterns on the breasts that may
result from angiogenesis or the enhanced metabolic processes that occur in a
tumour. Compared to X-ray mammography, thermography lacks specificity and
resolution.
2.5 Summary
This chapter presented an introduction to breast cancer and the imaging modal-
ities used to detect the disease. In summary:
• Breast cancer is a significant public health issue. While many countries
now have screening programmes to help detect the disease at an early and
treatable stage, the image interpretation task is performed visually and is
subject to human error.
• X-ray mammography is the most useful imaging modality because of the
high image quality and low cost. X-ray mammography allows the anatomy
of the breast to be imaged at very high resolution, allowing very small
indicative signs of breast cancer—such as microcalcifications—to be seen.
• X-ray mammography does have some drawbacks (e.g. the use of radiation,
2-D projection of the 3-D structure, potential for poor patient positioning,
potential for poor film exposure and development).
• Other imaging modalities have their uses in detecting and diagnosing breast
cancer, but X-ray mammography for screening is unlikely to be replaced by
any of the currently available imaging techniques.
Chapter 3
Computer-aided mammography
3.1 Introduction
This chapter presents a review of the computer-aided mammography literature.
The chapter reviews:
• Image pre-processing.
• Automatic prediction of breast cancer risk.
• The appearance of common signs of breast cancer and approaches to their
detection by computer.
• Methods of evaluating computer-aided mammography systems.
• Common image databases.
• Available commercial systems.
• Research on computer-aided prompting of radiologists.
We discuss the typical approach to computer-aided mammography and the problems
associated with it, and propose how these problems might be addressed.
3.2 Computer-aided mammography
Although screening mammography has been shown to reduce breast cancer mor-
tality [133, 2], it suffers from some problems that computer vision systems might
be able to solve, for example:
• Double reading improves cancer detection rate [57], but it cannot always be
performed in the screening environment due to human resource or economic
limitations. Computer vision systems could act as a second reader.
• The interpretation of mammograms is a difficult task and human error does
occur [97]. Computer vision systems could deliver a guaranteed minimum
quality of screening and potentially catch some of the errors made by radi-
ologists.
• Cancer is detected in less than 1% of women screened [133]. A computer
vision system that could accurately dismiss mammograms that were normal
could dramatically reduce radiologist workload.
The most commonly proposed approach to computer-aided mammography is
prompting, in which a computer system automatically analyses a digitised mam-
mogram and places prompts on a representation of the mammogram—e.g. an
image of the digitised mammogram displayed on screen or a paper printout—to
indicate the presence and location of possible signs of abnormality. A radiol-
ogist would then consider these prompts alongside their own interpretation of
mammograms. Prompting is discussed further in Section 3.14. The following re-
view of computer-aided mammography research generally assumes the prompting
approach, but other paradigms are also discussed.
3.3 Image enhancement
Image enhancement describes approaches that change the characteristics of im-
ages to make them more amenable to other tasks (e.g. inspection by humans or
further processing by computer). This includes noise suppression or equalisation,
image magnification, grey-level manipulation (e.g. brightness and contrast im-
provement) and feature enhancement or suppression. Generic image enhancement
techniques are well-established and are routinely used within more sophisticated
algorithms.
A commonly-used algorithm is histogram equalisation [168]. Histogram equali-
sation attempts to modify the grey-level values in an input image such that the
histogram of those values matches a specified histogram, which is often flat. If
a flat target histogram is specified, the result will be an image that uses the
entire range of grey-levels, with increased contrast near maxima in the original
histogram, and decreased contrast near minima. A possible problem with the
approach is that the image is modified based upon global image statistics, which
might not be appropriate in local contexts. Local histogram modification tech-
niques use local neighbourhoods, while adaptive histogram modification methods
use local contextual information [168].
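The global form of the algorithm can be sketched in a few lines of NumPy. This is a minimal illustration of equalisation towards a flat target histogram, not a reproduction of any method cited above; the patch and its parameters are made up for the example.

```python
import numpy as np

def equalise(image, n_levels=256):
    """Global histogram equalisation towards a flat target histogram.

    Each grey-level is mapped to a value proportional to the cumulative
    histogram (CDF) of the input, which stretches the image towards the
    full grey-level range.
    """
    hist = np.bincount(image.ravel(), minlength=n_levels)
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]                                   # normalise to [0, 1]
    lut = np.round(cdf * (n_levels - 1)).astype(np.uint8)
    return lut[image]                                # apply as a look-up table

# A low-contrast patch confined to a narrow grey-level band is stretched
# towards the full range.
rng = np.random.default_rng(0)
patch = np.clip(rng.normal(128, 5, (64, 64)), 0, 255).astype(np.uint8)
out = equalise(patch)
```

Note that the mapping depends only on the global histogram, which is precisely the weakness discussed above: a locally appropriate contrast change is not guaranteed.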
Averaging filters replace pixel values with the average of those within a local
neighbourhood. Using the mean tends to blur edges (as it is essentially a low-
pass filter) while using the median does not. Bick et al. used median filtering to
remove noise spikes [15, 168]. Lai et al. used a modified median filter where the
set of pixels considered by the filter was restricted to exclude those that were too
dissimilar to the pixel that the filter was centred on [116]. The approach achieved
better edge preservation compared to the standard median filter. Such methods
are “coarse” in that they rarely have any model of the domain in which they
operate (e.g. such filters might mistake film noise for small microcalcifications
because they have no “knowledge” about those two classes of image feature).
Zwiggelaar et al. used a directional recursive median filter to construct mam-
mographic feature descriptors [192] (see Section 3.8 and Chapter 4 for a more
detailed description of such descriptors).
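The restricted-neighbourhood idea behind Lai et al.'s filter can be sketched as follows. This is a simplified illustration, not their exact formulation: the similarity test and the `max_diff` threshold are illustrative choices. The point of the restriction is that pixels on the far side of a strong edge are excluded from the median, so a clean step edge passes through unchanged.

```python
import numpy as np

def selective_median(image, radius=1, max_diff=20):
    """Median filter restricted to neighbours whose grey-level is within
    `max_diff` of the centre pixel, so pixels from the far side of a
    strong edge do not contribute to the median."""
    padded = np.pad(image.astype(int), radius, mode='edge')
    out = np.empty_like(image)
    size = 2 * radius + 1
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            window = padded[y:y + size, x:x + size].ravel()
            keep = window[np.abs(window - image[y, x]) <= max_diff]
            out[y, x] = np.median(keep)
    return out

# A clean step edge passes through the filter unchanged.
step = np.zeros((8, 8), dtype=np.uint8)
step[:, 4:] = 100
filtered = selective_median(step)
```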
Grey-level values can be manipulated via the Fourier domain. For example,
image smoothing can be used in an attempt to suppress noise by attenuating
high-frequency components [168]. However, methods that operate only in the
Fourier domain lack spatial information, and so important context may not be
available. Wavelets address this problem as they can be used to describe images
in terms of both space and frequency, and are commonly used in mammography.
Wavelet analysis was used by Qian et al. [147] to enhance microcalcifications by
selectively reconstructing a subset of the wavelet sub-band images. Compared
to Fourier methods, wavelets allow the characteristics of the signal(s) of interest
to be specified more precisely. Wavelets were used in place of ad hoc texture
features by Campanini et al. [35] and used to statistically model mammographic
texture by Sajda et al. [159] (see Section 3.7 for a more detailed discussion of
these methods).
In contrast to the frequency-based methods such as Fourier and wavelet analy-
sis, mathematical morphology analyses images based upon the shape of image
features. It can be used to remove image features of a given shape and size
(e.g. Dengler et al. considered microcalcification candidates [55]). A possible
problem with mathematical morphology is that a specification of shape is re-
quired: image features that vary dramatically in shape may require very many
such specifications, leading to implementation issues. A detailed discussion of
mathematical morphology can be found in Chapter 4.
Noise equalisation is important because machine learning systems are under-
pinned by statistical methods which often implicitly assume that the noise has
particular characteristics. By equalising the noise, the properties of the image
data are likely to be more closely matched to the assumptions made by the al-
gorithms that operate on that data. Image noise in digitised mammograms may
be considered to vary as a function of grey-level pixel value [106, 166]. Smith
et al. used a radiopaque step-wedge phantom to estimate this relationship in order
to correct the non-uniformity [166], but a phantom is likely to be a nuisance in a
screening environment. Karssemeijer and Veldkamp described noise equalisation
transforms where the noise is estimated from the image itself—rather than from a
radiological phantom—using the standard deviation of local contrasts [106, 180].
It was demonstrated that equalising the noise using the approach improved the
performance of detection algorithms. This is likely to be due to the explanation
given above.
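The principle of an image-derived noise equalisation transform can be sketched as below. This is a simplified stand-in for the Karssemeijer and Veldkamp approach, not their algorithm: the bin count, the 4-neighbour contrast estimate and the fallback handling are all illustrative choices. The idea is to estimate the contrast noise σ(g) per grey-level bin and then build a new grey scale whose increments are inversely proportional to σ(g), so the noise becomes approximately uniform.

```python
import numpy as np

def noise_equalise(image, n_bins=32):
    """Monotone grey-scale transform making local noise roughly uniform.

    sigma(g) is estimated per grey-level bin from local contrasts
    (difference from the 4-neighbour mean); the new grey scale is the
    cumulative sum of bin_width / sigma(g), which stretches grey-level
    ranges where the noise is small and compresses ranges where it is
    large.
    """
    img = image.astype(float)
    neigh = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
             np.roll(img, 1, 1) + np.roll(img, -1, 1)) / 4.0
    contrast = img - neigh
    edges = np.linspace(img.min(), img.max() + 1e-9, n_bins + 1)
    bins = np.clip(np.digitize(img, edges) - 1, 0, n_bins - 1)
    sigma = np.array([contrast[bins == b].std() if np.any(bins == b) else np.nan
                      for b in range(n_bins)])
    # fill empty or degenerate bins with the mean sigma to avoid division by zero
    fallback = np.nanmean(np.where(sigma == 0, np.nan, sigma))
    sigma = np.where(np.isnan(sigma) | (sigma == 0), fallback, sigma)
    t = np.concatenate(([0.0], np.cumsum(np.diff(edges) / sigma)))
    return np.interp(img, edges, t)

# Synthetic image whose noise grows with grey-level.
rng = np.random.default_rng(1)
base = np.tile(np.linspace(0, 255, 64), (64, 1))
noisy = base + rng.normal(0, 1 + base / 32, base.shape)
equalised = noise_equalise(noisy)
```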
Highnam and Brady [88, 86] proposed a physics-based model of the mammo-
graphic image acquisition process to convert digital mammograms to an image
representation they call hint. In the hint representation pixel values represent the
thickness of the “interesting” (non-fat) tissue. The technique relies upon knowing
several parameters that describe the X-ray imaging process, such as the thickness
of the compressed breast, tube voltage, film type and exposure time. By modelling
the imaging process, the appearance under a set of “standard” imaging conditions
can be predicted, leading to the Standard Mammogram Form (SMF) [88]. It is
not always practical to measure the various imaging parameters during routine
screening and radiologists do not train with such standardised mammograms. It
seems likely that working with mammograms where pixel values represent tangi-
ble quantities will lead to better detection algorithms, but digital mammograms
are not widely available in hint form. One of the goals of the eDiaMoND project
was to make such data available to researchers (see Section 3.12) [23].
The identification of curvilinear structures is useful in detecting and classifying
spiculated lesions (see Section 3.8). Cerneaz and Brady developed a physics-based
model that was used to model the expected attenuation of curvilinear structures
[39]. The authors assumed that such structures are elliptical in cross-section and
so would appear to have strong second derivative components in the image. The
second derivative was used to enhance candidate pixels and a skeletonisation al-
gorithm [168] was used in further processing. Physics-based models would have
to be extremely complex and specific to properly explain the appearance of mam-
mograms. It therefore seems likely that approaches based upon image data itself
have more potential. Most research on digital mammography has used this latter
approach.
3.4 Breast segmentation
The identification of the breast border is a common task in digital mammog-
raphy and the development of reliable automatic methods is important. Such
information is required to limit the search for abnormalities to the breast area
(particularly when algorithms are computationally expensive), or so that some
form of breast shape analysis can be performed (see Chapter 9 for an example).
Locating the breast border is a non-trivial task due both to variation between
women and to variation inherent in the X-ray acquisition process.
Grey-level thresholding is a common approach to breast segmentation. Two
thresholds are generally sought. The first discards pixels with low grey-levels,
assuming them to belong to non-breast radiolucent objects (such as air). The
second discards pixels with high grey-levels, assuming them to belong to non-
breast radiopaque objects (such as film markers). The selection of these thresh-
olds is generally non-trivial, and other information such as shape is often also
used. Byng et al. determined these thresholds manually [33]. They can also be
determined by analysing the shape of the image histogram [42]. Although thresh-
olding can provide an initial estimate of the boundary, the approach is generally
confounded by features such as film markers, and much more sophisticated ap-
proaches that have some model of what the segmented image should look like are
generally used.
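As a toy illustration of the two-threshold scheme, the sketch below combines the thresholds with a largest-connected-component step, which stands in (crudely) for the shape models discussed above; the threshold values, which in practice would come from histogram analysis or an operator, are passed in as arguments. This is an illustrative sketch, not any of the cited methods.

```python
import numpy as np

def largest_component(mask):
    """Label 4-connected components of a boolean mask; keep the largest."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    sizes = {}
    label = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                label += 1
                sizes[label] = 0
                stack = [(sy, sx)]
                labels[sy, sx] = label
                while stack:
                    y, x = stack.pop()
                    sizes[label] += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = label
                            stack.append((ny, nx))
    if not sizes:
        return np.zeros_like(mask)
    return labels == max(sizes, key=sizes.get)

def segment_breast(image, t_low, t_high):
    """Discard radiolucent background (below t_low) and radiopaque
    artifacts such as film markers (above t_high), then keep the largest
    remaining region on the assumption that it is the breast."""
    return largest_component((image > t_low) & (image < t_high))

# Toy film: dark background, a large mid-grey "breast", a bright film
# marker and a small mid-grey scratch.
film = np.zeros((32, 32), dtype=np.uint8)
film[4:28, 2:20] = 120      # breast
film[2:5, 26:30] = 250      # radiopaque film marker (rejected by t_high)
film[30, 28:31] = 120       # small artifact at breast grey-level
mask = segment_breast(film, t_low=40, t_high=200)
```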
Chandrasekhar and Attikiouzel analysed the shape of the cumulative grey-level
image histogram to identify a characteristic ‘knee’ which represents the bound-
ary between background and breast tissue [42]. Adaptive thresholding yielded an
initial segmentation which was then modelled by polynomials. This segmentation
was subtracted from the original image and the result was thresholded, resulting
in a binary image describing the breast and non-breast regions. Morphological
operations were used to remove artifacts arising from film scratches. An imple-
mentation of Chandrasekhar and Attikiouzel’s algorithm was subjectively good
enough to approximately limit the operation of detection algorithms to the breast
region, but was not good enough to allow the shape of entire mammograms to be
modelled in the work described in Chapter 9.
Lou et al. [121] quantised mammograms using k-means clustering and inspected
horizontal slices through the quantised images to determine the direction of a
decrease in pixel value. The direction was used to estimate the left-right orientation
of the breast. Pixel values on the skin-air border were found to lie in one
of three quantised pixel values. This information was used to generate an initial
estimate of the breast border. Actual mammogram pixel values were sampled
along normals to the initial estimate. Pixel values along normals to the breast
border will decrease from values associated with the edge of the breast to those
associated with the non-breast region. Linear models of pixel value as a func-
tion of distance along the normals were used to refine the estimate of the breast
border. A rule-based search was then used to further refine the breast border.
Finally, a B-spline was used to link and smooth the located breast border points.
The approach is sensible because the skin-air border should be relatively easy to
model. However, a common confounding feature is the placement of film markers.
These would pose an occlusion problem to methods that do not also use a model
of legal breast shape.
The active shape model (ASM) [48] has been used in a number of medical and non-
medical applications. An ASM models the statistical variation of shape associated
with a particular class of object and uses a statistical model of pixel values along
normals to the shape boundary to legally deform the model to fit to an object
in an image. The ASM can therefore be viewed as a generalisation of the approach
proposed by Lou et al. [121]. Smith et al. used an ASM to locate the breast
outline [165]. The two main problems with the ASM are that it does not use
all the image information in its search strategy and it requires an initialisation
that is already a good approximation to the final solution. The former was
rectified by the Active Appearance Model [47]. A better approach to breast
border segmentation might be to build a low resolution appearance model (similar
to that described in Chapter 9) and then search over the model parameters to find
those that best describe a low resolution version of the mammogram in question.
This would provide a low resolution estimate of the boundary. The estimate
could then be refined at high resolution using a model of the skin-air boundary
transition. Refinements could be propagated upwards to the low resolution model
where illegal (unlikely) refinements could be rejected.
3.5 Breast density and risk estimation
High post-menopausal breast density is a strong risk factor for breast cancer [4]. Also,
because cancer develops from dense (glandular) tissue it may be masked in mam-
mograms by normal dense tissue. Automatic assessment of the density of breasts
and the risk associated with that density may be helpful to radiologists, particu-
larly as automated methods can provide stable independent measurements, while
there will be inherent variability in assessments made by humans.
Byng et al. proposed a simple interactive approach where users of their system
selected grey-level thresholds to segment the breast region and dense tissue [33].
The proportion of dense to total area was used as a measure of breast density. The
approach is reasonable because the mammographic brightness indicates density,
but it seems that a similar approach using the calibrated hint measure would
be more stable. Additionally, the manual selection of thresholds will introduce
variation between and within users; a fully automated system could avoid such
problems.
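The area-proportion measure itself is simple. A minimal sketch follows, with the thresholds passed in as arguments to stand for the operator's interactive choices in Byng et al.'s system:

```python
import numpy as np

def percent_density(image, breast_threshold, dense_threshold):
    """Proportion of the thresholded breast area that also exceeds the
    dense-tissue threshold (the two-threshold measure of Byng et al. [33],
    with interactive threshold selection replaced by arguments)."""
    breast_area = np.count_nonzero(image > breast_threshold)
    dense_area = np.count_nonzero(image > dense_threshold)
    return dense_area / breast_area

# 100 breast pixels, 25 of which are "dense": density = 0.25.
image = np.full((10, 10), 50.0)
image[:5, :5] = 200.0
density = percent_density(image, breast_threshold=10, dense_threshold=100)
```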
Taylor et al. investigated sorting mammograms into fatty and dense sets using
a multi-resolution non-overlapping tile-based method. A number of statistical
and texture measures, computed for each tile, were evaluated and local skewness
was found to discriminate best between the classes [175]. The reader is referred
to Section 3.15 for a discussion of ad hoc texture descriptors.
Wolfe proposed that parenchymal patterns are related to breast cancer risk [185]
and developed a radiological lexicon for describing the dense and fatty char-
acteristics of mammograms, known as Wolfe grades. The relationship between
parenchymal pattern and breast cancer risk has been confirmed by Boyd et al. [22]
and van Gils et al. [178]. Tahoces et al. statistically modelled various texture de-
scriptors to predict Wolfe grades [173].
Caldwell et al. used fractal dimension—a measure of the complexity of a self-
affine object—with mammographic images (considered as surfaces) to measure
textural characteristics. They classified mammograms by Wolfe grade, based
upon average fractal dimension and the difference between that average and the
fractal dimension of a region near the nipple [34].
Karssemeijer divided the breast into radial regions so that the distances to the
skin line were approximately equal. Grey-level histograms were computed for
each region, and the mean, standard deviation and skewness were used to classify
mammograms by Wolfe grade using a k-nearest neighbour classifier [107]. The
success of the method can probably be attributed to the statistical characterisa-
tion of the appearance of the mammograms.
Zhou et al. used a rule-based method that classified mammograms according to
prototypical characteristics in their grey-level histograms. This classification was
used to automatically select a threshold with which to segment the dense tissue.
The proportion of dense to total breast area was then computed [189]. Detecting
a well-understood feature in a 1-D function (the histogram) can be reasonably
easy, although the approach is dependent upon the stability of these histogram
characteristics.
A Gaussian mixture model of texture descriptors, learned using the Expectation-
Maximisation (EM) algorithm, was used by Zwiggelaar et al. to segment mammo-
grams into six tissue classes [193]. The area of dense tissue—as segmented by the
model—as a proportion of total area was used in a k-nearest neighbour framework
to classify mammograms into one of five density classes. Although learning the
distribution of texture features allows a principled statistical approach to be used,
it is not clear that the clustering produced by the EM algorithm would necessarily
correspond to a clustering that an expert might produce. Further, the EM algo-
rithm aims to find the best fit of a model of a probability density function to the
data, rather than to partition the data (as Algorithm 3 in Chapter 5 explains, in
the EM algorithm every data point belongs to every model component, so there
is no actual partitioning). Dedicated clustering methods might have been more
appropriate. Gaussian mixture models are discussed in some detail in Chapter 5
and a proof of the convergence property of the EM algorithm is presented in
Appendix A.
3.6 Microcalcification detection
Microcalcifications are tiny (approximately 500 µm) specks of calcium. A cluster
of microcalcifications can indicate the presence of an early cancer. Microcalci-
fications can sometimes be detected easily as they can be much brighter than
the surrounding tissue. However, small microcalcifications may appear to be
very similar to film or digitisation noise. Scratches on the mammographic film
can sometimes be mistaken for bright microcalcifications, particularly by auto-
mated methods. A mammogram containing an obvious microcalcification cluster
is shown in Figure 3.1.
Karssemeijer describes an iterative scheme for updating pixel labels, based upon
three local image descriptors (local contrast at two spatial resolutions and an es-
timate of local orientation). Pre-processing was used to achieve noise equalisation
using information from a radiological phantom. A Markov random field model
was used to model the spatial constraints between four pixel classes (background,
microcalcifications, lines or edges, and film emulsion errors) and a final labelling
was achieved via iteration [105]. Local methods are appropriate for individual
microcalcification detection because of their small size, but are inappropriate for
cluster detection. Detecting clusters of microcalcifications is important because
their form contains important information about the cause of the cluster (e.g. ma-
lignancy). In addition, it can be difficult to determine when Markov random field
models have converged.
Veldkamp et al. [179] classified microcalcification clusters as being malignant or
benign by estimating the likelihood of them being malignant. Individual micro-
calcifications were detected using Karssemeijer’s method. Discs were then centred
on each microcalcification and the boundaries of the intersection of the discs were
computed. Microcalcifications were clustered according to which boundary they
were located within. The procedure was performed for both mediolateral-oblique
and cranio-caudal views, and correspondences were determined between clusters
in each view. Features used for classification included the relative location of the
cluster in the breast, measures of calcification distribution within the cluster and
shape features. The likelihood of malignancy was computed as the ratio of the
number of malignant to benign neighbours in the k-nearest neighbourhood.

Figure 3.1: An example microcalcification cluster. The location of the cluster is
indicated by the red circle. The bottom left image shows a magnification of the
cluster; the bottom right image shows a histogram equalised version of the
magnified cluster. Source: the Mammographic Image Analysis Society digital
mammogram database [171].

The approach is sensible because it acknowledges that it is the clusters that are
important, includes information about the form of clusters and delivers a statistical
measure of the likelihood of malignancy.
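The final step of such a scheme can be sketched as a k-nearest-neighbour score. Here the fraction of malignant neighbours stands in for the malignant-to-benign ratio used by Veldkamp et al., and the training features and labels are invented purely for illustration.

```python
import numpy as np

def malignancy_score(features, train_features, train_labels, k=3):
    """Fraction of malignant cases (label 1) among the k nearest
    training clusters in feature space."""
    distances = np.linalg.norm(train_features - features, axis=1)
    nearest = np.argsort(distances)[:k]
    return train_labels[nearest].mean()

# Hypothetical 2-D cluster features (e.g. relative location, a shape score).
train_features = np.array([[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
                           [0.80, 0.90], [0.90, 0.80], [0.85, 0.95]])
train_labels = np.array([1, 1, 1, 0, 0, 0])  # 1 = malignant, 0 = benign
score = malignancy_score(np.array([0.12, 0.18]), train_features, train_labels)
```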
Bocchi et al. [18] designed a matched filter to enhance microcalcifications by as-
suming a Gaussian model of microcalcifications and a fractal model of mammo-
graphic background. A region growing algorithm was used to segment candidate
microcalcification clusters and to describe the location of each candidate micro-
calcification. An artificial neural network was used to discriminate between mi-
crocalcifications and artifacts of the filtering stage. Segmented regions were char-
acterised by fractal descriptors and these were used in a second artificial neural
network to identify true clusters. The underlying assumptions of the approach—a
Gaussian model of microcalcifications and a fractal model of the mammographic
background—are reasonable but not strictly true. A more realistic model
of these image features may have improved their results.
False positive elimination was addressed by Ema et al. who used edge gradients
at signal-perimeter pixels to eliminate features such as noise or other artifacts
[61]. Zhang et al. used a “shift-invariant” artificial neural network to segment
candidate microcalcifications [188]. The size and “linearity” of candidate micro-
calcifications were analysed to reject false positives due to vessels. Both of these
methods implicitly attempt to model the neighbourhood around true microcalci-
fications and direct modelling of that neighbourhood—such as that described in
Chapter 6—might be more appropriate.
3.7 Masses
Masses are abnormal growths and may be malignant or benign. Masses may ap-
pear to be localised bright regions, but are often very similar in appearance to,
and may be obscured by, normal glandular tissue. Detection and discrimination
of masses can be difficult even for expert mammography radiologists. Malignant
masses are often characterised by linear features radiating from the mass, called
spicules, and we discuss methods for detecting and assessing spiculation in Sec-
tion 3.8. A mammogram containing an obvious circumscribed mass is shown in
Figure 3.2.
A common approach to the detection and classification of masses is to determine
candidate mass regions and then compute descriptors for the region designed
to allow discrimination between true and false detections. The problems that
research addresses are how candidate mass locations are found, which features
should be extracted and how they should be combined to yield a classification.
Karssemeijer and te Brake compared two methods for segmenting masses [177].
The first grew a region from a seed location, expanding the region if neighbour-
ing pixels were above a certain threshold. The region growing was repeated using
a number of thresholds and the “best” region was selected using a maximum
likelihood method that considered the distribution of pixel grey-levels inside and
outside the region. The second method was a dynamic contour defined by a set
of connected vertices, similar to the method proposed by Kass et al. [111]. The
vertices were accelerated towards the mass boundary using internal and external
forces.

Figure 3.2: An example circumscribed mass. The location of the mass is indicated
by the red circle. The bottom left image shows a magnification of the mass; the
bottom right image shows a histogram equalised version of the mass. Source: the
Mammographic Image Analysis Society digital mammogram database [171].

The internal forces served to encourage compactness and circularity of the
region, while the external forces served to encourage the boundary to converge on
strong image gradients. A damping force was used to promote convergence. The
authors report that the two methods produced segmentations that were similar
to those of radiologists. The segmentations produced by the dynamic contour
model allowed better discrimination between normal and abnormal regions when
geometric and texture features were used within an artificial neural network clas-
sifier. Region growing methods generally only consider local neighbourhoods,
and so segmentations can have illegal shapes. Dynamic contour methods depend
upon the form of the forces used to constrain them. Equations relating the im-
age content to the force applied to the vertices tend to be ad hoc in nature, and
so it is easy for assumptions about the data to be implicitly included. It may
be more appropriate to learn the form of the constraining forces than to choose
them manually. Dynamic contour methods do not generally have any notion of
the range of legal shapes that they may take. This is often a problem in cases
where the objects of interest have prototypical characteristics (e.g. the shapes of
people’s hands), but is appropriate for objects such as mammographic masses
where the shapes lack typical structure (i.e. have very high variability).
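The first of the two segmentation methods can be sketched as a simple threshold-driven region grower. This is a simplified illustration of the idea only: the repetition over multiple thresholds and the maximum-likelihood selection of the best region, described above, are omitted, and the seed, threshold and image are invented.

```python
import numpy as np
from collections import deque

def grow_region(image, seed, threshold):
    """Grow a 4-connected region from `seed`, absorbing neighbours whose
    grey-level exceeds `threshold`."""
    h, w = image.shape
    region = np.zeros((h, w), dtype=bool)
    if image[seed] <= threshold:
        return region                      # seed itself fails the test
    queue = deque([seed])
    region[seed] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not region[ny, nx] \
                    and image[ny, nx] > threshold:
                region[ny, nx] = True
                queue.append((ny, nx))
    return region

# A bright 3x3 "mass" on a dark background is recovered from a seed
# placed at its centre.
img = np.zeros((9, 9))
img[3:6, 3:6] = 100.0
region = grow_region(img, (4, 4), threshold=50)
```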
Haralick et al. used texture descriptors—spatial grey-level dependence (SGLD)
matrices (also called co-occurrence matrices)—to compute texture features [79].
The (i, j)-th element of an SGLD matrix S_{d,θ} describes the number of pixels in
the input image with grey-level i that have a pixel with grey-level j at a distance
of d in direction θ. Petrosian et al. and Chan et al. computed statistics from
these matrices to describe textural characteristics [141, 41]. These were used to
discriminate between textures associated with mass and non-mass regions. As the
number of grey-levels increases (i.e. as the number of bits used in the digitisation
increases), so does the size of the SGLD matrices. This leads to a problem similar
to the “curse of dimensionality” (described in Section 5.3), where a very large
amount of data is required to estimate matrices that adequately characterise the
texture. The
bit-depth of the images can be reduced to make the estimation tractable, but this
can lead to a poor description of the texture.
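The SGLD definition above translates directly into code. The sketch below computes the matrix for one displacement and one of Haralick's statistics (contrast); the stripe image is a contrived example in which horizontally adjacent pixels always differ, so the matrix is purely off-diagonal.

```python
import numpy as np

def sgld(image, d=1, direction=(0, 1), n_levels=4):
    """Spatial grey-level dependence (co-occurrence) matrix: S[i, j]
    counts pixel pairs with grey-levels i and j separated by displacement
    d along `direction` (dy, dx).  Note the n_levels x n_levels size:
    halving the bit-depth quarters the number of entries to estimate."""
    dy, dx = direction[0] * d, direction[1] * d
    S = np.zeros((n_levels, n_levels), dtype=int)
    h, w = image.shape
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                S[image[y, x], image[ny, nx]] += 1
    return S

def contrast(S):
    """A Haralick-style statistic: expected squared grey-level difference."""
    P = S / S.sum()
    i, j = np.indices(S.shape)
    return float(((i - j) ** 2 * P).sum())

# Vertical stripes of grey-levels 0 and 1: horizontal neighbours always
# differ, so S is purely off-diagonal and the contrast is exactly 1.
stripes = np.tile(np.array([[0, 1]]), (4, 2))   # a 4 x 4 image
S = sgld(stripes, d=1, direction=(0, 1), n_levels=2)
c = contrast(S)
```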
Brzakovic et al. segmented mass candidates using a multi-scale fuzzy method. A
textural descriptor was used within a hierarchy of classifiers that used thresholds
and Bayesian methods to classify the candidates as malignant or benign [29].
Wavelets were used by Campanini et al. to detect malignant masses [35]. Wavelet
decompositions were computed on square windows extracted by “scanning” mam-
mograms over a range of scales. A support vector machine classifier was trained
on the wavelet coefficients to classify the windows as malignant masses or normal
regions. For a particular test mammogram, the initial output was a set of binary
images, one at each scale. A majority voting scheme was employed to produce a
final classification. Support vector machines have proved to perform well in high-
dimensional spaces and the authors rely on the ability of the learning system to
extract useful features from the full descriptions provided by the wavelet coeffi-
cients. This is reasonable because it removes the need to make explicit or implicit
assumptions about which image characteristics are appropriate to extract.
Sajda et al. present a generative statistical model of the appearance of mammo-
graphic regions of interest. Wavelet coefficients, computed from mammographic
patches, were statistically modelled using a tree-structured variant of a hidden
Markov model. In addition to being able to generate synthetic mammographic
textures and compress mammographic images, the model can be used in an ana-
lytical mode as an adjunct to a mass detection algorithm to reduce false positives.
Models were trained on mass and non-mass regions of interest and used to com-
pute likelihood ratios for test images [159, 169]. The method is discussed further
in Section 6.2.
3.8 Spiculated lesions
The margin of a mass contains information that radiologists can use to charac-
terise the mass. Margins can be described as circumscribed, obscured, lobulated,
indistinct or spiculated. Spiculations (also called stellate distortions) are curvi-
linear radial features and a strong sign of malignancy. Automated methods seek
to classify the mass margin using either features that describe properties of the
margin or by detecting and classifying spiculations directly. A mammogram con-
taining an obvious spiculated lesion is shown in Figure 3.3.
Scale-orientation pixel signatures¹ corresponding to linear structures were statistically
modelled by Zwiggelaar and Marti [191]. The model was then used to
classify pixels as belonging to linear structures or not. Pixel signatures are a
type of texture feature and describe pixel neighbourhoods in terms of scale and
orientation. Signatures taken from blob-like features are dissimilar to those taken
from linear features. Modelling signatures from linear features is sensible as it
allows the presence of such structures to be analysed in a statistically meaningful
way.

¹Scale-orientation pixel signatures are presented in detail in Chapter 4.

Figure 3.3: An example spiculated lesion. The location of the spiculated lesion is
indicated by the red circle. The bottom left image shows a magnification of the
spiculated lesion; the bottom right image shows a histogram equalised version of
the spiculated lesion. Source: the Mammographic Image Analysis Society digital
mammogram database [171].
Zwiggelaar et al. compared several approaches to the detection of linear struc-
tures [190]. Two variants of a line operator were investigated that computes an
orientation and strength for each pixel by computing the mean pixel value along
oriented lines centred on the pixel in question. Karssemeijer’s method [110] and
a ridge detector designed to minimise the response to “blobs” were also used.
The line operators were found to perform best. An approach to spicule detec-
tion was investigated. Linear features were classified into their anatomical classes
on the basis of their cross-sectional profiles. Noise was reduced using principal
components analysis and classification was achieved by assuming Gaussian mod-
els of class conditional densities. However, in order to detect spiculated lesions,
a method would be required to integrate knowledge from the classified linear
structures.
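A line operator in the spirit of those compared above can be sketched as follows. This is an illustrative simplification, not either of the variants in [190]: the line length, orientation count and the use of a 3×3 local mean as the reference level are all assumptions made for the example.

```python
import numpy as np

def line_operator(image, length=5, n_orientations=8):
    """For each pixel: line strength = (largest mean grey-level along an
    oriented line through the pixel) minus the 3x3 neighbourhood mean,
    with the maximising orientation recorded."""
    h, w = image.shape
    pad = length // 2
    padded = np.pad(image.astype(float), pad, mode='edge')
    offsets = np.arange(length) - pad
    strength = np.full((h, w), -np.inf)
    orientation = np.zeros((h, w))
    for k in range(n_orientations):
        theta = np.pi * k / n_orientations
        samples = []
        for t in offsets:
            dy = int(round(t * np.sin(theta)))
            dx = int(round(t * np.cos(theta)))
            samples.append(padded[pad + dy:pad + dy + h, pad + dx:pad + dx + w])
        line_mean = np.mean(samples, axis=0)
        better = line_mean > strength
        strength[better] = line_mean[better]
        orientation[better] = theta
    # subtract the local mean so that flat regions give zero line strength
    local = np.mean([padded[pad + dy:pad + dy + h, pad + dx:pad + dx + w]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)], axis=0)
    return strength - local, orientation

# A single bright vertical line on a dark background: the operator
# responds strongly on the line, weakly elsewhere, and recovers the
# vertical orientation.
img = np.zeros((11, 11))
img[:, 5] = 100.0
response, angle = line_operator(img)
```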
Karssemeijer computed statistics for a circular region centred on each pixel in
turn that described the concentration and radial uniformity of moderately strong
gradients pointing towards the centre of the region. A continuous multi-resolution
scheme was used to match the feature extraction to the scale of the image fea-
tures. These features were used in an artificial neural network to predict the
likelihood of suspiciousness [108]. Karssemeijer describes a method of comput-
ing the orientation and strength of linear features by combining the responses to
three oriented Gaussian second derivative kernels [110]. While spicules do point in
the general direction of the central mass, they are often curved and so a method
that could determine that a number of curvilinear structures—rather than just
pixels with particular gradients—“point” towards a given area might be more
successful.
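Karssemeijer's measures can be illustrated via an equivalent formulation: the second directional derivative at angle t is cos^2(t)·Ixx + 2 sin(t)cos(t)·Ixy + sin^2(t)·Iyy, so its extrema over t, and hence line strength and orientation, follow from the eigen-decomposition of the Hessian of the Gaussian-smoothed image. The sketch below uses this Hessian route rather than the three-kernel steering of [110]; all names and the choice of scale are illustrative.

```python
import numpy as np

def gaussian_smooth(image, sigma):
    """Separable Gaussian smoothing by direct 1-D convolutions."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    sm = np.apply_along_axis(np.convolve, 0, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 1, sm, kernel, mode="same")

def line_strength_orientation(image, sigma=2.0):
    """Line strength and orientation from the Hessian of the smoothed image.
    For a bright line the second derivative across the line is strongly
    negative, so strength is the negated smallest Hessian eigenvalue; the
    eigenvector of the larger eigenvalue points along the line."""
    sm = gaussian_smooth(np.asarray(image, dtype=float), sigma)
    Iy, Ix = np.gradient(sm)
    Ixy, Ixx = np.gradient(Ix)
    Iyy, _ = np.gradient(Iy)
    disc = np.sqrt(((Ixx - Iyy) / 2.0) ** 2 + Ixy ** 2)
    lam_min = (Ixx + Iyy) / 2.0 - disc
    strength = np.maximum(0.0, -lam_min)
    orientation = 0.5 * np.arctan2(2.0 * Ixy, Ixx - Iyy)
    return strength, orientation

img = np.zeros((21, 21))
img[10, :] = 1.0                     # a horizontal bright line
s, o = line_strength_orientation(img, sigma=2.0)
```

On the synthetic line the strength peaks on the line itself and the recovered orientation is horizontal (zero radians); transposing the image rotates the orientation by ninety degrees.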
A linear discriminant was used by Mudigonda et al. to classify masses as malig-
nant or benign using texture and gradient features extracted from SGLD matrices
computed from “ribbons” around mass borders [132]. One of the problems that
such a method would have is determining the correct region around the border,
as some spicules can be quite short, while others can be relatively long.
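SGLD (co-occurrence) matrices of the kind used by Mudigonda et al. are straightforward to construct. The sketch below builds one for a single pixel offset and computes two classic features, contrast and energy; it is a minimal illustration, not the feature set of [132].

```python
import numpy as np

def sgld_matrix(image, levels, offset=(0, 1)):
    """Spatial grey-level dependence (co-occurrence) matrix for one offset:
    count pairs (image[r, c], image[r + dr, c + dc]) and normalise so the
    matrix is a joint probability over grey-level pairs."""
    dr, dc = offset
    h, w = image.shape
    P = np.zeros((levels, levels))
    for r in range(max(0, -dr), min(h, h - dr)):
        for c in range(max(0, -dc), min(w, w - dc)):
            P[image[r, c], image[r + dr, c + dc]] += 1
    return P / P.sum()

def sgld_features(P):
    """Two classic SGLD texture features: contrast and energy."""
    i, j = np.indices(P.shape)
    contrast = np.sum(P * (i - j) ** 2)
    energy = np.sum(P ** 2)
    return contrast, energy

# A 4x4 image quantised to four grey levels.
quantised = np.array([[0, 0, 1, 1],
                      [0, 0, 1, 1],
                      [0, 2, 2, 2],
                      [2, 2, 3, 3]])
P = sgld_matrix(quantised, levels=4)
contrast, energy = sgld_features(P)
```

In practice, features from several offsets (distances and directions) are concatenated before classification.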
A spiculation descriptor was proposed by Huo et al. and evaluated for four types
of mass region [94]. Let I denote the pixels inside a segmented mass region and O
denote the pixels outside the segmented region but within the region of interest
containing the mass candidate. Four types of region were investigated: I, O,
O∪I and a region lying on the boundary of the segmented mass. The directions
of maximal gradient were computed for each pixel in the region and compared to
the direction defined by the line connecting the centre of gravity of the mass region
to the location of the pixel in question. Statistics computed from these measures
were used to describe the spiculation associated with the mass. The authors
report that features computed from O and O ∪ I provided better estimates of
the likelihood of malignancy than the other regions, but combining the measures
from all regions yielded the best performance. This method is similar to that
of Karssemeijer [108]; again, a method that could determine that a number of
curvilinear structures—rather than just pixels with particular gradients—“point”
towards a given area might be more successful.
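The core of such radial-gradient analysis can be sketched as follows: for each pixel in a region, measure the angle between the image gradient and the radial direction from the region's centre of gravity. The statistics Huo et al. actually compute are not reproduced here; the function below and its summary are illustrative only.

```python
import numpy as np

def radial_gradient_angles(image, mask):
    """For each pixel in `mask`, the angle (radians, in [0, pi/2]) between the
    image gradient and the radial direction from the region's centre of
    gravity. Small angles mean gradients point along (or against) the radii,
    as around an isotropically bright mass."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    rows, cols = np.nonzero(mask)
    cr, cc = rows.mean(), cols.mean()       # centre of gravity of the region
    ry, rx = rows - cr, cols - cc           # radial vectors
    gyv, gxv = gy[rows, cols], gx[rows, cols]
    dot = np.abs(gxv * rx + gyv * ry)       # |cosine| via absolute dot product
    norm = np.hypot(gxv, gyv) * np.hypot(rx, ry)
    valid = norm > 1e-12                    # skip the centre and flat pixels
    return np.arccos(np.clip(dot[valid] / norm[valid], 0.0, 1.0))

# A radially symmetric bright blob: gradients are (anti-)radial, so the mean
# angle should be close to zero. Spicules radiating from a mass would likewise
# yield small angles, while random texture would not.
yy, xx = np.mgrid[0:31, 0:31]
blob = np.exp(-((yy - 15) ** 2 + (xx - 15) ** 2) / 40.0)
mask = np.ones_like(blob, dtype=bool)
angles = radial_gradient_angles(blob, mask)
```

Summary statistics of these angles (e.g. their mean or a histogram) then serve as spiculation measures.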
Evans et al. developed a statistical model of the characteristics of normal curvilin-
ear features [62]. A multi-scale method was used to enhance locally bright ridge-
like structures. Shape description features were computed for non-intersecting
curvilinear features which were then projected into a principal components space.
The distribution of points in the principal components space was modelled us-
ing a Gaussian mixture model. Such a method could be used within a novelty
detection scheme to detect abnormal curvilinear features.
Sahiner et al. classified masses as malignant or benign using morphological and
textural features extracted from a region on the mass periphery. They used
an active contour (see [111, 177]) to segment mass candidates. Morphological
features (e.g. a Fourier descriptor, convexity and rectangularity measures) and
texture features extracted from SGLD matrices were used in a linear discriminant
classifier [158].
A wavelet decomposition was used by Liu et al. to detect and classify masses [120].
Orientation and magnitude features were extracted from each sub-band image and
used within a binary classification tree that processed the features in a coarse to
fine order according to the scale of the sub-band images. This allowed images
to be efficiently processed, as positive mass detections were propagated from
coarse levels, eliminating the need to process all pixels in all sub-band images.
Median filtering was used on the final response images to reduce false positives.
While classifiers like the support vector machine are now more common than
classification trees, the approach taken could allow definitely normal features to
be ignored at little computational expense. However, there is a risk of an increased
false negative and positive rate if some of the available evidence is ignored.
3.9 Asymmetry
Radiologists typically view mammograms as pairs of left and right breasts and
use information in each to help understand the appearance of the other. An
abnormality that is detected as a result of a difference between a pair of mammo-
grams is called an asymmetry, although there is a distinction between a radiolog-
ical asymmetry and a mathematical asymmetry. All pairs of mammograms are
mathematically asymmetrical and this asymmetry may be quite marked while
still being considered normal. Few computer-aided detection algorithms include
asymmetry information and almost certainly suffer as a result.
Giger et al. used mathematical asymmetry to generate candidate mass locations
by registering pairs of breasts and performing bilateral subtraction. Geometric
and texture features were extracted and used within an artificial neural network.
The authors improved performance using temporal subtraction, which can be
considered as another form of asymmetry [71]. A potential problem with this
approach is that the texture analysed is produced by a bilateral subtraction that
depends on a registration. If the behaviour of the registration algorithm is unstable
(e.g. if it performs differently on different types of breast) then the texture dis-
crimination task would be confounded. There is a fundamental problem in the
assumption that dense correspondences can be obtained between a pair of mam-
mograms, because structure may be missing from one or both mammograms
(e.g. the pectoral muscle or nipple may not have been imaged).
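At its simplest, bilateral subtraction can be sketched as below. A real system such as that of Giger et al. registers the two breasts before subtracting; this toy version merely mirrors one image, so it is an illustration of the idea only.

```python
import numpy as np

def bilateral_subtraction(left, right, threshold):
    """Toy bilateral subtraction: mirror the right-breast image so the two
    roughly correspond, subtract, and threshold the positive differences to
    obtain candidate regions. A real system would perform image registration
    rather than a simple mirroring before subtracting."""
    mirrored = right[:, ::-1]
    difference = left.astype(float) - mirrored.astype(float)
    candidates = difference > threshold
    return difference, candidates

left = np.zeros((8, 8))
right = np.zeros((8, 8))
left[3:5, 3:5] = 1.0      # a "density" present only in the left breast
diff, cand = bilateral_subtraction(left, right, threshold=0.5)
```

The candidate mask flags exactly the region present in one breast but not the other; features extracted around such candidates feed the subsequent classifier.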
Miller and Astley [129] note that mathematical asymmetry—typically obtained
via image registration and bilateral subtraction, which may introduce artificial
asymmetry—is not a good model of radiological asymmetry. They propose mea-
sures for three types of radiological asymmetry: shape, intensity and topology.
Radiologists annotated the dense regions of mammograms and correspondence
was assumed between the largest such regions in each pair of mammograms.
Bilateral differences in shape descriptors were used as shape asymmetry mea-
sures. The authors also used the minimum cost of “transporting” grey-levels
from one reduced-resolution mammogram to the other—using the transportation
algorithm [89]—as a measure of intensity asymmetry. Topological asymmetry
was measured using the difference between area and binary moments. A lin-
ear discriminant performed best when all three measures were combined. The
assumption that there is a correspondence between the largest annotated dense
regions may not be correct because it may be possible for a dense region in one
breast to correspond to two (or more) such regions in the other. In addition,
only considering the largest dense regions ignores the contribution to asymmetry
from the other regions. Asymmetry can be a subtle sign of abnormality, and so
applying the transportation algorithm at low resolution may miss the more subtle
asymmetries.
Miller and Astley could only compute transportation cost for low-resolution mam-
mograms because the solution to the transportation programming problem scales
poorly with the number of pixels and computing power was limited when judged
by today’s standards [129]. Board et al. revisited the transportation problem
as an asymmetry measure and developed a multi-resolution transportation algo-
rithm where solutions at low resolutions constrain the problems at higher reso-
lutions, thus allowing only “plausible” transportations [17]. They used the mean
transportation cost per pixel to discriminate between normal and abnormal asym-
metries and the per-pixel transportation cost to localise asymmetries. While this
work addresses the problem that Miller and Astley faced, it is not clear what
transportation cost means in a statistical sense.
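In one dimension the transportation problem has a closed form: with |i − j| ground distance, the minimum cost of transporting one histogram onto another of equal mass is the L1 distance between their cumulative sums. The sketch below illustrates the cost being measured; the mammographic methods above solve the harder two-dimensional problem between images.

```python
import numpy as np

def transport_cost_1d(hist_a, hist_b):
    """Minimum cost of transporting mass from one 1-D histogram to another
    with |i - j| ground distance. For histograms normalised to equal total
    mass this equals the L1 distance between their cumulative sums (a special
    case of the general transportation problem)."""
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    a, b = a / a.sum(), b / b.sum()      # normalise to equal total mass
    return float(np.abs(np.cumsum(a - b)).sum())

# Shifting all mass by one bin costs exactly one bin-width per unit of mass.
cost = transport_cost_1d([1, 0, 0, 0], [0, 1, 0, 0])
```

The cost grows with how far grey-level mass must move, which is why it is attractive as an intensity-asymmetry measure: identical distributions cost nothing, while distributions concentrated in different places cost proportionally more.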
3.10 Clinical decision support
Clinical decision support refers to the use of computer technology to help clini-
cians make clinically relevant decisions. While computer-aided detection (CADe)
is concerned with fully-automatic methods that aim to draw the attention of ra-
diologists to abnormalities they may have missed or to act as substitute indepen-
dent second readers, clinical decision support—which may also be referred to as
computer-aided diagnosis (CADi)—is concerned with the independent evaluation
of clinical information to help clinicians reach diagnoses. The clinical information
is often provided by the radiologist, rather than being identified automatically by
the computer.
Even simple clinically significant information can improve the performance of
CADe systems. Kilday et al. included patient age with more conventional shape
and texture features [113]. The inclusion of age increased the area under the
ROC curve for the system from 0.72 to 0.82 (see Section 3.11 for background on
ROC analysis). However, care must be taken when constructing such systems
so that a priori information does not dominate other evidence (e.g. it would be
undesirable for a mammogram from a young woman with breast cancer to be
misclassified as normal on the basis that breast cancer is uncommon in that age
group).
Wu et al. trained an artificial neural network on features (e.g. presence of well-
defined mass, presence of microcalcification, subtlety of distortions), rated by
radiologists on a 10-point scale, from textbook cases. The trained system was
evaluated on clinical cases and the authors report that the system could discrim-
inate between malignant and benign cases more accurately than attending and
resident radiologists [187]. A similar approach was used by Floyd et al. [65].
D’Orsi et al. developed a reporting scheme where radiologists recorded either the
magnitude of a mammographic feature or a measure of their confidence in the
presence of the feature. Discriminant analysis was used to provide an estimate
of the likelihood of malignancy [58]. A significant problem with these approaches
is that the decision provided by the computer system is dependent upon human
input. There is likely to be both inter- and intra-user variation, so such systems
must be constructed to be robust to such error. In contrast to CADe systems,
where it should be possible to quote a guaranteed minimum level of performance,
no such guarantees can be made for CADi. In addition, it is likely that clinicians
would need to be trained in how to use such systems, and visual inspection and
manual interaction are required (i.e. a CADi system could not act as an independent
second reader).
3.11 Evaluation of computer-based methods
Computer-aided mammography systems are generally designed to produce a mea-
sure which can be used to make a binary decision about the presence (condition
A) or absence (condition B) of some characteristic. Often, there is location
information associated with the measure. Example outputs of computer-aided
mammography systems are:
• A classification of a region of interest as malignant or benign.
• An estimate of the likelihood of malignancy in a mammogram.
• A pixel-wise classification of a mammogram into microcalcification and non-
microcalcification classes.
• A pixel-wise estimate of the likelihood of the presence of a malignant mass.
• A pixel-wise segmentation of a mammogram into several tissue classes.
A simple evaluation measure that can be used when binary classifications are
made is percent correct (e.g. ‘the system correctly detected 75% of the malignant
masses’). This measure describes the proportion of true positives (TP)—correct
detections of condition A. However, the measure does not tell us the number of:
• false positives (FP)—incorrect detections of condition A;
• true negatives (TN)—correct detections of condition B;
• false negatives (FN)—incorrect detections of condition B.
Rather than being reported explicitly, these statistics are usually used to compute
sensitivity (the proportion of cases of condition A that are correctly identified)
and specificity (the proportion of cases of condition B that are correctly identified)
[140]. Formally, if nTP denotes the number of true positives, nFP denotes the
number of false positives, nTN denotes the number of true negatives and nFN
denotes the number of false negatives, then sensitivity and specificity are defined
as:
Sensitivity = nTP / (nTP + nFN)                    (3.1)

Specificity = nTN / (nTN + nFP)                    (3.2)
A perfect detection or classification algorithm would have both sensitivity and
specificity equal to unity.
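The two measures are direct translations of Equations 3.1 and 3.2; the function name and counts below are illustrative.

```python
def sensitivity_specificity(n_tp, n_fp, n_tn, n_fn):
    """Sensitivity and specificity (Equations 3.1 and 3.2) from the four
    outcome counts."""
    sensitivity = n_tp / (n_tp + n_fn)
    specificity = n_tn / (n_tn + n_fp)
    return sensitivity, specificity

# E.g. a detector that finds 75 of 100 malignant cases and correctly passes
# 90 of 100 normal cases:
sens, spec = sensitivity_specificity(n_tp=75, n_fp=10, n_tn=90, n_fn=25)
# sens = 0.75, spec = 0.9
```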
When an algorithm produces less coarse measurements about the presence or
absence of the characteristic in question (e.g. on a continuous scale), richer de-
scriptions of the performance of the algorithm can be produced. Sensitivity and
specificity can be computed at each of a number of thresholds. These can be plot-
ted on the unit plane to form a receiver operating characteristic2 (ROC) curve,
with 1 − specificity plotted on the abscissa (i.e. the “x-axis”) and sensitivity
plotted on the ordinate (i.e. the “y-axis”). The diagonal line defined by
sensitivity = 1− specificity (3.3)
(i.e. y = x) represents the performance of a random classifier. The ROC curve
describes the trade-off between sensitivity and specificity when a particular
threshold (an operating point) is selected to discriminate between the two
classes. Ideally, the ROC curve would enclose the unit plane perfectly, and the
area under the curve would be unity. The area under the ROC curve is commonly
used to summarise the ROC curve, and is usually given the symbol Az. An example
of a desirable ROC curve is shown in Figure 7.3 and an example of an undesirable
ROC curve is shown in Figure 7.4.
2 Receiver operating characteristic analysis is named after the RADAR receiver operators of the Second World War [46].
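The threshold sweep described above can be sketched directly; the variable names and the trapezoidal estimate of Az below are illustrative.

```python
import numpy as np

def roc_points(scores, labels):
    """ROC points for continuous suspiciousness scores: sweep a threshold
    over the observed scores, recording (1 - specificity, sensitivity) at
    each operating point."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Start above all scores so the curve begins at (0, 0).
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(scores))[::-1]))
    pts = []
    for t in thresholds:
        predicted = scores >= t
        sensitivity = predicted[labels].mean()
        specificity = (~predicted)[~labels].mean()
        pts.append((1.0 - specificity, sensitivity))
    return np.array(pts)

def az(points):
    """Area under the ROC curve (Az) by the trapezoidal rule."""
    x, y = points[:, 0], points[:, 1]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Eight cases: four abnormal (label 1) and four normal, with one low-scoring
# abnormal case that forces a trade-off.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
pts = roc_points(scores, labels)
area = az(pts)
```

For these scores the curve runs from (0, 0) to (1, 1) and Az equals 15/16, the proportion of abnormal/normal pairs the scores rank correctly.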
Variants of ROC analysis were developed to allow localisation information to
form part of the analysis (e.g. FROC [30], LROC [170], AFROC [40]). The
abscissa of a FROC curve shows the number of false positives per image and
the corresponding proportion of correct detections with correct localisation is
plotted on the ordinate. FROC curves are highly sensitive to the criteria used
to determine suitable localisation. ROC and FROC analysis are commonly used
in the computer-aided mammography literature and the reader is directed to
Metz for a detailed exposition on experimental design issues and performance
evaluation in computer-aided mammography [126, 127].
3.12 Image databases
Comparing published results is not meaningful unless one can be sure that the
same data and evaluation criteria were used. It is commonplace for authors to
evaluate their algorithms using data, obtained from radiologist colleagues, which
is not made available to other investigators. This is often perfectly justified,
for example when data with particular characteristics is required, or when ethical
approval or confidentiality agreements prohibit the dissemination of patient data.
In the majority of cases, however, the use of publicly-available data should be
preferred, to more easily allow results to be compared and experiments to be
replicated. Efforts were made to establish common datasets in the 1990s, when
several research groups compiled databases and made them available to other
investigators. Data was originally distributed via physical media (e.g. CD-ROM,
magnetic tape) or the Internet, but as persistent storage capacity has increased
and Internet connectivity is approaching ubiquity, the Internet has become the
dominant means of distributing data to investigators.
The UK Mammographic Image Analysis Society’s (MIAS) database [171, 128]
contains 161 pairs of MLO views. The database contains examples of normal
mammograms and common types of abnormality. The images were digitised at
50 µm per pixel with 8 bits per pixel. The images were obtained from a single
UK screening centre, and the database includes all breast types (e.g. fatty, fatty-
glandular, dense). Groundtruth was annotated by a radiologist and consists of
location coordinates and radii which specify regions containing abnormalities.
The authors say that the images were ‘carefully selected … to be of the highest
quality of exposure and patient positioning’; most papers publicising digital
mammogram databases make similar claims. A reduced-resolution version of the
database—called the mini-MIAS database—is also available [130].
The University of South Florida’s Digital Database for Screening Mammography
(DDSM) [83] contains 2 620 cases with 4 films per case, taken from screening
examinations. The images were obtained from a number of sites (the University
of South Florida, Massachusetts General Hospital, Sandia National Laboratories
and Washington University School of Medicine). The images were compressed
using a lossless variant of the JPEG image format and software is provided to
decode data in this format. In addition to the image data, the database con-
tains patient age, examination and digitisation dates and American College of
Radiology (ACR) and Breast Imaging Reporting and Data System (BI-RADS)
annotations. The database is available via the Internet [52].
The Internet is the most suitable medium for advertising and distributing image
databases because it allows data to be accessed on demand and at low cost by
anyone in the world with a suitable Internet connection. A comparison of the
databases publicised in the literature with those advertised or made available via
the Internet reveals that several databases have not been adequately maintained.
These include: the Lawrence Livermore National Laboratory/University of Cal-
ifornia, San Francisco (LLNL/UCSF) database [123], the PRISM/PROMAM
database, the University of Chicago/University of North Carolina (Chapel Hill)
database [135] and the University of Washington database (although this appears
to have been included in the DDSM).
The UK Diagnostic Mammography National Database (eDiaMoND) project [23],
was a research collaboration between academia, clinicians and industry that
aimed to investigate the use of “grid” technologies to improve the efficiency of
the NHS Breast Screening Programme by enabling access to image data through
digitisation and to aid training, epidemiology and computer aided detection ef-
forts. The project aimed to make data available to its users in both traditional
and Standard Mammogram Form (SMF) formats [88]. However, it appears that
blanket ethical approval to allow researchers to use the data for arbitrary re-
search has not been obtained and so there is currently no open access to the
data. However, ethical approval may be given to specific projects. The European
MammoGrid project has similar aims [3] to the eDiaMoND project.
Although the DDSM is recognised as being the premier database for computer-
aided mammography, the image data is compressed using a lossless variant of the
JPEG format, which is not widely supported. A second problem is the relatively
poor annotation. Mass regions are outlined, but only the areas containing micro-
calcifications and spicules are given—individual microcalcifications and spicules
are not annotated.
An “ideal” database of digital mammograms for computer-aided mammography
research would have some or all of the following characteristics:
• Ethical approval of and patient consent for all possible useful research in
which the database could be used.
• Safeguards to ensure patient confidentiality and anonymity.
• Grouping by patient, with current and prior cases, with four views per case.
• Enough cases that statistically significant results could be obtained.
• Patients should be sampled from several clinical centres.
• Mammograms should not be excluded on the basis of substandard image ac-
quisition (unless, perhaps, a radiologist would discard the mammogram and
ask for the patient to be recalled for better mammograms to be obtained).
• Representation of all classes of mammograms:
– Normal and abnormal cases.
– Inclusion of all clinically significant abnormalities (microcalcifications,
masses, spiculated lesions, architectural distortions and asymmetries).
– All types of breast (e.g. fatty, dense).
– Data should be collected from both asymptomatic and symptomatic
women.
• Pixel-level annotation by several radiologists so that groundtruth likelihood
of abnormality could be estimated.
• Inclusion of clinical information relevant to breast cancer risk (e.g. patient
age, family history of breast cancer, socioeconomic status).
• Image acquisition and digitisation parameters.
• Identified subsets of abnormality (e.g. mass subset, microcalcification sub-
set), so that component algorithms could be tested separately.
• Specifications and implementations of a set of common evaluation strategies,
so that results in published work can be compared directly.
The “ideal” database described above would require significant resources to build
and maintain; however, the lack of a database with these—or similar—characteristics
(and the lack of standardised evaluation protocols) is an impediment to the field.
With the recent advent of web and “grid” services, it should be possible to provide
not only mammographic image data via the Internet, but to facilitate standard-
ised evaluation of computer-aided mammography algorithms. Digitised mammo-
grams could be requested from a data provider, locally analysed, and algorithm
output submitted to an evaluation service provider which would return the evalu-
ation results (e.g. ROC curve data). These results would be directly comparable
to others generated by the same service provider.
3.13 Commercial systems
There have been several attempts to develop and market commercial computer-
aided mammography systems. The reader is directed to [72] for a detailed history
of commercialisation efforts. It is common for medical devices to be marketed
in the USA first, and doing so requires pre-market approval (PMA) from the
US Food and Drug Administration (FDA). For devices such as CADe systems,
PMA requires that the device does not significantly increase the callback rate
(especially for biopsies) and is capable of correctly identifying areas associated
with cancer. PMA is not concerned with value for money or the impact a device
has on work-flow. Instead, PMA is a certificate of safety, rather than of clinical
effectiveness or efficiency. FDA PMA is judged in terms of the mammography
landscape in the USA, which differs from that of the UK (e.g. in the USA the
age range of women undergoing screening is wider, the screening population is
self-referred and the screening interval is one year [72]). Claims made about
a CADe system with respect to FDA PMA do not automatically apply to the
UK. Nevertheless, we will restrict the discussion of commercial systems to those
which have obtained FDA PMA (although VuCOMP expects FDA approval for
its M-Vu system in 2005 [181]).
There are currently four commercial CADe systems for mammography that have
obtained FDA PMA: the ImageChecker by R2 Technology Incorporated [150],
Second Look by iCAD Incorporated [99], the KODAK Mammography CAD Sys-
tem by the Eastman Kodak Company [114] and the Senographe 2000D system
by the General Electric Company [69]. General Electric license the ImageChecker
software for their Senographe system. The KODAK Mammography CAD Sys-
tem has only recently been given FDA PMA (late 2004) and no evaluations of
the technology have been published in the literature. We will therefore restrict
our discussion to the ImageChecker and Second Look systems.
The ImageChecker system obtained PMA in 1998 and the FDA has since granted
PMA for several improvements to the system. The system uses algorithms devel-
oped by Nico Karssemeijer and collaborators and displays mass and microcalci-
fication prompts on a computer monitor. There has been extensive evaluation of
the system in the USA and Europe.
R2 Technology Incorporated claim that version 8.0 of their ImageChecker algo-
rithm achieves ‘1.5 false positive marks per normal case at the 91 percent sensitiv-
ity level’ [149]. The system costs approximately £108 000 and an annual service
contract costs approximately £10 000 (ca. 2001, [72]).
The reader is referred to [72] for a discussion of evaluations performed on the
ImageChecker system and to Section 3.14 for a discussion of evaluations of the
ImageChecker system for prompting. Astley et al. compared the ImageChecker
system to non-medical readers [9] for pre-screening. 900 cases containing four
films per case (10% containing cancers) were read by 6 trained but non-medical
readers and the ImageChecker system. The ImageChecker failed to mark 3 of the
cancers, while the non-medical readers failed to mark between 4 and 21 cancers.
The best non-medical readers had false positive rates of 33% and 44% while the
ImageChecker system had a false positive rate of around 69%. It took the non-
medical readers an average of 40 s to read a case, while the ImageChecker system
took an average of 318 s.
The Second Look system obtained PMA in 2002 and the FDA has since granted
PMA for several improvements to the system. The system uses algorithms devel-
oped by Steven Rogers, a retired airborne weapons specialist, and displays mass
and microcalcification cluster prompts on a paper printout [72]. The Second Look
700 system costs $139 950 [98] (approximately £72 870).
Astley et al. evaluated the Second Look system in a UK screening environment
[8]. They report that the false positive rate was 1.43 per image on normal mam-
mograms and 1.22 per image when averaged over both normal and abnormal
mammograms. The system could correctly identify 73.8% of abnormalities (rising
to 83.3% when both MLO and CC views were available). The authors simulated
clinical use of the system. 790 cases were read by 3 radiologists and 1 radio-
grapher, with and without prompting. No significant differences in recall rate
or reading time were found (the radiographer was faster with prompting, while
the radiologists took longer). The authors note that technical problems with
the system (e.g. reduced throughput due to problems with stick-on film labels
and failures caused by static electricity in the reading room) would require the
employment of an additional administrator and would delay reading by a day.
An evaluation of the Second Look system’s ability to detect early cancers was
performed. Current and prior films were studied from a normal control group
and from a group for whom cancer was identified in the current films. The radi-
ologists were asked to identify cancers in the prior films without and then with
the current films. The radiologists identified 10% and then 14.4% of the cancers.
The Second Look system identified 27.8% of the cancers in the prior films. This
suggests that, on early cancers, the system can perform better than radiologists.
3.14 Prompting
The prompting model for computer-aided mammography is predicated on the
assumption that prompts will help radiologists. Research into prompting seeks
to determine if, and under what circumstances, this assumption is valid. There
are essentially two types of prompting research:
• The psychophysical aspects of prompting. Participants generally perform
image interpretation tasks in synthetic environments.
• Evaluation of radiologist performance when CADe systems are used.
Hutt et al. investigated the effect of erroneous prompts on radiologist perfor-
mance. Seven radiologists viewed 48 digitised mammographic regions of interest
with and without microcalcification clusters. Prompts were placed on the images
and the error rate was varied. The authors report that prompting was only ef-
fective when the false positive rate was low (approximately 0.5 false prompts per
image) [96]. A screening environment was simulated and 6 radiologists viewed
100 films containing normal and abnormal mammograms with single or mul-
tiple abnormalities. The mammograms were read with and without prompts,
with the false prompt rate set at approximately 1.1 per image. The radiologists
performed better in the prompted condition. In prompted cases where the radi-
ologists missed abnormalities, the films had no prompt on the real abnormality
and a false prompt elsewhere. This work suggests that, not only is the false
positive rate important, but incorrect prompts can distract radiologists from real
abnormalities.
Hutt’s PhD thesis presents a larger version of the experiment reported by Hutt
et al. [96]. Prompted and unprompted mammograms were read by 30 radiolo-
gists from 11 UK screening centres. The results suggest that prompting can be
expected to be successful if the number of false positives does not exceed the
number of true positives by more than 50%. Hutt suggests that, given the over-
representation of abnormal mammograms in the test set, this relationship should
be revised downwards to a true to false positive ratio of approximately unity.
This ratio was confirmed by Astley et al. in a psychophysical experiment that
used simulated abnormalities and non-medical readers [7]. Given that only 5% of
screening mammograms have any form of abnormality, a prompting system that
generates true and false positives with equal probability will on average generate
a false positive no more than once in 20 cases. If we assume 4 images per case,
then this equates to 1 false positive in 80 images (or, 0.0125 false positives per
image). By comparison, R2 Technology Incorporated claim that version 8.0 of
their ImageChecker algorithm achieves ‘1.5 false positive marks per normal case
at the 91 percent sensitivity level’ [149]. However, it should be noted that the
psychophysical experiments reported by Hutt et al. were not conducted in clinical
settings and relatively few radiologists and images were used, limiting the validity
of generalising their results.
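The arithmetic behind the 0.0125 figure above can be reproduced directly; the function name and parameters are illustrative.

```python
def false_prompts_per_image(prevalence, fp_per_tp, images_per_case):
    """Expected false prompts per image for a prompting system whose false
    positives are held in proportion to its true positives. With at most one
    true prompt per abnormal case, 5% prevalence and a true-to-false-positive
    ratio of one give at most one false prompt per 20 cases."""
    fp_per_case = prevalence * fp_per_tp
    return fp_per_case / images_per_case

rate = false_prompts_per_image(prevalence=0.05, fp_per_tp=1.0, images_per_case=4)
# rate = 0.0125 false prompts per image
```

This is two orders of magnitude stricter than the 1.5 false positive marks per normal case claimed for the ImageChecker algorithm, which illustrates how demanding the unity-ratio criterion is in a screening population.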
Giger et al. evaluated the usefulness of an “intelligent search” mammography
workstation [70]. Upon presentation of an unknown case, the workstation output
an estimate of malignancy (based upon an automatic segmentation algorithm and
an artificial neural network using geometric and texture features), images—from
an atlas—of lesions that were deemed to be similar and graphics illustrating the
characteristics of the presented lesion relative to those in the atlas. The user
could search for similar lesions using various criteria. Users could interactively
alter the image contrast and magnify the mammograms. A set of 50 normal and
50 mass images were viewed by 5 radiologists with and without the workstation.
The authors report an improvement when the workstation was used (Az of 0.90
with the workstation compared to 0.86 without). This work suggests that allowing
radiologists to manipulate the digital images and compare them to other cases can
improve radiologist performance, but the paper does not analyse which aspects
of the system were most effective.
Karssemeijer et al. investigated single and double reading by radiologists and sin-
gle reading with prompting [109]. A set of 10 expert radiologists read 500 cases,
where half contained cancers, and estimated the likelihood of malignancy. The
images were also analysed using the ImageChecker system and the suspiciousness
rating of each prompt was recorded. Double reading was simulated by combin-
ing annotations from each possible pairing of the 10 radiologists using a prompt
proximity rule. Reading with CADe was simulated using a similar approach. The
performance of the three types of reading was assessed using the mean sensitiv-
ity in the region of the ROC curve representing false positive rates lower than
10%. This figure was chosen as the false positive rate of screening in the USA is
approximately 8% and is between 1% and 4% in Europe. For single reading, the
mean sensitivity was 39.4%. For simulated double reading, the mean sensitivity
was 49.9%. For simulated reading with CADe the mean sensitivity was 46.4%.
Gur et al. prospectively assessed the impact of CADe on patient recall and cancer
detection rates in a clinical setting [76]. A set of 115 571 mammograms was
divided into two almost equal sets which were read by 24 radiologists with and
without prompts generated by the ImageChecker system. No significant increase
in recall or detection rates was found when CADe was used. However, the
confidence intervals associated with recall and detection rates were large enough
to be consistent with the possibility of large improvements when CADe is used,
due to the relatively low number of cancers detected with and without CADe and
the large inter-reader variability among the radiologists. Additionally, during the
period of the study, the percentage of women who were screened for the first
time decreased from 40% to 30%. On average, first screening rounds have higher
recall rates than subsequent rounds and so cancers detected in first rounds may
be considered “easier”. However, the authors found there to be no statistically
significant trend in detection rates over time. The authors conclude that, if their
results were not due to chance, current CADe systems are not suitable for use by
expert screening mammography radiologists.
Freer and Ulissey conducted a large prospective study of the effect of CADe on
recall rate, positive predictive value for biopsy, cancer detection rate and the stage
of detected cancers [67]. 12 860 screening mammograms were interpreted first
without the assistance of CADe, and then immediately after with the assistance
of the ImageChecker CADe system. The authors report that use of the CADe
system for prompting resulted in an increase in recall rate (from 6.5% to 7.7%),
no change in positive predictive value for biopsy, an increase of 19.5% in the
number of cancers detected and an increase in the number of early stage cancers
detected (from 73% to 78%). However, the authors caution that the relatively
low median age of the screening population (49 years) imposes limitations on the
statistical significance of the above observations.
Warren Burhenne et al. retrospectively studied the ability of the ImageChecker
system to identify cancers missed by radiologists [32]. 1 083 mammograms that
led to biopsy-proved cancers and their available prior mammograms were collected
from 13 centres. The CADe system was able to identify 77% of the cancers that
were originally missed by radiologists, without a statistically significant increase
in recall rate. The research suggests that the ImageChecker system could have a
dramatic effect on the early detection of breast cancer.
3.15 Discussion
One of the earliest papers on computer-aided mammography was written by
Ackerman and Gose [1] in 1972. The authors aimed to classify low-resolution
digitised photographs of regions of mammograms as malignant or benign using
automatically-extracted features (measures of calcification, spiculation, rough-
ness and the area-to-perimeter ratio). Classification was attempted using a mul-
tivariate Gaussian model and nearest neighbour classification, the latter of which
was found to perform best. While computers, image digitisation technology and
machine learning algorithms have developed significantly since the paper was
published, the approach to computer-aided mammography has not.
This approach can be stated as follows: ad hoc features are extracted from seg-
mented regions and classified into clinically significant classes. The classification
stage is informed by the wider pattern recognition, machine learning and statis-
tical decision theory communities. The segmentation step typically uses ad hoc
algorithms. Often, an attempt to insert human expertise into the system is made
by choosing features that describe characteristics that radiologists report to be
important. Whilst the above approach is usually reported to be successful, ad
hoc methods risk the accidental adoption of assumptions about the data. The
result of this problem may be that CADe methods perform well on the original
investigator’s data, but do not work as well on other data.
Many methods that use classifiers produce “probability” images, which are later
thresholded to obtain a final classification. These images are typically not true
probability or likelihood images and are simply images with probability-like val-
ues. This distinction is important, because accurate quantitative descriptions
may be more useful to clinicians than qualitative (classification) descriptions,
and could be used in further statistical analyses. Measurements that describe
the state or change in anatomy may also be clinically useful. For example, it is
probably more meaningful to report that a tumour in a mammogram has likely
increased in volume by 20% since the last screening session, than to simply say
that an area is suspicious.
Evaluation criteria are often optimistic. In LROC analysis, for example, the
selection of forgiving localisation criteria can give an inaccurate assessment of the
performance of algorithms. This could be rectified by the adoption of standard
databases and assessment criteria. This would have the additional benefit of
allowing meaningful comparison of results in the literature.
The research of Hutt et al. suggested that CADe would only result in a significant
improvement in radiologist performance if the number of false positives could be
reduced to 0.0125 per image. Research has indicated that radiologist performance
can be improved by CADe algorithms that have false prompt rates substantially
higher than that target [67, 32]. However, it has been shown that current commer-
cial CADe systems can fail to improve radiologist performance [76], so lowering
the false positive rate to a level at which significant improvement can be expected
is highly desirable. Reducing the false positive rate while maintaining sensitivity
will be a significant challenge. The hypothesis promoted in this thesis is that
this kind of improvement can only be achieved by systems that understand the
appearance of mammograms.
Abnormality in mammograms manifests itself in a number of ways, but most
CADe methods target only one of these classes of abnormality; microcalcification
clusters, masses and spiculated lesions are most commonly chosen. A better ap-
proach would be to develop a single method that can identify all (or many) of
the common types of abnormality. It is not immediately clear how this might
be achieved, because the appearances of the various forms of abnormality are so
different. However, there is commonality between all types of abnormal mammo-
graphic appearance: none of them are found in normal mammograms. A method
that could detect deviation from normality should be able to identify all forms of
abnormality. This approach is called novelty detection.
Novelty detection uses a model of the class of interest that allows novel instances
to be identified. Statistical models serve this purpose well, because deviation
from normality can be measured in a meaningful way within a rigorous mathe-
matical framework. Further, generative statistical models—such as active shape
and appearance models [47]—allow synthetic instances of the class of interest
to be generated. This allows the specificity and generality of the model to be
assessed. A specific model is one that models only legal instances of the class
of interest and a general model is one that models all possible instances of the
class of interest. A good model would be both specific and general. The aim of
the work presented in this thesis is to investigate generative statistical models of
mammographic appearance. The ultimate aim is to perform CADe by novelty
detection.
Novelty detection has previously been applied to computer-aided mammography.
Tarassenko et al. identified masses using a novelty detection method. Geometrical
and textural features were extracted from pre-processed mammograms. A Parzen
window density estimator (see Section 5.3) was used to model the distribution of
feature vectors extracted from normal tissue. The method identified all masses
in a test set of 40 images at a false positive rate of 1 per image [174]. Holmes
used an adaptive kernel density estimator to learn the distribution of transformed
scale-orientation pixel signatures taken from normal tissue (see Chapter 4 for a
detailed discussion of pixel signatures). The transformation to a low-dimensional
space allowed Euclidean distance to approximate a sophisticated robust metric.
Holmes performed novelty detection by computing the likelihood of signatures
under the model to produce likelihood images. Subjectively, the likelihood values
appeared to allow pixels belonging to normal tissue to be discriminated from those
belonging to spiculated lesions, though no quantitative evaluation of the method
was performed [90]. However, neither of these methods employed generative
models.
3.16 Summary
This chapter presented a review of the computer-aided mammography literature.
In summary:
• The typical approach taken to CADe is to classify shape and texture fea-
tures, extracted from candidate locations, into clinically significant classes.
It can be difficult to justify exactly why one set of features is better than
another and to explain what they correspond to in terms of the clinical
situation. Features are typically tuned to a specific sign of abnormality, so
each indicative sign requires a different algorithm.
• The lack of standardised evaluation methods, training and test sets makes
it very difficult to compare published results.
• Commercial systems are available and have been shown to improve radi-
ologist performance; however, they can also fail to improve performance.
Psychophysical research has suggested that a false positive rate much lower
than that achieved by current commercial systems is required for signif-
icant improvement in radiologist performance. Much more sophisticated
approaches may be required to achieve such targets.
• One such approach may be novelty detection, where all forms of abnormality
should be able to be detected and quantified within a rigorous mathematical
framework. Novelty detection requires a model of the appearance of normal
mammograms that allows deviation from normality to be measured.
Chapter 4
Scale-orientation pixel signatures
4.1 Introduction
This chapter presents some work on improving an existing method for describing
local image structure in terms of scale and orientation. The chapter presents:
• Background information on mathematical morphology and its use in com-
puting scale-orientation pixel signatures.
• An analysis which identifies two flaws in an existing implementation and
proposes how these problems can be rectified.
• An information theoretic method for comparing the old and new pixel sig-
natures.
• A classification experiment to compare the two approaches.
4.2 Mathematical morphology
Image and signal processing has commonly been thought about in terms of fre-
quency (e.g. Fourier analysis; wavelet analysis uses positional information in ad-
dition to frequency information [49, 122]). Mathematical morphology approaches
image and signal processing in terms of shape¹. One of the attractions of morphological processing is that image features can be targeted for processing without
altering the rest of the image (e.g. small features can be removed from images,
leaving edges and grey-levels untouched). We will present two fundamental mor-
phological operators and show how they can be combined to perform two other
classes of morphological operation.
Morphological operators can be defined for simple 1-D signals, 2-D images or
more complex signals. We will restrict our discussion to the 2-D image plane.
The operators we shall discuss are binary operators, meaning that they take two
input objects and return a single output. One of these input objects is the image
matrix to be processed and the other is an object called a structuring element,
which allows the operations to be tuned to specific sizes and shapes of feature. A
structuring element is simply a shape and can be represented by a set of vectors
that specify offsets from some origin². The structuring element can be visualised
by plotting each offset in the set in an image plane. A simple structuring element is shown in Figure 4.1(b) and corresponds to the following point set³:

S = {(0, 0), (0, 1)}.    (4.1)

¹A thorough presentation is given by Serra and Matheron [161, 162, 124], though the reader is directed to Sonka et al. [168] for an introduction to mathematical morphology as it relates to the work here.
²Although the grey-scale definitions of the following operators can use structuring elements that have associated grey-levels, this is not of interest in this work.
4.2.1 Dilation and erosion
Dilation and erosion are the fundamental morphological operators. Let f(x)
be a function that describes a grey-level image. Further, let t_i be an offset and S = {t_i : i = 1, …, N} be a structuring element as described above. The dilation of f(x) by S is given by:

f(x) ⊕ S = max_{t ∈ S} { f(x − t) }.    (4.2)
An example is given in Figure 4.1. Figure 4.1(a) shows a binary image of the
letter E, where the background has a value of zero and the foreground has a value
of one. Figure 4.1(b) shows the structuring element defined by Equation 4.1.
Figure 4.1(c) shows the dilation of the image of the letter E by the structuring
element. The figure illustrates how dilation removes intensity troughs that are
smaller than the structuring element. Dilation increases the object size and can
be used to fill gaps.
The dual of dilation is erosion, which is defined as:
f(x) ⊖ S = min_{t ∈ S} { f(x − t) }.    (4.3)

³Note that we use (r, c)—row, column—indexing, as opposed to (x, y) indexing.
(a) (b) (c)
Figure 4.1: Dilation.
A binary image matrix is shown in (a). It is dilated by the structuring element shown in (b). The result is shown in (c).
Erosion removes intensity peaks that are smaller than the structuring element.
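As an illustrative sketch, Equations 4.2 and 4.3 can be realised directly for a flat structuring element. This is not the implementation used in this work; the function names and the border policy (offsets that fall outside the image are simply skipped) are our own choices.

```python
import numpy as np

def dilate(f, S):
    """Grey-level dilation: the max of f(x - t) over offsets t in S (Eq. 4.2)."""
    rows, cols = f.shape
    out = np.zeros_like(f)
    for r in range(rows):
        for c in range(cols):
            out[r, c] = max(f[r - dr, c - dc] for dr, dc in S
                            if 0 <= r - dr < rows and 0 <= c - dc < cols)
    return out

def erode(f, S):
    """Grey-level erosion: the min of f(x - t) over offsets t in S (Eq. 4.3)."""
    rows, cols = f.shape
    out = np.zeros_like(f)
    for r in range(rows):
        for c in range(cols):
            out[r, c] = min(f[r - dr, c - dc] for dr, dc in S
                            if 0 <= r - dr < rows and 0 <= c - dc < cols)
    return out

# The structuring element of Equation 4.1, as (row, column) offsets.
S = [(0, 0), (0, 1)]
f = np.array([[0, 1, 0, 1, 0]])

# Dilation fills the one-pixel trough between the peaks (and grows the
# object); erosion removes the one-pixel peaks entirely.
assert dilate(f, S).tolist() == [[0, 1, 1, 1, 1]]
assert erode(f, S).tolist() == [[0, 0, 0, 0, 0]]
```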
4.2.2 Opening and closing
Dilation and erosion can be used to remove image features, but they change the
global appearance of the image (the object in Figure 4.1 is made larger by dilation
and would be made smaller by erosion). Dilation and erosion can be combined
so that targeted features are removed without changing the global appearance of
the image. These combinations are called the opening and closing operators, and
are respectively defined as:
f ◦ S = (f ⊖ S) ⊕ S,    (4.4)

f • S = (f ⊕ S) ⊖ S,    (4.5)
where we drop the image indexing for simplicity. Opening and closing respectively
remove intensity peaks and troughs that are smaller than the structuring element,
without altering the global image appearance. They are idempotent operators,
which means that successive applications of the same operation do not alter the
previous result.
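As a sketch (again ours, not the thesis implementation), Equations 4.4 and 4.5 can be composed from erosion and dilation, and idempotence can be checked on a 1-D profile containing one peak and one trough:

```python
import numpy as np

def _morph(f, S, pick):
    """Shared kernel: pick=min gives erosion, pick=max gives dilation."""
    rows, cols = f.shape
    out = np.empty_like(f)
    for r in range(rows):
        for c in range(cols):
            out[r, c] = pick(f[r - dr, c - dc] for dr, dc in S
                             if 0 <= r - dr < rows and 0 <= c - dc < cols)
    return out

def opening(f, S):
    return _morph(_morph(f, S, min), S, max)   # (f eroded by S) dilated by S

def closing(f, S):
    return _morph(_morph(f, S, max), S, min)   # (f dilated by S) eroded by S

S = [(0, -1), (0, 0), (0, 1)]                  # 3-pixel horizontal line
f = np.array([[2, 2, 9, 2, 2, 2, 0, 2, 2]])    # one peak and one trough

opened = opening(f, S)
closed = closing(f, S)
assert opened.tolist() == [[2, 2, 2, 2, 2, 2, 0, 2, 2]]  # peak removed
assert closed.tolist() == [[2, 2, 9, 2, 2, 2, 2, 2, 2]]  # trough removed

# Idempotence: applying the same operation again changes nothing.
assert np.array_equal(opening(opened, S), opened)
assert np.array_equal(closing(closed, S), closed)
```

Note that the opening removes only the bright peak and the closing only the dark trough, consistent with the polarity behaviour described above.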
4.2.3 M- and N-filters
Opening and closing allow intensity peaks and troughs to be removed without
altering the global image appearance, but are tuned to the polarity of the features
on which they operate. Combining an opening and a closing is called sieving
[11]. Sieves remove image features that are smaller than the structuring element,
irrespective of the feature’s polarity. Two sieves—called M- and N-filters—are
respectively defined as:
M(f, S) = (f ◦ S) • S,    (4.6)

N(f, S) = (f • S) ◦ S.    (4.7)
An example of grey-level sieving is shown in Figure 4.2. A mammographic region
of interest is sieved using a rectangular structuring element oriented at approximately 45°. The figure shows how image structure that is smaller than the
oriented structuring element is removed.
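The polarity independence of sieving can be illustrated with a small self-contained sketch of Equations 4.6 and 4.7 (our own code; the helper names are illustrative):

```python
import numpy as np

def _morph(f, S, pick):
    """Shared kernel: pick=min gives erosion, pick=max gives dilation."""
    rows, cols = f.shape
    out = np.empty_like(f)
    for r in range(rows):
        for c in range(cols):
            out[r, c] = pick(f[r - dr, c - dc] for dr, dc in S
                             if 0 <= r - dr < rows and 0 <= c - dc < cols)
    return out

def opening(f, S):
    return _morph(_morph(f, S, min), S, max)

def closing(f, S):
    return _morph(_morph(f, S, max), S, min)

def m_filter(f, S):
    return closing(opening(f, S), S)   # Equation 4.6: open, then close

def n_filter(f, S):
    return opening(closing(f, S), S)   # Equation 4.7: close, then open

S = [(0, -1), (0, 0), (0, 1)]
f = np.array([[2, 2, 9, 2, 2, 2, 0, 2, 2]])

# Both sieves flatten the profile: the peak (bright feature) and the
# trough (dark feature) are smaller than S, so both are removed.
assert m_filter(f, S).tolist() == [[2] * 9]
assert n_filter(f, S).tolist() == [[2] * 9]
```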
(a) (b)
Figure 4.2: A sieved mammographic image.
Image (a) is a mammographic region of interest around a spiculated lesion. Image (b) shows the result of sieving image (a) with a rectangular structuring element, oriented at approximately 45°. The structuring element is shown in red in the top-right corner of (b).
4.3 Pixel signatures
4.3.1 Local scale-orientation descriptors
Pixel signatures are rich feature descriptors of local image structure that are ex-
pressed in terms of scale and orientation. Describing mammographic features in
terms of scale and orientation is useful for a number of reasons. Mammograms
contain features that have an associated orientation (e.g. curvilinear structures)
and which do not have a particular orientation (e.g. circumscribed masses); these
features may exist over a range of scales. Radiologists often talk about mam-
mographic features in terms of scale and orientation (e.g. features that ‘point’
towards the nipple or ‘radiate’ from a particular location). Further, it is known
that the mammalian primary visual cortex explicitly encodes visual information
in terms of scale and orientation (see [183] for a discussion of the work of Hubel
[93] and Wiesel [184]).
The pixel signatures discussed in this thesis are developed from those described by
Holmes [90], which used M-filters, and these were in turn developed from those
described by Zwiggelaar et al. [192], which used directional recursive median
filters⁴.
4.3.2 Constructing pixel signatures
For a given input image, a scale-orientation pixel signature is computed at each
pixel location as follows. A set of sieved images is generated from the input image by sieving it with structuring elements at a number of scales and orientations.
The pixel signatures used by Holmes et al. [91, 92, 90] were computed using a
Bresenham line structuring element [26]. Each signature is a 2-D array where the
rows are measurements for the same scale and the columns are measurements for
the same orientation (see Figure 4.3).
Formally, let f(x) be a grey-scale image. f(x) is sieved using a set of structuring
elements {S_{σ,φ}}, where σ indexes scale and φ indexes orientation. The result is a set of grey-scale images {s_{σ,φ}(x)}. The value at (σ, φ) in the pixel signature associated with location x in f(x) is given by

ρ(x, σ, φ) = s_{σ−1,φ}(x) − s_{σ,φ}(x).    (4.8)

⁴The principal advantage of using morphological operators is that there is an efficient way to perform erosion and dilation [167]; however, today's desktop computers can construct pixel signatures reasonably quickly using a naïve implementation.

Figure 4.3: Example pixel signatures.
Pixel signatures taken from the centres of Gaussian blob and line images.
Stated simply, for a particular image pixel and a given scale and orientation, the
signature value is the grey-level difference between the pixel value in the sieved
images at the previous and current scales.
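Given a stack of sieved images, Equation 4.8 amounts to differencing successive scales at each orientation. The helper below sketches only this step; the array layout and the convention that index 0 of the scale axis holds the unsieved image are our assumptions, not details given in the text:

```python
import numpy as np

def signature_at(sieved, r, c):
    """Scale-orientation pixel signature at pixel (r, c).

    `sieved` has shape (n_scales + 1, n_orientations, rows, cols);
    sieved[0] is taken to be the unsieved input image.  The element at
    (sigma, phi) is s_{sigma-1,phi}(x) - s_{sigma,phi}(x): the grey
    level removed between successive scales (Equation 4.8).
    """
    return sieved[:-1, :, r, c] - sieved[1:, :, r, c]

# Toy example: one pixel, two orientations, two scales.  Sieving removes
# 3 then 5 grey levels at orientation 0, and 1 then 0 at orientation 1.
stack = np.array([[[[10]], [[10]]],
                  [[[7]],  [[9]]],
                  [[[2]],  [[9]]]])
sig = signature_at(stack, 0, 0)
assert sig.tolist() == [[3, 1], [5, 0]]

# The signature columns sum to the total grey level removed by sieving.
assert np.array_equal(sig.sum(axis=0), stack[0, :, 0, 0] - stack[-1, :, 0, 0])
```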
Figure 4.3 shows two pixel signatures taken from the centres of two synthetic
images. One image is a Gaussian blob and the other is a Gaussian line. The
signature for the Gaussian blob shows approximately uniform scale which is in-
dependent of orientation5. The signature for the Gaussian line shows that as one
looks across the line it appears to have a limited scale, but when one looks along
the line it appears to be much larger. Pixel signatures from non-trivial images are
not as simple to interpret and are intended to be used as feature vectors within
a machine learning framework such as a classifier.
⁵We will see later that a limitation in the implementation results in the non-ideal behaviour in the signature for the Gaussian blob in Figure 4.3.

In the previously reported work [91, 92] pixel signatures were generated for 12 regularly-spaced orientations and 10 scales—ranging from 150 µm to 2 cm. These
scales encompass image features that we would like to measure, from microcalci-
fications to small masses. The scales increase logarithmically to give preferential
sampling resolution to small features. We use the same scheme in the research
presented in this chapter.
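The text fixes the range (150 µm to 2 cm), the number of scales (10) and the logarithmic spacing, but not the exact progression. A geometric progression over that range is one plausible realisation; the 50 µm pixel pitch below is an assumed value used purely for illustration:

```python
import numpy as np

# Ten logarithmically spaced scales from 150 µm to 2 cm (in mm), giving
# preferential sampling resolution to small features.
scales_mm = np.geomspace(0.15, 20.0, num=10)

# Adjacent scales are in a constant ratio (the definition of logarithmic
# spacing): here each scale is about 1.72 times the previous one.
ratios = scales_mm[1:] / scales_mm[:-1]
assert np.allclose(ratios, ratios[0])

# At an assumed sampling resolution of 50 µm per pixel, the structuring
# element lengths in pixels would range from 3 to 400.
lengths_px = np.rint(scales_mm / 0.05).astype(int)
assert lengths_px[0] == 3 and lengths_px[-1] == 400
```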
4.3.3 Metric properties
Although pixel signatures give a rich local description of image structure, the Eu-
clidean distance between two pixel signatures treated as points in a vector space
is an imperfect similarity measure. This is because responses to two similar image
structures may appear in slightly different locations in the corresponding signa-
tures. The work presented by Holmes et al. describes a sophisticated approach
to dealing with this problem by treating signature similarity as a transportation
problem, where similarity is measured by the cost of transforming one signature
into another [91, 92, 90]. Further, an efficient way of computing this measure is
described, where signatures are transformed into a space where Euclidean dis-
tance approximates the transportation cost. This chapter deals with improving
the raw signatures, and so the metric properties of pixel signatures will not be
discussed further.
4.4 Analysis of the current implementation
In this section we analyse the existing implementation of pixel signatures and
propose two improvements. The first addresses the length of the structuring
element and the second addresses the coverage of the structuring element.
4.4.1 Structuring element length
Figure 4.3 shows a problem with the implementation of pixel signatures used in
[91, 92, 90]: even though the Gaussian blob is circular (up to the image quan-
tisation), the pixel signature for the central pixel shows the scale to vary with
orientation. This is caused by incorrect computation of the length of the struc-
turing element, which should be invariant to orientation. If, for a particular scale,
one were to plot the position of the ends of the structuring element as it is rotated
about a pixel, it should trace a circle. Instead, the structuring element traces a
square, with the structuring element being longer at the diagonal orientations
than at the horizontal and vertical orientations. This is illustrated in Figure 4.4:
as the structuring element moves from position A to B the structuring element
“grows” in length (although all three structuring elements have the same number
of pixels). This problem is corrected in our implementation, as Figure 4.7(b)
shows.
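The effect can be modelled numerically. In the simplified sketch below (our own model of the behaviour described above, not the original code), the naive element takes r unit pixel steps along the quantised direction, so its Euclidean length grows by a factor of √2 at the diagonal orientations, whereas placing the endpoint on a circle of radius r restores orientation invariance:

```python
import math

def length_naive(r, theta):
    """Length of an element built from r unit pixel steps along the
    quantised direction: one pixel per step in row and/or column."""
    step = math.hypot(round(math.sin(theta)), round(math.cos(theta)))
    return r * step

def length_corrected(r, theta):
    """Length when the endpoint is placed on a circle of radius r."""
    return math.hypot(r * math.sin(theta), r * math.cos(theta))

assert length_naive(10, 0.0) == 10.0
assert length_naive(10, math.pi / 4) == 10.0 * math.sqrt(2)  # "grows"
assert math.isclose(length_corrected(10, math.pi / 4), 10.0)
```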
4.4.2 Local coverage
The structuring elements in the existing implementation are 1-D (i.e. a single
line of pixels), as illustrated in Figure 4.4. The area between rotations of the
structuring element—the shaded region in Figure 4.4—does not contribute to
the pixel signature. If we neglect quantisation, this region has an area of r²θ
(i.e. two sectors of a circle), where r is the length of the structuring element
and θ is the angle between adjacent structuring elements. This is a problem
because there is likely to be useful information in the region that is not considered.
While information in this area may contribute to nearby signatures, it should
be contained in the signature for the pixel. The solution is contained within
Figure 4.4: the structuring element should be shaped like a bow tie—i.e. like the
shaded region in the figure.
Recall from Section 4.2 that our morphological operators are defined in terms of
minima and maxima of areas under the structuring element. The bow tie-shaped
structuring element is non-trivial to construct on the quantised image plane for
arbitrary sizes and orientations. Further, computing the minimum or maximum
value under such a shape—particularly for large images such as mammograms—is
likely to be computationally demanding. We seek to improve the signatures by
considering the relevant pixels using a suitable structuring element, but without
incurring the computational penalty associated with a complex shape.
Figure 4.5 shows a series of approximations of the bow tie-shaped structuring
element. Simplifying the shape of the structuring element yields the element
shown in Figure 4.5(b), which has a shape that is easier to construct, but gives consideration to the regions either side of the centre.

Figure 4.4: An illustration of the two limitations of the existing implementation.
Three rotations of a structuring element are shown. As the structuring element is rotated, it "grows" in length. The red shaded region illustrates the area not covered by the 1-D structuring elements of the existing implementation and the desired length of the diagonal structuring element.

(a) (b) (c) (d)
Figure 4.5: Incremental approximations of the bow tie structuring element.
The computationally expensive bow tie-shaped structuring element is shown in (a). An initial approximation is shown in (b), which is closely approximated by (c). The structuring element in (c) can be approximately decomposed as (d).

When quantised, this approximation is actually a rectangle for all but the largest structuring elements,
and the additional regions either side of the centre are insignificant. Using a solid
structuring element is expensive because of the number of pixels that need to be
compared when computing the minimum or maximum.
It is possible to approximately decompose a sieving with an arbitrarily oriented
rectangular structuring element as a sieving with two orthogonal 1-D structuring
elements. The first structuring element has the same length and orientation as
the longest side of the rectangular structuring element. The second structuring
element has the same length and orientation as the shortest side of the rectangular
structuring element. The input image is sieved using the first structuring element
and the resulting image is then sieved using the second structuring element⁶. For
the majority of pixels (60%–90%), there is no difference between the full sieving
and the approximation. Approximation errors are very rarely more than 10 grey-
level values in magnitude (in 8-bit images) and are imperceptible. Because we
have been able to decompose the sieving in terms of two structuring elements
that are one pixel wide, Soille’s algorithm can be used to perform the erosions
and dilations efficiently [167].
The width of the rectangular structuring element—and hence the second 1-D
structuring element—needs to be such that the rectangle “fits” as it is rotated
from one orientation to another. If the length of the first (longest) structuring
element is r then the length of the second structuring element is 2r sin(θ/2), where θ is the angle through which the elements are rotated when moving from one orientation column to another. This is illustrated in Figure 4.6.

⁶Experimental work showed that reversing the order in which sieving was performed decreased the accuracy of the approximation.

Figure 4.6: Rotating the "rectangular" structuring elements.
The diagram shows how the width of the rectangle—and hence the length of the second approximating structuring element—needs to be selected so that correct coverage is achieved (i.e. the corners of adjacent structuring elements need to touch).
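A small helper (the name and rounding policy are ours, and we assume the 12 regularly-spaced orientations span 180°, giving an angular step of 15°) computes the lengths of the two orthogonal 1-D elements for this decomposition:

```python
import math

def decomposed_lengths(r, n_orientations):
    """Lengths (in pixels) of the two orthogonal 1-D structuring elements
    approximating a rectangle of length r: the first spans the scale, the
    second provides the width 2 r sin(theta / 2) needed so that the
    corners of adjacent rotated rectangles touch (Figure 4.6)."""
    theta = math.pi / n_orientations          # angular step between columns
    width = 2.0 * r * math.sin(theta / 2.0)
    return r, max(1, round(width))

# With 12 regularly spaced orientations the angular step is 15 degrees,
# so the width is about 0.26 r.
assert decomposed_lengths(45, 12) == (45, 12)
assert decomposed_lengths(4, 12) == (4, 1)
```

The image would then be sieved with the first element at the required orientation and the result sieved with the second, orthogonal element, as described above.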
Our proposed new method of computing pixel signatures ensures that structuring
element length is constant over orientation (allowing for the quantised image
plane) and uses the orthogonal elements approximation to give consideration to
pixels that the original method neglected. Figure 4.7 shows a pixel signature
computed for the centre of a Gaussian blob using the new method. The non-linearity that remains is due to quantisation. Signatures from the centre of a Gaussian line are similar to those of the original method.

(a) (b)
Figure 4.7: An "improved" pixel signature from the centre of a Gaussian blob.
A Gaussian blob is shown in (a) and a pixel signature, computed using our method, is shown in (b). Note that the signature does not exhibit the non-linearity of the equivalent signature in Figure 4.3.
4.5 An information theoretic measure of signature quality
The most obvious way to compare the original and new methods of computing
signatures would be to run a classification experiment. However, to build an ac-
curate picture of how well each performed would require large-scale experiments,
targeting the various different forms of abnormality. Consequently, we sought a
more direct way of comparing their behaviour.
In producing pixel signatures, we hope to encapsulate useful information about
local image appearance. A signature that contains more information than another
is likely to be more useful. Shannon’s entropy [163] is a measure of the average
information carried by a discrete symbol emitted from some source. The entropy
measure is derived by considering the “uncertainty” that is associated with a
symbol (or the “surprise” associated with the symbol). Given a symbol with
probability p, selected from some alphabet A = {a_1, a_2, …, a_N}, the measure of
the uncertainty associated with the symbol, u(p), is defined axiomatically:
• u(1) = 0. We are certain of—or unsurprised by—the certain event.
• u(p) > u(q) ⇐⇒ p < q. We are more uncertain of—or more surprised
by—less probable symbols.
• u(pq) = u(p) + u(q). The uncertainty measure is additive for a sequence of
symbols.
• u(p) is continuous in p.
Shannon showed that the only function satisfying these axioms is u(p) = −K log_a p.
The constant K is usually set to unity and the base of the logarithm is usually
set to 2, in which case the uncertainty—usually interpreted as the information
content—is measured in bits. The expected information content of a symbol
emitted by a source is given by:
H = −∑_{i=1}^{N} p_i log_2 p_i.    (4.9)
Shannon’s entropy can be illustrated as follows. Imagine two coins, each of which
has an associated probability mass function. Assume that one coin is fair and
the other is very heavily biased towards Heads. Further, imagine that a friend
knows these models and has to guess the outcomes of coin tosses, given that they
can know which coin was tossed. Telling your friend that the unfair coin was
tossed gives them a very good chance of correctly guessing the message, but the
actual message itself (‘The coin landed with the Head facing upwards.’ ) contains
little surprise (information). Conversely, if the friend is told that the fair coin
is tossed, they have little information about what the message might be and so
the message carries more surprise (information) than the message for the unfair
coin. In summary: on average, events from peaked distributions convey little
information, while events from flat distributions convey more information.
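The coin illustration can be made concrete with a direct implementation of Equation 4.9 (the function name is ours; terms with p_i = 0 contribute nothing, by the usual convention that 0 log 0 = 0):

```python
import math

def entropy_bits(p):
    """Shannon entropy H = -sum_i p_i log2 p_i of a probability mass
    function, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

fair = [0.5, 0.5]        # the fair coin: maximally uncertain
biased = [0.99, 0.01]    # heavily biased towards Heads

assert entropy_bits(fair) == 1.0      # each toss carries one full bit
assert entropy_bits(biased) < 0.1     # outcomes carry little surprise
```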
An experiment to compare the two methods of computing scale-orientation pixel signatures using the information theoretic measure of signature quality is
described below.
4.5.1 Aims
The aim of the experiment was to determine if the modifications made to the
pixel signatures increase the information content of the new signatures, relative
to the original method.
4.5.2 Method
We would ideally treat each pixel signature as a symbol and compute the expected
information that each of the two types of signature carries (i.e. treat the signature
type as the source). However, because the pixel signatures we use are essentially
points in a 120-D space, building a model of the probability mass function for
signatures is intractable; it is very unlikely that multiple identical signatures will
be encountered, even in a large sample7. If an equal number of original and
new signatures were sampled, and each signature occurred only once, then the
Shannon entropy of each source would be identical. Such a measure would not
be useful. Instead, we consider each pixel signature to be a source, where the
values of the signature elements are the message symbols. If all the elements of
a signature had similar values, the signature would carry little useful information,
whereas a signature whose elements take on distinct values can carry more
useful information.
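To make the per-signature measure concrete: each signature is treated as a source whose quantised element values are the message symbols. The binning below (16 levels) is our illustrative choice; the thesis does not fix these details here.

```python
import math
from collections import Counter

def signature_entropy(signature, n_bins=16):
    """Shannon entropy of a signature's element values, treating the
    signature itself as a source (the quantisation is illustrative)."""
    lo, hi = min(signature), max(signature)
    width = (hi - lo) or 1.0
    symbols = [min(int((v - lo) / width * n_bins), n_bins - 1) for v in signature]
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

flat_sig = [1.0] * 120                     # all elements similar: ~0 bits
varied_sig = [i % 16 for i in range(120)]  # distinct values: higher entropy
print(signature_entropy(flat_sig) < signature_entropy(varied_sig))  # True
```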
A set of 10 regions of interest, each approximately 400 mm², around spiculated
lesions was pseudo-randomly selected from the Digital Database for Screening
Mammography [83]. As well as containing the abnormal feature, the regions
were large enough to contain pixels from tissue that a CADe system should label
as being normal. For each pixel in each image, pixel signatures were computed
using both methods and the corresponding Shannon entropies were calculated.
Despite the relatively small number of images, our sample size is actually very
large (2 310 342 pixel locations). A more comprehensive study would look at all
indicative signs of abnormality (e.g. the various types of mass, microcalcifications,
architectural distortions), but such work was beyond the scope of this experiment.
7 Multiple identical signatures would only exist in a set of images that contained multiple identical regions.
4.5.3 Results
Shannon entropy was computed for 2 310 342 pixel signatures. The total Shannon
entropy was 6 426 499 bits for the original signatures and 7 638 189 bits for the new
signatures. This is an average increase of over 0.5 bits per pixel or nearly 19%.
A paired t-test on the differences between the two sets of entropies, at the 5%
significance level, showed that the new method yields a statistically significant
increase in Shannon entropy. Figure 4.8 shows three regions of interest around
spiculated lesions and illustrates where the additional information is distributed.
4.5.4 Discussion
The results show that our attempt to improve the way that pixel signatures are
computed increases the information content of the signatures for spiculated lesions
and the surrounding “normal” tissue. Although pixel signatures for almost all of
the tissue types included see an increase in information content, the increase
appears to be larger for regions around masses, particularly for spicules. Little
increase in information, or a decrease, is seen in homogeneous regions. We cannot draw any
conclusions for regions containing microcalcifications—as we did not include such
images—but as inhomogeneous regions see the most increase in Shannon entropy,
we would expect an increase for pixel signatures from such regions. The following
experiment investigates whether our modifications yield better results when the
new method of computing signatures is used in a practical application.
Figure 4.8: Regions of increased Shannon entropy. The left column shows three regions of interest (to scale). The right column shows the pixel-wise differences in Shannon entropy between the new and original methods (i.e. positive values illustrate where the new method has more information). Thresholding the difference images shows that almost all pixel signatures computed using the new method have more information than those computed with the original method.
4.6 Classification-based evaluation
4.6.1 Aims
The information theoretic evaluation demonstrates that the new signatures con-
tain more information than those produced by the previous method. Although it
is intuitive to expect that a more informative description will yield better results
when used within a learning framework such as a classifier, we need to demon-
strate that this is the case. The aim of this experiment is to determine if the new
signatures can be applied more successfully than those produced by the original
method.
4.6.2 Method
An expert radiologist provided annotations for the images described in Section 4.5
(an example region of interest is shown in Figure 4.9). A set of just over 20 000
locations within the images were randomly sampled, such that half were sampled
from the abnormal regions and half from the normal regions. Pixel signatures—
computed using the two methods as described previously—were then extracted
for these locations. The columns of the signatures were concatenated, converting
the 2-D signatures into vectors that can be considered to be points in a 120-D
space. For each type of signature—i.e. original and new—training and test sets
were formed by randomly allocating signatures to either a training set or a test
set.
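The random allocation of sampled signatures to training and test sets can be sketched as follows (the function name, seed, and stand-in data are ours, not the thesis's):

```python
import random

def make_train_test(vectors, labels, seed=0):
    """Randomly allocate (vector, label) pairs, half to a training set
    and half to a test set, as described above."""
    rng = random.Random(seed)
    pairs = list(zip(vectors, labels))
    rng.shuffle(pairs)
    half = len(pairs) // 2
    return pairs[:half], pairs[half:]

# Hypothetical stand-in data: 100 vectors in a 120-D space, half abnormal.
rng = random.Random(1)
vectors = [[rng.random() for _ in range(120)] for _ in range(100)]
labels = [1] * 50 + [0] * 50
train_set, test_set = make_train_test(vectors, labels)
print(len(train_set), len(test_set))  # 50 50
```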
Figure 4.9: An example region of interest and its ground truth.
There are many pattern classification techniques (e.g. the nearest neighbour
classifier, linear discriminant analysis, artificial neural networks); among these,
the support vector machine has become popular for its classification performance
and ability to generalise. A support vector machine classifier [31] was trained using the
training set for the original signatures. Suitable training parameters were selected
by validating on the test set for the original signatures. A second classifier was
then trained on the training set for the new signatures, using the same training
parameters as were selected for the original signatures. This approach attempts
to remove bias towards the new method of producing pixel signatures. The test
set for the signatures produced using the new method was then classified using
the second classifier.
                Original signatures    New signatures
nTP                           3 729             3 932
nTN                           3 745             3 746
nFP                           1 276             1 266
nFN                           1 283             1 080
Specificity                   0.747             0.751
Sensitivity                   0.744             0.785
Table 4.1: Classification results for the two signature types. The table shows the number of true positives (nTP), true negatives (nTN), false positives (nFP), false negatives (nFN) and the specificity and sensitivity for the two signature types. See Section 3.11 for an explanation of these quantities.
4.6.3 Results
The results of the classification experiment are summarised in Table 4.1. Both
the specificity and sensitivity are improved when using the classifier trained using
the new signatures.
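The rates in Table 4.1 follow directly from the counts (see Section 3.11 for the definitions); a minimal check:

```python
def sensitivity(n_tp, n_fn):
    """True positive rate: nTP / (nTP + nFN)."""
    return n_tp / (n_tp + n_fn)

def specificity(n_tn, n_fp):
    """True negative rate: nTN / (nTN + nFP)."""
    return n_tn / (n_tn + n_fp)

# Sensitivities from the Table 4.1 counts.
print(round(sensitivity(3729, 1283), 3))  # 0.744 (original signatures)
print(round(sensitivity(3932, 1080), 3))  # 0.785 (new signatures)
```

The improvement in sensitivity (0.744 to 0.785) corresponds to the 203 additional true positives, alongside a small improvement in specificity.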
4.6.4 Discussion
The results show that the new signatures can yield better results in classification
experiments. It should be noted that the results for the classifier trained using
the new signatures are pessimistic, since the classifier parameters were tuned to the
original signatures. As stated previously, Euclidean distance is not a good met-
ric for pixel signatures. Classifiers trained in a space where Euclidean distance
approximates the transportation-based similarity measure perform better than
those trained in the raw pixel signature space [90]. We could therefore expect
classification performance to be improved if we used signatures in a more appropriate
space and selected classifier parameters for the new signatures, rather than
for the original signatures.
4.7 Summary
This chapter presented work on improving the way that scale-orientation pixel
signatures are computed. In summary:
• Mathematical morphology was introduced.
• Scale-orientation pixel signatures were introduced and an existing imple-
mentation was analysed. Two flaws with the existing method were ad-
dressed, yielding a new way to compute pixel signatures. An efficient way
of computing the new signatures was developed.
• An information theoretic measure of signature quality was developed. Com-
paring pixel signatures computed on mammographic images using the old
and new methods showed that the new method increased the information
content of the signatures by approximately 19%.
• A classification experiment was reported in which signatures computed us-
ing the two methods were used to discriminate between pixels belonging to
normal and spiculated lesion tissues. The new signatures outperformed the
original signatures in terms of both specificity and sensitivity. By tuning the
classifier parameters to the new signatures—rather than the old ones—it is
expected that even better performance could be achieved.
Although the pixel signature approach shows some promise as a method of mod-
elling mammographic appearance, it does not lead to the generative approach
advocated in the introduction to the thesis. The remainder of the thesis focuses
on developing generative statistical models of mammographic appearance.
Chapter 5
Modelling distributions with
mixtures of Gaussians
5.1 Introduction
This chapter presents background information on the multivariate normal distri-
bution and a class of statistical model called the Gaussian mixture model, both of
which are used extensively in the remainder of the thesis. The chapter presents:
• A brief overview of the density estimation problem and a review of common
approaches used to model distributions.
• The Gaussian mixture model and two algorithms for learning the model
parameters from training data.
• Some useful properties of the multivariate normal distribution and Gaussian
mixture models (computing marginal and conditional distributions).
• A method of learning Gaussian mixture model parameters from large train-
ing sets using a variant of the k-means clustering algorithm.
5.2 Background
This thesis is largely concerned with statistical modelling, which is used to de-
scribe scenarios (experiments) that are governed by stochastic processes, or which
can be assumed to be governed by such processes. One of the characteristics of
randomness is variation, and this thesis deals with the variation of mammographic
texture and appearance. We use statistical models to cope with this variation.
A random variable (e.g. X) is a function that maps every possible outcome of an
experiment to a unique number1. In this way, the random variable is governed by
the stochastic process. The probabilities of discrete events are usually described
using a probability mass function (pmf), P (X = x), abbreviated as P (x). For
an event x outside the possibility space (i.e. an impossible event), P (x) = 0.
The certain event is assigned a probability of unity. The discrete cumulative
distribution function (cdf) is defined as
C(x) = P(X ≤ x) = Σ_{x′ ≤ x} P(x′). (5.1)
1 Experiments often do not have numerical outcomes, e.g. tossing a coin has outcomes Heads and Tails.
Similarly, the continuous cdf is defined as
C(x) = P(X ≤ x) = ∫_{−∞}^{x} p(α) dα, (5.2)
where p(x) is the probability density function (pdf). A pdf is simply the derivative
of the corresponding cdf, and is the continuous equivalent of the pmf. Probability
mass and density functions are nonnegative and must sum or integrate to unity
(because the probability of any event occurring in the possibility space is the
certain event).
Events in continuous distributions are defined as being regions within the possibil-
ity space and so the probability of an event is equal to the integral of the pdf within
the region that delimits the event. In this thesis the possibility spaces are typi-
cally measured on multiple axes, and so the pdfs are multivariate (i.e. are scalar
functions of vectors). In the multivariate case, pdfs have a value at every point
in the possibility space, and probabilities are the integrals over hyper-volumes of
the regions that delimit the events.
The problem of density estimation can be stated as follows: given a training
set, T = {x_i ∈ R^d : i = 1, . . . , N}, of samples from a particular population,
how do we compute the value of the associated pdf for an arbitrary point in the
space? Implicit in this question is the assumption that the pdf cannot be known
a priori. Further to estimating the pdf, it is often necessary to manipulate the
pdf to determine further densities, such as marginal and conditional distributions
(Section 5.5 presents some background on these topics), or to compute likelihoods
or probabilities by integrating the pdf.
5.3 Density estimation
A common approach to density estimation is to assume that the data in T follows
a known trivial distribution, such as a uniform or normal distribution. The
validity of this assumption can be assessed roughly by plotting the data on each
pair of dimensions. If the data do not follow the assumed distribution, then more
sophisticated approaches are required. We will now look at a few common density
estimation techniques and consider how they support the following three tasks:
1. Computing a marginal distribution.
2. Computing a conditional distribution.
3. Sampling from the underlying pdf.
(A full description of what the terms marginal and conditional mean is given in
Section 5.5.)
A simple density estimator is the histogram. The possibility space is broken
into discrete regions called bins, and each bin is assigned a value equal to the
number of training data that lie within the associated region. When normalised
to sum to unity, the histogram defines a pmf, and the situation changes from
being continuous to being discrete. If there are ample data, the granularity
of the estimate of the probability mass function can be such that it is a good
approximation of the pdf. Addressing our three tasks:
1. Computing a marginal distribution simply involves summing the histogram
along the marginal dimensions.
2. Computing a conditional distribution can be achieved by constructing a
lower-dimensional histogram from the bins that intersect the conditions,
and then normalising the resulting pmf to sum to unity.
3. A multivariate histogram could be sampled as follows: construct an associa-
tive array that maps the probabilities to their bin locations in the possibility
space and then sample one of these mappings according to the probabilities.
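The three tasks can be sketched for a small discrete 2-D histogram (the bin layout and counts are illustrative):

```python
import random

def normalise(hist):
    """Normalise a 2-D histogram of counts so its entries sum to unity."""
    total = sum(sum(row) for row in hist)
    return [[v / total for v in row] for row in hist]

def marginal_over_rows(pmf):
    """Task 1: marginalise by summing along the row dimension."""
    return [sum(col) for col in zip(*pmf)]

def conditional_given_row(pmf, row):
    """Task 2: take the bins that satisfy the condition and renormalise."""
    total = sum(pmf[row])
    return [v / total for v in pmf[row]]

def sample_bin(pmf, rng=random.Random(0)):
    """Task 3: sample a bin location according to the bin probabilities."""
    u, acc = rng.random(), 0.0
    for i, row in enumerate(pmf):
        for j, p in enumerate(row):
            acc += p
            if u <= acc:
                return (i, j)
    return (len(pmf) - 1, len(pmf[0]) - 1)

pmf = normalise([[4, 1], [3, 2]])
print(marginal_over_rows(pmf))        # approximately [0.7, 0.3]
print(conditional_given_row(pmf, 0))  # approximately [0.8, 0.2]
```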
The histogram approach works well in low dimensions. However, the amount
of data required to populate a space with a given density of data increases ex-
ponentially with the dimensionality of the space. Imagine that one can feasibly
sample from 10 000 individuals. If we could only measure one attribute for each
individual, on a scale of 1 to 100, then the density of data points in the possibility
space would be 10 000/100 = 100. If one could measure two attributes for each
individual (using the same scale), then the density of the possibility space would
be 10 000/1002 = 1. Measuring three attributes yields a density of 0.01, and so
on. This effect is called the curse of dimensionality [14].
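The worked example generalises to a one-line density calculation:

```python
def data_density(n_samples, scale, n_dims):
    """Density of n_samples in a space with `scale` values per axis:
    n / scale**d, as in the worked example above."""
    return n_samples / scale ** n_dims

for d in (1, 2, 3):
    print(d, data_density(10_000, 100, d))  # 100.0, then 1.0, then 0.01
```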
A naïve representation of a high-dimensional histogram would be a multi-dimensional
array. To approximate a continuous density, each dimension requires a reasonable
number of elements. The result is a multidimensional array with a^d elements
(where a determines the quality of the approximation to the continuous density),
which quickly becomes impractical. The curse of dimensionality implies that most
of the histogram bins will be empty. A more practical implementation could ex-
ploit this redundancy and use a sparse representation, but this would lack the
conceptual simplicity of the histogram and would make computing marginal and
conditional distributions more difficult.
The k-nearest neighbour approach [56] uses the data directly to facilitate density
estimation. The idea is to estimate the local density around a given location in
the possibility space by considering the distance, r, to the k-th nearest neighbour.
The assumption is that the content (hyper-volume) of a hypersphere of radius
r around a point of interest will be smaller in densely populated regions of the
possibility space than in sparsely populated regions. The density at a point x
can be estimated as
p(x) ≈ (k/N) · 1/v_d(r), (5.3)

where k/N is an estimate of the probability represented by the k data points,
v_d(r) is the content of a d-dimensional hypersphere with radius r = ‖x − x_k‖_2,
and x_k is the k-th nearest neighbour of x. The main problem with this method
is that an efficient way of finding the nearest neighbours is required. Addressing
our three tasks:
1. Computing a marginal density simply involves modifying the nearest neigh-
bour routine to neglect measurements from the non-marginal dimensions
and using the appropriate value for d when Equation 5.3 is used.
2. It is difficult to see how this method would allow conditional distributions
to be computed.
3. Since samples from a population are likely to be more common from dense
regions of the pdf, “new” samples could be generated simply by choosing
one of the original samples at random, but this would restrict samples to
the observed set. One could consider the hypersphere defined by r to be of
uniform density, and draw a sample from such a region around a randomly
chosen point in the set of observed samples.
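Equation 5.3 can be written out directly; the helper below uses the standard formula π^(d/2) r^d / Γ(d/2 + 1) for the content of a d-dimensional hypersphere, and the data and query points are illustrative:

```python
import math

def knn_density(x, data, k):
    """Estimate p(x) via Equation 5.3: (k/N) divided by the content of
    the hypersphere reaching the k-th nearest neighbour."""
    d = len(x)
    r = sorted(math.dist(x, p) for p in data)[k - 1]
    content = math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d
    return (k / len(data)) / content

# A dense cluster near the origin and a lone outlier.
points = [(0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1), (5.0, 5.0)]
print(knn_density((0.0, 0.0), points, 2) > knn_density((5.0, 5.0), points, 2))  # True
```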
The Parzen window density estimator [56] is similar to the k-nearest neighbour
approach in that it uses the training data points directly to help model the den-
sity. The method assumes that the underlying pdf to be estimated is nonzero at
locations near the training points and that less can be inferred about the pdf—
based on a particular training point—as one moves further away from it in the
possibility space. The relationship between the inference that can be made about
the pdf, based on a particular training point xi, and the distance from that train-
ing point, is represented by a kernel function which has the form k(r, xi), where
r is the (often Euclidean) distance to xi. It is the kernel that defines the contri-
bution of the data point to the estimate of the pdf: a kernel is centred on each
data point, and the pdf is defined as the sum of these kernels, normalised such
that the integral of the pdf is unity. The particular form of the kernel (e.g. the
Gaussian, boxcar or triangle functions)—and its parameterisation—must be cho-
sen to be suitable for the application at hand. While the Parzen window density
estimator is reasonably simple, choosing the kernel function and its parameters
can be difficult. Addressing our three tasks:
1. Computing a marginal distribution simply involves ignoring measurements
on the non-marginal axes and re-normalising the integral of the pdf to unity.
2. Computing a conditional distribution will depend upon the form of kernel
chosen. If a Gaussian kernel is chosen, then a closed-form solution exists
(see Section 5.5.2).
3. Similarly to the k-nearest neighbour approach, one could sample from a
Parzen window representation by choosing a data point at random, and
then sampling from its associated kernel as if it were a distribution.
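A 1-D Parzen estimator with a Gaussian kernel might look like this (the bandwidth h = 0.5 and the data are illustrative choices):

```python
import math, random

def parzen_pdf(x, data, h=0.5):
    """Sum a normalised Gaussian kernel centred on each training point;
    dividing by N keeps the integral of the estimate equal to unity."""
    norm = 1.0 / (h * math.sqrt(2 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / len(data)

def parzen_sample(data, h=0.5, rng=random.Random(0)):
    """Task 3: pick a training point at random, then sample its kernel."""
    return rng.gauss(rng.choice(data), h)

points = [0.0, 0.2, -0.1, 4.0]
print(parzen_pdf(0.0, points) > parzen_pdf(2.0, points))  # True: denser near the cluster
```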
In the remainder of the chapter we present the Gaussian mixture model, which
can be viewed as a generalisation of the Parzen window density estimator. The
Gaussian mixture model is an elegant and relatively simple density estimator that
can be trained in a principled way. Further, there exist closed-form solutions for
the marginal and conditional distributions. It is also easy to sample from the
modelled distribution. We shall exploit these properties in much of this thesis
and see that these properties are extremely useful for our image synthesis and
analysis methods (see Chapter 6 and Chapter 9).
5.4 Gaussian mixture models
The GMM approximates an arbitrary pdf using a weighted sum of Gaussian (nor-
mal) basis functions, which we call components. In the univariate case, where
observations are measured on a single axis, each component is parameterised by
a mean and a variance. In the multivariate case, where observations are mea-
sured on multiple axes, each component is parameterised by a mean vector and
a covariance matrix. In addition, each component has an associated probability
(“weight”). We shall assume the multivariate case, but the same theory applies
in the univariate case. The GMM has the following form:
p(x) = Σ_{i=1}^{k} P(i) g(x, µ_i, Σ_i) (5.4)
where x is a point in the possibility space, p(x) is the pdf, i indexes the k
components, and µi and Σi are the mean vector and covariance matrix for the i-
th component. The probability of the i-th component is P (i) and g is the function
that describes the pdf of a single component:
g(x, µ, Σ) = (1 / √((2π)^n |Σ|)) e^{−(1/2)(x−µ)^T Σ^{−1} (x−µ)}. (5.5)
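Equations 5.4 and 5.5 translate directly into code. A univariate sketch is given below; the full multivariate form would replace the scalar variance with a covariance matrix, its determinant and its inverse.

```python
import math

def gaussian_pdf(x, mu, var):
    """Equation 5.5 in one dimension."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def gmm_pdf(x, weights, mus, variances):
    """Equation 5.4: a weighted sum of Gaussian components."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, mus, variances))

# A two-component mixture with most of its mass near 0 and some near 5.
p = lambda x: gmm_pdf(x, [0.7, 0.3], [0.0, 5.0], [1.0, 1.0])
print(p(0.0) > p(2.5))  # True: the density dips between the components
```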
5.4.1 Learning the parameters
To perform density estimation using the GMM, one has to find the model param-
eters that fit the model to the training data. This is an ill-posed problem, and
the most common regularisation strategy is to use maximum likelihood estima-
tion, where we seek the model parameters that maximise the likelihood that the
model could generate the data. We shall present two solutions to the parameter
selection problem shortly.
Finding the model parameters would be simpler if we did not have to worry
about the parameter k (the number of model components), which effectively means
that there is a countably infinite family of models to choose from. Unlike many
unsupervised learning problems, where one of the aims is to discover the classes
that exist within a mixed training set, all the data in T comes from the same class,
so we do not need to determine the “correct” number of components—we simply
want to model the distribution of the data. As the number of model components
increases, so does the level of pdf detail that can be modelled. However, we must
be able to support the choice of parameters for each component using data from T ,
so there is a practical upper bound on the number of components that a model can
have. We shall see later that once we have determined the model parameters and
want to use the model, we need to iterate over each component. This introduces
a further constraint on the number of components, as the computational cost of
using a GMM is related to this number. In short, provided that we have adequate
support for the components, we can have as many as is practical.
We will now describe two approaches to fitting a GMM to training data. The k-
means clustering algorithm is a simple and intuitive method, but was not designed
to fit GMMs to data, while the Expectation-Maximisation (EM) algorithm is more
principled.
5.4.2 The k-means clustering algorithm
The k-means clustering algorithm [125, 102] is a simple example of unsupervised
learning. The problem is posed as follows: given a set of multivariate measurements,
T = {t_i : i = 1, . . . , N}, form k disjoint subsets (called clusters) such
that all the elements of a particular cluster are similar. There are many variants
of the algorithm, but we shall present two: the first clusters the data in a single
pass (see Algorithm 1); the second is iterative in nature, giving each data point
the opportunity to migrate (see Algorithm 2).
Algorithm 1 The non-iterative k-means algorithm.
    Randomly assign each t_i ∈ T to one of the k clusters.
    for i = 1, · · · , k do
        Compute the i-th cluster centre: the mean of the elements assigned to cluster i.
    end for
    for each element t_i ∈ T do
        Using some metric, compute the distance from t_i to each cluster centre.
        if t_i is not assigned to the cluster with the closest centre then
            Assign t_i to the cluster with the closest centre.
            Recompute the means of the two clusters involved in the reassignment.
        end if
    end for

Algorithm 2 The iterative k-means algorithm.
    Randomly assign each t_i ∈ T to one of the k clusters.
    for i = 1, · · · , k do
        Compute the i-th cluster centre: the mean of the elements assigned to cluster i.
    end for
    repeat
        for each t_i ∈ T do
            Using some metric, compute the distance from t_i to each cluster centre.
            if t_i is not assigned to the cluster with the closest centre then
                Assign t_i to the cluster with the closest centre.
            end if
        end for
        Recompute the cluster centres.
    until some stopping criterion is met (see text).
The metric used to measure similarity can be selected to be appropriate to the
problem at hand, but Euclidean distance is often used. For the iterative algo-
rithm, a range of stopping criteria can be used, but a common strategy is to stop
iterating when no further reassignments occur.
Once a final clustering has been obtained, it is a simple matter to fit a GMM
to the clustering: the means, {µ_i : i = 1, . . . , k}, are simply the cluster centres;
the covariance matrices, {Σ_i : i = 1, . . . , k}, are the covariance matrices computed
from the elements assigned to each cluster; and the component probabilities,
{P(i) : i = 1, . . . , k}, are computed using the number of elements assigned
to each cluster:

P(i) = n_i / N. (5.6)
The clustering scheme can easily be modified to remove clusters if the number of
elements assigned to them falls to a level at which there is insufficient support
for the corresponding Gaussian component.
The k-means algorithm is intuitive and simple to implement, but it was not
designed to fit a GMM to data. In statistical terms, the k-means algorithm
minimises the within-cluster variances. Due to the random initialisation, a given
run of the algorithm will find one possible local minimum. Several runs of the
algorithm give a reasonable chance of finding the global minimum or a suitable
local minimum.
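The iterative scheme (Algorithm 2) and the cluster-to-component conversion of Equation 5.6 can be sketched for scalar data; the function names and toy data are ours.

```python
import random

def kmeans(data, k, rng=random.Random(0), max_iter=100):
    """Iterative k-means (Algorithm 2) on scalar data, Euclidean distance."""
    assignments = [rng.randrange(k) for _ in data]
    for _ in range(max_iter):
        centres = []
        for i in range(k):
            members = [x for x, a in zip(data, assignments) if a == i]
            centres.append(sum(members) / len(members) if members else rng.choice(data))
        new = [min(range(k), key=lambda i: abs(x - centres[i])) for x in data]
        if new == assignments:   # stopping criterion: no reassignments occurred
            break
        assignments = new
    return centres, assignments

def gmm_from_clusters(data, centres, assignments):
    """Equation 5.6 and friends: (weight, mean, variance) per cluster.
    Clusters with no support are dropped, as suggested in the text."""
    params = []
    for i, mu in enumerate(centres):
        members = [x for x, a in zip(data, assignments) if a == i]
        if members:
            var = sum((x - mu) ** 2 for x in members) / len(members)
            params.append((len(members) / len(data), mu, var))
    return params

data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
centres, assignments = kmeans(data, k=2)
print(gmm_from_clusters(data, centres, assignments))
```

Because of the random initialisation, several runs with different seeds may be needed to find a good local minimum, as noted above.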
5.4.3 The Expectation Maximisation algorithm for Gaus-
sian mixtures
Although the k-means algorithm is intuitive, it was not designed to fit GMMs to
data. The maximum likelihood formulation provides a more principled approach
to this problem, where model parameters are sought that maximise the likeli-
hood of the data having been generated. Unfortunately, there is no analytical
solution to this optimisation problem, and so alternative approaches are used.
The Expectation-Maximisation (EM) algorithm [137] is a general approach to
simplifying maximum likelihood problems, and in this section we shall present
the EM algorithm for fitting a GMM to training data. We will start with a simple
one-dimensional problem with just two model components [81], and then gener-
alise the algorithm to work in higher dimensions and with an arbitrary number
of components. (The expectation maximisation algorithm is presented in its ab-
stract form in Appendix A, along with a proof that the algorithm converges to a
local maximum of the objective function.)
We assume a training set, {x_i ∈ R : i = 1, . . . , N}, that has been drawn from an
underlying distribution that can reasonably be modelled using a GMM with two
underlying distribution that can reasonably be modelled using a GMM with two
components. Using the random variables X, X1 and X2, we can describe our model
as follows:
X ∼ (1 − ∆) X_1 + ∆ X_2 (5.7)
X_1 ∼ N(µ_1, σ_1²) (5.8)
X_2 ∼ N(µ_2, σ_2²) (5.9)
where ∆ ∈ {0, 1} with P (∆ = 1) = π.
Equation 5.7 can be viewed as a simple generative model: generate a ∆ with
probability π; if ∆ = 0 then deliver X_1, otherwise deliver X_2. If g_θ(x) is a normal
distribution with parameters θ = (µ, σ²), then we can write the pdf of X as:
p(x) = (1 − π) g_{θ_1}(x) + π g_{θ_2}(x). (5.10)
The model is parameterised by a vector Θ = (π, θ_1, θ_2) = (π, µ_1, σ_1², µ_2, σ_2²). We
want to select an optimal vector, Θ′, which is a maximiser of the likelihood of the
data having been generated by the model. The log-likelihood of the parameters
given the N data points is:
ℓ(Θ; X) = Σ_{i=1}^{N} log p(x_i) = Σ_{i=1}^{N} log [(1 − π) g_{θ_1}(x_i) + π g_{θ_2}(x_i)]. (5.11)
Unfortunately, there is no closed-form maximiser of Equation 5.11 and so a
numerical approach is required. If we knew the component from which each data
point was drawn, then finding the optimum Θ would be easy—the component
means and variances could just be computed by considering each component
separately, and π could be computed from the number of points assigned to each
cluster. Because we do not know the membership of each data point, we consider
unobserved latent variables, {∆_i ∈ {0, 1} : i = 1, . . . , N}, as in Equation 5.7, and
make soft (probabilistic) assignments. Given a current estimate, Θ, of the model
parameters, we compute the expected value of each ∆i:
δ_i = E(∆_i | Θ, X) = P(∆_i = 1 | Θ, X) (5.12)
and we can call δi the responsibility of X2 for observation i. This is the expectation
step of the EM algorithm. In the maximisation step, the estimates of the model
parameters are updated using maximum-likelihood estimates weighted by the
responsibilities. The EM algorithm for fitting a GMM with two components to
one-dimensional data is described by Algorithm 3.
Just as the k-means algorithm requires an initial hard assignment of data points
to clusters, the EM algorithm requires an initialisation. For example, the mixing
proportion π can be set to 0.5, two of the x_i may be chosen to be µ_1 and µ_2, and the
component variances can be set to the overall sample variance, (1/N) Σ_{i=1}^{N} (x_i − x̄)².
If a Gaussian component with zero variance is placed upon one of the data
points, then the likelihood of that data point becomes infinite, giving a degenerate
maximum of Equation 5.11. Therefore the variances must be constrained
to be greater than zero. Dempster, Laird, and Rubin showed that an
iteration of the EM algorithm cannot decrease the objective function [137]. In
general, the objective function can have multiple optima, and several runs of the
algorithm—using different initialisations—may be required. Algorithm 4 gener-
alises Algorithm 3 to the case of multivariate data and multiple model compo-
nents. Notice that with multiple components, the component responsibilities for
the data points need to be computed in the computation of each type of model
parameter, and so the expectation and maximisation steps are combined.
Algorithm 3 The EM algorithm for fitting a GMM with two components to one-dimensional data.
    Initialise the parameters (see text):

        Θ = (π, µ_1, σ_1², µ_2, σ_2²). (5.13)

    repeat
        The expectation step: update the estimate of the responsibilities:

            δ_i = π g_{θ_2}(x_i) / [(1 − π) g_{θ_1}(x_i) + π g_{θ_2}(x_i)], ∀ i ∈ {1, · · · , N}. (5.14)

        The maximisation step: update the weighted maximum-likelihood estimates
        of the means and variances, and update the estimate of the mixing probability:

            µ_1 = Σ_{i=1}^{N} (1 − δ_i) x_i / Σ_{i=1}^{N} (1 − δ_i), (5.15)
            σ_1² = Σ_{i=1}^{N} (1 − δ_i)(x_i − µ_1)² / Σ_{i=1}^{N} (1 − δ_i),
            µ_2 = Σ_{i=1}^{N} δ_i x_i / Σ_{i=1}^{N} δ_i,
            σ_2² = Σ_{i=1}^{N} δ_i (x_i − µ_2)² / Σ_{i=1}^{N} δ_i,
            π = (Σ_{i=1}^{N} δ_i) / N.

    until convergence.
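Algorithm 3 can be written out directly. The initialisation follows the suggestions in the text (π = 0.5, two data points as the means, the overall sample variance for both components); the variance floor guards against the degenerate zero-variance maximum, and the toy data are ours.

```python
import math

def em_two_component(xs, n_iter=200, var_floor=1e-6):
    """Fit a two-component 1-D GMM by EM (Algorithm 3)."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, var_floor)
    pi, mu1, var1, mu2, var2 = 0.5, xs[0], var, xs[-1], var

    def g(x, mu, v):  # 1-D normal pdf
        return math.exp(-0.5 * (x - mu) ** 2 / v) / math.sqrt(2 * math.pi * v)

    for _ in range(n_iter):
        # E-step (Equation 5.14): responsibility of component 2 for each point.
        delta = [pi * g(x, mu2, var2) /
                 ((1 - pi) * g(x, mu1, var1) + pi * g(x, mu2, var2)) for x in xs]
        # M-step (Equation 5.15): responsibility-weighted estimates.
        w1, w2 = sum(1 - d for d in delta), sum(delta)
        mu1 = sum((1 - d) * x for d, x in zip(delta, xs)) / w1
        mu2 = sum(d * x for d, x in zip(delta, xs)) / w2
        var1 = max(sum((1 - d) * (x - mu1) ** 2 for d, x in zip(delta, xs)) / w1, var_floor)
        var2 = max(sum(d * (x - mu2) ** 2 for d, x in zip(delta, xs)) / w2, var_floor)
        pi = w2 / n
    return pi, mu1, var1, mu2, var2

xs = [-0.2, 0.0, 0.2, 4.8, 5.0, 5.2]
pi, mu1, var1, mu2, var2 = em_two_component(xs)
print(round(mu1, 2), round(mu2, 2))  # the means settle near the two clusters
```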
Figure 5.1: An illustration of the expectation maximisation algorithm.
Figure 5.1 shows an illustration of the EM algorithm. The figure shows the joint
distribution of model parameters and latent data for a pedagogic example. The
vertical axis represents the model parameter space, and the horizontal axis rep-
resents the latent variable space. The horizontal lines in the diagram represent
the E-steps and the vertical lines represent the M-steps. The procedure begins
with an initial (poor) estimate of the model parameters. Keeping these constant,
the E-step obtains an estimate of the latent data. Keeping the latent data con-
stant, the M-step obtains a refined estimate for the model parameters. The two
steps are iterated until the algorithm converges to a local maximum. Note that
this particular run of the algorithm finds a local maximum that is not the global
maximum.
Algorithm 4 The EM algorithm for fitting a GMM with multiple components to multivariate data.

▷ Initialise the parameters:

    \Theta = \{P(i), \mu_i, \Sigma_i\}, \quad \forall i \in \{1, \ldots, k\}.    (5.16)

repeat
    ▷ Update the estimate of the mixing probabilities:

        P(i) = \frac{1}{N} \sum_{j=1}^{N} P(i \mid x_j, \Theta), \quad \forall i \in \{1, \ldots, k\},    (5.17)

    where the “responsibility” of component i for x_j is

        P(i \mid x_j, \Theta) = \frac{p(x_j \mid i, \Theta) P(i \mid \Theta)}{p(x_j \mid \Theta)}    (5.18)

    by Bayes’ theorem.
    ▷ Update the estimate of the component means:

        \mu_i = \frac{\sum_{j=1}^{N} P(i \mid x_j, \Theta) \, x_j}{\sum_{j=1}^{N} P(i \mid x_j, \Theta)}, \quad \forall i \in \{1, \ldots, k\}.    (5.19)

    ▷ Update the estimate of the component covariance matrices:

        \Sigma_i = \frac{\sum_{j=1}^{N} P(i \mid x_j, \Theta)(x_j - \mu_i)(x_j - \mu_i)^T}{\sum_{j=1}^{N} P(i \mid x_j, \Theta)}, \quad \forall i \in \{1, \ldots, k\}.    (5.20)

until convergence.
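Algorithm 4 can be sketched in the same way. This is an illustrative Python implementation, not the thesis's own code: the names are ours, the means are initialised from points spread along the first coordinate, and a small ridge (1e-6 on the diagonal) is added to each covariance purely to keep this sketch numerically stable.

```python
import numpy as np

def mvn_pdf(X, mu, Sigma):
    # Multivariate normal density evaluated at each row of X.
    d = X.shape[1]
    diff = X - mu
    mahal = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    return np.exp(-0.5 * mahal) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def fit_gmm(X, k, n_iter=100):
    N, d = X.shape
    # Initialise: means from points spread along the first coordinate,
    # covariances from the full data, equal mixing probabilities.
    idx = np.argsort(X[:, 0])[np.linspace(0, N - 1, k).astype(int)]
    mus = X[idx].copy()
    Sigmas = np.array([np.cov(X.T) for _ in range(k)])
    P = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # Responsibilities P(i | x_j, Theta) by Bayes' theorem (Eq. 5.18).
        lik = np.stack([P[i] * mvn_pdf(X, mus[i], Sigmas[i]) for i in range(k)])
        resp = lik / lik.sum(axis=0)
        # Combined update of P(i), mu_i and Sigma_i (Eqs. 5.17, 5.19, 5.20).
        Nk = resp.sum(axis=1)
        P = Nk / N
        mus = (resp @ X) / Nk[:, None]
        for i in range(k):
            diff = X - mus[i]
            Sigmas[i] = (resp[i, None] * diff.T) @ diff / Nk[i] + 1e-6 * np.eye(d)
    return P, mus, Sigmas
```

Note how the responsibilities computed in the first step feed every parameter update, which is why the E- and M-steps are combined here.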
5.5 Useful properties of multivariate normal distributions
The multivariate normal distribution has the very useful property that there exist
closed-form solutions to the problems of computing the marginal and conditional
distributions. We will review what is meant by these terms, describe the closed-
form solutions for the multivariate normal (i.e. a single component), and then
generalise these results to the multivariate GMM.
5.5.1 Marginal distributions
Imagine that three measurements are made for a sample of individuals on a
continuous scale (e.g. the height, weight and annual income of a number of peo-
ple). We could fit a GMM to this data. Further, imagine that to answer a
particular question we are only interested in the distribution of one of these mea-
surements (e.g. height) and have no constraining information for the other two
dimensions. The distribution we seek is called a marginal distribution. Intuitively,
the marginal distribution is the projection of the full pdf onto the dimensions that
we are interested in. Figure 5.2 illustrates a two-dimensional pdf marginalised
over one dimension.
Figure 5.2: A two-dimensional distribution marginalised over one dimension. The marginal distribution is the “shadow” at the back of the distribution.

Formally, if p(x) = p(x_1, \ldots, x_n) is a multivariate pdf, then p(x) marginalised over the dimensions indexed by D = \{d_i : i = 1, \ldots, m\}, m \le n, is:

    p(x_{f_1}, \ldots, x_{f_q}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(x) \, dx_{d_1} \cdots dx_{d_m},    (5.21)

where F = \{f_i : i = 1, \ldots, q\} and F \cap D = \emptyset. In Equation 5.21, F represents the set of dimension indices that are to be retained (i.e. those we are interested in) and D represents the set of dimension indices that are to be removed via marginalisation. Since the two sets cannot share indices, they are disjoint.
Although the definition of the marginal involves a series of integrals, there is
a very simple general solution: we simply pretend that the dimensions that we
want to marginalise over do not exist (and so no measurements could have been
made for them) [103]. In the case of the Gaussian, the parameters that define the
distribution—the mean vector and covariance matrix—are modified by removing
entries that correspond to the dimensions that we want to marginalise over. An
example of this is shown below. If X \sim N(\mu, \Sigma) with

    \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} & \Sigma_{1,3} \\ \Sigma_{2,1} & \Sigma_{2,2} & \Sigma_{2,3} \\ \Sigma_{3,1} & \Sigma_{3,2} & \Sigma_{3,3} \end{pmatrix}    (5.22)

then

    p(x_1, x_3) = \int_{-\infty}^{\infty} p(x) \, dx_2 = N(\mu_m, \Sigma_m)    (5.23)

where

    \mu_m = \begin{pmatrix} \mu_1 \\ \mu_3 \end{pmatrix}, \quad \Sigma_m = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,3} \\ \Sigma_{3,1} & \Sigma_{3,3} \end{pmatrix}.    (5.24)
Note that the marginal Gaussian density is itself a Gaussian density. The proce-
dure for computing the marginal distribution can be easily extended to the case
of a GMM by applying the above procedure to each component.
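In code, marginalising a Gaussian really is nothing more than index selection on the mean vector and covariance matrix. A minimal sketch (the function name and variables are ours), reproducing the three-dimensional example of Equations 5.22–5.24:

```python
import numpy as np

def marginalise(mu, Sigma, keep):
    # Retain only the listed dimensions of the mean and covariance.
    keep = np.asarray(keep)
    return mu[keep], Sigma[np.ix_(keep, keep)]

mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

# Marginalise over x2, i.e. keep dimensions 1 and 3 (indices 0 and 2).
mu_m, Sigma_m = marginalise(mu, Sigma, [0, 2])
```

For a GMM, the same selection is applied to every component's mean and covariance while the mixing probabilities are left unchanged.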
5.5.2 Conditional distributions
Imagine again a multivariate distribution. Also, imagine that we have made
a measurement along one of the dimensions and want to know how this mea-
surement constrains the distribution of values on the other dimensions. The
distribution we seek is called the conditional distribution.
Figure 5.3: A conditional distribution. A joint density is shown on the left. Applying a condition on one dimension constrains the distribution along the other.
In the general case of a multivariate pdf and multiple conditions, each condition
defines a hyperplane through the full distribution. The hyperplanes are axis-
aligned and mutually orthogonal. The conditional distribution is the function
that describes the values of the pdf that lie on the intersection of these hyper-
planes, normalised so that the function’s integral equals unity (i.e. is a valid pdf).
Figure 5.3 illustrates this concept.
We now derive the conditional distribution for the multivariate normal distribu-
tion [103]. We seek an expression for p(x1|x2). We will partition the random
vector X into X1 and X2. X2 will be conditioned by X2 = x2. Our approach is to
find a way of forcing independence between X1 and X2. Recall that if two random
variables, a and b, are independent, then p(a|b) = p(a). Let X \sim N(\mu, \Sigma).
We partition \Sigma as follows:

    \Sigma = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix}.    (5.25)
X_1 can be linearly transformed so that the covariances it shares with X_2 are zero and hence the two are independent. Assume X \in \mathbb{R}^p and that there are q conditions. Let

    A = \begin{pmatrix} I & -\Sigma_{1,2}\Sigma_{2,2}^{-1} \\ 0^T & I \end{pmatrix},    (5.26)

where the four blocks have sizes q \times q, q \times (p-q), (p-q) \times q and (p-q) \times (p-q) respectively.
Applying A to \Sigma yields:

    A \Sigma A^T = \begin{pmatrix} I & -\Sigma_{1,2}\Sigma_{2,2}^{-1} \\ 0^T & I \end{pmatrix} \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix} \begin{pmatrix} I & 0 \\ (-\Sigma_{1,2}\Sigma_{2,2}^{-1})^T & I \end{pmatrix}    (5.27)

    = \begin{pmatrix} \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1} & 0 \\ 0^T & \Sigma_{2,2} \end{pmatrix}.
We see that the off-diagonal covariances are zero. Applying the same transformation to (X - \mu):

    A(X - \mu) = A \begin{pmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{pmatrix} = \begin{pmatrix} I & -\Sigma_{1,2}\Sigma_{2,2}^{-1} \\ 0^T & I \end{pmatrix} \begin{pmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{pmatrix} = \begin{pmatrix} X_1 - \mu_1 - \Sigma_{1,2}\Sigma_{2,2}^{-1}(X_2 - \mu_2) \\ X_2 - \mu_2 \end{pmatrix},    (5.28)
which has the distribution

    N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1} & 0 \\ 0^T & \Sigma_{2,2} \end{pmatrix} \right);    (5.29)

that is, the first block is distributed as N_q(0, \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1}) and the second as N_{p-q}(0, \Sigma_{2,2}). If we fix X_2 = x_2, then \Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2 - \mu_2) is constant. Because X_1 - \mu_1 - \Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2 - \mu_2) and X_2 - \mu_2 are independent, the conditional distribution of X_1 - \mu_1 - \Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2 - \mu_2) is the same as the unconditional distribution of X_1 - \mu_1 - \Sigma_{1,2}\Sigma_{2,2}^{-1}(X_2 - \mu_2), i.e. N_q(0, \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1}). Therefore, given X_2 = x_2,

    X_1 \sim N_q(\mu_1 + \Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2 - \mu_2), \; \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1}).

Note that, as with the marginal distribution, the conditional is itself a Gaussian density. Note also that the conditional covariance does not depend on the conditioning value x_2.
For clarity, we summarise the result obtained above. If X is a multivariate random variable, where X \sim N(\mu, \Sigma), then we can partition these as:

    X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix},    (5.30)

    \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},    (5.31)

    \Sigma = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix}.    (5.32)
The conditional distribution p(x_1 \mid x_2) = N(\mu', \Sigma') with

    \mu' = \mu_1 + \Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2 - \mu_2),    (5.33)

    \Sigma' = \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1}.    (5.34)
The dimensions of X1 and X2 do not have to be adjacent, which allows the dis-
tribution to be conditioned over arbitrary dimensions.
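Equations 5.33 and 5.34 translate directly into code, with the free and conditioned dimensions supplied as index sets (which need not be adjacent). A minimal sketch with our own names; for symmetric \Sigma, \Sigma_{2,1} = \Sigma_{1,2}^T, and a pseudo-inverse is used in place of a plain inverse, anticipating the numerical issues discussed in this section.

```python
import numpy as np

def condition(mu, Sigma, idx1, idx2, x2):
    # idx1: free dimensions; idx2: dimensions conditioned on X2 = x2.
    i1, i2 = np.asarray(idx1), np.asarray(idx2)
    S11 = Sigma[np.ix_(i1, i1)]
    S12 = Sigma[np.ix_(i1, i2)]
    S22 = Sigma[np.ix_(i2, i2)]
    # Pseudo-inverse rather than inverse: Sigma_22 may be near-singular.
    S22_inv = np.linalg.pinv(S22)
    mu_c = mu[i1] + S12 @ S22_inv @ (x2 - mu[i2])   # Eq. 5.33
    Sigma_c = S11 - S12 @ S22_inv @ S12.T           # Eq. 5.34 (S21 = S12^T)
    return mu_c, Sigma_c
```

For example, with \mu = 0 and \Sigma = [[2, 1], [1, 2]], conditioning the first dimension on x_2 = 2 gives mean 1 and variance 1.5.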
Computing the conditional distribution for a GMM involves computing the conditional means and covariances for each component as described above and computing \{P(i \mid x_2) : i = 1, \ldots, k\}, the set of conditional component probabilities. These are computed using Bayes’ theorem:

    P(i \mid x_2) = \frac{p(x_2 \mid i) P(i)}{p(x_2)}    (5.35)

where p(x_2 \mid i) is computed by marginalising each component over the unknown dimensions.
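A minimal sketch of Equation 5.35 for the simple case where the conditioning value x_2 is a single observed dimension, so that each p(x_2 | i) is a univariate marginal density. The function and argument names are ours.

```python
import numpy as np

def component_posteriors(P, mu2, var2, x2):
    # P: prior component probabilities P(i); mu2, var2: each component's
    # marginal mean and variance on the conditioned dimension.
    lik = np.exp(-0.5 * (x2 - mu2) ** 2 / var2) / np.sqrt(2 * np.pi * var2)
    w = P * lik            # numerator of Eq. 5.35 for each component
    return w / w.sum()     # dividing by p(x2) normalises the posteriors
```

An observation close to one component's marginal mean concentrates almost all of the posterior probability on that component.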
Numerical issues
The quantity Σ−12,2 is required in order to compute a conditional distribution. In
practice, covariance matrices can often be close to singular (numerically difficult
to invert). An ad hoc approach to improving the condition of a covariance matrix is to add a small constant to its diagonal. This essentially adds variance to the distribution represented by the matrix. A significant problem with this approach is that one does not usually know a priori how much variance should be added. We have experimented with a scheme in which small amounts of variance are added incrementally until the matrix can be inverted. The method was reasonably successful, in that it could be used in the methods we describe in subsequent chapters, but computationally expensive.
Another approach, and one that we have found to be a good solution, is to
compute the Moore-Penrose generalised inverse (commonly called the pseudo-
inverse) of the covariance matrix instead [131, 139]. The Moore-Penrose inverse
of the matrix A, which we will denote by A+, has the following properties:
AA+A = A (5.36)
A+AA+ = A+ (5.37)
(AA+)T = AA+ (5.38)
(A+A)T = A+A (5.39)
and
x = A+b (5.40)
is the least squares solution to
Ax = b. (5.41)
Although the Moore-Penrose generalised inverse is defined for any complex ma-
trix, we shall restrict this discussion to covariance matrices, which are symmetric.
The Moore-Penrose generalised inverse can be computed as follows. Note that
the inverse of the matrix A can be written as

    A^{-1} = (P D P^{-1})^{-1} = P D^{-1} P^{-1}    (5.42)

where D is a diagonal matrix of the eigenvalues of A, and P is a matrix whose columns are the eigenvectors of A (i.e. Equation 5.42 represents a rotation of A to its principal axes). The matrix D^{-1} is trivial to compute, as it is simply a diagonal matrix where each diagonal element is the reciprocal of the corresponding element in D. For near-singular matrices, some of the eigenvalues will be small. We modify P by discarding the eigenvectors that have small corresponding eigenvalues, and remove the rows and columns of D that correspond to the small eigenvalues (e.g. if eigenvalue 3 is small, row 3 and column 3 of D would be removed):

    A^+ = \tilde{P} \tilde{D}^{-1} \tilde{P}^{-1}    (5.43)

where \tilde{P} is the modified P and \tilde{D} is the modified D. Since P is orthonormal, P^{-1} = P^T, and likewise \tilde{P}^{-1} = \tilde{P}^T. Although we generally use the Moore-Penrose generalised inverse for covariance matrices, we use the \Sigma^{-1} notation throughout this thesis, rather than \Sigma^+, because other techniques are occasionally used (see Section 8.3.1) and the \Sigma^{-1} notation implies intent rather than implementation detail.
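The procedure can be sketched compactly for a symmetric (covariance) matrix: eigh returns the eigendecomposition, and eigenvalues below a relative tolerance (the tolerance is our choice) are discarded rather than reciprocated. This is an illustration of the technique, not the thesis's hand-tuned implementation.

```python
import numpy as np

def pinv_symmetric(A, tol=1e-10):
    # Eigendecomposition A = P D P^T (eigh exploits symmetry).
    vals, vecs = np.linalg.eigh(A)
    # Discard eigenpairs with relatively tiny eigenvalues, then form
    # A+ = P~ D~^{-1} P~^T; dividing the kept eigenvector columns by
    # their eigenvalues implements the diagonal D~^{-1}.
    keep = vals > tol * vals.max()
    return (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T
```

On a singular matrix this agrees with NumPy's SVD-based pinv and satisfies the Moore-Penrose properties of Equations 5.36–5.39; on a well-conditioned matrix it reduces to the ordinary inverse.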
Computing conditional Gaussians represents approximately 98% of the computations performed in the work presented in Chapter 6² and a substantial proportion of those in Chapter 9, and so a hand-tuned implementation of the above algorithm was developed. This implementation allows Moore-Penrose generalised inverses of covariance matrices to be computed about 1.4 times faster than the implementation provided by MATLAB (which uses LAPACK routines [6]) and is equally robust. A less portable version was much faster, being 1.5 to 2 times faster than the MATLAB implementation.

²As determined by profiling our implementation.
5.5.3 Sampling from a Gaussian mixture model
Sampling from an n-dimensional Gaussian mixture model is reasonably straight-
forward. Firstly, one of the model components is selected at random. The distri-
bution used for this sampling is the set of component probabilities, {P(i) : i = 1, . . . , k}. A sample is then drawn from the selected component, which is described by its mean vector and covariance matrix \Sigma. If the component were aligned with the Cartesian axes, then
sampling from the component would be easy because its covariance matrix would
be diagonal and the dimensions would be independent: a set of n scalars could
be sampled from univariate normal distributions with variances corresponding
to each diagonal element of the covariance matrix. In general, components are
not aligned with the Cartesian axes, and so we must first diagonalise the compo-
nent’s covariance matrix. This is achieved by performing an eigen decomposition
which yields a matrix P , where each column is an eigenvector of the covariance
matrix. P represents the transformation needed to diagonalise the covariance
matrix. This is a Principal Components Analysis (PCA) [104]. The diagonalised
covariance matrix is given by:
    \Sigma_D = P^T \Sigma P.    (5.44)
An n-dimensional vector, sD, is then sampled from the diagonalised component,
using the procedure described above. This vector is then transformed back to the
original space by applying the inverse transformation to yield a sample, s′:
s′ = PsD. (5.45)
s′ is now in the space of our model, but is centred on the origin. We then translate
this sample, using the component’s mean vector, µ:
s = s′ + µ. (5.46)
The vector s is a sample from the distribution represented by the model.
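The whole sampling procedure can be sketched as follows. The names are ours, and the eigendecomposition is recomputed per draw purely for clarity; a practical implementation would cache it per component.

```python
import numpy as np

def sample_gmm(P, mus, Sigmas, n, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    dims = mus.shape[1]
    comps = rng.choice(len(P), size=n, p=P)   # select a component per sample
    out = np.empty((n, dims))
    for i, c in enumerate(comps):
        # Diagonalise the component's covariance (a PCA): Sigma = P Sigma_D P^T.
        vals, vecs = np.linalg.eigh(Sigmas[c])
        # Sample independent univariate normals along the principal axes...
        sD = rng.normal(0.0, np.sqrt(np.maximum(vals, 0.0)))
        # ...then rotate back (Eq. 5.45) and translate by the mean (Eq. 5.46).
        out[i] = vecs @ sD + mus[c]
    return out
```

For a single-component model, the sample mean and covariance of many draws recover the component's parameters.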
5.6 Learning from large datasets
Mammograms are digitised at high resolution, yielding images that contain a
few million pixels. It seems reasonable that, in order to learn the variation in
mammographic appearance, we will need to consider large quantities of data. We
have therefore considered how the k-means algorithm could be adapted to process
such volumes of data.
Jain et al. present a review of data clustering techniques where they discuss clus-
tering large datasets [102]. The most natural approach for problems where the
entire dataset cannot be stored in primary memory is the divide-and-conquer al-
gorithm, which is illustrated in Figure 5.4. This algorithm stores the full dataset,
D0, in a secondary memory (e.g. on hard disk or a large networked store) and ran-
domly divides it into p subsets, Si : i ∈ {1, · · · , p}, of equal size. Each Si is then
processed by a clustering algorithm, yielding clusters Ci,j, where j ∈ {1, · · · , k}.
Each cluster Ci,j then contributes a number of representative data points to form
a new data set, D1. Let there be Ni data points in subset Si and ni,j data points
in cluster Ci,j. The “probability” of Ci,j is ni,j/Ni. If there are to be η data points
contributed from each Si to D1, then cluster Ci,j contributes qj data points, where:
    q_j = \frac{\eta \, n_{i,j}}{N_i}.    (5.47)
(Clusters which represent more data contribute appropriately, so that one “type”
of data is not disproportionately represented in D1.)
If there are still too many data points in D1 for it to be clustered in primary
memory, the above process can be repeated to create D2, D3 and so forth. The
number of times the divide-and-conquer algorithm will be run is determined by
the initial number of subsets, p, and the number of data points contributed from
each subset, η.
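Equation 5.47 in code, with our own names; note that rounding each q_j to a whole number of points means the total contributed can, in general, deviate slightly from η.

```python
def contributions(cluster_sizes, eta):
    # cluster_sizes: n_{i,j} for each cluster C_{i,j} of subset S_i.
    Ni = sum(cluster_sizes)
    # q_j = eta * n_{i,j} / N_i (Eq. 5.47), rounded to whole data points.
    return [round(eta * nij / Ni) for nij in cluster_sizes]

# A subset of 1000 points clustered into three clusters, with eta = 100.
q = contributions([500, 300, 200], eta=100)
```

Larger clusters contribute proportionally more representatives, which is exactly the safeguard against one "type" of data dominating D1.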
In our work in Chapter 6, we set p and η such that D1 can be clustered within
primary memory (i.e. only one run through the divide-and-conquer algorithm is
required). Since the data sets {Si} need to be clustered as an intermediate step,
we use the non-iterative variant of the k-means algorithm on each of these, and
the iterative variant to yield the final clustering, from which the GMM parameters
are computed.
Figure 5.4: The divide-and-conquer clustering algorithm. This diagram illustrates how a large data set, D0, can be divided into smaller ones {Si} which can be clustered in primary memory. Each clustering then contributes some representative data to form a new data set, D1, which can be clustered in primary memory to yield a final clustering.
In much of the work presented in Chapter 6, we adopt the divide-and-conquer
approach. However, although this variant of the k-means algorithm allows GMMs
to be built from large datasets, we found there to be no appreciable difference
between models built using the divide-and-conquer algorithm and those built
simply by selecting a reasonable number of training points at random. The divide-
and-conquer method simply makes it unlikely that the GMM will be built from
biased data.
A better approach might be to implement an EM algorithm that can consider
large datasets. The inner loop of the EM algorithm is written as an iteration over
the data points; with an appropriate caching strategy, the EM algorithm extends
naturally to deal with large data sets. Unlike the divide-and-conquer variant of
the k-means algorithm, every data point would contribute to every model parameter.
5.7 Summary
This chapter presented an introduction to Gaussian mixture models and the mul-
tivariate normal distribution. In summary:
• Gaussian mixture models are a flexible solution to the density estimation
problem.
• Gaussian mixture model parameters can be learned from training data using
several approaches. This chapter described the k-means clustering and
Expectation-Maximisation algorithms.
• The k-means clustering algorithm is a simple and intuitive approach but
was not designed to fit Gaussian mixture models to data.
• The Expectation-Maximisation algorithm is a principled approach to learn-
ing Gaussian mixture model parameters from training data.
• The multivariate normal distribution has two useful properties: the marginal
and conditional distributions can be computed using closed-form solutions.
Further, the marginal and conditional distributions are themselves multi-
or univariate normal distributions.
• It is possible to sample from a multivariate normal distribution.
• These properties of the multivariate normal distribution can be used to
define equivalent operations on the multivariate Gaussian mixture model.
• The chapter described an approach to learning parameters for Gaussian
mixture models from large datasets using a variant of the k-means cluster-
ing algorithm. The Expectation-Maximisation algorithm can be trivially
extended to learn from large datasets.
Chapter 6
Modelling mammographic
texture for image synthesis and
analysis
6.1 Introduction
This chapter develops a generative parametric statistical model of stationary
texture. The model is based upon Efros and Leung’s non-parametric texture
synthesis method [60]. The chapter presents:
• Efros and Leung’s texture synthesis algorithm.
• A parametric model-based version of their algorithm that allows texture
analysis as well as synthesis.
• A way of synthesising textures using the parametric model and some exam-
ple synthetic images.
• A novelty detection method that allows the parametric model to be used
to analyse textures.
6.2 Background
Chapter 3 gave a brief overview of some computer-aided detection systems. These
generally attempt to emulate radiologists’ interpretation strategies using pattern
recognition. The approach is very common in computer-aided mammography,
but may not be the best way to approach the problem.
Instead of learning a classification rule that separates classes (e.g. malignant masses from benign masses), we should learn what pathology-free mammograms look like within a framework that allows illegal instances to be identified. If we can determine that the appearance of a particular mammogram is
unlikely—given it is supposed to be free of pathology—then we can label that
mammogram as being novel (perhaps leaving an expert to determine exactly why
it is novel). Another name for the approach is outlier detection.
To perform novelty detection on mammograms we need a model of the appearance
of pathology-free mammograms that can be used in an analytical mode. That is
to say that the model needs to be able to identify unlikely model instances by
assigning likelihoods (or similar measures) to model instances.
The appearance of entire mammograms is difficult to model due to the nature of
the imaging process and anatomical differences between women. In this chapter
we make the problem more tractable by assuming that mammograms are sta-
tionary (i.e. the statistics of the texture do not vary over the image plane and
so local appearance does not vary across the breast). This is certainly not true; for example, the appearance of the pectoral muscle differs significantly from that of a fatty breast region. The assumption nevertheless allows us to concentrate on a manageable part of the problem. Because we are assuming stationarity, we do not need to worry
about shape—our simplified mammogram is a texture on a potentially infinite
plane. We address the problem of modelling the appearance of entire breasts in
mammograms in Chapter 9.
When developing models of appearance, it is useful to be able to visualise in-
stances of the model—in our case to be able to generate synthetic examples of
mammographic textures. This will allow us to evaluate the generality and speci-
ficity of the model. Because this generative property is so useful, we make it a
requirement for our model.
Sajda et al. [159, 169] used wavelet coefficients, computed from mammographic
patches, to statistically model mammographic patches using a tree-structured
variant of a hidden Markov model. To synthesise an image, coefficients in finer
levels of the wavelet decomposition were sampled, conditional on those at coarser
levels. Subjectively, this approach was reasonably successful at capturing local
textural appearance of mammograms, and can be used in both generative and
analytical modes. The model was used to filter false positives produced by an-
other CADe system. Unfortunately, due to the way in which finer levels are
conditioned upon coarser ones, the synthetic images produced using the model
had an obvious grid structure corresponding to the coarsest level of the wavelet
decomposition. The approach is similar to De Bonet and Viola’s model of generic
textures that allows both synthesis and analysis [53]. We present a model of entire
mammograms that uses the hierarchical conditioning approach in Chapter 9.
Bochud et al. [19] developed an algorithm to generate mammographic texture
using a strategy that placed basis functions at locations within a white noise
image, according to a pdf. The white noise and kernel function were matched
to the power spectrum of real mammographic texture. The method produced
reasonable synthetic textures, but the images could easily be distinguished from
real examples. Brettle et al. [27] evaluated several methods for generating medical
textures (including mammographic textures) and found that the generic texture
synthesis method developed by Efros and Leung produced the best results [60].
Heine et al. modelled mammographic texture using a random field model [85].
The authors assumed that mammographic texture could be modelled by a con-
volution of a random field with a kernel function. The choice of the form of the
kernel was based in part on studies of the fractal nature of mammograms. The
parameter governing the kernel function was learned from real mammographic
data, as were the statistical characteristics of the random field. Subjectively, the
approach allowed reasonably realistic synthetic textures to be generated. The au-
thors analysed an obvious mass in a real mammogram by computing the random
field that would have been required to generate the image under their model.
The location of the mass was visible in the computed field, and the approach can be
viewed as an example of novelty detection.
In addition to methods developed specifically to model mammographic texture,
generic texture synthesis algorithms may also be useful for mammographic texture
synthesis (e.g. [53, 84, 144]). The work presented in this chapter extends the Efros
and Leung algorithm to a parametric statistical setting, which allows the method
to be used to generate synthetic textures and perform novelty detection.
6.3 Non-parametric sampling for texture synthesis
Efros and Leung describe a method of replicating texture, based upon non-parametric sampling [60]¹. Their algorithm is based upon an idea from Shannon's paper that introduced information theory [163]: it achieves texture synthesis by extending Shannon's one-dimensional Markov chain approach to producing English-looking text to the image plane. Though the method is
simple, it produces some of the best results in the texture synthesis literature.
Assume an image containing a sample of the texture one wishes to replicate.
This source image, IS, is simply a matrix of grey-level values. The aim is to fill an
unpopulated target matrix, IT , with grey-level pixel values, such that the textures
in the two images are similar (but not identical). Their algorithm is described in
Algorithm 5.
The method essentially has two parameters: the window size and the number
1Efros and Leung’s method was developed independently of a similar method presented in[68].
Algorithm 5 Efros and Leung's texture synthesis algorithm.

▷ Select a region from IS and insert it at some location in IT. For example, any 3 × 3 pixel section could be used.
▷ Initially, let the set S = ∅.
for each pixel p ∈ IS do
    ▷ Extract a square window, s, of size w × w, centred around p.
    ▷ Add s to S.
end for
repeat
    ▷ Compute a list, U, of unpopulated pixel locations that are 8-connected to the populated area of IT.
    ▷ Randomly choose a pixel location, u ∈ U.
    ▷ Examine a vector t, formed from a square window of size w × w pixels, centred on u. Some dimensions (pixels) will be populated and some will not.
    ▷ Find a small set of windows, S′, from S that are similar to t.
    ▷ Randomly select one of the windows from S′ and place its centre pixel value into IT at location u.
until all the pixels in IT have been populated.
of similar windows to place into S ′. The first parameter is important for good
texture synthesis and is a function of the actual texture. The authors say that
the window size should be selected to be similar in size to the largest repeating
feature in the texture. The second parameter is automatically adapted by finding
the distance, δ, to the most similar window, and then including all windows in
S ′ that are within a radius of (1 + ε)δ from t (Efros and Leung set ε to 0.1 [60]).
The method also requires a metric that measures window similarity and takes
into consideration the missing (unpopulated) pixels from t. The authors use a
normalised sum of squared differences metric, which is weighted by a Gaussian
kernel to give more weight to the pixels near the centre of the window and hence
encourage local similarity. The authors also trivially extend the method to work
with colour images, although this is not useful for a mammography application.
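The matching step can be sketched as follows, assuming the candidate windows are stored as flattened vectors. The names are ours, and the Gaussian kernel width is our own choice rather than a value taken from [60]; the mask zeroes out the unpopulated pixels so that only populated dimensions contribute to the (normalised) distance.

```python
import numpy as np

def match_windows(S, t, mask, w, eps=0.1):
    # S: candidate windows, one flattened w*w vector per row; t: the
    # partial target window; mask: True where t is populated.
    sigma = w / 6.4                       # kernel width (our choice)
    ax = np.arange(w) - w // 2
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2)).ravel()
    wgt = g * mask
    wgt = wgt / wgt.sum()                 # normalise over populated pixels only
    d = ((S - t) ** 2 * wgt).sum(axis=1)  # Gaussian-weighted, normalised SSD
    # S' is every window within a radius of (1 + eps) * delta of t.
    return np.flatnonzero(d <= (1.0 + eps) * d.min())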
Efros and Leung’s method does not consider texture analysis, but Efros claims
that it could be achieved using k-nearest neighbour classification2. Such methods
are problematic for two reasons: populating high-dimensions spaces is impractical
[14] and finding the k-nearest neighbours is computationally demanding.
In a subsequent paper [59], Efros and Freeman address the run-time efficiency
of the texture synthesis algorithm, proposing that instead of populating IT pixel-
by-pixel, the texture is synthesised patch-by-patch. Although the fundamental
idea of conditional sampling was preserved, the non-parametric approach meant
that Efros and Leung’s algorithm had to be significantly modified and a different
similarity measure and sampling method were used.
6.4 A generative parametric model of texture
In this section we propose an approach that unifies Efros and Leung’s and Efros
and Freeman’s methods within a parametric statistical framework. Our method
will not only enable novelty detection to be performed using statistical inference,
but will address two of the problems of the non-parametric approaches: their
sampling methods do not truly reflect the statistical distribution of window ap-
pearance and the time complexity of the synthesis algorithms are a function of
the number of pixels in IS.
Our method is similar to those presented by Popat and Picard [142] and Grim and
Haindl [74] in that we use a parametric model of the distribution of local textural
²Personal communication.
appearance. However, Popat and Picard’s model had a hierarchical configuration
which was designed to capture overall texture structure. Grim and Haindl also
modelled the distribution of local textural appearance, but a number of such
models were used, one for each decorrelated component of the colour space.
We address the first of the above problems by using an explicit representation of
the distribution of the appearance of the windows. We address the second prob-
lem by moving the burden of iterating over the “training” set to a training stage,
meaning that the computational complexity of image synthesis and analysis be-
comes a function of the model parameters and is largely unrelated to the number
of training points. We also address building the model from large training sets.
Our method assumes a training set of a number of images containing examples of
the same texture. For each pixel in the training set, we extract a window of
size w × w, where w is odd, centred on that pixel. Windows that overlap the
border of their image are discarded. The pixels in each window are concatenated
so that the windows may be considered as points in a high-dimensional space. We
seek to model the distribution of these points. The divide-and-conquer algorithm
(described in Section 5.6) and the k-means algorithms are used to build a GMM
of the distribution. We use the fast non-iterative variant of k-means for the
first stage of the divide-and-conquer algorithm, and then the iterative variant to
produce the final clustering, and hence the model. We have also built models
using the EM algorithm for Gaussian mixtures.
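The window-extraction step can be sketched with NumPy's sliding_window_view (available from NumPy 1.20), which by construction yields only windows fully inside the image, matching the discarding of border windows described above. The names are ours.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_windows(image, w):
    # Every w-by-w window fully inside the image becomes one point in a
    # w^2-dimensional space; windows overlapping the border never appear.
    assert w % 2 == 1, "w must be odd so each window has a centre pixel"
    return sliding_window_view(image, (w, w)).reshape(-1, w * w)

img = np.arange(36, dtype=float).reshape(6, 6)
X = extract_windows(img, 3)   # 16 windows, each a 9-dimensional point
```

The rows of X are the training points from which the GMM of local textural appearance is built.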
We now describe how the model can be used to generate new examples of the
modelled texture.
6.5 Generating synthetic textures
We have developed two algorithms for texture synthesis, which are parametric
analogues of the non-parametric methods used by Efros and Leung and Efros and
Freeman. Each assumes a Gaussian mixture model of a particular class of texture
parameterised by Θ, as described in Section 6.4. The algorithms are presented in
Algorithm 6 and Algorithm 7. Like the Efros and Leung and Efros and Freeman
algorithms, we define a target image, IT , whose pixels are initially unpopulated.
We describe the two algorithms in the following sections.
6.5.1 Pixel-wise texture synthesis
Algorithm 6 is analogous to the Efros and Leung synthesis algorithm.
6.5.2 Patch-wise texture synthesis
Algorithm 7 is analogous to the Efros and Freeman synthesis algorithm. The
algorithm is the same as for the pixel-wise algorithm, except the marginalisation
step is removed and the sample from the conditional model contains pixel values
for the remainder of the window, rather than individual pixels. Because the
image is filled patch-by-patch, synthesis is performed significantly faster than by
the pixel-wise algorithm.
Algorithm 6 Pixel-wise texture synthesis with a Gaussian mixture model of local textural appearance.

▷ Select a region from one of the training images and insert it at some location in IT. For example, any 3 × 3 pixel section could be used.
repeat
    ▷ Compute a list, U, of unpopulated pixel locations that are 8-connected to the populated area of IT.
    ▷ Randomly choose a pixel location, u ∈ U.
    ▷ Examine a vector t, formed from a square window of size w × w pixels, centred on u. Some dimensions (pixels) will be populated and some will not.
    ▷ Marginalise the GMM over all dimensions corresponding to unpopulated pixels (not including the centre pixel), as described in Section 5.5.1. This yields a Gaussian mixture model parameterised by Θ′.
    ▷ Condition the model with parameters Θ′ on the values of the populated pixels, at the corresponding dimensions, as described in Section 5.5.2. This yields a Gaussian mixture model parameterised by Θ∗. This conditional model represents the likely distribution of pixel values for the centre pixel, given the local populated pixels.
    ▷ Sample a value, p, from the model parameterised by Θ∗.
    ▷ Insert the value p into IT at location u.
until all the pixels in IT have been populated.
Algorithm 7 Patch-wise texture synthesis with a Gaussian mixture model of local textural appearance.

. Select a region from one of the training images and insert it at some location in IT. For example, any 3 × 3 pixel section could be used.
repeat
  . Compute a list, U, of unpopulated pixel locations that are 8-connected to the populated area of IT.
  . Randomly choose a pixel location, u ∈ U.
  . Examine a vector t, formed from a square window of size w × w pixels, centred on u. Some dimensions (pixels) will be populated and some will not.
  if the window overlaps the edge of the image then
    . Marginalise the model over the dimensions that lie outside IT.
  end if
  . Condition the Gaussian mixture model on the values of the populated pixels, at the corresponding dimensions, as described in Section 5.5.2. This yields a Gaussian mixture model parameterised by Θ∗. This conditional model represents the likely distribution of pixel values for the unpopulated pixels in the window around the pixel with location u.
  . Sample a vector, p, from the model parameterised by Θ∗. (In the case that Θ∗ represents a univariate model, p will be a scalar.)
  . Insert the values in p into IT to populate the remainder of the window centred on location u.
until all the pixels in IT have been populated.
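The only new sampling machinery the patch-wise variant needs is a draw from a multivariate (conditional) Gaussian component, which can be done with a Cholesky factor. The parameters below are hypothetical stand-ins for one conditioned component of Θ∗, chosen only to be valid:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conditional model for the 4 unpopulated pixels of a window.
mu = np.array([0.4, 0.5, 0.45, 0.55])
cov = np.array([[0.02, 0.01, 0.0, 0.0],
                [0.01, 0.02, 0.01, 0.0],
                [0.0, 0.01, 0.02, 0.01],
                [0.0, 0.0, 0.01, 0.02]])

# Draw the whole patch in one step: mu + L z, with L the Cholesky factor
# of the covariance and z standard normal.
L = np.linalg.cholesky(cov)
p = mu + L @ rng.standard_normal(4)
```

This is why the patch-wise algorithm is fast: one draw fills several pixels at once, instead of one conditioning-and-sampling pass per pixel.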
6.5.3 The advantages and disadvantages of a parametric statistical approach
As described in Section 6.4, the Efros and Leung and Efros and Freeman
sampling methods do not truly reflect the statistical distribution of window
appearance. The parametric model that we propose can easily be sampled in a
principled manner.
Because we have taken a statistical approach, we have been able to address the
run-time efficiency of the pixel-wise algorithm, providing a natural extension
of the method to a patch-wise algorithm. Thus, our approach is a parametric
generalisation of the Efros and Leung and Efros and Freeman algorithms.
As we shall see in Section 6.7, the parametric approach allows us to perform
novelty detection by assigning a likelihood to each pixel, which would be
problematic and computationally expensive using a non-parametric approach.
One can view the aim of texture synthesis as follows: to produce original examples
of an existing texture, which are both specific and general—i.e. the generated
textures are similar to, and span the range of, the textural appearance observed
in the set of example textures. The non-parametric methods use direct copying of
pixel values in an attempt to achieve specificity and ad hoc sampling methods in
an attempt to achieve generality. In contrast, the parametric method we propose
attempts to achieve both specificity and generality by sampling from the learned
distribution.
A potential drawback of the parametric approach is that synthesised pixels or
patches will not necessarily exist in the training set (they are sampled from a
model, rather than copied from legal examples). While illegal pixel values or
patches cannot be synthesised by the non-parametric approach, they are simply
unlikely under the parametric method.
6.6 Some texture models and synthetic textures
Having described the theory behind our parametric texture model, we now show
synthesis results for two mammographic textures. The first is a “fractal”3
texture which is generated using a simple procedure. This texture is similar to
mammographic parenchymal patterns. Unlike real mammographic texture, these
fractal textures are stationary, and so the key assumption of our model is well
matched to the properties of the texture. The fractal texture served as a useful
“sanity check” while developing the model. The second set of textures comprises
regions taken from real digitised mammograms.
6.6.1 A model of fractal mammographic texture
The recursive procedure for generating the fractal textures is shown in
Algorithm 8 (implementation by Arjun Viswanathan). For the training images used
to build the model presented in this section, the initial size of the image was
4 × 4 pixels, and the algorithm was run until the image was 256 × 256 pixels.
Example training textures are shown in Figure 6.2.

Algorithm 8 Fractal mammographic texture algorithm.

. An n × n grey-scale image matrix is initialised with random pixel values, sampled uniformly on [0, 1].
repeat
  . The function underlying the image is interpolated to form a new image matrix with four times the number of pixels (i.e. each pixel in the previous image corresponds to four pixels in the new image).
  . Each pixel value is perturbed by adding uniform random noise, sampled uniformly on [0, 1] and scaled by 2^(−i), where i is the iteration number.
until the image reaches a predefined size.

3 We refer to this type of texture as fractal-like because of the generation process, which involves applying the same algorithm at a number of scales. It is the generative process that is self-similar, rather than the final texture.
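As a concrete illustration, Algorithm 8 can be sketched in a few lines of NumPy. The nearest-neighbour upsampling here is a simplifying assumption (the thesis interpolates the underlying image function, and the exact interpolation scheme is not reproduced here):

```python
import numpy as np

def fractal_texture(n0=4, target=256, seed=None):
    """Sketch of Algorithm 8: repeatedly upsample and perturb."""
    rng = np.random.default_rng(seed)
    img = rng.uniform(0.0, 1.0, size=(n0, n0))
    i = 0
    while img.shape[0] < target:
        i += 1
        # Quadruple the number of pixels (nearest-neighbour upsampling
        # here; the thesis interpolates the underlying image function).
        img = np.kron(img, np.ones((2, 2)))
        # Perturb each pixel with uniform noise scaled by 2^(-i).
        img = img + rng.uniform(0.0, 1.0, size=img.shape) * 2.0 ** -i
    return img

tex = fractal_texture(seed=0)
```

Because the noise is scaled down geometrically at each level, the large-scale structure is laid down first and finer detail is superimposed at each doubling, which is what gives the texture its self-similar character.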
A Gaussian mixture model of the fractal texture was built using the approach
described in Section 6.4, using 10 training textures generated by 10 runs of
Algorithm 8, a window size of 11 × 11 and 50 model components (though 10 were
discarded by the k-means algorithm due to weak support). Some unconditional
sampled patches are shown in Figure 6.1. Examples of synthetic textures
generated from the model using both pixel- and patch-wise sampling (as described
in Algorithm 6 and Algorithm 7) are shown in Figure 6.2.
6.6.2 A model of real mammographic texture
A Gaussian mixture model of the real texture was built using the approach
described in Section 6.4, using 10 training textures that were manually selected
from the Digital Database for Screening Mammography [83] to represent the range
of real mammographic textural variation, a window size of 11 × 11 and 50 model
components (again, 10 were discarded by the k-means algorithm due to weak
support). Some unconditional sampled patches are shown in Figure 6.3. Examples
of synthetic textures generated from the model using both pixel- and patch-wise
sampling (as described in Algorithm 6 and Algorithm 7) are shown in Figure 6.4.

Figure 6.1: Unconditional samples from the fractal model. 196 samples from the model of fractal texture. For this figure, all model components were equally likely to be sampled from.

Figure 6.2: Fractal training and synthetic textures. Top row: Three training images. Middle row: Synthetic textures produced using the pixel-wise algorithm. Bottom row: Synthetic textures produced using the patch-wise algorithm.

Figure 6.3: Unconditional samples from the real mammographic texture model. 196 samples from the model of real mammographic texture. For this figure, all model components were equally likely to be sampled from.
6.6.3 The quality of the synthetic textures
Figure 6.4: Real training and synthetic textures. Top row: Three training images. Middle row: Synthetic textures produced using the pixel-wise algorithm. Bottom row: Synthetic textures produced using the patch-wise algorithm.

It is relatively easy to qualitatively assess the quality of the synthetic
textures, simply by comparing them to the training textures (a more quantitative
evaluation is described in Section 7.2). In the case of the fractal texture, it is subjectively
clear that the pixel-wise method produces very good synthetic textures, while the
patch-wise method produces much less convincing results. Except for the fractal
mammographic texture, the synthetic textures that we have generated using the
patch-wise algorithm have been subjectively very similar to those produced by
the pixel-wise algorithm (where the same model was used by each algorithm). It
is not clear why the fractal patch-wise textures are so poor, but detailed work to
determine this is beyond the scope of this thesis.
In the case of the real mammographic textures, both the pixel-wise and patch-wise
methods produce reasonable results, but the synthetic images are easily distin-
guishable from the training images. The synthetic real mammographic textures
do capture the local textural appearance of the training images, but the overall
appearances are subjectively quite dissimilar. The most likely explanation for
this is that structure exists in mammograms on a number of levels; the texture
will be determined by local tissue type (e.g. glandular, fatty) and higher-level
structure (such as a duct). The high-level structure breaks the assumption of
stationarity.
It is possible for two areas in synthetic images to develop independently before
ultimately converging. If these two areas have different textural appearances, then
pixels or patches that are synthesised where the areas meet are forced to merge
one type of textural appearance into another. This can cause a discontinuity. We
have not investigated strategies that may prevent this behaviour, but such work
may yield better synthetic textures. More extreme examples of this type of
failure are shown in Figure 6.5.

Figure 6.5: Examples of synthesis failure using patch-wise synthesis with a model of real mammographic appearance.

When the texture appears to have been adequately
modelled, we estimate that failures of this type occur in fewer than 1 in 20
attempts. Because all parts of the appearance space have non-zero density, it is
also possible for the synthesis procedure to transition to and “get stuck” in a part
of the appearance space which is illegal. This results in incorrect texture being
generated. The frequency of such failure may be reduced by learning a “better
model”—this is possible because the k-means and EM algorithms converge to
locally optimal solutions. Determining if a particular model is the best is still
an open research question. We estimate that failures of this type occur in fewer
than 1 in 10 attempts.
6.6.4 Time and space requirements of the parametric method
The time required to build a parametric model of textural appearance depends
upon the number and size of the training images. Using all the training windows
in a set of training images with the divide-and-conquer algorithm is computa-
tionally expensive. For example: building a model of 10 images, each an average
of 300 × 300 pixels, can take a few days. Using a subset of the training set
containing 20 000 training windows and building the model using the EM
algorithm, a model can be built in around 12 hours5.
Storing the model is trivial on a modern workstation. The models of fractal and
real mammographic texture have—by coincidence—the same number of model
components, and each uses 11 × 11 pixel windows. Encoding such a model requires
storing 40 component probabilities, mean vectors and covariance matrices. The
matrix of mean vectors has 40 × 121 elements and each covariance matrix has
121 × 121 elements. We therefore need to store 40 + (40 × 121) + 40 × (121 × 121) =
590 520 parameters. If double precision representation (IEEE Standard 754 [100])
is used to encode these parameters, then each parameter requires 8 bytes, and so
the model can be stored in 4 724 160 bytes—less than 5 MB—without compression.
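The storage arithmetic can be checked directly:

```python
# 40 mixture components over 11 x 11 = 121-dimensional windows.
components, d = 40, 11 * 11

# One weight, one mean vector and one covariance matrix per component.
params = components + components * d + components * d * d
bytes_needed = params * 8  # 8 bytes per IEEE 754 double-precision value

assert params == 590_520
assert bytes_needed == 4_724_160  # just under 5 MB
```
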
Storing this model in an uncompressed form consumes more space than storing
the original images (since pixel values are usually represented using relatively low
precision and compression is commonly used). However, because the size of the
model is fixed, synthesis or analysis of each pixel can essentially be performed in
O(1) time, while the non-parametric methods are required to iterate over each
possible window in the “training” set. Marginalisation is computationally cheap,
while computing a conditional distribution—which must be done for each pixel
or patch—is relatively expensive. Profiling reveals that approximately 98% of the
parametric synthesis algorithms’ time is spent computing Moore-Penrose gener-
alised inverses. Using our optimised implementation (see Section 5.5.2), each
5 These figures are for a workstation with a 1.3 GHz Intel Pentium 4 processor with 512 MB of physical memory.
pixel or patch takes approximately 0.22 seconds to generate on a computational
server with a 2.8GHz Intel Xeon Hyperthreaded processor with 2GB of physical
memory6. Using the pixel-wise algorithm, a 300 × 300 pixel image can be syn-
thesised in 5.5 hours, while an image of the same size can be synthesised in a few
minutes using the patch-wise algorithm.
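The pixel-wise timing follows directly from the per-pixel cost:

```python
# 300 x 300 pixels at roughly 0.22 s per pixel (pixel-wise synthesis).
pixels = 300 * 300
seconds_per_pixel = 0.22
hours = pixels * seconds_per_pixel / 3600.0
assert abs(hours - 5.5) < 1e-9  # matches the 5.5 hours quoted above
```
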
6.7 Novelty detection
Because we have an explicit statistical model of the appearance of local texture,
it is possible to assign likelihoods to pixels, based upon a local neighbourhood.
Pixels marked as unlikely should be interpreted as being novel. The novelty
detection algorithm is very similar to the pixel-wise synthesis algorithm and is
described in Algorithm 9. We assume an unseen image IU which may contain
texture that is not of the modelled class and a model of the expected texture
with parameters Θ. We will form an image of log-likelihoods IL and a binary
image indicating novel pixels IB.
Note that the likelihood of an observed event is the probability the model would
have assigned to that event before it occurred (i.e. probabilities refer to
future events, while likelihoods refer to events that have been observed). In
order to compute true likelihoods (or log-likelihoods), the pdf defined by the
conditional model would need to be integrated between suitable limits. Since the
conditional pdf—given by pΘ∗(x)—is univariate, this is relatively simple:
L(a) = ∫_{a−r−}^{a+r+} pΘ∗(x) dx (6.1)
6 This machine was shared with one other large computational job.
Algorithm 9 Novelty detection using a Gaussian mixture model of texture.

for each pixel location p ∈ IU do
  . Extract a square window, of size w × w, represented as a vector t, centred on the pixel at location p.
  if the window overlaps the edge of the image then
    . Marginalise the model over the dimensions that lie outside IU.
  end if
  . Condition the model upon all values in t, except for the centre pixel. Let the resulting univariate model be parameterised by Θ∗.
  . Compute the log-likelihood, l, of the centre pixel value under the model parameterised by Θ∗ (see the text for more details).
  . Assign l to the pixel at location p in IL.
end for
. In addition to the log-likelihood image, produce a binary image IB which identifies novel pixels using a threshold on the log-likelihoods (e.g. learned using an independent training set).
where r− and r+ delimit the event and may be estimated from the expected noise
on the pixel value a. If we assume that the noise is constant then we may set
r− = r+ = ∆/2 (where ∆ defines a region around the pixel value a). If ∆ is
suitably small, then Equation 6.1 can be approximated by

L(a) ≈ ∆pΘ∗(a). (6.2)
We can use the conditional density at a as a proxy for the likelihood estimated
by Equation 6.2, as it is simply a scaling of Equation 6.2. If actual likelihoods
are required (for example by another system), then IL can be scaled using the
estimate of ∆. In the novelty detection work in this thesis, we use the conditional
density at a as our likelihood measure.
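To illustrate why the conditional density is an adequate proxy, the following sketch compares the exact integral of Equation 6.1 with the approximation of Equation 6.2 for a single hypothetical Gaussian conditional model (a conditional GMM would simply be a weighted sum of such terms; the values of mu, sigma, a and delta are made up):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 0.5, 0.1   # hypothetical univariate conditional model
a, delta = 0.52, 1e-3  # pixel value and assumed noise width

# Equation 6.1: integrate the pdf over [a - delta/2, a + delta/2].
exact = normal_cdf(a + delta / 2, mu, sigma) - normal_cdf(a - delta / 2, mu, sigma)
# Equation 6.2: density at a, scaled by delta.
approx = delta * normal_pdf(a, mu, sigma)

assert abs(exact - approx) / exact < 1e-4  # agreement for small delta
```

Since ∆ is a constant, thresholding the density at a and thresholding the approximate likelihood are equivalent, which is why the density suffices for novelty detection.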
It is natural to ask if an analogue of the patch-wise synthesis algorithm could be
used to efficiently perform novelty detection. Unfortunately, the density at any
point in a high dimensional space is vanishingly small, so a patch-wise novelty
detection algorithm cannot be used7.
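The per-pixel log-likelihood of Algorithm 9 amounts to evaluating the log-density of the centre pixel value under the univariate conditional mixture. A minimal sketch, with made-up mixture parameters standing in for Θ∗:

```python
import math

def log_likelihood(x, weights, means, variances):
    """Log-density of a pixel value under a univariate Gaussian mixture
    (the conditional model parameterised by Theta*)."""
    terms = [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    mx = max(terms)  # log-sum-exp for numerical stability
    return mx + math.log(sum(math.exp(t - mx) for t in terms))

# A pixel value close to a component mean scores higher than an
# outlying value, which is what flags the outlier as novel.
w, m, v = [0.7, 0.3], [0.4, 0.8], [0.01, 0.02]
assert log_likelihood(0.4, w, m, v) > log_likelihood(0.0, w, m, v)
```
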
6.8 Summary
This chapter presented a parametric statistical model of stationary texture, de-
veloped from the texture synthesis algorithm of Efros and Leung. In summary:
• Efros and Leung’s texture synthesis algorithm was described.
• A parametric version of Efros and Leung’s algorithm was developed.
• Two methods of generating synthetic textures using the model were de-
veloped. Our model can be viewed as a parametric generalisation of the
methods of Efros and Leung and Efros and Freeman.
• Synthetic textures that were generated using our model were presented and
discussed.
• A novelty detection method was developed that allows the parametric model
to be used to analyse textures.
7 It may be possible to compute a few likelihoods at once, as a compromise between the two extremes, but this has not been investigated further.
Chapter 7
Evaluating the texture model
7.1 Introduction
This chapter presents an evaluation of the parametric texture model for texture
synthesis and analysis. The chapter describes:
• A psychophysical evaluation of synthetic mammographic textures produced
using the model.
• An evaluation of how well the model can detect abnormal features in sim-
ulated and real mammographic images.
7.2 Psychophysical evaluation of synthetic textures
It is relatively easy to make a personal qualitative assessment of whether a pair of
textures are similar or not. However, this approach is subjective and qualitative;
an objective and quantitative approach is preferred. Few of the most frequently
cited papers in the texture modelling and synthesis literature present any such
evaluation (e.g. [53, 59, 60, 84, 144]). Little rigorous evaluation appears to be at-
tempted. Brettle et al. evaluated methods for synthesising textures from medical
images (including mammographic textures) using several texture measures [27].
The synthetic images generated by Efros and Leung’s original method [60] were
found to be most realistic. Although texture features provide an objective and
quantitative measure of textural properties, the best systems available to compare
textures are evolved biological vision systems, such as the human visual system.
Psychophysical experiments can allow the human visual system to be used ob-
jectively and quantitatively. We now present a psychophysical experiment that
evaluates the synthetic textures produced using our model.
7.2.1 Aims
The primary aim of this experiment was to determine if textures generated using
the parametric model of local texture can be differentiated from examples of
the real texture. The secondary aim was to compare synthetic textures generated
using the parametric model to those generated using Efros and Leung’s method.
Since the patch-wise synthetic images can easily be differentiated from the real
textures, we restrict ourselves to the pixel-wise images. We can therefore state
our experimental hypotheses:
1. Synthetic fractal mammographic textures generated by the parametric model
are indistinguishable from real fractal mammographic textures.
2. Synthetic real mammographic textures generated by the parametric model
are indistinguishable from real mammographic patches.
3. Synthetic fractal mammographic textures generated by the parametric model
are more like real fractal mammographic texture than those produced using
Efros and Leung’s method.
4. Synthetic real mammographic textures generated by the parametric model
are more like real mammographic patches than those produced using Efros
and Leung’s method.
7.2.2 Method
The forced-choice paradigm is well-suited to the process of comparing a pair of
textures. Participants were presented with a series of three textures which we
shall call Image A, Image B and a reference image.
In the case of experiments 1 and 2, the reference image was an example of a real
texture selected from the training set, Image A was a synthetic texture generated
using the parametric model and Image B was an example of a real texture selected
from the training set (though different to the reference image).
In the case of experiments 3 and 4, the reference image was an example of a real
texture selected from the training set, Image A was a synthetic texture generated
using the parametric model and Image B was a synthetic texture generated using
Efros and Leung’s method. Note that there was an important difference between
the way that the Image A synthetic textures and Image B synthetic textures
were generated in experiments 3 and 4: Image A textures were generated from a
model that was trained on a number of mammographic images while each Image
B texture was generated from a single “training” image using Efros and Leung’s
method. This was necessary because the Efros and Leung algorithm scales poorly
with the number of “training” pixels. It was expected that this would result in
the Image B textures being highly specific to the image they were generated
from, and that they would appear more consistent and “plausible”.
For each set of three images, all were of the same class of texture. The images
were arranged in a row with the reference image in the centre. Image A would
appear to the left of the reference image with probability 0.5, and the position of
Image B was set accordingly. Trials corresponding to the four hypotheses were
presented in a random order, so that participants could not easily guess the exact
experimental design and introduce bias into their responses. The participants
were asked to compare Image A and Image B to the reference image and choose
the one they thought to be most similar to the reference image.
Each of the three images could be drawn from a set of 10 images (e.g. there were
10 synthetic real mammographic textures generated using our method, 10 synthetic
fractal textures generated using our method and 10 real mammographic images).
The images in the training
sets—and hence the reference image sets—were manually selected such that the
sets represented a broad range of textural appearance for the class of texture
being investigated. No synthesised images were excluded (e.g. on the basis that
synthesis failure occurred). Each participant was shown 10 image sets for each
experiment. The number of images of each type was limited by the computational
time required to synthesise the four classes of synthetic image and the number of
images presented to each participant was selected such that the experiment could
be completed within a reasonably short space of time (approximately 5 minutes).
Ideally, the experiment would have been conducted using more reference and
synthesised textures. This would minimise the probability of the participants
seeing the same images (or combination thereof) and could more accurately reflect
the distribution of the various “types” of textural appearance. However, we
believe that the design achieves these aims to the maximum extent possible under
the time constraints imposed by the synthesis algorithms and the participants’
patience.
The experiment was implemented as an Internet-based application, delivered via
an XHTML [138] interface. Image A and Image B were hyperlinks that reported
the participants’ choices to the application. The names of the image files were
disguised as “random” strings of text, so that web browser software could not
disclose the “correct” image by displaying the image filenames on-screen. The
hyperlink encoding of the participants’ selections was similarly disguised. The re-
sponses were recorded in a database upon completion of the experiment (i.e. only
results from those who completed the experiment were recorded). A screenshot
of one of the trials is shown in Figure 7.1. The number of times Image A and
Image B were chosen was recorded for each participant, allowing χ2 analysis by
pooling all participants.

Figure 7.1: A screenshot of one of the trials.
The experiment was run twice. The aim of the first run was to test the application
using a small number of participants. The experiment was advertised in an email
to all members of the division of Imaging Science and Biomedical Engineering
at the University of Manchester. The first run of the experiment attracted 24
participants. The aim of the second run was to get as many people as possible
to take part. The experiment was advertised to all students—undergraduates
and postgraduates—of the University of Manchester via email. The second run
of the experiment attracted 1 777 participants. Participants were therefore self-
selecting, and we did not control for factors such as age and sex.
Experiment   Image A selection (small run)   Image A selection (large run)
1            29% (of 240 trials)             34% (of 17 770 trials)
2            27% (of 240 trials)             28% (of 17 770 trials)
3            38% (of 240 trials)             41% (of 17 770 trials)
4            25% (of 240 trials)             26% (of 17 770 trials)

Table 7.1: Results for the psychophysical experiment. Row 1: Synthetic fractal textures generated by our model versus real fractal textures. Row 2: Synthetic real mammographic textures generated by our model versus real mammographic textures. Row 3: Synthetic fractal textures generated by our model versus those generated using Efros and Leung's algorithm. Row 4: Synthetic real mammographic textures generated by our model versus those generated using Efros and Leung's algorithm.
7.2.3 Results
Results for the small and large runs are shown in Table 7.1. The table shows
the number of “votes” for Image A (the synthetic textures generated using our
parametric model) as a percentage of the total for each of the four experimental
conditions (see Section 7.2). Image B was selected more often in all cases,
and this result is statistically significant at the 95% confidence level.
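As an illustration of the pooled χ2 analysis, the following sketch reconstructs approximate counts for experiment 1 (large run) from the reported percentage and compares the test statistic against the 1-degree-of-freedom critical value. The counts are approximate reconstructions from the rounded percentages, not the raw data:

```python
# Experiment 1, large run: 34% of 17 770 trials chose Image A.
n = 17_770
observed_a = round(0.34 * n)  # approximate count of Image A choices
observed_b = n - observed_a
expected = n / 2              # null hypothesis: no preference

chi_sq = ((observed_a - expected) ** 2 / expected
          + (observed_b - expected) ** 2 / expected)

# 95% critical value for chi-squared with 1 degree of freedom is 3.841,
# so a 34%/66% split over this many trials is highly significant.
assert chi_sq > 3.841
```
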
7.2.4 Discussion
In experimental conditions 1 and 2, we would have liked Image A to have been
chosen 50% of the time. This would have suggested that participants could not
differentiate between the real examples of the two texture classes and those gener-
ated by the parametric method. The results indicate that participants were able
to differentiate between the real and synthetic textures, but the synthetic images
were realistic enough that the participants mistook them for the real textures
about a third of the time. Subjectively, the simulated fractal mammographic
textures appear to be modelled more successfully than the real mammographic
texture, and the results support this observation.
In experimental conditions 3 and 4, we would have liked Image A to have been se-
lected in preference to Image B (i.e. more than 50%). This would have suggested
that participants thought that the synthetic images generated using the paramet-
ric model were more like the real textures than the synthetic images generated
using Efros and Leung’s method. The results indicate that participants were able
to differentiate between the synthetic images generated using the two methods.
As suspected, participants favoured the synthetic images generated using Efros
and Leung’s method, but the images generated using the parametric method were
preferred in 41% (fractal mammographic texture) and 26% (real mammographic
texture) of cases. However, this experimental condition was heavily biased in
favour of Efros and Leung’s method because of the difference in the way that the
training set was utilised by the two methods. Efros and Leung’s method produces
more specific textures but it cannot be used to analyse textures.
7.3 Initial validation of the novelty detection method
In order to use the novelty detection method developed in Chapter 6, we need to
be confident that it can perform texture discrimination. To validate the method,
a simple experiment was performed.
Figure 7.2: Fractal and scrambled textures. A fractal texture is shown on the left. The right-hand texture is the left-hand fractal texture after being scrambled. The grey-level histograms of both textures are identical.
7.3.1 Aim
The aim of this experiment was to determine whether the novelty detection method
can discriminate between two textures with similar characteristics.
7.3.2 Method
A fractal mammographic image was generated. A second image was generated
from the first by scrambling the pixel locations. The resulting image has exactly
the same histogram as the fractal image, but has a different texture. An example
is shown in Figure 7.2. Log-likelihood images were generated for each image by
applying Algorithm 9. The log-likelihood values obtained for each image were
then compared.
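Generating such a scrambled counterpart is straightforward. The sketch below uses a uniform random image as a stand-in for the fractal texture:

```python
import numpy as np

rng = np.random.default_rng(0)
texture = rng.uniform(size=(64, 64))  # stand-in for a fractal texture
scrambled = rng.permutation(texture.ravel()).reshape(texture.shape)

# Scrambling only permutes pixel locations, so the grey-level
# histograms (and hence the sorted pixel values) are identical...
assert np.array_equal(np.sort(texture.ravel()), np.sort(scrambled.ravel()))
# ...but the spatial structure, and therefore the texture, differs.
assert not np.array_equal(texture, scrambled)
```

This construction guarantees that any discrimination achieved by the model is due to local spatial structure rather than first-order intensity statistics.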
Figure 7.3: ROC curve for texture discrimination.
7.3.3 Results
Figure 7.3 shows a ROC curve that was generated by varying a threshold on the
log-likelihood values to classify pixels as belonging to the fractal image or the
scrambled image. The ROC curve shows excellent discrimination (Az = 0.98).
Analysis of the log-likelihood histograms shows that pixels in the scrambled image
are considered to be less likely than those in the fractal image.
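Az can be computed as the probability that a randomly chosen pixel from the fractal image receives a higher log-likelihood than one from the scrambled image (the Mann–Whitney statistic underlying the ROC area). The scores below are purely illustrative, not the experiment's data:

```python
# Hypothetical per-pixel log-likelihoods for the two images.
fractal_ll = [-1.0, -1.2, -0.8, -1.1]
scrambled_ll = [-3.0, -2.5, -1.05, -2.8]

# Az = fraction of (fractal, scrambled) pairs correctly ordered,
# counting ties as half.
pairs = [(f, s) for f in fractal_ll for s in scrambled_ll]
az = sum(1.0 if f > s else 0.5 if f == s else 0.0 for f, s in pairs) / len(pairs)
```

Sweeping a threshold over the pooled log-likelihoods, as done for Figure 7.3, traces out the ROC curve whose area this statistic summarises.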
7.3.4 Discussion
The results show that the novelty detection method can discriminate on the basis
of local textural appearance (rather than just pixel intensity, for example). This
is important because although mammographic abnormalities often appear to be
brighter than their surrounding tissue, the pixel values are often within the range
of those for normal tissue. The novelty detection method should also function
as a “brightness detector”—regions that are unusually bright (or dim) should be
considered unlikely by our model.
7.4 Evaluation of novelty detection performance
7.4.1 Introduction
We performed a number of novelty detection experiments on abnormal mam-
mographic patches. Two types of mammographic texture were used: simulated
abnormal mammographic textures (based on the fractal textures) and patches
from real mammograms. Two classes of abnormality were used for each type
of texture: masses and calcifications. In the case of the fractal textures, these
abnormalities were simulated.
7.4.2 Aims
The aim of these experiments was to determine whether a single model of mam-
mographic textural appearance (for a particular class of texture) can be used to
detect different forms of abnormality (where a conventional pattern recognition
approach would require multiple classifiers and probably multiple types of
feature descriptor).
The experiments were designed to answer the following questions:
1. How well can abnormalities be detected in simulated mammographic tex-
tures where the textures contain simulated calcifications?
2. How well can abnormalities be detected in simulated mammographic tex-
tures where the textures contain simulated masses?
3. How well can abnormalities be detected in simulated mammographic tex-
tures where the textures contain both simulated masses and calcifications?
4. How well can abnormalities be detected in real mammographic textures
where the textures contain real calcifications?
5. How well can abnormalities be detected in real mammographic textures
where the textures contain real masses?
6. How well can abnormalities be detected in real mammographic textures
where the textures contain both real masses and calcifications?
7.4.3 Method
For each experiment a set of images were generated or collected which contained
the desired type, or types, of abnormality. We describe below how the simulated
abnormalities were generated. The real microcalcification patches were selected
by a colleague from a local database (as pixel-level expert annotation was avail-
able) on the basis that the set should represent a broad range of appearances
of that class of abnormality. The other real data was pseudo-randomly selected
from the Digital Database for Screening Mammography (DDSM) [83]. Because
Chapter 7—Evaluation of novelty detection performance 202
the analysis process is computationally expensive, the real mammographic im-
ages were processed at low resolution (150 µm), using a model trained on 10
pathology-free patches from the DDSM (scaled to the same resolution), selected
in the same way as above. Each test set contained 10–20 regions of interest.
The size of the test sets were limited due to the computational time required to
perform the analysis task.
In the case of the simulated abnormalities, groundtruth images were automati-
cally generated. In the case of the real mammographic data the groundtruth was
provided by a digital mammography researcher¹. Care was taken during the an-
notation to ensure that the groundtruth was as detailed as possible, rather than
simply marking the centres of abnormalities or providing coarse indications such
as circles that contain the abnormalities. In the case of the microcalcification
images, for example, each microcalcification was individually annotated.
For computational expediency, we did not include separate normal images for analysis alongside the simulated images or the real microcalcification images. The
groundtruth annotations were interpreted strictly: we considered “hits” on pixels
labelled as abnormal to be true positive detections, “hits” on pixels labelled as
normal to be false positive detections and so on for the true negative and false
negative possibilities. Relative to the majority of results published in the liter-
ature, this interpretation of groundtruth produces a pessimistic evaluation of a
detection system because a “hit” close to an abnormal feature would be likely to
draw a clinician's attention to the area and so would be clinically useful². However, we believe that our detection criteria are appropriate for experiments on relatively small test images because we want to measure how well the method detects specific indicative signs of abnormality, rather than measuring how well the method would alert clinicians to the presence of abnormality (which would presumably result from accurate detections of abnormal features). The strict interpretation of groundtruth treats each pixel in the test images as a separate data point, delivering a large sample from a seemingly small set of test images. This allows the area under the ROC curve to be estimated accurately (i.e. with small standard error—see Section 7.4.4). However, results from experiments that use a small number of images are obviously less representative than those from experiments that use a large number of images.

¹Michael Board, a third-year digital mammography PhD student in the Division of Imaging Science and Biomedical Engineering at the University of Manchester.

²In the computer-aided mammography literature it is common to consider a single correct "hit" in a coarsely annotated abnormal region to be a true positive detection of that region, irrespective of its absolute location or incorrect "hits" or "misses" within that region.

Each pixel in each test image was assigned a log-likelihood—using the appropriate model—as described in Section 6.7. ROC analysis was performed on each set of results by thresholding the log-likelihoods, and the resulting classifications were compared to the groundtruth annotation.

For the real mass images, we found that pixels labelled as being masses were given very similar log-likelihoods to the surrounding non-mass pixels, to the extent that no discrimination could be achieved. This was either because our approach fails on this class of image, or because it is unrealistic to consider tissue close to a mass to be normal (e.g. it may be distorted by the presence of the mass). For this class of image, we also analysed a set of 10 pathology-free images, and considered all pixels in an abnormal image to be abnormal, and all pixels in a normal image to be normal.
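The pixel-wise ROC evaluation described above can be sketched as follows. This is a minimal illustration with synthetic data, not the implementation used in the thesis: the novelty score is taken to be the negated log-likelihood, and the area under the ROC curve is computed directly as the probability that a randomly chosen abnormal pixel scores higher than a randomly chosen normal pixel, which equals the trapezoidal area under the empirical ROC curve.

```python
import numpy as np

def pixelwise_auc(log_lik, truth):
    """Area under the ROC curve for pixel-wise novelty detection.
    Lower log-likelihood means more novel, so scores are the negated
    log-likelihoods.  The AUC is P(score_abnormal > score_normal),
    counting ties as 1/2 (equivalent to the trapezoidal area under
    the empirical ROC curve)."""
    scores = -np.asarray(log_lik, dtype=float).ravel()
    labels = np.asarray(truth, dtype=bool).ravel()
    pos, neg = scores[labels], scores[~labels]
    # Pairwise comparison is O(n_pos * n_neg): fine for a sketch;
    # a rank-based computation would be used for full images.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Synthetic example: abnormal pixels tend to receive lower log-likelihoods.
rng = np.random.default_rng(0)
gt = rng.random((32, 32)) < 0.1                      # ~10% abnormal pixels
ll = np.where(gt, rng.normal(-40.0, 5.0, gt.shape),  # abnormal: low likelihood
                  rng.normal(-5.0, 5.0, gt.shape))   # normal: high likelihood
auc = pixelwise_auc(ll, gt)                          # close to 1 here
```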
Generating synthetic calcifications and masses
The simulation of mammographic abnormalities has been investigated previously.
Highnam et al. investigated adding simulated and real masses to mammograms
represented using hint (see Section 3.3 for details of hint) [87]. Simulated masses
were generated by inferring 3-D models of 2-D mass shapes (obtained, for exam-
ple, from annotations of real masses). The hint values of the real masses were
estimated by subtracting the average hint value of the surrounding non-mass re-
gion from those of the mass region. In each case, the estimated mass hint values
were then simply added to normal hint mammograms. Caulkin et al. modelled
the appearance of spiculated masses by estimating the contribution of the nor-
mal tissue to the abnormal region and then learning statistical models of the
contributions due to the central mass and spicules [38]. A model of spicule place-
ment and number was also learned. Simulated spiculated lesions were generated
by sampling from the models and adding the simulated abnormalities into nor-
mal mammograms. Claridge and Richter modelled the cross-sectional profile of
masses by convolving a step-edge function with a Gaussian kernel and then ro-
tating the resulting function to form a surface, where the height is proportional
to the attenuation due to the mass [44] (a similar approach is described in more
detail below). Bliznakova et al. modelled masses, spicules and microcalcifications
within a 3-D breast model (see Section 9.2.2 for a more detailed description of
the approach) [16].
In our work, both the simulated calcification and mass images were based upon
the fractal mammographic textures described in Section 6.6.1. Fractal back-
grounds were generated and simulated calcifications or masses were introduced
using an additive process, mimicking the attenuation process. We describe how
each type of abnormality was modelled below.
The simulated microcalcifications were modelled by ellipses, rotated to random
angles and placed in clusters. The number of calcifications in each image was fixed
at 30. Algorithm 10 describes how the microcalcifications were simulated and how
the groundtruth was generated. Note that although the simulated microcalcifi-
cation shape and spatial distribution were modelled in an ad hoc way (though
broadly consistent with real data), microcalcification brightness was modelled to
be consistent with real data.
Real mammographic masses can have well-defined borders, diffuse borders or spiculations. We decided to model masses with diffuse borders, because well-defined borders may be too easy to detect, while spiculated lesions would be harder to model when only a simple simulation of abnormality is required.
One might model a mass in a breast as being a sphere of uniform density, situated
within the normal breast tissue and distorted by the compression of the breast
between two plates. While a detailed analytical model of the problem could be
derived, a reasonable approximation that yields suitable test images would be
acceptable. We experimented with three methods of simulating masses. All the
methods modify the fractal background pixel values within a disc, and differ in
how the abnormal pixel values are modelled.
Algorithm 10 Simulating microcalcification clusters.
1. Generate a fractal texture using Algorithm 8.
2. Determine the centre of the image and consider the image edges to be 6 standard deviations from the centre. Select locations for the calcifications using the resulting bivariate normal distribution.
3. For each calcification to be generated:
   (a) Generate a 100 × 100 pixel disc.
   (b) Warp the disc to a random elliptical shape (with a mean eccentricity of 2 and associated variance of 0.5).
   (c) Rotate the ellipse to a random angle.
   (d) Convolve the ellipse with a Gaussian kernel (with a standard deviation of 10 pixels) to remove the hard edges.
   (e) Resize the ellipse to be 4 pixels long along its major axis.
   (f) Normalise the ellipse so that its maximum value is unity, scaling the other pixels accordingly.
   (g) Scale the simulated microcalcification pixel values such that, when added to the image, the ratio of the mean calcification pixel value to the mean fractal background pixel value is normalised to the same ratio for real mammograms containing microcalcifications.
   (h) Insert the calcification into the fractal image by adding its pixel values to the background image pixel values.
4. Compute the groundtruth image by subtracting the calcification image from the original fractal image, then threshold at a low value to discard the effect of the convolution with the Gaussian kernel.
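The steps of Algorithm 10 can be sketched in Python as follows. This is an illustrative, simplified reimplementation, not the thesis code: ellipses are rasterised directly at their final size rather than warping and resizing a 100 × 100 disc, a small box blur stands in for the Gaussian convolution, and the brightness ratio is an arbitrary stand-in for the value measured from real mammograms.

```python
import numpy as np

def soften(img):
    """3x3 box blur standing in for the Gaussian convolution that
    removes the ellipse's hard edges in Algorithm 10."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def simulate_calcification(rng, size=9, major=4.0, mean_ecc=2.0, ecc_sd=0.7):
    """One calcification: an ellipse with random eccentricity and angle,
    rasterised directly at its final size, softened and normalised so
    its maximum value is unity."""
    ecc = max(1.0, rng.normal(mean_ecc, ecc_sd))   # major/minor axis ratio
    a, b = major / 2.0, (major / ecc) / 2.0        # semi-axes in pixels
    theta = rng.uniform(0.0, np.pi)
    y, x = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
    u = x * np.cos(theta) + y * np.sin(theta)
    v = -x * np.sin(theta) + y * np.cos(theta)
    blob = soften(((u / a) ** 2 + (v / b) ** 2 <= 1.0).astype(float))
    return blob / blob.max()

def insert_cluster(background, rng, n_calcs=30, brightness_ratio=1.2):
    """Scatter calcifications around the image centre (bivariate normal)
    and add them to the background.  brightness_ratio is an arbitrary
    stand-in for the ratio measured from real mammograms."""
    img = background.astype(float).copy()
    gt = np.zeros_like(img, dtype=bool)
    h, w = img.shape
    s = 4                                          # half-size of the 9x9 stamp
    scale = img.mean() * (brightness_ratio - 1.0)
    for _ in range(n_calcs):
        cy = int(np.clip(rng.normal(h / 2.0, h / 12.0), s, h - s - 1))
        cx = int(np.clip(rng.normal(w / 2.0, w / 12.0), s, w - s - 1))
        blob = simulate_calcification(rng)
        img[cy - s:cy + s + 1, cx - s:cx + s + 1] += scale * blob
        gt[cy - s:cy + s + 1, cx - s:cx + s + 1] |= blob > 0.1
    return img, gt

rng = np.random.default_rng(1)
background = rng.normal(100.0, 5.0, (64, 64))
image, groundtruth = insert_cluster(background, rng)
```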
Our first method adds a two dimensional Gaussian to the fractal background.
This method produces very diffuse borders. An example of such an image is
shown in Figure 7.6a.
Our other two methods use concentric discs, where the central disc has uniform
pixel values and the annulus transitions from the uniform value to zero. The
first of these methods assumes that the compressed mass has a cross-section
as illustrated in the top graph in Figure 7.4. The cross-section of the annular
region is semi-circular with radius k. The function, f(d), that describes the X-
ray attenuation in this model (i.e. the depth of the mass) is described by:
f(d) = \begin{cases} 0 & : d > m \\ 1 & : d < m − k \\ \sqrt{1 − \left(\frac{d − (m − k)}{k}\right)^2} & : \text{otherwise} \end{cases}            (7.1)
where d is distance from the centre of the concentric discs, m is the distance along
d of the simulated mass boundary and k is the difference between the radii that
describe the two discs. The value of f(d) describes the thickness of the simulated
mass, which is a chord that is perpendicular to the d axis when d > m − k
and d < m. We call this method the circle chord method. An example of
the simulated mass images produced using the circle chord method is shown in
Figure 7.6b.
Figure 7.4: The circle chord attenuation function. The top graph shows a cross-section of the model of the compressed mass; the bottom graph shows the attenuation function, f(d).

Figure 7.5: The sigmoid attenuation function.

The second variant of the concentric discs model uses a sigmoid function to describe the attenuation in the annular region:

f(d) = \frac{1}{1 + e^{−α[1 − (d − (m − k))/k]}}            (7.2)
where d, m and k are as before. The constant α determines the shape of the
sigmoid and we use a value of 6. An illustration of the sigmoid attenuation
function is shown in Figure 7.5. An example of the simulated mass images
produced using the sigmoid method is shown in Figure 7.6c.
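For concreteness, the three radial attenuation profiles can be sketched as follows. This is an illustrative NumPy reimplementation, not the thesis code; the Gaussian profile's width is a guessed parameter, as the thesis does not state the value used.

```python
import numpy as np

def gaussian_profile(d, m):
    """Radial Gaussian (very diffuse border).  The width is a guessed
    parameter: the standard deviation is set to half the mass radius."""
    return np.exp(-0.5 * (np.asarray(d, float) / (m / 2.0)) ** 2)

def circle_chord_profile(d, m, k):
    """Equation 7.1: uniform central disc, semi-circular (chord-depth)
    fall-off of width k, zero beyond the mass boundary at d = m."""
    d = np.asarray(d, float)
    f = np.zeros_like(d)
    f[d < m - k] = 1.0
    ann = (d >= m - k) & (d <= m)
    f[ann] = np.sqrt(1.0 - ((d[ann] - (m - k)) / k) ** 2)
    return f

def sigmoid_profile(d, m, k, alpha=6.0):
    """Equation 7.2: sigmoid fall-off across the annulus (alpha = 6)."""
    d = np.asarray(d, float)
    return 1.0 / (1.0 + np.exp(-alpha * (1.0 - (d - (m - k)) / k)))

# Build a 2-D simulated mass from a radial profile.
y, x = np.mgrid[:64, :64]
d = np.hypot(x - 32.0, y - 32.0)
mass = circle_chord_profile(d, m=20.0, k=8.0)
```

The profiles agree at the boundaries implied by Equation 7.1: the circle chord profile equals 1 inside the central disc and falls to 0 at d = m, while the sigmoid is 0.5 at the mass boundary.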
Figure 7.6: Examples of simulated masses using the three methods. Gaussian method (a); circle chord method (b); sigmoid method (c).
For each of the methods, the magnitude of f(d) was scaled so that the ratio of the
mean mass pixel value to the mean non-mass pixel value was equal to that found
in real mass images (i.e. we attempt to accurately model mass brightness). We
assessed by inspection that the sigmoid method produces acceptable test images.
Because the novelty detection algorithm is computationally expensive, we limited
the number of pixels to be analysed by cropping the mass images to contain an
equal number of mass and non-mass pixels. The groundtruth was generated by
computing the difference image between the original fractal images and the result
after adding the synthetic mass.
Although the mass region can be easily identified by eye, the actual pixel values are not necessarily higher than those in the fractal backgrounds. This is the case when, for example, a mass pixel is added to a relatively dark background region. If all of the mass pixel values (after being added to the fractal background) were higher than those in the fractal image (without the presence of a simulated mass), then simply thresholding the images would identify the mass region. However, this is not the case.
7.4.4 Results
Results for simulated microcalcifications
Figure 7.7(a) shows a simulated microcalcification image and Figure 7.7(b) the corresponding log-likelihood image. Figure 7.7(c) shows the ROC curve for all simulated microcalcification images. The area under the curve is approximately 0.92.
Results for simulated masses
Figure 7.8(a) shows one of the simulated masses and Figure 7.8(b) the corresponding log-likelihood image. Figure 7.8(c) shows the ROC curve for all simulated mass images. The area under the curve is approximately 0.64.
Results for simulated mass and microcalcifications (combined)
Figure 7.9 shows the ROC curve for the experiment where the single novelty
detection method is used to detect both simulated masses and microcalcifications
(in equal proportions). The area under the curve is approximately 0.75.
Figure 7.7: Example log-likelihood image and ROC curve for simulated microcalcifications. An example simulated microcalcification image (a); the corresponding log-likelihood image (b); the ROC curve for the simulated calcification images (c). The log-likelihoods range from −276 to −0.5.
Figure 7.8: Example log-likelihood image and ROC curve for a simulated mass. An example simulated mass image (a); the corresponding log-likelihood image (b); the ROC curve for the simulated mass images (c). The log-likelihoods range from −216 to −0.5.
Figure 7.9: ROC curve for simulated masses and microcalcifications (combined).
Results for real microcalcifications
Figure 7.10 shows a sample microcalcification cluster along with the correspond-
ing groundtruth and log-likelihood images. Figure 7.10(d) shows the ROC curve
for all the test images. The area under the curve is approximately 0.56.
Results for real masses
Figure 7.11 shows the ROC curve for the real mass experiment. The area un-
der the curve is approximately 0.54. We do not show sample images (see the
discussion of these results in Section 7.4.5).
Results for real mass and microcalcifications (combined)
Figure 7.12 shows the ROC curve for the combined real mass and microcalcification experiment. The area under the curve is approximately 0.53. A hypothesis test at the 95% confidence level, using the method described by Hanley and McNeil [78], showed that there was a statistically significant difference between the area under the ROC curve and the area under the curve corresponding to random discrimination (i.e. the diagonal line with area equal to 0.5)³.
³It was assumed that the diagonal had the same number of data points as the ROC curve.
Figure 7.10: Example log-likelihood image and ROC curve for a real microcalcification cluster. An example real microcalcification image (a); the corresponding groundtruth image (b); the corresponding log-likelihood image (c); the ROC curve for the real microcalcification images (d). The log-likelihoods range from −25 to −0.3.
Figure 7.11: ROC curve for real masses.
Figure 7.12: ROC curve for real microcalcifications and masses (combined).
7.4.5 Discussion
Simulated microcalcifications
As Figure 7.7(b) shows, the synthetic microcalcifications are easily identified,
which is reflected in the corresponding ROC curve. There are no false-positives in
the normal regions; however, because of the strict pixel-wise evaluation criterion,
there are a few false-positives at the microcalcification edges. This is where the
sampled window is not centred on a pixel that is labelled as abnormal, but does
border an abnormal pixel. The result is that the model is partially conditioned
upon abnormal image data—which biases the conditional model—yielding lower
log-likelihood values for the centre pixel. We will call this the local bias effect.
Note that the log-likelihoods in the abnormal regions of the simulated micro-
calcifications are lower than for the simulated masses, which corresponds with
subjective assertions that microcalcifications are easier to detect than masses.
Simulated masses
The results for the simulated masses are not as good as for the simulated micro-
calcifications, but Figure 7.8(b) shows that the mass is identified. The annular
region of the simulated mass is marked as being more abnormal than the central
region. This may be because the model was not trained on images with this
sort of intensity change, while the model did see the more uniform texture of
similar brightness from the centre of the simulated mass during training. The
log-likelihoods for the central area are close to those of the normal background
texture, and this is reflected in the ROC curve in Figure 7.8(c).
Simulated masses and microcalcifications (combined)
Figure 7.9 shows the results of using the same model and method to detect both
types of abnormality, and shows that this is possible. Although the data were
simulated, this result is important because it shows that it is possible to identify
more than one type of abnormality using a single method. Note that Figure 7.9
was constructed from data where the ratio of microcalcification to mass data was
equal to unity, so that performance on one type of abnormality did not contribute
disproportionately.
Real microcalcifications
The ROC curve in Figure 7.10(d) is disappointing and indicates that the method
performs only slightly better than a random classifier (as indicated by the red
diagonal line, which represents chance). Microcalcifications are considered to
be easy to detect because they are often very bright against the mammographic
background. However, as Figure 7.10(a) shows, this is not always true. It appears
that the local bias effect may also contribute to the poor performance. The log-
likelihoods in the calcified area tend to be lower, but the most “unlikely” pixels do
not correspond exactly to the individual microcalcifications—instead they tend
to be a few pixels away. There are pixels in the uncalcified tissue which have
low log-likelihoods, and are essentially false positives. This may be because the
model is not sufficiently specific to pathology-free appearance, because the tissue
really was abnormal (and unannotated) or because it is incorrect to label tissue
so close to a microcalcification cluster as being normal.
Real masses
The results for real masses are similar to those for the real microcalcifications: the
method performs only slightly better than a random classifier. Because we used
separate normal and abnormal test sets, this result adds weight to the hypothesis
that the model is not specific enough to pathology-free appearance. In the case of
masses, it is unreasonable to expect a small local window to detect abnormality.
A better approach might be to adopt a multi-scale approach where likelihoods
are propagated downwards, such as was used by Liu et al. [120].
Real masses and microcalcifications (combined)
Although the performance on real data is relatively poor, this result indicates
that some discrimination of more than one class of abnormality can be achieved
using a single method.
7.5 Summary
This chapter presented an evaluation of the parametric texture model. In sum-
mary:
• A psychophysical evaluation was reported. The experiment was deployed as an Internet-based application. The application was tested by a small number of participants and then advertised to all students at the University of Manchester.
• The synthetic textures were not indistinguishable from the real textures,
but were selected in approximately one third of trials.
• The synthetic images generated by Efros and Leung’s algorithm were con-
sidered more realistic than those generated by the parametric model. The
textures generated using the parametric model were selected in 26% and
41% of trials. However, the images generated by the Efros and Leung algo-
rithm used a more specific “training” set than was used to train the para-
metric model. Direct comparison of the two approaches should consider
this experimental bias and the ability of the parametric model to analyse
images via novelty detection.
• A novelty detection experiment was reported. Simulated and real microcal-
cification and mass images were analysed using parametric models. Results
for the simulated data show that the novelty detection approach can success-
fully detect multiple types of abnormality using a single method. Results
for the real data show that some discrimination was possible, but significant
improvement is needed. This may be achieved by improving the specificity of the model and by adopting a hierarchical strategy.
Chapter 8
GMMs in principal components
spaces and low-dimensional
texture models
8.1 Introduction
This chapter presents a method for learning Gaussian mixture models in low-
dimensional spaces and describes how the parametric texture model may be im-
proved by doing so. The chapter describes:
• The motivation for learning in low-dimensional spaces.
• How principal components analysis can be used to build Gaussian mixture models—and hence our parametric texture model—in a low-dimensional space that approximates the natural space of the data.
• How textures can be synthesised using such a model.
8.2 Dimensionality reduction
The dimensionality of the texture model described in Chapter 6 is reasonably
high. With an 11 × 11 window, for example, the model has 121 dimensions.
Given that there is likely to be a high degree of correlation between neighbouring
pixels in the windows, it is sensible to ask if this redundancy can be exploited.
Dimensionality reduction can have a number of benefits in statistical modelling.
Firstly, because the number of data points required to populate a space with fixed
density increases exponentially with the number of dimensions, dimensionality
reduction can allow one to populate the space to be modelled more densely for
a given size of training set. Secondly, since computations often involve iteration
over the number of dimensions in the modelled space, dimensionality reduction
may allow us to develop more efficient algorithms.
In the following sections we describe how a Gaussian mixture model can be built
in a low-dimensional space and how such a model may be used to perform texture
synthesis and analysis.
8.3 Gaussian mixtures in principal components
spaces
A set of multivariate measurements, X = {x_i : i = 1, …, N}, when thought of
as a cloud of points in a vector space, can be considered to have a set of mutually
orthogonal axes that describe the main directions of variation. In general, these
axes will not be aligned with the regular Cartesian axes (the covariance matrix
of the data is unlikely to be diagonal).
Principal Components Analysis (PCA) [104] is a technique that determines these
axes (the principal components) and the variance associated with each. The
principal components are simply the eigenvectors of the covariance matrix, and
the variances are the associated eigenvalues. If we define P to be a matrix where
each column is an eigenvector of the covariance matrix, then we can project xi, a
measurement in the natural data space, to a vector bi in the principal components
space and back again:
bi = PT(xi − x) (8.1)
xi = x + Pbi ,
where x is the mean vector. (Note that since P has mutually orthogonal columns,
and each is a unit vector, it is orthonormal. Hence P−1 = PT.) Another way
of thinking about P is that it is the transformation needed to diagonalise the
covariance matrix of the original data:
Σ_b = P^T Σ_x P            (8.2)
where Σ_b is the (diagonal) covariance matrix of the data in the principal components space and Σ_x is the covariance matrix of the data in the natural space.
The total variance of the data can be computed by summing the eigenvalues.
Since the eigenvalues describe the variance associated with each dimension of
the principal components space, it is possible to discard eigenvectors with small
associated variances. In this way, dimensionality reduction can be achieved: the
original data can be transformed into a lower-dimensional space while retaining
an arbitrarily large proportion of the total variance. If P is constructed in this
way, then Equation 8.1 becomes approximate:
bi ≈ PT(xi − x) (8.3)
xi ≈ x + Pbi .
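The projection and (approximate) reconstruction of Equations 8.1 and 8.3 can be sketched in NumPy. This is an illustrative reimplementation under the assumption of an eigendecomposition of the sample covariance; it is not the thesis code.

```python
import numpy as np

def fit_pca(X, retained=0.95):
    """Eigendecomposition of the sample covariance; keep just enough
    eigenvectors to retain the requested proportion of the total
    variance.  Returns the mean, the (d x t) projection matrix P and
    the kept eigenvalues."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)          # ascending order
    order = np.argsort(evals)[::-1]             # reorder descending
    evals, evecs = evals[order], evecs[:, order]
    t = np.searchsorted(np.cumsum(evals) / evals.sum(), retained) + 1
    return mean, evecs[:, :t], evals[:t]

# Rank-3 data embedded in 10 dimensions: three components suffice.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 10))                    # mixing matrix
X = rng.normal(size=(500, 3)) @ A + 5.0
mean, P, evals = fit_pca(X, retained=0.99)
B = (X - mean) @ P                              # b_i ~ P^T (x_i - xbar)
X_rec = mean + B @ P.T                          # x_i ~ xbar + P b_i
```

Because the toy data lies exactly in a three-dimensional affine subspace, the reconstruction is exact up to floating-point error; for real window data the reconstruction is only approximate, as in Equation 8.3.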
Building a Gaussian mixture model in a principal components space is simple: compute Σ_x from X, perform an eigendecomposition to determine P (discarding eigenvectors to retain a given proportion of the total variance), and then project each x_i into the lower-dimensional principal components space to form B = {b_i : i = 1, …, N}. The Gaussian mixture model is then built using the data in B.
Once we have a low-dimensional model, we will need to compute conditional
distributions in order to perform synthesis or analysis. It is not possible to ap-
ply conditions directly to the principal components model—as we have done so
far—because the conditions and model exist in different spaces. Recall from
Section 5.5.2 that the conditional distribution p(x_1|x_2) = N(µ′, Σ′), with

µ′ = µ_1 + Σ_{1,2} Σ_{2,2}^{−1} (x_2 − µ_2)            (8.4)
Σ′ = Σ_{1,1} − Σ_{1,2} Σ_{2,2}^{−1} Σ_{2,1} .            (8.5)
We can partition the matrix P as

P = \begin{pmatrix} P_1 \\ P_2 \end{pmatrix} ,            (8.6)
where the rows of P_1 correspond to the unknown dimensions (i.e. x_1) and the rows of P_2 correspond to the known dimensions (i.e. x_2). We can write
Σ_x = \begin{pmatrix} Σ_{1,1} & Σ_{1,2} \\ Σ_{2,1} & Σ_{2,2} \end{pmatrix} ≈ P Σ_b P^T = \begin{pmatrix} P_1 Σ_b P_1^T & P_1 Σ_b P_2^T \\ P_2 Σ_b P_1^T & P_2 Σ_b P_2^T \end{pmatrix} .            (8.7)
Given Equation 8.7, it is straightforward to write approximations of the condi-
tional mean vector and covariance matrix as:
µ′ = µ_1 + Σ_{1,2} Σ_{2,2}^{−1} (x_2 − µ_2)            (8.8)
   ≈ µ_1 + (P_1 Σ_b P_2^T)(P_2 Σ_b P_2^T)^{−1} (x_2 − µ_2) ,

Σ′ = Σ_{1,1} − Σ_{1,2} Σ_{2,2}^{−1} Σ_{2,1}            (8.9)
   ≈ (P_1 Σ_b P_1^T) − (P_1 Σ_b P_2^T)(P_2 Σ_b P_2^T)^{−1} (P_2 Σ_b P_1^T) .
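A sketch of Equations 8.8 and 8.9 for a single Gaussian component follows. This is an illustrative reimplementation, not the thesis code; a pseudo-inverse is used for the possibly near-singular P_2 Σ_b P_2^T term.

```python
import numpy as np

def conditional_from_pca(mu, P, Sigma_b, known_idx, x2):
    """Approximate conditional mean and covariance (Equations 8.8 and
    8.9) for one Gaussian stored in a principal components space.
    known_idx indexes the conditioned-on dimensions (x2) of the natural
    space; the remaining dimensions are the unknowns (x1)."""
    known_idx = np.asarray(known_idx)
    unknown_idx = np.setdiff1d(np.arange(P.shape[0]), known_idx)
    P1, P2 = P[unknown_idx], P[known_idx]          # row partition (Eq. 8.6)
    S12 = P1 @ Sigma_b @ P2.T                      # approximates Sigma_{1,2}
    S22_inv = np.linalg.pinv(P2 @ Sigma_b @ P2.T)  # may be near-singular
    cond_mu = mu[unknown_idx] + S12 @ S22_inv @ (x2 - mu[known_idx])
    cond_S = P1 @ Sigma_b @ P1.T - S12 @ S22_inv @ S12.T
    return cond_mu, cond_S

# With the full eigenbasis retained the approximation is exact, so it
# can be checked against the standard conditional Gaussian formulas.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
Sigma_x = A @ A.T + np.eye(4)
evals, P = np.linalg.eigh(Sigma_x)
mu = np.arange(4.0)
x2 = np.array([1.0, -1.0])
m_pca, S_pca = conditional_from_pca(mu, P, np.diag(evals), np.array([2, 3]), x2)
```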
It therefore seems likely that there is an elegant way to use Gaussian mixture models in low-dimensional spaces. However, we have not yet considered how to compute the conditional component probabilities, {P(i|x_2) : i = 1, …, k}. Unfortunately, in order to compute {P(i|x_2) : i = 1, …, k} (as shown in Equation 5.35), {P(x_2|i) : i = 1, …, k} are required, which are computed by marginalising the model—in its natural space—over the dimensions corresponding to x_1. This means that two versions of the model are required: one in the principal components space, and one in the natural space. Though this seems awkward, it may be acceptable if working in the principal components space is advantageous in terms of computational efficiency or if the more densely populated training space leads to better models.
8.3.1 A numerical issue
In practice, the P_2 Σ_b P_2^T matrices are close to singular (numerically difficult to invert). Common advice on dealing with this type of problem is to add a scaled identity matrix to the ill-conditioned matrix. This increases the variances in the corresponding distribution; the scalar is determined by the amount of variance to be added to the distribution. In the current setting, this advice essentially assumes that the P_2 Σ_b P_2^T matrices are close to singular because not all of the variance observed in the data was kept in the model as a result of the principal components approximation. In the case of multiple model components, it is not clear how to distribute the missing variance. We tried distributing the missing variance in two ways: evenly over the components, and in proportion to the component probabilities. Neither approach performed satisfactorily.
While we have found that computing the Moore–Penrose generalised inverse is the most satisfactory way to solve the problem for covariance matrices in the natural data space, this approach does not work reliably in the principal components case. We address this problem by computing the generalised inverse of Σ_b. If we keep all of the eigenvectors in P, then Σ_x = P Σ_b P^T. By ignoring eigenvectors with small associated eigenvalues, Σ_b yields a low-rank approximation of Σ_x, and similarly P_2 Σ_b^{−1} P_2^T is an approximation of Σ_x^{−1}.
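Since Σ_b is diagonal (its entries are the retained eigenvalues), its Moore–Penrose generalised inverse simply reciprocates the non-negligible eigenvalues and zeroes the rest. The following sketch uses made-up eigenvalues, with two numerically negligible directions of the kind that make naive inversion unstable:

```python
import numpy as np

# Hypothetical eigenvalue spectrum: two informative directions, two
# numerically negligible ones.
evals = np.array([5.0, 2.0, 1e-14, 0.0])
Sigma_b = np.diag(evals)

# Moore-Penrose inverse of a diagonal matrix: reciprocate eigenvalues
# above a tolerance, zero the others.
tol = 1e-10 * evals.max()
keep = evals > tol
inv_diag = np.zeros_like(evals)
inv_diag[keep] = 1.0 / evals[keep]
Sigma_b_pinv = np.diag(inv_diag)

# Agrees with the general-purpose routine.
assert np.allclose(Sigma_b_pinv, np.linalg.pinv(Sigma_b, rcond=1e-10))
```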
8.4 Texture synthesis in principal components
spaces
The procedure for generating synthetic textures using a Gaussian mixture model
that has been built in a principal components space is identical to the regular
case, except that the computation of the conditional distribution is performed
as described in Section 8.3. Figure 8.1 shows an example training image taken
from the MeasTex database¹ and a synthetic texture generated using a Gaussian
mixture model built in a principal components space. The model retained 95%
of the total variance and the texture was produced using the patch-wise algorithm. The example shown is a successful synthesis; we were not
components models.
¹Currently available at http://www.cssip.uq.edu.au/meastex/meastex.html
Figure 8.1: Synthesis using a principal components model. A training image is shown in (a) and a synthetic image is shown in (b).
8.5 Discussion
Our aims in building reduced dimensionality models were:
• To build better models by exploiting data redundancy to more densely
populate the space of training examples.
• To achieve faster training, synthesis and analysis.
Although the second of these aims is partially achieved (training is faster, but the
projections used to compute conditionals result in slower synthesis), it is generally
at the expense of the quality of synthesis (and presumably analysis). Although
the texture shown in Figure 8.1 is one of the best synthetic textures in this thesis,
textures generated using principal components models were generally not as good
as textures from models built in the natural space. The approximation used in
the computation of the inverse of the P_2 Σ_b P_2^T matrices degrades the quality of
synthesis (see below).
There is a way to benefit from the advantages of dimensionality reduction without
suffering the disadvantages. It is possible to build the model in the low dimen-
sional principal components space and then project the entire model into the
natural space for all subsequent processing (i.e. synthesis and analysis). Thus,
model building is accelerated. Little degradation in the quality of the synthetic
textures is observed (hence the loss in quality noted above can be attributed to
the approximation used in the computation of the inverse of the P_2 Σ_b P_2^T matrices). Further, we do not need to project the component covariance matrices
each time a conditional distribution needs to be computed, and so synthesis and
analysis can be performed at normal speed. There does not appear to be any
noticeable benefit from having a more densely populated training space in the
models that we have built. The procedure is as follows: the model is built in the
principal components space as described in Section 8.3, and then projected into
the natural space as follows:
µx,i = x + Pµb,i, ∀ i ∈ {1, · · · , k} , (8.10)
Σx,i = PΣb,iPT, ∀ i ∈ {1, · · · , k} , (8.11)
where $\mu_{x,i}$ and $\mu_{b,i}$ are the $i$-th component mean vectors in the natural and principal components spaces respectively, and $\Sigma_{x,i}$ and $\Sigma_{b,i}$ are the $i$-th component covariance matrices in the natural and principal components spaces respectively.
The component probabilities, $\{P(i) : i = 1, \ldots, k\}$, are unaffected by the projection.
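To make the projection concrete, Equations 8.10 and 8.11 amount to a few matrix products per mixture component. The sketch below is illustrative rather than the thesis implementation; all function and variable names are our own:

```python
import numpy as np

def project_gmm_to_natural(x_bar, P, mus_b, Sigmas_b):
    """Project Gaussian mixture components learned in the principal
    components space back into the natural data space (Eqs. 8.10, 8.11).

    x_bar    : (d,) mean of the training data in the natural space
    P        : (d, q) matrix whose columns are the retained eigenvectors
    mus_b    : list of (q,) component means in the PCA space
    Sigmas_b : list of (q, q) component covariances in the PCA space
    """
    mus_x = [x_bar + P @ mu for mu in mus_b]      # Eq. 8.10
    Sigmas_x = [P @ S @ P.T for S in Sigmas_b]    # Eq. 8.11
    return mus_x, Sigmas_x
```

The component probabilities $P(i)$ are unaffected by the projection and are simply carried over unchanged.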
8.6 Summary
This chapter presented a method for learning the parameters of a Gaussian mix-
ture model—and hence a parametric texture model—in low-dimensional spaces.
In summary:
• There are two reasons why building Gaussian mixture models in low di-
mensional spaces might be useful. First, compared to higher-dimensional
spaces, fewer training points are required to populate a low-dimensional
space at a given density, so more specific models may be built.
Second, algorithms often iterate over the dimensions of the data, so working
in a low-dimensional space is likely to yield more efficient algorithms.
• A method to learn the parameters of Gaussian mixture models in principal
components spaces was developed. The closed-form method of computing
conditional distributions was extended to the principal components model
and a numerical issue arising from this was addressed.
• It is not straightforward to marginalise a principal components model over
dimensions in the natural space. This problem makes working in a principal
components space less attractive.
• A method for synthesising textures from a principal components parametric
texture model was described.
• A synthetic texture, generated from a principal components model, was
presented. Although it is possible to achieve excellent results using the
approach, results for principal components models were much more variable
than for the models built in the natural data space.
• Gaussian mixture models can be built in low-dimensional spaces and then
projected into the natural data space. This allows models to be built in
more densely populated spaces in less time and used as if they had been
built in the natural data space.
Chapter 9
A generative statistical model of
entire mammograms
9.1 Introduction
This chapter presents a parametric statistical model of the appearance of entire
mammograms. The chapter describes:
• Why mammograms are difficult to model.
• Approaches other authors have used to solve the problem.
• The structure of our model.
• How the model parameters are learned from training data.
• How synthetic mammograms can be generated using the model.
233
9.2 Background
9.2.1 Why are mammograms hard to model?
Mammograms are difficult to model because they vary dramatically in appearance
and are digitised at high resolution; their appearance is highly detailed. In this
section we will discuss the sources of this variation and comment on how it affects
the images. Figure 9.1 illustrates the effects of these sources of variation.
Size and shape variation
It is apparent that women’s breasts vary in size and shape. This variation is due
both to natural variation between individuals and to lifestyle (as breasts store
fat, overweight or obese women are likely to have larger, more
fatty breasts [54]). In addition to this natural variation, the apparent size and
shape of breasts in mammograms varies due to the imaging process (e.g. the
degree of compression). Compare Figure 9.1(b) and Figure 9.1(e).
Anatomical variation
Women’s breasts also vary in their composition. The proportion of glandular
to fatty tissue—the density—is variable, with post-menopausal women usually
having almost entirely fatty breasts. The number and configuration of ducts
varies between women, and the imaging process may capture them in varying
degree. Mammograms are digitised at high resolution and the resulting images
are therefore large and contain a lot of detail. Another form of variation that
might be considered “anatomical” is that introduced by surgery (e.g. lumpectomy
or augmentation mammoplasty), but we do not consider these types of variation
in this work. Compare Figure 9.1(g), a breast with a well-defined fibro-glandular
tissue region, to Figure 9.1(h), which is almost entirely fatty.

Figure 9.1: Examples of mammographic variation. These images are to scale.
Variation in the imaging process
Due to the manual placement of the breast in the X-ray equipment, features such
as the nipple or pectoral muscle may be absent, partially imaged, obscured or par-
tially visible. While such images can be interpreted by trained clinicians, they
pose significant difficulties for computer-based methods, which often rely
upon reliable points of reference. Further, it is not feasible for radiographers
to take more care in the acquisition process, because of the natural variation be-
tween women and due to the compression part of the process being uncomfortable
or painful. Compare Figure 9.1(k), where the pectoral muscle is not imaged, to
Figure 9.1(a), where the pectoral muscle is included. The breast in Figure 9.1(h)
has a poorly-defined border—which is probably due to the placement and compression
of the breast—while the border in Figure 9.1(c) is better defined. Non-uniformity of
the intensity of the X-ray illumination field can also result in a visible difference
in density over an X-ray image (e.g. the anode heel effect). In the next sec-
tion a review of research on modelling and synthesising the appearance of entire
mammograms is presented.
9.2.2 Approaches to modelling the appearance of entire
mammograms
The most common approach to modelling the appearance of entire mammograms
for synthesis and analysis is physics-based. Bakic et al. [10] developed a 3-D
model of the physical distribution of the various tissue types. They modelled
compression of the breast and the X-ray image formation process to generate
simulated X-rays.
Taylor et al. [176] developed a 3-D model of breast development to allow simulated
mammograms to be generated. A voxel-based cellular automaton was initialised
with a rudimentary ductal structure that represented a breast prior to maturation.
Voxels contained a mixture of fatty and glandular tissue. The ductal structure
was developed by allowing it to branch and grow. This development was driven
by simulated branching and growth agents that had either promotive or inhibitive
effects. A parameterised breast surface model was developed using data from real
women. Synthetic mammograms were formed by simulating the compression of
the breast and the projection of X-rays. The paper does not give examples of
synthetic mammograms, but some examples are available on the author’s website
[146].
Bliznakova et al. developed a highly detailed model of the structure of the breast
[16]. The authors modelled the breast surface, ductal system, terminal ductal
lobular units, Cooper’s ligaments, pectoral muscle, 3-D parenchymal texture and
several forms of abnormality. They separately modelled the X-ray image forma-
tion process. The breast shape was modelled simply as two geometrical primitives.
Large, medium and small-sized breasts were modelled separately. The ductal sys-
tem was modelled as a tree structure composed of cylindrical components and a
probabilistic model was used to characterise the branching. The 3-D parenchymal
texture was simulated by mapping 2-D fractal textures into the volumetric space.
Cooper’s ligaments were modelled by thin ellipsoidal shells occurring in random
locations within the breast. Masses were modelled by ellipsoids, spiculations by a
series of connected cylinders and microcalcification clusters by collections of small
ellipsoids. The characteristics of the abnormalities were controlled by user input.
The authors report that simulated mammograms could be generated in less than
5 minutes on a 2 GHz Intel Pentium 4 processor. Subjectively, their results are
impressive. The authors conducted a psychophysical experiment to determine the
extent to which expert radiologists could differentiate real from simulated regions
of interest. Regions of interest of size 40 mm × 40 mm were inspected on screen
and the radiologists correctly identified 80% of the simulated normal patches,
67% of the real normal patches, 87% of the simulated calcification patches, 96%
of the simulated mass patches and 100% of both the real calcification and mass
patches.
Although physics-based models are useful for investigating the image acquisition
process (e.g. patient positioning, radiation dose, breast compression and deforma-
tion), the synthetic images they produce are usually subjectively not particularly
realistic and these methods are not intended to support image analysis.
Another important approach to modelling the appearance of objects in images is
the Active Appearance Model (AAM) [47], which models shape and shape-free
appearance. The work in this chapter is closely related to the AAM, and we give
a brief overview of the method below.
Overview of AAMs
An AAM is a model of the shape and appearance of a particular class of object
in an image—e.g. a face or an anatomical structure—combined with a search
strategy that allows instances of the object to be located in previously unseen
images. The following discussion focuses on the model itself, rather than the
search strategy.
An AAM consists of shape and appearance sub-models, which are statistically
coupled. The shape sub-model is built by annotating a set of training images—
that contain instances of the object of interest—with landmarks. These land-
marks are typically positioned on salient features of the object being modelled
and must correspond across the training set (e.g. when modelling faces, if the
5-th landmark identifies the left corner of the left eye, then it must do so in all of
the training images). Each landmark in a 2-D image is represented by an (x, y)
coordinate. If each image contains N landmarks then the landmark coordinates
for each image can be concatenated to form a vector with 2N elements, which
is called a shape vector. Each of these vectors can be considered to be a point
in a 2N -dimensional space. There is likely to be significant redundancy in each
shape vector because the positions of landmarks will be correlated within each
image and across the training set. This correlation is exploited using Principal
Components Analysis (PCA, which was presented in Chapter 8). PCA allows
each training shape vector to be projected into a low-dimensional space, so that
each shape vector in the training set has a corresponding shape parameter in the
principal components space.

Figure 9.2: Overview of the Active Appearance Model. The left-most image shows one of the training images with landmarks (in green). Nine samples from the AAM are shown on the right. The top row shows the mean appearance warped to three synthetic shapes sampled from the shape sub-model. The middle row shows three samples from the appearance sub-model warped to the mean shape. The bottom row shows three joint samples (i.e. from both model components), illustrating how the model can represent a range of legal instances of human faces.
The appearance sub-model is built by warping each object in the training set
to the mean shape. This removes spatial variation from the training set—which
has already been learned—leaving “textural” variation. Triangles can be defined
between the landmarks in each training image. Image intensities are sampled
within the triangles of each warped training image. The result is a set of vectors of
intensities, one for each training image. There is a dense correspondence between
the elements of the vectors. PCA is applied to exploit the redundancy in these
texture vectors yielding a set of texture parameters in a low-dimensional space.
The shape and texture parameters for each training image are concatenated and
a further PCA is performed. This couples shape and appearance and exploits
any correlation between the two. The result is a set of low-dimensional vectors
that describes both shape and appearance of the objects in the training set. The
distribution of these vectors is modelled, typically using a multivariate normal
distribution. The model can be sampled and the corresponding vector recon-
structed to form a synthetic object in the image plane. The model can also be
used to constrain the AAM search strategy. An example is shown in Figure 9.2,
which illustrates an AAM of the human face. The figure shows landmarks for
one training image and nine samples from the AAM.
AAMs rely on finding sufficient redundancy in the intensity information across a
set of training images to dramatically reduce the dimensionality of the appearance
space, thus making it possible to train the model with a reasonably small set of
images (e.g. 30). However, the highly detailed nature of mammograms would not
be captured by the AAM approach.
9.3 Modelling and synthesising entire mammo-
grams
We assume that the mammograms we will model are all of the same view,
e.g. mediolateral oblique (MLO) or cranial-caudal (CC); in practice we have
worked with the MLO view. We decompose the problem of modelling mam-
mograms, combining an AAM-like model of global shape and appearance with
a wavelet-based model of stationary texture, allowing us to bypass the curse of
dimensionality [14]. The model is composed of three sub-models: a model of
shape, a model of approximate appearance1, and a model of local textural ap-
pearance. We have trained a model using a set of 36 mammograms from the Dig-
ital Database for Screening Mammography [83]. The number of training images
was limited by a computational “bottleneck”—solving the shape correspondence
problem—which we discuss further in the next section. After outlining a series
of pre-processing steps, we describe each of these sub-models and show how they
can be combined to synthesise mammograms.
¹ Shape and approximate appearance are jointly modelled.
9.3.1 Breast shape and the correspondence problem
We assume a training set of images B, which are mammograms with the non-
breast regions (e.g. markers) set to black, and normalised such that all breasts
“point” to the right (i.e. the nipple is on the right). As in an AAM, a statistical
shape model (SSM) is used to cope with size and shape variation. A set of land-
mark points is required to define the shape of the breast in each image. These
landmarks must correspond across the training set. In the SSM framework, land-
marks are often manually placed and chosen to correspond to easily identifiable
image features (e.g. the tips of the fingers when modelling hand shapes). As
mammograms lack reliable features, we seek to automate annotation. A naïve
approach is to use landmarks placed at regular intervals on the breast borders,
starting at a reliable location (e.g. the right-most point of the breast, the approx-
imate location of the nipple). Such landmarks are a good first approximation.
However, synthetic shapes generated from a shape model built using landmarks
with such correspondences are subjectively unrealistic. This is demonstrated in
the top row of Figure 9.3. Better correspondences are required.
Approaches proposed by Kotcheff and Taylor [115] and Davies et al. [51, 50]
seek to improve correspondences across a set of training shapes. The idea is to
search over parameterisations of the training shapes to find a model that best
describes the training set. The training shapes are re-parameterised using a set
of monotonic mappings which guarantee that the landmark ordering is preserved
across the training set, and so the mapping is diffeomorphic2. These methods
aim to find an optimal set of re-parameterisations—according to some concept of
goodness—and so the problem is posed as an optimisation.

² A diffeomorphism is a mapping that does not tear or fold the manifold.

Figure 9.3: Samples from two shape models, illustrating the need for good correspondences. Top row: Samples from a shape model built using regularly-spaced landmarks. Bottom row: Samples from a shape model built using optimal correspondences. Examples of real mammogram shapes, taken from the training set, are shown in Figure 9.6.
The main difference between the two methods is the choice of the objective func-
tion. Kotcheff and Taylor’s method uses the determinant of the (re-parameterised)
shape model’s covariance matrix, which effectively measures the hyper-volume oc-
cupied by the training set in shape space; minimising the measure yields more
compact models. The method proposed by Davies et al. uses an information
theoretic measure of model quality. Their objective function computes the num-
ber of bits required to transmit the training set by encoding it using a (re-
parameterised) shape model. In order for a receiver to understand the message,
the model must also be transmitted, and so contributes to the objective function.
The authors refine the piecewise-linear re-parameterisation method presented by
Kotcheff and Taylor using the integral of a sum of Cauchy kernels, ensuring that
the re-parameterisation functions are differentiable and hence more suited for
use in an optimisation scheme. Additionally, an efficient optimisation scheme is
presented.
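As an illustrative sketch of the determinant-based objective (our own paraphrase; the published method differs in detail, for example in how it handles small eigenvalues), the measure can be computed from the training shapes as a regularised log-determinant of their covariance:

```python
import numpy as np

def kt_objective(shapes, eps=1e-6):
    """Regularised log-determinant of the shape covariance matrix, a
    paraphrase of the Kotcheff and Taylor compactness objective.  It
    measures the hyper-volume occupied by the training set in shape
    space; smaller values indicate a more compact model.

    shapes : (n_shapes, 2*N_l) matrix of concatenated landmark coords
    eps    : small regulariser keeping the value finite when the
             covariance is rank-deficient
    """
    X = shapes - shapes.mean(axis=0)
    C = X.T @ X / (len(shapes) - 1)          # sample covariance
    eigvals = np.linalg.eigvalsh(C)          # log det = sum of log eigenvalues
    return float(np.sum(np.log(eigvals + eps)))
```

Re-parameterising the training shapes moves the landmark positions, changes this covariance, and hence changes the objective; the optimisation searches for the re-parameterisation that minimises it.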
Although the method of Davies et al. is rigorously justified, it is closely approx-
imated by that of Kotcheff and Taylor, whose method generally finds a good
solution to the correspondence problem more quickly. We initialise Kotcheff and
Taylor’s scheme with regularly-spaced landmarks, and run their optimisation.
We refine the improved correspondences using the minimum description length
(MDL) scheme of Davies et al. Figure 9.4 shows the value of the Kotcheff and
Taylor objective function as a function of the iteration number for our training
set and Figure 9.5 shows the objective function values for the MDL algorithm3.
³ Note that the two methods’ objective functions are expressed in different units.
Figure 9.4: Values of the Kotcheff and Taylor objective function.
Figure 9.6 shows selected points from the initialisation and final solution, and
illustrates how the correspondences are improved. Although the difference is
subtle, and the solution is not necessarily intuitive, Figure 9.3 demonstrates the
importance of good correspondences: the improved correspondences yield an SSM
that successfully limits illegal variation.
The final shape model has the form:
$$s = \bar{s} + P_s b_s \qquad (9.1)$$
where $s$ is a shape parameterised by $b_s$, $\bar{s}$ is the mean shape, and $P_s$ is a matrix
whose columns are a set of eigenvectors of the shape data covariance matrix,
sufficient for the model to retain a given proportion of the total variance of the
original data.

Figure 9.5: Values of the MDL objective function. The solution found by running the Kotcheff and Taylor algorithm (see Figure 9.4) is refined in this run of the MDL algorithm.

The number of retained eigenvectors, $d_s$, is typically much smaller
than the dimensionality of the original space: $d_s \ll 2N_l$ (where there are $N_l$
landmarks for each image), so the distribution of shape parameters can be learned
from a reasonably small training set.
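A minimal sketch of building the linear model of Equation 9.1 with NumPy follows; the function name, the SVD route to the eigenvectors, and the variance-retention rule are our own choices, not taken from the thesis:

```python
import numpy as np

def build_shape_model(shapes, retained=0.90):
    """Build a linear shape model s = s_bar + P_s b_s from a
    (n_shapes, 2*N_l) matrix of concatenated landmark coordinates,
    keeping enough eigenvectors to explain `retained` of the variance."""
    s_bar = shapes.mean(axis=0)
    X = shapes - s_bar
    # Eigenvectors of the covariance matrix via SVD of the centred data.
    _, svals, Vt = np.linalg.svd(X, full_matrices=False)
    var = svals**2 / (len(shapes) - 1)
    frac = np.cumsum(var) / var.sum()
    d_s = int(np.searchsorted(frac, retained)) + 1   # number of retained modes
    P_s = Vt[:d_s].T                                  # columns are eigenvectors
    b = X @ P_s                                       # training shape parameters
    return s_bar, P_s, b
```

The same recipe is reused later in the chapter for the approximate appearance model (Equation 9.2), only with pyramid coefficient vectors in place of landmark vectors.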
The computational cost of optimising the correspondences is related to the num-
ber of training shapes. This limits the number of training images that can be used
to build the model of entire mammographic appearance—this is the “bottleneck”
mentioned in the previous section. It should be possible to allow an arbitrarily
large number of training images to be used. Correspondences could be optimised
for a relatively small set of training shapes and an active shape or appearance
model could then be built and used to locate the breast border in other training
images [165]. Hence correspondences in these other training images would be
defined implicitly. However, this work is beyond the scope of this thesis.

Figure 9.6: The initial and final correspondences for the mammogram shape model. The figure shows every 10-th point for 6 of the 36 training shapes. The top row shows the initial positions, and the bottom row shows the final solution.
9.3.2 Approximate appearance
We consider mammographic appearance to have two components: an approx-
imate appearance (the general global appearance of the mammogram) and a
detailed appearance (the local textural details, see Section 9.3.3). In this section
we describe how approximate appearance is modelled. We address the appear-
ance correspondence problem, describe how approximate appearance is separated
from detailed appearance and describe how approximate appearance is related to
breast shape.
The appearance correspondence problem
To model the approximate appearance, we need to consider the appearance cor-
respondence problem. If we could guarantee that all mammograms contain the
same features, then we could define dense correspondences between the contents
of a set of mammograms. This is not the case and, due to anatomical differences
between women, there are no underlying correspondences that can be exploited.
We choose to cope with this form of variation implicitly by approximately regis-
tering breasts to a canonical shape and then learning the variation in appearance
(the mean shape s provides a natural canonical reference shape). For each seg-
mented breast in B, we use a thin plate spline [20] to warp to the mean shape,
yielding a set N of segmented breasts in a shape-normalised space. The thin
plate spline does not guarantee diffeomorphic transformations, but since we do
not use control points within the breast region, and the order of control points is
preserved by the correspondence optimisation algorithms, the resulting warps are
well-behaved. An alternative approach would be to use a non-rigid registration
algorithm to define the correspondences between points within the breast region,
but we have not investigated this.
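For illustration, a thin plate spline warp of this kind can be expressed with SciPy's RBFInterpolator (a modern substitute for the implementation used in the thesis; the function below is our own sketch):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def warp_points(src_landmarks, dst_landmarks, points):
    """Map `points` through a thin plate spline defined by corresponding
    landmark sets (both of shape (N_l, 2)).  Used here to warp each
    segmented breast to the mean shape s_bar."""
    tps = RBFInterpolator(src_landmarks, dst_landmarks,
                          kernel='thin_plate_spline')
    return tps(points)
```

In practice the warp is applied to every pixel coordinate of the segmented breast; because the control points lie only on the breast border and their ordering is preserved, the resulting warps are well-behaved, as noted above.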
The steerable pyramid decomposition
In modelling mammographic appearance, we would like to be able to treat the
appearance of each mammogram as a point in an appearance space so that the
appearance could be modelled using straightforward statistical methods. Al-
though the number of pixels in a mammogram is very large, if we could exploit
redundancy in the shape-normalised appearance—for example by defining dense
correspondences—we might be able to populate the appearance space sufficiently
for density estimation to be successful. Unfortunately, this is not the case. To
overcome this problem we use a hierarchical decomposition called the steerable
pyramid [164]. This is a wavelet-like image decomposition developed for use in
texture modelling and synthesis. Images are decomposed in terms of multiple
scales and orientations using directional derivative basis functions which range in
scale and orientation. This allows the coarse and fine structure of the images to
be treated separately within a single framework.
The steerable pyramid was selected because it decomposes images in terms of
scale and orientation (which have been found to be useful in mammography
applications, see Chapter 4); it has been used successfully in texture modelling and
synthesis [143, 144]; the decomposition is motivated by knowledge of biological
vision ([183] discusses the work of Hubel [93] and Wiesel [184]); and there is a
freely available implementation4. See Section 9.3.3 for further notes on the use
of the steerable pyramid.
Figure 9.7 shows a block diagram of the decomposition. Analysis is shown on
the left-hand side. The image is separated into high- and low-pass sub-bands
using filters H0 and L0. The low-pass sub-band is then separated into a series
of oriented bandpass sub-bands and another low-pass sub-band using filters {Bi}
and L1. This low-pass sub-band image is then sub-sampled by a factor of two
in each direction and the result passed recursively to {Bi} and L1, as indicated
by the dark circle and shaded region in Figure 9.7. Synthesis is shown on the
right-hand side of Figure 9.7, and involves reversing the analysis steps.
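The steerable pyramid itself is intricate to implement, but the recursive analyse/synthesise structure of Figure 9.7 can be illustrated with a much simpler Laplacian-style pyramid (no oriented sub-bands; this is an analogy of our own, not the steerable pyramid):

```python
import numpy as np

def _blur(im):
    # Separable 1-2-1 binomial blur with edge replication.
    k = np.array([0.25, 0.5, 0.25])
    p = np.pad(im, ((1, 1), (0, 0)), mode='edge')
    im = k[0]*p[:-2] + k[1]*p[1:-1] + k[2]*p[2:]
    p = np.pad(im, ((0, 0), (1, 1)), mode='edge')
    return k[0]*p[:, :-2] + k[1]*p[:, 1:-1] + k[2]*p[:, 2:]

def _upsample(im, shape):
    up = np.repeat(np.repeat(im, 2, axis=0), 2, axis=1)
    return _blur(up)[:shape[0], :shape[1]]

def analyse(image, levels=3):
    """Recursive coarse-to-fine decomposition: each band stores the detail
    lost when the image is blurred and sub-sampled by two."""
    bands, g = [], image.astype(float)
    for _ in range(levels):
        low = _blur(g)[::2, ::2]
        bands.append(g - _upsample(low, g.shape))
        g = low
    bands.append(g)            # residual low-pass image
    return bands

def synthesise(bands):
    """Reverse the analysis steps: upsample and add the detail back in."""
    g = bands[-1]
    for detail in reversed(bands[:-1]):
        g = detail + _upsample(g, detail.shape)
    return g
```

As in Figure 9.7, synthesis simply reverses analysis, and the reconstruction is exact by construction; the steerable pyramid adds the oriented band-pass filters {Bi} at each level.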
We can think of the steerable pyramid decomposition as having a structure similar
to a quad-tree. The pyramid has a number of levels which correspond to scale,
and range from coarse (a few pixels square) to fine (the same size as the original
image). Each level has a number of oriented sub-band images. In addition, there
is a coarse low-pass sub-band and a fine high-pass sub-band. Although there are
more coefficients in the pyramid than pixels in the original image, the hierarchical
structure of the pyramid allows us to decompose our modelling problem further.
We can consider the top part of the pyramid (the coarse levels) separately to
the bottom part of the pyramid (the fine levels). Figure 9.8 shows the top three
pyramid levels for a mammogram.

⁴ The steerable pyramid software is currently available at http://www.cns.nyu.edu/~eero/steerpyr/

Figure 9.7: Block diagram for the steerable pyramid decomposition. Analysis is shown on the left and synthesis is shown on the right. The dark circle indicates the recursive computation of the shaded region. The {Bi} filters compute the oriented sub-band images.
The approximate appearance model
We want to be able to represent the general appearance of a mammogram in a
way that allows us to subject it to statistical analysis. We decompose each image
in N to form the set of pyramids P . For each pyramid in P , we concatenate the
coefficients in the top few pyramid levels into a vector a. This vector describes
the approximate appearance of the shape-normalised mammogram. We again
perform PCA, yielding
$$a = \bar{a} + P_a b_a. \qquad (9.2)$$
Initially, the coefficients in each pyramid level are effectively measured on different
intensity scales.

Figure 9.8: The coefficients in the top three levels of a steerable pyramid decomposition of a mammogram. The breast is oriented so that the nipple points downwards. The oriented sub-bands are shown as L-shaped sets of images and the final low-pass image is in the top-right corner of the image. The arrows indicate the orientation of the filters used to compute the five sub-band images. The high-pass image is not shown.

In order to use a covariance matrix to model the distribution
of such data—either for its own sake, or to perform PCA—it is best to use a
common scale. We normalise the data in each dimension either to z-scores as
described by Equation 9.3 [45], or to a common scale using a robust M-estimator
of spread [145], depending upon the characteristics of the data. If $x_i$ is a data
point from a sample with mean $\bar{x}$ and standard deviation $\sigma$ then $z_i$, the z-score
for $x_i$, is given by:

$$z_i = \frac{x_i - \bar{x}}{\sigma}. \qquad (9.3)$$
For simplicity, the conversion to and from these standard scales is assumed in the
rest of this chapter.
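The per-dimension z-score normalisation of Equation 9.3 and its inverse can be sketched as follows (the robust M-estimator variant is omitted; names are our own):

```python
import numpy as np

def to_z_scores(X):
    """Normalise each dimension (column) of X to z-scores (Equation 9.3),
    returning the statistics needed to invert the mapping."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std, mean, std

def from_z_scores(Z, mean, std):
    """Invert the z-score mapping, recovering the original scale."""
    return Z * std + mean
```

Storing the per-dimension mean and spread is what allows the conversion to and from the standard scale to be left implicit in the rest of the chapter.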
A joint model of shape and approximate appearance
We have described how we can model mammographic appearance in the shape-
normalised space. To perform synthesis we need to be able to warp to a plausible
shape. A naïve approach would be to model the distribution of the shape param-
eters bs and then sample from it. However, this would not take into consideration
the fact that there may be a relationship between the appearance of a mammo-
gram and its size and shape. For example, fatty breasts tend to be large, while
glandular breasts tend to be small. The approach we take is to model the joint
distribution of shape parameters and approximating appearance parameters and
then condition this model on the approximating parameters to yield a model of
plausible shapes for the generated mammogram. We use a single multivariate
Gaussian:
$$p(b_s, b_a) = p(b_j) = N(m_j, \Sigma_j). \qquad (9.4)$$
9.3.3 Detailed appearance
The approximating model provides a first approximation to the mammographic
appearance, but does not include any information from the lower pyramid levels.
We call these the detailing levels. A parent vector is defined as the set of coef-
ficients on a path through a pyramid at locations corresponding to a particular
pixel in the original image [53]. The parent vector contains information about
the local image behaviour at a particular location, from the coarsest level to the
finest. Using a notation similar to that in Figure 9.7, a parent vector, bt(x, y),
corresponding to a particular (x, y) location in the original image is given by:
$$b_t(x, y) = \big[ H_0(x, y),\; B^1_0(\lfloor x/2^1 \rfloor, \lfloor y/2^1 \rfloor),\; B^1_1(\lfloor x/2^1 \rfloor, \lfloor y/2^1 \rfloor),\; \cdots, \qquad (9.5)$$
$$B^2_0(\lfloor x/2^2 \rfloor, \lfloor y/2^2 \rfloor),\; B^2_1(\lfloor x/2^2 \rfloor, \lfloor y/2^2 \rfloor),\; \cdots,\; L_{M-1}(\lfloor x/2^{M-1} \rfloor, \lfloor y/2^{M-1} \rfloor) \big]^T,$$
where there are $M$ levels, $H_0(x, y)$ is the coefficient at $(x, y)$ in the high-pass band,
$B^j_i(x, y)$ is the coefficient at $(x, y)$ in the $i$-th oriented sub-band at the $j$-th level,
$L_{M-1}(x, y)$ is the coefficient at $(x, y)$ in the low-pass band and the subscript $t$ in
$b_t$ indicates texture. The floor function—which returns the largest integer that
is less than or equal to $x$—is denoted by $\lfloor x \rfloor$. It serves here to ensure that the
sub-bands are correctly indexed. In the remainder of this chapter we will drop the
(x, y) indexing notation, as we assume that the detailed textural component of
mammographic appearance is stationary. This assumption makes the problem of
modelling detail tractable. It is reasonable because we might expect local detail
to depend only on tissue type, which is modelled implicitly by the approximating
model.
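Extracting a parent vector per Equation 9.5 is just hierarchical indexing with floor division. The sketch below assumes a simplified pyramid layout (plain arrays, with each level sub-sampled by a further factor of two); the names and layout are our own, not the steerable pyramid software's:

```python
import numpy as np

def parent_vector(H0, oriented, low, x, y):
    """Extract the parent vector b_t(x, y) of Equation 9.5.

    H0       : high-pass sub-band (same resolution as the image)
    oriented : oriented[j-1] is the list of oriented sub-band images at
               level j, each sub-sampled by a factor of 2**j
    low      : coarsest low-pass sub-band, sub-sampled by 2**(M-1)
    """
    coeffs = [H0[y, x]]
    for j, bands in enumerate(oriented, start=1):
        xs, ys = x >> j, y >> j          # floor(x / 2**j), floor(y / 2**j)
        coeffs.extend(band[ys, xs] for band in bands)
    M = len(oriented) + 2                # high-pass + oriented levels + low-pass
    coeffs.append(low[y >> (M - 1), x >> (M - 1)])
    return np.array(coeffs)
```

Each call walks one path from the root of the quad-tree-like pyramid down to a single image location, which is exactly the "path through a pyramid" described above.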
We consider a parent vector bt to be a point in a high dimensional vector space.
A suitable model of the distribution of parent vectors, p(bt), would allow the
detailing levels to be populated by sampling the model, conditioned upon the
coefficients in the approximating levels. This approach is motivated by previous
work on hierarchical texture modelling by De Bonet and Viola [53] and Sajda
et al. [159]. Multivariate Gaussian, or mixture of multivariate Gaussian, repre-
sentations are ideal for this purpose as there is a closed-form solution for the
conditional Gaussian (see Section 5.5). $p(b_t)$ is modelled as $N(\mu_t, \Sigma_t)$.
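The closed-form conditional referred to here (Equation 5.33) can be sketched generically: given a joint Gaussian over observed and unobserved dimensions, the conditional mean and covariance follow from block operations on the covariance matrix. The function name and index convention below are our own:

```python
import numpy as np

def condition_gaussian(mu, Sigma, idx_obs, x_obs):
    """Condition N(mu, Sigma) on observed values x_obs at dimensions
    idx_obs, returning the mean and covariance of the remaining
    (unobserved) dimensions."""
    n = len(mu)
    idx_un = np.setdiff1d(np.arange(n), idx_obs)
    mu_u, mu_o = mu[idx_un], mu[idx_obs]
    S_uu = Sigma[np.ix_(idx_un, idx_un)]
    S_uo = Sigma[np.ix_(idx_un, idx_obs)]
    S_oo = Sigma[np.ix_(idx_obs, idx_obs)]
    # mu_u + S_uo S_oo^{-1} (x_obs - mu_o);  S_uu - S_uo S_oo^{-1} S_ou
    mu_cond = mu_u + S_uo @ np.linalg.solve(S_oo, x_obs - mu_o)
    Sigma_cond = S_uu - S_uo @ np.linalg.solve(S_oo, S_uo.T)
    return mu_cond, Sigma_cond
```

In the detailing step, the observed dimensions are the approximating-level coefficients of a parent vector and the unobserved dimensions are the detailing coefficients to be sampled.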
Although we use the steerable pyramid, the choice of decomposition is probably
not critical: other authors have reported success with hierarchical conditioning of
wavelet coefficients for texture modelling, synthesis and analysis and presumably
use different decompositions (e.g. [159, 53]).
9.3.4 Generating synthetic mammograms
Algorithm 11 describes how synthetic mammograms are generated.
Algorithm 11 Generating a synthetic mammogram
1. Simultaneously sample an approximate appearance parameter, ba, and a shape parameter, bs, from the joint model of p(bs, ba). This is equivalent to sampling one and then conditionally sampling the other.
2. Reconstruct the approximating steerable pyramid coefficients by projecting ba back to the natural space to yield the corresponding a.
3. for each (x, y) location within the shape-normalised breast region do
   (a) Sample the parent vector at the current location. The detailing coefficients will be unpopulated. Compute the distribution of detailing coefficients by conditioning the model of p(bt) on the approximating coefficients in the sampled parent vector, using Equation 5.33.
   (b) Sample from this conditional distribution, and place the sampled detailing coefficients into the parent vector at the current location.
   (Because the steerable pyramid may not be a perfect quad-tree, the above two steps are implemented as iterations over the pyramid levels.)
   end for
4. Reconstruct the fully-populated pyramid to form the corresponding image in the shape-normalised space.
5. Project the shape parameter bs to its natural space, yielding the shape that corresponds to the parameter.
6. Warp the reconstructed image to the sampled shape.
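The structure of Algorithm 11 can be illustrated with a deliberately minimal one-dimensional analogue, in which one Gaussian coefficient per location plays the role of the approximating levels and one conditionally sampled coefficient plays the role of the detailing levels. All numbers below are invented; the real algorithm operates on steerable pyramid coefficients and shape-normalised images.

```python
import numpy as np

rng = np.random.default_rng(1)
n_locations = 8

# Hypothetical joint Gaussian over (approximating, detailing) coefficients;
# in the thesis this is learned from parent vectors extracted from real data.
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])

# Steps 1-2: sample the approximating coefficients for every location.
approx = rng.normal(mu[0], np.sqrt(Sigma[0, 0]), size=n_locations)

# Step 3: condition the detail on the sampled approximation (the analogue
# of Equation 5.33 reduces to 1-D linear-Gaussian conditioning here).
k = Sigma[1, 0] / Sigma[0, 0]
cond_mu = mu[1] + k * (approx - mu[0])
cond_var = Sigma[1, 1] - k * Sigma[0, 1]
detail = rng.normal(cond_mu, np.sqrt(cond_var))

# Step 4: "reconstruct" the signal by recombining the two bands.
image = approx + detail
```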
9.4 Example synthetic mammograms
We selected 36 pathology-free MLO mammograms—ranging in size, shape and
appearance—from the Digital Database for Screening Mammography (DDSM)
[83] and built a model of mammographic appearance as described in Section 9.3.
The training set had relatively few images because the optimisation of the breast
boundary landmark correspondences is computationally expensive. Such a small
training set cannot represent the full variation in mammographic appearance,
although the synthetic images generated by the model are subjectively quite
realistic (future work should investigate whether realistic results can be achieved
with models trained on larger training sets).
One hundred (100) landmark points were used to define the breast boundary. We
used 7 pyramid levels—including the high- and low-pass sub-bands—each with 5
orientations. The top three pyramid levels were included in the approximating
model. These had 159 420 coefficients prior to PCA. We found that retaining
90% of the total variance in the shape model and 99% in the approximating ap-
pearance models yielded compact models that produced convincing results when
sampled. One hundred thousand (100 000) locations within the breast regions
were randomly selected and the corresponding parent vectors were extracted.
Their distribution was modelled using a single multivariate Gaussian component,
as described in Section 9.3.3.
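Choosing the number of retained principal components from a variance target, as done above for the shape and approximating appearance models, can be sketched as follows. This is a generic illustration with synthetic data, not the thesis code.

```python
import numpy as np

def n_components_for(variance_fraction, X):
    """Number of principal components needed to retain the given
    fraction of total variance in data matrix X (rows = samples)."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centred data give the component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, variance_fraction) + 1)

rng = np.random.default_rng(2)
# Synthetic data with decaying variance along 10 dimensions.
X = rng.standard_normal((50, 10)) @ np.diag(np.linspace(3.0, 0.1, 10))
k90 = n_components_for(0.90, X)
k99 = n_components_for(0.99, X)
```

A 99% target always requires at least as many components as a 90% target, which is why the appearance models above are larger than the shape model.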
Building the model took approximately 24 hours (most of this time was spent
computing the optimal shape correspondences). Producing a synthetic mammo-
gram takes approximately 2.5 hours (almost all of this time is spent sampling the
conditional parent vectors)5. Figure 9.9 shows some synthetic mammograms that
were generated using our model and Figure 9.10 shows some synthetic mammo-
grams alongside a real mammogram.
9.5 Summary
This chapter presented a generative statistical model of the appearance of entire
mammograms. In summary:
• The appearance of entire mammograms is difficult to model because of the
variation between women, variability in the imaging process and the high
resolution of the images.
• Our model is composed of components that model the breast shape, the
approximate appearance and detailed texture. Detailed texture is assumed
to be stationary. The three model components are statistically coupled, so
that plausible synthetic mammograms can be generated.
• The breast shape model can be learned by solving the shape boundary
landmark correspondence problem. Algorithms developed by Kotcheff and
Taylor and Davies et al. were used to solve this problem.
• Synthetic mammograms can be generated by sampling from the joint model
of shape and approximate appearance and sampling detailing coefficients
using a hierarchical conditioning method.
5 Timings are for a computational server with a 2.8 GHz Intel Xeon processor and 2 GB of RAM.
Figure 9.9: Synthetic mammograms generated using the model.
Figure 9.10: Real and synthetic mammograms. A real mammogram is shown on the left and three synthetic mammograms are shown on the right.
Chapter 10
Evaluating the synthetic
mammograms
10.1 Introduction
This chapter presents an evaluation of the synthetic mammograms produced by
the model described in the previous chapter. The chapter describes:
• A qualitative evaluation of the synthetic mammograms by an expert mam-
mography radiologist.
• A quantitative psychophysical evaluation of the synthetic mammograms.
• An evaluation of the detailing component of the model.
10.2 Qualitative evaluation by a mammography
expert
An expert mammography radiologist evaluated our synthetic mammograms in
a psychophysical experiment. We printed real and synthetic mammograms onto
quality A4 paper using a high-quality laser printer. For the real mammograms, we
used the real mammograms from N (i.e. without markers and other non-breast
regions). While one should not generally test using a training set, we believed
that although the synthetic mammograms are quite realistic, they would not be
good enough to convince an expert radiologist, and so the experiment would not
be biased by testing with training data.
We presented a “shuffled” set of 13 real and 13 synthetic full resolution mammo-
grams to the radiologist and asked them to rank the mammograms according to
how realistic they were. The radiologist was quickly able to sort the mammograms
into the two sets. Although they were able to identify the synthetic mammograms,
their feedback was positive and the most useful feedback was obtained in infor-
mal discussion. The radiologist said that some of the synthetic mammograms
looked ‘quite realistic’. One of the ways they could identify the synthetic images
was by the lack of blood vessels, lymph nodes and benign calcifications. Such
structures exist at the boundary of the approximate appearance model and the
detailing texture model and are not captured by our current model. The radiologist
pointed out that our synthetic mammograms were 'a little fuzzy' and lacked
'dark regions'; the latter criticism can probably be attributed to the relatively
small training set. The radiologist said that one of the synthetic mammograms—a
large, fatty breast—was unrealistic. The radiologist was dismissive of the quality
of the other examples of synthetic mammograms and mammographic textures
in the literature, and considered our synthetic images to be superior (though it
should be noted that realism is not the aim of some of these methods).
10.3 A quantitative psychophysical evaluation
10.3.1 Aims
Aware that the lack of blood vessels, lymph nodes and benign calcifications made
the difference between the real and synthetic mammograms more obvious, we
wanted to determine whether the two classes could be distinguished when such
features could not be used as prompts.
10.3.2 Method
We formed sets of 7 real and 7 synthetic mammograms at low resolution. The real
mammograms were manually selected such that the set did not contain any with
very strong vascular clues. The synthetic set contained mammograms generated
using our model. Some "fatty" synthetic mammograms were excluded because
similar real mammograms had often been excluded from the set of reals for
containing strong vascular clues1. All selected mammograms were reduced in size
such that the remaining vascular clues could not easily be perceived in the set of
real mammograms. The resulting images were small (approximately 200 × 140
pixels), but at this resolution the synthetic mammograms contained contributions
from both the approximating and detailing models. Each real mammogram
in the set was paired with each synthetic mammogram to form a test set of 49
pairs. The number of images used was limited by the time available to synthesise
the set of synthetic mammograms. Moreover, we were not sufficiently confident
that the synthetic mammograms would be realistic enough to be confused with
the real mammograms, and so a larger experiment would not have been justifiable.

1 It is also the case that the fatty mammograms generated using our model were deemed to be less realistic by the expert mammography radiologist.
We recruited five participants2 and allowed them to study a training set of 6 real
mammograms, scaled to fit within a 1024 × 768 pixel computer display. They
then performed a forced choice experiment, in which they were asked to guess
the real mammogram from each of the 49 possible pairings of real and synthetic
mammograms.
10.3.3 Results
At the end of the experiment, the participants were asked if they could tell the
difference between the real and synthetic mammograms: none of the subjects
believed that they had been able to identify the real mammograms reliably.
A χ2 analysis (see Section 7.2.2) showed that one participant did no better than
random at the 95% significance level. The other participants differed significantly
from random, but consistently mistook the synthetic mammograms for the real
2 Computer vision researchers from the Division of Imaging Science and Biomedical Engineering at the University of Manchester.
ones. Between them, the participants correctly identified 75 real mammograms
out of 245 (31%). If we allow the consistent misclassification to count as correct
identification of the real mammograms, the participants collectively identified 191
real mammograms out of 245 (78%).
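The pooled result can be checked with a Pearson χ2 test against chance. The counts below are those reported above; note that the thesis applies the test per participant (Section 7.2.2), so this pooled calculation is only a simple illustration.

```python
# Aggregate counts from the experiment: the participants identified the
# real mammogram in 75 of 245 trials.
correct, trials = 75, 245
expected = trials / 2           # chance level in a 2-alternative forced choice

# Pearson chi-squared statistic against chance, 1 degree of freedom.
chi2 = ((correct - expected) ** 2 / expected
        + ((trials - correct) - expected) ** 2 / expected)

# The critical value for alpha = 0.05 with 1 degree of freedom is 3.841.
significant = chi2 > 3.841
```

The pooled statistic is far above the critical value, consistent with the consistent (but inverted) choices described in the text.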
10.3.4 Discussion
These results show that, although the reduced resolution synthetic images are not
always indistinguishable from real mammograms, they are sufficiently convinc-
ing to make discrimination difficult. The fact that several subjects consistently
selected the synthetic mammograms as the real ones implies that the differences
were very subtle. It is interesting that the participants did not think they could
tell the difference, even though the statistical analysis indicates otherwise. It is
possible that the results can be attributed to the relatively small set of images
used to train the model, to the small number of images used to "train" the
non-expert readers, or to the selection of the real images used in the experiment. It
would therefore be unwise to generalise the above result.
10.4 Evaluating the detailing model
It is difficult to show the contribution made by the detailing model—either on
screen or in print—by examining entire mammograms, because of the high res-
olution of the images. Using a region of interest makes the contribution to the
textural appearance visible.
Figure 10.1 shows the contribution made by the detailing levels to regions of
interest from a real and synthetic mammogram. The left-hand column shows
contributions for a real mammogram and the right-hand column shows contribu-
tions for a synthetic mammogram. The top row shows the contributions made
by the finest pyramid level, the second row shows the contributions made by the
finest and next-finest pyramid level, and so on. The bottom row shows the con-
tributions made by all detailing levels. These contribution images were computed
by taking the pixel-wise differences between regions that were reconstructed with
and without the corresponding detailing levels. The real mammogram was se-
lected to be subjectively similar in appearance to the synthetic mammogram (to
allow comparison of the contribution images) and the regions of interest were
extracted from approximately the same location in each image. The detail model
can be evaluated by comparing the textural characteristics of the real and syn-
thetic contribution images.
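The pixel-wise difference computation described above can be sketched directly. The arrays below are synthetic stand-ins for reconstructed regions of interest; the function name is invented.

```python
import numpy as np

def contribution_image(recon_with, recon_without):
    """Pixel-wise contribution of a set of detailing levels, computed as
    the difference between reconstructions with and without those levels."""
    return recon_with - recon_without

# Toy regions standing in for two reconstructions of the same patch.
rng = np.random.default_rng(3)
base = rng.standard_normal((16, 16))       # reconstruction without detail
detail = 0.1 * rng.standard_normal((16, 16))
contrib = contribution_image(base + detail, base)

# The spread of the contribution image is the quantity compared between
# real and synthetic patches in the discussion that follows.
spread = contrib.std()
```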
The images in the top row of Figure 10.1 subjectively have almost identical tex-
tures. The coefficients at this level are likely to represent high-frequency signals
such as “noise”. Subjectively, the images in the second row are texturally very
similar, but the real mammographic data has a slightly larger range. The images
in the third row are also subjectively similar, but the real data contains structure
corresponding to curvilinear features. This leads the real data to have a larger
range than the synthetic data. The images in the fourth row—showing all contri-
butions made by the detailing coefficients—are subjectively similar, but there are
large contributions made by curvilinear features in the real data. The histograms
of the contribution images in the bottom row show that the distributions of the
Figure 10.1: Contributions of detailing coefficients to real and synthetic mammograms. Left-hand column: contributions for a real mammogram. Right-hand column: contributions for a synthetic mammogram. See text for details.
difference values are approximately normal. The standard deviation of the real
data is approximately twice that of the synthetic data.
The contribution images in Figure 10.1 correspond to different mammograms,
but they allow us to draw some conclusions about how the detailing model
works with the approximating model to synthesise mammographic texture. Given
approximating coefficients for two similar mammograms, the detailing model is
subjectively successful in capturing the characteristics of the finest two levels.
Subjectively, the second most coarse level is also modelled reasonably well. The
coarsest level is not modelled particularly well. This is because the detailing
model assumes stationarity, but in reality the level is dominated by curvilinear
structures. These structures feed down to the second most coarse detailing level to
some extent. Some small curvilinear structures are also found at the second most
coarse detailing level. Similar results are obtained when detailing coefficients are
sampled for the approximating coefficients of a real mammogram.
The contribution images show that the use of a single multivariate Gaussian
component adequately models the detailed texture component of mammograms.
There is little evidence to suggest that a more complex model (such as a mixture
of Gaussians) would dramatically improve the stationary aspects of the detailed
texture. However, it is clear that modelling curvilinear structures is of vital im-
portance to the detailed texture. These long range structures tend to be most
evident in the coarsest detailing level. The model cannot currently capture such
structures. Learning legal configurations of curvilinear features within a statisti-
cal framework is likely to be a significant challenge. One approach to this problem
would be to extract networks of curvilinear structures using a method such as
that presented by Zwiggelaar and Marti and statistically model characteristics of
curvilinear structure length, width, tortuosity and branching [191]. By learning
the joint distribution of these features and approximating parameters, it may be
possible to determine and synthesise the correct types of curvilinear networks for
a particular type of breast.
10.5 Summary
This chapter presented an evaluation of synthetic mammograms generated using
the model developed in the previous chapter. In summary:
• An expert mammography radiologist could easily distinguish between real
and synthetic mammograms. However, they commented that some of the
synthetic mammograms looked ‘quite realistic’. The lack of blood vessels,
lymph nodes and benign calcifications allowed the synthetic mammograms
to be identified.
• A quantitative psychophysical evaluation of reduced resolution synthetic
mammograms showed that, in general, the synthetic mammograms could
be differentiated from real mammograms, but not very reliably. One par-
ticipant could not distinguish between the two classes at all and the other
participants consistently misclassified the synthetic mammograms as real,
reporting that they could not tell the difference between the two classes.
The results indicate that, at low resolution, the synthetic mammograms are
sufficiently realistic that differentiating real and synthetic mammograms is
difficult.
• An evaluation of the contribution made by the detailing model shows that,
while local textural detail is successfully captured, the model cannot capture
the appearance of curvilinear structures. As the qualitative evaluation by
the expert mammography radiologist showed, these structures allow the
real and synthetic mammograms to be easily differentiated. A method of
modelling these structures was proposed.
Chapter 11
Summary and conclusions
11.1 Introduction
This chapter presents:
• A summary of the work presented in this thesis.
• The conclusions that may be drawn from the work.
• A final statement.
11.2 Summary
• Chapter 2 presented background information on breast cancer, the clinical
problem and the various imaging modalities that are used to diagnose the
disease. Breast cancer is a significant public health problem and many
countries have X-ray mammography screening programmes. The image
inspection task is performed visually and is subject to human error.
• Chapter 3 presented a review of the computer-aided mammography liter-
ature. CADe algorithms typically extract shape and texture features from
candidate locations and use classifiers to differentiate between true and false
detections of specific indicative signs of abnormality. Commercial systems
are available and have been shown to improve radiologist performance; how-
ever, they can also fail to improve performance. Psychophysical research
has suggested that a false positive rate much lower than that achieved by
current commercial systems is required for significant improvement in radi-
ologist performance. Much more sophisticated approaches may be required
to achieve such targets. One such method is novelty detection, which re-
quires a model of normal mammographic appearance that can measure
deviation from normal appearance. Statistical models should allow this
deviation to be measured within a rigorous mathematical framework. If
novelty detection is to be used, then the underlying model must be able
to “legally” represent any pathology-free instance and be unable to legally
represent abnormal instances. The only way to verify this is to be able
to generate instances from the model; thus the model must be generative.
Further, generative models make it relatively easy to visualise what has
been modelled successfully and what has not.
• Chapter 4 described work on improving the way that scale-orientation
pixel signatures are computed. Two flaws with an existing implementation
were identified and a new method of computing signatures was developed.
An information theoretic measure of signature quality showed that, com-
pared to the original method of computing pixel signatures, the new method
increased signature information content by approximately 19%. A classi-
fication experiment was reported in which signatures computed using the
two methods were used to discriminate between pixels belonging to normal
and spiculated lesion tissues. The new signatures outperformed the original
signatures in terms of both specificity and sensitivity.
• Chapter 5 presented background information on the multivariate normal
distribution and the Gaussian mixture model. The Gaussian mixture model
is a flexible solution to the density estimation problem. Model parame-
ters can be learned using the k-means and Expectation-Maximisation algo-
rithms. Both the marginal and conditional distributions can be computed
for a Gaussian mixture model in closed-form; these distributions are them-
selves Gaussian mixture models. It is straightforward to sample from a
Gaussian mixture model.
• Chapter 6 presented Efros and Leung’s algorithm for texture synthesis
and developed the method into a parametric statistical model of texture
that can be used in both generative and analytical modes. Methods of
synthesising and analysing textures were developed and synthetic images
were presented.
• Chapter 7 presented a psychophysical evaluation of synthetic mammo-
graphic textures produced by the parametric model. The synthetic textures
were not indistinguishable from the real textures, but were selected in ap-
proximately one third of trials. The synthetic images generated by Efros
and Leung’s algorithm were considered more realistic than those gener-
ated by the parametric model; the textures generated using the parametric
model were selected in 26% and 41% of trials. However, the images gen-
erated by the Efros and Leung algorithm used a more specific “training”
set than was used to train the parametric model. Direct comparison of the
two approaches should consider this experimental bias and the ability of
the parametric model to analyse images via novelty detection.
Simulated and real microcalcification and mass images were analysed using
parametric models. Results for the simulated data show that the novelty
detection approach can successfully detect multiple types of abnormality
using a single method. Results for the real data show that some discrim-
ination was possible, but significant improvement is needed. This may be
achieved by improving the specificity of the model and the adoption of a
hierarchical strategy.
• Chapter 8 presented an investigation into how Gaussian mixture mod-
els may be learned in low-dimensional principal components spaces. The
closed-form method of computing conditional distributions was extended
to the principal components model. The chapter described a method for
synthesising textures from a parametric texture model built in a princi-
pal components space. It is not straightforward to marginalise a principal
components model over dimensions from the natural space. This problem
makes working in a principal components space less attractive. Although it
is possible to achieve excellent results using the approach, results for princi-
pal components models were much more variable than for the models built
in the natural data space.
• Chapter 9 described a generative statistical model of entire mammograms
and showed how synthetic mammograms may be generated. The model
has components that model the breast shape, approximate appearance and
the detailed texture. The breast shape model is learned by solving the
shape boundary landmark correspondence problem using the approaches
described by Kotcheff and Taylor and Davies et al.
• Chapter 10 presented three evaluations of the synthetic mammograms
generated using the model of entire mammograms. An expert mammogra-
phy radiologist could easily distinguish between real and synthetic mammo-
grams, but noted that some of the synthetic mammograms did look quite
realistic. The lack of blood vessels, lymph nodes and benign calcifications
allowed the synthetic mammograms to be identified.
A quantitative psychophysical evaluation of reduced resolution synthetic
mammograms showed that, in general, the synthetic mammograms could be
differentiated from real mammograms. However, one participant could not
distinguish between the two classes and the other participants consistently
misclassified the synthetic mammograms as real, reporting that they could
not tell the difference between the two classes. The results indicate that,
at low resolution, the synthetic mammograms are sufficiently realistic that
differentiating real and synthetic mammograms is difficult.
An evaluation of the contribution made by the detailing model shows that,
while local textural detail is successfully captured, the model cannot capture
the appearance of curvilinear structures.
11.3 Conclusions
The work in this thesis should be considered in context: while a great deal of
research has been done on the traditional approach to CADe, almost no previous
work exists on generative statistical models for novelty detection.
As discussed in Chapter 3, one of the most significant problems that the computer-
aided mammography community needs to address is the high false positive rate
of CADe systems. We believe that this can only be achieved by systems that
have a much better “understanding” of mammographic appearance. In addition
to reducing the false positive rate, it would be desirable if CADe systems could
detect any indicative sign of abnormality, not just microcalcifications and masses.
It would be elegant if a single algorithm could detect any indicative sign of ab-
normality. We believe that the novelty detection approach is the most principled
way to achieve these aims.
The results of the novelty detection experiment in Chapter 7 show that it is
possible for a single algorithm to detect multiple types of abnormality within a
novelty detection framework. Although the results for real mammographic data
were a little disappointing, the approach does have potential. The generative
property of the model developed in Chapter 6 was important as it allowed us
to verify exactly what had been modelled successfully and what had not. This
was particularly useful during the development of the model and its implemen-
tation. Although the assumption underpinning the model—that mammographic
appearance is a stationary texture—is obviously invalid, the development of this
model allowed us to gain an understanding of the problems involved in modelling
mammographic appearance.
The evaluation of the synthetic textures showed that they were good enough to be
confused with the real textures about a third of the time and compared favourably
with those produced using Efros and Leung’s method, which is considered to be
one of the best methods in the literature. The parametric model is competitive
with the non-parametric method, but is much more flexible: synthetic textures
can be generated, images can be analysed using the novelty detection algorithm
and the time and space complexity of the method scales well with the number of
training pixels.
There is a significant lack of rigorous evaluation of texture synthesis algorithms
in the literature. Psychophysical experiments allow the human visual system to
be used objectively and quantitatively. Psychophysical experiments can be de-
ployed relatively easily via the Internet, allowing large numbers of participants
to be recruited. However, there are disadvantages to running experiments online:
participants are self-selecting; participants may be unlikely to volunteer for
experiments that take a long time to complete or that solicit personal information;
and it is not possible to control the environment in which the experiment
is conducted (e.g. distractions, viewing distance, ambient lighting).
The generative model developed in Chapter 9 represents a significant step towards
understanding how to statistically model the appearance of entire mammograms.
We decomposed the problem into modelling shape, general appearance and de-
tailed textural appearance. All three components were successfully modelled.
Curvilinear structures were not considered and were therefore not captured by
the model. Future work should consider how this important component of ap-
pearance can be combined with the other model components.
While the full synthetic mammograms can easily be differentiated from real mam-
mograms by an expert mammography radiologist, computer vision researchers
found discrimination at low resolution difficult. The aim of developing the model
of entire mammograms was to further our understanding of how real mammo-
grams may be statistically modelled, rather than to immediately solve the novelty
detection problem; future research should pursue both goals.
Modelling the appearance of entire mammograms is extremely difficult, and we
conclude with a suggestion for an alternative approach to the novelty detection
problem. Consider a pair of mammograms taken of a particular patient. Each
mammogram in that pair is a very specific model of what the other should look
like. Asymmetry can therefore be considered as a novelty detection approach.
It may be possible to statistically learn the legal transformations that may be
applied to a mammogram in a pair of normal mammograms. Novelty would
correspond to an illegal transformation. It may be possible to generalise this idea
to the case where both CC and MLO views are available, or to the temporal case.
11.4 Final statement
This thesis proposed a new approach to detecting abnormalities in mammograms.
Novelty detection requires a model of normal mammographic appearance that al-
lows deviation from normality to be measured. Two generative statistical models
of mammographic appearance have been developed and evaluated. A novelty
detection experiment showed that it is possible to detect multiple types of ab-
normality using a model of normal appearance if that model is sufficiently spe-
cific. Psychophysical experiments demonstrated that significant progress has been
made towards being able to realistically model both mammographic texture and
the appearance of entire mammograms.
Appendix A
The expectation maximisation
algorithm
A.1 Introduction
Maximum likelihood is an approach to finding “optimal” estimates for model
parameters. A set of model parameters, θ∗, is optimal in the maximum likelihood
sense if they are most likely given some observed data:
$$ \theta^{*} = \arg\max_{\theta} \mathcal{L}(\theta \mid \{y_i\}) \tag{A.1} $$
where L is the likelihood function and {yi} are the observed data. The likelihood
function is usually replaced by the log-likelihood function ` for computational
convenience.
The expectation maximisation (EM) algorithm is a general approach to solving
maximum likelihood problems in the presence of missing data [80, 137]. One form
of missing data is latent data which is a contrivance that makes the parameter
estimation problem tractable. Latent data can be assumed to exist—even if it
cannot be measured—and in this way can be considered missing. We will now
present the abstract form of the EM algorithm (see Section 5.4.3 for an example
of an application of the algorithm). The presentation of the algorithm is based
in part upon those of Ravishanker et al. [151] and Hastie et al. [81].
A.2 The algorithm
The EM algorithm is named after its two steps, the expectation step and the
maximisation step. These steps are iterated until the algorithm converges.
Let YO denote the observed data, YL denote the latent data and let the complete
data be denoted by Y = (YO, YL). From conditional probability we can write

$$ P(\mathcal{Y}_O \mid \theta) = \frac{P(\mathcal{Y}_L, \mathcal{Y}_O \mid \theta)}{P(\mathcal{Y}_L \mid \mathcal{Y}_O, \theta)} = \frac{P(\mathcal{Y} \mid \theta)}{P(\mathcal{Y}_L \mid \mathcal{Y}_O, \theta)}. \tag{A.2} $$

Taking logarithms:

$$ \ell(\theta; \mathcal{Y}_O) = \ell_0(\theta; \mathcal{Y}) - \ell_1(\theta; \mathcal{Y}_L \mid \mathcal{Y}_O) \tag{A.3} $$

where ℓ1 is based upon P(YL|YO, θ). Taking expectations, conditioned on YO and
the model parameters at the m-th iteration of the algorithm, θ(m):

$$ \ell(\theta; \mathcal{Y}_O) = Q(\theta, \theta^{(m)}) - H(\theta, \theta^{(m)}) \tag{A.4} $$

where

$$ Q(\theta, \theta^{(m)}) \overset{\mathrm{def}}{=} E[\ell_0(\theta; \mathcal{Y}) \mid \mathcal{Y}_O, \theta^{(m)}], \qquad H(\theta, \theta^{(m)}) \overset{\mathrm{def}}{=} E[\ell_1(\theta; \mathcal{Y}_L \mid \mathcal{Y}_O) \mid \mathcal{Y}_O, \theta^{(m)}]. $$
Equation A.4 is the log-likelihood equivalent of the objective function we seek
(Equation A.1). Q(θ, θ(m)) is computed in the E-step. This is essentially a vertical
slice through the density shown in Figure 5.1. The M-step obtains θ(m+1) by
maximising Q over θ:
$$ Q(\theta^{(m+1)}, \theta^{(m)}) \ge Q(\theta, \theta^{(m)}) \quad \forall \theta. \tag{A.5} $$
The actual form that Q takes is problem specific (see Section 5.4.3 for a more
intuitive example). We shall now show why maximising Q maximises `(θ;YO)
and prove that the EM algorithm converges by showing that each step of the EM
algorithm is guaranteed not to decrease the objective function.
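For concreteness, the two steps can be sketched for a two-component, one-dimensional Gaussian mixture, a toy instance of the Section 5.4.3 application in which the latent data are the unobserved component memberships. The data, initialisation and iteration count below are invented for illustration.

```python
import numpy as np

def em_gmm_1d(y, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture."""
    w = np.array([0.5, 0.5])
    mu = np.array([y.min(), y.max()])      # crude initialisation
    var = np.array([y.var(), y.var()])
    for _ in range(n_iter):
        # E-step: responsibilities (posterior membership probabilities).
        dens = w * np.exp(-0.5 * (y[:, None] - mu) ** 2 / var) \
                 / np.sqrt(2.0 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the weighted data.
        n = r.sum(axis=0)
        w = n / len(y)
        mu = (r * y[:, None]).sum(axis=0) / n
        var = (r * (y[:, None] - mu) ** 2).sum(axis=0) / n
    return w, mu, var

rng = np.random.default_rng(4)
y = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
w, mu, var = em_gmm_1d(y)
```

Each iteration corresponds to one E-step/M-step pair; by the convergence argument below, the log-likelihood never decreases across iterations.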
A.3 Proof of convergence
We will show that
$$ \ell(\theta^{(m+1)}; \mathcal{Y}_O) - \ell(\theta^{(m)}; \mathcal{Y}_O) \ge 0 \tag{A.6} $$
with equality if θ(m+1) = θ(m). Consider what happens to the objective function—
in terms of Q and H—as we move from one iteration of the EM algorithm to the
next:

$$ \ell(\theta^{(m+1)}; \mathcal{Y}_O) - \ell(\theta^{(m)}; \mathcal{Y}_O) = \underbrace{\left[ Q(\theta^{(m+1)}, \theta^{(m)}) - Q(\theta^{(m)}, \theta^{(m)}) \right]}_{A} - \underbrace{\left[ H(\theta^{(m+1)}, \theta^{(m)}) - H(\theta^{(m)}, \theta^{(m)}) \right]}_{B} \tag{A.7} $$
The M-step ensures that Q(θ(m+1), θ(m)) ≥ Q(θ(m), θ(m)), and so part A of Equation A.7 will be non-negative. If part B of Equation A.7 is non-positive, then an
iteration of the EM algorithm cannot decrease the objective function, i.e. we need
to prove that:

$$ H(\theta, \theta^{(m)}) \le H(\theta^{(m)}, \theta^{(m)}) \quad \forall \theta, \tag{A.8} $$
which can be read as ‘H(θ, θ(m)) is maximised by θ = θ(m)’.
From Equation A.4 and the definition of conditional expectation we can write H
as:

    H(θ, θ^(m)) = ∫_{y_L ∈ L} p(y_L | Y_O, θ^(m)) log p(y_L | Y_O, θ) dy_L.  (A.9)

Note that H has the form

    ∫_{−∞}^{∞} p(x) log q(x) dx                                            (A.10)

where p and q are densities with associated models θ^(p) and θ^(q). Equation A.8
says that H is maximised when θ^(p) = θ^(q). Considering the discrete case:

    ∑_{i=1}^{n} p_i log q_i,                                               (A.11)
we can state the following:

    log x ≤ x − 1  ⇒  ∑_{i=1}^{n} p_i log q_i ≤ ∑_{i=1}^{n} p_i (q_i − 1)  (A.12)
                                              = ∑_{i=1}^{n} (p_i q_i − p_i)
                                              = ∑_{i=1}^{n} p_i q_i − ∑_{i=1}^{n} p_i
                                              = ∑_{i=1}^{n} p_i q_i − 1.
∑_{i=1}^{n} p_i q_i is a scalar product of two vectors, p and q. The Cauchy–Schwarz
inequality states that:

    |p · q| ≤ ‖p‖₂ ‖q‖₂                                                    (A.13)

so

    |p · q| / (‖p‖₂ ‖q‖₂) ≤ 1.                                             (A.14)

The scalar product of two vectors is

    p · q = ‖p‖₂ ‖q‖₂ cos φ                                                (A.15)

where φ is the angle between p and q (written φ here to avoid confusion with the
model parameters θ). So if |p · q| is maximised, then |p · q| = ‖p‖₂‖q‖₂ ⇒
cos φ = 1 ⇒ φ = 0, and so p and q are parallel. The two vectors are parallel if
p = tq for some scalar t. If ∑_{i=1}^{n} p_i = ∑_{i=1}^{n} q_i = 1, then t = 1 and
p = q. Therefore Equation A.11 is maximised when p_i = q_i ∀i.
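This maximisation can also be checked numerically. The following sketch (an illustrative check, not code from the thesis) fixes a distribution p and confirms that ∑ p_i log q_i, the quantity in Equation A.11, is never larger for a randomly drawn distribution q than for q = p:

```python
import math
import random

random.seed(0)

p = [0.5, 0.3, 0.2]  # a fixed discrete distribution

def cross_term(p, q):
    # sum_i p_i * log(q_i) -- the quantity in Equation A.11
    return sum(pi * math.log(qi) for pi, qi in zip(p, q))

best = cross_term(p, p)  # the claimed maximiser: q = p
for _ in range(1000):
    w = [random.random() + 1e-12 for _ in p]
    total = sum(w)
    q = [wi / total for wi in w]  # a random distribution over the same outcomes
    assert cross_term(p, q) <= best
```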
Generalising to the continuous case:

    lim_{n→∞} ∑_{i=1}^{n} p_i log q_i = ∫_{−∞}^{∞} p(x) log q(x) dx.       (A.16)
Equation A.16 is maximised when p = q (i.e. when θ^(p) = θ^(q)), and so H(θ, θ^(m))
is maximised when θ = θ^(m). Therefore part B of Equation A.7 is non-positive,
and so an iteration of the EM algorithm cannot decrease the log-likelihood of the
model parameters given the observed data. In summary, the EM algorithm finds a
maximum of the objective function. However, there is no guarantee that this maximum
will be the global maximum, and so several runs of the algorithm, starting from
different initialisations, may be necessary to find a suitable solution.
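The convergence guarantee can also be observed empirically. The sketch below (a standard two-component one-dimensional Gaussian mixture with illustrative names, not code from the thesis) runs EM from several random initialisations and checks that the observed-data log-likelihood never decreases between iterations:

```python
import math
import random

random.seed(1)

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_likelihood(data, w, mu, var):
    # Observed-data log-likelihood l(theta; Y_O) for a 2-component mixture.
    return sum(math.log(w[0] * normal_pdf(x, mu[0], var[0]) +
                        w[1] * normal_pdf(x, mu[1], var[1])) for x in data)

def em(data, iters=50):
    # Random initialisation theta^(0).
    mu = [random.choice(data), random.choice(data)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    history = [log_likelihood(data, w, mu, var)]
    for _ in range(iters):
        # E-step: responsibilities, i.e. P(Y_L | Y_O, theta^(m)).
        r = []
        for x in data:
            num = [w[k] * normal_pdf(x, mu[k], var[k]) for k in (0, 1)]
            s = num[0] + num[1]
            r.append([num[0] / s, num[1] / s])
        # M-step: closed-form maximisation of Q(theta, theta^(m)).
        for k in (0, 1):
            nk = sum(ri[k] for ri in r)
            w[k] = nk / len(data)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, data)) / nk
            var[k] = max(sum(ri[k] * (x - mu[k]) ** 2
                             for ri, x in zip(r, data)) / nk, 1e-6)
        history.append(log_likelihood(data, w, mu, var))
    return history

# Two well-separated clusters of observed data.
data = ([random.gauss(-2.0, 0.5) for _ in range(100)] +
        [random.gauss(3.0, 0.8) for _ in range(100)])

for run in range(5):  # several restarts, as recommended above
    hist = em(data)
    # l(theta^(m+1); Y_O) >= l(theta^(m); Y_O) at every iteration.
    assert all(b >= a - 1e-9 for a, b in zip(hist, hist[1:]))
```

Different restarts may settle on different local maxima; the monotonicity assertion holds for every run, which is precisely what the proof above establishes.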
Bibliography
[1] L. V. Ackerman and E. E. Gose. Breast lesion classification by computer
and xeroradiography. Cancer, 30(4):1025–1035, October 1972.
[2] F. E. Alexander, T. J. Anderson, H. K. Brown, A. P. M. Forrest, W. Hep-
burn, A. E. Kirkpatrick, B. B. Muir, R. J. Prescott, and A. Smith. 14 years
of follow-up from the edinburgh randomised trial of breast-cancer screening.
The Lancet, 353(9168):1903–1908, June 1999.
[3] S. R. Amendolia, F. Estrella, T. Hauer, D. Manset, D. McCabe, R. Mc-
Clatchey, M. Odeh, T. Reading, D. Rogulin, D. Schottlander, and
T. Solomonides. Grid Databases for Shared Image Analysis in the Mam-
moGrid Project. In Proceedings of International Database Engineering and
Applications Symposium. IDEAS’04, pages 312–321. IEEE, July 2004.
[4] Breast Cancer Facts and Figures 2003–2004. Annual report, American
Cancer Society, Atlanta, Georgia, USA, 2003.
[5] Cancer Facts and Figures 2004. Annual report, American Cancer Society,
Atlanta, Georgia, USA, 2004.
286
Bibliography 287
[6] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra,
J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen.
LAPACK Users’ Guide. Society for Industrial and Applied Mathematics,
Philadelphia, PA, USA, 3rd edition, 1999.
[7] S. Astley, R. Zwiggelaar, C. Wolstenholme, K. Davies, T. Parr, and C. Tay-
lor. Prompting in mammography: How accurate must the prompt gener-
ators be? In N. Karssemeijer, M. A. O. Thijssen, J. H. C. L. Hendriks,
and L. J. T. O. van Erning, editors, Digital Mammography, volume 13 of
Computational Imaging and Vision, pages 347–354. Kluwer Academic Pub-
lishers, November 1998.
[8] S. M. Astley, C. R. M. Boggis, K. Walker, S. Wallace, S. Tomkinson,
V. Hillier, and J. Morris. An Evaluation of a Commercial Prompting System
in a Busy Screening Centre. In H.-O. Peitgen, editor, Digital Mammogra-
phy: IWDM—6th International Workshop on Digital Mammography, pages
471–475. Springer-Verlag, March 2003.
[9] S. M. Astley, T. C. Mistry, C. R. M. Boggis, and V. F. Hillier. Should
we use humans or a machine to pre-screen mammograms. In H.-O. Peit-
gen, editor, Digital Mammography: IWDM—6th International Workshop
on Digital Mammography, pages 476–480. Springer-Verlag, March 2003.
[10] P. R. Bakic, M. Albert, D. Brzakovic, and A. D. A. Maidment. Mammogram
synthesis using 3D simulation. I. Breast tissue model and image acquisition
simulation. Medical Physics, 29:2131–2139, 2002.
Bibliography 288
[11] J. A. Bangham, P. D. Ling, and R. Young. Multiscale recursive medians,
scale-space and transforms with applications to image processing. IEEE
Transactions on Image Processing, 5(6):1043–1048, 1996.
[12] N. Baxter. Preventive health care, 2001 update: should women be routinely
taught breast self-examination to screen for breast cancer? Canadian Med-
ical Association Journal, 164(13):1837–1846, June 2001.
[13] A. O. Beacham, J. S. Carpenter, and M. A. Andrykowski. Impact of
benign breast biopsy upon breast self-examination. Preventive Medicine,
38(6):723–731, June 2004.
[14] R. E. Bellman. Adaptive Control Processes. Princeton University Press,
Princeton, NJ, USA, 1961.
[15] U. Bick, M. L. Giger, R. A. Schmidt, R. M. Nishikawa, D. Wolverton, and
K. Doi. Automated segmentation of digitized mammograms. Academic
Radiology, 2:1–9, 1995.
[16] K. Bliznakova, Z. Bliznakov, V. Bravou, Z. Kolitsi, and N. Pallikarakis. A
three-dimensional breast software phantom for mammography simulation.
Physics in Medicine and Biology, 48(22):3699–3719, 2003.
[17] M. Board, S. Astley, and C. Boggis. Multi-resolution transportation for
the detection of mammographic asymmetry. In International Workshop on
Digital Mammography, 2004. (Accepted, pending.).
Bibliography 289
[18] L. Bocchi, G Coppini, J. Nori, and G. Valli. Detection of single and clus-
tered microcalcifications in mammograms using fractals models and neural
networks. Medical Engineering and Physics, 26(4):303–312, May 2004.
[19] F. O. Bochud, C. K. Abbey, and M. P. Eckstein. Statistical texture syn-
thesis of mammographic images with clustered lumpy backgrounds. Optics
Express, 4(1):33–43, January 1999.
[20] F. L. Bookstein. Principal Warps: Thin-Plate Splines and the Decompo-
sition of Deformations. IEEE Transactions on Pattern Analysis Machine
Intelligence, 11(6):567–585, 1989.
[21] H. Booth, M. Gautrey, M. Sheldrake, N. Cooper, and M. Quinn. Cancer
statistics registrations: Registrations of cancer diagnosed in 2001, England.
Annual report series MB1 no. 32, National Statistics, 2004. Crown copy-
right.
[22] N. F. Boyd, J. W. Byng, R. A. Long, E. K. Fishell, L. E. Little, A. B.
Miller, G. A. Lockwood, D. L. Tritchler, and M. J. Yaffe. Qualitative clas-
sification of mammographic densities and breast cancer risk: results from
the Canadian National Breast Screening Study. Journal of the National
Cancer Institute, 87(9):670–675, May 1995.
[23] M. Brady, F. Gilbert, S. Lloyd, M. Jirotka, D. Gavaghan, A. Simp-
son, R. Highnam, T. Bowles, D. Schottlander, D. McCabe, D. Watson,
B. Collins, J. Williams, A. Knox, M. Oevers, and P. Taylor. eDiaMoND:
the UK’s Digital Mammography National Database. In International Work-
shop on Digital Mammography, 2004. (Accepted, pending.).
Bibliography 290
[24] Breast Cancer Factsheet—February 2004. Online, February 2004. Accessed
March 13 2005.
[25] J. L. Breau. Chemotherapy in the management of breast cancer (la chimio-
thrapie dans le traitement du cancer du sein). Chirurgie; Memoires De
l’Academie De Chirurgie, 120(6–7):354–356, 1994–1995.
[26] J. Bresenham. Algorithm for computer control of digital plotter. IBM
System Journal, 4:25–30, 1965.
[27] D. S. Brettle, E. Berry, and M. A. Smith. Synthesis of texture from clinical
images. Image and Vision Computing, 21:433–445, May 2003.
[28] J. Brown, A. Coulthard, A. K. Dixon, J. M. Dixon, D. F. Easton, R. A.
Eeles, D. G. R. Evans, F. G. Gilbert, C. Hayes, J. P. R. Jenkins, et al.
Rationale for a national multi-centre study of magnetic resonance imaging
screening in women at genetic risk of breast cancer. The Breast, 9(2):72–77,
April 2000.
[29] D. Brzakovic, X. M. Luo, and P. Brzakovic. An Approach to Automated
Detection of Tumors in Mammograms. IEE Transactions on Medical Imag-
ing, 9(3):233–241, September 1990.
[30] P. C. Bunch, J. F. Hamilton, G. K. Sanderson, and A. H. Simmons. A free
response approach to measurement and characterization of radiographic
observer performance. SPIE Proceedings, 127:124–135, 1977.
[31] C. J. C. Burges. A tutorial on support vector machines for pattern recog-
nition. Knowledge Discovery and Data Mining, 2(2):1–43, 1998.
Bibliography 291
[32] Warren L. J. Burhenne, S. A. Wood, C. J. D’Orsi, S. A. Feig, D. B. Kopans,
K. F. O’Shaughnessy, E. A. Sickles, L. Tabar, C. J. Vyborny, and R. A.
Castellino. Potential contribution of computer-aided detection to the sen-
sitivity of screening mammography. Radiology, 215(2):554–562, May 2000.
[33] J. W. Byng, N. F. Boyd, E. Fishell, R. A. Jong, and M. J. Yaffe. The
quantitative analysis of mammographic densities. Physics in Medicine and
Biology, 39(10):1629–1638, October 1994.
[34] C. B. Caldwell, S. J. Stapleton, D. W. Holdsworth, R. A. Jong, W. J.
Weiser, G. Cooke, and M. J. Yaffe. Characterization of mammographic
parenchymal pattern by fractal dimension. Physics in Medicine and Biology,
35(2):235–247, February 1990.
[35] R. Campanini, D. Dongiovanni, E. Iampieri, N. Lanconelli, M. Masotti,
G. Palermo, A. Riccardi, and M. Roffilli. A novel featureless approach to
mass detection in digital mammograms based on Support Vector Machines.
Physics in Medicine and Biology, 49(6):961–975, March 2004.
[36] N. A. Campbell and J. B. Reece. Biology. Benjamin Cummings, 7th edition,
December 2004.
[37] The stages, http://www.cancerhelp.org.uk/help/default.
asp?page=3315 , accessed July 10 2005.
[38] S. J. Caulkin, S. M. Astley, A. Mills, and C. R. M. Boggis. Generating
Realistic Spiculated Lesions in Digital Mammograms. In M. J. Yaffe, editor,
Digital Mammography: IWDM 2000, 5th International Workshop, pages
713–720. Medical Physics Publishing, December 2001.
Bibliography 292
[39] N. Cerneaz and M. Brady. Finding curvilinear structures in mammograms.
In N. Ayache, editor, Computer Vision, Virtual Reality and Robotics in
Medicine, volume 905 of Lecture Notes in Computer Science, pages 372–
382. Springer, March 1995.
[40] D. P. Chakraborty. Maximum likelihood analysis of free-response receiver
operating characteristic (FROC) data. Medical Physics, 16(4):561–568, July
1989.
[41] H.-P. Chan, D. Wei, M. A. Helvie, B. Sahiner, D. D. Adler, M. M. Goodsitt,
and N. Petrick. Computer-aided classification of mammographic masses and
normal tissue: linear discriminant analysis in texture feature space. Physics
in Medicine and Biology, 40(5):857–875, May 1995.
[42] R. Chandrasekhar and Y. Attikiouzel. Automatic Breast Border Segmen-
tation by Background Modelling and Subtraction. In M. J. Yaffe, editor,
Digital Mammography: IWDM 2000, 5th International Workshop, pages
560–565, Madison, Wisconsin, USA, December 2001. Medical Physics Pub-
lishing.
[43] P. Chaturvedi. Does smoking increase the risk of breast cancer? The Lancet
Oncology, 4(11):657–658, November 2003.
[44] E. Claridge and J. H. Richter. Characterisation of mammographic lesions.
In A. G. Gale, S. M. Astley, D. R. Dance, and A. Y. Cairns, editors, Digital
Mammography: Proceedings of the 2nd International Workshop on Digi-
tal Mammography, York, UK, 10–12 July 1994, pages 241–250. Elsevier
Science, September 1994.
Bibliography 293
[45] G. M. Clarke and D. Cooke. A Basic Course in Statistics. Arnold Publish-
ers, 4th edition, October 1998.
[46] P. Collinson. Of bombers, radiologists, and cardiologists: time to ROC.
Heart, 80(3):236, February 1998.
[47] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active Appearance Models.
IEEE Transactions on Pattern Analysis Machine Intelligence, 23(6):681–
685, 2001.
[48] T. F. Cootes, C. J. Taylor, and A. Lanitis. Active shape models: Evaluation
of a multi-resolution method for improving image search. In E. Hancock,
editor, Proceedings of the 5th British Machine Vision Conference, pages
327–336. BMVA Press, September 1994.
[49] I. Daubechies. Ten Lectures on Wavelets. CBMS-NSF Conference Series
in Applied Mathematics. Society for Industrial and Applied Mathematics,
January 1992.
[50] Rh. H. Davies. Learning Shape: Optimal Models for Analysing Shape Vari-
ability. PhD thesis, The Victoria University of Manchester, Manchester,
United Kingdom, 2002.
[51] Rh. H. Davies, C. J. Twining, T. F. Cootes, J. C. Waterton, and C. J.
Taylor. A Minimum Description Length Approach to Statistical Shape
Modelling. IEEE Transactions on Medical Imaging, 2002.
[52] USF Digital Mammography Home Page, http://marathon.csee.
usf.edu/Mammography/Database.html , accessed January 2005.
Bibliography 294
[53] J. S. De Bonet and P. Viola. A Non-Parametric Multi-Scale Statistical
Model for Natural Images. Advances in Neural Information Processing, 10,
1997.
[54] I. den Tonkelaar, P. H. M. Peeters, and P. A. H. van Noord. Increase
in breast size after menopause: prevalence and determinants. Maturitas,
48(1):51–57, May 2004.
[55] J. Dengler, S. Behrens, and J. F. Desaga. Segmentation of Microcalcifica-
tions in Mammograms. IEEE Transactions on Medical Imaging, 12(4):634–
642, December 1993.
[56] P. A. Devijer and J. Kittler. Pattern Recognition: A Statistical Approach.
Prentice Hall International, 1982.
[57] J. Dinnes, S. Moss, J. Melia, R. Blanks, F. Song, and J. Kleijnen. Effec-
tiveness and cost-effectiveness of double reading of mammograms in breast
cancer screening: findings of a systematic review. The Breast, 10(6):455–
463, December 2001.
[58] C. J. D’Orsi, D. J. Getty, J. A. Swets, R. M. Pickett, S. E. Seltzer, and B. J.
McNeil. Reading and Decision Aids for Improved Accuracy and Standard-
ization of Mammographic Diagnosis. Radiology, 184:619–622, September
1992.
[59] A. A. Efros and W. T. Freeman. Image quilting for texture synthesis and
transfer. In L. Pocock, editor, SIGGRAPH ’01: Proceedings of the 28th
annual conference on computer graphics and interactive techniques, pages
341–346, New York, USA, 2001. ACM Press.
Bibliography 295
[60] A. A. Efros and T. K. Leung. Texture Synthesis by Non-Parametric Sam-
pling. In 7th International Conference on Computer Vision (ICCV ’99),
volume 2, pages 1033–1039. IEEE Computer Society Press, November 1999.
[61] T. Ema, K. Doi, R. M. Nishikawa, Y. Jiang, and J. Papaioannou. Image
feature analysis and computer-aided diagnosis in mammography: reduc-
tion of false-positive clustered microcalcifications using local edge-gradient
analysis. Medical Physics, 22(2):161–169, February 1995.
[62] C. Evans, K. Yates, and M. Brady. Statistical Characterization of Normal
Curvilinear Structures in Mammograms. In H.-O. Peitgen, editor, Digital
Mammography: IWDM—6th International Workshop on Digital Mammog-
raphy, pages 285–291. Springer-Verlag, March 2003.
[63] A. Fenster, K. Surry, W. Smith, and D. B. Downey. The use of three-
dimensional ultrasound imaging in breast biopsy and prostate therapy. Mea-
surement, 36(3–4):245–256, October–December 2004.
[64] B. Fisher, J. Bryant, J. J. Dignam, D. L. Wickerham, E. P. Mamounas,
E. R. Fisher, R. G. Margolese, L. Nesbitt, S. Paik, T. M. Pisansky, and
N. Wolmark. Tamoxifen, Radiation Therapy, or Both for Prevention of
Ipsilateral Breast Tumor Recurrence After Lumpectomy in Women With
Invasive Breast Cancers of One Centimeter or Less. Journal of Clinical
Oncology, 20(20):4141–4149, October 2002.
[65] C. E. Floyd, J. Y. Lo, A. J. Yun, D. C. Sullivan, and P. J. Kornguth. Pre-
diction of Breast Cancer Malignancy Using and Artificial Neural Network.
Cancer, 74(11):2944–2948, December 1994.
Bibliography 296
[66] P. Forrest. Breast cancer screening. Report to Health Ministers of England,
Wales, Scotland and Northern Ireland by Working Group chaired by Sir
Patrick Forrest, 1987. HMSO.
[67] T. W. Freer and M. J. Ulissey. Screening mammography with computer-
aided detection: prospective study of 12 860 patients in a community breast
center. Radiology, 220(3):781–786, September 2001.
[68] D. D. Garber. Computational Models for Texture Analysis and Texture
Synthesis. PhD thesis, University of Southern California, May 1981.
[69] GE Healthcare — Product Technology — Mammography — Senographe
2000D, http://www.gehealthcare.com/euen/mammography/
products/senographe-2000d/2000d_cad.html , accessed July 20
2005.
[70] M. L. Giger, Z. Huo, C. J. Vyborny, L. Lan, R. M. Nishikawa, and I. Rosen-
bourgh. Results of an Observer Study with an Intelligent Mammographic
Workstation for CAD. In H.-O. Peitgen, editor, Digital Mammography:
IWDM—6th International Workshop on Digital Mammography, pages 297–
303. Springer-Verlag, March 2003.
[71] P. Giger, M. L. Lu and Z. Huo. CAD in mammography: Computerized
detection and classification of masses. In A. G. Gale, S. M. Astley, D. R.
Dance, and A. Y. Cairns, editors, Digital Mammography: Proceedings of
the 2nd International Workshop on Digital Mammography, York, UK, 10–
12 July 1994, page 281. Elsevier Science, September 1994.
Bibliography 297
[72] F. J. Gilbert, A. Kirkpatrick, C. Boggis, S. Astley, S. Field, A. Gale,
C. Hancock, K. Young, J. Cooke, S. Moss, R. Blanks, and L. Garvican.
Computer Aided Detection in Mammography: Working Party of the Ra-
diologists Quality Assurance Coordinating Group. Technical report, NHS,
NHS Cancer Screening Programmes, Sheffield, UK, January 2001. NHSBSP
Publication No. 48.
[73] P. C. Gøtzsche and O. Olsen. Is screening for breast cancer with mammog-
raphy justifiable? The Lancet, 355(9198):129–134, January 2000.
[74] J. Grim and M. Haindl. A Discrete Mixtures Colour Texture Model. In
Texture 2002: The 2nd international workshop on texture analysis and syn-
thesis, pages 59–63, 1 June 2002.
[75] ATAC Trialists’ Group. Results of the ATAC (Arimidex, Tamoxifen, Alone
or in Combination) trial after completion of 5 years’ adjuvant treatment for
breast cancer. The Lancet, 356(9453):60–62, January 2005.
[76] D. Gur, J. H. Sumkin, H. E. Rockette, M. Ganott, C. Hakim, L. Hard-
esty, W. R. Poller, R. Shah, and L. Wallace. Changes in Breast Cancer
Detection and Mammography Recall Rates After the Introduction of a
Computer-Aided Detection System. Journal of the National Cancer In-
stitute, 96(3):185–190, 2004.
[77] W. C. Hahn. Telomerase and Cancer. Clinical Cancer Research, 7:2953–
2954, October 2001.
Bibliography 298
[78] J. A. Hanley and B. J. McNeil. The meaning and use of the area under
a receiver operating characteristic (ROC) curve. Radiology, 143(1):29–36,
April 1982.
[79] R. M. Haralick, K. Shanmugan, and I. Dinstein. Texture features for image
classification. IEEE Transactions on Systems, Man and Cybernetics, 3:610–
621, 1973.
[80] H. Hartley. Maximum likelihood estimation from incomplete data. Biomet-
rics, 14:174–194, 1958.
[81] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical
Learning. Springer Series in Statistics. Springer, 2001.
[82] Health Service Quarterly. Report 18, Office for National Statistics, London,
UK, Summer 2003.
[83] M. Heath, K. Bowyer, D. Kopans, R. Moore, and P. Kegelmeyer Jr. The
Digital Database for Screening Mammography. In M. J. Yaffe, editor, Dig-
ital Mammography: IWDM 2000, 5th International Workshop, pages 212–
218, Madison, Wisconsin, USA, December 2001. Medical Physics Publish-
ing.
[84] D. J. Heeger and J. R. Bergen. Pyramid-Based Texture Analysis/Synthesis.
In SIGGRAPH 95: 22nd International ACM Conference on Computer
Graphics and Interactive Techniques, pages 229–238. ACM Press, 1995.
Bibliography 299
[85] J. J. Heine, S. R. Deans, R. P. Velthuizen, and L. P. Clarke. On the statisti-
cal nature of mammograms. Medical Physics, 26(11):2254–2265, November
1999.
[86] R. Highnam and M. Brady. Mammographic Image Analysis. Computational
Imaging and Vision Series. Kluwer, April 1999.
[87] R. P. Highnam, J. M. Brady, and R. E. English. Simulating Disease in
Mammography. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000,
5th International Workshop, pages 727–731. Medical Physics Publishing,
December 2001.
[88] R. P. Highnam, J. M. Brady, and B. J. Shepstone. A representation
for mammographic image processing. Medical Image Analysis, 1(1):1–18,
March 1996.
[89] F. L. Hitchcock. The distribution of a product from several sources to
numerous localities. Journal of Mathematics and Physics, 20:224–230, 1941.
[90] A. Holmes. Computer-aided Detection of Abnormalities in Mammograms.
PhD thesis, The Victoria University of Manchester, Manchester, United
Kingdom, 2001.
[91] A. S. Holmes, C. J. Rose, and C. J. Taylor. Measuring Similarity between
Pixel Signatures. Image and Vision Computing, 20(5–6):331–340, April
2002.
Bibliography 300
[92] A. S. Holmes, C. J. Rose, and C. J. Taylor. Transforming Pixel Signa-
tures into an Improved Metric Space. Image and Vision Computing, 20(9–
10):701–707, August 2002.
[93] D. H. Hubel. Exploration of the Primary Visual Cortex. Nature, 299:515–
524, 1982.
[94] Z. Huo, M. L. Giger, C. V. Vyborny, U. Bick, P. Lu, D. E. Wolverton, and
R. A. Schmidt. Analysis of spiculation in the computerized classification of
mammographic masses. Medical Physics, 22(10):1569–1579, October 1995.
[95] I. W. Hutt. The computer-aided detection of abnormalities in digital mam-
mograms. PhD thesis, The Victoria University of Manchester, Manchester,
United Kingdom, 1996.
[96] I. W. Hutt, S. M. Astley, and C. R. M. Boggis. Prompting as an aid to
Diagnosis in Mammography. In A. G. Gale, S. M. Astley, D. R. Dance,
and A. Y. Cairns, editors, Digital Mammography: Proceedings of the 2nd
International Workshop on Digital Mammography, York, UK, 10–12 July
1994, pages 389–398. Elsevier Science, September 1994.
[97] P. T. Huynh, A. M. Jarolimek, and S. Daye. The false negative mammo-
gram. Radiographics, 18:1137–1154, 1998.
[98] Press release, http://www.icadmed.com , accessed January 2005.
[99] iCAD Breast Cancer Detection, http://www.icadmed.com , accessed
July 20 2005.
Bibliography 301
[100] IEEE Computer Society. IEEE Standard for Binary Floating-Point Arith-
metic, IEEE Standard 754-1985. Standard, IEEE, 1985.
[101] International Breast Cancer Screening Network, http://
appliedresearch.cancer.gov/ibsn/ , accessed July 20 2005.
[102] A. K. Jain, M. N. Murty, and P. J. Flynn. Data Clustering: A Review.
ACM Computing Surveys, 31(3), September 1999.
[103] R. A. Johnson and D. W. Wichern. Applied Multivariate Statistical Analy-
sis. Prentice-Hall, 5th edition, 2002.
[104] I. T. Jolliffe. Principal Component Analysis. Springer Series in Statistics.
Springer Verlag, New York, USA, 2nd edition, 2002.
[105] N. Karssemeijer. Adaptive noise equalization and recognition of micro-
calcification clusters in mammograms. International Journal of Pattern
Recognition and Artificial Intelligence, 7(6):1357–1376, 1993.
[106] N. Karssemeijer. Adaptive Noise Equalization and Image Analysis in Mam-
mography. In H. H. Barrett and A. F. Gmitro, editors, International Con-
ference on Information Processing in Medical Imaging, volume 687 of Lec-
ture Notes in Computer Science, pages 472–486, Flagstaff, Arizona, USA,
June 14–18 1998. Springer.
[107] N. Karssemeijer. Automated classification of parenchymal patterns in mam-
mograms. Physics in Medicine and Biology, 43(2):365–378, February 1998.
[108] N. Karssemeijer. Local orientation distribution as a function of spatial
scale for detection of masses in mammograms. In A. Kuba, M. Samal, and
Bibliography 302
A. Todd-Pokropek, editors, Information Processing in Medical Imaging:
16th International Conference, IPMI ’99, Visegrad, Hungary, June 28-July
2, 1999, volume 1613 of Lecture Notes in Computer Science, pages 280–293.
Springer, June 1999.
[109] N. Karssemeijer, J. D. M. Otten, A. L. M. Verbeek, J. H. Groenewoud,
H. J. de Koning, J. H. C. L. Hendriks, and R. Holland. Computer-aided
Detection versus Independent Double Reading of Masses on Mammograms.
Radiology, 227:192–200, February 2003.
[110] N. Karssemeijer and G. M. te Brake. Detection of stellate distortions in
mammograms. IEEE Transactions on Medical Imaging, 15(5):611–619, Oc-
tober 1996.
[111] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models.
International Journal of Computer Vision, 1(4):321–331, 1987.
[112] T. J. Key, N. E. Allen, E. A. Spencer, and R. C. Travis. Nutrition and breast
cancer. Breast (Edinburgh, Scotland), 12(6):412–416, December 2003.
[113] J. Kilday, F. Palmieri, and M. D. Fox. Classifying Mammographic Le-
sions Using Computerized Image Analysis. IEEE Transactions on Medical
Imaging, 12(4):664–669, December 1993.
[114] KODAK Mammography Computer-Aided Detection (CAD) System,
http://www.kodak.com/global/en/health/productsByType/
medFilmSys/eqp/system/mamCad.jhtml?pq-path=6498 , ac-
cessed July 20 2005.
Bibliography 303
[115] A. C. W. Kotcheff and C. J. Taylor. Automatic Construction of Eigenspace
Models by Direct Optimisation. Medical Image Analysis, 2:303–314, 1998.
[116] S. Lai, X. Li, and W. Bischoff. On techniques for detecting circumscribed
masses in mammograms. IEEE Transactions on Medical Imaging, 8(4):377–
386, December 1989.
[117] J.-L. Lamarque. An Atlas of The Breast: Clinical Radiodiagnosis. Wolfe
Medical Atlases. Wolfe Medical Publications, London, United Kingdom,
1981.
[118] M. Larkin. Breast self examination does more harm than good, says task
force. The Lancet, 357(9274):2109, June 2001. News article.
[119] B. Leyland-Jones. Trastuzumab: hopes and realities. The Lancet Oncology,
3(3):137–144, March 2002.
[120] S. Liu, C. F. Babbs, and E. J. Delp. Multiresolution Detection of Spiculated
Lesions in Digital Mammograms. IEEE Transactions on Image Processing,
10(6):874–884, June 2001.
[121] S. L. Lou, H. D. Lin, K. P. Lin, and D. Hoogstrate. Automatic breast re-
gion extraction from digital mammograms for PACS and telemammography
applications. Computerized Medical Imaging and Graphics, 24(4):205–220,
August 2000.
[122] C. G. Mallat, S. G. Mallat, and S. Mallat. A Wavelet Tour of Signal
Processing. Wavelet Analysis and Its Applications Series. Elsevier Science
& Technology Books, 2nd edition, September 1999.
Bibliography 304
[123] L. N. Mascio, S. D. Frankel, J. M. Hernandez, and C. M. Logan. Building
the LLNL/UCSF Digital Mammogram Library with image groundtruth.
In K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schidt, editors, Dig-
ital Mammography ’96: Proceedings of the 3rd International Workshop on
Digital Mammography, International Congress Series, pages 427–430, Hills-
borough, New Jersey, USA, December 1996. Excerpta Medica.
[124] G. Matheron. Random Sets and Integral Geometry. Probability and Statis-
tics Series. Wiley, February 1975.
[125] J. McQueen. Some Methods for Classification and Analysis of Multivariate
Observations. In 5th Berkeley Symposium on Mathematical Statistics and
Probability, 1967.
[126] C. E. Metz. ROC methodology in radiologic imaging. Investigative Radi-
ology, 21(9):720–733, September 1986.
[127] C. E. Metz. Evaluation of digital mammography by ROC analysis. In
K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schidt, editors, Digi-
tal Mammography ’96: Proceedings of the 3rd International Workshop on
Digital Mammography, International Congress Series, pages 61–68, Hills-
borough, New Jersey, USA, December 1996. Excerpta Medica.
[128] The NEW MIAS Digital Mammogram Database, http://www.wiau.
man.ac.uk/services/MIAS/MIASweb.html , accessed July 20 2005.
[129] P. Miller and S. Astley. Automated detection of breast asymmetries. In
J. Illingworth, editor, British Machine Vision Conference, pages 519–528.
BMVA Press, September 1993.
Bibliography 305
[130] The mini-MIAS database of mammograms, http://peipa.essex.ac.
uk/info/mias.html , accessed July 20 2005.
[131] E. H. Moore. On the Reciprocal of the General Algebraic Matrix. (Ab-
stract). Bulletin of the American Mathematical Society, 26:394–395, 1920.
[132] N. R. Mudigonda, R. M. Rangayyan, and J. E. L. Desautels. Gradient and
Texture Analysis for the Classification of Mammographic Masses. IEEE
Transactions on Medical Imaging, 19(10):1032–1043, October 2000.
[133] NHS Breast Screening Programme Annual Review 2004. Technical report,
NHS, 2004.
[134] The NHS Breast Screening Programme, http://www.
cancerscreening.nhs.uk/breastscreen/ , accessed February
2005.
[135] R. M. Nishikawa, R. E. Johnston, D. E. Wolverton, R. A. Schmidt, E. D.
Pisano, B. M. Hemminger, and J. Moody. A Common Database of Mam-
mograms for Research in Digital Mammography. In K. Doi, M. L. Giger,
R. M. Nishikawa, and R. A. Schidt, editors, Digital Mammography ’96:
Proceedings of the 3rd International Workshop on Digital Mammography,
International Congress Series, pages 435–438, Hillsborough, New Jersey,
USA, December 1996. Excerpta Medica.
[136] O. Olsen and P. C. Gøtzsche. Cochrane review on screening for breast cancer
with mammography. The Lancet, 358(9290):1340–1342, October 2001.
Bibliography 306
[137] Dempster A. P., Laird N. M., and Rubin D. B. Maximum Likelihood for
Incomplete Data via the EM Algorithm. Journal of the Royal Statistical
Society, Series B, 39:1–38, 1977.
[138] S. Pemberton, D. Austin, J. Axelsson, T. Celik, D. Dominiak, H. Elenbaas,
B. Epperson, M. Ishikawa, S. Matsui, S. McCarron, A. Navarro, S. Peru-
vemba, R. Relyea, S. Schnitzenbaumer, and P. Stark. XHTML 1.0 The
Extensible HyperText Markup Language (Second Edition). W3C Recom-
mendation, World Wide Web Consortium (W3C), August 2002.
[139] R. A. Penrose. A Generalised Inverse for Matrices. Proceedings of the
Cambridge Philosophical Society, 51:406–413, 1955.
[140] A. Petrie and C. Sabin. Medical Statistics at a Glance. At a Glance series.
Blackwell Science, Oxford, UK, June 2000.
[141] A. Petrosian, H.-P. Chan, M. A. Helvie, M. M. Goodsitt, and D. D.
Adler. Computer-aided diagnosis in mammography: classification of mass
and normal tissue by texture analysis. Physics in Medicine and Biology,
39(12):2273–2288, December 1994.
[142] K. Popat and R. Picard. Novel Cluster-Based Probability Model for Texture
Synthesis, Classification, and Compression. In B. G. Haskell and H.-M.
Hang, editors, Visual Communications and Image Processing ’93, volume
2094, pages 756–768, Bellingham, Washington, USA, October 1993. SPIE.
[143] J. Portilla and E. P. Simoncelli. Texture Modelling and Synthesis using
Joint Statistics of Complex Wavelet Coefficients. In IEEE Workshop on
Bibliography 307
Statistical and Computational Theories of Vision, Fort Collins, Colorado,
USA, June 1999.
[144] J. Portilla and E. P. Simoncelli. A Parametric Texture Model Based on
Joint Statistics of Complex Wavelet Coefficients. International Journal of
Computer Vision, 40(1):49–71, 2000.
[145] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numer-
ical Recipes in C: The Art of Scientific Computing. Cambridge University
Press, 1992.
[146] Digital Mammography Research, http://www.csse.uwa.edu.au/
˜ptaylor/digmam.html , accessed March 6 2005.
[147] W. Qian, M. Kallergi, L. P. Clarke, H.-D. Li, P. Venugopal, D. Song, and
R. A. Clark. Tree structured wavelet transform segmentation of micro-
calcifications in digital mammography. Medical Physics, 22(8):1247–1254,
August 1995.
[148] M. Quinn. Cancer survival, England, 1993-2000. National Statistics Press
Release, January 2002.
[149] Press release: R2 Introduces Smarter CAD Algorithm and Work-
flow For Mammography Products, http://www.r2tech.com/main/
company/news_one_up.php?prID=140 , accessed June 2005.
[150] R2 Home, http://www.r2tech.com , accessed July 20 2005.
[151] N. Ravishanker and D. K. Dey. A First Course in Linear Model Theory.
Chapman and Hall/CRC, 2002.
Bibliography 308
[152] C. J. Rose and C. J. Taylor. An Improved Method of Computing Scale-
Orientation Signatures. In Medical Image Understanding and Analysis,
pages 5–8, July 2001.
[153] C. J. Rose and C. J. Taylor. A Statistical Model of Texture for Medical Im-
age Synthesis and Analysis. In Medical Image Understanding and Analysis,
pages 1–4, July 2003.
[154] C. J. Rose and C. J. Taylor. A Generative Statistical Model of Mammo-
graphic Appearance. In D. Rueckert, J. Hajnal, and G.-Z. Yang, editors,
Medical Image Understanding and Analysis 2004, pages 89–92, Imperial
College London, UK, September 2004.
[155] C. J. Rose and C. J. Taylor. A Model of Mammographic Appearance. In
British Journal of Radiology Congress Series: Proceedings of UK Radio-
logical Congress 2004, pages 34–35, Manchester, United Kingdom, June
2004.
[156] C. J. Rose and C. J. Taylor. A Statistical Model of Mammographic Ap-
pearance for Synthesis and Analysis. In International Workshop on Digital
Mammography, 2004. (Accepted, pending.).
[157] C. J. Rose and C. J. Taylor. A Holistic Approach to the Detection of Abnor-
malities in Mammograms. In British Journal of Radiology Congress Series:
Proceedings of UK Radiological Congress 2005, page 29, Manchester, United
Kingdom, June 2005.
[158] B. Sahiner, H.-P. Chan, N. Petrick, M. A. Helvie, and L. M. Hadjiiski. Improvement of mammographic mass characterization using spiculation measures and morphological features. Medical Physics, 28(7):1455–1465, July 2001.
[159] P. Sajda, C. Spence, and L. Parra. A multi-scale probabilistic network
model for detection, synthesis and compression in mammographic image
analysis. Medical Image Analysis, 7(2):187–204, June 2003.
[160] A. Salomon. Beiträge zur Pathologie und Klinik der Mammakarzinome. Archiv für klinische Chirurgie, 101:573–668, 1913.
[161] J. A. Serra, editor. Image Analysis and Mathematical Morphology, volume 1.
Academic Press, April 1982.
[162] J. A. Serra, editor. Image Analysis and Mathematical Morphology: Theo-
retical Advances, volume 2. Academic Press, 1988.
[163] C. E. Shannon. A mathematical theory of communication. Bell System
Technical Journal, 27:379–423 and 623–656, July and October 1948.
[164] E. P. Simoncelli and W. T. Freeman. The Steerable Pyramid: A Flexible
Architecture for Multi-Scale Derivative Computation. In Second Interna-
tional Conference on Image Processing, volume 3, pages 444–447. IEEE
Signal Processing Society, 1995.
[165] J. H. Smith. Prediction of the risk of breast cancer using computer vision
techniques. PhD thesis, The Victoria University of Manchester, Manchester,
United Kingdom, 1998.
[166] J. H. Smith, S. M. Astley, J. Graham, and A. P. Hufton. The calibration of grey-levels in mammograms. In K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schmidt, editors, Digital Mammography '96: Proceedings of the 3rd International Workshop on Digital Mammography, International Congress Series, pages 195–200, Hillsborough, New Jersey, USA, December 1996. Excerpta Medica.
[167] P. Soille, J. Breen, and R. Jones. Recursive Implementation of Erosions and Dilations along Discrete Lines at Arbitrary Angles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(5):562–567, May 1996.
[168] M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis and Ma-
chine Vision. PWS (Brooks/Cole Publishing), International Thomson Pub-
lishing Europe, High Holborn, London, England, 2nd edition, 1999.
[169] C. Spence, L. Parra, and P. Sajda. Detection, Synthesis and Compression
in Mammographic Image Analysis with a Hierarchical Image Probability
Model. In L. Staib, editor, IEEE Workshop on Mathematical Methods in
Biomedical Image Analysis, pages 3–10. IEEE, 2001.
[170] S. J. Starr, C. E. Metz, L. B. Lusted, and D. J. Goodenough. Visual
detection and localization of radiographic images. Radiology, 116:533–538,
1975.
[171] J. Suckling, J. Parker, D. Dance, S. Astley, I. Hutt, C. Boggis, I. Ricketts, E. Stamatakis, N. Cerneaz, S. Kok, P. Taylor, D. Betal, and J. Savage. The mammographic image analysis society digital mammogram database. In A. G. Gale, S. M. Astley, D. R. Dance, and A. Y. Cairns, editors, Digital Mammography: Proceedings of the 2nd International Workshop on Digital Mammography, York, UK, 10–12 July 1994, pages 375–378. Elsevier Science, September 1994.
[172] L. Tabár, P. B. Dean, and T. Tot. Teaching Atlas of Mammography. Thieme Medical Publishers, New York, USA, 3rd edition, January 2001.
[173] P. G. Tahoces, J. Correa, M. Souto, L. Gomez, and J. J. Vidal. Computer-assisted diagnosis: the classification of mammographic breast parenchymal patterns. Physics in Medicine and Biology, 40(1):103–117, January 1995.
[174] L. Tarassenko, P. Hayton, N. Cerneaz, and M. Brady. Novelty detection
for the identification of masses in mammograms. In Proceedings of the
Fourth International Conference on Artificial Neural Networks, pages 442–
447. IEEE, June 1995.
[175] P. Taylor, S. Hajnal, M.-H. Dilhuydy, and B. Barreau. Measuring image
texture to separate “difficult” from “easy” mammograms. British Journal
of Radiology, 67(797):456–463, 1994.
[176] P. Taylor, R. Owens, and D. Ingram. 3-D Fractal Modelling of Breast
Growths. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th
International Workshop, pages 785–791, Madison, Wisconsin, USA, Decem-
ber 2001. Medical Physics Publishing.
[177] G. M. te Brake and N. Karssemeijer. Segmentation of suspicious densities in digital mammograms. Medical Physics, 28(2):259–266, February 2001.
[178] C. H. van Gils, J. H. C. L. Hendriks, R. Holland, N. Karssemeijer, J. D. M. Otten, H. Straatman, and A. L. M. Verbeek. Changes in mammographic breast density and concomitant changes in breast cancer risk. European Journal of Cancer Prevention, 8(6):509–515, December 1999.
[179] J. H. Veldkamp, N. Karssemeijer, J. D. M. Otten, and J. H. C. L. Hendriks.
Automated classification of clustered microcalcifications into malignant and
benign types. Medical Physics, 27(11):2600–2608, November 2000.
[180] W. Veldkamp and N. Karssemeijer. Improved correction for signal de-
pendent noise applied to automatic detection of microcalcifications. In
N. Karssemeijer, M. A. O. Thijssen, J. H. C. L. Hendriks, and L. J. T. O.
van Erning, editors, Digital Mammography, volume 13 of Computational
Imaging and Vision, pages 169–176. Kluwer Academic Publishers, Novem-
ber 1998.
[181] VuCOMP—Redefining CAD, http://www.vucomp.com/ , accessed
July 20 2005.
[182] R. Warren, M. Harvie, and A. Howell. Strategies for managing breast cancer
risk after the menopause. Treat Endocrinol, 3(5):289–307, 2004.
[183] A. P. Wickens. Foundations of Biopsychology. Pearson Education, Harlow,
England, 2nd edition, 2005.
[184] T. N. Wiesel. Postnatal development of the visual cortex and the influence
of the environment. Nature, 299:583–591, 1982.
[185] J. N. Wolfe. Risk for breast cancer development determined by mammo-
graphic parenchymal pattern. Cancer, 37(5):2486–2492, May 1976.
[186] C. J. Wright and C. B. Mueller. Screening mammography and public health policy: the need for perspective. The Lancet, 346(8966):29–32, July 1995.
[187] Y. Wu, M. L. Giger, K. Doi, C. Vyborny, R. A. Schmidt, and C. E. Metz.
Artificial Neural Networks in Mammography: Application to Decision Mak-
ing in the Diagnosis of Breast Cancer. Radiology, 187:81–87, April 1993.
[188] W. Zhang, K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schmidt. An
improved shift-invariant artificial neural network for computerized detection
of clustered microcalcifications in digital mammograms. Medical Physics,
23(4):595–601, April 1996.
[189] C. Zhou, H.-P. Chan, N. Petrick, M. A. Helvie, M. M. Goodsitt, B. Sahiner,
and L. M. Hadjiiski. Computerized image analysis: Estimation of breast
density on mammograms. Medical Physics, 28(6):1056–1069, June 2001.
[190] R. Zwiggelaar, S. M. Astley, C. R. M. Boggis, and C. J. Taylor. Linear
Structures in Mammographic Images: Detection and Classification. IEEE
Transactions on Medical Imaging, 23(9):1077–1087, September 2004.
[191] R. Zwiggelaar and R. Marti. Detecting Linear Structures In Mammographic
Images. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th
International Workshop, pages 436–442, Madison, Wisconsin, USA, Decem-
ber 2001. Medical Physics Publishing.
[192] R. Zwiggelaar, T. C. Parr, J. E. Schumm, I. W. Hutt, S. M. Astley, C. J. Taylor, and C. R. M. Boggis. Model-based detection of spiculated lesions in mammograms. Medical Image Analysis, 3(1):39–62, 1999.
[193] R. Zwiggelaar, P. Planiol, J. Marti, R. Marti, L. Blot, E. R. E. Denton,
and C. M. E. Rubin. EM Texture Segmentation of Mammographic Im-
ages. In H.-O. Peitgen, editor, Digital Mammography: IWDM—6th In-
ternational Workshop on Digital Mammography, pages 223–227. Springer-
Verlag, March 2003.