
Statistical Models of Mammographic Texture and Appearance

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Medical and Human Sciences

2005

Christopher J. Rose

School of Medicine


Contents

1 Introduction
1.1 Introduction
1.2 Breast cancer
1.3 Computer-aided mammography
1.4 Novelty detection
1.5 Generative models
1.6 Overview of the thesis
1.7 Summary

2 Breast cancer
2.1 Introduction
2.2 Anatomy of the breast
2.3 Breast cancer
2.3.1 What is breast cancer?
2.3.2 Predictive factors
2.3.3 Prevention
2.3.4 Clinical detection
2.3.5 Treatment
2.3.6 Survival
2.4 Breast imaging
2.4.1 X-ray mammography
2.4.2 Ultrasonography
2.4.3 Magnetic resonance imaging
2.4.4 Computed tomography
2.4.5 Thermography
2.5 Summary

3 Computer-aided mammography
3.1 Introduction
3.2 Computer-aided mammography
3.3 Image enhancement
3.4 Breast segmentation
3.5 Breast density and risk estimation
3.6 Microcalcification detection
3.7 Masses
3.8 Spiculated lesions
3.9 Asymmetry
3.10 Clinical decision support
3.11 Evaluation of computer-based methods
3.12 Image databases
3.13 Commercial systems
3.14 Prompting
3.15 Discussion
3.16 Summary

4 Scale-orientation pixel signatures
4.1 Introduction
4.2 Mathematical morphology
4.2.1 Dilation and erosion
4.2.2 Opening and closing
4.2.3 M- and N-filters
4.3 Pixel signatures
4.3.1 Local scale-orientation descriptors
4.3.2 Constructing pixel signatures
4.3.3 Metric properties
4.4 Analysis of the current implementation
4.4.1 Structuring element length
4.4.2 Local coverage
4.5 An information theoretic measure of signature quality
4.5.1 Aims
4.5.2 Method
4.5.3 Results
4.5.4 Discussion
4.6 Classification-based evaluation
4.6.1 Aims
4.6.2 Method
4.6.3 Results
4.6.4 Discussion
4.7 Summary

5 Modelling distributions with mixtures of Gaussians
5.1 Introduction
5.2 Background
5.3 Density estimation
5.4 Gaussian mixture models
5.4.1 Learning the parameters
5.4.2 The k-means clustering algorithm
5.4.3 The Expectation Maximisation algorithm for Gaussian mixtures
5.5 Useful properties of multivariate normal distributions
5.5.1 Marginal distributions
5.5.2 Conditional distributions
5.5.3 Sampling from a Gaussian mixture model
5.6 Learning from large datasets
5.7 Summary

6 Modelling mammographic texture for image synthesis and analysis
6.1 Introduction
6.2 Background
6.3 Non-parametric sampling for texture synthesis
6.4 A generative parametric model of texture
6.5 Generating synthetic textures
6.5.1 Pixel-wise texture synthesis
6.5.2 Patch-wise texture synthesis
6.5.3 The advantages and disadvantages of a parametric statistical approach
6.6 Some texture models and synthetic textures
6.6.1 A model of fractal mammographic texture
6.6.2 A model of real mammographic texture
6.6.3 The quality of the synthetic textures
6.6.4 Time and space requirements of the parametric method
6.7 Novelty detection
6.8 Summary

7 Evaluating the texture model
7.1 Introduction
7.2 Psychophysical evaluation of synthetic textures
7.2.1 Aims
7.2.2 Method
7.2.3 Results
7.2.4 Discussion
7.3 Initial validation of the novelty detection method
7.3.1 Aim
7.3.2 Method
7.3.3 Results
7.3.4 Discussion
7.4 Evaluation of novelty detection performance
7.4.1 Introduction
7.4.2 Aims
7.4.3 Method
7.4.4 Results
7.4.5 Discussion
7.5 Summary

8 GMMs in principal components spaces and low-dimensional texture models
8.1 Introduction
8.2 Dimensionality reduction
8.3 Gaussian mixtures in principal components spaces
8.3.1 A numerical issue
8.4 Texture synthesis in principal components spaces
8.5 Discussion
8.6 Summary

9 A generative statistical model of entire mammograms
9.1 Introduction
9.2 Background
9.2.1 Why are mammograms hard to model?
9.2.2 Approaches to modelling the appearance of entire mammograms
9.3 Modelling and synthesising entire mammograms
9.3.1 Breast shape and the correspondence problem
9.3.2 Approximate appearance
9.3.3 Detailed appearance
9.3.4 Generating synthetic mammograms
9.4 Example synthetic mammograms
9.5 Summary

10 Evaluating the synthetic mammograms
10.1 Introduction
10.2 Qualitative evaluation by a mammography expert
10.3 A quantitative psychophysical evaluation
10.3.1 Aims
10.3.2 Method
10.3.3 Results
10.3.4 Discussion
10.4 Evaluating the detailing model
10.5 Summary

11 Summary and conclusions
11.1 Introduction
11.2 Summary
11.3 Conclusions
11.4 Final statement

A The expectation maximisation algorithm
A.1 Introduction
A.2 The algorithm
A.3 Proof of convergence

List of Figures

2.1 Basic anatomy of the normal developed female breast.
2.2 Incidence of breast cancer in England.
2.3 The mediolateral-oblique and cranio-caudal views.
3.1 An example microcalcification cluster.
3.2 An example circumscribed mass.
3.3 An example spiculated lesion.
4.1 Dilation.
4.2 A sieved mammographic image.
4.3 Example pixel signatures.
4.4 An illustration of the two limitations of the existing implementation.
4.5 Incremental approximations of the bow tie structuring element.
4.6 Rotating the “rectangular” structuring elements.
4.7 An “improved” pixel signature from the centre of a Gaussian blob.
4.8 Regions of increased Shannon entropy.
4.9 An example region of interest and its groundtruth.
5.1 An illustration of the expectation maximisation algorithm.
5.2 A two-dimensional distribution marginalised over one dimension.
5.3 A conditional distribution.
5.4 The divide-and-conquer clustering algorithm.
6.1 Unconditional samples from the fractal model.
6.2 Fractal training and synthetic textures.
6.3 Unconditional samples from the real mammographic texture model.
6.4 Real training and synthetic textures.
6.5 Examples of synthesis failure using patch-wise synthesis with a model of real mammographic appearance.
7.1 A screenshot of one of the trials.
7.2 Fractal and scrambled textures.
7.3 ROC curve for texture discrimination.
7.4 The circle chord attenuation function.
7.5 The sigmoid attenuation function.
7.6 Examples of simulated masses using the three methods.
7.7 Example log-likelihood image and ROC curve for simulated microcalcifications.
7.8 Example log-likelihood image and ROC curve for a simulated mass.
7.9 ROC curve for simulated masses and microcalcifications (combined).
7.10 Example log-likelihood image and ROC curve for a real microcalcification cluster.
7.11 ROC curve for real masses.
7.12 ROC curve for real microcalcifications and masses (combined).
8.1 Synthesis using a principal components model.
9.1 Examples of mammographic variation.
9.2 Overview of the Active Appearance Model.
9.3 Samples from two shape models, illustrating the need for good correspondences.
9.4 Values of the Kotcheff and Taylor objective function.
9.5 Values of the MDL objective function.
9.6 The initial and final correspondences for the mammogram shape model.
9.7 Block diagram for the steerable pyramid decomposition.
9.8 The coefficients in the top three levels of a steerable pyramid decomposition of a mammogram.
9.9 Synthetic mammograms generated using the model.
9.10 Real and synthetic mammograms.
10.1 Contributions of detailing coefficients to real and synthetic mammograms.

List of Algorithms

1 The non-iterative k-means algorithm.
2 The iterative k-means algorithm.
3 The EM algorithm for fitting a GMM with two components to one-dimensional data.
4 The EM algorithm for fitting a GMM with multiple components to multivariate data.
5 Efros and Leung’s texture synthesis algorithm.
6 Pixel-wise texture synthesis with a Gaussian mixture model of local textural appearance.
7 Patch-wise texture synthesis with a Gaussian mixture model of local textural appearance.
8 Fractal mammographic texture algorithm.
9 Novelty detection using a Gaussian mixture model of texture.
10 Simulating microcalcification clusters.
11 Generating a synthetic mammogram.


List of Tables

4.1 Classification results for the two signature types.
7.1 Results for the psychophysical experiment.


Abstract

Breast cancer is the most common cancer in women. Many countries, including the UK, offer screening for the disease to asymptomatic women. The interpretation of mammograms is a visual task and is subject to human error. Computer-aided image interpretation has been proposed as a way of helping radiologists perform this difficult task. In such systems, shape and texture features are typically classified as true or false detections of specific signs of breast cancer. This thesis promotes an alternative approach in which any deviation from normal appearance is marked as suspicious, so that all signs of breast cancer are automatically included. This approach requires a model of normal mammographic appearance. Statistical models allow deviation from normality to be measured within a rigorous mathematical framework. Generative models make it possible to determine how and why a model is successful or unsuccessful. This thesis presents two generative statistical models. The first treats mammographic appearance as a stationary texture. The second models the appearance of entire mammograms. Psychophysical experiments were used to evaluate synthetic textures and mammograms generated using these models. A novelty detection experiment on real and simulated data shows how the model of local texture may be used to detect abnormal features.


Declaration

No portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.


Copyright

1. Copyright in the text of this thesis rests with the Author. Copies (by any process), either in full or of extracts, may be made only in accordance with instructions given by the Author and lodged in the John Rylands University Library of Manchester. Details may be obtained from the Librarian. This page must form part of any such copies made. Further copies (by any process) of copies made in accordance with such instructions may not be made without the permission (in writing) of the Author.

2. The ownership of any intellectual property rights which may be described in this thesis is vested in the University of Manchester, subject to any prior agreement to the contrary, and may not be made available for use by third parties without the written permission of the University, which will prescribe the terms and conditions of any such agreement.

3. Further information on the conditions under which disclosures and exploitation may take place is available from the Head of the School of Medicine.


Dedication

This thesis is dedicated to the memory of Gareth Jones.

In addition to being an excellent office mate, Gareth made a substantial contribution to my PhD research. With his dry sense of humour, willingness to help and pragmatic perfectionism, and despite his admirable unwillingness to bend to the stupidity of others, he motivated me to learn how to prepare documents using the LaTeX typesetting system, contributed to discussions on mathematical matters, helped me with various aspects of MATLAB and UNIX, and radically altered my view of computers and programming. It is a pleasure to have known him, and I wish I had known him better.

Friday 21 November 2003.


Acknowledgements

The author would like to thank the following people:

• My mother, Anne, who has put me and my brothers first in everything she has done.

• My girlfriend, Chris, for uncountable reasons.

• My PhD supervisor, Prof. Chris Taylor OBE, who is patient, supportive, giving and hard-working.

• Anthony Holmes, for his generosity in getting me started.

• Special thanks go to Andrew Bentley, who employed a spotty teenage geek and taught him electronics and computer programming. This thesis would not exist without his support: thank you! Thanks also to Richard, David, Keith and Martin for all their assistance.

• My friends, for their support over the last few years: Stuart, Rick, Rob, Jimi, Elios, Alan, Caroline, Harpreet, Karen, Ruth and Sian.

• My office mates: Gareth, Craig, Mike, Kaiyan, Basma, Tamader, John and Rob.


• Other members of ISBE, including Tim Cootes, Carole Twining, Sue Astley, Paul Beatty, Jim Graham, Ian Scott and Tomos Williams, for their help at various points during my PhD.

• The ISBE information technology support team for keeping things ticking.

• Alexandre Nasrallah for proof-reading some of the chapters in this thesis.

Funding

The work described in this thesis was supported by the EPSRC as part of the MIAS-IRC project From Medical Images and Signals to Clinical Information (EPSRC GR/N14248/01 and UK Medical Research Council Grant No. D2025/31).


About the Author

During holidays from his A-level studies and first degree, Chris Rose worked for Kraft Jacobs Suchard on a range of electronic and software projects. He graduated from The University of Manchester in 1999 with a 2.1 BEng (Hons) degree in Electronic Systems Engineering. He then worked for a small software house, where he developed software and produced training materials for Ericsson. In 2000, he returned to The University of Manchester to begin a PhD in the Division of Imaging Science and Biomedical Engineering, under the supervision of Prof. Chris Taylor OBE. During this period he published the following papers related to the work in this thesis.

• C. J. Rose and C. J. Taylor. An Improved Method of Computing Scale-Orientation Signatures. In Medical Image Understanding and Analysis, pages 5–8, July 2001.

• C. J. Rose and C. J. Taylor. A Statistical Model of Texture for Medical Image Synthesis and Analysis. In Medical Image Understanding and Analysis, pages 1–4, July 2003.


• C. J. Rose and C. J. Taylor. A Model of Mammographic Appearance. In British Journal of Radiology Congress Series: Proceedings of UK Radiological Congress 2004, pages 34–35, Manchester, United Kingdom, June 2004.

• C. J. Rose and C. J. Taylor. A Statistical Model of Mammographic Appearance for Synthesis and Analysis. In International Workshop on Digital Mammography, 2004. (Accepted, pending.)

• C. J. Rose and C. J. Taylor. A Generative Statistical Model of Mammographic Appearance. In D. Rueckert, J. Hajnal, and G.-Z. Yang, editors, Medical Image Understanding and Analysis 2004, pages 89–92, Imperial College London, UK, September 2004.

• C. J. Rose and C. J. Taylor. A Holistic Approach to the Detection of Abnormalities in Mammograms. In British Journal of Radiology Congress Series: Proceedings of UK Radiological Congress 2005, page 29, Manchester, United Kingdom, June 2005.

• A. S. Holmes, C. J. Rose, and C. J. Taylor. Measuring Similarity between Pixel Signatures. Image and Vision Computing, 20(5–6):331–340, April 2002.

• A. S. Holmes, C. J. Rose, and C. J. Taylor. Transforming Pixel Signatures into an Improved Metric Space. Image and Vision Computing, 20(9–10):701–707, August 2002.

‘As many truths as men. Occasionally, I glimpse a truer Truth, hiding in imperfect simulacrums of itself, but as I approach, it bestirs itself and moves deeper into the thorny swamp of dissent.’

From Cloud Atlas by David Mitchell.


Chapter 1

Introduction

1.1 Introduction

Since work for this thesis began, approximately 64 000 British women have died from breast cancer [24]. Computer-aided X-ray mammography has been proposed as a way to help radiologists detect breast cancer at an early stage. This thesis describes work on generative statistical models of normal mammographic appearance. The ultimate aim of this strand of research is to detect breast cancer as a deviation from normal appearance. The generative property provides insight into what has been modelled successfully and where improvement is needed. Two generative statistical models of mammographic appearance are described.

This chapter gives a brief overview of the main subjects and motivations of the thesis. It presents:

• An overview of breast cancer.

• An overview of computer-aided mammography.

• A description of novelty detection, the approach to breast cancer detection that motivates this thesis.

• A description of generative models, and an explanation of why this property is vital to developing accurate models.

• An overview of the organisation of the thesis.

1.2 Breast cancer

Approximately 11 500 women die from breast cancer each year in England and Wales, and it is the most common cancer in women, both in the UK and worldwide [82]. Breast cancer can be detected at an early stage using X-ray mammography; treatments are available and survival rates are good [82]. The UK National Health Service Breast Screening Programme (NHSBSP) was initiated in 1988 as a result of the Forrest report [66], published in 1987. All asymptomatic women aged 50–69 are invited for X-ray mammographic screening every three years. Radiologists visually inspect these X-ray images for signs of breast cancer and other problems. A more detailed background to breast cancer and screening is presented in Chapter 2.


1.3 Computer-aided mammography

Research into the use of computers to detect breast cancer in mammograms has been underway for about thirty years. In the most common approach, a computer automatically analyses a digitised mammogram and attempts to locate signs of cancer. Detections are displayed to clinicians as prompts on a computer screen or paper printout. Computer-aided mammography research has matured to the point where, in 1998, the US Food and Drug Administration (FDA) gave pre-market approval to the ImageChecker system, developed by R2 Technology Incorporated. Three other systems have since been given FDA approval. However, results from research into the effectiveness of these systems in the clinical environment are mixed. A large prospective study recently showed that the performance of expert screening radiologists in one academic practice was not improved by the use of a computer-aided mammography system [76] (see Section 3.14 for a more detailed discussion). Other studies have indicated that such systems can help radiologists detect breast cancer earlier [8]. Psychophysical experiments that have studied the effect of the false prompt rate (i.e. incorrect detections of cancer) on radiologist performance indicate that the numbers of true and false prompts must be approximately equal if radiologist performance is to be improved [95]. Only 5% of screening mammograms have any form of abnormality, which suggests a target rate of approximately 0.0125 false positives per image (see Chapter 3). Commercial systems operate at much higher false positive rates. For example, R2 Technology Incorporated claim that version 8.0 of their ImageChecker algorithm achieves ‘1.5 false positive marks per normal case at the 91 percent sensitivity level’ [149]. This may explain why commercial computer-aided mammography systems do not appear to improve radiologist performance. Research is needed to determine how computer-aided mammography systems can be improved and how the false positive rate can be reduced to the target level. Much more sophisticated approaches are likely to be required. This thesis investigates one such approach, which is described briefly in the next section.
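The two rates quoted above are expressed in different units (per image and per normal case). As a rough illustration of the size of the gap, assume (an assumption made here for illustration, not a figure from this chapter) that a screening case consists of four images, two views of each breast, and that each abnormal case receives about one true prompt. Equal numbers of true and false prompts then give roughly 0.05 false prompts per case, and the two rates compare as follows. Chapter 3 gives the thesis's own derivation of the target.

```latex
\begin{align*}
\text{false prompts per case} &\approx \text{true prompts per case} \approx 0.05,\\
\text{target false positives per image} &\approx \frac{0.05}{4} = 0.0125,\\
\text{ImageChecker false positives per image} &\approx \frac{1.5}{4} = 0.375 = 30 \times 0.0125.
\end{align*}
```

Without the four-image assumption, 1.5 marks per case is 120 times the per-image target. The conclusion that commercial systems operate far above the target therefore does not depend on that assumption.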

1.4 Novelty detection

Breast cancer, as imaged in mammograms, can manifest itself in a number of different ways. Masses appear as “blob”-like features, microcalcifications appear as very small specks, architectural distortions subtly change the appearance of the breast tissue, and spiculated masses have radiating linear structures. Each of these can be extremely subtle. Current computer-aided mammography methods typically target only microcalcifications and masses (including spiculated masses), and treat each type of abnormality separately. A common approach is to locate candidate abnormalities (often using ad hoc methods), compute measurements of shape and texture (called features) and then use a classifier to assign the features to clinically meaningful classes (e.g. malignant or benign). The approach has a number of drawbacks:

• Different features and classifiers are required for each type of abnormality.

• The features and classifiers implicitly and incompletely model the appearance of normal and cancerous tissue, both of which are subject to significant variation.

• It is often difficult to justify why a particular measure of texture or shape is better than another, or what it actually represents.

• The use of ad hoc methods risks the accidental adoption of assumptions about the data.

The approach advocated in this thesis is novelty detection, which is motivated by the fact that signs indicative of breast cancer are not found in pathology-free mammograms. If deviation from normality could be detected, then all types of abnormality would automatically be detectable. This approach requires a model of what normal mammograms look like. Mammograms vary dramatically, both between women and between screening sessions, so such a model must be able to cope with this variability. Statistical models capture variability and are suited to novelty detection problems because deviation from normality can be measured in a meaningful way within a rigorous mathematical framework. Abnormal mammograms are relatively rare in the screening environment, so there is much more data with which to train a model of normality than there is to train a classifier that has an “abnormal” class.
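The following minimal Python sketch shows how deviation from normality can be measured within a statistical framework. It fits a single Gaussian with diagonal covariance to feature vectors from normal examples. A new vector is flagged as novel when its log-likelihood falls below a threshold chosen from the training data. This is not the thesis's method: the models developed in later chapters are Gaussian mixture models of local texture. The class `DiagonalGaussian`, the helper functions, the toy data and the quantile-based threshold are all hypothetical choices made for brevity.

```python
import math
from typing import List, Sequence


class DiagonalGaussian:
    """A Gaussian with diagonal covariance, fitted to feature vectors of normal examples."""

    def __init__(self, training: Sequence[Sequence[float]]) -> None:
        n = len(training)
        dim = len(training[0])
        # Maximum-likelihood mean of each feature dimension.
        self.means: List[float] = [sum(x[d] for x in training) / n for d in range(dim)]
        # Maximum-likelihood variance of each dimension, floored to avoid division by zero.
        self.variances: List[float] = [
            max(sum((x[d] - self.means[d]) ** 2 for x in training) / n, 1e-9)
            for d in range(dim)
        ]

    def log_likelihood(self, x: Sequence[float]) -> float:
        """Log density of x; low values indicate deviation from the normal data."""
        return sum(
            -0.5 * math.log(2.0 * math.pi * v) - (xi - m) ** 2 / (2.0 * v)
            for xi, m, v in zip(x, self.means, self.variances)
        )


def threshold_from_training(model: DiagonalGaussian,
                            training: Sequence[Sequence[float]],
                            quantile: float = 0.01) -> float:
    """Hypothetical rule: the log-likelihood below which `quantile` of normal data falls."""
    scores = sorted(model.log_likelihood(x) for x in training)
    k = min(int(quantile * len(scores)), len(scores) - 1)
    return scores[k]


def is_novel(model: DiagonalGaussian, x: Sequence[float], threshold: float) -> bool:
    """Flag x as novel (potentially abnormal) if it is less likely than the threshold."""
    return model.log_likelihood(x) < threshold


# Toy two-dimensional "feature vectors" standing in for normal training data.
normal = [[0.0, 1.0], [0.2, 0.9], [-0.1, 1.1], [0.1, 1.0], [-0.2, 0.95]]
model = DiagonalGaussian(normal)
thr = threshold_from_training(model, normal, quantile=0.0)
print(is_novel(model, [3.0, -2.0], thr))  # True: far from the normal data
print(is_novel(model, [0.0, 1.0], thr))   # False: close to the normal mean
```

Only normal data is needed to fit the model. This matches the point above that normal mammograms are far more plentiful than abnormal ones.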

1.5 Generative models

If novelty detection is to be used, then the underlying model must be able to

“legally” represent any pathology-free instance and be unable to legally represent

abnormal instances. The only way to verify this is to be able to generate instances

Chapter 1—Overview of the thesis 33

from the model; thus the model must be generative. Further, generative models

make it relatively easy to visualise what has been modelled successfully and what

has not. The generative property makes progress towards a model that accurately

explains mammographic appearance tractable. The aim of the research presented

in this thesis was to develop and evaluate generative statistical models of normal

mammographic appearance with the ultimate aim of being able to detect breast

cancer via novelty detection. Two models have been developed and evaluated.

The first assumes that mammograms are textures and neglects the shape of the

breast and the spatial variability in mammographic texture. The model allows

synthetic textures to be generated and can be used in an analytical mode to

perform novelty detection. The second is a generative statistical model of entire

mammograms and addresses many of the problems associated with modelling

mammographic appearance.
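The two modes of use described above, generating synthetic instances and evaluating how probable an observed instance is, can be illustrated with a Gaussian mixture model. This is a generic sketch with hand-picked, hypothetical parameters (`weights`, `means`, `covs`); it is not the texture model or the model of entire mammograms developed later in the thesis.

```python
import numpy as np

# Hypothetical 2-component, 2-D Gaussian mixture: weights, means, covariances.
weights = np.array([0.7, 0.3])
means = np.array([[0.0, 0.0], [4.0, 4.0]])
covs = np.array([np.eye(2), 0.5 * np.eye(2)])


def sample_gmm(n, rng):
    """Generative mode: choose a component for each sample, then draw from it."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return np.array([rng.multivariate_normal(means[c], covs[c]) for c in comps])


def log_likelihood(x):
    """Analytical mode: log p(x) under the mixture for a single point x."""
    d = len(x)
    density = 0.0
    for w, m, S in zip(weights, means, covs):
        diff = x - m
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(S))
        density += w * np.exp(-0.5 * diff @ np.linalg.solve(S, diff)) / norm
    return np.log(density)


rng = np.random.default_rng(1)
synthetic = sample_gmm(500, rng)  # synthetic instances that can be inspected visually
print(synthetic.shape)
print(log_likelihood(np.array([0.0, 0.0])) > log_likelihood(np.array([10.0, -10.0])))
```

Inspecting samples such as `synthetic` shows what the model has and has not captured, while `log_likelihood` supplies the quantity that a novelty detector would threshold.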

1.6 Overview of the thesis

• Chapter 2 presents background information on breast cancer, the clinical

problem and the various imaging modalities that are used to diagnose the

disease.

• Chapter 3 presents a review of the computer-aided mammography litera-

ture.

• Chapter 4 describes work on improving the way that scale-orientation pixel signatures (a type of texture feature) are computed. A measure of signature quality, based upon information theory, is developed and a simple classification experiment is presented.

• Chapter 5 presents background information on the multivariate normal

distribution and the Gaussian mixture model. These models are used ex-

tensively in this thesis.

• Chapter 6 presents Efros and Leung’s algorithm for texture synthesis and

develops the method into a parametric statistical model of texture that can

be used in both generative and analytical modes. Synthetic textures are

presented.

• Chapter 7 presents a psychophysical evaluation of synthetic mammo-

graphic textures produced by the model developed in Chapter 6. A novelty

detection experiment using simulated and real data is presented.

• Chapter 8 presents an investigation into how Gaussian mixture models

(and hence the class of texture model presented in Chapter 6) may be

learned in low-dimensional principal components spaces. Texture synthesis

and analysis using such models is discussed.

• Chapter 9 describes a generative statistical model of entire mammograms

and shows how synthetic mammograms may be generated.

• Chapter 10 presents three evaluations of the synthetic mammograms gen-

erated using the model of entire mammograms.

• Chapter 11 summarises the work presented in the thesis.


1.7 Summary

This chapter presented a brief overview of the subjects, motivations and structure

of this thesis. The next chapter presents an introduction to breast cancer and

the imaging modalities used to detect the disease.

Chapter 2

Breast cancer

2.1 Introduction

This chapter introduces the clinical problem of breast cancer and describes how

medical imaging is used to detect the disease. The chapter discusses:

• The anatomy of the breast.

• Breast cancer and its risk factors, prevention, detection, treatment and

survival.

• The various medical imaging modalities used to detect breast cancer, par-

ticularly X-ray mammography.


2.2 Anatomy of the breast

The main purpose of the female breast is to produce and deliver milk to offspring.

Additionally, breasts are a secondary sexual characteristic and serve to indicate

sexual maturity. A brief description of the basic anatomy of the breast follows,

but the interested reader is directed to [172] for a comprehensive description

within the context of mammography.

The breast itself is a modified sweat gland and is composed of several structures,

illustrated in Figure 2.1. Above the ribcage is the pectoral muscle. At the front

of the breast, and externally visible, is the nipple. Milk is produced in lobes and

delivered to the nipple by ducts. These are collectively referred to as parenchymal

or glandular tissue; they are the functional structures of the breast, as opposed to

being connective or supporting tissues. The areola exposes glands that lubricate

the nipple during breastfeeding. Circular radiating muscles behind the areola

cause the nipple to become erect upon tactile stimulation, facilitating suckling.

The lymphatic system is responsible for protecting the body from infection by

microorganisms and antigens. This is achieved by transporting the microorgan-

isms and antigens to the lymph nodes where they are dealt with by the body’s

cellular immune system. Blood is transported to and from the breast by the vas-

culature. Blood delivers oxygen and nutrients and removes waste products. The

structure of the breast is supported by Cooper’s ligaments, and the breast also contains adipose (fatty) tissue; neither is shown in Figure 2.1.


Figure 2.1: Basic anatomy of the normal developed female breast. Key: A: pectoral muscle; B: vasculature; C: lobe; D: duct; E: lymph node and lymphatic system; F: nipple; G: areola.


2.3 Breast cancer

Breast cancer is almost exclusively a disease that affects women: 11 491 women

and 82 men died from breast cancer in England and Wales in 2002 [82]. We will

now briefly examine the background to the disease.

2.3.1 What is breast cancer?

We will now briefly discuss the cellular basis of cancer (the interested reader is directed to [36] for background material on cellular biology). Our bodies are composed of cells, which typically carry all of the genetic information required to determine how we will grow. Cancer is an umbrella term for a group of diseases that cause cells in the body to reproduce in an uncontrolled manner.

Cells have several abilities, one of which is reproduction. Reproduction is achieved via cell division. At each cell division, the genetic material contained within the mother cell is copied to the daughter cells via a robust mechanism. This mechanism can detect errors in the genetic material contained within the cell and can instruct the cell to “commit suicide” via programmed cell death (PCD, also referred to as apoptosis) to prevent the erroneous information from being propagated.

Recent cancer research has suggested that an enzyme called telomerase [77] plays an important role in cancer. At each normal cell division, genetic material at the ends of the chromosomes is lost. To prevent useful genetic material from being destroyed, the ends of chromosomes have redundant repeating genetic sequences called telomeres. Part of these sequences is lost at each cell division, but the genetic information specific to the organism is preserved. If telomeres become too short, or are deleted entirely, the body interprets the genetic sequence as being broken. In this situation, the cell can be instructed to perform PCD, or reparative mechanisms can be employed; these reparative mechanisms can introduce genetic mutations. Cancer cells are “immortal” in that they do not respond to PCD instructions. Telomerase, an enzyme that builds new telomeres, is expressed in approximately 90% of cancers, and the telomeres in cancer cells do not shorten. It is believed that telomerase may be the reason why cancer cells are immortal. Cancer cells divide rapidly until they are forcefully destroyed (e.g. by medical intervention or the death of the host organism). Cancer cells are therefore genetically abnormal, but the exact genetic nature of cancer is not yet fully understood.

Cancers are named after their originating organ (e.g. breast cancer originates in the breast and is composed of pathological breast tissue). Cancer cells can break away from their original location and travel through the vascular or lymphatic systems. These cells may lodge to form secondary cancers in other parts of the body. This process is called metastasis. The new cancer is named after the originating tissue and new location, for example secondary breast cancer of the brain. Breast cancer generally develops in the ducts (ductal cancer), but may also develop in the lobules (lobular cancer).

The terms cancer and tumour are not synonymous. A tumour may be benign or

malignant. Benign tumours are abnormal growths, but do not grow uncontrollably or metastasise, and are not necessarily life-threatening. The word cancer is synonymous with the phrase malignant tumour. Benign tumours can become

malignant, but malignant tumours do not become benign. Cancer is caused by a

number of factors that can act individually or in combination [5]. These include:

• External factors, e.g. exposure to:

– Chemicals—particularly tobacco use

– Infectious organisms

– Radiation

• Internal factors, e.g.:

– Inherited and metabolic genetic mutations

– Hormones

– Immunity responses

Breast cancers can be described as being in situ (i.e. they have not spread from

their originating duct or lobule), and are often cured [4]. Alternatively, breast

cancers can be described as being invasive or infiltrating (i.e. they have broken

into the surrounding fatty tissue of the breast). The severity of an invasive breast

cancer is related to the stage of the disease, which describes how far it has spread

(e.g. it is confined to the breast, or surrounding tissue, or has metastasised to

distant organs). The following terms are often used to describe the stage of the

disease [37]:


• Stage 1

– The tumour is no larger than 2 cm in diameter.

– The lymph nodes in the armpit are unaffected.

– The cancer has not metastasised.

• Stage 2

– The tumour is between 2 cm and 5 cm in diameter, and/or the cancer

has spread to the lymph nodes under the armpit.

– The cancer has not spread elsewhere in the body.

• Stage 3

– The tumour is larger than 5 cm in diameter.

– The cancer has spread to the lymph nodes under the armpit.

– The cancer has not spread elsewhere in the body.

• Stage 4

– The tumour may be any size.

– The lymph nodes in the armpit are often affected.

– The cancer has spread to other parts of the body.

2.3.2 Predictive factors

The risk of developing breast cancer increases with age, as Figure 2.2 illustrates.

In the USA, 95% of new cases and 96% of breast cancer deaths in the period 1996–2000 occurred in women aged 40 and older [4].

Figure 2.2: Incidence of breast cancer in England. The incidence of breast cancer in English women in 2001 per 100 000 population as a function of age. Linear interpolation is used between data points. Source of data: National Statistics [21].

Relative risk is defined as the ratio of the probability of the disease in the group exposed to the risk to the probability of the disease in a control group. Risk factors can be grouped by relative risk [4]:

• Relative risk > 4.0

– Inherited genetic mutations (particularly BRCA1 and/or BRCA2).

– Two or more first-degree relatives (a mother, father, sister, brother, daughter or son) diagnosed with breast cancer at an early age.

– High post-menopausal breast density.

• Relative risk > 2.0 and ≤ 4.0

– One first-degree relative with breast cancer.

– High dose of radiation to the chest.

– High post-menopausal bone density.

• Relative risk > 1.0 and ≤ 2.0

– Late age at first full-term pregnancy (> 30 years).

– Early menarche (< 12 years).

– Late menopause (> 55 years).

– No full-term pregnancies.

– Recent oral contraceptive use.

– Recent and long-term hormone replacement therapy.

– Tall stature.

– High socioeconomic status.

– Post-menopausal obesity.
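The definition of relative risk can be made concrete with a small calculation on counts from an exposed group and a control group. The counts below are invented purely for illustration (they are not taken from [4]), and the helper names `relative_risk` and `risk_group` are hypothetical; the grouping thresholds follow the list above.

```python
def relative_risk(exposed_cases, exposed_total, control_cases, control_total):
    """P(disease | exposed) / P(disease | control).

    Written as a single ratio of products, which is algebraically identical
    to dividing the two proportions but avoids extra rounding steps.
    """
    return (exposed_cases * control_total) / (exposed_total * control_cases)


def risk_group(rr):
    """Place a relative risk into the groups used in the list above."""
    if rr > 4.0:
        return "> 4.0"
    if rr > 2.0:
        return "> 2.0 and <= 4.0"
    if rr > 1.0:
        return "> 1.0 and <= 2.0"
    return "no increased risk"


# Hypothetical counts: 30 cases among 1 000 exposed women, 10 among 1 000 controls.
rr = relative_risk(30, 1000, 10, 1000)
print(rr, risk_group(rr))
```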

The link between tobacco use and breast cancer is unclear. Some studies have shown

that smoking is not associated with the disease, while others have indicated a

link [43]. Effects due to smoking are confounded by alcohol use, which correlates

with both tobacco use and increased breast cancer risk. Alcohol is the dietary

factor most consistently associated with increased breast cancer risk [4] and breast

cancer risk increases by about 7% per alcoholic drink consumed per day [112].


2.3.3 Prevention

Breast cancer cannot be prevented entirely, because of environmental and inherited risk factors. However, it should be possible to reduce the incidence of cancers that can be attributed to lifestyle factors via behavioural modification.

One of the most important lifestyle changes that can be made is the management

of alcohol consumption: even moderate alcohol use is associated with increased

breast cancer risk [4]. Moderate alcohol consumption has a cardio-protective

effect, so advice on alcohol consumption must consider more than just breast

cancer risk [182]. Women who are not known to have an increased risk of breast

cancer are advised to adopt a healthy lifestyle by limiting alcohol, avoiding tobacco use and maintaining a healthy weight through regular exercise and a diet that is low in fats and high in fruit and vegetables. However, this advice

is not specific to breast cancer, and instead considers evidence for all common

diseases [182]. Women who are known to have an increased risk of breast cancer

should be advised accordingly.

There is debate within the clinical community about how women should be ad-

vised regarding tobacco use and its effect on breast cancer risk. Some favour

honest advice that states that the balance of evidence shows no or little increased

risk, while others favour advice that emphasises the evidence that indicates that

there is an increased risk in some circumstances, and that women should be dis-

couraged from smoking because of other associated risks (e.g. lung cancer) [43].

General practitioners should consider the risk of breast cancer when prescribing hormonal medications such as hormone replacement therapy or oral contraceptives. Women at very high risk may be offered prophylactic mastectomy or treatment with a drug such as Tamoxifen [4].

2.3.4 Clinical detection

Breast cancer is most successfully treated at an early stage and it has been rec-

ommended for the past 30 years or so that women perform regular breast self-

examination (BSE). In recent years this advice has been challenged. A Canadian

meta-analysis failed to find evidence that BSE reduces breast cancer mortality,

but found that BSE results in more benign breast biopsies and increased patient

distress [12]. The study recommended that women should not be taught BSE, but

the author stresses the difference between BSE and breast self-awareness, and en-

courages the latter [118]. An American study found that women who had benign

biopsies after performing BSE tended to perform BSE less frequently as a result

[13]. Advice on BSE and breast self-awareness needs to be informed by evidence of the risks of increased biopsy rate and distress, weighed against the potential benefits. The

American Cancer Society currently recommends that women optionally perform

monthly BSE [4].

Some countries have implemented national screening programmes—where women

are invited for asymptomatic X-ray imaging of the breast (mammography) to

detect cancer at an early stage. The International Breast Cancer Screening Network currently has 27 member countries that have pilot or established national

or subnational screening programmes [101]. These members are predominantly

developed countries in North America, Western Europe and the Far East. The UK National Health Service Breast Screening Programme (NHSBSP) was initiated in 1988 as a result of the Forrest report [66]. Women between the ages of

50 and 70 (formerly 65) are invited for screening every three years. Women now

have two views of each breast imaged at each screening session; this resulted in 13% more breast cancers being detected in 2002/3 than in the previous 12 months, when a single view was used [133]. A 14-year follow-up of the

Edinburgh randomised trial of breast screening, published in 1999, showed that

breast screening reduced breast cancer mortality by 13% [2]; the NHSBSP an-

nual review for 2004 [133] claims that mortality dropped by 30% in the preceding

decade, though this success cannot be attributed to breast screening alone.

The benefits of asymptomatic breast screening are disputed and some argue that

screening may even be detrimental to the health of women. Gøtzsche and Olsen

argue that there is no reliable evidence that screening mammography reduces

mortality and that screening may result in distress and unnecessarily aggressive

treatment [73, 136]. However, their conclusions are largely based upon meta-analyses that discredit studies showing that screening has a positive effect, rather than upon data showing that screening has a negative effect. Another

criticism of screening mammography is economic. While the cost per woman

screened is low (approximately £40 [134] in the UK), another picture emerges

when one looks at the cost per life saved. The UK NHSBSP currently costs

approximately £52M per year and is estimated to save approximately 300 lives

per year [134]. This equates to an approximate average cost of £173 300 per

life saved. By the year 2010, it is estimated that the NHSBSP will save 1 250

lives per year; this will bring the cost per life saved down to approximately £41 600 (assuming other factors do not change). In 1995, the cost per life saved

by the Ontario, Canada screening programme was estimated to be £558 000,

based upon the cost of a single mammography examination and the estimated

number of women who would need to be screened in order for one life to be saved

[186]. Variation in the cost of screening can be attributed to the environment

and manner in which screening and treatment are implemented. It is a matter

for those responsible for public health policy to determine the best use of available

resources given the evidence for and against screening mammography.
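The UK cost-per-life-saved figures quoted above follow from simple division of the annual programme cost by the number of lives saved per year. The sketch below reproduces that arithmetic under the same assumption stated in the text, namely that the annual cost stays at approximately £52M; `cost_per_life` is a hypothetical helper name.

```python
def cost_per_life(annual_cost_gbp, lives_saved_per_year):
    """Average programme cost per life saved, in pounds."""
    return annual_cost_gbp / lives_saved_per_year


annual_cost = 52_000_000                    # approximately £52M per year [134]
current = cost_per_life(annual_cost, 300)   # approximately 300 lives per year [134]
by_2010 = cost_per_life(annual_cost, 1250)  # projected 1 250 lives per year, same cost assumed

# Rounded to the nearest £100, matching the precision used in the text.
print(round(current, -2), round(by_2010, -2))
```

The unrounded current figure is about £173 333, which the text quotes as £173 300.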

Molecular tests are now available that can detect some of the BRCA genetic

mutations [4] and these may be used routinely in the future. Consideration is

being given to a UK-wide programme to use magnetic resonance imaging to screen

pre-menopausal women at high genetic risk of breast cancer [28].

2.3.5 Treatment

Treatment for breast cancer is dependent upon several factors: the stage of dis-

ease and its biological characteristics, patient age and the risks and benefits as

determined by clinicians and the patient [4]. Surgery to remove the cancerous

tissue is common, and the type of surgery is chosen to balance the need to re-

move the cancer with the disfigurement that the surgery will cause. Surgery may

involve (in order of increasing disfigurement):

• Lumpectomy—which can be employed when the cancer is localised—involves

removing the “lump” and a border of “normal” tissue which is checked to

ensure that all cancerous tissue has been removed.


• Simple mastectomy (or total mastectomy) involves the removal of the entire

breast.

• Modified radical mastectomy involves the removal of the entire breast and

underarm lymph nodes.

• Radical mastectomy involves the removal of the breast, underarm lymph

nodes and chest wall muscle. This type of surgery is now used less frequently

as less disfiguring approaches have proved to be effective [4].

Surgery is often used alongside chemotherapy, hormone therapy, biologic (also

called immune and antibody) therapy or radiotherapy. Chemotherapy, hormone

and biologic therapies are systemic treatments in that they are applied to the

entire body—rather than a specific organ—with the intention of killing cancer

cells that may have metastasised.

Chemotherapy is a drug treatment that kills rapidly dividing cells. This includes

cancer cells as well as some types of normal cells, such as blood and hair cells.

Chemotherapy, in combination with surgery, has been shown to deliver five-year survival rates of between 50% and 70% [25].

Hormone therapy attempts to prevent the growth of metastasised cancer cells

by blocking the effects of hormones (such as oestrogen) that can promote their

growth. An anti-oestrogen drug called Tamoxifen has been used successfully,

but recent research shows that the aromatase inhibitor anastrozole significantly

increases disease-free survival over five years compared to Tamoxifen [75].


Trastuzumab (marketed under the name Herceptin) is a biologic therapy that

targets cancer cells which produce an excess of a protein called HER2. When

combined with chemotherapy, trastuzumab treatment can reduce the relative

risk of mortality by 20%, but can increase the risk of heart failure [119].

In contrast to the systemic treatments, radiotherapy (also called radiation ther-

apy) is targeted at specific locations. High energy radiation is focused on areas of

the body affected with cancer (such as the breast, chest wall or underarm area).

Alternatively, small radiation sources, called pellets, can be implanted into the

cancer. There is no significant difference in survival between women who have

small breast tumours removed by lumpectomy compared to those who also re-

ceive radiotherapy, but women who receive radiotherapy have a reduced risk of

their cancer returning and therefore require less additional treatment [64].

2.3.6 Survival

The one- and five-year survival rates for English women diagnosed with breast cancer between 1993 and 2000 were 92.6% and 75.9% respectively [148]. For comparison, in the same period the mean one- and five-year survival rates in both sexes for lung cancer (the second most common cancer in women and the most common cancer in men) were 21.6% and 5.5% respectively. In the USA, the five-year survival rate for women with breast cancer is 87% [4]. Low survival rates are also associated with low socioeconomic status, poor access to medical care and additional illness [4].


2.4 Breast imaging

This section introduces X-ray mammography—the most common form of clinical

imaging used to detect breast cancer—and briefly discusses the other imaging

modalities that may be used.

2.4.1 X-ray mammography

X-rays were discovered in 1895 by Wilhelm Conrad Röntgen, who was awarded the first Nobel Prize in Physics for the discovery. X-rays are high-frequency

electromagnetic radiation (30 PHz–60 EHz) and are useful in diagnostic imaging

because the dense tissues in the body are more likely to absorb X-rays (i.e. they

are radio-opaque) while the soft tissues are less likely to absorb X-rays (i.e. they

are radiolucent). X-rays are formed by accelerating electrons from a heated cath-

ode filament towards an anode. The interaction of the high energy electrons with

the anode emits radiation in the X-ray spectrum. This radiation is then directed

towards the patient.

X-rays are detected using photographic film or digitally (e.g. using a charge-

coupled device). By placing a body part between the X-ray source and detector,

it is possible to form an image that spatially describes the X-ray absorption of

the body part. This image will be a two-dimensional projection of the three-

dimensional structure.
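The statement that the image is a two-dimensional projection of three-dimensional X-ray absorption can be sketched with the Beer–Lambert law, I = I₀ exp(−Σ μ Δz), applied along parallel rays through a voxel grid of linear attenuation coefficients μ. This is a deliberately idealised model (monoenergetic beam, parallel rays, no scatter, no detector response), and the attenuation values and the `project` helper are illustrative assumptions.

```python
import numpy as np


def project(mu, voxel_size, i0=1.0):
    """Parallel-beam projection of an attenuation volume mu[z, y, x] along z.

    Returns the transmitted intensity reaching a 2-D detector, following the
    Beer-Lambert law I = I0 * exp(-sum(mu * dz)) along each ray.
    """
    line_integral = mu.sum(axis=0) * voxel_size
    return i0 * np.exp(-line_integral)


# Hypothetical volume: weakly attenuating background with one denser inclusion.
mu = np.full((20, 32, 32), 0.05)        # attenuation per mm (illustrative values)
mu[8:12, 14:18, 14:18] = 0.5            # denser structure, four voxels deep
image = project(mu, voxel_size=1.0)     # 2-D image of shape (32, 32)

# Less radiation reaches the detector behind the dense structure.
print(image.shape, image[16, 16] < image[0, 0])

# The same inclusion at a different depth gives the same image: depth is lost.
mu_shifted = np.full((20, 32, 32), 0.05)
mu_shifted[2:6, 14:18, 14:18] = 0.5
print(np.allclose(project(mu_shifted, 1.0), image))
```

The second check illustrates why structures at different depths overlap in a mammogram, which motivates imaging the breast from more than one direction.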

X-rays were first used to investigate breast cancer almost a hundred years ago [160]. An X-ray mammogram is obtained by imaging the breast compressed between two parallel radiolucent plates. Different directions of compression allow clinicians to view the three-dimensional structure of the breast in more than one way. This allows ambiguities caused by occlusion or other perspective effects to be minimised. Two common views are the cranio-caudal view (CC, “head to tail”) and the mediolateral-oblique view (MLO, in which the compression is angled at approximately 45° to the CC view). These are illustrated in Figure 2.3.

Figure 2.3: The mediolateral-oblique and cranio-caudal views. The diagram illustrates the directions of compression used in the mediolateral-oblique (MLO) and cranio-caudal (CC) views. The MLO view is illustrated on the left in blue and the CC view is illustrated on the right in red.

X-ray mammography is the imaging modality of choice for breast cancer investigation, and the UK National Health Service Breast Screening Programme generates hundreds of thousands of mammograms each year [133]. X-ray mammography is favoured because of its high resolution (required to image microcalcifications) and low cost (approximately £40 per woman screened [134]).

Fully digital systems are increasing in quality and popularity. The advantages of

fully digital systems may include:

• Direct digital image acquisition.

• Increased sensitivity compared to film-based methods, permitting lower ra-

diation dosage.

• Immediate image display and enhancement.

• Improved archival and transmission possibilities (including remote image

analysis by human or computer).

It is expected that fully digital mammography will soon supersede film-based

mammography. Fully digital mammography is likely to benefit the computer-

aided mammography research community, as the digitisation step required for

film-based mammography is an impediment to the collection of useful image

data.

Although X-ray mammography remains the most useful imaging modality for

breast cancer, it is dependent upon the use of radiation, which itself can cause

cancer. It is likely that some cancers are caused by the screening programme.

Efforts are made to monitor and minimise radiation dose.


Mammograms are most commonly read visually as X-ray films, although com-

mercial computer-aided mammography and digital systems are being used—

particularly in the USA (see Section 3.13 for a discussion of commercial sys-

tems). In the screening environment, dedicated viewing stations are loaded with

a batch of mammograms. The mammograms are positioned so that left and

right breasts—and CC and MLO views, if both are available—can be compared

directly. Radiologists use strategies to try to ensure that ‘danger zones’ are always examined. In the UK screening environment, a radiologist typically takes an average of 30 s to read each patient’s mammograms. Radiologists

record their assessments and difficult cases are likely to be discussed with col-

leagues. If double reading is used—where two radiologists independently read

each mammogram—a protocol will be followed to combine the assessments of

each radiologist.

Women for whom screening indicates abnormality are recalled for further investi-

gation such as a magnification X-ray or ultrasound. The diagnosis of breast cancer

may be confirmed by analysing a tissue sample extracted by biopsy. Because the

interpretation of mammograms is a difficult task and is subject to human error,

biopsies are sometimes performed on women who do not have cancer. The recall

process is traumatic and biopsy—like any surgery—causes discomfort and worry.

The benign biopsy rate in 2002/3 was 1.20 per 100 000 women screened [133].

The benign biopsy rate has improved with advances in diagnostic technique.

The radiological signs of breast cancer are described in Chapter 3; example images

are given for the most common indicative signs.


2.4.2 Ultrasonography

Ultrasound imaging works by sending high-frequency sound pulses into the tissues

of a patient using an array of transceivers that is placed on the patient’s skin.

When these sounds encounter tissue interfaces, some of the sound is reflected back

to the array. The distances from the skin surface to the tissue interfaces are then

computed based upon the time between the pulses being sent and received and

the speed of the sound wave. One-dimensional transceiver arrays produce image

slices, while two-dimensional arrays produce volumes. These are presented to

the ultrasonographer on a computer display. Ultrasound images are generated in

real-time and are useful in breast cancer investigation when a suspicious feature

has been identified by X-ray mammography or when a patient has presented with

symptoms [63]. It is particularly useful for differentiating between cysts (which

are benign) and malignant masses.
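The depth calculation described above can be written as depth = c·t/2, where t is the time between sending a pulse and receiving its echo; the factor of two accounts for the round trip to the interface and back. The speed c = 1540 m/s is the value conventionally assumed for soft tissue by clinical scanners, which is an assumption not stated in the source; `echo_depth` is a hypothetical helper name.

```python
SPEED_OF_SOUND_TISSUE = 1540.0  # m/s, conventional soft-tissue assumption


def echo_depth(round_trip_time_s, speed=SPEED_OF_SOUND_TISSUE):
    """Depth (m) of a reflecting interface from the round-trip echo time (s).

    The pulse travels to the interface and back, hence the division by 2.
    """
    return speed * round_trip_time_s / 2.0


# An echo received 26 microseconds after the pulse was sent:
depth_mm = echo_depth(26e-6) * 1000
print(f"{depth_mm:.2f} mm")
```

Because a single assumed speed is used for all tissue, real variations in the speed of sound translate into small errors in the computed depths.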

2.4.3 Magnetic resonance imaging

The human body is composed largely of water, which in turn is composed largely

of hydrogen. A hydrogen atom has an unpaired proton, and so has a non-zero

nuclear spin. In magnetic resonance imaging (MRI), the patient is placed in a

strong uniform magnetic field (usually between 0.23 T and 3.0 T). This forces the

spins of the protons in the hydrogen atoms to align with the field. Almost all

protons will be paired, in that each member of a pair will be oriented at 180° to

the other, but some will not. A radio frequency pulse can temporarily deflect the

unpaired protons.


The imparted energy is released as electromagnetic radiation as the spins realign

with the field. The realignment signal is characteristic of tissue type and can

be measured. By applying an additional gradient magnetic field, it is possible to localise the signals, since their frequency is related to their position in the gradient field. The received signals are recorded in a spatial-frequency domain called k-space. An inverse Fourier transform is applied to form the corresponding

spatial volumetric data. Voxel values represent tissue type and hence the patient’s

anatomy.
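The final reconstruction step can be illustrated numerically. In an idealised setting, fully sampled k-space is the 2-D discrete Fourier transform of the object, so an inverse FFT recovers the image. Real scanners involve sampling trajectories, noise and coil sensitivities; the disc-shaped “phantom” below is a hypothetical test object.

```python
import numpy as np

# Hypothetical 64x64 phantom: a bright disc on a dark background.
n = 64
y, x = np.mgrid[:n, :n]
phantom = ((x - 32) ** 2 + (y - 32) ** 2 < 15 ** 2).astype(float)

# Idealised acquisition: k-space is the 2-D DFT of the object.
# fftshift places the low spatial frequencies (k = 0) at the centre,
# the way k-space is usually displayed.
k_space = np.fft.fftshift(np.fft.fft2(phantom))

# Reconstruction: undo the shift, apply the inverse FFT, take the magnitude.
recon = np.abs(np.fft.ifft2(np.fft.ifftshift(k_space)))
print(np.allclose(recon, phantom))

# Keeping only the central (low-frequency) part of k-space blurs the image,
# one way of seeing how spatial resolution depends on the k-space sampled.
mask = np.zeros_like(k_space)
mask[24:40, 24:40] = 1
blurred = np.abs(np.fft.ifft2(np.fft.ifftshift(k_space * mask)))
print(np.allclose(blurred, phantom))
```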

The spatial resolution of current clinical MRI systems is not as good as that of X-

ray mammography, so microcalcifications cannot be imaged. However, MRI has

several advantages over X-ray imaging: patients are not exposed to ionising radiation, three-dimensional data can be acquired and contrast agents can be used. Unfortunately, MRI is currently too expensive for routine asymptomatic screening for breast cancer, but may be useful for screening younger women whose family history and/or genetic status suggest that they are at increased risk of breast cancer [28].

2.4.4 Computed tomography

In computed tomography (CT), an X-ray source is rotated around the patient’s

major axis. Whereas the beam of a conventional X-ray can be considered to be

conical (i.e. 3-D), CT typically uses a “triangular” beam (i.e. a very thin cone).

The attenuation of the beam as it passes through the patient is recorded by an

X-ray detector positioned opposite to the source. The attenuation data from all orientations can be combined to compute a 2-D image “slice”, where each location in the slice represents the X-ray attenuation of the tissue to which it corresponds.

By slowly passing the patient through the rotating mechanism, 3-D data can be

acquired. Although it is possible to use CT for breast imaging, it is rarely used

to diagnose breast cancer [117]. The technique can be useful for surgical planning

and to assess the patient’s response to treatment.

2.4.5 Thermography

Advanced cancers promote angiogenesis—the development of a blood supply to

the tumour. Regions containing more blood are hotter than others, and this heat

may be detectable on the skin surface. Thermography is an imaging technique

that forms maps of the emission of infrared radiation [117]. These maps enable

clinicians to look for asymmetries in the heat patterns on the breasts that may

result from angiogenesis or the enhanced metabolic processes that occur in a

tumour. Compared to X-ray mammography, thermography lacks specificity and

resolution.

2.5 Summary

This chapter presented an introduction to breast cancer and the imaging modal-

ities used to detect the disease. In summary:

• Breast cancer is a significant public health issue. While many countries now have screening programmes to help detect the disease at an early and treatable stage, the image interpretation task is performed visually and is subject to human error.

• X-ray mammography is the most useful imaging modality because of the

high image quality and low cost. X-ray mammography allows the anatomy

of the breast to be imaged at very high resolution, allowing very small

indicative signs of breast cancer—such as microcalcifications—to be seen.

• X-ray mammography does have some drawbacks (e.g. the use of radiation,

2-D projection of the 3-D structure, potential for poor patient positioning,

potential for poor film exposure and development).

• Other imaging modalities have their uses in detecting and diagnosing breast

cancer, but X-ray mammography for screening is unlikely to be replaced by

any of the currently available imaging techniques.

Chapter 3

Computer-aided mammography

3.1 Introduction

This chapter presents a review of the computer-aided mammography literature.

The chapter reviews:

• Image pre-processing.

• Automatic prediction of breast cancer risk.

• The appearance of common signs of breast cancer and approaches to their

detection by computer.

• Methods of evaluating computer-aided mammography systems.

• Common image databases.

• Available commercial systems.

• Research on computer-aided prompting of radiologists.

We discuss the typical approach to computer-aided mammography and the problems

associated with it, and propose how these problems might be addressed.

3.2 Computer-aided mammography

Although screening mammography has been shown to reduce breast cancer mor-

tality [133, 2], it suffers from some problems that computer vision systems might

be able to solve, for example:

• Double reading improves the cancer detection rate [57], but it cannot always be

performed in the screening environment due to human resource or economic

limitations. Computer vision systems could act as a second reader.

• The interpretation of mammograms is a difficult task and human error does

occur [97]. Computer vision systems could deliver a guaranteed minimum

quality of screening and potentially catch some of the errors made by radi-

ologists.

• Cancer is detected in fewer than 1% of women screened [133]. A computer

vision system that could reliably dismiss normal mammograms

could dramatically reduce radiologist workload.

The most commonly proposed approach to computer-aided mammography is

prompting, in which a computer system automatically analyses a digitised mam-

mogram and places prompts on a representation of the mammogram—e.g. an

image of the digitised mammogram displayed on screen or a paper printout—to

indicate the presence and location of possible signs of abnormality. A radiol-

ogist would then consider these prompts alongside their own interpretation of

mammograms. Prompting is discussed further in Section 3.14. The following re-

view of computer-aided mammography research generally assumes the prompting

approach, but other paradigms are also discussed.

3.3 Image enhancement

Image enhancement describes approaches that change the characteristics of im-

ages to make them more amenable to other tasks (e.g. inspection by humans or

further processing by computer). This includes noise suppression or equalisation,

image magnification, grey-level manipulation (e.g. brightness and contrast im-

provement) and feature enhancement or suppression. Generic image enhancement

techniques are well-established and are routinely used within more sophisticated

algorithms.

A commonly-used algorithm is histogram equalisation [168]. Histogram equali-

sation attempts to modify the grey-level values in an input image such that the

histogram of those values matches a specified histogram, which is often flat. If

a flat target histogram is specified, the result will be an image that uses the

entire range of grey-levels, with increased contrast near maxima in the original

histogram, and decreased contrast near minima. A possible problem with the

approach is that the image is modified based upon global image statistics, which

might not be appropriate in local contexts. Local histogram modification tech-


niques use local neighbourhoods, while adaptive histogram modification methods

use local contextual information [168].
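As an illustration of the global (flat-target) case, the following minimal sketch maps each grey-level through the normalised cumulative histogram. It assumes an unsigned-integer image with at most `levels` grey-levels. The function name `equalise_histogram` is hypothetical, and the sketch does not implement the local or adaptive variants mentioned above.

```python
import numpy as np

def equalise_histogram(image, levels=256):
    """Global histogram equalisation: map grey-levels through the normalised
    cumulative histogram so the output histogram is approximately flat."""
    image = np.asarray(image)
    if image.dtype.kind not in "ui":
        raise ValueError("expects an integer-valued image")
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(float)
    cdf_min = cdf[cdf > 0][0]          # cumulative count at the lowest occupied level
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant image: nothing to stretch
        return image.copy()
    lut = np.round((cdf - cdf_min) / denom * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(image.dtype)
    return lut[image]                  # apply the look-up table to every pixel
```

Heavily populated grey-level ranges receive a steep section of the look-up table, and therefore more contrast. Sparsely populated ranges are compressed.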

Averaging filters replace pixel values with the average of those within a local

neighbourhood. Using the mean tends to blur edges (as it is essentially a low-pass

filter), whereas the median preserves edges much better. Bick et al. used median filtering to

remove noise spikes [15, 168]. Lai et al. used a modified median filter where the

set of pixels considered by the filter was restricted to exclude those that were too

dissimilar to the pixel that the filter was centred on [116]. The approach achieved

better edge preservation compared to the standard median filter. Such methods

are “coarse” in that they rarely have any model of the domain in which they

operate (e.g. such filters might mistake film noise for small microcalcifications

because they have no “knowledge” about those two classes of image feature).
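The minimal sketch below contrasts the mean filter, the median filter and a restricted median. The restricted median follows the idea attributed to Lai et al. above: neighbours that differ from the centre pixel by more than a fixed `tolerance` are excluded. The tolerance rule, the square window and all function names are illustrative assumptions rather than the published filter.

```python
import numpy as np

def _windows(image, radius):
    """(H, W, k*k) array of each pixel's square neighbourhood (reflected edges)."""
    k = 2 * radius + 1
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="reflect")
    win = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return win.reshape(image.shape[0], image.shape[1], k * k)

def mean_filter(image, radius=1):
    return _windows(image, radius).mean(axis=-1)

def median_filter(image, radius=1):
    return np.median(_windows(image, radius), axis=-1)

def restricted_median_filter(image, radius=1, tolerance=20.0):
    """Median over only those neighbours within `tolerance` of the centre pixel.
    The centre always qualifies, so every window has at least one value."""
    win = _windows(image, radius)
    centre = np.asarray(image, dtype=float)[..., None]
    masked = np.where(np.abs(win - centre) <= tolerance, win, np.nan)
    return np.nanmedian(masked, axis=-1)
```

On a step edge, the mean filter produces intermediate values next to the step, while both median variants reproduce it exactly. The restricted median, however, leaves an isolated noise spike untouched because none of its neighbours lies within the tolerance. This illustrates the point that such filters have no model of which small bright features are genuine.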

Zwiggelaar et al. used a directional recursive median filter to construct mam-

mographic feature descriptors [192] (see Section 3.8 and Chapter 4 for a more

detailed description of such descriptors).

Grey-level values can be manipulated via the Fourier domain. For example,

image smoothing can be used in an attempt to suppress noise by attenuating

high-frequency components [168]. However, methods that operate only in the

Fourier domain lack spatial information, and so important context may not be

available. Wavelets address this problem as they can be used to describe images

in terms of both space and frequency, and are commonly used in mammography.

Wavelet analysis was used by Qian et al. [147] to enhance microcalcifications by

selectively reconstructing a subset of the wavelet sub-band images. Compared

to Fourier methods, wavelets allow the characteristics of the signal(s) of interest

to be specified more precisely, in terms of both location and scale. Wavelets were used in place of ad hoc texture

features by Campanini et al. [35] and used to statistically model mammographic

texture by Sajda et al. [159] (see Section 3.7 for a more detailed discussion of

these methods).
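The following minimal sketch shows Fourier-domain smoothing with an ideal circular low-pass mask. The function name and the choice of a hard (ideal) cutoff are assumptions: an ideal mask causes ringing near edges, and a Gaussian mask is a gentler alternative. The mask acts identically at every image location, which is the lack of spatial context discussed above.

```python
import numpy as np

def fourier_lowpass(image, cutoff):
    """Zero all spatial frequencies whose radius exceeds `cutoff`
    (cycles per pixel, 0 < cutoff <= 0.5) and transform back."""
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(image.shape[1])[None, :]   # horizontal frequencies
    mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff
    return np.real(np.fft.ifft2(spectrum * mask))
```

For example, a pixel-scale checkerboard added to a constant background lies entirely at frequency (0.5, 0.5). It is therefore removed completely by a cutoff of 0.25, while the constant background passes through unchanged.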

In contrast to the frequency-based methods such as Fourier and wavelet analy-

sis, mathematical morphology analyses images based upon the shape of image

features. It can be used to remove image features of a given shape and size

(e.g. Dengler et al. applied it to microcalcification candidates [55]). A possible

problem with mathematical morphology is that a specification of shape is required:

image features that vary dramatically in shape may require very many

such specifications, which makes implementation cumbersome. A detailed discussion of

mathematical morphology can be found in Chapter 4.
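The following minimal sketch shows a grey-scale white top-hat: the image minus its morphological opening. The top-hat keeps bright features too small to contain the structuring element and removes larger ones. It assumes a flat square structuring element of side `2*radius + 1`, which is the "specification of shape" the text refers to. Elongated or irregular features would need further, for example oriented, elements. The function names are hypothetical.

```python
import numpy as np

def _neighbourhoods(image, radius):
    k = 2 * radius + 1
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="edge")
    return np.lib.stride_tricks.sliding_window_view(padded, (k, k))

def erode(image, radius):
    return _neighbourhoods(image, radius).min(axis=(-2, -1))

def dilate(image, radius):
    return _neighbourhoods(image, radius).max(axis=(-2, -1))

def white_top_hat(image, radius):
    """Image minus its opening (erosion followed by dilation)."""
    opened = dilate(erode(image, radius), radius)
    return np.asarray(image, dtype=float) - opened
```

With `radius=1`, a single bright pixel survives the top-hat with its full contrast, whereas a 5x5 bright block fits the structuring element and is suppressed entirely.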

Noise equalisation is important because machine learning systems are under-

pinned by statistical methods which often implicitly assume that the noise has

particular characteristics. By equalising the noise, the properties of the image

data are likely to be more closely matched to the assumptions made by the al-

gorithms that operate on that data. Image noise in digitised mammograms may

be considered to vary as a function of grey-level pixel value [106, 166]. Smith

et al. used a radiopaque step-wedge phantom to estimate this relationship in order

to correct the non-uniformity [166], but a phantom is likely to be a nuisance in a

screening environment. Karssemeijer and Veldkamp described noise equalisation

transforms where the noise is estimated from the image itself—rather than from a

radiological phantom—using the standard deviation of local contrasts [106, 180].

They demonstrated that equalising the noise in this way improved the

performance of detection algorithms, probably because the equalised data more closely

match the noise assumptions made by those algorithms.
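The following sketch illustrates image-based noise equalisation under stated assumptions; it is not Karssemeijer and Veldkamp's exact transform. Local contrast is taken as the pixel value minus its 3x3 mean. Its spread σ(g) is estimated per grey-level bin, and a look-up table T is built with slope proportional to 1/σ(g). T therefore stretches low-noise grey-levels and compresses high-noise ones. Two choices are assumptions that limit the influence of edges: a robust median-absolute-deviation estimate replaces the plain standard deviation, and a minimum bin count is enforced. All names are hypothetical.

```python
import numpy as np

def local_contrast(image):
    """Pixel value minus the mean of its 3x3 neighbourhood."""
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, 1, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return img - windows.mean(axis=(-2, -1))

def robust_std(values):
    """Median absolute deviation, scaled to match a Gaussian standard deviation."""
    return 1.4826 * np.median(np.abs(values - np.median(values)))

def noise_equalising_lut(image, n_bins=32, levels=256, min_count=10):
    img = np.asarray(image)
    contrast = local_contrast(img)
    edges = np.linspace(0, levels, n_bins + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    bin_index = np.clip(np.digitize(img, edges) - 1, 0, n_bins - 1)
    sigma = np.full(n_bins, np.nan)
    for b in range(n_bins):
        c = contrast[bin_index == b]
        if c.size >= min_count:
            sigma[b] = robust_std(c)
    ok = np.isfinite(sigma) & (sigma > 0)
    if not ok.any():
        raise ValueError("not enough pixels to estimate the noise")
    # Fill sparsely populated bins by interpolation, then expand to every grey-level.
    sigma = np.interp(centres, centres[ok], sigma[ok])
    sigma_per_level = np.interp(np.arange(levels), centres, sigma)
    # T(g) accumulates 1/sigma(g), so the noise becomes roughly uniform after mapping.
    lut = np.cumsum(1.0 / sigma_per_level)
    return lut / lut[-1] * (levels - 1)
```

The equalised image is then `lut[image]`. Conditioning on a grey-level bin distorts the contrast statistics somewhat when the noise is wider than a bin, so in practice the bin width would need some care.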

Highnam and Brady [88, 86] proposed a physics-based model of the mammo-

graphic image acquisition process to convert digital mammograms to an image

representation they call hint. In the hint representation pixel values represent the

thickness of the “interesting” (non-fat) tissue. The technique relies upon knowing

several parameters that describe the X-ray imaging process, such as the thickness

of the compressed breast, tube voltage, film type and exposure time. By modelling

the imaging process, the appearance under a set of “standard” imaging conditions

can be predicted, leading to the Standard Mammogram Form (SMF) [88]. It is

not always practical to measure the various imaging parameters during routine

screening, and radiologists do not train with such standardised mammograms. It

seems likely that working with mammograms where pixel values represent tangi-

ble quantities will lead to better detection algorithms, but digital mammograms

are not widely available in hint form. One of the goals of the eDiaMoND project

was to make such data available to researchers (see Section 3.12) [23].

The identification of curvilinear structures is useful in detecting and classifying

spiculated lesions (see Section 3.8). Cerneaz and Brady developed a physics-based

model of the expected attenuation of curvilinear structures

[39]. The authors assumed that such structures are elliptical in cross-section and

so would appear to have strong second derivative components in the image. The

second derivative was used to enhance candidate pixels and a skeletonisation algorithm

[168] was used in further processing. Physics-based models would have

to be extremely complex and specific to explain the appearance of mammograms

properly. It therefore seems likely that approaches based upon the image data itself

have more potential. Most research on digital mammography has used this latter

approach.
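The second-derivative idea can be sketched as follows. This is not Cerneaz and Brady's physics model, only the generic Hessian-based enhancement it motivates. The image is smoothed with a Gaussian of scale `sigma`, and second derivatives are taken by finite differences. Because a bright line has a strongly negative second derivative across its width, the sketch returns the negated smallest Hessian eigenvalue, clipped at zero. The scale and the function names are assumptions.

```python
import numpy as np

def gaussian_smooth(image, sigma):
    """Separable Gaussian smoothing with reflected borders."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def ridge_strength(image, sigma=1.5):
    """-min(Hessian eigenvalue), clipped at 0: large on bright curvilinear structures."""
    s = gaussian_smooth(image, sigma)
    gy, gx = np.gradient(s)            # derivatives along rows (y) and columns (x)
    gyy, gyx = np.gradient(gy)
    gxy, gxx = np.gradient(gx)
    off = 0.5 * (gxy + gyx)            # symmetrised mixed derivative
    half_trace = 0.5 * (gxx + gyy)
    disc = np.sqrt((0.5 * (gxx - gyy)) ** 2 + off ** 2)
    lam_min = half_trace - disc        # smaller eigenvalue of [[gxx, off], [off, gyy]]
    return np.maximum(-lam_min, 0.0)
```

Small bright blobs also produce negative second derivatives, so a measure like this responds to blobs as well as lines. This is why further processing, such as skeletonisation, or blob-suppressing detectors is needed.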

3.4 Breast segmentation

The identification of the breast border is a common task in digital mammog-

raphy and the development of reliable automatic methods is important. Such

information is required to limit the search for abnormalities to the breast area

(particularly when algorithms are computationally expensive), or so that some

form of breast shape analysis can be performed (see Chapter 9 for an example).

Locating the breast border is a non-trivial task because of variation both between

women and in the X-ray acquisition process.

Grey-level thresholding is a common approach to breast segmentation. Two

thresholds are generally sought. The first discards pixels with low grey-levels,

assuming them to belong to non-breast radiolucent objects (such as air). The

second discards pixels with high grey-levels, assuming them to belong to non-

breast radiopaque objects (such as film markers). The selection of these thresh-

olds is generally non-trivial, and other information such as shape is often also

used. Byng et al. determined these thresholds manually [33]. They can also be

determined by analysing the shape of the image histogram [42]. Although thresh-

olding can provide an initial estimate of the boundary, the approach is generally

confounded by features such as film markers, and much more sophisticated ap-

proaches that have some model of what the segmented image should look like are


generally used.
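The two-threshold approach can be sketched as follows. Pixels darker than `low` (air) and brighter than `high` (markers) are discarded, and only the largest 4-connected component of the remainder is kept. The connected-component step is an illustrative assumption that adds a very weak notion of shape. It is not part of the manual thresholding described by Byng et al., and the threshold values in the usage example are arbitrary. The function names are hypothetical.

```python
import numpy as np
from collections import deque

def largest_component(mask):
    """Boolean mask of the largest 4-connected True region (breadth-first search)."""
    labels = np.zeros(mask.shape, dtype=int)
    best_label, best_size, current = 0, 0, 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        current += 1
        labels[start] = current
        queue, size = deque([start]), 0
        while queue:
            r, c = queue.popleft()
            size += 1
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = current
                    queue.append((nr, nc))
        if size > best_size:
            best_label, best_size = current, size
    if best_size == 0:
        return np.zeros(mask.shape, dtype=bool)
    return labels == best_label

def segment_breast(image, low, high):
    img = np.asarray(image)
    candidate = (img > low) & (img < high)   # drop radiolucent and radiopaque pixels
    return largest_component(candidate)
```

A film marker brighter than `high` is excluded by the second threshold. An isolated speck that passes both thresholds is excluded only by the connectivity step. A marker lying across the breast edge would still defeat this scheme, which is why the text argues for richer models.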

Chandrasekhar and Attikiouzel analysed the shape of the cumulative grey-level

image histogram to identify a characteristic ‘knee’ which represents the bound-

ary between background and breast tissue [42]. Adaptive thresholding yielded an

initial segmentation which was then modelled by polynomials. This segmentation

was subtracted from the original image and the result was thresholded, resulting

in a binary image describing the breast and non-breast regions. Morphological

operations were used to remove artifacts arising from film scratches. An implementation

of Chandrasekhar and Attikiouzel’s algorithm was subjectively good enough to restrict

detection algorithms approximately to the breast region, but was not accurate enough

for the modelling of the shape of entire mammograms described in Chapter 9.

Lou et al. [121] quantised mammograms using k-means clustering and inspected

horizontal slices through the quantised images to determine the direction of a

decrease in pixel value. The direction was used to estimate the left-right orientation

of the breast. Pixel values on the skin-air border were found to lie in one

of three quantised pixel values. This information was used to generate an initial

estimate of the breast border. Actual mammogram pixel values were sampled

along normals to the initial estimate. Pixel values along normals to the breast

border will decrease from values associated with the edge of the breast to those

associated with the non-breast region. Linear models of pixel value as a func-

tion of distance along the normals were used to refine the estimate of the breast

border. A rule-based search was then used to further refine the breast border.

Finally, a B-spline was used to link and smooth the located breast border points.


The approach is sensible because the skin-air border should be relatively easy to

model. However, a common confounding feature is the placement of film markers.

These would pose an occlusion problem to methods that do not also use a model

of legal breast shape.

The active shape model (ASM) [48] has been used in a number of medical and non-

medical applications. An ASM models the statistical variation of shape associated

with a particular class of object and uses a statistical model of pixel values along

normals to the shape boundary to deform the model, within statistically plausible limits, to fit an object

in an image. Smith et al. used an ASM to locate the breast outline [165]. The

ASM can therefore be viewed as a generalisation of the approach proposed by

Lou et al. [121]. The two main problems with the ASM are that it does not use

all the image information in its search strategy and it requires an initialisation

that is already a good approximation to the final solution. The former was

rectified by the Active Appearance Model [47]. A better approach to breast

border segmentation might be to build a low resolution appearance model (similar

to that described in Chapter 9) and then search over the model parameters to find

those that best describe a low resolution version of the mammogram in question.

This would provide a low resolution estimate of the boundary. The estimate

could then be refined at high resolution using a model of the skin-air boundary

transition. Refinements could be propagated upwards to the low resolution model

where illegal (unlikely) refinements could be rejected.
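The notion of a "legal" shape used above can be made concrete with a minimal sketch of the shape-constraint part of an ASM (a point distribution model). Aligned landmark vectors are analysed by principal components analysis. A candidate shape is then projected onto the retained modes, and each mode parameter is clamped to ±3√λ, where λ is that mode's eigenvalue. The sketch assumes the training shapes are already aligned. It omits the grey-level profile model and the iterative search, and the function names and the 98% variance threshold are illustrative.

```python
import numpy as np

def build_shape_model(shapes, variance_kept=0.98):
    """shapes: (n_examples, 2 * n_points) array of aligned landmark coordinates."""
    X = np.asarray(shapes, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)            # ascending order
    order = np.argsort(evals)[::-1]
    evals = np.clip(evals[order], 0.0, None)
    evecs = evecs[:, order]
    t = int(np.searchsorted(np.cumsum(evals) / evals.sum(), variance_kept) + 1)
    return mean, evecs[:, :t], evals[:t]

def fit_legal_shape(x, mean, P, evals, limit=3.0):
    """Closest shape in the model whose parameters lie within +/- limit * sqrt(lambda)."""
    b = P.T @ (np.asarray(x, dtype=float) - mean)
    bound = limit * np.sqrt(evals)
    b = np.clip(b, -bound, bound)
    return mean + P @ b, b
```

The usage in the test trains on squares whose right-hand side stretches by a random amount. An exaggerated stretch is then pulled back to the most extreme stretch the model considers plausible.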

3.5 Breast density and risk estimation

Post-menopausal breast density is a strong risk factor for breast cancer [4]. Also,

because cancer develops from dense (glandular) tissue it may be masked in mam-

mograms by normal dense tissue. Automatic assessment of the density of breasts

and the risk associated with that density may be helpful to radiologists, particu-

larly as automated methods can provide stable independent measurements, while

there will be inherent variability in assessments made by humans.

Byng et al. proposed a simple interactive approach where users of their system

selected grey-level thresholds to segment the breast region and dense tissue [33].

The proportion of dense to total area was used as a measure of breast density. The

approach is reasonable because the mammographic brightness indicates density,

but it seems that a similar approach using the calibrated hint measure would

be more stable. Additionally, the manual selection of thresholds will introduce

variation between and within users; a fully automated system could avoid such

problems.

Taylor et al. investigated sorting mammograms into fatty and dense sets using

a multi-resolution non-overlapping tile-based method. A number of statistical

and texture measures, computed for each tile, were evaluated and local skewness

was found to discriminate best between the classes [175]. The reader is referred

to Section 3.15 for a discussion of ad hoc texture descriptors.
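The tile-based skewness measure can be sketched as follows, computing the sample skewness of grey-levels in non-overlapping square tiles at a single scale. A multi-resolution version would call it for several tile sizes. The treatment of partial tiles (discarded) and of constant tiles (skewness 0) are assumptions, and the function name is hypothetical.

```python
import numpy as np

def tile_skewness(image, tile):
    """Skewness of grey-levels in each non-overlapping tile x tile block.
    Partial blocks at the bottom/right edges are discarded."""
    img = np.asarray(image, dtype=float)
    h = (img.shape[0] // tile) * tile
    w = (img.shape[1] // tile) * tile
    blocks = img[:h, :w].reshape(h // tile, tile, w // tile, tile).swapaxes(1, 2)
    blocks = blocks.reshape(h // tile, w // tile, -1)      # one row of pixels per tile
    mu = blocks.mean(axis=-1, keepdims=True)
    sd = blocks.std(axis=-1)
    m3 = ((blocks - mu) ** 3).mean(axis=-1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sd > 0, m3 / sd ** 3, 0.0)
```

A tile that is mostly dark with a few bright pixels has positive skewness, and a mostly bright tile with a few dark pixels has negative skewness. This is the kind of asymmetry that can separate fatty from dense regions.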

Wolfe proposed that parenchymal patterns are related to breast cancer risk [185]

and developed a radiological lexicon for describing the dense and fatty char-

acteristics of mammograms, known as Wolfe grades. The relationship between

parenchymal pattern and breast cancer risk has been confirmed by Boyd et al. [22]

and van Gils et al. [178]. Tahoces et al. statistically modelled various texture de-

scriptors to predict Wolfe grades [173].

Caldwell et al. used fractal dimension—a measure of the complexity of a self-

affine object—with mammographic images (considered as surfaces) to measure

textural characteristics. They classified mammograms by Wolfe grade, based

upon average fractal dimension and the difference between that average and the

fractal dimension of a region near the nipple [34].

Karssemeijer divided the breast into regions, with the pixels in each region lying

at approximately the same distance from the skin line. Grey-level histograms were computed for

each region and the mean, standard deviation and skewness were used to classify

mammograms by Wolfe grade using a k-nearest neighbour classifier [107]. The

success of the method can probably be attributed to the statistical characterisa-

tion of the appearance of the mammograms.

Zhou et al. used a rule-based method that classified mammograms according to

prototypical characteristics in their grey-level histograms. This classification was

used to automatically select a threshold with which to segment the dense tissue.

The proportion of dense to total breast area was then computed [189]. Detecting

a well-understood feature in a 1-D function (the histogram) can be reasonably

easy, although the approach is dependent upon the stability of these histogram

characteristics.

A Gaussian mixture model of texture descriptors, learned using the Expectation-

Maximisation (EM) algorithm, was used by Zwiggelaar et al. to segment mammo-

grams into six tissue classes [193]. The area of dense tissue—as segmented by the

model—as a proportion of total area was used in a k-nearest neighbour framework

to classify mammograms into one of five density classes. Although learning the

distribution of texture features allows a principled statistical approach to be used,

it is not clear that the clustering produced by the EM algorithm would necessarily

correspond to a clustering that an expert might produce. Further, the EM algo-

rithm aims to find the best fit of a model of a probability density function to the

data, rather than to partition the data (as Algorithm 3 in Chapter 5 explains, in

the EM algorithm every data point belongs, with some probability, to every model component, so there

is no actual partitioning). Dedicated clustering methods might have been more

appropriate. Gaussian mixture models are discussed in some detail in Chapter 5

and a proof of the convergence property of the EM algorithm is presented in

Appendix A.
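The "soft" membership described above can be seen in a minimal one-dimensional sketch of EM for a Gaussian mixture. The E-step assigns every point a strictly positive responsibility for every component, and the M-step re-estimates weights, means and variances from those weighted memberships. The quantile-based initialisation, the small variance floor and the function name are assumptions made for a self-contained example. This is not Zwiggelaar et al.'s model of multi-dimensional texture descriptors.

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=200):
    x = np.asarray(x, dtype=float)
    # Deterministic initialisation: spread the means over the data quantiles.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities resp[n, j] = P(component j | x_n).
        dens = (weights * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
                / np.sqrt(2.0 * np.pi * variances))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update parameters from responsibility-weighted data.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-9
    return weights, means, variances, resp
```

A hard partition can be obtained afterwards, for example by taking `resp.argmax(axis=1)`, but EM itself never produces one.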

3.6 Microcalcification detection

Microcalcifications are tiny (approximately 500 µm) specks of calcium. A cluster

of microcalcifications can indicate the presence of an early cancer. Microcalci-

fications can sometimes be detected easily as they can be much brighter than

the surrounding tissue. However, small microcalcifications may appear to be

very similar to film or digitisation noise. Scratches on the mammographic film

can sometimes be mistaken for bright microcalcifications, particularly by auto-

mated methods. A mammogram containing an obvious microcalcification cluster


is shown in Figure 3.1.

Karssemeijer describes an iterative scheme for updating pixel labels, based upon

three local image descriptors (local contrast at two spatial resolutions and an es-

timate of local orientation). Pre-processing was used to achieve noise equalisation

using information from a radiological phantom. A Markov random field model

was used to model the spatial constraints between four pixel classes (background,

microcalcifications, lines or edges, and film emulsion errors) and a final labelling

was achieved via iteration [105]. Local methods are appropriate for individual

microcalcification detection because microcalcifications are small, but are inappropriate for

cluster detection. Detecting clusters of microcalcifications is important because

their form contains important information about the cause of the cluster (e.g. ma-

lignancy). In addition, it can be difficult to determine when Markov random field

models have converged.

Veldkamp et al. [179] classified microcalcification clusters as being malignant or

benign by estimating the likelihood of them being malignant. Individual micro-

calcifications were detected using Karssemeijer’s method. Discs were then centred

on each microcalcification and the boundaries of the intersection of the discs were

computed. Microcalcifications were clustered according to which boundary they

were located within. The procedure was performed for both mediolateral-oblique

and cranio-caudal views, and correspondences were determined between clusters

in each view. Features used for classification included the relative location of the

cluster in the breast, measures of calcification distribution within the cluster and

shape features. The likelihood of malignancy was computed as the ratio of the

number of malignant to benign neighbours in the k-nearest neighbourhood. The

approach is sensible because it acknowledges that it is the clusters that are

important, includes information about the form of clusters and delivers a statistical

measure of the likelihood of malignancy.

Figure 3.1: An example microcalcification cluster. The location of the microcalcification cluster is indicated by the red circle. The bottom left image shows a magnification of the cluster; the bottom right image shows a histogram equalised version of the magnified cluster. Source: The mammographic image analysis society digital mammogram database [171].
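The neighbour-ratio measure used by Veldkamp et al. can be sketched as follows. The sketch assumes each cluster has already been reduced to a feature vector, with features on comparable scales (for example, standardised). It uses plain Euclidean distance. Returning infinity when no benign neighbours are found is an arbitrary convention, and the function name is hypothetical.

```python
import numpy as np

def knn_malignancy_ratio(train_features, train_is_malignant, query, k=5):
    """Ratio of malignant to benign examples among the k nearest training
    clusters. Returns inf when all neighbours are malignant."""
    X = np.asarray(train_features, dtype=float)
    y = np.asarray(train_is_malignant, dtype=bool)
    d = np.linalg.norm(X - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(d, kind="stable")[:k]
    n_mal = int(y[nearest].sum())
    n_ben = len(nearest) - n_mal
    return np.inf if n_ben == 0 else n_mal / n_ben
```

A query lying among malignant training clusters scores high, while one lying among benign clusters scores zero. Intermediate values give the graded, statistical measure that makes the approach attractive.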

Bocchi et al. [18] designed a matched filter to enhance microcalcifications by as-

suming a Gaussian model of microcalcifications and a fractal model of mammo-

graphic background. A region growing algorithm was used to segment candidate

microcalcification clusters and to describe the location of each candidate micro-

calcification. An artificial neural network was used to discriminate between mi-

crocalcifications and artifacts of the filtering stage. Segmented regions were char-

acterised by fractal descriptors and these were used in a second artificial neural

network to identify true clusters. The underlying assumptions of the approach

(a Gaussian model of microcalcifications and a fractal model of mammographic

background) are reasonable but only approximate, and a more realistic model

of these image features might have improved the results.

False positive elimination was addressed by Ema et al. who used edge gradients

at signal-perimeter pixels to eliminate features such as noise or other artifacts

[61]. Zhang et al. used a “shift-invariant” artificial neural network to segment

candidate microcalcifications [188]. The size and “linearity” of candidate micro-

calcifications were analysed to reject false positives due to vessels. Both of these

methods implicitly attempt to model the neighbourhood around true microcalci-

fications and direct modelling of that neighbourhood—such as that described in

Chapter 6—might be more appropriate.

3.7 Masses

Masses are abnormal growths and may be malignant or benign. Masses may ap-

pear to be localised bright regions, but are often very similar in appearance to,

and may be obscured by, normal glandular tissue. Detection and discrimination

of masses can be difficult even for expert mammography radiologists. Malignant

masses are often characterised by linear features radiating from the mass, called

spicules, and we discuss methods for detecting and assessing spiculation in Sec-

tion 3.8. A mammogram containing an obvious circumscribed mass is shown in

Figure 3.2.

A common approach to the detection and classification of masses is to determine

candidate mass regions and then compute descriptors for the region designed

to allow discrimination between true and false detections. The questions that

research addresses are how candidate mass locations are found, which features

should be extracted and how they should be combined to yield a classification.

Karssemeijer and te Brake compared two methods for segmenting masses [177].

The first grew a region from a seed location, expanding the region if neighbour-

ing pixels were above a certain threshold. The region growing was repeated using

a number of thresholds and the “best” region was selected using a maximum

likelihood method that considered the distribution of pixel grey-levels inside and

outside the region. The second method was a dynamic contour defined by a set

of connected vertices, similar to the method proposed by Kass et al. [111]. The

vertices were accelerated towards the mass boundary using internal and external

forces. The internal forces served to encourage compactness and circularity of the

region, while the external forces served to encourage the boundary to converge on

strong image gradients. A damping force was used to promote convergence. The

authors report that the two methods produced segmentations that were similar

to those of radiologists. The segmentations produced by the dynamic contour

model allowed better discrimination between normal and abnormal regions when

geometric and texture features were used within an artificial neural network

classifier. Region growing methods generally only consider local neighbourhoods,

and so segmentations can have illegal shapes. Dynamic contour methods depend

upon the form of the forces used to constrain them. Equations relating the

image content to the force applied to the vertices tend to be ad hoc in nature, and

so it is easy for assumptions about the data to be implicitly included. It may

be more appropriate to learn the form of the constraining forces than to choose

them manually. Dynamic contour methods do not generally have any notion of

the range of legal shapes that they may take. This is a drawback when the

objects of interest have prototypical shapes (e.g. people’s hands), but is

appropriate for objects such as mammographic masses, whose shapes lack

typical structure (i.e. have very high variability).

Figure 3.2: An example circumscribed mass. The location of the mass is indicated by the red circle. The bottom left image shows a magnification of the mass; the bottom right image shows a histogram equalised version of the mass. Source: The mammographic image analysis society digital mammogram database [171].
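The region-growing method compared by Karssemeijer and te Brake can be sketched as follows, under stated assumptions. A region is grown from a seed over 4-connected pixels at or above a threshold, and this is repeated for several thresholds. The result with the highest maximised log-likelihood under a model with separate Gaussians for inside and outside pixels is kept. That two-Gaussian criterion is an illustrative stand-in for the maximum likelihood method described above, not its published form, and all names are hypothetical.

```python
import numpy as np
from collections import deque

def grow_region(image, seed, threshold):
    """4-connected pixels reachable from `seed` whose value is >= threshold."""
    img = np.asarray(image, dtype=float)
    region = np.zeros(img.shape, dtype=bool)
    if img[seed] < threshold:
        return region
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                    and not region[nr, nc] and img[nr, nc] >= threshold):
                region[nr, nc] = True
                queue.append((nr, nc))
    return region

def two_class_log_likelihood(image, region):
    """Maximised log-likelihood with one Gaussian inside and one outside the region."""
    img = np.asarray(image, dtype=float)
    total = 0.0
    for values in (img[region], img[~region]):
        if values.size < 2:
            return -np.inf
        var = values.var() + 1e-6
        total += -0.5 * values.size * (np.log(2 * np.pi * var) + 1.0)
    return total

def best_region(image, seed, thresholds):
    scored = [(two_class_log_likelihood(image, grow_region(image, seed, t)), t)
              for t in thresholds]
    _, t = max(scored)
    return grow_region(image, seed, t), t
```

Nothing in `grow_region` constrains the shape of the result: a bright vessel touching the mass would simply be absorbed. This is the weakness of purely local methods noted above.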

Haralick et al. used texture descriptors—spatial grey-level dependence (SGLD)

matrices (also called co-occurrence matrices)—to compute texture features [79].

The (i, j)-th element of an SGLD matrix $S_{d,\theta}$ describes the number of pixels in

the input image with grey-level i that have a pixel with grey-level j at a distance

of d in direction θ. Petrosian et al. and Chan et al. computed statistics from

these matrices to describe textural characteristics [141, 41]. These were used to

discriminate between textures associated with mass and non-mass regions. As the


number of grey-levels increases (i.e. as the number of bits used in the digitisation

increases), the size of the SGLD matrices grows with the square of the number of grey-levels. This leads to a problem similar

to the “curse of dimensionality” (described in Section 5.3), where a great deal of data

is required to estimate matrices that adequately characterise the texture. The

bit-depth of the images can be reduced to make the estimation tractable, but this

can lead to a poor description of the texture.
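The following minimal sketch builds a directed SGLD (co-occurrence) matrix following the definition above, together with one common statistic (contrast) and a bit-depth reduction helper. The displacement for (d, θ) is rounded to whole pixels, with θ measured from the horizontal and rows increasing downwards; both conventions are assumptions. The image must already contain integer grey-levels below `levels`. The function names are hypothetical.

```python
import numpy as np

def cooccurrence(image, d, theta_deg, levels):
    """S[i, j] = number of pixels with grey-level i whose neighbour at
    displacement (dy, dx) = round(d * (-sin theta, cos theta)) has grey-level j."""
    img = np.asarray(image)
    dy = int(round(-d * np.sin(np.deg2rad(theta_deg))))
    dx = int(round(d * np.cos(np.deg2rad(theta_deg))))
    h, w = img.shape
    r0, r1 = max(0, -dy), min(h, h - dy)       # rows whose neighbour is inside the image
    c0, c1 = max(0, -dx), min(w, w - dx)
    a = img[r0:r1, c0:c1]
    b = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    S = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(S, (a.ravel(), b.ravel()), 1)    # accumulate repeated (i, j) pairs
    return S

def contrast(S):
    """Sum of (i - j)^2 weighted by the normalised co-occurrence frequencies."""
    i, j = np.indices(S.shape)
    return float(((i - j) ** 2 * S).sum() / S.sum())

def requantise(image, from_bits, to_bits):
    """Reduce bit depth by discarding the least significant bits."""
    return np.asarray(image) >> (from_bits - to_bits)
```

The matrix has `levels**2` cells: 4096² (about 16.8 million) for 12-bit data, compared with 256 cells after requantisation to 4 bits. This illustrates why bit-depth reduction is tempting, and also how much grey-level detail it discards.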

Brzakovic et al. segmented mass candidates using a multi-scale fuzzy method. A

textural descriptor was used within a hierarchy of classifiers that used thresholds

and Bayesian methods to classify the candidates as malignant or benign [29].

Wavelets were used by Campanini et al. to detect malignant masses [35]. Wavelet

decompositions were computed on square windows extracted by “scanning” mam-

mograms over a range of scales. A support vector machine classifier was trained

on the wavelet coefficients to classify the windows as malignant masses or normal

regions. For a particular test mammogram, the initial output was a set of binary

images, one at each scale. A majority voting scheme was employed to produce a

final classification. Support vector machines have been shown to perform well in

high-dimensional spaces, and the authors rely on the ability of the learning system to

extract useful features from the full descriptions provided by the wavelet coeffi-

cients. This is reasonable because it removes the need to make explicit or implicit

assumptions about which image characteristics are appropriate to extract.

Sajda et al. present a generative statistical model of the appearance of mammo-

graphic regions of interest. Wavelet coefficients, computed from mammographic

patches, were statistically modelled using a tree-structured variant of a hidden

Markov model. In addition to being able to generate synthetic mammographic

textures and compress mammographic images, the model can be used in an ana-

lytical mode as an adjunct to a mass detection algorithm to reduce false positives.

Models were trained on mass and non-mass regions of interest and used to com-

pute likelihood ratios for test images [159, 169]. The method is discussed further

in Section 6.2.

3.8 Spiculated lesions

The margin of a mass contains information that radiologists can use to charac-

terise the mass. Margins can be described as circumscribed, obscured, lobulated,

indistinct or spiculated. Spiculations (also called stellate distortions) are curvi-

linear radial features and a strong sign of malignancy. Automated methods seek

to classify the mass margin using either features that describe properties of the

margin or by detecting and classifying spiculations directly. A mammogram con-

taining an obvious spiculated lesion is shown in Figure 3.3.

Scale-orientation pixel signatures (presented in detail in Chapter 4) corresponding to linear structures were

statistically modelled by Zwiggelaar and Marti [191]. The model was then used to

classify pixels as belonging to linear structures or not. Pixel signatures are a

type of texture feature and describe pixel neighbourhoods in terms of scale and

orientation. Signatures taken from blob-like features are dissimilar to those taken

from linear features. Modelling signatures from linear features is sensible as it

allows the presence of such structures to be analysed in a statistically meaningful

way.

Figure 3.3: An example spiculated lesion. The location of the spiculated lesion is indicated by the red circle. The bottom left image shows a magnification of the spiculated lesion; the bottom right image shows a histogram equalised version of the spiculated lesion. Source: The mammographic image analysis society digital mammogram database [171].

Zwiggelaar et al. compared several approaches to the detection of linear structures

[190]. Two variants of a line operator were investigated; the operator computes an

orientation and strength for each pixel from the mean pixel value along

oriented lines centred on the pixel in question. Karssemeijer’s method [110] and

a ridge detector designed to minimise the response to “blobs” were also used.

The line operators were found to perform best. An approach to spicule detec-

tion was investigated. Linear features were classified into their anatomical classes

on the basis of their cross-sectional profiles. Noise was reduced using principal

components analysis and classification was achieved by assuming Gaussian mod-

els of class conditional densities. However, in order to detect spiculated lesions,

a method would be required to integrate knowledge from the classified linear

structures.
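The line operator can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not Zwiggelaar et al.'s implementation: the function name `line_operator`, the default line length of 15 pixels, the 12 orientations and the subtraction of the mean over a surrounding square window (to give a background-corrected strength) are all choices made for this example.

```python
import numpy as np


def line_operator(image, length=15, n_orientations=12):
    """Return (strength, orientation) maps for a 2D greyscale image.

    For each pixel, the mean grey level is computed along straight lines of
    `length` pixels centred on the pixel, at `n_orientations` angles in [0, pi).
    Strength is the largest line mean minus the mean over a square window of
    the same size; orientation is the angle of the best line (radians,
    measured from the column axis towards increasing row index).
    """
    image = np.asarray(image, dtype=float)
    half = length // 2
    size = 2 * half + 1
    padded = np.pad(image, half, mode="reflect")
    rows, cols = image.shape

    def shifted(dy, dx):
        # Image values at offset (dy, dx) from every pixel.
        return padded[half + dy:half + dy + rows, half + dx:half + dx + cols]

    # Local background level: mean over the size x size square window.
    square = sum(shifted(dy, dx)
                 for dy in range(-half, half + 1)
                 for dx in range(-half, half + 1)) / size**2

    t = np.arange(-half, half + 1)
    angles = np.arange(n_orientations) * np.pi / n_orientations
    line_means = np.empty((n_orientations, rows, cols))
    for k, theta in enumerate(angles):
        # Nearest-pixel offsets along a line at angle theta.
        dys = np.rint(t * np.sin(theta)).astype(int)
        dxs = np.rint(t * np.cos(theta)).astype(int)
        line_means[k] = sum(shifted(dy, dx) for dy, dx in zip(dys, dxs)) / size

    strength = line_means.max(axis=0) - square
    orientation = angles[line_means.argmax(axis=0)]
    return strength, orientation
```

A bright horizontal line gives a large strength and an orientation of 0 at pixels on the line, while pixels far from any line give a strength near zero.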

For each pixel in turn, Karssemeijer computed statistics over a circular region centred on that pixel, describing the concentration and radial uniformity of moderately strong gradients pointing towards the centre of the region. A continuous multi-resolution scheme was used to match the feature extraction to the scale of the image features. These features were used in an artificial neural network to estimate the likelihood that a region was suspicious [108]. Karssemeijer also describes a method of computing the orientation and strength of linear features by combining the responses to three oriented Gaussian second derivative kernels [110]. While spicules do point in the general direction of the central mass, they are often curved, so a method that could determine that a number of curvilinear structures, rather than just pixels with particular gradients, “point” towards a given area might be more successful.
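Combining three oriented Gaussian second derivative kernels works because the second directional derivative at angle θ has the form a + b cos 2θ + c sin 2θ, so responses at three orientations determine the response at every angle. The NumPy sketch below illustrates this under stated assumptions: the basis angles of 0°, 60° and 120°, the scale σ = 2 pixels, the use of the most negative directional response as line strength and the helper names are illustrative choices, not Karssemeijer's published parameters. It also assumes the image is larger than the kernel in both dimensions.

```python
import numpy as np


def _gaussian_kernels(sigma):
    """Sampled 1D Gaussian and its first and second derivatives."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    g1 = -x / sigma**2 * g                       # first derivative
    g2 = (x**2 / sigma**4 - 1 / sigma**2) * g    # second derivative
    return g, g1, g2


def _separable(image, k_axis0, k_axis1):
    """Convolve along axis 0 (y) with k_axis0, then along axis 1 (x) with k_axis1."""
    tmp = np.apply_along_axis(np.convolve, 0, image, k_axis0, mode="same")
    return np.apply_along_axis(np.convolve, 1, tmp, k_axis1, mode="same")


def steered_line_strength(image, sigma=2.0):
    """Return (strength, orientation) of bright line-like structure."""
    image = np.asarray(image, dtype=float)
    g, g1, g2 = _gaussian_kernels(sigma)
    gxx = _separable(image, g, g2)    # second derivative along x
    gyy = _separable(image, g2, g)    # second derivative along y
    gxy = _separable(image, g1, g1)   # mixed derivative
    # Responses of second derivative kernels steered to 0, 60 and 120 degrees.
    thetas = np.array([0.0, np.pi / 3, 2 * np.pi / 3])
    r = [np.cos(t)**2 * gxx + 2 * np.sin(t) * np.cos(t) * gxy + np.sin(t)**2 * gyy
         for t in thetas]
    # Every directional response equals a + b cos(2θ) + c sin(2θ);
    # recover a, b and c from the three basis responses.
    a = sum(r) / 3
    b = 2 / 3 * sum(ri * np.cos(2 * t) for ri, t in zip(r, thetas))
    c = 2 / 3 * sum(ri * np.sin(2 * t) for ri, t in zip(r, thetas))
    # The minimum response over θ is a - sqrt(b² + c²); bright lines make it negative.
    strength = np.hypot(b, c) - a
    # The line runs perpendicular to the direction of minimum response
    # (angle measured from the x axis towards increasing row index).
    orientation = (0.5 * np.arctan2(c, b)) % np.pi
    return strength, orientation
```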

A linear discriminant was used by Mudigonda et al. to classify masses as malignant or benign using texture and gradient features extracted from SGLD matrices computed from “ribbons” around mass borders [132]. One difficulty with such a method is choosing the width of the ribbon around the border, because some spicules are quite short while others are relatively long.
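An SGLD matrix counts how often pairs of grey levels occur at a fixed displacement. The sketch below computes a symmetric, normalised SGLD matrix for one displacement, optionally restricted to a mask (for example, a ribbon of pixels around a mass border), together with two common texture statistics. The number of grey levels, the displacement, the masking rule and the choice of contrast and energy as features are assumptions for illustration; they are not the features used by Mudigonda et al.

```python
import numpy as np


def sgld_matrix(image, dy, dx, levels=8, mask=None):
    """Symmetric, normalised SGLD matrix for displacement (dy, dx).

    `image` holds integer grey levels in [0, levels). If `mask` is given,
    only pixel pairs whose two pixels both lie inside the mask are counted.
    """
    image = np.asarray(image, dtype=int)
    if mask is None:
        mask = np.ones(image.shape, dtype=bool)
    rows, cols = image.shape
    # Ranges for which both (r, c) and (r + dy, c + dx) are inside the image.
    r0, r1 = max(0, -dy), min(rows, rows - dy)
    c0, c1 = max(0, -dx), min(cols, cols - dx)
    a = image[r0:r1, c0:c1]
    b = image[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    valid = mask[r0:r1, c0:c1] & mask[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    counts = np.zeros((levels, levels))
    np.add.at(counts, (a[valid], b[valid]), 1)
    counts += counts.T            # count each pair in both directions
    return counts / counts.sum()


def texture_features(p):
    """Contrast and energy of a normalised SGLD matrix p."""
    i, j = np.indices(p.shape)
    return {"contrast": float(np.sum(p * (i - j) ** 2)),
            "energy": float(np.sum(p ** 2))}
```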

A spiculation descriptor was proposed by Huo et al. and evaluated for four types

of mass region [94]. Let I denote the pixels inside a segmented mass region and O

denote the pixels outside the segmented region but within the region of interest

containing the mass candidate. Four types of region were investigated: I, O,

O∪I and a region lying on the boundary of the segmented mass. The directions

of maximal gradient were computed for each pixel in the region and compared to

the direction defined by the line connecting the centre of gravity of the mass region

to the location of the pixel in question. Statistics computed from these measures

were used to describe the spiculation associated with the mass. The authors

report that features computed from O and O ∪ I provided better estimates of

the likelihood of malignancy than the other regions, but combining the measures

from all regions yielded the best performance. This method is similar to that of Karssemeijer [108], and the same observation applies: a method that could determine that a number of curvilinear structures, rather than just pixels with particular gradients, “point” towards a given area might be more successful.
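The comparison of gradient directions with radial directions can be sketched as below. The statistics returned (mean and gradient-weighted mean angle), the use of central differences via `np.gradient`, the unweighted centroid of the mass mask and the function name are assumptions for illustration; Huo et al.'s exact spiculation measure is not reproduced. Roughly, a smooth round mass has gradients aligned with the radial direction (angles near 0°), whereas gradients across radiating spicules are close to perpendicular to it (angles near 90°).

```python
import numpy as np


def radial_gradient_angles(image, mass_mask, region_mask):
    """Summarise gradient directions relative to the radial direction.

    For each pixel in `region_mask`, the angle (degrees, 0 to 90) between the
    grey-level gradient and the line from the centre of gravity of
    `mass_mask` to the pixel is computed; inward and outward gradients are
    treated alike. Returns the mean angle and the gradient-magnitude-weighted
    mean angle.
    """
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)          # derivatives along axis 0 and axis 1
    mr, mc = np.nonzero(mass_mask)
    cy, cx = mr.mean(), mc.mean()        # centre of gravity of the mass region
    yy, xx = np.nonzero(region_mask)
    ry, rx = yy - cy, xx - cx            # radial vectors to each region pixel
    g_y, g_x = gy[yy, xx], gx[yy, xx]
    mag, rad = np.hypot(g_x, g_y), np.hypot(rx, ry)
    ok = (mag > 1e-12) & (rad > 0)       # skip flat pixels and the centre
    cos = np.abs(g_x[ok] * rx[ok] + g_y[ok] * ry[ok]) / (mag[ok] * rad[ok])
    angles = np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))
    return angles.mean(), np.average(angles, weights=mag[ok])
```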

Evans et al. developed a statistical model of the characteristics of normal curvilin-

ear features [62]. A multi-scale method was used to enhance locally bright ridge-


like structures. Shape description features were computed for non-intersecting

curvilinear features which were then projected into a principal components space.

The distribution of points in the principal components space was modelled us-

ing a Gaussian mixture model. Such a method could be used within a novelty

detection scheme to detect abnormal curvilinear features.
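The modelling pipeline described above (principal components projection followed by a Gaussian mixture density) can be sketched as follows. The shape descriptors themselves are not reproduced: the sketch assumes a matrix `X` of descriptor vectors, one row per normal curvilinear feature. For brevity it fits a diagonal-covariance mixture by expectation maximisation; the covariance form, the number of components and all function names are assumptions, not details of Evans et al.'s method. In a novelty detection scheme, a feature projected with `(x - mean) @ comps.T` could be flagged as abnormal if its log-likelihood fell below a threshold chosen on normal data.

```python
import numpy as np


def fit_pca(X, n_components):
    """Mean and leading principal directions (as rows) of the data X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def _log_component_densities(Z, means, variances):
    """log N(z | mean_k, diag(var_k)) for every point z and component k."""
    diff = Z[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff**2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))


def fit_gmm(Z, k, n_iter=100, seed=0):
    """Fit a diagonal-covariance Gaussian mixture to Z by EM."""
    rng = np.random.default_rng(seed)
    n, _ = Z.shape
    means = Z[rng.choice(n, size=k, replace=False)]
    variances = np.tile(Z.var(axis=0), (k, 1)) + 1e-6
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        log_p = _log_component_densities(Z, means, variances) + np.log(weights)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ Z) / nk[:, None]
        diff2 = (Z[:, None, :] - means[None, :, :]) ** 2
        variances = np.einsum("nk,nkd->kd", resp, diff2) / nk[:, None] + 1e-6
    return weights, means, variances


def log_likelihood(Z, weights, means, variances):
    """Log density of each row of Z under the fitted mixture."""
    log_p = _log_component_densities(Z, means, variances) + np.log(weights)
    return np.logaddexp.reduce(log_p, axis=1)
```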

Sahiner et al. classified masses as malignant or benign using morphological and

textural features extracted from a region on the mass periphery. They used

an active contour (see [111, 177]) to segment mass candidates. Morphological

features (e.g. a Fourier descriptor, convexity and rectangularity measures) and

texture features extracted from SGLD matrices were used in a linear discriminant

classifier [158].
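A linear discriminant classifier of the kind used by Sahiner et al. and Mudigonda et al. can be illustrated with a two-class Fisher linear discriminant, one common formulation. This is a minimal sketch: the function name, the small ridge term and the midpoint threshold are assumptions, and the exact formulation and features used in those studies are not reproduced. In practice the discriminant score would typically be thresholded at many values to form an ROC curve rather than at a single midpoint.

```python
import numpy as np


def fit_fisher_lda(X, y):
    """Two-class Fisher linear discriminant.

    X: (n_samples, n_features) feature matrix; y: labels, 1 = malignant,
    0 = benign. Returns a weight vector w and threshold t; a case is called
    malignant if X @ w > t.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    x1, x0 = X[y == 1], X[y == 0]
    m1, m0 = x1.mean(axis=0), x0.mean(axis=0)
    # Pooled within-class scatter matrix.
    s_w = ((len(x1) - 1) * np.atleast_2d(np.cov(x1, rowvar=False))
           + (len(x0) - 1) * np.atleast_2d(np.cov(x0, rowvar=False)))
    # A small ridge term keeps the system solvable if s_w is near singular.
    w = np.linalg.solve(s_w + 1e-9 * np.eye(X.shape[1]), m1 - m0)
    t = 0.5 * (m1 + m0) @ w      # midpoint between the projected class means
    return w, t
```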

A wavelet decomposition was used by Liu et al. to detect and classify masses [120].

Orientation and magnitude features were extracted from each sub-band image and

used within a binary classification tree that processed the features in a coarse to

fine order according to the scale of the sub-band images. This allowed images

to be efficiently processed, as positive mass detections were propagated from

coarse levels, eliminating the need to process all pixels in all sub-band images.

Median filtering was used on the final response images to reduce false positives.

While classifiers such as the support vector machine are now more common than classification trees, the approach taken could allow definitely normal features to be ignored at little computational expense. However, ignoring some of the available evidence risks increasing both the false negative and the false positive rates.
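The efficiency argument can be illustrated with a coarse-to-fine search over a mean-pooled image pyramid, used here as a stand-in for the low-pass bands of a wavelet decomposition. Only the four children of pixels flagged at one level are examined at the next finer level. The per-pixel test `is_suspicious` is a hypothetical placeholder for Liu et al.'s classification tree on orientation and magnitude features, and the sketch assumes image dimensions divisible by 2 to the power of `levels`.

```python
import numpy as np


def pyramid(image, levels):
    """Mean-pooled pyramid; pyramid[0] is the full-resolution image."""
    out = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        im = out[-1]
        h, w = im.shape[0] // 2 * 2, im.shape[1] // 2 * 2
        out.append(im[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return out


def coarse_to_fine_detect(image, levels, is_suspicious):
    """Return flagged full-resolution pixels and the number of pixel tests.

    `is_suspicious(level_image, r, c)` is a hypothetical per-pixel test.
    """
    pyr = pyramid(image, levels)
    coarse = pyr[-1]
    candidates = [(r, c) for r in range(coarse.shape[0])
                  for c in range(coarse.shape[1])]
    tests = 0
    for level in range(levels, -1, -1):
        im = pyr[level]
        hits = []
        for r, c in candidates:
            if r < im.shape[0] and c < im.shape[1]:
                tests += 1
                if is_suspicious(im, r, c):
                    hits.append((r, c))
        if level == 0:
            return hits, tests
        # Propagate each detection to its four children at the next finer level.
        candidates = [(2 * r + dr, 2 * c + dc)
                      for r, c in hits for dr in (0, 1) for dc in (0, 1)]
```

For a 64 × 64 image containing a single bright 8 × 8 region, three pyramid levels lead to 148 pixel tests instead of 4 096, which shows the saving; the risk noted above is that a region missed at a coarse level is never revisited.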


3.9 Asymmetry

Radiologists typically view mammograms as pairs of left and right breasts and

use information in each to help understand the appearance of the other. An

abnormality that is detected as a result of a difference between a pair of mammo-

grams is called an asymmetry, although there is a distinction between a radiolog-

ical asymmetry and a mathematical asymmetry. All pairs of mammograms are

mathematically asymmetrical and this asymmetry may be quite marked while

still being considered normal. Few computer-aided detection algorithms include

asymmetry information and almost certainly suffer as a result.

Giger et al. used mathematical asymmetry to generate candidate mass locations

by registering pairs of breasts and performing bilateral subtraction. Geometric

and texture features were extracted and used within an artificial neural network.

The authors improved performance using temporal subtraction, which can be

considered as another form of asymmetry [71]. A potential problem with this approach is that the texture analysed is produced by bilateral subtraction after registration, and so depends on the registration. If the behaviour of the registration algorithm is unstable (e.g. if it performs differently on different types of breast), then the texture discrimination task would be confounded. There is a fundamental problem in the

assumption that dense correspondences can be obtained between a pair of mam-

mograms, because structure may be missing from one or both mammograms

(e.g. the pectoral muscle or nipple may not have been imaged).

Miller and Astley [129] note that mathematical asymmetry—typically obtained

via image registration and bilateral subtraction, which may introduce artificial


asymmetry—is not a good model of radiological asymmetry. They propose mea-

sures for three types of radiological asymmetry: shape, intensity and topology.

Radiologists annotated the dense regions of mammograms and correspondence

was assumed between the largest such regions in each pair of mammograms.

Bilateral differences in shape descriptors were used as shape asymmetry mea-

sures. The authors also used the minimum cost of “transporting” grey-levels

from one reduced-resolution mammogram to the other—using the transportation

algorithm [89]—as a measure of intensity asymmetry. Topological asymmetry

was measured using the difference between area and binary moments. A lin-

ear discriminant performed best when all three measures were combined. The

assumption that there is a correspondence between the largest annotated dense regions may not hold, because a dense region in one breast may correspond to two (or more) such regions in the other. In addition,

only considering the largest dense regions ignores the contribution to asymmetry

from the other regions. Asymmetry can be a subtle sign of abnormality, and so

applying the transportation algorithm at low resolution may miss the more subtle

asymmetries.
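For intuition about transportation cost, the one-dimensional case has a closed-form solution: the minimum cost of moving one normalised grey-level histogram onto another, when moving a unit of mass by one grey level costs one unit, is the sum of absolute differences between their cumulative distributions. The sketch below is a simplified, hypothetical variant applied to grey-level histograms only. It is not Miller and Astley's formulation, which transports grey levels between pixel locations in two images and requires solving a transportation programme.

```python
import numpy as np


def histogram_transport_cost(a, b, levels=256):
    """Minimum cost of transforming the grey-level histogram of a into that of b.

    a and b hold non-negative integer grey levels below `levels`. Moving a
    unit of probability mass by one grey level costs one unit (the 1D earth
    mover's distance).
    """
    ha = np.bincount(np.ravel(a), minlength=levels).astype(float)
    hb = np.bincount(np.ravel(b), minlength=levels).astype(float)
    ha /= ha.sum()
    hb /= hb.sum()
    # In 1D the optimal plan moves mass monotonically, so the cost is the
    # L1 distance between the cumulative distributions.
    return float(np.abs(np.cumsum(ha) - np.cumsum(hb)).sum())
```

Because all spatial information is discarded, two mammograms with identical histograms but very different structure would have zero cost under this variant, which is one reason a spatial formulation is preferable for asymmetry.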

Miller and Astley could only compute transportation cost for low-resolution mam-

mograms because the solution to the transportation programming problem scales

poorly with the number of pixels and computing power was limited when judged

by today’s standards [129]. Board et al. revisited the transportation problem

as an asymmetry measure and developed a multi-resolution transportation algo-

rithm where solutions at low resolutions constrain the problems at higher reso-

lutions, thus allowing only “plausible” transportations [17]. They used the mean


transportation cost per pixel to discriminate between normal and abnormal asym-

metries and the per-pixel transportation cost to localise asymmetries. While this

work addresses the problem that Miller and Astley faced, it is not clear what

transportation cost means in a statistical sense.

3.10 Clinical decision support

Clinical decision support refers to the use of computer technology to help clini-

cians make clinically relevant decisions. While computer-aided detection (CADe)

is concerned with fully-automatic methods that aim to draw the attention of ra-

diologists to abnormalities they may have missed or to act as substitute indepen-

dent second readers, clinical decision support—which may also be referred to as

computer-aided diagnosis (CADi)—is concerned with the independent evaluation

of clinical information to help clinicians reach diagnoses. The clinical information

is often provided by the radiologist, rather than being identified automatically by

the computer.

Even simple clinically significant information can improve the performance of

CADe systems. Kilday et al. included patient age with more conventional shape

and texture features [113]. The inclusion of age increased the area under the

ROC curve for the system from 0.72 to 0.82 (see Section 3.11 for background on

ROC analysis). However, care must be taken when constructing such systems

so that a priori information does not dominate other evidence (e.g. it would be

undesirable for a mammogram from a young woman with breast cancer to be

misclassified as normal on the basis that breast cancer is uncommon in that age


group).

Wu et al. trained an artificial neural network on features extracted from textbook cases (e.g. presence of a well-defined mass, presence of microcalcification, subtlety of distortions), each rated by radiologists on a 10-point scale. The trained system was

evaluated on clinical cases and the authors report that the system could discrim-

inate between malignant and benign cases more accurately than attending and

resident radiologists [187]. A similar approach was used by Floyd et al. [65].

D’Orsi et al. developed a reporting scheme where radiologists recorded either the

magnitude of a mammographic feature or a measure of their confidence in the

presence of the feature. Discriminant analysis was used to provide an estimate

of the likelihood of malignancy [58]. A significant problem with these approaches

is that the decision provided by the computer system is dependent upon human

input. There is likely to be both inter- and intra-user variation, so such systems

must be constructed to be robust to such error. In contrast to CADe systems, where it should be possible to quote a guaranteed minimum level of performance, no such guarantees can be made for CADi. In addition, clinicians would probably need to be trained to use such systems, and visual inspection and manual interaction would be required (i.e. a CADi system could not act as an independent second reader).

3.11 Evaluation of computer-based methods

Computer-aided mammography systems are generally designed to produce a mea-

sure which can be used to make a binary decision about the presence (condition


A) or absence (condition B) of some characteristic. Often, there is location

information associated with the measure. Example outputs of computer-aided

mammography systems are:

• A classification of a region of interest as malignant or benign.

• An estimate of the likelihood of malignancy in a mammogram.

• A pixel-wise classification of a mammogram into microcalcification and non-

microcalcification classes.

• A pixel-wise estimate of the likelihood of the presence of a malignant mass.

• A pixel-wise segmentation of a mammogram into several tissue classes.

A simple evaluation measure that can be used when binary classifications are

made is percent correct (e.g. ‘the system correctly detected 75% of the malignant masses’). This measure describes the proportion of true positives (TP)—correct

detections of condition A. However, the measure does not tell us the number of:

• false positives (FP)—incorrect detections of condition A;

• true negatives (TN)—correct detections of condition B;

• false negatives (FN)—incorrect detections of condition B.

Rather than being reported explicitly, these statistics are usually used to compute

sensitivity (the proportion of cases of condition A that are correctly identified)

and specificity (the proportion of cases of condition B that are correctly identified)


[140]. Formally, if $n_{TP}$ denotes the number of true positives, $n_{FP}$ denotes the number of false positives, $n_{TN}$ denotes the number of true negatives and $n_{FN}$ denotes the number of false negatives, then sensitivity and specificity are defined as:

$$\text{Sensitivity} = \frac{n_{TP}}{n_{TP} + n_{FN}} \tag{3.1}$$

$$\text{Specificity} = \frac{n_{TN}}{n_{TN} + n_{FP}} \tag{3.2}$$

A perfect detection or classification algorithm would have both sensitivity and

specificity equal to unity.
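Equations 3.1 and 3.2 translate directly into code. In this minimal sketch the function name and the representation of the inputs (paired lists of booleans, where True means condition A) are assumptions. The example in the test mirrors the ‘75% of malignant masses’ statement: sensitivity alone says nothing about the false positive that lowers specificity.

```python
def sensitivity_specificity(truth, decision):
    """Sensitivity and specificity from paired binary labels.

    truth[i] is True if case i has condition A; decision[i] is True if the
    system reports condition A for case i.
    """
    pairs = list(zip(truth, decision))
    n_tp = sum(t and d for t, d in pairs)
    n_fn = sum(t and not d for t, d in pairs)
    n_tn = sum(not t and not d for t, d in pairs)
    n_fp = sum(not t and d for t, d in pairs)
    # Equations 3.1 and 3.2.
    return n_tp / (n_tp + n_fn), n_tn / (n_tn + n_fp)
```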

When an algorithm produces a finer-grained measure of the presence or absence of the characteristic in question (e.g. on a continuous scale), richer descriptions of its performance can be produced. Sensitivity and specificity can be computed at each of a number of thresholds. These can be plotted on the unit square to form a receiver operating characteristic² (ROC) curve, with 1 − specificity plotted on the abscissa (i.e. the “x-axis”) and sensitivity plotted on the ordinate (i.e. the “y-axis”). The diagonal line defined by

$$\text{Sensitivity} = 1 - \text{Specificity} \tag{3.3}$$

(i.e. y = x) represents the performance of a random classifier. The ROC curve describes the trade-off between sensitivity and specificity when a particular threshold (an operating point) is selected to discriminate between the two classes. Ideally, the ROC curve would pass through the top-left corner of the unit square (sensitivity and specificity both equal to unity), and the area under the curve would be unity. The area under the ROC curve is commonly used to summarise the ROC curve, and is usually given the symbol Az. An example of a desirable ROC curve is shown in Figure 7.3 and an example of an undesirable ROC curve is shown in Figure 7.4.

²Receiver operating characteristic analysis is named after the RADAR receiver operators of the Second World War [46].
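An ROC curve and Az can be computed from continuous scores by sweeping the threshold over every distinct score. The sketch below is a minimal NumPy illustration; the function names and the convention that a case is called positive when its score is at or above the threshold are assumptions. When there are no tied scores, the trapezoidal area it computes equals the proportion of (condition A, condition B) pairs in which the condition A case receives the higher score.

```python
import numpy as np


def roc_curve(scores, labels):
    """ROC points (1 - specificity, sensitivity) for every distinct threshold.

    labels: 1 for condition A (positive), 0 for condition B. A case is
    called positive when its score is >= the threshold.
    """
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    # Start above every score (the point (0, 0)) and end at the lowest score (1, 1).
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    pos, neg = labels == 1, labels == 0
    fpr = np.array([np.mean(scores[neg] >= t) for t in thresholds])
    tpr = np.array([np.mean(scores[pos] >= t) for t in thresholds])
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal estimate of Az."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```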

Variants of ROC analysis were developed to allow localisation information to

form part of the analysis (e.g. FROC [30], LROC [170], AFROC [40]). The abscissa of an FROC curve shows the mean number of false positives per image, and the ordinate shows the corresponding proportion of abnormalities that are detected and correctly localised. FROC curves are highly sensitive to the criterion used to decide whether a detection is correctly localised. ROC and FROC analyses are commonly used

in the computer-aided mammography literature and the reader is directed to

Metz for a detailed exposition on experimental design issues and performance

evaluation in computer-aided mammography [126, 127].
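FROC operating points can be computed by sweeping a threshold over prompt scores. In this sketch the localisation criterion is assumed to have been applied already, so each prompt is recorded as a (score, lesion_id) pair whose lesion_id names the lesion it correctly localises, or is None for a false positive; this representation and the function name are hypothetical. Several prompts on the same lesion count as one detection.

```python
def froc_points(detections, n_images, n_lesions):
    """FROC operating points.

    detections: list of (score, lesion_id) pairs; lesion_id is None for a
    false positive. Returns lists of mean false positives per image and the
    corresponding proportion of lesions detected, one entry per threshold.
    """
    thresholds = sorted({s for s, _ in detections}, reverse=True)
    fp_per_image, sensitivity = [], []
    for t in thresholds:
        kept = [lesion for s, lesion in detections if s >= t]
        n_fp = sum(lesion is None for lesion in kept)
        # Several prompts on one lesion count as a single detection.
        found = {lesion for lesion in kept if lesion is not None}
        fp_per_image.append(n_fp / n_images)
        sensitivity.append(len(found) / n_lesions)
    return fp_per_image, sensitivity
```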

3.12 Image databases

Comparing published results is not meaningful unless one can be sure that the

same data and evaluation criteria were used. It is commonplace for authors to

evaluate their algorithms using data, obtained from radiologist colleagues, which

is not made available to other investigators. This is often perfectly justified,

for example when data with particular characteristics is required, or when ethical

approval or confidentiality agreements prohibit the dissemination of patient data.

In the majority of cases, however, the use of publicly-available data should be


preferred, to more easily allow results to be compared and experiments to be

replicated. Efforts were made to establish common datasets in the 1990s, when

several research groups compiled databases and made them available to other

investigators. Data was originally distributed via physical media (e.g. CD-ROM,

magnetic tape) or the Internet, but as persistent storage capacity has increased

and Internet connectivity is approaching ubiquity, the Internet has become the

dominant means of distributing data to investigators.

The UK Mammographic Image Analysis Society’s (MIAS) database [171, 128]

contains 161 pairs of MLO views. The database contains examples of normal

mammograms and common types of abnormality. The images were digitised with a pixel size of 50 µm and 8 bits per pixel. The images were obtained from a single UK screening centre, and the database includes all breast types (e.g. fatty, fatty-glandular, dense). Ground truth was annotated by a radiologist and consists of location coordinates and radii which specify regions containing abnormalities.

The authors say that the images were ‘carefully selected. . . to be of the high-

est quality of exposure and patient positioning’ ; most papers publicising digital

mammogram databases make similar claims. A reduced-resolution version of the

database—called the mini-MIAS database—is also available [130].

The University of South Florida’s Digital Database for Screening Mammography

(DDSM) [83] contains 2 620 cases with 4 films per case, taken from screening

examinations. The images were obtained from a number of sites (the University

of South Florida, Massachusetts General Hospital, Sandia National Laboratories

and Washington University School of Medicine). The images were compressed

using a lossless variant of the JPEG image format and software is provided to


decode data in this format. In addition to the image data, the database contains patient age, examination and digitisation dates, and American College of Radiology (ACR) and Breast Imaging Reporting and Data System (BI-RADS) annotations. The database is available via the Internet [52].

The Internet is the most suitable medium for advertising and distributing image

databases because it allows data to be accessed on demand and at low cost by

anyone in the world with a suitable Internet connection. A comparison of the

databases publicised in the literature with those advertised or made available via

the Internet reveals that several databases have not been adequately maintained.

These include: the Lawrence Livermore National Laboratory/University of Cal-

ifornia, San Francisco (LLNL/UCSF) database [123], the PRISM/PROMAM

database, the University of Chicago/University of North Carolina (Chapel Hill)

database [135] and the University of Washington database (although this appears

to have been included in the DDSM).

The UK Diagnostic Mammography National Database (eDiaMoND) project [23] was a research collaboration between academia, clinicians and industry that aimed to investigate the use of “grid” technologies to improve the efficiency of the NHS Breast Screening Programme by enabling access to image data through digitisation, and to aid training, epidemiology and computer-aided detection efforts. The project aimed to make data available to its users in both traditional and Standard Mammogram Form (SMF) formats [88]. However, it appears that blanket ethical approval to allow researchers to use the data for arbitrary research has not been obtained, and so there is currently no open access to the data, although ethical approval may be given to specific projects. The European MammoGrid project has aims similar to those of the eDiaMoND project [3].

Although the DDSM is recognised as being the premier database for computer-

aided mammography, the image data is compressed using a lossless variant of the

JPEG format, which is not widely supported. A second problem is the relatively

poor annotation. Mass regions are outlined, but only the areas containing micro-

calcifications and spicules are given—individual microcalcifications and spicules

are not annotated.

An “ideal” database of digital mammograms for computer-aided mammography

research would have some or all of the following characteristics:

• Ethical approval for, and patient consent to, all potentially useful research in which the database could be used.

• Safeguards to ensure patient confidentiality and anonymity.

• Grouping by patient, with current and prior cases, with four views per case.

• Enough cases that statistically significant results could be obtained.

• Patients should be sampled from several clinical centres.

• Mammograms should not be excluded on the basis of substandard image ac-

quisition (unless, perhaps, a radiologist would discard the mammogram and

ask for the patient to be recalled for better mammograms to be obtained).

• Representation of all classes of mammograms:

– Normal and abnormal cases.


– Inclusion of all clinically significant abnormalities (microcalcifications,

masses, spiculated lesions, architectural distortions and asymmetries).

– All types of breast (e.g. fatty, dense).

– Data should be collected from both asymptomatic and symptomatic

women.

• Pixel-level annotation by several radiologists, so that a ground-truth likelihood of abnormality could be estimated.

• Inclusion of clinical information relevant to breast cancer risk (e.g. patient

age, family history of breast cancer, socioeconomic status).

• Image acquisition and digitisation parameters.

• Identified subsets of abnormality (e.g. mass subset, microcalcification sub-

set), so that component algorithms could be tested separately.

• Specifications and implementations of a set of common evaluation strategies,

so that results in published work can be compared directly.

The “ideal” database described above would require significant resources to build and maintain; however, the lack of a database with these (or similar) characteristics, and the lack of standardised evaluation protocols, is an impediment to the field.

With the recent advent of web and “grid” services, it should be possible not only to provide mammographic image data via the Internet, but also to facilitate standardised evaluation of computer-aided mammography algorithms. Digitised mammograms could be requested from a data provider and analysed locally, and the algorithm output submitted to an evaluation service provider, which would return the evaluation results (e.g. ROC curve data). These results would be directly comparable to others generated by the same service provider.

3.13 Commercial systems

There have been several attempts to develop and market commercial computer-

aided mammography systems. The reader is directed to [72] for a detailed history

of commercialisation efforts. It is common for medical devices to be marketed

in the USA first, and doing so requires pre-market approval (PMA) from the

US Food and Drug Administration (FDA). For devices such as CADe systems,

PMA requires that the device does not significantly increase the callback rate

(especially for biopsies) and is capable of correctly identifying areas associated

with cancer. PMA is not concerned with value for money or the impact a device

has on work-flow. Instead, PMA is a certificate of safety, rather than of clinical

effectiveness or efficiency. FDA PMA is judged in terms of the mammography

landscape in the USA, which differs from that of the UK (e.g. in the USA the

age range of women undergoing screening is wider, the screening population is

self-referred and the screening interval is one year [72]). Claims made about

a CADe system with respect to FDA PMA do not automatically apply to the

UK. Nevertheless, we will restrict the discussion of commercial systems to those

which have obtained FDA PMA (although VuCOMP expects FDA approval for

its M-Vu system in 2005 [181]).

There are currently four commercial CADe systems for mammography that have


obtained FDA PMA: the ImageChecker by R2 Technology Incorporated [150],

Second Look by iCAD Incorporated [99], the KODAK Mammography CAD Sys-

tem by the Eastman Kodak Company [114] and the Senographe 2000D system

by the General Electric Company [69]. General Electric license the ImageChecker

software for their Senographe system. The KODAK Mammography CAD Sys-

tem has only recently been given FDA PMA (late 2004) and no evaluations of

the technology have been published in the literature. We will therefore restrict

our discussion to the ImageChecker and Second Look systems.

The ImageChecker system obtained PMA in 1998 and the FDA has since granted

PMA for several improvements to the system. The system uses algorithms devel-

oped by Nico Karssemeijer and collaborators and displays mass and microcalci-

fication prompts on a computer monitor. There has been extensive evaluation of

the system in the USA and Europe.

R2 Technology Incorporated claim that version 8.0 of their ImageChecker algo-

rithm achieves ‘1.5 false positive marks per normal case at the 91 percent sensitiv-

ity level’ [149]. The system costs approximately £108 000 and an annual service

contract costs approximately £10 000 (ca. 2001, [72]).

The reader is referred to [72] for a discussion of evaluations performed on the ImageChecker system, and to Section 3.14 for a discussion of evaluations of the ImageChecker system for prompting. Astley et al. compared the ImageChecker system to non-medical readers for pre-screening [9]. A total of 900 cases, each containing four films (10% of cases containing cancers), were read by 6 trained but non-medical readers and by the ImageChecker system. The ImageChecker failed to mark 3 of the cancers, while the non-medical readers failed to mark between 4 and 21 cancers.

The best non-medical readers had false positive rates of 33% and 44% while the

ImageChecker system had a false positive rate of around 69%. It took the non-

medical readers an average of 40 s to read a case, while the ImageChecker system

took an average of 318 s.

The Second Look system obtained PMA in 2002 and the FDA has since granted

PMA for several improvements to the system. The system uses algorithms devel-

oped by Steven Rogers, a retired airborne weapons specialist, and displays mass

and microcalcification cluster prompts on a paper printout [72]. The Second Look

700 system costs $139 950 [98] (approximately £72 870).

Astley et al. evaluated the Second Look system in a UK screening environment

[8]. They report that the false positive rate was 1.43 per image on normal mam-

mograms and 1.22 per image when averaged over both normal and abnormal

mammograms. The system could correctly identify 73.8% of abnormalities (rising

to 83.3% when both MLO and CC views were available). In a simulation of clinical use, 790 cases were read by 3 radiologists and 1 radiographer, with and without prompting. No significant differences in recall rate

or reading time were found (the radiographer was faster with prompting, while

the radiologists took longer). The authors note that technical problems with

the system (e.g. reduced throughput due to problems with stick-on film labels

and failures caused by static electricity in the reading room) would require the

employment of an additional administrator and would delay reading by a day.

An evaluation of the Second Look system’s ability to detect early cancers was

performed. Current and prior films were studied from a normal control group


and from a group for whom cancer was identified in the current films. The radi-

ologists were asked to identify cancers in the prior films without and then with

the current films. The radiologists identified 10% and then 14.4% of the cancers.

The Second Look system identified 27.8% of the cancers in the prior films. This

suggests that, on early cancers, the system can perform better than radiologists.

3.14 Prompting

The prompting model for computer-aided mammography is predicated on the

assumption that prompts will help radiologists. Research into prompting seeks

to determine if, and under what circumstances, this assumption is valid. There

are essentially two types of prompting research:

• The psychophysical aspects of prompting. Participants generally perform

image interpretation tasks in synthetic environments.

• Evaluation of radiologist performance when CADe systems are used.

Hutt et al. investigated the effect of erroneous prompts on radiologist perfor-

mance. Seven radiologists viewed 48 digitised mammographic regions of interest

with and without microcalcification clusters. Prompts were placed on the images

and the error rate was varied. The authors report that prompting was only ef-

fective when the false positive rate was low (approximately 0.5 false prompts per

image) [96]. A screening environment was simulated and 6 radiologists viewed

100 films containing normal and abnormal mammograms with single or mul-

tiple abnormalities. The mammograms were read with and without prompts,


with the false prompt rate set at approximately 1.1 per image. The radiologists

performed better in the prompted condition. In prompted cases where the radi-

ologists missed abnormalities, the films had no prompt on the real abnormality

and a false prompt elsewhere. This work suggests that not only is the false positive rate important, but incorrect prompts can also distract radiologists from real abnormalities.

Hutt’s PhD thesis presents a larger version of the experiment reported by Hutt

et al. [96]. Prompted and unprompted mammograms were read by 30 radiolo-

gists from 11 UK screening centres. The results suggest that prompting can be

expected to be successful if the number of false positives does not exceed the

number of true positives by more than 50%. Hutt suggests that, given the over-representation of abnormal mammograms in the test set, this relationship should be revised downwards to a true-to-false positive ratio of approximately unity.

This ratio was confirmed by Astley et al. in a psychophysical experiment that

used simulated abnormalities and non-medical readers [7]. Given that only 5% of screening mammograms have any form of abnormality, a prompting system that generates true and false positives in equal numbers will, on average, generate a false positive no more than once in 20 cases (assuming at most one true prompt per abnormal case). If we assume 4 images per case, then this equates to 1 false positive in 80 images (i.e. 0.0125 false positives per

image). By comparison, R2 Technology Incorporated claim that version 8.0 of

their ImageChecker algorithm achieves ‘1.5 false positive marks per normal case

at the 91 percent sensitivity level’ [149]. However, it should be noted that the

psychophysical experiments reported by Hutt et al. were not conducted in clinical

settings and relatively few radiologists and images were used, which limits the extent to which their results can be generalised.
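The arithmetic behind this comparison can be set out explicitly. Writing $p = 0.05$ for the proportion of abnormal screening cases, and assuming (as above) a true-to-false positive ratio of about unity, at most one true prompt per abnormal case and four images per case:

$$
\frac{\text{FP}}{\text{case}} \le \frac{\text{TP}}{\text{case}} \le p = \frac{1}{20},
\qquad
\frac{\text{FP}}{\text{image}} \le \frac{1}{20 \times 4} = 0.0125 .
$$

Treating the quoted 1.5 false positive marks per normal case as roughly 1.5 per case (about 95% of screening cases are normal) gives $1.5 / 4 \approx 0.375$ false positives per image, roughly $0.375 / 0.0125 = 30$ times the level suggested by the psychophysical results.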

Giger et al. evaluated the usefulness of an “intelligent search” mammography

workstation [70]. Upon presentation of an unknown case, the workstation output

an estimate of malignancy (based upon an automatic segmentation algorithm and

an artificial neural network using geometric and texture features), images—from

an atlas—of lesions that were deemed to be similar and graphics illustrating the

characteristics of the presented lesion relative to those in the atlas. The user

could search for similar lesions using various criteria. Users could interactively

alter the image contrast and magnify the mammograms. A set of 50 normal and 50 mass images was viewed by 5 radiologists with and without the workstation.

The authors report an improvement when the workstation was used (Az of 0.90

with the workstation compared to 0.86 without). This work suggests that allowing

radiologists to manipulate the digital images and compare them to other cases can

improve radiologist performance, but the paper does not analyse which aspects

of the system were most effective.

Karssemeijer et al. investigated single and double reading by radiologists and

single reading with prompting [109]. Ten expert radiologists read 500 cases,

half of which contained cancers, and estimated the likelihood of malignancy. The

images were also analysed using the ImageChecker system and the suspiciousness

rating of each prompt was recorded. Double reading was simulated by combin-

ing annotations from each possible pairing of the 10 radiologists using a prompt

proximity rule. Reading with CADe was simulated using a similar approach. The

performance of the three types of reading was assessed using the mean sensitiv-

ity in the region of the ROC curve representing false positive rates lower than


10%. This figure was chosen as the false positive rate of screening in the USA is

approximately 8% and is between 1% and 4% in Europe. For single reading, the

mean sensitivity was 39.4%. For simulated double reading, the mean sensitivity

was 49.9%. For simulated reading with CADe the mean sensitivity was 46.4%.
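The "mean sensitivity for false positive rates below 10%" can be read as the average height of the ROC curve over that interval. The sketch below makes two assumptions: it uses an empirical ROC curve, and it interpolates linearly between operating points. The estimator used in [109] may differ, and the ratings are synthetic.

```python
import numpy as np


def empirical_roc(pos_scores, neg_scores):
    """Empirical ROC operating points (FPR, TPR), sweeping the threshold from high to low."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    thresholds = np.unique(np.concatenate([pos, neg]))[::-1]
    fpr = [0.0] + [float(np.mean(neg >= t)) for t in thresholds]
    tpr = [0.0] + [float(np.mean(pos >= t)) for t in thresholds]
    return np.array(fpr), np.array(tpr)


def mean_sensitivity(pos_scores, neg_scores, max_fpr=0.10, n_grid=1001):
    """Average TPR over FPR in [0, max_fpr], interpolating linearly between ROC points."""
    fpr, tpr = empirical_roc(pos_scores, neg_scores)
    # Where several points share an FPR, keep the highest TPR (top of the vertical step).
    ufpr = np.unique(fpr)
    utpr = np.array([tpr[fpr == f].max() for f in ufpr])
    grid = np.linspace(0.0, max_fpr, n_grid)
    return float(np.interp(grid, ufpr, utpr).mean())


rng = np.random.default_rng(0)
abnormal = rng.normal(1.5, 1.0, 250)  # hypothetical ratings for 250 cancer cases
normal = rng.normal(0.0, 1.0, 250)    # hypothetical ratings for 250 normal cases
print(f"mean sensitivity for FPR < 10%: {mean_sensitivity(abnormal, normal):.3f}")
```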

Gur et al. prospectively assessed the impact of CADe on patient recall and cancer

detection rates in a clinical setting [76]. A set of 115 571 mammograms was

divided into two almost equal sets which were read by 24 radiologists with and

without prompts generated by the ImageChecker system. No significant increase

in recall or detection rates was found when CADe was used. However, the

confidence intervals associated with recall and detection rates were large enough

to be consistent with the possibility of large improvements when CADe is used,

due to the relatively low number of cancers detected with and without CADe and

the large inter-reader variability among the radiologists. Additionally, during the

period of the study, the percentage of women who were screened for the first

time decreased from 40% to 30%. On average, first screening rounds have higher

recall rates than subsequent rounds and so cancers detected in first rounds may

be considered “easier”. However, the authors found there to be no statistically

significant trend in detection rates over time. The authors conclude that, if their

results were not due to chance, current CADe systems are not suitable for use by

expert screening mammography radiologists.

Freer and Ulissey conducted a large prospective study of the effect of CADe on

recall rate, positive predictive value for biopsy, cancer detection rate and the stage

of detected cancers [67]. A total of 12 860 screening mammograms were interpreted first

without the assistance of CADe, and then immediately after with the assistance

of the ImageChecker CADe system. The authors report that use of the CADe

system for prompting resulted in an increase in recall rate (from 6.5% to 7.7%),

no change in positive predictive value for biopsy, an increase of 19.5% in the

number of cancers detected and an increase in the proportion of early-stage cancers

detected (from 73% to 78%). However, the authors caution that the relatively

low median age of the screening population (49 years) imposes limitations on the

statistical significance of the above observations.

Warren Burhenne et al. retrospectively studied the ability of the ImageChecker

system to identify cancers missed by radiologists [32]. A total of 1 083 mammograms that

led to biopsy-proved cancers and their available prior mammograms were collected

from 13 centres. The CADe system was able to identify 77% of the cancers that

were originally missed by radiologists, without a statistically significant increase

in recall rate. The research suggests that the ImageChecker system could have a

dramatic effect on the early detection of breast cancer.

3.15 Discussion

One of the earliest papers on computer-aided mammography was written by

Ackerman and Gose [1] in 1972. The authors aimed to classify low-resolution

digitised photographs of regions of mammograms as malignant or benign using

automatically-extracted features (measures of calcification, spiculation, rough-

ness and the area-to-perimeter ratio). Classification was attempted using a mul-

tivariate Gaussian model and nearest neighbour classification, the latter of which

was found to perform best. While computers, image digitisation technology and


machine learning algorithms have developed significantly since the paper was

published, the approach to computer-aided mammography has not.

This approach can be stated as follows: ad hoc features are extracted from seg-

mented regions and classified into clinically significant classes. The classification

stage is informed by the wider pattern recognition, machine learning and statis-

tical decision theory communities. The segmentation step typically uses ad hoc

algorithms. Often, an attempt to insert human expertise into the system is made

by choosing features that describe characteristics that radiologists report to be

important. Whilst the above approach is usually reported to be successful, ad

hoc methods risk accidentally adopting assumptions about the data. As a result,

CADe methods may perform well on the original investigator’s data but less well

on other data.

Many methods that use classifiers produce “probability” images, which are later

thresholded to obtain a final classification. These images are typically not true

probability or likelihood images and are simply images with probability-like val-

ues. This distinction is important, because accurate quantitative descriptions

may be more useful to clinicians than qualitative (classification) descriptions,

and could be used in further statistical analyses. Measurements that describe

the state of, or change in, anatomy may also be clinically useful. For example, it is

probably more meaningful to report that a tumour in a mammogram has likely

increased in volume by 20% since the last screening session, than to simply say

that an area is suspicious.

Evaluation criteria are often optimistic. In LROC analysis, for example, the


selection of forgiving localisation criteria can give an inaccurate assessment of the

performance of algorithms. This could be rectified by the adoption of standard

databases and assessment criteria. This would have the additional benefit of

allowing meaningful comparison of results in the literature.

The research of Hutt et al. suggested that CADe would only result in a significant

improvement in radiologist performance if the number of false positives could be

reduced to 0.0125 per image. Research has indicated that radiologist performance

can be improved by CADe algorithms that have false prompt rates substantially

higher than that target [67, 32]. However, it has been shown that current commer-

cial CADe systems can fail to improve radiologist performance [76], so lowering

the false positive rate to a level at which significant improvement can be expected

is highly desirable. Reducing the false positive rate while maintaining sensitivity

will be a significant challenge. The hypothesis promoted in this thesis is that

this kind of improvement can only be achieved by systems that understand the

appearance of mammograms.

Abnormality in mammograms manifests itself in a number of ways, but most

CADe methods target only one of these classes of abnormality; microcalcification

clusters, masses and spiculated lesions are most commonly chosen. A better ap-

proach would be to develop a single method that can identify all (or many) of

the common types of abnormality. It is not immediately clear how this might

be achieved, because the appearances of the various forms of abnormality are so

different. However, there is commonality between all types of abnormal mammo-

graphic appearance: none of them are found in normal mammograms. A method

that could detect deviation from normality should be able to identify all forms of


abnormality. This approach is called novelty detection.

Novelty detection uses a model of the class of interest that allows novel instances

to be identified. Statistical models serve this purpose well, because deviation

from normality can be measured in a meaningful way within a rigorous mathe-

matical framework. Further, generative statistical models—such as active shape

and appearance models [47]—allow synthetic instances of the class of interest

to be generated. This allows the specificity and generality of the model to be

assessed. A specific model is one that models only legal instances of the class

of interest and a general model is one that models all possible instances of the

class of interest. A good model would be both specific and general. The aim of

the work presented in this thesis is to investigate generative statistical models of

mammographic appearance. The ultimate aim is to perform CADe by novelty

detection.
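The sketch below makes "generative" and "novelty detection" concrete with a deliberately simple model: a single multivariate Gaussian fitted to feature vectors from normal examples. It is far simpler than the active shape and appearance models cited above, and the class name, ridge term and synthetic data are our own illustrative choices. It does, however, show both uses of one model: drawing synthetic instances, and scoring how far a new vector lies from normality.

```python
import numpy as np


class GaussianModel:
    """A single multivariate Gaussian fitted to feature vectors from the 'normal' class.

    It is generative (synthetic instances can be drawn from it) and it supports
    novelty detection (new vectors can be scored by their distance from normality).
    """

    def fit(self, X, ridge=1e-6):
        X = np.asarray(X, dtype=float)
        self.mean = X.mean(axis=0)
        # A small ridge keeps the covariance invertible when samples are few.
        self.cov = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
        self.precision = np.linalg.inv(self.cov)
        return self

    def sample(self, n, rng):
        """Draw n synthetic instances from the fitted model."""
        return rng.multivariate_normal(self.mean, self.cov, size=n)

    def novelty(self, X):
        """Squared Mahalanobis distance from the mean; large values suggest abnormality."""
        D = np.atleast_2d(np.asarray(X, dtype=float)) - self.mean
        return np.einsum("ij,jk,ik->i", D, self.precision, D)


rng = np.random.default_rng(0)
normal_vectors = rng.normal(size=(500, 2))   # stand-in for features of normal tissue
model = GaussianModel().fit(normal_vectors)
synthetic = model.sample(5, rng)             # generated "normal" instances
scores = model.novelty([[0.0, 0.0], [6.0, 6.0]])
print(scores)  # the distant point receives a much larger novelty score
```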

Novelty detection has previously been applied to computer-aided mammography.

Tarassenko et al. identified masses using a novelty detection method. Geometrical

and textural features were extracted from pre-processed mammograms. A Parzen

window density estimator (see Section 5.3) was used to model the distribution of

feature vectors extracted from normal tissue. The method identified all masses

in a test set of 40 images at a false positive rate of 1 per image [174]. Holmes

used an adaptive kernel density estimator to learn the distribution of transformed

scale-orientation pixel signatures taken from normal tissue (see Chapter 4 for a

detailed discussion of pixel signatures). The transformation to a low-dimensional

space allowed Euclidean distance to approximate a sophisticated robust metric.

Holmes performed novelty detection by computing the likelihood of signatures


under the model to produce likelihood images. Subjectively, the likelihood values

appeared to allow pixels belonging to normal tissue to be discriminated from those

belonging to spiculated lesions, though no quantitative evaluation of the method

was performed [90]. However, neither of these methods employed generative

models.
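Tarassenko et al. modelled feature vectors from normal tissue with a Parzen window estimator and flagged low-density vectors as novel. The sketch below applies that idea to synthetic 2-D features. Two choices are illustrative assumptions: the fixed Gaussian kernel width h, and the percentile-based threshold. Holmes's adaptive estimator, which varies the kernel width, is not shown.

```python
import numpy as np


def parzen_log_density(train, query, h):
    """Log of a Gaussian Parzen-window density estimate with isotropic kernel width h.

    train: (n, d) feature vectors from normal tissue; query: (m, d) vectors to score.
    """
    train = np.asarray(train, dtype=float)
    query = np.atleast_2d(np.asarray(query, dtype=float))
    d = train.shape[1]
    sq = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)  # (m, n) squared distances
    log_k = -sq / (2 * h**2) - 0.5 * d * np.log(2 * np.pi * h**2)     # log Gaussian kernel
    top = log_k.max(axis=1, keepdims=True)                             # stable log-mean-exp
    return top[:, 0] + np.log(np.exp(log_k - top).mean(axis=1))


rng = np.random.default_rng(0)
normal_features = rng.normal(size=(300, 2))  # stand-in for normal-tissue feature vectors
h = 0.3                                      # kernel width (illustrative choice)

# Threshold: the 1st percentile of the training vectors' own log densities (an assumption).
threshold = np.percentile(parzen_log_density(normal_features, normal_features, h), 1)

queries = np.array([[0.0, 0.0], [5.0, 5.0]])
novel = parzen_log_density(normal_features, queries, h) < threshold
print(novel)  # the origin is typical; (5, 5) is flagged as novel
```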

3.16 Summary

This chapter presented a review of the computer-aided mammography literature.

In summary:

• The typical approach taken to CADe is to classify shape and texture fea-

tures, extracted from candidate locations, into clinically significant classes.

It can be difficult to justify exactly why one set of features is better than

another and to explain what they correspond to in terms of the clinical

situation. Features are typically tuned to a specific sign of abnormality, so

each indicative sign requires a different algorithm.

• The lack of standardised evaluation methods, training and test sets makes

it very difficult to compare published results.

• Commercial systems are available and have been shown to improve radi-

ologist performance; however, they can also fail to improve performance.

Psychophysical research has suggested that a false positive rate much lower

than that achieved by current commercial systems is required for signif-

icant improvement in radiologist performance. Much more sophisticated


approaches may be required to achieve such targets.

• One such approach may be novelty detection, in which all forms of abnormality

could, in principle, be detected and quantified within a rigorous mathematical

framework. Novelty detection requires a model of the appearance of normal

mammograms that allows deviation from normality to be measured.

Chapter 4

Scale-orientation pixel signatures

4.1 Introduction

This chapter presents some work on improving an existing method for describing

local image structure in terms of scale and orientation. The chapter presents:

• Background information on mathematical morphology and its use in com-

puting scale-orientation pixel signatures.

• An analysis which identifies two flaws in an existing implementation and

proposes how these problems can be rectified.

• An information theoretic method for comparing the old and new pixel sig-

natures.

• A classification experiment to compare the two approaches.


4.2 Mathematical morphology

Image and signal processing has commonly been framed in terms of frequency (e.g. Fourier analysis, or wavelet analysis, which uses positional as well as frequency information [49, 122]). Mathematical morphology instead approaches image and signal processing in terms of shape¹. One of the attractions of morphological processing is that image features can be targeted for processing without

altering the rest of the image (e.g. small features can be removed from images,

leaving edges and grey-levels untouched). We will present two fundamental mor-

phological operators and show how they can be combined to perform two other

classes of morphological operation.

Morphological operators can be defined for simple 1-D signals, 2-D images or

more complex signals. We will restrict our discussion to the 2-D image plane.

The operators we shall discuss are binary operators, meaning that they take two

input objects and return a single output. One of these input objects is the image

matrix to be processed and the other is an object called a structuring element,

which allows the operations to be tuned to specific sizes and shapes of feature. A

structuring element is simply a shape and can be represented by a set of vectors

that specify offsets from some origin². The structuring element can be visualised

by plotting each offset in the set in an image plane. A simple structuring element

¹ A thorough presentation is given by Serra and Matheron [161, 162, 124], though the reader is directed to Sonka et al. [168] for an introduction to mathematical morphology as it relates to the work here.

² Although the grey-scale definitions of the following operators can use structuring elements that have associated grey-levels, this is not of interest in this work.


is shown in Figure 4.1(b) and corresponds to the following point set³:

$$S = \{(0, 0), (0, 1)\}. \tag{4.1}$$

4.2.1 Dilation and erosion

Dilation and erosion are the fundamental morphological operators. Let f(x) be a function that describes a grey-level image. Further, let $t_i$ be an offset and $S = \{t_i : i = 1, \ldots, N\}$ be a structuring element as described above. The dilation of f(x) by S is given by:

$$f(x) \oplus S = \max_{t \in S} \{ f(x - t) \}. \tag{4.2}$$

An example is given in Figure 4.1. Figure 4.1(a) shows a binary image of the

letter E, where the background has a value of zero and the foreground has a value

of one. Figure 4.1(b) shows the structuring element defined by Equation 4.1.

Figure 4.1(c) shows the dilation of the image of the letter E by the structuring

element. The figure illustrates how dilation removes intensity troughs that are

smaller than the structuring element. Dilation increases the object size and can

be used to fill gaps.

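The dual of dilation is erosion, which is defined as:

$$f(x) \ominus S = \min_{t \in S} \{ f(x + t) \}. \tag{4.3}$$

The offset enters with the opposite sign to Equation 4.2, which makes erosion the adjoint of dilation; the distinction vanishes for structuring elements that are symmetric about the origin.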

³ Note that we use (r, c) (row, column) indexing, as opposed to (x, y) indexing.


Figure 4.1: Dilation. A binary image matrix is shown in (a). It is dilated by the structuring element shown in (b). The result is shown in (c).

Erosion removes intensity peaks that are smaller than the structuring element.

4.2.2 Opening and closing

Dilation and erosion can be used to remove image features, but they change the

global appearance of the image (the object in Figure 4.1 is made larger by dilation

and would be made smaller by erosion). Dilation and erosion can be combined

so that targeted features are removed without changing the global appearance of

the image. These combinations are called the opening and closing operators, and

are respectively defined as:

$$f \circ S = (f \ominus S) \oplus S, \tag{4.4}$$

$$f \bullet S = (f \oplus S) \ominus S, \tag{4.5}$$

where we drop the image indexing for simplicity. Opening and closing respectively

remove intensity peaks and troughs that are smaller than the structuring element,

without altering the global image appearance. They are idempotent operators,


which means that successive applications of the same operation do not alter the

previous result.
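Equations 4.2 to 4.5 translate almost directly into array code. The sketch below rests on three assumptions:

- A structuring element is a list of (row, column) offsets, as in Equation 4.1.
- Pixels outside the image are simply ignored.
- Erosion is the adjoint of Equation 4.2, i.e. the minimum of f(x + t) over t ∈ S. For a structuring element that is symmetric about its origin, this is the same as using f(x − t).

The helper names are our own.

```python
import numpy as np


def shift(f, dr, dc, fill):
    """Return g with g[r, c] = f[r - dr, c - dc]; positions that fall outside f get `fill`."""
    g = np.full(f.shape, fill, dtype=float)
    rows, cols = f.shape
    r0, r1 = max(dr, 0), min(rows + dr, rows)
    c0, c1 = max(dc, 0), min(cols + dc, cols)
    if r0 < r1 and c0 < c1:
        g[r0:r1, c0:c1] = f[r0 - dr:r1 - dr, c0 - dc:c1 - dc]
    return g


def dilate(f, S):
    """(f ⊕ S)(x) = max over t in S of f(x - t)  (Equation 4.2)."""
    return np.max([shift(f, dr, dc, -np.inf) for dr, dc in S], axis=0)


def erode(f, S):
    """(f ⊖ S)(x) = min over t in S of f(x + t)  (adjoint of the dilation above)."""
    return np.min([shift(f, -dr, -dc, np.inf) for dr, dc in S], axis=0)


def opening(f, S):  # Equation 4.4: erosion then dilation
    return dilate(erode(f, S), S)


def closing(f, S):  # Equation 4.5: dilation then erosion
    return erode(dilate(f, S), S)


S = [(0, 0), (0, 1)]          # Equation 4.1, (row, column) offsets
dot = np.zeros((5, 5))
dot[2, 2] = 1.0               # a one-pixel peak, narrower than S
print(dilate(dot, S)[2])      # the peak is extended one column to the right
print(opening(dot, S).sum())  # 0.0: opening removes the narrow peak
```

The example also illustrates idempotence: applying `opening` to its own output returns the same array.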

4.2.3 M- and N-filters

Opening and closing allow intensity peaks and troughs to be removed without altering the global image appearance, but each is tuned to the polarity of the features on which it operates: opening removes only peaks and closing removes only troughs. Combining an opening and a closing is called sieving

[11]. Sieves remove image features that are smaller than the structuring element,

irrespective of the feature’s polarity. Two sieves—called M- and N-filters—are

respectively defined as:

$$\mathrm{M}(f, S) = (f \circ S) \bullet S, \tag{4.6}$$

$$\mathrm{N}(f, S) = (f \bullet S) \circ S. \tag{4.7}$$

An example of grey-level sieving is shown in Figure 4.2. A mammographic region

of interest is sieved using a rectangular structuring element oriented at approximately 45°. The figure shows how image structure that is smaller than the

oriented structuring element is removed.
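Below is a minimal sketch of the two sieves, using SciPy's grey-level opening and closing with a boolean footprint. For simplicity, it uses a 5-pixel diagonal line rather than the rectangular element of Figure 4.2; this is an illustrative assumption. The example shows two properties:

- Both filters remove a small bright dot and a small dark dot, regardless of polarity.
- The result depends on orientation: a line parallel to the element survives, while a perpendicular line is removed.

```python
import numpy as np
from scipy import ndimage


def m_filter(f, footprint):
    """M-filter (Equation 4.6): opening followed by closing."""
    opened = ndimage.grey_opening(f, footprint=footprint)
    return ndimage.grey_closing(opened, footprint=footprint)


def n_filter(f, footprint):
    """N-filter (Equation 4.7): closing followed by opening."""
    closed = ndimage.grey_closing(f, footprint=footprint)
    return ndimage.grey_opening(closed, footprint=footprint)


diag = np.eye(5, dtype=bool)  # 5-pixel line element running top-left to bottom-right

img = np.zeros((15, 15))
img[4, 4] = 1.0               # bright dot: a peak smaller than the element
img[10, 10] = -1.0            # dark dot: a trough smaller than the element

idx = np.arange(3, 12)
along = np.zeros((15, 15))
along[idx, idx] = 1.0         # 9-pixel line parallel to the element
across = np.zeros((15, 15))
across[idx, 14 - idx] = 1.0   # 9-pixel line perpendicular to the element

print(np.abs(m_filter(img, diag)).max(), np.abs(n_filter(img, diag)).max())  # both dots removed
print(np.array_equal(m_filter(along, diag), along))  # parallel line kept
print(m_filter(across, diag).max())                  # perpendicular line removed
```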

Figure 4.2: A sieved mammographic image. Image (a) is a mammographic region of interest around a spiculated lesion. Image (b) shows the result of sieving image (a) with a rectangular structuring element, oriented at approximately 45°. The structuring element is shown in red in the top-right corner of (b).

4.3 Pixel signatures

4.3.1 Local scale-orientation descriptors

Pixel signatures are rich feature descriptors of local image structure that are ex-

pressed in terms of scale and orientation. Describing mammographic features in

terms of scale and orientation is useful for a number of reasons. Mammograms

contain features that have an associated orientation (e.g. curvilinear structures)

and features that have no particular orientation (e.g. circumscribed masses); these

features may exist over a range of scales. Radiologists often talk about mam-

mographic features in terms of scale and orientation (e.g. features that ‘point’

towards the nipple or ‘radiate’ from a particular location). Further, it is known


that the mammalian primary visual cortex explicitly encodes visual information

in terms of scale and orientation (see [183] for a discussion of the work of Hubel

[93] and Wiesel [184]).

The pixel signatures discussed in this thesis are developed from those described by

Holmes [90], which used M-filters, and these were in turn developed from those

described by Zwiggelaar et al. [192], which used directional recursive median

filters⁴.

4.3.2 Constructing pixel signatures

For a given input image, a scale-orientation pixel signature is computed at each

pixel location as follows. A set of sieved images is generated from the input image by sieving it with structuring elements at a number of scales and orientations.

The pixel signatures used by Holmes et al. [91, 92, 90] were computed using a

Bresenham line structuring element [26]. Each signature is a 2-D array where the

rows are measurements for the same scale and the columns are measurements for

the same orientation (see Figure 4.3).

⁴ The principal advantage of using morphological operators is that there is an efficient way to perform erosion and dilation [167]; however, today’s desktop computers can construct pixel signatures reasonably quickly using a naïve implementation.

Figure 4.3: Example pixel signatures. Pixel signatures taken from the centres of Gaussian blob and line images.

Formally, let f(x) be a grey-scale image. f(x) is sieved using a set of structuring elements $\{S_{\sigma,\phi}\}$, where $\sigma$ indexes scale and $\phi$ indexes orientation. The result is a set of grey-scale images $\{s_{\sigma,\phi}(x)\}$. The value at $(\sigma, \phi)$ in the pixel signature associated with location x in f(x) is given by

$$\rho(x, \sigma, \phi) = s_{\sigma-1,\phi}(x) - s_{\sigma,\phi}(x). \tag{4.8}$$

Stated simply, for a particular image pixel and a given scale and orientation, the

signature value is the grey-level difference between the pixel value in the sieved

images at the previous and current scales.
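The construction in Equation 4.8 can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code of Holmes et al.:

- M-filters (Equation 4.6) are formed from SciPy's grey-level opening and closing.
- The 1-D line elements have their ends on a circle, rather than being Bresenham lines.
- $s_0$ is the unsieved image, and the input image is sieved afresh at each scale.

The helper names and the half-lengths in the example are hypothetical.

```python
import numpy as np
from scipy import ndimage


def line_footprint(half_len, angle):
    """Boolean footprint of a line through the centre, reaching half_len pixels each way.

    Points are rounded from a continuous segment, so the Euclidean length of the
    element does not depend on its angle (illustrative; not the original Bresenham code).
    """
    size = 2 * half_len + 1
    fp = np.zeros((size, size), dtype=bool)
    t = np.linspace(-half_len, half_len, 4 * half_len + 1)
    rows = np.round(half_len - t * np.sin(angle)).astype(int)
    cols = np.round(half_len + t * np.cos(angle)).astype(int)
    fp[rows, cols] = True
    return fp


def m_filter(f, fp):
    """M-filter (Equation 4.6): opening followed by closing."""
    return ndimage.grey_closing(ndimage.grey_opening(f, footprint=fp), footprint=fp)


def pixel_signatures(f, half_lengths, n_orient=12):
    """Array of shape (rows, cols, n_scales, n_orient) holding Equation 4.8 at every pixel.

    Assumes s_0 is the unsieved image and every s_{sigma,phi} sieves the input image directly.
    """
    f = np.asarray(f, dtype=float)
    sig = np.zeros(f.shape + (len(half_lengths), n_orient))
    for j in range(n_orient):
        phi = j * np.pi / n_orient  # orientations regularly spaced over 180 degrees
        previous = f
        for i, h in enumerate(half_lengths):
            current = m_filter(f, line_footprint(h, phi))
            sig[..., i, j] = previous - current  # rho = s_{sigma-1,phi} - s_{sigma,phi}
            previous = current
    return sig


yy, xx = np.mgrid[-20:21, -20:21]
blob = 255 * np.exp(-(xx**2 + yy**2) / (2 * 4.0**2))  # synthetic Gaussian blob
signature = pixel_signatures(blob, half_lengths=[1, 2, 3, 5, 8])[20, 20]
print(signature.shape)  # (5, 12): rows are scales, columns are orientations
```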

Figure 4.3 shows two pixel signatures taken from the centres of two synthetic

images. One image is a Gaussian blob and the other is a Gaussian line. The

signature for the Gaussian blob shows an approximately uniform scale that is independent of orientation⁵. The signature for the Gaussian line shows that as one

looks across the line it appears to have a limited scale, but when one looks along

the line it appears to be much larger. Pixel signatures from non-trivial images are

not as simple to interpret and are intended to be used as feature vectors within

a machine learning framework such as a classifier.

⁵ We will see later that a limitation in the implementation results in the non-ideal behaviour in the signature for the Gaussian blob in Figure 4.3.

In the previously reported work [91, 92], pixel signatures were generated for 12 regularly-spaced orientations and 10 scales, ranging from 150 µm to 2 cm. These

scales encompass image features that we would like to measure, from microcalci-

fications to small masses. The scales increase logarithmically to give preferential

sampling resolution to small features. We use the same scheme in the research

presented in this chapter.
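The sampling scheme described above can be written out directly. The text does not specify the orientation range or the pixel size used to convert millimetres to pixels, so both are assumptions in this sketch.

```python
import numpy as np

# 10 logarithmically spaced scales from 150 micrometres to 2 cm, in millimetres.
scales_mm = np.geomspace(0.15, 20.0, num=10)

# 12 regularly spaced orientations; spacing over 180 degrees is assumed here,
# since a line element looks the same after a half turn.
orientations_deg = np.arange(12) * 180.0 / 12

# Converting to pixels needs a pixel size; 50 micrometres is a hypothetical value.
PIXEL_MM = 0.05
scales_px = np.maximum(1, np.round(scales_mm / PIXEL_MM)).astype(int)

print(np.round(scales_mm, 2))
print(scales_px)
print(len(scales_mm) * len(orientations_deg), "values per signature")  # 10 x 12 = 120
```

The product of 10 scales and 12 orientations gives the 120-dimensional signatures referred to later in this chapter.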

4.3.3 Metric properties

Although pixel signatures give a rich local description of image structure, the Eu-

clidean distance between two pixel signatures treated as points in a vector space

is an imperfect similarity measure. This is because responses to two similar image

structures may appear in slightly different locations in the corresponding signa-

tures. The work presented by Holmes et al. describes a sophisticated approach

to dealing with this problem by treating signature similarity as a transportation

problem, where similarity is measured by the cost of transforming one signature

into another [91, 92, 90]. Further, an efficient way of computing this measure is

described, where signatures are transformed into a space where Euclidean dis-

tance approximates the transportation cost. This chapter deals with improving

the raw signatures, and so the metric properties of pixel signatures will not be

discussed further.

4.4 Analysis of the current implementation

In this section we analyse the existing implementation of pixel signatures and

propose two improvements. The first addresses the length of the structuring

element and the second addresses the coverage of the structuring element.

4.4.1 Structuring element length

Figure 4.3 shows a problem with the implementation of pixel signatures used in

[91, 92, 90]: even though the Gaussian blob is circular (up to the image quan-

tisation), the pixel signature for the central pixel shows the scale to vary with

orientation. This is caused by incorrect computation of the length of the struc-

turing element, which should be invariant to orientation. If, for a particular scale,

one were to plot the position of the ends of the structuring element as it is rotated

about a pixel, it should trace a circle. Instead, the structuring element traces a

square, with the structuring element being longer at the diagonal orientations

than at the horizontal and vertical orientations. This is illustrated in Figure 4.4:

as the structuring element moves from position A to B the structuring element

“grows” in length (although all three structuring elements have the same number

of pixels). This problem is corrected in our implementation, as Figure 4.7(b)

shows.
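The two behaviours can be compared numerically. The sketch below contrasts two ways of defining a digital line:

- A line that takes a fixed number of unit steps along its major axis. A Bresenham line of fixed pixel count does this, so its ends trace a square.
- A line whose ends are placed at a fixed Euclidean distance from the centre, so its ends trace a circle, up to rounding.

The function names are ours.

```python
import numpy as np


def end_fixed_steps(n_steps, phi):
    """End point of a digital line taking n_steps unit steps along its major axis."""
    c, s = np.cos(phi), np.sin(phi)
    k = n_steps / max(abs(c), abs(s))  # distance travelled along the line
    return np.round(k * c), np.round(k * s)


def end_fixed_length(r, phi):
    """End point of a line whose Euclidean length from the centre is fixed at r."""
    return np.round(r * np.cos(phi)), np.round(r * np.sin(phi))


for deg in (0.0, 22.5, 45.0):
    phi = np.deg2rad(deg)
    a = np.hypot(*end_fixed_steps(10, phi))
    b = np.hypot(*end_fixed_length(10, phi))
    print(f"{deg:4.1f} deg: fixed step count -> {a:5.2f} px, fixed length -> {b:5.2f} px")
```

At 45° the fixed-step line is about √2 times longer than the horizontal one. This is the "growth" shown in Figure 4.4.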


4.4.2 Local coverage

The structuring elements in the existing implementation are 1-D (i.e. a single

line of pixels), as illustrated in Figure 4.4. The area between rotations of the

structuring element—the shaded region in Figure 4.4—does not contribute to

the pixel signature. If we neglect quantisation, this region has an area of r²θ

(two circular sectors, each of radius r and angle θ), where r is the length of the structuring element measured from its centre to either end

and θ is the angle between adjacent structuring elements. This is a problem

because there is likely to be useful information in the region that is not considered.

While information in this area may contribute to nearby signatures, it should

be contained in the signature for the pixel. The solution is suggested by

Figure 4.4: the structuring element should be shaped like a bow tie—i.e. like the

shaded region in the figure.

Recall from Section 4.2 that our morphological operators are defined in terms of

minima and maxima of the image values under the structuring element. The bow tie-shaped

structuring element is non-trivial to construct on the quantised image plane for

arbitrary sizes and orientations. Further, computing the minimum or maximum

value under such a shape—particularly for large images such as mammograms—is

likely to be computationally demanding. We seek to improve the signatures by

considering the relevant pixels using a suitable structuring element, but without

incurring the computational penalty associated with a complex shape.

Figure 4.4: An illustration of the two limitations of the existing implementation. Three rotations of a structuring element are shown. As the structuring element is rotated, it “grows” in length. The red shaded region illustrates the area not covered by the 1-D structuring elements of the existing implementation and the desired length of the diagonal structuring element.

Figure 4.5: Incremental approximations of the bow tie structuring element. The computationally expensive bow tie-shaped structuring element is shown in (a). An initial approximation is shown in (b), which is closely approximated by (c). The structuring element in (c) can be approximately decomposed as (d).

Figure 4.5 shows a series of approximations of the bow tie-shaped structuring element. Simplifying the shape of the structuring element yields the element shown in Figure 4.5(b), which has a shape that is easier to construct, but gives consideration to the regions either side of the centre. When quantised, this approximation is actually a rectangle for all but the largest structuring elements,

and the additional regions either side of the centre are insignificant. Using a solid

structuring element is expensive because of the number of pixels that need to be

compared when computing the minimum or maximum.

It is possible to approximately decompose sieving with an arbitrarily oriented rectangular structuring element into two successive sievings with orthogonal 1-D structuring elements. The first structuring element has the same length and orientation as

the longest side of the rectangular structuring element. The second structuring

element has the same length and orientation as the shortest side of the rectangular

structuring element. The input image is sieved using the first structuring element

and the resulting image is then sieved using the second structuring element⁶. For

the majority of pixels (60%–90%), there is no difference between the full sieving

and the approximation. Approximation errors are very rarely more than 10 grey-

level values in magnitude (in 8-bit images) and are imperceptible. Because we

have been able to decompose the sieving in terms of two structuring elements

that are one pixel wide, Soille’s algorithm can be used to perform the erosions

and dilations efficiently [167].
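The decomposition can be tried out in the simplest, axis-aligned case. This sketch differs from the method in the text in three ways:

- It uses SciPy's standard grey-level operators rather than Soille's algorithm.
- It uses a smoothed-noise image in place of a mammogram.
- The 3 × 9 rectangle is chosen purely for illustration.

The agreement it reports should therefore not be expected to match the 60%–90% quoted above. The sketch also checks that erosion by the rectangle decomposes exactly. This shows that the approximation arises from sieving (an opening followed by a closing), not from the individual erosions.

```python
import numpy as np
from scipy import ndimage


def m_filter(f, fp):
    """M-filter: opening followed by closing with boolean footprint fp."""
    return ndimage.grey_closing(ndimage.grey_opening(f, footprint=fp), footprint=fp)


rng = np.random.default_rng(0)
# Smoothed noise quantised to integers: a stand-in texture, not mammographic data.
img = np.round(ndimage.gaussian_filter(rng.uniform(0, 255, (128, 128)), sigma=2.0))

rect = np.ones((3, 9), dtype=bool)        # full rectangular element (axis-aligned case)
long_side = np.ones((1, 9), dtype=bool)   # first 1-D element: the long side
short_side = np.ones((3, 1), dtype=bool)  # second 1-D element: the short side

full = m_filter(img, rect)
approx = m_filter(m_filter(img, long_side), short_side)  # long side first, then short side

# Erosion by the rectangle is exactly separable; the sieve is only approximately so.
erosion_exact = np.array_equal(
    ndimage.grey_erosion(img, footprint=rect),
    ndimage.grey_erosion(ndimage.grey_erosion(img, footprint=long_side), footprint=short_side),
)
agreement = np.mean(full == approx)
print(f"erosion separable: {erosion_exact}")
print(f"pixels identical: {agreement:.1%}, largest difference: {np.abs(full - approx).max():.0f}")
```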

The width of the rectangular structuring element, and hence the length of the second 1-D structuring element, needs to be such that the rectangle “fits” as it is rotated from one orientation to another. If the first (longest) structuring element extends a distance r from its centre in each direction, then the length of the second structuring element is 2r sin(θ/2), where θ is the angle through which the elements are rotated when moving from one orientation column to another. This is illustrated in Figure 4.6.

⁶ Experimental work showed that reversing the order in which sieving was performed decreased the accuracy of the approximation.

Figure 4.6: Rotating the “rectangular” structuring elements. The diagram shows how the width of the rectangle, and hence the length of the second approximating structuring element, needs to be selected so that correct coverage is achieved (i.e. the corners of adjacent structuring elements need to touch).
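The sketch below gives a quick numerical check of the width rule. It assumes 12 orientation columns spread over 180° (so θ = 15°) and takes r as the distance from the centre of the first element to either end. If the rectangles are centred on the pixel, the width at which the corners of adjacent rectangles exactly touch works out as 2r tan(θ/2). For θ = 15°, this exceeds 2r sin(θ/2) by less than 1%, which is under a pixel for the illustrative half-lengths used here.

```python
import numpy as np

N_ORIENT = 12              # orientation columns, as in Section 4.3.2
THETA = np.pi / N_ORIENT   # step between columns, assuming 180-degree coverage


def second_element_length(r, theta=THETA):
    """Length of the second (short) element, 2 r sin(theta / 2), for half-length r."""
    return 2.0 * r * np.sin(theta / 2.0)


def corner_touching_width(r, theta=THETA):
    """Width at which the corners of adjacent centred rectangles exactly touch."""
    return 2.0 * r * np.tan(theta / 2.0)


for r in (5, 20, 100):     # half-lengths in pixels (illustrative values)
    print(f"r = {r:3d}px: 2r sin = {second_element_length(r):6.2f}px, "
          f"2r tan = {corner_touching_width(r):6.2f}px")
```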

Our proposed new method of computing pixel signatures ensures that structuring

element length is constant over orientation (allowing for the quantised image

plane) and uses the orthogonal elements approximation to give consideration to

pixels that the original method neglected. Figure 4.7 shows a pixel signature

computed for the centre of a Gaussian blob using the new method. The non-linearity that remains is due to quantisation. Signatures from the centre of a Gaussian line are similar to those of the original method.

Figure 4.7: An “improved” pixel signature from the centre of a Gaussian blob. A Gaussian blob is shown in (a) and a pixel signature, computed using our method, is shown in (b). Note that the signature does not exhibit the non-linearity of the equivalent signature in Figure 4.3.

4.5 An information theoretic measure of signature quality

The most obvious way to compare the original and new methods of computing

signatures would be to run a classification experiment. However, building an accurate picture of how well each method performs would require large-scale experiments targeting the various forms of abnormality. Consequently, we sought a more direct way of comparing their behaviour.

In producing pixel signatures, we hope to encapsulate useful information about

local image appearance. A signature that contains more information than another

is likely to be more useful. Shannon’s entropy [163] is a measure of the average

information carried by a discrete symbol emitted from some source. The entropy


measure is derived by considering the “uncertainty” that is associated with a

symbol (or the “surprise” associated with the symbol). Given a symbol with

probability p, selected from some alphabet $A = \{a_1, a_2, \ldots, a_N\}$, the measure of

the uncertainty associated with the symbol, u(p), is defined axiomatically:

• u(1) = 0. We are certain of (unsurprised by) the certain event.

• u(p) > u(q) ⇔ p < q. We are more uncertain of (more surprised by) less probable symbols.

• u(pq) = u(p) + u(q). The uncertainty measure is additive for a sequence of independent symbols.

• u(p) is continuous in p.

Shannon showed that the only function satisfying these axioms is $u(p) = -K \log_a p$. The constant K is usually set to unity and the base of the logarithm is usually set to 2, in which case the uncertainty (usually interpreted as the information content) is measured in bits. The expected information content of a symbol emitted by a source is given by:

$$H = -\sum_{i=1}^{N} p_i \log_2 p_i. \tag{4.9}$$

Shannon’s entropy can be illustrated as follows. Imagine two coins, each of which

has an associated probability mass function. Assume that one coin is fair and

the other is very heavily biased towards Heads. Further, imagine that a friend

knows these models and has to guess the outcomes of coin tosses, given that they

are told which coin was tossed. Telling your friend that the unfair coin was

tossed gives them a very good chance of correctly guessing the message, but the

actual message itself (‘The coin landed with the Head facing upwards.’ ) contains

little surprise (information). Conversely, if the friend is told that the fair coin

was tossed, they have little information about what the message might be and so

the message carries more surprise (information) than the message for the unfair

coin. In summary: on average, events from peaked distributions convey little

information, while events from flat distributions convey more information.
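Equation 4.9 and the coin example can be checked directly. In this sketch, the "very heavily biased" coin is assumed, for illustration, to land Heads with probability 0.99. The sum is written as p log2(1/p), which equals the −p log2 p of Equation 4.9.

```python
import numpy as np


def shannon_entropy(p):
    """H = sum_i p_i log2(1 / p_i) in bits (Equation 4.9); zero-probability symbols contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float((p * np.log2(1.0 / p)).sum())


print(f"fair coin:   {shannon_entropy([0.5, 0.5]):.3f} bits per toss")    # 1.000
print(f"biased coin: {shannon_entropy([0.99, 0.01]):.3f} bits per toss")  # 0.081
```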

An experiment to compare the two methods of computing scale-orientation pixel signatures using the information theoretic measure of signature quality is

described below.

4.5.1 Aims

The aim of the experiment was to determine whether the modifications made to the

pixel signatures increase the information content of the new signatures, relative

to the original method.

4.5.2 Method

We would ideally treat each pixel signature as a symbol and compute the expected information that each of the two types of signature carries (i.e. treat the signature type as the source). However, because the pixel signatures we use are essentially points in a 120-D space, building a model of the probability mass function for signatures is intractable; it is very unlikely that multiple identical signatures will be encountered, even in a large sample⁷. If an equal number of original and new signatures were sampled, and each signature occurred only once, then the Shannon entropy of each source would be identical, so such a measure would not be useful. Instead, we consider each pixel signature to be a source, where the values of the signature elements are the message symbols. If all the elements of a signature had similar values, the signature would carry little useful information, whereas signatures whose elements take on distinct values can carry useful information.
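Treating one signature as a source can be sketched as follows. The text does not say how the continuous element values are turned into discrete symbols, so the quantisation into `n_levels` equal-width bins over `value_range` is an assumption made here for illustration. Both of those parameters, and the function name, are hypothetical.

```python
import numpy as np


def signature_entropy(signature: np.ndarray, n_levels: int = 32,
                      value_range: tuple = (0.0, 1.0)) -> float:
    """Shannon entropy (bits) of the element values of one pixel signature.

    The signature's elements are treated as symbols emitted by a source.
    Quantising them into n_levels equal-width bins is an assumption.
    """
    counts, _ = np.histogram(signature.ravel(), bins=n_levels, range=value_range)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


rng = np.random.default_rng(0)
flat = np.full(120, 0.5)            # all elements similar: little information
varied = rng.uniform(0.0, 1.0, 120)  # elements take distinct values
print(signature_entropy(flat), signature_entropy(varied))
```

Summing this quantity over every pixel signature in an image gives a total entropy of the kind reported in the results below.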

A set of 10 regions of interest, each approximately 400 mm², around spiculated

lesions were pseudo-randomly selected from the Digital Database for Screening

Mammography [83]. As well as containing the abnormal feature, the regions

were large enough to contain pixels from tissue that a CADe system should label

as being normal. For each pixel in each image, pixel signatures were computed

using both methods and the corresponding Shannon entropies were calculated.

Despite the relatively small number of images, our sample size is actually very

large (2 310 342 pixel locations). A more comprehensive study would look at all

indicative signs of abnormality (e.g. the various types of mass, microcalcifications,

architectural distortions), but such work was beyond the scope of this experiment.

⁷ Multiple identical signatures would only exist in a set of images that contained multiple identical regions.


4.5.3 Results

Shannon entropy was computed for 2 310 342 pixel signatures. The total Shannon

entropy was 6 426 499 bits for the original signatures and 7 638 189 bits for the new

signatures. This is an average increase of over 0.5 bits per pixel or nearly 19%.

A t-test on the paired differences between the two sets of entropies, at the 5% significance level, showed that the new method yields a statistically significant increase in Shannon entropy. Figure 4.8 shows three regions of interest around spiculated lesions and illustrates where the additional information is distributed.
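The headline figures can be checked from the reported totals, and the paired test can be sketched with the standard library. The `paired_t_test` helper is illustrative rather than the procedure actually used. Its p-value relies on a normal approximation to the t distribution, which is reasonable only because the sample here is very large.

```python
import math
import statistics

# Totals reported in the text.
n_pixels = 2_310_342
total_original = 6_426_499   # bits
total_new = 7_638_189        # bits

increase_per_pixel = (total_new - total_original) / n_pixels
relative_increase = (total_new - total_original) / total_original
print(f"{increase_per_pixel:.3f} bits per pixel, {100 * relative_increase:.1f}% increase")


def paired_t_test(before, after):
    """Paired t statistic and two-sided p-value for a mean difference of zero.

    The p-value uses the standard normal approximation to the t
    distribution: adequate for very large samples, not for small ones.
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    p_value = math.erfc(abs(t) / math.sqrt(2))   # equals 2 * (1 - Phi(|t|))
    return t, p_value
```

The printed values (0.524 bits per pixel, 18.9%) agree with the "over 0.5 bits per pixel or nearly 19%" quoted above.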

4.5.4 Discussion

The results show that our attempt to improve the way that pixel signatures are computed increases the information content of the signatures for spiculated lesions and the surrounding “normal” tissue. Although pixel signatures for almost all of the tissue types included show an increase in information content, the increase seems to be larger for regions around masses, particularly for spicules. Little increase in information, or a decrease, is seen in homogeneous regions. We cannot draw any conclusions for regions containing microcalcifications, as we did not include such images. However, since inhomogeneous regions show the largest increase in Shannon entropy, we would expect an increase for pixel signatures from such regions. The following experiment investigates whether our modifications yield better results when the new signatures are used in a practical application.


Figure 4.8: Regions of increased Shannon entropy. The left column shows three regions of interest (to scale). The right column shows the pixel-wise differences in Shannon entropy between the new and original methods (i.e. positive values illustrate where the new method has more information). Thresholding the difference images shows that almost all pixel signatures computed using the new method have more information than those computed with the original method.


4.6 Classification-based evaluation

4.6.1 Aims

The information theoretic evaluation demonstrates that the new signatures contain more information than those produced by the previous method. It is intuitive to expect that a more informative description will yield better results when used within a learning framework such as a classifier, but we need to demonstrate that this is the case. The aim of this experiment is to determine whether the new signatures yield better classification performance than those produced by the original method.

4.6.2 Method

An expert radiologist provided annotations for the images described in Section 4.5 (an example region of interest is shown in Figure 4.9). A set of just over 20 000 locations within the images was randomly sampled, such that half were sampled from the abnormal regions and half from the normal regions. Pixel signatures, computed using the two methods as described previously, were then extracted for these locations. The columns of each signature were concatenated, converting the 2-D signatures into vectors that can be considered to be points in a 120-D space. For each type of signature (i.e. original and new), training and test sets were formed by randomly allocating signatures to either a training set or a test set.
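The column concatenation and random allocation can be sketched with NumPy. The 10 × 12 signature shape and the 50/50 split fraction are assumptions for illustration. The text gives only the 120-D total and does not state the split proportion. The function names are hypothetical.

```python
import numpy as np


def signatures_to_vectors(signatures: np.ndarray) -> np.ndarray:
    """Concatenate the columns of each 2-D signature into a single vector.

    signatures has shape (n_samples, n_rows, n_cols). In the result, the
    first n_rows entries of each vector are column 0, the next n_rows are
    column 1, and so on.
    """
    n = signatures.shape[0]
    return signatures.transpose(0, 2, 1).reshape(n, -1)


def random_train_test_split(X, y, train_fraction=0.5, seed=0):
    """Randomly allocate each sample to either the training or the test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(round(train_fraction * len(X)))
    train, test = order[:n_train], order[n_train:]
    return X[train], y[train], X[test], y[test]


# Hypothetical example: 10 x 12 signatures become 120-D vectors.
rng = np.random.default_rng(1)
signatures = rng.random((20, 10, 12))
labels = np.repeat([0, 1], 10)          # half normal, half abnormal
vectors = signatures_to_vectors(signatures)
X_train, y_train, X_test, y_test = random_train_test_split(vectors, labels)
print(vectors.shape, X_train.shape, X_test.shape)
```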


Figure 4.9: An example region of interest and its ground truth.

There are many pattern classification techniques (e.g. the nearest neighbour classifier, linear discriminant analysis and artificial neural networks). The support vector machine classifier has become popular for its classification performance and its ability to generalise. A support vector machine classifier [31] was trained using the training set for the original signatures. Suitable training parameters were selected by validating on the test set for the original signatures. A second classifier was then trained on the training set for the new signatures, using the same training parameters as were selected for the original signatures. This approach attempts to remove bias towards the new method of producing pixel signatures. The test set for the signatures produced using the new method was then classified using the second classifier.
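The parameter-transfer protocol can be sketched independently of any particular SVM library. Here `fit_and_score` is a hypothetical callback supplied by the user: it trains a classifier with the given parameters and returns a score on an evaluation set. The parameter names in the usage below are also illustrative.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def tune_then_transfer(fit_and_score: Callable[..., float],
                       param_grid: Dict[str, Sequence],
                       original_split: Tuple, new_split: Tuple):
    """Grid-search parameters on the original signatures; reuse them for the new ones.

    fit_and_score(train_X, train_y, eval_X, eval_y, **params) is a
    hypothetical callback returning a score. Each split is a tuple
    (train_X, train_y, eval_X, eval_y).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = fit_and_score(*original_split, **params)
        if score > best_score:
            best_params, best_score = params, score
    # The new signatures get no tuning of their own. This avoids biasing the
    # comparison in their favour, at the cost of a pessimistic result.
    new_score = fit_and_score(*new_split, **best_params)
    return best_params, best_score, new_score
```

Because the grid search only ever sees the original signatures, any advantage the new signatures show cannot come from parameter tuning.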


| | Original signatures | New signatures |
|---|---|---|
| nTP | 3 729 | 3 932 |
| nTN | 3 745 | 3 746 |
| nFP | 1 276 | 1 266 |
| nFN | 1 283 | 1 080 |
| Specificity | 0.747 | 0.751 |
| Sensitivity | 0.744 | 0.785 |

Table 4.1: Classification results for the two signature types. The table shows the number of true positives (nTP), true negatives (nTN), false positives (nFP) and false negatives (nFN), and the specificity and sensitivity, for the two signature types. See Section 3.11 for an explanation of these quantities.

4.6.3 Results

The results of the classification experiment are summarised in Table 4.1. Both

the specificity and sensitivity are improved when using the classifier trained using

the new signatures.
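For reference, sensitivity and specificity can be computed from confusion-matrix counts using the standard definitions. We assume these are the definitions given in Section 3.11. The function name is illustrative.

```python
def sensitivity_specificity(n_tp, n_tn, n_fp, n_fn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = n_tp / (n_tp + n_fn)   # fraction of abnormal locations detected
    specificity = n_tn / (n_tn + n_fp)   # fraction of normal locations correctly rejected
    return sensitivity, specificity


print(sensitivity_specificity(80, 90, 10, 20))
```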

4.6.4 Discussion

The results show that the new signatures can yield better results in classification experiments. It should be noted that the results for the classifier trained using the new signatures are pessimistic, since the classifier parameters were tuned to the original signatures. As stated previously, Euclidean distance is not a good metric for pixel signatures. Classifiers trained in a space where Euclidean distance approximates the transportation-based similarity measure perform better than those trained in the raw pixel signature space [90]. We could therefore expect classification performance to improve if we used signatures in a more appropriate space and selected classifier parameters for the new signatures, rather than for the original signatures.

4.7 Summary

This chapter presented work on improving the way that scale-orientation pixel

signatures are computed. In summary:

• Mathematical morphology was introduced.

• Scale-orientation pixel signatures were introduced and an existing imple-

mentation was analysed. Two flaws with the existing method were ad-

dressed, yielding a new way to compute pixel signatures. An efficient way

of computing the new signatures was developed.

• An information theoretic measure of signature quality was developed. Com-

paring pixel signatures computed on mammographic images using the old

and new methods showed that the new method increased the information

content of the signatures by approximately 19%.

• A classification experiment was reported in which signatures computed us-

ing the two methods were used to discriminate between pixels belonging to

normal and spiculated lesion tissues. The new signatures outperformed the

original signatures in terms of both specificity and sensitivity. By tuning the

classifier parameters to the new signatures—rather than the old ones—it is

expected that even better performance could be achieved.


Although the pixel signature approach shows some promise as a method of mod-

elling mammographic appearance, it does not lead to the generative approach

advocated in the introduction to the thesis. The remainder of the thesis focuses

on developing generative statistical models of mammographic appearance.

Chapter 5

Modelling distributions with

mixtures of Gaussians

5.1 Introduction

This chapter presents background information on the multivariate normal distri-

bution and a class of statistical model called the Gaussian mixture model, both of

which are used extensively in the remainder of the thesis. The chapter presents:

• A brief overview of the density estimation problem and a review of common

approaches used to model distributions.

• The Gaussian mixture model and two algorithms for learning the model

parameters from training data.

• Some useful properties of the multivariate normal distribution and Gaussian mixture models (computing marginal and conditional distributions).

• A method of learning Gaussian mixture model parameters from large train-

ing sets using a variant of the k-means clustering algorithm.

5.2 Background

This thesis is largely concerned with statistical modelling, which is used to de-

scribe scenarios (experiments) that are governed by stochastic processes, or which

can be assumed to be governed by such processes. One of the characteristics of

randomness is variation, and this thesis deals with the variation of mammographic

texture and appearance. We use statistical models to cope with this variation.

A random variable (e.g. X) is a function that maps every possible outcome of an experiment to a single number¹. In this way, the random variable is governed by the stochastic process. The probabilities of discrete events are usually described using a probability mass function (pmf), P(X = x), abbreviated as P(x). For an event x outside the possibility space (i.e. an impossible event), P(x) = 0. The certain event is assigned a probability of unity. The discrete cumulative distribution function (cdf) is defined as

$$C(x) = P(X \le x) = \sum_{x' \le x} P(x'). \tag{5.1}$$

¹ Experiments often do not have numerical outcomes; e.g. tossing a coin has outcomes Heads and Tails.


Similarly, the continuous cdf is defined as

$$C(x) = P(X \le x) = \int_{-\infty}^{x} p(\alpha)\, d\alpha, \tag{5.2}$$

where p(x) is the probability density function (pdf). A pdf is simply the derivative of the corresponding cdf, and is the continuous equivalent of the pmf. Probability mass and density functions are nonnegative and must sum or integrate to unity, because the event that some outcome in the possibility space occurs is the certain event.
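The relationship between Equation 5.2 and the pdf can be checked numerically for the standard normal distribution. Its cdf has a closed form in terms of the error function. The check is a central finite difference for the derivative and a trapezoidal sum for the integral. The grid limits and step sizes are arbitrary illustrative choices.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cdf, C(x) = P(X <= x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x: float) -> float:
    """Standard normal pdf p(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# The pdf is the derivative of the cdf: compare a central finite difference.
h = 1e-5
for x in (-1.0, 0.0, 2.0):
    slope = (normal_cdf(x + h) - normal_cdf(x - h)) / (2 * h)
    print(x, slope, normal_pdf(x))

# The pdf integrates to unity: trapezoidal rule over [-10, 10].
n = 20_000
xs = [-10 + 20 * i / n for i in range(n + 1)]
area = sum((normal_pdf(a) + normal_pdf(b)) / 2 * (b - a) for a, b in zip(xs, xs[1:]))
print(area)
```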

Events in continuous distributions are defined as regions within the possibility space, and so the probability of an event is equal to the integral of the pdf over the region that delimits the event. In this thesis the possibility spaces are typically measured on multiple axes, and so the pdfs are multivariate (i.e. are scalar functions of vectors). In the multivariate case, pdfs have a value at every point in the possibility space, and probabilities are integrals of the pdf over the hyper-volumes that delimit the events.

The problem of density estimation can be stated as follows: given a training set, $T = \{\mathbf{x}_i \in \mathbb{R}^d : i = 1, \dots, N\}$, of samples from a particular population, how do we compute the value of the associated pdf for an arbitrary point in the space? Implicit in this question is the assumption that the pdf cannot be known a priori. In addition to estimating the pdf, it is often necessary to manipulate the pdf to determine further densities, such as marginal and conditional distributions (Section 5.5 presents some background on these topics), or to compute likelihoods or probabilities by integrating the pdf.


5.3 Density estimation

A common approach to density estimation is to assume that the data in T follow a known, simple distribution, such as a uniform or normal distribution. The validity of this assumption can be assessed roughly by plotting the data on each pair of dimensions. If the data do not follow the assumed distribution, then more sophisticated approaches are required. We will now look at a few common density estimation techniques and consider how they support the following three tasks:

1. Computing a marginal distribution.

2. Computing a conditional distribution.

3. Sampling from the underlying pdf.

(A full description of what the terms marginal and conditional mean is given in

Section 5.5.)

A simple density estimator is the histogram. The possibility space is broken into discrete regions called bins, and each bin is assigned a value equal to the number of training data that lie within the associated region. When normalised to sum to unity, the histogram defines a pmf, and the situation changes from being continuous to being discrete. If there are ample data, the bins can be made fine enough for the pmf to be a good approximation of the pdf. Addressing our three tasks:

1. Computing a marginal distribution simply involves summing the histogram over the dimensions that are being marginalised out (i.e. the non-marginal dimensions).


2. Computing a conditional distribution can be achieved by constructing a

lower-dimensional histogram from the bins that intersect the conditions,

and then normalising the resulting pmf to sum to unity.

3. A multivariate histogram could be sampled as follows: construct an associa-

tive array that maps the probabilities to their bin locations in the possibility

space and then sample one of these mappings according to the probabilities.
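The three histogram tasks can be sketched in two dimensions with NumPy. The correlated normal training data, the 40 × 40 bins and the conditioning value x = 1.0 are all illustrative assumptions. The sampler draws a uniform point inside each chosen bin rather than returning the bin location itself. This is a small extension of the associative-array scheme described above.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=50_000)

# Density estimate: a 2-D histogram normalised to a pmf over the bins.
counts, x_edges, y_edges = np.histogram2d(data[:, 0], data[:, 1], bins=40,
                                          range=[[-4, 4], [-4, 4]])
pmf = counts / counts.sum()

# 1. Marginal of x: sum over the y axis, which is being marginalised out.
marginal_x = pmf.sum(axis=1)

# 2. Conditional of y given x in the bin containing x = 1.0: slice and renormalise.
i = np.searchsorted(x_edges, 1.0, side="right") - 1
conditional_y = pmf[i, :] / pmf[i, :].sum()


# 3. Sampling: pick bins in proportion to their mass, then a point inside each.
def sample_histogram(pmf, x_edges, y_edges, n, rng):
    flat = rng.choice(pmf.size, size=n, p=pmf.ravel())
    ix, iy = np.unravel_index(flat, pmf.shape)
    x = rng.uniform(x_edges[ix], x_edges[ix + 1])
    y = rng.uniform(y_edges[iy], y_edges[iy + 1])
    return np.column_stack([x, y])


samples = sample_histogram(pmf, x_edges, y_edges, 10_000, rng)
```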

The histogram approach works well in low dimensions. However, the amount of data required to populate a space with a given density of data increases exponentially with the dimensionality of the space. Imagine that one can feasibly sample from 10 000 individuals. If we could only measure one attribute for each individual, on a scale of 1 to 100, then the density of data points in the possibility space would be 10 000/100 = 100 points per cell. If one could measure two attributes for each individual (using the same scale), then the density of the possibility space would be 10 000/100² = 1. Measuring three attributes yields a density of 0.01, and so on. This effect is called the curse of dimensionality [14].
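The arithmetic of the example above generalises to $N / 100^d$ points per cell for d attributes. A short loop makes the exponential decay explicit.

```python
n_individuals = 10_000
levels = 100   # each attribute measured on a scale of 1 to 100

for d in range(1, 5):
    cells = levels ** d                  # size of the possibility space
    density = n_individuals / cells      # data points per cell
    print(f"d = {d}: {cells:,} cells, {density:g} points per cell")
```

Each extra attribute multiplies the number of cells by 100 and divides the density by 100.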

A naïve representation of a high-dimensional histogram would be a multidimensional array. To approximate a continuous density, each dimension requires a reasonable number of elements. The result is a multidimensional array with $a^d$ elements, which quickly becomes impractical. Here $a$ is the number of bins per dimension, which determines the quality of the approximation to the continuous density. The curse of dimensionality implies that most of the histogram bins will be empty. A more practical implementation could exploit this sparsity and use a sparse representation, but this would lack the conceptual simplicity of the histogram and would make computing marginal and conditional distributions more difficult.
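The sparsity argument can be seen with a dictionary-based histogram that stores only occupied bins. This is one possible sparse representation, chosen here for illustration. The uniformly distributed data and the 100 bins per dimension are illustrative assumptions, and the function name is hypothetical.

```python
from collections import Counter

import numpy as np


def sparse_histogram(data: np.ndarray, low: float, high: float,
                     bins_per_dim: int) -> Counter:
    """Histogram stored as {bin index tuple: count}; empty bins use no memory."""
    idx = np.floor((data - low) / (high - low) * bins_per_dim).astype(int)
    idx = np.clip(idx, 0, bins_per_dim - 1)   # put values equal to `high` in the last bin
    return Counter(map(tuple, idx))


rng = np.random.default_rng(0)
for d in (1, 2, 3, 5):
    data = rng.uniform(0.0, 1.0, size=(10_000, d))
    hist = sparse_histogram(data, 0.0, 1.0, bins_per_dim=100)
    fraction = len(hist) / 100 ** d
    print(f"d = {d}: {len(hist)} occupied bins, {fraction:.2e} of all {100 ** d:,}")
```

With 10 000 points, at most 10 000 bins can ever be occupied. From three dimensions upwards, that is at most 1% of the dense array.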

The k-nearest neighbour approach [56] uses the data directly to facilitate density estimation. The idea is to estimate the local density around a given location in the possibility space by considering the distance, r, to the k-th nearest neighbour. The assumption is that the content (hyper-volume) of the hypersphere of radius r around a point of interest will be smaller in densely populated regions of the possibility space than in sparsely populated regions. The density at a point $\mathbf{x}$ can be estimated as

$$p(\mathbf{x}) \approx \frac{k}{N} \frac{1}{v_d(r)}, \tag{5.3}$$

where $k/N$ is an estimate of the probability mass represented by the k data points, and $v_d(r)$ is the content of a d-dimensional hypersphere with radius $r = \|\mathbf{x} - \mathbf{x}_k\|_2$, where $\mathbf{x}_k$ is the k-th nearest neighbour to $\mathbf{x}$. The main practical difficulty with this method is that an efficient way of finding the nearest neighbours is required. Addressing our three tasks:

1. Computing a marginal density simply involves modifying the nearest neigh-

bour routine to neglect measurements from the non-marginal dimensions

and using the appropriate value for d when Equation 5.3 is used.

2. It is difficult to see how this method would allow conditional distributions

to be computed.

3. Since samples from a population are more likely to come from dense regions of the pdf, “new” samples could be generated simply by choosing one of the original samples at random, but this would restrict samples to the observed set. Alternatively, one could consider the hypersphere defined by r to be of uniform density, and draw a sample from such a region around a randomly chosen point in the set of observed samples.
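Equation 5.3 and the ball-sampling idea in task 3 can be sketched with brute-force nearest-neighbour search. The function names are illustrative. A practical implementation on large training sets would need a spatial index, which is not shown. For task 1, pass only the marginal columns, e.g. `data[:, dims]` and `x[dims]`, so that d is set accordingly.

```python
import math

import numpy as np


def ball_volume(d: int, r: float) -> float:
    """Content v_d(r) of a d-dimensional hypersphere of radius r."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d


def knn_density(x, data, k):
    """Equation 5.3: p(x) ~ (k / N) / v_d(r), r = distance to the k-th nearest neighbour.

    Brute-force search over all N training points.
    """
    n, d = data.shape
    dists = np.linalg.norm(data - x, axis=1)
    r = np.partition(dists, k - 1)[k - 1]          # k-th smallest distance
    return (k / n) / ball_volume(d, float(r))


def knn_sample(data, k, rng):
    """Pick an observed point at random, then draw uniformly from its k-NN ball."""
    n, d = data.shape
    centre = data[rng.integers(n)]
    # Index k skips the centre itself, which is at distance zero.
    r = np.partition(np.linalg.norm(data - centre, axis=1), k)[k]
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    radius = r * rng.uniform() ** (1.0 / d)        # uniform over the ball's volume
    return centre + radius * direction


rng = np.random.default_rng(0)
data = rng.standard_normal((100_000, 1))
print(knn_density(np.array([0.0]), data, k=1000))  # compare with 1/sqrt(2*pi)
```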

The Parzen window density estimator [56] is similar to the k-nearest neighbour approach in that it uses the training data points directly to help model the density. The method assumes that the underlying pdf to be estimated is nonzero at locations near the training points. It also assumes that less can be inferred about the pdf from a particular training point as one moves further away from that point in the possibility space. This relationship, between what can be inferred about the pdf from a training point $\mathbf{x}_i$ and the distance from that point, is represented by a kernel function of the form $k(r, \mathbf{x}_i)$, where r is the (often Euclidean) distance to $\mathbf{x}_i$. It is the kernel that defines the contribution of the data point to the estimate of the pdf: a kernel is centred on each data point, and the pdf is defined as the sum of these kernels, normalised such that the integral of the pdf is unity. The particular form of the kernel (e.g. the Gaussian, boxcar or triangle functions) and its parameterisation must be chosen to suit the application at hand. While the Parzen window density estimator is reasonably simple, choosing the kernel function and its parameters can be difficult. Addressing our three tasks:

1. Computing a marginal distribution simply involves ignoring measurements

on the non-marginal axes and re-normalising the integral of the pdf to unity.

2. Computing a conditional distribution will depend upon the form of kernel

chosen. If a Gaussian kernel is chosen, then a closed-form solution exists

(see Section 5.5.2).


3. Similarly to the k-nearest neighbour approach, one could sample from a

Parzen window representation by choosing a data point at random, and

then sampling from its associated kernel as if it were a distribution.
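A Parzen window estimator with an isotropic Gaussian kernel, and the sampling scheme of task 3, can be sketched as follows. The kernel width `h` is a user-chosen parameter; as noted above, choosing it well is the hard part. The function names are illustrative. With this kernel, marginalising amounts to keeping only the marginal columns of the data, because the kernel factorises over dimensions.

```python
import numpy as np


def parzen_density(x, data, h):
    """Parzen window estimate at x using an isotropic Gaussian kernel of width h.

    Each of the N training points contributes a Gaussian that integrates to
    one, so the average of the N kernels also integrates to one.
    """
    n, d = data.shape
    sq_dist = np.sum((data - x) ** 2, axis=1)
    kernel_norm = (2 * np.pi * h ** 2) ** (d / 2)
    return float(np.mean(np.exp(-0.5 * sq_dist / h ** 2)) / kernel_norm)


def parzen_sample(data, h, n_samples, rng):
    """Choose training points at random, then sample from each chosen kernel."""
    picks = data[rng.integers(len(data), size=n_samples)]
    return picks + h * rng.standard_normal(picks.shape)


rng = np.random.default_rng(0)
data = rng.standard_normal((500, 1))
print(parzen_density(np.array([0.0]), data, h=0.3))
```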

In the remainder of the chapter we present the Gaussian mixture model, which can be viewed as a generalisation of the Parzen window density estimator. The Gaussian mixture model is an elegant and relatively simple density estimator that can be trained in a principled way. Further, there exist closed-form solutions for the marginal and conditional distributions, and it is easy to sample from the modelled distribution. We exploit these properties in much of this thesis, and they are extremely useful for our image synthesis and analysis methods (see Chapter 6 and Chapter 9).

5.4 Gaussian mixture models

The Gaussian mixture model (GMM) approximates an arbitrary pdf using a weighted sum of Gaussian (normal) basis functions, which we call components. In the univariate case, where observations are measured on a single axis, each component is parameterised by a mean and a variance. In the multivariate case, where observations are measured on multiple axes, each component is parameterised by a mean vector and a covariance matrix. In addition, each component has an associated probability (“weight”). We shall assume the multivariate case, but the same theory applies in the univariate case. The GMM has the following form:

$$p(\mathbf{x}) = \sum_{i=1}^{k} P(i)\, g(\mathbf{x}, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) \tag{5.4}$$

where $\mathbf{x}$ is a point in the possibility space, $p(\mathbf{x})$ is the pdf, i indexes the k components, and $\boldsymbol{\mu}_i$ and $\boldsymbol{\Sigma}_i$ are the mean vector and covariance matrix for the i-th component. The probability of the i-th component is $P(i)$, and g is the function that describes the pdf of a single component:

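$$g(\mathbf{x}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{\sqrt{(2\pi)^n |\boldsymbol{\Sigma}|}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right), \tag{5.5}$$

where n is the dimensionality of $\mathbf{x}$.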
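Equations 5.4 and 5.5 translate almost line for line into NumPy. This is a minimal sketch for evaluating the density at a single point. It solves a linear system rather than forming $\boldsymbol{\Sigma}^{-1}$ explicitly, a common numerical choice. The function names are illustrative.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Equation 5.5: the multivariate normal density g(x, mu, Sigma)."""
    n = mean.shape[0]
    diff = x - mean
    # Solve Sigma z = diff instead of inverting Sigma.
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))
    return float(np.exp(-0.5 * mahalanobis_sq) / norm)


def gmm_pdf(x, weights, means, covs):
    """Equation 5.4: p(x) = sum_i P(i) g(x, mu_i, Sigma_i)."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))


# A two-component mixture in two dimensions.
weights = [0.3, 0.7]
means = [np.zeros(2), np.array([3.0, 1.0])]
covs = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
print(gmm_pdf(np.array([1.0, 0.5]), weights, means, covs))
```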

5.4.1 Learning the parameters

To perform density estimation using the GMM, one has to find the model param-

eters that fit the model to the training data. This is an ill-posed problem, and

the most common regularisation strategy is to use maximum likelihood estima-

tion, where we seek the model parameters that maximise the likelihood that the

model could generate the data. We shall present two solutions to the parameter

selection problem shortly.

Finding the model parameters would be simpler if we did not have to worry about the parameter k, the number of model components, which means that there is a countably infinite number of families of model. Many unsupervised learning problems aim to discover the classes that exist within a mixed training set. Here, by contrast, all the data in T come from the same class, so we do not need to determine the “correct” number of components: we simply want to model the distribution of the data. As the number of model components increases, so does the level of pdf detail that can be modelled. However, we must be able to support the choice of parameters for each component using data from T, so there is a practical upper bound on the number of components that a model can have. We shall see later that once we have determined the model parameters and want to use the model, we need to iterate over each component. This introduces a further constraint on the number of components, as the computational cost of using a GMM grows with this number. In short, provided that we have adequate support for the components, we can have as many as is practical.

We will now describe two approaches to fitting a GMM to training data. The k-

means clustering algorithm is a simple and intuitive method, but was not designed

to fit GMMs to data, while the Expectation-Maximisation (EM) algorithm is more

principled.

5.4.2 The k-means clustering algorithm

The k-means clustering algorithm [125, 102] is a simple example of unsupervised learning. The problem is posed as follows: given a set of multivariate measurements, $T = \{\mathbf{t}_i : i = 1, \dots, N\}$, form k disjoint subsets (called clusters) such that all the elements of a particular cluster are similar. There are many variants of the algorithm, but we shall present two: the first clusters the data in a single pass (see Algorithm 1); the second is iterative in nature, giving each data point the opportunity to migrate (see Algorithm 2).


Algorithm 1 The non-iterative k-means algorithm.
  Randomly assign each $\mathbf{t}_i \in T$ to one of the k clusters.
  for i = 1, …, k do
    Compute the i-th cluster centre: the mean of the elements assigned to cluster i.
  end for
  for each element $\mathbf{t}_i \in T$ do
    Using some metric, compute the distance from $\mathbf{t}_i$ to each cluster centre.
    if $\mathbf{t}_i$ is not assigned to the cluster with the closest centre then
      Assign $\mathbf{t}_i$ to the cluster with the closest centre.
      Recompute the means of the two clusters involved in the reassignment.
    end if
  end for
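Algorithm 1 can be sketched in NumPy with Euclidean distance. Running sums and cluster sizes let the two clusters involved in a reassignment update their means cheaply. How to handle a cluster that becomes empty is not specified in the text. Here such a cluster is assumed to receive an infinite centre so that it attracts no further points. The function name is illustrative.

```python
import numpy as np


def kmeans_single_pass(data, k, rng):
    """Algorithm 1: random initial assignment followed by one reassignment pass.

    Returns the labels and the cluster centres (inf for any emptied cluster).
    """
    n, d = data.shape
    labels = rng.integers(k, size=n)
    sums = np.zeros((k, d))
    sizes = np.zeros(k, dtype=int)
    np.add.at(sums, labels, data)   # per-cluster sums of member vectors
    np.add.at(sizes, labels, 1)     # per-cluster member counts

    def centres():
        safe = np.maximum(sizes, 1)[:, None]
        return np.where(sizes[:, None] > 0, sums / safe, np.inf)

    for i in range(n):
        nearest = int(np.argmin(np.linalg.norm(centres() - data[i], axis=1)))
        old = labels[i]
        if nearest != old:
            # Move point i and update the two clusters involved.
            sums[old] -= data[i]
            sizes[old] -= 1
            sums[nearest] += data[i]
            sizes[nearest] += 1
            labels[i] = nearest
    return labels, centres()
```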

Algorithm 2 The iterative k-means algorithm.
  Randomly assign each $\mathbf{t}_i \in T$ to one of the k clusters.
  for i = 1, …, k do
    Compute the i-th cluster centre: the mean of the elements assigned to cluster i.
  end for
  repeat
    for each $\mathbf{t}_i \in T$ do
      Using some metric, compute the distance from $\mathbf{t}_i$ to each cluster centre.
      if $\mathbf{t}_i$ is not assigned to the cluster with the closest centre then
        Assign $\mathbf{t}_i$ to the cluster with the closest centre.
      end if
    end for
    Recompute the cluster centres.
  until some stopping criterion is met (see text).


The metric used to measure similarity can be selected to be appropriate to the

problem at hand, but Euclidean distance is often used. For the iterative algo-

rithm, a range of stopping criteria can be used, but a common strategy is to stop

iterating when no further reassignments occur.
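Algorithm 2, with Euclidean distance and the "no further reassignments" stopping rule, can be sketched as below. Two details are assumptions not taken from the text: `max_iter` is a safety limit, and an empty cluster is re-seeded at a random data point. The function names are illustrative.

```python
import numpy as np


def _centres(data, labels, k, rng):
    """Mean of each cluster; an empty cluster is re-seeded at a random data point."""
    centres = np.empty((k, data.shape[1]))
    for j in range(k):
        members = data[labels == j]
        centres[j] = members.mean(axis=0) if len(members) else data[rng.integers(len(data))]
    return centres


def kmeans(data, k, rng, max_iter=100):
    """Algorithm 2: reassign all points, recompute centres, repeat until stable."""
    labels = rng.integers(k, size=len(data))
    centres = _centres(data, labels, k, rng)
    for _ in range(max_iter):
        dists = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        new_labels = np.argmin(dists, axis=1)
        if np.array_equal(new_labels, labels):
            break               # stopping criterion: no reassignments occurred
        labels = new_labels
        centres = _centres(data, labels, k, rng)
    return labels, centres
```

Running the function several times with different generators, and keeping the clustering with the smallest within-cluster variance, follows the multiple-runs strategy discussed below.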

Once a final clustering has been obtained, it is a simple matter to fit a GMM to the clustering. The means, $\{\boldsymbol{\mu}_i : i = 1, \dots, k\}$, are simply the cluster centres. The covariance matrices, $\{\boldsymbol{\Sigma}_i : i = 1, \dots, k\}$, are computed from the elements assigned to each cluster. The component probabilities, $\{P(i) : i = 1, \dots, k\}$, are computed from the number of elements, $n_i$, assigned to each cluster:

$$P(i) = \frac{n_i}{N}. \tag{5.6}$$

The clustering scheme can easily be modified to remove clusters if the number of

elements assigned to them falls to a level at which there is insufficient support

for the corresponding Gaussian component.
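Fitting GMM parameters to a finished clustering, including the optional removal of poorly supported clusters, might look like this. The `min_support` threshold is a user choice, and the function name is illustrative. After pruning, the weights are renormalised over the kept clusters, so they equal Equation 5.6 exactly when nothing is removed. Using the maximum-likelihood covariance ($1/n_i$ normalisation) is also a choice made here.

```python
import numpy as np


def gmm_from_clusters(data, labels, min_support=1):
    """Means, covariances and weights (Equation 5.6) from a hard clustering.

    Clusters with fewer than min_support members are dropped.
    """
    weights, means, covs = [], [], []
    for j in np.unique(labels):
        members = data[labels == j]
        if len(members) < min_support:
            continue                        # insufficient support for a component
        weights.append(len(members))        # n_i
        means.append(members.mean(axis=0))
        covs.append(np.atleast_2d(np.cov(members, rowvar=False, bias=True)))
    weights = np.asarray(weights, dtype=float)
    return weights / weights.sum(), np.array(means), np.array(covs)
```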

The k-means algorithm is intuitive and simple to implement, but it was not designed to fit a GMM to data. In statistical terms, the k-means algorithm minimises the within-cluster variances. Because of the random initialisation, a given run of the algorithm will converge to one of possibly many local minima. Several runs of the algorithm give a reasonable chance of finding the global minimum or a suitable local minimum.


5.4.3 The Expectation Maximisation algorithm for Gaussian mixtures

Although the k-means algorithm is intuitive, it was not designed to fit GMMs to

data. The maximum likelihood formulation provides a more principled approach

to this problem, where model parameters are sought that maximise the likeli-

hood of the data having been generated. Unfortunately, there is no analytical

solution to this optimisation problem, and so alternative approaches are used.

The Expectation-Maximisation (EM) algorithm [137] is a general approach to

simplifying maximum likelihood problems, and in this section we shall present

the EM algorithm for fitting a GMM to training data. We will start with a simple

one-dimensional problem with just two model components [81], and then gener-

alise the algorithm to work in higher dimensions and with an arbitrary number

of components. (The expectation maximisation algorithm is presented in its ab-

stract form in Appendix A, along with a proof that the algorithm converges to a

local maximum of the objective function.)

We assume a training set, $\mathcal{X} = \{x_i \in \mathbb{R} : i = 1, \dots, N\}$, that has been drawn from an underlying distribution that can reasonably be modelled using a GMM with two components. Using the random variables $X$, $X_1$ and $X_2$, we can describe our model as follows:

$$X \sim (1-\Delta) X_1 + \Delta X_2 \tag{5.7}$$

$$X_1 \sim \mathcal{N}(\mu_1, \sigma_1^2) \tag{5.8}$$

$$X_2 \sim \mathcal{N}(\mu_2, \sigma_2^2) \tag{5.9}$$

where $\Delta \in \{0, 1\}$ with $P(\Delta = 1) = \pi$.

Equation 5.7 can be viewed as a simple generative model: generate $\Delta = 1$ with probability π (and $\Delta = 0$ otherwise); if $\Delta = 0$ then deliver $X_1$, otherwise deliver $X_2$. If $g_\theta(x)$ is a normal distribution with parameters $\theta = (\mu, \sigma^2)$, then we can write the pdf of X as:

$$p(x) = (1-\pi)\, g_{\theta_1}(x) + \pi\, g_{\theta_2}(x). \tag{5.10}$$
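The generative reading of Equation 5.7 and the density of Equation 5.10 can be sketched together. The parameter values in the usage line are arbitrary illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def sample_two_component(n, pi, mu1, var1, mu2, var2, rng):
    """Draw from Equation 5.7: Delta ~ Bernoulli(pi), then deliver X1 or X2."""
    delta = rng.uniform(size=n) < pi              # Delta = 1 with probability pi
    x1 = rng.normal(mu1, np.sqrt(var1), size=n)
    x2 = rng.normal(mu2, np.sqrt(var2), size=n)
    return np.where(delta, x2, x1), delta


def normal_pdf(x, mu, var):
    """Univariate normal density g_theta(x) with theta = (mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)


def two_component_pdf(x, pi, mu1, var1, mu2, var2):
    """Equation 5.10: p(x) = (1 - pi) g_theta1(x) + pi g_theta2(x)."""
    return (1 - pi) * normal_pdf(x, mu1, var1) + pi * normal_pdf(x, mu2, var2)


rng = np.random.default_rng(0)
x, delta = sample_two_component(200_000, 0.3, 0.0, 1.0, 5.0, 0.5, rng)
```

The sample mean should be close to $(1-\pi)\mu_1 + \pi\mu_2$, which is 1.5 for these values.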

The model is parameterised by a vector $\Theta = (\pi, \theta_1, \theta_2) = (\pi, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$. We want to select an optimal vector, $\Theta'$, which is a maximiser of the likelihood of the data having been generated by the model. The log-likelihood of the parameters given the N data points is:

$$\ell(\Theta; \mathcal{X}) = \sum_{i=1}^{N} \log p(x_i) = \sum_{i=1}^{N} \log\left[(1-\pi)\, g_{\theta_1}(x_i) + \pi\, g_{\theta_2}(x_i)\right]. \tag{5.11}$$

Unfortunately, there is no closed-form expression for the maximiser of Equation 5.11, and so a numerical approach is required. If we knew the component from which each data point was drawn, then finding the optimum Θ would be easy. The component means and variances could be computed by considering each component separately, and π could be computed from the number of points assigned to each component. Because we do not know the membership of each data point, we consider unobserved latent variables, $\{\Delta_i \in \{0, 1\} : i = 1, \dots, N\}$, as in Equation 5.7, and make soft (probabilistic) assignments. Given a current estimate, Θ, of the model parameters, we compute the expected value of each $\Delta_i$:

$$\delta_i = E(\Delta_i \mid \Theta, \mathcal{X}) = P(\Delta_i = 1 \mid \Theta, \mathcal{X}), \tag{5.12}$$

and we call $\delta_i$ the responsibility of $X_2$ for observation i. This is the expectation step of the EM algorithm. In the maximisation step, the estimates of the model parameters are updated using maximum-likelihood estimates weighted by the responsibilities. The EM algorithm for fitting a GMM with two components to one-dimensional data is described by Algorithm 3.

Just as the k-means algorithm requires an initial hard assignment of data points to clusters, the EM algorithm requires an initialisation. For example, the mixing proportion π can be set to 0.5, two of the $x_i$ may be chosen to be $\mu_1$ and $\mu_2$, and the component variances can be set to the overall sample variance, $\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$. If a Gaussian component with zero variance is placed upon one of the data points, then the likelihood of that data point becomes infinite, giving a degenerate maximum of Equation 5.11. Therefore the variances must be constrained to be greater than zero; in practice, they are bounded below by a small positive value. Dempster, Laird, and Rubin showed that an iteration of the EM algorithm cannot decrease the objective function [137]. In general, the objective function can have multiple optima, and several runs of the algorithm, using different initialisations, may be required. Algorithm 4 generalises Algorithm 3 to the case of multivariate data and multiple model components. Notice that with multiple components, the component responsibilities for the data points are needed in the computation of each type of model parameter, and so the expectation and maximisation steps are combined.


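Algorithm 3 The EM algorithm for fitting a GMM with two components to one-dimensional data.
  Initialise the parameters (see text):
$$\Theta = (\pi, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2). \tag{5.13}$$
  repeat
    The expectation step: update the estimate of the responsibilities:
$$\delta_i = \frac{\pi\, g_{\theta_2}(x_i)}{(1-\pi)\, g_{\theta_1}(x_i) + \pi\, g_{\theta_2}(x_i)}, \quad \forall i \in \{1, \dots, N\}. \tag{5.14}$$
    The maximisation step: update the weighted maximum-likelihood estimates of the means and variances, and update the estimate of the mixing probability:
$$\begin{aligned} \mu_1 &= \frac{\sum_{i=1}^{N} (1-\delta_i)\, x_i}{\sum_{i=1}^{N} (1-\delta_i)}, & \sigma_1^2 &= \frac{\sum_{i=1}^{N} (1-\delta_i)(x_i - \mu_1)^2}{\sum_{i=1}^{N} (1-\delta_i)}, \\ \mu_2 &= \frac{\sum_{i=1}^{N} \delta_i\, x_i}{\sum_{i=1}^{N} \delta_i}, & \sigma_2^2 &= \frac{\sum_{i=1}^{N} \delta_i (x_i - \mu_2)^2}{\sum_{i=1}^{N} \delta_i}, \\ \pi &= \frac{1}{N} \sum_{i=1}^{N} \delta_i. \end{aligned} \tag{5.15}$$
  until convergence.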
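Algorithm 3 can be sketched in NumPy as follows. The default initialisation follows the text: π = 0.5, two data points as the means, and the overall sample variance for both variances. Three details are assumptions rather than material from the text: the variance floor `min_var`, the stopping tolerance `tol`, and the optional `init_means` argument for supplying starting means explicitly. The function names are illustrative.

```python
import numpy as np


def normal_pdf(x, mu, var):
    """Univariate normal density g_theta(x) with theta = (mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)


def em_two_gaussians(x, init_means=None, n_iter=500, tol=1e-8, min_var=1e-6, rng=None):
    """Algorithm 3: EM for a two-component GMM fitted to 1-D data x.

    Returns (pi, mu1, var1, mu2, var2) and the log-likelihood history
    (Equation 5.11), recorded before each update.
    """
    rng = np.random.default_rng() if rng is None else rng
    if init_means is None:
        init_means = rng.choice(x, size=2, replace=False)   # two data points
    mu1, mu2 = init_means
    var1 = var2 = x.var()          # (1/N) sum (x_i - mean)^2
    pi = 0.5
    log_liks = []
    for _ in range(n_iter):
        p1 = (1 - pi) * normal_pdf(x, mu1, var1)
        p2 = pi * normal_pdf(x, mu2, var2)
        log_liks.append(float(np.sum(np.log(p1 + p2))))
        if len(log_liks) > 1 and log_liks[-1] - log_liks[-2] < tol:
            break
        delta = p2 / (p1 + p2)                       # E-step, Equation 5.14
        w1, w2 = np.sum(1 - delta), np.sum(delta)    # M-step, Equation 5.15
        mu1 = np.sum((1 - delta) * x) / w1
        mu2 = np.sum(delta * x) / w2
        # The floor keeps each variance away from zero (see text).
        var1 = max(np.sum((1 - delta) * (x - mu1) ** 2) / w1, min_var)
        var2 = max(np.sum(delta * (x - mu2) ** 2) / w2, min_var)
        pi = w2 / len(x)
    return (pi, mu1, var1, mu2, var2), log_liks
```

The recorded log-likelihoods should never decrease, in line with the result of Dempster, Laird and Rubin quoted above. This makes the history a useful debugging check.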


Figure 5.1: An illustration of the expectation maximisation algorithm.

Figure 5.1 shows an illustration of the EM algorithm. The figure shows the joint

distribution of model parameters and latent data for a pedagogic example. The

vertical axis represents the model parameter space, and the horizontal axis rep-

resents the latent variable space. The horizontal lines in the diagram represent

the E-steps and the vertical lines represent the M-steps. The procedure begins

with an initial (poor) estimate of the model parameters. Keeping these constant,

the E-step obtains an estimate of the latent data. Keeping the latent data con-

stant, the M-step obtains a refined estimate for the model parameters. The two

steps are iterated until the algorithm converges to a local maximum. Note that

this particular run of the algorithm finds a local maximum that is not the global

maximum.


Algorithm 4 The EM algorithm for fitting a GMM with multiple components to multivariate data.

▷ Initialise the parameters:

$$\Theta = \{P(i), \mu_i, \Sigma_i\}, \quad \forall\, i \in \{1, \ldots, k\}. \tag{5.16}$$

repeat

  ▷ Update the estimate of the mixing probabilities:

$$P(i) = \frac{1}{N} \sum_{j=1}^{N} P(i \mid x_j, \Theta), \quad \forall\, i \in \{1, \ldots, k\}, \tag{5.17}$$

  where the “responsibility” of component i for $x_j$ is

$$P(i \mid x_j, \Theta) = \frac{p(x_j \mid i, \Theta)\, P(i \mid \Theta)}{p(x_j \mid \Theta)} \tag{5.18}$$

  by Bayes’ theorem.

  ▷ Update the estimate of the component means:

$$\mu_i = \frac{\sum_{j=1}^{N} P(i \mid x_j, \Theta)\, x_j}{\sum_{j=1}^{N} P(i \mid x_j, \Theta)}, \quad \forall\, i \in \{1, \ldots, k\}. \tag{5.19}$$

  ▷ Update the estimate of the component covariance matrices:

$$\Sigma_i = \frac{\sum_{j=1}^{N} P(i \mid x_j, \Theta)\, (x_j - \mu_i)(x_j - \mu_i)^T}{\sum_{j=1}^{N} P(i \mid x_j, \Theta)}, \quad \forall\, i \in \{1, \ldots, k\}. \tag{5.20}$$

until convergence.
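A NumPy sketch of Algorithm 4 follows. Responsibilities are computed in the log domain so that they do not underflow. Three parts of the sketch are not from the thesis. The farthest-point initialisation and the pooled starting covariance are illustrative assumptions. The small ridge `reg` added to each covariance is a hypothetical safeguard. The log-likelihood stopping rule is also an assumption.

```python
import numpy as np


def log_gauss(X, mu, cov):
    """Log density of N(mu, cov) evaluated at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mu).T)               # whitened residuals, (d, N)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (np.sum(z**2, axis=0) + log_det + d * np.log(2.0 * np.pi))


def em_gmm(X, k, n_iter=200, tol=1e-8, reg=1e-6, seed=0):
    """Fit a k-component GMM to the rows of X (N x d), following Algorithm 4."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumed initialisation: farthest-point means, pooled covariance, equal weights.
    means = [X[rng.integers(n)]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - m) ** 2, axis=1) for m in means], axis=0)
        means.append(X[np.argmax(dist)])
    means = np.array(means)
    covs = np.array([np.cov(X, rowvar=False) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # Responsibilities P(i | x_j, Theta) by Bayes' theorem (Equation 5.18).
        log_p = np.stack([np.log(weights[i]) + log_gauss(X, means[i], covs[i])
                          for i in range(k)], axis=1)          # (N, k)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        nk = resp.sum(axis=0)
        weights = nk / n                                        # Equation 5.17
        means = (resp.T @ X) / nk[:, None]                      # Equation 5.19
        for i in range(k):                                      # Equation 5.20
            diff = X - means[i]
            covs[i] = (resp[:, i, None] * diff).T @ diff / nk[i] + reg * np.eye(d)
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, covs
```

The covariance update uses the freshly updated means $\mu_i$, as in Equation 5.20.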


5.5 Useful properties of multivariate normal distributions

The multivariate normal distribution has the very useful property that there exist

closed-form solutions to the problems of computing the marginal and conditional

distributions. We will review what is meant by these terms, describe the closed-

form solutions for the multivariate normal (i.e. a single component), and then

generalise these results to the multivariate GMM.

5.5.1 Marginal distributions

Imagine that three measurements are made for a sample of individuals on a

continuous scale (e.g. the height, weight and annual income of a number of peo-

ple). We could fit a GMM to this data. Further, imagine that to answer a

particular question we are only interested in the distribution of one of these mea-

surements (e.g. height) and have no constraining information for the other two

dimensions. The distribution we seek is called a marginal distribution. Intuitively,

the marginal distribution is the projection of the full pdf onto the dimensions that

we are interested in. Figure 5.2 illustrates a two-dimensional pdf marginalised

over one dimension.

Figure 5.2: A two-dimensional distribution marginalised over one dimension. The marginal distribution is the “shadow” at the back of the distribution.

Formally, if p(x) = p(x_1, …, x_n) is a multivariate pdf, then p(x) marginalised over the dimensions indexed by D = {d_i : i = 1, …, m}, m ≤ n, is:

$$p(x_{f_1}, \ldots, x_{f_q}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(\mathbf{x})\, dx_{d_1} \cdots dx_{d_m}, \qquad F = \{f_i : i = 1, \ldots, q\}, \quad F \cap D = \emptyset. \tag{5.21}$$

In Equation 5.21, F represents a set of dimension indices which are to be retained (i.e. we are interested in them). D represents a set of dimension indices which are to be removed via marginalisation. Sets D and F cannot share indices, and so they are disjoint.

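Although the definition of the marginal involves a series of integrals, there is a very simple general solution: we simply pretend that the dimensions that we want to marginalise over do not exist (and so no measurements could have been made for them) [103]. In the case of the Gaussian, the parameters that define the distribution (the mean vector and covariance matrix) are modified by removing the entries that correspond to the dimensions that we want to marginalise over. An example of this is shown below. If X ∼ N(µ, Σ) with

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} & \Sigma_{1,3} \\ \Sigma_{2,1} & \Sigma_{2,2} & \Sigma_{2,3} \\ \Sigma_{3,1} & \Sigma_{3,2} & \Sigma_{3,3} \end{pmatrix} \tag{5.22}$$

then

$$p(x_1, x_3) = \int_{-\infty}^{\infty} p(\mathbf{x})\, dx_2 = N(\mu_m, \Sigma_m) \tag{5.23}$$

where

$$\mu_m = \begin{pmatrix} \mu_1 \\ \mu_3 \end{pmatrix}, \qquad \Sigma_m = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,3} \\ \Sigma_{3,1} & \Sigma_{3,3} \end{pmatrix}. \tag{5.24}$$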

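Note that the marginal Gaussian density is itself a Gaussian density. The procedure for computing the marginal distribution can easily be extended to the case of a GMM by applying the above procedure to each component. The mixing probabilities are unchanged, because the integral of a weighted sum of component densities is the weighted sum of their integrals.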
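In code, the marginalisation procedure amounts to index selection. The NumPy sketch below marginalises a single Gaussian and then a whole GMM. Indices are 0-based, so Equation 5.24 corresponds to `keep = [0, 2]`. The function names are hypothetical.

```python
import numpy as np


def marginalise_gaussian(mu, cov, keep):
    """Marginal of N(mu, cov) onto the dimensions in `keep` (Equations 5.22-5.24)."""
    keep = np.asarray(keep)
    # Drop the entries of the mean and the rows/columns of the covariance
    # that correspond to the marginalised dimensions.
    return mu[keep], cov[np.ix_(keep, keep)]


def marginalise_gmm(weights, means, covs, keep):
    """Marginalise every component; the mixing probabilities are unchanged."""
    pairs = [marginalise_gaussian(m, c, keep) for m, c in zip(means, covs)]
    return (np.asarray(weights),
            np.array([p[0] for p in pairs]),
            np.array([p[1] for p in pairs]))
```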

5.5.2 Conditional distributions

Imagine again a multivariate distribution, and suppose that we have made a measurement along one of the dimensions and want to know how this measurement constrains the distribution of values on the other dimensions. The distribution we seek is called the conditional distribution.


Figure 5.3: A conditional distribution. A joint density is shown on the left. Applying a condition on one dimension constrains the distribution along the other.

In the general case of a multivariate pdf and multiple conditions, each condition defines a hyperplane through the full distribution. The hyperplanes are axis-aligned and mutually orthogonal. The conditional distribution is the function that describes the values of the pdf that lie on the intersection of these hyperplanes, normalised so that the function’s integral equals unity (i.e. so that it is a valid pdf). Figure 5.3 illustrates this concept.

We now derive the conditional distribution for the multivariate normal distribution [103]. We seek an expression for p(x_1|x_2). We partition the random vector X into X_1 and X_2, and condition on X_2 = x_2. Our approach is to find a linear transformation of X that makes the transformed X_1 independent of X_2. Recall that if two random variables, A and B, are independent, then p(a|b) = p(a). Let X ∼ N(µ, Σ). We partition Σ as follows:

$$\Sigma = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix}. \tag{5.25}$$


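X can be linearly transformed so that the covariance between the transformed first block and X_2 is zero. Because the transformed variables remain jointly Gaussian, zero covariance implies that the two are independent. Assume X ∈ R^p, and that X_1 has q dimensions, so that there are p − q conditions. Let

$$A = \begin{pmatrix} \underbrace{I}_{q \times q} & \underbrace{-\Sigma_{1,2}\Sigma_{2,2}^{-1}}_{q \times (p-q)} \\ \underbrace{\mathbf{0}^T}_{(p-q) \times q} & \underbrace{I}_{(p-q) \times (p-q)} \end{pmatrix}. \tag{5.26}$$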

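Applying A to Σ yields:

$$\begin{aligned} A \Sigma A^T &= \begin{pmatrix} I & -\Sigma_{1,2}\Sigma_{2,2}^{-1} \\ \mathbf{0}^T & I \end{pmatrix} \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix} \begin{pmatrix} I & \mathbf{0} \\ (-\Sigma_{1,2}\Sigma_{2,2}^{-1})^T & I \end{pmatrix} \\ &= \begin{pmatrix} \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1} & \mathbf{0} \\ \mathbf{0}^T & \Sigma_{2,2} \end{pmatrix}. \end{aligned} \tag{5.27}$$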
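We see that the off-diagonal covariances are zero. Applying the same transformation to (X − µ):

$$\begin{aligned} A(X - \mu) &= A \begin{pmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{pmatrix} = \begin{pmatrix} I & -\Sigma_{1,2}\Sigma_{2,2}^{-1} \\ \mathbf{0}^T & I \end{pmatrix} \begin{pmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{pmatrix} \\ &= \begin{pmatrix} X_1 - \mu_1 - \Sigma_{1,2}\Sigma_{2,2}^{-1}(X_2 - \mu_2) \\ X_2 - \mu_2 \end{pmatrix}, \end{aligned} \tag{5.28}$$

which, since A(X − µ) is normally distributed with zero mean and covariance AΣA^T, has the distribution

$$\begin{pmatrix} N_q(\mathbf{0},\ \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1}) \\ N_{p-q}(\mathbf{0},\ \Sigma_{2,2}) \end{pmatrix}, \tag{5.29}$$

with the two blocks independent.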

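If we fix X_2 = x_2, then $\mu_1 + \Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2 - \mu_2)$ is a constant vector. Because $X_1 - \mu_1 - \Sigma_{1,2}\Sigma_{2,2}^{-1}(X_2 - \mu_2)$ and $X_2 - \mu_2$ are independent, the conditional distribution of $X_1 - \mu_1 - \Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2 - \mu_2)$ given $X_2 = x_2$ is the same as the unconditional distribution of $X_1 - \mu_1 - \Sigma_{1,2}\Sigma_{2,2}^{-1}(X_2 - \mu_2)$, i.e. $N_q(\mathbf{0}, \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1})$. Therefore, given $X_2 = x_2$,

$$X_1 \sim N_q\big(\mu_1 + \Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2 - \mu_2),\ \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1}\big).$$

Note that, as with the marginal distribution, the conditional is itself a Gaussian density. Note also that the conditional covariance does not depend on the observed value x_2.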

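For clarity, we summarise the result obtained above. If X is a multivariate random variable, where X ∼ N(µ, Σ), then we can partition these as:

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \tag{5.30}$$

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \tag{5.31}$$

$$\Sigma = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix}. \tag{5.32}$$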


The conditional distribution is p(x_1|x_2) = N(µ′, Σ′) with

$$\mu' = \mu_1 + \Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2 - \mu_2) \tag{5.33}$$

$$\Sigma' = \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1}. \tag{5.34}$$

The dimensions of $X_1$ and $X_2$ do not have to be adjacent, which allows the distribution to be conditioned over arbitrary dimensions.
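Equations 5.33 and 5.34 translate directly into NumPy, with an arbitrary set of conditioned dimensions. The sketch below is an illustration, and the function name is hypothetical. It uses `np.linalg.solve` instead of forming $\Sigma_{2,2}^{-1}$ explicitly, which is a common numerical choice. It is not the pseudo-inverse approach described under Numerical issues.

```python
import numpy as np


def condition_gaussian(mu, cov, idx2, x2):
    """Conditional of N(mu, cov) given X[idx2] = x2 (Equations 5.33 and 5.34).

    idx2 may be any set of dimensions; they need not be adjacent.
    Returns (idx1, mu_cond, cov_cond) for the remaining dimensions idx1.
    """
    idx2 = np.asarray(idx2)
    idx1 = np.setdiff1d(np.arange(len(mu)), idx2)
    s12 = cov[np.ix_(idx1, idx2)]
    s22 = cov[np.ix_(idx2, idx2)]
    # K = Sigma_12 Sigma_22^{-1}, computed by solving Sigma_22 K^T = Sigma_21.
    K = np.linalg.solve(s22, s12.T).T
    mu_c = mu[idx1] + K @ (np.asarray(x2) - mu[idx2])
    cov_c = cov[np.ix_(idx1, idx1)] - K @ s12.T
    return idx1, mu_c, cov_c
```

For a bivariate normal with unit variances and correlation 0.8, conditioning the second dimension on 1.0 gives a conditional mean of 0.8 and a conditional variance of 1 − 0.8² = 0.36.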

Computing the conditional distribution for a GMM involves computing the conditional means and covariances for each component as described above, and computing {P(i|x_2) : i = 1, …, k}, the set of conditional component probabilities. These are computed using Bayes’ theorem:

$$P(i \mid x_2) = \frac{p(x_2 \mid i)\, P(i)}{p(x_2)}, \qquad p(x_2) = \sum_{l=1}^{k} p(x_2 \mid l)\, P(l), \tag{5.35}$$

where p(x_2|i) is computed by marginalising each component over the unknown dimensions.
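Combining per-component conditioning with Equation 5.35 gives the following self-contained NumPy sketch of GMM conditioning. The function names are hypothetical. Each component is marginalised onto the conditioned dimensions by selecting $\mu_i$ and $\Sigma_{i,(2,2)}$ for those dimensions. That marginal is then evaluated at $x_2$ to give $p(x_2 \mid i)$. The posterior weights are normalised in the log domain.

```python
import numpy as np


def log_gauss(x, mu, cov):
    """Log density of a single point x under N(mu, cov)."""
    diff = x - mu
    _, log_det = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff) + log_det + len(x) * np.log(2 * np.pi))


def condition_gmm(weights, means, covs, idx2, x2):
    """Condition a GMM on X[idx2] = x2; returns (idx1, weights, means, covs)."""
    idx2 = np.asarray(idx2)
    x2 = np.asarray(x2, dtype=float)
    idx1 = np.setdiff1d(np.arange(means.shape[1]), idx2)
    log_w, c_means, c_covs = [], [], []
    for w, mu, cov in zip(weights, means, covs):
        s12 = cov[np.ix_(idx1, idx2)]
        s22 = cov[np.ix_(idx2, idx2)]
        K = np.linalg.solve(s22, s12.T).T                       # Sigma_12 Sigma_22^{-1}
        c_means.append(mu[idx1] + K @ (x2 - mu[idx2]))          # Equation 5.33
        c_covs.append(cov[np.ix_(idx1, idx1)] - K @ s12.T)      # Equation 5.34
        # log P(i) + log p(x2 | i), with component i marginalised onto idx2.
        log_w.append(np.log(w) + log_gauss(x2, mu[idx2], s22))
    log_w = np.array(log_w)
    post = np.exp(log_w - np.logaddexp.reduce(log_w))           # Equation 5.35
    return idx1, post, np.array(c_means), np.array(c_covs)
```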

Numerical issues

The quantity $\Sigma_{2,2}^{-1}$ is required in order to compute a conditional distribution. In practice, covariance matrices are often close to singular (and hence numerically difficult to invert). An ad hoc approach to improving the condition of a covariance matrix is to add a small value to each element of its diagonal, which adds variance to the distribution represented by the matrix. A significant problem with this approach is that one does not usually know a priori how much variance should be added. We have experimented with a scheme in which small amounts of variance are added incrementally until the matrix can be inverted. The method was reasonably successful (i.e. it could be used in the methods we describe in subsequent chapters) but computationally expensive.
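The incremental scheme can be sketched as follows. This is a hypothetical illustration, not the thesis’s implementation. Here “can be inverted” is judged by whether a Cholesky factorisation succeeds, which requires the matrix to be positive definite. The starting amount, growth factor and retry limit are illustrative values.

```python
import numpy as np


def invert_with_jitter(cov, start=1e-10, factor=10.0, max_tries=20):
    """Hypothetical sketch: add growing diagonal variance until cov can be inverted.

    Success is judged by a Cholesky factorisation; start, factor and max_tries
    are illustrative assumptions. Returns (inverse, amount_of_variance_added).
    """
    eye = np.eye(cov.shape[0])
    jitter = 0.0
    for _ in range(max_tries):
        try:
            L = np.linalg.cholesky(cov + jitter * eye)
        except np.linalg.LinAlgError:
            # Not positive definite yet: add more variance and retry.
            jitter = start if jitter == 0.0 else jitter * factor
            continue
        L_inv = np.linalg.inv(L)
        return L_inv.T @ L_inv, jitter          # (L L^T)^{-1} = L^{-T} L^{-1}
    raise np.linalg.LinAlgError("matrix could not be regularised")
```

The loop makes the cost visible. Each failed attempt costs a full factorisation, and the amount of variance finally added depends on the arbitrary schedule.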

Another approach, and one that we have found to be a good solution, is to compute the Moore-Penrose generalised inverse (commonly called the pseudo-inverse) of the covariance matrix instead [131, 139]. The Moore-Penrose inverse of a matrix A, which we will denote by $A^+$, has the following properties:

$$AA^+A = A \tag{5.36}$$

$$A^+AA^+ = A^+ \tag{5.37}$$

$$(AA^+)^T = AA^+ \tag{5.38}$$

$$(A^+A)^T = A^+A \tag{5.39}$$

and

$$x = A^+ b \tag{5.40}$$

is the minimum-norm least-squares solution to

$$Ax = b. \tag{5.41}$$

Although the Moore-Penrose generalised inverse is defined for any complex matrix, we shall restrict this discussion to covariance matrices, which are symmetric. The Moore-Penrose generalised inverse can be computed as follows. Note that the inverse of a non-singular matrix A can be written as

$$A^{-1} = (PDP^{-1})^{-1} = PD^{-1}P^{-1} \tag{5.42}$$

where D is a diagonal matrix of the eigenvalues of A, and P is a matrix whose columns are the eigenvectors of A (i.e. Equation 5.42 represents a rotation of A to its principal axes). The matrix $D^{-1}$ is trivial to compute, as it is simply a diagonal matrix where each diagonal element is the reciprocal of the corresponding element in D. For near-singular matrices, some of the eigenvalues will be small. We form $\bar{P}$ by discarding the eigenvectors that have small corresponding eigenvalues, and form $\bar{D}$ by removing the elements of D that correspond to the small eigenvalues (e.g. if eigenvalue 3 is small, row 3 and column 3 of D would be removed):

$$A^+ = \bar{P}\bar{D}^{-1}\bar{P}^{T}. \tag{5.43}$$

Because A is symmetric, P can be chosen to be orthonormal, so that $P^{-1} = P^T$. Accordingly, the transpose $\bar{P}^T$ takes the place of $P^{-1}$ for the retained eigenvectors; $\bar{P}$ is not square, and so has no ordinary inverse. Although we generally use the Moore-Penrose generalised inverse for covariance matrices, we use the $\Sigma^{-1}$ notation throughout this thesis, rather than $\Sigma^+$. This is because other techniques are occasionally used (see Section 8.3.1), and the $\Sigma^{-1}$ notation implies intent rather than implementation detail.
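The eigendecomposition route of Equation 5.43 can be sketched in a few lines of NumPy. What counts as a “small” eigenvalue is not specified in the text. The relative threshold `rel_tol` below is an assumption. The test checks the result against the four properties in Equations 5.36 to 5.39 and against `np.linalg.pinv`.

```python
import numpy as np


def pinv_symmetric(A, rel_tol=1e-10):
    """Moore-Penrose inverse of a symmetric matrix via Equation 5.43.

    Eigenvalues whose magnitude is below rel_tol times the largest magnitude
    are treated as zero; rel_tol is an illustrative choice.
    """
    evals, P = np.linalg.eigh(A)                 # A = P diag(evals) P^T
    keep = np.abs(evals) > rel_tol * np.max(np.abs(evals))
    P_bar = P[:, keep]                           # retained eigenvectors
    D_bar_inv = 1.0 / evals[keep]                # reciprocals of retained eigenvalues
    return (P_bar * D_bar_inv) @ P_bar.T         # P_bar D_bar^{-1} P_bar^T
```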

Computing conditional Gaussians represents approximately 98% of the computations performed in the work presented in Chapter 6² and a substantial proportion of those in Chapter 9, and so a hand-tuned implementation of the above algorithm was developed. This implementation allows Moore-Penrose generalised inverses of covariance matrices to be computed about 1.4 times faster than the implementation provided by MATLAB (which uses LAPACK routines [6]), and is equally robust. A less portable version was faster still, being 1.5 to 2 times faster than the MATLAB implementation.

² As determined by profiling our implementation.

5.5.3 Sampling from a Gaussian mixture model

Sampling from an n-dimensional Gaussian mixture model is reasonably straightforward. Firstly, one of the model components is selected at random, using the set of component probabilities, {P(i) : i = 1, …, k}, as the sampling distribution. A sample is then drawn from the selected component, which is described by its mean vector µ and covariance matrix Σ. If the component were aligned with the Cartesian axes, then sampling from it would be easy. Its covariance matrix would be diagonal and the dimensions would be independent, so a set of n scalars could be sampled from univariate normal distributions with variances corresponding to each diagonal element of the covariance matrix. In general, components are not aligned with the Cartesian axes, and so we must first diagonalise the component’s covariance matrix. This is achieved by performing an eigendecomposition, which yields a matrix P whose columns are the eigenvectors of the covariance matrix. P represents the transformation needed to diagonalise the covariance matrix; this is equivalent to a Principal Components Analysis (PCA) [104]. The diagonalised covariance matrix is given by:

$$\Sigma_D = P^T \Sigma P. \tag{5.44}$$


An n-dimensional vector, $s_D$, is then sampled from the diagonalised component, using the procedure described above. This vector is then transformed back to the original space by applying the inverse transformation to yield a sample, s′:

$$s' = P s_D. \tag{5.45}$$

s′ is now in the space of our model, but is centred on the origin. We then translate this sample, using the component’s mean vector, µ:

$$s = s' + \mu. \tag{5.46}$$

The vector s is a sample from the distribution represented by the model.
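The sampling procedure of Equations 5.44 to 5.46 is sketched below in NumPy. The function name is hypothetical. Each component’s eigendecomposition is computed once and reused. Tiny negative eigenvalues caused by round-off are clipped to zero, which is a defensive assumption not stated in the text.

```python
import numpy as np


def sample_gmm(weights, means, covs, n_samples, rng):
    """Draw n_samples points from a GMM following Equations 5.44-5.46."""
    weights = np.asarray(weights, dtype=float)
    # Diagonalise each component once: Sigma_D = P^T Sigma P (Equation 5.44).
    eig = [np.linalg.eigh(c) for c in covs]
    # Select a component for each sample using the mixing probabilities P(i).
    comps = rng.choice(len(weights), size=n_samples, p=weights)
    out = np.empty((n_samples, len(means[0])))
    for s, i in enumerate(comps):
        evals, P = eig[i]
        sd = np.sqrt(np.clip(evals, 0.0, None))   # clip tiny negative round-off
        s_D = rng.normal(0.0, sd)                  # independent univariate draws
        out[s] = P @ s_D + means[i]                # Equations 5.45 and 5.46
    return out
```

A quick check is to draw many samples from a single component and confirm that the sample mean and covariance approach µ and Σ.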

5.6 Learning from large datasets

Mammograms are digitised at high resolution, yielding images that contain a

few million pixels. It seems reasonable that, in order to learn the variation in

mammographic appearance, we will need to consider large quantities of data. We

have therefore considered how the k-means algorithm could be adapted to process

such volumes of data.

Jain et al. present a review of data clustering techniques in which they discuss clustering large datasets [102]. The most natural approach for problems where the entire dataset cannot be stored in primary memory is the divide-and-conquer algorithm, which is illustrated in Figure 5.4. This algorithm stores the full dataset, $D_0$, in secondary memory (e.g. on a hard disk or a large networked store) and randomly divides it into p subsets of equal size, $S_i : i \in \{1, \ldots, p\}$. Each $S_i$ is then processed by a clustering algorithm, yielding clusters $C_{i,j}$, where $j \in \{1, \ldots, k\}$. Each cluster $C_{i,j}$ then contributes a number of representative data points to form a new data set, $D_1$. Let there be $N_i$ data points in subset $S_i$ and $n_{i,j}$ data points in cluster $C_{i,j}$. The “probability” of $C_{i,j}$ is $n_{i,j}/N_i$. If there are to be η data points contributed from each $S_i$ to $D_1$, then cluster $C_{i,j}$ contributes $q_{i,j}$ data points, where:

$$q_{i,j} = \frac{\eta\, n_{i,j}}{N_i}. \tag{5.47}$$

(Clusters that represent more data contribute proportionately more points, so that no one “type” of data is disproportionately represented in $D_1$.)
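Equation 5.47 generally gives non-integer counts. The sketch below is a hypothetical illustration of the contribution step. It rounds the counts with a largest-remainder rule so that each subset contributes exactly η points. It then draws each cluster’s representatives at random from its members. Neither the rounding rule nor the choice of representatives is specified in the text.

```python
import numpy as np


def contributions(cluster_sizes, eta):
    """q_{i,j} = eta * n_{i,j} / N_i (Equation 5.47), rounded to sum to eta.

    The largest-remainder rounding rule is an assumption.
    """
    n = np.asarray(cluster_sizes, dtype=float)
    exact = eta * n / n.sum()
    q = np.floor(exact).astype(int)
    shortfall = eta - q.sum()
    # Give the leftover points to the clusters with the largest remainders.
    q[np.argsort(exact - q)[::-1][:shortfall]] += 1
    return q


def representatives(subset, labels, eta, rng):
    """Hypothetical: draw q_{i,j} random members of each cluster C_{i,j} of S_i."""
    ks = np.unique(labels)
    q = contributions([np.sum(labels == k) for k in ks], eta)
    picks = [rng.choice(np.flatnonzero(labels == k),
                        size=min(qk, np.sum(labels == k)), replace=False)
             for k, qk in zip(ks, q)]
    return subset[np.concatenate(picks)]
```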

If there are still too many data points in D1 for it to be clustered in primary

memory, the above process can be repeated to create D2, D3 and so forth. The

number of times the divide-and-conquer algorithm will be run is determined by

the initial number of subsets, p, and the number of data points contributed from

each subset, η.

In our work in Chapter 6, we set p and η such that D1 can be clustered within

primary memory (i.e. only one run through the divide-and-conquer algorithm is

required). Since the data sets {Si} need to be clustered as an intermediate step,

we use the non-iterative variant of the k-means algorithm on each of these, and

the iterative variant to yield the final clustering, from which the GMM parameters

are computed.


Figure 5.4: The divide-and-conquer clustering algorithm. This diagram illustrates how a large data set, $D_0$, can be divided into smaller ones, $\{S_i\}$, which can be clustered in primary memory. Each clustering then contributes some representative data to form a new data set, $D_1$, which can be clustered in primary memory to yield a final clustering.


In much of the work presented in Chapter 6, we adopt the divide-and-conquer approach. However, although this variant of the k-means algorithm allows GMMs to be built from large datasets, we found no appreciable difference between models built using the divide-and-conquer algorithm and those built simply by selecting a reasonable number of training points at random. The main benefit of the divide-and-conquer method is that it makes it unlikely that the GMM will be built from biased data.

A better approach might be to implement an EM algorithm that can consider large datasets. The inner loop of the EM algorithm is written as an iteration over the data points; with an appropriate caching strategy, the EM algorithm extends naturally to deal with large datasets. Unlike in the divide-and-conquer variant of the k-means algorithm, every data point would contribute to every model parameter.
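One way to realise this idea, sketched here under assumptions, is to stream the data in chunks and accumulate only the sufficient statistics of a one-dimensional k-component GMM. Those statistics are the per-component sums of responsibilities and of responsibility-weighted x and x². Each chunk can then be read from secondary storage, used and discarded, and a single EM iteration gives the same result as a full in-memory pass. The one-dimensional restriction and the function name are illustrative choices. The variance formula E[x²] − µ² is less numerically robust than the two-pass form.

```python
import numpy as np


def em_step_chunked(chunks, weights, means, variances):
    """One EM iteration for a 1D k-component GMM, streaming over data chunks."""
    k = len(weights)
    s0, s1, s2 = np.zeros(k), np.zeros(k), np.zeros(k)
    n = 0
    for x in chunks:
        x = np.asarray(x, dtype=float)[:, None]                      # (m, 1)
        # E-step for this chunk only, in the log domain.
        log_p = (np.log(weights) - 0.5 * np.log(2 * np.pi * variances)
                 - 0.5 * (x - means) ** 2 / variances)                # (m, k)
        r = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # Accumulate sufficient statistics, then discard the chunk.
        s0 += r.sum(axis=0)
        s1 += (r * x).sum(axis=0)
        s2 += (r * x**2).sum(axis=0)
        n += x.shape[0]
    # M-step from the accumulated statistics.
    new_means = s1 / s0
    new_vars = s2 / s0 - new_means**2
    return s0 / n, new_means, new_vars
```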

5.7 Summary

This chapter presented an introduction to Gaussian mixture models and the multivariate normal distribution. In summary:

• Gaussian mixture models are a flexible solution to the density estimation problem.

• Gaussian mixture model parameters can be learned from training data using several approaches. This chapter described the k-means clustering and Expectation-Maximisation algorithms.

• The k-means clustering algorithm is a simple and intuitive approach but was not designed to fit Gaussian mixture models to data.

• The Expectation-Maximisation algorithm is a principled approach to learning Gaussian mixture model parameters from training data.

• The multivariate normal distribution has two useful properties: the marginal and conditional distributions can be computed using closed-form solutions. Further, the marginal and conditional distributions are themselves multi- or univariate normal distributions.

• It is straightforward to sample from a multivariate normal distribution.

• These properties of the multivariate normal distribution can be used to define equivalent operations on the multivariate Gaussian mixture model.

• The chapter described an approach to learning parameters for Gaussian mixture models from large datasets using a variant of the k-means clustering algorithm. The Expectation-Maximisation algorithm extends naturally to learning from large datasets.

Chapter 6

Modelling mammographic texture for image synthesis and analysis

6.1 Introduction

This chapter develops a generative parametric statistical model of stationary

texture. The model is based upon Efros and Leung’s non-parametric texture

synthesis method [60]. The chapter presents:

• Efros and Leung’s texture synthesis algorithm.

• A parametric model-based version of their algorithm that allows texture

analysis as well as synthesis.



• A way of synthesising textures using the parametric model, and some example synthetic images.

• A novelty detection method that allows the parametric model to be used

to analyse textures.

6.2 Background

Chapter 3 gave a brief overview of some computer-aided detection systems. These

generally attempt to emulate radiologists’ interpretation strategies using pattern

recognition. The approach is very common in computer-aided mammography,

but may not be the best way to approach the problem.

Instead of learning a classification rule that separates classes (e.g. malignant masses from benign masses), we should learn what pathology-free mammograms look like, within a framework that allows illegal instances to be identified. If we can determine that the appearance of a particular mammogram is unlikely, given that it is supposed to be free of pathology, then we can label that mammogram as being novel (perhaps leaving an expert to determine exactly why it is novel). The approach is also known as outlier detection.

To perform novelty detection on mammograms we need a model of the appearance

of pathology-free mammograms that can be used in an analytical mode. That is

to say that the model needs to be able to identify unlikely model instances by

assigning likelihoods (or similar measures) to model instances.


The appearance of entire mammograms is difficult to model due to the nature of the imaging process and anatomical differences between women. In this chapter we make the problem more tractable by assuming that mammograms are stationary (i.e. the statistics of the texture do not vary over the image plane, and so the statistics of local appearance do not vary across the breast). This is certainly not true (for example, the appearance of the pectoral muscle differs significantly from that of a fatty breast region), but the assumption allows us to concentrate on a manageable part of the problem. Because we are assuming stationarity, we do not need to worry about shape: our simplified mammogram is a texture on a potentially infinite plane. We address the problem of modelling the appearance of entire breasts in mammograms in Chapter 9.

When developing models of appearance, it is useful to be able to visualise in-

stances of the model—in our case to be able to generate synthetic examples of

mammographic textures. This will allow us to evaluate the generality and speci-

ficity of the model. Because this generative property is so useful, we make it a

requirement for our model.

Sajda et al. [159, 169] used wavelet coefficients, computed from mammographic patches, to model mammographic patches statistically using a tree-structured variant of a hidden Markov model. To synthesise an image, coefficients in finer levels of the wavelet decomposition were sampled conditionally on those at coarser levels. Subjectively, this approach was reasonably successful at capturing the local textural appearance of mammograms, and it can be used in both generative and analytical modes. The model was used to filter false positives produced by another CADe system. Unfortunately, due to the way in which finer levels are conditioned upon coarser ones, the synthetic images produced using the model had an obvious grid structure corresponding to the coarsest level of the wavelet decomposition. The approach is similar to De Bonet and Viola’s model of generic textures, which allows both synthesis and analysis [53]. In Chapter 9 we present a model of entire mammograms that uses the hierarchical conditioning approach.

Bochud et al. [19] developed an algorithm to generate mammographic texture

using a strategy that placed basis functions at locations within a white noise

image, according to a pdf. The white noise and kernel function were matched

to the power spectrum of real mammographic texture. The method produced

reasonable synthetic textures, but the images could easily be distinguished from

real examples. Brettle et al. [27] evaluated several methods for generating medical

textures (including mammographic textures) and found that the generic texture

synthesis method developed by Efros and Leung produced the best results [60].

Heine et al. modelled mammographic texture using a random field model [85].

The authors assumed that mammographic texture could be modelled by a con-

volution of a random field with a kernel function. The choice of the form of the

kernel was based in part on studies of the fractal nature of mammograms. The

parameter governing the kernel function was learned from real mammographic

data, as were the statistical characteristics of the random field. Subjectively, the

approach allowed reasonably realistic synthetic textures to be generated. The au-

thors analysed an obvious mass in a real mammogram by computing the random

field that would have been required to generate the image under their model.

The location of the mass was visible in the computed field, and the approach can be viewed as an example of novelty detection.


In addition to methods developed specifically to model mammographic texture,

generic texture synthesis algorithms may also be useful for mammographic texture

synthesis (e.g. [53, 84, 144]). The work presented in this chapter extends the Efros

and Leung algorithm to a parametric statistical setting, which allows the method

to be used to generate synthetic textures and perform novelty detection.

6.3 Non-parametric sampling for texture synthesis

Efros and Leung describe a method of replicating texture based upon non-parametric sampling [60]¹. Their algorithm is based upon an idea from Shannon’s paper that introduced information theory [163]. It extends Shannon’s one-dimensional Markov chain approach to producing English-looking text to the image plane, to achieve texture synthesis. Though the method is simple, it produces some of the best results in the texture synthesis literature.

Assume an image containing a sample of the texture one wishes to replicate.

This source image, IS, is simply a matrix of grey-level values. The aim is to fill an

unpopulated target matrix, IT , with grey-level pixel values, such that the textures

in the two images are similar (but not identical). Their algorithm is described in

Algorithm 5.

Algorithm 5 Efros and Leung’s texture synthesis algorithm.

▷ Select a region from $I_S$ and insert it at some location in $I_T$. For example, any 3 × 3 pixel section could be used.
▷ Initially, let the set S = ∅.
for each pixel p ∈ $I_S$ do
  ▷ Extract a square window, s, of size w × w, centred around p.
  ▷ Add s to S.
end for
repeat
  ▷ Compute a list, U, of unpopulated pixel locations that are 8-connected to the populated area of $I_T$.
  ▷ Randomly choose a pixel location, u ∈ U.
  ▷ Examine a vector t, formed from a square window of size w × w pixels, centred on u. Some dimensions (pixels) will be populated and some will not.
  ▷ Find a small set of windows, S′, from S that are similar to t.
  ▷ Randomly select one of the windows from S′ and place its centre pixel value into $I_T$ at location u.
until all the pixels in $I_T$ have been populated.

The method essentially has two parameters: the window size and the number of similar windows to place into S′. The first parameter is important for good texture synthesis and is a function of the actual texture. The authors say that the window size should be selected to be similar in size to the largest repeating feature in the texture. The second parameter is adapted automatically by finding the distance, δ, to the most similar window, and then including in S′ all windows that are within a radius of (1 + ε)δ from t (Efros and Leung set ε to 0.1 [60]).

¹ Efros and Leung’s method was developed independently of a similar method presented in [68].

The method also requires a metric that measures window similarity and takes

into consideration the missing (unpopulated) pixels from t. The authors use a

normalised sum of squared differences metric, which is weighted by a Gaussian

kernel to give more weight to the pixels near the centre of the window and hence

encourage local similarity. The authors also trivially extend the method to work

with colour images, although this is not useful for a mammography application.
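The matching step of Algorithm 5 can be sketched in NumPy as follows. The sketch computes the Gaussian-weighted sum of squared differences over the populated pixels only, normalises it by the total weight, forms S′ with the (1 + ε)δ rule and returns the centre value of a randomly chosen member. The function names are hypothetical. The default kernel width σ = w/6.4 is an assumption taken from Efros’s published pseudocode rather than from this text.

```python
import numpy as np


def gaussian_kernel(w, sigma):
    """w x w Gaussian weights centred on the window."""
    r = np.arange(w) - w // 2
    return np.exp(-(r[:, None] ** 2 + r[None, :] ** 2) / (2.0 * sigma**2))


def pick_centre_value(t, known, windows, rng, eps=0.1, sigma=None):
    """Choose a value for the centre of template t (the S' selection step).

    t:       w x w template; unpopulated entries must hold finite placeholders.
    known:   boolean w x w mask of populated pixels in t.
    windows: array (M, w, w) holding the source windows S.
    """
    w = t.shape[0]
    sigma = w / 6.4 if sigma is None else sigma      # assumed default
    g = gaussian_kernel(w, sigma) * known            # ignore unpopulated pixels
    d = np.sum(g * (windows - t) ** 2, axis=(1, 2)) / g.sum()
    delta = d.min()                                  # distance to best match
    candidates = np.flatnonzero(d <= (1.0 + eps) * delta)   # the set S'
    chosen = windows[rng.choice(candidates)]
    return chosen[w // 2, w // 2]
```

Each call scans every window in S. This is why the cost of the non-parametric method grows with the number of pixels in $I_S$.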


Efros and Leung’s method does not consider texture analysis, but Efros claims that it could be achieved using k-nearest neighbour classification². Such methods are problematic for two reasons: populating high-dimensional spaces is impractical [14], and finding the k nearest neighbours is computationally demanding.

In a subsequent paper [59], Efros and Freeman address the run-time efficiency

of the texture synthesis algorithm, proposing that instead of populating IT pixel-

by-pixel, the texture is synthesised patch-by-patch. Although the fundamental

idea of conditional sampling was preserved, the non-parametric approach meant

that Efros and Leung’s algorithm had to be significantly modified and a different

similarity measure and sampling method were used.

6.4 A generative parametric model of texture

In this section we propose an approach that unifies Efros and Leung’s and Efros and Freeman’s methods within a parametric statistical framework. Our method will not only enable novelty detection to be performed using statistical inference, but will also address two of the problems of the non-parametric approaches. First, their sampling methods do not truly reflect the statistical distribution of window appearance. Second, the time complexity of their synthesis algorithms is a function of the number of pixels in $I_S$.

Our method is similar to those presented by Popat and Picard [142] and Grim and Haindl [74] in that we use a parametric model of the distribution of local textural appearance. However, Popat and Picard’s model had a hierarchical configuration which was designed to capture overall texture structure. Grim and Haindl also modelled the distribution of local textural appearance, but they used a number of such models, one for each decorrelated component of the colour space.

² Personal communication.

We address the first of the above problems by using an explicit representation of

the distribution of the appearance of the windows. We address the second prob-

lem by moving the burden of iterating over the “training” set to a training stage,

meaning that the computational complexity of image synthesis and analysis be-

comes a function of the model parameters and is largely unrelated to the number

of training points. We also address building the model from large training sets.

Our method assumes a training set of a number of images containing examples of the same texture. Centred on each pixel in the training set, we extract a window of size w × w, where w is odd. Windows that overlap the border of their image are discarded. The pixels in each window are concatenated so that the windows may be considered as points in a high-dimensional space. We seek to model the distribution of these points. The divide-and-conquer algorithm (described in Section 5.6) and the k-means algorithm are used to build a GMM of the distribution. We use the fast non-iterative variant of k-means for the first stage of the divide-and-conquer algorithm, and then the iterative variant to produce the final clustering, and hence the model. We have also built models using the EM algorithm for Gaussian mixtures.
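The window-extraction step can be sketched with NumPy’s `sliding_window_view`, which returns exactly the windows that lie wholly inside an image. Windows that would overlap the border are therefore never produced. Each window is flattened row by row into a point in $\mathbb{R}^{w^2}$. The function name is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_windows(images, w):
    """Stack every w x w window (w odd) that lies wholly inside each image.

    Windows that would overlap an image border are not produced, and each
    remaining window is flattened row by row into a point in R^(w*w).
    """
    if w % 2 == 0:
        raise ValueError("window size must be odd")
    rows = [sliding_window_view(im, (w, w)).reshape(-1, w * w) for im in images]
    return np.concatenate(rows, axis=0)
```

For an H × W image this yields (H − w + 1)(W − w + 1) training points. The resulting matrix is the input to the clustering or EM stage.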

We now describe how the model can be used to generate new examples of the

modelled texture.


6.5 Generating synthetic textures

We have developed two algorithms for texture synthesis, which are parametric

analogues of the non-parametric methods used by Efros and Leung and Efros and

Freeman. Each assumes a Gaussian mixture model of a particular class of texture

parameterised by Θ, as described in Section 6.4. The algorithms are presented in

Algorithm 6 and Algorithm 7. Like the Efros and Leung and Efros and Freeman

algorithms, we define a target image, IT , whose pixels are initially unpopulated.

We describe the two algorithms in the following sections.

6.5.1 Pixel-wise texture synthesis

Algorithm 6 is analogous to the Efros and Leung synthesis algorithm.

6.5.2 Patch-wise texture synthesis

Algorithm 7 is analogous to the Efros and Freeman synthesis algorithm. The

algorithm is the same as for the pixel-wise algorithm, except the marginalisation

step is removed and the sample from the conditional model contains pixel values

for the remainder of the window, rather than individual pixels. Because the

image is filled patch-by-patch, synthesis is performed significantly faster than by

the pixel-wise algorithm.


Algorithm 6 Pixel-wise texture synthesis with a Gaussian mixture model of local textural appearance.

▷ Select a region from one of the training images and insert it at some location in $I_T$. For example, any 3 × 3 pixel section could be used.
repeat
  ▷ Compute a list, U, of unpopulated pixel locations that are 8-connected to the populated area of $I_T$.
  ▷ Randomly choose a pixel location, u ∈ U.
  ▷ Examine a vector t, formed from a square window of size w × w pixels, centred on u. Some dimensions (pixels) will be populated and some will not.
  ▷ Marginalise the GMM over all dimensions corresponding to unpopulated pixels (not including the centre pixel), as described in Section 5.5.1. This yields a Gaussian mixture model parameterised by Θ′.
  ▷ Condition the model with parameters Θ′ on the values of the populated pixels, at the corresponding dimensions, as described in Section 5.5.2. This yields a Gaussian mixture model parameterised by Θ∗. This conditional model represents the likely distribution of pixel values for the centre pixel, given the local populated pixels.
  ▷ Sample a value, p, from the model parameterised by Θ∗.
  ▷ Insert the value p into $I_T$ at location u.
until all the pixels in $I_T$ have been populated.
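The core of one iteration of Algorithm 6 (marginalise, condition, sample) can be sketched in NumPy. The sketch is hypothetical and assumes that the window contains at least one populated pixel besides the centre. Marginalising onto the populated pixels plus the centre amounts to selecting those entries. Conditioning then uses Equations 5.33 to 5.35. The constant −½ log 2π terms are omitted from the component log-likelihoods because they cancel when the weights are normalised.

```python
import numpy as np


def sample_centre_pixel(weights, means, covs, t, known, rng):
    """Sample the centre pixel of window t from the conditional GMM (Theta*).

    weights, means, covs: GMM over flattened w x w windows, with shapes
    (k,), (k, w*w) and (k, w*w, w*w). known: boolean mask of populated pixels.
    """
    t, known = np.ravel(t), np.ravel(known)
    centre = t.size // 2
    obs = np.flatnonzero(known & (np.arange(t.size) != centre))
    x2 = t[obs]
    log_w, mu_c, var_c = [], [], []
    for w_i, mu, cov in zip(weights, means, covs):
        # Marginalising onto {centre} and obs just selects those entries.
        s12 = cov[centre, obs]                          # cross-covariances
        s22 = cov[np.ix_(obs, obs)]
        gain = np.linalg.solve(s22, s12)                # Sigma_22^{-1} Sigma_21
        diff = x2 - mu[obs]
        mu_c.append(mu[centre] + gain @ diff)           # Equation 5.33
        var_c.append(cov[centre, centre] - gain @ s12)  # Equation 5.34
        _, log_det = np.linalg.slogdet(s22)
        log_w.append(np.log(w_i) - 0.5 * (diff @ np.linalg.solve(s22, diff) + log_det))
    log_w = np.array(log_w)
    post = np.exp(log_w - np.logaddexp.reduce(log_w))   # Equation 5.35
    post /= post.sum()
    i = rng.choice(len(post), p=post)                   # pick a component...
    return rng.normal(mu_c[i], np.sqrt(max(var_c[i], 0.0)))  # ...then sample it
```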


Algorithm 7 Patch-wise texture synthesis with a Gaussian mixture model of local textural appearance.

  Select a region from one of the training images and insert it at some location in IT. For example, any 3 × 3 pixel section could be used.
  repeat
    Compute a list, U, of unpopulated pixel locations that are 8-connected to the populated area of IT.
    Randomly choose a pixel location, u ∈ U.
    Examine a vector t, formed from a square window of size w × w pixels, centred on u. Some dimensions (pixels) will be populated and some will not.
    if the window overlaps the edge of the image then
      Marginalise the model over the dimensions that lie outside IT.
    end if
    Condition the Gaussian mixture model on the values of the populated pixels, at the corresponding dimensions, as described in Section 5.5.2. This yields a Gaussian mixture model parameterised by Θ∗. This conditional model represents the likely distribution of pixel values for the unpopulated pixels in the window around the pixel with location u.
    Sample a vector, p, from the model parameterised by Θ∗. In the case that Θ∗ represents a univariate model, p will be a scalar.
    Insert the values p into IT to populate the remainder of the window centred on location u.
  until all the pixels in IT have been populated.
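The first two steps of each loop in Algorithms 6 and 7, building the list U and choosing u from it, could look like the following sketch. The boolean mask `populated` (True where IT already has a value) and the helper name `frontier` are assumptions made for illustration.

```python
import numpy as np


def frontier(populated):
    """Return (row, col) of unpopulated pixels 8-connected to populated ones."""
    p = np.asarray(populated, dtype=bool)
    h, w = p.shape
    padded = np.pad(p, 1)                      # False border: no wrap-around
    touches = np.zeros_like(p)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:                       # the 8 neighbours, not the pixel itself
                touches |= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return np.argwhere(touches & ~p)


rng = np.random.default_rng(0)
populated = np.zeros((6, 6), dtype=bool)
populated[2:4, 2:4] = True                     # a 2 x 2 seed region
U = frontier(populated)
u = tuple(U[rng.integers(len(U))])             # random choice of u from U
```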


6.5.3 The advantages and disadvantages of a parametric

statistical approach

As described in Section 6.4, the sampling methods of Efros and Leung and of Efros and Freeman do not truly reflect the statistical distribution of window appearance. The parametric model that we propose can easily be sampled in a principled manner.

Because we have taken a statistical approach, we have been able to address the run-time efficiency of the pixel-wise algorithm by extending it naturally to a patch-wise algorithm. Thus, our approach is a parametric generalisation of the algorithms of Efros and Leung and of Efros and Freeman.

As we shall see in Section 6.7, the parametric approach allows us to perform nov-

elty detection by assigning a likelihood to each pixel, which would be problematic

and computationally expensive using a non-parametric approach.

One can view the aim of texture synthesis as follows: to produce original examples

of an existing texture, which are both specific and general—i.e. the generated

textures are similar to, and span the range of, the textural appearance observed

in the set of example textures. The non-parametric methods use direct copying of

pixel values in an attempt to achieve specificity and ad hoc sampling methods in

an attempt to achieve generality. In contrast, the parametric method we propose

attempts to achieve both specificity and generality by sampling from the learned

distribution.

A potential drawback of the parametric approach is that synthesised pixels or


patches will not necessarily exist in the training set (they are sampled from a model, rather than copied from legal examples). The non-parametric approach cannot synthesise illegal pixel values or patches, whereas under the parametric method they are merely unlikely.

6.6 Some texture models and synthetic textures

Having described the theory behind our parametric texture model, we now show

synthesis results for two mammographic textures. The first is a “fractal”3 texture

which is generated using a simple procedure. This texture is similar to mammo-

graphic parenchymal patterns. Unlike real mammographic texture, these fractal

textures are stationary, and so the key assumption of our model is well-matched

to the properties of the texture. The fractal texture served as a useful “sanity

check” while developing the model. The second set of textures consists of regions taken from real digitised mammograms.

6.6.1 A model of fractal mammographic texture

The recursive procedure for generating the fractal textures is shown in Algorithm 8⁴. For the training images used to build the model presented in this section, the initial size of the image was 4 × 4 pixels, and the algorithm was run until the image was 256 × 256 pixels. Example training textures are shown in Figure 6.2.

3We refer to this type of texture as being fractal-like because of the generation process, which involves applying the same algorithm at a number of scales. It is the generative process that is self-similar, rather than the final texture.

4Implementation by Arjun Viswanathan.

Algorithm 8 Fractal mammographic texture algorithm.

  An n × n grey-scale image matrix is initialised with random pixel values, sampled uniformly on [0, 1].
  repeat
    The function underlying the image is interpolated to form a new image matrix with four times the number of pixels (i.e. each of the pixels in the previous image corresponds to four pixels in the new image).
    Each pixel value is perturbed by adding random noise, sampled uniformly on [0, 1] and scaled by 2^(-i), where i is the iteration number.
  until the image reaches a predefined size.
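A minimal sketch of Algorithm 8 follows. It is not the implementation credited in footnote 4; the interpolation scheme is not specified in the text, so bilinear interpolation with clamped edges is assumed here, as is counting iterations from i = 1. The function names are hypothetical.

```python
import numpy as np


def upsample2(img):
    """Double each side by bilinear interpolation (edges clamped)."""
    n, m = img.shape
    ys = (np.arange(2 * n) + 0.5) / 2 - 0.5   # new pixel centres in old coordinates
    xs = (np.arange(2 * m) + 0.5) / 2 - 0.5
    rows = np.array([np.interp(xs, np.arange(m), r) for r in img])        # (n, 2m)
    return np.array([np.interp(ys, np.arange(n), c) for c in rows.T]).T   # (2n, 2m)


def fractal_texture(n0=4, size=256, rng=None):
    rng = np.random.default_rng(rng)
    img = rng.uniform(0.0, 1.0, (n0, n0))     # initial n x n random image
    i = 0
    while img.shape[0] < size:
        i += 1                                 # iteration number, starting at 1
        img = upsample2(img)                   # four times as many pixels
        img = img + rng.uniform(0.0, 1.0, img.shape) * 2.0 ** -i
    return img


texture = fractal_texture(rng=0)               # 4 x 4 grown to 256 x 256
```

Because the noise amplitude halves at each scale, coarse structure is set early and finer scales add progressively weaker detail.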

A Gaussian mixture model of the fractal texture was built using the approach described in Section 6.4, with 10 training textures generated by 10 runs of Algorithm 8, a window size of 11 × 11 pixels and 50 initial model components (10 of which were discarded by the k-means algorithm due to weak support). Some unconditionally sampled patches are shown in Figure 6.1. Examples of synthetic textures generated from the model using both pixel- and patch-wise sampling (as described in Algorithm 6 and Algorithm 7) are shown in Figure 6.2.

6.6.2 A model of real mammographic texture

A Gaussian mixture model of the real texture was built using the approach described in Section 6.4, with 10 training textures manually selected from the Digital Database for Screening Mammography [83] to represent the range of real mammographic textural variation, a window size of 11 × 11 pixels and 50 initial model components (again, 10 were discarded by the k-means algorithm due to weak support). Some unconditionally sampled patches are shown in Figure 6.3. Examples of synthetic textures generated from the model using both pixel- and patch-wise sampling (as described in Algorithm 6 and Algorithm 7) are shown in Figure 6.4.

Figure 6.1: Unconditional samples from the fractal model. 196 samples from the model of fractal texture. For this figure, all model components were equally likely to be sampled from.

Figure 6.2: Fractal training and synthetic textures. Top row: Three training images. Middle row: Synthetic textures produced using the pixel-wise algorithm. Bottom row: Synthetic textures produced using the patch-wise algorithm.

Figure 6.3: Unconditional samples from the real mammographic texture model. 196 samples from the model of real mammographic texture. For this figure, all model components were equally likely to be sampled from.
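Samples like those in Figures 6.1 and 6.3 can be produced by picking a component uniformly at random (matching the captions' "all model components were equally likely") and drawing from its Gaussian. In this sketch the mean vectors are assumed to store each 11 × 11 window in row-major order, and the 14 × 14 mosaic layout for 196 patches is our assumption; the function names are hypothetical.

```python
import numpy as np


def unconditional_patches(means, covs, n=196, w=11, rng=None):
    """Sample n w x w patches, choosing every component with equal probability."""
    rng = np.random.default_rng(rng)
    k = len(means)
    comps = rng.choice(k, size=n)              # uniform over components
    samples = np.array([rng.multivariate_normal(means[c], covs[c]) for c in comps])
    return samples.reshape(n, w, w)            # assumes row-major window vectors


def tile(patches, gap=1):
    """Arrange square patches in a near-square mosaic (NaN between patches)."""
    n, w, _ = patches.shape
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    out = np.full((rows * (w + gap) - gap, cols * (w + gap) - gap), np.nan)
    for i, patch in enumerate(patches):
        r, c = divmod(i, cols)
        out[r * (w + gap):r * (w + gap) + w, c * (w + gap):c * (w + gap) + w] = patch
    return out
```

Sampling in proportion to the learned mixing weights would instead pass `p=weights` to `rng.choice`.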

6.6.3 The quality of the synthetic textures

It is relatively easy to assess the quality of the synthetic textures qualitatively, simply by comparing them to the training textures (a more quantitative evaluation is described in Section 7.2). In the case of the fractal texture, it is subjectively clear that the pixel-wise method produces very good synthetic textures, while the patch-wise method produces much less convincing results. Apart from the fractal mammographic texture, the synthetic textures that we have generated using the patch-wise algorithm have been subjectively very similar to those produced by the pixel-wise algorithm (where the same model was used by each algorithm). It is not clear why the fractal patch-wise textures are so poor, but detailed work to determine this is beyond the scope of this thesis.

Figure 6.4: Real training and synthetic textures. Top row: Three training images. Middle row: Synthetic textures produced using the pixel-wise algorithm. Bottom row: Synthetic textures produced using the patch-wise algorithm.

In the case of the real mammographic textures, both the pixel-wise and patch-wise methods produce reasonable results, but the synthetic images are easily distinguishable from the training images. The synthetic real mammographic textures do capture the local textural appearance of the training images, but their overall appearance is subjectively quite dissimilar. The most likely explanation for this is that structure exists in mammograms on a number of levels: the texture is determined both by local tissue type (e.g. glandular, fatty) and by higher-level structure (such as a duct). The high-level structure breaks the assumption of stationarity.

It is possible for two areas of a synthetic image to develop independently before ultimately meeting. If these two areas have different textural appearances, then pixels or patches that are synthesised where the areas meet are forced to merge one type of textural appearance into another, which can cause a discontinuity. We have not investigated strategies that may prevent this behaviour, but such work may yield better synthetic textures. More extreme examples of this type of failure are shown in Figure 6.5. When the texture appears to have been adequately modelled, we estimate that failures of this type occur in fewer than 1 in 20 attempts. Because all parts of the appearance space have non-zero density, it is also possible for the synthesis procedure to move into, and “get stuck” in, an illegal part of the appearance space. This results in incorrect texture being generated. The frequency of such failures may be reduced by learning a “better model”; this is possible because the k-means and EM algorithms converge to locally optimal solutions. Determining whether a particular model is the best is still an open research question. We estimate that failures of this type occur in fewer than 1 in 10 attempts.

Figure 6.5: Examples of synthesis failure using patch-wise synthesis with a model of real mammographic appearance.

6.6.4 Time and space requirements of the parametric method

The time required to build a parametric model of textural appearance depends upon the number and size of the training images. Using all the training windows in a set of training images with the divide-and-conquer algorithm is computationally expensive: for example, building a model from 10 images, each on average 300 × 300 pixels, can take a few days. Using a subset of the training set containing 20 000 training windows and the EM algorithm, a model can be built in around 12 hours5.

Storing the model is trivial on a modern workstation. The models of fractal and real mammographic texture have, by coincidence, the same number of model components (40), and each uses 11 × 11 pixel windows. Encoding such a model requires storing 40 component probabilities, mean vectors and covariance matrices. The matrix of mean vectors has 40 × 121 elements and each covariance matrix has 121 × 121 elements. We therefore need to store 40 + (40 × 121) + 40(121 × 121) = 590 520 parameters. If double-precision representation (IEEE Standard 754 [100]) is used to encode these parameters, then each parameter requires 8 bytes, and so the model can be stored in 4 724 160 bytes (less than 5 MB) without compression.
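The storage arithmetic above can be checked directly. The sketch counts full covariance matrices, as the text does; the last line is our addition and notes that, because covariance matrices are symmetric, storing one triangle of each would need fewer parameters.

```python
K = 40          # mixture components remaining after k-means
d = 11 * 11     # dimensionality of an 11 x 11 window
n_params = K + K * d + K * d * d   # weights + mean vectors + full covariances
n_bytes = 8 * n_params             # 8 bytes per IEEE 754 double
print(n_params, n_bytes, n_bytes / 2**20)   # parameters, bytes, MiB

# Covariance matrices are symmetric, so one triangle per matrix would suffice:
n_params_sym = K + K * d + K * d * (d + 1) // 2
```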

Storing this model in uncompressed form consumes more space than storing the original images (since pixel values are usually represented using relatively low precision and compression is commonly used). However, because the size of the model is fixed, synthesis or analysis of each pixel can be performed in essentially O(1) time with respect to the size of the training set, while the non-parametric methods must iterate over every possible window in the “training” set. Marginalisation is computationally cheap, while computing a conditional distribution (which must be done for each pixel or patch) is relatively expensive. Profiling reveals that approximately 98% of the parametric synthesis algorithms' time is spent computing Moore–Penrose generalised inverses. Using our optimised implementation (see Section 5.5.2), each pixel or patch takes approximately 0.22 seconds to generate on a computational server with a 2.8GHz Intel Xeon Hyperthreaded processor with 2GB of physical memory6. Using the pixel-wise algorithm, a 300 × 300 pixel image can be synthesised in 5.5 hours, while an image of the same size can be synthesised in a few minutes using the patch-wise algorithm.

5These figures are for a workstation with a 1.3GHz Intel Pentium 4 processor with 512MB of physical memory.
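The 5.5-hour figure follows from one 0.22 s step per pixel. For the patch-wise algorithm the text reports only "a few minutes"; the sketch below therefore treats the average number of newly populated pixels per patch as a hypothetical parameter rather than a measured one.

```python
seconds_per_step = 0.22          # per pixel or per patch, from the text
pixels = 300 * 300

hours_pixelwise = pixels * seconds_per_step / 3600   # one step per pixel


def patchwise_minutes(mean_new_pixels_per_patch):
    """Hypothetical: patch-wise time if each step fills this many new pixels."""
    steps = pixels / mean_new_pixels_per_patch
    return steps * seconds_per_step / 60
```

For instance, if each patch step filled around 60 new pixels on average (an illustrative value only), synthesis would take roughly five and a half minutes.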

6.7 Novelty detection

Because we have an explicit statistical model of the appearance of local texture, it is possible to assign a likelihood to each pixel, based upon its local neighbourhood. Pixels assigned low likelihoods should be interpreted as novel. The novelty detection algorithm is very similar to the pixel-wise synthesis algorithm and is described in Algorithm 9. We assume an unseen image IU, which may contain texture that is not of the modelled class, and a model of the expected texture with parameters Θ. We will form an image of log-likelihoods, IL, and a binary image, IB, indicating novel pixels.

6This machine was shared with one other large computational job.

Algorithm 9 Novelty detection using a Gaussian mixture model of texture.

  for each pixel location p ∈ IU do
    Extract a square window, of size w × w, represented as a vector t, centred around the pixel at location p.
    if the window overlaps the edge of the image then
      Marginalise the model over the dimensions that lie outside IU.
    end if
    Condition the model upon all values in t, except for the centre pixel. Let the resulting univariate model be parameterised by Θ∗.
    Compute the log-likelihood, l, of the centre pixel value under the model parameterised by Θ∗ (see the text for more details).
    Assign l to the pixel at location p in IL.
  end for
  In addition to the log-likelihood image, produce a binary image IB which identifies novel pixels using a threshold on the log-likelihoods (e.g. learned using an independent training set).

Note that the likelihood of an event is the probability of the event had it not actually occurred (i.e. probabilities refer to future events, while likelihoods refer to past events). In order to compute true likelihoods (or log-likelihoods), the pdf defined by the conditional model would need to be integrated between suitable limits. Since the conditional pdf, given by pΘ∗(x), is univariate, this is relatively simple:

L(a) = \int_{a - r_-}^{a + r_+} p_{\Theta^*}(x) \, dx    (6.1)

where r− and r+ delimit the event and may be estimated from the expected noise on the pixel value a. If we assume that the noise is constant then we may set r− = r+ = ∆/2 (where ∆ defines a region around the pixel value a). If ∆ is suitably small, then Equation 6.1 can be approximated by

L(a) \approx \Delta \, p_{\Theta^*}(a)    (6.2)
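To illustrate Equations 6.1 and 6.2, the sketch below compares the exact integral with the approximation for a univariate Gaussian mixture, using the error function for the mixture's CDF. The mixture parameters and the choice ∆ = 1/255 (one grey level of an 8-bit image scaled to [0, 1]) are hypothetical values chosen for illustration. It also shows that, in the log domain, the approximation differs from the log density only by the constant log ∆.

```python
import math


def gmm_pdf(x, weights, means, sds):
    return sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in zip(weights, means, sds))


def gmm_cdf(x, weights, means, sds):
    return sum(w * 0.5 * (1 + math.erf((x - m) / (s * math.sqrt(2))))
               for w, m, s in zip(weights, means, sds))


def likelihood_exact(a, delta, *gmm):     # Equation 6.1 with r- = r+ = delta / 2
    return gmm_cdf(a + delta / 2, *gmm) - gmm_cdf(a - delta / 2, *gmm)


def likelihood_approx(a, delta, *gmm):    # Equation 6.2
    return delta * gmm_pdf(a, *gmm)


gmm = ((0.7, 0.3), (0.40, 0.60), (0.05, 0.10))   # hypothetical conditional mixture
a, delta = 0.45, 1 / 255                         # assumed: one 8-bit grey level
exact = likelihood_exact(a, delta, *gmm)
approx = likelihood_approx(a, delta, *gmm)
rel_err = abs(exact - approx) / exact
# log(approx) - log(p(a)) equals log(delta): the same offset for every pixel
offset = math.log(approx) - math.log(gmm_pdf(a, *gmm))
```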

We can use the conditional density at a as a proxy for the likelihood estimated by Equation 6.2, since the two differ only by the constant factor ∆. If actual likelihoods are required (for example, by another system), then IL can be corrected using an estimate of ∆ (for log-likelihoods this is an additive offset of log ∆). In the novelty detection work in this thesis, we use the conditional density at a as our likelihood measure.
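Algorithm 9 can be sketched as below, assuming a mixture stored as NumPy arrays with each w × w window flattened in row-major order. Off-image dimensions are marginalised by dropping them, and the centre pixel's conditional density is obtained by reweighting each component by its marginal density at the observed neighbours. The function names are hypothetical, `np.linalg.pinv` stands in for the generalised inverse, and the double loop is written for clarity rather than speed; thresholding the result to obtain IB is left out.

```python
import numpy as np


def log_cond_density(x, weights, means, covs, c):
    """log p(x[c] | all other entries of x) under a Gaussian mixture."""
    obs = np.array([i for i in range(len(x)) if i != c])
    log_resp, log_cond = [], []
    for w, mu, S in zip(weights, means, covs):
        S_oo = S[np.ix_(obs, obs)]
        S_oo_inv = np.linalg.pinv(S_oo)
        s_co = S[c, obs]
        diff = x[obs] - mu[obs]
        m = mu[c] + s_co @ S_oo_inv @ diff           # conditional mean
        v = S[c, c] - s_co @ S_oo_inv @ s_co         # conditional variance
        _, logdet = np.linalg.slogdet(S_oo)
        log_resp.append(np.log(w) - 0.5 * (diff @ S_oo_inv @ diff + logdet
                                           + len(obs) * np.log(2 * np.pi)))
        log_cond.append(-0.5 * ((x[c] - m) ** 2 / v + np.log(2 * np.pi * v)))
    log_resp = np.array(log_resp)
    log_resp -= np.logaddexp.reduce(log_resp)       # normalised conditional weights
    return np.logaddexp.reduce(log_resp + np.array(log_cond))


def log_likelihood_image(img, weights, means, covs, win):
    """I_L: one conditional log-density per pixel (win x win windows, row-major)."""
    h, wd = img.shape
    r = win // 2
    out = np.empty(img.shape)
    for y in range(h):
        for x in range(wd):
            idx, vals = [], []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    if 0 <= y + dy < h and 0 <= x + dx < wd:  # off-image dims dropped
                        idx.append((dy + r) * win + (dx + r))
                        vals.append(img[y + dy, x + dx])
            idx = np.array(idx)
            c = int(np.flatnonzero(idx == r * win + r)[0])    # centre position
            out[y, x] = log_cond_density(np.array(vals, dtype=float), weights,
                                         means[:, idx], covs[:, idx][:, :, idx], c)
    return out
```

With strongly correlated neighbouring pixels in the model, an isolated bright pixel receives a much lower value than pixels in a uniform region, which is the behaviour the method relies on.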

It is natural to ask whether an analogue of the patch-wise synthesis algorithm could be used to perform novelty detection efficiently. Unfortunately, the density at any point in a high-dimensional space is vanishingly small, so a patch-wise novelty detection algorithm cannot be used7.

6.8 Summary

This chapter presented a parametric statistical model of stationary texture, de-

veloped from the texture synthesis algorithm of Efros and Leung. In summary:

• Efros and Leung’s texture synthesis algorithm was described.

• A parametric version of Efros and Leung’s algorithm was developed.

• Two methods of generating synthetic textures using the model were de-

veloped. Our model can be viewed as a parametric generalisation of the

methods of Efros and Leung and Efros and Freeman.

• Synthetic textures that were generated using our model were presented and

discussed.

• A novelty detection method was developed that allows the parametric model

to be used to analyse textures.

7It may be possible to compute a few likelihoods at once, as a compromise between the two extremes, but this has not been investigated further.

Chapter 7

Evaluating the texture model

7.1 Introduction

This chapter presents an evaluation of the parametric texture model for texture

synthesis and analysis. The chapter describes:

• A psychophysical evaluation of synthetic mammographic textures produced

using the model.

• An evaluation of how well the model can detect abnormal features in sim-

ulated and real mammographic images.


7.2 Psychophysical evaluation of synthetic tex-

tures

It is relatively easy to make a personal qualitative assessment of whether a pair of textures is similar. However, this approach is subjective and qualitative; an objective and quantitative approach is preferred. Few of the most frequently cited papers in the texture modelling and synthesis literature present any such evaluation (e.g. [53, 59, 60, 84, 144]), and little rigorous evaluation appears to have been attempted. Brettle et al. evaluated methods for synthesising textures from medical images (including mammographic textures) using several texture measures [27].

The synthetic images generated by Efros and Leung’s original method [60] were

found to be most realistic. Although texture features provide an objective and

quantitative measure of textural properties, the best systems available to compare

textures are evolved biological vision systems, such as the human visual system.

Psychophysical experiments can allow the human visual system to be used ob-

jectively and quantitatively. We now present a psychophysical experiment that

evaluates the synthetic textures produced using our model.

7.2.1 Aims

The primary aim of this experiment was to determine whether textures generated using the parametric model of local texture can be differentiated from examples of the real texture. The secondary aim was to compare synthetic textures generated using the parametric model to those generated using Efros and Leung's method.


Since the patch-wise synthetic images can easily be differentiated from the real

textures, we restrict ourselves to the pixel-wise images. We can therefore state

our experimental hypotheses:

1. Synthetic fractal mammographic textures generated by the parametric model

are indistinguishable from real fractal mammographic textures.

2. Synthetic real mammographic textures generated by the parametric model

are indistinguishable from real mammographic patches.

3. Synthetic fractal mammographic textures generated by the parametric model

are more like real fractal mammographic texture than those produced using

Efros and Leung’s method.

4. Synthetic real mammographic textures generated by the parametric model

are more like real mammographic patches than those produced using Efros

and Leung’s method.

7.2.2 Method

The forced-choice paradigm is well-suited to comparing a pair of textures. Participants were presented with a set of three textures, which we shall call Image A, Image B and the reference image.

In experiments 1 and 2, the reference image was an example of a real texture selected from the training set, Image A was a synthetic texture generated using the parametric model and Image B was another example of a real texture selected from the training set (different from the reference image).


In experiments 3 and 4, the reference image was an example of a real texture selected from the training set, Image A was a synthetic texture generated using the parametric model and Image B was a synthetic texture generated using Efros and Leung's method. Note that there was an important difference between the ways in which the Image A and Image B synthetic textures were generated in experiments 3 and 4: Image A textures were generated from a model that was trained on a number of mammographic images, while each Image B texture was generated from a single “training” image using Efros and Leung's method. This was necessary because the Efros and Leung algorithm scales poorly with the number of “training” pixels. It was expected that this would cause the Image B textures to be highly specific to the image they were generated from, and so to appear more consistent and “plausible”.

All three images in each set were of the same class of texture. The images were arranged in a row with the reference image in the centre. Image A appeared to the left of the reference image with probability 0.5, and Image B was placed on the other side. Trials corresponding to the four hypotheses were presented in a random order, so that participants could not easily guess the exact experimental design and introduce bias into their responses. The participants were asked to compare Image A and Image B to the reference image and choose the one they thought more similar to it.

Each of the three images was drawn from a set of 10 images (e.g. there were 10 synthetic real mammographic textures generated using our method, 10 synthetic fractal textures generated using our method and 10 real mammographic images). The images in the training sets, and hence the reference image sets, were manually selected such that the sets represented a broad range of textural appearance for the class of texture being investigated. No synthesised images were excluded (e.g. on the basis that synthesis failure occurred). Each participant was shown 10 image sets for each experiment. The number of images of each type was limited by the computational time required to synthesise the four classes of synthetic image, and the number of images presented to each participant was chosen so that the experiment could be completed within a reasonably short time (approximately 5 minutes).

Ideally, the experiment would have been conducted using more reference and

synthesised textures. This would minimise the probability of the participants

seeing the same images (or combination thereof) and could more accurately reflect

the distribution of the various “types” of textural appearance. However, we

believe that the design achieves these aims to the maximum extent possible under

the time constraints imposed by the synthesis algorithms and the participants’

patience.

The experiment was implemented as an Internet-based application, delivered via an XHTML [138] interface. Image A and Image B were hyperlinks that reported the participants' choices to the application. The names of the image files were disguised as “random” strings of text, so that web browser software could not disclose the “correct” image by displaying the image filenames on-screen. The hyperlink encoding of the participants' selections was similarly disguised. The responses were recorded in a database upon completion of the experiment (i.e. only results from those who completed the experiment were recorded). A screenshot of one of the trials is shown in Figure 7.1. The number of times Image A and Image B were chosen was recorded for each participant, allowing χ² analysis by pooling all participants.

Figure 7.1: A screenshot of one of the trials.

The experiment was run twice. The aim of the first run was to test the application

using a small number of participants. The experiment was advertised in an email

to all members of the division of Imaging Science and Biomedical Engineering

at the University of Manchester. The first run of the experiment attracted 24

participants. The aim of the second run was to get as many people as possible

to take part. The experiment was advertised to all students—undergraduates

and postgraduates—of the University of Manchester via email. The second run

of the experiment attracted 1 777 participants. Participants were therefore self-

selecting, and we did not control for factors such as age and sex.


Experiment   Image A selection (small run)   Image A selection (large run)
1            29% (of 240 trials)             34% (of 17 770 trials)
2            27% (of 240 trials)             28% (of 17 770 trials)
3            38% (of 240 trials)             41% (of 17 770 trials)
4            25% (of 240 trials)             26% (of 17 770 trials)

Table 7.1: Results for the psychophysical experiment. Row 1: Synthetic fractal textures generated by our model versus real fractal textures. Row 2: Synthetic real mammographic textures generated by our model versus real mammographic textures. Row 3: Synthetic fractal textures generated by our model versus those generated using Efros and Leung's algorithm. Row 4: Synthetic real mammographic textures generated by our model versus those generated using Efros and Leung's algorithm.

7.2.3 Results

Results for the small and large runs are shown in Table 7.1. The table shows the number of “votes” for Image A (the synthetic textures generated using our parametric model) as a percentage of the total for each of the four experimental conditions (see Section 7.2.1). Image B was selected more often in all cases, and this result is statistically significant at the 5% significance level.
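A pooled χ² goodness-of-fit test of each row of Table 7.1 against the 50% expected under no preference can be sketched with the standard library. Because the table reports rounded percentages, the counts below are reconstructed approximately, and every trial is treated as an independent observation, as pooling across participants implicitly assumes. For one degree of freedom, the χ² survival function equals erfc(√(χ²/2)).

```python
import math


def chi2_vs_half(p_a, n):
    """χ² goodness-of-fit test (1 d.o.f.) of Image A choices against 50%."""
    a = round(p_a * n)                  # reconstructed count from a rounded percentage
    b = n - a
    e = n / 2                           # expected count for each image
    chi2 = ((a - e) ** 2 + (b - e) ** 2) / e
    p = math.erfc(math.sqrt(chi2 / 2))  # survival function of χ² with 1 d.o.f.
    return chi2, p


# Image A proportions per experiment: (small run, large run), from Table 7.1
table = {1: (0.29, 0.34), 2: (0.27, 0.28), 3: (0.38, 0.41), 4: (0.25, 0.26)}
for exp, (small, large) in table.items():
    for label, p_a, n in (("small", small, 240), ("large", large, 17770)):
        chi2, p = chi2_vs_half(p_a, n)
        print(f"experiment {exp} ({label} run): chi2 = {chi2:.1f}, p = {p:.2g}")
```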

7.2.4 Discussion

In experimental conditions 1 and 2, we would have liked Image A to have been

chosen 50% of the time. This would have suggested that participants could not

differentiate between the real examples of the two texture classes and those gener-

ated by the parametric method. The results indicate that participants were able

to differentiate between the real and synthetic textures, but the synthetic images

were realistic enough that the participants mistook them for the real textures


about a third of the time. Subjectively, the simulated fractal mammographic

textures appear to be modelled more successfully than the real mammographic

texture, and the results support this observation.

In experimental conditions 3 and 4, we would have liked Image A to have been se-

lected in preference to Image B (i.e. more than 50%). This would have suggested

that participants thought that the synthetic images generated using the paramet-

ric model were more like the real textures than the synthetic images generated

using Efros and Leung’s method. The results indicate that participants were able

to differentiate between the synthetic images generated using the two methods.

As expected, participants favoured the synthetic images generated using Efros

and Leung’s method, but the images generated using the parametric method were

preferred in 41% (fractal mammographic texture) and 26% (real mammographic

texture) of cases. However, this experimental condition was heavily biased in

favour of Efros and Leung’s method because of the difference in the way that the

training set was utilised by the two methods. Efros and Leung’s method produces

more specific textures but it cannot be used to analyse textures.

7.3 Initial validation of the novelty detection

method

In order to use the novelty detection method developed in Chapter 6, we need to

be confident that it can perform texture discrimination. To validate the method,

a simple experiment was performed.


Figure 7.2: Fractal and scrambled textures. A fractal texture is shown on the left. The right-hand texture is the left-hand fractal texture after being scrambled. The grey-level histograms of both textures are identical.

7.3.1 Aim

The aim of this experiment is to determine if the novelty detection method can

discriminate between two textures with similar characteristics.

7.3.2 Method

A fractal mammographic image was generated. A second image was generated

from the first by scrambling the pixel locations. The resulting image has exactly

the same histogram as the fractal image, but has a different texture. An example

is shown in Figure 7.2. Log-likelihood images were generated for each image by

applying Algorithm 9. The log-likelihood values obtained for each image were

then compared.
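Scrambling pixel locations is a random permutation of the pixel array, which preserves the grey-level histogram exactly while destroying local spatial structure. In the sketch, a smooth synthetic image stands in for the fractal image, and horizontal neighbour correlation is used only as a crude illustration of the lost structure; neither is part of the original experiment, and the function names are hypothetical.

```python
import numpy as np


def scramble(img, rng):
    """Randomly permute pixel locations: identical histogram, no local structure."""
    img = np.asarray(img)
    return rng.permutation(img.ravel()).reshape(img.shape)


def neighbour_correlation(img):
    """Correlation of horizontally adjacent pixels: a crude measure of texture."""
    return np.corrcoef(img[:, :-1].ravel(), img[:, 1:].ravel())[0, 1]


rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
smooth = np.sin(xx / 6.0) + np.cos(yy / 9.0)   # hypothetical stand-in for a fractal image
scrambled = scramble(smooth, rng)
```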


Figure 7.3: ROC curve for texture discrimination.

7.3.3 Results

Figure 7.3 shows an ROC curve that was generated by varying a threshold on the log-likelihood values to classify pixels as belonging to the fractal image or the scrambled image. The ROC curve shows excellent discrimination (Az = 0.98).

Analysis of the log-likelihood histograms shows that pixels in the scrambled image

are considered to be less likely than those in the fractal image.
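Az conventionally denotes the area under a fitted (often binormal) ROC curve, and the text does not say how it was estimated here. As a swapped-in alternative, the sketch below computes the nonparametric area, i.e. the Mann–Whitney probability that a randomly chosen fractal-image pixel receives a higher log-likelihood than a randomly chosen scrambled-image pixel, with ties counted as one half. The function name is hypothetical.

```python
import numpy as np


def empirical_auc(normal_ll, novel_ll):
    """P(normal pixel more likely than novel pixel), ties counted as 1/2."""
    a = np.sort(np.asarray(normal_ll, dtype=float).ravel())
    b = np.asarray(novel_ll, dtype=float).ravel()
    right = np.searchsorted(a, b, side="right")   # number of a values <= each b
    left = np.searchsorted(a, b, side="left")     # number of a values < each b
    greater = (a.size - right).sum()              # normal strictly more likely
    ties = (right - left).sum()
    return (greater + 0.5 * ties) / (a.size * b.size)
```

Sorting once and using binary search keeps the cost at O(n log n), which matters when every pixel of two images is a data point.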

7.3.4 Discussion

The results show that the novelty detection method can discriminate on the basis

of local textural appearance (rather than just pixel intensity, for example). This

is important because although mammographic abnormalities often appear to be


brighter than their surrounding tissue, the pixel values are often within the range

of those for normal tissue. The novelty detection method should also function

as a “brightness detector”—regions that are unusually bright (or dim) should be

considered unlikely by our model.

7.4 Evaluation of novelty detection performance

7.4.1 Introduction

We performed a number of novelty detection experiments on abnormal mam-

mographic patches. Two types of mammographic texture were used: simulated

abnormal mammographic textures (based on the fractal textures) and patches

from real mammograms. Two classes of abnormality were used for each type

of texture: masses and calcifications. In the case of the fractal textures, these

abnormalities were simulated.

7.4.2 Aims

The aim of these experiments was to determine whether a single model of mammographic textural appearance (for a particular class of texture) can be used to detect different forms of abnormality (where a conventional pattern recognition approach would require multiple classifiers and probably multiple types of feature descriptor).

The experiments were designed to answer the following questions:


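1. How well can abnormalities be detected in simulated mammographic textures where the textures contain simulated calcifications?

2. How well can abnormalities be detected in simulated mammographic textures where the textures contain simulated masses?

3. How well can abnormalities be detected in simulated mammographic textures where the textures contain both simulated masses and calcifications?

4. How well can abnormalities be detected in real mammographic textures where the textures contain real calcifications?

5. How well can abnormalities be detected in real mammographic textures where the textures contain real masses?

6. How well can abnormalities be detected in real mammographic textures where the textures contain both real masses and calcifications?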

7.4.3 Method

For each experiment, a set of images was generated or collected which contained the desired type, or types, of abnormality. We describe shortly how the simulated abnormalities were generated. The real microcalcification patches were selected by a colleague from a local database (as pixel-level expert annotation was available) so that the set would represent a broad range of appearances of that class of abnormality. The other real data were pseudo-randomly selected from the Digital Database for Screening Mammography (DDSM) [83]. Because the analysis process is computationally expensive, the real mammographic images were processed at low resolution (150 µm per pixel), using a model trained on 10 pathology-free patches from the DDSM (scaled to the same resolution), selected in the same way as above. Each test set contained 10–20 regions of interest. The size of the test sets was limited by the computational time required to perform the analysis task.

In the case of the simulated abnormalities, groundtruth images were automati-

cally generated. In the case of the real mammographic data the groundtruth was

provided by a digital mammography researcher1. Care was taken during the an-

notation to ensure that the groundtruth was as detailed as possible, rather than

simply marking the centres of abnormalities or providing coarse indications such

as circles that contain the abnormalities. In the case of the microcalcification

images, for example, each microcalcification was individually annotated.

For computational expediency, we did not include separate normal images for analysis alongside the simulated images or the real microcalcification images. The groundtruth annotations were interpreted strictly: we considered “hits” on pixels labelled as abnormal to be true positive detections, “hits” on pixels labelled as normal to be false positive detections, and so on for the true negative and false negative possibilities. Relative to the majority of results published in the literature, this interpretation of groundtruth produces a pessimistic evaluation of a detection system, because a “hit” close to an abnormal feature would be likely to draw a clinician's attention to the area and so would be clinically useful2. However, we believe that our detection criteria are appropriate for experiments on relatively small test images because we want to measure how well the method detects specific indicative signs of abnormality, rather than how well the method would alert clinicians to the presence of abnormality (which would presumably result from accurate detections of abnormal features). The strict interpretation of groundtruth treats each pixel in the test images as a separate data point, delivering a large sample from a seemingly small test set. This allows the area under the ROC curve to be estimated accurately (i.e. with small standard error; see Section 7.4.4). However, results from experiments that use a small number of images are obviously less representative than those from experiments that use a large number of images.

1Michael Board, a third year digital mammography PhD student in the Division of Imaging Science and Biomedical Engineering at the University of Manchester.

2In the computer-aided mammography literature it is common to consider a single correct

Each pixel in each test image was assigned a log-likelihood—using the appropriate

model—as described in Section 6.7. ROC analysis was performed on each set of

results by thresholding the log-likelihoods, and the resulting classifications were

compared to the groundtruth annotation.

For the real mass images, we found that pixels labelled as being masses were given

very similar log-likelihoods to the surrounding non-mass pixels, to the extent that

no discrimination could be achieved. This was either because our approach fails

on this class of image, or because it is unrealistic to consider tissue close to a mass

to be normal (e.g. it may be distorted by the presence of the mass). For this class

of image, we also analysed a set of 10 pathology-free images, and considered all

pixels in an abnormal image to be abnormal, and all pixels in a normal image to

“hit” in a coarsely annotated abnormal region to be a true positive detection of that region,irrespective of its absolute location or incorrect “hits” or “misses” within that region.


be normal.
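As an illustration of the pixel-wise evaluation described above, the following minimal sketch (not the thesis's own code; the function names `roc_curve_pixelwise` and `auc_trapezoid` are hypothetical) builds an empirical ROC curve from a log-likelihood image and a binary groundtruth mask. Because a low log-likelihood indicates novelty, the log-likelihoods are negated so that a larger score means "more abnormal", and the threshold is swept over every distinct value. With ties handled this way, the trapezoidal area equals the Mann-Whitney probability that a randomly chosen abnormal pixel is scored as more novel than a randomly chosen normal pixel (ties counting one half).

```python
import numpy as np

def roc_curve_pixelwise(loglik, truth):
    """Empirical ROC curve treating every pixel as one sample.

    loglik: per-pixel log-likelihoods (lower means more novel).
    truth:  boolean array of the same shape, True where a pixel is labelled abnormal.
    Returns (fpr, tpr), one point per distinct threshold, starting at (0, 0).
    """
    scores = -np.asarray(loglik, dtype=float).ravel()   # higher score = more abnormal
    labels = np.asarray(truth, dtype=bool).ravel()
    order = np.argsort(-scores, kind="mergesort")       # most abnormal pixels first
    scores, labels = scores[order], labels[order]
    # Last index of each run of tied scores: tied pixels are always classified together.
    cut = np.r_[np.nonzero(np.diff(scores))[0], scores.size - 1]
    tp = np.cumsum(labels)[cut]
    fp = np.cumsum(~labels)[cut]
    tpr = np.r_[0.0, tp / labels.sum()]
    fpr = np.r_[0.0, fp / (~labels).sum()]
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example: abnormal pixels tend to receive lower log-likelihoods.
rng = np.random.default_rng(0)
truth = rng.random((32, 32)) < 0.2
loglik = rng.normal(size=(32, 32)) - 1.5 * truth
fpr, tpr = roc_curve_pixelwise(loglik, truth)
print(round(auc_trapezoid(fpr, tpr), 3))
```

Sorting once and taking cumulative sums gives every operating point in O(n log n) time, which matters when each pixel of every test image is a sample.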

Generating synthetic calcifications and masses

The simulation of mammographic abnormalities has been investigated previously.

Highnam et al. investigated adding simulated and real masses to mammograms represented using h_int (see Section 3.3 for details of h_int) [87]. Simulated masses were generated by inferring 3-D models of 2-D mass shapes (obtained, for example, from annotations of real masses). The h_int values of the real masses were estimated by subtracting the average h_int value of the surrounding non-mass region from those of the mass region. In each case, the estimated mass h_int values were then simply added to normal h_int mammograms. Caulkin et al. modelled

the appearance of spiculated masses by estimating the contribution of the nor-

mal tissue to the abnormal region and then learning statistical models of the

contributions due to the central mass and spicules [38]. A model of spicule place-

ment and number was also learned. Simulated spiculated lesions were generated

by sampling from the models and adding the simulated abnormalities into nor-

mal mammograms. Claridge and Richter modelled the cross-sectional profile of

masses by convolving a step-edge function with a Gaussian kernel and then ro-

tating the resulting function to form a surface, where the height is proportional

to the attenuation due to the mass [44] (a similar approach is described in more

detail below). Bliznakova et al. modelled masses, spicules and microcalcifications

within a 3-D breast model (see Section 9.2.2 for a more detailed description of

the approach) [16].


In our work, both the simulated calcification and mass images were based upon

the fractal mammographic textures described in Section 6.6.1. Fractal back-

grounds were generated and simulated calcifications or masses were introduced

using an additive process, mimicking the attenuation process. We describe how

each type of abnormality was modelled below.
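Algorithm 8 is not reproduced in this section. As a stand-in, the sketch below generates a fractal-like background by spectral synthesis: white noise is filtered so that its power spectrum falls off as $1/f^{\beta}$. This is a generic technique assumed here for illustration, not necessarily the method of Section 6.6.1, and the function name `fractal_texture` and the exponent `beta = 3.0` are hypothetical choices. Abnormalities would then be inserted by adding their pixel values to this background.

```python
import numpy as np

def fractal_texture(size=128, beta=3.0, rng=None):
    """Spectral-synthesis texture with power spectrum ~ 1/f**beta, rescaled to [0, 1].

    Generic stand-in for Algorithm 8; beta = 3.0 is a hypothetical choice.
    """
    rng = np.random.default_rng(rng)
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    f = np.hypot(fx, fy)
    f[0, 0] = 1.0                          # placeholder to avoid dividing by zero at DC
    amplitude = f ** (-beta / 2.0)         # power ~ f**-beta  =>  amplitude ~ f**(-beta/2)
    amplitude[0, 0] = 0.0                  # remove the mean (DC) component
    spectrum = np.fft.fft2(rng.standard_normal((size, size))) * amplitude
    tex = np.real(np.fft.ifft2(spectrum))  # imaginary part is round-off only
    tex -= tex.min()
    return tex / tex.max()

background = fractal_texture(256, rng=0)   # e.g. a 256 x 256 background
```

Because the amplitude depends only on |f|, the filter preserves the Hermitian symmetry of the noise spectrum, so the inverse transform is real apart from round-off.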

The simulated microcalcifications were modelled by ellipses, rotated to random

angles and placed in clusters. The number of calcifications in each image was fixed

at 30. Algorithm 10 describes how the microcalcifications were simulated and how

the groundtruth was generated. Note that although the simulated microcalcifi-

cation shape and spatial distribution were modelled in an ad hoc way (though

broadly consistent with real data), microcalcification brightness was modelled to

be consistent with real data.

Real mammographic masses can have well-defined borders, diffuse borders or spiculations. We chose to model masses with diffuse borders because masses with well-defined borders may be too easy to detect, while spiculated lesions would be harder to model, and only a simple simulation of abnormality was required.

One might model a mass in a breast as being a sphere of uniform density, situated

within the normal breast tissue and distorted by the compression of the breast

between two plates. While a detailed analytical model of the problem could be

derived, a reasonable approximation that yields suitable test images would be

acceptable. We experimented with three methods of simulating masses. All the

methods modify the fractal background pixel values within a disc, and differ in

how the abnormal pixel values are modelled.


Algorithm 10 Simulating microcalcification clusters.

  Generate a fractal texture using Algorithm 8.
  Determine the centre of the image and consider the image edges to be 6 standard deviations from the centre. Select locations for the calcifications using the resulting bivariate normal distribution.
  for each calcification to be generated do
    Generate a 100 × 100 pixel disc.
    Warp the disc to a random elliptical shape (with a mean eccentricity of 2 and associated variance of 0.5).
    Rotate the ellipse to a random angle.
    Convolve the ellipse with a Gaussian kernel (with a standard deviation of 10 pixels) to remove the hard edges.
    Resize the ellipse to be 4 pixels long along its major axis.
    Normalise the ellipse so that its maximum value is set to unity, and the other pixels are scaled accordingly.
    Scale the simulated microcalcification pixel values such that, when added to the image, the ratio of the mean calcification pixel value to the mean fractal background pixel value equals the corresponding ratio for real mammograms containing microcalcifications.
    Insert the calcification into the fractal image by adding the pixel values of the calcification to the background image pixel values.
  end for
  Compute the groundtruth image by subtracting the original fractal image from the image containing the calcifications, and then threshold at a low value to discard the effect of the convolution with the Gaussian kernel.
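The following is a simplified sketch of Algorithm 10, under several stated assumptions. Rather than warping, blurring and resizing a 100 × 100 pixel disc, each calcification is rendered directly at its final size as an ellipse with a smooth (logistic) edge. The "eccentricity" parameter is interpreted here as the ratio of major to minor axis, since geometric eccentricity cannot exceed 1 for an ellipse. The target brightness ratio and the groundtruth threshold are placeholders, not values measured from real mammograms, and the function names are hypothetical.

```python
import numpy as np

def soft_ellipse(shape, centre, major, ratio, angle, softness=0.25):
    """Smooth elliptical blob with peak value 1.

    major: full length (pixels) of the major axis; ratio: major/minor axis ratio.
    The logistic edge stands in for blurring and resizing a 100 x 100 disc.
    """
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    dx, dy = xx - centre[0], yy - centre[1]
    c, s = np.cos(angle), np.sin(angle)
    u, v = c * dx + s * dy, -s * dx + c * dy           # rotate into the ellipse frame
    a = major / 2.0                                    # semi-major axis
    b = a / ratio                                      # semi-minor axis
    r = np.sqrt((u / a) ** 2 + (v / b) ** 2)           # r = 1 on the ellipse boundary
    # Clip the exponent to avoid overflow far from the ellipse (value is ~0 there).
    blob = 1.0 / (1.0 + np.exp(np.minimum((r - 1.0) / softness, 700.0)))
    return blob / blob.max()

def add_calcification_cluster(background, n_calcs=30, target_ratio=1.5,
                              major=4.0, rng=None):
    """Add a cluster of synthetic calcifications; return (image, groundtruth).

    target_ratio (mean calcification pixel / mean background pixel) is a
    hypothetical placeholder for the ratio measured on real mammograms.
    """
    rng = np.random.default_rng(rng)
    h, w = background.shape
    image = background.astype(float).copy()
    bg_mean = float(background.mean())
    for _ in range(n_calcs):
        # Image edges lie 6 standard deviations from the centre.
        centre = rng.normal([w / 2.0, h / 2.0], [w / 12.0, h / 12.0])
        ratio = max(1.0, rng.normal(2.0, np.sqrt(0.5)))   # mean 2, variance 0.5
        blob = soft_ellipse(background.shape, centre, major, ratio,
                            rng.uniform(0.0, np.pi))
        footprint = blob > 0.5                          # never empty: the peak is 1
        # Scale so that the mean over the footprint reaches target_ratio * bg_mean.
        scale = (target_ratio * bg_mean - image[footprint].mean()) / blob[footprint].mean()
        image += max(scale, 0.0) * blob                # purely additive insertion
    diff = image - background                          # contribution of the calcifications
    groundtruth = diff > 0.1 * diff.max()              # low threshold drops the soft tails
    return image, groundtruth

image, groundtruth = add_calcification_cluster(np.full((64, 64), 0.5), rng=0)
```

Negative scales (which can occur where earlier calcifications already brighten the footprint) are clipped to zero so that insertion stays additive.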


Our first method adds a two-dimensional Gaussian to the fractal background.

This method produces very diffuse borders. An example of such an image is

shown in Figure 7.6a.

Our other two methods use concentric discs, where the central disc has uniform

pixel values and the annulus transitions from the uniform value to zero. The

first of these methods assumes that the compressed mass has a cross-section

as illustrated in the top graph in Figure 7.4. The cross-section of the annular

region is semi-circular with radius k. The function f(d), which describes the X-ray attenuation in this model (i.e. the depth of the mass), is given by:

$$
f(d) =
\begin{cases}
0, & d > m,\\
1, & d < m - k,\\
\sqrt{1 - \left(\dfrac{d - (m - k)}{k}\right)^{2}}, & \text{otherwise},
\end{cases}
\tag{7.1}
$$

where d is the distance from the centre of the concentric discs, m is the distance along d of the simulated mass boundary and k is the difference between the radii of the two discs. The value of f(d) describes the thickness of the simulated mass, which, for m − k < d < m, is a chord perpendicular to the d axis. We call this method the circle chord method. An example of the simulated mass images produced using the circle chord method is shown in Figure 7.6b.
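Equation 7.1 translates directly into a vectorised function. In this sketch (the names `circle_chord` and `radial_distance` are hypothetical), clipping the normalised annulus coordinate to [0, 1] reproduces the two constant cases, so no explicit branching is needed; the example values m = 20 and k = 8 pixels are arbitrary.

```python
import numpy as np

def circle_chord(d, m, k):
    """Equation 7.1: normalised thickness of the simulated mass at distance d.

    f = 1 inside the central disc (d < m - k), falls off as a quarter circle of
    radius k across the annulus, and is 0 beyond the mass boundary (d > m).
    """
    d = np.asarray(d, dtype=float)
    t = np.clip((d - (m - k)) / k, 0.0, 1.0)   # 0 at the inner radius, 1 at the boundary
    return np.sqrt(1.0 - t ** 2)                 # clipping reproduces the constant cases

def radial_distance(size):
    """Distance of every pixel in a size x size image from the image centre."""
    yy, xx = np.mgrid[0:size, 0:size]
    c = (size - 1) / 2.0
    return np.hypot(xx - c, yy - c)

# Example: a 64 x 64 mass image with boundary radius m = 20 and annulus width k = 8.
mass = circle_chord(radial_distance(64), m=20, k=8)
```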

The second variant of the concentric discs model uses a sigmoid function to describe the attenuation in the annular region:

$$
f(d) = \frac{1}{1 + e^{-\alpha\left[1 - \frac{1}{k}\left(d - (m - k)\right)\right]}},
\tag{7.2}
$$

where d, m and k are as before. The constant α determines the shape of the sigmoid; we use a value of 6. An illustration of the sigmoid attenuation function is shown in Figure 7.5. An example of the simulated mass images produced using the sigmoid method is shown in Figure 7.6c.

Figure 7.4: The circle chord attenuation function. The top graph shows a cross-section of the model of the compressed mass. The bottom graph shows the attenuation function, f(d).

Figure 7.5: The sigmoid attenuation function.
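A corresponding sketch for the sigmoid method evaluates Equation 7.2 in the annulus and, assuming the same concentric-discs structure as the circle chord method, sets f = 1 inside the central disc and f = 0 outside the mass boundary. The names `sigmoid_profile` and `render_mass` and the example values of m and k are hypothetical.

```python
import numpy as np

ALPHA = 6.0   # value of alpha used in the text

def sigmoid_profile(d, m, k, alpha=ALPHA):
    """Mass thickness profile using the sigmoid of Equation 7.2 in the annulus.

    Assumes f = 1 inside the central disc (d < m - k) and f = 0 outside the
    mass (d > m), as in the concentric-discs description.
    """
    d = np.asarray(d, dtype=float)
    x = 1.0 - (d - (m - k)) / k                  # 1 at the inner edge, 0 at the outer edge
    f = 1.0 / (1.0 + np.exp(-alpha * x))
    f = np.where(d < m - k, 1.0, f)              # uniform central disc
    return np.where(d > m, 0.0, f)               # nothing outside the mass

def render_mass(size, m, k, profile):
    """Evaluate a radial profile on a size x size grid centred in the image."""
    yy, xx = np.mgrid[0:size, 0:size]
    c = (size - 1) / 2.0
    return profile(np.hypot(xx - c, yy - c), m, k)

mass = render_mass(64, m=20, k=8, profile=sigmoid_profile)
```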


Figure 7.6: Examples of simulated masses using the three methods. Gaussian method (a); circle chord method (b); sigmoid method (c).

For each of the methods, the magnitude of f(d) was scaled so that the ratio of the

mean mass pixel value to the mean non-mass pixel value was equal to that found

in real mass images (i.e. we attempted to model mass brightness accurately). We judged by visual inspection that the sigmoid method produces acceptable test images.

Because the novelty detection algorithm is computationally expensive, we limited

the number of pixels to be analysed by cropping the mass images to contain an

equal number of mass and non-mass pixels. The groundtruth was generated by

computing the difference image between the original fractal images and the result

after adding the synthetic mass.
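The brightness scaling, the groundtruth and the cropping described above can be sketched as follows. Here the mass pixels are taken to be those where the unscaled profile is positive, the target ratio is a placeholder for the value measured from real mass images, and the crop is assumed to be a centred square whose area is twice that of the mass disc, which gives roughly equal numbers of mass and non-mass pixels. All names are hypothetical. Because the mean over the mass pixels is linear in the scale s, it has the closed form $s = (R\,\bar{b}_{\text{non-mass}} - \bar{b}_{\text{mass}})/\bar{f}_{\text{mass}}$, where R is the target ratio, $\bar{b}$ denotes mean background values and $\bar{f}_{\text{mass}}$ the mean profile value over the mass.

```python
import numpy as np

def scale_to_ratio(background, profile, target_ratio):
    """Scale a mass profile f(d) so that, after addition to the background,
    mean(mass pixels) / mean(non-mass pixels) equals target_ratio.

    Mass pixels are those where the unscaled profile is positive.
    """
    mass = profile > 0
    s = (target_ratio * background[~mass].mean()
         - background[mass].mean()) / profile[mass].mean()
    return s * profile

def groundtruth_from_difference(original, with_mass):
    """Groundtruth mask: pixels changed by adding the synthetic mass."""
    return (with_mass - original) > 0

def crop_equal_area(image, radius):
    """Centred square crop with area about 2 * pi * radius**2, so that a centred
    disc of that radius covers roughly half of the cropped pixels."""
    side = int(round(radius * np.sqrt(2.0 * np.pi)))
    r0 = (image.shape[0] - side) // 2
    c0 = (image.shape[1] - side) // 2
    return image[r0:r0 + side, c0:c0 + side]
```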

Although the mass region can easily be identified by eye, its pixel values are not necessarily higher than those of the fractal background. This is the case when, for example, a mass profile is added to a relatively dark background region. If every mass pixel value (after addition to the fractal background) were higher than every pixel value in the fractal image without the simulated mass, then simply thresholding the images would identify the mass region. However, this is not the case.

7.4.4 Results

Results for simulated microcalcifications

Figure 7.7(a) shows a simulated microcalcification image and Figure 7.7(b) the corresponding log-likelihood image. Figure 7.7(c) shows the ROC curve for all simulated microcalcification images. The area under the curve is approximately 0.92.

Results for simulated masses

Figure 7.8 shows one of the simulated masses and the corresponding log-likelihood

image. Figure 7.8(c) shows the ROC curve for all simulated mass images. The

area under the curve is approximately 0.64.

Results for simulated mass and microcalcifications (combined)

Figure 7.9 shows the ROC curve for the experiment where the single novelty

detection method is used to detect both simulated masses and microcalcifications

(in equal proportions). The area under the curve is approximately 0.75.


Figure 7.7: Example log-likelihood image and ROC curve for simulated microcalcifications. An example simulated microcalcification image (a); the corresponding log-likelihood image (b); the ROC curve for the simulated calcification images (c). The log-likelihoods range from −276 to −0.5.


Figure 7.8: Example log-likelihood image and ROC curve for a simulated mass. An example simulated mass image (a); the corresponding log-likelihood image (b); the ROC curve for the simulated mass images (c). The log-likelihoods range from −216 to −0.5.


Figure 7.9: ROC curve for simulated masses and microcalcifications (combined).


Results for real microcalcifications

Figure 7.10 shows a sample microcalcification cluster along with the correspond-

ing groundtruth and log-likelihood images. Figure 7.10(d) shows the ROC curve

for all the test images. The area under the curve is approximately 0.56.

Results for real masses

Figure 7.11 shows the ROC curve for the real mass experiment. The area un-

der the curve is approximately 0.54. We do not show sample images (see the

discussion of these results in Section 7.4.5).

Results for real mass and microcalcifications (combined)

Figure 7.12 shows the ROC curve for the combined real mass and microcalcification experiment. The area under the curve is approximately 0.53. A hypothesis test at the 5% significance level, using the method described by Hanley and McNeil [78], showed a statistically significant difference between the area under the ROC curve and the area under the curve corresponding to random discrimination (i.e. the diagonal line, with area equal to 0.5)³.

³ It was assumed that the diagonal had the same number of data points as the ROC curve.
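The significance test can be sketched as follows, assuming the standard error formula of Hanley and McNeil (1982) for a single ROC area and treating the curve and the chance diagonal as independent estimates with the same numbers of abnormal and normal samples, as the footnote above states. This is an assumed reconstruction, since the exact variant used is not given in this section; the function names are hypothetical.

```python
import math

def hanley_mcneil_se(auc, n_abnormal, n_normal):
    """Standard error of an ROC area (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_abnormal - 1) * (q1 - auc ** 2)
           + (n_normal - 1) * (q2 - auc ** 2)) / (n_abnormal * n_normal)
    return math.sqrt(var)

def z_test_against_chance(auc, n_abnormal, n_normal, z_crit=1.96):
    """Two-sided z test of an ROC area against the chance diagonal (area 0.5).

    Assumes the two areas are independent estimates from samples of the same
    size; z_crit = 1.96 corresponds to the 5% significance level.
    """
    se_curve = hanley_mcneil_se(auc, n_abnormal, n_normal)
    se_chance = hanley_mcneil_se(0.5, n_abnormal, n_normal)
    z = (auc - 0.5) / math.sqrt(se_curve ** 2 + se_chance ** 2)
    return z, abs(z) > z_crit
```

Because a pixel-wise evaluation yields very large sample counts, the standard errors become small: an area of 0.53 is far from significant with 50 samples per class but clearly significant with 50,000. Since neighbouring pixels are correlated, the pixel count overstates the number of independent samples, which is worth bearing in mind when interpreting such a test.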


Figure 7.10: Example log-likelihood image and ROC curve for a real microcalcification cluster. An example real microcalcification image (a); the corresponding groundtruth image (b); the corresponding log-likelihood image (c); the ROC curve for the real microcalcification images (d). The log-likelihoods range from −25 to −0.3.


Figure 7.11: ROC curve for real masses.


Figure 7.12: ROC curve for real microcalcifications and masses (combined).


7.4.5 Discussion

Simulated microcalcifications

As Figure 7.7(b) shows, the synthetic microcalcifications are easily identified, which is reflected in the corresponding ROC curve. There are no false positives in the normal regions; however, because of the strict pixel-wise evaluation criterion, there are a few false positives at the microcalcification edges. These occur where the sampled window is centred on a pixel labelled as normal but contains abnormal pixels. The model is then partially conditioned upon abnormal image data, which biases the conditional model and yields lower log-likelihood values for the centre pixel. We will call this the local bias effect.

Note that the log-likelihoods in the abnormal regions of the simulated microcalcifications are lower than those for the simulated masses, which is consistent with the subjective view that microcalcifications are easier to detect than masses.

Simulated masses

The results for the simulated masses are not as good as for the simulated micro-

calcifications, but Figure 7.8(b) shows that the mass is identified. The annular

region of the simulated mass is marked as being more abnormal than the central

region. This may be because the training images did not contain this sort of intensity change, whereas they did contain uniform texture of similar brightness to the centre of the simulated mass. The log-likelihoods for the central area are close to those of the normal background texture, and this is reflected in the ROC curve in Figure 7.8(c).

Simulated masses and microcalcifications (combined)

Figure 7.9 shows the results of using the same model and method to detect both types of abnormality, and demonstrates that this is possible. Although the data were simulated, this result is important because it shows that more than one type of abnormality can be identified using a single method. Note that Figure 7.9

was constructed from data where the ratio of microcalcification to mass data was

equal to unity, so that performance on one type of abnormality did not contribute

disproportionately.

Real microcalcifications

The ROC curve in Figure 7.10(d) is disappointing and indicates that the method

performs only slightly better than a random classifier (as indicated by the red

diagonal line, which represents chance). Microcalcifications are considered to

be easy to detect because they are often very bright against the mammographic

background. However, as Figure 7.10(a) shows, this is not always true. It appears

that the local bias effect may also contribute to the poor performance. The log-

likelihoods in the calcified area tend to be lower, but the most “unlikely” pixels do

not correspond exactly to the individual microcalcifications—instead they tend

to be a few pixels away. There are pixels in the uncalcified tissue which have

low log-likelihoods, and are essentially false positives. This may be because the

model is not sufficiently specific to pathology-free appearance, because the tissue


really was abnormal (and unannotated) or because it is incorrect to label tissue

so close to a microcalcification cluster as being normal.

Real masses

The results for real masses are similar to those for the real microcalcifications: the

method performs only slightly better than a random classifier. Because we used

separate normal and abnormal test sets, this result adds weight to the hypothesis

that the model is not specific enough to pathology-free appearance. In the case of

masses, it is unreasonable to expect a small local window to detect abnormality. A multi-scale approach in which likelihoods are propagated downwards, such as that used by Liu et al. [120], might perform better.

Real masses and microcalcifications (combined)

Although the performance on real data is relatively poor, this result indicates

that some discrimination of more than one class of abnormality can be achieved

using a single method.

7.5 Summary

This chapter presented an evaluation of the parametric texture model. In sum-

mary:

• A psychophysical evaluation was reported. The experiment was deployed as an Internet-based application. The application was tested by a small

number of participants and then advertised to all students at the University

of Manchester.

• The synthetic textures were not indistinguishable from the real textures,

but were selected in approximately one third of trials.

• The synthetic images generated by Efros and Leung’s algorithm were con-

sidered more realistic than those generated by the parametric model. The

textures generated using the parametric model were selected in 26% and

41% of trials. However, the images generated by the Efros and Leung algo-

rithm used a more specific “training” set than was used to train the para-

metric model. Direct comparison of the two approaches should consider

this experimental bias and the ability of the parametric model to analyse

images via novelty detection.

• A novelty detection experiment was reported. Simulated and real microcal-

cification and mass images were analysed using parametric models. Results

for the simulated data show that the novelty detection approach can success-

fully detect multiple types of abnormality using a single method. Results

for the real data show that some discrimination was possible, but significant

improvement is needed. This may be achieved by improving the specificity

of the model and the adoption of a hierarchical strategy.

Chapter 8

GMMs in principal components

spaces and low-dimensional

texture models

8.1 Introduction

This chapter presents a method for learning Gaussian mixture models in low-

dimensional spaces and describes how the parametric texture model may be im-

proved by doing so. The chapter describes:

• The motivation for learning in low-dimensional spaces.

• How principal components analysis can be used to build Gaussian mixture

models—and hence our parametric texture model—in a low-dimensional


space that approximates the natural space of the data.

• How textures can be synthesised using such a model.

8.2 Dimensionality reduction

The dimensionality of the texture model described in Chapter 6 is reasonably

high. With an 11 × 11 window, for example, the model has 121 dimensions.

Given that there is likely to be a high degree of correlation between neighbouring

pixels in the windows, it is sensible to ask if this redundancy can be exploited.

Dimensionality reduction can have a number of benefits in statistical modelling.

Firstly, because the number of data points required to populate a space with fixed

density increases exponentially with the number of dimensions, dimensionality

reduction can allow one to populate the space to be modelled more densely for

a given size of training set. Secondly, since computations often involve iteration

over the number of dimensions in the modelled space, dimensionality reduction

may allow us to develop more efficient algorithms.

In the following sections we describe how a Gaussian mixture model can be built

in a low-dimensional space and how such a model may be used to perform texture

synthesis and analysis.


8.3 Gaussian mixtures in principal components spaces

A set of multivariate measurements, $\mathcal{X} = \{x_i : i = 1, \dots, N\}$, when thought of

as a cloud of points in a vector space, can be considered to have a set of mutually

orthogonal axes that describe the main directions of variation. In general, these

axes will not be aligned with the regular Cartesian axes (the covariance matrix

of the data is unlikely to be diagonal).

Principal Components Analysis (PCA) [104] is a technique that determines these

axes (the principal components) and the variance associated with each. The

principal components are simply the eigenvectors of the covariance matrix, and

the variances are the associated eigenvalues. If we define P to be a matrix where

each column is an eigenvector of the covariance matrix, then we can project xi, a

measurement in the natural data space, to a vector bi in the principal components

space and back again:

$$
b_i = P^{\mathsf{T}}(x_i - \bar{x}), \qquad x_i = \bar{x} + P b_i,
\tag{8.1}
$$

where $\bar{x}$ is the mean vector. (Note that since P has mutually orthogonal columns, and each is a unit vector, it is orthonormal. Hence $P^{-1} = P^{\mathsf{T}}$.) Another way of thinking about P is that it is the transformation needed to diagonalise the covariance matrix of the original data:

$$
\Sigma_b = P^{\mathsf{T}}\Sigma_x P,
\tag{8.2}
$$

where $\Sigma_b$ is the (diagonal) covariance matrix of the data in the principal components space and $\Sigma_x$ is the covariance matrix of the data in the natural space.

The total variance of the data can be computed by summing the eigenvalues.

Since the eigenvalues describe the variance associated with each dimension of

the principal components space, it is possible to discard eigenvectors with small

associated variances. In this way, dimensionality reduction can be achieved: the

original data can be transformed into a lower-dimensional space while retaining

an arbitrarily large proportion of the total variance. If P is constructed in this

way, then Equation 8.1 becomes approximate:

$$
b_i \approx P^{\mathsf{T}}(x_i - \bar{x}), \qquad x_i \approx \bar{x} + P b_i.
\tag{8.3}
$$

Building a Gaussian mixture model in a principal components space is simple.

Compute $\Sigma_x$ from $\mathcal{X}$, perform an eigendecomposition to determine P (discarding eigenvectors to retain a given proportion of the total variance), and then project each $x_i$ into the lower-dimensional principal components space to form $\mathcal{B} = \{b_i : i = 1, \dots, N\}$. The Gaussian mixture model is then built using the data in $\mathcal{B}$.
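The construction described above (estimate $\Sigma_x$, eigendecompose it, keep enough eigenvectors to retain a chosen proportion of the variance, and project) can be sketched with NumPy as follows. The function names are hypothetical, and the mixture-fitting step itself is omitted: any Gaussian mixture estimator could then be applied to the rows of `B`.

```python
import numpy as np

def fit_pca(X, keep=0.95):
    """PCA of the rows of X, keeping enough eigenvectors to retain `keep` of the
    total variance. Returns (mean, P, eigenvalues); P's columns are the retained
    eigenvectors, ordered by decreasing variance."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)               # ascending order for symmetric input
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    frac = np.cumsum(evals) / evals.sum()
    r = min(int(np.searchsorted(frac, keep)) + 1, evals.size)  # smallest r reaching `keep`
    return mean, evecs[:, :r], evals[:r]

def to_pc(X, mean, P):
    """b_i = P^T (x_i - mean), one row per sample."""
    return (X - mean) @ P

def from_pc(B, mean, P):
    """x_i ~ mean + P b_i, one row per sample."""
    return mean + B @ P.T

# Example: 10-D data lying close to a 3-D subspace.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3)) * [5.0, 3.0, 2.0]
A = np.linalg.qr(rng.normal(size=(10, 3)))[0]         # 10 x 3, orthonormal columns
X = Z @ A.T + 0.01 * rng.normal(size=(500, 10))
mean, P, evals = fit_pca(X, keep=0.95)
B = to_pc(X, mean, P)                                 # training data for the mixture model
```

The covariance of `B` is diagonal with the retained eigenvalues on its diagonal, which is Equation 8.2 restricted to the kept components.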

Once we have a low-dimensional model, we will need to compute conditional


distributions in order to perform synthesis or analysis. It is not possible to ap-

ply conditions directly to the principal components model—as we have done so

far—because the conditions and model exist in different spaces. Recall from

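Section 5.5.2 that the conditional distribution is $p(x_1 \mid x_2) = \mathcal{N}(\mu', \Sigma')$, with

$$
\mu' = \mu_1 + \Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2 - \mu_2),
\tag{8.4}
$$

$$
\Sigma' = \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1}.
\tag{8.5}
$$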

We can partition the matrix P as:

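$$
P = \begin{bmatrix} P_1 \\ P_2 \end{bmatrix},
\tag{8.6}
$$

where the rows of $P_1$ correspond to the unknown dimensions (i.e. $x_1$) and the rows of $P_2$ correspond to the known dimensions (i.e. $x_2$). We can write

$$
\Sigma_x =
\begin{bmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{bmatrix}
\approx P\Sigma_b P^{\mathsf{T}} =
\begin{bmatrix}
P_1\Sigma_b P_1^{\mathsf{T}} & P_1\Sigma_b P_2^{\mathsf{T}} \\
P_2\Sigma_b P_1^{\mathsf{T}} & P_2\Sigma_b P_2^{\mathsf{T}}
\end{bmatrix}.
\tag{8.7}
$$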

Given Equation 8.7, it is straightforward to write approximations of the condi-

tional mean vector and covariance matrix as:

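$$
\begin{aligned}
\mu' &= \mu_1 + \Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2 - \mu_2) \\
&\approx \mu_1 + (P_1\Sigma_b P_2^{\mathsf{T}})(P_2\Sigma_b P_2^{\mathsf{T}})^{-1}(x_2 - \mu_2),
\end{aligned}
\tag{8.8}
$$

$$
\begin{aligned}
\Sigma' &= \Sigma_{1,1} - \Sigma_{1,2}\Sigma_{2,2}^{-1}\Sigma_{2,1} \\
&\approx (P_1\Sigma_b P_1^{\mathsf{T}}) - (P_1\Sigma_b P_2^{\mathsf{T}})(P_2\Sigma_b P_2^{\mathsf{T}})^{-1}(P_2\Sigma_b P_1^{\mathsf{T}}).
\end{aligned}
\tag{8.9}
$$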
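Equations 8.8 and 8.9 can be evaluated for a single mixture component as in the sketch below (the name `pc_conditional` is hypothetical). It also shows why the $P_2\Sigma_b P_2^{\mathsf{T}}$ block can be singular: $P_2$ has only r columns, so the block has rank at most r. The Moore-Penrose pseudo-inverse (`np.linalg.pinv`) is used here only to keep the sketch runnable; Section 8.3.1 reports that this approach did not work reliably in the principal components case.

```python
import numpy as np

def pc_conditional(mu_x, P, Sigma_b, unknown, known, x2):
    """Approximate conditional N(mu', Sigma') of x1 given x2 (Equations 8.8, 8.9)
    for one Gaussian component whose covariance lives in a principal components space.

    mu_x           : component mean in the natural space
    P              : d x r matrix of retained eigenvectors
    Sigma_b        : r x r component covariance in the principal components space
    unknown, known : index arrays selecting the rows P1 and P2 of P
    """
    P1, P2 = P[unknown], P[known]
    S11 = P1 @ Sigma_b @ P1.T
    S12 = P1 @ Sigma_b @ P2.T
    S22 = P2 @ Sigma_b @ P2.T        # rank <= r: singular if len(known) > r
    S22_inv = np.linalg.pinv(S22)    # placeholder remedy, see Section 8.3.1
    mu_cond = mu_x[unknown] + S12 @ S22_inv @ (x2 - mu_x[known])
    Sigma_cond = S11 - S12 @ S22_inv @ S12.T   # S21 = S12^T since Sigma_b is symmetric
    return mu_cond, Sigma_cond
```

When every eigenvector is kept, $P\Sigma_b P^{\mathsf{T}} = \Sigma_x$ exactly, and the function reproduces the natural-space conditional of Equations 8.4 and 8.5.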


It therefore seems likely that there is an elegant way to use Gaussian mixture models in low-dimensional spaces. However, we have not yet considered how to compute the conditional component probabilities, $\{P(i \mid x_2) : i = 1, \dots, k\}$. Unfortunately, computing $\{P(i \mid x_2) : i = 1, \dots, k\}$ (as shown in Equation 5.35) requires $\{P(x_2 \mid i) : i = 1, \dots, k\}$, which are computed by marginalising the model, in its natural space, over the dimensions corresponding to $x_1$. This means that two versions of the model are required: one in the principal components space and one in the natural space. Though this seems awkward, it may be acceptable if working in the principal components space is advantageous in terms of computational efficiency, or if the more densely populated training space leads to better models.
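The conditional component probabilities follow from Bayes' rule, $P(i \mid x_2) \propto P(i)\,p(x_2 \mid i)$, where $p(x_2 \mid i)$ is component i marginalised over $x_1$. For a Gaussian, marginalising simply means keeping the entries of the mean and the block of the covariance that correspond to $x_2$. The sketch below (hypothetical names) assumes that natural-space component means and positive definite covariances are available, and normalises in the log domain for numerical stability.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of a multivariate normal, via a Cholesky factorisation."""
    L = np.linalg.cholesky(cov)                  # cov = L L^T, L lower triangular
    z = np.linalg.solve(L, x - mean)
    return (-0.5 * (z @ z) - np.log(np.diag(L)).sum()
            - 0.5 * len(x) * np.log(2.0 * np.pi))

def component_posteriors(weights, means, covs, known, x2):
    """P(i | x2) for a Gaussian mixture, marginalising each natural-space
    component over the unknown dimensions (keep only the `known` entries)."""
    logp = np.array([np.log(w) + log_gaussian(x2, mu[known], C[np.ix_(known, known)])
                     for w, mu, C in zip(weights, means, covs)])
    logp -= logp.max()                           # log-sum-exp trick
    p = np.exp(logp)
    return p / p.sum()
```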

8.3.1 A numerical issue

In practice, the $P_2\Sigma_b P_2^{\mathsf{T}}$ matrices are close to singular (numerically difficult to invert). Indeed, since $P_2$ has only as many columns as there are retained eigenvectors, $P_2\Sigma_b P_2^{\mathsf{T}}$ has rank at most equal to that number, and is exactly singular whenever there are more known dimensions than retained eigenvectors. Common advice on dealing with this type of problem is to add a scaled identity matrix to the ill-conditioned matrix, which increases the variances in the corresponding distribution; the scalar is determined by the amount of variance to be added. In the current setting, this advice essentially assumes that the $P_2\Sigma_b P_2^{\mathsf{T}}$ matrices are close to singular because the principal components approximation discards some of the variance observed in the data. With multiple model components, it is not clear how to distribute the missing variance. We tried distributing it in two ways: evenly over the components and in proportion to the component probabilities. Neither approach performed satisfactorily.


Although we have found computing the Moore-Penrose generalised inverse to be the most satisfactory way to solve the problem for covariance matrices in the natural data space, this approach does not work reliably in the principal components case. We address this problem by computing the generalised inverse of $\Sigma_b$. If we keep all of the eigenvectors in P, then $\Sigma_x = P\Sigma_b P^{\mathsf{T}}$. By ignoring eigenvectors with small associated eigenvalues, $P\Sigma_b P^{\mathsf{T}}$ becomes a low-rank approximation of $\Sigma_x$, and similarly $P_2\Sigma_b^{-1}P_2^{\mathsf{T}}$ is an approximation of $\Sigma_x^{-1}$.

8.4 Texture synthesis in principal components spaces

The procedure for generating synthetic textures using a Gaussian mixture model

that has been built in a principal components space is identical to the regular

case, except that the computation of the conditional distribution is performed

as described in Section 8.3. Figure 8.1 shows an example training image taken

from the MeasTex database¹ and a synthetic texture generated using a Gaussian mixture model built in a principal components space. The model retained 95% of the total variance and the texture was produced using the patch-wise algorithm. This is an example of successful synthesis; we were not able to synthesise mammographic textures to this level of quality using principal components models.

¹ Currently available at http://www.cssip.uq.edu.au/meastex/meastex.html



Figure 8.1: Synthesis using a principal components model. A training image is shown in (a) and a synthetic image is shown in (b).

8.5 Discussion

Our aims in building reduced dimensionality models were:

• To build better models by exploiting data redundancy to more densely

populate the space of training examples.

• To achieve faster training, synthesis and analysis.

Although the second of these aims is partially achieved (training is faster, but the

projections used to compute conditionals result in slower synthesis), it is generally

at the expense of the quality of synthesis (and presumably analysis). Although

the texture shown in Figure 8.1 is one of the best synthetic textures in this thesis,

textures generated using principal components models were generally not as good

as textures from models built in the natural space. The approximation used in


the computation of the inverse of the $P_2\Sigma_b P_2^{\mathsf{T}}$ matrices degrades the quality of

synthesis (see below).

There is a way to gain the advantages of dimensionality reduction without suffering the disadvantages. The model can be built in the low-dimensional principal components space and the entire model then projected into the natural space for all subsequent processing (i.e. synthesis and analysis). Model building is thus accelerated, and little degradation in the quality of the synthetic textures is observed (which suggests that the loss in quality noted above can be attributed to the approximation used in computing the inverse of the $P_2\Sigma_b P_2^{\mathsf{T}}$ matrices). Further, we do not need to project the component covariance matrices each time a conditional distribution is computed, so synthesis and analysis can be performed at normal speed. However, there does not appear to be any noticeable benefit from the more densely populated training space in the models that we have built. The procedure is as follows: the model is built in the principal components space as described in Section 8.3, and then projected into the natural space:

$$
\mu_{x,i} = \bar{x} + P\mu_{b,i}, \quad \forall\, i \in \{1, \dots, k\},
\tag{8.10}
$$

$$
\Sigma_{x,i} = P\Sigma_{b,i}P^{\mathsf{T}}, \quad \forall\, i \in \{1, \dots, k\},
\tag{8.11}
$$

where $\mu_{x,i}$ and $\mu_{b,i}$ are the i-th component mean vectors in the natural and principal components spaces respectively, and $\Sigma_{x,i}$ and $\Sigma_{b,i}$ are the i-th component covariance matrices in the natural and principal components spaces respectively. The component probabilities, $\{P(i) : i = 1, \dots, k\}$, are unaffected by the projection.
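Equations 8.10 and 8.11 amount to one matrix product per component, as in this sketch (the name `project_gmm_to_natural` is hypothetical). Note that with r retained eigenvectors each projected covariance $P\Sigma_{b,i}P^{\mathsf{T}}$ has rank at most r, so it is singular in the natural space whenever r is smaller than the window dimension; any later inversion in the natural space therefore has to cope with a singular matrix, for example via a generalised inverse.

```python
import numpy as np

def project_gmm_to_natural(mean, P, means_b, covs_b):
    """Equations 8.10 and 8.11: map each mixture component from the principal
    components space to the natural space. Component weights are unchanged."""
    means_x = [mean + P @ mu_b for mu_b in means_b]       # mu_x,i = mean + P mu_b,i
    covs_x = [P @ S_b @ P.T for S_b in covs_b]            # Sigma_x,i = P Sigma_b,i P^T
    return means_x, covs_x
```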

8.6 Summary

This chapter presented a method for learning the parameters of a Gaussian mix-

ture model—and hence a parametric texture model—in low-dimensional spaces.

In summary:

• There are two reasons why building Gaussian mixture models in low-dimensional spaces might be useful. First, compared to higher-dimensional spaces, fewer training points are required to populate a low-dimensional space at a given density, so it may be possible to build more specific models. Second, algorithms often iterate over the dimensions of the data, so working in a low-dimensional space is likely to yield more efficient algorithms.

• A method to learn the parameters of Gaussian mixture models in principal

components spaces was developed. The closed-form method of computing

conditional distributions was extended to the principal components model

and a numerical issue arising from this was addressed.

• It is not straightforward to marginalise a principal components model over

dimensions in the natural space. This problem makes working in a principal

components space less attractive.

• A method for synthesising textures from a principal components parametric

texture model was described.


• A synthetic texture, generated from a principal components model, was

presented. Although it is possible to achieve excellent results using the

approach, results for principal components models were much more variable

than for the models built in the natural data space.

• Gaussian mixture models can be built in low-dimensional spaces and then

projected into the natural data space. This allows models to be built in

more densely populated spaces in less time and used as if they had been

built in the natural data space.

Chapter 9

A generative statistical model of

entire mammograms

9.1 Introduction

This chapter presents a parametric statistical model of the appearance of entire

mammograms. The chapter describes:

• Why mammograms are difficult to model.

• Approaches other authors have used to solve the problem.

• The structure of our model.

• How the model parameters are learned from training data.

• How synthetic mammograms can be generated using the model.



9.2 Background

9.2.1 Why are mammograms hard to model?

Mammograms are difficult to model because they vary dramatically in appearance

and are digitised at high resolution; their appearance is highly detailed. In this

section we will discuss the sources of this variation and comment on how it affects

the images. Figure 9.1 illustrates the effects of these sources of variation.

Size and shape variation

It is apparent that women's breasts vary in size and shape. This variation is partly due to natural variation between individuals, but is also related to lifestyle (as breasts store fat, overweight or obese women are likely to have larger, more fatty breasts [54]). In addition to this natural variation, the apparent size and shape of breasts in mammograms varies due to the imaging process (e.g. the degree of compression). Compare Figure 9.1(b) and Figure 9.1(e).

Anatomical variation

Women's breasts also vary in their composition. The proportion of glandular to fatty tissue (the density) is variable, with post-menopausal women usually having almost entirely fatty breasts. The number and configuration of ducts varies between women, and the imaging process may capture them to varying degrees. Mammograms are digitised at high resolution and the resulting images are therefore large and contain a lot of detail. Another form of variation that might be considered "anatomical" is that introduced by surgery (e.g. lumpectomy or augmentation mammoplasty), but we do not consider these types of variation in this work. Compare Figure 9.1(g), a breast with a well-defined fibro-glandular tissue region, to Figure 9.1(h), which is almost entirely fatty.

Figure 9.1: Examples of mammographic variation, panels (a) to (l). These images are to scale.

Variation in the imaging process

Due to the manual placement of the breast in the X-ray equipment, features such as the nipple or pectoral muscle may be absent, partially imaged, or fully or partially obscured. While such images can be interpreted by trained clinicians, these issues pose a significant challenge to computer-based methods, which often rely upon reliable points of reference. Further, it is not feasible for radiographers to take more care in the acquisition process, because of the natural variation between women and because the compression part of the process is uncomfortable or painful. Compare Figure 9.1(k), where the pectoral muscle is not imaged, to Figure 9.1(a), where the pectoral muscle is included. The breast in Figure 9.1(h) has a poorly-defined border (probably due to the placement and compression of the breast), while the border in Figure 9.1(c) is better defined. Non-uniformity of the intensity of the X-ray illumination field (e.g. the anode heel effect) can also result in a visible difference in density over an X-ray image. In the next section, a review of research on modelling and synthesising the appearance of entire mammograms is presented.


9.2.2 Approaches to modelling the appearance of entire mammograms

The most common approach to modelling the appearance of entire mammograms

for synthesis and analysis is physics-based. Bakic et al. [10] developed a 3-D

model of the physical distribution of the various tissue types. They modelled

compression of the breast and the X-ray image formation process to generate

simulated X-rays.

Taylor et al. [176] developed a 3-D model of breast development to allow simulated

mammograms to be generated. A voxel-based cellular automaton was initialised

with a rudimentary ductal structure that represented a breast prior to maturation.

Voxels contained a mixture of fatty and glandular tissue. The ductal structure

was developed by allowing it to branch and grow. This development was driven

by simulated branching and growth agents that had either promotive or inhibitive

effects. A parameterised breast surface model was developed using data from real

women. Synthetic mammograms were formed by simulating the compression of

the breast and the projection of X-rays. The paper does not give examples of

synthetic mammograms, but some examples are available on the author’s website

[146].

Bliznakova et al. developed a highly detailed model of the structure of the breast

[16]. The authors modelled the breast surface, ductal system, terminal ductal

lobular units, Cooper’s ligaments, pectoral muscle, 3-D parenchymal texture and

several forms of abnormality. They separately modelled the X-ray image forma-

tion process. The breast shape was modelled simply as two geometrical primitives.


Large, medium and small-sized breasts were modelled separately. The ductal sys-

tem was modelled as a tree structure composed of cylindrical components and a

probabilistic model was used to characterise the branching. The 3-D parenchymal

texture was simulated by mapping 2-D fractal textures into the volumetric space.

Cooper’s ligaments were modelled by thin ellipsoidal shells occurring in random

locations within the breast. Masses were modelled by ellipsoids, spiculations by a

series of connected cylinders and microcalcification clusters by collections of small

ellipsoids. The characteristics of the abnormalities were controlled by user input.

The authors report that simulated mammograms could be generated in less than

5 minutes on a 2 GHz Intel Pentium 4 processor. Subjectively, their results are

impressive. The authors conducted a psychophysical experiment to determine the

extent to which expert radiologists could differentiate real from simulated regions

of interest. Regions of interest of size 40 mm × 40 mm were inspected on screen

and the radiologists correctly identified 80% of the simulated normal patches,

67% of the real normal patches, 87% of the simulated calcification patches, 96%

of the simulated mass patches and 100% of both the real calcification and mass

patches.

Although physics-based models are useful for investigating the image acquisition process (e.g. patient positioning, radiation dose, breast compression and deformation), the synthetic images they produce are, subjectively, usually not particularly realistic, and these methods are not intended to support image analysis.

Another important approach to modelling the appearance of objects in images is the Active Appearance Model (AAM) [47], which models shape and shape-free appearance. The work in this chapter is closely related to the AAM, and we give a brief overview of the method below.

Overview of AAMs

An AAM is a model of the shape and appearance of a particular class of object

in an image—e.g. a face or an anatomical structure—combined with a search

strategy that allows instances of the object to be located in previously unseen

images. The following discussion focuses on the model itself, rather than the

search strategy.

An AAM consists of shape and appearance sub-models, which are statistically coupled. The shape sub-model is built by annotating each image in a training set (each containing an instance of the object of interest) with landmarks. These landmarks are typically positioned on salient features of the object being modelled and must correspond across the training set (e.g. when modelling faces, if the 5th landmark identifies the left corner of the left eye, then it must do so in all of the training images). Each landmark in a 2-D image is represented by an (x, y) coordinate. If each image contains N landmarks then the landmark coordinates for each image can be concatenated to form a vector with 2N elements, which is called a shape vector. Each of these vectors can be considered to be a point in a 2N-dimensional space. There is likely to be significant redundancy in each shape vector because the positions of landmarks will be correlated within each image and across the training set. This correlation is exploited using Principal Components Analysis (PCA, which was presented in Chapter 8). PCA allows each training shape vector to be projected into a low-dimensional space, so that each shape vector in the training set has a corresponding shape parameter in the principal components space.

Figure 9.2: Overview of the Active Appearance Model. The left-most image shows one of the training images with landmarks (in green). Nine samples from the AAM are shown on the right. The top row shows the mean appearance warped to three synthetic shapes sampled from the shape sub-model. The middle row shows three samples from the appearance sub-model warped to the mean shape. The bottom row shows three joint samples (i.e. from both model components), illustrating how the model can represent a range of legal instances of human faces.

The appearance sub-model is built by warping each object in the training set

to the mean shape. This removes spatial variation from the training set—which

has already been learned—leaving “textural” variation. Triangles can be defined

between the landmarks in each training image. Image intensities are sampled

within the triangles of each warped training image. The result is a set of vectors of

intensities, one for each training image. There is a dense correspondence between

the elements of the vectors. PCA is applied to exploit the redundancy in these

texture vectors yielding a set of texture parameters in a low-dimensional space.

The shape and texture parameters for each training image are concatenated and

a further PCA is performed. This couples shape and appearance and exploits

any correlation between the two. The result is a set of low-dimensional vectors

that describes both shape and appearance of the objects in the training set. The

distribution of these vectors is modelled, typically using a multivariate normal

distribution. The model can be sampled and the corresponding vector recon-

structed to form a synthetic object in the image plane. The model can also be

used to constrain the AAM search strategy. An example is shown in Figure 9.2,

which illustrates an AAM of the human face. The figure shows landmarks for

one training image and nine samples from the AAM.
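The coupling step described above (concatenate shape and texture parameters, apply a further PCA, model the result with a multivariate normal and sample) can be sketched in Python with NumPy. This is a minimal illustration, not the AAM implementation of [47]: the function name `combined_model` is hypothetical, all combined modes are kept for brevity, and the scalar weight `r` that balances shape and texture units is an assumption (a common choice in the AAM literature, not stated in this text).

```python
import numpy as np

def combined_model(b_shape, b_tex, rng, n_samples=3):
    """Couple shape and texture parameters with a second PCA, then sample.

    b_shape: (n_images, d_s) shape parameters, one row per training image.
    b_tex:   (n_images, d_g) texture parameters for the same images.
    Returns (shape_samples, texture_samples) drawn from a Gaussian model
    of the combined parameters.
    """
    # Assumed weighting: rescale shape parameters so that their total variance
    # matches that of the texture parameters (the two are in different units).
    r = np.sqrt(b_tex.var(axis=0).sum() / b_shape.var(axis=0).sum())
    joint = np.hstack([r * b_shape, b_tex])
    mean = joint.mean(axis=0)
    # Further PCA on the concatenated parameters (all modes kept for brevity).
    _, _, vt = np.linalg.svd(joint - mean, full_matrices=False)
    modes = vt.T                       # columns are the combined modes
    c = (joint - mean) @ modes         # combined parameters, one row per image
    # Model the combined parameters with a multivariate normal and sample it.
    c_new = rng.multivariate_normal(c.mean(axis=0), np.cov(c, rowvar=False),
                                    size=n_samples)
    recon = mean + c_new @ modes.T     # back to the concatenated space
    d_s = b_shape.shape[1]
    return recon[:, :d_s] / r, recon[:, d_s:]
```

Each returned shape sample would then be reconstructed into landmark coordinates and each texture sample into intensities, and the texture warped to the shape to give a synthetic instance.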

AAMs rely on finding sufficient redundancy in the intensity information across a set of training images to dramatically reduce the dimensionality of the appearance space, thus making it possible to train the model with a reasonably small set of images (e.g. 30). However, the highly detailed nature of mammograms would not be captured by the AAM approach: fine-scale mammographic structure does not correspond across women, so it does not exhibit this kind of redundancy.

9.3 Modelling and synthesising entire mammograms

We assume that the mammograms we will model are all of the same view,

e.g. mediolateral oblique (MLO) or craniocaudal (CC); in practice we have

worked with the MLO view. We decompose the problem of modelling mam-

mograms, combining an AAM-like model of global shape and appearance with

a wavelet-based model of stationary texture, allowing us to bypass the curse of

dimensionality [14]. The model is composed of three sub-models: a model of

shape, a model of approximate appearance1, and a model of local textural ap-

pearance. We have trained a model using a set of 36 mammograms from the Dig-

ital Database for Screening Mammography [83]. The number of training images

was limited by a computational “bottleneck”—solving the shape correspondence

problem—which we discuss further in the next section. After outlining a series

of pre-processing steps, we describe each of these sub-models and show how they

can be combined to synthesise mammograms.

1Shape and approximate appearance are jointly modelled.


9.3.1 Breast shape and the correspondence problem

We assume a training set of images B, which are mammograms with the non-

breast regions (e.g. markers) set to black, and normalised such that all breasts

“point” to the right (i.e. the nipple is on the right). As in an AAM, a statistical

shape model (SSM) is used to cope with size and shape variation. A set of land-

mark points is required to define the shape of the breast in each image. These

landmarks must correspond across the training set. In the SSM framework, land-

marks are often manually placed and chosen to correspond to easily identifiable

image features (e.g. the tips of the fingers when modelling hand shapes). As

mammograms lack reliable features, we seek to automate annotation. A naïve approach is to use landmarks placed at regular intervals on the breast border,

starting at a reliable location (e.g. the right-most point of the breast, the approx-

imate location of the nipple). Such landmarks are a good first approximation.

However, synthetic shapes generated from a shape model built using landmarks

with such correspondences are subjectively unrealistic. This is demonstrated in

the top row of Figure 9.3. Better correspondences are required.

Approaches proposed by Kotcheff and Taylor [115] and Davies et al. [51, 50] seek to improve correspondences across a set of training shapes. The idea is to search over parameterisations of the training shapes to find a model that best describes the training set. The training shapes are re-parameterised using a set of monotonic mappings which guarantee that the landmark ordering is preserved across the training set, and so the mapping is diffeomorphic2. These methods aim to find an optimal set of re-parameterisations (according to some concept of goodness), and so the problem is posed as an optimisation.

2A diffeomorphism is a smooth, invertible mapping whose inverse is also smooth; informally, it does not tear or fold the manifold.

Figure 9.3: Samples from two shape models, illustrating the need for good correspondences. Top row: samples from a shape model built using regularly-spaced landmarks. Bottom row: samples from a shape model built using optimal correspondences. Examples of real mammogram shapes, taken from the training set, are shown in Figure 9.6.

The main difference between the two methods is the choice of the objective func-

tion. Kotcheff and Taylor’s method uses the determinant of the (re-parameterised)

shape model’s covariance matrix, which effectively measures the hyper-volume oc-

cupied by the training set in shape space; minimising the measure yields more

compact models. The method proposed by Davies et al. uses an information-theoretic measure of model quality. Their objective function computes the number of bits required to transmit the training set by encoding it using a (re-parameterised) shape model. In order for a receiver to understand the message,

the model must also be transmitted, and so contributes to the objective function.

The authors refine the piecewise-linear re-parameterisation method presented by

Kotcheff and Taylor using the integral of a sum of Cauchy kernels, ensuring that

the re-parameterisation functions are differentiable and hence more suited for

use in an optimisation scheme. Additionally, an efficient optimisation scheme is

presented.
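To make the compactness criterion concrete, here is a minimal NumPy sketch of a determinant-based objective of the kind Kotcheff and Taylor use. Taking the logarithm, and adding a small constant `delta` to each eigenvalue so that the value stays finite when there are fewer shapes than coordinates, are assumptions of this sketch; the re-parameterisation search itself is not shown, and the function name is hypothetical.

```python
import numpy as np

def kotcheff_taylor_objective(shapes, delta=1e-3):
    """log det(Sigma + delta * I) of the landmark covariance; smaller is more compact.

    shapes: (n_shapes, 2 * N) array; each row concatenates N (x, y) landmarks.
    """
    centred = shapes - shapes.mean(axis=0)
    cov = centred.T @ centred / (len(shapes) - 1)
    eigvals = np.linalg.eigvalsh(cov)
    # Clip tiny negative round-off; delta keeps the log finite when the
    # covariance is rank-deficient (usually fewer shapes than coordinates).
    return float(np.sum(np.log(np.maximum(eigvals, 0.0) + delta)))
```

For example, landmarks placed at the same angles on circles of different radii give a rank-one covariance and hence a lower value than the same circles with landmarks shifted by a random phase per shape, which mimics poor correspondence.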

Although the method of Davies et al. is rigorously justified, it is closely approx-

imated by that of Kotcheff and Taylor, whose method generally finds a good

solution to the correspondence problem more quickly. We initialise Kotcheff and

Taylor’s scheme with regularly-spaced landmarks, and run their optimisation.

We refine the improved correspondences using the minimum description length

(MDL) scheme of Davies et al. Figure 9.4 shows the value of the Kotcheff and

Taylor objective function as a function of the iteration number for our training

set and Figure 9.5 shows the objective function values for the MDL algorithm3.

3Note that the two methods’ objective functions are expressed in different units.


Figure 9.4: Values of the Kotcheff and Taylor objective function.

Figure 9.6 shows selected points from the initialisation and final solution, and

illustrates how the correspondences are improved. Although the difference is subtle, and the solution is not necessarily intuitive, Figure 9.3 shows the importance of good correspondences: improved correspondences yield an SSM that successfully limits illegal variation.

Figure 9.5: Values of the MDL objective function. The solution found by running the Kotcheff and Taylor algorithm (see Figure 9.4) is refined in this run of the MDL algorithm.

The final shape model has the form:

$$s = \bar{s} + P_s b_s \qquad (9.1)$$

where $s$ is a shape parameterised by $b_s$, $\bar{s}$ is the mean shape, and $P_s$ is a matrix whose columns are a set of eigenvectors of the shape data covariance matrix, sufficient for the model to retain a given proportion of the total variance of the original data. The number of retained eigenvectors, $d_s$, is typically much smaller than the dimensionality of the original space, $d_s \ll 2N_l$ (where there are $N_l$ landmarks for each image), so the distribution of shape parameters can be learned from a reasonably small training set.
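Equation 9.1, together with the variance-retention rule, can be sketched with NumPy as follows. The sketch obtains the eigenvectors from an SVD of the centred data (equivalent to eigen-decomposing the covariance matrix) and uses 90% retained variance as its default, matching the value reported for the shape model in Section 9.4; the function names are hypothetical.

```python
import numpy as np

def build_shape_model(shapes, keep=0.90):
    """Fit s = s_bar + P_s b_s, keeping enough modes for a `keep` fraction of variance.

    shapes: (n_shapes, 2 * N_l) array; each row concatenates N_l (x, y) landmarks.
    Returns (s_bar, P_s), where P_s has d_s orthonormal columns.
    """
    s_bar = shapes.mean(axis=0)
    # Rows of vt are eigenvectors of the covariance, in order of decreasing variance.
    _, svals, vt = np.linalg.svd(shapes - s_bar, full_matrices=False)
    var = svals**2 / (len(shapes) - 1)
    frac = np.cumsum(var) / var.sum()
    # Smallest d_s whose cumulative variance fraction reaches `keep`.
    d_s = min(int(np.searchsorted(frac, keep)) + 1, len(var))
    return s_bar, vt[:d_s].T

def to_params(s, s_bar, P_s):
    """Project a shape onto the model: b_s = P_s^T (s - s_bar)."""
    return P_s.T @ (s - s_bar)

def from_params(b_s, s_bar, P_s):
    """Reconstruct a shape from its parameters (Equation 9.1)."""
    return s_bar + P_s @ b_s
```

Because the columns of `P_s` are orthonormal, projecting with the transpose and reconstructing with Equation 9.1 recovers a training shape up to the discarded variance.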

The computational cost of optimising the correspondences increases with the number of training shapes. This limits the number of training images that can be used to build the model of entire mammographic appearance; this is the "bottleneck" mentioned in the previous section. It should be possible to lift this limit and use an arbitrarily large number of training images: correspondences could be optimised for a relatively small set of training shapes, and an active shape or appearance model could then be built and used to locate the breast border in the other training images [165]. Correspondences in these other training images would then be defined implicitly. However, this work is beyond the scope of this thesis.

Figure 9.6: The initial and final correspondences for the mammogram shape model. The figure shows every 10th point for 6 of the 36 training shapes. The top row shows the initial positions, and the bottom row shows the final solution.

9.3.2 Approximate appearance

We consider mammographic appearance to have two components: an approx-

imate appearance (the general global appearance of the mammogram) and a

detailed appearance (the local textural details, see Section 9.3.3). In this section

we describe how approximate appearance is modelled. We address the appear-

ance correspondence problem, describe how approximate appearance is separated

from detailed appearance and describe how approximate appearance is related to

breast shape.

The appearance correspondence problem

To model the approximate appearance, we need to consider the appearance cor-

respondence problem. If we could guarantee that all mammograms contain the

same features, then we could define dense correspondences between the contents

of a set of mammograms. This is not the case and, due to anatomical differences

between women, there are no underlying correspondences that can be exploited.

We choose to cope with this form of variation implicitly by approximately registering breasts to a canonical shape and then learning the variation in appearance (the mean shape $\bar{s}$ provides a natural canonical reference shape). For each segmented breast in B, we use a thin plate spline [20] to warp it to the mean shape, yielding a set N of segmented breasts in a shape-normalised space. The thin plate spline does not guarantee diffeomorphic transformations, but since we do not use control points within the breast region, and the order of control points is preserved by the correspondence optimisation algorithms, the resulting warps are well-behaved. An alternative approach would be to use a non-rigid registration algorithm to define the correspondences between points within the breast region, but we have not investigated this.
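A thin plate spline driven by landmark correspondences can be sketched in NumPy as follows. This is a generic textbook formulation (radial kernel U(r) = r² log r plus an affine term), not necessarily the implementation used in this work, and `fit_tps` is a hypothetical name. To warp an image to the mean shape, one would typically fit the spline from the mean-shape landmarks to the image's own landmarks and then, for each output pixel, sample the original image at the mapped position.

```python
import numpy as np

def _tps_kernel(r):
    # U(r) = r^2 log r, with the limit U(0) = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r**2 * np.log(r)
    return np.nan_to_num(u)

def fit_tps(src, dst):
    """Fit a 2-D thin plate spline f with f(src[i]) = dst[i].

    src, dst: (n, 2) arrays of corresponding control points.
    Returns a function mapping an (m, 2) array of points through the spline.
    """
    n = len(src)
    K = _tps_kernel(np.linalg.norm(src[:, None] - src[None], axis=-1))
    P = np.hstack([np.ones((n, 1)), src])       # affine basis [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])    # side conditions P^T w = 0
    params = np.linalg.solve(A, rhs)            # (n + 3, 2)
    w, a = params[:n], params[n:]

    def warp(pts):
        U = _tps_kernel(np.linalg.norm(pts[:, None] - src[None], axis=-1))
        return U @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ a
    return warp
```

The spline interpolates the control points exactly and reproduces any purely affine mapping everywhere, which is why it behaves well when the control points lie on the breast border only.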

The steerable pyramid decomposition

In modelling mammographic appearance, we would like to be able to treat the appearance of each mammogram as a point in an appearance space so that the

appearance could be modelled using straightforward statistical methods. Al-

though the number of pixels in a mammogram is very large, if we could exploit

redundancy in the shape-normalised appearance—for example by defining dense

correspondences—we might be able to populate the appearance space sufficiently

for density estimation to be successful. Unfortunately, this is not the case. To

overcome this problem we use a hierarchical decomposition called the steerable

pyramid [164]. This is a wavelet-like image decomposition developed for use in

texture modelling and synthesis. Images are decomposed in terms of multiple

scales and orientations using directional derivative basis functions which range in

scale and orientation. This allows the coarse and fine structure of the images to

be treated separately within a single framework.

The steerable pyramid was selected because it decomposes images in terms of scale and orientation (which have been found to be useful in mammography applications, see Chapter 4); it has been used successfully in texture modelling and synthesis [143, 144]; the decomposition is motivated by knowledge of biological vision ([183] discusses the work of Hubel [93] and Wiesel [184]); and there is a freely available implementation4. See Section 9.3.3 for further notes on the use of the steerable pyramid.

Figure 9.7 shows a block diagram of the decomposition. Analysis is shown on the left-hand side. The image is separated into high- and low-pass sub-bands using filters $H_0$ and $L_0$. The low-pass sub-band is then separated into a series of oriented bandpass sub-bands and another low-pass sub-band using filters $\{B_i\}$ and $L_1$. This low-pass sub-band image is then sub-sampled by a factor of two in each direction and the result passed recursively to $\{B_i\}$ and $L_1$, as indicated by the dark circle and shaded region in Figure 9.7. Synthesis is shown on the right-hand side of Figure 9.7, and involves reversing the analysis steps.

We can think of the steerable pyramid decomposition as having a structure similar to a quad-tree. The pyramid has a number of levels which correspond to scale, and range from coarse (a few pixels square) to fine (the same size as the original image). Each level has a number of oriented sub-band images. In addition, there is a coarse low-pass sub-band and a fine high-pass sub-band. Although there are more coefficients in the pyramid than pixels in the original image, the hierarchical structure of the pyramid allows us to decompose our modelling problem further. We can consider the top part of the pyramid (the coarse levels) separately from the bottom part of the pyramid (the fine levels). Figure 9.8 shows the top three pyramid levels for a mammogram.

4The steerable pyramid software is currently available at http://www.cns.nyu.edu/~eero/steerpyr/

Figure 9.7: Block diagram for the steerable pyramid decomposition. Analysis is shown on the left and synthesis is shown on the right. The dark circle indicates the recursive computation of the shaded region. The $\{B_i\}$ filters compute the oriented sub-band images.
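The claim that the pyramid holds more coefficients than the image has pixels is easy to check by counting. The sketch below assumes a layout resembling Figure 9.7 (a full-resolution high-pass band, oriented levels that halve in size from full resolution, and a final low-pass band); the band sizes of the implementation actually used may differ, and the function name is hypothetical. Under this layout, with K orientations the total approaches (1 + 4K/3) times the number of pixels for large images.

```python
def steerable_pyramid_size(height, width, n_levels, n_orientations):
    """Count coefficients in a steerable pyramid under an assumed band layout.

    Assumed layout: one full-resolution high-pass band; n_levels levels of
    n_orientations oriented bands, level j of size (height >> j, width >> j);
    and a residual low-pass band one halving below the last oriented level.
    """
    total = height * width                                   # high-pass band
    for j in range(n_levels):
        total += n_orientations * (height >> j) * (width >> j)
    total += (height >> n_levels) * (width >> n_levels)      # low-pass band
    return total

# A 64 x 64 image with 3 levels of 4 orientations: about 6.3 coefficients per pixel.
n = steerable_pyramid_size(64, 64, 3, 4)
print(n, n / (64 * 64))
```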

The approximate appearance model

We want to be able to represent the general appearance of a mammogram in a

way that allows us to subject it to statistical analysis. We decompose each image

in N to form the set of pyramids P. For each pyramid in P, we concatenate the coefficients in the top few pyramid levels into a vector $a$. This vector describes the approximate appearance of the shape-normalised mammogram. We again perform PCA, yielding

$$a = \bar{a} + P_a b_a. \qquad (9.2)$$

Figure 9.8: The coefficients in the top three levels of a steerable pyramid decomposition of a mammogram. The breast is oriented so that the nipple points downwards. The oriented sub-bands are shown as L-shaped sets of images and the final low-pass image is in the top-right corner of the image. The arrows indicate the orientation of the filters used to compute the five sub-band images. The high-pass image is not shown.

Initially, the coefficients in each pyramid level are effectively measured on different intensity scales. In order to use a covariance matrix to model the distribution of such data (either for its own sake, or to perform PCA), it is best to use a common scale. We normalise the data in each dimension either to z-scores as described by Equation 9.3 [45], or to a common scale using a robust M-estimator of spread [145], depending upon the characteristics of the data. If $x_i$ is a data point from a sample with mean $\bar{x}$ and standard deviation $\sigma$ then $z_i$, the z-score for $x_i$, is given by:

$$z_i = \frac{x_i - \bar{x}}{\sigma}. \qquad (9.3)$$

For simplicity, the conversion to and from these standard scales is assumed in the

rest of this chapter.
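Equation 9.3 and its inverse can be sketched in NumPy as below. This covers only the z-score option, not the robust M-estimator, and the guard for zero-variance dimensions is an addition of this sketch; the function names are hypothetical.

```python
import numpy as np

def fit_zscore(X):
    """Per-dimension mean and standard deviation of the rows of X."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    # Guard: leave constant dimensions unscaled instead of dividing by zero.
    return mean, np.where(std > 0, std, 1.0)

def to_z(X, mean, std):
    """Equation 9.3 applied to every dimension."""
    return (X - mean) / std

def from_z(Z, mean, std):
    """Inverse of Equation 9.3, returning data to its original scale."""
    return Z * std + mean
```

In use, the statistics are fitted once on the training data, the data are converted to z-scores before PCA or covariance estimation, and synthesised values are converted back before pyramid reconstruction.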


A joint model of shape and approximate appearance

We have described how we can model mammographic appearance in the shape-

normalised space. To perform synthesis we need to be able to warp to a plausible

shape. A naïve approach would be to model the distribution of the shape parameters $b_s$ and then sample from it. However, this would not take into consideration the fact that there may be a relationship between the appearance of a mammogram and its size and shape. For example, fatty breasts tend to be large, while glandular breasts tend to be small. The approach we take is to model the joint distribution of shape parameters and approximating appearance parameters and then condition this model on the approximating parameters to yield a model of plausible shapes for the generated mammogram. We use a single multivariate Gaussian:

$$p(b_s, b_a) = p(b_j) = \mathcal{N}(m_j, \Sigma_j), \qquad (9.4)$$

where $b_j$ is the concatenation of $b_s$ and $b_a$.
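Fitting the joint Gaussian of shape and approximate appearance parameters, and drawing joint samples from it, can be sketched in NumPy as follows; the function names are hypothetical. With only a few dozen training images the covariance will usually be rank-deficient. `numpy.random.Generator.multivariate_normal` accepts positive semi-definite covariances, so sampling still works, although explicit conditioning would then need care (e.g. a pseudo-inverse).

```python
import numpy as np

def fit_joint_model(b_s, b_a):
    """Fit p(b_s, b_a) = N(m_j, Sigma_j) to per-image parameter pairs.

    b_s: (n_images, d_s) shape parameters; b_a: (n_images, d_a) approximate
    appearance parameters. Each row of b_j concatenates one image's pair.
    """
    b_j = np.hstack([b_s, b_a])
    return b_j.mean(axis=0), np.cov(b_j, rowvar=False)

def sample_joint(m_j, sigma_j, d_s, rng, n=1):
    """Draw n (b_s, b_a) pairs; the first d_s entries of b_j are shape parameters."""
    b_j = rng.multivariate_normal(m_j, sigma_j, size=n)
    return b_j[:, :d_s], b_j[:, d_s:]
```

Sampling the joint vector and splitting it is equivalent to sampling $b_a$ first and then $b_s$ from its conditional distribution, so any correlation between breast shape and appearance in the training set carries over to the samples.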

9.3.3 Detailed appearance

The approximating model provides a first approximation to the mammographic appearance, but does not include any information from the lower pyramid levels. We call these the detailing levels. A parent vector is defined as the set of coefficients on a path through a pyramid at locations corresponding to a particular pixel in the original image [53]. The parent vector contains information about the local image behaviour at a particular location, from the coarsest level to the finest. Using a notation similar to that in Figure 9.7, a parent vector, $b_t(x, y)$, corresponding to a particular (x, y) location in the original image is given by:

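$$
\begin{aligned}
b_t(x, y) = \Big[\, & H_0(x, y),\; B^1_0\big(\lfloor x/2^1 \rfloor, \lfloor y/2^1 \rfloor\big),\; B^1_1\big(\lfloor x/2^1 \rfloor, \lfloor y/2^1 \rfloor\big),\; \cdots, \\
& B^2_0\big(\lfloor x/2^2 \rfloor, \lfloor y/2^2 \rfloor\big),\; B^2_1\big(\lfloor x/2^2 \rfloor, \lfloor y/2^2 \rfloor\big),\; \cdots,\; L_{M-1}\big(\lfloor x/2^{M-1} \rfloor, \lfloor y/2^{M-1} \rfloor\big) \Big]^T \qquad (9.5)
\end{aligned}
$$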

where there are $M$ levels, $H_0(x, y)$ is the coefficient at (x, y) in the high-pass band, $B^j_i(x, y)$ is the coefficient at (x, y) in the $i$-th oriented sub-band at the $j$-th level, $L_{M-1}(x, y)$ is the coefficient at (x, y) in the low-pass band and the subscript $t$ in $b_t$ indicates texture. The floor function, which returns the largest integer that is less than or equal to $x$, is denoted by $\lfloor x \rfloor$. It serves here to map each image location to the corresponding location in the sub-sampled sub-bands. In the remainder of this chapter we will drop the (x, y) indexing notation, as we assume that the detailed textural component of mammographic appearance is stationary. This assumption makes the problem of modelling detail tractable. It is reasonable because we might expect local detail to depend only on tissue type, which is modelled implicitly by the approximating model.
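Extracting a parent vector according to Equation 9.5 amounts to integer division of the pixel coordinates at each level. In the sketch below, the band storage (`bands[j - 1][i]` for the i-th oriented sub-band at level j, arrays indexed [row, column]) is an assumption made for illustration; real pyramid implementations store their bands differently, and the function name is hypothetical.

```python
import numpy as np

def parent_vector(highpass, bands, lowpass, x, y):
    """Collect the coefficients on the path through the pyramid above pixel (x, y).

    highpass: full-resolution high-pass band H_0.
    bands:    bands[j - 1][i] is the i-th oriented sub-band at level j (j >= 1),
              stored at 1 / 2**j of the original resolution.
    lowpass:  low-pass band at level M - 1 = len(bands) + 1.
    Arrays are indexed [row, column], i.e. [y, x].
    """
    coeffs = [highpass[y, x]]
    for j, level in enumerate(bands, start=1):
        for band in level:
            coeffs.append(band[y >> j, x >> j])   # floor(y / 2**j), floor(x / 2**j)
    m1 = len(bands) + 1
    coeffs.append(lowpass[y >> m1, x >> m1])
    return np.array(coeffs)
```

For non-negative integer coordinates, a right shift by j bits is exactly the floor of division by 2^j, so neighbouring pixels share their coarse-level coefficients, as in a quad-tree.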

We consider a parent vector $b_t$ to be a point in a high-dimensional vector space. A suitable model of the distribution of parent vectors, $p(b_t)$, would allow the detailing levels to be populated by sampling the model, conditioned upon the coefficients in the approximating levels. This approach is motivated by previous work on hierarchical texture modelling by De Bonet and Viola [53] and Sajda et al. [159]. Multivariate Gaussian and Gaussian mixture representations are ideal for this purpose, as there is a closed-form solution for the conditional Gaussian (see Section 5.5). Here, $p(b_t)$ is modelled as a single Gaussian, $p(b_t) = \mathcal{N}(\mu_t, \Sigma_t)$.
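The conditioning step used to populate the detailing levels can be sketched with the standard closed-form conditional of a multivariate Gaussian. This is a minimal NumPy illustration under assumptions: the parent vector is ordered fine-to-coarse as in Equation 9.5, so the first `n_detail` entries are detailing coefficients and the remainder are approximating ones; the function names are hypothetical; and Equation 5.33 itself is not reproduced here.

```python
import numpy as np

def fit_parent_vector_model(samples):
    """Fit p(b_t) = N(mu_t, Sigma_t) to parent vectors, one per row."""
    return samples.mean(axis=0), np.cov(samples, rowvar=False)

def sample_detail(mu, sigma, n_detail, approx_coeffs, rng):
    """Draw detailing coefficients conditioned on the approximating coefficients.

    Conditional of a Gaussian (d = detailing, a = approximating):
      mean = mu_d + S_da S_aa^{-1} (x_a - mu_a)
      cov  = S_dd - S_da S_aa^{-1} S_ad
    """
    d, a = slice(0, n_detail), slice(n_detail, None)
    gain = np.linalg.solve(sigma[a, a], sigma[a, d]).T   # S_da S_aa^{-1}
    cond_mu = mu[d] + gain @ (approx_coeffs - mu[a])
    cond_sigma = sigma[d, d] - gain @ sigma[a, d]
    cond_sigma = 0.5 * (cond_sigma + cond_sigma.T)       # remove round-off asymmetry
    return rng.multivariate_normal(cond_mu, cond_sigma)
```

Solving the linear system rather than forming the inverse of the approximating-block covariance is the usual numerically safer choice.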


Although we use the steerable pyramid, the choice of decomposition is probably

not critical: other authors have reported success with hierarchical conditioning of

wavelet coefficients for texture modelling, synthesis and analysis and presumably

use different decompositions (e.g. [159, 53]).

9.3.4 Generating synthetic mammograms

Algorithm 11 describes how synthetic mammograms are generated.

Algorithm 11 Generating a synthetic mammogram
  • Simultaneously sample an approximate appearance parameter, $b_a$, and a shape parameter, $b_s$, from the joint model $p(b_s, b_a)$. This is equivalent to sampling one and then conditionally sampling the other.
  • Reconstruct the approximating steerable pyramid coefficients by projecting $b_a$ back to the natural space to yield the corresponding $a$.
  for each (x, y) location within the shape-normalised breast region do
    • Sample the parent vector at the current location. The detailing coefficients will be unpopulated. Compute the distribution of detailing coefficients by conditioning the model of $p(b_t)$ on the approximating coefficients in the sampled parent vector, using Equation 5.33.
    • Sample from this conditional distribution, and place the sampled detailing coefficients into the parent vector at the current location.
    Note that because the steerable pyramid may not be a perfect quad-tree, the above two steps are implemented as iterations over the pyramid levels.
  end for
  • Reconstruct the fully-populated pyramid to form the corresponding image in the shape-normalised space.
  • Project the shape parameter $b_s$ to its natural space, yielding the shape that corresponds to the parameter.
  • Warp the reconstructed image to the sampled shape.


9.4 Example synthetic mammograms

We selected 36 pathology-free MLO mammograms—ranging in size, shape and

appearance—from the Digital Database for Screening Mammography (DDSM)

[83] and built a model of mammographic appearance as described in Section 9.3.

The training set had relatively few images because the optimisation of the breast

boundary landmark correspondences is computationally expensive. Such a small

training set cannot represent the full variation in mammographic appearance,

although the synthetic images generated by the model are subjectively quite

realistic (future work should investigate whether realistic results can also be achieved with models trained on larger training sets).

One hundred (100) landmark points were used to define the breast boundary. We

used 7 pyramid levels—including the high- and low-pass sub-bands—each with 5

orientations. The top three pyramid levels were included in the approximating

model. These had 159 420 coefficients prior to PCA. We found that retaining

90% of the total variance in the shape model and 99% in the approximating appearance models yielded compact models that produced convincing results when

sampled. One hundred thousand (100 000) locations within the breast regions

were randomly selected and the corresponding parent vectors were extracted.

Their distribution was modelled using a single multivariate Gaussian component,

as described in Section 9.3.3.

Building the model took approximately 24 hours (most of this time was spent computing the optimal shape correspondences). Producing a synthetic mammogram takes approximately 2.5 hours (almost all of this time is spent sampling the conditional parent vectors)5. Figure 9.9 shows some synthetic mammograms that were generated using our model and Figure 9.10 shows some synthetic mammograms alongside a real mammogram.

9.5 Summary

This chapter presented a generative statistical model of the appearance of entire

mammograms. In summary:

• The appearance of entire mammograms is difficult to model because of the

variation between women, variability in the imaging process and the high

resolution of the images.

• Our model is composed of components that model the breast shape, the

approximate appearance and detailed texture. Detailed texture is assumed

to be stationary. The three model components are statistically coupled, so

that plausible synthetic mammograms can be generated.

• The breast shape model can be learned by solving the shape boundary

landmark correspondence problem. Algorithms developed by Kotcheff and Taylor,
and by Davies et al., were used to solve this problem.

• Synthetic mammograms can be generated by sampling from the joint model

of shape and approximate appearance and sampling detailing coefficients

using a hierarchical conditioning method.

⁵ Timings are for a computational server with a 2.8 GHz Intel Xeon processor and 2 GB of RAM.


Figure 9.9: Synthetic mammograms generated using the model.


Figure 9.10: Real and synthetic mammograms. A real mammogram is shown on the left and three synthetic mammograms are shown on the right.

Chapter 10

Evaluating the synthetic mammograms

10.1 Introduction

This chapter presents an evaluation of the synthetic mammograms produced by

the model described in the previous chapter. The chapter describes:

• A qualitative evaluation of the synthetic mammograms by an expert mam-

mography radiologist.

• A quantitative psychophysical evaluation of the synthetic mammograms.

• An evaluation of the detailing component of the model.


10.2 Qualitative evaluation by a mammography expert

An expert mammography radiologist evaluated our synthetic mammograms in
a psychophysical experiment. We printed real and synthetic mammograms onto
good-quality A4 paper using a high-quality laser printer. For the real mammograms,
we used the mammograms from N (i.e. with markers and other non-breast regions
removed). One should not generally test using a training set. However, we believed
that the synthetic mammograms, although quite realistic, would not be good enough
to convince an expert radiologist. Testing with training data would therefore not
bias the experiment.

We presented a “shuffled” set of 13 real and 13 synthetic full-resolution mammograms
to the radiologist and asked them to rank the mammograms according to how
realistic they were. The radiologist was quickly able to sort the mammograms into
the two sets. Although they were able to identify the synthetic mammograms,
their feedback was positive, and the most useful feedback was obtained in informal
discussion. The radiologist said that some of the synthetic mammograms looked
‘quite realistic’. One of the ways they could identify the synthetic images was
by the lack of blood vessels, lymph nodes and benign calcifications. Such
structures lie at the boundary between the approximate appearance model and the
detailing texture model, and are not captured by our current model. The radiologist
pointed out that our synthetic mammograms were ‘a little fuzzy’ and lacked
‘dark regions’; the latter criticism can probably be attributed to the relatively
small training set. The radiologist said that one of the synthetic mammograms
(a large, fatty breast) was unrealistic. The radiologist was dismissive of the quality
of other examples of synthetic mammograms and mammographic textures in the
literature, and considered our synthetic images to be superior (though it should be
noted that realism is not the aim of some of these methods).

10.3 A quantitative psychophysical evaluation

10.3.1 Aims

Aware that the lack of blood vessels, lymph nodes and benign calcifications made

the difference between the real and synthetic mammograms more obvious, we

wanted to determine whether the two classes could be distinguished when such

features could not be used as prompts.

10.3.2 Method

We formed sets of 7 real and 7 synthetic mammograms at low resolution. The real
mammograms were manually selected so that the set did not contain any with
very strong vascular clues. The synthetic set contained mammograms generated
using our model. Some “fatty” synthetic mammograms were excluded, because
similar real mammograms had often been excluded from the real set for containing
strong vascular clues¹. All selected mammograms were reduced in size so that the
remaining vascular clues could not easily be perceived in the set of real mammograms.
The resulting images were small (approximately 200 × 140 pixels), but the synthetic
mammograms contained contributions from both the approximating and detailing
models at this resolution. Each real mammogram in the set was paired with each
synthetic mammogram to form a test set of 49 pairs. The number of images used
was limited by the time available to synthesise the set of synthetic mammograms.
In any case, we were not sufficiently confident that the synthetic mammograms
would be realistic enough to be mistaken for the real mammograms, so a larger
experiment would not have been justifiable.

¹ It is also the case that the fatty mammograms generated using our model were deemed to be less realistic by the expert mammography radiologist.

We recruited five participants² and allowed them to study a training set of 6 real

mammograms, scaled to fit within a 1024 × 768 pixel computer display. They

then performed a forced choice experiment, in which they were asked to guess

the real mammogram from each of the 49 possible pairings of real and synthetic

mammograms.
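The 49-pair design is the full Cartesian product of the 7 real and 7 synthetic images. Five participants each judging all 49 pairs gives the 245 trials analysed in the results. The sketch below builds such a trial list. The text does not say whether presentation order or the on-screen side of the real image was randomised, so both are assumptions here. The function name is hypothetical.

```python
import itertools
import random

def make_trials(real_ids, synthetic_ids, seed=None):
    """Every (real, synthetic) pairing in shuffled order. Randomising which side
    the real image appears on is an assumption, not stated in the text."""
    rng = random.Random(seed)
    trials = []
    for real, synth in itertools.product(real_ids, synthetic_ids):
        pair = [("real", real), ("synthetic", synth)]
        rng.shuffle(pair)  # hypothetical left/right randomisation
        trials.append(tuple(pair))
    rng.shuffle(trials)    # hypothetical presentation-order randomisation
    return trials
```

For example, `make_trials(range(7), range(7), seed=0)` yields 49 trials, and five participants give 5 × 49 = 245 judgements.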

10.3.3 Results

At the end of the experiment, the participants were asked if they could tell the

difference between the real and synthetic mammograms: none of the subjects

believed that they had been able to identify the real mammograms reliably.

χ² analysis (see Section 7.2.2) showed that one participant did no better than
random at the 5% significance level. The other participants differed significantly
from random, but consistently mistook the synthetic mammograms for the real
ones. Between them, the participants correctly identified 75 real mammograms
out of 245 (31%). If we allow the consistent misclassification to count as correct
identification of the real mammograms, the participants collectively identified 191
real mammograms out of 245 (78%).

² Computer vision researchers from the division of Imaging Science and Biomedical Engineering at the University of Manchester.

10.3.4 Discussion

These results show that, although the reduced resolution synthetic images are not

always indistinguishable from real mammograms, they are sufficiently convinc-

ing to make discrimination difficult. The fact that several subjects consistently

selected the synthetic mammograms as the real ones implies that the differences

were very subtle. It is interesting that the participants did not think they could

tell the difference, even though the statistical analysis indicates otherwise. It is

possible that the results can be attributed to the relatively small set of images

used to train the model, the small number of images used to “train” the non-expert
readers, or the selection of the real images used in the experiment. It

would therefore be unwise to generalise the above result.
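The χ² analysis of Section 7.2.2 is not reproduced here. The sketch below assumes a Pearson χ² goodness-of-fit test of one participant's 49 correct/incorrect responses against chance (p = 0.5), with one degree of freedom. It shows why participants who consistently chose the synthetic image still "differ significantly from random": the test is two-sided. The function names are hypothetical.

```python
CHI2_CRIT_DF1_ALPHA05 = 3.841  # tabulated chi-squared critical value, df = 1, alpha = 0.05

def chi2_vs_chance(n_correct, n_trials):
    """Pearson chi-squared statistic for (correct, incorrect) counts
    against the counts expected under random guessing."""
    expected = n_trials / 2.0
    observed = (n_correct, n_trials - n_correct)
    return sum((o - expected) ** 2 / expected for o in observed)

def differs_from_chance(n_correct, n_trials=49):
    """True if the participant's score differs from chance at the 5% level,
    in either direction (consistently right or consistently wrong)."""
    return chi2_vs_chance(n_correct, n_trials) > CHI2_CRIT_DF1_ALPHA05
```

Under these assumptions, a score of 17 or fewer, or 32 or more, out of 49 is significant. A participant who picks the synthetic image in most trials is therefore flagged as non-random, just like one who picks the real image.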

10.4 Evaluating the detailing model

It is difficult to show the contribution made by the detailing model—either on

screen or in print—by examining entire mammograms, because of the high res-

olution of the images. Using a region of interest makes the contribution to the

textural appearance visible.


Figure 10.1 shows the contribution made by the detailing levels to regions of

interest from a real and synthetic mammogram. The left-hand column shows

contributions for a real mammogram and the right-hand column shows contributions
for a synthetic mammogram. The top row shows the contributions made by the
finest pyramid level, the second row shows the contributions made by the finest
and next-finest pyramid levels, and so on. The bottom row shows the contributions
made by all detailing levels. These contribution images were computed

by taking the pixel-wise differences between regions that were reconstructed with

and without the corresponding detailing levels. The real mammogram was se-

lected to be subjectively similar in appearance to the synthetic mammogram (to

allow comparison of the contribution images) and the regions of interest were

extracted from approximately the same location in each image. The detail model

can be evaluated by comparing the textural characteristics of the real and syn-

thetic contribution images.
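The contribution images are pixel-wise differences between reconstructions made with and without the k finest detailing levels. The model itself uses a steerable pyramid with 5 orientations. To stay self-contained, the sketch below substitutes a much simpler Laplacian-style pyramid built from 2 × 2 block averaging and nearest-neighbour upsampling. The procedure is the same, but the band structure is not the thesis's. All function names are hypothetical.

```python
import numpy as np

def down(img):
    """2x2 block average (assumes even image dimensions)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):
    """Nearest-neighbour upsampling by a factor of 2."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_pyramid(img, n_levels):
    """Return [L_0 (finest detail), ..., L_{n-1}, G_n (low-pass residual)]."""
    bands, g = [], img.astype(float)
    for _ in range(n_levels):
        g_next = down(g)
        bands.append(g - up(g_next))  # detail lost by this downsampling step
        g = g_next
    bands.append(g)
    return bands

def reconstruct(bands):
    """Invert build_pyramid exactly by adding the details back, coarse to fine."""
    g = bands[-1]
    for detail in reversed(bands[:-1]):
        g = up(g) + detail
    return g

def contribution(bands, k):
    """Difference between reconstructions with and without the k finest detail levels."""
    without = [np.zeros_like(b) if i < k else b for i, b in enumerate(bands)]
    return reconstruct(bands) - reconstruct(without)
```

Calling `contribution(bands, 1)`, `contribution(bands, 2)`, … gives the cumulative rows of a figure like Figure 10.1. `np.std` of the results can then compare the spread of real and synthetic contributions.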

The images in the top row of Figure 10.1 subjectively have almost identical tex-

tures. The coefficients at this level are likely to represent high-frequency signals

such as “noise”. Subjectively, the images in the second row are texturally very

similar, but the real mammographic data has a slightly larger range. The images

in the third row are also subjectively similar, but the real data contains structure

corresponding to curvilinear features. This leads the real data to have a larger

range than the synthetic data. The images in the fourth row—showing all contri-

butions made by the detailing coefficients—are subjectively similar, but there are

large contributions made by curvilinear features in the real data. The histograms

of the contribution images in the bottom row show that the distributions of the
difference values are approximately normal. The standard deviation of the real
data is approximately twice that of the synthetic data.

Figure 10.1: Contributions of detailing coefficients to real and synthetic mammograms. Left-hand column: contributions for a real mammogram. Right-hand column: contributions for a synthetic mammogram. See text for details.

Although the contribution images in Figure 10.1 correspond to different mammograms,
they allow us to draw some conclusions about how the detailing model works with
the approximating model to synthesise mammographic texture. Given approximating
coefficients for two similar mammograms, the detailing model is subjectively
successful in capturing the characteristics of the finest two levels. Subjectively,
the second-coarsest level is also modelled reasonably well. The coarsest detailing
level is not modelled particularly well. This is because the detailing model assumes
stationarity, but in reality this level is dominated by curvilinear structures. These
structures feed down to the second-coarsest detailing level to some extent, and some
small curvilinear structures are also found at that level. Similar results are obtained
when detailing coefficients are sampled for the approximating coefficients of a real
mammogram.

The contribution images show that a single multivariate Gaussian component
adequately models the detailed texture component of mammograms. There is
little evidence to suggest that a more complex model (such as a mixture of
Gaussians) would dramatically improve the stationary aspects of the detailed
texture. However, it is clear that modelling curvilinear structures is vital to
reproducing the detailed texture. These long-range structures tend to be most
evident in the coarsest detailing level, and the model cannot currently capture them.
Learning legal configurations of curvilinear features within a statistical framework
is likely to be a significant challenge. One approach to this problem would be to
extract networks of curvilinear structures using a method such as that presented
by Zwiggelaar and Marti, and to statistically model the length, width, tortuosity
and branching of the extracted structures [191]. By learning the joint distribution
of these features and the approximating parameters, it may be possible to determine
and synthesise the correct types of curvilinear networks for a particular type of breast.
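The proposal needs per-structure descriptors that a statistical model could learn. As a small illustration, the sketch below computes two of them for a single curvilinear structure. It assumes the structure's centreline has already been extracted as an ordered list of points, for example by a detector such as Zwiggelaar and Marti's, which is not implemented here. Tortuosity is taken as the arc-chord ratio, which is one common definition but not one the text specifies. Width and branching would need the full network and are omitted. The function names are hypothetical.

```python
import numpy as np

def polyline_length(points):
    """Arc length of an ordered centreline given as (x, y) points."""
    pts = np.asarray(points, dtype=float)
    return float(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))

def tortuosity(points):
    """Arc-chord ratio: 1.0 for a straight segment, larger for curved paths."""
    pts = np.asarray(points, dtype=float)
    chord = float(np.linalg.norm(pts[-1] - pts[0]))
    if chord == 0.0:
        raise ValueError("tortuosity is undefined when the end points coincide")
    return polyline_length(pts) / chord
```

For example, an L-shaped path through (0, 0), (3, 0) and (3, 4) has length 7 and chord 5, giving a tortuosity of 1.4.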

10.5 Summary

This chapter presented an evaluation of synthetic mammograms generated using

the model developed in the previous chapter. In summary:

• An expert mammography radiologist could easily distinguish between real

and synthetic mammograms. However, they commented that some of the

synthetic mammograms looked ‘quite realistic’. The lack of blood vessels,

lymph nodes and benign calcifications allowed the synthetic mammograms

to be identified.

• A quantitative psychophysical evaluation of reduced resolution synthetic

mammograms showed that, in general, the synthetic mammograms could

be differentiated from real mammograms, but not very reliably. One par-

ticipant could not distinguish between the two classes at all and the other

participants consistently misclassified the synthetic mammograms as real,

reporting that they could not tell the difference between the two classes.

The results indicate that, at low resolution, the synthetic mammograms are

sufficiently realistic that differentiating real and synthetic mammograms is


difficult.

• An evaluation of the contribution made by the detailing model shows that,

while local textural detail is successfully captured, the model cannot capture

the appearance of curvilinear structures. As the qualitative evaluation by

the expert mammography radiologist showed, these structures allow the

real and synthetic mammograms to be easily differentiated. A method of

modelling these structures was proposed.

Chapter 11

Summary and conclusions

11.1 Introduction

This chapter presents:

• A summary of the work presented in this thesis.

• The conclusions that may be drawn from the work.

• A final statement.

11.2 Summary

• Chapter 2 presented background information on breast cancer, the clinical

problem and the various imaging modalities that are used to diagnose the



disease. Breast cancer is a significant public health problem and many

countries have X-ray mammography screening programmes. The image

inspection task is performed visually and is subject to human error.

• Chapter 3 presented a review of the computer-aided mammography liter-

ature. CADe algorithms typically extract shape and texture features from

candidate locations and use classifiers to differentiate between true and false

detections of specific indicative signs of abnormality. Commercial systems

are available and have been shown to improve radiologist performance; however,
they do not always do so. Psychophysical research

has suggested that a false positive rate much lower than that achieved by

current commercial systems is required for significant improvement in radi-

ologist performance. Much more sophisticated approaches may be required

to achieve such targets. One such method is novelty detection, which re-

quires a model of normal mammographic appearance that can measure

deviation from normal appearance. Statistical models should allow this

deviation to be measured within a rigorous mathematical framework. If

novelty detection is to be used, then the underlying model must be able

to “legally” represent any pathology-free instance and be unable to legally

represent abnormal instances. The only way to verify this is to be able

to generate instances from the model; thus the model must be generative.

Further, generative models make it relatively easy to visualise what has

been modelled successfully and what has not.

• Chapter 4 described work on improving the way that scale-orientation

pixel signatures are computed. Two flaws with an existing implementation


were identified and a new method of computing signatures was developed.

An information theoretic measure of signature quality showed that, com-

pared to the original method of computing pixel signatures, the new method

increased signature information content by approximately 19%. A classi-

fication experiment was reported in which signatures computed using the

two methods were used to discriminate between pixels belonging to normal

and spiculated lesion tissues. The new signatures outperformed the original

signatures in terms of both specificity and sensitivity.

• Chapter 5 presented background information on the multivariate normal

distribution and the Gaussian mixture model. The Gaussian mixture model

is a flexible solution to the density estimation problem. Model parame-

ters can be learned using the k-means and Expectation-Maximisation algo-

rithms. Both the marginal and conditional distributions can be computed

for a Gaussian mixture model in closed-form; these distributions are them-

selves Gaussian mixture models. It is straightforward to sample from a

Gaussian mixture model.
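The closed-form conditional of a Gaussian mixture model is itself a Gaussian mixture. This is what makes hierarchical conditioning and sampling practical. The sketch below assumes full covariance matrices and NumPy. It conditions each component with the standard Gaussian formulae and reweights the components by how well each explains the observed dimensions. The function names are hypothetical, and this is not the thesis implementation. With a single component, as used for the parent vectors in Chapter 9, it reduces to ordinary conditional-Gaussian sampling.

```python
import numpy as np

def gauss_logpdf(x, mean, cov):
    """Log density of a multivariate normal distribution at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(mean) * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

def condition_gmm(weights, means, covs, idx_b, x_b):
    """Condition a GMM over x = (x_a, x_b) on observed x_b (indices idx_b).
    Returns the weights, means and covariances of the GMM over x_a."""
    idx_a = np.setdiff1d(np.arange(means.shape[1]), idx_b)
    log_w, new_m, new_c = [], [], []
    for w, m, c in zip(weights, means, covs):
        c_aa = c[np.ix_(idx_a, idx_a)]
        c_ab = c[np.ix_(idx_a, idx_b)]
        c_bb = c[np.ix_(idx_b, idx_b)]
        gain = np.linalg.solve(c_bb, c_ab.T).T          # c_ab @ inv(c_bb)
        new_m.append(m[idx_a] + gain @ (x_b - m[idx_b]))
        new_c.append(c_aa - gain @ c_ab.T)
        # New weight is proportional to w_k times the marginal density of x_b under component k
        log_w.append(np.log(w) + gauss_logpdf(x_b, m[idx_b], c_bb))
    log_w = np.array(log_w)
    w_new = np.exp(log_w - log_w.max())                 # subtract max for numerical stability
    return w_new / w_new.sum(), np.array(new_m), np.array(new_c)

def sample_gmm(weights, means, covs, n, rng):
    """Draw n samples: pick a component by weight, then sample that Gaussian."""
    ks = rng.choice(len(weights), size=n, p=weights)
    return np.array([rng.multivariate_normal(means[k], covs[k]) for k in ks])
```

For example, conditioning a single bivariate Gaussian with unit variances and correlation 0.8 on x₂ = 1 gives mean 0.8 and variance 0.36 for x₁.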

• Chapter 6 presented Efros and Leung’s algorithm for texture synthesis

and developed the method into a parametric statistical model of texture

that can be used in both generative and analytical modes. Methods of

synthesising and analysing textures were developed and synthetic images

were presented.

• Chapter 7 presented a psychophysical evaluation of synthetic mammo-

graphic textures produced by the parametric model. The synthetic textures

were not indistinguishable from the real textures, but were selected in ap-


proximately one third of trials. The synthetic images generated by Efros

and Leung’s algorithm were considered more realistic than those gener-

ated by the parametric model; the textures generated using the parametric

model were selected in 26% and 41% of trials. However, the images gen-

erated by the Efros and Leung algorithm used a more specific “training”

set than was used to train the parametric model. Direct comparison of the

two approaches should consider this experimental bias and the ability of

the parametric model to analyse images via novelty detection.

Simulated and real microcalcification and mass images were analysed using

parametric models. Results for the simulated data show that the novelty

detection approach can successfully detect multiple types of abnormality

using a single method. Results for the real data show that some discrim-

ination was possible, but significant improvement is needed. This may be

achieved by improving the specificity of the model and the adoption of a

hierarchical strategy.

• Chapter 8 presented an investigation into how Gaussian mixture mod-

els may be learned in low-dimensional principal components spaces. The

closed-form method of computing conditional distributions was extended

to the principal components model. The chapter described a method for

synthesising textures from a parametric texture model built in a princi-

pal components space. It is not straightforward to marginalise a principal

components model over dimensions from the natural space. This problem

makes working in a principal components space less attractive. Although it

is possible to achieve excellent results using the approach, results for princi-


pal components models were much more variable than for the models built

in the natural data space.

• Chapter 9 described a generative statistical model of entire mammograms

and showed how synthetic mammograms may be generated. The model

has components that model the breast shape, approximate appearance and

the detailed texture. The breast shape model is learned by solving the

shape boundary landmark correspondence problem using the approaches

described by Kotcheff and Taylor, and by Davies et al.

• Chapter 10 presented three evaluations of the synthetic mammograms

generated using the model of entire mammograms. An expert mammogra-

phy radiologist could easily distinguish between real and synthetic mammo-

grams, but noted that some of the synthetic mammograms did look quite

realistic. The lack of blood vessels, lymph nodes and benign calcifications

allowed the synthetic mammograms to be identified.

A quantitative psychophysical evaluation of reduced resolution synthetic

mammograms showed that, in general, the synthetic mammograms could be

differentiated from real mammograms. However, one participant could not

distinguish between the two classes and the other participants consistently

misclassified the synthetic mammograms as real, reporting that they could

not tell the difference between the two classes. The results indicate that,

at low resolution, the synthetic mammograms are sufficiently realistic that

differentiating real and synthetic mammograms is difficult.

An evaluation of the contribution made by the detailing model shows that,

while local textural detail is successfully captured, the model cannot capture


the appearance of curvilinear structures.

11.3 Conclusions

The work in this thesis should be considered in context: while a great deal of
research has been done on the traditional approach to CADe, almost no work had
previously been done on generative statistical models for novelty detection.

As discussed in Chapter 3, one of the most significant problems that the computer-

aided mammography community needs to address is the high false positive rate

of CADe systems. We believe that this can only be achieved by systems that

have a much better “understanding” of mammographic appearance. In addition

to reducing the false positive rate, it would be desirable if CADe systems could

detect any indicative sign of abnormality, not just microcalcifications and masses.

It would be elegant if a single algorithm could detect any indicative sign of ab-

normality. We believe that the novelty detection approach is the most principled

way to achieve these aims.

The results of the novelty detection experiment in Chapter 7 show that it is

possible for a single algorithm to detect multiple types of abnormality within a

novelty detection framework. Although the results for real mammographic data

were a little disappointing, the approach does have potential. The generative

property of the model developed in Chapter 6 was important as it allowed us

to verify exactly what had been modelled successfully and what had not. This


was particularly useful during the development of the model and its implemen-

tation. Although the assumption underpinning the model—that mammographic

appearance is a stationary texture—is obviously invalid, the development of this

model allowed us to gain an understanding of the problems involved in modelling

mammographic appearance.

The evaluation of the synthetic textures showed that they were good enough to be

confused with the real textures about a third of the time and compared favourably

with those produced using Efros and Leung’s method, which is considered to be

one of the best methods in the literature. The parametric model is competitive

with the non-parametric method, but is much more flexible: synthetic textures

can be generated, images can be analysed using the novelty detection algorithm

and the time and space complexity of the method scales well with the number of

training pixels.

There is a significant lack of rigorous evaluation of texture synthesis algorithms

in the literature. Psychophysical experiments allow the human visual system to

be used objectively and quantitatively. Psychophysical experiments can be de-

ployed relatively easily via the Internet, allowing large numbers of participants

to be recruited. However, there are disadvantages to running experiments online:
participants are self-selecting; they may be unwilling to volunteer for experiments
that take a long time to complete or that solicit personal information; and it is
not possible to control the environment in which the experiment is conducted
(e.g. distractions, viewing distance, ambient lighting).

The generative model developed in Chapter 9 represents a significant step towards


understanding how to statistically model the appearance of entire mammograms.

We decomposed the problem into modelling shape, general appearance and de-

tailed textural appearance. All three components were successfully modelled.

Curvilinear structures were not considered and were therefore not captured by

the model. Future work should consider how this important component of ap-

pearance can be combined with the other model components.

While the full synthetic mammograms can easily be differentiated from real mam-

mograms by an expert mammography radiologist, computer vision researchers

found discrimination at low resolution difficult. The aim of developing the model

of entire mammograms was to further our understanding of how real mammo-

grams may be statistically modelled, rather than to immediately solve the novelty

detection problem; future research should pursue both goals.

Modelling the appearance of entire mammograms is extremely difficult, and we

conclude with a suggestion for an alternative approach to the novelty detection

problem. Consider a pair of mammograms taken of a particular patient. Each

mammogram in that pair is a very specific model of what the other should look

like. Asymmetry can therefore be considered as a novelty detection approach.

It may be possible to statistically learn the legal transformations that relate the
two mammograms in a pair of normal mammograms. Novelty would

correspond to an illegal transformation. It may be possible to generalise this idea

to the case where both CC and MLO views are available, or to the temporal case.


11.4 Final statement

This thesis proposed a new approach to detecting abnormalities in mammograms.

Novelty detection requires a model of normal mammographic appearance that al-

lows deviation from normality to be measured. Two generative statistical models

of mammographic appearance have been developed and evaluated. A novelty

detection experiment showed that it is possible to detect multiple types of ab-

normality using a model of normal appearance if that model is sufficiently spe-

cific. Psychophysical experiments demonstrated that significant progress has been

made towards being able to realistically model both mammographic texture and

the appearance of entire mammograms.

Appendix A

The expectation maximisation algorithm

A.1 Introduction

Maximum likelihood is an approach to finding “optimal” estimates for model
parameters. A set of model parameters, θ∗, is optimal in the maximum likelihood
sense if it maximises the likelihood of some observed data:

$$\theta^* = \arg\max_{\theta} L(\theta \mid \{y_i\}) \tag{A.1}$$

where $L$ is the likelihood function and $\{y_i\}$ are the observed data. The likelihood
function is usually replaced by the log-likelihood function $\ell$ for computational
convenience; because the logarithm is monotonically increasing, both functions have
the same maximiser.


The expectation maximisation (EM) algorithm is a general approach to solving

maximum likelihood problems in the presence of missing data [80, 137]. One form

of missing data is latent data, which is a contrivance that makes the parameter

estimation problem tractable. Latent data can be assumed to exist—even if it

cannot be measured—and in this way can be considered missing. We will now

present the abstract form of the EM algorithm (see Section 5.4.3 for an example

of an application of the algorithm). The presentation of the algorithm is based

in part upon those of Ravishanker et al. [151] and Hastie et al. [81].

A.2 The algorithm

The EM algorithm is named after its two steps, the expectation step and the

maximisation step. These steps are iterated until the algorithm converges.

Let $Y_O$ denote the observed data, $Y_L$ denote the latent data and let the complete
data be denoted by $Y = (Y_O, Y_L)$. From conditional probability we can write

$$P(Y_O \mid \theta) = \frac{P(Y_L, Y_O \mid \theta)}{P(Y_L \mid Y_O, \theta)} = \frac{P(Y \mid \theta)}{P(Y_L \mid Y_O, \theta)}. \tag{A.2}$$

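Taking logarithms:

$$\ell(\theta; Y_O) = \ell_0(\theta; Y) - \ell_1(\theta; Y_L \mid Y_O) \tag{A.3}$$

where $\ell_0(\theta; Y) = \log P(Y \mid \theta)$ and $\ell_1(\theta; Y_L \mid Y_O) = \log P(Y_L \mid Y_O, \theta)$. Taking
expectations over $Y_L$, conditioned on $Y_O$ and the model parameters at the $m$-th
iteration of the algorithm, $\theta^{(m)}$ (the left-hand side does not depend on $Y_L$, so it
is unchanged):

$$\begin{aligned} \ell(\theta; Y_O) &= Q(\theta, \theta^{(m)}) - H(\theta, \theta^{(m)}) \\ &\stackrel{\text{def}}{=} \mathbb{E}\left[\ell_0(\theta; Y) \mid Y_O, \theta^{(m)}\right] - \mathbb{E}\left[\ell_1(\theta; Y_L \mid Y_O) \mid Y_O, \theta^{(m)}\right]. \end{aligned} \tag{A.4}$$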

Equation A.4 is the log-likelihood equivalent of the objective function we seek

(Equation A.1). Q(θ, θ(m)) is computed in the E-step. This is essentially a vertical

slice through the density shown in Figure 5.1. The M-step obtains θ(m+1) by

maximising Q over θ:

$$Q(\theta^{(m+1)}, \theta^{(m)}) \ge Q(\theta, \theta^{(m)}) \quad \forall \theta. \tag{A.5}$$

The actual form that Q takes is problem specific (see Section 5.4.3 for a more

intuitive example). We shall now show why maximising $Q$ cannot decrease
$\ell(\theta; Y_O)$, and prove that the EM algorithm converges by showing that each step
of the EM algorithm is guaranteed not to decrease the objective function.

A.3 Proof of convergence

We will show that

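$$\ell(\theta^{(m+1)}; Y_O) - \ell(\theta^{(m)}; Y_O) \ge 0 \tag{A.6}$$

with equality if $\theta^{(m+1)} = \theta^{(m)}$. Consider what happens to the objective function,
in terms of $Q$ and $H$, as we move from one iteration of the EM algorithm to the
next:

$$\ell(\theta^{(m+1)}; Y_O) - \ell(\theta^{(m)}; Y_O) = \overbrace{\left[Q(\theta^{(m+1)}, \theta^{(m)}) - Q(\theta^{(m)}, \theta^{(m)})\right]}^{A} - \underbrace{\left[H(\theta^{(m+1)}, \theta^{(m)}) - H(\theta^{(m)}, \theta^{(m)})\right]}_{B} \tag{A.7}$$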

The M-step ensures that $Q(\theta^{(m+1)}, \theta^{(m)}) \ge Q(\theta^{(m)}, \theta^{(m)})$, and so part A of
Equation A.7 will be non-negative. If part B of Equation A.7 is non-positive, then
an iteration of the EM algorithm cannot decrease the objective function; i.e. we
need to prove that:

$$H(\theta, \theta^{(m)}) \le H(\theta^{(m)}, \theta^{(m)}) \quad \forall \theta, \tag{A.8}$$

which can be read as ‘$H(\theta, \theta^{(m)})$ is maximised by $\theta = \theta^{(m)}$’.

From Equation A.4 and the definition of conditional expectation we can write $H$
as:

$$H(\theta, \theta^{(m)}) = \int_{y_L \in \mathcal{L}} p(y_L \mid Y_O, \theta^{(m)}) \log p(y_L \mid Y_O, \theta) \, dy_L. \tag{A.9}$$

Note that $H$ has the form

$$\int_{-\infty}^{\infty} p(x) \log q(x) \, dx \tag{A.10}$$

where $p$ and $q$ are densities with associated model parameters $\theta^{(p)}$ and $\theta^{(q)}$. Equation A.8
says that $H$ is maximised when $\theta^{(p)} = \theta^{(q)}$. Consider first the discrete case, in which
$p$ and $q$ are probability vectors (so that $\sum_i^n p_i = \sum_i^n q_i = 1$):

$$\sum_i^n p_i \log q_i. \tag{A.11}$$

We can use the inequality $\log x \le x - 1$, which holds with equality if and only if $x = 1$.
Applying it with $x = q_i / p_i$ for each $i$ with $p_i > 0$ (terms with $p_i = 0$ contribute
nothing to Equation A.11):

$$\sum_{i: p_i > 0} p_i \log q_i - \sum_{i: p_i > 0} p_i \log p_i = \sum_{i: p_i > 0} p_i \log \frac{q_i}{p_i} \le \sum_{i: p_i > 0} p_i \left( \frac{q_i}{p_i} - 1 \right) = \sum_{i: p_i > 0} q_i - 1 \le 0. \tag{A.12}$$

This is Gibbs' inequality (equivalently, the non-negativity of the Kullback-Leibler
divergence). The first inequality is an equality only if $q_i = p_i$ for every $i$ with $p_i > 0$,
and the second only if $q$ places no mass where $p_i = 0$; together these imply $p = q$.
Therefore Equation A.11 is maximised over $q$ when $p_i = q_i \; \forall i$. The same argument,
with sums replaced by integrals over the region where $p(x) > 0$, gives the continuous case:

$$\int_{-\infty}^{\infty} p(x) \log q(x) \, dx - \int_{-\infty}^{\infty} p(x) \log p(x) \, dx \le \int_{p(x) > 0} \left( q(x) - p(x) \right) dx \le 0. \tag{A.13}$$

Equation A.10 is therefore maximised when $p = q$ (i.e. when $\theta^{(p)} = \theta^{(q)}$), and so $H(\theta, \theta^{(m)})$
is maximised when $\theta = \theta^{(m)}$. Therefore part B of Equation A.7 is non-positive,
and so an iteration of the EM algorithm cannot decrease the log-likelihood of the
model parameters given the observed data. In summary, the EM algorithm converges
to a maximum of the objective function, in general a local one. However, there is no
guarantee that the maximum will be the global maximum, and so several runs of the
algorithm, starting from different initialisations, may be necessary to find a suitable
solution.
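The monotonicity proved above can be checked numerically. The sketch below runs EM on a one-dimensional, two-component Gaussian mixture and records the observed-data log-likelihood $\ell(\theta^{(m)}; Y_O)$ at each iteration. The record should never decrease. The mixture setting, the initial values and the function names are assumptions for illustration. The worked example in Section 5.4.3 may differ in detail.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Univariate normal density, broadcast over arrays."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_gmm_1d(x, weights, means, variances, n_iter=50):
    """EM for a 1-D Gaussian mixture. Returns the final parameters and the
    log-likelihood evaluated at theta^(m) before each update."""
    x = np.asarray(x, dtype=float)
    w, mu, var = (np.array(a, dtype=float) for a in (weights, means, variances))
    history = []
    for _ in range(n_iter):
        # E-step: joint densities w_k N(x_i | mu_k, var_k) under theta^(m)
        joint = w * normal_pdf(x[:, None], mu, var)       # shape (n, K)
        marginal = joint.sum(axis=1)                        # p(x_i | theta^(m))
        history.append(float(np.log(marginal).sum()))      # l(theta^(m); Y_O)
        resp = joint / marginal[:, None]                    # p(z_i = k | x_i, theta^(m))
        # M-step: closed-form maximisers of Q(theta, theta^(m)) for this model
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var, np.array(history)
```

Applied to data drawn from two well-separated normals, `np.diff(history)` should be non-negative up to round-off. A different initialisation can lead to a different local maximum, which is why several runs may be needed.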

Bibliography

[1] L. V. Ackerman and E. E. Gose. Breast lesion classification by computer

and xeroradiography. Cancer, 30(4):1025–1035, October 1972.

[2] F. E. Alexander, T. J. Anderson, H. K. Brown, A. P. M. Forrest, W. Hep-

burn, A. E. Kirkpatrick, B. B. Muir, R. J. Prescott, and A. Smith. 14 years

of follow-up from the Edinburgh randomised trial of breast-cancer screening.

The Lancet, 353(9168):1903–1908, June 1999.

[3] S. R. Amendolia, F. Estrella, T. Hauer, D. Manset, D. McCabe, R. Mc-

Clatchey, M. Odeh, T. Reading, D. Rogulin, D. Schottlander, and

T. Solomonides. Grid Databases for Shared Image Analysis in the Mam-

moGrid Project. In Proceedings of International Database Engineering and

Applications Symposium. IDEAS’04, pages 312–321. IEEE, July 2004.

[4] Breast Cancer Facts and Figures 2003–2004. Annual report, American

Cancer Society, Atlanta, Georgia, USA, 2003.

[5] Cancer Facts and Figures 2004. Annual report, American Cancer Society,

Atlanta, Georgia, USA, 2004.

[6] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users’ Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 3rd edition, 1999.

[7] S. Astley, R. Zwiggelaar, C. Wolstenholme, K. Davies, T. Parr, and C. Taylor. Prompting in mammography: How accurate must the prompt generators be? In N. Karssemeijer, M. A. O. Thijssen, J. H. C. L. Hendriks, and L. J. T. O. van Erning, editors, Digital Mammography, volume 13 of Computational Imaging and Vision, pages 347–354. Kluwer Academic Publishers, November 1998.

[8] S. M. Astley, C. R. M. Boggis, K. Walker, S. Wallace, S. Tomkinson, V. Hillier, and J. Morris. An Evaluation of a Commercial Prompting System in a Busy Screening Centre. In H.-O. Peitgen, editor, Digital Mammography: IWDM—6th International Workshop on Digital Mammography, pages 471–475. Springer-Verlag, March 2003.

[9] S. M. Astley, T. C. Mistry, C. R. M. Boggis, and V. F. Hillier. Should we use humans or a machine to pre-screen mammograms? In H.-O. Peitgen, editor, Digital Mammography: IWDM—6th International Workshop on Digital Mammography, pages 476–480. Springer-Verlag, March 2003.

[10] P. R. Bakic, M. Albert, D. Brzakovic, and A. D. A. Maidment. Mammogram synthesis using 3D simulation. I. Breast tissue model and image acquisition simulation. Medical Physics, 29:2131–2139, 2002.

[11] J. A. Bangham, P. D. Ling, and R. Young. Multiscale recursive medians, scale-space and transforms with applications to image processing. IEEE Transactions on Image Processing, 5(6):1043–1048, 1996.

[12] N. Baxter. Preventive health care, 2001 update: should women be routinely taught breast self-examination to screen for breast cancer? Canadian Medical Association Journal, 164(13):1837–1846, June 2001.

[13] A. O. Beacham, J. S. Carpenter, and M. A. Andrykowski. Impact of benign breast biopsy upon breast self-examination. Preventive Medicine, 38(6):723–731, June 2004.

[14] R. E. Bellman. Adaptive Control Processes. Princeton University Press, Princeton, NJ, USA, 1961.

[15] U. Bick, M. L. Giger, R. A. Schmidt, R. M. Nishikawa, D. Wolverton, and K. Doi. Automated segmentation of digitized mammograms. Academic Radiology, 2:1–9, 1995.

[16] K. Bliznakova, Z. Bliznakov, V. Bravou, Z. Kolitsi, and N. Pallikarakis. A three-dimensional breast software phantom for mammography simulation. Physics in Medicine and Biology, 48(22):3699–3719, 2003.

[17] M. Board, S. Astley, and C. Boggis. Multi-resolution transportation for the detection of mammographic asymmetry. In International Workshop on Digital Mammography, 2004. (Accepted, pending.).

[18] L. Bocchi, G. Coppini, J. Nori, and G. Valli. Detection of single and clustered microcalcifications in mammograms using fractals models and neural networks. Medical Engineering and Physics, 26(4):303–312, May 2004.

[19] F. O. Bochud, C. K. Abbey, and M. P. Eckstein. Statistical texture synthesis of mammographic images with clustered lumpy backgrounds. Optics Express, 4(1):33–43, January 1999.

[20] F. L. Bookstein. Principal Warps: Thin-Plate Splines and the Decomposition of Deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6):567–585, 1989.

[21] H. Booth, M. Gautrey, M. Sheldrake, N. Cooper, and M. Quinn. Cancer statistics registrations: Registrations of cancer diagnosed in 2001, England. Annual report series MB1 no. 32, National Statistics, 2004. Crown copyright.

[22] N. F. Boyd, J. W. Byng, R. A. Jong, E. K. Fishell, L. E. Little, A. B. Miller, G. A. Lockwood, D. L. Tritchler, and M. J. Yaffe. Quantitative classification of mammographic densities and breast cancer risk: results from the Canadian National Breast Screening Study. Journal of the National Cancer Institute, 87(9):670–675, May 1995.

[23] M. Brady, F. Gilbert, S. Lloyd, M. Jirotka, D. Gavaghan, A. Simpson, R. Highnam, T. Bowles, D. Schottlander, D. McCabe, D. Watson, B. Collins, J. Williams, A. Knox, M. Oevers, and P. Taylor. eDiaMoND: the UK’s Digital Mammography National Database. In International Workshop on Digital Mammography, 2004. (Accepted, pending.).

[24] Breast Cancer Factsheet—February 2004. Online, February 2004. Accessed March 13 2005.

[25] J. L. Breau. Chemotherapy in the management of breast cancer (la chimiothérapie dans le traitement du cancer du sein). Chirurgie; Memoires De l’Academie De Chirurgie, 120(6–7):354–356, 1994–1995.

[26] J. Bresenham. Algorithm for computer control of a digital plotter. IBM Systems Journal, 4:25–30, 1965.

[27] D. S. Brettle, E. Berry, and M. A. Smith. Synthesis of texture from clinical images. Image and Vision Computing, 21:433–445, May 2003.

[28] J. Brown, A. Coulthard, A. K. Dixon, J. M. Dixon, D. F. Easton, R. A. Eeles, D. G. R. Evans, F. G. Gilbert, C. Hayes, J. P. R. Jenkins, et al. Rationale for a national multi-centre study of magnetic resonance imaging screening in women at genetic risk of breast cancer. The Breast, 9(2):72–77, April 2000.

[29] D. Brzakovic, X. M. Luo, and P. Brzakovic. An Approach to Automated Detection of Tumors in Mammograms. IEEE Transactions on Medical Imaging, 9(3):233–241, September 1990.

[30] P. C. Bunch, J. F. Hamilton, G. K. Sanderson, and A. H. Simmons. A free response approach to measurement and characterization of radiographic observer performance. SPIE Proceedings, 127:124–135, 1977.

[31] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.

[32] L. J. Warren Burhenne, S. A. Wood, C. J. D’Orsi, S. A. Feig, D. B. Kopans, K. F. O’Shaughnessy, E. A. Sickles, L. Tabar, C. J. Vyborny, and R. A. Castellino. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology, 215(2):554–562, May 2000.

[33] J. W. Byng, N. F. Boyd, E. Fishell, R. A. Jong, and M. J. Yaffe. The quantitative analysis of mammographic densities. Physics in Medicine and Biology, 39(10):1629–1638, October 1994.

[34] C. B. Caldwell, S. J. Stapleton, D. W. Holdsworth, R. A. Jong, W. J. Weiser, G. Cooke, and M. J. Yaffe. Characterization of mammographic parenchymal pattern by fractal dimension. Physics in Medicine and Biology, 35(2):235–247, February 1990.

[35] R. Campanini, D. Dongiovanni, E. Iampieri, N. Lanconelli, M. Masotti, G. Palermo, A. Riccardi, and M. Roffilli. A novel featureless approach to mass detection in digital mammograms based on Support Vector Machines. Physics in Medicine and Biology, 49(6):961–975, March 2004.

[36] N. A. Campbell and J. B. Reece. Biology. Benjamin Cummings, 7th edition, December 2004.

[37] The stages, http://www.cancerhelp.org.uk/help/default.asp?page=3315, accessed July 10 2005.

[38] S. J. Caulkin, S. M. Astley, A. Mills, and C. R. M. Boggis. Generating Realistic Spiculated Lesions in Digital Mammograms. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th International Workshop, pages 713–720. Medical Physics Publishing, December 2001.

[39] N. Cerneaz and M. Brady. Finding curvilinear structures in mammograms. In N. Ayache, editor, Computer Vision, Virtual Reality and Robotics in Medicine, volume 905 of Lecture Notes in Computer Science, pages 372–382. Springer, March 1995.

[40] D. P. Chakraborty. Maximum likelihood analysis of free-response receiver operating characteristic (FROC) data. Medical Physics, 16(4):561–568, July 1989.

[41] H.-P. Chan, D. Wei, M. A. Helvie, B. Sahiner, D. D. Adler, M. M. Goodsitt, and N. Petrick. Computer-aided classification of mammographic masses and normal tissue: linear discriminant analysis in texture feature space. Physics in Medicine and Biology, 40(5):857–875, May 1995.

[42] R. Chandrasekhar and Y. Attikiouzel. Automatic Breast Border Segmentation by Background Modelling and Subtraction. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th International Workshop, pages 560–565, Madison, Wisconsin, USA, December 2001. Medical Physics Publishing.

[43] P. Chaturvedi. Does smoking increase the risk of breast cancer? The Lancet Oncology, 4(11):657–658, November 2003.

[44] E. Claridge and J. H. Richter. Characterisation of mammographic lesions. In A. G. Gale, S. M. Astley, D. R. Dance, and A. Y. Cairns, editors, Digital Mammography: Proceedings of the 2nd International Workshop on Digital Mammography, York, UK, 10–12 July 1994, pages 241–250. Elsevier Science, September 1994.

[45] G. M. Clarke and D. Cooke. A Basic Course in Statistics. Arnold Publishers, 4th edition, October 1998.

[46] P. Collinson. Of bombers, radiologists, and cardiologists: time to ROC. Heart, 80(3):236, February 1998.

[47] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active Appearance Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681–685, 2001.

[48] T. F. Cootes, C. J. Taylor, and A. Lanitis. Active shape models: Evaluation of a multi-resolution method for improving image search. In E. Hancock, editor, Proceedings of the 5th British Machine Vision Conference, pages 327–336. BMVA Press, September 1994.

[49] I. Daubechies. Ten Lectures on Wavelets. CBMS-NSF Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, January 1992.

[50] Rh. H. Davies. Learning Shape: Optimal Models for Analysing Shape Variability. PhD thesis, The Victoria University of Manchester, Manchester, United Kingdom, 2002.

[51] Rh. H. Davies, C. J. Twining, T. F. Cootes, J. C. Waterton, and C. J. Taylor. A Minimum Description Length Approach to Statistical Shape Modelling. IEEE Transactions on Medical Imaging, 2002.

[52] USF Digital Mammography Home Page, http://marathon.csee.usf.edu/Mammography/Database.html, accessed January 2005.

[53] J. S. De Bonet and P. Viola. A Non-Parametric Multi-Scale Statistical Model for Natural Images. Advances in Neural Information Processing Systems, 10, 1997.

[54] I. den Tonkelaar, P. H. M. Peeters, and P. A. H. van Noord. Increase in breast size after menopause: prevalence and determinants. Maturitas, 48(1):51–57, May 2004.

[55] J. Dengler, S. Behrens, and J. F. Desaga. Segmentation of Microcalcifications in Mammograms. IEEE Transactions on Medical Imaging, 12(4):634–642, December 1993.

[56] P. A. Devijver and J. Kittler. Pattern Recognition: A Statistical Approach. Prentice Hall International, 1982.

[57] J. Dinnes, S. Moss, J. Melia, R. Blanks, F. Song, and J. Kleijnen. Effectiveness and cost-effectiveness of double reading of mammograms in breast cancer screening: findings of a systematic review. The Breast, 10(6):455–463, December 2001.

[58] C. J. D’Orsi, D. J. Getty, J. A. Swets, R. M. Pickett, S. E. Seltzer, and B. J. McNeil. Reading and Decision Aids for Improved Accuracy and Standardization of Mammographic Diagnosis. Radiology, 184:619–622, September 1992.

[59] A. A. Efros and W. T. Freeman. Image quilting for texture synthesis and transfer. In L. Pocock, editor, SIGGRAPH ’01: Proceedings of the 28th annual conference on computer graphics and interactive techniques, pages 341–346, New York, USA, 2001. ACM Press.

[60] A. A. Efros and T. K. Leung. Texture Synthesis by Non-Parametric Sampling. In 7th International Conference on Computer Vision (ICCV ’99), volume 2, pages 1033–1039. IEEE Computer Society Press, November 1999.

[61] T. Ema, K. Doi, R. M. Nishikawa, Y. Jiang, and J. Papaioannou. Image feature analysis and computer-aided diagnosis in mammography: reduction of false-positive clustered microcalcifications using local edge-gradient analysis. Medical Physics, 22(2):161–169, February 1995.

[62] C. Evans, K. Yates, and M. Brady. Statistical Characterization of Normal Curvilinear Structures in Mammograms. In H.-O. Peitgen, editor, Digital Mammography: IWDM—6th International Workshop on Digital Mammography, pages 285–291. Springer-Verlag, March 2003.

[63] A. Fenster, K. Surry, W. Smith, and D. B. Downey. The use of three-dimensional ultrasound imaging in breast biopsy and prostate therapy. Measurement, 36(3–4):245–256, October–December 2004.

[64] B. Fisher, J. Bryant, J. J. Dignam, D. L. Wickerham, E. P. Mamounas, E. R. Fisher, R. G. Margolese, L. Nesbitt, S. Paik, T. M. Pisansky, and N. Wolmark. Tamoxifen, Radiation Therapy, or Both for Prevention of Ipsilateral Breast Tumor Recurrence After Lumpectomy in Women With Invasive Breast Cancers of One Centimeter or Less. Journal of Clinical Oncology, 20(20):4141–4149, October 2002.

[65] C. E. Floyd, J. Y. Lo, A. J. Yun, D. C. Sullivan, and P. J. Kornguth. Prediction of Breast Cancer Malignancy Using an Artificial Neural Network. Cancer, 74(11):2944–2948, December 1994.

[66] P. Forrest. Breast cancer screening. Report to Health Ministers of England, Wales, Scotland and Northern Ireland by Working Group chaired by Sir Patrick Forrest, 1987. HMSO.

[67] T. W. Freer and M. J. Ulissey. Screening mammography with computer-aided detection: prospective study of 12 860 patients in a community breast center. Radiology, 220(3):781–786, September 2001.

[68] D. D. Garber. Computational Models for Texture Analysis and Texture Synthesis. PhD thesis, University of Southern California, May 1981.

[69] GE Healthcare — Product Technology — Mammography — Senographe 2000D, http://www.gehealthcare.com/euen/mammography/products/senographe-2000d/2000d_cad.html, accessed July 20 2005.

[70] M. L. Giger, Z. Huo, C. J. Vyborny, L. Lan, R. M. Nishikawa, and I. Rosenbourgh. Results of an Observer Study with an Intelligent Mammographic Workstation for CAD. In H.-O. Peitgen, editor, Digital Mammography: IWDM—6th International Workshop on Digital Mammography, pages 297–303. Springer-Verlag, March 2003.

[71] M. L. Giger, P. Lu, and Z. Huo. CAD in mammography: Computerized detection and classification of masses. In A. G. Gale, S. M. Astley, D. R. Dance, and A. Y. Cairns, editors, Digital Mammography: Proceedings of the 2nd International Workshop on Digital Mammography, York, UK, 10–12 July 1994, page 281. Elsevier Science, September 1994.

[72] F. J. Gilbert, A. Kirkpatrick, C. Boggis, S. Astley, S. Field, A. Gale, C. Hancock, K. Young, J. Cooke, S. Moss, R. Blanks, and L. Garvican. Computer Aided Detection in Mammography: Working Party of the Radiologists Quality Assurance Coordinating Group. Technical report, NHS, NHS Cancer Screening Programmes, Sheffield, UK, January 2001. NHSBSP Publication No. 48.

[73] P. C. Gøtzsche and O. Olsen. Is screening for breast cancer with mammography justifiable? The Lancet, 355(9198):129–134, January 2000.

[74] J. Grim and M. Haindl. A Discrete Mixtures Colour Texture Model. In Texture 2002: The 2nd international workshop on texture analysis and synthesis, pages 59–63, 1 June 2002.

[75] ATAC Trialists’ Group. Results of the ATAC (Arimidex, Tamoxifen, Alone or in Combination) trial after completion of 5 years’ adjuvant treatment for breast cancer. The Lancet, 365(9453):60–62, January 2005.

[76] D. Gur, J. H. Sumkin, H. E. Rockette, M. Ganott, C. Hakim, L. Hardesty, W. R. Poller, R. Shah, and L. Wallace. Changes in Breast Cancer Detection and Mammography Recall Rates After the Introduction of a Computer-Aided Detection System. Journal of the National Cancer Institute, 96(3):185–190, 2004.

[77] W. C. Hahn. Telomerase and Cancer. Clinical Cancer Research, 7:2953–2954, October 2001.

[78] J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29–36, April 1982.

[79] R. M. Haralick, K. Shanmugam, and I. Dinstein. Texture features for image classification. IEEE Transactions on Systems, Man and Cybernetics, 3:610–621, 1973.

[80] H. Hartley. Maximum likelihood estimation from incomplete data. Biometrics, 14:174–194, 1958.

[81] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, 2001.

[82] Health Service Quarterly. Report 18, Office for National Statistics, London, UK, Summer 2003.

[83] M. Heath, K. Bowyer, D. Kopans, R. Moore, and P. Kegelmeyer Jr. The Digital Database for Screening Mammography. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th International Workshop, pages 212–218, Madison, Wisconsin, USA, December 2001. Medical Physics Publishing.

[84] D. J. Heeger and J. R. Bergen. Pyramid-Based Texture Analysis/Synthesis. In SIGGRAPH 95: 22nd International ACM Conference on Computer Graphics and Interactive Techniques, pages 229–238. ACM Press, 1995.

[85] J. J. Heine, S. R. Deans, R. P. Velthuizen, and L. P. Clarke. On the statistical nature of mammograms. Medical Physics, 26(11):2254–2265, November 1999.

[86] R. Highnam and M. Brady. Mammographic Image Analysis. Computational Imaging and Vision Series. Kluwer, April 1999.

[87] R. P. Highnam, J. M. Brady, and R. E. English. Simulating Disease in Mammography. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th International Workshop, pages 727–731. Medical Physics Publishing, December 2001.

[88] R. P. Highnam, J. M. Brady, and B. J. Shepstone. A representation for mammographic image processing. Medical Image Analysis, 1(1):1–18, March 1996.

[89] F. L. Hitchcock. The distribution of a product from several sources to numerous localities. Journal of Mathematics and Physics, 20:224–230, 1941.

[90] A. Holmes. Computer-aided Detection of Abnormalities in Mammograms. PhD thesis, The Victoria University of Manchester, Manchester, United Kingdom, 2001.

[91] A. S. Holmes, C. J. Rose, and C. J. Taylor. Measuring Similarity between Pixel Signatures. Image and Vision Computing, 20(5–6):331–340, April 2002.

[92] A. S. Holmes, C. J. Rose, and C. J. Taylor. Transforming Pixel Signatures into an Improved Metric Space. Image and Vision Computing, 20(9–10):701–707, August 2002.

[93] D. H. Hubel. Exploration of the Primary Visual Cortex. Nature, 299:515–524, 1982.

[94] Z. Huo, M. L. Giger, C. J. Vyborny, U. Bick, P. Lu, D. E. Wolverton, and R. A. Schmidt. Analysis of spiculation in the computerized classification of mammographic masses. Medical Physics, 22(10):1569–1579, October 1995.

[95] I. W. Hutt. The computer-aided detection of abnormalities in digital mammograms. PhD thesis, The Victoria University of Manchester, Manchester,

[96] I. W. Hutt, S. M. Astley, and C. R. M. Boggis. Prompting as an aid to Diagnosis in Mammography. In A. G. Gale, S. M. Astley, D. R. Dance, and A. Y. Cairns, editors, Digital Mammography: Proceedings of the 2nd International Workshop on Digital Mammography, York, UK, 10–12 July 1994, pages 389–398. Elsevier Science, September 1994.

[97] P. T. Huynh, A. M. Jarolimek, and S. Daye. The false negative mammogram. Radiographics, 18:1137–1154, 1998.

[98] Press release, http://www.icadmed.com, accessed January 2005.

[99] iCAD Breast Cancer Detection, http://www.icadmed.com, accessed July 20 2005.

[100] IEEE Computer Society. IEEE Standard for Binary Floating-Point Arithmetic, IEEE Standard 754-1985. Standard, IEEE, 1985.

[101] International Breast Cancer Screening Network, http://appliedresearch.cancer.gov/ibsn/, accessed July 20 2005.

[102] A. K. Jain, M. N. Murty, and P. J. Flynn. Data Clustering: A Review. ACM Computing Surveys, 31(3), September 1999.

[103] R. A. Johnson and D. W. Wichern. Applied Multivariate Statistical Analysis. Prentice-Hall, 5th edition, 2002.

[104] I. T. Jolliffe. Principal Component Analysis. Springer Series in Statistics. Springer Verlag, New York, USA, 2nd edition, 2002.

[105] N. Karssemeijer. Adaptive noise equalization and recognition of microcalcification clusters in mammograms. International Journal of Pattern Recognition and Artificial Intelligence, 7(6):1357–1376, 1993.

[106] N. Karssemeijer. Adaptive Noise Equalization and Image Analysis in Mammography. In H. H. Barrett and A. F. Gmitro, editors, International Conference on Information Processing in Medical Imaging, volume 687 of Lecture Notes in Computer Science, pages 472–486, Flagstaff, Arizona, USA, June 14–18 1993. Springer.

[107] N. Karssemeijer. Automated classification of parenchymal patterns in mammograms. Physics in Medicine and Biology, 43(2):365–378, February 1998.

[108] N. Karssemeijer. Local orientation distribution as a function of spatial scale for detection of masses in mammograms. In A. Kuba, M. Samal, and A. Todd-Pokropek, editors, Information Processing in Medical Imaging: 16th International Conference, IPMI ’99, Visegrad, Hungary, June 28–July 2, 1999, volume 1613 of Lecture Notes in Computer Science, pages 280–293. Springer, June 1999.

[109] N. Karssemeijer, J. D. M. Otten, A. L. M. Verbeek, J. H. Groenewoud, H. J. de Koning, J. H. C. L. Hendriks, and R. Holland. Computer-aided Detection versus Independent Double Reading of Masses on Mammograms. Radiology, 227:192–200, February 2003.

[110] N. Karssemeijer and G. M. te Brake. Detection of stellate distortions in mammograms. IEEE Transactions on Medical Imaging, 15(5):611–619, October 1996.

[111] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. International Journal of Computer Vision, 1(4):321–331, 1987.

[112] T. J. Key, N. E. Allen, E. A. Spencer, and R. C. Travis. Nutrition and breast cancer. Breast (Edinburgh, Scotland), 12(6):412–416, December 2003.

[113] J. Kilday, F. Palmieri, and M. D. Fox. Classifying Mammographic Lesions Using Computerized Image Analysis. IEEE Transactions on Medical Imaging, 12(4):664–669, December 1993.

[114] KODAK Mammography Computer-Aided Detection (CAD) System, http://www.kodak.com/global/en/health/productsByType/medFilmSys/eqp/system/mamCad.jhtml?pq-path=6498, accessed July 20 2005.

[115] A. C. W. Kotcheff and C. J. Taylor. Automatic Construction of Eigenspace Models by Direct Optimisation. Medical Image Analysis, 2:303–314, 1998.

[116] S. Lai, X. Li, and W. Bischoff. On techniques for detecting circumscribed masses in mammograms. IEEE Transactions on Medical Imaging, 8(4):377–386, December 1989.

[117] J.-L. Lamarque. An Atlas of The Breast: Clinical Radiodiagnosis. Wolfe Medical Atlases. Wolfe Medical Publications, London, United Kingdom, 1981.

[118] M. Larkin. Breast self examination does more harm than good, says task force. The Lancet, 357(9274):2109, June 2001. News article.

[119] B. Leyland-Jones. Trastuzumab: hopes and realities. The Lancet Oncology, 3(3):137–144, March 2002.

[120] S. Liu, C. F. Babbs, and E. J. Delp. Multiresolution Detection of Spiculated Lesions in Digital Mammograms. IEEE Transactions on Image Processing, 10(6):874–884, June 2001.

[121] S. L. Lou, H. D. Lin, K. P. Lin, and D. Hoogstrate. Automatic breast region extraction from digital mammograms for PACS and telemammography applications. Computerized Medical Imaging and Graphics, 24(4):205–220, August 2000.

[122] S. G. Mallat. A Wavelet Tour of Signal Processing. Wavelet Analysis and Its Applications Series. Elsevier Science & Technology Books, 2nd edition, September 1999.

[123] L. N. Mascio, S. D. Frankel, J. M. Hernandez, and C. M. Logan. Building the LLNL/UCSF Digital Mammogram Library with image groundtruth. In K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schmidt, editors, Digital Mammography ’96: Proceedings of the 3rd International Workshop on Digital Mammography, International Congress Series, pages 427–430, Hillsborough, New Jersey, USA, December 1996. Excerpta Medica.

[124] G. Matheron. Random Sets and Integral Geometry. Probability and Statistics Series. Wiley, February 1975.

[125] J. MacQueen. Some Methods for Classification and Analysis of Multivariate Observations. In 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967.

[126] C. E. Metz. ROC methodology in radiologic imaging. Investigative Radiology, 21(9):720–733, September 1986.

[127] C. E. Metz. Evaluation of digital mammography by ROC analysis. In K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schmidt, editors, Digital Mammography ’96: Proceedings of the 3rd International Workshop on Digital Mammography, International Congress Series, pages 61–68, Hillsborough, New Jersey, USA, December 1996. Excerpta Medica.

[128] The NEW MIAS Digital Mammogram Database, http://www.wiau.man.ac.uk/services/MIAS/MIASweb.html, accessed July 20 2005.

[129] P. Miller and S. Astley. Automated detection of breast asymmetries. In J. Illingworth, editor, British Machine Vision Conference, pages 519–528. BMVA Press, September 1993.

[130] The mini-MIAS database of mammograms, http://peipa.essex.ac.uk/info/mias.html, accessed July 20 2005.

[131] E. H. Moore. On the Reciprocal of the General Algebraic Matrix. (Abstract). Bulletin of the American Mathematical Society, 26:394–395, 1920.

[132] N. R. Mudigonda, R. M. Rangayyan, and J. E. L. Desautels. Gradient and Texture Analysis for the Classification of Mammographic Masses. IEEE Transactions on Medical Imaging, 19(10):1032–1043, October 2000.

[133] NHS Breast Screening Programme Annual Review 2004. Technical report, NHS, 2004.

[134] The NHS Breast Screening Programme, http://www.cancerscreening.nhs.uk/breastscreen/, accessed February 2005.

[135] R. M. Nishikawa, R. E. Johnston, D. E. Wolverton, R. A. Schmidt, E. D. Pisano, B. M. Hemminger, and J. Moody. A Common Database of Mammograms for Research in Digital Mammography. In K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schmidt, editors, Digital Mammography ’96: Proceedings of the 3rd International Workshop on Digital Mammography, International Congress Series, pages 435–438, Hillsborough, New Jersey, USA, December 1996. Excerpta Medica.

[136] O. Olsen and P. C. Gøtzsche. Cochrane review on screening for breast cancer with mammography. The Lancet, 358(9290):1340–1342, October 2001.

[137] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.

[138] S. Pemberton, D. Austin, J. Axelsson, T. Celik, D. Dominiak, H. Elenbaas, B. Epperson, M. Ishikawa, S. Matsui, S. McCarron, A. Navarro, S. Peruvemba, R. Relyea, S. Schnitzenbaumer, and P. Stark. XHTML 1.0 The Extensible HyperText Markup Language (Second Edition). W3C Recommendation, World Wide Web Consortium (W3C), August 2002.

[139] R. Penrose. A Generalized Inverse for Matrices. Proceedings of the Cambridge Philosophical Society, 51:406–413, 1955.

[140] A. Petrie and C. Sabin. Medical Statistics at a Glance. At a Glance series. Blackwell Science, Oxford, UK, June 2000.

[141] A. Petrosian, H.-P. Chan, M. A. Helvie, M. M. Goodsitt, and D. D. Adler. Computer-aided diagnosis in mammography: classification of mass and normal tissue by texture analysis. Physics in Medicine and Biology, 39(12):2273–2288, December 1994.

[142] K. Popat and R. Picard. Novel Cluster-Based Probability Model for Texture Synthesis, Classification, and Compression. In B. G. Haskell and H.-M. Hang, editors, Visual Communications and Image Processing ’93, volume 2094, pages 756–768, Bellingham, Washington, USA, October 1993. SPIE.

[143] J. Portilla and E. P. Simoncelli. Texture Modelling and Synthesis using Joint Statistics of Complex Wavelet Coefficients. In IEEE Workshop on Statistical and Computational Theories of Vision, Fort Collins, Colorado, USA, June 1999.

[144] J. Portilla and E. P. Simoncelli. A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients. International Journal of Computer Vision, 40(1):49–71, 2000.

[145] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 1992.

[146] Digital Mammography Research, http://www.csse.uwa.edu.au/~ptaylor/digmam.html, accessed March 6 2005.

[147] W. Qian, M. Kallergi, L. P. Clarke, H.-D. Li, P. Venugopal, D. Song, and R. A. Clark. Tree structured wavelet transform segmentation of microcalcifications in digital mammography. Medical Physics, 22(8):1247–1254, August 1995.

[148] M. Quinn. Cancer survival, England, 1993–2000. National Statistics Press Release, January 2002.

[149] Press release: R2 Introduces Smarter CAD Algorithm and Workflow For Mammography Products, http://www.r2tech.com/main/company/news_one_up.php?prID=140, accessed June 2005.

[150] R2 Home, http://www.r2tech.com, accessed July 20 2005.

[151] N. Ravishanker and D. K. Dey. A First Course in Linear Model Theory. Chapman and Hall/CRC, 2002.

[152] C. J. Rose and C. J. Taylor. An Improved Method of Computing Scale-Orientation Signatures. In Medical Image Understanding and Analysis, pages 5–8, July 2001.

[153] C. J. Rose and C. J. Taylor. A Statistical Model of Texture for Medical Image Synthesis and Analysis. In Medical Image Understanding and Analysis, pages 1–4, July 2003.

[154] C. J. Rose and C. J. Taylor. A Generative Statistical Model of Mammographic Appearance. In D. Rueckert, J. Hajnal, and G.-Z. Yang, editors, Medical Image Understanding and Analysis 2004, pages 89–92, Imperial College London, UK, September 2004.

[155] C. J. Rose and C. J. Taylor. A Model of Mammographic Appearance. In British Journal of Radiology Congress Series: Proceedings of UK Radiological Congress 2004, pages 34–35, Manchester, United Kingdom, June 2004.

[156] C. J. Rose and C. J. Taylor. A Statistical Model of Mammographic Appearance for Synthesis and Analysis. In International Workshop on Digital Mammography, 2004. (Accepted, pending.).

[157] C. J. Rose and C. J. Taylor. A Holistic Approach to the Detection of Abnormalities in Mammograms. In British Journal of Radiology Congress Series: Proceedings of UK Radiological Congress 2005, page 29, Manchester, United Kingdom, June 2005.

[158] B. Sahiner, H.-P. Chan, N. Petrick, M. A. Helvie, and L. M. Hadjiiski. Improvement of mammographic mass characterization using spiculation measures and morphological features. Medical Physics, 28(7):1455–1465, July 2001.

[159] P. Sajda, C. Spence, and L. Parra. A multi-scale probabilistic network model for detection, synthesis and compression in mammographic image analysis. Medical Image Analysis, 7(2):187–204, June 2003.

[160] A. Salomon. Beiträge zur Pathologie und Klinik der Mammakarzinome. Archiv für Klinische Chirurgie, 101:573–668, 1913.

[161] J. A. Serra, editor. Image Analysis and Mathematical Morphology, volume 1. Academic Press, April 1982.

[162] J. A. Serra, editor. Image Analysis and Mathematical Morphology: Theoretical Advances, volume 2. Academic Press, 1988.

[163] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423 and 623–656, July and October 1948.

[164] E. P. Simoncelli and W. T. Freeman. The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation. In Second International Conference on Image Processing, volume 3, pages 444–447. IEEE Signal Processing Society, 1995.

[165] J. H. Smith. Prediction of the risk of breast cancer using computer vision techniques. PhD thesis, The Victoria University of Manchester, Manchester,


[166] J. H. Smith, S. M. Astley, J. Graham, and A. P. Hufton. The calibration of grey-levels in mammograms. In K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schmidt, editors, Digital Mammography ’96: Proceedings of the 3rd International Workshop on Digital Mammography, International Congress Series, pages 195–200, Hillsborough, New Jersey, USA, December 1996. Excerpta Medica.

[167] P. Soille, J. Breen, and R. Jones. Recursive Implementation of Erosions and Dilations along Discrete Lines at Arbitrary Angles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(5):562–567, May 1996.

[168] M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis and Machine Vision. PWS (Brooks/Cole Publishing), International Thomson Publishing Europe, High Holborn, London, England, 2nd edition, 1999.

[169] C. Spence, L. Parra, and P. Sajda. Detection, Synthesis and Compression in Mammographic Image Analysis with a Hierarchical Image Probability Model. In L. Staib, editor, IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, pages 3–10. IEEE, 2001.

[170] S. J. Starr, C. E. Metz, L. B. Lusted, and D. J. Goodenough. Visual detection and localization of radiographic images. Radiology, 116:533–538, 1975.

[171] J. Suckling, J. Parker, D. Dance, S. Astley, I. Hutt, C. Boggis, I. Ricketts, E. Stamatakis, N. Cerneaz, S. Kok, P. Taylor, D. Betal, and J. Savage. The Mammographic Image Analysis Society digital mammogram database. In A. G. Gale, S. M. Astley, D. R. Dance, and A. Y. Cairns, editors, Digital Mammography: Proceedings of the 2nd International Workshop on Digital Mammography, York, UK, 10–12 July 1994, pages 375–378. Elsevier Science, September 1994.

[172] L. Tabar, P. B. Dean, and T. Tot. Teaching Atlas of Mammography. Thieme Medical Publishers, New York, USA, 3rd edition, January 2001.

[173] P. G. Tahoces, J. Correa, M. Souto, L. Gomez, and J. J. Vidal. Computer-assisted diagnosis: the classification of mammographic breast parenchymal patterns. Physics in Medicine and Biology, 40(1):103–117, January 1995.

[174] L. Tarassenko, P. Hayton, N. Cerneaz, and M. Brady. Novelty detection for the identification of masses in mammograms. In Proceedings of the Fourth International Conference on Artificial Neural Networks, pages 442–447. IEEE, June 1995.

[175] P. Taylor, S. Hajnal, M.-H. Dilhuydy, and B. Barreau. Measuring image texture to separate “difficult” from “easy” mammograms. British Journal of Radiology, 67(797):456–463, 1994.

[176] P. Taylor, R. Owens, and D. Ingram. 3-D Fractal Modelling of Breast Growths. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th International Workshop, pages 785–791, Madison, Wisconsin, USA, December 2001. Medical Physics Publishing.

[177] G. M. te Brake and N. Karssemeijer. Segmentation of suspicious densities in digital mammograms. Medical Physics, 28(2):259–266, February 2001.

[178] C. H. van Gils, J. H. C. L. Hendriks, R. Holland, N. Karssemeijer, J. D. M. Otten, H. Straatman, and A. L. M. Verbeek. Changes in mammographic breast density and concomitant changes in breast cancer risk. European Journal of Cancer Prevention, 8(6):509–515, December 1999.

[179] J. H. Veldkamp, N. Karssemeijer, J. D. M. Otten, and J. H. C. L. Hendriks. Automated classification of clustered microcalcifications into malignant and benign types. Medical Physics, 27(11):2600–2608, November 2000.

[180] W. Veldkamp and N. Karssemeijer. Improved correction for signal dependent noise applied to automatic detection of microcalcifications. In N. Karssemeijer, M. A. O. Thijssen, J. H. C. L. Hendriks, and L. J. T. O. van Erning, editors, Digital Mammography, volume 13 of Computational Imaging and Vision, pages 169–176. Kluwer Academic Publishers, November 1998.

[181] VuCOMP—Redefining CAD, http://www.vucomp.com/, accessed 20 July 2005.

[182] R. Warren, M. Harvie, and A. Howell. Strategies for managing breast cancer risk after the menopause. Treat Endocrinol, 3(5):289–307, 2004.

[183] A. P. Wickens. Foundations of Biopsychology. Pearson Education, Harlow, England, 2nd edition, 2005.

[184] T. N. Wiesel. Postnatal development of the visual cortex and the influence of the environment. Nature, 299:583–591, 1982.

[185] J. N. Wolfe. Risk for breast cancer development determined by mammographic parenchymal pattern. Cancer, 37(5):2486–2492, May 1976.


[186] C. J. Wright and C. B. Mueller. Screening mammography and public health policy: the need for perspective. The Lancet, 346(8966):29–32, July 1995.

[187] Y. Wu, M. L. Giger, K. Doi, C. Vyborny, R. A. Schmidt, and C. E. Metz. Artificial Neural Networks in Mammography: Application to Decision Making in the Diagnosis of Breast Cancer. Radiology, 187:81–87, April 1993.

[188] W. Zhang, K. Doi, M. L. Giger, R. M. Nishikawa, and R. A. Schmidt. An improved shift-invariant artificial neural network for computerized detection of clustered microcalcifications in digital mammograms. Medical Physics, 23(4):595–601, April 1996.

[189] C. Zhou, H.-P. Chan, N. Petrick, M. A. Helvie, M. M. Goodsitt, B. Sahiner, and L. M. Hadjiiski. Computerized image analysis: Estimation of breast density on mammograms. Medical Physics, 28(6):1056–1069, June 2001.

[190] R. Zwiggelaar, S. M. Astley, C. R. M. Boggis, and C. J. Taylor. Linear Structures in Mammographic Images: Detection and Classification. IEEE Transactions on Medical Imaging, 23(9):1077–1087, September 2004.

[191] R. Zwiggelaar and R. Marti. Detecting Linear Structures In Mammographic Images. In M. J. Yaffe, editor, Digital Mammography: IWDM 2000, 5th International Workshop, pages 436–442, Madison, Wisconsin, USA, December 2001. Medical Physics Publishing.

[192] R. Zwiggelaar, T. C. Parr, J. E. Schuum, I. W. Hutt, S. M. Astley, C. J. Taylor, and C. R. M. Boggis. Model-based detection of spiculated lesions in mammograms. Medical Image Analysis, 3(1):39–62, 1999.


[193] R. Zwiggelaar, P. Planiol, J. Marti, R. Marti, L. Blot, E. R. E. Denton, and C. M. E. Rubin. EM Texture Segmentation of Mammographic Images. In H.-O. Peitgen, editor, Digital Mammography: IWDM—6th International Workshop on Digital Mammography, pages 223–227. Springer-Verlag, March 2003.
