evaluation of emrad ai in mammography project 2018-2020
TRANSCRIPT
DECEMBER 17, 2020
TaoHealth Research & Implementation
Lead author: Dr Niamh Lennox-Chhugani
Evaluation of EMRAD AI in Breast Screening Project: Final Report Full Technical Report
EVALUATION OF EMRAD AI IN MAMMOGRAPHY PROJECT 2018-2020
Disclaimer: Although TaoHealth Ltd has taken reasonable professional care in the preparation of this document, we cannot
guarantee absolute accuracy or completeness of information/data contained in this document, nor do we accept responsibility for
recommendations that may have been omitted due to particular or exceptional conditions and circumstances.
Confidentiality: This document contains information, which is proprietary and may not be disclosed to third parties without prior
written approval from TaoHealth Ltd or NHS EMRAD. Except where permitted under the provisions of confidentiality above, this
document may not be reproduced, retained or stored beyond the period of validity, or transmitted in whole, or in part, to any third
party without prior, written permission from TaoHealth Ltd.
Contents
Glossary
Introduction and background
  Overview of the evaluation
  Literature review: AI in healthcare
    AI in medical imaging
    The NHS breast screening programme (NHSBSP)
    AI in breast cancer screening
    Public perceptions of the use of AI in general and in healthcare
  Changes to the tools during the project
    Kheiron Medical Technologies - Mia™
    Faculty
  Structure of the report
Methods
  Ethical approval
  Data collection
  Data analysis
    Qualitative data analysis
    Quantitative data analysis
    The evolving theory of change
Findings
  Overview
  How well do different groups understand AI in general?
  What were the perceived benefits of the use of AI tools in the breast screening service?
  What were the concerns of the workforce and women about the use of AI tools in the breast screening service?
  What were the technical and data benefits and challenges for the project?
  What were the organisational issues that enabled or constrained the progress of the project?
  What wider contextual issues affected the progress of the project?
  What is the potential impact of the screening imaging innovation programme on the performance of screening services?
  What is the potential impact of the screening optimisation innovation programme?
  Was the programme worth the investment, that is, did it deliver value for money and if not in the timeframe of the evaluation, when is it likely to deliver a return on investment?
  What would the impact of the screening imaging innovation programme be if implemented at scale across EMRAD?
  What would the impact of the screening optimisation innovation programme be if implemented at scale across EMRAD?
  Summary of key findings
Discussion and implications
  Evaluating progress, outcomes, and impact
  Implications
    Limitations of the study
    Lessons learnt for future evaluation design
Acknowledgements
References
Appendices
  Appendix 1
  Appendix 2
  Appendix 3
  Appendix 4
  Appendix 5
  Appendix 6
  Appendix 7
Glossary
ACC Acute care collaborative
AI Artificial intelligence
AgeX Age Extension trial for breast screening
API Application programming interface
AUC Area under curve
BIA Budget impact analysis
BSU Breast screening unit
CAD Computer aided decision-making
CAG Confidentiality Advisory Group
CE mark Conformité Européenne mark
CPD Continuing professional development
CQC Care Quality Commission
DL Deep learning
DPA Data Protection Act (2018)
DPIA Data protection impact assessment
DNAs Did not attend, the term used for a patient who missed an appointment
EMAHSN East Midlands Academic Health Sciences Network
EMRAD East Midlands Radiology Consortium
GDPR General Data Protection Regulation
HRA Health Research Authority
ICO Information Commissioner’s Office
IG Information Governance
IoT Internet of things
IT Information Technology
IRAS Integrated Research Application System
NASSS Non-adoption, abandonment, scale-up, spread, and sustainability
NBI Nottingham Breast Institute
NBSS National breast screening system
NHSEI National Health Service England and Improvement
NHSBSP National Health Service Breast Screening Programme
NHSX NHS body responsible for supporting digital technology adoption in the NHS
NICE National Institute for Health and Care Excellence
NSC National Screening Committee
NUH Nottingham University Hospitals NHS Trust
MHRA Medicines and Healthcare Products Regulatory Agency
ML Machine learning
PACS Picture archiving and communication system
PHE Public Health England
PHIS Public health identity systems
PI Principal investigator
ROC Receiver operating characteristic curve
RSA Royal Society for the encouragement of Arts, Manufactures and Commerce
RSNA Radiological Society of North America
SDS Synthetic data set
ULBSS United Lincolnshire Breast Screening Service
ULH United Lincolnshire Hospitals NHS Trust
Introduction and background
Overview of the evaluation
This evaluation was conducted from October 2018 to September 2020 by the research team at TaoHealth
Research & Implementation. It was commissioned by NHS EMRAD¹ as part of the NHS England Wave 2 Test Beds
Programme to deliver learning during and at the end of the project, and to inform project implementation and
future investment decisions, locally and nationally, as the NHS considers how it can derive value from digital
technologies, including artificial intelligence.
NHS EMRAD (EMRAD), together with two commercial digital technology companies, Kheiron Medical
Technologies (Kheiron) and Faculty (formerly ASI Data Science), its provider of radiology IT systems, GE
Healthcare, and East Midlands AHSN, bid for funding under the NHS England Test Beds Programme in 2018 to
train and implement artificial intelligence (AI) solutions [Box 1] within the national breast screening programme.
Neither of the products being ‘tested’ was market-ready, although the product from Kheiron was CE-marked.
This evaluation used mixed methods to understand the potential impact of the technologies on the breast
screening service and the process of introducing such novel technologies into the clinical context, and
the breast screening pathway specifically [Figure 1].
The theory of change underlying the EMRAD screening imaging innovation programme is that the two AI tools,
one an algorithm-based clinical decision support tool and the other a machine learning pathway optimisation
tool, in the context of a scalable radiology IT system, will optimise the efficiency of the overall service, allow the
same number of staff to process more scans, reduce reporting delays, free up staff to deliver high-value
activity and enable prompt and accurate diagnosis and treatment.
The main aims of the evaluation were to:
1. Understand the effect of combinatorial innovation in the NHS Breast Screening Programme (NHSBSP) on
coverage and utilisation, user satisfaction, improvement in workforce productivity and improvement in
health and care services with a specific focus on:
a. Assessing the use of machine learning models with proven effectiveness in non-healthcare, live
environments [developed by Faculty] to optimise the operational aspects (clinic scheduling and
resource allocation) of the breast screening service, boosting system capacity, reducing delays and
improving patient experience.
b. Understanding clinical and patient attitudes towards this technology, with a view to wider roll-out
across the NHS.
2. Understand and share ‘lessons learnt’ as a nationally relevant template for the combined deployment of
clinically and operationally focused AI tools in healthcare.
3. Make recommendations about future real-world testing and scale-up of AI technologies in the health
system.
¹ NHS EMRAD stands for East Midlands Imaging Network and is a partnership of seven NHS trusts (Chesterfield Royal
Hospital NHS Foundation Trust, Kettering General Hospital NHS Foundation Trust, Northampton General Hospital NHS Trust,
Nottingham University Hospitals NHS Trust (host organisation), Sherwood Forest Hospitals NHS Foundation Trust, United
Lincolnshire Hospitals NHS Trust, and University Hospitals of Derby and Burton NHS Foundation Trust). These trusts run 11
hospitals, covering more than five million patients.
Figure 1: Evaluation timeline
The evaluation does not include a comprehensive assessment of the safety and effectiveness of Kheiron’s Mia™
tool (the subject of a separate HRA approved study) or Faculty’s service optimisation tool. It explores the process
of testing and developing the tools in the real world, perceptions around the use of AI tools in the context of the
NHSBSP, early evidence of the effect on NHSBSP performance and the process of innovating in the NHS.
Box 1: Definitions
Artificial intelligence (AI) can be viewed as ‘general’ or ‘narrow’ in scope. Artificial general intelligence refers to
a machine with broad cognitive abilities, which is able to think, or at least simulate convincingly, all of the
intellectual capacities of a human being, and potentially surpass them—it would essentially be intellectually
indistinguishable from a human being.
Narrow AI systems perform specific tasks which would require intelligence in a human being, and may even
surpass human abilities in these areas. However, such systems are limited in the range of tasks they can perform.
The terms ‘machine learning’ and ‘artificial intelligence’ are also sometimes conflated or confused, but machine
learning is in fact a particular type of artificial intelligence which is especially dominant within the field today.
Machine learning (ML) gives computers the ability to learn from and improve with experience, without being
explicitly programmed. When provided with sufficient data, a machine learning algorithm can learn to make
predictions or solve problems, such as identifying objects in pictures or winning at particular games, for example.
Neural networks are types of ML loosely inspired by the structure of the human brain. A neural network is
composed of simple processing nodes, or ‘artificial neurons’, which are connected to one another in layers. Each
node will receive data from several nodes ‘above’ it and give data to several nodes ‘below’ it. Nodes attach a
‘weight’ to the data they receive and attribute a value to that data. If the data does not pass a certain threshold,
it is not passed on
to another node. The weights and thresholds of the nodes are adjusted when the algorithm is trained until similar
data input results in consistent outputs. Deep learning (DL) is a more recent variation of neural networks, which
uses many layers of artificial neurons to solve more difficult problems. Its popularity as a technique increased
significantly from the mid-2000s onwards, as it is behind much of the wider interest in AI today. It is often used
to classify information from images, text or sound.
Select Committee on Artificial Intelligence (2018)
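The node behaviour described in the definition above (weighting inputs, comparing against a threshold, and adjusting weights and thresholds during training) can be illustrated with a minimal sketch of a single artificial neuron. This is an illustration only and does not represent how Kheiron's or Faculty's tools work. The function names, the logical-AND training data and the use of the classic perceptron update rule are all assumptions introduced for the example; they are not taken from the Select Committee text.

```python
from typing import List, Sequence, Tuple

# Hypothetical training data: the logical AND of two binary inputs.
Example = Tuple[Sequence[int], int]
AND_EXAMPLES: List[Example] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def neuron_output(inputs: Sequence[int], weights: Sequence[int], threshold: int) -> int:
    """Weight each input, sum, and pass on a signal (1) only if the sum reaches the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


def train_neuron(examples: Sequence[Example], epochs: int = 20) -> Tuple[List[int], int]:
    """Adjust weights and threshold with the perceptron rule (an assumed, simple training method)."""
    weights = [0] * len(examples[0][0])
    threshold = 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - neuron_output(inputs, weights, threshold)
            # Strengthen or weaken each weight in proportion to its input.
            weights = [w + error * x for w, x in zip(weights, inputs)]
            # A missed positive lowers the threshold; a false positive raises it.
            threshold -= error
    return weights, threshold


if __name__ == "__main__":
    weights, threshold = train_neuron(AND_EXAMPLES)
    for inputs, target in AND_EXAMPLES:
        print(inputs, "->", neuron_output(inputs, weights, threshold), "expected", target)
```

After training, the same inputs produce consistent outputs, which is the stopping condition the definition describes. Deep learning systems stack many layers of such nodes and typically use gradient-based training rather than this simple rule.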
The evolution of deep learning
Classic machine learning depends on carefully designed features, requiring human expertise and complicated
task-specific optimization. Deep learning bypasses feature engineering by taking advantage of large quantities
of data and flexible hierarchical models. Deep learning has recently achieved striking performance improvements
in diverse fields such as image classification, speech recognition, natural language processing, and playing
games. Blue boxes represent components learned by fitting a model to example data; deep learning allows
learning an end-to-end mapping from the input to the output.
Deep learning: A primer for Radiologists (Chartrand, 2017)
Literature review: AI in healthcare
In 2019, the Secretary of State created NHSX, a joint unit across the Department of Health and Social Care and
NHS England and Improvement, to lead on digital transformation across the health and care system in England.
Since its creation, the team within NHSX focusing on artificial intelligence (AI) has produced papers setting out
the enabling context they would like to create for value-added AI technology in health including Artificial
Intelligence: How to get it right (2019) and Code of conduct for data-driven health and care technology (2019).
Together with other healthcare regulators, NHSX recognizes the potential but untested impact of AI on
healthcare.
Globally, health systems are looking to technology including artificial intelligence to address some of the demand
and capacity challenges facing them (Rong, 2020) (Davenport, 2019) (Loh, 2018) (Reform, 2018) (Shah, 2018)
(Fenech, 2018). Software and applications using artificial intelligence are seen to have great potential here but
there are few prospective real-world use cases (Nagendran, 2020) (Kelly CJ, 2019) (AHSN Network, 2018).
The main benefits that proponents of the use of AI in healthcare put forward are that:
1. It is more accurate than humans on well-defined tasks (Liu X. F., 2019) (Buch, 2018) (Chen, 2017);
2. It can help increase healthcare workforce productivity (Buch, 2018) (Meskó, 2018), releasing highly
trained professionals for high-value activities that require human interaction;
3. It can support the administration and management of services to match capacity to demand (Nelson, 2019)
(Rajkomar, 2018);
4. It can enable greater levels of patient autonomy and self-care than at present (AHSN Network, 2018), although
this remains untested; and
5. It can be more cost-effective (Wolff J, 2020), although this again remains unproven.
The limitations and risks of applying AI in healthcare settings have also been highlighted and are subject to
much current debate centring on regulation. The concerns that have been raised include:
1. The dependence on data that is not always reliable, generalizable, consistent or available, added to the
risks inherent in machine learning where the output of the algorithm itself shapes future data inputs
(Kelly CJ, 2019) (Buch, 2018) (Chen, 2017);
2. The risk of bias both within algorithms themselves and within the data that is used to train and validate
the models (Coeckelbergh, 2019) (Char, 2018);
3. The risks of excluding protected groups who may be under-represented in data-sets or for whom important
markers may not be included (Brown, 2019) (Panch T, 2019);
4. The risks of breaches to data security especially where commercial partnerships play a role in delivery
(Thompson, 2018), weak data governance and unclear data control (Morley J, 2019) (Panch T, 2019)
(Coeckelbergh, 2019) (Morley J, 2020);
5. Lack of clarity around clinical accountability when AI is used to make diagnostic and treatment
recommendations (Tschider, 2018) (Floridi, 2020) (Smith, 2020) (Coeckelbergh, 2019);
6. Outstanding questions about clinical safety (Challen R, 2019) (Macrae, 2019);
7. Lack of understanding about AI and machine learning algorithms, how they are trained, the inputs
required, how they function and what they cannot do amongst health professionals (Robbins, 2019)
(Lee, 2019) (Harvey, 2018); and
8. The risk that AI technology encroaches on human clinical autonomy (McDougall, 2019) (Milano, 2020) (Asan, 2020)
and will have an as yet unknown impact on human relationships in healthcare (Kerasidou, 2020) (Fenech,
2018) (Powell, 2019) (Karches, 2018) (Bjerring, 2020).
The Covid-19 pandemic accelerated the use of technology solutions in most aspects of health and care delivery,
including the use of video consultations in general practice and hospital outpatients (Greenhalgh T, 2020),
challenging working practices, public experience of care, technical infrastructure and, in the longer term,
reimbursement models (Webster, 2020). The need to rapidly expand countries’ contact tracing capacity in
response to the pandemic led governments to turn to software developers and data scientists to plug a gap by
developing digital applications. In the UK, there was a particularly strident public debate around the use and
basis of this kind of technology to track and trace infection in the population and offer immunity passports, with
trust in data privacy and security at the centre of this debate (Ada Lovelace Institute, 2020). Calls for ethical
frameworks and robust regulation of health technology including AI have been bolstered by this recent and
time-sensitive debate. A number of ethical frameworks have been proposed (Morley J. F., 2020) (Morley J. F., 2019)
(Open Data Institute, 2020) and are informing how such technology can be safely adopted in healthcare.
Healthcare regulators in England are starting to develop their approaches to regulating this emerging technology
(Care Quality Commission, 2020) (Care Quality Commission (b), 2020) (Information Commissioners Office, 2020)
(NICE, 2019). This development in regulation can be seen mirrored in Europe and the US (Pesapane F, 2018).
The rapid adoption of technology in health in the context of the pandemic has underlined the implementation
challenges that have dogged the adoption and spread of technology in healthcare (Sheikh A, 2011). AI tools need
data to be trained, and accessing large bodies of high quality, deidentified data is often the first hurdle (Lee,
2019). In England, NHSX is seeking to overcome this challenge by setting up the NHS AI Lab and an AI Award
programme to provide technology developers with support to access data safely (NHSX, 2019). Given the role
that AI could play in augmenting decision-making as part of the processes of assessment, diagnosis and
treatment, it is critical to understand the interaction between AI technology and humans in real-world clinical
workflow (Kelly CJ, 2019) (Cresswell, 2018) and how acceptable health professionals find the technology in
different contexts (Cohen, 2017) (Shaw J, 2019). These and other particular development and implementation
challenges for AI in health mean that a more iterative approach will be required than has traditionally been the
case when adopting and scaling technology (Coiera, 2019).
There does seem to be an emerging consensus amongst clinical academics and policy-makers that AI technology
has the potential to augment clinical decision-making rather than replace it (Shaw J, 2019) (Health Education
England, 2019) (Health Education England, 2020), although this has not yet been tested in practice. As Chen and
Asch put it: “Whether such artificial-intelligence systems are “smarter” than human practitioners makes for a
stimulating debate — but is largely irrelevant. Combining machine-learning software with the best human
clinician “hardware” will permit delivery of care that outperforms what either can do alone” (Chen, 2017).
AI in medical imaging
Specifically in the field of medical imaging, the shortage of radiologists (Royal College of Radiologists, 2019)
(Royal College of Radiologists, 2020) (Lee, 2019) is constraining the capacity of imaging services including the
services provided as part of screening programmes. Deep learning (DL) AI provides the promise of a potential
solution to this challenge. Whilst there have been some fears expressed that AI may replace radiologists, it is far
more likely that AI will enhance the workforce’s ability to deliver high quality services (Allen, 2020). Systematic
analyses of studies examining the comparative performance of these solutions to date have identified a paucity
of randomized controlled trials, few prospective trials and fewer still conducted in a real-world setting
(Nagendran, 2020) (Liu X. F., 2019). With these caveats in mind, early performance results are promising,
showing performance on a par with clinicians (Shen, 2019).
The workforce within medical imaging, radiologists, radiographers and clinical support staff, will need to have
an understanding of AI and the role that it can play in various diagnostic and treatment pathways (Panch, 2018)
(Recht, 2020). There is a requirement for medical education to reflect this need (Mendelson, 2019). Studies
exploring the attitudes of the workforce, including trainees, to the use of AI in imaging workflows are few but
those that have been conducted in countries such as France (Waymel, 2019) (Lai, 2020), Canada (Gong, 2019),
Germany (Pinto dos Santos, 2019), USA (Park C. Y., 2020), Switzerland (van Hoek. J. Huber, 2019), Saudi Arabia
(Abdullah R, 2020), South Korea (Oh S, 2019) and the UK (Sit, 2020) show similar patterns of attitudes. The
radiologist workforce is open to learning more about AI and the role it could play, particularly in diagnostics,
supplementing clinical expertise. However, participating clinicians doubt that AI could ever deal with some of
the unexpected patterns that arise in the real world of patient interaction. There are fears that the technology could
ultimately replace human image readers, but this is more prevalent among clinical support staff and technicians who
do not have patient-facing roles. Interestingly, in the Swiss study, radiologists were more anxious about losing
territory to non-radiologist colleagues than to AI. All these studies highlight the need for more education of the
workforce in AI. Those studies that examined radiologists’ attitudes in more detail, including with qualitative methods,
highlighted perceived potential benefits such as saving time, reducing error rates and increasing time spent with
patients.
The NHS breast screening programme (NHSBSP)
One of the aims of the NHSBSP is to achieve earlier detection of breast cancer and improved outcomes for
women aged 50 to 70 years. The NHSBSP invites more than 2 million women for a test every year
nationally. In 2018/19, 71.1% of women took up the invitation and of these, 19,558 women had cancers detected
which was the highest rate in the last 10 years (NHS Digital, 2020). Screening saves around one life from breast
cancer for every 200 women screened, which equates to 1,300 lives saved from breast cancer each year in the
UK (Department of Health and Social Care, 2019).
The Age Extension (AgeX) trial (NHS Breast Screening Programme, 2020), conducted by a team at the University
of Oxford, is currently assessing the risks and benefits of extending the screening age range to women aged
47-49 and over 70 years. This trial is not due to conclude until 2026 and only then will the results be known and
fed back to the National Screening Committee (NSC). The NSC is responsible for advising the Secretary of State
for Health and Social Care on whether any new initiatives are sufficiently well evidenced to be used within a
population screening programme such as the breast screening programme. It was announced in August 2020
that responsibility for oversight of the national screening programmes in England will move to NHS England and
Improvement at a date yet to be determined (Brennan, 2020). A number of other studies are ongoing exploring
the effectiveness of a range of approaches to the prediction, early detection and prevention of breast
cancer (The Nightingale Centre, 2019) (Evans, 2012) and their acceptability to women.
An independent review of breast screening services delivered its final report in December 2018 (Commons,
2018), and set out recommendations to improve the operations of the breast screening programme. Sir Mike
Richards was commissioned to conduct a review of all cancer screening programmes in 2019 and delivered his
interim report in May 2019 which included recommendations about the greater use of technology and artificial
intelligence to support high quality cancer screening services (House of Commons, 2019a). The final report was
published in October 2019 (Richards, 2019b). In March 2020, the NHSBSP was paused due to the Covid-19
pandemic and screening staff were redeployed to support the clinical response to the pandemic. Since the
summer of 2020, breast screening units in England have started to resume socially distanced screening services,
with some piloting a new approach to inviting women to attend in September 2020. The resulting backlog will
add pressure to an already stretched service.
EMRAD have reported that within the East Midlands, breast screening services have been critically challenged.
With the lowest rate of radiologists per 100,000 of the population and highest rates of retirement over the next
5 years (Royal College of Radiologists, 2020), coupled with increasing numbers of women being eligible for breast
screening, there has been an increase in workload for readers and breast screening service managers. Some of
the EMRAD consortium Trusts were only just managing to meet two-week cancer targets pre-Covid-19 at the
expense of other important elements such as research. Other EMRAD Trusts were failing to hit the targets and
continue to refer patients onto neighbouring Trusts for treatment. This was not only causing increased travel,
anxiety, and reduction in choice for patients, but putting additional pressure on those neighbouring services
which were just about meeting demand within target performance. This has yet to be confirmed by quantitative
data but recent PHE Screening Quality Assurance Reports for the NHSBSP provided at Kettering (Public Health
England, 2018), North Nottingham (Public Health England, 2018) (Sherwood Forest Hospital) and Lincolnshire
(Public Health England, 2017) all raise the issue of staff capacity within the local breast screening units (BSUs).
The BSUs across EMRAD are slowly resuming services within the constraints of social distancing and infection
control as of July 2020.
There has been some debate in recent years about the risks and benefits of breast screening per se (Løberg,
2015) (Gøtzsche, 2013) with one of the main risks being that of over-diagnosis due to false-positives (something
found on the mammogram turns out not to be cancer) with consequent negative effects on well-being (Health
Quality Ontario, 2016). Women’s preferences show that they are willing to accept this risk along with the
discomfort of the process itself if it means cancers are diagnosed earlier (Mathioudakis AG, 2019). The quality
of clinical communication with women called back for assessment after screening is particularly important in
this circumstance (Long, 2019).
AI in breast cancer screening
The use of machine learning (ML) is not new in cancer diagnosis (Maclin, 1991) (Cicchetti, 1992) (Kononenko,
2001) (McCarthy JF, 2004) (Cruz, 2006). One challenge in using machine learning is the status of the input data
for training. ML relies on data that is uniformly annotated, labelled and structured. Medical images, including
breast images, are rarely curated in ways that allow for ML to be applied on large data sets without significant
preparatory work (Harvey H., 2019). The process of preparing medical images for machine learning is complex
and rarely fast (Willemink, 2020) (Chartrand, 2017) and requires three subsets: a training set2, a validation set3
and a test set4. Inadequate planning for this data curation commonly leads to ML project delay and failure
(Harvey H., 2019).
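The three-way split described above can be sketched as follows. This is a minimal illustration only, not the procedure used in any of the cited studies: the function name `split_cases`, the 70/15/15 proportions and the integer case identifiers are all assumptions made for the example. In practice the split would normally be made at the level of the woman or screening episode rather than the individual image, so that images from the same case never appear in more than one subset.

```python
import random


def split_cases(case_ids, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle case identifiers and split them into three disjoint subsets.

    The proportions are illustrative assumptions, not values from the report.
    """
    ids = list(case_ids)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(ids)

    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)

    train = ids[:n_train]                # used to fit the model parameters
    val = ids[n_train:n_train + n_val]   # used to monitor performance during training
    test = ids[n_train + n_val:]         # held back to measure final performance
    return train, val, test


# Hypothetical example: 1,000 de-identified case identifiers.
train, val, test = split_cases(range(1000))
print(len(train), len(val), len(test))
```

Keeping the test set untouched until the model parameters are fixed is what allows it to give an unbiased estimate of final performance.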
Figure 2: Preparing medical imaging data for machine learning, Willemink et al 2020
The journal Nature published an article in January 2020 that gained considerable media interest. The article
presented the results of an international evaluation of an AI system for breast cancer screening (McKinney,
2020). This retrospective study using data from the US (enriched) and UK (representative) found that the AI
reader outperformed human readers, reducing both false positives and false negatives (the mammogram may
look normal even though breast cancer is present). The size of the data set (c17,000 images) of this study
addressed some of the previous criticisms of studies examining the use of AI in breast cancer screening
(Houssami, 2019). Another retrospective study (Kim, 2020) published one month later using an even larger data
set from the UK, US and Korea (170,230) as well as using images from three different imaging vendors, found
that standalone AI outperformed radiologists and when used alongside radiologists, improved diagnostic
performance significantly. A paper published in August 2020 (Salim M, 2020), presenting the results of an
independent evaluation of three commercially available AI products used as stand-alone readers and with a
human reader, found the algorithms demonstrated “sufficient diagnostic performance” and identified more true
positive cases of cancer when combined with human readers.
2 Trains and optimises the neural network parameters.
3 Monitors the performance of the model during training. Internal validation uses the data used to develop the model. External validation uses a separate data set. Temporal or geographic external validation enables assessment of generalisability (Park S. &., 2018).
4 Measures final model performance when parameters are fixed.
So there is emerging evidence that AI is safe, effective and accurate when used retrospectively but we have, as
yet, no prospective studies that test if this performance carries through into real-world clinical practice (Dustler,
2020). A concern that has been voiced more recently is that AI image reading may currently lack the level of patient
focus on “clinically meaningful endpoints such as survival, symptoms, and need for treatment” needed to
mitigate the risks of over-treatment and false positives (Oren, 2020).
Public perceptions of the use of AI in general and in healthcare
The increasing ubiquity of AI in our daily lives is reflected in the media’s portrayal of AI and related ethics. Coverage
has moved from no media discussion of AI and ethics in 2013 (Ouchchy, 2020) to a position in mid-2020 where public discourse
on the role of algorithms and AI in decision-making in the UK has been shaped by controversies surrounding the
NHS Covid-19 app, crime and justice decision-support systems and exam results (The Guardian, 2020).
AI, and its use in everyday software, is perceived differently by demographic groups across the world. Here
in the UK, recent opinion polls have highlighted a varying degree of understanding of what AI is, the role it plays
in day to day life, and perceptions of its impact. In 2018, a poll conducted by YouGov for the Royal Society for
the encouragement of Arts, Manufactures and Commerce (RSA) found that people were less familiar with the
types of AI most likely to have a direct impact on their lives. These types were also less visible to them and
included automated decision-support used in personal finance, welfare and criminal justice (The RSA, 2018). The
less visible the application, the less likely people were to trust the output. A 2019 report looking at the attitudes
of the American public to AI found similar results (Zhang, 2019) with 82% wanting to see careful management
and regulation of AI applications. Research conducted by the organisation Doteveryone in the UK in 2020,
following up research conducted in 2018 looking at public attitudes to technology generally, found that while
the public remain positive about the impact of technology in their lives, enthusiasm has declined since 2018
(Miller, 2020). Women were less optimistic about the future effect of technology than men and older people
less optimistic than younger. A research study published in 2020 measured public attitudes to the smart home
and use of AI and Internet of Things (IoT) technology in the home (Cannizzaro, 2020) and found that trust in
technology in the home was low overall and this was particularly affected by concerns about the unauthorized
use of data.
Public attitudes to the use of AI and machine learning in healthcare specifically are similarly evolving. In 2017,
research commissioned by The Royal Society found that people who took part in deliberative events were
positive in general about the use of machine learning to support diagnosis in physical illness although not mental
illness (Ipsos Mori, 2017). People were clear though, that this should not replace the human interaction that
they value in the healthcare context. A study conducted for the Academy of Medical Sciences in 2018 confirmed
these views (Academy of Medical Sciences, 2018). Additionally, this research found that participants trusted the
NHS, as the guardian of their personal health data, to retain control of this data when working with commercial
partners. Another notable observation from this study was the difference in attitudes between those who
identify themselves as healthy users and those who identify themselves as ‘patients’, that is, people living with a
specific condition for which they are receiving treatment. Patients have a more positive view of the use of
innovative technology, probably because they have more of an interest in the benefits. A recent study looking
at public attitudes in China to the use of AI in medicine (Xiang Y, 2020) found that there is a high level of
acceptance of the use of AI in medicine overall in China. Interestingly, they found that receptivity to the use of
medical AI increased with age but that people do not perceive AI as replacing human health professionals, but
augmenting them. Another Chinese study analysing public attitudes as expressed on social media (Gao S, 2020)
found that nearly 60% of views were positive. Of the negative views expressed (6%), AI immaturity and distrust
of technology companies were the most common views. A study of members of the public in France (Tran, 2019)
replicated the largely positive views towards using AI in healthcare with the primary benefits seen as improving
access and follow-up, reducing the burden of treatment and reducing the workload of health and care
professionals. Perceived disbenefits included reducing human interaction, risks to privacy and security and
reliability issues.
In July 2020, the Ada Lovelace Institute published a report summarizing their findings from a series of
deliberative events conducted with the public on their attitudes to the use of technologies including AI and
algorithmic decision-making. They focused specifically on public health identity systems (PHIS) in light of the
Covid-19 pandemic (Ada Lovelace Institute, 2020) and identified issues around public trust in technology and
the companies developing technologies, concerns about effectiveness, worries about discrimination, and a
recognition that technology is not neutral but shaped by and shaping prevailing and dominant social and political
attitudes.
Looking specifically at patient attitudes towards the use of AI in radiology, a small sample survey (n=155)
conducted in The Netherlands (Ongena, 2020) showed that people want to be fully informed about the use of
AI in radiology and want to retain human interaction in the diagnostic process. A study of patient attitudes to
the use of AI to diagnose skin melanomas in Germany (Jutzi TB, 2020) (n=298) found that the respondents were
positive about the use of such technology to support clinician diagnosis and deliver faster, more precise and
unbiased results. They were concerned about data protection and susceptibility to errors.
The public are not passive recipients of care. They are essential stakeholders in the healthcare ecosystem and
their willingness to adopt new innovations can enable or constrain spread and scale (Lennon MR, 2017). If the
benefits of AI are to be delivered in programmes such as the National Breast Screening Programme and the
disbenefits minimised, then the public should be actively engaged in the design, development and monitoring
of this technology (Kirsch, 2017) (Katell, 2020).
Changes to the tools during the project
Kheiron Medical Technologies - Mia™
Kheiron’s Mia™ AI tool for breast cancer screening and reporting was CE-marked from the outset of the project.
It has been used to retrospectively read a large number of mammograms from different manufacturers, to train
the algorithm, validate it and to determine which of the software’s various operating points would be best used
for a future prospective pilot in the NHS breast screening programme (NHSBSP).
The way that Mia™ fits in to the NHSBSP workflow is summarised in Figure 3 below which is reproduced from a
document submitted by the partners to NHSX in September 2019.
Figure 3: Mia™ AI mammogram reader
Mia™ was feasibility tested in a multi-centre randomised trial using mammograms and de-identified outcomes
data. The trial assessed decision-making efficacy of Mia™ in a screening setting on European demographics. The
trial indicated that Mia™ software could potentially identify breast cancer correctly 9 times out of 10 (i.e. 1 in
10 cancers would be a false negative). Sensitivity and specificity were 90% each, with an AUC (Area Under Curve) of 0.96. These
indicative results were consistent and repeatable, and outperformed all known Computer Aided Diagnostics
(CAD) software for breast malignancy detection. This performance was also above recommended human
performance guidelines with no significant risks or safety concerns. Mia™ received CE marking (Class IIa) in
October 2018.
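For readers less familiar with these measures, the sketch below shows how sensitivity and specificity are derived from the four cells of a confusion matrix. The counts are hypothetical and were chosen only so that the arithmetic reproduces the 90% figures quoted above; they are not the trial data, and the function name `sensitivity_specificity` is an assumption made for the example.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion matrix counts.

    tp: cancers correctly flagged      fn: cancers missed (false negatives)
    tn: normals correctly cleared      fp: normals wrongly flagged (false positives)
    """
    sensitivity = tp / (tp + fn)  # share of women with cancer who are correctly flagged
    specificity = tn / (tn + fp)  # share of women without cancer who are correctly cleared
    return sensitivity, specificity


# Hypothetical counts, not trial data: 100 cancers and 1,000 normal screens.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=900, fp=100)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```

A sensitivity of 90% corresponds to the "9 times out of 10" statement: of every 10 cancers present, one would be missed. Specificity describes the separate question of how many women without cancer are correctly cleared.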
The retrospective study conducted as part of the NHS Test Bed Project (HRA approval was in place prior to the
NHS Test Bed), undertaken over the last 24 months, was aimed at calibrating the deep learning tool and then
validating the results of this updated model to assess levels of sensitivity and specificity. The results of this
retrospective study were presented at RSNA 2020 in December and have been submitted for publication to a
peer-reviewed journal.
Kheiron have had to engage intensively with information governance (IG) and NHSBSP teams at the two sites
during the period to design and deliver a retrospective data extraction that conforms to strict IG
requirements (Data Protection Impact Assessment and risk assessment for re-identification) and understand
data linkage, coding and labels as they calibrate the tool. The primary users of the tool on the NHSBSP sites
(mammogram readers) have not yet experienced the technology in their workflow.
The next phase of the work will be the delivery of the NHSX and AAC Phase 4 AI Award to deploy Mia™ across 15
sites in the UK over 3 years. Whilst retrospective evaluations demonstrate evidence of software performance,
the AI Award will allow for collection of outcomes in a prospective setting to ensure additional evidence of
acceptability and utility within business-as-usual NHS screening settings.
Faculty
Faculty AI’s ‘Platform’ (formerly SherlockML) software is a secure machine learning environment for accessing
and manipulating enormous amounts of data, designing, and testing AI models, and deploying those models in
live environments. The platform has already been used on more than 200 commercial projects. The original
vision of this project was to use this platform to develop tools that optimise the capacity and demand in the
context of breast screening round length5. The outcome of the discovery work conducted between October and
December 2018 was the identification of two priority tools:
a) Breast Screening Programme Management Tool focusing on round length optimisation,
attendance monitoring and clinic scheduling, and
b) Theatre Productivity Optimisation Tool with further detail to be determined.
In practice, Faculty and the two participating sites (Nottingham University Hospitals NHS Trust (NUH) and United
Lincolnshire Hospitals NHS Trust (ULH)) were unable to share the data required due to insurmountable
information governance issues.
In October 2019, Faculty proposed an alternative route to delivering their solution via the development of a
synthetic data set, which mimicked the National Breast Screening System (NBSS) data set. This synthetic data
set (SDS) could then be used to develop a range of machine learning tools to support the management of the
breast screening service.
The main outputs of Faculty’s contribution to the project have been:
1. A synthetic NBSS data set co-owned by Faculty and NUH (on behalf of EMRAD);
2. A deployment environment in NUH that is a controlled and governed environment in which new AI
products can be deployed safely;
3. A round length machine learning (ML) tool that includes: (a) an interactive information dashboard that
provides situational awareness to service managers about multiple dimensions of service activity and
demand that can be used to support day to day decision making; and (b) a scenario planning tool that
allows managers to model scenarios and predict demand and capacity changes under modelled
conditions, both using the SDS.
5 Round length is the term used to describe the time between breast screening appointments for each woman. In England this is usually 3 years.
The aim of the round length machine learning tool is to make the best possible use of scarce resources like
radiologist time and expensive machinery, and to reduce stress on the clinical and administrative workforce
delivering the programme.
Structure of the report
There are three main parts to this report:
• The methods used for collecting and analysing the data;
• The findings from the evaluation; and
• The conclusions of the evaluation and recommendations for future similar projects.
The evaluation was delivered alongside the project throughout its duration.
Methods
The evaluation of digital health technology can be conceptualised as a cycle (Bonten TN, 2020) which is similar
in many ways to the cycle of health technology assessment (Gutiérrez-Ibarluzea I, 2017) [Figure 4]. However,
digital health technologies using AI and machine learning present some novel challenges to evaluators. The
process of training, validating and testing AI models on real-world data means that there is more iteration across the
phases than with traditional technologies. This real-world data is not quite ‘real-world’, however, as cleaning is done
to remove incomplete data, a luxury not available in the real world. The ‘black-box’ nature of AI models makes
the process of assessing effectiveness less transparent and less externally verifiable. Researchers have started to
recommend approaches that recognise these challenges (Vollmer, 2020) but this remains an evolving field.
Figure 4: Comparing the ehealth evaluation cycle and HTA cycle
With these challenges in mind, we chose a non-experimental design for this evaluation. We chose this design on
the basis that it was not possible to control all possible variables, and from a sampling perspective, it was not
feasible to randomly select the sites for project implementation. Two EMRAD partner sites were selected
(Nottingham Breast Institute and United Lincolnshire Breast Screening Service) for the project. To establish a
counterfactual for the evaluation, two other EMRAD sites were selected, Sherwood Forest NHS Trust and
Northampton General Hospital NHS Trust. We sought to understand if there were any differences in staff
attitudes to the use of AI in breast screening between sites using the model of ‘usual care’ and the sites involved
in the development of the proposed new model using AI tools.
The evaluation used a mixed methods approach, where quantitative and qualitative data is collected, analysed,
and synthesised at different points in the evaluation design to allow for different levels of exploration.
Ethical approval
Ethical approval was sought and obtained from the Health Research Authority in July 2019 for the staff and
general public survey component of the evaluation (IRAS ID 262287).
Data collection
The mixed methods approach combines qualitative and quantitative research methods to answer the evaluation
questions from a range of stakeholder perspectives. Information and data were collected from key
informant interviews, surveys, focus groups, and local data sources.
This approach recognises the importance of establishing a clear baseline (to understand current processes, costs
and outcomes) and establishing a counterfactual (to illustrate the projected costs and outcomes of usual
practice) that can be used in future prospective trials of the innovations. The types of data collection methods
are summarised in Table 1. The data collection survey tools are included in Appendix 1. These were submitted
to the HRA as part of the approval process.
Table 1: Data collection methods
Data collection method | Details | Timing | Sample population size | Target cohort
Document review | All programme documents that are held in the programme repository (meeting minutes, reports, status updates, lessons log, risk register, case studies, communications, etc.) are collated and themed. | Oct 2018 – Aug 2020 | 60+ documents up to Sept 2019 | N/A
Observations | Observations of project meetings and other project activity were collected using content and discourse analytical frameworks. | Oct 2018 – Aug 2020 | 15 meetings up to Sept 2019 | N/A
Initial semi-structured interviews | Semi-structured interviews were conducted with the programme team in November – December 2018 and again in July – August 2019 to understand the programme partners perceptions of the programme progress and the moderators of this progress. | Nov 2018 – Dec 2018; Jul 2019 – Aug 2019 | 31 interviews with 12 interviewees interviewed on 2 occasions | 19 programme team and governance board members
Survey of NHSBSP managers and administrators | Surveys were sent to service managers at all four sites in December 2019 (round 1) and again in July 2020 (round 2). Only those who participated in round 1 were invited to participate in round 2. | Round 1: Dec 2019 – Jan 2020; Round 2: July 2020 | Round 1: 46; Round 2: 38 | 46
Survey of NHSBSP clinicians | Surveys were sent to all clinicians working on the NHSBSP at all four sites in December 2019 (round 1) and again in July 2020 (round 2). Only those who participated in round 1 were invited to participate in round 2. | Round 1: Dec 2019 – Jan 2020; Round 2: July 2020 | Round 1: 115; Round 2: 67 | 115
Survey of women in the general population | Surveys were set up on www.onlinesurveys.ac.uk and information was shared via a range of site communication channels with women over the age of 18 years working at all four sites. This group was used as a proxy for the wider population of women in the East Midlands. They were also invited to share information about the project with female friends and relatives, especially those who are not in paid employment including those who are retired. We gathered information on age, ethnicity, and employment status to enable us to identify any gaps in the sample cohort that we needed to address using alternative methods. | Dec 2019 – Feb 2020 | 24,300 | 2,500
Programme leadership survey | The original plan had been to conduct further interviews with programme partners to explore themes around implementation. Covid-19 meant that in-person interviews were not possible and time constraints and redeployment limited even online access to programme partners. Instead, we circulated a survey to all members of the EMRAD governance structure to understand their perspectives on the project progress. | July 2020 | 74 | 22
Focus groups | We organised focus groups for women in the general population to target groups that are under-represented in the survey responses. | May - June 2020 | 24,300 | 30
Operational data | We convened a working group of EMRAD programme team members, NHSBSP staff and members of the trusts information teams to refine the programme logic model and identify programme costs and consequences. Information was then collected from trust finance and information teams and used to build the budget impact analysis model. | March - May 2020 | N/A | N/A
Data analysis
All survey data collected was anonymised and stored securely within the GDPR-compliant
www.onlinesurveys.ac.uk platform. Other data was held in spreadsheets and Word documents that were managed
and stored at TaoHealth Research & Implementation under the Data Protection Guidelines (available on request)
and Research Ethics Guidelines.
Qualitative data analysis
We uploaded all qualitative data into NVivo6, a software package which is commonly used by researchers to
organise and visualise data analysis. We used NVivo to develop and use a first order hierarchical thematic
framework to classify and organise data according to key themes, concepts, and emergent categories. It allows
for exploring data in depth while simultaneously maintaining an effective and transparent audit trail, which
enhances the rigour of the analytical processes and the credibility of the findings.
In addition to the emergent thematic framework for first order analysis [Appendix 2], we also used the Non-
adoption, Abandonment and Challenges to the Scale-up, Spread and Sustainability of health and care
technologies (NASSS) framework (Greenhalgh, 2017) [Figure 5] as a second order
analysis to enable us to answer the evaluation questions.
Figure 5 The NASSS framework for considering influences on the adoption, non-adoption, abandonment, spread, scale-up, and sustainability of patient-facing health and care technologies.
Quantitative data analysis
The public and staff surveys were analysed using descriptive statistics to explore views on the use of AI in general
and in breast screening specifically as well as changes in staff attitudes over time.
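As an illustration of the kind of descriptive statistic involved, the sketch below calculates the share of respondents in a group who report "some" or "extensive" understanding of AI. The response categories, the function name `share_with_understanding` and the ten example responses are all hypothetical; they are assumptions made for the example and are not taken from the survey tools or the survey data.

```python
from collections import Counter


def share_with_understanding(responses, levels=("some", "extensive")):
    """Proportion of responses falling into the selected answer categories."""
    counts = Counter(responses)
    total = sum(counts.values())
    selected = sum(counts[level] for level in levels)
    return selected / total


# Hypothetical responses from ten staff members (not survey data).
example_responses = (
    ["none"] * 2 + ["limited"] * 3 + ["some"] * 4 + ["extensive"] * 1
)
share = share_with_understanding(example_responses)
print(f"{share:.0%} report some or extensive understanding")
```

Repeating the same calculation for each staff group and each survey round gives the simple comparisons between groups and over time that the analysis relies on.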
6 NVivo is a qualitative and mixed-methods data analysis software tool used by academics and professional researchers globally.
We collected quantitative activity and financial information from the two test sites (Nottingham University
Hospitals NHS Trust and United Lincolnshire Hospitals NHS Trust). The activity information for the years 2016/17,
2017/18 and 2018/19 was extracted from the NBSS by each test site (NUH and ULH) and forms the basis of the
KC62 performance report submitted by all breast screening units to NHS Digital. The financial information was
the service budget taken from the trust ledgers for the same years, used to identify the direct and indirect costs of service
provision.
This information was then used to conduct a budget impact analysis (BIA) to understand the potential value of
the innovations to the health system. The model for the BIA was agreed with a group of stakeholders in March
2020. The difference between a budget impact analysis and economic evaluation is summarised in Table 2.
Table 2: Comparison of BIA and economic evaluation
The process of conducting a budget impact analysis is summarised in Appendix 3.
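In general terms, a budget impact analysis compares the expected cost of a service under current practice with its expected cost under a new scenario over a fixed number of years. The sketch below shows that core arithmetic only. It is not the model agreed with stakeholders or described in Appendix 3: the function name `budget_impact` and all of the annual figures are hypothetical and chosen purely for illustration.

```python
def budget_impact(current_costs, new_costs):
    """Return the per-year and cumulative budget difference (new minus current).

    Positive values mean the new scenario costs more; negative values mean a saving.
    """
    if len(current_costs) != len(new_costs):
        raise ValueError("Both scenarios must cover the same number of years")
    per_year = [new - cur for cur, new in zip(current_costs, new_costs)]
    return per_year, sum(per_year)


# Hypothetical annual service costs in pounds over a three-year horizon.
usual_care = [1_000_000, 1_030_000, 1_060_000]
with_new_tools = [1_080_000, 1_000_000, 980_000]  # assumes up-front cost, later savings

per_year, cumulative = budget_impact(usual_care, with_new_tools)
print(per_year, cumulative)
```

In this made-up example the new scenario costs more in the first year and less thereafter, which is why a budget impact analysis is usually reported year by year as well as in total.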
The evolving theory of change
The starting point for the evaluation of any programme is the establishment of a programme theory of change.
This sets out the change that is expected to happen, the activities and processes it will employ to effect that
change, identifies the context within which the change will happen and how that change can be measured.
The theory of change from the model of ‘usual care’ that underpinned the project as a whole was as follows:
a) The real-world application of ‘Platform’ as part of the management of NHSBSP would improve and
optimise clinical service capacity in terms of workforce, scanning equipment and physical space.
b) The use of Mia™ would release capacity in the radiologist and reporting radiographer workforce by
performing one of the two reads on mammographic screens replacing one reader.
c) In combination, the real-world application of these two tools will enhance patient care at significant
scale.
It was also hypothesised that the use of evidence-supported AI tools in programmes such as NHSBSP will increase
NHS clinical staff and commissioner confidence in utilisation of innovative machine learning tools such as Mia™
and ‘Platform’. A simplified version of this theory of change is summarised in Figure 6 below.
Figure 6: Simple Theory of Change, March 2019
In December 2018, the evaluators ran a working session with the project partners to develop the initial theory
of change for the project. It became very clear that the different levels of development of the solutions being
tested (Mia™ versus ‘Platform’) meant that a single theory of change would be of limited usefulness. Two
theories of change were developed, and these are included in Appendix 4. These theories of change were used
to refine the evaluation plan and data collection tools.
This project was slightly different from others in the NHS England Wave 2 Test Beds Programme in that it was
not testing real-world deployment of market ready digital tools, but developing innovative tools using artificial
intelligence in a real-world context. The exploratory and iterative nature of this work meant that the theories of
change evolved over the course of the project. The most recent versions developed in January 2020 as part of
this evaluation are summarised in Figure 7 and Figure 8 below.
Figure 7: Kheiron Medical Technology Mia™ Theory of Change January 2020
Figure 8: Faculty 'Platform' Machine Learning Theory of Change January 2020
The usual care model that is the starting point for the theory of change is set out in the service specification for
the NHSBSP7 and summarised in Figure 9, which shows where in the process each of the AI tools being tested in
the project sits.
7 NHS public health functions agreement 2018-19. Service specification no.24. Breast Screening Programme. Version
number: FINAL. First published: September 2018. NHS England Gateway Number: 07845.
Figure 9: NHSBSP Process Map (2018/19)
As well as exploring the process of implementation, this evaluation has looked at the likely effect of the tools
being developed on the breast screening service. We have focused attention within the part of the pathway
most likely to be directly impacted as highlighted above. There may be other effects downstream in the pathway
but these are as yet untested as neither tool has been deployed in a live environment.
The evaluation questions that were set out at the beginning of this project have, because of changes to the
project, been modified as the project has progressed. The modifications are summarised in Appendix 5.
Findings
Overview
We present the narrative of the project in Appendix 6. This illustrates the project’s journey from the beginning
of the project prior to the Test Beds award in October 2018 through to December 2020. Over that time the
project has delivered the following outputs:
Change domain | Output
Technology | Mia™ retrospective study comprising completion of training, validation and testing
Technology | A synthetic data set based on NBSS data to train and validate operational machine learning tools
Technology | Round-length planning tool developed and tested
Organisational readiness | Information governance blueprint for development of AI tools in an NHS context
Organisational readiness | A project team with the skills and experience to test AI radiology products in the NHS environment
Organisational readiness | Commercial / NHS partnerships for future development, deployment and uptake
Value proposition | Outline business case for real-world deployment
Value proposition | Financial and budget impact baseline model
Adopters | Change in staff and public attitudes to the use of AI in breast screening with greater awareness across clinical staff groups in participating sites
Adopters | A baseline understanding of women in the wider population’s attitudes to the use of AI in the breast cancer screening programme in England
Wider context | Contribution to the emerging regulation of and policy context for AI in health through collaborative work with NHS England and Improvement, CQC and NHSX
We will pick up these changes in more detail as we present the findings of the evaluation in the next sections.
How well do different groups understand AI in general?
Fifty-eight percent of clinical staff described themselves as having some or extensive understanding of AI
compared to 20% of non-clinical staff. Women from the general public under and of screening age described
themselves as having similar levels of understanding as the clinical staff we surveyed (57%). Illustrative examples
of descriptions used are provided in Box 2.
Box 2 Descriptions of artificial intelligence
Clinical Non-clinical Women (focus groups)
“Machines helping us make better decisions.”
“Using a computer system to perform a task originally performed by people.”
“The use of computer software to perform a task normally done by a human.”
“Science fiction, robots, scary.”
“Non-human skills, anything to do with robots.”
“I think of it as more assisted intelligence.”
“It doesn't exist, it is misinterpretation of the issue.”
“Robotic systems using our information.”
“Robots, no need for human input, films, the future, apprehension but exciting, fear of the unknown.”
“Intelligence of technology, making computers act more like humans would.”
“Technology thinking for itself.”
“Robots.”
“Something very clever beyond my intelligence”
“A lot of my information is from the media and from films, but that's all I know. My understanding of AI is science - fictional or something that is created to automate answers like a chat facility. I've read about it a bit in the media.”
“I think more about software and algorithms and a learning algorithm that learns more and more as you put data in.”
“I've just had an experience with my bank of speaking with the little robot person and it was horrible and I wanted to send the survey saying this is rubbish. I wouldn't try it again.”
“A lot of my understanding is from movies where robots take over the world.”
What were the perceived benefits of the use of AI tools in the breast screening service?
Clinical staff
At the beginning of the project, clinical staff at both test sites were reported by clinical leads to have limited
awareness of the application of AI either as a second reader of mammograms, or in the management of the
breast screening programme. By the time of the round 1 survey, over 12 months had passed since the start of
the project. However, project progress had been limited and clinical teams had relatively low levels of
involvement in the training and validation of Mia™.
The clinicians showed a little change in their understanding of AI over the period studied with the majority
classifying themselves as having “some understanding” of what AI is in both rounds. Most of the rest described
themselves as “aware but having limited understanding” [Table 3].
Table 3: Clinicians' self-assessment of their understanding of AI
Rates of understanding were higher in test sites compared to control sites during both rounds of the survey. The
proportion of clinicians who had read something about the use of AI in mammography went up from 67% to
76% over the period between the surveys.
When asked if they thought AI would have positive effects on society in general, 84% of clinicians agreed in
round 2 up from 76% in round 1 [Table 4].
Table 4: Clinicians' perception of the potential for a positive impact of AI on society
To understand the extent to which the perceived benefits of Mia™ aligned to actual challenges faced by the
service, we asked clinical staff what they thought were the greatest challenges facing the service at present. We
asked them to select their top three from a long list of challenges based on a literature review and consultation
with a small group of clinical staff. In both rounds of the survey, workforce shortage was by far the biggest
concern (95%) and high ‘do not attend’ (DNA) rates the next biggest (33%).
Clinicians were asked about their views of the likely benefits of AI to the breast screening service based on the
theory of change for this project and their responses for the two rounds were compared.
Whilst the clinicians were positive about the potential benefits of AI in breast cancer screening in simplifying
current working practices [Table 5] and in supporting decision-making [Table 6], they were less convinced of the
potential to improve workforce capacity [Table 7] and there was little overall change in their view that AI can be
trusted to identify anomalies accurately [Table 8].
Clinicians' understanding of AI          Round 1   Round 2    Change
Aware of AI but limited understanding     46.51%    35.14%   -11.38%
Some understanding                        51.16%    56.76%     5.59%
Extensive understanding                    2.33%     8.11%     5.78%

Perceived impact on society              Round 1   Round 2    Change
Strongly agree                            14.93%    18.92%     3.99%
Agree                                     61.19%    64.86%     3.67%
Neither agree nor disagree / Undecided    23.88%    16.22%    -7.66%
Table 5: AI tools could simplify current working practices - Clinicians' views
Table 6: AI tools could support decision-making - Clinicians' views
Table 7: AI tools could improve workforce capacity - Clinicians' views
Table 8: AI tools could be trusted to identify anomalies correctly - Clinicians' views
Asked about their level of comfort with using AI as a second reader of population breast screening
mammograms, 51% of clinicians agreed in round 1 that they would be comfortable, and this increased only slightly
to 54% by round 2. The proportion saying they would not be comfortable also went up, from 4.5% to 8%. When
asked to expand on this qualitatively and indicate what would give them greater confidence that AI second
readers were safe and effective, would improve their working life, and would impact the experience of women
coming into the service, 43% of respondents indicated that they wanted to see evidence from trials and 24%
wanted results of clinical audits in situ.
Simplify current working practices            Round 1   Round 2    Change
Strongly agree                                  8.96%    13.51%     4.56%
Agree                                          43.28%    62.16%    18.88%
Disagree                                        2.99%     2.70%    -0.28%
Strongly disagree                               0.00%     0.00%     0.00%
Neither agree nor disagree / Undecided         44.78%    21.62%   -23.15%

Support decision-making                       Round 1   Round 2    Change
Strongly agree                                  2.99%    13.51%    10.53%
Agree                                          56.72%    56.76%     0.04%
Disagree                                        1.49%     0.00%    -1.49%
Strongly disagree                               0.00%     0.00%     0.00%
Neither agree nor disagree / Undecided         38.81%    29.73%    -9.08%

Improve workforce capacity                    Round 1   Round 2    Change
Strongly agree                                 10.45%     5.41%    -5.04%
Agree                                          52.24%    62.16%     9.92%
Disagree                                        1.49%     2.70%     1.21%
Strongly disagree                               0.00%     5.41%     5.41%
Neither agree nor disagree / Undecided         35.82%    24.32%   -11.50%

Be trusted to identify anomalies accurately   Round 1   Round 2    Change
Strongly agree                                  2.99%     8.11%     5.12%
Agree                                          38.81%    32.43%    -6.37%
Disagree                                        5.97%     5.41%    -0.56%
Strongly disagree                               0.00%     5.41%     5.41%
Neither agree nor disagree / Undecided         52.24%    48.65%    -3.59%
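One way to summarise the movement across Tables 5 to 8 is to combine the "Strongly agree" and "Agree" shares for each round and compare them. The sketch below does this with the percentages reported in those tables; the data layout and the function name are illustrative assumptions, not part of the evaluation's own method.

```python
# Percentages (Round 1, Round 2) copied from Tables 5-8.
# The dictionary layout and names are illustrative assumptions.
TABLES = {
    "Simplify current working practices": {
        "Strongly agree": (8.96, 13.51),
        "Agree": (43.28, 62.16),
    },
    "Support decision-making": {
        "Strongly agree": (2.99, 13.51),
        "Agree": (56.72, 56.76),
    },
    "Improve workforce capacity": {
        "Strongly agree": (10.45, 5.41),
        "Agree": (52.24, 62.16),
    },
    "Be trusted to identify anomalies accurately": {
        "Strongly agree": (2.99, 8.11),
        "Agree": (38.81, 32.43),
    },
}


def combined_agreement(rows):
    """Return (round 1, round 2, change) for Strongly agree + Agree.

    The change is expressed in percentage points.
    """
    r1 = rows["Strongly agree"][0] + rows["Agree"][0]
    r2 = rows["Strongly agree"][1] + rows["Agree"][1]
    return round(r1, 2), round(r2, 2), round(r2 - r1, 2)


SUMMARY = {name: combined_agreement(rows) for name, rows in TABLES.items()}

for statement, (r1, r2, change) in SUMMARY.items():
    print(f"{statement}: {r1:.2f}% -> {r2:.2f}% ({change:+.2f} points)")
```

Because the published figures are rounded to two decimal places, sums derived this way may differ from the underlying survey data by a few hundredths of a point. On these figures, combined agreement rises between rounds for the first three statements and falls slightly for trust in identifying anomalies.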
Quotes from free text
“Initially it would have to be introduced as an additional tool rather than a replacement. The use of audits would
then be able to determine the effectiveness and benefits of using AI.” Clinician at test site.
“[I need to see] publicised, peer reviewed results against real life.” Clinician at control site.
Clinicians were positive about the potential effect of introducing the AI reader on the experiences of women
attending the service, with the proportion holding this view increasing from 42% to 51% over the period studied.
There were only very small differences between the test sites and control sites across all the survey items, and
these did not demonstrate any significant change over time. Clinical staff were also positive about the potential
deployment of AI to support service optimisation in their free text responses.
Quotes from free text
“NHSBSP standards to offer appointments is very challenging, especially with age extension and having to catch
up due to a cease in screening due to COVID-19. It would be fantastic if AI was used for predicting actual numbers
attended accurately so booking slots can be used for effectively.” Clinician at test site.
Non-clinical staff
The early engagement of breast screening service administrative staff at the two test sites by Faculty as part of
the discovery process would have raised awareness and expectation within these staff groups about the
potential use and benefits of a service optimisation tool. This was reflected in the response to the question about
understanding of AI, with 83% saying they have some understanding or awareness of AI at test sites versus 78%
at control sites. This had increased to 100% by round 2 although the small sample size for this round is a
limitation when interpreting results [Table 9].
Table 9: Non-clinicians' self-assessment of their understanding of AI
When asked if they thought AI would have positive effects on society in general, non-clinical staff were more
sceptical than clinicians, with only 30% agreeing in round 1 (although this increased to 58% in round 2). This
may be linked to the age profile of the two groups: 70% of non-clinical staff were aged 50 and over, whereas only
40% of clinical staff were aged 50 and over.
To understand the extent to which the perceived benefits of the service optimisation tool aligned to actual
challenges faced by the service, non-clinical staff were asked what they thought were the greatest challenges
facing the service at present. They selected their top three challenges from a list drawn up after consultation
with a small group of non-clinical staff. In both rounds of the survey, workforce shortage was the biggest concern
(80%) and high ‘do not attend’ (DNA) rates the next biggest (58%); administrative burden was also a significant
concern, more so in round 2 (58%).
Non-clinicians were asked about their views of the likely benefits of AI to the breast screening service based on
the theory of change for this project and their responses for the two rounds were compared. By the time of the
second survey round, non-clinicians were more likely to agree that AI would have potential benefits in supporting
the management of breast screening services against all three dimensions [Table 10].
Non-clinicians' understanding of AI     Round 1   Round 2    Change
Some understanding                       21.05%    58.33%    37.28%
Aware of AI but limited understanding    60.53%    41.67%   -18.86%
None                                     18.42%     0.00%   -18.42%
Table 10: Non-clinicians who agree that these benefits of AI are likely
Quote from free text
“I think if this will increase the experience of women it will be a good thing. We do get a lot of anxious women
wanting to know results sooner than our protocol. However, hopefully there are no risks involved.” Breast
screening service manager.
Non-clinicians were undecided about the potential effect of introducing AI service optimisation tools on the
experiences of women attending the service, with only 33% in round 2 agreeing that it would have a positive
effect on women’s experience (although this had increased from 18% in round 1).
Women of and under screening age
The survey was targeted at women over the age of 18 years and the respondents were segmented into women
who are currently or have recently been the population of focus for the NHSBSP (aged 50+ years) and women
who are not yet of screening age. We wanted to understand if there were differences between these two groups
in what they think about the use of AI in the screening programme to read mammograms and support
programme management. More information on the profile of the respondents is included in Appendix 7.
Most women (54%) did not understand the current process for reading mammograms. Twenty-four percent
had an accurate understanding of the process [Figure 10]. A higher proportion of women under screening age (58%)
did not understand the process compared to women of screening age (49%), suggesting that women are likely to
seek out information on this once they enter the programme.
Figure 10: What is the process for reading mammograms?
Potential benefit                    Round 1   Round 2    Change
Assist the management of the BSS         29%       50%       21%
Improve workforce capacity               34%       42%        8%
Simplify current working practices       32%       50%       18%

When asked about their understanding of AI in general, most women rated themselves as having some
understanding (55%). When this was broken down by age group, there was consistency across all age groups
except for the 18-19 years and 70+ years groups, who both reported higher levels of no or little understanding
[Figure 11].
Figure 11: Women’s self-assessed understanding of AI
The survey respondents were much more likely to be positive about the potential effect of AI on society (50%)
than negative (6%), but were almost as likely to be undecided on this (44%) [Figure 12]. Women under screening
age were slightly more likely to be positive about the potential effect of AI (53%) as opposed to women of
screening age (47%).
Figure 12: Artificial intelligence will have a positive effect on society
Some women chose to explain why they had given the response that they did to this item in optional free text
(n=670). Sentiment analysis (see footnote 8) revealed that 42% of statements about the impact of AI were
positive. A similar pattern was noted for both women of and under screening age [Figure 13].
Figure 13: Sentiment analysis of free text in response to AI having a positive impact on society
When we looked at the free text responses of those women who stated that they neither agree nor disagree
with the statement, there was more of a difference between women of screening age and those under screening
age with women under screening age more likely to express positive or mixed sentiments about AI and its effect
(35%) than women of screening age (21%) [Figure 14].
Figure 14: Sub-sample of those who responded neither agree nor disagree that artificial intelligence will have a positive effect on society
8 Sentiment analysis involves classifying opinions into categories such as “positive” or “negative”.
Quotes from free text
“A computer if programmed correctly will not produce any errors when a human can.” (50-59 years)
“The removal of emotion and personal circumstances can lead to more consistent and fair decisions.” (40-49
years)
“AI can be used to aid in education and understanding of many issues.” (20-29 years)
“It can be used in a range of areas to improve speed, accuracy, reduce costs of certain tasks. It is already used in
lots of ways. It can standardise tasks/tests etc. as not prone to same biases of humans (other biases may exist
and need to be taken into account). Reduce problems caused by human error and differences in my opinion.” (30-
39 years)
Respondents were asked about their views in a free text response on using AI in the breast screening
programme, both as a second reader and to support programme management. Sentiment analysis of these
responses showed that the largest proportion of women were positive about the use of AI in the breast screening
programme (46%) with the next largest expressing mixed views (20%) and 16% expressing a negative view
[Figure 15].
Figure 15: Views on the use of AI as part of the breast screening programme (% of total sample)
Women of screening age were the more strongly positive of the two groups and women under screening age
more likely to hold mixed views [Figure 16].
Figure 16: Views on the use of AI as part of the breast screening programme (% of each sample group)
Thematic analysis of the free text data showed that, when perceived benefits of using AI in the breast
screening programme were mentioned, women were most likely to say that they were not sure what these
would be (n=543). When benefits were identified, the most frequently mentioned were increased efficiency
(n=162), improved reliability (n=263) and greater safety (n=139). Many women expressed the view that AI in
breast screening would and should happen in the future (n=847), representing 78% of those who expressed a
view about the future of AI in breast screening.
These themes around potential benefits were explored in more detail as part of the focus groups that followed
the survey. Information about the participants in the focus groups is included in Appendix 7.
Of the 25 women who took part in the focus groups, 76% had either experienced a breast cancer diagnosis
themselves or knew someone who had. Seventy-two percent had had a breast cancer screening appointment.
Sixty percent of the women who took part knew that two readers looked at mammograms and 36% did not
know the process for reading mammograms. This made them a more informed group than the general
population surveyed, probably reflective of their self-selecting status when volunteering to take part in the focus
groups.
Many focus group participants expressed the view that the use of AI in healthcare and specifically in the breast
screening programme was inevitable (n=11), with some seeing a positive contribution being made by AI (n=4).
The main benefits that women saw AI in breast screening offering were in increased efficiency (n=23), improved
reliability (n=12), improved outcomes (n=8) and improved safety / fewer errors (n=8). They also hypothesised
that introducing AI into the breast screening programme might release staff to higher value activities and save
money for the service (n=6) and help address the workforce shortage within the breast screening programme
(n=17).
Quotes from free text
“My GP has introduced AskMyGP – I was blown over from the response - personalised to me. I would find this
easier to do and would prefer to spending 2-3 hours going to the surgery.” (Woman under screening age)
“I'd like to think that this AI will shorten the time taken from the mammogram being taken to getting the results.”
(Breast cancer survivor)
“AI in the background - you could really get a lot of people through the system. I'd have a level of comfort from
a mammogram point of view. I guess if there was a problem you would have a review by a radiologist. I guess on
the back end you would still be getting some kind of personal touch.” (Woman of screening age)
“I wanted to choose a mix of AI and human. So much of my life was waiting for results and being on hold, I think
it was about speed and accuracy for me. I don't have enough experience of normal mammograms to know how
to answer.” (Breast cancer survivor)
What were the concerns of the workforce and women about the use of AI tools in the breast screening service?
Across all the groups we researched, ‘trust’ was the single most frequently mentioned concern. The emphasis for
each group, in terms of both the reasons for lack of trust and the consequent mitigations, was slightly different;
these differences are explored in the following sections.
The word “trust” was mentioned 137 times in the public survey in the free text response to two questions which
sought to understand more detail on respondents’ attitudes to the use of AI.
• Tell us more about why you selected the level of agreement with Artificial intelligence can have a
positive effect on society that you did.
• How would you feel about artificial intelligence being used to read mammograms?
When we included synonyms such as “sure”, “confident” and “believe”, this incidence increased to 696. Almost
all the respondents who mentioned trust either did not trust AI or felt that it could not be trusted if used in
isolation without human oversight.
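A keyword search of this kind can be sketched as follows. This is a minimal illustration that assumes case-insensitive matching on words beginning with each search term; the report does not state exactly how its search was run, and the sample responses are invented for the example rather than taken from the survey.

```python
import re

# Search terms named in the text. Matching on word stems is an assumption.
TRUST_TERMS = ["trust", "sure", "confident", "believe"]


def count_mentions(responses, terms):
    """Count words in the responses that start with any of the given terms."""
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(?:" + alternatives + r")\w*", re.IGNORECASE)
    return sum(len(pattern.findall(text)) for text in responses)


# Hypothetical free-text responses, invented for illustration only.
sample_responses = [
    "I would not trust a computer on its own.",
    "I am not sure, but I believe a radiologist should check the result.",
    "Trustworthy if a human is still involved.",
]

trust_only = count_mentions(sample_responses, ["trust"])
with_synonyms = count_mentions(sample_responses, TRUST_TERMS)
print(trust_only, with_synonyms)
```

A count like this records how often a term appears, not whether the respondent trusts or distrusts AI, so each match still has to be read in context before it is interpreted.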
Clinical staff
When asked how often they trusted information from search engines for general queries, clinical staff trusted this
information often or very often 64% of the time. Seventy-three percent of this group use search engines to seek
health information for themselves, and the proportion who do this declines only among those in the 60-69
years age band [Figure 17].
Figure 17: Clinicians' use of search engines to seek health information
While some clinical staff see an AI second reader as potentially alleviating the severe pressure placed on the
screening workforce at the moment while releasing staff to activities that are “patient-facing”, others expressed
concern about the impact of introducing AI as a second reader on possible job losses and reduced opportunities
for reporting radiographers to enhance their skills.
Clinical staff were also concerned about the safety, accuracy and reliability of the AI reader and saw the
publication of clinical trials as an essential prerequisite to adoption of the technology in the breast screening
service workflow. A text search of free text returned 30 mentions of “evidence”, “trial” or “audit” in round 2 of
the survey, an increase from round 1.
Quotes from free text
“I believe the use of AI could be an exciting development in improving the service, however I would want to see
the evidence that it is a safe tool to use.”
“I think AI tools in the breast screening programme may be useful in booking patients, sending them invitation
to screening etc, but probably not advisable/safe to use in image reporting.”
“It will be easier to assign sensitivity thresholds with an AI to reduce false positives.”
Non-clinical staff
When asked how often they trusted information from search engines for general queries, non-clinical staff trusted
this information often or very often 66% of the time. Sixty-six percent of this group use search engines to seek
health information for themselves.
During the project, the non-clinical workforce had no real-world experience of a tool in practice until the last
two months (after the surveys had closed). This was reflected in the common assertion in the free text responses
that respondents to the survey did not have enough information to express a view on the introduction of AI into
the service management workflow. Some concerns were expressed about the possible effect on jobs in terms of
job losses if a service optimisation tool is introduced.
Quotes from free text
“Because I do not know much about it and nothing artificial is usually not good. However, if I don't understand
how it works then my answer will be biased.”
“I feel uneasy about it until there is more information and research to prove the reliability and benefit and also
wonder what this means for jobs.”
“I would like to see it in place first before I make a comment.”
Non-clinical staff also expressed some concerns about the accuracy and reliability of the AI second reader in free
text responses and the knock-on effect that this could have on women’s experience of and confidence in the
service.
Women of and under screening age
When women were asked in the survey the extent to which they trusted the output of commonly used
technology platforms that use AI such as search engines and virtual assistants, 60% of all women said they
trusted the output often or very often leaving over one third somewhat or very sceptical [Figure 18].
Figure 18: The extent to which women trust the information they get from the technology that they use every day
There was a small difference between women of screening age and under screening age with those of screening
age a little more sceptical than those under screening age [Figure 19].
Figure 19: Difference in trust between women of and under screening age
Six hundred and seventy women chose to explain why they had given the response that they did to the statement
“Artificial intelligence will have a positive effect on society” and when these responses were analysed for
sentiment, 39% made mixed or negative statements about the impact of AI. Women under screening age had
slightly more mixed or negative views (41%) than women of screening age (37%) [Figure 20].
Figure 20: Sentiment analysis of free text in response to AI having a positive impact on society
Women who had a negative or mixed view of the effect of AI in society were unsure of why they felt this way in
many cases (n=96) although they felt it was an inevitable part of their lives in the future (n=20). Those that did
express a view cited concern about the reliability and safety of technology (n=123); a lack of trust in the
technology itself or the systems that sit around it (n=65); a fear about a combination of over-reliance on AI and
job losses that might ensue (n=32); and the absence of the human touch in interactions (n=46).
Quotes from free text
“If used well, AI has an important part to play in diagnosis of disease. However, there are also dangers of it being
used to further profit-driven goals.” (50-59 years)
“It reduces human interaction but I agree there are a lot of amazing applications of AI that can, for example,
keep the disabled independent.” (40-49 years)
“It has the potential for profound good or profound harm. It must be controllable.” (20-29 years)
“AI has the ability to remove skills and development of people and learning is wasted. I’m concerned for my
children in the further, I believe that humans will be removed from day to day things and life will miss person
centred contact.” (30-39 years)
When asked about their views on using AI in the breast screening programme, both as a second reader and to
support programme management, sentiment analysis of these free text responses showed that 16% of
women expressed a negative view [Figure 15].
When we compared the women’s views of the effect of AI on society and their views on the use of AI in the
breast screening programme, women with positive views on the effect of AI on society as a whole were slightly
more likely to hold positive views on the use of AI in breast screening but interestingly, also more likely than
women who had a negative or undecided view on the effect of AI on society, to hold a negative view of AI in
breast screening. In other words, women who have positive views about AI’s effect on society have more decisive
views on AI in breast screening (positive or negative). Women with the highest proportion of negative views on
AI in breast screening were those with neutral views on the effect of AI on society [Table 11].
Table 11: Cross tabulation of the perception of the effect of AI on society with perception of use in breast screening
                                             View of the use of AI in breast screening
View on the effect of AI on society (rank)   No answer   Negative   Mixed   Undecided   Positive   Grand Total
Strongly disagree                                    3         26      12           7         52           100
Disagree                                             5         77      15          16         16           129
Neither agree nor disagree / Undecided              68        414     329         423        569          1803
Agree                                               29        148     452         139       1069          1837
Strongly agree                                       4         10      31           8        174           227
Grand Total                                        109        675     839         593       1880          4096
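Comparisons drawn from a cross tabulation depend on whether each count is read as a share of its row or of its column. The sketch below computes both from the counts in Table 11; the function names are illustrative, and the report does not state which normalisation its commentary relies on.

```python
# Counts copied from Table 11, in the column order given by VIEWS.
VIEWS = ["No answer", "Negative", "Mixed", "Undecided", "Positive"]
TABLE_11 = {
    "Strongly disagree": [3, 26, 12, 7, 52],
    "Disagree": [5, 77, 15, 16, 16],
    "Neither agree nor disagree / Undecided": [68, 414, 329, 423, 569],
    "Agree": [29, 148, 452, 139, 1069],
    "Strongly agree": [4, 10, 31, 8, 174],
}


def row_shares(table):
    """Percentage of each breast screening view within each society view."""
    return {
        label: [round(100 * n / sum(counts), 1) for n in counts]
        for label, counts in table.items()
    }


def column_shares(table):
    """Percentage of each society view within each breast screening view."""
    totals = [sum(column) for column in zip(*table.values())]
    return {
        label: [round(100 * n / t, 1) for n, t in zip(counts, totals)]
        for label, counts in table.items()
    }


ROW = row_shares(TABLE_11)
COL = column_shares(TABLE_11)
neg = VIEWS.index("Negative")

for label in TABLE_11:
    print(f"{label}: {ROW[label][neg]}% of this group were negative; "
          f"they make up {COL[label][neg]}% of all negative views")
```

Row shares describe how likely a group is to hold a given view, whereas column shares describe how much of a view comes from each group, so the two can rank the groups differently.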
When women’s perception of the use of AI in the breast screening programme was compared with women’s
self-reported understanding of AI overall, we found that the higher women rated their understanding of AI, the
more likely they were to have a positive view of its use in the breast screening programme [Table 12].
Table 12: Cross tabulation of self-reported understanding of AI with perceptions of AI in breast screening
                                                 View of the use of AI in breast screening
Understanding of AI in general                   No answer   Negative   Mixed   Undecided   Positive   Grand Total
None                                                    33         66      25          90         90           304
Aware that it exists but little understanding           33        287     269         261        583          1433
Some understanding                                      40        310     525         236       1148          2259
Extensive understanding                                  3         12      20           6         59           100
Grand Total                                            109        675     839         593       1880          4096
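The relationship described for Table 12 can be checked by dividing the number of positive views by the row total at each level of understanding. The sketch below uses the counts from the table; the variable names are illustrative.

```python
# Counts copied from Table 12: (level, positive views, row total),
# listed from the lowest to the highest level of self-reported understanding.
TABLE_12_POSITIVE = [
    ("None", 90, 304),
    ("Aware that it exists but little understanding", 583, 1433),
    ("Some understanding", 1148, 2259),
    ("Extensive understanding", 59, 100),
]

# Percentage of each group holding a positive view of AI in breast screening.
positive_share = [
    (level, round(100 * positive / total, 1))
    for level, positive, total in TABLE_12_POSITIVE
]

# True when each level has a higher positive share than the level before it.
rises_with_understanding = all(
    later[1] > earlier[1]
    for earlier, later in zip(positive_share, positive_share[1:])
)

for level, share in positive_share:
    print(f"{level}: {share}% positive")
print("Rises with understanding:", rises_with_understanding)
```

On these counts the positive share increases at every step, from roughly 30% among women reporting no understanding to 59% among those reporting extensive understanding, although the latter group contains only 100 respondents.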
Thematic analysis of the free text data showed that the concerns and risks of using AI in the breast screening
programme most frequently mentioned were: a lack of clarity around how the technology would be governed
and regulated once in place (n=163); and the lack of ‘human touch’ that may result (n=143). A large number of
women (n=643) expressed the view that they expected the AI tool being used as a second image reader would
be rigorously tested and there would be robust evidence made available on its safety and accuracy. A small but
not insignificant group (n=242) of women who expressed a view about the future of AI in breast screening said
that AI should not be used within the breast screening programme.
These themes around potential concerns were explored in more detail as part of the focus groups that followed
the survey. Information about the participants in the focus groups is included in Appendix 7.
As we stated before, the 25 women who took part in the focus groups were a more informed group than the
general population surveyed.
The main concerns that were expressed by women were: the absence of the ‘human touch’ (n=37); lack of clarity
around how the AI tools will be governed and potential discriminatory bias avoided (n=33); and how data privacy
will be protected (n=25).
Quotes from free text
“There are all those sci-fi movies where it goes rogue and I am not saying it is not completely far-fetched, most
of it is, but I think it about having some real strong list of ethical principles about how you use it but in free
market capitalism you are not going to have that are you? that would be bad for society, it will make money for
the money people, it will leave behind the poor people and there will be some good people along the way who
will do good things with it.” (Women of screening age)
“I think behind the scenes it is great, but I think you need a lot of face to face compassion and understanding.”
(Woman under screening age)
“If there was some sort of consent, confidentiality, some sort of understanding of the rules. It would be nice to
know that some sort of trustworthy organisation was monitoring it.” (Breast cancer survivor)
“I have been reading negative stuff about AI like facial recognition and how it’s a bit biased - would it be biased
against certain racial populations?” (Woman under screening age)
“Internet security and AI security and hacking - it is a concern for me. My worry would be that systems are hacked.
Do they have a minimum level of security?” (Woman of screening age)
When exploring the kind of actions that women thought would mitigate some of their concerns, they suggested
that the workflow would always need to involve humans. Some women envisaged the AI technology undertaking
most of the activity, including decision-making, with human oversight (n=10); others saw the human role as
pre-eminent, with AI used to augment clinical activity and decision-making (n=15). The women assumed that
this technology would never be used without clear evidence of its effectiveness (n=24) and expected its effect on
the equity of access to breast screening to be closely monitored through governance processes (n=18).
Women were divided on whether they would want to be informed if AI tools were being used as part of their
experience of breast screening but agreed overall that women should be given the information as part of the
process of informed consent when taking part in the breast screening programme (n=15).
What were the technical and data benefits and challenges for the project?
This project was not testing real-world deployment of market-ready digital tools, but developing innovative tools
using artificial intelligence in a real-world context. In the case of Mia™, this meant training, validating and testing
their AI breast image reader on a large retrospective data set extracted from the GE Healthcare system that
EMRAD Trusts use to process and store all medical imaging. Faculty first tried to develop their service
optimisation tools using NBSS data extracted from EMRAD trusts but it became clear that this would not be
possible and so they pivoted to developing a synthetic data set (SDS) based on an extraction of real-world data
at one EMRAD trust site, and then developing the service optimisation tools using this SDS.
Whilst there were some minor technical challenges during the project, the data challenges were more
significant. These can be separated into challenges with (1) ethical approval and information governance; (2)
data extraction and transfer; and (3) data cleaning and preparation. The findings below are based on content
analysis of project board minutes, individual and group interviews and other project documents.
Mia™ AI breast image reader
The work undertaken by Kheiron as part of the project was governed by the ethical approval obtained from the
Health Research Authority (HRA) for the retrospective study. During the project, the EMRAD team identified that
Kheiron’s internal/onsite computer servers used to store the de-identified extracted data were not compliant
with the ISO27001 standard. The AI project board discussed the issue and assessed its impact as low due to the
nature of the data (de-identified), the contractual protections in place, and the imminent update of ISO27001.
In October 2018, GE Healthcare’s datacentre copied the data onto an encrypted share which was transported
securely to NUH. The next six months were spent on data de-identification, quality control and information
governance sign-off. After this was complete, half of the data was transported securely to Kheiron’s offices in
London. This transfer process took considerable time and coordination to organise to comply with information
governance requirements. Obtaining sign-off for data transfer from more than one trust caused significant
delays to this process and suggestions have been made to mitigate this in future projects.
Data cleaning and preparation is a critically important stage in the process of training an AI tool (Harvey H.,
2019). Kheiron worked with the two trust sites (NUH and ULH) to ensure the de-identified datasets still contained
sufficient data markers to enable ML training. A risk was identified early on that data fields removed during the
anonymisation process, or data fields not extracted initially, may later prove important for training. In March
2020, it became clear that two such data fields were missing. The data had to be re-extracted. This lesson was
recorded in the project log and the AI project board agreed future mitigations should this arise again.
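One way to guard against this kind of risk is to compare the fields present in a de-identified extract against the fields the training pipeline needs, before the data leaves the trust. The sketch below is illustrative only: the field names and the `missing_training_fields` helper are hypothetical and are not taken from the project's extraction scripts.

```python
def missing_training_fields(extract_fields, required_fields):
    """Return the required fields absent from a de-identified extract, sorted."""
    # Normalise names so that case or stray whitespace does not hide a match.
    present = {name.strip().lower() for name in extract_fields}
    return sorted(f for f in required_fields if f.strip().lower() not in present)


# Hypothetical field names, for illustration only.
required = ["study_id", "image_laterality", "view_position", "screening_outcome"]
extracted = ["study_id", "image_laterality", "acquisition_date"]

gaps = missing_training_fields(extracted, required)
if gaps:
    print("Re-extraction needed for:", ", ".join(gaps))
```

Running a check of this kind at the point of extraction, rather than at the point of training, would surface missing fields while the source system and its specialist scripting expertise are still to hand.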
The National Breast Screening System (NBSS) is provided by Hitachi Consulting and has been used by Public
Health England’s (PHE) national breast screening service since the 1980s (House of Commons, 2019a). The
system is maintained but the underlying programming is not now commonly used and requires specialist
knowledge to write scripts for data extraction. Kheiron were able to source this expertise early in the process in
October 2019. They also needed to work closely with GE Healthcare to integrate Mia™ with the GE imaging
interface. Having GE Healthcare as a partner in the project from the beginning made this critical enabler possible.
Faculty service optimisation (round length ML) tool
During the first half of the project, Faculty were planning to develop their service optimisation tool using
de-identified data extracted from NBSS and BS-Select [9], which includes GP data. Again, the age of the NBSS system
meant that Faculty had to re-scope their tool design to focus on static downloads of data, as PHE advised them
that development work to enable integration with NBSS would not be a priority. Faculty mitigated this risk by
removing the need for a live feed from the NBSS system.
It quickly became clear that information governance would make GP data inaccessible to Faculty. When
exploring how they would extract and transfer the data they needed from the EMRAD sites (NUH and ULH),
Faculty and EMRAD recognised the need for a data protection agreement. After several weeks of discussion, it
became clear that the level of the liability cap was contentious, and this took some time to resolve. During this
time, the two parties (EMRAD and Faculty) had some frank discussions about the continued inclusion of Faculty
in the project, agreeing in September 2019 that Faculty should continue and reaching a compromise on liability
under the data processing agreement. By October 2019 it had become clear that the data requested could not
be shared due to complexity and sensitivity.
At this point, it was agreed that Faculty would refocus activity on the production of a synthetic data set (SDS)
that could then be used to develop a few different service optimisation tools for the breast screening
programme. To address the information governance challenges that proved insurmountable during the first half
of the project, Faculty applied to HRA for ethical approval of the development of the SDS, with NUH as
sponsor of the research. This provided Faculty with access to specialist advice from NUH’s Research and
Innovation team. The novelty of this type of research in the NHS meant that the process for applying for ethical
approval was unclear and required many discussions with stakeholders and decision-makers. For example,
initially it was thought that the research did not need to go to the Confidentiality Advisory Group (CAG), then it
was decided that it did and then subsequently confirmed that it did not. This decision alone took three months
to confirm, adding time to this process. HRA approval was finally granted in July 2020.
For the development of the SDS, Faculty agreed an approach in which the round length management ML tool
would first be trained and refined on synthetic data [10] on Platform™, then deployed into NUH’s IT environment
and retrained on real data there [Figure 21].
[9] BS-Select is another IT system used by the NHS Breast Screening Programme to take information from primary care to invite women to breast screening appointments.
[10] Synthetic data is produced by a learned generative model of original data, in this case extracted from NBSS, which retains the same statistical patterns as the original data.
Figure 21: Process for developing the SDS
The first four stages were completed by the end of September 2020 in the NUH environment. Because of the
Covid-19 social distancing restrictions, a model was developed which enabled a Faculty data scientist to access
historic NBSS data remotely and securely through a virtual desktop infrastructure (VDI), authenticating with a
soft token. This interaction was observed by an NUH IT team member to provide assurance that agreed
protocols were being adhered to. Faculty used JupyterLab open-source software and Faculty’s proprietary code
to generate the SDS. The SDS generation tool developed during this process is a supplementary output of this
work and will remain in place in the NUH environment to be used for generating other SDSs. This generation
tool is co-owned by NUH (on behalf of EMRAD) and Faculty.
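To illustrate the general idea of a synthetic data set, the sketch below fits a very simple generative model to a handful of made-up records and then samples new records that follow similar statistical patterns. It is not Faculty's proprietary method, which this report does not describe: the record layout (an age band and a round length in days), the function names and the choice of a per-band Gaussian are all assumptions made for illustration.

```python
import random
import statistics
from collections import defaultdict


def fit_generative_model(records):
    """Learn P(age_band) and a per-band Gaussian over round length in days."""
    by_band = defaultdict(list)
    for band, interval_days in records:
        by_band[band].append(interval_days)
    total = len(records)
    model = {}
    for band, values in by_band.items():
        model[band] = {
            "weight": len(values) / total,  # share of records in this band
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return model


def sample_synthetic(model, n, seed=0):
    """Draw n synthetic (age_band, interval_days) rows from the fitted model."""
    rng = random.Random(seed)
    bands = list(model)
    weights = [model[b]["weight"] for b in bands]
    rows = []
    for _ in range(n):
        band = rng.choices(bands, weights=weights, k=1)[0]
        params = model[band]
        rows.append((band, rng.gauss(params["mean"], params["stdev"])))
    return rows


# Made-up records standing in for a real extract; no real data is used here.
toy_records = [("50-54", 1000.0), ("50-54", 1100.0),
               ("55-59", 1090.0), ("55-59", 1110.0)]
toy_model = fit_generative_model(toy_records)
synthetic_rows = sample_synthetic(toy_model, 5)
```

No sampled row corresponds to an individual in the source data, but the mix of age bands and the typical round length per band are carried over. A production approach would need a far richer model and a formal assessment of re-identification risk.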
The next stages (5-6) took place in the Faculty environment after SDS data transfer. The first attempt at data
transfer was deleted due to concerns about data completeness post-transfer. These two stages took 6-8
weeks and were delivered concurrently with stage 7 (building the deployment environment in NUH IT) which
enables the tool to be deployed rapidly by ensuring the readiness of the host IT infrastructure. The breast
screening service managers at NUH were involved in the development of the wireframes during this stage of
development to make sure that prioritised features met their needs and reflected their existing decision-making
processes. The tool has not yet been integrated with live NBSS and it is unlikely that this is realisable in the near
future. However, the tool is available to service managers as a service planning tool which enables managers to
build schedules and plan against scenarios using the NBSS-based synthetic data. It will be possible for service
managers to load static cuts of NBSS into the tool with minor additional development. Further
development is planned to continue in 2021.
Breast screening service staff perceptions of the technical feasibility of using AI tools in the workflow
When clinical staff were asked in the survey if they thought it would be technically feasible to embed an AI breast
screening management tool into current working practices, 49% of respondents in round 2 believed that it would.
This was down from round 1 (57%). This may be explained by the lack of perceived progress in introducing the
AI second reader into their workflow during the period of the project, given it was a planned activity at the outset.
When asked the same question, non-clinical staff were more equivocal in both rounds, with 67% of respondents in
round 2 neither agreeing nor disagreeing, down from 74% in round 1. The remaining minority of respondents
agreed it would be technically feasible. In the free text, some respondents voiced dissatisfaction with the
current technology offered by EMRAD as part of the wider programme of EMRAD’s work and this may bias their
perception of any initiative.
EMRAD programme leadership perceptions of the technology
This project is governed as part of the wider EMRAD governance structure and process. As such, the members
of the EMRAD governance groups are accountable to their trusts for the performance of this and other projects
undertaken by EMRAD. A sample of this group were interviewed at the beginning of the project and a wider
group were sent a survey in July 2020 to understand their perception of the progress that the project had made.
We received 18 responses from a mix of people who had direct involvement with the day to day running of the
AI project (n=6) and people who were indirectly involved, taking governance or advisory roles (n=12).
When asked how easy the AI tools would be to use within the existing technical infrastructure, the majority
of respondents said they did not know (n=11). Those who said it would be easy (n=5) were not directly involved
in the day to day running of the project.
What were the organisational issues that enabled or constrained the progress of the project?
The project was extended by Innovate UK beyond its original timeline on two occasions, once by six months in
response to funding mobilisation delays and the second time by a further three months in response to the
COVID-19 pandemic. It was also re-scoped on several occasions for both AI tools being developed and tested.
The findings related to organisational issues below are based on content analysis of project board minutes,
individual and group interviews, and other project documents as well as responses to the programme leadership
survey administered in July 2020. Three main categories of enablers or constraints were identified in the data:
(1) capacity, (2) readiness and (3) implementation. These are explored at three levels: (1) the core EMRAD project
implementation team, (2) the wider AI project implementation team including commercial partners and (3) EMRAD
trusts.
The project was led by NHS EMRAD Imaging Network, a non-statutory body in which all seven trusts have equal
interest. EMRAD is hosted by Nottingham University Hospitals NHS Trust (NUH) and NUH was chosen as one of
the two sites for the test bed. There is strong support for NUH taking a leading role on behalf of the network in
this and other projects. EMRAD had been the vehicle through which the network successfully delivered a new
model of care as part of the Acute Care Collaboratives (ACC) programme sponsored by NHS England from 2016-
2018. The resources, structures and processes built around the EMRAD network created capacity at project level
with a core project team allocated to the test bed project from the outset, including in the test bed application
process. However, the delay in distributing funding from Innovate UK during the first quarter of the project
meant that EMRAD and the host organisation NUH were unable to recruit the additional project management
staff they needed for the project. By the time the funding was in place and the roles were approved, the length
of appointment (a maximum of 12 months at that point) made the roles unattractive and only one post was
filled. In March 2020, as the Covid-19 pandemic increased the need for frontline resource, some clinical staff
seconded to the team were redeployed to support the pandemic response in their
employing trusts.
The wider team includes the commercial partners in the project, GE Healthcare, Kheiron Medical Technologies
and Faculty. These companies were able to flex their own internal capacity to meet the needs of the project at
different times. Kheiron raised a Series A funding round in 2019 and Faculty were the successful supplier to
support NHSX in building the NHS AI Lab in 2020. However, the national lockdown imposed in March 2020 in response
to the rise in community spread of Covid-19 led to at least one commercial partner putting staff on furlough in
April 2020.
The breast screening units in each of the test sites, NUH and ULH, had different levels of capacity, as well as
activity. ULH’s team is smaller, with high levels of variance in vacancy rates across all staff groups from year
to year [Figure 22].
Figure 22: Whole time equivalent funded staff dedicated to the breast screening service 2017-2019
The number of women invited for screening has increased at ULH over the last 3 years and is now similar to the
level at NUH, although uptake rates remain lower at ULH [Figure 23].
Figure 23: Women invited and attending screening appointments at the two test sites (NUH and ULH)
Each of the test sites had a nominated principal investigator (PI) who was also a senior clinical practitioner on
each site and had a significant ‘day job’. Despite this, they provided support and commitment to the project
from beginning to end. The capacity of the host trust’s information governance (IG) team to provide the levels
of IG support that were required by both teams was severely stretched by the significant pressure placed on them.
To address this, the EMRAD core team negotiated with other information governance resources from the wider
EMRAD network to agree the content of a Memorandum of Understanding so that an IG lead from one EMRAD
Trust could make recommendations for any other EMRAD Trust whilst the ultimate decision remained with
the Trust's Caldicott Guardian.
The results of the clinical and non-clinical staff surveys indicated some variation in the readiness of staff to use
AI tools as part of their workflow. Clinical staff were open to the use of AI as a second image reader although
they wanted to see evidence from research trials that it is safe and effective. It is not clear how or if the use of
Mia™ will change their experience of the clinical workflow. Non-clinical staff had been engaged early in the
project as part of the discovery process but had heard little until the last two months of the project.
Exploitation was a key activity of the test bed although again the more iterative and novel nature of this project
meant that this required a different approach to other test beds. In the case of Mia™, EMRAD was expecting to
be able to procure a trained and validated product for use in clinical workflows by the end of the project.
However, a few issues constrained this. A business case was drafted, but without an agreed price for Mia™, could
not be developed in detail. Whilst Class IIa CE marked [11], under the new evidence standards for digital health
technology (NICE, 2019), Mia™ is classed as Tier 3b [12], which requires the highest level of evidence. These
evidence standards are not mandatory but are being used by regulators and purchasers across the NHS in
England when assessing the value of new technology. The standards were developed during this project. The
project partners contributed to their design and to testing the first questionnaire to assess technology evidence.
There is still no clear process for applying the evidence standards in practice.
Even if Mia™ had been fully tested in clinical workflows, a price set and regulatory approval granted, there was
still no procurement mechanism available for commercial exploitation. Kheiron applied to the Health System
Support Framework (HSSF) Lot 0 in early 2020 but failed to secure a place. Currently, Kheiron is not on any
procurement frameworks for the NHS. In the case of Faculty, there is currently no service optimisation tool to
procure. When this changes, Faculty is on the UK government’s G-Cloud procurement framework and has already
successfully become a supplier to NHSX supporting the development of the NHSX AI Lab.
The project implementation was led by the EMRAD core team. This team and its previous experience working closely
with the imaging system vendor, GE Healthcare, ensured that GE were involved in the project from the very
beginning. The supportive involvement of GE Healthcare minimised what could have been a very significant
obstacle to implementation.
The programme leadership surveyed in July 2020 were asked about their perception of the progress made by
the project up to that point against expectations [Figure 24]. The group was split between those who thought it
had made the expected progress and those who thought it had made less than expected. Only one respondent
thought it had made more progress than expected and that respondent came to the project in early 2020.
[11] Class IIa medical devices are low to medium risk devices.
[12] Digital health technologies (DHTs) with measurable user benefits, including tools used for treatment and diagnosis, as well as those influencing clinical management through active monitoring or calculation. It is possible DHTs in this tier will qualify as medical devices.
Figure 24: The programme leadership's perception of project progress (July 2020)
When asked to explain why they gave this assessment, those who felt the project had made the progress they
expected cited the following factors as influencing progress: the novelty of the technologies, the complexity
of the project and the trust between the partners in the project. The mitigating actions they observed
included changing project scope, learning iteratively by building formative evidence and leveraging the
EMRAD core team and its considerable change management experience. Those who thought the project had
made less progress than they hoped pointed to the absence of real-world application of the tools, a key
deliverable of the project in their view. The problems extracting data were identified as a key constraint.
Quotes from the programme leadership survey
“From an EMRAD point of view, this is very much new territory for us and is the first time that we have had to
apply for HRA approvals, complete Data Access and Processing Agreements, Data Protection Impact
Assessments, etc.”
“Whilst it would be nice to say that Mia (and Faculty's tool) were in use in clinical settings, this is not the case
right now. I think this is not a reflection of slow progress, but rather of the complexity of the task, and the
innovative nature of the technologies, but also getting these technologies ready to be applied in a clinical
setting.”
“[This] has been a steep learning curve for many of us.”
“I am disappointed that the tool is not yet in a live or semi-live setting.”
“At one point in the project, I thought that Faculty would have to drop out of the project as they were unable to
access the necessary data.”
“We have learned that it is far more complex than we first envisaged, with a number of regulatory and national
stakeholders needing to endorse the changes before we can deploy.”
“The project has been delayed for numerous reasons, some beyond the projects' control.”
Focusing on project attributes, a majority of the programme leadership who responded agreed that the vision
and objectives of the project were clear (89%), that decision-making and reporting were clear, timely and agile
(89%) and that the project was well resourced for implementation (72%). They also expressed the view that the
relationships between the project partners were open and based on trust (72%). There was less agreement when
programme leaders were asked whether clinical and non-clinical teams from the breast screening service had
been involved in the project, with only 45% agreeing that this was the case. This reflects the views expressed
by a few of the staff in the round 2 surveys who had also expected to see more progress in the project by July
2020 and questioned why they had not been more involved.
What wider contextual issues affected the progress of the project?
The project was undertaken during a period of accelerated change for technology in the NHS and increasing
interest in the use of AI in healthcare. The wider contextual factors which impacted the progress of the project
were policy, regulation, social attitudes and Covid-19. Some of these impacts enabled progress and others
constrained it. The findings related to contextual issues below are based on content analysis of relevant UK
policy documents, regulation white papers, AI project board minutes, individual and group interviews, the
programme leadership survey administered in July 2020, the public survey administered in December 2019 -
February 2020 and the focus groups.
The policy context for technology including AI in healthcare has been supportive over the last six years (Asthana,
2019) with the NHS Five Year Forward View (Maruthappu, 2014) and NHS Long Term Plan (Alderwick, 2019) both
dedicating significant space to the role of technology in enabling greater care integration and coordination and
more focus on prevention for better population health.
A number of initiatives have been set up to promote the testing and adoption of technology in healthcare
including the NHS Test Beds Programme, of which this project is part, the Digital Catalyst programme and the
Global Digital Exemplars programme all of which have received significant investment. The Topol Review was
conducted in 2018-19 and looked at the effect of increasingly digital modes of healthcare provision on the
workforce and made a series of recommendations. NHSX was set up in July 2019 to bring together expertise in
digital, data and technology that had been fragmented across different bodies into one organisation. NHSX
included an AI team which published a number of policy papers during 2019 and 2020 including the “Code of
Conduct for data driven health and care technology” and “Artificial Intelligence: How to get it right” and set up
the AI Lab in late 2019. EMRAD and its commercial partners were involved in discussions with NHSX and other
third-party stakeholders (such as Public Health England) about the challenges the project was facing around data
sharing and collaborating with providers of systems outside the immediate programme. These discussions led to the
unlocking of some constraints for the project, as well as providing insight to NHSX informing the development
of some of their policies.
One of the areas of focus for NHSX’s AI Lab is the regulation of AI within the health system. The key health and
care regulators for England are all involved in this work and include the National Institute for Health and Care
Excellence (NICE), Care Quality Commission (CQC), Medicines and Healthcare products Regulatory Agency (MHRA) and the
Health Research Authority (HRA). The priorities for this work on regulation of AI in the NHS in England are (1)
streamlining processes for regulatory approval of AI tools, (2) regulating synthetic data sets, and (3) post-market
surveillance. One of the partners in this project, Faculty, are supporting NHSX in the set up and initial running of
the AI Lab. Whilst the teams supporting the two projects are separate, one of the project managers from the
EMRAD project was moved to the AI Lab when this project was set up in April 2020 and would have taken the
learnings from the EMRAD project with them. During the project, NHSX introduced the “Evidence Standards for
Digital Health Technologies” in March 2019 and the EMRAD project team were asked to be part of the testing of
the first assessment questionnaire tool. This enabled the project to reflect on some of the requirements and
burdens of likely future regulation and provide feedback to NHSX on the tool. This feedback was incorporated
into successive iterations of the assessment toolkit.
During the period of this project the General Data Protection Regulation (GDPR) came into force across the EU
in May 2018. This placed new requirements on information governance, particularly in the responsibilities of
data controllers and processors.
In May 2020, the ICO, together with the Alan Turing Institute published guidance on explaining decisions made
by AI (ICO and Alan Turing Institute, 2020) and later the ICO updated their guidance on AI and data protection
(ICO, 2020). While neither of these had a direct impact on the project, both of these are likely to have significant
impact on any future implementation of the tested AI tools. This is underlined by the importance that women
responding to our survey placed on how AI technology would be governed and regulated once in place (n=163).
For the AI project, each new introduction of guidelines in respect of AI and data protection required the project
team to spend time reading and understanding the impact of the guidance for the project, communicating
this to stakeholders, and recommending changes to project delivery and implementing these. All of these
provided valuable learning, which was recorded in the project lessons log, but also contributed to the delays
experienced by the project.
In August 2020, the Secretary of State for Health and Social Care announced that PHE would be abolished in
March 2021 and replaced by the National Institute for Health Protection. It is not yet clear where responsibility
for the national screening programmes including breast screening will sit, although the NSC will continue to
advise on any changes to the screening programmes. This announcement had little impact on the project
although it may impact how the recommendations and outcomes of the project are followed through if
responsibility for the national programme remains unclear.
The project was delivered against a backdrop of increasing awareness amongst the public about the use of AI in
daily life, the role of algorithms in decision-making and the benefits and disbenefits of technology in general. It
also surfaced some of the differences in the working practices and mindsets of people working in the technology
sector and people, including clinicians, working in the NHS. These social factors were noted throughout the
duration of the project. Whilst understanding and ‘explainability’ of AI remains relatively low across the board,
2020 has seen a number of public scandals involving algorithmic decision-making including the abandonment of
algorithmically determined exam results in the UK in August 2020. In the same month, the use of facial recognition
AI by South Wales Police was ruled unlawful, and a number of local authorities stopped using algorithmic decision
tools for benefit claims. This will affect public trust in AI and algorithm-based tools. A small but not insignificant group
(n=242) of women we surveyed expressed the view that AI should not be used within the breast screening
programme and it will be interesting to see if this group increases or decreases in time.
Socio-cultural differences between commercial technology firms and public sector providers were evident from
the outset. The first output from Faculty at the end of their ‘discovery’ phase in December 2018, presented ten
ideas for AI service optimisation tools. The AI project board asked for these to be evaluated for potential
outcomes, financial savings and commercial market opportunities. This ‘business case’ type approach contrasts
with the agile approach that was used by Faculty. Two board meetings over two months were needed to resolve
this difference in expectations and gain understanding of each other’s different working practices.
Quote from the Programme Leadership survey
“One of the biggest challenges for me personally has been bridging the NHS-industry communications gap. The
complexity of the technology, the scale, the novel nature of the task at hand, and the continuously steep learning
curve that comes for companies working on the development and testing of these technologies, often makes it
difficult to communicate externally at the right level of simplicity/ transparency/ cadence, whilst also not
introducing too much uncertainty or confusion such that trust is damaged.” (AI project programme leadership
team member #17)
There have also been other differences between AI project stakeholders on outcome expectations. A continuum
emerged between those who wanted to take a measured and cautious approach to gathering clinical evidence
as part of a process of clinical trials (for Mia™) and those who wanted to move quickly to real world deployment
based on existing results and monitor effectiveness in this real-world context. The nature of the tools being
tested as part of this project as ‘in development’ rather than ‘market-ready’ underscored these differences and
the lack of clarity around regulatory requirements further complicated matters. The NHS Test Bed Programme
itself was set up to promote the deployment and scale of proven technologies that are market ready. This meant
that there was a mismatch at times between the expectations of the EMRAD management board and the
clinicians about what is deliverable within the window of the project.
Quote from the Programme Leadership survey
“I am disappointed that the tool is not yet in a live or semi-live setting. The insistence on randomised trials …… is
short-sighted given that the technology is already proven to work. There are other ways to implement safely” (AI
project programme leadership team member #5)
The Covid-19 pandemic had several effects on the project, some of which were positive. During the pandemic,
the national breast screening programme was suspended and only recommenced in July 2020, and for some
breast screening units even later, as estate and workforce limitations made it more difficult to make the resumed
service “Covid-secure”. Many staff in the two breast screening units that took part in this project were
redeployed to support other services in their Trusts during spring 2020. Early in the national response, the
project team produced a mitigation document, shared with all stakeholders including Innovate UK and
NHS England, which set out the risks and how they would be managed and monitored through project
governance.
The pandemic provided an opportunity for the accelerated deployment of technology for patient consultations,
diagnostics, treatment monitoring and communications across the NHS (Schwamm LH, 2020). In round 2 of the
staff survey (July 2020), we asked clinical and non-clinical staff to reflect on the effect this had on their daily and
working life. Most staff saw substantial change in the use of technology over the four months prior to the
survey (March – June 2020), with the biggest change felt at ULH. Only 12% of respondents experienced no change
at all [Figure 25].
Figure 25: Have you noticed a change in your use of technology over the last 4 months or since the beginning of the Covid-19 pandemic both personally and professionally?
When we compared clinical and non-clinical respondents, it was clear that clinical staff had been more
affected by the rapid technological change that took place over the four months, although the opposite was the
case for staff groups at one site (Sherwood Forest Hospitals Foundation Trust) [Figure 26]. It will be interesting
to see if this is sustained over time.
Figure 26: Comparing experience of change - clinical versus non-clinical
What is the potential impact of the screening imaging innovation programme on the performance of screening services?
In September 2019, Kheiron Medical Technologies added a third site to the retrospective study. This site, which
is not an EMRAD Imaging Network member and is in a different English region, uses different mammography
equipment and was not included in the NHS Test Bed project. Kheiron used the data gathered from the
retrospective studies at all three sites to answer the evaluation questions set out at the beginning of this project.
Anonymised mammograms were extracted from the three sites, and a randomised sample was withheld from
algorithm development and training and retained for validation. The date range for the mammograms was
January 2012 to January 2019. The AI algorithm's opinion (normal or recall) was paired with the opinion of the
first human reader, obtained from NBSS, to simulate double reading. Sensitivity, specificity and discordant
opinion rate were calculated.
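As an illustration only, the three measures named above can be computed from paired reads along the following lines. This is a minimal Python sketch and not the code used by Kheiron: the `Case` record, its field names and the toy data are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical record layout, for illustration only.
    cancer: bool          # ground truth (e.g. biopsy-proven cancer)
    reader1_recall: bool  # first human reader's opinion
    ai_recall: bool       # AI algorithm's opinion


def sensitivity(cases, opinion):
    """Share of cancer cases that the given reader recalled."""
    cancers = [c for c in cases if c.cancer]
    return sum(opinion(c) for c in cancers) / len(cancers)


def specificity(cases, opinion):
    """Share of normal cases that the given reader did not recall."""
    normals = [c for c in cases if not c.cancer]
    return sum(not opinion(c) for c in normals) / len(normals)


def discordance_rate(cases):
    """Share of cases where reader 1 and the AI disagree."""
    return sum(c.reader1_recall != c.ai_recall for c in cases) / len(cases)


# Toy data, not study data.
cases = [
    Case(cancer=True, reader1_recall=True, ai_recall=True),
    Case(cancer=True, reader1_recall=True, ai_recall=False),
    Case(cancer=False, reader1_recall=False, ai_recall=False),
    Case(cancer=False, reader1_recall=False, ai_recall=True),
    Case(cancer=False, reader1_recall=False, ai_recall=False),
]

print(sensitivity(cases, lambda c: c.ai_recall))
print(specificity(cases, lambda c: c.ai_recall))
print(discordance_rate(cases))
```

Passing the opinion as a function keeps the same code usable for either reader, for example `lambda c: c.reader1_recall` for the first human reader.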
a) Is the Mia™ software suitable for use in a large-scale screening programme like NHSBSP by comparing
rates of specificity/sensitivity/recall rate on a large cohort of historic screening cases?
The following results were submitted in an abstract to the annual Radiological Society of North America (RSNA)
conference in December 2020 (Sharma, 2020 [in press]). 40,588 mammograms were reviewed. Overall, for the
two human reader process, 40,230 had a normal outcome and 358 were biopsy proven cancers. The rate of
discordant opinion for two human readers was 1,216/40,588 (3%). The overall recall rate was 4%, with a cancer
detection rate of 8.5 per 1000 [Figure 27].
When the AI algorithm was applied as the second reader to this test set, there was consensus in 33,255 (81.9%)
of the reads to either recall or not recall the cases. This meant that 7,333 (18.1%) of reads were discordant
between the 1st human reader and the AI algorithm. The AI algorithm had a sensitivity of 85.5% and specificity
of 87.2%, compared with a sensitivity of 89.4% and specificity of 96.0% for the first human reader. Combining the
AI algorithm with reader 1 gave a sensitivity of 95.0% and specificity of 96.9%, a cancer detection rate of 8.4 per 1000
and a recall rate of 4%. The results are summarised in Figure 27.
Figure 27: Summary results - Mia™ retrospective study
In late September 2020, an additional, entirely withheld, 130,269 cases from EMRAD (together with 150k+ cases
from other screening units in Leeds and Hungary) were processed by Mia™ in a clinical trial. The results are
currently being validated by an external CRO. Preliminary analysis from Kheiron indicates:
• AI performance has transferred well from internal data [Figure 27] to the trial, with the number of cases
rising to more than a quarter of a million (250k) and cases coming from two countries.
• Standalone, the AI has shown superiority in sensitivity and non-inferiority in specificity compared to
single readers.
| Metric | Mia™ standalone | R1 standalone | R1 + R2 screening outcome | R1 + Mia™ screening outcome |
| --- | --- | --- | --- | --- |
| Sensitivity (%) | 85.5% | 89.4% | 96.9% | 95.0% |
| Specificity (%) | 87.2% | 96.0% | 96.7% | 96.9% |
| Cancer detection rate (per 1000) | 7.5 | 7.9 | 8.5 | 8.4 |
| Recall rate (%) | 17.5% | 4.9% | 4.0% | 4.0% |
| 2nd human reader required (%) | | | 100.0% | 18.1% |
| 3rd human reader required (%) | | | 3.0% | 0.0% |
• With the AI as an independent reader in a double reader workflow, statistical analysis indicates
equivalent clinical performance (non-inferior in sensitivity and superior in specificity) and significant
operational savings.
An updated CE mark and academic publications based on the trial results are expected in early 2021.
b) Does the software achieve similar results for different manufacturers' equipment used across the
EMRAD consortium?
In a second paper submitted to the RSNA conference 2020 (James, 2020 [in press]), Kheiron looked at the results
for three different manufacturers' equipment. A random selection of 40,642 of the 245,277 extracted cases was
held back as a test set for this study and had not been used in the training of the algorithm. Human
reader results, outcome and pathology data on each case were extracted from NBSS. Cancer cases were all
biopsy proven. Cases were only considered normal for the analysis if they had a 3-year negative follow-up
mammogram. Differences in performance of the AI algorithm between vendors were assessed using receiver
operating characteristic (ROC) analysis. The two one-sided t-tests (TOST) method was used to assess equivalence
in sensitivity and specificity between vendors.
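The equivalence testing described above can be sketched as follows. The source does not give the exact test statistic, so this sketch swaps in a normal-approximation (z-based) version of TOST for the difference between two proportions rather than a t-test. The function names and the counts in the usage lines are hypothetical and are not the per-vendor figures from the study.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tost_two_proportions(x1, n1, x2, n2, margin):
    """TOST p-value for equivalence of two proportions within +/- margin.

    Uses an unpooled normal approximation (an assumption of this sketch).
    Equivalence is supported when the returned p-value is below the
    chosen significance level.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Null hypothesis A: diff <= -margin (rejected when diff is well above -margin).
    p_lower = 1.0 - normal_cdf((diff + margin) / se)
    # Null hypothesis B: diff >= +margin (rejected when diff is well below +margin).
    p_upper = normal_cdf((diff - margin) / se)
    # Both one-sided nulls must be rejected, so report the larger p-value.
    return max(p_lower, p_upper)


# Hypothetical counts: 88.0% versus 87.0% specificity on 10,000 normals each,
# tested against the 2.5% margin quoted in the text.
p_close = tost_two_proportions(8_800, 10_000, 8_700, 10_000, margin=0.025)
# Hypothetical counts with a 10-point gap, which should not pass.
p_far = tost_two_proportions(9_000, 10_000, 8_000, 10_000, margin=0.025)

print(p_close < 0.05)  # equivalence supported at the 5% level
print(p_far < 0.05)    # equivalence not supported
```

Reporting the larger of the two one-sided p-values is the usual convention for TOST, since equivalence requires both one-sided null hypotheses to be rejected.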
There were 12,462 mammograms consisting of 402 biopsy proven cancers and 12,060 normal cases with a 3-
year negative follow up. There were 6,378 mammograms from vendor 1, 3,988 mammograms from vendor 2
and 2,096 from vendor 3. Overall, the AI algorithm had a sensitivity and specificity of 86.7% and 87.1%
respectively. The algorithm had a sensitivity and specificity for mammograms from vendor 1 of 87.3% and 88.0%,
vendor 2 of 85.7% and 85.8% and from vendor 3 of 86.9% and 87.1% respectively (p<0.00037 for non-
equivalence at a 2.5% margin, for all tests, thus equivalence shown).
c) How do these results inform the process of designing and developing a prospective pilot with the
Mia™ software?
Prospective pilot design was not achieved within the timeframe of the Test Bed. In September 2020, Kheiron
were successfully granted an NHSX and AAC Phase 4 AI Award to deploy Mia™ across 15 sites in the UK over 3
years. At the time of writing, NICE and the Technology Specific Evaluation Team (TSET) assigned to Kheiron are
undertaking an evidence gap analysis as part of the AI Award to advise on the design and development of
prospective pilots of Mia™, in line with evidence standards for digital health technologies and the UK National
Screening Committee criteria for screening.
What is the potential impact of the screening optimisation innovation programme?
It has not been possible to evaluate the impact of the round length ML tool as it has not been deployed in a live
environment as part of the screening service workflow. The planned evaluation questions remain important
and should be addressed in 2021 once the tool is deployed.
The planned evaluation questions (December 2018)
What is the potential impact of the screening optimisation tool:
1. On breast screening programme manager and administrative staff time?
2. On breast screening workforce utilisation?
3. On optimised breast screening pathway and clinical risk management?
4. On the rate of breast screening ‘did not attends’ (DNAs)?
It is important to note that while the originally planned benefits of this part of the project have not yet been
delivered, the likely value to EMRAD, Faculty and the wider NHS of developing a synthetic data set using NBSS
data will be significant for developing operational tools and for research purposes. Interest in developing
synthetic data sets in healthcare is growing, with NHS Digital providing a synthetic data advisory service [13] and
NHS England and Improvement sharing synthetic A&E data sets to support the development of new tools that
will help with A&E demand and capacity management [14].
The development of the deployment environment is another output with likely tangible benefits for EMRAD and
Faculty as a Trusted Research Environment (TRE). TREs were developed by NHS Digital and Health Data Research
UK in response to the need for rapid research using real data in the context of Covid-19 [15]. The actual benefits
accrued will need to be monitored in the longer term. The model for development used for NUH could be scaled
to other Trusts in England. A possible additional question for future evaluation is:
What will the synthetic data set (SDS) allow NHS breast screening units to do that could not otherwise be done?
13 https://digital.nhs.uk/services/e-referral-service/document-library/synthetic-data-in-live-environments
14 https://data.england.nhs.uk/dataset/a-e-synthetic-data
15 https://digital.nhs.uk/coronavirus/coronavirus-data-services-updates/trusted-research-environment-service-for-england
Was the programme worth the investment, that is, did it deliver value for money and if not in the timeframe of the evaluation, when is it likely to deliver a return on investment?
A novel multi-stakeholder project of this nature will generate different benefits for each of the different
participants. Thus, when seeking to answer the question of ‘value for money’ and whether the benefits of the
sum of efforts exceed the direct costs of these efforts, a contextual lens is applied to the likely effort-outcome
for each participant.
A budget impact analysis (BIA) was undertaken to answer this question. BIA complements other types of
economic evaluations such as cost-effectiveness analysis (NICE, 2019), by providing decision makers with
additional information on the financial consequences of commissioning and procuring new technologies. More
information on the methods used in the BIA is provided in Appendix 3. The scope of the BIA covered Mia™
deployment for (i) the two test sites, (ii) all EMRAD sites and (iii) England. An analysis was not conducted for the
service optimisation tool as this was not tested or deployed within the data collection timeframe.
The BIA compared the usual care model (two human readers) with the proposed model (one human reader and
one AI reader) [Figure 28].
Figure 28: Usual care pathway versus new AI second reader pathway
Costs
In determining the costs of the project implementation, we used the return figures submitted by
each party to Innovate UK [Table 13].
Table 13: Project costs
| Partner costs | Nottingham University Hospitals NHS Trust | Faculty AI Ltd | GE Healthcare Ltd | Kheiron Medical Technologies Ltd | Matrix Insight Ltd [16] | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Labour | £671,387 | £280,223 | £23,319 | £244,839 | £40,000 | £1,259,768 |
| Overheads | £0 | £56,045 | £4,664 | £48,968 | £0 | £109,677 |
| Materials | £0 | £20,000 | £76,000 | £22,800 | £0 | £118,800 |
| Capital usage | £0 | £0 | £0 | £56,000 | £0 | £56,000 |
| Subcontract | £63,528 [17] | £0 | £0 | £29,820 | £0 | £93,348 |
| Travel and subsistence | £17,500 | £3,000 | £1,500 | £15,000 | £3,040 | £40,040 |
| Other costs | £156,164 | £0 | £0 | £7,500 | £0 | £163,664 |
| Total | £908,579 | £359,268 | £105,483 | £424,927 | £43,040 | £1,841,297 |
| Grant | £908,579 | £251,487 | £0 | £297,449 | £30,128 | £1,487,643 |
| Own contribution | £0 | £107,781 | £105,483 | £127,478 | £12,912 | £353,654 |
As can be seen in Figure 29, people (labour) costs accounted for the largest (72%) component of this project.
With a dedicated project team, EMRAD (hosted by Nottingham University Hospitals NHS Trust) accounted for
52% of the labour cost with Kheiron and Faculty accounting for 18% and 21% of labour costs, respectively.
Kheiron was the only entity to have allocated capital usage costs (approximating 3% of the total project cost).
Public sector grants funded 80% of project costs, with 20% contributed by the four private sector entities.
EMRAD costs were fully funded by the grant.
Figure 29: Cost distribution for the AI in breast screening project
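For readers who wish to check the cost shares against Table 13, the arithmetic can be sketched as below. This is an illustrative calculation using only the Total column of the table; the variable and function names are assumptions for the example. The percentages quoted in the text are taken from Figure 29 and may rest on a different breakdown, so small differences from these table-derived shares are to be expected (the Labour row alone gives roughly 68% of total cost).

```python
# Total column of Table 13 (GBP), one entry per cost line.
costs = {
    "Labour": 1_259_768,
    "Overheads": 109_677,
    "Materials": 118_800,
    "Capital usage": 56_000,
    "Subcontract": 93_348,
    "Travel and subsistence": 40_040,
    "Other costs": 163_664,
}
grant = 1_487_643
own_contribution = 353_654

total = sum(costs.values())


def share(part, whole):
    """Percentage share, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(total)                                  # 1841297
print(share(grant, total))                    # 80.8
print(share(own_contribution, total))         # 19.2
print(share(costs["Capital usage"], total))   # 3.0
print(share(costs["Labour"], total))          # 68.4
```

The grant and own-contribution rows sum to the project total, which is a useful consistency check on the reconstructed table.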
16 Matrix Insight Ltd went into administration on 21 August 2019 at which point NUH took responsibility for subcontracting an alternative evaluation provider, TaoHealth Research & Implementation.
17 Includes the cost of subcontracting the evaluation partner.
Analysing the information received, we focused on what assumptions and predicted effects could be supported
by evidence, and what the material impact would be of deploying the Mia tool within the breast screening
programme. Since it would be difficult to validate downstream benefits before establishing peer-reviewed
clinical evidence gathered as part of retrospective studies and prospective trials, we focused on what immediate
and direct resource implications were revealed during the initial retrospective study phase covering three sites
(including the two EMRAD sites).
The most material and tangible benefit identified, and the one with sufficient supporting evidence, was
second reader time. Having already analysed relative cost breakdowns in both general ledgers, we were
able to narrow our focus to the most verifiable impact of AI as a second reader in screening, that is, the cost of
reader time.
We used ledger information from NUH and ULH on actual staff cost allocations for the last three years and
discussions with screening unit managers to establish the proportion of time spent on screening activity where
other non-screening activity is delivered by the team [Figure 30]. We estimated the labour cost of the usual care
model looking at clinical staff available for screen reading only. We were unable to cost the proposed care model
without the price information from Kheiron. Future economic evaluation that takes a broader system-wide
perspective would need to use up-to-date unit costs for health and social care in England [18].
There were some key differences in work practices between NUH and UHL. At NUH, only radiology
consultants read mammograms; radiographers are not used for reading or reporting. At UHL, reporting
radiographers [19] at bands 6-8 are used alongside radiologist reporting.
Figure 30: Labour cost of usual care model for breast screening service (clinical staff engaged in reading breast images only)
18 https://www.pssru.ac.uk/project-pages/unit-costs/
19 https://www.sor.org/practice/reporting/radiographer-reporting
The locum consultant radiologist expenditure at UHL is much higher, reflecting the significant workforce shortage
faced by the trust in this area.
Since there are currently no agreed prices for Mia™, nor any published market pricing, it was not possible to
compute the cost of deployment in a real-world setting or to calculate specific budget impacts. Illustrative
break-even / recoupment figures were modelled and shared with the EMRAD team but, owing to the
commercially sensitive nature of the information, are not included in this report.
Outcomes
The business case from Kheiron Medical Technologies claimed benefits including reduced waiting time for
results, increased uptake from women screened and reduced costs of assessment clinics. These were noted but,
without evidence from current retrospective studies or prospective trials, the analysis focused on the specific
immediate benefit of reducing second reader time.
Estimates of capacity released were built by factoring in hourly reading rates and arbitration rates under the
two scenarios (Reader 1 + Reader 2 + Arbitration Reader 3; Reader 1 + AI + Arbitration Reader 3).
A nominal growth rate in population was applied using ONS data and based on historic trends in activity and
productivity. The effect of introducing an AI second reader on human reading hours required was modelled
using the arbitration level identified in the retrospective study (18.1%). This gave an estimate of
human screening hours required for each care model, usual versus AI [Table 14].
Table 14: Human reading hours required by each model of care
The charts below [Figure 31] show comparative screen reading hours under the usual care and proposed new
model of care scenarios, with the chart on the right reflecting lower resource utilisation under a human reader
and AI second reader scenario. Below [Figure 32] the same data is presented for each of the two sites separately.
Figure 31: Human reading hours required by each model of care at NUH and UHL
Figure 32: Predicted human reading hour change by site
While the AI reduced second reader hours, it also increased arbitration rates (i.e. third reader hours), thereby
resulting in an approximate capacity release of 42% of human reader time. As trials progress and the technology
and processes benefit from learning and refining, it is hoped that the arbitration rates will decrease and effective
capacity release will increase. Thus, the pressure on existing resource (i.e. locum and consultant radiologists
and reporting radiographers) would in turn decrease, allowing either more screening with the same level of staff
or lower staffing levels to meet existing needs. These variables will need to be considered as part of future health
economic analysis during prospective trials and real-world implementation.
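The capacity release figure can be reproduced approximately from the reader requirements reported in Figure 27, under one simplifying assumption that is ours rather than the report's: every human read takes the same time, whatever the reader's grade or role. The function name below is hypothetical.

```python
def human_reads_per_case(second_reader_rate, third_reader_rate):
    """Average number of human reads per case.

    The first human reader reads every case; the rates give the share of
    cases needing a second or third human read.
    """
    return 1.0 + second_reader_rate + third_reader_rate


# Usual care: every case has a second human read, 3.0% go to a third reader.
usual_care = human_reads_per_case(second_reader_rate=1.00, third_reader_rate=0.03)

# AI second reader: a further human read is needed only for the 18.1% of
# cases where reader 1 and the AI disagree.
ai_pathway = human_reads_per_case(second_reader_rate=0.181, third_reader_rate=0.0)

capacity_release = 1 - ai_pathway / usual_care
print(f"{capacity_release:.1%}")  # 41.8%
```

Because the calculation depends only on the rates, lowering the discordance rate in the AI pathway directly raises the capacity released, which is the sensitivity the paragraph above describes.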
Using existing staff cost templates, and comparing these to NHS reference costs, the effective budget
implications were calculated based on the predicted changes to the requirement for human readers. A number
of different cost scenarios based on a range of different staff grades were calculated. The most common cost
scenarios, consultant only image reading and band 8 reporting radiographer image reading, are presented below
showing the potential annual cost savings using NUH and ULH activity data projected over the next five years
[Table 15].
Table 15: Annual direct cost savings based on two scenarios (2021-2025)
With regard to returns on implementation costs, that is, grants awarded for the initial test bed phase,
qualitative benefits accrued to EMRAD were identified. These included creating
information governance blueprints and establishing baseline reader time savings using an AI second reader, with
potential mitigation of the workforce shortage. Financial benefits accrued to Kheiron Medical in the form of grant
funding of £15M through the NHS AI Award Phase 4 to proceed with further retrospective study of
the algorithm at 15 sites across England. While these studies will not be with an EMRAD trust, it would not be
unreasonable to directly attribute the award of this £15M grant to the work done as part of the EMRAD-led test
bed. Hence, the initial grant awarded to Kheiron (£297,449) approximates 2% of the next phase of funding
recently awarded. Adding in Kheiron’s own contribution of £127,478, the financial return on effort remains
substantive.
What would the impact of the screening imaging innovation programme be if implemented at scale across EMRAD?
The only information that can be extrapolated in the absence of cost data for the proposed new model of care
(Mia™) is the potential effect on resourcing of adopting an AI second reader at scale across all seven EMRAD
trusts. This was further extrapolated to England using the same methodology. Activity data for the last three
years was obtained directly from EMRAD trusts and cross-referenced with KC62 returns to NHS Digital [20]. A
3% growth rate was used for the screening population, as agreed with EMRAD, based on likely changes to the
population invited to screening by the NHSBSP over the next five years. The KC62 returns were used to establish
trends across England, and the same growth rate was applied to screening invitations. The graphs below indicate the
potential effect of using AI as a second reader [Figure 33].
Figure 33: Predicted human reading hour change for (a) EMRAD and (b) England
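The shape of this extrapolation can be illustrated with a short sketch. The 3% growth rate and the five-year horizon come from the text, and the reads-per-case values follow from the reader requirements in Figure 27; the baseline caseload and the reading rate are hypothetical placeholders, not EMRAD or KC62 figures, and treating the first projected year as the ungrown baseline is also an assumption of the sketch.

```python
def project_reading_hours(baseline_cases, growth, years, reads_per_case, cases_per_hour):
    """Human reading hours per year under compound growth in caseload."""
    hours = []
    for year in range(years):
        cases = baseline_cases * (1 + growth) ** year  # year 0 is the baseline
        hours.append(cases * reads_per_case / cases_per_hour)
    return hours


BASELINE_CASES = 100_000  # hypothetical annual caseload
CASES_PER_HOUR = 50       # hypothetical reading rate
GROWTH = 0.03             # growth rate quoted in the text
YEARS = 5

# Reads per case: 1 + 1.00 + 0.03 under usual care, 1 + 0.181 with an AI second reader.
usual = project_reading_hours(BASELINE_CASES, GROWTH, YEARS, 2.03, CASES_PER_HOUR)
with_ai = project_reading_hours(BASELINE_CASES, GROWTH, YEARS, 1.181, CASES_PER_HOUR)

for year, (u, a) in enumerate(zip(usual, with_ai), start=1):
    print(f"Year {year}: usual care {u:,.0f} h, AI second reader {a:,.0f} h")
```

Since both pathways grow at the same rate, the proportional gap between them stays constant from year to year; only the absolute number of hours released grows.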
This analysis is limited in its scope and has limited meaning without information on change to costs for provider
trusts (the usual care costs of image reading based on the skill-mix used) and evidence from prospective trials
on workflow and downstream effects. It does give some indication of the potential effect on workforce utilisation
and the potential for an AI second reader to help resolve the acute workforce shortage in radiology.
20 https://datadictionary.nhs.uk/data_sets/central_return_data_sets/nhs_breast_screening_programme_central_return_data_set__kc62_.html
What would the impact of the screening optimisation innovation programme be if implemented at scale across EMRAD?
For the same reasons that it has not been possible to evaluate the impact of the round length management ML tool
at the test sites (no live deployment as part of the screening service workflow), it is not possible to evaluate the
impact of wider scale and spread at this stage.
Summary of key findings
The key findings of the evaluation are:
1. Clinicians working in the breast screening service are positive about the potential for AI as a second
reader in breast screening but want to see more evidence from clinical trials and real-world validation.
2. Service administrative staff and managers (non-clinical) are less convinced about the potential for AI in
service optimisation but have also seen less obvious development in this area over the period of the
project.
3. Adult women of and under screening age are generally positive about the introduction of AI into the
breast screening service but they also want to see evidence of effectiveness and safety especially where
the technology is used as a second reader.
4. The same factors influenced the early stages of implementation of these novel technologies as any other
digital health technology.
5. Some factors that are additional and unique to AI were evidenced during the implementation of this
project.
6. The results of the retrospective study using the AI mammogram reader (Mia™), when used to model the
impact of the technology on resourcing showed that using AI as a second reader could reduce the time
required from human readers (radiologists and reporting radiographers) by 42%.
Discussion and implications
Evaluating progress, outcomes, and impact
When this project is reviewed in terms of progress against the plans originally set out in December 2018, there
is a significant difference between what was originally planned and what has been delivered, as summarised in
Figure 34 below.
Figure 34: Planned versus actual project plans
Explaining the progress
Complexity and novelty of this particular project were two reasons for the slower than originally planned rate of
progress that the programme leadership group pointed to when surveyed. The dimensions of complexity that
have been evident in this project are structural, socio-political and emergent (Maylor, 2013).
1. Structural
Delays in securing project funding and in recruiting some of the resource to support project implementation,
changes in project scope in response to unanticipated information governance hurdles, and requirements for
research approval led to delays in the project delivery overall. In some cases, planned workstreams were
abandoned (economic evaluation and clinical deployment of Mia™) or postponed (service optimisation tool
development, training and testing). Not involving information governance expertise as part of the core team
from the point of writing the proposal for Test bed funding is a key lesson learnt by the programme team.
Information governance was so central to the project’s progress that early involvement of IG specialists in
planning may have enabled more proactive mitigation strategies to be put in place.
2. Socio-political
Cultural differences between technology start-up commercial partners and NHS trusts were evident. Both
technology start-ups had limited experience of working in the NHS either in a research or delivery capacity. This
became evident early on during the discussions around information governance and data sharing.
Commercialisation and scaling up highlighted these differences again as technology companies worked with the
NHS to develop a business case that would gain support from Trust boards. As business case development
progressed, the different expectations of outcomes between Kheiron and EMRAD trusts became obvious.
EMRAD trusts were keen to move quickly to adoption; Kheiron were more cautious, citing the need to deliver
clinical trial data first. The policy context as well as political and social priorities were moving rapidly over the
course of the project and heavily influenced its progress. The project itself influenced policy and
regulatory developments as evidenced by the number of times it is referenced as a case study in policy
documents (Commons, 2018) (NHSX, 2019). The promotion of AI technology in healthcare in the UK is driven by
political commitment directly from the Secretary of State [21] and this high profile, alongside the multiple
stakeholders with an interest in the outcome of this project (NHSX, PHE, NHS England & Improvement, Office
for Life Sciences, Innovate UK, CQC, ICO and NHS EMRAD Trusts), means that the project has had to satisfy a
range of interests which have not always been aligned.
3. Emergent
The Covid-19 pandemic influenced the project in ways that could not have been anticipated and had to be
adapted to as the risk emerged and was realised.
21 https://www.computerweekly.com/news/252488719/Matt-Hancock-announces-50m-for-healthcare-AI-projects
Attributing outcomes
It is challenging to attribute outcomes in complex projects such as this one (Bovaird, 2014). Add to this the
novelty of a project that involves testing and validating new technology that is not yet fully regulated or
commercially available, and the traditional approach to evaluating a programme theory of change becomes even
more challenging.
The process of co-producing the programme theory of change and informing its development as the project
progressed has enabled the project team to draw out some emerging impact pathways from the data.
The process of conducting the HRA approved research using Mia™ has had some limited impact on the
confidence with which the clinical workforce in test sites perceive the AI tool. However, it is worth noting that
similar patterns of increased confidence were noted at the control sites, which may be indicative of the increased
profile of the use of AI in healthcare, and breast screening specifically, during the period of the project. The
process of conducting the discovery work with non-clinical staff for the service optimisation tool did have a small
effect on levels of engagement with staff and positive perceptions of the possible impact of an AI tool, but this
was not sustained through the delayed design and development of the tool. Overall, there were no significant
differences across test and control sites that can be attributed to the objective of the test bed.
None of the predicted impacts in relation to numbers of women screened, workforce productivity or experience
of care have been evidenced in the real world at this stage. The most likely immediate impact of workforce
productivity has been modelled as part of the budget impact analysis but remains to be tested in real-world
deployment.
Implications
Based on the findings of this evaluation and what is known from the theoretical and empirical literature in this
area to date, we have highlighted the following recommendations that are specific to the UK context but may
be generalisable to other contexts:
For policy makers and regulators
1. Continue to evolve the system of regulation in collaboration with interested groups, shifting the focus from
AI technology firms to healthcare professionals and the wider public, including protected groups, as adopters
of the technology.
2. Continue to consider the role of data governance and ethics in the application of AI. Consider the impact on
power relationships in the context of person-centred integrated care starting with focusing on the role of
informed consent, involving the public in the design and monitoring of these approaches beyond user
testing.
For the national breast screening programme
3. Set out the expectations for the evidence threshold to be generated as part of future retrospective and
prospective clinical trials of AI as a second reader of mammograms undertaken in the UK population.
4. Clarify the requirements and priorities for wider socio-political research on the impact of implementing AI
technology in the breast screening programme.
For breast screening units and the NHS trusts that run them
5. If considering adopting AI as part of the clinical or non-clinical workflow, understand the level of readiness
of your workforce, workflow and organisation to take up the new technology and ensure that the
appropriate information governance and change management support is in place from the beginning to
deliver the change.
6. Apply the principles set out in NHSX’s A Buyer’s Guide to AI in Health and Care [22] to the procurement and
implementation of service optimisation AI tools for use in the breast screening programme.
22 https://www.nhsx.nhs.uk/ai-lab/explore-all-resources/adopt-ai/a-buyers-guide-to-ai-in-health-and-care/
7. Share learning on the developments in AI across the breast screening unit team (clinical and non-clinical
staff) and open a forum as part of team professional development that discusses critically developments in
technology including AI in breast screening.
For radiologists, radiographers and other clinical staff
8. Provide support and incentives for staff to learn from (and if possible engage in) research on AI in breast
imaging as part of the CPD requirements in your workplace.
9. Consider the likely effects of adopting AI as a second reader in the clinical pathway as part of a multi-
disciplinary team in terms of professional accreditation and ongoing development, productivity, simplifying
working practice and improving the experience of care.
For women of and under screening age and society more widely
10. Ask for information about the results of clinical and real-world research on the effectiveness, safety and
ethics of AI in breast screening and other healthcare applications in ways that are clear and understandable
to the layperson.
Limitations of the study
The greatest limitation of this evaluation is that too little evidence was generated to conduct a full impact
evaluation. This was not possible within the time frame of the project, a constraint that was highlighted to
stakeholders early in the project.
The short time between the two rounds of the staff survey combined with the change in scope for both Mia™
(dropping the clinical deployment workstream) and the service optimisation tool (delaying the development of
the tool by one year) meant that there was little change to be measured between the two surveys. This also
contributed to the relatively low rates of response in round 2 of the surveys.
Lessons learnt for future evaluation design
AI in healthcare, including breast screening, whilst still relatively new, is likely to be adopted widely in the coming
years. Recent publications have set out new guidelines for evaluating AI technology in healthcare during
development and testing (Rivera, 2020; Liu X. R., 2020), governing trial protocols and trial reporting. These are
only beginning to be applied in practice.
This evaluation looked at the real-world effects of the two tools being tested. The evaluation
aimed to set out a rich narrative that provided context, exploring the complexity and novelty of the test bed
project in a way that could be accessible to future test sites. With the benefit of hindsight, it can be seen that
the evaluation design should have encompassed the retrospective clinical study and an independent evaluation
of the effectiveness of the service optimisation tool alongside the real-world evaluation of adoption and spread.
This multi-disciplinary approach to independent, verifiable evaluations of AI in healthcare practice will be
essential in the future.
Informed by these lessons, additions to the NASSS framework (Greenhalgh, 2017) are proposed (Figure 35)
that can be used when evaluating AI technology in real-world clinical and non-clinical workflows, specifically
focusing on readiness for adoption. These additions are discussed in more detail in a publication submitted to
the Journal of Medical Internet Research in December 2020.
Figure 35: Evaluating AI technology in healthcare - addition to the NASSS framework
Implementation planning starts with a clear value proposition that sets out who the beneficiaries are, prioritises
them if appropriate and predicts when the value will be realised and how this will be established. The value
proposition should be a live document which is iterated based on implementation feedback and evaluation
learning. In the case of AI technology, the value proposition will need to be shared with a wider group than other
technologies at the early stage of adoption and this is likely to include research authorities, regulators, and ethics
committees. Similarly, the landscape of AI technology providers is limited and only in the foothills of bespoke
regulation, leaving implementation teams in the position of having to make more complex and risky decisions
about commercial contracts for a technology they may not be very familiar with.
During implementation, data quality, security and processes specific to AI technology require steps for training,
validating, and testing algorithms (Harvey H., 2019) or for creating synthetic data sets (Pollack, 2019). These steps
go beyond those required for traditional information technology and present implementation teams with new
business processes to develop within and between organisations. This drives a requirement for ethical
governance that provides accountability for this new technology and includes members with the
following backgrounds: clinical accountability, corporate accountability, information governance, research
ethics, experience of living with the target condition, and applied AI in health expertise. As with any technology
project that involves adoption within an existing workflow, the value of change managers dedicated to supporting
the change processes required to enable successful impact should never be underestimated. These processes include
two-way stakeholder communication and engagement throughout.
When delivered in combination, all of the above will contribute to building adopter trust in the AI technology being
implemented. This includes involving people living with the target condition, who can support and drive
communication with service users throughout the project, from communicating algorithm explainability to designing
informed consent and monitoring emerging feedback from evaluation, trials and/or clinical audit.
One issue that may arise in the future, and that does not apply to traditional technology, is adaptation over time
through continuous learning, where the algorithm continues to use ‘live’ data to learn and adapt. There are no
examples of this in use in healthcare as yet (Rivera, 2020), but this will need to
be monitored.
Acknowledgements
This evaluation would not have been possible without the cooperation of a few people and organisations.
Staff from across the four study sites freely gave their time to support and engage in this study either as staff
members within the Breast Screening Units or as ordinary women who are either attending breast screening
now or will in the future. The Research & Innovation teams from the sites supported this work proactively
encouraging people to take part in the study. Women from the general public also took part in the survey and
in the subsequent focus groups.
The NHS EMRAD AI project team were instrumental in facilitating introductions to the service teams and other
local stakeholders and played a proactive role in removing the barriers to the evaluation when they arose,
without seeking to influence the independence of the evaluation.
References
Abdullah R, F. B. (2020). Health Care Employees' Perceptions of the Use of Artificial Intelligence
Applications: Survey Study. J Med Internet Res, 22(5):e17620. doi:10.2196/17620.
Academy of Medical Sciences. (2018). Future data-driven technologies and the implications for use of
patient data. Ipsos Mori.
Ada Lovelace Institute. (2020). Exit through the app store. London: Ada Lovelace Institute.
Ada Lovelace Institute. (2020). No green lights, no red lines: Public perspectives on COVID-19
technologies. Ada Lovelace Institute.
AHSN Network. (2018). Accelerating Artificial Intelligence in health and care: results from a state of
the nation survey. London: Department of Health and Social Care.
Alderwick, H. &. (2019). The NHS long term plan. BMJ, 364:l84.
Allen, B. D. (2020). Integrating Artificial Intelligence Into Radiologic Practice: A Look to the Future.
Data Science and Radiologic Practice, 280-283.
Asan, O. E. (2020). Artificial Intelligence and Human Trust in Healthcare: Focus on Clinicians. Journal
of Medical Internet Research, 22(6):e15154. doi:10.2196/15154.
Asthana, S. J. (2019). Why does the NHS struggle to adopt eHealth innovations? A review of macro,
meso and micro factors. BMC Health Services Research, 19, 984.
https://doi.org/10.1186/s12913-019-4790-x.
Bjerring, J. &. (2020). Artificial Intelligence and Patient-Centered Decision-Making. Philosophy &
Technology, https://doi.org/10.1007/s13347-019-00391-6.
Bonten TN, R. A.-P. (2020). Online Guide for Electronic Health Evaluation Approaches: Systematic
Scoping Review and Concept Mapping Study. Journal of Medical Internet Research,
22(8):e17774.
Bovaird, T. (2014). Attributing Outcomes to Social Policy Interventions – ‘Gold Standard’ or ‘Fool's
Gold’ in Public Policy and Management? Social Policy & Administration, 1-23.
Brennan, S. (2020, August 25). New screening committee to replace PHE role. HSJ.
Brown, A. C.-H. (2019). CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in
Computing Systems (Paper No. 41, pp. 1–12). Glasgow.
https://doi.org/10.1145/3290605.3300271.
Buch, V. A. (2018). Artificial intelligence in medicine: Current trends and future possibilities. British
Journal of General Practice, 143-144.
Cannizzaro, S. P. (2020). Trust in the smart home: Findings from a nationally representative survey in
the UK. PLoS ONE, 15(5): e0231615. https://doi.org/10.1371/journal.pone.02316.
Care Quality Commission (b). (2020). Using machine learning in diagnostic services: A report with
recommendations from CQC’s regulatory sandbox. CQC.
Care Quality Commission. (2020). Getting to the right care in the right way – digital triage in health
services: A report with recommendations from CQC’s first regulatory sandbox. CQC.
Care, D. o. (2019). Code of conduct for data-driven health and care technology. London: Department
of Health and Social Care.
Challen R, D. J. (2019). Artificial intelligence, bias and clinical safety. BMJ Quality & Safety, 231-237.
Char, D. S. (2018). Implementing Machine Learning in Health Care — Addressing Ethical Challenges.
New England Journal of Medicine, 981–983. doi:10.1056/NEJMp1714229.
Chartrand, G. C. (2017). Deep Learning: A Primer for Radiologists. RadioGraphics, 2113-2131.
Chen, J. a. (2017). Machine Learning and Prediction in Medicine — Beyond the Peak of Inflated
Expectation. New England Journal of Medicine, 2507-2509.
Cicchetti, D. (1992). Neural Networks and Diagnosis in the Clinical Laboratory: State of the Art. Clinical
Chemistry, Volume 38, Issue 1, Pages 9–10, https://doi.org/10.1093/clinchem/38.1.9.
Coeckelbergh, M. (2019). Ethics of artificial intelligence: Some ethical issues and regulatory
challenges. Technology and Regulation, 31–34.
https://doi.org/10.26116/techreg.2019.003. ISSN: 2666-139.
Cohen, C. K. (2017). Acceptability Among Community Healthcare Nurses of Intelligent Wireless
Sensor-system Technology for the Rapid Detection of Health Issues in Home-dwelling Older
Adults. The Open Nursing Journal, 54-63.
Coiera, E. (2019). The Last Mile: Where Artificial Intelligence Meets Reality. Journal of Medical
Internet Research, 21(11):e16323. doi:10.2196/16323.
Commons, H. o. (2018). The Independent Breast Screening Review. London: HC 1799.
Cresswell, K. (2018). Health Care Robotics: Qualitative Exploration of Key Challenges and Future
Directions. Journal of Medical Internet Research, 20(7):e10410.
Cruz, J. &. (2006). Applications of Machine Learning in Cancer Prediction and Prognosis. Cancer
Informatics, 59-77.
Daniels, N. G. (2019). STEER: Factors to Consider When Designing Online Focus Groups Using
Audiovisual Technology in Health Research. International Journal of Qualitative Methods,
doi:10.1177/1609406919885786.
Davenport, T. &. (2019). The potential for artificial intelligence in healthcare. Future Healthcare
Journal, 94-98.
Department of Health and Social Care. (2019). Government response to the Independent Breast
Screening Review recommendations. London: Department of Health and Social Care.
Dustler, M. (2020). Evaluating AI in breast cancer screening: a complex task. The Lancet, e106-107.
Evans, D. W. (2012). Assessing Individual Breast Cancer Risk within the U.K. National Health Service
Breast Screening Program: A New Paradigm for Cancer Prevention. Cancer Prevention
Research, 943-951.
Fenech, M. S. (2018). Ethical, social and political challenges of artificial intelligence in health. Future
Advocacy and Wellcome Trust.
Floridi, L. C. (2020). How to Design AI for Social Good: Seven Essential Factors. Science and
Engineering Ethics, 1771–1796 https://doi.org/10.1007/s11948-020-00213-5.
Flynn, R. A. (2018). Two Approaches to Focus Group Data Collection for Qualitative Health Research:
Maximizing Resources and Data Quality. International Journal of Qualitative Methods,
doi:10.1177/1609406917750781.
Gao S, H. L. (2020). Public Perception of Artificial Intelligence in Medical Care: Content Analysis of
Social Media. Journal of Medical Internet Research, 22(7):e16649.
Gong, B. N. (2019). Influence of Artificial Intelligence on Canadian Medical Students' Preference for
Radiology Specialty: A National Survey Study. Academic Radiology, 566-577.
Gøtzsche, P. &. (2013, June 3). Screening for breast cancer with mammography. Retrieved from
www.cochrane.org: https://www.cochrane.org/CD001877/BREASTCA_screening-for-breast-
cancer-with-mammography
Greenhalgh T, W. J. (2020). Video consultations for covid-19. BMJ, 368:m998.
Greenhalgh, T. W. (2017). Beyond Adoption: A New Framework for Theorizing and Evaluating
Nonadoption, Abandonment, and Challenges to the Scale-Up, Spread, and Sustainability of
Health and Care Technologies. Journal of Medical Internet Research, e367.
doi:10.2196/jmir.8775. PMID: 29092808; PMCID: PMC5688245.
Gutiérrez-Ibarluzea I, C. M. (2017). The Life Cycle of Health Technologies. Challenges and Ways
Forward. Frontiers in Pharmacology, (8):14.
https://www.frontiersin.org/article/10.3389/fphar.2017.00014.
Harvey H., &. G. (2019). A Standardised Approach for Preparing Imaging Data for Machine Learning
Tasks in Radiology. In M. S. Ranschaert E., Artificial Intelligence in Medical Imaging.
Springer, Cham. https://doi.org/10.1007/978-.
Harvey, H. a. (2018). Algorithms are the new drugs? Reflections for a culture of impact assessment
and vigilance. Proceedings of the International Conferences on e-Health. Madrid: IADIS.
Health Education England. (2019). Preparing the healthcare workforce to deliver the digital future:
The Topol Review. Health Education England.
Health Education England. (2020). One year on: Progress on the recommendations of the Topol
Review. NHS England and Improvement.
Health Quality Ontario. (2016). Women’s Experiences of Inaccurate Breast Cancer Screening Results: A
Systematic Review and Qualitative Meta-synthesis. Health Quality Ontario.
House of Commons. (2019a). Independent review of national cancer screening programmes in
England: Interim Report. London: House of Commons.
Houssami, N. K.-J. (2019). Artificial Intelligence (AI) for the early detection of breast cancer: a scoping
review to assess AI’s potential in breast screening practice. Expert Review of Medical Devices,
351-362.
ICO. (2020, July 30). Key DP themes: ICO Website. Retrieved from ICO Website: https://ico.org.uk/for-
organisations/guide-to-data-protection/key-data-protection-themes/guidance-on-ai-and-
data-protection/
ICO and Alan Turing Institute. (2020, May 20). Key DP Themes: ICO Website. Retrieved from ICO
Website: https://ico.org.uk/for-organisations/guide-to-data-protection/key-data-protection-
themes/explaining-decisions-made-with-artificial-intelligence/
Information Commissioners Office. (2020). Guidance on AI and data protection. ICO.
Ipsos Mori. (2017). Public views of Machine Learning. The Royal Society.
James, J. S. (2020 [in press]). Generalisability of a commercially available Artificial Intelligence (AI)
solution across multiple hardware vendors in a national breast cancer screening programme.
Radiological Society of North America (RSNA) Annual Meeting.
Jutzi TB, K.-H. E.-L. (2020). Artificial Intelligence in Skin Cancer Diagnostics: The Patients’ Perspective.
Frontiers in Medicine, 7:233. doi:10.3389/fmed.2020.00233.
Karches, K. (2018). Against the iDoctor: why artificial intelligence should not replace physician
judgment. Theoretical Medicine and Bioethics, 91-110.
https://doi.org/10.1007/s11017-018-9442-3.
Katell, M. Y. (2020). Toward situated interventions for algorithmic equity: lessons from the field. In
Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* '20)
(pp. 45–55). New York, NY, USA: Association for Computing Machinery.
https://doi.org/10.1145/3351095.3372874.
Kelly CJ, K. A. (2019). Key challenges for delivering clinical impact with artificial intelligence. BMC
Med., 17(1):195. Published 2019 Oct 29. doi:10.1186/s12916-019-1426-2.
Kerasidou, A. (2020). Artificial intelligence and the ongoing need for empathy, compassion and trust
in healthcare. Bulletin of the World Health Organisation, 245-250 doi:
http://dx.doi.org/10.2471/BLT.19.237198.
Kim, H.-E. K.-H.-K. (2020). Changes in cancer detection and false-positive recall in mammography using
artificial intelligence: a retrospective, multireader study. The Lancet, e138–48.
Kirsch, A. (2017). Explain to whom? Putting the User in the Center of Explainable AI. Proceedings of
the First International Workshop on Comprehensibility and Explanation in AI and ML 2017
co-located with 16th International Conference of the Italian Association for Artificial Intelligence.
Bari, Italy. hal-01845135.
Kite, J. &. (2017). Insights for conducting real-time focus groups online using a web conferencing
service. F1000 Research, 6:122. Published 2017 Feb 9. doi:10.12688/f1000research.10427.1.
Kononenko, I. (2001). Machine learning for medical diagnosis: history, state of the art and
perspective. Artificial Intelligence in Medicine, 89-109.
Lai, M.-C. B.-F. (2020). Perceptions of artificial intelligence in healthcare: findings for a qualitative
study among actors in France. Journal of Translational Medicine.
Lee, L. K. (2019). The Current State of Artificial Intelligence in Medical Imaging and Nuclear Medicine.
BJR Open, 1: 20190037.
Lennon MR, B. M. (2017). Readiness for Delivering Digital Health at Scale: Lessons From a
Longitudinal Qualitative Evaluation of a National Digital Health Innovation Program in the
United Kingdom. Journal of Medical Internet Research, 19(2):e42.
Liu, X. F. (2019). A comparison of deep learning performance against health-care professionals in
detecting diseases from medical imaging: a systematic review and meta-analysis. The Lancet
Digital Health, e271-e297.
Liu, X. R. (2020). Reporting guidelines for clinical trial reports for interventions involving artificial
intelligence: the CONSORT-AI Extension. BMJ, 370 :m3164.
Løberg, M. L. (2015). Benefits and harms of mammography screening. Breast Cancer Research, 63.
https://doi.org/10.1186/s13058-015-0525-z.
Loh, E. (2018). Medicine and the rise of the robots: a qualitative review of recent advances of artificial
intelligence in health. BMJ Leader, 59-63.
Long, H. B. (2019). How do women experience a false-positive test result from breast screening? A
systematic review and thematic synthesis of qualitative studies. British Journal of Cancer,
351–358. https://doi.org/10.1038/s41416-019-05.
Maclin, P. D. (1991). Using neural networks to diagnose cancer. J Med Syst, 11–19.
https://doi.org/10.1007/BF00993877.
Macrae, C. (2019). Governing the safety of artificial intelligence in healthcare. BMJ Quality & Safety,
495-498.
Maruthappu, M. S. (2014). The NHS Five Year Forward View: transforming care. The British journal of
general practice : the journal of the Royal College of General Practitioners, 64(629), 635.
https://doi.org/10.3399/bjgp14X682897.
Mathioudakis AG, S. M.-P.-H.-C. (2019). Systematic review on women's values and preferences
concerning breast cancer screening and diagnostic services. Psycho-oncology, 939-947.
Maylor, H. T.-W. (2013). How Hard Can It Be?: Actively Managing Complexity in Technology Projects.
Research-Technology Management, 56:4, 45-51. doi:10.5437/08956308X5602125.
McCarthy JF, M. K. (2004). Applications of machine learning and high-dimensional visualization in
cancer detection, diagnosis, and management. Annals of the New York Academy of Sciences,
1020:239-262. doi:10.1196/annals.1310.020.
McDougall, R. (2019). Computer knows best? The need for value-flexibility in medical AI. Journal of
Medical Ethics, 156-160.
McKinney, S. S. (2020). International evaluation of an AI system for breast cancer screening. Nature,
89–94 https://doi.org/10.1038/s41586-019-1799-6.
Mendelson, E. (2019). Artificial Intelligence in Breast Imaging: Potentials and Limitations. AJR
American Journal of Roentgenology, 293-299. doi:10.2214/AJR.18.20532.
Meskó, B. H. (2018). Will artificial intelligence solve the human resource crisis in healthcare? BMC
Health Services Research, 18, 545. https://doi.org/10.1186/s12913-018-3359-4.
Milano, S. T. (2020). Recommender systems and their ethical challenges. AI & Society,
https://doi.org/10.1007/s00146-020-00950-y.
Miller, C. K. (2020, May 11). People, Power and Technology: The 2020 Digital Attitudes Report.
Retrieved from Doteveryone:
https://www.doteveryone.org.uk/report/peoplepowertech2020/
Morley J, F. L. (2019). How to design a governable digital health ecosystem. Available at SSRN:
https://ssrn.com/abstract=3424376 or http://dx.doi.org/10.2139/ssrn.3424376.
Morley J, F. L. (2020). NHS AI Lab: why we need to be ethically mindful about AI for healthcare. Pre-
publication, downloaded from Researchgate 10.13140/RG.2.2.23203.20004.
Morley, J. F. (2019). A Typology of AI Ethics Tools, Methods and Research to Translate Principles into
Practices. AI for Social Good workshop at NeurIPS. Vancouver, Canada.
Morley, J. F. (2020). From What to How: An Initial Review of Publicly Available AI Ethics Tools,
Methods and Research to Translate Principles into Practices. Science and Engineering Ethics,
2141–2168. https://doi.org/10.1007/s11948-019-0016.
Nagendran, M. C. (2020). Artificial intelligence versus clinicians: systematic review of design,
reporting standards, and claims of deep learning studies. BMJ, 368:m689.
Nelson, A. H. (2019). Predicting scheduled hospital attendance with artificial intelligence. npj Digital
Medicine, 26. https://doi.org/10.1038/s41746-019-0103-3.
NHS Breast Screening Programme. (2020, August 23). AgeX Trial. Retrieved from www.agex.uk:
http://www.agex.uk/
NHS Digital. (2020, August 23). Breast Screening Programme, England 2018-19. Retrieved from NHS
Digital: https://digital.nhs.uk/data-and-information/publications/statistical/breast-screening-
programme/england---2018-19
NHSX. (2019). Artificial Intelligence: How to get it right. Putting policy into practice for safe data-
driven innovation in health and care. London: NHSX.
NICE. (2019). Evidence standards framework for digital health technologies. National Institute for
Health and Care Excellence.
NICE. (2019). Evidence Standards Framework for Digital Health: Cost Consequences and Budget
Impact Analyses. York: York Health Economics Consortium.
Oh S, K. J. (2019). Physician Confidence in Artificial Intelligence: An Online Mobile Survey. Journal of
Medical Internet Research, 21(3):e12422.
Ongena, Y. H. (2020). Patients’ views on the implementation of artificial intelligence in radiology:
development and validation of a standardized questionnaire. European Radiology, 1033–1040.
https://doi.org/10.1007/s00330-019-06486-0.
Open Data Institute. (2020). Covid-19: Identifying and managing ethical issues around data. London:
ODI.
Oren, O. G. (2020). Artificial intelligence in medical imaging: switching from radiographic pathological
data to clinically meaningful endpoints. The Lancet Digital Health, e486-e488.
Ouchchy, L. C. (2020). AI in the headlines: the portrayal of the ethical issues of artificial intelligence in
the media. AI & Society, https://doi.org/10.1007/s00146-020-00965-5.
Panch T, M. H. (2019). The “inconvenient truth” about AI in healthcare. npj Digital Medicine, 77
https://doi.org/10.1038/s41746-019-0155-4.
Panch, T. S. (2018). Artificial intelligence, machine learning and health systems. Journal of Global
Health, 8(2), 020303. https://doi.org/10.7189/jogh.08.020303.
Park, C. Y. (2020). Medical Student Perspectives on the Impact of Artificial Intelligence on the Practice
of Medicine. Current Problems in Diagnostic Radiology, in press.
Park, S. &. (2018). Methodologic Guide for Evaluating Clinical Performance and Effect of Artificial
Intelligence Technology for Medical Diagnosis and Prediction. Radiology, 800-809.
Pesapane F, V. C. (2018). Artificial intelligence as a medical device in radiology: ethical and regulatory
issues in Europe and the United States. Insights Imaging, 745-753. doi:10.1007/s13244-018-
0645-y.
Pinto dos Santos, D. G. (2019). Medical students' attitude towards artificial intelligence: a multicentre
survey. European Radiology, 1640–1646. https://doi.org/10.1007/s00330-018-5601-1.
Pollack, A. S. (2019). Creating synthetic patient data to support the design and evaluation of novel
health information technology. Journal of Biomedical Informatics,
https://doi.org/10.1016/j.jbi.2019.103201.
Powell, J. (2019). Trust Me, I’m a Chatbot: How Artificial Intelligence in Health Care Fails the Turing
Test. Journal of Medical Internet Research, e16222. doi:10.2196/16222.
Public Health England. (2017). Screening Quality Assurance visit report NHS Breast Screening
Programme Lincolnshire. Public Health England.
Public Health England. (2018). Screening Quality Assurance visit report NHS Breast Screening
Programme Kettering. Public Health England.
Public Health England. (2018). Screening Quality Assurance visit report NHS Breast Screening
Programme North Nottinghamshire. Public Health England.
Rainey, L. v. (2018). Women’s perceptions of the adoption of personalised risk-based breast cancer
screening and primary prevention: a systematic review. Acta Oncologica, 1275-1283.
Rajkomar, A. O. (2018). Scalable and accurate deep learning with electronic health records. npj Digital
Medicine, https://doi.org/10.1038/s41746-018-0029-1.
Recht, M. D. (2020). Integrating artificial intelligence into the clinical practice of radiology: challenges
and recommendations. European Radiology, 3576–3584.
https://doi.org/10.1007/s00330-020-06672-5.
Reform. (2018). Thinking on its own: AI in the NHS. London: The Reform Research Trust.
Reid F. Thompson, G. V. (n.d.).
Richards, P. S. (2019b). Report of the Independent Review of Adult Screening Programmes in England.
London.
Rivera, S. L.-W. (2020). Guidelines for clinical trial protocols for interventions involving artificial
intelligence: the SPIRIT-AI Extension. BMJ, 370:m3210.
Robbins, S. (2019). AI and the path to envelopment: knowledge as a first step towards the responsible
regulation and use of AI-powered machines. AI & Society, 391-400.
Rong, G. M. (2020). Artificial Intelligence in Healthcare: Review and Prediction Case Studies.
Engineering, 291-301.
Royal College of Radiologists. (2019). Clinical radiology UK workforce census report 2018. Royal
College of Radiologists.
Royal College of Radiologists. (2020). Clinical Radiology: England workforce 2019 summary report.
Royal College of Radiologists.
Salim M, W. E. (2020). External Evaluation of 3 Commercial Artificial Intelligence Algorithms for
Independent Assessment of Screening Mammograms. JAMA Oncology,
doi:10.1001/jamaoncol.2020.3321.
Schwamm LH, E. J. (2020). Virtual care: new models of caring for our patients and workforce. Lancet
Digit Health, 2(6), e282-e285.
Select Committee on Artificial Intelligence. (2018). AI in the UK: ready, willing and able? London:
Authority of the House of Lords.
Shah, R. &. (2018). IOT and AI in healthcare: A systematic literature review. Issues in Information
Systems, 33-41.
Sharma, N. J. (2020 [in press]). Impact of an Artificial Intelligence (AI) solution as a reader in a national
breast screening programme. Radiological Society of North America (RSNA) Annual Meeting.
Shaw J, R. F. (2019). Artificial Intelligence and the Implementation Challenge. Journal of Medical
Internet Research, 21(7):e13659. doi:10.2196/13659.
Sheikh A, C. T. (2011). Implementation and adoption of nationwide electronic health records in
secondary care in England: final qualitative results from prospective national evaluation in
“early adopter” hospitals. BMJ, 343.
Shen, J. Z. (2019). Artificial Intelligence Versus Clinicians in Disease Diagnosis: Systematic Review.
JMIR Medical Informatics, 7(3):e10010.
Sit, C. S. (2020). Attitudes and perceptions of UK medical students towards artificial intelligence and
radiology: a multicentre survey. Insights Imaging, 11, 14.
https://doi.org/10.1186/s13244-019-0830-7.
Smith, H. (2020). Clinical AI: opacity, accountability, responsibility and liability. AI & Society,
https://doi.org/10.1007/s00146-020-01019-6.
The Guardian. (2020, August 24). Councils scrapping algorithms. Retrieved from The Guardian:
https://www.theguardian.com/society/2020/aug/24/councils-scrapping-algorithms-benefit-
welfare-decisions-concerns-bias
The Nightingale Centre. (2019). Annual Scientific Report. Manchester: Prevent Breast Cancer.
The RSA. (2018). Engaging citizens in the ethical use of AI for automated decision-making. The RSA.
Thompson, R. V. (2018). Artificial intelligence in radiation oncology: A specialty-wide disruptive
transformation? Radiotherapy and Oncology, 421-426.
Tran, V. R. (2019). Patients’ views of wearable devices and AI in healthcare: findings from the
ComPaRe e-cohort. npj Digital Medicine, https://doi.org/10.1038/s41746-019-0132-y.
Tschider, C. (2018). The consent myth: Improving choice for patients of the future. Washington
University Law Review, 1505-1536.
van Hoek. J. Huber, A. L.-K. (2019). A survey on the future of radiology among radiologists, medical
students and surgeons: Students and surgeons tend to be more skeptical about artificial
intelligence and radiologists may fear that other disciplines take over. European Journal of
Radiology.
Vollmer, S. M. (2020). Machine learning and artificial intelligence research for patient benefit: 20
critical questions on transparency, replicability, ethics, and effectiveness. BMJ, 368:l6927.
Waymel, S. B. (2019). Impact of the rise of artificial intelligence in radiology: What do radiologists
think? Diagnostic and Interventional Imaging, 327-336.
Webster, P. (2020). Virtual health care in the era of COVID-19. The Lancet, 1180-1181.
Willemink, M. K. (2020). Preparing medical imaging data for machine learning. Radiology, 4-15.
Wolff J, P. J. (2020). The Economic Impact of Artificial Intelligence in Health Care: Systematic Review.
Journal of Medical Internet Research, 22(2):e16866. Published 2020 Feb 20.
doi:10.2196/16866.
Xiang Y, Z. L. (2020). Implementation of artificial intelligence in medicine: Status analysis and
development suggestions. Artificial Intelligence in Medicine, 102:101780.
doi:10.1016/j.artmed.2019.101780.
Zhang, B. &. (2019). Artificial Intelligence: American Attitudes and Trends. Oxford, UK: Center for
the Governance of AI, Future of Humanity Institute, University of Oxford.
Appendices
APPENDIX 1 – DATA COLLECTION TOOLS
APPENDIX 2 – THEMATIC ANALYSIS FRAMEWORK
APPENDIX 3 – BUDGET IMPACT ANALYSIS PROCESS
APPENDIX 4 – THEORIES OF CHANGE (DECEMBER 2018)
APPENDIX 5 – EVALUATION QUESTIONS AND ADAPTATIONS
APPENDIX 6 – THE STORY OF THE PROJECT
APPENDIX 7 – PARTICIPANT INFORMATION
Appendix 1
Data collection tools
Survey Maps – staff surveys
Figure 36: Clinical survey map - round 1
Figure 37: Non-clinical survey map - round 1
Figure 38: Clinical survey map - round 2
Figure 39: Non-clinical survey map - round 2
Survey map – Public survey
Figure 40: Public survey map (one round only)
Programme Leadership survey map
Figure 41: Programme leadership survey map - based on parts of the NASSS framework
Focus group protocol
Introduction
The design of the evaluation of this project, which was summarised in the interim evaluation report (footnote 23),
included a set of focus groups to explore in more detail some of the issues raised in the public survey administered
between December 2019 and February 2020. These were due to take place during April and May 2020. The
Covid-19 pandemic forced a change in the approach and timing of the focus groups and this design document
summarises the refreshed approach.
Recruitment
We will look to recruit a minimum of 3 groups. One group will be of women under screening age (<50 years),
another will be of women of screening age (50-70 years) and the final group will be of women of any age from Asian
backgrounds. This final group was under-represented in the survey and we would like to try to capture their
views on the use of AI in breast screening more comprehensively.
We will work with the EMRAD project team to identify members of their patient involvement group who may
be interested as well as with the communications and diversity & inclusion groups at the 7 EMRAD trusts. We
will contact local (East Midlands) community and voluntary sector groups with primarily female
membership and ask them to share the study and the invitation to participate with their membership.
We will also use social media to invite women from target groups to participate.
Sample size and other considerations for online focus groups
Whereas a single in-person focus group could consist of as many as 13 individuals, online focus groups are
more cumbersome and are prone to technology issues, lagging, internet dropouts and interruptions.
For that reason, a brief review of the literature suggests that it is optimal to cap online focus groups at
around 6 participants (Kite, 2017); (Flynn, 2018); (Daniels, 2019).
Consent and pre-focus group survey
All participants will fill in a consent form before the invitation to the focus group is sent to them. This will be
based on the consent form which was submitted as part of the HRA approval and included at the beginning of the
public survey.
They will also be asked to complete a short survey before the focus group to share some information about their
individual characteristics. This will be used as part of the data analysis after the focus group and to ensure that
the groups are balanced.
Facilitation
The role of the researcher or facilitator is key to making focus groups effective. Once the group is underway,
the researcher or facilitator will allow, as much as possible, discussion to emerge from the group itself. The
facilitator will occasionally help the group to focus and structure their discussion, bring discussion back or
move it on, widen the discussion to include everyone, and ensure a balance between participants. The
facilitator will guide and pace the discussion to ensure all the issues are covered, and will probe individuals
and the group as a whole to encourage in-depth exploration. They will be alert to non-verbal language and to
the dynamic of the discussion, and they will challenge or stimulate the group if what is said seems too readily
to reflect social norms or apparent consensus.
23 CAPACITY CARE AND CONFIDENCE. Developing and Testing Artificial Intelligence in Breast Screening Interim Evaluation Report. TaoHealth Research & Implementation. Nov 2019.
Group format
Each group will be scheduled for 60 minutes. An outline of the format of each group is set out below.
Set-up: The facilitator will open the virtual room 15 minutes before the start to help any participants having technical difficulties.
Introduction: The facilitator will describe the study and the contribution participants are making through the focus group. Permission will be sought to record the group for the purpose of analysis only. Ground rules for the online discussion will be set, including use of text chat and hand raising. Each participant will introduce themselves by first name only.
Discussion: The facilitator will use the topic guide below to explore in more depth some of the points raised through the public survey.
Close: The facilitator will bring the group to a close.
Introduction to the technology under investigation
This has been based on the advice of the EMRAD AI Project PPPG which met on 23 June 2020 and will be
supported by slides.
• What do you think AI is?
• The definition of AI that is being used
• You could think of Artificial Intelligence (AI) as computers and robots understanding patterns,
pictures, speech and language. They can learn from their understanding and make decisions.
• Do you know how the breast screening process works after a woman has been for her mammogram?
• Do you know there is a shortage of doctors especially in radiology, the specialism that reads mammograms?
Topic guide
1. How do you feel about the use of technology to provide you with health care?
a. Examples will be given: online booking for GP appointments; video-conferencing consultations;
remote monitoring apps for diabetes and heart disease.
2. Have you ever used any technology when seeking or receiving health care?
a. If any of these used artificial intelligence, how would you feel?
3. Trust is a word that often comes up for people in this context. What would make you trust a technology that
used artificial intelligence more or less?
4. Another theme that came up in our survey was a concern that AI technology lacked empathy and emotion.
How would that affect your views about implementing it?
5. Would you want to know that an AI product was being used to read mammograms when you went for a
breast screening appointment?
a. If you would, what would you want to know?
6. Imagine a scenario where you had a choice between a breast screening appointment that has one human
reader and one AI reader and could get your results back in one week and a breast screening appointment
that has two human readers and could get your results back in three weeks (due to a shortage of imaging
readers) – which would you choose and why?
These topics capture the content to be covered. Specific questions will be framed based on the group and the
flow of the discussion.
Appendix 2
First order thematic analysis
Domain / Node Detail / Sub-node
Reading process
Workforce shortage
Admin process
AI is… What is AI?
Not sure
Reliability might be better
Greater efficiency
Possibly safer
Job loss
Improve outcomes
Releases professionals for value-added human tasks
Over-reliance
Patient empowerment
Cheaper
More data
Trust is an issue
Must ensure privacy and security
Needs to be evidence based
Choice and consent are important
Want to see humans augmented by machines
Want machines overseen by humans
Must ensure equitable access
Owners are private for profit
Who owns or understands it?
Owners are not for profit
It will happen
It should happen
It shouldn't happen
AI misses the 'human touch'
AI misses accountability
No
Yes
It depends
Cost How much will it cost to implement?
What it misses
Need to know AI is being used
Breast screening
Effects
Governance
How it is used
Owner
The future
Appendix 3
Budget impact analysis methodology
The aim was to focus on affordability by conducting a Budget Impact Analysis identifying the costs and possible
savings related to the application of the AI solutions and investigating the affordability of these as a function of
available resources.
At a very basic level, our focus was to identify what would be materially impacted (time, resources), what savings
could be realised and what variables and uncertainties needed to be quantified [Figure 42].
Figure 42: The budget impact analysis process
The key was therefore to identify the resources that might change [Figure 43], allowing the modelling of
plausible scenarios and sensitivity analysis. Data was provided to us by the
EMRAD team, specifically around the two sites selected for the breast screening test bed. Additional data
sources are listed below. Finance managers provided extracts of their general ledgers itemising direct and
indirect costs allocated. We have relied on all historical information provided to us as being accurate, and made
no attempt to verify this information. EMRAD also shared the full business case submitted, along with their own
analysis of mammogram reading costs and the projected implementation budget of the next phase with Kheiron
Medical Technologies’ Mia™ testing (as part of the AI Award Phase 3 application). We were unable to conduct
analysis of the service optimisation tool as no data was available for testing in the timeframe of the evaluation.
Budget impact analysis (BIA) has been defined as a tool to predict the potential financial impact of the
adoption and diffusion of a new technology into a healthcare system with finite resources.
BIA considers costs and benefits which are monetised – non-financial benefits are excluded.
No discounting is undertaken for costs and benefits in future years.
Figure 43: Data sources for BIA
In analysing the information received, we focused on what assumptions could be supported by evidence, and what
the material impact would be of deploying the Mia™ tool within the breast screening programme. Where
opinions were expressed, or in cases where ranges were given (e.g. number of mammograms reviewed per hour),
we discussed these with the clinical managers concerned and proceeded with a conservative mid-range figure.
Since it would be difficult to validate downstream benefits before establishing peer-reviewed clinical evidence,
we focused on what immediate and direct resource implications were revealed during the initial test phase.
The most material and tangible benefit identified, for which there was sufficient supporting evidence, was
second reader time. Having already analysed relative cost breakdowns in both general ledgers, we were
able to narrow our focus to the most verifiable impact of AI in second report screening: cost of reader time.
Figure 44: Approach to analysis
Whilst we noted that the benefits claimed by Kheiron Medical Technologies include reduced waiting time for results,
increased uptake from women screened and increased productivity of non-reader staff, we could only find
evidence to support one specific benefit: reducing second reader time.
Since there were no agreed prices, nor any published market pricing, we were unable to compute specific budget
impacts at a given investment level. Furthermore, with the commercial information not being part of the public
domain, we did not feel it appropriate to include any commercially sensitive information in this report.
We computed some illustrative break-even / recoupment figures based on a set of pricing scenarios. This was
done in layers, showing what costs could be recouped (e.g. the implementation cost layer alone, then implementation
costs plus payments), with the aim of establishing break-even levels for each outflow. This analysis was shared with
EMRAD but has not been published here for the reasons given above.
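The layered recoupment logic described above can be sketched as follows. This is a minimal illustration only: every figure is hypothetical (the report deliberately publishes no pricing), and the function name `break_even_cases`, the layer names and the per-case saving are assumptions introduced for this example rather than the evaluators' model. In line with the BIA conventions stated earlier, no discounting is applied.

```python
from math import ceil


def break_even_cases(layers, saving_per_case):
    """Cases needed to recoup each cumulative cost layer.

    layers: ordered (name, cost) pairs; each layer is stacked on the ones before it.
    saving_per_case: second reader cost avoided per screening case (hypothetical).
    Returns a list of (layer name, cumulative cost, break-even number of cases).
    """
    if saving_per_case <= 0:
        raise ValueError("saving_per_case must be positive")
    results = []
    cumulative = 0.0
    for name, cost in layers:
        cumulative += cost
        results.append((name, cumulative, ceil(cumulative / saving_per_case)))
    return results


# Hypothetical inputs, NOT taken from the evaluation or from any supplier.
layers = [("implementation", 50_000.0), ("payments", 30_000.0)]
result = break_even_cases(layers, saving_per_case=2.0)

for name, cumulative, cases in result:
    print(f"{name}: cumulative cost {cumulative:,.0f}, break-even at {cases:,} cases")
```

Stacking the layers cumulatively mirrors the idea of showing a break-even level for each successive outflow; real pricing, once agreed, would replace the placeholder values.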
It should be noted that this analysis centred on quantifiable direct savings from reduced cost based on
different skill mixes (under different radiologist costs and radiographer grades, and whether the mix was consultant
+ consultant, consultant + Band 7, consultant + Locum or Locum + Locum). No downstream
benefits, such as reduced assessment clinic costs, have been quantified or modelled. At sites where reader costs
are highest (e.g. Locum + Locum), the savings from AI are higher than at sites where lower-cost readers (e.g.
Consultant + Band 7/8) are deployed.
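The dependence of savings on skill mix can be illustrated with a short sketch. All hourly costs and the reading rate below are hypothetical placeholders, not figures from the general ledgers, and the rule used here (the AI replaces one of the two reads, valued at the mean per-case cost of the pair) is a simplifying assumption made for this example only.

```python
# Hypothetical hourly reader costs and reading rate; placeholders only.
HOURLY_COST = {"Consultant": 100.0, "Band 7": 40.0, "Locum": 150.0}
CASES_PER_HOUR = 50

SKILL_MIXES = [
    ("Consultant", "Consultant"),
    ("Consultant", "Band 7"),
    ("Consultant", "Locum"),
    ("Locum", "Locum"),
]


def saving_per_case(first, second, hourly_cost=HOURLY_COST, cases_per_hour=CASES_PER_HOUR):
    """Cost of one human read avoided per case, valued at the pair's mean hourly cost."""
    mean_hourly = (hourly_cost[first] + hourly_cost[second]) / 2
    return mean_hourly / cases_per_hour


savings = {f"{a} + {b}": saving_per_case(a, b) for a, b in SKILL_MIXES}

# List the mixes from highest to lowest saving per case.
for mix, value in sorted(savings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{mix}: {value:.2f} saved per case")
```

Under these placeholder values the most expensive pairing yields the largest saving per case and the cheapest pairing the smallest, which is the pattern the paragraph above describes.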
Appendix 4
Theories of change (December 2018)
Mia™
Faculty’s Platform tool
Appendix 5
Evaluation questions and adaptations

Question 1
Original question: What is the potential impact of the screening imaging innovation programme on the performance of screening services?
a) Is the Mia™ software suitable for use in a large-scale screening programme like NHSBSP by comparing rates of specificity/sensitivity/recall rate on a large cohort of historic screening cases?
b) Does the software achieve similar results for different manufacturers' equipment used across the EMRAD consortium?
c) How do these results inform the process of designing and developing a prospective pilot with the Mia™ software?
Revised question: No change
Reason for change: N/A

Question 2
Original question: What is the measurable impact of the screening optimisation innovation programme?
a) On breast screening programme manager and administrative staff time?
b) On breast screening workforce utilisation?
c) On optimised breast screening pathway and clinical risk management?
d) On the rate of breast screening ‘did not attends’ (DNAs)?
Revised question: No change. For future: What will the synthetic data set (SDS) allow NHS breast screening units to do that could not otherwise be done?
Reason for change: Added a question about the synthetic data set (SDS) as this was introduced at a late stage in the project.

Question 3
Original question: What have been the moderating factors in implementing the programme of innovations?
That enabled the project to progress;
That constrained progress.
Revised question:
What were the technological and data benefits and challenges for the project?
What were the perceived benefits of the use of AI tools in the NHSBSP workflow?
What were the concerns raised about the use of AI tools in the NHSBSP workflow?
What were the organisational issues that enabled or constrained the progress of the project?
What wider contextual issues affected the progress of the project?
Reason for change: Focus on the most salient enablers and constraints of the project from the perspective of different stakeholders:
• Programme leadership including commercial partners;
• Breast screening unit workforce;
• Women at and under screening age.

Question 4
Original question: Was the programme worth the investment, that is, did it deliver value for money and if not in the timeframe of the evaluation, when is it likely to deliver a return on investment?
Revised question: How would a future prospective evaluation of Mia™ and the round length optimisation tool determine the return on investment based on early exploratory findings from this project?
Reason for change: A budget impact analysis (BIA) was conducted with gaps in information (e.g. product price is not yet set for Mia™ and no data was available on the deployment of the round length ML tool). Assumptions have been made for the purpose of the evaluation (Mia™ only) but the model is only indicative and would need to be tested when price information is available and the products are being tested in the real world.

Question 5
Original question: What would the impact of the screening imaging innovation programme be if implemented at scale across EMRAD?
Revised question: How would a future prospective evaluation of Mia™ evaluate the impact of large-scale implementation across EMRAD based on early exploratory findings from this project?
Reason for change: The indicative BIA provides input to the health economic model for future implementation at scale.

Question 6
Original question: What would the impact of the screening optimisation innovation programme be if implemented at scale across EMRAD?
Revised question: How would a future prospective evaluation of the round length optimisation tool determine the impact of large-scale implementation across EMRAD based on early exploratory findings from this project?
Reason for change: Early exploratory findings not yet available.
Appendix 6
The AI in breast cancer screening project – October 2018 to December 2020
How the project came about
NHS EMRAD had been an Acute Care Collaborative (ACC) as part of the NHS New Care Models programme in
2016-18. In the summer of 2018 the Medical Director of EMRAD, Dr Tim Taylor, met Kheiron Medical
Technologies, Faculty (formerly ASI Data Science) and two other potential partners to discuss a potential bid
for funding under the NHS England Test Bed Wave 2 programme. The joint bid, which included the seven EMRAD
NHS Trusts, Kheiron Medical Technologies, Faculty, GE Healthcare (the providers of the imaging infrastructure
for EMRAD) and others, was successful in August 2018.
The first three months
October to December 2018 were spent focusing on the set-up of the project. This meant establishing the project
governance structures and processes, which was straightforward for an EMRAD project team that had done this before.
Amongst the project documentation that needed to be drafted was the collaboration agreement. This required
some lengthy negotiation with involvement from legal advisors and was only signed in January 2019 after some
prolonged discussion regarding intellectual property. In this period, Kheiron were conducting initial preparatory
work for data extraction and Faculty were engaged in a discovery process with breast screening unit managers.
EMRAD and Kheiron also used this time to engage with women at a number of breast cancer events to get a
preliminary understanding of their thoughts about the use of AI in breast screening and feedback on the name
of the product. It was through this process that the name Mia™ (Mammography Intelligent Assessment) was
agreed. Delays in disbursing funds from Innovate UK meant that all partners had to proceed at risk during this
period. Also during this period, the Test Beds national team had established national information governance
and evaluation advice partners for all Wave 2 Test Beds sites. Based on feedback from Test Bed sites, this support
was terminated in early 2019.
Some early challenges to overcome
The first 9 months of the initially 18-month long project were characterised by information governance
negotiations, planning and problem-solving. These challenges underlined the importance of clear leadership,
open conversations and working through disagreements constructively. Early lessons that were learnt included
the importance of involving information governance and research expertise from the bid stage of the project
process. It was also noted that the evaluation and lessons learnt from the Wave 1 Test Bed programme were
not made available which left the project team feeling that there was a risk that they were repeating mistakes
that they did not need to make.
The challenges get bigger
From mid-summer 2019 there was a growing recognition of the perhaps unrealistic ambition of the timescale
and scope for a project as innovative as this. Information governance remained an issue and with minimal
guidance available from, for example, the Information Commissioner's Office (ICO), the project had to forge a
novel and sometimes fraught path. This was compounded by some inconsistency in how information governance
was approached for the two commercial partners. During this time, the regulatory and policy context was
changing. New guidance like the Evidence Standards for Digital Health Technology and the creation of NHSX as
a non-statutory advisory body were amongst almost monthly changes at this time. Project progress was steady
if slow. An extension of 6 months was granted to all Wave 2 projects during this time, in recognition of the delays
caused by funding disbursement. All of this was happening at a time when the project was on regular display to the
outside world, presenting at conferences and sitting on panels sharing experiences. A decision was made by the
end of 2019 to stop attending these events and focus on delivery.
And the biggest (and most unanticipated) challenge of them all
In March 2020, the Covid-19 pandemic led the UK government to declare a nationwide lockdown. The NHS was
already dealing with the consequences of the infection. As part of the response to the pandemic, the NHS breast
screening programme, alongside all other national screening programmes, was placed ‘on pause’. Some EMRAD
project staff were redeployed to frontline clinical work at participating trust sites and Kheiron Medical
Technologies placed a proportion of their staff on furlough, reducing organisational capacity. Public Health
England (PHE) played a key role in responding to the pandemic and staff responsible for the breast screening
programme were also refocused on to pandemic-related work. In August 2020, the Secretary of State announced
the abolition of PHE, to be replaced in April 2021 by the National Institute for Health Protection. This
compounded the existing challenge of engaging with Hitachi as provider of the National Breast Screening System
(NBSS) and PHE.
What next?
Kheiron will expand on the retrospective study using Mia™ in 15 more sites in England under the NHS AI Award
Phase 4. Faculty and EMRAD are looking for funding to deploy the round length management ML tool at scale
across EMRAD. EMRAD continues to deliver radiology services at scale and to innovate, operating as a fully
engaged imaging network working closely with its current global partner, GE Healthcare.
Figure 45: The AI breast screening project timeline – key events and responses
Appendix 7
Participant information: Staff surveys
Figure 46: Survey response numbers: Clinical and Non-Clinical Surveys
Figure 47: Survey response rates: Clinical and Non-Clinical Surveys
Figure 48: Age and gender profile of clinical staff
Figure 49: Age and gender profile of non-clinical staff
Participant information: Public survey
Figure 50: Public survey response numbers and rates
Figure 51: Public survey age profile
Figure 52: Public survey ethnicity profile
Figure 53: Public survey occupation profile
Figure 54: Attendance at breast screening by age band
Focus group membership
Figure 55: Age profile
Figure 56: Ethnicity
Figure 57: Occupation
Figure 58: Has had a breast cancer diagnosis or knows someone who has
Figure 59: Attended a breast screening appointment
Figure 60: What happens when a mammogram is read
Programme leadership respondent characteristics
The survey was sent to 74 members of the following EMRAD governance groups:
• EMRAD Management Board
• EMRAD Operational Board
• EMRAD Information Governance Board
• Artificial Intelligence Programme Board
Only the members of the AI Project Board and some members of the EMRAD Information Governance Board have
been directly involved in the implementation of the programme.
18 programme leaders responded to the survey (response rate of 24%).
We stratified the respondents into those directly involved in the day-to-day implementation of the project and
those who were not directly involved but played a support or advisory role or a role in governance and
decision-making.
Figure 61: Programme leadership segmentation
Dr Niamh Lennox-Chhugani
Research Director, TaoHealth Research & Implementation
Sanjeev Chhugani
Financial analyst, TaoHealth Research & Implementation
For more information T: +44 (0)7983 458 733
www.taohealth.co.uk
http://twitter.com/taohealth2