Evaluation of EMRAD AI in Mammography Project 2018-2020

DECEMBER 17, 2020

TaoHealth Research & Implementation

Lead author: Dr Niamh Lennox-Chhugani

Evaluation of EMRAD AI in Breast Screening Project: Final Report Full Technical Report


Disclaimer: Although TaoHealth Ltd has taken reasonable professional care in the preparation of this document, we cannot

guarantee absolute accuracy or completeness of information/data contained in this document, nor do we accept responsibility for

recommendations that may have been omitted due to particular or exceptional conditions and circumstances.

Confidentiality: This document contains information, which is proprietary and may not be disclosed to third parties without prior

written approval from TaoHealth Ltd or NHS EMRAD. Except where permitted under the provisions of confidentiality above, this

document may not be reproduced, retained or stored beyond the period of validity, or transmitted in whole, or in part, to any third

party without prior, written permission from TaoHealth Ltd.



Contents

Glossary
Introduction and background
  Overview of the evaluation
  Literature review: AI in healthcare
    AI in medical imaging
    The NHS breast screening programme (NHSBSP)
    AI in breast cancer screening
    Public perceptions of the use of AI in general and in healthcare
  Changes to the tools during the project
    Kheiron Medical Technologies - Mia™
    Faculty
  Structure of the report
Methods
  Ethical approval
  Data collection
  Data analysis
    Qualitative data analysis
    Quantitative data analysis
    The evolving theory of change
Findings
  Overview
  How well do different groups understand AI in general?
  What were the perceived benefits of the use of AI tools in the breast screening service?
  What were the concerns of the workforce and women about the use of AI tools in the breast screening service?
  What were the technical and data benefits and challenges for the project?
  What were the organisational issues that enabled or constrained the progress of the project?
  What wider contextual issues affected the progress of the project?
  What is the potential impact of the screening imaging innovation programme on the performance of screening services?
  What is the potential impact of the screening optimisation innovation programme?
  Was the programme worth the investment, that is, did it deliver value for money and if not in the timeframe of the evaluation, when is it likely to deliver a return on investment?
  What would the impact of the screening imaging innovation programme be if implemented at scale across EMRAD?
  What would the impact of the screening optimisation innovation programme be if implemented at scale across EMRAD?
  Summary of key findings
Discussion and implications
  Evaluating progress, outcomes, and impact
  Implications
    Limitations of the study
    Lessons learnt for future evaluation design
Acknowledgements
References
Appendices
  Appendix 1
  Appendix 2
  Appendix 3
  Appendix 4
  Appendix 5
  Appendix 6
  Appendix 7



Glossary

ACC Acute care collaborative

AI Artificial intelligence

AgeX Age Extension trial for breast screening

API Application programming interface

AUC Area under curve

BIA Budget impact analysis

BSU Breast screening unit

CAD Computer aided decision-making

CAG Confidentiality Advisory Group

CE mark Conformité Européenne mark

CPD Continuing professional development

CQC Care Quality Commission

DL Deep learning

DPA Data Protection Act (2018)

DPIA Data protection impact assessment

DNAs Did not attend, the term used for a patient who missed an appointment

EMAHSN East Midlands Academic Health Sciences Network

EMRAD East Midlands Radiology Consortium

GDPR General Data Protection Regulation

HRA Health Research Authority

ICO Information Commissioner’s Office

IG Information Governance

IoT Internet of things

IT Information Technology

IRAS Integrated Research Application System

NASSS Non-adoption, abandonment, scale-up, spread, and sustainability

NBI Nottingham Breast Institute

NBSS National breast screening system

NHSEI National Health Service England and Improvement

NHSBSP National Health Service Breast Screening Programme

NHSX NHS body responsible for supporting digital technology adoption in the NHS

NICE National Institute for Health and Care Excellence



NSC National Screening Committee

NUH Nottingham University Hospitals NHS Trust

MHRA Medicines and Healthcare Products Regulatory Agency

ML Machine learning

PACS Picture archiving and communication system

PHE Public Health England

PHIS Public health identity systems

PI Principal investigator

ROC Receiver operating characteristic curve

RSA Royal Society for the encouragement of Arts, Manufactures and Commerce

RSNA Radiological Society of North America

SDS Synthetic data set

ULBSS United Lincolnshire Breast Screening Service

ULH United Lincolnshire Hospitals NHS Trust



Introduction and background

Overview of the evaluation

This evaluation was conducted from October 2018 to September 2020 by the research team at TaoHealth

Research & Implementation. It was commissioned by NHS EMRAD¹ as part of the NHS England Wave 2 Test Beds

Programme to deliver project learning during and at the end of the project, and to inform project implementation and

future investment decisions locally and nationally as the NHS looks at how it can derive value from digital

technologies, including artificial intelligence, in the future.

NHS EMRAD (EMRAD), together with two commercial digital technology companies, Kheiron Medical

Technologies (Kheiron) and Faculty (formerly ASI Data Science), their provider of radiology IT systems, GE

Healthcare, and East Midlands AHSN, bid for funding under the NHS England Test Beds Programme in 2018 to

train and implement artificial intelligence (AI) solutions [Box 1] within the national breast screening programme.

Neither of the products being ‘tested’ was market-ready, although the product from Kheiron was CE-marked.

This evaluation used mixed methods to understand the potential impact of the technologies on the breast

screening service and the process of introducing such novel technologies into the clinical

context and the breast screening pathway specifically [Figure 1].

The theory of change underlying the EMRAD screening imaging innovation programme is that the two AI tools,

one an algorithm-based clinical decision support tool and the other a machine learning pathway optimisation

tool, in the context of a scalable radiology IT system, will optimise the efficiency of the overall service, allow the

same number of staff to process more scans, reduce reporting delays, free up staff to deliver high value

activity and enable prompt and accurate diagnosis and treatment.

The main aims of the evaluation were to:

1. Understand the effect of combinatorial innovation in the NHS Breast Screening Programme (NHSBSP) on

coverage and utilisation, user satisfaction, improvement in workforce productivity and improvement in

health and care services with a specific focus on:

a. Assessing the use of machine learning models with proven effectiveness in non-healthcare, live

environments [developed by Faculty] to optimise the operational aspects (clinic scheduling and

resource allocation) of the breast screening service, boosting system capacity, reducing delays and

improving patient experience.

b. Understanding clinical and patient attitudes towards this technology, with a view to wider roll-out

across the NHS.

2. Understand and share ‘lessons learnt’ as a nationally relevant template for the combined deployment of

clinically and operationally focused AI tools in healthcare.

3. Make recommendations about future real-world testing and scale-up of AI technologies in the health

system.

¹ NHS EMRAD stands for East Midlands Imaging Network and is a partnership of seven NHS trusts (Chesterfield Royal

Hospital NHS Foundation Trust, Kettering General Hospital NHS Foundation Trust, Northampton General Hospital NHS Trust,

Nottingham University Hospitals NHS Trust (host organisation), Sherwood Forest Hospitals NHS Foundation Trust, United

Lincolnshire Hospitals NHS Trust, and University Hospitals of Derby and Burton NHS Foundation Trust). These trusts run 11

hospitals, covering more than five million patients.

Figure 1: Evaluation timeline

The evaluation does not include a comprehensive assessment of the safety and effectiveness of Kheiron’s Mia™

tool (the subject of a separate HRA-approved study) or Faculty’s service optimisation tool. It explores the process

of testing and developing the tools in the real world, perceptions around the use of AI tools in the context of the

NHSBSP, early evidence of the effect on NHSBSP performance and the process of innovating in the NHS.

Box 1: Definitions

Artificial intelligence (AI) can be viewed as ‘general’ or ‘narrow’ in scope. Artificial general intelligence refers to

a machine with broad cognitive abilities, which is able to think, or at least simulate convincingly, all of the

intellectual capacities of a human being, and potentially surpass them—it would essentially be intellectually

indistinguishable from a human being.

Narrow AI systems perform specific tasks which would require intelligence in a human being, and may even

surpass human abilities in these areas. However, such systems are limited in the range of tasks they can perform.

The terms ‘machine learning’ and ‘artificial intelligence’ are also sometimes conflated or confused, but machine

learning is in fact a particular type of artificial intelligence which is especially dominant within the field today.

Machine learning (ML) gives computers the ability to learn from and improve with experience, without being

explicitly programmed. When provided with sufficient data, a machine learning algorithm can learn to make

predictions or solve problems, such as identifying objects in pictures or winning at particular games, for example.

Neural networks are types of ML loosely inspired by the structure of the human brain. A neural network is

composed of simple processing nodes, or ‘artificial neurons’, which are connected to one another in layers. Each

node will receive data from several nodes ‘above’ it and give data to several nodes ‘below’ it. Nodes attach a

‘weight’ to the data they receive and attribute a value to that data. If the data does not pass a certain threshold, it is not passed on

to another node. The weights and thresholds of the nodes are adjusted when the algorithm is trained until similar

data input results in consistent outputs. Deep learning (DL) is a more recent variation of neural networks, which

uses many layers of artificial neurons to solve more difficult problems. Its popularity as a technique increased



significantly from the mid-2000s onwards, as it is behind much of the wider interest in AI today. It is often used

to classify information from images, text or sound.

Select Committee on Artificial Intelligence (2018)

The evolution of deep learning

Classic machine learning depends on carefully designed features, requiring human expertise and complicated

task-specific optimization. Deep learning bypasses feature engineering by taking advantage of large quantities

of data and flexible hierarchical models. Deep learning has recently achieved striking performance improvements

in diverse fields such as image classification, speech recognition, natural language processing, and playing

games. Blue boxes represent components learned by fitting a model to example data; deep learning allows

learning an end-to-end mapping from the input to the output.

Deep learning: A primer for Radiologists (Chartrand, 2017)
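
The description of a neural network in Box 1 can be made concrete with a short sketch. The Python below is illustrative only and is not drawn from either of the tools evaluated: the function names, weights and thresholds are hypothetical values chosen for the example, and the training step that would adjust them is omitted. It shows only how data might pass from one layer of nodes to the next when each node weights its inputs and passes a value on only if a threshold is exceeded.

```python
def node_output(inputs, weights, threshold):
    """Weight each input, sum the results, and pass the sum on
    only if it exceeds the node's threshold (otherwise pass 0.0)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total if total > threshold else 0.0


def layer_output(inputs, layer):
    """Apply every node in a layer to the same inputs.
    A layer is a list of (weights, threshold) pairs, one per node."""
    return [node_output(inputs, weights, threshold) for weights, threshold in layer]


def network_output(inputs, layers):
    """Feed the outputs of each layer into the layer 'below' it."""
    values = inputs
    for layer in layers:
        values = layer_output(values, layer)
    return values


# Hypothetical, hand-picked parameters (a trained network would learn these).
hidden_layer = [
    ([0.5, 0.25, 0.0], 0.5),    # weighted sum 1.0 exceeds 0.5, so 1.0 is passed on
    ([-1.0, 0.5, 0.25], 1.0),   # weighted sum 0.75 does not exceed 1.0, so 0.0 is passed on
]
output_layer = [
    ([2.0, 4.0], 1.0),          # weighted sum 2.0 exceeds 1.0, so 2.0 is passed on
]

example_input = [1.0, 2.0, 3.0]
print(layer_output(example_input, hidden_layer))                    # [1.0, 0.0]
print(network_output(example_input, [hidden_layer, output_layer]))  # [2.0]
```

In a deep learning model the same pattern would be repeated across many more layers and nodes, with the weights and thresholds fitted to example data rather than set by hand.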



Literature review: AI in healthcare

In 2019, the Secretary of State created NHSX, a joint unit across the Department of Health and Social Care and

NHS England and Improvement, to lead on digital transformation across the health and care system in England.

Since its creation, the team within NHSX focusing on artificial intelligence (AI) has produced papers setting out

the enabling context they would like to create for value-added AI technology in health including Artificial

Intelligence: How to get it right (2019) and Code of conduct for data-driven health and care technology (2019).

Together with other healthcare regulators, NHSX recognizes the potential but untested impact of AI on

healthcare.

Globally, health systems are looking to technology including artificial intelligence to address some of the demand

and capacity challenges facing them (Rong, 2020) (Davenport, 2019) (Loh, 2018) (Reform, 2018) (Shah, 2018)

(Fenech, 2018). Software and applications using artificial intelligence are seen to have great potential here but

there are few prospective real-world use cases (Nagendran, 2020) (Kelly CJ, 2019) (AHSN Network, 2018).

The main benefits that proponents of the use of AI in healthcare put forward are that:

1. It is more accurate than humans on well-defined tasks (Liu X. F., 2019) (Buch, 2018) (Chen, 2017);

2. It can help increase healthcare workforce productivity (Buch, 2018) (Meskó, 2018) releasing highly

trained professionals to high value activities that require human interaction;

3. It can support the administration and management of services to match capacity to demand (Nelson, 2019)

(Rajkomar, 2018);

4. It can enable greater than current levels of patient autonomy and self-care (AHSN Network, 2018), although

this remains untested; and

5. It may offer greater cost effectiveness (Wolff J, 2020), although this again remains unproven.

The limitations and risks of applying AI in healthcare settings have also been highlighted and are subject to

much current debate centring around regulation. The concerns that have been raised include:

1. The dependence on data that is not always reliable, generalizable, consistent or available, added to the

risks inherent in machine learning where the output of the algorithm itself shapes future data inputs

(Kelly CJ, 2019) (Buch, 2018) (Chen, 2017);

2. The risk of bias both within algorithms themselves and within the data that is used to train and validate

the models (Coeckelbergh, 2019) (Char, 2018);

3. The risks of excluding protected groups who may be under-represented in data-sets or for whom important

markers may not be included (Brown, 2019) (Panch T, 2019);

4. The risks of breaches to data security especially where commercial partnerships play a role in delivery

(Thompson, 2018), weak data governance and unclear data control (Morley J, 2019) (Panch T, 2019)

(Coeckelbergh, 2019) (Morley J, 2020);

5. Lack of clarity around clinical accountability when AI is used to make diagnostic and treatment

recommendations (Tschider, 2018) (Floridi, 2020) (Smith, 2020) (Coeckelbergh, 2019);

6. Outstanding questions about clinical safety (Challen R, 2019) (Macrae, 2019);



7. Lack of understanding about AI and machine learning algorithms, how they are trained, the inputs

required, how they function and what they cannot do amongst health professionals (Robbins, 2019)

(Lee, 2019) (Harvey, 2018); and

8. AI technology encroaches on human clinical autonomy (McDougall, 2019) (Milano, 2020) (Asan, 2020)

and will have an as yet unknown impact on human relationships in healthcare (Kerasidou, 2020) (Fenech,

2018) (Powell, 2019) (Karches, 2018) (Bjerring, 2020).

The Covid-19 pandemic accelerated the use of technology solutions in most aspects of health and care delivery,

including use of video consultations in general practice and hospital outpatients (Greenhalgh T, 2020),

challenging working practices, public experience of care, technical infrastructure and, in the longer term,

reimbursement models (Webster, 2020). The need to rapidly expand countries’ contact tracing capacity in

response to the pandemic led governments to turn to software developers and data scientists to plug a gap by

developing digital applications. In the UK, there was a particularly strident public debate around the use and

basis of this kind of technology to track and trace infection in the population and offer immunity passports with

trust in data privacy and security at the centre of this debate (Ada Lovelace Institute, 2020). Calls for ethical

frameworks and robust regulation of health technology including AI have been bolstered by this recent and time-

sensitive debate. A number of ethical frameworks have been proposed (Morley J. F., 2020) (Morley J. F., 2019)

(Open Data Institute, 2020) and are informing how such technology can be safely adopted in healthcare.

Healthcare regulators in England are starting to develop their approaches to regulating this emerging technology

(Care Quality Commission, 2020) (Care Quality Commission (b), 2020) (Information Commissioners Office, 2020)

(NICE, 2019). This development in regulation can be seen mirrored in Europe and the US (Pesapane F, 2018).

The rapid adoption of technology in health in the context of the pandemic has underlined the implementation

challenges that have dogged the adoption and spread of technology in healthcare (Sheikh A, 2011). AI tools need

data to be trained and accessing large bodies of high quality, deidentified data is often the first hurdle (Lee,

2019). In England, NHSX is seeking to overcome this challenge by setting up the NHS AI Lab and an AI Award

programme to provide technology developers with support to access data safely (NHSX, 2019). Given the role

that AI could play in augmenting decision-making as part of the processes of assessment, diagnosis and

treatment, it is critical to understand the interaction between AI technology and humans in real-world clinical

workflow (Kelly CJ, 2019) (Cresswell, 2018) and how acceptable health professionals find the technology in

different contexts (Cohen, 2017) (Shaw J, 2019). These and other particular development and implementation

challenges for AI in health mean that a more iterative approach will be required than has been traditionally the

case when adopting and scaling technology (Coiera, 2019).

There does seem to be an emerging consensus amongst clinical academics and policy-makers that AI technology

has the potential to augment clinical decision-making rather than replace it (Shaw J, 2019) (Health Education

England, 2019) (Health Education England, 2020), although this has not yet been tested in practice. As Chen and

Asch put it: “Whether such artificial-intelligence systems are “smarter” than human practitioners makes for a

stimulating debate — but is largely irrelevant. Combining machine-learning software with the best human

clinician “hardware” will permit delivery of care that outperforms what either can do alone” (Chen, 2017).



AI in medical imaging

Specifically in the field of medical imaging, the shortage of radiologists (Royal College of Radiologists, 2019)

(Royal College of Radiologists, 2020) (Lee, 2019) is constraining the capacity of imaging services including the

services provided as part of screening programmes. Deep learning (DL) AI provides the promise of a potential

solution to this challenge. Whilst there have been some fears expressed that AI may replace radiologists, it is far

more likely that AI will enhance the workforce’s ability to deliver high quality services (Allen, 2020). Systematic

analyses of studies examining the comparative performance of these solutions to date have identified a paucity

of randomized controlled trials, fewer prospective trials and even fewer again conducted in a real-world setting

(Nagendran, 2020) (Liu X. F., 2019). With these caveats in mind, early performance results are promising,

showing performance on a par with clinicians (Shen, 2019).

The workforce within medical imaging, radiologists, radiographers and clinical support staff, will need to have

an understanding of AI and the role that it can play in various diagnostic and treatment pathways (Panch, 2018)

(Recht, 2020). There is a requirement for medical education to reflect this need (Mendelson, 2019). Studies

exploring the attitudes of the workforce, including trainees, to the use of AI in imaging workflows are few but

those that have been conducted in countries such as France (Waymel, 2019) (Lai, 2020), Canada (Gong, 2019),

Germany (Pinto dos Santos, 2019), USA (Park C. Y., 2020), Switzerland (van Hoek. J. Huber, 2019), Saudi Arabia

(Abdullah R, 2020), South Korea (Oh S, 2019) and the UK (Sit, 2020) show similar patterns of attitudes. The

radiologist workforce is open to learning more about AI and the role it could play, particularly in diagnostics,

supplementing clinical expertise. However, participating clinicians doubt that AI could ever deal with some of the

unexpected patterns that arise in the real world of patient interaction. There are fears that the technology could

ultimately replace human image readers, but this is more prevalent in clinical support staff and technicians who

do not have patient-facing roles. Interestingly in the Swiss study, radiologists were more anxious about losing

territory to non-radiologist colleagues than to AI. All these studies highlight the need for more education of the

workforce in AI. Those studies that examined radiologists’ attitudes in more detail, including through qualitative methods,

highlighted perceived potential benefits such as saving time, reducing error rates, and increasing time spent with

patients.

The NHS breast screening programme (NHSBSP)

One of the aims of the NHSBSP is to achieve earlier detection of breast cancer and improved outcomes for

women between the ages of 50 and 70 years. The NHSBSP invites more than 2 million women for a test every year

nationally. In 2018/19, 71.1% of women took up the invitation and of these, 19,558 women had cancers detected

which was the highest rate in the last 10 years (NHS Digital, 2020). Screening saves around one life from breast

cancer for every 200 women screened, which equates to 1,300 lives saved from breast cancer each year in the

UK (Department of Health and Social Care, 2019).

The Age Extension (AgeX) trial (NHS Breast Screening Programme, 2020) conducted by a team at the University

of Oxford is currently assessing the risks and benefits of extending the screening age range to women aged

47-49 and over 70 years. This trial is not due to conclude until 2026 and only then will the results be known and

fed back to the National Screening Committee (NSC). The NSC is responsible for advising the Secretary of State

for Health and Social Care on whether any new initiatives are sufficiently well evidenced to be used within a

population screening programme such as the breast screening programme. It was announced in August 2020

that responsibility for oversight of the national screening programmes in England will move to NHS England and

Improvement at a date yet to be determined (Brennan, 2020). A number of other studies are ongoing exploring



the effectiveness of a range of approaches to tackling breast cancer through prediction, early detection and

prevention (The Nightingale Centre, 2019) (Evans, 2012) and their acceptability to women.

An independent review of breast screening services delivered its final report in December 2018 (Commons,

2018), and set out recommendations to improve the operations of the breast screening programme. Sir Mike

Richards was commissioned to conduct a review of all cancer screening programmes in 2019 and delivered his

interim report in May 2019 which included recommendations about the greater use of technology and artificial

intelligence to support high quality cancer screening services (House of Commons, 2019a). The final report was

published in October 2019 (Richards, 2019b). In March 2020, the NHSBSP was paused due to the Covid-19

pandemic and screening staff were redeployed to support the clinical response to the pandemic. Since the

summer of 2020, breast screening units in England have started to resume socially distanced screening services

with some piloting a new approach to inviting women to attend in September 2020. The resulting backlog will

add pressure to an already stretched service.

EMRAD have reported that within the East Midlands, breast screening services have been critically challenged.

With the lowest rate of radiologists per 100,000 of the population and highest rates of retirement over the next

5 years (Royal College of Radiologists, 2020), coupled with increasing numbers of women being eligible for breast

screening, there has been an increase in workload for readers and breast screening service managers. Some of

the EMRAD consortium Trusts were only just managing to meet two-week cancer targets pre-Covid-19 at the

expense of other important elements such as research. Other EMRAD Trusts were failing to hit the targets and

continue to refer patients onto neighbouring Trusts for treatment. This was not only causing increased travel,

anxiety, and reduction in choice for patients, but putting additional pressure on those neighbouring services

which were just about meeting demand within target performance. This has yet to be confirmed by quantitative

data but recent PHE Screening Quality Assurance Reports for the NHSBSP provided at Kettering (Public Health

England, 2018), North Nottingham (Public Health England, 2018) (Sherwood Forest Hospital) and Lincolnshire

(Public Health England, 2017) all raise the issue of staff capacity within the local breast screening units (BSUs).

The BSUs across EMRAD are slowly resuming services within the constraints of social distancing and infection

control as of July 2020.

There has been some debate in recent years about the risks and benefits of the breast screening per se (Løberg,

2015) (Gøtzsche, 2013) with one of the main risks being over-diagnosis due to false positives (something

found on the mammogram turns out not to be cancer) with consequent negative effects on well-being (Health

Quality Ontario, 2016). Women’s preferences show that they are willing to accept this risk along with the

discomfort of the process itself if it means cancers are diagnosed earlier (Mathioudakis AG, 2019). The quality

of clinical communication with women called back for assessment after screening is particularly important in

this circumstance (Long, 2019).



AI in breast cancer screening

The use of machine learning (ML) is not new in cancer diagnosis (Maclin, 1991) (Cicchetti, 1992) (Kononenko,

2001) (McCarthy JF, 2004) (Cruz, 2006). One challenge in using machine learning is the status of the input data

for training. ML relies on data that is uniformly annotated, labelled and structured. Medical images, including

breast images, are rarely curated in ways that allow for ML to be applied on large data sets without significant

preparatory work (Harvey H., 2019). The process of preparing medical images for machine learning is complex

and rarely fast (Willemink, 2020) (Chartrand, 2017) and requires three subsets, a training set2 , a validation set3

and a test set4. Inadequate planning for this data curation commonly leads to ML project delay and failure

(Harvey H., 2019).
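
The three-way split described above can be sketched as follows. This is a minimal illustration only, not the procedure used by any of the project partners: the function name `split_dataset`, the 70/15/15 proportions and the integer identifiers are all assumptions made for the example. The split is made on identifiers for women rather than on individual images, so that images from the same woman cannot fall into both the training and the test subsets.

```python
import random


def split_dataset(ids, train_pct=70, val_pct=15, seed=0):
    """Shuffle identifiers and split them into training, validation and test subsets.

    The percentages are illustrative assumptions. Whatever is left after the
    training and validation subsets are taken becomes the test subset.
    """
    if train_pct <= 0 or val_pct < 0 or train_pct + val_pct >= 100:
        raise ValueError("train_pct and val_pct must leave room for a test subset")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical identifiers for 1,000 women (not project data).
train_ids, val_ids, test_ids = split_dataset(range(1000))
print(len(train_ids), len(val_ids), len(test_ids))
```

The test subset is set aside first in spirit: it should only be used once the model parameters are fixed, which is why it is kept disjoint from the other two subsets here.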

Figure 2: Preparing medical imaging data for machine learning, Willemink et al 2020

The journal Nature published an article in January 2020 that gained considerable media interest. The article

presented the results of an international evaluation of an AI system for breast cancer screening (McKinney,

2020). This retrospective study using data from the US (enriched) and UK (representative) found that the AI

reader outperformed human readers, reducing both false positives and false negatives (the mammogram may

look normal even though breast cancer is present). The size of the data set (c17,000 images) of this study

addressed some of the previous criticisms of studies examining the use of AI in breast cancer screening

(Houssami, 2019). Another retrospective study (Kim, 2020) published one month later using an even larger data

set from the UK, US and Korea (170,230) as well as using images from three different imaging vendors, found

that standalone AI outperformed radiologists and when used alongside radiologists, improved diagnostic

performance significantly. A paper published in August 2020 (Salim M, 2020), presenting the results of an

independent evaluation of three commercially available AI products used as stand-alone readers and with a

human reader, found the algorithms demonstrated “sufficient diagnostic performance” and identified more true

positive cases of cancer when combined with human readers.

2 Trains and optimises the neural network parameters.

3 Monitors the performance of the model during training. Internal validation uses the data used to develop the model. External validation uses a separate data set. Temporal or geographic external validation enables assessment of generalisability (Park S. &., 2018).

4 Measures final model performance when parameters are fixed.

So there is emerging evidence that AI is safe, effective and accurate when used retrospectively but we have, as

yet, no prospective studies that test if this performance carries through into real-world clinical practice (Dustler,

2020). A concern that has been voiced more recently is that AI image reading may currently lack the level of patient

focus on “clinically meaningful endpoints such as survival, symptoms, and need for treatment” needed to

mitigate the risks of over-treatment and false positives (Oren, 2020).

Public perceptions of the use of AI in general and in healthcare

The increasing ubiquity of AI in our daily lives is reflected in the media’s portrayal of AI and related ethics. We have

moved from no media discussion of AI and ethics in 2013 (Ouchchy, 2020) to a position in mid-2020 where public discourse

on the role of algorithms and AI in decision-making in the UK has been shaped by controversies surrounding the

NHS Covid-19 app, crime and justice decision-support systems and exam results (The Guardian, 2020).

AI, and its use in everyday software, is perceived differently by demographic groups across the world. Here

in the UK, recent opinion polls have highlighted a varying degree of understanding of what AI is, the role it plays

in day to day life, and perceptions of its impact. In 2018, a poll conducted by YouGov for the Royal Society for

the encouragement of Arts, Manufactures and Commerce (RSA) found that people were less familiar with the

types of AI most likely to have a direct impact on their lives. These types were also less visible to them and

included automated decision-support used in personal finance, welfare and criminal justice (The RSA, 2018). The

less visible the application, the less likely people were to trust the output. A 2019 report looking at the attitudes

of the American public to AI found similar results (Zhang, 2019) with 82% wanting to see careful management

and regulation of AI applications. Research conducted by the organisation Doteveryone in the UK in 2020,

following up research conducted in 2018 looking at public attitudes to technology generally, found that while

the public remain positive about the impact of technology in their lives, enthusiasm has declined since 2018

(Miller, 2020). Women were less optimistic about the future effect of technology than men and older people

less optimistic than younger. A research study published in 2020 measured public attitudes to the smart home

and use of AI and Internet of Things (IoT) technology in the home (Cannizzaro, 2020) and found that trust in

technology in the home was low overall and this was particularly affected by concerns about the unauthorized

use of data.

Public attitudes to the use of AI and machine learning in healthcare specifically are similarly evolving. In 2017,

research commissioned by The Royal Society found that people who took part in deliberative events were

positive in general about the use of machine learning to support diagnosis in physical illness although not mental

illness (Ipsos Mori, 2017). People were clear though, that this should not replace the human interaction that

they value in the healthcare context. A study conducted for the Academy of Medical Sciences in 2018 confirmed

these views (Academy of Medical Sciences, 2018). Additionally, this research found that participants trusted the

NHS, as the guardian of their personal health data, to retain control of this data when working with commercial

partners. Another notable observation from this study was the difference in attitudes between those who

identify themselves as healthy users and those who identify themselves as ‘patients’, that is, people living with a

specific condition for which they are receiving treatment. Patients have a more positive view of the use of

innovative technology, probably because they have more of an interest in the benefits. A recent study looking

at public attitudes in China to the use of AI in medicine (Xiang Y, 2020) found that there is a high level of

acceptance of the use of AI in medicine overall in China. Interestingly, they found that receptivity to the use of



medical AI increased with age but that people do not perceive AI as replacing human health professionals, but

augmenting them. Another Chinese study analysing public attitudes as expressed on social media (Gao S, 2020)

found that nearly 60% of views were positive. Of the negative views expressed (6%), AI immaturity and distrust

of technology companies were the most common views. A study of members of the public in France (Tran, 2019)

replicated the largely positive views towards using AI in healthcare with the primary benefits seen as improving

access and follow-up, reducing the burden of treatment and reducing the workload of health and care

professionals. Perceived disbenefits included reducing human interaction, risks to privacy and security and

reliability issues.

In July 2020, the Ada Lovelace Institute published a report summarizing their findings from a series of

deliberative events conducted with the public on their attitudes to the use of technologies including AI and

algorithmic decision-making. They focused specifically on public health identity systems (PHIS) in light of the

Covid-19 pandemic (Ada Lovelace Institute, 2020) and identified issues around public trust in technology and

the companies developing technologies, concerns about effectiveness, worries about discrimination, and a

recognition that technology is not neutral but shaped by and shaping prevailing and dominant social and political

attitudes.

Looking specifically at patient attitudes towards the use of AI in radiology, a small sample survey (n=155)

conducted in The Netherlands (Ongena, 2020) showed that people want to be fully informed about the use of

AI in radiology and want to retain human interaction in the diagnostic process. A study of patient attitudes to

the use of AI to diagnose skin melanomas in Germany (Jutzi TB, 2020) (n=298) found that the respondents were

positive about the use of such technology to support clinician diagnosis and deliver faster, more precise and

unbiased results. They were concerned about data protection and susceptibility to errors.

The public are not passive recipients of care. They are essential stakeholders in the healthcare ecosystem and

their willingness to adopt new innovations can enable or constrain spread and scale (Lennon MR, 2017). If the

benefits of AI are to be delivered in programmes such as the National Breast Screening Programme and the

disbenefits minimised, then the public should be actively engaged in the design, development and monitoring

of this technology (Kirsch, 2017) (Katell, 2020).



Changes to the tools during the project

Kheiron Medical Technologies - Mia™

Kheiron’s Mia™ AI tool for breast cancer screening and reporting was CE-marked from the outset of the project.

It has been used to retrospectively read a large number of mammograms from different manufacturers, to train

the algorithm, validate it and to determine which of the software’s various operating points would be best used

for a future prospective pilot in the NHS breast screening programme (NHSBSP).

The way that Mia™ fits in to the NHSBSP workflow is summarised in Figure 3 below which is reproduced from a

document submitted by the partners to NHSX in September 2019.

Figure 3: Mia™ AI mammogram reader

Mia™ was feasibility tested in a multi-centre randomised trial using mammograms and de-identified outcomes

data. The trial assessed decision-making efficacy of Mia™ in a screening setting on European demographics. The

trial indicated that Mia™ software could potentially identify breast cancer correctly 9 times out of 10 (i.e. 1 in

10 cancers missed as false negatives). Sensitivity and specificity were 90% each, with an AUC (Area Under Curve) of 0.96. These

indicative results were consistent and repeatable, and outperformed all known Computer Aided Diagnostics

(CAD) software for breast malignancy detection. This performance was also above recommended human

performance guidelines with no significant risks or safety concerns. Mia™ received CE marking (Class IIa) in

October 2018.
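
To make the relationship between the figures above concrete, the sketch below computes sensitivity and specificity from the four cells of a confusion matrix. The counts are hypothetical and chosen only so that the arithmetic reproduces the 90% figures quoted; they are not trial data, real screening populations contain a far smaller share of cancers, and the function name `screening_metrics` is an assumption for the example. The AUC is not computed here because it summarises performance across all operating points rather than at a single one.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and error rates from confusion-matrix counts.

    tp: cancers correctly flagged       fn: cancers missed (false negatives)
    tn: normals correctly passed        fp: normals wrongly flagged (false positives)
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one cancer case and one normal case")
    return {
        "sensitivity": tp / (tp + fn),          # share of cancers detected
        "specificity": tn / (tn + fp),          # share of normals correctly passed
        "false_negative_rate": fn / (tp + fn),  # 1 - sensitivity
        "false_positive_rate": fp / (tn + fp),  # 1 - specificity
    }


# Hypothetical counts: 100 cancers (90 found, 10 missed) and 900 normals (810 passed, 90 recalled).
metrics = screening_metrics(tp=90, fp=90, tn=810, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

With these counts, a sensitivity of 90% corresponds directly to the "9 times out of 10" statement, and the false negative rate is the remaining 1 in 10.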

The retrospective study conducted over the last 24 months as part of the NHS Test Bed Project (HRA approval was in

place prior to the NHS Test Bed) was aimed at calibrating the deep learning tool and then

validating the results of this updated model to assess levels of sensitivity and specificity. The results of this

retrospective study were presented at RSNA 2020 in December and have been submitted for publication to a

peer-reviewed journal.



Kheiron have had to engage intensively with information governance (IG) and NHSBSP teams at the two sites

during the period to design and deliver a retrospective data extraction that conforms to strict IG

requirements (Data Protection Impact Assessment and risk assessment for re-identification) and understand

data linkage, coding and labels as they calibrate the tool. The primary users of the tool on the NHSBSP sites

(mammogram readers) have not yet experienced the technology in their workflow.

The next phase of the work will be the delivery of the NHSX and AAC Phase 4 AI Award to deploy Mia™ across 15

sites in the UK over 3 years. Whilst retrospective evaluations demonstrate evidence of software performance,

the AI Award will allow for collection of outcomes in a prospective setting to ensure additional evidence of

acceptability and utility within business-as-usual NHS screening settings.

Faculty

Faculty AI’s ‘Platform’ (formerly SherlockML) software is a secure machine learning environment for accessing

and manipulating enormous amounts of data, designing, and testing AI models, and deploying those models in

live environments. The platform has already been used on more than 200 commercial projects. The original

vision of this project was to use this platform to develop tools that optimise the capacity and demand in the

context of breast screening round length5. The outcome of the discovery work conducted between October 2018

– December 2018 was the identification of 2 priority tools:

a) Breast Screening Programme Management Tool focusing on round length optimisation,

attendance monitoring and clinic scheduling, and

b) Theatre Productivity Optimisation Tool with further detail to be determined.

In practice, Faculty and the two participating sites (Nottingham University Hospitals NHS Trust (NUH) and United

Lincolnshire Hospitals NHS Trust (ULH)) were unable to share the data required due to insurmountable

information governance issues.

In October 2019, Faculty proposed an alternative route to delivering their solution via the development of a

synthetic data set, which mimicked the National Breast Screening System (NBSS) data set. This synthetic data

set (SDS) could then be used to develop a range of machine learning tools to support the management of the

breast screening service.

The main outputs of Faculty’s contribution to the project have been:

1. A synthetic NBSS data set co-owned by Faculty and NUH (on behalf of EMRAD);

2. A deployment environment in NUH that is a controlled and governed environment in which new AI

products can be deployed safely;

3. A round length machine learning (ML) tool that includes: (a) an interactive information dashboard that

provides situational awareness to service managers about multiple dimensions of service activity and

demand that can be used to support day to day decision making; and (b) a scenario planning tool that

allows managers to model scenarios and predict demand and capacity changes under modelled

conditions, both using the SDS.

5 Round length is the term used to describe the time between breast screening appointments for each woman. In England this is usually 3 years.



The aim of the round length machine learning tool is to make the best possible use of scarce resources like

radiologist time and expensive machinery, and to reduce stress on the clinical and administrative workforce

delivering the programme.

Structure of the report

There are three main parts to this report:

• The methods used for collecting and analysing the data;

• The findings from the evaluation; and

• The conclusions of the evaluation and recommendations for future similar projects.

The evaluation was delivered alongside the project throughout its duration.



Methods

The evaluation of digital health technology can be conceptualised as a cycle (Bonten TN, 2020) which is similar

in many ways to the cycle of health technology assessment (Gutiérrez-Ibarluzea I, 2017) [Figure 4]. However,

digital health technologies using AI and machine learning present some novel challenges to evaluators. The

process of training, validating and testing AI models on real-world data means that there is more iteration across the

phases than with traditional technologies. This real-world data is not quite ‘real-world’, however, as cleaning is done

to remove incomplete data – a luxury not available in the real world. The ‘black-box’ nature of AI models makes

the process of assessing effectiveness less transparent and externally verifiable. Researchers have started to

recommend approaches that recognise these challenges (Vollmer, 2020) but this remains an evolving field.

Figure 4: Comparing the ehealth evaluation cycle and HTA cycle

With these challenges in mind, we chose a non-experimental design for this evaluation. We chose this design on

the basis that it was not possible to control all possible variables, and from a sampling perspective, it was not

feasible to randomly select the sites for project implementation. Two EMRAD partner sites were selected

(Nottingham Breast Institute and United Lincolnshire Breast Screening Service) for the project. To establish a

counterfactual for the evaluation, two other EMRAD sites were selected, Sherwood Forest NHS Trust and

Northampton General Hospital NHS Trust. We sought to understand if there were any differences in staff

attitudes to the use of AI in breast screening between sites using the model of ‘usual care’ and the sites involved

in the development of the proposed new model using AI tools.

The evaluation used a mixed methods approach, where quantitative and qualitative data is collected, analysed,

and synthesised at different points in the evaluation design to allow for different levels of exploration.



Ethical approval

Ethical approval was sought and obtained from the Health Research Authority in July 2019 for the staff and

general public survey component of the evaluation (IRAS ID 262287).

Data collection

The mixed methods approach combines qualitative and quantitative research methods to answer the evaluation

questions from a range of stakeholder perspectives. Information and data were collected from key

informant interviews, surveys, focus groups, and local data sources.

This approach recognises the importance of establishing a clear baseline (to understand current processes, costs

and outcomes) and establishing a counterfactual (to illustrate the projected costs and outcomes of usual

practice) that can be used in future prospective trials of the innovations. The types of data collection methods

are summarised in Table 1. The data collection survey tools are included in Appendix 1. These were submitted

to the HRA as part of the approval process.

Table 1: Data collection methods

| Data collection method | Details | Timing | Sample population size | Target cohort |
|---|---|---|---|---|
| Document review | All programme documents that are held in the programme repository (meeting minutes, reports, status updates, lessons log, risk register, case studies, communications, etc.) are collated and themed. | Oct 2018 – Aug 2020 | 60+ documents up to Sept 2019 | N/A |
| Observations | Observations of project meetings and other project activity were collected using content and discourse analytical frameworks. | Oct 2018 – Aug 2020 | 15 meetings up to Sept 2019 | N/A |
| Initial semi-structured interviews | Semi-structured interviews were conducted with the programme team in November – December 2018 and again in July – August 2019 to understand the programme partners' perceptions of the programme progress and the moderators of this progress. | Nov 2018 – Dec 2018; Jul 2019 – Aug 2019 | 31 interviews with 12 interviewees interviewed on 2 occasions | 19 programme team and governance board members |
| Survey of NHSBSP managers and administrators | Surveys were sent to service managers at all four sites in December 2019 (round 1) and again in July 2020 (round 2). Only those who participated in round 1 were invited to participate in round 2. | Round 1: Dec 2019 – Jan 2020; Round 2: July 2020 | Round 1: 46; Round 2: 38 | 46 |
| Survey of NHSBSP clinicians | Surveys were sent to all clinicians working on the NHSBSP at all four sites in December 2019 (round 1) and again in July 2020 (round 2). Only those who participated in round 1 were invited to participate in round 2. | Round 1: Dec 2019 – Jan 2020; Round 2: July 2020 | Round 1: 115; Round 2: 67 | 115 |
| Survey of women in the general population | Surveys were set up on www.onlinesurveys.ac.uk and information was shared via a range of site communication channels with women over the age of 18 years working at all four sites. This group was used as a proxy for the wider population of women in the East Midlands. They were also invited to share information about the project with female friends and relatives, especially those who are not in paid employment including those who are retired. We gathered information on age, ethnicity, and employment status to enable us to identify any gaps in the sample cohort that we needed to address using alternative methods. | Dec 2019 – Feb 2020 | 24,300 | 2,500 |
| Programme leadership survey | The original plan had been to conduct further interviews with programme partners to explore themes around implementation. Covid-19 meant that in-person interviews were not possible and time constraints and redeployment limited even online access to programme partners. Instead, we circulated a survey to all members of the EMRAD governance structure to understand their perspectives on the project progress. | July 2020 | 74 | 22 |
| Focus groups | We organised focus groups for women in the general population to target groups that are under-represented in the survey responses. | May - June 2020 | 24,300 | 30 |
| Operational data | We convened a working group of EMRAD programme team members, NHSBSP staff and members of the trusts' information teams to refine the programme logic model and identify programme costs and consequences. Information was then collected from trust finance and information teams and used to build the budget impact analysis model. | March - May 2020 | N/A | N/A |

Data analysis

All survey data collected were anonymised and stored securely within the GDPR-compliant

www.onlinesurveys.ac.uk platform. Other data were held in spreadsheets and Word documents that were managed

and stored at TaoHealth Research & Implementation under the Data Protection Guidelines (available on request)

and Research Ethics Guidelines.




Qualitative data analysis

We uploaded all qualitative data into NVivo6, a software package which is commonly used by researchers to

organise and visualise data analysis. We used NVivo to develop and apply a first order hierarchical thematic

framework to classify and organise data according to key themes, concepts, and emergent categories. It allows

for exploring data in depth while simultaneously maintaining an effective and transparent audit trail, which

enhances the rigour of the analytical processes and the credibility of the findings.

In addition to the emergent thematic framework for first order analysis [Appendix 2], we also used the Non-

adoption, Abandonment and Challenges to the Scale-up, Spread and Sustainability of health and care

technologies (NASSS) framework (Greenhalgh, 2017) [Figure 5] as a second order

analysis to enable us to answer the evaluation questions.

Figure 5 The NASSS framework for considering influences on the adoption, non-adoption, abandonment, spread, scale-up, and sustainability of patient-facing health and care technologies.

Quantitative data analysis

The public and staff surveys were analysed using descriptive statistics to explore views on the use of AI in general

and in breast screening specifically as well as changes in staff attitudes over time.
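
As an illustration of the kind of descriptive statistic involved, the sketch below calculates the percentage of respondents reporting some or extensive understanding of AI in each survey round, and the change between rounds in percentage points. The responses are invented for the example and are not survey data; the category wording and the function name `percent_with_understanding` are assumptions.

```python
from collections import Counter

# Assumed wording of the response categories counted as "understanding".
POSITIVE = ("some understanding", "extensive understanding")


def percent_with_understanding(responses):
    """Percentage of responses that fall in one of the POSITIVE categories."""
    if not responses:
        raise ValueError("no responses to summarise")
    counts = Counter(responses)
    return 100 * sum(counts[label] for label in POSITIVE) / len(responses)


# Invented responses for two survey rounds (ten respondents each).
round_1 = (["some understanding"] * 5
           + ["aware but limited understanding"] * 4
           + ["extensive understanding"] * 1)
round_2 = (["some understanding"] * 6
           + ["aware but limited understanding"] * 3
           + ["extensive understanding"] * 1)

pct_1 = percent_with_understanding(round_1)
pct_2 = percent_with_understanding(round_2)
print(f"Round 1: {pct_1:.0f}%  Round 2: {pct_2:.0f}%  Change: {pct_2 - pct_1:+.0f} points")
```

Reporting the change in percentage points, rather than as a relative percentage, keeps comparisons between rounds easy to read when the group sizes are small.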

6 NVivo is a qualitative and mixed-methods data analysis software tool used by academics and professional researchers globally.



We collected quantitative activity and financial information from the two test sites (Nottingham University

Hospitals NHS Trust and United Lincolnshire Hospitals NHS Trust). The activity information for the years 2016/17,

2017/18 and 2018/19 was extracted from the NBSS by each test site (NUH and ULH) and forms the basis of the

KC62 performance report submitted by all breast screening units to NHS Digital. The financial information was

the service budget from the trust ledgers for the same years to identify the direct and indirect costs of service

provision.

This information was then used to conduct a budget impact analysis (BIA) to understand the potential value of

the innovations to the health system. The model for the BIA was agreed with a group of stakeholders in March

2020. The difference between a budget impact analysis and economic evaluation is summarised in Table 2.

Table 2: Comparison of BIA and economic evaluation

The process of conducting a budget impact analysis is summarised in Appendix 3.
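
As a rough illustration of the arithmetic behind a budget impact analysis, the sketch below compares the yearly cost of a service under usual care with its cost under a new scenario and reports the difference for each year and in total. It is not the model described in Appendix 3: the function name `budget_impact`, the three-year horizon and all cost figures are hypothetical.

```python
def budget_impact(usual_care_costs, new_scenario_costs):
    """Return the yearly and cumulative cost difference (new scenario minus usual care).

    A positive value means the new scenario costs more than usual care in that
    period; a negative value means it costs less.
    """
    if len(usual_care_costs) != len(new_scenario_costs):
        raise ValueError("both scenarios must cover the same number of years")
    yearly = [new - usual for usual, new in zip(usual_care_costs, new_scenario_costs)]
    return yearly, sum(yearly)


# Hypothetical service costs in pounds over a three-year horizon (not trust data).
usual_care = [1_000_000, 1_020_000, 1_040_000]
with_ai_tools = [1_080_000, 1_000_000, 990_000]

yearly_impact, cumulative_impact = budget_impact(usual_care, with_ai_tools)
for year, impact in enumerate(yearly_impact, start=1):
    print(f"Year {year}: {impact:+,}")
print(f"Cumulative: {cumulative_impact:+,}")
```

In this invented example an up-front cost in year 1 is partly offset by lower costs in later years, which is the kind of pattern a budget holder would want to see set out year by year.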

The evolving theory of change

The starting point for the evaluation of any programme is the establishment of a programme theory of change.

This sets out the change that is expected to happen, the activities and processes it will employ to effect that

change, identifies the context within which the change will happen and how that change can be measured.

The theory of change from the model of ‘usual care’ that underpinned the project as a whole was as follows:

a) The real-world application of ‘Platform’ as part of the management of NHSBSP would improve and

optimise clinical service capacity in terms of workforce, scanning equipment and physical space.

b) The use of Mia™ would release capacity in the radiologist and reporting radiographer workforce by

performing one of the two reads on mammographic screens replacing one reader.

c) In combination, the real-world application of these two tools will enhance patient care at significant

scale.

It was also hypothesised that the use of evidence-supported AI tools in programmes such as NHSBSP will increase

NHS clinical staff and commissioner confidence in utilisation of innovative machine learning tools such as Mia™

and ‘Platform’. A simplified version of this theory of change is summarised in Figure 6 below.



Figure 6: Simple Theory of Change, March 2019

In December 2018, the evaluators ran a working session with the project partners to develop the initial theory

of change for the project. It became very clear that the different levels of development of the solutions being

tested (Mia™ versus ‘Platform’) meant that a single theory of change would be of limited usefulness. Two

theories of change were developed, and these are included in Appendix 4. These theories of change were used

to refine the evaluation plan and data collection tools.

This project was slightly different from others in the NHS England Wave 2 Test Beds Programme in that it was

not testing real-world deployment of market ready digital tools, but developing innovative tools using artificial

intelligence in a real-world context. The exploratory and iterative nature of this work meant that the theories of

change evolved over the course of the project. The most recent versions developed in January 2020 as part of

this evaluation are summarised in Figure 7 and Figure 8 below.

Figure 7: Kheiron Medical Technology Mia™ Theory of Change January 2020



Figure 8: Faculty 'Platform' Machine Learning Theory of Change January 2020

The usual care model that is the starting point for the theory of change is set out in the service specification for

the NHSBSP7 and summarised in Figure 9, which also shows where each of the AI tools being tested in the project

sits in the process.

7 NHS public health functions agreement 2018-19. Service specification no.24. Breast Screening Programme. Version

number: FINAL. First published: September 2018. NHS England Gateway Number: 07845.



Figure 9: NHSBSP Process Map (2018/19)

As well as exploring the process of implementation, this evaluation has looked at the likely effect of the tools

being developed on the breast screening service. We have focused attention within the part of the pathway

most likely to be directly impacted as highlighted above. There may be other effects downstream in the pathway

but these are as yet untested as neither tool has been deployed in a live environment.

The evaluation questions that were set out at the beginning of this project have, because of changes to the

project, been modified as the project has progressed. The modifications are summarised in Appendix 5.



Findings

Overview

We present the narrative of the project in Appendix 6. This illustrates the project’s journey from the beginning

of the project prior to the Test Beds award in October 2018 through to December 2020. Over that time the

project has delivered the following outputs:

| Change Domain | Output |
|---|---|
| Technology | Mia™ retrospective study comprising completion of training, validation and testing; A synthetic data set based on NBSS data to train and validate operational machine learning tools; Round-length planning tool developed and tested. |
| Organisational readiness | Information governance blueprint for development of AI tools in an NHS context; A project team with the skills and experience to test AI radiology products in the NHS environment; Commercial / NHS partnerships for future development, deployment and uptake. |
| Value proposition | Outline business case for real-world deployment; Financial and budget impact baseline model. |
| Adopters | Change in staff and public attitudes to the use of AI in breast screening with greater awareness across clinical staff groups in participating sites; A baseline understanding of women in the wider population’s attitudes to the use of AI in the breast cancer screening programme in England. |
| Wider context | Contribution to the emerging regulation of and policy context for AI in health through collaborative work with NHS England and Improvement, CQC and NHSX. |

We will pick up these changes in more detail as we present the findings of the evaluation in the next sections.



How well do different groups understand AI in general?

Fifty-eight percent of clinical staff described themselves as having some or extensive understanding of AI

compared to 20% of non-clinical staff. Women from the general public, both under and of screening age, described

themselves as having similar levels of understanding as the clinical staff we surveyed (57%). Illustrative examples

of descriptions used are provided in Box 2.

Box 2 Descriptions of artificial intelligence

Clinical Non-clinical Women (focus groups)

“Machines helping us make better decisions.”

“Using a computer system to perform a task originally performed by people.”

“The use of computer software to perform a task normally done by a human.”

“Science fiction, robots, scary.”

“Non-human skills, anything to do with robots.”

“I think of it as more assisted intelligence.”

“It doesn't exist, it is misinterpretation of the issue.”

“Robotic systems using our information.”

“Robots, no need for human input, films, the future, apprehension but exciting, fear of the unknown.”

“Intelligence of technology, making computers act more like humans would.”

“Technology thinking for itself.”

“Robots.”

“Something very clever beyond my intelligence”

“A lot of my information is from the media and from films, but that's all I know. My understanding of AI is science - fictional or something that is created to automate answers like a chat facility. I've read about it a bit in the media.”

“I think more about software and algorithms and a learning algorithm that learns more and more as you put data in.”

“I've just had an experience with my bank of speaking with the little robot person and it was horrible and I wanted to send the survey saying this is rubbish. I wouldn't try it again.”

“A lot of my understanding is from movies where robots take over the world.”



What were the perceived benefits of the use of AI tools in the breast screening service?

Clinical staff

At the beginning of the project, clinical staff at both test sites were reported by clinical leads to have limited

awareness of the application of AI either as a second reader of mammograms, or in the management of the

breast screening programme. By the time of the round 1 survey, over 12 months had passed since the start of

the project. However, project progress had been limited and clinical teams had relatively low levels of

involvement in the training and validation of Mia™.

The clinicians showed a little change in their understanding of AI over the period studied with the majority

classifying themselves as having “some understanding” of what AI is in both rounds. Most of the rest described

themselves as “aware but having limited understanding” [Table 3].

Table 3: Clinicians' self-assessment of their understanding of AI

Rates of understanding were higher in test sites compared to control sites during both rounds of the survey. The

proportion of clinicians who had read something about the use of AI in mammography went up from 67% to

76% over the period between the surveys.

When asked if they thought AI would have positive effects on society in general, 84% of clinicians agreed in

round 2 up from 76% in round 1 [Table 4].

Table 4: Clinicians' perception of the potential for a positive impact of AI on society

To understand the extent to which the perceived benefits of Mia™ aligned to actual challenges faced by the

service, we asked clinical staff what they thought were the greatest challenges facing the service at present. We

asked them to select their top three from a long list of challenges based on a literature review and consultation

with a small group of clinical staff. In both rounds of the survey, workforce shortage was by far the biggest

concern (95%) and high ‘do not attend’ (DNA) rates the next biggest (33%).

Clinicians were asked about their views of the likely benefits of AI to the breast screening service based on the

theory of change for this project and their responses for the two rounds were compared.

Whilst the clinicians were positive about the potential benefits of AI in breast cancer screening in simplifying

current working practices [Table 5] and in supporting decision-making [Table 6], they were less convinced of the

potential to improve workforce capacity [Table 7] and there was little overall change in their view that AI can be

trusted to identify anomalies accurately [Table 8].

Clinicians understanding of AI           Round 1   Round 2   Change
Aware of AI but limited understanding    46.51%    35.14%    -11.38%
Some understanding                       51.16%    56.76%    5.59%
Extensive understanding                  2.33%     8.11%     5.78%

Perceived impact on society               Round 1   Round 2   Change
Strongly agree                            14.93%    18.92%    3.99%
Agree                                     61.19%    64.86%    3.67%
Neither agree nor disagree / Undecided    23.88%    16.22%    -7.66%



Table 5: AI tools could simplify current working practices - Clinicians' views

Table 6: AI tools could support decision-making - Clinicians' views

Table 7: AI tools could improve workforce capacity - Clinicians' views

Table 8: AI tools could be trusted to identify anomalies correctly - Clinicians' views

Asked about their level of comfort using AI as a second reader in the process of reading population breast screening mammograms, 51% agreed in round 1 that they would be comfortable, and this increased only slightly to 54% by round 2. The proportion saying they would not be comfortable also went up, from 4.5% to 8%. When asked to expand on this qualitatively and indicate what would give them greater confidence that AI second readers were safe and effective, would improve their working life, and would impact the experience of women coming into the service, 43% of respondents indicated that they wanted to see evidence from trials and 24% wanted results of clinical audits in situ.

Simplify current working practices    Round 1   Round 2   Change
Strongly agree                        8.96%     13.51%    4.56%
Agree                                 43.28%    62.16%    18.88%
Disagree                              2.99%     2.70%     -0.28%
Strongly disagree                     0.00%     0.00%     0.00%


Support decision-making    Round 1   Round 2   Change
Strongly agree             2.99%     13.51%    10.53%
Agree                      56.72%    56.76%    0.04%
Disagree                   1.49%     0.00%     -1.49%



Improve workforce capacity    Round 1   Round 2   Change
Strongly agree                10.45%    5.41%     -5.04%
Agree                         52.24%    62.16%    9.92%
Disagree                      1.49%     2.70%     1.21%



Be trusted to identify anomalies accurately    Round 1   Round 2   Change
Strongly agree                                 2.99%     8.11%     5.12%
Agree                                          38.81%    32.43%    -6.37%
Disagree                                       5.97%     5.41%     -0.56%





Quotes from free text

“Initially it would have to be introduced as an additional tool rather than a replacement. The use of audits would

then be able to determine the effectiveness and benefits of using AI.” Clinician at test site.

“[I need to see] publicised, peer reviewed results against real life.” Clinician at control site.

Clinicians were positive about the potential effect of introducing the AI reader on the experiences of women

attending the service, with the proportion holding this view increasing from 42% to 51% over the period studied. There were only very small

differences between the test sites and control sites in all the survey items and these did not demonstrate any

significant change over time. Clinical staff were also positive about the potential deployment of AI to support

service optimisation in their free text responses.


“NHSBSP standards to offer appointments is very challenging, especially with age extension and having to catch

up due to a cease in screening due to COVID-19. It would be fantastic if AI was used for predicting actual numbers

attended accurately so booking slots can be used for effectively.” Clinician at test site.

Non-clinical staff

The early engagement of breast screening service administrative staff at the two test sites by Faculty as part of

the discovery process would have raised awareness and expectation within these staff groups about the

potential use and benefits of a service optimisation tool. This was reflected in the response to the question about

understanding of AI, with 83% saying they have some understanding or awareness of AI at test sites versus 78%

at control sites. This had increased to 100% by round 2 although the small sample size for this round is a

limitation when interpreting results [Table 9].

Table 9: Non-clinicians' self-assessment of their understanding of AI

When asked if they thought AI would have positive effects on society in general, non-clinical staff were more

sceptical than clinicians, with only 30% agreeing in round 1 (although this increased to 58% in round 2). This

may be linked to the age profile of the two groups: 70% of non-clinical staff were aged 50 and over, whereas only

40% of clinical staff were aged 50 and over.

To understand the extent to which the perceived benefits of the service optimisation tool aligned to actual

challenges faced by the service, non-clinical staff were asked what they thought were the greatest challenges

facing the service at present. They selected their top three challenges from a list drawn up after consultation

with a small group of non-clinical staff. In both rounds of the survey, workforce shortage was the biggest concern

(80%) and high ‘do not attend’ (DNA) rates the next biggest (58%), with administrative burden also a significant

concern, more so in round 2 (58%).

Non-clinicians were asked about their views of the likely benefits of AI to the breast screening service based on

the theory of change for this project and their responses for the two rounds were compared. By the time of the

second survey round, non-clinicians were more likely to agree that AI would have potential benefits in supporting

the management of breast screening services against all three dimensions [Table 10].

Non-clinicians understanding of AI       Round 1   Round 2   Change
Some understanding                       21.05%    58.33%    37.28%
Aware of AI but limited understanding    60.53%    41.67%    -18.86%
None                                     18.42%    0.00%     -18.42%



Table 10: Non-clinicians who agree that these benefits of AI are likely

Quote from free text

“I think if this will increase the experience of women it will be a good thing. We do get a lot of anxious women

wanting to know results sooner than our protocol. However, hopefully there are no risks involved.” Breast

screening service manager.

Non-clinicians were undecided about the potential effect of introducing AI service optimisation tools on the

experiences of women attending the service, with only 33% in round 2 agreeing that it would have a positive

effect on women’s experience (although this had increased from 18% in round 1).

Women of and under screening age

The survey was targeted at women over the age of 18 years and the respondents were segmented into women

who are currently or have recently been the population of focus for the NHSBSP (aged 50+ years) and women

who are not yet of screening age. We wanted to understand if there were differences between these two groups

in what they think about the use of AI in the screening programme to read mammograms and support

programme management. More information on the profile of the respondents is included in Appendix 7.

Most women (54%) did not understand the current process for reading mammograms. Twenty-four percent

had an accurate understanding of the process [Figure 10]. A higher number of women under screening age (58%)

did not understand the process compared to women of screening age (49%), indicating that women are likely to

seek out information on this once they enter the programme.

Figure 10: What is the process for reading mammograms?

Potential benefit                     Round 1   Round 2   Change
Assist the management of the BSS      29%       50%       21%
Improve workforce capacity            34%       42%       8%
Simplify current working practices    32%       50%       18%

When asked about their understanding of AI in general, most women rated themselves as having some understanding (55%). When this was broken down by age group, there was consistency across all age groups except for the 18-19 years and 70+ years groups who both reported higher levels of no or little understanding [Figure 11].

Figure 11: Women’s self-assessed understanding of AI

The survey respondents were much more likely to be positive about the potential effect of AI on society (50%)

than negative (6%), but were almost as likely to be undecided on this (44%) [Figure 12]. Women under screening

age were slightly more likely to be positive about the potential effect of AI (53%) as opposed to women of

screening age (47%).

Figure 12: Artificial intelligence will have a positive effect on society



Some women chose to explain why they had given the response that they did to this item in optional free text

(n=670). Sentiment analysis8 revealed 42% positive statements about the impact of AI. A similar pattern was

noted for both women of and under screening age [Figure 13].

Figure 13: Sentiment analysis of free text in response to AI having a positive impact on society

When we looked at the free text responses of those women who stated that they neither agree nor disagree

with the statement, there was more of a difference between women of screening age and those under screening

age with women under screening age more likely to express positive or mixed sentiments about AI and its effect

(35%) than women of screening age (21%) [Figure 14].

Figure 14: Sub-sample of those who responded neither agree nor disagree that artificial intelligence will have a positive effect on society

8 Sentiment analysis involves classifying opinions into categories such as "positive" or "negative".




“A computer if programmed correctly will not produce any errors when a human can.” (50-59 years)

“The removal of emotion and personal circumstances can lead to more consistent and fair decisions.” (40-49

years)

“AI can be used to aid in education and understanding of many issues.” (20-29 years)

“It can be used in a range of areas to improve speed, accuracy, reduce costs of certain tasks. It is already used in

lots of ways. It can standardise tasks/tests etc. as not prone to same biases of humans (other biases may exist

and need to be taken into account). Reduce problems caused by human error and differences in my opinion.” (30-

39 years)

Respondents were asked about their views in a free text response on using AI in the breast screening

programme, both as a second reader and to support programme management. Sentiment analysis of these

responses showed that the largest proportion of women were positive about the use of AI in the breast screening

programme (46%) with the next largest expressing mixed views (20%) and 16% expressing a negative view

[Figure 15].

Figure 15: Views on the use of AI as part of the breast screening programme (% of total sample)



Women of screening age were the more strongly positive of the two groups, and women under screening age

were more likely to hold mixed views [Figure 16].

Figure 16: Views on the use of AI as part of the breast screening programme (% of each sample group)

Thematic analysis of the free text data showed that when any perceived benefits of using AI in the breast

screening programme were mentioned, women were most likely to say that they were not sure what these

would be (n=543). When benefits were identified, the most frequently mentioned were: increased efficiency

(n=162), improved reliability (n=263) and greater safety (n=139). Many women expressed the view that AI in

breast screening would and should happen in the future (n=847), which represented 78% of those who expressed a view

about the future of AI in breast screening.

These themes around potential benefits were explored in more detail as part of the focus groups that followed

the survey. Information about the participants in the focus groups is included in Appendix 7.

Of the 25 women who took part in the focus groups, 76% had either experienced a breast cancer diagnosis

themselves or knew someone who had. Seventy-two percent had had a breast cancer screening appointment.

Sixty percent of the women who took part knew that two readers looked at mammograms and 36% did not

know the process for reading mammograms. This made them a more informed group than the general

population surveyed, probably reflective of their self-selecting status when volunteering to take part in the focus

groups.

Many focus group participants expressed the view that the use of AI in healthcare and specifically in the breast

screening programme was inevitable (n=11), with some seeing a positive contribution being made by AI (n=4).

The main benefits that women saw AI in breast screening offering were in increased efficiency (n=23), improved

reliability (n=12), improved outcomes (n=8) and improved safety / fewer errors (n=8). They also hypothesised

that introducing AI into the breast screening programme might release staff to higher value activities and save

money for the service (n=6) and help address the workforce shortage within the breast screening programme

(n=17).




“My GP has introduced AskMyGP – I was blown over from the response - personalised to me. I would find this

easier to do and would prefer to spending 2-3 hours going to the surgery.” (Woman under screening age)

“I'd like to think that this AI will shorten the time taken from the mammogram being taken to getting the results.”

(Breast cancer survivor)

“AI in the background - you could really get a lot of people through the system. I'd have a level of comfort from

a mammogram point of view. I guess if there was a problem you would have a review by a radiologist. I guess on

the back end you would still be getting some kind of personal touch.” (Woman of screening age)

“I wanted to choose a mix of AI and human. So much of my life was waiting for results and being on hold, I think

it was about speed and accuracy for me. I don't have enough experience of normal mammograms to know how

to answer.” (Breast cancer survivor)



What were the concerns of the workforce and women about the use of AI tools in the breast screening service?

Across all the groups we researched, ‘trust’ was the single most frequently mentioned concern. The emphasis for each group, in terms of both reasons for lack of trust and consequent mitigations, was slightly different and is explored in the following sections.

The word “trust” was mentioned 137 times in the public survey in the free text response to two questions which

sought to understand more detail on respondents’ attitudes to the use of AI.

• Tell us more about why you selected the level of agreement with Artificial intelligence can have a

positive effect on society that you did.

• How would you feel about artificial intelligence being used to read mammograms?

When we included synonyms such as “sure”, “confident” and “believe”, this incidence increased to 696. Almost

all the respondents who mentioned trust either did not trust AI or felt that it could not be trusted if used in

isolation without human oversight.
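A text search of this kind can be sketched as follows. This is a minimal illustration only: the example responses are invented, the term list is the one given above, and whole-word, case-insensitive matching is an assumption, since the report does not state how mentions were counted.

```python
import re

# Terms named in the report. The matching rule (whole word, case-insensitive)
# is an assumption made for this sketch.
TRUST_TERMS = ["trust", "sure", "confident", "believe"]


def count_mentions(responses, terms):
    """Count occurrences of each term across a list of free-text responses."""
    counts = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = sum(len(pattern.findall(text)) for text in responses)
    return counts


# Invented responses for illustration only; these are not survey data.
example_responses = [
    "I would trust it if a radiologist checked the result.",
    "Not sure. I don't believe a machine should decide alone.",
    "I am confident in the staff, but I do not trust computers.",
]

counts = count_mentions(example_responses, TRUST_TERMS)
total = sum(counts.values())
print(counts, total)
```

Whole-word matching would not count variants such as "trusted" or "unsure"; a real analysis would need to decide whether to include them.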

Clinical staff

When asked how often they trusted information from search engines for general queries, 64% of clinical staff said they trusted this information often or very often. Seventy-three percent of this group use search engines to seek health information for themselves, and the proportion who do so declines only among those in the 60-69 years age band [Figure 17].

Figure 17: Clinicians' use of search engines to seek health information

While some clinical staff see an AI second reader as potentially alleviating the severe pressure placed on the

screening workforce at the moment while releasing staff to activities that are “patient-facing”, others expressed

concern about the impact of introducing AI as a second reader on possible job losses and reduced opportunities

for reporting radiographers to enhance their skills.



Clinical staff were also concerned about the safety, accuracy and reliability of the AI reader and saw the

publication of clinical trials as an essential prerequisite to adoption of the technology in the breast screening

service workflow. A text search of free text returned 30 mentions of “evidence”, “trial” or “audit” in round 2 of

the survey, an increase from round 1.


“I believe the use of AI could be an exciting development in improving the service, however I would want to see

the evidence that it is a safe tool to use.”

“I think AI tools in the breast screening programme may be useful in booking patients, sending them invitation

to screening etc, but probably not advisable/safe to use in image reporting.”

“It will be easier to assign sensitivity thresholds with an AI to reduce false positives.”

Non-clinical staff

When asked how often they trusted information from search engines for general queries, 66% of non-clinical staff said they trusted this information often or very often. Sixty-six percent of this group use search engines to seek health information for themselves.

During the project, the non-clinical workforce had no real-world experience of a tool in practice until the last

two months (after the surveys had closed). This was reflected in the common assertion in the free text responses

that respondents to the survey did not have enough information to express a view on the introduction of AI into

the service management workflow. Some concerns were expressed about the possible effect on jobs in terms of

job losses if a service optimisation tool is introduced.


“Because I do not know much about it and nothing artificial is usually not good. However, if I don't understand

how it works then my answer will be biased.”

“I feel uneasy about it until there is more information and research to prove the reliability and benefit and also

wonder what this means for jobs.”

“I would like to see it in place first before I make a comment.”

Non-clinical staff also expressed some concerns about the accuracy and reliability of the AI second reader in free

text responses and the knock-on effect that this could have on women’s experience of and confidence in the

service.



Women of and under screening age

When women were asked in the survey the extent to which they trusted the output of commonly used

technology platforms that use AI such as search engines and virtual assistants, 60% of all women said they

trusted the output often or very often leaving over one third somewhat or very sceptical [Figure 18].

Figure 18: The extent to which women trust the information they get from the technology that they use everyday

There was a small difference between women of screening age and under screening age with those of screening

age a little more sceptical than those under screening age [Figure 19].

Figure 19: Difference in trust between women of and under screening age

Six hundred and seventy women chose to explain why they had given the response that they did to the statement

“Artificial intelligence will have a positive effect on society” and when these responses were analysed for

sentiment, 39% made mixed or negative statements about the impact of AI. Women under screening age had

slightly more mixed or negative views (41%) than women of screening age (37%) [Figure 20].



Figure 20: Sentiment analysis of free text in response to AI having a positive impact on society

Women who had a negative or mixed view of the effect of AI in society were unsure of why they felt this way in

many cases (n=96) although they felt it was an inevitable part of their lives in the future (n=20). Those that did

express a view cited concern about the reliability and safety of technology (n=123); a lack of trust in the

technology itself or the systems that sit around it (n=65); a fear about a combination of over-reliance on AI and

job losses that might ensue (n=32); and the absence of the human touch in interactions (n=46).


“If used well, AI has an important part to play in diagnosis of disease. However, there are also dangers of it being

used to further profit-driven goals.” (50-59 years)

“It reduces human interaction but I agree there are a lot of amazing applications of AI that can, for example,

keep the disabled independent.” (40-49 years)

“It has the potential for profound good or profound harm. It must be controllable.” (20-29 years)

“AI has the ability to remove skills and development of people and learning is wasted. I’m concerned for my

children in the further, I believe that humans will be removed from day to day things and life will miss person

centred contact.” (30-39 years)

When asked about their views on using AI in the breast screening programme, both as a second reader and to

support programme management, sentiment analysis of these free text responses showed that 16% of

women expressed a negative view [Figure 15].

When we compared the women’s views of the effect of AI on society and their views on the use of AI in the

breast screening programme, women with positive views on the effect of AI on society as a whole were slightly

more likely to hold positive views on the use of AI in breast screening but interestingly, also more likely than

women who had a negative or undecided view on the effect of AI on society, to hold a negative view of AI in

breast screening. In other words, women who have positive views about AI’s effect on society have more decisive

views on AI in breast screening (positive or negative). Women with the highest proportion of negative views on

AI in breast screening were those with neutral views on the effect of AI on society [Table 11].



Table 11: Cross tabulation of the perception of the effect of AI on society with perception of use in breast screening

                                              View of the use of AI in breast screening
View on the effect of AI on society (rank)    No answer   Negative   Mixed   Undecided   Positive   Grand Total
Strongly disagree                             3           26         12      7           52         100
Disagree                                      5           77         15      16          16         129
Neither agree nor disagree / Undecided        68          414        329     423         569        1803
Agree                                         29          148        452     139         1069       1837
Strongly agree                                4           10         31      8           174        227
Grand Total                                   109         675        839     593         1880       4096

When women’s perception of the use of AI in the breast screening programme was compared with women’s

self-reported understanding of AI overall, we found that the higher women rated their understanding of AI, the

more likely they were to have a positive view of its use in the breast screening programme [Table 12].

Table 12: Cross tabulation of self-reported understanding of AI with perceptions of AI in breast screening

                                                 View of the use of AI in breast screening
Understanding of AI in general                   No answer   Negative   Mixed   Undecided   Positive   Grand Total
None                                             33          66         25      90          90         304
Aware that it exists but little understanding    33          287        269     261         583        1433
Some understanding                               40          310        525     236         1148       2259
Extensive understanding                          3           12         20      6           59         100
Grand Total                                      109         675        839     593         1880       4096
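The relationship in Table 12 is easier to see when each row is expressed as percentages of its own total. The sketch below does this for the counts in the table. The variable and function names are illustrative, and this is not necessarily how the original analysis was carried out.

```python
# Counts transcribed from Table 12 (rows: self-reported understanding of AI).
COLUMNS = ["No answer", "Negative", "Mixed", "Undecided", "Positive"]
TABLE_12 = {
    "None": [33, 66, 25, 90, 90],
    "Aware that it exists but little understanding": [33, 287, 269, 261, 583],
    "Some understanding": [40, 310, 525, 236, 1148],
    "Extensive understanding": [3, 12, 20, 6, 59],
}


def row_percentages(table, columns):
    """Return each row's counts as percentages of that row's total."""
    result = {}
    for label, counts in table.items():
        total = sum(counts)
        result[label] = {col: 100 * n / total for col, n in zip(columns, counts)}
    return result


shares = row_percentages(TABLE_12, COLUMNS)
positive_share = {label: round(row["Positive"], 1) for label, row in shares.items()}
for label, pct in positive_share.items():
    print(f"{label}: {pct}% positive")
```

On these counts, the share of women with a positive view rises from about 30% among those reporting no understanding to 59% among those reporting extensive understanding, which is consistent with the pattern described above.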

Thematic analysis of the free text data showed that the concerns and risks of using AI in the breast screening

programme most frequently mentioned were: a lack of clarity around how the technology would be governed

and regulated once in place (n=163); and the lack of ‘human touch’ that may result (n=143). A large number of

women (n=643) expressed the view that they expected the AI tool being used as a second image reader would

be rigorously tested and there would be robust evidence made available on its safety and accuracy. A small but

not insignificant group (n=242) of women who expressed a view about the future of AI in breast screening said

that AI should not be used within the breast screening programme.
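The 78% figure reported earlier for women who felt AI in breast screening would and should happen (n=847) can be reproduced from the two counts, on the assumption that these two groups together make up everyone who expressed a view about the future of AI in breast screening. The report does not state the denominator explicitly, so this is a sketch of one plausible derivation.

```python
# Counts quoted in the thematic analysis of free-text survey responses.
should_happen = 847       # AI in breast screening would and should happen
should_not_be_used = 242  # AI should not be used within the programme

# Assumption: these two groups are everyone who expressed a view on the future.
expressed_view = should_happen + should_not_be_used
share_in_favour = 100 * should_happen / expressed_view
print(f"{share_in_favour:.1f}% of {expressed_view} women")
```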

These themes around potential concerns were explored in more detail as part of the focus groups that followed

the survey. Information about the participants in the focus groups is included in Appendix 7.

As we stated before, the 25 women who took part in the focus groups were a more informed group than the

general population surveyed.

The main concerns that were expressed by women were: the absence of the ‘human touch’ (n=37); lack of clarity around how the AI tools will be governed and potential discriminatory bias avoided (n=33); and how data privacy will be protected (n=25).




“There are all those sci-fi movies where it goes rogue and I am not saying it is not completely far-fetched, most

of it is, but I think it about having some real strong list of ethical principles about how you use it but in free

market capitalism you are not going to have that are you? that would be bad for society, it will make money for

the money people, it will leave behind the poor people and there will be some good people along the way who

will do good things with it.” (Woman of screening age)

“I think behind the scenes it is great, but I think you need a lot of face to face compassion and understanding.”

(Woman under screening age)

“If there was some sort of consent, confidentiality, some sort of understanding of the rules. It would be nice to

know that some sort of trustworthy organisation was monitoring it.” (Breast cancer survivor)

“I have been reading negative stuff about AI like facial recognition and how it’s a bit biased - would it be biased

against certain racial populations?” (Woman under screening age)

“Internet security and AI security and hacking - it is a concern for me. My worry would be that systems are hacked.

Do they have a minimum level of security?” (Woman of screening age)

When exploring the kind of actions that women thought would mitigate some of their concerns, they suggested

that the workflow would always need to involve humans. Some women envisaged the AI technology undertaking most

of the activity, including decision-making, with human oversight (n=10); others saw the human role as

pre-eminent, with AI used to augment clinical activity and decision-making (n=15). The women assumed that

this technology would never be used without clear evidence of its effectiveness (n=24) and expected its effect on

the equity of access to breast screening to be closely monitored (n=18) through governance processes.

Women were divided on whether they would want to be informed if AI tools were being used as part of their

experience of breast screening but agreed overall that women should be given the information as part of the

process of informed consent when taking part in the breast screening programme (n=15).



What were the technical and data benefits and challenges for the project?

This project was not testing real-world deployment of market-ready digital tools, but developing innovative tools

using artificial intelligence in a real-world context. In the case of Mia™, this meant training, validating and testing

their AI breast image reader on a large retrospective data set extracted from the GE Healthcare system that

EMRAD Trusts use to process and store all medical imaging. Faculty first tried to develop their service

optimisation tools using NBSS data extracted from EMRAD trusts but it became clear that this would not be

possible and so they pivoted to developing a synthetic data set (SDS) based on an extraction of real-world data

at one EMRAD trust site, and then developing the service optimisation tools using this SDS.

Whilst there were some minor technical challenges during the project, the data challenges were more

significant. These can be separated into challenges with (1) ethical approval and information governance; (2)

data extraction and transfer; and (3) data cleaning and preparation. The findings below are based on content

analysis of project board minutes, individual and group interviews and other project documents.

Mia™ AI breast image reader

The work undertaken by Kheiron as part of the project was governed by the ethical approval obtained from the

Health Research Authority (HRA) for the retrospective study. During the project, the EMRAD team identified that

Kheiron’s internal/onsite computer servers used to store the de-identified extracted data, were not compliant

with standard ISO27001. The AI project board discussed the issue and assessed its impact as low due to the

nature of the data (de-identified), the contractual protections in place, and the imminent update of ISO27001.

In October 2018, GE Healthcare’s datacentre copied the data onto an encrypted share which was transported

securely to NUH. The next six months were spent on data de-identification, quality control and information

governance sign-off. After this was complete, half of the data was transported securely to Kheiron’s offices in

London. This transfer process took considerable time and coordination to organise to comply with information

governance requirements. Obtaining sign-off for data transfer from more than one trust caused significant

delays to this process and suggestions have been made to mitigate this in future projects.

Data cleaning and preparation is a critically important stage in the process of training an AI tool (Harvey H.,

2019). Kheiron worked with the two trust sites (NUH and ULH) to ensure the de-identified datasets still contained

sufficient data markers to enable ML training. A risk was identified early on that data fields removed during the

anonymisation process, or data fields not extracted initially, may later prove important for training. In March

2020, it became clear that two such data fields were missing. The data had to be re-extracted. This lesson was

recorded in the project log and the AI project board agreed future mitigations should this arise again.
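One way to reduce the risk described above is to compare each de-identified extract against a list of fields the ML team expects before the data leaves the trust. The sketch below is illustrative only: the field names and the `missing_fields` helper are hypothetical, as the report does not list the actual NBSS fields involved.

```python
# Hypothetical field names; the report does not name the real NBSS fields.
REQUIRED_FIELDS = {"episode_id", "screening_date", "reader_1_opinion", "final_outcome"}


def missing_fields(extract_columns, required=REQUIRED_FIELDS):
    """Return the required fields that are absent from an extract, sorted by name."""
    return sorted(set(required) - set(extract_columns))


# Example: an extract where one field was dropped during de-identification.
extract_columns = ["episode_id", "screening_date", "final_outcome"]
gaps = missing_fields(extract_columns)
if gaps:
    print("Fields to review before transfer:", ", ".join(gaps))
```

Running a check of this kind at extraction time would surface a gap before training starts rather than months later.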

The National Breast Screening System (NBSS) is provided by Hitachi Consulting and has been used by Public

Health England’s (PHE) national breast screening service since the 1980s (House of Commons, 2019a). The

system is maintained but the underlying programming is not now commonly used and requires specialist

knowledge to write scripts for data extraction. Kheiron were able to source this expertise early in the process in

October 2019. They also needed to work closely with GE Healthcare to integrate Mia™ with the GE imaging

interface. Having GE Healthcare as a partner in the project from the beginning made this critical enabler possible.



Faculty service optimisation (round length ML) tool

During the first half of the project, Faculty were planning to develop their service optimisation tool using de-

identified data extracted from NBSS and BS-Select9 which includes GP data. Again the age of the NBSS system

meant that Faculty had to re-scope their tool design to focus on static downloads of data as PHE advised them

that development work to enable integration with NBSS would not be a priority. Faculty mitigated this risk by

removing the need for a live feed from the NBSS system.

It quickly became clear that information governance would make GP data inaccessible to Faculty. When

exploring how they would extract and transfer the data they needed from the EMRAD sites (NUH and ULH),

Faculty and EMRAD recognised the need for a data protection agreement. After several weeks of discussion, it

became clear that the level of the liability cap was contentious, and this took some time to resolve. During this

time, the two parties (EMRAD and Faculty) had some frank discussions about the continued inclusion of Faculty

in the project, agreeing in September 2019 that Faculty should continue and reaching a compromise on liability

under the data processing agreement. By October 2019 it had become clear that the data requested could not

be shared due to complexity and sensitivity.

At this point, it was agreed that Faculty would refocus activity on the production of a synthetic data set (SDS)

that could then be used to develop a few different service optimisation tools for the breast screening

programme. To address the information governance challenges that proved insurmountable during the first half

of the project, Faculty applied to HRA for ethical approval of the development of the SDS with the NUH as

sponsor of the research. This provided Faculty with access to specialist advice from NUH’s Research and

Innovation team. The novelty of this type of research in the NHS meant that the process for applying for ethical

approval was unclear and required many discussions with stakeholders and decision-makers. For example,

initially it was thought that the research did not need to go to the Confidentiality Advisory Group (CAG), then it

was decided that it did and then subsequently confirmed that it did not. This decision alone took three months

to confirm, adding time to this process. HRA approval was finally granted in July 2020.

For the development of the SDS, Faculty agreed an approach where the round length management ML tool

would be first trained and refined on synthetic data10 on Platform™ and deployed into NUH’s IT environment

and retrained on real data there [Figure 21].

9 BS-Select is another IT system used by the NHS Breast Screening Programme to take information from primary care to invite women to breast screening appointments.

10 Synthetic data is produced by a learned generative model of original data, in this case extracted from NBSS, which retains the same statistical patterns as the original data.
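To make the idea of a learned generative model concrete, the sketch below fits a very simple model to toy records and samples new ones from it. It is not Faculty's method, which is proprietary and not described in this report. The records, field meanings and function names are invented for illustration, and a real synthetic data set would also need privacy safeguards that this sketch omits.

```python
import random
from collections import Counter, defaultdict

# Toy records (age band, attended appointment); invented for illustration.
ORIGINAL = [
    ("50-54", True), ("50-54", True), ("50-54", False), ("50-54", True),
    ("65-70", True), ("65-70", True),
]


def fit(records):
    """Learn P(age_band) and P(attended | age_band) as simple counts."""
    marginal = Counter(age for age, _ in records)
    conditional = defaultdict(Counter)
    for age, attended in records:
        conditional[age][attended] += 1
    return marginal, conditional


def sample(model, n, seed=0):
    """Draw n synthetic records that follow the learned frequencies."""
    marginal, conditional = model
    rng = random.Random(seed)
    ages = rng.choices(list(marginal), weights=list(marginal.values()), k=n)
    records = []
    for age in ages:
        options = conditional[age]
        attended = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        records.append((age, attended))
    return records


synthetic = sample(fit(ORIGINAL), 200, seed=1)
print(Counter(age for age, _ in synthetic))
```

The synthetic records share the age mix and attendance pattern of the toy originals without being a row-for-row copy, which is the property the footnote describes.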



Figure 21: Process for developing the SDS

The first four stages were completed by the end of September 2020 in the NUH environment. The Covid-19 social

distancing restrictions meant that a model was developed which enabled a Faculty data scientist to access

historic NBSS data remotely and interact securely using a soft token to gain remote access using a virtual desktop

infrastructure (VDI). This interaction was observed by an NUH IT team member to provide assurance that agreed

protocols were being adhered to. Faculty used JupyterLab open-source software and Faculty’s proprietary code

to generate the SDS. The SDS generation tool developed during this process is a supplementary output of this

work and will remain in place in the NUH environment to be used for generating other SDSs. This generation

tool is co-owned by NUH (on behalf of EMRAD) and Faculty.

The next stages (5-6) took place in the Faculty environment after SDS data transfer. The first attempt at data

transfer was deleted due to concerns about data completeness post-transfer. These two stages took 6-8

weeks and were delivered concurrently with stage 7 (building the deployment environment in NUH IT) which

enables the tool to be deployed rapidly by ensuring the readiness of the host IT infrastructure. The breast

screening service managers at NUH were involved in the development of the wireframes during this stage of

development to make sure that prioritised features met their needs and reflected their existing decision-making

processes. The tool has not yet been integrated with live NBSS and it is unlikely that this is realisable in the near

future. However, the tool is available to service managers as a service planning tool which enables managers to

build schedules and plan against scenarios using the NBSS-based synthetic data. It will be possible for service

managers to load static cuts of NBSS into the tool with minor additional development. Further

development is planned to continue in 2021.

Breast screening service staff perceptions of the technical feasibility of using AI tools in the workflow

When clinical staff were asked in the survey if they thought it would be technically feasible to embed an AI breast

screening management tool into current working practices, 49% of respondents in round 2 believed that it would.

This was down from round 1 (57%). This may be explained by the lack of perceived progress in introducing the

AI second reader into their workflow during the period of the project given it was a planned activity at the outset.

When asked the same question, non-clinical staff were more equivocal in both rounds with 67% of respondents in

round 2 neither agreeing nor disagreeing, down from 74% in round 1. The remaining minority of respondents



agreed it would be technically feasible. In the free text, some respondents voiced dissatisfaction with the

current technology offered by EMRAD as part of the wider programme of EMRAD’s work and this may bias their

perception of any initiative.

EMRAD programme leadership perceptions of the technology

This project is governed as part of the wider EMRAD governance structure and process. As such, the members

of the EMRAD governance groups are accountable to their trusts for the performance of this and other projects

undertaken by EMRAD. A sample of this group were interviewed at the beginning of the project and a wider

group were sent a survey in July 2020 to understand their perception of the progress that the project had made.

We received 18 responses from a mix of people who had direct involvement with the day to day running of the

AI project (n=6) and people who were indirectly involved, taking governance or advisory roles (n=12).

When asked how easy the AI tools would be to use within the existing technical infrastructure, the majority of

respondents said they did not know (n=11). Those who said it would be easy (n=5) were not directly involved

in the day to day running of the project.



What were the organisational issues that enabled or constrained the progress of the project?

The project was extended by Innovate UK beyond its original timeline on two occasions, once by six months in

response to funding mobilisation delays and the second time by a further three months in response to the

COVID-19 pandemic. It was also re-scoped on several occasions for both AI tools being developed and tested.

The findings related to organisational issues below are based on content analysis of project board minutes,

individual and group interviews, and other project documents as well as responses to the programme leadership

survey administered in July 2020. Three main categories of enablers or constraints were identified in the data:

(1) capacity, (2) readiness and (3) implementation. These are explored at three levels: (1) core EMRAD project

implementation team, (2) wider AI project implementation team including commercial partners and (3) EMRAD

trusts.

The project was led by NHS EMRAD Imaging Network, a non-statutory body in which all seven trusts have equal

interest. EMRAD is hosted by Nottingham University Hospitals NHS Trust (NUH) and NUH was chosen as one of

the two sites for the test bed. There is strong support for NUH taking a leading role on behalf of the network in

this and other projects. EMRAD had been the vehicle through which the network successfully delivered a new

model of care as part of the Acute Care Collaboratives (ACC) programme sponsored by NHS England from 2016-

2018. The resources, structures and processes built around the EMRAD network created capacity at project level

with a core project team allocated to the test bed project from the outset, including in the test bed application

process. However, the delay in distributing funding from Innovate UK during the first quarter of the project

meant that EMRAD and the host organisation NUH were unable to recruit the additional project management

staff they needed for the project. By the time the funding was in place and the roles were approved, the length

of appointment (a maximum of 12 months at that point) made the roles unattractive and only one post was

filled. In March 2020, to meet the need for additional frontline resource in response to the Covid-19

pandemic, some clinical staff seconded to the team were redeployed to support the pandemic response in their

employing trusts.

The wider team includes the commercial partners in the project, GE Healthcare, Kheiron Medical Technologies

and Faculty. These companies were able to flex their own internal capacity to meet the needs of the project at

different times. Kheiron raised a Series A funding round in 2019 and Faculty were the successful supplier to

support NHSX in building the NHS AI Lab in 2020. However, the national lockdown imposed in March 2020 in response

to the rise in community spread of Covid-19 led to at least one commercial partner putting staff on furlough in

April 2020.

The breast screening units in each of the test sites, NUH and ULH, had different levels of capacity, as well as

activity. ULH’s team is smaller, with high levels of variance in vacancy rates across all staff groups from year

to year [Figure 22].



Figure 22: Whole time equivalent funded staff dedicated to the breast screening service 2017-2019

The number of women invited for screening has increased at ULH over the last 3 years and is now similar to the

levels at NUH although uptake rates remain lower at ULH [Figure 23].

Figure 23: Women invited and attending screening appointments at the two test sites (NUH and ULH)

Each of the test sites had a nominated principal investigator (PI) who was also a senior clinical practitioner on

each site and had a significant ‘day job’. Despite this, they provided support and commitment to the project

from beginning to end. The capacity of the host trust’s information governance (IG) team to provide the levels

of IG support that were required by both teams was severely stretched by the significant pressure placed on them.

To address this, the EMRAD core team negotiated with other information governance resources from the wider

EMRAD network to agree the content of a Memorandum of Understanding so that an IG lead from one EMRAD

Trust could make recommendations for any other EMRAD Trust whilst the ultimate decision remained with

the Trust's Caldicott Guardian.



The results of the clinical and non-clinical staff surveys indicated some variation in the readiness of staff to use

AI tools as part of their workflow. Clinical staff were open to the use of AI as a second image reader although

they wanted to see evidence from research trials that it is safe and effective. It is not clear how or if the use of

Mia™ will change their experience of the clinical workflow. Non-clinical staff had been engaged early in the

project as part of the discovery process but had heard little until the last two months of the project.

Exploitation was a key activity of the test bed although again the more iterative and novel nature of this project

meant that this required a different approach to other test beds. In the case of Mia™, EMRAD was expecting to

be able to procure a trained and validated product for use in clinical workflows by the end of the project.

However, a few issues constrained this. A business case was drafted, but without an agreed price for Mia™, could

not be developed in detail. Whilst Class IIa CE marked11, under the new evidence standards for digital health

technology (NICE, 2019), Mia™ is classed as a Tier 3b12 which requires the highest level of evidence. These

evidence standards are not mandatory but are being used by regulators and purchasers across the NHS in

England when assessing the value of new technology. The standards were developed during this project. The

project partners contributed to their design and to the testing of the first questionnaire to assess technology evidence.

There is still no clear process for applying the evidence standards in practice.

Even if Mia™ was fully tested in clinical workflows, a price set and it was approved by regulators, there was still

no procurement mechanism available for commercial exploitation. Kheiron applied to the Health System

Support Framework (HSSF) Lot 0 in early 2020 but failed to secure a place. Currently, Kheiron is not on any

procurement frameworks for the NHS. In the case of Faculty, there is currently no service optimisation tool to

procure. When this changes, Faculty is on the UK government’s G-Cloud procurement framework and has already

successfully become a supplier to NHSX supporting the development of the NHSX AI Lab.

The project implementation was led by the EMRAD core team. This team and its previous experience working closely

with the imaging system vendor, GE Healthcare, ensured that GE were involved in the project from the very

beginning. The supportive involvement of GE Healthcare minimised what could have been a very significant

obstacle to implementation.

The programme leadership surveyed in July 2020 were asked about their perception of the progress made by

the project up to that point against expectations [Figure 24]. The group was split between those who thought it

had made the expected progress and those who thought it had made less than expected. Only one respondent

thought it had made more progress than expected and that respondent came to the project in early 2020.

11 Class IIa medical devices are low to medium risk devices.

12 Digital health technologies (DHTs) with measurable user benefits, including tools used for treatment and diagnosis, as well as those influencing clinical management through active monitoring or calculation. It is possible DHTs in this tier will qualify as medical devices.



Figure 24: The programme leadership's perception of project progress (July 2020)

When asked to explain why they gave this assessment, those that felt the project had made the progress they

expected cited the following factors as influencing the progress: the novelty of the technologies, the complexity

of the project and the trust between the partners in the project. The mitigating actions that they

observed included changing project scope, learning iteratively by building formative evidence and leveraging the

EMRAD core team and its considerable change management experience. For those who thought the project had

made less progress than they hoped, they pointed to the absence of real-world application of the tools, a key

deliverable of the project in their view. The problems extracting data were identified as a key constraint.

Quotes from the programme leadership survey

“From an EMRAD point of view, this is very much new territory for us and is the first time that we have had to

apply for HRA approvals, complete Data Access and Processing Agreements, Data Protection Impact

Assessments, etc.”

“Whilst it would be nice to say that Mia (and Faculty's tool) were in use in clinical settings, this is not the case

right now. I think this is not a reflection of slow progress, but rather of the complexity of the task, and the

innovative nature of the technologies, but also getting these technologies ready to be applied in a clinical

setting.”

“[This] has been a steep learning curve for many of us.”

“I am disappointed that the tool is not yet in a live or semi-live setting.”

“At one point in the project, I thought that Faculty would have to drop out of the project as they were unable to

access the necessary data.”

“We have learned that it is far more complex than we first envisaged, with a number of regulatory and national

stakeholders needing to endorse the changes before we can deploy.”

“The project has been delayed for numerous reasons, some beyond the projects' control.”



Focusing on project attributes, a majority of the programme leadership who responded agreed that the vision

and objectives of the project were clear (89%), decision-making and reporting were clear, timely and agile (89%)

and the project was well resourced for implementation (72%). They also expressed the view that the

relationships between the project partners were open and based on trust (72%). There was less agreement when

programme leaders were asked their view on the involvement of clinical and non-clinical teams from the breast

screening service in the project, with only 45% agreeing that this was the case. This reflects the views expressed

by a few of the staff in the round 2 surveys who had also expected to see more progress in the project by July

2020 and questioned why they had not been more involved.



What wider contextual issues affected the progress of the project?

The project was undertaken during a period of accelerated change for technology in the NHS and increasing

interest in the use of AI in healthcare. The wider contextual factors which impacted the progress of the project

were policy, regulation, social attitudes and Covid-19. Some of these impacts enabled progress and others

constrained it. The findings related to contextual issues below are based on content analysis of relevant UK

policy documents, regulation white papers, AI project board minutes, individual and group interviews, the

programme leadership survey administered in July 2020, the public survey administered in December 2019 -

February 2020 and the focus groups.

The policy context for technology including AI in healthcare has been supportive over the last six years (Asthana,

2019) with the NHS Five Year Forward View (Maruthappu, 2014) and NHS Long Term Plan (Alderwick, 2019) both

dedicating significant space to the role of technology in enabling greater care integration and coordination and

more focus on prevention for better population health.

A number of initiatives have been set up to promote the testing and adoption of technology in healthcare

including the NHS Test Beds Programme, of which this project is part, the Digital Catalyst programme and the

Global Digital Exemplars programme all of which have received significant investment. The Topol Review was

conducted in 2018-19 and looked at the effect of increasingly digital modes of healthcare provision on the

workforce and made a series of recommendations. NHSX was set up in July 2019 to bring together expertise in

digital, data and technology that had been fragmented across different bodies into one organisation. NHSX

included an AI team which published a number of policy papers during 2019 and 2020 including the “Code of

Conduct for data driven health and care technology” and “Artificial Intelligence: How to get it right” and set up

the AI Lab in late 2019. EMRAD and its commercial partners were involved in discussions with NHSX and other

third-party stakeholders (such as Public Health England) about the challenges it was facing around data sharing and

collaborating with providers of systems outside the immediate programme. These discussions led to the

unlocking of some constraints for the project, as well as providing insight to NHSX informing the development

of some of their policies.

One of the areas of focus for NHSX’s AI Lab is the regulation of AI within the health system. The key health and

care regulators for England are all involved in this work and include the National Institute for Health and Care Excellence

(NICE), Care Quality Commission (CQC), Medicines and Healthcare Products Regulatory Agency (MHRA) and the

Health Research Authority (HRA). The priorities for this work on regulation of AI in the NHS in England are (1)

streamlining processes for regulatory approval of AI tools, (2) regulating synthetic data sets, and (3) post-market

surveillance. One of the partners in this project, Faculty, are supporting NHSX in the set up and initial running of

the AI Lab. Whilst the teams supporting the two projects are separate, one of the project managers from the

EMRAD project was moved to the AI Lab when this project was set up in April 2020 and would have taken the

learnings from the EMRAD project with them. During the project, NHSX introduced the “Evidence Standards for

Digital Health Technologies” in March 2019 and the EMRAD project team were asked to be part of the testing of

the first assessment questionnaire tool. This enabled the project to reflect on some of the requirements and

burdens of likely future regulation and provide feedback to NHSX on the tool. This feedback was incorporated

into successive iterations of the assessment toolkit.

During the period of this project the General Data Protection Regulation (GDPR) came into force across the EU

in May 2018. This placed new requirements on information governance, particularly in the responsibilities of

data controllers and processors.



In May 2020, the ICO, together with the Alan Turing Institute, published guidance on explaining decisions made

by AI (ICO and Alan Turing Institute, 2020) and later the ICO updated their guidance on AI and data protection

(ICO, 2020). While neither of these had a direct impact on the project, both are likely to have a significant

impact on any future implementation of the tested AI tools. This is underlined by the importance that women

responding to our survey placed on how AI technology would be governed and regulated once in place (n=163).

For the AI project, each new set of guidelines on AI and data protection required the project

team to spend time reading the guidance, understanding its impact on the project, communicating

this to stakeholders, and recommending and implementing changes to project delivery. All of these

provided valuable learning, which was recorded in the project lessons log, but also contributed to the delays

experienced by the project.

In August 2020, the Secretary of State for Health and Social Care announced that PHE would be abolished in

March 2021 and replaced by the National Institute for Health Protection. It is not yet clear where responsibility

for the national screening programmes including breast screening will sit, although the NSC will continue to

advise on any changes to the screening programmes. This announcement had little impact on the project

although it may impact how the recommendations and outcomes of the project are followed through if

responsibility for the national programme remains unclear.

The project was delivered against a backdrop of increasing awareness amongst the public about the use of AI in

daily life, the role of algorithms in decision-making and the benefits and disbenefits of technology in general. It

also surfaced some of the differences in the working practices and mindsets of people working in the technology

sector and people, including clinicians, working in the NHS. These social factors were noted throughout the

duration of the project. Whilst understanding and ‘explainability’ of AI remains relatively low across the board,

2020 has seen a number of public scandals involving algorithmic decision-making including the abandonment of

algorithmically determined exam results in the UK in August 2020. In the same month facial recognition AI used

by South Wales Police was ruled unlawful, and local authorities stopped the use of algorithmic decision tools

for benefit claims. This will affect public trust in AI and algorithm-based tools. A small but not insignificant group

(n=242) of women we surveyed expressed the view that AI should not be used within the breast screening

programme and it will be interesting to see if this group increases or decreases in time.

Socio-cultural differences between commercial technology firms and public sector providers were evident from

the outset. The first output from Faculty at the end of their ‘discovery’ phase in December 2018, presented ten

ideas for AI service optimisation tools. The AI project board asked for these to be evaluated for potential

outcomes, financial savings and commercial market opportunities. This ‘business case’ type approach contrasts

with the agile approach that was used by Faculty. Two board meetings over two months were needed to resolve

this difference in expectations and gain understanding of each other’s different working practices.

Quote from the Programme Leadership survey

“One of the biggest challenges for me personally has been bridging the NHS-industry communications gap. The

complexity of the technology, the scale, the novel nature of the task at hand, and the continuously steep learning

curve that comes for companies working on the development and testing of these technologies, often makes it

difficult to communicate externally at the right level of simplicity/ transparency/ cadence, whilst also not

introducing too much uncertainty or confusion such that trust is damaged.” (AI project programme leadership

team member #17)

There have also been other differences between AI project stakeholders on outcome expectations. A continuum

emerged between those who wanted to take a measured and cautious approach to gathering clinical evidence



as part of a process of clinical trials (for Mia™) and those who wanted to move quickly to real world deployment

based on existing results and monitor effectiveness in this real-world context. The nature of the tools being

tested as part of this project as ‘in development’ rather than ‘market-ready’ underscored these differences and

the lack of clarity around regulatory requirements further complicated matters. The NHS Test Bed Programme

itself was set up to promote the deployment and scale of proven technologies that are market ready. This meant

that there was a mismatch at times between the expectations of the EMRAD management board and the

clinicians about what is deliverable within the window of the project.

Quote from the Programme Leadership survey

“I am disappointed that the tool is not yet in a live or semi-live setting. The insistence on randomised trials …… is

short-sighted given that the technology is already proven to work. There are other ways to implement safely” (AI

project programme leadership team member #5)

The Covid-19 pandemic had several effects on the project some of which were positive. During the pandemic,

the national breast screening programme was suspended and was only recommenced in July 2020 and for some

breast screening units even later as estate and workforce limitations made it more difficult to make the resumed

service “Covid-secure”. Many staff in the two breast screening units that took part in this project were

redeployed to support other services in their Trusts during spring 2020. Early in the national response, the

project team produced a mitigation document which was shared with all stakeholders including Innovate UK and

NHS England which set out the risks and how they would be managed and monitored through project

governance.

The pandemic provided an opportunity for the accelerated deployment of technology for patient consultations,

diagnostics, treatment monitoring and communications across the NHS (Schwamm LH, 2020). In round 2 of the

staff survey (July 2020), we asked clinical and non-clinical staff to reflect on the effect this had on their daily and

working life. Most staff saw substantial change in the use of technology over the four months prior to the

survey (March – June 2020) with the biggest change felt at ULH. Only 12% of respondents experienced no change

at all [Figure 25].



Figure 25: Have you noticed a change in your use of technology over the last 4 months or since the beginning of the Covid-19 pandemic both personally and professionally?

When we compared clinical and non-clinical respondents, it was clear that clinical staff had been more

affected by the rapid technological change that took place over the four months, although the opposite was the

case for staff groups at one site (Sherwood Forest Hospitals Foundation Trust) [Figure 26]. It will be interesting

to see if this is sustained over time.

Figure 26: Comparing experience of change - clinical versus non-clinical



What is the potential impact of the screening imaging innovation programme on the performance of screening services?

Kheiron Medical Technologies added a third site to the retrospective study in September 2019 (not an EMRAD

Imaging Network member and in a different English region) which uses different mammography equipment.

This site was not included in the NHS Test Bed project. They used the data gathered from the retrospective

studies in all three sites to answer the evaluation questions set out at the beginning of this project.

Anonymised mammograms were extracted from the three sites, and a randomised sample was withheld from

algorithm development and training and retained for validation. The date range for the mammograms was January

2012 to January 2019. The AI algorithm's opinion (normal or recall) was paired with the opinion of the 1st human reader

to simulate double reading; the human readings were obtained from NBSS. Sensitivity, specificity and discordant

opinion rate were calculated.
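The three measures named above can be computed directly from paired opinions. The sketch below is illustrative only: it assumes each case carries a recall or normal opinion per reader plus a biopsy-proven outcome, and the toy values and function names are invented rather than taken from the study data.

```python
def sensitivity_specificity(opinions, truths):
    """opinions: True = recall; truths: True = biopsy-proven cancer."""
    pairs = list(zip(opinions, truths))
    tp = sum(1 for o, t in pairs if o and t)
    fn = sum(1 for o, t in pairs if not o and t)
    tn = sum(1 for o, t in pairs if not o and not t)
    fp = sum(1 for o, t in pairs if o and not t)
    return tp / (tp + fn), tn / (tn + fp)


def discordant_rate(reader_a, reader_b):
    """Share of cases where the two readers disagree on recall."""
    disagreements = sum(1 for a, b in zip(reader_a, reader_b) if a != b)
    return disagreements / len(reader_a)


# Toy data: eight cases, two of which are cancers.
truths = [True, True, False, False, False, False, False, False]
reader_1 = [True, False, False, False, False, False, True, False]
ai = [True, True, False, True, False, False, False, False]

print(sensitivity_specificity(reader_1, truths))  # (0.5, 0.8333...)
print(sensitivity_specificity(ai, truths))        # (1.0, 0.8333...)
print(discordant_rate(reader_1, ai))              # 0.375
```

In a double reading workflow, the discordant cases are the ones that would go on to a further human read.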

a) Is the Mia™ software suitable for use in a large-scale screening programme like NHSBSP by comparing

rates of specificity/sensitivity/recall rate on a large cohort of historic screening cases?

The following results were submitted in an abstract to the annual Radiological Society of North America (RSNA)

conference in December 2020 (Sharma, 2020 [in press]). 40,588 mammograms were reviewed. Overall, for the

two human reader process, 40,230 had a normal outcome and 358 were biopsy proven cancers. The rate of

discordant opinion for two human readers was 1,216/40,588 (3%). The overall recall rate was 4%, with a cancer

detection rate of 8.5 per 1000 [Figure 27].

When the AI algorithm was applied as the second reader to this test set, there was consensus in 33,255 (81.9%) of the reads to either recall or not recall the cases. This meant that 7,333 (18.1%) of reads were discordant between the first human reader and the AI algorithm. The AI algorithm had a sensitivity of 85.5% and a specificity of 87.2%, compared with a sensitivity of 89.4% and a specificity of 96% for the first human reader. Combining the AI algorithm with reader 1 gave a sensitivity of 95.0%, a specificity of 96.9%, a cancer detection rate of 8.4 per 1000 and a recall rate of 4%. The results are summarised in Figure 27.
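The concordance and discordance percentages quoted above follow directly from the reported counts. The short Python sketch below reproduces that arithmetic; the variable and function names are illustrative only and are not taken from the study.

```python
# Counts as reported in the retrospective study abstract (Sharma, 2020).
total_reads = 40_588
concordant_r1_ai = 33_255   # reader 1 and the AI agreed (recall or not recall)
discordant_r1_ai = 7_333    # reader 1 and the AI disagreed
discordant_r1_r2 = 1_216    # two human readers disagreed


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The concordant and discordant reads should account for every case.
assert concordant_r1_ai + discordant_r1_ai == total_reads

print("R1 + AI concordant (%):", pct(concordant_r1_ai, total_reads))
print("R1 + AI discordant (%):", pct(discordant_r1_ai, total_reads))
print("R1 + R2 discordant (%):", pct(discordant_r1_r2, total_reads))
```

The discordance rate matters operationally because, in a double-reading workflow, each discordant case needs a further human read.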

Figure 27: Summary results - Mia™ retrospective study

| Metric | Mia™ standalone | R1 standalone | R1 + R2 screening outcome | R1 + Mia™ screening outcome |
| --- | --- | --- | --- | --- |
| Sensitivity (%) | 85.5% | 89.4% | 96.9% | 95.0% |
| Specificity (%) | 87.2% | 96.0% | 96.7% | 96.9% |
| Cancer detection rate (per 1000) | 7.5 | 7.9 | 8.5 | 8.4 |
| Recall rate (%) | 17.5% | 4.9% | 4.0% | 4.0% |
| 2nd human reader required (%) | | | 100.0% | 18.1% |
| 3rd human reader required (%) | | | 3.0% | 0.0% |

R1 = first human reader; R2 = second human reader.

In late September 2020, an additional, entirely withheld, 130,269 cases from EMRAD (together with 150k+ cases from other screening units in Leeds and Hungary) were processed by Mia™ in a clinical trial. The results are currently being validated by an external CRO. Preliminary analysis from Kheiron indicates:

• AI performance has transferred well from internal data [Figure 27] to the trial, with the number of cases increasing to more than a quarter of a million (250k) and cases coming from two countries.

• Standalone, the AI has shown superiority in sensitivity and non-inferiority in specificity compared to single readers.

• With the AI as an independent reader in a double reader workflow, statistical analysis is indicating equivalent clinical performance (non-inferior in sensitivity and superior in specificity) and significant operational savings.

An updated CE mark and academic publications based on the trial results are expected in early 2021.

b) Does the software achieve similar results for different manufacturers' equipment used across the EMRAD consortium?

In a second paper submitted to the RSNA conference 2020 (James, 2020 [in press]), Kheiron looked at the results for three different manufacturers' equipment. A random selection of 40,642 of the 245,277 extracted cases was held back as a test set for this study and had not been used to train the algorithm. Human reader results, outcome and pathology data on each case were extracted from NBSS. Cancer cases were all biopsy-proven. Cases were considered normal for the analysis only if they had a 3-year negative follow-up mammogram. Differences in the performance of the AI algorithm between vendors were assessed using receiver operating characteristic (ROC) analysis. Two one-sided t-tests (the TOST method) were used to assess equivalence in sensitivity and specificity between vendors.

There were 12,462 mammograms, consisting of 402 biopsy-proven cancers and 12,060 normal cases with a 3-year negative follow-up. There were 6,378 mammograms from vendor 1, 3,988 from vendor 2 and 2,096 from vendor 3. Overall, the AI algorithm had a sensitivity of 86.7% and a specificity of 87.1%. By vendor, sensitivity and specificity were 87.3% and 88.0% for vendor 1, 85.7% and 85.8% for vendor 2, and 86.9% and 87.1% for vendor 3 (p<0.00037 for non-equivalence at a 2.5% margin for all tests, thus equivalence was shown).
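To illustrate the logic of an equivalence test, the sketch below applies two one-sided tests to the difference between two proportions with a 2.5 percentage point margin. It is not the study's analysis: it swaps in a normal (z) approximation for the t-based procedure named above, the function names are illustrative, and the counts passed to it are hypothetical because per-vendor counts are not given in this report.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def tost_two_proportions(x1: int, n1: int, x2: int, n2: int, margin: float):
    """Two one-sided z-tests for equivalence of two proportions.

    Null hypothesis: the true difference lies outside (-margin, +margin).
    Returns (observed difference, TOST p-value). A small p-value supports
    equivalence within the margin. Uses the unpooled (Wald) standard error,
    so both proportions must be strictly between 0 and 1.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    p_lower = 1.0 - normal_cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = normal_cdf((diff - margin) / se)        # H0: diff >= +margin
    return diff, max(p_lower, p_upper)


# Hypothetical counts for two vendors (NOT taken from the study):
# correctly classified normal cases out of all normal cases.
diff, p_value = tost_two_proportions(5_300, 6_000, 3_300, 3_800, margin=0.025)
print(f"difference = {diff:.4f}, TOST p-value = {p_value:.4f}")
```

Equivalence is concluded only when both one-sided tests reject, which is why the larger of the two p-values is reported.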

c) How do these results inform the process of designing and developing a prospective pilot with the

Mia™ software?

Prospective pilot design was not achieved within the timeframe of the Test Bed. In September 2020, Kheiron was awarded an NHSX and AAC Phase 4 AI Award to deploy Mia™ across 15 sites in the UK over 3 years. At the time of writing, NICE and the Technology Specific Evaluation Team (TSET) assigned to Kheiron are undertaking an evidence gap analysis as part of the AI Award to advise on the design and development of prospective pilots of Mia™, in line with evidence standards for digital health technologies and the UK National Screening Committee criteria for screening.



What is the potential impact of the screening optimisation innovation programme?

It has not been possible to evaluate the impact of the round length ML tool as it has not been deployed in a live environment as part of the screening service workflow. The planned evaluation questions remain important and should be addressed in 2021 once the tool is deployed.

The planned evaluation questions (December 2018)

What is the potential impact of the screening optimisation tool:

1. On breast screening programme manager and administrative staff time?

2. On breast screening workforce utilisation?

3. On optimised breast screening pathway and clinical risk management?

4. On the rate of breast screening ‘did not attends’ (DNAs)?

It is important to note that, while the originally planned benefits of this part of the project have not yet been delivered, the likely value to EMRAD, Faculty and the wider NHS of developing a synthetic data set using NBSS data will be significant for developing operational tools and for research purposes. Interest in developing synthetic data sets in healthcare is growing, with NHS Digital providing a synthetic data advisory service [13] and NHS England and Improvement sharing synthetic A&E data sets to support the development of new tools that will help with A&E demand and capacity management [14].

The development of the deployment environment is another output with likely tangible benefits for EMRAD and Faculty as a Trusted Research Environment (TRE). TREs were developed by NHS Digital and Health Data Research UK in response to the need for rapid research using real data in the context of Covid-19 [15]. The actual benefits accrued will need to be monitored in the longer term. The model for development used for NUH could be scaled to other Trusts in England. A possible additional question for future evaluation is:

What will the synthetic data set (SDS) allow NHS breast screening units to do that could not otherwise be done?

13 https://digital.nhs.uk/services/e-referral-service/document-library/synthetic-data-in-live-environments

14 https://data.england.nhs.uk/dataset/a-e-synthetic-data

15 https://digital.nhs.uk/coronavirus/coronavirus-data-services-updates/trusted-research-environment-service-for-england



Was the programme worth the investment, that is, did it deliver value for money and if not in the timeframe of the evaluation, when is it likely to deliver a return on investment?

A novel multi-stakeholder project of this nature will generate different benefits for each of the different

participants. Thus, when seeking to answer the question of ‘value for money’ and whether the benefits of the

sum of efforts exceed the direct costs of these efforts, a contextual lens is applied to the likely effort-outcome

for each participant.

A budget impact analysis (BIA) was undertaken to answer this question. BIA complements other types of

economic evaluations such as cost-effectiveness analysis (NICE, 2019), by providing decision makers with

additional information on the financial consequences of commissioning and procuring new technologies. More

information on the methods used in the BIA is provided in Appendix 3. The scope of the BIA covered Mia™

deployment for (i) the two test sites, (ii) all EMRAD sites and (iii) England. An analysis was not conducted for the

service optimisation tool as this was not tested or deployed within the data collection timeframe.

The BIA compared the usual care model (two human readers) with the proposed model (one human reader and

one AI reader) [Figure 28].

Figure 28: Usual care pathway versus new AI second reader pathway

Costs

In determining the costs of the project implementation, we have worked off the return figures submitted by

each party to Innovate UK [Table 13].



Table 13: Project costs

| Partner costs | Nottingham University Hospitals NHS Trust | Faculty AI Ltd | GE Healthcare Ltd | Kheiron Medical Technologies Ltd | Matrix Insight Ltd [16] | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Labour | £671,387 | £280,223 | £23,319 | £244,839 | £40,000 | £1,259,768 |
| Overheads | £0 | £56,045 | £4,664 | £48,968 | £0 | £109,677 |
| Materials | £0 | £20,000 | £76,000 | £22,800 | £0 | £118,800 |
| Capital usage | £0 | £0 | £0 | £56,000 | £0 | £56,000 |
| Subcontract | £63,528 [17] | £0 | £0 | £29,820 | £0 | £93,348 |
| Travel and subsistence | £17,500 | £3,000 | £1,500 | £15,000 | £3,040 | £40,040 |
| Other costs | £156,164 | £0 | £0 | £7,500 | £0 | £163,664 |
| Total | £908,579 | £359,268 | £105,483 | £424,927 | £43,040 | £1,841,297 |
| Grant | £908,579 | £251,487 | £0 | £297,449 | £30,128 | £1,487,643 |
| Own contribution | £0 | £107,781 | £105,483 | £127,478 | £12,912 | £353,654 |

As can be seen in Figure 29, people (labour) costs accounted for the largest (72%) component of this project.

With a dedicated project team, EMRAD (hosted by Nottingham University Hospitals NHS Trust) accounted for

52% of the labour cost with Kheiron and Faculty accounting for 18% and 21% of labour costs, respectively.

Kheiron was the only entity to have allocated capital usage costs (approximating 3% of the total project cost).

Public sector grants funded 80% of project costs, with 20% contributed by the four private sector entities.

EMRAD costs were fully funded by the grant.
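The totals and the grant share can be checked directly from the figures in Table 13. The sketch below is a simple consistency check; the data structure and variable names are illustrative, and the short partner names are shorthand for the organisations listed in the table.

```python
# Figures transcribed from Table 13 (GBP).
# Column order: NUH, Faculty, GE Healthcare, Kheiron, Matrix Insight.
PARTNERS = ["NUH", "Faculty", "GE Healthcare", "Kheiron", "Matrix Insight"]
COSTS = {
    "Labour": [671_387, 280_223, 23_319, 244_839, 40_000],
    "Overheads": [0, 56_045, 4_664, 48_968, 0],
    "Materials": [0, 20_000, 76_000, 22_800, 0],
    "Capital usage": [0, 0, 0, 56_000, 0],
    "Subcontract": [63_528, 0, 0, 29_820, 0],
    "Travel and subsistence": [17_500, 3_000, 1_500, 15_000, 3_040],
    "Other costs": [156_164, 0, 0, 7_500, 0],
}
GRANT = [908_579, 251_487, 0, 297_449, 30_128]

# Sum each partner's column across all cost categories.
partner_totals = [sum(column) for column in zip(*COSTS.values())]
project_total = sum(partner_totals)

# Own contribution is whatever the grant did not cover.
own_contribution = [t - g for t, g in zip(partner_totals, GRANT)]
grant_share = 100 * sum(GRANT) / project_total

for name, total, own in zip(PARTNERS, partner_totals, own_contribution):
    print(f"{name}: total £{total:,}, own contribution £{own:,}")
print(f"Project total: £{project_total:,}")
print(f"Grant-funded share: {grant_share:.1f}%")
```

The check confirms that the row and column totals in the table are internally consistent and that grants covered roughly four fifths of the project cost.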

Figure 29: Cost distribution for the AI in breast screening project

16 Matrix Insight Ltd went into administration on 21 August 2019 at which point NUH took responsibility for subcontracting an alternative evaluation provider, TaoHealth Research & Implementation.

17 Includes the cost of subcontracting the evaluation partner.



In analysing the information received, we focused on which assumptions and predicted effects could be supported by evidence, and on the material impact of deploying the Mia™ tool within the breast screening programme. Since it would be difficult to validate downstream benefits before peer-reviewed clinical evidence from retrospective studies and prospective trials is established, we focused on the immediate and direct resource implications revealed during the initial retrospective study phase covering three sites (including the two EMRAD sites).

The most material and tangible benefit identified, and the one with sufficient supporting evidence, was second reader time. Having already analysed relative cost breakdowns in both general ledgers, we were able to narrow the focus to the most verifiable impact of AI in second-reader screening, that is, the cost of reader time.

We used ledger information from NUH and ULH on actual staff cost allocations for the last three years, together with discussions with screening unit managers, to establish the proportion of time spent on screening activity where the team also delivers non-screening activity [Figure 30]. We estimated the labour cost of the usual care model looking at clinical staff available for screen reading only. We were unable to cost the proposed care model without the price information from Kheiron. Future economic evaluation that takes a broader system-wide perspective would need to use up-to-date unit costs for health and social care in England [18].

There were some key differences in work practices between NUH and UHL. At NUH, only radiology consultants read mammograms; radiographers are not used for reading or reporting. At UHL, reporting radiographers [19] at bands 6-8 are used alongside radiologist reporting.

Figure 30: Labour cost of usual care model for breast screening service (clinical staff engaged in reading breast images only)

18 https://www.pssru.ac.uk/project-pages/unit-costs/

19 https://www.sor.org/practice/reporting/radiographer-reporting



The locum consultant radiologist expenditure at UHL is much higher, reflecting the significant workforce shortage faced by the trust in this area.

Since there are currently no agreed prices for Mia™, nor any published market pricing, it was not possible to compute the cost of deployment in a real-world setting or to calculate specific budget impacts. Illustrative break-even / recoupment figures were modelled and shared with the EMRAD team but, due to the commercially sensitive nature of the information, are not included in this report.

Outcomes

The benefits claimed by Kheiron Medical Technologies in its business case were noted, including reduced waiting time for results, increased uptake from women screened and reduced costs of assessment clinics. Without evidence for these from current retrospective studies or prospective trials, we focused on the specific immediate benefit of reducing second reader time.

Estimates of capacity released were built by factoring in hourly reading rates and arbitration rates under the two scenarios (Reader 1 + Reader 2 + Arbitration Reader 3; Reader 1 + AI + Arbitration Reader 3).

A nominal growth rate in population was applied using ONS data and based on historic trends in activity and

productivity. The effect of introducing an AI second reader on human reading hours required was modelled

using the arbitration levels which were identified in the retrospective study (18.2%). This gave an estimate of

human screening hours required for each care model, usual versus AI [Table 14].

Table 14: Human reading hours required by each model of care

The charts below [Figure 31] show comparative screen reading hours under the usual care and proposed new

model of care scenarios, with the chart on the right reflecting lower resource utilisation under a human reader

and AI second reader scenario. Below [Figure 32] the same data is presented for each of the two sites separately.

Figure 31: Human reading hours required by each model of care at NUH and UHL



Figure 32: Predicted human reading hour change by site

While the AI reduced second reader hours, it also increased arbitration rates (i.e. third reader hours), resulting in an approximate capacity release of 42% of human reader time. As trials progress and the technology and processes benefit from learning and refinement, it is hoped that arbitration rates will decrease and effective capacity release will increase. The pressure on existing resource (i.e. locum and consultant radiologists and reporting radiographers) would in turn decrease, allowing either more screening with the same level of staff or lower staffing levels to meet existing needs. These variables will need to be considered as part of future health economic analysis during prospective trials and real-world implementation.
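One simplified way to see how a capacity release of about 42% can arise is to count human reads per case under each pathway. The sketch below assumes that every human read takes the same time, that usual-care arbitration occurs at the 3% two-reader discordance rate from the retrospective study, and that the AI pathway needs human arbitration in 18.2% of cases. The function name and this equal-time simplification are assumptions made for illustration; the model described in Appendix 3 also uses hourly reading rates and may differ in detail.

```python
def human_reads_per_case(second_human_read_rate: float,
                         arbitration_rate: float) -> float:
    """Average number of human reads per screening case.

    Every case gets one first human read; the other two terms are the
    fractions of cases that also need a second human read and an
    arbitration (third reader) read.
    """
    return 1.0 + second_human_read_rate + arbitration_rate


# Usual care: Reader 1 + Reader 2 on every case, arbitration on 3% of cases.
usual_care = human_reads_per_case(second_human_read_rate=1.0,
                                  arbitration_rate=0.03)

# Proposed model: Reader 1 + AI, with a human arbitrating 18.2% of cases.
ai_second_reader = human_reads_per_case(second_human_read_rate=0.0,
                                        arbitration_rate=0.182)

capacity_release = 1.0 - ai_second_reader / usual_care
print(f"Human reads per case, usual care: {usual_care:.3f}")
print(f"Human reads per case, AI second reader: {ai_second_reader:.3f}")
print(f"Capacity release: {100 * capacity_release:.1f}%")
```

Under these assumptions the saving from removing the second human read is partly offset by the higher arbitration rate, which is why the release is well below 50%.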

Using existing staff cost templates, and comparing these to NHS reference costs, the effective budget

implications were calculated based on the predicted changes to the requirement for human readers. A number

of different cost scenarios based on a range of different staff grades were calculated. The most common cost

scenarios, consultant only image reading and band 8 reporting radiographer image reading, are presented below

showing the potential annual cost savings using NUH and ULH activity data projected over the next five years

[Table 15].

Table 15: Annual direct cost savings based on two scenarios (2021-2025)

With regard to returns on implementation costs, that is, the grants awarded for the initial test bed phase, the qualitative benefits accrued to EMRAD included creating information governance blueprints and establishing baseline reader time savings using an AI second reader, with potential mitigation of the workforce shortage. Financial benefits accrued to Kheiron Medical in the form of grant funding of £15M through the NHS AI Award Phase 4 to proceed with further retrospective study of the algorithm at 15 sites across England. While these studies will not be with an EMRAD trust, it would not be unreasonable to attribute the award of this £15M grant directly to the work done as part of the EMRAD-led test bed. Hence, the initial grant awarded to Kheiron (£297,449) approximates 2% of the next phase of funding recently awarded. Even adding in Kheiron's own contribution of £127,478 [Table 13], the financial return on effort remains substantive.



What would the impact of the screening imaging innovation programme be if implemented at scale across EMRAD?

The only information that can be extrapolated in the absence of cost data for the proposed new model of care (Mia™) is the potential effect on resourcing of adopting an AI second reader at scale across all seven EMRAD trusts. This was further extrapolated to England using the same methodology. Activity data for the last three years was obtained directly from EMRAD trusts and cross-referenced with KC62 returns to NHS Digital [20]. A 3% growth rate was used for the screening population, as agreed with EMRAD, based on likely changes to the population invited to screening by the NHSBSP over the next five years. The KC62 returns were used to establish trends across England, and the same growth rate was applied to screening invitations. The graphs below indicate the potential effect of using AI as a second reader [Figure 33].
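The effect of a 3% growth rate over five years can be sketched as follows. The baseline figure is hypothetical, and compounding annually is an assumption, since the report does not state how the growth rate was applied; the function name is illustrative.

```python
def project_activity(baseline: float, annual_growth: float, years: int) -> list:
    """Projected activity for year 0 (baseline) through `years`,
    assuming growth compounds once per year."""
    return [baseline * (1 + annual_growth) ** year for year in range(years + 1)]


# Hypothetical baseline of 100,000 screens per year (NOT a figure from the report).
HYPOTHETICAL_BASELINE_SCREENS = 100_000

projection = project_activity(HYPOTHETICAL_BASELINE_SCREENS,
                              annual_growth=0.03, years=5)
for year, screens in enumerate(projection):
    print(f"Year {year}: {screens:,.0f} screens")
```

Human reading hours under each model of care would scale with the projected number of screens, so the same growth factor applies to both the usual care and AI second reader scenarios.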

Figure 33: Predicted human reading hour change for (a) EMRAD and (b) England

This analysis is limited in its scope and has limited meaning without information on change to costs for provider

trusts (the usual care costs of image reading based on the skill-mix used) and evidence from prospective trials

on workflow and downstream effects. It does give some indication of the potential effect on workforce utilisation

and the potential for an AI second reader to help resolve the acute workforce shortage in radiology.

20 https://datadictionary.nhs.uk/data_sets/central_return_data_sets/nhs_breast_screening_programme_central_return_data_set__kc62_.html



What would the impact of the screening optimisation innovation programme be if implemented at scale across EMRAD?

For the same reasons that it has not been possible to evaluate the impact of the round length management ML tool at the test sites (no live deployment as part of the screening service workflow), it is not possible to evaluate the impact of wider scale and spread at this stage.



Summary of key findings

The key findings of the evaluation are:

1. Clinicians working in the breast screening service are positive about the potential for AI as a second

reader in breast screening but want to see more evidence from clinical trials and real-world validation.

2. Service administrative staff and managers (non-clinical) are less convinced about the potential for AI in

service optimisation but have also seen less obvious development in this area over the period of the

project.

3. Adult women of and under screening age are generally positive about the introduction of AI into the

breast screening service but they also want to see evidence of effectiveness and safety especially where

the technology is used as a second reader.

4. The same factors influenced the early stages of implementation of these novel technologies as any other

digital health technology.

5. Some factors that are additional and unique to AI were evidenced during the implementation of this

project.

6. The results of the retrospective study using the AI mammogram reader (Mia™), when used to model the impact of the technology on resourcing, showed that using AI as a second reader could reduce the time required from human readers (radiologists and reporting radiographers) by 42%.



Discussion and implications

Evaluating progress, outcomes, and impact

When this project is reviewed in terms of progress against the plans originally set out in December 2018, there

is a significant difference between what was originally planned and what has been delivered, as summarised in

Figure 34 below.

Figure 34: Planned versus actual project plans



Explaining the progress

Complexity and novelty of this particular project were two reasons for the slower than originally planned rate of

progress that the programme leadership group pointed to when surveyed. The dimensions of complexity that

have been evident in this project are structural, socio-political and emergent (Maylor, 2013).

1. Structural

Delays in securing project funding and in recruiting some of the resource to support project implementation,

changes in project scope in response to unanticipated information governance hurdles, and requirements for

research approval led to delays in the project delivery overall. In some cases, planned workstreams were

abandoned (economic evaluation and clinical deployment of Mia™) or postponed (service optimisation tool

development, training and testing). Not involving information governance expertise as part of the core team

from the point of writing the proposal for Test bed funding is a key lesson learnt by the programme team.

Information governance was so central to the project’s progress that early involvement of IG specialists in

planning may have enabled more proactive mitigation strategies to be put in place.

2. Socio-political

Cultural differences between technology start-up commercial partners and NHS trusts were evident. Both

technology start-ups had limited experience of working in the NHS either in a research or delivery capacity. This

became evident early on during the discussions around information governance and data sharing.

Commercialisation and scaling up highlighted these differences again as technology companies worked with the

NHS to develop a business case that would gain support from Trust boards. As business case development

progressed, the different expectations of outcomes between Kheiron and EMRAD trusts became obvious.

EMRAD trusts were keen to move quickly to adoption, whereas Kheiron were more cautious, citing the need to deliver

clinical trial data first. The policy context as well as political and social priorities were moving rapidly over the

course of the project and heavily influenced the progress of the project. The project itself influenced policy and

regulatory developments as evidenced by the number of times it is referenced as a case study in policy

documents (Commons, 2018) (NHSX, 2019). The promotion of AI technology in healthcare in the UK is driven by

political commitment directly from the Secretary of State [21] and this high profile, alongside the multiple

stakeholders with an interest in the outcome of this project (NHSX, PHE, NHS England & Improvement, Office

for Life Sciences, Innovate UK, CQC, ICO and NHS EMRAD Trusts), means that the project has had to satisfy a

range of interests which have not always been aligned.

3. Emergent

The Covid-19 pandemic influenced the project in ways that could not have been anticipated and had to be

adapted to as the risk emerged and was realised.

21 https://www.computerweekly.com/news/252488719/Matt-Hancock-announces-50m-for-healthcare-AI-projects




Attributing outcomes

It is challenging to attribute outcomes in complex projects such as this one (Bovaird, 2014). Add to this the novelty of a project that involves testing and validating new technology that is not yet fully regulated or commercially available, and the traditional approach to evaluating a programme theory of change becomes even more challenging.

The process of co-producing the programme theory of change and informing its development as the project

progressed has enabled the project team to draw out some emerging impact pathways from the data.

The process of conducting the HRA-approved research using Mia™ has had some limited impact on the confidence with which the clinical workforce in test sites perceive the AI tool. However, similar patterns of increased confidence were noted at the control sites, which may be indicative of the increased profile of AI in healthcare, and in breast screening specifically, during the period of the project. The process of conducting the discovery work with non-clinical staff for the service optimisation tool did have a small effect on levels of staff engagement and on positive perceptions of the possible impact of an AI tool, but this was not sustained through the delayed design and development of the tool. Overall, there were no significant differences across test and control sites that can be attributed to the objective of the test bed.

None of the predicted impacts in relation to numbers of women screened, workforce productivity or experience

of care have been evidenced in the real world at this stage. The most likely immediate impact of workforce

productivity has been modelled as part of the budget impact analysis but remains to be tested in real-world

deployment.



Implications

Based on the findings of this evaluation and what is known from the theoretical and empirical literature in this

area to date, we have highlighted the following recommendations that are specific to the UK context but may

be generalisable to other contexts:

For policy makers and regulators

1. Continue to evolve the system of regulation in collaboration with interested groups, shifting the focus from

AI technology firms to healthcare professionals and the wider public, including protected groups, as adopters

of the technology.

2. Continue to consider the role of data governance and ethics in the application of AI. Consider the impact on

power relationships in the context of person-centred integrated care starting with focusing on the role of

informed consent, involving the public in the design and monitoring of these approaches beyond user

testing.

For the national breast screening programme

3. Set out the expectations for the evidence threshold to be generated as part of future retrospective and

prospective clinical trials of AI as a second reader of mammograms undertaken in the UK population.

4. Clarify the requirements and priorities for wider socio-political research on the impact of implementing AI

technology in the breast screening programme.

For breast screening units and the NHS trusts that run them

5. If considering adopting AI as part of the clinical or non-clinical workflow, understand the level of readiness

of your workforce, workflow and organisation to take up the new technology and ensure that the

appropriate information governance and change management support is in place from the beginning to

deliver the change.

6. Apply the principles set out in NHSX's A Buyer's Guide to AI in Health and Care [22] to the procurement and implementation of service optimisation AI tools for use in the breast screening programme.

22 https://www.nhsx.nhs.uk/ai-lab/explore-all-resources/adopt-ai/a-buyers-guide-to-ai-in-health-and-care/




7. Share learning on the developments in AI across the breast screening unit team (clinical and non-clinical staff) and open a forum, as part of team professional development, that critically discusses developments in technology, including AI in breast screening.

For radiologists, radiographers and other clinical staff

8. Provide support and incentives for staff to learn from (and if possible engage in) research on AI in breast

imaging as part of the CPD requirements in your workplace.

9. Consider the likely effects of adopting AI as a second reader in the clinical pathway as part of a multi-

disciplinary team in terms of professional accreditation and ongoing development, productivity, simplifying

working practice and improving the experience of care.

For women of and under screening age and society more widely

10. Ask for information about the results of clinical and real-world research on the effectiveness, safety and

ethics of AI in breast screening and other healthcare applications in ways that are clear and understandable

to the layperson.



Limitations of the study

The greatest limitation of this evaluation is the lack of evidence generated to conduct a full impact evaluation.

In the time frame of the project this has not been possible, and this was highlighted to stakeholders early in the

project.

The short time between the two rounds of the staff survey combined with the change in scope for both Mia™

(dropping the clinical deployment workstream) and the service optimisation tool (delaying the development of

the tool by one year) meant that there was little change to be measured between the two surveys. This also

contributed to the relatively low rates of response in round 2 of the surveys.

Lessons learnt for future evaluation design

AI in healthcare, including breast screening, whilst still relatively new, is likely to be adopted widely in the coming

years. Recent publications have set out new guidelines for evaluating AI technology in healthcare during

development and testing (Rivera, 2020) (Liu X. R., 2020) governing trial protocols and trial reporting. These are

only beginning to be applied in practice.

The scope of this evaluation was looking at the real-world effects of the two tools being tested. The evaluation

aimed to set out a rich narrative that provided context, exploring the complexity and novelty of the test bed

project in a way that could be accessible to future test sites. With the benefit of hindsight, it can be seen that

the evaluation design should have encompassed the retrospective clinical study and an independent evaluation

of the effectiveness of the service optimisation tool alongside the real-world evaluation of adoption and spread.

This multi-disciplinary approach to independent, verifiable evaluations of AI in healthcare practice will be

essential in the future.

Informed by these lessons, additions to the NASSS framework (Greenhalgh, 2017) are proposed [Figure 35] that can be used when evaluating AI technology in real-world clinical and non-clinical workflows, specifically focusing on readiness for adoption. These additions are discussed in more detail in a publication submitted to the Journal of Medical Internet Research in December 2020.



Figure 35: Evaluating AI technology in healthcare - addition to the NASSS framework

Implementation planning starts with a clear value proposition that sets out who the beneficiaries are, prioritises

them if appropriate and predicts when the value will be realised and how this will be established. The value

proposition should be a live document which is iterated based on implementation feedback and evaluation

learning. In the case of AI technology, the value proposition will need to be shared with a wider group than other

technologies at the early stage of adoption and this is likely to include research authorities, regulators, and ethics

committees. Similarly, the landscape of AI technology providers is limited and only in the foothills of bespoke

regulation, leaving implementation teams in the position of having to make more complex and risky decisions

about commercial contracts for a technology they may not be very familiar with.

During implementation, data quality, security and processes specific to AI technology require steps for training, validating, and testing algorithms (Harvey H., 2019) or for creating synthetic data sets (Pollack, 2019). These go beyond the steps required for traditional information technology and present implementation teams with new business processes to develop within and between organisations. This drives a requirement for ethical governance that provides accountability for this new technology and includes members with the following backgrounds: clinical accountability, corporate accountability, information governance, research ethics, person living with the target condition, and applied AI in health expertise. As with any technology project that involves adoption within an existing workflow, the value of change managers dedicated to supporting the change processes required for successful impact should never be underestimated. These processes include two-way stakeholder communication and engagement throughout.

When delivered in combination, all the above will contribute to building adopter trust in the AI technology being

implemented. This includes involving people living with the target condition, who can support and drive

communication with service users throughout the project, from communicating algorithm explainability and

designing informed consent to monitoring emerging feedback from evaluation, trials and/or clinical audit.



One issue that may arise in the future, and that does not apply to traditional technology, is the adaptation of AI

technology over time through continuous learning, where the algorithm continues to use ‘live’ data

to learn and adapt. There are no examples of this in use in healthcare as yet (Rivera, 2020) but this will need to

be monitored.



Acknowledgements

This evaluation would not have been possible without the cooperation of a few people and organisations.

Staff from across the four study sites freely gave their time to support and engage in this study either as staff

members within the Breast Screening Units or as ordinary women who are either attending breast screening

now or will in the future. The Research & Innovation teams from the sites supported this work, proactively

encouraging people to take part in the study. Women from the general public also took part in the survey and

in the subsequent focus groups.

The NHS EMRAD AI project team were instrumental in facilitating introductions to the service teams and other

local stakeholders and played a proactive role in removing the barriers to the evaluation when they arose,

without seeking to influence the independence of the evaluation.



References

Abdullah R, F. B. (2020). Health Care Employees' Perceptions of the Use of Artificial Intelligence

Applications: Survey Study. J Med Internet Res, 22(5):e17620. doi:10.2196/17620.

Academy of Medical Sciences. (2018). Future data-driven technologies and the implications for use of

patient data. Ipsos Mori.

Ada Lovelace Institute. (2020). Exit through the app store. London: Ada Lovelace Institute.

Ada Lovelace Institute. (2020). No green lights, no red lines: Public perspectives on COVID-19

technologies. Ada Lovelace Institute.

AHSN Network. (2018). Accelerating Artificial Intelligence in health and care: results from a state of

the nation survey. London: Department of Health and Social Care.

Alderwick, H. &. (2019). The NHS long term plan. BMJ, 364:l84.

Allen, B. D. (2020). Integrating Artificial Intelligence Into Radiologic Practice: A Look to the Future.

Data Science and Radiologic Practice, 280-283.

Asan, O. E. (2020). Artificial Intelligence and Human Trust in Healthcare: Focus on Clinicians. Journal

of Medical Internet Research, 22(6):e15154. doi: 10.2196/15154.

Asthana, S. J. (2019). Why does the NHS struggle to adopt eHealth innovations? A review of macro,

meso and micro factors. BMC Health Services Research, 19, 984.

https://doi.org/10.1186/s12913-019-4790-x.

Bjerring, J. &. (2020). Artificial Intelligence and Patient-Centered Decision-Making. Philosophy &

Technology, https://doi.org/10.1007/s13347-019-00391-6.

Bonten TN, R. A.-P. (2020). Online Guide for Electronic Health Evaluation Approaches: Systematic

Scoping Review and Concept Mapping Study. Journal of Medical Internet Research,

22(8):e17774.

Bovaird, T. (2014). Attributing Outcomes to Social Policy Interventions – ‘Gold Standard’ or ‘Fool's

Gold’ in Public Policy and Management? Social Policy & Administration, 1-23.

Brennan, S. (2020, August 25). New screening committee to replace PHE role. HSJ.

Brown, A. C.-H. (2019). CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in

Computing Systems (Paper No. 41, pp. 1–12). Glasgow.

https://doi.org/10.1145/3290605.3300271.

Buch, V. A. (2018). Artificial intelligence in medicine: Current trends and future possibilities. British

Journal of General Practice, 143-144.

Cannizzaro, S. P. (2020). Trust in the smart home: Findings from a nationally representative survey in

the UK. PLoS ONE, 15(5): e0231615. https://doi.org/10.1371/journal.pone.02316.

Care Quality Commission (b). (2020). Using machine learning in diagnostic services: A report with

recommendations from CQC’s regulatory sandbox. CQC.

Care Quality Commission. (2020). Getting to the right care in the right way – digital triage in health

services: A report with recommendations from CQC’s first regulatory sandbox. CQC.



Care, D. o. (2019). Code of conduct for data-driven health and care technology. London: Department

of Health and Social Care.

Challen R, D. J. (2019). Artificial intelligence, bias and clinical safety. BMJ Quality & Safety, 231-237.

Char, D. S. ( 2018 ). Implementing Machine Learning in Health Care — Addressing Ethical Challenges.

New England Journal of Medicine, 981–983. doi:10.1056/NEJMp1714229.

Chartrand, G. C. (2017). Deep Learning: A Primer for Radiologists. RadioGraphics, 2113-2131.

Chen, J. a. (2017). Machine Learning and Prediction in Medicine — Beyond the Peak of Inflated

Expectation. New England Journal of Medicine, 2507-2509.

Cicchetti, D. (1992). Neural Networks and Diagnosis in the Clinical Laboratory: State of the Art. Clinical

Chemistry, Volume 38, Issue 1, Pages 9–10, https://doi.org/10.1093/clinchem/38.1.9.

Coeckelbergh, M. (2019). Ethics of artificial intelligence: Some ethical issues and regulatory

challenges. Technology and Regulation, 31–34 https://doi.org/10.26116/techreg.2019.003.

Cohen, C. K. (2017). Acceptability Among Community Healthcare Nurses of Intelligent Wireless

Sensor-system Technology for the Rapid Detection of Health Issues in Home-dwelling Older

Adults. The Open Nursing Journal, 54-63.

Coiera, E. (2019). The Last Mile: Where Artificial Intelligence Meets Reality. Journal of Medical

Internet Research, 21(11):e16323. doi:10.2196/16323.

Commons, H. o. (2018). The Independent Breast Screening Review. London: HC 1799.

Cresswell, K. (2018). Health Care Robotics: Qualitative Exploration of Key Challenges and Future

Directions. Journal of Medical Internet Research, 20(7):e10410.

Cruz, J. &. (2006). Applications of Machine Learning in Cancer Prediction and Prognosis. Cancer

Informatics, 59-77.

Daniels, N. G. (2019). STEER: Factors to Consider When Designing Online Focus Groups Using

Audiovisual Technology in Health Research. International Journal of Qualitative Methods,

doi:10.1177/1609406919885786.

Davenport, T. &. (2019). The potential for artificial intelligence in healthcare. Future Healthcare

Journal, 94-98.

Department of Health and Social Care. (2019). Government response to the Independent Breast

Screening Review recommendations. London: Department of Health and Social Care.

Dustler, M. (2020). Evaluating AI in breast cancer screening: a complex task. The Lancet, e106-107.

Evans, D. W. (2012). Assessing Individual Breast Cancer Risk within the U.K. National Health Service

Breast Screening Program: A New Paradigm for Cancer Prevention. Cancer Prevention

Research, 943-951.

Fenech, M. S. (2018). Ethical, social and political challenges of artificial intelligence in health. Future

Advocacy and Wellcome Trust.

Floridi, L. C. (2020). How to Design AI for Social Good: Seven Essential Factors. Science and

Engineering Ethics, 1771–1796 https://doi.org/10.1007/s11948-020-00213-5.



Flynn, R. A. (2018). Two Approaches to Focus Group Data Collection for Qualitative Health Research:

Maximizing Resources and Data Quality. International Journal of Qualitative Methods,

doi:10.1177/1609406917750781.

Gao S, H. L. (2020). Public Perception of Artificial Intelligence in Medical Care: Content Analysis of

Social Media. Journal of Medical Internet Research, 22(7):e16649.

Gong, B. N. (2019). Influence of Artificial Intelligence on Canadian Medical Students' Preference for

Radiology Specialty: A National Survey Study. Academic Radiology, 566-577.

Gøtzsche, P. &. (2013, June 3). Screening for breast cancer with mammography. Retrieved from

www.cochrane.org: https://www.cochrane.org/CD001877/BREASTCA_screening-for-breast-cancer-with-mammography

Greenhalgh T, W. J. (2020). Video consultations for covid-19. BMJ, 368:m998.

Greenhalgh, T. W. (2017). Beyond Adoption: A New Framework for Theorizing and Evaluating

Nonadoption, Abandonment, and Challenges to the Scale-Up, Spread, and Sustainability of

Health and Care Technologies. Journal of Medical Internet Research, e367. doi:

10.2196/jmir.8775. PMID: 29092808; PMCID: PMC5688245.

Gutiérrez-Ibarluzea I, C. M. (2017). The Life Cycle of Health Technologies. Challenges and Ways

Forward. Frontiers in Pharmacology, (8):14

https://www.frontiersin.org/article/10.3389/fphar.2017.00014.

Harvey H., &. G. (2019). A Standardised Approach for Preparing Imaging Data for Machine Learning

Tasks in Radiology. In M. S. Ranschaert E., Artificial Intelligence in Medical Imaging.

Springer, Cham. https://doi.org/10.1007/978-.

Harvey, H. a. (2018). Algorithms are the new drugs? Reflections for a culture of impact assessment

and vigilance. Proceedings of the International Conferences on e-Health. Madrid: IADIS.

Health Education England. (2019). Preparing the healthcare workforce to deliver the digital future:

The Topol Review. Health Education England.

Health Education England. (2020). One year on: Progress on the recommendations of the Topol

Review. NHS England and Improvement.

Health Quality Ontario. (2016). Women’s Experiences of Inaccurate Breast Cancer Screening Results: A

Systematic Review and Qualitative Meta-synthesis. Health Quality Ontario.

House of Commons. (2019a). Independent review of national cancer screening programmes in

England: Interim Report. London: House of Commons.

Houssami, N. K.-J. (2019). Artificial Intelligence (AI) for the early detection of breast cancer: a scoping

review to assess AI’s potential in breast screening practice. Expert Review of Medical Devices,

351-362.

ICO. (2020, July 30). Key DP themes: ICO Website. Retrieved from ICO Website:

https://ico.org.uk/for-organisations/guide-to-data-protection/key-data-protection-themes/guidance-on-ai-and-data-protection/

ICO and Alan Turing Institute. (2020, May 20). Key DP Themes: ICO Website. Retrieved from ICO Website:

https://ico.org.uk/for-organisations/guide-to-data-protection/key-data-protection-themes/explaining-decisions-made-with-artificial-intelligence/

Information Commissioner’s Office. (2020). Guidance on AI and data protection. ICO.



Ipsos Mori. (2017). Public views of Machine Learning. The Royal Society.

James, J. S. (2020 [in press]). Generalisability of a commercially available Artificial Intelligence (AI)

solution across multiple hardware vendors in a national breast cancer screening programme.

Radiological Society of North America (RSNA) Annual Meeting.

Jutzi TB, K.-H. E.-L. (2020). Artificial Intelligence in Skin Cancer Diagnostics: The Patients’ Perspective.

Frontiers in Medicine, 7:233. doi: 10.3389/fmed.2020.00233.

Karches, K. (2018). Against the iDoctor: why artificial intelligence should not replace physician

judgment. Theoretical Medicine and Bioethics, 91-110 https://doi.org/10.1007/s11017-018-9442-3.

Katell, M. Y. (2020). Toward situated interventions for algorithmic equity: lessons from the field. In

Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* '20)

(pp. 45–55). New York, NY, USA: Association for Computing Machinery.

https://doi.org/10.1145/3351095.3372874.

Kelly CJ, K. A. (2019). Key challenges for delivering clinical impact with artificial intelligence. BMC

Med., 17(1):195. Published 2019 Oct 29. doi:10.1186/s12916-019-1426-2.

Kerasidou, A. (2020). Artificial intelligence and the ongoing need for empathy, compassion and trust

in healthcare. Bulletin of the World Health Organisation, 245-250 doi:

http://dx.doi.org/10.2471/BLT.19.237198.

Kim, H.-E. K.-H.-K. (2020). Changes in cancer detection and false-positive recall in mammography using

artificial intelligence: a retrospective, multireader study. The Lancet, e138–48.

Kirsch, A. (2017). Explain to whom? Putting the User in the Center of Explainable AI. Proceedings of

the First International Workshop on Comprehensibility and Explanation in AI and ML 2017 co-

located with 16th International Conference of the Italian Association for Artificial Intelligence.

Bari, Italy. hal-01845135 .

Kite, J. &. (2017). Insights for conducting real-time focus groups online using a web conferencing

service. F1000 Research, 6:122. Published 2017 Feb 9. doi:10.12688/f1000research.10427.1.

Kononenko, I. (2001). Machine learning for medical diagnosis: history, state of the art and

perspective. Artificial Intelligence in Medicine, 89-109.

Lai, M.-C. B.-F. (2020). Perceptions of artificial intelligence in healthcare: findings for a qualitative

study among actors in France. Journal of Translational Medicine.

Lee, L. K. (2019). The Current State of Artificial Intelligence in Medical Imaging and Nuclear Medicine.

BJR Open, 1: 20190037.

Lennon MR, B. M. (2017). Readiness for Delivering Digital Health at Scale: Lessons From a

Longitudinal Qualitative Evaluation of a National Digital Health Innovation Program in the

United Kingdom. Journal of Medical Internet Research, 19(2):e42.

Liu, X. F. (2019). A comparison of deep learning performance against health-care professionals in

detecting diseases from medical imaging: a systematic review and meta-analysis. The Lancet

Digital Health, e271-e297.

Liu, X. R. (2020). Reporting guidelines for clinical trial reports for interventions involving artificial

intelligence: the CONSORT-AI Extension. BMJ, 370:m3164.



Løberg, M. L. (2015). Benefits and harms of mammography screening. Breast Cancer Research, 63.

https://doi.org/10.1186/s13058-015-0525-z.

Loh, E. (2018). Medicine and the rise of the robots: a qualitative review of recent advances of artificial

intelligence in health. BMJ Leader, 59-63.

Long, H. B. (2019). How do women experience a false-positive test result from breast screening? A

systematic review and thematic synthesis of qualitative studies. British Journal of Cancer,

351–358 https://doi.org/10.1038/s41416-019-05.

Maclin, P. D. (1991). Using neural networks to diagnose cancer. J Med Syst, 11–19

https://doi.org/10.1007/BF00993877.

Macrae, C. (2019). Governing the safety of artificial intelligence in healthcare. BMJ Quality & Safety,

495-498.

Maruthappu, M. S. (2014). The NHS Five Year Forward View: transforming care. The British journal of

general practice : the journal of the Royal College of General Practitioners, 64(629), 635.

https://doi.org/10.3399/bjgp14X682897.

Mathioudakis AG, S. M.-P.-H.-C. (2019). Systematic review on women's values and preferences

concerning breast cancer screening and diagnostic services. Psycho-oncology, 939-947.

Maylor, H. T.-W. (2013). How Hard Can It Be?: Actively Managing Complexity in Technology Projects.

Research-Technology Management, 56:4, 45-51, DOI: 10.5437/08956308X5602125.

McCarthy JF, M. K. (2004). Applications of machine learning and high-dimensional visualization in

cancer detection, diagnosis, and management. Annals of the New York Academy of Sciences,

1020:239-262. doi:10.1196/annals.1310.020.

McDougall, R. (2019). Computer knows best? The need for value-flexibility in medical AI. Journal of

Medical Ethics, 156-160.

McKinney, S. S. (2020). International evaluation of an AI system for breast cancer screening. Nature,

89–94 https://doi.org/10.1038/s41586-019-1799-6.

Mendelson, E. (2019). Artificial Intelligence in Breast Imaging: Potentials and Limitations. AJR

American Journal of Roentgenology, 293-299. doi:10.2214/AJR.18.20532.

Meskó, B. H. (2018). Will artificial intelligence solve the human resource crisis in healthcare? BMC

Health Services Research, 18, 545. https://doi.org/10.1186/s12913-018-3359-4.

Milano, S. T. (2020). Recommender systems and their ethical challenges. AI & Society,

https://doi.org/10.1007/s00146-020-00950-y.

Miller, C. K. (2020, May 11). People, Power and Technology: The 2020 Digital Attitudes Report.

Retrieved from Doteveryone:

https://www.doteveryone.org.uk/report/peoplepowertech2020/

Morley J, F. L. (2019). How to design a governable digital health ecosystem. Available at SSRN:

https://ssrn.com/abstract=3424376 or http://dx.doi.org/10.2139/ssrn.3424376.

Morley J, F. L. (2020). NHS AI Lab: why we need to be ethically mindful about AI for healthcare. Pre-

publication, downloaded from Researchgate 10.13140/RG.2.2.23203.20004.



Morley, J. F. (2019). A Typology of AI Ethics Tools, Methods and Research to Translate Principles into

Practices. AI for Social Good workshop at NeurIPS. Vancouver, Canada.

Morley, J. F. (2020). From What to How: An Initial Review of Publicly Available AI Ethics Tools,

Methods and Research to Translate Principles into Practices. Science and Engineering Ethics,

2141–2168 https://doi.org/10.1007/s11948-019-0016.

Nagendran, M. C. (2020). Artificial intelligence versus clinicians: systematic review of design,

reporting standards, and claims of deep learning studies. BMJ, 368:m689.

Nelson, A. H. (2019). Predicting scheduled hospital attendance with artificial intelligence. npj Digital

Medicine, 26 https://doi.org/10.1038/s41746-019-0103-3.

NHS Breast Screening Programme. (2020, August 23). AgeX Trial. Retrieved from www.agex.uk:

http://www.agex.uk/

NHS Digital. (2020, August 23). Breast Screening Programme, England 2018-19. Retrieved from NHS

Digital: https://digital.nhs.uk/data-and-information/publications/statistical/breast-screening-programme/england---2018-19

NHSX. (2019). Artificial Intelligence: How to get it right. Putting policy into practice for safe data-

driven innovation in health and care. London: NHSX.

NICE. (2019). Evidence standards framework for digital health technologies. National Institute for

Health and Care Excellence.

NICE. (2019). Evidence Standards Framework for Digital Health: Cost Consequences and Budget

Impact Analyses. York: York Health Economics Consortium.

Oh S, K. J. (2019). Physician Confidence in Artificial Intelligence: An Online Mobile Survey. Journal of

Medical Internet Research, 21(3):e12422.

Ongena, Y. H. (2020). Patients’ views on the implementation of artificial intelligence in radiology:

development and validation of a standardized questionnaire. European Radiology, 1033–1040

https://doi.org/10.1007/s00330-019-06486-0.

Open Data Institute. (2020). Covid-19: Identifying and managing ethical issues around data. London:

ODI.

Oren, O. G. (2020). Artificial intelligence in medical imaging: switching from radiographic pathological

data to clinically meaningful endpoints. The Lancet Digital Health, e486-e488.

Ouchchy, L. C. (2020). AI in the headlines: the portrayal of the ethical issues of artificial intelligence in

the media. AI & Society, https://doi.org/10.1007/s00146-020-00965-5.

Panch T, M. H. (2019). The “inconvenient truth” about AI in healthcare. npj Digital Medicine, 77

https://doi.org/10.1038/s41746-019-0155-4.

Panch, T. S. (2018). Artificial intelligence, machine learning and health systems. Journal of Global

Health, 8(2), 020303. https://doi.org/10.7189/jogh.08.020303.

Park, C. Y. (2020). Medical Student Perspectives on the Impact of Artificial Intelligence on the Practice

of Medicine. Current Problems in Diagnostic Radiology, in press.

Park, S. &. (2018). Methodologic Guide for Evaluating Clinical Performance and Effect of Artificial

Intelligence Technology for Medical Diagnosis and Prediction. Radiology, 800-809.



Pesapane F, V. C. (2018). Artificial intelligence as a medical device in radiology: ethical and regulatory

issues in Europe and the United States. Insights Imaging, 745-753. doi:10.1007/s13244-018-0645-y.

Pinto dos Santos, D. G. (2019). Medical students' attitude towards artificial intelligence: a multicentre

survey. European Radiology, 1640–1646 https://doi.org/10.1007/s00330-018-5601-1.

Pollack, A. S. (2019). Creating synthetic patient data to support the design and evaluation of novel

health information technology. Journal of Biomedical Informatics,

https://doi.org/10.1016/j.jbi.2019.103201.

Powell, J. (2019). Trust Me, I’m a Chatbot: How Artificial Intelligence in Health Care Fails the Turing

Test. Journal of Medical Internet Research, e16222 doi: 10.2196/16222.

Public Health England. (2017). Screening Quality Assurance visit report NHS Breast Screening

Programme Lincolnshire. Public Health England.


Programme Kettering. Public Health England.


Programme North Nottinghamshire. Public Health England.

Rainey, L. v. (2018). Women’s perceptions of the adoption of personalised risk-based breast cancer

screening and primary prevention: a systematic review. Acta Oncologica, 1275-1283.

Rajkomar, A. O. (2018). Scalable and accurate deep learning with electronic health records. npj Digital

Medicine, https://doi.org/10.1038/s41746-018-0029-1.

Recht, M. D. (2020). Integrating artificial intelligence into the clinical practice of radiology: challenges

and recommendations. European Radiology, 3576–3584 https://doi.org/10.1007/s00330-020-06672-5.

Reform. (2018). Thinking on its own: AI in the NHS. London: The Reform Research Trust.

Richards, P. S. (2019b). Report of the Independent Review of Adult Screening Programmes in England.

London.

Rivera, S. L.-W. (2020). Guidelines for clinical trial protocols for interventions involving artificial

intelligence: the SPIRIT-AI Extension. BMJ, 370:m3210.

Robbins, S. (2019). AI and the path to envelopment: knowledge as a first step towards the responsible

regulation and use of AI-powered machines. AI & Society, 391-400.

Rong, G. M. (2020). Artificial Intelligence in Healthcare: Review and Prediction Case Studies.

Engineering, 291-301.

Royal College of Radiologists. (2019). Clinical radiology UK workforce census report 2018. Royal

College of Radiologists.

Royal College of Radiologists. (2020). Clinical Radiology: England workforce 2019 summary report.

Royal College of Radiologists.

Salim M, W. E. (2020). External Evaluation of 3 Commercial Artificial Intelligence Algorithms for

Independent Assessment of Screening Mammograms. JAMA Oncology,

doi:10.1001/jamaoncol.2020.3321.



Schwamm LH, E. J. (2020). Virtual care: new models of caring for our patients and workforce. Lancet

Digit Health, 2(6), e282-e285.

Select Committee on Artificial Intelligence. (2018). AI in the UK: ready, willing and able? London:

Authority of the House of Lords.

Shah, R. &. (2018). IOT and AI in healthcare: A systematic literature review. Issues in Information

Systems , 33-41.

Sharma, N. J. (2020 [in press]). Impact of an Artificial Intelligence (AI) solution as a reader in a national

breast screening programme. Radiological Society of North America (RSNA) Annual Meeting.

Shaw J, R. F. (2019). Artificial Intelligence and the Implementation Challenge. Journal of Medical

Internet Research, 21(7):e13659. doi:10.2196/13659.

Sheikh A, C. T. (2011). Implementation and adoption of nationwide electronic health records in

secondary care in England: final qualitative results from prospective national evaluation in

“early adopter” hospitals. BMJ, 343.

Shen, J. Z. (2019). Artificial Intelligence Versus Clinicians in Disease Diagnosis: Systematic Review.

JMIR Medical Informatics, 7(3):e10010.

Sit, C. S. (2020). Attitudes and perceptions of UK medical students towards artificial intelligence and

radiology: a multicentre survey. Insights Imaging, 11, 14 https://doi.org/10.1186/s13244-019-0830-7.

Smith, H. (2020). Clinical AI: opacity, accountability, responsibility and liability. AI & Society,

https://doi.org/10.1007/s00146-020-01019-6.

The Guardian. (2020, August 24). Councils scrapping algorithms. Retrieved from The Guardian:

https://www.theguardian.com/society/2020/aug/24/councils-scrapping-algorithms-benefit-welfare-decisions-concerns-bias

The Nightingale Centre. (2019). Annual Scientific Report. Manchester: Prevent Breast Cancer.

The RSA. (2018). Engaging citizens in the ethical use of AI for automated decision-making. The RSA.

Thompson, R. V. (2018). Artificial intelligence in radiation oncology: A specialty-wide disruptive

transformation? Radiotherapy and Oncology, 421-426.

Tran, V. R. (2019). Patients’ views of wearable devices and AI in healthcare: findings from the

ComPaRe e-cohort. npj Digital Medicine, https://doi.org/10.1038/s41746-019-0132-y.

Tschider, C. (2018). The consent myth: Improving choice for patients of the future. Washington

University Law Review, 1505-1536.

van Hoek. J. Huber, A. L.-K. (2019). A survey on the future of radiology among radiologists, medical

students and surgeons: Students and surgeons tend to be more skeptical about artificial

intelligence and radiologists may fear that other disciplines take over. European Journal of

Radiology.

Vollmer, S. M. (2020). Machine learning and artificial intelligence research for patient benefit: 20

critical questions on transparency, replicability, ethics, and effectiveness. BMJ, 368:l6927.

Waymel, S. B. (2019). Impact of the rise of artificial intelligence in radiology: What do radiologists

think? Diagnostic and Interventional Imaging, 327-336.

Webster, P. (2020). Virtual health care in the era of COVID-19. The Lancet, 1180-1181.

Willemink, M. K. (2020). Preparing medical imaging data for machine learning. Radiology, 4-15.



Wolff J, P. J. (2020). The Economic Impact of Artificial Intelligence in Health Care: Systematic Review.

Journal of Medical Internet Research, 22(2):e16866. Published 2020 Feb 20.

doi:10.2196/16866.

Xiang Y, Z. L. (2020). Implementation of artificial intelligence in medicine: Status analysis and

development suggestions. Artificial Intelligence in Medicine, 102:101780.

doi:10.1016/j.artmed.2019.101780.

Zhang, B. &. (2019). Artificial Intelligence: American Attitudes and Trends. Oxford, UK: Center for

the Governance of AI, Future of Humanity Institute, University of Oxford.



Appendices

APPENDIX 1 – DATA COLLECTION TOOLS

APPENDIX 2 – THEMATIC ANALYSIS FRAMEWORK

APPENDIX 3 – BUDGET IMPACT ANALYSIS PROCESS

APPENDIX 4 – THEORIES OF CHANGE (DECEMBER 2018)

APPENDIX 5 – EVALUATION QUESTIONS AND ADAPTATIONS

APPENDIX 6 – THE STORY OF THE PROJECT

APPENDIX 7 – PARTICIPANT INFORMATION



Appendix 1

Data collection tools

Survey Maps – staff surveys

Figure 36: Clinical survey map - round 1

Figure 37: Non-clinical survey map - round 1



Figure 38: Clinical survey map - round 2

Figure 39: Non-clinical survey map - round 2



Survey map – Public survey

Figure 40: Public survey map (one round only)

Programme Leadership survey map

Figure 41: Programme leadership survey map - based on parts of the NASSS framework



Focus group protocol

Introduction

The design of the evaluation of this project, which was summarised in the interim evaluation report (see footnote 23), included

a set of focus groups to explore in more detail some of the issues raised in the public survey administered

between December 2019 and February 2020. These were due to take place during April and May 2020. The

Covid-19 pandemic forced a change in the approach and timing of the focus groups and this design document

summarises the refreshed approach.

Recruitment

We will look to recruit 3 groups at a minimum. One group will be of women under screening age (<50 years),

another will be of women of screening age (50-70 years) and the final group is women of any age from Asian

backgrounds. This final group was under-represented in the survey and we would like to try and capture their

views on the use of AI in breast screening more comprehensively.

We will work with the EMRAD project team to identify members of their patient involvement group who may

be interested as well as with the communications and diversity & inclusion groups at the 7 EMRAD trusts. We

will communicate with local (East Midlands) community and voluntary sector groups with primarily female

membership and ask them to communicate the study and the invitation to participate with their membership.

We will also use social media to invite women from target groups to participate.

Sample size and other considerations for online focus groups

Whereas a single in-person focus group could consist of as many as 13 individuals, online focus groups are

more cumbersome and are prone to technology issues, lagging, internet dropouts, and interruptions.

For that reason, and in line with a brief review of the literature, it is optimal to cap online focus groups at

around 6 participants (Kite, 2017; Flynn, 2018; Daniels, 2019).

Consent and pre-focus group survey

All participants will fill in a consent form before the invitation to the focus group is sent to them. This will be

based on the consent which was submitted as part of the HRA approval and included at the beginning of the

public survey.

They will also be asked to complete a short survey before the focus group to share some individual characteristic

information that will be used as part of the data analysis after the focus group and to ensure that the groups are

balanced.

Facilitation

The role of the researcher or facilitator is key to making focus groups effective. Once the group is underway,

the researcher or facilitator will allow, as much as possible, discussion to emerge from the group itself. The

facilitator will occasionally help the group to focus and structure their discussion, bring discussion back or

move it on, widen the discussion to include everyone, and ensure a balance between participants. The

facilitator will guide and pace the discussion to ensure all the issues are covered, and they will probe individuals

and the group as a whole to encourage in-depth exploration. They will be alert to non-verbal language and to

the dynamic of the discussion, and they will challenge or stimulate the group if what is said seems to reflect

social norms or apparent consensus too readily.

23 CAPACITY CARE AND CONFIDENCE. Developing and Testing Artificial Intelligence in Breast Screening Interim Evaluation Report. TaoHealth Research & Implementation. Nov 2019.

Group format

Each group will be scheduled for 60 minutes. An outline of the format of each group is set out below.

| Timing | Description |
| --- | --- |
| Set-up | The facilitator will open the virtual room 15 minutes before the start to help any participants having technical difficulties. |
| Introduction | The facilitator will describe the study and the contribution participants are making through the focus group. Permission will be sought for group recording for the purpose of analysis only. Ground rules for the online discussion will be set, including use of text chat and hand raising. Each participant will introduce themselves by first name only. |
| Discussion | The facilitator will use the topic guide below to explore in more depth some of the points raised through the public survey. |
| Close | The facilitator will bring the group to a close. |

Introduction to the technology under investigation

This has been based on the advice of the EMRAD AI Project PPPG which met on 23 June 2020 and will be

supported by slides.

• What do you think AI is?

• The definition of AI that is being used

• You could think of Artificial Intelligence (AI) as computers and robots understanding patterns,

pictures, speech and language. They can learn from their understanding and make decisions.

• Do you know how the breast screening process works after a woman has been for her mammogram?

• Do you know there is a shortage of doctors especially in radiology, the specialism that reads mammograms?

Topic guide

1. How do you feel about the use of technology to provide you with health care?

a. Examples will be given: online booking for GP appointments; video-conferencing consultations;

remote monitoring apps for diabetes and heart disease.

2. Have you ever used any technology when seeking or receiving health care?

a. If any of these used artificial intelligence, how would you feel?

3. Trust is a word that often comes up for people in this context. What would make you trust a technology that

used artificial intelligence more or less?

4. Another theme that came up in our survey was a concern that AI technology lacked empathy and emotion.

How would that affect your views about implementing it?

5. Would you want to know that an AI product was being used to read mammograms when you went for a

breast screening appointment?

a. If you would, what would you want to know?



6. Imagine a scenario where you had a choice between a breast screening appointment that has one human

reader and one AI reader and could get your results back in one week and a breast screening appointment

that has two human readers and could get your results back in three weeks (due to not enough imaging

readers) – which would you choose and why?

These topics capture the content to be covered. Specific questions will be framed based on the group and the

flow of the discussion.



Appendix 2

First order thematic analysis

Domain / Node Detail / Sub-node

Reading process

Workforce shortage

Admin process

AI is….. What is AI?

Not sure

Reliability might be better

Greater efficiency

Possibly safer

Job loss

Improve outcomes

Releases professionals for value-added human tasks

Over-reliance

Patient empowerment

Cheaper

More data

Trust is an issue

Must ensure privacy and security

Needs to be evidence based

Choice and consent are important

Want to see humans augmented by machines

Want machines overseen by humans

Must ensure equitable access

Owners are private for profit

Who owns or understands it?

Owners are not for profit

It will happen

It should happen

It shouldn't happen

AI misses the 'human touch'

AI misses accountability

No

Yes

It depends

Cost How much will it cost to implement?

What it misses

Need to know AI is being used

Breast screening

Effects

Governance

How it is used

Owner

The future



Appendix 3

Budget impact analysis methodology

The aim was to focus on affordability by conducting a Budget Impact Analysis identifying the costs and possible

savings related to the application of the AI solutions and investigating the affordability of these as a function of

available resources.

At a very basic level, our focus was to identify what would be materially impacted (time, resources), what savings

could be realised and what variables and uncertainties needed to be quantified [Figure 42].

Figure 42: The budget impact analysis process

The key was therefore to identify the resources that might change [Figure 43], allowing the modelling of

plausible scenarios and sensitivity analysis. Data was provided to us by the

EMRAD team, specifically around the two sites selected for the breast screening test bed. Additional data

sources are listed below. Finance managers provided extracts of their general ledgers itemising direct and

indirect costs allocated. We have relied on all historical information provided to us as being accurate, and made

no attempt to verify this information. EMRAD also shared the full business case submitted, along with their own

analysis of mammogram reading costs and the projected implementation budget of the next phase with Kheiron

Medical Technologies' Mia™ testing (as part of the AI Award Phase 3 application). We were unable to conduct

analysis of the service optimisation tool as no data was available for testing in the timeframe of the evaluation.

Budget impact analysis (BIA) has been defined as a tool to predict the potential financial impact of the

adoption and diffusion of a new technology into a healthcare system with finite resources.

BIA considers costs and benefits which are monetised – non-financial benefits are excluded.

No discounting is undertaken for costs and benefits in future years.
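As an illustration of the undiscounted approach described above, the sketch below accumulates monetised costs and savings year by year without applying a discount rate. The class and function names and all figures are hypothetical placeholders chosen for this example; they are not taken from the EMRAD data or from the model used in the evaluation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class YearFlows:
    costs: float    # monetised costs in the year (placeholder value)
    savings: float  # monetised savings in the year (placeholder value)


def net_budget_impact(flows: List[YearFlows]) -> List[float]:
    """Cumulative net budget impact per year, with no discounting.

    A positive value means the budget holder is still out of pocket.
    """
    cumulative = 0.0
    result = []
    for year in flows:
        # Future years are added at face value: no discount factor is applied.
        cumulative += year.costs - year.savings
        result.append(cumulative)
    return result


# Hypothetical three-year profile: higher set-up cost in year one, then net savings.
flows = [YearFlows(100.0, 20.0), YearFlows(30.0, 60.0), YearFlows(30.0, 60.0)]
impact = net_budget_impact(flows)
print(impact)  # [80.0, 50.0, 20.0]
```

Non-financial benefits do not appear anywhere in this calculation, which mirrors the scope of a BIA as defined above.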



Figure 43: Data sources for BIA

Analysing the information received, we focused on what assumptions could be supported by evidence, and what

the material impact would be of deploying the Mia™ tool within the breast screening programme. Where

opinions were expressed, or in cases where ranges were given (e.g. number of mammograms reviewed per hour),

we discussed these with the clinical managers concerned and proceeded with a conservative mid-range figure.

Since it would be difficult to validate downstream benefits before establishing peer-reviewed clinical evidence,

we focused on what immediate and direct resource implications were revealed during the initial test phase.

The most material and tangible benefit identified, and the one with sufficient supporting evidence, was

second reader time. Having already analysed relative cost breakdowns in both general ledgers, we were

able to narrow the focus to the most verifiable impact of AI in second report screening: cost of reader time.

Figure 44: Approach to analysis

Whilst we noted benefits claimed by Kheiron Medical Technologies include reduced waiting time for results,

increased uptake from women screened and increased productivity of non-reader staff, we could only find

evidence to support one specific benefit: reducing second reader time.

Since there were no agreed prices, nor any published market pricing, we were unable to compute specific budget

impacts at a given investment level. Furthermore, with the commercial information not being part of the public

domain, we did not feel it appropriate to include any commercially sensitive information in this report.

We computed some illustrative break-even / recoupment figures based on some pricing scenarios. This was

done in layers, showing what costs could be recouped (i.e. implementation cost layer, implementation costs and

payments) with the aim of establishing break-even levels for each outflow. This analysis was shared with EMRAD

but has not been published here for the reasons given above.
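The layered break-even logic described above can be sketched as follows. This is an illustration only: it assumes a one-off implementation cost and a per-case payment, which is one possible pricing scenario and not an agreed or published price. The function name and every figure are hypothetical and do not reproduce the analysis shared with EMRAD.

```python
import math
from typing import Optional


def break_even_cases(fixed_cost: float,
                     saving_per_case: float,
                     payment_per_case: float = 0.0) -> Optional[int]:
    """Number of screened cases needed to recoup a fixed cost.

    Returns None when the per-case payment is at least as large as the
    per-case saving, because the outflow is then never recouped.
    """
    net_saving = saving_per_case - payment_per_case
    if net_saving <= 0:
        return None
    return math.ceil(fixed_cost / net_saving)


# Hypothetical inputs, for illustration only.
implementation_cost = 50_000.0  # one-off outflow
saving_per_case = 2.0           # second reader time saved per case
payment_per_case = 1.0          # assumed per-case payment to the supplier

# Layer 1: implementation cost only.
layer_1 = break_even_cases(implementation_cost, saving_per_case)
# Layer 2: implementation cost plus payments.
layer_2 = break_even_cases(implementation_cost, saving_per_case, payment_per_case)
print(layer_1, layer_2)  # 25000 50000
```

Each additional layer of outflow raises the break-even volume, and a payment at or above the per-case saving means the layer never breaks even.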



It should be noted that this analysis centred on quantifiable direct savings from reduced reader cost under

different skill mixes (different radiologist costs and radiographer grades, and whether the mix was consultant

+ consultant, consultant + Band 7, consultant + Locum or Locum + Locum). No downstream

benefits, such as reduced assessment clinic costs, have been quantified or modelled. At sites where reader costs

are highest (e.g. Locum + Locum), the savings from AI are higher than at sites where lower-cost readers (e.g. Consultant

+ Band 7/8) are deployed.
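The relationship between skill mix and savings can be illustrated with a short sketch. It assumes that the AI tool stands in for the second reader in each pair, so the direct saving is that reader's time cost. The hourly rates, case volume and reading rate are invented placeholders, not figures from the general ledgers, and the cost of the AI tool itself is left out.

```python
def second_reader_saving(cases: int, hourly_cost: float, cases_per_hour: float) -> float:
    """Cost of the second-reader time that would no longer be needed."""
    if cases_per_hour <= 0:
        raise ValueError("cases_per_hour must be positive")
    return cases * hourly_cost / cases_per_hour


# Hypothetical hourly costs by reader type (placeholders, not real pay rates).
HOURLY_COST = {"Consultant": 90.0, "Locum": 120.0, "Band 7": 30.0}

# (first reader, second reader) pairs; the second reader is assumed to be replaced.
SKILL_MIXES = [
    ("Consultant", "Consultant"),
    ("Consultant", "Band 7"),
    ("Consultant", "Locum"),
    ("Locum", "Locum"),
]

# Hypothetical volume of 10,000 cases read at 50 cases per hour.
savings = {
    f"{first} + {second}": second_reader_saving(10_000, HOURLY_COST[second], 50.0)
    for first, second in SKILL_MIXES
}

for mix, saving in sorted(savings.items(), key=lambda item: item[1]):
    print(f"{mix}: {saving:,.0f}")
```

Under these assumptions the saving scales directly with the hourly cost of the reader being replaced, which is why a site relying on locum readers would show a larger saving than one using a Band 7 second reader.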



Appendix 4

Theories of change (December 2018)

Mia™



Faculty’s Platform tool



Appendix 5

Original question Revised question Reason for change

1. What is the potential impact of the screening imaging innovation programme on the performance of screening services?

a) Is the Mia™ software suitable for use in a large-scale screening programme like NHSBSP by comparing rates of specificity/sensitivity/recall rate on a large cohort of historic screening cases?

b) Does the software achieve similar results for different manufacturers equipment used across the EMRAD consortium?

c) How do these results inform the process of designing and developing a prospective pilot with the Mia™ software?

No change N/A

2. What is the measurable impact of the screening optimisation innovation programme?

a) On breast screening programme manager and administrative staff time?

b) On breast screening workforce utilisation?

c) On optimised breast screening pathway and clinical risk management?

d) On the rate of breast screening ‘did not attends’ (DNAs)?

No change

For future: What will the synthetic data set (SDS) allow NHS breast screening units to do that could not otherwise be done?

Added a question about the synthetic data set (SDS) as this was introduced at a late stage in the project.




3. What have been the moderating factors in implementing the programme of innovations?

That enabled the project to progress;

That constrained progress.

What were the technological and data benefits and challenges for the project?

What were the perceived benefits of the use of AI tools in the NHSBSP workflow?

What were the concerns raised about the use of AI tools in the NHSBSP workflow?

What were the organisational issues that enabled or constrained the progress of the project?

What wider contextual issues affected the progress of the project?

Focus on the most salient enablers and constraints of the project from the perspective of different stakeholders:

• Programme leadership including commercial partners;

• Breast screening unit workforce;

• Women at and under screening age.

4. Was the programme worth the investment, that is, did it deliver value for money and if not in the timeframe of the evaluation, when is it likely to deliver a return on investment?

How would a future prospective evaluation of Mia™ and the round length optimisation tool determine the return on investment based on early exploratory findings from this project?

A budget impact analysis (BIA) was conducted with gaps in information (e.g. product price is not yet set for Mia™ and no data was available on the deployment of the round length ML tool). Assumptions have been made for the purpose of the evaluation (Mia™ only) but the model is only indicative and would need to be tested when price information is available and the products are being tested in the real world.

5. What would the impact of the screening imaging innovation programme be if implemented at scale across EMRAD?

How would a future prospective evaluation of Mia™ evaluate the impact of large-scale implementation across EMRAD based on early exploratory findings from this project?

The indicative BIA provides input to the health economic model for future implementation at scale.



6. What would the impact of the screening optimisation innovation programme be if implemented at scale across EMRAD?

How would a future prospective evaluation of the round length optimisation tool determine the impact of large-scale implementation across EMRAD based on early exploratory findings from this project?

Early exploratory findings not yet available.



Appendix 6

The AI in breast cancer screening project – October 2018 to December 2020

How the project came about

NHS EMRAD had been an Acute Care Collaborative (ACC) as part of the NHS New Care Models programme in

2016-18 and in the summer of 2018 the Medical Director of EMRAD, Dr Tim Taylor, met Kheiron Medical

Technologies and Faculty (formerly ASI Data Science) and two other potential partners to discuss a potential bid

for funding under the NHS England Test Bed Wave 2 programme. The joint bid which included the seven EMRAD

NHS Trusts, Kheiron Medical Technologies, Faculty, GE Healthcare (the providers of the imaging infrastructure

for EMRAD) and others was successful in August 2018.

The first three months

October to December 2018 were spent focusing on the set-up of the project. This meant establishing the project

governance structures and processes, which was straightforward for an EMRAD project team that had done this before.

Amongst the project documentation that needed to be drafted was the collaboration agreement. This required

some lengthy negotiation with involvement from legal advisors and was only signed in January 2019 after some

prolonged discussion regarding intellectual property. In this period, Kheiron were conducting initial preparatory

work for data extraction and Faculty were engaged in a discovery process with breast screening unit managers.

EMRAD and Kheiron also used this time to engage with women at a number of breast cancer events to get a

preliminary understanding of their thoughts about the use of AI in breast screening and feedback on the name

of the product. It was through this process that the name Mia™ (Mammography Intelligent Assessment) was

agreed. Delays in disbursing funds from Innovate UK meant that all partners had to proceed at risk during this

period. Also during this period, the Test Beds national team had established national information governance

and evaluation advice partners for all Wave 2 Test Beds sites. Based on feedback from Test Bed sites, this support

was terminated in early 2019.

Some early challenges to overcome

The first 9 months of the initially 18-month long project were characterised by information governance

negotiations, planning and problem-solving. These challenges underlined the importance of clear leadership,

open conversations and working through disagreements constructively. Early lessons that were learnt included

the importance of involving information governance and research expertise from the bid stage of the project

process. It was also noted that the evaluation and lessons learnt from the Wave 1 Test Bed programme were

not made available which left the project team feeling that there was a risk that they were repeating mistakes

that they did not need to make.

The challenges get bigger

From mid-summer 2019 there was a growing recognition of the perhaps unrealistic ambition of the timescale

and scope for a project as innovative as this. Information governance remained an issue and with minimal

guidance available from, for example, the Information Commissioner's Office (ICO), the project had to forge a

novel and sometimes fraught path. This was compounded by some inconsistency in how information governance

was approached for the two commercial partners. During this time, the regulatory and policy context was

changing. New guidance like the Evidence Standards for Digital Health Technology and the creation of NHSX as

a non-statutory advisory body were amongst almost monthly changes at this time. Project progress was steady



if slow. An extension of 6 months was granted to all Wave 2 projects during this time, in recognition of the delays

caused by funding disbursement. All of this was happening at a time when the project was on regular display to the

outside world, presenting at conferences and sitting on panels sharing experiences. A decision was made by the

end of 2019 to stop attending these events and focus on delivery.

And the biggest (and most unanticipated) challenge of them all

In March 2020, the Covid-19 pandemic led the UK government to declare a nationwide lockdown. The NHS was

already dealing with the consequences of the infection. As part of the response to the pandemic, the NHS breast

screening programme, alongside all other national screening programmes, was placed ‘on pause’. Some EMRAD

project staff were redeployed to frontline clinical work at participating trust sites and Kheiron Medical

Technologies placed a proportion of their staff on furlough reducing organisational capacity. Public Health

England (PHE) played a key role in responding to the pandemic and staff responsible for the breast screening

programme were also refocused on to pandemic related work. In August 2020, the Secretary of State announced

the abolition of PHE to be replaced in April 2021 by the National Institute for Health Protection. This

compounded the existing challenge of engaging with Hitachi as provider of the National Breast Screening System

(NBSS) and PHE.

What next?

Kheiron will expand on the retrospective study using Mia™ in 15 more sites in England under the NHS AI Award

Phase 4. Faculty and EMRAD are looking for funding to deploy the round length management ML tool at scale

across EMRAD. EMRAD continues to deliver radiology services at scale and to innovate, operating as a fully

engaged imaging network working closely with its current global partner, GE Healthcare.

Figure 45: The AI breast screening project timeline – key events and responses



Appendix 7

Participant information: Staff surveys

Figure 46: Survey response numbers: Clinical and Non-Clinical Surveys

Figure 47: Survey response rates: Clinical and Non-Clinical Surveys



Figure 48: Age and gender profile of clinical staff

Figure 49: Age and gender profile of non-clinical staff



Participant information: Public survey

Figure 50: Public survey response numbers and rates

Figure 51: Public survey age profile



Figure 52: Public survey ethnicity profile

Figure 53: Public survey occupation profile



Figure 54: Attendance at breast screening by age band



Focus group membership

Figure 55: Age profile

Figure 56: Ethnicity



Figure 57: Occupation

Figure 58: Have had a breast cancer diagnosis or knows someone who has



Figure 59: Attended a breast screening appointment

Figure 60: What happens when a mammogram is read



Programme leadership respondent characteristics

The survey was sent to 74 members of the following EMRAD governance groups:

• EMRAD Management Board

• EMRAD Operational Board

• EMRAD Information Governance Board

• Artificial Intelligence Programme Board

Only the members of the AI Project Board and some members of the EMRAD Information Governance Board have

been directly involved in the implementation of the programme.

18 programme leaders responded to the survey (response rate of 24%).

We stratified the respondents into those directly involved in the day-to-day implementation of the project and

those who were not directly involved but played a support or advisory role or a role in governance and

decision-making.

Figure 61: Programme leadership segmentation



Dr Niamh Lennox-Chhugani

Research Director, TaoHealth Research & Implementation


Sanjeev Chhugani

Financial analyst, TaoHealth Research & Implementation


For more information T: +44 (0)7983 458 733

www.taohealth.co.uk

http://twitter.com/taohealth2
