
Patricia Diana Soerensen*, Henry Christensen, Soeren Gray Worsoe Laursen, Christian Hardahl, Ivan Brandslund and Jonna Skov Madsen

Using artificial intelligence in a primary care setting to identify patients at risk for cancer: a risk prediction model based on routine laboratory tests

https://doi.org/10.1515/cclm-2021-1015
Received July 21, 2021; accepted October 1, 2021; published online ▪▪▪

Abstract

Objectives: To evaluate the ability of an artificial intelligence (AI) model to predict the risk of cancer in patients referred from primary care based on routine blood tests. Results obtained with the AI model are compared to results based on logistic regression (LR).

Methods: An analytical profile consisting of 25 predefined routine laboratory blood tests was introduced to general practitioners (GPs) to be used for patients with non-specific symptoms, as an additional tool to identify individuals at increased risk of cancer. Consecutive analytical profiles ordered by GPs from November 29th 2011 until March 1st 2020 were included. AI and LR analyses were performed on data from 6,592 analytical profiles to assess their ability to detect cancer. Cohort I for model development included 5,224 analytical profiles ordered by GPs from November 29th 2011 until December 31st 2018, while 1,368 analytical profiles included from January 1st 2019 until March 1st 2020 constituted the “out of time” validation test Cohort II. The main outcome measure was a cancer diagnosis within 90 days.

Results: The AI model based on routine laboratory blood tests can provide an easy-to-use risk score to predict cancer within 90 days. Results obtained with the AI model were comparable to results from the LR model. In the internal validation Cohort IB, the AI model provided slightly better results than the LR analysis, both in terms of the area under the receiver operating characteristics curve (AUC) and PPV and sensitivity/specificity, while in the “out of time” validation test Cohort II, the obtained results were comparable.

Conclusions: The AI risk score may be a valuable tool in clinical decision-making. The score should be further validated to determine its applicability in other populations.

Keywords: artificial intelligence (AI); blood; cancer; predictive; score.

Introduction

Risk prediction models aim to assist healthcare providers in the process of clinical decision making by estimating the probability of specific outcomes in a population. Traditionally, parametric logistic regression analyses (LR) have dominated and improved risk prediction in healthcare for decades [1]. However, the increased opportunities for managing large and complex datasets have encouraged the application and development of new models and tools based on artificial intelligence (AI) [2].

In a primary care setting, one of the main challenges is to ensure an early diagnosis of cancer, as this entails a better prognosis and lower mortality [3].

Many of the symptoms associated with malignant disease are non-specific, vague or imprecise and carry a relatively low risk. Even when it comes to classical “alarm” symptoms, the positive predictive value (PPV) for an underlying malignant disease is low [4]. While cancer biomarkers are routinely used in hospital settings, when applied to the low-risk population in a primary care setting they have a low PPV for detecting cancer and, at the same time, high false positive rates.

*Corresponding author: Patricia Diana Soerensen, Department of Clinical Biochemistry and Immunology, Lillebaelt Hospital, University Hospital of Southern Denmark, Vejle, Denmark, E-mail: [email protected]
Henry Christensen, Department of Clinical Biochemistry and Immunology, Lillebaelt Hospital, University Hospital of Southern Denmark, Vejle, Denmark
Soeren Gray Worsoe Laursen, The Danish Cancer Society, Copenhagen, Denmark
Christian Hardahl, SAS Institute A/S, Aarhus, Denmark
Ivan Brandslund, Department of Regional Health Research, University of Southern Denmark, Odense, Denmark
Jonna Skov Madsen, Department of Clinical Biochemistry and Immunology, Lillebaelt Hospital, University Hospital of Southern Denmark, Vejle, Denmark; and Department of Regional Health Research, University of Southern Denmark, Odense, Denmark. https://orcid.org/0000-0001-6668-4714

Clin Chem Lab Med 2021; aop

Open Access. © 2021 Patricia Diana Soerensen et al., published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0 International License.

Given the relatively low PPV of individual blood tests, two main approaches to assess cancer risk based on tests performed on blood samples have emerged. One approach is based on detecting circulating free DNA (cfDNA) in a blood sample, whereas the other is based on applying artificial intelligence to detect non-obvious and latent relationships in routine blood-based laboratory test results.

The approach using cfDNA released to the blood in order to detect possible cancer is a field in rapid growth. Thus, a noninvasive blood test (CancerSEEK) was shown to perform with greater than 99% specificity and with sensitivities ranging from 69 to 98% for the detection of five cancer types (ovarian, liver, stomach, pancreas, and esophageal) for which there are no current screening tests available for average-risk individuals [5]. In addition, a noninvasive blood test based on circulating tumor DNA methylation (PanSeer) was reported to be able to detect cancer up to four years before standard diagnosis in a longitudinal study [6].

Schneider et al. validated a predictive model generated by a machine-learning algorithm that used complete blood cell count and demographic data from individuals aged 50–75 years with the purpose of identifying individuals at increased risk for colorectal cancer. At a specificity of 97%, corresponding to a high score from the developed algorithm, they obtained a sensitivity of 35.4% for a colorectal cancer diagnosis within the next 6 months and an area under the receiver operating characteristics curve (AUC) of 0.78 [7].

Thus, routine laboratory test results may contain far more information than recognized by even the most experienced clinician, and the detection of such non-obvious interrelationships is suitable for analysis by artificial intelligence in order to provide individual risk scores.

In January of 2008, Lillebaelt Hospital introduced a gender-specific analytical profile based on routine laboratory tests to be used in the primary care setting by general practitioners (GPs) as an additional tool for patients with non-specific symptoms to identify individuals at increased risk of cancer.

In the current study, we evaluate the ability of an AI model to provide an individual cancer risk score based on these routine laboratory tests. In addition, the risk scores obtained in the AI model are compared to results obtained by standard logistic regression (LR).

Materials and methods

Study population and laboratory tests

The uptake population area is located in the Region of Southern Denmark, with around 350,000 inhabitants served by 106 GPs. In a joint collaboration between the local Clinical Biochemistry laboratory at the Lillebaelt Hospital and the GPs, a specific analytical profile containing routine blood tests was provided as an additional tool in the GPs' diagnostic arsenal, meant for patients consulting their physician with common or non-specific symptoms and where the GP suspected possible hidden cancer. As an initiative prompted by Denmark's third national cancer plan, the urgent referral for unspecific, serious symptoms was implemented nationally by the National Board of Health and Danish Regions in 2011. The pathway consists of a two-step approach with a filter function performed by the GP and, if still relevant, a referral to a diagnostic center. The filter function is a battery of diagnostic investigations consisting of anamnesis, blood and urine tests and diagnostic imaging. It is this predefined routine laboratory set of blood tests that is the subject matter of our study.

The GP could order an analytical profile labeled “Suspicion of Hidden Cancer/Woman” or “Suspicion of Hidden Cancer/Man”. Thus, the set of blood tests was drawn from patients where no obvious tentative diagnoses for a specific cancer or other diseases were identified by the GP.

The analytical profile was introduced in January 2008 and consisted of the following components:
– In both men and women: B-hemoglobin, mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), B-leukocytes with differential count, B-reticulocytes, B-platelets, P-C-reactive protein, P-sodium, P-potassium, P-calcium total, P-albumin, P-creatinine, P-carbamide, P-urate, P-glucose, P-bilirubin, P-alanine transaminase, P-basic phosphatase, P-amylase pancreatic specific, P-lactate dehydrogenase, P-immunoglobulins A, G and M (IgA, IgG and IgM), P-thyroidea stimulating hormone
– In men, in addition: P-prostate-specific antigen
– In women, in addition: P-cancer antigen-125

We did the leukocyte subpopulation quantifications on the Sysmex hematology systems, and they were quantified as total leukocytes, neutrophils, eosinophils, basophils, lymphocytes and monocytes.

During the whole study period, we have used equipment from Roche for routine biochemistry analysis and from Sysmex for the hematology instruments. However, for the Roche instruments, there have been both instrument and methodological upgrades in the period. According to our routine procedures for continuous quality assurance, the validation of each laboratory component was performed, including the investigation of a potential bias between the previous and the current modules. Thus, the upgrades in instrument and/or methodology did not have an impact on the results reported here.

Study cohort

Due to changes in the laboratory information system, only data after November 29th 2011 were available. The total eligible study cohort included 6,592 consecutive analytical profiles ordered from GPs: 5,224 were included from November 29th 2011 until December 31st 2018 (Cohort I) and 1,368 from January 1st 2019 until March 1st 2020 (Cohort II).

The following exclusion criteria were applied: individuals <18 years of age; individuals without information on gender; patients diagnosed with cancer within the last 5 years. For individuals with more than one analytical profile ordered prior to being diagnosed with cancer, only the first was included for data analysis. Finally, cases with too few laboratory test results within the analytical profile were also excluded (initial analytical profiles with ≤10 laboratory test results, and again in the final model analytical profiles with ≤30 laboratory test results).

A presentation of the total study cohort is provided in Figure 1.

Outcome measures

Any cancer diagnosed within 730 days of follow-up after study inclusion was registered. All types of cancer were included (both primary and metastatic).

The main outcome measure in this study was “cancer within 90 days”, which we considered a sufficient follow-up period to ensure valid data regarding registered cancer diagnoses for Cohort II as well.

Ethics

Project approval, data transfer and data safety: The project was carried out by permission from the Danish Board for Patient Safety (number 3-3013-2954/1) and Hospital Lillebaelt. All data were anonymized and transferred to the “in house” SAS Viya platform. No data left the hospital regional database.

Data management and statistics

AI model development was performed based on Cohort I, and final validation of the model was performed in the “out of time” validation test Cohort II. Cohort I was divided into training and validation sections. This splitting was done outside of the modelling processes to ensure that all models used the same criteria for this procedure, thereby allowing for a comparison of performance results. The data was assigned to training and validation sections by an 80/20 split using stratified random sampling. The 80/20 split was chosen to permit as many training observations as possible, as machine learning techniques are “data hungry” and Cohort I contained only 5,224 observations.

Cohort IA for training consisted of 4,182 analytical profiles whileCohort IB for validation consisted of 1,042 profiles.
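The stratified 80/20 split described above can be sketched as follows. This is an illustrative Python sketch with simulated data, using scikit-learn as a stand-in for the SAS Viya tooling actually used in the study; the variable names and event rate are assumptions for illustration only.

```python
# Illustrative sketch of an 80/20 stratified random split on simulated
# data (not the authors' SAS implementation).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5224, 25))          # 25 routine blood tests per profile
y = rng.binomial(1, 0.056, size=5224)    # ~5.6% "cancer within 90 days"

# Stratifying on the outcome keeps the event rate similar in both sections.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
```

Stratification matters here because the event is rare; a plain random split could leave the small validation section with a noticeably different cancer rate.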

Transformations: Data was transformed in the manner below to accommodate the types of models that are incapable of handling missing values and are sensitive to variables of different scales:
a. Missing biomarkers were imputed/replaced by their mean value (from the training section).
b. All input variables were z-normalized (subtraction of the sample mean, division by the sample standard deviation, both statistics estimated on the training section).
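A minimal sketch of transformations (a) and (b), assuming a NumPy representation with NaN marking missing biomarkers; the actual implementation ran on SAS Viya, so the helper names here are hypothetical.

```python
# Minimal sketch: mean imputation and z-normalization, with both
# statistics estimated on the training section only (assumed NumPy
# representation; not the authors' SAS code).
import numpy as np

def fit_transform_stats(X_train):
    # nan-aware statistics so missing values do not distort the estimates
    return np.nanmean(X_train, axis=0), np.nanstd(X_train, axis=0)

def apply_transforms(X, mean, std):
    X = np.where(np.isnan(X), mean, X)   # (a) impute missing by training mean
    return (X - mean) / std              # (b) z-normalize with training stats

X_train = np.array([[1.0, np.nan],
                    [3.0, 4.0],
                    [5.0, 6.0]])
mean, std = fit_transform_stats(X_train)
Z = apply_transforms(X_train, mean, std)  # the imputed cell becomes exactly 0
```

Applying `apply_transforms` to the validation section with the training `mean` and `std` keeps both sections on the same footing, as the text specifies.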

The implemented code allowed for imputation and transformations to be gender specific. At first, a gender-specific model development was performed, including PSA for males and Ca125 for females. A drawback with this approach, however, was that our cohorts, both the training and validation, were rather small for the purpose. Thus, a full model was also developed, without regard to gender. The full model performed as well as the gender-specific models and was therefore chosen as the final model.

AI model selection: AI model selection was performed on SAS® Viya® (V.03.04, Denmark). All model parameters were estimated in the training section, while performance measures were obtained from the validation section. The main criterion for model selection was the area under the receiver operating characteristics curve (AUC/ROC) measured on the validation section for both genders.

Determining the AI models: It was decided to evaluate two initial strategies for model development prior to determining which one to choose for further fine-tuning:
a. A random forest based method, which has the advantage of frequently allowing good results with a minimum of imputations and transformations, as this model handles missing values explicitly without being affected by variable scale or magnitude.
b. A neural network model, which requires imputation and a standardization method, but allows for more flexibility in the modelling process.

Both model types were tested in order to determine which method seemed most promising. As the first round of modelling indicated a greater fit for the neural network model, this was selected for further fine-tuning. Thus, multiple hyperparameters were tested in order to obtain the best possible AUC in the neural network model.
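The two-strategy comparison can be illustrated as follows; this sketch uses scikit-learn stand-ins (`RandomForestClassifier`, `MLPClassifier`) on synthetic imbalanced data, since the actual models were built in SAS Viya and their hyperparameters are not given here.

```python
# Illustrative sketch: compare a random forest and a neural network by
# validation AUC, as in the model-selection step (scikit-learn stand-ins
# on synthetic data, not the SAS Viya models actually used).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data with ~6% positives, roughly mirroring the event rate.
X, y = make_classification(n_samples=2000, n_features=25, weights=[0.94],
                           random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

aucs = {}
for name, model in [
    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("neural network", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                     random_state=0)),
]:
    model.fit(X_tr, y_tr)
    # AUC on the held-out validation section drives the model choice.
    aucs[name] = roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])
```

In the study, the candidate with the higher validation AUC (the neural network) was then taken forward for hyperparameter tuning.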

Applying the bootstrap procedure: The bootstrap procedure was applied with the purpose of evaluating the imprecision of the methodology through the following procedure:
(1) A number of groups were created based on the assigned score from the model. Each score belongs to one group.
(2) A bootstrap procedure was used to estimate the event rate distribution within each data partition and group as described below. For each partition the following procedures were performed:
a. All blood samples in the partition were scored and assigned to the relevant group by comparing the individual score to the cutoffs of the groups.
b. From the partition, one thousand samples (with a size equal to the partition size) were drawn with replacement (i.e. 1,000 bootstrap replicates). The bootstrap samples were generated as balanced bootstrap samples, i.e. each original observation is represented the same number of times in the final samples.
c. For each replicate, the empirical event rate in each group was estimated.
d. For each group, the distribution of the 1,000 samples of the event rate was used to compute the mean and relevant fractiles of the distribution.

The end goal of the procedure was to obtain group estimates for the empirical event rate on the test and validation sections.
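Steps (b)–(d) can be sketched as follows for a single risk group. This is a hedged illustration with invented data (the study ran on SAS Viya); the balanced resampling, in which each observation is used the same number of times across all replicates, is implemented by tiling and shuffling the indices.

```python
# Sketch of a balanced bootstrap for the event rate in one risk group
# (invented data; helper name is hypothetical).
import numpy as np

def balanced_bootstrap_event_rate(events, n_reps=1000, seed=0):
    """events: 0/1 outcomes for the blood samples in one risk group."""
    rng = np.random.default_rng(seed)
    n = len(events)
    # Concatenate n_reps copies of the indices, shuffle, and cut into
    # n_reps samples of size n: every observation appears exactly
    # n_reps times overall, making the bootstrap balanced.
    pool = np.tile(np.arange(n), n_reps)
    rng.shuffle(pool)
    idx = pool.reshape(n_reps, n)
    rates = events[idx].mean(axis=1)   # (c) empirical event rate per replicate
    # (d) mean and relevant fractiles of the event-rate distribution
    return rates.mean(), np.percentile(rates, [5, 95])

events = np.array([0] * 96 + [1] * 4)  # a group with a 4% event rate
mean_rate, (p5, p95) = balanced_bootstrap_event_rate(events)
```

Because the design is balanced, the mean event rate over the replicates equals the observed group rate exactly; the fractiles quantify the imprecision around it.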

Standard statistical analyses: Standard statistical analyses were performed on the SAS platform; logistic regression (LR) was performed for the standard ROC curve determination on the development Cohort I (training Cohort IA and validation Cohort IB) and on the “out of time” validation test Cohort II, using “cancer within 90 days” as the target.

Figure 1: Total study cohort. The three cohorts of the study are shown.

Test performance presentation: With the purpose of facilitating a better understanding of the results for the clinicians, data was also calculated to generate sensitivity, specificity and predictive values according to a model described in the work of Gerhardt et al. [8]. These calculations were done for both the AI model and the LR model.

The AI risk score was generated by machine learning in a supervised process using the outcome data. The relative risk was determined on an arbitrary risk scale from 0 to 100. Based on the observed absolute risk of being diagnosed with cancer within 90 days, this score was converted to an absolute risk for cancer. The performance of the AI score in predicting cancer within 90 days was calculated for the “out of time” validation test Cohort II at different thresholds. The formula TP/(TP + FP) was used for calculation of the risk at the threshold for a positive test, where TP is true positive and FP is false positive. The formula FN/(FN + TN) was used for calculation of the risk at the threshold for a negative test, where FN is false negative and TN is true negative.
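These calculations can be sketched with a small helper of my own (not code from the study), using invented scores and outcomes:

```python
# Sketch: sensitivity, specificity, PPV and NPV at a given AI-score
# threshold for a positive test (invented example data).
def performance_at_threshold(scores, outcomes, threshold):
    tp = sum(1 for s, o in zip(scores, outcomes) if s > threshold and o == 1)
    fp = sum(1 for s, o in zip(scores, outcomes) if s > threshold and o == 0)
    fn = sum(1 for s, o in zip(scores, outcomes) if s <= threshold and o == 1)
    tn = sum(1 for s, o in zip(scores, outcomes) if s <= threshold and o == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "PPV": tp / (tp + fp),  # TP/(TP + FP): risk in the test-positive group
        "NPV": tn / (tn + fn),  # 1 - FN/(FN + TN): complement of the
                                # negative-test risk
    }

scores = [1, 2, 4, 8, 15, 40, 60, 90]   # hypothetical AI scores
outcomes = [0, 0, 0, 0, 1, 0, 1, 1]     # cancer within 90 days (0/1)
m = performance_at_threshold(scores, outcomes, threshold=10)
```

Note that the PPV at a positive-test threshold is exactly TP/(TP + FP), i.e. the absolute risk in the group scoring above the cut-off, which is how the absolute-risk columns in the tables can be read.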

Results

The frequency of “cancer within 90 days” of study inclusion was 5.67% in training Cohort IA, 5.28% in validation Cohort IB and 6.14% in validation test Cohort II.

The frequency of the 22 most common cancer typesdiagnosed in the total Cohort (IA, IB and II) within 90 daysof study inclusion is presented in Table 1.

AI ROC curves for training and validation in Cohort Iwith the primary outcome “cancer within 90 days” arepresented in Figure 2.

The AUC results obtained by the AI analysis in Cohort I (both the training Cohort IA and the validation Cohort IB) and in the “out of time” validation test Cohort II are presented in Table 2 (training Cohort IA, validation Cohort IB and validation test Cohort II, overall and by gender; target: cancer within 90 days).

The bootstrap analysis performed in both the validation Cohort IB and the validation test Cohort II shows risk categories indicated as very low, low, medium, high and very high, together with the estimated uncertainty of the results (Table 3).

The distribution of patients across AI risk scores obtained in the validation test Cohort II is presented in Figure 3A, together with the observed incidence of cancer within 90 days in Figure 3B.

Prediction scores for the AI model covering a range of scores from 0 to 100 were calculated, and data obtained at different thresholds in the validation Cohort IB are presented in Table 4.

For each AI score, data are presented with the corresponding results regarding true positive number (TP(n)), false negative number (FN(n)), true negative number (TN(n)), false positive number (FP(n)), sensitivity %, specificity %, positive predictive value (PPV) % and negative predictive value (NPV) %.

Similarly, the developed AI model was applied to the “out of time” validation test Cohort II. The absolute risk of having a cancer diagnosis within 90 days at different thresholds according to the AI score is provided in Table 5.

Standard statistical analysis

The AUC obtained by the LR analysis was 0.80, 0.80 and 0.79 for the training Cohort IA, the validation Cohort IB and the “out of time” validation test Cohort II, respectively. Data is presented in Figure 4.

Data regarding sensitivity, specificity, PPV and NPV at different scores calculated from the LR method is presented in Table 6 for the validation Cohort IB and in Table 7 for the validation test Cohort II.

Discussion

This study demonstrates that the developed AI model based on routine laboratory tests from GPs in a primary care setting can provide a specific risk score for the prediction of cancer comparable to, and in some respects slightly better than, standard statistical measures such as logistic regression (LR).

Table 1: Cancer types diagnosed within 90 days of study inclusion. The types diagnosed in the total cohort were: prostate cancer; cancer of the upper lobe of the lung; lung cancer; multiple myeloma; ovarian cancer; prostate cancer with metastases; diffuse large B-cell lymphoma; cancer of the sigmoid colon; rectal cancer; cancer of the bronchi and lung spanning multiple localizations; cancer of the ascending colon; cancer of the lower lobe of the lung; breast cancer; B-cell type chronic lymphocytic leukemia (B-CLL); hepatocellular carcinoma; kidney cancer; bladder cancer; remote metastasis of bone or bone marrow; uterine cancer; pancreatic cancer; cancer of the gastric cardia; stomach cancer.

The AI model provided slightly better results than the LR analysis when looking at the ROC curves obtained in the validation Cohort IB. In this case, the AI model had an AUC of 0.86 compared to 0.80 obtained in the LR model. In the “out of time” validation test Cohort II, however, the obtained AUC results were comparable, with an AUC for the AI model of 0.79 compared to 0.79 for the LR model. A reason for the lower AUC obtained with the AI model in the validation test Cohort II than in the validation Cohort IB may be that the model was “overfitted” with regard to the training and validation sections. Though the validation Cohort IB was not used to train the neural network, it was used in the selection of neural networks.

The corresponding results, when comparing the two models' performance on PPV, specificity, sensitivity and NPV (data presented in Tables 4–7), confirmed that results from the AI model turned out to be slightly better than the LR model in the validation Cohort IB. However, when applied to the “out of time” validation test Cohort II, results obtained with the AI model were comparable to results based on the LR model. As mentioned above, this is probably due to “overfitting” in the AI model with regard to results in the validation Cohort IB.

This challenge with “overfitting” has been demonstrated in a multitude of previous studies, where AI models provided good PPV within the dataset from which they were derived, but underperformed when applied to an external validation cohort [9].

This demonstrates and underscores the general importance of performing internal and external validation of the obtained results. In the current study, we performed a validation of the developed model in the “out of time” validation test Cohort II in order to overcome the challenge with overfitting.

Figure 2: The AUC results obtained by the AI analysis in training Cohort IA and validation Cohort IB. ROC curves obtained in the development cohort, with the training Cohort IA (left) and the validation Cohort IB (right). Green: males, red: females, blue: both genders.

Table 2: The AUC results obtained by the AI model (training Cohort IA, validation Cohort IB and the “out of time” validation test Cohort II, overall and by gender: female and male). Cohort I constituted a training Cohort IA and a validation Cohort IB; Cohort II served as the “out of time” validation test cohort. The outcome measure was individuals being diagnosed with cancer within 90 days.

Figure 3: AI risk score and cancer incidence in validation test Cohort II. (A) The upper panel shows the distribution of patients across AI scores in validation test Cohort II: most patients had an AI score below 1, and only a total of 31 patients had an AI score above 50. (B) This panel provides the incidence of cancer diagnosed within 90 days across AI risk scores. The incidence of cancer within 90 days in validation test Cohort II was found to be below 12% at an AI score value between 4 and 5 and up to 100% for AI scores from 70 to 90. However, only a limited number of patients in this rather small cohort had high AI scores; for example, there were no patients with an AI score between 7 and 8 or between 40 and 45.

Bootstrap estimates, performed with the purpose of evaluating the imprecision of the AI methodology, also allow for a comparison of risks between results obtained in the validation Cohort IB and in the validation test Cohort II. For example, a patient's blood panel result classified as medium risk corresponded to a mean risk of 3.35% (5–95% fractile 1.80–5.13) in the validation Cohort IB and of 4.35% (5–95% fractile 2.91–5.95) in the validation test Cohort II.

Presenting data in an AUC format is not optimal, noris presenting risk scores divided into categories (low,medium, high) as they may be regarded as too imprecisewhen it comes to clinical decisionmaking in the individualpatient. Presenting absolute risk as indicated in Table 5may be more useful and easier to interpret, as it provideinformation on the performance of the AI score to predictcancer at a range of different thresholds. In this work, wedo not recommend a specific threshold to define when atest is positive or negative, but instead provide the infor-mation on risk stratification to be used in clinical decisionmaking in the individual patient. As an example, if an AIscore of ≤3 was hypothetically considered as the threshold

Table : Bootstrap estimates on grouped risk bins on validation Cohort IB and validation test Cohort II.

Bootstrap estimates of event rate fractiles

Data partition Bin % fractile % fractile Mean value % fractile % fractile

Cohort IB . Very low (–) .% .% .% .% .%Cohort IB . Low (–) .% .% .% .% .%Cohort IB . Medium (–) .% .% .% .% .%Cohort IB . High (–) .% .% .% .% .%Cohort IB . Very high (+) .% .% .% .% .%Cohort II . Very low (–) .% .% .% .% .%Cohort II . Low (–) .% .% .% .% .%Cohort II . Medium (–) .% .% .% .% .%Cohort II . High (–) .% .% .% .% .%Cohort II . Very high (+) .% .% .% .% .%

Table : Performance of the AI score ranging from to in the validation Cohort IB.

AI score TP, n FN, n TN, n FP, n Sensitivity, % Specificity, % PPV, % NPV, %

. . . NA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . NA .

8 Soerensen et al.: Using AI to identify patients at risk for cancer

Page 9: Patricia Diana Soerensen*, Henry Christensen, Soeren Gray

Table 5: Performance of the AI score to predict cancer within 90 days at different thresholds in the validation test Cohort II.

Columns: cut-off; cohort number, %; cancer number, %; TP; TN; FP; FN; sensitivity, %; specificity, %; PPV, %; NPV, %. Thresholds for a negative test (≤ score) and for a positive test (> score) are tabulated separately. [Numeric entries are not recoverable from the extracted text.]

FN, false negative; FP, false positive; NPV, negative predictive value; PPV, positive predictive value; TN, true negative; TP, true positive. The absolute risk of having a cancer diagnosed within 90 days at different thresholds according to the AI score is provided in the table, both for the group of patients below (or above) a given threshold and for patients at a single score value.


for a negative test, the risk for having a cancer within 90 days would be 1.2% for a patient having a score of 2 and 1.1% at a score of 1. However, a patient having an AI score of eight will be classified as having a positive test, and in the current study this patient will have a 19% absolute risk of being diagnosed with cancer within 90 days (Table 5).
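The performance measures tabulated at each score threshold follow directly from the confusion-matrix counts. A minimal sketch; the function and the example counts are illustrative, not taken from the study:

```python
def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard test-performance measures derived from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # proportion of cancers correctly flagged
        "specificity": tn / (tn + fp),  # proportion of non-cancers correctly cleared
        "ppv": tp / (tp + fp),          # absolute risk of cancer given a positive score
        "npv": tn / (tn + fn),          # probability of no cancer given a negative score
    }

# Hypothetical counts for illustration only
m = confusion_metrics(tp=80, fn=20, tn=700, fp=200)
print({k: round(v, 3) for k, v in m.items()})
```

Lowering the positive threshold moves patients from FN to TP (raising sensitivity) and from TN to FP (lowering specificity and PPV), which is the trade-off the threshold tables describe.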

Thus, the developed AI model can provide a predictive score, identifying patients with an increased risk of having cancer in addition to identifying those patients who may face a minor risk of having cancer. In this way, the score may serve as a supplementary source of information for the GP's overall assessment and clinical decision making regarding further diagnostic work-up and follow-up strategy for the individual patient.

However, even at low AI risk scores, there is a risk of overlooking patients with cancer. This is not surprising bearing in mind that the patient has already consulted the GP due to symptoms. Because the GP has decided that the patient needs further examinations via the "Suspicion of Hidden Cancer" pathway, it may be assumed that a heightened risk of cancer already exists, based on the GP's choice of referring the patient. This is in accordance with the work of Watson et al. [4], which concludes that blood tests in primary care such as hemoglobin, platelets, serum calcium, liver function tests and inflammatory markers may indicate cancer in patients with non-specific symptoms but cannot rule out the presence of cancer.

To provide context, we have compared the performance of the developed AI model with that of the immunochemical faecal occult blood test (iFOBT) used in the Danish screening program for colorectal cancer, which has an estimated sensitivity of about 79% and a PPV in the range of 3–8% [10]. Of those participating in the Danish screening program, 6–7% of citizens present a value above the positive threshold and are offered a colonoscopy, where between 3 and 8% will be diagnosed with colorectal cancer. In comparison, in our developed AI model applied to data from the validation test Cohort II, a patient with a positive AI score of three will have a 10.7% risk of being diagnosed with cancer within 90 days. At an AI score of 3 we found a sensitivity of 82% and a specificity of 55% as

Figure 4: ROC curve. Target "cancer within 90 days" with the LR method. Cohort IA represents the training cohort; Cohort IB represents the validation cohort; Cohort II represents the validation test cohort.

Table ▪: LR model score at each score threshold in the validation Cohort IB.

Columns: LR score; TP, n; FN, n; TN, n; FP, n; sensitivity, %; specificity, %; PPV, %; NPV, %. [Numeric entries are not recoverable from the extracted text.]


presented in Table 5. A recent paper using DNA methylation patterns for early detection of cancer yielded sensitivities of 18% for stage I cancers and 43% for stage II [11].

A routine set of blood tests has previously been studied by Naeser et al. [12], but from a different perspective. Naeser et al. analyzed the results from 1,499 patients, of which 12.2% were subsequently diagnosed with cancer, and discovered that the probability of cancer increased with the number of test results outside the reference range or with specific combinations of abnormal test results. However, just counting the number of test results outside the reference range is insufficient in this context, as it has been shown that even changes within the normal range may indicate increased risk for cancer. Thus, a reduced P-Albumin concentration and an increased platelet number in blood increase the risk for cancer in a concentration-dependent way.
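The counting approach discussed above can be sketched as follows; the test names, reference intervals and example profile are hypothetical, not the study's. The example illustrates why a pure count misses shifts that stay within the normal range:

```python
# Illustrative reference intervals (not the study's actual limits)
REFERENCE_INTERVALS = {
    "albumin_g_L": (34.0, 45.0),
    "platelets_1e9_L": (145.0, 390.0),
    "hemoglobin_mmol_L": (7.3, 9.5),
}

def count_abnormal(results: dict) -> int:
    """Number of test results falling outside their reference interval."""
    return sum(
        1
        for test, value in results.items()
        if not (REFERENCE_INTERVALS[test][0] <= value <= REFERENCE_INTERVALS[test][1])
    )

# Albumin low-normal and platelets high-normal count as zero abnormalities,
# even though both within-range shifts may signal increased cancer risk.
profile = {"albumin_g_L": 34.5, "platelets_1e9_L": 385.0, "hemoglobin_mmol_L": 8.0}
print(count_abnormal(profile))  # 0
```

A model operating on the continuous values, as the AI and LR models do, can exploit exactly these within-range shifts.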

In addition, a recent study addresses the issue of combining simple blood tests to identify primary care patients with unexpected weight loss for cancer investigation. They found that combinations of simple blood test abnormalities could be used to identify patients with unexpected weight loss who warrant referral for investigation, while people with combinations of normal results could be exempted from referral [13].

Our data show that both AI and LR models can be used to calculate a predictive score for being diagnosed with cancer within 3 months. This risk can be used by the GP in the overall risk assessment, together with the other information obtained from the anamnesis and objective examination. The GP may recommend a faster investigation for patients with a score corresponding to a high risk, while a watchful waiting strategy may be used for patients with a score corresponding to a low risk.
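The study does not publish the fitted coefficients, but an LR risk score of the kind compared here reduces to a sigmoid of a weighted sum of (standardized) laboratory values. A minimal sketch with hypothetical features and weights:

```python
import math

def lr_risk_score(features: dict, coefficients: dict, intercept: float) -> float:
    """Logistic-regression risk: sigmoid of a linear combination of
    standardized laboratory values. All weights here are hypothetical."""
    linear = intercept + sum(coefficients[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-linear))

# Illustrative standardized inputs and weights, for demonstration only:
# lower albumin and higher platelets/CRP push the risk upward.
coef = {"albumin_z": -0.6, "platelets_z": 0.4, "crp_z": 0.5}
risk = lr_risk_score(
    {"albumin_z": -1.2, "platelets_z": 0.9, "crp_z": 1.5}, coef, intercept=-2.2
)
print(f"predicted 90-day risk: {risk:.1%}")
```

The resulting probability is directly interpretable as the absolute risk the GP can weigh against a chosen threshold, which is what makes both model families usable as triage scores.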

It is intuitively easy to use the personal absolute cancer risk as a percentage rate to prioritize those patients with the highest risk vs. those with a much lower risk, already at the outset of the diagnostic process.

Finally, future research may determine whether LR in itself is sufficiently robust to be used as a risk assessment tool, given that one then does not have to adjust an AI algorithm and there is thus no "requirement" for a standardized set of blood tests, if the initiative is to be scaled to other analytical profiles with a different set of blood tests.

Strengths and limitations of this study

The study cohorts in the current study are well defined. A further strength is the validation of the developed model in the "out of time" validation test Cohort II in order to overcome the challenge of overfitting. In addition, the use of routine laboratory tests available for primary care increases its clinical applicability, avoiding the requirement for laboratory tests often only available in a specialized hospital setting.
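"Out of time" validation partitions the profiles by order date rather than at random, so the test cohort is temporally separate from model development. A sketch using the cutoff stated in the study (Cohort I up to December 31st 2018, Cohort II from January 1st 2019); the record layout is illustrative:

```python
from datetime import date

CUTOFF = date(2019, 1, 1)  # study cutoff between Cohort I and Cohort II

def split_out_of_time(profiles: list[tuple[date, dict]]):
    """Partition analytical profiles by order date: earlier profiles form the
    development cohort, later ones the temporally held-out test cohort."""
    cohort_i = [p for p in profiles if p[0] < CUTOFF]
    cohort_ii = [p for p in profiles if p[0] >= CUTOFF]
    return cohort_i, cohort_ii

# Illustrative order dates only
orders = [(date(2015, 6, 1), {}), (date(2018, 12, 31), {}), (date(2019, 3, 5), {})]
dev, test = split_out_of_time(orders)
print(len(dev), len(test))  # 2 1
```

Unlike a random split, this guards against optimistic estimates caused by temporal drift in test ordering or patient mix leaking into validation.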

Our study has limitations. Firstly, the study population is relatively small. Therefore, the study was performed with a full model without considering gender. However, in future studies, gender-specific models will be more relevant. In addition, it is a retrospective study, and a prospective design with a reassessment of the set of blood tests would probably strengthen the model. Furthermore, we did not assess which single blood test from the analytical profile had the greatest value in detecting cancer, and this needs to be done in future studies. Finally, prior to a general clinical implementation it is crucial that the score is further validated to determine its applicability in other populations.

This work did not include demographic risks, such as the lifestyle or social conditions of patients, as its main purpose was a proof-of-concept study as to whether laboratory tests by themselves contained sufficient valuable intrinsic information to be robust when used for AI data processing to

Table ▪: LR model score at each score threshold in the validation test Cohort II.

Columns: LR score; TP, n; FN, n; TN, n; FP, n; sensitivity, %; specificity, %; PPV, %; NPV, %. [Numeric entries are not recoverable from the extracted text.]


calculate the absolute risk of cancer. This, however, is a prerequisite if laboratory test results are to be used in combination with other kinds of data to enable even more comprehensive risk prediction models, thereby ultimately leading to a better assessment of the patient's risk of having cancer in a low-prevalence setting such as primary care.

Conclusions and perspectives

The current study demonstrates the ability to develop an AI model based on routine laboratory blood tests which is able to provide an easy-to-use risk score to predict cancer within 90 days. The use of laboratory tests widely available for primary care increases its clinical applicability, and the AI risk score may prove to be a valuable tool in clinical decision-making, supporting the GP in triaging whether a patient needs faster further investigation or should instead follow a watchful waiting strategy.

The AI score, however, needs further external validation to determine its applicability in other populations.

A future improvement of the AI risk score may be obtained by further development and customization of the panel of included routine laboratory blood tests. In addition, improvements may be obtained by using gender-specific models and by combining laboratory test results and demographic data in future prediction models.

Research funding: This project was financed by the Health Authorities of the Region of Southern Denmark.

Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

Competing interests: Christian Hardahl is an employee (Subject Matter Expert) at the company from which SAS Viya was purchased.

Informed consent: This project was performed retrospectively, without patient contact in connection with the project, after permission from the Danish Board for Patient Safety. The project was performed according to the Danish Health Law § 42. No patient data were analysed outside the legal borders of the Region of Southern Denmark Health System.

Ethical approval: The project was carried out by permission from the Danish Board for Patient Safety (number 3-3013-2954/1) and Hospital Lillebaelt.

References

1. Shipe ME, Deppen SA, Farjah F, Grogan EL. Developing prediction models for clinical use using logistic regression: an overview. J Thorac Dis 2019;11:S574–84.

2. Fei Y, Li WQ. Improve artificial neural network for medical analysis, diagnosis and prediction. J Crit Care 2017;40:293.

3. Sud A, Torr B, Jones ME, Broggio J, Scott S, Loveday C, et al. Effect of delays in the 2-week-wait cancer referral pathway during the COVID-19 pandemic on cancer survival in the UK: a modelling study. Lancet Oncol 2020;21:1035–44.

4. Watson J, Mounce L, Bailey SE, Cooper SL, Hamilton W. Blood markers for cancer. BMJ 2019;367:l5774.

5. Cohen JD, Li L, Wang Y, Thoburn C, Afsari B, Danilova L, et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 2018;359:926–30.

6. Chen X, Gole J, Gore A, He Q, Lu M, Jun M, et al. Non-invasive early detection of cancer four years before conventional diagnosis using a blood test. Nat Commun 2020;11:3475.

7. Schneider JL, Layefsky E, Udaltsova N, Levin TR, Corley DA. Validation of an algorithm to identify patients at risk for colorectal cancer based on laboratory test and demographic data in a diverse, community-based population. Clin Gastroenterol Hepatol 2020;18:2734–41.e6.

8. Gerhardt W, Keller H. Evaluation of test data from clinical studies. I. Terminology, graphic interpretation, diagnostic strategies, and selection of sample groups. II. Critical review of the concepts of efficiency, receiver operated characteristics (ROC), and likelihood ratios. Scand J Clin Lab Invest Suppl 1986;181:1–74.

9. Roelofs R, Shankar V, Recht B, Fridovich-Keil S, Hardt M, Miller J, et al. A meta-analysis of overfitting in machine learning. Advances in Neural Information Processing Systems (NeurIPS). Vancouver, Canada; 2019;32.

10. Robertson D, Lee J, Boland C, Dominitz J, Giardiello F, Johnson D, et al. Recommendations on fecal immunochemical testing to screen for colorectal neoplasia: a consensus statement by the US Multi-Society Task Force on colorectal cancer. Gastrointest Endosc 2017;152. https://doi.org/10.1038/ajg.2016.492.

11. Liu MC, Oxnard GR, Klein EA, Swanton C, Seiden MV, CCGA Consortium. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann Oncol 2020;31:745–59.

12. Naeser E, Moeller H, Fredberg U, Frystyk J, Vedsted P. Routine blood tests and probability of cancer in patients referred with nonspecific serious symptoms: a cohort study. BMC Cancer 2017;17:817.

13. Nicholson BD, Aveyard P, Koshiaris C, Perera R, Hamilton W, Oke J, et al. Combining simple blood tests to identify primary care patients with unexpected weight loss for cancer investigation: clinical risk score development, internal validation, and net benefit analysis. PLoS Med 2021;18:e1003728.
