

Post on 27-Jan-2021


  • A Deep Learning Algorithm to Quantify Liver Fat Content in Humans

    S. Qadri (1,2), L. Ahtiainen (1,2), S. Boyd (3), P. Luukkonen (1,2), A. Juuti (4), H. Sammalkorpi (4), V. Männistö (5), J. Pihlajamäki (6,7), V. Kärjä (8), K. Pitkänen (9), J. Lundin (9), J. Arola (3), H. Yki-Järvinen (1,2)

    (1) Department of Medicine, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    (2) Minerva Foundation Institute for Medical Research, Helsinki, Finland
    (3) Department of Pathology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    (4) Department of Surgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
    (5) Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
    (6) Department of Clinical Nutrition, Faculty of Health Sciences, Institute of Public Health and Clinical Nutrition, University of Eastern Finland, Kuopio, Finland
    (7) Clinical Nutrition and Obesity Centre, Kuopio University Hospital, Kuopio, Finland
    (8) Department of Pathology, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
    (9) Aiforia Technologies Oy, Helsinki, Finland

    Background & Aims

    Deep learning (DL) algorithms are computational paradigms that are inspired by the biological function of neurons [1]. DL algorithms are powerful tools for automatic image analysis [2]. In histological diagnosis and classification of liver disease, visual evaluation by a hepatopathologist is considered to be the gold standard. Observer-related factors are well known to cause significant variability in pathologists’ evaluations [3-6]. There is a need for observer-independent methods for accurate, rapid and automated quantification of liver histology.

    We determined whether DL can be used to automatically quantify hepatic steatosis in human liver biopsies. We developed and validated a DL algorithm to analyse liver histology using the Aiforia™ platform in a large cohort of liver biopsies, and compared the algorithm’s performance against human observers.

    Metric       %      Formula
    Precision    96.8   TP/(TP+FP)
    Recall*      89.8   TP/(TP+FN)
    F1-score†    93.1   2*P*R/(P+R)

    TP, true positive; FP, false positive; FN, false negative; P, precision; R, recall (*sensitivity)
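    The formulas in the table can be checked with a few lines of Python. This is a minimal sketch, not the authors' evaluation code: the function names and the example counts (90 TP, 3 FP, 10 FN) are made up for illustration, and only the 96.8 % and 89.8 % figures come from the poster.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of detected droplets that are real: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of real droplets that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: 2*P*R / (P + R)."""
    return 2 * p * r / (p + r)


# Hypothetical counts, for illustration only (not from the poster).
example_p = precision(tp=90, fp=3)
example_r = recall(tp=90, fn=10)

# Reported values from the table, as fractions.
reported_p = 0.968
reported_r = 0.898
f1 = f1_score(reported_p, reported_r)
print(f"F1 from reported precision and recall: {100 * f1:.1f} %")
```

    Recomputing from the rounded precision and recall gives about 93.2 %, which agrees with the reported 93.1 % to within the rounding of the inputs.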

    Algorithm recognises lipid droplets with high sensitivity and precision in comparison to manual human counting

    LIVER BIOPSIES FROM BARIATRIC SURGERY PATIENTS (n = 668)

    Training cohort: n = 561
    Validation cohort: n = 107

    Patient characteristics
    Age (years)      48.6 ± 0.3
    Females (%)      71.8
    BMI (kg/m²)      42.7 ± 0.3
    Liver fat (%)    10 (0-30)
    NAFLD (%)        67.6
    NASH (%)         12.4
    Data are in %, mean ± SEM or median (IQR).

    The deep learning algorithm automatically segments hepatic parenchyma, capsule, portal tracts, and lipid droplets in WSIs. We also implemented a method for automatically quantifying the distribution of LDs in the hepatic acini by measuring the distance of individual LDs to the edge of the nearest portal tract (see figure in the lower right corner).
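    The distance measurement described above can be sketched as a point-to-polygon-edge computation. This is an illustrative sketch only: the poster does not say how the platform represents segmented structures, so the polygon outlines, droplet centroids, coordinate units (µm) and function names here are all assumptions.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_nearest_tract(droplet, tracts):
    """Distance from a droplet centroid to the closest portal tract edge."""
    return min(
        point_segment_distance(droplet, poly[i], poly[(i + 1) % len(poly)])
        for poly in tracts
        for i in range(len(poly))
    )


# Hypothetical portal tract outlines and droplet centroids (µm).
tracts = [
    [(0, 0), (100, 0), (100, 100), (0, 100)],
    [(500, 500), (600, 500), (600, 600), (500, 600)],
]
droplets = [(150, 50), (450, 550), (300, 300)]
distances = [distance_to_nearest_tract(d, tracts) for d in droplets]
```

    The result is an unsigned distance to the tract outline, so a centroid lying inside a tract would also get a positive value; a real pipeline would need to decide how to treat that case.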

    Analysis speed was on average 3.5 seconds per single WSI, or 50 mm² per second. Thus, analysing 1000 histological sections takes about one hour.
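    The one-hour figure follows directly from the per-slide time. The arithmetic is spelled out below as a small sketch; the implied mean area per slide is an inference that assumes the two reported rates describe the same average WSI, which the poster does not state explicitly.

```python
SECONDS_PER_WSI = 3.5        # reported average analysis time per slide
AREA_RATE_MM2_PER_S = 50.0   # reported average analysed area per second
N_SECTIONS = 1000

total_seconds = N_SECTIONS * SECONDS_PER_WSI
total_minutes = total_seconds / 60

# Assumption: both rates refer to the same average slide.
implied_area_per_wsi_mm2 = SECONDS_PER_WSI * AREA_RATE_MM2_PER_S

print(f"{N_SECTIONS} sections: {total_seconds:.0f} s = {total_minutes:.1f} min")
print(f"Implied mean analysed area per WSI: {implied_area_per_wsi_mm2:.0f} mm^2")
```

    At 3.5 s per slide, 1000 sections take 3500 s, a little over 58 minutes.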

    1. Deep learning is a fundamentally different method of analysing liver histology compared to traditional histological assessment by pathologists. It provides rapid, consistent and accurate metrics regarding hepatic steatosis.

    2. Compared with manual human counting, detection of lipid droplets by DL is both sensitive and precise.

    3. Steatosis quantitation using DL correlates well with the steatosis percentage estimated by experienced hepatopathologists.

    4. Pathologists consistently overestimated the degree of steatosis in liver specimens. Previous data published by others support the notion that the human eye overemphasizes the degree of steatosis in liver biopsies [8-10].

    5. Use of computerised analysis eliminates observer-related variability in histological assessments, improving consistency.

    6. These novel metrics can be used to further characterize the emerging subtypes of NAFLD.

    Results

    Patients & Methods

    Conclusions

    The human eye overemphasizes the degree of steatosis in liver biopsies

    Pathologists’ assessments of steatosis correlate highly significantly with the algorithm’s quantitation, but pathologists consistently report a higher percentage of fat in a given specimen.

    Figure: manually selected homogeneous areas from three biopsies containing mainly hepatocytes and macrovesicular lipid droplets.

    Area    Pathologists (%)    Algorithm (%)
    1       80 / 70             48
    2       40 / 30             16
    3       10 / 7              5
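    The size of the overestimation in the three example areas can be made explicit with a short calculation. This sketch assumes that the paired values (80/70, 40/30 and 10/7 %) are the pathologists' estimates and that 48, 16 and 5 % are the algorithm's measurements for the same areas; the variable names and the use of the mean of each pair are choices made for illustration.

```python
# Values read from the example figure; the pairing is an assumption.
pathologist_estimates = [(80, 70), (40, 30), (10, 7)]  # percent
algorithm_values = [48, 16, 5]                         # percent

# Mean pathologist estimate per area.
mean_estimates = [sum(pair) / len(pair) for pair in pathologist_estimates]

# Absolute difference (percentage points) and ratio per area.
differences = [m - a for m, a in zip(mean_estimates, algorithm_values)]
ratios = [m / a for m, a in zip(mean_estimates, algorithm_values)]

for area, (m, a, d, r) in enumerate(
    zip(mean_estimates, algorithm_values, differences, ratios), start=1
):
    print(f"Area {area}: pathologists {m:.1f} %, algorithm {a} %, "
          f"difference {d:+.1f} points, ratio {r:.2f}")
```

    In all three areas the visual estimate exceeds the measured value, which matches the direction of the bias described in the text. Three hand-picked areas are too few to estimate its magnitude.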

    Steatosis grading by the algorithm achieves higher agreement with pathologists than pathologists achieve with each other

    A kappa score of 1.0 reflects perfect agreement between two observers.
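    As an illustration of how such an agreement score is computed, the sketch below implements unweighted Cohen's kappa for two observers. The poster does not state which kappa variant was used (weighted variants are common for ordinal grades), and the steatosis grades in the example are invented.

```python
from collections import Counter


def cohens_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two observers grading the same cases."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("need two non-empty grade lists of equal length")
    n = len(grades_a)
    # Observed agreement: fraction of cases with identical grades.
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Agreement expected by chance, from each observer's grade frequencies.
    freq_a, freq_b = Counter(grades_a), Counter(grades_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if expected == 1.0:
        # Both observers used a single identical grade throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical steatosis grades (0-3) for eight biopsies.
observer_1 = [0, 1, 2, 3, 1, 2, 0, 1]
observer_2 = [0, 1, 2, 2, 1, 3, 0, 0]

kappa = cohens_kappa(observer_1, observer_2)
perfect = cohens_kappa(observer_1, observer_1)
print(f"kappa = {kappa:.3f}, identical observers = {perfect:.1f}")
```

    Kappa corrects the raw agreement rate for the agreement expected by chance, so two observers who agree on 5 of 8 cases score well below 0.625.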

    • We acquired digital high-resolution whole-slide images (WSI) of Herovici-stained liver specimens, which we then uploaded to the cloud-based Aiforia™ image processing platform [7].

    • Using hand-drawn annotations, the DL algorithm was trained by pathologists and trained operators to recognise different histological structures.

    • The algorithm calculates the percentage of macrovesicular steatosis, in addition to the number, size, diameter and surface area of lipid droplets (LD) and other structures, and the distribution of LDs in hepatic acini.

    • We compared the algorithm’s results to manual human counting and to pathologists’ conventional assessments of steatosis.
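    The droplet metrics listed above can be sketched from per-droplet areas. This is a hedged illustration, not the platform's method: it assumes the steatosis percentage is the summed droplet area divided by the parenchymal area, and it derives each diameter from the area as if the droplet were a circle. The function name, dictionary keys and example values are hypothetical.

```python
import math


def steatosis_metrics(droplet_areas_um2, parenchyma_area_um2):
    """Summarise lipid droplets segmented within a region of parenchyma."""
    if parenchyma_area_um2 <= 0:
        raise ValueError("parenchyma area must be positive")
    fat_area = sum(droplet_areas_um2)
    # Equivalent circular diameter: d = 2 * sqrt(area / pi).
    diameters = [2 * math.sqrt(a / math.pi) for a in droplet_areas_um2]
    mean_diameter = sum(diameters) / len(diameters) if diameters else 0.0
    return {
        "count": len(droplet_areas_um2),
        "fat_area_um2": fat_area,
        "steatosis_percent": 100 * fat_area / parenchyma_area_um2,
        "mean_diameter_um": mean_diameter,
    }


# Two hypothetical circular droplets (radii 10 and 20 µm)
# in a 100 µm x 100 µm patch of parenchyma.
areas = [math.pi * 10 ** 2, math.pi * 20 ** 2]
metrics = steatosis_metrics(areas, parenchyma_area_um2=100 * 100)
print(metrics)
```

    An area-based definition of this kind differs from a visual estimate of the proportion of hepatocytes containing fat, which may be one reason the two approaches give different percentages.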

    Performance metrics of LD recognition

    Figure: kappa agreement, pathologist-algorithm vs. inter-pathologist.

    1. LeCun, Y., et al. Nature 2015; 521(7553): 436-444.
    2. Litjens, G., et al. Med Image Anal 2017; 42: 60-88.
    3. Younossi, Z. M., et al. Mod Pathol 1998; 11: 560-565.
    4. Kleiner, D. E., et al. Hepatology 2005; 41(6): 1313-1321.
    5. Bedossa, P., et al. Hepatology 2012; 56(5): 1751-1759.
    6. Bedossa, P. and FLIP Pathology Consortium. Hepatology 2014; 60(2): 565-575.
    7. Penttinen, A. M., et al. Eur J Neurosci 2018; 48(6): 2354-2361.
    8. Franzen, L. E., et al. Mod Pathol 2005; 18(7): 912-916.
    9. Hall, A. R., et al. Liver Int 2013; 33(6): 926-935.
    10. Turlin, B., et al. Liver Int 2009; 29(4): 530-535.

    † The F1-score (F-measure) is the harmonic mean of precision and recall and reflects the overall accuracy of the algorithm.

    Count of LDs in random scoring regions in histological images

    Scale bars: 200 µm.

    References

    Contact: Sami Qadri, BM – [email protected]