


SN Computer Science (2020) 1:319 https://doi.org/10.1007/s42979-020-00340-7

SN Computer Science

ORIGINAL RESEARCH

Musculoskeletal Abnormality Detection in Medical Imaging Using GnCNNr (Group Normalized Convolutional Neural Networks with Regularization)

Mukul Goyal1 · Rishabh Malik1 · Deepika Kumar1  · Siddhant Rathore1 · Rahul Arora1

Received: 7 July 2020 / Accepted: 18 September 2020 / Published online: 1 October 2020 © Springer Nature Singapore Pte Ltd 2020

Abstract

Musculoskeletal abnormality detection benefits professionals in the medical domain and assists in the diagnosis as well as the treatment of abnormalities. This paper focuses on accurately detecting musculoskeletal abnormalities using various deep learning models and techniques. The MURA dataset has been used for experimentation. It contains 14,863 studies (40,561 images) of the finger, wrist, elbow, shoulder, forearm and hand, which have been analyzed using deep learning models. In this research paper, the authors propose the GnCNNr model, which utilizes group normalization, weight standardization and a cyclic learning rate scheduler to enhance accuracy, precision and other model interpretation metrics. Accuracy and Cohen Kappa have been taken as the evaluation criteria. The highest accuracy of 85% and Cohen Kappa statistic of 0.698 were achieved by the GnCNNr model in comparison with conventional deep learning models such as DenseNet, Inception and Inception v2.

Keywords GnCNNr · Group normalized CNN · Abnormality detection · Machine learning

Introduction

The human body is a structure that comprises various cells and organs, and it carries out the functions necessary for healthy living. Any abnormality in the human body can cause uneasiness to the human being. An abnormality is any structural change in one or more parts of the body, and it can lead to functional loss or impairment [1]. Abnormalities can be numerical or structural on the basis of chromosomes, or of first, second and third degree on the basis of the degree of deformity [2]. Any type or degree of abnormality can be dangerous to physical as well as mental health [3]. Abnormalities can cause aches in several parts of the body, abnormal postures, confused thinking, reduced concentration, extreme mood swings, inability to cope with stress, excessive anger or violence and major changes in eating habits [4]. Traditionally, abnormalities in the human body are analyzed manually by a professional in the medical domain, such as a general doctor or a specialist. The manual analysis can take the form of a clinical interview, symptom questionnaire, personal inventory, self-monitoring, behavioral observation or intelligence test; the most frequently used are brain imaging techniques. One of the challenges in questionnaire-type diagnosis arises when an individual refuses to share personal details or does not want to be examined or diagnosed. In these situations, brain imaging techniques have proven to be advantageous [5]. Although there are a lot of manual testing techniques

This article is part of the topical collection “Deep learning approaches for data analysis: A practical perspective”, guest edited by D. Jude Hemanth, Lipo Wang and Anastasia Angelopoulou.

* Deepika Kumar

1 Department of Computer Science and Engineering, Bharati Vidyapeeth’s College of Engineering, New Delhi, India


that can be used to diagnose an abnormality, some medical experts have focused on proposing machine learning (ML) techniques and methods to overcome the problem previously mentioned. These ML techniques most frequently used traditional algorithms such as support vector machines, Naïve Bayes, k-nearest neighbors (KNN) and random forests [6]. Machine learning not only assists in obtaining accurate and precise results in the field of medical image analysis [7, 8] but has also provided significant results in the field of medicine [9, 10].

A musculoskeletal condition is a state in which pain and injury are caused in muscles and bones due to sudden exertion. The parts affected the most by this condition are the joints, muscles, nerves, tendons, neck and back. According to the WHO report [11], musculoskeletal conditions have caused disabilities throughout the world. It is reported that one in three people live with a painful and disabling musculoskeletal condition. Such a condition significantly hinders movement capacity, which leads to reduced work efficiency and a reduced ability to participate in social roles.

Labeling a particular study in the dataset as normal or abnormal is a tedious job, but it matters: a person tested normal by this procedure can skip further diagnosis and medical formalities. Abnormality detection in imaging also has applications in the medical domain. First, such a system can be used to decide which patient to test first. A patient who gets an abnormal report will receive diagnosis and treatment with first priority, and patients who get a normal report can be discharged. Second, automated abnormality detection can help reduce the burden on professionals. An efficient algorithm would help physicians work more efficiently and with higher levels of concentration. Machine learning techniques help skip the traditional methods of manual diagnosis and diagnose the medical imaging data by using single or multiple integrated algorithms [12].

This research focuses on developing a model to efficiently predict abnormality in the upper parts of the human body through medical imaging. The evolution of deep learning has aroused interest in its use in the field of medical imaging. Earlier, only very small datasets were publicly available, because such data contain personal information. The recent availability of large datasets makes it possible to generalize the results obtained by the classifier model. This paper proposes a model that works on the MURA dataset, which contains medical images from multiple studies; the studies have been labeled, so the task is one of supervised learning [13].

The rest of the paper is organized as follows: Sect. 2 discusses the related work; Sect. 3 presents the proposed methodology, including the dataset description, data preprocessing and model architecture; Sect. 4 explains the results and analysis; and Sect. 5 concludes the paper and lists some of the shortcomings of the model.

Related Work

A large amount of work has already been done to detect abnormalities in medical imaging. Work based on various concepts linked to pattern recognition, classification, segmentation, image mining and image retrieval is discussed below.

One study introduced a model that worked on the MURA dataset of musculoskeletal X-rays. The images were classified into two categories, normal and abnormal. The model took the views of a study as input and predicted the probability of abnormality on every view using a 169-layer DenseNet baseline model. The arithmetic mean of these probabilities was then calculated for each study to get the study-level output, and a study was termed abnormal if the overall probability was higher than 0.5. The model was evaluated using Cohen kappa [14].

Another work provided a method by which abnormalities in digital images can be automatically analyzed and detected. It analyzed abnormalities in the form of lung tissue distortions. It took more than one portion of an image and generated suitable image data, which were then correlated together. The data were searched to identify an abnormal region in any of the digital images based on predetermined criteria. Once the abnormal region was known, it was studied extensively to determine whether the region was harmless or not, based on a variety of predetermined features in that region. For further enhancement, the histogram of an image was compared with its grey-level frequency distribution [15].

Another study performed abnormality detection using a deep neural network (DNN). To map the acquisitions of radiologists to images, a DNN was built using various reconstruction algorithms. A 3D convolutional neural network was then used, and both networks were trained together to optimize the entire DNN. The fully optimized DNN was used for lung nodule detection in chest computed tomography (CT). Training the two models together produced better results than training them separately. This led to a large reduction in the false-positive (FP) rate, which is needed to avoid overdiagnosis in lung CT imaging. One disadvantage of the model was that the reconstruction network was used to generate an image, a task that could have been left to other algorithms [16].

A survey discussed the techniques that have been widely used in medical imaging, particularly in the field of anomaly detection. Anomaly detection in brain CT images was also assessed and discussed. The surveyed systems rely on various methods from machine learning and statistics, and the techniques were compared in terms of performance measures, segmentation and ground truth [17].

Another work described a system that used an examination bundle consisting of at least one medical image from a first modality along with at least one from a second modality. The system included a learning engine that determined essential characteristics of the abnormalities from the medical images of both modalities. A detecting engine was used to detect abnormalities in one or more of the medical images. The module then combined the features extracted from both modalities to determine the attributes of the abnormalities [18].

Cerebral MRI images have been used to detect brain tumors efficiently. The process was divided into three major steps. First, an enhancement process was applied to improve the image quality to the desired level and to limit the risk of fusion during the segmentation phase; mathematical morphology was also applied to increase the contrast of the images. Next, a wavelet transform was used to decompose the images. Finally, the k-means algorithm was applied to extract suspicious tumors. The end results were satisfactory [19].

Another study introduced a method that feeds histopathological images to a convolutional neural network (CNN) while avoiding a model with a very complex and computationally costly architecture. It extracted various image patches, which were used to train the CNN and to produce the final classification. As a result, the CNN showed better results than previous studies. Different CNNs were also combined using fusion rules, which resulted in a major improvement in results [20].

A review examined the research done on using ML algorithms for diagnosing mental illness and suggested how ML techniques can be further used and implemented. It covered five traditional algorithms that are mainly used in research related to mental health, including support vector machines, random forests and Naïve Bayes. The paper provided useful information along with the limitations of the machine learning techniques. A major drawback of the reviewed research was that the sample sizes were not sufficient to draw conclusions about the entire population [9].

Another work classified medical images using binary classification based on different features of an image and its abnormality. A KNN classifier was used to classify the images, and its performance was compared with that of an SVM classifier. The results revealed that KNN performed better than SVM, with an 80% classification rate. A post-processing step was applied when an image was found to be abnormal [21].

In recent years, magnetic resonance imaging (MRI) has played a huge role in research on datasets containing brain images. Classification is widely used to discriminate between normal patients and those suffering from abnormalities. One study used MRI images and texture-based features such as the grey level co-occurrence matrix. A sequential forward selection algorithm was used to select particular features, after which a support vector machine classified the MRI data into the binary classes of normal and abnormal [22].

Another work provided a statistical learning procedure that identifies abnormalities as deviations from normal. To capture inter-individual variability, the model learns multivariate statistics of high-dimensional images. The model worked with limited training images. Because the images are high dimensional and the training samples are limited, it is usually difficult to learn an accurate statistical model of the data. To overcome this problem, the model estimated a large number of low-dimensional subspaces, which are more reliable. The learned subspace models were then used to determine abnormalities through iterative projections [23].

Another study proposed a hierarchical classification system for medical image annotation that classified data belonging to multiple hierarchical classes. A single predictive clustering tree (PCT) was built to predict all annotations of an image. Because a PCT uses a single classifier for the hierarchical semantics, it was far more efficient than classifiers that are valid for just a single class. The model used an ensemble of PCTs for better performance. Evaluation was done on the IRMA database, which contains several X-ray images. Performance was investigated using two ensemble approaches, bagging and random forests, with high-level and low-level feature fusion [24].

In another study, 15 preschool children aged 2–4 with autism spectrum disorder (ASD) were distinguished from 15 typically developing children using a supervised machine learning algorithm. The main outcome was that ASD can be identified from upper limb movements. To this end, a basic kinematic analysis was made of reach and grasp movements performed by the children, and a supervised machine learning procedure was then applied to it. An accuracy of 96.7% was reached using the features with the greatest discriminative power [25].

Another study proposed a model that recognized the crucial physical factors that cause spinal abnormalities. These factors play an important role in the appropriate therapy of lower back pain, so the model is valuable in the medical field because it helps recognize symptoms at an early stage. The model used unsupervised and supervised machine learning approaches, namely principal component analysis and KNN respectively, to predict spinal abnormalities. The authors demonstrated that the degree of spondylolisthesis was the prime factor giving rise to spinal abnormalities. When the results of the RF classifier and the KNN classifier were compared, the latter performed better [26].


Proposed Methodology

This research works on improving the results and accuracy of abnormality detection by creating a new CNN model. The model comprises some previously used CNN layers along with a learning rate scheduler and the weight standardization technique. The dataset is preprocessed before it is input to the model, and augmentation techniques are applied to the data so that the results obtained are more accurate and precise. The complete structure of the model and its working are shown in Fig. 1.

Dataset Description

Various datasets are available for detecting abnormalities using medical imaging. The MURA dataset [13] is used in this research because it contains a large amount of data, which supports precise and accurate study results. The dataset contains 40,561 images, which provide broad coverage for studying medical imaging.

Various models have already been designed using the publicly available datasets, some of which are shown in Table 1. The machine learning models designed using these datasets were unable to produce the desired results, accuracy and precision. The MURA dataset is one of the largest publicly available radiographic image datasets, and its size supports accurate and precise results. It also contains multiple views of the radiographic images, which assists the model in the accurate detection of abnormalities.

Before any transformation is applied and the data are fed to the model, the data are structured to match the input format the model requires. The model requires two arrays as input: the image directory used to load the images and their corresponding labels. Labels are provided only for each study, so the study label is used to generate a label for each image. The entire dataset information is stored in a Pandas data frame that holds each image's location, output label, body part label and study label.

Fig. 1 Proposed workflow for abnormality detection

Table 1 Publicly available datasets for medical imaging

Dataset                  Study type                          Label                 Images
MURA [13]                Musculoskeletal (upper extremity)   Abnormality           40,561
Pediatric bone age [27]  Musculoskeletal (hand)              Bone age              14,236
0.E.1 [28]               Musculoskeletal (knee)              K&L grade             8892
Digital hand atlas [29]  Musculoskeletal (left hand)         Bone age              1390
Chest X-ray 14 [30]      Chest                               Multiple pathologies  112,120
OpenI [31]               Chest                               Multiple pathologies  7470
MC [32]                  Chest                               Abnormality           138
JSRT [33]                Chest                               Pulmonary nodule      247
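The record structure described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the directory layout `.../<XR_BODYPART>/<patient>/<studyN_positive|negative>/<image>.png`, and the function name `build_records` and the column names are hypothetical.

```python
from pathlib import PurePosixPath


def build_records(image_paths):
    """Turn image paths into one record per image.

    Assumed layout (not stated in the text):
    .../<XR_BODYPART>/<patient>/<studyN_positive|negative>/<image>.png
    """
    records = []
    for path in image_paths:
        parts = PurePosixPath(path).parts
        body_part, patient, study = parts[-4], parts[-3], parts[-2]
        records.append({
            "path": path,
            "body_part": body_part.replace("XR_", "").lower(),
            "study": f"{patient}/{study}",
            # The study-level label is copied to every image of the study.
            "label": 1 if study.endswith("positive") else 0,
        })
    return records


paths = [
    "MURA-v1.1/train/XR_WRIST/patient00001/study1_positive/image1.png",
    "MURA-v1.1/train/XR_WRIST/patient00001/study1_positive/image2.png",
    "MURA-v1.1/train/XR_ELBOW/patient00002/study1_negative/image1.png",
]
records = build_records(paths)
```

A list of dictionaries like `records` can be passed directly to `pandas.DataFrame(records)` to obtain the data frame, with one row per image.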

Data Preprocessing

Each image is a grayscale image, and the images have different aspect ratios. The model requires input with a single aspect ratio, so all images are rescaled to 320 × 320. Each image is then normalized using Eq. (1).

Images are given a random transformation such as rotation, horizontal flipping or vertical flipping before being fed into the model. The data preprocessing techniques used are: resizing images to a fixed size, increasing the number of channels, data augmentation and normalization. Three augmentation settings are used as examples: a random 30° rotation with random horizontal and vertical flips; a random 37° rotation with a random horizontal flip only; and a random 37° rotation with a random vertical flip only.
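The normalization and flipping steps can be illustrated with a small standard-library sketch. It assumes that the mean and standard deviation are computed per image, which the text does not specify, and it omits rotation for brevity; the function names are hypothetical.

```python
from statistics import mean, pstdev


def normalize(pixels):
    """Standardize a flat list of pixel values in [0, 255]: (x - mean) / std."""
    mu = mean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:
        # A constant image has no contrast; map every pixel to zero.
        return [0.0 for _ in pixels]
    return [(p - mu) / sigma for p in pixels]


def horizontal_flip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror a 2D image (list of rows) top to bottom."""
    return image[::-1]


normalized = normalize([0, 64, 128, 255])
```

After normalization the pixel values have zero mean and unit standard deviation, which keeps inputs on a comparable scale across radiographs with different exposure.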

Learning Rate Scheduler

A neural network is typically optimized using stochastic gradient descent, in which the parameters $\theta$ (weights) are updated by $\theta^{t} = \theta^{t-1} - \epsilon_t \, \partial l / \partial \theta$, where $\epsilon_t$ is the learning rate and $l$ is the loss function. A learning rate that is too small makes the training algorithm converge slowly, while one that is too large makes it diverge. Hence, a range of learning rates must be tried experimentally [34].

This project uses a cyclic learning rate (CLR) scheduler, in which the learning rate cycles between a lower bound and an upper bound instead of being set to a fixed value. Dauphin et al. argue that the difficulty in minimizing the loss arises from saddle points rather than poor local minima [35].

Several learning rate window policies have been proposed, but for this project the triangular window policy has been chosen. Mathematically, the general schedule is defined by Eq. (2),

$$\hat{x} = \frac{x - \text{mean}}{\text{std}}, \quad \forall x = \text{pixel value} \in [0, 255] \tag{1}$$

$$p_t = p_{\min} + (p_{\max} - p_{\min}) \cdot \max(0,\, 1 - y) \tag{2}$$

where $y$ is defined by Eq. (3) and the cycle is calculated by Eq. (4). Here $p_{\max}$ and $p_{\min}$ define the upper and lower bounds of the learning rate, iterations is the total number of iterations, and stepsize is half of a cycle length, measured in epochs. The cyclic learning rate policy is shown in Fig. 2.
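The triangular schedule of Eqs. (2) to (4) can be written in a few lines. The sketch below is illustrative only: the function name `triangular_lr` and the bounds 0.001 and 0.006 are assumptions chosen for the example, not values reported in the paper.

```python
import math


def triangular_lr(iteration, step_size, p_min, p_max):
    """Triangular cyclic learning rate following Eqs. (2)-(4)."""
    # Eq. (4): index of the current cycle (a cycle spans 2 * step_size).
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Eq. (3): distance from the peak of the current cycle, in [0, 1].
    y = abs(iteration / step_size - 2 * cycle + 1)
    # Eq. (2): linear interpolation between the two bounds.
    return p_min + (p_max - p_min) * max(0.0, 1.0 - y)


# Hypothetical bounds and step size, for illustration only.
schedule = [triangular_lr(i, step_size=4, p_min=0.001, p_max=0.006)
            for i in range(9)]
```

With these settings the rate starts at the lower bound, rises linearly to the upper bound after `step_size` iterations, and returns to the lower bound after a full cycle of `2 * step_size` iterations.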

Training

A CNN model is used to predict the probability of abnormality for every image in the dataset. The final fully connected layer is replaced by a layer with a single output, followed by a sigmoid non-linearity. For each image X of study type T in the training set, the weighted binary cross entropy loss given by Eq. (5) is optimized,

where Y is the label, p(Y = i|X) is the probability that the network assigns to the label i, and $w_{T,1}$ and $w_{T,0}$ are the class weights for study type T.
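The loss in Eq. (5) can be sketched for a single image as follows. The text does not define the weights, so the sketch assumes inverse class-frequency weights (the abnormal term is weighted by the fraction of normal images and vice versa); the helper names are hypothetical.

```python
import math


def class_weights(n_abnormal, n_normal):
    """Assumed weighting: each class is weighted by the other's frequency."""
    total = n_abnormal + n_normal
    w1 = n_normal / total    # weight on the abnormal (Y = 1) term
    w0 = n_abnormal / total  # weight on the normal (Y = 0) term
    return w1, w0


def weighted_bce(p_abnormal, label, w1, w0):
    """Eq. (5) for one image; p_abnormal is the sigmoid output p(Y = 1 | X)."""
    eps = 1e-12  # clip to keep log() finite at exactly 0 or 1
    p1 = min(max(p_abnormal, eps), 1.0 - eps)
    return -w1 * label * math.log(p1) - w0 * (1 - label) * math.log(1.0 - p1)


w1, w0 = class_weights(n_abnormal=25, n_normal=75)
loss = weighted_bce(0.9, 1, w1, w0)
```

Weighting of this kind makes errors on the rarer class cost more, which counteracts class imbalance within a study type.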

Model Architecture

The input layer of the model processes images of size (320, 320) in batches of 3 and passes them to the first convolution layer, which has a kernel size of (3, 3) and uses the hyperbolic tangent as its activation function. This layer outputs 32 feature maps to the next layer, which is a max pooling layer.

In the convolution layers the image is convolved with filters, and the max pooling layer then selects the maximum element of the feature map region covered by the filter. The max pooling layer therefore outputs a feature map that contains only the salient features. The output of the max pooling layer is then sent to the second convolution layer, which passes 64 feature maps to the next layer.
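The max pooling operation described above can be shown on a small feature map. The sketch assumes a 2 × 2 window with stride 2, which the text does not state; the function name is hypothetical.

```python
def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 window (stride 2).

    A trailing odd row or column is dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


pooled = max_pool_2x2([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
])
```

Here `pooled` is `[[4, 2], [2, 8]]`: each output element is the strongest response in its window, so the spatial size halves while the most salient activations are kept.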

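$$y = \left| \frac{\text{iterations}}{\text{stepsize}} - 2(\text{cycle}) + 1 \right| \tag{3}$$

$$\text{cycle} = \operatorname{floor}\left(1 + \frac{\text{iterations}}{2(\text{stepsize})}\right) \tag{4}$$

$$L(X, Y) = -w_{T,1} \cdot Y \log p(Y = 1 \mid X) - w_{T,0} \cdot (1 - Y) \log p(Y = 0 \mid X) \tag{5}$$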

Fig. 2 Triangular window cyclic learning rate policy [36]


After max pooling is performed on this feature map, the group normalization layer divides the channels into groups and calculates the mean and variance of each group in order to standardize it, i.e. it normalizes the features within each group. Group normalization is independent of the batch size, and its accuracy is stable across batch sizes. The flatten layer takes the feature maps generated by the preceding layers and converts them into a 1D array to pass to the next layer. The output of the convolutional layers is thus flattened into one long feature vector, which connects to the final classification layers [37, 38]. The dense layer takes the 1D array output by the flatten layer and combines its weight matrix m, its bias vector v and the activation p of the previous layer to generate the final output. This layer uses the ReLU activation function, given by Eq. (6).

The final architecture for GnCNNr has been shown in Fig. 3.
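The group normalization step described above can be sketched for a single sample. The sketch omits the learnable scale and shift parameters that group normalization layers normally include, and the function name and example values are hypothetical.

```python
from statistics import mean, pvariance


def group_norm(channels, num_groups, eps=1e-5):
    """Normalize one sample, given as a list of C flat channel lists.

    Channels are split into num_groups groups; each group is standardized
    with its own mean and variance. No batch statistics are involved.
    """
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    size = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * size:(g + 1) * size]
        values = [v for ch in group for v in ch]
        mu = mean(values)
        scale = (pvariance(values, mu) + eps) ** 0.5
        out.extend([[(v - mu) / scale for v in ch] for ch in group])
    return out


sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normalized_sample = group_norm(sample, num_groups=2)
```

Because the statistics are computed from one sample only, the result does not change with the batch size, which is the property the text relies on.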

Testing

Multiple views of an upper-extremity study from the dataset are passed to the model. The CNN predicts the probability of abnormality for each view. The overall probability for the study is calculated as the arithmetic mean of the abnormality probabilities output by the network for the individual views. The final result is taken to be abnormal if this probability is greater than 0.5.
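The study-level decision rule can be expressed directly. The function name `classify_study` and the example probabilities are hypothetical.

```python
def classify_study(view_probabilities, threshold=0.5):
    """Average per-view abnormality probabilities and apply the threshold."""
    overall = sum(view_probabilities) / len(view_probabilities)
    # The study is abnormal only if the mean is strictly greater than 0.5.
    return overall, overall > threshold


overall, is_abnormal = classify_study([0.9, 0.4, 0.5])
```

Averaging lets one strongly abnormal view outweigh several borderline ones, while a single noisy view cannot decide the study on its own.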

Results and Analysis

The three models used for comparison with GnCNNr are the DenseNet, Inception and Inception v2 models. They are described below.

DenseNet Model

The densely connected network (DenseNet) was proposed to solve the vanishing gradient problem, following on from ResNet, which introduced residual learning in CNNs to improve the training of deep networks [39]. DenseNet uses cross-layer connectivity to overcome the vanishing gradient problem that remains in ResNet. It connects each layer to every other layer, so that the features of all preceding layers are input to all subsequent layers. Thus, a DenseNet with l layers creates l(l + 1)/2 direct connections, whereas a traditional CNN forms l connections between layers [40].

$$y = \max(0, x) \tag{6}$$

Inception and Inception v2 Model

GoogLeNet introduced the inception block to CNNs, which involves split, transform and merge operations. The model was introduced to achieve a high accuracy rate at a low computational cost, and it focused on parameter efficiency in CNNs. Before employing big kernels, GoogLeNet adds a bottleneck layer with a 1 × 1 filter [41]. Further improvements to the Inception model are Inception v3 and v4. Inception v3 reduces the computational cost of the model without affecting the generalization of the network [42]. For this purpose, Szegedy et al. replaced large filters (5 × 5 and 7 × 7) with small and asymmetric filters (1 × 7 and 1 × 5) and used a 1 × 1 convolution as a bottleneck prior to the large filters [43].

The model performances are assessed on Cohen Kappa (with a 95% confidence interval) and AUROC statistics, along with specificity and sensitivity. The Cohen Kappa and AUROC measures are explained as follows.

Cohen Kappa

Cohen's Kappa [44] is a statistic used for reliability testing. It can range from −1 to +1, where 0 represents the amount of agreement expected from random chance and 1 represents perfect agreement. Kappa is a standardized value and is thus interpreted in the same way across multiple studies.

Mathematically, Cohen Kappa is defined by Eq. (7),

where $p_o$ is the relative observed agreement among raters and $p_e$ is the hypothetical probability of chance agreement, calculated from the observed data as the probability of each rater randomly choosing each category. For C categories, O observations to categorize and $n_{Ci}$ the number of times rater i predicted category C, $p_e$ is given by Eq. (8).
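Eqs. (7) and (8) can be computed with the standard library alone. In the sketch below the two "raters" are the ground-truth labels and the model predictions, and the example labels are made up for illustration.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two equally long label sequences, Eqs. (7)-(8)."""
    n = len(rater1)
    # p_o: fraction of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # p_e: chance agreement from each rater's category counts, Eq. (8).
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / n ** 2
    # Undefined (division by zero) when p_e == 1, i.e. a single category.
    return (p_o - p_e) / (1 - p_e)


truth = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 0, 1, 1]
kappa = cohen_kappa(truth, predicted)
```

In this example the raters agree on 6 of 8 items (p_o = 0.75) and chance agreement is p_e = 0.5, so kappa is 0.5: half of the agreement achievable beyond chance.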

Receiver Operating Characteristic

A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied [45]. The ROC curve is created by plotting the true-positive rate against the false-positive rate at various threshold settings. When using normalized units, the area under the curve (AUC) [43] is equal to the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one. The true-positive rate, the false-positive rate and the area are given by Eqs. (9) to (11).
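The ranking interpretation of the AUC can be checked by brute force over all positive and negative pairs. The function name and the scores below are made up for illustration; ties are counted as half a win, a common convention that the text does not discuss.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


auc = auc_by_ranking([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

Three of the four positive and negative pairs are ordered correctly here, so the AUC is 0.75. The pairwise loop is quadratic in the number of samples and is meant only to illustrate the definition.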

$$\kappa = \frac{p_o - p_e}{1 - p_e} \tag{7}$$

$$p_e = \frac{1}{O^2} \sum_{C} n_{C1}\, n_{C2} \tag{8}$$

$$\mathrm{TPR}(A) : A \to Y(X) \tag{9}$$


Fig. 3 Model architecture


The results obtained from the different models are compared with those of the GnCNNr model. The model that achieved the highest performance on the test dataset is GnCNNr, with a kappa statistic of 0.698 and a 95% CI of [0.657, 0.739]. The receiver operating characteristic curve is plotted, and the area under the ROC curve is 0.899. The model achieves a sensitivity of 0.776 and a specificity of 0.913. The models are compared on the basis of the results obtained. A detailed report for each model on the test dataset is given below (Tables 2, 3, 4).

From the analysis of the three models above, it can be observed clearly that the specificity and sensitivity remain consistent throughout the dataset, with a low AUROC. The specificity defines the chance of the result being negative when there is no abnormality present in the input data. The consistency in the specificity of the results obtained clearly indicates that the models are capable of recognizing the negative results from the dataset. The models need to be improved so as to recognize abnormal cases as abnormal more efficiently (Fig. 4).

The AUROC statistic defines whether the model is able to differentiate between normal and abnormal images correctly. The higher the AUROC, the better the model. The AUROC of the above three models is not enough to distinguish between negative and positive data.

The GnCNNr model has been analyzed as follows (Table 5, Fig. 5).

Performance Comparison of Models

$$\mathrm{FPR}(A) : A \to X \tag{10}$$

$$\mathrm{Area} = \int_{X=0}^{1} \mathrm{TPR}\left(\mathrm{FPR}^{-1}(X)\right)\, dX. \tag{11}$$

The results calculated above for the various models, trained and tested on the MURA dataset after data preprocessing, are compared in this section. Table 6 depicts the performance comparison of all the models, which shows that the Cohen Kappa achieved by GnCNNr is the maximum, with a value of 0.698, as compared to DenseNet, Inception and Inception v2. Specificity, sensitivity and AUROC of all the models have also been compared.

The specificity of a model represents the chance that the result will be negative if there is no abnormality in the image(s) input to the model. The specificity of the Inception model is the highest of the four models, i.e., 0.922. The sensitivity of a model represents the chance that the result will be positive when there is a musculoskeletal abnormality in the image(s) input to the model. The sensitivity of the GnCNNr model is the highest of the four models, i.e., 0.776. The AUROC statistic represents how well the model is able to distinguish between an abnormal and a normal case. The higher the AUROC statistic, the better the model. The highest AUROC of the four models is that of the GnCNNr model, i.e., 0.899 (Fig. 6).
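The two definitions above can be made concrete with a minimal sketch, assuming binary ground-truth labels and binary predictions (1 = abnormal, 0 = normal). The function name `sensitivity_specificity` and the toy labels are hypothetical and are not drawn from the paper's experiments.

```python
def sensitivity_specificity(labels, preds):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one abnormal and one normal case")
    # Sensitivity: share of abnormal cases flagged as abnormal.
    sensitivity = tp / (tp + fn)
    # Specificity: share of normal cases reported as normal.
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical toy labels, for illustration only.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, preds)
print(sens, spec)
```

Here 3 of the 4 abnormal cases and 5 of the 6 normal cases are classified correctly, so a model can have a high specificity while still missing a share of the abnormal cases.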

Table 2 DenseNet model performance

| Body part | Cohen Kappa (95% CI) | Sensitivity | Specificity | AUROC |
|---|---|---|---|---|
| Hand | 0.563 (0.464, 0.722) | 0.666 | 0.922 | 0.935 |
| Wrist | 0.776 (0.693, 0.859) | 0.713 | 0.954 | 0.832 |
| Humerus | 0.777 (0.671, 0.883) | 0.860 | 0.877 | 0.824 |
| Shoulder | 0.566 (0.450, 0.682) | 0.758 | 0.727 | 0.956 |
| Elbow | 0.747 (0.641, 0.854) | 0.717 | 0.985 | 0.806 |
| Finger | 0.677 (0.567, 0.786) | 0.761 | 0.912 | 0.888 |
| Forearm | 0.742 (0.628, 0.856) | 0.757 | 0.996 | 0.926 |
| Overall | 0.648 (0.617, 0.679) | 0.723 | 0.902 | 0.879 |

Table 3 Inception model performance

| Body part | Cohen Kappa (95% CI) | Sensitivity | Specificity | AUROC |
|---|---|---|---|---|
| Hand | 0.645 (0.523, 0.768) | 0.666 | 0.950 | 0.856 |
| Wrist | 0.775 (0.691, 0.858) | 0.773 | 0.978 | 0.936 |
| Humerus | 0.762 (0.653, 0.871) | 0.850 | 0.911 | 0.912 |
| Shoulder | 0.628 (0.519, 0.738) | 0.800 | 0.828 | 0.885 |
| Elbow | 0.696 (0.581, 0.810) | 0.772 | 0.913 | 0.881 |
| Finger | 0.665 (0.554, 0.776) | 0.759 | 0.902 | 0.882 |
| Forearm | 0.681 (0.555, 0.806) | 0.718 | 0.956 | 0.890 |
| Overall | 0.697 (0.656, 0.738) | 0.765 | 0.922 | 0.897 |

Table 4 Inception v2 model performance

| Body part | Cohen Kappa (95% CI) | Sensitivity | Specificity | AUROC |
|---|---|---|---|---|
| Hand | 0.552 (0.418, 0.687) | 0.606 | 0.920 | 0.816 |
| Wrist | 0.749 (0.662, 0.836) | 0.773 | 0.957 | 0.929 |
| Humerus | 0.762 (0.653, 0.871) | 0.835 | 0.926 | 0.916 |
| Shoulder | 0.618 (0.508, 0.729) | 0.821 | 0.797 | 0.851 |
| Elbow | 0.760 (0.656, 0.864) | 0.787 | 0.956 | 0.907 |
| Finger | 0.667 (0.556, 0.777) | 0.807 | 0.858 | 0.885 |
| Forearm | 0.681 (0.555, 0.806) | 0.718 | 0.956 | 0.869 |
| Overall | 0.687 (0.646, 0.729) | 0.769 | 0.910 | 0.888 |

Fig. 4 ROC curves for the DenseNet, Inception and Inception v2 models

Table 5 GnCNNr model performance

| Body part | Cohen Kappa (95% CI) | Sensitivity | Specificity | AUROC |
|---|---|---|---|---|
| Hand | 0.593 (0.464, 0.722) | 0.636 | 0.930 | 0.835 |
| Wrist | 0.776 (0.693, 0.859) | 0.793 | 0.964 | 0.932 |
| Humerus | 0.777 (0.671, 0.883) | 0.880 | 0.897 | 0.924 |
| Shoulder | 0.566 (0.450, 0.682) | 0.778 | 0.787 | 0.856 |
| Elbow | 0.747 (0.641, 0.854) | 0.787 | 0.945 | 0.906 |
| Finger | 0.677 (0.567, 0.786) | 0.771 | 0.902 | 0.888 |
| Forearm | 0.742 (0.628, 0.856) | 0.781 | 0.956 | 0.926 |
| Overall | 0.698 (0.657, 0.739) | 0.776 | 0.913 | 0.899 |

Conclusion

The research experimented with four models on the MURA dataset, which contains 40,561 images, using different machine learning techniques, namely DenseNet-169, Inception v3, Inception-ResNet v2 and GnCNNr. Along with batch normalization, group standardization, data augmentation and a cyclic learning rate scheduler, all four models achieved a Kappa statistic of at least 0.648, with the highest performance obtained by the GnCNNr model architecture, which reached a Kappa statistic of 0.698 with a 95% confidence interval of [0.657, 0.739]. The accuracy and other interpretation metrics can also be improved by using a more extensively integrated GnCNNr model.

Further regularization techniques and artificially increasing the dataset size can be applied to reduce overfitting and improve model generalization. The GnCNNr model can also be integrated with CNN models designed using genetic algorithms rather than by manual methods.

Fig. 5 ROC curve for GnCNNr model

Table 6 Comparison of the results from the various models

| Model | Cohen Kappa | Sensitivity | Specificity | AUROC |
|---|---|---|---|---|
| DenseNet model | 0.648 | 0.723 | 0.902 | 0.879 |
| Inception model | 0.697 | 0.765 | 0.922 | 0.897 |
| Inception v2 model | 0.687 | 0.769 | 0.910 | 0.888 |
| GnCNNr model | 0.698 | 0.776 | 0.913 | 0.899 |

Fig. 6 Comparison graph for the various models

Author Contributions MG: Documentation, RM: Methodology and conceptualization, DK: Supervision-Review editing, SR: Writing Final Draft and Conceptualization, RA: Methodology.

Funding There is no funding received for the project.

Compliance with Ethical Standards

Conflict of Interest There is no conflict of interest.

References

1. Kaplan KM, Spivak JM, Bendo JA. Embryology of the spine and associated congenital abnormalities. Spine J. 2005;5(5):564–76.

2. Becker KG. The common variants/multiple disease hypothesis of common complex genetic disorders. Med Hypotheses. 2004;62(2):309–17. https://doi.org/10.1016/S0306-9877(03)00332-3.

3. Munné S, Chen S, Colls P, Garrisi J, Zheng X, Cekleniak N, Cohen J. Maternal age, morphology, development and chromosome abnormalities in over 6000 cleavage-stage embryos. Reprod Biomed Online. 2007;14(5):628–34. https://doi.org/10.1016/S1472-6483(10)61057-7.

4. Jackson RG, Patel R, Jayatilleke N, Kolliakou A, Ball M, Gorrell G, Stewart R. Natural language processing to extract symptoms of severe mental illness from clinical text: the Clinical Record Interactive Search Comprehensive Data Extraction (CRIS-CODE) project. BMJ Open. 2017. https://doi.org/10.1136/bmjopen-2016-012012.

5. Kurjak A, Kirkinen P, Latin V, Rajhvajn B. Diagnosis and assessment of fetal malformations and abnormalities by ultrasound. J Perinat Med. 1980;8(5):219–35. https://doi.org/10.1515/jpme.1980.8.5.219.

6. Omar S, Ngadi A, Jebur HH. Machine learning techniques for anomaly detection: an overview. Int J Comput Appl. 2013;79(2):33–41.

7. Mittal M, Arora M, Pandey T, Goyal LM (2020) Image segmentation using deep learning techniques in medical images. In: Advancement of machine intelligence in interactive medical image analysis, pp 41–63. Springer, Singapore. https://doi.org/10.1007/978-981-15-1100-4_3

8. Kaur B, Sharma M, Mittal M, Verma A, Goyal LM, Hemanth DJ. An improved salient object detection algorithm combining background and foreground connectivity for brain image analysis. Comput Electr Eng. 2018;71:692–703. https://doi.org/10.1016/j.compeleceng.2018.08.018.

9. Cho G, Yim J, Choi Y, Ko J, Lee SH. Review of machine learning algorithms for diagnosing mental illness. Psychiatry Investig. 2019;16(4):262. https://doi.org/10.30773/pi.2018.12.21.2.

10. Mittal M, Goyal LM, Kaur S, Kaur I, Verma A, Hemanth DJ. Deep learning based enhanced tumor segmentation approach for MR brain images. Appl Soft Comput. 2019;78:346–54. https://doi.org/10.1016/j.asoc.2019.02.036.

11. Brennan-Olsen SL, Cook S, Leech MT, Bowe SJ, Kowal P, Naidoo N, Mohebbi M. Prevalence of arthritis according to age, sex and socioeconomic status in six low and middle income countries: analysis of data from the World Health Organization study on global AGEing and adult health (SAGE) Wave 1. BMC Musculoskelet Disord. 2017;18(1):271. https://doi.org/10.1186/s12891-017-1624-z.

12. Lundervold AS, Lundervold A. An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik. 2019;29(2):102–27. https://doi.org/10.1016/j.zemedi.2018.11.002.

13. Stanford ML Group (2018) What is MURA? MURA Dataset. https://stanfordmlgroup.github.io/competitions/mura/

14. Rajpurkar P, Irvin J, Bagul A, Ding D, Duan T, Mehta H, Langlotz C (2017) Mura: Large dataset for abnormality detection in musculoskeletal radiographs. https://arxiv.org/abs/1712.06957

15. Savitha G, Jidesh P (2019) Lung nodule identification and classification from distorted CT images for diagnosis and detection of lung cancer. In: Machine intelligence and signal analysis, pp 11–23. Springer, Singapore. https://doi.org/10.1007/978-981-13-0923-6_2

16. Wu D, Kim K, Dong B, El Fakhri G, Li Q (2018) End-to-end lung nodule detection in computed tomography. In: International workshop on machine learning in medical imaging, pp 37–45. Springer, Cham. https://doi.org/10.1007/978-3-030-00919-9_5

17. Taboada-Crispi A, Sahli H, Hernandez-Pacheco D, Falcon-Ruiz A (2009) Anomaly detection in medical image analysis. In: Handbook of research on advanced techniques in diagnostic imaging and biomedical applications, pp 426–446. IGI Global. https://doi.org/10.4018/978-1-60566-314-2.ch027

18. Cahill ND, Chen S, Sun Z, Ray LA (2010) US7738683B2—Abnormality detection in medical images. Google Patents. https://patents.google.com/patent/US7738683B2/en

19. Kharrat A, Benamrane N, Messaoud MB, Abid M (2009) Detection of brain tumor in medical images. In: 2009 3rd International conference on signals, circuits and systems (SCS), pp 1–6. IEEE. https://doi.org/10.1109/ICSCS.2009.5412577

20. Spanhol FA, Oliveira LS, Petitjean C, Heutte L (2016) Breast cancer histopathological image classification using convolutional neural networks. In: 2016 International joint conference on neural networks (IJCNN), pp 2560–2567. IEEE. https://doi.org/10.1109/IJCNN.2016.7727519

21. Ramteke RJ, Monali YK. Automatic medical image classification and abnormality detection using k-nearest neighbour. Int J Adv Comput Res. 2012;2(4):190–6.

22. Rajeswari S, Jeyaselvi KT. Support vector machine classification for MRI images. Int Emerg Trends Comput Electron Eng. 2012;1(3):1534.

23. Erus G, Zacharaki EI, Bryan N, Davatzikos C (2010) Learning high-dimensional image statistics for abnormality detection on medical images. In: 2010 IEEE Computer society conference on computer vision and pattern recognition-workshops, pp 139–145. IEEE. https://doi.org/10.1109/CVPRW.2010.5543141

24. Dimitrovski I, Kocev D, Loskovska S, Džeroski S. Hierarchical annotation of medical images. Pattern Recogn. 2011;44(10–11):2436–49.

25. Crippa A, Salvatore C, Perego P, Forti S, Nobile M, Molteni M, Castiglioni I. Use of machine learning to identify children with autism and their motor abnormalities. J Autism Dev Disord. 2015;45(7):2146–56.

26. Abdullah AA, Yaakob A, Ibrahim Z (2018) Prediction of spinal abnormalities using machine learning techniques. In: 2018 International conference on computational approach in smart systems design and applications (ICASSDA), pp 1–6. IEEE. https://doi.org/10.1109/ICASSDA.2018.8477622

27. Iglovikov VI, Rakhlin A, Kalinin AA, Shvets AA (2018) Paediatric bone age assessment using deep convolutional neural networks. In: Deep learning in medical image analysis and multimodal learning for clinical decision support, pp 300–308. Springer, Cham

28. Räsänen LP, Mononen ME, Nieminen MT, Lammentausta E, Jurvelin JS, Korhonen RK, OAI Investigators. Implementation of subject-specific collagen architecture of cartilage into a 2D computational model of a knee joint—data from the osteoarthritis initiative (OAI). J Orthop Res. 2013;31(1):10–22.

29. Gertych A, Zhang A, Sayre J, Pospiech-Kurkowska S, Huang HK. Bone age assessment of children using a digital hand atlas. Comput Med Imaging Graphics. 2007;31(4–5):322–31. https://doi.org/10.1016/j.compmedimag.2007.02.012.

30. Wang X, Peng Y, Lu L, Lu Z, Summers RM (2018) Tienet: Text-image embedding network for common thorax disease classification and reporting in chest X-rays. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9049–9058. https://doi.org/10.1109/CVPR.2018.00943

31. Kalpathy-Cramer J, de Herrera AGS, Demner-Fushman D, Antani S, Bedrick S, Müller H. Evaluating performance of biomedical image retrieval systems—an overview of the medical image retrieval task at ImageCLEF 2004–2013. Comput Med Imaging Graph. 2015;39:55–61.

32. Jaeger S, Candemir S, Antani S, Wáng YXJ, Lu PX, Thoma G. Two public chest X-ray datasets for computer-aided screening of pulmonary diseases. Quant Imaging Med Surg. 2014;4(6):475. https://doi.org/10.3978/j.issn.2223-4292.2014.11.20.

33. Shiraishi J, Abe H, Li F, Engelmann R, MacMahon H, Doi K. Computer-aided diagnosis for the detection and classification of lung cancers on chest radiographs: ROC analysis of radiologists’ performance. Acad Radiol. 2006;13(8):995–1003. https://doi.org/10.1016/j.acra.2006.04.007.

34. Smith LN (2017) Cyclical learning rates for training neural networks. In: 2017 IEEE Winter conference on applications of computer vision (WACV), pp 464–472. IEEE. https://doi.org/10.1109/WACV.2017.58

35. Dauphin Y, De Vries H, Bengio Y (2015) Equilibrated adaptive learning rates for non-convex optimization. In: Advances in neural information processing systems, pp 1504–1512

36. Bonn D, Rosebrock A, Saxena A, Zhang X, Esbel O (2020) Cyclical learning rates with Keras and deep learning. https://www.pyimagesearch.com/2019/07/29/cyclical-learning-rates-with-keras-and-deep-learning/

37. Mittal A, Kumar D. AiCNNs (artificially-integrated convolutional neural networks) for brain tumor prediction. EAI Endorsed Trans Pervasive Health Technol. 2019;5(17):98. https://doi.org/10.4108/eai.12-2-2019.161976.

38. Verma OP, Roy S, Pandey SC, Mittal M, editors. Advancement of machine intelligence in interactive medical image analysis. New York: Springer Nature; 2019.

39. Mittal A, Kumar D, Mittal M, Saba T, Abunadi I, Rehman A, Roy S. Detecting pneumonia using convolutions and dynamic capsule routing for chest X-ray images. Sensors. 2020;20(4):1068. https://doi.org/10.3390/s20041068.

40. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778. https://doi.org/10.1109/CVPR.2016.90

41. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708. https://doi.org/10.1109/CVPR.2017.243

42. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594

43. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826. https://doi.org/10.1109/CVPR.2016.308

44. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-first AAAI conference on artificial intelligence

45. McHugh ML. Interrater reliability: the Kappa statistic. Biochem Med. 2012;22(3):276–82.

46. Tape TG. Using the receiver operating characteristic (ROC) curve to analyze a classification model. Nebraska: University of Nebraska; 2000. p. 1–3.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.