Machine learning approach for automated screening of malaria parasite using light microscopic images

Dev Kumar Das a, Madhumala Ghosh a, Mallika Pal b, Asok K. Maiti b, Chandan Chakraborty a,*

a School of Medical Science and Technology, IIT Kharagpur, India
b Department of Pathology, Midnapur Medical College & Hospital, Midnapur, West Bengal, India
* Corresponding author. Tel.: +91 3222 283570; fax: +91 3222 282221. E-mail address: [email protected] (C. Chakraborty).

Micron 45 (2013) 97–106. http://dx.doi.org/10.1016/j.micron.2012.11.002. © 2012 Elsevier Ltd. All rights reserved.

Article history: Received 24 November 2011; received in revised form 3 November 2012; accepted 6 November 2012.

Keywords: Malaria parasite; Erythrocyte; Texture; Bayesian classifier; Machine learning

Abstract

The aim of this paper is to address the development of computer assisted malaria parasite characterization and classification using a machine learning approach based on light microscopic images of peripheral blood smears. To this end, microscopic image acquisition from stained slides, illumination correction and noise reduction, erythrocyte segmentation, feature extraction, feature selection and finally classification of different stages of malaria (Plasmodium vivax and Plasmodium falciparum) have been investigated. The erythrocytes are segmented using marker controlled watershed transformation, and subsequently a total of ninety six features describing the shape, size and texture of erythrocytes are extracted for parasite infected versus non-infected cells. Ninety four features are found to be statistically significant in discriminating six classes. A feature selection-cum-classification scheme has been devised by combining the F-statistic with statistical learning techniques, i.e., Bayesian learning and support vector machine (SVM), in order to obtain higher classification accuracy using the best set of discriminating features. Results show that the Bayesian approach provides the highest accuracy, i.e., 84%, for malaria classification by selecting the 19 most significant features, while SVM provides its highest accuracy, i.e., 83.5%, with the 9 most significant features. Finally, the performance of these two classifiers under the feature selection framework has been compared toward malaria parasite classification.

1. Introduction

Malaria is a parasitic infectious disease caused by Plasmodium species viz. Plasmodium falciparum (P. falciparum), Plasmodium vivax (P. vivax), Plasmodium malariae (P. malariae) and Plasmodium ovale (P. ovale) (Greer et al., 2009). This parasite exhibits a complex life cycle involving an insect vector (mosquito) and a vertebrate host (human). Malaria is common in Asian and Sub-African populations (Frean, 2010) and is responsible for 1.5–2.7 million deaths per year (Raviraja et al., 2006). In the Indian population, the incidence rate is higher for P. vivax infection than for P. falciparum. It has been observed that ∼50–60% of malaria patients in India are affected by P. vivax while ∼40–50% are affected by P. falciparum (NVBDCP, 2010-2011).

As with other diseases, it is well understood that early detection of malaria infection leads to prevention and cure by means of providing treatment and management. Red blood cells (erythrocytes) are mainly affected by the malaria parasites. In human blood, the parasite cycles through three life stages viz., trophozoite, schizont and gametocyte. These infection stages are visible under a light microscope using peripheral blood smears. The trophozoite stage is often known as the ring stage (WHO, 2010). In the case of P. falciparum, the trophozoite and gametocyte stages are visible under the microscope, but the schizont stage is rarely visible because it remains in capillaries and bone marrow (Cuomo et al., 2012). Thus, during peripheral blood smear screening, all three stages are visible in P. vivax infection and two stages (trophozoite and gametocyte) in P. falciparum infection. Clinicians examine erythrocytes under a light microscope to study the color and morphological changes toward malaria diagnosis. Evaluation accuracy mostly depends on the expert's clinico-pathological understanding. In effect, such a procedure involves human error in terms of subjectivity, which leads to inconsistent as well as lower diagnostic accuracy. To increase diagnostic precision by minimizing such subjectivity, the development of a computer assisted malaria parasite detection tool has been given importance in modern pathological services, where it assists the clinician in quickly making a better decision toward malaria diagnosis.


In modern diagnostic system development, machine learning techniques have made enormous contributions toward achieving higher diagnostic precision in medical imaging informatics such as microscopy, ultrasound imaging, MRI, and CT. Microscopic image analysis is the most important as well as a highly informative tool toward pathological evaluation of different diseases viz. hematological disorder, oral cancer, breast cancer, cervical cancer, etc. Like other microscopic image analyses, peripheral blood smear screening is one of the essential diagnostic techniques to identify hematological disorders (anemia, thalassemia, etc.) and parasitic infection (malaria, filaria) in the blood. In the case of malaria detection, pathologists frequently use a light microscope to detect infection in erythrocytes based on color as well as morphological changes of the erythrocyte.

Nowadays there are various techniques for malaria diagnosis available in the market (Tangpukdee et al., 2009), but the conventional microscopic technique remains the gold standard for malaria diagnosis. Other methods are not cost effective and also require further improvement in diagnostic precision. A few studies have suggested a computer vision approach to detect malaria infection based on digital microscopic images of peripheral blood smears. Color histogram based malaria parasite detection (Tek et al., 2006) has been carried out. Further, Diaz et al. (2009) showed quantification and classification of P. falciparum infected erythrocytes. Morphological and novel threshold selection techniques for identification of erythrocytes were used by Ross et al. (2006).

The malaria parasite was segmented in HSV (Hue, Saturation, and Value) color space (Makkapati and Rao, 2009). Erythrocytes infected by malaria parasites were detected by using a statistical approach (Raviraja et al., 2008). Mathematical morphology and granulometry approaches (Dempster and Ruberto, 1999) and gray level thresholding (Toha and Ngah, 2007) were applied for estimation of parasitemia. Kumar et al. (2006) suggested a clump splitting algorithm and a rule based approach to segment out clumped erythrocytes from peripheral blood smear images. Sio et al. (2007) applied a rule based approach for P. falciparum infection detection. Most of the literature showed malaria classification based on cultured blood smear samples. No comprehensive approach toward developing a pathological decision support system is yet available for computerized detection of malaria parasitemia viz., P. vivax and P. falciparum, using peripheral blood smear images.

In view of this, our study focuses on the development of a machine learning approach for discriminating five different stages (three of P. vivax and two of P. falciparum) of erythrocytes infected by malaria, together with non-infected erythrocytes, using color, textural and morphological information. Fig. 1 depicts the systematic approach for executing the screening tool development methodology for malaria screening.

Fig. 1. Work flow diagram of the proposed methodology.

2. Materials and methods

2.1. Sample collection

Thin peripheral blood smear samples were prepared and collected from Midnapur Medical College & Hospital, West Midnapur, India. Here, thin smears were prepared on clean and disinfected slides and stained with Leishman stain for visualizing the different cellular components. Peripheral blood smear slides were collected from a total of 600 patients (50 normal, 496 P. vivax and 54 P. falciparum). Out of 600, 150 slides (70 P. vivax, 40 P. falciparum and 40 normal) were considered because the other slides were not well prepared and not clearly visible under the microscope. All samples were verified and labeled by three distinct pathologists as P. vivax, P. falciparum or normal cases. Approval for this study was obtained from the research ethics committee of the Institute.

2.1.1. Image acquisition

Leishman stained peripheral blood smear images of malaria patients' slides were optically grabbed by a Leica Observer (Leica DM750) under 100× oil objectives (NA 1.5150) in JPEG format, and the image size was 2048 × 1536. The effective magnification and pixel size were 1000 and 0.064 μm, respectively. All images were labeled by the pathologists, who marked the P. vivax and P. falciparum images infection stage wise. Fig. 2 shows the grabbed images of different malaria infection stages.

Fig. 2. Original grabbed images. (a) Gametocyte and (b) trophozoite infection stages of P. vivax.

2.2. Preprocessing

2.2.1. Illumination correction

Due to staining variability of the blood smear and camera calibration, the illumination of the microscope images changes. Several illumination correction techniques have been applied in the literature. Here we have considered the gray world assumption (Lam, 2005) for correcting illumination. Fig. 3 shows the result of illumination correction.

Fig. 3. Illumination corrected images. (a–c) Original images and (d–f) illumination corrected images.

Mathematically, the gray world assumption is described as follows. Let f(x, y) be the grabbed image of size M × N, whose red, green and blue channels are f_r(x, y), f_g(x, y) and f_b(x, y), respectively. The average value of each channel is calculated as

$$F_{r,\mathrm{avg}} = \frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N} f_r(x, y) \tag{1}$$

$$F_{g,\mathrm{avg}} = \frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N} f_g(x, y) \tag{2}$$

Table 1. Quantitative performance measure of different denoising filters.

Filtering technique      MSE             PSNR     RMSE    SNR
Median filter            1.025           48.021   1.012   23.949
Wiener                   1.070           47.837   1.034   23.764
Max filter               0.469           51.416   0.685   27.344
Geometric mean filter    0.360 × 10^−4   92.560   0.006   41.164
M3 filter                0.399 × 10^−3   82.115   0.020   30.719

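$$F_{b,\mathrm{avg}} = \frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N} f_b(x, y) \tag{3}$$

In this method, keeping the green channel unchanged, the gains for the red and blue channels were computed as

$$g_r = \frac{F_{g,\mathrm{avg}}}{F_{r,\mathrm{avg}}} \quad \text{and} \quad g_b = \frac{F_{g,\mathrm{avg}}}{F_{b,\mathrm{avg}}}$$

Subsequently, the red and blue channels were adjusted as

$$f_r^{\mathrm{adj}}(x, y) = g_r \times f_r(x, y) \quad \text{and} \quad f_b^{\mathrm{adj}}(x, y) = g_b \times f_b(x, y) \tag{4}$$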
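The gray world correction of Eqs. (1)–(4) can be sketched in a few lines. This is an illustration only, not the authors' implementation: the function name `gray_world` and the nested-list image layout are assumptions, no clipping to the display range is applied, and the red and blue channel means are assumed to be non-zero.

```python
def gray_world(image):
    """Gray world correction: scale red and blue so their means match green.

    `image` is a list of rows; each pixel is an (r, g, b) tuple (assumed layout).
    """
    pixels = [p for row in image for p in row]
    n = len(pixels)
    r_avg = sum(p[0] for p in pixels) / n   # Eq. (1)
    g_avg = sum(p[1] for p in pixels) / n   # Eq. (2)
    b_avg = sum(p[2] for p in pixels) / n   # Eq. (3)
    g_r = g_avg / r_avg                     # gain for the red channel
    g_b = g_avg / b_avg                     # gain for the blue channel
    # Eq. (4): the green channel is left unchanged.
    return [[(g_r * r, g, g_b * b) for (r, g, b) in row] for row in image]


if __name__ == "__main__":
    # A reddish two-pixel image becomes neutral gray after correction.
    print(gray_world([[(100, 50, 25), (100, 50, 25)]]))
```

After correction the three channel means coincide, which is the property the gray world assumption enforces.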

2.2.2. Noise reduction

Noise removal is still a challenge in medical image analysis. Several filtering methods exist, but each one has its own limitations. Based on a quantitative measure of performance (Thangavel et al., 2009), the geometric mean filter (Gonzalez and Woods, 2002) was selected for impulse noise removal. Table 1 shows the performance of five different types of filters and Fig. 4 shows the output of the different impulse noise removal filters. The geometric mean filter can be defined as

$$f_1(x, y) = \left[\prod_{(s,t) \in S_{xy}} g(s, t)\right]^{1/mn} \tag{5}$$

where g(s, t) is the gray image of the illumination corrected image, and S_xy is the set of coordinates of the window of size m × n.

Fig. 4. Output images of the different types of impulse noise removal filter.
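As an illustration of Eq. (5), the geometric mean filter can be sketched as below. The function name and the border handling are assumptions: at the image border the window is clipped, so the exponent uses the number of pixels actually present rather than mn. The product is evaluated through logarithms to avoid overflow on large windows.

```python
import math


def geometric_mean_filter(gray, m=3, n=3):
    """Geometric mean over an m x n window centred on each pixel (Eq. 5).

    `gray` is a list of rows of non-negative intensities (assumed layout).
    Windows are clipped at the image border (an assumption of this sketch).
    """
    rows, cols = len(gray), len(gray[0])
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            window = [gray[s][t]
                      for s in range(max(0, x - m // 2), min(rows, x + m // 2 + 1))
                      for t in range(max(0, y - n // 2), min(cols, y + n // 2 + 1))]
            if min(window) <= 0:
                # Any zero in the window makes the product, and the mean, zero.
                out[x][y] = 0.0
            else:
                # prod(w)^(1/k) computed as exp(mean(log w)).
                out[x][y] = math.exp(sum(math.log(v) for v in window) / len(window))
    return out
```

A single bright impulse inside a window is strongly damped because it enters the result only through its logarithm.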

2.3. Erythrocyte segmentation

A peripheral blood smear image consists of four components viz. erythrocytes, leukocytes, platelets and plasma. Leukocytes and platelets are morphologically different from erythrocytes. In this study, erythrocytes are the main regions of interest. The marker controlled watershed algorithm (Gonzalez and Woods, 2002; Beucher and Meyer, 1993) has been applied to separate erythrocytes from the microscopic image. Fig. 5 shows the segmentation result of peripheral blood smear images using the marker controlled watershed technique. In principle, the marker controlled watershed algorithm consists of the following three steps.

Step 1: Determine the gradient image using the Sobel filter.
Step 2: Extract the foreground and background regions (markers).
Step 3: Compute the watershed transform.

Fig. 5. Marker controlled watershed segmentation result. (a) Original image and (b) segmented image.

2.4. Feature extraction

Feature extraction is a transformation of image data into another domain that produces features. In the present study, five different types of malaria infected erythrocytes and non-infected cells were discriminated from peripheral blood smear images. In the manual evaluation process, pathologists consider color and morphological variation for identifying which types of infected erythrocytes are present on a particular peripheral blood smear slide. We have computed a total of 80 textural features (entropy, Haralick textural features, local binary pattern, fractal dimension, histogram based features, gray level run length matrix based texture) and 16 morphological features (shape features and Hu's moments) to discriminate the six types of infected and non-infected erythrocytes (see Table 2).

Table 2. List of extracted quantitative features.

Feature no.   Feature
1             Havrda and Charvat entropy
2             Kapur's entropy
3             Renyi entropy
4             Yager's measure
5             Shannon entropy
6–24          Haralick textural features (19)
25            Fractal dimension
26–31         Local binary pattern (6)
32–36         Histogram features (5)
37–45         Shape features (9)
46–52         Hu's moments (7)
53–96         GLRLM (44)

2.4.1. Entropy

Entropy is the measure of uncertainty associated with randomness. We have considered five different types of entropy measures viz., Shannon, Renyi, Havrda and Charvat, and Kapur's entropy (Pharwaha and Sing, 2009) and Yager's measure (Ghosh et al., 2010). Let I(x, y) be the erythrocyte (infected or non-infected) image in which N_i pixels take the gray value i (i = 0, 1, 2, ..., L − 1). The normalized histogram can be defined for a particular region of interest of size (M × N) as

$$H_i = \frac{N_i}{MN} \tag{6}$$

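The Shannon entropy can be defined as

$$S = -\sum_{i=0}^{L-1} H_i \log_2(H_i) \tag{7}$$

Similarly, the other entropy measures are described as follows.

Renyi entropy

$$R = \frac{1}{1-\alpha} \log_2 \sum_{i=1}^{L-1} H_i^{\alpha}, \quad \text{where } \alpha \neq 1,\ \alpha > 0 \tag{8}$$

Havrda and Charvat's entropy

$$R_{HC} = \frac{1}{1-\alpha} \log_2 \sum_{i=0}^{L-1} H_i^{\alpha}, \quad \text{where } \alpha \neq 1,\ \alpha > 0 \tag{9}$$

Kapur's entropy

$$K_{\alpha,\beta} = \frac{1}{\beta-\alpha} \log_2 \frac{\sum_{i=0}^{L-1} H_i^{\alpha}}{\sum_{i=0}^{L-1} H_i^{\beta}}, \quad \text{where } \alpha \neq \beta,\ \alpha > 0,\ \beta > 0 \tag{10}$$

Yager's measure

$$Y = 1 - \frac{\sum_{i=0}^{L-1} (2H_i - 1)}{|M \times N|} \tag{11}$$

In addition, another five types of first order statistical features viz. mean, variance, skewness, kurtosis and energy were computed based on the image histogram.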
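To make the histogram based entropy definitions concrete, the following sketch computes the normalized histogram of Eq. (6), the Shannon entropy of Eq. (7) and Kapur's entropy of Eq. (10). The function names are illustrative, and empty histogram bins are skipped, following the usual convention that they contribute nothing to the sums (an assumption, since the text does not state how empty bins are handled).

```python
import math


def normalized_histogram(region, levels):
    """H_i = N_i / (M*N), Eq. (6); `region` holds integer gray values in [0, levels)."""
    counts = [0] * levels
    total = 0
    for row in region:
        for value in row:
            counts[value] += 1
            total += 1
    return [c / total for c in counts]


def shannon_entropy(h):
    # Eq. (7); empty bins are skipped.
    return -sum(p * math.log2(p) for p in h if p > 0)


def kapur_entropy(h, alpha, beta):
    # Eq. (10); requires alpha != beta, alpha > 0 and beta > 0.
    numerator = sum(p ** alpha for p in h if p > 0)
    denominator = sum(p ** beta for p in h if p > 0)
    return math.log2(numerator / denominator) / (beta - alpha)


if __name__ == "__main__":
    h = normalized_histogram([[0, 0], [1, 1]], levels=2)
    print(h, shannon_entropy(h), kapur_entropy(h, alpha=2.0, beta=0.5))
```

For a histogram spread evenly over two gray levels, both measures evaluate to one bit, which is a convenient sanity check.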

2.4.2. GLCM based textural features

Gray level co-occurrence matrix (GLCM) (Haralick et al., 1973) based textural features provide useful textural information in many diagnostic problems. In view of this, we used the GLCM and extracted 19 textural features such as energy, entropy, variance, and information measure of correlation (Tan et al., 2009, 2010).

Suppose I(x, y) denotes the segmented erythrocyte (infected or non-infected) image having N distinct gray level intensities (0, 1, 2, 3, ..., N − 1). Firstly, we calculate the GLCM of order N × N, where N refers to the number of gray levels. Based on this GLCM, Haralick described statistical features for describing the textural pattern of an image. Some of these are calculated as follows.

If P(i, j) is the normalized dependence matrix and N is the number of gray levels present in the erythrocyte, then

Entropy

$$I_1 = -\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} P(i, j) \log(P(i, j)) \tag{12}$$

Energy

$$I_2 = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} P(i, j)^2 \tag{13}$$

Correlation

$$I_3 = \frac{\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} (i\,j)\,P(i, j) - \mu_x \mu_y}{\sigma_x \sigma_y} \tag{14}$$

where σ_x, σ_y, μ_x, and μ_y indicate the standard deviations and means of P_x and P_y, which correspond to the partial probability density functions. P_x(i) is the ith entry in the marginal-probability matrix obtained by summing the rows of P(i, j).

Variance

$$I_4 = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} (i - \mu)^2 \log(P(i, j)), \quad \text{where } \mu = \text{mean of } P(i, j) \tag{15}$$

Information correlation measure 1

$$I_5 = \frac{I_1 - HXY1}{\max(HX, HY)} \tag{16}$$

Information correlation measure 2

$$I_6 = \left(1 - \exp[-2(HXY2 - I_1)]\right)^{1/2} \tag{17}$$

where HX and HY are the entropies of P_x and P_y, and

$$HXY = -\sum_{i,j=0}^{N-1} P(i, j) \log(P(i, j))$$

$$HXY1 = -\sum_{i,j=0}^{N-1} P(i, j) \log(P_x(i) P_y(j))$$

$$HXY2 = -\sum_{i,j=0}^{N-1} P_x(i) P_y(j) \log(P_x(i) P_y(j))$$

Sum entropy

$$I_7 = -\sum_{i=2}^{2(N-1)} P_{x+y}(i) \log(P_{x+y}(i)) \tag{18}$$
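The construction of the normalized matrix P(i, j) and the two simplest Haralick features, Eqs. (12) and (13), can be sketched as follows. The pixel offset (one step to the right by default), the non-symmetric counting and the function names are assumptions of this sketch; the text does not specify which offsets were used.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix P(i, j) for the pixel offset (dy, dx).

    `image` is a list of rows of integer gray values in [0, levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                counts[image[y][x]][image[y2][x2]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]


def glcm_entropy(p):
    # Eq. (12); zero entries are skipped.
    return -sum(v * math.log(v) for row in p for v in row if v > 0)


def glcm_energy(p):
    # Eq. (13)
    return sum(v * v for row in p for v in row)
```

A perfectly regular texture concentrates P(i, j) in a single cell, giving energy 1 and entropy 0; more varied textures spread the mass and lower the energy.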

2.4.3. GLRLM based textural features

The gray level run length matrix (GLRLM) was proposed by Galloway (1975) to describe coarse structure. For an erythrocyte image I(x, y), the run length matrix R(i, j) specifies the number of runs of length j in the given direction for a particular gray value i. Based on this run length matrix, Galloway (1975) proposed five textural features. Further, Chu et al. (1990) and Dasarathy and Holder (1991) proposed six new textural features. In total, 11 textural features (Tang, 1998) for each of the 0°, 45°, 90°, and 135° direction angles were computed as follows, where N_g is the number of gray levels and N_r is the maximum run length.

Short run emphasis (SRE)

$$SRE = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)/j^2}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)} \tag{19}$$

Long run emphasis (LRE)

$$LRE = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} j^2 R(i, j)}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)} \tag{20}$$

Gray-level non uniformity (GLNU)

$$GLNU = \frac{\sum_{i=1}^{N_g}\left(\sum_{j=1}^{N_r} R(i, j)\right)^2}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)} \tag{21}$$

Run length non uniformity (RLNU)

$$RLNU = \frac{\sum_{j=1}^{N_r}\left(\sum_{i=1}^{N_g} R(i, j)\right)^2}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)} \tag{22}$$

Run percentage (RP)

$$RP = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)}{P} \tag{23}$$

where P is the total number of pixels in the image.

Low gray-level run emphasis (LGRE)

$$LGRE = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)/i^2}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)} \tag{24}$$

High gray-level run emphasis (HGRE)

$$HGRE = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j) \cdot i^2}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)} \tag{25}$$

Short run low gray-level run emphasis (SRLGE)

$$SRLGE = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} \dfrac{R(i, j)}{i^2 \cdot j^2}}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)} \tag{26}$$

Short run high gray-level run emphasis (SRHGE)

$$SRHGE = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} (R(i, j) \cdot i^2)/j^2}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)} \tag{27}$$

Long run low gray-level run emphasis (LRLGE)

$$LRLGE = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} (R(i, j) \cdot j^2)/i^2}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)} \tag{28}$$

Long run high gray-level run emphasis (LRHGE)

$$LRHGE = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j) \cdot i^2 \cdot j^2}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} R(i, j)} \tag{29}$$
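A minimal sketch of the run length matrix for the horizontal (0°) direction, together with SRE and LRE from Eqs. (19) and (20), is given below. The function names are illustrative; in the code, column index j − 1 stores runs of length j, and the other three directions would need their own traversal.

```python
def run_length_matrix(image, levels):
    """R[i][j-1] = number of horizontal (0 degree) runs of gray value i with length j."""
    max_run = len(image[0])
    R = [[0] * max_run for _ in range(levels)]
    for row in image:
        start = 0
        for x in range(1, len(row) + 1):
            # A run ends at the end of the row or when the gray value changes.
            if x == len(row) or row[x] != row[start]:
                R[row[start]][x - start - 1] += 1
                start = x
    return R


def short_run_emphasis(R):
    # Eq. (19): runs are weighted by 1 / j^2.
    total = sum(sum(row) for row in R)
    return sum(v / (j + 1) ** 2 for row in R for j, v in enumerate(row)) / total


def long_run_emphasis(R):
    # Eq. (20): runs are weighted by j^2.
    total = sum(sum(row) for row in R)
    return sum(v * (j + 1) ** 2 for row in R for j, v in enumerate(row)) / total
```

Fine textures produce many short runs and therefore a high SRE, while coarse textures shift the weight toward LRE.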

2.4.4. Fractal dimension

Fractal dimension is basically used for estimating the roughness of an image surface (Krishnan et al., 2012). If we consider the gray scale profile as the third dimension along with the two dimensions of the image, the variation in this profile gives the roughness or textural changes of the virtual surface consisting of the infected erythrocyte. Here we considered the modified differential box counting with sequential algorithm (Mandelbrot, 1982; Sarkar and Choudhury, 1994). Mathematically, the fractal dimension can be defined as

$$D = \lim_{r \to 0} \frac{\log N_r}{\log(1/r)} \tag{30}$$

The summation of the differences between maximum and minimum intensities provides N_r, where r can be found by

$$r = \frac{S}{M}$$

where S and M denote the grid size and the minimum size of the image, respectively. The grid contribution N_r can be calculated as follows

$$N_r = \sum_{i,j} n_r(i, j) \tag{31}$$

Let the maximum and minimum gray levels of the image I(x, y) in the (i, j) grid fall in the box numbers k and l, respectively. Then

$$n_r(i, j) = k - l + 1 \tag{32}$$

Fig. 6. Class conditional density plot for non-infected and the other five types of malarial infection stage. (a) Kapur's entropy and (b) fractal dimension.

Table 3. Statistical test of features describing malaria samples.

Feature set       Feature
1*                Havrda and Charvat entropy
2*                Kapur's entropy
3*                Renyi entropy
4*                Yager's measure
5*                Shannon entropy
6–24*             Haralick textural features (19)
25*               Fractal dimension
26–31*            Local binary pattern (6)
32, 34–36*        Histogram features (4)
37–40, 42–45*     Shape features (8)
46–52*            Hu's moments (7)
53–96*            GLRLM (44)
33                Histogram feature
41                Shape feature

* p < 0.001 indicates statistical significance.

2.4.5. Local binary pattern (LBP)

LBP is an important textural feature describing the local neighborhood (Ojala et al., 2002; Krishnan et al., 2011) for gray scale images. Here we have considered a circular neighborhood and bilinear interpolation for LBP computation. Let P be the number of circular neighborhood pixel points for radius R, let G_c indicate the gray value of the center pixel of the circular neighborhood for an erythrocyte image I(x, y), and let G_p, for p = 0, ..., P − 1, be the gray values of the corresponding circular neighborhood pixels. Depending on the gray value of the center pixel G_c, the P circular points are converted into a binary (0 or 1) pattern. The local texture of the image I(x, y) is defined as

$$T = t(G_c, G_0, \ldots, G_{P-1}) \tag{33}$$

The LBP for the center pixel can be defined as

$$LBP_{P,R} = \sum_{p=0}^{P-1} F(G_p - G_c)\,2^p, \quad \text{where } F(x) = \begin{cases} 1, & \text{if } x \geq 0; \\ 0, & \text{otherwise} \end{cases} \tag{34}$$

In rotation invariant mapping, all neighborhood sets are rotated in the clockwise direction to obtain the maximum number of most significant bits that are zero in the LBP code.

$$LBP^{ri}_{P,R} = \min\{ROR(LBP_{P,R}, i) \mid i = 0, 1, \ldots, P - 1\} \tag{35}$$

Here ROR(x, i) denotes the circular rotation of a bit sequence x by i steps. To remove sampling artifacts, a uniformity measure (U) is considered based on the transitions in the neighborhood pattern. In the LBP code, patterns with U ≤ 2 are considered.

$$LBP^{riu2}_{P,R} = \begin{cases} \sum_{p=0}^{P-1} F(G_p - G_c), & \text{if } U(LBP_{P,R}) \leq 2 \\ P + 1, & \text{otherwise} \end{cases} \tag{36}$$

Here we have considered two features (mean and variance) for every radius (R = 1, 2, 3) with corresponding P of 8, 16, and 24.
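The LBP code of Eq. (34) and the uniform rotation invariant mapping of Eq. (36) can be illustrated for a single pixel. This sketch simplifies the method described above: it uses the eight immediate neighbours on the square ring as an approximation of the circular neighborhood with P = 8 and R = 1, without bilinear interpolation, and all function names are hypothetical.

```python
# Neighbour offsets (dy, dx) in circular order around the centre pixel.
RING = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def lbp_bits(image, y, x):
    """Thresholded neighbourhood F(G_p - G_c), p = 0..7, of the interior pixel (y, x)."""
    gc = image[y][x]
    return [1 if image[y + dy][x + dx] - gc >= 0 else 0 for dy, dx in RING]


def lbp_code(bits):
    # Eq. (34): sum over p of F(G_p - G_c) * 2^p.
    return sum(b << p for p, b in enumerate(bits))


def lbp_riu2(bits):
    # Eq. (36): number of ones if the pattern is uniform (U <= 2), else P + 1.
    P = len(bits)
    transitions = sum(bits[p] != bits[(p + 1) % P] for p in range(P))
    return sum(bits) if transitions <= 2 else P + 1


if __name__ == "__main__":
    patch = [[5, 5, 5],
             [1, 3, 1],
             [1, 1, 1]]
    bits = lbp_bits(patch, 1, 1)
    print(bits, lbp_code(bits), lbp_riu2(bits))
```

The per-image features mentioned in the text (mean and variance) would then be taken over the codes of all interior pixels.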

2.4.6. Morphological feature

Morphometric information has a significant role in characterizing and detecting abnormal erythrocytes. In the case of anemia, erythrocyte shape and size become irregular with respect to normal cells. Here nine shape features, namely area, perimeter, circularity, eccentricity, orientation, major axis, minor axis, roundness and form factor (Gonzalez and Woods, 2002), and seven invariant moments (Hu, 1962; Das et al., 2010) are extracted.

3. Feature selection

In this work, a total of 96 textural and morphological features were generated for malaria parasite infected erythrocytes. One-way ANOVA was applied to obtain the F-value for feature selection. In addition, the distribution profiles (Rastogi, 2008; Gun et al., 2008) of features among abnormal and normal erythrocytes were quantified using probability density estimation and box-whisker plots. Figs. 6 and 7 show the density and distribution plots for the six class data. In the case of malaria infected erythrocytes, 94 features were found to be statistically significant. Table 3 shows the F-statistic as the discriminating criterion for feature ranking.

Fig. 7. Box-whisker plot for non-infected and the other five types of malarial infection stage. (a) Kapur's entropy and (b) fractal dimension.

3.1. One-way ANOVA

One-way ANOVA is a study of the relationship between different samples (Gun et al., 2008). It compares the means of three or more classes by using the F distribution. The F statistic is defined as the ratio of the mean square between classes to the mean square within classes. To calculate the mean squares, we first have to compute the sum of squares between classes and the sum of squares within classes.
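As a worked illustration of the F statistic described above, the following sketch computes it for one feature whose values are grouped by class, and then keeps the features whose F value exceeds a threshold. The function names and the example values are assumptions for illustration, not data from the study.

```python
def one_way_f(groups):
    """F = (mean square between classes) / (mean square within classes).

    `groups` holds one list of feature values per class.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)   # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)     # n - k degrees of freedom
    return ms_between / ms_within


def select_features(f_values, threshold):
    """Indices of the features whose F value exceeds the threshold."""
    return [i for i, f in enumerate(f_values) if f > threshold]


if __name__ == "__main__":
    print(one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

Raising the threshold shrinks the selected feature set, which is how a sequence of progressively smaller feature sets can be generated for the classifiers.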

Table 4. Performance analysis of feature selection-cum-classification scheme.

F value    Naïve Bayes' accuracy (%)    SVM accuracy (%)    Feature set
>4         68.35                        67.11               94
>100       70.60                        67.11               74
>150       74.43                        67.11               63
>200       77.25                        67.68               53
>249       80.85                        67.68               47
>300       82.43                        77.70               32
>350       81.64                        76.91               28
>400       84.00                        76.80               19
>450       83.78                        80.96               13
>500       80.96                        75.67               11
>550       82.31                        83.55               9
>600       79.95                        83.44               7
>650       68.24                        73.19               6
>700       65.42                        71.95               3

4. Malaria infection stage classification

4.1. Bayesian approach

Classification of malarial infection stages is a challenging task. Here the Naïve Bayes' classifier (Duda et al., 2007; Han and Kamber, 2006) is used for classifying five stages of malaria infected erythrocytes (ring, schizont and gametocyte for P. vivax, and ring and gametocyte for P. falciparum) and non-infected erythrocytes. Suppose there are m classes viz., C_1, C_2, C_3, ..., C_m, and the d dimensional feature vector X = (x_1, x_2, ..., x_d) is considered as the object descriptor. For a particular feature set X, the classifier assigns the infection stage to the class with the highest posterior probability, i.e., the erythrocyte belongs to the class C_i if and only if

$$P(C_i|X) > P(C_j|X) \quad \text{for } 1 \leq j \leq m,\ j \neq i. \tag{37}$$

The posterior probability can be defined based on Bayes' theorem as

$$P(C_i|X) = \frac{P(X|C_i) \cdot P(C_i)}{P(X)} \tag{38}$$

where P(C_i) is the prior probability of class C_i and P(X) is the evidence, defined by

$$P(X) = \sum_{i=1}^{m} P(X|C_i) P(C_i) \tag{39}$$

P(X|C_i) denotes the likelihood of class C_i with respect to X. Under the naïve assumption, the likelihood function becomes the product of the marginal density functions, defined as

$$P(X|C_i) = \prod_{k=1}^{d} P(x_k|C_i) = P(x_1|C_i) \times P(x_2|C_i) \times P(x_3|C_i) \times \cdots \times P(x_d|C_i) \tag{40}$$

In order to predict the class of an unknown feature set X*, the posterior probability is calculated for each class label C_i, and the class label with the maximum posterior probability is predicted.
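The decision rule of Eqs. (37)–(40) can be sketched as follows. The text does not state which form the marginal densities P(x_k|C_i) take, so this sketch assumes Gaussian densities with per-class means and variances; that choice, the function names and the toy data are all assumptions. Logarithms are used so that the product in Eq. (40) becomes a sum, and P(X) is omitted because it is the same for every class.

```python
import math


def fit(X, y):
    """Per-class prior and per-feature mean and variance (Gaussian assumption)."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        d = len(rows[0])
        means = [sum(r[k] for r in rows) / len(rows) for k in range(d)]
        # A small constant keeps every variance strictly positive.
        variances = [sum((r[k] - means[k]) ** 2 for r in rows) / len(rows) + 1e-9
                     for k in range(d)]
        model[c] = (len(rows) / len(X), means, variances)
    return model


def predict(model, x):
    """Class with the largest log posterior, i.e. the rule of Eq. (37)."""
    def log_posterior(c):
        prior, means, variances = model[c]
        # Eq. (40) in log form: sum of the log marginal densities.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (xk - m) ** 2 / (2 * v)
            for xk, m, v in zip(x, means, variances))
        # Log of the numerator of Eq. (38).
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


if __name__ == "__main__":
    X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
    y = ["a", "a", "b", "b"]
    print(predict(fit(X, y), [1.1, 0.9]))
```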

4.2. Support vector machine (SVM)

SVM is a well known supervised learning technique. It optimizes the class separating hyperplane in such a way that the distance between the patterns and the hyperplane is maximized. Here the data are not linearly separable (Martis et al., 2012; Krishnan et al., 2012), so an RBF kernel has been applied for projecting the data into a higher dimensional space where the data are linearly separable. Let each sample have a d dimensional feature vector X = (x_1, x_2, ..., x_d), and let its class label y take one of the two values +1 and −1. Then the boundary hyperplane is defined as

$$W^T X + b = 0 \tag{41}$$

Here, W is the weight coefficient vector and b is the bias term. The main aim of this algorithm is to minimize the cost function J(W), defined as

$$J(W) = \frac{1}{2} W^T W \tag{42}$$

For linearly non-separable data, Eq. (41) can be written as

$$W^T \Phi(X) + b = 0 \tag{43}$$

where Φ is defined as the kernel transformation to a higher dimensional space.

5. Results

In the proposed scheme, Table 1 showed the results of MSE, RMSE, SNR and PSNR for various filters, where it has been observed that the geometric mean filter provides higher SNR and lower MSE in minimizing the impulse noise. After that, erythrocytes are segmented using the marker controlled watershed method. Tables 2 and 3 showed all 96 extracted features, where 94 features are found to be statistically significant as evaluated by Fisher's F-criterion. In Table 4, the result of the performance analysis of the feature selection-cum-classification scheme has been obtained in order to select the optimum set of features for achieving the highest accuracy in both the learning techniques. 10-fold cross validation approach generated confusion matrices (see Tables 5 and 6) for Bayesian

Table 5. Bayesian learning based confusion matrix for accuracy 84% and 19 most significant features.

                  PV gametocyte   PV schizont   PV trophozoite   PF gametocyte   PF trophozoite   Non-infected
PV gametocyte     143             5             0                0               0                0
PV schizont       12              132           0                0               4                0
PV trophozoite    6               1             121              0               20               0
PF gametocyte     10              0             0                138             0                0
PF trophozoite    3               0             21               0               110              14
Non-infected      0               0             6                0               40               102

PV: P. vivax; PF: P. falciparum.

Table 6. SVM learning based confusion matrix for accuracy 83.5% and 9 most significant features.

                  PV gametocyte   PV schizont   PV trophozoite   PF gametocyte   PF trophozoite   Non-infected
PV gametocyte     110             16            4                18              0                0
PV schizont       12              130           1                0               5                0
PV trophozoite    6               1             118              0               23               0
PF gametocyte     3               0             0                145             0                0
PF trophozoite    0               0             12               3               108              25
Non-infected      0               0             7                0               10               131

PV: P. vivax; PF: P. falciparum.

Page 9: Automated Screening Malaria Parasite Using Light Microscopic Images

D.K. Das et al. / Micron 45 (2013) 97–106 105

Table 7Comparative study of the proposed methodology with the existing methods.

Malaria sp. Staining Performance

Diaz et al. (2009) P. falciparum GiemsaSensitivity – 94%Specificity – 98.7%

Sio et al. (2007) P. falciparum Giemsa –

Tek et al. (2006) P. falciparum GiemsaSensitivity – 74%Specificity – 98%

Ross et al. (2006) P. falciparum GiemsaSensitivity – 85%Positive predictive value – 81%

Proposed methodology P. falciparum & P. vivax Leishman

(a) Bayesian learningSensitivity – 98.10%Specificity – 68.91%

(b) SVM learningSensitivity – 96.62%Specificity – 88.51%

corre

aFtr

6

pcsNtttsassbacfnst

Fig. 8. Malaria classification accuracy

nd SVM techniques based on the most significant set of features.inally, a comparative study has been shown in Table 7 to comparehe proposed machine learning scheme with existing methods withespect to sensitivity and specificity.
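As a cross-check, the summary figures can be recomputed from the confusion matrices of Tables 5 and 6. The sketch below assumes that rows are actual classes and columns are predicted classes (each row sums to 148, the per-class sample count), and that sensitivity and specificity refer to the binary infected versus non-infected split. These definitions are not spelled out in this section, so they are assumptions, although they give values that agree with the reported ones to within rounding.

```python
# Class order: PV gametocyte, PV schizont, PV trophozoite,
#              PF gametocyte, PF trophozoite, Non-infected.
BAYES_CM = [
    [143, 5, 0, 0, 0, 0],
    [12, 132, 0, 0, 4, 0],
    [6, 1, 121, 0, 20, 0],
    [10, 0, 0, 138, 0, 0],
    [3, 0, 21, 0, 110, 14],
    [0, 0, 6, 0, 40, 102],
]
SVM_CM = [
    [110, 16, 4, 18, 0, 0],
    [12, 130, 1, 0, 5, 0],
    [6, 1, 118, 0, 23, 0],
    [3, 0, 0, 145, 0, 0],
    [0, 0, 12, 3, 108, 25],
    [0, 0, 7, 0, 10, 131],
]


def screening_metrics(cm, negative_index=5):
    """Six-class accuracy plus infected-vs-non-infected sensitivity/specificity."""
    size = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(size)) / total

    # Non-infected cells kept as non-infected (true negatives) or flagged (false positives).
    true_neg = cm[negative_index][negative_index]
    false_pos = sum(cm[negative_index]) - true_neg

    # Infected cells wrongly labelled non-infected (false negatives).
    false_neg = sum(cm[i][negative_index] for i in range(size) if i != negative_index)
    true_pos = (total - sum(cm[negative_index])) - false_neg

    return {
        "accuracy": accuracy,
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
    }


bayes = screening_metrics(BAYES_CM)
svm = screening_metrics(SVM_CM)
```

Under these assumptions an infected cell assigned to the wrong infected class still counts as a true positive for screening, which is why sensitivity is much higher than the six-class accuracy.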

6. Discussions

Automated detection of malaria-infected erythrocytes from peripheral blood smear samples using light microscopy is still a challenging task in pathological decision making. In fact, infection stage classification is most important for early diagnosis of malaria. Nowadays both types of malaria infection (P. falciparum and P. vivax) are even observed in a single patient. Pathologists often tend to overlook the presence of both infections owing to the overlapping features of the trophozoite stage. Our proposed methodology provides significant discriminating capability to differentiate the trophozoite stages of both infections (P. falciparum and P. vivax) with reduced subjective error. Most of the literature considered cultured blood samples, whereas here we have considered Leishman-stained peripheral blood smear samples. Textural as well as morphological information is necessary for characterizing abnormal, malaria-infected erythrocytes. Here, we have incorporated both textural and morphological features to discriminate P. vivax- and P. falciparum-infected erythrocytes from non-infected erythrocytes. In our proposed approach, we have considered a total of 888 erythrocytes (148 per class) for training and testing purposes.

In Table 4, it can be observed that the dimension of the feature space decreases, and the classification accuracy varies, as the discriminating potential of the retained features (measured by the F-value) increases. From Table 4 and Fig. 8, it can be observed that the Bayesian approach provides its highest accuracy (84%) with the 19 most significant features, corresponding to F ≥ 400, and that SVM reaches its highest accuracy (83.5%) with the 9 most significant features, for F ≥ 550. Such an interactive statistical feature selection process becomes important when the dimension of the feature set is large. From Table 7, it has been observed that most of the studies have been designed only for P. falciparum infection stage classification using cultured blood samples with Giemsa stain. The proposed machine learning scheme using the Bayesian approach achieved 84% screening accuracy, 98.10% sensitivity and 68.91% specificity for recognition of both P. falciparum and P. vivax parasites. On the other hand, the developed approach using SVM leads to 83.5% screening accuracy, 96.62% sensitivity and 88.51% specificity for recognition of both parasites.
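The threshold-based selection discussed above can be sketched as follows. The F-value is computed here as the standard one-way ANOVA F-statistic, which is an assumption since the F-criterion itself is defined in an earlier section of the paper; the feature names and toy values are hypothetical. The Table 4 rows are copied from the reported results, with columns taken to be F-value threshold, Bayesian accuracy (%), SVM accuracy (%) and number of features.

```python
def f_statistic(groups):
    """One-way ANOVA F for one feature; `groups` holds its values per class."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_features(feature_groups, threshold):
    """Keep the features whose F-value exceeds the threshold."""
    return [
        name
        for name, groups in feature_groups.items()
        if f_statistic(groups) > threshold
    ]


# Hypothetical toy features measured on two classes of cells.
toy_features = {
    "area": [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]],   # well separated classes
    "noise": [[1.0, 5.0, 9.0], [2.0, 5.0, 8.0]],  # identical class means
}
selected = select_features(toy_features, threshold=10.0)

# Table 4 rows: (F threshold, Bayesian accuracy %, SVM accuracy %, no. of features).
TABLE_4 = [
    (4, 68.35, 67.11, 94), (100, 70.60, 67.11, 74), (150, 74.43, 67.11, 63),
    (200, 77.25, 67.68, 53), (249, 80.85, 67.68, 47), (300, 82.43, 77.70, 32),
    (350, 81.64, 76.91, 28), (400, 84.00, 76.80, 19), (450, 83.78, 80.96, 13),
    (500, 80.96, 75.67, 11), (550, 82.31, 83.55, 9), (600, 79.95, 83.44, 7),
    (650, 68.24, 73.19, 6), (700, 65.42, 71.95, 3),
]
best_bayes = max(TABLE_4, key=lambda row: row[1])
best_svm = max(TABLE_4, key=lambda row: row[2])
```

Sweeping the threshold and keeping the row with the best cross-validated accuracy reproduces the operating points quoted in the discussion: 19 features for the Bayesian classifier and 9 features for SVM.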

7. Conclusion

In the field of quantitative microscopy, machine learning plays an important role in the structural and textural characterization of tissue and cells. In view of this, an attempt has been made here to develop computer-aided pattern recognition of malaria parasitemia along with its stages based on learning techniques. The proposed scheme is able to quantitatively characterize both P. vivax and P. falciparum for better pathological understanding. In addition, it is able to automatically classify malaria-infected erythrocytes into trophozoite, schizont and gametocyte stages. Moreover, it may be applicable in telemedicine to provide quick diagnosis in remote places where pathologists are not often accessible.

Acknowledgement

The authors acknowledge the Dept. of Information Technology, Govt. of India for providing financial support to carry out this work (Ref. No. IIT/SRIC/SMST/DPR/2009-10/15).

References

Beucher, S., Meyer, F., 1993. The morphological approach to segmentation: the watershed transformation. Mathematical Morphology in Image Processing, vol. 34. Marcel Dekker, New York, pp. 433–481 (Chapter 12).

Chu, A., Sehgal, C.M., Greenleaf, J.F., 1990. Use of gray value distribution of run lengths for texture analysis. Pattern Recognition Letters 11 (6), 415–420.

Cuomo, M.J., Noel, L.B., White, D.B., 2012. Diagnosing Medical Parasites: A Public Health Officers Guide to Assisting Laboratory and Medical Officers. Retrieved from http://www.phsource.us/PH/PARA/Diagnosing Medical Parasites.pdf

Das, D., Ghosh, M., Chakraborty, C., Pal, M., Maity, A.K., 2010. Invariant moment based feature analysis for abnormal erythrocyte recognition. In: Proceedings of IEEE, ICSMB, IEEE, IIT Kharagpur, India, pp. 242–247.

Dasarathy, B.R., Holder, E.B., 1991. Image characterization based on joint gray-level run-length distribution. Pattern Recognition Letters 12 (8), 497–502.

Dempster, A., Ruberto, C.D., 1999. Morphological processing of malarial slide images. In: Matlab DSP Conference, Espoo, Finland, pp. 16–17.

Diaz, G., Gonzalez, F.A., Romero, E., 2009. A semi automatic method for quantification and classification of erythrocytes infected with malaria parasites in microscopic image. Journal of Biomedical Informatics 42 (2), 296–307.

Duda, R., Hart, P.E., Stork, D.G., 2007. Pattern Classification, 2nd ed. Wiley Pub, New Delhi.

Frean, J., 2010. Microscopic determination of malaria parasite load: role of image analysis. Microscopy: Science, Technology, Application and Education, FORMATEX 3, 862–866.

Galloway, M.M., 1975. Texture analysis using gray level run lengths. Computer Graphics and Image Processing 4 (2), 172–179.

Gonzalez, R.C., Woods, R.E., 2002. Digital Image Processing, 2nd ed. Prentice Hall, New York.

Ghosh, M., Das, D., Chakraborty, C., 2010. Entropy based divergence for leukocyte image segmentation. In: Proceedings of IEEE, ICSMB, IEEE, IIT Kharagpur, India, pp. 409–413.

Greer, J.P., Foerster, J., Rodgers, G.M., Paraskevas, F., Glader, B., et al., 2009. Wintrobe's Clinical Hematology, 12th ed. Lippincott Williams & Wilkins, Philadelphia.

Gun, A.M., Gupta, M.K., Dasgupta, B., 2008. Fundamentals of Statistics, vol. 2. The World Press Pvt. Ltd., Kolkata, India.

Han, J., Kamber, M., 2006. Data Mining: Concept and Techniques. Morgan Kaufmann Publishers, San Francisco, USA.

Haralick, R.M., Shanmugam, K., Dinstein, I., 1973. Textural features for image classification. IEEE Transaction on Systems, Man and Cybernetics 3 (6), 610–621.

Hue, M.K., 1962. Visual pattern recognition by moment invariants. IRE Transaction on Information Theory 8 (2), 179–187.

Krishnan, M.M.R., Shah, P., Choudhury, A., Chakraborty, C., Paul, R.R., et al., 2011. Textural characterization of histopathological images for oral sub-mucous fibrosis detection. Tissue and Cell 43 (5), 318–330.

Krishnan, M.M.R., Shah, P., Chakraborty, C., Ray, A.K., 2012. Statistical analysis of textural features for improved classification of oral histopathological images. Journal of Medical Systems 36 (2), 865–881.

Kumar, S., Ong, S.H., Raganath, S., Ong, T.C., Chew, F.T., 2006. A rule-based approach for robust clump splitting. Pattern Recognition 39 (6), 1088–1098.

Lam, E.Y., 2005. Combining gray world and retinex theory for automatic white balance in digital photography. In: Proceedings of International Symposium on Consumer Electronics, pp. 134–139.

Makkapati, V.V., Rao, R.M., 2009. Segmentation on malaria parasites in peripheral blood smear images. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1361–1364.

Mandelbrot, B.B., 1982. Fractal Geometry of Nature. W. H. Freeman and Company, New York.

Martis, R.J., Acharya, R.U., Mandana, K.M., Ray, A.K., Chakraborty, C., 2012. Application of principal component analysis to ECG signals for automated diagnosis of cardiac health. Expert Systems with Applications 39 (14), 11792–11800.

National Vector Borne Disease Control Programme. Trend of malaria (2010-2011). Available at http://nvbdcp.gov.in/malaria9.html

Ojala, T., Pietikainen, M., Maenpaa, T., 2002. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (7), 971–981.

Pharwaha, A.P.S., Sing, B., 2009. Shannon and non-Shannon measures of entropy for statistical texture feature extraction in digitized mammograms. In: Proceedings of WCECS, vol. I/II, San Francisco, USA, pp. 1286–1291.

Rastogi, V., 2008. Fundamentals of Biostatistics. Ane Books, India.

Raviraja, S., Osman, S.S., Kardman, 2008. A novel technique for malaria diagnosis using invariant moments and by image compression. In: IFMBE Proceedings of 4th International Conference on Biomedical Engineering, vol. 21, pp. 730–733.

Raviraja, S., Bajpai, G., Sharma, S.K., 2006. Analysis of detecting the malarial parasite infected blood images using statistical based approach. In: IFMBE Proceedings of 3rd International Conference on Biomedical Engineering, vol. 15, pp. 534–537.

Ross, N.E., Pritchard, C.J., Rubin, D.M., Duse, A.G., 2006. Automatic image processing method for the diagnosis and classification of malaria on thin blood smears. Medical and Biological Engineering Computing 44 (5), 427–436.

Sarkar, N., Choudhury, B.B., 1994. An efficient differential box-counting approach to compute fractal dimension of image. IEEE Transaction on Systems, Man and Cybernetics 24 (1), 115–120.

Sio, S.W.S., Sun, W., Kumar, S., Bin, W.Z., Tan, S.S., et al., 2007. Malaria count: an image analysis-based program for the accurate determination of parasitemia. Journal of Microbiological Methods 68 (1), 11–18.

Tan, J.H., Ng, E.Y.K., Acharya, R.U., Chee, C., 2010. Study of normal ocular thermogram using textural parameters. Infrared Physics & Technology 53 (2), 120–126.

Tan, J.H., Ng, E.Y.K., Acharya, U.R., Chee, C., 2009. Infrared thermography on ocular surface temperature: a review. Infrared Physics & Technology 52 (4), 97–108.

Tang, X., 1998. Texture information in run-lengths matrices. IEEE Transaction on Image Processing 7 (11), 1602–1609.

Tangpukdee, N., Duangdee, C., Wilairatana, P., Krudsood, S., 2009. Malaria diagnosis: a brief review. Korean Journal of Parasitology 47 (2), 93–102.

Tek, F.B., Dempster, A.G., Kale, I., 2006. Malaria parasite detection in peripheral blood images. In: Proceeding of British Machine Vision Conference.

Thangavel, K., Manavalan, R., Laurence Aroquiaraj, I., 2009. Removal of speckle noise from ultrasound medical image based on special filters: comparative study. ICGST-GVIP Journal 9 (3), 25–32.

Toha, S.F., Ngah, U.K., 2007. Computer Aided Medical Diagnosis for the Identification of Malaria Parasites. In: Proceedings of IEEE ICSCN, MIT Campus, Anna University, Chennai, India, pp. 521–522.

WHO, 2010. Basic Malaria Microscopy: Part I. Learner's Guide, 2nd ed. World Health Organization, Geneva, Switzerland, pp. 51–67.