malaria in india - tufts universitymalaria in india finding correla onsusing remote sensed data this...

1
Malaria in India Finding correlaƟons using remote sensed data This project aims to look at the relaƟonship between land cover type, temperature. Malaria is is a mosquitoborne disease caused by a par asite. People with malaria oŌen experience fever, chills, and ulike illness. LeŌ untreated, they may develop severe complicaƟons and die. In 2010 an esƟmated 219 million cases of malaria occurred worldwide and 660,000 people died (Malaria). Malaria is one of the most widespread diseases in developing naƟons. In order to combat this widespread disease, environmental factors like surface temperature and land cover type must be considered. Our group explored two regions of India with high prevalence of Malaria casesMangalore and Visakhapatnam. In this paper we will look at how classicaƟon and temperature are related to Malaria cases in these two regions. IntroducƟon Analysis model This project seeks to measure the eect of landcover and temperature on malaria infecƟon rates. A cross secƟonal model will be used to add robustness to the analysis. Two sites were chosen based on the availability of data and high malaria infecƟon rates: Mangalore and Visakhapatnam India. Because these two areas are also on opposite coasts, this should add external validity to the model by making it more representaƟve of India as a whole. Therefore, we propose the following model: Y it =β 0 +β 1 TempUrb it +β 2 TempVeg it +β 3 Urb% it +β 4 Veg% it +β 5 Water it +β 6 UrbDensity*DistVeg it +θ+ε TempUrb it =Temperature in urban area in locaƟon i at Ɵme t TempVeg it =Temperature in vegetated area in locaƟon i at Ɵme t Urb% it =Percentage of urban area in locaƟon i at Ɵme t Veg% it =Percentage of vegetated area in locaƟon i at Ɵme t Water% it =Percentage of vegetated area in locaƟon i at Ɵme t UrbDensity*DistVeg it =InteracƟon of urban density variable and distance from vegetated area in locaƟon i at Ɵme t θ= Fixed E ects Data All data is Landsat, either 5 or 7. Obtained from USGS EarthExplorer and Glovis Visakhapatnam Group obtained all data Used 4Images from four dates ranging from December to February over 20012004 Chose not to use gapll for classicaƟons as we are looking at percentages Mangalore Group used data Used seven images from seven dates ranging from December to January over 20032009 for temperature data. Mosaiced into ve images for analysis ClassicaƟon •Goal of classicaƟon was to determine changes in urban, vegetated, and water land cover over Ɵme •Used Maximum Likelihood supervised classicaƟon Four types of dierent supervised classicaƟon methods are used to examine which one is beƩer classify the target image: Figure.2 Comparison of dierent classicaƟon method. From leŌ to right: false color image before classicaƟon, parallelepiped classicaƟon, minimum distance classicaƟon, mahalanobis distance classicaƟon, maximum likelihood classicaƟon. Figure.2 shows the most disƟnguished dierences of dierent classicaƟon methods. The parallelepiped method sƟll misclassied a lot of urban area and soil and vegetaƟon area. The minimum distance classicaƟon method classies part of the soil area as the shadow. And the mahalanobis distance classicaƟon method includes a lot of soil area (shown in green) into the urban area. The maximum likelihood method did the best job Created masks for not relevant areas, including clouds and shadows in some images Aside from obtaining staƟsƟcs for landcover, also created masks from classicaƟon for temperature data. Figure.1 Mangalore and Visakhapatnam chosen as study areas. Both have high malaria rates, available da- ta.Cross Sectional Models more robust Post classicaƟon Temperature In essence, all the processing for temperature is similar. First, the landsat image’s bands are converted to radiance. Next, the thermal band is atmospherically corrected using data from NASA. Finally, band math is used to convert the atmospherically corrected image to Kelvin, then to Celsius. For two of the ve years used from Mangalore however, the images rst needed to be mosaiced together to form a complete area for temperature analysis. Only images that had accompanying images within two weeks were considered for mosaicing in order to keep the data consistent Also in the interest of consistency, the group decided to use the same landcover mask to obtain all the temperature data. This is due to the lack of Ɵme for true accuracy assessments of the classicaƟons. OpƟ‐ mally, at least two classicaƟons would have been done of each image to test for the accuracy, but since this was not possible, there are most likely measurement errors between years. Because theoreƟcally, the size of the urban and vegetated areas should not change drasƟcally from year to year, we believe that the gain from increased consistency is large compared to the loss in accuracy Results The dependent variable is in thousands of people, so the temperature variable can be interpreted as: An increase in one degree Celsius in vegetated areas translates to a 17.18 increase in Malaria cases per 1000 people. Likewise, the landcover variable can be interpreted as: A one percent increase in the percent of the area with vegetaƟon translates to an increase in the number of malaria cases increasing by one per 1000 people. Most of the variables are staƟsƟcally signicant at the 5% level, but this should be taken with cauƟon given the number of observaƟons. The direcƟon of the urban temperature variable is interesƟng, but makes sense that since urban temperatures are already hoƩer than vegetated areas, they may be too hot for the opƟmal level for mosquitos, so cooler temperatures increase malaria cases. (see Figure.6) Figure.6 Results Source of error Reference David Murphy: The Fletcher School Yuqi Tang: School of Engineering Mojdeh Karkhanehchi: School of Engineering Valerie Cleland: School of Arts and Sciences Project date: April 2013 Goal of post classicaƟon is to determine the accuracy of the classicaƟons. •Majority and Minority Analysis; Error Matrix; MisclassicaƟon Rate The Majority lter is used to “smooth” the image and the percentage for the category which is in the majority of the image would increase by this method. On the other hand, the urban area which is in minority of the image, would show a higher percentage in the Minority Analysis. Ɵ: MisclassicaƟon Rate ϴ=y/n= (596(92+192+199+97))/596=0.026846 ; y=16 ; n=596 95% condence interval : Conf ϴ=[0.01658, 0.04316] The misclassicaƟon rate is low because we did not have any ground •Error Matrix: The error matrix is used to determine the overall accuracy, producer’s accuracy ,user’s accuracy, commission and omissions. The overall accuracy was 97.8%, which is high. The reason is that we extracted 20% of the ROIs which were selected as a specic category and Specic category and assigned them as test and train ROIs. Nine observaƟons isn’t enough to conclude anything staƟsƟcally signicant from the data. For the classicaƟons, each group member did one classicaƟon for each region and due to human error and dierence in selecƟons of ROIs. We also used the same mask for each year. All of the images we used were collected across three or four months, error could occur due the Ɵme span of analysis. For the Visakhapatnam images, we had to mosaic mulƟple images together to get the complete region of interest. Finally, we were not able to ground truth our data. Overall, we did well with the nine observaƟons that we had, and to do this again we would need more Ɵme to ensure that these errors were accounted for. Future Study Remote sensing can be a very valuable tool when looking at Malaria. First, we would have a larger number of observaƟons from the two sites as well as more regions to compare. We could compare images directly before and aŌer the rainy season. Looking at this would further our ability to look at the larval habitats for mosquitoes and compare that with the number of cases in the area. A project could be done that focused more on what types of habitats are best for mosquito larvae. This could then be combined with GIS to look at the water type and distance to the closest human seƩlement to see if there is a correlaƟons between distance to certain types of habitat and malaria cases. Clennon, Julie, and Aniset Kamanga. "Identifying Malaria Vector Breeding Habitats with Remote Sensing Data and Terrain-based Landscape Indices in Zambia."International Journal of Health Geographics (2010): Web. Dhiman, R. C. (2000). REMOTE SENSING : A VISIONARY TOOL IN. ICMR Bulletin, 30(11). "Malaria." Centers for Disease Control and Prevention. Centers for Disease Control and Prevention, 09 Aug. 2012. Web. 20 Apr. 2013. Midekisa. "Remote Sensing-based Time Series Models for Malaria Early Warn- ing in the Highlands of Ethiopia." National Center for Biotechnology Infor- mation. U.S. National Library of Medicine, n.d. Web. 23 Apr. 2013. Figure.3. Post classicaƟon Figure.5 Mosaicing the image Figure.4 Matrix confusion

Upload: others

Post on 25-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Malaria in India - Tufts UniversityMalaria in India Finding correla onsusing remote sensed data This project aims to look at the rela onship between land cover type, temperature. Malaria

 

Malaria in India  Finding correla ons using remote sensed data 

This project aims to look at the rela onship between land cover type, 

temperature. Malaria is is a mosquito‐borne disease caused by a par‐

asite. People with malaria o en experience fever, chills, and flu‐like 

illness. Le  untreated, they may develop severe complica ons and 

die. In 2010 an es mated 219 million cases of malaria occurred 

worldwide and 660,000 people died (Malaria). Malaria is one of the 

most widespread diseases in developing na ons. In order to combat 

this widespread disease, environmental factors like surface tempera‐

ture and land cover type must be considered. Our group explored 

two regions of India with high prevalence of Malaria cases‐ Manga‐

lore and Visakhapatnam. In this paper we will look at how classifica‐

on and temperature are related to Malaria cases in these two re‐

gions. 

Introduc on 

Analysis model 

This project seeks to measure the effect of landcover and tempera‐ture on malaria infec on rates. A cross sec onal model will be used to add robustness to the analysis. Two sites were chosen based on the availability of data and high malaria infec on rates: Mangalore and Visakhapatnam India. Because these two areas are also on oppo‐site coasts, this should add external validity to the model by making it more representa ve of India as a whole. Therefore, we propose the following model: 

 

Yit=β0+β1TempUrbit+β2TempVegit+β3Urb%it+β4Veg%it 

+β5Waterit+β6UrbDensity*DistVegit+θ+ε 

 

TempUrbit=Temperature in urban area in loca on i at me t 

TempVegit=Temperature in vegetated area in loca on i at me t 

Urb%it=Percentage of urban area in loca on i at me t 

Veg%it=Percentage of vegetated area in loca on i at me t 

Water%it=Percentage of vegetated area in loca on i at me t 

UrbDensity*DistVegit=Interac on of urban density variable and dis‐

tance from vegetated area in loca on i at me t 

θ= Fixed Effects 

 

Data 

All data is Landsat, either 5 or 7. Obtained from USGS EarthExplorer and Glovis 

Visakhapatnam 

●Group obtained all data 

●Used 4Images from four dates ranging 

from December to February over 2001‐

2004 

●Chose not to use gapfill for classifica ons 

as we are looking at percentages 

Mangalore

●Group used data  

●Used seven images from seven dates 

ranging from December to  January over 

2003‐2009 for temperature data. Mosaiced 

into five images for analysis 

 

Classifica on 

•Goal of classifica on was to determine changes in urban, vegetated, and water land cover 

over  me 

•Used Maximum Likelihood supervised classifica on 

Four types of different supervised classifica on methods are used to examine which one is 

be er classify the target image: 

Figure.2 Comparison of different classifica on method. From le to right: false color image before classifica on, parallelepiped

classifica on, minimum distance classifica on, mahalanobis distance classifica on, maximum likelihood classifica on.

Figure.2 shows the most dis nguished differences of different classifica on methods. The 

parallelepiped method s ll misclassified a lot of urban area and soil and vegeta on area. The 

minimum distance classifica on method classifies part of the soil area as the shadow. And the 

mahalanobis distance classifica on method includes a lot of soil area (shown in green) into the 

urban area. The maximum likelihood method did the best job  

•Created masks for not relevant areas, including clouds and shadows in some images

•Aside from obtaining sta s cs for landcover, also created masks from classifica on for tem‐

perature data. 

Figure.1  Mangalore and

Visakhapatnam chosen as study areas. Both have high malaria rates, available da-ta.Cross Sectional Models more robust 

Post classifica on 

Temperature 

In essence, all the processing for temperature is similar. First, the 

landsat image’s bands are converted to radiance. Next, the thermal 

band is atmospherically corrected using data from NASA. Finally, 

band math is used to convert the atmospherically corrected image to 

Kelvin, then to Celsius. For two of the five years used from Mangalore 

however, the images first needed to be mosaiced together to form a 

complete area for temperature analysis. Only images that had ac‐

companying images within two weeks were considered for mosaicing 

in order to keep the data consistent 

Also in the interest of consistency, the group decided to use the same 

landcover mask to obtain all the temperature data. This is due to the 

lack of  me for true accuracy assessments of the classifica ons. Op ‐

mally, at least two classifica ons would have been done of each im‐

age to test for the accuracy, but since this was not possible, there are 

most likely measurement errors between years. Because theore cal‐

ly, the size of the urban and vegetated areas should not change dras‐

cally from year to year, we believe that the gain from increased con‐

sistency is large compared to the loss in accuracy  

Results 

The dependent variable is in thousands of people, so the tempera‐

ture variable can be interpreted as: An increase in one degree Celsius 

in vegetated areas translates to a 17.18 increase in Malaria cases per 

1000 people. Likewise, the landcover variable can be interpreted as: 

A one percent increase in the percent of the area with vegeta on 

translates to an increase in the number of malaria cases increasing 

by one per 1000 people. Most of the variables are sta s cally signifi‐

cant at the 5% level, but this should be taken with cau on given the 

number of observa ons. The direc on of the urban temperature var‐

iable is interes ng, but makes sense that since urban temperatures 

are already ho er than vegetated areas, they may be too hot for the 

op mal level for mosquitos, so cooler temperatures increase malaria 

cases. (see Figure.6) 

Figure.6 Results

Source of error 

 

Reference 

David Murphy: The Fletcher School

Yuqi Tang: School of Engineering

Mojdeh Karkhanehchi: School of Engineering

Valerie Cleland: School of Arts and Sciences

Project date: April 2013

    •Goal of post classifica on is to  determine the accura‐

cy of the classifica ons. 

    •Majority and Minority Analysis; Error Matrix; Misclas‐

sifica on Rate 

    •The Majority filter is used to  “smooth” the image and 

the percentage for the category which is in the majority  

of the image would increase by this method. On the oth‐

er hand, the urban area which is in minority of the image, would show a 

higher percentage in the Minority Analysis.  

    • Ɵ: Misclassifica on Rate 

ϴ=y/n= (596‐(92+192+199+97))/596=0.026846 ; y=16 ; n=596 

95% confidence interval :        Conf ϴ=[0.01658, 0.04316] 

The misclassifica on rate is low because we did not have any ground  

•Error Matrix: The error matrix is used to 

determine the overall accuracy, producer’s 

accuracy ,user’s accuracy, commission and 

omissions. The overall accuracy was 97.8%, 

which is high. The reason is that we extract‐

ed 20% of the ROIs which were selected as 

a specific category and Specific category 

and assigned them as test and train ROIs. 

Nine observa ons isn’t enough to conclude an‐

ything sta s cally significant from the data. For 

the classifica ons, each group member did one 

classifica on for each region and due to human 

error and difference in selec ons of ROIs. We 

also used the same mask for each year. All of 

the images we used were collected across three 

or four months, error could occur due the  me 

span of analysis. For the Visakhapatnam imag‐

es, we had to mosaic mul ple images together 

to get the complete region of interest. Finally, 

we were not able to ground truth our data. 

Overall, we did well with the nine observa ons 

that we had, and to do this again we would 

need more  me to ensure that these errors 

were accounted for. 

Future Study 

Remote sensing can be a very valuable tool 

when looking at Malaria. First, we would have a 

larger number of observa ons from the two 

sites as well as more regions to compare. We 

could compare images directly before and a er 

the rainy season. Looking at this would further 

our ability to look at the larval habitats for mos‐

quitoes and compare that with the number of 

cases in the area. A project could be done that 

focused more on what types of habitats are 

best for mosquito larvae. This could then be 

combined with GIS to look at the water type 

and distance to the closest human se lement 

to see if there is a correla ons between dis‐

tance to certain types of habitat and malaria 

cases.  

Clennon, Julie, and Aniset Kamanga. "Identifying Malaria Vector Breeding Habitats with Remote Sensing Data and Terrain-based Landscape Indices in Zambia."International Journal of Health Geographics (2010): Web.  

Dhiman, R. C. (2000). REMOTE SENSING : A VISIONARY TOOL IN. ICMR Bulletin, 30(11).  

"Malaria." Centers for Disease Control and Prevention. Centers for Disease Control and Prevention, 09 Aug. 2012. Web. 20 Apr. 2013.  

Midekisa. "Remote Sensing-based Time Series Models for Malaria Early Warn-ing in the Highlands of Ethiopia." National Center for Biotechnology Infor-mation. U.S. National Library of Medicine, n.d. Web. 23 Apr. 2013. 

Figure.3. Post classifica on

Figure.5 Mosaicing the image

Figure.4 Matrix confusion