

Poster: Identifying Urban Villages from City-Wide Satellite Imagery Leveraging Mask R-CNN

Longbiao Chen, Tianqi Xie
Fujian Key Laboratory of Sensing and Computing for Smart City, School of Informatics, Xiamen University
Xiamen, China
[email protected]

Xueyi Wang
University of Hertfordshire
Hertfordshire, UK

Cheng Wang*
Fujian Key Laboratory of Sensing and Computing for Smart City, School of Informatics, Xiamen University
Xiamen, China

ABSTRACT
Urban villages emerge with the rapid urbanization process in many developing countries, and bring serious social and economic challenges to urban authorities, such as overcrowding and low living standards. A comprehensive understanding of the locations and regional boundaries of urban villages in a city is crucial for urban planning and management, especially when urban authorities need to renovate these regions. Traditional methods rely heavily on surveys and investigations by city planners, which consume substantial time and human labor. In this work, we propose a low-cost and automatic framework to accurately identify urban villages from high-resolution remote sensing satellite imagery. Specifically, we leverage the Mask Regional Convolutional Neural Network (Mask-RCNN) model for end-to-end urban village detection and segmentation. We evaluate our framework on the city-wide satellite imagery of Xiamen, China. Results show that our framework successfully detects 87.18% of the urban villages in the city, and accurately segments their regional boundaries with an IoU of 74.48%.

CCS CONCEPTS
• Human-centered computing → Ubiquitous and mobile computing systems and tools.

KEYWORDS
Urban Village; Image Segmentation; Mask-RCNN; Deep Learning; Urban Computing

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
UbiComp/ISWC ’19 Adjunct, September 9–13, 2019, London, United Kingdom
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6869-8/19/09. . . $15.00
https://doi.org/10.1145/3341162.3355269

ACM Reference Format:
Longbiao Chen, Tianqi Xie, Xueyi Wang, and Cheng Wang*. 2019. Poster: Identifying Urban Villages from City-Wide Satellite Imagery Leveraging Mask R-CNN. In Adjunct Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and the 2019 International Symposium on Wearable Computers (UbiComp/ISWC ’19 Adjunct), September 9–13, 2019, London, United Kingdom. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3341162.3355269

1 INTRODUCTION
An urban village is a residential area that lags behind the pace of urbanization, falls outside the management of modern cities, and has low living standards [4]. In China, these villages typically consist of old cottages remaining from earlier years, and they are commonly inhabited by low-income and transient communities. As a result, urban villages continuously suffer from overcrowding and social problems [2]. Accurately identifying the regional boundaries of urban villages is important for managing and planning their development, including real estate reform, road renovation, and sanitation management.

In the past, detecting the location and finding the regional boundary of an urban village relied mainly on field surveys by city planners and on their local knowledge, which is usually time-consuming and inaccurate for a comprehensive understanding of city-wide urban villages [4]. Recently, the ubiquity of high-resolution satellite images and the rapid development of deep learning techniques have provided new opportunities to identify urban village regions in a low-cost and automatic manner. Specifically, Mask Regional Convolutional Neural Networks (Mask-RCNN) perform very well in object detection and instance segmentation from images [3]. In this work, we propose an end-to-end framework to detect urban villages and segment their boundaries from city-wide satellite images using the Mask-RCNN architecture. The main contributions of this work include:

UbiComp/ISWC ’19 Adjunct, September 9–13, 2019, London, United Kingdom Chen et al.

Figure 1: Framework overview.

• To the best of our knowledge, this is the first work on urban village detection and segmentation from satellite imagery, which provides a low-cost alternative for urban planning and management.

• We propose an end-to-end framework to detect and segment urban villages from city-wide satellite images. We first clip a large city-wide satellite image into small patches, and then collect the urban village mask labels using a crowdsourcing platform. We train a Mask-RCNN model on a randomly selected set of patches and segment the urban village masks on the remaining patches. Finally, we merge all the patches to obtain the regional boundaries of all the urban villages in the city-wide satellite image.

• We conduct a real-world evaluation in Xiamen, China. Results show that our framework successfully detects the urban villages in the city with a precision of 90.67% and a recall of 87.18%, and accurately segments their regional boundaries with an IoU of 74.48%.

2 FRAMEWORK OVERVIEW
The overview of the proposed framework is shown in Figure 1. First, we clip a large city-wide satellite image into small patches, and mask the urban villages in the patches using a crowdsourcing platform. Then, we train a Mask-RCNN model on a set of randomly selected patches, and predict the urban village masks for the other patches. Finally, we merge all the patches to obtain the regional boundaries of all the urban villages in the city-wide satellite image.

3 CITY-WIDE SATELLITE IMAGE CLIPPING
High-resolution satellite images can be obtained from various geographic information services, such as Google Earth¹. However, city-wide satellite imagery is usually very large. For example, the satellite image of Xiamen Island with a resolution of 0.5 meter can be as large as 1.80 GB. Directly processing such a large image for urban village identification is computationally impractical. Therefore, we first clip a city-wide satellite image into small patches for training.

¹ https://www.google.com/earth/

Specifically, we employ the Python Imaging Library (PIL) for satellite imagery clipping, which has proven to be efficient and can preserve the geographical coordinates embedded in the satellite imagery. We set each patch to be a 500 m × 500 m square, based on previous studies on the geographic spans of typical urban villages [2].
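The clipping step can be sketched as a computation of crop boxes over the full image. This is a minimal illustration rather than the authors' code: the helper names `patch_size_px` and `tile_boxes` are hypothetical, the 2000 × 1500 pixel image size is made up for the example, and the 0.54 m/pixel resolution is taken from Table 1. Each box uses the (left, upper, right, lower) convention expected by PIL's `Image.crop`.

```python
def patch_size_px(patch_m: float, resolution_m: float) -> int:
    """Side length of a square patch in pixels, rounded to the nearest pixel."""
    return round(patch_m / resolution_m)


def tile_boxes(width: int, height: int, tile: int):
    """Yield (left, upper, right, lower) crop boxes that cover the image.

    Boxes on the right and bottom edges are truncated to the image bounds.
    """
    for upper in range(0, height, tile):
        for left in range(0, width, tile):
            yield (left, upper, min(left + tile, width), min(upper + tile, height))


# 500 m patches at 0.54 m/pixel (assumed from Table 1).
tile = patch_size_px(500, 0.54)

# Hypothetical small image; a real city-wide image would be far larger.
boxes = list(tile_boxes(width=2000, height=1500, tile=tile))
```

With a PIL image object, each patch would then be obtained by passing one of these boxes to `Image.crop`.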

4 URBAN VILLAGE MASK LABELING
Providing a sufficient number of samples is the basis for ensuring the performance of a machine learning model [1]. However, labeling urban village masks in each patch is time-consuming and requires domain knowledge. Therefore, we use crowdsourcing to outsource the mask labeling tasks to a large pool of qualified crowd workers.

First, we recruit a group of participants with incentives from Xiamen University and train them with background knowledge about urban villages. We then develop a web-based crowdsourcing platform to randomly assign patches to participants. Specifically, each patch is assigned to three participants for cross validation. We integrate the open-source image mask labeling tool labelme² into the platform to facilitate the masking process. Finally, we obtain the urban village masks for all the patches via the crowdsourcing platform. Figure 2 shows an example of the collected urban village mask labels near a train station.

² https://github.com/wkentaro/labelme

(a) The clipped patch. (b) The labeled masks.

Figure 2: An illustrative example of the collected urban village masks labeled by the crowdsourcing participants.

5 URBAN VILLAGE DETECTION AND SEGMENTATION
In this step, we train a Mask-RCNN model to detect and segment urban villages from each image patch. Mask-RCNN provides an end-to-end solution that efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance [3].

Specifically, we randomly select a small set of patches with urban village masks as the training set, and use them to train a Mask-RCNN model. The training set size is chosen based on repeated experiments. We then use the trained model to detect urban villages and predict their corresponding masks simultaneously. Afterwards, we merge all the patches and masks into one large image to obtain a city-wide view of the urban village distribution. We also apply several auxiliary image processing steps to eliminate unnecessary boundaries between adjacent patches.
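The merging step can be sketched as pasting each predicted patch mask back at its pixel offset in a city-wide canvas. This is a simplified illustration under stated assumptions: masks are plain lists of 0/1 rows, the function name `merge_patches` is hypothetical, and the auxiliary boundary-cleaning steps mentioned above are not reproduced.

```python
def merge_patches(patches, width, height):
    """Paste binary patch masks into one city-wide mask.

    `patches` maps the (left, upper) pixel offset of each patch to its mask,
    given as a list of rows of 0/1 values.
    """
    canvas = [[0] * width for _ in range(height)]
    for (left, upper), mask in patches.items():
        for dy, row in enumerate(mask):
            for dx, value in enumerate(row):
                if value:
                    canvas[upper + dy][left + dx] = 1
    return canvas


# Two hypothetical 2 x 2 patches whose masks touch at the shared edge.
patches = {
    (0, 0): [[0, 1], [0, 1]],
    (2, 0): [[1, 0], [1, 0]],
}
city_mask = merge_patches(patches, width=4, height=2)
```

In this toy case the two half-regions on either side of the patch border form one contiguous region in the merged mask.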

6 EVALUATIONS
Experiment Settings
We evaluate our framework using high-resolution satellite imagery of Xiamen Island from Google Earth. Table 1 shows the details of the collected imagery. We clip the whole imagery into 500 m × 500 m patches and obtain 650 patches. We randomly select 32 patches (about 5% of all patches) with urban villages to train the Mask-RCNN model, and predict the masks on the other patches. We deploy our framework on a server with an NVIDIA GeForce GTX 1080 Ti graphics card and 16 GB RAM.
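The random selection of training patches can be sketched as follows. The patch identifiers, the candidate list of patches containing urban villages, the fixed seed, and the function name `split_patches` are all assumptions made for the example; only the counts (650 patches, 32 for training) come from the text.

```python
import random


def split_patches(all_ids, candidate_ids, n_train, seed=0):
    """Sample n_train training patches from the candidates; predict on the rest."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train = sorted(rng.sample(sorted(candidate_ids), n_train))
    chosen = set(train)
    predict = [p for p in all_ids if p not in chosen]
    return train, predict


all_ids = list(range(650))    # 650 patches, as in the experiment
candidate_ids = all_ids[::5]  # hypothetical: patches known to contain urban villages
train_ids, predict_ids = split_patches(all_ids, candidate_ids, n_train=32)
```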

Table 1: Imagery Description

Item                        Specification
Northwest Coordinates       [24.561492, 118.064736]
Southeast Coordinates       [24.423240, 118.198513]
Geographic Span             13.54 km × 15.39 km
Satellite Image Resolution  0.54 meter
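As a sanity check on Table 1, the geographic span can be approximated from the two corner coordinates. The sketch below assumes the coordinates are given as [latitude, longitude], treats the Earth as a sphere of radius 6371 km, and uses an equirectangular approximation, which is adequate at this scale; the function name `span_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def span_km(northwest, southeast):
    """Return (east_west_km, north_south_km) for a lat/lon bounding box."""
    (lat_n, lon_w), (lat_s, lon_e) = northwest, southeast
    mean_lat = math.radians((lat_n + lat_s) / 2)
    north_south = EARTH_RADIUS_KM * math.radians(lat_n - lat_s)
    east_west = EARTH_RADIUS_KM * math.radians(lon_e - lon_w) * math.cos(mean_lat)
    return east_west, north_south


east_west, north_south = span_km((24.561492, 118.064736), (24.423240, 118.198513))
```

Under these assumptions the result is close to the listed span, with the smaller value being the east-west extent and the larger one the north-south extent.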

(a) Baijiacun Village (b) Zengcuoan Village

Figure 3: Two examples of identified urban villages.

Evaluation Metrics
Detection accuracy: if an urban village in the ground truth spatially overlaps a detected instance, we mark the detection as a hit, and otherwise as a miss. Based on this, the precision and recall are calculated as follows:

$$\text{precision} = \frac{|\{\text{ground-truth instance}\} \cap \{\text{detected instance}\}|}{|\{\text{detected instance}\}|} \tag{1}$$

$$\text{recall} = \frac{|\{\text{ground-truth instance}\} \cap \{\text{detected instance}\}|}{|\{\text{ground-truth instance}\}|} \tag{2}$$
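The overlap-based hit rule and Equations (1) and (2) can be illustrated on toy data. In this sketch each instance is represented as a set of (row, col) pixels, which is an assumption made for the example; counting matched detections for precision and matched ground-truth instances for recall is one reasonable reading of the intersection in the formulas.

```python
def overlaps(a, b):
    """True if two instances (sets of (row, col) pixels) share any pixel."""
    return not a.isdisjoint(b)


def precision_recall(truth, detected):
    """Instance-level precision and recall under the overlap-based hit rule."""
    matched_detections = sum(any(overlaps(d, t) for t in truth) for d in detected)
    matched_truth = sum(any(overlaps(t, d) for d in detected) for t in truth)
    return matched_detections / len(detected), matched_truth / len(truth)


# Toy data: 3 ground-truth instances, 4 detections, 2 of which overlap the truth.
truth = [{(0, 0), (0, 1)}, {(5, 5)}, {(9, 9)}]
detected = [{(0, 1), (0, 2)}, {(5, 5), (5, 6)}, {(7, 7)}, {(8, 8)}]
precision, recall = precision_recall(truth, detected)
```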

Segmentation accuracy: we adopt the popular Intersection over Union (IoU) metric to evaluate the segmentation accuracy over the city-wide imagery, i.e.,

$$\text{IoU} = \frac{|\{\text{ground-truth pixel}\} \cap \{\text{detected pixel}\}|}{|\{\text{ground-truth pixel}\} \cup \{\text{detected pixel}\}|} \tag{3}$$
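A pixel-level IoU can be sketched with Python sets, using the standard definition of the metric: the number of pixels in the intersection divided by the number in the union. The pixel sets below are toy data and the function name `iou` is hypothetical.

```python
def iou(truth_pixels, detected_pixels):
    """Intersection over Union of two sets of (row, col) pixels."""
    union = truth_pixels | detected_pixels
    if not union:
        return 0.0  # both masks empty; treated as zero overlap here
    return len(truth_pixels & detected_pixels) / len(union)


# Two 4 x 4 squares offset by two rows: 8 shared pixels, 24 pixels in the union.
truth_pixels = {(r, c) for r in range(4) for c in range(4)}
detected_pixels = {(r, c) for r in range(2, 6) for c in range(4)}
score = iou(truth_pixels, detected_pixels)
```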

Evaluation Results
Figure 4 shows the result of city-wide urban village detection and segmentation in Xiamen Island. Our framework successfully identifies urban villages of various sizes and locations. For example, Figure 3 shows two of the identified urban villages. Specifically, our framework detects 75 urban villages, of which 68 match the 78 ground-truth instances, achieving a detection precision of 90.67% (68/75) and a recall of 87.18% (68/78). For segmentation accuracy, our framework achieves an IoU of 74.48% over the city-wide imagery.

7 CONCLUSION
In this work, we detect urban villages and segment their geographic boundaries from city-wide high-resolution satellite imagery. We propose a framework that uses the state-of-the-art Mask-RCNN model for instance detection and segmentation in an end-to-end manner. The proposed framework is evaluated in Xiamen Island and achieves accurate detection and segmentation results. In the future, we plan to explore how village boundaries change over time.

Figure 4: Result of city-wide urban village detection and segmentation in Xiamen Island.

ACKNOWLEDGMENTS
We would like to thank the reviewers for their constructive suggestions. This research is supported by NSF of China No. 61802325, NSF of Fujian Province No. 2018J01105, and the China Fundamental Research Funds for the Central Universities No. 20720170040.

REFERENCES
[1] C. M. Bishop and others. 2006. Pattern Recognition and Machine Learning. Vol. 4. Springer New York.
[2] Tim Brindley. 2003. The Social Dimension of the Urban Village: A Comparison of Models for Sustainable Urban Development. URBAN DESIGN 8, 1 (June 2003), 53–65.
[3] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask R-CNN. arXiv:1703.06870 [cs] (March 2017). arXiv:cs/1703.06870
[4] Alberto Magnaghi. 2005. The Urban Village: A Charter for Democracy and Sustainable Development in the City. Zed Books.