AGILE 2016 Conference, Helsinki - 15th June 2016
Identification of disaster-affected areas using exploratory visual analysis of georeferenced Tweets: application to a flood event
V. Cerutti1, G. Fuchs2, G. Andrienko2, N. Andrienko2, F. Ostermann1
1 ITC Faculty of Geo-Information Science and Earth Observation, University of Twente, The Netherlands
2 Fraunhofer IAIS, Sankt Augustin, Germany
IN THIS PRESENTATION:
1. INTRODUCTION
2. METHODS
3. CASE STUDY
4. CONCLUSIONS AND FUTURE WORK
5. ACKNOWLEDGEMENT
1. INTRODUCTION
RESEARCH CONTEXT AND MOTIVATION
Social media for Disaster Management - Disaster response phase
Enable decision makers to rapidly assess the situation
Geographic and temporal extent of disaster effects
RESEARCH OBJECTIVES
Conceptual management of crisis information
Geospatial footprint of disaster
Situational awareness and decision making
Combination of data mining and visual analysis
Detection of areas affected by a disaster
Twitter as data source
CASE STUDY OBJECTIVE
Help decision makers to assess the spatio-temporal footprint of a disaster
2. METHODS
DATA PREPROCESSING
Tweets pre-processing:
• removal of geographically unrelated data
• pattern-based removal of machine-generated data
• natural language processing (NLP) to mine the text and to classify and annotate its content
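The first two pre-processing steps can be sketched as simple filters. This is a minimal sketch under stated assumptions: the `Tweet` record, the bounding-box test and the regular expressions in `MACHINE_PATTERNS` are hypothetical examples (check-in and weather-station style posts), not the patterns used in the study, and the NLP classification step is not covered.

```python
import re
from typing import NamedTuple


class Tweet(NamedTuple):
    text: str
    lon: float
    lat: float


# Hypothetical patterns for machine-generated posts (check-ins, automatic
# weather-station reports); a real list would be derived from the data.
MACHINE_PATTERNS = [
    re.compile(r"^I'm at ", re.IGNORECASE),
    re.compile(r"\btemp(erature)?\s*[:=]", re.IGNORECASE),
    re.compile(r"\bhumidity\s*[:=]", re.IGNORECASE),
]


def in_bbox(tweet: Tweet, bbox: tuple[float, float, float, float]) -> bool:
    """True if the geotag lies inside (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= tweet.lon <= max_lon and min_lat <= tweet.lat <= max_lat


def is_machine_generated(text: str) -> bool:
    """True if any of the (hypothetical) machine-generated patterns matches."""
    return any(p.search(text) for p in MACHINE_PATTERNS)


def preprocess(tweets: list[Tweet], bbox: tuple[float, float, float, float]) -> list[Tweet]:
    """Drop tweets outside the study area and pattern-matched machine posts."""
    return [t for t in tweets if in_bbox(t, bbox) and not is_machine_generated(t.text)]
```

Keeping each rule as a separate predicate makes it easy to inspect, with the visual tools, which tweets each rule removes.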
CLUSTERING + VISUAL ANALYSIS
Clustering techniques to understand how affected places are represented:
• density-based clustering
• distance-bounded spatio-temporal event clustering
• data-driven territory tessellation
Visual analysis techniques:
• space-time cube
• frequency histograms
• time graphs
• qualitative colouring
• animated maps
Tight integration between computational and visualization functionality
V-Analytics toolkit (http://geoanalytics.net/)
The clustering methods need parameterization
3. CASE STUDY
DATA DESCRIPTION AND PREPARATION
Case study: Sardinia flood, 18-19 November 2013
Original dataset: georeferenced Tweets (Dec 2012 - Apr 2014) within a bounding box covering Italy, collected from the Twitter Streaming API
Query: lexicon of flood-related keywords in Italian
Demographic data to normalize the results
Ground truth information from official reports
Final dataset: 3,000 Tweets (Nov 2013); 897 Tweets (18-20 Nov 2013)
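The slides only state that demographic data were used to normalize the results; a common choice, assumed here, is to express Tweet counts per 10,000 inhabitants of each area. The area identifiers and the `per` parameter are hypothetical.

```python
def tweets_per_capita(counts: dict[str, int], population: dict[str, int],
                      per: int = 10_000) -> dict[str, float]:
    """Tweets per `per` inhabitants for every area with a known, non-zero population."""
    return {
        area: n * per / population[area]
        for area, n in counts.items()
        if population.get(area, 0) > 0  # skip areas without demographic data
    }
```

With this normalization, a small town with few Tweets can rank above a large city with many, which is the intended effect when looking for affected places rather than for densely populated ones.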
DATA ANALYSIS
[Figure annotation: 18-19 Nov 2013]
Random sample (5%) of georeferenced Tweets generated during November 2013: the flood event cannot be detected
After keyword-based filtering:
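Keyword-based filtering followed by a daily count is what turns the unfiltered stream into a time graph in which an event can stand out. The sketch below assumes a tiny hypothetical lexicon (Italian for "flood", "flooding" and "overflowing"); the lexicon actually used in the study is not listed on the slides.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical lexicon entries; not the study's actual flood lexicon.
FLOOD_LEXICON = {"alluvione", "allagamento", "esondazione"}


def matches_lexicon(text: str, lexicon: set[str] = FLOOD_LEXICON) -> bool:
    """True if any lower-cased word token of the text is in the lexicon."""
    return any(tok in lexicon for tok in re.findall(r"\w+", text.lower()))


def daily_counts(tweets: list[tuple[date, str]],
                 lexicon: set[str] = FLOOD_LEXICON) -> Counter:
    """Number of lexicon-matching tweets per calendar day."""
    return Counter(day for day, text in tweets if matches_lexicon(text, lexicon))
```

Exact token matching misses inflected forms (e.g. plurals); a real lexicon would need stems or variants.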
DATA ANALYSIS
Spatio-temporal density-based clustering (OPTICS) + visual analysis to select optimal parameters
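OPTICS orders the points by reachability distance, so clusters at different density thresholds can be extracted afterwards, which is what makes visual parameter selection practical. The sketch below is a plain-Python OPTICS with a DBSCAN-like extraction at `eps_prime`; the spatio-temporal metric (time gap converted to km with a hypothetical `km_per_hour` factor) and the coordinate layout are assumptions, not the V-Analytics implementation. For real data, scikit-learn's `sklearn.cluster.OPTICS` is a production alternative.

```python
import heapq
import math

Point = tuple[float, float, float]  # (x_km, y_km, t_hours), assumed projected coordinates


def st_distance(a: Point, b: Point, km_per_hour: float = 1.0) -> float:
    """Euclidean distance after converting the time gap into km (assumed scaling)."""
    dx, dy, dt = a[0] - b[0], a[1] - b[1], (a[2] - b[2]) * km_per_hour
    return math.sqrt(dx * dx + dy * dy + dt * dt)


def optics(points: list[Point], eps: float, min_pts: int):
    """Return (ordering, reachability, core distance) following the OPTICS scheme."""
    n = len(points)
    reach = [math.inf] * n
    core = [None] * n
    done = [False] * n
    order = []

    def neighbours(i):
        # Brute-force range query; the result includes i itself (distance 0).
        return [j for j in range(n) if st_distance(points[i], points[j]) <= eps]

    def expand(i, seeds):
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            return  # not a core point: it does not propagate reachability
        core[i] = sorted(st_distance(points[i], points[j]) for j in nbrs)[min_pts - 1]
        for j in nbrs:
            if not done[j]:
                r = max(core[i], st_distance(points[i], points[j]))
                if r < reach[j]:
                    reach[j] = r
                    heapq.heappush(seeds, (r, j))

    for p in range(n):
        if done[p]:
            continue
        done[p] = True
        order.append(p)
        seeds = []
        expand(p, seeds)
        while seeds:
            _, q = heapq.heappop(seeds)
            if done[q]:
                continue  # stale heap entry superseded by a smaller reachability
            done[q] = True
            order.append(q)
            expand(q, seeds)
    return order, reach, core


def extract_clusters(order, reach, core, eps_prime):
    """DBSCAN-like cluster extraction at eps_prime <= eps; -1 marks noise."""
    labels = [-1] * len(order)
    cid = -1
    for p in order:
        if reach[p] > eps_prime:
            if core[p] is not None and core[p] <= eps_prime:
                cid += 1  # p starts a new cluster
                labels[p] = cid
        else:
            labels[p] = cid
    return labels
```

Because the ordering is computed once, trying several `eps_prime` values (for example while inspecting a reachability plot) costs only a linear pass each.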
DATA ANALYSIS
Distance-bounded event clustering + visual analysis to select optimal parameters
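The slides do not detail the distance-bounded event clustering used in V-Analytics, so the following is only an illustrative variant under our own assumptions: events are processed in time order, and an event joins the nearest group whose centroid is within `max_dist_km` and whose latest event is at most `max_gap_h` hours earlier; otherwise it starts a new group. All names and parameters are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EventGroup:
    members: list[int] = field(default_factory=list)
    sum_x: float = 0.0
    sum_y: float = 0.0
    last_t: float = -math.inf

    def centroid(self) -> tuple[float, float]:
        n = len(self.members)
        return self.sum_x / n, self.sum_y / n

    def add(self, idx: int, x: float, y: float, t: float) -> None:
        self.members.append(idx)
        self.sum_x += x
        self.sum_y += y
        self.last_t = max(self.last_t, t)


def distance_bounded_groups(events, max_dist_km: float, max_gap_h: float) -> list[int]:
    """Label events (x_km, y_km, t_hours) with group ids, processing them in time order."""
    labels = [-1] * len(events)
    groups: list[EventGroup] = []
    for i in sorted(range(len(events)), key=lambda k: events[k][2]):
        x, y, t = events[i]
        best, best_d = None, math.inf
        for gid, g in enumerate(groups):
            if t - g.last_t > max_gap_h:
                continue  # group is temporally too far away
            cx, cy = g.centroid()
            d = math.hypot(x - cx, y - cy)
            if d <= max_dist_km and d < best_d:
                best, best_d = gid, d
        if best is None:  # no group close enough in space and time
            groups.append(EventGroup())
            best = len(groups) - 1
        groups[best].add(i, x, y, t)
        labels[i] = best
    return labels
```

The distance bound applies to the centroid at the moment an event joins, so a group can drift slightly as it grows; the two thresholds are exactly the parameters that the visual analysis would help to tune.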
DATA ANALYSIS
Data-driven territory tessellation + time series of Tweet frequency
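One common way to build a data-driven tessellation, and the assumption made here, is to group the points into compact groups of bounded radius and use the group centroids as generators of Voronoi cells; assigning each Tweet to its nearest generator is the same as locating it in a Voronoi cell. The greedy grouping below is a simplified, hypothetical stand-in for the toolkit's method; the per-cell daily counts are the time series to inspect.

```python
import math
from collections import defaultdict


def group_points(points, max_radius):
    """Greedily group (x, y) points: each joins the nearest group whose current
    centroid is within max_radius, else starts a new one. Returns centroids."""
    groups = []  # each entry: [sum_x, sum_y, count]
    for x, y in points:
        best, best_d = None, math.inf
        for g in groups:
            d = math.hypot(x - g[0] / g[2], y - g[1] / g[2])
            if d <= max_radius and d < best_d:
                best, best_d = g, d
        if best is None:
            groups.append([x, y, 1])
        else:
            best[0] += x
            best[1] += y
            best[2] += 1
    return [(sx / n, sy / n) for sx, sy, n in groups]


def nearest_seed(x, y, seeds):
    """Index of the closest seed, i.e. the Voronoi cell containing (x, y)."""
    return min(range(len(seeds)),
               key=lambda k: math.hypot(x - seeds[k][0], y - seeds[k][1]))


def cell_time_series(tweets, seeds, n_days):
    """Count tweets given as (x, y, day_index) per Voronoi cell and per day."""
    series = defaultdict(lambda: [0] * n_days)
    for x, y, day in tweets:
        series[nearest_seed(x, y, seeds)][day] += 1
    return dict(series)
```

Because the cells follow the data density, densely tweeted areas get small cells and sparse areas large ones, so a local peak in one cell's series is easier to attribute to a place.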
DATA ANALYSIS
Spatio-temporal density-based clustering (OPTICS)
Distance-bounded event clustering
Data-driven territory tessellation
RESULTS COMPARISON AND EVALUATION
[Figure: ground truth data compared with the combination of clustering results; labels: false negative, false positive]
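The comparison with the official reports can be expressed with set operations over affected areas. The slides do not say how the three clustering results were combined; the sketch assumes a vote in which an area counts as detected if at least `min_votes` methods flag it (`min_votes=1` gives the union). Area identifiers and function names are hypothetical.

```python
from collections import Counter


def combine(results: list[set[str]], min_votes: int = 2) -> set[str]:
    """Keep areas flagged by at least min_votes of the clustering methods."""
    votes = Counter(area for res in results for area in set(res))
    return {area for area, v in votes.items() if v >= min_votes}


def evaluate(detected: set[str], ground_truth: set[str]) -> dict:
    """Compare detected areas with the officially reported affected areas."""
    tp = detected & ground_truth
    fp = detected - ground_truth  # detected but not reported: false positives
    fn = ground_truth - detected  # reported but missed: false negatives
    return {
        "false_positives": sorted(fp),
        "false_negatives": sorted(fn),
        "precision": len(tp) / len(detected) if detected else 0.0,
        "recall": len(tp) / len(ground_truth) if ground_truth else 0.0,
    }
```

Raising `min_votes` trades false positives for false negatives, which makes the vote threshold a natural parameter to explore visually.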
4. CONCLUSIONS AND FUTURE WORK
CONCLUSIONS
First steps towards an approach to define the geospatial footprint of a flood event using georeferenced Tweets
Data mining techniques + exploratory visual analysis of georeferenced Tweets to identify the areas affected by a flood
Intuitive and fast procedure
User evaluation needed
False positives and false negatives need to be addressed
Further analysis required to obtain more precise footprints
FUTURE WORK
Conceptualization of disasters
Content analysis
Geospatial footprint in near real time
Extraction of locations (and other potentially useful information) mentioned in social media
Comparison of locations of origin (geotag) with locations mentioned in the content
Contextualization and validation of social media information with authoritative data
5. ACKNOWLEDGEMENT
ACKNOWLEDGEMENT
COST Action IC1203
• STSM Grant
Fraunhofer Institute for Intelligent Analysis and Information Systems
• Knowledge
• Software
• Dataset