vol. xx, no. xx, xx 20xx 1 data flow analysis and visualization...

13
VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization for Spatiotemporal Statistical Data without Trajectory Information Seokyeon Kim, Seongmin Jeong, Insoo Woo, Yun Jang, Member, IEEE, Ross Maciejewski, Member, IEEE, and David S. Ebert, Fellow, IEEE Abstract—Geographic visualization research has focused on a variety of techniques to represent and explore spatiotemporal data. The goal of those techniques is to enable users to explore events and interactions over space and time in order to facilitate the discovery of patterns, anomalies and relationships within the data. However, it is difficult to extract and visualize data flow patterns over time for non-directional statistical data without trajectory information. In this work, we develop a novel flow analysis technique to extract, represent, and analyze flow maps of non-directional spatiotemporal data unaccompanied by trajectory information. We estimate a continuous distribution of these events over space and time, and extract flow fields for spatial and temporal changes utilizing a gravity model. Then, we effectively visualize the spatiotemporal patterns in the data by employing flow visualization techniques. The user is presented with temporal trends of geo-referenced discrete events on a map. As such, overall spatiotemporal data flow patterns help users analyze geo-referenced temporal events, such as disease outbreaks, crime patterns, etc. To validate our model, we discard the trajectory information in an origin-destination dataset and apply our technique to the data and compare the derived trajectories and the original. Finally, we present spatiotemporal trend analysis for statistical datasets including twitter data, maritime search and rescue events, and syndromic surveillance. Index Terms—Spatiotemporal Data Visualization, Kernel Density Estimation, Flow Map, Gravity Model. 1 I NTRODUCTION I N cartography, a thematic map is a special-purpose map de- signed to illustrate particular features or concepts [8], [43] within a dataset. One common theme is the representation of geographical movements of people, ideas, money, energy, or material. Usually movement tables are given and the tables are visualized utilizing lines, arrows, or streaklines (e.g., [5], [27], [39], [48]). However, these movement tables do not exist for much data, even though we know that individuals in the dataset do move. Furthermore, many spatiotemporal event datasets are collected as the movement is already embedded into the datasets; unfortunately, the movement is never explicitly defined in the data. For example, emergency room records are routinely collected, and such records may contain an underlying notion of disease spread. However, the records themselves have no explicit defi- nition of movements. Instead, each record has only a patient’s address, time of visit, and visit reason. One can imagine that looking at some measure of movement within the spatiotemporal events could provide analysts with insights into various spread patterns. Such spread patterns are not limited only to emergency room events. We can expand this idea to any spatiotemporal event data, such as criminal incident reports, economics, and social trends. In order to find movements within the statistical data, both spatial and temporal patterns need to be explored during the Seokyeon Kim and Seongmin Jeong are with Sejong University. E-mail: {ksy0586 | jsm}@sju.ac.kr Insoo Woo is with Intel Folsom. E-mail: [email protected] Yun Jang is with Sejong University. E-mail: [email protected] Ross Maciejewski is with Arizona State University. E-mail: [email protected] David S. Ebert is with Purdue University. E-mail: [email protected] analysis phase. However, it is not easy to extract target movements from statistical data since there are many complicating factors. In previous work, one natural way to analyze such spatiotemporal data is to plot the data on a map and then provide animation control or small multiples views to visualize each time step of the data. This approach provides analysts with insight into the spatial distributions of the data as well as patterns and correlations between these distributions over time. However, scrolling through time steps requires analysts to remember what spatial distributions have occurred at different time steps, and it is not easy to compare one visualization with another visualization on the screen. More- over, it is hard to extract movements of the data explicitly by direct comparison of two visualizations. There also have been several illustrative and abstract visualization techniques but only direct movement data have been targeted without any flow extraction. In order to overcome these issues, we propose a flow extraction model. Figure 1 illustrates the concept of potential event flow extrac- tion. Spatiotemporal data is visualized on the map in geographical space in accordance with the time. Events are represented as red heatmaps estimated from the raw data. When the heatmaps are given with two time steps, t and t + 1, as in Figure 1 (a), the event flow of the data can be indicated as the straight arrow from the left to the right. In the same way, when the heatmaps are provided over the time steps, t - 1 to t + 1, the event flow can simply be extracted as the round shaped arrow as shown in Figure 1 (b). Both Figure 1 (a) and (b) are simple cases that have no overlap between events, and the event flows can easily be extracted. However, when applying the actual data, the heatmaps are too complicated to analyze. Figure 1 (c) presents actual heatmaps using real data over the time steps, t - 2 to t + 2.

Upload: others

Post on 21-Jun-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 1

Data Flow Analysis and Visualization forSpatiotemporal Statistical Data without

Trajectory InformationSeokyeon Kim, Seongmin Jeong, Insoo Woo, Yun Jang, Member, IEEE,

Ross Maciejewski, Member, IEEE, and David S. Ebert, Fellow, IEEE

Abstract—Geographic visualization research has focused on a variety of techniques to represent and explore spatiotemporal data.The goal of those techniques is to enable users to explore events and interactions over space and time in order to facilitate thediscovery of patterns, anomalies and relationships within the data. However, it is difficult to extract and visualize data flow patterns overtime for non-directional statistical data without trajectory information. In this work, we develop a novel flow analysis technique to extract,represent, and analyze flow maps of non-directional spatiotemporal data unaccompanied by trajectory information. We estimate acontinuous distribution of these events over space and time, and extract flow fields for spatial and temporal changes utilizing a gravitymodel. Then, we effectively visualize the spatiotemporal patterns in the data by employing flow visualization techniques. The user ispresented with temporal trends of geo-referenced discrete events on a map. As such, overall spatiotemporal data flow patterns helpusers analyze geo-referenced temporal events, such as disease outbreaks, crime patterns, etc. To validate our model, we discard thetrajectory information in an origin-destination dataset and apply our technique to the data and compare the derived trajectories and theoriginal. Finally, we present spatiotemporal trend analysis for statistical datasets including twitter data, maritime search and rescueevents, and syndromic surveillance.

Index Terms—Spatiotemporal Data Visualization, Kernel Density Estimation, Flow Map, Gravity Model.

F

1 INTRODUCTION

I N cartography, a thematic map is a special-purpose map de-signed to illustrate particular features or concepts [8], [43]

within a dataset. One common theme is the representation ofgeographical movements of people, ideas, money, energy, ormaterial. Usually movement tables are given and the tables arevisualized utilizing lines, arrows, or streaklines (e.g., [5], [27],[39], [48]). However, these movement tables do not exist formuch data, even though we know that individuals in the datasetdo move. Furthermore, many spatiotemporal event datasets arecollected as the movement is already embedded into the datasets;unfortunately, the movement is never explicitly defined in the data.

For example, emergency room records are routinely collected,and such records may contain an underlying notion of diseasespread. However, the records themselves have no explicit defi-nition of movements. Instead, each record has only a patient’saddress, time of visit, and visit reason. One can imagine thatlooking at some measure of movement within the spatiotemporalevents could provide analysts with insights into various spreadpatterns. Such spread patterns are not limited only to emergencyroom events. We can expand this idea to any spatiotemporal eventdata, such as criminal incident reports, economics, and socialtrends. In order to find movements within the statistical data,both spatial and temporal patterns need to be explored during the

• Seokyeon Kim and Seongmin Jeong are with Sejong University.E-mail: {ksy0586 | jsm}@sju.ac.kr

• Insoo Woo is with Intel Folsom. E-mail: [email protected]• Yun Jang is with Sejong University. E-mail: [email protected]• Ross Maciejewski is with Arizona State University.

E-mail: [email protected]• David S. Ebert is with Purdue University. E-mail: [email protected]

analysis phase. However, it is not easy to extract target movementsfrom statistical data since there are many complicating factors. Inprevious work, one natural way to analyze such spatiotemporaldata is to plot the data on a map and then provide animationcontrol or small multiples views to visualize each time step ofthe data. This approach provides analysts with insight into thespatial distributions of the data as well as patterns and correlationsbetween these distributions over time. However, scrolling throughtime steps requires analysts to remember what spatial distributionshave occurred at different time steps, and it is not easy to compareone visualization with another visualization on the screen. More-over, it is hard to extract movements of the data explicitly by directcomparison of two visualizations. There also have been severalillustrative and abstract visualization techniques but only directmovement data have been targeted without any flow extraction.In order to overcome these issues, we propose a flow extractionmodel.

Figure 1 illustrates the concept of potential event flow extrac-tion. Spatiotemporal data is visualized on the map in geographicalspace in accordance with the time. Events are represented as redheatmaps estimated from the raw data. When the heatmaps aregiven with two time steps, t and t +1, as in Figure 1 (a), the eventflow of the data can be indicated as the straight arrow from theleft to the right. In the same way, when the heatmaps are providedover the time steps, t − 1 to t + 1, the event flow can simply beextracted as the round shaped arrow as shown in Figure 1 (b). BothFigure 1 (a) and (b) are simple cases that have no overlap betweenevents, and the event flows can easily be extracted. However, whenapplying the actual data, the heatmaps are too complicated toanalyze. Figure 1 (c) presents actual heatmaps using real data overthe time steps, t−2 to t +2.

Page 2: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 2

(b)

t-1 t t+1

flow

(a)

t t+1

(c)

t-2 t-1 t t+1 t+2

?

Figure 1. Illustration of potential flow map extraction. When eventheatmaps for two consecutive time steps are given as in (a), we canindicate that the event is relocated from the left to the right. Similarly,there are three events over time and the event flow is easily imaginedas in (b). However, when complicated event heatmaps are provided, it isnot easy to extract the event flows in (c).

In this work, we present a novel technique for extractingmovement information from event based data sources to creategeographic flow maps. As shown in Figure 2, our techniqueapproximates the underlying data distribution over time throughthe application of a kernel density estimator. This provides us witha continuous functional representation of the data. We, then applya gravity model to extract flow maps of non-directional statisticaldata. The mass in the gravity model is obtained from the functionaldensity distribution and the gravity vectors are computed for theflow directions. In this way, we can explore the spread patternsof spatiotemporal event data without any trajectory information,such as disease, crime, social trend. In order to visualize theflow efficiently, we apply line integral convolution (LIC) [7] withanimated directional glyphs on the map and oriented line integralconvolution (OLIC) [49]. We evaluate our technique using GPStrajectory data, Twitter data, maritime search and rescue events,and syndromic surveillance data. Note that we utilize the termmovement and flow interchangeably throughout this work.

Our technique has several benefits for analyzing spatiotempo-ral data. Since the trajectory information is not easily collected andrequires massive amounts of storage for further analysis, we do notmake use of the trajectory information in our system. Instead, ourtechnique provides flow maps of event-based statistical datasets.The flow maps are estimated procedurally on the fly and usersare able to interactively adjust the parameters in our flow mapgeneration algorithm for more intuitive flow map extractions. Theflow maps can be used to understand potential movement pathsover time within the statistical datasets. Since certain types ofstatistical datasets imply movement, it is more advantageous toanalyze the data with the movement information, which is theflow map in this work, as compared to conventional visualizationsof statistical data distributions. Note that this shows flow trendsbut does not track movements of the entities creating them.

The major contributions of our paper are as follows:

• A continuous spatiotemporal functional representation fornon-directional discrete event based data using kerneldensity estimation without trajectory information

• Procedural continuous flow map extraction from spa-tiotemporal data using the functional representation

• Design of a 3D gravity model to extract potential flowpattern in statistical spatiotemporal data

• Interactive visualization of the flow maps using animated

glyphs (particles), LIC, and OLIC

We first review previous work in Section 2 and present anoverview of our flow analysis system in Section 3. We describefunctional representations for spatiotemporal statistical datasetsin Section 4. In Section 5 we present our flow map extractionalgorithm, a gravity model, and provide flow map visualizationtechniques in Section 6. Then, we present results generated by oursystem in Section 7 and discuss our system in Section 8. Finally,the conclusion and future directions are discussed in Section 9.

2 RELATED WORK

As mentioned earlier, our system generates data flow maps withinnon-directional spatiotemporal data by employing existing vectorfield visualization techniques. This section covers relevant topicsin spatiotemporal data visualization and vector field visualization.

2.1 Spatiotemporal data analysis

In recent years, many geo-visualization techniques have beendeveloped for exploratory data analysis, and these techniques havebeen used to inspire creative thinking and provide new insightsinto previously unknown characteristics of the original data [38].Some of the techniques focus on data with spatial and temporalinformation. Tominski et al. [46] use three-dimensional iconsto visualize spatiotemporal data on a map which is subdividedbased on administrative subdivisions of geographical regions (e.g.,federal state). Temporal trends are analyzed based on the admin-istrative subdivisions, encoded in 3D models such as pencils andhelixes and displayed on each subdivision. However, this methoddoes not provide temporal trends across subdivisions. Maciejewskiet al. [34] propose kernel density estimate heatmaps linked withtemporal analysis views to detect hotspots for spatiotemporal data.This work allows the user to detect hotspots of interesting eventsand analyze a temporal trend by selecting a hotspot on the kerneldensity estimate heatmap. There is more previous work providingmultiple views (e.g., parallel coordinate, matrix and scatter plots)linked with spatial and temporal data views in order to leveragethe spatiotemporal data analysis [13], [18].

Another technique to represent spatiotemporal data is thespacetime cube [15], which is constructed by aggregating theevent information over the two-dimensional geographical spaceand additional third-dimension to denote time. The spacetimecube shows its usefulness in analyzing agent-based events withspatial and temporal information in geovisualization [21]. Theconcept of the spacetime cube is utilized to visualize temporalbehaviors of events, such as storytelling and human eye movement[9], [29]. Other types of events, such as health care data, arealso applied to the spacetime cube presentation by displayingmeaningful glyphs in the cube. Gatalsky et al. [11] visualize acatalog of earthquake events in the spacetime cube using different-size circles in different colors. Kraak [22] applies the spacetimecube to visualize the distribution and propagation of epidemicswith additional data analysis views (e.g., parallel coordinates).However, it is still hard for the user to recognize the locationand time of a certain event in the spacetime cube. Moreover,it becomes more difficult to analyze the temporal trends forfrequently occurring events. Nakaya and Yano [37] use spacetimekernel density estimation on spatiotemporal events and visualizethe estimated density results using volume rendering techniques.

Page 3: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 3

System overview

ㆍ ㆍ ㆍ

tim

e

Spatiotemporal Statistical Data

Functional Representations

Flow maps

3D Gravity model

Kernel Density Estimation

Figure 2. System overview. Discrete spatiotemporal data is represented as a continuous function (KDE) and flow maps are extracted using 3Dgravity model for temporal trends (movement flows).

Flow maps [5], [12], [14], [32], [39] have been considered aneffective means of visualizing spatial interactions (e.g., migrationflow) when datasets imply flow between spatially different regions.Such data can form a location-to-location network. However,some event types, such as crime counts, and patient counts,are not applicable for flow maps since there is no network ortrajectory information for the origins and destinations. WaldoTobler [44], [45] presents migration map using the N-squared tableof geographical interactions. His insight origins from Tobler’s FirstLaw of Geography; “Everything is related to everything else, butnear things are more related than distant things.” Although the N-squared table is useful, Tobler’s work has a lack of insight to timevarying data. Andrienko et al. [2] propose spatial generalizationand aggregation of massive time varying movement data usingglyph shape designed by Tobler. Another approach to generatethe flow maps from the statistical datasets is to use a gravitymodel [1], [3], [4], [19], [28], [30], [40], [47]. A gravity model hasbeen used in many social science applications to explain certainbehaviors that are similar to gravitational interaction in Newton’sLaw of Gravitation. Recently the gravity model is widely utilizedin various research areas including spatial clustering [16], navi-gation [17], international migration [19], [28], [40], internationaltrade [4], and disease spread [3], [30], [47]. Especially, AlbertoSalvo [41] highlights how collusive behavior can magnify theeffects of distance with two illustrations, such as the supply oftwo firms to two northeastern markets in 1996 and the supplyof three firms to two southeastern markets in 1999. Kincses andToth [20] propose accessibility analysis using gravity modeling,bi-dimensional regression calculations, and GIS visualization.Our work is motivated by the previous gravity model researchand we simplify the gravity model to extract flow maps fromspatiotemporal statistical data without any trajectory information.

2.2 Vector field visualizationVector field (flow) visualization has been a well studied topic, andis used to depict flow patterns and magnitudes in simulated ormeasured vector fields. There have been various approaches [26]to vector field visualization which are classified into direct flowvisualization, texture-based flow visualization, geometric flowvisualization, and feature-based flow visualization. One well-known texture-based flow visualization technique is Line Integral

Convolution (LIC) [7]. It has been shown that LIC fits wellin the visualization of 2D flow patterns [25], [26]. Moreover,in order to reveal the directional vector field, Wegenkittl etal. [49] have proposed oriented LIC (OLIC). As geometric flowvisualizations, streamlines have been shown to provide a goodoverview of the flow field [36]. Particle advection techniquesprovide animated flows by mapping glyph images on particles andrendering geometric shapes of particles [23]. It has been a chal-lenging issue to determine the best flow visualization techniquethat is well matched to a specific application and domain. As aneffort to understand the effectiveness of various flow visualizationtechniques, researchers performed evaluation studies [10], [24],[31]. Laidlaw et al. [24] describe strengths and weaknesses forsix visualization methods including arrow icons, integral curves,wedges, and line-integral convolution (LIC) for 2D vector datathrough a quantitative evaluation study. Their results show thatboth expert and novice users better performed given tasks whenthey used methods with the sign of vectors within the vector fieldas well as visual representations of integral curves and locations ofcritical points. Liu et al. [31] evaluate geometry-based and texture-based flow visualization methods of grid-based arrows, evenlyspaced streamlines, and LIC variants along with a color wheeland color map for 2D flow visualization. By conducting a userstudy, they find that a texture-based representation with enhancedLIC creates intuitive perception of the flow, and a geometry-basedrepresentation with streamlines may generate visual interpolationas such maximizing the perception of flows. They also describethat a priori familiarity of users with the techniques may affectunderstanding flows, while mentioning that many participants intheir evaluation study are familiar with arrows and streamlinesrather than LIC. Based on the results and recommendations fromprevious research, in this work we utilize particle advection withglyph images.

3 SYSTEM OVERVIEW

Most spatiotemporal visual analysis systems utilize a combina-tion of geospatial and temporal visualization. Starting from anoverview of geospatial data at a certain time, the user takes adetailed look at temporal information in a location or area ofinterest through the use of linked views. Such a process has

Page 4: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 4Flow patterns

II

I

(a) (b) (c)

III

(d)

Figure 3. Data distributions are shown as (a) a heatmap in the current, t0, and (b) a heatmap in the future, t0 +1. (c) The flow map from t0 to t0 +1is extracted using the gravity model. The area (I) shows diverging flows and the area (II) presents converging flows when the density decreasesand increases, respectively. The flow path is formed along the density changes over time as shown in the green arrow (III). Note that figure (c) isobtained with the kernel (W = 30, T = 1). An example of a 3D gravity kernel for the gravity model is illustrated in (d), where kernel size is W = 2 andT = 1 in Equation 5. The (p,q,r) values are shown as 3-tuples in (d). The red arrow is the flow for the green heat map.

proven to be an effective means of exploring spatiotemporal data.However, such exploration needs many snapshots of geospatialdata over time and requires the user to switch from a spatial viewto a temporal view and vice versa repeatedly.

In order to reduce such cumbersome tasks, we design anovel analysis system to investigate non-directional event basedspatiotemporal data. Our flow analysis approach provides spatialmovement patterns as well as temporal trends for these datasets.Figure 2 illustrates an overview of our flow analysis. First,events of interest are aggregated based on a user specified timeduration (e.g., daily, weekly, monthly). The aggregated discretedata distribution is converted into our functional representationsusing kernel density estimations. The flow maps are then generatedby evaluating a set of our functional representations using agravity model. In this way, we can extract the flow maps fromstatistical datasets, and then the flow maps are visualized byadopting vector field visualization techniques. In order to providea proper spatiotemporal analysis system, we need to overcome thefollowing challenges:

• How do we handle spatial and temporal dimensions to-gether in the functional representations?

• How do we extract flow maps (vector fields) from thescalar data distributions?

• How can we visualize the flow maps for spatiotemporalanalysis?

Our solutions to these challenges are introduced in Section 4,Section 5, and Section 6, respectively. Sample visualizations fromour flow map analysis system are shown in Figure 3. In the figure,two heatmaps for the current, t0, in (a) and the future, t0 + 1,in (b) are presented. The flow map between these two heatmapsis visualized in (c). The flow map has three different patternsdepending on the data distribution. Diverging and convergingflows of Figure 3 are found in the area (I) and (II). The convergingpattern (II) is seen in the area where the density increases, whereasthe diverging pattern (I) is found when the density decreases.Moreover, a flow path (III) is extracted along the density changesover time as seen in Figure 3(the green arrow in (c)).

( W:80, T:1 )

(a)

( W:80, T:1 )

(b)

( W:80, T:2 )

(c)

𝑡0 𝑡0 + 2 𝑡0 + 1 𝑡0 − 1 𝑡0 − 2

𝑡0 𝑡0 + 1 𝑡0 − 1 𝑡0 𝑡0 + 1 𝑡0 − 1

Figure 4. Two simple event patterns. (a) presents diverging and con-verging patterns. (b) shows the flow map using our approach when theevents move toward the left lower corner.

In order to illustrate our algorithm more clearly, we presenttwo test cases under ideal conditions in Figure 4. There are twoevent locations in (a). The density of the upper left event decreases,whereas, the density of the lower right event increases. Thisdemonstrated the expected the diverging and converging patterns,respectively. These flow patterns are obtained by our approachas shown in (a). Figure 4 (b) shows another simple test casewhere events are moving toward the lower left, as illustrated inthe resulting visualization.

4 FUNCTIONAL REPRESENTATION OF SPATIOTEM-PORAL DATA

Spatiotemporal data consists of discrete events at geolocationsover time. Spatial distribution of the events at each time stepcan be easily represented using functions as presented in manyprevious studies. The distribution is encoded with compactlysupported kernels and the encoded distribution provides a heatmapof data values. In order to obtain spatiotemporal distributions, it isnecessary to combine two different dimensions: location and time.However, it is not intuitively easy to extend this representation

Page 5: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 5

(𝑎0, 𝑎1) = (1, 1) (𝑎0, 𝑎1) = (3, 1) (𝑎0, 𝑎1) = (1, 3)

𝑡0 − 1 𝑡0 𝑡0 + 1

b) c) d)

a)

Figure 5. Parameter comparisons for a0 and a1 in Equation 5. (a) presents three heat map visualizations from t0− 1 to t0 + 1. The flow maps areextracted by varying the values, a0 and a1, in (b), (c), and (d). When a0 increases, the flow patterns are observed as similar to the potential flow att0. On the other hand, when a1 increases, the flow patterns are extracted as more temporal patterns of the neighboring steps from t0− 1 to t0 + 1and this tends to ignore the events at t0.

to spatiotemporal data since geolocation and time are often ondifferent scales.

In this work, we utilize a simple representation of discretespatiotemporal data that is an array of spatial distributions overtime. In order to obtain continuous representations from discretedata, the kernel density estimation (KDE) approach is applied tothe data distribution. Initially we have combined spatial spaceand temporal space together to analyze data over time but thisapproach showed only either diverging or converging flow pattern.Therefore, each time step is encoded with 2D kernels separatelyand the functional representation is written as follows:

f2D(x,y) =1N

N

∑i=1

1h2

s,iKs

(x− xi

hs,i,

y− yi

hs,i

), (1)

where f2D(x,y) is the density estimate at a given location, (x,y),N is the total number of samples, and hs,i is the bandwidth of Ks.We use the Triweight kernel for Ks as shown in the following.

Ks,t(u) =3532

(1−u2)31(||u||≤1) (2)

Note that any kernel can be used to represent discrete data.However, one has to make sure that the first derivative of a kernelexists within the entire spatial data range, otherwise, the firstderivative field of Equation 1 will be discontinuous or incorrect.For example, the kernels having sharp edge on end of kernel, suchas Epanechnikov kernel or Cosine kernel, result in discontinuousvector fields when kernels are overlapped. Since the Triweightkernel covers a narrow range about an event, this leads to lessuncertain functional representations compared to wide range ker-nels, such as Gaussian. Also the Triweight kernel requires lesscomputing time compared to kernels with exponential functions.

The kernel bandwidth can be fixed or varied depending on themethod employed. The kernel bandwidth influences the magnitudeof the kernel, i.e., kernels with large bandwidths have smallerheights. We use a variable bandwidth for the the kernel, Ks,proposed by Silverman [42]. The bandwidth (hs,i) of the kernel(Ks) placed on the point (xi,yi) is proportional to the distancefrom the ith sample to the kth nearest neighbor.

5 FLOW MAP EXTRACTION

Flow maps in cartography are defined by Phan et al. [39] as “amix of maps and flow charts, that show the movement of objectsfrom one location to another, such as the number of people ina migration, the amount of goods being traded, or the numberof packets in a network”. Usually flow maps are obtained fromdatasets with direction or trajectory information. However, moststatistical datasets do not have the directional information. In thisSection, we propose a gravity-based model to extract the potentialflow map.

5.1 Gravity modelThe generic form of the gravity model between spatial locations,i, j, is presented as follows:

g(i,j) = (mia0 ·m j

a1)/di j2, (3)

where mi and m j are the masses at i and j, di j is the distancebetween i and j, and a0 and a1 are the control parameters affectingmasses. In many fields, researchers have estimated these a0 and a1statistically using many data variables. For example, Karemera et

Page 6: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 6

I

II

(a)

II

I

(b)

II

I

(c)

II

I

(d)

Figure 6. Comparison between the difference-based and the gravity-based flow map extraction. (a) and (b) present heatmaps at the current time,t0, and the future, t1, respectively. (c) presents the flow map visualization using the difference-based flow extraction, whereas, (d) shows the flowmap visualization with the gravity-based flow extraction.

al. [19] present a gravity model analysis of international migrationto North America and they use the number of migrants, distance,population, income, inflation, unemployment, language, etc. asdata variables to estimate influential parameters in their model. Inthis work, we have statistical datasets that consist of geolocationsand events over time; therefore, we modify the gravity model toestimate the flows within the statistical data. We could use otherestimation parameters combined with this in future work.

5.2 Gravity-based flow extraction model

In order to estimate the event flows using the gravity model, wereplace the mass in Equation 3, m, with f2D(x,y) as presented inthe following.

g(i,j) =(

f2D(xi,yi))a0 ·

(f2D(x j,y j)

)a1

di j2 (4)

The spatio-temporal flow map extraction model using the gravitymodel is presented as follows:

FMap3D(x,y, t) =W

∑p=−W

W

∑q=−W

T

∑r=−T

(p,q,r)·(f2D(x,y)|t

)a0 ·(

f2D(xp,yq)|tr)a1

di j2 ,

(5)

where W is the kernel size in the spatial axes, (x,y), T is the kernelsize along the time axis, (t), and di j is the euclidean distancebetween (x,y, t) and (xp,yq, tr). The temporal trend is determinedby the multiple adjacent distributions along the time axis. Note thatthe multiplication of (p,q,r) results in the directional informationin the flow map. Figure 3 (d) illustrates an example of the gravitykernel when W is 2 and T is 1 in Equation 5. In that case, thex-axis and y-axis range is -2 to 2 and the t-axis range is -1 to 1.Note that the discrete kernel values are represented in the 3-tuples.

The influences of the parameters, a0 and a1, are compared inFigure 5. Three heatmaps for the consecutive time steps are shownin (a). The flow maps are extracted about t0 with the kernel (W=80,T=1 ) by varying the values, a0 and a1, in Figure 5 (b), (c), and (d).When a0 increases, the flow patterns are similar to the potentialflow at t0. On the other hands, when a1 increases, our model tendsto produce the temporal flow patterns over the relationship onlybetween the neighboring steps. This tends to reduce the effect ofthe time step t0. Figure 5 (c) shows the flow patterns similar to the

Vector Field

OLIC

Arrow particles

Particles

Similar to LIC

Web-based S/W

D3.js Backbone.js

When.js

GLSL Point Sprite

Stand-alone S/W

Flow map

Extraction

Raw Data

Algorithms Software Visualization

Figure 7. Visualization pipeline. The vector field is computed using thefunctional representation from the raw data with KDE and gravity model.The vector fields are visualized on either stand-alone client softwareor web-based software. Our system provides three different types ofvisualizations including OLIC, arrow glyphs, and particle tracing similarto LIC.

potential flow at t0, whereas, Figure 5 (d) presents the flow patternsonly between t0− 1 and t0 + 1 and ignore the event density at t0.Since the effects of the parameters, a0 and a1, are dominant, weuse (1, 1) for a0 and a1 in order to extract the global flow patternsincluding all time steps within the kernel ranges. However, onecan extract the flow map by adjusting a0 and a1 to stress certaintime steps.

As a different flow map extraction model, the flow can besimply obtained by the difference between heatmaps in twodifferent time steps. The difference-based flow extraction modelis presented as follows:

FMapdiff(x,y)|(t0, t1) =(

f2D(x,y)|t1)−(

f2D(x,y)|t0)

(6)

However, FMapdi f f (x,y) extracts only local flow patterns that donot represent the global flows. Figure 6 shows the comparisonbetween difference-based flow extraction and gravity-based flowextraction. Figure 6 (a) and (b) are the heatmaps at the currenttime, t0, and the future, t1. The major changes between Figure 6 (a)and (b) are found in area (I) and area (II). Figure 6 (c) presents theflow map visualization with the difference-based flow extraction.As shown in Figure 6 (c), several converging flows toward thecenters of high heatmap values in the future, t1. However, it isnot possible to extract the global movements since the difference-based model produces only local flows. Figure 6 (d) shows theflow map using the gravity-based flow extraction. The global flowpatterns are extracted as the curve from Figure 6 area (I) to area(II).

Page 7: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 7

Table 1Comparison of CPU and GPU computing time

computingtype model total

pointsnumber oftimesteps

gravity parameters Tload (ms) Tpre(ms) Tkde(ms) Tgrav(ms) Ttran(ms) total(ms)W TCPU FMapdi f f

3,056 31

- - 93 514 34,135 - - 34,742

GPU

FMapdi f f - - 96 518 247 - 189 1,050

FMap3D

1 1 99 730 243 82 199 1,35310 1 90 699 209 3,967 191 5,15730 1 108 589 247 33,469 192 34,60650 1 98 757 243 86,913 192 88,204

Table 2Performance of GPU computing time for different datasets

data totalpoints

number oftimesteps

gravity parameters Tload (ms) Tpre(ms) Tkde(ms) Tgrav(ms) Ttran(ms) total(ms)W TSuper Bowl 4,659 24 30 2 49 60 35 4,796 33 4,973

Boston Marathon 5,224 10 20 1 80 379 119 1,163 14 1,755SAR 23,416 4 120 1 205 553 738 210,577 229 212,302

syndromic 693,878 86 50 3 3,542 1,782 6,897 135,564 271 148,056

6 VISUALIZATION SYSTEM OF SPATIOTEMPORALDATA

In order to analyze spatiotemporal trends of statistical data, theflow maps are extracted as introduced in Section 5. In this work,the flow maps are overlaid on a map to reveal movement patternswithin a geographical space and the flow fields are visualizedwith vector field visualization techniques. Figure 7 illustrates thevisualization pipeline for the flow map. The computation forthe algorithm is performed and, then, the system transfers thevector fields to either a stand-alone client software or a web-basedsoftware for the vector field visualization. Our system providesthree different types of visualization including OLIC, arrow glyph,and particle visualization similar to LIC. We present the flow mapvisualization techniques and details of the system implementationin this section.

6.1 Flow map visualization

In flow visualization using the particle advection, initial particleseeding is one of the challenging issues to obtain the best illus-tration of flow patterns within a vector field [48]. Some previouswork uses flow feature points from fluid simulation [51]. However,such approaches are not applicable to our flow maps computed inSection 5 since our input data is not physics-based simulation databut a set of spatiotemporal discrete events. Moreover, our particleadvection scheme is different from typical particle advectionmethods in that we deal with the transition of areas with high den-sity areas or high density changes and consider two-dimensionalprojection of the patterns in the flow maps onto a geographicalmap. Many statistical datasets have more importance in the areaswith high density changes since they may have meaningful trendsto be analyzed. Such areas are usually located around hotspots. Weinitially place the particles in the region of interest, such as highdensity or high velocity areas, using the importance-driven methodproposed by Burger et al. [6]. We randomly select one high densityarea and place a new particle in a random position (x,y) withinthe selected area. The new particle is advected along the flow mapand dies when the life time is over. The particles are visualizedas animated directional glyphs in our system. Similarly, OLIC areapplied only to non-zero vector fields, which are extracted in the

areas whose density values are not zero. Therefore, we can projectthese visualization results onto a map where actual events occur.

6.2 Web-based flow map visualizationOur system also provides a web-based flow map visualization.When the system generates a flow map using our model, thevector fields are transmitted to not only the local client softwarebut also the web server. The web-based flow map visualizationsystem consists of JavaScript and several APIs, such as D3.js,Backbone.js, When.js. The web server visualizes the 3D globe onthe web according to the vector fields with the D3 projections.The server, then, attaches minimum geographical information in-cluding roads, country boundaries, and lakes, using the TopoJSON.The web-based flow map visualization provides animated particlesand the color of a particle varies accordingly as the particle ages.If the target vector field area is too small to be visible enough, oursystem allows a user to apply an additional map on the differentmap layer located between the globe layer and the particle layer.

6.3 System implementationIn order to build our system, we utilize hardware acceleratedcomputations and visualizations. The functional representationsin Section 4 and the flow map extractions in Section 5 are im-plemented using NVIDIA CUDA. Once the functional represen-tations are obtained in the system, we calculate the flow maps inCUDA again. Thereafter, the flow maps are transferred to shadersfor visualizations. We use a multi-pass technique to obtain severallayers including map rendering, heat map rendering, particlerendering, point sprite rendering, LIC and OLIC rendering, etc.,and the layers are blended for the final results. For the vector fieldvisualization, a geometry shader is used for the particle renderingand point sprite rendering in order to create glyphs. Table 1 andTable 2 show the computing performance according to our modelparameters. Data load and histogram generation time, Tload , KDEpreparation time, Tpre, computing time for KDE, Tkde, computingtime for the gravity model, Tgrav, and data transfer time fromGPU to CPU, Ttran, are compared, respectively. In Table 1, GPUperformance is about 30 times faster than CPU FMapdi f f . Notethat computing FMap3D on CPU is not possible in our experiment.For the FMap3D computation, as W increases, the computing

Page 8: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 8

(a) (b) (c) (d) (e)

(f) (g)

이미지3-수정1

Figure 8. The flow map visualizations of GPS trajectory data. (a)-(e) show the data distributions starting from (a) and moving toward (e). (f) presentsthe flow map extracted from our gravity model with three time steps. (g) is the flow map with five time steps. The difference between (f) and (g) isfound at intersections marked in the rectangular boxes. The flow map is visualized with OLIC.

time becomes longer due to the increase of the number of cellsin the gravity kernel. Table 2 shows the performance of GPUcomputing for different datasets that are used in Section 7. Wehave tested our flow analysis system on commodity workstations,whose specifications include an Intel i7 CPU, 16GB CPU memory,and GeForce GTX 660 with 2GB GPU memory.

7 DATA EXPLORATION

Our system extracts the flow maps and presents the spatiotemporaltrends on a geographical map. In order to evaluate our system, weuse various datasets and extract different flow maps. We presentthe flow map visualizations and the discussion of the flow maps inthis section. In Section 7.1, we validate our technique with a GPStrajectory data as a systematic evaluation. However, in Section 7.2,Section 7.3, and Section 7.4, we apply the technique to statisticalspatiotemporal data including twitter data, maritime search andrescue events, and syndromic surveillance. Since we are focusedon data flow pattern analysis for non-directional spatiotemporalstatistical data, our system utilizes discrete event data withouttrajectory information, such as twitter ID or movements.

7.1 Origin-destination data analysis

In order to validate our technique, we present flow map visualiza-tions with origin-destination data containing actual flows. We useGPS trajectory data and we apply our technique to the dataset afterdiscarding the trajectory information in the data. We, then, applyour flow map extraction model and compare extracted patternswith the original trajectory information. The GPS trajectory datawere collected by 182 GPS users in GeoLife project of MicrosoftResearch Asia from April 2007 to August 2012 [52]. Figure 8(a) - (e) present the data visualizations with every 5 minutes GPStrajectory data with the heatmaps. The flow maps, (f) and (g), arevisualized with OLIC for the flow vectors extracted by our gravitymodel. In order to make the data analysis fair, we aggregate thedata points within certain time range and generate new data withmultiple time steps. (a)-(e) are the distributions of five consecutive

time steps and each time step contains all data points within thetime range. Then we apply our gravity model to extract the flowmap. The flow map in (f) is generated with three time steps andone in (g) is extracted with five time steps. Both show the dataflow directions with OLIC technique. The difference is found atthe intersections marked in the rectangular boxes. The flows atthe intersections in (f) are following the street shapes, whereas,the flow directions at the intersections in (g) are diagonal. If wecompute the flow map with many time steps at the same time, thevector fields are representing the data flow directions for all timesteps instead of just a single time step, which tells the effect ofthe gravity kernel size along the temporal axis, T , in Equation 5.Note that the individual OLIC visualizations over several timesteps are composed within our system to reveal the flow mapfor entire time steps. In this example, we can demonstrate thatour flow map extraction technique reproduces the actual trajectorydirections and the flow maps can be abstracted for multiple timesteps depending on the gravity kernel sizes.

7.2 Twitter data analysis

Recently, social media services, e.g, Twitter, offers a freely acces-sible database of user-generated reports. As many people use GPSenabled mobile communication devices, these reports are able tocapture important local events observed by an active and ubiqui-tous community. We have collected and analyzed Tweet messagesto investigate the message flows in our system. We analyzed twodifferent cases, 2015 Super Bowl and Boston Marathon bombingin 2013. Note that we do not utilize any trajectory informationfrom the social media data.

The Super Bowl was held in Glendale, Arizona, on February1st, 2015. Before investigating this case, we expected that manyfans came to the stadium to watch the game and they might havebeen using their mobile phones to broadcast their status duringthe game day through Twitter. As expected, a lot of Tweets weregenerated during the day, and we decided to explore movements ofthe fans on the day. We aggregated the Twitter data by hour from12:00 to 20:00 and extracted the flow map every hour. There are

Page 9: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 9

12H 13H 15H

16H 17H 18H 20H

Information

Malls

Stadium Parking Area (Golf Center)

Figure 9. The flow maps during Super Bowl in 2015. Three hot spots are marked in the information; the stadium, malls, and remote parking area.The tweet messages related with the game are analyzed between 12:00 and 20:00. The game started at 16:30 and the converging flow patternswere found around the stadium from 12:00 to 16:00. Then, the flow pattens toward the remote parking area from 16:00 to 17:00 were discovereddue to the lack of parking spots around the stadium. After the game, the flow patterns were detected as people moved toward the malls.

(a) 12H

(d) 15H

(b) 13H

(e) 16H

(c) 14H

(f) 17H

Figure 10. The flow maps of the Boston Marathon bombing in 2013. The Tweet messages are analyzed between 13:00 and 17:00. The bombsexploded at 14:49 and the escape flow patterns are found right after the bomb explosion in (c). Random patterns and gathering patterns at themarathon course are found in (a) and (b). People started to gather again at the bombing sites two hours later in (d).

three hot spots, such as the stadium, malls, and a remote parkingarea as indicated in Figure 9. People started to move to the stadiumfrom 12:00 until the game starting time, which was 16:30. As timewent on, the movement flows were dominant toward the stadiumwhere there were many available parking spaces until 15:00. Thenfrom 16:00, the major flow was found in the remote parking area,which is apart from the main stadium. This indicates that therewas no available parking spaces near the stadium, and peoplestarted to move to other parking spaces in the remote parkingareas. Interesting patterns were found around 20:00. There wasa flow from the stadium to the malls, which might indicate that

people went to the malls again for food and drinks or picking uptheir cars.

Our second twitter study focused on The Boston Marathonbombings and subsequent related shooting incidents that began onApril 15, 2013 when two pressure cooker bombs exploded duringthe Boston Marathon at 14:49, killing 3 people and injuring anestimated 264 others [50]. We started to analyze Twitter messagesfrom 13:00 to 17:00 as we assumed that there were randompatterns before the marathon athletes showed up and eventuallypeople gathered together along the marathon course. As seen inFigure 10 (a), (b), and (c), there were mixed patterns, such as

Page 10: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 10

(a) 𝑡0 − 1

(b) 𝑡0

(c) 𝑡0 + 1

(d)

Traverse City

Muskegon

Bay City

Detroit

Cleveland Chicago

Milwaukee

Sturgeon Bay

Ludington

Sarnia

Figure 11. (a) is a heatmap in the past, t0−1, (b) in the current, t0, and (c) in the future, t0 +1, for the search and rescue case. (d) is the flow mapvisualization and the flow trends are extracted based on the density distributions (a, b, c) from the past to the future.

(d)

(a) 𝑡0 − 3 (b) 𝑡0 (c) 𝑡0 + 3

Indianapolis

Fort Wayne

Indianapolis

Fort Wayne

Indianapolis

Fort Wayne

Indianapolis

Fort Wayne

(e)

Indianapolis

Fort Wayne

Figure 12. Outbreaks of synthetic syndromic data. (a), (b), and (c) show the heatmaps of the data distribution and the corresponding flow map ispresented in (d) and (e). Flow paths from Indianapolis to Northern Indiana are extracted as the disease spreading direction.

random and gathering movements at 12:00, 13:00, and 14:00.Then the bombs exploded at 14:49PM and right after that, peopleescaped from the bombing sites, which are marked in Figure 10(d). The flow patterns showed that all people moved away and ourflow map model caught these patterns. After two hours, peopleslowly came to the bombing sites to investigate or watch the scene.This is shown in Figure 10 (f).

7.3 Coast guard search and rescue data analysis

Our next application is the Search And Rescue (SAR) data thatis collected by all U.S. Coast Guard stations [35]. There aretwo different types of the data: response cases (call for action)and response sorties (resources deployed to respond to the call.e.g., a boat or an aircraft). With this data, the Coast Guard isparticularly interested in determining the spatial and temporaldistribution of response cases and their associated sorties (a boator an aircraft allocated to respond to an incident) for all SARoperations conducted in the Great Lakes. The SAR data from

2004 to 2011 is used for this study. Figure 11 (a), (b), and (c)for winter, spring, and summer, respectively, show the densitydistributions of the SAR cases in Lake Michigan, Lake Huron, andLake Erie. As seen in the figures, more SAR cases happen duringsummer. Moreover, more SAR cases occur near the small citiesincluding Muskegon, Ludington, Traverse City, Sarnia, duringsummer, whereas, most SAR cases are found near the big citiesduring winter. This observation is found as the flow map in (d).This implies that the safety or coast guards should be allocatedmore near the small cities in summer along the flow patterns.

7.4 Synthetic syndromic data analysis

Another example is syndromic surveillance, which focuses ondetection of adverse health events. We use a synthetic syndromicsurveillance data generated by Maciejewski et al. [33], consistingof a series of outbreaks over time moving from Indianapolis toNorthern Indiana. Figure 12 illustrates how the outbreaks changeover time as heatmaps in (a)-(c). The outbreaks occurred around

Page 11: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 11

( W:80, T:2 )

𝑡0

𝑡0 + 2 𝑡0 + 1

𝑡0 − 1 𝑡0 − 2

Figure 13. The event distribution moves to the bottom left (t0 − 2 andt0 − 1 ) and then moves back to the top right (t0 + 1 and t0 + 2 ). Ouralgorithm generates the flow map only for the non-overlap area if theoverlap area is placed within the temporal kernel size.

Indianapolis on July 26, 2006 and propagated to near areas andNorthern Indiana areas. We compute the flow map of the data byplacing the center of gravity on (b) with the kernel (W=50, T=3),which indicates that our system computes the flow map for theseven time steps. Flow paths from Indianapolis to North Indianaalong local highways and state border between Indiana and Illinoisare extracted as the disease spreading in Figure 12 (d) and (e),which are not easily imagined only from the heatmaps. Small flowpatterns are also visible toward some small cities in the suburb ofIndianapolis. In addition, there are major flow paths toward FortWayne in the northeast Indiana, which is actually a future (t0 +1)event at the time (t0) in (b). The major flow is formed near FortWayne because the amount of event change is large even when thedensity near Fort Wayne is low. Notice that the flow patterns fromhigh density to high density do not appear since there is no eventchange. In addition, the pattern from high density to low densityappears as diverging flow, whereas, the pattern from low densityto high density are shown as converging flow.

8 DISCUSSION

We have developed an event data flow analysis and visualizationsystem with novel flow map extraction algorithms that can be ap-plied to any spatial statistical data without trajectory information.In this section, we discuss requirements and limitations of ourapproach.

8.1 Statistical dataOur system requires exact geographical information, however,most statistical data contains district level location, such as stateand city, without latitude and longitude information. Since thiskeeps our system from building accurate functional representa-tions, it is not easy to extract flow maps due to the ambiguity ofthe event locations. Also if there is nonuniform event distributionalong the time axis, it is difficult to determine the time aggregationlevel for the data flow analysis. Another requirement for the datain our system is that there should be enough time steps to extractthe flow map. Since the gravity model analyzes the past, current,and future time steps, our system needs at least three differenttime steps to extract the flow map. Due to this, it is not possible toobtain the flow maps for the first and last time steps.

8.2 Flow patternsAlthough our algorithm extracts the flow map from the statisticsdata without any trajectory information, there are a few cases

that our system is not able to handle properly for the flow mapextraction. The special cases are discussed in this section.

The first case is that an event moves toward one direction andthen comes back to the original location as seen in Figure 13. Thedensity distributions are shown on the left from (t0−2) to (t0 +2).As seen in the distributions, the event passes through the samelocation with two opposite directions over time. Since the temporalgravity kernel size is 2, total five time steps are used to computethe flow map. In this case, the flow map over the area that an eventvisits multiple times within the temporal kernel size becomes zerofrom Equation 5. This is shown on the right of Figure 13 and theflow patterns only in the non-overlap area are extracted. In orderto prevent this limitation, we can change the temporal aggregationlevel during the preprocessing for the functional representationso that we avoid the multiple visits within the temporal kernelsize. The other solution is that we change the temporal kernel sizeto avoid the multiple visits with opposite directions. Consideringthe computation performance as seen in Table 1 and Table 2,adjusting the temporal kernel size would have more benefits sincethe computing time for the gravity model is much longer.

Another case happens when different objects are handledunder the same density distribution. For example, there are twocars. One car moves to a location and stops there. The othercar starts to move from the location to another location afterthe first car stops. Since we compute the event distribution forboth cars together, there is no means to differentiate the objectsand our system generates the continuous flow map for the cars.In order to overcome this issue, we must analyze the differentobjects separately and overlay all flow maps together. We planto investigate more efficient approach to handle this case in thefuture.

Figure 14 (a) shows another case. (a)-(I), (a)-(II), and (a)-(III)are the density distributions and our system extracts the flow mapas seen in (a)-(IV). We can see that the event is moving towardthe west of Cleveland over time. However, our system showsalso flow patterns toward the east of Cleveland as presented inthe red rectangle. This happens when the events occur sparselyin the spatial space and our KDE does not cover the entiregeographical space sufficiently. Similar cases are found in Figure 9and Figure 10. Although we can tell that this flow is correctlyestimated for some cases, we still consider that the flow is notwell extracted from the data. This sparsity issue will be studiedfurther in the future in order to remove the erroneous flow patternsat the outer edge of the kernels.

Figure 14 (b) presents unwanted flow patterns. Three eventdata distributions are shown in (b)-(I), (b)-(II), and (b)-(III) and theflow map extracted from the distributions about t0 in (b)-(IV). Asseen in (I), there are four blue boxes whose density values decreaseover time. In this case, diverging flow patterns are extracted fromthe blue box areas. However, the problem is found in the lowdensity area surrounded by the blue boxes. Ideally, no flow patternis supposed to be extracted in the area but as seen in (VI), thereare flow patterns that are produced by the diverging flows fromthe blue boxes. Although these flow patterns are visualized in oursystem, their flow speeds are very slow and the patterns are noisysimilar to saddle points for flows from all directions. Therefore, itis possible to differentiate these flow patterns from other accurateflows. Nonetheless, we consider that these flows are not properlycomputed. Since we do not use any trajectory information, it isnot easy to formulate topology structures and remove these flowpatterns. We will search for an improved algorithm to embed

Page 12: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 12

Cleveland Cleveland Cleveland

(a) (b) (I) (II) (III)

(IV)

(I) (II) (III)

(IV)

𝒕𝟎 −𝟏 𝒕𝟎 +𝟏 𝒕𝟎 𝒕𝟎 −𝟏 𝒕𝟎 +𝟏 𝒕𝟎

(V)

(V)

Figure 14. (a)-(I), (a)-(II), and (a)-(III) present heat maps from t0−1 to t0 +1. (a)-(IV) is the flow map from the heat maps. (a)-(V) presents the zerovector field area, such as the kernel edges. We stop advecting the vector field at the kernel edges since there is no actual field beyond the edges.(b)-(I), (b)-(II), and (b)-(III) are different heat maps. Incorrect flow patterns are extracted in the area in (b)-(V) surrounded by the high density areas(blue boxes). All diverging flows from the blue box areas are merged in the red box area but it is not properly estimated since there is no densitychange in the area.

topology information to improve the flow map extraction in thefuture.

8.3 Visualization techniques

Our system provides three different visualizations for the flowmap including OLIC, arrow glyph, and dense particle tracingsimilar to LIC. OLIC is useful to visualize sparse flow patternsas seen in Figure 8 but this is not appropriate for complicated anddense flow patterns. Arrow glyph visualization allows the userto focus on our flow map extracted by our algorithm but it isdifficult to analyze detail flow maps. Additionally, the quality ofthe visualization for the arrow glyphs depends on the number ofglyphs in the beginning. Dense particle tracing visualization isuseful when we analyze the overall flow map patterns. However,the dense particle tracing visualization conveys much more detailinformation although we want to analyze the abstract flow maps.In addition, currently, we offer two different visualization sys-tems, stand-alone SW and web-based SW, however, the issue ofcoordinates for mapping occurs as seen Figure 12 (d) and (e). Themapping on the web-based visualization SW is implemented usingthe 3D globe algorithm, whereas, that on the stand-alone softwareis implemented by the hard-coded edge point. The differencebetween these coordinate mappings produces the slight differentvisualizations. This will be investigated as future work.

8.4 Expansion to other domains

Our flow map extraction algorithm enables the data flow analysisand produces insight to the movement patterns for the spatiotem-poral statistical data. Currently, there are many devices that gener-ate data with spatiotemporal information.Therefore, it is possibleto extend this algorithm to many different domains, such as, socialand natural science. However, as mentioned earlier, we do notinclude domain characteristics during the the flow map extraction,such as, additional network, transportation information, naturaldirectional phenomena including weather condition. We plan toapply the domain features to improve our algorithm in the future.

8.5 Miscellaneous issueAs summarized in Table 1, the computation for the flow map usingFMap3D is more expensive than one using FMapdi f f . Althoughwe mentioned that the gravity model produces the flow map moreadequately, we recommend that the user starts to analyze theflow map with FMapdi f f since the flow maps are extracted andvisualized fairly fast on GPU. In order to accelerate rhe FMap3Dcomputation, we will study the approximation of the gravity modelas future work.

9 CONCLUSION AND FUTURE WORK

In this paper, we have presented a novel spatiotemporal dataanalysis technique. We extracted flow maps from discrete spa-tiotemporal statistical data by evaluating the continuous functionalrepresentations. We employed a two-dimensional kernel densityestimation to approximate the underlying data distribution andapplied a gravity model that generates flow maps. We evaluatedthe flow map extraction model with two trajectory datasets. Wehave also demonstrated our flow map analysis and its visualizationusing four different types of event-based data. Our results showedthe benefits of employing the flow maps for the trend analysis ofspatiotemporal data. Our technique can be used to understand po-tential movement paths and trends over time within the statisticaldatasets, such as, criminal incident reports, economics, and socialtrends. As future work, we are going to investigate advanced flowmap extraction models based on further statistical analysis of thedata including constraint conditions. We also plan to handle noisydata distributions that cause random flow patterns. Furthermore,we will apply illustrative flow visualization techniques for betterperception of the flow. We will extend our flow analysis techniquesto multivariate spatiotemporal data analysis as well.

REFERENCES

[1] J. E. Anderson. The gravity model. Working Paper 16576, NationalBureau of Economic Research, 2010.

[2] N. Andrienko and G. Adrienko. Spatial generalization and aggregationof massive movement data. IEEE Transactions on visualization andcomputer graphics, 17(2):205–219, 2011.

Page 13: VOL. XX, NO. XX, XX 20XX 1 Data Flow Analysis and Visualization …rmaciejewski.faculty.asu.edu/papers/2017/Yun-Flow.pdf · Data Flow Analysis and Visualization for Spatiotemporal

VOL. XX, NO. XX, XX 20XX 13

[3] J. M. Barrios, W. W. Verstraeten, P. Maes, J.-M. Aerts, J. Farifteh, andP. Coppin. Using the gravity model to estimate the spatial spread ofvector-borne diseases. International Journal of Environmental Researchand Public Health, 9(12):4346–4364, 2012.

[4] J. H. Bergstrand. The Gravity Equation in International Trade: SomeMicroeconomic Foundations and Empirical Evidence. The Review ofEconomics and Statistics, 67(3):474–81, 1985.

[5] K. Buchin, B. Speckmann, and K. Verbeek. Flow map layout via spiraltrees. IEEE Transactions on Visualization and Computer Graphics,17(12):2536 –2544, 2011.

[6] K. Burger, P. Kondratieva, J. Kruger, and R. Westermann. Importance-driven particle techniques for flow visualization. In Proceedings of TheIEEE Pacific Visualization Symposium, pages 71 –78, 2008.

[7] B. Cabral and L. C. Leedom. Imaging vector fields using line integralconvolution. In Proceedings of the Conference on Computer Graphicsand Interactive Techniques, pages 263–270. ACM, 1993.

[8] B. Dent. Cartography: Thematic Map Design. WCB/McGraw-Hill,1999.

[9] R. Eccles, T. Kapler, R. Harper, and W. Wright. Stories in geotime. InProceedings of the IEEE Symposium on Visual Analytics Science andTechnology, pages 19 –26, 2007.

[10] A. Forsberg, J. Chen, and D. Laidlaw. Comparing 3D vector fieldvisualization methods: A user study. IEEE Transactions on Visualizationand Computer Graphics, 15(6):1219 –1226, 2009.

[11] P. Gatalsky, N. Andrienko, and G. Andrienko. Interactive analysis ofevent data using space-time cube. In Proceedings of the InternationalConference on Information Visualization, pages 145–152, 2004.

[12] D. Guo. Flow mapping and multivariate visualization of large spatialinteraction data. IEEE Transactions on Visualization and ComputerGraphics, 15(6):1041 –1048, 2009.

[13] D. Guo, J. Chen, A. MacEachren, and K. Liao. A visualization systemfor space-time and multivariate patterns (vis-stamp). IEEE Transactionson Visualization and Computer Graphics, 12(6):1461 –1474, 2006.

[14] D. Guo and X. Zhu. Origin-destination flow data smoothing andmapping. IEEE Transactions on Visualization and Computer Graphics,20(12):2043–2052, 2014.

[15] T. Hagerstrand. What about people in Regional Science? Papers inRegional Science, 24(1):6–21, Dec. 1970.

[16] M. Indulska and M. E. Orlowska. Gravity based spatial clustering. InProceedings of the 10th ACM International Symposium on Advances inGeographic Information Systems, pages 125–130. ACM, 2002.

[17] W. Javed, S. Ghani, and N. Elmqvist. Gravnav: Using a gravity modelfor multi-scale navigation. In Proceedings of the International WorkingConference on Advanced Visual Interfaces, pages 217–224. ACM, 2012.

[18] M. Jern, T. Astrom, and S. Johansson. Geoanalytics tools applied to largegeospatial datasets. In Proceedings of the International Conference onInformation Visualisation, pages 362 –372, 2008.

[19] D. Karemera, V. I. Oguledo, and B. Davis. A gravity model analysis of in-ternational migration to north america. Applied Economics, 32(13):1745–1755, 2000.

[20] A. Kincses and G. Toth. The application of gravity model in theinvestigation of spatial structure. Acta Polytechnica Hungarica, 11(2):5–19, 2014.

[21] M. Kraak. The space-time cube revisited from a geovisualization per-spective. In Proceedings of the International Cartographic Conference,pages 1988–1996, 2003.

[22] M.-J. Kraak and P. Madzudzo. Space time visualization for epidemiolog-ical research. International Cartographic Conference, page 302, 2007.

[23] J. Kruger, P. Kipfer, P. Kondratieva, and R. Westermann. A particlesystem for interactive visualization of 3d flows. IEEE Transactions onVisualization and Computer Graphics, 11:744–756, 2005.

[24] D. H. Laidlaw, R. M. Kirby, C. D. Jackson, J. S. Davidson, T. S. Miller,M. D. Silva, W. H. Warren, and M. J. Tarr. Comparing 2d vector fieldvisualization methods: A user study. IEEE Transactions on Visualizationand Computer Graphics, 11:59–70, 2005.

[25] R. S. Laramee, G. Erlebacher, C. Garth, T. Schafhitzel, H. Theisel,X. Tricoche, T. Weinkauf, and D. Weiskopf. Applications of texture-based flow visualization. Engineering Applications of ComputationalFluid Mechanics (EACFM), 2(3):264–274, 2008.

[26] R. S. Laramee, H. Hauser, H. Doleisch, B. Vrolijk, F. H. Post, andD. Weiskopf. The state of the art in flow visualization: Dense and texture-based techniques. Computer Graphics Forum, 23:2004, 2004.

[27] S. J. Lavin and R. S. Cerveny. Unit-vector density mapping. TheCartographic Journal, 24(2):131–141, 1987.

[28] J. J. Lewer and H. Van den Berg. A gravity model of immigration.Economics Letters, 99(1):164–167, 2008.

[29] X. Li, A. Coltekin, and M.-J. Kraak. Visual exploration of eye movementdata using the space-time-cube. In Proceedings of the International Con-ference on Geographic information science, pages 295–309. Springer-Verlag, 2010.

[30] X. Li, H. Tian, D. Lai, and Z. Zhang. Validation of the gravity modelin predicting the global spread of influenza. Int J Environ Res PublicHealth, 8(8):3134–43, 2011.

[31] Z. Liu, S. Cai, J. E. Swan II, R. J. Moorhead II, J. P. Martin, and T. J.Jankun-Kelly. A 2D flow visualization user study using explicit flowsynthesis and implicit task design. IEEE Transactions on Visualizationand Computer Graphics, 18(5):783–796, 2012.

[32] W. Luo, A. M. MacEachren, P. Yin, and F. Hardisty. Spatial-socialnetwork visualization for exploratory data analysis. In Proceedings ofthe ACM SIGSPATIAL International Workshop on Location-Based SocialNetworks, pages 3:1–3:4. ACM, 2011.

[33] R. Maciejewski, R. Hafen, S. Rudolph, G. Tebbetts, W. Cleveland,S. Grannis, and D. Ebert. Generating synthetic syndromic-surveillancedata for evaluating visual-analytics techniques. Computer Graphics andApplications, IEEE, 29(3):18 –28, 2009.

[34] R. Maciejewski, S. Rudolph, R. Hafen, A. Abusalah, M. Yakout, M. Ouz-zani, W. Cleveland, S. Grannis, and D. Ebert. A visual analyticsapproach to understanding spatiotemporal hotspots. IEEE Transactionson Visualization and Computer Graphics, 16(2):205 –220, 2010.

[35] A. Malik, R. Maciejewski, B. Maule, and D. Ebert. A visual analyticsprocess for maritime resource allocation and risk assessment. In IEEEConference on Visual Analytics Science and Technology (VAST), pages221 –230, 2011.

[36] O. Mattausch, T. Theußl, H. Hauser, and M. E. Groller. Strategiesfor interactive exploration of 3d flow using evenly-spaced illuminatedstreamlines. In K. Joy, editor, Proceedings of Spring Conference onComputer Graphics, pages 213–222. SCCG, 2003.

[37] T. Nakaya and K. Yano. Visualising crime clusters in a space-time cube:An exploratory data-analysis approach using space-time kernel densityestimation and scan statistics. Transactions in GIS, 14(3):223–239, 2010.

[38] D. J. Peuquet and M.-J. Kraak. Geobrowsing: creative thinking andknowledge discovery using geographic visualization. Information Visu-alization, 1:80–91, 2002.

[39] D. Phan, L. Xiao, R. Yeh, and P. Hanrahan. Flow map layout. In IEEESymposium on Information Visualization, pages 219–224, 2005.

[40] R. Ramos and J. Surinach. A Gravity Model of Migration between ENCand EU. IZA Discussion Papers 7700, Institute for the Study of Labor(IZA), 2013.

[41] A. Salvo. Trade flows in a spatial oligopoly: gravity fits well, but whatdoes it explain? Canadian Journal of Economics/Revue canadienned’economique, 43(1):63–96, 2010.

[42] B. W. Silverman. Density Estimation for Statistics and Data Analysis.Chapman & Hall/CRC, 1986.

[43] T. A. Slocum, R. B. McMaster, F. C. Kessler, and H. H. Hugh. ThematicCartography and Geovisualization. Pearson Prentice Hall, 2009.

[44] W. Tobler. Movement mapping. 2004.[45] W. R. Tobler. Experiments in migration mapping by computer. The

American Cartographer, 14(2):155–163, 1987.[46] C. Tominski, P. Schulze-Wollgast, and H. Schumann. 3d information

visualization for time dependent data on maps. In Proceedings of theInternational Conference on Information Visualisation, pages 175–181,2005.

[47] J. Truscott and N. M. Ferguson. Evaluating the adequacy of gravitymodels as a description of human mobility for epidemic modelling. PLoSComputational Biology, 8(10), 2012.

[48] G. Turk and D. Banks. Image-guided streamline placement. In SIG-GRAPH 96 Conference Proceedings, pages 453–460, 1996.

[49] R. Wegenkittl, E. Groller, and W. Purgathofer. Animating flow fields:rendering of oriented line integral convolution. In Computer Animation’97, pages 15 –21, 1997.

[50] Wikipedia. Boston marathon bombings – Wikipedia, the free encyclope-dia, 2013. [Online accessed 31-March-2015].

[51] L. Xu, H. Dinh, E. Zhang, Z. Lin, and R. Laramee. A distribution-basedapproach to tracking points in velocity vector fields. In Proceedings ofComputer Vision and Pattern Recognition, pages 2663 –2670, 2009.

[52] Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma. Mining interesting locationsand travel sequences from gps trajectories. In Proceedings of the 18thInternational Conference on World Wide Web, pages 791–800. ACM,2009.