Texas A&M University
CVEN 658– Civil Engineering Applications of GIS
Hotspot Analysis of Highway Accident Spatial Pattern Based on
Network Spatial Weights
Instructor: Dr. Francisco Olivera
Author: Yanru Zhang
Zachry Department of Civil Engineering
December 06, 2010
Civil Engineering Applications of GIS Yanru Zhang
ABSTRACT
Traffic accidents are increasingly being recognized as a social and public health challenge due to
the increased mobility of today’s society. Factors that influence traffic accidents include equipment
failure, roadway design, poor roadway maintenance and driver behavior. From empirical
experience, we know that spatial patterns exist in traffic incidents. Some places are more likely to
have accidents than others because of poor roadway design or more aggressive drivers in
that area. Identifying the areas where traffic incidents frequently happen helps
road managers allocate resources to those areas, either to improve the roadway conditions or to
develop strategies to discourage aggressive behavior. In this paper, I apply spatial statistics
techniques in ArcGIS to study the spatial relationships among highway incidents in Houston.
The study involves three major steps: constructing a network dataset, generating a
spatial weights matrix and conducting statistical analyses. The Generate Network Spatial
Weights tool is used to obtain spatial weights between incidents based on the road network
rather than on straight-line distances. The resulting spatial weights matrix is then
supplied to Hot Spot Analysis (Getis-Ord Gi*) to obtain the final results.
INTRODUCTION
Due to the increased mobility of today’s society, road traffic accident analysis and prevention
are increasingly being recognized as important topics. In the United States in 2009, there were 30,797
fatal crashes, in which 33,808 people died. That is approximately 4
deaths each hour. A study released by the World Health Organization shows that an
estimated 50,000,000 people are injured and 1,200,000 people are killed in road crashes each
year worldwide. Accidents are estimated to increase by 65% over the next 20 years unless new
prevention actions are taken. However, highway accidents are distributed over every section of the
road, and inspecting every single location where an accident happens is impractical. Identifying
the areas (or road sections) where traffic accidents frequently happen helps road
managers allocate resources to those areas, either to improve the roadway conditions or to
develop strategies to avoid road accidents or reduce losses. Factors that contribute to road
accident occurrence include traffic volume, roadway design, weather, the configuration of the highway
network and highway maintenance, and these factors all exhibit strong spatial
patterns (Xie and Yan 2008). Investigating the spatial patterns of traffic accidents is therefore a
crucial step in understanding how, where and when traffic accidents happen. To identify the
areas where traffic accidents frequently happen, I use cluster analysis.
A cluster occurs when features within an area are found to have similarly high or low values. Identifying
the locations of accident clusters can help to identify the causes of the accidents. Comparing
these locations with locations where no cluster occurs can also reveal causes that lead to
accidents.
GIS technology, a valuable tool that combines spatial information with other data, has been
widely used in road accident analysis to visualize accident data and analyze hotspots
on highways. Here, an accident hotspot is a cluster of individual accidents. A GIS can hold a large
amount of data that can be easily stored, shared, analyzed and managed (Erdogan et al. 2007).
Many existing studies consider only geometric distance and do not take the road network into
consideration. However, accidents are network based, and it is important to take the road
network into account when measuring the distance between two accident locations. In this study,
distances between accidents are defined based on the configuration of the road network, so
that the spatial relationships among accident data follow the highway network.
To achieve this, a network dataset is created as the basis for the accident
analysis. The Generate Network Spatial Weights tool is then used to calculate the weights matrix
for the accident data. Finally, Hot Spot Analysis (Getis-Ord Gi*) is used to obtain the spatial
relationships among the traffic accident data.
LITERATURE REVIEW
GIS-based accident information systems provide a platform for spatial analysis of
accident data, which is almost impossible with a non-spatial database. Since 1990, GIS
technologies and their applications to traffic safety and accident analysis have gained popularity among
agencies and researchers. Erdogan et al. (2007) summarized existing methods used
in traffic accident analysis, which include intersection or segment analysis, proximity
analysis, spatial query analysis, cluster analysis and density analysis. They also applied two
statistical analysis methods, kernel density analysis and repeatability analysis, to
determine accident hot spots. Their results showed that
crossroads and junction points are places where accidents frequently happen. Erdogan (2009) studied
inter-province differences in traffic accidents and mortality. GIS was used to extract
features that can influence accidents, such as day, temperature, humidity, weather conditions and
month of occurrence. The correlation-based feature selection (CFS) method was applied to select the important features, and
a support vector machine (SVM) and an artificial neural network (ANN) were used to classify the traffic accident dataset. The
results show that the proposed model predicts traffic accidents better than
SVM or ANN models alone. Anderson (2009) used Geographical Information Systems
(GIS) and kernel density estimation to study the spatial relationships among injury-related
accident data, and then used a K-means clustering algorithm to identify accident hot spots. Based
on collision and accident attribute data for London, UK, five groups and 15 clusters were created.
There is no universally accepted definition of an accident hotspot. Hauer (1997) describes two
methods that are widely used to rank accident locations: one is based on accident rate, the
other on accident frequency. Road accident hotspot analysis usually focuses on road
segments or junctions; area-based road accident analysis is seldom used in existing studies. A
comprehensive understanding of the factors that contribute to accidents, for example the severity
of the accident and the surrounding environment, is important in hotspot analysis.
Because the GIS platform can link a large number of disparate databases, it allows
both historical and statistical analysis of traffic accidents. The most commonly used function in
traffic accident analysis is the spatial analysis extension, which provides various ways to conduct
accident analysis.
METHODOLOGY
The purpose of studying the distribution of traffic accidents is to find clusters of accidents
that share the same feature, such as the clearance time, the number of people injured or the number
of deaths. In this study, the clearance time is used as the attribute. An accident with a
long clearance time is treated as a serious traffic accident, and one with a short clearance time
as a minor traffic accident. The basic idea of network-based hotspot
analysis of accident data is to first calculate the network spatial weights between every pair of
accidents and then use the Hot Spot Analysis (Getis-Ord Gi*) tool in ArcGIS to find the
locations where accidents with long clearance times happen. After the data are prepared, three steps
are involved:
Data Preparation
Data used for the network-based accident hotspot analysis include accident data and road
network data. The accident data include the longitude and latitude of the accident locations,
roadway name, cross street name and clearance time. The road network data should include the
line features of the road network, the length of each road section, the longitude and latitude of the
road and the turn features.
Network Dataset
Before generating network spatial weights, a network dataset is needed to represent the distances
among accident locations. To create a network dataset, we first need to enable the
Network Analyst extension in ArcCatalog. In ArcCatalog, go to the directory where the road
network shapefile is located and choose New Network Dataset to start defining the attributes of
the network dataset. In the following steps, we need to define the name of the network dataset,
the network connectivity, elevation field settings, turn information and driving directions. After all
the settings are defined, click Yes to build the network, then close ArcCatalog. The created
network dataset is a representation of the transportation network and offers functions that can
model impedances, restrictions and hierarchy for the network. A network dataset consists of two
shapefiles, one with line features that represent the roadways and one with the junctions where
two roadways intersect, plus one shapefile-based network dataset.
Generate Network Spatial Weights
Unlike traditional statistical methods, spatial statistics takes space and spatial relationships
into consideration. Spatial weights are a conceptualization of the spatial relationship
between any two points and are very important in hotspot analysis. Different definitions of
the weights lead to different results. Euclidean distance, contiguity, and fixed or inverse
distance are the most commonly used weighting schemes. Because the spatial relationships among
traffic accidents are closely tied to the road network, defining spatial relationships in terms of
the real road network is more accurate. In this study, weights between accidents are
calculated based on the road network. The recently developed Generate Network Spatial Weights
tool in ArcGIS provides this function. Figure 1 illustrates the different conceptualizations of
spatial relationships.
Fig. 1. Most commonly used spatial weights (panels: Inverse Distance, Distance Band, Zone of Indifference, Network Spatial Weights)
With inverse distance, all features are related to one another, and the relationship
becomes weaker as the distance between features grows. A fixed distance band
lets one specify a distance within which features are closely related and beyond which
they are unrelated. The weight within that distance is therefore a fixed number and
drops immediately to zero beyond it. The zone of indifference combines the
inverse distance and distance band methods: the weight within a given distance is a fixed number,
and beyond that distance it gradually goes to zero. Network spatial weights differ
from the previous three methods in that the weights between objects are defined on a
network dataset. Since traffic accidents are network based, it is more appropriate to define the
distance between accident points by using network spatial weights.
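The three distance-based schemes described above can be sketched as simple weight functions of the distance d between two features. This is only an illustration: the function names, the `eps` guard for coincident points and the `threshold / d` decay used for the zone of indifference are assumptions of this sketch, not the exact formulas used inside ArcGIS.

```python
def inverse_distance(d, eps=1e-9):
    """Weight decays as 1/d; eps (an assumption) avoids division by zero."""
    return 1.0 / max(d, eps)


def fixed_distance_band(d, threshold):
    """Constant weight inside the band, zero beyond it."""
    return 1.0 if d <= threshold else 0.0


def zone_of_indifference(d, threshold):
    """Constant weight inside the band, then a gradual decay toward zero.

    The decay threshold / d is an assumed form chosen so that the weight
    is continuous at the threshold.
    """
    return 1.0 if d <= threshold else threshold / d


# Compare the three schemes at a few distances, with a band of 3 units.
for d in (1.0, 3.0, 6.0):
    print(d, inverse_distance(d), fixed_distance_band(d, 3.0),
          zone_of_indifference(d, 3.0))
```

For network spatial weights, the same functions could be applied with d measured along the road network rather than in a straight line.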
Hot Spot Analysis (Getis-Ord Gi*)
After generating the network spatial weights, the next step is to conduct the traffic accident hotspot
analysis. The Hot Spot Analysis tool in ArcGIS applies the Getis-Ord Gi* statistic and
calculates, at each location, a z-value that indicates whether features with high or low values are
clustered together. In this study, accident duration is used as the criterion
to identify where accidents with longer duration are clustered together and where accidents with
shorter duration are clustered together. The Getis-Ord Gi* statistic is defined as follows:
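$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j}\, x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left( \sum_{j=1}^{n} w_{i,j} \right)^2}{n-1}}}$$

$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}$$

Where
$x_j$: The attribute value for feature $j$.
$n$: Sample size.
$w_{i,j}$: Spatial weights between feature $i$ and $j$.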
The outcome of the Gi* statistic is a z-value for each feature. A high positive z-value indicates a cluster of
accidents that last for a longer period, while a low negative z-value indicates that a large number of accidents
with shorter duration are located around that feature.
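As a minimal sketch of how the Gi* z-value can be computed from the standard Getis-Ord Gi* definition, the function below takes a list of attribute values and a full weights matrix. The function name, the toy clearance times and the binary neighbor weights are made up for illustration; this is not the ArcGIS implementation.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-value of every feature.

    values  : list of attribute values x_j (e.g. clearance times)
    weights : n x n list of lists, weights[i][j] = w_ij (w_ii included)
    """
    n = len(values)
    mean = sum(values) / n
    variance = max(sum(x * x for x in values) / n - mean * mean, 0.0)
    s = math.sqrt(variance)
    scores = []
    for i in range(n):
        row = weights[i]
        sum_w = sum(row)
        sum_w2 = sum(w * w for w in row)
        numerator = sum(w * x for w, x in zip(row, values)) - mean * sum_w
        spread = max((n * sum_w2 - sum_w ** 2) / (n - 1), 0.0)
        denominator = s * math.sqrt(spread)
        # With no variation in the data or the weights the statistic is
        # undefined; this sketch reports 0.0 in that case.
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores


# Hypothetical example: five accidents along one road; each accident is a
# neighbor of itself and of the adjacent accidents (binary weights).
clearance = [10, 12, 11, 60, 65]
w = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
z = getis_ord_gi_star(clearance, w)
print([round(v, 2) for v in z])
```

In this toy data the last accidents, which have long clearance times and similar neighbors, receive positive z-values, while the first ones receive negative z-values.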
The hot spot analysis begins with the null hypothesis that no spatial pattern exists among
the studied features. In this study, the null hypothesis is that no spatial correlation exists among
traffic accidents. If the null hypothesis is true, the z-scores should follow the standard normal
distribution. The z-score is used as the criterion to decide whether or not the null hypothesis
should be rejected, while the p-value gives the probability of observing such a z-score if the null
hypothesis were true, that is, the risk of rejecting it wrongly.
Fig. 2. Normal distribution, the p-values and z-scores
In the tails of the normal distribution, z-values are either very high or very low and the p-values
are small. Such values are unlikely to occur if the null hypothesis is true,
which suggests that a spatial pattern exists. The outcome of the hotspot analysis is a z-score
and a p-value for each accident. Thus, if most accidents in an area have a high z-score and a
low p-value, it is very likely that this area is accident prone and that action is needed
to prevent accidents there or mitigate their effects.
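The relationship between a z-score and its p-value under the standard normal distribution can be sketched as follows. The two-sided test and the 0.05 significance level are assumptions made for illustration; the paper does not state which level was used.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def classify(z, alpha=0.05):
    """Label a feature from its z-score; alpha is an assumed threshold."""
    if two_sided_p_value(z) >= alpha:
        return "not significant"
    return "hot spot" if z > 0 else "cold spot"


for z in (-2.5, 0.5, 1.96, 2.5):
    print(z, round(two_sided_p_value(z), 4), classify(z))
```

A z-score near zero gives a p-value near 1, so the null hypothesis is kept; z-scores far out in either tail give small p-values.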
APPLICATION
I chose Houston highway accident data for the accident hotspot analysis. The Getis-Ord Gi*
statistic is used to obtain the p-values and z-scores. Network spatial weights and Euclidean distance
are used as two different ways to measure the spatial distance between traffic accidents.
Before conducting the hotspot analysis, one first needs to construct the network dataset, which provides
the basic structure for calculating the network spatial weights, and then use the Generate Network Spatial
Weights tool to obtain the weights. The last step is the Getis-Ord Gi* analysis of accident hotspots.
Data Description
The data used in this study are Houston highway accident data and a Houston highway network
shapefile. Accident data can be obtained from police reports and should include basic accident
attributes, for example geographic coordinates, accident duration and the corresponding street
information. Figure 3 shows the basic information in the accident data, which includes the latitude and
longitude of the accident location, the incident ID, the incident duration and so on. The accident data
are in an Excel file, so we first need to add the Excel data through the Add Data dialog box. To
display the accident locations on the map, one uses the Make XY Event Layer tool to
create a point feature layer, which can then be saved as a shapefile.
Fig. 3. The basic information for accident data
The road network data should contain the basic information needed to create a network dataset. The Houston
highway network data were obtained from the Houston-Galveston Area Council website and
contain the basic information required to create the network dataset.
Generate Network Dataset
The Houston highway network I obtained is a simple line feature file that contains one network
impedance value, distance. To create a network dataset, one needs to start ArcCatalog, enable
the Network Analyst extension and then create the network in ArcCatalog by choosing the
New Network Dataset option, as shown in Figure 4. I named the new network dataset
hgac_majthrfare_ND, used global turns and chose the length of the road as the cost. The
summary of the newly created network is shown in Figure 5. If everything is correct, choose
Finish to generate the network dataset.
Fig. 4. New Network Dataset function
Fig. 5. The summary of the new network dataset
After the network dataset is created successfully, three files exist: two shapefiles
and one shapefile-based network dataset. Figure 6 shows the three files that make up the network dataset.
The hgac_majthrfare_ND file contains the basic network dataset information, and the
network analysis functions operate on this shapefile-based network dataset. The
hgac_majthrfare_ND_Junctions shapefile represents the intersections of the road
network.
Fig. 6. Shapefiles of network dataset
After creating the network dataset in ArcCatalog, one can open it in
ArcMap. The distance between two points is then calculated along the network instead of
as a straight-line distance. Figure 7 shows the travel distance between point 1 and point 2, which is longer
than the straight-line distance. If point 1 and point 2 are two accident locations, it is more reasonable
to use this distance to represent their spatial relationship, since road accidents are closely
related to the road network.
Fig. 7. Network distance between two points
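The difference between network distance and straight-line distance can be illustrated with a small shortest-path sketch. The three-node network, its coordinates and the function name are hypothetical; Dijkstra's algorithm is used here as a generic way to measure distance along a network and is not claimed to be what ArcGIS runs internally.

```python
import heapq
import math


def network_distance(graph, source, target):
    """Shortest-path length; graph maps node -> {neighbor: edge_length}."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, length in graph[node].items():
            candidate = dist + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf  # target not reachable


# Hypothetical toy network: an L-shaped route from A to C through B.
coords = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (3.0, 4.0)}
graph = {"A": {"B": 3.0}, "B": {"A": 3.0, "C": 4.0}, "C": {"B": 4.0}}

straight = math.dist(coords["A"], coords["C"])   # 5.0
along_road = network_distance(graph, "A", "C")   # 7.0
print(straight, along_road)
```

As in Figure 7, the distance along the road (7.0) is longer than the straight-line distance (5.0).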
Generate Network Spatial Weights
After generating the road network dataset, we can calculate the network spatial weights. To
generate network spatial weights, a point feature class is needed to represent both feature origins
and feature destinations. In our case, the accident locations are used as both the origins and
the destinations. The Generate Network Spatial Weights tool first locates the accidents on
the highway network and then uses the travel distance to calculate the weight between each accident and
every other accident location. Figure 8 shows the process of creating the network spatial
weights between accidents. The accident data and Houston highway network data
were first displayed on the map, and then the Generate Network Spatial Weights tool was opened. The
input feature class is the accident shapefile, the input network is the Houston highway network
and the impedance attribute is distance in miles in this study.
Fig. 8. The generate network spatial weights function
The output of the Generate Network Spatial Weights tool is a spatial weights matrix file, which
contains the spatial relationships among all objects. Figure 9 shows the spatial
weights matrix file in table format: FieldID is the “from” feature ID, NID is the “to” feature ID and WEIGHT
represents the spatial relationship between the “from” feature and the “to” feature. This file is
used to represent the spatial relationships among accident points in the Hot Spot Analysis (Getis-Ord
Gi*).
Fig. 9. Network based spatial weights matrix
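A table of (from ID, to ID, weight) rows like the one in Figure 9 can be turned into a lookup structure as sketched below. The row values and the function names are made up for illustration; the sketch works on plain tuples and does not read the binary .swm file itself.

```python
from collections import defaultdict


def build_weight_lookup(rows):
    """Turn (from_id, to_id, weight) rows into lookup[from_id][to_id]."""
    lookup = defaultdict(dict)
    for from_id, to_id, weight in rows:
        lookup[from_id][to_id] = weight
    return lookup


def weight_between(lookup, from_id, to_id):
    """Weight of a pair; pairs absent from the table are treated as 0.0."""
    return lookup.get(from_id, {}).get(to_id, 0.0)


# Hypothetical rows in the order FieldID, NID, WEIGHT.
rows = [(1, 2, 0.5), (1, 3, 0.25), (2, 1, 0.5)]
lookup = build_weight_lookup(rows)
print(weight_between(lookup, 1, 3), weight_between(lookup, 3, 1))
```

Storing only the listed pairs keeps the structure sparse, which matters when most accident pairs are too far apart to be related.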
Hot Spot Analysis (Getis-Ord Gi*)
Several tools in ArcGIS can conduct accident hotspot analysis. One of them is the Hot Spot
Analysis (Getis-Ord Gi*) tool. This tool calculates the Getis-Ord Gi* statistic for each
accident to tell us where accidents with long clearance times are clustered together and where
accidents with short clearance times are clustered together. In this study, I compute
the Getis-Ord Gi* statistic of the accident data with two different spatial weighting methods:
one is the most commonly used Euclidean distance, the other is network spatial weights.
Figure 10 shows the Hot Spot Analysis (Getis-Ord Gi*) tool in ArcGIS. The input feature
class is the accident data; different choices for the conceptualization of spatial relationships lead to
different results. The available options are inverse distance, inverse distance squared, fixed distance
band, zone of indifference and get spatial weights from file. A distance band or threshold distance can also be specified.
Fig. 10. Hot spot analysis(Getis-Ord Gi*) function
I first chose inverse distance as the conceptualization of spatial relationships and
Euclidean distance as the distance method, so that the relationships among accidents are
calculated based on inverse Euclidean distance; that is, nearby accidents
have a closer relationship than those located far away. The results of this hotspot analysis of
the accident data are shown in Figure 11. The blue points indicate where accidents with shorter
clearance times are clustered together, while the red points indicate where accidents with
longer clearance times are clustered together. The Euclidean distance based hotspot analysis can
identify the areas where accidents with long clearance times cluster and where accidents with
short clearance times cluster.
Fig. 11. Euclidian distance based hot spot analysis
I then conducted the network-based hotspot analysis by choosing the get spatial weights from file option
and using the network spatial weights (.swm) file created earlier to define the spatial relationships among the
accidents. The distance between any two accidents is thus calculated along the network. Figure 12
shows the results of the hotspot analysis: the red points indicate the locations where accidents with
longer clearance times are clustered together, and the blue points indicate the locations where
accidents with shorter clearance times are clustered together. The network-based hotspot analysis
is able to identify the road links where accidents frequently happen.
Fig. 12. Network based hot spot analysis
COMPARISON WITH OTHER METHODS
Two other methods that can be used to study the spatial distribution of accident data are the central feature
method and the point density method. The Central Feature tool identifies the most centrally located
feature in the accident data. Figure 13 shows how the Central Feature tool works. The input
feature class is the highway accidents; I chose Euclidean distance to calculate the distance between
each pair of features, and the roadway is used to group features. The output is a point
feature located at the center of the studied objects. In this study, it is the center of the
accidents that happen on the same road section. Figure 14 shows the accident center of each road section.
If one wants to find the best location from which to respond to potential future accidents, the Central
Feature tool can be used.
Fig. 13. Central feature function
Fig. 14. Accident central of each link
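The idea of a central feature grouped by roadway can be sketched as picking, within each group, the accident with the smallest total Euclidean distance to all other accidents in that group. The road names, coordinates and function names below are hypothetical, and the sketch makes no claim about how the ArcGIS tool handles ties or weights.

```python
import math


def central_feature(points):
    """Point with the smallest summed Euclidean distance to all others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def central_feature_by_group(records):
    """records: iterable of (group_name, (x, y)); returns {group: center}."""
    groups = {}
    for group, point in records:
        groups.setdefault(group, []).append(point)
    return {group: central_feature(pts) for group, pts in groups.items()}


# Hypothetical accidents grouped by roadway name.
accidents = [
    ("I-10", (0.0, 0.0)),
    ("I-10", (1.0, 0.0)),
    ("I-10", (5.0, 0.0)),
    ("I-45", (2.0, 2.0)),
]
centers = central_feature_by_group(accidents)
print(centers)
```

The result is always one of the input accidents, not an averaged coordinate, which matches the description of the output as the most centrally located feature.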
The point density method shows where accidents are concentrated by displaying the accident
attribute on the map. This analysis can be carried out with the Point Density tool in
ArcMap, as shown in Figure 15. The input point feature class is the accident data, the population field is
LnDuration and the output cell size is 500. Figure 16 shows the accident density map. This map
offers a general view of where accidents are densely located. In the Houston accident analysis, however,
the density map offers very little information: because the highway network is
densest at its center, the accident data are also densely located there.
Fig. 15. Point density feature function
Fig. 16. Accidents density map
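A point density surface weighted by a population field can be sketched as follows: for each grid cell, the values of the points within a search radius of the cell center are summed and divided by the area of the search circle. The circular neighborhood, the radius of 500, the 3 x 3 grid and the sample points are assumptions for illustration; only the cell size of 500 and the use of a population field come from the text.

```python
import math


def point_density(points, cell_size, radius, n_cols, n_rows, origin=(0.0, 0.0)):
    """points: list of (x, y, value). Returns grid[row][col] of densities."""
    area = math.pi * radius ** 2
    grid = []
    for row in range(n_rows):
        cy = origin[1] + (row + 0.5) * cell_size  # cell center, y
        line = []
        for col in range(n_cols):
            cx = origin[0] + (col + 0.5) * cell_size  # cell center, x
            total = sum(v for x, y, v in points
                        if math.hypot(x - cx, y - cy) <= radius)
            line.append(total / area)
        grid.append(line)
    return grid


# Hypothetical accidents: (x, y, LnDuration-like value).
sample = [(250.0, 250.0, 2.0), (300.0, 250.0, 3.0), (1250.0, 1250.0, 1.0)]
density = point_density(sample, cell_size=500, radius=500, n_cols=3, n_rows=3)
for line in density:
    print(["%.2e" % v for v in line])
```

Because the surface only reflects how many weighted points fall near each cell, it tends to highlight wherever the road network itself is dense.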
CONCLUSIONS
Hotspot analysis performs better than the central feature and point density methods in identifying
accident-prone areas. The central feature method can only point out the center of the studied
accidents and cannot point out where accidents frequently happen. The point density
method can point out the areas where accidents frequently happen, but it only displays a
density map and offers only a general view of where accidents are more likely to happen. The
hotspot analysis identifies the locations where accidents frequently happen by using a statistical method,
which makes it more reliable.
Network-based hotspot analysis identifies the road sections where accidents happen, while
Euclidean distance based hotspot analysis can only point out the areas where accidents frequently
happen. Because accidents are closely related to the road network, it is more reasonable to
calculate the spatial pattern of traffic accidents based on the network. The results of this project
show that network-based hotspot analysis is able to point out the links where accidents
happen.
FUTURE RESEARCH
The network dataset should be refined to reflect real highway network conditions. Because of the lack
of data, the Houston highway network dataset was simplified. The cost on the highway
network is based only on the length of each link, and I did not take other impedance factors into
consideration. In future research, to make the results more accurate, the network
dataset should be refined if the relevant information is available.
In this study, I focused only on identifying the locations where accidents frequently happen. The
next step is to study the factors that may influence traffic accidents. One way to identify
these factors is to study the similarities among accident-prone areas, so that transportation
agencies can take proper action to prevent accidents by controlling these factors.
REFERENCES
Xie, Z., and Yan, J. (2008). “Kernel Density Estimation of traffic accidents in a network space.”
Computers, Environment and Urban Systems, 32(5), 396-406.
Erdogan, S., Yilmaz, I., Baybura, T., and Gullu, M. (2007). “Geographical information systems
aided traffic accident analysis system case study: City of Afyonkarahisar.” Accident Analysis
and Prevention, 40(1), 174-181.
Erdogan, S. (2009). “Explorative spatial analysis of traffic accident statistics and road mortality
among the provinces of Turkey.” Journal of Safety Research, 40(5), 341-351.
Anderson, T. K. (2009). “Kernel density estimation and K-means clustering to profile road
accident hotspots.” Accident Analysis and Prevention, 41(3), 359-364.
Hauer, E. (1997). Observational before-after studies in road safety. Pergamon, Oxford.