
Analyzing Traffic Density in Images with Low Temporal and Spatial Resolution

Michael Claveria Dept. of Information and Computer Sciences

University of Hawai‘i at Mānoa 1680 East-West Road, Honolulu, HI 96822, USA

[email protected]

Kyungim Baek* Dept. of Information and Computer Sciences

University of Hawai‘i at Mānoa 1680 East-West Road, Honolulu, HI 96822, USA

[email protected]

ABSTRACT

The increasing proliferation of traffic monitoring technology has brought about sophisticated techniques for traffic monitoring, such as motion tracking using active or optical sensors. Image processing techniques to identify vehicles and track velocity are possible using real-time video feedback from traffic cameras along major roads and highways. However, many cities have limitations on camera and equipment quality that obstruct traffic monitoring processes. In Honolulu, the traffic images posted on the traffic monitoring website have a 3-minute delay between frames, which makes it impossible to perform vehicle tracking based on those images. Variations in camera angles and low spatial resolution also make the task of monitoring traffic more difficult. In this paper, two simple traffic density estimators with two different background models are implemented and compared. Each estimator first separates traffic foreground from road background using the moving average or codebook method. A modified Hough transformation identifies the potential road area, and the traffic density is then quantified as the percentage of traffic contained within the road area of an image. These techniques deal with the limitations of traffic images with low spatial resolution and low frame rate.

Categories and Subject Descriptors

I.4.8 [Image Processing and Computer Vision]: Scene Analysis

General Terms

Algorithms, Performance, Experimentation

Keywords

Traffic density analysis, Background, Foreground, Moving average, Codebook, Classification

1. INTRODUCTION
Traffic monitoring devices have become more commonplace in the lives of everyday motorists as techniques for detecting traffic have become more sophisticated over time. According to a recent urban mobility report, traffic congestion has been increasing over the years and costs the nation $121 billion a year in wasted travel time and fuel [1]. Real-time traffic analysis can provide users with information on real-time travel options to avoid congestion based on current road transit and parking conditions. It can also aid in the future development of transportation systems by providing data on traffic patterns.

One common approach to traffic monitoring uses active sensors, which are typically radar, laser, or acoustic based (e.g. [2, 3, 4]). These sensors are considered active because they detect objects by measuring the travel time of a signal emitted by the sensors and reflected by the objects [5]. The main advantage of active sensors is their ability to measure quantities like distance without requiring powerful computing resources and sophisticated processes.

Optical sensors (cameras), on the other hand, are referred to as passive sensors because they obtain data in a nonintrusive way. The advantage of passive sensors is their lower cost of implementation and maintenance. Passive sensors can also provide visual information that, when processed, can be used for tasks such as identifying pedestrians and other objects, which active sensors cannot do. Their viewing angle can also be adjusted far more easily than the coverage area of a radar sensor.

In the past 30 years there have been tremendous strides in using optical sensors to quantify traffic patterns on major roads and highways. The vision-based traffic monitoring process has improved with the growth of the computer vision field, the proliferation of feasible technology, and the exponential increase in processor speeds.

In one of the classic papers on classifying traffic with computer vision, Ridder et al. [6] modeled each pixel in a frame using a Kalman filter to predict the traffic density. Koller et al. [7] then used this model to create an automatic traffic monitoring application. The model was robust to lighting changes in the scene; however, it recovered slowly and did not handle bimodal backgrounds well. Wren et al. [8] used a multi-class statistical model, Pfinder, for tracking objects and a single Gaussian model for each pixel in the background. This produced good results, but its application was limited to indoor scenes and was not tested outdoors. Friedman and Russell [9] used a pixel-wise EM framework for detection of vehicles. They classified pixel values into three separate distributions corresponding to road color, shadow color, and vehicle color. It is not clear how this approach handles pixels that present multiple background colors, such as those resulting from repetitive motions or reflectance. Stauffer and Grimson [10] used a mixture of Gaussians for each pixel to deal with lighting changes and repetitive motions of objects. Their system successfully tracked people and cars in an outdoor environment and adjusted to changes in the background over time, but it worked with video of high temporal resolution. Li et al. [11] proposed a real-time virtual loop detector (mimicking the idea of a physical loop detector), using a boosted support vector machine classifier to probabilistically determine the traffic density state. They achieved an average accuracy of around 95% under different daytime illumination and weather conditions. However, their frame rate was also high enough to perform tracking across sequences.

* Author to whom correspondence should be addressed.



Official Hawaii state traffic cameras located at over 90 intersections on the island of Oahu report images of the traffic scene to a Honolulu County traffic website every 3 minutes [12]. Such extremely low refresh rates provide traffic snapshots of individual locations, but make it difficult to track traffic over time. It is also difficult for an individual to process multiple traffic images from around the entire state all at once. A solution is to automate the process and let a computer analyze traffic images and report the traffic density conditions.

Two popular methods of measuring traffic density are counting vehicles by means of motion sensor devices along the road and using GPS systems to track the cell phone locations of motorists on the road. Automated computer vision processes, especially on the island of Oahu, provide another opportunity for vehicle analysis due to the proliferation of cameras along the non-freeway roads of the island. Image processing is also much more cost effective than sensors and motion detectors since it does not require additional hardware setup and maintenance beyond the original cameras.

While many of the previously proposed vision-based traffic analysis methods were successful in tracking and identifying objects within images, these methods are not fully applicable to analyzing traffic density in images taken from traffic cameras on Oahu. The publicly available traffic images from the Honolulu traffic cameras have low spatial resolution and large interval times between successive images from one location. With a 3-minute interval, there is little chance of a car in one image being present in the next, so methods that rely on motion analysis across images are not applicable. Some methods that were designed to work indoors do not capture the changes in lighting and shadows that occur with outdoor traffic conditions. Others that used algorithms to identify individual objects only work for images with high enough spatial resolution; low spatial resolution makes identifying individual cars difficult, since a group of cars looks like one large clump in a highly pixelated image. Finally, a few methods use complex algorithms that might identify traffic down to the single pixel, but could not otherwise be used practically due to long processing times. Since traffic is classified qualitatively as heavy, light, etc., a reasonable traffic estimate is sufficient for classification.

The model presented in this paper is part of a process to create and maintain a traffic monitoring system that accurately and efficiently displays qualitative traffic density conditions in real time for locations with an installed traffic camera. An application for this work is to extend automated traffic monitoring coverage to the arterial roads under the surveillance of traffic cameras. Used together with other popular applications that already measure traffic congestion along freeways and major roads, it would help to complete traffic monitoring around the entire island. This information could be used in several future applications, such as developing a real-time automated best route finder or monitoring traffic conditions over a long period of time.

Because the images are gathered in real time, this preliminary model takes a streamlined approach with computation times under one second. For simplicity, this work focused on images from four cameras placed at the intersections of University Ave. and Dole St., King St. and Bishop St., King St. and Punchbowl St., and Beretania St. and Punchbowl St. (Figure 1). A sequence of 30 training images from each site is gathered and used to build background models, which are subsequently used to estimate the traffic density of a set of 20 test images from those sites.

Figure 1. Sample traffic images of the four different intersections (University Ave. and Dole St., King and Bishop streets, King and Punchbowl streets, and Beretania and Punchbowl streets).

2. TRAFFIC DENSITY ESTIMATION

2.1 Background Modeling
Detection of foreground traffic requires creating a background model using training images. The two approaches used to model the background are the moving average and codebook methods.

Moving Average

Moving average models the background by creating an ideal "no traffic" image to use as a base image for traffic comparisons. Images are averaged over time, with higher weight given to more recent images. With the newest data point given weight n, and each successively older data point decreasing in weight by 1, the weighted moving average over the n considered points is

$$MA_t = \frac{n\,p_t + (n-1)\,p_{t-1} + \cdots + 1\cdot p_{t-(n-1)}}{n(n+1)/2}$$

where weight n is given to the most recent image $p_t$. When incorporating a new image after the first n images, the oldest image is replaced following the equations:

$$Total_{t+1} = Total_t + p_{t+1} - p_{t-(n-1)}$$
$$Numerator_{t+1} = Numerator_t + n\,p_{t+1} - Total_t$$
$$MA_{t+1} = \frac{Numerator_{t+1}}{n(n+1)/2}$$

where $Total_t = p_t + \cdots + p_{t-(n-1)}$.

Due to the frequency of cars passing, it is difficult to find an image without vehicles in it that could serve as an ideal background image. One way of getting around this is to construct an ideal background image by averaging out vehicles over time. Calculating a weighted moving average of a set of images effectively creates a "no traffic" image (Figure 2). This method finds the mean pixel intensity over a series of training images. The idea is that, for a given pixel, most of the images will show the ideal background while only a few will show a car, so the averaged image should resemble the background more than the vehicles. The method gives weighted priority to more recent images to better reflect current conditions. After each image is evaluated, it is added to the image set used for background modeling to enhance the current sample. Thus the system will learn and should become more accurate as it gathers more images over time, and it can adapt to changes in the background.
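To make the update concrete, below is a minimal C++/OpenCV sketch of the incremental weighted moving average given by the equations above. The class structure, warm-up handling, and use of CV_32FC3 accumulators are illustrative assumptions, not the exact implementation used in this work; all frames are assumed to share the same size and type.

```cpp
#include <deque>
#include <opencv2/core.hpp>

// Incremental weighted moving average background model (sketch).
class WeightedMovingAverage {
public:
    explicit WeightedMovingAverage(int n) : n_(n) {}

    // Incorporate a new frame p_{t+1}.
    void update(const cv::Mat& frame) {
        cv::Mat p;
        frame.convertTo(p, CV_32FC3);
        if (window_.empty()) {
            total_ = cv::Mat::zeros(p.size(), p.type());
            numerator_ = cv::Mat::zeros(p.size(), p.type());
        }
        if ((int)window_.size() < n_) {
            // Warm-up: the new frame takes weight m+1; older weights are unchanged.
            numerator_ += double(window_.size() + 1) * p;
            total_ += p;
        } else {
            // Full window: Numerator_{t+1} = Numerator_t + n*p_{t+1} - Total_t,
            // then Total_{t+1} = Total_t + p_{t+1} - p_{t-(n-1)}.
            numerator_ += double(n_) * p - total_;
            total_ += p - window_.front();
            window_.pop_front();
        }
        window_.push_back(p);
    }

    // MA_t = Numerator_t / (m(m+1)/2) for the m frames currently held.
    cv::Mat background() const {
        int m = (int)window_.size();
        if (m == 0) return cv::Mat();
        cv::Mat bg;
        cv::Mat(numerator_ / (m * (m + 1) / 2.0)).convertTo(bg, CV_8UC3);
        return bg;
    }

private:
    int n_;                       // maximum window size
    std::deque<cv::Mat> window_;  // last n frames, oldest first
    cv::Mat total_, numerator_;   // running sums from the update equations
};
```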

Figure 2. The moving average images for each of the four intersections, generated from 30 training images from each site shown in Figure 1.


Codebook

A second method of constructing a background model uses a structure called a codebook to differentiate between foreground and background [13, 14]. Codebook looks at the frequencies of each pixel's values over a sequence of images. If enough values extend beyond a single range, the codebook accepts multiple discrete ranges of values (Figure 3).

Figure 3. A simple illustration of codebook formation, reproduced based on the figure in [13]. The original pixel values grow the box bounds over time, and larger values that occur later in the graph create two discrete box ranges. If this is the finalized codebook background model, any values that fit into the range of those two boxes at the end will be considered part of the background.

In this work, we model the codebook using simple boxes that cover values commonly seen over time. Each box is defined by min and max thresholds for each of the three color axes. A box grows (i.e., its bounding thresholds expand) if a newer background sample falls within a learning threshold above the max or below the min threshold. Any background sample that falls outside every box starts a new box. The codebook method can handle pixels whose values change dramatically but still keep to a few discrete ranges. It is more robust to changes in the background than moving average and can adapt to shifting shadows and lighting given enough training images, but it is more computationally intensive than the moving average method.
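For illustration, a minimal per-pixel version of this box-style codebook might look like the following C++ sketch. The learning and matching tolerances, and the absence of box merging or pruning, are simplifying assumptions, not details taken from the paper's implementation.

```cpp
#include <cstdint>
#include <vector>

// One axis-aligned box in color space: per-channel min/max thresholds.
struct CodeBox {
    int minv[3], maxv[3];
};

// Codebook for a single pixel; a full model keeps one of these per pixel.
class PixelCodebook {
public:
    // Learn one background sample: grow a matching box or start a new one.
    void train(const uint8_t bgr[3], int learnTol = 10) {
        for (CodeBox& b : boxes_) {
            if (within(b, bgr, learnTol)) {
                for (int c = 0; c < 3; ++c) {   // expand bounds to cover the sample
                    if (bgr[c] < b.minv[c]) b.minv[c] = bgr[c];
                    if (bgr[c] > b.maxv[c]) b.maxv[c] = bgr[c];
                }
                return;
            }
        }
        CodeBox b;                              // sample fell outside all boxes
        for (int c = 0; c < 3; ++c) b.minv[c] = b.maxv[c] = bgr[c];
        boxes_.push_back(b);
    }

    // A test sample is background if it falls inside any finalized box.
    bool isBackground(const uint8_t bgr[3]) const {
        for (const CodeBox& b : boxes_)
            if (within(b, bgr, 0)) return true;
        return false;
    }

private:
    static bool within(const CodeBox& b, const uint8_t v[3], int tol) {
        for (int c = 0; c < 3; ++c)
            if (v[c] < b.minv[c] - tol || v[c] > b.maxv[c] + tol) return false;
        return true;
    }
    std::vector<CodeBox> boxes_;
};
```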

2.2 Foreground Extraction Process
Once the background model is constructed from the training images by the moving average or codebook method, it is converted to a grayscale background image, and the foreground traffic in a test image is found by subtracting the background image from a grayscale version of the test image. Dilation and erosion operations are then applied to the resulting image using a 3×3 block structuring element to remove noise and to fill the holes in larger regions [15]. Thresholding is applied after the morphological operators to produce a black-and-white image. The white pixels in the binary image should represent the area of the scene occupied by vehicles. Finally, a connected component analysis is performed to remove small regions and to fill in gaps between regions that should be connected [15].
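As a sketch, the whole extraction pipeline maps directly onto standard OpenCV calls; the threshold value and the minimum component area below are illustrative assumptions.

```cpp
#include <opencv2/imgproc.hpp>

// Foreground extraction (sketch): background subtraction, 3x3 morphology,
// thresholding, then connected component filtering.
cv::Mat extractForeground(const cv::Mat& test, const cv::Mat& backgroundGray) {
    cv::Mat gray, diff, fg;
    cv::cvtColor(test, gray, cv::COLOR_BGR2GRAY);

    // Subtract the background image from the grayscale test image.
    cv::absdiff(gray, backgroundGray, diff);

    // Dilation and erosion (a closing) with a 3x3 block structuring element
    // to remove noise and fill holes in larger regions.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::morphologyEx(diff, diff, cv::MORPH_CLOSE, kernel);

    // Threshold to a binary image; white pixels mark candidate vehicle area.
    cv::threshold(diff, fg, 40, 255, cv::THRESH_BINARY);

    // Connected component analysis: remove regions below a minimum area.
    cv::Mat labels, stats, centroids;
    int n = cv::connectedComponentsWithStats(fg, labels, stats, centroids);
    for (int i = 1; i < n; ++i)
        if (stats.at<int>(i, cv::CC_STAT_AREA) < 30)
            fg.setTo(0, labels == i);
    return fg;
}
```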

Figure 4. (a) Original image. (b) Moving average background. (c) Result of subtracting background (b) from (a) and applying a threshold. (d) Result of the connected component analysis performed on (c), superimposed over the image (a).

Figure 4 shows the results of the foreground extraction process. Image (d) shows that the morphological operators followed by connected component analysis clean up the foreground image (c) formed by background subtraction. Note how some of the extraneous features, such as trees, as well as some smaller cars in the background, are removed in image (d). By adjusting parameters one can vary the regions identified as foreground.

2.3 Extracting Region of Interest (Road Area)
The traffic images contain background areas that cannot contain traffic but can nevertheless throw off results due to excess noise. Such areas include the sky, buildings in the background, trees, and sidewalk areas. For more accurate results it is important to identify the region of potential traffic, i.e., the road area.

The road area can be found manually by cropping out buildings, sidewalks, and non-essential foreground areas, but this is troublesome when each image must be cropped by hand. One approach to this problem is to use Canny edge detection [16] to find the outlines of objects within the background image and then use a modified Hough transformation [17] to detect the extended road lines. In each of the averaged images, the longest lines are likely to correspond to the outlines of the main road extending through the image. One can then calculate the area within the longest lines to estimate the road region. Classification of traffic density is done using the extracted foreground traffic area together with the calculated region-of-interest road area.

Figure 5. (a) Background image. (b) Result of Canny edge detection on image (a). (c) The lines calculated from the thresholded Hough transformation. (d) The road area, in red, calculated by extending the lines in (c) and filling in the area within those boundary lines.

Figure 6. Calculated road areas for the other three intersections. Although the areas do not capture the road perfectly, they offer a reasonably good approximation of the region of interest, which can be used to filter out noise in non-road portions of the images.

Using Canny edge detection on the background image (Figure 5(a)), we get the outlines of objects in the image (Figure 5(b)). After finding the edges, applying a thresholded Hough transformation captures the long lines in the image, which signify the road lines (Figure 5(c)). The Hough transform keeps track of any line with the equation y = mx + b for slope m and y-intercept b, so any line can be uniquely identified by the pair (m, b). In the implementation, the polar coordinate representation of a line is used to construct the parameter space because of the problems with the (m, b) representation: an unbounded parameter domain and infinite m for vertical lines. At each pixel the algorithm determines whether there is enough evidence of an edge or a line segment passing through that pixel. Each unique line has an evaluation value in the algorithm, and more evidence for a specific line's parameters increases that value. The lines with the largest values are then chosen as the most visible lines in the image. Since the road spans a large portion of the image, a high enough threshold will isolate those long lines. Extending the lines and filling in the area between them yields the road region in the image (Figure 5(d)). Figure 6 shows the extracted road areas for the other three locations.
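A compact OpenCV sketch of this road-area step is shown below. The Canny and Hough thresholds are illustrative, and the flood-fill seed is assumed to be a point known to lie on the road (here, simply the image center).

```cpp
#include <cmath>
#include <vector>
#include <opencv2/imgproc.hpp>

// Road-area estimation (sketch): Canny edges, thresholded Hough transform in
// polar (rho, theta) form, then filling the region between the longest lines.
cv::Mat estimateRoadMask(const cv::Mat& backgroundGray) {
    cv::Mat edges;
    cv::Canny(backgroundGray, edges, 50, 150);

    // A high accumulator threshold keeps only long, strongly supported lines.
    std::vector<cv::Vec2f> lines;
    cv::HoughLines(edges, lines, 1, CV_PI / 180, 120);

    // Draw each detected line, extended well past the image borders.
    cv::Mat mask = cv::Mat::zeros(backgroundGray.size(), CV_8UC1);
    for (const cv::Vec2f& l : lines) {
        double rho = l[0], a = std::cos(l[1]), b = std::sin(l[1]);
        cv::Point p1(cvRound(rho * a - 2000 * b), cvRound(rho * b + 2000 * a));
        cv::Point p2(cvRound(rho * a + 2000 * b), cvRound(rho * b - 2000 * a));
        cv::line(mask, p1, p2, cv::Scalar(255));
    }

    // Fill the enclosed area from a seed point assumed to lie on the road.
    cv::floodFill(mask, cv::Point(mask.cols / 2, mask.rows / 2), cv::Scalar(255));
    return mask;
}
```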

2.4 Classification
The easiest way to classify the traffic density is to count the number of foreground pixels located in the road area. Foreground traffic is only counted if it falls within a buffer region of the calculated potential road area. Since different traffic camera views have different sized road areas, traffic data can be extracted as the ratio of the calculated foreground area to the potential road area of that camera's image. This ratio can be broken down into qualitative classifications of traffic. For example, a ratio of 0.025 for a certain intersection might mean light traffic while 0.5 might indicate heavy traffic. Since the division of images into light, medium, heavy, etc. remains arbitrary, specifying exact cut-off values for a classification would best be done after monitoring the system over time. This way one could make better judgments about the accuracy of such measurements by comparing them to human interpretations of the images. Since assignment of the classification is a judgment call and may differ based on the intersection and the camera, no fixed classification has been defined in this work at the moment. Such a classification can be established through experimental analysis and compared to human interpretation for evaluative purposes.
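The density measure itself reduces to a pixel-count ratio, as in the sketch below; the cut-off values merely echo the hypothetical examples above and would have to be tuned per intersection.

```cpp
#include <string>
#include <opencv2/core.hpp>

// Traffic density ratio and a qualitative label (sketch).
std::string classifyTraffic(const cv::Mat& foreground, const cv::Mat& roadMask) {
    // Count foreground pixels only where they fall inside the road area.
    cv::Mat onRoad;
    cv::bitwise_and(foreground, roadMask, onRoad);
    double ratio = (double)cv::countNonZero(onRoad) / cv::countNonZero(roadMask);

    // Illustrative cut-offs only (e.g., ~0.025 light, ~0.5 heavy).
    if (ratio < 0.05) return "light";
    if (ratio < 0.5)  return "medium";
    return "heavy";
}
```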

3. RESULTS
The estimation system was implemented in C++ with the OpenCV library. A total of 200 images were gathered for the classification process, 50 from each of the four camera locations on the island of Oahu. Of each location's 50 images, 30 were chosen at random and used to train the codebook and moving average models; the remaining 20 from each site were evaluated for traffic content. The output for each tested image is a number corresponding to the calculated traffic ratio in that image.

Due to limited space, results for four test images from each site are shown in Figure 7. The numbers below each image indicate the traffic density percentage calculated using the moving average and codebook methods: MA denotes the density obtained with the moving average method of identifying foreground traffic, and CB the density obtained with the codebook method. The four test sites show significantly different results in terms of how they were classified using the two methods.

3.1 University Ave. and Dole St.
Codebook and moving average had less variation between them at University and Dole than at the other three sites (first column in Figure 7). Both performed similarly for this particular intersection, with the greatest issue being the perspective of the camera. Both methods appeared to underestimate the traffic in images with many small cars in the distance and overestimate it in images with vehicles closer to the camera. The two-way traffic presented another challenge, as traffic in the outgoing lane (with vehicles oriented toward the upper right of the image) occupies greater area than the incoming lane (vehicles oriented toward the lower left) due to the position of the camera. This particular intersection had limited lighting issues, which would prove to be a significant problem in the analysis of the other intersections. Overall, this intersection is of easy to medium difficulty to classify relative to the others.

Row 1:  MA: 9.68, 36.96, 48.99, 4.22    CB: 14.80, 23.14, 7.04, 1.71
Row 2:  MA: 12.63, 36.60, 37.10, 4.31   CB: 13.73, 39.94, 23.44, 17.08
Row 3:  MA: 4.19, 41.74, 41.66, 4.09    CB: 1.12, 15.40, 44.11, 22.83
Row 4:  MA: 6.04, 57.06, 21.52, 12.63   CB: 7.93, 4.05, 6.59, 15.75

Figure 7. Traffic density quantified as the percentage of traffic within the extracted road area for the four intersections; each row gives the values for one of the four test images shown per site. (From left column to right column: University Ave. and Dole St., King and Punchbowl streets, King and Bishop streets, Beretania and Punchbowl streets.)

3.2 King St. and Punchbowl St.
For the set of images used in this work, King and Punchbowl appeared to be a busier intersection than the other three (second column in Figure 7). Codebook did a much better job of correctly identifying low traffic values for the images with few vehicles in them. The problem with moving average came from the differences in lighting between the images. Although shadows were minimal, the changes in lighting over time meant that the road appeared shiny and light in some images and darker in others. The moving average based estimation picked up significant portions of the road as foreground because their color differed from the background image; codebook, on the other hand, could adapt to the lighting changes. While the camera at King and Punchbowl had a perspective similar to the one at University and Dole, the one-way traffic made computing the density easier. Traffic at this intersection is easy to classify because of the camera's closer proximity to the road, minimal shadow effects, and one-way traffic flow.


3.3 King St. and Bishop St.
King and Bishop had significant lighting changes within the span of the twenty test images, which threw off some results for both methods (third column in Figure 7). Moving average based estimation produced high traffic values, much like at King and Punchbowl, due to the variability in lighting and the dramatic shadow effects of the downtown environment. Codebook did a decent job of adapting to the lighting issues, except for three images in the test set with values around 44, 65, and 57 (only one of the three is shown in Figure 7). These three images had very little traffic in them, and the estimation problem most likely stemmed from their unique shadow conditions. Unfortunately, the training images did not contain that particular shadow pattern, as the large shadow appeared to have moved in over the course of a few minutes. One of the problems with low temporal resolution is that dramatic changes in shadows can occur over the course of a few frames. Other than those three images, codebook did a good job of detecting limited traffic and correctly identifying higher levels of traffic. This particular intersection is difficult to classify because of the dramatic shifts in lighting and shadows, most likely due to the surrounding buildings.

3.4 Beretania St. and Punchbowl St.
Beretania and Punchbowl had smaller traffic density calculations, and overall it appeared to have much less traffic in the images (fourth column in Figure 7). Both the moving average and codebook methods seemed to have trouble picking up the small cars in the background. The intersection is a four-way stop, and the camera perspective is pulled back from the road more than at the other intersections. Moving average based estimation gave consistently low values except for a few images in which the shadow of a tree took up a large section of the road area; it did not do a good job of estimating traffic for this intersection. Codebook based estimation did appear to identify the few images with heavier traffic. The vast majority of vehicles in the images appeared clumped together in the background, most likely stopped at the traffic light. Overall, this intersection, and intersections of this type, are probably the most difficult to classify. A four-way intersection requires the camera to pull back in order to capture traffic coming from all four directions, which makes the effective resolution even worse. Since traffic was mostly in the background, any traffic moving from right to left or left to right at the intersection would appear more prominently, which was the case in a few test images. Shadow effects are also a concern, as a large tree shadow progressively moved onto the road area through successive images.

4. DISCUSSION

4.1 Problems with Moving Average
The first classification attempt, using moving average as the background model, performed poorly for images with high levels of contrast from shadows and lighting. Moving average produces a neutral background image, so shadow or light details in an original image are identified as foreground traffic after background subtraction. Certain intersections with many trees and buildings in the surrounding area, such as King and Bishop streets, show patterned light areas at certain times of day (Figure 8).

Implementing codebook helped to correct for lighting changes, since the training images captured over the course of the day had enough shadow and lighting effects for codebook to adapt to the different background conditions (Figure 9). There is a large difference between the codebook and moving average results for several of the test images from King and Bishop streets and from King and Punchbowl streets. Moving average tends to overestimate foreground traffic by counting different lighting patterns as traffic.

Figure 8. (a) This traffic image from King and Bishop streets shows interesting shadow and lighting effects. (b) The moving average background image taken over the course of one day creates a neutral image. (c) The background-subtracted and thresholded image reveals several lighted areas perceived as foreground. (d) The calculated foreground image superimposed over the original image (a).

In general, foreground extraction with codebook produced respectable outcomes for relative traffic. The process of evaluating the methods is somewhat imperfect, since there is no best measure for a qualitative process of evaluating traffic density. Codebook does have discernible problems identifying background at a larger scale, in that some extra pixels around the larger foreground areas were included; however, these were not significant enough to warrant concern, since traffic density analysis does not require such precision.

Figure 9. The codebook result for the same street image as in Figure 8(a) is shown in the middle. It has a faded identification of lighting which, after thresholding and post-processing, results in the image on the right. Such a classification would be successful since there are no cars in the image and there should not be any white foreground pixels.

4.2 Perspective View of Camera
The camera position differs between intersections and is rarely head-on with the traffic, which results in skewed perspective angles. For example, the University and Dole traffic camera perspective makes cars in the lower right-hand area of the image larger while causing cars in the incoming lane to appear smaller (see the first image in Figure 7). As in most images, vehicles farther from the camera take up less space than those closer to it, which can skew results for images with many cars in the distance or a few large cars up close. Some cars in the distant background are treated as noise if their size is not large enough to register for the connected component threshold.

4.3 Different Sites
The different intersections produced different results for the different models. At the University Ave. and Dole St. intersection, codebook and moving average performed similarly, with the shared weakness that both tend to disregard cars in the distant background. The images from the University Ave. and Dole St. and Beretania and Punchbowl intersections tend to have a smaller proportion of traffic than those of the other two intersections. Due to the camera angle, individual cars tend to appear smaller at those intersections than at King and Bishop streets and King and Punchbowl streets.


From the test images, King and Punchbowl streets and King and Bishop streets have the most variance in lighting and shadows between images. This caused the largest differences between the moving average and codebook calculations within the same image. King and Punchbowl streets appear to have more traffic in the test images compared to the other intersections.

University Ave. and Dole St. provides the additional challenge of two-way traffic, while Beretania and Punchbowl streets are further complicated by the four-way stop. There is no way for the current model to account for heavy traffic in one direction but not the other, since it only measures total traffic throughout the entire image. One possibility is to maintain multiple traffic models, one for each lane and direction. However, that runs into the additional challenge of identifying the direction of each car, which is difficult for a single image or a sequence of images with a large between-image interval.

5. CONCLUSIONS AND FUTURE WORK
The method studied in this paper provides a simple and efficient approach to traffic density estimation under the limitations of low spatial resolution and large intervals between successive images. The 3-minute interval between images makes tracking vehicles virtually impossible and rules out sophisticated tracking algorithms. Low spatial resolution makes it difficult to identify or find the exact outline of a vehicle since the image is highly pixelated.

Moving average as a background model captured traffic well under ideal lighting conditions, but codebook worked better for images with heavy shadow and lighting issues. A modified Hough transformation did a decent job of identifying the road lines in an image, which were used to identify the road area. The simple traffic density estimation method had trouble dealing with camera perspective and achieved differing levels of accuracy across the four tested traffic sites due to the unique challenges of each intersection. This work suggests an initial procedure for classification of traffic density in real time, which can be built upon by implementing more advanced techniques in the future. Even though it is simplified to deal with technological constraints, this traffic density estimation system does a sufficient job of analyzing traffic patterns around the island of Oahu.

There are a number of ways to improve the current work so that it can deal with some of the difficulties previously discussed and with more complicated situations. For example, a transformation of the street image to a top-down viewpoint could help with the perspective issue, making similarly sized vehicles occupy roughly equal area regardless of their distance from the camera.
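As a sketch of that idea, OpenCV's perspective warp can remap the road region to a uniform top-down strip. The four source corners below are placeholder values; in practice they would be derived per camera, for instance from the extracted road boundary lines.

```cpp
#include <opencv2/imgproc.hpp>

// Bird's-eye (top-down) rectification of the road region (sketch).
cv::Mat topDownView(const cv::Mat& image) {
    // Source quadrilateral: road corners in the camera image (placeholders).
    cv::Point2f src[4] = { {120, 60}, {220, 60}, {320, 240}, {0, 240} };
    // Destination rectangle: the road mapped to a uniform top-down strip.
    cv::Point2f dst[4] = { {0, 0}, {320, 0}, {320, 240}, {0, 240} };

    cv::Mat H = cv::getPerspectiveTransform(src, dst);
    cv::Mat warped;
    cv::warpPerspective(image, warped, H, cv::Size(320, 240));
    return warped;
}
```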

Another problem to consider is that cars of similar color to the road are sometimes removed from the image. Future implementations may consider a separate color component, such as hue, to avoid this problem. Better filtering methods could also be used so that the threshold can be tuned to recognize more subtle differences between vehicle features and the road.

One more future goal could be to implement the capability to recommend a best path between locations that avoids heavy traffic as detected by the cameras. One can expand on the results from four locations to incorporate all 90+ locations where the Honolulu traffic cameras are installed. Individual traffic images from a single location do not provide much information, but as a network they can work together to recommend a path of least traffic congestion from a starting point to a destination.

6. REFERENCES
[1] Schrank, D., Eisele, B., and Lomax, T. 2012. TTI's 2012 Urban Mobility Report. Texas A&M Transportation Institute.
[2] Sen, R., Maurya, A., Raman, B., Mehta, R., Kalyanaraman, R., Roy, S., and Siriah, P. 2012. Kyun queue: A sensor network system to monitor road traffic queues. In Proceedings of the 10th ACM Conference on Embedded Network Sensor Systems (November 2012), 127-140.
[3] Wang, C., Thorpe, C., and Suppe, A. 2003. Ladar-based detection and tracking of moving objects from a ground vehicle at high speeds. In Proceedings of the IEEE Intelligent Vehicles Symposium (June 2003), 416-421.
[4] Barbagli, B., Manes, G., Facchini, R., and Manes, A. 2012. Acoustic sensor network for vehicle traffic monitoring. In Proceedings of the 1st International Conference on Advances in Vehicular Systems, Technologies and Applications (Venice, Italy, June 24-29, 2012), 1-6.
[5] Sun, Z., Bebis, G., and Miller, R. 2006. On-road vehicle detection: A review. IEEE Trans. on Pattern Analysis and Machine Intelligence 28, 5 (May 2006), 694-711.
[6] Ridder, C., Munkelt, O., and Kirchner, H. 1995. Adaptive background estimation and foreground detection using Kalman-filtering. In Proceedings of the International Conference on Recent Advances in Mechatronics, 193-199.
[7] Koller, D., Weber, J., Huang, T., Malik, J., Ogasawara, G., Rao, B., and Russell, S. 1994. Towards robust automatic traffic scene analysis in real-time. In Proceedings of the International Conference on Pattern Recognition (Israel, November 1994), 126-131.
[8] Wren, C. R., Azarbayejani, A., Darrell, T., and Pentland, A. P. 1997. Pfinder: Real-time tracking of the human body. IEEE Trans. on Pattern Analysis and Machine Intelligence 19, 7 (July 1997), 780-785.
[9] Friedman, N. and Russell, S. 1997. Image segmentation in video sequences: A probabilistic approach. In Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence (August 1-3, 1997). UAI '97. Morgan Kaufmann, San Francisco, CA, 175-181.
[10] Stauffer, C. and Grimson, W. 1999. Adaptive background mixture models for real-time tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Fort Collins, CO, June 1999), 246-252.
[11] Li, Z., Tan, E., Chen, J., and Wassantachat, T. 2008. On traffic density estimation with a boosted SVM classifier. In Digital Image Computing: Techniques and Applications, 117-123.
[12] Honolulu traffic cameras. http://www1.honolulu.gov/cameras/traffic.htm
[13] Bradski, G. and Kaehler, A. 2008. Learning OpenCV. O'Reilly Media, Inc., Sebastopol, CA.
[14] Kim, K., Chalidabhongse, T. H., Harwood, D., and Davis, L. 2005. Real-time foreground-background segmentation using codebook model. Real-Time Imaging 11, 3 (June 2005), 172-185.
[15] Gonzalez, R. C. and Woods, R. 2007. Digital Image Processing. Prentice-Hall, Inc., Upper Saddle River, NJ.
[16] Canny, J. 1986. A computational approach to edge detection. IEEE Trans. on Pattern Analysis and Machine Intelligence 8, 6 (November 1986), 679-698.
[17] Kim, Y. and Lyu, S. 1989. Extracting lines using a modified Hough transformation. In Multidimensional Signal Processing Workshop (Pacific Grove, CA, September 6-8, 1989).