
Posted 12-Apr-2017

Webcams and Social Media

Jessica Graham

There’s a lot of data in the digital world. Some of this data is related, some of it isn’t, and some of it could or maybe even should be. This project specifically looks at digital data in the world and how it can be connected to an already-existing database of webcameras and images, focusing primarily on social media as a source of data.

1. Introduction

The internet is a very large place that is used for a lot of different purposes. Some people write articles to share knowledge with others, some people like to post funny pictures, and others like to talk about what’s currently happening near them. This third behavior in particular can shed a lot of light on the current state of a particular location. If it’s raining, a lot of people in the area of the rain are likely to post about it. If there’s a major event, such as a concert, people have a tendency to share that they are at said event. This information can be used to make inferences about other information. In this project, I look at applying world information obtained through Twitter, and later the National Weather Service, to tag images taken from nearby webcameras being tracked by the Archive of Many Outdoor Scenes.

1.1 Terminology and Data

The data for this project comes from three places: Twitter, the National Weather Service, and the Archive of Many Outdoor Scenes. Twitter is a popular social media site that allows people to post 140-character-maximum status updates known as Tweets. This medium is intended for simple sharing of thoughts, with the option of attaching a location to one’s Tweets if one wants others to know where he or she is at the time. Tweets with this location information attached are said to be geotagged.

The National Weather Service, or NWS, is a service run by the National Oceanic and Atmospheric Administration with the purpose of tracking and predicting weather. For this project, I was interested in the severe weather warnings provided by the site, particularly those pertaining to thunderstorms.

The Archive of Many Outdoor Scenes, or AMOS, is a website hosted by the Media and Machines lab out of Washington University in St. Louis. The site functions as a collection of webcameras from all over the world, currently numbering approximately 15,000. Images from each of these webcameras are captured and stored every 30 minutes, leading to a sizeable database that currently contains more than 500,000,000 images.

1.2 Related Work

As far as I or my advisor know, this is the first work concerning the labeling of webcamera images from either social media or a weather service.

2. Image Tags

AMOS contains a large number of images, but at the beginning of this project there was no way to store image labels. This meant that either this data would need to be stored elsewhere, or that a way to store image labels would need to be added to AMOS. I took the latter approach and implemented the adding and storing of image labels, or image tags. I also created a page for browsing and searching through these image tags, as well as API functionality to allow image tags to be created and accessed through outside applications.

2.1 Tag Implementation

To store image tags, the database structure of AMOS needed to be altered slightly. The immediate thought was that image tags are associated with a particular image, and thus could be implemented as an additional field in the images table. This made it very easy to directly add tags to an image, as well as to query for the tags associated with a particular image. However, this structure did not scale well and actually caused the AMOS website to crash for a couple of hours. This is because there are more than a half billion images, and having a text field of indefinite length associated with each one of these added overhead that the server couldn’t handle.

Clearly, the obvious solution wasn’t feasible. I had to instead investigate how to store image tags such that they still mapped tag to image without necessitating additional information for every existing image. This led to a separate database table specifically for image tags. For each tag this table stores an ID, the ID of the image to which the tag belongs, and the tag itself. This changes the implementation of image tag creation and retrieval slightly. It is still the case that only the image and the desired tag are necessary to create a tag with the appropriate properties, but the process is now to create a new tag object in the database as opposed to changing a field of an existing image object. It is also still the case that only an image is needed to retrieve the tags associated with it, but the search is now over a foreign key in the image tag table as opposed to simply retrieving a field stored as part of the image itself.
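The separate tag table described above can be sketched with a small in-memory database. The table and column names here are my own illustration of the structure (an ID, the owning image's ID, and the tag text), not AMOS's actual schema.

```python
import sqlite3

# Illustrative sketch of the separate image-tag table; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE image_tags (
        id INTEGER PRIMARY KEY,                  -- the tag's own ID
        image_id INTEGER REFERENCES images(id),  -- the image it belongs to
        tag TEXT                                 -- the tag text itself
    )
""")
conn.execute("INSERT INTO images (id) VALUES (1)")

# Creating a tag is now a new row, not a change to the image row:
conn.execute("INSERT INTO image_tags (image_id, tag) VALUES (1, 'rain')")

# Retrieving an image's tags is a search over the foreign key:
tags = [row[0] for row in conn.execute(
    "SELECT tag FROM image_tags WHERE image_id = 1")]
```

Because each tag is its own row, the half-billion untagged images carry no extra storage at all, which is what made this design scale where the per-image text field did not.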

2.2 Search Implementation

Once image tags were implemented, the next step was to create a way to search through the images that had been tagged. AMOS already had a camera search page, in which an empty query is used to browse through the cameras. I used a lot of the HTML from that page for the sake of consistency. I was also able to use the same search algorithm, though I had to tweak it to search over image tags and image tag properties instead of cameras and camera properties. This provided nice initial results, except for the fact that multiple tags can be associated with the same image. This led to many repeats of the same image appearing while browsing, as each tag is a separate object even when it is associated with the same image.

This image duplication was undesirable, and was fixed by adding a filtering process to the image tags initially returned by the search. This set of tags was queried for the unique image IDs appearing in it, and that set of image IDs was then used to select the exact set of images that had image tags associated with them.
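The de-duplication step amounts to collapsing the matched tags down to their unique image IDs before selecting images. A minimal sketch, with assumed data shapes:

```python
# Tags returned by the search; two tags happen to share an image.
tag_hits = [
    {"image_id": 7, "tag": "rain"},
    {"image_id": 7, "tag": "storm"},  # same image, second tag
    {"image_id": 9, "tag": "rain"},
]

# Query the tag set for the unique image IDs appearing in it...
unique_ids = {hit["image_id"] for hit in tag_hits}

# ...then select each matching image exactly once.
images = sorted(unique_ids)
```

Each image now appears once in the browse results regardless of how many tags it carries.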

2.3 API Implementation

The last piece of image tags was to make them usable by outside applications. Because AMOS is implemented with Django, the functionality I had already written was done through URL handlers. This made it easy to extend into something usable by an outside application. For both creating and retrieving image tags, I simply had to allow for different image information to be passed in and processed. This change was necessary because, at the points where these functions are called within AMOS, different information is known about the image than an end user or outside application might have access to.
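The idea of accepting different image information can be sketched as a small dispatch step: internal callers already hold an image ID, while an outside application might only know a camera and a capture time. The function and parameter names here are hypothetical, not AMOS's actual API.

```python
def resolve_image_id(params, lookup_by_camera_and_time):
    """Accept either an internal image ID or a (camera ID, timestamp) pair."""
    if "image_id" in params:
        # Internal callers within AMOS already know the image directly.
        return params["image_id"]
    # Outside callers supply what they can see: a camera and a capture time.
    return lookup_by_camera_and_time(params["camera_id"], params["timestamp"])

# Internal call: the image ID is known.
internal = resolve_image_id({"image_id": 42}, lookup_by_camera_and_time=None)

# External call: a stub lookup stands in for the database query.
lookup = lambda cam, ts: 42
external = resolve_image_id(
    {"camera_id": 7, "timestamp": "2013-06-01T12:00"}, lookup)
```

Both paths resolve to the same image, so the tag creation and retrieval handlers behind them stay unchanged.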

3. Twitter

The first outside application of image tags that I investigated was using current information from Twitter to tag current images. This application takes in a list of cameras on AMOS, a collection of geotagged Tweets, search terms to look for in the Tweets and use as image tags, and a few parameters such as a distance tolerance to determine if a Tweet and camera are ‘close’. The output from this application is the image tags that are added to AMOS.

Camera information was already available on AMOS, so the next steps were to determine how to collect Tweets, decide upon the parameters, and determine how to process the Tweets into image tags.

3.1 Collecting Tweets

Twitter has two APIs with which their service is accessible. The first is the REST API, which is used to periodically submit queries and obtain a collection of results. The second is the Streaming API, which is used to open and monitor a live stream of Tweets as they are posted. The two also differ in how they filter: the REST API allows a query for search term(s) AND a single location, while the Streaming API allows filtering for search term(s) OR location(s).

While the REST API looked like a straightforward method to collect Tweets, AMOS contains more than 15,000 cameras. Because this API only allows one location query at a time, this would mean making 15,000 queries for each individual search term. Even if only one search term were used, Twitter’s rate limiting of 450 calls every 15 minutes would mean that about 33 connections would have to be created to query for every camera.
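The back-of-the-envelope arithmetic behind that conclusion, using the numbers from the text:

```python
cameras = 15000   # one REST location query per camera
rate_limit = 450  # calls allowed per 15-minute window

# How many separate connections (or full rate-limit windows) are needed
# to issue one location query per camera for a single search term:
windows = cameras / rate_limit  # ≈ 33.3
```

Each additional search term multiplies this cost, which is why the REST API was ruled out.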

This left the Streaming API as the only viable option. The benefit of this API is that it allows querying for any geotagged Tweet, as opposed to only looking at one specific location at a time. However, since this is a stream and since it cannot filter on both locations and search terms simultaneously, Tweets can only be accessed in real time and need to be filtered by search term on the application side instead. Because the desire is to have a collection of Tweets to process at once, the Tweets must also be temporarily stored in the application.
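The application-side filtering and buffering can be sketched as follows; the stream here is just an iterable of Tweet dicts standing in for the live Streaming API connection, and the field names are assumptions.

```python
def collect_matching(stream, search_terms):
    """Buffer geotagged Tweets whose text matches any search term."""
    buffered = []
    for tweet in stream:
        text = tweet["text"].lower()
        # The stream delivers every geotagged Tweet, so term filtering
        # happens here, on the application side.
        if any(term in text for term in search_terms):
            buffered.append(tweet)  # hold until the next processing window
    return buffered

# A stand-in for a short burst from the stream:
sample_stream = [
    {"text": "So much rain today", "coord": (38.6, -90.2)},
    {"text": "Lovely sunshine",    "coord": (38.7, -90.3)},
]
kept = collect_matching(sample_stream, ["rain"])
```

In the real application this buffer is drained and processed once per time window rather than returned all at once.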

3.2 Choosing Parameters

There are three main parameters involved in this application. The first, the time window, determines how often the application should process collected Tweets, and is specified in minutes. The second, the close distance, is a measure of how far two points are allowed to be from each other while still being considered close. This is specified in miles. The last is a minimum threshold for the number of Tweets that must match a given search term for an image to be tagged with that term. Different parameter values produce different results, and the parameters can be tweaked through experimentation to find the best values for the particular search terms being used. For example, a smaller close distance is likely to be better for capturing a specific weather event, while a larger close distance might have a better chance of capturing a forest fire. The time window can also be adjusted depending on how common the search term is and how important it is to have a camera tagged immediately versus waiting.
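The ‘close’ test behind the close-distance parameter can be sketched as a great-circle distance check. The haversine formula below is the standard way to compute distance between two latitude/longitude points; the function names are my own, not the project's.

```python
import math

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def is_close(tweet_coord, camera_coord, close_distance_miles):
    """A Tweet and a camera are 'close' if within the distance parameter."""
    return miles_between(*tweet_coord, *camera_coord) <= close_distance_miles
```

Shrinking `close_distance_miles` tightens the match, which is exactly the tuning described above for localized events like a specific storm.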

3.3 Processing Tweets

Once the Tweets have been collected and filtered, they need to be processed. This process is as follows:
● For each camera:
  ○ Find Tweets ‘close’ to the current camera
  ○ For each search term:
    ■ Count how many Tweets that match this camera also match this term
    ■ If this count is more than the pre-determined threshold:
      ● Tag the current camera image with the term
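The steps above translate directly into a nested loop. This is an illustrative transcription, with assumed data shapes, not the project's actual code; the `is_close` predicate is passed in so any distance test can be used.

```python
def process_tweets(cameras, tweets, search_terms, is_close, threshold):
    """Return {camera_id: [terms]} following the steps listed above."""
    tags = {}
    for cam in cameras:
        # Find Tweets 'close' to the current camera.
        nearby = [t for t in tweets if is_close(t["coord"], cam["coord"])]
        for term in search_terms:
            # Count how many close Tweets also match this term.
            count = sum(1 for t in nearby if term in t["text"].lower())
            # Tag the current camera image if the threshold is exceeded.
            if count > threshold:
                tags.setdefault(cam["id"], []).append(term)
    return tags

# Tiny usage example with a trivial 'closeness' test (exact coordinate match):
result = process_tweets(
    cameras=[{"id": 1, "coord": "A"}],
    tweets=[{"coord": "A", "text": "So much rain"},
            {"coord": "A", "text": "heavy rain here"}],
    search_terms=["rain"],
    is_close=lambda a, b: a == b,
    threshold=1,
)
```

The returned mapping is what would be handed to the AMOS tag-creation API.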

4. National Weather Service

The second outside application of image tags that I implemented was to look at the active thunderstorm warnings on the NWS website. This involved scraping and processing the information on the page to determine which cameras to tag, with identical parameter definitions to the Twitter application.

The scraping process was fairly straightforward, as each warning has the same format. This made it easy to separate warnings as well as to determine how to identify the location information in each warning.

Once the location information is extracted, it is separated into county and state. This information is then fed into Google’s geocoder to get a latitude/longitude coordinate pair, which is used to determine which webcameras are close to the issued warning. Webcameras determined to be close to a warning are then tagged with the term “thunderstorm warning”.
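That pipeline can be sketched in a few lines. The `geocode` function here is a stand-in for Google's geocoder (which is a network call), and the warning-location format is simplified to "County, State"; all names are illustrative.

```python
def parse_location(warning_location):
    """Split a location like 'St. Louis County, MO' into (county, state)."""
    county, state = [part.strip() for part in warning_location.rsplit(",", 1)]
    return county, state

def tag_cameras_for_warning(warning_location, geocode, cameras, is_close):
    """Return IDs of cameras close to the geocoded warning location."""
    county, state = parse_location(warning_location)
    coord = geocode(f"{county}, {state}")  # latitude/longitude pair
    return [cam["id"] for cam in cameras if is_close(coord, cam["coord"])]

# Usage with stubs: a fixed geocoder result and exact-match 'closeness'.
geocode = lambda query: (38.6, -90.4)
cameras = [{"id": 3, "coord": (38.6, -90.4)},
           {"id": 4, "coord": (0.0, 0.0)}]
matched = tag_cameras_for_warning(
    "St. Louis County, MO", geocode, cameras, is_close=lambda a, b: a == b)
```

Each matched camera's current image would then be tagged "thunderstorm warning" through the image-tag API.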

5. Results

Both of these applications have been successfully implemented and are actively adding image tags to AMOS, but the resulting images are not particularly high resolution. This makes it difficult to make out details in a black-and-white image, particularly details that indicate weather information. While this problem could be addressed by including the images at a larger size, that would make this paper unnecessarily long. If the included results look unimpressive, I implore you to visit the AMOS website at http://amos.cse.wustl.edu, click the Browse Tagged Images link on the right sidebar, and enter the search terms twitterPic and NWSpic for Twitter- and NWS-tagged images respectively.

5.1 Images Tagged Through Twitter

The images below were tagged from Tweets with the search term “rain”. There are a handful of false positives from initial runs with larger close distances, which have been eliminated in later runs by making the close distance smaller. However, even in the initial runs, the true positives outweigh the false positives.

5.2 Images Tagged Through NWS Warnings

The following images were tagged from NWS thunderstorm warnings. With the exception of a few inactive cameras that weren’t properly filtered out and a few warning locations that were geolocated incorrectly due to a bug in an earlier iteration of the program (a few ended up in Norway and England), the results are very convincing.

6. Conclusion and Future Applications

This project was a foray into the idea of connecting two areas of digital data, in this case connecting Twitter and the National Weather Service to the AMOS image database. This included three main tasks: implementing image tags, creating the Twitter-processing application, and creating the NWS-scraping application. However, the idea of using data from a social media site such as Twitter need not be limited to regular searches for a pre-defined list of terms. A future extension of this work might be a live Twitter/webcamera search: a user would type a query term, such as “Justin Bieber sighting”, and the application would find Tweets matching that term and the cameras near those Tweets. Images could then be either automatically tagged or displayed for the user to select which ones to apply the tag to. Getting information from social media sites other than Twitter is another possibility.

7. Acknowledgements

I would like to thank Robert Pless, my advisor, and Nathan Jacobs for maintaining AMOS, without which this project wouldn’t even exist. I would also like to extend thanks to Abby Stylianou for graciously giving up so much of her time to help me get acquainted with AMOS and synthesize test data for my applications, and to Austin Abrams for restoring AMOS when I crashed it as well as providing me with a lot of useful advice along the way.