Mining Social Media in Extreme Events: Lessons Learned from the
DARPA Network Challenge
Nicklaus A. Giacobe, Hyun-Woo “Anthony” Kim and Avner Faraz
[nxg13, hxk263] @ist.psu.edu, fia5005@psu.edu
The DARPA Network Challenge
• Conceived to better understand and harness the power of the Internet
– How does one get a video to go viral on YouTube?
– Leveraging social networks to solve “intractable” or “impossible” tasks
– Short time frame
• Announced on October 29, 2009
• Challenge occurred on December 5, 2009 (5 weeks of possible prep time)
The DARPA Network Challenge
• DARPA launched 10 red weather balloons
• Tethered to fixed locations in public places across the continental United States
• Find them and report GPS coordinates of all 10 before anyone else does and win the $40,000 prize
• The balloons would be “up” from 0900 EST until 1600 local time (7–10 hours) and then taken down
Final Results
• 4,600 “teams” registered with DARPA
• 58 teams were “in the hunt” and submitted 2 or more correct locations
• Top teams used mass-marketing and mass-motivation techniques (offered to share the prize money with observers)
• Each team has a “lesson” applicable to homeland security
Top Finalists – We’re #10!
From: https://networkchallenge.darpa.mil/FinalStandings.pdf
Overview
• Who are the iSchools?
• A caucus of academic institutions studying information science
• The iSchools are interested in the relationship between information, people and technology.
• There are 28 academic institutions (colleges or departments) in various universities across 8 countries
• Some iSchools have roots in Library Science, Computational Sciences, MIS, Business Management, Cognitive Science, Human-Computer Interaction (HCI) and other fields. Each iSchool has its own focus and competencies.
Team Organization
• Command Structure
– Attempted to follow ICS (Incident Command System) from the fire service
– Most team members were unfamiliar with this organizational structure, so it had very limited success
• Operational Section
– 2 branches – Direct Observation and Cyberspace Search
• Facilities
– 211 IST – EEL – Command Post
– 208 IST – Classroom – Cyberspace Search
Methods
• Direct Observation
– Recruit observers from the iSchools Caucus
– Report sightings through:
• Website
• Phone / SMS
• Email
• Cyberspace Search
– Multiple intel teams searching open communications online
• Twitter, competitor websites, no hacking
• Confirmation and Decision Making
Methods
• Technologies Used
– Twitter Capture – Anthony Kim
– Custom Crawler – Madian Khabsa
– Maltego 2 – Avner Faraz
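To give a concrete sense of what a minimal keyword-based Twitter capture could look like, here is a rough sketch in Python. The endpoint, response schema, and the poll_keyword helper are placeholders invented for illustration; this is not the team's actual capture code, nor Twitter's real API.

```python
# Minimal sketch of a keyword-based tweet capture loop (illustrative only).
# SEARCH_URL and its JSON layout are placeholders; the team's actual capture
# tooling and the Twitter API of the era are not reproduced here.
import json
import time

import requests

SEARCH_URL = "https://example.invalid/search.json"   # hypothetical endpoint
KEYWORDS = ["DARPA balloon", "red balloon", "#DARPA"]

def poll_keyword(query, since_id=0):
    """Fetch tweets matching `query` newer than `since_id` (assumed schema)."""
    resp = requests.get(SEARCH_URL, params={"q": query, "since_id": since_id})
    resp.raise_for_status()
    return resp.json().get("results", [])

def capture_loop(outfile="tweets.jsonl", interval=60):
    """Append matching tweets to a JSON-lines file, one poll per keyword per interval."""
    last_seen = {k: 0 for k in KEYWORDS}
    with open(outfile, "a") as fh:
        while True:
            for kw in KEYWORDS:
                for tweet in poll_keyword(kw, last_seen[kw]):
                    fh.write(json.dumps(tweet) + "\n")
                    last_seen[kw] = max(last_seen[kw], tweet.get("id", 0))
            time.sleep(interval)

if __name__ == "__main__":
    capture_loop()
```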
Methods
Direct Observation
• Recruiting
• All iSchools to send out recruiting messages
• Phone/SMS/Web/Email reporting

Cyber Search / Intel
• Twitter
• Websites
• Competing team sites

Observer Confirmation
• Pre-registered observers
• Recruited observers
• iSchools network
• “Friends” network

Analysis
• Photo analysis
• Provenance evaluation

D-S Evidence Theory Combination
Methods
• Dempster-Shafer Evidence Theory Combination
– Combine evidence from multiple sources under uncertainty
– Apply confidence weights to sensor data
• Intended, but applied cognitively
– Analysts were to provide report data with confidence values (0 = low, 10 = high)
– Some algorithmic process would have been needed to combine large numbers of reports
– … but we had *extremely low* numbers of reports
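For readers unfamiliar with the technique, the sketch below shows Dempster's rule of combination applied to two conflicting reports about a single sighting. The mapping from 0–10 analyst confidence to mass assignments is an assumption made for illustration, and the numbers only loosely follow the Albany, NY case; this is not an algorithm the team actually ran (the combination was done cognitively).

```python
# Minimal sketch of Dempster's rule of combination for two reports about one
# balloon sighting.  The frame is {"balloon", "no_balloon"}; mapping 0-10
# confidence scores to mass assignments is a simplifying assumption.
from itertools import product

FRAME = frozenset({"balloon", "no_balloon"})

def mass_from_report(claim, confidence):
    """Turn a claim ('balloon' / 'no_balloon') and a 0-10 confidence into a
    basic mass assignment; leftover mass goes to the whole frame (ignorance)."""
    belief = confidence / 10.0
    return {frozenset({claim}): belief, FRAME: 1.0 - belief}

def combine(m1, m2):
    """Dempster's rule: intersect focal elements, renormalise by 1 - conflict."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Example: a tweet supporting a sighting (6/10) vs. a trusted observer
# reporting no balloon (9/10).
m_for = mass_from_report("balloon", 6)
m_against = mass_from_report("no_balloon", 9)
print(combine(m_for, m_against))
```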
Pre-Challenge Org.
Team Organization
• Other Universities
– Various schools asked to send recruiting messages
– UNC – distributed its own phone number and email address for its recruiting messages
– Univ. of Illinois – single-person cyberspace search division
Team Organization
• Finance and Logistics
– Google Voice (814-4BALL01)
– Website design (balloon.ist.psu.edu)
– Email address (balloon@ist.psu.edu)
– Google Wave (intel and command comms)
– Private Twitter (late attempt at outbound comms)
– Coffee, donuts, pizza, soda and homemade cookies ($100!)
– Incentives and rewards (10 GPS systems offered by iSchools)
Team Members
• University of Illinois
– John Unsworth
– Maeve Reilly
– Karyn Applegate
• University of North Carolina
– Aaron Brubaker
– Kjersti Kyle
• Other iSchools
– Marketing/Communications staff
Team Members
• Penn State University
– Command Post Team
• Nick Giacobe
• Wade Shumaker
• Louis-Marie Ngamassi Tchouakeu
• John Yen
• Jon Becker (p/t)
• Michelle Young (p/t)
– Logistics / Website
• Shannon Johnson
• Lei Lei Zhu
Team Members
• Penn State University
– Operations (Cyberspace Search Branch)
• Crawler Task Force
– Madian Khabsa and Jian Huang
• Twitter Capture Task Force
– Hyun Woo “Anthony” Kim and Airy Guru
• Intelligence Analysts
– Chris Robuck
– Greg Traylor
– Anthony Maslowski
– Gregory O’Neill
– Joe Magobeth
– Avner Faraz
– Matt Maisel
– Earl Yun
Results
• Recruiting Message Reports
– University of Illinois – Email to iSchool alumni; to faculty, staff and students.
– University of Pittsburgh – Email to all alumni; to faculty, staff and students; webpage article on main page & alumni news page; LinkedIn announcement to iSchool at Pitt group members; and Facebook announcement to iSchool at Pitt group members. Alumni email distributed to 4,674 alumni; 114 opened the DARPA link. Email blasts to 936 students and 41 faculty.
– Penn State – 338 fans on Facebook and 377 followers on Twitter. Re-tweets and questions about the Challenge on Twitter, but no activity on Facebook. 2,125 alumni received the e-mail.
– UCLA – Sent messages to faculty, staff, students and alumni.
– Drexel – Alumni listserv, Facebook, Twitter, electronic newsletter for undergrads and grads, online learning system, grad web site release.
Results
• Direct Reporting Data
– 1 report from an observer
– 1 pre-recruited observer tasked for confirmation
– 8 observers recruited for confirmation
• Website Hit Data
– 567 hits (terrible!)
• 16 case studies of individual balloon reports…
Overview of Cases
Location | Original Report | Method
Erie, PA | Competitor site | Observer
Albany, NY | Twitter w/ picture | Observer / photo analysis
Royal Oak, MI | Twitter w/ picture | Observer – fake
Providence, RI | Twitter w/ picture | Observer / photo analysis
Seattle, WA | Twitter w/o picture | Observer
Champaign, IL | Competitor site | Observer
Des Moines, IA | Twitter | Self-recant
Christiana (Glasgow), DE | Twitter w/ photo | Detail / photo analysis
Bithlo, FL | Custom crawler | Observer
Charlottesville, VA | Observer report | Confirmation details / call back
Scottsdale, AZ | Competitor site | Detail / photo analysis
Portland, OR | Twitter w/ picture | Observer
San Francisco, CA | Twitter/blog | Detail / photo analysis
Santa Barbara, CA | Twitter | Detail / photo analysis
Westfield, NJ | Twitter w/o picture | Conflicting data
Memphis, TN | Competitor trade offer | Never confirmed
Results: Case Study 1
• Location: Erie, PA
• Original Report: 10balloons.com website
• Method: Observer mobilized
• Notes:
– Early report – 9:30 AM
– Evaluated whether the 10balloons.com website was going to be useful intel or not
– Observer was not pre-registered; was a personal friend of command post personnel
Results: Case Study 2
• Location: Albany, NY
• Original Report: Twitter feed
• Method: Observer mobilized
• Notes:
– Observer provided convincing photo evidence – no balloon at this location
– Subsequent photo analysis confirmed this was a manufactured image
Results: Case Study 2
• Evidence For
– Reputability (3/10)
• Established Twitter account
– Content (6/10)
• Photo of a balloon
• Balloon number 6?
• Weather – match
• Location – exact coords not provided, but discoverable
• Evidence Against
– Reputability (9/10)
• Pre-recruited observer
• Known person (PSU Alumni Assn chapter president)
– Content (10/10)
• Excellent quality photo
• Same angle, confirming coincidental landmarks/features
Results: Case Study 3
• Location: Royal Oak, MI
• Original Report: Twitter
• Method: Observer mobilized
• Notes:
– Observer went to the location and talked to the store owner, who admitted that the balloon was a “publicity stunt” – he knew about the DARPA challenge and put up his own balloon.
– Observer was not pre-recruited
Results: Case Study 3
• Evidence For
– Reputability (3/10)
• Established Twitter account
– Content (4/10)
• Photo of a balloon
• No “DARPA” pennant, not balloon 6
• Weather – match for location
• Location – had GPS coordinates for the location from the photo
• Evidence Against
– Reputability (9/10)
• Observer recruited after the fact
– Content (8/10)
• No photos sent
• Good report, verbal description of events
Results: Case Study 4
• Location: Providence, RI
• Original Report: Twitter
• Method: Observer mobilized
• Notes:
– Observer was a friend of one of the analysts in the Intel Division – reported no balloon at that location
– Photo analysis identified a repeat of the exact same fabricated image
Results: Case Study 4
• Evidence For
– Reputability (3/10)
• Established Twitter account
– Content (4/10)
• Photo of a balloon
• Evidence Against
– Reputability (9/10)
• Observer recruited after the fact
– Content (8/10)
• No photos sent
• Good report, verbal description of location
– Photo Evaluation (10/10)
• Reproduced Photoshop job
• Edge evaluation
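The slides do not show how the edge evaluation was done; as one plausible, simplified stand-in (file names and the crop box below are hypothetical), an edge-map comparison of two suspect uploads can reveal a pixel-identical balloon pasted into different scenes:

```python
# Rough sketch of a simple "edge evaluation" check on two suspect balloon
# photos (file names are placeholders, not the team's actual evidence files).
# Idea: compare edge maps of the same cropped region; a balloon composited
# from one source shows up as a near-zero difference in that region.
from PIL import Image, ImageChops, ImageFilter

def edge_map(path, size=(512, 512)):
    """Greyscale the image, normalise its size, and extract an edge map."""
    img = Image.open(path).convert("L").resize(size)
    return img.filter(ImageFilter.FIND_EDGES)

def compare_regions(path_a, path_b, box):
    """Crop the same region (left, upper, right, lower) from both edge maps
    and return the bounding box of their differences (None = identical)."""
    a = edge_map(path_a).crop(box)
    b = edge_map(path_b).crop(box)
    return ImageChops.difference(a, b).getbbox()

# Hypothetical usage: the region thought to contain the pasted balloon.
balloon_box = (100, 50, 300, 250)
diff = compare_regions("providence_photo.jpg", "albany_photo.jpg", balloon_box)
print("Balloon region identical" if diff is None else f"Differences in {diff}")
```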
Results: Case Study 5
• Location: Seattle, WA
• Original Report:
• Method: Observer mobilized
• Notes:
– Report was for a balloon over the University of Washington library
– Observer was the Assoc. Dean of the LIS College at UW
Results: Case Study 6
• Location: Champaign, IL
• Original Report: 10Balloons.com
• Method: Observer mobilized
• Notes:
– No picture
– Known observer/team member in close proximity to this location
Results: Case Study 7
• Location: Des Moines, IA
• Original Report: Twitter
• Method: Self-recant
• Notes:
– Tweet gave the location/street address of the reporter’s home
– No photo evidence
– Reporter then re-tweeted, complaining about people running through her yard, and then recanted
Results: Case Study 8
• Location: Christiana, DE
• Original Report: Twitter w/ coordinates
• Method: Photo/details analysis
• Notes:
– Photos didn’t match Google Maps photos – site was under construction
– Called the YMCA across the street from the site to confirm
Results: Case Study 9
• Location: Bithlo, FL
• Original Report: Crawler
• Method: Observer mobilized
• Notes:
– Intel team member had a contact in the area
– Contact reported that there was no balloon in that location
Results: Case Study 10
• Location: Charlottesville, VA
• Original Report: Observer submission
• Method: Re-contact observer for details
• Notes:
– Alternate observer not available in time
– Primary observer re-dispatched to collect DARPA paperwork
– Additional confirmation through intel channels
Results: Case Study 11
• Location: Scottsdale, AZ
• Original Report: Fark.com (competitor site)
• Method: Extensive detail analysis
• Notes:
– Interesting story
– Unique username
– Extensive investigation of the reporter’s address, phone number, etc. in online records led to the location
– Poor attempt at deception
– Photographic evidence confirmed the location
Results: Case Study 11
• A forum had a member who spotted a red balloon near his house. The post was deleted shortly afterwards, but his username was Mini Ditka.
• Using only his username and e-mail address from the forum, I used Maltego to perform an OSINT trace on the e-mail address with a transform I had coded for Maltego the year before.
• The analysis pulled his real name (Scott Shepherd) and his Facebook, LinkedIn, Flickr, Spock and Myspace profiles, and matched them to that e-mail address.
• Further analysis of the subject turned up possible phone numbers and web sites he might have owned.
• In the end, we surmised that the balloon must have been in the Scottsdale area using only his e-mail and username. The actual location was found through image analysis by other team members.
OSINT on E-mail address from deleted post @ forum
Facebook, Flickr, Myspace, LinkedIn
Total Network View For Scott Shepherd
E-mails, Phones, Social Networks, Web-sites, Blogs, Address hits
Phone Number Mining on Scott Shepherd
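The Maltego transform itself is not reproduced in the slides. As a rough stand-in for the username-pivot step, the sketch below probes well-known profile URL patterns for a given handle; the URL patterns, the handle, and the overall approach are illustrative assumptions, not the team's transform.

```python
# Rough stand-in for the username-pivot step of the OSINT trace: probe common
# profile URL patterns for a given username and report which ones respond.
# URL patterns and the username are illustrative; this is not the Maltego
# transform used by the team.
import requests

PROFILE_PATTERNS = {
    "Twitter":  "https://twitter.com/{u}",
    "Flickr":   "https://www.flickr.com/people/{u}",
    "Myspace":  "https://myspace.com/{u}",
    "LinkedIn": "https://www.linkedin.com/in/{u}",
}

def pivot_username(username, timeout=5):
    """Return the services whose profile URL for `username` answers with HTTP 200."""
    hits = {}
    for service, pattern in PROFILE_PATTERNS.items():
        url = pattern.format(u=username)
        try:
            resp = requests.head(url, timeout=timeout, allow_redirects=True)
            if resp.status_code == 200:
                hits[service] = url
        except requests.RequestException:
            pass  # unreachable services are simply skipped
    return hits

if __name__ == "__main__":
    print(pivot_username("mini_ditka"))   # hypothetical handle
```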
Results: Case Study 12
• Location: Portland, OR
• Original Report: Twitter feed
• Method: Observer mobilized
• Notes:
– Photo matched the location and gross coordinates, but we wanted more details
– Observer confirmed the location and provided:
• Additional photographic evidence
• Photo of the certificate
– A doctored photo of the certificate was circulated in the channel by others later
Verify Location Using Google Earth
Results: Case Study 13
• Location: San Francisco, CA
• Original Report: Twitter/blog
• Method: Photo evidence
• Notes:
– No observer available
– Established Twitter account/blog
– Multiple photos/angles
– GPS data provided; matched building/features on Google Maps
Results: Case Study 14
• Location: Santa Barbara, CA
• Original Report: Twitter
• Method: Photo evidence
• Notes:
– High-quality picture available
– Unlikely to be a Photoshop job, based on other photos available
– Location report matches pictures of the location on Google Maps
Santa Barbara, CA
Results: Case Study 15
• Location: Westfield, NJ
• Original Report: Twitter
• Method: Conflicting details with known-good data
• Notes:
– Balloon number conflicted with the AZ data we had high confidence in (both reported as balloon #2)
– No additional information / no photo
– Discussion/disagreement on whether the location for this balloon was right or not
Results: Case Study 16
• Location: Memphis, TN
• Original Report: Competing team / trade offer
• Method: Unable to confirm
• Notes:
– Email to balloon@ist.psu.edu from a competitor
– Claimed to have a location in TN
– Unreliable trade/info
– Trading coordinates with other teams was not part of our plan/strategy
Results: Others
• Locations:
– Katy, TX
– Miami, FL
– Atlanta, GA
• No evidence was found by our team regarding these locations
Summary of Results
• All 15 cases analyzed with correct assessment
• Dispatched observers to 9 locations
• 5 true locations
• 9 false locations
• 1 sighting reported to us directly
[Workflow diagram: Direct Observation, Cyberspace Search, Analysis, Observer Confirmation]
Social Media in Extreme Events
• Information Source (Tweets + Geotag + Image) together with Google Maps
• Disturbance / Deception
• Crowd Sourcing
[Figure: Tweets per hour (EST), Dec 04–Dec 09, and tweets per 30 minutes on the day of the challenge (total: 6,813 tweets), annotated with the balloon launch, ads, and the winner announcement.]
Three Uses of Tweets
1) Information Source (Tweets + Geotag + Image) together with Google Maps
2) Disturbance
3) Crowd Sourcing
(1) Tweets as Information Source
• Tweets with Geotags
– Currently only from Twitter clients for iPhone
– 39 out of about 20,000
Images from http://www.twitter-360.com/
Image from http://img129.yfrog.com/i/nhhn.jpg/
“Spotted DARPA balloon #1 in this very central location.”

(1) Information Source – Example 1
Map from maps.google.com
(37.7879, -122.4073) – Union Square, San Francisco, CA 94108
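To make the geotag-plus-Google-Maps step concrete, the following sketch (not the team's tooling) turns a tweet's coordinates into a Google Maps link and measures the distance to another candidate report with the haversine formula, using the Union Square coordinates from Example 1 and an assumed second report.

```python
# Minimal sketch: turn a tweet geotag into a Google Maps link and measure how
# far it is from a candidate balloon location (haversine distance).
# Illustrative only; not the team's actual tooling.
import math

def maps_link(lat, lon):
    """Build a maps.google.com URL centred on the given coordinates."""
    return f"https://maps.google.com/?q={lat},{lon}"

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Union Square tweet (Example 1) vs. a hypothetical second report nearby.
tweet = (37.7879, -122.4073)
candidate = (37.7880, -122.4075)            # assumed second report
print(maps_link(*tweet))
print(f"{haversine_km(*tweet, *candidate):.3f} km apart")
```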
(1) Information Source - Example 2
Map from maps.google.com
“Red balloon siting in Marina Del Rey, CA #DARPA”
(33.9741, -118.4317) – Pacific Coast Hwy, Los Angeles, CA 90094
Image from http://twitpic.com/s9k7a
(2) Tweets as Disturbance Means
(3) Tweets as Crowd Sourcing Means by the MIT team
[Figure: Tweet volume on challenge day showing the MIT team’s “Help MIT Team” appeal at balloon.media.mit.edu, the balloon launch, balloons being taken down, and the winner announcement.]
Conclusions
• Open source and social media provide a rich source for gathering and fusing real-time information.
• The Challenge was a nation-wide experiment well aligned with the college’s strategic research on Extreme Events and Web Science.
• Data and experience gathered are being incorporated into SRA courses to enhance the curriculum.
• Integrate social science and STEM education for K-12 using projects related to extreme events.
• NSF project on developing infrastructure for classifying and geo-tagging tweets/messages for disaster relief in Haiti.
Competitive Analysis
• MIT (10/10)
– Market incentive, strong brand, strong pre-contest press; ultimately the winner
• Georgia Tech (9/10)
– Red Cross donation, 2nd place
– Why weren’t they part of our team, or us part of theirs?
• Groundspeak Geocachers (7-8/10)
– Market + donation model
– The “right” sensor network (?)
Discussion
• Lessons Learned
– Start earlier
– Motivate observers (the promise of cash seemed to work)
– Pick the “right” observer network
– Communicate often to maintain motivation
– Use OSINT – it works and it’s cheap!
– Be ready to weed out fabricated data
• Photoshop jobs are easy to do
• Deceptive pictures are hard to refute
– Define and clarify roles/expectations early and often
Future Work
• Automatic Classification of Microblog Data
– Hyun-Woo “Anthony” Kim’s poster yesterday – he’s here today as well, please see him afterwards
• Entity Extraction from Microblog Data
– Useful “entities” in this Challenge
• Data Fusion Methods for “soft” data
– Higher levels of inference
– Difficult to fuse textual information
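As a rough illustration of the microblog-classification direction, a simple bag-of-words classifier can separate sighting reports from noise. The training tweets, labels, and use of scikit-learn below are illustrative assumptions, not the approach described in the poster.

```python
# Toy sketch of classifying tweets as balloon "sighting" vs. "noise".
# Training examples and labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_tweets = [
    "Spotted DARPA balloon #1 in Union Square, coords attached",
    "Red balloon sighted over the park, #DARPA challenge",
    "Anyone else watching football today?",
    "Balloon mortgage rates are dropping this week",
]
train_labels = ["sighting", "sighting", "noise", "noise"]

# TF-IDF over unigrams and bigrams feeding a multinomial Naive Bayes model.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(train_tweets, train_labels)

print(model.predict(["Huge red weather balloon tethered near the library #DARPA"]))
```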
DARPA’s Intentions
What did DARPA intend? http://www.youtube.com/watch?v=P_hjpva8gBM (11:15)
When the problems are great, the tendency for each of us is to step away because we believe, even hope that someone else, perhaps smarter, perhaps with more resources will solve those very difficult problems. But I have found that those imaginary people do not exist. There isn’t someone else. It is people just like you and me… There are no imaginary people to do this work. There is no backup plan.
Regina Dugan, Director of DARPA
Thank You!
• Questions?
• QR Code to my website: