Social Media Use During Humanitarian Emergencies and Disasters

Alexandra Olteanu
Social Computing + Computational Social Science
IBM Science for Social Good Applied Data Science
IBM TJ Watson Research Center

Post on 24-Jun-2020

IBM Science for Social Good Initiative
Directed by Aleksandra Mojsilovic & Kush Varshney

IBM Science for Social Good Initiative: Exploring a new approach to addressing the world’s challenges

Partners IBM Research scientists with academic fellows and subject matter experts from a diverse range of non-governmental organizations (NGOs) to tackle emerging societal challenges using science and technology.

IBM Science for Social Good Initiative: Our projects demonstrate the realm of what’s possible

Events & Social Media: High volumes of event-related tweets

• 25+ million tweets for Hurricane Sandy
• 24+ million tweets for the 2016 Academy Awards
• 10+ million tweets for the Supreme Court ruling on marriage equality
• 4+ million tweets about the UK 2015 elections

Endorsed by governmental agencies

Social media are to enhance, not replace

MicroMappers, CrisisLex, CREES, AIDR, …

What kind of information do people share during humanitarian crises?

with Carlos Castillo and Sarah Vieweg

Data Collection & Annotation

✓ Twitter sample API
  ✓ 2012 & 2013
  ✓ ~1% random sample of Twitter public stream
  ✓ ~130+ million tweets per month
✓ Keyword-based searches
  ✓ proper names of affected location: manila floods, boston bombings, #newyork derailment
  ✓ proper names of meteorological phenomena: sandy hurricane, typhoon yolanda
  ✓ promoted hashtags: #SafeNow, #RescuePH, #ReliefPH
✓ 26 crisis events
  ✓ 14 countries and 8 languages
  ✓ 12 different hazard types: earthquakes, wildfires, floods, bombings, shootings, etc.
  ✓ 15 instantaneous crises
  ✓ 15 diffused crises
✓ 1000 annotated tweets per crisis
  ✓ Content dimensions: Informativeness, Source of information, Type of information
  ✓ Crowdsource workers from the affected countries

Content & Source Variations

Affected individuals 20% (min 5%, max 57%)
Infrastructure & Utilities 7% (min 0%, max 22%)
Eyewitness accounts 9% (min 0%, max 54%)
Outsiders 38% (min 3%, max 65%)

[Figure: per-crisis charts. First panel: Singapore Haze 2013, Typhoon Pablo 2012, Australia Wildfire 2013, Guatemala Quake 2012, Boston Bombings 2013, Typhoon Yolanda 2013. Second panel: Savar Building Collapse 2013, Typhoon Pablo 2012, Alberta Floods 2013, Guatemala Quake 2012, Boston Bombings 2013, Typhoon Yolanda 2013.]

Data Patterns: Types

[Figure: crises grouped by the similarity of their information-type distributions; the axis is labeled "lower similarity". Events in the order shown: Savar building collapse’13, LA airport shootings’13, NY train crash’13, Russia meteor’13, Colorado wildfires’12, Guatemala earthquake’12, Glasgow helicopter crash’13, West Texas explosion’13, Lac Megantic train crash’13, Venezuela refinery’12, Bohol earthquake’13, Boston bombings’13, Brazil nightclub fire’13, Spain train crash’13, Manila floods’13, Alberta floods’13, Philippines floods’12, Typhoon Yolanda’13, Costa Rica earthquake’12, Singapore haze’13, Italy earthquakes’12, Australia bushfire’13, Queensland floods’13, Colorado floods’13, Sardinia floods’13, Typhoon Pablo’12.]

Group labels: Natural, diffused, and progressive; Human-induced, focalized, and instantaneous

Data Patterns: Sources

[Figure: crises grouped by the similarity of their source distributions; the axis is labeled "lower similarity". Events in the order shown: Singapore haze’13, Philippines floods’12, Alberta floods’13, Manila floods’13, Italy earthquakes’12, Australia bushfire’13, Colorado wildfires’12, Colorado floods’13, Queensland floods’13, Typhoon Pablo’12, Typhoon Yolanda’13, Brazil nightclub fire’13, Russia meteor’13, Boston bombings’13, Glasgow helicopter crash’13, Bohol earthquake’13, Venezuela refinery’12, Sardinia floods’13, West Texas explosion’13, NY train crash’13, Costa Rica earthquake’12, LA airport shootings’13, Savar building collapse’13, Lac Megantic train crash’13, Guatemala earthquake’12, Spain train crash’13.]

Group labels: Instantaneous, focalized, and human-induced; Natural, diffused, and progressive

Similar events tend to have a similar distribution of message types & sources.
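One way to make "similar distribution" concrete is to compare per-event category shares with a similarity measure. The sketch below uses cosine similarity purely as an illustration; the talk does not state which measure or clustering method was used, the event names are placeholders, and the share values are made up for the example.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length vectors of category shares."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


# Made-up shares over four message types; NOT figures from the talk.
events = {
    "event_a": [0.40, 0.30, 0.20, 0.10],
    "event_b": [0.38, 0.32, 0.18, 0.12],
    "event_c": [0.05, 0.10, 0.25, 0.60],
}


def most_similar(name, events):
    """Return the other event whose distribution is closest to that of `name`."""
    others = (e for e in events if e != name)
    return max(others, key=lambda e: cosine_similarity(events[name], events[e]))


nearest_to_a = most_similar("event_a", events)
```

Repeating the nearest-neighbor step over all pairs is the basic ingredient of the kind of grouping the figures show; a full hierarchical clustering would build on the same pairwise similarities.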

Sources & Message Types

[Figure: heat map of message types by source. Sources: Media, Outsiders, Eyewitness, Government, NGOs, Business. Message types: Other Useful Info., Sympathy, Affected Individuals, Donations & Volunteering, Caution & Advice, Infrastructure & Utilities. Color scale: 0, 5%, 10%, >15%.]

Temporal Distribution: Information Types

[Figure: timeline with a "peak" marker for each information type. Time axis: 12h, 24h, 36h, 48h, … several days. Information types shown: Caution & Advice, Sympathy & Support, Affected Individuals, Infrastructure & Utilities, Other Useful Info., Donations & Volunteering.]


Temporal Distribution: Sources

[Figure: timeline with a "peak" marker for each source, labeled "Progressive". Time axis: 12h, 24h, 36h, 48h, … several days. Sources shown: Eyewitness, Government, Businesses, NGOs, Outsiders, Media.]

How do we collect social media data during humanitarian crises?

with Carlos Castillo, Fernando Diaz, and Sarah Vieweg

Data Collection: How Is It Done?

Tweets are queried by

Content: #prayforwest #abflood
• Low recall: 33%. Not everyone uses the keywords.
• Maximum 400 terms.

Location: longitude: [-97.5, -96.5] & latitude: [31.5, 32]
• Low precision: 12%. Not everyone on the ground talks about the event.
• Maximum 25 geo-rectangles.

Maximum 1% of all tweets
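The two query modes can be sketched offline as simple filters. This is a minimal illustration and not the Twitter API itself: the tweet dictionaries, the field names `text` and `coords`, the function names, and the three example tweets are all assumptions made up for the sketch. Only the hashtags and the bounding box come from the slide.

```python
# Hypothetical tweet records; field names are assumptions for this sketch.
KEYWORDS = {"#prayforwest", "#abflood"}               # hashtags from the slide
BBOX = {"lon": (-97.5, -96.5), "lat": (31.5, 32.0)}   # rectangle from the slide


def matches_keywords(text, keywords=KEYWORDS):
    """True if any whitespace-separated token equals a tracked keyword (case-insensitive)."""
    tokens = {tok.lower().strip(".,!?:;") for tok in text.split()}
    return not tokens.isdisjoint(keywords)


def in_bbox(coords, bbox=BBOX):
    """True if (lon, lat) lies inside the rectangle; tweets without coordinates never match."""
    if coords is None:
        return False
    lon, lat = coords
    return (bbox["lon"][0] <= lon <= bbox["lon"][1]
            and bbox["lat"][0] <= lat <= bbox["lat"][1])


# Invented examples, for illustration only.
tweets = [
    {"text": "Thinking of everyone in West tonight #PrayForWest", "coords": None},
    {"text": "Huge blast, windows shattered on our street", "coords": (-97.1, 31.8)},
    {"text": "Great tacos for lunch", "coords": (-97.2, 31.6)},
]

keyword_sample = [t for t in tweets if matches_keywords(t["text"])]
geo_sample = [t for t in tweets if in_bbox(t["coords"])]
```

In this toy data the keyword filter misses the on-the-ground report that uses no hashtag (a recall problem), while the location filter also returns the unrelated lunch tweet (a precision problem).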

Key Insight: Distill a Crisis Lexicon

Example lexicon terms: damage, affected people, people displaced, donate blood, text redcross, stay safe, crisis deepens, evacuated, toll raises, …
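A lexicon like this can be distilled by scoring terms according to how strongly they are associated with crisis tweets. The talk's precision/recall plot names PMI, chi-squared, and frequency as scoring options; the sketch below shows only a plain PMI score over document frequencies. It is an assumption-laden illustration, not the authors' exact procedure: the function name, the `min_count` threshold, the whitespace tokenization, and the toy tweets are all made up for the example.

```python
import math
from collections import Counter


def pmi_scores(crisis_docs, other_docs, min_count=2):
    """Score terms by PMI(term, crisis) = log2( P(term | crisis) / P(term) ).

    Probabilities are estimated from document frequencies; terms seen in
    fewer than `min_count` crisis documents are skipped.
    """
    crisis_df = Counter(t for doc in crisis_docs for t in set(doc.lower().split()))
    other_df = Counter(t for doc in other_docs for t in set(doc.lower().split()))
    n_crisis = len(crisis_docs)
    n_all = n_crisis + len(other_docs)
    scores = {}
    for term, count in crisis_df.items():
        if count < min_count:
            continue
        p_term_given_crisis = count / n_crisis
        p_term = (count + other_df[term]) / n_all
        scores[term] = math.log2(p_term_given_crisis / p_term)
    return scores


# Invented toy tweets, for illustration only.
crisis = [
    "people evacuated after the flood",
    "stay safe everyone",
    "hundreds evacuated tonight",
    "please stay safe and donate blood",
]
other = [
    "stay tuned for the game tonight",
    "the new album is out",
    "watching the game",
]

scores = pmi_scores(crisis, other)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Terms that appear almost only in crisis tweets (here "evacuated" and "safe") score higher than terms that are also common elsewhere (here "stay").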

Precision vs. Recall

[Figure: precision-recall plot. X-axis: Recall (% of crisis tweets retrieved). Y-axis: Precision (on-crisis % of retrieved tweets). Labeled points: Desired, KW-based, Geo-based. Labeled regions: High Precision, High Recall. Lexicon scoring variants plotted: PMI, PMI+Freq., CHI2, Freq. Tick labels shown: 100%, 30%, 100%, 10%.]

Keyword-based sampling: #sandy, #bostonbombings, #qldflood

Location-based sampling: tweets geo-tagged in the area of the disaster

Precision is straightforward to measure. Recall requires a complete data collection. We use geo data as proxy.
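The two measures can be written down directly. The sketch below is one plausible reading of "geo data as proxy": recall is estimated as the share of on-crisis tweets in the geo sample that the other sampler also retrieves. The function names, the tweet IDs, and the labels are hypothetical and made up for the example.

```python
def precision(retrieved_ids, on_crisis_labels):
    """Share of retrieved tweets that are labeled as on-crisis."""
    retrieved_ids = list(retrieved_ids)
    if not retrieved_ids:
        return 0.0
    hits = sum(1 for tweet_id in retrieved_ids if on_crisis_labels[tweet_id])
    return hits / len(retrieved_ids)


def recall_via_geo_proxy(retrieved_ids, geo_crisis_ids):
    """Share of on-crisis tweets from the geo sample that were also retrieved.

    The geo sample stands in for the (unavailable) complete collection.
    """
    geo_crisis_ids = set(geo_crisis_ids)
    if not geo_crisis_ids:
        return 0.0
    return len(geo_crisis_ids & set(retrieved_ids)) / len(geo_crisis_ids)


# Invented IDs and labels, for illustration only.
keyword_retrieved = [1, 2, 3, 4]
labels = {1: True, 2: True, 3: True, 4: False}
geo_on_crisis = {1, 5, 6}

p = precision(keyword_retrieved, labels)
r = recall_via_geo_proxy(keyword_retrieved, geo_on_crisis)
```

With these toy values the keyword sample is mostly on-topic but recovers only one of the three on-crisis tweets seen in the geo sample, mirroring the high-precision, low-recall corner of the plot.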

Representativeness

[Figure: bar charts of the share of tweets per source (axis ticks 0, 15, 30, 45, 60) for three samples: Geo-sampling (baseline), Keyword-sampling, and S4. Sources: Eyewitness, Government, NGOs, For-profit Corp., Media, Other sources.]

Keyword-based samples appear to overrepresent media reports and underrepresent eyewitness accounts.

Lexicon-based samples better preserve the original distribution of message types and sources.
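How far a sample drifts from the geo-sampling baseline can be summarized with a single distance between source distributions. The sketch below uses total variation distance as one possible choice; the talk does not say which measure, if any, was used, and the shares and sample names here are made up for the example.

```python
def total_variation(p, q):
    """Half the L1 distance between two distributions given as {category: share} dicts."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Invented shares, NOT values read from the charts.
baseline = {"Eyewitness": 0.30, "Media": 0.30, "Other sources": 0.40}
sample_x = {"Eyewitness": 0.10, "Media": 0.50, "Other sources": 0.40}
sample_y = {"Eyewitness": 0.25, "Media": 0.35, "Other sources": 0.40}

drift_x = total_variation(baseline, sample_x)
drift_y = total_variation(baseline, sample_y)
```

A smaller value means the sample preserves the baseline mix of sources more closely; in this toy data `sample_y` stays closer to the baseline than `sample_x`, which shifts weight from eyewitness accounts to media.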