The Italian Hate Map: semantic content analytics for social good
Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)
I-CiTies 2015: 2015 CINI Annual Workshop on ICT for Smart Cities and Communities
Palermo (Italy) - October 29-30, 2015
Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis The Italian Hate Map: semantic content analytics for social good. iCities 2015 Workshop, Palermo (Italy) 30.10.2015
The Italian Hate Map
Inspired by the Hate Map built at Humboldt State University: http://users.humboldt.edu/mstephens/hate/hate_map.html
Joint research with a team of psychologists from the University of Rome and a non-profit agency focused on human rights.
Insight: aggregating raw, people-generated data makes it possible to analyze complex phenomena.
(Not a new idea) Map of cholera in London, 1854
red = cholera cases, blue = water
Research Question: Is it possible to extract and process social media content to detect intolerant posts and identify the most at-risk areas of Italy?
CrowdPulse: a framework for real-time Semantic Analysis of Social Streams
CrowdPulse features
- Social Data Extraction
- Semantic Tagging
- Sentiment Analysis
- Processing & Visualization
CrowdPulse workflow
CrowdPulse Step 1: Social Data Extraction
Extraction: Source, Heuristics
Heuristics: Content, User, Geo, Content+Geo, Page, Group
Examples: #icities2015, #democrats, #traffic, #earthquake, @barack_obama, @comunepalermo
We only extract public content.
Use Case: The Italian Hate Map (CrowdPulse settings)
Heuristics: Twitter content
- 76 intolerant seed terms, defined by the team of psychologists
- 5 intolerance dimensions: violence (against women), racism, homophobia, disability, anti-Semitism
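A minimal sketch of how a content heuristic based on seed terms could select Tweets. It is not the CrowdPulse implementation: the seed terms below are placeholders (the slides do not list the 76 real terms), the assignment of "nano" to the disability dimension is an assumption, and the function names are hypothetical.

```python
import re

# Placeholder seed terms grouped by intolerance dimension (assumption:
# the real 76 terms and their grouping are not given in the slides).
SEED_TERMS = {
    "disability": ["nano"],
    "racism": ["termine_a", "termine_b"],
}

def build_matcher(seed_terms):
    """Compile one case-insensitive, whole-word pattern per dimension."""
    return {
        dim: re.compile(
            r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b",
            re.IGNORECASE,
        )
        for dim, terms in seed_terms.items()
    }

def match_dimensions(text, matcher):
    """Return the sorted dimensions whose seed terms occur in the text."""
    return sorted(dim for dim, pattern in matcher.items() if pattern.search(text))

matcher = build_matcher(SEED_TERMS)
tweets = [
    "Ho comprato un iPod Nano",
    "Il mio nano non segnerà mai",
    "Oggi c'è traffico a Palermo",
]
# Keep only Tweets that contain at least one seed term.
extracted = [t for t in tweets if match_dimensions(t, matcher)]
```

Both Tweets containing "nano" are extracted, including the one about the iPod Nano, which is the kind of noise a purely keyword-based heuristic produces.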
Extracted content (seed term: nano/midget):
- Tweet about an Italian ministry
- Tweet about iPod nano
- Tweet about an Italian football player
Many non-intolerant Tweets are extracted!
Sentiment Analysis and Semantic Tagging of the content
Semantic Tagging: Motivations
Keyword-based representation introduces a lot of noise in the analysis: does "nano" refer to a midget or to the iPod nano?
Example: “È inutile, il mio nano non segnerà mai” ("It is no use, my midget will never score"). Intolerant or not intolerant?
CrowdPulse Step 2: Semantic Tagging
Solution: semantic processing of extracted content
Entity Linking Algorithms: TagMe (http://tagme.di.unipi.it), DBpedia Spotlight (http://spotlight.dbpedia.org)
• Input: textual content
• Output: identification and disambiguation of the entities mentioned in the text
Use Case: non-intolerant Tweets are detected and filtered out.
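The filtering idea can be illustrated with a toy disambiguation step. This sketch does not call TagMe or DBpedia Spotlight; it swaps in a simple context-overlap heuristic over a hand-made sense inventory. The entities, cue words, and function names are all assumptions made for illustration.

```python
# Hypothetical sense inventory for the ambiguous seed term "nano".
# Each candidate entity lists a few context cues and whether that
# reading can be intolerant. All entries are illustrative assumptions.
SENSES = {
    "nano": [
        {"entity": "iPod_Nano", "cues": {"ipod", "apple", "gb", "musica"}, "intolerant": False},
        {"entity": "Nanismo", "cues": {"sei", "zitto"}, "intolerant": True},
    ]
}

def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [tok.strip(".,!?\"'").lower() for tok in text.split()]

def link(term, text):
    """Pick the candidate whose cues overlap most with the text; None if no cue matches."""
    tokens = set(tokenize(text))
    best, best_score = None, 0
    for sense in SENSES.get(term, []):
        score = len(sense["cues"] & tokens)
        if score > best_score:
            best, best_score = sense, score
    return best

def keep_tweet(term, text):
    """Drop the Tweet only when it is linked to a non-intolerant entity."""
    sense = link(term, text)
    return sense is None or sense["intolerant"]
```

Under these assumptions, a Tweet about the iPod Nano is linked to the product entity and discarded, while a Tweet with no matching cue stays in the set for the later steps.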
CrowdPulse Step 3: Sentiment Analysis
Sentiment Analysis: Motivations
Is this content conveying any opinion?
This is a crucial issue if people-based findings have to be generated.
Sentiment Analysis: Definition
“It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes” (*)
(*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008.
We concentrated on the polarity detection task.
Use Case: Tweets with positive or neutral sentiment are detected and filtered out.
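A small sketch of polarity detection used as a filter. The slides do not say which polarity classifier CrowdPulse uses, so this example assumes a tiny hand-made lexicon; the words, scores, and function names are hypothetical.

```python
# Hypothetical polarity lexicon (word -> score). A real system would rely
# on a trained classifier or a full sentiment lexicon.
LEXICON = {"odio": -2, "schifo": -2, "pessimo": -1, "bello": 1, "amo": 2, "grazie": 1}

def polarity(text):
    """Classify a text as negative, positive, or neutral by summing word scores."""
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"

def keep_negative(tweets):
    """Filter out Tweets with positive or neutral polarity."""
    return [t for t in tweets if polarity(t) == "negative"]
```

Only Tweets classified as negative survive this step, which mirrors the filtering rule stated for the use case.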
CrowdPulse Step 4: Processing
Use Case: we have to build a map, so we only need geotagged content.
Definition of heuristics to increase the number of geotagged Tweets.
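The slides do not describe the geotagging heuristics, so the following is only one possible sketch: use the explicit geotag when present, otherwise fall back to a city name found in the user's profile location. The field names `coordinates` and `profile_location`, the gazetteer, and its approximate coordinates are assumptions.

```python
from typing import Optional, Tuple

# Hypothetical gazetteer: city name -> approximate (lat, lon).
GAZETTEER = {
    "palermo": (38.12, 13.36),
    "bari": (41.12, 16.87),
    "roma": (41.90, 12.50),
}

def resolve_location(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return coordinates from the geotag, else from the profile location, else None."""
    if tweet.get("coordinates") is not None:
        return tuple(tweet["coordinates"])
    profile = (tweet.get("profile_location") or "").lower()
    for city, coords in GAZETTEER.items():
        if city in profile:
            return coords
    return None

tweets = [
    {"text": "a", "coordinates": [45.0, 9.0]},
    {"text": "b", "profile_location": "Bari, Italia"},
    {"text": "c", "profile_location": None},
]
# Keep only the Tweets that can be placed on the map.
located = [t for t in tweets if resolve_location(t) is not None]
```

A fallback of this kind trades precision for coverage: a profile location says where the user lives, not necessarily where the Tweet was written.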
Use Case: The Italian Hate Map
Dimension        #Tweets     #Geo     %Geo
Homophobia       110,774     8,501    7.66%
Racism           154,170     1,940    1.24%
Violence       1,102,494    28,886    2.62%
Disability       479,654     3,410    0.75%
Anti-Semitism      6,000     1,150   18.03%
CrowdPulse Step 4: Data Visualization
Use Case: CrowdPulse output
Hate maps (based on OpenStreetMap): Violence against women, Disability, Racism, Homophobia
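Before a map can be drawn, geotagged Tweets have to be aggregated by area. The slides do not say how CrowdPulse does this; the sketch below assumes a regular latitude/longitude grid, with `cell_size` (in degrees) and the function names chosen for illustration.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_size=0.5):
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))

def density(points, cell_size=0.5):
    """Count how many points fall in each grid cell."""
    return Counter(grid_cell(lat, lon, cell_size) for lat, lon in points)

# Two points near Palermo and one near Bari (approximate coordinates).
points = [(38.12, 13.36), (38.20, 13.40), (41.12, 16.87)]
counts = density(points)
```

The per-cell counts can then be rendered as a colored overlay on a base map, with darker cells marking a higher concentration of intolerant Tweets.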
Conclusions
1. Crowdsourcing-based approach: social content containing the seed terms is extracted and processed in real time.
2. Semantic Processing exploited to delete non-intolerant Tweets.
3. Sentiment Analysis used to filter out Tweets with irony.
4. Analytics Console used to build real-time hate maps.
Almost 2,000,000 pieces of social content extracted and analyzed.
Lessons Learned
Given the maps and the output of the linguistic analysis of intolerant Tweets (co-occurrences between terms, time lapse, etc.), the team of psychologists defined some guidelines to tackle and prevent intolerant behaviors.
These guidelines were freely distributed to public administrations in early 2015.
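As an illustration of the co-occurrence analysis mentioned above, the sketch below counts how often two terms appear in the same Tweet. It is a generic technique, not the authors' exact procedure; the tokenization is deliberately naive and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrences(tweets):
    """Count unordered pairs of distinct terms that appear in the same Tweet."""
    counts = Counter()
    for text in tweets:
        # Deduplicate and sort tokens so each pair is counted once per Tweet
        # and always in the same (alphabetical) order.
        tokens = sorted({tok.strip(".,!?").lower() for tok in text.split()} - {""})
        counts.update(combinations(tokens, 2))
    return counts

pairs = cooccurrences(["a b c", "b a"])
```

Pairs with high counts point to terms that tend to be used together, which is the kind of signal a domain expert can inspect when drafting guidelines.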
Definition of a framework for real-time semantic content analysis
Pipeline of state-of-the-art techniques: Semantic Processing, Sentiment Analysis, Machine Learning, Data Visualization
Use Case: The Italian Hate Map
Thanks to the huge availability of textual data, very complex phenomena can be analyzed in a totally new way.