knoesis-semantic filtering-tutorials
TRANSCRIPT
![Page 1: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/1.jpg)
Semantic Filtering
An example of Semantic technologies for real-time analysis
Pavan Kapanipathi
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
Wright State University, USA
Tutorial @ Kno.e.sis Centre: Semantics Approach to Big Data and Event Processing, Oct 7-9, 2015
![Page 2: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/2.jpg)
Streams are everywhere
Social Data: Text, Images, Videos
Sensor Data Streams
![Page 3: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/3.jpg)
Information Overload
500M users generate 500M tweets per day
It’s not information overload. It’s filter failure
-- Clay Shirky
![Page 4: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/4.jpg)
Each of our projects faces Information Overload
• Disaster Management
  • Hazards SEES
• Healthcare Issues
  • Depression
• Societal Issues
  • Edrug Trends
  • Harassment
![Page 5: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/5.jpg)
• Filtering is necessary
• Understanding the requirements and utilizing semantics for filtering is important
Semantic Filtering
![Page 6: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/6.jpg)
Two Main Topics
• Twarql
  • Streaming annotation and flexible querying on Twitter
• Continuous Semantics
  • Tracking dynamic topics on Twitter
![Page 7: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/7.jpg)
Twarql
Tracking the health care debate in the United States on social media
Health Care Reform
Healthcare reform legislation in the United States
Patient Protection and Affordable Care Act (Obamacare)
![Page 8: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/8.jpg)
Twarql
![Page 9: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/9.jpg)
Extraction Pipeline - Tweet
I think it’s good deal Apple Ipad Tablet (3G, wifi, WiFi + 3G) Hard Nylon Cube Carrying Case for ipad ( iPad.. http://bit.ly/cry6LF)
Dbpedia:Ipad
Dbpedia:Tablet
URLs
http://penguinkang.com/tweetprobe/
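A minimal sketch of the annotation step on this slide, assuming a tiny hand-written surface-form dictionary. `SURFACE_FORMS` and `annotate` are hypothetical names; a real pipeline would rely on a full DBpedia lexicon and a proper entity spotter. The sketch returns the entity labels as written on the slide plus the URLs found in the tweet.

```python
import re

# Hypothetical surface-form dictionary (assumption, not the tool's real lexicon).
SURFACE_FORMS = {
    "ipad": "Dbpedia:Ipad",
    "tablet": "Dbpedia:Tablet",
}

# Matches http/https URLs up to whitespace or a closing parenthesis.
URL_PATTERN = re.compile(r"https?://[^\s)]+")


def annotate(tweet):
    """Return the entities and URLs spotted in a tweet."""
    urls = URL_PATTERN.findall(tweet)
    # Remove URLs first so their fragments are not mistaken for entity mentions.
    text = URL_PATTERN.sub(" ", tweet).lower()
    tokens = set(re.findall(r"[a-z0-9]+", text))
    entities = sorted({uri for form, uri in SURFACE_FORMS.items() if form in tokens})
    return {"entities": entities, "urls": urls}


if __name__ == "__main__":
    tweet = ("I think it's good deal Apple Ipad Tablet (3G, wifi, WiFi + 3G) "
             "Hard Nylon Cube Carrying Case for ipad ( iPad.. http://bit.ly/cry6LF)")
    print(annotate(tweet))
```

Exact token matching keeps the sketch short; it would miss multi-word entity names, which a production spotter has to handle.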
![Page 10: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/10.jpg)
RDF
• RDF Annotation
• Common RDF/OWL Data formats.
• FOAF, SIOC, OPO, MOAT
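As an illustration of what an RDF annotation of a tweet could look like, the sketch below serializes one annotated tweet as N-Triples using three SIOC core properties (`sioc:content`, `sioc:topic`, `sioc:links_to`). The choice of properties, the function name, and the example URIs are assumptions made for this sketch; the schema actually used by Twarql may differ.

```python
SIOC = "http://rdfs.org/sioc/ns#"


def tweet_to_ntriples(tweet_uri, text, entity_uris, urls):
    """Serialize one annotated tweet as a list of N-Triples lines."""

    def literal(value):
        # Escape backslashes, quotes and newlines as N-Triples requires.
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n"))
        return f'"{escaped}"'

    triples = [f"<{tweet_uri}> <{SIOC}content> {literal(text)} ."]
    # One topic triple per extracted entity.
    triples += [f"<{tweet_uri}> <{SIOC}topic> <{uri}> ." for uri in entity_uris]
    # One link triple per URL mentioned in the tweet.
    triples += [f"<{tweet_uri}> <{SIOC}links_to> <{url}> ." for url in urls]
    return triples


if __name__ == "__main__":
    # The tweet URI below is a placeholder, not a real resource.
    for line in tweet_to_ntriples(
        "http://example.org/tweet/1",
        "Hard Nylon Cube Carrying Case for ipad",
        ["http://dbpedia.org/resource/IPad"],
        ["http://bit.ly/cry6LF"],
    ):
        print(line)
```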
![Page 11: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/11.jpg)
: Health_care_reform
Twarql – Use Case
![Page 12: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/12.jpg)
Demo
http://knoesis.wright.edu/library/tools/twarql/demo.swf
![Page 13: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/13.jpg)
Continuous Semantics
![Page 14: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/14.jpg)
Dynamic Topics
Continuously Evolving on
Entity – Event relevance changes
Many entities are involved
![Page 15: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/15.jpg)
Dynamic Topics
Manually crawl using keywords
“indianelection” “jan25” “sandy” “swineflu” “ebola”
![Page 16: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/16.jpg)
Dynamic Topics
Manually updating keywords to get topic-relevant tweets is not feasible
“indianelection”: “modi”, “bjp”, “congress”
“jan25”: “egypt”, “tunisia”, “arabspring”
“sandy”: “newyork”, “redcross”, “fema”
“swineflu” “ebola”
![Page 17: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/17.jpg)
Problem
How can we automatically update the filters to track a dynamically evolving topic on Twitter?
![Page 18: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/18.jpg)
Hashtags as Filters
• Identify a topic on Twitter
• Tweets with hashtags are more informative
• Users have a lot of freedom to create them
• Some get popular, most die
![Page 19: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/19.jpg)
Exploring Hashtags as Evolving Filters for Dynamic Topics
Hashtag Filters
Colorado Shooting (CS): Tweets: 122,062; Tags: 192,512; Distinct: 12,350; 100% Retrieval: 7,763
Occupy Wall Street (OWS): Tweets: 6,077,378; Tags: 15,963,209; Distinct: 191,602; 100% Retrieval: 21,314
![Page 20: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/20.jpg)
Hashtag distributions
The top 1% of hashtags retrieves around 85% of the tweets
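The kind of coverage figure quoted on this slide can be computed along the following lines. This is a sketch under assumptions: each tweet is represented as a set of hashtags, "top" means most frequent, and the function name is hypothetical.

```python
from collections import Counter


def coverage_of_top_hashtags(tweets, fraction):
    """Share of tweets containing at least one of the most frequent hashtags.

    tweets:   list of sets of hashtags, one set per tweet
    fraction: portion of distinct hashtags to keep, e.g. 0.01 for the top 1%
    """
    if not tweets:
        return 0.0
    counts = Counter(tag for tags in tweets for tag in tags)
    # Always keep at least one hashtag so small samples still work.
    keep = max(1, int(len(counts) * fraction))
    top = {tag for tag, _ in counts.most_common(keep)}
    covered = sum(1 for tags in tweets if tags & top)
    return covered / len(tweets)


if __name__ == "__main__":
    sample = [{"#ows"}, {"#ows", "#nyc"}, {"#ows"}, {"#cats"}]
    print(coverage_of_top_hashtags(sample, 0.34))
```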
![Page 21: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/21.jpg)
Hashtag Filters Co-occurrence Graph
Colorado Shooting, Occupy Wall Street
Event-related hashtags co-occur with each other
![Page 22: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/22.jpg)
Summarizing Hashtag Analysis
Starting with one of the event-relevant hashtags, we can reach other relevant hashtags by co-occurrence
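The observation above can be sketched as a breadth-first walk over a hashtag co-occurrence graph. The representation (tweets as lists of hashtags), the hop limit, and the function names are assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence(tweets):
    """Build an undirected graph: graph[a][b] = number of tweets with both a and b."""
    graph = defaultdict(lambda: defaultdict(int))
    for tags in tweets:
        for a, b in combinations(sorted(set(tags)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def reachable(graph, seed, max_hops):
    """Return {hashtag: hop distance} for hashtags within max_hops of the seed."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        tag = queue.popleft()
        if distance[tag] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(tag, {}):
            if neighbour not in distance:
                distance[neighbour] = distance[tag] + 1
                queue.append(neighbour)
    return distance


if __name__ == "__main__":
    tweets = [["#ows", "#occupy"], ["#occupy", "#nyc"], ["#cats"]]
    graph = build_cooccurrence(tweets)
    print(reachable(graph, "#ows", 2))
```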
![Page 23: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/23.jpg)
Determining Relevancy of Co-occurring Hashtags
#indianelection2015
#modikisarkar
Too many co-occurring hashtags
![Page 24: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/24.jpg)
Determining Relevancy of Co-occurring Hashtags
#indianelection2015
#modikisarkar
Co-occurring: Threshold δ
Preferably a prominent hashtag
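One way to read the threshold δ is as a minimum co-occurrence ratio: a hashtag becomes a candidate only if it appears in at least a fraction δ of the tweets carrying the seed hashtag. That reading, and the names below, are assumptions; the slide does not define δ precisely.

```python
from collections import Counter


def candidate_hashtags(tweets, seed, delta):
    """Rank hashtags co-occurring with `seed` in at least a fraction `delta` of its tweets."""
    seed_tweets = [set(tags) for tags in tweets if seed in tags]
    if not seed_tweets:
        return []
    counts = Counter(tag for tags in seed_tweets for tag in tags if tag != seed)
    total = len(seed_tweets)
    scored = [(tag, n / total) for tag, n in counts.items() if n / total >= delta]
    # Most prominent co-occurring hashtag first; ties broken alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    tweets = [
        ["#indianelection2015", "#modikisarkar"],
        ["#indianelection2015", "#modikisarkar", "#cricket"],
        ["#indianelection2015"],
    ]
    print(candidate_hashtags(tweets, "#indianelection2015", 0.5))
```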
![Page 25: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/25.jpg)
Does Hashtag Co-occurrence Work?
o No. Co-occurrence alone does not work
  o Many noisy or unrelated hashtags co-occur
o Determine the “dynamic” relevance of the top co-occurring hashtag to the dynamic topic
![Page 26: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/26.jpg)
Determining Relevancy of Co-occurring Hashtags
#indianelection2015
#modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring: Normalized Frequency Scoring (Vector Space Model)
Narendra Modi: 0.9, BJP: 0.7, NDA: 0.6, India: 0.4, Elections: 0.2, Rahul Gandhi: 0.2, Congress: 0.2
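A rough sketch of normalized frequency scoring over the latest K tweets of a hashtag. It assumes entities have already been extracted per tweet and that "normalized" means dividing by the count of the most frequent entity; the slide does not state the normalization, so treat both as assumptions.

```python
from collections import Counter


def score_entities(tweet_entities, k):
    """Score entities by frequency in the latest k tweets, scaled to at most 1.

    tweet_entities: list of entity lists, oldest tweet first
    k:              number of most recent tweets to consider
    """
    if k <= 0:
        return {}
    latest = tweet_entities[-k:]
    # Count each entity at most once per tweet.
    counts = Counter(entity for entities in latest for entity in set(entities))
    if not counts:
        return {}
    highest = max(counts.values())
    return {entity: n / highest for entity, n in counts.items()}


if __name__ == "__main__":
    extracted = [["Narendra Modi", "BJP"], ["Narendra Modi"], ["India"]]
    print(score_entities(extracted, 3))
```

The resulting dictionary is the weighted entity vector that represents the hashtag in the vector space model.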
![Page 27: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/27.jpg)
Determining Relevancy of Co-occurring Hashtags (Vector Space Model)
#indianelection2015
#modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring
Narendra Modi: 0.9, BJP: 0.7, NDA: 0.6, India: 0.4, Elections: 0.2, Rahul Gandhi: 0.2, Congress: 0.2
Dynamically Updated Background Knowledge
Indian General Election, 2014
![Page 28: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/28.jpg)
Event Relevant Background Knowledge
o Wikipedia Event Pages
![Page 29: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/29.jpg)
Event Relevant Background Knowledge
o Wikipedia Event Pages
![Page 30: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/30.jpg)
Event Relevant Background Knowledge
o Entities mentioned on the Wikipedia Event page are relevant to the Event
![Page 31: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/31.jpg)
Event Relevant Background Knowledge – Graph Structure
o Wikipedia’s hyperlink structure is very rich
  o Page-Page (Wikipedia) links
Indian General Election, 2014
Narendra Modi
Rahul Gandhi
NDA (India)
UPA (India)
BJP
Indian National Congress
![Page 32: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/32.jpg)
Determining Relevancy of Co-occurring Hashtags (Vector Space Model)
#indianelection2015
#modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring
Narendra Modi: 0.9, BJP: 0.7, NDA: 0.6, India: 0.4, Elections: 0.2, Rahul Gandhi: 0.2, Congress: 0.2
Indian General Election, 2014
Extract, Periodically Update Hyperlink structure
One hop from Event Page
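The "one hop from Event Page" extraction can be sketched over an in-memory link table. The `links` dictionary (page title to list of outgoing page titles) stands in for whatever Wikipedia dump or API the real system reads; it and the function name are hypothetical.

```python
def one_hop_structure(links, event_page):
    """Return out-links for the event page and for every page it links to.

    links: dict mapping a page title to the titles it links to
    """
    neighbours = sorted(set(links.get(event_page, [])))
    structure = {event_page: neighbours}
    for page in neighbours:
        # Keep the full out-link list of each neighbour for later scoring.
        structure[page] = sorted(set(links.get(page, [])))
    return structure


if __name__ == "__main__":
    links = {
        "Indian General Election, 2014": ["Narendra Modi", "BJP"],
        "Narendra Modi": ["Indian General Election, 2014", "BJP"],
        "BJP": ["Narendra Modi"],
        "Cricket": ["India"],
    }
    print(one_hop_structure(links, "Indian General Election, 2014"))
```

Re-running the extraction on a schedule and replacing the stored structure is one simple way to realize the periodic update.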
![Page 33: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/33.jpg)
Event Relevant Background Knowledge
o Hyperlink structure is dynamically updated
Indian General Election, 2014
Narendra Modi
Rahul Gandhi
NDA (India)
UPA (India)
BJP
Indian National Congress
Dates in figure: 10 May 2010
![Page 34: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/34.jpg)
Event Relevant Background Knowledge
o Hyperlink structure is dynamically updated
Indian General Election, 2014
Narendra Modi
Rahul Gandhi
NDA (India)
UPA (India)
BJP
Indian National Congress
Dates in figure: 10 May 2010; 29 March 2013 (shown four times)
![Page 35: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/35.jpg)
Event Relevant Background Knowledge
o Hyperlink structure is dynamically updated
Indian General Election, 2014
Narendra Modi
Rahul Gandhi
NDA (India)
UPA (India)
BJP
Indian National Congress
Dates in figure: 10 May 2010; 29 March 2013 (shown four times); 20 May 2013 (shown twice)
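To make the idea of a dynamically updated hyperlink structure concrete, the sketch below compares two snapshots of an event page's outgoing links and reports what changed. The snapshot contents are invented for illustration and do not reproduce the dates or links in the figure.

```python
def link_changes(old_links, new_links):
    """Compare two snapshots of a page's outgoing links."""
    old, new = set(old_links), set(new_links)
    return {
        "added": sorted(new - old),    # links that appeared since the last snapshot
        "removed": sorted(old - new),  # links that disappeared
    }


if __name__ == "__main__":
    # Hypothetical snapshots of the event page at two points in time.
    earlier = ["Indian National Congress"]
    later = ["Indian National Congress", "Narendra Modi", "BJP"]
    print(link_changes(earlier, later))
```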
![Page 36: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/36.jpg)
Determining Relevancy of Co-occurring Hashtags (Vector Space Model)
#indianelection2015
#modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring
Narendra Modi: 0.9, BJP: 0.7, NDA: 0.6, India: 0.4, Elections: 0.2, Rahul Gandhi: 0.2, Congress: 0.2
Indian General Election, 2014
Extract, Periodically Update Hyperlink structure
One hop from Event Page
Entity scoring based on relevance to the Event
![Page 37: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/37.jpg)
Hyperlink Entity Scoring
o Edge Based Measure: ed(c,E)
  o Example pairs: India General Election, 2014 and Narendra Modi; India General Election, 2014 and India General Election, 2009
  o Mutually Important
  o ed(c,E) = 1
  o ed(c,E) = 2
o Link Overlap Measure: Jaccard similarity
  o Out(c) are the links in Wikipedia page “c”
o Final Score: r(c,E) = ed(c,E) + oco(c,E)
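A sketch of the final score r(c,E) = ed(c,E) + oco(c,E). Two assumptions are made because the slide leaves them implicit: ed(c,E) is taken to be 2 when the pages link to each other ("mutually important"), 1 when the link goes one way only, and 0 otherwise; and oco(c,E) is taken to be the Jaccard similarity of Out(c) and Out(E). The link table and names are hypothetical.

```python
def edge_measure(links, c, event):
    """Assumed ed(c, E): number of link directions that exist between c and E."""
    forward = c in links.get(event, ())
    backward = event in links.get(c, ())
    return int(forward) + int(backward)


def link_overlap(links, c, event):
    """Assumed oco(c, E): Jaccard similarity of the two pages' outgoing links."""
    out_c = set(links.get(c, ()))
    out_e = set(links.get(event, ()))
    union = out_c | out_e
    return len(out_c & out_e) / len(union) if union else 0.0


def relevance(links, c, event):
    """Final score r(c, E) = ed(c, E) + oco(c, E)."""
    return edge_measure(links, c, event) + link_overlap(links, c, event)


if __name__ == "__main__":
    links = {
        "Election 2014": ["Narendra Modi", "Election 2009", "BJP"],
        "Narendra Modi": ["Election 2014", "BJP"],
        "Election 2009": ["BJP"],
    }
    for page in ("Narendra Modi", "Election 2009"):
        print(page, relevance(links, page, "Election 2014"))
```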
![Page 38: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/38.jpg)
Determining Relevancy of Co-occurring Hashtags (Vector Space Model)
#indianelection2015
#modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring
Narendra Modi: 0.9, BJP: 0.7, NDA: 0.6, India: 0.4, Elections: 0.2, Rahul Gandhi: 0.2, Congress: 0.2
Indian General Election, 2014
Extract, Periodically Update Hyperlink structure
One hop from Event Page
Entity scoring based on relevance to the Event
Indian General Elec: 1.0, India: 0.9, Elections: 0.7, UPA: 0.6, BJP: 0.3, NDA: 0.3, Narendra Modi: 0.3
![Page 39: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/39.jpg)
Determining Relevancy of Co-occurring Hashtags (Vector Space Model)
#indianelection2015
#modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring
Narendra Modi: 0.9, BJP: 0.7, NDA: 0.6, India: 0.4, Elections: 0.2, Rahul Gandhi: 0.2, Congress: 0.2
Indian General Election, 2014
Extract, Periodically Update Hyperlink structure
One hop from Event Page
Entity scoring based on relevance to the Event
Indian General Elec: 1.0, India: 0.9, Elections: 0.7, UPA: 0.6, BJP: 0.3, NDA: 0.3, Narendra Modi: 0.3
Similarity Check
Relevance Score: 0.6
![Page 40: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/40.jpg)
Similarity Check
o Set Based
  o Jaccard Similarity
  o Considers the entities without the scores
o Vector Based
  o Symmetric
    o Cosine Similarity
  o Asymmetric
    o Subsumption Similarity
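The three families of similarity listed here can be sketched as follows. Jaccard and cosine are standard. The asymmetric variant is only one plausible reading of "subsumption similarity": it is a cosine in which event entities absent from the hashtag vector are ignored rather than penalized. The deck does not give the exact formula, so that function is an assumption, and the example does not attempt to reproduce the relevance score shown in the slides.

```python
import math


def jaccard(a, b):
    """Set-based similarity: ignores the scores, compares entity sets only."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine(a, b):
    """Symmetric vector similarity between two weighted entity vectors."""
    dot = sum(weight * b.get(entity, 0.0) for entity, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def asymmetric(hashtag, event):
    """Assumed asymmetric measure: event entities missing from the hashtag are ignored."""
    shared = {entity: w for entity, w in event.items() if entity in hashtag}
    return cosine(hashtag, shared)


if __name__ == "__main__":
    hashtag = {"Narendra Modi": 0.9, "BJP": 0.7, "NDA": 0.6, "India": 0.4,
               "Elections": 0.2, "Rahul Gandhi": 0.2, "Congress": 0.2}
    event = {"Indian General Elec": 1.0, "India": 0.9, "Elections": 0.7,
             "UPA": 0.6, "BJP": 0.3, "NDA": 0.3, "Narendra Modi": 0.3}
    print(jaccard(hashtag, event), cosine(hashtag, event), asymmetric(hashtag, event))
```

Under this reading the asymmetric score is never lower than the cosine score, because dropping unmatched event entities can only shrink the event vector's norm.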
![Page 41: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/41.jpg)
Intuition behind Asymmetric Similarity
India General Election 2014 and Narendra Modi
Symmetric similarity: Penalized
Asymmetric similarity: Ignored
![Page 42: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/42.jpg)
Determining Relevancy of Co-occurring Hashtags (Vector Space Model)
#indianelection2015
#modikisarkar
Co-occurring: Threshold δ
Latest K (200, 500)
Entity Extraction and Scoring
Narendra Modi: 0.9, BJP: 0.7, NDA: 0.6, India: 0.4, Elections: 0.2, Rahul Gandhi: 0.2, Congress: 0.2
Indian General Election, 2014
Extract, Periodically Update Hyperlink structure
One hop from Event Page
Entity scoring based on relevance to the Event
Indian General Elec: 1.0, India: 0.9, Elections: 0.7, UPA: 0.6, BJP: 0.3, NDA: 0.3, Narendra Modi: 0.3
Similarity Check
Relevance Score: 0.6
![Page 43: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/43.jpg)
Evaluation – Dataset
o 2 events
  o US Presidential Elections (#election2012)
  o Hurricane Sandy (#sandy)
o Top 25 co-occurring hashtags
![Page 44: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/44.jpg)
Evaluation – Strategy
o Ranking Problem
  o Rank the Top 25 hashtags based on the relevancy of tweets to the event
  o Experiment with all the similarity metrics
  o Manually annotated the tweets of these hashtags as relevant/irrelevant (Gold Standard)
o Ranking Evaluation Metrics
  o Mean Average Precision
  o NDCG
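For reference, the two ranking metrics named above can be computed as in the sketch below. It uses the common textbook definitions (average precision over the relevant positions, DCG with a log2 discount); whether the evaluation used exactly these variants is an assumption.

```python
import math


def average_precision(relevance):
    """Average precision of one ranked list of 0/1 relevance labels."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this relevant position
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of average precision over several ranked lists (e.g. one per event)."""
    return sum(average_precision(r) for r in rankings) / len(rankings) if rankings else 0.0


def ndcg(gains, k):
    """Normalized discounted cumulative gain at cutoff k."""
    def dcg(values):
        return sum(g / math.log2(i + 2) for i, g in enumerate(values[:k]))

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0


if __name__ == "__main__":
    ranked_labels = [1, 0, 1, 1, 0]  # hypothetical gold-standard labels in ranked order
    print(average_precision(ranked_labels), ndcg(ranked_labels, 5))
```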
![Page 45: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/45.jpg)
Evaluation
![Page 46: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/46.jpg)
Evaluation
Evaluated tweets containing the top-relevant hashtags detected for dynamic topics
• NDCG - 92% at top-5 Mean Average Precision
![Page 47: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/47.jpg)
Conclusions
• Semantic Technologies for Real-time filtering of Social Data
  – Wikipedia as a Dynamic Knowledge base for events
  – Determining relevant hashtags using an Asymmetric similarity measure
  – More hashtags in turn increase the coverage of Tweets for events
• Hashtag Analysis
  – The co-occurrence technique can be used to detect event-relevant hashtags
  – More popular hashtags are easier to detect via co-occurrence
![Page 48: Knoesis-Semantic filtering-Tutorials](https://reader035.vdocument.in/reader035/viewer/2022070513/588553611a28ab47268b4e65/html5/thumbnails/48.jpg)
Thanks
Contact: @[email protected]