Doctoral Defense: Hany SalahEldeen
![Page 1: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/1.jpg)
2015 Hany SalahEldeen Dissertation Defense 1
Detecting, Modeling, & Predicting User Temporal Intention in Social Media
Hany M. SalahEldeen
Doctor of Philosophy Dissertation Defense
Old Dominion University
Department of Computer Science
Advisor: Dr. Michael L. Nelson
Committee: Dr. Michele C. Weigle, Dr. Hussein M. Abdel-Wahab, Dr. M’Hammed Abdous
May 5th, 2015
![Page 2: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/2.jpg)
All tweets are equal…
…but some are more equal than the others
![Page 3: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/3.jpg)
It is imperative to know…
1. How long would these last?
2. And if lost, is there a backup somewhere?
3. Is this what the author intended?
![Page 4: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/4.jpg)
To maintain historical integrity
Since tweets are considered the first draft of history… the historical integrity of the tweets could be compromised.
![Page 5: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/5.jpg)
Motivation
Background
Related Research
Research Question
User-Time-Shared Resource
Conclusions
![Page 6: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/6.jpg)
People rely on social media for the most up-to-date information
![Page 7: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/7.jpg)
Social media is more than kitty photos
Marie Colvin: January 12, 1956 – February 22, 2012
Rémi Ochlik: 16 October 1983 – 22 February 2012
Ahmed Assem: 1987 – July 8, 2013
![Page 8: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/8.jpg)
For the web is dark, and full of missing content…
Accessed in July 2014
3 out of 8 external links on Rémi’s Wikipedia page return 404
![Page 9: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/9.jpg)
even for content shared in social media
Accessed in July 2014
![Page 10: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/10.jpg)
News sites are also prone to change
Accessed in July 2014
![Page 11: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/11.jpg)
So are specialized sites
Accessed in July 2014
![Page 12: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/12.jpg)
Research Problem: Author’s Intention ≠ Reader’s Experience
![Page 13: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/13.jpg)
Research Implication: Author’s Intention ≠ Reader’s Experience
Broken, Inconsistent Web and Historical Records
![Page 14: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/14.jpg)
Motivation
Background
Related Research
Research Question
User-Time-Shared Resource
Conclusions
![Page 15: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/15.jpg)
Social Post
![Page 16: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/16.jpg)
The anatomy of a tweet
Author’s username
Other user mention
Tweet Body
Hashtag
Shortened URL to resource
Publishing timestamp
Social Post
Shared Resource
Interaction options
![Page 17: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/17.jpg)
3 URIs = 3 Chances to fail
![Page 18: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/18.jpg)
URL shortening and aliasing
curl -L -I http://bit.ly/losing_revolution
HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Mon, 07 Jul 2014 18:19:48 GMT
Cache-Control: private; max-age=90
Location: http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
Mime-Version: 1.0
Set-Cookie: _bit=53bae4c4-00328-04f10-cb1cf10a;domain=.bit.ly;expires=Sat Jan 3 18:19:48 2015;path=/; HttpOnly
Content-Type: text/html;charset=utf-8
Content-Length: 167

HTTP/1.1 200 OK
Expires: Mon, 07 Jul 2014 18:19:52 GMT
Date: Mon, 07 Jul 2014 18:19:52 GMT
Cache-Control: private, max-age=0
Last-Modified: Mon, 07 Jul 2014 18:19:07 GMT
ETag: "e3555826-b103-4daa-a3f2-d0509ebab51f"
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Server: GSE
Alternate-Protocol: 80:quic
Content-Type: text/html;charset=UTF-8
Content-Length: 0
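The redirect chain in a header dump like the one on this slide can be traced programmatically. The following is a minimal Python sketch, assuming the output of `curl -L -I` has already been captured as text; the function name `redirect_chain` and the shortened sample are illustrative and not part of the original slides.

```python
import re
from typing import List, Optional, Tuple


def redirect_chain(header_dump: str) -> List[Tuple[int, Optional[str]]]:
    """Return (status code, Location header or None) for each response in the dump."""
    hops: List[Tuple[int, Optional[str]]] = []
    for raw_line in header_dump.splitlines():
        line = raw_line.strip()
        status = re.match(r"HTTP/\d(?:\.\d)?\s+(\d{3})", line)
        if status:
            # A status line starts a new hop in the chain.
            hops.append((int(status.group(1)), None))
        elif line.lower().startswith("location:") and hops:
            # Attach the redirect target to the most recent hop.
            code, _ = hops[-1]
            hops[-1] = (code, line.split(":", 1)[1].strip())
    return hops


# Shortened sample taken from the header dump on the slide.
sample = """HTTP/1.1 301 Moved Permanently
Server: nginx
Location: http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html

HTTP/1.1 200 OK
Server: GSE
"""
chain = redirect_chain(sample)
```

Each hop is one place where dereferencing can fail: the shortener may stop resolving, or the final resource may stop returning 200.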
![Page 19: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/19.jpg)
Life cycle of a social post
![Page 20: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/20.jpg)
Life cycle of a social post
tweets
![Page 21: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/21.jpg)
Life cycle of a social post
tweets Links to
![Page 22: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/22.jpg)
Life cycle of a social post
tweets
What the reader
receives
Links to
Same state the author intended
![Page 23: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/23.jpg)
Life cycle of a social post
tweets
What the reader
receives
Links to
Same state the author intended
Ideally!
![Page 24: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/24.jpg)
Life cycle of a social post
tweets
What the reader
receives
Links to
Same state the author intended
After a period of time
![Page 25: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/25.jpg)
Life cycle of a social post
tweets
What the reader
receives
Links to
Same state the author intended
The resource has disappeared
After a period of time
![Page 26: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/26.jpg)
Life cycle of a social post
tweets
What the reader
receives
Links to
Same state the author intended
The resource has disappeared
The resource has changed
After a period of time
![Page 27: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/27.jpg)
Memento framework
* http://mementoweb.org/guide/rfc/
![Page 28: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/28.jpg)
Motivation
Background
Related Research
Research Question
User-Time-Shared Resource
Conclusions
![Page 29: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/29.jpg)
Related Work
• Social media analysis:
  • Understanding Microblogging: Zhao 2009, Yang 2010, Newman 2003, Kwak 2010, Java 2007, Cha 2009
  • History Narration: Vieweg 2010, Starbird 2010-2012, Qu 2011, Neubig 2011, Lehman and Lalmas 2012-2013
• User’s Web Search Intention: Ashkan 2009, Lee 2005, Loser 2008, Azzopardi 2009, Baeza-Yates 2006, Dai 2011
• Commercial Intention: Guo 2010, Benczur 2007
• Sentiment Analysis: Mishne 2006, Bollen 2011
• Access to Archives: Van de Sompel 2009
• Persistence of shared resources: Nelson 2002, Sanderson 2011, McCown 2007
• URL Shortening: Antoniades 2011
• Tweeting, Micro-blogging and Popularity: Wu 2011, Java 2007, Kwak 2010
• Social Networks Growth and Evolution: Meeder 2011
Further details: refer to chapter 3
![Page 30: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/30.jpg)
Motivation
Background
Related Research
Research Question
User-Time-Shared Resource
Conclusions
![Page 31: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/31.jpg)
Research Question: Can we estimate the users’ intention at the time of posting and reading to predict and maintain temporal consistency?
![Page 32: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/32.jpg)
Research Goals
• Detect the temporal intention of the:
  1. Author at sharing time
  2. Reader at dereferencing time
• Model this intention as a function of time, the nature of the resource, and its context.
• Predict how resources change with time and the intention behind sharing them to minimize inconsistency.
• Implement the prediction model to automatically preserve vulnerable social content that is prone to change or loss and provide smooth temporal navigation of the social web.
Further details: refer to chapters 6, 7, 8, and 9
![Page 33: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/33.jpg)
Motivation
Background
Related Research
Research Question
User-Time-Shared Resource
Conclusions
![Page 34: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/34.jpg)
Shared Resource Time User
Our analysis covers three angles
![Page 35: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/35.jpg)
Shared Resource Time User
Loss and Persistence of Shared Resources
![Page 36: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/36.jpg)
Shared Resource Time User
Alive
First: Estimate social media content loss
![Page 37: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/37.jpg)
Six socially significant events
Event Source Year
Iranian Election SNAP Dataset 2009
H1N1 Virus Outbreak SNAP Dataset 2009
Michael Jackson’s Death SNAP Dataset 2009
Obama’s Nobel Peace Prize SNAP Dataset 2009
The Egyptian Revolution Twitter, Websites, Books 2011
The Syrian Uprising Twitter API 2012
![Page 38: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/38.jpg)
Twitter tag expansion and filtration
![Page 39: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/39.jpg)
Twitter tag expansion increases precision
![Page 40: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/40.jpg)
What are people sharing?
![Page 41: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/41.jpg)
Existence on the live web and in the archives
• For each unique URL we resolved the final HTTP response and considered 2 classes:
  • Success: 200 OK
  • Failure: 4XX, 50X families and the 30X loop redirects or soft 404s.
• Utilize the memento aggregator:
  • Archived: if it has at least one memento in the timemap
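The two classification rules above can be sketched as follows. This is a minimal Python illustration, not the author's implementation: the function names, the boolean flags for redirect loops and soft 404s, and the choice to treat any status code the slide does not mention as a failure are all assumptions.

```python
def classify_live_status(final_status: int,
                         redirect_loop: bool = False,
                         soft_404: bool = False) -> str:
    """Classify a resolved URL into the slide's two classes."""
    # 30X redirect loops and soft 404s count as failures regardless of status code.
    if redirect_loop or soft_404:
        return "failure"
    if final_status == 200:
        return "success"
    # 4XX and 50X families; other codes are treated as failures by assumption.
    return "failure"


def is_archived(memento_count: int) -> bool:
    """A resource is archived if its timemap holds at least one memento."""
    return memento_count >= 1
```

Detecting a soft 404 or a redirect loop is a separate problem; here both are assumed to be determined upstream and passed in as flags.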
![Page 42: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/42.jpg)
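Resources Missing and Archived
| Collection | Percentage Missing | Percentage Archived |
|---|---|---|
| H1N1 Outbreak | 23.49% | 41.65% |
| Michael Jackson | 36.24% | 39.45% |
| Iran | 26.98% | 43.08% |
| Obama | 24.59% | 47.87% |
| Egypt | 10.48% | 20.18% |
| Syria | 7.04% | 5.35% |
| (row label not preserved) | 31.62% | 30.78% |
| (row label not preserved) | 24.47% | 36.26% |
| (row label not preserved) | 25.64% | 43.87% |
| (row label not preserved) | 26.15% | 46.15% |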
![Page 43: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/43.jpg)
Shared Resource Time User
Alive
Missing
Second: Can we measure existence and disappearance as a function of time?
![Page 45: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/45.jpg)
Timeline of Events
![Page 46: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/46.jpg)
Timeline of Events
![Page 47: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/47.jpg)
Social Events Having a Bimodal Time Distribution
![Page 48: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/48.jpg)
Timeline of Events
![Page 49: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/49.jpg)
Social Events Having a Bimodal Time Distribution
![Page 50: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/50.jpg)
Existence as a function of time
![Page 51: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/51.jpg)
Existence as a function of time
![Page 52: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/52.jpg)
Conclusion: Existence could be estimated as a function of time
• Results:
  • Measured 21,625 resources from 6 data sets in archives & live web.
  • After a year from publishing about 11% of content shared on social media will be gone.
  • After this we are losing roughly 0.02% daily.
• Publications and Articles:
  1. H. M. SalahEldeen. Losing My Revolution: A year after the Egyptian Revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html, 2012.
  2. H. M. SalahEldeen and M. L. Nelson. Losing my revolution: how many resources shared on social media have been lost? In Proceedings of the Second international conference on Theory and Practice of Digital Libraries, TPDL'12, 2012.
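The two loss figures on this slide can be read as a simple linear estimate once a resource is at least one year old. The sketch below is a hedged Python illustration of that reading only: the function name is hypothetical, the constants are the slide's rounded figures rather than the fitted model from the publications, and the slide gives no figure for resources younger than a year, so that case is rejected.

```python
def estimated_percent_missing(days_since_publishing: int) -> float:
    """Estimate the percentage of shared resources missing after a given age in days."""
    if days_since_publishing < 365:
        # The slide only gives the one-year figure and the daily rate after it.
        raise ValueError("estimate is only defined from one year after publishing")
    first_year_loss = 11.0  # about 11% gone after the first year
    daily_loss = 0.02       # roughly 0.02% lost per day afterwards
    return first_year_loss + daily_loss * (days_since_publishing - 365)


two_year_estimate = estimated_percent_missing(730)
```

Under these assumptions a two-year-old collection would be expected to have lost about 18.3% of its shared resources.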
![Page 53: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/53.jpg)
Revisiting Existence after a year
The slide lists ten data-set columns under six collection headings (MJ, Iran, H1N1, Obama, Egypt, Syria); columns are numbered here in slide order.

Missing
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Measured | 37.10% | 37.50% | 28.17% | 30.56% | 26.29% | 31.62% | 32.47% | 24.64% | 7.55% | 12.68% |
| Predicted | 31.72% | 31.42% | 31.96% | 30.98% | 30.16% | 29.68% | 29.60% | 28.36% | 19.80% | 11.54% |
| Error | 5.38% | 6.08% | 3.79% | 0.42% | 3.87% | 1.94% | 2.87% | 3.72% | 12.25% | 1.14% |
Average Prediction Error = 4.15%
in all cases, our missing predictions were acceptable

Archived
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Measured | 48.61% | 40.32% | 60.80% | 55.04% | 47.97% | 52.14% | 48.38% | 40.58% | 23.73% | 0.56% |
| Predicted | 61.78% | 61.18% | 62.26% | 60.30% | 58.66% | 57.70% | 57.54% | 55.06% | 37.94% | 21.42% |
| Error | 13.17% | 20.86% | 1.46% | 5.26% | 10.69% | 5.56% | 9.16% | 14.48% | 14.21% | 20.86% |
Average Prediction Error = 11.57%
in all cases, our archival predictions were too optimistic
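The two average prediction errors can be reproduced from the measured and predicted values on this slide. The Python sketch below assumes the reported error is the mean absolute difference between measured and predicted percentages, which matches the per-column error rows; the variable and function names are illustrative.

```python
def mean_absolute_error(measured, predicted):
    """Average of the absolute differences between paired values."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


# Values copied from the slide, in column order.
missing_measured = [37.10, 37.50, 28.17, 30.56, 26.29, 31.62, 32.47, 24.64, 7.55, 12.68]
missing_predicted = [31.72, 31.42, 31.96, 30.98, 30.16, 29.68, 29.60, 28.36, 19.80, 11.54]
archived_measured = [48.61, 40.32, 60.80, 55.04, 47.97, 52.14, 48.38, 40.58, 23.73, 0.56]
archived_predicted = [61.78, 61.18, 62.26, 60.30, 58.66, 57.70, 57.54, 55.06, 37.94, 21.42]

missing_error = round(mean_absolute_error(missing_measured, missing_predicted), 2)
archived_error = round(mean_absolute_error(archived_measured, archived_predicted), 2)

# True when every archival prediction exceeds the measured value.
archival_always_optimistic = all(
    p > m for m, p in zip(archived_measured, archived_predicted)
)
```

The last check corresponds to the slide's remark that the archival predictions were too optimistic in every case.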
![Page 54: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/54.jpg)
Shared Resource Time User
Alive
Missing
Replaced
Third: Can we use social context to find replacements of missing resources?
![Page 55: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/55.jpg)
Context discovery and shared resource replacement
Problem:
The 140-character limit constrains the description of the linked resource. If the resource goes missing, can we get the next best thing?
Solution:
• Shared links typically have several tweets, responses, and retweets
• We can mine these traces for context and viable replacements
![Page 56: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/56.jpg)
Context Discovery
Linking to: http://beta.18daysinegypt.com/
![Page 57: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/57.jpg)
What if the resource disappeared?
Linking to: http://beta.18daysinegypt.com/
![Page 58: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/58.jpg)
Use Topsy to discover tweets sharing the same link
![Page 59: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/59.jpg)
Social Context Extraction
{
  "URI": "http://beta.18daysinegypt.com/",
  "Related Tweet Count": 500,
  "Related Hashtags": "#tran #citizensx #arabspring #visualstorytelling #collaborativerevolution #feb11http://t.co/qxusp70 ...",
  "Users who talked about this": "@petra_stienen: @waleedrashed: @omarsamra @ungormite: @dcisbusy @webdocumentario: ...",
  "All associated unique links:": "http://t.co/63X1f3f1 http://t.co/reBh6c4V http://t.co/B3GuhQN4 http://t.co/X2sjf4Rf http://t.co/P9iR28fH http://t.co/1C4EPh8h ...",
  "All other links associated:": "http://vimeo.com/35368376 http://mashable.com/2012/01/21/18daysinegypt-2/",
  "Most frequent link appearing:": "http://t.co/2ke0rEjP",
  "Number of times the Most frequent link appearing:": 49,
  "Most frequent tweet posted and reposted:": "Check out 18DaysInEgypt - A crowd sourced documentary project ================= via @18daysinegypt",
  "Number of times the Most frequent tweet appearing:": 46,
  "The longest common phrase appearing:": "RT 2ke0rEjP is an interactive documentary website that YOU can help create Get your Jan25 stories ready! Pl RT",
  "Number of times the Most common phrase appearing:": 18
}
![Page 60: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/60.jpg)
Build a Tweet Document
A tweet document represents the concatenation of all extracted tweets:
“do you have a story to tell about your 18 days of revolution? share it or contact sara 18days brand new interactive storytelling project on egyptian revolution a very creative platform to tell your story daysinegypt marches heading to tahrir square now from all over cairoit's all over again use the website to document your revolutionary stories and share them with the world! check out awesome documentary project crowdsourcing a people's narrative of the egyptian revolution …”
![Page 61: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/61.jpg)
Tweet Signature
Tweet Document:
“do you have a story to tell about your 18 days of revolution? share it or contact sara 18days brand new interactive storytelling project on egyptian revolution a very creative platform to tell your story daysinegypt marches heading to tahrir square now from all over cairoit's all over again use the website to document your revolutionary stories and share them with the world! check out awesome documentary project crowdsourcing a people's narrative of the egyptian revolution …”
Tweet Signature = top 5 most frequent terms from Tweet Document
documentary project daysinegypt check sourced
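Extracting a tweet signature as the most frequent terms of the tweet document can be sketched in a few lines of Python. This is an illustration under stated assumptions: the tokenizer, the small stop-word list, and the minimum term length are choices made here, since the slides do not say how terms were filtered, and the function name is hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the one used in the study).
STOP_WORDS = {"the", "and", "for", "you", "your", "out", "via", "are", "with", "all"}


def tweet_signature(tweet_document: str, size: int = 5) -> list:
    """Return the `size` most frequent non-stop-word terms of a tweet document."""
    terms = re.findall(r"[a-z0-9']+", tweet_document.lower())
    counts = Counter(t for t in terms if t not in STOP_WORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(size)]


example_document = (
    "check out the documentary project "
    "a crowd sourced documentary project "
    "awesome documentary"
)
example_signature = tweet_signature(example_document, size=2)
```

The signature on the slide was computed over all tweets retrieved for the link, so terms such as "sourced" can rank highly even though they are rare in the excerpt shown.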
![Page 62: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/62.jpg)
Query Google with the Tweet Signature
![Page 63: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/63.jpg)
Search Engine Results
The original resource
![Page 64: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/64.jpg)
Search Engine Results
The original resource
The others are good replacement candidates
![Page 65: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/65.jpg)
Recommendation Evaluation
We extract a dataset of resources that are currently available:
• Pretend these resources no longer exist (for a baseline)
• Each of the resources is text-based
• Each resource has at least 30 retrievable tweets.
Extracted 731 unique resources
We use a boilerplate removal library to remove the template from the:
• linked resources
• top 10 retrieved results from Google
We use cosine similarity to compare the documents
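Cosine similarity between two documents can be sketched with term-count vectors. The Python example below is a minimal illustration: it assumes raw term frequencies and a simple word tokenizer, whereas the slides do not state the term weighting used in the evaluation.

```python
import math
import re
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between the term-count vectors of two documents."""
    vec_a = Counter(re.findall(r"\w+", doc_a.lower()))
    vec_b = Counter(re.findall(r"\w+", doc_b.lower()))
    # Only terms present in both documents contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A candidate replacement would then be compared against the boilerplate-free text of the original resource, and accepted when the score reaches a chosen threshold.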
![Page 66: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/66.jpg)
Similarity measures in resource replacement
----70% similarity----
In 41% of the cases we found a replacement with ≥70% similarity
![Page 67: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/67.jpg)
Conclusion: We can find viable replacements for missing shared resources
• Results:
  • In 41% of the test cases we can find a replacement page with at least 70% similarity to the original missing resource
  • The search results provide a mean reciprocal rank of 0.43
• Publications:
  1. H. SalahEldeen and M. L. Nelson. Resurrecting my revolution: Using social link neighborhood in bringing context to the disappearing web. In Research and Advanced Technology for Digital Libraries - International Conference on Theory and Practice of Digital Libraries, TPDL 2013, 2013.
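Mean reciprocal rank, the metric reported in the results above, averages the reciprocal of the rank at which the first acceptable result appears for each query. The Python sketch below is illustrative: the function name is hypothetical, and what counted as an acceptable result in the study is not specified on the slide.

```python
def mean_reciprocal_rank(first_hit_ranks) -> float:
    """Average of 1/rank over queries; a query with no hit (None) contributes 0."""
    if not first_hit_ranks:
        raise ValueError("at least one query is required")
    total = sum(1.0 / rank if rank else 0.0 for rank in first_hit_ranks)
    return total / len(first_hit_ranks)


# Three hypothetical queries: hit at rank 1, hit at rank 2, and no hit at all.
example_mrr = mean_reciprocal_rank([1, 2, None])
```

A value of 0.43 therefore means that, on average, the first acceptable result sits between the second and third position.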
![Page 68: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/68.jpg)
Now that we have finished analyzing the shared resource… what’s next?
![Page 69: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/69.jpg)
Shared Resource Time User
Alive
Missing
Replaced
Footprints on the web
![Page 70: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/70.jpg)
The tweet, the resource…and time
time
Posted a tweet
Read the tweet
Relevancy of the resource to the tweet changed through time
we need to measure that
Another tweet posted
And another
…
We need to measure tweet relevance through time
![Page 71: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/71.jpg)
Shared Resource Time User
Alive
Missing
Replaced
Rate of Change
Longitudinal Study: Rate of change of shared content
![Page 72: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/72.jpg)
Pilot 1: Resource change in the first 80 hours after tweeting
![Page 73: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/73.jpg)
Pilot 2: Delta days from Bitly creation for just tweeted content
Dataset size = 4,000
![Page 74: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/74.jpg)
Pilot 3: Dataset of 1,000 freshly created Bitlys
http://www.cnn.com depth = 0
http://www.cnn.com/world depth = 1
http://www.cnn.com/2009/SHOWBIZ/Music/06/25/jackson depth = 6
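The depth values in the three examples above are consistent with counting the non-empty path segments of the URL. A minimal Python sketch of that reading follows; the function name is an assumption, and query strings and fragments are simply ignored here.

```python
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Count the non-empty path segments of a URL."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


depths = [
    url_depth("http://www.cnn.com"),
    url_depth("http://www.cnn.com/world"),
    url_depth("http://www.cnn.com/2009/SHOWBIZ/Music/06/25/jackson"),
]
```

With this definition a trailing slash does not change the depth, so `http://www.cnn.com/` is still at depth 0.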
![Page 75: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/75.jpg)
What domains do users link to?
![Page 76: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/76.jpg)
What categories* do users link to?
* Extracted from Alexa.com
![Page 77: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/77.jpg)
Summation of Intention in Social Content Through Time
Longitudinal study: We record the change over an extended period of time:
• Content: we download a snapshot of the resource every 45 minutes
• Metadata: we collect metadata about the resource
  • Facebook likes, posts
  • Tweets in the last hour
  • Bitly clicklogs and shares
• Average data size: ~1 TB per month
![Page 78: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/78.jpg)
Hourly analysis over an extended period of time
![Page 79: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/79.jpg)
There is a difference between t_tweet and t_click
• After just one hour, 4% of the resources have changed by 30%.
• After six hours, the percentage doubled to 8% of the resources changed by 40%.
• After a day, the change rate slowed, with 12% of the resources changed by 40%.
• After that, it almost stabilizes at 17% of the resources changed by 40%.
![Page 80: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/80.jpg)
Shared Resource Time User
Alive
Missing
Replaced
Rate of Change
Archive & Creation
First: Resource – Time – Public Archives
![Page 81: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/81.jpg)
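Revisited: Resources Missing and Archived
| Collection | Percentage Missing | Percentage Archived |
|---|---|---|
| H1N1 Outbreak | 23.49% | 41.65% |
| Michael Jackson | 36.24% | 39.45% |
| Iran | 26.98% | 43.08% |
| Obama | 24.59% | 47.87% |
| Egypt | 10.48% | 20.18% |
| Syria | 7.04% | 5.35% |
| (row label not preserved) | 31.62% | 30.78% |
| (row label not preserved) | 24.47% | 36.26% |
| (row label not preserved) | 25.64% | 43.87% |
| (row label not preserved) | 26.15% | 46.15% |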
![Page 82: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/82.jpg)
But more generally, we want to know…
![Page 83: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/83.jpg)
How much of the web is archived?
• Goal: Estimate how much of the public web is present in the public archives and how many copies are available.
• Action:
  • Getting 4 different datasets from 4 different sources:
    • Search Engine Indices
    • Bit.ly
    • DMOZ
    • Delicious
![Page 84: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/84.jpg)
Conclusion: It depends on the source
• Results:
• Publication: S. G. Ainsworth, A. Alsum, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. How much of the web is archived? In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, JCDL '11, pages 133-136, New York, NY, USA, 2011. ACM.
![Page 85: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/85.jpg)
Changes since 2011: no more free SE APIs; greatly reduced IA quarantine period; 15 public web archives.
2013: 95%, 92%, 23%, 26% [figure values; labels not recoverable from the transcript]
![Page 86: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/86.jpg)
Side Experiment: Analyzing the quality of the archives and the archived content
• Goal:
  • Assessing the quality of the web archives
  • Better discussed in Justin Brunelle’s work
• Publications:
  1. J. F. Brunelle, M. Kelly, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. Not All Mementos Are Created Equal: Measuring The Impact Of Missing Resources. In Proceedings of the 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL), 2014 (Best student paper award)
![Page 87: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/87.jpg)
A question emerged: When did a certain resource first appear on the web?
![Page 88: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/88.jpg)
[Roadmap diagram: Shared Resource, Time, User. Stages shown: Alive, Missing, Replaced, Rate of Change, Archive & Creation]
Second: When was the resource created?
![Page 89: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/89.jpg)
Idea
Web pages leave trails as well since the day they were created…
![Page 90: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/90.jpg)
Web trails
A web resource can leave any of the following trails denoting its existence:
• References
• Links (anchors)
• Social media likes and interactions
• URL shortening
• Backlinks
The creation date of any of these associated events/trails can serve as an estimate of the resource's creation date.
![Page 91: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/91.jpg)
Resource’s timeline
![Page 92: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/92.jpg)
Observations Recorded
1. Last modified date from the response header.
2. First appearance of a backlink.
3. First tweet published.
4. First Bitly shortened URL created.
5. Timestamp of the first memento in the archives.
6. Date of the last crawl by the search engine.
![Page 93: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/93.jpg)
Carbon Date service
![Page 94: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/94.jpg)
Carbon Dating API
{
  "self": "http://cd.cs.odu.edu/cd?url=http://www.cnn.com",
  "URI": "http://www.cnn.com",
  "Estimated Creation Date": "1998-12-06T04:02:33",
  "Last Modified": "",
  "Bitly.com": "2008-06-08T12:00:00",
  "Topsy.com": "2015-01-25T23:31:42",
  "Backlinks": "2003-03-12T05:35:44",
  "Google.com": "2005-01-11T00:00:00",
  "Archives": [
    [
      "Earliest",
      "1998-12-06T04:02:33"
    ],
    [
      "By_Archive",
      {
        "http://archive.today/20000815052826/http://www.cnn.com/": "2000-08-15T05:28:26",
        "http://arquivo.pt/wayback/wayback/20000815052826/http://www.cnn.com/": "2000-08-15T05:28:26",
        "http://wayback.vefsafn.is/wayback/20011106102722/http://www.cnn.com/": "1998-12-06T04:02:33",
        "http://web.archive.org/web/20131218180509/http://www.cnn.com/": "2013-12-18T18:05:09"
      }
    ]
  ]
}
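
In the sample response, the "Estimated Creation Date" equals the earliest timestamp reported by any source. The Python sketch below recomputes that value from the per-source fields. It is an illustration of the idea only, not the service's code: the `earliest_estimate` helper is hypothetical, and the `response` dictionary is a trimmed copy of the slide's example.

```python
from datetime import datetime

# Per-source fields as they appear in the sample response on the slide.
SOURCE_KEYS = ("Last Modified", "Bitly.com", "Topsy.com", "Backlinks", "Google.com")


def earliest_estimate(response):
    """Return the earliest non-empty timestamp among all sources, or None."""
    candidates = []
    for key in SOURCE_KEYS:
        value = response.get(key, "")
        if value:  # sources with no observation are reported as ""
            candidates.append(datetime.fromisoformat(value))
    # "Archives" is a list of [label, value] pairs; only "Earliest" is a timestamp.
    for label, value in response.get("Archives", []):
        if label == "Earliest" and value:
            candidates.append(datetime.fromisoformat(value))
    return min(candidates).isoformat() if candidates else None


response = {
    "URI": "http://www.cnn.com",
    "Last Modified": "",
    "Bitly.com": "2008-06-08T12:00:00",
    "Topsy.com": "2015-01-25T23:31:42",
    "Backlinks": "2003-03-12T05:35:44",
    "Google.com": "2005-01-11T00:00:00",
    "Archives": [["Earliest", "1998-12-06T04:02:33"]],
}
print(earliest_estimate(response))
```

For this input the function returns the archive timestamp, which matches the "Estimated Creation Date" field in the slide's response.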
![Page 95: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/95.jpg)
Evaluation Dataset
From each we randomly selected 100 unique URLs to create our gold standard dataset
![Page 96: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/96.jpg)
Evaluation
• Applied our 6 methods to 1,200 resources.
• Take the leftmost (earliest) estimate.

| | Number of Resources | Percentage |
| --- | --- | --- |
| An estimate found | 910 | 76% |
| Exact matching estimate | 393 | 33% |
| No estimate found | 290 | 24% |
| Total resources | 1200 | 100% |

![Page 97: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/97.jpg)
Actual Vs. Estimated Dates
![Page 98: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/98.jpg)
Conclusion: We can estimate the creation date of resources correctly
• Results:
  • Succeeded in estimating the creation date accurately in 75.90% of the resources.
• Publications:
  1. H. M. SalahEldeen and M. L. Nelson. Carbon dating the web: Estimating the age of web resources. In Proceedings of the 22nd International Conference on World Wide Web Companion, TempWeb03, WWW '13, 2013
![Page 99: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/99.jpg)
Alexander Nwala did an awesome job releasing the second version of Carbon Date, which is more reliable, multithreaded, and faster, can handle multiple requests, and has caching capabilities.
http://cd.cs.odu.edu/
![Page 100: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/100.jpg)
Yes, it’s better than mine… I admit it
![Page 101: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/101.jpg)
[Roadmap diagram: Shared Resource, Time, User. Stages shown: Alive, Missing, Replaced, Rate of Change, Archive & Creation]
User’s Temporal Intention
![Page 102: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/102.jpg)
Problem: There is an inconsistency between what the tweet’s author intended to share at time t_tweet and what the reader might actually read upon clicking on the link at time t_click.
![Page 103: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/103.jpg)
[Roadmap diagram: Shared Resource, Time, User. Stages shown: Alive, Missing, Replaced, Rate of Change, Archive & Creation, Detecting]
What is Intention and how to detect it?
![Page 104: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/104.jpg)
Amazon’s Mechanical Turk
• Crowdsourcing Internet marketplace
• Co-ordinates the use of human intelligence to perform tasks that computers are currently unable to do.*
* http://en.wikipedia.org/wiki/Amazon_Mechanical_Turk
![Page 105: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/105.jpg)
Goal: Understand and collect user intention data via MT
[Flow diagram: Tweets dataset → Intention Classification Tasks → User Intention Data → Train → Classifier]
![Page 106: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/106.jpg)
• Problem: It is not as easy as it seems!
![Page 107: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/107.jpg)
How NOT to classify temporal intention 101
• The tweet is presented along with the two snapshots:
at t_tweet, at t_click
![Page 108: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/108.jpg)
And compared MT results with Experts
• Experts: A version was manually assigned to each tweet in a face-to-face meeting with WS-DL members.
• For 9 MT assignments per tweet:
  • If we allowed 4-5 splits, we had a 58% match with WS-DL.
  • If we allowed 3-6 splits or better, we got a 31% match.
Which is worse than flipping a coin!
![Page 109: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/109.jpg)
Idea: We need to transform the problem from intention to relevance.
![Page 110: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/110.jpg)
Relevance tasks are simpler
• MT workers are more accustomed to classification tasks, which require a minimal amount of explanation
• Transform a hard problem to an easy one
Is that a cat?
- Yes
- No
![Page 111: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/111.jpg)
Temporal Intention Relevancy Model (TIRM)
Between t_tweet and t_click:
The linked resource could have:
• Changed
• Not changed
The tweet and the linked resource could be:
• Still relevant
• No longer relevant
![Page 112: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/112.jpg)
Resource is changed but relevant
• The resource changed
• But it is still relevant
Intention: need the current version of the resource at any time
![Page 113: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/113.jpg)
Relevancy and Intention mapping
• Changed and relevant → Current
![Page 114: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/114.jpg)
Resource is changed and not relevant
Intention: need the past version of the resource at any time
• The resource changed
• But it is no longer relevant
![Page 115: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/115.jpg)
Relevancy and Intention mapping
• Changed and relevant → Current
• Changed and not relevant → Past
![Page 116: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/116.jpg)
Resource is not changed and relevant
Intention: need the past version of the resource at any time
• The resource is not changed
• And it is relevant
![Page 117: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/117.jpg)
Relevancy and Intention mapping
• Changed and relevant → Current
• Changed and not relevant → Past
• Not changed and relevant → Past
![Page 118: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/118.jpg)
Resource is not changed and not relevant
Intention: I am not sure which version of the resource I need
• The resource is not changed
• But it is not relevant
![Page 119: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/119.jpg)
Relevancy and Intention mapping

| | Relevant | Not relevant |
| --- | --- | --- |
| Changed | Current | Past |
| Not changed | Past | Not Sure |

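
The four cases walked through on the preceding slides form a two-by-two lookup from (changed, relevant) to intention. A minimal Python sketch of that lookup follows; the function name `tirm_intention` and the lowercase labels are illustrative choices, not identifiers from the dissertation:

```python
def tirm_intention(changed: bool, relevant: bool) -> str:
    """Map the (changed, relevant) state of a shared resource to an intention."""
    mapping = {
        (True, True): "current",     # changed but still relevant
        (True, False): "past",       # changed and no longer relevant
        (False, True): "past",       # unchanged and relevant
        (False, False): "not sure",  # unchanged yet not relevant
    }
    return mapping[(changed, relevant)]


for changed in (True, False):
    for relevant in (True, False):
        print(changed, relevant, "->", tirm_intention(changed, relevant))
```

Keeping the mapping as a table rather than nested conditionals makes it easy to compare against the slide's matrix cell by cell.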
![Page 120: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/120.jpg)
Validation: Update the MT experiment
• MT workers ≡ judgments of the experts (WS-DL members) ✓
Is the content still relevant to the tweet?
![Page 121: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/121.jpg)
Mechanical Turk Workers Vs. Experts
• For 100 tweets, percentage of agreement with WS-DL members:
• Cohen’s κ = 0.854 (almost perfect agreement)

| MT worker votes | Agreement |
| --- | --- |
| 3-2 split or more | 93% |
| 4-1 split or more | 80% |
| 5-0 | 60% |

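
Cohen's kappa corrects raw agreement for the agreement two raters would reach by chance: κ = (p_o − p_e) / (1 − p_e). The Python sketch below computes it for two label sequences. The sample labels are invented for illustration and are not the study's data; on the commonly used Landis and Koch scale, values above 0.80 are read as almost perfect agreement.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgments for ten tweets ("rel" = relevant, "non" = not relevant).
experts = ["rel", "rel", "non", "rel", "non", "rel", "rel", "non", "rel", "rel"]
workers = ["rel", "rel", "non", "non", "non", "rel", "rel", "non", "rel", "rel"]
print(round(cohens_kappa(experts, workers), 3))
```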
![Page 122: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/122.jpg)
[Roadmap diagram: Shared Resource, Time, User. Stages shown: Alive, Missing, Replaced, Rate of Change, Archive & Creation, Detecting, Modeling]
Can we model this temporal intention?
![Page 123: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/123.jpg)
Data Collection
• From the SNAP dataset we extracted:
  • Tweets in English
  • Each has an embedded URI pointing to an external resource.
  • The embedded URI is shortened via Bit.ly
• The external resource:
  • Still persists.
  • Has at least 10 mementos.
  • Is unique.
We extracted 5,937 unique instances
![Page 124: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/124.jpg)
Time delta between the tweet and the closest memento
• Randomly selected 1,124 instances
• Time delta range: 3.07 minutes to 56.04 hours
• Average: 25.79 hours (~1 day)
[Figure labels: Before Tweet time, Tweet time, After Tweet time]
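
Selecting the closest memento means picking the archived snapshot whose capture time has the smallest absolute distance from the tweet time, whether it falls before or after the tweet. A short Python sketch under that assumption; the `closest_memento` helper and the timestamps are made up for illustration:

```python
from datetime import datetime


def closest_memento(tweet_time, memento_times):
    """Return (memento_time, absolute delta) for the memento nearest tweet_time."""
    if not memento_times:
        raise ValueError("no mementos available")
    best = min(memento_times, key=lambda m: abs(m - tweet_time))
    return best, abs(best - tweet_time)


# Hypothetical tweet and capture times.
tweet_time = datetime(2009, 6, 25, 22, 0, 0)
mementos = [
    datetime(2009, 6, 24, 9, 30, 0),  # 36.5 hours before the tweet
    datetime(2009, 6, 26, 1, 0, 0),   # 3 hours after the tweet
    datetime(2009, 7, 1, 0, 0, 0),    # several days after the tweet
]
best, delta = closest_memento(tweet_time, mementos)
print(best.isoformat(), delta.total_seconds() / 3600, "hours")
```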
![Page 125: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/125.jpg)
Training Dataset
• R_current: The state of the resource at current time.
• R_click: The state of the resource at click time.

| | Count | Percentage |
| --- | --- | --- |
| Relevant assignments | 929 | 82.65% |
| Non-relevant assignments | 195 | 17.35% |
| 5 MT workers agreeing (5-0 split) | 589 | 52.40% |
| 4 MT workers agreeing (4-1 split) | 309 | 27.49% |
| 3 MT workers agreeing (3-2 close call split) | 226 | 20.11% |

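
Each instance in the table carries five worker votes, so its label is the majority vote and its split (5-0, 4-1, or 3-2) records how contested that label was. A minimal Python sketch of this aggregation; the `aggregate_votes` helper and the label strings are illustrative assumptions:

```python
from collections import Counter


def aggregate_votes(votes):
    """Return (majority label, split such as '4-1') for an odd number of binary votes."""
    if not votes:
        raise ValueError("no votes to aggregate")
    label, top = Counter(votes).most_common(1)[0]
    return label, f"{top}-{len(votes) - top}"


# Hypothetical votes from five workers on one tweet.
votes = ["relevant", "relevant", "non-relevant", "relevant", "relevant"]
print(aggregate_votes(votes))
```

With five binary votes a tie cannot occur, which is one practical reason to use an odd number of workers per task.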
![Page 126: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/126.jpg)
![Page 127: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/127.jpg)
Intention modeling: Feature extraction
• For each tweet we perform:
  • Link analysis
  • Social media mining
  • Archival existence
  • Sentiment analysis
  • Content similarity
  • Entity identification
![Page 128: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/128.jpg)
Training the classifier
• From the feature extraction phase we extracted 39 different features to train the classifier.
• Using 10-fold cross-validation, the cost-sensitive classifier based on random forests gave the highest success rate: 90.32%
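
In 10-fold cross-validation the dataset is partitioned into ten folds; each fold serves once as the test set while the other nine are used for training. The Python sketch below shows only that partitioning step for a dataset of 1,124 instances. The classifier itself is left out, and the `k_fold_indices` helper, the shuffle, and the seed are illustrative assumptions rather than the setup used in the study:

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(1124, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

The reported success rate would then be the average of the per-fold accuracies obtained by training on `train` and scoring on `test`.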
![Page 129: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/129.jpg)
Most significant features sorted by information gain

| Rank | Feature | Gain Ratio |
| --- | --- | --- |
| 1 | Existence of celebrities in tweets | 0.149 |
| 2 | Number of mementos | 0.090 |
| 3 | Tweet similarity with current page | 0.071 |
| 4 | Similarity: Current & past page | 0.053 |
| 5 | Similarity: Tweet & past page | 0.044 |
| 6 | Original URI’s depth | 0.032 |

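
The table's column is labeled gain ratio. Assuming this is the standard definition (information gain divided by the split information of the feature, as used in C4.5), it can be sketched for a discrete feature as follows. The helper names and toy data are illustrative and unrelated to the study's features:

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of the feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy of the labels remaining after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["rel", "rel", "non", "non"]
print(gain_ratio([1, 1, 0, 0], labels))  # feature separates the classes
print(gain_ratio([1, 0, 1, 0], labels))  # feature carries no information
```

Dividing by the split information penalizes features with many distinct values, which plain information gain tends to favor.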
![Page 130: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/130.jpg)
Testing the model
• We tested against:
  • The remaining 4,813 from the original 5,937 instances after extracting the 1,124 used in training.
  • The Tweet Collections based on historic events. (MJ, Obama, Iran, Syria, & H1N1)

| Dataset | Status 200 | Status 404 or other | Relevant % | Non-Relevant % |
| --- | --- | --- | --- | --- |
| Extended 4,813 instances | 96.77% | 3.23% | 96.74% | 3.26% |
| MJ’s Death | 57.54% | 42.46% | 93.24% | 6.76% |
| H1N1 Outbreak | 8.96% | 91.04% | 97.48% | 2.52% |
| Iran Elections | 68.21% | 31.79% | 94.69% | 5.31% |
| Obama’s Nobel Prize | 62.86% | 37.14% | 93.89% | 6.11% |
| Syrian Uprising | 80.80% | 19.20% | 70.26% | 29.75% |

![Page 131: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/131.jpg)
Recap…
Idea: We need to transform the problem from intention to relevance.
Now we need to transform it back!
![Page 132: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/132.jpg)
Recap: Relevancy and Intention mapping
Past → Reading the wrong history
![Page 133: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/133.jpg)
Mapping TIRM
• We used 70% similarity as a threshold of relevancy.
Reading the wrong history in up to 25% of the cases
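
Applying the threshold amounts to computing a similarity score between two texts and calling the pair relevant when the score reaches 0.70. The slide does not say which similarity measure or which pair of texts is compared, so the Python sketch below assumes cosine similarity over word counts; both helper functions are hypothetical:

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(re.findall(r"\w+", text_a.lower()))
    b = Counter(re.findall(r"\w+", text_b.lower()))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def is_relevant(similarity, threshold=0.70):
    """Treat a pair as relevant when its similarity meets the threshold."""
    return similarity >= threshold


tweet = "breaking news about the election results"
page = "election results: breaking news and analysis"
score = cosine_similarity(tweet, page)
print(round(score, 2), is_relevant(score))
```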
![Page 134: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/134.jpg)
Conclusion: We can model users’ temporal intention accurately and efficiently
• Results:
  • We successfully transformed the complicated problem of intention into a simpler one of relevance.
  • We successfully collected a gold standard dataset of temporal user intention.
  • We found a temporal inconsistency in the shared resource in up to 25% of the cases, according to the dataset.
• Publications:
  1. H. M. SalahEldeen and M. L. Nelson. Reading the correct history?: Modeling temporal intention in resource sharing. In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL '13, 2013.
![Page 135: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/135.jpg)
So we modeled intention… can we make it better?
![Page 136: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/136.jpg)
Most significant features sorted by information gain

| Rank | Feature | Gain Ratio |
| --- | --- | --- |
| 1 | Existence of celebrities in tweets | 0.149 |
| 2 | Number of mementos | 0.090 |
| 3 | Tweet similarity with current page | 0.071 |
| 4 | Similarity: Current & past page | 0.0527 |
| 5 | Similarity: Tweet & past page | 0.04401 |
| 6 | Original URI’s depth | 0.0324 |

![Page 137: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/137.jpg)
![Page 138: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/138.jpg)
Enhancing TIRM
• Extending and tuning the features:
  • Linguistic feature analysis
  • Semantic similarity analysis using latent topic modeling
  • Dataset balancing
  • Feature selection and minimization
![Page 139: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/139.jpg)
A whole lot of features! 39 → 65 different features in the extended TIRM
Further details: refer to chapter 7
![Page 140: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/140.jpg)
TIRM enhancement and minimization results
![Page 141: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/141.jpg)
From binary to probabilistic strength
[Figure labels: Point of Confusion: C; Point of Certainty: S; Strongest Current Intention]
Further details: refer to chapter 7
![Page 142: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/142.jpg)
Intention strength formulation
Intention strength magnitude of the new resource:
Generalization with regard to class:
![Page 143: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/143.jpg)
Intention strength across instances in dataset
![Page 144: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/144.jpg)
![Page 145: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/145.jpg)
[Roadmap diagram: Shared Resource, Time, User. Stages shown: Alive, Missing, Replaced, Rate of Change, Archive & Creation, Detecting, Modeling, Predicting]
Can we find a relation between the modeled intention and time… to predict it?
![Page 146: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/146.jpg)
Remember: Data Collection
• From the SNAP dataset we extracted:
  • Tweets in English
  • Each has an embedded URI pointing to an external resource.
  • The embedded URI is shortened via Bit.ly
• The external resource:
  • Still persists.
  • Has at least 10 mementos.
  • Is unique.
We extracted 5,937 unique instances
![Page 147: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/147.jpg)
Intention strength across time
[Timeline figure along the time axis, from "Resource = closest memento" to "Resource = current version"]
• We have 10 mementos of the resource, uniformly distributed.
• We can calculate intention strength at every point.
![Page 148: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/148.jpg)
Intention strength across time
Dataset collection and calculation framework
![Page 149: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/149.jpg)
Behavior of instances in different classes
[Figure: three plots of intention strength against time; panel labels recovered: Steady Current Intention, Steady Past Intention]
![Page 150: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/150.jpg)
Behavior of instances in different classes
![Page 151: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/151.jpg)
Given the features we already collected, can we classify tweets according to their behavioral class?
![Page 152: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/152.jpg)
Classifying intention behavior across time
![Page 153: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/153.jpg)
If we can limit the features to the ones that exist before tweet time, can we perform a prediction?
![Page 154: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/154.jpg)
Classifying intention behavior across time
We can perform a prediction!
![Page 155: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/155.jpg)
Intention behavior prediction classifier
![Page 156: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/156.jpg)
Conclusion: We can predict the author’s temporal intention
• Results:
  • We can predict for the author, with 77% accuracy, whether the intention conveyed to the readers will stay consistent or will change.
• Publications:
  1. H. M. SalahEldeen and M. L. Nelson. Predicting Temporal Intention in Resource Sharing. In Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL '15, 2015.
![Page 157: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/157.jpg)
At this point, we have successfully detected, modeled, and predicted User’s Temporal Intention in Shared Content
![Page 158: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/158.jpg)
[Roadmap diagram: Shared Resource, Time, User. Stages shown: Alive, Missing, Replaced, Rate of Change, Archive & Creation, Detecting, Modeling, Predicting, User Temporal Intention]
Temporal Intention Model
![Page 159: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/159.jpg)
So we built an awesome prediction model for Temporal Intention… what next?
![Page 160: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/160.jpg)
A Framework of Temporal Intention
[Timeline: Posted a tweet → Read the tweet]
• Tools for authors
• Enrich the archives with current content for posterity
![Page 161: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/161.jpg)
Prediction API
![Page 162: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/162.jpg)
Tools for Authors
![Page 163: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/163.jpg)
Temporal Intention Implementation
[Timeline: Posted a tweet → Read the tweet]
• Tools for readers
• Maintain the temporal consistency of content
![Page 164: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/164.jpg)
Tools for readers
![Page 165: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/165.jpg)
Tools for readers
1. Temporal preservation of vulnerable content
2. Version recommendation based on temporal intention estimation
Target Publication: Utilizing Temporal Intention Prediction for Just-in-time Preservation and Recommendation of Vulnerable Social Media Content. WSDM 2016
![Page 166: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/166.jpg)
Motivation
Background
Related Research
Research Question
User-Time-Shared Resource
Conclusions
![Page 167: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/167.jpg)
Accomplished Goals
• Detect the temporal intention of the:
  1. Author at sharing time
  2. Reader at dereferencing time
  Further details: refer to chapter 6
• Model this intention as a function of time, nature of the resource, and its context.
  Further details: refer to chapter 7
• Predict how resources change with time and the intention behind sharing them to minimize inconsistency.
  Further details: refer to chapter 8
• Implement the prediction model to automatically preserve vulnerable social content that is prone to change or loss and provide a smooth temporal navigation of the social web.
  Further details: refer to chapter 9
![Page 168: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/168.jpg)
Our work also drew media attention…
![Page 169: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/169.jpg)
The Virginian-Pilot
![Page 170: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/170.jpg)
http://www.bbc.com/future/story/20120927-the-decaying-web
BBC.com
![Page 171: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/171.jpg)
Popular Mechanics
February 2014 issue, page 20
![Page 172: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/172.jpg)
3 × MIT Technology Review
http://www.technologyreview.com/view/513996/how-to-carbon-date-a-web-page/
http://www.technologyreview.com/view/519391/internet-archaeologists-reconstruct-lost-web-pages/
http://www.technologyreview.com/view/429274/history-as-recorded-on-twitter-is-vanishing-from-the-web-say-computer-scientists/
![Page 173: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/173.jpg)
Mashable
![Page 174: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/174.jpg)
Mashable
Yes, I am the Indiana Jones of the internet
![Page 175: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/175.jpg)
Publications
Published Submitted In preparation Planned
JCDL 2011 TPDL 2015 WWW 2016 IJDL 2016
TPDL 2012 SIGIR 2016 WSDM 2016
JCDL 2013
TPDL 2013
WWW 2013
DL 2014
AAAI 2015
IJDL 2015
JCDL 2015
![Page 176: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/176.jpg)
Remember Rémi Ochlik?
Rémi Ochlik
16 October 1983 – 22 February 2012
![Page 177: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/177.jpg)
… and the missing content about him?
Accessed in July 2014
![Page 178: Doctoral Defense: Hany SalahEldeen](https://reader036.vdocument.in/reader036/viewer/2022081502/55c62406bb61ebc4338b4726/html5/thumbnails/178.jpg)
We can maintain the consistency of history
Our Temporal Intention Relevancy Model