![Page 1: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/1.jpg)
We Need Multiple, Independent Web Archives
Panel 4: Social Media Research Data, Tools, and Methodologies
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
www.cs.odu.edu/~mln/
@phonedude_mln
With: ODU: Michele C. Weigle
Los Alamos National Laboratory: Herbert Van de Sompel
![Page 2: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/2.jpg)
![Page 3: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/3.jpg)
timetravel.mementoweb.org
http://timetravel.mementoweb.org/list/20140525002314/http://www.bbc.co.uk/
e.g., bbc.co.uk in six different archives…
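The list URL on this slide follows a visible pattern: the service base, `/list/`, a 14-digit timestamp, then the original URI. A minimal sketch of building such a URL, assuming that pattern holds in general (the function name `time_travel_list_url` is hypothetical, not part of any Memento library):

```python
from datetime import datetime

# Base of the Time Travel service named on the slide.
TIME_TRAVEL_BASE = "http://timetravel.mementoweb.org"


def time_travel_list_url(original_url: str, when: datetime) -> str:
    """Build a Time Travel 'list' URL for original_url near the datetime 'when'.

    Assumption: the path layout is /list/<YYYYMMDDhhmmss>/<original URI>,
    inferred from the example URL shown on the slide.
    """
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{TIME_TRAVEL_BASE}/list/{stamp}/{original_url}"


# Rebuild the slide's example for bbc.co.uk.
slide_url = time_travel_list_url("http://www.bbc.co.uk/", datetime(2014, 5, 25, 0, 23, 14))
print(slide_url)
```

The sketch only constructs the URL; fetching it and comparing what each archive returns is left out.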
![Page 4: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/4.jpg)
Segal’s Law
A man with a watch knows what time it is. A man with two watches is never sure.
How to resolve conflicting archives?
Personalization, GeoIP, mobile vs. desktop, etc. mean “the” page rarely exists, only “a” page.
Mat Kelly, Justin F. Brunelle, Michele C. Weigle, and Michael L. Nelson, A Method for Identifying Personalized Representations in Web Archives,
D-Lib Magazine, 19(11/12), 2013. http://www.dlib.org/dlib/november13/kelly/11kelly.html
![Page 5: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/5.jpg)
Why we need multiple, independent archives…
![Page 6: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/6.jpg)
A single archive is vulnerable
http://www.bbc.com/news/uk-politics-24924185
http://ws-dl.blogspot.com/2013/11/2013-11-21-conservative-party-speeches.html
![Page 7: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/7.jpg)
Houston, Tranquility Base Here. The Eagle has landed.
see also: http://ws-dl.blogspot.com/2013/03/2013-03-22-ntrs-web-archives-and-why-we.html
![Page 8: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/8.jpg)
http://www.theguardian.com/technology/2015/feb/19/google-acknowledges-some-people-want-right-to-be-forgotten
![Page 9: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/9.jpg)
$ curl -I "http://www.thedailybeast.com/articles/2016/08/11/i-got-three-grindr-dates-in-an-hour-in-the-olympic-village.html"
HTTP/1.1 301 Moved Permanently
Access-Control-Allow-Origin: *
Age: 0
Cache-Control: max-age=60
Content-Type: text/html; charset=iso-8859-1
Date: Thu, 18 Aug 2016 01:13:46 GMT
Location: http://www.thedailybeast.com/articles/2016/08/11/a-note-from-the-editors.html
RealAge: 0
Server: Apache
Vary: Accept-Encoding, User-Agent
Via: 1.1 varnish
X-BackEnd: default
X-Cache: MISS
X-Cacheable: YES
X-Restarts: 0
X-UA-Device: pc
X-Varnish: 995407903
Connection: keep-alive
http://www.usnews.com/news/articles/2016-08-17/wayback-machine-wont-censor-archive-for-taste-director-says-after-olympics-article-scrubbed
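The response above shows the original article URL answering with a 301 that points at an editors' note. A rough sketch of how a script might spot this kind of replacement from raw `curl -I` output; the helper `parse_head_response` is hypothetical, and the sample text is a subset of the headers on the slide:

```python
def parse_head_response(raw: str):
    """Split raw HEAD-response text into (status code, reason, headers dict)."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    status_line, header_lines = lines[0], lines[1:]
    # Status line looks like: HTTP/1.1 301 Moved Permanently
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for ln in header_lines:
        # Split on the first colon only, so URLs in values stay intact.
        name, _, value = ln.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


# A subset of the headers shown on the slide.
RAW = """HTTP/1.1 301 Moved Permanently
Cache-Control: max-age=60
Date: Thu, 18 Aug 2016 01:13:46 GMT
Location: http://www.thedailybeast.com/articles/2016/08/11/a-note-from-the-editors.html
Server: Apache
"""

code, reason, headers = parse_head_response(RAW)
is_redirect = code in (301, 302, 303, 307, 308)
target = headers.get("location")
print(code, reason, "->", target)
```

A redirect by itself does not prove content was removed; it only flags that the URI no longer returns the representation an archive may have captured earlier.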
![Page 10: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/10.jpg)
But who pays for those extra archives?
1TB endowment = ~$4700: http://blog.dshr.org/2011/02/paying-for-long-term-storage.html
see also: http://blog.dshr.org/2011/01/memento-marketplace-for-archiving.html
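As a back-of-the-envelope illustration of what "extra archives" might cost, the sketch below scales the slide's ~$4700 per terabyte figure by collection size and number of independent copies. Linear scaling is an assumption made here for illustration; the linked post describes a more detailed model, and `endowment_usd` is a hypothetical helper.

```python
# Approximate endowment per terabyte, taken from the slide.
ENDOWMENT_PER_TB_USD = 4700


def endowment_usd(terabytes: float, independent_copies: int = 1) -> float:
    """Estimate the endowment needed to keep 'terabytes' in several archives.

    Assumption: cost scales linearly with size and with the number of
    independent copies, with no shared infrastructure or discounts.
    """
    return terabytes * independent_copies * ENDOWMENT_PER_TB_USD


# Example: a 10 TB collection held by three independent archives.
print(endowment_usd(10, independent_copies=3))
```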
![Page 11: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/11.jpg)
Archives Aren’t Magic Web Sites
They’re Just Web Sites.
If you used Mummify, you’re now left with a bunch of defunct, shortened links like: https://mummify.it/XbmcMfE3
Don’t throw away link semantics! See: http://robustlinks.mementoweb.org
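One way to keep link semantics, as I understand the Robust Links convention, is to leave the original URI in `href` and carry the archived copy and its date in `data-versionurl` and `data-versiondate` attributes. A small sketch under that assumption; `robust_link` is a hypothetical helper, and the URLs are the MH17 example that appears later in this deck:

```python
from html import escape


def robust_link(original_url: str, memento_url: str, version_date: str, text: str) -> str:
    """Return an HTML anchor that keeps the original URI and adds archive hints.

    Assumption: the attribute names data-versionurl and data-versiondate
    follow the Robust Links convention; check the specification before use.
    """
    return '<a href="{}" data-versionurl="{}" data-versiondate="{}">{}</a>'.format(
        escape(original_url, quote=True),
        escape(memento_url, quote=True),
        escape(version_date, quote=True),
        escape(text),
    )


link = robust_link(
    "http://vk.com/strelkov_info",
    "http://web.archive.org/web/20140717152222/http://vk.com/strelkov_info",
    "2014-07-17",
    "strelkov_info on vk.com",
)
print(link)
```

Unlike a shortened link, this anchor still names the original URI, so a reader can look for copies in other archives if the one cited disappears.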
![Page 12: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/12.jpg)
Economics Working Against Archives
In the paper world in order to monetize their content the copyright owner had to maximize the number of copies of it. In the Web world, in order to monetize their content the copyright owner has to minimize the number of copies. Thus the fundamental economic motivation for Web content militates against its preservation in the ways that Herbert and I would like.
--David Rosenthal
http://blog.dshr.org/2015/02/the-evanescent-web.html
![Page 13: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/13.jpg)
“We’ll use the cloud!”
![Page 14: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/14.jpg)
https://www.chriswatterston.com/blog/my-there-no-cloud-sticker
![Page 15: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/15.jpg)
http://www.bbc.com/future/story/20120927-the-decaying-web
On January 28 2011, three days into the fierce protests that would eventually oust the Egyptian president Hosni Mubarak, a Twitter user called Farrah posted a link to a picture that supposedly showed an armed man as he ran on a “rooftop during clashes between police and protesters in Suez”. I say supposedly, because both the tweet and the picture it linked to no longer exist. Instead they have been replaced with error messages that claim the message – and its contents – “doesn’t exist”.
![Page 16: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/16.jpg)
Missing Tweet & Pic
https://twitter.com/Farrah3m/status/31727870736859137
http://twitpic.com/3uvo6z
http://ws-dl.blogspot.com/2013/05/2013-05-07-who-is-archiving-your-tweets.html
![Page 17: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/17.jpg)
In May 2013, not completely missing…
![Page 18: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/18.jpg)
In February 2015, completely missing.
http://topsy.com/http://twitpic.com/3uvo6z
![Page 19: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/19.jpg)
In 2016, Redirecting
http://topsy.com/http://twitpic.com/3uvo6z
![Page 20: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/20.jpg)
In 2016, Redirecting
http://topsy.com/http://twitpic.com/3uvo6z
![Page 21: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/21.jpg)
No Server == No HTTP Event == Nothing to Archive
http://topsy.com/http://twitpic.com/3uvo6z
![Page 22: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/22.jpg)
Hany M. SalahEldeen, Michael L. Nelson, Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?, Proceedings of TPDL 2012. http://arxiv.org/abs/1209.3026
Hany SalahEldeen, Michael L. Nelson, Resurrecting My Revolution: Using Social Link Neighborhood in Bringing Context to the Disappearing Web, Proceedings of TPDL 2013. http://arxiv.org/abs/1309.2648
Missing: 11% year 1, 7%/year afterwards
Archived: 7% year 1, 15%/year afterwards
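Read literally, the rates above give a simple piecewise-linear estimate of how much shared content is missing or archived after a given number of years. The sketch below is only that literal reading; the function name, the cap at 100%, and the restriction to one year or more are assumptions made here, not taken from the papers.

```python
def fraction_after(years: float, first_year: float, per_year_after: float) -> float:
    """Estimate a cumulative fraction after 'years' (must be at least 1).

    Assumption: the first-year figure applies at year 1 and the later
    rate accrues linearly afterwards, capped at 100%.
    """
    if years < 1:
        raise ValueError("the slide gives no figures for less than one year")
    return min(1.0, first_year + per_year_after * (years - 1))


def missing(years: float) -> float:
    # Slide: 11% missing in year 1, then 7% per year.
    return fraction_after(years, 0.11, 0.07)


def archived(years: float) -> float:
    # Slide: 7% archived in year 1, then 15% per year.
    return fraction_after(years, 0.07, 0.15)


for y in (1, 2, 3):
    print(y, round(missing(y), 2), round(archived(y), 2))
```

The two fractions are not mutually exclusive: a resource can be missing from the live web and still be archived.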
![Page 23: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/23.jpg)
Malaysia Airlines Flight 17 (MH17)
http://web.archive.org/web/20140717152222/http://vk.com/strelkov_info
http://www.csmonitor.com/World/Europe/2014/0717/Web-evidence-points-to-pro-Russia-rebels-in-downing-of-MH17-video
http://www.newyorker.com/magazine/2015/01/26/cobweb
![Page 24: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/24.jpg)
![Page 25: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/25.jpg)
(not really archived as well as you think)
![Page 26: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/26.jpg)
Ed and I Discuss Who Has What…
https://twitter.com/phonedude_mln/status/490171976389238784
![Page 27: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/27.jpg)
Remember MH17?
https://twitter.com/phonedude_mln/status/490171976389238784
![Page 28: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/28.jpg)
Alex is now 404.
Would multiple archives have convinced him?
https://twitter.com/quicknquiet
![Page 29: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/29.jpg)
Do we really have “a perfect tool to produce ‘evidence’ of any kind”?
![Page 30: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/30.jpg)
@AstroKatie Schools @gary4205
https://twitter.com/AstroKatie/status/765344020184739840
![Page 31: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/31.jpg)
But can you prove he didn’t say this?
![Page 32: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/32.jpg)
Or that she didn’t say this?
(remember: black hats can use tools created by white hats)
![Page 33: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/33.jpg)
Mutt and Jeff
http://quoteinvestigator.com/2013/04/11/better-light/
![Page 34: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/34.jpg)
Hey #Twitter, did you know there’s flooding in LA…
https://www.facebook.com/KevinFreyTV/photos/a.1678627819032359.1073741829.1675465999348541/1834217933473346/?type=1&theater
Reminder: Facebook ~5X Larger Than Twitter
![Page 35: We Need Multiple, Independent Web Archives](https://reader036.vdocument.in/reader036/viewer/2022062522/58841caf1a28ab485c8b498d/html5/thumbnails/35.jpg)
Summary
• Segal’s Law has come to web archiving
– Learn more about archive interoperability: http://mementoweb.org/
• The archived web is incomplete, unstable, unreliable, and unevenly distributed
– Always true for archives, but shouldn’t we expect better?
– Learn more about archival verifiability: https://mellon.org/grants/grants-database/grants/old-dominion-university/11600663/