Negotiating the archives of UK web space Jane Winters, Professor of Digital Humanities, School of Advanced Study, University of London Workshop on National Webs, Aarhus, 8-9 December 2016


Page 1: Negotiating the archives of UK web space - NetLab › wp-content › uploads › 2016 › 12 › 12-Jane-Winters...2016/12/12  · Archive (1996-2016) Internet Archive (1996-2016)

Negotiating the archives of UK web space

Jane Winters, Professor of Digital Humanities, School of Advanced Study, University of London

Workshop on National Webs, Aarhus, 8-9 December 2016


Jisc Domain Dataset (1996-2013)

Legal Deposit Domain Crawl (2013-2016)

Open UKWA (2004-2016)

UK Parliament Web Archive (2009-2016)

UK Government Web Archive (1996-2016)

Internet Archive (1996-2016)

Common Crawl (1999-2016)

Archive-It

Other national web archives


Facts and figures I

• Jisc historical dataset 1996 to 6 April 2013

– 3,520,628,647 distinct records

– 65 terabytes

• 2014 domain crawl (.uk)

– 56TB data

– 2.5 billion webpages and other assets (including 4.7GB of viruses)


Facts and figures II

• UK Parliament Web Archive

– Three snapshots per year covering 30 sites (37 sites in the archive in total)

– 4.8TB data

• UK Government Web Archive

– 3,000+ websites

– Twitter (65,000 tweets) and video archives


Internal inconsistencies

• UKGWA consists of data provided by IA 2003-4 (plus back catalogue to 1996); and by the Internet Memory Foundation from 2005 onwards (further complicated by membership of UKWAC)

• The BL annual domain crawl has failed differently each time it has run

• The ‘break’ between IA and nationally archived content


[Figure: Number of format types, 1996-1997. Bar chart comparing 1996 and 1997 for text, image, application, video and file types; vertical axis 0 to 300.]


nexbri.demon.co.uk/local.gif 19970823153342 http://nexbri.demon.co.uk:80/local.gif image/gif 200 DFBOHMHZPPQSEAIGZGL5MTATRKVB3FGF - 40806909 DOTUK-HISTORICAL-1996-2010-GROUP-AK-XABCKD-20110428000000-00002.arc.gz

mirex.demon.co.uk/background3.gif 19970824013134 http://mirex.demon.co.uk:80/background3.gif image/* 200 Z2V3V4NZTEYL634PR4VPS7YWIVG7J4B4 - 40832067 DOTUK-HISTORICAL-1996-2010-GROUP-AK-XABCKD-20110428000000-00002.arc.gz

mirex.demon.co.uk/mirex.gif 19970824013153 http://mirex.demon.co.uk:80/mirex.gif image/* 200 KVZHDCQIPPU4T5TA6P4TCP2BAAJNSH6H - 40840076 DOTUK-HISTORICAL-1996-2010-GROUP-AK-XABCKD-20110428000000-00002.arc.gz

mirex.demon.co.uk/ibrowsenowanim.gif 19970824013315 http://mirex.demon.co.uk:80/IBrowseNowAnim.gif image/* 200 CQXESYZG2DMVYDISQDJVQCMDJAHD7YEK - 40860957 DOTUK-HISTORICAL-1996-2010-GROUP-AK-XABCKD-20110428000000-00002.arc.gz
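The records above look like space-separated CDX index lines, one per captured resource. As a minimal sketch, the following Python reads two of them into named fields and tallies the recorded MIME types. The field order (URL key, 14-digit timestamp, original URL, MIME type, HTTP status, digest, redirect placeholder, offset, ARC filename) is inferred from the sample lines rather than stated on the slide, and the names `CdxRecord` and `parse_cdx_line` are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class CdxRecord(NamedTuple):
    """One index line; field names are assumptions based on the sample."""
    url_key: str
    captured: datetime
    original_url: str
    mime_type: str
    status: int
    digest: str
    redirect: str
    offset: int
    filename: str


def parse_cdx_line(line: str) -> CdxRecord:
    """Split a nine-field, whitespace-separated index line into a record."""
    parts = line.split()
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    key, stamp, url, mime, status, digest, redirect, offset, filename = parts
    return CdxRecord(
        url_key=key,
        captured=datetime.strptime(stamp, "%Y%m%d%H%M%S"),
        original_url=url,
        mime_type=mime,
        status=int(status),  # assumes a numeric status, as in the sample
        digest=digest,
        redirect=redirect,
        offset=int(offset),
        filename=filename,
    )


# Two of the records shown on the slide, copied verbatim.
SAMPLE = [
    "nexbri.demon.co.uk/local.gif 19970823153342 "
    "http://nexbri.demon.co.uk:80/local.gif image/gif 200 "
    "DFBOHMHZPPQSEAIGZGL5MTATRKVB3FGF - 40806909 "
    "DOTUK-HISTORICAL-1996-2010-GROUP-AK-XABCKD-20110428000000-00002.arc.gz",
    "mirex.demon.co.uk/background3.gif 19970824013134 "
    "http://mirex.demon.co.uk:80/background3.gif image/* 200 "
    "Z2V3V4NZTEYL634PR4VPS7YWIVG7J4B4 - 40832067 "
    "DOTUK-HISTORICAL-1996-2010-GROUP-AK-XABCKD-20110428000000-00002.arc.gz",
]

records = [parse_cdx_line(line) for line in SAMPLE]

# Tally the MIME types exactly as recorded, including wildcards.
mime_counts = Counter(record.mime_type for record in records)

for record in records:
    print(record.captured.isoformat(), record.mime_type, record.original_url)
```

The tally keeps `image/*` separate from `image/gif`, since the index records a wildcard for some captures instead of a specific subtype, which would affect any count of format types built from this field.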


[Figure: Number of .uk names registered, 1996-2008 (Nominet). Chart by year, 1996 to 2008; vertical axis 0 to 1,800,000.]


Breakdown of domain name registrations, 2000 (Nominet)

• .co.uk: 1,575,655

• .org.uk: 108,711

• .ltd.uk: 4,626

• .plc.uk: 265

• .net.uk: 42

• .sch.uk: 8,830


Acknowledgements

• BUDDAH project team – Jonathan Blaney, Niels Brügger, Josh Cowls, Helen Hockx-Yu, Andrew Jackson, Eric Meyer, Ralph Schroeder, Jason Webber, Peter Webster

• Bursary holders – Rowan Aust, Rona Cran, Richard Deswarte, Saskia Huc-Hepher, Alison Kay, Gareth Millward, Marta Musso, Harry Raffal, Lorna Richardson, Helen Taylor