![Page 1: The Australian Government Web Archive ALIA Conference 2014 18 September 2014, Melbourne Alison Dellit Director, Australian Collection Management](https://reader036.vdocument.in/reader036/viewer/2022082817/56649e425503460f94b35a71/html5/thumbnails/1.jpg)
The Australian Government Web Archive
ALIA Conference 2014
18 September 2014, Melbourne
Alison Dellit
Director, Australian Collection Management
![Page 2: The Australian Government Web Archive ALIA Conference 2014 18 September 2014, Melbourne Alison Dellit Director, Australian Collection Management](https://reader036.vdocument.in/reader036/viewer/2022082817/56649e425503460f94b35a71/html5/thumbnails/2.jpg)
NLA web archive collections
• PANDORA Archive collection (open access)
– Selective web archiving since 1996
• Australian domain harvest collection (closed)
– Large scale, outsourced (IA), annual collection, since 2005
• Australian Government Web Archive collection (open access)
– Bulk seed list harvesting, outsourced (IA) and in-house run, annual (or more frequent)
– 2011, 2012, 2013 (x2) and 2014 (x2)
![Page 3: The Australian Government Web Archive ALIA Conference 2014 18 September 2014, Melbourne Alison Dellit Director, Australian Collection Management](https://reader036.vdocument.in/reader036/viewer/2022082817/56649e425503460f94b35a71/html5/thumbnails/3.jpg)
The government publication problem
[Chart: horizontal axis 2004 to 2014; vertical axis 0 to 7000]
![Page 4: The Australian Government Web Archive ALIA Conference 2014 18 September 2014, Melbourne Alison Dellit Director, Australian Collection Management](https://reader036.vdocument.in/reader036/viewer/2022082817/56649e425503460f94b35a71/html5/thumbnails/4.jpg)
[Chart: horizontal axis 2004 to 2014; vertical axis 0 to 7000]
![Page 5: The Australian Government Web Archive ALIA Conference 2014 18 September 2014, Melbourne Alison Dellit Director, Australian Collection Management](https://reader036.vdocument.in/reader036/viewer/2022082817/56649e425503460f94b35a71/html5/thumbnails/5.jpg)
So where did AGWA come from?
• Administrative conditions
• Whole-of-Government arrangements
– Gershon Review (Oct. 2008)
• May 2010 – Secretaries’ ICT Governance Board approval
• Non-corporate PGPA Agencies
• Commonwealth corporate entities
• Technical and development considerations
• NLA development of infrastructure and skills
• Large scale, bulk harvesting
• Access to large scale, bulk harvested collections
![Page 6: The Australian Government Web Archive ALIA Conference 2014 18 September 2014, Melbourne Alison Dellit Director, Australian Collection Management](https://reader036.vdocument.in/reader036/viewer/2022082817/56649e425503460f94b35a71/html5/thumbnails/6.jpg)
• Selective (‘targets’, ‘titles’)
– Small scale
– Reactive, timely, scheduled
– High curation
• Themed (curated seed lists, e.g. gov.au)
– Moderate scale
– Scheduled, timely
– High curation
• 2nd L Domain (e.g. org.au)
– Moderate to large scale
– Scheduled (moderate control)
– Moderate curation
• TL Domain (i.e. .au)
– Large scale
– Scheduled (low control)
– Low curation
• Whole Web (Internet Archive)
– Large scale
– Ongoing, unscheduled
– No curation control
PANDORA; AusCrawl 2005-2013; gov.au 2011-2013
![Page 7: The Australian Government Web Archive ALIA Conference 2014 18 September 2014, Melbourne Alison Dellit Director, Australian Collection Management](https://reader036.vdocument.in/reader036/viewer/2022082817/56649e425503460f94b35a71/html5/thumbnails/7.jpg)
NLA Web Archiving Statistics

| | PANDORA Web Archive | Australian Domain (.au) Web Archive | Australian Government Web Archive | All Collections |
|---|---|---|---|---|
| Approach | ‘Selective’ | ‘Country TL domain’ | ‘Seed-list’ | |
| Period | 1996 – Sept. 2014 | 2005-2014 | 2011-2014 | |
| Harvests | 102,000 instances | 9 crawls | 6 crawls | |
| Files | 269 million | 6.33 billion | 76.9 million | 6.67 billion |
| Data | 13 TB | 236 TB | 7 TB | 256 TB |
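
As a quick cross-check of the figures above, the sketch below sums the three per-collection values and compares them with the "All Collections" column. The dictionary keys and the `totals` helper are illustrative names, not part of the presentation; the numbers are copied from the slide as printed.

```python
# Figures as printed on the statistics slide:
# files in millions, data in terabytes.
collections = {
    "PANDORA Web Archive": {"files_millions": 269.0, "data_tb": 13},
    "Australian Domain (.au) Web Archive": {"files_millions": 6330.0, "data_tb": 236},
    "Australian Government Web Archive": {"files_millions": 76.9, "data_tb": 7},
}


def totals(colls):
    """Return (total files in millions, total data in TB) across collections."""
    files = sum(c["files_millions"] for c in colls.values())
    data = sum(c["data_tb"] for c in colls.values())
    return files, data


total_files_millions, total_data_tb = totals(collections)

# Share of all archived files held in the government web archive.
agwa_share = (
    collections["Australian Government Web Archive"]["files_millions"]
    / total_files_millions
)

print(f"Files: {total_files_millions:,.1f} million")
print(f"Data:  {total_data_tb} TB")
print(f"AGWA share of files: {agwa_share:.1%}")
```

The data column adds up to the stated 256 TB exactly. The file counts add up to roughly 6,676 million, which agrees with the stated 6.67 billion to within the rounding of the slide's figures.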
![Page 8: The Australian Government Web Archive ALIA Conference 2014 18 September 2014, Melbourne Alison Dellit Director, Australian Collection Management](https://reader036.vdocument.in/reader036/viewer/2022082817/56649e425503460f94b35a71/html5/thumbnails/8.jpg)
AGWA content
| | Total | Average harvest |
|---|---|---|
| Files | 34.5 million | ~8 million |
| Data | 3 TB | 750 GB – 1 TB |
[Chart: Data (TBs). Whole Domain Harvests 200 TBs; PANDORA Archive 11 TBs; AGWA 3 TBs]
![Page 10: The Australian Government Web Archive ALIA Conference 2014 18 September 2014, Melbourne Alison Dellit Director, Australian Collection Management](https://reader036.vdocument.in/reader036/viewer/2022082817/56649e425503460f94b35a71/html5/thumbnails/10.jpg)
AGWA futures
Coming soon:
• 2005-2011 harvest content
• More commonwealth agencies
• More integration to a catalogue near you.
Next few years:
• Integration into Trove
• Metadata extraction
• Visualisation of data