The Australian Government Web Archive ALIA Conference 2014 18 September 2014, Melbourne Alison Dellit Director, Australian Collection Management


TRANSCRIPT

Page 1

The Australian Government Web Archive
ALIA Conference 2014
18 September 2014, Melbourne

Alison Dellit
Director, Australian Collection Management

Page 2

NLA web archive collections

• PANDORA Archive collection (open access)
  – Selective web archiving since 1996

• Australian domain harvest collection (closed)
  – Large scale, outsourced (IA), annual collection, since 2005

• Australian Government Web Archive collection (open access)
  – Bulk seed list harvesting, outsourced (IA) and in-house run, annual (or more frequent)
  – 2011, 2012, 2013 (x2) and 2014 (x2)

Page 3

The government publication problem

[Chart: annual figures for 2004 to 2014; vertical axis from 0 to 7000]

Page 4

[Chart: annual figures for 2004 to 2014; vertical axis from 0 to 7000]

Page 5

So where did AGWA come from?

• Administrative conditions
• Whole-of-Government arrangements
  – Gershon Review (Oct. 2008)
• May 2010 – Secretaries’ ICT Governance Board approval
• Non-corporate PGPA agencies; Commonwealth corporate entities
• Technical and development considerations
• NLA development of infrastructure and skills
• Large scale, bulk harvesting
• Access to large scale, bulk harvested collections

Page 6

• Selective (‘targets’, ‘titles’)
  – Small scale
  – Reactive, timely, scheduled
  – High curation

• Themed (curated seed lists, e.g. gov.au)
  – Moderate scale
  – Scheduled, timely
  – High curation

• 2nd-level domain (e.g. org.au)
  – Moderate to large scale
  – Scheduled (moderate control)
  – Moderate curation

• Top-level domain (i.e. .au)
  – Large scale
  – Scheduled (low control)
  – Low curation

• Whole Web (Internet Archive)
  – Large scale
  – Ongoing, unscheduled
  – No curation control

PANDORA; AusCrawl 2005-2013; gov.au 2011-2013

Page 7

NLA Web Archiving Statistics

• PANDORA Web Archive (‘Selective’), 1996 – Sept. 2014 (102,000 instances)
  – Files: 269 million
  – Data: 13 TB

• Australian Domain (.au) Web Archive (‘Country TL domain’), 2005-2014 (9 crawls)
  – Files: 6.33 billion
  – Data: 236 TB

• Australian Government Web Archive (‘Seed-list’), 2011-2014 (6 crawls)
  – Files: 76.9 million
  – Data: 7 TB

• All Collections
  – Files: 6.67 billion
  – Data: 256 TB

Page 8

AGWA content

• Files: 34.5 million total; ~8 million per average harvest
• Data: 3 TB total; 750 GB – 1 TB per average harvest

[Chart: Data (TBs)]
• Whole Domain Harvests: 200 TB
• PANDORA Archive: 11 TB
• AGWA: 3 TB

Page 9

http://webarchive.nla.gov.au/gov

Page 10

AGWA futures

Coming soon:
• 2005-2011 harvest content
• More Commonwealth agencies
• More integration to a catalogue near you

Next few years:
• Integration into Trove
• Metadata extraction
• Visualisation of data