PANDORA: An Overview

Future-proofing Institutional Websites

19-20 January 2006

London

Matthew Walker

Deputy Director, Collection Infrastructure

IT Division

National Library of Australia

04/13/23 Future-proofing Institutional Websites, 19-20 January 2006, London

Introduction

• Origin: proof-of-concept

• Selection work started in 1996

• Archiving began in late 1996/early 1997
  – Few automated processes at first
  – Progressed to a more automated approach

• Now: an important NLA archiving activity

How?

• Dynamic approach
  – Low structure, high flexibility
  – Processes developed “on the fly”

• Result
  – Outcomes achieved
  – Best use of available resources

Who?

• NLA
  – Digital Archiving Section
    • Business responsibility (~7 staff)
  – Librarians (support as needed)
    • Cataloguing
  – Information Technology
    • Support (~1 staff)
    • Enhancement/redevelopment (~4 staff)

Who?

• Partner Institutions
  – Libraries:
    • Northern Territory Library, State Library of New South Wales, State Library of Queensland, State Library of South Australia, State Library of Victoria, State Library of Western Australia
  – Other:
    • Australian Institute of Aboriginal and Torres Strait Islander Studies, Australian War Memorial, National Film and Sound Archive

What?

• NLA responsibilities
  – National Library Act, 1960
    • No legal deposit legislation for electronic resources!
  – Maintain and develop a national collection of ‘library material’
  – Comprehensive collection relating to Australia and the Australian people
  – Leadership role

Characteristics

• Selective approach

• Scalable to available resources

• Negotiate permission to archive

• Manual quality assurance processes

• Access to the archived resources

Issues

• Resources that are not selected will be missing for future researchers

• Labour intensive

• Full linking structure of the Internet not retained

• Deep web content not archived

Workflow

1. Nominating/Identifying

• Publisher self-nomination
  – Nomination form (http://pandora.nla.gov.au/registration_form.html)

• Indexing/abstracting agency nominations
  – Nomination form (http://pandora.nla.gov.au/indexerform.html)

• NLA’s Digital Archiving Section (DAS)

• Partner institutions

Workflow

2. Selecting

• DAS
  – NLA selection guidelines (http://pandora.nla.gov.au/selectionguidelines.html)

• Partner institutions
  – Own selection guidelines

• Type of content
  – Documents (e.g. PDF)
  – Whole and partial websites

Workflow

3. Gathering

• Mechanisms
  – HTTrack crawling (http://www.httrack.com)
  – FTP from publisher
  – Email from publisher

• Preservation copy

• Post-crawl processing

• Working area
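Gathering pairs a crawl with post-crawl processing that produces a preservation copy and a working area. The following is a minimal Python sketch under stated assumptions. `httrack_command` builds only the basic `httrack <URL> -O <path>` form. `split_gather`, the folder names, and the idea of copying one crawl into both areas are all hypothetical and are not PANDAS internals.

```python
import shutil
import tempfile
from pathlib import Path


def httrack_command(url: str, output_dir: str) -> list[str]:
    """Build a basic HTTrack invocation: httrack <URL> -O <output path>.

    Real gathers would also set scope filters and other options; they are
    deliberately left out of this sketch.
    """
    return ["httrack", url, "-O", output_dir]


def split_gather(crawl_dir: Path, archive_root: Path) -> tuple[Path, Path]:
    """Copy a finished crawl into a preservation copy and a working area.

    The folder names "preservation" and "working" are hypothetical. The
    preservation copy is meant to stay exactly as gathered; only the working
    copy is edited during processing.
    """
    preservation = archive_root / "preservation"
    working = archive_root / "working"
    shutil.copytree(crawl_dir, preservation)
    shutil.copytree(crawl_dir, working)
    return preservation, working


if __name__ == "__main__":
    # Where HTTrack is installed, the command could be run with
    # subprocess.run(cmd, check=True); here it is only printed.
    print(httrack_command("http://www.example.com/", "/tmp/gather"))

    with tempfile.TemporaryDirectory() as tmp:
        crawl = Path(tmp) / "crawl"
        crawl.mkdir()
        (crawl / "index.html").write_text("<p>Hello</p>")
        pres, work = split_gather(crawl, Path(tmp) / "title-0001")
        print(sorted(p.name for p in pres.iterdir()), sorted(p.name for p in work.iterdir()))
```

Keeping two physical copies means that edits made during processing can never alter what was actually gathered.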

Workflow

4. Processing

• Quality assurance
  – Manual check for viewing/linking errors
  – Completeness and functionality
  – New content (compare with previous instance)
  – No unexpected content

• Modifications
  – Write access to the working area
  – Add missing files, fix broken links, etc.
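Two of these quality-assurance checks can be partly automated: spotting new content by comparing with the previous instance, and finding broken links. The helper below is hypothetical and is not part of PANDAS. It assumes an instance is held as a mapping of relative file paths to bytes, compares SHA-256 digests between instances, and flags local `href`/`src` targets that are missing from the instance. It supports the manual checks on the slide rather than replacing them.

```python
import hashlib
import posixpath
from html.parser import HTMLParser


def manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file path in a gathered instance to a SHA-256 digest of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def compare_instances(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """List files added, removed or changed since the previous instance."""
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(p for p in current.keys() & previous.keys() if current[p] != previous[p]),
    }


class _LinkCollector(HTMLParser):
    """Collect href and src attribute values from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def broken_local_links(files: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return (page, link) pairs whose local target is missing from the instance."""
    broken = []
    for page, data in files.items():
        if not page.endswith((".html", ".htm")):
            continue
        collector = _LinkCollector()
        collector.feed(data.decode("utf-8", errors="replace"))
        for link in collector.links:
            # External, in-page and non-HTTP links are outside this check.
            if "://" in link or link.startswith(("#", "mailto:", "javascript:")):
                continue
            target = link.split("#", 1)[0].split("?", 1)[0]
            if not target:
                continue
            if target.startswith("/"):
                resolved = posixpath.normpath(target.lstrip("/"))
            else:
                resolved = posixpath.normpath(posixpath.join(posixpath.dirname(page), target))
            if resolved not in files:
                broken.append((page, link))
    return broken
```

In this sketch, any path listed under "added" or "changed" is a candidate for the "new content" check. Each broken link is a candidate for "add missing files" in the working area.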

Workflow

5. Archiving

• Transfer master display copy from working area to Digital Object Storage System (DOSS)

• Transfer preservation copy to preservation area on the DOSS

• Create display copy on web server
  – Still not published!

Workflow

6. Publishing

• Title Entry Page (TEP)
  – Created from metadata
  – Additional links to notes, links to serial issues, copyright statement, etc.
  – Creation makes the archived copy publicly accessible

• Persistent Identifiers (PIs)
  – e.g. nla.arc-25849-20051113-www.bullyingnoway.com.au/default.html
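The example identifier appears to combine a title number, a gather date and the archived URL path. The small parser below assumes exactly that layout (`nla.arc-<title id>-<YYYYMMDD>-<path>`), which the slide does not spell out. Reading 20051113 as 13 November 2005 is part of that assumption, and `ArchivedRef`, `parse_pi` and `format_pi` are hypothetical names.

```python
import re
from datetime import date
from typing import NamedTuple

# Assumed layout: nla.arc-<title id>-<YYYYMMDD gather date>-<archived path>
PI_PATTERN = re.compile(r"^nla\.arc-(?P<title_id>\d+)-(?P<date>\d{8})-(?P<path>.+)$")


class ArchivedRef(NamedTuple):
    title_id: str   # kept as text so the identifier round-trips exactly
    gathered: date
    path: str


def parse_pi(pi: str) -> ArchivedRef:
    """Split a PANDORA-style persistent identifier into its assumed parts."""
    match = PI_PATTERN.match(pi)
    if match is None:
        raise ValueError(f"unrecognised identifier: {pi!r}")
    d = match["date"]
    gathered = date(int(d[:4]), int(d[4:6]), int(d[6:]))
    return ArchivedRef(match["title_id"], gathered, match["path"])


def format_pi(ref: ArchivedRef) -> str:
    """Rebuild the identifier string from its parts (inverse of parse_pi)."""
    return f"nla.arc-{ref.title_id}-{ref.gathered:%Y%m%d}-{ref.path}"


if __name__ == "__main__":
    ref = parse_pi("nla.arc-25849-20051113-www.bullyingnoway.com.au/default.html")
    print(ref)
    print(format_pi(ref))
```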

Workflow

7. Cataloguing

• Bibliographic details
  – NLA catalogue
  – National Bibliographic Database (NBD)

• Metadata imported into PANDORA TEPs
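Read together, steps 1 to 7 form a pipeline in which an archived copy becomes public only when its TEP is created. The Python model below is hypothetical and simplifies the real workflow by forcing the steps to run strictly in sequence. `Stage` and `Title` are illustrative names, not PANDAS objects.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The seven workflow steps, in slide order."""
    NOMINATED = 1
    SELECTED = 2
    GATHERED = 3
    PROCESSED = 4
    ARCHIVED = 5
    PUBLISHED = 6
    CATALOGUED = 7


class Title:
    """Track one title through the workflow (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.NOMINATED

    def advance(self, to: Stage) -> None:
        # Simplification: steps must happen strictly one after another.
        if to != self.stage + 1:
            raise ValueError(f"{self.name}: cannot go from {self.stage.name} to {to.name}")
        self.stage = to

    @property
    def publicly_accessible(self) -> bool:
        # An archived copy stays dark until its Title Entry Page is created.
        return self.stage >= Stage.PUBLISHED


if __name__ == "__main__":
    title = Title("example title")
    for step in (Stage.SELECTED, Stage.GATHERED, Stage.PROCESSED, Stage.ARCHIVED):
        title.advance(step)
    print(title.stage.name, title.publicly_accessible)
    title.advance(Stage.PUBLISHED)
    print(title.stage.name, title.publicly_accessible)
```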

Workflow

• Permissions
  – No legal deposit
  – Explicit permission of the publisher is sought prior to archiving

• Copyright, etc.
  – Publisher’s permission to make publicly available
  – Restrictions
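Because there is no legal deposit for electronic resources, permission controls both gathering and public display. The minimal sketch below models that control with three yes/no fields per title. The field names and the "archived, not public" outcome are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Permission:
    """Publisher permissions recorded for one title (hypothetical fields)."""
    archive: bool = False        # explicit permission to archive
    public_access: bool = False  # permission to make the copy publicly available
    restricted: bool = False     # public access is subject to a restriction


def may_gather(p: Permission) -> bool:
    # No legal deposit: nothing is gathered without explicit permission.
    return p.archive


def access_level(p: Permission) -> str:
    """Classify how an archived copy may be offered to users."""
    if not p.archive:
        return "not archived"
    if not p.public_access:
        return "archived, not public"
    return "restricted" if p.restricted else "public"
```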

Workflow

• Restrictions
  – Publisher restrictions on access

• Period
  – e.g. accessible from restricted location/s for 5 years
  – Location is specified by IP address and subnet mask

• Date
  – e.g. accessible from restricted location/s between 3/12/2005 and 31/1/2007
  – Location is specified by IP address and subnet mask

• Authenticated group
  – e.g. accessible by username/password credentials

• Can be enabled/disabled in PANDAS
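Both location-based restrictions reduce to an IP/subnet membership test applied within a date window. The sketch below uses Python's standard `ipaddress` module, which accepts networks written as address/subnet mask. It assumes three things: the window is inclusive, a "period" restriction runs from a given start date, and the content is open to everyone once the window has passed. `LocationRestriction` and `add_years` are hypothetical names, and authenticated-group access is not modelled.

```python
import ipaddress
from dataclasses import dataclass
from datetime import date


def add_years(start: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


@dataclass
class LocationRestriction:
    network: str  # "address/subnet mask", e.g. "192.0.2.0/255.255.255.0"
    start: date
    end: date     # inclusive

    @classmethod
    def for_period(cls, network: str, start: date, years: int) -> "LocationRestriction":
        # A "period" restriction becomes a date range starting on `start`.
        return cls(network, start, add_years(start, years))

    def allows(self, client_ip: str, on: date) -> bool:
        # Assumption: outside the restricted window the copy is open to everyone.
        if not (self.start <= on <= self.end):
            return True
        return ipaddress.ip_address(client_ip) in ipaddress.ip_network(self.network)
```

The "Date" example on the slide (3/12/2005 to 31/1/2007, day/month/year) would be `LocationRestriction("192.0.2.0/255.255.255.0", date(2005, 12, 3), date(2007, 1, 31))`, using a documentation-only address range in place of a real one.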

NLA Tools

• PANDAS
  – http://pandora.nla.gov.au/pandas.html
  – Web archive management system.

• XINQ
  – http://www.nla.gov.au/xinq/
  – Making deep web database archives accessible by browse/search.
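XINQ's role is to offer browse and search over archived database content. The sketch below illustrates only that idea. It assumes the database has been exported as XML with one `<record>` element per row; the export and its element names are invented, and the code is not XINQ's own interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export of a small database; element names are invented.
EXPORT = """
<records>
  <record id="1"><title>Annual report</title><year>2004</year></record>
  <record id="2"><title>Newsletter, issue 3</title><year>2005</year></record>
  <record id="3"><title>Annual report</title><year>2005</year></record>
</records>
"""


def search(root: ET.Element, field: str, term: str) -> list[str]:
    """Return ids of records whose `field` contains `term` (case-insensitive)."""
    term = term.lower()
    return [
        rec.get("id", "")
        for rec in root.iter("record")
        if term in (rec.findtext(field) or "").lower()
    ]


def browse(root: ET.Element, field: str) -> dict[str, list[str]]:
    """Group record ids by the value of `field`, sorted by value."""
    groups: dict[str, list[str]] = {}
    for rec in root.iter("record"):
        groups.setdefault(rec.findtext(field) or "", []).append(rec.get("id", ""))
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    root = ET.fromstring(EXPORT.strip())
    print(search(root, "title", "report"))
    print(browse(root, "year"))
```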

Other Tools

• PageVault
  – http://www.projectcomputing.com/products/pageVault/
  – Archives your website by keeping a copy of every accessed version of a page as it passes through your web server.

• HTTrack
  – http://www.httrack.com
  – Desktop/command-line tool for crawling websites.

• Heritrix
  – http://crawler.archive.org/
  – Tool from the Internet Archive for crawling the web.
  – Designed for large-scale crawls rather than individual websites.
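The pageVault approach keeps each distinct response as it leaves the web server. It can be sketched as a store that records a new version only when the content changes. The class below is a hypothetical in-memory illustration using SHA-256 digests, not pageVault's implementation, and it assumes responses are observed in chronological order.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


class ResponseArchive:
    """Keep every distinct version of each URL's response, in time order."""

    def __init__(self) -> None:
        # url -> list of (time first seen, SHA-256 digest, body)
        self.versions: dict[str, list[tuple[datetime, str, bytes]]] = {}

    def observe(self, url: str, body: bytes, when: Optional[datetime] = None) -> bool:
        """Record a response; return True only if it is a new version."""
        digest = hashlib.sha256(body).hexdigest()
        history = self.versions.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False  # identical to the latest stored version
        history.append((when or datetime.now(timezone.utc), digest, body))
        return True

    def as_of(self, url: str, when: datetime) -> Optional[bytes]:
        """Return the version that was current at `when`, or None."""
        current = None
        for seen, _digest, body in self.versions.get(url, []):
            if seen > when:
                break
            current = body
        return current
```

Comparing against the latest stored version only, rather than all earlier versions, means that a page reverting to older content is recorded again as a new point in its timeline.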

PANDORA Resources

• Selection guidelines
  – http://pandora.nla.gov.au/selectionguidelinesallpartners.html

• Papers & presentations
  – http://pandora.nla.gov.au/papers.html

Other Resources

• PANDORA Archiving Issues FAQ
  http://pandora.nla.gov.au/manual/pandas/faq.html

• NLA Digital Archiving Section - General Procedures (Procedures for handling Internet resources)
  http://pandora.nla.gov.au/manual/general_procedures.html

• NLA Digital Archiving Section Manual - Check List for Scheduled Gatherings
  http://pandora.nla.gov.au/manual/checklist.html

• NLA Digital Archiving Section Manual - Gathering Schedule Guidelines
  http://pandora.nla.gov.au/manual/schedule_guidelines.html

Future Directions/Issues

• Deep web – database archiving

• Historical repository of tools for viewing archive content

• New & future ways of authoring & publishing to the web
  – XML publishing, blogs, DB driven, wikis…
  – What’s coming in 2, 5 or 10 years’ time?

Recommendations for starting out

• Do something small & do it now.

• Build on what you already have.

• Think about what you have done and revise/expand as necessary.

Summary

• The PANDORA story

• Tools and resources

• Futures/ideas