PANDORA: An Overview
Future-proofing Institutional Websites
19-20 January 2006
London
Matthew Walker
Deputy Director, Collection Infrastructure
IT Division
National Library of Australia
04/13/23 Future-proofing Institutional Websites, 19-20 January 2006, London
Introduction
• Origin: Proof-of-concept
• Selection work started in 1996
• Archiving began late 1996/early 1997
  – Few automated processes
  – Progressed to more automated approach
• Now: Important NLA archiving activity
How?
• Dynamic approach
  – Low structure, high flexibility
  – Processes developed “on the fly”
• Result
  – Outcomes achieved
  – Best use of available resources
Who?
• NLA
  – Digital Archiving Section
    • Business responsibility (~7 staff)
  – Librarians (support as needed)
    • Cataloguing
  – Information Technology
    • Support (~1 staff)
    • Enhancement/Redevelopment (~4 staff)
Who?
• Partner Institutions
  – Libraries:
    • Northern Territory Library, State Library of New South Wales, State Library of Queensland, State Library of South Australia, State Library of Victoria, State Library of Western Australia
  – Other:
    • Australian Institute of Aboriginal and Torres Strait Islander Studies, Australian War Memorial, National Film and Sound Archive
What?
• NLA responsibilities
  – National Library Act, 1960
    • No legal deposit legislation for electronic resources!
  – Maintain and develop a national collection of ‘library material’
  – Comprehensive collection relating to Australia and the Australian people
  – Leadership role
Characteristics
• Selective approach
• Scalable to available resources
• Negotiate permission to archive
• Manual quality assurance processes
• Access to the archived resources
Issues
• Missing resources for future researchers
• Labour intensive
• Full linking structure of the Internet not retained
• Deep web content not archived
Workflow
1. Nominating/Identifying
• Publisher self-nomination
  – Nomination form (http://pandora.nla.gov.au/registration_form.html)
• Indexing/abstracting agency nominations
  – Nomination form (http://pandora.nla.gov.au/indexerform.html)
• NLA’s Digital Archiving Section (DAS)
• Partner institutions
Workflow
2. Selecting
• DAS
  – NLA selection guidelines (http://pandora.nla.gov.au/selectionguidelines.html)
• Partner institutions
  – Own selection guidelines
• Type of content
  – Documents (e.g. PDF)
  – Whole and partial websites
Workflow
3. Gathering
• Mechanisms
  – HTTrack crawling (http://www.httrack.com)
  – FTP from publisher
  – Email from publisher
• Preservation copy
• Post-crawl processing
• Working area
Workflow
4. Processing
• Quality assurance
  – Manual check for viewing/linking errors
  – Completeness and functionality
  – New content (compare with previous instance)
  – No unexpected content
• Modifications
  – Write access to the working area
  – Add missing files, fix broken links, etc.
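The "compare with previous instance" check is described on the slide as a manual step. As a rough illustration only, the comparison could be assisted by hashing the files of two gathered instances and listing what was added, removed or changed. The function names `snapshot` and `compare_instances` and the directory layout are hypothetical; nothing here is taken from PANDAS itself.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_instances(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed or changed between two snapshots."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(name for name in shared if previous[name] != current[name]),
    }
```

A reviewer might call `compare_instances(snapshot(old_dir), snapshot(new_dir))` and then inspect only the listed files by hand; an empty "added" and "changed" list would suggest the new instance carries no new content.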
Workflow
5. Archiving
• Transfer master display copy from working area to Digital Object Storage System (DOSS)
• Transfer preservation copy to preservation area on the DOSS
• Create display copy on web server
  – Still not published!
Workflow
6. Publishing
• Title Entry Page (TEP)
  – Created from metadata
  – Additional links to notes, links to serial issues, copyright statement, etc.
  – Creation makes the archived copy publicly accessible
• Persistent Identifiers (PIs)
  – e.g. nla.arc-25849-20051113-www.bullyingnoway.com.au/default.html
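The example identifier above appears to combine a fixed prefix, a numeric title number, a gather date in YYYYMMDD form and the original location of the resource. That reading is an assumption drawn from the single example on the slide, not a documented specification. Under that assumption, a minimal sketch for splitting such an identifier into its parts could look like this (the names `ArchivePI` and `parse_pi` are hypothetical):

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class ArchivePI(NamedTuple):
    title_id: int        # assumed: number identifying the archived title
    gathered: date       # assumed: date the instance was gathered
    original_path: str   # assumed: host and path of the original resource


# Pattern inferred from the one example on the slide; not an official grammar.
_PI_PATTERN = re.compile(r"^nla\.arc-(\d+)-(\d{8})-(.+)$")


def parse_pi(pi: str) -> ArchivePI:
    """Split an identifier of the assumed form nla.arc-<id>-<yyyymmdd>-<path>."""
    match = _PI_PATTERN.match(pi)
    if match is None:
        raise ValueError(f"not in the assumed PI form: {pi!r}")
    title_id, stamp, path = match.groups()
    gathered = datetime.strptime(stamp, "%Y%m%d").date()
    return ArchivePI(int(title_id), gathered, path)
```

Applied to the slide's example, this would yield title number 25849, a gather date of 13 November 2005 and the path www.bullyingnoway.com.au/default.html.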
Workflow
7. Cataloguing
• Bibliographic details
  – NLA catalogue
  – National Bibliographic Database (NBD)
• Metadata imported into PANDORA TEPs
Workflow
• Permissions
  – No legal deposit
    • Explicit permission of the publisher is sought prior to archiving
  – Copyright, etc.
    • Publisher’s permission to make publicly available
      – Restrictions
Workflow
• Restrictions
  – Publisher restrictions on access
    • Period
      – e.g. accessible from restricted location/s for 5 years
      – Location is specified by IP address and subnet mask
    • Date
      – e.g. accessible from restricted location/s between 3/12/2005 and 31/1/2007
      – Location is specified by IP address and subnet mask
    • Authenticated group
      – e.g. accessible by username/password credentials
  – Can be enabled/disabled in PANDAS
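A date-based restriction of the kind listed above can be sketched with Python's standard `ipaddress` module. This is an illustration only, not how PANDAS implements it. It assumes that during the restricted window only clients inside the specified network may view the copy and that outside the window the copy is unrestricted. The `Restriction` class, the `is_access_allowed` function and the example network are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Restriction:
    network: str  # IP address and subnet mask, e.g. "192.0.2.0/255.255.255.0"
    start: date   # first day of the restricted window
    end: date     # last day of the restricted window


def is_access_allowed(restriction: Restriction, client_ip: str, today: date) -> bool:
    """Return True if the client may view the archived copy on the given day."""
    if not (restriction.start <= today <= restriction.end):
        # Assumption: outside the window no location restriction applies.
        return True
    return ip_address(client_ip) in ip_network(restriction.network)


# The slide's example window, 3/12/2005 to 31/1/2007 (day/month/year),
# paired with a documentation-range network chosen for illustration.
example = Restriction("192.0.2.0/255.255.255.0", date(2005, 12, 3), date(2007, 1, 31))
```

A period restriction such as "5 years" reduces to the same check once the end date is computed from the start date.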
NLA Tools
• PANDAS
  – http://pandora.nla.gov.au/pandas.html
  – Web archive management system.
• XINQ
  – http://www.nla.gov.au/xinq/
  – Making deep web database archives accessible by browse/search.
Other Tools
• PageVault
  – http://www.projectcomputing.com/products/pageVault/
  – Archives your website by keeping a copy of every accessed version of a page as it passes through your web server.
• HTTrack
  – http://www.httrack.com
  – Desktop/command-line tool for crawling websites.
• Heritrix
  – http://crawler.archive.org/
  – Tool from Internet Archive for crawling the web.
  – Designed for large-scale crawls, rather than individual websites.
PANDORA Resources
• Selection guidelines
  – http://pandora.nla.gov.au/selectionguidelinesallpartners.html
• Papers & presentations
  – http://pandora.nla.gov.au/papers.html
Other Resources
• PANDORA Archiving Issues FAQ
  – http://pandora.nla.gov.au/manual/pandas/faq.html
• NLA Digital Archiving Section - General Procedures (Procedures for handling Internet resources)
  – http://pandora.nla.gov.au/manual/general_procedures.html
• NLA Digital Archiving Section Manual - Check List for Scheduled Gatherings
  – http://pandora.nla.gov.au/manual/checklist.html
• NLA Digital Archiving Section Manual - Gathering Schedule Guidelines
  – http://pandora.nla.gov.au/manual/schedule_guidelines.html
Future Directions/Issues
• Deep web – database archiving
• Historical repository of tools for viewing archive content
• New & future ways of authoring & publishing to the web
  – XML publishing, blogs, DB driven, wikis…
  – What’s coming in 2, 5 or 10 years’ time?
Recommendations for starting out
• Do something small & do it now.
• Build on what you already have.
• Think about what you have done and revise/expand as necessary.
Summary
• The PANDORA story
• Tools and resources
• Futures/ideas