Accessing Historical and Colonial Census data through the Australian Social Science Data Archive
Dr. Steve McEachern
Deputy Director, ASSDA
Presentation Overview
1. About ASSDA/ADA
a) ADA in brief
b) The ADA website
c) ADA or ASSDA?
2. 1966-1991 data
a) What do we hold?
b) How can I access it?
3. Historical and Colonial Census Data
a) Introduction to HCCDA
b) Searching
c) Browsing
4. Future directions for census data at ADA
1. About ASSDA/ADA
ASSDA/ADA in Brief
ASSDA was set up in 1981, housed in the RSSS, ANU, to collect and preserve Australian social science data on behalf of the social science research community
Now includes nodes at Uni of Melbourne, Uni of Queensland, Uni of WA, University of Technology Sydney, with infrastructure provided by the ANU Supercomputer Facility
The Archive holds some 2400 data sets; the most notable holdings are national election studies, public opinion polls, social attitudes surveys – and CENSUS materials
Data holdings are sourced from academic, government and private sectors.
ASSDA Data Holdings
ASSDA data holdings cover a wide variety of subject areas, currently housed under the following major headings:
• Ageing Well
• Census
• Demography
• Economics
• Education
• Employment, Labour
• Environment, Conservation, Land use
• Health
• Housing
• Industry, Management
• Law, Crime, Courts
• Mass media, Communication, Language
• Politics
• Poll
• Psychology
• Science, Technology
• Social classes, Social order
• Social welfare
• Sociology, Culture
• Travel, Transport
• Non-Australian studies
ASSDA Front page
ASSDA or ADA?
• From July 2011, ASSDA will be changing names to ADA - The Australian Data Archive, with a new website:
• http://www.ada.edu.au
ADA Historical
Current access
• For now, access to ASSDA’s census data holdings is through our existing portal:
http://www.assda.edu.au/census.html
ASSDA Census Portal
2. Census 1966 - 1991
Available data
Year | Documentation | Tables | Master files | Matrix files
1966 | Online | Online through Nesstar, summary files on request | CDMF online (test only, with restricted access), others by request | By request
1971 | Online | Summary files on request | CDMF online (test only, with restricted access), others by request | By request
1976 | Online | Summary files on request | By request | By request
1981 | Online | Summary files on request | By request | By request
1986 | Online | Summary files on request | By request | By request
1991 | Online | Online through Nesstar, summary files on request | By request | By request
Census documentation
ASSDA Nesstar Census Portal
What can you do in Nesstar?
• View tables
• Create new tabulations (from the existing set of table dimensions)
• Generate charts
• Export CSV and PDF files, and HTML or XML documentation
• Bookmark your tables for future reference
Study description
3. The Historical Census and Colonial Data Archive
Introduction to HCCDA
• The Historical Census and Colonial Data Archive (HCCDA) is a searchable archive of Australian Colonial census publications and reports.
• Will become a sub-archive of the new Australian Data Archive (ADA).
• Note that the archive contains colonial census reports and tables and not the raw census data.
Source materials
• Large corpus of potential source material – created by ABS as part of the 1988 Bicentennial program
• Paper copies not available or too fragile
• Fiche becoming harder to access
• Fiche quality an issue (3rd generation?)
Into the digital realm: images
• First the easy part: scan fiche to digital images
• Actually protracted, difficult and stressful
• Scanning vendor's approach was highly automated
• Saved by manual Q/A by ASSDA staff
• Complicated by file/page numbering
Into the digital realm: content
• Now the hard part: content conversion
• Plain text conversion not good enough
• Documents are semantically rich
• Documents have rich structure
• Tables are valuable in their own right
• OCR conversion not good enough
• Human data-entry is very good
Into the digital realm: XML
• XML can capture semantics and structure
• XML based workflows in the future
• But which XML? TEI, DocBook, custom?
• Chose DocBook V5.0
• Exit strategy: convert to another schema
• Created 160+ page archive markup guide
• XML created quickly and superbly by InfoCube
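To illustrate why structured XML makes the "exit strategy" practical, here is a minimal Python sketch that reads a DocBook 5.0 table and writes it out as CSV. The sample table is hypothetical and uses the standard CALS-style DocBook table elements; the real HCCDA markup follows the archive's own markup guide and may be structured differently. The function names are also illustrative only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# DocBook 5.0 elements live in this XML namespace.
DB_NS = {"db": "http://docbook.org/ns/docbook"}

# Hypothetical sample table (assumption): not taken from the HCCDA files.
SAMPLE = """<table xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Summary of contents (sample)</title>
  <tgroup cols="3">
    <thead>
      <row><entry>Colony</entry><entry>Years</entry><entry>Documents</entry></row>
    </thead>
    <tbody>
      <row><entry>Victoria</entry><entry>1854-1901</entry><entry>10</entry></row>
      <row><entry>Tasmania</entry><entry>1842-1901</entry><entry>9</entry></row>
    </tbody>
  </tgroup>
</table>"""


def table_rows(xml_text):
    """Return every row (header first, then body) as a list of cell strings."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.findall(".//db:row", DB_NS):
        # itertext() also collects text nested inside inline markup.
        cells = ["".join(entry.itertext()).strip()
                 for entry in row.findall("db:entry", DB_NS)]
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialise rows to CSV text, using plain newlines as line endings."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(table_rows(SAMPLE))
print(csv_text)
```

Because the table structure is explicit in the markup, the same walk over rows and entries could feed another schema instead of CSV.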
Result: what’s available?
• 3 versions of the image from fiche:
– Large, medium and small, for rendering in various situations
• Full text in XHTML format
Summary of contents
Colony | Years | Documents | Pages | Tables
New South Wales | 1833-1901 | 11 | 3987 | 3897
Victoria | 1854-1901 | 10 | 6357 | 6575
Queensland | 1861-1901 | 14 | 3122 | 4033
South Australia | 1844-1901 | 12 | 2601 | 2516
Western Australia | 1848-1901 | 10 | 1348 | 1071
Tasmania | 1842-1901 | 9 | 1238 | 1181
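As a quick arithmetic check on the overall scale of the archive, the figures in the summary of contents can be totalled. The Python sketch below simply re-enters the numbers from the summary; the variable names are illustrative.

```python
# (colony, years, documents, pages, tables) as listed in the summary of contents
SUMMARY = [
    ("New South Wales", "1833-1901", 11, 3987, 3897),
    ("Victoria", "1854-1901", 10, 6357, 6575),
    ("Queensland", "1861-1901", 14, 3122, 4033),
    ("South Australia", "1844-1901", 12, 2601, 2516),
    ("Western Australia", "1848-1901", 10, 1348, 1071),
    ("Tasmania", "1842-1901", 9, 1238, 1181),
]

# Sum each numeric column across the six colonies.
totals = {
    "documents": sum(row[2] for row in SUMMARY),
    "pages": sum(row[3] for row in SUMMARY),
    "tables": sum(row[4] for row in SUMMARY),
}

print(totals)  # {'documents': 66, 'pages': 18653, 'tables': 19273}
```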
Browsing HCCDA
• Browsing is done on a “By document” basis:
– Page by page
– Table by table
• Search is done on the XHTML markup
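To give a feel for what searching the XHTML markup can look like, here is a minimal Python sketch of a case-insensitive full-text search over the paragraphs of one XHTML page. It is not the HCCDA search implementation, which is not described here; the sample page, its wording and the function name are all invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# XHTML elements live in this XML namespace.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page (assumption): the text is invented, not quoted from a census report.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample page</title></head>
  <body>
    <h1>Definitions</h1>
    <p>A <em>dwelling</em> is any building used for habitation.</p>
    <p>Tents are counted separately.</p>
  </body>
</html>"""


def search_paragraphs(xhtml, term):
    """Return the text of each paragraph containing term (case-insensitive)."""
    root = ET.fromstring(xhtml)
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for paragraph in root.iterfind(".//x:p", XHTML_NS):
        # Join text across inline markup and collapse runs of whitespace.
        text = " ".join("".join(paragraph.itertext()).split())
        if pattern.search(text):
            hits.append(text)
    return hits


results = search_paragraphs(PAGE, "DWELLING")
print(results)
```

Searching the marked-up text rather than the page images is what makes lookups of definitions and table contents possible.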
Browse
Page browse
Image browse
Search (and results)
Why full-text? Definitions!!
And table lookups!!
4. Future directions
What is still to come?
• Migration to the new ADA website
• Full export of HCCDA results (CSV, XML, other suggestions?)
• Addition of census tables into ADA’s Nesstar online analysis system
• Additional census years (1996 onwards)
• Bridging the gap: 1911-1961 …. ???
Questions or comments?
For further information
Web: http://www.assda.edu.au
From July: http://www.ada.edu.au
Email: [email protected]
Phone x52200