Accessing Historical and Colonial Census data through the Australian Social Science Data Archive Dr. Steve McEachern Deputy Director, ASSDA

Post on 23-Feb-2016

Page 1: Presentation Overview

Accessing Historical and Colonial Census data through the Australian Social Science Data Archive

Dr. Steve McEachern
Deputy Director, ASSDA

Presentation Overview

1. About ASSDA/ADA
a) ADA in brief
b) The ADA website
c) ADA or ASSDA?

2. 1966-1991 data
a) What do we hold?
b) How can I access it?

3. Historical and Colonial Census Data
a) Introduction to HCCDA
b) Searching
c) Browsing

4. Future directions for census data at ADA

1. About ASSDA/ADA

ASSDA/ADA in Brief

ASSDA was set up in 1981, housed in the RSSS at the ANU, to collect and preserve Australian social science data on behalf of the social science research community.

It now includes nodes at the University of Melbourne, the University of Queensland, the University of WA and the University of Technology Sydney, with infrastructure provided by the ANU Supercomputer Facility.

The Archive holds some 2400 data sets. The most notable holdings are national election studies, public opinion polls, social attitudes surveys, and CENSUS materials.

Data holdings are sourced from the academic, government and private sectors.

ASSDA Data Holdings

ASSDA data holdings cover a wide variety of subject areas, currently housed under the following major headings:

• Ageing Well
• Census
• Demography
• Economics
• Education
• Employment, Labour
• Environment, Conservation, Land use
• Health
• Housing
• Industry, Management
• Law, Crime, Courts
• Mass media, Communication, Language
• Politics
• Poll
• Psychology
• Science, Technology
• Social classes, Social order
• Social welfare
• Sociology, Culture
• Travel, Transport
• Non-Australian studies

ASSDA Front page

ASSDA or ADA?

• From July 2011, ASSDA will change its name to ADA, the Australian Data Archive, with a new website:

• http://www.ada.edu.au

ADA Historical

Current access

• For now, access to ASSDA’s census data holdings is through our existing portal:

http://www.assda.edu.au/census.html

ASSDA Census Portal

2. Census 1966 - 1991

Available data

Year | Documentation | Tables | Master files | Matrix files
1966 | Online | Online through Nesstar, summary files on request | CDMF online (test only, with restricted access), others by request | By request
1971 | Online | Summary files on request | CDMF online (test only, with restricted access), others by request | By request
1976 | Online | Summary files on request | By request | By request

Available data

Year | Documentation | Tables | Master files | Matrix files
1981 | Online | Summary files on request | By request | By request
1986 | Online | Summary files on request | By request | By request
1991 | Online | Online through Nesstar, summary files on request | By request | By request

Census documentation

ASSDA Nesstar Census Portal

What can you do in Nesstar?

• View tables
• Create new tabulations (from the existing set of table dimensions)
• Generate charts
• Export CSV and PDF files, and HTML or XML documentation
• Bookmark your tables for future reference

Study description

3. The Historical Census and Colonial Data Archive

Introduction to HCCDA

• The Historical Census and Colonial Data Archive (HCCDA) is a searchable archive of Australian Colonial census publications and reports.

• It will become a sub-archive of the new Australian Data Archive (ADA).

• Note that the archive contains colonial census reports and tables and not the raw census data.

Source materials

• Large corpus of potential source material – created by ABS as part of the 1988 Bicentennial program

• Paper copies not available or too fragile
• Fiche becoming harder to access
• Fiche quality an issue (3rd generation?)

Into the digital realm: images

• First the easy part: scan fiche to digital images

• Actually protracted, difficult and stressful
• Scanning vendors approach highly automated
• Saved by manual Q/A by ASSDA staff
• Complicated by file/page numbering
– (more on this later)

Into the digital realm: content

• Now the hard part: content conversion
• Plain text conversion not good enough
• Documents are semantically rich
• Documents have rich structure
• Tables are valuable in their own right
• OCR conversion not good enough
• Human data-entry is very good

Into the digital realm: XML

• XML can capture semantics and structure
• XML-based workflows in the future
• But which XML? TEI, DocBook, custom?
• Chose DocBook V5.0
• Exit strategy: convert to another schema
• Created 160+ page archive markup guide
• XML created quickly and superbly by InfoCube
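
To illustrate the "exit strategy" idea, here is a minimal sketch of reading a DocBook V5.0 table and converting it to CSV with Python's standard library. The sample markup, the district names and the function names are assumptions made for illustration. The actual HCCDA markup follows the archive's own markup guide, which is not reproduced here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# DocBook V5.0 elements live in this XML namespace.
DOCBOOK_NS = {"db": "http://docbook.org/ns/docbook"}

# Hypothetical sample: a tiny CALS-style DocBook table, not actual HCCDA markup.
SAMPLE = """<table xmlns="http://docbook.org/ns/docbook">
  <title>Population by district</title>
  <tgroup cols="2">
    <thead>
      <row><entry>District</entry><entry>Persons</entry></row>
    </thead>
    <tbody>
      <row><entry>District A</entry><entry>120</entry></row>
      <row><entry>District B</entry><entry>85</entry></row>
    </tbody>
  </tgroup>
</table>"""


def table_rows(xml_text):
    """Return header and body rows, in document order, as lists of cell strings."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.iterfind(".//db:row", DOCBOOK_NS):
        cells = ["".join(entry.itertext()).strip()
                 for entry in row.iterfind("db:entry", DOCBOOK_NS)]
        rows.append(cells)
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text, one possible target format."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = table_rows(SAMPLE)
print(to_csv(rows))
```

Because the structure is carried by the markup rather than by page layout, the same traversal could feed a different target schema instead of CSV.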

Result: what’s available?

• 3 versions of the image from fiche:
– Large, medium and small, for rendering in various situations
• Full text in XHTML format

Summary of contents

Colony | Years | Documents | Pages | Tables
New South Wales | 1833-1901 | 11 | 3987 | 3897
Victoria | 1854-1901 | 10 | 6357 | 6575
Queensland | 1861-1901 | 14 | 3122 | 4033
South Australia | 1844-1901 | 12 | 2601 | 2516
Western Australia | 1848-1901 | 10 | 1348 | 1071
Tasmania | 1842-1901 | 9 | 1238 | 1181
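
As a quick worked check on the overall size of the archive, the per-colony figures in the summary can be added up. The snippet below is only a sketch: the numbers are copied from the summary of contents, while the variable and function names are illustrative.

```python
# Figures copied from the summary of contents:
# colony -> (documents, pages, tables)
SUMMARY = {
    "New South Wales": (11, 3987, 3897),
    "Victoria": (10, 6357, 6575),
    "Queensland": (14, 3122, 4033),
    "South Australia": (12, 2601, 2516),
    "Western Australia": (10, 1348, 1071),
    "Tasmania": (9, 1238, 1181),
}


def totals(summary):
    """Sum documents, pages and tables across all colonies."""
    documents = sum(d for d, _, _ in summary.values())
    pages = sum(p for _, p, _ in summary.values())
    tables = sum(t for _, _, t in summary.values())
    return documents, pages, tables


documents, pages, tables = totals(SUMMARY)
print(f"{documents} documents, {pages} pages, {tables} tables")
# prints "66 documents, 18653 pages, 19273 tables"
```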

Browsing HCCDA

• Browsing is done on a “By document” basis:
– Page by page
– Table by table

• Search is done on the XHTML markup
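
One way to picture a search over XHTML markup is the sketch below. It strips the tags from each page and does a case-insensitive substring match. The page identifiers, the sample pages and the function names are hypothetical, and the real HCCDA search engine may work quite differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages keyed by an invented page id; not real HCCDA content.
PAGES = {
    "doc1-p1": """<html xmlns="http://www.w3.org/1999/xhtml"><body>
        <p>A <em>dwelling</em> is defined as any inhabited building.</p>
    </body></html>""",
    "doc1-p2": """<html xmlns="http://www.w3.org/1999/xhtml"><body>
        <table><tr><td>Males</td><td>1200</td></tr></table>
    </body></html>""",
}


def page_text(xhtml):
    """Return the page's text with tags removed and whitespace collapsed.

    Text nodes are joined with spaces so adjacent table cells stay separate;
    a word split by inline markup would be broken apart by this.
    """
    root = ET.fromstring(xhtml)
    return " ".join(" ".join(root.itertext()).split())


def search(pages, term):
    """Return ids of pages whose text contains the term, ignoring case."""
    needle = term.lower()
    return [page_id for page_id, xhtml in pages.items()
            if needle in page_text(xhtml).lower()]


print(search(PAGES, "Dwelling"))
print(search(PAGES, "males"))
```

Matching on the extracted text rather than on the image scans is what lets a query find both a definition in running prose and a label inside a table.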

Browse

Page browse

Image browse

Search (and results)

Why full-text? Definitions!!

And table lookups!!

4. Future directions

What is still to come?

• Migration to the new ADA website
• Full export of HCCDA results (CSV, XML, other suggestions?)
• Addition of census tables into ADA’s Nesstar online analysis system
• Additional census years (1996 onwards)

• Bridging the gap: 1911-1961 …. ???

Questions or comments?

For further information
Web: http://www.assda.edu.au
From July: http://www.ada.edu.au
Email: [email protected]
Phone x52200
