Archives for communities of interest, the Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)

Nick Thieberger, Department of Linguistics & Applied Linguistics, The University of Melbourne

PNC Conference, November 2005

TRANSCRIPT

Page 1: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Archives for communities of interest, the Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)

Nick Thieberger
Department of Linguistics & Applied Linguistics
The University of Melbourne

PNC Conference, November 2005

Page 2: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Collaborative digital research resource set up by University of Sydney, University of Melbourne & Australian National University, 2003. (UNE joined 2004)

75% funding from Australian Research Council LIEF Scheme (3 successful applications)

Page 3: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Communities of interest

A group of linguists and musicologists recognised that large collections of recorded material were not being properly archived. The other parts of the community are speakers and their descendants. Shared needs in the current group, and need for training of new researchers.

At least 3000 hours of analog field tapes
New technologies have a steep learning curve
Need for specialised assistance
Applied for research funds to establish an archive

Page 4: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Communities of interest

Collaboration across universities and disciplines
Support from computing specialists (data grid, mass data store, programming) and government agencies (E-research, Australian Partnership for Sustainable Repositories, GrangeNet)
International links: similar initiatives (OLAC/DELAMAN)
Regional cultural centres and museums (targets for repatriation of digital recordings)
International standards: metadata (OLAC/OAI)

All of this requires coordination or project management

Page 5: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

To preserve and make accessible Australian researchers’ field recordings of endangered languages and musics from the Asia-Pacific, together with other digital material related to cultures of the region (theses, wordlists, texts, etc.)

Preservation: to adopt world’s best practice standards and formats to maximise sustainability and future usability of the collection

Access: to take advantage of emerging information and communication technologies to maximise access to our collection by both researchers and cultural heritage communities

Page 6: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Over 2000 of the world’s 6000 languages are in the Asia-Pacific region
Number likely to fall to a few hundred by 2100 (UNESCO)
Australian researchers active in the region since the 1950s, making unique recordings of unrepeatable events
Recordings now themselves endangered (format obsolescence, media deterioration, loss of metadata)

Page 7: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

2500 records in PARADISEC catalogue with data on 390 languages from 50 countries including: American Samoa, Australia, Bangladesh, Botswana, Cambodia, Chile, China, Cook Islands, Fiji, French Polynesia, Greenland, Hong Kong, Iceland, India, Indonesia, Israel, Italy, Japan, Kiribati, Republic Of Korea, Lao People’s Democratic Republic, Madagascar, Malaysia, Malta, Marshall Islands, Mexico, Federated States Of Micronesia, Myanmar, Nauru, Nepal, New Caledonia, New Zealand, Nigeria, Niue, Palau, Papua New Guinea, Philippines, Reunion, Samoa, Singapore, Solomon Islands, South Africa, Taiwan, Province of China, Thailand, Tonga, Uganda, United States of America, Vanuatu, Viet Nam, Wallis And Futuna (data as of September 2005)

Page 8: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Locating data in the collection

Metadata complying with international standards

Open Language Archives Community (OLAC)

Geographic data entered via a map interface for later geographic querying

Open Archives Initiative (OAI)
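
As an illustration of what standards-compliant metadata might look like, the following is a minimal Python sketch that builds an OLAC-style record on top of Dublin Core. The title, the language code "xxx" and the contributor name are placeholders invented for the example, and the function `build_record` is hypothetical; the slides do not show PARADISEC's actual records or export code.

```python
import xml.etree.ElementTree as ET

# Namespaces for Dublin Core, OLAC 1.1 and XML Schema instance attributes.
DC = "http://purl.org/dc/elements/1.1/"
OLAC = "http://www.language-archives.org/OLAC/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

for prefix, uri in (("dc", DC), ("olac", OLAC), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def build_record(title, language_code, contributors):
    """Build an OLAC-style record (hypothetical helper, not PARADISEC code)."""
    root = ET.Element(f"{{{OLAC}}}olac")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    # The subject language is identified by a code from a controlled list.
    ET.SubElement(
        root,
        f"{{{DC}}}subject",
        {f"{{{XSI}}}type": "olac:language", f"{{{OLAC}}}code": language_code},
    )
    # Each contributor carries a role drawn from a role vocabulary.
    for name, role in contributors:
        element = ET.SubElement(
            root,
            f"{{{DC}}}contributor",
            {f"{{{XSI}}}type": "olac:role", f"{{{OLAC}}}code": role},
        )
        element.text = name
    return root


# Placeholder values only; "xxx" is not a real language code.
record = build_record("Example field recording", "xxx", [("A. Researcher", "recorder")])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A record in this shape can be exposed to harvesters over the OAI protocol, which is what allows a collection to be found through aggregated OLAC searches.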

Page 9: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Metadata Catalogue

SQL/PHP password access

Controlled vocabularies (language name, contributor role, data type, coverage, etc.)

Link to repository data stored at the Australian Partnership for Advanced Computing (APAC) in Canberra
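
A controlled vocabulary can be sketched as a simple check at data entry. The catalogue itself is described as SQL/PHP; the Python below is only an illustration, and the field names and term lists are assumptions made for the example, not the catalogue's real vocabularies.

```python
# Hypothetical vocabularies; the real catalogue's term lists are not given in the slides.
VOCABULARIES = {
    "contributor_role": {"recorder", "speaker", "researcher", "depositor"},
    "data_type": {"primary_text", "lexicon", "language_description"},
}


def validate(record):
    """Return (field, value) pairs whose value is not in the controlled list."""
    problems = []
    for field, allowed in VOCABULARIES.items():
        value = record.get(field)
        if value not in allowed:
            problems.append((field, value))
    return problems


good = {"contributor_role": "recorder", "data_type": "primary_text"}
bad = {"contributor_role": "tape operator", "data_type": "lexicon"}

print(validate(good))
print(validate(bad))
```

Rejecting free-text values at entry keeps later searches consistent, since every record describes a role or data type with the same term.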

Page 10: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Typical data

Stephen Wurm’s several hundred tapes, including 120 Solomon Islands tapes from the 1970s and transcripts/fieldnotes

Arthur Capell’s 114 tapes, Pacific and PNG, 1950s (and 30 archive boxes of fieldnotes)

Bert Voorhoeve’s 180 tapes, West Papua

Tom Dutton’s 295 PNG tapes

Page 11: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Imaging fieldnotes

To date over 10,000 pages of fieldnotes have been photographed using AUSTEHC's system

Crucial that links between fieldnotes and field recordings be maintained

Aim to allow trusted users to build links between dynamic media and fieldnotes

Page 12: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Wurm collection, Solomon Islands, 1979. Digitised cassette tape with page image of transcript, and Wurm’s language map

Page 13: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Archival data

Linking transcripts to media
Citation of primary media
Searchable time-aligned media corpus

Page 14: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Audiamus

Building a citable corpus of media via linked transcripts

Persistent naming implied by citability
Creation of good archival forms of media and then transcripts associated with them by stand-off markup

Need for a tool that facilitates working with this corpus

Cross-platform tool Audiamus created for interacting with field recordings via their transcripts
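
The idea of stand-off markup and citability can be sketched in a few lines of Python. This is not Audiamus's implementation, which the slides do not describe; the `Segment` class, the `cite` and `search` helpers, and the media identifier "EXAMPLE-001-A.wav" are all hypothetical. Each transcript segment stays outside the media file and points into it by persistent name plus start and end times, so a search hit can be cited back to the primary recording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One stand-off annotation: text plus a pointer into an archival media file."""
    media_id: str   # persistent name of the media file (hypothetical here)
    start: float    # offset in seconds
    end: float      # offset in seconds
    text: str


def cite(segment):
    """Format a citation to the primary media for one segment."""
    return f"{segment.media_id}, {segment.start:.2f}-{segment.end:.2f}s"


def search(corpus, term):
    """Return every segment whose transcript text contains the term."""
    term = term.lower()
    return [s for s in corpus if term in s.text.lower()]


corpus = [
    Segment("EXAMPLE-001-A.wav", 0.0, 3.2, "first utterance of the story"),
    Segment("EXAMPLE-001-A.wav", 3.2, 7.85, "second utterance"),
]

hits = search(corpus, "second")
citations = [cite(s) for s in hits]
print(citations)
```

Because the citation depends only on the media name and time offsets, it stays valid as long as the archival file keeps its name, which is why persistent naming is implied by citability.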

Page 15: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Training, resources and advocacy

Use of new technological approaches requires training, resources and advocacy
Training in use of new tools
Resources such as software, archiving, advice on tools and methods

Advocacy of the benefits of these new approaches and tools and the reasons for engaging with them

Page 16: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Training, resources and advocacy

Great need for training expressed by postgraduate students in particular

Training is critical as tools are constantly emerging (recording techniques and equipment, software tools)

Page 17: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Training, resources and advocacy

We have run training workshops in the use of appropriate linguistic tools for archival output (Toolbox, Transcriber etc.)

University campuses in Melbourne, Sydney, Brisbane, University of Hawai’i

In community language centres in Melbourne, Kalgoorlie, Nambucca Heads and Sydney

Batchelor Institute

Page 18: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Training, resources and advocacy

Methods for development of:
Time-aligned transcripts (in XML)
Interlinearised text
Dictionary production

Crucial separation of content and form to allow well-formed archival data
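
A small Python sketch can show what separating content from form buys in practice. The XML layout below (a `transcript` element holding `segment` elements with `start` and `end` attributes) is an assumption made for this example, not a schema named in the slides. The content is stored once, and each presentation is derived from it on demand.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical time-aligned transcript; element and attribute names are illustrative.
TRANSCRIPT_XML = """
<transcript media="EXAMPLE-001-A.wav">
  <segment start="0.00" end="3.20">first utterance</segment>
  <segment start="3.20" end="7.85">second utterance &amp; more</segment>
</transcript>
"""


def load_segments(xml_text):
    """Read the archival content: (start, end, text) for each segment."""
    root = ET.fromstring(xml_text.strip())
    return [
        (float(s.get("start")), float(s.get("end")), s.text or "")
        for s in root.iter("segment")
    ]


def as_plain_text(segments):
    """One presentation form: running text with no timing."""
    return "\n".join(text for _, _, text in segments)


def as_html(segments):
    """Another presentation form: paragraphs that carry their time codes."""
    return "\n".join(
        f'<p data-start="{start:.2f}" data-end="{end:.2f}">{escape(text)}</p>'
        for start, end, text in segments
    )


segments = load_segments(TRANSCRIPT_XML)
print(as_plain_text(segments))
print(as_html(segments))
```

Neither output format needs to be archived, because both can be regenerated from the XML whenever display conventions change.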

Page 19: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Training, resources and advocacy

Training in creation of archival sources by fieldworkers
Naming conventions and persistent identification of data
Metadata sets and tools
Data formats: WAV, Text/XML, etc.
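
A naming convention is easiest to teach when it can be checked mechanically. The pattern below (collection, item and part separated by hyphens, followed by an archival file extension) is a hypothetical convention chosen for illustration; the slides do not spell out the archive's actual rules.

```python
import re

# Hypothetical convention: COLLECTION-ITEM-PART.ext, e.g. ABC1-001-A.wav
IDENTIFIER = re.compile(
    r"^(?P<collection>[A-Z]+[0-9]*)-"
    r"(?P<item>[A-Za-z0-9]+)-"
    r"(?P<part>[A-Za-z0-9]+)\."
    r"(?P<ext>wav|xml|txt)$"
)


def parse_identifier(filename):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = IDENTIFIER.match(filename)
    return match.groupdict() if match else None


print(parse_identifier("ABC1-001-A.wav"))
print(parse_identifier("tape 3 final.wav"))
```

A fieldworker could run a check like this before deposit, so that every file arrives with a name that can serve as its persistent identifier.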

Page 20: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Global research community

LACITO (Paris)
ANLC (Alaska)
EMELD (Michigan)
AILLA (Texas)
PARADISEC
AMPM (Auckland)
AIATSIS (Canberra)
ELAR (London)
DOBES (Netherlands)

DELAMAN archives

Digital Endangered Languages and Musics Archives Network

Page 21: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

We are cited as an exemplar using Digital Mass Storage Systems in the International Association of Sound and Audiovisual Archives (IASA) Guidelines on the Production and Preservation of Digital Audio Objects (IASA-TC04). Aarhus, Denmark: International Association of Sound and Audiovisual Archives (IASA), 2004, p. 51.

"The Sub Committee on Technology of the Memory of the World Programme of UNESCO recommends these guidelines as best practice for Audio-Visual Archives."

Page 22: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Current size of collection

Total file counts by file type:
".jpg" : 46 files
".mp3" : 2001 files
".pdf" : 34 files
".rtf" : 8 files
".tif" : 171 files
".txt" : 3 files
".wav" : 2000 files
".xml" : 31 files

Total file sizes by file type:
".jpg" : 9.71 MB
".mp3" : 53.70 GB
".pdf" : 5.70 MB
".rtf" : 1.04 MB
".tif" : 848.57 MB
".txt" : 2.15 MB
".wav" : 1.61 TB
".xml" : 1.20 MB

As at October 7th 2005: 4294 files in the collection totaling 1.66 TB
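
The per-type figures can be checked against the stated totals with a short Python calculation. The numbers are copied from the slide; the only assumption is that the units are binary (1 GB = 1024 MB, 1 TB = 1024 GB), which the slide does not specify.

```python
# Figures as reported on the slide (October 7th 2005).
FILE_COUNTS = {
    ".jpg": 46, ".mp3": 2001, ".pdf": 34, ".rtf": 8,
    ".tif": 171, ".txt": 3, ".wav": 2000, ".xml": 31,
}
FILE_SIZES = {
    ".jpg": (9.71, "MB"), ".mp3": (53.70, "GB"), ".pdf": (5.70, "MB"),
    ".rtf": (1.04, "MB"), ".tif": (848.57, "MB"), ".txt": (2.15, "MB"),
    ".wav": (1.61, "TB"), ".xml": (1.20, "MB"),
}

# Assumption: binary units. The slide does not say which convention was used.
UNIT_IN_MB = {"MB": 1, "GB": 1024, "TB": 1024 * 1024}

total_files = sum(FILE_COUNTS.values())
total_mb = sum(value * UNIT_IN_MB[unit] for value, unit in FILE_SIZES.values())
total_tb = total_mb / UNIT_IN_MB["TB"]

print(total_files)
print(round(total_tb, 2))
```

Both results agree with the slide: the counts sum to 4294 files, and the sizes sum to about 1.66 TB, almost all of it in the archival WAV files.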

Page 23: Nick Thieberger Department of Linguistics & Applied Linguistics The University of Melbourne

Further information

http://paradisec.org.au