Bente Maegaard Theme 1: DIGHUMLAB launch 10 September 2012:1300

Language Materials/Tools and CLARIN

Bente Maegaard
Centre for Language Technology
University of Copenhagen

DIGHUMLAB 1 – Language Materials and Tools

Objective of DIGHUMLAB:
• Provide humanities researchers with access to digital data and tools for research and education

The theme Language Materials and Tools covers all types of collections of material expressed in language, whether written or spoken, including multimodal material such as videos.

Collections of material are of course interesting, but the real value comes from the services offered together with them: tools for analysing, visualising, storing, retrieving, modifying, comparing and annotating the material, etc.

Many humanities researchers may ask: do we need this, and if so, why?

Slide 2

Centre for Language Technology

Aarhus September 2012

Why is this interesting?

The answer is that many sciences have benefitted immensely from introducing IT into their research.
• Makes it easier to do the same things as before
• More importantly: adds new dimensions and new perspectives – and maybe new research questions
• Opens the possibility of sharing
• But of course not all research will benefit

In what way is this new?

It is true that many researchers have been using digital resources and tools for years.

What makes this special is that everything comes under the same roof, in two ways:
• National roof: DIGHUMLAB
• International roof: CLARIN

Another important feature is that this research infrastructure is meant to be persistent – if not eternal.

Data stored here will remain available after your research project is over, and others may add more data.

In this way, collaboration is both made possible and supported.

DIGHUMLAB and CLARIN

This part of DIGHUMLAB relates to the European research infrastructure CLARIN ERIC, to be presented later.

The activities carried out in this part of DIGHUMLAB therefore constitute the Danish contribution to CLARIN, and some of them are coordinated by CLARIN.

Our work plan in short:
• Collect digital material and make it available
• Collect and create tools and make them available
• Improve and extend the existing technical infrastructure
• Disseminate knowledge about language resources, tools, etc.: knowledge sharing at the national as well as the international level
• Provide a CLARIN technical centre

Work plan – digital resources and tools

First step – a survey of existing resources and needs:
• Information meetings and follow-up meetings with researchers at the Danish universities
• Identify existing resources which could be integrated
• Determine the need for updating, conversion, etc. to CLARIN formats
• Clarify existing licences, copyrights, etc.
• Identify needs: what does a teacher need in order to use language materials and tools in the classroom, for exercises, etc.? This is very important for take-up, so that the next generation is better prepared.
• We have held meetings at the University of Copenhagen and will visit all universities with humanities faculties in Denmark

The same applies to tools, but we already have a long list of requests for tools and services.

Examples of data and services

Data:
• Text, old and modern
• Literature, language for special purposes
• Parallel texts for translation studies
• Videos (audio and gestures)
• Newspapers, news in other media
• Parliamentary debates
• Tombstones

Services:
• Add annotation (e.g. morphology, lemma, analysis of gestures)
• Search for all occurrences of the same gesture
• Find the most common pattern of xx
• Find all names in historical texts
• Find all the different pronunciations of the letter ’a’ in Danish and their frequency
• Find positive or negative expressions relating to the banking sector in FR and DK newspapers between 1980 and 2000

Extend current technical infrastructure, CLARIN technical centre

Part of the Danish CLARIN contribution is to provide a technical centre, authorisation and authentication mechanisms, a trust federation with the other European CLARIN centres, etc.

This will be part of the central DIGHUMLAB operations. Work is ongoing.

In the national DK-CLARIN project, a first technical infrastructure, clarin.dk, was built; it already has many of the required features and contains many resources.

Its user-friendliness is to be improved, and its tools and services extended and developed further.

Knowledge sharing

National and international activities

• PhD courses
• Courses at undergraduate level
• Workshops on special issues

Centres of expertise – an important instrument:
• Identify existing centres of expertise in Denmark that are prepared to act as knowledge centres. Ongoing.

Summing up: Language materials and CLARIN

User-driven

Focus on:
• Long-term storage
• Tools and services
• User-friendliness
• Being present at all universities and at other institutions (cultural institutions such as libraries (SB, KB), the National Museum, the Danish Language Council, the Society for Danish Language and Literature, etc.)
