The 2012 conference - CLARIN

Post on 09-Dec-2014

CLARIN-NL

Reaching out to the users

Arjan van Hessen

Language Resources and Technology Infrastructure for the Humanities and the Social Sciences in the Netherlands

State of the Technology

Language and Speech Technology is (nearly) mature

Many applications are available

Most of it is usable (although not perfect) but…..

Unused Technology & Resources

Many scholars are not aware of the HLT & Resources

A priori technical knowledge is still necessary

Use depends too much on “friends” in the field

Lack of standardization is killing

It is less used than expected

Research Life cycle

Cultural Heritage Institution(s)

New Idea

Research

Building

Tuning

Publications

?

Unused Technology & Resources

CAR

HLT & CHI paths

Language processing

Machine learning

Humanities

CATCH

Cultural Heritage Institutions

After the project

Lack of standardization

Bad interfaces

CLARIN-EU (2007-2012)

CLARIN-NL (2009-2015)

CLARIN-ERIC (2012-xxxx)

CLARIAH (2015-…)

Infrastructure program for the Humanities

Issues to address

1. Finding the users

2. Identification of their needs/problems

3. Do our solutions correspond to their problems?

4. Usability of tools: can they use them?

5. Visualisation

6. Tutorials and web material (movies, courses)

7. Sustainability of tools and resources

1. FINDING THE USERS

How to identify and convince potential users

Humanities enter a New Era

Huge amounts of digital data are becoming available

The traditional model, Spitzweg’s “lonely scholar”, no longer suffices

Big data, supported by automated methods

Hardware allows this, and many tools are available or under development

User Surveys

Go out to ask potential users

User survey in the Netherlands (2010)

2. IDENTIFICATION OF THEIR NEEDS/PROBLEMS

What do they need?

User attraction cycle

Finding new users

Convincing these users to participate

Train these users in the use of all those wonderful tools

Support the users

Listening to the users

3. DO OUR SOLUTIONS CORRESPOND TO THEIR PROBLEMS?

What to avoid in order NOT to scare off (potential) users

The CLARIN dream

Give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)

Give me all negative articles about Catholics in the Fryske Courant (1868-1924)

Find European TV news interviews that involve discussions about Geert Wilders

The CLARIN nightmare in 6 sleepless nights – night 1

Give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)

“All” means from all countries and all archives, not just some archives in some (9) countries that happen to be in CLARIN

If contemporary docs exist in digital form at all they are probably pictures – how do we get access to the content?

Can we rely on standardized metadata to find them?

Many of the docs may be in Latin – can we handle that, and what about the other languages?

How would a scholar know how to formulate this query?

How to present results?

4. USABILITY OF TOOLS

The gearbox syndrome

The gearbox syndrome explained

Humanities scholar with a problem, waiting for a solution

First HLT researcher offering help

First generation named entity recognizer (rule based)

Second HLT researcher offering help

Second generation named entity recognizer (statistics based)

Third HLT researcher offering help

LREC 2012 paper about next generation named entity recognizer

Making understandable interfaces

5. VISUALIZATION

A picture says more than 1000 words

Easy visualization fosters data analysis

Nice visualisation eases use of analysis tools

Nice-to-look-at tools help to reach out to the community

Who answered which words: visualizing word frequency information in letters
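The data behind such a visualization can be sketched in a few lines. This is only a minimal illustration, not the tool shown on the slide: the two letters and the writer names are invented, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter

# Hypothetical example letters; not taken from any CLARIN corpus.
letters = {
    "writer_1": "Thank you for your kind letter. Your letter arrived today.",
    "writer_2": "Your kind words arrived late, but thank you.",
}

def word_frequencies(text):
    """Lowercase the text and count each word (naive tokenization)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# One frequency table per letter writer.
freqs = {writer: word_frequencies(text) for writer, text in letters.items()}

# Words that both writers used: the starting point for a
# "who answered which words" view.
shared = set(freqs["writer_1"]) & set(freqs["writer_2"])

for word in sorted(shared):
    print(word, freqs["writer_1"][word], freqs["writer_2"][word])
```

A real visualization would plot these counts; the table is just the input to it.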

C. Culy. 2012. "Some challenges of language and linguistic data for information visualization. " Invited keynote presentation at Advanced Visual Methods for Linguistics. University of York, September 7, 2012.

Parliamentary Debate

Which party interrupted which other party and how often?
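A question like this comes down to a count per pair of parties. The sketch below assumes the debate transcripts have already been reduced to (speaker's party, interrupting party) records; the party names and records are made up for illustration and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical records: (party of the speaker, party that interrupted).
interruptions = [
    ("Party A", "Party B"),
    ("Party A", "Party B"),
    ("Party B", "Party C"),
    ("Party C", "Party A"),
]

def interruption_counts(events):
    """Count how often each party interrupted each other party."""
    return Counter((interrupter, speaker) for speaker, interrupter in events)

def as_matrix(counts):
    """Rows are interrupting parties, columns are interrupted parties."""
    parties = sorted({party for pair in counts for party in pair})
    matrix = [[counts.get((row, col), 0) for col in parties] for row in parties]
    return parties, matrix

counts = interruption_counts(interruptions)
parties, matrix = as_matrix(counts)
for party, row in zip(parties, matrix):
    print(party, row)
```

The resulting matrix is the kind of table a heat map or chord diagram could be drawn from.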

6. TUTORIALS AND WEB MATERIAL

Create and publish web tutorials

Publish recorded lectures about CLARIN-specific topics

Make and publish showcases

Web videos

7. SUSTAINABILITY OF TOOLS AND RESOURCES

Resources and tools must be accessible after a project finishes

Data and tools must use internationally accepted standards

Easy access via federated login

CLARIN Centres

Conclusion

CLARIN offers a good and sustainable infrastructure for long-term use of both Resources and Tools

Participating in CLARIN gives you access to enclosure tools, standardized metadata, tools for metadata, the CLARIN community

Give other groups/institutions access to your data... if you want

THANK YOU!

So join us!

www.clarin.nl
