
Posted on 15-Jan-2016


Visualizing Natural Language Resources

Kristina Kocijan

University of Zagreb,

Faculty of Humanities and Social Sciences,

Department of Information and Communication Sciences, Zagreb, Croatia

krkocijan@ffzg.hr

Is it about beautiful pictures?

Sooo, what is this presentation about?

Beauty is in the eye of the beholder.

3rd century BC, Greek saying

Baudelaire’s beauty:

data is beautiful if it is the result of reason and calculation.

Thoreau’s beauty:

data is beautiful by its very plainness.

About beautiful pictures!

Sooo, what is this presentation about?

New ways of presenting data?

Sooo, that’s it – only beautiful pictures?

“The hope is that, in not too many years, human brains and computing machines will be coupled together very tightly and that the resulting partnership will think as no human brain has ever thought and process data in a way not approached by the information-handling machines we know today.”

J.C.R. Licklider, in ‘Man-Computer Symbiosis’, March, 1960.

Reading the same data

In different forms

Reading the same data

Slowly, slowly, very slowly

Faster, alas lacking info

Nouns | Common | Collective | Proper
Fem. | 8 344 | 1 | 3 177
Mas. | 6 249 | 2 | 3 189
Neut. | 5 520 | 3 | 66
No gender | 0 | 0 | 36 362
Total per type | 20 113 | 6 | 42 794

Total nouns: 62 913
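The totals in this table can be recomputed from the individual cells, which is a quick consistency check before any chart is drawn. A minimal Python sketch; the nested-dictionary layout and the names `noun_counts`, `totals_per_type` and `total_nouns` are illustrative assumptions, not part of the original resource.

```python
# Raw noun counts from the table above, keyed by gender, then by noun type.
# The nested-dict layout is an illustrative choice.
noun_counts = {
    "Fem.":      {"Common": 8344, "Collective": 1, "Proper": 3177},
    "Mas.":      {"Common": 6249, "Collective": 2, "Proper": 3189},
    "Neut.":     {"Common": 5520, "Collective": 3, "Proper": 66},
    "No gender": {"Common": 0,    "Collective": 0, "Proper": 36362},
}

TYPES = ("Common", "Collective", "Proper")

def totals_per_type(counts):
    """Sum each noun type (column) over all genders (rows)."""
    return {t: sum(row[t] for row in counts.values()) for t in TYPES}

def total_nouns(counts):
    """Grand total over every cell of the table."""
    return sum(totals_per_type(counts).values())

per_type = totals_per_type(noun_counts)
print(per_type)                  # {'Common': 20113, 'Collective': 6, 'Proper': 42794}
print(total_nouns(noun_counts))  # 62913
```

Both printed values agree with the "Total per type" row and the "Total nouns" line.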

Reading the same data

Speedy, and empowering
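The charts shown on these slides are not preserved in the transcript. As a rough, dependency-free stand-in, here is a minimal Python sketch that turns the raw counts from the raw-count table above into horizontal text bars, one per gender and noun type. The `text_bars` helper, the bar width, and leaving out the 6 collective nouns are illustrative assumptions, not the charts used in the talk.

```python
# Counts per gender as (Common, Proper), taken from the raw-count table.
# Collective nouns (6 in total) are left out to keep the sketch simple.
counts = {
    "Fem.":      (8344, 3177),
    "Mas.":      (6249, 3189),
    "Neut.":     (5520, 66),
    "No gender": (0, 36362),
}

def text_bars(data, width=40):
    """Render grouped horizontal bars as plain-text lines.

    Every value is scaled against the largest value in `data`,
    so the longest bar is exactly `width` characters.
    """
    peak = max(v for pair in data.values() for v in pair)
    lines = []
    for label, (common, proper) in data.items():
        for kind, value in (("common", common), ("proper", proper)):
            bar = "#" * round(value / peak * width)
            lines.append(f"{label:<10}{kind:<7}|{bar} {value}")
    return lines

for line in text_bars(counts):
    print(line)
```

Even this crude rendering makes the dominance of genderless proper nouns visible at a glance, which the flat table does not.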


Presenting the same data

Statistics for the nouns in a dictionary

Statistics for the nouns in a corpus

Nouns | Common | Proper
Fem. | 39.84 % | 3.61 %
Mas. | 32.26 % | 4.64 %
Neut. | 15.36 % | 0.12 %
No gender | 0 % | 4.17 %
Total per type | 87.46 % | 12.54 %

Total nouns: 1 048 570

Nouns | Common | Proper
Fem. | 13.26 % | 5.05 %
Mas. | 9.93 % | 5.07 %
Neut. | 8.77 % | 0.10 %
No gender | 0 % | 57.80 %
Total per type | 31.97 % | 68.03 %

Total nouns: 62 907
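The percentage table with 62 907 nouns can be reproduced from the earlier raw-count table if the 6 collective nouns are left out (62 913 - 6 = 62 907). A minimal Python sketch of that arithmetic; excluding the collective nouns is an inference from the totals, not something the slides state, and `percentage_table` is a hypothetical helper.

```python
# Raw counts per gender as (Common, Proper), from the raw-count table.
# Assumption: collective nouns are excluded, inferred from 62 913 - 6 = 62 907.
counts = {
    "Fem.":      (8344, 3177),
    "Mas.":      (6249, 3189),
    "Neut.":     (5520, 66),
    "No gender": (0, 36362),
}

def percentage_table(data, decimals=2):
    """Express each cell as a percentage of the grand total; also return the total."""
    total = sum(c + p for c, p in data.values())
    shares = {
        gender: (round(100 * c / total, decimals), round(100 * p / total, decimals))
        for gender, (c, p) in data.items()
    }
    return shares, total

shares, total = percentage_table(counts)
print(total)           # 62907
print(shares["Fem."])  # (13.26, 5.05)
```

Every rounded cell matches the table; the "Total per type" row differs by 0.01 from the sum of its rounded cells only because of rounding.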

Distribution of top 10 paradigms

In DIC: ALAT, ASTRONOM, BLAGOST, BRATIĆ, CRTANJE, DAVOR, FABIANA, GUSJENICA, LEPTIR, MEDO

In corpus: ALAT, BAT, BESKRAJ, BLAGO, BLAGOST, BRATIĆ, CRTANJE, GUSJENICA, MEDO, PROLAZNIK
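Comparing the two top-10 lists as sets shows which paradigms are frequent in both resources and which in only one. A minimal Python sketch; the corpus list is taken to contain BAT and BESKRAJ as two separate paradigms (ten entries, like the DIC list), and that split is an assumption.

```python
# Top-10 paradigm names as listed on the slide.
# Assumption: BAT and BESKRAJ are two separate corpus entries.
top_dic = {"ALAT", "ASTRONOM", "BLAGOST", "BRATIĆ", "CRTANJE",
           "DAVOR", "FABIANA", "GUSJENICA", "LEPTIR", "MEDO"}
top_corpus = {"ALAT", "BAT", "BESKRAJ", "BLAGO", "BLAGOST",
              "BRATIĆ", "CRTANJE", "GUSJENICA", "MEDO", "PROLAZNIK"}

shared = top_dic & top_corpus       # frequent in both resources
only_dic = top_dic - top_corpus     # frequent in the dictionary only
only_corpus = top_corpus - top_dic  # frequent in the corpus only

print(sorted(shared))
print(sorted(only_dic))
print(sorted(only_corpus))
```

Under this reading, six paradigms are shared and each resource contributes four of its own.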

Genitive+sg endings: in DIC and in corpus

Genitive+sg endings: in corpus

Genitive+sg endings (weighted): in corpus
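The transcript does not explain how the weighted chart was computed. One plausible reading, assumed here, is that the unweighted chart counts each distinct genitive singular form once, while the weighted chart counts every occurrence in the corpus. A minimal Python sketch of that difference; the forms are real Croatian genitive singulars, but the frequencies are made up for illustration and are not figures from the presentation.

```python
from collections import Counter

# Hypothetical genitive singular forms with MADE-UP corpus frequencies,
# for illustration only; these are not data from the presentation.
gen_sg_forms = {
    "alata": 12,     # ALAT
    "leptira": 3,    # LEPTIR
    "gusjenice": 5,  # GUSJENICA
    "blagosti": 7,   # BLAGOST
}

def ending(form, length=1):
    """Take the last `length` characters as the ending (a simplification)."""
    return form[-length:]

def ending_counts(forms, weighted=False):
    """Count endings once per form, or once per occurrence when weighted=True."""
    counts = Counter()
    for form, freq in forms.items():
        counts[ending(form)] += freq if weighted else 1
    return counts

print(ending_counts(gen_sg_forms))
print(ending_counts(gen_sg_forms, weighted=True))
```

Weighting can reorder the endings: here -i outranks -e only once frequencies are taken into account.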

Visual Story

As told by Data

Often the most effective way to describe, explore and summarize a set of numbers – even a very large set – is to look at pictures of those numbers.

Edward R. Tufte in ‘The Visual Display of Quantitative Information’, 2001.

Story behind the NLR data

Instrumental, Genitive, Vocative, Dative, Accusative, Locative


Thank you!

Questions?
