Visualizing Natural Language Resources Kristina Kocijan University of Zagreb, Faculty of Humanities and Social Sciences, Department of Information and Communication Sciences Zagreb, Croatia [email protected]


Page 1

Visualizing Natural Language Resources

Kristina Kocijan

University of Zagreb,

Faculty of Humanities and Social Sciences,

Department of Information and Communication Sciences

Zagreb, Croatia

[email protected]

Page 2

Is it about beautiful pictures?

Sooo, what is this presentation about?

Page 3

Beauty is in the eye of the beholder.

3rd century BC, Greek saying

Baudelaire’s beauty:

data is beautiful if it is the result of reason and calculation.

Thoreau’s beauty:

data is beautiful by its very plainness.

Page 4

About beautiful pictures!

Sooo, what is this presentation about?

Page 5

New ways of presenting data?

Sooo, that’s it – only beautiful pictures?

Page 6

“The hope is that, in not too many years, human brains and computing machines will be coupled together very tightly and that the resulting partnership will think as no human brain has ever thought and process data in a way not approached by the information-handling machines we know today.”

J.C.R. Licklider, in ‘Man-Computer Symbiosis’, March, 1960.

Page 7

Reading the same data

In different forms

Page 8

Reading the same data

Slowly, slowly, very slowly

Faster, alas lacking info

Nouns               Common Collective     Proper
Fem.                 8 344          1      3 177
Mas.                 6 249          2      3 189
Neut.                5 520          3         66
No gender                0          0     36 362
Total per type      20 113          6     42 794
Total nouns         62 913
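The slides contrast two ways of reading these counts. A table is read slowly, while a picture is read faster. As a minimal sketch of that idea, the same counts can be re-read as a crude text bar chart in Python. This is not the chart used in the presentation. The scale of one `#` per 1 000 nouns is an arbitrary assumption. The collective column is left out because its values (at most 3 nouns per row) would not show at that scale.

```python
# Per-gender noun counts from the table above: (common, collective, proper).
COUNTS = {
    "Fem.": (8344, 1, 3177),
    "Mas.": (6249, 2, 3189),
    "Neut.": (5520, 3, 66),
    "No gender": (0, 0, 36362),
}

def bar(n: int, per_char: int = 1000) -> str:
    """One '#' per `per_char` nouns, rounded to the nearest whole character."""
    return "#" * round(n / per_char)

def text_chart(counts: dict, per_char: int = 1000) -> list:
    """Render common and proper counts per gender as horizontal text bars."""
    lines = []
    for gender, (common, _collective, proper) in counts.items():
        lines.append(f"{gender:<10} common {bar(common, per_char)}")
        lines.append(f"{'':<10} proper {bar(proper, per_char)}")
    return lines

if __name__ == "__main__":
    print("\n".join(text_chart(COUNTS)))
```

Even at this coarse scale, the 36 362 proper nouns without gender dominate the picture. That is easy to miss when reading the table row by row.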

Page 9

Reading the same data

Speedy and empowering

Page 10

Reading the same data

Speedy and empowering

Page 11

Reading the same data

Speedy and empowering

Page 12

Presenting the same data

Statistics for the nouns in a dictionary

Nouns              Common    Proper
Fem.              39.84 %    3.61 %
Mas.              32.26 %    4.64 %
Neut.             15.36 %    0.12 %
No gender             0 %    4.17 %
Total per type    87.46 %   12.54 %
Total nouns     1 048 570

Statistics for the nouns in a corpus

Nouns              Common    Proper
Fem.              13.26 %    5.05 %
Mas.               9.93 %    5.07 %
Neut.              8.77 %    0.10 %
No gender             0 %   57.80 %
Total per type    31.97 %   68.03 %
Total nouns        62 907
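The corpus percentages can be re-derived from the counts on Page 8. The slides do not say which denominator was used. Judging from the numbers, it appears to be 62 907: the 62 913 nouns on Page 8 minus the 6 collective nouns, which have no column in this table. A small Python check under that assumption:

```python
# Per-gender noun counts from the Page 8 table: (common, collective, proper).
COUNTS = {
    "Fem.": (8344, 1, 3177),
    "Mas.": (6249, 2, 3189),
    "Neut.": (5520, 3, 66),
    "No gender": (0, 0, 36362),
}

def percentages(counts, include_collective=False):
    """Return (denominator, {row: (common %, proper %)}) rounded to 2 decimals.

    By default the collective nouns are left out of the denominator; this is
    an assumption inferred from the slide totals, not stated on the slides.
    """
    common_total = sum(c for c, _, _ in counts.values())
    collective_total = sum(col for _, col, _ in counts.values())
    proper_total = sum(p for _, _, p in counts.values())
    total = common_total + proper_total
    if include_collective:
        total += collective_total

    def pct(n):
        return round(100 * n / total, 2)

    table = {g: (pct(c), pct(p)) for g, (c, _, p) in counts.items()}
    table["Total per type"] = (pct(common_total), pct(proper_total))
    return total, table

if __name__ == "__main__":
    total, table = percentages(COUNTS)
    print("Total nouns:", total)
    for row, (common, proper) in table.items():
        print(f"{row:<15} {common:6.2f} % {proper:6.2f} %")
```

Keeping the 6 collectives in the denominator changes the proper-noun total. It would round to 68.02 % rather than the 68.03 % shown on the slide, which supports the reading above.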

Page 13

Distribution of the top 10 paradigms

In DIC: ALAT, ASTRONOM, BLAGOST, BRATIĆ, CRTANJE, DAVOR, FABIANA, GUSJENICA, LEPTIR, MEDO

In Corp: ALAT, BAT, BESKRAJ, BLAGO, BLAGOST, BRATIĆ, CRTANJE, GUSJENICA, MEDO, PROLAZNIK
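The two top-10 lists can be compared directly with set operations. This minimal Python sketch takes the lists as printed on the slide. It treats BAT and BESKRAJ as two separate paradigms. That is an assumption, and it is what brings the corpus list to ten entries.

```python
# Top-10 paradigm names from the slide (BAT and BESKRAJ assumed to be separate).
TOP10_DIC = ["ALAT", "ASTRONOM", "BLAGOST", "BRATIĆ", "CRTANJE",
             "DAVOR", "FABIANA", "GUSJENICA", "LEPTIR", "MEDO"]
TOP10_CORP = ["ALAT", "BAT", "BESKRAJ", "BLAGO", "BLAGOST",
              "BRATIĆ", "CRTANJE", "GUSJENICA", "MEDO", "PROLAZNIK"]

def compare(first, second):
    """Split two name lists into shared names and names unique to each list."""
    a, b = set(first), set(second)
    return {
        "shared": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }

if __name__ == "__main__":
    for key, names in compare(TOP10_DIC, TOP10_CORP).items():
        print(f"{key:<12} {', '.join(names)}")
```

Under this reading, six of the ten paradigms are shared. ASTRONOM, DAVOR, FABIANA and LEPTIR appear only in the dictionary list. BAT, BESKRAJ, BLAGO and PROLAZNIK appear only in the corpus list.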

Page 14

Genitive+sg endings

In DIC

In Corpus

Page 15

Genitive+sg endings

In Corpus

Page 16

Genitive+sg endings - weighted

In Corpus
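The slides do not explain how the weighted chart differs from the plain one. One plausible reading, sketched here as an assumption, is as follows. The plain chart counts each distinct genitive singular form once. The weighted chart counts each form once per corpus occurrence.

The forms below are Croatian genitive singulars of paradigms listed on Page 13. The frequencies are invented for illustration. Taking the last letter as the ending is a simplification.

```python
from collections import Counter

# Hypothetical data: genitive singular forms with invented corpus frequencies
# (the frequencies are NOT taken from the slides).
GEN_SG_FREQ = {
    "alata": 120,     # ALAT
    "leptira": 15,    # LEPTIR
    "bratića": 40,    # BRATIĆ
    "crtanja": 8,     # CRTANJE
    "gusjenice": 5,   # GUSJENICA
    "blagosti": 12,   # BLAGOST
}

def ending(form: str) -> str:
    """Simplification: treat the final letter as the case ending."""
    return "-" + form[-1]

def unweighted(forms) -> Counter:
    """Count each distinct word form once."""
    return Counter(ending(f) for f in forms)

def weighted(freqs: dict) -> Counter:
    """Count each word form once per (hypothetical) corpus occurrence."""
    counts = Counter()
    for form, n in freqs.items():
        counts[ending(form)] += n
    return counts

def shares(counts: Counter) -> dict:
    """Convert raw counts to percentages rounded to 2 decimals."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 2) for k, v in counts.items()}

if __name__ == "__main__":
    print("unweighted:", shares(unweighted(GEN_SG_FREQ)))
    print("weighted:  ", shares(weighted(GEN_SG_FREQ)))
```

With these made-up numbers, -a covers 66.67 % of the distinct forms but 91.5 % of the occurrences. That is the kind of shift a weighted view can reveal.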

Page 17

Visual Story

As told by Data

Page 18

Often the most effective way to describe, explore and summarize a set of numbers – even a very large set – is to look at pictures of those numbers.

Edward R. Tufte in ‘The Visual Display of Quantitative Information’, 2001.

Page 19

Story behind the NLR data

Instrumental, Genitive, Vocative, Dative, Accusative, Locative

Page 20

Story behind the NLR data

Page 21

Story behind the NLR data

Page 22

Story behind the NLR data

Page 23

Story behind the NLR data

Page 24

Story behind the NLR data

Page 25

Story behind the NLR data

Page 26

Thank you!

Visualizing Natural Language Resources

Kristina Kocijan

University of Zagreb,

Faculty of Humanities and Social Sciences,

Department of Information and Communication Sciences

Zagreb, Croatia

[email protected]

Questions?