Timo Honkela: Research interests in text and metadata mining of literature

Page 1: Timo Honkela: Research interests in text and metadata mining of literature

Timo Honkela, A Seminar on Digital Publishing and Research. 3 Dec 2015

Timo Honkela

3 Dec 2015

Research interests in text and metadata mining of literature

A Seminar on Digital Publishing and Research

Page 2: Timo Honkela: Research interests in text and metadata mining of literature

Inspirational context: classical literature

The presentation mostly covers general methods. We can discuss how and to what extent these can be used in this context.

Page 3: Timo Honkela: Research interests in text and metadata mining of literature

Page 4: Timo Honkela: Research interests in text and metadata mining of literature

Humanities and “compunities”

● Understanding and analysing cultural artefacts such as novels or poems requires human experience of the world and dedication to the relevant context

● Computers are useful as tireless “distant reading tools” that can, for example, be used to count instances of linguistic expressions, their contextual relations and their relation to given categories, or to parse the structure of writings at different levels of abstraction
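As a rough illustration of the counting tasks mentioned above, the following minimal Python sketch counts occurrences of an expression and tallies the words that appear near a given word. The sample sentence, the function names and the window size are assumptions made for this example only; they do not come from the presentation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def count_expression(tokens, expression):
    """Count how often a (possibly multi-word) expression occurs."""
    target = tokenize(expression)
    n = len(target)
    return sum(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

def context_words(tokens, word, window=2):
    """Count the words found within `window` tokens of each occurrence of `word`."""
    neighbours = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            start = max(0, i - window)
            neighbours.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return neighbours

# Invented sample text, used only to exercise the functions.
sample = "The king rode to the forest. In the forest the king met an old woman."
tokens = tokenize(sample)
king_count = count_expression(tokens, "the king")
forest_neighbours = context_words(tokens, "forest")
print(king_count, forest_neighbours.most_common(3))
```

Real corpora would call for proper tokenization and, for a language such as Finnish, morphological analysis; the sketch only shows the basic idea of counting expressions and their contexts.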

Page 5: Timo Honkela: Research interests in text and metadata mining of literature

Digital humanities

● A research area called Digital Humanities usually combines (1) research questions stemming from the humanities and social sciences, (2) their data represented in digital form, and (3) quantitative computational analysis methods (statistics, machine learning) applied to the data to address those questions, along with qualitative research methods

● For the computational analysis, large collections of data (“big data”) can provide certain benefits, but small data sets can also be analyzed in a similar manner

Page 6: Timo Honkela: Research interests in text and metadata mining of literature

Text and data mining

● Data mining refers to the computer-based analysis of data collections in order to find interesting or useful patterns, relations or structures in the data

● Data mining is often applied to numerical data, but structured data can also be used

● Text mining refers to data mining applied specifically to texts

Page 7: Timo Honkela: Research interests in text and metadata mining of literature

Text mining

● Analysis of text documents at different levels of abstraction

– Word segments (morphology)

– Lexicon (words, terms, phrases, names, etc.)

– Syntax

– Semantics

– Pragmatics (computationally challenging)

● Context!

Page 8: Timo Honkela: Research interests in text and metadata mining of literature

Classical example: Learning meaning from context:

Maps of words in Grimm fairy tales

Honkela, Pulkki & Kohonen 1995

Automated learning of word relations using self-organizing map on text context data
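The general idea of learning word relations from context with a self-organizing map can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by Honkela, Pulkki & Kohonen: the toy sentences are invented (they are not the Grimm tales data), the context of a word is taken to be just its immediate left and right neighbours, and the grid size, learning rate and radius schedule are arbitrary choices.

```python
import math
import random

# Toy corpus, invented for illustration.
sentences = [
    "the king eats bread", "the queen eats bread",
    "the king drinks wine", "the queen drinks wine",
    "a wolf eats meat", "a fox eats meat",
]

# 1. Context vectors: counts of the immediately preceding and following words.
vocab = sorted({w for s in sentences for w in s.split()})
index = {w: i for i, w in enumerate(vocab)}
vectors = {w: [0.0] * (2 * len(vocab)) for w in vocab}
for s in sentences:
    words = s.split()
    for i, w in enumerate(words):
        if i > 0:
            vectors[w][index[words[i - 1]]] += 1.0               # left neighbour
        if i < len(words) - 1:
            vectors[w][len(vocab) + index[words[i + 1]]] += 1.0  # right neighbour

def normalise(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

vectors = {w: normalise(v) for w, v in vectors.items()}

def best_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))

# 2. A small self-organizing map trained on the context vectors.
def train_som(data, rows=3, cols=3, epochs=200, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac              # learning rate decays linearly
        radius = 0.5 + 1.5 * frac      # neighbourhood radius shrinks over time
        for x in rng.sample(data, len(data)):
            bmu = best_unit(weights, x)
            for unit, w in weights.items():
                d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += rate * h * (x[k] - w[k])
    return weights

som = train_som(list(vectors.values()))
word_map = {w: best_unit(som, v) for w, v in vectors.items()}
for word, cell in sorted(word_map.items(), key=lambda item: item[1]):
    print(cell, word)
```

In this toy corpus "king" and "queen" occur in exactly the same contexts, so they necessarily land on the same map unit; with real text, words used in similar contexts tend to end up in nearby units.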

Page 9: Timo Honkela: Research interests in text and metadata mining of literature

Map of Finnish Science

Figure labels: Chemistry, Physics and engineering, Biosciences, Medicine, Culture and society

A fully automated process from terminology extraction (Likey) to semantic space construction (SOM) without any manually constructed resources.

Page 10: Timo Honkela: Research interests in text and metadata mining of literature

Further opportunities

● Time series analysis (historical developments)

– Ordering a collection of texts

– Analysis of narrative structures

● Social Network Analysis

● Sentiment analysis

● Named Entity Recognition (people, places, organisations)
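As one simple example of the time series idea listed above, the following Python sketch orders a small collection of dated texts and tracks the relative frequency of a term per year. The (year, text) pairs and the function name are invented for illustration and are not taken from the presentation.

```python
from collections import Counter

# Hypothetical (year, text) pairs; the contents are invented.
collection = [
    (1850, "the railway reached the town"),
    (1820, "the coach left the town at dawn"),
    (1850, "a railway station was built"),
    (1880, "the telephone and the railway changed the town"),
]

def relative_frequency_by_year(docs, term):
    """Return {year: occurrences of term / total tokens}, in chronological order."""
    hits, totals = Counter(), Counter()
    for year, text in docs:
        tokens = text.lower().split()
        hits[year] += tokens.count(term)
        totals[year] += len(tokens)
    return {year: hits[year] / totals[year] for year in sorted(totals)}

series = relative_frequency_by_year(collection, "railway")
for year, freq in series.items():
    print(year, round(freq, 3))
```

Normalising by the number of tokens per year keeps years with more surviving text from dominating the series.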

Page 11: Timo Honkela: Research interests in text and metadata mining of literature

Analysis of metadata

● Study of the overall structures in a collection

● Important sources include author, year of publication, place, etc.

● Can be analyzed in itself or in combination with the full text data

● Study of the quality of the metadata

– Variation
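One way to study variation in metadata is to look for fields that differ only in spelling conventions. The Python sketch below groups author strings that collapse to the same crude key; the normalisation rule, the function names and the sample records are assumptions made for this example, and a real catalogue would need more careful matching.

```python
import unicodedata
from collections import defaultdict

def normalise_name(name):
    """Build a crude comparison key: no accents, case or punctuation, name parts sorted."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    cleaned = "".join(c if c.isalnum() else " " for c in ascii_name.lower())
    return " ".join(sorted(cleaned.split()))

def find_variants(values):
    """Return {key: sorted variants} for keys that appear in more than one written form."""
    groups = defaultdict(set)
    for value in values:
        groups[normalise_name(value)].add(value)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}

# Hypothetical author fields as they might appear in catalogue records.
authors = [
    "Kivi, Aleksis", "Aleksis Kivi", "ALEKSIS KIVI",
    "Minna Canth", "Canth, Minna", "Juhani Aho",
]
variants = find_variants(authors)
for key, forms in variants.items():
    print(key, "->", forms)
```

The number and size of such groups give a rough indication of how consistently a metadata field has been recorded.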

Page 12: Timo Honkela: Research interests in text and metadata mining of literature

Hybrid methods: qualitative + quantitative

● Qualitative interpretation of quantitative analysis results

● Quantitative analysis of qualitative interpretations

● Parallel analysis with qualitative methods (e.g. grounded theory) and quantitative methods (e.g. unsupervised learning)

● Quantitative analysis of human subjective and contextual understanding and expression

Page 13: Timo Honkela: Research interests in text and metadata mining of literature

Grounded Intersubjective Concept Analysis

● A method developed to model how language is understood in context and with some degree of individuality

● Computational approaches often assume a shared epistemology; here we are interested in the differences in human interpretation
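To make the interest in differences of interpretation more concrete, the following Python sketch compares how several subjects associate a word with a set of contexts. It is only an illustration under strong assumptions and is not claimed to be the GICA procedure itself: the subjects, contexts, rating scale and all values are invented, and the mean pairwise distance is just one simple way to quantify disagreement.

```python
import math
from itertools import combinations

# Hypothetical ratings (0-5) of how strongly each subject associates a word
# with each context; all values are invented for illustration.
contexts = ["medicine", "economy", "sport"]
ratings = {
    "subject_a": {"health": [5, 1, 3], "care": [4, 2, 0]},
    "subject_b": {"health": [2, 5, 1], "care": [4, 2, 1]},
    "subject_c": {"health": [4, 2, 4], "care": [5, 1, 0]},
}

# Flatten: one row per (subject, word) pair, one column per context.
rows = {(s, w): vec for s, words in ratings.items() for w, vec in words.items()}

def disagreement(word):
    """Mean Euclidean distance between the subjects' context profiles of a word."""
    profiles = [vec for (_, w), vec in rows.items() if w == word]
    pairs = list(combinations(profiles, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

scores = {w: disagreement(w) for w in ["health", "care"]}
for word, score in scores.items():
    print(word, round(score, 2))
```

A higher score means that the subjects' context profiles for that word lie further apart, that is, the word is interpreted less uniformly in this toy data.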

Page 14: Timo Honkela: Research interests in text and metadata mining of literature

GICA analysis of the word “health” in State of the Union Addresses

Page 15: Timo Honkela: Research interests in text and metadata mining of literature

Thank you very much!