LINGUISTICS RESEARCH AND ANALYSIS OF THE BULGARIAN FOLKLORE. EXPERIMENTAL IMPLEMENTATION OF LINGUISTIC COMPONENTS IN BULGARIAN FOLKLORE DIGITAL LIBRARY Konstantin Rangochev 1 Maxim Goynov 1 Desislava Paneva-Marinova 1 Detelin Luchev 2 1 Institute of Mathematics and Informatics- BAS 2 Ethnographic Institute with Museum -BAS International Conference on Information Research and Applications 24-27 June 2010, Varna, Bulgaria

Page 1: LINGUISTICS RESEARCH AND ANALYSIS OF THE BULGARIAN FOLKLORE. EXPERIMENTAL IMPLEMENTATION OF LINGUISTIC COMPONENTS IN BULGARIAN FOLKLORE DIGITAL LIBRARY

LINGUISTICS RESEARCH AND ANALYSIS OF THE BULGARIAN

FOLKLORE. EXPERIMENTAL IMPLEMENTATION OF

LINGUISTIC COMPONENTS IN BULGARIAN FOLKLORE

DIGITAL LIBRARY

Konstantin Rangochev 1

Maxim Goynov 1

Desislava Paneva-Marinova 1

Detelin Luchev 2

1 Institute of Mathematics and Informatics - BAS

2 Ethnographic Institute with Museum - BAS

International Conference on Information Research and Applications

24-27 June 2010, Varna, Bulgaria

Page 2

Presentation overview

• Linguistics research and analysis of the Bulgarian folklore

• National research project: “Knowledge Technologies for Creation of Digital Presentation and Significant Repositories of Folklore Heritage” (FolkKnow)

• Functionality of the “Bulgarian folklore digital library” (BFDL), a multimedia digital library

• Experimental implementation of a linguistic component in BFDL

Page 3

Linguistics research and analysis of the Bulgarian folklore (1)

The main component of linguistic research on Bulgarian folklore is the analysis of its lexical structure.

• How many tokens does it contain, and which ones?

• Do some groups of tokens dominate, and are others missing?

• Paradigmatic relations among the folklore lexemes

• Context lexemes / folklore language formulas

• Frequency of the lexemes, the verses/sentences in which they occur, and the number and position of those verses/sentences in the song, etc.

• Word forms

• Regional characteristics of the folklore lexical structure, etc.

Page 4

Linguistics research and analysis of the Bulgarian folklore (2)

Tools that formalize the folklore analysis:

• Frequency dictionary

• A general frequency dictionary – it contains all the lexical units in a folklore object repository;

• A regional frequency dictionary – it contains all the text units that come from a particular folklore region or from a specific settlement;

• A functional frequency dictionary – it contains all the text units that have identical functions: descriptions of rites, various types of songs, narratives, etc.
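The three kinds of frequency dictionary differ only in which objects are counted. A minimal Python sketch of that idea follows; the record fields (`region`, `function`, `text`), the region labels, and the sample texts are hypothetical and are not taken from the BFDL data model.

```python
from collections import Counter
import re

# Hypothetical sample records; field names and texts are illustrative only.
RECORDS = [
    {"region": "A", "function": "song", "text": "Fifty heroes are drinking wine"},
    {"region": "A", "function": "rite", "text": "The heroes pour wine on the bread"},
    {"region": "B", "function": "song", "text": "Marko is drinking red wine"},
]

def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())

def frequency_dictionary(records, **criteria):
    """Count tokens over the records whose fields match all given criteria."""
    counts = Counter()
    for record in records:
        if all(record.get(key) == value for key, value in criteria.items()):
            counts.update(tokenize(record["text"]))
    return counts

general = frequency_dictionary(RECORDS)                      # whole repository
regional = frequency_dictionary(RECORDS, region="A")         # one region
functional = frequency_dictionary(RECORDS, function="song")  # one function
```

Here `general["wine"]` counts every object, while `regional` and `functional` count only the objects that match the chosen region or function.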

Page 5

Linguistics research and analysis of the Bulgarian folklore (3)

Table: Comparison of the Bulgarian folklore and spoken languages.

Page 6

Linguistics research and analysis of the Bulgarian folklore (4)

• Concordance dictionaries show the lexeme in its context.

• Example for songs: “Fifty heroes are drinking wine” – the underlined lexeme is the one examined, and the lexemes in italics are its context.

• Example for narrative text: in the descriptions of rituals, one complete sentence (from full stop to full stop) is the context of the observed lexeme.
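The two context rules above (the verse for songs, the complete sentence for narrative text) can be sketched in Python as follows. The function names and sample texts are hypothetical, and sentence splitting on `.`, `!` and `?` is a simplifying assumption.

```python
import re

def song_concordance(verses, lexeme):
    """Return (verse_number, verse) pairs for the verses that contain the lexeme."""
    target = lexeme.lower()
    hits = []
    for number, verse in enumerate(verses, start=1):
        if target in re.findall(r"\w+", verse.lower()):
            hits.append((number, verse))
    return hits

def narrative_concordance(text, lexeme):
    """Return the complete sentences that contain the lexeme."""
    target = lexeme.lower()
    # Split after sentence-final punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if target in re.findall(r"\w+", s.lower())]

# Hypothetical sample data.
verses = ["Fifty heroes are drinking wine", "Marko is sitting at the table"]
song_hits = song_concordance(verses, "wine")

narrative = ("The bread is broken at dawn. The wine is poured on the ground. "
             "Then the guests leave.")
narrative_hits = narrative_concordance(narrative, "wine")
```

Returning the verse number together with the verse keeps the position of the lexeme in the song available for later analysis.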

Page 7

FolkKnow project

• FolkKnow project: “Knowledge Technologies for Creation of Digital Presentation and Significant Repositories of Folklore Heritage” (contract number: IO-03-03/2006)

• Supported by the National Science Fund of the Bulgarian Ministry of Education and Science

• Partners: Institute of Mathematics and Informatics - BAS, Institute for Folklore - BAS, Veliko Tarnovo University

• Module 3: “Development of Digital Libraries and Information Portal with Virtual Exposition - Bulgarian Folklore Heritage”

Page 8

FolkKnow project

Page 9

Bulgarian folklore digital library

Web address: http://213.191.194.27/folklor/

Page 10

Main services (1)

Page 11

• Description of folklore object

Folklore object preview

Page 12

Main services (2)

Page 13

Main services (3)

• Extended search through all the object’s characteristics

Page 14

Main services (4)

• Modules for

– Managing and monitoring users’ data and activities: registration, logs, data changes, level setting, actions related to object manipulation (search, preview, delete, add, edit, select, etc.), and administrative actions.

– File format conversion

– XML export of the BFDL objects
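As an illustration of what an XML export of a single object might look like, here is a minimal Python sketch. The element names (`folkloreObject`, `title`, `type`, `region`, `text`) and the sample object are hypothetical; the actual BFDL export schema is not described in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical field list; the real BFDL metadata set is not given here.
FIELDS = ("title", "type", "region", "text")

def export_object(obj):
    """Serialize one folklore object (a dict) to an XML string."""
    root = ET.Element("folkloreObject", id=str(obj["id"]))
    for field in FIELDS:
        child = ET.SubElement(root, field)
        child.text = obj.get(field, "")
    return ET.tostring(root, encoding="unicode")

SAMPLE = {
    "id": 1,
    "title": "Fifty heroes are drinking wine",
    "type": "song",
    "region": "A",
    "text": "Fifty heroes are drinking wine",
}
xml_text = export_object(SAMPLE)
```

Using `ElementTree` rather than string concatenation ensures that special characters in the folklore text are escaped correctly.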

Page 15

Linguistic search in text folklore objects

• Search for a word in the different types of dictionaries;

• Search for two or more words, to find verbal formulas in the folklore lexis: “Drinking wine”, “Marko seated”.

• Search for a group of words, to investigate the paradigmatic relations in the folklore lexis (river, stream, brook, rill…)

• Search for the root of a word, to study folklore word formation: “drink” (I am drinking, I have drunk, they have drunk…).
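The last three kinds of search listed above can be sketched in Python as follows. All function names and sample verses are hypothetical. The root search is a naive prefix match, which is an assumption made for brevity: irregular forms such as "drunk" would require a lemmatizer or a morphological dictionary.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def find_formula(verses, formula):
    """Return the verses that contain the formula's words as a contiguous sequence."""
    words = tokenize(formula)
    if not words:
        return []
    n = len(words)
    result = []
    for verse in verses:
        tokens = tokenize(verse)
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            result.append(verse)
    return result

def find_group(verses, group):
    """Map each word of a paradigmatic group to the verses in which it occurs."""
    return {word: [v for v in verses if word.lower() in tokenize(v)] for word in group}

def find_by_root(verses, root):
    """Return the distinct word forms that start with the root (naive prefix match)."""
    prefix = root.lower()
    return sorted({t for v in verses for t in tokenize(v) if t.startswith(prefix)})

# Hypothetical sample verses.
VERSES = [
    "Fifty heroes are drinking wine",
    "Marko seated by the river",
    "They drink by the cold stream",
]
```

For example, `find_formula(VERSES, "Drinking wine")` returns only the first verse, and `find_by_root(VERSES, "drink")` collects the forms "drink" and "drinking".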

Page 16

Experimental implementation of a linguistic component in

BFDL

Frequency dictionary functional specification

• Linguistic analysis of the available set of test folklore objects;

• Determination of the frequency with which the lexemes occur in text folklore objects;

• Creation of lists of the lexemes:

– in frequency order

– in alphabetical order

• Counting the lexical units;

• Counting the repeats of the lexical units.
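A minimal Python sketch of the listing and counting steps in this specification is given below. The function names and sample texts are hypothetical. Reading "the number of the lexical units" as the count of distinct units and "the number of repeats" as the total number of occurrences is an assumption.

```python
from collections import Counter
import re

def build_counts(texts):
    """Count word tokens over a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts

def in_frequency_order(counts):
    """Lexemes by descending count; ties are broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def in_alphabetical_order(counts):
    """Lexemes in alphabetical order, each with its count."""
    return sorted(counts.items())

# Hypothetical test objects.
TEXTS = ["Fifty heroes are drinking wine", "Marko is drinking wine"]
counts = build_counts(TEXTS)

distinct_units = len(counts)              # number of lexical units (assumed: distinct)
total_occurrences = sum(counts.values())  # number of repeats (assumed: all occurrences)
```

Breaking ties alphabetically keeps the frequency-ordered list stable between runs.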

Page 17

Experimental implementation of a linguistic component in

BFDL

Sequence Diagram

Page 18

Experimental implementation of a linguistic component in

BFDL

Analysis class diagram for the BFDL linguistic component

Page 19

Implementation of the Bulgarian folklore digital library

The main tools and languages used:

• Microsoft Windows Server 2008 x64 Standard;

• Web server: Apache HTTP Server v 2.2, PHP v 2.2.9;

• Database management system: MySQL v 5.1 Standard;

• Tools for the additional modules: FFmpeg, VMware, HTML, JavaScript, AJAX;

• Database query language: SPARQL