LINGUISTICS RESEARCH AND ANALYSIS OF THE BULGARIAN FOLKLORE. EXPERIMENTAL IMPLEMENTATION OF LINGUISTIC COMPONENTS IN BULGARIAN FOLKLORE DIGITAL LIBRARY Konstantin Rangochev 1 Maxim Goynov 1 Desislava Paneva-Marinova 1 Detelin Luchev 2 1 Institute of Mathematics and Informatics- BAS 2 Ethnographic Institute with Museum -BAS International Conference on Information Research and Applications 24-27 June 2010, Varna, Bulgaria


TRANSCRIPT

Page 1: Konstantin Rangochev 1 Maxim Goynov 1 Desislava Paneva-Marinova 1 Detelin Luchev 2

LINGUISTICS RESEARCH AND ANALYSIS OF THE BULGARIAN FOLKLORE. EXPERIMENTAL IMPLEMENTATION OF LINGUISTIC COMPONENTS IN BULGARIAN FOLKLORE DIGITAL LIBRARY

Konstantin Rangochev 1

Maxim Goynov 1

Desislava Paneva-Marinova 1

Detelin Luchev 2

1 Institute of Mathematics and Informatics - BAS

2 Ethnographic Institute with Museum - BAS

International Conference on Information Research and Applications

24-27 June 2010, Varna, Bulgaria

Page 2

Presentation overview

• Linguistics research and analysis of the Bulgarian folklore

• National research project: “Knowledge Technologies for Creation of Digital Presentation and Significant Repositories of Folklore Heritage” (FolkKnow)

• Functionality of the “Bulgarian folklore digital library” (BFDL) multimedia digital library

• Experimental implementation of a linguistic component in BFDL

Page 3

Linguistics research and analysis of the Bulgarian folklore (1)

The main component of the linguistic research of the Bulgarian folklore is the analysis of its lexical structure.

• How many tokens does it contain, and which ones?

• Do some groups of tokens dominate, and are some groups missing?

• Paradigm relationships in the folklore lexemes

• Context lexemes/Folklore language formulas

• Frequency of the lexemes; the verses/sentences in which they occur; the count, position in the song, etc. of those verses/sentences.

• Word forms

• Regional characteristics of the folklore lexical structure, etc.

Page 4

Linguistics research and analysis of the Bulgarian folklore (2)

Tools that formalize the folklore analysis:

• Frequency dictionary

– A general frequency dictionary: contains all the lexical units in a folklore object repository;

– A regional frequency dictionary: contains all the text units that come from a specific folklore region or a particular settlement;

– A functional frequency dictionary: contains all the text units that have identical functions: descriptions of the rites, various types of songs, narratives, etc.
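
The three kinds of frequency dictionary differ only in which texts they cover, so they can be sketched as one counting routine with an optional filter. The following is a minimal Python sketch, not the BFDL implementation: the record fields (`region`, `function`, `text`), the region labels and the sample texts are assumptions made for illustration, and word forms are counted as they appear, without reducing them to lexemes.

```python
from collections import Counter
import re

# Hypothetical records; field names, region labels and texts are
# illustrative only and are not taken from the BFDL repository.
RECORDS = [
    {"region": "region-A", "function": "song", "text": "Fifty heroes are drinking wine"},
    {"region": "region-A", "function": "rite", "text": "The heroes bring wine and bread"},
    {"region": "region-B", "function": "song", "text": "Marko seated drinking wine"},
]

# Runs of letters in any script (so Cyrillic text works as well).
TOKEN = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Split a text into lower-cased word forms."""
    return TOKEN.findall(text.lower())


def frequency_dictionary(records, **filters):
    """Count word forms over the records that match every given filter."""
    counts = Counter()
    for rec in records:
        if all(rec.get(key) == value for key, value in filters.items()):
            counts.update(tokenize(rec["text"]))
    return counts


general = frequency_dictionary(RECORDS)                      # whole repository
regional = frequency_dictionary(RECORDS, region="region-A")  # one region
functional = frequency_dictionary(RECORDS, function="song")  # one text function
```

Under these assumptions, `general["wine"]` counts the word across the whole repository, while `regional` and `functional` count it only in the matching subset.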

Page 5

Linguistics research and analysis of the Bulgarian folklore (3)

Table: Comparison of the Bulgarian folklore and spoken languages.

Page 6

Linguistics research and analysis of the Bulgarian folklore (4)

• Concordance dictionaries show the lexeme in its context.

• Example for songs: “Fifty heroes are drinking wine”. The underlined lexeme is the one under examination and the lexemes in italics are its context.

• Example for narrative text: in the description of the rituals, one complete sentence (from full stop to full stop) is the context of the observed lexeme.
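
The two context rules above (a verse for songs, a full sentence for narrative text) can be illustrated with a small concordance routine. This is a minimal Python sketch under stated assumptions: one verse per line of text, sentences delimited by full stops only, and matching on exact word forms. The function name `concordance` and the sample texts other than the first verse are invented for illustration.

```python
import re

# Runs of letters in any script (so Cyrillic text works as well).
TOKEN = re.compile(r"[^\W\d_]+")


def concordance(text, lexeme, kind):
    """Return (unit number, unit text) for every unit containing `lexeme`.

    kind="song":      the unit is a verse (assumed: one verse per line).
    kind="narrative": the unit is a sentence (assumed: ends at a full stop).
    """
    if kind == "song":
        units = text.splitlines()
    else:
        units = re.split(r"(?<=\.)\s+", text)
    target = lexeme.lower()
    hits = []
    for number, unit in enumerate(units, start=1):
        unit = unit.strip()
        if unit and target in (t.lower() for t in TOKEN.findall(unit)):
            hits.append((number, unit))
    return hits


# Invented sample texts; only the first verse comes from the slides.
SONG = "Fifty heroes are drinking wine\nThey drink and they sing\nRed wine from the cellar"
NARRATIVE = "The guests gather at dawn. The host pours wine for everyone. Then the songs begin."

song_hits = concordance(SONG, "wine", "song")
narrative_hits = concordance(NARRATIVE, "wine", "narrative")
```

Each hit carries the number of the verse or sentence, which also gives the position of the unit within the song or narrative.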

Page 7

FolkKnow project

• FolkKnow project: “Knowledge Technologies for Creation of Digital Presentation and Significant Repositories of Folklore Heritage” (contract number: IO-03-03/2006)

• Supported by the National Science Fund of the Bulgarian Ministry of Education and Science

• Partners: Institute of Mathematics and Informatics - BAS, Institute for Folklore - BAS, Veliko Tarnovo University

• Module 3: “Development of Digital Libraries and Information Portal with Virtual Exposition - Bulgarian Folklore Heritage”

Page 8

FolkKnow project

Page 9

Bulgarian folklore digital library

Web address: http://213.191.194.27/folklor/

Page 10

Main services (1)

Page 11

• Description of folklore object

Folklore object preview

Page 12

Main services (2)

Page 13

Main services (3)

• Extended search through all the object’s characteristics

Page 14

Main services (4)

• Modules for:

– Managing and monitoring users’ data and activities: registration, logs, data changes, level setting, actions related to object manipulation (search, preview, delete, add, edit, select, etc.), and administrative actions.

– File format conversion

– XML export of the BFDL objects
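
As an illustration of the XML export service, the following minimal Python sketch serializes a list of folklore objects with the standard `xml.etree.ElementTree` module. The slides do not show the BFDL export schema, so the element names (`folkloreObjects`, `folkloreObject`) and the fields (`title`, `type`, `region`, `text`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical field list; the real BFDL metadata schema is not shown here.
FIELDS = ("title", "type", "region", "text")


def export_objects(objects):
    """Serialize folklore objects (dicts) to an XML string."""
    root = ET.Element("folkloreObjects")
    for obj in objects:
        node = ET.SubElement(root, "folkloreObject", id=str(obj["id"]))
        for field in FIELDS:
            child = ET.SubElement(node, field)
            child.text = obj.get(field, "")
    return ET.tostring(root, encoding="unicode")


# Invented sample object, for illustration only.
SAMPLE = [{
    "id": 1,
    "title": "Wine & heroes",
    "type": "song",
    "region": "region-A",
    "text": "Fifty heroes are drinking wine",
}]

xml_text = export_objects(SAMPLE)
```

`ElementTree` escapes reserved characters such as `&` on output, so the exported text can be parsed back without loss.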

Page 15

Linguistic search in text folklore objects

• Search for a word in the different types of dictionaries;

• Search for two or more words, to find verbal formulas in the folklore lexis: “Drinking wine”, “Marko seated”.

• Search for a group of words, to investigate the paradigmatic relations in the folklore lexis (river, stream, brook, rill…)

• Search for the root of a word, to study folklore word formation: “drink” (I am drinking, I have drunk, they have drunk…).
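
The four search types listed above can be sketched over a list of verses or sentences. This is a minimal Python sketch, not the BFDL search module: the function names and sample lines are invented, formulas are matched as consecutive word forms, and the root search is approximated by a plain prefix match, which is an assumption. A prefix match finds “drinking” and “drinks” but not an irregular form such as “drunk”; that would need real morphological analysis.

```python
import re

# Runs of letters in any script (so Cyrillic text works as well).
TOKEN = re.compile(r"[^\W\d_]+")


def tokens(line):
    """Lower-cased word forms of one line."""
    return [t.lower() for t in TOKEN.findall(line)]


def search_word(lines, word):
    """Lines that contain the exact word form."""
    w = word.lower()
    return [ln for ln in lines if w in tokens(ln)]


def search_formula(lines, phrase):
    """Lines that contain the words of `phrase` as consecutive tokens."""
    p = tokens(phrase)
    if not p:
        return []
    found = []
    for ln in lines:
        t = tokens(ln)
        if any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1)):
            found.append(ln)
    return found


def search_group(lines, words):
    """Lines that contain at least one word of a paradigmatic group."""
    group = {w.lower() for w in words}
    return [ln for ln in lines if group & set(tokens(ln))]


def search_root(lines, root):
    """Lines with a word form starting with `root` (crude prefix match)."""
    r = root.lower()
    return [ln for ln in lines if any(t.startswith(r) for t in tokens(ln))]


# Invented sample lines; only the first one comes from the slides.
LINES = [
    "Fifty heroes are drinking wine",
    "Marko seated by the river",
    "A brook runs into the stream",
    "He drinks the cold water",
]

formula_hits = search_formula(LINES, "Drinking wine")
group_hits = search_group(LINES, ["river", "stream", "brook", "rill"])
root_hits = search_root(LINES, "drink")
```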

Page 16

Experimental implementation of a linguistic component in BFDL

Frequency dictionary functional specification

• Linguistic analysis of the available set of test folklore objects;

• Determining how frequently the lexemes occur in text folklore objects;

• Creating lists of the lexemes,

– in frequency order

– in alphabetical order

• Counting the lexical units;

• Counting the repetitions of the lexical units.
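
The outputs named in this specification can be sketched as a single report over a set of texts. This is a minimal Python sketch under stated assumptions: word forms stand in for lexemes (no lemmatization is attempted), "number of the lexical units" is read as the number of distinct units, and the alphabetical list uses plain code-point ordering rather than locale-aware collation. The function name and the second sample text are invented.

```python
from collections import Counter
import re

# Runs of letters in any script (so Cyrillic text works as well).
TOKEN = re.compile(r"[^\W\d_]+")


def frequency_report(texts):
    """Build the lists and counts named in the functional specification."""
    counts = Counter()
    for text in texts:
        counts.update(t.lower() for t in TOKEN.findall(text))
    return {
        # Most frequent first; ties are broken alphabetically.
        "by_frequency": sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])),
        "alphabetical": sorted(counts.items()),
        # Number of distinct lexical units.
        "lexical_units": len(counts),
        # Total number of occurrences, repeats included.
        "occurrences": sum(counts.values()),
        # Units that occur more than once, with their counts.
        "repeated": {unit: n for unit, n in counts.items() if n > 1},
    }


# Invented sample texts; only the first one comes from the slides.
report = frequency_report([
    "Fifty heroes are drinking wine",
    "Marko is drinking wine",
])
```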

Page 17

Experimental implementation of a linguistic component in BFDL

Sequence Diagram

Page 18

Experimental implementation of a linguistic component in BFDL

Analysis class diagram for the BFDL linguistic component

Page 19

Implementation of the Bulgarian folklore digital library

The main tools and languages used:

• Microsoft Windows Server 2008 x64 Standard;

• Web server: Apache HTTP Server v 2.2, PHP v 2.2.9;

• Database management system: MySQL v 5.1 Standard;

• Tools for the additional modules: FFmpeg, VMware, HTML, JavaScript, AJAX;

• Database query language: SPARQL