Galina Bogdanova, Konstantin Rangochev, Desislava Paneva-Marinova, Nikolay Noev Institute of Mathematics and Informatics, Bulgarian Academy of Sciences [email protected], [email protected], [email protected], [email protected] International Conference on Information Research and Applications – i.Tech 2011, Varna, Bulgaria, June, 2011


• Dictionary classifications:

• By format: traditional dictionaries, digital dictionaries (online (web-based) or local (desktop))

• By purpose: descriptive dictionaries, grammar dictionaries, dictionaries of synonyms, valence dictionaries, etymological dictionaries, phraseological dictionaries, frequency dictionaries, concordance dictionaries, specialized dictionaries, terminological dictionaries, etc.

• By number and type of languages: monolingual, bilingual, multilingual dictionaries

The main component of linguistic research on Bulgarian folklore is the analysis of its lexical structure.

• How many tokens does it contain, and which ones?

• Do some groups of tokens dominate, and are some groups missing?

• Paradigmatic relationships among the folklore lexemes

• Context lexemes/Folklore language formulas

• Frequency of the lexemes; the verses/sentences in which they occur; the number of those verses/sentences and their numbering in the song, etc.

• Word forms

• Regional characteristics of the folklore lexical structure, etc.

Tools that formalize the folklore analysis:

• Frequency dictionary

• A general frequency dictionary contains all the lexical units in a folklore object repository;

• A regional frequency dictionary contains all the text units that come from a specific folklore region or a particular settlement;

• A functional frequency dictionary contains all the text units that share the same function: descriptions of rites, various types of songs, narratives, etc.
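The three kinds of frequency dictionary differ only in which texts are counted, which can be sketched as one counting function with optional filters. This is a minimal illustration, not the project's implementation: the sample records, the `region` and `function` field names, and the region labels are invented, and plain word forms stand in for lexemes (a real dictionary would need lemmatization).

```python
from collections import Counter
import re

# Hypothetical sample records; field names and labels are assumptions.
TEXTS = [
    {"region": "Region A", "function": "song", "text": "Fifty heroes are drinking wine"},
    {"region": "Region A", "function": "narrative", "text": "The heroes gather before the feast"},
    {"region": "Region B", "function": "song", "text": "Marko is drinking red wine"},
]

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def frequency_dictionary(records, **filters):
    """Count tokens in the records whose fields match every given filter."""
    counts = Counter()
    for record in records:
        if all(record.get(key) == value for key, value in filters.items()):
            counts.update(tokenize(record["text"]))
    return counts

general = frequency_dictionary(TEXTS)                        # whole repository
regional = frequency_dictionary(TEXTS, region="Region A")    # one region
functional = frequency_dictionary(TEXTS, function="song")    # one function
```

Calling `general.most_common(5)` would then list the five most frequent word forms across the whole sample.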

Table: Comparison of the Bulgarian folklore and spoken languages

• Concordance dictionaries show the lexeme in its context.

• Example for songs: “Fifty heroes are drinking wine”; the underlined lexeme is the one being examined and the lexemes in italics are its context.

• Example for narrative text: in the description of the rituals, one complete sentence (from one full stop to the next) is the context of the observed lexeme.
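The two context rules above (a verse for songs, a full sentence for narrative text) can be sketched as follows. The function name `concordance`, the `unit` parameter, and the sample texts are assumptions made for illustration; matching is on exact word forms only.

```python
import re

def concordance(text, lexeme, unit="verse"):
    """Return every context unit that contains the lexeme.

    unit="verse": each line of the text is one context (songs).
    unit="sentence": each run of text ending in a full stop is one context.
    """
    if unit == "verse":
        units = text.splitlines()
    else:
        # Split after a full stop that is followed by whitespace.
        units = re.split(r"(?<=\.)\s+", text)
    target = lexeme.lower()
    hits = []
    for candidate in units:
        candidate = candidate.strip()
        if candidate and target in re.findall(r"\w+", candidate.lower()):
            hits.append(candidate)
    return hits

# Hypothetical sample texts.
song = "Fifty heroes are drinking wine\nMarko seated at the table"
narrative = "The guests arrive at dawn. They are drinking wine until noon. Then the dance begins."

song_hits = concordance(song, "wine")
narrative_hits = concordance(narrative, "wine", unit="sentence")
```

Here `song_hits` holds the single verse containing "wine", and `narrative_hits` holds the one complete sentence containing it.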

FolkKnow project: “Knowledge Technologies for Creation of Digital Presentation and Significant Repositories of Folklore Heritage” (contract number: IO-03-03/2006)

Supported by the National Science Fund of the Bulgarian Ministry of Education and Science

Partners: Institute of Mathematics and Informatics - BAS, Institute for Folklore - BAS, Veliko Tarnovo University

Module 2: “Development, Annotation and Protection of a Digital Archive ‘Bulgarian Folklore Heritage’”

Module 3: “Development of Digital Libraries and Information Portal with Virtual Exposition - Bulgarian Folklore Heritage”

Web address: http://folknow.cc.bas.bg/

Description of folklore object

Extended search through all the object’s characteristics

Search for a word in the different types of dictionaries.

Search for two or more words, to find verbal formulas in the folklore lexis: “Drinking wine”, “Marko seated”.

Search for a group of words, to investigate the paradigmatic relations in the folklore lexis (river - stream - brook - rill…).

Search for the root of a word, to study folklore word formation: “drink” (I am drinking, I have drunk, they have drunk…).
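The four kinds of search listed above can be sketched over a small list of verses. All function names and the sample verses are hypothetical, and the root search is deliberately naive: it matches by prefix, so it would find "drinking" but not "drunk", which needs real morphological analysis.

```python
import re

# Hypothetical sample verses.
VERSES = [
    "Fifty heroes are drinking wine",
    "Marko seated by the river",
    "The stream runs to the brook",
    "He drinks and she was drinking",
]

def tokens(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def search_word(verses, word):
    """Verses that contain the word as a whole token."""
    return [v for v in verses if word.lower() in tokens(v)]

def search_formula(verses, phrase):
    """Verses that contain the words of the phrase as a contiguous sequence."""
    wanted = tokens(phrase)
    n = len(wanted)
    hits = []
    for verse in verses:
        found = tokens(verse)
        if any(found[i:i + n] == wanted for i in range(len(found) - n + 1)):
            hits.append(verse)
    return hits

def search_group(verses, words):
    """Map each word of a paradigmatic group to the verses that contain it."""
    group = sorted({w.lower() for w in words})
    return {w: [v for v in verses if w in tokens(v)] for w in group}

def search_root(verses, root):
    """Distinct word forms that start with the root (naive prefix match)."""
    root = root.lower()
    return sorted({t for v in verses for t in tokens(v) if t.startswith(root)})
```

For example, `search_group(VERSES, ["river", "stream", "brook", "rill"])` shows at a glance which members of the group occur in the sample and which, like "rill", do not.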

Frequency dictionary functional requirements:

• Linguistic analysis of the available set of test folklore objects;

• Determining how frequently the lexemes occur in text folklore objects;

• Creating lists of the lexemes in frequency order and in alphabetical order;

• Counting the lexical units;

• Counting the repetitions of the lexical units.
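The listing and counting requirements above can be sketched in a few lines. The function name `lexeme_lists` and the result keys are invented, word forms again stand in for lexemes, and reading "number of repetitions" as the total number of occurrences is an assumption.

```python
from collections import Counter
import re

def lexeme_lists(text):
    """Build frequency-ordered and alphabetical lists plus two totals."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return {
        # Most frequent first; ties are broken alphabetically.
        "by_frequency": sorted(counts.items(), key=lambda item: (-item[1], item[0])),
        "alphabetical": sorted(counts.items()),
        # Number of distinct lexical units.
        "distinct_units": len(counts),
        # Assumption: "repetitions" is read as total occurrences.
        "total_occurrences": sum(counts.values()),
    }

result = lexeme_lists("wine wine heroes drinking wine heroes")
```

Both lists come from the same counts; only the sort key differs.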

Sequence Diagram

Analysis class diagram for the BFDL linguistic component

• Frequency dictionary for texts with folklore themes

• Web interface

• Full-text search

• Rules and concepts from the field of Bulgarian folklore that filter the words/phrases

• The words/phrases are representatives of 20 different folklore rubrics (thematic headings)

1. Village information
2. Rituals and feasts
3. Songs
4. Instrumental music (descriptions)
5. Dance folklore (descriptions)
6. Children folklore
7. Prose
8. Proverb, saying
9. National beliefs and knowledge
10. National medicine
11. Magic
12. Fortune-telling
13. Dreams
14. Clothing and adornment
15. Belongings
16. National art
17. Architecture, monuments
18. Food and feeding
19. Festivals, gatherings and reviews
20. Others

Administrative area:

• Adding a text: the application provides a text field for entering the text and a field for uploading the source file.

• Adding a rubric: the interface is kept as simple as possible; the administrator chooses the level at which to add the rubric and gives only its name.

• Changing a rubric.

• Deletion: after an object is chosen for deletion at the chosen level, the system cascade-deletes all lower levels.
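The rubric operations described above (add at a chosen level, change, cascade delete) can be sketched with a small tree. The `Rubric` class, its method names, and the sub-rubric "Ritual songs" are hypothetical; the actual system presumably stores rubrics in a database.

```python
class Rubric:
    """A rubric (thematic heading) that can hold lower-level rubrics."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add_rubric(self, name):
        """Add a rubric one level below this one; only a name is needed."""
        child = Rubric(name)
        self.children.append(child)
        return child

    def rename(self, new_name):
        """Change the rubric's name."""
        self.name = new_name

    def collect_names(self):
        """Names of this rubric and of everything below it."""
        names = [self.name]
        for child in self.children:
            names.extend(child.collect_names())
        return names

    def delete_child(self, name):
        """Remove a child rubric together with all its lower levels.

        Returns the names of everything that was removed.
        """
        for child in self.children:
            if child.name == name:
                self.children.remove(child)
                return child.collect_names()
        raise KeyError(name)

root = Rubric("Bulgarian folklore")
songs = root.add_rubric("Songs")
songs.add_rubric("Ritual songs")   # hypothetical sub-rubric
root.add_rubric("Magic")
removed = root.delete_child("Songs")
```

Dropping the reference to the child is enough here because the whole subtree hangs off it; a relational store would need an explicit cascading delete instead.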

User part: consists of a search form that allows the user to select the desired item, the level and the corresponding text. The results appear on screen, showing the rubric, the number of files and how the words are distributed.

Full-text search: the system performs a full-text search over the text corpora, with text filtering by rubrics, indexes or metadata.
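A full-text search that reports, per rubric, the number of files and the word distribution could be sketched with an inverted index. The file names, record fields, and function names are invented for illustration, and only filtering by rubric is shown.

```python
from collections import Counter, defaultdict
import re

# Hypothetical sample files; rubric names are taken from the rubric list.
FILES = [
    {"name": "song_01.txt", "rubric": "Songs", "text": "Fifty heroes are drinking wine"},
    {"name": "song_02.txt", "rubric": "Songs", "text": "Marko seated, drinking wine, wine"},
    {"name": "rite_01.txt", "rubric": "Rituals and feasts", "text": "The wine is blessed"},
]

def build_index(files):
    """Inverted index: word -> {file name: number of occurrences}."""
    index = defaultdict(Counter)
    for record in files:
        for word in re.findall(r"\w+", record["text"].lower()):
            index[word][record["name"]] += 1
    return index

def search(files, index, word, rubric=None):
    """Per rubric: how many files contain the word and how often it occurs."""
    rubric_of = {record["name"]: record["rubric"] for record in files}
    report = {}
    for name, occurrences in index.get(word.lower(), {}).items():
        file_rubric = rubric_of[name]
        if rubric is not None and file_rubric != rubric:
            continue  # filtered out by the rubric restriction
        entry = report.setdefault(file_rubric, {"files": 0, "occurrences": 0})
        entry["files"] += 1
        entry["occurrences"] += occurrences
    return report

index = build_index(FILES)
everywhere = search(FILES, index, "wine")
songs_only = search(FILES, index, "wine", rubric="Songs")
```

Building the index once and querying it repeatedly avoids rescanning every text for each search.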

Bulgarian Folklore Digital Library

http://folknow.cc.bas.bg

For contacts:

[email protected]

[email protected]