interconnecting lexicographic resources. in search for a model dan cristea “alexandru ioan cuza”...

58
Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the Romanian Academy [email protected]

Upload: josiah-smale

Post on 01-Apr-2015

218 views

Category:

Documents


6 download

TRANSCRIPT

Page 1: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Interconnecting lexicographic resources. In search for a model

Dan Cristea“Alexandru Ioan Cuza” University of Iași

Institute of Computer Science of the Romanian Academy

[email protected]

Page 2: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Topics

• Why would one want to connect linguistic resources?

• Parameterising the needs• Standardisation helps interconnecting• A bunch of notorious resources• How would this work?• Final remarks

COST-ENeL, Bled, 29-30 September 2014

Page 3: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Why would one want to connect linguistic resources?

• Use case 1: 100 Romanian dictionaries aligned– CLRE. Essential Romanian Lexicographical Corpus.

100 dictionaries aligned at entry and, partially, sense levels (2010 –2013, at Institute A.Philippide of the Romanian Academy, in Iași• dictionaries’ list at:

http://85.122.23.90/resurse/Lista-dictionarelor.doc• written in 3 types of alphabets: Cyrillic, transition and Latin• large diversity of formatting styles

Bucharest, 14-15 December 2012 COST-ENeL, Bled, 29-30 September 2014

Page 4: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

CLREEssential Romanian Lexicographical Corpus

Bucharest, 14-15 December 2012 COST-ENeL, Bled, 29-30 September 2014

Page 5: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

ISER

The CLRE project

Bucharest, 14-15 December 2012 COST-ENeL, Bled, 29-30 September 2014

Page 6: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Petri

The CLRE project

Bucharest, 14-15 December 2012 COST-ENeL, Bled, 29-30 September 2014

Page 7: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

DN II

The CLRE project

Bucharest, 14-15 December 2012 COST-ENeL, Bled, 29-30 September 2014

Page 8: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Bălăşescu

The CLRE project

Bucharest, 14-15 December 2012 COST-ENeL, Bled, 29-30 September 2014

Page 9: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Lexicon Militar

The CLRE project

Bucharest, 14-15 December 2012 COST-ENeL, Bled, 29-30 September 2014

Page 10: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Dicționar de informatică

The CLRE project

Bucharest, 14-15 December 2012 COST-ENeL, Bled, 29-30 September 2014

Page 11: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

• Scanning• OCR – Abby Fine Reader 9• Parsing entries => XML • Manual verification• Indexing and alignment

Processing in CLRE

Iași, 25-26 September 2013 COST-ENeL, Bled, 29-30 September 2014

Page 12: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

CLRE – manual verification

Iași, 25-26 September 2013 COST-ENeL, Bled, 29-30 September 2014

Page 13: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Why would one want to connect linguistic resources?

• Use case 2: align WN with an explanatory dictionary– a WN synset:

pos (def, ex, w1s1 …wk

sk … wnsn)

– an explanatory dictionary entry: wk, pos, <wk

s1,def1,ex1>… <wksk,defk,exk>… <wk

sm,defm,exm>

COST-ENeL, Bled, 29-30 September 2014

Page 14: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Why would one want to connect linguistic resources?

• Use case 2: align WN with an explanatory dictionary– synsets of a word wk, pos:

(def1, ex1, …wks1 …)

… (defk, exk, …wk

sk …)

(defm, exm, …wksm …)

– the explanatory dictionary entry of the word wk, pos: wk, pos, <wk

s1,def1,ex1>… <wksk,defk,exk>… <wk

sm,defm,exm>COST-ENeL, Bled, 29-30 September 2014

Page 15: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Why would one want to connect linguistic resources?

• Use case 2: align WN with an explanatory dictionary– a WN synset:

pos (def, ex, w1s1 …wk

sk … wnsn)

– explanatory dictionary entries: w1, pos, … <w1

s1, def, ex>…

wk, pos, … <wksk, def, ex>…

wn, pos, … <wnsn, def, ex>…

COST-ENeL, Bled, 29-30 September 2014

Page 16: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Why would one want to connect linguistic resources?

• Use case 3: the TOT problem or the forgotten word

Mic

hael

Zoc

k

COST-ENeL, Bled, 29-30 September 2014

Page 17: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

The first thought: standardisation

• Lexical Markup Framework (LMF)– What is it?

• a common model for creation and use of lexical resources

– With what goal?• to manage the exchange of data between and among

these resources• to enable the merging of a large number of individual

electronic resources to form extensive global electronic resources

COST-ENeL, Bled, 29-30 September 2014

Page 18: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Near-standard• Text Encoding Initiative (TEI)

– What is it?• an inventory of the features most often deployed for

computer-based text processing • recommendations about suitable ways of representing

these features

– With what goal?• to facilitate processing by computer programs• to facilitate the loss-free interchange of data amongst

individuals and research groups using different programs, computer systems, or application software

COST-ENeL, Bled, 29-30 September 2014

Page 19: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Standardisation• Text Encoding Initiative (TEI)

– Example of a dictionary entry serialisation (from TEI Guidelines)

<entry> <form> <orth>disproof</orth> <pron>dIs"pru:f</pron> </form> <gramGrp> <pos>n</pos> </gramGrp> <sense n="1"> <def>facts that disprove something.</def> </sense> <sense n="2"> <def>the act of disproving.</def> </sense> </entry>

disproof (dIs"pru:f) n. 1. facts that disprove something. 2. the act of disproving. CED

COST-ENeL, Bled, 29-30 September 2014

Page 20: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

LMF and TEI content and… discontent

• The TEI format may be used as an interchange format, permitting sharing of resources even when their local encoding schemes differ.

• Both LMF and TEI model lexical material at a deep representational detail…

COST-ENeL, Bled, 29-30 September 2014

Page 21: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

LMF and TEI content and… discontent

• TEI intention: – guidance for individual or local practice in text

creation and data capture– support of data interchange – support of application-independent local processing

• Opening good possibilities of querying• But how would function the interconnection?...

COST-ENeL, Bled, 29-30 September 2014

Page 22: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Parameterising the needs

• If I want to connect two resources, simply merge the contents

• Then be able to interrogate the merged resource by taking advantage of peculiarities in each resource

COST-ENeL, Bled, 29-30 September 2014

Page 23: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Parameterising the needs

• Able to represent variations in word forms, alternate orthography, diachronic morphology

• Easy navigation by applying various filtering criteria

COST-ENeL, Bled, 29-30 September 2014

Page 24: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Parameterising the needs

• Very often lexicographic data is hierarchical– for instance, a sense of a dictionary entry contains

a definition, examples, but also sub-senses• Organise even recursive searches

– give me the definition neighbouring sphere of depth 2 of the word captain (take all senses of the entry captain and form the list of words in the corresponding definitions, then for each of them take all their senses and collect again words in their definitions)

COST-ENeL, Bled, 29-30 September 2014

Page 25: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

The idea

• Representing lexical information as feature structures centred on word’s lemmas

– disproof (dIs"pru:f) n. 1. facts that disprove something. 2. the act of disproving. CED

[lemma=disproof, entry=[pron=dIs"pru:f, pos=n, sense=[n=1, def=facts that disprove something], sense=[n=2, def=the act of disproving], res=CED]]

COST-ENeL, Bled, 29-30 September 2014

Page 26: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Representing lexical entries as feature structures

lemma=disproof

entry=

pron=dIs"pru:fpos=n

sense=

sense=

res=CED

n=1def=facts that disprove something

n=2def=the act of disproving

Cam

brid

ge E

nglis

h D

ictio

nary

COST-ENeL, Bled, 29-30 September 2014

Page 27: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Representing lexical entries as feature structures

• Graph representation

lemma

disproof

entry

pron

dIs"pru:f

npos

1sense

sense

n

facts that disproves smthdef2

n

the act of disprovingdef

res

CED

COST-ENeL, Bled, 29-30 September 2014

Page 28: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Representing lexical entries as feature structures

lemma=disproof

entry=

pron=dIs"pru:fpos=n

sense=

sense=

res=MWCD

n=2def=evidence that disproves

n=1def=the action of disproving

Mer

riam

-Web

ster

’s C

olle

giat

e D

ictio

nary

COST-ENeL, Bled, 29-30 September 2014

Page 29: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Representing lexical entries as feature structures

• Graph representation

lemma

disproof

entry

pron

dIs"pru:f

npos

1sense

sense

n

the action of disprovingdef2

n

evidence that disprovesdef

res

MWCD

COST-ENeL, Bled, 29-30 September 2014

Page 30: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

How could lexical entries be merged?• Entries of the same word from different

dictionaries

lemma

disproof

entry

pron

dIs"pru:f

npos

1sense

sense

n

facts that…def2

n

the act of…def

res

CED

lemma

disproof

entry

pron

dIs"pru:f

npos

1sense

sense

n

the action of…def2

n

evidence that…def

res

MWCD

COST-ENeL, Bled, 29-30 September 2014

Page 31: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Merging lexical entries

• Distinct parts

lemma

disproof

entry

pron

dIs"pru:f

npos

1sense

sense

n

facts that…def2

n

the act of…def

res

CED

lemma

disproof

entry

pron

dIs"pru:f

npos

1sense

sense

n

the action of…def2

n

evidence that…def

res

MWCD

COST-ENeL, Bled, 29-30 September 2014

Page 32: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Merging lexical entries

lemma

disproof

entry

pron

dIs"pru:f

npos

1sense

sense

n

facts that…def2

n

the act of…def

res

CED

1sense

sense

n

the action of…def2

n

evidence that…def

res

MWCD

X (new)

X (new)

COST-ENeL, Bled, 29-30 September 2014

Page 33: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Representation of the merged feature structure

lemma=disproof

entry=

pron=dIs"pru:fpos=n

X=

X=

n=1def=facts that disprove something

n=2def=the act of disproving

sense=

sense=

res=CED

n=1def=the action of disproving

n=2def=evidence that disproves

sense=

sense=

res=MWCD

COST-ENeL, Bled, 29-30 September 2014

Page 34: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

The WordNet search for disproof

COST-ENeL, Bled, 29-30 September 2014

Page 35: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Feature structures representation for the WN synsets of disproof

lemma

disproof

synsets

pos

n(*, falsification, refutation)

synset

(falsification, falsifying, *, refutation, refutal)synsetlex

lexgloss any evidence that helps to establish

the falsity of something

gloss (the act of determining that something is false

COST-ENeL, Bled, 29-30 September 2014

Page 36: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

The WordNet search for discount

COST-ENeL, Bled, 29-30 September 2014

Page 37: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Representing WordNet synsets

lemma

discount

synsets

pos

n(*, price reduction, deduction)

synset(discount rate, *, bank discount)

synset

synset

lex

lex

synset

gloss the act of reducing the selling price of merchandise

gloss interest on an annual basis deducted in advance on a loan

synsetspos

v

synset

synset

(dismiss, disregard, brush aside, brush off, *, push aside, ignore)lex

gloss bar from attention or consideration

……

ex“She dismissed his advances"

COST-ENeL, Bled, 29-30 September 2014

Page 38: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

How could dictionary entries be merged with

WN synsets?

lemma

disproof

synsets

pos

n(*, falsification, refutation)

synset

(falsification, falsifying, *, refutation, refutal)synsetlex

lexgloss any evidence that helps to establish

the falsity of something

gloss (the act of determining that something is false

lemma

disproof

entry

pron

dIs"pru:f

npos

1sense

sense

n

facts that disproves smthdef2

n

the act of disprovingdef

res

CED

COST-ENeL, Bled, 29-30 September 2014

Page 39: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Merging a dictionary entry with a WN entry

lemma

disproof

entry

pron

dIs"pru:f

npos

1sense

sense

n

facts that disproves smthdef2

n

the act of disprovingdef

res

CED

lemma

disproof

synsets

pos

n(*, falsification, refutation)

synset

(falsification, falsifying, *, refutation, refutal)synsetlex

lexgloss any evidence that helps to establish

the falsity of something

gloss (the act of determining that something is false

COST-ENeL, Bled, 29-30 September 2014

Page 40: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

synsets

pos

n(*, falsification, refutation)

synset

(falsification, falsifying, *, refutation, refutal)synsetlex

lexgloss any evidence that helps to establish

the falsity of something

gloss (the act of determining that something is false

lemma

disproofentry

pron

dIs"pru:f

npos

1sense

sense

n

facts that disproves smthdef2

n

the act of disprovingdef

res

CED

Merging a dictionary entry with a WN entry

COST-ENeL, Bled, 29-30 September 2014

Page 41: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Going one step further

• Feature structures are hierarchical data • Codd: hierarchical data can be represented as

relational tables – Codd, E.F. (June 1970). "A Relational Model of Data

for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387.

COST-ENeL, Bled, 29-30 September 2014

Page 42: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Representing feature structures as relational tables

lemma

w

entry

pron

npos 1

sense

sense

n …def

res

cit …orth

auth…

……

from http://en.wikipedia.org/wiki/Relational_database

yr

COST-ENeL, Bled, 29-30 September 2014

Page 43: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Representing feature structures as relational tables

lemma

w

entry

pron

npos 1

sense

sense

n …def

res

W O R D

cit …orth

auth…

……

id lemma entry

yr

from http://en.wikipedia.org/wiki/Relational_database

COST-ENeL, Bled, 29-30 September 2014

Page 44: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Representing feature structures as relational tables

lemma

w

entry

pron

npos 1

sense

sense

n …def

res

W O R D

E N T RY entry pron pos sense res

cit …orth

auth…

……

id lemma entry

yr

from http://en.wikipedia.org/wiki/Relational_database

COST-ENeL, Bled, 29-30 September 2014

Page 45: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Representing feature structures as relational tables

lemma

w

entry

pron

npos 1

sense

sense

n …def

res

W O R D

E N T RY entry pron pos sense res

cit …orth

auth…

……

S E N S E sense n def cit

id lemma entry

yr

from http://en.wikipedia.org/wiki/Relational_database

COST-ENeL, Bled, 29-30 September 2014

Page 46: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Representing feature structures as relational tables

lemma

w

entry

pron

npos 1

sense

sense

n …def

res

W O R D

E N T RY entry pron pos sense res

cit …orth

auth…

……

C I T

S E N S E sense n def cit

cit orth auth yr

id lemma entry

yr

from http://en.wikipedia.org/wiki/Relational_database

COST-ENeL, Bled, 29-30 September 2014

Page 47: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Relational operators

• Projection: πa1,…an(R) => a relation containing only values of attributes a1,… an from the relation R

• Selection: σφ(R), with ϕ is logical condition => only tuples verifying the condition ϕ are retained from the relation (or the set) R

• Join: R✜S => the set of all attributes in R and S that are equal on their common attributes

• Union: RŮS => a table representing the union of the two relations

COST-ENeL, Bled, 29-30 September 2014

Page 48: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Interrogating a dictionary

• Citations before 1850 of the entry “symphony”.

lemma

w

entry

pron

npos 1

sense

sense

n …def

res

cit …orth

yrauth

……

WORD

ENTRY

SENSE

CIT

πorth(σlemma=“symphony” & yr<1850 (WORD✜ENTRY✜SENSE✜CIT))

COST-ENeL, Bled, 29-30 September 2014

Page 49: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Interrogating a combination between a dictionary and a wordnet

• All synonyms of nouns belonging to citations dated before 1850, sorted lexicographically.

lemma

w

entry

pron

npos 1

sense

sense

n …def

res

cit …orth

yrauth

……

WORD

ENTRY

SENSE

CIT

COST-ENeL, Bled, 29-30 September 2014

Page 50: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Interrogating a combination between a dictionary and a wordnet

• All synonyms of nouns belonging to citations dated before 1850, sorted lexicographically.

lemma

w

entry

pron

npos 1

sense

sense

n …def

res

cit …orth

yrauth

……

WORD

ENTRY

SENSE

CITThe citations of the title word w:πorth(σlemma=w & yr<1850 (WORD✜ENTRY✜SENSE✜CIT))

COST-ENeL, Bled, 29-30 September 2014

Page 51: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Interrogating a combination between a dictionary and a wordnet

• All synonyms of nouns belonging to citations dated before 1850, sorted lexicographically.

lemma

w

entry

pron

npos 1

sense

sense

n …def

res

cit …orth

yrauth

……

WORD

ENTRY

SENSE

CITUnify and lemmatise words belonging to the citations:lem(U(πorth(σlemma=w & yr<1850 (WORD✜ENTRY✜SENSE✜CIT))))

COST-ENeL, Bled, 29-30 September 2014

Page 52: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Interrogating a combination between a dictionary and a wordnet

• All synonyms of nouns belonging to citations dated before 1850, sorted lexicographically.

πlex(σpos=n & lemma lem(U(πorth(σlemma=w & yr<1850 (WORD✜ENTRY✜SENSE✜CIT)))

(WORD✜SYNS✜SYN))

lemma

w

entry

pron

npos 1

sense

sense

n …def

res

cit …orth

yrauth

……

……

WORDENTRY

SENSE

CIT

synsets posn

synset

lex

gloss…

SYNS

SYN

COST-ENeL, Bled, 29-30 September 2014

Page 53: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Conclusions

• I did not propose any model, I simply made some observations (nothing is really new)

• Linking lexicographic resources: – one resource => TEI representation => as feature

structures => hierarchical graphs => relational tables– more resources => unifications of tables– use query and relational operators for interrogation

COST-ENeL, Bled, 29-30 September 2014

Page 54: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Discussion

• Only a sketch – a lot of details should still be filled in– the good news: XML structures (the native

language of TEI) accept direct representations as database records: XSLT => opening direct access to a complex querying language: XQuery => mimicking the relational operators and adding more facilities

COST-ENeL, Bled, 29-30 September 2014

Page 55: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Discussion

• Another good news– representing variable depth structures– recursive hierarchies: Kamfonas

• Fixed depth dimensions are simpler to implement, maintain and query… Hierarchies that have variable depth or an uncertain number of levels… can often benefit if implemented as recursive hierarchies. http://www.kamfonas.com/id3.html

COST-ENeL, Bled, 29-30 September 2014

Page 56: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Discussion

• Even more good news (hopes)– interrogations can be formulated in natural

language => an interpreter translates them in the query language of a DBMS system

– as such, a handy tool at the benefit of lexicographers

COST-ENeL, Bled, 29-30 September 2014

Page 57: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Acknowledgements

• Work partially supported by the project The Computational Representative Corpus of Contemporary Romanian Language, a project of the Romanian Academy and partially by the COST-ENeL project

• I thank Isabelle Tamba and Mădălin Pătrașcu for the slides describing the CLRE project

COST-ENeL, Bled, 29-30 September 2014

Page 58: Interconnecting lexicographic resources. In search for a model Dan Cristea “Alexandru Ioan Cuza” University of Iași Institute of Computer Science of the

Thank you!

COST-ENeL, Bled, 29-30 September 2014