The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004


Page 1: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

The Italian CLIPS Lexicon and its reuse in a bilingual environment

Nilda Ruimy

ILC CNR, Pisa

September 2004

Page 2

Outline

Part I
• The origin of the CLIPS lexicon
• The PAROLE-SIMPLE model
• General encoding criteria
• Phonological and morphological levels
• Syntactic level: information content
• The semantic lexicon
• Theoretical background: GL theory
• The original Qualia Structure
• The SIMPLE ontology
• The Extended Qualia Structure
• Semantic level: information content
• Predicative structure
• Syntax-semantics mapping
• Encoding methodology
• CLIPS essential features & applications

Part II
• Creating a bilingual resource
• The two scenarios
• Scenario I
• Drawbacks
• Scenario II
• The cognate approach
• The sense indicator approach
• Results
• Concluding remarks

Page 3

CLIPS: a bit of genealogy

[Diagram: from the PAROLE and SIMPLE projects to the CLIPS lexicon]
• PAROLE European project: 12 harmonized PAROLE lexicons
• SIMPLE European project (Semantic Information for Multifunctional Plurilingual Lexica): 12 harmonized SIMPLE lexicons
• Core lexicons: morphology: 20,000 entries; syntax: 20,000 lemmas; semantics: 10,000 senses
• Italy: enlargement of these core lexicons in a national follow-up project
• Further sources shown: PAROLE Corpus lexical units; DMI phonology
• CLIPS lexicon (XML format): phonology: 374,000 entries; morphology: 49,000 entries; syntax: 55,000 lemmas; semantics: 55,000 senses

Page 4

The PAROLE-SIMPLE Model

GENELEX-PAROLE representational model

PAROLE-SIMPLE theoretical model
• EAGLES recommendations
• Extended GENELEX model
• Results from EU projects:
  • EUROWORDNET
  • ACQUILEX
  • DELIS
  • GENERATIVE LEXICON

Page 5

The Linguistic Model

PAROLE-SIMPLE lexicons:
• common EAGLES-conformant model
• common representation language
• common building methodology

• Innovative
• Tackles misrepresented areas of knowledge
• Extendible and multifunctional
• Multilingual perspective

REUSABILITY

Page 6

Representational Model (1)

Entity/Relationship Model, implemented through a DTD that defines:
• the structure of every descriptive element
• the relationships holding among the various descriptive elements, as well as their co-occurrence restrictions

Non-redundant data representation

Page 7

Representational Model (2)

• specific representational structures for every level of linguistic description;
• links among the different levels, although the information encoded at each level is perfectly autonomous

Page 8

General encoding criteria

• Reduce the lexicographer’s margin of subjectivity by setting precise guidelines for the treatment of particular phenomena
• Base the encoding on corpus data as much as possible
• Find a balance between encoding attested structures / senses only and an exhaustive encoding that also includes rare structures / senses

Page 9

Splitting entries

• Avoid both redundancy and over-powerful gatherings
• Use criteria strictly relevant to the description level, e.g. at the syntactic level, syntax-driven criteria:
  • arity
  • syntactic function: disporre i libri negli scaffali (‘to arrange the books on the shelves’) / disporre di due auto (‘to have two cars at one’s disposal’)
  • complement optionality: attraversare (la strada) (‘to cross (the road)’, lit. sense) / attraversare un momento difficile (‘to go through a difficult time’)
  • different (non-alternative) realization of complements: Leo evita Lia (‘Leo avoids Lia’) / L. ha evitato di guardare L., che L. si ferisse (‘L. avoided looking at L., avoided that L. got hurt’)
• Encode, at the semantic level, the most common senses distinguished in average-size dictionaries (ca. 150,000 words)

Page 10

The four-level architecture: the first three levels

• PhonologicalUnit: stress position, vowel openness, cons. pronunciation
• Corresp. PhnU-MrphU
• MorphologicalUnit: PoS & subcat., inflectional paradigm
• Corresp. MrphU-SynU
• SyntacticUnit:
  • syntactic structure 1: a. head properties; b. subcat. frame (position, synt. restr.)
  • syntactic structure 2: a. head properties; b. subcat. frame (position, synt. restr.)
  • Frameset

Page 11

Syntactic entry information content

Aumentare (‘to increase’): Il governo ha aumentato i prezzi del 3%. I prezzi sono aumentati del 3% (‘The government has increased the prices by 3%. Prices have increased by 3%’)

Complex synt. entry:
• Specific properties of the entry in the syntactic context described: main verb; aux.: avere
• Subcategorization frame
  • MAIN syntactic frame:
    • P0: optional, subject, NP
    • P1: oblig., object, NP
    • P2: optional, adverbial, di_PP
  • RELATED syntactic frame:
    • P0: optional, subject, NP
    • P1: optional, adverbial, di_PP
• Link between syntactic structures: FRAMESET relating systematic frame alternations (decausativization, locative alternation, reciprocal altern., symmetrical altern.)
  • relates main syntactic frame to alternating one
  • relates respective frame positions
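The frameset for aumentare can be pictured as a small data structure. The sketch below is only an illustration in Python, not the CLIPS XML encoding: the class names `Position`, `Frame` and `Frameset` are hypothetical, and the position links (main P1 to related P0, main P2 to related P1) are an assumption read off the example sentences, where the object of the transitive use becomes the subject of the intransitive one.

```python
from dataclasses import dataclass

@dataclass
class Position:
    name: str         # P0, P1, ...
    function: str     # subject, object, adverbial
    realization: str  # NP, di_PP
    optional: bool

@dataclass
class Frame:
    label: str
    positions: list

@dataclass
class Frameset:
    alternation: str
    main: Frame
    related: Frame
    position_links: dict  # main position name -> related position name (assumed)

main = Frame("MAIN", [
    Position("P0", "subject", "NP", True),
    Position("P1", "object", "NP", False),
    Position("P2", "adverbial", "di_PP", True),
])
related = Frame("RELATED", [
    Position("P0", "subject", "NP", True),
    Position("P1", "adverbial", "di_PP", True),
])
aumentare = Frameset("decausativization", main, related, {"P1": "P0", "P2": "P1"})

def linked_position(frameset, main_name):
    """Return the related-frame position linked to a main-frame position, or None."""
    target = frameset.position_links.get(main_name)
    return next((p for p in frameset.related.positions if p.name == target), None)
```

With these assumed links, `linked_position(aumentare, "P1")` returns the subject position of the related frame, while the main-frame subject has no counterpart.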

Page 12

The semantic lexicon

Theoretical linguistic background: extended version of Pustejovsky’s Generative Lexicon (GL) theory

Page 13

Generative Lexicon theory

Lexical meanings of various levels of complexity:
• simplest ones: definable by a taxonomic relation
• more complex ones: hypernymic relation not sufficient
  • bambino: HUMAN, age (childhood), sex (male)
  • dottore: HUMAN, age (adult), sex (male), function
  • giornale (polysemy): 1. printed paper, 2. location, 3. institution, 4. human group

Qualia Structure makes it possible to:
• coherently model the pluridimensionality of meaning
• uniformly represent semantic units of different degrees of complexity
• capture the relationships holding between semantic units

Page 14

The Original Qualia Structure

Consists of four roles:
• formal role: distinguishes the denoted entity from others (what is X?)
• constitutive role: expresses its components (what is X made of?)
• agentive role: expresses its coming about (how does X come about?)
• telic role: specifies its function (what is X’s function?)

Page 15

The SIMPLE ontology (1)

Lexicon structured on the basis of a type ontology:
• Core Ontology: top-level, general types; large consensus; provide essential information; mappable on the EuroWordNet ontology
• Recommended Ontology: hierarchically lower and more specific types; provide finer-grained information
• Possible creation of language- / application-specific types

Page 16

The SIMPLE ontology (2)

157 language-independent semantic types

Simple types (one-dimensional): can be fully characterized in terms of a hypernymic relation, e.g.

Earth_Animal isa Animal isa Living_entity isa Concrete_entity isa Entity

Page 17

The SIMPLE ontology (3)

Unified types (multi-dimensional): can only be defined through the combination of:
• the relation to their supertype
• the reference to orthogonal dimensions of meaning

e.g. Institution: supertype Abstract_Entity (isa Entity); orthogonal dimensions: Agentive, Telic
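The difference between simple and unified types can be made concrete with a short sketch. It assumes the supertype links shown in the two ontology examples (Earth_Animal and Institution); the dictionary layout and the function names are illustrative only and are not part of the SIMPLE specification.

```python
# Supertype links taken from the Earth_Animal and Institution examples.
SUPERTYPE = {
    "Earth_Animal": "Animal",
    "Animal": "Living_entity",
    "Living_entity": "Concrete_entity",
    "Concrete_entity": "Entity",
    "Institution": "Abstract_Entity",
    "Abstract_Entity": "Entity",
}

# Orthogonal dimensions of meaning; only unified types have any.
DIMENSIONS = {"Institution": ("Agentive", "Telic")}

def supertype_chain(sem_type):
    """Follow the hypernymic relation from a type up to the top node."""
    chain = [sem_type]
    while chain[-1] in SUPERTYPE:
        chain.append(SUPERTYPE[chain[-1]])
    return chain

def is_unified(sem_type):
    """A type counts as unified when it also refers to orthogonal dimensions."""
    return bool(DIMENSIONS.get(sem_type))
```

A simple type such as Earth_Animal is fully described by `supertype_chain`, whereas Institution additionally needs its entry in `DIMENSIONS`.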

Page 18

The SIMPLE ontology (4)

SIMPLE Ontology: multidimensional type hierarchy based on both hierarchical and non-hierarchical conceptual relations

Page 19

Semantic types

In the SIMPLE ontology, types are not mere labels but the repository of a specific set of structured semantic information

Page 20

Some semantic types for abstract & concrete entities

[Diagram: type hierarchy under TOP; node labels only]
• Top nodes: TOP, ENTITY, AGENTIVE, CONSTITUTIVE, TELIC
• Concrete_entity, Abstract_entity, Property, Representation, Event, ...
• Furniture, Instrument, Clothing, Artwork, Sign, Language, Information, .....
• Living_entity, Human, Animal, Vegetal_entity, Artifact, Substance, Location, Food, Material, Quality, Quantity, Physical_prop, Psychol_prop, .....
• Convention, Cognitive_fact, .....
• Artifactual_material, Artifact

Page 21

Some semantic types for events

[Diagram: type hierarchy under EVENT; node labels only]
• Upper nodes: EVENT, Phenomenon, Change, Psych_event, Aspectual, State, Act
• Further types: Cause_change, Relational_state, Non_relational_act, Relational_act, Move, Cause_act, Relational_change, Change_possession, Change_location, Acquire_knowledge, Natural_transition, Creation, Speech_act, ...

Page 22

Some semantic types for adjectives

[Diagram: type hierarchy under TOP]
• Intensional: Temporal, Modal, Emotive, Manner, Object_related, Emphasizer
• Extensional: Psychological_prop, Social_prop, Physical_prop, Temporal_prop, Intensifying_prop, Relational_prop

Page 23

Descriptive elements

• Features: PlusHuman, PlusCollective, ...
• Relations between semantic units: R (<SemU1>, <SemU2>)

Page 24

Extended Qualia Structure: extended roles

Formal
• isa, antonym_comp, antonym_grad, mult_opposition

Agentive
• AGENTIVE: result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source
• ARTIFACTUAL AGENTIVE: created_by, derived_from

Telic
• TELIC: indirect_telic, purpose
• DIRECT TELIC: object_of_activity
• ACTIVITY: is_the_activity_of, is_the_ability_of, is_the_habit_of
• INSTRUMENTAL: used_for, used_as, used_by, used_against

Constitutive
• CONSTITUTIVE: made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses
• PROPERTY: causes, concerns, affects, constitutive_activity, contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, feeling
• LOCATION: is_in, lives_in, typical_location

Page 25

Extended Qualia Structure: examples

[Same relation diagram as on the previous slide, annotated with example pairs of semantic units]
• proiettile, colpire (projectile, hit)
• antitarmico, tarma (moth balls, moth)
• bisturi, chirurgo (lancet, surgeon)
• metano, combustibile (methane, fuel)
• casa, costruire (house, build)
• mohair, capra (mohair, goat)
• manubrio, bicicletta (handlebar, bicycle)
• abbaiare, cane (bark, dog)
• arancio, arancia (orange tree, orange)
• medico, curare (doctor, cure)
• fumatore, fumare (smoker, smoke)
• disgusto, provare (disgust, feel)
• senato, senatore (senate, senator)
• pane, farina (bread, flour)

Page 26

Orthogonal dimensions of meaning

instrument:
• Formal role: is_a
• Agentive role: created_by
• Telic role: used_for
• Constitutive role: is_made_of

Page 27

Orthogonal dimensions of meaning

violin:
• Formal role: is_a musical_instrument
• Agentive role: created_by make
• Telic role: used_for playing
• Constitutive role: has_as_part strings; is_made_of wood
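The violin example can be written down as a list of relation instances grouped by qualia role. This is a minimal sketch: the relation names and targets are those on the slide, while the mapping table `ROLE_OF_RELATION` and the function `qualia_structure` are hypothetical helpers introduced for illustration.

```python
# Qualia role of each relation used in the violin example.
ROLE_OF_RELATION = {
    "is_a": "formal",
    "created_by": "agentive",
    "used_for": "telic",
    "has_as_part": "constitutive",
    "is_made_of": "constitutive",
}

# Relation instances for "violin", as (relation, target) pairs.
violin = [
    ("is_a", "musical_instrument"),
    ("used_for", "playing"),
    ("created_by", "make"),
    ("has_as_part", "strings"),
    ("is_made_of", "wood"),
]

def qualia_structure(relations):
    """Group (relation, target) pairs under the four qualia roles."""
    structure = {"formal": [], "constitutive": [], "agentive": [], "telic": []}
    for relation, target in relations:
        structure[ROLE_OF_RELATION[relation]].append((relation, target))
    return structure
```

Each role acts as one dimension of the meaning: a word can carry several relations in one dimension (here two constitutive ones) and none in another.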

Page 28

Meaning dimensions expressed by Qualia relations

botte (‘barrel’), traditional dictionary definition: recipiente di legno fatto di doghe arcuate tenute unite da cerchi di ferro, che serve per la conservazione e il trasporto di liquidi, specialmente vino (‘wooden container made of curved staves held together by iron hoops, used for storing and transporting liquids, especially wine’)

• recipiente: Formal: isa
• di legno: Constitutive: made_of
• fatto: Agentive: created_by
• di doghe arcuate tenute unite da cerchi di ferro: Constitutive: made_of
• che serve per la conservazione e il trasporto: Telic: used_for
• di liquidi, specialmente vino: Constitutive: contains

Page 29

Qualia informative power (1)

Within a semantic type population, further clusterings can be made through the is-a relation:

[Diagram: INSTRUMENT under ARTIFACT under CONCRETE_ENTITY, with lexical items linked by is-a relations]
• manufatto
• arnese, attrezzo, utensile, strumento, macchina, apparecchio, dispositivo
• giogo, spalliera, piano, graticola, aratro, citofono, laser

Page 30

Qualia informative power (2)

[Diagram: clusters obtained by combining is-a and used for relations]
• INSTRUMENT: graticola, colabrodo, frusta is-a utensile, used for cucinare
• INSTRUMENT: coltello, forchetta is-a posata, used for mangiare
• CONTAINER: pentola, tegame, padella is-a contenitore, used for cucinare

Page 31

Semantic level: information content

[Diagram: the four-level architecture, with the semantic level added]
• PhonologicalUnit: stress position, vowel openness, cons. pronunciation
• Corresp. PhnU-MrphU
• MorphologicalUnit: PoS & subcat., inflectional paradigm
• Corresp. MrphU-SynU
• SyntacticUnit: syntactic structures 1 and 2 (a. head properties; b. subcat. frame: position, synt. restr.); Frameset
• Corresp. SynU-SemU
• SemanticUnit:
  • domain; semant. class; ontological type; event type
  • semant. features
  • semant. relations (Extended Qualia Structure): formal role, constitutive role, agentive role, telic role
  • regular polysemy; synonymy; derivation
  • predicative represent.: predicate, arguments, sem. restr., type of link

Page 32

Predicative Representation

Describes the semantic scenario a word sense is involved in

Assigned to predicative semantic units:
• assignment of a lexical predicate
• type of link holding between entry and predicate
• predicate argument structure
• semantic role of arguments
• selection restrictions of arguments
• link semantic arguments / syntactic complements

Page 33

Assignment of a lexical predicate

• verbs;
• predicative nouns: deverbals (costruzione) and collective simple nouns (gruppo), nouns denoting a relation (madre), quantity (bottiglia), part (fetta), unit of measurement (metro), property (bellezza);
• adjectives;
• some adverbs (indipendentemente da)

Page 34

Predicate-semantic unit link

PRED_ACCUSARE
• accusare (‘to accuse’): master
• accusa (‘accusation’): process nominalisation
• accusatore (‘accuser’): agent nominalisation
• accusato (‘accused’): patient nominalisation
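One predicate shared by several semantic units, each with its own type of link, can be sketched as a lookup table. The data come from the accusare example; the table name `PREDICATE_LINKS` and the function `units_sharing_predicate` are illustrative assumptions, not CLIPS identifiers.

```python
# Semantic unit -> (lexical predicate, type of link), from the accusare example.
PREDICATE_LINKS = {
    "accusare": ("PRED_ACCUSARE", "master"),
    "accusa": ("PRED_ACCUSARE", "process nominalisation"),
    "accusatore": ("PRED_ACCUSARE", "agent nominalisation"),
    "accusato": ("PRED_ACCUSARE", "patient nominalisation"),
}

def units_sharing_predicate(predicate):
    """Return {semantic unit: type of link} for every unit linked to the predicate."""
    return {
        unit: link
        for unit, (pred, link) in PREDICATE_LINKS.items()
        if pred == predicate
    }
```

Because the argument structure is stated once on the predicate, the verb and its nominalisations can all refer to it instead of repeating it.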

Page 35

Semantic arguments: thematic roles

• ProtoAgent: volitional subject of verb: ARG0 of kill
• ProtoPatient: object undergoing an action: ARG1 of kill
• 2ndParticipant: indirect object: ARG2 of give
• SoA (State of Affairs): sentential complement: ARG2 of ask
• Location: ARG2 of put
• Direction: ARG2 of move
• Origin: ARG1 of move
• Kinship: ARG0 of father
• HeadQuantified: ARG0 of metre, bottle

Page 36

Semantic arguments: selectional restrictions

Not proper restrictions, but rather preferences of combinations in prototypical situations.

Expressible through:
• semantic types;
• features;
• notions (combination of types or type + feature…);
• semantic units

Features, used transversally across semantic types (e.g. PlusEdible), make it possible to capture wider preferences than single semantic types do:

ARG1 eat : [PlusEdible] / ARG1 eat : [FOOD]
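The contrast between a type-based and a feature-based preference for ARG1 of eat can be illustrated as follows. This is a sketch under stated assumptions: the `SemU` class, the `satisfies` function and the two sample entries (including their types and features) are invented for the illustration and are not taken from CLIPS.

```python
from dataclasses import dataclass, field

@dataclass
class SemU:
    name: str
    sem_type: str
    features: set = field(default_factory=set)

def satisfies(unit, preference):
    """Check a preference given as ("type", value) or ("feature", value)."""
    kind, value = preference
    if kind == "type":
        return unit.sem_type == value
    if kind == "feature":
        return value in unit.features
    raise ValueError(f"unknown preference kind: {kind}")

# Hypothetical entries: both are edible, only one is typed FOOD.
pane = SemU("pane", "FOOD", {"PlusEdible"})
pollo = SemU("pollo", "Animal", {"PlusEdible"})
```

A preference stated as `("feature", "PlusEdible")` accepts both entries, whereas `("type", "FOOD")` accepts only the first, which is the sense in which features cut across semantic types.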

Page 37

Semantic entry information content (1)

Aumento (‘increase’): L’aumento dei prezzi da parte del governo (‘the increase of prices by the government’)

ONTOLOGICAL INFO.
• Semantic type: Cause_change_of_value
• Supertype: Cause_relational_change
• Eventype: transition
• Domain: general, economics
• Gloss: accrescimento in dimensione o quantità (‘growth in size or quantity’)

EXTENDED QUALIA INFO.
• aumento isa cambiamento
• aumento resulting_state maggiore
• Agentivecause: yes
• Direction: up

PREDICATIVE REPRESENTATION
• Morphological derivation: Eventverb aumentare
• Lexical semantic predicate: PRED_aumentare
• Type of link: event nominalization
• Predicate arg. struct.: range, semantic role & selectional restrictions of args.:
  • Arg0: Protoagent; Human / Institution
  • Arg1: ProtoPatient; Entity
  • Arg2: Quantifier; Amount

Page 38

Semantic entry information content (2)

vaporizzatore (‘spray’): spruzzare acqua con un vaporizzatore (‘to spray water with a spray’)

ONTOLOGICAL INFO.
• Semantic type: Instrument
• Supertype: Artifact
• Eventype: ===
• Domain: general, cleaning, gardening, cosmetics
• Gloss: apparecchio usato per ridurre in minuscole particelle un liquido (‘device used to reduce a liquid to tiny particles’)

EXTENDED QUALIA INFO.
• vaporizzatore isa apparecchio
• vaporizzatore has_as_part pulsante
• vaporizzatore created_by fabbricare
• vaporizzatore used_for atomizzare
• Synonymy: nebulizzatore

PREDICATIVE REPRESENTATION
• Morphological derivation: Eventverb vaporizzare
• Lexical semantic predicate: PRED_vaporizzare
• Type of link: instrument nominalization
• Predicate arg. struct.: range, semantic role & selectional restrictions of args.:
  • Arg0: Protoagent; Human / Instrument
  • Arg1: ProtoPatient; +liquid
  • Arg2: Location; Concrete_entity

Page 39

Syntax-semantics mapping (1)

[Diagram: SyntacticUnit and SemanticUnit linked by Corresp. SynU-SemU]
• SyntacticUnit: syntactic structures 1 and 2 (a. head properties; b. subcat. frame: position, synt. restr.); Frameset
• SemanticUnit: domain; semant. class; ontological type; event type; semant. features; semant. relations (Extended Qualia Structure: formal, constitutive, agentive and telic role); regular polysemy; synonymy; derivation; predicative represent. (predicate, arguments, sem. restr., type of link)
• Corresp. Syntax-Semantics

Page 40

Syntax-semantics mapping (2)

SYNTACTIC LEVEL
• SynU_migliorare (‘to improve’), Frameset:
  • Transitive structure: P0, P1
  • Intransitive structure: P0

SEMANTIC LEVEL
• SemU1_migliorare: CAUSE_CHANGE_OF_STATE
• SemU2_migliorare: CHANGE_OF_STATE

SEMANTIC PREDICATE
• PRED_migliorare: ARG0: Agent; ARG1: Patient

LINK PREDICATE-SEMANTIC UNIT

Page 41

Syntax-semantics mapping (2)

CORRESPONDENCE SYNTACTIC-SEMANTIC FRAME
• SynU_migliorare ‘to improve’, Frameset: Transitive structure (P0, P1); Intransitive structure (P0)
• SemU1_migliorare, SemU2_migliorare: CHANGE_OF_STATE, CAUSE_CHANGE_OF_STATE
• PRED_migliorare: ARG0: Agent; ARG1: Patient
• Each correspondence between a syntactic structure and a semantic unit is marked as isomorphic or non-isomorphic
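The isomorphic / non-isomorphic distinction can be illustrated with a small check. The slide does not define the terms, so the sketch assumes one plausible reading: a correspondence is isomorphic when every syntactic position P<i> is linked to the semantic argument ARG<i>. The two mappings for migliorare are likewise assumptions based on that reading (in the intransitive use, the subject is taken to realize the Patient).

```python
def is_isomorphic(mapping):
    """True when every position P<i> maps to ARG<i> (assumed definition)."""
    return all(pos[1:] == arg[3:] for pos, arg in mapping.items())

# Transitive structure: subject = Agent (ARG0), object = Patient (ARG1).
transitive = {"P0": "ARG0", "P1": "ARG1"}

# Intransitive structure: the subject is assumed to realize the Patient (ARG1).
intransitive = {"P0": "ARG1"}
```

Under this reading the transitive frame of migliorare corresponds isomorphically to the predicate, while the intransitive frame does not.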

Page 42

Template-driven encoding methodology

• a template is a schema providing, for each semantic type, a set of structured information deemed crucial to its definition
• twofold function:
  • interface between ontology and lexicon
  • guide for the lexicographer
• ensures systematicity, consistency and uniformity of representation of the lexical meaning

Page 43

A template

SemU: SemU identifier
SynU: Identifier of the SynU the SemU is related to
BC number: Number of the corresponding ItalWordNet base concept
Template_Type: [Container]
Unification_path: [Concrete_entity | ArtifactAgentive | Telic]
Domain: General
Semantic Class: Link to the LexiQuest (or any other ontology)
Gloss: Lexicographic gloss
Predicative_Repr.: Predicate associated to the SemU and its argument structure [container(arg0)]
Selectional Restr.: Selectional restrictions (Arg0-HeadQuantifier-Substance)
Derivation: Derivational relations between SemUs
Formal: isa (1, <container> or <hyperonym>)
Agentive: created_by (1, <Usem>: [CREATION]) //definitorial//
Constitutive: made_of (1, <Usem>) //optional//
Constitutive: has_as_part (1, <Usem>) //optional//
Constitutive: contains (1, <Usem>)
Telic: used_for (1, <contain>) //definitorial//
Telic: used_for (1, <measure>) //optional//
Synonymy: Synonyms of the SemU //optional//
Regular Polysemy: [Amount] [Container]
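A template can guide encoding by flagging which relations an entry still lacks. The sketch below is a simplified, hypothetical rendering of the Container template: it keeps one status per relation, and it treats the relations that carry no marker on the slide (isa, contains) as required, which is an assumption. The sample entry for botte is deliberately partial.

```python
# Status of each qualia relation in a simplified Container template.
# "required" for unmarked relations is an assumption, not stated on the slide.
CONTAINER_TEMPLATE = {
    "isa": "required",
    "created_by": "definitorial",
    "made_of": "optional",
    "has_as_part": "optional",
    "contains": "required",
    "used_for": "definitorial",
}

def missing_relations(entry, template):
    """List the non-optional template relations that the entry does not encode."""
    present = {relation for relation, _ in entry}
    return sorted(
        relation
        for relation, status in template.items()
        if status != "optional" and relation not in present
    )

# Partial, illustrative entry for botte ('barrel').
botte = [("isa", "recipiente"), ("made_of", "legno"), ("used_for", "conservazione")]
```

Here `missing_relations(botte, CONTAINER_TEMPLATE)` reports the agentive and the contains relations as still to be encoded, which is the kind of consistency check a template makes possible.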

Page 44

CLIPS’ key features

• The largest electronic, multilevel lexical resource of the Italian language
• Generic lexicon, large coverage (vocabulary and synt. structures)
• 4 description levels: phonology, morphology, syntax, semantics
• 55,000 words encoded
• Based on a rich and multifunctional linguistic and representational model shared by 11 other European lexica
• Fine-grained information, highly structured, innovative, most useful for HLT applications
• Lexical description conformant to international standards
• Respect of the principles of uniformity, consistency and exhaustivity
• High level of reusability

Page 45

Application fields

• surface and deep analysis of texts
• information retrieval
• machine translation
• natural language understanding, etc.

The wealth of information the lexicon contains allows:
• building semantic networks
• extracting the vocabulary of a specific domain
• NP recognition: disambiguating the semantic contribution of some PPs in complex nominals

Page 46

To lend itself to further uses, a lexicon must have:
• flexible model
• generic database
• uniformly structured data
• precise and explicit linguistic description

Like the PAROLE and SIMPLE lexicons, CLIPS does meet these requirements

Page 47

Creating a bilingual electronic lexical resource

Strategy I:
1) Use CLIPS and the PAROLE-SIMPLE French lexicon
2) Perform a semi-automatic linking of their respective entries

Page 48

Creating a bilingual electronic lexical resource

Strategy II:
1) Derive, in a semi-automatic way, a semantically annotated French lexicon from CLIPS
2) Use source and derived lexicons as a basis for building a bilingual resource

Page 49

Strategy I:

[Diagram: an ALGORITHM links CLIPS entries to PAR-SIMPLE French lex. entries through a bilingual dictionary (IT-FR & FR-IT)]
• CLIPS entries: capo_1 (phon, morph, syn, sem), capo_2, ..., ufficio_1, ...
• PAR-SIMPLE French lex. entries: tête_1 (morph, syn, sem), tête_2, tête_3, ..., bureau_1, ...
• Bilingual dictionary, IT-FR: capo: tête, chef, bout; ufficio: bureau, charge, ...
• Bilingual dictionary, FR-IT: tête: testa, capo, faccia, cima; bureau: ufficio, scrivania, ...
• Open question (?): which CLIPS sense corresponds to which French sense
• Further sample lemmas: capoufficio, gentile, residenza, tessere, pompa, scrivere, tessuto, vestibolo, testo, amministratore, vincere
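A first step of such a linking procedure could be to collect candidate lemma pairs from the bilingual dictionary. The slide does not spell out the algorithm, so the sketch below is only one possible pre-filter: it keeps an IT-FR pair when the FR-IT direction lists the Italian lemma back. The dictionary contents are the capo / ufficio and tête / bureau fragments shown on the slide; the function name is hypothetical.

```python
# Dictionary fragments from the slide (sense indicators omitted).
IT_FR = {"capo": ["tête", "chef", "bout"], "ufficio": ["bureau", "charge"]}
FR_IT = {
    "tête": ["testa", "capo", "faccia", "cima"],
    "bureau": ["ufficio", "scrivania"],
}

def candidate_pairs(it_fr, fr_it):
    """Keep IT-FR lemma pairs that the FR-IT direction confirms."""
    return sorted(
        (it, fr)
        for it, translations in it_fr.items()
        for fr in translations
        if it in fr_it.get(fr, [])
    )
```

The surviving lemma pairs still have to be resolved at sense level (capo_1 or capo_2 against tête_1, tête_2 or tête_3), which is where the lexical information of both lexicons comes in.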

Page 50

Analysis of the inherent properties of the source-language (SL) and target-language (TL) senses:
• identity of ontological classification, or subsumption relation between the semantic types of the SL & TL senses
• identity of semantic class, or subsumption relation between their semantic classes
• identity of domain, or subsumption relation between their domain info.
• identity / correspondence of semantic features
• identity / correspondence of semantic relations

Analysis of their contextual properties:
• compatibility of syntactic valency
• function and grammatical instantiation of complements
• compatibility of semantic valency
• semantic role and semantic restrictions of arguments

cf. Villegas et al. LREC 2000, Athens
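The identity-or-subsumption test on inherent properties can be sketched as a simple score. This is an illustration only, not the procedure of Villegas et al.: the function names, the equal weighting of the three properties and the rule that a missing value never matches are assumptions. The sample values are those of the scrivere / écrire comparison in this transcript, with missing French values written as `None`.

```python
# Supertype link taken from the scrivere entry (SYMBOLIC_CREATION under CREATION).
SUPERTYPE = {"SYMBOLIC_CREATION": "CREATION"}

def subsumes(general, specific, supertype=SUPERTYPE):
    """True when `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = supertype.get(specific)
    return False

def compatible(sl_value, tl_value):
    """Identity or subsumption in either direction; a missing value never matches."""
    if sl_value is None or tl_value is None:
        return False
    return subsumes(sl_value, tl_value) or subsumes(tl_value, sl_value)

def score(sl, tl, keys=("sem_type", "sem_class", "domain")):
    """Count the inherent properties on which the two senses are compatible."""
    return sum(compatible(sl.get(k), tl.get(k)) for k in keys)

scrivere = {"sem_type": "SYMBOLIC_CREATION", "sem_class": "CREATION",
            "domain": "CREATIVE_WRITING"}
ecrire = {"sem_type": "CREATION", "sem_class": "CREATION", "domain": None}
```

For this pair the semantic types match by subsumption and the semantic classes by identity, while the missing French domain contributes nothing, so the score is 2 out of 3.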

Page 51

evento / évènement
• evento: freedefinition="cio' che e' accaduto o potra' accadere, avvenimento" (‘what has happened or may happen, occurrence’); Semantic type: EVENT; Supertype: ENTITY; Semantic class: EVENT
• évènement: freedefinition="something that happens at a given place and time"; Semantic type: EVENT; Supertype: -----; Semantic class: EVENT

scrivere / écrire
• scrivere: freedefinition="creare qualcosa di scritto" (‘to create something written’); Semantic type: SYMBOLIC_CREATION; Supertype: CREATION; Semantic class: CREATION; Domain: CREATIVE_WRITING
• écrire: freedefinition="create written works & semi"; Semantic type: CREATION; Supertype: -----; Semantic class: CREATION; Domain: ----

pompa / pompe
• pompa: freedefinition="macchina o apparecchio usato per sollevare liquidi o comprimere gas" (‘machine or device used to lift liquids or compress gases’); Semantic type: INSTRUMENT; UnificationPath: ConcreteEntityArtifactagentive -Materialtelic; Semantic class: APPARATUS
• pompe: freedefinition="a device that moves fluid or gas by pressure or suction"; Semantic type: -----; UnificationPath: -----; Semantic class: APPARATUS

Page 52: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

vincere / vaincre

vincere:
freedefinition="portare a termine con successo"
Semantic type: RELATIONAL_ACT
Semantic class: ACTIVITY
Rel.Sem: ----

vaincre:
freedef.="be the winner in contest/competition"
Semantic type: CAUSE_RELAT.-CHANGE
Semantic class: CHANGE
Rel.Sem: Resulting_action/state: victoire; Agentive_cause: cause

PREDICATE_vincere_1 / PREDICATE_vaincre_2

texte / testo_1 / testo_2 (three entries, in the order they appear on the slide)

Semantic type: RELATIONAL_ACT
Supertype: -----
Semantic class: OBJECT
Domain: ----
Distinctive feature: PLUS_SEMIOTIC

Semantic type: INFORMATION
Supertype: REPRESENTATION
Semantic class: ABSTRACT
Domain: MEDIA
Distinctive feature: PLUS_SEMIOTIC

Semantic type: SEMIOTIC_ARTIFACT
UnificationPath: ConcreteEntity-Artifactagentive-Telic
Semantic class: ARTIFACT
Domain: MEDIA
Distinctive feature: PLUS_SEMIOTIC
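The entry comparisons above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in the project: the `SemEntry` class and the `compare` function are hypothetical names, the entries are reduced to two fields, and the values are copied from the slide examples. The real comparison also draws on predicates, semantic relations and features.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SemEntry:
    """Hypothetical, simplified semantic entry (not the real CLIPS / SIMPLE-FR schema)."""
    lemma: str
    sem_type: Optional[str]   # None stands for the "-----" shown on the slides
    sem_class: Optional[str]


def compare(it: SemEntry, fr: SemEntry) -> str:
    """Name the first field on which the two entries agree, if any."""
    if it.sem_type is not None and it.sem_type == fr.sem_type:
        return "semantic type"
    if it.sem_class is not None and it.sem_class == fr.sem_class:
        return "semantic class"
    return "no shared type or class"


# Values taken from the slide examples.
pairs = [
    (SemEntry("evento", "EVENT", "EVENT"),
     SemEntry("évènement", "EVENT", "EVENT")),
    (SemEntry("scrivere", "SYMBOLIC_CREATION", "CREATION"),
     SemEntry("écrire", "CREATION", "CREATION")),
    (SemEntry("pompa", "INSTRUMENT", "APPARATUS"),
     SemEntry("pompe", None, "APPARATUS")),
    (SemEntry("vincere", "RELATIONAL_ACT", "ACTIVITY"),
     SemEntry("vaincre", "CAUSE_RELAT.-CHANGE", "CHANGE")),
]

results = {it.lemma: compare(it, fr) for it, fr in pairs}
for lemma, outcome in results.items():
    print(f"{lemma}: {outcome}")
```

In this reduced view, vincere / vaincre share neither type nor class, so a link would have to come from other information in the entries.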

Page 53: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

Drawbacks of this strategy

Discrepancy of lexical coverage between the lexicons => method applicable to 10,000 senses only

SIMPLE-FR does not always encode all information => manual intervention is necessary wherever SL and TL entries have NO corresponding element, due to:
• an encoding error
• the two lexicons having privileged different although complementary aspects of meaning, e.g. imprigionare: PURPOSE_ACT vs. emprisonner: CAUSE_RELATIONAL_CHANGE
• lack of information

Page 54: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

Strategy II – Phase 1: Deriving a FR lexicon from CLIPS

Feasibility study for deriving a semantically annotated French lexicon using CLIPS lexical knowledge

Crucial step for deriving the French entries: correctly pair off each FR word sense with the relevant CLIPS semantic unit whose information we ultimately want to assign to the French entry

Page 55: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

[Diagram: CLIPS => semantically annotated French lexicon, via two approaches]

cognate approach: exploits the cognateness of Italian and French endings to relate the FR word to the IT CLIPS entry and infer the FR entry
villaggio: 1. (piccolo centro abitato) village 2. (complesso urbanistico) village

sense indicator approach: matches onto the CLIPS data the information provided in bilingual dictionaries by sense indicators, in order to identify the relevant CLIPS entry
capo: 1. (testa) tête; 2. (persona che...) chef...

Page 56: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

The cognate approach
P. Bouillon, B. Cartoni, TIM/ISSCO, ETI, Geneva

look-up in the IT–FR bilingual dict.:
villaggio: 1. (piccolo centro abitato) village 2. (complesso urbanistico) village

Condition: a unique French constructed word translates all IT senses

IT–CLIPS
<SemU id="USem4123villaggio"> naming="villaggio" weightvalsemfeaturel=«Geopolitical_Location» […] </SemU>
<SemU id="USemD63504villaggio"> naming="villaggio" weightvalsemfeaturel=«Human_group» […] </SemU>

derivation

FR–LEX
<SemU id="USem0001village"> naming="village" weightvalsemfeaturel=«Geopolitical_Location» […] </SemU>
<SemU id="USem0002village"> naming="village" weightvalsemfeaturel=«Human_group» […] </SemU>
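The look-up and derivation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the Geneva implementation: the function names, the dictionary-based unit format, the French suffix correspondences (-age, -té, -tion) and the numbering scheme for the new French identifiers are all assumptions made for the example.

```python
# Assumed IT -> FR suffix correspondences; the slides only name the Italian suffixes.
COGNATE_SUFFIXES = {"aggio": "age", "tà": "té", "zione": "tion"}


def is_cognate(it_word: str, fr_word: str) -> bool:
    """True if the two words end in a corresponding pair of suffixes."""
    return any(
        it_word.endswith(it_suffix) and fr_word.endswith(fr_suffix)
        for it_suffix, fr_suffix in COGNATE_SUFFIXES.items()
    )


def derive_french_units(it_word, translations, it_units):
    """Copy the IT semantic units to the FR word when the condition holds.

    translations: FR translation of each IT sense in the bilingual dictionary.
    it_units: simplified IT semantic units, as dicts with a "sem_type" key
              (standing in for the value shown under weightvalsemfeaturel).
    """
    fr_words = set(translations)
    if len(fr_words) != 1:
        return []  # condition fails: the IT senses have different translations
    fr_word = next(iter(fr_words))
    if not is_cognate(it_word, fr_word):
        return []
    return [
        # Hypothetical id scheme, modelled on the ids shown on the slide.
        {"id": f"USem{i:04d}{fr_word}", "naming": fr_word, "sem_type": unit["sem_type"]}
        for i, unit in enumerate(it_units, start=1)
    ]


villaggio_units = [
    {"id": "USem4123villaggio", "sem_type": "Geopolitical_Location"},
    {"id": "USemD63504villaggio", "sem_type": "Human_group"},
]
french_units = derive_french_units("villaggio", ["village", "village"], villaggio_units)
for unit in french_units:
    print(unit)
```

A word such as capo, whose senses are translated by tête and chef, fails the condition and yields no derived units in this sketch.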

Page 57: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

The sense indicator approach
N. Ruimy, ILC-CNR, Pisa

extracted from bilingual dictionary:

IT word – SENSE INDICATOR – FR word
compagnia – (presenza) – compagnie
compagnia – (gruppo) – compagnie
asfalto – (per rivestire) – asphalte
avvertire – (percepire) – sentir
avvertire – (avvisare) – prévenir
aspirare – intr. (avere) prep. a – aspirer à
aspirare – tr. (inalare) – aspirer
aspirare – LING. – aspirer
aspirare – tr. (con un tubo) – aspirer
capo – (testa) – tête
capo – (persona che…) – chef

analysis & classification of sense indicators

Page 58: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

Types of sense indicators (1)

indicators conveying morphosyntactic information: verb subclass, auxiliary selection, plural form of nouns, typical subject / object, PP type, etc.

Atkins, Bouillon, 2003

Italian–French
COVARE
A. v.tr.
1 (di uccelli) [dar calore col proprio corpo alle uova per sviluppare l’embrione] couver
2 (fig.) [custodire con gelosia] couver
3 (fig.) [nutrire, alimentare in segreto dentro di sé] nourrir, mijoter
[tramare, macchinare in segreto] couver
[incubare] couver: covare un malanno
B. v.intr. (aus. avere) (fig.) [stare chiuso, nascosto] couver: il fuoco cova sotto la cenere

Callouts: verbal class (v.tr., v.intr.); typical subj. (di uccelli); auxiliary (aus. avere)

Page 59: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

Types of sense indicators (2)

indicators conveying inferential information: synonyms, hypernyms, meronyms, domain of use

Italian–French
CAPO
I (persone)
1 [testa] tête
2 (fig.) [mente, intelligenza] tête
3 [persona investita di comando, di potere] chef
II (animali)
1 (raro) -> testa
2 spec. al plur [ciascun individuo di una specie determinata] têtes, pièces
III (cose)
1 [la parte più grossa e più sporgente di un oggetto] tête
2 [la parte più alta] haut
3 [ciascuna delle due estremità di qlco.] bout, tête
4 [inizio, principio] début
5 [fine, conclusione; sbocco] bout
6 loc. …..
7 (nei filati) fil
8 [singolo oggetto appartenente ad una serie] pièce
9 (geog.) cap

Callouts: synonym, hypernym, domain of use

Page 60: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

IT word – SENSE INDICATOR – FR word
gioielleria – (arte) – bijouterie
gioielleria – (negozio) – bijouterie
asfalto – (per rivestire) – asphalte
avvertire – (percepire) – sentir
avvertire – (avvisare) – prévenir
aspirare – intr. (avere) prep. a – aspirer à
aspirare – tr. (inalare) – aspirer
aspirare – LING. – aspirer
aspirare – tr. (con un tubo) – aspirer
capo – (testa) – tête
capo – (persona che…) – chef

sense indicators used as search keys for identifying, in CLIPS, the semantic entry relevant to the IT sense of the bilingual pair

Page 61: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

Using sense indicators

• indicators usable straightforwardly
• indicators to be converted into the descriptive language of CLIPS:

illuminare (rendere luminoso) illuminer (to make luminous)
=> sem. type of illuminare belongs to the causative types hierarchy

analizzatore (chi effettua analisi) analyseur (who performs analyses)
=> sem. type of analizzatore belongs to the HUMAN hierarchy

Page 62: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

Rule types

search for a CLIPS entry containing the s.i. as target
• of the synonymic relation: capo synonym_rel testa
• of the hypernymic relation: gioielleria isa_rel negozio
• of any qualia relation

search for a CLIPS entry sharing properties with the entry of the s.i.
• shared hypernym: comunicare (notificare) isa_rel dire
• shared semantic type: avvertire (percepire) semtype EXP._EVENT

search for a CLIPS entry containing information inferred from the s.i.
• specific type
• specific relation or feature (esp. domain info.)
• specific syntactic structure: conoscere (pron. (reciprocamente)) reciprocal syn. struct.
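The three rule families can be sketched as a simple cascade. This is a toy illustration, not the actual algorithm: the `LEXICON` structure, the `find_unit` function, the identifiers containing `#`, the `PLACEHOLDER_TYPE` value and the `LING.` to `phonetics` mapping are assumptions made for the example, and the order in which rules are tried here is illustrative only. The sense indicator is assumed to have been reduced to a single key word beforehand (e.g. "persona" for "persona che…").

```python
from typing import Optional

# Toy lexicon loosely modelled on the slide examples; structure is an assumption.
LEXICON = {
    "capo": [
        {"id": "SemU61397capo", "sem_type": "Body_part", "relations": {"synonym": ["testa"]}},
        {"id": "SemU3615capo", "sem_type": "Role", "relations": {"isa": ["persona"]}},
    ],
    "avvertire": [
        {"id": "avvertire#1", "sem_type": "EXP._EVENT", "relations": {}},       # hypothetical id
        {"id": "avvertire#2", "sem_type": "PLACEHOLDER_TYPE", "relations": {}}, # hypothetical
    ],
    "percepire": [
        {"id": "percepire#1", "sem_type": "EXP._EVENT", "relations": {}},       # hypothetical id
    ],
    "aspirare": [
        {"id": "SemU79372aspirare", "sem_type": "Speech_act", "relations": {}, "domain": "phonetics"},
        {"id": "SemU7040aspirare", "sem_type": "Modal_event", "relations": {}},
    ],
}

# Assumed conversion of a dictionary label into CLIPS domain information.
INFERRED_DOMAIN = {"LING.": "phonetics"}


def find_unit(it_word: str, indicator: str) -> Optional[str]:
    """Return the id of the semantic unit of it_word selected by the indicator."""
    units = LEXICON.get(it_word, [])

    # Rule family 1: the indicator is the target of a relation in the entry.
    for relation in ("synonym", "isa"):
        for unit in units:
            if indicator in unit["relations"].get(relation, []):
                return unit["id"]

    # Rule family 2: the entry shares a property (here: semantic type)
    # with an entry of the indicator itself.
    indicator_types = {u["sem_type"] for u in LEXICON.get(indicator, [])}
    for unit in units:
        if unit["sem_type"] in indicator_types:
            return unit["id"]

    # Rule family 3: the entry contains information inferred from the indicator.
    domain = INFERRED_DOMAIN.get(indicator)
    if domain is not None:
        for unit in units:
            if unit.get("domain") == domain:
                return unit["id"]

    return None


print(find_unit("capo", "testa"))
print(find_unit("capo", "persona"))
print(find_unit("avvertire", "percepire"))
print(find_unit("aspirare", "LING."))
```

Trying the most specific evidence first means that a match found early rests on an explicit link in the entry, while later matches rest on weaker, inferred evidence.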

Page 63: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

IT word – SENSE INDICATOR – FR word
compagnia – (presenza) – compagnie
compagnia – (gruppo) – compagnie
asfalto – (per rivestire) – asphalte
avvertire – (percepire) – sentir
avvertire – (avvisare) – prévenir
aspirare – intr. (avere) prep. a – aspirer à
aspirare – tr. (inalare) – aspirer
aspirare – LING. – aspirer
aspirare – tr. (con un tubo) – aspirer
capo – (testa) – tête
capo – (persona che…) – chef

CLIPS entries identified:
SemU61397capo, sem. type=Body_part, where <capo> synonym <testa>
SemU3615capo, sem. type=Role, where <capo> isa <persona>
SemU68603asfalto, sem. type=Artifact_Material, where <asfalto> used_for <rivestire>
SemU79372aspirare, sem. type=Speech_act, where domain:phonetics
SemU7040aspirare, sem. type=Modal_event, linked to SynUaspirare, intr. pp_a

Page 64: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

Cognate approach: results

IT constructed words:

suffix    senses translated by a unique FR constructed word    more than one translation
–aggio    89.9 %                                               10.1 %
–tà       77.4 %                                               22.6 %
–zione    80.4 %                                               19.6 %

FR constructed words sharing the IT CLIPS entries (recall ratio):

–aggio    99.97 %
–tà       99.98 %
–zione    99.98 %

Small percentage of errors due to a different granularity of sense distinctions in CLIPS and in the bilingual dictionary

Page 65: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

Sense indicator approach: results

Itword – sense indicator – FRword
X – A – Y

rule type 1: search for an entry of X containing string A
rule type 2: search for an entry of X sharing properties with an entry of A
rule type 3: search for an entry of X containing information inferred from A

rule type    investigated lex. data    application order    success rate
1            target of syn. rel.       1                    16.6%
1            target of hyper. rel.     2                    26.8%
1            target of any qualia      9                    0.92%
2            shared hypernym           7                    8.9%
2            shared semtype            8                    5.8%
3            specific semtype          6                    3.9%
3            specific domain           3                    12.3%
3            specific feat/rel         5                    9.2%
3            specific syn.struct       4                    15.4%

the higher the rule rank, the more reliable the result

Page 66: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

[Chart: distribution of success rate over the algorithm rules]

recall ratio: 69%

Page 67: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

Combining the two methods

constructed words represent 68.2% of the vocabulary

successful handling of:
95% of constructed words
+
69% of non constructed words

results may be enhanced by gleaning the most informative sense indicators from different sources
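The slide gives no overall figure for the combined methods. As a rough worked example only, the three percentages can be combined into a weighted average, assuming that both success rates are measured over the same vocabulary and that non constructed words make up the remaining share of it. The function name `combined_coverage` is hypothetical.

```python
def combined_coverage(constructed_share: float,
                      constructed_success: float,
                      other_success: float) -> float:
    """Weighted share of the vocabulary handled when both methods are combined."""
    other_share = 1.0 - constructed_share  # assumption: everything else is non constructed
    return constructed_share * constructed_success + other_share * other_success


coverage = combined_coverage(0.682, 0.95, 0.69)
print(f"{coverage:.1%}")  # prints 86.7%
```

Under these assumptions roughly 87% of the vocabulary would be handled, which is an estimate derived here and not a result reported in the presentation.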

Page 68: The Italian CLIPS Lexicon and its reuse in a bilingual environment Nilda Ruimy ILC CNR, Pisa september 2004

Concluding remarks

Approaches taken applicable to other language pairs sharing similarities in terms of morphological structure

Derived lexicon building process is simplified and shortened

Deriving new lexical resources from existing ones: a worthwhile venture in terms of time and effort

Such practice entails coverage and consistency assessment of the source lexical resource

Source and derived lexicons constitute a most reliable basis for developing a bilingual resource