Preview of "boitet-nii-mt-lect2-v2rev.ppt"
L2: Linguistic architectures of MT systems
Christian BOITET
GETALP-LIG-UJF, Grenoble
16/4/09 © Ch. Boitet - NII MT lectures 1
Introductory words
• No time to show details of various linguistic architectures in lecture 1 (L1), but we went into new territory and detailed usually implicit notions
  • workflow in HT and MT (MT#1 or automated MT)
  • how to measure C, A, Q (ling. quality) for the "C.A.Q MT theorem"
• The goals of L2 will be to:
  • make explicit the basis of linguistic architectures in MT
  • justify independence of linguistic and computational architectures
  • at the same time, present various intermediate structures
  • speak about their pros and cons.
Outline of lecture 2
• Recap: linguistic, computational, operational architectures
• Kinds of representations usable in principle
  • Linguistic level: monolevel and multilevel structures
  • Scope (what are units of translation?): segments, infrasegments, supersegments, whole documents?
  • Geometry: strings, trees, charts/lattices, (hyper)graphs, log. forms
  • Abstractness: type of <string, structure> correspondence
• Various kinds of representations really used in MT, using existing systems as examples
  • morphosyntactic structures
  • syntactic structures: c-structure, f-structure
  • logico-semantic structures: spa-structure
• Different sorts of deep pivots, using UNL as an example
  • hybrid, semantico-linguistic, semantico-pragmatic
  • UNL (semantico-linguistic)
• Recap and conclusions
  • Pros and cons of various linguistic architectures w.r.t. translational situations
Linguistic vs. computational architectures
Linguistic architecture
• objects: see Vauquois' triangle
• intermediate representations
  • direct, semi-direct, transfer (" 7 variants): 2 lexical spaces
  • IL (" 2 variants): 3 lexical spaces
Computational architecture
• automatic processes; human interaction, if any
• programming paradigms
  • direct programming
  • RBMT (rules, automata…)
  • corpus-based: SMT, PSMT (unsupervised), EBMT (" 3 variants), ± supervised
  • hybrid
Linguistic architectures in MT: Vauquois' triangle
Computational architectures
[Figure: computational type of the phase leading from Intermediate Representation i to Intermediate Representation i+1]
• procedural
• rule-based: well-formedness (grammar rules); transitions of transduction automata; rewriting rules (transformational grammars)
• expert
• empirical: statistical (probabilistic), SMT; example-based, EBMT; analogy-based MT, ABMT
• data used by the empirical types: raw parallel corpora; annotated parallel corpora (trees, S-SSTC)
Linguistic and computational MT architectures are independent (recap)
Kinds of representations usable in principle
• Linguistic levels: monolevel and multilevel structures
• Scope (what are units of translation?): segments, infrasegments, supersegments, whole documents?
• Geometry: strings, trees, charts/lattices, (hyper)graphs, log. forms
• Abstractness: type of <string, structure> correspondence; concrete vs. abstract ≠ surface vs. deep
Formalized representations of texts
• Linguistic levels: surface, deep; 1-stratal (monolevel), n-stratal (multilevel)
• Main linguistic organisation: syntagms (constituents); dependencies; logical and semantic relations
• Geometrical structure: string; graph of strings (chart); tree; graph / network; hypergraph
• Algebraic structure: label; structured label; Boolean features; structured attributes (vectors); feature structures (± typed)
• Text-structure correspondence: from concrete (text readable from structure) to totally abstract (ex. UNL graph)
• Scope: sentence (! all); paragraph; page (Ariane-G5, Sygmart); document
MT architectures on which we work
• Rule-based MT (symbolic) for sub-languages
• All-domain MT via UNL
• Translation Memory based MAHT
• Analogy-based MT
• Example-based MT (S-SSTC)
• Statistico-structural MT
Direct translation systems
Levels and structures: Graphemic level: Text (direct translation, text to text)
Columns: Type; Steps; Method; Comments; Examples
• RBMT (1975-)
  Steps: segmentation, word-for-word translation
  Method: FST (rules + dict.) [rules]
  Comments: OK for very near languages (Japanese-Korean, Hindi-Urdu)
  Examples: ATLAS-I (Fujitsu, 76-78)
• SMT (1980-)
  Steps: segmentation, reordering…
  Method: alignment + "decoding" [statistical]
  Comments: SMT = first idea about MT, from war cryptographers (W. Weaver 1949)
  Examples: many SMT systems (IBM… 1980-)
• EBMT (2000-)
  Steps: no preprocessing, "pure" EBMT
  Method: analogy resolution + n-gram filtering [analogical]
  Comments: Results $ those of SMT; Nagao 1984 (similarity MT); Lepage 2000 (real analogy)
  Examples: ALEPH (ATR, 2000-)
Structures
• String of typographical words
  Writing systems with word separators
  • English, French…
• Confusion network or lattice at character level
  after unsure or unfinished OCR (very rare in MT, sometimes for military intelligence)
• Several strings of typographical words, with scores
• Segmentation lattice or chart
  Writing systems without word separators
  • Japanese, Chinese, Korean, Thai, Lao, Khmer, Vietnamese…
    (in Vietnamese, white spaces separate syllables, not words)
  • scored nodes or edges
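The "segmentation lattice with scored edges" idea can be sketched as follows. This is only an illustration under stated assumptions: the toy lexicon, its log-scores and the function names `build_lattice` and `segmentations` are hypothetical, and an unspaced Latin string stands in for a text written without word separators.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, str, float]  # (end position, word, score)


def build_lattice(text: str, lexicon: Dict[str, float]) -> Dict[int, List[Edge]]:
    """Segmentation lattice: for each start position, the scored lexicon words found there."""
    max_len = max(len(word) for word in lexicon)
    lattice: Dict[int, List[Edge]] = {start: [] for start in range(len(text))}
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in lexicon:
                lattice[start].append((end, word, lexicon[word]))
    return lattice


def segmentations(text: str, lattice: Dict[int, List[Edge]]) -> List[Tuple[List[str], float]]:
    """All complete paths through the lattice, highest total score first."""
    results: List[Tuple[List[str], float]] = []

    def walk(pos: int, words: List[str], score: float) -> None:
        if pos == len(text):
            results.append((words, score))
            return
        for end, word, edge_score in lattice[pos]:
            walk(end, words + [word], score + edge_score)

    walk(0, [], 0.0)
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy lexicon with made-up log-scores (hypothetical values, not from the lecture).
LEXICON = {"the": -1.0, "table": -2.0, "tab": -3.0, "le": -3.5, "t": -4.0, "he": -2.5}
TEXT = "thetable"  # stands in for a text without word separators

ranked = segmentations(TEXT, build_lattice(TEXT, LEXICON))
for words, score in ranked:
    print(score, " ".join(words))
```

Keeping every path, instead of only the best one, corresponds to passing "several strings of typographical words with scores" to the next phase.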
Semi-direct translation systems
Levels and structures: Morpho-syntactic level: Tagged text; Graphemic level: Text
Semi-direct translation
Columns: Type; Analysis; Transfer; Generation; Examples
• 1G-MT (1950-)
  Analysis: program-based segmentation & lemmatization [procedural]
  Transfer: dictionary consult. + reordering "macros" [procedural]
  Generation: tables + string macros [procedural]
  Examples: GAT (Georgetown), Ispra, 1965-69; SPANAM-1, PAHO, 1975?-; GLOBALINK " Spanam-1, PAHO
• SMT (1990-)
  Analysis: segmentation & lemmatization [statistical]
  Transfer: alignment + "decoding" [statistical]
  Generation: language model [statistical]
  Examples: Candide, IBM, 1980-; many SMT systems (NIST, IWSLT, Google (?))
• Pidgin translation
  Analysis: segmentation [snobol4]; lemmatization [rules]
  Transfer: transduction + reordering (Q-systems rules on tree charts)
  Generation: morphological generation [rules]; formatting [snobol4]
  Examples: idea of B. Harris (TAUM, translatologist); rus → eng, fre, Boitet 1972
Descending surface syntactic transfer systems: RBMT (+SMT / LanguageWeaver?)
Levels and structures: Syntagmatic level: C-structures (constituent); Morpho-syntactic level: Tagged text; Graphemic level: Text
Descending transfer
Columns: Type; Analysis; Transfer + syntactic generation; Morphological generation; Examples (cells listed in slide order)
• RBMT (1970-)
  ATN [automata, trans. rules]
  recursive descent [procedural]
  grammar + dict. [rules]
  tables + prog. [procedural]
  Examples: ENGSPAN, SPANAM-2 (PAHO, 1978?-); AS-Transac (Toshiba, 1982-); Reverso (ProMT, 1986-)
• RBMT (1980-)
  ECFG (+ decorations) [gram. rules]
  recursive descent [procedural, often in LISP]
  grammar + dict. [rules]
  Examples: METAL (Austin, 1982-); Duet-2 (Sharp, 1984-); Shalt-1 (IBM-Jp, 1982-); Kate (KDD, 1983-)
• RBMT (1984-)
  lemmatization + Slot-grammars [prolog]
  recursive descent [logic programming in prolog]
  dictionary + tables + prog. [prolog]
  Examples: LMT (IBM-US, 1983-)
Structures
• Syntactic trees
  Information on nodes
  • simple labels: METEO (Q-systems)
  • main label + boolean attributes (TAUM-aviation)
  • main label + typed attributes (Ariane, METAL & many others)
  Concrete trees (projective)
  • constituents: frontier (leaves) is a prototype of the utterance
  • dependencies: in-order (infix) traversal is a prototype…
  Abstract trees (not projective)
  • normalized trees with semantic constituents (Colmerauer, METEO)
  • sometimes necessary because of
    1. verbs with separable particles (German, English…)
    2. comb-like constructions
       A, B & C gave A', B' & C' to A", B" & C"
       ,(give(A, A', to(A")), give(B, B', to(B")), give(C, C', to(C")))
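As an illustration of the comb-like case, the sketch below distributes three coordinated lists into one predicate per "tooth" of the comb and prints the normalized term of the slide. The function names `distribute` and `render` are hypothetical; the sketch only shows why the result can no longer be projective, since each `give(…)` groups words that are not adjacent in the utterance.

```python
from typing import List, Tuple

Term = Tuple[str, str, str, Tuple[str, str]]  # (verb, subject, object, (prep, recipient))


def distribute(verb: str, subjects: List[str], objects: List[str],
               recipients: List[str], prep: str = "to") -> List[Term]:
    """Pair the i-th subject, object and recipient of a comb-like coordination."""
    if not (len(subjects) == len(objects) == len(recipients)):
        raise ValueError("comb-like construction needs lists of equal length")
    return [(verb, s, o, (prep, r)) for s, o, r in zip(subjects, objects, recipients)]


def render(term: Term) -> str:
    """Write one predicate in the bracketed notation used on the slide."""
    verb, subj, obj, (prep, recipient) = term
    return f"{verb}({subj}, {obj}, {prep}({recipient}))"


terms = distribute("give", ["A", "B", "C"], ["A'", "B'", "C'"], ['A"', 'B"', 'C"'])
# The leading "," is the coordination node dominating the three predicates.
normalized = ",(" + ", ".join(render(t) for t in terms) + ")"
print(normalized)
```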
Syntactic trees
• Concrete syntagmatic (constituent) tree
Ann, Bob give cod, donut to Emil, Fathia
[Figure: constituent tree with node labels PHVB, VP, PP, NP, NPCoo, N, V, Prep over the leaves Ann, Bob, give, cod, donut, to, Emil, Fathia and the coordinating commas]
Syntactic trees
• Concrete dependency tree
[Figure: dependency tree rooted in give (V); arcs labelled subj, obj1, obj2, gov, ccp; nodes Ann, Bob, cod, donut, Emil, Fathia (N), to (Prep) and the coordinating commas (Coo)]
Syntactic trees
• Abstract dependency tree
[Figure: abstract dependency tree; nodes give (V), to (Prep), Ann, Bob, cod, donut, Emil, Fathia (N) and a coordinating comma (Coo); arcs labelled subj, obj1, obj2, gov]
Examples: MT for access, web (1)
Each passage is given in three versions, in this order: ENGLISH (human version), FRENCH (human version), ENGLISH (Systran FRE-ENG version)
The European-Heritage.net
thesaurus covers the fields of
archaeology and architecture as
defined in the Council of Europe
conventions signed in Granada
(1985) and Malta (1992).
Le thesaurus European-Heritage.net
couvre les champs de l'archéologie et
de l'architecture au sens des
conventions du Conseil de l'Europe de
Grenade (1985) et de Malte (1992).
The European-Heritage.net thesaurus
covers the fields of archaeology and
architecture within the meaning of
conventions of the Council of Europe
of Grenade (1985) and Malta (1992).
It encompasses information
ranging from the partners
involved, categories of cultural
assets and legislation, to activities,
skills and funding. It is
supplemented by a number of
specific thesauruses compiled by
each member state on a particular
topic, such as the thesaurus on
Andalusian heritage or the
architectural thesaurus from the
Mérimée database in France.
Il prend en compte des aspects aussi
variés que les acteurs, les catégories
de biens culturels, la législation ou
encore les interventions, les métiers et
les financements. Il est complété et
prolongé par des thesaurus spécifiques
développés par chaque Etat membre
sur tel ou tel sujet spécifique, comme
le thesaurus du patrimoine historique
andalou ou le thesaurus d'architecture
de la base de données documentaire
Mérimée en France.
It takes into account aspects as varied
as the actors, the categories of cultural
goods, the legislation or the
interventions, the trades and the
financings. It is supplemented and
prolonged by thesaurus specific
developed by each Member State on
such or such specific subject, like the
thesaurus of the Andalusian historical
inheritance or the thesaurus of
architecture of the documentation data
base Mérimée in France.
This new, open-ended search tool
will come on line shortly, together
with a management and
administration system shared
among the various contributors.
Cet instrument de recherche,
forcément évolutif, sera mis
prochainement en ligne accompagné
d'un dispositif de gestion et
d'administration réparti entre les
différents contributeurs.
This instrument of search, inevitably
evolutionary, will be put soon on line
accompanied by a device of
management and administration
distributed between the various
contributors.
Examples: MT for access, web (2)
• FE easy compared to EG and even more FG
Each passage is given in two versions, in this order: GERMAN (Systran ENG-GER version), GERMAN (Systran FRE-GER version)
Der European-Heritage.netthesaurus umfaßt die
Felder von archaeology und von Architektur,
wie in den Europaratvereinbarungen definiert,
die in Granada (1985) unterzeichnet werden und
in Malta (1992).
Der European-Heritage.net-Thesaurus bedeckt
die Felder der Archäologie und der Architektur
im Sinne der Übereinkommen des Europarats
von Granada (1985) und von Malta (1992).
Er gibt die Informationen um, die von den
betroffenen Partnern, von den Kategorien der
kulturellen Werte und der Gesetzgebung, bis zu
Aktivitäten, von den Fähigkeiten und von der
Finanzierung reichen. Er wird durch eine Anzahl
von den spezifischen Thesauren ergänzt, die
durch jeden Mitgliedsstaat auf einem
bestimmten Thema, wie dem Thesaurus auf
Andalusian Erbe oder dem architektonischen
Thesaurus von der Datenbank Mérimée in
Frankreich kompiliert werden.
Er berücksichtigt Aspekte dermaßen variierte,
daß die Beteiligten, die Kategorien kultureller
Güter, die Gesetzgebung oder noch die
Interventionen, die Berufe und die
Finanzierungen. Er wird vervollständigt und
wird durch ein spezifische Thesaurus entwickelt
durch jeder Mitgliedstaat über das eines oder
andere spezifische Thema verlängert, als der
Thesaurus des andalusischen historischen
Kulturgutes oder der Thesaurus der Architektur
der urkundlichen Datenbank Mérimée in
Frankreich.
Dieses neue, offene Suchhilfsmittel kommt auf
Zeile kurz, zusammen mit einem Management-
und Leitungssystem, das unter den
verschiedenen Mitwirkenden geteilt wird.
Dieses notgedrungen entwicklungsfähige
Forschungsinstrument wird gestellt demnächst
online begleitet von einer Verwaltungs- und
Verwaltungsvorrichtung, die aufgeteilt unter den
verschiedenen Beitragenden.
Comparison: rough vs. raw MT
Each passage is given in two versions, in this order: Reverso rough Spanish-English output, SpanAm raw Spanish-English output
Message of the Chief operating officer of the World
Organization of the Health
Message of the Director-General of the World Health
Organization
From his{*its*} discovery, the antibiotics have transformed
completely the perspective of the humanity with regard to
the infectious diseases. Today the use of the antibiotics,
cocktail with improvements in the reparation, the housing
and the nutrition, together with the advent of the programs
of widespread vaccination, they have given place to a
notable decrease of infectious diseases that before were
common and were annihilating entire populations.
From its discovery, antibiotics have completely transformed
the perspective of humankind with respect to infectious
diseases. Today the use of antibiotics, combined with
improvements in sanitation, housing, and nutrition, together
with the advent of the vaccination programs generalized,
have caused a notable reduction of infectious diseases that
previously were common and annihilated entire populations.
Scourges that terrified million persons, as the pest, the
savage cough, the poliomyelitis and the scarlatina, they have
been controlled or are on the verge of be controlling. Now,
in the dawn of a new millenium, the humanity faces with
another crisis. Diseases before curable as the gonorrhea and
the fever tifoidea they are becoming rapidly difficult to
treat, whereas killer old men as the tuberculosis and the
malaria are armed{*assembled*} now with the increasing
impenetrable resistance the antimicrobial ones.
Scourges that terrified millions of people, as plague,
whooping cough, poliomyelitis, and the scarlatina, have
been controlled or are on the verge of being controlled.
Now, in the dawn of a new millennium, humankind faces
another crisis. Previously curable diseases as the gonorrhea
and typhoid fever are becoming rapidly difficult to treat,
while old assassins as tuberculosis and malaria now are
armed of the increasingly impenetrable resistance to the
antimicrobial drugs.
This phenomenon is potentially contenible. The problem is
increasingly deep and complex, accelerated by the abuse of
the antibiotics in the developed countries and the
paradoxical subutilization of the antimicrobial ones of
quality in the countries in development due to the poverty
and the resultant shortage of an attention of effective health.
This phenomenon is potentially contenible. The problem is
increasingly profound and complex, accelerated by the
abuse of antibiotics in the developed countries and the
paradoxical underutilization of the quality antimicrobial
drugs in the developing countries due to the poverty and to
the scarcity resulting from an effective health care.
Descending deep syntactic transfer systems
Levels and structures: Syntactico-functional level: F-structures (functional), often dependency structures; Morpho-syntactic level: Tagged text; Graphemic level: Text
Descending transfer
Columns: Type; Analysis; Transfer + synt. generation; Morphological generation; Examples (cells listed in slide order)
• RBMT (1985-)
  segm. + lemmatization [procedural]
  dependency [gram. rules + constraint progr.]
  recursive descent [procedural]
  grammar + dict. [rules]
  tables + prog. [procedural]
  Examples: JETS (IBM-Jp, 1985-90)
• 1.5G-MT (1990-)
  lemmatization [FST (+ dictionaries)]
  dependency graph [procedural (C macros), deterministic]
  recursive descent [procedural]
  grammar + dict. [rules]
  tables + prog. [procedural]
  Examples: Systran, 1990-
Horizontal surface syntactic transfer systems: RBMT & Phrase-Based SMT
Levels and structures: Syntagmatic level: C-structures (constituent), syntactic transfer (surface); Morpho-syntactic level: Tagged text; Graphemic level: Text, direct translation
Columns: Type; Analysis; Transfer; Generation; Examples (cells listed in slide order)
• RBMT (1992-)
  lemmatization + linear patterns [rules]
  treelet dict. + sem. thesaurus [rules]
  tree flattening
  grammar + dict. [rules]
  Examples: TDMT (for SLT), ATR, 1992-1998
• RBMT (1995-)
  lemmatization + Slot Grammars [prolog]
  treelet dictionary [prolog]
  recursive descent [prolog]
  grammar + dict. [rules]
  Examples: PT (from LMT), Linguatech, 1995-
• EBMT (2000-)
  Initial data: bilingual // corpus, dictionary
  Preparation: build S-SSTCs, improve (hum)
  Translation: A//T//G, bottom-up
  Examples: EBMT (Banturjah), UTMK, USM, 2000-
• PSMT, PSCFG (2002-)
  lemmatization, chunking [statistical]
  alignment, decoding [statistical]
  tree flattening, post-processing [statistical]
  Examples: LanguageWeaver, 2002-; Google, 2005-; + Wu, Melamed 1997, 2004
Horizontal deep syntactic transfer systems
Levels and structures: Syntactico-functional level: F-structures (functional), syntactic transfer (deep); Syntagmatic level: C-structures (constituent); Morpho-syntactic level: Tagged text; Graphemic level: Text
Columns: Type; Analysis; Transfer; Generation; Examples (cells listed in slide order)
• RBMT (1975-)
  grammar + dictionary, dependency analysis [rules]
  tree transformations [rules]
  tree flattening
  grammar + dictionary [rules]
  Examples: ETAP-2, ETAP-3 (IPPI, Moscow, 1977-)
• RBMT (1995-)
  lemmatization + Slot Grammars [Prolog]
  treelet dictionary [Prolog]
  recursive descent [Prolog]
  grammar + dictionary [rules]
  Examples: PT (from LMT), Linguatech, 1995-
• RBMT + SMT (1999-)
  MSR (Microsoft) analyzers [rules (in G)]
  learned from pairs (lf_s, lf_t) [statistical]
  Microsoft generators [rules (in G)]
  Examples: MTS-1 (on technical documentation)
Horizontal multilevel transfer systems
Levels and structures: Logico-semantic level; Syntactico-functional level: F-structures (functional); Syntagmatic level: C-structures (constituent); Morpho-syntactic level: Tagged text; Graphemic level: Text
Multilevel transfer; multilevel description: N levels in 1 structure (abstract constituent tree)
Columns: Type; Analysis; Transfer; Generation; Examples (cells listed in slide order)
• RBMT (1990-)
  Lemmatization?
  ECFG (govt & binding) [gram. rules]; interactive disambiguation
  if not enough space
  dictionary + tree transformations [rules]
  tree transformations [rules]
  MG: dictionary + grammars [rules]
  Examples: ITS (Geneva, 1990-); perhaps PT-2 (rather than SF)
Ascending multilevel transfer systems
Levels and structures: Logico-semantic level: SPA-structures (semantic & predicate-argument); Syntactico-functional level: F-structures (functional); Syntagmatic level: C-structures (constituent); Morpho-syntactic level: Tagged text; Graphemic level: Text
Ascending transfer; multilevel description: N levels in 1 structure
Columns: Type; Analysis; Transfer; Generation; Examples (cells listed in slide order)
• RBMT (1978-)
  lemmatization: dict. + Ndet FST [rules]
  tree transformations [rewriting rules]
  dictionary + tree transformations [rules]
  tree transformations [rules]
  MG: dict. + gram. [rules]
  Examples: Ariane-G5-based: ru-de→ru, en→my-th 80-87; fr→en (BV/aero) 85-92; fr→en-de-ru (LIDIA) 90-96; HICATS, Hitachi (1990-); Jemah, USM, NUS (1990-)
Multilevel concrete trees (umc-structures)
“The customers were not given their money back by the cashier, but by the waiter.”
S[type=assertive, time=past, aspect=perfective, tense=c-past, voice=passive…](
  NP[semrel=dest, logrel=arg2, synfunc=subj, sem=human, num=plur…](
    art[lex=‘the’, semrel=deict, synfunc=det, number=plur, deter=definite…]
    noun[lex=‘customer’, synfunc=head, sem=human, number=plur…])
  aux[lex=‘be’, tense=pret, pers=3, number=plur…]
  neg[lex=‘not’]
  vrb[lex=‘give’, synfunc=head, voice=passive, tense=ppart, vbpart=‘back’…]
  NP[semrel=patient, logrel=arg1, synfunc=obj1, number=sing…](
    adjposs[lex=‘his’, semrel=poss, synfunc=det, number=plur, deter=definite…]
    Noun[lex=‘money’, synfunc=head, number=sing…])
  vbpart[lex=‘back’]
  NP[semrel=agent, logrel=arg0, synfunc=agcomp, number=sing, neg=not-but…](
    prep[lex=‘by’, synfunc=reg]
    art[lex=‘the’, semrel=deict, synfunc=det, number=sing, deter=definite…]
    Noun[lex=‘cashier’, synfunc=head, sem=human, number=sing, neg=not…]
    NP[semrel=id, logrel=arg0, synfunc=coord, number=sing…](
      conj[lex=‘but’, synfunc=coo]
      prep[lex=‘by’, synfunc=reg]
      art[lex=‘the’, semrel=deict, synfunc=det, number=sing, deter=definite…]
      noun[lex=‘waiter’, synfunc=head, sem=human, number=sing…]))
  punct[lex=‘.’])
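A minimal sketch of how such a decorated tree might be held in memory, assuming a hypothetical `Node` class (label, decoration, daughters) that is not taken from any of the systems cited. It encodes only the first part of the structure above and shows two things: the frontier of a concrete tree gives back the lemmas in utterance order, and each level (syntactic function, logical relation, semantic relation) can be read off the same structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    label: str                                          # syntagmatic class or lexical category
    deco: Dict[str, str] = field(default_factory=dict)  # decoration: attribute -> value
    children: List["Node"] = field(default_factory=list)


def frontier(node: Node) -> List[str]:
    """Lexical units carried by the leaves, left to right."""
    if not node.children:
        return [node.deco["lex"]]
    return [lex for child in node.children for lex in frontier(child)]


def level_view(node: Node, attribute: str) -> List[Tuple[str, str]]:
    """Read one linguistic level off the daughters that carry the attribute."""
    return [(c.label, c.deco[attribute]) for c in node.children if attribute in c.deco]


# Fragment of the umc-structure, with a subset of the decorations shown on the slide.
tree = Node("S", {"voice": "passive"}, [
    Node("NP", {"semrel": "dest", "logrel": "arg2", "synfunc": "subj"}, [
        Node("art", {"lex": "the", "synfunc": "det"}),
        Node("noun", {"lex": "customer", "synfunc": "head"}),
    ]),
    Node("aux", {"lex": "be"}),
    Node("neg", {"lex": "not"}),
    Node("vrb", {"lex": "give", "synfunc": "head"}),
    Node("NP", {"semrel": "patient", "logrel": "arg1", "synfunc": "obj1"}, [
        Node("adjposs", {"lex": "his", "synfunc": "det"}),
        Node("noun", {"lex": "money", "synfunc": "head"}),
    ]),
    Node("vbpart", {"lex": "back"}),
])

print(frontier(tree))               # lemmas in utterance order
print(level_view(tree, "synfunc"))  # syntactic functions
print(level_view(tree, "logrel"))   # logical (predicate-argument) relations
print(level_view(tree, "semrel"))   # semantic relations
```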
The customers were not given their money back by the cashier, but by the waiter.
Concrete multilevel tree: head-driven, mix of constituent / dependency structures, projective
[Figure: tree drawing of this structure; its nodes are S, NP (art['The'], noun['customer']), aux['be'], neg['not'], vrb['give', passive, ppart], vbpart['back'], NP (patient: adjpos['his'], noun['money']), NP (agent: prep['by'], art['the'], noun['cashier']), NP (coord: conj['but'], prep['by'], art['the'], noun['waiter']), punct['.']]
The customers were not given their money back by the cashier, but by the waiter.
Abstract multilevel tree: same, but smaller (abstraction), & de-projectivized if necessary
[Figure: tree drawing with the following nodes]
S [type=assertive, time=past, aspect=perfective, tense=c-past, voice=passive…]
NP [semrel=ben, logrel=arg2, sf=subj…]
noun ['customer', def, sing…]
vrb ['give-back', passive, c-past, neg…]
NP [semrel=patient, logrel=arg1, sf=obj1, sing…]
noun ['money'…]
adjpos ['customer', refpos…]
NP [semrel=agent, logrel=arg0, sf=agcmp, sing…]
prep ['by', sf=reg…]
noun ['cashier', sf=head, def, sing…]
NP [semrel=id, logrel=arg0, sf=coord, sing…]
conj ['not-but', sf=coo…]
prep ['by', sf=reg…]
noun ['waiter', sf=head, def, sing…]
punct ['.']
Examples: raw MT for revision…
Language divergences are NOT handled contrastively…
…with BV-aero/FE (2)
… but by generating from an abstract (multilevel) representation
Semantic transfer systems
Levels and structures: Logico-semantic level: SPA-structures (semantic & predicate-argument), semantic transfer; Syntactico-functional level: F-structures (functional); Syntagmatic level: C-structures (constituent); Morpho-syntactic level: Tagged text; Graphemic level: Text
Columns: Type; Analysis; Transfer; Generation; Examples (cells listed in slide order)
• RBMT (1982-)
  segmentation, lemmatization [direct programming]
  tree transformations [rules]
  dictionary + tree transformations [rules]
  tree transformations [rules]
  MG: dict. + gram. [rules]
  Examples: MU, Kyodai, 82-87; MAJESTIC, JICST, 87-
Conceptual transfer systems (IL with separate lexicon)
Levels and structures: Interlingual level: Semantico-linguistic interlingua, conceptual transfer; Logico-semantic level: SPA-structures (semantic & predicate-argument); Syntactico-functional level: F-structures (functional); Syntagmatic level: C-structures (constituent); Morpho-syntactic level: Tagged text; Graphemic level: Text
Columns: Type; Enconversion; Conc. transfer; Deconversion; Examples (cells listed in slide order)
• RBMT (1980-)
  lemmatization [direct or rules]
  string-graph transformations [rules]
  in principle none
  graph-string transformations [rules]
  Examples: ATLAS-II, Fujitsu, 1980-; PIVOT, NEC, 1983-
• RBMT (1980-)
  DCG (?) [rules]
  Examples: ULTRA, NMSU, 89-95
• RBMT (1997-)
  depending on partners [rules (until now!)]
  navigation in set of UWs, UNL lexicon
  depending on partners [rules]
  Examples: UNL, 1996-
Interlingual structures: French-Korean by IF (1)
French-Korean by IF (2)
Knowledge-based systems: explicit understanding (IL linked with an ontology)
Levels and structures: Deep understanding level: Ontological interlingua; Interlingual level: Semantico-linguistic interlingua; Logico-semantic level: SPA-structures (semantic & predicate-argument); Syntactico-functional level: F-structures (functional); Syntagmatic level: C-structures (constituent); Morpho-syntactic level: Tagged text; Graphemic level: Text
Columns: Type; Enconversion; Mapping into the ontology; Deconversion; Examples (cells listed in slide order)
• KBMT (1980-)
  lemmatization & EPSG + f-structures + pseudo-unification [rules (using UP)]
  all but discourse elements: dict. + rules + interactive disambiguation
  planning deep-str, rec. descent [rules]
  Examples: KBMT-89, CMU, 1989-91; KANT/Catalyst, CMU + Caterpillar, en→fr-sp-de-?, 1992-
• RBMT (1997-)
  dictionary + FST [rules]
  IF is only a pragmatico-semantic representation: no mapping to the ontology
  dictionary + FST [rules]
  Examples: CSTAR-II & Nespole!, GETA 97-03, ETRI (Korea) 97-99
• SMT (2003-)
  learned from (string, IF) KB [statistical]
  no mapping to the ontology
  learned from (IF, string) KB [statistical]
  Examples: CSTAR-II & Nespole!, Irst 98-03; Mastor-1, IBM 2003
[recap] Size & cost of resources / MT architectures
Columns: Type; sentences of 6.5 w/s (BTEC, Meteo); sentences of 25 w/s (News)
• SMT, PSMT, analogical EBMT
  6.5 w/s: 0.9-3 Mw; 3.6-12 K pages; 0.15-0.5 M sentences; 2.4-8 m*y (already done!)
  25 w/s: 50-200 Mw; 200-800 K pages; 2-8 M sentences; 100-400 m*y (available?)
• EBMT with trees (MST, Mastor-1)
  6.5 w/s: N/A for short sent.
  25 w/s: supervised learning, 1 h/page?; 4-12.5 Mw; 15-50 K pages; 0.15-0.5 M sentences; 10-40 m*y (to do!)
• EBMT with trees and S-SSTCs (Banturjah, USM)
  6.5 w/s: N/A for short sent.
  25 w/s: supervised learning, 15 h/page!; dict. (50 K) available; 4-12.5 Mw; 0.6-1 K pages; 0.006-0.01 M sentences; 6-10 m*y (to do!)
• RBMT
  6.5 w/s: dict. 3-10 K, 0.6-2 m*y; total 1-3 m*y (to do!)
  25 w/s: dict. 50-500 K, 15-150 m*y; total 40-175 m*y
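The corpus sizes in the first row can be re-derived with simple arithmetic. The sketch below assumes 250 words per page, a value inferred from the slide's own figures (50 Mw for 200 K pages) rather than stated in it; the function name `corpus_size` is hypothetical.

```python
from typing import Tuple


def corpus_size(sentences: int, words_per_sentence: float,
                words_per_page: int = 250) -> Tuple[float, float]:
    """Return (words, pages) for a corpus of the given number of sentences.

    words_per_page=250 is an assumption inferred from the slide's figures.
    """
    words = sentences * words_per_sentence
    pages = words / words_per_page
    return words, pages


# News-like sentences (25 w/s): 2 to 8 M sentences.
print(corpus_size(2_000_000, 25))   # lower bound of the 25 w/s column
print(corpus_size(8_000_000, 25))   # upper bound of the 25 w/s column

# Short sentences (6.5 w/s): 0.15 to 0.5 M sentences; the slide rounds these values.
print(corpus_size(150_000, 6.5))
print(corpus_size(500_000, 6.5))
```

For 25 w/s this gives 50 to 200 Mw and 200 to 800 K pages, as on the slide; for 6.5 w/s it gives roughly 1 to 3.25 Mw, which the slide rounds to 0.9-3 Mw.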
What kind of IL to choose if that is the choice?
• IL + ontology
  • restricted domain
  • high precision applications, cf. CLang (Mooney)
  • beware, the ontology is costlier than gram+dict!
  • machine learning possible
• Pragmatico-semantic IL: IF of CSTAR/Nespole!
  • task- and domain-related: reservations in tourism, medical assistance
  • both MUST be restricted; works very well then
  • machine learning possible
• Semantico-linguistic IL: ATLAS-II, PIVOT; better: UNL
  • all domains/tasks: IL has to be grounded on a NL understandable by most developers anywhere
  • amenable to machine learning
Introduction to UNL
an anglo-semantic interlingua
Short presentation of UNL: the vision
A document in a given natural language, once 'enconverted' into UNL, may more easily be 'deconverted' into other languages & disseminated.
Its UNL representation can be manually improved if necessary.
[Figure: Source document (English) → Enconversion → Enconverted document (UNL) → Deconversions → Deconverted documents (French, Russian, Spanish, Chinese)]
Some words about UNL
UNL = Universal Networking Language
• Project started at UNU in 1996, with 12 languages
  • H. Uchida (author of ATLAS-II)
  • T. Della Senta (Dir. of IAS/UNU)
  • Groups from 12 countries, for the 12 most widespread languages
• Funding & organizational problems after 1998
  • Development of French, Russian, Spanish, Italian, Hindi, Thai, Portuguese continued on own resources
  • Opening of specs, tools in 1998 (see www.undl.org)
• Start of UNDL foundation in 2001 (Geneva, UNITAR bldg)
  • very high-level goals: link with ontologies, translation of EOLSS
  • no concrete, funded project
• Start of U++C in 2005
  • U++C = UNL Consortium, to promote development of the UNL language à la Linux & à la W3C
  • prepare & disseminate open source resources, tools
  • seek & support real applications of UNL → an experiment & evaluation on a Unesco web site (B@bel)
What is UNL?
UNL =
• a project
• a formal interlingua (IL) -- and maybe more!
• an html-based format for [companion files to] multilingual documents aligned at sentence level
Language: 1 utterance → 1 (hyper)graph
• UNL graph = abstract structure of an equivalent English utterance
• UNL symbols (UW, relations, attributes) constructed on English
UNL is understandable and usable by all developers in the world
The UNL approach is adequate for
• semi-automatic & incremental translation on "all domains"
• extension to a large number of languages
• other applications: CLIR, abstracting, gisting…
UNL is an improvement over the ATLAS-II pivot
• best E-J system in Japan for 20 years
• 5.57 M entries in the dictionaries (ATLAS-II v14, 12/2008)
A very simple example
Free Software Portal
16/4/09© Ch. Boitet —!NII MT lectures 43
• Graph = {relations between nodes bearing UWs & attributes}
A UNL graph is not always a tree
16/4/09© Ch. Boitet —!NII MT lectures 43
• Graph = {relations between nodes bearing UWs & attributes}
The dog watches its master.
A UNL graph is not always a tree
16/4/09© Ch. Boitet —!NII MT lectures 43
• Graph = {relations between nodes bearing UWs & attributes}
The dog watches its master.watch
masterdog
agt obj
pos
A UNL graph is not always a tree
16/4/09© Ch. Boitet —!NII MT lectures 43
• Graph = {relations between nodes bearing UWs & attributes}
The dog watches its master.watch
masterdog
agt obj
pos
agt(watch(icl>do,agt>thing,obj>thing).@entry,
dog(icl>animal).@def)
obj(watch(icl>do, agt>thing,obj>thing).@entry,
master(icl>human))
pos(dog(icl>animal).@def, master(icl>human))
A UNL graph is not always a tree
16/4/09© Ch. Boitet —!NII MT lectures 43
• Graph = {relations between nodes bearing UWs & attributes}
The dog watches its master.watch
masterdog
agt obj
pos
agt(watch(icl>do,agt>thing,obj>thing).@entry,
dog(icl>animal).@def)
obj(watch(icl>do, agt>thing,obj>thing).@entry,
master(icl>human))
pos(dog(icl>animal).@def, master(icl>human))
• A graph line :
agt(watch(icl>do,agt>thing,obj>thing).@entry,dog(icl>animal).@def)
A UNL graph is not always a tree
16/4/09© Ch. Boitet —!NII MT lectures 43
• Graph = {relations between nodes bearing UWs & attributes}
agt : binary relation 'defining a thing which initiates an action'
The dog watches its master.watch
masterdog
agt obj
pos
agt(watch(icl>do,agt>thing,obj>thing).@entry,
dog(icl>animal).@def)
obj(watch(icl>do, agt>thing,obj>thing).@entry,
master(icl>human))
pos(dog(icl>animal).@def, master(icl>human))
• A graph line :
agt(watch(icl>do,agt>thing,obj>thing).@entry,dog(icl>animal).@def)
A UNL graph is not always a tree
16/4/09© Ch. Boitet —!NII MT lectures 43
watch(icl>do…): 'universal word' or 'unit of virtual vocabulary' (UW) made of
- a 'headword' : watch
- a 'restriction' : icl>do,agt>thing,obj>thing # lexical disambiguation + arguments
• Graph = {relations between nodes bearing UWs & attributes}
agt : binary relation 'defining a thing which initiates an action'
The dog watches its master.watch
masterdog
agt obj
pos
agt(watch(icl>do,agt>thing,obj>thing).@entry,
dog(icl>animal).@def)
obj(watch(icl>do, agt>thing,obj>thing).@entry,
master(icl>human))
pos(dog(icl>animal).@def, master(icl>human))
• A graph line :
agt(watch(icl>do,agt>thing,obj>thing).@entry,dog(icl>animal).@def)
A UNL graph is not always a tree
16/4/09© Ch. Boitet —!NII MT lectures 43
A UNL graph is not always a tree
• Graph = {relations between nodes bearing UWs & attributes}
• Example: The dog watches its master.
[Figure: UNL graph with nodes watch, dog, master; arcs watch -agt-> dog, watch -obj-> master, dog -pos-> master]
agt(watch(icl>do,agt>thing,obj>thing).@entry, dog(icl>animal).@def)
obj(watch(icl>do,agt>thing,obj>thing).@entry, master(icl>human))
pos(dog(icl>animal).@def, master(icl>human))
• A graph line:
agt(watch(icl>do,agt>thing,obj>thing).@entry,dog(icl>animal).@def)
- agt: binary relation 'defining a thing which initiates an action'
- watch(icl>do…): 'universal word' or 'unit of virtual vocabulary' (UW) made of
  - a 'headword': watch
  - a 'restriction': icl>do,agt>thing,obj>thing # lexical disambiguation + arguments
- @entry, @def: 'attributes' specifying how the concept is used in the graph:
  - @entry means that the node is the graph entry;
  - @def specifies definiteness
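The claim in the slide title can be checked mechanically: in the example, master(icl>human) is the target of both the obj arc and the pos arc, so it has two parents. The following is a minimal sketch in Python, not part of any UNL tool; the function names (split_top_level, normalize, parse_relation, shared_nodes) are invented for this note, and it assumes every relation is written on one line as rel(node1,node2).

```python
import re
from collections import Counter


def split_top_level(text):
    """Split 'node1,node2' at the only comma that is outside all parentheses."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return text[:i], text[i + 1:]
    raise ValueError(f"no top-level comma in {text!r}")


def normalize(node):
    """Drop whitespace around commas so that equal nodes compare equal."""
    return re.sub(r"\s*,\s*", ",", node.strip())


def parse_relation(line):
    """Parse 'rel(node1,node2)' into the triple (rel, node1, node2)."""
    line = line.strip()
    label, rest = line.split("(", 1)
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced relation: {line!r}")
    source, target = split_top_level(rest[:-1])
    return label, normalize(source), normalize(target)


def shared_nodes(relations):
    """Nodes with more than one incoming arc; a tree has none."""
    indegree = Counter(target for _, _, target in relations)
    return sorted(node for node, n in indegree.items() if n > 1)


lines = [
    "agt(watch(icl>do,agt>thing,obj>thing).@entry, dog(icl>animal).@def)",
    "obj(watch(icl>do, agt>thing,obj>thing).@entry, master(icl>human))",
    "pos(dog(icl>animal).@def, master(icl>human))",
]
relations = [parse_relation(line) for line in lines]
print(shared_nodes(relations))  # the nodes that prevent the graph from being a tree
```

Comparing nodes as normalized strings is a simplification chosen for this sketch; it treats two occurrences of the same UW with the same attributes as one node.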
16/4/09© Ch. Boitet —!NII MT lectures 44
John knows that Peter will not come and regrets it.
[Figure: UNL hypergraph with nodes regret, know, John and the scope node :01; arcs labelled agt (twice), obj (twice) and "and". Scope :01 contains the arc come -agt-> Peter.]
agt:01(come.@entry.@future.@not,Peter)
This "scope" node (:01) of the graph is the subgraph described here.
A scope is a connected subgraph made of arcs sharing a "scope id" + their nodes.
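The definition of a scope (arcs sharing a scope id, plus their nodes) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names LABEL, parse and scope_contents are invented here, nodes are assumed to contain no commas, and the unscoped arc obj(know,:01) is my reading of the figure labels rather than a line given on the slide.

```python
import re
from collections import defaultdict

# rel, optional ':scope-id', then the two nodes in parentheses
LABEL = re.compile(r"^(?P<rel>[a-z]+)(?::(?P<scope>\w+))?\((?P<args>.*)\)$")


def parse(line):
    """Return (rel, scope_id or None, node1, node2) for one relation line."""
    m = LABEL.match(line.strip())
    if m is None:
        raise ValueError(f"not a relation line: {line!r}")
    # Splitting on the first comma is enough for nodes without restrictions.
    source, target = (part.strip() for part in m["args"].split(",", 1))
    return m["rel"], m["scope"], source, target


def scope_contents(lines):
    """Map each scope id to the set of nodes touched by its arcs."""
    nodes = defaultdict(set)
    for line in lines:
        rel, scope, source, target = parse(line)
        if scope is not None:
            nodes[scope].update((source, target))
    return dict(nodes)


lines = [
    "agt:01(come.@entry.@future.@not,Peter)",  # from the slide
    "obj(know,:01)",                           # assumed reading of the figure
]
print(scope_contents(lines))
```

In the main graph the whole scope is then referred to through the single node :01, as in the assumed obj(know,:01) arc.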
16/4/09© Ch. Boitet —!NII MT lectures 45
Any UNL graph can be "unfolded" into an auxiliary UNL-tree
Isaac sees that an apple falls and he explains it.
agt(explain(icl>do).@entry,Isaac(icl>proper noun))
obj(explain(icl>do).@entry,:01)
obj:01(fall(icl>occur).@entry,apple)
and(explain(icl>do).@entry,see(icl>do))
agt(see(icl>do),Isaac(icl>proper noun))
obj(see(icl>do),:01)
[Figure: UNL (hyper)graph with nodes explain, see, Isaac and the scope node :01 (fall -obj-> apple); Isaac and :01 are each reached from both explain and see.]
[Figure: UNL tree (auxiliary) with the same arcs; Isaac and :01 each appear twice, once under explain and once under see.]
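One way to picture the unfolding is a depth-first walk from the entry node that copies a node each time it is reached again. The sketch below is an assumption about how such an unfolding could be coded, not the procedure used by the author; unfold and count_nodes are invented names, node names are abbreviated to their headwords, the scope :01 is kept as a leaf, and cyclic graphs are rejected.

```python
from collections import defaultdict


def unfold(arcs, entry):
    """Unfold an acyclic graph into a tree (node, [(rel, subtree), ...])."""
    children = defaultdict(list)
    for rel, source, target in arcs:
        children[source].append((rel, target))

    def build(node, path):
        if node in path:
            raise ValueError(f"cycle through {node!r}; not handled by this sketch")
        below = path | {node}
        # A node reachable along two paths is rebuilt, hence duplicated.
        return (node, [(rel, build(child, below)) for rel, child in children.get(node, [])])

    return build(entry, frozenset())


def count_nodes(tree):
    """Number of nodes in the unfolded tree."""
    _, kids = tree
    return 1 + sum(count_nodes(sub) for _, sub in kids)


# Arcs of the main graph of the Isaac example (scope contents left out).
arcs = [
    ("agt", "explain", "Isaac"),
    ("obj", "explain", ":01"),
    ("and", "explain", "see"),
    ("agt", "see", "Isaac"),
    ("obj", "see", ":01"),
]
tree = unfold(arcs, "explain")
print(count_nodes(tree))  # 4 graph nodes become 6 tree nodes
```

The two extra tree nodes are the second copies of Isaac and :01 under see.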
16/4/09© Ch. Boitet —!NII MT lectures 46
UNL semantic relations 1/2
agt (agent): action -agt-> thing in focus which initiates it
and (conjunction): X -and-> Y, conjunctive relation between 2 concepts (word or phrase senses)
aoj (thing with attribute): state or attribute -aoj-> thing concerned
bas (basis): degree -bas-> thing used as the basis (standard) for a comparison
ben (beneficiary): event or state -ben-> indirect beneficiary or victim of it
cag (co-agent): action -cag-> thing not in focus which initiates it in parallel with the agent
cao (co-thing with attribute): state or attribute -cao-> thing not in focus concerned in parallel
cnt (content): X -cnt-> Y, where Y is a concept equivalent to X
cob (affected co-thing): implicit parallel event or state -cob-> thing directly affected
con (condition): focused event or state -con-> non-focused event or state which conditions it
coo (co-occurrence): focused event or state -coo-> co-occurring event or state
dur (duration): event or state -dur-> period of time during which it occurs or exists
fmt (range): X -fmt-> Y, range between two things (from X to Y)
frm (origin): X -frm-> Y, origin of thing X
gol (goal/final state): event -gol-> final state of an object or thing finally associated with its object
ins (instrument): event -ins-> thing used to carry it out
man (manner): event or state -man-> way to carry out the event or to characterize the state
met (method): event -met-> method to carry it out
mod (modification): focused thing -mod-> thing which restricts it
nam (name): thing -nam-> a name of that thing
obj (affected thing): event or state -obj-> thing in focus directly affected by it
16/4/09© Ch. Boitet —!NII MT lectures 47
UNL semantic relations 2/2
opl (affected place): event -opl-> place in focus where it has effects
or (disjunction): X -or-> Y, disjunctive relation between 2 concepts (word or phrase senses)
per (proportion, rate or distribution): X -per-> thing used as basis (standard) or unit of proportion, rate or distribution of X
plc (place): event or state or thing -plc-> place where it occurs or is true or exists
plf (initial place): event or state -plf-> place where it begins or becomes true
plt (final place): event or state -plt-> place where it ends or becomes false
pof (part-of): focused thing -pof-> thing of which it is a part
pos (possessor): thing -pos-> possessor of it
ptn (partner): action -ptn-> indispensable non-focused initiator of it
pur (purpose or objective): event or existing thing -pur-> purpose or objective of an event or purpose of a thing
qua (quantity): thing or unit -qua-> quantity of it
rsn (reason): event or state -rsn-> reason that it happens
scn (scene): event or state or thing -scn-> virtual world where it occurs or is true or exists
seq (sequence): focused event or state -seq-> prior event or state
src (source/initial state): event -src-> initial state of an object or thing initially associated with its object
tim (time): event or state -tim-> time at which it occurs or is true
tmf (initial time): event or state -tmf-> time at which it starts or becomes true
tmt (final time): event or state -tmt-> time at which it ends or becomes false
to (destination): X -to-> Y, destination of thing X
via (intermediate place or state): event or state -via-> intermediate place
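Since the relation labels form a closed list (41 on these two slides), a graph listing can be checked for mistyped labels. The sketch below is only an illustration: RELATIONS simply copies the labels from the two slides, unknown_labels is an invented helper, and the line frt(X,Y) is a made-up example of a typo.

```python
import re

# The relation labels listed on the two slides above.
RELATIONS = frozenset(
    """agt and aoj bas ben cag cao cnt cob con coo dur fmt frm gol ins man met mod nam obj
    opl or per plc plf plt pof pos ptn pur qua rsn scn seq src tim tmf tmt to via""".split()
)


def unknown_labels(lines):
    """Return the lines whose relation label is not in the list."""
    bad = []
    for line in lines:
        # Label, then an optional ':scope-id', then the opening parenthesis.
        m = re.match(r"\s*([a-z]+)(?::\w+)?\(", line)
        if m is None or m.group(1) not in RELATIONS:
            bad.append(line)
    return bad


lines = [
    "pos(dog(icl>animal).@def, master(icl>human))",
    "obj:01(fall(icl>occur).@entry,apple)",
    "frt(X,Y)",  # made-up typo for fmt
]
print(unknown_labels(lines))
```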
16/4/09© Ch. Boitet —!NII MT lectures 48
UNL attributes
Attributes specify how concepts are used in a given graph (tense, aspect, determination, number, etc.)
@entry: graph entry
@def: determination
@pl: plural
agt(watch(agt>thing,obj>thing).@entry,dog(icl>animal).@def.@pl)
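A node such as dog(icl>animal).@def.@pl therefore has three parts: the headword, the optional restriction in parentheses, and a chain of attributes. A minimal Python sketch of that decomposition follows; NODE and parse_node are invented names, and the pattern is an assumption that covers the node forms appearing in these slides, not a full grammar of UNL.

```python
import re

# headword, optional '(restriction)', then zero or more '.@attribute'
NODE = re.compile(r"(?P<head>[^(]+?)(?:\((?P<restr>.*)\))?(?P<attrs>(?:\.@\w+)*)")


def parse_node(text):
    """Return (headword, restriction or None, [attributes]) for one node."""
    m = NODE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a node: {text!r}")
    attributes = re.findall(r"@(\w+)", m["attrs"])
    return m["head"], m["restr"], attributes


for node in (
    "dog(icl>animal).@def.@pl",
    "watch(agt>thing,obj>thing).@entry",
    "come.@entry.@future.@not",
):
    print(parse_node(node))
```

The headword is matched lazily so that a dot inside it (as in a site name) is not mistaken for the start of the attribute chain.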
16/4/09© Ch. Boitet —!NII MT lectures 49
UNL attributes (examples)
Time:
@past happened in the past
@present happening at present
@future will happen in future
Aspect:
@begin beginning of an event or a state
@complet finishing/completion of a (whole) event
@continue continuation of an event
@custom customary or repetitious action
@end end/termination of an event or a state
@experience experience
@progress an event is in progress
@repeat repetition of an event
@state final state or the existence of the object on which an action has been taken
The preceding attributes may be modified by the following ones:
@just
@soon
@yet
16/4/09© Ch. Boitet —!NII MT lectures 50
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
agt
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
agt
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
agt
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
agtobj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
agtobj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
agtobj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
agtobj
gol
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
modobj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod:01
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod:01
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agtobj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
and
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
and
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
and
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
andagt
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@pl
agt
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@pl
agt
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@plobjagt
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@plobj
payment(icl>action)
.@indef.@pl
agt
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@plobj
payment(icl>action)
.@indef.@pl
agt
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@plobj
payment(icl>action)
.@indef.@pl
obj
agt
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@plobj
payment(icl>action)
.@indef.@pl
obj
online(icl>place)
agt
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@plobj
payment(icl>action)
.@indef.@pl
obj
online(icl>place)
agt
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@plobj
payment(icl>action)
.@indef.@pl
obj
online(icl>place)mod
agt
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@plobj
payment(icl>action)
.@indef.@pl
obj
online(icl>place)mod
agt
man
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@plobj
payment(icl>action)
.@indef.@pl
obj
online(icl>place)mod
language(icl>system)
.@def
agt
man
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@plobj
payment(icl>action)
.@indef.@pl
obj
online(icl>place)mod
language(icl>system)
.@def
agt
man
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@plobj
payment(icl>action)
.@indef.@pl
obj
online(icl>place)mod
language(icl>system)
.@def
ins
agt
man
obj
A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.
16/4/09© Ch. Boitet —!NII MT lectures 50
provide(agt>thing,obj>thing)
.@entry
attavik.net(icl>entity)
system(icl>method)
.@indef
management(icl>activity)
.@def
content(icl>information)
agtobj
gol
mod
gol
:01
write(agt>human,obj>thing)
.@entry
manage(icl>treat
(agt>volitional thing,obj>thing)
and
speaker(icl>role)
.@indef.@pl
agt
native(mod<human)
mod
offer(icl>give(agt>thing,
gol>thing,obj>thing))
anddocument(icl>information)
.@indef.@plobj
payment(icl>action)
.@indef.@pl
obj
online(icl>place)mod
language(icl>system)
.@def
ins
A more complex example (22-24 words):
It provides a content management system that allows native speakers to write, manage documents and offer online payments in the Inuit language.
[Figure: UNL graph of this sentence, with entry node provide(agt>thing,obj>thing), a scope :01 grouping write and manage, and arcs labeled agt, obj, gol, mod, and, ins, man.]
A more complex example (22-24 words):
It provides a content management system that allows native speakers to write, manage documents and offer online payments in the Inuit language.
agt(provide(agt>thing,obj>thing).@entry, attavik.net(icl>entity))
obj(provide(agt>thing,obj>thing).@entry, system(icl>method).@indef)
gol(system(icl>method).@indef, management(icl>activity).@def)
obj(management(icl>activity).@def, content(icl>information))
gol(provide(agt>thing,obj>thing).@entry, :01)
and:01(manage(icl>treat(agt>volitional thing,obj>thing)), write(agt>human,obj>thing).@entry)
obj(:01, document(icl>information).@indef.@pl)
agt(:01, speaker(icl>role).@indef.@pl)
mod(speaker(icl>role).@indef.@pl, native(mod<human))
and(offer(icl>give(agt>thing,gol>thing,obj>thing)), :01)
obj(offer(icl>give(agt>thing,gol>thing,obj>thing)), payment(icl>action).@indef.@pl)
mod(payment(icl>action).@indef.@pl, online(icl>place))
ins(offer(icl>give(agt>thing,gol>thing,obj>thing)), language(icl>system).@def)
mod(language(icl>system).@def, Inuit(icl>language))
agt(offer(icl>give(agt>thing,gol>thing,obj>thing)), speaker(icl>role).@indef.@pl)
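The relation list above is regular enough to be read by a small program. The following Python sketch is only an illustration: the names `Relation`, `split_node` and `parse_relation` are hypothetical, and it assumes only the surface syntax visible on the slide (a label, an optional `:scope`, two arguments separated by a top-level comma, trailing `.@` attributes), not the full UNL specification.

```python
import re
from typing import NamedTuple, Tuple


class Relation(NamedTuple):
    label: str                      # relation name, e.g. "agt", "obj", "and"
    scope: str                      # scope id for "and:01(...)", else ""
    source: str                     # first argument without attributes
    source_attrs: Tuple[str, ...]
    target: str                     # second argument without attributes
    target_attrs: Tuple[str, ...]


# Trailing attribute list such as ".@indef.@pl" (possibly empty).
_ATTRS = re.compile(r"((?:\.@\w+)*)$")


def split_node(text: str):
    """Split 'uw.@a.@b' into ('uw', ('a', 'b'))."""
    text = text.strip()
    attrs = _ATTRS.search(text).group(1)
    uw = text[: len(text) - len(attrs)]
    return uw, tuple(a for a in attrs.split(".@") if a)


def parse_relation(line: str) -> Relation:
    """Parse one line of the assumed form rel[:scope](node1, node2)."""
    line = line.strip()
    head, sep, rest = line.partition("(")
    if not sep or not rest.endswith(")"):
        raise ValueError(f"not a relation: {line!r}")
    label, _, scope = head.partition(":")
    body = rest[:-1]  # drop the closing parenthesis of the relation
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:  # comma separating the two arguments
            source, source_attrs = split_node(body[:i])
            target, target_attrs = split_node(body[i + 1:])
            return Relation(label, scope, source, source_attrs,
                            target, target_attrs)
    raise ValueError(f"expected two arguments: {line!r}")


# Three relations copied from the slide.
lines = [
    "agt(provide(agt>thing,obj>thing).@entry, attavik.net(icl>entity))",
    "and:01(manage(icl>treat(agt>volitional thing,obj>thing)), "
    "write(agt>human,obj>thing).@entry)",
    "obj(:01, document(icl>information).@indef.@pl)",
]
relations = [parse_relation(l) for l in lines]
```

Commas inside a universal word, as in `provide(agt>thing,obj>thing)`, are skipped by tracking parenthesis depth, so only the comma between the two arguments splits the relation.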
Resources available from UNL centers

Language     Who                  Where        Enconv  Deconv  Dict.    Remarks
UNL-Center   H. Uchida, M.Y. Zhu  Tokyo
UNDL Found.  T. Della-Senta       Geneva       —
English      UNL-C                Tokyo        —       —       100000
             IPPI                 Moscow       +       +       45000
Arabic       RSS                  Amman        ?       +       +        not active
             Daoud                Amman univ.  +       +       +        related CATS proj.
             Bibl. Alexandr.      Alexandria   +       +       +        EOLSS proj.
Armenian     LAI                  Yerevan      —       +?      ?
Chinese      XMT (Pr Shi)         Xiamen       +       +       90000?   gives his system
Korean       KAIST                Taejon       —       —       —        never active
French       GETA                 Grenoble     +       ++      45000
German       IAI, DFKI            Saarbrücken  —       —       —        not active
Hindi        IITB                 Bombay       +       ++      30000?   + UNL applications
Marathi                                        ?       +       ?
Indonesian   AIA (BPPT)           Jakarta      —       +—      40000?   server inactive
Italian      ILC                  Pisa         ?       ++      50000
Japanese     UNL-C                Tokyo        ?       +       100000
Lithuanian   AIL (Spektor)        Riga         ——      —       —
Mongolian    Delgerjav            Toho univ.   —       —       —        inactive since 2000
Portuguese   IFSC, USC            Brazil       —       +?      ?
Russian      IPPI                 Moscow       +       ++      45000
Spanish      UPM                  Madrid       +       ++      45000
Swahili      ?                    ?            —       —       —        never active
Thai         NECTEC               Bangkok      —       +       70000?   inactive since 2003
Embedding a comparative task-related evaluation of UNL in a real translation task
• Experimental setting (2004-05)
  Unesco official languages: arb, chi, eng, fre, rus, spa
  The Unesco B@bel web site has
  • a multilingual architecture: dynamic web pages, contents from a multilingual SQL database
  • 100% in English, 10% in French, 0% in other languages
  • 42301 words (173 standard pages, ≈2980 sentences) in English
• Goals
  Translate it: eng → fre, rus, spa + chi
  • using available MT systems
  • measure human work (task-related post-editing in translator's mode)
  Produce UNL graphs for the most complex sentences (≈1000)
  • measure times (graph production, dict. & deconverter update)
  • post-edit deconversions in translator's mode (UNL = source language)
  Compare and evaluate future usability of UNL
  • for comparable tasks
  • with N target languages, spreading the "UNL cost" over all of them
Translation of the Unesco B@bel web site
• Partners: GETA (fre, chi), IPPI (rus), UPM (spa)
• Systran used for eng → fra, spa, chi
• ETAP-3 used for eng → rus
• Post-editing in "translator's mode" by each partner for his target language(s)
Potential actual and future gains
Results and projections for 10 & 20 target languages
UNL-sa1/sa2: mid-/long-term speed-ups in graph creation
• Times are in minutes and for 1 page of 250 words

10 target languages
             Simple (12 w/s)                        Complex (25 w/s)
Text type    1st draft  Bil. Rev  UNL Rev  Tot      1st draft  Bil. Rev  UNL Rev  Tot
H only       45         15        —        60       60         20        —        80
H+TM         20         5         —        25       30         10        —        40
MT-gen       0          15        —        15       0          25        —        25
MT-spec      0          5         —        5        0          15        —        15
UNL-man      120        —         10       22       240        —         10       34
UNL-sa1      20         —         8        10       30         —         8        11
UNL-sa2      10         —         5        6        15         —         5        6.5

20 target languages (UNL-man time is spread over them)
UNL-man      120        —         10       16       240        —         10       22
UNL-sa1      20         —         8        9        30         —         8        9.5
UNL-sa2      10         —         5        5.5      15         —         5        5.7
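The UNL totals in this table are consistent with a simple amortization: the one-time graph creation time (1st draft) is divided by the number N of target languages and added to the per-language UNL revision time, for example 120/10 + 10 = 22. The sketch below reproduces these figures; the function names are hypothetical and the formula is inferred from the numbers, not stated on the slide.

```python
def unl_total(first_draft_min, unl_revision_min, n_targets):
    """Minutes per page and per target language when the graph creation
    time is shared by all n_targets languages (inferred formula)."""
    return first_draft_min / n_targets + unl_revision_min


def break_even_targets(first_draft_min, unl_revision_min, other_total_min,
                       max_n=1000):
    """Smallest number of target languages for which the UNL route is
    strictly cheaper than another route, or None if none up to max_n."""
    for n in range(1, max_n + 1):
        if unl_total(first_draft_min, unl_revision_min, n) < other_total_min:
            return n
    return None


# (1st draft, UNL revision) in minutes per 250-word page, simple texts.
simple = {"UNL-man": (120, 10), "UNL-sa1": (20, 8), "UNL-sa2": (10, 5)}

totals_10 = {name: unl_total(d, r, 10) for name, (d, r) in simple.items()}
totals_20 = {name: unl_total(d, r, 20) for name, (d, r) in simple.items()}
```

With the same formula, the complex-text UNL-sa2 entry for 20 languages is 15/20 + 5 = 5.75, which the slide shows as 5.7. Under this reading, `break_even_targets(120, 10, 15)` indicates how many target languages manual graph creation would need before it beats MT-gen (15 minutes) on simple texts.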
Current developments & perspectives
• U++ Consortium: www.unl.fi.upm.es/consorcio/
• CWL (W3C) incubator www.w3.org/2005/Incubator/cwl/
• EOLSS/UnescoL project done in 2008
How to build enconverters / deconverters?
• Start from existing MT systems, if any
  "bridge the gaps"
• For π-languages (under-resourced languages)
  use corpora (L, UNL) available from translations + UNL
Idea on how to bridge the gaps
[Figure: diagram of representation levels per language, with existing systems placed on it.
Levels: text_fr, concr-str_fr, abstr-str_fr; text_en, concr-str_en, abstr-str_en, gm-str_en; text_ru, concr-str_ru, abstr-str_ru; text_sp, c-str_sp, a-str_sp; text_hi, c-str_hi, a-str_hi; pivot (UNL, IF).
Systems: Systran, Reverso, LMT, METAL (Comprendium); Lingenio; MSR; B'VITAL/FE/aéro (GETA); ETAP-3 (IPPI); UNL-FR (GETA).]
UNL and computerizing π-languages
Let Lt be a π-language (poorly computerized: THA, LAO, VIE…)
• Start from an aligned corpus {(text_Ls, text_Lt)}n
  where Ls is a "rich" language (FRA, ENG, RUS, SPA…)
• Construct (semi-automatically) the corpus {(text_Ls, UNL)}n
• Using the obtained aligned corpus {(UNL, text_Lt)}n
  • build the UNL-trees relative to Lt: {(tree-UNL_Lt, text_Lt)}n
  • align to obtain abstract dependency trees of Lt: {(tree-dep_Lt, text_Lt)}n
  • use machine learning to build a dependency analyzer for Lt
  • program the transformation tree-dep_Lt → tree-UNL_Lt
• This produces a deconverter and an enconverter for Lt
  and derived products (analyzer, lexical correspondences…)
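The first steps of this procedure amount to joining two corpora on their shared source sentences: {(text_Ls, text_Lt)} and {(text_Ls, UNL)} give {(UNL, text_Lt)}. A minimal sketch of that join, assuming exact string equality of the source sentences; the function name and the sample data are hypothetical placeholders, not real corpus content.

```python
def pair_unl_with_target(bitext, source_to_unl):
    """Join {(text_Ls, text_Lt)} with {(text_Ls, UNL)} on text_Ls.

    bitext:        list of (text_Ls, text_Lt) pairs
    source_to_unl: list of (text_Ls, unl_graph) pairs
    Returns the list of (unl_graph, text_Lt) pairs for the source
    sentences that have a UNL graph.
    """
    unl_by_source = dict(source_to_unl)
    return [(unl_by_source[src], tgt)
            for src, tgt in bitext
            if src in unl_by_source]


# Placeholder data: the second sentence has no UNL graph yet.
bitext = [
    ("source sentence 1", "Lt sentence 1"),
    ("source sentence 2", "Lt sentence 2"),
]
source_to_unl = [("source sentence 1", "UNL graph 1")]

unl_lt = pair_unl_with_target(bitext, source_to_unl)
```

Sentences without a UNL graph are simply left out, which matches the semi-automatic construction of the (text_Ls, UNL) corpus: the joined corpus grows as more graphs are produced.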
UNL-graph ↔ UNL-tree
Reversible and systematic transformation (simple here)
Example: The city will recover a coastal zone after the Forum
[Figure: UNL graph of the example. Nodes: recover(icl>do).@entry.@future.@complete, city(icl>entity).@def, zone(icl>place).@indef, forum(icl>proper noun).@def, after, coastal(aoj<thing). Arc labels: agt, obj, tim, obj, mod.]
UNL-tree ↔ Spanish text
Remark: we work with imperfect data
1. Here, 2 errors remain (de costal, el Foro)
2. Feedback to the developers of deconverters is planned
UNL-tree ↔ Chinese text
The alignment (UNL-tree, text) is not always projective
The UNL-tree is an abstract dependency tree at the semantic level
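Projectivity can be tested mechanically once each tree node is aligned with a position in the text. The sketch below uses the standard definition for dependency trees: an arc from a head to a dependent is projective if every word lying between them is dominated by the head. It assumes, as a simplification not made on the slide, that every node is aligned with exactly one word; the function name and the toy trees are hypothetical.

```python
def is_projective(heads):
    """heads[i] is the text position of the head of word i, -1 for the root.

    The input is assumed to be a well-formed tree (no cycles).
    """
    def dominated_by(node, ancestor):
        # Walk up from node to the root, looking for ancestor.
        while node != -1:
            if node == ancestor:
                return True
            node = heads[node]
        return False

    for dep, head in enumerate(heads):
        if head == -1:
            continue
        lo, hi = sorted((dep, head))
        # Every word strictly between head and dependent must be
        # a descendant of the head.
        if not all(dominated_by(k, head) for k in range(lo + 1, hi)):
            return False
    return True


# Toy trees over text positions 0..n-1.
nested = [1, -1, 1]        # word 1 is the root, words 0 and 2 depend on it
crossing = [2, 3, -1, 2]   # arcs 2-0 and 3-1 cross each other
```

Here `nested` is projective and `crossing` is not, because word 2 lies between word 3 and its dependent word 1 without being dominated by word 3.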
Recap & first conclusions
• The distinctions RBMT, EBMT, an-EBMT, (P)SMT… concern the computational architecture only (PROCESSES)
• The rawer the corpora, the larger they must be
  SMT/PSMT is for niches for the rich (languages, texts)
  • few parallel corpora of 200-800 K pages
  • to build them from scratch is 2 to 3 times more expensive than to build a classical large RBMT system
• IL-based MT can use any computational framework
  statistical, analogical, rule-based, hybrid
  all depends on available corpus / linguistic / human resources
• Many applications need an adequate IL: all applications needing to
  • manipulate content
  • in a strongly multilingual setting
• That is possible
  with empirical as well as expert / hybrid architectures