L2: Linguistic architectures of MT systems
Christian BOITET, GETALP-LIG-UJF, Grenoble, France




16/4/09 © Ch. Boitet, NII MT lectures

Introductory words

• There was no time to show details of the various linguistic architectures in lecture 1 (L1), but we went into new territory and detailed notions that usually remain implicit:
  • workflow in HT and MT (MT#1 or automated MT)
  • how to measure C, A, Q (ling. quality) for the "C.A.Q MT theorem"

• The goals of L2 will be to:
  • make explicit the basis of linguistic architectures in MT
  • justify the independence of linguistic and computational architectures
  • at the same time, present various intermediate structures
  • discuss their pros and cons.


Outline of lecture 2

• Recap: linguistic, computational, operational architectures

• Kinds of representations usable in principle
  • Linguistic level: monolevel and multilevel structures
  • Scope (what are the units of translation?): segments, infrasegments, supersegments, whole documents?
  • Geometry: strings, trees, charts/lattices, (hyper)graphs, logical forms
  • Abstractness: type of <string, structure> correspondence

• Various kinds of representations really used in MT (using existing systems as examples)
  • morphosyntactic structures
  • syntactic structures: c-structure, f-structure
  • logico-semantic structures: spa-structure

• Different sorts of deep pivots (using UNL as an example)
  • hybrid, semantico-linguistic, semantico-pragmatic
  • UNL (semantico-linguistic)

• Recap and conclusions
  • Pros and cons of various linguistic architectures w.r.t. translational situations


Linguistic vs. computational architectures

Linguistic architecture
• objects: intermediate representations (see Vauquois' triangle)
• direct, semi-direct, transfer (" 7 variants): 2 lexical spaces
• IL (" 2 variants): 3 lexical spaces

Computational architecture
• automatic processes; human interaction, if any
• programming paradigms:
  • direct programming
  • RBMT (rules, automata…)
  • corpus-based: SMT, PSMT (unsupervised); EBMT (" 3 variants), ± supervised
  • hybrid


Linguistic architectures in MT: Vauquois' triangle


Computational architectures

[Figure: a phase takes Intermediate Representation i to Intermediate Representation i+1. The figure classifies the computational type of the phase as follows.]

• procedural
• rule-based (expert):
  • well-formedness: grammar rules
  • transitions of transduction automata
  • rewriting rules of transformational grammars
• empirical:
  • statistical (probabilistic): SMT
  • example-based: EBMT
  • analogy-based MT: ABMT
  • data: raw parallel corpora; annotated parallel corpora (trees, S-SSTC)
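The idea that each phase maps one intermediate representation to the next, whatever its computational type, can be sketched in a few lines of Python. This is only an illustration: the `Phase` class, the toy lexicon and the tiny aligned word list are assumptions made for the example, not material from the lecture. The same three-phase linguistic architecture is run once with a hand-written (rule-based) transfer lexicon and once with a lexicon learned by counting (empirical).

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Phase:
    name: str                  # e.g. "analysis", "transfer", "generation"
    kind: str                  # computational type: procedural, rule-based, empirical
    run: Callable[[Any], Any]  # maps representation i to representation i+1


def run_phases(phases: List[Phase], source: Any) -> Any:
    """Apply the phases in order, feeding each output to the next phase."""
    rep = source
    for phase in phases:
        rep = phase.run(rep)
    return rep


# Hypothetical toy data (not from the lecture).
HAND_LEXICON = {"chat": "cat", "noir": "black"}
ALIGNED_WORDS = [("chat", "cat"), ("noir", "black"), ("noir", "black"), ("noir", "dark")]


def learn_lexicon(pairs):
    """Keep, for each source word, its most frequent aligned target word."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


def make_pipeline(lexicon, transfer_kind):
    """Same sequence of representations; only the transfer resource differs."""
    return [
        Phase("analysis", "procedural", lambda text: text.lower().split()),
        Phase("transfer", transfer_kind, lambda words: [lexicon.get(w, w) for w in words]),
        Phase("generation", "procedural", lambda words: " ".join(words)),
    ]


rule_based = make_pipeline(HAND_LEXICON, "rule-based")
empirical = make_pipeline(learn_lexicon(ALIGNED_WORDS), "empirical")
print(run_phases(rule_based, "Chat noir"))
print(run_phases(empirical, "Chat noir"))
```

Both pipelines go through the same representations (text, word list, word list, text); only the way the transfer phase is obtained changes.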


Linguistic and computational MT architectures are independent (recap)


Kinds of representations usable in principle

• Linguistic levels: monolevel and multilevel structures

• Scope (what are the units of translation?): segments, infrasegments, supersegments, whole documents?

• Geometry: strings, trees, charts/lattices, (hyper)graphs, logical forms

• Abstractness: type of <string, structure> correspondence; concrete vs. abstract is a different distinction from surface vs. deep


Formalized representations of texts

• Linguistic levels: surface; deep; 1-stratal (monolevel); n-stratal (multilevel)

• Main linguistic organisation: syntagms (constituents); dependencies; logical and semantic relations

• Geometrical structure: string; graph of strings (chart); tree; graph / network; hypergraph

• Algebraic structure: label; structured label; Boolean features; structured attributes (vectors); feature structures (± typed)

• Text-structure correspondence: concrete (text readable from the structure); totally abstract (ex. UNL graph)

• Scope: sentence (! all); paragraph; page (Ariane-G5, Sygmart); document
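One way to make these dimensions concrete is to describe each kind of representation by a small record. The following Python sketch is only an illustration: the class and field names are invented for the example, and the values given for the two sample descriptors (in particular the geometry and scope chosen for the UNL graph) are assumptions, except that the table itself cites the UNL graph as totally abstract.

```python
from dataclasses import dataclass
from enum import Enum


class Geometry(Enum):
    STRING = "string"
    CHART = "graph of strings (chart)"
    TREE = "tree"
    GRAPH = "graph / network"
    HYPERGRAPH = "hypergraph"


class Correspondence(Enum):
    CONCRETE = "concrete"          # the text can be read off the structure
    ABSTRACT = "totally abstract"  # no direct text-structure correspondence


class Scope(Enum):
    SENTENCE = "sentence"
    PARAGRAPH = "paragraph"
    PAGE = "page"
    DOCUMENT = "document"


@dataclass(frozen=True)
class RepresentationType:
    levels: tuple                  # linguistic levels present in the structure
    geometry: Geometry
    correspondence: Correspondence
    scope: Scope

    @property
    def multilevel(self) -> bool:
        """n-stratal (multilevel) if several levels live in one structure."""
        return len(self.levels) > 1


# Illustrative descriptors (assumed values, see the lead-in).
umc_tree = RepresentationType(
    levels=("syntagmatic", "syntactico-functional", "logico-semantic"),
    geometry=Geometry.TREE,
    correspondence=Correspondence.CONCRETE,
    scope=Scope.SENTENCE,
)
unl_graph = RepresentationType(
    levels=("interlingual",),
    geometry=Geometry.GRAPH,
    correspondence=Correspondence.ABSTRACT,
    scope=Scope.SENTENCE,
)
print(umc_tree.multilevel, unl_graph.multilevel)
```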


MT architectures on which we work


• Rule-based MT (symbolic) for sub-languages
• All-domain MT via UNL
• Translation Memory based MAHT
• Analogy-based MT
• Example-based MT (S-SSTC)
• Statistico-structural MT


Direct translation systems

Level involved: graphemic level (text); direct translation.

• RBMT (1975-)
  Steps: segmentation, word-for-word translation
  Method: FST (rules + dictionary); rules
  Comments: OK for very near languages (Japanese-Korean, Hindi-Urdu)
  Examples: ATLAS-I (Fujitsu, 76-78)

• SMT (1980-)
  Steps: segmentation, reordering…
  Method: alignment + "decoding"; statistical
  Comments: SMT = first idea about MT, from war cryptographers (W. Weaver 1949)
  Examples: many SMT systems (IBM… 1980-)

• EBMT (2000-)
  Steps: no preprocessing ("pure" EBMT)
  Method: analogy resolution + n-gram filtering; analogical
  Comments: results $ those of SMT; Nagao 1984 (similarity MT), Lepage 2000 (real analogy)
  Examples: ALEPH (ATR, 2000-)


Structures

• String of typographical words: writing systems with word separators
  • English, French…

• Confusion network or lattice at character level: after unsure or unfinished OCR (very rare in MT, sometimes for military intelligence)
  • Several strings of typographical words, with scores

• Segmentation lattice or chart: writing systems without word separators
  • Japanese, Chinese, Korean, Thai, Lao, Khmer, Vietnamese… (in Vietnamese, white spaces separate syllables, not words)
  • scored nodes or edges
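A segmentation lattice (chart) can be sketched as a set of edges over character positions, each edge carrying one possible word. The Python sketch below is a minimal illustration under stated assumptions: the function names and the toy lexicon are invented, an English string without spaces stands in for a writing system without word separators, and the scores mentioned on the slide are left out.

```python
def build_lattice(text, lexicon):
    """Return chart edges (start, end, word) for every lexicon word found in text."""
    edges = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in lexicon:
                edges.append((start, end, text[start:end]))
    return edges


def segmentations(text, lexicon):
    """Enumerate every path through the lattice that covers the whole text."""
    edges = build_lattice(text, lexicon)

    def paths(pos):
        if pos == len(text):
            yield []
            return
        for start, end, word in edges:
            if start == pos:
                for rest in paths(end):
                    yield [word] + rest

    return list(paths(0))


# Hypothetical toy lexicon.
TOY_LEXICON = {"no", "now", "where", "here", "nowhere"}
for seg in segmentations("nowhere", TOY_LEXICON):
    print(seg)
```

A later phase would pick one path, for instance by attaching scores to the edges and keeping the best-scoring path.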


Semi-direct translation systems

Levels involved: morpho-syntactic level (tagged text); graphemic level (text). Transfer type: semi-direct translation.

• 1G-MT (1950-)
  Analysis: program-based segmentation & lemmatization; procedural
  Transfer: dictionary consultation + reordering "macros"; procedural
  Generation: tables + string macros; procedural
  Examples: GAT (Georgetown; Ispra, 1965-69); SPANAM-1 (PAHO, 1975?-); GLOBALINK (" Spanam-1, PAHO)

• SMT (1990-)
  Analysis: segmentation & lemmatization; statistical
  Transfer: alignment + "decoding"; statistical
  Generation: language model; statistical
  Examples: Candide (IBM, 1980-); many SMT systems (NIST, IWSLT, Google (?))

• Pidgin translation
  Analysis: segmentation (snobol4); lemmatization (rules)
  Transfer: transduction + reordering (Q-systems rules on tree charts)
  Generation: morphological generation (rules); formatting (snobol4)
  Examples: idea of B. Harris (TAUM, translatologist); rus # eng, fre (Boitet 1972)
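The three steps of a semi-direct system (lemmatization into tagged text, dictionary consultation with a reordering "macro", then morphological generation) can be sketched as follows. Everything in this Python example is a toy assumption: the lemma table, the bilingual dictionary, the single reordering rule and the pluralization rule are invented for illustration and do not describe any of the systems listed above.

```python
# Hypothetical toy resources for French -> English.
LEMMAS = {
    "les": ("le", "DET", "plur"),
    "chats": ("chat", "N", "plur"),
    "noirs": ("noir", "ADJ", "plur"),
}
BILINGUAL = {"le": "the", "chat": "cat", "noir": "black"}


def analyse(text):
    """Segmentation and lemmatization: text -> tagged text."""
    return [LEMMAS[word] for word in text.lower().split()]


def transfer(tagged):
    """Dictionary consultation, then one reordering macro (N ADJ -> ADJ N)."""
    out = [(BILINGUAL[lemma], tag, num) for lemma, tag, num in tagged]
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "N" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return out


def generate(tagged):
    """Morphological generation: regular plural on nouns only (toy rule)."""
    words = []
    for lemma, tag, num in tagged:
        words.append(lemma + "s" if tag == "N" and num == "plur" else lemma)
    return " ".join(words)


print(generate(transfer(analyse("Les chats noirs"))))
```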


Descending surface syntactic transfer systems: RBMT (+SMT / LanguageWeaver?)

Levels involved: syntagmatic level (C-structures, constituent); morpho-syntactic level (tagged text); graphemic level (text). Transfer type: descending transfer.

• RBMT (1970-)
  Analysis: ATN (automata, trans. rules)
  Transfer + syntactic generation: recursive descent; procedural
  Morphological generation: grammar + dict. (rules); tables + prog. (procedural)
  Examples: ENGSPAN, SPANAM-2 (PAHO, 1978?-); AS-Transac (Toshiba, 1982-); Reverso (ProMT, 1986-)

• RBMT (1980-)
  Analysis: ECFG (+ decorations); gram. rules
  Transfer + syntactic generation: recursive descent; procedural, often in LISP
  Morphological generation: grammar + dict. (rules)
  Examples: METAL (Austin, 1982-); Duet-2 (Sharp, 1984-); Shalt-1 (IBM-Jp, 1982-); Kate (KDD, 1983-)

• RBMT (1984-)
  Analysis: lemmatization + Slot-grammars; prolog
  Transfer + syntactic generation: recursive descent; logic programming in prolog
  Morphological generation: dictionary + tables + prog.; prolog
  Examples: LMT (IBM-US, 1983-)


Structures

• Syntactic trees

Information on nodes
  • simple labels: METEO (Q-systems)
  • main label + boolean attributes (TAUM-aviation)
  • main label + typed attributes (Ariane, METAL & many others)

Concrete trees (projective)
  • constituents: the frontier (leaves) is a prototype of the utterance
  • dependencies: the in-order (infix) traversal is a prototype…

Abstract trees (not projective)
  • normalized trees with semantic constituents (Colmerauer, METEO)
  • sometimes necessary because of
    1. verbs with separable particles (German, English…)
    2. comb-like constructions
       A, B & C gave A', B' & C' to A", B" & C"
       ! ,(give(A, A', to(A")), give(B, B', to(B")), give(C, C', to(C")))
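The normalization of a comb-like construction amounts to distributing the coordinated lists position by position. A minimal Python sketch, assuming a hypothetical `distribute` helper that simply prints the normalized term in the notation used above:

```python
def distribute(verb, subjects, objects, recipients, prep="to"):
    """Pair the i-th subject, object and recipient into one clause per position."""
    if not (len(subjects) == len(objects) == len(recipients)):
        raise ValueError("comb-like coordination needs lists of equal length")
    clauses = [
        f"{verb}({s}, {o}, {prep}({r}))"
        for s, o, r in zip(subjects, objects, recipients)
    ]
    # The coordination node "," dominates the distributed clauses.
    return ",(" + ", ".join(clauses) + ")"


normalized = distribute(
    "give", ["A", "B", "C"], ["A'", "B'", "C'"], ['A"', 'B"', 'C"']
)
print(normalized)
```

The resulting tree is abstract in the sense used here: reading its leaves in order no longer gives back the original word order.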


Syntactic trees

• Concrete syntagmatic (constituent) tree
  Ann, Bob give cod, donut to Emil, Fathia

[Figure: constituent tree of this sentence. Node labels: PHVB, VP, V, NP, NPCoo, PP, Prep, N. Leaves: Ann, ",", Bob, give, cod, ",", donut, to, Emil, ",", Fathia.]
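That the frontier of a concrete constituent tree gives back the utterance can be checked with a few lines of Python. The bracketing below is only one plausible reading of the figure, built from its node labels; it is an assumption, not a transcription of the slide.

```python
def frontier(node):
    """Return the leaves of a (label, children) tree, from left to right."""
    if isinstance(node, str):  # a leaf is a plain word
        return [node]
    _label, children = node
    return [leaf for child in children for leaf in frontier(child)]


def np_coo(first, second):
    """Coordinated noun phrase: NP , NP (hypothetical bracketing)."""
    return ("NPCoo", [("NP", [("N", [first])]), ",", ("NP", [("N", [second])])])


TREE = ("PHVB", [
    np_coo("Ann", "Bob"),
    ("VP", [("V", ["give"])]),
    np_coo("cod", "donut"),
    ("PP", [("Prep", ["to"]), np_coo("Emil", "Fathia")]),
])

print(" ".join(frontier(TREE)))
```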


Syntactic trees

• Concrete dependency tree

[Figure: dependency tree of the same sentence. Nodes: give (V); Ann, Bob, cod, donut, Emil, Fathia (N); "," (Coo, three times); to (Prep). Arc labels: subj, obj1, obj2, gov, ccp.]
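For a concrete (projective) dependency tree, the in-order traversal plays the role that the frontier plays for constituents. The Python sketch below assumes a simple node type with left and right dependents; the attachment choices (for example, treating the coordination comma as the head of the two coordinated nouns) are assumptions made for the example, not read off the figure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dep:
    word: str
    left: List["Dep"] = field(default_factory=list)   # dependents before the head
    right: List["Dep"] = field(default_factory=list)  # dependents after the head


def in_order(node):
    """Left dependents, then the head word, then right dependents."""
    words = []
    for child in node.left:
        words.extend(in_order(child))
    words.append(node.word)
    for child in node.right:
        words.extend(in_order(child))
    return words


def coo(a, b):
    """Coordination node with one conjunct on each side (hypothetical analysis)."""
    return Dep(",", left=[Dep(a)], right=[Dep(b)])


tree = Dep(
    "give",
    left=[coo("Ann", "Bob")],
    right=[coo("cod", "donut"), Dep("to", right=[coo("Emil", "Fathia")])],
)
print(" ".join(in_order(tree)))
```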


Syntactic trees

• Abstract dependency tree

[Figure: abstract dependency tree of the same sentence. Nodes: "," (Coo); give (V) and to (Prep), each appearing twice; Ann, Bob, cod, donut, Emil, Fathia (N). Arc labels: subj, obj1, obj2, gov.]


Examples: MT for access, web (1)

Segment 1
• ENGLISH (human version): The European-Heritage.net thesaurus covers the fields of archaeology and architecture as defined in the Council of Europe conventions signed in Granada (1985) and Malta (1992).
• FRENCH (human version): Le thesaurus European-Heritage.net couvre les champs de l'archéologie et de l'architecture au sens des conventions du Conseil de l'Europe de Grenade (1985) et de Malte (1992).
• ENGLISH (Systran FRE-ENG version): The European-Heritage.net thesaurus covers the fields of archaeology and architecture within the meaning of conventions of the Council of Europe of Grenade (1985) and Malta (1992).

Segment 2
• ENGLISH (human version): It encompasses information ranging from the partners involved, categories of cultural assets and legislation, to activities, skills and funding. It is supplemented by a number of specific thesauruses compiled by each member state on a particular topic, such as the thesaurus on Andalusian heritage or the architectural thesaurus from the Mérimée database in France.
• FRENCH (human version): Il prend en compte des aspects aussi variés que les acteurs, les catégories de biens culturels, la législation ou encore les interventions, les métiers et les financements. Il est complété et prolongé par des thesaurus spécifiques développés par chaque Etat membre sur tel ou tel sujet spécifique, comme le thesaurus du patrimoine historique andalou ou le thesaurus d'architecture de la base de données documentaire Mérimée en France.
• ENGLISH (Systran FRE-ENG version): It takes into account aspects as varied as the actors, the categories of cultural goods, the legislation or the interventions, the trades and the financings. It is supplemented and prolonged by thesaurus specific developed by each Member State on such or such specific subject, like the thesaurus of the Andalusian historical inheritance or the thesaurus of architecture of the documentation data base Mérimée in France.

Segment 3
• ENGLISH (human version): This new, open-ended search tool will come on line shortly, together with a management and administration system shared among the various contributors.
• FRENCH (human version): Cet instrument de recherche, forcément évolutif, sera mis prochainement en ligne accompagné d'un dispositif de gestion et d'administration réparti entre les différents contributeurs.
• ENGLISH (Systran FRE-ENG version): This instrument of search, inevitably evolutionary, will be put soon on line accompanied by a device of management and administration distributed between the various contributors.


Examples: MT for access, web (2)

• FE is easy compared to EG, and even more so compared to FG

Segment 1
• GERMAN (Systran ENG-GER version): Der European-Heritage.netthesaurus umfaßt die Felder von archaeology und von Architektur, wie in den Europaratvereinbarungen definiert, die in Granada (1985) unterzeichnet werden und in Malta (1992).
• GERMAN (Systran FRE-GER version): Der European-Heritage.net-Thesaurus bedeckt die Felder der Archäologie und der Architektur im Sinne der Übereinkommen des Europarats von Granada (1985) und von Malta (1992).

Segment 2
• GERMAN (Systran ENG-GER version): Er gibt die Informationen um, die von den betroffenen Partnern, von den Kategorien der kulturellen Werte und der Gesetzgebung, bis zu Aktivitäten, von den Fähigkeiten und von der Finanzierung reichen. Er wird durch eine Anzahl von den spezifischen Thesauren ergänzt, die durch jeden Mitgliedsstaat auf einem bestimmten Thema, wie dem Thesaurus auf Andalusian Erbe oder dem architektonischen Thesaurus von der Datenbank Mérimée in Frankreich kompiliert werden.
• GERMAN (Systran FRE-GER version): Er berücksichtigt Aspekte dermaßen variierte, daß die Beteiligten, die Kategorien kultureller Güter, die Gesetzgebung oder noch die Interventionen, die Berufe und die Finanzierungen. Er wird vervollständigt und wird durch ein spezifische Thesaurus entwickelt durch jeder Mitgliedstaat über das eines oder andere spezifische Thema verlängert, als der Thesaurus des andalusischen historischen Kulturgutes oder der Thesaurus der Architektur der urkundlichen Datenbank Mérimée in Frankreich.

Segment 3
• GERMAN (Systran ENG-GER version): Dieses neue, offene Suchhilfsmittel kommt auf Zeile kurz, zusammen mit einem Management- und Leitungssystem, das unter den verschiedenen Mitwirkenden geteilt wird.
• GERMAN (Systran FRE-GER version): Dieses notgedrungen entwicklungsfähige Forschungsinstrument wird gestellt demnächst online begleitet von einer Verwaltungs- und Verwaltungsvorrichtung, die aufgeteilt unter den verschiedenen Beitragenden.


Comparison: rough vs. raw MT

Title
• Reverso rough Spanish-English output: Message of the Chief operating officer of the World Organization of the Health
• SpanAm raw Spanish-English output: Message of the Director-General of the World Health Organization

Paragraph 1
• Reverso: From his{*its*} discovery, the antibiotics have transformed completely the perspective of the humanity with regard to the infectious diseases. Today the use of the antibiotics, cocktail with improvements in the reparation, the housing and the nutrition, together with the advent of the programs of widespread vaccination, they have given place to a notable decrease of infectious diseases that before were common and were annihilating entire populations.
• SpanAm: From its discovery, antibiotics have completely transformed the perspective of humankind with respect to infectious diseases. Today the use of antibiotics, combined with improvements in sanitation, housing, and nutrition, together with the advent of the vaccination programs generalized, have caused a notable reduction of infectious diseases that previously were common and annihilated entire populations.

Paragraph 2
• Reverso: Scourges that terrified million persons, as the pest, the savage cough, the poliomyelitis and the scarlatina, they have been controlled or are on the verge of be controlling. Now, in the dawn of a new millenium, the humanity faces with another crisis. Diseases before curable as the gonorrhea and the fever tifoidea they are becoming rapidly difficult to treat, whereas killer old men as the tuberculosis and the malaria are armed{*assembled*} now with the increasing impenetrable resistance the antimicrobial ones.
• SpanAm: Scourges that terrified millions of people, as plague, whooping cough, poliomyelitis, and the scarlatina, have been controlled or are on the verge of being controlled. Now, in the dawn of a new millennium, humankind faces another crisis. Previously curable diseases as the gonorrhea and typhoid fever are becoming rapidly difficult to treat, while old assassins as tuberculosis and malaria now are armed of the increasingly impenetrable resistance to the antimicrobial drugs.

Paragraph 3
• Reverso: This phenomenon is potentially contenible. The problem is increasingly deep and complex, accelerated by the abuse of the antibiotics in the developed countries and the paradoxical subutilization of the antimicrobial ones of quality in the countries in development due to the poverty and the resultant shortage of an attention of effective health.
• SpanAm: This phenomenon is potentially contenible. The problem is increasingly profound and complex, accelerated by the abuse of antibiotics in the developed countries and the paradoxical underutilization of the quality antimicrobial drugs in the developing countries due to the poverty and to the scarcity resulting from an effective health care.


Descending deep syntactic transfer systems

Levels involved: syntactico-functional level (F-structures, functional; often dependency structures); morpho-syntactic level (tagged text); graphemic level (text). Transfer type: descending transfer.

• RBMT (1985-)
  Analysis: segm. + lemmatization (procedural); dependency gram. (rules + constraint progr.)
  Transfer + synt. generation: recursive descent; procedural
  Morphological generation: grammar + dict. (rules); tables + prog. (procedural)
  Examples: JETS (IBM-Jp, 1985-90)

• 1.5G-MT (1990-)
  Analysis: lemmatization by FST (+ dictionaries); dependency graph, procedural (C macros), deterministic
  Transfer + synt. generation: recursive descent; procedural
  Morphological generation: grammar + dict. (rules); tables + prog. (procedural)
  Examples: Systran (1990-)


Horizontal surface syntactic transfer systems: RBMT & Phrase-Based SMT

Levels involved: syntagmatic level (C-structures, constituent; surface syntactic transfer); morpho-syntactic level (tagged text); graphemic level (text; direct translation).

• RBMT (1992-)
  Analysis: lemmatization + linear patterns; rules
  Transfer: treelet dict. + sem. thesaurus; rules
  Generation: tree flattening; grammar + dict. (rules)
  Examples: TDMT (for SLT) (ATR, 1992-1998)

• RBMT (1995-)
  Analysis: lemmatization + Slot Grammars; prolog
  Transfer: treelet dictionary; prolog
  Generation: recursive descent (prolog); grammar + dict. (rules)
  Examples: PT (from LMT) (Linguatech, 1995-)

• EBMT (2000-)
  Initial data: bilingual // corpus, dictionary
  Preparation: build S-SSTCs, improve (hum)
  Translation: A//T//G, bottom-up
  Examples: EBMT (Banturjah) (UTMK, USM, 2000-)

• PSMT, PSCFG (2002-)
  Analysis: lemmatization, chunking; statistical
  Transfer: alignment, decoding; statistical
  Generation: tree flattening, post-processing; statistical
  Examples: LanguageWeaver (2002-); Google (2005-); + Wu, Melamed 1997, 2004


Horizontal deep syntactic transfer systems

Levels involved: syntactico-functional level (F-structures, functional; deep syntactic transfer); syntagmatic level (C-structures, constituent); morpho-syntactic level (tagged text); graphemic level (text).

• RBMT (1975-)
  Analysis: grammar + dictionary, dependency analysis; rules
  Transfer: tree transformations; rules
  Generation: tree flattening; grammar + dictionary (rules)
  Examples: ETAP-2, ETAP-3 (IPPI, Moscow, 1977-)

• RBMT (1995-)
  Analysis: lemmatization + Slot Grammars; Prolog
  Transfer: treelet dictionary; Prolog
  Generation: recursive descent (Prolog); grammar + dictionary (rules)
  Examples: PT (from LMT) (Linguatech, 1995-)

• RBMT + SMT (1999-)
  Analysis: MSR (Microsoft) analyzers; rules (in G)
  Transfer: learned from pairs (lf_s, lf_t); statistical
  Generation: Microsoft generators; rules (in G)
  Examples: MTS-1 (on technical documentation)


Horizontal multilevel transfer systems

Levels involved: logico-semantic level; syntactico-functional level (F-structures, functional); syntagmatic level (C-structures, constituent); morpho-syntactic level (tagged text); graphemic level (text). Transfer type: multilevel transfer. Multilevel description: N levels in 1 structure (abstract constituent tree).

• RBMT (1990-)
  Analysis: lemmatization?; ECFG (govt & binding), gram. rules; interactive disambiguation, if not enough space
  Transfer: dictionary + tree transformations; rules
  Generation: tree transformations (rules); MG: dictionary + grammars (rules)
  Examples: ITS (Geneva, 1990-); perhaps PT-2 (rather than SF)


Ascending multilevel transfer systems

Levels involved: logico-semantic level (SPA-structures, semantic & predicate-argument); syntactico-functional level (F-structures, functional); syntagmatic level (C-structures, constituent); morpho-syntactic level (tagged text); graphemic level (text). Transfer type: ascending transfer. Multilevel description: N levels in 1 structure.

• RBMT (1978-)
  Analysis: lemmatization by dict. + Ndet FST (rules); tree transformations (rewriting rules)
  Transfer: dictionary + tree transformations; rules
  Generation: tree transformations (rules); MG: dict. + gram. (rules)
  Examples: Ariane-G5-based: ru-de#ru, en#my-th 80-87; fr#en (BV/aero) 85-92; fr#en-de-ru (LIDIA) 90-96. HICATS (Hitachi, 1990-). Jemah (USM, NUS, 1990-)


Multilevel concrete trees (umc-structures)

“The customers were not given their money back by the cashier, but by the waiter.”

S[type=assertive, time=past, aspect=perfective, tense=c-past, voice=passive…](
  NP[semrel=dest, logrel=arg2, synfunc=subj, sem=human, num=plur…](
    art[lex=‘the’, semrel=deict, synfunc=det, number=plur, deter=definite…]
    noun[lex=‘customer’, synfunc=head, sem=human, number=plur…])
  aux[lex=‘be’, tense=pret, pers=3, number=plur…]
  neg[lex=‘not’]
  vrb[lex=‘give’, synfunc=head, voice=passive, tense=ppart, vbpart=‘back’…]
  NP[semrel=patient, logrel=arg1, synfunc=obj1, number=sing…](
    adjposs[lex=‘his’, semrel=poss, synfunc=det, number=plur, deter=definite…]
    noun[lex=‘money’, synfunc=head, number=sing…])
  vbpart[lex=‘back’]
  NP[semrel=agent, logrel=arg0, synfunc=agcomp, number=sing, neg=not-but…](
    prep[lex=‘by’, synfunc=reg]
    art[lex=‘the’, semrel=deict, synfunc=det, number=sing, deter=definite…]
    noun[lex=‘cashier’, synfunc=head, sem=human, number=sing, neg=not…]
    NP[semrel=id, logrel=arg0, synfunc=coord, number=sing…](
      conj[lex=‘but’, synfunc=coo]
      prep[lex=‘by’, synfunc=reg]
      art[lex=‘the’, semrel=deict, synfunc=det, number=sing, deter=definite…]
      noun[lex=‘waiter’, synfunc=head, sem=human, number=sing…]))
  punct[lex=‘.’])
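Because several linguistic levels are stored as attributes on the nodes of one tree, each level can be read off by projecting a single attribute. The Python sketch below illustrates this on an abridged version of the structure above; the `Node` class and `project` function are assumptions made for the example, and only a few nodes and attributes are copied from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    cat: str                                      # syntagmatic category (S, NP, noun…)
    attrs: dict = field(default_factory=dict)     # decorations carrying the other levels
    children: list = field(default_factory=list)


def project(node, attr):
    """Collect (category, value) pairs for one attribute, in tree order."""
    found = [(node.cat, node.attrs[attr])] if attr in node.attrs else []
    for child in node.children:
        found.extend(project(child, attr))
    return found


# Abridged tree: attribute names follow the slide, most nodes are omitted.
tree = Node("S", {"voice": "passive"}, [
    Node("NP", {"semrel": "dest", "logrel": "arg2", "synfunc": "subj"}, [
        Node("art", {"lex": "the", "synfunc": "det"}),
        Node("noun", {"lex": "customer", "synfunc": "head"}),
    ]),
    Node("vrb", {"lex": "give", "synfunc": "head"}),
    Node("NP", {"semrel": "patient", "logrel": "arg1", "synfunc": "obj1"}),
    Node("NP", {"semrel": "agent", "logrel": "arg0", "synfunc": "agcomp"}),
])

print(project(tree, "logrel"))   # logical (predicate-argument) level
print(project(tree, "synfunc"))  # syntactic-function level
```

In the passive sentence the syntactic subject carries `logrel=arg2`, which is the kind of divergence between levels that a multilevel structure keeps explicit.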


The customers were not given their money back by the cashier, but by the waiter.

Concrete multilevel tree: head-driven, mix of constituent / dependency structures, projective

[Figure: tree drawing. Root: S [type=assertive, time=past, aspect=perfective, tense=c-past, voice=passive…]. Nodes shown: NP with art['The'…] and noun['customer'…]; aux['be'…]; neg['not'…]; vrb['give', passive, ppart…]; NP [semrel=patient, logrel=arg1, sf=obj1, sing…] with adjpos['his'…] and noun['money'…]; vbpart['back']; NP [semrel=agent, logrel=arg0, sf=agcmp, sing…] with prep[by…], art['the'…] and noun['cashier'…]; NP [semrel=id, logrel=arg0, sf=coord, sing…] with conj['but', sf=coo…], prep['by', sf=reg…], art['the', semrel=deict, sf=det…] and noun[waiter…]; punct['.'].]


The customers were not given their money back by the cashier, but by the waiter.

Abstract multilevel tree: same, but smaller (abstraction), & de-projectivized if necessary

[Figure: tree drawing. Root: S [type=assertive, time=past, aspect=perfective, tense=c-past, voice=passive…]. Nodes shown: NP [semrel=ben, logrel=arg2, sf=subj…] with noun['customer', def, sing…]; vrb['give-back', passive, c-past, neg…]; NP [semrel=patient, logrel=arg1, sf=obj1, sing…] with noun['money'…] and adjpos['customer', refpos…]; NP [semrel=agent, logrel=arg0, sf=agcmp, sing…] with prep['by', sf=reg…] and noun['cashier', sf=head, def, sing…]; NP [semrel=id, logrel=arg0, sf=coord, sing…] with conj['not-but', sf=coo…], prep['by', sf=reg…] and noun['waiter', sf=head, def, sing…]; punct['.'].]


Examples: raw MT for revision…

Language divergences are NOT handled contrastively…


…with BV-aero/FE (2)

… but by generating from an abstract (multilevel) representation


Semantic transfer systems

Levels and the structures used at each level (transfer takes place at the logico-semantic level):
• Logico-semantic level: SPA-structures (semantic & predicate-argument); semantic transfer
• Syntactico-functional level: F-structures (functional)
• Syntagmatic level: C-structures (constituent)
• Morpho-syntactic level: tagged text
• Graphemic level: text

Type: RBMT, 1982–
• Analysis: segmentation, lemmatization: direct programming; tree transformations: rules
• Transfer: dictionary + tree transformations: rules
• Generation: tree transformations: rules; MG: dict. + gram.: rules
• Examples: MU (Kyodai, 82-87); MAJESTIC (JICST, 87–)

Conceptual transfer systems (IL with separate lexicon)

Levels and the structures used at each level (conceptual transfer takes place at the interlingual level):
• Interlingual level: semantico-linguistic interlingua; conceptual transfer
• Logico-semantic level: SPA-structures (semantic & predicate-argument)
• Syntactico-functional level: F-structures (functional)
• Syntagmatic level: C-structures (constituent)
• Morpho-syntactic level: tagged text
• Graphemic level: text

Type: RBMT, 1980–
• Enconversion: lemmatization: direct or rules; string-graph transformations: rules
• Conc. transfer: in principle none
• Deconversion: graph-string transformations: rules
• Examples: ATLAS-II (Fujitsu, 1980–); PIVOT (NEC, 1983–)

Type: RBMT, 1980–
• DCG (?): rules
• Examples: ULTRA (NMSU, 89-95)

Type: RBMT, 1997–
• Enconversion: depending on partners: rules (until now!)
• Conc. transfer: navigation in set of UWs; UNL lexicon
• Deconversion: depending on partners: rules
• Examples: UNL (1996–)

Interlingual structures: French-Korean by IF (1)

French-Korean by IF (2)

Knowledge-based systems: explicit understanding (IL linked with an ontology)

Levels and the structures used at each level:
• Deep understanding level: ontological interlingua
• Interlingual level: semantico-linguistic interlingua
• Logico-semantic level: SPA-structures (semantic & predicate-argument)
• Syntactico-functional level: F-structures (functional)
• Syntagmatic level: C-structures (constituent)
• Morpho-syntactic level: tagged text
• Graphemic level: text

Type: KBMT, 1980–
• Enconversion: lemmatization & EPSG + f-structures + pseudo-unification: rules (using UP)
• Mapping into the ontology: all but discourse elements; dict. + rules + interactive disambiguation
• Deconversion: planning deep-str, rec. descent: rules
• Examples: KBMT-89 (CMU, 1989–91); KANT/Catalyst (CMU+Caterpillar, en→fr-sp-de-?, 1992–)

Type: RBMT, 1997–
• Enconversion: dictionary + FST: rules
• Mapping into the ontology: none (IF is only a pragmatico-semantic representation)
• Deconversion: dictionary + FST: rules
• Examples: CSTAR-II & Nespole! (GETA 97-03, ETRI (Korea) 97-99)

Type: SMT, 2003–
• Enconversion: learned from (string, IF) KB: statistical
• Mapping into the ontology: none
• Deconversion: learned from (IF, string) KB: statistical
• Examples: CSTAR-II & Nespole! (Irst 98-03); Mastor-1 (IBM 2003)

[recap] Size & cost of resources / MT architectures

Sentence types: short, 6.5 w/s (BTEC, Meteo); long, 25 w/s (News)

SMT, PSMT, analogical EBMT
• 6.5 w/s: 0.9–3 Mw; 3.6–12 K pages; 0.15–0.5 M sentences; 2.4–8 m*y (already done!)
• 25 w/s: 50–200 Mw; 200–800 K pages; 2–8 M sentences; 100–400 m*y (available?)

EBMT with trees (MST, Mastor-1)
• 6.5 w/s: N/A for short sent.
• Supervised learning: 1 h/page?
• 25 w/s: 4–12.5 Mw; 15–50 K pages; 0.15–0.5 M sentences; 10–40 m*y (to do!)

EBMT with trees and S-SSTCs (Banturjah, USM)
• 6.5 w/s: N/A for short sent.
• Supervised learning: 15 h/page! dict. (50 K) available
• 25 w/s: 4–12.5 Mw; 0.6–1 K pages; 0.006–0.01 M sentences; 6–10 m*y (to do!)

RBMT
• 6.5 w/s: dict. 3-10 K, 0.6–2 m*y; total 1–3 m*y (to do!)
• 25 w/s: dict. 50-500 K, 15–150 m*y; total 40–175 m*y
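The corpus figures for the SMT row are linked by simple arithmetic: running words divided by words per sentence give sentences, and running words divided by words per page give pages. The sketch below makes that explicit. The value of 250 words per page is an assumption inferred from the ratio 0.9 Mw to 3.6 K pages; it is not stated on the slide, and the function name `corpus_size` is hypothetical.

```python
WORDS_PER_PAGE = 250  # assumption: inferred from 0.9 Mw ~ 3.6 K pages, not stated on the slide


def corpus_size(words, words_per_sentence, words_per_page=WORDS_PER_PAGE):
    """Return (sentences, pages) for a corpus of `words` running words."""
    sentences = words / words_per_sentence
    pages = words / words_per_page
    return sentences, pages


if __name__ == "__main__":
    # Low and high ends of the SMT row, for short (6.5 w/s) and long (25 w/s) sentences.
    for words, wps in ((0.9e6, 6.5), (3e6, 6.5), (50e6, 25), (200e6, 25)):
        sentences, pages = corpus_size(words, wps)
        print(f"{words / 1e6:g} Mw at {wps} w/s: "
              f"{sentences / 1e6:.2f} M sentences, {pages / 1e3:g} K pages")
```

The results land close to the rounded ranges given for that row, which suggests the slide's page and sentence counts were derived in this way.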

What kind of IL to choose if that is the choice?

• IL + ontology
  • restricted domain, high precision applications, cf. CLang (Mooney)
  • beware, ontology costlier than gram+dict!
  • machine learning possible
• Pragmatico-semantic IL (IF of CSTAR/Nespole!)
  • task- and domain-related: reservations in tourism, medical assistance
  • both MUST be restricted; works very well then
  • machine learning possible
• Semantico-linguistic IL (ATLAS-II, PIVOT; better: UNL)
  • all domains/tasks: IL has to be grounded on a NL understandable by most developers anywhere
  • amenable to machine learning

Introduction to UNL

an anglo-semantic interlingua

Short presentation of UNL: the vision

A document in a given natural language, once 'enconverted' into UNL, may more easily be 'deconverted' into other languages & disseminated.

Its UNL representation can be manually improved if necessary.

[Figure: Source document (English) → Enconversion → Enconverted document (UNL) → Deconversions → Deconverted documents (French, Russian, Spanish, Chinese)]

Some words about UNL

UNL = Universal Networking Language

• Project started at UNU in 1996, with 12 languages
  • H. Uchida (author of ATLAS-II)
  • T. Della Senta (Dir. of IAS/UNU)
  • Groups from 12 countries, for the 12 most widespread languages

• Funding & organizational problems after 1998
  • Development of French, Russian, Spanish, Italian, Hindi, Thai, Portuguese continued on own resources

• Opening of specs, tools in 1998 (see www.undl.org)

• Start of UNDL foundation in 2001 (Geneva, UNITAR bldg)
  • very high-level goals: link with ontologies, translation of EOLSS
  • no concrete, funded project

• Start of U++C in 2005
  • U++C = UNL Consortium, to promote development of the UNL language à la Linux & à la W3C
  • prepare & disseminate open source resources, tools
  • seek & support real applications of UNL → an experiment & evaluation on a Unesco web site (B@bel)

What is UNL?

UNL =
• a project
• a formal interlingua (IL) -- and maybe more!
• an HTML-based format for [companion files to] multilingual documents aligned at sentence level

Language: 1 utterance → 1 (hyper)graph
• UNL graph = abstract structure of an equivalent English utterance
• UNL symbols (UW, relations, attributes) constructed on English
• UNL is understandable and usable by all developers in the world

The UNL approach is adequate for
• semi-automatic & incremental translation on "all domains"
• extension to a large number of languages
• other applications: CLIR, abstracting, gisting…

UNL is an improvement over the ATLAS-II pivot
• best E↔J system in Japan for 20 years
• 5.57 M entries in the dictionaries (ATLAS-II v14, 12/2008)

Free Software Portal

A very simple example

A UNL graph is not always a tree

• Graph = {relations between nodes bearing UWs & attributes}

The dog watches its master.

[Figure: UNL graph with nodes watch, dog, master; arcs agt (watch → dog), obj (watch → master), pos (dog → master)]

agt(watch(icl>do,agt>thing,obj>thing).@entry,dog(icl>animal).@def)
obj(watch(icl>do,agt>thing,obj>thing).@entry,master(icl>human))
pos(dog(icl>animal).@def,master(icl>human))

• A graph line:

agt(watch(icl>do,agt>thing,obj>thing).@entry,dog(icl>animal).@def)

agt: binary relation 'defining a thing which initiates an action'

watch(icl>do…): 'universal word' or 'unit of virtual vocabulary' (UW) made of
- a 'headword': watch
- a 'restriction': icl>do,agt>thing,obj>thing → lexical disambiguation + arguments

@entry, @def: "attributes" specifying how the concept is used in the graph:
- @entry means that the node is the graph entry;
- @def specifies definiteness
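The anatomy of a graph line (relation, UW with headword and restriction, attributes) can be illustrated with a small parser. This is a minimal sketch that only handles the notation shown on this slide, not the full UNL specification; the names `Node`, `Arc`, `parse_arc` and `is_tree` are hypothetical, and nodes are identified by headword alone, which is a simplification.

```python
import re
from collections import Counter
from typing import NamedTuple


class Node(NamedTuple):
    headword: str      # e.g. "watch", or ":01" for a scope node
    restriction: str   # text inside the UW parentheses, "" if absent
    attributes: tuple  # e.g. ("entry",) or ("def",)


class Arc(NamedTuple):
    relation: str      # e.g. "agt"
    scope: str         # scope id such as "01", "" for the main graph
    source: Node
    target: Node


NODE_RE = re.compile(r"([^(]*?)(?:\((.*)\))?((?:\.@\w+)*)")
ARC_RE = re.compile(r"(\w+)(?::(\w+))?\((.*)\)")

DOG_LINES = [
    "agt(watch(icl>do,agt>thing,obj>thing).@entry,dog(icl>animal).@def)",
    "obj(watch(icl>do,agt>thing,obj>thing).@entry,master(icl>human))",
    "pos(dog(icl>animal).@def,master(icl>human))",
]


def split_top_level(text):
    """Split at the first comma that is not nested inside parentheses."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return text[:i].strip(), text[i + 1:].strip()
    raise ValueError(f"no top-level comma in {text!r}")


def parse_node(text):
    """Parse 'headword(restriction).@attr1.@attr2' into a Node."""
    match = NODE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a node: {text!r}")
    headword, restriction, attrs = match.groups()
    return Node(headword, restriction or "",
                tuple(a for a in attrs.split(".@") if a))


def parse_arc(line):
    """Parse 'rel(node1,node2)' or 'rel:scope(node1,node2)' into an Arc."""
    match = ARC_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a graph line: {line!r}")
    relation, scope, body = match.groups()
    left, right = split_top_level(body)
    return Arc(relation, scope or "", parse_node(left), parse_node(right))


def is_tree(arcs):
    """True if no node (identified by headword) has more than one incoming arc."""
    incoming = Counter(arc.target.headword for arc in arcs)
    return all(count <= 1 for count in incoming.values())


if __name__ == "__main__":
    arcs = [parse_arc(line) for line in DOG_LINES]
    for arc in arcs:
        print(arc.relation, arc.source.headword, "->", arc.target.headword)
    print("tree?", is_tree(arcs))
```

On the three lines of the example, `master` receives two incoming arcs (`obj` and `pos`), which is why this graph is not a tree.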

John knows that Peter will not come and regrets it.

[Figure: UNL hypergraph with nodes regret, know, John and the scope node :01, linked by arcs labeled agt, obj and and; the scope :01 contains the nodes come and Peter linked by agt]

agt:01(come.@entry.@future.@not,Peter)

This "scope" node of the graph is the subgraph described here.

A scope is a connected subgraph made of arcs sharing a "scope id" + their nodes
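The definition of a scope (arcs sharing a scope id, plus their nodes, forming a connected subgraph) can be checked mechanically. The sketch below is illustrative only: arcs are plain tuples, the function names are hypothetical, and the main-graph arcs are read off the figure, with the direction of the `and` arc being an assumption. Only the scoped arc `agt:01(come…,Peter)` is given in text on the slide.

```python
from collections import defaultdict

# (relation, scope id, source, target); "" stands for the main graph.
ARCS = [
    ("agt", "", "know", "John"),
    ("obj", "", "know", ":01"),
    ("agt", "", "regret", "John"),
    ("obj", "", "regret", ":01"),
    ("and", "", "regret", "know"),   # direction assumed, not given in the text
    ("agt", "01", "come", "Peter"),  # from the slide: agt:01(come...,Peter)
]


def scopes(arcs):
    """Map each scope id to the set of nodes touched by arcs carrying that id."""
    nodes_by_scope = defaultdict(set)
    for _relation, scope_id, source, target in arcs:
        nodes_by_scope[scope_id].update((source, target))
    return dict(nodes_by_scope)


def is_connected(arcs, scope_id):
    """Check that the arcs sharing `scope_id` form one connected (undirected) subgraph."""
    edges = [(s, t) for _r, sid, s, t in arcs if sid == scope_id]
    if not edges:
        return False
    reached = {edges[0][0]}
    changed = True
    while changed:
        changed = False
        for s, t in edges:
            # An edge with exactly one reached endpoint extends the reached set.
            if (s in reached) != (t in reached):
                reached.update((s, t))
                changed = True
    return all(s in reached and t in reached for s, t in edges)


if __name__ == "__main__":
    for scope_id, nodes in scopes(ARCS).items():
        print(repr(scope_id), sorted(nodes), is_connected(ARCS, scope_id))
```

The scope node `:01` appears as an ordinary node of the main graph, while its content lives in the arcs tagged with the id `01`.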

Any UNL graph can be "unfolded" into an auxiliary UNL tree

Isaac sees that an apple falls and he explains it.

agt(explain(icl>do).@entry,Isaac(icl>proper noun))
obj(explain(icl>do).@entry,:01)
obj:01(fall(icl>occur).@entry,apple)
and(explain(icl>do).@entry,see(icl>do))
agt(see(icl>do),Isaac(icl>proper noun))
obj(see(icl>do),:01)

[Figure: UNL (hyper)graph. Main graph: explain –agt→ Isaac, explain –obj→ :01, explain –and→ see, see –agt→ Isaac, see –obj→ :01. Scope :01: fall –obj→ apple]

[Figure: UNL tree (auxiliary). Same arcs, with the shared nodes Isaac and :01 duplicated, once under explain and once under see]
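Unfolding can be sketched as a depth-first walk from the entry node that creates a fresh copy of a node each time it is reached, so that shared nodes end up duplicated. The code below is a minimal illustration under stated assumptions: arcs are plain tuples transcribed from the arc list of this slide, scope nodes such as `:01` are left as leaves (the scope is unfolded separately), and the function names are hypothetical rather than part of any UNL tool.

```python
# (relation, scope id, source, target), transcribed from the slide's arc list.
ARCS = [
    ("agt", "", "explain", "Isaac"),
    ("obj", "", "explain", ":01"),
    ("obj", "01", "fall", "apple"),
    ("and", "", "explain", "see"),
    ("agt", "", "see", "Isaac"),
    ("obj", "", "see", ":01"),
]


def unfold(arcs, node, scope_id="", path=()):
    """Return (node, [(relation, subtree), ...]) for the given scope.

    Each visit builds a new subtree, so a node with several incoming arcs
    is duplicated. Nodes already on the current path are skipped, which
    keeps the walk finite if the graph contains a cycle.
    """
    on_path = path + (node,)
    children = []
    for relation, sid, source, target in arcs:
        if sid == scope_id and source == node and target not in on_path:
            children.append((relation, unfold(arcs, target, scope_id, on_path)))
    return (node, children)


def count_nodes(tree):
    """Number of nodes in an unfolded tree."""
    _label, children = tree
    return 1 + sum(count_nodes(subtree) for _relation, subtree in children)


if __name__ == "__main__":
    main_tree = unfold(ARCS, "explain")       # "explain" carries @entry
    scope_tree = unfold(ARCS, "fall", "01")   # "fall" is the entry of scope :01
    print(main_tree)
    print(scope_tree)
    print(count_nodes(main_tree), "tree nodes for 4 distinct graph nodes")
```

The main graph has four distinct nodes (explain, see, Isaac, :01) but its unfolded tree has six, because Isaac and :01 each appear twice.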

UNL semantic relations 1/2

agt (agent): action –agt→ thing in focus which initiates it
and (conjunction): X –and→ Y, conjunctive relation between 2 concepts (word or phrase senses)
aoj (thing with attribute): state or attribute –aoj→ thing concerned
bas (basis): degree –bas→ thing used as the basis (standard) for a comparison
ben (beneficiary): event or state –ben→ indirect beneficiary or victim of it
cag (co-agent): action –cag→ thing not in focus which initiates it in parallel with the agent
cao (co-thing with attribute): state or attribute –cao→ thing not in focus concerned in parallel
cnt (content): X –cnt→ Y, equivalent concept (Y ≡ X)
cob (affected co-thing): implicit parallel event or state –cob→ thing directly affected
con (condition): focused event or state –con→ non-focused event or state which conditions it
coo (co-occurrence): focused event or state –coo→ co-occurring event or state
dur (duration): event or state –dur→ period of time during which it occurs or exists
fmt (range): X –fmt→ Y, range between two things (from X to Y)
frm (origin): X –frm→ Y, origin of thing X
gol (goal/final state): event –gol→ final state of an object or thing finally associated with its object
ins (instrument): event –ins→ thing used to carry it out
man (manner): event or state –man→ way to carry out the event or to characterize the state
met (method): event –met→ method to carry it out
mod (modification): focused thing –mod→ thing which restricts it
nam (name): thing –nam→ a name of that thing
obj (affected thing): event or state –obj→ thing in focus directly affected by it

UNL semantic relations 2/2

opl (affected place): event –opl→ place in focus where it has effects
or (disjunction): X –or→ Y, disjunctive relation between 2 concepts (word or phrase senses)
per (proportion, rate or distribution): X –per→ thing used as basis (standard) or unit of proportion, rate or distribution of X
plc (place): event or state or thing –plc→ place where it occurs or is true or exists
plf (initial place): event or state –plf→ place where it begins or becomes true
plt (final place): event or state –plt→ place where it ends or becomes false
pof (part-of): focused thing –pof→ thing of which it is a part
pos (possessor): thing –pos→ possessor of it
ptn (partner): action –ptn→ indispensable non-focused initiator of it
pur (purpose or objective): event or existing thing –pur→ purpose or objective of an event or purpose of a thing
qua (quantity): thing or unit –qua→ quantity of it
rsn (reason): event or state –rsn→ reason that it happens
scn (scene): event or state or thing –scn→ virtual world where it occurs or is true or exists
seq (sequence): focused event or state –seq→ prior event or state
src (source/initial state): event –src→ initial state of an object or thing initially associated with its object
tim (time): event or state –tim→ time at which it occurs or is true
tmf (initial time): event or state –tmf→ time at which it starts or becomes true
tmt (final time): event or state –tmt→ time at which it ends or becomes false
to (destination): X –to→ Y, destination of thing X
via (intermediate place or state): event or state –via→ intermediate place

UNL attributes

Attributes specify how concepts are used in a given graph (tense, aspect, determination, number, etc.)

@entry: graph entry
@def: determination
@pl: plural

agt(watch(agt>thing,obj>thing).@entry,dog(icl>animal).@def.@pl)

UNL attributes (examples)

Time:
@past: happened in the past
@present: happening at present
@future: will happen in future

Aspect:
@begin: beginning of an event or a state
@complet: finishing/completion of a (whole) event
@continue: continuation of an event
@custom: customary or repetitious action
@end: end/termination of an event or a state
@experience: experience
@progress: an event is in progress
@repeat: repetition of an event
@state: final state or the existence of the object on which an action has been taken

The preceding attributes may be modified by the following ones:
@just
@soon
@yet
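As an illustration of how these attribute families might be used when reading a node, the sketch below groups the `.@attr` suffixes of a node into time, aspect and modifier attributes. The grouping follows the lists on this slide only; the function name `classify_attributes` and the catch-all category `other` (for attributes such as `@entry`, `@def` or `@not`) are assumptions made for the example.

```python
TIME = {"past", "present", "future"}
ASPECT = {"begin", "complet", "continue", "custom", "end",
          "experience", "progress", "repeat", "state"}
MODIFIER = {"just", "soon", "yet"}


def classify_attributes(node_text):
    """Group the '.@attr' suffixes of a UNL node by the families listed on the slide."""
    _head, *attrs = node_text.split(".@")
    groups = {"time": [], "aspect": [], "modifier": [], "other": []}
    for attr in attrs:
        if attr in TIME:
            groups["time"].append(attr)
        elif attr in ASPECT:
            groups["aspect"].append(attr)
        elif attr in MODIFIER:
            groups["modifier"].append(attr)
        else:
            groups["other"].append(attr)  # e.g. entry, def, pl, not
    return groups


if __name__ == "__main__":
    print(classify_attributes("come.@entry.@future.@not"))
    print(classify_attributes("dog(icl>animal).@def.@pl"))
```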

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 97: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 98: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 99: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agtobj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 100: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 101: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 102: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 103: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 104: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

and

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 105: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

and

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 106: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

and

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 107: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

andagt

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 108: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@pl

agt

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 109: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@pl

agt

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 110: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobjagt

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 111: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

agt

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 112: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

agt

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 113: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

obj

agt

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 114: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

obj

online(icl>place)

agt

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 115: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

obj

online(icl>place)

agt

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 116: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

obj

online(icl>place)mod

agt

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 117: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

obj

online(icl>place)mod

agt

man

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 118: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

obj

online(icl>place)mod

language(icl>system)

.@def

agt

man

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 119: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

obj

online(icl>place)mod

language(icl>system)

.@def

agt

man

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 120: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

obj

online(icl>place)mod

language(icl>system)

.@def

ins

agt

man

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 121: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

obj

online(icl>place)mod

language(icl>system)

.@def

ins

Inuit(icl>language)

agt

man

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 122: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

obj

online(icl>place)mod

language(icl>system)

.@def

ins

Inuit(icl>language)

agt

man

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 123: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

obj

online(icl>place)mod

language(icl>system)

.@def

ins

Inuit(icl>language)mod

agt

man

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 124: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

16/4/09© Ch. Boitet —!NII MT lectures 50

provide(agt>thing,obj>thing)

.@entry

attavik.net(icl>entity)

system(icl>method)

.@indef

management(icl>activity)

.@def

content(icl>information)

agtobj

gol

mod

gol

:01

write(agt>human,obj>thing)

.@entry

manage(icl>treat

(agt>volitional thing,obj>thing)

and

speaker(icl>role)

.@indef.@pl

agt

native(mod<human)

mod

offer(icl>give(agt>thing,

gol>thing,obj>thing))

anddocument(icl>information)

.@indef.@plobj

payment(icl>action)

.@indef.@pl

obj

online(icl>place)mod

language(icl>system)

.@def

ins

Inuit(icl>language)mod

agt

man

obj

A more complex example (22-24 words):It provides a content management system that allows native speakers towrite, manage documents and offer online payments in the Inuit language.

Page 125: Aperçu de “Boitet-NII-MT-lect2-v2rev.ppt” · after unsure or unfinished OCR (very rare in MT, sometime for military intelligence) • Several strings of typographical words with

[Figure: UNL graph of the example sentence, built up node by node. Entry node: provide(agt>thing,obj>thing).@entry. The subordinate clause is the scope :01, whose entry node is write(agt>human,obj>thing).@entry. The same graph is given in list form below.]

It provides a content management system that allows native speakers to write, manage documents and offer online payments in the Inuit language.

agt(provide(agt>thing,obj>thing).@entry, attavik.net(icl>entity))

obj(provide(agt>thing,obj>thing).@entry, system(icl>method).@indef)

gol(system(icl>method).@indef, management(icl>activity).@def)

obj(management(icl>activity).@def, content(icl>information))

gol(provide(agt>thing,obj>thing).@entry, :01)

and:01(manage(icl>treat(agt>volitional thing,obj>thing)), write(agt>human,obj>thing).@entry)

obj(:01, document(icl>information).@indef.@pl)

agt(:01, speaker(icl>role).@indef.@pl)

mod(speaker(icl>role).@indef.@pl, native(mod<human))

and(offer(icl>give(agt>thing,gol>thing,obj>thing)), :01)

obj(offer(icl>give(agt>thing,gol>thing,obj>thing)), payment(icl>action).@indef.@pl)

mod(payment(icl>action).@indef.@pl, online(icl>place))

ins(offer(icl>give(agt>thing,gol>thing,obj>thing)), language(icl>system).@def)

mod(language(icl>system).@def, Inuit(icl>language))

agt(offer(icl>give(agt>thing,gol>thing,obj>thing)), speaker(icl>role).@indef.@pl)
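The list form above is a set of labelled binary relations, so it maps naturally onto an adjacency list. The following is a minimal sketch, not part of the lecture and not an official UNL tool: the function names (`split_top_level`, `parse_relation`, `build_graph`) are hypothetical, and the parser assumes only the surface shape seen on this slide, namely `label(arg1, arg2)` with an optional `:NN` scope suffix on the label.

```python
from collections import defaultdict

# Three relations copied from the list above; the others parse the same way.
UNL_RELATIONS = [
    "agt(provide(agt>thing,obj>thing).@entry, attavik.net(icl>entity))",
    "gol(provide(agt>thing,obj>thing).@entry, :01)",
    "and:01(manage(icl>treat(agt>volitional thing,obj>thing)), "
    "write(agt>human,obj>thing).@entry)",
]


def split_top_level(body):
    """Split 'arg1, arg2' at the first comma that is outside any parentheses."""
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return body[:i].strip(), body[i + 1:].strip()
    raise ValueError(f"no top-level comma in: {body!r}")


def parse_relation(line):
    """Return (label, scope, first_arg, second_arg) for one relation line."""
    line = line.strip()
    head, _, rest = line.partition("(")
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced relation: {line!r}")
    # 'and:01' means the relation 'and' belongs to scope ':01'.
    label, _, scope = head.partition(":")
    first, second = split_top_level(rest[:-1])
    return label, (scope or None), first, second


def build_graph(lines):
    """Adjacency list: first argument -> list of (label, second argument)."""
    graph = defaultdict(list)
    for line in lines:
        label, _scope, first, second = parse_relation(line)
        graph[first].append((label, second))
    return dict(graph)


if __name__ == "__main__":
    for node, arcs in build_graph(UNL_RELATIONS).items():
        print(node, "->", arcs)
```

Keeping the whole universal word (restrictions and attributes such as `.@entry` included) as the node key is a simplification; a fuller representation would separate the headword, its restrictions and its attributes.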

Resources available from UNL centers

| | Who | Where | Enconv | Deconv | Dict. | Remarks |
|---|---|---|---|---|---|---|
| UNL-Center | H. Uchida, M.Y. Zhu | Tokyo | | | | |
| UNDL Found. | T. Della-Senta | Geneva | — | | | |
| English | UNL-C | Tokyo | — | — | 100000 | |
| | IPPI | Moscow | + | + | 45000 | |
| Arabic | RSS | Amman | ? | + | + | not active |
| | Daoud | Amman univ. | + | + | + | related CATS proj. |
| | Bibl. Alexandr. | Alexandria | + | + | + | EOLSS proj. |
| Armenian | LAI | Yerevan | — | +? | ? | |
| Chinese | XMT (Pr Shi) | Xiamen | + | + | 90000? | gives his system |
| Korean | KAIST | Taejon | — | — | — | never active |
| French | GETA | Grenoble | + | ++ | 45000 | |
| German | IAI, DFKI | Saarbrücken | — | — | — | not active |
| Hindi | IITB | Bombay | + | ++ | 30000? | + UNL applications |
| Marathi | | | ? | + | ? | |
| Indonesian | AIA (BPPT) | Jakarta | — | +— | 40000? | server inactive |
| Italian | ILC | Pisa | ? | ++ | 50000 | |
| Japanese | UNL-C | Tokyo | ? | + | 100000 | |
| Lituanian | AIL (Spektor) | Riga | —— | — | — | |
| Mongolian | Delgerjav | Toho univ. | — | — | — | inactive since 2000 |
| Portuguese | IFSC, USC | Brazil | — | +? | ? | |
| Russian | IPPI | Moscow | + | ++ | 45000 | |
| Spanish | UPM | Madrid | + | ++ | 45000 | |
| Swahili | ? | ? | — | — | — | never active |
| Thai | NECTEC | Bangkok | — | + | 70000? | inactive since 2003 |

Embedding a comparative task-related evaluation of UNL in a real translation task

• Experimental setting (2004-05)
  Unesco official languages: arb, chi, eng, fre, rus, spa
  The Unesco B@bel web site has
    • a multilingual architecture: dynamic web pages, contents from a multilingual SQL database
    • 100% in English, 10% in French, 0% in other languages
    • 42301 words (173 standard pages, ≈2980 sentences) in English

• Goals
  Translate it: eng → fre, rus, spa (+ chi)
    • using available MT systems
    • Measure human work (task-related post-editing in translator's mode)
  Produce UNL graphs for the most complex sentences (≈1000)
    • Measure times (graph production, dict. & deconverter update)
    • Post-edit deconversions in translator's mode (UNL = source language)
  Compare and evaluate future usability of UNL
    • for comparable tasks
    • with N target languages, spreading the "UNL cost" on all of them

Translation of the Unesco B@bel web site

• Partners: GETA (fre, chi), IPPI (rus), UPM (spa)

• Systran used for eng→fra, spa, chi

• ETAP-3 used for eng→rus

• Post-editing in "translator's mode" by each partner for his target language(s)

Potential actual and future gains

Results and projections for 10 & 20 target languages

UNL-sa1/sa2: mid-/long-term speed-ups in graph creation

• Times are in minutes, for 1 page of 250 words

Text type      Simple (12 w/s)                           Complex (25 w/s)
               1st draft  Bil. Rev  UNL Rev  Tot         1st draft  Bil. Rev  UNL Rev  Tot

10 target languages
H only         45         15        -        60          60         20        -        80
H+TM           20         5         -        25          30         10        -        40
MT-gen         0          15        -        15          0          25        -        25
MT-spec        0          5         -        5           0          15        -        15
UNL-man        120        -         10       22          240        -         10       34
UNL-sa1        20         -         8        10          30         -         8        11
UNL-sa2        10         -         5        6           15         -         5        6.5

20 target languages (UNL-man time is spread over them)
UNL-man        120        -         10       16          240        -         10       22
UNL-sa1        20         -         8        9           30         -         8        9.5
UNL-sa2        10         -         5        5.5         15         -         5        5.7
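
The totals in the UNL rows appear to follow a simple rule: the graph is produced once, so its cost is divided by the number of target languages, and the revision time is added for each language. The non-UNL rows simply add first draft and bilingual revision. This reading is inferred from the figures, not stated on the slide; the function and parameter names below are hypothetical.

```python
def minutes_per_language(first_draft, revision, n_targets, shared_draft):
    """Human time (minutes per 250-word page) attributable to one target language.

    Assumption: in the UNL rows the first draft (graph creation) is done once
    and shared by all n_targets languages; in the other rows it is redone for
    each target language.
    """
    if shared_draft:
        return first_draft / n_targets + revision
    return first_draft + revision


# (first draft, revision, shared?) for simple text, taken from the table
rows = {
    "H only": (45, 15, False),
    "UNL-man": (120, 10, True),
    "UNL-sa1": (20, 8, True),
    "UNL-sa2": (10, 5, True),
}

for name, (draft, rev, shared) in rows.items():
    for n in (10, 20):
        print(name, n, minutes_per_language(draft, rev, n, shared))
```

Under the same reading, manual UNL on simple text (120/N + 10 minutes) drops below the human-only figure (60 minutes) as soon as N reaches 3.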

Current developments & perspectives

• U++ Consortium: www.unl.fi.upm.es/consorcio/

• CWL (W3C) incubator: www.w3.org/2005/Incubator/cwl/

• EOLSS/UnescoL project done in 2008

How to build enconverters / deconverters?

• Start from existing MT systems, if any: "bridge the gaps"

• For π-languages (under-resourced languages): use (L, UNL) corpora available from translations + UNL

Idea on how to bridge the gaps

[Figure: diagram of representation levels per language, up to a pivot. Level labels: text_fr, concr-str_fr, abstr-str_fr; text_en, concr-str_en, abstr-str_en, gm-str_en; text_ru, concr-str_ru, abstr-str_ru; text_sp, c-str_sp, a-str_sp; text_hi, c-str_hi, a-str_hi; pivot (UNL, IF). System labels: Systran, Reverso, LMT, METAL (Comprendium); Lingenio; MSR; B'VITAL/FE/aéro (GETA); ETAP-3 (IPPI); UNL-FR (GETA).]

UNL and computerizing π-languages

Let Lt be a π-language (poorly computerized: THA, LAO, VIE…)

• Start from an aligned corpus {(text_Ls, text_Lt)}n, where Ls is a "rich" language (FRA, ENG, RUS, SPA…)

• Construct (semi-automatically) the corpus {(text_Ls, UNL)}n

• Using the resulting aligned corpus {(UNL, text_Lt)}n:

  • build the UNL-trees relative to Lt: {(tree-UNL_Lt, text_Lt)}n

  • align to obtain abstract dependency trees of Lt: {(tree-dep_Lt, text_Lt)}n

  • use machine learning to build a dependency analyzer for Lt

  • program the transformation tree-dep_Lt → tree-UNL_Lt

• This produces a deconverter and an enconverter for Lt, and derived products (analyzer, lexical correspondences…)
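
The aligned corpus {(UNL, text_Lt)}n is obtained by composing the two corpora that share the source text text_Ls. A minimal sketch of that join, assuming segments are identified by their source string; the function name and the placeholder data are hypothetical, and real UNL graphs would replace the "unl-…" strings.

```python
def compose_corpora(bitext, source_to_unl):
    """Join {(text_Ls, text_Lt)} with {(text_Ls, UNL)} on the shared source segment.

    Returns the list {(UNL, text_Lt)}; source segments that have no UNL graph
    yet are skipped.
    """
    unl_by_source = dict(source_to_unl)
    return [(unl_by_source[src], tgt) for src, tgt in bitext if src in unl_by_source]


# Placeholder segments only: no real linguistic data is implied.
bitext = [("src-1", "tgt-1"), ("src-2", "tgt-2"), ("src-3", "tgt-3")]
source_to_unl = [("src-1", "unl-1"), ("src-3", "unl-3")]

print(compose_corpora(bitext, source_to_unl))
```

Skipping segments without a graph fits the semi-automatic setting, where UNL graphs may only exist for part of the corpus at any given time.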

UNL-graph → UNL-tree

Reversible and systematic transformation (simple here)

Example: "The city will recover a coastal zone after the Forum"

[Figure: UNL graph of the example sentence. Nodes: recover(icl>do).@entry.@future.@complete; city(icl>entity).@def; zone(icl>place).@indef; forum(icl>proper noun).@def; after; coastal(aoj<thing). Relation labels: agt, obj, tim, obj, mod.]
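
One way to make a graph-to-tree transformation reversible is to unfold the graph from its entry node and replace any node reached a second time by a reference leaf. The sketch below illustrates that idea only; it is not necessarily the transformation used in the lecture. The arcs are an assumed reading of the example sentence, node labels are simplified and assumed unique, and all nodes are assumed reachable from the entry node.

```python
def graph_to_tree(arcs, entry):
    """Unfold a rooted graph into a tree; nodes reached twice become reference leaves."""
    children = {}
    for rel, head, dep in arcs:
        children.setdefault(head, []).append((rel, dep))
    seen = set()

    def build(node):
        if node in seen:
            return {"ref": node}  # re-entrant node: point back, do not copy
        seen.add(node)
        return {
            "node": node,
            "children": [(rel, build(dep)) for rel, dep in children.get(node, [])],
        }

    return build(entry)


def tree_to_graph(tree):
    """Inverse of graph_to_tree: recover the (relation, head, dependent) arcs."""
    arcs = []

    def walk(t):
        for rel, child in t.get("children", []):
            dep = child["ref"] if "ref" in child else child["node"]
            arcs.append((rel, t["node"], dep))
            walk(child)

    walk(tree)
    return arcs


# Assumed arcs for "The city will recover a coastal zone after the Forum"
arcs = [
    ("agt", "recover", "city"),
    ("obj", "recover", "zone"),
    ("tim", "recover", "after"),
    ("mod", "zone", "coastal"),
    ("obj", "after", "forum"),
]
tree = graph_to_tree(arcs, "recover")
print(sorted(tree_to_graph(tree)) == sorted(arcs))
```

With these arcs no node has two parents, so the graph is already a tree and no reference leaf is created, which matches the "simple here" remark.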

UNL-tree → Spanish text

Remark: we work with imperfect data

1. Here, 2 errors remain (de costal, el Foro)

2. Feedback to the developers of deconverters is planned

UNL-tree → Chinese text

The alignment (UNL-tree, text) is not always projective

The UNL-tree is an abstract dependency tree at the semantic level
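
Projectivity can be tested mechanically once each tree node is aligned to a position in the text. The sketch below uses the usual definition (every subtree covers a contiguous block of positions) and assumes that each node is aligned to exactly one position and that positions are ranks among the aligned words. The function name and the toy trees are hypothetical.

```python
def is_projective(parent, position):
    """True if every subtree covers a contiguous block of word positions.

    parent:   maps each node to its parent (None for the root)
    position: maps each node to the rank of its aligned word
    """
    covered = {node: {position[node]} for node in parent}
    # Push each node's position up to all of its ancestors.
    for node in parent:
        ancestor = parent[node]
        while ancestor is not None:
            covered[ancestor].add(position[node])
            ancestor = parent[ancestor]
    return all(max(s) - min(s) + 1 == len(s) for s in covered.values())


# Toy trees: "y" depends on "x", but "z" (a child of the root) sits between them.
parent = {"r": None, "x": "r", "y": "x", "z": "r"}
print(is_projective(parent, {"r": 0, "x": 1, "y": 2, "z": 3}))
print(is_projective(parent, {"r": 0, "x": 1, "z": 2, "y": 3}))
```

The first alignment is projective and the second is not, since the subtree of "x" covers positions 1 and 3 but not 2.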

Recap & first conclusions

• The distinctions RBMT, EBMT, an-EBMT, (P)SMT… concern the computational architecture only (PROCESSES)

• The rawer the corpora, the larger they must be: SMT/PSMT is for niches, for the rich (languages, texts)

  • there are few parallel corpora of 200–800 K pages

  • building them from scratch is 2 to 3 times more expensive than building a classical large RBMT system

• IL-based MT can use any computational framework: statistical, analogical, rule-based, hybrid; it all depends on the available corpus / linguistic / human resources

• Many applications need an adequate IL: all applications needing to

  • manipulate content

  • in a strongly multilingual setting

• That is possible with empirical as well as expert / hybrid architectures