© Ch. Boitet & Wang-Ju Tsai (GETA, CLIPS) ICUKL-2002, Goa, 25-29/11/02
Proposals for solving some problems in UNL encoding
International Conference on Universal Knowledge and Language (ICUKL2002), Goa, 25-29 November 2002
Christian BOITET
GETA, CLIPS, IMAG, Grenoble
Which problems?
What Igor said "remains to be done":
1. representation of multi-word concepts ("long UWs");
2. elliptical expressions;
3. treatment of arguments, both in the UW dictionary and in the UNL expressions;
and
1. conventions about attributes;
2. XML formats for UNL documents.
Representation of multi-word concepts (long UWs) — 1
Problematic examples of "UNKNOWN LONG UWs":
"Institute of Advanced studies (UNU/IAS)"(icl>…)
"East-Asia cooperation office"
East-Asia cooperation office
east-asia cooperation office(icl>…)
"Tokyo University"
"University of Kyoto"
"World Bank(icl>…)"
Representation of long UWs — 2
What are the problems?
1. No hope of including all these long UWs in our UNL-LLL dictionaries, because the number of such UWs is potentially immense and unbounded. Maybe never more than 5% or 10% of them in open domains.
2. Necessity to include an analyzer of English compounds in order to translate "unknown long UWs" piece by piece. But such compounds are extremely ambiguous.
Let us think a bit more
Proper nouns CAN be decomposed. This is NOT to say that their translation is always compositional.
Compositional: World Bank ==> Banque du Monde (false)
Idiomatic: World Bank ==> Banque mondiale (correct)
So we should have a solution allowing BOTH:
Compositional deconversion if the long UW is unknown
Idiomatic deconversion after it has been put in the UNL-LLL dictionary
Proposal of a solution
Origin: proposed by H. Uchida at a meeting in Tokyo (1999?). Not yet included, but still needed and still the best.
Principle: the headword encodes a UNL representation of the compound.
Possible syntax:
"(mod(bank(icl>entity).@entry,world):01)"(icl>entity)
"(mod(bank(icl>entity).@entry,world))"(icl>entity)
… or a better one!
How to deconvert
Case 1: "(mod(bank(icl>institution).@entry,world))"(icl>institution) is not in the UNL-FR dictionary
==> the French deconverter "unwraps" mod(bank(icl>institution).@entry,world) into a scope of the UNL graph
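The choice between idiomatic and compositional deconversion can be sketched in Python. This is only an illustration under assumptions: the function name `deconvert_long_uw`, the regular expression, and the dictionary contents are hypothetical and do not come from any existing UNL deconverter. The headword syntax is the "possible syntax" proposed in this talk.

```python
import re

# Assumed shape of a long UW: "(<inner UNL expression>)"(<restriction>)
# The optional scope label such as ":01" is kept inside the inner expression.
LONG_UW = re.compile(r'^"\((?P<graph>.*)\)"\((?P<restriction>[^()]*)\)$')


def deconvert_long_uw(uw, dictionary):
    """Return ("idiomatic", translation) if the whole long UW is in the
    dictionary, else ("compositional", inner_expression) after unwrapping."""
    if uw in dictionary:
        # Known long UW: use the idiomatic equivalent stored in the dictionary.
        return ("idiomatic", dictionary[uw])
    match = LONG_UW.match(uw)
    if match is None:
        raise ValueError("not a long UW: " + uw)
    # Unknown long UW: unwrap the headword and drop a trailing scope label.
    graph = re.sub(r":\d+$", "", match.group("graph"))
    return ("compositional", graph)


uw = '"(mod(bank(icl>institution).@entry,world))"(icl>institution)'
print(deconvert_long_uw(uw, {}))                       # unknown: unwrap
print(deconvert_long_uw(uw, {uw: "Banque mondiale"}))  # known: idiomatic
```

The unwrapped expression would then be handed to the normal deconversion of a UNL scope, which is outside this sketch.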
Another example
«"(mod(university.@entry,Tokyo(icl>town)):01)"(icl>entity)»
Compositional deconversion: Université de Tokyo; University of Tokyo; Universität von Tokyo; Tokyo no daigaku (or Tokyo ni daigaku)
Idiomatic deconversion: Université de Tokyo (or Todai!); Tokyo University / University of Tokyo; Universität Tokyo; Tokyo daigaku / Todai
Elliptical expressions
Example:
Do you prefer the first or the second solution?
I prefer the first.
Je préfère le premier? Je préfère la première?
==> A bad deconversion will be very misleading.
Possible solution: encode the elided element and put .@eld on it.
That is equivalent to "preediting" the input text:
I prefer the first <eld>solution</eld>.
…and in the spirit of the new idea by H. Uchida of preediting for semantic relations
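A minimal Python sketch of how such preedited text could be read is given here. The `<eld>` tag is the one shown on this slide; the function name `split_preedited` and the returned triple are assumptions made for illustration only.

```python
import re

ELD = re.compile(r"<eld>(.*?)</eld>")


def split_preedited(text):
    """Return (surface_text, full_text, elided_words) for preedited input.

    surface_text: what the author actually wrote (elided words removed)
    full_text:    the text with the elided words restored
    elided_words: the words that an enconverter would mark with .@eld
    """
    elided = ELD.findall(text)
    full = ELD.sub(r"\1", text)
    # Remove each elided element together with the whitespace preceding it.
    surface = re.sub(r"\s*<eld>.*?</eld>", "", text)
    return surface, full, elided


print(split_preedited("I prefer the first <eld>solution</eld>."))
```

With the elided noun available, a French deconverter can choose the right gender ("la première" for "solution") while still leaving the noun itself unexpressed.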
Treatment of arguments
in the UW dictionary
in the UNL expressions
See talk by I. Boguslavskij
The solution proposed entails:
1. a very small change in the UNL syntax: allow attributes .@A, .@B, .@C, .@D on arcs, hence also on restrictions by semantic relations;
2. a discipline in UW creation: all arguments should appear as restrictions.
"Argument-full" + "readable" UW
Argument-full:
look(icl>do,agt.@A>person,obj.@B>thing);
look(icl>do,agt.@A>person,gol.@B>thing);
look(icl>do,agt.@A>person,dst.@B>thing);
Readable:
look(icl>do, agt.@A>person, obj.@B>thing); look for something
Even more readable:
look for(icl>do, agt.@A>person, obj.@B>thing); look for something
Continuing that list…
look for(icl>do,agt.@A>person, obj.@B>thing); look for something
look at(icl>do,agt.@A>person, plt.@B>thing); look at something
or
look at(icl>do,agt.@A>person, obj.@B>thing); look at something
look like(icl>do,agt.@A>person, cmp.@B>thing); look like something
look like(icl>do,agt.@A>person, obj.@B>thing); look like something
(might also cover "look as" in "he looks as a good man")
or
look as if(icl>do,agt.@A>person, obj.@B>thing); it looks as if…
look(icl>do,agt.@A>person, obj.@B>thing); look for something
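To show what an "argument-full" UW makes available to a tool, the following Python sketch parses one entry into its headword and restrictions. It assumes the proposed notation exactly as written on these slides (argument labels .@A to .@D placed between the relation name and ">"). The names `Restriction`, `parse_uw` and `arguments` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Restriction(NamedTuple):
    relation: str            # e.g. "icl", "agt", "obj"
    argument: Optional[str]  # "A".."D" if the restriction is an argument
    target: str              # e.g. "do", "person", "thing"


RESTRICTION = re.compile(
    r"^(?P<rel>[a-z]+)(?:\.@(?P<arg>[A-D]))?>(?P<target>.+)$"
)


def parse_uw(uw):
    """Split 'headword(r1,r2,...)' into (headword, [Restriction, ...])."""
    headword, _, rest = uw.partition("(")
    if not rest.endswith(")"):
        raise ValueError("missing restriction list: " + uw)
    restrictions = []
    for part in rest[:-1].split(","):
        match = RESTRICTION.match(part.strip())
        if match is None:
            raise ValueError("bad restriction: " + part)
        restrictions.append(
            Restriction(match.group("rel"), match.group("arg"),
                        match.group("target"))
        )
    return headword.strip(), restrictions


def arguments(uw):
    """Map each argument label to its (relation, target) pair."""
    _, restrictions = parse_uw(uw)
    return {r.argument: (r.relation, r.target)
            for r in restrictions if r.argument is not None}


print(parse_uw("look for(icl>do,agt.@A>person, obj.@B>thing)"))
print(arguments("look at(icl>do,agt.@A>person, plt.@B>thing)"))
```

The sketch does not handle nested restrictions or a trailing ";", which a real dictionary reader would need.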
Attributes
The problem:
lion(icl>mammal).@plur
==> un lion, les lions, lions?
We don't know whether definiteness
has been computed ==> it is .@undef ==> use it
or not ==> it is UNKNOWN ==> compute default
Solution: for every attribute XXXX, put
.@XXXX for +XXXX (1 or true)
.@unXXXX for -XXXX (0 or false)
nothing for XXXX unknown (? or undefined)
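The three-valued convention can be sketched in Python, with `True`, `False` and `None` standing for +XXXX, -XXXX and unknown. The function names are hypothetical, and the sketch assumes that node attributes follow the last closing parenthesis of the UW.

```python
from typing import Optional


def attribute_value(node: str, name: str) -> Optional[bool]:
    """Read attribute `name` on a node such as 'lion(icl>mammal).@plur'.

    Returns True for .@name, False for .@unname, None if neither is present.
    """
    # Only look at what follows the UW's restriction list, so that labels
    # inside the parentheses are not mistaken for node attributes.
    tail = node.rsplit(")", 1)[-1]
    attrs = tail.split(".@")[1:]
    if name in attrs:
        return True
    if "un" + name in attrs:
        return False
    return None


def definiteness_action(node: str) -> str:
    """Decide whether a deconverter can use the encoded definiteness."""
    value = attribute_value(node, "def")
    return "compute default" if value is None else "use encoded value"


print(attribute_value("lion(icl>mammal).@plur", "plur"))
print(attribute_value("lion(icl>mammal).@plur", "def"))
print(definiteness_action("lion(icl>mammal).@plur.@undef"))
```

The point of the convention is visible in the second call: the absence of both .@def and .@undef is reported as unknown rather than silently read as "indefinite".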
XML formats for UNL documents
A minimal UNL-xml format, strictly equivalent to UNL-html
– proposed & used by Tsai W.J. for the SWIIVRE-UNL web site & his Ph.D.
Methodology for defining and using other, more detailed UNL-xml-xyz formats:
– xyz is an application (e.g. a graphical editor, a statistics-gathering tool, etc.),
– automatic parsing of the basic UNL-xml format introduces new tags,
– a document object model (DOM) suitable for application xyz can then be defined and used.
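The step "automatic parsing introduces new tags" can be illustrated with a small Python sketch. The actual UNL-xml tag set is not given in these slides, so every tag name below (`unl-doc`, `sentence`, `unl`, `relation`, `node`) is a made-up placeholder, and the sketch handles only one binary relation per `unl` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical "basic" document: the UNL expression is plain text.
BASIC = (
    '<unl-doc><sentence id="1">'
    "<unl>mod(bank(icl>institution).@entry,world)</unl>"
    "</sentence></unl-doc>"
)


def split_relation(text):
    """Split 'rel(x,y)' into (rel, x, y) at the top-level comma."""
    name, _, body = text.partition("(")
    body = body[:-1]  # drop the final ")"
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return name, body[:i], body[i + 1:]
    raise ValueError("not a binary relation: " + text)


def enrich(xml_text):
    """Parse the basic format and add more detailed (hypothetical) tags."""
    root = ET.fromstring(xml_text)
    for unl in list(root.iter("unl")):
        name, first, second = split_relation(unl.text.strip())
        unl.text = None
        rel = ET.SubElement(unl, "relation", {"name": name})
        ET.SubElement(rel, "node").text = first
        ET.SubElement(rel, "node").text = second
    return root


print(ET.tostring(enrich(BASIC), encoding="unicode"))
```

An application xyz would then work on the enriched tree through its own object model instead of re-parsing the UNL text.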