TRANSCRIPT
Montague Grammar and MT
Chris Brew, The Ohio State University
http://www.purl.org/NET/cbrew.htm
795V, Autumn 2005
Machine Translation and Montague Grammar
Great paper by Jan Landsbergen, in Readings in Machine Translation.
The place of linguistics in MT
What is the essence of Montague Grammar?
How can we use it (the essence) in MT?
The subset problem
How does this look today?
Possible translations
It must be clearly defined what the correct sentences of the source and target languages are.
Linguistic theory provides the means to do this, by providing grammars with associated compositional semantics. Landsbergen suggests a Montague-inspired grammar.
If the input is a correct source language sentence, the output should be a correct target language sentence.
This is a condition on the design of the translation system. Landsbergen sketches one approach.
There must be some definition of the information content that the source and target sentences should have in common.
This is a call to arms for translation theory; no good solution is currently available.
Best translations
It must be clearly defined what the correct sentences of the source and target languages are.
This defines the search space of possible inputs and outputs.
If the input is a correct source language sentence, the output should be the best corresponding target language sentence. The system will be evaluated on its treatment of correct sentences.
Robustness with respect to incorrect input is not required.
It could be that there are three sentences e, f, and e' such that f is the best translation of e but e' is the best translation of f: 'best translation' is not a symmetric relation.
By contrast, 'possible translation' is symmetric. In addition, if we have three languages E, F, G, then we have transitivity: possible(E,F) ∘ possible(F,G) = possible(E,G).
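To see the two properties concretely, here is a minimal Python sketch with invented sentence pairs: the 'possible translation' relation can be inverted (symmetry) and composed across a pivot language (transitivity), while 'best translation' is a choice function that nothing forces to invert to itself.

```python
# Toy illustration: 'possible translation' as a set of sentence pairs.
# All sentences and pairs below are invented for illustration.

possible_EF = {("she sleeps", "elle dort"), ("the kids", "les enfants")}
possible_FG = {("elle dort", "zij slaapt"), ("les enfants", "de kinderen")}

def inverse(rel):
    """Symmetry: the inverse of 'possible translation' is equally valid."""
    return {(b, a) for (a, b) in rel}

def compose(r1, r2):
    """Transitivity: chaining E-F pairs with F-G pairs yields E-G pairs."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

print(inverse(possible_EF))                 # F-E pairs, equally 'possible'
print(compose(possible_EF, possible_FG))
# {('she sleeps', 'zij slaapt'), ('the kids', 'de kinderen')}

# 'best translation' is a choice function, and nothing forces it to be
# symmetric: best(e) = f does not imply best(f) = e.
best_E_to_F = {"the kids": "les enfants"}
best_F_to_E = {"les enfants": "the children"}   # e' != e
```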
Comparing MT systems
It is possible to reason theoretically about systems that at least aspire to Landsbergen’s principles
There are no obvious grammatical or semantic criteria for evaluating systems when the output is not even a correct sentence of the target language.
Linguists should specify the possible translations.
Engineers (or linguists wearing hard hats) should worry about robustness and translation selection.
The robustness part might need to appeal to world knowledge, discourse history, knowledge of the task, and other extralinguistic factors.
The essence of Montague Grammar
There is a set of basic expressions with meanings
Rules are pairs of a syntactic and a semantic rule, where the syntactic and the semantic rules work in lock-step (Rule-to-rule hypothesis)
Either: the semantic rules are operators that build up the semantic value (Montagovian)
Or: the semantic rules build up an expression in some logic, then the expression is interpreted by the rules of the logic to produce a standardized semantic value (echt Montague)
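A minimal sketch of the rule-to-rule idea in the Montagovian style, where the semantic half of each rule is an operator on semantic values. The two-word lexicon and single rule are invented for illustration, not Montague's actual fragment.

```python
# Rule-to-rule, sketched: every rule pairs a syntactic operation with a
# semantic operation, applied in lock-step. Toy grammar, not Montague's.

# Basic expressions paired with meanings (meanings here are strings or
# functions over strings, standing in for model-theoretic objects).
lexicon = {
    "John":   ("NP", "john'"),
    "sleeps": ("VP", lambda subj: f"sleep'({subj})"),
}

def rule_S(np, vp):
    """Syntactic rule: S -> NP VP.  Semantic rule: apply [[VP]] to [[NP]]."""
    (np_cat, np_sem), (vp_cat, vp_sem) = np, vp
    syntax = ("S", np_cat, vp_cat)
    semantics = vp_sem(np_sem)      # the meanings combine as the trees do
    return (syntax, semantics)

tree, meaning = rule_S(lexicon["John"], lexicon["sleeps"])
print(meaning)   # sleep'(john')
```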
Landsbergen’s system
M-grammars have surface trees (S-trees). S-PARSER is standard technology; it generates a parse forest of S-trees.
M-PARSER scans the results of S-PARSER and applies a series of analytical rules to the S-trees, rewriting them into derivation trees. The M-PARSER is very powerful, and builds up semantic values.
The result of M-PARSER is a semantic tree that is easy to transfer.
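A toy rendering of the pipeline, with invented stand-ins for the real S-PARSER and M-rules; the point is only that the output of the second stage records which rule and which basic expressions built the sentence, and that is what transfer operates on.

```python
# Two-stage analysis, sketched with invented stand-ins for the real rules.

def s_parser(sentence):
    """Stand-in S-PARSER: returns a (one-tree) parse forest of S-trees."""
    subj, verb = sentence.split()
    return [("S", ("NP", subj), ("VP", verb))]

def m_parser(forest):
    """Stand-in M-PARSER: analytical rules rewrite each S-tree into a
    derivation tree naming the rule and basic expressions that built it."""
    return [("R_subject_predicate", ("basic", verb), ("basic", subj))
            for (_, (_, subj), (_, verb)) in forest]

print(m_parser(s_parser("Jan slaapt")))
# [('R_subject_predicate', ('basic', 'slaapt'), ('basic', 'Jan'))]
```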
The subset problem
Montague grammars translate natural language into subsets of intensional logic
There is no guarantee that the subset will be the same for every language
Without extra cleverness, the only sentences that can be translated will be those which are in the intersection of the source language IL and the target language IL
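A toy way to see the problem, with invented IL formulas: each grammar reaches only some subset of IL, and translation is only defined on the overlap.

```python
# Toy subset problem: each grammar maps its sentences into some subset
# of intensional logic (IL). Formulas are invented placeholders.

source_IL = {"sleep'(john')", "believe'(john', sleep'(mary'))",
             "seek'(john', ^unicorn')"}
target_IL = {"sleep'(john')", "believe'(john', sleep'(mary'))",
             "love'(mary', john')"}

# Without extra cleverness, translation is only defined on the overlap.
print(source_IL & target_IL)
# {"sleep'(john')", "believe'(john', sleep'(mary'))"}
```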
Isomorphic grammars
To avoid the subset problem, impose the constraint that:
For every syntactic rule in one language there is a corresponding syntactic rule in every other language, and the meaning operation is the same across the board
For every basic expression, there is a corresponding one in every other language
This is a really heavy constraint on grammar writers, and it isn't clear how to do it
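A sketch of the cheapest piece of the constraint, the basic-expression correspondence, using invented toy lexicons keyed by shared meanings. Coordinating the rules themselves is the hard part and is not shown.

```python
# Isomorphic grammars, sketched: meaning operations and rule inventories
# are shared across languages; only basic expressions differ. All
# entries below are invented toy data.

basic = {
    "en": {"believe": "believe'", "John": "john'"},
    "nl": {"geloven": "believe'", "Jan": "john'"},
}

# Invert each lexicon: meaning -> basic expression.
surface = {lang: {m: w for w, m in lex.items()} for lang, lex in basic.items()}

def translate_basic(word, src, tgt):
    """Swap a basic expression for the target one with the same meaning."""
    return surface[tgt][basic[src][word]]

print(translate_basic("believe", "en", "nl"))   # geloven
print(translate_basic("Jan", "nl", "en"))       # John
```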
Grammar writing
A set of compositional rules R is written for handling a particular phenomenon in language L; a corresponding set of rules R' is written for handling the corresponding phenomenon in language L' (Landsbergen, p. 250).
Grammar development proceeds in parallel. You test by ensuring that R covers the relevant expressions of L and R’ covers the relevant expressions of L’
The most important practical difference between this and other approaches is probably that the grammars are written with translation in mind.
The claim
If you do this grammar-writing co-ordination, you can get away without worrying about the subset problem
Montague grammar may be way too complicated, but if Dutch geloven works the same as English believe, you can in that case get away with the same theoretically insufficient representation on both sides.
You might be able to control the consequences of putting extra (non-truth-functional) control information into the IL by doing this on a case-by-case basis, in order to co-ordinate specific phenomena. (DANGER)
How does this look today?
Practical experience with broad-coverage grammars
We now know that broad-coverage grammars produce large numbers of analyses, most of them crazy.
It definitely pays to do some kind of probabilistic parse selection, even if you have a good broad-coverage grammar.
If your goal is to do well on existing parsing metrics, it works well to learn the grammar from a treebank.
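A minimal sketch of what "learn the grammar from a treebank" can mean: relative-frequency PCFG estimation over a tiny invented treebank, with the rule-probability product then used for parse selection.

```python
from collections import Counter

# Tiny invented treebank: trees as (label, child, child, ...) tuples.
treebank = [
    ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "stars"))),
    ("S", ("NP", "he"), ("VP", ("V", "slept"))),
]

def rules(tree):
    """Yield (lhs, rhs) rule instances from a tree, top-down."""
    lhs, kids = tree[0], tree[1:]
    rhs = tuple(k[0] if isinstance(k, tuple) else k for k in kids)
    yield (lhs, rhs)
    for k in kids:
        if isinstance(k, tuple):
            yield from rules(k)

rule_counts = Counter(r for t in treebank for r in rules(t))
lhs_counts = Counter()
for (lhs, _), n in rule_counts.items():
    lhs_counts[lhs] += n

# Relative-frequency estimate: P(rule) = count(rule) / count(lhs).
prob = {r: n / lhs_counts[r[0]] for r, n in rule_counts.items()}

def score(tree):
    """Parse selection: prefer the tree with the highest probability product."""
    p = 1.0
    for r in rules(tree):
        p *= prob.get(r, 0.0)   # unseen rule -> 0; real systems smooth this
    return p

print(score(treebank[0]))
```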
The linguistic question
Given a tree, tell me how to make a score for the tree out of smaller components
Given a tree
Tell me how to break it down into smaller components
Smaller components because these smaller components are going to be common enough that the statistics over them might be reliable
But large enough that the crucial relationships between the parts of the tree have a chance of coming through
Probabilistic context-free grammars are (slightly?) too coarse-grained.
So we adjust them in ways that bring out more of the crucial relationships: add parents, grandparents, head-words, other clever stuff.
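One such adjustment, sketched under the same tuple encoding of trees as above: parent annotation, which splits each category by its parent's label before the rule statistics are collected. Grandparent and head-word annotation follow the same pattern.

```python
def annotate_parent(tree, parent="TOP"):
    """Parent annotation: split each nonterminal by its parent's label
    (NP under S becomes NP^S), so rule statistics see more context."""
    if not isinstance(tree, tuple):
        return tree                      # leaf word: unchanged
    label, kids = tree[0], tree[1:]
    return (f"{label}^{parent}",) + tuple(annotate_parent(k, label) for k in kids)

t = ("S", ("NP", "she"), ("VP", ("V", "slept")))
print(annotate_parent(t))
# ('S^TOP', ('NP^S', 'she'), ('VP^S', ('V^VP', 'slept')))
```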
Given a translation pair
Tell me how to break it down into smaller components
Smaller components because these smaller components are going to be common enough that the statistics over them might be reliable
But large enough that the crucial relationships between the parts of the pair have a chance of coming through
Language model for the TL: standard technology. Models 1, 2, 3, 4, 5 for the SL-TL correspondence.
Clearly very coarse-grained. How to adjust so that more of the crucial relationships come through? How to think about translation pairs?
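A sketch of the noisy-channel scoring this slide alludes to, with invented log probabilities; the channel model here is only Model-1-flavoured, using a best-alignment approximation instead of the real sum over alignments.

```python
# Noisy-channel scoring, sketched: pick the TL sentence e maximising
# log P(e) + log P(f | e). All tables below are invented toy numbers.

lm = {"the house": -1.0, "house the": -5.0}       # TL language model (log probs)
t_table = {("la", "the"): -0.4,   ("maison", "house"): -0.5,
           ("la", "house"): -7.0, ("maison", "the"): -6.0}

def channel_score(src, tgt):
    """Each SL word aligns to its best TL word (a crude Model 1 stand-in)."""
    return sum(max(t_table.get((f, e), -20.0) for e in tgt.split())
               for f in src.split())

def best_translation(src, candidates):
    return max(candidates, key=lambda e: lm.get(e, -20.0) + channel_score(src, e))

print(best_translation("la maison", ["the house", "house the"]))   # the house
```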
Errorfulness
The Penn Treebank (PTB) is smallish and somewhat errorful. This imposes practical limits on the complexity of models: the more detail you ask for, the less likely your training procedure is to provide it in reliable form.
Hand-written grammars blur the distinction between ungrammaticality and lack of coverage.
It is therefore dangerous for components that use grammars to give too much weight to the grammar’s claims about ungrammaticality
Even when the grammar fails to provide a complete analysis, it could provide useful partial results.
Errorfulness
Current word-aligned corpora are tiny, but do at least exist. Presumably they too are errorful.
Unsupervised learning via EM has dominated the field. This is because nothing better is available. The pseudo-annotation that EM hallucinates is very errorful.
The complexity of models is limited by the need to do EM and by the difficulty of working with errorful annotation.
It is dangerous for the system to believe hard-and-fast things about intertranslatability
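A compact sketch of the EM loop for a Model-1-style translation table over an invented two-sentence corpus; the expected counts computed in the E-step are exactly the pseudo-annotation the slide refers to.

```python
from collections import defaultdict

# EM for a Model-1-style translation table. The E-step's expected counts
# are the 'hallucinated' pseudo-annotation: soft word alignments the
# model believes in, errors and all. Corpus is invented.

corpus = [("la maison", "the house"), ("la fleur", "the flower")]
pairs = [(f.split(), e.split()) for f, e in corpus]

t = defaultdict(lambda: 0.25)            # t(f|e), uniform start

for _ in range(10):
    counts = defaultdict(float)
    totals = defaultdict(float)
    for fs, es in pairs:
        for f in fs:
            z = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / z        # E-step: fractional alignment count
                counts[(f, e)] += c
                totals[e] += c
    for (f, e), c in counts.items():     # M-step: re-estimate t(f|e)
        t[(f, e)] = c / totals[e]

print(round(t[("maison", "house")], 3))  # climbs toward 1.0 over iterations
```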
Coverage
To score well, it usually pays to guess, even if:
The question seems so stupid that no sensible answer is possible
Your answer would be little better than a random guess
Statistical parsers build up models of grammar that always make a guess
The models learn from the whole of the data. They might be designed to learn linguistic things, but they can and do implicitly learn non-linguistic things that turn out to help.
Coverage
To score well, it usually pays to guess, even if:
The question seems so stupid that no sensible answer is possible
Your answer would be little better than a random guess
Brown-style MT systems have good coverage, and not-bad probabilistic models of <something>. They too learn from the whole of the data.
Their design is shaped partly by the need to model linguistic things (e.g. word order variation) and partly by accidental success in modeling other factors that we don't understand yet.
Conclusions
There is a clear parallel between Landsbergen's notion of intertranslatability and Montague's notion of grammaticality.
Arguably, statistical parsers succeed because they relax the notion of grammaticality, allowing them to handle misfires in the grammar smoothly. Co-incidentally, they finish up robust to other difficulties, including weaknesses in the statistical models and the training data.
Conclusions
There is a clear parallel between Landsbergen's notion of intertranslatability and Montague's notion of grammaticality.
Arguably, MT systems succeed because they relax the notion of intertranslatability (or just fail to even have such a notion).
Co-incidentally, this makes them robust to failings in the statistical modeling, the data, and the procedures for data augmentation.
That said, it would be nice to have explicit semantics in MT systems