i 591113 fulcrum techniques to languabe analysis · trw computer division c169-3u5 af30(602) -2643...
Post on 12-Jun-2020
1 Views
Preview:
TRANSCRIPT
I IAD-TDI131UMARCH 591113
"FULCRUM TECHNIQUES TO LANGUABE ANALYSISTHOMPSON RAMO WOOLORIOGE INC.t o1 0. M U .. ,,, N4
q44
1c
UL
0-
C1)IB
Ic
D DD
'HAH/l']'EiJlbHbLA4I (A) SIGNIFICANT (B) CONSIDERABLE (C) LARGE (D) GREAT
IRP AT PDON FT ONO OUITR MM W4KAt MLMRuu u FWi S~MsuW if $TAM Aif MU FIRM. iaauff am .mWA U NEUKCW . 9018es aimm mi
PATENT NOTICE: When Government drawings, specifications, or other data are used for any purpose other than In connection
with a definitely related Government procurement operation, the United States Government thereby incurs no responsibility nor
any obligation whatsoever and the fact that the Government may have formulated, fumished, or in any way supplied the said
drawings, specifications or other data is not to be regarded by implication or otherwise as in any manner licensing the holder
or any other person or corporation, or conveying any rights or permission to manufacture, use, or sell any patented Invention that
may In any way be related thereto.
Qualified requestors may obtain copies of this report from the ASTIA Doctument Service Center, Dayton 2, Ohio. ASTIA Servicesfor the Department of Defense contractors are available through the "Field of Interest Register" on a "need-to-know" certified
by the cognizant miitary agency of their project or contract.
!
RADC-TDR-63- 168
5 March 1963
IFINAL REPORT
FULCRUM TECHNIQUES TO SEMANTIC ANALYSIS
IIi
THOMPSON RAMO WOOLDRIDGE INC.TRW COMPUTER DIVISION
C169-3U5
AF30(602) -2643Project No. 4599
Task No. 459901
I[[r[
IIPrepared for the Information Processing Laboratory,Rome Air Development Center, Research and Technology Division
r Air Force Systems Command, United States Air ForceGriffiss Air Force Base, New York
[
FORE WORD
The Machine Translation group at Thompson Ramo Wooldridge
Inc. has been working under the sponsorship of the Intelligence Labo-
ratory of the Rome Axr Development Center, Griffiss Air Force Base,
since 1959. This research continues work done under previous con-
I tracts with RADC.
During the course of the present research a concurrent contract
has been in effect with the National Science Foundation. Some of the
tools used in the present research were created under NSF support.
In general, studies in the area of Semantics have been done under
contract with RADC, while work to improve the techniques for re-
search in MT has been done under contract to NSF.
The support of the Intelligence Laboratory and its members
at the Rome Air Development Center is hereby gratefully acknowledged.
III
[
[Ii
I
I This report represents the work of
I Loretta Ertel
Herbert L. Holley
Paul L. Garvin
I Jules Mersel (Project Manager)
Christine A. Montgomery
I George Onischenko
Gerhard Reitz
Steven B. Smith (Associate Project Manager)
I!!I
II
[[
!I[ ABSTRACT
A new semantic research technique has been developed andemployed under this contract. This technique makes possible the
construction of semantic classes of words which share the character-
istic of specifying a particular translation for a given polysemantic
Russian content word.
Improvements have been made in the RW program for the
areas of subject recognition and clause boundary determination.
Conclusions were reached about an improved English
synthesis. The latter involves reformulating Russian structures
I into appropriate but non-corresponding English structures.
The major hardware implication of the research suggests
that associative memories may provide a simplification in the area
of machine translation.
Flow charts and a sample translation are included.
II
[
L iv
IIiIIi PUBLICATION REVIEW
This report has been reviewed and is approved.
ApprovedFRANK J. INIChief, I ttion Processing laboratoryD rof tInti gecDirectora eof Intelligence & Electronic Warfare
Approved: INN, ol, USAF
Director of Intelligence & Electronic WarfareIrFOR THE COMMANDER:
IRVING GABELMANDirector of Advanced Studies
[![
[[I
[II CONTENTS
I Forward ............................................... ii
List of Personnel ........................................ iii
I Abstract .... ............................................... iv
Publication Review Page .... ............................... v
I Introduction ............................................. . 1
Semantic Research ..... .................................... 2
Clause Boundary Determination ........................... 21
I Subject Recognition ...................................... .. 23
Transformation Programs for Word OrderArrangement in the English Translation .................. 24
Synthesis .... .............................................. 26
Machine Translation of Stylistic Differences .................. 27
Semantic Research Using the Ushakov Dictionary ........... 30
Test Sentences for Syntax Flow Charts ....................... 31
I Hardware Implications of the Research .................... 45
Dictionary Updating ........................................ 52
Sample Translation .... ..................................... 53
Flow Charts of Determiner-Determinee Method ............. 64
I Conclusions and Recommendations ......................... 73
!I
| vi
I
INTRODUCTION
The first section of the report describes the semantic research
technique which was developed under this contract. This technique
is dependent upon our capability for automatic sentence parsing by
[ the Fulcrum approach.
There were two goals in this research; first to define rules
for a particular type of multiple meaning problem that had hereto-
fore been either ignored or considered impossible to solve; second,
to determine what light such rules might shed on the problem of
semantics in general.
Both goals were achieved; the second in particular displayed
certain consistencies in the Russian semantic system which are
remarkable.
I In our research, semantics means the ability to choose the
correct translation from among those provided in the dictionary.I Our studies have shown that the other text words which determine
this selection often fall into groups which themselves seem con-
ceptually related. There has always been a hope among researchers
that such semantic classes could be defined. We feel that our tech-
nique allows us to uncover the identity of these classes and, even
possibly, may provide a metric for the semantic distance function.
III[FF
[I
SEMANTIC RESEARCH
The first step in our research was to select Russian text and
acquire corresponding English translations of that text. The Russian Itext was selected from the field of biology and was taken from the
Doklady Akademii Nauk. All the text was chosen from the same
field because we were interested in problems of multiple meaning
within a single field, rather than differentiations in translations
which were field-defined.
On the opposite page we show p. 564of Vol. 123 of the Doklady for biology.
Aeomaaauw Axanemumw ays CC.CIsIN& Tomnin .IA M
300JorHRC. A. MHJIEAKOUCKHA
J1YHHAq nEPHomj~qHoCTb HEPECTA Y JIHTOPAJ~bHbMXH DEPXHECYSJIHTOPAJ~bHbX BECnO35OH04HbhX
SEJWOO MOPS H APYrHX MOPER
(flpedcmaeiefo axade.wom H. H. f1J.a..wy~vso 8 ViIl 1958)
.Tlyrnasn flCpNoANqNocTb paarnmomermnt 1i aepecra, CBAofCrmernea AMNorkM311kM fiHTOpA~bfMX GCeCUOUOHO'NMIX ?portl4'IeCKIX N Oopea~AbHuX Mope ("6).I~A Oym 4 ayw opANOmN apvTuqecimnx mopefi Ao CMx flop s IInTepalype Ne o~me.
H&a~wNse JayNNOA UCpHOAH'INOCTN 8 J)a3MMO2KCNNN AIITOp~bHhIX GecnO.a3MHOq~beX Beaoro mopu Gsuio ycraimonAaeao maN nyTeM H3YhlIClNX ANNAMatunu%HCJWNNOCTH 9 NJUIIKTONC lix ne11M`111fCK3IX JII4NHOK; AOC1034'PHOC~h We PC-ayjrabToS (JOpa6OTKH KoJni'ecTueHhimx flpoG mopCKoWo xoJo- it meponlarncToHaAOKa3M~a CTalNCTH~eCKN (1-).
MasrepilaA cofeira~cii a BeJIHxofI CIJIme - flpwoAe me*Ay n-o. Knuboxi o. BSJINKiiA (Bejaoe msope, KI(SA8RISKUICKIM1 laJIns). C 26 VI no 14 IX 1957 r.Oeaa v3310 58 xoJviNecTSCNiIx iipo6 IIJIanxroiia. flpooh 6p&IINcb pasaemwoc roJpn8owive 16-8 m 8- 0 u cermo LeAuex 113 rasa M 43 c 3ambmaTetmM. lily.q2Aaac AlHH8MIMK qHcACIIHoCTNj SCX AfftniIlOk ApNNMX OCCI1OIDONOqNMX, S ?K*-IS (AAR cpaNcIINC111) PS15* nOCTONH~HIX IIJINK1CPO@ N HX 111111I1101. BCC npo6aa,4mucupcianuhee 4.1 4IopII8JINoN, OUm'iN ripoctNinami ?T8JshII. AJnn IIocTpO-vNN Ipa:N4K0S NcnoJob3OS8Icb wAeJ1n~it 11A0NocTN A&Noro sum a I us
scjioe 16-0 m, raK icAK qcNJ~eNiioclb 9cex 113y'iaemhx sHAos NmOmNEJI8Cb 5 Oo-oUX ropH30ouirX CHHXPOHNo. Kpome TorO o w) cex rpa*rnacX SMUHIPMH.CICHC ftpauouoepiwe PR~bi (npodim 6panislCb qacro. s NIOJe qepe3 AMHI - vAU, 8 airY-Cce - ceuiidpe 'sepel Asa-'leriPe MNR. H3 He coCCm peryjixpiio, rJaa3HmmI o~~Cpa3ota iU33a noroAbi) nepe~eme~ JInnehIol Narrepnomutieei a paiSNomep-Nue - TpexA~eB~lue. .9r, a TaKNCC DIxiNC licex npo6 a oAnlY N Ty *e $uayflpwJiN (Na1 IIOJIOR DoAe) N "a GA~toN H Tom me meec B. CIJImm nowAmoaea8 3111111tieJbNOR Clefleui cNNTh MJIONCT CAY'lqgAHOCrN. Taic ica MaXCUMbMbUar~y6NNI B. Coamwa Gico~o 25 m. a npeoGJnaltaior rJI)ON~m Ao 20 m, TO JI'MUMAGnHhIX 6erfloloNoHoWtx, ecTpeyssiumeen a cc njianKiTowe, ace IIpNmaxenuaANK JIIITOPa~bNUM N 3epxuecyGJIHrOpa~lb~aaM 334MM.
Kaic nOKalaJI aMSaJINS, ANNaMHKe tIiCJocniIIocitN 6011buiel scrTH J1N'IIIOK AON-
NhIX 651Cf03S0NH'I~hX M3 [naNstious B. CaJImwI npNcyul eARNNIAf 31aconomepniiApirTm.orpawsiaxio.i~lJyNmIo nepnoAm'INocTb NCeCCTI EANbiwx SNAGS (pmc. I a -- r).fiyurnia nepwoAN'sNocri Napecra cooi)craeu~s. no NIIMiuM AINmmm, cae~yn-utum daoeC~otiotiNmm B. Cumin: 6pioxo~swum Lacuna divarlcata (0. Fabr.),Littorina littorca L.; Diaphana minuta Brown, Phifine aperta L., Lniapontlacapitals MOII., Eubrandius exlguus (A. a. H.). Tergipes despectus Johnston,a mine, aepoxmo, ee pxxy muaos us orpsAs~ reoJwMadopHux (J); ona npncy-
,,a pAy sycreop'larIx - MytIlus edullIs L., Macosna baltica L., Mysart-naria L.; seoKTopum ycouarsrn - Balanus balanus L., Verrca strOmla 0. F.MGIIer; PRAY Munaus N uMosoxiax Ophiopholis aculeata L., Ophiura robuataAyers, Aiterias rubena L. (ca.. pitC. 1), a Taicxce, oquu~ao, cyAn no cyamapuad
A)NaMNICS m'INCJirNOCu, N pxAy msNS Polydiaeta.
I 3
ii
The next problem was that of deciding which words to deal
with. In the past, a good deal of semantic research has been
devoted to problems of multiple meaning in connection with function
words (such as prepositions, particles, conjunctions), while-
except for a few instances of idioms-the problem of the multiple
meaning of content words (i. e., nouns, verbs, etc. ) was ignored.
We therefore determined to fix our primary attention on content
words.
On the opposite page, we give asample of the content words investigated -
along with their multiple translations.
A
41
A. in the process of I. relating to
B. connection(s) J. owing to
C. on account of K. due to
D. in view of L. 0E. relation M. relationship
F. in response to N. related
G. accordingly 0. thus
H. because of
I AaaJbHOeAE@
A. then F. ultimately
B. now G. further
C. subsequently H. later
D. subsequent 1. henceforth
E. prolonged
IaKWI pasmxi
A. various D. differing
I B. different E. varying
C. variation in the F. differences in the1I A. for C. over the course
B. course D. 0
ICTopo~a
A. side E. on the other hand
B. toward F. on one hand
C. direction G. laterally
ID. past
I c~yua~x
A. case C. events
B. occasions D. incidents
1 5
Next it was necessary to find out which content words had
multiple meaning problems within a single field, and occurred Hwith sufficient frequency to allow us to make some generalizations
about their behavior. For this purpose a program was written to
give us a frequency count of all words occurring within a large
body of text, and print out the individual Russian forms in the
order of their frequency of occurrence. We thereupon decided
to investigate all those content words with multiple equivalents
that had one form which occurred 20 or more times, as well as
certain other words of particular interest to us. jj
The following is a sample of the outputof the word frequency program. The wordsare listed in order of frequency and the num-ber to the far left indicates the position the I'word occupies in the list. The first groupof numbers to the right of a word indicatesthe number of times that word occurred.The next group of numbers indicates what IIpercentage of the total number of occurrencesof all words is represented by this word andall words prior to it in the list. For example, IIthis provides us with the information that the77 most frequent words (out of a total of37859 running words) represent one third ofthe occurrences in this text. The next groupindicates how much of the total text is repre-sented by occurrences of this word. The nextto last group is the position of the word in the||list multiplied by its frequency of occurrence.(This number, according to Zipf'a law, shouldapproximate 0. 1. Our "Zipf" number, however, Hdoes not take into account that many numbers,in reality, occupy the same position in thelist.) The last group of numbers (here zeros)has as yet no significance.
6
IC
1.00 09 0 1Il w -i.P - oa . r ýc no oq,- noc
4. U00U 0 ) C) C U00U C C ' U0om MC) M MM 0n MM) NN 00
4 C!
'-P-P-O.O'.0 0*** aOlv O'D 1ýWINfi -0 -cý Q O - 0 - in- %A - - N - n VI &P M Vi V% -, n in -% U % - &
000 A*C OU0 0 0 0 000 0 0 0 0 0 0 0I ~ U00JCCU00)0)000C)CO)u)0 C0
Ic I.-j0-W elI >- 0 VI ill I0 4CA - l .0 - n.oJONW 4.
>zz z.' wu NNNNNW IIvi W4i I.-v a**iimnene nel11W i .11.1W f131. 1 7v WI lv i i -v .11 )h .1 1 Wi lv lv) lv .W v lililiWIliW
IJ x 'A- I-r
Q. Q U C UJ %A 0 %A2.4 P II -e u~cA~ . AjL
VIC V" 4 f~1-p -- PF-S0owommmomom OakP00009
The next stop was to list the contexts in which these forms
occurred. In order to find themn in text, another program was [devised to print an alphabetical list of all the words that occurred
in text along with'their text locations.
The following is a sample of the outputof the program to provide a concordance.The first number to the right of a word in-dicates the number of times that wordoccurred. Following this for each occurrenceof the word five things are provided. The firstletter (B here in every case) indicates the Icorpus in which the word occurred, the secondletter ,the article,- and the third letter the pageof that article. The first number indicatesthe line on which the word occurred and thesecond the position on that line (first word,second word, etc.). For example, AKTEoccurred once in corpus B, article R, page 1A, line 12, word 2.
80
ne
lo
4C
U.0 0
doCDim4 , do
go ~ itli, 0 11-- 40 NinF 0%A i OD111`1114
a- %in co' a i
10.41 &L 00 U I1jn ! z4 o
N r.. N~ ~ ~ 44 M0C1 0 4s
0. I0I ,
31 A-* .. 111 30- if- S o
0~~~~ 0 7( 0z d
'3 )j==I 1 IVA,
30 Z I#A'~ 3' 2t .2 *3
gi 3. Si 1 CI 1t3
For each word we thereupon listed twenty or more occurrences
in context, along with the professional English translation of the
word and its context.
As an example of our procedure we wil use the Russian Fword OTHOUoKSE, which translates into English as relation, regard,
etc. 0
The following page shows a work sheetof one of our researchers, listing text loca-tions, Russian contexts and English transla-tions. The translations for the word underconsideration are underlined.
H
[IN
L4WCUr, .
~~c~-4a ~ J~'Ch ~ 'Fa rE Thtk
IT po4V. o~7/c rb )e, A kv e
e , ce-, 74if- /-oc;/, .-L, ~ Cf. -A ýnI £ ~ 'V
0- e. g~k /, i / &i&' 0" s S
5',4f ~ ~ ~ ~ k SLIG-i"hcu~~/~
o. 14 oi LA~k Ol4(&-=I 4 eCIS C 744~.
It Ob -eO( F 71 eýilrT po L~ k 0-64, mi LTMOAA--AELA( 1c.k
1~. _0_ ___ _ _ _
M. -, PLAI&I-- l*A Ar0 IW9J o
(OI _-_. Om ,4 OO1 I 0TW 6iiA6,
Since translators use a proliferation of translations, many
of which overlap or are synonymous (see illustration on front 0
cover), it was necessary to determine the minimal number of
translations needed to render the word in question into good
English.
At the top of the following page wegive all the translations of OTHONE@HNused by the professional translators, andat the bottom the smaller number of trans-lations which we found necessary for theoccurrences under consideration. Instancesin which the professional translator recastthe entire sentence have not been includedin this table.
I1B
UNII
Translations Used by Professional Translators
Iratiorelationshipbehavior
i prationeffectsrelation
I respectproportionpoint of view
as regardsc onne ct ion
Srelisponse
(Necessary Tra ns lo fe i di
respectproportion
S~as regardsrelation0 (Illy" added to translation of preceding modifier)
ratioresponse
!!!
13II
0Once the preceding two steps had been completed, these
contexts were "mapped" in such a way that any consistencies might
be quickly brought to the attention of the researcher.
On the opposite page we give thesyntactic "mapping" of the word OTYOUOHNO.The numbers down the left hand marginindicate the sentence in which the wordoccurred. The letters down the right hand [margin refer to the translation which theprofessional translator used in each case.The first column on the left gives theRussian words of which this word was anobject. The second column lists the prep-ositions of which this word was an object.The third column gives the modifiers whichpreceded the word in text. The fourthcolumn lists genitive nominals which followedthe word, and the next to last column showsthe prepositions which were governed by theword. The final column gives the objects ofthe prepositions which were governed byOJ KO3OHNi.1
0l0
II
14 N
114
x 0
14 -o 0
0o 01)
co 04 0 64444~MM~ mM MMaaP eL
0.
0 4 0
ms 00 ) C .
I~~ 0I'U0 -- 'xM 0 3Q) 1-4
100
040
P.64 mm m m m IQm m m m 0 00 0
0)
m40 ii AF64U
.~~~1 of -k r (* * *
[0
IIs
Next it was necessary to discover which words in the environ- 0ment of the word under consideration could be considered deter-
miners for the various translations of that word.
A particular set of determiners i. ingenerally applicable within a particularsyntactic context. We list here the deter-miners which we found in text for OTYOleROOpalong with their translations and relevantsyntactic contexts. o
I1
HN
III
16I
I /I
(apparatus)
COCYAOS(vessels)I IOCoo~oO,, E~
(capacity)
1 (1) 3I OTHIO~e@HJ cCETOU as regards(synthesis)I
doraTCTna(wealth)
UPOKHKHIO56HKA(penetration) _j
in the proportion of
((1gram) )
"(magnitude) . (first)
I KOZMR'OCTa Kx pal ouy(quantity) (ration)
(2) OTHOXoemX- qMcKa I x Rxly ratio
(number) (number)
ZoARMeCTna x KO~uxeoCTByI (quantity) .•(quantity)
lnlII x (induction) relation•xrynex) r enp~ye"
(TOY3OK KyT6Mresponse(frogs)] I (temperature))respOnse
1 (3) no OTHOeiKIE X) with respect to
1 (4) B STOM OTHOUOK, ) in this respect
I (5) s Mop4oxaorxecxou OTKOE NN) morphologically
I
17
Based on the lists of determiners, we attempted to formulate
general rules for the correct translation of the words which we had
studied.
The group which has as its translation "ratio" displayed a
certain semantic homogeneity. The one-member group I r.
(I gram) suggested a group represented conceptually by "amount."
Other groups displayed no conceptual homogeneity and were sus-
pected as being open-ended or residue classes.
On the opposite page we show the Ginitial rules which were formulated for thetranslation of OYNOUOSSK on the basis ofour original corpus.
II
I
18 1
(1) 5 0K O 3OSOi (genitive nominal block) e.g.
II A. When genitive nominal is specified amount (e. g., 1 re)translate as "in the proportion of"
I B. Otherwise, translate as "as regards"
Ii (2) oTXOe1E genitive nominal block x dative nominal block
A. When genitive nominal is a unit of measurement;"ratio"
B. Otherwise, either "relation" or "ratio"; except whenrule 3A applies
1 (3 ) OTHOeHO-- genitive nominal block
A. When genitive nominal is animate, "response"
I B. Otherwise, "relation"
I(4) UO OTHOMeHOW X "with respect to" (idiom)
(5) 5 STOM OYHOMeHKK "in this respect" (idiom)
[[[III[ 1
I
It was now necessary to apply the rules which we have
developed to fresh text and amend, augment or discard our previous
rules.
In the case of OTROONOaK for example, an examination of
additional text indicated that our rules were in the main correct.
More words were added to our list of determiners for the transla-
tion "ratio. " They were AJ[ZTOebHOCTK (length), 3M00T (height),
and AZNHO (also length). Their semantic relationship to the old
set turned out to be as predicted. The expression a 3TOU OTHOSOMM
(in this respect), which we had originally considered an idiom,
proved to be but one of a set, which in the context B (modifier)
oTH~sOueX (-Msx) determines the translation as "respect". The
determining modifiers found were ApyrzX (other) BOOX (all). 3Here again, we note a certain semantic homogeneity, and we could
extend this list on a speculative basis and, with the help of a 3dictionary, actually find new examples (e.g., uxorzx [ many] ).
The research in this area was a success. It has revealed
some of the most interesting facets of the Russian semantic system
to date. We feel that we have brought MT to the edge of even more
important discoveries in this area.
!II
II!
20 1
CLAUSE BOUNDARY DETERMINATION!During the contract period a number of routines have been
I added to the basic syntax program to improve the quality of the
translation. The most significant of these is a syntax pass which
examines commas and conjunctions whose function is ambiguous
(that is, conjunctions which may connect words, phrases, or
clauses: for example, "and"), in order to determine whether they
function as clause separators. If they are, they are marked as
boundaries for subsequent syntactic and semantic searches. Two
types of clause boundaries are recognized: (1) those separating
independent clauses from each other (can be commas, or conjunc-
3 tions, or both), and (2) those which separate dependent clauses
from the clause on which they depend (commas only):
I Ex.: PoxyX.xeonpoTeuAW MoryT xxor•a RaXOARTbCA
TOJbXO B AeOTUax raydoxot 3o01, a 3OeMORH CpeAxeM It
KHapyZH00 30XV UOAIHOCTbM ammeHm MX. = ribonucleoproteins may
sometimes be found only in the cells of the deep zone, while elements
3 of the middle and outer zones are completely devoid of them..
I Ex.: Pjs6oytpceonpoTeHAhi, XOTOPHe OTCyTCTByDT OT
sixeMeHTO3 CpeAHOM x HapyxHol 3OH, MOryT HaxOAKTbCO B
I xc~eTiax ZrJydOKOtt 3OM. = ribonucleoproteins, which are not
present in elements of the middle and upper zones, may be found
in cells of the deep zone.
The reason for this distinction is that in the first case, syn-
tactic and semantic searches should be discontinued at the clause
boundary, whereas in the second, these searches should be inter-
I rupted at the left boundary of the dependent clause (which is not
searched for syntactic and semantic codes relating to the main
[
1[ 2
clause), and should be resumed at the right boundary and continued
to the end of the sentence. This makes it possible to identify syn- Btactic and semantic elements of a single clause even when they are
separated by long stretches of dependent or parenthetical material.
The clause boundary algorithm first searches the sentence
for a comma or a conjunction. If the comma or conjunction is
between two predicates, the immediate environment is searched
to determine whether a definite decision can be made that the item
is a clause boundary. (One example of this would be a comma flpreceded by an unambiguous conjunction, such as RTC). If no syn-
tactic elements which definitely establish the item as a clause
boundary are present, the segment between the two predicates is
searched for additional commas or ambiguous conjunctions. If Pnone are present, the current item is marked as the clause boundary.
In cases where an additional potential clause boundary is present,
the environments of both items are searched to determine whether |
a definite decision can be made that one or both items are not clause
boundaries (for example, commas or ambiguous conjunctions con- Inecting two modifiers). If no definite decision is possible at this
point, a further search for an additional comma or conjunction is
made, and the same inspection of the environment is carried out
on the new item. If no decisions can yet be reached, a fourth
potential clause boundary is searched for and checked out. Where
no definite conclusions can be made at this point, attempts to re-
solve the boundary of that particular clause are abandoned and
processing of the next clause is begun.
After the initial search of the clause boundary algorithm, if [1the current comma or conjunction is not between two predicates
but between a predicate and the beginning of the sentence, the seg- -!ment between the comma or conjunction and the predicate is inspected
to determine whether it contains a relative pronoun. If a relative
pronoun is found, the current item is marked as the preceding clause
boundary and a search for the following clause boundary is then
initiated.
22
l
SUBJECT RECOGNITION
As clause boundaries can now be defined accurately in all
but a few cases, it has been possible to make several improve-
ments in the quality of the translation by expanding the existing
subject recognition routine to handle objects and by including a
routine to identify a series of subjects. The subject-object recog-
nition routine operates within the limits of a particular clause as5 defined by the clause boundary pass, searching for both subject
and object in order of most frequent occurrence. The search
l for a subject is begun in the segment preceding the predicate and
if unsuccessful, continued in the segment extending from the pred-
l icate to the next clause boundary or end of sentence; in searching
for an object, the order of segments searched is reversed. When
a subject or object is encountered, it is marked as to syntactic
function and transferred to an extended nominal blocking pass which
sets the limits of the particular subject or object block.
If the predicate is plural, the subject search does not conclude
when one subject is found, but transfers control to a subroutine
I which searches for possible additional subjects. The immediate
context of the current subject is first investigated to determine
I whether commas or ambiguous conjunctions are present. If such
coordinators are found, a search for an additional subject is made;
if no commas or conjunctions are found, the search is abandoned.
In the case of a singular subject and a plural predicate a "trouble"
signal is stored if no additional subjects are encountered. When
additional subjects are found, they are marked as subjects and
processed by the subject blocking routine.
III
| 23
TRANSFORMATION PROGRAMS FOR WORDORDER ARRANGEMENT IN THE
ENGLISH TRANSLATION
Since the functional load of word order in inflected languages
(such as Russian) is much lower than in non-inflected languages
(such as English), it is clear that an adequate machine translation
program must provide for the resolution of word order discrepan-
cies between the source and target languages. In our 1961 report
(Machine Translation Studies of Semantic Techniques AF 30(602)-
2036), we discussed in detail existing programs for three types of
word order rearrangement in the English translation. These three
types were described as subject-object rearrangement, rearrange-
ment of governing modifier packages, and rearrangement of auxil-
iaries and modals within predicate blocks. The first type of
rearrangement is the most complex, since it involves the identifi-
cation of more syntactic elements and reshuffling of large blocks
of text. Furthermore, in order to operate successfully within
multiclause sentences, it requires a clean definition of boundaries
between clauses. At the time of that report, large-scale rearrange-
ments could be effected only on single-clause sentences, as there
was no systematic way of identifying separate clauses. It was
possible to rearrange the English translation of:
3 cpeAe BINZOTCA ny'RoX sapKzexxux ROCTYIZ
(In the medium moves the beam of charged of actives particle&)
to read:
The beam of charged particles moves in the medium;
or to rearrange the English translation of:
nyRox 3apKieHHxM RaCT3M o0 acxiiA npoPeccopY
(The beam of charged particles explained the professor)
to read:
The professor explained the beam of charged particles.
24
!
If an additional clause were added, the sentence would read:Inymoz 3apxzeHHhbx RaCTNI ObBCHKZ Upo0eccop, a CTyAOHT
He o~paqAz sHmmaHe
The beam of charged particles explained the professor, but
the student wasn't paying attention.
Then the first clause could not be handled by the word order trans-
formation program because there was no program for determiningwhether the ambiguous conjunction "a" connected words, phrases,
* or clauses.
Since the decisions of the clause boundary routine discussed
above are now one of the information inputs to the rearrangement
program, two separate searches for possible rearrangement require-
* ments are made on the sample sentence: one on the first clause
(the segment between the sentence beginning and the conjunction);
the next on the second clause (from the conjunction to the period).
If a requirement for rearrangement is revealed by search of the
first segment, the English translation of that segment is rearranged
as specified, independently of the second clause, which is not checked
for a rearrangement requirement until the rearrangement require-
I ment of the first clause has been satisfied. Thus, the resulting
English translation of the sample sentence is:
The professor explained the beam of charged particles,
but the student wasn't paying attention.
IIII
25I
SYNTHESIS IiSeveral steps have been taken in this area to achieve some-
thing significantly beyond a modified word by word translation by
a concentration on the translation algorithm (and translating trans-
formation). With a very few exceptions in the past some Russian
constructions (i. e. idioms) have been translated into corresponding
English constructions. Because English and Russian display certain
syntactic similarities, this policy, though often awkward, is usually
comprehensible. In conjunction with the semantic research previ-
ously described, exploratory work was undertaken in this area. iiAs a result rules for the handling of Russian constructions contain-
ing moz0o and cze@yeT were devised and applied to new text.
The rules proved very successful and serve to render what were
previously tricky and awkward constructions into idiomatic English. 11We felt this was ample justification for programming these rules.
This activity is so recent that it is not reflected in the sample out-
put in the appendix of this report.
26
!i
[
MACHINE TRANSLATION OF[ STYLISTIC DIFFERENCES
In connection with our preparations for the machine transla-
tion of fiction we have given considerable thought to the problem
of the automatic detection and correct rendition of stylistic differ-
[ ences. At the present stage of our research we are able to specify
two areas in which we shall ultimately be able to do this. One is
the stylistically correct use of vocabulary which is part of the over-
all problem of multiple meaning to which the research under the
present contract is addressive. This aspect of the stylistic problem
is not essentially different from other problems of multiple meaning
resolution. The second aspect of style is related to the stylistic
function of syntactic elements. We have given consideration to the
well-known fact that the position of subjects and objects with respect
to the predicate have a definite function in the Russian sentence, in
that there is a tendency for the initial position in the sentence,
whether it is filled by subject or object, to indicate old information,
while there is a tendency for the final position, whether it is filled
f by subject or object, to indicate new information. This stylistic
function of position within the sentence can be preserved in the
English translation and can contribute to producing machine transla-
tion which is not only intelligible but also stylistically more closely
comparable to the original. At this stage we are exploring two
IIpossibilities for the rendition of this stylistic value of position into
English.L One possibility is the use of passive translations for Russian
active sentences whenever in the Russian original the subject is in
I the sentence-final position, indicating that it constitutes new infor-
mation. In this case, the Russian subject would be rendered by the
English agent object which likewise would be in the final position,and hence equally indicative of new information. Our assumption
here would be that the passive English sentence is the stylistic
equivalent of the Russian active sentence with inverted order.
27I
nii
As an example, let us take the following Russian sentence:"i•Kxok orpoMh• nyTb c oem paSBNTXM npoKAa xaEa CTpana
sa COPOX Asa roAa nocze Osepaesun IaaBTO MUNTSaCT0N X !
UOMOeNKOD H yCTaHonzemn Coi@TOZO• pa60o0-XpecTbRKOKOI
BDaCTH" " It stems from the lead article "Becnpumepmldt naypMI 11noAUH"r (Unparalleled Scientific Advance) in Pravda of 27 October
1959, dealing with the Soviet moon shot and the pictures taken of
the far side of the moon. In the light of our interpretation of style.
we may assume that the object of this sentence, "Wa 0 ft orpomxabd
flyTb... ." (What enormous path... ) is old information, in the
light of the preceding discussion of the great achievements reported
on, and as new information is presented the subject "MaMa ctpaWa"(our country) and particularly the final predicative complement"sa COpOK Asa ro~a..." (after forty-two years...). Clearly,
then, the most idiomatic translation, taking the stylistic function
of the sentence portions in the Russian original into account and
rendering them equivalently in the English would involve a change
from the active to the passive voice, and a retention of the order
in which the semantic components of the sentence appear in Russian.
Thus, instead of the rearrangement which our present program aims IIat, namely, "Our country in forty-two years after the overthrow of
the rule of the capitalists and landowners and the establishment of
the workers' and peasants' rule passed what enormous path in its
development, " we shall aim at the following translation which pre- Iserves the original order by changing the voice of the English verb
from active to passive: "What enormous path of development was
passed by our country in forty-two years after the overthrow of the -
rule of capitalists and landowners and the establishment of the
workers' and peasants' rule!"
The second possibility is to consider the function of the
English indefinite article or absence of the article in the plural as
an indication of new information. In this case we would give con-
sideration to the possibility of using the indefinite article or no
28
I1
article when the corresponding Russian sentence element occupies
the final position. An example would be the sentence "Kaz•o.33 3TMx AOCTYZOIIKe--OeCUp3MOepNJM HayqHM no~nzr! " taken
from the same document. If we translate the final element of this
sentence, the predicative complement "decnpmieputmN mayqxmml
aorzOHr,1 using the indefinite article in English, we shall come
closest to an idiomatic translation: 'Each of these achievements
is an unparalleled scientific advance."1
ii This research is as yet in the exploratory stage.
IiIiIIIII!II1 29
SEMANTIC RESEARCH USINGTHE USHAKOV DICTIONARY
The utilization of examples of usage given in the Ushakov
Dictionary, which was proposed for the present research, has
been attempted. Unfortunately, in the tests which we conducted,
using about 100 entries, the majority of the examples given in the
Ushakov do not allow multiple meaning due to different subject
matter field. These differences are not amenable to the determiner-
determinee treatment which we are at present in a position to
implement.
I
ri
II30 I
r TEST SENTENCES FOR SYNTAXFLOW CHARTS
In order to facilitate manual and machine checkouts of the
syntax routines, a large number of test sentences (189) were
created to serve as tools in this process.
The aim of these checkouts is different from that of the
processing of live text for testing purposes. The processing of
i live text tests whether or not our syntax rules cover all significant
conditions of Russian sentence structure. The purpose of our test
sentences, on the other hand, is to test, riot the adequacy of our
rules, but whether or not the program actually carries out these
rules as intended.
A test sentence was created to checkout each significant
branch of every flow chart. This means that for every question
asked on a flow chart, at least two sentences were created, one
corresponding to the yes answer, one to the no answer. We could
thus assume that whenever a failure occurred with a particular
test sentence, there was a great likelihood that this failure would
be due to the branch on the flow chart which this sentence was in-
tended to test.
Since the test sentences were designed for purposes of the
syntax, they were as much as possible deliberately restricted to
the syntactic conditions which they are intended to test. In particu-
lar we wanted to avoid complicating the tests by introducing more
than the minimum necessary number of dictionary entries. Con-
sequently, to display the syntactic difference to maximum advantage,
the same dictionary words were used over and over again in the test
sentences, bringing about a somewhat unnatural impression. Since
some of the test sentences were introduced to test flow chart exits
marked "notice of error", these particular sentences were designed
to be grammatically incorrect. An example is sentence No. 2 for
testing the relative pass.
I31
'IIWe wish to illustrate the creation of the test sentences by
giving some examples of sentences pertaining to Homograph Resolu-
tion Pass HR 1. (Homographs of the type "YpyAo, coriaouo,"
called predicative-adverb-preposition homographs.) Test Sentence
No. I illustrates the yes answer to the question "Is the predicative- IIadverb-preposition homograph immediately preceded by a preposi-
tion and immediately followed by a modifier 7' Note that this 11sentence contains a homograph (npamo) in the required position
between a preposition (OT)and a modifier (pexypperTrlIX). 11
This test sentence is correctly translated if the homograph is
rendered by an English adverb.
Test Sentence No. 14 on the other hand illustrates a yes
answer to the question "Is predicative-adverb-preposition homo-
graph immediately followed by a governed block?" and a no answer
to the question "Is predicative-adverb-preposition homograph
immediately preceded by a comma ?' This test sentence is cor-
rectly translated if the homograph is rendered by an English
predicate. 11Test Sentence No. 15 differs from Test Sentence No. 14 by
having a yes answer to the question "Is predicative-adverb-
preposition homograph immediately preceded by a comma ?" which
is in turn followed by a yes answer to the question "Is governed
block immediately followed by a comma ?' This sentence is cor-
rectly translated if the homograph is rendered by a preposition.
Each time a change is made in the program, the test sen-
tences are run in order to ascertain the effects of that change.
The resulting vertical listing is then available for analysis. This
is, however, not enough. It does not take into account the possi-
bility that a change which has produced an improvement in one
place in the program may inadvertently result in a change for the
worse elsewhere. IConsequently, each time a change is made in the program,
the test sentences are run, and in addition to printing them out
332
directly, the information tape of the current run is automatically
I compared to the information tape of the latest previous run of the
same sentences. This comparison is conducted bit by bit. If any
I discrepancies are detected, a vertical listing of the full sentence
is selected for output and each word which displayed a discrepancy
is flagged for the benefit of the analyst.
The inventory of test sentences can be expanded when nec-
essary. The flow chart of the test sentence comparer on the
following page indicates the provisions that have been made for
them.
IIIIIIIIIIIII 33
0
[1iI
0k
S 4)3
to m
t i_1 0 40
m~r4•m
4'4 ,H :1 4ttt
34
Test Sentences for Symbol Pass
1 &-maCTRiIZb upoHHK8aT 'ope: RAPO.
2. BazMHbe~ 7 pONKURT Rep03 RAPO.
I ~~3. Bazbieu 'ISCTIDA Up01K8.T RiOPeS3
4. Heabox sHaTB, UOTOMY 'ITO 5UpOHHKMOT 'isp.: u.po.
5. 'Ia~m~a 2: po~xxaO epe'sp. n.po.
6. EIOCToAHHax 3 [IponazaT mlepe AAPO.
7. 1noHxxaeT 'iepe3 A~po.
8. rnaeHme iaxuaz a~ 2: noHHXxDT 'spe3 AApo.
I 9. C03az~iaxa. %MmTHa XzpoucaeT 'iepes itApo.
10. CoSAaHxoeX npo~axaeT gepeg R~po.
11. 'qC~ nPOHSDOAHT T
I12. flpoR3BOARXY, 'laCTHua npOHXX~aOT 'lepe3 AAPO.
13. 'RacTma mozeT UpOH3BOANTbXT.
I14. IaCTmRUb H- UE~nOKHK8T 'lepes tl o
I Test Sentences for Homograph Resolution Pass: type "TpyAHO," "corxacxow
1. Hamm PS3yAbTaTH ZIPONCXOAJIT OT JIPAMO pexyppOKTHbM MOTOAOD.
2. HBJxeHme flpox3ozzo corsacHO HaUmx pesyatbTaTax.
3. fluzeiiie UpOR3OEZO meZAy npo'mum corsacuo HaREM pe3yJhT8.TaM.
I 4. IHpOX3BOAR pe~yJ~bTaTM corzacHo HaeOmy UpeAUoJLoZOHvm, onbiTb
flpON30MRJ 6e3 TPYAHOCTel.
5. Hq.ue~me npoxa~ouAo RCHo.
6. HIDAO~uI npOU3OR.KO meZAy uponulM AcHo.
7. IRP023BOAR pesylbTaTbi coriaCox, OblTbi ZUpOESOEAN 663 TpyAHOCTSet.
8. HI3aeHxe OMAO coriacuo c H&EHM flpeDUAOSnoao zm.
9. SIDJZHHS corzacHo 6dzzo c HaREM upSAnxIoAZOHoM.
110. SAnze~xe OY.~6T corzacHo c HEmm uentoem.
ii. TPYARo lipeAflozaraTb TaxeS pesaybTaTH.
I12. TpyjAxo Ham apOAUosaraTb Tane6 pegyzbTaTbi.13. IIpeAfozaraTb HMW TPYAHO Taxe. peuyubTaTMl.
14. Taxoe AzxSexx coriacxo HaREM p03yabTaT&M.
35
Test Sentences for Homograph Resolution Pasns: type *rpyAo, *oorA&OXo*
15. Tamse onur., cor~Aono .aeuy nPeAMOMORSUM9 uPOSOWm 0em
16. PaOOTa, nPOM0x0AaXaa AO2O~bUO TYYAU0, 0ODOPM0OTOR oSVOAER.17. B@XKMe XONO R01035900?3M p68yabTI$M DpUCKONXA? 005 TpYA3OOYeI.1s. HCos, 'iTO ueAunf.
19- RaNtio AXR MiXaM 12013 NIAbUE
20. Tame. oDwmT 00, noo POXOZOAAY 003 TpyANOTOA*M
21. Taicoe s 85511 8030, a =a3 0DM? npoUUOMA@Y 063 7pyAROIOt'@.22. Taxos ass.... Gespasannuo ?PYARo.23. RSOHo 3UBOCTHMN pe3yibTa? UPOROZOAN? 0em TpyAROOTGR. [24. CoBePmeNOTBsasaUG Taxxx pesyaT&ToD 0'iO~b TPyAUO. 1
Test Sentences for Homograph Resolution Pass: era, a*, zx [1.* P&OOT& npommouaa noose ero o'ie.b samuz yousul.2. flPOAUosozeiiA uX 0'WNb Baww. H3. CoTpyANuiX 00Xp&HRA9 00.
Test Sentences for Homograph Resolution Pass: type *UoToiaauz"
1. flOTo,'oaXwa TPYARS8 paOoa 3553MCTPNPyOT ZEURb.
2. 1K35b E55DOTpapy6T DOOTOXREMZ DR?b paGoT. I3. CReuyuuee 05050 DUNKMWSZK? DpaO00o0py.
4. COaTPYAuX 035835 oAVA"19ee.5. ROAePuK06e TYazy. WaO~y DOEXR? 36Upuiiuei 08
6. flaorOANuua UpKKEM£ST P53350 ON&OuA'it.
7. liUE AOK5SAM 8D5JIDTOR ytieRU~M.8. UaNK AOxaaAm WDoSmma y'1enw3.9.. HMOx pe"aTI UPOBOASES y4*XMI.3
10. Hama pa"aTa UpXDOAM? X y'ieNMh.
36
Test Sentences for Homograph Resolution Pass: type *UOyonKMuj
12. Hawn pa~OTa UKDXOAUT OT ZOCYOENKOI.
I ~~13. Mama paO~a XuzneCA xzOowOAzaOI.
Test Sentence. for Nominal- and Prepo sitional- Blocking Pass
1. Bonpoo 0Txpuoro oxua pemaeroa osrojtxa.
2. Boupoc OTipHliIO pSeOTxz pOeaOoJIx corOANE.
3. Aua OTKpWTMX OK~a KftXOAKTOR SASOh.I4. ADS 0TKpMIThI POXOTXN NaXO~ATOa *AO0b,
5. OTXpMITOS X 3aXPUT0S OZKU UXOAX?0R 3AG~b.I6. Hezb3X padOTaTb 603 OTxpbiT0or xa7. Hezabax padOOaTb 008 oTxpbMMX oxxa. a peaevxu.
Test Sentences for Inse rted- Structure Pass
1. KOiie'wO, 'laONXMI apOHRU3T Re~po* BApO.
I2. CerOAHA, 'RaCTxza upo..ia Roepes AO3. qIa0TiMw, KONO'(HO, apoHEKED? mopes LAPO.
4. 'qaOTM1ga, CrOFAH, npouaa mopes xApo.I5. Bodg r'QQpx, tMacTIMN UPOHUK8D? 'lePeSJKP
6. 'IaCThNubz, noodS rosops, flPOHKN~5,T Rlopes XAPOOI 7. '4aoTugbi, xax ace Uva3?, uIpouaXar 'SPGOS XAPO.
8: BaxaM:,g Yaxxw onooOodox, 'Ia0TNIMb UpOHNK&T ReOpo&el
9. BUH~t n au&SOY UPSAUOXOXOXND, 'lao?3U! DPOUKKKS? Mopes XAPO.
I13. iCOH.'wo, 'aOTIaMN 1Dp0EaU oerOARA, '(Opol AO14. KoH9RHo, 'laCTRU~ £IpO~KXa5UT, 5000140 roaopa, Repea RAPOO
I15 * ~IUOTK1~M, KOMOMNO, fPOUNMKS? 'SPOSp 120 3UESOM7 SW&OXAPO.116. Reopeal UO 3&E6Y flPSAUoOAZOXMD, SAWO EYXJIOX5M, XON
3 7
[1
Test Sentences for Governing-Modifier Pass
1. Bpawmawxeq zcnAu zAyT suOpsA. fl2. KacICHAM, sp&Mwxocx soEpyr xAps, HAYT UopSoA.
3. KaczaAb HymeozoD, ,paswwqeOC soxpyr JAP&, WAyT BUepSA. H4. Bpauawluecx noxpyr xjpa xacza•u z? 3ne0PeA.
5. BpazAvaxecx soxpyr Rpa xacaAw xyzyeoxo0 Kay? nepOA. H6. Bpauammecas noxpyr Rpa WcTpue zaczOmM MAYT UnepeA.
7. BpawanxecA Boxpyr Rpa OMCTp.e EaczaAM HyKaeozoH Ksy? 3T BUSOA.
"XOTOIDm palss-test sentences
1. 1acxmau, XOTOPeO KAYT snepeA, OReHb Be@X.XN.
2. EaSCXbI XOTopbte HAY xnlpeA op4eHb sexxK.
3. XaCXAjl, XoTOpme nnepeA, omemb sOaeRz.
Test Sentences for Main Syntax Pass-Key To Number Code HX. 0. 0. 0. Predicate
0. X. O.O. Subject [O. O. X.O. Object
0. 0. 0. X. Eliminated candidates for subject or object
1.0.0.0. Predicative plural non-past H2. 0.0.0. Predicative followed by Obi
3.0.0.0. Predicative plural past
4.0.0.0. Predicative singular non-past
5.0.0.0. Predicative singular past feminine
6.0.0.0. Predicative singular past masculine
7.0.0.0. Predicative singular past neuter
8.0.0.0. Predicative plural followed by dM•b
9.0.0.0. Predicative plural followed by OuMb followed by
second predicative.
I38 I
Test Sentences for Main Syntax Pass-Key to Number Code
[ 10.0.0.0. Predicative is Oui and is followed by second predicative
1i. 0.0.0. Predicative is 6yAyr and is followed by second predicative
1 12.0.0.0. Predicative is a comparative.
13.0.0.0. There is no predicative, but coax followed by infinitive.
14. 0.0.0. There is no predicative, but Gais du followed by infinitive.
15. 0.0.0. Predicative is followed by infinitive
16.0.0.0. No predicative, but gerund
17.0.0.0. No predicative, but gerund followed by infinitive
1 0. 1.0.0. Unambiguous nominative plural
0. 2. 0.0. Genetive singular/nominative-accusative plural ambiguity
I reduced by unambiguous modifier
0.3.0.0. Genetive singular/nominative-accusative plural ambiguity
I reduced by nominative numeral
0.4.0.0. Potential nominative singular, any gender
I0.5.0.0. Potential nominative singular, feminine
0.6.0.0. Potential nominative Mingular, masculine
0.7.0.0. Potential nominative singular, neuter
0.8.0.0. No subject, but predicative can be impersonal0.9.0.0. No subject, predicative can not be impersonal rearrange-
[ sent required
S0.0.1.0. Predicate governs accusative, nominative/accusative
potential object
[ 0.0. 2. 0. Predicate governs genetive, genetive potential object
0.0.3.0. Predicate governs accusative plural potential object
S0.0.4.0. Predicate governs accusative, genetive singular/nominative-
accusative plural potential object, ambiguity resolved by
numeral
0.0.5.0 Predicate governing accusative contains negative particle,
nominative/accusative potential object
39
Teat Sentences for Main Syntax Pass-Key to Number Code
0.0.6.0. Predicate governing accusative contains negative particle, [1genetive potential object
0.0.7.0. Predicate 2overns instrumental, potential instrumental LIobject
0.0.8.0. Predicate governs dative, potential dative object h-0.0.9.0. Predicate governs preposition, prepositional block present
as potential object [j
0.0.0. 1. Nominative/accusative plural candidate for subject Ieliminated because of preceding preposition
0.0.0.2. Genetive singular/nominative-accusative plural candidate ffor subject eliminated because of preceding preposition
0.0.0.3. Genetive singular/nominative-accusative plural candidate Ifor subject eliminated because of preceding modifier
0.0.0.4. Singular candidate for subject of unspecified gender
eliminated because of preceding prepositions
0.0.0.5. Feminine singular candidate for subject eliminated becauseof preceding preposition I
0.0.0.6. Masculine singular candidate for subject eliminated because
of preceding preposition I0.0.0.7. Neuter singular candidate for subject eliminated because
of preceding preposition I0.0.0.8. Nominative/accusative candidate for object eliminated
because of preceding preposition
0.0.0.9. Genetive singular/ nominative- accusative plural candidate
for object eliminated because of preceding modifier I0.0.0. 10. Genetive candidate for object eliminated because of
preceding preposition
0.0.0. 1i. Genetive candidate for object eliminated because of
preceding other nominal
0.0.0.12. Instrumental candidate for object eliminated because ofpreceding preposition
440 1
Test Sentences for Main Syntax Paso-Key to Number Code
0. 0. 0.13. Instrumental candidate for object eliminated because of
preceding other nominal
0. 0. 0.14. Dative candidate for object eliminated because of preceding
preposition
[ 0. 0. 0.15. Dative candidate for object eliminated because of preceding
other nominal
I Number followed by I'm 1: nominal block includes modifiers
Test Sentences for Main Syntax Pass
1 ~ToaapxxN yneauaNsaw cns
1.1M.1M.Hanx Toxapxxx yn5exUan 3Yan 0ozzu3aA~nCYK zCK 00ws.
1.1..1 lqepos paAOuuH YOB&pKAR YB.ZU~m3aW? 0033.
1.1.1.lu.qepeS 60.UbEX paloHm Tosapxuu yzeamna3saT 00m3.1 1.1.1.2. 'lepes, coupoYKDzoNaxi ioapNmm ys053113383 0033.
1.1.1.2m. 'lPope dosazuf 00Up0TUNI0338 TOB&PNEN yBOAU'VWMa3[ 00133.1 .1.1 3 *B cuMCOe Taxoro COUPOTHRAORflA ToBapuWu Y6ZEA~n3U3T
1.1*1.8,To~saax~ YBAezuIuaT ROpOS paRou C033.
1 .1 *To~apuxx y3.A~nZa39T 3 cmuCAO Taxoro conpo'Ine5ASIL
1.1.2. Tonaapulu AOCTMVS3? ycNxAU.
11.1.2.10. Toaapuasi AOo~uram 603 TYpAKOCTOfl YOCE~.u1.1.2.11. Toxapawu AooCYNra UOCTaNOBSOR TPYAuoc~ell ycuaufl.
11-3 Tonapanu ysexXRR'iK59.3Y PZ
101 .3.8. TonapEaw ys.3eauusmI nopes paXOX UOTepx.
1 .1 .3.9. Tosapmw. y3CZ3u33s3Y a cmuOO Yrazorc o upomKaxeuE
DOTepa.
1.1.4.Tozapawu YBOCKZKN&V AS* UOTOPX.
41
Test Sentence. for Main Syntax Pass
1.*1.5. Tonapumu as 7 3053q3585? c0ma.
1 *1*.6. Tonapmax 30 yUea3uEza.3 coma&.
1 .1-.7. Tosapxuw CXZyZM TeXHUH&iU.1 .1.7.12. Tosapuuiu CxyU? c noupaixol T6XuX~aMN.
1.1.7.13. Tonapau uzyzaT Oei yupasismax 00OaTORUSK TOZeXRKaME i1.1.8. Tonapaum oT3sena COTpyAH3KaM.
1.1.8.14. ToDapauw OT3SOUDT UO COCTaDy CO~pyARNzaM. I1. 8.5 Tosapmau oTs0~awY 603 UOATBOPZAOHNE COPYAHEKXA
COCTaly.1.1.9.Tonapuuw Qo3*eft3T Ha Taxot Doupoc.
1 *2.1 * Taxne COUPOTHBXOHNA ysexm~mzaMT C03B.
1 .2.1 .1. Mqepe3 pafloau TaKN. c0upoYND.IexNE yuexxuzuaaT 0038.1.2.1.2. Moepe3 COnPOTUDXSHKJI Taxie6 flapTR y36eHxqH363T Cos.,
1.2.1.3. B CMMOZO Taxoro COUpOTK3ZOHKXA Tamme napTUE
VBSJLKRNBSJT 0033.
1.3.1. Ana COUpOTXDXSHKA yBsexxINBS3T 0038. I1 .3.1.1. Mpes palMoRb Ana COUPOTEIZOHNA yneimnusawT 0033,
1.3.1.2. MUOpS COUPOTH32SHNZ ADS naPTEN yDexxR35S3T COMB.I
1 .3.1-*3. B CUNCIO Taxoro C0UP0TRDKOZH3A Abe naPTRU ynD0zK~asaT
1.*9. 00MB YBS.ZRRK33T Tonapusim.
2.1.1.Tosapanx cpasuxuazz ON M.ToA-
2.lx~lu.Hann T03apanM cpasumxaax ON may~sul USTOA.I
2.1.1.1. epes nPNM6PH Toaapzuu cpaiumn~aa ON MOTOA.
2.1.1 .2. Mope.s pauaozexzA T05apxx3 cpassusaim ON MOTOA.I2.1.*1.3. B cmaose Taxoro pasaozewux Tonapump cpasmusefzu Gu
KeTOA0
2.2.1. Taxne 00flPOT335O3N UP030EN GM UOTOA.2.2.1* 1 * M~epes usa UPEMOP Y&K O 0p0Y1350335 UPONSROIN GUSTOA.
2.2.1.2. MSopS, COflPOTMBZNHUR Talneus xe npoMOPM UO OEN Gu M@oA.2.2.1 -3. B CEMOZO Taxoro coflpoTDasexxx Teals upsMepu upoOeK I
OM MOTOA.
42
Test Sentences for Main Syntax Pass
[2.3. 1. AN& OOSIRO?35ZO33 UPOBOZN OH MOYOA.
2.3.1 .1. MOPeS UPNMOPH Asa OORpOYKMaOMz upossix amM OTOA.[ 2.3.1 .2. Rlp.OPOS pOYUPaOTUBS ADO flOPY3N UPOSDOZN QM OYOA,.2.*3.1 .3.* B CumozO TaoNoF conpo~uzzxeww Ass USpYRN lipoDOex[ Ohl MOTOA.2.9. MeTOA UpODOZZ ON TODOPKUK.
13:01.1 ORR OPaSRbxSAa M:T:A.
4.4.1.4 pe nIpxKeP KOPaHSOT MOKIT 3T OY
14.4.1.4m. RopeS TaZXo IpNMoSp-KOpa RoxaUS&nae KOTOAO
4.8. lIMy Y~aoTCX.
5.5.1.PO Uoe oxaaa.1a MOTOA.3 5.5.1.5.OS Mee OAb MOPe. UOXe&S&Xz MOTOA.5.5.1.5K. Repoin UPOCTYM MOAb MOP&. UoMxau MOTOA.
5.9. floxbuy noxabaz. map&..
6.6.1. COWS noxasax MOTOA.
6.6.1.6. %lope: Upxxep COWS UOKOS&A MOTOR.6.6.1.6m. 'lop.. TAXON UpKKzCW S op o a ux~az MOTOA.6.9. flozbay UOxe~aSz MOTOA.
17.7.1. CpazezDHN noUasaoO MOTOA.7.7.1.7. Mqope. UORBTKO CPSDHOHNO nOUs~oO MSTOA*[ 7.7.1.7m. Mopes T&OXO UOU.ETKO CPU.DNONNO ROUUSaXO MOTOR.7.8. Emy yASAOCh.
[7.9. iloabsy noxagazo opoanzox..
8.1. Toxapaux moran Ouwh.8.9. Soria Ovrb TOue&puIX.
9.1. TosapEm= morim GMb NOMS&e.axu9.9. Soria Oli~h 11UOSILZ TOD&PEU3.[ 10.1. Toxeapuzz Gui. noueaa.5.u10.9 Buzz Roxafaiw -rouepxun.11.1 Tonapumw GyAY noKOsOM.
43
Test Sentences for Main Syntax Pass
11.9. BYAYr UozaSMau ToaapR. 112.1.2. Tonapsus 6Om-p@@ comaa.
12.4.2. SOP& ONOTPe. MOSYOA&..1
12.9. COVS& OUoTpee ?OBap3ZN.
13.0.1. BOAR YBOARRMuaab 00ca.
14.0.1OA RON GuYDOAeau~~ab C033.
15.1.1. Touapxza moryT y3*Zx~n&3OTh 003. [15.8.1. y YA&CTCA YBOZNRU3&?h 00I38.
15.9. Coca moryT yueau~zsa~b Y0D8.pMuN
16.0.1. CoSAaBO.5 COM3. I16.o.l. SaAaoaacb oosAa.DOTh COIDS.
Test Sentences for Cleanup Pass CP-1I
1. BeazxMe T0o.apx~ KMyT.
2. 0'Wxh 5sZNmm T0osapuEH HlT.I
3. He TOUapHN NUYT.4. Bazxue Y0O~apNXK MARXH COTPYARNNON MAYT.
5. ONeub DaSmme Y03apmax uO.3H COTpyANHXKO M"y?.
6. He TODO.PX=M MXNO5H OOTpyAHKIOB HAY?.
7. He 9eZNM T0OMPMAN HAyT.
8. He 38.ZKM TBoaapHx NO.5H OOTPyAKNKOD HXy?.
9. Tooapuix 3O.zNb flpo4meCOpos NO.3HZ COYPYAHEXOD
HAyT.
Test Sentences for Cleanup Pass CP-Z
1. Hao axASTb HOAh3K.
2. flpoc'yW MGAb BhAG~b R0Ab3X.I
3. IIpoo'iyM xe~b Tuzoro po1A& SXe~ Resb~a.
4, Taxoro POAIL UpoO~yM MOAb ZHAS~b 36A035.
5. 0ROua UPOOT7D NOAh BRA*-fb N*Ab*X.6. 11lobay BXA0Th Ne~OXU.I
44
[[
HARDWARE IMPLICATIONS OF THE RESEARCH[Machine translation research at the RW Division has been im-
[ plemented on general purpose computers. We recognize such hard-
ware requirements as print readers. multi-font printers, and com-
posing machines.
We have recognized the need for larger memories. This
recommendation, however, has been the general observance-the
larger the memory the easier the solution.
In the areas of dictionary search, we have been able to program
I our way around the fact that the general purpose computers that wehave been using for high-speed memories are too small to contain our
I lexicon.
Both programming foresight and cleverness were required for
the general solution of our ordinary dictionary lookup. However, if
one wishes that a far larger memory had been available, it becomes
possible to ask what new attributes other than larger size are de-
sired of this memory.
What strikes one as the flow charts are examined, is that the
need exists for a totally different type of computer. Such a computer
would have an associative memory.
The reasons which first appear for an associative memory are
those based upon the facts that the data is non-numeric. Such in-
[ structions as:
find the word in the dictionary
[ orfind the prefix of the word
[ orcompare the end of the word with a possible list of suffixes
I immediately present themselves. Their uses are real; our experience
in machine translation, however, has led up to the point where those
[ portions of the program which require the ability to recognize alpha-
numerical configurations require a small portion of the total running
[ time.
45
Attention is directed to the figures on the following pages.
These charts are taken from the 1960 RW machine translation report. J]They illustrate a small portion of the syntactic program. Further-
more, these charts represent the thinking of a linguist and had not
as yet been seriously modified by a programmer's restatement based
on efficient usage of a traditional general purpose computer.
Though there is an occasional reference to actual word forms
in these figures, most references are to grammatic attributes which
would have been found only after a successful dictionary lookup.
In these charts, a large number of the boxes begin with such
words as "search, " "hunt, " or by phrases such as "is nominal im- ifmediately preceded by preposition skipping over modifiers and ad-
verbs." Here we have evidence of the need for a search type memory
past the stage where the problem has consumed alphanumeric data.
Among the important aspects revealed by the flow charts is the
need for a search to be made within a region whose boundaries are
defined syntactically. This searching within a non-graphically de-
fined area is one of the dominant features of language data processing;
this is made evident as we examine different portions of our bilingual
programs.
Syntactic Analysis
It is generally recognized by all research centers that automatic
translation cannot be done without performing detailed syntactic anal-
yses of the text. In syntactic analyses searches are made on other
than lexical units. These searches, made upon grammar codes and
resultant syntactic codes, have as part of their nature a con-
junction of conditions within variably defined syntactic boundaries. _
Up to now, the prominent methods of analysis have necessarily re-
verted to standard computer techniques. Yet, as indicated in the
following figures, this portion of the analysis, at least in the Fulcrum ITechnique, is heavily loaded with "search" and "hunt" procedures. In
this area of the translation procedure, one would expect to find heavily Iused associative memory type algorithms and a different word size and
I46 II
'4II[IIIII II i�iiiI it '�
I
I it il� Ii![IIII I. II'' . a
[IiIi
PAIPIC% p me ps w ft" OR n
PA.0.0, oP MOM G?*0 oaKPNo101
PA.mo.isa . ONelalPAL" f a
Nam$A ftltN. COONan
IW TImeR A
mosirsem, ~ PANami eI
GI P ASASSNOa 00 S I
SINGULAR PA 3sPetmItI
PAPOA IIe p. at4 Cmx, t$ ap. LOCea 10at I
NmN4L0 Seall"
*IconIs O
memory module size indicated from that of the earlier portion@ of
[ the program.
[ Semantics
The portion of all machine translation programs which have
I been most impervious to solution is that represented by correctly
erasing all but one of the possible English translations for a given
foreign word. Some groups have even been dismayed enough to
indicate that human intervention will always have to be required at
this point. Our own research has led us to the use of techniques
which with hindsight would appear to have called for a cascaded
search. Among the problem types are homograph resolution,
idiom identification, word combination recognition, and determiner-
dete rminee paring.
Idiom recognition requires the knowledge that two or more
words, when closely connected syntactically, possibly even con-
I tiguous, have a quite different meaning and possibly a different
grammatic aspect than would be expected from the sum of the
parts of the idiom. Here techniques that have been used include
the dictionary labelling of each member of the idiom as being a
potential idiom and then, when a conjunction of possible members
I occurs, searching in a special idiom table. Even the necessary
identification of the idiom potentiality requires search procedures.
The interaction between idiom identification and associative mem-
ory computer configurations should be studied.
Homograph Resolution
I Many words share the same spelling. A dramatic example in
English are the three different meanings of the spelling "fast. "1 As
[ a verb, it means to not eat; as an adjective, it contains the two
opposite meanings of tied tight and rapid. Resolution of this type of
ambiguity is often made by searching in the neighborhood of the word
for indications of the word class of the homograph. Such resolution
would be more easily effectuated with a search memory.
[[ "9
Word Combinations
Word combinations represent an idiom type whose configuration
is both lexical and syntactic; that is, the correct translation of a
primary word or of associated prepositions is determined by the 1existence of the primary word and a particular type of syntactic en-
vironment. This environment can be represented, for example, by Ha word in the same clause with governed genitive and dative depend-
encies, but with an explicitly missing accusative object. Since indi-
cations of the limits of this environment are only vaguely fixed,
search is again called for.
Determiner -Determinee IMore than any other type of semantic resolution described
thus far, determiner- determinee resolution requires associative
memory techniques as can be seen in the flow charts for the method.
Here the resolution problem is represented by the existence withinwide syntactic boundaries of two or more semantically connected Iwords. The "hunt" initially consists of ascertaining whether a
determinee exists. Since determinees are capable of having dif- Iferent determiners, the next aspect of the search directs one to
"hunting" first the identification of the possible determiners, next !
the establishment of their coexistence and then, finally, verification
of whether they exist in proper syntactic relationship to the deter-
minee. Determiner-determinee resolution has been awkward to
program on a standard computer. Its resolution should be far more
powerful when associative memories are used.
OutputI
A consumer of automatic translation thinks of a printed page
as being the output. This is only partially true. In truth, the printed 3translation is the fruit of the process. The necessary evolutionary
improvements receive their sustenance from the research output. I
Since nature of handling the research output almost invariably calls
I50 1
Ifor the identification of particular translation configurations in a
large body of text, this most necessary area of automatic translation
would seem to benefit greatly from any hardware development which
eases the burdens of search. Here one would study the proper way
of emitting the research output so that it would be best married to
the coming hardware realities.
Dictionary Lookup
The problem of dictionary lookup in RW machine translation
has two major aspects. Most Russian words found in the text that we
process actually exist in our dictionary. A sizeable percentage of
the words, however, are represented in the dictionary with a different
I ending. By stripping off the possible suffixes of text words which are
not found directly by lookup and by then searching on stems, the
I semantic and morphological aspects of these words can be derived.
The use of a search type memory in this portion of the problem is
I obvious.
IIIIIIII
DICTIONARY UPDATING
Our initial machine glossary has been updated in eight separate
runs. This added 6,909 new forms to the dictionary, which now con-.
tains 23,713 forms. Since our stem-affixing procedures allow us to
grammar-code new forms from different forms of the same stem
found in the dictionary, this will give us a coverage of about 50,000 11Russian forms.
52
I
III
I SAMPLE TRANSLATION
I!I!IIIII
W "dW
30.-
at Ndo I- I -
W 4;r ow IUA
Ox 9 W 19ILU. #A 1- 09. z r0 0 o tw W-U
W U a t 0w CU z
iL of WU
U..a. 0 -
W 0U 0 91V4 9 14 94 w #I.i
'a 0 U.
U. W X-
UW i a I
4 0 w 4 94. IL. 30 - VI 0L
I. W W
-1 w
190 W9 9 W
us I.-
54 P
Nw0 0u z I
U'
w I- ,-4
IL ~ ~ V -1 w 1--4 U .
d4 * I" Li4III U W6 i 1.- 49 ua4 0 xJ ow *I jU
x~a at I-IU
24 0 0- -9 Ia iLU gg x U ki 0 Z digj
S., . LU J- - LUL
U LU w I- 4A gow jaI
,. 2 2a OO IL 4. 0
I.LLLU LUw 0 K e"0 z-
I. 49 LU 3D I-$- 44 x LU -1 4b o
ft * z I.0- 2 &L La%A wI L 0 1.- V j 0 -
o 0i LUw 0 LU VIAU W fLU1 ow a 4 1.- 1.- x 0
S. Z U 0. .J V 4jm ac UU z acf %A C; &A JU
I- 0 4 9 z w0 ae ac 0LU0 z. LU Z 0 LU 96LU 0
0 U dc 4 0 EW - 4
&4E Z I A 4 w.4 LU ( LU IA 0-
a .9 2 0 U w t- - 0-LU K0 . 4 o U6 Z 2.- IA 0
xLU U 0 a 0 LU LU w
I- P-- C- 2 d0gi4 VLU.J -4 4 V - 3 I
woo LUwL LU -9 4U0 Li I- 1.- 0 LU 4 0 4
VI 0 C-1 $, V -
0 nI at 0U,
I-- -x-w 33 0 %A J
'.0 0 9 x 0 V
22 LU I. LU LU
oi at 0 U 0 VI0'L LUU
in CA I- 0 4 z9 LU 0- 1. c 0 tie L
IL Ia U. 4 Ui o-Z . ofLU 0 A 0 0 4 2
0- 1.- 0 'J 0 W04 VI V . VA of 0
g. a. I 0 LU I-K * iLU 49 4 2 x 1. 0 LU
ug 0 - ado U. 6- z C1)0
a 4 0 E Z &L W 0
0 " 1- 20- -00.ot I 0- I.LU
MU 4 I T K 6m~SU. _
r. 4A e
S KO 0 W wn#A vi m W 1- .~
0U IA 2 C 4A j2 03 [1 a0 oZ.4 20 z
at wi m
a. w
-0 t-
4 I0 $
at4 0 49 z E2I-4 w 4A - 4
-I K . . J 1
2 ~~~ Li4CLU0 4 L
0U ..j 0 . . -4Ap- cc u I
In 0 I A. - -LUJ 4j 2. - tInu
2 -1 A InI.LaI. U LU 4 U,
cc8 In4 .4 - i44 0 ui 0U Vz CL~ I-- 0 4 -
2 t 0 of. Z z.4 L
x 0U 1- On 4 L 4%A . IA LU w LU LI 4
-C 2j MnU C K
05 L2 0 LuJ z 4t% I= i0i Cc -LI U In
4 0W cc I- Cie In0. 00 -9 In jn - . In
0 IL 3 0-o u. u. i
.j 03 Ll 0 4A J ofJ 0In o 4 -9 0 nat. u 0 0i LI94 LI 0U 0 0 cc cc
CL CL. %A of % 3 LU
c i.w 2 -: cc b-
xn 4c- 2. I-.U. Vw .OA I- 0 C 1.-I
0 0 0 of L 4 I&A Uj 0 z
go -C 49 10 u 0 @. I 1 m 4
19K LJ W "z )0
56 10 A- oo I A L
IL # -
4A U.Z 0 w
w x t a 11 0LUU L tW AI
0- 7 LU
I Km ~ . If 0 .~.0 -"LULU zjv w
LU If~3 .LU 0.. x LIj b.- OW Ij %A
z of .IL LL 0 0- LU
.j 1. -0.z0.I ZL L
Wi 4 of .. > uC.) 0 aul
oa.-e- ~ L m . am 0L00i-. -LU x x0 uu
0 - z x~-0
I.- 2ce..
-j xU 0 0U "A
I- x z C z 0 Ci
0 1 0- A 0N- &0 0uI. of 0 1.0- CL
ID> 0 UJ
Au z 0 1- U.zb- ecJ cc . U. -9
0 CL 0 cca~ vi,.U Lu 0 0 0e w
0- 90 x 40 0- -C IL0 0U Lu C. I 0
&.m- Wa o z #A~~~~ 0~ ? -L
Wy 0 0. u Udo oo w LUL
LU 0 ItJ U..
at ag 0 - 0zu 0- U. 0 m
-9 LU 0 0 1 Loo x w0-
0-3.C ) 414 LU
-4 U. w 5e
IA W Af
* a a-u
tu 0- U. W L-)
1. a~ V 04.0 0 z
Uo WE b- P.
'A 'A 0 u w - 0a. I. 02 I %
2 WU. IL 1-
'A 1- 1.- 44 B- 0
*ZZ 'A 0.'AU U. w 'A 0. Uj
j 3p co- e U -a
.j di )I.-
W6* -a A. P .- a i 0. 1.-
01 9 0. CS *2
o C; 22 U IZ 0
3 * 49..a 0. I-4 A9W('A 2 0 A- . -I
.j 5- 44 x W-
U~ )p w... 2 . .j )U
2l 'U IA Ci j C
o s 'Ai -mo b LU Of I
.J 5- A 0 404 2
4 .1 'U 'U .. 5B'U Is- 0a A. z 2
I~~ 08 2 A. 'J ILU 0.- w z
0- Cie- 2 4 'A.~II~~ 4L a U C 'UC
Q 0 x x 0hUi 'A C LA I- U%) -6- 'U 0. 1- B-
oe 31C- 00 zUW to I.- W C u.5
U UU 0 U.U. ~ ~ ~ ~ U I - . A-I
'A 0 0 wU -z %U z Cz 2j w #U A 0
u , 09 9 U A .. aU
5- IjI-4of W 0. 0 1U..'A
1 U 1- 1.. %Ain 'Ax 'AU %A 0 at W
x ' 0 -8 a. -j 6- -
Ul I. uj ; 0 01C j U UU 5. 2.
4 . 'A U k'
U 0 )- I.-
4 5- U 0
Cw -5- ~ 2
ww
t' 0 mi J
IL a fT l 'K 0.
49 3 A . inCI- IL im..
w. n *w -W C9 0 S.S. . kA CL.. .5
in~ X wz!uz
LU 0. q w U
>n .0 cc j. j 0
W. W) >. I- 4 4A of
2U 0 w A 2 Uaf u or. U. of K- 0%L 4 .2 oxcc
LL Ui .1 ). . 04LU ix ax(L u -9 0 L
.j n ,jA 0b r 4 L
b. I.. w un .ui I-
49 LU L 0 zU 4 I
U x 0 U.U
W) I. U 0. IU4 Uj LU of z .J 0
10C mn 3 L0 I- I. w i-
0 7 J 0K .i u2 I.-0 LU.
4A I- UU at4 LU m in-j Xn b- N.L
ee inJ 0 .z-
4i U) V) LL Z.
4 x 0 of L
-c U 0 0.L
6K 0. K 7
4 I.-Z LU t2 0
0 CA - .U &AO . kI- 0 .JU I- 'I p.-
49 LU cc I.- 0 i
6 - 9.- CL in i
*i u 0 LU wi aoA Vn of LU: -4 in0> 4 LU 3. 0 KU
LU1 L
ot 0 4f -iULU in V) U U
o0 V -0 LLU M0 L
0 -9 LLA Ut L
4 -in p- U )0.0 i.U VLA 0
4~a K I-i ~ ~ L
CD LU L-L 0 *
6- U
) at 0 U z~LIL %L UA & K 6
a. U. 2 ina. U 359
I.A 30=L u i r $ a
z 1.. uJ -j ctwG3 2 -� U. al Sb0S h 44
us 7 'A U. *5 b W
e4t"s 129 11u 0 .- u w3
of U. a-
'A 'AA. .- W
ee %A A w-4o C :; W WO zA
6 A Lal :P IL . b 20W 0 .. j C IA 0 W 4 I
-j 0 Wi 0 ccC 00
'A UU U) W 3
vi U0 %A U. .- C
'A IC ' #A cc4 2.
-. 4 W W .
.IA 0 atWIiw 0 a 0. us
4A 3 'A aC
2 bw 'A of 2
OC 6 - 6" QJ m A 00 U. 70 GC'A 20 C C0S cc WO -4
L4 'A.- t I La AIL 0.- z - lo.
0 w' x Ln VIA' -0 C
LI 9-A il
ill- z j'- 00 U.
x we- z
of U U.C - S
0U 44 . 5 '
5- A 99 0.Wj w A I z 4LA
z z -
I- U6 w Sb0
WW 5- A I 5 4 '
22 x W Im x- s - Z
00 Ui Z ag #A z vi1 4 OAW U. CL 00 .jW
ec 'AU 'A I. IL S. j3 IA z WE OR 4 -. W w
9- 9- U4 %'ZI W) 'A. CU 5p- at of W
0~~~95 A4S0i. AZ#A - Q 'LU WW w
#A z t4 I-4 I UCe. S 0 . - ''
02 IL 0UA 4R vb.J
'A 9-S 2W 4
og~0 U A .iw
'A A
60
TA 1. 0 .
Nu z
IL 5 - !!
9 L a of ' . be
xU Li Wi 2 I&in U U %A L4 U.
MC 0 1- aWoIl
49 oe 4A U.
0 U LL U 0
0 U. x-t: b- z L -U CU 2
U. tA u.5 o IL doC) B 0 x %A
=1 ui b1 U W
0n 4 Ga :0 0 2 W
0 g.. w = B- B
6- w 4W j
U. LL 0g u- > 2 e
w a w xa -
cz 0 4 z4 2In 0 . -6 uj 0 '
1.-0 0 W5 z - cc
2 In ..J -U 0
0 Z 0 0 #A IA
* m4 -ide U 0 In toI
0- x 04A A w
LU w
z 31. UUi IA -U
0 0i of 4 01- U
w n 09 0 0. IA 0 In
Uj 0 In w -4 vi- 1
U. ao z -of *0. cc 4 z -W4
LU 1 ZL) O 0 0U
4A Lo- 0 x I4 ~ v 4r 39-~ -B- ~ w
%A x B-Z i z LA
uow0 0 j x
a LU I.. I- x x Cie
- ui 0 #A of I- B
wI- I. LU d .. 1 zU
oe 9 B- Ib- I. to,
I. B- &M - 960 -J %AI
U 49Z-:pu- 4c 31. .j 30 0
mi W IL ULU..I .1 L
* B- ~~~U. - zi u B
61
a. wo u a xn I-IOU 66 6- 3.
0.. iny I")0 U.V~ I
uj ~ ~ w .8 0 2 2
J 14.8 0 ~.
5- ac x' 44 4I- - I- ..
Ul z 4C LU i L4 T' z I .J LU 0
U. a 3 "- a. z z o9 1- 4A MJ c Iw - 1.- u4) W AC 4A U.U
Wa U. X d8 -9 4A0
2 :: LL L 0 K UJ 5- I .- 4 0; 4:i u ac I
in 4 - z 4930In w u .-4 LU 2I- me4 A I- 3
w 2 LU -4 z 4I. - a a J u
4A- L Z LU 4A j-. 44 .J w nU
0- 1. Il Z uj U; UwU 1; 0: J 0 LU 0-x z z .4 .j a6-. -j 4 4 x
KU Z 4 %. 0 of a:a0 IL 4 a 0 LUz U. LU Wux U. ~ s
S W .4 - - 5- ~ I of W L KLU L- 9-K K nz z z - LU S. L 0
CL 4 C :) = a.j x LU * i 0
4 4 V in 0 Z2 K
"! uS ag0-- $. L w a. ws
4 ~ ~~ ~~ z8 x uu a 4c'l .- i ~ .
5- LU CL - IA 2 tU
IA A X- LU I- Lo20- -j .8 I- 0
0U LU J - 0 U
5- 2 I.- 'i - -
.8 4 -0~ 0. - - L
44% 0 4.U. 44. al u u.
tu J- CU ).C
t. 0 4j L. 0- z co
2Y in U. 0 K3' a % -o
S 0 0 1
& z ; . aIj w de on 00- x U 0 0..U 4j 10 19It z 0w
0. 2 .. e I.- 0i 0
_ z .Z a. aI.- ZW 1L4A :1 02 0
U, cJ w.. 2 JmI-.~~ CIO...O J 0
1o IL ae J ~l u C tLLU 00 AA w
-j clo~s ( 0 1-JLi 6W LL
49 x. w LU C40 I- (. 0 x 00
zU ag 0
44 Z 0 0U L
S4 3T~ 0 Ltu LU m - O0x w -
0 a. 0 U
0 U) 02
I.- M 0U t3 0a
1. IAL) 0II-. ( A 1.-2
~m W .J . 0
IA u . -ii Lu *0lI.- W 0o
0 ia IA a. IA oLU z -j a 0 cc0 CLa
-9 44 LU 0. %- - 4A 0Li 4A M 00 LU CL
J LU - 0 xU aL- 4A uj
CL0 C.Li U. - U X 0 cc
LU IA 0 U. 0 C-
x: 0LU I- z-c- U
- 0 mA = 0 0. -2t LA 1.- . 0 0A 0
.- 0L 0j0t
fl 0. W A1- Lc -j 0
CLU 25 W.A1- U 0 7a a. 2:- e w LU 1- 0 0 0
r- 0. -f0z 00- ui. I4 % 0 Ia,'Q) x LU 0 -.- JO L 0 S I
7 2 U, f- LD 4A 0 0.
a. LU 0u~ ~ u w r U0 a - U 7 *0
-4 N Uj 0 2! C LAC A 0I. -2 LU 0 LU
UJ ZK. m.
.U LU 1J. t. IA l'ý U.IA 10 -x -1 0
>. u. mw z t0 1P.A~; 0 QAU 3 eI
63
II
FLOW CHARTS OF IDETERMINER-DETERMINEE METHOD II
HI
IIIi
64II
AIC
StorIe zeros in AMN and
Save indexes, store zer-N os in a block of 100 ad-
eesses
Bump address 1(to next word)->Test out of sentence yes )I
Ino
Is the current word adeterminee (is "B" thesecond character of thesemantic code)? no PER
Store the address con-taining the word numberof determinee in nextempty address of 100word block
Is block filled? ___
I no
I Is urrent word a deter-miner (is "C" the thirdcharacter of the seman-tic code word)? no
IyesiIs "AWNC" filledt ' yes- ;
Ino
Store the address con-taining the word numberof determiner, in AWIC
* Not yet written 65
g
PAWNC PEM
Stor•e the address con- Pick up next entry intaining the word number 100 word block and pick 11of d terminer in AWNCl up word number referred
Iv to thereP PE Is tie word number less
than the word number ofthe first entry in thedeterminee tablet -,)
Is tke word number thesame as the first entry
Sin the determinee table? x-M+ FU_ ~ noIs first address of 100 Is te word number moreword block containing than last entry in the yesaddresses of word num- deterjinee table? ____
bers of determinees tnofilled? no Is te word number the
ýyes no same as the last entryIs AWWC filledt no ENT* in the determineeye
,yes tabl, yes _Are the contents of ___nofirst address of 100 PCLNword block and AWW noequalt
AyesIs the second addressof the 100 word block yes )
Ifilledt$no yes
Is AWNOl filled? - PBS I- JInoENT*
IIII
* Not yet written 66
PCIN
1d -together the add-reuses of the top andbottom parameters ofthe table and divideby 2 (gives mid-pointof table)
Is tie contents of thismid-point of the tablemore than the word num-ber of the current de- Yesterminee? PLRT -midpoint of the
t•no table the new bottomIs e contents of the parametermidpoint equal to thevord number of the cur- a PCLNrent, determinee? yes P -
Made the midpoint ofthe table the new topParameter
Do tA top parameterand the bottom paramet-er of the table differ yesby 17 no___T4no
* PCLN
I Save~address of match
PRPM+2
Save Iaddres of match
* Not yet written 67
Savel'address of match 1PMP•• t the determiner
table** to the addressindicated in the deter-mince table
Pick up tests. Ascer- [Itain what test applies
Doeskcomputer word ex-smined have an indica-tion that it containstests? no N (error in table) U
,yesDoes code indicate amodifier determined bya following nominal? e---PFN
_ _ _no
PNPl sDocs code indicate anominal determined by yeIa following nominal? Pe_
F_ WDoes code indicate apreposition determined yeby a following nominal? - .P
_ _ _ noDoes code indicate anominal determined by esa preceding modifier? yesPPH
__no
5 ?IDoes code indicate agenitive nominal deter-mined by a preceding yesnomi pal? PN
- -noMNT*I
SNot yet written
*See page 5 for the format of the determiner table
68
IIII
Is rd following deter- yesminee out of sentence? __InoIs determinee a modifier? - ___
yes
in following word an ad-yeverb or modifier? - Bump address
jino
to next word
Is following word a nm- no
,es
Save word number of fol-lowing nominal as poten-
I tial determiner
finished
IIIII
69
G
0
is tLa following cur-rent deeine out ofsentence (ski a dverbe
m id mo d i f i e r s )
fIs Ir folidving cur-rent determines a nom- I
inal (skis? adverbs and'e nomodifiers)?
Does it have any genl no
tive agreement bit., L 11
eSave rd nmber of vord 11following determinee aspotential determiner
finishedVF
found
IIIJ
70
PWE+l - - Pick up next determinerword number in determin-er table
Hav we passed the lastdeterminer in the deter-miner table for the de-terminee on which we are yesworking? e Finished exit
Is toword number ofthe potential determinerthe same as the wordnumber for this entry inthe determiner table?
jyes
Is the test fulfilled bythis determiner equal to ,the one which initiated nothe search? no • Finished exit
IyesPlace flags in one ormore of 5 consecutivelocations indicatingwhich translation ortranslations apply
j Foun exit
7II
71I
ME'
Seareh for first non-blank character inEnglish field
Is ttis a character ofa translation we want(first translation,etc.--check flags set noin Piii)t
ME15+2 VMove to next character
Ha ihe 6th computerword of English field (e"been reached? . roces last e__4 no processes last
coputer word of Englishfieed)
4Is next character a noplus?
another trams-lation has been reached
INSE
Mala out character 3
72 I
I
I. CONCLUSIONS AND RECOMMENDATIONS
Semantics
The determiner-determinee method investigated under this
contract yields results for resolving some classes of English
equivalent ambiguity. The research procedure used pursuing this
study will be fruitful in providing rules for choosing semantically
appropriate English equivalents for a large list of Russian words.
It is recommended that this technique be further refined and
applied to a significant body of fresh text.
Syntax
I The ability to rearrange translations from Russian word
order into idiomatic English word order is dependent upon the
j correct determination of clause boundaries. Research done under
the present contract in the identification of subject and object
I packages in conjunction with other clause boundary indicators
allowed an improved rearrangement procedure. It is recommended
that continued study be applied to the interrelated problems of the
I resolution of clause boundaries, the identification of the functional
attributes of major syntactic units, and relations between clauses
I within the sentence.
[ Diagnostic Procedure
The growing complexity of the machine translation programs
and the recognition of the fact that they are constantly being changed
for experimental purposes have highlighted the need for diagnostic
procedures. Systematic diagnostic procedures similar to those
developed for computer checkout can be applied to machine translation
programs. Such a procedure can help in isolating the causes of
Sdeterioration of previously trusted areas when a change is applied
to a supposedly unrelated portion of the program. A computer
j" program was developed under this contract to perform this diag-
nostic function.
I1 73
computer Configuration
Work in the semantic area of this contract highlighted the
desirability of using a search type memory throughout the
machine translation process.
74I
top related