http://repository.osakafu-u.ac.jp/dspace/

Title A Document-Retrieval Method Using Dependency-Relations of Titles
Author(s) Takamatsu, Shinobu; Nishida, Fujio
Citation Bulletin of University of Osaka Prefecture. Series A, Engineering and natural sciences. 1978, 27(1), p.29-43
Issue Date 1978-10-31
URL http://hdl.handle.net/10466/8299


A Document-Retrieval Method Using Dependency-Relations of Titles

Shinobu TAKAMATSU* and Fujio NISHIDA*

(Received June 15, 1978)

This paper presents a method of document retrieval using dependency-relations

among key-terms which appear in titles of documents written in English.

A noun phrase or a noun clause contained in a title is converted to a function-

expression and normalized so that semantically equivalent expressions have a unique

syntactic expression. The normal function-expression is recorded in a tree-like file. A retrieval system is also designed in which, for a request expressed as a noun phrase or a noun clause, almost all the documents implied by the request are retrieved efficiently.

1. Introduction

This paper presents a new method of document-retrieval using dependency-relations

between words involved in document titles written in English.

In order to designate the requested objects more clearly and precisely, it is necessary to use a modified and delimited expression of objects such as is seen in titles of documents. However, the method using roles and links that has been studied so far is well known to have a considerably lower recall rate.5) The main reason is considered to be that the usual role-link structure lacks a technique that unifies various equivalent expressions and also identifies implication-relations.

In this paper, a title expressed by a noun phrase is reduced to a function-expression by deterministic parsing. Most ambiguities of dependency-relations can be removed by using a set of lower categories of verbal cases. The function-expression thus obtained is transformed into a normal form in several steps, so that various dependency-expressions which have the same meaning are given a unified expression as far as possible.

For the further improvement of the recall rate, a retrieval method using an implica-

tion-relation is introduced based on the concept of parts and wholes.

A construction of a file which contains dependency-relational data is also presented for efficient and systematic retrieval.

2. Conversion of English Titles to Function-Expressions

2.1 Function-expressions of English titles

Most English titles are expressed by noun phrases and noun clauses. The subject represented by a noun phrase or a clause is considered to denote a subset of objects, attributes or events which are all restricted and modified by other objects, attributes and events.

* Department of Electrical Engineering, College of Engineering.

This consideration leads to the concept that a noun phrase or a noun clause can generally be represented as the following function-expression:

$$f(K_1 := t_1,\ K_2 := t_2,\ \ldots,\ K_n := t_n), \tag{1}$$

where $f$ is a function symbol consisting of a noun word called a governor, each $t_i$ ($i = 1, 2, \ldots, n$) is a term which is a modifier called a dependant, and each $K_i$ ($i = 1, 2, \ldots, n$) is a case-label which indicates a certain dependency-relation between $f$ and $t_i$. The term $t_i$ consists of an adjective word, a noun word or a string of words of a function-expression such as (1) or (2).

A function-expression of an adjective clause takes the form:

$$p(L_1 := s_1,\ L_2 := s_2,\ \ldots,\ L_m := s_m), \tag{2}$$

where $p$ is a word of a descriptive adjective or a verbal adjective which governs each term $s_i$ ($i = 1, 2, \ldots, m$), and each $L_i$ ($i = 1, 2, \ldots, m$) is the case-label of a dependant $s_i$ on a governor $p$.

A term $s_i$ consists of a noun word, an adverb word or a string of words of a function-expression such as (1) or (2).
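As an illustration of forms (1) and (2), the following minimal sketch represents a function-expression as a governor with a list of case-labelled dependants and prints it in the notation used here. The names `FuncExpr`, `Term` and `render` are hypothetical and are not taken from the original program.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class FuncExpr:
    """A governor word with its case-labelled dependants, as in forms (1) and (2)."""
    governor: str
    dependants: List[Tuple[str, "Term"]] = field(default_factory=list)


# A term is either a bare word or a nested function-expression.
Term = Union[str, FuncExpr]


def render(term: Term) -> str:
    """Print a term in the f(K1:=t1, ..., Kn:=tn) notation."""
    if isinstance(term, str):
        return term
    if not term.dependants:
        return term.governor
    args = ", ".join(f"{label}:={render(dep)}" for label, dep in term.dependants)
    return f"{term.governor}({args})"


# "ion implantation into silicon": 'implantation' governs 'ion' and 'silicon'.
implantation = FuncExpr("implantation", [("OBJ", "ion"), ("GOAL", "silicon")])
rendered = render(implantation)
print(rendered)
```

Because a dependant may itself be a `FuncExpr`, nested expressions are rendered recursively.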

A case-label $L_i$ in a function-expression (2) originates from the cases of a predicate word $p$.1) These cases are subdivided into obligatory cases, shown in (a-1) ~ (a-3) of Table 1-A, and optional cases, shown in (a-4) ~ (a-13) of Table 1-A.

Table 1-A. Case-labels in the function-expressions.

Category of governor  Number  Case-label  Category of dependant  Preposition of dependant  Example
EVENT                 (a-1)   SUBJective                         by, of, in                Fregean definition
EVENT                 (a-2)   OBJective                          of                        Ion implantation
EVENT                 (a-3)   COMPLement                                                   Enabling (a device) to test
EVENT                 (a-4)   INSTrument  OBJECT                 with, by, on              Measurement by probe
EVENT                 (a-5)   MEANS       EVENT                  with, by, on              Fabrication by sputtering
EVENT                 (a-6)   PURPose     EVENT, OBJECT          for                       Method for measurement
EVENT                 (a-7)   SOURCE      OBJECT                 from                      Emission from silicon
EVENT                 (a-8)   GOAL        OBJECT                 to, into                  Implantation into silicon
EVENT                 (a-9)   CONDition   ATTRIBUTE                                        (High) speed drive
EVENT                 (a-10)  MANNer      DESCRIPTION            in, with                  Efficient computation
EVENT                 (a-11)  EXHibition  OBJECT, EVENT          in, of                    Optimization in transistor
EVENT                 (a-12)  LOCation    LOCATION               in, on, at                Underwater connection
EVENT                 (a-13)  TIME        TIME                   in, on, at                Real-time operation

Table 1-B. Case-labels in the function-expressions.

Category of governor  Number  Case-label          Category of dependant  Preposition of dependant  Example
OBJECT, ATTRIBUTE     (b-1)   SUBJ^-1             EVENT                  of, with, in              Superconducting junction
OBJECT, ATTRIBUTE     (b-2)   OBJ^-1              EVENT                  of, with                  Improved efficiency
OBJECT                (b-3)   INST^-1             EVENT                  for, to                   Probe for measurement
OBJECT                (b-4)   GOAL^-1             EVENT                                            (Ion) implanted GaAs
OBJECT                (b-5)   PROCESS (PURP^-1)   EVENT                  by, with                  Diodes by sputtering
OBJECT                (b-6)   EXH^-1              EVENT                                            System improved (in efficiency)
ATTRIBUTE             (b-7)   COND^-1             EVENT                  in, of                    Drift velocity
PARTS                 (c-1)   COMPONENT           MATERIAL               with, from, of            Thermistor from germanium
DEVICE                (c-2)   COMPONENT           PARTS                  with, of                  Intensifier with photocathode
PARTS                 (c-3)   COMPOSITE           DEVICE                 in, of, for               Inductance for filter
MATERIAL              (c-4)   COMPOSITE           PARTS                  in, of, for               Film material
OBJECT                (c-5)   ATTRibute           ATTRIBUTE              of, with, in              (High) frequency capacitor
OBJECT                (c-6)   NUMber              QUANTITY                                         Three systems
ATTRIBUTE             (c-7)   OBJECT              OBJECT                 of, in                    Conductivity in junction
ATTRIBUTE             (c-8)   VALue               QUANTITY                                         300 Ω·cm's resistance
THINGS                (c-9)   ARTicle             ARTICLE                                          A simulator

Obligatory cases and their categories are intrinsic to each predicate, and given to each

predicate in a word-dictionary, while optional cases, their categories and prepositions are

assumed here to be common to all the predicates. In this paper, the categories of nouns

are classified as shown in Table 2, where the categories of adjectives and adverbs are

assumed to be the categories of their nominalized words.

Table 2. Classification of categories.

Categories                        Examples
THINGS
  NON-EVENT
    OBJECT
      DEVICE                      intensifier, rectifier, microscope, ...
      PARTS                       film, diode, thermistor, ...
      MATERIAL                    semiconductor, oxide, Ge, ...
      ENTITY                      X-ray, pulse, wave, ...
    ATTRIBUTE                     capacity, speed, conductivity, ...
    QUANTITY                      two, three, some, 610°C, 0.36 eV, 300 Ω·cm, ...
    LOCATION                      underwater, terrestrial, ...
    TIME                          real-time, interval, ...
  EVENT
    STATE
      POSSESSION                  possession, composition, ...
      DESCRIPTION                 efficiency, correctness, slowness, ...
    ACTION
      FUNCTION, PHENOMENON        rectification, amplification, scintillation, superconduction, ...
      METHOD                      implantation, sputtering, removal, inspection, ...
      USE                         use, application, utilization, ...
      FORMATION                   formation, production, fabrication, ...
      CAUSE                       cause, ...


As seen from Table 2, the nominalized words of descriptive and verbal adjectives

belong to a category `event' and the other noun words belong to a category `non-event'.
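The superior-inferior relations of Table 2 can be sketched as a parent map that is walked upward to decide whether a word is an `event' or a `non-event'. The nesting used below is an assumption read from Table 2, and the word-dictionary entries and function names are hypothetical.

```python
# Superior category of each category (assumed nesting, read from Table 2).
SUPERIOR = {
    "NON-EVENT": "THINGS", "EVENT": "THINGS",
    "OBJECT": "NON-EVENT", "ATTRIBUTE": "NON-EVENT", "QUANTITY": "NON-EVENT",
    "DEVICE": "OBJECT", "PARTS": "OBJECT", "MATERIAL": "OBJECT",
    "STATE": "EVENT", "ACTION": "EVENT",
    "METHOD": "ACTION", "FORMATION": "ACTION",
}

# Hypothetical word-dictionary entries, using example words from Table 2.
WORD_CATEGORY = {
    "film": "PARTS", "oxide": "MATERIAL", "speed": "ATTRIBUTE",
    "implantation": "METHOD", "formation": "FORMATION",
}


def superiors(category):
    """Yield the category itself followed by each of its superior categories."""
    while category is not None:
        yield category
        category = SUPERIOR.get(category)


def belongs_to(word, category):
    """True if the word's category equals `category` or is inferior to it."""
    return category in superiors(WORD_CATEGORY[word])


def is_event(word):
    return belongs_to(word, "EVENT")


print(is_event("implantation"), is_event("film"))
```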

Each case-label $K_i$ ($i = 1, 2, \ldots, n$) in the function-expression (1) is defined, based on the cases of a predicate mentioned above, as follows:

[Case 1] When a governor $f$ belongs to the category `event', the case-label $K_i$ of a dependant $t_i$ on the governor $f$ is the same as that of a dependant $t_i$ on the predicate word corresponding to the governor $f$.

[Case 2] When a governor $f$ belongs to the category `non-event' and a dependant $t_i$ belongs to the category `event', the case-label $L_i^{-1}$ is used. $L_i^{-1}$ means that the case relation of $t_i$ to $f$ is inverse to that of $f$ to $t_i$ expressed by a case-label $L_i$. These inverse case-labels are shown in (b-1) ~ (b-7) of Table 1-B.

[Case 3] When both a governor $f$ and a dependant $t_i$ belong to the category `non-event', a certain functional verbal word such as `contain', `compose' or `be' can be considered to be omitted between $f$ and $t_i$. The dependency-relations in this case are those between a composite and its component, those between an object and its attribute, and others. The case-labels are given in (c-1) ~ (c-9) of Table 1-B together with the categories of a governor and its dependant.

2.2 Conversion to function-expressions

There are several syntactic patterns of noun phrases and clauses corresponding to their function-expressions (1) and (2). Table 3 shows the main part of the syntactic rules of noun phrases and clauses, where square brackets [ ] mean that the enclosed symbols, together with the brackets, can be omitted in some cases. The conventional part-of-speech label corresponding to each non-terminal symbol is shown in Table 4.

These syntactic rules have the following context-free-like form:

<R> ::= <D> <G>g | <G>g <D>,   (3)

where the subscript `g' designates that the word reduced to the non-terminal symbol carrying it is the governor of the words reduced to the other non-terminal symbols appearing on the right side of a syntactic rule.

The parsing is mainly based on the bottom-up analysis of the precedence grammar and the usual categorical matching.2),3),8)

As a phrase or a clause is reduced to a non-terminal symbol <R> by using the syntactic rule (3), the function-expression corresponding to <R>, namely,

$$g(K := d), \tag{4}$$

is constructed if $g$ is a word of <G> and if $d$ is a word of <D> or a form reduced to <D>.

Table 3. Syntactic rules of noun phrases and clauses.

Classification                      Number  Syntactic rule
(A) Co-ordinate conjunction         (A1)    <NPC>  ::= <NPC1> [and <NPC>]
                                    (A2)             | <NPC1> or <NPC>
                                    (A3)    <NPC1> ::= <NC> | <NP>
(B) Noun clauses and noun phrases   (B1)    <NC>   ::= <NP>g <VAP1>
                                    (B2)             | <NP>g <VAP2>
                                    (B3)             | <NP>g <DAP>
                                    (B4)    <NP>   ::= <NP>g [<PP>]
                                    (B5)             | [<ART>] <NP1>g
                                    (B6)             | <POSS> <NP1>g
                                    (B7)             | <IQADJ> <NP1>g
                                    (B8)    <NP1>  ::= [<DQADJ>] <NP2>g
                                    (B9)    <NP2>  ::= [<VAP3>] <NP3>g
                                    (B10)            | <VAP4> <NP3>g
                                    (B11)            | <NADJ> <NP2>g
                                    (B12)            | <DADJ> <NP2>g
                                    (B13)   <NP3>  ::= [<NP2>] <N>g
(C) Adjective clauses and phrases   (C1)    <VAP1> ::= <VAP1>g [<ADVP>]
                                    (C2)             | <ING>g <NPC>
                                    (C3)    <VAP2> ::= <VAP2>g [<ADVP>]
                                    (C4)             | <EN>g <NPC>
                                    (C5)    <DAP>  ::= <DADJ>g <PP>
                                    (C6)    <VAP3> ::= [<N>] <ING>g
                                    (C7)    <VAP4> ::= [<N>] <EN>g
(D) The others                      (D1)    <ADVP> ::= <ADV> | <PP>
                                    (D2)    <PP>   ::= <PREP> <NPC>
                                    (D3)    <POSS> ::= <NP1>'s

Table 4. Non-terminal symbols.

Non-terminal symbol  Mnemonic
<NPC>                Noun Phrases or Clauses
<NC>                 Noun Clause
<NP>                 Noun Phrase
<VAP>                Verbal Adjective Phrase (or clause)
<DAP>                Descriptive Adjective Phrase (or clause)
<ADVP>               ADVerb Phrase
<PP>                 Prepositional Phrase
<POSS>               POSSessive noun phrase
<ART>                ARTicle
<IQADJ>              Indefinite Quantitative ADJective
<DQADJ>              Definite Quantitative ADJective
<NADJ>               Nominal ADJective
<DADJ>               Descriptive ADJective
<ING>                present participle (verbal adjective)
<EN>                 past participle (verbal adjective)
<N>                  Noun
<ADV>                ADVerb
<PREP>               PREPosition

If the form reduced to <G> already has a function form

$$g(K_1 := d_1,\ \ldots,\ K_m := d_m),$$

a function-expression

$$g(K_1 := d_1,\ \ldots,\ K_m := d_m,\ K := d)$$

is constructed.

The case-label $K$ in the function-expression (4) is determined as follows.

If either the governor $g$ or the dependant $d$ belongs to the category `event', the case-label $K$ is determined by the category of $d$ (or $g$), that of a case of the predicate corresponding to $g$ (or $d$), and the preposition of the dependant, by referring to Table 1-A and (b-1) ~ (b-7) in Table 1-B.

If both the governor $g$ and the dependant $d$ belong to the category `non-event', the case-label $K$ is determined by the categories of both $g$ and $d$ and the preposition of the dependant, by referring to (c-1) ~ (c-9) in Table 1-B.
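The `non-event' case can be sketched as a table lookup keyed by the two categories and the preposition. The rule list below is only an excerpt of rows (c-1) to (c-4) as read from Table 1-B. The function name `case_label` and the treatment of a compound without a preposition (matching on categories alone) are assumptions made for illustration.

```python
# (governor category, dependant category, allowed prepositions, case-label);
# an excerpt of rows (c-1) to (c-4) as read from Table 1-B.
NON_EVENT_RULES = [
    ("PARTS", "MATERIAL", {"with", "from", "of"}, "COMPONENT"),
    ("DEVICE", "PARTS", {"with", "of"}, "COMPONENT"),
    ("PARTS", "DEVICE", {"in", "of", "for"}, "COMPOSITE"),
    ("MATERIAL", "PARTS", {"in", "of", "for"}, "COMPOSITE"),
]


def case_label(governor_cat, dependant_cat, preposition=None):
    """Return the case-label K for a non-event governor/dependant pair, or None.

    `preposition` is None for a compound without a preposition, e.g. 'SiO2 films'
    (assumption: such compounds are matched on the categories alone).
    """
    for gov, dep, prepositions, label in NON_EVENT_RULES:
        if (gov, dep) == (governor_cat, dependant_cat):
            if preposition is None or preposition in prepositions:
                return label
    return None


print(case_label("PARTS", "MATERIAL"))        # 'SiO2 films'
print(case_label("PARTS", "DEVICE", "for"))   # 'Inductance for filter'
```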

[Example 1]

Let us consider a title

`SiO2 films formed by ion implantation into silicon'.

The categories of the words `SiO2', `films', `ion', `implantation' and `silicon' in the above title are `material', `parts', `object', `event' and `object' respectively. It is also seen by referring to a word-dictionary that the case structure of both the predicates `form' and `implant' is

SUBJ:=human or device, OBJ:=object thing.

Hence, by referring to the syntactic rules (B13), (B2), (C3) and (B4) in Table 3, the above title is converted into the function-expression

`films(COMPONENT:=SiO2, OBJ^-1:=formed(MEANS:=implantation(OBJ:=ion, GOAL:=silicon)))'.

The above parsing can also remove various ambiguities of dependency-relations. However, there still remain some kinds of ambiguities that cannot be removed by the simple categorical matching used in the above analysis.

Let us consider a case where a phrase or a clause that belongs to the category `event' can depend on two words syntactically. Then, if both of these words belong to a category taken by a case of the predicate word, the simple categorical matching cannot resolve which of the two words the predicate word depends on.

Such ambiguities can mostly be removed by using a set of more inferior and precise categories taken by the cases of a predicate word.

Given a predicate word $p$, there are generally several sets of inferior categories. Denote one of these categorical sets by

$$[C_i,\ \ldots,\ C_k,\ \ldots,\ C_\ell,\ \ldots]. \tag{5}$$

Then, a relation

$$(\forall t_i)\cdots(\forall t_k)\cdots(\forall t_\ell)\cdots\,[\,p(K_i := t_i, \ldots, K_k := t_k, \ldots, K_\ell := t_\ell, \ldots) \rightarrow t_i \in C_i \wedge \cdots \wedge t_k \in C_k \wedge \cdots \wedge t_\ell \in C_\ell \wedge \cdots\,] \tag{5'}$$

holds.

Using the above relation, the ambiguities of dependency-relations can generally be removed. Suppose, for instance, a noun clause

`$t_\ell \cdots t \cdots p \cdots t_k \cdots$',

where `$p \cdots t_k \cdots$' is an adjective clause and it is ambiguous whether the adjective clause depends on $t_\ell$ or $t$.

In such a situation, if there is a categorical set of $p$ such that $t_\ell$ belongs to $C_\ell$ and $t$ does not belong to any category of the categorical set (5) under the condition $t_k \in C_k$, then $p$ is determined to depend on $t_\ell$ through the case-label $K_\ell^{-1}$.

[Example 2]

Let us consider a title

`Connectors of conductive rubber used for leadless electronic device'.

The three words `connectors', `rubber' and `device' belong to the categories `parts', `material' and `device' respectively.

One of the case structures of `use' is

use(OBJ:=parts, GOAL:=device).

Hence, the adjective clause `used for leadless electronic device' can be determined to depend on `connectors' through the case-label OBJ^-1.
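The disambiguation of Example 2 can be sketched as a filter over the candidate governors: a candidate survives only if its category fills a free case of a case structure that also fits the dependants already inside the adjective clause. The dictionary `CASE_STRUCTURES` holds only the structure quoted in Example 2, and the function name and data layout are hypothetical.

```python
# Case structures per predicate: each maps a case-label to the (inferior)
# category its filler must have. Only the structure quoted in Example 2 is listed.
CASE_STRUCTURES = {"use": [{"OBJ": "parts", "GOAL": "device"}]}


def attachment_candidates(predicate, filled, candidates):
    """Return (word, inverse case-label) pairs the adjective clause may attach to.

    filled:     case-label -> category of the dependants already inside the clause
    candidates: (word, category) pairs the clause could syntactically modify
    """
    found = []
    for structure in CASE_STRUCTURES[predicate]:
        # The dependants already in the clause must fit this case structure.
        if any(structure.get(label) != cat for label, cat in filled.items()):
            continue
        for word, cat in candidates:
            for label, wanted in structure.items():
                if label not in filled and wanted == cat:
                    found.append((word, label + "^-1"))
    return found


result = attachment_candidates(
    "use",
    {"GOAL": "device"},                                # 'for leadless electronic device'
    [("connectors", "parts"), ("rubber", "material")],
)
print(result)
```

If more than one word survives the filter, the ambiguity remains and another case structure or further information is needed.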

3. Normalization of Function-Expressions

Document retrieval generally needs essential coincidence between the expression of a request and that of the heading of a relevant document stored in a file. Hence, it is desirable that different function-expressions which are equivalent in meaning are transformed, in advance of retrieval, into a unique and concise expression called the normal form.

The transformations to the normal form consist of [1] nominalization of verbal words and removal of some nominalized words, and [2] equivalence-transformations in some cases.

[1] Nominalization and removal of some kinds of verbal nouns

All the predicate words consisting of verbal words and adjective words are replaced with their nominalized words.

[Example 3]

The function-expression of Example 1 is nominalized into

`films(COMPONENT:=SiO2, OBJ^-1:=formation(MEANS:=implantation(OBJ:=ion, GOAL:=silicon)))'.

The nominalized function-expressions contain some expressions which can be made simpler. They are expressions which contain verbal-noun words $p'$ such as `composition', `formation' and `application' in the following form:

$$f(L_j^{-1} := p'(L_i := t)). \tag{6}$$

As described in the preceding section, there is a corresponding compound noun phrase expression which represents the dependency-relation between an object and its composite or an event and its instrument, and the corresponding function-expression is

$$f(K := t). \tag{7}$$

Since expression (7) is more concise than expression (6) and more efficient for retrieval, expression (6) is transformed to expression (7) if $p'$ in (6) is a functional verbal noun as shown above. The case-label $K$ in (7) is determined by the categories of the governor $f$ and the dependant $t$ and by the case-label $L_i$ of $t$ for the verbal noun word $p'$, using Table 5.

Table 5. Removal of verbal nouns.

Verbal noun p'  Case-label L_i  Category of governor f  Category of dependant t  Case-label K
COMPOSITION     INST            OBJECT                  OBJECT                   COMPONENT
                SUBJ            OBJECT                  OBJECT                   COMPOSITE
FORMATION       MEANS           OBJECT                  EVENT                    PROCESS
                                EVENT                   OBJECT                   INST
USE             OBJ             EVENT                   EVENT                    MEANS
                                OBJECT                  OBJECT                   COMPONENT
                PURP            OBJECT                  EVENT                    INST^-1
                                EVENT                   EVENT                    PURP
                GOAL            OBJECT                  OBJECT                   COMPOSITE

(A blank cell repeats the entry above it.)

If the new case-label of another term which depends on a removed functional verbal word is not provided in Table 5, the original case-label on the removed functional verbal word is used unchanged as the new case-label for convenience.

[Example 4]

In the function-expression of Example 3, `formation' is a removable verbal noun, `films' and `implantation' belong to `object' and `event' respectively, and the case-label of `implantation' on `formation' is `MEANS'.

Hence, from Table 5, the function-expression of Example 3 is transformed into

`films(COMPONENT:=SiO2, PROCESS:=implantation(OBJ:=ion, GOAL:=silicon))'.
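The step from Example 3 to Example 4 can be sketched as a recursive rewrite that splices the dependants of a removable verbal noun into its governor and relabels them. A term is written here as a nested tuple `(head, [(case_label, term), ...])`. The three rule entries are an excerpt as read from Table 5, and the coarse category dictionary and all names are assumptions for illustration.

```python
# A term is (head, [(case_label, term), ...]); a leaf has an empty list.
COARSE = {"films": "OBJECT", "SiO2": "OBJECT", "ion": "OBJECT",
          "silicon": "OBJECT", "implantation": "EVENT"}          # assumed dictionary
VERBAL_NOUNS = {"composition": "COMPOSITION", "formation": "FORMATION", "use": "USE"}
# (verbal noun, L_i, category of f, category of t) -> K; an excerpt of Table 5.
TABLE5 = {
    ("FORMATION", "MEANS", "OBJECT", "EVENT"): "PROCESS",
    ("COMPOSITION", "SUBJ", "OBJECT", "OBJECT"): "COMPOSITE",
    ("USE", "GOAL", "OBJECT", "OBJECT"): "COMPOSITE",
}


def remove_verbal_nouns(term):
    """Rewrite f(L^-1 := p'(L_i := t)) into f(K := t) wherever p' is removable."""
    head, dependants = term
    rebuilt = []
    for label, dependant in dependants:
        dep_head, dep_deps = remove_verbal_nouns(dependant)
        if label.endswith("^-1") and dep_head in VERBAL_NOUNS and dep_deps:
            for inner_label, inner in dep_deps:
                key = (VERBAL_NOUNS[dep_head], inner_label,
                       COARSE.get(head), COARSE.get(inner[0]))
                # Fall back to the original label when Table 5 gives none.
                rebuilt.append((TABLE5.get(key, inner_label), inner))
        else:
            rebuilt.append((label, (dep_head, dep_deps)))
    return (head, rebuilt)


implantation = ("implantation", [("OBJ", ("ion", [])), ("GOAL", ("silicon", []))])
example3 = ("films", [("COMPONENT", ("SiO2", [])),
                      ("OBJ^-1", ("formation", [("MEANS", implantation)]))])
example4 = remove_verbal_nouns(example3)
print(example4)
```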

[2] Equivalence-transformations

The function-expressions thus obtained still contain some expressions which differ from each other in the apparent forms of their dependency-relations but are equivalent in meaning. These expressions often appear in the following forms:

(a) An object $f$ modified by both an action $f'$ and a manner $f''$ of $f'$,

(b) An object $f$ modified by both an attribute $f'$ and a description $f''$ of $f'$,

(c) A manipulation $f''$ such as identification modified by both an object $f$ and an attribute $f'$ of $f$.

Each of the above expressions has two dependency forms, shown on the two sides of the following equivalence-relations:

$$f(K_1 := f'(K_1' := f''(s),\ t),\ r) = f(K_2 := f''(s,\ K_2' := f'(t)),\ r), \tag{8}$$

$$f(K_1 := f'(K_1' := f''(s),\ t),\ r) = f(K_2 := f'(t),\ K_2' := f''(s),\ r), \tag{9}$$

$$f''(K_1 := f'(K_1' := f(s),\ t),\ r) = f''(K_2 := f'(t),\ K_2' := f(s),\ r), \tag{10}$$

where the symbol `=' denotes an equivalence-relation, $s$, $t$ and $r$ represent null or some terms prefixed by case-labels, and $K_1$, $K_1'$, $K_2$, $K_2'$ are the case-labels shown in Table 6.

Table 6. Sets of cases for equivalence-relations.

                Sets of categories                            Sets of case-labels
Classification  f       f'         f''          Expression    K_1      K_1'     K_2      K_2'
(a)             OBJECT  ACTION     DESCRIPTION  (8)           SUBJ^-1  MANN     EXH^-1   SUBJ
(a)             OBJECT  ACTION     DESCRIPTION  (9)           SUBJ^-1  MANN     SUBJ^-1  EXH^-1
(b)             OBJECT  ATTRIBUTE  DESCRIPTION  (8)           ATTR     SUBJ^-1  EXH^-1   SUBJ
(c)             OBJECT  ATTRIBUTE  METHOD       (8)           ATTR     OBJ^-1   EXH^-1   OBJ
(c)             OBJECT  ATTRIBUTE  METHOD       (10)          OBJ      OBJECT   OBJ      EXH

For convenience, the normal forms are assumed here to be defined as the expressions on the left side of the respective equivalence-relations. Hence, expressions of the form of the right side of the equivalence-relation (8), (9) or (10) are transformed into those of the left side.

The function-expression obtained by the transformations described in [1] and [2] is called a normal function-expression.

[Example 5]

(i) The function-expression of the noun phrase

`Optimum transistor of power amplification'

is

`transistor(SUBJ^-1:=amplification(OBJ:=power), EXH^-1:=optimum)'.

This is transformed by (a-9) in Table 6 into the normal form

`transistor(SUBJ^-1:=amplification(OBJ:=power, MANN:=optimum))',

which is the function-expression of the noun phrase

`Transistor with optimum power amplification'.

(ii) The function-expression of the noun phrase

`Diodes improved in efficiency'

is

`diodes(EXH^-1:=improvement(OBJ:=efficiency))'.

This is transformed by (c-8) in Table 6 into the normal form

`diodes(ATTR:=efficiency(OBJ^-1:=improvement))',

which is the function-expression of the noun phrase

`Diodes with improved efficiency'.

(iii) The function-expression of the noun phrase

`Complex-permittivity measurement of liquid'

is

`measurement(OBJ:=complex-permittivity, EXH:=liquid)'.

This is transformed by (c-10) in Table 6 into the normal form

`measurement(OBJ:=complex-permittivity(OBJECT:=liquid))',

which is the function-expression of the noun phrase

`Measurement of complex-permittivity in liquid'.
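Case (i) above can be sketched as a rewrite from the right side of relation (9) to its left side, using the row of Table 6 for classification (a): the description attached to the object through EXH^-1 is moved under the action as MANN. Terms are nested tuples `(head, [(case_label, term), ...])`. The category dictionary and the function name are hypothetical, and only this one row is handled.

```python
# Assumed categories for the words of Example 5 (i).
CATEGORY = {"transistor": "OBJECT", "amplification": "ACTION", "optimum": "DESCRIPTION"}


def to_normal_form_9(term):
    """Rewrite f(SUBJ^-1 := f'(t), EXH^-1 := f''(s), r)
    into f(SUBJ^-1 := f'(t, MANN := f''(s)), r) for class (a) of relation (9)."""
    head, dependants = term
    if CATEGORY.get(head) != "OBJECT":
        return term
    action = next((d for label, d in dependants
                   if label == "SUBJ^-1" and CATEGORY.get(d[0]) == "ACTION"), None)
    manner = next((d for label, d in dependants
                   if label == "EXH^-1" and CATEGORY.get(d[0]) == "DESCRIPTION"), None)
    if action is None or manner is None:
        return term
    merged = (action[0], action[1] + [("MANN", manner)])
    rebuilt = []
    for label, d in dependants:
        if d is manner:
            continue                      # the description now lives under the action
        rebuilt.append((label, merged if d is action else d))
    return (head, rebuilt)


before = ("transistor", [
    ("SUBJ^-1", ("amplification", [("OBJ", ("power", []))])),
    ("EXH^-1", ("optimum", [])),
])
after = to_normal_form_9(before)
print(after)
```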

4. Retrieval

A request for retrieval is input in the form of an English noun phrase or clause and is transformed into a normal function-expression by using the procedures described in the preceding sections. The words of function-symbols contained in a normal function-expression are called key-terms.

The key-terms are hierarchically disposed in an inverted file according to the superior-inferior relations of their categories, as shown in Fig. 1.4),7) Each key-term is followed by several pairs of the identification number of a document which contains the key-term and a position-symbol of the key-term.

The position-symbol of a key-term indicates the role-position of the key-term in the normal function-expression corresponding to a title. The assignment of a position-symbol is specified as follows:

(1) A null string is assigned to the position-symbol of a head key-term, which has no preceding key-term to modify.

(2) A string `α·K' is assigned to the position-symbol of a key-term which modifies, through the prefixed case-label K, the preceding key-term having a position-symbol α.

[Figure: categorical key-term trees, including `object' (devices, parts, material) and `event' (phenomenon, manipulation: addition, implantation, diffusion), with document numbers and position-symbols attached to the key-terms.]

Fig. 1. An illustration of the inverted file.

[Example 6]

In the normal function-expression

① `films(COMPONENT:=SiO2, PROCESS:=implantation(OBJ:=oxygen-ion, GOAL:=silicon))'

corresponding to the title of a document

`SiO2 films formed by oxygen-ion implantation into silicon',

the position-symbols of the key-terms `films', `SiO2', `implantation' and `silicon' are a null string, `COMPONENT', `PROCESS' and `PROCESS·GOAL' respectively.

In the normal function-expression

② `magnetoresistance(OBJECT:=films(COMPONENT:=permalloy))'

corresponding to the title of a document

`Magnetoresistance in permalloy films',

the position-symbols of the key-terms `magnetoresistance', `films' and `permalloy' are a null string, `OBJECT' and `OBJECT·COMPONENT' respectively.
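The assignment of position-symbols can be sketched as a walk over the normal function-expression that appends each case-label to the position-symbol of the governor. Terms are nested tuples `(head, [(case_label, term), ...])`, and the function name is hypothetical.

```python
SEP = "·"


def position_symbols(term, alpha=""):
    """Return (key-term, position-symbol) pairs for a term (head, [(label, term), ...])."""
    head, dependants = term
    pairs = [(head, alpha)]               # the head key-term keeps the governor's path
    for label, dependant in dependants:
        child_alpha = alpha + SEP + label if alpha else label
        pairs.extend(position_symbols(dependant, child_alpha))
    return pairs


doc1 = ("films", [
    ("COMPONENT", ("SiO2", [])),
    ("PROCESS", ("implantation", [("OBJ", ("oxygen-ion", [])),
                                  ("GOAL", ("silicon", []))])),
])
pairs = position_symbols(doc1)
for key_term, symbol in pairs:
    print(repr(key_term), repr(symbol))
```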

Fig. 1 illustrates a part of an inverted file which records the information contained in the titles of the documents ① and ② of the above example in a distributed form on several categorical key-term trees.

The retrieval consists of Mode (1) and Mode (2).

Mode (1) Retrieval of documents which have key-terms equivalent or inferior to those of a request, by means of the usual key-term matching on the inverted file.

Mode (2) Retrieval, from the documents retrieved in Mode (1), of documents implied contextually as well as semantically by a request.

The retrieval in Mode (2) is divided into the following two cases (i) and (ii):

(i) Implication-relations based on both superior-inferior relations and qualifications by key-terms

Let $|t|$ denote the set of objects, attributes or events expressed by a key-term $t$. Then, if the formula

$$|f| \supseteq |f'| \;\wedge\; (\forall i)(\exists j)\,(K_i = K'_j \wedge |t_i| \supseteq |t'_j|)$$

holds for the following two function-expressions:

$$|t| \triangleq |f(K_1 := t_1,\ K_2 := t_2,\ \ldots,\ K_m := t_m)|,$$

$$|t'| \triangleq |f'(K'_1 := t'_1,\ K'_2 := t'_2,\ \ldots,\ K'_n := t'_n)|,$$

then

$$|t| \supseteq |t'| \tag{11}$$

holds.

Since the case-labels in a function-expression are represented as position-symbols in the inverted file as mentioned above, the retrieval procedure based on the above implication-relation (11) is described as follows:

[Procedure 1] Let $t_j$ and $\alpha_j$ ($j = 1, 2, \ldots, n$) be a key-term and its position-symbol in a function-expression of a request respectively. Then retrieve the documents which have function-expressions containing at least $n$ pairs of a key-term equivalent or inferior to $t_j$ and the position-symbol $\alpha_j$.

[Example 7]

The document ① is retrieved by Procedure 1 from the inverted file shown in Fig. 1 for a request such as

`films(COMPONENT:=oxide, PROCESS:=addition(OBJ:=impurities))'

or `Oxide films formed by impurities addition'.
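Procedure 1 can be sketched as follows: a document is retrieved when every (key-term, position-symbol) pair of the request is matched by a pair of the document with the same position-symbol and an equivalent or inferior key-term. The superior-inferior links in `SUPERIOR` are assumed here so that the example can run, and the function names are hypothetical. For brevity the documents are scanned directly rather than through an inverted file.

```python
# Assumed superior-inferior links between key-terms (inferior -> superior).
SUPERIOR = {"SiO2": "oxide", "implantation": "addition", "oxygen-ion": "impurities"}

# (key-term, position-symbol) pairs of documents 1 and 2 of Example 6.
DOCS = {
    1: [("films", ""), ("SiO2", "COMPONENT"), ("implantation", "PROCESS"),
        ("oxygen-ion", "PROCESS·OBJ"), ("silicon", "PROCESS·GOAL")],
    2: [("magnetoresistance", ""), ("films", "OBJECT"),
        ("permalloy", "OBJECT·COMPONENT")],
}


def equal_or_inferior(term, target):
    """True if `term` is `target` or lies below it in the key-term tree."""
    while term is not None:
        if term == target:
            return True
        term = SUPERIOR.get(term)
    return False


def procedure1(request_pairs, docs):
    """Return the numbers of the documents matching all request pairs."""
    hits = []
    for number, doc_pairs in docs.items():
        if all(any(pos == alpha and equal_or_inferior(term, wanted)
                   for term, pos in doc_pairs)
               for wanted, alpha in request_pairs):
            hits.append(number)
    return hits


request = [("films", ""), ("oxide", "COMPONENT"),
           ("addition", "PROCESS"), ("impurities", "PROCESS·OBJ")]
retrieved = procedure1(request, DOCS)
print(retrieved)
```

Document 2 is not retrieved because its key-term `films' carries the position-symbol `OBJECT' rather than a null string.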

(ii) Implication-relations based on a pair of a part and a whole

The documents entitled

「[a study or report] on a whole f' which contains a part f」

can be considered to imply semantically those entitled

「[a study or report] on a part f contained in a whole f'」.

[Example 8]

`Film with magnetoresistance' implies `Magnetoresistance in film'.

As shown in Table 7, there will be several case-pairs of a part and a whole, such as components and their composite, an attribute and its object, or means and the purpose.

Table 7. Case-pairs of a part and a whole.

K_1         K_2
COMPOSITE   COMPONENT
OBJECT      ATTRibute
PURPose     MEANS

The above implication-relation can be expressed in symbolic notation:

$$f(t,\ K_1 := f'(s)) \subseteq f'(s,\ K_2 := f(t)), \tag{12}$$

where both $t$ and $s$ represent sequences of terms prefixed by case-labels, and both $K_1$ and $K_2$ are case-labels that indicate modification-relations between a part and a whole.

As seen from the implication-relation (12), the key-term of a whole $f'$ on the left side has the excess position-symbol $K_1$ compared with that on the right side, while the key-term of a part $f$ on the right side has the excess position-symbol $K_2$ compared with that on the left side.

Based on the implication-relation (12), the retrieval procedure is described as follows:

[Procedure 2] Let $t_j$ and $\alpha_j$ be a key-term and its position-symbol respectively in a function-expression of a request. Then, retrieve the documents which have a key-term equivalent or inferior to $t_j$ and whose position-symbols either

(1) consist of $\alpha_j$ to which a case-label $K_1$ shown in the left column of Table 7 is added,

or

(2) consist of $\alpha_j$ from which a case-label $K_2$ shown in the right column of Table 7 is deleted.

[Example 9]

The document ② is retrieved by Procedures 1 and 2 from the inverted file shown in Fig. 1 for a request such as

`film(COMPONENT:=magnetic material)' or `Film from magnetic material'.
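Procedure 2 can be sketched by generating, for each position-symbol of the request, the shifted position-symbols obtained by adding a case-label $K_1$ in front or deleting a leading case-label $K_2$. Placing the added label in front of the position-symbol is an interpretation consistent with Example 9, not something the procedure states explicitly. The sketch also checks each key-term independently, which is a simplification. The link from `permalloy' to `magnetic material', the use of `films' as an already normalized key-term, and all names are assumptions.

```python
SEP = "·"
WHOLE_SIDE = ["COMPOSITE", "OBJECT", "PURP"]     # K1, left column of Table 7
PART_SIDE = ["COMPONENT", "ATTR", "MEANS"]       # K2, right column of Table 7
SUPERIOR = {"permalloy": "magnetic material"}    # assumed key-term link


def shifted_positions(alpha):
    """Position-symbols obtained from alpha by adding a K1 or deleting a leading K2."""
    shifted = set()
    for k1 in WHOLE_SIDE:
        shifted.add(k1 + SEP + alpha if alpha else k1)
    for k2 in PART_SIDE:
        if alpha == k2:
            shifted.add("")
        elif alpha.startswith(k2 + SEP):
            shifted.add(alpha[len(k2) + len(SEP):])
    return shifted


def equal_or_inferior(term, target):
    while term is not None:
        if term == target:
            return True
        term = SUPERIOR.get(term)
    return False


def matches_procedure2(request_pairs, doc_pairs):
    """True if every request key-term occurs in the document at a shifted position."""
    return all(any(pos in shifted_positions(alpha) and equal_or_inferior(term, wanted)
                   for term, pos in doc_pairs)
               for wanted, alpha in request_pairs)


doc2 = [("magnetoresistance", ""), ("films", "OBJECT"), ("permalloy", "OBJECT·COMPONENT")]
request = [("films", ""), ("magnetic material", "COMPONENT")]
found = matches_procedure2(request, doc2)
print(found)
```

With this sketch, the request of Example 8, `Film with magnetoresistance', also matches a document entitled `Magnetoresistance in film': one key-term gains the label OBJECT and the other loses the label ATTR.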

5. Simulation

A small-scale simulation of the language analysis of titles, the normalization and the document-retrieval was carried out on a FACOM U-200. The program was written in an assembly language. The memory required for document-retrieval and that for both language analysis and normalization were about 3.4 KB and 12 KB respectively.

The following are some illustrative examples of inputs and their outputs.

Some of the input titles, each prefixed by a document number, are

(37) `Low noise MOS-FET utilizing molybdenum gate masked ion implantation',

(48) `Modified triple diffused power-transistor',

(47) `Thin SiO2 film formed by ion implantation into silicon',

(32) `The growth of epitaxial layers for fabrication of I.A, composites',

(45) `Fabrication of silicon Schottky-barrier-diodes by sputtering'.

They are transformed into their normal function-expressions respectively:

(37) `MOS-FET(ATTR:=noise(SUBJ^-1:=lowness), PROCESS:=implantation(OBJ:=ion, MEANS:=masking(OBJ:=gate(COMPONENT:=molybdenum))))',

(48) `power-transistor(PROCESS:=diffusion(MANN:=triple), OBJ^-1:=modification)',

(47) `film(SUBJ^-1:=thinness, COMPONENT:=SiO2, PROCESS:=implantation(OBJ:=ion, GOAL:=silicon))',

(32) `growth(OBJ:=layers(SUBJ^-1:=epitaxial), PURP:=fabrication(OBJ:=composites(COMPONENT:=I.A,)))',

(45) `fabrication(OBJ:=Schottky-barrier-diodes(COMPONENT:=silicon), MEANS:=sputtering)'.

These are recorded in a file which consists of several categorical trees as shown in Fig. 1.

For some requests R1, R2 and R3, the numbers of the relevant documents are retrieved as follows:

INPUT REQUESTS

R1: `Transistor formed by addition',
    `transistor(PROCESS:=addition)',

R2: `Thin dielectric film',
    `film(SUBJ^-1:=thinness, COMPONENT:=dielectric)',

R3: `Fabrication of semiconductor parts',
    `fabrication(OBJ:=parts(COMPONENT:=semiconductor))'.

OUTPUT DOCUMENT'S NUMBERS

R1: (37) and (48),

R2: (47),

R3: (32) and (45).

The documents (37), (47), (48) and (45) were retrieved by using the implication-relation (11) based on superior-inferior relations of categories and qualifications.

The document (32) was retrieved by using both the implication-relations (11) and (12), based on modification relations between a part and a whole as well as superior-inferior relations.


6. Conclusion

It was found from a computer experiment that, though the system presented in this paper is a higher-level retrieval system using dependency-relations between key-terms, its retrieval speed and required memory are almost the same as those of the usual key-term matching system.

As for the improvement of the recall rate, the equivalence and implication relations introduced above are considered to play an essential role. More general and extensive studies along this direction are very important and are left for future work.

References

1) B. Bruce, Artificial Intelligence, 6, 327 (1975).

2) Y. Wilks, Comm. ACM, 18, 264 (1975).

3) A. V. Gershman, The 5th International Joint Conference on AI, 1, 132 (1977).

4) S. E. Fahlman, MIT, AI memo 331 (1975).

5) F. W. Lancaster, Information Retrieval System, New York: John Wiley & Sons (1968).

6) N. Abe, et al., Trans. IPS. Japan, 11, 699 (1970).

7) W. M. Turski, Inform. Stor. Retr., 7, 89 (1971).

8) F. Nishida, et al., Trans. IECE. Japan, E60, 290 (1977).