similarity relation defined for the domain opinion query: which sociologists are in considerable...


Posted on 23-Jan-2016



Similarity relation defined for the domain opinion

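        HF    F     SF    SN    N     HN
HF      1     .8    .6    .2    0     0
F       .8    1     .8    .6    .2    0
SF      .6    .8    1     .8    .6    .2
SN      .2    .6    .8    1     .8    .6
N       0     .2    .6    .8    1     .8
HN      0     0     .2    .6    .8    1

(HF = highly favorable, F = favorable, SF = slightly favorable, SN = slightly negative, N = negative, HN = highly negative)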

Query: Which sociologists are in considerable agreement with Kass concerning policy Y?

Fuzzy Relational Data Base (Buckles, Petry)
(1) Elements of the tuples contained in the relations may be subsets of the domain universal set.
(2) A similarity relation is defined on each domain universal set.
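The similarity relation for the domain opinion can be written down directly as a lookup table. The Python sketch below uses the values from the slide's matrix. It assumes the row and column labels stand for highly favorable, favorable, slightly favorable, slightly negative, negative and highly negative. The names `OPINIONS`, `S`, `similarity` and `similar_values` are illustrative and do not come from the slides. Only reflexivity and symmetry are checked here. A similarity relation in Zadeh's sense is also required to be max-min transitive, and that property is not tested.

```python
# Values of the domain opinion, in the row/column order of the slide's matrix.
OPINIONS = ["highly favorable", "favorable", "slightly favorable",
            "slightly negative", "negative", "highly negative"]

# S[i][j] = s(OPINIONS[i], OPINIONS[j]), as given on the slide.
S = [
    [1.0, 0.8, 0.6, 0.2, 0.0, 0.0],
    [0.8, 1.0, 0.8, 0.6, 0.2, 0.0],
    [0.6, 0.8, 1.0, 0.8, 0.6, 0.2],
    [0.2, 0.6, 0.8, 1.0, 0.8, 0.6],
    [0.0, 0.2, 0.6, 0.8, 1.0, 0.8],
    [0.0, 0.0, 0.2, 0.6, 0.8, 1.0],
]


def similarity(x: str, y: str) -> float:
    """Look up s(x, y) for two values of the domain opinion."""
    return S[OPINIONS.index(x)][OPINIONS.index(y)]


def is_reflexive_and_symmetric(matrix: list) -> bool:
    """Check s(x, x) = 1 and s(x, y) = s(y, x); transitivity is not checked."""
    n = len(matrix)
    reflexive = all(matrix[i][i] == 1.0 for i in range(n))
    symmetric = all(matrix[i][j] == matrix[j][i]
                    for i in range(n) for j in range(n))
    return reflexive and symmetric


def similar_values(x: str, threshold: float) -> list:
    """All values y of the domain with s(x, y) >= threshold."""
    return [y for y in OPINIONS if similarity(x, y) >= threshold]


print(similar_values("favorable", 0.75))
# ['highly favorable', 'favorable', 'slightly favorable']
```

At a threshold of 0.75, "favorable" is grouped with "highly favorable" and "slightly favorable". This is the grouping the query below relies on.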


Fuzzy Data Base

1. (Project (select Assessment where Name = Kass and Option = Y) over Opinion) giving R1

Relation R1:
Opinion
favorable

Retrieve the opinion of Kass concerning option Y

2. (Project (select Expert where Field = Sociologist) over Name) giving R2

Relation R2:
Name
Osborn
Schreiber
Cohen
Specterman

Select all sociologists from the table of experts
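Steps 1 and 2 use only ordinary (crisp) selection and projection. The Python sketch below shows them with relations stored as lists of dictionaries. The rows are illustrative and contain only values that appear on these slides. The real Assessment and Expert relations are not shown in full, and Kass's field is unknown, so Kass is left out of `EXPERT`. The helper names `select` and `project` are hypothetical.

```python
# Illustrative relations built only from values that appear on the slides;
# the real Assessment and Expert relations may contain more rows and columns.
ASSESSMENT = [
    {"Name": "Kass", "Option": "Y", "Opinion": "favorable"},
    {"Name": "Osborn", "Option": "Y", "Opinion": "slightly favorable"},
    {"Name": "Schreiber", "Option": "Y", "Opinion": "favorable"},
    {"Name": "Cohen", "Option": "Y", "Opinion": "slightly negative"},
    {"Name": "Specterman", "Option": "Y", "Opinion": "highly favorable"},
]
EXPERT = [
    {"Name": "Osborn", "Field": "Sociologist"},
    {"Name": "Schreiber", "Field": "Sociologist"},
    {"Name": "Cohen", "Field": "Sociologist"},
    {"Name": "Specterman", "Field": "Sociologist"},
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, *columns):
    """Relational projection: keep the given columns, dropping duplicate tuples."""
    seen, result = set(), []
    for row in rows:
        tup = tuple(row[c] for c in columns)
        if tup not in seen:
            seen.add(tup)
            result.append(tup)
    return result


# Step 1: the opinion of Kass concerning option Y.
R1 = project(select(ASSESSMENT,
                    lambda r: r["Name"] == "Kass" and r["Option"] == "Y"),
             "Opinion")
# Step 2: all sociologists in the table of experts.
R2 = project(select(EXPERT, lambda r: r["Field"] == "Sociologist"), "Name")

print(R1)  # [('favorable',)]
print(R2)  # [('Osborn',), ('Schreiber',), ('Cohen',), ('Specterman',)]
```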


3. (Project (select (Join R2 and Assessment over Name) where Option = Y) over Name, Opinion) giving R3

List the opinions of the sociologists

4. (Join R3 and R1 over Opinion) with THRES(Opinion) ≥ 0.75 and THRES(Name) ≥ 0

Relation R3:
Name          Opinion
Osborn        slightly favorable
Schreiber     favorable
Cohen         slightly negative
Specterman    highly favorable

Result of step 4:
Name                                Opinion
{Osborn, Schreiber, Specterman}     {slightly favorable, favorable, highly favorable}
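Step 4 can be sketched as a threshold join. The Python code below keeps an R3 tuple when its opinion is at least 0.75 similar to Kass's opinion in R1. Because THRES(Name) ≥ 0 lets any two names be merged, it then collapses the kept tuples into one tuple of sets. This is a simplified reading chosen to reproduce the result on the slide, not the general Buckles-Petry merging procedure. It also relies on the observation that, in the slide's matrix, s(x, y) depends only on how many steps apart x and y are on the opinion scale. The names `BY_DISTANCE`, `s_opinion` and `threshold_join` are hypothetical.

```python
# Values of the domain opinion, ordered as in the slide's similarity matrix.
OPINIONS = ["highly favorable", "favorable", "slightly favorable",
            "slightly negative", "negative", "highly negative"]
# In that matrix, s(x, y) depends only on how many steps apart x and y are.
BY_DISTANCE = [1.0, 0.8, 0.6, 0.2, 0.0, 0.0]


def s_opinion(x: str, y: str) -> float:
    """Similarity of two opinion values."""
    return BY_DISTANCE[abs(OPINIONS.index(x) - OPINIONS.index(y))]


R1 = [("favorable",)]  # result of step 1
R3 = [("Osborn", "slightly favorable"), ("Schreiber", "favorable"),
      ("Cohen", "slightly negative"), ("Specterman", "highly favorable")]


def threshold_join(r3, r1, thres_opinion):
    """Join R3 and R1 over Opinion, keeping R3 tuples whose Opinion is at
    least thres_opinion similar to some Opinion in R1. With THRES(Name) >= 0
    any two names may be merged, so the kept tuples collapse into one tuple
    of sets (names, opinions)."""
    kept = [(name, op) for name, op in r3
            if any(s_opinion(op, op1) >= thres_opinion for (op1,) in r1)]
    names = {name for name, _ in kept}
    opinions = {op for _, op in kept}
    return names, opinions


names, opinions = threshold_join(R3, R1, 0.75)
```

Cohen is dropped because s(slightly negative, favorable) = 0.6 < 0.75. The remaining names and opinions form the two sets in the result tuple above.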


Information retrieval based on fuzzy associations

1. Introduction

2. Three components in information retrieval

3. Fuzziness in a thesaurus: first component

4. Fuzziness in retrieval: second component

5. Fuzziness on output: third component

6. Classification of output

7. Conclusion


2. Three components in information retrieval

Let D = {d1, d2, …, dn} be a finite set of documents for retrieval, and let W = {w1, w2, …, wm} denote a set of descriptors. T: D → [0,1]^W; T(d) is the (fuzzy) subset of descriptors in W indexed to the document d. U = T⁻¹; U(w) is the set of documents that have the keyword w.
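The index T and its inverse U = T⁻¹ can be pictured as two nested dictionaries. The Python sketch below builds U from T so that U[w][d] = T[d][w], keeping only nonzero grades. The documents, keywords and grades are hypothetical.

```python
from collections import defaultdict

# Hypothetical fuzzy index T: T[d][w] is the grade with which descriptor w
# is indexed to document d (values in [0, 1]; an absent entry means 0).
T = {
    "d1": {"fuzzy": 1.0, "retrieval": 0.7},
    "d2": {"retrieval": 1.0, "thesaurus": 0.4},
    "d3": {"fuzzy": 0.5, "thesaurus": 0.9},
}


def invert(index: dict) -> dict:
    """Build U = T^-1: U[w][d] = T[d][w] for every nonzero grade."""
    inverse = defaultdict(dict)
    for d, descriptors in index.items():
        for w, grade in descriptors.items():
            if grade > 0:
                inverse[w][d] = grade
    return dict(inverse)


U = invert(T)
print(U["fuzzy"])  # {'d1': 1.0, 'd3': 0.5}
```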

[Figure: Information retrieval based on fuzzy associations; diagram labels q, F, U, P, r′, r]


3. Fuzziness in a thesaurus: first component

Three types of thesaurus relations (each represented as a binary relation):
RT: related terms
NT: narrower terms
BT: broader terms
B(v, w) = N(w, v), R(v, w) = R(w, v)

Methods of automatic generation of thesauri:
1. Typical: counting the frequencies of simultaneous occurrences of pairs of keywords in a set of documents.
2. Fuzzy set model: let C = {c1, c2, …, cp} be a finite set of concepts, where each ci, i = 1, …, p, represents a unit of concept. H: W → [0,1]^p is a fuzzy-set-valued function that maps each keyword to its corresponding concepts as a fuzzy set in C; h(w), w ∈ W, is the concept of the word w.

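$$N(v,w) = \frac{|h(v) \cap h(w)|}{|h(v)|}$$

$$R(v,w) = \frac{|h(v) \cap h(w)|}{|h(v) \cup h(w)|}$$

where |·| is the cardinality of a fuzzy set (the sum of its membership values), and ∩ and ∪ are taken with min and max.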
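The two formulas for N and R can be evaluated directly once each word's concept h(w) is stored as a fuzzy set. The Python sketch below uses the sigma-count (sum of memberships) as cardinality and min/max as intersection/union. The words "poodle" and "dog" and the concept memberships are hypothetical. It also assumes every h(v) is nonempty, so the denominators are nonzero.

```python
def card(fs: dict) -> float:
    """Cardinality (sigma-count) of a fuzzy set given as {element: membership}."""
    return sum(fs.values())


def intersection(a: dict, b: dict) -> dict:
    """Fuzzy intersection using min; elements absent from either set have grade 0."""
    return {c: min(a[c], b[c]) for c in a.keys() & b.keys()}


def union(a: dict, b: dict) -> dict:
    """Fuzzy union using max."""
    return {c: max(a.get(c, 0.0), b.get(c, 0.0)) for c in a.keys() | b.keys()}


def N(h: dict, v: str, w: str) -> float:
    """Grade to which v is a narrower term than w: |h(v) ∩ h(w)| / |h(v)|."""
    return card(intersection(h[v], h[w])) / card(h[v])


def R(h: dict, v: str, w: str) -> float:
    """Grade to which v and w are related: |h(v) ∩ h(w)| / |h(v) ∪ h(w)|."""
    return card(intersection(h[v], h[w])) / card(union(h[v], h[w]))


# Hypothetical concept memberships h(w) over concepts c1..c3.
h = {
    "poodle": {"c1": 1.0, "c2": 0.5},
    "dog": {"c1": 1.0, "c2": 1.0, "c3": 0.5},
}
print(N(h, "poodle", "dog"))  # 1.0  (h(poodle) is contained in h(dog))
print(N(h, "dog", "poodle"))  # 0.6  (= B(poodle, dog))
print(R(h, "poodle", "dog"))  # 0.6
```

Because h(poodle) ⊆ h(dog), poodle is fully narrower than dog. The broader-term grade follows from B(v, w) = N(w, v).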


Even with present computers, it is difficult to calculate the values of the fuzzy relation above with arrays in a straightforward way, since the numbers of elements in W and D are very large (10³ × 10⁵). Although techniques for handling sparse matrices may be applied, there is another method for generating R and N based on the manipulation of sequential files. The principal tool for this is sorting.

(a, b, c) denotes a record whose fields are a, b, and c; {(a, b, c)} denotes a set of such records.

Input: a set D of documents. Each document d ∈ D has a number of keywords in W. A keyword may occur twice or more in a document. The frequency of occurrence of wi in dk is denoted by hik.

Output: a set of records {(wi, wj, R(wi, wj))} for all pairs with R(wi, wj) ≠ 0.


Algorithm GFT (generation of a fuzzy thesaurus)
  // Find pairs of keywords in every document. //
  for all dk ∈ D do
    find all keywords wi ∈ W in dk and calculate hik
    for all (wi, wj), wi < wj, that are found in dk do
      make record (wi, wj, min(hik, hjk))
      output (wi, wj, min(hik, hjk)) to WORK1
    repeat
    for all wi that are found in dk do
      make record (wi, hik)
      output (wi, hik) to WORK2
    repeat
  repeat
  // Sort WORK1 and WORK2. //
  sort WORK1 into increasing order of the key (wi, wj)
  sort WORK2 into increasing order of the key wi
  // Calculate R. Scan WORK1 and WORK2. //
  for all (wi, wj) in WORK1 do
    find all records for (wi, wj) in WORK1 and all records for wi and wj in WORK2
    R(wi, wj) ← Σk min(hik, hjk) / (Σk hik + Σk hjk − Σk min(hik, hjk))
    output (wi, wj, R(wi, wj)) to an output file
  repeat
end of algorithm GFT

In a foregoing paper, an experimental calculation on three thousand documents and thirty thousand keywords was carried out using GFT, which is based on sorting; it required a reasonable 800 s of CPU time.
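The sort-based structure of GFT carries over directly to Python. The sketch below uses in-memory lists in place of the sequential files WORK1 and WORK2 and `list.sort` in place of external sorting. As a simplification, the totals Σk hik are gathered into a dictionary from one scan of the sorted WORK2, not by a merged scan alongside WORK1. The function name `gft` and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import groupby


def gft(documents):
    """documents: list of keyword lists (a keyword may repeat in a document).
    Returns records (wi, wj, R(wi, wj)), wi < wj, for all pairs with R != 0."""
    work1, work2 = [], []
    for doc in documents:
        h = Counter(doc)  # h[w] = frequency of keyword w in this document
        words = sorted(h)
        for a in range(len(words)):
            for b in range(a + 1, len(words)):
                wi, wj = words[a], words[b]
                work1.append((wi, wj, min(h[wi], h[wj])))
        for wi in words:
            work2.append((wi, h[wi]))
    work1.sort()  # increasing order of the key (wi, wj)
    work2.sort()  # increasing order of the key wi
    # Σk hik for every keyword, from one scan of the sorted WORK2.
    total = {w: sum(f for _, f in grp)
             for w, grp in groupby(work2, key=lambda rec: rec[0])}
    out = []
    for (wi, wj), grp in groupby(work1, key=lambda rec: (rec[0], rec[1])):
        common = sum(m for _, _, m in grp)  # Σk min(hik, hjk)
        out.append((wi, wj, common / (total[wi] + total[wj] - common)))
    return out


docs = [["fuzzy", "set", "fuzzy"], ["fuzzy", "retrieval"], ["set", "retrieval", "set"]]
print(gft(docs))
# [('fuzzy', 'retrieval', 0.25), ('fuzzy', 'set', 0.2), ('retrieval', 'set', 0.25)]
```

Only pairs that co-occur in some document ever reach WORK1, so the output contains exactly the nonzero entries of R. This is why sorting sequential files scales better than a dense W × W array.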


  // A record (di, pi) comes before another record (dj, pj) iff either di < dj,
  // or di = dj and pi > pj. //
  take the first record (d1, p1) in WORK: (D, P) ← (d1, p1)
  for all (dj, pj) in WORK do      // the dj's are examined sequentially //
    if D ≠ dj then
      output (D, P) to an output file OUT
      (D, P) ← (dj, pj)
    endif
  repeat
  output (D, P) to OUT
  // OUT contains exactly those records that represent P = Uf(d, w) defined above. //


  // Third step: if necessary, sort again. //
  sort OUT into decreasing order of the key p and print OUT
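The find-records, sort-and-scan, and final-sort steps together compute Uf(d, w) = max over v of min[U(d, v), F(v, w)] for a fixed query keyword w. The Python sketch below follows those steps with an in-memory list as WORK. `F_col` is a hypothetical name for the column F(·, w) of the fuzzy thesaurus, and `U` is a hypothetical inverted index {v: {d: U(d, v)}}.

```python
def fuzzy_retrieval(F_col: dict, U: dict) -> list:
    """Uf(d, w) = max over v of min[U(d, v), F(v, w)], by sort and scan.

    F_col: {v: F(v, w)} for the fixed query keyword w.
    U:     {v: {d: U(d, v)}}, the (possibly fuzzy) inverted index.
    Returns [(d, Uf(d, w))] in decreasing order of the grade.
    """
    # First step: one record (d, p(d, v)) per document reachable through v.
    work = [(d, min(u, f))
            for v, f in F_col.items() if f > 0
            for d, u in U.get(v, {}).items()]
    # Second step: sort by increasing d and decreasing p; the first record
    # of each run of equal d carries the maximum p for that document.
    work.sort(key=lambda rec: (rec[0], -rec[1]))
    out = []
    for d, p in work:
        if not out or out[-1][0] != d:
            out.append((d, p))
    # Third step: sort OUT into decreasing order of p (stable for ties).
    out.sort(key=lambda rec: -rec[1])
    return out


F_col = {"fuzzy": 1.0, "vague": 0.6}  # hypothetical thesaurus entries for w
U = {"fuzzy": {"d1": 1.0, "d3": 0.5}, "vague": {"d2": 1.0, "d3": 1.0}}
print(fuzzy_retrieval(F_col, U))  # [('d1', 1.0), ('d2', 0.6), ('d3', 0.6)]
```

Document d3 is reached twice, with grades 0.5 and 0.6. The sort places the larger one first, so the scan keeps the maximum, as the max-min composition requires.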


4. Fuzziness in retrieval: second component

For the crisp case, retrieval through a thesaurus given a keyword w proceeds as follows.

(a) Examine the thesaurus F and find all associated terms v11, v12, …, v1p.
(b) Find the subsets U(v11), U(v12), …, U(v1p).
(c) Establish the retrieved set of documents as the union of U(v11), U(v12), …, U(v1p): ∪_{1≤i≤p} U(v1i).

Uf(d, w) = 1 iff d ∈ U(v1) for some v1 such that F(v1, w) = 1; Uf(d, w) = 0 otherwise.
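In the crisp case, steps (a) to (c) amount to a union of posting sets. The Python sketch below assumes the crisp thesaurus F is stored as a set of pairs (v, w) with F(v, w) = 1. The keyword itself is searched only if F contains (w, w). The function name and the sample data are hypothetical.

```python
def crisp_retrieval(w: str, F: set, U: dict) -> set:
    """Crisp thesaurus retrieval for keyword w.

    F: set of pairs (v, w) with F(v, w) = 1 (crisp thesaurus).
    U: {v: set of documents indexed by v}.
    Returns the union of U(v) over all terms v associated with w.
    """
    associated = {v for (v, target) in F if target == w}  # step (a)
    retrieved = set()
    for v in associated:                                   # steps (b) and (c)
        retrieved |= U.get(v, set())
    return retrieved


F = {("car", "car"), ("automobile", "car"), ("truck", "vehicle")}
U = {"car": {"d1"}, "automobile": {"d2", "d3"}, "truck": {"d4"}}
print(sorted(crisp_retrieval("car", F, U)))  # ['d1', 'd2', 'd3']
```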


When the thesaurus F is fuzzy and U is crisp:

Uf(d, w) = max_{v∈W} min[U(d, v), F(v, w)].

This equation is also valid for a fuzzy relation U(d, v).

Algorithm FR (Fuzzy Retrieval)
  // First step: find all records. //
  for all v such that F(v, w) ≠ 0 in FT do
    for all d ∈ U(v) do
      p(d, v) ← min[U(d, v), F(v, w)]
      output record (d, p(d, v)) to a work file WORK
    repeat
  repeat
  // Second step: find the values of Uf. //
  sort WORK into increasing order of the first key d and into decreasing order of the second key p
  // The above sorting means that, in the resulting sequence, a record (di, pi) comes before
  // another record (dj, pj) iff either di < dj, or di = dj and pi > pj. //


End of FR

5. Fuzziness on output: third component

Fuzzy filter. Examples:
(a) Find recent documents that have the keyword w.
(b) Find documents that have the keyword w and are relevant to one's field of interest.

r = r′ ∩ g
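A fuzzy filter can be sketched as a fuzzy intersection r = r′ ∩ g. The Python code below reads r′ as the fuzzy set of retrieved documents and g as the filter, such as "recent", with membership grades per document. It uses the standard min intersection. That reading of r′ and g, the min operator, and the sample grades are assumptions, not details given on the slides.

```python
def fuzzy_filter(r_prime: dict, g: dict) -> dict:
    """r = r' ∩ g using min; documents missing from g have grade 0 and are dropped.

    r_prime: {document: grade of retrieval}; g: {document: grade of the filter}.
    """
    r = {d: min(m, g.get(d, 0.0)) for d, m in r_prime.items()}
    return {d: m for d, m in r.items() if m > 0}


r_prime = {"d1": 1.0, "d2": 0.6, "d3": 0.6}  # e.g. output of fuzzy retrieval
recent = {"d1": 0.3, "d2": 1.0}              # hypothetical "recent" grades
print(fuzzy_filter(r_prime, recent))  # {'d1': 0.3, 'd2': 0.6}
```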

6. Classification of output
1) In decreasing order of membership
2) Divided into layers
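The two ways of classifying output can be sketched in Python as follows. The first is a ranking by decreasing membership. The second assigns documents to layers by hypothetical cut levels given in decreasing order. The cut values, the dropping of documents below the lowest cut, and the function names are illustrative assumptions.

```python
def rank(r: dict) -> list:
    """1) Documents in decreasing order of membership (ties keep input order)."""
    return sorted(r.items(), key=lambda item: -item[1])


def layers(r: dict, cuts: list) -> list:
    """2) Divide into layers. cuts must be decreasing: layer 0 holds documents
    with membership >= cuts[0], layer i those with cuts[i] <= membership < cuts[i-1].
    Documents below the last cut are left out."""
    result = [[] for _ in cuts]
    for d, m in rank(r):
        for i, c in enumerate(cuts):
            if m >= c:
                result[i].append(d)
                break
    return result


r = {"d1": 1.0, "d2": 0.6, "d3": 0.3, "d4": 0.05}
print(rank(r))                     # [('d1', 1.0), ('d2', 0.6), ('d3', 0.3), ('d4', 0.05)]
print(layers(r, [0.8, 0.5, 0.2]))  # [['d1'], ['d2'], ['d3']]
```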

7. Conclusion

Problems for further study:
1) Discussion of crisp techniques of advanced indexing and retrieval using a fuzzy set model;


2) Studies of efficient algorithms for large-scale databases; in particular, the development of hardware for information retrieval should be taken into account;

3) Application of methods in fuzzy information retrieval to related areas.
