Felix Díaz-Hemida, David E. Losada, Alberto Bugarín, and Senén Barro
Post on 30-Dec-2015
A Probabilistic Quantifier Fuzzification Mechanism: The Model and Its Evaluation for Information Retrieval
Felix Díaz-Hemida, David E. Losada, Alberto Bugarín, and Senén Barro
Presented by Chia-Hao Lee
2
Outline
• Introduction
• Fuzzy Quantifiers
– Probabilistic Quantifier Fuzzification Mechanisms
• New View on Crisp Representatives
– $F^A$ Quantifier Fuzzification Mechanism
– Properties of the Model
• Applying the $F^A$ Quantifier Fuzzification Mechanism for Information Retrieval
– Fuzzy Quantifiers and Information Retrieval
– Example
• Information Retrieval Experiments
• Conclusion
3
Introduction
• The ability of fuzzy quantifiers to model linguistic statements in a natural way has proved useful in diverse areas such as expert systems, data mining, control systems, database systems, etc.
• In the information retrieval (IR) field, fuzzy quantification has been proposed for handling expressive queries, giving rise to flexible query languages.
4
Introduction
• Fuzzy quantification is a linguistic granulation technique capable of expressing the global characteristics of a collection of individuals, or a relation between individuals, through meaningful linguistic summaries.
• Granular computing attempts to manage complex, large-scale problems by organizing these into different levels of detail.
• It is understood that each sub-problem should be solved at its appropriate level of granularity, and there are effective transformations which mediate between these levels.
5
Introduction
• The need for such a transformation process arises not only in the technical problem areas tackled by computers.
• It is hence not surprising that natural language (NL) provides a class of expressions specifically designed to express accumulative properties and to summarize information: natural language quantifiers.
• NL quantifiers, and in particular their approximate variety (“almost all”, “a few”, etc.), provide flexible means for expressing accumulative properties of collections and can also describe global aspects of relationships between individuals.
6
Introduction
• Fuzzy set theory attempts to model NL quantifiers by operators called fuzzy quantifiers.
– Interpretation: the development of methods for evaluating quantifying expressions which capture the meaning of natural language quantifiers.
– Summarization: the development of processes for constructing quantifying statements which succinctly describe a collection of observations and/or relationships between a large number of observations (find domain concepts X and Y and a quantifier Q such that “Q X’s are Y’s” is true).
– Reasoning: the development of methods which deduce further knowledge from a set of rules and/or facts involving fuzzy quantifiers.
7
Fuzzy Quantifiers
• Two-valued quantifier: crisp input, crisp output
• Fuzzy quantifier: fuzzy input, fuzzy output
• Semi-fuzzy quantifier: crisp input, fuzzy output
8
Fuzzy Quantifiers
• Definition 1 (Classic Quantifier or Two-valued Quantifier): An n-ary generalized quantifier on a base set $E$ is a mapping $Q : \mathcal{P}(E)^n \to 2 = \{0, 1\}$. A two-valued quantifier hence assigns to each n-tuple of crisp subsets $X_1, \ldots, X_n \in \mathcal{P}(E)$ a two-valued quantification result $Q(X_1, \ldots, X_n) \in 2$.
$\mathcal{P}(E)$: the powerset of $E$
$\widetilde{\mathcal{P}}(E)$: the fuzzy powerset of $E$
9
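Fuzzy Quantifiers
• A typical example of a classic quantifier is the following definition of an all statement, which can be used for sentences such as “all $X_1$s are $X_2$s”:
$$\mathrm{all}(X_1, X_2) = \begin{cases} 1, & \text{if } X_1 \subseteq X_2 \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
• Well-known examples:
$$\begin{aligned}
\forall_E(X) = 1 &\Leftrightarrow X = E \\
\exists_E(X) = 1 &\Leftrightarrow X \neq \emptyset \\
\mathrm{all}_E(X_1, X_2) = 1 &\Leftrightarrow X_1 \subseteq X_2 \\
\mathrm{some}_E(X_1, X_2) = 1 &\Leftrightarrow X_1 \cap X_2 \neq \emptyset \\
\mathrm{at\ least}\ k_E(X_1, X_2) = 1 &\Leftrightarrow |X_1 \cap X_2| \ge k \\
[\mathrm{rate} \ge r](X_1, X_2) = 1 &\Leftrightarrow |X_1 \cap X_2| \ge r\,|X_1| \\
[\mathrm{rate} > r](X_1, X_2) = 1 &\Leftrightarrow |X_1 \cap X_2| > r\,|X_1|
\end{aligned}$$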
10
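Fuzzy Quantifiers
• For example:
Let us consider the evaluation of the sentence “80% or more of students are Spanish” in the referential set $E = \{e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8\}$, where the properties “students” and “Spanish” are, respectively, defined as
X1(students)={1,0,1,0,1,0,1,1} (true : 1 , false : 0)
X2(Spanish)={1,0,1,0,1,0,0,0}
and “80% or more” is defined, as in (1), as a two-valued quantifier whose value is the truth value of a comparison:
$$\mathrm{about\_80\%\_or\_more\_of\_the}(X_1, X_2) = \begin{cases} \dfrac{|X_1 \cap X_2|}{|X_1|} \ge 0.80, & X_1 \neq \emptyset \\ 1, & X_1 = \emptyset \end{cases}$$
Then
$$\mathrm{about\_80\%\_or\_more\_of\_the}(X_1, X_2) = \left( \frac{|\{1,0,1,0,1,0,0,0\}|}{|\{1,0,1,0,1,0,1,1\}|} \ge 0.80 \right) = 0$$
where the numerator is $X_1 \cap X_2$, obtained with the logical “and” of the two vectors (here 3 of the 5 students, i.e. 0.6).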
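The two-valued evaluation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rate_at_least` and the list encoding of the crisp sets are hypothetical choices, not part of the original slides.

```python
def rate_at_least(x1, x2, r):
    """Two-valued quantifier: 1 if at least a fraction r of X1 is also in X2.

    x1, x2 are 0/1 membership vectors over the same referential set.
    """
    size_x1 = sum(x1)
    if size_x1 == 0:
        return 1  # convention used on the slide: the quantifier holds for empty X1
    size_both = sum(a * b for a, b in zip(x1, x2))  # |X1 ∩ X2| via logical "and"
    return 1 if size_both / size_x1 >= r else 0


students = [1, 0, 1, 0, 1, 0, 1, 1]
spanish = [1, 0, 1, 0, 1, 0, 0, 0]
result = rate_at_least(students, spanish, 0.80)
print(result)  # 3 of 5 students are Spanish (0.6 < 0.80), so the result is 0
```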
11
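Fuzzy Quantifiers
• Definition 2 (Fuzzy Quantifier):
An n-ary fuzzy quantifier $\widetilde{Q}$ on a base set $E$ is a mapping $\widetilde{Q} : \widetilde{\mathcal{P}}(E)^n \to [0, 1]$ which to each n-tuple of fuzzy subsets $X_1, \ldots, X_n$ of $E$ assigns a gradual result $\widetilde{Q}(X_1, \ldots, X_n) \in [0, 1]$.
An example of a fuzzy quantifier is $\widetilde{\mathrm{all}} : \widetilde{\mathcal{P}}(E)^2 \to [0, 1]$, which can be defined as a fuzzy extension of (1) using a typical definition for the fuzzy inclusion operator:
$$\widetilde{\mathrm{all}}(X_1, X_2) = \inf\{\max(1 - \mu_{X_1}(e),\ \mu_{X_2}(e)) : e \in E\} \qquad (2)$$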
12
Fuzzy Quantifiers
• For example:
Let us consider the evaluation of the sentence “all tall people are blond” in the referential set $E = \{e_1, e_2, e_3, e_4\}$. Let us assume that the properties “tall” and “blond” are, respectively, defined as
$$X_1(\text{tall}) = \{0.8/e_1,\ 1/e_2,\ 0.6/e_3,\ 0.3/e_4\}, \qquad X_2(\text{blond}) = \{0.9/e_1,\ 0.7/e_2,\ 0.3/e_3,\ 0.2/e_4\}$$
Using expression (2), then:
$$\widetilde{\mathrm{all}}(X_1, X_2) = \inf\{\max(1 - \mu_{X_1}(e),\ \mu_{X_2}(e)) : e \in E\} = \inf\{0.9,\ 0.7,\ 0.4,\ 0.7\} = 0.4$$
• In many cases, it is not easy to achieve consensus on an intuitive and generally applicable expression for implementing a given quantified expression.
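The “all tall people are blond” evaluation can be reproduced with a minimal Python sketch. The dictionary encoding of fuzzy sets and the name `fuzzy_all` are assumptions made for illustration only.

```python
def fuzzy_all(x1, x2):
    """Fuzzy 'all' from expression (2): inf over e of max(1 - mu_X1(e), mu_X2(e)).

    x1 and x2 map each element of a finite referential set to a membership degree.
    On a finite set the infimum is simply the minimum.
    """
    return min(max(1.0 - x1[e], x2[e]) for e in x1)


tall = {"e1": 0.8, "e2": 1.0, "e3": 0.6, "e4": 0.3}
blond = {"e1": 0.9, "e2": 0.7, "e3": 0.3, "e4": 0.2}
result = fuzzy_all(tall, blond)
print(round(result, 2))  # 0.4, the value obtained on the slide
```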
13
Fuzzy Quantifiers
• Definition 3 (Semi-fuzzy Quantifier):
An n-ary semi-fuzzy quantifier on a base set $E$ is a mapping $Q : \mathcal{P}(E)^n \to [0, 1]$ which to each n-tuple of crisp subsets $X_1, \ldots, X_n$ of $E$ assigns a gradual result $Q(X_1, \ldots, X_n) \in [0, 1]$.
14
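Fuzzy Quantifiers
• Examples of semi-fuzzy quantifiers are:
$$\mathrm{about\_5}(X_1, X_2) = T_{2,4,6,8}(|X_1 \cap X_2|)$$
$$\mathrm{about\_80\%\_or\_more\_of\_the}(X_1, X_2) = \begin{cases} S_{0.5,0.8}\!\left(\dfrac{|X_1 \cap X_2|}{|X_1|}\right), & X_1 \neq \emptyset \\ 1, & X_1 = \emptyset \end{cases}$$
where
$$T_{a,b,c,d}(x) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a < x \le b \\ 1, & b < x \le c \\ 1 - \dfrac{x - c}{d - c}, & c < x \le d \\ 0, & d < x \end{cases}$$
$$S_{\alpha,\gamma}(x) = \begin{cases} 0, & x \le \alpha \\ 2\left(\dfrac{x - \alpha}{\gamma - \alpha}\right)^2, & \alpha < x \le \dfrac{\alpha + \gamma}{2} \\ 1 - 2\left(\dfrac{x - \gamma}{\gamma - \alpha}\right)^2, & \dfrac{\alpha + \gamma}{2} < x \le \gamma \\ 1, & \gamma < x \end{cases}$$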
15
Fuzzy Quantifiers
• For example:
Let us consider the evaluation of the sentence “about 80% or more of the students are Spanish”. Let us assume that the properties “students” and “Spanish” are, respectively, defined as
X1(students)={1,0,1,0,1,0,1,1},
X2(Spanish)={1,0,1,0,1,0,0,0}
then
$$\mathrm{about\_80\%\_or\_more\_of\_the}(X_1, X_2) = S_{0.5,0.8}\!\left(\frac{|\{1,0,1,0,1,0,0,0\}|}{|\{1,0,1,0,1,0,1,1\}|}\right) = S_{0.5,0.8}\!\left(\tfrac{3}{5}\right) = 0.22$$
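The semi-fuzzy evaluation above can be checked with a small Python sketch of the trapezoidal function T and the S-function. The boundary conventions (which side of each breakpoint is inclusive) and all function names are assumptions; the membership vectors are the ones from the example.

```python
def t_func(x, a, b, c, d):
    """Trapezoidal function T_{a,b,c,d}."""
    if x <= a or x > d:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return 1.0 - (x - c) / (d - c)


def s_func(x, alpha, gamma):
    """S-function S_{alpha,gamma}: smooth rise from 0 (at alpha) to 1 (at gamma)."""
    if x <= alpha:
        return 0.0
    if x <= (alpha + gamma) / 2:
        return 2 * ((x - alpha) / (gamma - alpha)) ** 2
    if x <= gamma:
        return 1 - 2 * ((x - gamma) / (gamma - alpha)) ** 2
    return 1.0


def about_80_percent_or_more(x1, x2):
    """Semi-fuzzy quantifier: crisp 0/1 vectors in, degree in [0, 1] out."""
    size_x1 = sum(x1)
    if size_x1 == 0:
        return 1.0
    size_both = sum(a * b for a, b in zip(x1, x2))
    return s_func(size_both / size_x1, 0.5, 0.8)


students = [1, 0, 1, 0, 1, 0, 1, 1]
spanish = [1, 0, 1, 0, 1, 0, 0, 0]
score = about_80_percent_or_more(students, spanish)
print(round(score, 2))  # 0.22, i.e. S_{0.5,0.8}(3/5) = 2 * (0.1 / 0.3) ** 2
```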
16
Fuzzy Quantifiers
• Semi-fuzzy quantifiers are half-way between two-valued quantifiers and fuzzy quantifiers because they have crisp input and fuzzy output. In particular, every two-valued quantifier of TGQ (theory of generalized quantifiers) is a semi-fuzzy quantifier by definition.
• Being half-way between two-valued generalized quantifiers and fuzzy quantifiers, semi-fuzzy quantifiers do not accept fuzzy input, and we have to make use of a fuzzification mechanism which transports semi-fuzzy quantifiers to fuzzy quantifiers.
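$$\mathcal{F} : \big(Q : \mathcal{P}(E)^n \to [0, 1]\big) \mapsto \big(\widetilde{Q} : \widetilde{\mathcal{P}}(E)^n \to [0, 1]\big)$$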
17
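Fuzzy Quantifiers
• Probabilistic Quantifier Fuzzification Mechanisms:
If the universe of discourse $E$ is finite and the quantifier is unary, then both expressions collapse into the same discrete expression
$$F(Q)(X) = \sum_{i=0}^{m} (\alpha_i - \alpha_{i+1})\, Q\big((X)_{\ge \alpha_i}\big), \qquad (X)_{\ge \alpha_i} = \{e \in E : \mu_X(e) \ge \alpha_i\}$$
where $\alpha_0 = 1 \ge \alpha_1 \ge \cdots \ge \alpha_m \ge \alpha_{m+1} = 0$ and $\alpha_1, \ldots, \alpha_m$ are the membership degrees of $X$ in decreasing order.
• The value $(\alpha_i - \alpha_{i+1})$ can be interpreted as the probability that $(X)_{\ge \alpha_i}$ is selected as the crisp representative for the fuzzy set $X$.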
18
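Fuzzy Quantifiers
• Let $E = \{e_1, e_2, e_3, e_4, e_5\}$ be a set of individuals for which the set $X \in \widetilde{\mathcal{P}}(E)$,
$$X = \{0.8/e_1,\ 1/e_2,\ 0.5/e_3,\ 0.5/e_4,\ 0.2/e_5\}$$
represents the fulfillment of the property “being tall”. It is reasonable for $X$ to arise on the basis of a consonant vote. The intuitive ordering of the elements of the referential on the basis of their height is $e_2 \ge e_1 \ge e_3 \ge e_4 \ge e_5$. The focal elements and their associated probability masses are:
$$\begin{aligned}
\Gamma_1 &= \{e_2\}, & m(\Gamma_1) &= \mu_X(e_2) - \mu_X(e_1) = 0.2 \\
\Gamma_2 &= \{e_1, e_2\}, & m(\Gamma_2) &= \mu_X(e_1) - \mu_X(e_3) = 0.3 \\
\Gamma_3 &= \{e_1, e_2, e_3, e_4\}, & m(\Gamma_3) &= \mu_X(e_3) - \mu_X(e_5) = 0.3 \\
\Gamma_4 &= \{e_1, e_2, e_3, e_4, e_5\}, & m(\Gamma_4) &= \mu_X(e_5) = 0.2
\end{aligned}$$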
20
Fuzzy Quantifiers
• For example:
Let us consider the evaluation of the quantified sentence “almost all students are tall.”
Suppose that we model the property “tall” for a referential set $E = \{e_1, e_2, e_3\}$ of students through the fuzzy set
$$\mathrm{tall} = \{0.8/e_1,\ 0.9/e_2,\ 1/e_3\}$$
and we support the quantified expression “almost all” by means of the following semi-fuzzy quantifier:
$$Q(X) = q_1(|X|), \quad \text{where } q_1(n) = \left(\frac{n}{3}\right)^2$$
21
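Fuzzy Quantifiers
Given the fuzzy set tall, the $\alpha_i$ values are
$$\alpha_0 = 1,\quad \alpha_1 = 1,\quad \alpha_2 = 0.9,\quad \alpha_3 = 0.8,\quad \alpha_4 = 0$$
and the fuzzification process runs as follows, where $(\mathrm{tall})_{\ge \alpha_i}$ is the crisp set of elements whose membership in tall is at least $\alpha_i$:
$$\begin{aligned}
F(Q)(\mathrm{tall}) &= \sum_{i=0}^{3} (\alpha_i - \alpha_{i+1})\, Q\big((\mathrm{tall})_{\ge \alpha_i}\big) \\
&= (\alpha_0 - \alpha_1)\, Q\big((\mathrm{tall})_{\ge \alpha_0}\big) + (\alpha_1 - \alpha_2)\, Q\big((\mathrm{tall})_{\ge \alpha_1}\big) + (\alpha_2 - \alpha_3)\, Q\big((\mathrm{tall})_{\ge \alpha_2}\big) + (\alpha_3 - \alpha_4)\, Q\big((\mathrm{tall})_{\ge \alpha_3}\big) \\
&= (1 - 1)\, Q(\{e_3\}) + (1 - 0.9)\, Q(\{e_3\}) + (0.9 - 0.8)\, Q(\{e_2, e_3\}) + (0.8 - 0)\, Q(\{e_1, e_2, e_3\}) \\
&= 0.1 \left(\tfrac{1}{3}\right)^2 + 0.1 \left(\tfrac{2}{3}\right)^2 + 0.8 \left(\tfrac{3}{3}\right)^2 = 0.855
\end{aligned}$$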
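The α-cut computation above can be sketched in Python as follows. This is a minimal illustration for unary quantifiers on a finite referential set; the function name `alpha_cut_fuzzification` and the dictionary encoding are hypothetical, and the α sequence is assumed to be 1, then the membership degrees in decreasing order, then 0, as in the worked example.

```python
def alpha_cut_fuzzification(q, fuzzy_set):
    """Sum over i of (alpha_i - alpha_{i+1}) * Q(alpha_i-cut of the fuzzy set)."""
    # alpha_0 = 1, then the membership degrees in decreasing order, then alpha_{m+1} = 0
    alphas = [1.0] + sorted(fuzzy_set.values(), reverse=True) + [0.0]
    total = 0.0
    for i in range(len(alphas) - 1):
        weight = alphas[i] - alphas[i + 1]
        if weight == 0:
            continue  # zero-probability representative, contributes nothing
        cut = {e for e, mu in fuzzy_set.items() if mu >= alphas[i]}
        total += weight * q(cut)
    return total


def almost_all(y):
    """Semi-fuzzy quantifier Q(Y) = q1(|Y|) with q1(n) = (n / 3) ** 2."""
    return (len(y) / 3) ** 2


tall = {"e1": 0.8, "e2": 0.9, "e3": 1.0}
result = alpha_cut_fuzzification(almost_all, tall)
print(round(result, 4))  # 0.8556; the slide reports the truncated value 0.855
```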
22
New View on Crisp Representatives
• Given a fuzzy set $X \in \widetilde{\mathcal{P}}(E)$, the process that selects a number of elements in $E$ to be included in a crisp representative of $X$ can be viewed as a random process in which $n$ mutually independent binary decisions are made, with $n = |E|$.
• Every individual decision involving an element $e \in E$ may be viewed as a Bernoulli trial whose probability of success equals $\mu_X(e)$.
A random variable $X$ has a Bernoulli distribution with parameter $p$ ($0 < p < 1$) if $X$ takes only the values 0 and 1. The p.f. $f(\cdot \mid p)$ of $X$ can be written in the form
$$f(x \mid p) = \begin{cases} p^x q^{1-x}, & \text{for } x = 0, 1 \\ 0, & \text{otherwise} \end{cases} \qquad q = 1 - p$$
23
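New View on Crisp Representatives
• Definition 4 ($P(\mathrm{Representative}_X = Y)$): We define the probability that a crisp set $Y \in \mathcal{P}(E)$ is a crisp representative of $X$ as
$$P(\mathrm{Representative}_X = Y) = \prod_{e \in Y} \mu_X(e) \prod_{e \notin Y} \big(1 - \mu_X(e)\big)$$
For simplicity, $m_X(Y) = P(\mathrm{Representative}_X = Y)$.
• Definition 5 ($F^A$): Let $Q : \mathcal{P}(E)^s \to [0, 1]$ be a semi-fuzzy quantifier. The fuzzification process $F^A$ is defined as
$$F^A(Q)(X_1, \ldots, X_s) = \sum_{Y_1 \in \mathcal{P}(E)} \cdots \sum_{Y_s \in \mathcal{P}(E)} m_{X_1}(Y_1) \cdots m_{X_s}(Y_s)\, Q(Y_1, \ldots, Y_s), \qquad X_1, \ldots, X_s \in \widetilde{\mathcal{P}}(E)$$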
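Definitions 4 and 5 can be made concrete with a brute-force Python sketch for the unary case, which enumerates every crisp subset and is therefore exponential in the size of the referential set. The function names and the dictionary encoding are assumptions for illustration; the fuzzy set and quantifier are the ones from the “almost all students are tall” example.

```python
from itertools import combinations


def representative_probability(fuzzy_set, y):
    """m_X(Y): product of mu_X(e) for e in Y and of (1 - mu_X(e)) for e not in Y."""
    p = 1.0
    for e, mu in fuzzy_set.items():
        p *= mu if e in y else 1.0 - mu
    return p


def fa_brute_force(q, fuzzy_set):
    """Unary F^A(Q)(X): sum over all crisp subsets Y of m_X(Y) * Q(Y)."""
    elements = list(fuzzy_set)
    total = 0.0
    for size in range(len(elements) + 1):
        for subset in combinations(elements, size):
            y = frozenset(subset)
            total += representative_probability(fuzzy_set, y) * q(y)
    return total


def almost_all(y):
    """Semi-fuzzy quantifier Q(Y) = (|Y| / 3) ** 2."""
    return (len(y) / 3) ** 2


tall = {"e1": 0.8, "e2": 0.9, "e3": 1.0}
result = fa_brute_force(almost_all, tall)
print(round(result, 3))  # 0.838
```

Because each element is decided independently, the probabilities $m_X(Y)$ over all subsets sum to 1, which is a quick sanity check on any implementation.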
24
New View on Crisp Representatives
• We will denote by $E^m = \{e_1, \ldots, e_m\}$ a referential containing $m$ elements. By $Y \in \mathcal{P}(E^m)$ ($X \in \widetilde{\mathcal{P}}(E^m)$) we will denote a crisp (fuzzy) set on this referential (so we have $2^m$ crisp subsets).
• Let us consider a unary semi-fuzzy quantitative quantifier
$$Q(Y) = q_1(|Y|), \quad Y \in \mathcal{P}(E^m)$$
$q_1$: a function with the form $q_1 : \mathbb{N} \to [0, 1]$
25
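New View on Crisp Representatives
• For this case, the expression becomes
$$\begin{aligned}
F^A(Q)(X) &= \sum_{Y \in \mathcal{P}(E^m)} m_X(Y)\, Q(Y) \\
&= \sum_{Y \in \mathcal{P}(E^m),\ |Y| = 0} m_X(Y)\, Q(Y) + \cdots + \sum_{Y \in \mathcal{P}(E^m),\ |Y| = m} m_X(Y)\, Q(Y) \\
&= \sum_{Y \in \mathcal{P}(E^m),\ |Y| = 0} m_X(Y)\, q_1(0) + \cdots + \sum_{Y \in \mathcal{P}(E^m),\ |Y| = m} m_X(Y)\, q_1(m)
\end{aligned}$$
• And, writing $\Pr(\mathrm{card}_X = j)$ instead of $\sum_{Y \in \mathcal{P}(E^m),\ |Y| = j} m_X(Y)$, we obtain
$$F^A(Q)(X) = \Pr(\mathrm{card}_X = 0)\, q_1(0) + \cdots + \Pr(\mathrm{card}_X = m)\, q_1(m) = \sum_{j=0}^{m} \Pr(\mathrm{card}_X = j)\, q_1(j)$$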
26
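New View on Crisp Representatives
• Example of the approach
Let us again evaluate the quantified sentence “almost all students are tall”, applying the $F^A$ quantification model.
Given the fuzzy set $\mathrm{tall} = \{0.8/e_1,\ 0.9/e_2,\ 1/e_3\}$ and the semi-fuzzy quantifier $Q(Y) = q_1(|Y|)$ with $q_1(n) = (n/3)^2$:
First, we compute the probabilities $\Pr(\mathrm{card}_{\mathrm{tall}} = j)$ for every value of $j$:
$$\begin{aligned}
\Pr(\mathrm{card}_{\mathrm{tall}} = 0) &= \sum_{|Y| = 0} m_{\mathrm{tall}}(Y) = m_{\mathrm{tall}}(\emptyset) = 0.2 \cdot 0.1 \cdot 0 = 0 \\
\Pr(\mathrm{card}_{\mathrm{tall}} = 1) &= \sum_{|Y| = 1} m_{\mathrm{tall}}(Y) = m_{\mathrm{tall}}(\{e_1\}) + m_{\mathrm{tall}}(\{e_2\}) + m_{\mathrm{tall}}(\{e_3\}) \\
&= 0.8 \cdot 0.1 \cdot 0 + 0.2 \cdot 0.9 \cdot 0 + 0.2 \cdot 0.1 \cdot 1 = 0.02 \\
\Pr(\mathrm{card}_{\mathrm{tall}} = 2) &= \sum_{|Y| = 2} m_{\mathrm{tall}}(Y) = m_{\mathrm{tall}}(\{e_1, e_2\}) + m_{\mathrm{tall}}(\{e_1, e_3\}) + m_{\mathrm{tall}}(\{e_2, e_3\}) \\
&= 0.8 \cdot 0.9 \cdot 0 + 0.8 \cdot 0.1 \cdot 1 + 0.2 \cdot 0.9 \cdot 1 = 0.26 \\
\Pr(\mathrm{card}_{\mathrm{tall}} = 3) &= \sum_{|Y| = 3} m_{\mathrm{tall}}(Y) = m_{\mathrm{tall}}(\{e_1, e_2, e_3\}) = 0.8 \cdot 0.9 \cdot 1 = 0.72
\end{aligned}$$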
27
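New View on Crisp Representatives
And then,
$$\begin{aligned}
F^A(Q)(\mathrm{tall}) &= \sum_{j=0}^{3} \Pr(\mathrm{card}_{\mathrm{tall}} = j)\, q_1(j) \\
&= \Pr(\mathrm{card}_{\mathrm{tall}} = 0)\, q_1(0) + \Pr(\mathrm{card}_{\mathrm{tall}} = 1)\, q_1(1) + \Pr(\mathrm{card}_{\mathrm{tall}} = 2)\, q_1(2) + \Pr(\mathrm{card}_{\mathrm{tall}} = 3)\, q_1(3) \\
&= 0 \cdot \left(\tfrac{0}{3}\right)^2 + 0.02 \cdot \left(\tfrac{1}{3}\right)^2 + 0.26 \cdot \left(\tfrac{2}{3}\right)^2 + 0.72 \cdot \left(\tfrac{3}{3}\right)^2 = 0.838
\end{aligned}$$
• It can be proved that all the values $\Pr(\mathrm{card}_X = j)$ can be obtained with a complexity of $O(m^2)$.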
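The slides state the $O(m^2)$ bound without giving the procedure. One standard way to reach it, shown here as a sketch and not necessarily the authors' algorithm, is a dynamic-programming recurrence over the elements: the cardinality of the random crisp representative is a sum of independent Bernoulli trials, so its distribution can be updated one element at a time. The function names are hypothetical.

```python
def cardinality_distribution(memberships):
    """Return [Pr(card = 0), ..., Pr(card = m)] for independent Bernoulli trials.

    Each of the m elements is processed once with an O(m) update: O(m^2) overall.
    """
    dist = [1.0]  # with no elements processed, the cardinality is 0 for sure
    for mu in memberships:
        new_dist = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            new_dist[j] += p * (1.0 - mu)  # element left out of the representative
            new_dist[j + 1] += p * mu      # element included in the representative
        dist = new_dist
    return dist


def fa_unary(q1, memberships):
    """F^A for a unary quantitative quantifier Q(Y) = q1(|Y|)."""
    dist = cardinality_distribution(memberships)
    return sum(p * q1(j) for j, p in enumerate(dist))


tall = [0.8, 0.9, 1.0]
probabilities = cardinality_distribution(tall)
result = fa_unary(lambda j: (j / 3) ** 2, tall)
print([round(p, 2) for p in probabilities])  # [0.0, 0.02, 0.26, 0.72]
print(round(result, 3))                      # 0.838
```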
28
New View on Crisp Representatives
• We can advance that the model is well-behaved because it fulfills the properties of correct generalization of crisp expressions, induced operations, external negation, internal negation, duality, internal meets, monotonicity in arguments, monotonicity in quantifiers, and coherence with logic.
29
Applying the $F^A$ Quantifier Fuzzification Mechanism for Information Retrieval
• IR is the science concerned with the effective and efficient retrieval of information for the subsequent use by interested parties.
• IR models differ in the way in which documents and queries are represented and matched.
• The proposal designs a general framework based on the NVM method in which quantifiers with different degrees of expressiveness can be handled.
30
Applying the $F^A$ Quantifier Fuzzification Mechanism for Information Retrieval
• Consider a query with the form $\mathrm{all}(qt_1, \ldots, qt_n)$. Given a document $d_k$ of the document base, every query term $qt_i$ produces a score which represents the connection between the document’s semantics and the term.
• Formally, every document $d_k$ induces a fuzzy set $C_{d_k}$ on the set of query terms, which is defined by applying the popular $tf/idf$ weighting strategy:
$$C_{d_k} = \{\mu_{C_{d_k}}(qt_1)/qt_1,\ \ldots,\ \mu_{C_{d_k}}(qt_n)/qt_n\}$$
$$\mu_{C_{d_k}}(qt_i) = \frac{f_{qt_i,k}}{\max_z f_{z,k}} \cdot \frac{idf(qt_i)}{\max_l idf(qt_l)} \qquad (5)$$
$f_{qt_i,k}$: the raw frequency of term $qt_i$ in the document $d_k$
$\max_z f_{z,k}$: the maximum raw frequency computed over all terms mentioned by the document $d_k$
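Expression (5) can be sketched in Python as below. The term frequencies and idf values are made-up toy numbers, and the function name `induced_fuzzy_set` is hypothetical; the sketch also assumes that the idf maximum is taken over the query terms, as the index $l$ in (5) suggests.

```python
def induced_fuzzy_set(query_terms, term_freqs, idf):
    """Membership of each query term in C_dk, following expression (5).

    term_freqs: raw frequency of every term mentioned by the document d_k.
    idf: inverse document frequency of each query term.
    """
    max_freq = max(term_freqs.values())          # max_z f_{z,k}
    max_idf = max(idf[t] for t in query_terms)   # max_l idf(qt_l)
    return {
        t: (term_freqs.get(t, 0) / max_freq) * (idf[t] / max_idf)
        for t in query_terms
    }


# Toy data, invented purely for illustration.
query = ["fuzzy", "logic", "retrieval"]
doc_term_freqs = {"fuzzy": 4, "logic": 2, "system": 8}
idf_values = {"fuzzy": 2.0, "logic": 1.0, "retrieval": 4.0}

c_dk = induced_fuzzy_set(query, doc_term_freqs, idf_values)
print(c_dk)  # {'fuzzy': 0.25, 'logic': 0.0625, 'retrieval': 0.0}
```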
31
Applying the $F^A$ Quantifier Fuzzification Mechanism for Information Retrieval
• The fuzzy set $C_{d_k}$ models the connection between the document $d_k$ and every query component.
• Quantification can now be applied on $C_{d_k}$ for evaluating the quantified symbol all.
32
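Applying the $F^A$ Quantifier Fuzzification Mechanism for Information Retrieval
• Example:
Let us suppose that we apply the following power function for supporting a given query quantification symbol $Q$:
$$Q_s(X) = pq(|X|), \qquad pq(x) = \frac{x^2}{n^2}, \qquad n \text{: the number of query terms}$$
Imagine a query $Q(qt_1, qt_2, qt_3, qt_4)$ and consider a document $d_k$ whose fuzzy set induced on the query components is
$$C_{d_k} = \{0.7/qt_1,\ 0.3/qt_2,\ 0/qt_3,\ 0.2/qt_4\}$$
Applying now the fuzzification process explained throughout this paper, the query-document matching is assigned the score
$$\begin{aligned}
&\Pr(\mathrm{card}_{C_{d_k}} = 0)\, pq(0) + \Pr(\mathrm{card}_{C_{d_k}} = 1)\, pq(1) + \Pr(\mathrm{card}_{C_{d_k}} = 2)\, pq(2) + \Pr(\mathrm{card}_{C_{d_k}} = 3)\, pq(3) + \Pr(\mathrm{card}_{C_{d_k}} = 4)\, pq(4) \\
&= 0.3 \cdot 0.7 \cdot 0.8 \cdot 0 + (0.7 \cdot 0.7 \cdot 0.8 + 0.3 \cdot 0.3 \cdot 0.8 + 0.3 \cdot 0.7 \cdot 0.2) \left(\tfrac{1}{4}\right)^2 \\
&\quad + (0.7 \cdot 0.3 \cdot 0.8 + 0.7 \cdot 0.7 \cdot 0.2 + 0.3 \cdot 0.3 \cdot 0.2) \left(\tfrac{2}{4}\right)^2 + (0.7 \cdot 0.3 \cdot 0.2) \left(\tfrac{3}{4}\right)^2 = 0.12625
\end{aligned}$$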
33
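Applying the $F^A$ Quantifier Fuzzification Mechanism for Information Retrieval
Let us now apply the NVM approach to handle the same example.
The score assigned is equal to
$$\sum_{i=0}^{4} (\alpha_i - \alpha_{i+1})\, Q_s\big((C_{d_k})_{\ge \alpha_i}\big)$$
It follows that the final value yielded by the NVM method is:
$$\begin{aligned}
&0.3 \cdot Q_s(\emptyset) + 0.4 \cdot Q_s(\{qt_1\}) + 0.1 \cdot Q_s(\{qt_1, qt_2\}) + 0.2 \cdot Q_s(\{qt_1, qt_2, qt_4\}) + 0 \cdot Q_s(\{qt_1, qt_2, qt_3, qt_4\}) \\
&= 0.4 \left(\tfrac{1}{4}\right)^2 + 0.1 \left(\tfrac{2}{4}\right)^2 + 0.2 \left(\tfrac{3}{4}\right)^2 = 0.1625
\end{aligned}$$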
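The two scores for this query-document pair can be reproduced side by side with the following Python sketch. It is self-contained and makes the same assumptions as the worked examples: a unary quantifier that depends only on cardinality, and an α sequence made of 1, the membership degrees in decreasing order, and 0. The function names are hypothetical.

```python
def fa_score(memberships, q1):
    """F^A score: sum over j of Pr(card = j) * q1(j), using independent Bernoulli trials."""
    dist = [1.0]
    for mu in memberships:
        new_dist = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            new_dist[j] += p * (1.0 - mu)
            new_dist[j + 1] += p * mu
        dist = new_dist
    return sum(p * q1(j) for j, p in enumerate(dist))


def alpha_cut_score(memberships, q1):
    """Alpha-cut score: sum over i of (alpha_i - alpha_{i+1}) * q1(|alpha_i-cut|)."""
    alphas = [1.0] + sorted(memberships, reverse=True) + [0.0]
    total = 0.0
    for i in range(len(alphas) - 1):
        cut_size = sum(1 for mu in memberships if mu >= alphas[i])
        total += (alphas[i] - alphas[i + 1]) * q1(cut_size)
    return total


def pq(x, n=4):
    """Power function pq(x) = x^2 / n^2 with n = 4 query terms."""
    return (x / n) ** 2


c_dk = [0.7, 0.3, 0.0, 0.2]  # memberships of qt1..qt4
fa = fa_score(c_dk, pq)
nvm = alpha_cut_score(c_dk, pq)
print(round(fa, 5), round(nvm, 4))  # 0.12625 0.1625
```

The α-cut score is higher here because it only weighs the nested cuts, whereas $F^A$ spreads probability over every crisp subset, including the many low-cardinality ones.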
34
Information Retrieval Experiment
• We ran experiments against the Wall Street Journal (WSJ) documents, which are about 173,000 news articles (from 1987 to 1992).
• Natural language documents are preprocessed as follows:
– First, common words such as prepositions, articles, etc. are eliminated.
– Second, terms are reduced to their syntactical root by applying the popular Porter’s stemmer.
35
Information Retrieval Experiment
• We tried out different semi-fuzzy quantifiers for relaxing the interpretation of the quantified statement all and, for each semi-fuzzy quantifier, both the $F^A$ fuzzification approach and the NVM approach were applied.
• We experimented with power functions and exponential functions, both of them normalized in the interval $[0, 1]$ as follows:
$$PQ(X) = pq(|X|), \qquad pq(x) = \left(\frac{x}{n}\right)^{exp}$$
$EQ(X) = eq(|X|)$, where $eq$ is an exponential function of $x$ with parameter $k$, normalized using $n$.