Beihang
1.
2.
3.
4. Big Data
5. Google, Baidu
6.
1 0 2
2016-09-23
• Evaluation
• Indexing structures
• Interaction with OS
• Communication delays
• Other overheads
– Retrieval performance evaluation
• IR: Relevance
Relevance
– Answer a precise question precisely.
– Partially answer a question.
– Suggest a source for more information.
– Give background information.
– Remind the user of other knowledge.
• Binary scale: 0, 1
• Graded scale: 0, 1, 2, 3, 4
• 1994, Stefano Mizzaro
– http://www.psy.gla.ac.uk/~steve/stefano.html
• Batch mode
• Interactive retrieval
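• Recall (recall rate)
– $\mathrm{Recall} = |R_a| / |R|$
• Precision
– $\mathrm{Precision} = |R_a| / |A|$
• C: the document collection; R: the relevant documents in C; A: the answer set returned by the system; $R_a = R \cap A$: the relevant documents in the answer set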
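The two set-based ratios can be sketched in a few lines of Python. This is only an illustration: the function names and the toy document ids below are assumptions made for the example, not part of the slides.

```python
def recall(relevant: set, answer: set) -> float:
    """|Ra| / |R|, where Ra = relevant ∩ answer."""
    return len(relevant & answer) / len(relevant) if relevant else 0.0


def precision(relevant: set, answer: set) -> float:
    """|Ra| / |A|, where Ra = relevant ∩ answer."""
    return len(relevant & answer) / len(answer) if answer else 0.0


# Hypothetical toy data: 4 relevant documents, 3 returned, 2 in common.
R = {"d1", "d2", "d3", "d4"}
A = {"d2", "d4", "d7"}

print(f"recall = {recall(R, A):.3f}, precision = {precision(R, A):.3f}")
```

With these toy sets, recall is 2/4 and precision is 2/3.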
• Example: query q
– Relevant set: Rq = {d3, d5, d9, d25, d44, d56, d71, d89, d123, d23}
– Ranked answer for q:
{ d123, d84, d56, d6, d8, d9, d511, d129, d187, d25, d38, d48, d250, d113, d3 }
• 11 standard recall levels: 0%, 10%, 20%, ..., 90%, 100%
[Figure: precision versus recall for query q]
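The recall and precision points for this example can be recomputed by walking down the ranking and recording both values each time a relevant document appears. The sketch below uses the Rq and ranking given on the slide; the function name is an assumption for illustration.

```python
# Relevant set and ranking copied from the example for query q.
RELEVANT = {"d3", "d5", "d9", "d25", "d44", "d56", "d71", "d89", "d123", "d23"}
RANKING = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]


def recall_precision_points(ranking, relevant):
    """Return (recall, precision) at the rank of every relevant document."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points


for r, p in recall_precision_points(RANKING, RELEVANT):
    print(f"recall {r:.0%}  precision {p:.1%}")
```

Five of the ten relevant documents are retrieved (at ranks 1, 3, 6, 10 and 15), giving precision 100%, 66.7%, 50%, 40% and 33.3% at recall 10% through 50%.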
A problem
• The observed recall points need not coincide with the 11 standard recall levels, e.g. Rq = {d3, d56, d129}
– Ranked answer:
{ d123, d84, d56, d6, d8, d9, d511, d129, d187, d25, d38, d48, d250, d113, d3 }
– Recall: 33.3%, 66.7%, 100%
– Precision: 33.3%, 25%, 20%
[Figure: precision versus recall for Rq = {d3, d56, d129}]
Interpolation
• $r_j$ is the j-th standard recall level, j = 0, 1, …, 10
$$P(r_j) = \max_{r_j \le r \le r_{j+1}} P(r)$$
[Figure: interpolated precision at the 11 standard recall levels for the two example queries]
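A minimal sketch of 11-point interpolation for the three-document example follows. One assumption is worth flagging: the code takes the maximum precision over all observed recall points at or above r_j, a commonly used variant that also assigns a value to levels whose interval [r_j, r_{j+1}] contains no observed point. Function names are illustrative only.

```python
RELEVANT = {"d3", "d56", "d129"}
RANKING = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]


def recall_precision_points(ranking, relevant):
    """(recall, precision) at the rank of every relevant document."""
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points


def interpolate_11pt(points):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    Assumption: P(r_j) = max precision over observed points with recall >= r_j.
    """
    result = []
    for j in range(11):
        level = j / 10
        candidates = [p for r, p in points if r >= level - 1e-9]
        result.append(max(candidates) if candidates else 0.0)
    return result


interp = interpolate_11pt(recall_precision_points(RANKING, RELEVANT))
for j, p in enumerate(interp):
    print(f"recall {j * 10:3d}%  interpolated precision {p:.1%}")
```

Under this variant the interpolated precision is 33.3% for levels 0% to 30%, 25% for 40% to 60%, and 20% for 70% to 100%.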
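• Average Precision (over queries, at each recall level)
$$\bar{P}(r) = \sum_{i=1}^{N_q} \frac{P_i(r)}{N_q}$$
– $N_q$: the number of queries
– $P_i(r)$: the precision at recall level r for the i-th query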
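Averaging over queries is a level-by-level mean of the per-query precision curves. The sketch below assumes each query's curve is already sampled at the same recall levels; the two short curves are made-up numbers used only to show the arithmetic.

```python
def average_precision_curve(curves):
    """Mean precision at each recall level over N_q per-query curves."""
    n_q = len(curves)
    n_levels = len(curves[0])
    return [sum(curve[j] for curve in curves) / n_q for j in range(n_levels)]


# Hypothetical curves for two queries at three recall levels.
curves = [
    [1.0, 0.5, 0.0],
    [0.5, 0.5, 0.5],
]
print(average_precision_curve(curves))
```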
ROC/AUC
• TPR: True Positive Rate (Recall, Sensitivity)
• FPR: False Positive Rate (Fall-out)
• FNR: False Negative Rate (Miss rate)
• TNR: True Negative Rate (Specificity)
• Precision vs. Accuracy
• ROC: Receiver Operating Characteristic curve
• AUC: Area Under the Curve
[Figure: example confusion matrix with cell counts 63, 37, 72, 28]
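The four rates, plus precision and accuracy, all follow from the cells of a confusion matrix. The sketch below is illustrative: the assignment of the counts 63, 37, 72 and 28 to TP, FN, TN and FP is an assumption, since the layout of the original figure is not recoverable.

```python
def rates(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive the standard rates from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),          # recall, sensitivity
        "FPR": fp / (fp + tn),          # fall-out
        "FNR": fn / (tp + fn),          # miss rate
        "TNR": tn / (fp + tn),          # specificity
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Assumed cell assignment (hypothetical): TP=63, FN=37, TN=72, FP=28.
result = rates(tp=63, fp=28, fn=37, tn=72)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Note that TPR + FNR = 1 and FPR + TNR = 1, so one point on the ROC curve (FPR, TPR) fixes all four rates.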
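• P@5/P@10/P@N
– Precision over the top 5/10/N retrieved documents
• R-precision
– Precision after R documents have been retrieved, where R is the number of relevant documents for the query
$$\text{R-Precision} = \frac{|\{\text{relevant documents in the top } R\}|}{R}$$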
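Both single-value measures reduce to counting relevant documents in a prefix of the ranking. The sketch below reuses the ranking and the ten-document relevant set from the earlier example for query q; the function names are assumptions for illustration.

```python
RELEVANT = {"d3", "d5", "d9", "d25", "d44", "d56", "d71", "d89", "d123", "d23"}
RANKING = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]


def precision_at(ranking, relevant, n):
    """P@n: fraction of the top n results that are relevant."""
    return sum(doc in relevant for doc in ranking[:n]) / n


def r_precision(ranking, relevant):
    """Precision at cutoff R, where R = number of relevant documents."""
    return precision_at(ranking, relevant, len(relevant))


print("P@5  =", precision_at(RANKING, RELEVANT, 5))
print("P@10 =", precision_at(RANKING, RELEVANT, 10))
print("R-precision =", r_precision(RANKING, RELEVANT))
```

For this ranking all three values are 0.4: two relevant documents in the top 5 and four in the top 10, with R = 10.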
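• MAP (Mean Average Precision)
– AP: for a single query, the average of the precision values taken at the rank $r_i$ of each relevant document
– MAP: the mean of AP over all queries; a higher MAP indicates a better ranking overall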
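• A(q1): d1, d2, d3, d4, d5
• A(q2): d1, d3, d4, d2, d5
$$\mathrm{MAP} = \frac{1}{|Q|} \sum_{j=1}^{|Q|} \frac{1}{m_j} \sum_{i=1}^{m_j} P(r_{ji})$$
– $|Q|$: the number of queries; $m_j$: the number of relevant documents for query $q_j$; $P(r_{ji})$: the precision at the rank of the i-th relevant document for $q_j$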
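A small sketch of AP and MAP over the two rankings A(q1) and A(q2). The relevant sets are not recoverable from the slide, so the ones below are hypothetical and chosen only to make the arithmetic visible; relevant documents that are never retrieved contribute zero precision.

```python
def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Rankings from the slide; the relevant sets are HYPOTHETICAL.
runs = [
    (["d1", "d2", "d3", "d4", "d5"], {"d1", "d3"}),   # A(q1)
    (["d1", "d3", "d4", "d2", "d5"], {"d2", "d5"}),   # A(q2)
]

for i, (ranking, rel) in enumerate(runs, start=1):
    print(f"AP(q{i}) = {average_precision(ranking, rel):.4f}")
print(f"MAP = {mean_average_precision(runs):.4f}")
```

With these assumed judgments, AP(q1) = (1/1 + 2/3) / 2 ≈ 0.8333, AP(q2) = (1/4 + 2/5) / 2 = 0.325, and MAP ≈ 0.5792.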
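• F-measure: the harmonic mean of recall r and precision p
$$F = \frac{2}{\frac{1}{r} + \frac{1}{p}} = \frac{2pr}{p + r}$$
• E-measure: a parameter b sets the relative weight of r and p
$$E = 1 - \frac{1 + b^2}{\frac{b^2}{r} + \frac{1}{p}}$$
– b = 1: E = 1 − F, so E and F are complementary
– b > 1: r carries more weight than p (as b grows, E approaches 1 − r)
– b < 1: p carries more weight than r (as b approaches 0, E approaches 1 − p)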
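The two formulas translate directly into code. This is a minimal sketch with illustrative function names; the zero-handling (F = 0 and E = 1 when p or r is 0) is an assumption added to avoid division by zero.

```python
def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    if p == 0 or r == 0:
        return 0.0
    return 2 * p * r / (p + r)


def e_measure(p: float, r: float, b: float = 1.0) -> float:
    """E = 1 - (1 + b^2) / (b^2 / r + 1 / p)."""
    if p == 0 or r == 0:
        return 1.0
    return 1 - (1 + b * b) / (b * b / r + 1 / p)


p, r = 0.5, 0.25
print(f"F = {f_measure(p, r):.4f}")
print(f"E(b=1) = {e_measure(p, r, 1.0):.4f}")   # equals 1 - F
print(f"E(b=2) = {e_measure(p, r, 2.0):.4f}")
```

For p = 0.5 and r = 0.25, F = 1/3, E with b = 1 is 2/3, and E with b = 2 is 1 − 5/18 ≈ 0.7222.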
• Discounted Cumulated Gain
– CG (Cumulated Gain)
– DCG (Discounted Cumulated Gain)
– NDCG (Normalized Discounted Cumulated Gain)
• BPREF
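As a rough sketch of how CG, DCG and NDCG relate, the code below uses one common formulation, a log2(rank + 1) discount, which is an assumption here because the slide does not give the formula. NDCG is normalized against the ideal reordering of the same gain list, and the graded gains are made-up values.

```python
import math


def cg(gains):
    """Cumulated gain: plain sum of the graded relevance values."""
    return sum(gains)


def dcg(gains):
    """Discounted cumulated gain with an assumed log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments (0 to 3) for a ranked list of six documents.
gains = [3, 2, 3, 0, 1, 2]
print(f"CG = {cg(gains)}, DCG = {dcg(gains):.3f}, NDCG = {ndcg(gains):.3f}")
```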
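• Coverage
– $C = |R_k| / |U|$
• Novelty
– $\mathrm{novelty} = |R_u| / (|R_k| + |R_u|)$
• Relative recall
– The ratio of the number of relevant documents found to the number of relevant documents the user expected to find
• Recall effort
– The ratio of the number of relevant documents the user expected to find to the number of documents examined to find them
• U: the relevant documents known to the user; $R_k$: the retrieved relevant documents already known to the user; $R_u$: the retrieved relevant documents previously unknown to the user; R: the relevant documents; A: the answer set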
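Coverage and novelty can be computed with set operations once the user's prior knowledge is modelled as a set. The sketch below is illustrative: the function names, the `known` set, and all document ids are assumptions.

```python
def coverage(answer: set, relevant: set, known: set) -> float:
    """|Rk| / |U|: share of the relevant docs the user knows that were retrieved."""
    u = relevant & known          # U: relevant documents known to the user
    rk = u & answer               # Rk: those of U that appear in the answer set
    return len(rk) / len(u) if u else 0.0


def novelty(answer: set, relevant: set, known: set) -> float:
    """|Ru| / (|Rk| + |Ru|): share of retrieved relevant docs new to the user."""
    retrieved_relevant = answer & relevant
    rk = retrieved_relevant & known
    ru = retrieved_relevant - known
    total = len(rk) + len(ru)
    return len(ru) / total if total else 0.0


# Hypothetical toy data.
relevant = {"d1", "d2", "d3", "d4", "d5", "d6"}
known = {"d1", "d2", "d3", "d9"}
answer = {"d1", "d2", "d4", "d5", "d8"}

print(f"coverage = {coverage(answer, relevant, known):.3f}")
print(f"novelty  = {novelty(answer, relevant, known):.3f}")
```

Here U = {d1, d2, d3}, Rk = {d1, d2} and Ru = {d4, d5}, so coverage is 2/3 and novelty is 1/2.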
TREC
• TREC
– Text REtrieval Conference
• Sponsors
– NIST (National Institute of Standards and Technology)
– U.S. Department of Defense
• History
– 1992 to 2012: 21 conferences
• Track
– TREC
• Topic
– topic → query
– Question (QA)
• Document
• Relevance Judgments
TREC
• TREC ()
• TREC– : NIST– :– :
NIST – : NIST
–
TREC
• Document
– GB
– SGML (Standard Generalized Markup Language)
• Topic
– SGML
Topic
• Title
• Description: a fuller statement of the Title
• Narrative
Topic
<topic number="2" type="diagnosis">
  <description>
    A 62 yo male presents with four days of non-productive cough and one day of fever. He is on immunosuppressive medications, including prednisone. He is admitted to the hospital, and his work-up includes bronchoscopy with bronchoalveolar lavage (BAL). BAL fluid examination reveals owl's eye inclusion bodies in the nuclei of infection cells.
  </description>
  <summary>
    A 62-year-old immunosuppressed male with fever, cough and intranuclear inclusion bodies in bronchoalveolar lavage
  </summary>
</topic>
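A topic in this format can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is only an illustration: `parse_topic` is a hypothetical helper, and the description is abbreviated to its first sentence to keep the example short.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the topic above (description cut to its first sentence).
TOPIC_XML = """<topic number="2" type="diagnosis">
  <description>
    A 62 yo male presents with four days of non-productive cough and one day of fever.
  </description>
  <summary>
    A 62-year-old immunosuppressed male with fever, cough and intranuclear inclusion bodies in bronchoalveolar lavage
  </summary>
</topic>"""


def parse_topic(xml_text: str) -> dict:
    """Extract the attributes and text fields of one <topic> element."""
    root = ET.fromstring(xml_text)
    return {
        "number": int(root.get("number")),
        "type": root.get("type"),
        "description": root.findtext("description", default="").strip(),
        "summary": root.findtext("summary", default="").strip(),
    }


topic = parse_topic(TOPIC_XML)
print(topic["number"], topic["type"])
print(topic["summary"])
```

The summary field is the natural candidate for a short query, while the description gives the longer statement of the information need.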
• Set Precision/Set Recall
• P@n/Average Precision/Reciprocal Rank
• Filtering Utility
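(1)
• For each topic, NIST assessors judge a pool built from the top 100 documents of each submitted run
• Pooling
– Merge the top n results of every run into one pool and judge only the documents in the pool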
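Pooling amounts to taking the union of the top n documents of every run for a topic. The following is a minimal sketch; the run names and rankings are hypothetical.

```python
def build_pool(runs: dict, n: int) -> set:
    """Union of the top-n documents of every run for one topic."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:n])
    return pool


# Hypothetical runs from three systems for a single topic.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d3", "d5", "d1", "d6"],
    "sysC": ["d7", "d1", "d8", "d2"],
}

pool = build_pool(runs, n=2)
print(sorted(pool))
```

Only documents in the pool are shown to assessors; documents outside it are typically treated as not relevant, which is the trade-off pooling makes to keep judging affordable.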
(2)
• NIST scores the submitted runs with trec_eval (precision, recall)
• track
TREC
• Ad hoc
• Information Routing
TREC 2016 Tracks
• Clinical Decision Support Track
• Contextual Suggestion Track
• Dynamic Domain Track
• Live QA Track
• OpenSearch Track
• Real-Time Summarization Track
• Tasks Track
• Total Recall Track
NTCIR
• NII Test Collection for IR Systems
– NII (National Institute of Informatics)
– Since 1998
CLEF
• Cross-Language Evaluation Forum
– Since 2000
User-Based Evaluation
• Human experimentation in the lab
• Side-by-side panels
• A/B testing
• Crowdsourcing
• Using clickthrough data
Q&A