Robust Pseudo Feedback & HMM Passage Extraction: UIUC at TREC 2006 Genomics Track
Jing Jiang, Xin He, ChengXiang Zhai, University of Illinois at Urbana-Champaign. Goal of participation: to test the effectiveness of some recent language modeling methods for genomics retrieval.
![Page 1: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/1.jpg)
Robust Pseudo Feedback & HMM Passage Extraction
UIUC at TREC 2006 Genomics Track
Jing Jiang, Xin He, ChengXiang Zhai
University of Illinois at Urbana-Champaign
![Page 2: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/2.jpg)
11/16/06 2
Goal of Participation
• To test the effectiveness of some recent language modeling methods for genomics retrieval
– Robust pseudo feedback [Tao & Zhai 06]
– HMM passage extraction [Jiang & Zhai 06]
• Task at 2006 genomics track
– Document-level retrieval
– Passage-level retrieval
– Aspect-level retrieval
![Page 3: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/3.jpg)
Overall Approach
[Diagram: query Q → Document Retrieval Module (over Medline article paragraphs) → ranked paragraphs 1, 2, …, k → Passage Extraction Module → ranked passages, with user relevance feedback and pseudo relevance feedback loops.]
![Page 4: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/4.jpg)
![Page 5: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/5.jpg)
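KL-Divergence Retrieval Model [Lafferty & Zhai 01]
[Diagram: a topic language model is compared against the language model of each document D1, D2, …, Dk.]
• Topic model: role 0.2, prnp 0.2, mad 0.2, cow 0.2, diseas 0.2
• Document model: the 0.020, for 0.015, prp 0.102, mad 0.034, cow 0.034, diseas 0.068, …
• Document text (D1, D2, …, Dk): "The…for… spongiform…PrP protein…"; "Prion diseases… that…(PrP C)…This…"; "…which…(PrP C)…to the…prion protein…"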
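The ranking idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: documents are scored by the query-weighted log-likelihood under a smoothed document model, which is rank-equivalent to negative KL divergence from the topic model. The Dirichlet smoothing, the value of `mu`, the helper names, and the two toy documents are assumptions made for the example; only the topic model values come from the slide.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood unigram model over all documents (used for smoothing)."""
    counts = Counter(w for tokens in docs.values() for w in tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def kl_score(query_model, tokens, coll, mu=1000.0):
    """Sum over query words of p(w|Q) * log p(w|D).

    Rank-equivalent to -KL(Q || D). The document model uses Dirichlet
    smoothing with prior mass mu (an assumption; the slide does not say
    which smoothing method was used).
    """
    counts = Counter(tokens)
    score = 0.0
    for w, q in query_model.items():
        p_wd = (counts[w] + mu * coll.get(w, 1e-9)) / (len(tokens) + mu)
        score += q * math.log(p_wd)
    return score


# Topic model from the slide; the documents below are made-up toy data.
query_model = {"role": 0.2, "prnp": 0.2, "mad": 0.2, "cow": 0.2, "diseas": 0.2}
docs = {
    "D1": "prion diseas mad cow diseas prnp role".split(),
    "D2": "the for that which protein structur".split(),
}
coll = collection_model(docs)
ranking = sorted(docs, key=lambda d: kl_score(query_model, docs[d], coll), reverse=True)
print(ranking)
```

Because D1 contains every topic word and D2 contains none, D1 ranks first under any positive `mu`.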
![Page 6: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/6.jpg)
![Page 7: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/7.jpg)
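Model-Based Feedback [Zhai & Lafferty 01]
[Diagram: feedback documents D1, D2, …, Dk for the topic, an unknown feedback model, and a known background model.]
• Topic model: role 0.2, prnp 0.2, mad 0.2, cow 0.2, diseas 0.2
• Feedback documents (D1, D2, …, Dk): "The…for… spongiform…PrP protein…"; "Prion diseases… that…(PrP C)…This…"; "…which…(PrP C)…to the…prion protein…"
• Feedback model (to be estimated): the ?, for ?, …, prp ?, prion ?
• Background model: the 0.02, for 0.01, …, prp 0.003, prion 0.004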
![Page 8: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/8.jpg)
Model-Based Feedback [Zhai & Lafferty 01]
• Feedback model, estimated from D1, D2, …, Dk with the EM algorithm: the 0.003, for 0.002, …, prp 0.02, prion 0.05
• Background model: the 0.02, for 0.01, …, prp 0.003, prion 0.004
![Page 9: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/9.jpg)
Model-Based Feedback [Zhai & Lafferty 01]
• 2 parameters: α and λ
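A minimal sketch of how such a feedback model could be estimated, assuming the standard two-component mixture formulation of model-based feedback: each word in the feedback documents is generated either by the unknown feedback model or by the background model, λ is the background weight, and α interpolates the feedback model with the original topic model. The function names, the toy feedback documents, the probability floor, and the iteration count are assumptions for illustration; the background values are the ones shown on the slide.

```python
from collections import Counter


def estimate_feedback_model(feedback_docs, background, lam=0.5, iters=50):
    """EM for a mixture of an unknown feedback model and a fixed background model."""
    counts = Counter(w for doc in feedback_docs for w in doc)
    total = sum(counts.values())
    theta = {w: c / total for w, c in counts.items()}  # start from relative frequencies
    for _ in range(iters):
        # E-step: probability that each word occurrence came from the feedback model.
        t = {}
        for w in counts:
            topic = (1.0 - lam) * theta[w]
            noise = lam * background.get(w, 1e-9)  # assumed floor for unseen words
            t[w] = topic / (topic + noise)
        # M-step: re-estimate the feedback model from the expected counts.
        norm = sum(counts[w] * t[w] for w in counts)
        theta = {w: counts[w] * t[w] / norm for w in counts}
    return theta


def interpolate(query_model, feedback_model, alpha=0.5):
    """Updated topic model: (1 - alpha) * original + alpha * feedback."""
    vocab = set(query_model) | set(feedback_model)
    return {
        w: (1.0 - alpha) * query_model.get(w, 0.0) + alpha * feedback_model.get(w, 0.0)
        for w in vocab
    }


# Background values from the slide; the feedback documents are toy data.
background = {"the": 0.02, "for": 0.01, "prp": 0.003, "prion": 0.004}
feedback_docs = [["the", "prp", "prion", "the", "for"], ["prion", "prp", "the", "prion"]]
query_model = {"role": 0.2, "prnp": 0.2, "mad": 0.2, "cow": 0.2, "diseas": 0.2}

theta_f = estimate_feedback_model(feedback_docs, background, lam=0.5)
updated = interpolate(query_model, theta_f, alpha=0.5)
print(sorted(theta_f.items(), key=lambda kv: -kv[1]))
```

In the toy data "the" and "prion" occur equally often, but the background model explains "the" better, so the estimated feedback model gives "prion" the higher probability. Both `lam` and `alpha` have to be chosen by hand in this formulation, which is the tuning burden the slide points to.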
![Page 10: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/10.jpg)
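Regularized Estimation [Tao & Zhai 06]
[Diagram: feedback documents D1, D2, …, Dk for the topic, an unknown feedback model, and a known background model.]
• Topic model: role 0.2, prnp 0.2, mad 0.2, cow 0.2, diseas 0.2
• Feedback documents (D1, D2, …, Dk): "The…for… spongiform…PrP protein…"; "Prion diseases… that…(PrP C)…This…"; "…which…(PrP C)…to the…prion protein…"
• Feedback model (to be estimated): the ?, for ?, …, prp ?, prion ?
• Background model: the 0.02, for 0.01, …, prp 0.003, prion 0.004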
![Page 11: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/11.jpg)
Regularized Estimation [Tao & Zhai 06]
• Feedback model, estimated with the regularized EM algorithm using a prior: the 0.003, for 0.002, …, prp 0.02, prion 0.05
• Background model: the 0.02, for 0.01, …, prp 0.003, prion 0.004
![Page 12: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/12.jpg)
Regularized Estimation [Tao & Zhai 06]
• 1 parameter: η
![Page 13: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/13.jpg)
Original vs. Regularized EM
[Diagram: feedback documents D1, D2, …, Dk and the weight α under each method.]
• Original EM: α manually set
• Regularized EM: α dynamically set
![Page 14: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/14.jpg)
![Page 15: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/15.jpg)
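HMM Passage Extraction [Jiang & Zhai 06]
[Diagram: a three-state HMM B1 → R → B2 labels each word w of a paragraph; the words labeled R form the relevant passage.]
• Output probabilities
– p(w|B1): the 0.02, for 0.01, prp 0.001, …
– p(w|R): the 0.003, for 0.002, prp 0.02, …
– p(w|B2): the 0.02, for 0.01, prp 0.001, …
• Transition probabilities: p(B1|B1) = 0.9, p(R|B1) = 0.1, p(R|R) = 0.95, p(B2|R) = 0.05, p(B2|B2) = 1
• State sequence: B … B B R R R R … R B … B
• Paragraph: w … w w w w w w … w w … w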
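The decoding step implied by this slide can be sketched with Viterbi search over the three states. This is an illustrative sketch, not the authors' code: the transition and output probabilities are the ones shown on the slide, while the start state (always B1), the `FLOOR` probability for unlisted words, the function name, and the toy word sequence are assumptions.

```python
import math

STATES = ["B1", "R", "B2"]
# Transition probabilities as shown on the slide.
TRANS = {
    "B1": {"B1": 0.9, "R": 0.1},
    "R": {"R": 0.95, "B2": 0.05},
    "B2": {"B2": 1.0},
}
BACKGROUND = {"the": 0.02, "for": 0.01, "prp": 0.001}
RELEVANT = {"the": 0.003, "for": 0.002, "prp": 0.02}
EMIT = {"B1": BACKGROUND, "R": RELEVANT, "B2": BACKGROUND}
FLOOR = 1e-4  # assumed probability for words missing from a table


def viterbi(words, start="B1"):
    """Most likely state sequence for a non-empty word list, starting in `start`."""
    neg_inf = float("-inf")
    score = {s: neg_inf for s in STATES}
    score[start] = math.log(EMIT[start].get(words[0], FLOOR))
    back = []  # one back-pointer table per word after the first
    for w in words[1:]:
        new_score = {s: neg_inf for s in STATES}
        pointers = {}
        for prev, outgoing in TRANS.items():
            if score[prev] == neg_inf:
                continue  # state not reachable yet
            for nxt, p in outgoing.items():
                cand = score[prev] + math.log(p) + math.log(EMIT[nxt].get(w, FLOOR))
                if cand > new_score[nxt]:
                    new_score[nxt] = cand
                    pointers[nxt] = prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(score, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


words = ["the", "for", "prp", "prp", "prp", "the", "for", "the"]
path = viterbi(words)
passage = [w for w, s in zip(words, path) if s == "R"]
print(path)
print(passage)
```

Because the chain can only move left to right (B1, then R, then B2), the words labeled R always form one contiguous span, which is the coherence property the deck later discusses.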
![Page 16: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/16.jpg)
HMM Passage Extraction [Jiang & Zhai 06]
[Diagram: extended HMM with states B1, B2, R, B3, and E.]
• A background state for smoothing
• An end-of-paragraph state
• Transition probabilities estimated from observations
![Page 17: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/17.jpg)
Experiment Design
• Pre-processing
– HTML parsing
– Paragraph boundaries
– Tokenization
• User relevance feedback
![Page 18: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/18.jpg)
Official Runs
[Diagram: query Q → KL-Div Retrieval (over Medline article paragraphs) → ranked paragraphs 1, 2, …, k → HMM Passage Extraction → ranked passages; a feedback loop yields Q'.]
![Page 19: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/19.jpg)
UIUCauto
[Diagram: query Q → KL-Div Retrieval (over Medline article paragraphs) → ranked paragraphs 1, 2, …, k → HMM Passage Extraction → ranked passages; feedback by regularized estimation yields Q'.]
![Page 20: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/20.jpg)
UIUCinter
[Diagram: query Q → KL-Div Retrieval (over Medline article paragraphs) → ranked paragraphs 1, 2, …, k → HMM Passage Extraction → ranked passages; feedback by regularized estimation yields Q'.]
![Page 21: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/21.jpg)
UIUCinter2
[Diagram: query Q → KL-Div Retrieval (over Medline article paragraphs) → ranked paragraphs 1, 2, …, k → HMM Passage Extraction → ranked passages; feedback by original estimation yields Q' (diagram labels: Q', F).]
![Page 22: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/22.jpg)
Pseudo Relevance Feedback (k = 10)

| Method | Setting | Doc MAP | Rel. Impr. |
|---|---|---|---|
| Baseline (no feedback) | | 0.3484 | N/A |
| Original Estimation | Def | 0.3606 | +3.50% |
| Original Estimation | Opt | 0.3943 | +13.2% |
| Regularized Estimation | Def (UIUCauto) | 0.3842 | +10.3% |
| Regularized Estimation | Opt | 0.3952 | +13.4% |

η is similar to λ / (1 − λ)
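The relative improvements in this table follow directly from the document MAP values. The short sketch below recomputes them against the no-feedback baseline; the function and dictionary names are illustrative only.

```python
def relative_improvement(baseline, value):
    """Relative improvement over the baseline, in percent."""
    return (value - baseline) / baseline * 100.0


baseline_map = 0.3484  # document MAP without feedback
doc_map = {
    "Original, Def": 0.3606,
    "Original, Opt": 0.3943,
    "Regularized, Def (UIUCauto)": 0.3842,
    "Regularized, Opt": 0.3952,
}
for name, value in doc_map.items():
    print(f"{name}: {relative_improvement(baseline_map, value):+.1f}%")
```

With default settings, regularized estimation recovers most of the gain that original estimation only reaches after its parameters are tuned.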
![Page 23: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/23.jpg)
![Page 24: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/24.jpg)
![Page 25: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/25.jpg)
Parameter Sensitivity (pseudo feedback, k = 10)
![Page 26: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/26.jpg)
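User Relevance Feedback

| Method | Setting | Doc MAP (Pseudo Feedback) | Doc MAP (User Feedback) | Rel. Impr. |
|---|---|---|---|---|
| Original Estimation | Def | 0.3606 | 0.3986 | +10.5% |
| Original Estimation | Opt | 0.3943 | 0.4511 | +14.4% |
| Regularized Estimation | Def | 0.3842 (UIUCauto) | 0.4261 (UIUCinter) | +10.9% |
| Regularized Estimation | Opt | 0.3952 | 0.4515 | +14.2% |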
![Page 27: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/27.jpg)
![Page 28: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/28.jpg)
![Page 29: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/29.jpg)
HMM Passage Extraction

| Run | Psg MAP (Paragraph) | Psg MAP (HMM Passage) | Rel. Impr. |
|---|---|---|---|
| UIUCauto | 0.03753 | 0.04864 | +29.6% |
| UIUCinter | 0.04481 | 0.05906 | +31.8% |
| UIUCinter2 | 0.04580 | 0.06038 | +31.8% |
![Page 30: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/30.jpg)
Passage Length (in Bytes)

| | Max | Min | Avg | Std |
|---|---|---|---|---|
| True Passages | 6928 | 27 | 399.8 | 489.4 |
| HMM Passages | 6955 | 34 | 1525.8 | 949.7 |
| Paragraph | 8670 | 60 | 2105.4 | 1136.8 |

HMM passages are generally too long!
![Page 31: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/31.jpg)
Example Passage
Prion diseases, which include Creutzfeldt-Jacob disease in humans, mad cow disease in cattle, and scrapie in sheep, involve the misfolding of the benign cellular prion protein (PrP C) 1 to the infectious disease-causing scrapie isoform PrP Sc. The prion protein (PrP C) is a copper-binding cell surface glycoprotein. The role of copper in the normal function of PrP, as well as in prion diseases, has been the subject of a number of excellent reviews. The mature cellular form of PrP consists of residues 23 to 231 and is tethered to the cell surface via a glycosylphosphatidylinositol anchor at the C terminus. There are now a number of NMR solution structures of copper-free mammalian PrPs. A crystal structure of PrP C has also been published; this structure is dimeric involving domain swapping of the monomeric form.
![Page 32: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/32.jpg)
![Page 33: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/33.jpg)
Conclusions and Future Work
• The two language modeling methods in general work well in the genomics domain
– Regularized feedback estimation can effectively eliminate parameter α
– HMM passages improve over paragraphs
• User relevance feedback is effective
• Limitations and future work
– Regularized feedback estimation still has parameter η to tune
  • How to eliminate η?
– The inherent coherence property of HMM passages may not suit the task well
  • Different/better HMM architecture?
![Page 34: Robust Pseudo Feedback & HMM Passage Extraction UIUC at TREC 2006 Genomics Track](https://reader035.vdocument.in/reader035/viewer/2022070409/5681443d550346895db0d8f4/html5/thumbnails/34.jpg)
The End
• Questions?