Cheshire at GeoCLEF 2008: Text and Fusion Approaches for GIR
Ray R. Larson
School of Information
University of California, Berkeley
GeoCLEF 2008 -- Aarhus September 18, 2008
Motivation
In previous GeoCLEF evaluations we found very mixed results from various methods of query expansion, attempts at explicit geographic constraints, etc.
Last year we decided to try just our “basic” retrieval method, i.e., logistic regression with blind feedback
The goal was to establish baseline data that we can use to test selective additions in later experiments
Motivation
Because the “baselines” worked well last year, we decided to continue with them and begin testing “fusion” approaches for combining the results of different retrieval algorithms
This was due in part to Neuchatel’s use of fusion approaches with good results and our previous use of fusion approaches in earlier CLEF tasks
Experiments
TD, TDN, and TDN Fusion for Monolingual English, German, Portuguese (9 runs)
TD, TDN, and TDN Fusion for Bilingual X to English, German, and Portuguese (18 runs)
Monolingual
Run Name          Task                     Characteristics    MAP
BERKGCMODETD      Monolingual German       TD auto            0.2295 *
BERKGCMODETDN     Monolingual German       TDN auto           0.205
BERKMODETDNPIV    Monolingual German       TDN auto fusion    0.2292
BERKGCMOENTD      Monolingual English      TD auto            0.2652
BERKGCMOENTDN     Monolingual English      TDN auto           0.2001
BERKMOENTDNPIV    Monolingual English      TDN auto fusion    0.2685 *
BERKGCMOPTTD      Monolingual Portuguese   TD auto            0.217
BERKGCMOPTTDN     Monolingual Portuguese   TDN auto           0.1741
BERKMOPTTDNPIV    Monolingual Portuguese   TDN auto fusion    0.2310 *
Bilingual
Run Name           Task                             Characteristics    MAP
BERKGCBIENDETD     Bilingual English->German        TD auto            0.215
BERKGCBIENDETDN    Bilingual English->German        TDN auto           0.1682
BERKBIENDETDNPIV   Bilingual English->German        TDN auto fusion    0.2251 *
BERKGCBIPTDETD     Bilingual Portuguese->German     TD auto            0.195
BERKGCBIPTDETDN    Bilingual Portuguese->German     TDN auto           0.1108
BERKBIPTDETDNPIV   Bilingual Portuguese->German     TDN auto fusion    0.1912
BERKGCBIDEENTD     Bilingual German->English        TD auto            0.2274
BERKGCBIDEENTDN    Bilingual German->English        TDN auto           0.1894
BERKBIDEENTDNPIV   Bilingual German->English        TDN auto fusion    0.2304 *
BERKGCBIPTENTD     Bilingual Portuguese->English    TD auto            0.1886
BERKGCBIPTENTDN    Bilingual Portuguese->English    TDN auto           0.154
BERKBIPTENTDNPIV   Bilingual Portuguese->English    TDN auto fusion    0.2101
BERKGCBIDEPTTD     Bilingual German->Portuguese     TD auto            0.1346
BERKGCBIDEPTTDN    Bilingual German->Portuguese     TDN auto           0.126
BERKBIDEPTTDNPIV   Bilingual German->Portuguese     TDN auto fusion    0.1488
BERKGCBIENPTTD     Bilingual English->Portuguese    TD auto            0.1913
BERKGCBIENPTTDN    Bilingual English->Portuguese    TDN auto           0.1762
BERKBIENPTTDNPIV   Bilingual English->Portuguese    TDN auto fusion    0.2074 *
TDN Fusion
A: result of TD logistic regression with blind feedback
B: result of TDN Okapi BM-25
A and B normalized using MinMax to [0:1]
NewWt = (B * piv) + (A * (1 - piv)), with piv = 0.29
Final result: the combined NewWt scores
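The fusion step on this slide can be sketched in a few lines of Python. This is only an illustration of the formula as stated (MinMax normalization of each result list, then a weighted sum with piv = 0.29), not the Cheshire implementation. The function names `min_max_normalize` and `fuse_scores` are hypothetical. Two details are assumptions because the slide does not specify them: a document missing from one list contributes 0 for that list, and a list whose scores are all equal is mapped to 0.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping linearly onto [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: with no spread in the scores, map every document to 0.0.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_scores(a_scores, b_scores, piv=0.29):
    """Combine two result lists as NewWt = (B * piv) + (A * (1 - piv)).

    a_scores: scores from run A (on the slide, TD logistic regression
              with blind feedback)
    b_scores: scores from run B (on the slide, TDN Okapi BM-25)
    Returns (doc_id, NewWt) pairs sorted by descending NewWt.
    """
    a = min_max_normalize(a_scores)
    b = min_max_normalize(b_scores)
    fused = {}
    for doc in set(a) | set(b):
        # Assumption: a document absent from one list scores 0.0 in that list.
        fused[doc] = b.get(doc, 0.0) * piv + a.get(doc, 0.0) * (1.0 - piv)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Toy scores, invented purely for illustration.
run_a = {"d1": 2.0, "d2": 1.0, "d3": 0.0}
run_b = {"d1": 10.0, "d2": 30.0, "d4": 20.0}
for doc_id, new_wt in fuse_scores(run_a, run_b):
    print(doc_id, round(new_wt, 3))
```

With piv = 0.29, run A carries 71% of the combined weight, so in this sketch the BM-25 list mostly adjusts the ordering produced by the logistic regression run and does not dominate it.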
Results
Fusion of logistic regression with blind feedback and Okapi BM-25 produced most of our best-performing runs, though the improvement was not always dramatic
With a single algorithm, use of the Narrative is counter-productive; using Title and Description alone gives better results with these algorithms
Does blind feedback accomplish some of the geographic expansion explicit in the narrative?
Comparison of Berkeley Results 2006, 2007-2008
Task                              MAP 2006   MAP 2007   MAP 2008   Pct. Diff ’07-’08
Monolingual English               0.250      0.264      0.268*     1.493
Monolingual German                0.215      0.139      0.230      39.565
Monolingual Portuguese            0.162      0.174      0.231*     24.675
Bilingual English -> German       0.156      0.090      0.225*     60.000
Bilingual English -> Portuguese   0.1260     0.201      0.207*     2.899
* using fusion
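The slide does not say how the Pct. Diff column was computed. The figures are consistent with taking the 2007-to-2008 gain as a percentage of the 2008 MAP, not of the 2007 MAP. The sketch below makes that inferred reading explicit and recomputes the column from the MAP values as printed in the table; the function name `pct_diff` is hypothetical.

```python
def pct_diff(map_2007, map_2008):
    """Gain from 2007 to 2008 as a percentage of the 2008 score.

    Assumption: this definition is inferred from the table values,
    not stated in the source.
    """
    return (map_2008 - map_2007) / map_2008 * 100.0


# (MAP 2007, MAP 2008) pairs copied from the comparison table.
results = {
    "Monolingual English": (0.264, 0.268),
    "Monolingual German": (0.139, 0.230),
    "Monolingual Portuguese": (0.174, 0.231),
    "Bilingual English -> German": (0.090, 0.225),
    "Bilingual English -> Portuguese": (0.201, 0.207),
}

for task, (m07, m08) in results.items():
    print(f"{task}: {pct_diff(m07, m08):.3f}")
```

Under this reading, the German figure of 39.565 means that about 40% of the 2008 score is improvement over 2007. Measured against the 2007 score, the same gain would be a considerably larger percentage.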
What happened in 2007 German?
We speculated last year that it was:
No decompounding: 2006 used Aitao Chen’s decompounding (no)
Worse translation? Possibly, since different MT systems were used, but they were the same for 2007 and 2008, so no
Incomplete stoplist? Was it really the same? (yes)
Was stemming the same? (yes)
Why did German work better for us in 2008?
That was all speculation, but…
It REALLY helps if you include the entire database
Our 2007 German runs did not include any documents from the SDA collection!
What Next?
Finally start adding back true geographic processing and test where and why (and if) results are improved
Get decompounding working with German