Question Processing: Formulation & Expansion
Ling573 NLP Systems and Applications
May 2, 2013

Deeper Processing for Query Formulation

MULDER (Kwok, Etzioni, & Weld)
  Converts question to multiple search queries
    Forms which match target
  Vary specificity of query
    Most general: bag of keywords
    Most specific: partial/full phrases
  Subsets: 4 query forms on average
  Employs full parsing augmented with morphology
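
The idea of varying query specificity can be sketched in a few lines. This is only an illustration, not MULDER's implementation: MULDER works from a full parse, whereas this sketch uses a hypothetical stopword list (`STOPWORDS`) and a hypothetical helper `query_forms` that operates on whitespace tokens.

```python
# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"who", "what", "when", "where", "was", "is", "the", "in", "of", "a"}

def query_forms(question):
    """Return query strings ordered from most general to most specific."""
    tokens = question.rstrip("?").split()
    keywords = [t for t in tokens if t.lower() not in STOPWORDS]
    forms = [" ".join(keywords)]  # most general: bag of keywords

    # Partial phrases: quoted runs of two or more consecutive content words.
    run = []
    for tok in tokens + [None]:  # the trailing None flushes the last run
        if tok is not None and tok.lower() not in STOPWORDS:
            run.append(tok)
        else:
            if len(run) > 1:
                forms.append('"' + " ".join(run) + '"')
            run = []

    # Most specific: the question body (minus the wh-word) as one phrase.
    forms.append('"' + " ".join(tokens[1:]) + '"')
    return forms

for form in query_forms("Who was the first American in space?"):
    print(form)
```

A retrieval front end could issue these forms in order of decreasing specificity and fall back to the keyword bag when the phrase queries return nothing.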

Question Parsing

Creates full syntactic analysis of question
  Maximum Entropy Inspired (MEI) parser
  Trained on WSJ
Challenge: Unknown words
  Parser has limited vocabulary
  Uses guessing strategy
    Bad: “tungsten” -> number
Solution:
  Augment with morphological analysis: PC-KIMMO
  If PC-KIMMO fails? Guess Noun
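
The fallback chain for unknown words (parser lexicon, then morphological analysis, then "guess noun") might look roughly like the sketch below. Everything here is a stand-in: `LEXICON` is a toy vocabulary, and `morph_tag` is a hypothetical suffix table used in place of a real analyzer such as PC-KIMMO, whose interface is not shown in the slides.

```python
# Toy parser vocabulary (assumption for illustration).
LEXICON = {"the": "DT", "metal": "NN", "melts": "VBZ"}

# Hypothetical suffix table standing in for a morphological analyzer.
SUFFIX_TAGS = [("ing", "VBG"), ("ly", "RB"), ("ed", "VBD")]

def morph_tag(word):
    """Return a POS tag from suffix evidence, or None if analysis fails."""
    for suffix, tag in SUFFIX_TAGS:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return tag
    return None

def tag_word(word):
    """Lexicon lookup, then morphology, then fall back to noun."""
    w = word.lower()
    if w in LEXICON:
        return LEXICON[w]
    return morph_tag(w) or "NN"  # final fallback: guess noun

print(tag_word("tungsten"))  # unknown, no suffix evidence: tagged as noun
print(tag_word("smelted"))   # unknown, but the suffix suggests a past-tense verb
```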

Question Classification

Simple categorization:
  Nominal, numerical, temporal
  Hypothesis: Simplicity -> High accuracy
  Also avoids complex training, ontology design
Parsing used in two ways:
  Constituent parser extracts wh-phrases:
    e.g. wh-adj: how many -> numerical; wh-adv: when, where
    wh-noun: type? any
      what height vs what time vs what actor
  Link parser identifies verb-object relation for wh-noun
    Uses WordNet hypernyms to classify object, Q
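
A minimal sketch of the three-way classification, assuming the wh-phrase can be read off the first two tokens. The real system uses a constituent parser, a link parser, and WordNet hypernyms; here `NUMERIC_ADJECTIVES` and `HEAD_NOUN_CLASS` are small hand-written tables standing in for those resources, and treating "where" as nominal is an assumption of this sketch.

```python
# Hand-written stand-ins for parser output and WordNet hypernym lookup.
NUMERIC_ADJECTIVES = {"many", "much", "tall", "long", "far", "old"}
HEAD_NOUN_CLASS = {"height": "numerical", "time": "temporal", "actor": "nominal"}

def classify(question):
    """Classify a question as 'nominal', 'numerical', or 'temporal'."""
    tokens = question.lower().rstrip("?").split()
    if not tokens:
        return "nominal"
    wh = tokens[0]
    nxt = tokens[1] if len(tokens) > 1 else ""
    if wh == "how" and nxt in NUMERIC_ADJECTIVES:
        return "numerical"                        # wh-adjective phrase
    if wh == "when":
        return "temporal"                         # wh-adverb
    if wh in ("what", "which"):
        return HEAD_NOUN_CLASS.get(nxt, "nominal")  # wh-noun: depends on head noun
    return "nominal"                              # default, including "where"

for q in ("How many moons does Mars have?",
          "What time does the museum open?",
          "What actor played Gandalf?"):
    print(q, "->", classify(q))
```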

Syntax for Query Formulation

Parse-based transformations:
  Applies transformational grammar rules to questions
Example rules:
  Subject-auxiliary movement:
    Q: Who was the first American in space?
    Alt: was the first American…; the first American in space was
  Subject-verb movement:
    Who shot JFK? => shot JFK
  Etc
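
The two example rules can be approximated on token sequences, as in the sketch below. It is not a parse-based implementation: it assumes the wh-word is the first token and that any auxiliary comes immediately after it, and the names `AUXILIARIES` and `reformulate` are hypothetical.

```python
# Small auxiliary list, assumed for this illustration.
AUXILIARIES = {"was", "is", "were", "are"}

def reformulate(question):
    """Return declarative-style rewrites of a wh-question."""
    tokens = question.rstrip("?").split()
    rest = tokens[1:]  # drop the wh-word
    if rest and rest[0].lower() in AUXILIARIES:
        aux, phrase = rest[0], rest[1:]
        # Subject-auxiliary movement: auxiliary before or after the phrase.
        return [" ".join([aux] + phrase), " ".join(phrase + [aux])]
    # Subject-verb movement: drop the wh-subject, keep the verb phrase.
    return [" ".join(rest)]

print(reformulate("Who was the first American in space?"))
print(reformulate("Who shot JFK?"))
```

Each rewrite would typically be sent as a quoted phrase query, so that it matches text where the answer directly precedes or follows the phrase.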

More General Query Processing

WordNet Query Expansion
  Many lexical alternations: ‘How tall’ -> ‘The height is’
  Replace adjectives with corresponding ‘attribute noun’
Verb conversion:
  Morphological processing
    DO-AUX …. V-INF -> V+inflection
    Generation via PC-KIMMO
Query formulation contributes significantly to effectiveness
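
The DO-AUX verb conversion above can be illustrated with a toy generator. This is a sketch under strong assumptions: the question is already split into auxiliary, subject, verb, and remainder (a parser would do this), and `inflect` covers only simple regular patterns plus a tiny hypothetical irregular table. It does not handle consonant doubling ("stop" -> "stopped"), which a real morphological generator such as PC-KIMMO would.

```python
# Tiny irregular table, assumed for illustration only.
IRREGULAR_PAST = {"go": "went", "win": "won", "write": "wrote", "shoot": "shot"}

def inflect(verb, aux):
    """Move the tense/agreement carried by a DO auxiliary onto the verb."""
    consonant_y = len(verb) > 1 and verb.endswith("y") and verb[-2] not in "aeiou"
    if aux == "did":
        if verb in IRREGULAR_PAST:
            return IRREGULAR_PAST[verb]
        if verb.endswith("e"):
            return verb + "d"
        return verb[:-1] + "ied" if consonant_y else verb + "ed"
    if aux == "does":
        if consonant_y:
            return verb[:-1] + "ies"
        if verb.endswith(("s", "sh", "ch", "x", "o")):
            return verb + "es"
        return verb + "s"
    return verb  # "do": plain form

def declarative(aux, subject, verb, rest):
    """Build 'SUBJECT V+inflection REST' from a pre-segmented question."""
    parts = (subject, inflect(verb, aux.lower()), rest)
    return " ".join(p for p in parts if p)

# "When did Nixon visit China?" segmented by hand:
print(declarative("did", "Nixon", "visit", "China"))
```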

Machine Learning Approaches

Diverse approaches:
  Assume annotated query logs, annotated question sets, matched query/snippet pairs
Learn question paraphrases (MSRA)
  Improve QA by setting question sites
  Improve search by generating alternate question forms
Question reformulation as machine translation
  Given question logs, click-through snippets
  Train machine learning model to transform Q -> A

Query Expansion

Basic idea:
  Improve matching by adding words with similar meaning/similar topic to query
Alternative strategies:
  Use fixed lexical resource
    E.g. WordNet
  Use information from document collection
    Pseudo-relevance feedback
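
Pseudo-relevance feedback can be sketched as follows: treat the top-ranked documents from an initial retrieval as if they were relevant, and add their most frequent new terms to the query. The function name `expand_query`, its parameters, and the raw-frequency term selection are all assumptions of this sketch; practical systems usually weight candidate terms more carefully.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, top_docs=2, new_terms=2,
                 stopwords=frozenset()):
    """Append the most frequent unseen terms from the top-ranked documents.

    query_terms are assumed to be lowercase already.
    """
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        for tok in doc.lower().split():
            if tok not in query_terms and tok not in stopwords:
                counts[tok] += 1
    return list(query_terms) + [t for t, _ in counts.most_common(new_terms)]

# Toy ranked list from a hypothetical first-pass retrieval.
ranked = [
    "vesuvius erupted burying pompeii",
    "mount vesuvius erupted in 79",
    "vineyards near naples",
]
print(expand_query(["pompeii", "volcano"], ranked, stopwords={"in"}))
```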

WordNet Based Expansion

In Information Retrieval settings, mixed history
  Helped, hurt, or no effect
  With long queries & long documents, no/bad effect
Some recent positive results on short queries
  E.g. Fang 2008
    Contrasts different WordNet, Thesaurus similarity
    Add semantically similar terms to query
      Additional weight factor based on similarity score

Similarity Measures

Definition similarity: Sdef(t1,t2)
  Word overlap between glosses of all synsets
  Divided by total number of words in all synsets' glosses
Relation similarity:
  Get value if terms are:
    Synonyms, hypernyms, hyponyms, holonyms, or meronyms
Term similarity score from Lin’s thesaurus
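
A gloss-overlap score in the spirit of the definition similarity above can be sketched as below. The normalization is an assumption: this version divides the number of shared words by the number of distinct words across both terms' glosses, which is one reading of "total number of words in all synsets' glosses" and may differ from the exact formula in Fang 2008. The glosses are invented for illustration rather than taken from WordNet.

```python
def definition_similarity(glosses1, glosses2):
    """Shared gloss words divided by all distinct gloss words of both terms."""
    words1 = {w for gloss in glosses1 for w in gloss.lower().split()}
    words2 = {w for gloss in glosses2 for w in gloss.lower().split()}
    total = words1 | words2
    return len(words1 & words2) / len(total) if total else 0.0

# Invented glosses, one list entry per synset of the term.
car = ["a motor vehicle with four wheels"]
truck = ["a wheeled motor vehicle for hauling"]
print(round(definition_similarity(car, truck), 3))
```

Because the score is computed from gloss text rather than from typed relations, it can relate terms of different parts of speech and yields graded values rather than a yes/no decision.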

Results

Definition similarity yields significant improvements
  Allows matching across POS
  More fine-grained weighting than binary relations

Managing Morphological Variants

Bilotti et al. 2004
  “What Works Better for Question Answering: Stemming or Morphological Query Expansion?”
Goal:
  Recall-oriented document retrieval for QA
    Can’t answer questions without relevant docs
Approach:
  Assess alternate strategies for morphological variation

Question Comparison

Index time stemming
  Stem document collection at index time
  Perform comparable processing of query
  Common approach
    Widely available stemmer implementations: Porter, Krovetz
Query time morphological expansion
  No morphological processing of documents at index time
  Add additional morphological variants at query time
    Less common, requires morphological generation
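
The difference between the two strategies shows up in where normalization happens. The sketch below builds a toy inverted index twice: once with a stemmer applied to documents and query, and once with raw tokens and the variants supplied at query time. `toy_stem` is a deliberately crude, hypothetical suffix stripper, not Porter or Krovetz, and the documents are invented.

```python
from collections import defaultdict

def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs, normalize=lambda w: w):
    """Map each (normalized) term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[normalize(word)].add(doc_id)
    return index

docs = ["the hen lays eggs", "hens laying an egg", "blue paint"]

# Index-time stemming: documents and query are normalized the same way.
stemmed_index = build_index(docs, toy_stem)
stem_hits = stemmed_index[toy_stem("eggs")]

# Query-time expansion: raw index, the query carries the variants.
raw_index = build_index(docs)
expansion_hits = set().union(*(raw_index[v] for v in ("eggs", "egg")))

print(sorted(stem_hits), sorted(expansion_hits))
```

On this toy data both routes find the same documents. The practical difference is that expansion leaves the index untouched and lets each variant be chosen or weighted per query.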

Prior Findings

Mostly focused on stemming
Mixed results (in spite of common use)
  Harman found little effect in ad-hoc retrieval: Why?
    Morphological variants in long documents
    Helps some, hurts others: How?
      Stemming captures unrelated senses: e.g. AIDS -> aid
  Others:
    Large, obvious benefits on morphologically rich langs.
    Improvements even on English
      Hull: most queries improve, some improve a lot
      Monz: Index time stemming improved QA

Overall Approach

Head-to-head comparison
AQUAINT documents
Retrieval based on Lucene
  Boolean retrieval with tf-idf weighting
Compare retrieval varying stemming and expansion
Assess results

Improving a Test Collection

Observation: (We’ve seen it, too.)
  # of known relevant docs in TREC QA very small
  TREC 2002: 1.95 relevant per question in pool
  Clearly many more
Approach:
  Manually create improved relevance assessments
  Create queries from originals
    Terms that “must necessarily” appear in relevant docs
  Retrieve and verify documents
    Found 15.84 relevant per question

Example

Q: “What is the name of the volcano that destroyed the ancient city of Pompeii?” A: Vesuvius
New search query: “Pompeii” and “Vesuvius”
  Relevant: “In A.D. 79, long-dormant Mount Vesuvius erupted, burying the Roman cities of Pompeii and Herculaneum in volcanic ash.”
  Unsupported: “Pompeii was pagan in A.D. 79, when Vesuvius erupted.”
  Irrelevant: “Vineyards near Pompeii grow in volcanic soil at the foot of Mt. Vesuvius.”

Stemming & Expansion

Base query form: Conjunct of disjuncts
  Disjunction over morphological term expansions
  Rank terms by IDF
  Successive relaxation by dropping lowest IDF term
Contrasting conditions:
  Baseline: Do nothing (except stopword removal)
  Stemming: Porter stemmer applied to query, index
  Unweighted inflectional expansion:
    POS-based variants generated for non-stop query terms
  Weighted inflectional expansion: prev. + weights

Example

Q: What lays blue eggs?
  Baseline: blue AND eggs AND lays
  Stemming: blue AND egg AND lai
  UIE: blue AND (eggs OR egg) AND (lays OR laying OR lay OR laid)
  WIE: blue AND (eggs OR egg^w) AND (lays OR laying^w OR lay^w OR laid^w)
    (^w marks the weight applied to each expanded variant)
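
The conjunct-of-disjuncts query and its successive relaxation can be sketched as below. The function names `build_query` and `relaxations` are hypothetical, the IDF values are made up for the example, and the output is a plain Boolean string rather than a call into Lucene. Variant weights (the WIE condition) are left out.

```python
def build_query(expansions):
    """AND together one OR-group per query term.

    expansions: list of variant lists; the first item is the original term.
    """
    clauses = []
    for variants in expansions:
        clause = " OR ".join(variants)
        clauses.append(f"({clause})" if len(variants) > 1 else clause)
    return " AND ".join(clauses)

def relaxations(expansions, idf):
    """Yield queries, dropping the lowest-IDF term after each one."""
    remaining = list(expansions)
    while remaining:
        yield build_query(remaining)
        lowest = min(remaining, key=lambda variants: idf[variants[0]])
        remaining.remove(lowest)

expansions = [["blue"], ["eggs", "egg"], ["lays", "laying", "lay", "laid"]]
idf = {"blue": 2.0, "eggs": 3.5, "lays": 4.1}  # made-up values

for query in relaxations(expansions, idf):
    print(query)
```

The first query printed corresponds to the UIE form in the example. Each later query is tried only if the stricter one retrieves too few documents.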

Evaluation Metrics

Recall-oriented: why?
  All later processing filters
Recall @ n:
  Fraction of relevant docs retrieved at some cutoff
Total document reciprocal rank (TDRR):
  Compute reciprocal rank for rel. retrieved documents
  Sum over all documents
  Form of weighted recall, based on rank
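
Both metrics are short to state in code. This sketch follows the definitions above; the function names and the toy ranking are assumptions for illustration.

```python
def recall_at_n(ranked, relevant, n):
    """Fraction of all relevant documents found in the top n results."""
    hits = sum(1 for doc in ranked[:n] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def tdrr(ranked, relevant):
    """Total document reciprocal rank: sum of 1/rank over relevant retrieved docs."""
    return sum(1.0 / rank
               for rank, doc in enumerate(ranked, start=1)
               if doc in relevant)

ranked = ["d3", "d7", "d1", "d9"]   # system output, best first
relevant = {"d1", "d3", "d5"}       # judged relevant documents

print(recall_at_n(ranked, relevant, 2))  # one of three relevant docs in top 2
print(recall_at_n(ranked, relevant, 4))  # two of three in top 4
print(tdrr(ranked, relevant))            # 1/1 for d3 plus 1/3 for d1
```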

Results

Overall Findings

Recall:
  Porter stemming performs WORSE than baseline
    At all levels
  Expansion performs BETTER than baseline
    Tuned weighting improves over uniform
      Most notable at lower cutoffs
TDRR:
  Everything’s worse than baseline
    Irrelevant docs promoted more

Observations

Why is stemming so bad?
  Porter stemming linguistically naïve, over-conflates
    police = policy; organization = organ; European != Europe
  Expansion better motivated, constrained
Why does TDRR drop when recall rises?
  TDRR, and RR in general, very sensitive to swaps at higher ranks
    Some erroneous docs added higher
Expansion approach provides flexible weighting
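
A small made-up example shows how recall can rise while TDRR falls. In the expanded ranking a third relevant document is found, but an irrelevant one is promoted to rank 1, pushing every relevant document down. The rankings and document names are invented for illustration and do not come from the study.

```python
def recall(ranked, relevant):
    """Fraction of relevant documents that appear anywhere in the ranking."""
    return len(set(ranked) & relevant) / len(relevant)

def tdrr(ranked, relevant):
    """Sum of reciprocal ranks of the relevant documents retrieved."""
    return sum(1 / rank for rank, doc in enumerate(ranked, 1) if doc in relevant)

relevant = {"r1", "r2", "r3"}
baseline = ["r1", "x1", "r2"]        # finds two relevant docs, r1 at rank 1
expanded = ["x2", "r1", "r2", "r3"]  # finds all three, but x2 now leads

for name, ranking in (("baseline", baseline), ("expanded", expanded)):
    print(name, round(recall(ranking, relevant), 3), round(tdrr(ranking, relevant), 3))
```

Moving a relevant document from rank 1 to rank 2 alone costs 0.5 in TDRR, more than the 0.25 gained from a newly retrieved document at rank 4.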