wong cheuk fun presentation on keyword search. head, modifier, and constraint detection in short...
Post on 01-Jan-2016
Presentation on Keyword Search
Wong Cheuk Fun

Head, Modifier, and Constraint Detection in Short Texts
Zhongyuan Wang, Haixun Wang, Zhirui Hu

Popular iphone 5s smart cover
Modifiers, Constraint, Head
90% of distinct queries consist of 2 or more components

Detection Challenges
No grammar rules
  Popular iphone 5s smart cover vs Popular smart cover iphone 5s
Require external knowledge
  Job search vs Job interview
Instance-level head-modifier knowledge
Conceptual knowledge
Concept-level head-modifier knowledge

Detection Approach
(concept[head], concept[modifier], score)
e.g. (accessory[head], device[modifier], 0.9)
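
A minimal sketch of how a table of such concept patterns could be used to pick the head of a two-concept pair. The table PATTERNS and the function detect_head are hypothetical names; only the 0.9 entry comes from the slide, and the reverse entry is made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical pattern table: (head concept, modifier concept) -> score.
# Only the first entry appears on the slide; the second is invented.
PATTERNS: Dict[Tuple[str, str], float] = {
    ("accessory", "device"): 0.9,
    ("device", "accessory"): 0.1,
}


def detect_head(concept_a: str, concept_b: str) -> str:
    """Return whichever concept scores higher as the head of the pair."""
    a_as_head = PATTERNS.get((concept_a, concept_b), 0.0)
    b_as_head = PATTERNS.get((concept_b, concept_a), 0.0)
    return concept_a if a_as_head >= b_as_head else concept_b


print(detect_head("device", "accessory"))
```

Both orientations are looked up, so the order of the terms in the query does not matter.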

Three major challenges:
Knowledge coverage to handle all possible input
Avoid deriving conflicting patterns
Identify constraints from non-constraint modifiers

Mining Concept Patterns -- Probase
IsA taxonomy
Entities vs concepts
  (Barack Obama) vs USA president
2.7 million concepts
P(e|c) tells how popular e is as far as concept c is concerned, and P(c|e) vice versa.
e.g. P(Fujitsu|Computer) > P(Acer|Computer)
n(e, c) denotes how often e and c occur together

Mining Concept Patterns: Instance-level Head-Modifiers
Identify the head and modifiers no matter what their order is
  smart cover for iphone 5s
Other prepositions: of, with, in, on, at
When they are used (A for B, A of B, A with B), it is almost always true that A is the head and B is the constraint.
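
The preposition rule can be sketched as a small extractor. This is an illustration only: split_on_preposition is a hypothetical helper, and splitting at the first preposition is an assumption, not something the slides specify.

```python
import re
from typing import Optional, Tuple

# Prepositions listed on the slide.
PREPOSITIONS = ("for", "of", "with", "in", "on", "at")

# "A <prep> B": A is taken as the head, B as the constraint.
_PATTERN = re.compile(
    r"^(?P<head>.+?)\s+(?:%s)\s+(?P<constraint>.+)$" % "|".join(PREPOSITIONS)
)


def split_on_preposition(query: str) -> Optional[Tuple[str, str]]:
    """Return (head, constraint) if the query has the form 'A prep B', else None."""
    match = _PATTERN.match(query.strip().lower())
    if match is None:
        return None
    return match.group("head"), match.group("constraint")


print(split_on_preposition("smart cover for iphone 5s"))
```

Pairs harvested this way give instance-level (head, constraint) evidence that does not depend on word order in other queries.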

Mining Concept Patterns: Concept-level Head-Modifiers
Levels of Conceptualization
(head, modifier, score)
(smart cover, iphone 5s) is too specific; (obj, obj) is too general
Conflicting rules: (company, device) vs (device, company)

Conceptualizing instances
1. Map e to c if P(c|e) is among top k;
2. Map e to c if P(e|c) is among top k;
3. Map e to c if P(c|e)*P(e|c) is among top k;
4. Map e to itself if e is itself a concept
The first two are not desirable, as they are either too general or too specific.
For (3), a larger value is evidence of closeness between c and e.
For (4), we use entropy to identify popular instances.
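
Option (3) can be sketched from co-occurrence counts. The counts below are made up, and the normalisations P(e|c) = n(e,c) / sum of n(e',c) over e' and P(c|e) = n(e,c) / sum of n(e,c') over c' are assumptions about how the probabilities are estimated. Option (4) and the entropy test are not covered here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical co-occurrence counts n(e, c); the numbers are invented.
COUNTS: Dict[Tuple[str, str], int] = {
    ("iphone 5s", "smartphone"): 30,
    ("iphone 5s", "device"): 50,
    ("iphone 5s", "object"): 20,
    ("galaxy s4", "smartphone"): 30,
    ("galaxy s4", "device"): 40,
    ("laptop", "device"): 110,
    ("chair", "object"): 180,
}


def conceptualize(entity: str, counts: Dict[Tuple[str, str], int], k: int = 2) -> List[str]:
    """Return the top-k concepts for `entity`, ranked by P(c|e) * P(e|c)."""
    entity_total: Dict[str, int] = defaultdict(int)
    concept_total: Dict[str, int] = defaultdict(int)
    for (e, c), n in counts.items():
        entity_total[e] += n
        concept_total[c] += n

    scored = []
    for (e, c), n in counts.items():
        if e != entity or n == 0:
            continue
        p_c_given_e = n / entity_total[e]
        p_e_given_c = n / concept_total[c]
        scored.append((c, p_c_given_e * p_e_given_c))

    scored.sort(key=lambda item: item[1], reverse=True)
    return [concept for concept, _ in scored[:k]]


print(conceptualize("iphone 5s", COUNTS, k=2))
```

With these counts, P(c|e) alone would rank device first, while the product ranks smartphone first, which illustrates why (3) balances generality against specificity.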

Mining Concept Patterns: Conceptualizing Pairs
The term apple conceptualizes to fruit or company
CEO for apple gives (CEO, fruit) and (CEO, company)
Obviously, (CEO, fruit) is wrong.
Wrong concept pairs that are introduced will be filtered out because of their low scores.
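
A rough sketch of why a wrong pair such as (CEO, fruit) ends up with a low score: evidence is accumulated over many instance pairs, and only the correct concept pair keeps receiving support. The concept weights, the instance pairs, and the additive scoring are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical concept weights for each term (invented values).
CONCEPTS: Dict[str, Dict[str, float]] = {
    "ceo": {"ceo": 1.0},  # a term that is itself a concept maps to itself
    "apple": {"fruit": 0.5, "company": 0.5},
    "microsoft": {"company": 1.0},
    "google": {"company": 1.0},
}

# Hypothetical instance-level (head, modifier) pairs.
INSTANCE_PAIRS: List[Tuple[str, str]] = [
    ("ceo", "apple"),
    ("ceo", "microsoft"),
    ("ceo", "google"),
]


def score_concept_pairs(
    pairs: List[Tuple[str, str]], concepts: Dict[str, Dict[str, float]]
) -> Dict[Tuple[str, str], float]:
    """Sum weighted evidence for every (head concept, modifier concept) pair."""
    scores: Dict[Tuple[str, str], float] = defaultdict(float)
    for head, modifier in pairs:
        for (hc, hw), (mc, mw) in product(concepts[head].items(), concepts[modifier].items()):
            scores[(hc, mc)] += hw * mw
    return dict(scores)


scores = score_concept_pairs(INSTANCE_PAIRS, CONCEPTS)
kept = {pair for pair, score in scores.items() if score >= 1.0}
print(scores, kept)
```

The cut-off of 1.0 is arbitrary; any threshold between the two scores would drop (ceo, fruit) here.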

Head and Modifier Detection: Parsing
1. Texts are parsed using Probase (e.g. New York and New York Times)
2. Remove non-constraint modifiers
3. Cluster terms
   Cluster short texts having more than one head (e.g. apple ipad microsoft surface)
   Reduce the pairs for conceptualization
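
Step 1 can be illustrated with a greedy longest-match segmenter, which is one common way to prefer New York Times over New York when both are known terms. The slides do not say which matching strategy is used, so the greedy strategy, the toy vocabulary VOCAB, and the function segment are all assumptions.

```python
from typing import List, Set

# Hypothetical stand-in for the set of terms known to the knowledge base.
VOCAB: Set[str] = {"new york", "new york times", "subscription"}


def segment(text: str, vocabulary: Set[str], max_len: int = 4) -> List[str]:
    """Split text into terms, preferring the longest known term at each position."""
    tokens = text.lower().split()
    terms: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first; a single token is always accepted.
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + length])
            if length == 1 or candidate in vocabulary:
                terms.append(candidate)
                i += length
                break
    return terms


print(segment("new york times subscription", VOCAB))
print(segment("new york hotel", VOCAB))
```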
Head and Modifier Detection for 2 components
Head and Modifier Detection for > 2 components
Modifiers can thus be ranked by their closeness to the head.
For the query college football player, we remove the likely weakest edge, college player.
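
A small sketch of the weakest-edge step for the college football player example. The pairwise scores are invented, and drop_weakest_edge is a hypothetical helper; how the scores are actually derived is not shown in the transcript.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical pairwise relatedness scores between terms (invented values).
SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"college", "football"}): 0.7,
    frozenset({"football", "player"}): 0.9,
    frozenset({"college", "player"}): 0.2,
}


def drop_weakest_edge(
    terms: List[str], scores: Dict[FrozenSet[str], float]
) -> List[Tuple[str, ...]]:
    """Build all term pairs and remove the single lowest-scoring one."""
    edges = [frozenset(pair) for pair in combinations(terms, 2)]
    weakest = min(edges, key=lambda edge: scores.get(edge, 0.0))
    return [tuple(sorted(edge)) for edge in edges if edge != weakest]


remaining = drop_weakest_edge(["college", "football", "player"], SCORES)
print(remaining)
```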

Mining non-constraint modifiers
Top query Seattle, good travelling hostel
Non-constraint modifiers: Top, good
Non-constraint modifiers are more likely to appear on the left of the query
  e.g. cheap red shoe instead of red cheap shoe
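
One simple way to check the left-position observation on a query log is to measure how often a modifier is the first token of the queries that contain it. This is only an illustrative statistic on made-up queries, not the mining procedure from the presentation; leftmost_ratio is a hypothetical helper.

```python
from typing import List

# Invented sample queries.
QUERIES: List[str] = [
    "cheap red shoe",
    "cheap hotel seattle",
    "red cheap shoe",
    "red shoe",
    "big red bus",
]


def leftmost_ratio(queries: List[str], modifier: str) -> float:
    """Fraction of queries containing `modifier` in which it is the first token."""
    containing = [q.lower().split() for q in queries if modifier in q.lower().split()]
    if not containing:
        return 0.0
    first = sum(1 for tokens in containing if tokens[0] == modifier)
    return first / len(containing)


print(leftmost_ratio(QUERIES, "cheap"), leftmost_ratio(QUERIES, "red"))
```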

Mining non-constraint modifiers using Probase
2.7 million concepts

Mining non-constraint modifiers: mining process
Construct modifier networks based on observations
Calculate the score of each node as a non-constraint modifier in the networks
Lower PMS makes it a non-constraint modifier
Framework for head, modifier and constraint detection

On Masking Topical Intent in Keyword Search
Peng Wang and Chinya V. Ravishankar

Keyword-Based Obfuscation
Hide the real query in a mass of dummy queries generated using a Dummy Query Generation Algorithm (DGA).
Advantage: Purely client-based
Disadvantage: Not secure; cannot ensure that real and dummy queries are indistinguishable
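
The client-side idea can be sketched as mixing the real query into a shuffled batch of dummies. The slides do not describe the DGA itself, so random sampling from a fixed pool is a placeholder assumption, and obfuscate is a hypothetical function name.

```python
import random
from typing import List, Optional


def obfuscate(real_query: str, dummy_pool: List[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return the real query hidden among k dummy queries, in random order."""
    rng = random.Random(seed)
    dummies = rng.sample(dummy_pool, k)  # placeholder for a real DGA
    batch = dummies + [real_query]
    rng.shuffle(batch)
    return batch


pool = ["weather tomorrow", "pasta recipe", "train timetable", "movie reviews"]
batch = obfuscate("job interview tips", pool, k=3, seed=0)
print(batch)
```

Dummies drawn this naively may differ from real queries in ways an observer can detect, which is the weakness the slide points out.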

Topical Intent Obfuscation
For a real user query q, dummy queries are created matching other topics.
* Topic relevance ensures obfuscation
Under two thresholds ε1 and ε2 (ε2 < ε1), with topic t and query q:
Pr[t]: t's relevance based on the general interest pattern
Pr[t|q]: t's relevance after taking q into account
Pr[t|q] - Pr[t] > ε1: t is relevant to q.
Aim: Pr[t|q] - Pr[t] < ε2, to create irrelevant dummy queries
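
The two threshold tests can be written down directly. This is a sketch under assumptions: the parameter names eps_relevant and eps_dummy are hypothetical labels for the two thresholds, the probabilities are invented, and reading the second test as a check on a candidate dummy query is an interpretation of the slide.

```python
def is_relevant(p_t_given_q: float, p_t: float, eps_relevant: float) -> bool:
    """Topic t is relevant to q if observing q raises its probability by more than the threshold."""
    return p_t_given_q - p_t > eps_relevant


def is_irrelevant_dummy(p_t_given_d: float, p_t: float, eps_dummy: float) -> bool:
    """A candidate dummy query d is acceptable if it barely raises the probability of topic t."""
    return p_t_given_d - p_t < eps_dummy


# Invented numbers: prior interest in topic t, and its probability given each query.
p_t = 0.05
print(is_relevant(0.40, p_t, eps_relevant=0.2))
print(is_irrelevant_dummy(0.06, p_t, eps_dummy=0.05))
```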