Japanese MLR
International/JP MLR Issues
• Have to do more with less data
  – Blending different languages?
• Can’t necessarily filter adult
• May need new/different features
• Different types of queries
  – English/Bracket/Phrase/etc
• Metrics designed for English
  – China has lots more spam
  – Japan has much less spam
  – Germany looks 10-20% ahead of Google by DCG
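The slides compare engines by DCG without stating the exact gain or discount function, so the following is only a minimal sketch assuming the common form (graded relevance divided by log2 of rank + 1). The function names `dcg` and `relative_gain` are illustrative, not taken from the original system.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain for a ranked list of relevance grades.

    Assumption: gain = relevance grade, discount = log2(rank + 1).
    The slides do not say which DCG variant was actually used.
    """
    if k is not None:
        relevances = relevances[:k]
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def relative_gain(candidate, baseline):
    """Fractional difference of one DCG score over another (0.10 == 10% ahead)."""
    return (candidate - baseline) / baseline
```

A statement such as "10-20% ahead by DCG" would then correspond to `relative_gain(ours, theirs)` falling between 0.10 and 0.20, with both scores averaged over the same query set.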
JP MLR vs. English MLR
                Kanji/Hiragana   Katakana   Latin (Romaji)
Baseline        7.2              7.6        9.2
JP MLR          +4%              +2%        +1%
EN MLR          0%               +1%        +3%
Google          +3%              +4%        +6%
Examples        277              231        96
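To read the table as absolute scores, one can apply each percentage to the baseline of its column and summarize across columns by weighting with the number of examples. This is a rough sketch under two assumptions the slide does not confirm: that the percentages are relative to the Baseline row of the same column, and that an example-weighted mean of those percentages is a fair summary. The helper names are hypothetical.

```python
# Figures copied from the table; column order is
# Kanji/Hiragana, Katakana, Latin (Romaji).
BASELINE = [7.2, 7.6, 9.2]
EXAMPLES = [277, 231, 96]
CHANGE_PCT = {
    "JP MLR": [4, 2, 1],
    "EN MLR": [0, 1, 3],
    "Google": [3, 4, 6],
}


def apply_relative_change(baseline, pct):
    """Absolute score implied by a percentage change over the baseline."""
    return baseline * (1 + pct / 100)


def weighted_mean(values, weights):
    """Mean of values weighted by the number of examples per column."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    for system, changes in CHANGE_PCT.items():
        scores = [apply_relative_change(b, c) for b, c in zip(BASELINE, changes)]
        overall = weighted_mean(changes, EXAMPLES)
        print(system, [round(s, 3) for s in scores], f"{overall:+.2f}% weighted")
```

The weighted figure is only an approximation, since averaging per-column percentages is not the same as recomputing the metric over all queries.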
Different features important for JP
• http://internal.inktomi.com/~lukeb/FeatureImportance.html
• “Linkflux”
• How soon the word appears in the document
• Is the first word in query in the title
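The last two bullets describe simple positional features. A minimal sketch of how they might be computed follows, assuming the query, title, and document have already been segmented into tokens; the slides do not describe the real feature definitions, so the names and the normalization here are hypothetical.

```python
def earliest_position(term, doc_tokens):
    """How soon a term appears in the document, as a fraction of its length.

    Returns 0.0 when the term is the first token and 1.0 when it never
    appears (assumed convention, not from the slides).
    """
    try:
        index = doc_tokens.index(term)
    except ValueError:
        return 1.0
    return index / len(doc_tokens)


def first_query_word_in_title(query_tokens, title_tokens):
    """True if the first word of the query occurs anywhere in the title."""
    return bool(query_tokens) and query_tokens[0] in title_tokens
```

For example, `earliest_position("b", ["a", "b", "c", "d"])` gives 0.25, so smaller values mean the word shows up earlier in the document.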
New features for JP
• Query Word Length very important
• Query type important
• Phonetic url match
• Future:
  – vcano match
  – Matching segmented chunks
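The query type and query word length features could be derived directly from the query string. The sketch below assumes "query type" means the script buckets used in the JP MLR vs. English MLR comparison (Kanji/Hiragana, Katakana, Latin) and that word length is counted in characters per whitespace-separated word; both are guesses, and the function names are hypothetical. The Unicode block ranges are standard.

```python
def char_script(ch):
    """Classify one character by Unicode block."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"  # digits, spaces, punctuation, everything else


def query_type(query):
    """Bucket a query as kanji_hiragana, katakana, latin, mixed, or other."""
    scripts = {char_script(ch) for ch in query} - {"other"}
    if not scripts:
        return "other"
    if scripts <= {"kanji", "hiragana"}:
        return "kanji_hiragana"
    if scripts == {"katakana"}:
        return "katakana"
    if scripts == {"latin"}:
        return "latin"
    return "mixed"


def query_word_lengths(query):
    """Character length of each whitespace-separated word in the query."""
    return [len(word) for word in query.split()]
```

Under these assumptions a query like "ラーメン" falls in the katakana bucket, while "東京のホテル" mixes kanji, hiragana, and katakana and is reported as mixed.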