graph-based analysis of espresso-style minimally-supervised bootstrapping algorithms jan 15, 2010...
TRANSCRIPT
![Page 1: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/1.jpg)
Graph-based Analysis of Espresso-styleMinimally-supervised Bootstrapping
Algorithms
Jan 15, 2010
Mamoru KomachiNara Institute of Science and Technology
![Page 2: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/2.jpg)
Corpus-based extraction of semantic knowledge
2
Singapore
Hong Kong
___ visa Hong Kong
China
___ history
Australia
Egypt
Instance Pattern New instance
Alternate step by step
Input Output(Extracted from corpus)
Singapore visa
Singapore map
![Page 3: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/3.jpg)
Semantic drift is the central problem of bootstrapping
Singapore
card
visa ___ Australia
messages
greeting __
card
words
Instance Pattern New instance
Errors propagate to successive iteration
Input Output(Extracted from corpus)
Semantic category changed!
Generic patternsPatterns co-occurring with many irrelevant instances
Generic patternsPatterns co-occurring with many irrelevant instances
3
![Page 4: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/4.jpg)
Main contributions of this work
1. Suggest a parallel between semantic drift in Espresso [Pantel and Pennachiotti, 2006] style bootstrapping and topic drift in HITS [Kleinberg, 1999]
2. Solve semantic drift using “relatedness” measure (regularized Laplacian) instead of “importance” measure (HITS authority) used in link analysis community
4
![Page 5: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/5.jpg)
Table of contents
5
4.Graph-based Analysis of Espresso-style Bootstrapping Algorithms
3.Espresso-style Bootstrapping Algorithms
2. Overview of Bootstrapping Algorithms
5.Word Sense Disambiguation
6.Bilingual Dictionary Construction
7.Learning Semantic Categories
![Page 6: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/6.jpg)
Espresso Algorithm [Pantel and Pennacchiotti, 2006]
• Repeat• Pattern extraction• Pattern ranking• Pattern selection• Instance extraction• Instance ranking• Instance selection
• Until a stopping criterion is met
6
![Page 7: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/7.jpg)
Pattern/instance ranking in Espresso
Score for pattern p Score for instance i
7
r (p)
pmi(i, p)
max pmir(i)
iI
I
r(i)
pmi(i, p)
max pmir (p)
pP
P
,**,,*,
,,log),(
pyx
ypxpipmi
p: patterni: instanceP: set of patternsI: set of instancespmi: pointwise mutual informationmax pmi: max of pmi in all the patterns
and instances
![Page 8: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/8.jpg)
Espresso uses pattern-instance matrix Afor ranking patterns and instances
|P|×|I|-dimensional matrix holding the (normalized) pointwise mutual information (pmi) between patterns and instances
8
1 2 ... i ... |I|
1
2
:
p
:
|P|
[A]p,i = pmi(p,i) / maxp,i pmi(p,i)[A]p,i = pmi(p,i) / maxp,i pmi(p,i)
instance indices
pattern indices
![Page 9: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/9.jpg)
p = pattern score vectori = instance score vectorA = pattern-instance matrix
Pattern/instance ranking in Espresso
9
... pattern ranking
... instance ranking
Reliable instances are supported by reliable patterns, and vice versa
|P| = number of patterns|I| = number of instances
normalization factorsto keep score vectors not too large
![Page 10: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/10.jpg)
Espresso Algorithm [Pantel and Pennacchiotti, 2006]
• Repeat• Pattern extraction• Pattern ranking• Pattern selection• Instance extraction• Instance ranking• Instance selection
• Until a stopping criterion is met
10
For graph-theoretic analysis,
we will introduce 3 simplifications to Espresso
For graph-theoretic analysis,
we will introduce 3 simplifications to Espresso
![Page 11: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/11.jpg)
First simplification
• Compute the pattern-instance matrix• Repeat
• Pattern extraction• Pattern ranking• Pattern selection• Instance extraction• Instance ranking• Instance selection
• Until a stopping criterion is met
11
Simplification 1Remove pattern/instance extraction steps
Instead, pre-compute all patterns and instances once in the beginning of the algorithm
Simplification 1Remove pattern/instance extraction steps
Instead, pre-compute all patterns and instances once in the beginning of the algorithm
![Page 12: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/12.jpg)
Second simplification
• Compute the pattern-instance matrix• Repeat
• Pattern ranking• Pattern selection
• Instance ranking• Instance selection
• Until a stopping criterion is met
12
Simplification 2Remove pattern/instance selection stepswhich retain only highest scoring k patterns / m instances for the next iterationi.e., reset the scores of other items to 0
Instead,retain scores of all patterns and instances
Simplification 2Remove pattern/instance selection stepswhich retain only highest scoring k patterns / m instances for the next iterationi.e., reset the scores of other items to 0
Instead,retain scores of all patterns and instances
![Page 13: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/13.jpg)
Third simplification
• Compute the pattern-instance matrix• Repeat
• Pattern ranking
• Instance ranking
• Until a stopping criterion is met
13
Until score vectors p and i converge
Simplification 3No early stopping i.e., run until convergence
Simplification 3No early stopping i.e., run until convergence
![Page 14: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/14.jpg)
Simplified Espresso
Input• Initial score vector of seed instances• Pattern-instance co-occurrence matrix A
Main loopRepeat
Until i and p converge
OutputInstance and pattern score vectors i and p
14
... pattern ranking
... instance ranking
![Page 15: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/15.jpg)
Simplified Espresso is HITS
Simplified Espresso = HITS in a bipartite graph whose adjacency matrix is A
Problem
No matter which seed you start with, the same instance is always ranked topmostSemantic drift (also called topic drift in HITS)
15
The ranking vector i tends to the principal eigenvector of ATA as the iteration proceedsregardless of the seed instances!
![Page 16: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/16.jpg)
How about Espresso?
Espresso has heuristics not present in Simplified Espresso
• Early stopping• Pattern and instance selection
Do these heuristics really help reduce semantic drift?
16
![Page 17: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/17.jpg)
Experiments on semantic drift
Does the heuristics in original Espresso help reduce drift?
![Page 18: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/18.jpg)
Word sense disambiguation task of Senseval-3 English Lexical Sample
Predict the sense of “bank”
18
… the financial benefits of the bank (finance) 's employee package ( cheap mortgages and pensions, etc ) , bring this up to …
In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden
Possibly aligned to water a sort of bank(???) by a rushing river.
Training instances are annotated with their sense
Predict the sense of target word in the test set
![Page 19: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/19.jpg)
Word sense disambiguation by Espresso
Seed instance = the instance to predict its senseSystem output = k-nearest neighbor (k=3)Heuristics of Espresso• Pattern and instance selection
• # of patterns to retain p=20 (increase p by 1 on each iteration)
• # of instance to retain m=100 (increase m by 100 on each iteration)
• Early stopping19
Seed instanceSeed instance
![Page 20: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/20.jpg)
Convergence process of Espresso
20
Heuristics in Espresso helps reducing semantic drift
(However, early stopping is required for optimal performance)
Output the most frequent sense regardless of input
Espresso
Simplified Espresso
Most frequent sense (baseline)
Semantic drift occurs (always outputs the most frequent sense)
![Page 21: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/21.jpg)
Learning curve of Espresso: per-sense breakdown
21
# of most frequent sense predictions increases
Precision for infrequent senses worsens even with
original Espresso
Most frequent sense
Other senses
![Page 22: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/22.jpg)
Summary: Espresso and semantic drift
Semantic drift happens because• Espresso is designed like HITS• HITS gives the same ranking list regardless of
seeds
Some heuristics reduce semantic drift• Early stopping is crucial for optimal performance
Still, these heuristics require• many parameters to be calibrated• but calibration is difficult
22
![Page 23: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/23.jpg)
Main contributions of this work
1. Suggest a parallel between semantic drift in Espresso-like bootstrapping and topic drift in HITS (Kleinberg, 1999)
2. Solve semantic drift by graph kernels used in link analysis community
23
![Page 24: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/24.jpg)
Q. What caused drift in Espresso?
A. Espresso's resemblance to HITS
HITS is an importance computation method(gives a single ranking list for any seeds)
Why not use a method for another type of link analysis measure - which takes seeds into account?"relatedness" measure
(it gives different rankings for different seeds)
24
![Page 25: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/25.jpg)
The regularized Laplacian kernel
• A relatedness measure• Takes higher-order relations into account• Has only one parameter
25
Normalized Graph Laplacian
R n ( L)n (I L) 1
n0
Regularized Laplacian matrix
A: adjacency matrix of the graphD: (diagonal) degree matrix
β: parameterEach column of Rβ gives the rankings relative to a node
![Page 26: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/26.jpg)
Label propagation algorithm [Zhou et al. 2004]
• Form the affinity matrix W defined by if i != j and Wii = 0.
• Construct the matrix in which D is a diagonal matrix with its (i,i)-
element equal to the sum of the i-th row of W.
• Iterate until convergence, where alpha is a parameter in
(0,1)
• Let F* denote the limit of the sequence {F(t)}. Label each point xi as a label
26
X is a set of instancesand xi is an instance
![Page 27: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/27.jpg)
Laplacian label propagation algorithm
• Form the affinity matrix W defined by where A is a row-normalized instance-pattern co-
occurrence matrix
• Construct the normalized Laplacian matrix
• Iterate until convergence, where alpha is a parameter in
(0,1)
• Let F* denote the limit of the sequence {F(t)}. Label each point xi as a label
27
![Page 28: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/28.jpg)
The sequence {F(t)} converges to F* = (1-α)(I-αS)-1Y
Proof:• Suppose F(0) = Y.• By iterative algorithm,
• Since 0 < α < 1 and the eigenvalues of (-L) in [-1, 1]
• Hencefor classification, which is equivalent to
28
This is also a form of the regularized Laplacian kernel
![Page 29: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/29.jpg)
Word Sense Disambiguation
Evaluation of regularized Laplacian against Espresso
![Page 30: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/30.jpg)
Label prediction of “bank” (F measure)
Algorithm Most frequent sense
Other senses
Simplified Espresso 100.0 0.0
Espresso (after convergence)
100.0 30.2
Espresso (optimal stopping) 94.4 67.4
Regularized Laplacian (β=10-
2)92.1 62.8
30
The regularized Laplacian keeps high recall for infrequent senses
Espresso suffers from semantic drift(unless stopped at optimal stage)
![Page 31: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/31.jpg)
WSD on all nouns in Senseval-3
algorithm F measur
e
Most frequent sense (baseline)
54.5
HyperLex 64.6
PageRank 64.6
Simplified Espresso 44.1
Espresso (after convergence) 46.9
Espresso (optimal stopping) 66.5
Regularized Laplacian (β=10-
2)67.1
31
Outperforms other graph-based methods
Espresso needs optimal stopping to achieve an equivalent performance
![Page 32: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/32.jpg)
Regularized Laplacian is stable across a parameter
32
![Page 33: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/33.jpg)
Learning Semantic Categories
Evaluation of regularized Laplacian by large-scaled web data
![Page 34: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/34.jpg)
Graph-based approach to learn semantic knowledge
• We showed that some of bootstrapping algorithms can be regarded as computing similarity over a graph
34
Singapore
Hong Kong___ map
___ visa
UFJ
ANA
___ history
?
China___ airlines
Never tried large-scale data->Does it scale?
Tested on word sense disambiguation task->Is it task-independent?
Never tried large-scale data->Does it scale?
Tested on word sense disambiguation task->Is it task-independent?
![Page 35: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/35.jpg)
Bootstrapping achieves high precision, but requires calibration in practice
Pros• Dedicated to Japanese web search query logs• Outperforms previous methods on this task• Orders of magnitude faster than previous
bootstrapping methods
Cons• Needs to control many parameters and
configurations• Has little theoretical background• Does not scale to large corpus
35
![Page 36: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/36.jpg)
Graph-based methods are simple but effective on
large-scale data such as web documents
Pros• Can scale to massive amount of raw data• Firm mathematical background (Can employ
classical optimization methods)
Cons• Computational efficiency (Can be approximated)• No obvious “good” graph• Requires computational resource (CPU, disk,
memory, etc) and technical know-how
36
![Page 37: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/37.jpg)
Quetchup algorithm (QUEry Term CHUnk Processor)
• Using clickthrough logs as source of information
• Semi-supervised learning with Laplacian Label Propagation
• Efficient computation over a graph
37
![Page 38: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/38.jpg)
Learning semantic categories from clickthrough graph
• Query “Singapore” –(click)-> http://www.cikm2009.org/
38
Singapore
Hong Kong http://en.wikipedia.org/wiki/Hong_Kong
http://www.bk.mufg.jp/
UFJ
ANA
http://www.singaporair.com/hk.jsp
Chinahttp://www.china-airlines.co.jp/
http://www.ana.co.jp/
http://www.acl-ijcnlp-2009.org/
http://www.cikm2009.org/
![Page 39: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/39.jpg)
Experimental setting
• Search logs: Japanese web search logs collected in August 2008; retained the top 10 million distinct queries
• Target categories: Travel and Finance [Komachi and Suzuki, 2008]
• Evaluation: precision at k; relative recall [Pantel and Ravichandran, 2004]
39
where RA|B is relative recall of system A given B,CX is the number of correct output from system X,C is the true number of the correct instances,PX is precison of system X, and |X| is the number of input instances for system X
![Page 40: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/40.jpg)
Seed instances for each category
Category Seed
Travel jal (Japan Airlines),ana (All Nippon Airways),jr (Japan Railways), じゃらん (jalan: online travel guide site), his (H.I.S.Co.,Ltd.: travel agency)
Finance みずほ銀行 (Mizuho Bank),三井住友銀行 (Sumitomo Mitsui Banking Corporation), jcb,新生銀行 (Shinsei Bank), 野村證券 (Nomura Securities)
40
![Page 41: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/41.jpg)
Precision of Travel domain
41
The regularized Laplacian gives best precision
![Page 42: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/42.jpg)
Precision of Finance domain
42
Again, the regularized Laplacian gives best precision
![Page 43: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/43.jpg)
Relative recall of Travel domain
Clickthrough logs not only achieve high precision but also high recall
43
![Page 44: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/44.jpg)
Relative recall of Finance domain
44
![Page 45: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/45.jpg)
Relative recall of Quetchup with varying click:query
45
![Page 46: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/46.jpg)
Precision of Quethcup with varying click:query
46
![Page 47: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/47.jpg)
Instances and patterns with the top 10 highest scores
System
Instance Pattern
Quetchupclick
じゃらん 宿泊 (jalan accommodation), じゃラン (jalan), ジャラン (jalan), jarann, jaran, じゃらん net (jalan.net), jalan, じゅらん (julan), ana 予約 (ana reservation), ana.co.jp
www.jalan.net/, www.ana.co.jp/, www.his-j.com/, www.jreast.co.jp/, www.jtb.co.jp/, www.jtb.co.jp/ace/, www.westjr.co.jp/, www.jtb.co.jp/kaigai/, nippon.his.co.jp/, www.jr.cyberstation.ne.jp/
Quetchupquery
中部発 (Traveling from Midland), his 関西 (his Kansai), 伊平屋島 (Iheya Island), ホテルコンチネンタル横浜 (Hotel Continental Yokohama), げんじいの森 (Genjii-no-mori; spa), フジサファリパーク (Fuji Safari Park), ad-box, アダコミ (adacomi; offensive), スカイチーム (SkyTeam), ノースウェスト (Northwest)
# 時刻表 (timetable), # 国内旅行 (domestic tour), # 宿泊 (accommodation), # 北海道 (Hokkaido), # 関西 (Kansai), # 九州 (Kyushu), # マイレージ (mileage), # 名古屋 (Nagoya), # 沖縄 (Okinawa), # 温泉 (spa)
Tchai 静鉄バス (Seitetsu Bus), 相鉄バス (Sotetsu Bus), 函館バス (Hakodate Bus), 大阪地下鉄 (Osaka subway), 琴電 (Kotoden railways), 地下鉄御堂筋線 (Subway Midosuji Line), 芸陽バス (Geiyo Bus), 新京成バス (Shin-keisei Bus), jr 阪和線 (Japan Railways Hanna Line), 常磐線 (Joban Line)
# 時刻表 , # 路線図 (route map), # 運賃 (fare), # 料金 (fare), # 定期 (season ticket), # 運行状況 (service situation), # 路線 (route), # 定期代 (season ticket fare), # 定期券 (season ticket), # 時刻
47
![Page 48: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/48.jpg)
Random samples of extracted instancesType (Frequency)
Example
Transportation (54) 広島 新幹線 (Hiroshima Super Express), 東海道線 (Tokaido Line), jr飯田線 (JR Iida Line), jr 博多 (JR Hakata), 京都 新幹線 (Kyoto Super Express)
Accommodation (10)
ホテルビーナス (Hotel Venus), リーガロイヤルホテル大阪 (Rihga Royal Hotel Osaka), www.route-inn.co.jp, ホテル京阪ユニバーサル・シティ (Hotel Keihan Universal City), 札幌全日空ホテル (ANA Hotel Sapporo)
Travel Information (10)
外務省 安全 (Foreign Ministry safety), チケットショップ 大阪 (ticket shop Osaka), 観光 関西 (Site seeing Kansai), 高山観光協会 (Takayama Tourism Association), グーグル ナビ (Google navi)
Travel Agency (6) jr おでかけネット (Odekake net), 近畿ツー (Kinki Tourist), タビックス 静岡 (Tabix Shizuoka), フレックスインターナショナル (Flex International), オリオンツアー (Orion Tour)
Others (2) プロテカ (Proteca; bag for travel), jal 紀行倶楽部 (JAL Travel Club)
Not Related (20) 格安航空チケット 海外 (discount flight ticket overseas), 新幹線予約状況 (Super express reservation situation), 新幹線 時刻表 (Super express timetable), 温泉宿 (spa accommodation), 新幹線 停車駅 (Super express stops), 虎 (tiger), youtubu 海外ドラマ (overseas drama), 法務部採用 (legal department recruitment), おくりびと (Okuribito; film), 社会人野球 (amateur baseball)
48
![Page 49: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/49.jpg)
Learning curve of Quetchupclick
The more the data, the higher the precision
49
![Page 50: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/50.jpg)
Sensitivity to Quetchupclick to parameter α
Clickthrough graph seems much denser than query graph, and large alpha (less weight on initial labels) yields better performance than small alpha
50
![Page 51: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/51.jpg)
Conclusions
• Semantic drift in Espresso is a parallel form of topic drift in HITS
• The regularized Laplacian reduces semantic drift in two tasks: word sense disambiguation and named entity extraction• inherently a relatedness measure ( importance
measure)
51
![Page 52: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms Jan 15, 2010 Mamoru Komachi Nara Institute of Science and Technology](https://reader037.vdocument.in/reader037/viewer/2022110209/56649e355503460f94b23bd7/html5/thumbnails/52.jpg)
Future work
• Investigate if a similar analysis is applicable to a wider class of bootstrapping algorithms (including co-training)
• Investigate the influence of seed selection to bootstrapping algorithms and propose a way to select effective seed instances
• Explore relation between Markov random walk and label propagation methods
52