A Self Learning Universal Concept Spotter By Tomek Strzalkowski and Jin Wang Original slides by Iman Sen Edited by Ralph Grishman


Page 1

A Self Learning Universal Concept Spotter

By Tomek Strzalkowski and Jin Wang

Original slides by Iman Sen

Edited by Ralph Grishman

Page 2

Introduction

When this paper appeared (1996), most named entity taggers were hand-coded; work on supervised learning for NE was just beginning.

The Universal Spotter was one of the first procedures proposed for unsupervised learning of semantic categories of names and noun phrases.

Page 3

Basic Idea

Start with some examples and/or contexts for things to spot (the 'seed') and a large corpus.

Exploit redundancy of evidence: we may be able to classify a name both because we know the name and because we know its context.

Use seed examples to learn indicative contexts, and use these contexts to learn "new" items.

Initially precision is high and recall very low. Iterations should increase recall while (hopefully) maintaining or improving precision.

Page 4

Seeds: What we are looking for

The seed is the initial information provided by the user, in the form of either examples or contextual information.

Examples are taken from the text ("Microsoft", "toothbrushes").

Contextual information can also be specified (both internal and external). For example, "name ends with Co." or "appears after 'produced'".
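For concreteness, such a seed might be represented as a small data structure (a minimal sketch in Python; the field names and patterns are illustrative, not from the paper):

seed = {
    # example instances taken from the text
    "examples": ["Microsoft", "toothbrushes"],
    # internal context, e.g. "name ends with Co."
    "internal_context": [r"\bCo\.?$"],
    # external context, e.g. "appears after 'produced'"
    "external_context": [("produced", "preceding")],
}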

Page 5

The Cyclic Process

1. Build context rules from the seed examples.

2. Use these rules to find further examples of this concept in the corpus.

3. As we find more examples of the concepts, we can find more contextual information.

4. Selectively expand context rules using these new contexts.

5. Repeat.

Page 6

Simple Example

• Suppose we have the seeds "Co" and "Inc" initially and the following text:

"Henry Kaufman is president of Henry Kaufman & Co., ... president of Gabelli Funds Inc.; Claude N. Rosenberg is named president of Thomson S.A. ..."

• Use "Co" and "Inc" to pick out Henry Kaufman & Co. and Gabelli Funds Inc.

• Use these new seeds to get contextual information, such as "president of" before each of the entities.

• Use "president of" to find "Thomson S.A."
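A minimal Python sketch of this first iteration (the regular expressions are illustrative stand-ins, not the paper's actual rules):

import re

text = ("Henry Kaufman is president of Henry Kaufman & Co. ; president of "
        "Gabelli Funds Inc. ; Claude N. Rosenberg is named president of "
        "Thomson S.A.")

# Step 1: the seed suffixes "Co"/"Inc" pick out capitalized sequences ending in them.
suffix_pat = re.compile(r"((?:(?:[A-Z][\w.]*|&)\s)+(?:Co|Inc)\b\.?)")
entities = suffix_pat.findall(text)   # ['Henry Kaufman & Co.', 'Gabelli Funds Inc.']

# Step 2: the new examples expose a shared left context, "president of".
contexts = set()
for e in entities:
    m = re.search(r"(\w+ \w+) " + re.escape(e), text)
    if m:
        contexts.add(m.group(1))      # {'president of'}

# Step 3: the learned context recovers an entity the suffix rule misses.
new = re.findall(r"president of ((?:(?:[A-Z][\w.]*|&)\s?)+)", text)
# 'new' now includes 'Thomson S.A.' alongside the two names found above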

Page 7

The Classification Task

So our goal is to decide whether a sequence of words represents a desired entity/concept.

This is done by calculating significance weights (SW) of evidence items [features], and then combining them.

Page 8

The Process: In Detail

• Initially, some preprocessing is done, including tokenization, POS tagging, and lexical normalization or stemming.

• POS tagging helps to delineate which sequences of words might contain the desired entities.

• These become the 'candidate items'.
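As a rough illustration, assuming NLTK as the toolkit (the paper does not say which tagger or chunker was used), this preprocessing and candidate selection might look like:

import nltk
# may require: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

sentence = "boys kicked the door with rage"
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)   # [('boys', 'NNS'), ('kicked', 'VBD'), ...]

# Simple noun-phrase chunks become the 'candidate items'.
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
candidates = nltk.RegexpParser(grammar).parse(tagged)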

Page 9

“Evidence Items” [features]

Consider a sequence of words W1, W2, ..., Wm of interest in the text. There is a window of size n on either side of this central unit in which one looks for contextual information.

Then do the following:

Make up pairs of (word, position), where position is one of preceding context (p), central unit (s), or following context (f), for all words within the window of size n. Similarly, make up pairs of (bigram, position).

Make up triples of (word, position, distance) for the same words, where distance is the distance from W1 or Wm (for units within W1 through Wm, take the distance from Wm).

Page 10

An Example of Evidence Items

Example: "... boys kicked the door with rage ..." with window size n = 2 and central unit "the door".

The generated tuples (called evidence items) are:

(boys, p), (kicked, p), (the, s), (door, s), (with, f), (rage, f),
((boys, kicked), p), ((the, door), s), ((with, rage), f),
(boys, p, 2), (kicked, p, 1), (the, s, 2), (door, s, 1), (with, f, 1), (rage, f, 2),
((boys, kicked), p, 1), ((the, door), s, 1), ((with, rage), f, 1)
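A minimal sketch of this tuple generation (my reconstruction of the scheme on the previous slide; it reproduces the tuples above):

def evidence_items(pre, unit, post):
    """pre/post: the n words of preceding/following context;
    unit: the central unit W1..Wm."""
    items = []
    # (word, position) and (bigram, position) pairs
    for words, pos in [(pre, 'p'), (unit, 's'), (post, 'f')]:
        items += [(w, pos) for w in words]
        items += [((a, b), pos) for a, b in zip(words, words[1:])]
    # (word, position, distance) triples: preceding context is measured
    # back from W1; the unit and the following context from Wm
    items += [(w, 'p', len(pre) - i) for i, w in enumerate(pre)]
    items += [(w, 's', len(unit) - i) for i, w in enumerate(unit)]
    items += [(w, 'f', i + 1) for i, w in enumerate(post)]
    # (bigram, position, distance) triples
    items += [((a, b), 'p', len(pre) - 1 - i) for i, (a, b) in enumerate(zip(pre, pre[1:]))]
    items += [((a, b), 's', len(unit) - 1 - i) for i, (a, b) in enumerate(zip(unit, unit[1:]))]
    items += [((a, b), 'f', i + 1) for i, (a, b) in enumerate(zip(post, post[1:]))]
    return items

evidence_items(['boys', 'kicked'], ['the', 'door'], ['with', 'rage'])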

Page 11

Calculating Significance Weights for Evidence Items

Candidate items may be classified into two groups, accepted (A) and rejected (R).

Use these groups to calculate SW:

SW(t) = (f(t,A) - f(t,R)) / (f(t,A) + f(t,R))   if f(t,A) + f(t,R) > s
SW(t) = 0                                       otherwise

where s is a constant to filter noise and f(x, X) is the frequency of x in X.

• SW takes values between -1.0 and 1.0.
• For some e > 0, SW(t) > e is taken as positive evidence and SW(t) < -e is taken as negative evidence.
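In code, this is a direct transcription (the value of s here is only illustrative):

def significance_weight(t, freq_A, freq_R, s=2):
    """SW(t) from the formula above; freq_A and freq_R map evidence
    items to their frequencies in the accepted and rejected groups."""
    fa, fr = freq_A.get(t, 0), freq_R.get(t, 0)
    if fa + fr > s:
        return (fa - fr) / (fa + fr)
    return 0.0

# e.g. an item seen 8 times with accepted candidates and 2 times with
# rejected ones gets SW = (8 - 2) / (8 + 2) = 0.6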

Page 12

Combining SW weights

These SW weights for a given candidate item are then combined; if the result exceeds a threshold, the item becomes available during the next tagging stage.

The primary scheme used by the authors for combining is:

x ⊕ y = x + y - xy   if x > 0 and y > 0
x ⊕ y = x + y + xy   if x < 0 and y < 0
x ⊕ y = x + y        otherwise

Note: values still remain within [-1.0, 1.0].
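In code, together with the fold over a candidate's evidence weights (a minimal sketch):

from functools import reduce

def combine(x, y):
    """The pairwise combination above; results stay within [-1.0, 1.0]."""
    if x > 0 and y > 0:
        return x + y - x * y
    if x < 0 and y < 0:
        return x + y + x * y
    return x + y

def combined_score(weights):
    return reduce(combine, weights, 0.0)

# e.g. combine(0.6, 0.5) == 0.8: two agreeing pieces of evidence
# reinforce each other without leaving the interval.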

Page 13

Bootstrapping

The basic bootstrapping process then looks like this:

Procedure Bootstrapping
    Collect seeds
    loop
        Training phase (calculate SW for each evidence item)
        Tagging phase (combine SWs for each candidate item)
    until satisfied
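A Python sketch of this loop, assuming the helpers from the earlier slides (train_significance_weights and evidence_items_of are hypothetical names; the cycle count and threshold are illustrative):

def bootstrap(seeds, candidates, max_cycles=4, threshold=0.5):
    accepted, rejected = set(seeds), set()
    for _ in range(max_cycles):   # "until satisfied"
        # Training phase: estimate SW for each evidence item from A and R.
        sw = train_significance_weights(accepted, rejected)   # hypothetical helper
        # Tagging phase: combine SWs over each candidate's evidence items.
        for cand in candidates:
            score = combined_score(sw.get(e, 0.0) for e in evidence_items_of(cand))   # hypothetical helper
            if score > threshold:
                accepted.add(cand)
            elif score < -threshold:
                rejected.add(cand)
    return accepted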

Page 14

Experiments and Results

Organization names: training on a 7 MB WSJ corpus, testing on 10 selected articles.

With seed context features: precision 97%, recall 49%. Reached P = 95% and R = 90% after the 4th cycle.

A similar experiment for identifying products showed lower performance (about 70% R, 70% P).