Extended Named Entity Ontology with Attribute Information Satoshi Sekine New York University LREC 2008 May 28, 2008



Extended Named Entity Ontology with Attribute Information

Satoshi Sekine, New York University

LREC 2008, May 28, 2008


Named Entity

• Named entities are the most important information units in many information access applications (such as IE, Q&A, summarization, IR, MT)

• History
  – MUC6: first defined Named Entity
    • Person, Location, Organization, Date, Time, Money, Percent
  – IREX
    • MUC6 categories + Artifact
  – ACE (20 kinds), TIMEX (Standardized Time Expression)

• Problem: Are 7 to 20 categories enough? What is the meaning of a name?


Extended Named Entities

• Extended to 200 categories (LREC 2002, 2004)
  – Finer categories
    • Location → GPE (Country, Province, City, …); Geographical region (Landform, Water form, …); Region (Domestic region, Continental region, …); Astral body (Star, Planet, …)
  – New categories
    • Line (Railroad, Road, Waterway, Tunnel, Bridge, …)
    • Product (Vehicle, Food, Cloth, Weapon, Award, …)
    • Event (Games, Conference, Natural Phenomena, War, …)
    • Disease, Currency, God, …
    • Era, Age, Color, Unit
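The refinement of Location above is a tree, so a category's place in the ontology can be read off as its path from the root. A minimal sketch, assuming a nested-dict encoding and hypothetical `path_to` / `is_a` helpers; the excerpt contains only the Location categories named on this slide, not the full hierarchy published at http://nlp.cs.nyu.edu/ene.

```python
from typing import Dict, List, Optional

# Nested dict: each key is a category, each value holds its subcategories.
# Hypothetical excerpt: only the Location refinement named on this slide.
Tree = Dict[str, "Tree"]

LOCATION_SUBTREE: Tree = {
    "Location": {
        "GPE": {"Country": {}, "Province": {}, "City": {}},
        "Geographical region": {"Landform": {}, "Water form": {}},
        "Region": {"Domestic region": {}, "Continental region": {}},
        "Astral body": {"Star": {}, "Planet": {}},
    }
}


def path_to(tree: Tree, target: str, prefix: tuple = ()) -> Optional[List[str]]:
    """Depth-first search; return the categories from the root down to `target`."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            return list(path)
        found = path_to(children, target, path)
        if found is not None:
            return found
    return None  # target is not in this excerpt


def is_a(tree: Tree, category: str, ancestor: str) -> bool:
    """True if `ancestor` lies on the path from the root to `category`."""
    path = path_to(tree, category)
    return path is not None and ancestor in path


print(path_to(LOCATION_SUBTREE, "City"))  # ['Location', 'GPE', 'City']
```

A check like `is_a(LOCATION_SUBTREE, "City", "Location")` is what would let a value tagged City count as a Location when an attribute's value type is given at a coarser level.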


Development of ENE

• Steady development over many years
  – Capitalized words in English newspapers (~2000)
  – Q&A and IE examples
  – Referred to encyclopedias, WordNet, …
  – Referred to related work and related systems
  – 100 → 140 → 200 → 210 categories

• Used in IE and Q&A systems to refine the definitions

• http://nlp.cs.nyu.edu/ene


What is Named Entity?

• A name is only a label
• Properties and attributes carry the essential meaning

• “Hudson River” is still “Hudson River” even if people call it “Muh-he-kun-ne-tuk”

• The meaning of the entity can be discerned from
  – “The river is in New York State”
  – “It is 507 km long”
  – “It runs from the Adirondack Mountains to Upper New York Bay”

• The name is only a label that can be used to refer to the river
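The idea that a name is only a label can be expressed as a data structure in which names are a set of aliases and the attributes hold the meaning. A minimal sketch, assuming a hypothetical `Entity` record and `resolve` lookup; the attribute keys are the River attributes from the next slide plus an illustrative `location` key, and the values are the Hudson River facts quoted above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entity:
    category: str                    # ENE category, e.g. "River"
    names: Set[str]                  # labels only; any of them may refer to the entity
    attributes: Dict[str, str] = field(default_factory=dict)  # the essential meaning


def resolve(name: str, entities: List[Entity]) -> List[Entity]:
    """Return every entity that `name` can be used to refer to."""
    return [e for e in entities if name in e.names]


hudson = Entity(
    category="River",
    names={"Hudson River", "Muh-he-kun-ne-tuk"},
    attributes={
        "location": "New York State",
        "length": "507 km",
        "source location": "Adirondack Mountains",
        "outflow": "Upper New York Bay",
    },
)

# Either label leads to the same entity, and therefore to the same attributes.
print(resolve("Muh-he-kun-ne-tuk", [hudson])[0].attributes["length"])  # 507 km
```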


Attributes

• “River” has attributes such as “source location”, “outflow”, “length” and so on

• “Person” has attributes such as “occupation”, “birth date”, “nationality” and so on

• Designing these attributes and constructing this knowledge will be very useful for NLP applications
  – Q&A, IE, IR, dialogue, co-reference, …


Design of the attributes

• We use an encyclopedia
  – An encyclopedia is the knowledge archive of named entities (as a dictionary is for common words)
  – Its descriptions should contain many attributes

• We will extract attributes from the descriptions of sample named entities and compile general attributes for each category


Procedure

1. Extract up to 50 sample named entity instances for each category. We use a well-known Japanese encyclopedia, “Nippon Daihyakka (Nipponica)”, published by Shogakukan Inc.

2. Annotators extract possible attribute values from the descriptions of the samples and name an attribute label for each
   (Attribute values must be noun phrases or equivalent)

3. Unify the attribute labels and identify the important (essential and mandatory) attributes for each category

4. Redesign the ENE categories

5. Construct a set of attributes
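Steps 2 and 3 reduce to counting, for each unified attribute label, how many sampled instances received at least one value. A minimal sketch, assuming annotations arrive as (instance, label, value) triples and that unification is a hand-written synonym map; `SYNONYMS`, `unify`, and `attribute_frequencies` are hypothetical names, not the authors' tooling. Each attribute is reported as a count plus a percentage of sampled instances.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical synonym map for step 3: free-form labels -> canonical attribute.
SYNONYMS = {"occupation": "Vocation", "job": "Vocation", "birthplace": "Hometown"}


def unify(label: str) -> str:
    """Map an annotator's label to its canonical attribute name."""
    return SYNONYMS.get(label.lower(), label)


def attribute_frequencies(
    annotations: Iterable[Tuple[str, str, str]], n_samples: int
) -> Dict[str, Tuple[int, int]]:
    """Return {attribute: (instance count, percent of samples)}, most frequent first.

    An instance counts once per attribute, however many values it has.
    """
    instances_with = defaultdict(set)
    for instance_id, label, _value in annotations:
        instances_with[unify(label)].add(instance_id)
    freq = {}
    for label, ids in instances_with.items():
        count = len(ids)
        # Percent of samples, rounded half up with integer arithmetic.
        freq[label] = (count, (200 * count + n_samples) // (2 * n_samples))
    return dict(sorted(freq.items(), key=lambda kv: -kv[1][0]))


annotations = [
    ("Picasso", "Vocation", "painter"),
    ("Picasso", "Masterpiece", "Guernica"),
    ("Mozart", "occupation", "composer"),
    ("Mozart", "job", "pianist"),  # same instance and attribute: counted once
    ("Leonardo", "Masterpiece", "Mona Lisa"),
    ("Leonardo", "Mentor", "Andrea del Verrocchio"),
]
print(attribute_frequencies(annotations, n_samples=3))
# {'Vocation': (2, 67), 'Masterpiece': (2, 67), 'Mentor': (1, 33)}
```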


Attributes for Person (20)

Attribute | Example of value | Freq. (% of samples) | ENE
--- | --- | --- | ---
Vocation | Professional baseball player | 46 (100) | Vocation
Nationality | American, Chinese, Japanese | 29 (63) | Country
Career | Professor at Yale University | 26 (57) | Vocation
Masterpiece | Guernica, Mona Lisa | 25 (54) | Product, Facility
Graduate | M.A. in German at Cambridge | 20 (44) | School
Hometown | Paris, Manchester, Shanghai | 19 (41) | City
Native Province | State of Illinois, Sichuan | 18 (39) | Province
Previous stay | England, New York | 12 (26) | Location
Mentor | Andrea del Verrocchio | 10 (22) | Person
Death date | 04/23/1704, unknown | 10 (22) | Date
Era | The 11th century | 8 (17) | Era
Award | Academy Award, MVP, Nobel Prize | 8 (17) | Award
Real name | Saint Nicholas | 8 (17) | Person
Another name | Santa, Father Christmas | 8 (17) | Person
Title | Knight, an honorary degree at Yale | 6 (13) | Title
Competition | World Series, 1955 piano competition in Paris | 6 (13) | Game
Place of death | New York, Birmingham | 5 (11) | Location
Father | John B. Kelly, Sr. | 5 (11) | Person
Cause of death | Car accident, Guillotine | 5 (11) |
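Since most Person attributes come with an ENE category for their values, the table can also serve as a type constraint, for instance to reject a candidate value of the wrong category. A minimal sketch, assuming values have already been tagged with an ENE category; `PERSON_VALUE_TYPES` copies a few rows of the table and `type_ok` is a hypothetical helper. It compares categories exactly, so a value tagged City would not satisfy Place of death (typed Location) unless the ENE hierarchy is consulted as well.

```python
from typing import Dict, Optional, Set

# A few rows of the Person table: attribute -> allowed ENE categories of its
# values. None means the table lists no ENE category for that attribute.
PERSON_VALUE_TYPES: Dict[str, Optional[Set[str]]] = {
    "Nationality": {"Country"},
    "Masterpiece": {"Product", "Facility"},
    "Hometown": {"City"},
    "Mentor": {"Person"},
    "Death date": {"Date"},
    "Place of death": {"Location"},
    "Cause of death": None,
}


def type_ok(attribute: str, value_category: str) -> bool:
    """Check a tagged value against the attribute's declared ENE category.

    Untyped attributes accept any category; unknown attributes are rejected.
    Categories are compared exactly (no subcategory reasoning).
    """
    if attribute not in PERSON_VALUE_TYPES:
        return False
    allowed = PERSON_VALUE_TYPES[attribute]
    return allowed is None or value_category in allowed


print(type_ok("Hometown", "City"))        # True
print(type_ok("Hometown", "Country"))     # False
print(type_ok("Place of death", "City"))  # False: exact match only
```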


Attributes for International Organization (17)

Attribute | Example of value | Freq. (% of samples) | ENE
--- | --- | --- | ---
Another name | CARICOM, EMU, CCDN | 30 (75) | Inter. Org.
Year founded | 1/10/1920, 2004 | 26 (65) | Date
Purpose of foundation | Encouragement of the African economy | 23 (58) |
Number of signatories | 170 countries, 190 | 20 (50) | N_Country
Type | League of Nations, International Labor Organization | 16 (40) |
Headquarters | New York, Prague | 13 (33) | City
Agreement, Proposal | Covenant of the League of Nations | 12 (30) | Rule
Top Organization | EU (the European Union) | 11 (28) | Inter. Org.
Member | China, Senegal, Norway | 10 (25) | Country
Predecessor | African Union (OAU), Caribbean Free Trade Association | 9 (23) | Inter. Org.
Subsidiary Organization | International Amateur Athletics Federation | 8 (20) | Organization
Rank | Board of directors, Special UN Organization | 7 (18) |
Headquarters (country) | Japan, Czech, Ethiopia | 7 (18) | Country
Year of dissolution | 1974, 06/20/1977 | 6 (15) | Date
Proposer Country | USA, England, Luxembourg | 5 (13) | Country
Successor Organization | United Nations Economic and Social Commission for Asia and the Pacific | 5 (13) | Inter. Org.
Proposer (Person) | Eisenhower, Colonel Qadhafi, Pierre Wellner | 4 (10) | Person


Problems we encountered and/or have not solved yet

1. Entity-dependent attributes
   e.g., a song/poem about a river: “Loreley” for the “Rhine River”

2. Fineness of attributes
   e.g., a bird’s “color of head” or “color of body”

3. Span of value expressions
   Values can be longer than a noun phrase, e.g., a definition

4. Structure within values
   e.g., a museum’s exhibit has its own attributes (author, year)

5. ENE category definition
   Attributes are useful for defining categories, but not always

6. Distinction between mandatory and optional attributes
   Distinction between properties and attributes
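Problem 4 is easier to see with a concrete data structure in which a value can carry attributes of its own. The sketch below is only one possible representation, not a solution proposed in this work; the `Value` class, the `flatten` helper, and the museum's `City` attribute are hypothetical, while author and year are the exhibit attributes mentioned on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Value:
    """An attribute value that has attributes of its own (one level deep)."""
    text: str
    attributes: Dict[str, str] = field(default_factory=dict)


# Attribute -> list of values; a value is either plain text or a structured Value.
Attributes = Dict[str, List[Union[str, Value]]]

museum: Attributes = {
    "Exhibit": [Value("Guernica", {"author": "Pablo Picasso", "year": "1937"})],
    "City": ["Madrid"],
}


def flatten(attrs: Attributes) -> List[Tuple[str, str]]:
    """List (attribute path, text) pairs, descending into structured values."""
    rows = []
    for name, values in attrs.items():
        for v in values:
            if isinstance(v, Value):
                rows.append((name, v.text))
                rows.extend((f"{name}.{sub}", text) for sub, text in v.attributes.items())
            else:
                rows.append((name, v))
    return rows


for path, text in flatten(museum):
    print(f"{path}: {text}")
```

Flattening to dotted paths such as `Exhibit.author` keeps structured values in a flat attribute table, at the cost of an attribute inventory that depends on the category of the value itself.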


Inter-annotator Agreement

• Two annotators worked on Person, Landform, International Organization, and Academy

• They agree more often on attributes that frequently have values

• They disagree on the span of values

Percentage of samples with a value | ~60% | ~40% | ~10%
--- | --- | --- | ---
Agree | 13 | 37 | 61
Disagree | 2 | 3 | 16
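The table can be condensed into an agreement rate per column, agree / (agree + disagree). A minimal sketch, assuming each count is a number of attributes (the slide does not state the unit). The rates come out at about 86.7%, 92.5%, and 79.2%: both frequent-value columns agree noticeably more often than the ~10% column, although the ~60% column is slightly below the ~40% one.

```python
from typing import Dict, Tuple

# Counts from the agreement table: column label -> (agree, disagree).
# Reading the counts as numbers of attributes is an assumption.
AGREEMENT: Dict[str, Tuple[int, int]] = {
    "~60%": (13, 2),
    "~40%": (37, 3),
    "~10%": (61, 16),
}


def agreement_rate(agree: int, disagree: int) -> float:
    """Share of judged attributes on which both annotators agreed."""
    return agree / (agree + disagree)


for column, (agree, disagree) in AGREEMENT.items():
    print(f"{column}: {agreement_rate(agree, disagree):.1%}")
# ~60%: 86.7%
# ~40%: 92.5%
# ~10%: 79.2%
```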


Summary

• Designed attributes for Extended Named Entities
  – Attributes are important in applications
  – Created based on encyclopedia descriptions
  – Documentation available (in Japanese; English in progress)
  – Dictionary / tagger in development

• http://nlp.cs.nyu.edu/ene


Application

• Q&A/IR
  – What is the 15th highest mountain in the world?
  – How many mountains are higher than 6,000 m?
  – Tell me the major league players from New York
  – I met Satoshi Sekine from New York

• Document understanding
  – “Yankees came back home!!”
  – “I visited Marrakech’s main sightseeing places”
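The Q&A examples above become simple queries once each entity carries attribute values: counting mountains above a height is a filter, and the n-th highest is a sort. A minimal sketch, assuming a hypothetical `Mountain` record with a `height_m` attribute and a five-entry toy list with approximate heights; in practice those values would first have to be extracted from text, which is the knowledge the attribute design is meant to support.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mountain:
    name: str
    height_m: int  # value of a hypothetical "height" attribute


# Toy knowledge base; heights are approximate and for illustration only.
MOUNTAINS: List[Mountain] = [
    Mountain("Mount Everest", 8849),
    Mountain("K2", 8611),
    Mountain("Denali", 6190),
    Mountain("Kilimanjaro", 5895),
    Mountain("Mount Fuji", 3776),
]


def count_higher_than(mountains: List[Mountain], threshold_m: int) -> int:
    """'How many mountains are higher than X m?' as a filter on the attribute."""
    return sum(1 for m in mountains if m.height_m > threshold_m)


def nth_highest(mountains: List[Mountain], n: int) -> Optional[str]:
    """'What is the n-th highest mountain?' as a sort on the attribute."""
    ranked = sorted(mountains, key=lambda m: m.height_m, reverse=True)
    return ranked[n - 1].name if 1 <= n <= len(ranked) else None


print(count_higher_than(MOUNTAINS, 6000))  # 3
print(nth_highest(MOUNTAINS, 2))           # K2
print(nth_highest(MOUNTAINS, 15))          # None: the toy list has only five entries
```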