OntoNotes: The 90% Solution (cemantix.org/papers/ontonotes-tutorial.pdf)


The OntoNotes ProjectOntoNotes in a Nutshell

Tutorial Overview

OntoNotes: The 90% Solution

Sameer S. Pradhan (1), Nianwen Xue (2)

(1) BBN Technologies, Cambridge, MA
(2) Brandeis University, Waltham, MA

HLT/NAACL 2009, Boulder, Colorado

Pradhan, Xue OntoNotes: The 90% Solution


The OntoNotes Project

The OntoNotes Project started in 2006 and is a collaboration between:

• BBN Technologies: Ralph Weischedel, Lance Ramshaw, Sameer Pradhan
• Brandeis University: Nianwen Xue
• University of Colorado: Martha Palmer
• University of Pennsylvania: Mitch Marcus
• USC’s Information Sciences Institute: Eduard Hovy, Robert Belvin


What is OntoNotes? (I)

• Multiple layers of annotation: Syntax, Propositions, Word sense, Coreference, Names, Ontology
• Multilingual resource: English, Chinese, Arabic
• Parallel Data

[Figure: OntoNotes Annotated Text, combining the Treebank, PropBank, Word Sense (V/N), Names, Coreference, and Ontology layers over the Text.]


What is OntoNotes? (II)

• Skeletal representation of literal meaning
• Find the “sweet spot”
    • In depth of representation
    • Inter-Annotator Agreement (∼90%)
    • Productivity
• Integrated Representation
• API for ease of use
• Distribute data widely through LDC


The Grand View

OntoNotes: Integrated Shallow Semantic Annotation

The founder of Pakistan’s nuclear department, Abdul Qadeer Khan, has admitted he transferred nuclear technology to Iran, Libya, and North Korea.

[Figure: the sentence above annotated in four linked panels (Syntax, Propositions, Coreference, Ontology) and Word Sense.
Syntax: parse tree with S, VP, NP, and PP nodes.
Propositions: e1:admit1, e2:founder2, and e6:transfer2, with Arg0, Arg1, and Arg2-to arguments.
Coreference: mentions including Pakistan’s, Abdul Qadeer Khan, he, nuclear technology, Iran, Libya, and North Korea.
Ontology: nodes including summum genus, event, tangible event, object, mental object, social object, tangible object, causal agent, originator, judge, admit, cognition, subject area, engineering, geopolitical entity, large geopolitical entity, and nation.]


Amount of Data

Quantity Matrix: OntoNotes 3.0 (K Words)

            NW     BN     BC
English     500    200    200
Chinese     250    300    150
Arabic      200

(NW = newswire, BN = broadcast news, BC = broadcast conversation)


Parallel Data Quantities: Full OntoNotes

[Figure: bar chart "Parallel Data: English-Chinese in OntoNotes 3.0" (parallel data with full OntoNotes coverage). Y-axis: K words (0 to 500); x-axis: English, Chinese. Series, by translation direction: NW – ECTB – Xinhua; NW – ECTB – Sinorama; BC – Chinese source; BC – English source. Bar labels as extracted: 133, 100, 195, 154, 57, 53, 47, 55.]


Parallel Data Quantities: Only Treebank

[Figure: bar chart "Parallel Data: English-Chinese Y4 Plan". Y-axis: K words (0 to 200); x-axis: English, Chinese. Series: P2.5 – NW – Chinese source; Web – Chinese source; P2.5 – BN – Chinese source; P2.5 – BC – Chinese source; P2.5 – Web – Chinese source; Web – English source. Bar labels as extracted: 43, 35, 71, 55, 21, 17, 19, 16, 20, 16, 20, 16.]

(Parallel Treebank for this data is already available.)


Issues with Parallel Data

Trade-offs
• Translated data may not be predictive of the language as a whole
• Translated versions of informal genres might end up more text-like

Issues
• Long lead time involved
    • Data selection and translation
    • Treebanking
    • Propbanking, Word Sense, Coreference


An Example (Document)

The court ruled this senior Libyan intelligence agent planted the bomb that killed 270, mostly Americans, when the plane bound for New York exploded over Lockerbie, Scotland.

And one of the longest running struggles for international justice reached a milestone today of sorts, when a Scottish court, meeting in the Netherlands, finally officially found someone guilty in the 1988 bombing that brought down Pan Am Flight 103. …

A split decision for Lamen Khalifa Fhimah, acquittal, but Abdel Basset Ali Al-megrahi found guilty as charged. …


Example (Names)

The court ruled this senior Libyan intelligence agent planted the bomb that killed 270, mostly Americans, when the plane bound for New York exploded over Lockerbie, Scotland.

NORP (Nationality, Organization, Religious, Political)

GPE

Cardinal


Example (Parse Tree)

...
(S (NP-SBJ (DT this)
           (JJ senior)
           (JJ Libyan)
           (NN intelligence)
           (NN agent))
   (VP (VBD planted)
       (NP (NP (DT the)
               (NN bomb))
           (SBAR (WHNP-1 (WDT that))
                 (S (NP-SBJ (-NONE- *T*-1))
                    (VP (VBD killed)
                        (NP (NP (CD 270))
                            (, ,)
                            (NP (ADVP (RB mostly))
                                (NNPS Americans)))
                        (, ,)
...


Example (PropBank)

9 planted (PB frame: plant.01)
    ARG0      4:1   this senior Libyan intelligence agent
    ARG1      10:2  the bomb that *T*-1 killed 270 , mostly Americans , when the plane bound * for New York exploded over Lockerbie , Scotland *T*-2

14 killed (PB frame: kill.01)
    ARG0      13:0  *T*-1
              12:1  that
    LINK-SLC  10:1  the bomb
    ARG1      15:2  270 , mostly Americans
    ARGM-TMP  20:2  when the plane bound * for New York exploded over Lockerbie , Scotland *T*-2

Each pointer gives the token number and the height in the tree (token:height).
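The token:height pointers in this example can be resolved mechanically against the parse tree. The following is a minimal sketch, not the OntoNotes API: `Node`, `parse_tree`, and `resolve_pointer` are hypothetical helpers, and the tree is a shortened fragment of the example parse (the "when ..." clause is omitted). Because the fragment starts at "this", which is token 4 of the full sentence, an assumed `offset=4` maps sentence-level token numbers onto the fragment.

```python
# Minimal sketch: resolve PropBank "token:height" pointers against a
# bracketed parse tree. All names are illustrative, not the OntoNotes API.

class Node:
    def __init__(self, label):
        self.label = label
        self.children = []
        self.word = None      # set only on pre-terminal (POS) nodes
        self.parent = None

    def leaves(self):
        """Return the pre-terminal nodes under this node, left to right."""
        if self.word is not None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def text(self):
        return " ".join(leaf.word for leaf in self.leaves())


def parse_tree(bracketed):
    """Parse a Penn-Treebank-style bracketed string into Node objects."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child = read()
                child.parent = node
                node.children.append(child)
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1                      # skip ")"
        return node

    return read()


def resolve_pointer(root, pointer, offset=0):
    """Go to the given token, then climb `height` levels up the tree."""
    token, height = (int(part) for part in pointer.split(":"))
    node = root.leaves()[token - offset]
    for _ in range(height):
        node = node.parent
    return node


# Shortened fragment of the example tree (assumption: "when ..." clause dropped).
TREE = (
    "(S (NP-SBJ (DT this) (JJ senior) (JJ Libyan) (NN intelligence) (NN agent))"
    " (VP (VBD planted)"
    " (NP (NP (DT the) (NN bomb))"
    " (SBAR (WHNP-1 (WDT that))"
    " (S (NP-SBJ (-NONE- *T*-1))"
    " (VP (VBD killed)"
    " (NP (NP (CD 270)) (, ,) (NP (ADVP (RB mostly)) (NNPS Americans)))"
    "))))))"
)

root = parse_tree(TREE)
for pointer in ["4:1", "10:1", "12:1", "13:0", "15:2"]:
    node = resolve_pointer(root, pointer, offset=4)
    print(pointer, node.label, "->", node.text())
```

Height 0 is the token's own pre-terminal node, so `13:0` names the trace itself while `4:1` names the NP-SBJ that dominates "this".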


Example (Word Sense)

Court-N
    1: a sovereign regime and its assemblage
    2: assembly that transacts judicial business
    3: demarcated area for sports play
    4: a room in which judicial proceedings occur
    …
    8: respectful deference

Plant-V
    1: place into the ground for growing
    2: place firmly
    3: place secretly, often for later discovery
    4: establish, settle

Kill-V
    1: cause death, be fatal
    2: cause great pain or anguish
    3: eliminate
    4: thwart
    …
    9: drink down


Example (Coreference)

Chain 000-8 (IDENT)
    0.31-31   someone
    1.0-0     He
    5.11-16   Abdel Basset Ali Al - megrahi
    6.4-8     this senior Libyan intelligence agent
    12.1-4    Al - megrahi 's
    14.10-10  he

Chain 000-20 (IDENT)
    1.24-25   the victims
    4.18-24   the victims of Pan Am Flight 103
    6.15-18   270 , mostly Americans
    14.7-8    270 people

Chain 000-9 (IDENT)
    0.34-44   the 1988 bombing that *T*-3 brought down Pan Am Flight 103
    6.28-28   exploded
    20.33-34  this act
    22.32-33  this crime

Each mention is identified by sentence and token numbers (sentence.start-end).
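The mention identifiers in these chains follow a sentence.start-end pattern. As a minimal sketch, assuming the end token is inclusive (which is consistent with the token counts on this slide), a mention line could be parsed as follows; `Mention` and `parse_mention` are hypothetical names, not part of the OntoNotes distribution.

```python
# Minimal sketch: parse coreference mention lines of the form
# "<sentence>.<start>-<end> <text>". Names are illustrative only.
import re
from typing import NamedTuple


class Mention(NamedTuple):
    sentence: int   # sentence number within the document
    start: int      # first token of the mention
    end: int        # last token of the mention (assumed inclusive)
    text: str

    def length(self):
        return self.end - self.start + 1


MENTION_RE = re.compile(r"^(\d+)\.(\d+)-(\d+)\s+(.*)$")


def parse_mention(line):
    match = MENTION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a mention line: {line!r}")
    sentence, start, end, text = match.groups()
    return Mention(int(sentence), int(start), int(end), text)


# Chain 000-8 (IDENT) from the slide.
chain_000_8 = [
    parse_mention(line)
    for line in [
        "0.31-31 someone",
        "1.0-0 He",
        "5.11-16 Abdel Basset Ali Al - megrahi",
        "6.4-8 this senior Libyan intelligence agent",
        "12.1-4 Al - megrahi 's",
        "14.10-10 he",
    ]
]

for mention in chain_000_8:
    # The span length should equal the number of whitespace-separated tokens.
    consistent = mention.length() == len(mention.text.split())
    print(mention.sentence, mention.start, mention.end, mention.text, consistent)
```

The length check is a cheap sanity test: a span such as 5.11-16 covers six tokens, matching "Abdel Basset Ali Al - megrahi".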


Example (Summary)

The court ruled this senior Libyan intelligence agent planted the bomb that killed 270, mostly Americans, when the plane bound for New York exploded over Lockerbie, Scotland.

[Slide: the sentence shown together with its parse tree, the Plant-V and Kill-V sense inventories, coreference chains 000-8 and 000-20, and the plant.01 and kill.01 PropBank annotations from the preceding examples.]


OntoNotes compared to other resources

• A new operating point in terms of:

[Table: WordNet/SemCor, OntoNotes, Prague, and Salsa compared on Syntax, Propositions, Sense Tags (ITA > 90% versus ITA > 70-80%), Coref, Genres, Languages, and > 1M words. The per-cell check marks and counts are not recoverable from the extracted text. Footnotes: languages: English, Chinese, Arabic; Czech, English. Genres: NW, BN, + BC, NG, WebLogs.]


Annotation Layers

1 Treebank

2 PropBank

3 Word Sense

4 Ontology

5 Coreference

6 Names


Data Access API

7 Challenges with Multiple Layers of Annotation

8 Architecture

9 Raw Data

10 Database Design

11 Python API Design

12 Data Access


Part I

Annotation Layers


Treebank


Syntactic Structure

• Phrase Types
• Function Tags
• Traces and Co-indexing

“ Lighthouse II ” was painted in oils by the playwright in 1901

[Figure: parse tree for the sentence above, showing phrase types (TOP, S, VP, PP, NP), function tags (NP-TTL-SBJ-1, NP-TTL, NP-LGS, PP-MNR, PP-TMP), and a trace (*-1) co-indexed with the subject.]


An English Treebank Example

The Mortgage and equity real estate investment trust last paid a dividend on August 1, 1988

[Figure: parse tree over the sentence above, with node labels S, NP-SBJ, ADVP, VP, VBD, NP, PP, IN, NP.]


Adding NP internal structure: NMLs

• Penn Treebank 2 left prenominals flat
• NML constituents fill in that structure
    • Assume a default right-branching structure
    • Specify NMLs where necessary

[Slide figure "Improving Eng Treebank Consistency: NMLs", with columns PTB2, Right-Branching, and With NML. The trees with NML:]

(NP (DT a)
    (NML (CD 10,000) (NN square) (NN meter))
    (NN visitor)
    (NN center))

(NP (DT this)
    (NML (JJ large) (HYPH -) (NN scale))
    (NML (NN light) (CC and) (NN music))
    (NN show))
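To make the relation between the flat PTB2 style and the NML style concrete, the sketch below splices every NML node's children back into its parent, which yields a flat noun phrase. It is an illustration only: `parse`, `flatten_nml`, and `to_string` are hypothetical helpers, and the result keeps the OntoNotes tokenization (for example a separately tagged hyphen), so it is not claimed to match the original PTB2 trees token for token.

```python
# Minimal sketch: remove NML nodes from a bracketed tree by splicing their
# children into the parent, giving a flat (PTB2-style) noun phrase.
# Trees are nested lists: [label, child, child, ...]; words are plain strings.

def parse(bracketed):
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()

    def read(i):
        # tokens[i] is "("; tokens[i + 1] is the node label
        node = [tokens[i + 1]]
        i += 2
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = read(i)
                node.append(child)
            else:
                node.append(tokens[i])
                i += 1
        return node, i + 1

    tree, _ = read(0)
    return tree


def flatten_nml(tree):
    flat = [tree[0]]
    for child in tree[1:]:
        if isinstance(child, str):
            flat.append(child)                      # a word
        elif child[0] == "NML":
            flat.extend(flatten_nml(child)[1:])     # splice NML children in
        else:
            flat.append(flatten_nml(child))
    return flat


def to_string(tree):
    parts = [t if isinstance(t, str) else to_string(t) for t in tree]
    return "(" + " ".join(parts) + ")"


with_nml = "(NP (DT a) (NML (CD 10,000) (NN square) (NN meter)) (NN visitor) (NN center))"
print(to_string(flatten_nml(parse(with_nml))))
```

Going the other way (from a flat NP to the NML version) is the part that needs human judgment, which is why the slide describes NMLs as something annotators specify where the right-branching default is wrong.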


Improving English Treebank Consistency: Hyphenization

• Original treebank did not split any hyphens
• More recent treebanks were not very consistent on which hyphenated tokens were split
• This complicates things for parsers and parser evaluations
• Trees were revised to split consistently on “most” hyphens
    • Add a GW (goes with) POS tag
        • Covers elements like “co-” in “co-operate”
    • Insert appropriate tree structure over the newly split tokens
• For any additional layers of existing annotation (including PropBank and Word Sense):
    • Adjust token-based pointers
    • Annotate additional examples in newly-exposed tokens
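Adjusting token-based pointers after re-tokenization amounts to keeping a map from old token indices to new ones. The sketch below is a simplified illustration under stated assumptions: it splits on every word-internal hyphen, whereas the revision described here split only “most” hyphens according to rules not given on the slide, and `split_hyphens` and `remap_span` are hypothetical helper names.

```python
# Minimal sketch: split hyphenated tokens and remap token-based spans.
# Assumption: every word-internal hyphen is split (the real revision was
# more selective).

def split_hyphens(tokens):
    """Return (new_tokens, index_map); index_map[i] is the (first, last)
    new-token index covered by old token i."""
    new_tokens = []
    index_map = []
    for token in tokens:
        pieces = token.split("-")
        if len(pieces) > 1 and all(pieces):
            # e.g. "large-scale" -> ["large", "-", "scale"]
            parts = []
            for k, piece in enumerate(pieces):
                if k > 0:
                    parts.append("-")
                parts.append(piece)
        else:
            # no hyphen, or a bare "-" / "--" token: leave untouched
            parts = [token]
        first = len(new_tokens)
        new_tokens.extend(parts)
        index_map.append((first, len(new_tokens) - 1))
    return new_tokens, index_map


def remap_span(span, index_map):
    """Map an inclusive (start, end) span from old to new token indices."""
    start, end = span
    return index_map[start][0], index_map[end][1]


old_tokens = "this large-scale light and music show".split()
new_tokens, index_map = split_hyphens(old_tokens)
old_span = (2, 4)                                  # "light and music"
new_span = remap_span(old_span, index_map)
print(new_tokens)
print(old_span, "->", new_span,
      " ".join(new_tokens[new_span[0]:new_span[1] + 1]))
```

Spans in any layer that indexes tokens (PropBank arguments, word-sense instances, coreference mentions) can be passed through the same `index_map`, which is why a single consistent re-tokenization step is preferable to patching each layer separately.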


Chinese Treebanking

• Penn English Treebank approach
    • Phrase structure annotation
    • Emphasis on trade-offs of annotation speed and consistency
    • ITA: 94%
• With enriched structures
• All structures build on four primitive structures


A Treebanked Chinese Sentence

If among the 100 pieces of news one piece is made up, the reader will also doubt the other 99 pieces.

Word/POS (gloss):
如果/CS (if)  100/CD (100)  条/M (CL)  新闻/NN (news)  中/LC (in)  有/VE (exist)  一/CD (one)  条/M (CL)  假/JJ (fake)  新闻/NN (news)  ,/PU
读者/NN (reader)  对/P (toward)  另外/DT (other)  99/CD (99)  条/M (CL)  也/AD (also)  会/MD (will)  产生/VV (arise)  怀疑/NN (doubt)  。/PU

[Figure: phrase-structure tree over the sentence, with nodes CP, IP, LCP-SBJ, NP, NP-OBJ, CLP, DP, ADJP, ADVP, PP, and VP.]

IP ~ S
CP ~ SBAR
LCP = Localizer Phrase
CLP = Classifier Phrase


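(Hypothetical) Chinese Treebank

The Mortgage and equity real estate investment trust last paid a dividend on August 1, 1988

[Figure: parse tree over the sentence above, with node labels S, NP-SBJ, ADVP, VP, VP, VBD, NP, PP, IN, NP; compared with the English Treebank example it contains an additional VP node.]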


Complementation (left-headed)

Schema:   (XP X YP {ZP})

Examples: (DP DeTerminer QP)
          (VP VV NP)
          (PP P NP)


Adjunction

Schema:   (XP {YP} XP {ZP})

Examples: (VP ADVP VP)
          (ADJP ADVP ADJP)


Coordination

Schema:   (XP XP CC XP)

Examples: (VP VP CC VP)
          (VP VP PU VP)
          (NP NP CC NP)


PropBank


Propositional Structure

• Tells who did what to whom, when, where, how, etc.
• For both verbs and nouns

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe -- also are being registered at the Pentagon .

[Figure: the sentence above with spans labeled ARG1, ARG2, and ARGM-LOC, and the parse of the noun phrase "major reductions and realignments of troops in central Europe" (S, NP, PP; JJ NNS CC NNS IN NP).]


Predicate Frames

• Predicate frames define the meanings of the numbered arguments

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe -- also are being registered at the Pentagon .

reduce.01 – Make less
    ARG0 – Agent:           -
    ARG1 – Thing falling:   of troops
    ARG2 – Amount fallen:   major
    ARG3 – Starting point:  -
    ARG4 – Ending point:    -


Frame Examples: expect, replace

Portfolio managers expect further declines in interest rates

expect.01 – Look forward to; anticipate
    ARG0 – Expecter:        Portfolio managers
    ARG1 – Thing expected:  further declines in interest rates

Continental Air replaced its top executive for the sixth time in as many years

replace.01 – substitute
    ARG0 – replacer:   Continental Air
    ARG1 – old thing:  its top executive
    ARG2 – new thing:  (not expressed)
    Also marked on the slide: for the sixth time in as many years


Frame Examples: increase

Net income increased to $274 million from $130 million

increase.01 – go up incrementally
    ARG0 – causer of increase:   -
    ARG1 – thing increasing:     Net income
    ARG2 – amount increased by:  -
    ARG3 – starting point:       from $130 million
    ARG4 – end point:            to $274 million
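A frame and one annotated instance can be represented with plain dictionaries, which makes it explicit that the numbered labels only acquire meaning through the frame. This is a minimal sketch for illustration; the `FRAMES` layout and the `describe` helper are assumptions, not the PropBank frame-file format or the OntoNotes API.

```python
# Minimal sketch: a predicate frame and a labeled instance as dictionaries.
# The data layout is illustrative; real frames live in PropBank frame files.

FRAMES = {
    "increase.01": {
        "gloss": "go up incrementally",
        "roles": {
            "ARG0": "causer of increase",
            "ARG1": "thing increasing",
            "ARG2": "amount increased by",
            "ARG3": "starting point",
            "ARG4": "end point",
        },
    },
}

# "Net income increased to $274 million from $130 million"
instance = {
    "roleset": "increase.01",
    "args": {
        "ARG1": "Net income",
        "ARG3": "from $130 million",
        "ARG4": "to $274 million",
    },
}


def describe(instance, frames):
    """Pair each labeled argument with its frame-specific role description."""
    roles = frames[instance["roleset"]]["roles"]
    return [
        (label, roles[label], text)
        for label, text in sorted(instance["args"].items())
    ]


for label, role, text in describe(instance, FRAMES):
    print(f"{label} ({role}): {text}")
```

Unfilled roles (ARG0 and ARG2 here) simply have no entry in the instance, mirroring the dashes on the slide.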


Word Senses in PropBank

• Some word sense distinctions do not change the type of argument that a predicate can take, but some do.
• PropBank makes only those sense distinctions that necessitate a different argument structure or for which the arguments have different meanings.

Mary left the room

leave.01 – move away from
    ARG0 – entity leaving
    ARG1 – place left

If he knew how to handle the finances, I’d leave him lots of money

leave.02 – give
    ARG0 – giver
    ARG1 – thing given
    ARG2 – beneficiary

• PropBank semantic roles are really predicate/frame-sense-specific


Trends in Argument Numbering

Arg0 = agent

Arg1 = direct object/theme/patient

Arg2 = indirect object/benefactive/instrument/attribute/end state

Arg3 = start point/benefactive/instrument/attribute

Arg4 = end point

Consistency for Arg0 and Arg1, but not so much for Arg2, Arg3, ...


Additional tags: ArgMs (arguments or adjuncts?)

TMP: When?

LOC: Where at?

DIR: Where to/from?

MNR: How?

PRP: Why?

REC: himself, themselves, each other

PRD: This argument refers to, or modifies another

ADV: Catch all


Annotation Procedure

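[Flowchart, per predicate. Boxes: Frame creation: argument definitions, examples, etc. (1 person); Automatic tagging (machine); Double-blind hand correction and frame sense tagging (2 people); Adjudication: fix remainder (1 person). One decision point, "Results: ok agreement?", with "ok" and "not ok" branches. The boxes are grouped into a frame creation phase and an annotation phase.]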


Chinese PropBank

Similar in style to English PropBank

Predicate-specific numbered labels for core arguments
ArgMs for adjunctive arguments
Coarse-grained senses

There are some differences

In how split arguments are handled
In how multi-word expressions are dealt with


Argument structure of verb and noun: a treebanked sentence

如果/CS 100/CD 条/M 新闻/NN 中/LC 有/VE 一/CD 条/M 假/JJ 新闻/NN ,/PU 读者/NN 对/P 另外/DT 99/CD 条/M 也/AD 会/MD 产生/VV 怀疑/NN 。/PU

if 100 CL news in exist one CL fake news , reader toward other 99 CL also will arise doubt .

“If among the 100 pieces of news one piece is made up, the reader will also doubt the other 99 pieces.”

[Figure: Chinese Treebank parse tree of this sentence (phrase labels include CP, IP, LCP-SBJ, NP-OBJ, PP, VP, ADVP, CLP, DP), with the PropBank labels ARG0, ARG1 and Support marked for the verb and the noun predicate.]


Traces and Split Arguments in English PropBank

● Traces

[What matters is what advertisers will pay]-1, said *T*-1 Newsweek's chairman

REL: said

Arg1: *T*

Arg0: Newsweek's chairman

● Split Arguments

"What you have to understand," said John [*?*], "is that Philly literally stinks."

Arg1: [*?*] → ["What you have to understand"] ["is that Philly literally stinks"]

REL: said

Arg0: John


Traces in Chinese PropBank

目前 为止 , 中国 纺织 工业 承建 *T* 的 最大 项目

now till , Chinese textile industry take on DE largest project

“the largest project that the Chinese textile industry has taken on so far”

ARGM-TMP: 目前为止 “so far”

ARG0: 中国 纺织 工业 “Chinese textile industry”

REL: 承建 “take on”

ARG1: *T* → 最大 项目 “largest project”


Possession

茅台 酒 制作 工艺 复杂 , 生产 周期 长 。

Maotai liquor brewing process complex , production cycle long .

“The brewing process of Maotai liquor is complex, and its production cycle is long.”

REL: 复杂 “complex”
ARG0-PSR: 茅台酒 “Maotai liquor”
ARG0-PSE: 制作工艺 “brewing process”

REL: 长 “long”
ARG0-PSR: 茅台酒 “Maotai liquor”
ARG0-PSE: 生产周期 “production cycle”


Possession (II)

三 大 法典 须 加快 出台 进程 。

Three main law need accelerate promulgation process .

“The promulgation process of the three main laws needs to be accelerated.”

PRED: 加快 “accelerate”

ARG1-PSR: 三大法典 “three main laws”

ARG1-PSE: 出台进程 “promulgation process”


Predication

西非 经济 明显 恢复 增长 。

West African economy clearly resume grow .

“West African economy clearly resumed growing”

ARG0: 西非经济 “West African economy”

PRED: 恢复 “resume”

ARGM-ADV: 明显 “clearly”

ARG0-PRD: 增长 “grow”


Reconciling Treebank and PropBank

We found several mismatches between syntax and propositions

Sometimes PropBank was right
Sometimes Treebank was right

Ambiguities were resolved (PP-attachment)
Guidelines were modified to bring the two in sync

Modified list of verbs that take small clauses and sentential complements (e.g. keep their markets active)
A different approach to annotation of empty categories

Now each argument points to a single node in the tree
Secondary connections are made using Treebank trace chains
Almost no discontinuous arguments
Non-trace connections are explicitly identified as LINK-SLC and LINK-PCR
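How an argument can point to a single tree node is sketched below. The sketch assumes the usual PropBank pointer convention `terminal:height` (count terminals from 0, including traces, then climb `height` levels from the part-of-speech node). The toy tree is a simplified stand-in, not the actual Treebank parse, and `Node`, `parse_tree` and `resolve` are hypothetical names.

```python
class Node:
    """A minimal constituency-tree node with a parent link."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []
        self.word = None  # set only on part-of-speech (preterminal) nodes

    def leaves(self):
        """Preterminal nodes in left-to-right order (traces included)."""
        if self.word is not None:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found

    def text(self):
        return " ".join(leaf.word for leaf in self.leaves())


def parse_tree(bracketed):
    """Parse a Treebank-style bracketed string into Node objects."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    root, current, i = None, None, 0
    while i < len(tokens):
        if tokens[i] == "(":
            node = Node(tokens[i + 1], parent=current)
            if current is None:
                root = node
            else:
                current.children.append(node)
            current = node
            i += 2
        elif tokens[i] == ")":
            current = current.parent
            i += 1
        else:
            current.word = tokens[i]
            i += 1
    return root


def resolve(root, terminal, height):
    """Return the node reached by climbing `height` levels above a terminal."""
    node = root.leaves()[terminal]
    for _ in range(height):
        if node.parent is None:
            raise ValueError("height exceeds the depth of the tree")
        node = node.parent
    return node


# Simplified toy tree (an assumption made for illustration only).
TOY = ("(S (NP-SBJ-1 (NNP Lighthouse) (NNP II)) (VP (VBD was) (VP (VBN painted) "
       "(NP (-NONE- *-1)) (PP (IN in) (NP (NNS oils))))))")
tree = parse_tree(TOY)
subject = resolve(tree, 0, 1)
print(subject.label, "->", subject.text())
```

Because every pointer resolves to exactly one node, a trace chain such as `0:1*4:0` in this toy tree would be handled by resolving each pointer separately and linking the results.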


Word Sense

Sense Grouping
Annotation Procedure
Connection to Ontology


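WordNet, OntoNotes and PropBank senses for the verb “develop”

[Figure: diagram relating the WordNet sense numbers of develop-v that appear in the figure (2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20) to groups labelled Create, Come about, Alter by chemical means, Bring into existence, Superimpose, and Further grow.]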


Sense Annotation Procedure

[Flowchart, per word. Boxes: Sense creation: definitions, examples, etc. (1 person); Pre-annotation: 50 instances (2 people); Full annotation: all instances (2 people); Adjudication: fix remainder (1 person). Two decision points, "Results: ok agreement?", each with "ok" and "not ok" branches. The boxes are grouped into a sense creation phase and an annotation phase.]


Word Sense and Ontology

• Meanings of nouns and verbs are specified using a catalog of possible senses

• All the senses are annotatable at ∼90% ITA

• Ontology links (currently being added) capture similarities between related senses of different words

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

aim

1. Point or direct object, weapon, at something ...
2. Wish, purpose or intend to achieve something

register

1. Enter into an official record
2. Be aware of, enter into someone’s consciousness
3. Indicate a measurement
4. Show in one’s face

Senses highlighted for this sentence: aim, sense 2 (Wish, purpose or intend to achieve something); register, sense 1 (Enter into an official record)
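A rough sketch of how a 90% agreement figure could be checked for one word is given below. It assumes that agreement is measured as simple percent agreement between two annotators over the same instances; the slides do not spell out the exact measure, and the tag sequences are made-up illustration data.

```python
def percent_agreement(tags_a, tags_b):
    """Fraction of instances on which two annotators chose the same sense."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same instances")
    if not tags_a:
        raise ValueError("no instances to compare")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def disagreements(tags_a, tags_b):
    """Indices of instances that would go to adjudication."""
    return [i for i, (a, b) in enumerate(zip(tags_a, tags_b)) if a != b]


# Hypothetical sense tags for ten instances of one word.
annotator_a = [2, 2, 1, 2, 1, 1, 2, 2, 1, 2]
annotator_b = [2, 2, 1, 2, 1, 2, 2, 2, 1, 2]

agreement = percent_agreement(annotator_a, annotator_b)
needs_regrouping = agreement < 0.90  # threshold taken from the 90% target
print(agreement, needs_regrouping, disagreements(annotator_a, annotator_b))
```

Under this reading, a word whose agreement falls below the threshold would have its sense inventory revised, and the remaining disagreements would be passed to the adjudicator.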


Ontology

Ontologizing
Annotation Procedure
Structure of the Ontology


Ontologizing

[Figure: layered diagram. From bottom to top: Documents; Words (W1, W2, W3); Senses (S11, S12, S13, ..., S21, ..., S31, ...); Sense Pools (P1, P2, P3); Concepts (C1 to C8).]


Sense Pooling Procedure

[Flowchart for ontologizing. Boxes: Collect synonyms (1 person); Create sense pools (1 person); Validate sense pools (2 people); Taxonomize pools (1 person); Store results in ontology. One decision point, "Results: ok agreement?", with "ok" and "not ok" branches.]


Snapshot of the Upper Model

[Figure: graph of upper-model concepts; the edges are not recoverable from the extraction. Node labels, in extraction order: Reptile, ThoughtObject, Perception, Belief, Intention, Thought, ObjectRelation, SocialObjectRelation, PhysicalObjectRelation, EventAsObject, FunctionOf*NaturalNonLivingObject, FunctionOf*Artifact, TimeOfEvent, EducationalOrganization, PartOf*NaturalNonLivingObject, Fish, IndividualRole, SpatialLocation, LocationOf*Artifact, LocationOf*NaturalNonLivingObject, NonOrganizedSocialCollection, PoliticalMovement, NonOrganizedCommercialEntity, SocialGathering, NonVolitionalBiologicalObject, Protist, Monera, Plant, BodyPart, Fungus, Bird, NonProfessionRole, KinshipRole, Profession, Mammal, Human, NonHumanMammal, RoleOf*Animal, InformationObject, GeneralizedLanguageObject, MathematicalObject, DisciplineStudyAbstraction, ProcedureAbstraction, Conceptualization, Name, MeasurementQuantity, NonSpatioTemporalPhysicalAbstraction, NonSocialCollection, Object, UnrootedObject, TangibleObject, IntangibleObject, NonBiologicalObject, VolitionalNonBiologicalObject, NonVolitionalNonBiologicalObject, Relation, Operator, EventRole, RoleOf*Human, Collection, SocialCollection, BiologicalObject, Animal, LocationOf, RoleOf, ObjectificationOf, SetOf, FunctionOf, PartOf, OrganizedSocialCollection, TangibleVolitionalObject, TangibleNonVolitionalObject, MentalObject, ImmaterialObject, MeasurableAbstraction, Game, PoliticalOrganization, NonProfitNGONonEducationalOrganization, SportsOrganization, CommercialOrganization, MilitaryOrganization, MentalStatus, Artifact, NaturalNonLivingObject, Invertebrate, Vertebrate, PsychologicalCondition, MentalState, SupernaturalBeing, Abstraction, PartOf*Artifact, SetOf*Artifact, ShapeAndStructureAbstraction, WealthAbstraction, SpatioTemporalAbstraction, Amphibian, Emotion, Time.]


Snapshot of the Ontology

[Figure: snapshot of part of the ontology]


Ontology Structure

Upper Model: 150 concepts

Sense Pools: 3000 sense pools

Links: Subtype, Related
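The three ingredients (upper-model concepts, sense pools, and Subtype/Related links) can be pictured with a toy data structure. Everything concrete here is an assumption made for illustration: the class `MiniOntology`, the pool name, the sense identifiers, and the particular subtype edges between upper-model concepts are not taken from the OntoNotes ontology itself.

```python
class MiniOntology:
    """Toy container for concepts, sense pools, and two kinds of links."""

    def __init__(self):
        self.supertype = {}   # Subtype links: node -> its parent
        self.related = set()  # Related links: unordered pairs of nodes
        self.pools = {}       # sense pool -> set of word senses

    def add_subtype(self, child, parent):
        self.supertype[child] = parent

    def add_related(self, a, b):
        self.related.add(frozenset((a, b)))

    def add_pool(self, pool, concept, senses):
        """Register a sense pool and hang it under an upper-model concept."""
        self.pools[pool] = set(senses)
        self.supertype[pool] = concept

    def ancestors(self, node):
        """Follow Subtype links upward and return the chain of parents."""
        chain = []
        while node in self.supertype:
            node = self.supertype[node]
            chain.append(node)
        return chain


onto = MiniOntology()
# Assumed edges between concept names that appear in the upper-model snapshot.
onto.add_subtype("Intention", "Thought")
onto.add_subtype("Thought", "MentalObject")
# Hypothetical sense pool grouping similar senses of different words.
onto.add_pool("pool:intend", "Intention", ["aim-v.2", "intend-v.1"])
onto.add_related("Intention", "Belief")

print(onto.ancestors("pool:intend"))
```

Keeping the pools separate from the upper model mirrors the division on the slide: a small set of hand-built concepts on top, and a much larger set of sense pools attached beneath them.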


Coreference

Coreference Types
Salient points
Examples
Challenges


Coreference

• Identifies different mentions of the same entity within a document, in particular linking definite, referring noun phrases and pronouns to their antecedents

• Two types are tagged: Identity (IDENT) and Attributive (APPOS)

Concerns about the pace of the Vienna talks -- which are aimed at the destruction of some 100,000 weapons , as well as major reductions and realignments of troops in central Europe – also are being registered at the Pentagon .

[Figure: mentions such as “President Bush”, “He”, “conventional arms talk” and “Pentagon” grouped into entities e0, e1 and e2, with the corresponding spans of the sentence (“Vienna talks”, “the Pentagon”) highlighted.]


Salient points

All types of entities, and even events (marked by verbs), are coreferenced

Barring a few exceptions (2%), coreference links are restricted to nodes in the syntax trees

Name, nominal and pronoun mentions are coreferenced

In pro-drop languages like Chinese and Arabic, the “*” or “*pro*” nodes in the tree are tagged with coreference

Generic, underspecified mentions are not coreferenced

Singleton mentions are not coreferenced

Copulas are not coreferenced with each other

Only intra-document coreference is marked. When documents were prohibitively long, they were broken into parts and each part was annotated independently
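Two of these conventions (mentions grouped by entity, singletons left out) can be sketched as follows. The tuple layout, the entity identifiers and the token offsets are assumptions chosen for the example; they do not reproduce the OntoNotes file format.

```python
from collections import defaultdict


def build_chains(mentions):
    """Group mentions by entity and keep only chains with two or more mentions.

    `mentions` is an iterable of (entity_id, start_token, end_token, text).
    """
    chains = defaultdict(list)
    for entity_id, start, end, text in mentions:
        chains[entity_id].append((start, end, text))
    # Singleton entities are dropped, matching the convention above.
    return {eid: sorted(ms) for eid, ms in chains.items() if len(ms) > 1}


# Hypothetical mentions within one document part; offsets are illustrative.
mentions = [
    ("x", 0, 2, "Elco Industries Inc"),
    ("x", 4, 4, "it"),
    ("y", 6, 7, "net income"),  # mentioned once only
    ("x", 17, 22, "The Rockford, Ill. maker of fasteners"),
]

chains = build_chains(mentions)
print(sorted(chains))
print([text for _, _, text in chains["x"]])
```

Since only intra-document coreference is marked, a structure like this would be built once per document (or per document part), never across documents.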


Coreference Annotation Examples

IDENT

[Elco Industries Inc]x said [it]x expects net income to fall below a recent estimate of $ 1.65 a share. [The Rockford, Ill. maker of fasteners]x also said that [it]x expects to post sales in the current fiscal year that are “slightly above” fiscal 1989 sales of $ 155 million.

Sales of passenger cars [grew]x 22 %. [The strong growth]x followed year-to-year increases.

APPOS

[[The PhacoFlex intraocular lens]HEAD , [the first foldable silicone lens available for cataract surgery]ATTRIB ]x


Special Challenges in the Broadcast Conversation Data

Disfluency Effects

Former Iraqi war combat veteran I guess 0 <disfluency> he ’s a -- -- </disfluency> he is a present veteran Paul Hackett

Ambiguity in speaker turn labels

<Firefighter_A> It began as <disfluency> an- </disfluency> any other day you know *PRO* <uncertain> just uh </uncertain> doing eh normal checks .

<Firefighter_B> At nine o’clock we started our shift .

<Firefighter_A> And so the bells went .

<Firefighter_B> It was about a minute past nine when we got the shout for uh <uncertain> smoke issuing </uncertain> in Allgate tube station *T*-1 .

<Andrew_Carey> The explosion at Allgate was the first of the four bombs 0 *T*-1 to go off on July the seventh at eight fifty in the morning .

<Andrew_Carey> But Paul Kelly Steve Sodbury and Mel Anderson of <uncertain> Shadwell </uncertain> Firestation ’s blue watch had no idea what *T*-1 had happened as they got into the fire engine *PRO*-2 to answer the call .


Names

Types


Types of Names (I)

Person – People, including fictional

NORP – Nationalities, or religious or political groups

Facility – Buildings, airports, highways, bridges, etc.

Organization – Companies, agencies, institutions, etc.

GPE – Countries, cities, states, etc.

Location – Non-GPE locations, mountain ranges, bodies of water

Product – Vehicles, weapons, foods, etc.

Event – Named hurricanes, battles, wars, etc.

Work of Art – Titles of books, songs, etc.


Types of Names (II)

Law – Named documents made into law

Language – Any named language

Date – Absolute or relative dates or periods

Time – Times smaller than a day

Percent – Percentage

Money – Monetary values – including unit

Quantity – Measurements as of weight and distance

Ordinal – “First”, “Second”, etc.

Cardinal – Numerals that do not fall under another type
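For reference, the eighteen types from the two slides can be collected in one table. The keys below are the names as written on the slides; the tag strings used in the released annotation files may be spelled differently, and `NAME_TYPES` and `describe_type` are hypothetical names.

```python
# Name types and glosses as listed on the two "Types of Names" slides.
NAME_TYPES = {
    "Person": "People, including fictional",
    "NORP": "Nationalities, or religious or political groups",
    "Facility": "Buildings, airports, highways, bridges, etc.",
    "Organization": "Companies, agencies, institutions, etc.",
    "GPE": "Countries, cities, states, etc.",
    "Location": "Non-GPE locations, mountain ranges, bodies of water",
    "Product": "Vehicles, weapons, foods, etc.",
    "Event": "Named hurricanes, battles, wars, etc.",
    "Work of Art": "Titles of books, songs, etc.",
    "Law": "Named documents made into law",
    "Language": "Any named language",
    "Date": "Absolute or relative dates or periods",
    "Time": "Times smaller than a day",
    "Percent": "Percentage",
    "Money": "Monetary values, including unit",
    "Quantity": "Measurements, as of weight and distance",
    "Ordinal": "\"First\", \"Second\", etc.",
    "Cardinal": "Numerals that do not fall under another type",
}


def describe_type(name):
    """Case-insensitive lookup of a name type; raises KeyError if unknown."""
    for key, gloss in NAME_TYPES.items():
        if key.lower() == name.lower():
            return gloss
    raise KeyError(name)


print(len(NAME_TYPES), describe_type("gpe"))
```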


Part II

Integrated representation

Challenges with Multiple Layers of Annotation
Architecture
Raw Data
Database Design
Python API Design
Data Access


Interpreting Tree pointers in Propositions

PropBank Args Now Point to Treebank Nodes

• Encoded propositions point to nodes in the treebank trees

wsj/00/wsj_0037.mrg 67 5 gold set.02 ----- 0:2-ARG0 5:0-rel 6:1-ARG1 10:2-ARGM-TMP
wsj/00/wsj_0037.mrg 68 5 gold paint.01 ----- 5:0-rel 1:1*6:0-ARG1 8:1-ARG2-in 10:1-ARG0-by 12:1-ARGM-TMP
wsj/00/wsj_0037.mrg 69 21 gold exchange.01 ----- 17:2-ARG0 21:0-rel 22:1-ARG1 23:1-ARGM-TMP
wsj/00/wsj_0037.mrg 69 35 gold say.01 ----- 31:1-ARG0 35:0-rel 0:2*37:0-ARG1

“ Lighthouse II ” was painted *-1 in oils by the playwright in 1901

[Figure: Treebank parse tree of this sentence (nodes include TOP, S, NP-TTL-SBJ-1, NP-TTL, VP, PP-MNR, NP-LGS, PP-TMP), with ARG1 marked both on the title NP and on its trace *-1.]
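A minimal sketch of reading one such proposition line follows. The field meanings (tree file, sentence index, predicate terminal, annotator, roleset, inflection string, then arguments) are assumed from the standard PropBank layout rather than stated on the slide, and `Argument`, `Proposition` and `parse_proposition` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Argument:
    label: str                        # e.g. "rel", "ARG1", "ARG2-in", "ARGM-TMP"
    pointers: List[Tuple[int, ...]]   # (terminal, height) pairs; several if chained


@dataclass
class Proposition:
    tree_file: str
    sentence: int
    predicate_terminal: int
    annotator: str
    roleset: str
    inflection: str
    arguments: List[Argument]


def parse_pointers(text):
    """Split a pointer such as '1:1*6:0' into [(1, 1), (6, 0)].

    '*' joins the nodes of a trace chain.
    """
    return [tuple(int(n) for n in part.split(":")) for part in text.split("*")]


def parse_proposition(line):
    fields = line.split()
    tree_file, sentence, terminal, annotator, roleset, inflection = fields[:6]
    arguments = []
    for field in fields[6:]:
        # The pointer never contains '-', so the first '-' starts the label.
        pointer_text, label = field.split("-", 1)
        arguments.append(Argument(label, parse_pointers(pointer_text)))
    return Proposition(tree_file, int(sentence), int(terminal),
                       annotator, roleset, inflection, arguments)


LINE = ("wsj/00/wsj_0037.mrg 68 5 gold paint.01 ----- "
        "5:0-rel 1:1*6:0-ARG1 8:1-ARG2-in 10:1-ARG0-by 12:1-ARGM-TMP")
prop = parse_proposition(LINE)
print(prop.roleset, [(a.label, a.pointers) for a in prop.arguments])
```

In the parsed result, ARG1 carries two pointers: `1:1` (the title noun phrase) and `6:0` (its trace after "painted"), which is the chain highlighted in the tree on the slide.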


Interpreting Argument meaning and constraints

Argument Meanings Specified in Frames Files

wsj_0037.mrg 67 5 gold set.02 ----- 0:2-ARG0 5:0-rel 6:1-ARG1 10:2-ARGM-TMP
wsj_0037.mrg 68 5 paint.01 5:0-rel 1:1*6:0-ARG1 8:1-ARG2-in 10:1-ARG0 12:1-ARGM-TMP
wsj_0037.mrg 69 21 gold exchange.01 ----- 17:2-ARG0 21:0-rel 22:1-ARG1 23:1-ARGM-TMP
wsj_0037.mrg 69 35 gold say.01 ----- 31:1-ARG0 35:0-rel 0:2*37:0-ARG1

<!DOCTYPE frameset SYSTEM "frameset.dtd">
<frameset>
<predicate lemma="paint">
<note>Frames file for 'paint' based on sentences in wsj and automatic expansion via verbnet.</note>
<roleset id="paint.01" name="put paint on a surface" vncls="25.1">
<roles>
<role descr="agent, painter" n="0"> <vnrole vncls="25.1" vntheta="Agent"/></role>
<role descr="surface" n="1"><vnrole vncls="25.1" vntheta="Destination"/></role>
<role descr="explicit mention of paint" n="2"> <vnrole vncls="25.1" vntheta="Theme"/> </role>
</roles>

• Meanings of the ARG labels are specified in the frames files
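Looking up what ARG0, ARG1 and ARG2 mean for a roleset can be sketched with the standard library XML parser. The XML string below is the excerpt from the slide with closing tags added so that it parses on its own, and without the DOCTYPE line; `role_descriptions` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Slide excerpt, closed off so that it is well-formed in isolation.
FRAME_XML = """<frameset>
<predicate lemma="paint">
<roleset id="paint.01" name="put paint on a surface" vncls="25.1">
<roles>
<role descr="agent, painter" n="0"><vnrole vncls="25.1" vntheta="Agent"/></role>
<role descr="surface" n="1"><vnrole vncls="25.1" vntheta="Destination"/></role>
<role descr="explicit mention of paint" n="2"><vnrole vncls="25.1" vntheta="Theme"/></role>
</roles>
</roleset>
</predicate>
</frameset>"""


def role_descriptions(xml_text):
    """Map each roleset id to {'ARGn': description} taken from its <role> tags."""
    root = ET.fromstring(xml_text)
    table = {}
    for roleset in root.iter("roleset"):
        table[roleset.get("id")] = {
            "ARG" + role.get("n"): role.get("descr")
            for role in roleset.iter("role")
        }
    return table


roles = role_descriptions(FRAME_XML)
print(roles["paint.01"])
```

With this table, the `1:1*6:0-ARG1` argument of the paint.01 proposition can be read as the painted surface, here the work titled "Lighthouse II".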


Interpreting Sense Numbers and their mappings

Word Senses Defined in Sense Inventory Files

wsj/00/wsj_0037.mrg 0 1 judge-v 2
wsj/00/wsj_0037.mrg 0 36 lot-n 1

(The last field on each line is the sense number.)

*PRO* Judging from the Americana in Haruki Murakami ’s “ A Wild Sheep Chase ” ( Kodansha, 320 pages, $18.95 *U* ) , baby boomers on both sides of the Pacific have a lot in common .

<?xml version="1.0" ?>
<!DOCTYPE inventory SYSTEM "inventory.dtd">
<inventory lemma="judge-v">
<sense group="1" n="1" name="act as an official judge">
<examples> She was asked to judge the fancy-dress competition. </examples>
<mappings> <wn version="2.1">1,5</wn> <pb>judge.01</pb> </mappings>
</sense>
<sense group="1" n="2" name="form an opinion, or conclusion">
<examples> They quickly judged him unfit to join the team. </examples>
<mappings> <wn version="2.1">2,3,4</wn> <pb>judge.01</pb> </mappings>
</sense>
</inventory>

• The meaning of sense numbers is specified in the sense inventory files
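The link from a sense annotation line to its inventory entry can be sketched as below. The field order (file, sentence index, token index, lemma with part of speech, sense number) is inferred from the example, where token 1 is "Judging" and token 36 is "lot"; the function names are hypothetical, and the inventory string is the slide excerpt without the XML declaration and DOCTYPE.

```python
import xml.etree.ElementTree as ET

INVENTORY_XML = """<inventory lemma="judge-v">
  <sense group="1" n="1" name="act as an official judge">
    <examples> She was asked to judge the fancy-dress competition. </examples>
    <mappings> <wn version="2.1">1,5</wn> <pb>judge.01</pb> </mappings>
  </sense>
  <sense group="1" n="2" name="form an opinion, or conclusion">
    <examples> They quickly judged him unfit to join the team. </examples>
    <mappings> <wn version="2.1">2,3,4</wn> <pb>judge.01</pb> </mappings>
  </sense>
</inventory>"""


def parse_sense_line(line):
    """Split one sense annotation line into named fields (assumed order)."""
    tree_file, sentence, token, lemma, sense = line.split()
    return {"file": tree_file, "sentence": int(sentence),
            "token": int(token), "lemma": lemma, "sense": sense}


def lookup_sense(inventory_xml, sense_number):
    """Return name, WordNet senses and PropBank frameset for a sense number."""
    root = ET.fromstring(inventory_xml)
    for sense in root.iter("sense"):
        if sense.get("n") == str(sense_number):
            wn = sense.find("mappings/wn")
            pb = sense.find("mappings/pb")
            return {
                "name": sense.get("name"),
                "wordnet": wn.text.split(",") if wn is not None and wn.text else [],
                "propbank": pb.text if pb is not None else None,
            }
    return None


annotation = parse_sense_line("wsj/00/wsj_0037.mrg 0 1 judge-v 2")
entry = lookup_sense(INVENTORY_XML, annotation["sense"])
print(annotation["lemma"], entry)
```

The lookup shows how one annotated token ties together three resources: the OntoNotes sense, the WordNet 2.1 senses it groups, and the PropBank frameset.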


Challenge with Multiple Layers of Annotation

Not previously available
A number of these layers have not been available in significant quantity before:

Word Sense
Coreference

Not previously integrated

Not previously completely consistent

Not previously easily accessible

Raw text format

Not user friendly


A Solution: Unified Representation

Provide a bare-bones representation, independent of the individual semantics, that can

Efficiently capture intra- and inter-layer semantics
Maintain component independence
Provide a mechanism for flexible integration
Integrate information at the lowest level of granularity
Be robust to superficial changes in representations

A Relational Database + Object Oriented API
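The idea of a relational store with an object-oriented layer on top can be sketched with an in-memory SQLite database. The schema, table names and the `Corpus` class are assumptions made for this illustration and do not reproduce the actual OntoNotes database design; the rows reuse values from the earlier examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each layer lives in its own table, and all layers are
# joined at the lowest level of granularity, the (document, sentence, idx) token.
conn.executescript("""
CREATE TABLE token (document TEXT, sentence INTEGER, idx INTEGER, word TEXT,
                    PRIMARY KEY (document, sentence, idx));
CREATE TABLE sense (document TEXT, sentence INTEGER, idx INTEGER,
                    lemma TEXT, sense INTEGER);
CREATE TABLE proposition (document TEXT, sentence INTEGER, idx INTEGER,
                          roleset TEXT, label TEXT);
""")
conn.executemany("INSERT INTO token VALUES (?, ?, ?, ?)",
                 [("wsj_0037", 0, 1, "Judging"), ("wsj_0037", 68, 5, "painted")])
conn.execute("INSERT INTO sense VALUES (?, ?, ?, ?, ?)",
             ("wsj_0037", 0, 1, "judge-v", 2))
conn.execute("INSERT INTO proposition VALUES (?, ?, ?, ?, ?)",
             ("wsj_0037", 68, 5, "paint.01", "rel"))


class Corpus:
    """Thin object layer that hides the SQL behind one method per question."""

    def __init__(self, connection):
        self.conn = connection

    def annotations_at(self, document, sentence, idx):
        key = (document, sentence, idx)
        where = "WHERE document = ? AND sentence = ? AND idx = ?"
        word = self.conn.execute(f"SELECT word FROM token {where}", key).fetchone()
        senses = self.conn.execute(
            f"SELECT lemma, sense FROM sense {where}", key).fetchall()
        props = self.conn.execute(
            f"SELECT roleset, label FROM proposition {where}", key).fetchall()
        return {"word": word[0] if word else None,
                "senses": senses, "propositions": props}


corpus = Corpus(conn)
print(corpus.annotations_at("wsj_0037", 0, 1))
print(corpus.annotations_at("wsj_0037", 68, 5))
```

Keeping one table per layer is one way to maintain component independence: a layer can be reloaded or revised without touching the others, while queries still combine them through the shared token key.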


A Solution: Unified Representation

Provide a bare-bones representation independent of theindividual semantics that can

Efficiently capture intra-and inter-layer semanticsMaintain component independenceProvide mechanism for flexible integrationIntegrate information at the lowest level of granularityRobust to superficial changes in representations

A Relational Database + Object Oriented API

Pradhan, Xue OntoNotes: The 90% Solution

Page 87: OntoNotes: The 90% Solutioncemantix.org/papers/ontonotes-tutorial.pdfKhan Korea large geopolitical entity and Word Sense The founder of Pakistan s nuclear department, Abdul Qadeer

Challenges with Multiple Layers of AnnotationArchitecture

Raw DataDatabase Design

Python API DesignData Access

InterdependenciesChallengeSolution

A Solution: Unified Representation

Provide a bare-bones representation independent of theindividual semantics that can

Efficiently capture intra-and inter-layer semanticsMaintain component independenceProvide mechanism for flexible integrationIntegrate information at the lowest level of granularityRobust to superficial changes in representations

A Relational Database + Object Oriented API

Pradhan, Xue OntoNotes: The 90% Solution

Page 88: OntoNotes: The 90% Solutioncemantix.org/papers/ontonotes-tutorial.pdfKhan Korea large geopolitical entity and Word Sense The founder of Pakistan s nuclear department, Abdul Qadeer

Challenges with Multiple Layers of AnnotationArchitecture

Raw DataDatabase Design

Python API DesignData Access

InterdependenciesChallengeSolution

A Solution: Unified Representation

Provide a bare-bones representation independent of theindividual semantics that can

Efficiently capture intra-and inter-layer semanticsMaintain component independenceProvide mechanism for flexible integrationIntegrate information at the lowest level of granularityRobust to superficial changes in representations

A Relational Database + Object Oriented API

Pradhan, Xue OntoNotes: The 90% Solution

Page 89: OntoNotes: The 90% Solutioncemantix.org/papers/ontonotes-tutorial.pdfKhan Korea large geopolitical entity and Word Sense The founder of Pakistan s nuclear department, Abdul Qadeer

Challenges with Multiple Layers of AnnotationArchitecture

Raw DataDatabase Design

Python API DesignData Access

InterdependenciesChallengeSolution

A Solution: Unified Representation

Provide a bare-bones representation independent of theindividual semantics that can

Efficiently capture intra-and inter-layer semanticsMaintain component independenceProvide mechanism for flexible integrationIntegrate information at the lowest level of granularityRobust to superficial changes in representations

A Relational Database + Object Oriented API

Pradhan, Xue OntoNotes: The 90% Solution

Page 90: OntoNotes: The 90% Solutioncemantix.org/papers/ontonotes-tutorial.pdfKhan Korea large geopolitical entity and Word Sense The founder of Pakistan s nuclear department, Abdul Qadeer

Challenges with Multiple Layers of AnnotationArchitecture

Raw DataDatabase Design

Python API DesignData Access

AdvantagesData Lifecycle

Modes of Data Access

SQL queries can extract examples based on multiple layers or define new views

The Python object-oriented API allows programmatic access to tables and queries

The raw text files remain available as well
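As a rough illustration of a query that spans two layers, the sketch below runs SQL from Python. It is not the project's database: it uses an in-memory sqlite3 database as a stand-in for the MySQL server, two drastically simplified tables (token and name_entity, with start_token and end_token as assumed column names), and invented rows.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with two simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE token (
            id TEXT PRIMARY KEY,
            sentence_id TEXT,
            sentence_token_index INTEGER,
            word TEXT,
            on_sense_id TEXT          -- NULL when the token carries no sense tag
        );
        CREATE TABLE name_entity (
            id TEXT PRIMARY KEY,
            sentence_id TEXT,
            name_entity_type_id TEXT,
            start_token INTEGER,      -- assumed column names for the name span
            end_token INTEGER
        );
    """)
    conn.executemany(
        "INSERT INTO token VALUES (?, ?, ?, ?, ?)",
        [
            ("t0", "s0", 0, "Khan", None),
            ("t1", "s0", 1, "visited", "visit-v.1"),
            ("t2", "s0", 2, "Korea", None),
        ],
    )
    conn.executemany(
        "INSERT INTO name_entity VALUES (?, ?, ?, ?, ?)",
        [("n0", "s0", "PERSON", 0, 0), ("n1", "s0", "GPE", 2, 2)],
    )
    return conn


# Cross-layer query: sense-tagged tokens in sentences that also contain
# a name of the requested type.
CROSS_LAYER_QUERY = """
    SELECT DISTINCT t.sentence_id, t.sentence_token_index, t.word, t.on_sense_id
    FROM token AS t
    JOIN name_entity AS n ON n.sentence_id = t.sentence_id
    WHERE t.on_sense_id IS NOT NULL
      AND n.name_entity_type_id = ?
    ORDER BY t.sentence_id, t.sentence_token_index
"""


def sense_tagged_tokens_near_names(conn, name_type):
    """Return (sentence_id, token_index, word, sense) rows for the name type."""
    return conn.execute(CROSS_LAYER_QUERY, (name_type,)).fetchall()


conn = build_toy_db()
rows = sense_tagged_tokens_near_names(conn, "GPE")
print(rows)
```

The same query text could be sent to another SQL backend through its own DB-API driver, provided the table and column names match the installed schema.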


Advantages of Integrated Representation

Each layer translates into a common representation

Clean, consistent layers

Well-defined relationships: the database schema defines the merged structure efficiently

Original representations are available as pre-defined views, e.g. Treebank, PropBank, etc.

SQL queries can extract examples based on multiple layers or define new views

The Python object-oriented API allows programmatic access to tables and queries
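To make the idea of a view over the merged structure concrete, here is a small sketch. It assumes a simplified token/on_sense pair of tables, a hypothetical view name (sense_view), invented rows, and sqlite3 in place of the real database server.

```python
import sqlite3


def build_db():
    """Create two simplified tables and a view that exposes one layer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE on_sense (id TEXT PRIMARY KEY, on_sense_type_id TEXT);
        CREATE TABLE token (
            id TEXT PRIMARY KEY,
            sentence_id TEXT,
            sentence_token_index INTEGER,
            word TEXT,
            on_sense_id TEXT REFERENCES on_sense(id)
        );
        -- Hypothetical view presenting the sense layer on its own.
        CREATE VIEW sense_view AS
            SELECT t.sentence_id, t.sentence_token_index, t.word, s.on_sense_type_id
            FROM token AS t
            JOIN on_sense AS s ON s.id = t.on_sense_id;
    """)
    conn.execute("INSERT INTO on_sense VALUES ('s1', 'bank-n.2')")
    conn.executemany(
        "INSERT INTO token VALUES (?, ?, ?, ?, ?)",
        [("t0", "sent0", 0, "The", None), ("t1", "sent0", 1, "bank", "s1")],
    )
    return conn


conn = build_db()
view_rows = conn.execute(
    "SELECT * FROM sense_view ORDER BY sentence_id, sentence_token_index"
).fetchall()
print(view_rows)
```

Only the sense-tagged token appears in the view; the underlying tables stay untouched, which is what lets one merged store serve several layer-specific presentations.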


Data Lifecycle

[Figure: data build lifecycle. Labels: Raw Data; Errors and inconsistencies; Clean Data; Clean objects; clean view (three instances).]


Organization of the OntoNotes data

.../data/<lang>/annotations/<genre>/<source>/<section>/<filename>.<extension>

.../data/<lang>/metadata/<inventory-type>/<filename>.xml

<extension> ::= ("parse" | "prop" | "sense" | "coref" | "name" | "parallel" | "speaker")

<inventory-type> ::= ("frames" | "sense-inventories")
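The annotation path layout above can be decomposed mechanically. The following is a sketch only: the regular expression, the function name and the example file name (wsj_0201) are illustrative assumptions, not part of the distributed tools.

```python
import re

# Extensions taken from the <extension> rule above.
EXTENSIONS = ("parse", "prop", "sense", "coref", "name", "parallel", "speaker")

ANNOTATION_PATH = re.compile(
    r"data/(?P<lang>[^/]+)/annotations/(?P<genre>[^/]+)/(?P<source>[^/]+)/"
    r"(?P<section>[^/]+)/(?P<filename>[^/]+)\.(?P<extension>"
    + "|".join(EXTENSIONS)
    + r")$"
)


def parse_annotation_path(path):
    """Return the layout components of an annotation file path, or None."""
    match = ANNOTATION_PATH.search(path)
    return match.groupdict() if match else None


# Hypothetical path, used only to exercise the pattern.
example = parse_annotation_path(
    "/corpora/ontonotes/v3/data/english/annotations/nw/wsj/02/wsj_0201.parse"
)
print(example)
```

Paths whose extension is not one of the listed banks are rejected, which gives a cheap sanity check before files are handed to a loader.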


Entity Relationship Diagram (I)

[Figure: entity relationship diagram. Table groups: Corpus, TreeBank, Word Sense, PropBank, Coreference, Names, Ontology. Tables and columns, in the order recovered from the diagram:]

class: id, description
argument_analogue: id, argument_type_id (FK), proposition_id (FK)
feature_type: id, description
sensepool_sense: concept_sensepool_type_id (FK), on_sense_type_id (FK)
concept_sensepool_parent: concept_sensepool_type_id (FK), parent_concept_sensepool_type_id (FK)
concept_sensepool_relation: concept_sensepool_type_id (FK), related_concept_sensepool_type_id (FK)
concept_feature: concept_sensepool_type_id (FK), feature_type_id (FK), feature_modifier, description
concept_sensepool_type: id, class_id (FK), name, commentary
compound_function_tag: function_tag_type_id (FK), tree_id (FK)
ontonotes: id
on_sense_type_pb_sense_type: pb_sense_type_id (FK), on_sense_type_id (FK)
on_sense_type: id, pos_type_id (FK), lemma_id (FK), gloss
name_entity_type: id, Description
name_entity: id, tree_id (FK), name_entity_type_id (FK), sentence_id (FK), Start Token, End Token
predicate_node: predicate_id (FK), tree_id (FK), primary_flag, part_index
proposition: id
syntactic_link_type: id, description
syntactic_link: actual_tree_id (FK), syntactic_link_tree_id (FK), syntactic_link_type_id (FK), associated_argument_id (FK)
coreference_chain_type: id, description
argument_node: tree_id (FK), argument_id (FK)
pb_sense_type_argument_type: pb_sense_type_id (FK), argument_Type_id (FK)
on_sense_type_wn_sense_type: on_sense_type_id (FK), wn_sense_type_id (FK)
wn_sense_type: id, pos_type_id (FK), lemma_id (FK)
token: id, lemma_id (FK), sentence_id (FK), on_sense_id (FK), sentence_token_index, word
argument_type: id, description
argument: id, argument_analogue_id (FK), split_argument_flag, argument_index
predicate: id, proposition_id (FK), token_id (FK), predicate_Type_id (FK)
predicate_type: id, description
coreference_link_type: id, description
coreference_link: id, coreference_link_Type_id (FK), coreference_chain_id (FK), tree_id (FK), sentence_id (FK), Start token, End token
coreference_chain: id, document_id (FK), coreference_chain_type_id (FK)
function_tag_type: id, description
tree: id, parent_tree_id (FK), sentence_id (FK), pos_type_id (FK), phrase_type_id (FK), child_index, start, end
pos_type: id, description
pb_sense_type: id, lemma_id (FK)
on_sense: id, on_sense_type_id (FK)
phrase_type: id, description
language_type: id, name
subcorpus: id, language_type_id (FK), ontonotes_id (FK), base_dir, root_dir, encoding
file: id, subcorpus_id (FK), file_type, physical_file_name, base_dir
document: id, file_id (FK), subcorpus_id (FK)
sentence: id, document_id (FK), sentence_index, sentence_string, no_trace_string
lemma: id


Entity Relationship Diagram (II)


Corpus Tables

The corpus tables collectively manage information about the corpus – specifically the subcorpora, documents, files, etc.


Treebank Tables

The treebank tables manage the syntactic tree information. Tokens form the lowest level of granularity in OntoNotes.
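One way to picture a tree stored as rows is a self-referential table in which each node points at its parent and records its position among its siblings. The sketch below assumes a simplified row shape (id, parent_tree_id, child_index, tag, word) and invented rows; in the real schema the words live in a separate token table.

```python
from collections import defaultdict

# (id, parent_tree_id, child_index, tag, word); word is None for phrase nodes.
# Rows are deliberately out of order to show that child_index restores the order.
ROWS = [
    ("n0", None, 0, "S", None),
    ("n3", "n0", 1, "VP", None),
    ("n4", "n3", 0, "VBD", "resigned"),
    ("n1", "n0", 0, "NP", None),
    ("n2", "n1", 0, "NNP", "Khan"),
]


def index_children(rows):
    """Return the root row and a parent-id -> ordered child rows mapping."""
    children = defaultdict(list)
    root = None
    for row in rows:
        if row[1] is None:
            root = row
        else:
            children[row[1]].append(row)
    for siblings in children.values():
        siblings.sort(key=lambda r: r[2])  # order siblings by child_index
    return root, children


def bracket(row, children):
    """Render the subtree rooted at row as a bracketed parse string."""
    node_id, _, _, tag, word = row
    if word is not None:
        return f"({tag} {word})"
    inner = " ".join(bracket(child, children) for child in children[node_id])
    return f"({tag} {inner})"


def leaves(row, children):
    """Collect the words under row, left to right."""
    if row[4] is not None:
        return [row[4]]
    words = []
    for child in children[row[0]]:
        words.extend(leaves(child, children))
    return words


root, children = index_children(ROWS)
print(bracket(root, children))
print(leaves(root, children))
```

Because every annotation layer ultimately refers to tree nodes or tokens, a reconstruction like this is the anchor that other layers can be attached to.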


PropBank Tables

The proposition tables manage the propositions. The argument_node table is a composite table that manages many-to-many argument/node relationships.
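A many-to-many relationship of this kind is usually held in a table whose primary key is the pair of foreign keys. The sketch below assumes a cut-down argument table with a hypothetical label column, invented ids, and sqlite3 rather than the project's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE argument (id TEXT PRIMARY KEY, label TEXT);
    CREATE TABLE argument_node (
        tree_id TEXT NOT NULL,
        argument_id TEXT NOT NULL REFERENCES argument(id),
        PRIMARY KEY (tree_id, argument_id)   -- composite key: one row per pairing
    );
""")
conn.executemany("INSERT INTO argument VALUES (?, ?)",
                 [("a0", "ARG1"), ("a1", "ARG0")])
conn.executemany(
    "INSERT INTO argument_node VALUES (?, ?)",
    [
        ("n5", "a0"),
        ("n9", "a0"),  # one argument spread over two tree nodes
        ("n5", "a1"),  # one tree node serving a second argument
    ],
)


def nodes_of(conn, argument_id):
    """Tree nodes that make up one argument."""
    query = "SELECT tree_id FROM argument_node WHERE argument_id = ? ORDER BY tree_id"
    return [row[0] for row in conn.execute(query, (argument_id,))]


def arguments_at(conn, tree_id):
    """Arguments that include one tree node."""
    query = "SELECT argument_id FROM argument_node WHERE tree_id = ? ORDER BY argument_id"
    return [row[0] for row in conn.execute(query, (tree_id,))]


print(nodes_of(conn, "a0"))
print(arguments_at(conn, "n5"))
```

Neither the argument row nor the tree row needs a list-valued column; the pairing table carries the whole relationship.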


Word Sense Tables

The sense tables contain the lemma and the sense number representing its sense

Multiple composite tables map WordNet senses, OntoNotes senses and frame senses to each other
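Chaining two such mapping tables gives an indirect mapping, for example from a WordNet sense to frame senses by way of the OntoNotes sense. The sketch below keeps the pairs in plain Python lists; all sense identifiers are invented placeholders, and the function name is hypothetical.

```python
from collections import defaultdict

# Rows of two composite mapping tables, as (left_id, right_id) pairs.
ON_TO_WN = [  # on_sense_type_id, wn_sense_type_id
    ("on:call.1", "wn:call.1"),
    ("on:call.1", "wn:call.3"),
    ("on:call.2", "wn:call.2"),
]
ON_TO_PB = [  # on_sense_type_id, pb_sense_type_id
    ("on:call.1", "pb:call.01"),
    ("on:call.2", "pb:call.02"),
]


def invert(pairs):
    """Build a right_id -> set of left_ids lookup from mapping rows."""
    lookup = defaultdict(set)
    for left, right in pairs:
        lookup[right].add(left)
    return lookup


def forward(pairs):
    """Build a left_id -> set of right_ids lookup from mapping rows."""
    lookup = defaultdict(set)
    for left, right in pairs:
        lookup[left].add(right)
    return lookup


def frame_senses_for_wn_sense(wn_id):
    """Go WordNet sense -> OntoNotes senses -> frame senses."""
    wn_to_on = invert(ON_TO_WN)
    on_to_pb = forward(ON_TO_PB)
    frames = set()
    for on_id in wn_to_on.get(wn_id, ()):
        frames |= on_to_pb.get(on_id, set())
    return sorted(frames)


print(frame_senses_for_wn_sense("wn:call.3"))
```

Several fine-grained WordNet senses can land on the same frame sense because they share one coarser OntoNotes sense in the middle.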


Coreference Tables

The coreference chain and coreference link tables store the respective pointers.
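The sketch below shows how link rows that point at a chain and at a token span might be grouped back into readable chains. The row shape (id, chain id, sentence index, start token, end token), the inclusive end index and the sentences are all assumptions made for illustration.

```python
from collections import defaultdict

# Tokenized sentences, keyed by sentence index (invented data).
SENTENCES = {
    0: ["Khan", "said", "he", "would", "resign"],
    1: ["The", "scientist", "left"],
}

# (link_id, coreference_chain_id, sentence_index, start_token, end_token)
LINKS = [
    ("l2", "c0", 1, 0, 1),
    ("l0", "c0", 0, 0, 0),
    ("l1", "c0", 0, 2, 2),
]


def mentions_by_chain(links, sentences):
    """Group link spans by chain, in document order, as mention strings."""
    grouped = defaultdict(list)
    for _, chain_id, sent, start, end in sorted(links, key=lambda l: (l[2], l[3])):
        # end is treated as inclusive here, which is an assumption of this sketch
        grouped[chain_id].append(" ".join(sentences[sent][start:end + 1]))
    return dict(grouped)


chains = mentions_by_chain(LINKS, SENTENCES)
print(chains)
```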


Name Tables

The name entity and name type tables represent the names in the corpus

Module Organization

Object Composition

on.corpora.tree

[Figure: class diagram of on.corpora.tree. Classes with their attributes and methods:]

com...function_tag
    attributes: id, type
    methods: write_to_db(cursor)

function_tag
    methods: write_to_db(cursor)

function_tag_type

phrase_type
    attributes: id, type_hash
    methods: __init__(), write_to_db(cursor)

pos_type
    attributes: id, type_hash
    methods: write_to_db(cursor)

syntactic_link
    attributes: id, word, reference_subtree_id, identity_subtree_id, associated_argument_id
    methods: __init__(), write_to_db(cursor)

syntactic_link_type
    attributes: id, type_hash
    methods: __init__(), write_to_db(cursor)

token
    attributes: id, word
    methods: __init__(), __repr__(), write_to_db()

tree
    attributes: id, tag, word, start, end, trace_type, reference_index, identity_index, lemma, child_index, document_id, sentence_id, indexed_node_id, indexing_node_id, proposition, on_sense, name_entity
    methods: __init__(), __repr__(), __getitem__(), from_string(parse), from_db(cursor), subtrees(), leaves(), get_word_string(), get_subtree(ud), initialize_ids(), process_leaves(), process_traces(), build_trace_chains(), is_trace_indexed(), is_trace_origin(), pretty_print(), write_to_db(cursor), dump_view()

tree_document
    attributes: doc_id, parse_list, tree_ids, subcorpus_id
    methods: __init__(), get_tree_ids(), get_tree(id), get_plain_text(), onf(), __repr__(), write_to_db(cursor), dump_view()

treebank
    attributes: id, tag, subcorpus, tree_ids, tree_hash, num_trees, tree_document_id
    methods: __init__(), from_db(cursor), write_to_db(cursor), dump_view()

Association roles (multiplicity): tree_document_hash (*), trees (1..*), children (0..*), parent (0..1), compound_function_tag (0..1), function_tags (1..*), function_tag_type (1), syntactic_links (0..*), syntactic_link_type (1), phrase_type (1), pos_type (1), token (0..1)


on.corpora.proposition

[Figure: class diagram of on.corpora.proposition. Classes with their attributes and methods:]

argument
    attributes: id, type, argument_index, argument_analogue_index
    methods: __init__(), __repr__(), write_to_db(cursor)

argument_analogue
    attributes: id, encoded_argument_analogue, argument_analogue_index, type
    methods: __init__(), __repr__()

argument_part
    attributes: id, tree_id, part_index
    methods: __init__(), __repr__(), write_to_db(cursor)

argument_type
    attributes: type_hash, type, id
    methods: __init__(), __repr__(), write_to_db(cursor)

frame_set
    attributes: lemma, subcorpus
    methods: __init__(), __repr__(), write_to_db(cursor)

pb_sense_type
    attributes: id, lemma, num, type_hash: variant
    methods: __init__(), __repr__(), write_to_db()

predicate
    attributes: id, enc_predicate, index, lemma, pb_sense_num, proposition_id
    methods: __init__(), get_primary_predicate(), __repr__(), write_to_db(cursor)

predicate_part
    attributes: id, type, predicate_id, part_index, primary, tree_id, lemma
    methods: __init__(), __repr__(), write_to_db(cursor)

predicate_type
    attributes: type_hash, type
    methods: __init__(), __repr__(), write_to_db(cursor)

proposition
    attributes: id, document_id, tree_id, corpus_id, sentence_index, predicate_type, lemma_and_sense, lemma, pb_sense_num
    methods: __init__(), get_primary_predicate(), __repr__(), write_to_db(cursor)

proposition_bank
    attributes: id, subcorpus, lemma_hash, tag
    methods: __init__(), enrich_treebank(treebank), write_to_db(cursor), from_db(cursor), dump_view()

Association roles (multiplicity): propositions (1..*), predicate (1), predicate_type (1), predicate_parts (1..*), argument_analogues (0..*), arguments (1..*), argument_parts (1..*), argument_type (1), frame_set_hash (1)


Salient Methods

Every bank has an enrich_treebank method, which takes a treebank object and aligns the bank to its trees

Almost every object has from_db and write_to_db methods, which create the object from the database or serialize it to the database

The SQL statements for reading from and writing to the database are class attributes of most classes

Banks DB Tables ⇔ Python Objects ⇔ File Elements

Treebank DB Tables ⇔ Python Objects ⇔ File Elements

PropBank DB Tables ⇔ Python Objects ⇔ File Elements

Word Sense DB Tables ⇔ Python Objects ⇔ File Elements

Coreference DB Tables ⇔ Python Objects ⇔ File Elements

Name DB Tables ⇔ Python Objects ⇔ File Elements

Inventories DB Tables ⇔ Python Objects ⇔ File Elements

Parallel DB Tables ⇔ Python Objects ⇔ File Elements

Speaker DB Tables ⇔ Python Objects ⇔ File Elements

Configuration File

Sections of the Configuration

[corpus]
data_in: [</path/to/data>]
load: (<lang>-<genre> | <lang>-<genre>-<source>)+
prefix: (<prefix>)*
suffix: (<suffix>)*
granularity: <granularity>
banks: (<bank>)+
ignore-inventories: (<inventory>)*

[db]
db: <ontonotes-database-name>
server: <your-mysql-server-address>
db-user: <your-mysql-username>

<lang> ::= ("english" | "chinese" | "arabic")
<genre> ::= ("nw" | "bn" | "mz" | "bc")
<source> ::= ("wsj" | "cnn" | "msnbc" | "xinhua" | ...)
<bank> ::= ("parse" | "prop" | "sense" | "coref" | "name" | "parallel" | "speaker")
<inventory> ::= ("senses" | "frames")
<granularity> ::= ("file" | "source" | "genre")
<prefix> ::= <digit>+
<suffix> ::= <digit>+
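The grammar for the load value can be checked with a short validator. This is a sketch under assumptions: the function name is hypothetical, and because the list of sources is open-ended in the grammar, any lowercase alphanumeric token is accepted as a source.

```python
import re

LANGS = ("english", "chinese", "arabic")
GENRES = ("nw", "bn", "mz", "bc")

# <lang>-<genre> or <lang>-<genre>-<source>
LOAD_ITEM = re.compile(
    r"^(?P<lang>" + "|".join(LANGS) + r")"
    r"-(?P<genre>" + "|".join(GENRES) + r")"
    r"(?:-(?P<source>[a-z0-9]+))?$"
)


def parse_load(value):
    """Split a load value into (lang, genre, source) tuples; source may be None."""
    items = value.split()
    if not items:
        raise ValueError("load needs at least one <lang>-<genre> entry")
    specs = []
    for item in items:
        match = LOAD_ITEM.match(item)
        if match is None:
            raise ValueError(f"not a valid load entry: {item!r}")
        specs.append((match["lang"], match["genre"], match["source"]))
    return specs


specs = parse_load("english-nw-wsj chinese-bc")
print(specs)
```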


A Sample Configuration

[corpus]
data_in : /corpora/ontonotes/v3/data
load : english-nw-wsj chinese-bc
prefix : 02 03
suffix :
granularity : file
banks : parse prop sense
ignore-inventories : senses frames

[db]
db : ontonotes_v3
server : ontonotes.bbn.com
db-user : ontonotes
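A configuration in this style can be read with Python's standard configparser module, assuming the file is plain INI syntax. The sketch below embeds a subset of the sample keys as a string; the function name and the choice to split values on whitespace are assumptions, not a description of how the OntoNotes tools load their configuration.

```python
import configparser

# A subset of the sample configuration, embedded for a self-contained run.
SAMPLE = """\
[corpus]
load : english-nw-wsj chinese-bc
prefix : 02 03
suffix :
granularity : file
banks : parse prop sense
"""


def read_corpus_section(text):
    """Parse the [corpus] section into Python lists and strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    corpus = parser["corpus"]
    return {
        "load": corpus["load"].split(),
        "prefix": corpus["prefix"].split(),
        "suffix": corpus["suffix"].split(),  # empty value becomes an empty list
        "granularity": corpus["granularity"],
        "banks": corpus["banks"].split(),
    }


settings = read_corpus_section(SAMPLE)
print(settings)
```

Read this way, the sample selects the parse, prop and sense banks for the two load entries, with the prefix values 02 and 03 and no suffix restriction.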


Configuration File

[corpus]

data_in : data

load : english-nw-wsj

granularity : source

banks : parse coref sense name prop parallel speaker

ignore-inventories : senses frames


Reading the Configuration

In [1]: import on

In [2]: import on.common.util

In [3]: c = on.common.util.load_config("config.example")

In [4]: c

Out[4]: <on.common.util.FancyConfigParser instance at 0x82c4c4c>

In [5]: c.sections()

Out[5]: ['corpus']

In [7]: c["corpus", "banks"]

Out[7]: 'parse coref sense name parallel prop speaker'
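The `c["corpus", "banks"]` lookup indexes the configuration with a (section, option) pair. How `FancyConfigParser` implements this is not shown in the slides; the sketch below reproduces only the access pattern on top of Python 3's standard `configparser`, using a hypothetical `PairIndexedConfig` wrapper.

```python
import configparser


class PairIndexedConfig:
    """Hypothetical wrapper: index options as config[section, option]."""

    def __init__(self, text):
        self._parser = configparser.ConfigParser()
        self._parser.read_string(text)

    def sections(self):
        return self._parser.sections()

    def __getitem__(self, key):
        section, option = key  # key is the tuple (section, option)
        return self._parser.get(section, option)

    def __setitem__(self, key, value):
        section, option = key
        self._parser.set(section, option, value)


c = PairIndexedConfig("""\
[corpus]
load : english-nw-wsj
granularity : source
banks : parse coref sense name prop parallel speaker
""")
print(c.sections())
print(c["corpus", "banks"])
c["corpus", "granularity"] = "file"  # override a value after loading
```

Assignment through the same pair syntax makes it possible to adjust a loaded configuration in an interactive session without editing the file.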


Creating the ontonotes Object

In [9]: o = on.ontonotes(c)

Loading english nw wsj

....................

found 4 files in the subcorpus all@wsj@nw@en@on

In [10]: o

Out[10]:

ontonotes instance, id=on, subcorpora:

[0] : all@wsj@nw@en@on


In [6]: c["corpus", "granularity"] = "file"

In [8]: o = on.ontonotes(c)

Loading english nw wsj

.....

found 1 file in the subcorpus 0089@wsj@nw@en@on

.....

found 1 file in the subcorpus 0020@wsj@nw@en@on

.....

found 1 file in the subcorpus 0049@wsj@nw@en@on

.....

found 1 file in the subcorpus 0037@wsj@nw@en@on

In [14]: o

Out[14]:

ontonotes instance, id=on, subcorpora:

[0] : 0089@wsj@nw@en@on

[1] : 0020@wsj@nw@en@on

[2] : 0049@wsj@nw@en@on

[3] : 0037@wsj@nw@en@on
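The printed ids suggest how `granularity` shapes the subcorpus list: with `source`, the four WSJ files share one subcorpus whose id starts with `all`, while with `file` each file becomes its own subcorpus. A minimal sketch of that grouping follows. `group_files` and its id-building scheme are inferred from the ids printed in the sessions, not taken from the OntoNotes code, and the `genre` setting is left out because its ids are not shown.

```python
from collections import OrderedDict


def group_files(file_ids, granularity, source="wsj", genre="nw", lang="en"):
    """Group file ids under subcorpus ids shaped like <unit>@<source>@<genre>@<lang>@on."""
    tail = "@".join([source, genre, lang, "on"])
    groups = OrderedDict()
    for file_id in file_ids:
        if granularity == "source":
            unit = "all"       # one subcorpus for the whole source
        elif granularity == "file":
            unit = file_id     # one subcorpus per file
        else:
            raise ValueError("granularity not covered by this sketch: " + granularity)
        groups.setdefault(unit + "@" + tail, []).append(file_id)
    return groups


files = ["0089", "0020", "0049", "0037"]
for granularity in ("source", "file"):
    for subcorpus_id, members in group_files(files, granularity).items():
        print(granularity, subcorpus_id, len(members))
```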


Loading the banks

In [11]: s = o[0]

Loading banks for all@wsj@nw@en@on: parse, coref, sense, name, parallel, prop, speaker ...

reading the treebank [parse] ....... 233 trees in the treebank

reading the coreference bank [coref] .......

Enriching parse with coref ...

reading the sense bank [sense] .......

Enriching parse with sense ...

....

reading the name bank [name] .......

Enriching parse with name ...

....

reading the parallel bank [parallel] ... keys: ['parse', 'prop', 'coref', 'name', 'sense']

reading the proposition bank [prop] .......

Enriching parse with prop ...

....

reading the speaker bank [speaker] ... keys: ['parse', 'prop', 'coref', 'name', 'sense']

Not enriching parse with speaker because we have no documents


Inside the subcorpus

In [12]: s
Out[12]:
subcorpus instance, id=all@wsj@nw@en@on, banks:
  [   coref] : gold@all@wsj@nw@en@on
  [document] : gold@all@wsj@nw@en@on
  [    name] : gold@all@wsj@nw@en@on
  [parallel] : gold@all@wsj@nw@en@on
  [   parse] : gold@all@wsj@nw@en@on
  [    prop] : gold@all@wsj@nw@en@on
  [   sense] : gold@all@wsj@nw@en@on
  [ speaker] : gold@all@wsj@nw@en@on

Accessing the same subcorpus again does not read from disk, because the API uses weakref:

In [14]: s = o[0]

In [15]:
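
The weakref remark can be illustrated with a minimal sketch. This is not the OntoNotes API: the `Corpus` and `Subcorpus` classes, the `loads` counter, and the choice of `weakref.WeakValueDictionary` are all assumptions made for illustration. The sketch only shows the general pattern in which a second `o[0]` returns the already-loaded object instead of loading it again, as long as something still references it.

```python
import weakref


class Subcorpus:
    """Hypothetical stand-in for a loaded subcorpus."""

    def __init__(self, subcorpus_id):
        self.id = subcorpus_id


class Corpus:
    """Hypothetical container that loads subcorpora lazily and caches them weakly."""

    def __init__(self, subcorpus_ids):
        self._ids = list(subcorpus_ids)
        # Values vanish from this mapping once nothing else references them.
        self._cache = weakref.WeakValueDictionary()
        self.loads = 0  # number of simulated disk reads

    def __getitem__(self, index):
        subcorpus_id = self._ids[index]
        cached = self._cache.get(subcorpus_id)
        if cached is not None:
            return cached  # still alive: no reload needed
        self.loads += 1  # stands in for the expensive read from disk
        loaded = Subcorpus(subcorpus_id)
        self._cache[subcorpus_id] = loaded
        return loaded


o = Corpus(["all@wsj@nw@en@on"])
s = o[0]        # first access: simulated load
s_again = o[0]  # second access: served from the weak cache
print(s is s_again, o.loads)
```

A weak cache of this kind avoids repeated loads without pinning large objects in memory: once the caller drops every reference to a subcorpus, it becomes eligible for garbage collection.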

Exploring Coreference Data

In [13]: c_bank = s["coref"]

In [15]: c_bank
Out[15]:
coreference bank instance, id=gold@all@wsj@nw@en@on, documents:
  [0] : nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [1] : nw/wsj/00/wsj_0037@all@wsj@nw@en@on
  [2] : nw/wsj/00/wsj_0049@all@wsj@nw@en@on
  [3] : nw/wsj/00/wsj_0089@all@wsj@nw@en@on
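
The ids printed so far appear to nest: the bank id is the subcorpus id with a `gold@` prefix, and each document id is a file path followed by `@` and the subcorpus id. The sketch below takes the ids apart under that reading. The helper names `split_document_id` and `bank_id` are hypothetical, and the id layout is an inference from the printed output rather than a documented guarantee.

```python
def split_document_id(document_id):
    """Split '<file path>@<subcorpus id>' at the first '@' (assumed layout)."""
    file_path, _, subcorpus_id = document_id.partition("@")
    return file_path, subcorpus_id


def bank_id(tag, subcorpus_id):
    """Compose a bank id such as 'gold@all@wsj@nw@en@on' (assumed layout)."""
    return f"{tag}@{subcorpus_id}"


doc_id = "nw/wsj/00/wsj_0020@all@wsj@nw@en@on"
file_path, subcorpus_id = split_document_id(doc_id)
print(file_path)                      # nw/wsj/00/wsj_0020
print(subcorpus_id)                   # all@wsj@nw@en@on
print(bank_id("gold", subcorpus_id))  # gold@all@wsj@nw@en@on
```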

In [17]: c_doc
Out[17]:
coreference document, id=nw/wsj/00/wsj_0020@all@wsj@nw@en@on, coreference chains:
  [ 0] : APPOS@000-52@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 1] : APPOS@000-57@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 2] : IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 3] : IDENT@000-12@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 4] : IDENT@000-25@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 5] : IDENT@000-2@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 6] : IDENT@000-30@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 7] : IDENT@000-33@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 8] : IDENT@000-36@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 9] : IDENT@000-38@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  ...
  [22] : IDENT@000-7@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [23] : IDENT@000-9@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
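
Each chain id in this listing starts with the chain type (`APPOS` or `IDENT`), and the listing looks like a plain string sort of the ids, which would explain why `000-25` comes before `000-2` (the character `5` sorts before `@`). The sketch below reproduces both observations on a handful of ids copied from the output. The helper `chain_type` is hypothetical, and the sort order is an inference from the output, not a statement about how the API orders chains.

```python
from collections import Counter

DOC = "nw/wsj/00/wsj_0020@all@wsj@nw@en@on"

# A few chain ids from the listing above, deliberately shuffled.
chain_ids = [
    f"IDENT@000-2@000@{DOC}",
    f"APPOS@000-57@000@{DOC}",
    f"IDENT@000-25@000@{DOC}",
    f"APPOS@000-52@000@{DOC}",
    f"IDENT@000-10@000@{DOC}",
]


def chain_type(chain_id):
    """Return the leading field of a chain id (assumed to be the chain type)."""
    return chain_id.split("@", 1)[0]


type_counts = Counter(chain_type(chain_id) for chain_id in chain_ids)
ordered = sorted(chain_ids)  # plain lexicographic string order

for position, chain_id in enumerate(ordered):
    print(f"[{position:2d}] : {chain_id}")
print(dict(type_counts))
```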

In [21]: c_chain = c_doc[0]

In [22]: c_chain
Out[22]:
coreference chain instance, id=APPOS@000-52@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on, links:
  [0] : ATTRIB@1:2:4@APPOS@000-52@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [1] : HEAD@1:6:14@APPOS@000-52@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on

In [24]: c_link_0 = c_chain[0]

In [25]: c_link_1 = c_chain[1]

In [26]: c_link_0
Out[26]: <coreference link object: id: ATTRIB@1:2:4@APPOS@000-52@000@nw/wsj/00/wsj_0020@all@wsj@...;
          type: ATTRIB --- 'five other countries'>

In [27]: c_link_1
Out[27]: <coreference link object: id: HEAD@1:6:14@APPOS@000-52@000@nw/wsj/00/wsj_0020@all@wsj@...;
          type: HEAD --- 'China , Thailand , India , Brazil and Mexico'>
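
The second field of each link id (`1:2:4`, `1:6:14`) is consistent with a sentence index followed by an inclusive start and end token index: `2:4` covers the three tokens of "five other countries" and `6:14` the nine tokens of the country list. The sketch below parses a link id under that assumption and slices a token list with it. `LinkSpan`, `parse_link_id`, and `span_text` are hypothetical helpers, and the token list is a stand-in in which only positions 2-4 and 6-14 come from the output above; the other positions are placeholders.

```python
from typing import NamedTuple


class LinkSpan(NamedTuple):
    link_type: str
    sentence: int
    start: int
    end: int  # assumed to be inclusive


def parse_link_id(link_id):
    """Parse '<type>@<sentence>:<start>:<end>@<chain id>' (assumed layout)."""
    link_type, span, _chain_id = link_id.split("@", 2)
    sentence, start, end = (int(part) for part in span.split(":"))
    return LinkSpan(link_type, sentence, start, end)


def span_text(tokens, span):
    """Join the tokens covered by the span; the end index is inclusive."""
    return " ".join(tokens[span.start : span.end + 1])


# Placeholder tokens "<t0>", "<t1>", "<t5>" mark positions not shown in the output.
tokens = ["<t0>", "<t1>", "five", "other", "countries", "<t5>",
          "China", ",", "Thailand", ",", "India", ",", "Brazil", "and", "Mexico"]

CHAIN = "APPOS@000-52@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on"
attrib = parse_link_id(f"ATTRIB@1:2:4@{CHAIN}")
head = parse_link_id(f"HEAD@1:6:14@{CHAIN}")

print(attrib.link_type, "->", span_text(tokens, attrib))
print(head.link_type, "->", span_text(tokens, head))
```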

In [32]: c_chain
Out[32]:
coreference chain instance, id=IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on, links:
  [ 0] : IDENT@1:34:38@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 1] : IDENT@3:0:1@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 2] : IDENT@3:10:10@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 3] : IDENT@4:0:0@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 4] : IDENT@6:0:1@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 5] : IDENT@7:19:19@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 6] : IDENT@10:35:36@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 7] : IDENT@17:0:1@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 8] : IDENT@18:0:0@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [ 9] : IDENT@19:5:6@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  [10] : IDENT@20:3:4@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on

In [38]: c_link_0 = c_chain[0]

In [39]: c_link_1 = c_chain[1]

In [40]: c_link_2 = c_chain[2]
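
A chain like this one can be summarized from its link ids alone. The sketch below assumes that the second field of each id is `sentence:start:end` with an inclusive end index, which is an inference from the output and not a documented format. It lists the link prefixes from the chain above (the shared chain-id suffix is omitted for brevity) and counts mentions per sentence and mention lengths. The helper `mention_span` is hypothetical.

```python
from collections import Counter

# Link id prefixes from the IDENT chain above; the common suffix
# "@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@nw@en@on" is left out.
link_ids = [
    "IDENT@1:34:38", "IDENT@3:0:1", "IDENT@3:10:10", "IDENT@4:0:0",
    "IDENT@6:0:1", "IDENT@7:19:19", "IDENT@10:35:36", "IDENT@17:0:1",
    "IDENT@18:0:0", "IDENT@19:5:6", "IDENT@20:3:4",
]


def mention_span(link_id):
    """Return (sentence, start, end) from the second '@'-separated field."""
    span_field = link_id.split("@")[1]
    sentence, start, end = (int(part) for part in span_field.split(":"))
    return sentence, start, end


spans = [mention_span(link_id) for link_id in link_ids]
mentions_per_sentence = Counter(sentence for sentence, _, _ in spans)
lengths = [end - start + 1 for _, start, end in spans]  # inclusive end assumed

print("sentences with a mention:", sorted(mentions_per_sentence))
print("sentences with more than one mention:",
      [s for s, n in mentions_per_sentence.items() if n > 1])
print("mention lengths in tokens:", lengths)
```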

In [32]: c chainOut[32]:coreference chain instance, id=IDENT@000-10@000@nw/wsj/00/wsj 0020@all@wsj@nw@en@on, links:

[ 0] : IDENT@1:34:38@IDENT@000-10@000@nw/wsj/00/wsj 0020@all@wsj@nw@en@on

[ 1] : IDENT@3:0:1@IDENT@000-10@000@nw/wsj/00/wsj 0020@all@wsj@nw@en@on

[ 2] : IDENT@3:10:10@IDENT@000-10@000@nw/wsj/00/wsj 0020@all@wsj@nw@en@on

[ 3] : IDENT@4:0:0@IDENT@000-10@000@nw/wsj/00/wsj 0020@all@wsj@nw@en@on

[ 4] : IDENT@6:0:1@IDENT@000-10@000@nw/wsj/00/wsj 0020@all@wsj@nw@en@on

[ 5] : IDENT@7:19:19@IDENT@000-10@000@nw/wsj/00/wsj 0020@all@wsj@nw@en@on

[ 6] : IDENT@10:35:36@IDENT@000-10@000@nw/wsj/00/wsj 0020@all@wsj@nw@en@on

[ 7] : IDENT@17:0:1@IDENT@000-10@000@nw/wsj/00/wsj 0020@all@wsj@nw@en@on

[ 8] : IDENT@18:0:0@IDENT@000-10@000@nw/wsj/00/wsj 0020@all@wsj@nw@en@on

[ 9] : IDENT@19:5:6@IDENT@000-10@000@nw/wsj/00/wsj 0020@all@wsj@nw@en@on

[10] : IDENT@20:3:4@IDENT@000-10@000@nw/wsj/00/wsj 0020@all@wsj@nw@en@on

In [38]: c link 0 = c chain[0]

In [39]: c link 1 = c chain[1]

In [40]: c link 2 = c chain[2]

Pradhan, Xue OntoNotes: The 90% Solution

Page 193: OntoNotes: The 90% Solutioncemantix.org/papers/ontonotes-tutorial.pdfKhan Korea large geopolitical entity and Word Sense The founder of Pakistan s nuclear department, Abdul Qadeer

Challenges with Multiple Layers of AnnotationArchitecture

Raw DataDatabase Design

Python API DesignData Access

ConfigurationCreating ontonotesExploring Various LayersExploring Parallel ConnectionsAdvanced TopicsCross-Layer Query

In [41]: c_link_0
Out[41]: <coreference link object: id: IDENT@1:34:38@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@...;
         type: IDENT --- ‘U.S. Trade Representative Carla Hills’>

In [42]: c_link_1
Out[42]: <coreference link object: id: IDENT@3:0:1@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@...;
         type: IDENT --- ‘Mrs. Hills’>

In [43]: c_link_2
Out[43]: <coreference link object: id: IDENT@3:10:10@IDENT@000-10@000@nw/wsj/00/wsj_0020@all@wsj@...;
         type: IDENT --- ‘she’>
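Each link carries the text of its mention, so a chain can be summarised by one of its mentions. The sketch below picks the longest mention string as a readable label for the entity. This is only a heuristic for illustration, not an OntoNotes rule. `Link` is a stand-in for the real link objects and carries only the `type` and `string` attributes.

```python
from collections import namedtuple

# Stand-in for the link objects shown above; only the attributes used here.
Link = namedtuple("Link", ["type", "string"])

chain = [
    Link("IDENT", "U.S. Trade Representative Carla Hills"),
    Link("IDENT", "Mrs. Hills"),
    Link("IDENT", "she"),
]


def representative_mention(links):
    """Return the longest mention string (a heuristic, not an OntoNotes rule)."""
    return max((link.string for link in links), key=len)


print(representative_mention(chain))
# U.S. Trade Representative Carla Hills
```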


In [45]: c_link_0.[tab]
c_link_0.__class__            c_link_0._get_subtree_id          c_link_0.end_token_index
c_link_0.__delattr__          c_link_0._sentence_index          c_link_0.end_word_index
c_link_0.__dict__             c_link_0._set_end_leaf            c_link_0.enrich_tree
c_link_0.__doc__              c_link_0._set_start_leaf          c_link_0.id
c_link_0.__getattribute__     c_link_0._set_string              c_link_0.overlaps
c_link_0.__hash__             c_link_0._start_leaf              c_link_0.primary_end_index
c_link_0.__init__             c_link_0._start_token_index       c_link_0.primary_start_index
c_link_0.__module__           c_link_0._start_word_index        c_link_0.sentence_index
c_link_0.__new__              c_link_0._string                  c_link_0.sql_create_statement
c_link_0.__reduce__           c_link_0._subtree_id              c_link_0.sql_insert_statement
c_link_0.__reduce_ex__        c_link_0.sql_table_name           c_link_0.valid
c_link_0.__repr__             c_link_0.start_leaf               c_link_0.write_to_db
c_link_0.__setattr__          c_link_0.start_token_index
c_link_0.__str__              c_link_0.start_word_index
c_link_0.__weakref__          c_link_0.string
c_link_0._end_leaf            c_link_0.subtree
c_link_0._end_token_index     c_link_0.subtree_id
c_link_0._end_word_index      c_link_0.type
c_link_0._get_end_leaf        c_link_0.copy_to_different_trees
c_link_0._get_sentence_index  c_link_0.coreference_chain
c_link_0._get_start_leaf      c_link_0.coreference_chain_id
c_link_0._get_string          c_link_0.end_leaf
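Tab completion lists private and special names alongside the public ones. Outside an interactive session, the built-in `dir()` gives the same kind of listing, and it can be filtered to the public names. `DemoLink` below is a hypothetical stand-in with one private backing field and one property, not the real link class.

```python
def public_attributes(obj):
    """Names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class DemoLink:
    """Hypothetical stand-in, not the real coreference link class."""

    def __init__(self):
        self._string = "she"  # private backing field
        self.type = "IDENT"

    @property
    def string(self):
        # Public read-only view of the private field.
        return self._string


print(public_attributes(DemoLink()))  # ['string', 'type']
```

The pairing of `_string` with `string` in the listing suggests that the public names are properties wrapping private fields, though the slides do not confirm this.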


In [45]: c_link_0.subtree
Out[45]:
(NP-SBJ (NML (NNP U.S.)
             (NNP Trade)
             (NNP Representative))
        (NNP Carla)
        (NNP Hills))

In [46]: c_link_0.subtree_id
Out[46]: '34:2@1@nw/wsj/00/wsj_0020@all@wsj@nw@en@on'

In [47]: c_link_0.type
Out[47]: 'IDENT'
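The subtree is what ties the coreference layer to the parse: the mention text is the yield of that constituent. The sketch below recovers the words from the bracketed string with a regular expression, without using the OntoNotes API. `SUBTREE`, `LEAF`, and `leaves` are names chosen for this example.

```python
import re

SUBTREE = """(NP-SBJ (NML (NNP U.S.)
             (NNP Trade)
             (NNP Representative))
        (NNP Carla)
        (NNP Hills))"""

# A leaf is "(TAG word)" with no nested parentheses inside it.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def leaves(bracketed):
    """Return (tag, word) pairs in left-to-right order."""
    return LEAF.findall(bracketed)


words = [word for _tag, word in leaves(SUBTREE)]
print(" ".join(words))  # U.S. Trade Representative Carla Hills
```

The result matches the mention string shown for link 0.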


Exploring Treebank Data

In [48]: t_bank = s["parse"]

In [49]: t_bank
Out[49]:
treebank instance, id=gold@all@wsj@nw@en@on, documents:
[0] : nw/wsj/00/wsj_0020@all@wsj@nw@en@on
[1] : nw/wsj/00/wsj_0037@all@wsj@nw@en@on
[2] : nw/wsj/00/wsj_0049@all@wsj@nw@en@on
[3] : nw/wsj/00/wsj_0089@all@wsj@nw@en@on

In [50]: t_doc = t_bank[0]

In [51]: t_doc
Out[51]:
tree document instance, id=nw/wsj/00/wsj_0020@all@wsj@nw@en@on, trees:
[ 0] : 0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
[ 1] : 1@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
[ 2] : 2@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
[ 3] : 3@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
[ 4] : 4@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
...
...
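In the listings above, the ids nest: a tree id is the tree's index followed by `@` and the id of its document. The sketch below takes such an id apart. Reading the ids this way is an inference from the printed output, and `split_tree_id` and `document_path` are hypothetical helpers, not functions from the API.

```python
def split_tree_id(tree_id):
    """Split "<tree index>@<document id>" at the first "@"."""
    index, document_id = tree_id.split("@", 1)
    return int(index), document_id


def document_path(document_id):
    """Return the part of a document id before the first "@"."""
    return document_id.split("@", 1)[0]


index, doc_id = split_tree_id("4@nw/wsj/00/wsj_0020@all@wsj@nw@en@on")
print(index)                  # 4
print(doc_id)                 # nw/wsj/00/wsj_0020@all@wsj@nw@en@on
print(document_path(doc_id))  # nw/wsj/00/wsj_0020
```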


In [52]: t_0 = t_doc[0]

In [56]: t_0
Out[56]:
<on.corpora.tree object id=0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on value=<(TOP (S (NP-SBJ-1 (DT The)
                  (NNP U.S.))
        (, ,)
        (S-ADV (NP-SBJ (-NONE- *PRO*-1))
               (VP (VBG claiming)
                   (NP (NP (DT some)
                           (NN success))
                       (PP-LOC (IN in)
                               (NP (PRP$ its)
                                   (NN trade)
                                   (NN diplomacy))))))
        (, ,)
        (VP (VBD removed)
            (NP (NP (NNP South)
                    (NNP Korea))
                (, ,)
                (NP (NNP Taiwan))
                (CC and)
                (NP (NNP Saudi)
                    (NNP Arabia)))
            (PP-CLR (IN from)
                    (NP (NP (DT a)
...
...
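The bracketed notation in this output can be read with a small recursive-descent parser. The sketch below turns one complete clause from the tree above into nested tuples and lists its node labels. It is a generic reader for this notation written for illustration, not the parser used by the OntoNotes tools. `parse`, `labels`, and `clause` are names chosen for this example.

```python
import re

# Tokens are "(", ")" or any run of characters without spaces/parentheses.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(text):
    """Parse one bracketed tree.

    A leaf is returned as (tag, word); an internal node as (label, [children]).
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        label = tokens[pos + 1]
        pos += 2
        if tokens[pos] != "(":      # leaf: "(TAG word)"
            word = tokens[pos]
            pos += 2                # skip the word and its ")"
            return (label, word)
        children = []
        while tokens[pos] == "(":
            children.append(node())
        pos += 1                    # skip the closing ")"
        return (label, children)

    return node()


def labels(tree):
    """Yield every node label in pre-order."""
    label, rest = tree
    yield label
    if isinstance(rest, list):
        for child in rest:
            yield from labels(child)


clause = """(S-ADV (NP-SBJ (-NONE- *PRO*-1))
       (VP (VBG claiming)
           (NP (NP (DT some) (NN success))
               (PP-LOC (IN in)
                       (NP (PRP$ its) (NN trade) (NN diplomacy))))))"""
print(list(labels(parse(clause)))[:4])  # ['S-ADV', 'NP-SBJ', '-NONE-', 'VP']
```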


In [65]: for leaf in t_0.leaves():
   ....:     print leaf
   ....:
   ....:
(DT The)
(NNP U.S.)
(, ,)
(-NONE- *PRO*-1)
(VBG claiming)
(DT some)
(NN success)
(IN in)
(PRP$ its)
(NN trade)
(NN diplomacy)
...
...
...
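As a rough illustration of what the leaf iteration above yields, the sketch below pulls the (tag, word) terminals out of a bracketed parse string with a regular expression. It does not use the OntoNotes API: the function name `leaves` and the shortened parse string are assumptions made for this example only.

```python
import re


def leaves(bracketed):
    """Return (tag, word) pairs for the terminals of a bracketed parse.

    A terminal is a "(TAG word)" group with no nested parentheses,
    which is what the pattern below matches.
    """
    return re.findall(r"\(([^\s()]+) ([^\s()]+)\)", bracketed)


# Shortened, hypothetical version of the sentence shown on the slide.
parse = ("(TOP (S (NP-SBJ-1 (DT The) (NNP U.S.)) (, ,) "
         "(S-ADV (NP-SBJ (-NONE- *PRO*-1)) (VP (VBG claiming)))))")

for tag, word in leaves(parse):
    print("(%s %s)" % (tag, word))
```

Note that the trace `(-NONE- *PRO*-1)` counts as a leaf here, as it does in the output above, so leaf indices include empty elements.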

In [19]: t_0[3:11]
Out[19]:
(S-ADV (NP-SBJ (-NONE- *PRO*-1))
       (VP (VBG claiming)
           (NP (NP (DT some)
                   (NN success))
               (PP-LOC (IN in)
                       (NP (PRP$ its)
                           (NN trade)
                           (NN diplomacy))))))

In [20]: t_0[3:10]
Out[20]:
[(-NONE- *PRO*-1),
 (VBG claiming),
 (DT some),
 (NN success),
 (IN in),
 (PRP$ its),
 (NN trade)]
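The two outputs above suggest that slicing a tree by leaf indices gives back a subtree when the span lines up exactly with a constituent, and a flat list of leaves otherwise. The sketch below reproduces that behaviour on a stand-in tree. `Node`, `parse`, and `span` are hypothetical names written for this example, not the `on.corpora.tree` implementation, and the top-down search (which returns the highest matching constituent) is an assumption.

```python
class Node(object):
    """Minimal stand-in for a parse-tree node (not on.corpora.tree)."""

    def __init__(self, tag, children=None, word=None):
        self.tag, self.children, self.word = tag, children or [], word

    def leaves(self):
        if self.word is not None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def __repr__(self):
        if self.word is not None:
            return "(%s %s)" % (self.tag, self.word)
        return "(%s %s)" % (self.tag, " ".join(repr(c) for c in self.children))


def parse(text):
    """Build a Node tree from a bracketed parse string."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = [0]

    def read():
        tag = tokens[pos[0] + 1]            # tokens[pos[0]] is "("
        pos[0] += 2
        if tokens[pos[0]] != "(":           # terminal: "(TAG word)"
            node = Node(tag, word=tokens[pos[0]])
            pos[0] += 1
        else:                               # non-terminal: read all children
            children = []
            while tokens[pos[0]] == "(":
                children.append(read())
            node = Node(tag, children)
        pos[0] += 1                         # consume ")"
        return node

    return read()


def span(root, start, stop):
    """Subtree covering exactly leaves[start:stop], else the leaf list."""
    wanted = root.leaves()[start:stop]

    def search(node):
        own = node.leaves()
        if len(own) == len(wanted) and all(a is b for a, b in zip(own, wanted)):
            return node
        for child in node.children:
            found = search(child)
            if found is not None:
                return found
        return None

    match = search(root)
    return match if match is not None else wanted


tree = parse(
    "(S (NP (DT The) (NNP U.S.)) (, ,) "
    "(S-ADV (NP-SBJ (-NONE- *PRO*-1)) (VP (VBG claiming) "
    "(NP (NP (DT some) (NN success)) "
    "(PP-LOC (IN in) (NP (PRP$ its) (NN trade) (NN diplomacy)))))))"
)
print(span(tree, 3, 11))   # leaves 3..10 form the S-ADV constituent
print(span(tree, 3, 10))   # no single constituent: flat list of leaves
```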

Exploring Proposition Data

In [87]: p = []

In [88]: for leaf in t_0.leaves():
   ....:     if(leaf.proposition != None):
   ....:         p.append(leaf.proposition)
   ....:
   ....:

In [95]: p[3]
Out[95]:
proposition:
  id : 29@0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  doc_id : nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  tree_id : 0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
  frame : watch.01
  enc_prop : wsj_0020@... 0 29 ... watch.01 ----- 29:0-rel 26:1-ARG0 30:0*25:1-ARG1 ... 24:1*25:1-LINK-SLC
  predicate:
    < predicate_analogue : id: watch.01@v@29@0@nw/wsj/00/wsj_0020@...; enc_self: '29:0-rel'
      < predicate : id: 0@watch.01@v@29@0@nw/wsj/00/wsj_0020@...; enc_self: '29:0'
        < predicate_node : id: 0@0@watch.01@v@29@0@nw/wsj/00/wsj_0020@...; enc_self: '29:0'>>>
  arguments:
    < argument_analogue : id: 0@ARG0@29@0@nw/wsj/00/wsj_0020@...; enc_self: '26:1-ARG0'
      < argument : id: 0@0@ARG0@29@0@nw/wsj/00/wsj_0020@...; enc_self: '26:1'
        < argument_node : id: 0@0@0@ARG0@29@0@nw/wsj/00/wsj_0020@...; enc_self: '26:1'>>>
    ...
    ...
  links:
    < link_analogue : id: 0@LINK-SLC@29@0@nw/wsj/00/wsj_0020@...; enc_self: '24:1*25:1-LINK-SLC'
      < link : id: 0@0@LINK-SLC@29@0@nw/wsj/00/wsj_0020@...; enc_self: '24:1'
        < link_node : id: 0@0@0@LINK-SLC@29@0@nw/wsj/00/wsj_0020@...; enc_self: '24:1'>>,
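The encoded proposition above follows the PropBank pointer notation: each field is one or more `token:height` pointers, joined by `*` when they belong to the same chain, followed by a label after the first hyphen. As a minimal sketch of how those fields decompose, the function below splits them apart. `parse_pointers` is a hypothetical helper written for this example, not part of the OntoNotes API. It handles only the `*` separator seen on this slide, and the fields elided with `...` in the output above are simply left out of the sample string.

```python
def parse_pointers(encoded):
    """Split 'token:height' pointer fields into (label, chain) pairs.

    Each chain is a list of (token_index, height) tuples.
    """
    result = []
    for field in encoded.split():
        # Split on the first hyphen only: labels such as LINK-SLC contain one.
        pointers, label = field.split("-", 1)
        chain = [tuple(int(n) for n in p.split(":"))
                 for p in pointers.split("*")]
        result.append((label, chain))
    return result


# Fields copied from the enc_prop line above (elided fields omitted).
fields = "29:0-rel 26:1-ARG0 30:0*25:1-ARG1 24:1*25:1-LINK-SLC"

for label, chain in parse_pointers(fields):
    print("%-8s %s" % (label, chain))
```

Read this way, `29:0-rel` says the predicate is the terminal at token index 29 itself, while `26:1-ARG0` points one level above token 26.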

In [98]: predicate = p[3].predicate

In [99]: predicate.[tab]
predicate.__class__         predicate.add                      predicate.get_primary_predicate
predicate.__delattr__       predicate.analogue_type            predicate.id
predicate.__dict__          predicate.children                 predicate.index_in_parent
predicate.__doc__           predicate.copy_to_different_trees  predicate.lemma
predicate.__getattribute__  predicate.document_id              predicate.parent
predicate.__getitem__       predicate.enc_self                 predicate.pb_sense_num
predicate.__hash__          predicate.enc_self_type            predicate.primary_predicate
predicate.__init__          predicate.enrich_tree              predicate.proposition
predicate.__len__           predicate.get_index_of             predicate.sentence_index

In [99]: predicate.lemma
Out[99]: u'watch'

In [100]: predicate.tree_id
Out[100]: '0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on'

In [101]: predicate.token_index
Out[101]: 29

In [102]: predicate.document_id
Out[102]: 'nw/wsj/00/wsj_0020@all@wsj@nw@en@on'

In [103]: predicate.sentence_index
Out[103]: 0

In [104]: predicate.type
Out[104]: u'v'
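The attribute values shown here line up with the proposition id printed earlier (`29@0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on`): the id reads as token index, sentence index, and document id joined by `@`. The sketch below makes that reading explicit. `split_proposition_id` is a hypothetical helper written for this example, and the layout is inferred from the values on these slides rather than taken from a specification.

```python
def split_proposition_id(prop_id):
    """Split 'token@sentence@document_id' into its three parts.

    Only the first two '@' separators are used, because the document id
    itself contains '@' characters.
    """
    token_index, sentence_index, document_id = prop_id.split("@", 2)
    return int(token_index), int(sentence_index), document_id


prop_id = "29@0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on"
token_index, sentence_index, document_id = split_proposition_id(prop_id)

# Under the same assumption, the tree id is the sentence index plus the document id.
tree_id = "%d@%s" % (sentence_index, document_id)

print(token_index)
print(sentence_index)
print(document_id)
print(tree_id)
```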

In [113]: link_analogue = proposition.link_analogues

In [120]: link_analogue[0][0][0]
Out[120]: <link_node id: 0@0@0@LINK-SLC@29@0@nw/wsj/00/wsj_0020@...@on; enc_self: '24:1'>

In [121]: link_node = link_analogue[0][0][0]

In [122]: link_node.type
Out[122]: u'LINK-SLC'

In [123]: link_node.subtree
Out[123]: (NP (NNS countries))
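The link node above carries the pointer `24:1` and resolves to the subtree `(NP (NNS countries))`. That fits the usual reading of `token:height` pointers: start at the leaf with that token index, then climb `height` levels. The sketch below shows the lookup on a toy tree. `Node`, `resolve`, and the two-word tree are hypothetical stand-ins for this example, not the OntoNotes classes, so the leaf indices differ from those in the real sentence.

```python
class Node(object):
    """Minimal tree node with parent links (stand-in, not on.corpora.tree)."""

    def __init__(self, tag, children=(), word=None):
        self.tag, self.word, self.parent = tag, word, None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def __repr__(self):
        if not self.children:
            return "(%s %s)" % (self.tag, self.word)
        return "(%s %s)" % (self.tag, " ".join(repr(c) for c in self.children))


def resolve(root, pointer):
    """Follow a 'token:height' pointer from a leaf up to its ancestor."""
    token_index, height = (int(n) for n in pointer.split(":"))
    node = root.leaves()[token_index]
    for _ in range(height):
        node = node.parent
    return node


# Toy tree: leaf 0 is "countries", leaf 1 is "watch".
toy = Node("S", [
    Node("NP", [Node("NNS", word="countries")]),
    Node("VP", [Node("VBP", word="watch")]),
])

print(resolve(toy, "0:1"))   # one level above leaf 0
print(resolve(toy, "1:0"))   # the leaf itself
```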


In [125]: link_node.subtree.start
Out[125]: 24

In [126]: link_node.subtree.end
Out[126]: 25

In [128]: link_node.id
Out[128]: u'0@0@0@LINK-SLC@29@0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on'

In [134]: link_node.subtree.get_word[tab]
link_node.subtree.get_word_index    link_node.subtree.get_word_string

In [134]: link_node.subtree.get_word_string()
Out[134]: u'countries'
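The ids in these listings are `@`-delimited, and every document id shown in the slides has six fields (a path followed by `all@wsj@nw@en@on`). As a rough sketch under that assumption, an annotation id can be split into its annotation-local fields and the trailing document id. The function name and the six-field constant are assumptions for illustration; the slides do not state what the individual local fields mean.

```python
DOCUMENT_ID_FIELDS = 6  # assumption: one path field plus five corpus-level fields


def split_annotation_id(annotation_id):
    """Split an '@'-delimited id into (local fields, document id)."""
    fields = annotation_id.split("@")
    if len(fields) <= DOCUMENT_ID_FIELDS:
        raise ValueError("id has no annotation-local fields: %r" % annotation_id)
    local_fields = fields[:-DOCUMENT_ID_FIELDS]
    document_id = "@".join(fields[-DOCUMENT_ID_FIELDS:])
    return local_fields, document_id


link_id = "0@0@0@LINK-SLC@29@0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on"
local_fields, document_id = split_annotation_id(link_id)
print(local_fields)  # ['0', '0', '0', 'LINK-SLC', '29', '0']
print(document_id)   # nw/wsj/00/wsj_0020@all@wsj@nw@en@on
```

Splitting from the right keeps the sketch independent of how many local fields a given annotation type carries.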


Exploring Senses

In [177]: s_bank = s["sense"]

In [178]: s_bank
Out[178]:
sense bank instance, id=gold@all@wsj@nw@en@on, documents:
[0] : nw/wsj/00/wsj_0020@all@wsj@nw@en@on
[1] : nw/wsj/00/wsj_0037@all@wsj@nw@en@on
[2] : nw/wsj/00/wsj_0049@all@wsj@nw@en@on
[3] : nw/wsj/00/wsj_0089@all@wsj@nw@en@on

In [179]: s_doc_0 = s_bank[0]

In [180]: s_doc_0
Out[180]:
senses tagged document instance, id=nw/wsj/00/wsj_0020@all@wsj@nw@en@on, on senses:
[ 0] : claim.2@v@3@0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
[ 1] : success.2@n@5@0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
[ 2] : trade.1@n@8@0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
[ 3] : remove.1@v@11@0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
[ 4] : list.1@n@21@0@nw/wsj/00/wsj_0020@all@wsj@nw@en@on
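Each sense id in the listing packs a lemma, a sense number, a part-of-speech tag, and two integers in front of the document id. A minimal sketch of unpacking those ids follows. The `SenseTag` tuple and `parse_sense_id` are hypothetical helpers, not part of the OntoNotes API. The two integer fields look like a token position and a sentence position, but the slides do not say so, which is why they are left with neutral names.

```python
from collections import namedtuple

# Hypothetical record type; field names are illustrative only.
SenseTag = namedtuple(
    "SenseTag", "lemma sense pos first_index second_index document_id"
)


def parse_sense_id(sense_id):
    """Unpack 'lemma.sense@pos@int@int@document_id' into a SenseTag."""
    head, pos, first, second, document_id = sense_id.split("@", 4)
    lemma, sense = head.rsplit(".", 1)
    return SenseTag(lemma, sense, pos, int(first), int(second), document_id)


DOC = "nw/wsj/00/wsj_0020@all@wsj@nw@en@on"
listed = [
    "claim.2@v@3@0@" + DOC,
    "success.2@n@5@0@" + DOC,
    "trade.1@n@8@0@" + DOC,
    "remove.1@v@11@0@" + DOC,
    "list.1@n@21@0@" + DOC,
]

tags = [parse_sense_id(s) for s in listed]
verbs = [t.lemma for t in tags if t.pos == "v"]
print(verbs)  # ['claim', 'remove']
```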


Exploring Parallel Connections

In [136]: s["parallel"]

Out[136]:

parallel_bank instance, id=gold@all@wsj@nw@en@on, documents:

(empty)


In [137]: c = on.common.util.load_config("config.parallel")

In [138]: o = on.ontonotes(c)

Loading chinese bc msnbc

.......

found 1 file in the subcorpus all@msnbc@bc@ch@on

Loading english bc msnbc

.......

found 1 file in the subcorpus all@msnbc@bc@en@on


In [139]: o

Out[139]:

ontonotes instance, id=on, subcorpora:

[0] : all@msnbc@bc@ch@on

[1] : all@msnbc@bc@en@on
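The two subcorpus ids differ only in their fourth field (`ch` versus `en`), which matches the loader messages "Loading chinese bc msnbc" and "Loading english bc msnbc". As a small sketch, subcorpora that are candidates for parallel alignment can be grouped by every field except the language. The function name and the field names `prefix` and `corpus` are assumptions made for illustration; this operates on id strings only and does not call the OntoNotes API.

```python
from collections import defaultdict


def group_by_language(subcorpus_ids):
    """Group five-field subcorpus ids that differ only in the language field."""
    groups = defaultdict(dict)
    for subcorpus_id in subcorpus_ids:
        # Assumed layout: prefix@source@genre@language@corpus
        prefix, source, genre, language, corpus = subcorpus_id.split("@")
        groups[(prefix, source, genre, corpus)][language] = subcorpus_id
    return dict(groups)


groups = group_by_language(["all@msnbc@bc@ch@on", "all@msnbc@bc@en@on"])
for key, by_language in groups.items():
    print(key, sorted(by_language))  # ('all', 'msnbc', 'bc', 'on') ['ch', 'en']
```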


In [140]: s_0 = o[0]
Loading banks for all@msnbc@bc@ch@on: parse, coref, sense, name, parallel, prop, speaker ...
reading the treebank [parse] .... 665 trees in the treebank
reading the coreference bank [coref] .... Enriching parse with coref ...
reading the sense bank [sense] .... Enriching parse with sense ...
reading the name bank [name] .... Enriching parse with name ...
reading the parallel bank [parallel] ....
finding original trees to prepare for parallel bank enrichment....
Loading banks for all@msnbc@bc@en@on: parse, coref, sense, name, parallel, prop, speaker ...
reading the treebank [parse] .... 660 trees in the treebank
reading the coreference bank [coref] .... Enriching parse with coref ...
reading the sense bank [sense] .... Enriching parse with sense ...
reading the name bank [name] .... Enriching parse with name ...
reading the parallel bank [parallel] ....
reading the proposition bank [prop] .... Enriching parse with prop ...
reading the speaker bank [speaker] .... Enriching parse with speaker ...
found 1 original treebanks.
enriching treebanks with tree-to-tree parallel data .....
reading the proposition bank [prop] .... Enriching parse with prop ...
reading the speaker bank [speaker] .... Enriching parse with speaker ...


In [140]: s 0 = o[0]

Loading banks for all@msnbc@bc@ch@on: parse, coref, sense, name, parallel, prop, speaker ...

reading the treebank [parse] .... 665 trees in the treebank

reading the coreference bank [coref] .... Enriching parse with coref ...

reading the sense bank [sense] .... Enriching parse with sense ...

reading the name bank [name].... Enriching parse with name ...

reading the parallel bank [parallel] ....

finding original trees to prepare for parallel bank enrichment....

Loading banks for all@msnbc@bc@en@on: parse, coref, sense, name, parallel, prop, speaker ...

reading the treebank [parse] .... 660 trees in the treebank

reading the coreference bank [coref] ....Enriching parse with coref ...

reading the sense bank [sense] .... Enriching parse with sense ...

reading the name bank [name].... Enriching parse with name ...

reading the parallel bank [parallel] ....

reading the proposition bank [prop] .... Enriching parse with prop ...

reading the speaker bank [speaker] .... Enriching parse with speaker ...

found 1 original treebanks.

enriching treebanks with tree-to-tree parallel data .....

reading the proposition bank [prop] ....Enriching parse with prop ...

reading the speaker bank [speaker] ....Enriching parse with speaker ...

Pradhan, Xue OntoNotes: The 90% Solution

Page 293: OntoNotes: The 90% Solutioncemantix.org/papers/ontonotes-tutorial.pdfKhan Korea large geopolitical entity and Word Sense The founder of Pakistan s nuclear department, Abdul Qadeer

Challenges with Multiple Layers of AnnotationArchitecture

Raw DataDatabase Design

Python API DesignData Access

ConfigurationCreating ontonotesExploring Various LayersExploring Parallel ConnectionsAdvanced TopicsCross-Layer Query

In [140]: s 0 = o[0]

Loading banks for all@msnbc@bc@ch@on: parse, coref, sense, name, parallel, prop, speaker ...

reading the treebank [parse] .... 665 trees in the treebank

reading the coreference bank [coref] .... Enriching parse with coref ...

reading the sense bank [sense] .... Enriching parse with sense ...

reading the name bank [name].... Enriching parse with name ...

reading the parallel bank [parallel] ....

finding original trees to prepare for parallel bank enrichment....

Loading banks for all@msnbc@bc@en@on: parse, coref, sense, name, parallel, prop, speaker ...

reading the treebank [parse] .... 660 trees in the treebank

reading the coreference bank [coref] ....Enriching parse with coref ...

reading the sense bank [sense] .... Enriching parse with sense ...

reading the name bank [name].... Enriching parse with name ...

reading the parallel bank [parallel] ....

reading the proposition bank [prop] .... Enriching parse with prop ...

reading the speaker bank [speaker] .... Enriching parse with speaker ...

found 1 original treebanks.

enriching treebanks with tree-to-tree parallel data .....

reading the proposition bank [prop] ....Enriching parse with prop ...

reading the speaker bank [speaker] ....Enriching parse with speaker ...

Pradhan, Xue OntoNotes: The 90% Solution

Page 294: OntoNotes: The 90% Solutioncemantix.org/papers/ontonotes-tutorial.pdfKhan Korea large geopolitical entity and Word Sense The founder of Pakistan s nuclear department, Abdul Qadeer

Challenges with Multiple Layers of AnnotationArchitecture

Raw DataDatabase Design

Python API DesignData Access

ConfigurationCreating ontonotesExploring Various LayersExploring Parallel ConnectionsAdvanced TopicsCross-Layer Query

In [140]: s 0 = o[0]

Loading banks for all@msnbc@bc@ch@on: parse, coref, sense, name, parallel, prop, speaker ...

reading the treebank [parse] .... 665 trees in the treebank

reading the coreference bank [coref] .... Enriching parse with coref ...

reading the sense bank [sense] .... Enriching parse with sense ...

reading the name bank [name].... Enriching parse with name ...

reading the parallel bank [parallel] ....

finding original trees to prepare for parallel bank enrichment....

Loading banks for all@msnbc@bc@en@on: parse, coref, sense, name, parallel, prop, speaker ...

reading the treebank [parse] .... 660 trees in the treebank

reading the coreference bank [coref] ....Enriching parse with coref ...

reading the sense bank [sense] .... Enriching parse with sense ...

reading the name bank [name].... Enriching parse with name ...

reading the parallel bank [parallel] ....

reading the proposition bank [prop] .... Enriching parse with prop ...

reading the speaker bank [speaker] .... Enriching parse with speaker ...

found 1 original treebanks.

enriching treebanks with tree-to-tree parallel data .....

reading the proposition bank [prop] ....Enriching parse with prop ...

reading the speaker bank [speaker] ....Enriching parse with speaker ...

Pradhan, Xue OntoNotes: The 90% Solution

Page 295: OntoNotes: The 90% Solutioncemantix.org/papers/ontonotes-tutorial.pdfKhan Korea large geopolitical entity and Word Sense The founder of Pakistan s nuclear department, Abdul Qadeer

Challenges with Multiple Layers of AnnotationArchitecture

Raw DataDatabase Design

Python API DesignData Access

ConfigurationCreating ontonotesExploring Various LayersExploring Parallel ConnectionsAdvanced TopicsCross-Layer Query

In [140]: s 0 = o[0]

Loading banks for all@msnbc@bc@ch@on: parse, coref, sense, name, parallel, prop, speaker ...

reading the treebank [parse] .... 665 trees in the treebank

reading the coreference bank [coref] .... Enriching parse with coref ...

reading the sense bank [sense] .... Enriching parse with sense ...

reading the name bank [name].... Enriching parse with name ...

reading the parallel bank [parallel] ....

finding original trees to prepare for parallel bank enrichment....

Loading banks for all@msnbc@bc@en@on: parse, coref, sense, name, parallel, prop, speaker ...

reading the treebank [parse] .... 660 trees in the treebank

reading the coreference bank [coref] ....Enriching parse with coref ...

reading the sense bank [sense] .... Enriching parse with sense ...

reading the name bank [name].... Enriching parse with name ...

reading the parallel bank [parallel] ....

reading the proposition bank [prop] .... Enriching parse with prop ...

reading the speaker bank [speaker] .... Enriching parse with speaker ...

found 1 original treebanks.

enriching treebanks with tree-to-tree parallel data .....

reading the proposition bank [prop] ....Enriching parse with prop ...

reading the speaker bank [speaker] ....Enriching parse with speaker ...

Pradhan, Xue OntoNotes: The 90% Solution

Page 296: OntoNotes: The 90% Solutioncemantix.org/papers/ontonotes-tutorial.pdfKhan Korea large geopolitical entity and Word Sense The founder of Pakistan s nuclear department, Abdul Qadeer

Challenges with Multiple Layers of AnnotationArchitecture

Raw DataDatabase Design

Python API DesignData Access

ConfigurationCreating ontonotesExploring Various LayersExploring Parallel ConnectionsAdvanced TopicsCross-Layer Query

If you try to load the next subcorpus, you will not see any output, because it has already been read automatically.

In [141]: s_1 = o[1]

In [142]:
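The silent second load is consistent with a load-once cache in which reading a subcorpus that has a parallel bank also pulls in its partner. The following is a minimal sketch of that behaviour only, not the tool's implementation: LazyCorpus, its constructor arguments and its log attribute are hypothetical names invented for this illustration.

```python
class LazyCorpus:
    """Toy stand-in (hypothetical) for an object that loads subcorpora on first access."""

    def __init__(self, subcorpus_ids, parallel_partner):
        self._ids = list(subcorpus_ids)
        # Assumed mapping: subcorpus id -> id of the subcorpus it is parallel to.
        self._partner = dict(parallel_partner)
        self._loaded = {}   # cache: subcorpus id -> loaded placeholder
        self.log = []       # one entry per real (non-cached) load

    def _load(self, sid):
        if sid in self._loaded:
            # Already read: return the cached object and print nothing.
            return self._loaded[sid]
        self.log.append("Loading banks for " + sid)
        self._loaded[sid] = {"id": sid}
        partner = self._partner.get(sid)
        if partner is not None:
            # Reading the parallel bank needs the partner's trees as well.
            self._load(partner)
        return self._loaded[sid]

    def __getitem__(self, index):
        return self._load(self._ids[index])


o = LazyCorpus(
    ["all@msnbc@bc@ch@on", "all@msnbc@bc@en@on"],
    {"all@msnbc@bc@ch@on": "all@msnbc@bc@en@on"},
)
s_0 = o[0]                      # loads the Chinese subcorpus and, through it, the English one
loads_after_first = len(o.log)
s_1 = o[1]                      # served from the cache, so nothing new is logged
loads_after_second = len(o.log)
print(o.log)
```

In this sketch both load messages are recorded during the first access, and the second access adds nothing to the log.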


But they are different:

In [143]: s_0

Out[143]: subcorpus instance, id=all@msnbc@bc@ch@on, banks:
[   coref] : gold@all@msnbc@bc@ch@on
[document] : gold@all@msnbc@bc@ch@on
[    name] : gold@all@msnbc@bc@ch@on
[parallel] : gold@all@msnbc@bc@ch@on
[   parse] : gold@all@msnbc@bc@ch@on
[    prop] : gold@all@msnbc@bc@ch@on
[   sense] : gold@all@msnbc@bc@ch@on
[ speaker] : gold@all@msnbc@bc@ch@on

In [144]: s_1

Out[144]: subcorpus instance, id=all@msnbc@bc@en@on, banks:
[   coref] : gold@all@msnbc@bc@en@on
[document] : gold@all@msnbc@bc@en@on
[    name] : gold@all@msnbc@bc@en@on
[parallel] : gold@all@msnbc@bc@en@on
[   parse] : gold@all@msnbc@bc@en@on
[    prop] : gold@all@msnbc@bc@en@on
[   sense] : gold@all@msnbc@bc@en@on
[ speaker] : gold@all@msnbc@bc@en@on
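To see exactly where the two subcorpus ids differ, one can compare them field by field. This is a small sketch that only assumes the ids are "@"-separated, as they appear in the output above; differing_fields is a helper name made up for this example and is not part of the OntoNotes API.

```python
def differing_fields(id_a, id_b, sep="@"):
    """Return (position, value_a, value_b) for every field where two ids differ."""
    fields_a = id_a.split(sep)
    fields_b = id_b.split(sep)
    if len(fields_a) != len(fields_b):
        raise ValueError("ids have different numbers of fields")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(fields_a, fields_b))
        if a != b
    ]


diff = differing_fields("all@msnbc@bc@ch@on", "all@msnbc@bc@en@on")
print(diff)  # [(3, 'ch', 'en')]
```

Only the fourth field changes (ch versus en); every other part of the id is shared by the two subcorpora.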


In [145]: c_t_b = s_0["parse"]

In [146]: e_t_b = s_1["parse"]

In [147]: c_t_b

Out[147]:

treebank instance, id=gold@all@msnbc@bc@ch@on, documents:

[0] : bc/msnbc/00/msnbc_0000@all@msnbc@bc@ch@on

In [148]: e_t_b

Out[148]:

treebank instance, id=gold@all@msnbc@bc@en@on, documents:

[0] : bc/msnbc/00/msnbc_0005@all@msnbc@bc@en@on

In [149]: c_t_doc = c_t_b[0]

In [150]: e_t_doc = e_t_b[0]

In [151]: c_t_0 = c_t_doc[0]

In [152]: e_t_0 = e_t_doc[0]


In [153]: c_t_0

Out[153]: <on.corpora.tree object id=0@bc/msnbc/00/msnbc_0000@all@msnbc@bc@ch@on value=<
(TOP (IP (CODE [speaker1 #1E])
         (IP (NP-SBJ (-NONE- *pro*))
             (VP (NP-PRD (NP (NP-PN (NR ...))
                             (NP-PN (NR ...))
                             (NP-PN (NN ...)
                                    (NN ...)
                                    (NN ...)))
                         (NP (NN ...)))))
         (PU ...)
         (IP (NP-SBJ (PN ...))
             (VP (VC ...)
                 (NP-PRD (DNP (NP-PN (NR ...))
                              (DEG ...))
                         (NP-PN (PU ...)
                                (IP (NP-SBJ (-NONE- *pro*))
                                    (VP (VV ...)
                                        (NP-OBJ (NN ...))))
                                (PU ...)))))
         (PU ...)))>
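The ids printed so far appear to nest: a tree id looks like a sentence index, then the document path, then the subcorpus id, all joined with "@". The sketch below splits a tree id on that assumption. The layout is inferred from the output shown here rather than from a documented specification, and TreeId and split_tree_id are hypothetical names.

```python
from typing import NamedTuple


class TreeId(NamedTuple):
    tree_index: int       # position of the tree within its document
    document_path: str    # e.g. bc/msnbc/00/msnbc_0000
    subcorpus_id: str     # e.g. all@msnbc@bc@ch@on


def split_tree_id(tree_id):
    """Split 'index@document_path@subcorpus_id' (assumed layout) into its parts."""
    # Split on the first two separators only, so the subcorpus id keeps its own "@"s.
    index, document_path, subcorpus_id = tree_id.split("@", 2)
    return TreeId(int(index), document_path, subcorpus_id)


parts = split_tree_id("0@bc/msnbc/00/msnbc_0000@all@msnbc@bc@ch@on")
print(parts.tree_index, parts.document_path, parts.subcorpus_id)
```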


In [154]: e_t_0

Out[154]: <on.corpora.tree object id=0@bc/msnbc/00/msnbc_0005@all@msnbc@bc@en@on value=<
(TOP (S (CODE [speaker1])
        (PP (IN From)
            (NP (NP (NNP ~NBC)
                    (NN news))
                (PP-LOC (IN in)
                        (NP (NNP Washington)))))
        (NP-SBJ (DT this))
        (VP (VBZ is)
            (NP-PRD (NP (NNP Meet)
                        (NNP the)
                        (NNP Press))
                    (PP (IN with)
                        (NP (NNP Jim)
                            (NNP Russert)))))
        (. /.)))>
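The bracketed value printed for a tree can be read back into a nested structure with a few lines of plain Python. This is a generic sketch of a bracket-notation reader, independent of the on.corpora classes; parse_tree and leaves are names chosen for this example.

```python
import re


def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())   # nested constituent
            else:
                children.append(tokens[pos])    # terminal string
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def leaves(node):
    """Collect the terminal strings of a parsed tree, left to right."""
    _label, children = node
    out = []
    for child in children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(leaves(child))
    return out


TEXT = """
(TOP (S (CODE [speaker1])
        (PP (IN From)
            (NP (NP (NNP ~NBC) (NN news))
                (PP-LOC (IN in) (NP (NNP Washington)))))
        (NP-SBJ (DT this))
        (VP (VBZ is)
            (NP-PRD (NP (NNP Meet) (NNP the) (NNP Press))
                    (PP (IN with) (NP (NNP Jim) (NNP Russert)))))
        (. /.)))
"""
tree = parse_tree(TEXT)
words = leaves(tree)
print(" ".join(words))
```

The first leaf is the [speaker1] marker under the CODE node; the remaining leaves are the tokens of the sentence in order.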


In [155]: c_t_0.originals

Out[155]: [<on.corpora.tree object id=0@bc/msnbc/00/msnbc_0005@all@msnbc@bc@en@on value=<
(TOP (S (CODE [speaker1])
        (PP (IN From)
            (NP (NP (NNP ~NBC)
                    (NN news))
                (PP-LOC (IN in)
                        (NP (NNP Washington)))))
        (NP-SBJ (DT this))
        (VP (VBZ is)
            (NP-PRD (NP (NNP Meet)
                        (NNP the)
                        (NNP Press))
                    (PP (IN with)
                        (NP (NNP Jim)
                            (NNP Russert)))))
        (. /.)))>]

In [156]: c_t_0.originals[0].translations
Out[156]: [<on.corpora.tree object id=0@bc/msnbc/00/msnbc_0000@all@msnbc@bc@ch@on value=<
(TOP (IP (CODE [speaker1 #1E])
         (IP (NP-SBJ (-NONE- *pro*))
             (VP (NP-PRD (NP (NP-PN (NR ...))
                             (NP-PN (NR ...))
                             (NP-PN (NN ...)
                                    (NN ...)
                                    (NN ...)))
                         (NP (NN ...)))))
         (PU ...)
         (IP (NP-SBJ (PN ...))
             (VP (VC ...)
                 (NP-PRD (DNP (NP-PN (NR ...))
                              (DEG ...))
                         (NP-PN (PU ...)
                                (IP (NP-SBJ (-NONE- *pro*))
                                    (VP (VV ...)
                                        (NP-OBJ (NN ...))))
                                (PU ...)))))
         (PU ...)))>]

In [167]: len(c_t_doc)
Out[167]: 665

In [169]: for c_t_index in range(0, len(c_t_doc)):
   .....:     if(len(c_t_doc[c_t_index].originals) > 1):
   .....:         print c_t_index
   .....:
   .....:
643

In [172]: c_t_643 = c_t_doc[643]
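The session above scans a translated document for trees that are linked to more than one original tree. The sketch below reproduces that idea on a toy example so it can be run without the corpus. The `Tree` class, `link`, and `many_to_one` are hypothetical stand-ins written for illustration; only the `originals` and `translations` attribute names come from the session itself.

```python
class Tree:
    """Hypothetical stand-in for a tree object that carries parallel links."""

    def __init__(self, tree_id):
        self.id = tree_id
        self.originals = []      # source-language trees that this tree translates
        self.translations = []   # target-language trees that translate this tree


def link(original, translation):
    """Record one alignment link in both directions."""
    original.translations.append(translation)
    translation.originals.append(original)


def many_to_one(trees):
    """Return the indexes of trees aligned to more than one original tree."""
    return [i for i, t in enumerate(trees) if len(t.originals) > 1]


english = [Tree("en-%d" % i) for i in range(3)]
chinese = [Tree("ch-%d" % i) for i in range(2)]
link(english[0], chinese[0])
link(english[1], chinese[1])   # two English trees ...
link(english[2], chinese[1])   # ... map to the same Chinese tree

print(many_to_one(chinese))
```

Keeping the link in both directions is what lets a query start from either language, as the `originals[0].translations` round trip in the earlier session does.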

In [173]: c_t_643.originals
Out[173]: [<on.corpora.tree object id=638@bc/msnbc/00/msnbc_0005@all@msnbc@bc@en@on value=<
(TOP (S (CODE [Tim Russert])
        (NP-SBJ-1 (PRP I))
        (VP (VBD was)
            (ADJP-PRD (JJ ready)
                      (S (NP-SBJ (-NONE- *PRO*-1))
                         (VP (TO to)
                             (VP (VB wear)
                                 (NP (DT this))
                                 (PP-PRP (IN for)
                                         (NP (DT the)
                                             (JJ final)
                                             (CD four))))))))
        (. /.)))>,
 <on.corpora.tree object id=639@bc/msnbc/00/msnbc_0005@all@msnbc@bc@en@on value=<
(TOP (S-UNF (CODE [Tim Russert])
            (CC but)
            (INTJ (UH uh))
            (INTJ (UH uh))
            (NP-SBJ (PRP I))
            (. /-)))>]

Advanced Configuration

[corpus]

data_in : data

load : english-nw-wsj

granularity : source

banks : parse coref sense name prop

b parse:parse b sense:b parse b prop:b parse

ignore-inventories: senses frames
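The `[corpus]` section above follows INI conventions, so its values can be inspected with Python's standard `configparser`. This is only a sketch of how such a file might be read, not the tool's own loader. Two points are assumptions: the first key is spelled `data_in` with an underscore, and the `load` value is read as language, genre, and source joined by hyphens. The line beginning `b parse:` is left out because its meaning is not clear from the slide.

```python
import configparser

CONFIG_TEXT = """
[corpus]
data_in : data
load : english-nw-wsj
granularity : source
banks : parse coref sense name prop
ignore-inventories: senses frames
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
corpus = parser["corpus"]

# Space-separated values become lists.
banks = corpus["banks"].split()
ignored = corpus.get("ignore-inventories", fallback="").split()

# Assumed reading of the load string: language-genre-source.
language, genre, source = corpus["load"].split("-")

print(banks, ignored, language, genre, source)
```

Restricting `banks` to the layers a task needs, and listing inventories to ignore, presumably shortens loading time, which is the point of an advanced configuration.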

Dealing with Alignment

Examples on Live CD

Cross-Layer Query: Example of DB Query Function

What is the distribution of named entities that are ARG0s of the predicate "say"?

Key steps: keep the propositions whose lemma is "say", keep their ARG0 arguments, then count the named entities attached to each argument node and to its subtrees.

    # a_cursor, ne_hash (a counter keyed by entity type), a_tree_document
    # and a_tree_id are set up outside this excerpt.
    for a_proposition in a_proposition_bank:
        if(a_proposition.lemma == "say"):
            arg_in_p_query = "select * from argument where proposition_id = '%s';" % (a_proposition.id)
            a_cursor.execute(arg_in_p_query)
            argument_rows = a_cursor.fetchall()

            for a_argument_row in argument_rows:
                a_argument_id = a_argument_row["id"]
                a_argument_type = a_argument_row["type"]

                if(a_argument_type == "ARG0"):
                    n_in_arg_q = "select * from argument_node where argument_id = '%s';" % (a_argument_id)
                    a_cursor.execute(n_in_arg_q)
                    argument_node_rows = a_cursor.fetchall()

                    for a_argument_node_row in argument_node_rows:
                        a_node_id = a_argument_node_row["node_id"]

                        # named entities that span the argument node itself
                        a_ne_node_query = "select * from name_entity where subtree_id = '%s';" % (a_node_id)
                        a_cursor.execute(a_ne_node_query)
                        ne_rows = a_cursor.fetchall()

                        for a_ne_row in ne_rows:
                            a_ne_type = a_ne_row["type"]
                            ne_hash[a_ne_type] = ne_hash[a_ne_type] + 1

                        # named entities nested inside the argument node
                        a_tree = a_tree_document.get_tree(a_tree_id)
                        a_node = a_tree.get_subtree(a_node_id)

                        for a_child in a_node.subtrees():
                            a_ne_subtree_query = "select * from name_entity where subtree_id = '%s';" % (a_child.id)
                            a_cursor.execute(a_ne_subtree_query)
                            ne_subtree_rows = a_cursor.fetchall()

                            for a_ne_subtree_row in ne_subtree_rows:
                                a_subtree_ne_type = a_ne_subtree_row["type"]
                                ne_hash[a_subtree_ne_type] = ne_hash[a_subtree_ne_type] + 1

Result:

    Name Entity     Frequency
    Person          84
    GPE             34
    Organization    29
    NORP            15
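The nested loops above issue one query per proposition, argument, and node. As a sketch of an alternative, the same filtering can be pushed into a single SQL join. The example runs against a toy in-memory SQLite database so that it is self-contained. The `argument`, `argument_node`, and `name_entity` tables and their columns are taken from the queries on the slide; the `proposition` table with a `lemma` column, and all of the rows, are assumptions made for illustration. Unlike the slide code, this version counts only entities attached to the argument node itself, because walking subtrees relies on the tree API.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row   # allows row["type"] style access
conn.executescript("""
    create table proposition   (id text, lemma text);            -- assumed table
    create table argument      (id text, proposition_id text, type text);
    create table argument_node (argument_id text, node_id text);
    create table name_entity   (subtree_id text, type text);

    -- made-up rows, for illustration only
    insert into proposition   values ('p1', 'say'), ('p2', 'wear');
    insert into argument      values ('a1', 'p1', 'ARG0'), ('a2', 'p1', 'ARG1'), ('a3', 'p2', 'ARG0');
    insert into argument_node values ('a1', 'n1'), ('a2', 'n2'), ('a3', 'n3');
    insert into name_entity   values ('n1', 'PERSON'), ('n2', 'GPE'), ('n3', 'ORG');
""")

query = """
    select ne.type as type, count(*) as frequency
    from proposition p
    join argument a       on a.proposition_id = p.id
    join argument_node an on an.argument_id = a.id
    join name_entity ne   on ne.subtree_id = an.node_id
    where p.lemma = ? and a.type = ?
    group by ne.type
"""

ne_counts = Counter()
# Placeholders (?) pass the values separately instead of formatting them into the SQL string.
for row in conn.execute(query, ("say", "ARG0")):
    ne_counts[row["type"]] = row["frequency"]

print(ne_counts.most_common())
```

Only the entity on the ARG0 of "say" is counted in the toy data; the ARG1 of "say" and the ARG0 of "wear" are filtered out by the `where` clause.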

Acknowledgment
