
Page 1: Incremental Structured Prediction Using a Global Learning

Incremental Structured Prediction Using a Global Learning and Beam-Search Framework

Yue Zhang1, Meishan Zhang2, Ting Liu2

Singapore University of Technology and Design1

[email protected]

Harbin Institute of Technology, China2

{mszhang, tliu}@ir.hit.edu.cn

Page 2: Incremental Structured Prediction Using a Global Learning

Outline

Introduction

Applications

Analysis

ZPar

Page 3: Incremental Structured Prediction Using a Global Learning

Outline

Introduction

Applications

Analysis

ZPar

Page 4: Incremental Structured Prediction Using a Global Learning

Introduction

Structured prediction problems

An overview of the transition system

Algorithms in details

Beam-search decoding

Online learning using averaged perceptron

Page 5: Incremental Structured Prediction Using a Global Learning

Introduction

Structured prediction problems

An overview of the transition system

Algorithms in details

Beam-search decoding

Online learning using averaged perceptron

Page 6: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Two important tasks in NLP

Classification

Output is a single label

Examples

Document classification

Sentiment analysis

Spam filtering

Structured prediction

Output is a set of inter-related labels or a structure

Page 7: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

POS Tagging

Page 8: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Dependency parsing

Page 9: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Constituent parsing

Page 10: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Machine Translation

Page 11: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Traditional solution

Score each candidate, select the highest-scored output

Search-space typically exponential

Over 100 possible trees for this seven-word sentence. Over one million trees for a 20-word sentence.

Page 12: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

One solution: dynamic programming methods

Independence assumption on features

Local features with global optimization

Solve the exponential problems in polynomial time

Page 13: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

One solution: dynamic programming methods

Independence assumption on features

Local features with global optimization

Solve the exponential problems in polynomial time

Examples

POS tagging: Markov assumption, p(t_i | t_{i-1}, …, t_1) = p(t_i | t_{i-1})

Viterbi decoding

Dependency parsing: arc-factorization

1st-order MST decoding

Page 14: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

The learning problem

How to score candidate items such that a higher score reflects a more correct candidate.

Examples

POS-tagging: HMM, CRF

Dependency parsing: MIRA

Page 15: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Transition-based methods with beam search decoding

A framework for structured prediction

Page 16: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Transition-based methods with beam search decoding

A framework for structured prediction

Incremental state transitions

Use transition actions to build the output

Typically left to right

Typically linear time

Page 17: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Transition-based methods with beam search decoding

A framework for structured prediction

Incremental state transitions

The search problem

To find a highest-score action sequence out of an exponential number of sequences, rather than scoring structures directly

Beam-search (non-exhaustive decoding)

Page 18: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Transition-based methods with beam search decoding

A framework for structured prediction

Incremental state transitions

The search problem

Non-local features

Arbitrary features enabled by beam-search

Page 19: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Transition-based methods with beam search decoding

A framework for structured prediction

Incremental state transitions

The search problem

Non-local features

The learning problem

To score candidate actions such that a higher-scored action sequence corresponds to a more correct output structure

Global discriminative learning

Page 20: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Transition-based methods with beam search decoding

A framework for structured prediction

Incremental state transitions

The search problem

Non-local features

The learning problem

The framework of this tutorial

(Zhang and Clark, CL 2011)

Page 21: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Transition-based methods with beam search decoding

The framework of this tutorial

Very high accuracies and efficiencies using this framework

Word segmentation (Zhang and Clark, ACL 2007)

POS-tagging

Dependency parsing (Zhang and Clark, EMNLP 2008; Huang and Sagae, ACL 2010; Zhang and Nivre, ACL 2011; Zhang and Nivre, COLING 2012; Goldberg et al., ACL 2013)

Constituent parsing (Collins and Roark, ACL 2004; Zhang and Clark, IWPT 2009; Zhu et al., ACL 2013)

CCG parsing (Zhang and Clark, ACL 2011)

Machine translation (Liu, ACL 2013)

Joint word segmentation and POS-tagging (Zhang and Clark, ACL 2008; Zhang and Clark, EMNLP 2010)

Joint POS-tagging and dependency parsing (Hatori et al., IJCNLP 2011; Bohnet and Nivre, EMNLP 2012)

Joint word segmentation, POS-tagging and parsing (Hatori et al., ACL 2012; Zhang et al., ACL 2013; Zhang et al., ACL 2014)

Joint morphological analysis and syntactic parsing (Bohnet et al., TACL 2013)

Page 22: Incremental Structured Prediction Using a Global Learning

Structured prediction problems

Transition-based methods with beam search decoding

The framework of this tutorial

Very high accuracies and efficiencies using this framework

General

Can be applied to any structured prediction task that can be transformed into an incremental process

Page 23: Incremental Structured Prediction Using a Global Learning

Introduction

Structured prediction problems

An overview of the transition system

Algorithms in details

Beam-search decoding

Online learning using averaged perceptron

Page 24: Incremental Structured Prediction Using a Global Learning

A transition system

Automata

State

Start state —— an empty structure

End state —— the output structure

Intermediate states —— partially constructed structures

Actions

Change one state to another

Page 25: Incremental Structured Prediction Using a Global Learning

Automata

A transition system

start

Page 26: Incremental Structured Prediction Using a Global Learning

Automata

A transition system

start -a0-> S1

Page 27: Incremental Structured Prediction Using a Global Learning

Automata

A transition system

start -a0-> S1 -a1-> …

Page 28: Incremental Structured Prediction Using a Global Learning

Automata

A transition system

start -a0-> S1 -a1-> … -a(i-1)-> Si

Page 29: Incremental Structured Prediction Using a Global Learning

Automata

A transition system

start -a0-> S1 -a1-> … -a(i-1)-> Si -ai-> …

Page 30: Incremental Structured Prediction Using a Global Learning

Automata

A transition system

start -a0-> S1 -a1-> … -a(i-1)-> Si -ai-> … -a(n-1)-> Sn

Page 31: Incremental Structured Prediction Using a Global Learning

Automata

A transition system

start -a0-> S1 -a1-> … -a(i-1)-> Si -ai-> … -a(n-1)-> Sn -an-> end

Page 32: Incremental Structured Prediction Using a Global Learning

State

Corresponds to partial results during decoding

start state, end state, Si

Actions

The operations that can be applied for state transition

Construct output incrementally

ai

A transition system

start -a0-> S1 -a1-> … -a(i-1)-> Si -ai-> … -a(n-1)-> Sn -an-> end

Page 33: Incremental Structured Prediction Using a Global Learning

A transition-based POS-tagging example

POS tagging

I like reading books → I/PRON like/VERB reading/VERB books/NOUN

Transition system

State

Partially labeled word-POS pairs

Unprocessed words

Actions

TAG(t): w1/t1 … wi/ti → w1/t1 … wi/ti wi+1/t
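This transition system can be sketched in a few lines of Python (an illustrative sketch, not the tutorial's implementation): a state pairs the already-tagged words with the remaining words, and TAG(t) consumes the next word.

```python
# Minimal sketch of the TAG(t) transition system for POS tagging.
# State = (tagged, remaining): tagged is a list of (word, tag) pairs,
# remaining is the list of words still to be processed.

def start_state(words):
    return ([], list(words))

def apply_tag(state, t):
    """TAG(t): tag the next unprocessed word with t."""
    tagged, remaining = state
    return (tagged + [(remaining[0], t)], remaining[1:])

def is_end(state):
    return len(state[1]) == 0

# Example: apply a fixed action sequence to "I like reading books".
state = start_state("I like reading books".split())
for t in ["PRON", "VERB", "VERB", "NOUN"]:
    state = apply_tag(state, t)
print(state[0])  # [('I', 'PRON'), ('like', 'VERB'), ('reading', 'VERB'), ('books', 'NOUN')]
```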

Page 34: Incremental Structured Prediction Using a Global Learning

A transition-based POS-tagging example

Start State

I like reading books

Page 35: Incremental Structured Prediction Using a Global Learning

A transition-based POS-tagging example

TAG(PRON)

I/PRON like reading books

Page 36: Incremental Structured Prediction Using a Global Learning

A transition-based POS-tagging example

TAG(VERB)

I/PRON like/VERB reading books

Page 37: Incremental Structured Prediction Using a Global Learning

A transition-based POS-tagging example

TAG(VERB)

I/PRON like/VERB reading/VERB books

Page 38: Incremental Structured Prediction Using a Global Learning

A transition-based POS-tagging example

TAG (NOUN)

I/PRON like/VERB reading/VERB books/NOUN

Page 39: Incremental Structured Prediction Using a Global Learning

A transition-based POS-tagging example

End State

I/PRON like/VERB reading/VERB books/NOUN

Page 40: Incremental Structured Prediction Using a Global Learning

Introduction

Structured prediction problems

An overview of the transition system

Algorithms in details

Beam-search decoding

Online learning using averaged perceptron

Page 41: Incremental Structured Prediction Using a Global Learning

Introduction

Structured prediction problems

An overview of the transition system

Algorithms in details

Beam-search decoding

Online learning using averaged perceptron

Page 42: Incremental Structured Prediction Using a Global Learning

Find the best sequence of actions

Search

[Figure: search space – from S0, alternative actions lead to candidate states S1, S'1, S''1, …, each expanding further until Sn, S'n, S''n, …; the number of action sequences grows exponentially]

Page 43: Incremental Structured Prediction Using a Global Learning

Dynamic programming

Optimal sub-problems are recorded according to their dynamic-programming signatures

Infeasible when features are non-local (and non-local features are typically useful)

One solution

Greedy classification

Input: Si

Output: a_i = argmax_{a'} w · f(Si, a')

For better accuracies: beam-search decoding

Search
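Below is a minimal sketch of the greedy baseline under a linear model, i.e. a_i = argmax_{a'} w · f(S_i, a'). The helpers valid_actions, extract_features, apply_action and is_end are assumed, task-specific functions; the sketch only shows how a single action is chosen at each state.

```python
# Greedy decoding sketch: at every state pick a_i = argmax_a w . f(S_i, a).
# valid_actions, extract_features, apply_action and is_end are assumed helpers.

def score(weights, features):
    """Linear model score: sum of the weights of the active features."""
    return sum(weights.get(f, 0.0) for f in features)

def greedy_decode(state, weights, valid_actions, extract_features, apply_action, is_end):
    actions = []
    while not is_end(state):
        best = max(valid_actions(state),
                   key=lambda a: score(weights, extract_features(state, a)))
        actions.append(best)
        state = apply_action(state, best)
    return state, actions
```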

Page 44: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

start

Zhang and Clark, CL 2011

Page 45: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

start

[Figure: start is expanded by actions a00 … a0k; the k best successor states S11 … S1k form the beam]

Zhang and Clark, CL 2011

Page 46: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

start

[Figure: each beam state S11 … S1k is expanded by actions a10 … a1k; the k best results form the next beam]

Zhang and Clark, CL 2011

Page 47: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

start

[Figure: expansion continues step by step; at step i the beam holds Si1 … Sik]

Zhang and Clark, CL 2011

Page 48: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

start

[Figure: the beam states Si1 … Sik are expanded by actions ai0 … aik to form the next beam]

Zhang and Clark, CL 2011

Page 49: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

start

[Figure: decoding terminates with the final states End1 … Endk; the highest-scored one is the output]

Zhang and Clark, CL 2011

Page 50: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Zhang and Clark, CL 2011
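The lattice built in the diagrams above corresponds to the following decoder skeleton (a sketch under the same assumed helpers as the greedy example, reusing its score function): every beam item carries the total score of all actions applied so far, and only the k best items survive each step.

```python
# Beam-search decoding sketch (k = beam size). Each item is
# (total_score, state, action_history); score() is from the greedy sketch.

def beam_decode(start, weights, k, valid_actions, extract_features, apply_action, is_end):
    beam = [(0.0, start, [])]
    while not all(is_end(state) for _, state, _ in beam):
        candidates = []
        for total, state, history in beam:
            if is_end(state):
                candidates.append((total, state, history))
                continue
            for a in valid_actions(state):
                gain = score(weights, extract_features(state, a))
                candidates.append((total + gain, apply_action(state, a), history + [a]))
        # Keep only the k highest-scored items.
        beam = sorted(candidates, key=lambda x: x[0], reverse=True)[:k]
    return max(beam, key=lambda x: x[0])   # highest-scored finished item
```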

Page 51: Incremental Structured Prediction Using a Global Learning

An example: POS-tagging

I like reading books

Beam-search decoding

start

Zhang and Clark, CL 2011

Page 52: Incremental Structured Prediction Using a Global Learning

An example: POS-tagging

I like reading books

Beam-search decoding

[Figure: start is expanded into candidate states I/PRON, I/NOUN, I/VERB, I/ADV, I/ADP, I/PREP, I/NR, …; only the k best are kept in the beam]

Zhang and Clark, CL 2011

Page 53: Incremental Structured Prediction Using a Global Learning

An example: POS-tagging

I like reading books

Beam-search decoding

[Figure: each beam state is extended by tagging "like", giving candidates such as I/PRON like/VERB, I/NOUN like/VERB, I/PRON like/CONJ, I/NOUN like/CONJ, …; the k best are kept]

Zhang and Clark, CL 2011

Page 54: Incremental Structured Prediction Using a Global Learning

An example: POS-tagging

I like reading books

Beam-search decoding

[Figure: the beam is extended over "reading" in the same way, one word at a time]

Zhang and Clark, CL 2011

Page 55: Incremental Structured Prediction Using a Global Learning

An example: POS-tagging

I like reading books

Beam-search decoding

[Figure: after the final word, the beam contains complete sequences such as I/PRON like/VERB reading/VERB books/NOUN, I/PRON like/VERB reading/ADJ books/NOUN, I/PRON like/VERB reading/NOUN books/NOUN, I/PRON like/CONJ reading/ADJ books/NOUN; the highest-scored sequence is the output]

Zhang and Clark, CL 2011

Page 56: Incremental Structured Prediction Using a Global Learning

Introduction

Structured prediction problems

An Overview of the transition system

Algorithms in details

Beam-search decoding

Online learning using averaged perceptron

Page 57: Incremental Structured Prediction Using a Global Learning

Online learning

start

Zhang and Clark, CL 2011

Page 58: Incremental Structured Prediction Using a Global Learning

Online learning

start

[Figure: the beam after step 1 holds S11 … S1k, including the gold state S1g]

Zhang and Clark, CL 2011

Page 59: Incremental Structured Prediction Using a Global Learning

Online learning

start

[Figure: each beam state is expanded again; the beam still contains the gold state]

Zhang and Clark, CL 2011

Page 60: Incremental Structured Prediction Using a Global Learning

Online learning

start

[Figure: at step i the beam Si1 … Sik still contains the gold state Sig]

Zhang and Clark, CL 2011

Page 61: Incremental Structured Prediction Using a Global Learning

Online learning

[Figure: at step i+1 the gold state S(i+1)g falls out of the beam; a perceptron update is performed at this point (early update)]

Zhang and Clark, CL 2011

Page 62: Incremental Structured Prediction Using a Global Learning

Online learning

Zhang and Clark, CL 2011
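A sketch of global perceptron learning with the early update of Collins and Roark (2004), as used in this framework: decoding stops as soon as the gold-standard item falls out of the beam, and the weights are updated with the feature difference between the gold action prefix and the best predicted prefix. The oracle gold_actions and the other helpers are assumed as before; weight averaging is omitted for brevity.

```python
# Sketch of perceptron training with early update (averaging omitted).
# gold_actions(example) is an assumed oracle returning the correct action
# sequence; the other helpers are the same assumed functions as before.

def seq_features(start, actions, extract_features, apply_action):
    feats, state = {}, start
    for a in actions:
        for f in extract_features(state, a):
            feats[f] = feats.get(f, 0) + 1
        state = apply_action(state, a)
    return feats

def train_one(example, weights, k,
              start_state, gold_actions, valid_actions, extract_features, apply_action):
    start, gold = start_state(example), gold_actions(example)
    beam = [(0.0, start, [])]
    for i in range(len(gold)):
        candidates = []
        for total, state, history in beam:
            for a in valid_actions(state):
                gain = score(weights, extract_features(state, a))
                candidates.append((total + gain, apply_action(state, a), history + [a]))
        beam = sorted(candidates, key=lambda x: x[0], reverse=True)[:k]
        if not any(hist == gold[:i + 1] for _, _, hist in beam):
            predicted = beam[0][2]          # early update: gold fell out of the beam
            break
    else:
        predicted = beam[0][2]
        if predicted == gold:
            return                          # correct output: no update
    gold_prefix = gold[:len(predicted)]
    gold_feats = seq_features(start, gold_prefix, extract_features, apply_action)
    pred_feats = seq_features(start, predicted, extract_features, apply_action)
    for f in set(gold_feats) | set(pred_feats):
        weights[f] = weights.get(f, 0.0) + gold_feats.get(f, 0) - pred_feats.get(f, 0)
```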

Page 63: Incremental Structured Prediction Using a Global Learning

Outline

Introduction

Applications

Analysis

ZPar

Page 64: Incremental Structured Prediction Using a Global Learning

Applications

Word segmentation

Dependency parsing

Context free grammar parsing

Combinatory categorial grammar parsing

Joint segmentation and POS-tagging

Joint POS-tagging and dependency parsing

Joint segmentation, POS-tagging and constituent parsing

Joint segmentation, POS-tagging and dependency parsing

Page 65: Incremental Structured Prediction Using a Global Learning

Applications

Word segmentation

Dependency parsing

Context free grammar parsing

Combinatory categorial grammar parsing

Joint segmentation and POS-tagging

Joint POS-tagging and dependency parsing

Joint segmentation, POS-tagging and constituent parsing

Joint segmentation, POS-tagging and dependency parsing

Page 66: Incremental Structured Prediction Using a Global Learning

Introduction

Chinese word segmentation

我喜欢读书 (Ilikereadingbooks) → 我 喜欢 读 书 (I like reading books)

Ambiguity

Out-of-vocabulary (OOV) words: 进步 (make progress; OOV) vs. 进 (advance; known) + 步 (step; known)

Known-word ambiguity: 这里面
这里 (here) 面 (flour) 很 (very) 贵 (expensive)
这 (here) 里面 (inside) 很 (very) 冷 (cold)

洽谈会很成功:
洽谈会 (discussion meeting) 很 (very) 成功 (successful)
洽谈 (discussion) 会 (will) 很 (very) 成功 (succeed)

Page 67: Incremental Structured Prediction Using a Global Learning

Introduction

No fixed standard

Only about 75% agreement among native speakers

Task dependency: 北京银行 can be one word 北京银行 (Bank of Beijing) or two words 北京 (Beijing) 银行 (bank)

Therefore, supervised learning with task-specific training corpora seems more appropriate, and it is the dominant approach.

Page 68: Incremental Structured Prediction Using a Global Learning

Introduction

The character-tagging approach

Map word segmentation into character tagging: 我 喜欢 读 书 → 我/S 喜/B 欢/E 读/S 书/S

Context information: a five-character window around the current character

Traditionally a CRF is used

This method can also be implemented using our framework!

(cf. the sequence labeling example in the intro)
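For concreteness, a small sketch of the mapping between segmented words and character tags. The slide's example uses B/E/S; an M tag for word-internal characters (an assumption of this sketch) is needed once words longer than two characters appear.

```python
# Sketch: map a segmented sentence to character tags (B/M/E for multi-character
# words, S for single-character words), as in the character-tagging approach.

def words_to_tags(words):
    tagged = []
    for w in words:
        if len(w) == 1:
            tagged.append((w, "S"))
        else:
            tagged.append((w[0], "B"))
            tagged.extend((c, "M") for c in w[1:-1])
            tagged.append((w[-1], "E"))
    return tagged

def tags_to_words(tagged):
    words, current = [], ""
    for c, t in tagged:
        current += c
        if t in ("S", "E"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words

print(words_to_tags(["我", "喜欢", "读", "书"]))
# [('我', 'S'), ('喜', 'B'), ('欢', 'E'), ('读', 'S'), ('书', 'S')]
```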

Page 69: Incremental Structured Prediction Using a Global Learning

Introduction

Limitation of the character-tagging method: 中国外企业
其中 (among which) 国外 (foreign) 企业 (companies)
中国 (in China) 外企 (foreign companies) 业务 (business)

Motivation of a word-based method

Compare candidates by word information directly

Potential for more linguistically motivated features

Zhang and Clark, ACL 2007

Page 70: Incremental Structured Prediction Using a Global Learning

The transition system

State

Partially segmented results

Unprocessed characters

Two candidate actions

Separate: start a new word with the next character (## ## → ## ## #)

Append: attach the next character to the last word (## ## → ## ###)

Zhang and Clark, ACL 2007
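A sketch of this transition system (precondition checks, e.g. that APPEND needs at least one existing word, are omitted): a state pairs the words segmented so far with the remaining characters.

```python
# Sketch of the word-based segmentation transition system: a state is
# (segmented_words, remaining_characters); SEPARATE starts a new word with the
# next character, APPEND attaches it to the last word.

def seg_start(sentence):
    return ([], list(sentence))

def separate(state):
    words, rest = state
    return (words + [rest[0]], rest[1:])

def append(state):
    words, rest = state
    return (words[:-1] + [words[-1] + rest[0]], rest[1:])

# 我喜欢读书 -> 我 / 喜欢 / 读 / 书, following the example slides:
state = seg_start("我喜欢读书")
for action in (separate, separate, append, separate, separate):
    state = action(state)
print(state[0])  # ['我', '喜欢', '读', '书']
```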

Page 71: Incremental Structured Prediction Using a Global Learning

The transition system

Initial State

我喜欢读书

I like reading books

Zhang and Clark, ACL 2007

Page 72: Incremental Structured Prediction Using a Global Learning

The transition system

Separate

喜欢读书 我

Zhang and Clark, ACL 2007

Page 73: Incremental Structured Prediction Using a Global Learning

The transition system

Separate

欢读书 我 喜

Zhang and Clark, ACL 2007

Page 74: Incremental Structured Prediction Using a Global Learning

The transition system

Append

读书 我 喜欢

Zhang and Clark, ACL 2007

Page 75: Incremental Structured Prediction Using a Global Learning

The transition system

Separate

书 我 喜欢 读

Zhang and Clark, ACL 2007

Page 76: Incremental Structured Prediction Using a Global Learning

The transition system

Separate

我 喜欢 读 书

Zhang and Clark, ACL 2007

Page 77: Incremental Structured Prediction Using a Global Learning

The transition system

End State

我 喜欢 读 书

Zhang and Clark, ACL 2007

Page 78: Incremental Structured Prediction Using a Global Learning

Beam search

ABCDE

“”

Candidates Agenda

Zhang and Clark, ACL 2007

Page 79: Incremental Structured Prediction Using a Global Learning

Beam search

BCDE

A

Zhang and Clark, ACL 2007

Candidates Agenda

Page 80: Incremental Structured Prediction Using a Global Learning

Beam search

BCDE

A

Zhang and Clark, ACL 2007

Candidates Agenda

Page 81: Incremental Structured Prediction Using a Global Learning

Beam search

CDE

A AB A B

Zhang and Clark, ACL 2007

Candidates Agenda

Page 82: Incremental Structured Prediction Using a Global Learning

Beam search

CDE

AB A B

Zhang and Clark, ACL 2007

Candidates Agenda

Page 83: Incremental Structured Prediction Using a Global Learning

Beam search

CDE

AB A B

Zhang and Clark, ACL 2007

Candidates Agenda

Page 84: Incremental Structured Prediction Using a Global Learning

Beam search

DE

AB A B

ABC AB C A BC A B C

Zhang and Clark, ACL 2007

Candidates Agenda

Page 85: Incremental Structured Prediction Using a Global Learning

The beam search decoder

For a given sentence of length l, there are 2^(l-1) possible segmentations (e.g. 16 for a five-character sentence).

The agenda size is limited, keeping only the B best candidates

Zhang and Clark, ACL 2007

Page 86: Incremental Structured Prediction Using a Global Learning

Feature templates

1. word w
2. word bigram w1w2
3. single-character word w
4. a word starting with character c and having length l
5. a word ending with character c and having length l
6. space-separated characters c1 and c2
7. character bigram c1c2 in any word
8. the first and last characters c1 and c2 of any word
9. word w immediately before character c
10. character c immediately before word w
11. the starting characters c1 and c2 of two consecutive words
12. the ending characters c1 and c2 of two consecutive words
13. a word with length l and the previous word w
14. a word with length l and the next word w

Zhang and Clark, ACL 2007

Page 87: Incremental Structured Prediction Using a Global Learning

Experimental results

[Figure: accuracy/speed trade-off on CTB5 for beam sizes 1, 2, 4, 8, 16, 32, 64]

Trade-off between speed and accuracy (CTB5).

Zhang and Clark, ACL 2007

Page 88: Incremental Structured Prediction Using a Global Learning

Experimental results

Compare with other systems (SIGHAN 2005).

AS CU PU SAV OAV

S01 93.8 90.1 95.1 93.0 95.5

S04 93.9 93.9 94.8

S05 94.2 89.4 91.8 95.9

S06 94.5 92.4 92.4 93.1 95.5

S08 90.4 93.6 92.9 94.8

S09 96.1 94.6 95.4 95.9

S10 94.7 94.7 94.8

S12 95.9 91.6 93.8 95.9

Peng 95.6 92.8 94.1 94.2 95.5

Z&C 07 97.0 94.6 94.6 95.4 95.5

Zhang and Clark, ACL 2007

Page 89: Incremental Structured Prediction Using a Global Learning

Applications

Word segmentation

Dependency parsing

Context free grammar parsing

Combinatory categorial grammar parsing

Joint segmentation and POS-tagging

Joint POS-tagging and dependency parsing

Joint segmentation, POS-tagging and constituent parsing

Joint segmentation, POS-tagging and dependency parsing

Page 90: Incremental Structured Prediction Using a Global Learning

Dependency syntax

Dependency structures represent syntactic relations (dependencies) by drawing links between word pairs in a sentence.

For the link: a telescope

One end of the link (e.g. "a") is called the modifier / dependent / child; the other end (e.g. "telescope") is called the head / governor / parent.

Page 91: Incremental Structured Prediction Using a Global Learning

Dependency graphs

A dependency structure is a directed graph G with the following constraints:

Connected

Acyclic

Single-head

These constraints make the dependency graph a tree.

Page 92: Incremental Structured Prediction Using a Global Learning

A dependency tree structure represents syntactic relations between word pairs in a sentence

[Figure: two alternative dependency trees for the ambiguous sentence "I saw her duck with a telescope", with arc labels such as subj, obj, det, gen and mod]

Dependency trees

Page 93: Incremental Structured Prediction Using a Global Learning

Categorization (Kübler et al. 2009)

Projective

Non-projective

Dependency trees

93

Page 94: Incremental Structured Prediction Using a Global Learning

Score each possible output

Often use dynamic programming to explore search space

The graph-based solution

McDonald et al., ACL 2005; Carreras, EMNLP-CoNLL 2007; Koo and Collins, ACL 2010

Page 95: Incremental Structured Prediction Using a Global Learning

Projective

Arc-eager

Arc-standard (Nivre, CL 2008)

Non-projective

Arc-standard + swap (Nivre, ACL 2009)

Transition systems

95

Page 96: Incremental Structured Prediction Using a Global Learning

The arc-eager transition system

State

A stack to hold partial structures

A queue of next incoming words

Actions

SHIFT, REDUCE, ARC-LEFT, ARC-RIGHT
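A sketch of the four arc-eager actions over a (stack, buffer, arcs) state of word indices; preconditions (e.g. REDUCE requires the stack top to already have a head) are omitted. The action sequence replays the "He does it here" example from the later slides.

```python
# Sketch of the arc-eager actions. A state is (stack, buffer, arcs), where
# arcs is a set of (head, dependent) word-index pairs.

def ae_start(n_words):
    return ([], list(range(n_words)), set())

def shift(state):                # push N0 onto the stack
    stack, buf, arcs = state
    return (stack + [buf[0]], buf[1:], arcs)

def reduce_(state):              # pop ST (which must already have a head)
    stack, buf, arcs = state
    return (stack[:-1], buf, arcs)

def arc_left(state):             # ST <- N0: pop ST, add arc N0 -> ST
    stack, buf, arcs = state
    return (stack[:-1], buf, arcs | {(buf[0], stack[-1])})

def arc_right(state):            # ST -> N0: add arc ST -> N0, push N0
    stack, buf, arcs = state
    return (stack + [buf[0]], buf[1:], arcs | {(stack[-1], buf[0])})

# "He does it here" = indices 0..3; action sequence S AL S AR R AR R:
state = ae_start(4)
for act in (shift, arc_left, shift, arc_right, reduce_, arc_right, reduce_):
    state = act(state)
print(sorted(state[2]))          # [(1, 0), (1, 2), (1, 3)]: does -> {He, it, here}
```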

Page 97: Incremental Structured Prediction Using a Global Learning

State

97

ST STP ...

STLC STRC

The stack

The input

N0 N1 N2 N3 ...

N0LC

The arc-eager transition system

Page 98: Incremental Structured Prediction Using a Global Learning

Actions

Shift

98

The arc-eager transition system

ST STP ...

STLC STRC

The stack

The input

N0 N1 N2 N3 ...

N0LC

Page 99: Incremental Structured Prediction Using a Global Learning

Actions

Shift

Pushes stack

99

N0LC

The arc-eager transition system

ST STP ...

STLC STRC

The stack

The input

N1 N2 N3 ... N0

Page 100: Incremental Structured Prediction Using a Global Learning

Actions

Reduce

100

ST STP ...

STLC STRC

The stack

The input

N0 N1 N2 N3 ...

N0LC

The arc-eager transition system

Page 101: Incremental Structured Prediction Using a Global Learning

Actions

Reduce

Pops stack

ST

STP ...

STLC STRC

The stack

The input

N0 N1 N2 N3 ...

N0LC

101

The arc-eager transition system

Page 102: Incremental Structured Prediction Using a Global Learning

Actions

Arc-Left

102

ST STP ...

STLC STRC

The stack

The input

N0 N1 N2 N3 ...

N0LC

The arc-eager transition system

Page 103: Incremental Structured Prediction Using a Global Learning

Actions

Arc-Left

Pops stack

Adds link

103

STP ...

The stack

The input

N0 N1 N2 N3 ...

N0LC ST

STLC STRC

The arc-eager transition system

Page 104: Incremental Structured Prediction Using a Global Learning

Actions

Arc-right

104

ST STP ...

STLC STRC

The stack

The input

N0 N1 N2 N3 ...

N0LC

The arc-eager transition system

Page 105: Incremental Structured Prediction Using a Global Learning

Actions

Arc-right

Pushes stack

Adds link

105

The arc-eager transition system

ST STP ...

STLC STRC

The stack

The input

N1 N2 N3 ...

N0LC

N0

Page 106: Incremental Structured Prediction Using a Global Learning

An example (S = Shift, R = Reduce, AL = ArcLeft, AR = ArcRight)

He does it here

The arc-eager transition system

Page 107: Incremental Structured Prediction Using a Global Learning

An example (S = Shift, R = Reduce, AL = ArcLeft, AR = ArcRight)

[Figure: Shift – stack: He; queue: does it here]

The arc-eager transition system

Page 108: Incremental Structured Prediction Using a Global Learning

An example (S = Shift, R = Reduce, AL = ArcLeft, AR = ArcRight)

[Figure: ArcLeft – "He" is popped and attached as a dependent of "does"; stack: empty; queue: does it here]

The arc-eager transition system

Page 109: Incremental Structured Prediction Using a Global Learning

An example (S = Shift, R = Reduce, AL = ArcLeft, AR = ArcRight)

[Figure: Shift – stack: does; queue: it here]

The arc-eager transition system

Page 110: Incremental Structured Prediction Using a Global Learning

An example (S = Shift, R = Reduce, AL = ArcLeft, AR = ArcRight)

[Figure: ArcRight – "it" is attached as a dependent of "does" and pushed; stack: does it; queue: here]

The arc-eager transition system

Page 111: Incremental Structured Prediction Using a Global Learning

An example (S = Shift, R = Reduce, AL = ArcLeft, AR = ArcRight)

[Figure: Reduce – "it" is popped; stack: does; queue: here]

The arc-eager transition system

Page 112: Incremental Structured Prediction Using a Global Learning

An example (S = Shift, R = Reduce, AL = ArcLeft, AR = ArcRight)

[Figure: ArcRight – "here" is attached as a dependent of "does" and pushed; stack: does here; queue: empty]

The arc-eager transition system

Page 113: Incremental Structured Prediction Using a Global Learning

An example (S = Shift, R = Reduce, AL = ArcLeft, AR = ArcRight)

[Figure: Reduce – final tree: "does" is the root, with dependents "He", "it" and "here"]

The arc-eager transition system

Page 114: Incremental Structured Prediction Using a Global Learning

Arc-eager

Time complexity: linear

Every word is pushed once onto the stack

Every word except the root is popped once

Links are added between ST and N0

As soon as both the head and the dependent are available (hence "eager")

114

The arc-eager transition system

Page 115: Incremental Structured Prediction Using a Global Learning

Arc-eager

Labeled parsing? Expand the link-adding actions: ArcLeft-subject, ArcLeft-noun-modifier, …; ArcRight-object, ArcRight-prep-modifier, …

115

The arc-eager transition system

Page 116: Incremental Structured Prediction Using a Global Learning

State

A stack to hold partial candidates

A queue of next incoming words

Actions

SHIFT LEFT-REDUCE RIGHT-REDUCE

Builds arcs between ST0 and ST1

Associated with shift-reduce CFG parsing process

116

The arc-standard transition system
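For comparison with the arc-eager sketch above, the two arc-standard reduce actions over the same (stack, buffer, arcs) representation (SHIFT is unchanged): both build an arc between the top two stack items rather than between ST and N0.

```python
# Arc-standard reduce actions (SHIFT is the same as in the arc-eager sketch).

def left_reduce(state):          # ST1 <- ST: pop ST1, add arc ST -> ST1
    stack, buf, arcs = state
    return (stack[:-2] + [stack[-1]], buf, arcs | {(stack[-1], stack[-2])})

def right_reduce(state):         # ST1 -> ST: pop ST, add arc ST1 -> ST
    stack, buf, arcs = state
    return (stack[:-1], buf, arcs | {(stack[-2], stack[-1])})
```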

Page 117: Incremental Structured Prediction Using a Global Learning

Actions

Shift

117

ST ST1 ...

STLC STRC

The stack

The input

N0 N1 N2 N3 ...

The arc-standard transition system

Page 118: Incremental Structured Prediction Using a Global Learning

Actions

Shift

Pushes stack

118

ST ST1 ...

STLC STRC

The stack

The input

N1 N2 N3 ... N0

The arc-standard transition system

Page 119: Incremental Structured Prediction Using a Global Learning

Actions

Left-reduce

119

ST ST1 ...

STLC STRC

The stack

The input

N0 N1 N2 N3 ...

The arc-standard transition system

Page 120: Incremental Structured Prediction Using a Global Learning

Actions

Left-reduce

Pops stack

Adds link

120

ST

ST1

...

STLC STRC

The stack

The input

N0 N1 N2 N3 ...

The arc-standard transition system

Page 121: Incremental Structured Prediction Using a Global Learning

Actions

Right-reduce

121

ST ST1 ...

STLC STRC

The stack

The input

N0 N1 N2 N3 ...

The arc-standard transition system

Page 122: Incremental Structured Prediction Using a Global Learning

Actions

Right-reduce

Pops stack

Adds link

122

ST

ST1 ...

STLC STRC

The stack

The input

N0 N1 N2 N3 ...

The arc-standard transition system

Page 123: Incremental Structured Prediction Using a Global Learning

Characteristic

Time complexity: linear

Empirically comparable with arc-eager, though relative accuracies differ across languages

123

The arc-standard transition system

Page 124: Incremental Structured Prediction Using a Global Learning

Non-projectivity

Online reordering (Nivre 2009)

Based on an extra action to the parser: swap

Not linear any more

Can be quadratic due to swap

Expected linear time

[Diagram: SWAP moves ST1, the second node on the stack, back to the front of the input: stack … ST1 ST | input N0 N1 N2 … → stack … ST | input ST1 N0 N1 N2 …]

Page 125: Incremental Structured Prediction Using a Global Learning

Non-projectivity

Initial

125

A meeting was scheduled for this today

Page 126: Incremental Structured Prediction Using a Global Learning

Non-projectivity

SHIFT

126

A meeting was scheduled for this today

Page 127: Incremental Structured Prediction Using a Global Learning

Non-projectivity

SHIFT

127

A meeting was scheduled for this today

Page 128: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

ARC-LEFT

128

meeting was scheduled for this today

A

Page 129: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SHIFT

129

meeting was scheduled for this today

A

Page 130: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SHIFT

130

meeting was scheduled for this today

A

Page 131: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SHIFT

131

meeting was scheduled for this today

A

Page 132: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SWAP

meeting was for scheduled this today

A

Page 133: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SWAP

meeting for was scheduled this today

A

Page 134: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SHIFT

meeting for was scheduled this today

A

Page 135: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SHIFT

meeting for was scheduled this today

A

Page 136: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SHIFT

meeting for was scheduled this today

A

Page 137: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SWAP

meeting for was this scheduled today

A

Page 138: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SWAP

meeting for this was scheduled today

A

Page 139: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

ARC-RIGHT

meeting for was scheduled today

A this

Page 140: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

ARC-RIGHT

meeting was scheduled today

A for

this

Page 141: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SHIFT

meeting was scheduled today

A for

this

Page 142: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

ARC-LEFT

was scheduled today

A for

this

meeting

Page 143: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SHIFT

was scheduled today

A for

this

meeting

Page 144: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

SHIFT

was scheduled today

A for

this

meeting

Page 145: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

ARC-RIGHT

was scheduled

A for

this

meeting today

Page 146: Incremental Structured Prediction Using a Global Learning

A transition-based parsing process

ARC-RIGHT

was

A for

this

meeting scheduled

today

Page 147: Incremental Structured Prediction Using a Global Learning

The arc-eager parser using our framework

The arc-eager transition process

Beam-search decoding

Keeps the N best partial state items in the agenda

Uses the total score of all actions to rank state items

Avoids error propagation from early decisions

Global discriminative training

Zhang and Clark, EMNLP 2008

Page 148: Incremental Structured Prediction Using a Global Learning

A tale of two parsers

Zhang and Clark, EMNLP 2008

[Figure: the graph-based line (MST parser → Carreras 2007 → Koo and Collins 2010) improves by adding higher-order features; the transition-based line (Malt parser → Zhang and Clark 2008 → Zhang and Clark 2011, i.e. this tutorial's framework) improves by adding more features; parsers at corresponding stages achieve comparable accuracies]

Page 149: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Our parser

Decoding

He does it here

Zhang and Clark, EMNLP 2008

Page 150: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Our parser

Decoding

He does it here

[Figure: after Shift (S), the beam holds one item: stack "He", queue "does it here"]

Zhang and Clark, EMNLP 2008

Page 151: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Our parser

Decoding

He does it here

[Figure: the beam item is expanded by all valid actions (e.g. ArcLeft, Shift); the best candidates are kept]

Zhang and Clark, EMNLP 2008

Page 152: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Our parser

Decoding

[Figure: each candidate in the beam is expanded again; the k best successors are kept]

Zhang and Clark, EMNLP 2008

Page 153: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Our parser

Decoding

[Figure: beam expansion continues; more of the sentence has been processed]

Zhang and Clark, EMNLP 2008

Page 154: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Our parser

Decoding

[Figure: beam expansion continues]

Zhang and Clark, EMNLP 2008

Page 155: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Our parser

Decoding

[Figure: beam expansion continues; the candidates are nearly complete parses]

Zhang and Clark, EMNLP 2008

Page 156: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Our parser

Decoding

He does it here

[Figure: the final beam holds complete parses of "He does it here"; the highest-scored parse is returned]

Zhang and Clark, EMNLP 2008

Page 157: Incremental Structured Prediction Using a Global Learning

The feature templates

The context

S0 – top of stack

S0h – head of S0

S0l – left modifier of S0

S0r – right modifier of S0

S0 S0h ...

S0l S0r

The stack The input

N0 N1 N2 N3 ...

N0l

N0 – head of queue

N0l – left modifier of N0

N1 – next in queue

N2 – next of N1

Zhang and Clark, EMNLP 2008

Page 158: Incremental Structured Prediction Using a Global Learning

The feature templates

The base features

from single words

S0wp; S0w; S0p; N0wp; N0w; N0p;

N1wp; N1w; N1p; N2wp; N2w; N2p;

from word pairs

S0wpN0wp; S0wpN0w; S0wN0wp; S0wpN0p;

S0pN0wp; S0wN0w; S0pN0p

N0pN1p

from three words

N0pN1pN2p; S0pN0pN1p; S0hpS0pN0p;

S0pS0lpN0p; S0pS0rpN0p; S0pN0pN0lp

Zhang and Clark, EMNLP 2008
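As an illustration of how such templates become concrete features, a sketch that instantiates a handful of them as strings. The (word, POS) pairs for S0 and N0 and the string encoding are assumptions of this sketch; in practice every template is conjoined with the candidate action.

```python
# Sketch: instantiate a few of the base templates as feature strings.
# s0 and n0 are assumed (word, pos) pairs taken from the current state.

def base_features(s0, n0, action):
    s0w, s0p = s0
    n0w, n0p = n0
    templates = {
        "S0wp": s0w + "/" + s0p,
        "S0w": s0w,
        "N0p": n0p,
        "S0wpN0wp": s0w + "/" + s0p + "_" + n0w + "/" + n0p,
        "S0pN0p": s0p + "_" + n0p,
    }
    # Conjoin each instantiated template with the candidate action.
    return [name + "=" + value + "=>" + action for name, value in templates.items()]

print(base_features(("does", "VBZ"), ("it", "PRP"), "ArcRight"))
```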

Page 159: Incremental Structured Prediction Using a Global Learning

The feature templates

The extended features

Distance

Standard in MSTParser (McDonald et al., 2005)

Used in easy-first (Goldberg and Elhadad, 2010)

When used in transition-based parsing, combined with action (this paper)

distance

S0wd; S0pd; N0wd; N0pd;

S0wN0wd; S0pN0pd;

Zhang and Clark, ACL 2011

Page 160: Incremental Structured Prediction Using a Global Learning

The feature templates

The extended features

Valency

Number of modifiers

Graph-based submodel of Zhang and Clark (2008)

The models of Martins et al. (2009)

The models of Sagae and Tsujii (2007)

valency

S0wvr; S0pvr; S0wvl; S0pvl; N0wvl; N0pvl;

Zhang and Clark, ACL 2011

Page 161: Incremental Structured Prediction Using a Global Learning

The feature templates

The extended features

Extended unigrams

S0h, S0l, S0r and N0l have been applied to transition-based parsers via POS-combination

We add their unigram word, POS and label information (this paper)

unigrams

S0hw; S0hp; S0l; S0lw; S0lp; S0ll;

S0rw; S0rp; S0rl;N0lw; N0lp; N0ll;

Zhang and Clark, ACL 2011

Page 162: Incremental Structured Prediction Using a Global Learning

The feature templates

The extended features

Third order

Graph-based dependency parsers (Carreras, 2007; Koo and Collins, 2010)

third-order

S0h2w; S0h2p; S0hl; S0l2w; S0l2p; S0l2l;

S0r2w; S0r2p; S0r2l; N0l2w; N0l2p; N0l2l;

S0pS0lpS0l2p; S0pS0rpS0r2p;

S0pS0hpS0h2p; N0pN0lpN0l2p;

Zhang and Clark, ACL 2011

Page 163: Incremental Structured Prediction Using a Global Learning

The feature templates

The extended features

Set of labels

More global feature

Has not been applied to transition-based parsing

label set 1

S0wsr; S0psr; S0wsl; S0psl; N0wsl; N0psl;

Zhang and Clark, ACL 2011

Page 164: Incremental Structured Prediction Using a Global Learning

Experiments

Chinese Data (CTB5)

English Data (Penn Treebank)

Zhang and Clark, ACL 2011

Page 165: Incremental Structured Prediction Using a Global Learning

Results

[Table: dependency accuracies for Chinese (Li et al. 2012, Jun et al. 2011, H&S10, this method) and English (Li et al. 2012, MSTParser, K08 standard, K&C10 model, H&S10, this method)]

Zhang and Clark, ACL 2011

Page 166: Incremental Structured Prediction Using a Global Learning

Applications

Word segmentation

Dependency parsing

Context free grammar parsing

Combinatory categorial grammar parsing

Joint segmentation and POS-tagging

Joint POS-tagging and dependency parsing

Joint segmentation, POS-tagging and constituent parsing

Joint segmentation, POS-tagging and dependency parsing

Page 167: Incremental Structured Prediction Using a Global Learning

We use Wang et al. (2006)'s shift-reduce transition-based process

A state item = a pair <stack, queue>

Stack: holds the partial parse trees already built

Queue: holds the incoming words with POS

Actions

SHIFT, REDUCE-BINARY-L/R, REDUCE-UNARY

Corresponds to arc-standard

The shift-reduce parsing process

Wang et al., ACL 2011
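A sketch of these actions over nested-tuple trees, replaying the 布朗 访问 上海 example from the following slides; head marking for the L/R variants and the TERMINATE action are omitted for brevity.

```python
# Sketch of the shift-reduce constituent actions. Trees are nested tuples:
# a leaf is (pos, word); an internal node is (label, children...).

def c_start(tagged_words):
    return ([], list(tagged_words))        # (stack, queue)

def c_shift(state):
    stack, queue = state
    return (stack + [queue[0]], queue[1:])

def reduce_unary(state, label):
    stack, queue = state
    return (stack[:-1] + [(label, stack[-1])], queue)

def reduce_binary(state, label):
    # The L/R variant only marks which child is the head; structurally both
    # combine the top two nodes, so the head marking is omitted in this sketch.
    stack, queue = state
    return (stack[:-2] + [(label, stack[-2], stack[-1])], queue)

# 布朗 访问 上海 (Brown visits Shanghai), following the example slides:
state = c_start([("NR", "布朗"), ("VV", "访问"), ("NR", "上海")])
state = c_shift(state)
state = reduce_unary(state, "NP")
state = c_shift(state)
state = c_shift(state)
state = reduce_unary(state, "NP")
state = reduce_binary(state, "VP")      # REDUCE-BINARY-L-VP
state = reduce_binary(state, "IP")      # REDUCE-BINARY-R-IP
print(state[0][0])
# ('IP', ('NP', ('NR', '布朗')), ('VP', ('VV', '访问'), ('NP', ('NR', '上海'))))
```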

Page 168: Incremental Structured Prediction Using a Global Learning

Actions

SHIFT

The shift-reduce parsing process

stack queue

NR布朗 VV访问 NR上海

Zhang and Clark, IWPT 2009

布朗(Brown) 访问(visits) 上海(Shanghai)

Page 169: Incremental Structured Prediction Using a Global Learning

Actions

SHIFT

The shift-reduce parsing process

stack queue

NR布朗 VV访问 NR上海

Zhang and Clark, IWPT 2009

布朗(Brown) 访问(visits) 上海(Shanghai)

Page 170: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-UNARY-X

The shift-reduce parsing process

stack queue

NR布朗 VV访问 NR上海

Zhang and Clark, IWPT 2009

布朗(Brown) 访问(visits) 上海(Shanghai)

Page 171: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-UNARY-X

The shift-reduce parsing process

stack queue

VV访问 NR上海

NR布朗

Zhang and Clark, IWPT 2009

布朗(Brown) 访问(visits) 上海(Shanghai)

Page 172: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-UNARY-X

The shift-reduce parsing process

stack queue

X VV访问 NR上海

NR布朗

Zhang and Clark, IWPT 2009

布朗(Brown) 访问(visits) 上海(Shanghai)

Page 173: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-BINARY-{L/R}-X

The shift-reduce parsing process

stack queue

NP VV访问

NR布朗

NP

NR上海

Zhang and Clark, IWPT 2009

布朗(Brown) 访问(visits) 上海(Shanghai)

Page 174: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-BINARY-{L/R}-X

The shift-reduce parsing process

stack queue

NP

VV访问 NR布朗 NP

NR上海

Zhang and Clark, IWPT 2009

Page 175: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-BINARY-{L/R}-X

The shift-reduce parsing process

stack queue

NP

VV访问 NR布朗 NP

NR上海

VP

Zhang and Clark, IWPT 2009

Page 176: Incremental Structured Prediction Using a Global Learning

Actions

TERMINATE

The shift-reduce parsing process

stack queue

S

Zhang and Clark, IWPT 2009

Page 177: Incremental Structured Prediction Using a Global Learning

Actions

TERMINATE

The shift-reduce parsing process

stack queue

S

ans

Zhang and Clark, IWPT 2009

Page 178: Incremental Structured Prediction Using a Global Learning

Example

SHIFT

The shift-reduce parsing process

stack queue

NR布朗 VV访问 NR上海

Zhang and Clark, IWPT 2009

Page 179: Incremental Structured Prediction Using a Global Learning

Example

REDUCE-UNARY-NP

The shift-reduce parsing process

stack queue

NR布朗 VV访问 NR上海

Zhang and Clark, IWPT 2009

Page 180: Incremental Structured Prediction Using a Global Learning

Example

SHIFT

The shift-reduce parsing process

NP VV访问 NR上海

NR布朗

stack queue

Zhang and Clark, IWPT 2009

Page 181: Incremental Structured Prediction Using a Global Learning

Example

SHIFT

The shift-reduce parsing process

NP VV访问 NR上海

NR布朗

stack queue

Zhang and Clark, IWPT 2009

Page 182: Incremental Structured Prediction Using a Global Learning

Example

REDUCE-UNARY-NP

The shift-reduce parsing process

NP VV访问 NR上海

NR布朗

stack queue

Zhang and Clark, IWPT 2009

Page 183: Incremental Structured Prediction Using a Global Learning

Example

REDUCE-BINARY-L-VP

The shift-reduce parsing process

stack queue

NP VV访问

NR布朗

NP

NR上海

Zhang and Clark, IWPT 2009

Page 184: Incremental Structured Prediction Using a Global Learning

Example

REDUCE-BINARY-R-IP

The shift-reduce parsing process

stack queue

NP

VV访问 NR布朗 NP

NR上海

VP

Zhang and Clark, IWPT 2009

Page 185: Incremental Structured Prediction Using a Global Learning

Example

TERMINATE

The shift-reduce parsing process

stack queue

NP

VV访问 NR布朗 NP

NR上海

VP

IP

Zhang and Clark, IWPT 2009

Page 186: Incremental Structured Prediction Using a Global Learning

Example

The shift-reduce parsing process

stack queue

NP

VV访问 NR布朗 NP

NR上海

VP

IP

Zhang and Clark, IWPT 2009

Page 187: Incremental Structured Prediction Using a Global Learning

Grammar binarization

The shift-reduce parser requires binarized trees

Treebank trees are not binarized

Penn Treebank/CTB ↔ Parser

Binarize CTB data to make training data

Unbinarize parser output back to Treebank format

Reversible

Page 188: Incremental Structured Prediction Using a Global Learning

Grammar binarization

The binarization process

Find head

Binarize left nodes

Binarize right nodes

Y

A B C D E F

Page 189: Incremental Structured Prediction Using a Global Learning

Grammar binarization

The binarization process

Find head

Binarize left nodes

Binarize right nodes

Y

A B C (D) E F

Page 190: Incremental Structured Prediction Using a Global Learning

Grammar binarization

The binarization process

Find head

Binarize left nodes

Binarize right nodes

Y

A

B C (D) E F

Y*

Page 191: Incremental Structured Prediction Using a Global Learning

Grammar binarization

The binarization process

Find head

Binarize left nodes

Binarize right nodes

Y

A

B

C (D) E F

Y*

Y*

Page 192: Incremental Structured Prediction Using a Global Learning

Grammar binarization

The binarization process

Find head

Binarize left nodes

Binarize right nodes

Y

A

B

C

(D) E F

Y*

Y*

Y*

Page 193: Incremental Structured Prediction Using a Global Learning

Grammar binarization

The binarization process

Find head

Binarize left nodes

Binarize right nodes

Y

A

B

C

(D) E

F

Y*

Y*

Y*

Y*
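A sketch of the binarization illustrated above, producing the same bracketing as the figures. Head finding itself is assumed to be given, and head-direction annotation on the Y* nodes is omitted.

```python
# Sketch: binarize an n-ary node around its head child, introducing Y*
# intermediate nodes as in the figures above.

def binarize(label, children, head_index):
    star = label + "*"
    node = children[head_index]
    for right in children[head_index + 1:]:       # right siblings attach closest to the head
        node = (star, node, right)
    for left in reversed(children[:head_index]):  # then left siblings, head-outward
        node = (star, left, node)
    # The topmost intermediate node takes the original label.
    return (label,) + node[1:] if len(children) > 1 else (label, node)

print(binarize("Y", ["A", "B", "C", "D", "E", "F"], 3))
# ('Y', 'A', ('Y*', 'B', ('Y*', 'C', ('Y*', ('Y*', 'D', 'E'), 'F'))))
```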

Page 194: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Deterministic parsing: B=1

The statistical parser

Initial item stack: empty queue: input

Zhang and Clark, IWPT 2009

Page 195: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Deterministic parsing: B=1

The statistical parser

Initial item stack: empty queue: input

SHIFT state item 1

Zhang and Clark, IWPT 2009

Page 196: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Deterministic parsing: B=1

The statistical parser

Initial item stack: empty queue: input

SHIFT state item 1

SHIFT

REDUCE-UNARY-X

state item 2

state item 3

state item 4

...

state item N

different label {

Zhang and Clark, IWPT 2009

Page 197: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Deterministic parsing: B=1

The statistical parser

Initial item stack: empty queue: input

SHIFT state item 1

SHIFT

REDUCE-UNARY-X

state item 2

state item 3

state item 4

...

state item N

different label {

Zhang and Clark, IWPT 2009

Page 198: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Deterministic parsing: B=1

The statistical parser

Initial item stack: empty queue: input

SHIFT state item 1

SHIFT

REDUCE-UNARY-X

state item 2

state item 3

state item 4

...

state item N

different label {

SHIFT

REDUCE-UNARY-X

REDUCE-BINARY-{L/R}-X

Zhang and Clark, IWPT 2009

Page 199: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Deterministic parsing: B=1

The statistical parser

Initial item stack: empty queue: input

SHIFT state item 1

SHIFT

REDUCE-UNARY-X

state item 2

state item 3

state item 4

...

state item N

different label {

SHIFT

REDUCE-UNARY-X

REDUCE-BINARY-{L/R}-X

Zhang and Clark, IWPT 2009

Page 200: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Deterministic parsing: B=1

Beam-search: B>1

The statistical parser

Initial state item

Zhang and Clark, IWPT 2009

Page 201: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Deterministic parsing: B=1

Beam-search: B>1

The statistical parser

Initial state item

state item 1

SHIFT

Zhang and Clark, IWPT 2009

Page 202: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Deterministic parsing: B=1

Beam-search: B>1

The statistical parser

Initial state item

state item 1

SHIFT state item 1

state item 2

state item 3 ...

state item N

Zhang and Clark, IWPT 2009

Page 203: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Deterministic parsing: B=1

Beam-search: B>1

The statistical parser

Initial state item

state item 1

SHIFT state item 1

state item 2

state item 3 ...

state item N

state item 121

state item 234

state item 165 ...

state item 230

discarded

Zhang and Clark, IWPT 2009

Page 204: Incremental Structured Prediction Using a Global Learning

Beam-search decoding

Deterministic parsing: B=1

Beam-search: B>1

The statistical parser

Initial state item

state item 1

SHIFT state item 1

state item 2

state item 3 ...

state item N

state item 121

state item 234

state item 165 ...

state item 230

discarded

Zhang and Clark, IWPT 2009

Page 205: Incremental Structured Prediction Using a Global Learning

Features

Extracted from top nodes on the stack S0, S1, S2, S3, the left and right or single child of S0 and S1, and the first words on the queue N0, N1, N2, N3.

The statistical parser

[Diagram: stack … S1 S0 (with children S0l, S0r; S1u is the single child of S1); queue N0 …]

Zhang and Clark, IWPT 2009

Page 206: Incremental Structured Prediction Using a Global Learning

Features

Manually combine word and constituent information

Unigrams

The statistical parser

Zhang and Clark, IWPT 2009

Page 207: Incremental Structured Prediction Using a Global Learning

Features

Manually combine word and constituent information

Bigrams

The statistical parser

Zhang and Clark, IWPT 2009

Page 208: Incremental Structured Prediction Using a Global Learning

Features

Manually combine word and constituent information

Trigrams

The statistical parser

Zhang and Clark, IWPT 2009

Page 209: Incremental Structured Prediction Using a Global Learning

An improvement

Unlike dependency parsing, different parse trees of the same input can require different numbers of actions

The IDLE action

Aligns the numbers of actions across different output trees

The statistical parser

Zhu et al., ACL 2013

Page 210: Incremental Structured Prediction Using a Global Learning

LEFT: REDUCE-BINARY-R(NP), IDLE

RIGHT: REDUCE-UNARY(NP), REDUCE-BINARY-L(VP)

The statistical parser

Zhu et al., ACL 2013

Page 211: Incremental Structured Prediction Using a Global Learning

English PTB

Chinese CTB 5.1

Standard evaluation of bracketed P, R and F

Experiments

Zhu et al., ACL 2013

Page 212: Incremental Structured Prediction Using a Global Learning

English results on PTB

Experiments

LR LP F1 #Sent/Second

Ratnaparkhi (1997) 86.3 87.5 86.9 Unk

Collins (1999) 88.1 88.3 88.2 3.5

Charniak (2000) 89.5 89.9 89.5 5.7

Sagae & Lavie (2005) 86.1 86.0 86.0 3.7

Sagae & Lavie (2006) 87.8 88.1 87.9 2.2

Petrov & Klein (2007) 90.1 90.2 90.1 6.2

Carreras et al. (2008) 90.7 91.4 91.1 Unk

This implementation 90.2 90.7 90.4 89.5

Zhu et al., ACL 2013

Page 213: Incremental Structured Prediction Using a Global Learning

Chinese results on CTB 5.1

Experiments

LR LP F1

Charniak (2000) 79.6 82.1 80.8

Bikel (2004) 79.3 82.0 80.6

Petrov & Klein (2007) 81.9 84.8 83.3

This implementation 82.1 84.3 83.2

Zhu et al., ACL 2013

Page 214: Incremental Structured Prediction Using a Global Learning

Applications

Word segmentation

Dependency parsing

Context free grammar parsing

Combinatory categorial grammar parsing

Joint segmentation and POS-tagging

Joint POS-tagging and dependency parsing

Joint segmentation, POS-tagging and constituent parsing

Joint segmentation, POS-tagging and dependency parsing

Page 215: Incremental Structured Prediction Using a Global Learning

Introduction to CCG parsing

Lexical categories

basic categories: N (nouns), NP (noun phrases), PP (prepositional phrases), ...

complex categories: S\NP (intransitive verbs), (S\NP)/NP (transitive verbs), ...

Adjacent phrases are combined to form larger phrases using category combination e.g.:

function application: NP S\NP ⇒ S

function composition: (S\NP)/(S\NP) (S\NP)/NP ⇒ (S\NP)/NP

Unary rules change the type of a phrase

Type raising: NP ⇒ S/(S\NP)

Type changing: S[pss]\NP ⇒ NP\NP

Zhang and Clark, ACL 2011

Page 216: Incremental Structured Prediction Using a Global Learning

Introduction to CCG parsing

An example derivation IBM bought Lotus

Zhang and Clark, ACL 2011

Page 217: Incremental Structured Prediction Using a Global Learning

Introduction to CCG parsing

An example derivation IBM bought Lotus NP (S[dcl]\NP)/NP NP

Zhang and Clark, ACL 2011

Page 218: Incremental Structured Prediction Using a Global Learning

Introduction to CCG parsing

An example derivation IBM bought Lotus NP (S[dcl]\NP)/NP NP S[dcl]\NP

Zhang and Clark, ACL 2011

Page 219: Incremental Structured Prediction Using a Global Learning

Introduction to CCG parsing

An example derivation IBM bought Lotus NP (S[dcl]\NP)/NP NP S[dcl]\NP S[dcl]

Zhang and Clark, ACL 2011

Page 220: Incremental Structured Prediction Using a Global Learning

Introduction to CCG parsing

Rule extraction

Manually define the lexicon and combinatory rule schemas (Steedman, 2000; Clark and Curran, 2007)

Extracting rule instances from corpus (Hockenmaier, 2003; Fowler and Penn, 2010)

Zhang and Clark, ACL 2011

Page 221: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

State

A stack of partial derivations

A queue of input words

A set of shift-reduce actions

SHIFT

COMBINE

UNARY

FINISH

[Diagram: stack … S2(w2) S1(w1); queue Q1 Q2 …]

Zhang and Clark, ACL 2011

Page 222: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

Shift-reduce actions

SHIFT-X

Pushes the head of the queue onto the stack

Assigns label X (a lexical category)

SHIFT action performs lexical category disambiguation

[Diagram: before SHIFT – stack … S2(w2) S1(w1), queue Q1 Q2 …; after SHIFT – stack … S2(w2) S1(w1) X(Q1), queue Q2 …]

Zhang and Clark, ACL 2011

Page 223: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

Shift-reduce actions

COMBINE-X

Pops the top two nodes off the stack

Combines into a new node X, and push it onto stack

Corresponds to the use of a combinatory rule in CCG

[Diagram: before COMBINE – stack … S2(w2) S1(w1), queue Q1 Q2 …; after COMBINE – the top two nodes are replaced by the new node X, queue unchanged]

Zhang and Clark, ACL 2011

Page 224: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

Shift-reduce actions

UNARY-X

Pops the top of the stack

Create a new node with category X; pushes it onto stack

Corresponds to the use of a unary rule in CCG

[Diagram: before UNARY – stack … S2(w2) S1(w1), queue Q1 Q2 …; after UNARY – stack … S2(w2) X(w1), queue unchanged]

Zhang and Clark, ACL 2011

Page 225: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

Shift-reduce actions

FINISH

Terminates the parsing process

Can be applied when all input words have been pushed onto the stack

Allows fragmentary analysis:

when the stack holds multiple items that cannot be combined

such cases can arise from incorrect lexical category assignment

Zhang and Clark, ACL 2011
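A toy sketch of these actions over category strings: COMBINE is restricted to forward and backward function application with atomic arguments, and FINISH is left implicit, so this is far from the full parser; it merely replays the IBM bought Lotus derivation from the following slides.

```python
# Toy sketch of the CCG shift-reduce actions over category strings only.
# COMBINE here covers just forward/backward function application with an
# atomic argument; the real parser also uses composition, type-raising, etc.

def apply_categories(left, right):
    """X/Y Y => X (forward) and Y X\\Y => X (backward), atomic Y only."""
    if left.endswith("/" + right):
        result = left[: -(len(right) + 1)]
    elif right.endswith("\\" + left):
        result = right[: -(len(left) + 1)]
    else:
        return None
    return result[1:-1] if result.startswith("(") and result.endswith(")") else result

def shift_x(state, category):    # SHIFT-X: push the next word with lexical category X
    stack, queue = state
    return (stack + [category], queue[1:])

def combine(state):              # COMBINE: combine the top two categories
    stack, queue = state
    return (stack[:-2] + [apply_categories(stack[-2], stack[-1])], queue)

def unary(state, category):      # UNARY-X: retype the top of the stack
    stack, queue = state
    return (stack[:-1] + [category], queue)

# IBM bought Lotus, as in the derivation on the following slides:
state = ([], ["IBM", "bought", "Lotus"])
state = shift_x(state, "NP")
state = shift_x(state, "(S[dcl]\\NP)/NP")
state = shift_x(state, "NP")
state = combine(state)           # bought Lotus    => S[dcl]\NP
state = combine(state)           # IBM  S[dcl]\NP  => S[dcl]
print(state[0])                  # ['S[dcl]']
```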

Page 226: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

An example parsing process

IBM bought Lotus yesterday

initial

Zhang and Clark, ACL 2011

Page 227: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

An example parsing process

bought Lotus yesterday NPIBM

SHIFT

Zhang and Clark, ACL 2011

Page 228: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

An example parsing process

Lotus yesterday NPIBM ((S[dcl]\NP)/NP)bought

SHIFT

Zhang and Clark, ACL 2011

Page 229: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

An example parsing process

Stack: NP (IBM), (S[dcl]\NP)/NP (bought), NP (Lotus); Queue: yesterday

SHIFT

Zhang and Clark, ACL 2011

Page 230: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

An example parsing process

Stack: NP (IBM), S[dcl]\NP (bought), formed by combining (S[dcl]\NP)/NP (bought) with NP (Lotus); Queue: yesterday

COMBINE

Zhang and Clark, ACL 2011

Page 231: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

An example parsing process

Stack: NP (IBM), S[dcl]\NP (bought) [over (S[dcl]\NP)/NP (bought) + NP (Lotus)], (S\NP)\(S\NP) (yesterday); Queue: (empty)

SHIFT

Zhang and Clark, ACL 2011

Page 232: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

An example parsing process

Stack: NP (IBM), S[dcl]\NP (bought), formed by combining S[dcl]\NP (bought) with (S\NP)\(S\NP) (yesterday); Queue: (empty)

COMBINE

Zhang and Clark, ACL 2011

Page 233: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

An example parsing process

Stack: S[dcl] (bought), formed by combining NP (IBM) with S[dcl]\NP (bought) and covering the whole derivation; Queue: (empty)

COMBINE

Zhang and Clark, ACL 2011

Page 234: Incremental Structured Prediction Using a Global Learning

The shift-reduce parser

An example parsing process

Stack: S[dcl] (bought), covering the whole derivation; Queue: (empty)

FINISH

Zhang and Clark, ACL 2011

Page 235: Incremental Structured Prediction Using a Global Learning

Features

Beam-search decoding

context

Stack nodes: S0 S1 S2 S3

Queue nodes: Q0 Q1 Q2 Q3

Stack subnodes: S0L S0R S0U S1L/R/U

[Diagram: stack ... S3 S2 S1 S0, with subnodes such as S1U, S0L, S0R; queue Q0 Q1 Q2 Q3 ...]

S0wp, S0c, S0pc, S0wc, S1wp, S1c, S1pc, S1wc, S2pc, S2wc, S3pc, S3wc,

Q0wp, Q1wp, Q2wp, Q3wp,

S0Lpc, S0Lwc, S0Rpc, S0Rwc, S0Upc, S0Uwc, S1Lpc, S1Lwc, S1Rpc, S1Rwc, S1Upc, S1Uwc,

S0wcS1wc, S0cS1w, S0wS1c, S0cS1c, S0wcQ0wp, S0cQ0wp, S0wcQ0p, S0cQ0p, S1wcQ0wp, S1cQ0wp, S1wcQ0p, S1cQ0p,

S0wcS1cQ0p, S0cS1wcQ0p, S0cS1cQ0wp, S0cS1cQ0p, S0pS1pQ0p, S0wcQ0pQ1p, S0cQ0wpQ1p, S0cQ0pQ1wp, S0cQ0pQ1p, S0pQ0pQ1p, S0wcS1cS2c, S0cS1wcS2c, S0cS1cS2wc, S0cS1cS2c, S0pS1pS2p,

S0cS0HcS0Lc, S0cS0HcS0Rc, S1cS1HcS1Rc, S0cS0RcQ0p, S0cS0RcQ0w, S0cS0LcS1c, S0cS0LcS1w, S0cS1cS1Rc, S0wS1cS1Rc.

Zhang and Clark, ACL 2011
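A schematic beam-search decoder over action sequences (a sketch, not ZPar's code); the state methods is_final(), get_actions() and apply(), and the scorer weights.score(), are assumed helpers, and the feature templates listed above would be computed inside the scorer from the stack/queue context.

def beam_search(initial_state, weights, beam_size):
    beam = [(0.0, initial_state)]
    while not all(state.is_final() for _, state in beam):
        candidates = []
        for score, state in beam:
            if state.is_final():
                candidates.append((score, state))      # carry finished items over
                continue
            for action in state.get_actions():
                candidates.append((score + weights.score(state, action),
                                   state.apply(action)))
        # keep only the beam_size highest-scored items
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
    return max(beam, key=lambda c: c[0])[1]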

Page 236: Incremental Structured Prediction Using a Global Learning

Experimental data

CCGBank (Hockenmaier and Steedman, 2007)

Split into three subsets:

Training (section 02 – 21)

Development (section 00)

Testing (section 23)

Extract CCG rules

Binary instances: 3070

Unary instances: 191

Evaluation F-score over CCG dependencies

Use C&C tools for transformation

Zhang and Clark, ACL 2011

Page 237: Incremental Structured Prediction Using a Global Learning

Test results

F&P = Fowler and Penn (2010)

LP LR LF lsent. cats. evaluated

shift-reduce 87.43 83.61 85.48 35.19 93.12 all sentences

C&C (normal-form) 85.58 82.85 84.20 32.90 92.84 all sentences

shift-reduce 87.43 83.71 85.53 35.34 93.15 99.58% (C&C coverage)

C&C (hybrid) 86.17 84.74 85.45 32.92 92.98 99.58% (C&C coverage)

C&C (normal-form) 85.48 84.60 85.04 33.08 92.86 99.58% (C&C coverage)

F&P (Petrov I-5)* 86.29 85.73 86.01 -- -- -- (F&P ∩ C&C coverage; 96.65% on dev. test)

C&C hybrid* 86.46 85.11 85.78 -- -- -- (F&P ∩ C&C coverage; 96.65% on dev. test)

Zhang and Clark, ACL 2011

Page 238: Incremental Structured Prediction Using a Global Learning

Error Comparisons

As sentence length increases, both parsers give lower performance, with no difference in the rate of accuracy degradation.

The parsers are also compared as dependency length increases.

Zhang and Clark, ACL 2011

Page 239: Incremental Structured Prediction Using a Global Learning

Applications

Word segmentation

Dependency parsing

Context free grammar parsing

Combinatory categorial grammar parsing

Joint segmentation and POS-tagging

Joint POS-tagging and dependency parsing

Joint segmentation, POS-tagging and constituent parsing

Joint segmentation, POS-tagging and dependency parsing

Page 240: Incremental Structured Prediction Using a Global Learning

Introduction of Chinese POS-tagging

Word segmentation is a necessary step before POS-tagging.
Input: 我喜欢读书 (I like reading books)
Segment: 我 喜欢 读 书 (I like reading books)
Tag: 我/PN 喜欢/V 读/V 书/N (I/PN like/V reading/V books/N)

The traditional approach treats word segmentation and POS-tagging as two separate steps

Page 241: Incremental Structured Prediction Using a Global Learning

Two observations

Segmentation errors propagate to the POS-tagging step.
Input: 我喜欢读书 (I like reading books)
Segment (incorrect): 我喜 欢 读 书 ("Ili ke reading books")
Tag: 我喜/N 欢/V 读/V 书/N (Ili/N ke/V reading/V books/N)

Information about POS helps to improve segmentation:
一/CD (1) 个/M (measure word) 人/N (person) vs. 一/CD (1) 个人/JJ (personal)
二百三十三/CD (233) vs. 二/CD (2) 百/CD (hundred) 三/CD (3) 十/CD (ten) 三/CD (3)

Page 242: Incremental Structured Prediction Using a Global Learning

Joint segmentation and tagging

The observations lead to the solution of joint segmentation and POS-tagging.
Input: 我喜欢读书 (I like reading books)
Output: 我/PN 喜欢/V 读/V 书/N (I/PN like/V reading/V books/N)

Consider segmentation and POS information simultaneously

The most appropriate output is chosen from all possible segmented and tagged outputs

Page 243: Incremental Structured Prediction Using a Global Learning

The transition system

State

Partial segmented results

Unprocessed characters

Two actions

Separate (t) : t is a POS tag

Append

Zhang and Clark, EMNLP 2010
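A minimal sketch of the two actions on a (segmented, buffer) state, replaying the example from the following slides; the list-based representation is an illustrative assumption.

def separate(segmented, buffer, tag):
    # Separate(t): start a new word with the next character and POS tag t
    segmented.append([buffer.pop(0), tag])

def append(segmented, buffer):
    # Append: attach the next character to the word currently being built
    segmented[-1][0] += buffer.pop(0)

# Replaying 我喜欢读书 -> 我/PN 喜欢/V 读/V 书/N
segmented, buffer = [], list("我喜欢读书")
separate(segmented, buffer, "PN")   # 我/PN
separate(segmented, buffer, "V")    # 喜/V
append(segmented, buffer)           # 喜欢/V
separate(segmented, buffer, "V")    # 读/V
separate(segmented, buffer, "N")    # 书/N
print(segmented)   # [['我', 'PN'], ['喜欢', 'V'], ['读', 'V'], ['书', 'N']]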

Page 244: Incremental Structured Prediction Using a Global Learning

The transition system

Initial state

我喜欢读书

Zhang and Clark, EMNLP 2010

Page 245: Incremental Structured Prediction Using a Global Learning

The transition system

Separate(PN)

喜欢读书 我/PN

Zhang and Clark, EMNLP 2010

Page 246: Incremental Structured Prediction Using a Global Learning

The transition system

Separate (V)

欢读书 我/PN 喜/V

Zhang and Clark, EMNLP 2010

Page 247: Incremental Structured Prediction Using a Global Learning

The transition system

Append

读书 我/PN 喜欢/V

Zhang and Clark, EMNLP 2010

Page 248: Incremental Structured Prediction Using a Global Learning

The transition system

Separate (V)

书 我/PN 喜欢/V 读/V

Zhang and Clark, EMNLP 2010

Page 249: Incremental Structured Prediction Using a Global Learning

The transition system

Separate (N)

我/PN 喜欢/V 读/V 书/N

Zhang and Clark, EMNLP 2010

Page 250: Incremental Structured Prediction Using a Global Learning

The transition system

End state

我/PN 喜欢/V 读/V 书/N

Zhang and Clark, EMNLP 2010

Page 251: Incremental Structured Prediction Using a Global Learning

Feature templates

Zhang and Clark, EMNLP 2010

Page 252: Incremental Structured Prediction Using a Global Learning

Feature templates

Zhang and Clark, EMNLP 2010

Page 253: Incremental Structured Prediction Using a Global Learning

Experiments

Penn Chinese Treebank 5 (CTB-5)

Zhang and Clark, EMNLP 2010

Page 254: Incremental Structured Prediction Using a Global Learning

Experiments

SF JF

K09 (error-driven) 97.87 93.67

This work 97.78 93.67

Zhang 2008 97.82 93.62

K09 (baseline) 97.79 93.60

J08a 97.85 93.41

J08b 97.74 93.37

N07 97.83 93.32

SF = segmentation F-score; JF = joint segmentation and POS-tagging F-score

Accuracy comparisons between various joint segmentors and POS-taggers on CTB5

Zhang and Clark, EMNLP 2010

Page 255: Incremental Structured Prediction Using a Global Learning

Applications

Word segmentation

Dependency parsing

Context free grammar parsing

Combinatory categorial grammar parsing

Joint segmentation and POS-tagging

Joint POS-tagging and dependency parsing

Joint segmentation, POS-tagging and constituent parsing

Joint segmentation, POS-tagging and dependency parsing

Page 256: Incremental Structured Prediction Using a Global Learning

Introduction

Traditional dependency parsing

Input: a POS-tagged sentence, e.g., He/PN does/V it/PN here/RB

Output:

Accurate dependency parsing heavily relies on POS tagging information

Error propagation

Syntactic information can be helpful for POS disambiguation

He/PN does/V it/PN here/RB

Page 257: Incremental Structured Prediction Using a Global Learning

Introduction

Joint POS-tagging and dependency parsing

Input: a raw (untagged) sentence, e.g., He does it here

Output:

He/PN does/V it/PN here/RB

Page 258: Incremental Structured Prediction Using a Global Learning

The extended arc-standard transition system

Extended arc-standard dependency parsing transition

State

A stack to hold partial candidates

A queue of next incoming words

Three actions

SHIFT(t), LEFT-REDUCE, RIGHT-REDUCE, where t is a POS tag

Hatori et al. IJCNLP 2011

Page 259: Incremental Structured Prediction Using a Global Learning

Actions

SHIFT(t)

[Diagram: stack ST ST1 ... (ST on top), with ST's left/right children STLC and STRC; input N0 N1 N2 N3 ...]

The extended arc-standard transition system

Hatori et al. IJCNLP 2011

Page 260: Incremental Structured Prediction Using a Global Learning

Actions

SHIFT(t)

Pushes the next input word onto the stack with POS tag t

[Diagram, after SHIFT(t): stack N0/t ST ST1 ...; input N1 N2 N3 ...]

The extended arc-standard transition system

Hatori et al. IJCNLP 2011

Page 261: Incremental Structured Prediction Using a Global Learning

Actions

LEFT-REDUCE

[Diagram: stack ST ST1 ... (ST on top); input N0 N1 N2 N3 ...]

The extended arc-standard transition system

Hatori et al. IJCNLP 2011

Page 262: Incremental Structured Prediction Using a Global Learning

Actions

LEFT-REDUCE

Pops the second-top item (ST1) off the stack

Adds a dependency link making ST1 a left dependent of ST

[Diagram, after LEFT-REDUCE: stack ST ..., with ST1 attached under ST; input N0 N1 N2 N3 ...]

The extended arc-standard transition system

Hatori et al. IJCNLP 2011

Page 263: Incremental Structured Prediction Using a Global Learning

Actions

RIGHT-REDUCE

[Diagram: stack ST ST1 ... (ST on top); input N0 N1 N2 N3 ...]

The extended arc-standard transition system

Hatori et al. IJCNLP 2011

Page 264: Incremental Structured Prediction Using a Global Learning

Actions

RIGHT-REDUCE

Pops the top item (ST) off the stack

Adds a dependency link making ST a right dependent of ST1

[Diagram, after RIGHT-REDUCE: stack ST1 ..., with ST attached under ST1; input N0 N1 N2 N3 ...]

The extended arc-standard transition system

Hatori et al. IJCNLP 2011
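Putting the three transitions together, here is a minimal sketch of how they might manipulate a (stack, buffer) state; the Item class and its fields are illustrative assumptions, not the authors' code.

class Item:
    def __init__(self, word, tag):
        self.word, self.tag = word, tag
        self.deps = []                     # dependents attached so far

def shift(stack, buffer, tag):
    # SHIFT(t): push the next input word onto the stack with POS tag t
    stack.append(Item(buffer.pop(0), tag))

def left_reduce(stack):
    # LEFT-REDUCE: the second-top item becomes a left dependent of the top item
    stack[-1].deps.append(stack[-2])
    del stack[-2]

def right_reduce(stack):
    # RIGHT-REDUCE: the top item becomes a right dependent of the second-top item
    dep = stack.pop()
    stack[-1].deps.append(dep)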

Page 265: Incremental Structured Prediction Using a Global Learning

An example S(t) – SHIFT(t)

LR – LEFT-REDUCE

RR – RIGHT-REDUCE

He does it here

The extended arc-standard transition system

Hatori et al. IJCNLP 2011

Page 266: Incremental Structured Prediction Using a Global Learning

An example S(t) – SHIFT(t)

LR – LEFT-REDUCE

RR – RIGHT-REDUCE

Input: He does it here
S(PN) => stack: He/PN; queue: does it here

The extended arc-standard transition system

Hatori et al. IJCNLP 2011

Page 267: Incremental Structured Prediction Using a Global Learning

An example S(t) – SHIFT(t)

LR – LEFT-REDUCE

RR – RIGHT-REDUCE

Input: He does it here
S(PN) => stack: He/PN; queue: does it here
S(V) => stack: He/PN does/V; queue: it here

The extended arc-standard transition system

Hatori et al. IJCNLP 2011

Page 268: Incremental Structured Prediction Using a Global Learning

An example S(t) – SHIFT(t)

LR – LEFT-REDUCE

RR – RIGHT-REDUCE

Input: He does it here
S(PN) => stack: He/PN; queue: does it here
S(V) => stack: He/PN does/V; queue: it here
LR => stack: does/V (with left dependent He/PN); queue: it here

The extended arc-standard transition system

Hatori et al. IJCNLP 2011

Page 269: Incremental Structured Prediction Using a Global Learning

An example S(t) – SHIFT(t)

LR – LEFT-REDUCE

RR – RIGHT-REDUCE

Input: He does it here
S(PN) => stack: He/PN; queue: does it here
S(V) => stack: He/PN does/V; queue: it here
LR => stack: does/V (with left dependent He/PN); queue: it here
S(PN) => stack: does/V it/PN; queue: here

The extended arc-standard transition system

Hatori et al. IJCNLP 2011

Page 270: Incremental Structured Prediction Using a Global Learning

An example S(t) – SHIFT(t)

LR – LEFT-REDUCE

RR – RIGHT-REDUCE

Input: He does it here
S(PN) => stack: He/PN; queue: does it here
S(V) => stack: He/PN does/V; queue: it here
LR => stack: does/V (with left dependent He/PN); queue: it here
S(PN) => stack: does/V it/PN; queue: here
RR => stack: does/V (with right dependent it/PN); queue: here

The extended arc-standard transition system

Hatori et al. IJCNLP 2011

Page 271: Incremental Structured Prediction Using a Global Learning

An example S(t) – SHIFT(t)

LR – LEFT-REDUCE

RR – RIGHT-REDUCE

Input: He does it here
S(PN) => stack: He/PN; queue: does it here
S(V) => stack: He/PN does/V; queue: it here
LR => stack: does/V (with left dependent He/PN); queue: it here
S(PN) => stack: does/V it/PN; queue: here
RR => stack: does/V (with right dependent it/PN); queue: here
S(RB) => stack: does/V here/RB; queue: (empty)

The extended arc-standard transition system

Hatori et al. IJCNLP 2011

Page 272: Incremental Structured Prediction Using a Global Learning

An example S(t) – SHIFT(t)

LR – LEFT-REDUCE

RR – RIGHT-REDUCE

Input: He does it here
S(PN) => stack: He/PN; queue: does it here
S(V) => stack: He/PN does/V; queue: it here
LR => stack: does/V (with left dependent He/PN); queue: it here
S(PN) => stack: does/V it/PN; queue: here
RR => stack: does/V (with right dependent it/PN); queue: here
S(RB) => stack: does/V here/RB; queue: (empty)
RR => stack: does/V (with dependents He/PN, it/PN, here/RB); queue: (empty)

The extended arc-standard transition system

Hatori et al. IJCNLP 2011
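Replaying the action sequence above with the sketch functions from the earlier snippet (shift, left_reduce, right_reduce); this is purely illustrative.

stack, buffer = [], ["He", "does", "it", "here"]
shift(stack, buffer, "PN")    # He/PN
shift(stack, buffer, "V")     # does/V
left_reduce(stack)            # He <- does
shift(stack, buffer, "PN")    # it/PN
right_reduce(stack)           # does -> it
shift(stack, buffer, "RB")    # here/RB
right_reduce(stack)           # does -> here
print(stack[0].word, [d.word for d in stack[0].deps])   # does ['He', 'it', 'here']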

Page 273: Incremental Structured Prediction Using a Global Learning

Features

POS tag features

Hatori et al. IJCNLP 2011

Page 274: Incremental Structured Prediction Using a Global Learning

Features

Dependency parsing features

Hatori et al. IJCNLP 2011

Page 275: Incremental Structured Prediction Using a Global Learning

Features

Syntactic features

Hatori et al. IJCNLP 2011

Page 276: Incremental Structured Prediction Using a Global Learning

Experiments

CTB5 dataset

Page 277: Incremental Structured Prediction Using a Global Learning

Results

Model: LAS / UAS / POS
Li et al. (2011) (unlabeled): --- / 80.74 / 93.08
Li et al. (2012) (unlabeled): --- / 81.21 / 94.51
Li et al. (2012) (labeled): 79.01 / 81.67 / 94.60
Hatori et al. (2011) (unlabeled): --- / 81.33 / 93.94
Bohnet and Nivre (2012) (labeled): 77.91 / 81.42 / 93.24
Our implementation (unlabeled): --- / 81.20 / 94.15
Our implementation (labeled): 78.30 / 81.26 / 94.28

Page 278: Incremental Structured Prediction Using a Global Learning

Applications

Word segmentation

Dependency parsing

Context free grammar parsing

Combinatory categorial grammar parsing

Joint segmentation and POS-tagging

Joint POS-tagging and dependency parsing

Joint segmentation, POS-tagging and constituent parsing

Joint segmentation, POS-tagging and dependency parsing

Page 279: Incremental Structured Prediction Using a Global Learning

Traditional: word-based Chinese parsing

CTB-style word-based syntax tree for “中国 (China) 建筑业 (architecture industry) 呈现 (show) 新 (new) 格局 (pattern)”.

Zhang et al. ACL 2013

Page 280: Incremental Structured Prediction Using a Global Learning

This: character-based Chinese parsing

Character-level syntax tree with hierarchal word structures for “中 (middle) 国 (nation) 建 (construction) 筑 (building) 业 (industry) 呈 (present) 现 (show) 新 (new) 格 (style) 局 (situation)”.

Zhang et al. ACL 2013

Page 281: Incremental Structured Prediction Using a Global Learning

Why character-based?

Chinese words have syntactic structures.

Zhang et al. ACL 2013

Page 282: Incremental Structured Prediction Using a Global Learning

Why character-based?

Chinese words have syntactic structures.

Zhang et al. ACL 2013

Page 283: Incremental Structured Prediction Using a Global Learning

Why character-based?

Deep character information of word structures.

Zhang et al. ACL 2013

Page 284: Incremental Structured Prediction Using a Global Learning

Why character-based?

Deep character information of word structures.

Zhang et al. ACL 2013

Page 285: Incremental Structured Prediction Using a Global Learning

Why character-based?

Build syntax tree from character sequences.

Does not require segmentation or POS-tagging as input.

Benefits from the joint framework, avoiding error propagation.

Zhang et al. ACL 2013

Page 286: Incremental Structured Prediction Using a Global Learning

Word structure annotation

Binarized tree structure for each word.

Zhang et al. ACL 2013

Page 287: Incremental Structured Prediction Using a Global Learning

Word structure annotation

Binarized tree structure for each word.

b and i denote whether the character below is at the beginning of a word (b) or inside it (i).

l, r and c denote the head direction of the current node: left, right or coordination, respectively.

Zhang et al. ACL 2013

Page 288: Incremental Structured Prediction Using a Global Learning

Word structure annotation

Binarized tree structure for each word.

b and i denote whether the character below is at the beginning of a word (b) or inside it (i).

l, r and c denote the head direction of the current node: left, right or coordination, respectively.

We extend word-based phrase-structures into character-based syntax trees using the word structures demonstrated above.

Zhang et al. ACL 2013
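Purely as an illustration of the encoding (not the released annotation), one way to write down such a binarized word structure in code; the characters come from the running example, but the head directions chosen below are arbitrary placeholders.

# Leaves are (character, position) pairs with position "b" (word-beginning) or "i" (inside);
# internal nodes carry a head direction: "l" (left), "r" (right) or "c" (coordination).
word_structure = ("r",                               # head direction chosen arbitrarily
                  ("c", ("建", "b"), ("筑", "i")),    # subword 建筑
                  ("业", "i"))                        # suffix character 业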

Page 289: Incremental Structured Prediction Using a Global Learning

Word structure annotation

Annotation input: a word and its POS.

A word may have different structures depending on its POS.

Zhang et al. ACL 2013

Page 290: Incremental Structured Prediction Using a Global Learning

The character-based parsing model

A transition-based parser

Zhang et al. ACL 2013

Page 291: Incremental Structured Prediction Using a Global Learning

The character-based parsing model

A transition-based parser

Extended from Zhang and Clark (2009), a word-based transition parser.

Zhang et al. ACL 2013

Page 292: Incremental Structured Prediction Using a Global Learning

The character-based parsing model

A transition-based parser

Extended from Zhang and Clark (2009), a word-based transition parser.

Incorporating features of a word-based parser as well as a joint SEG&POS system.

Zhang et al. ACL 2013

Page 293: Incremental Structured Prediction Using a Global Learning

The character-based parsing model

A transition-based parser

Extended from Zhang and Clark (2009), a word-based transition parser.

Incorporating features of a word-based parser as well as a joint SEG&POS system.

Adding the deep character information from word structures.

Zhang et al. ACL 2013

Page 294: Incremental Structured Prediction Using a Global Learning

The transition system

State: a stack of partial trees and a queue of remaining characters

Actions: SHIFT-SEPARATE(t), SHIFT-APPEND, REDUCE-SUBWORD(d), REDUCE-WORD, REDUCE-BINARY(d;l), REDUCE-UNARY(l), TERMINATE

Zhang et al. ACL 2013
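As a reading aid (not ZPar's implementation), the action inventory can be written as an enumeration; the comments paraphrase each action's role as described on the surrounding slides, and details left to the figures are assumptions.

from enum import Enum

class Action(Enum):
    SHIFT_SEPARATE = "SHIFT-SEPARATE(t)"   # shift a character, starting a new word with POS tag t
    SHIFT_APPEND   = "SHIFT-APPEND"        # shift a character, appending it to the current word
    REDUCE_SUBWORD = "REDUCE-SUBWORD(d)"   # combine the top two subword nodes, head direction d
    REDUCE_WORD    = "REDUCE-WORD"         # finish the current word node
    REDUCE_BINARY  = "REDUCE-BINARY(d;l)"  # binary phrase reduction with head direction d and label l
    REDUCE_UNARY   = "REDUCE-UNARY(l)"     # unary phrase reduction with label l
    TERMINATE      = "TERMINATE"           # finish parsing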

Page 295: Incremental Structured Prediction Using a Global Learning

Actions

SHIFT-SEPARATE(t)

Zhang et al. ACL 2013

Page 296: Incremental Structured Prediction Using a Global Learning

Actions

SHIFT-SEPARATE(t)

Zhang et al. ACL 2013

Page 297: Incremental Structured Prediction Using a Global Learning

Actions

SHIFT-APPEND

Zhang et al. ACL 2013

Page 298: Incremental Structured Prediction Using a Global Learning

Actions

SHIFT-APPEND

Zhang et al. ACL 2013

Page 299: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-SUBWORD(d)

Zhang et al. ACL 2013

Page 300: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-SUBWORD(d)

Zhang et al. ACL 2013

Page 301: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-WORD

Zhang et al. ACL 2013

Page 302: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-WORD

Zhang et al. ACL 2013

Page 303: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-BINARY(d; l)

Zhang et al. ACL 2013

Page 304: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-BINARY(d; l)

Zhang et al. ACL 2013

Page 305: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-UNARY(l)

Zhang et al. ACL 2013

Page 306: Incremental Structured Prediction Using a Global Learning

Actions

REDUCE-UNARY(l)

Zhang et al. ACL 2013

Page 307: Incremental Structured Prediction Using a Global Learning

Actions

TERMINATE

Zhang et al. ACL 2013

Page 308: Incremental Structured Prediction Using a Global Learning

Features

From word-based parser (Zhang and Clark, 2009)

From joint SEG&POS-Tagging (Zhang and Clark, 2010)

Zhang et al. ACL 2013

Page 309: Incremental Structured Prediction Using a Global Learning

Features

From word-based parser (Zhang and Clark, 2009)

From joint SEG&POS-Tagging (Zhang and Clark, 2010)

baseline features

Zhang et al. ACL 2013

Page 310: Incremental Structured Prediction Using a Global Learning

Features

From word-based parser (Zhang and Clark, 2009)

From joint SEG&POS-Tagging (Zhang and Clark, 2010)

baseline features

Deep character features

Zhang et al. ACL 2013

Page 311: Incremental Structured Prediction Using a Global Learning

Features

From word-based parser (Zhang and Clark, 2009)

From joint SEG&POS-Tagging (Zhang and Clark, 2010)

baseline features

Deep character features

new features

Zhang et al. ACL 2013

Page 312: Incremental Structured Prediction Using a Global Learning

Features

Zhang et al. ACL 2013

Page 313: Incremental Structured Prediction Using a Global Learning

Features

Zhang et al. ACL 2013

Page 314: Incremental Structured Prediction Using a Global Learning

Experiments

Penn Chinese Treebank 5 (CTB-5)

Zhang et al. ACL 2013

Page 315: Incremental Structured Prediction Using a Global Learning

Experiments

Baseline models

Pipeline model including:

Joint SEG&POS-Tagging model (Zhang and Clark, 2010).

Word-based CFG parsing model (Zhang and Clark, 2009).

Zhang et al. ACL 2013

Page 316: Incremental Structured Prediction Using a Global Learning

Experiments

Our proposed models

Joint model with flat word structures

Joint model with annotated word structures

Zhang et al. ACL 2013

Page 317: Incremental Structured Prediction Using a Global Learning

Results

Task (P / R / F):
Pipeline: Seg 97.35 / 98.02 / 97.69; Tag 93.51 / 94.15 / 93.83; Parse 81.58 / 82.95 / 82.26
Flat word structures: Seg 97.32 / 98.13 / 97.73; Tag 94.09 / 94.88 / 94.48; Parse 83.39 / 83.84 / 83.61
Annotated word structures: Seg 97.49 / 98.18 / 97.84; Tag 94.46 / 95.14 / 94.80; Parse 84.42 / 84.43 / 84.43; WS 94.02 / 94.69 / 94.35

Zhang et al. ACL 2013

Page 318: Incremental Structured Prediction Using a Global Learning

Compare with other systems

Task Seg Tag Parse

Kruengkrai+ ’09 97.87 93.67 –

Sun ’11 98.17 94.02 –

Wang+ ’11 98.11 94.18 –

Li ’11 97.3 93.5 79.7

Li+ ’12 97.50 93.31 –

Hatori+ ’12 98.26 94.64 –

Qian+ ’12 97.96 93.81 82.85

Ours pipeline 97.69 93.83 82.26

Ours joint flat 97.73 94.48 83.61

Ours joint annotated 97.84 94.80 84.43

Zhang et al. ACL 2013

Page 319: Incremental Structured Prediction Using a Global Learning

Applications

Word segmentation

Dependency parsing

Context free grammar parsing

Combinatory categorial grammar parsing

Joint segmentation and POS-tagging

Joint POS-tagging and dependency parsing

Joint segmentation, POS-tagging and constituent parsing

Joint segmentation, POS-tagging and dependency parsing

Page 320: Incremental Structured Prediction Using a Global Learning

Traditional word-based dependency parsing

Inter-word dependencies

Zhang et al. ACL 2014

Page 321: Incremental Structured Prediction Using a Global Learning

Character-level dependency parsing

Inter- and intra-word dependencies

Zhang et al. ACL 2014

Page 322: Incremental Structured Prediction Using a Global Learning

Main method

An overview

Transition-based framework with global learning and beam search (Zhang and Clark, 2011)

Extensions from word-level transition-based dependency parsing models

Arc-standard (Nivre, 2008; Huang et al., 2009)

Arc-eager (Nivre, 2008; Zhang and Clark, 2008)

Zhang et al. ACL 2014

Page 323: Incremental Structured Prediction Using a Global Learning

Main method

Word-level transition-based dependency parsing

Arc-standard

Zhang et al. ACL 2014

Page 324: Incremental Structured Prediction Using a Global Learning

Main method

Word-level transition-based dependency parsing

Arc-eager

Zhang et al. ACL 2014

Page 325: Incremental Structured Prediction Using a Global Learning

Main method

Word-level to character-level

Arc-standard

Zhang et al. ACL 2014

Page 326: Incremental Structured Prediction Using a Global Learning

Main method

Word-level to character-level

Arc-standard

Zhang et al. ACL 2014

Page 327: Incremental Structured Prediction Using a Global Learning

Main method

Word-level to character-level

Arc-eager

Zhang et al. ACL 2014

Page 328: Incremental Structured Prediction Using a Global Learning

Main method

Word-level to character-level

Arc-eager

Zhang et al. ACL 2014

Page 329: Incremental Structured Prediction Using a Global Learning

Main method

New features

Zhang et al. ACL 2014

Page 330: Incremental Structured Prediction Using a Global Learning

Experiments

Data

CTB5.0, CTB6.0, CTB7.0

Zhang et al. ACL 2014

Page 331: Incremental Structured Prediction Using a Global Learning

Experiments

Proposed models

STD (real, pseudo)

Joint segmentation and POS-tagging with inner dependencies

STD (pseudo, real)

Joint segmentation, POS-tagging and dependency parsing

STD (real, real)

Joint segmentation, POS-tagging and dependency parsing with inner dependencies

EAG (real, pseudo)

Joint segmentation and POS-tagging with inner dependencies

EAG (pseudo, real)

Joint segmentation, POS-tagging and dependency parsing

EAG (real, real)

Joint segmentation, POS-tagging and dependency parsing with inner dependencies

Zhang et al. ACL 2014

Page 332: Incremental Structured Prediction Using a Global Learning

Experiments

Final results

Zhang et al. ACL 2014

Page 333: Incremental Structured Prediction Using a Global Learning

Experiments

Analysis: word structure prediction

Accuracy of predicted word structures, assuming the segmentation is correct:

OOV words: STD(real,real) 67.98%, EAG(real,real) 69.01%

Overall: STD(real,real) 87.64%, EAG(real,real) 89.07%

Zhang et al. ACL 2014

Page 334: Incremental Structured Prediction Using a Global Learning

Experiments

Analysis: word structure prediction

OOV words

Zhang et al. ACL 2014

Page 335: Incremental Structured Prediction Using a Global Learning

Outline

Introduction Applications

Analysis ZPar

Page 336: Incremental Structured Prediction Using a Global Learning

Analysis

Empirical analysis

Theoretical analysis

Page 337: Incremental Structured Prediction Using a Global Learning

Analysis

Empirical analysis

Theoretical analysis

Page 338: Incremental Structured Prediction Using a Global Learning

Empirical analysis

Effective on all the tasks: beam-search + global learning + rich features

What are the effects of global learning and beam search, respectively?

Study empirically using dependency parsing

Zhang and Nivre, COLING 2012

Page 339: Incremental Structured Prediction Using a Global Learning

Empirical analysis

Learning, search, features

Arc-eager parser

Learning

Global training

Optimize the entire transition sequence for a sentence

Structured prediction

Local training

Each transition is considered in isolation

No global view of the transition sequence for a sentence

Classifier

Zhang and Nivre, COLING 2012

Page 340: Incremental Structured Prediction Using a Global Learning

Empirical analysis

Learning, search, features

Arc-eager parser

Learning

Features

Base features (local features) (Zhang and Clark, EMNLP 2008)

Features refer to combinations of atomic features (words and their POS tags) of the nodes on the stack and in the queue only.

All features (including rich non-local features) (Zhang and Nivre, ACL 2011)

Dependency distance

Valence

Grand and child features

Third-order features

Zhang and Nivre, COLING 2012

Page 341: Incremental Structured Prediction Using a Global Learning

Empirical analysis

Learning, search, features

Arc-eager parser

Learning

Features

Search

Beam = 1, greedy

Beam > 1

Zhang and Nivre, COLING 2012

Page 342: Incremental Structured Prediction Using a Global Learning

Empirical analysis

Contrast

Zhang and Nivre, COLING 2012

Page 343: Incremental Structured Prediction Using a Global Learning

Empirical analysis

Observations

Beam = 1: global learning ≈ local learning

Beam > 1: global learning improves (↑), local learning degrades (↓)

Richer features make the improvement or degradation faster.

Zhang and Nivre, COLING 2012

Page 344: Incremental Structured Prediction Using a Global Learning

Empirical analysis

Why does local learning not benefit from beam search?

Zhang and Nivre, COLING 2012

Page 345: Incremental Structured Prediction Using a Global Learning

Empirical analysis

Does greedy, local learning benefit from rich features?

Beam search (ZPar) and greedy search (MaltParser) with non-local features

Zhang and Nivre, COLING 2012

Page 346: Incremental Structured Prediction Using a Global Learning

Empirical analysis

Conclusions

Global learning and beam-search benefit each other

Global learning and beam-search accommodate richer features without overfitting

Global learning and beam-search should be used simultaneously

Zhang and Nivre, COLING 2012

Page 347: Incremental Structured Prediction Using a Global Learning

Analysis

Empirical analysis

Theoretical analysis

Page 348: Incremental Structured Prediction Using a Global Learning

Theoretical analysis

The perceptron

Online learning framework

Inputs: training examples (x_i, y_i), i = 1 ... T
Initialization: w = 0
Algorithm:
  for r = 1 ... C:
    for i = 1 ... T:
      z_i = decode(x_i, w)
      if z_i ≠ y_i:
        w = w + Φ(x_i, y_i) − Φ(x_i, z_i)
Output: w

Michael Collins, EMNLP 2002
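A compact Python sketch of the algorithm above; phi(x, y) and decode(x, w) are assumed helpers returning a sparse feature dict and the highest-scoring output, respectively. The averaged perceptron used in practice would additionally average the weight vectors over all updates.

from collections import defaultdict

def perceptron_train(data, phi, decode, iterations):
    w = defaultdict(float)
    for _ in range(iterations):
        for x, y in data:
            z = decode(x, w)
            if z != y:
                for f, v in phi(x, y).items():
                    w[f] += v                 # promote gold features
                for f, v in phi(x, z).items():
                    w[f] -= v                 # demote predicted features
    return w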

Page 349: Incremental Structured Prediction Using a Global Learning

Theoretical analysis

The perceptron

If the data {(x_t, y_t)}, t = 1 ... T, is separable and ‖Φ(x, y)‖ ≤ R for all candidates, then there exists some λ > 0 such that the number of mistakes (updates) made by the perceptron is at most R²/λ².

Michael Collins, EMNLP 2002

Suppose u (with ‖u‖ = 1) separates the data with margin λ: for every t and every candidate z, (Φ(x_t, y_t) − Φ(x_t, z)) · u ≥ λ.
At the k-th update, w^{k+1} · u = (w^k + Φ(x_t, y_t) − Φ(x_t, z_t)) · u = w^k · u + (Φ(x_t, y_t) − Φ(x_t, z_t)) · u ≥ w^k · u + λ.
Assuming w^0 = 0, after k updates w^{k+1} · u ≥ kλ.

Page 350: Incremental Structured Prediction Using a Global Learning

Theoretical analysis

The perceptron

If the data {(x_t, y_t)}, t = 1 ... T, is separable and ‖Φ(x, y)‖ ≤ R for all candidates, then there exists some λ > 0 such that the number of mistakes (updates) made by the perceptron is at most R²/λ².

Michael Collins, EMNLP 2002

Suppose u (with ‖u‖ = 1) separates the data: for every t and every candidate z, (Φ(x_t, y_t) − Φ(x_t, z)) · u ≥ λ, where λ is the margin.
At the k-th update, w^{k+1} · u = w^k · u + (Φ(x_t, y_t) − Φ(x_t, z_t)) · u ≥ w^k · u + λ.
Assuming w^0 = 0, after k updates w^{k+1} · u ≥ kλ.

Page 351: Incremental Structured Prediction Using a Global Learning

Theoretical analysis

The perceptron

If the data {(x_t, y_t)}, t = 1 ... T, is separable and ‖Φ(x, y)‖ ≤ R for all candidates, then there exists some λ > 0 such that the number of mistakes (updates) made by the perceptron is at most R²/λ².

Michael Collins, EMNLP 2002

At an update, w^{k+1} = w^k + Φ(x_t, y_t) − Φ(x_t, z_t), so
‖w^{k+1}‖² = ‖w^k‖² + 2 w^k · (Φ(x_t, y_t) − Φ(x_t, z_t)) + ‖Φ(x_t, y_t) − Φ(x_t, z_t)‖².
Since an update happened, w^k · (Φ(x_t, y_t) − Φ(x_t, z_t)) ≤ 0, and ‖Φ(x_t, y_t) − Φ(x_t, z_t)‖² ≤ 4R².
Thus ‖w^{k+1}‖² ≤ ‖w^k‖² + 4R², and assuming w^0 = 0, ‖w^{k+1}‖² ≤ 4kR².

Page 352: Incremental Structured Prediction Using a Global Learning

Theoretical analysis

The perceptron

If the data {(x_t, y_t)}, t = 1 ... T, is separable and ‖Φ(x, y)‖ ≤ R for all candidates, then there exists some λ > 0 such that the number of mistakes (updates) made by the perceptron is at most R²/λ².

Michael Collins, EMNLP 2002

At an update, w^{k+1} = w^k + Φ(x_t, y_t) − Φ(x_t, z_t), so
‖w^{k+1}‖² = ‖w^k‖² + 2 w^k · (Φ(x_t, y_t) − Φ(x_t, z_t)) + ‖Φ(x_t, y_t) − Φ(x_t, z_t)‖².
Since an update happened, w^k · (Φ(x_t, y_t) − Φ(x_t, z_t)) ≤ 0, and ‖Φ(x_t, y_t) − Φ(x_t, z_t)‖² ≤ 4R².
Thus ‖w^{k+1}‖² ≤ ‖w^k‖² + 4R², and assuming w^0 = 0, ‖w^{k+1}‖² ≤ 4kR².

The condition w^k · (Φ(x_t, y_t) − Φ(x_t, z_t)) ≤ 0 is guaranteed with exact search (dynamic programming); it may not hold with beam search.

Page 353: Incremental Structured Prediction Using a Global Learning

Theoretical analysis

The perceptron

If the data {(x_t, y_t)}, t = 1 ... T, is separable and ‖Φ(x, y)‖ ≤ R for all candidates, then there exists some λ > 0 such that the number of mistakes (updates) made by the perceptron is at most R²/λ².

Michael Collins, EMNLP 2002

Combining the two bounds: kλ ≤ w^{k+1} · u ≤ ‖w^{k+1}‖ ‖u‖ = ‖w^{k+1}‖ ≤ 2√k R.
Thus k²λ² ≤ 4kR², in other words k ≤ 4R²/λ² (or R²/λ² if R bounds ‖Φ(x_t, y_t) − Φ(x_t, z)‖ directly, as in Collins, 2002), so the number of updates is bounded.

Page 354: Incremental Structured Prediction Using a Global Learning

Theoretical analysis

The perceptron

If the data {(x_t, y_t)}, t = 1 ... T, is not separable, we should assume that there is an oracle u such that the number of errors it makes is o(T).

Michael Collins, EMNLP 2002

As before, at each update w^{k+1} · u = (w^k + Φ(x_t, y_t) − Φ(x_t, z_t)) · u = w^k · u + (Φ(x_t, y_t) − Φ(x_t, z_t)) · u.
The gain is at least λ on examples where the oracle u is correct and is bounded below on the o(·) examples where u errs, so when k ≤ CT, w^{k+1} · u ≥ λk − o(k).
Assuming w^0 = 0 and ‖u‖ = 1, ‖w^{k+1}‖ ≥ λk − o(k), which together with the upper bound on ‖w^{k+1}‖ again limits the number of updates.

Page 355: Incremental Structured Prediction Using a Global Learning

Theoretical analysis

The perceptron

Huang et al., NAACL 2012

Page 356: Incremental Structured Prediction Using a Global Learning

Theoretical analysis

The perceptron

Huang et al., NAACL 2012

Page 357: Incremental Structured Prediction Using a Global Learning

Theoretical analysis

The perceptron

The third factor must be no greater than zero; this is the violation condition.

Huang et al., NAACL 2012

Page 358: Incremental Structured Prediction Using a Global Learning

Theoretical analysis

Why early-update?

Early update: update when the correct sequence first falls off the beam.

Up to this point, the incorrect prefix scores at least as high as the correct one (a guaranteed violation).

Standard update (full update): no such guarantee!

Huang et al., NAACL 2012

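A sketch of perceptron training with beam search and early update, in the spirit of the discussion above; the state object (actions(), apply()) and the feature function phi(state, action) are assumed helpers, not a real API.

def train_one(gold_actions, w, beam_size, phi, initial_state):
    # Each item: (action prefix, state, accumulated feature dict, model score).
    beam = [((), initial_state, {}, 0.0)]
    gold = ((), initial_state, {}, 0.0)            # the gold derivation, tracked in parallel
    for step, gold_action in enumerate(gold_actions):
        candidates = []
        for prefix, state, feats, score in beam:
            for a in state.actions():
                f = phi(state, a)
                candidates.append((prefix + (a,), state.apply(a),
                                   merge(feats, f), score + dot(w, f)))
        beam = sorted(candidates, key=lambda c: c[3], reverse=True)[:beam_size]
        gf = phi(gold[1], gold_action)
        gold = (gold[0] + (gold_action,), gold[1].apply(gold_action),
                merge(gold[2], gf), gold[3] + dot(w, gf))
        if all(item[0] != gold[0] for item in beam):
            # early update: the gold prefix fell off the beam, so the best incorrect
            # prefix scores at least as high -- a guaranteed violation
            update(w, gold[2], beam[0][2])
            return
    if beam[0][0] != gold[0]:
        update(w, gold[2], beam[0][2])             # standard (full) update

def merge(a, b):
    out = dict(a)
    for f, v in b.items():
        out[f] = out.get(f, 0.0) + v
    return out

def dot(w, feats):
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def update(w, gold_feats, pred_feats):
    for f, v in gold_feats.items():
        w[f] = w.get(f, 0.0) + v
    for f, v in pred_feats.items():
        w[f] = w.get(f, 0.0) - v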

Page 359: Incremental Structured Prediction Using a Global Learning

Outline

Introduction Applications

Analysis ZPar

Page 360: Incremental Structured Prediction Using a Global Learning

ZPar

Introduction

Usage

Development

On-going work

Contributions welcome

Page 361: Incremental Structured Prediction Using a Global Learning

ZPar

Brief introduction

Usage

Development

On-going work

Contributions welcome

Page 362: Incremental Structured Prediction Using a Global Learning

Brief introduction

Initiated in 2009 at Oxford, extended at Cambridge and SUTD, with more developers becoming involved over time

Page 363: Incremental Structured Prediction Using a Global Learning

Brief introduction

2009—2014, Oxford, Cambridge, SUTD

Functionalities extended

Page 364: Incremental Structured Prediction Using a Global Learning

Brief introduction

2009—2014, Oxford, Cambridge, SUTD

Functionalities extended

Released several versions

Page 365: Incremental Structured Prediction Using a Global Learning

Brief introduction

2009—2014, Oxford, Cambridge, SUTD

Functionalities extended

Released several versions

Contains all implementations of this tutorial

Segmentation

POS tagging (single or joint)

Dependency parsing (single or joint)

Constituent parsing (single or joint)

CCG parsing (single or joint)

Page 366: Incremental Structured Prediction Using a Global Learning

Brief introduction

2009—2014, Oxford, Cambridge, SUTD

Functionalities extended

Released several versions

Contains all implementations of this tutorial

Code structure

Page 367: Incremental Structured Prediction Using a Global Learning

ZPar

Introduction

Usage

Development

On-going work

Contributions welcome

Page 368: Incremental Structured Prediction Using a Global Learning

Usage

Download

http://sourceforge.net/projects/zpar/files/0.6/

Page 369: Incremental Structured Prediction Using a Global Learning

Usage

For off-the-shelf Chinese language processing:

Compile: make zpar

Page 370: Incremental Structured Prediction Using a Global Learning

Usage

For off-the-shelf Chinese language processing:

Compile: make zpar

Usage

Page 371: Incremental Structured Prediction Using a Global Learning

Usage

For off-the-shelf Chinese language processing:

Compile: make zpar

Usage

Model download

Page 372: Incremental Structured Prediction Using a Global Learning

Usage

For off-the-shelf Chinese language processing:

Compile: make zpar

Usage

Model download

An example

Page 373: Incremental Structured Prediction Using a Global Learning

Usage

For off-the-shelf English language processing:

Compile: make zpar.en

Page 374: Incremental Structured Prediction Using a Global Learning

Usage

For off-the-shelf English language processing:

Compile: make zpar.en

Usage

Page 375: Incremental Structured Prediction Using a Global Learning

Usage

For off-the-shelf English language processing:

Compile: make zpar.en

Usage

Model download

Page 376: Incremental Structured Prediction Using a Global Learning

Usage

For off-the-shelf English language processing:

Compile: make zpar.en

Usage

Model download

An example

Page 377: Incremental Structured Prediction Using a Global Learning

Usage

A generic ZPar

For many languages the tasks are similar

POS-tagging (including morphological analysis) and parsing

Page 378: Incremental Structured Prediction Using a Global Learning

Usage

For generic processing:

Compile: make zpar.ge

Usage

Page 379: Incremental Structured Prediction Using a Global Learning

Usage

For generic processing:

Compile: make zpar.ge

Usage

An example

Page 380: Incremental Structured Prediction Using a Global Learning

Usage

Using the individual components

Chinese word segmentation

Makefile modification

Make

Train

Decode

SEGMENTOR_IMPL = agenda

make segmentor

./train input_file model_file iteration

./segmentor model_file input_file output_file

Page 381: Incremental Structured Prediction Using a Global Learning

Usage

Using the individual components

Chinese/English POS tagger

Makefile modification

Make

Train

Decode

CHINESE_TAGGER_IMPL = agenda

ENGLISH_TAGGER_IMPL = agenda

make chinese.postagger

make english.postagger

./train input_file model_file iteration

./tagger model_file input_file output_file

For Chinese POS-tagging

For English POS-tagging

Page 382: Incremental Structured Prediction Using a Global Learning

Usage

Using the individual components

Chinese/English dependency parsing

Makefile modification

Make

Train

Decode

CHINESE_DEPPARSER_IMPL = arceager

ENGLISH_DEPPARSER_IMPL = arceager

make chinese.depparser

make english.depparser

./train input_file model_file iteration

./depparser input_file output_file model_file

Page 383: Incremental Structured Prediction Using a Global Learning

Usage

Using the individual components

Chinese/English constituent parsing

Makefile modification

Make

Train

Decode

CHINESE_CONPARSER_IMPL = cad

ENGLISH_CONPARSER_IMPL = cad

make chinese.conparser

make english.conparser

./train input_file model_file iteration

./conparser input_file output_file model_file

For English/Chinese constituent parsing

For Chinese character-level constituent parsing

Page 384: Incremental Structured Prediction Using a Global Learning

Usage

A tip for training: obtain a best model

for i = 1 to maxN:

    ./train input_file model_file 1

    evaluate the current model on a development file

    if its performance is the best so far:

        save the current model

end for
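One possible way to script this tip (a sketch): the file names, the number of outer iterations, and the evaluation used below (word-level segmentation F-score on space-segmented files) are all assumptions.

import shutil, subprocess

def seg_fscore(pred_file, gold_file):
    # word-level F-score between two space-segmented files (a simple illustration)
    def spans(line):
        out, start = set(), 0
        for w in line.split():
            out.add((start, start + len(w)))
            start += len(w)
        return out
    tp = fp = fn = 0
    with open(pred_file) as pf, open(gold_file) as gf:
        for p_line, g_line in zip(pf, gf):
            p, g = spans(p_line), spans(g_line)
            tp += len(p & g); fp += len(p - g); fn += len(g - p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

best = 0.0
for i in range(30):                                                        # maxN = 30 is arbitrary
    subprocess.run(["./train", "train.txt", "model", "1"], check=True)     # one more training iteration
    subprocess.run(["./segmentor", "model", "dev.txt", "dev.out"], check=True)
    score = seg_fscore("dev.out", "dev.gold")
    if score > best:
        best = score
        shutil.copyfile("model", "model.best")                             # keep the best model so far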

Page 385: Incremental Structured Prediction Using a Global Learning

Usage

More documentation at http://people.sutd.edu.sg/~yue_zhang/doc/index.html

Page 386: Incremental Structured Prediction Using a Global Learning

ZPar

Introduction

Usage

Development

On-going work

Contributions welcome

Page 387: Incremental Structured Prediction Using a Global Learning

Development

Add new implementation (dependency parsing as an example)

New folder under implementations

Page 388: Incremental Structured Prediction Using a Global Learning

Development

Add new implementation (dependency parsing as an example)

New folder under implementations

Modify necessary files

Page 389: Incremental Structured Prediction Using a Global Learning

Development

Add new implementation (dependency parsing as an example)

New folder under implementations

Modify necessary files

Modify the Makefile

# currently support eisner, covington, nivre, combined and joint implementations

CHINESE_DEPPARSER_IMPL = newmethod
CHINESE_DEPPARSER_LABELED = false

CHINESE_DEPLABELER_IMPL = naive

# currently support sr implementations

CHINESE_CONPARSER_IMPL = jcad

# currently support only agenda

ENGLISH_TAGGER_IMPL = collins

# currently support eisner, covington, nivre, combined implementations

ENGLISH_DEPPARSER_IMPL = newmethod
ENGLISH_DEPPARSER_LABELED = true

ENGLISH_DEPLABELER_IMPL = naive

# currently support sr implementations

ENGLISH_CONPARSER_IMPL = cad

Page 390: Incremental Structured Prediction Using a Global Learning

Development

Flexible—give your own Makefile for other tasks

Page 391: Incremental Structured Prediction Using a Global Learning

ZPar

Introduction

Usage

Development

On-going work

Contributions welcome

Page 392: Incremental Structured Prediction Using a Global Learning

On-going work

The release of ZPar 0.7 this year

New implementations

Deep learning POS-tagger (Ma et al., ACL 2014)

Character-based Chinese dependency parsing (Zhang et al., ACL 2014)

Non-projective parser with more optimizations

Double-stack and double-queue models for parsing heterogeneous dependencies (Zhang et al., COLING 2014)

Page 393: Incremental Structured Prediction Using a Global Learning

On-going work

The release of ZPar 0.7 this year

New implementations

The generic system will replace the Chinese system as the default version

Page 394: Incremental Structured Prediction Using a Global Learning

ZPar

Introduction

Usage

Development

On-going work

Contributions welcome

Page 395: Incremental Structured Prediction Using a Global Learning

Contributions welcome

Open source contributions

User interfaces

Tokenizer html, ….

Optimizations

Reduced memory usage

Parallel versions

Microsoft Windows versions