2017-01-25-systemt-overview-stanford

65
1

Upload: laura-chiticariu

Post on 16-Apr-2017

59 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 2017-01-25-SystemT-Overview-Stanford

1

Page 2: 2017-01-25-SystemT-Overview-Stanford

2

Page 3: 2017-01-25-SystemT-Overview-Stanford

Bank6

23%

Bank5

20%

Bank4

21%

Bank3

5%

Bank2

15%

Bank1

15%

3

Page 4: 2017-01-25-SystemT-Overview-Stanford

Operations Analysis

4

Page 5: 2017-01-25-SystemT-Overview-Stanford

5

Page 6: 2017-01-25-SystemT-Overview-Stanford

6

Page 7: 2017-01-25-SystemT-Overview-Stanford

7

We are raising our tablet forecast.

S

areNP

We

S

raisingNP

forecastNP

tablet

DET

our

subj

obj

subj pred

Dependency

Tree

Oct 1 04:12:24 9.1.1.3 41865: %PLATFORM_ENV-1-DUAL_PWR: Faulty internal power supply B detected

Time Oct 1 04:12:24

Host 9.1.1.3

Process 41865

Category%PLATFORM_ENV-1-DUAL_PWR

MessageFaulty internal power supply B detected

Page 8: 2017-01-25-SystemT-Overview-Stanford

88

Singapore 2012 Annual Report(136 pages PDF)

Identify note breaking down Operating expenses line item, and extract opex components

Identify line item for Operating expenses from Income statement (financial table in pdf document)

Page 9: 2017-01-25-SystemT-Overview-Stanford

9

Page 10: 2017-01-25-SystemT-Overview-Stanford

10

Intel's 2013 capex is elevated at 23% of sales, above average of 16%

FHLMC reported $4.4bn net loss and requested $6bn in capital from Treasury.

I'm still hearing from clients that Merrill's website is better.

Customer or competitor?

Good or bad?

Entity of interest

Page 11: 2017-01-25-SystemT-Overview-Stanford

11

Page 12: 2017-01-25-SystemT-Overview-Stanford

’ 12

Page 13: 2017-01-25-SystemT-Overview-Stanford

13

Page 14: 2017-01-25-SystemT-Overview-Stanford

14

Page 15: 2017-01-25-SystemT-Overview-Stanford

15

Page 16: 2017-01-25-SystemT-Overview-Stanford

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Proin, in sagittis facilisis, volutpat dapibus, ultrices sit amet, sem , volutpat dapibus, ultrices sit amet,

sem Tomorrow, we will meet Mark Scott, Howard Smith and amet lt arcu tincidunt orci. Pellentesque justo tellus , scelerisque quis, facilisis nunc

volutpat enim, quis viverra lacus nulla sit lectus. Curabitur cursus tincidunt orci. Pellentesque justo tellus , scelerisque quis, facilisis quis, interdum non, ante.

Suspendisse

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Proin elementum neque at justo. Aliquam erat volutpat. Curabitur a massa. Vivamus luctus, risus in

sagittis facilisis arcu Tomorrow, we will meet Mark Scott, Howard Smith and hendrerit faucibus pede mi ipsum. Curabitur cursus tincidunt orci.

Pellentesque justo tellus , scelerisque quis, facilisis quis, interdum non, ante. Suspendisse feugiat, erat in

Tokenization

(preprocessing step)

Level 1

Gazetteer[type = LastGaz] Last

Gazetteer[type = FirstGaz] First

Token[~ “[A-Z]\w+”] Caps

Rule priority used to prefer

First over Caps

• Rule priority used to prefer First over Caps.

• Lossy Sequencing: annotations dropped

because input to next stage must be a sequence

– First preferred over Last since it was declared earlier

16

Page 17: 2017-01-25-SystemT-Overview-Stanford

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Proin, in sagittis facilisis, volutpat dapibus, ultrices sit amet, sem , volutpat dapibus, ultrices sit amet,

sem Tomorrow, we will meet Mark Scott, Howard Smith and amet lt arcu tincidunt orci. Pellentesque justo tellus , scelerisque quis, facilisis nunc

volutpat enim, quis viverra lacus nulla sit lectus. Curabitur cursus tincidunt orci. Pellentesque justo tellus , scelerisque quis, facilisis quis, interdum non, ante.

Suspendisse

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Proin elementum neque at justo. Aliquam erat volutpat. Curabitur a massa. Vivamus luctus, risus in

sagittis facilisis arcu Tomorrow, we will meet Mark Scott, Howard Smith and hendrerit faucibus pede mi ipsum. Curabitur cursus tincidunt orci.

Pellentesque justo tellus , scelerisque quis, facilisis quis, interdum non, ante. Suspendisse feugiat, erat in

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Proin elementum neque at justo. Aliquam erat volutpat. Curabitur a massa. Vivamus luctus, risus in e

sagittis Tomorrow, we will meet Mark Scott, Howard Smith and hendrerit faucibus pede mi sed ipsum. Curabitur cursus tincidunt orci.

Pellentesque justo tellus , scelerisque quis, facilisis quis, interdum non, ante. Suspendisse feugiat, erat in feugiat tincidunt, est nunc volutpat enim, quis viverra

lacus nulla sit amet lectus. Nulla odio lorem, feugiat et, volutpat dapibus, ultrices sit amet, sem. Vestibulum quis dui vitae massa euismod faucibus. Pellentesque

id neque id tellus hendrerit tincidunt. Etiam augue. Class aptent

Tokenization

(preprocessing step)

Level 1

Gazetteer[type = LastGaz] Last

Gazetteer[type = FirstGaz] First

Token[~ “[A-Z]\w+”] Caps

Level 2 First Last Person

First Caps Person

First Person

Rigid Rule Priority and Lossy

Sequencing in Level 1 caused

partial results

17

Page 18: 2017-01-25-SystemT-Overview-Stanford

••

18

Page 19: 2017-01-25-SystemT-Overview-Stanford

19

Page 20: 2017-01-25-SystemT-Overview-Stanford

AQL Language

Optimizer

Operator

Graph

Specify extractor semantics

declaratively (express logic of

computation, not control flow)

Choose efficient execution

plan that implements

semantics

Optimized execution plan

executed at runtime

20

Page 21: 2017-01-25-SystemT-Overview-Stanford

21

Page 22: 2017-01-25-SystemT-Overview-Stanford

Document

text: String

Person

last: Spanfirst: Span fullname: Span

22

Page 23: 2017-01-25-SystemT-Overview-Stanford

23 23

Page 24: 2017-01-25-SystemT-Overview-Stanford

Mark

Scott

Anna

DocumentInput Tuple

we will meet Mark

Scott and

Output Tuple 2 Span 2Document

Span 1Output Tuple 1 Document

Dictionary

24

Page 25: 2017-01-25-SystemT-Overview-Stanford

25

Page 26: 2017-01-25-SystemT-Overview-Stanford

Dictionary<First>

SmithScott

TomorrowMarkScottHowardSmith

Join<First> <Caps>

Join<First> <Last>

Mark ScottHowardSmith

Mark ScottHowardSmith

Union

Mark ScottHowardSmithMark ScottHowardSmith

ScottMark

Howard

Consolidate

Mark ScottHowardSmith

Dictionary<Last>

Regex<Caps>

……Tomorrow, we will meet Mark Scott, Howard Smith …

Explicit operator for

resolving ambiguityInput may contain overlapping annotations

(No Lossy Sequencing problem)

Output may contain overlapping annotations

(No Rigid Matching Regimes)

ScottMark

Howard

26

Page 27: 2017-01-25-SystemT-Overview-Stanford

create view FirstCaps asselect CombineSpans(F.name, C.name) as namefrom First F, Caps Cwhere FollowsTok(F.name, C.name, 0, 0);

<First> <Caps>

0 tokens

27

Page 28: 2017-01-25-SystemT-Overview-Stanford

create view Person asselect S.name as namefrom (

( select CombineSpans(F.name, C.name) as namefrom First F, Caps Cwhere FollowsTok(F.name, C.name, 0, 0))

union all( select CombineSpans(F.name, L.name) as namefrom First F, Last Lwhere FollowsTok(F.name, L.name, 0, 0))

union all( select *from First F )

) Sconsolidate on name;

<First><Caps>

<First><Last>

<First>

28

Page 29: 2017-01-25-SystemT-Overview-Stanford

create view Person asselect S.name as namefrom (

( select CombineSpans(F.name, C.name) as namefrom First F, Caps Cwhere FollowsTok(F.name, C.name, 0, 0))

union all( select CombineSpans(F.name, L.name) as namefrom First F, Last Lwhere FollowsTok(F.name, L.name, 0, 0))

union all( select *from First F )

) Sconsolidate on name;

Explicit clause for

resolving ambiguity

(No Rigid Priority

problem)

Input may contain

overlapping annotations

(No Lossy Sequencing

problem)

29

Page 30: 2017-01-25-SystemT-Overview-Stanford

30

Deep Syntactic Parsing ML Training & Scoring

Core Operators

Tokenization Parts of Speech DictionariesRegular

ExpressionsSpan

OperationsRelational Operations

Semantic Role Labels

Language to express NLP Algorithms AQL

….AggregationOperations

Page 31: 2017-01-25-SystemT-Overview-Stanford

31

package com.ibm.avatar.algebra.util.sentence;

import java.io.BufferedWriter;

import java.util.ArrayList;

import java.util.HashSet;

import java.util.regex.Matcher;

public class SentenceChunker

{

private Matcher sentenceEndingMatcher = null;

public static BufferedWriter sentenceBufferedWriter = null;

private HashSet<String> abbreviations = new HashSet<String> ();

public SentenceChunker ()

{

}

/** Constructor that takes in the abbreviations directly. */

public SentenceChunker (String[] abbreviations)

{

// Generate the abbreviations directly.

for (String abbr : abbreviations) {

this.abbreviations.add (abbr);

}

}

/**

* @param doc the document text to be analyzed

* @return true if the document contains at least one sentence boundary

*/

public boolean containsSentenceBoundary (String doc)

{

String origDoc = doc;

/*

* Based on getSentenceOffsetArrayList()

*/

// String origDoc = doc;

// int dotpos, quepos, exclpos, newlinepos;

int boundary;

int currentOffset = 0;

do {

/* Get the next tentative boundary for the sentenceString */

setDocumentForObtainingBoundaries (doc);

boundary = getNextCandidateBoundary ();

if (boundary != -1) {doc.substring (0, boundary + 1);

String remainder = doc.substring (boundary + 1);

String candidate = /*

* Looks at the last character of the String. If this last

* character is part of an abbreviation (as detected by

* REGEX) then the sentenceString is not a fullSentence and

* "false” is returned

*/

// while (!(isFullSentence(candidate) &&

// doesNotBeginWithCaps(remainder))) {

while (!(doesNotBeginWithPunctuation (remainder)

&& isFullSentence (candidate))) {

/* Get the next tentative boundary for the sentenceString */

int nextBoundary = getNextCandidateBoundary ();

if (nextBoundary == -1) {

break;

}

boundary = nextBoundary;

candidate = doc.substring (0, boundary + 1);

remainder = doc.substring (boundary + 1);

}

if (candidate.length () > 0) {

// sentences.addElement(candidate.trim().replaceAll("\n", "

// "));

// sentenceArrayList.add(new Integer(currentOffset + boundary

// + 1));

// currentOffset += boundary + 1;

// Found a sentence boundary. If the boundary is the last

// character in the string, we don't consider it to be

// contained within the string.

int baseOffset = currentOffset + boundary + 1;

if (baseOffset < origDoc.length ()) {

// System.err.printf("Sentence ends at %d of %d\n",

// baseOffset, origDoc.length());

return true;

}

else {

return false;

}

}

// origDoc.substring(0,currentOffset));

// doc = doc.substring(boundary + 1);

doc = remainder;

}

}

while (boundary != -1);

// If we get here, didn't find any boundaries.

return false;

}

public ArrayList<Integer> getSentenceOffsetArrayList (String doc)

{

ArrayList<Integer> sentenceArrayList = new ArrayList<Integer> ();

// String origDoc = doc;

// int dotpos, quepos, exclpos, newlinepos;

int boundary;

int currentOffset = 0;

sentenceArrayList.add (new Integer (0));

do {

/* Get the next tentative boundary for the sentenceString */

setDocumentForObtainingBoundaries (doc);

boundary = getNextCandidateBoundary ();

if (boundary != -1) {

String candidate = doc.substring (0, boundary + 1);

String remainder = doc.substring (boundary + 1);

/*

* Looks at the last character of the String. If this last character

* is part of an abbreviation (as detected by REGEX) then the

* sentenceString is not a fullSentence and "false" is returned

*/

// while (!(isFullSentence(candidate) &&

// doesNotBeginWithCaps(remainder))) {

while (!(doesNotBeginWithPunctuation (remainder) &&

isFullSentence (candidate))) {

/* Get the next tentative boundary for the sentenceString */

int nextBoundary = getNextCandidateBoundary ();

if (nextBoundary == -1) {

break;

}

boundary = nextBoundary;

candidate = doc.substring (0, boundary + 1);

remainder = doc.substring (boundary + 1);

}

if (candidate.length () > 0) {

sentenceArrayList.add (new Integer (currentOffset + boundary + 1));

currentOffset += boundary + 1;

}

// origDoc.substring(0,currentOffset));

// doc = doc.substring(boundary + 1);

doc = remainder;

}

}

while (boundary != -1);

if (doc.length () > 0) {

sentenceArrayList.add (new Integer (currentOffset + doc.length ()));

}

sentenceArrayList.trimToSize ();

return sentenceArrayList;

}

private void setDocumentForObtainingBoundaries (String doc)

{

sentenceEndingMatcher = SentenceConstants.

sentenceEndingPattern.matcher (doc);

}

private int getNextCandidateBoundary ()

{

if (sentenceEndingMatcher.find ()) {

return sentenceEndingMatcher.start ();

}

else

return -1;

}

private boolean doesNotBeginWithPunctuation (String remainder)

{

Matcher m = SentenceConstants.punctuationPattern.matcher (remainder);

return (!m.find ());

}

private String getLastWord (String cand)

{

Matcher lastWordMatcher = SentenceConstants.lastWordPattern.matcher (cand);

if (lastWordMatcher.find ()) {

return lastWordMatcher.group ();

}

else {

return "";

}

}

/*

* Looks at the last character of the String. If this last character is

* par of an abbreviation (as detected by REGEX)

* then the sentenceString is not a fullSentence and "false" is returned

*/

private boolean isFullSentence (String cand)

{

// cand = cand.replaceAll("\n", " "); cand = " " + cand;

Matcher validSentenceBoundaryMatcher =

SentenceConstants.validSentenceBoundaryPattern.matcher (cand);

if (validSentenceBoundaryMatcher.find ()) return true;

Matcher abbrevMatcher = SentenceConstants.abbrevPattern.matcher (cand);

if (abbrevMatcher.find ()) {

return false; // Means it ends with an abbreviation

}

else {

// Check if the last word of the sentenceString has an entry in the

// abbreviations dictionary (like Mr etc.)

String lastword = getLastWord (cand);

if (abbreviations.contains (lastword)) { return false; }

}

return true;

}

}

Java Implementation of Sentence Boundary Detection

create dictionary AbbrevDict from file

'abbreviation.dict’;

create view SentenceBoundary as

select R.match as boundary

from ( extract regex /(([\.\?!]+\s)|(\n\s*\n))/

on D.text as match from Document D ) R

where

Not(ContainsDict('AbbrevDict',

CombineSpans(LeftContextTok(R.match, 1),R.match)));

Equivalent AQL Implementation

31

Page 32: 2017-01-25-SystemT-Overview-Stanford

32

Page 33: 2017-01-25-SystemT-Overview-Stanford

33

Page 34: 2017-01-25-SystemT-Overview-Stanford

34

Page 35: 2017-01-25-SystemT-Overview-Stanford

35

Tokenization overhead is paid only once

First

(followed within 0 tokens)

Plan C

Plan A

Join

Caps

Restricted Span Evaluation

Plan B

FirstIdentify Caps starting

within 0 tokens

Extract text to the

right

CapsIdentify First ending

within 0 tokensExtract text to the left

Page 36: 2017-01-25-SystemT-Overview-Stanford

0

100

200

300

400

500

600

700

0 20 40 60 80 100

Average document size (KB)

Th

rou

gh

pu

t (K

B/s

ec

)

Open Source Entity Tagger

SystemT

10~50x faster

[Chiticariu et al., ACL’10] 36

Page 37: 2017-01-25-SystemT-Overview-Stanford

[Chiticariu et al., ACL’10]

Dataset Document SizeThroughput

(KB/sec)Average Memory

(MB)

Range Average ANNIE SystemT ANNIE SystemT

Web Crawl 68 B – 388 KB 8.8 KB 42.8 498.8 201.8 77.2

Medium SEC Filings

240 KB – 0.9 MB 401 KB 26.3 703.5 601.8 143.7

Large

SEC Flings1 MB – 3.4 MB 1.54 MB 21.1 954.5 2683.5 189.6

37

Page 38: 2017-01-25-SystemT-Overview-Stanford

38

Page 39: 2017-01-25-SystemT-Overview-Stanford

39

Page 40: 2017-01-25-SystemT-Overview-Stanford

PersonPhone

Person PhonePerson

Anna at James St. office (555-5555) ….

create view PersonPhone as

select P.name as person, N.number as phone

from Person P, Phone N

where Follows(P.name, N.number, 0, 30);

Person Phone

t1t2

t3t1 t3

t2 t3

Provenance:

Boolean

expression

40

Page 41: 2017-01-25-SystemT-Overview-Stanford

41

Page 42: 2017-01-25-SystemT-Overview-Stanford

2013 2015 2016 2017

• UC Santa Cruz

(full Graduate class)

2014

• U. Washington (Grad)

• U. Oregon (Undergrad)

• U. Aalborg, Denmark (Grad)

• UIUC (Grad)

• U. Maryland Baltimore County

(Undergrad)

• UC Irvine (Grad)

• NYU Abu-Dhabi (Undergrad)

• U. Washington (Grad)

• U. Oregon (Undergrad)

• U. Maryland Baltimore County

(Undergrad)

• …

• UC Santa Cruz, 3 lectures

in one Grad class

SystemT MOOC

42

Page 43: 2017-01-25-SystemT-Overview-Stanford

43

Page 44: 2017-01-25-SystemT-Overview-Stanford

44

Page 45: 2017-01-25-SystemT-Overview-Stanford

45

Page 46: 2017-01-25-SystemT-Overview-Stanford

create dictionary PurchaseVerbs as

('buy.01', 'purchase.01', 'acquire.01', 'get.01');

create view Relation as

select A.verb as BUY_VERB, R2.head as PURCHASE, A.polarity as WILL_BUY

from Action A, Role R

where

MatchesDict('PurchaseVerbs', A.verbClass);

and Equals(A.aid, R.aid)

and Equals(R.type, 'A1');

ACL ‘15, ‘16, EMNLP ‘16, COLING ’16a, ‘16b, ‘16c

46

Page 47: 2017-01-25-SystemT-Overview-Stanford

47

Page 48: 2017-01-25-SystemT-Overview-Stanford

48

Page 49: 2017-01-25-SystemT-Overview-Stanford

49

Page 50: 2017-01-25-SystemT-Overview-Stanford

50

Page 51: 2017-01-25-SystemT-Overview-Stanford

51

Page 52: 2017-01-25-SystemT-Overview-Stanford

52

Page 53: 2017-01-25-SystemT-Overview-Stanford

Ease of

Programming

Ease of

Sharing

53

Page 54: 2017-01-25-SystemT-Overview-Stanford

54

R1: create view Phone as

Regex(‘d{3}-\d{4}’, Document, text);

R2: create view Person as

Dictionary(‘first_names.dict’, Document, text);

Dictionary file first_names.dict:

anna, james, john, peter…

R3: create table PersonPhone(match span);

insert into PersonPhone

select Merge(F.match, P.match) as match

from Person F, Phone P

where Follows(F.match, P.match, 0, 60);

Person PhonePerson Person Phone

Anna at James St. office (555-5555), or James, her assistant - 777-7777 have the details.

••

54

Page 55: 2017-01-25-SystemT-Overview-Stanford

PersonDictionary

FirstNames.dict

Doc

PersonPhoneJoin

Follows(name,phone,0,60)

James

James555-5555

PhoneRegex

/\d{3}-\d{4}/

555-5555

PhonePerson

Anna at James St. office (555-5555), …

55

Page 56: 2017-01-25-SystemT-Overview-Stanford

56

56

HLC 2

Remove James

from output of R2’Dictionary op.

HLC 3: Remove

James555-5555

from output of R3’s

join op.

HLC 1

Remove 555-5555

from output of

R1’s Regex op.

true

Merge(F.match, P.match) as match

⋈Follows(F.match,P.match,0,60)

Dictionary‘firstName.dict’, text

Regex‘\d{3}-\d{4}’, text

R2 R1

R3

Doc

Goal: remove “James 555-5555” from output

56

Page 57: 2017-01-25-SystemT-Overview-Stanford

57

57

HLC 2

Remove James

from output of R2’Dictionary op.

HLC 3: Remove

James555-5555

from output of R3’s

join op.

HLC 1

Remove 555-5555

from output of

R1’s Regex op.

true

Merge(F.match, P.match) as match

⋈Follows(F.match,P.match,0,60)

Dictionary‘firstName.dict’, text

Regex‘\d{3}-\d{4}’, text

R2 R1

R3

Doc

Goal: remove “James 555-5555” from output

LLC 1

Remove ‘James’from FirstNames.dict

LLC 2

Add filter pred. on

street suffix in right

context of match

LLC 3

Reduce character gap between

F.match and P.match from 60 to 10

57

Page 58: 2017-01-25-SystemT-Overview-Stanford

58

Dictionary ContainsDict()

Contains IsContained Overlaps

“ PersonPhonePersonPhone ”

58

Page 59: 2017-01-25-SystemT-Overview-Stanford

59 • Input:

– Set of HLCs, provenance graph, labeled results

• Output:

– List of LLCs, ranked based on improvement in F1-measure

• Algorithm:

– For each operator Op, consider all HLCs (ti, Op)– For each HLC, enumerate all possible LLCs– For each LLC:

• Compute the set of local tuples it removes from the output of Op

• Propagate these removals up through the provenance graph to compute the effect on end-to-end result

– Rank LLCs

59

Page 60: 2017-01-25-SystemT-Overview-Stanford

– t Dictionary

Op

– k Op

– k

– O(n2) n

Op ti Op ti

+++

+

+++

+ ++

+ +

+

+ + ++

++ +

+

++++

-- - --

- - -- - --

--

- - ---

-

--

--

- -

-Tuples to remove

from output of OpOutput tuples

60

Page 61: 2017-01-25-SystemT-Overview-Stanford

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

Baseline I1 I2 I3 I4 I5

Enron

ACE

CoNLL

EnronPP

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

Baseline I1 I2 I3 I4 I5

Enron

ACE

CoNLL

EnronPP

61

Precision improves greatly after a few iterations, while recall remains fairly stable

Precision – % correct results of total results identified

Recall – % correct results identified of total correct labels

Person extraction on formal text (CoNLL, ACE)Person and PersonPhone extraction on informal text (Enron)

61

Page 62: 2017-01-25-SystemT-Overview-Stanford

0

10

20

30

40

50

60

70

80

90

F1- measure

62

Almost all expert’s refinements are among top 12 generated refinements

Done in 2 minutes !

Expert A after 1 hour, 9 refinements

Person extraction on informal text (Enron)

62

Page 63: 2017-01-25-SystemT-Overview-Stanford

63

Page 64: 2017-01-25-SystemT-Overview-Stanford

Development Environment

AQL Extractor

create view ProductMention asselect ...from ...where ...

create view IntentToBuy asselect ...from ...where ... Cost-based

optimization

. .

.

Discovery tools for AQL development

SystemT Runtime

Input

Documents

Extracted

Objects

Challenge: Building extractors for enterprise applications requires an information extraction system that is expressive,

efficient, transparent and usable. Existing solutions are either rule-based solutions based on cascading grammar with

expressivity and efficiency issues, or black-box solutions based on machine learning with lack of transparency.

Our Solution: A declarative information extraction system with cost-based optimization, high-performance runtime and

novel development tooling based on solid theoretical foundation [PODS’13, PODS’14], shipping with over 10+ IBM products.

AQL: a declarative language that can be used to build extractors

outperforming the state-of-the-arts [ACL’10]

Multilingual SRL-enabled: [ACL’15, ACL’16, EMNLP’16, COLING’16]

A suite of novel development tooling leveraging

machine learning and HCI [EMNLP’08, VLDB’10,

ACL’11, CIKM’11, ACL’12, EMNLP’12, CHI’13,

SIGMOD’13, ACL’13,VLDB’15, NAACL’15]

Cost-based optimization for

text-centric operations [ICDE’08, ICDE’11, FPL’13, FPL’14]

Highly embeddable runtime

with high-throughput and

small memory footprint. [SIGMOD Record’09, SIGMOD’09]

For details and

Online Class visit:https://ibm.biz/BdF4GQ

64