


OpenKI: Integrating Open Information Extraction and Knowledge Baseswith Relation Inference

Dongxu Zhang1*, Subhabrata Mukherjee2, Colin Lockard2, Xin Luna Dong2, Andrew McCallum1

1University of Massachusetts Amherst {dongxuzhang,mccallum}@cs.umass.edu

2Amazon {subhomj,clockard,lunadong}@amazon.com

Abstract

In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions in the form of (subject, predicate, object) triples with Knowledge Bases (KB). Traditional techniques from universal schema and from schema mapping fall in two extremes: either they perform instance-level inference relying on embedding for (subject, object) pairs, thus cannot handle pairs absent in any existing triples; or they perform predicate-level mapping and completely ignore background evidence from individual entities, thus cannot achieve satisfying quality.

We propose OpenKI to handle sparsity of OpenIE extractions by performing instance-level inference: for each entity, we encode the rich information in its neighborhood in both KB and OpenIE extractions, and leverage this information in relation inference by exploring different methods of aggregation and attention. In order to handle unseen entities, our model is designed without creating entity-specific parameters. Extensive experiments show that this method not only significantly improves state-of-the-art for conventional OpenIE extractions like ReVerb, but also boosts the performance on OpenIE from semi-structured data, where new entity pairs are abundant and data are fairly sparse.

1 Introduction

Web-scale knowledge extraction and alignment has been a vision held by different communities for decades. The Natural Language Processing (NLP) community has been focusing on knowledge extraction from texts. They apply either closed information extraction according to an ontology (Mintz et al., 2009; Zhou et al., 2005), restricting to a subset of relations pre-defined in the ontology, or open information extraction (OpenIE)

* This work was performed while at Amazon.

to extract free-text relations (Banko et al., 2007; Fader et al., 2011), leaving the relations unaligned and thus potentially duplicated. The Database (DB) community has been focusing on aligning relational data or WebTables (Cafarella et al., 2008) by schema mapping (Rahm and Bernstein, 2001), but the quality is far below adequate for assuring correct data integration.

We propose advancing progress in this direction by applying knowledge integration from OpenIE extractions. OpenIE extracts SPO (subject, predicate, object) triples, where each element is a text phrase, such as E1: ("Robin Hood", "Full Cast and Crew", "Leonardo Decaprio") and E2: ("Ang Lee", "was named best director for", "Brokeback"). OpenIE has been studied extensively for text extraction (Yates et al., 2007; Fader et al., 2011; Mausam et al., 2012), and also for semi-structured sources (Bronzi et al., 2013), thus serves as an effective tool for web-scale knowledge extraction. The remaining problem is to align text-phrase predicates1 from OpenIE to knowledge bases (KB). Knowledge integration answers the following question: given an OpenIE extraction (s, p, o), how can one populate an existing KB using relations in the pre-defined ontology?

The problem of knowledge integration is not completely new. The DB community has been solving the problem using schema mapping techniques, identifying mappings from a source schema (OpenIE extractions in our context) to a target schema (KB ontology in our context) (Rahm and Bernstein, 2001). Existing solutions consider predicate-level (i.e., attribute) similarity on names, types, descriptions, instances, and so on, and generate mappings like "email" mapped to "email-address"; "first name" and "last name" together mapped to "full name". However, for our example "Full Cast and Crew", which is a union of multiple KB relations such as "directed by", "written by", and "actor", it is very hard to determine a mapping at the predicate level.

1 We also need to align text-phrase entities, which falls in the area of entity linking (Dredze et al., 2010; Ji et al., 2014); it is out of scope of this paper and we refer readers to relevant references.

On the other hand, the NLP community has proposed Universal Schema (Riedel et al., 2013) to apply instance-level inference from both OpenIE extractions and knowledge in existing knowledge bases: given a set of extractions regarding an entity pair (s, o) and also information on each entity, infer new relations for this pair. One drawback of this method is that it cannot handle unseen entities and entity pairs. Also, the technique tends to overfit when the data is sparse due to the large number of parameters for entities and entity pairs. Unfortunately, in the majority of the real extractions we examined in our experiments, we can find only 1.4 textual triples on average between the subject and object. The latest proposal, Rowless Universal Schema (Verga et al., 2017), removes the entity-specific parameters and makes the inference directly between predicates and relations, thereby allowing us to reason about unseen entity pairs. However, it completely ignores the entities themselves, so in a sense falls back to predicate-level decisions, especially when only one text predicate is observed.

In this paper we propose a solution that leverages information about the individual entities whenever possible, and falls back to predicate-level decisions only when both involved entities are new. Continuing with our example E1 – if we know from existing knowledge that "Leonardo" is a famous actor and has rarely directed or written a movie, we can decide with high confidence that this predicate maps to @film.actor in this triple, even if our knowledge graph knows nothing about the new movie "Robin Hood". In particular, we make three contributions in this paper.

1. We design an embedding for each entity by exploring rich signals from its neighboring relations and predicates in KB and OpenIE. This embedding provides a soft constraint on which relations the entities are likely to be involved in, while keeping our model free from creating new entity-specific parameters, thus allowing us to handle unseen entities during inference.

2. Inspired by predicate-level mapping from schema mapping and instance-level inference from universal schema, we design a joint model that leverages the neighborhood embedding of entities and relations with different methods of aggregation and attention.

3. Through extensive experiments on various OpenIE extractions and KB, we show that our method improves over the state-of-the-art by 33.5% on average across different datasets.

In the rest of the paper, we define the problem formally in Section 2, present our method in Section 3, describe experimental results in Section 4, and discuss related work in Section 5.

2 Problem Overview

Problem Statement. Given (i) an existing knowledge base KB of triples (s, p, o) – where s, o ∈ E_KB (the set of KB entities) and p ∈ R_KB (the set of KB relations), and (ii) a set of instances (s′, p′, o′) from OpenIE extraction (s′ and o′ may not belong to E_KB, and p′ are text predicates)2: predict score(s′, p, o′) – where p ∈ R_KB.

For example, given E1 and E2 as OpenIE extractions and background knowledge bases (KB) like IMDB, we want to predict the "@film.actor" relation given E1 and the "@film.directed_by" relation given E2 as the target KB relations between the participating entities. In particular, we want to perform this relation inference at instance-level, which can be different for different entities sharing the same predicate. Table 1 introduces important notations used in this paper.

s                       Subject
o                       Object
p                       KB relation or text predicate
v_s, v_p, v_o, v_{s,o}  Embedding vectors of s, p, o and (s, o)
S(s, p, o)              Scoring function for (s, p, o) to be true
Agg_{p∈R(s,o)}(v_p)     Aggregation function over embeddings (v_p) of p shared by s and o

Table 1: Notation table.

2.1 Existing Solution and Background

Universal Schema (F-Model) (Riedel et al., 2013) is modeled as a matrix factorization task where entity pairs, e.g., (Robin Hood, Leonardo Decaprio), form the rows, and relations from OpenIE and KB form the columns (e.g., @film.actor, "Full Cast and Crew"). During training, we observe some positive entries in the matrix and the objective is to predict the missing cells at test time.

2 In this paper, a 'relation' always refers to a KB relation, whereas a 'predicate' refers to an OpenIE textual relation.


Each (subject, predicate, object) triple is scored as:

S_F(s, p, o) = v_{s,o} · v_p^T

where v_{s,o} ∈ R^d is the embedding vector of the entity pair (subject, object), v_p is the embedding vector of a KB relation or OpenIE predicate, and the triple score is obtained by their dot product. The parameters v_p and v_{s,o} are randomly initialized and learned via gradient descent.

One of the drawbacks of universal schema is the explicit modeling of entity pairs using free parameters v_{s,o}. Therefore, it cannot model unseen entities. This also makes the model overfit on our data, as the number of OpenIE text predicates observed with each entity pair is rather small (1.4 on average in our datasets).

Universal Schema (E-Model) (Riedel et al., 2013) considers entity-level information, thus decomposing the scoring function from the F-model as follows:
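As a concrete illustration, the F-model scoring above can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's implementation; the toy embedding tables and the entity/relation entries in them are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding dimension

# One free embedding vector per (subject, object) pair and per relation/predicate.
pair_emb = {("Robin Hood", "Leonardo Decaprio"): rng.normal(size=d)}
rel_emb = {"@film.actor": rng.normal(size=d),
           "Full Cast and Crew": rng.normal(size=d)}

def score_f(s, p, o):
    """S_F(s, p, o) = v_{s,o} . v_p: dot product of pair and relation embeddings."""
    return float(pair_emb[(s, o)] @ rel_emb[p])

print(score_f("Robin Hood", "@film.actor", "Leonardo Decaprio"))
```

Because every pair (s, o) needs its own row in `pair_emb`, a pair never seen during training has no embedding to look up, which is exactly the unseen-entity limitation discussed below.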

S_E(s, p, o) = S_subj(s, p) + S_obj(p, o) = v_s · v_p^{subj T} + v_o · v_p^{obj T}    (1)

where each relation is represented by two vectors corresponding to its argument type for a subject or an object. The final score is an additive summation over the subject and object scores S_subj and S_obj that implicitly contain the argument type information of the predicate p. Thus, a joint F- and E-model of S_{F+E} = S_F + S_E can perform relation inference at instance-level considering the entity information. Although the E-model captures rich information about entities, it still cannot deal with unseen entities due to the entity-specific free parameters v_s and v_o.

Rowless Universal Schema (Rowless) (Verga et al., 2017) handles new entities as follows. It considers all relations in KB and OpenIE that the subject s and object o co-participate in (denoted by R(s, o)), and represents the entity pair with an aggregation over embeddings of these relations.

v_{s,o}^{Rowless} = Agg_{p′∈R(s,o)}(v_{p′})
S_Rowless(s, p, o) = v_{s,o}^{Rowless} · v_p^T    (2)

Agg(·) is an aggregation function like average pooling, max pooling, hard attention (Rowless MaxR), or soft attention given query relations (Rowless Attention) (Verga et al., 2017). The Rowless model ignores the individual information of entities, and therefore falls back to making predicate-level decisions in a sense, especially when there are only a few OpenIE predicates for an entity pair.
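A minimal sketch of Rowless scoring with average pooling as Agg(·). The embeddings and predicate names below are toy placeholders, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
rel_emb = {p: rng.normal(size=d)
           for p in ["was named best director for", "IMDB:Director",
                     "@film.directed_by"]}

def score_rowless(shared_preds, query_rel):
    """v^Rowless_{s,o} = average of embeddings of predicates shared by (s, o);
    score = dot product with the query relation embedding (Eq. 2)."""
    v_pair = np.mean([rel_emb[p] for p in shared_preds], axis=0)
    return float(v_pair @ rel_emb[query_rel])

score = score_rowless(["was named best director for", "IMDB:Director"],
                      "@film.directed_by")
```

Note that no entity appears anywhere in the computation: with a single shared predicate the score degenerates to a plain predicate-relation dot product, which is the predicate-level fallback described above.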

3 Our Approach

We propose OpenKI for instance-level relation inference such that it (i) captures rich information about each entity from its neighborhood KB relations and text predicates to serve as background knowledge, and generalizes to unseen entities by not learning any entity-specific parameters (only KB relations and OpenIE predicates are parameterized); (ii) considers both shared predicates and entity neighborhood information to encode entity pair information. Figure 1 shows the architecture of our model.

3.1 Entity Neighborhood Encoder (ENE)

The core of our model is the Entity Neighborhood Encoder. Recall that Rowless Universal Schema represents each entity pair with common relations shared by this pair. However, it misses critical information when entities do not only occur in the current entity pair, but also interact with other entities. This entity neighborhood can be regarded as soft and fine-grained entity type information that could help infer relations when observed text predicates are ambiguous (polysemous), noisy (low quality of data source) or low-frequency (sparsity of language representation).3

Our aim is to incorporate this entity neighborhood information into our model for instance-level relation inference while keeping it free of entity-specific parameters. To do this, for each entity, we leverage all its neighboring KB relations and OpenIE predicates for relation inference. We aggregate their embeddings to obtain two scores for the subject and object separately in our ENE model. The subject score S_subj^ENE for an entity considers the aggregated embedding of its participating KB relations and OpenIE predicates where it serves as a subject (similar for the object score S_obj^ENE):

3 Note that the notion of entity neighborhood is different from the Neighborhood model in the Universal Schema work (Riedel et al., 2013). Our entity neighborhood captures information of each entity, whereas their Neighborhood model leverages prediction from similar predicates.


Figure 1: Architecture of the proposed method. In this example, the ENE model uses "Ang Lee"'s neighboring predicates "IMDB:Director" and "allmovie:Director" for predicting the target KB relation "@film.directed_by". The attention mechanism assigns a larger weight over "IMDB:Executive Director" for generating the entity pair embedding. Different colors of vectors represent different sets of parameters. The Entity Neighborhood Encoder (ENE) (yellow and pink) model contributes the following components to the final scoring function: (1) entity neighborhood scores S_subj^ENE and S_obj^ENE of the subject and object respectively; (2) query and neighborhood signal for the attention module to calculate the weight of each text predicate (blue) and the attention score S_Att(subj, pred, obj).

v_subj^agg = Agg_{p′∈R(s,·)}(v_{p′}^subj)
S_subj^ENE(s, p) = v_subj^agg · v_p^{subj T}
v_obj^agg = Agg_{p′∈R(·,o)}(v_{p′}^obj)
S_obj^ENE(p, o) = v_obj^agg · v_p^{obj T}    (3)

R(s, ·) denotes all neighboring relations and predicates of the subject s (similar for the object). v_p^subj and v_p^obj are the only free parameters in ENE. These are randomly initialized and then learned via gradient descent. We choose average pooling as our aggregation function to capture the proportion of different relation and predicate types within the target entity's neighborhood.
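The subject-side half of Eq. 3 can be sketched as follows. This is a toy illustration with random placeholder embeddings; `subj_emb` stands in for the subject-argument embeddings v_p^subj:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
# Subject-argument embeddings v_p^subj: the only free parameters on this side.
subj_emb = {p: rng.normal(size=d)
            for p in ["IMDB:Director", "allmovie:Director", "@film.directed_by"]}

def score_ene_subj(neighbor_preds, query_rel):
    """S^ENE_subj(s, p): average-pool the embeddings of all relations and
    predicates observed with s as subject, then dot with the query relation."""
    v_agg = np.mean([subj_emb[p] for p in neighbor_preds], axis=0)
    return float(v_agg @ subj_emb[query_rel])

s = score_ene_subj(["IMDB:Director", "allmovie:Director"], "@film.directed_by")
```

Because the entity enters only through the predicates it participates in, an entity never seen in training still gets a meaningful score as long as its neighboring predicates are known.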

3.2 Attention Mechanism

Given multiple predicates between a subject and an object, only some of them are important for predicting the target KB relation between them. For example, in Figure 1, the predicate "Executive Director" is more important than "Full Cast & Crew" to predict the KB relation "@film.directed_by" between "Life of Pi" and "Ang Lee".

We first present a query-based attention mechanism from earlier work, then present our own solution with a neighborhood attention, and finally combine both in a dual attention mechanism.

3.2.1 Query Attention

The first attention mechanism uses a query relation q (i.e., the target relation we may want to predict) to find out the importance (weight) w_{p|q} of different predicates p with respect to q, with v_p and v_q as the corresponding relation embeddings:

w_{p|q} = exp(v_q · v_p^T) / Σ_{p′} exp(v_q · v_{p′}^T)

Thus, given each query relation q, the model tries to find evidence from predicates that are most relevant to the query. Similar techniques have been used in (Verga et al., 2017). We can also use hard attention (referred to as MaxR) instead of soft attention, where the maximum weight is replaced with one and the others with zero. One potential shortcoming of this attention mechanism is its sensitivity to noise, whereby it may magnify sparsely observed predicates between entities.
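The softmax above can be sketched directly. This is a generic implementation, with `V_p` holding one predicate embedding per row:

```python
import numpy as np

def query_attention(v_q, V_p):
    """w_{p|q}: softmax over predicates of the scores v_q . v_p."""
    logits = V_p @ v_q
    logits = logits - logits.max()   # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

w = query_attention(np.array([1.0, 0.0]),
                    np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]))
```

The weights sum to one, and predicates most aligned with the query relation receive the largest weight.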

3.2.2 Neighborhood Attention

In this attention mechanism, we use the subject and object's neighborhood information as a filter to remove unrelated predicates. Intuitively, the entity representation generated by the ENE from its neighboring relations can be regarded as soft and fine-grained entity type information.

Consider the embedding vectors v_subj^agg and v_obj^agg in Equation 3 that are aggregated from the entity's neighboring predicates and relations using an aggregation function. We compute the similarity w_{p|Nb} between an entity's neighborhood information given by the above embeddings and a text predicate p to enforce a soft and fine-grained argument type constraint over the text predicate:

w_{p|Nb} = exp(v_subj^agg · v_p^{subj T} + v_obj^agg · v_p^{obj T}) / Σ_{p′} exp(v_subj^agg · v_{p′}^{subj T} + v_obj^agg · v_{p′}^{obj T})

Finally, we combine both the query-dependent and neighborhood-based attention into a Dual Attention mechanism:

w_{p|q+Nb} = w_{p|q} · w_{p|Nb}
w_p = w_{p|q+Nb} / Σ_{p′} w_{p′|q+Nb}

And the score function is given by:

S_Att(s, q, o) = Agg_{p∈R(s,o)}(v_p) · v_q^T = (Σ_p w_p v_p) · v_q^T    (4)
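Putting the pieces together, the dual attention and Eq. 4 can be sketched as follows. This is a minimal sketch; `w_nb` is assumed to be precomputed neighborhood weights w_{p|Nb}:

```python
import numpy as np

def s_att_dual(v_q, V_p, w_nb):
    """Eq. 4 with dual attention: multiply query weights w_{p|q} by the
    neighborhood weights w_{p|Nb}, renormalize, then score (sum_p w_p v_p) . v_q."""
    logits = V_p @ v_q
    w_q = np.exp(logits - logits.max())
    w_q = w_q / w_q.sum()            # w_{p|q}
    w = w_q * w_nb                   # w_{p|q+Nb}
    w = w / w.sum()                  # renormalized w_p
    return float((w @ V_p) @ v_q)

score = s_att_dual(np.array([1.0, 0.0]),
                   np.array([[1.0, 0.0], [0.0, 1.0]]),
                   np.array([0.5, 0.5]))
```

With uniform neighborhood weights, the neighborhood factor cancels after renormalization and the score reduces to plain query attention.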

3.3 Joint Model: OpenKI

All of the above models capture different types of features. Given a target triple (s, p, o), we combine scores from Eq. 3 and Eq. 4 in our final OpenKI model. It aggregates the neighborhood information of s and o and also uses an attention mechanism to focus on the important predicates between s and o. Refer to Figure 1 for an illustration. The final score of (s, p, o) is given by:

score(s, p, o) = f_1(S_Att(s, p, o)) * ReLU(α_1) + f_2(S_subj^ENE(s, p)) * ReLU(α_2) + f_3(S_obj^ENE(p, o)) * ReLU(α_3)

where f_i(X) = σ(a_i X + b_i) normalizes different scores to a comparable distribution. ReLU(α_i) enforces non-negative weights that allow scores to only contribute to the final model without canceling each other. a_i, b_i, α_i are free parameters that are learned during the back-propagation gradient descent process.
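The final combination can be sketched in plain Python. This is a minimal illustration; the scalars in `a`, `b`, and `alpha` stand in for the learned parameters and are set to arbitrary values here:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def openki_score(s_att, s_subj, s_obj, a, b, alpha):
    """score(s, p, o) = sum_i sigmoid(a_i * S_i + b_i) * ReLU(alpha_i)
    over the three component scores S_Att, S^ENE_subj, S^ENE_obj."""
    components = [s_att, s_subj, s_obj]
    return sum(sigmoid(a[i] * components[i] + b[i]) * max(0.0, alpha[i])
               for i in range(3))

total = openki_score(0.8, 0.4, 0.2, a=[1, 1, 1], b=[0, 0, 0],
                     alpha=[1.0, 0.5, -0.3])
```

Note how the ReLU over α_3 = -0.3 zeroes out the object component entirely: a learned component can be switched off but never subtracted from the total.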

3.4 Training Process

Our task is posed as a ranking problem. Given an entity pair, we want the observed KB relations between them to have higher scores than the unobserved ones. Thus, a pair-wise ranking based loss function is used to train our model:

L(s, p_pos, p_neg, o) = max(0, γ − score(s, p_pos, o) + score(s, p_neg, o))

where p_pos refers to a positive relation, p_neg refers to a uniformly sampled negative relation, and γ is the margin hyper-parameter. We optimize the loss function using Adam (Kingma and Ba, 2014). The training process uses early stopping according to the validation set.
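The loss is a standard margin ranking hinge; a direct sketch:

```python
def ranking_loss(score_pos, score_neg, gamma=1.0):
    """max(0, gamma - score(s, p_pos, o) + score(s, p_neg, o)): zero once the
    positive relation outscores the sampled negative by at least the margin."""
    return max(0.0, gamma - score_pos + score_neg)

assert ranking_loss(2.0, 0.5) == 0.0               # margin satisfied, no loss
assert abs(ranking_loss(0.5, 0.4) - 0.9) < 1e-9    # margin violated by 0.9
```

Once a positive relation beats the negative by the margin γ, the pair contributes no gradient, so training focuses on the violated rankings.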

3.5 Explicit Argument Type Constraint

Subject and object argument types of relations help in filtering out a large number of candidate relations that do not meet the argument type, and therefore serve as useful constraints for relation inference. Similar to (Yu et al., 2017), we identify the subject and object argument type of each relation by calculating its probability of co-occurrence with subject / object entity types. During inference, we select candidate relations by performing a post-processing filtering step using the subject and object's type information when available.

4 Experiments

4.1 Data

We experiment with the following OpenIE datasets and Knowledge Bases.

(i) Ceres (Lockard et al., 2019) works on semi-structured web pages (e.g., IMDB) and exploits the DOM Tree and XPath (Olteanu et al., 2002) structure in the page to extract triples like (Incredibles 2, Cast and Crew, Brad Bird) and (Incredibles 2, Writers, Brad Bird). We apply Ceres on the SWDE (Hao et al., 2011) movie corpus to generate triples. We align these triples to two different knowledge bases: (i) IMDB and (ii) a subset of Freebase with relations under the /film domain. The average length of text predicates is 1.8 tokens for Ceres extractions.

(ii) ReVerb (Fader et al., 2011) works at sentence level and employs various syntactic constraints like part-of-speech-based regular expressions and lexical constraints to prune incoherent and uninformative extractions. We use 3 million ReVerb extractions from ClueWeb where the subject is already linked to Freebase (Lin et al., 2012)4. We align these extractions to (i) the entire Freebase and (ii) a subset of Freebase with relations under the /film domain. The average length of text predicates is 3.4 tokens for ReVerb extractions.

4 Extractions are downloadable at http://knowitall.cs.washington.edu/linked_extractions/


                                         ReVerb +     ReVerb +          Ceres +           Ceres +
                                         Freebase     Freebase(/film)   Freebase(/film)   IMDB

Training set
# entity pairs for model training        40,878       1,102             23,389            64,539
# KB relation types                      250          64                54                66
# OpenIE predicate types                 124,836      35,366            124               178

Test set
# test triples                           4938         402               986               998
Avg./Med. # text edges per entity pair   1.74 / 1     1.49 / 1          1.35 / 1          1.23 / 1
Avg./Med. # edges for each subj          95.71 / 9    100.27 / 23       48.80 / 44        121.06 / 110
Avg./Med. # kb edges for each subj       61.41 / 4    8.30 / 6          7.00 / 7          30.77 / 29
Avg./Med. # edges for each obj           699.89 / 62  24.81 / 8         558.33 / 9        775.74 / 12
Avg./Med. # kb edges for each obj        325.70 / 23  10.11 / 6         340.96 / 3        606.31 / 6

Table 2: Data Statistics (Avg: Average, Med: Median).

Models                                          ReVerb +   ReVerb +          Ceres +           Ceres +
                                                Freebase   Freebase(/film)   Freebase(/film)   IMDB

P(p|p′) (similar to PMI (Angeli et al., 2015))  0.412      0.301             0.507             0.663
P(p|s, p′, o)                                   0.474      0.317             0.627             0.770
E-model (Riedel et al., 2013)                   0.215      0.156             0.431             0.506
ENE                                             0.479      0.359             0.646             0.808
Rowless with MaxR (Verga et al., 2017)          0.318      0.285             0.481             0.659
Rowless with Query Attn. (Verga et al., 2017)   0.326      0.278             0.512             0.695
OpenKI with MaxR                                0.500      0.378             0.649             0.802
OpenKI with Query Att.                          0.497      0.372             0.663             0.800
OpenKI with Neighbor Att.                       0.495      0.372             0.650             0.813
OpenKI with Dual Att.                           0.505      0.365             0.658             0.814

Table 3: Mean average precision (MAP) of different models over four data settings.

In order to show the generalizability of our approach to traditional (non-OpenIE) corpora, we also perform experiments on the New York Times (NYT) and Freebase dataset (Riedel et al., 2010), which is a well known benchmark for distant supervision relation extraction. We consider the sentences there (average length of 18.8 tokens) to be a proxy for text predicates. These results are presented in Section 4.5.

Data preparation: We collect all entity mentions M from OpenIE text extractions, and all candidate entities E_KB from KB whose name exists in M. We retain the sub-graph G_KB of KB triples where the subject and object belong to E_KB. Similar to (Riedel et al., 2013), we use string match to collect candidate entities for each entity mention. For each pair of entity mentions, we link them if two candidate entities in E_KB share a relation in KB. Otherwise, we link each mention to the most common candidate. For entity mentions that cannot be linked to KB, we consider them as new entities and link together mentions that share the same text.

For validation and test, we randomly hold out a part of the entity pairs from G_KB where text predicates are observed. Our training data consists of the rest of G_KB and all the OpenIE text extractions. In addition, we exclude direct KB triples from training where the corresponding entity pairs appear in the test data (following the data setting of (Toutanova et al., 2015)). Table 2 shows the data statistics5.

We adopt a similar training strategy as Universal Schema for the Ceres dataset – one that not only learns a direct mapping from text predicates to KB relations, but also clusters OpenIE predicates and KB relations by their co-occurrence. However, for the ReVerb data containing a large number of text predicates compared to Ceres, we only learn the direct mapping from text predicates to KB relations, which empirically works well for this dataset.

4.2 Verifying Usefulness of Neighborhood Information: Bayesian Methods

To verify the usefulness of the entity's neighborhood information, we devise simple Bayesian methods as baselines. The simplest method counts the co-occurrence of text predicates and KB relations (by applying Bayes rule) to find the conditional probability P(p|p′) of a target KB relation p given a set of observed text predicates p′. This performs relation inference at predicate-level.

Then, we can include the entity's relational neighbors in the Bayesian network by adding the neighboring predicates and relations of the subject (given by p_s^N) and object (given by p_o^N) to find P(p|s, p′, o), which performs relation inference at the instance-level. The graph structures of these three Bayesian methods are shown in Figure 2. For detailed formula derivation, please refer to Appendix A.

5 Our datasets with train, test, validation split are downloadable at https://github.com/zhangdongxu/relation-inference-naacl19 for benchmarking.
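The predicate-level baseline P(p|p′) amounts to normalized co-occurrence counting, which can be sketched as follows. The counts below are made-up toy numbers for illustration, not statistics from the paper's data:

```python
from collections import Counter

# Toy co-occurrence counts of (text predicate, KB relation); illustrative only.
cooc = Counter({("Cast", "@film.actor"): 8,
                ("Cast", "@film.directed_by"): 2,
                ("Director", "@film.directed_by"): 9})

def p_rel_given_pred(pred):
    """P(p | p') by counting: normalize the co-occurrence counts of KB
    relations observed with one text predicate."""
    counts = {r: c for (q, r), c in cooc.items() if q == pred}
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()}

dist = p_rel_given_pred("Cast")
```

The instance-level variant P(p|s, p′, o) extends the same counting with the neighboring predicates of the subject and object, as described above.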

Figure 2: Structures of P(p|p′), P(p|s, o), and P(p|s, p′, o) are listed from top to bottom.

4.3 Baselines and Experimental SetupAngeli et al. (2015) employ point-wise mutual in-formation (PMI) between target relations and ob-served predicates to map OpenIE predicates toKB relations. This is similar to our Bayes con-ditional probability P (p|p0). This baseline op-erates at predicate-level. To indicate the useful-ness of entity neighborhood information, we alsocompare with P (p|s, p0, o) as mentioned in Sec-tion 4.2. For the advanced embedding-based base-lines, we compare with the E-model and the Row-less model (with MaxR and query attention) intro-duced in Section 2.1.Hyper-parameters: In our experiments, we use25 dimensional embedding vectors for the Row-less model, and 12 dimensional embedding vec-tors for the E- and ENE models. We use a batch-size of 128, and 16 negative samples for each posi-tive sample in a batch. Due to memory constraints,we sample at most 8 predicates between entitiesand 16 neighbors for each entity during training.We use � = 1.0 and set the learning rate to 5e-3for ReVerb and 1e-3 for Ceres datasets.Evaluation measures: Our task is a multi-label

task, where each entity pair can share multiple KB relations. Therefore, we consider each KB relation as a query and compute the Mean Average Precision (MAP), where entity pairs sharing the query relation should be ranked higher than those without the relation. In Section 4.4 we report MAP statistics for the 50 most common KB relations for the ReVerb and Freebase datasets, and for the 10 most common relations in the other domain-specific datasets. The left-out relations involve too few triples to report any significant statistics. We also report the area under the precision-recall curve (AUC-PR) for evaluation in Section 4.5.
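The ranking-based evaluation above can be sketched in a few lines: treat each KB relation as a query, rank its candidate entity pairs by model score, compute Average Precision per relation, and average into MAP. The toy rankings below are hypothetical and only illustrate the metric.

```python
# Sketch of per-relation Average Precision and MAP over ranked entity pairs.
# Each ranking is a list of 1/0 relevance labels, sorted by model score.

def average_precision(ranked_labels):
    """AP for one relation query: mean of precision@k at each relevant hit."""
    hits, precisions = 0, []
    for i, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(per_relation_rankings):
    """MAP: average the AP of every relation query."""
    aps = [average_precision(r) for r in per_relation_rankings]
    return sum(aps) / len(aps)

# Two toy relation queries with ranked candidate entity pairs.
rankings = [
    [1, 0, 1, 0],  # AP = (1/1 + 2/3) / 2 = 5/6
    [0, 1, 1],     # AP = (1/2 + 2/3) / 2 = 7/12
]
print(round(mean_average_precision(rankings), 4))  # → 0.7083
```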

4.4 Results

Table 3 shows the overall results. OpenKI achieves significant performance improvements over all the baselines. Overall, we observe 33.5% MAP improvement on average across different datasets.

From the first two rows of Table 3, we observe that the performance improves as we incorporate neighborhood information into the Bayesian method. This depicts the strong influence of the entity's neighboring relations and predicates on relation inference.

The results show that our Entity Neighbor Encoder (ENE) outperforms the E-Model significantly. This is because the majority of the entity pairs in our test data have at least one unseen entity (refer to Table 4), which is very common in the OpenIE setting. The E-model cannot handle unseen entities because of its modeling of entity-specific parameters. This demonstrates the benefit of encoding entities with their neighborhood information (KB relations and text predicates) rather than learning entity-specific parameters. Besides, ENE outperforms the Rowless Universal Schema model, which does not consider any information surrounding the entities. This becomes a disadvantage in a sparse data setting where only a few predicates are observed between an entity pair.

Finally, the results also show consistent improvement of the OpenKI model over the only-Rowless and only-ENE models. This indicates that the models are complementary to each other. We further observe significant improvements by applying different attention mechanisms over the OpenKI MaxR model, thus establishing the effectiveness of our attention mechanism.

Unseen entity: Table 4 shows the statistics of unseen entity pairs in our test data. The most common scenario is that only one of the entities in a pair is observed during training, where our model benefits from the extra neighborhood information of the observed entity, in contrast to the Rowless model.

Dataset                     Both seen   One unseen   Both unseen
ReVerb + Freebase                 864         3232           842
ReVerb + Freebase(/film)           27          147           228
Ceres + Freebase(/film)           383          561            42
Ceres + IMDB                      462          533             3

Table 4: Statistics for unseen entities in test data. "Both seen" indicates both entities exist in training data; "One unseen" indicates only one of the entities in the pair exists in training data; "Both unseen" indicates both entities were unobserved during training.

Models                   All data   At least one seen
Rowless Model               0.278               0.282
OpenKI with Dual Att.       0.365               0.419

Table 5: Mean average precision (MAP) of Rowless and OpenKI on the ReVerb + Freebase (/film) dataset.

Table 5 shows the performance comparison on test data where at least one of the entities is known at test time. We choose ReVerb + Freebase(/film) for analysis because it contains the largest proportion of test triples where both entities are unknown during training. From the results, we observe that OpenKI outperforms the Rowless model by 48.6% when at least one entity in the triple is observed during training. Overall, we obtain 31.3% MAP improvement considering all of the test data. This validates the efficacy of encoding entity neighborhood information when at least one of the entities is known at test time. In the scenario where both entities are unknown at test time, the model falls back to the Rowless setting.

Models                      MAP
Rowless Model             0.695
  +Type Constraint        0.769   (↑10.6%)
ENE Model                 0.808
  +Type Constraint        0.818   (↑1.2%)
OpenKI with Dual Att.     0.814
  +Type Constraint        0.828   (↑1.7%)

Table 6: MAP improvement with argument type constraints on the Ceres + IMDB dataset.

Explicit Argument Type Constraint: As discussed in Section 3.5, incorporating explicit type constraints can improve the model performance. However, entity type information and argument type constraints are not always available, especially for new entities. Table 6 shows the performance improvement of different models with entity type constraints. We observe the performance improvement of the ENE model to be much less than that of the Rowless model with explicit type constraints. This shows that the ENE model already captures soft entity type information while modeling the neighborhood information of an entity, in contrast to the other methods that require explicit type constraints.
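The effect of an explicit argument type constraint can be sketched as a mask over candidate relation scores: relations whose declared (subject type, object type) signature does not match the query entity pair are suppressed before ranking. The relation signatures, type names, and scores below are hypothetical illustrations, not values or APIs from the paper.

```python
# Hypothetical sketch of an explicit argument type constraint: mask out
# relations whose type signature does not match the query entity pair.

RELATION_SIGNATURES = {            # assumed (subject_type, object_type) per relation
    "film.directed_by": ("film", "person"),
    "film.starring":    ("film", "person"),
    "person.spouse":    ("person", "person"),
}

def apply_type_constraint(scores, subj_type, obj_type):
    """scores: {relation: model score}. Returns type-constrained scores."""
    constrained = {}
    for rel, score in scores.items():
        # keep the score only when both argument types match the signature
        if RELATION_SIGNATURES[rel] == (subj_type, obj_type):
            constrained[rel] = score
        else:
            constrained[rel] = float("-inf")
    return constrained

scores = {"film.directed_by": 0.7, "film.starring": 0.6, "person.spouse": 0.8}
out = apply_type_constraint(scores, "film", "person")
best = max(out, key=out.get)
print(best)  # → film.directed_by ("person.spouse" is masked despite a higher raw score)
```

This makes the constraint "hard"; the ENE model instead learns a soft version of the same effect from each entity's observed neighborhood.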

4.5 Results on NYT + Freebase Dataset

Prior works (Surdeanu et al., 2012; Zeng et al., 2015; Lin et al., 2016; Qin et al., 2018) on distantly supervised relation extraction performed evaluations on the New York Times (NYT) + Freebase benchmark data developed by Riedel et al. (2010).[6] The dataset contains sentences whose entity mentions are annotated with Freebase entities as well as relations. The training data consists of sentences from articles in 2005-2006, whereas the test data consists of sentences from articles in 2007. There are 1950 relational facts in our test data.[7] In contrast to our prior experiments in the semi-structured setting with text predicates, in this experiment we consider the sentences to be a proxy for the text predicates.

Models                               AUC-PR
PCNN + MaxR (Zeng et al., 2015)       0.325
PCNN + Att. (Lin et al., 2016)        0.341
ENE                                   0.421
OpenKI with Dual Att.                 0.461

Table 7: Performance on the NYT + Freebase data.

Table 7 compares the performance of our model with two state-of-the-art works (Zeng et al., 2015; Lin et al., 2016) on this dataset, using AUC-PR as the evaluation metric.

Overall, OpenKI obtains a 35% AUC-PR improvement over the best-performing PCNN baseline. In contrast to baseline models, our approach leverages the neighborhood information of each entity from the text predicates in the 2007 corpus and the predicates / relations from the 2005-2006 corpus. This background knowledge contributes to the significant performance improvement.

[6] This data can be downloaded from http://iesl.cs.umass.edu/riedel/ecml/
[7] Facts of 'NA' (no relation) in the test data are not included in the evaluation process.


Note that our model uses only the graph information from the entity neighborhood and does not use any text encoder such as Piecewise Convolutional Neural Nets (PCNN) (Zeng et al., 2015), where convolutional neural networks are applied with piecewise max pooling to encode textual sentences. This further demonstrates the importance of entity neighborhood information for relation inference. It is possible to further improve the performance of our model by incorporating text encoders as an additional signal. Some prior works (Verga et al., 2016; Toutanova et al., 2015) also leverage text encoders for relation inference.

5 Related Work

Relation Extraction: Mintz et al. (2009) utilize the entity pair overlap between knowledge bases and text corpora to generate signals for automatic supervision. To avoid false positives during training, many works follow the at-least-one assumption, where at least one of the text patterns between the entity pair indicates an aligned predicate in the KB (Hoffmann et al., 2011; Surdeanu et al., 2012; Zeng et al., 2015; Lin et al., 2016). These works do not leverage graph information. In addition, Universal Schema (Riedel et al., 2013; Verga et al., 2017) tackled this task by low-rank matrix factorization. Toutanova et al. (2015) exploit graph information for knowledge base completion. However, their work cannot deal with unseen entities, since entities' parameters are explicitly learned during training.

Schema Mapping: Traditional schema mapping methods (Rahm and Bernstein, 2001) involve three kinds of features, namely language (name or description), type constraints, and instance-level co-occurrence information. These methods usually involve hand-crafted features. In contrast, our model learns all the features automatically from OpenIE and the KB with no feature engineering. This makes it easy to scale to different domains with little model tuning. Also, the entity types used in traditional schema mapping are always pre-defined and coarse-grained, so they cannot provide precise constraints on relations for each entity. Instead, our ENE model automatically learns soft and fine-grained constraints on which relations entities are likely to participate in. It is also compatible with pre-defined type systems.

Relation Grounding from OpenIE to KB: Instead of modeling an existing schema, open information extraction (OpenIE) (Banko et al., 2007; Yates et al., 2007; Fader et al., 2011; Mausam et al., 2012) regards surface text mentions between entity pairs as separate relations, and does not require entity resolution or linking to a KB. Since these methods do not model the KB, it is difficult to infer KB relations based only on textual observations. Soderland et al. (2013) designed manual rules to map relational triples to slot types. Angeli et al. (2015) used PMI between OpenIE predicates and KB relations, using distant supervision from shared entity pairs for relation grounding. Yu et al. (2017) used word embeddings to assign KB relation labels to OpenIE text predicates without entity alignment. These works do not exploit any graph information.

Entity Modeling for Relation Grounding: Prior works have leveraged several kinds of entity information to help relation extraction. Zhou et al. (2005) employed type information and observed an 8% improvement in F1 scores. Ji et al. (2017) encoded entity descriptions to calculate attention weights among the different text predicates within an entity pair. However, entity type and description information is not commonly available. Instead, neighborhood information is easier to obtain and can also be regarded as an entity's background knowledge. Universal Schema (Riedel et al., 2013) proposed an E-Model to capture entity type information. However, it can easily overfit in the OpenIE setting, with its large number of entities and sparse knowledge graph.

6 Conclusion

In this work we jointly leverage relation mentions from OpenIE extractions and knowledge bases (KB) for relation inference and for aligning OpenIE extractions to the KB. Our model leverages the rich information (KB relations and OpenIE predicates) from the neighborhood of entities to improve the performance of relation inference. This also allows us to deal with new entities without using any entity-specific parameters. We further explore several attention mechanisms to better capture entity pair information. Our experiments over several datasets show 33.5% MAP improvement on average over state-of-the-art baselines.

Some future extensions include exploring more advanced graph embedding techniques without modeling entity-specific parameters, and using text encoders as additional signals.


References

Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. 2015. Leveraging linguistic structure for open domain information extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 344-354.

Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In IJCAI, volume 7, pages 2670-2676.

Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, and Paolo Papotti. 2013. Extraction and integration of partially overlapping web sources. Proceedings of the VLDB Endowment, 6(10):805-816.

Michael J. Cafarella, Alon Halevy, Daisy Zhe Wang, Eugene Wu, and Yang Zhang. 2008. WebTables: exploring the power of tables on the web. Proceedings of the VLDB Endowment, 1(1):538-549.

Mark Dredze, Paul McNamee, Delip Rao, Adam Gerber, and Tim Finin. 2010. Entity disambiguation for knowledge base population. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 277-285. Association for Computational Linguistics.

Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), pages 1535-1545.

Qiang Hao, Rui Cai, Yanwei Pang, and Lei Zhang. 2011. From one tree to a forest: a unified solution for structured web data extraction. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 775-784. ACM.

Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 541-550. Association for Computational Linguistics.

Guoliang Ji, Kang Liu, Shizhu He, Jun Zhao, et al. 2017. Distant supervision for relation extraction with sentence-level attention and entity descriptions. In AAAI, pages 3060-3066.

Heng Ji, Joel Nothman, Ben Hachey, et al. 2014. Overview of TAC-KBP2014 entity discovery and linking tasks. In Proc. Text Analysis Conference (TAC2014), pages 1333-1339.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Thomas Lin, Oren Etzioni, et al. 2012. Entity linking at web scale. In Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, pages 84-88. Association for Computational Linguistics.

Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2016. Neural relation extraction with selective attention over instances. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2124-2133.

Colin Lockard, Prashant Shiralkar, and Xin Luna Dong. 2019. When open information extraction meets the semi-structured web. In NAACL-HLT. Association for Computational Linguistics.

Mausam, Michael Schmitz, Stephen Soderland, Robert Bart, and Oren Etzioni. 2012. Open language learning for information extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), pages 523-534.

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, pages 1003-1011. Association for Computational Linguistics.

Dan Olteanu, Holger Meuss, Tim Furche, and Francois Bry. 2002. XPath: Looking forward. In XML-Based Data Management and Multimedia Engineering - EDBT 2002 Workshops, Revised Papers, pages 109-127.

Pengda Qin, Weiran Xu, and William Yang Wang. 2018. Robust distant supervision relation extraction via deep reinforcement learning. arXiv preprint arXiv:1805.09927.

Erhard Rahm and Philip A. Bernstein. 2001. A survey of approaches to automatic schema matching. The VLDB Journal, 10(4):334-350.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 148-163. Springer.

Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 74-84.

Stephen Soderland, John Gilmer, Robert Bart, Oren Etzioni, and Daniel S. Weld. 2013. Open information extraction to KBP relations in 3 hours. In Proc. Text Analysis Conference (TAC2013).

Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 455-465. Association for Computational Linguistics.

Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. 2015. Representing text for joint embedding of text and knowledge bases. In Empirical Methods in Natural Language Processing (EMNLP).

Patrick Verga, David Belanger, Emma Strubell, Benjamin Roth, and Andrew McCallum. 2016. Multilingual relation extraction using compositional universal schema. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 886-896, San Diego, California. Association for Computational Linguistics.

Patrick Verga, Arvind Neelakantan, and Andrew McCallum. 2017. Generalizing to unseen entities and entity pairs with row-less universal schema. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 613-622, Valencia, Spain. Association for Computational Linguistics.

Yadollah Yaghoobzadeh, Heike Adel, and Hinrich Schütze. 2016. Noise mitigation for neural entity typing and relation extraction. arXiv preprint arXiv:1612.07495.

Alexander Yates, Michele Banko, Matthew Broadhead, Michael J. Cafarella, Oren Etzioni, and Stephen Soderland. 2007. TextRunner: Open information extraction on the web. In Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL 2007), pages 25-26.

Dian Yu, Lifu Huang, and Heng Ji. 2017. Open relation extraction and grounding. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 854-864.

Daojian Zeng, Kang Liu, Yubo Chen, and Jun Zhao. 2015. Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1753-1762.

Guodong Zhou, Jian Su, Jie Zhang, and Min Zhang. 2005. Exploring various knowledge in relation extraction. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 427-434. Association for Computational Linguistics.


A Derivation of Bayesian Inference Baselines

$$
P(p \mid p') = \max_{p' \in R(s,o)} \frac{P(p, p')}{P(p')}
= \max_{p' \in R(s,o)} \frac{\#(p, p') / \#\text{entity pairs}}{\#(p') / \#\text{entity pairs}}
\approx \max_{p' \in R(s,o)} \frac{\#(p, p') + \lambda}{\#(p') + |R_{\text{text}} \cup R_{\text{KB}}| \cdot \lambda}
$$

R(s, o) is the set of observed predicates between subject s and object o. λ is a smoothing factor, which we choose as 1e-6 in our implementation. Using conditional independence assumptions (refer to Figure 2):

$$
P(p \mid s, o) = \frac{P(p, s, o)}{P(s, o)}
= \frac{P(p \mid s)\, P(s)\, P(p \mid o)\, P(o)}{P(s)\, P(o)}
= \sum_{p^N_s \in R(s,\cdot)} P(p^N_s \mid s)\, P(p \mid p^N_s)
\cdot \sum_{p^N_o \in R(\cdot,o)} P(p^N_o \mid o)\, P(p \mid p^N_o)
$$

$$
P(p \mid s, p', o) = \frac{P(p, s, p', o)}{P(s, p', o)}
= \frac{P(p \mid s)\, P(s)\, P(p \mid p')\, P(p')\, P(p \mid o)\, P(o)}{P(s)\, P(p')\, P(o)}
= P(p \mid p') \sum_{p^N_s \in R(s,\cdot)} P(p^N_s \mid s)\, P(p \mid p^N_s)
\cdot \sum_{p^N_o \in R(\cdot,o)} P(p^N_o \mid o)\, P(p \mid p^N_o)
$$
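The smoothed predicate-level estimate above can be sketched directly from co-occurrence counts. The counts, predicate strings, and relation vocabulary below are toy values for illustration only, not statistics from the paper.

```python
# Sketch of the smoothed predicate-level Bayesian baseline:
#   P(p | p') ≈ (#(p, p') + λ) / (#(p') + |R_text ∪ R_KB| · λ)
# scored as the max over the observed predicates p' in R(s, o).

LAMBDA = 1e-6          # smoothing factor λ, as in the appendix
NUM_RELATIONS = 4      # assumed |R_text ∪ R_KB| for this toy vocabulary

cooccur = {            # #(p, p'): KB relation p co-occurring with text predicate p'
    ("directed_by", "was named best director for"): 30,
    ("directed_by", "Full Cast and Crew"): 5,
}
pred_count = {         # #(p'): total occurrences of each text predicate
    "was named best director for": 40,
    "Full Cast and Crew": 100,
}

def p_given_pprime(p, pprime):
    """Smoothed estimate of P(p | p') from counts."""
    joint = cooccur.get((p, pprime), 0)
    return (joint + LAMBDA) / (pred_count[pprime] + NUM_RELATIONS * LAMBDA)

def score(p, observed_predicates):
    """Predicate-level score: max over the observed predicates R(s, o)."""
    return max(p_given_pprime(p, q) for q in observed_predicates)

obs = ["was named best director for", "Full Cast and Crew"]
print(round(score("directed_by", obs), 3))  # max(≈30/40, ≈5/100) → 0.75
```

The instance-level variants P(p | s, o) and P(p | s, p′, o) extend this by multiplying in the analogous sums over the subject's and object's neighboring predicates.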