The Road to the Semantic Web
DESCRIPTION
A seminar lecture presenting the Carnegie Mellon research project "Read the Web" (http://rtw.ml.cmu.edu/rtw/). Presented at the Databases & the Internet seminar at the Hebrew University of Jerusalem.

TRANSCRIPT
The Road to the Semantic Web
Michael Genkin
SDBI 2010@HUJI
Michael Genkin ([email protected])
"The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation."
Tim Berners-Lee, James Hendler and Ora Lassila; Scientific American, May 2001
Over 25 billion RDF triples (October 2010)
More than 24 billion web pages (June 2010)
Probably more than one triple per page; likely many more
How will we populate the Semantic Web?
Humans will enter structured data
Data-store owners will share their data
Computers will read unstructured data
Read the Web
http://rtw.ml.cmu.edu/rtw/ (or google it)
Roadmap
Motivation
Some definitions: natural language processing, machine learning
Macro reading the web
Coupled training
NELL
Demo
Summary
Some Definitions
Natural Language Processing
Machine Learning
Natural Language Processing
Part-of-speech tagging (e.g. noun, verb)
Noun phrase: a phrase that normally consists of a (modified) head noun; "pre-modified" (e.g. this, that, the red…) or "post-modified" (e.g. …with long hair, …where I live)
Proper noun: a noun which represents a unique entity (e.g. Jerusalem, Michael)
Common noun: a noun which represents a class of entities (e.g. car, university)
Learning: What is it?
Assume there is some knowledge base KB (the experience), some algorithm performing a set of tasks T, and a performance metric Perf.
We will say that a computer program learns if its performance on the tasks in T, as measured by Perf, improves as the knowledge base KB grows.
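This definition can be made concrete with a toy sketch: the same performance metric, evaluated before and after the knowledge base grows, goes up. All names and data below are hypothetical illustrations, not part of the Read the Web system.

```python
# Toy illustration of the learning definition: Perf on a fixed task set
# improves as the knowledge base (the experience) grows.

def perf(kb, test_pairs):
    """Fraction of test queries answered correctly from the knowledge base."""
    return sum(1 for q, a in test_pairs if kb.get(q) == a) / len(test_pairs)

test_pairs = [("Jerusalem", "city"), ("Haifa", "city"), ("car", "common noun")]

kb_before = {"Jerusalem": "city"}                      # little experience
kb_after = {"Jerusalem": "city", "Haifa": "city",
            "car": "common noun"}                      # more experience

# The program "learned": Perf on the same tasks improved with experience.
assert perf(kb_after, test_pairs) > perf(kb_before, test_pairs)
```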
Training Methods
Supervised
We have a set of labeled examples (KB) and a domain (D): for every input x in KB we are given a label f(x), for some target function f. Examples might be positive or negative.
The learning algorithm A tries to find a function h such that h(x) ≈ f(x); h is called a classifier (or, for continuous outputs, a regression function).
Unsupervised
Distinguished from supervised learning in that there are no labeled examples (KB = D).
The unsupervised learning algorithm A tries to find a classifier that, given some x in D as input, returns some arbitrary label; i.e. the algorithm A analyses the structure of D.
Semi-Supervised
A middle way between supervised and unsupervised.
Uses a minimal amount of labeled examples and a large amount of unlabeled ones.
Learns the structure of D in an unsupervised manner, but uses the labeled examples to constrain the results; then repeats. Known as bootstrapping.
Bootstrapping
Iterative semi-supervised learning. Example for the category "city":
Seed instances: Jerusalem, Tel Aviv, Haifa
Learned patterns: "mayor of arg1", "life in arg1"
Extracted instances: Ness-Ziona, London, denial
Further patterns: "arg1 is home of", "traits such as arg1"
Extracted instances: anxiety, selfishness, Amsterdam
Under-constrained! Semantic drift
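The iteration can be sketched in code; the corpus, seeds, and pattern strings below are invented toy data, not NELL's actual pipeline. The ambiguous pattern "is home of" also matches a seed city, gets promoted, and then drags in non-cities: that is semantic drift.

```python
# A minimal bootstrapping loop for one category ("city"), showing how an
# under-constrained pattern causes semantic drift. All data is invented.

corpus = [
    ("mayor of", "Jerusalem"), ("mayor of", "Haifa"),
    ("life in", "Tel Aviv"),
    ("is home of", "Jerusalem"),   # ambiguous pattern also matches a seed
    ("is home of", "Amsterdam"), ("is home of", "denial"),
]

instances = {"Jerusalem", "Tel Aviv"}   # seed examples
patterns = set()

for _ in range(3):                      # a few bootstrapping iterations
    # Promote any pattern that co-occurs with a known instance...
    patterns |= {p for p, x in corpus if x in instances}
    # ...then promote any instance extracted by a promoted pattern.
    instances |= {x for p, x in corpus if p in patterns}

# "is home of" was promoted via Jerusalem and then extracted "denial":
# the category has drifted away from cities.
assert "denial" in instances
```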
Macro Reading the Web
Populating the Semantic Web by Macro-Reading Internet Text. T.M. Mitchell, J. Betteridge, A. Carlson, E.R. Hruschka Jr., and R.C. Wang. Invited paper, Proceedings of the International Semantic Web Conference (ISWC), 2009.
Problem Specification (1): Input
An initial ontology that contains:
Dozens of categories and relations (e.g. Company, CompanyHeadquarteredInCity)
Relations between categories and relations (e.g. mutual exclusion, type constraints)
A few seed examples of each predicate in the ontology
The web
Occasional access to a human trainer
Problem Specification (2): The Task
Run forever (24x7). Each day:
Run over ~500 million web pages.
Extract new facts and relations from the web to populate the ontology.
Perform better than the day before.
Populate the semantic web.
A Solution? An automatic, learning, macro-reader.
Micro vs. Macro Reading (1)
Micro-reading: the traditional NLP task of annotating a single web page to extract the full body of information contained in the document. NLP is hard!
Macro-reading: the task of "reading" a large corpus of web pages (e.g. the web) and returning a large collection of facts expressed in the corpus, but not necessarily all the facts.
Micro vs. Macro Reading (2)
Macro-reading is easier than micro-reading. Why?
Macro-reading doesn't require extracting every bit of information available.
In text corpora as large as the web, many important facts are stated redundantly, thousands of times, using different wordings.
Benefit by ignoring complex sentences.
Benefit by statistically combining evidence from many fragments to determine a belief in a hypothesis.
Why an Input Ontology?
The problem with understanding free text is that it can mean virtually anything.
By formulating macro-reading as populating an ontology, we allow the system to focus only on relevant documents.
The ontology can define meta-properties of its categories and relations.
This allows populating those parts of the semantic web for which an ontology is available.
Machine Learning Methods
Semi-supervised (use an ontology to learn).
Learn textual patterns for extraction.
Employ methods such as coupled training to improve accuracy.
Expand the ontology to improve performance.
Coupled Training
Bootstrapping – Revised
Iterative semi-supervised learning, revisiting the "city" example:
Seed instances: Jerusalem, Tel Aviv, Haifa
Learned patterns: "mayor of arg1", "life in arg1"
Extracted instances: Ness-Ziona, London, denial
Further patterns: "arg1 is home of", "traits such as arg1"
Extracted instances: anxiety, selfishness, Amsterdam
Coupled Training
Couple the training of multiple functions to make unlabeled data more informative
Makes the learning task easier by adding constraints
Coupling (1): Output Constraints
We wish to train a function f: X → Y, e.g. one that assigns the label city.
Assume we have two different functions, f1 and f2, that assign the label city but receive different inputs.
Coupling constraint: f1 and f2 must agree over the unlabeled data.
Coupling (1): Output Constraints – example
Sentence: "Nir Barkat is the mayor of Jerusalem", with arg1 = Jerusalem.
Two classifiers that receive arg1 and ask Y = city? must agree; a classifier asking Y = country? must disagree, since city and country are mutually exclusive.
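An output-coupling check can be sketched as follows; the two classifiers are hypothetical lookup stubs standing in for models trained on different inputs, and all data is invented.

```python
# Sketch of an output constraint: two classifiers for the same label,
# trained on different inputs, must agree on unlabeled examples.

def f1_is_city(noun_phrase):        # e.g. trained on free-text patterns
    return noun_phrase in {"Jerusalem", "Haifa"}

def f2_is_city(noun_phrase):        # e.g. trained on semi-structured lists
    return noun_phrase in {"Jerusalem", "London"}

def agreement_violations(f1, f2, unlabeled):
    """Count unlabeled examples on which the coupled classifiers disagree."""
    return sum(1 for x in unlabeled if f1(x) != f2(x))

unlabeled = ["Jerusalem", "Haifa", "London", "denial"]

# "Haifa" and "London" are the disagreements the coupling penalizes.
assert agreement_violations(f1_is_city, f2_is_city, unlabeled) == 2
```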
Coupling (2): Compositional Constraints
Assume we have two functions, f1: X1 → Y1 and f2: X2 → Y2, and a constraint on valid pairs (y1, y2) given (x1, x2).
Coupling constraint: the outputs (f1(x1), f2(x2)) must satisfy the constraint on (y1, y2).
e.g. a relation "type checks" its first argument.
Coupling (2): Compositional Constraints – example
Sentence: "Nir Barkat is the mayor of Jerusalem", giving MayorOf(X1, X2).
Each argument is checked against the candidate categories (city? location? politician?): X1 = Nir Barkat should be a politician, X2 = Jerusalem should be a city.
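A minimal type-checking sketch, under the assumption that the relation signature requires a politician and a city; category memberships and the signature are toy data, not NELL's ontology.

```python
# Sketch of a compositional (argument type-checking) constraint: a candidate
# relation instance is kept only if its arguments belong to the categories
# the relation requires. All memberships below are invented.

categories = {
    "politician": {"Nir Barkat"},
    "city": {"Jerusalem", "Haifa"},
}

# Hypothetical signature: MayorOf(politician, city)
relation_signature = ("politician", "city")

def type_checks(arg1, arg2, signature=relation_signature):
    t1, t2 = signature
    return arg1 in categories[t1] and arg2 in categories[t2]

assert type_checks("Nir Barkat", "Jerusalem")      # valid MayorOf instance
assert not type_checks("Jerusalem", "Nir Barkat")  # arguments swapped: rejected
```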
Coupling (3): Multi-view Agreement
We have a function f: X → Y, where X can be partitioned into two "views", X = (X1, X2).
Assume each of X1 and X2 alone suffices to predict Y.
We wish to learn f1: X1 → Y and f2: X2 → Y.
Coupling constraint: f1 and f2 must agree.
Coupling (3): Multi-view Agreement – example
Let Y be a set of possible web-page categories and X a set of web pages.
Assume X1 represents the words in a page, and X2 the words in hyperlinks pointing to the page.
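This is the classic co-training setup, sketched below with two invented keyword classifiers: one per view, coupled by requiring the same predicted label. The page data and category names are hypothetical.

```python
# Sketch of multi-view agreement (co-training): one classifier per view of
# a web page -- its own words, and the words of hyperlinks pointing at it.
# Both classifiers and the data are toy illustrations.

def classify_by_page_words(page_words):
    return "academic" if "course" in page_words else "other"

def classify_by_anchor_words(anchor_words):
    return "academic" if "syllabus" in anchor_words else "other"

page = {
    "page_words": {"course", "homework", "exam"},
    "anchor_words": {"syllabus", "cs101"},
}

y1 = classify_by_page_words(page["page_words"])
y2 = classify_by_anchor_words(page["anchor_words"])

# The coupling constraint: both views must predict the same label.
assert y1 == y2 == "academic"
```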
NELL – Never-Ending Language Learning
Coupled Semi-Supervised Learning for Information Extraction. A. Carlson, J. Betteridge, R.C. Wang, E.R. Hruschka Jr., and T.M. Mitchell. Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), 2010.
Never-Ending Language Learning. Tom Mitchell's invited talk in the Univ. of Washington CSE Distinguished Lecture Series, October 21, 2010.
Motivation
Humans learn many things, for years, and become better learners over time.
Why not machines?
Coupled Constraints (1)
Mutual exclusion: two mutually exclusive predicates can't both be satisfied by the same input.
Relation argument type checking: ensure the noun phrases satisfying each relation correspond to the categories defined for that relation.
e.g. the CompanyIsInEconomicSector relation has arguments of the Company and EconomicSector categories.
Coupled Constraints (2)
Unstructured and semi-structured text features: noun phrases appear on the web in free-text contexts or semi-structured contexts.
Free-text and semi-structured classifiers will make independent mistakes, but each alone is sufficient for classification.
Both classifiers must agree.
Coupled Pattern Learner (CPL): Overview
Learns to extract category and relation instances.
Learns high-precision textual patterns, e.g. "arg1 scored a goal for arg2".
Coupled Pattern Learner (CPL): Extracting
Runs forever; on each iteration, bootstraps the patterns promoted in the last iteration to extract instances, and selects the 1000 instances that co-occur with the most patterns.
A similar procedure is used for patterns, but using recently promoted instances.
Uses part-of-speech heuristics to accomplish extraction, e.g. a per-category proper/common-noun specification; a pattern is a sequence of verbs followed by adjectives, prepositions, or determiners (and optionally preceded by nouns).
Coupled Pattern Learner (CPL): Filtering and Ranking
Candidates are filtered to enforce mutual exclusion and type constraints.
A candidate is rejected unless it co-occurs with a promoted pattern at least three times more often than it co-occurs with patterns of mutually exclusive predicates.
Candidates are ranked as follows:
Instances: by the number of promoted patterns they co-occur with.
Patterns: by an estimate of their precision.
Coupled Pattern Learner (CPL): Promoting Candidates
For each predicate, promotes at most 100 instances and 5 patterns (the highest rated).
Instances and patterns are promoted only if they co-occur with at least two promoted patterns or instances, respectively.
Relation instances are promoted only if their arguments are candidates for the specified categories.
Coupled SEAL (1)
SEAL is an established wrapper-induction algorithm.
Creates page-specific extractors, independent of language.
Category wrappers are defined by a prefix and postfix; relation wrappers are defined by an infix.
Wrappers for each predicate are learned independently.
Coupled SEAL (2)
Coupled SEAL adds mutual exclusion and type-checking constraints to SEAL.
Bootstraps recently promoted wrappers.
Filters candidates that are mutually exclusive or not of the right type for the relation.
Uses a single page per domain for ranking.
Promotes the top 100 instances extracted by at least two wrappers.
Meta-Bootstrap Learner
Couples the training of multiple extraction techniques.
Intuition: different extractors will make independent errors.
Replaces the PROMOTE step of the subordinate extractor algorithms.
Promotes any instance recommended by all the extractors, as long as mutual exclusion and type checks hold.
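The modified PROMOTE step reduces to an intersection plus a constraint check, as in this sketch; the candidate sets and the exclusion list are invented stand-ins for CPL and Coupled SEAL outputs.

```python
# Sketch of the Meta-Bootstrap Learner's PROMOTE step: promote an instance
# only if every subordinate extractor recommends it and no mutual-exclusion
# constraint is violated. All sets below are toy data.

cpl_candidates = {"Jerusalem", "Haifa", "denial"}       # e.g. from CPL
cseal_candidates = {"Jerusalem", "Haifa", "Amsterdam"}  # e.g. from Coupled SEAL
mutually_exclusive_with_city = {"denial", "anxiety"}

recommended_by_all = cpl_candidates & cseal_candidates
promoted = {x for x in recommended_by_all
            if x not in mutually_exclusive_with_city}

# Only instances both extractors agree on (and that pass the constraint
# check) survive; each extractor's independent errors are filtered out.
assert promoted == {"Jerusalem", "Haifa"}
```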
Learning New Constraints
Data-mine the KB to infer new beliefs.
Generates probabilistic, first-order Horn clauses.
Connects previously uncoupled predicates.
Rules are filtered manually.
Demo Time http://rtw.ml.cmu.edu/rtw/kbbrowser/
Summary
Populating the semantic web by using NELL for macro-reading
Populating the Semantic Web
There are many ways to accomplish this.
Use an initial ontology to focus and constrain the learning task.
Couple the learning of many, many extractors.
Macro-reading: instead of annotating a single page at a time, read many pages simultaneously.
A never-ending task.
Macro-Reading
Helps to improve accuracy.
Still doesn't help to annotate a single page, but many things that are true for a single page are also true for many pages.
Helps to populate databases with frequently mentioned knowledge.
Future Directions
Coupling with external sources: DBpedia, Freebase.
Ontology extension: new relations through reading, subcategories.
Use a macro-reader to train a micro-reader.
Self-reflection, self-correction.
Distinguishing tokens from entities.
Active learning via crowdsourcing.