Post on 28-Jan-2018

NLP & Semantic Computing Group

WISS Challenge

Do-it-yourself Question Answering over Linked Data

Andre Freitas

Challenge Description

Create a Question Answering (QA) system over DBpedia (and maybe part of Wikipedia text data).

Evaluate the QA system using the latest Question Answering over Linked Data (QALD) test collection.

Simple Queries (Video)

More Complex Queries (Video)

Why should I participate?

A very intense and solid learning experience.

It will help you consolidate and make concrete the concepts you saw in the talks.

If you are starting in the field, it will give you the basic artefacts to experiment with QA.

Approach

Participants will be split into groups.

Each group will develop a component of the QA system.

Group shuffling at the end will help everybody to be aware of the different components of the system.

You can bring your own code. You can suggest variations over a theme.

This is a hands-on session! Thou shalt code.

Guidelines

Having a decent QA system by the end of the week is a very challenging task.

Don’t be afraid to ask and to make mistakes.

Ethical project commitment: if you start, you should finish.

Do not hesitate to contact me anytime: [email protected]

skype: andre.freitas5

System Components

[Diagram: QA system components. Question Analysis, Entity Search, Semantic Matching, Query Generation, Query Execution, Answer Ranking & Generation, Graph Extraction, Evaluation, QA Pipeline, Web Interface / REST API.]

Question Analysis

Identifies linguistic regularities in the question and singles out its main features.

Uses basic NLP tools (e.g. syntactic parsing, NER …).

Understand what is expressed in a query and how to harvest this information.

Question Analysis

POS tagging:

Who/WP is/VBZ the/DT daughter/NN of/IN Bill/NNP Clinton/NNP married/VBN to/TO ?/.

Question Analysis

Dependency parsing:

dep(married-8, Who-1)
auxpass(married-8, is-2)
det(daughter-4, the-3)
nsubjpass(married-8, daughter-4)
prep(daughter-4, of-5)
nn(Clinton-7, Bill-6)
pobj(of-5, Clinton-7)
root(ROOT-0, married-8)
xcomp(married-8, to-9)
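
The typed dependencies on this slide use the `relation(head-index, dependent-index)` notation. A minimal sketch, assuming that textual format, of loading them into tuples so that later components can walk the parse. The `Dep` tuple and the `parse_dependencies` helper are illustrative names, not part of any parser toolkit:

```python
import re
from typing import NamedTuple


class Dep(NamedTuple):
    rel: str       # dependency label, e.g. "nsubjpass"
    head: str      # governing word
    head_idx: int  # 1-based position of the head (0 for ROOT)
    dep: str       # dependent word
    dep_idx: int   # 1-based position of the dependent


# Matches one "rel(head-i, dep-j)" item.
DEP_RE = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_dependencies(text):
    """Turn the textual dependency list into a list of Dep tuples."""
    return [Dep(rel, head, int(hi), dep, int(di))
            for rel, head, hi, dep, di in DEP_RE.findall(text)]


# Dependencies copied from the slide.
PARSE = """dep(married-8, Who-1) auxpass(married-8, is-2) det(daughter-4, the-3)
nsubjpass(married-8, daughter-4) prep(daughter-4, of-5) nn(Clinton-7, Bill-6)
pobj(of-5, Clinton-7) root(ROOT-0, married-8) xcomp(married-8, to-9)"""

deps = parse_dependencies(PARSE)
root = next(d for d in deps if d.rel == "root")
# Words directly governed by the root verb.
children = [d.dep for d in deps if d.head_idx == root.dep_idx]
print(root.dep, children)
```

Starting from the root and following `nsubjpass` and `prep`/`pobj` links is one possible way to recover the chain "married to" / "daughter of" / "Bill Clinton".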

Question Analysis

Question segmentation and candidate type identification.

Who is the daughter of Bill Clinton married to?

daughter of (PROPERTY) | Bill Clinton (INSTANCE) | married to (PROPERTY)
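
A minimal sketch of how segmentation could be done with a regular expression over the POS tag sequence. The tagged question is the one from the POS tagging slide; the tag patterns, the `AUXILIARIES` list and the `segment` function are assumptions made for illustration and will need tuning on the QALD questions:

```python
import re

# Tagged question from the POS tagging slide (Penn Treebank tags).
TAGGED = [("Who", "WP"), ("is", "VBZ"), ("the", "DT"), ("daughter", "NN"),
          ("of", "IN"), ("Bill", "NNP"), ("Clinton", "NNP"),
          ("married", "VBN"), ("to", "TO"), ("?", ".")]

# Hypothetical rules over the space-separated tag string:
#   INSTANCE: a run of proper nouns
#   PROPERTY: noun(s) + optional preposition, or a verb + optional IN/TO
PATTERN = re.compile(
    r"(?<!\S)(?:"
    r"(?P<INSTANCE>(?:NNPS? )+)"
    r"|(?P<PROPERTY>(?:NNS? )+(?:IN )?|VB[DGNPZ]? (?:IN |TO )?)"
    r")"
)

# Verbs that carry no property by themselves (assumed list).
AUXILIARIES = {"is", "are", "was", "were", "be", "do", "does", "did"}


def segment(tagged):
    """Return (text, candidate_type) pairs for each matched tag pattern."""
    tags = "".join(tag + " " for _, tag in tagged)
    segments = []
    for m in PATTERN.finditer(tags):
        # Each tag is followed by one space, so counting spaces gives token indexes.
        start = tags[:m.start()].count(" ")
        end = tags[:m.end()].count(" ")
        words = [w for w, _ in tagged[start:end]]
        if all(w.lower() in AUXILIARIES for w in words):
            continue
        segments.append((" ".join(words), m.lastgroup))
    return segments


print(segment(TAGGED))
```

The candidate types are only guesses at this stage; Entity Search decides whether a segment really maps to a property or an instance in the dataset.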

Question Analysis

Determine answer type.

Who is the daughter of Bill Clinton married to?

(PERSON)
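
A minimal rule-based sketch for the answer type, assuming Penn Treebank tags as input. The `WH_TO_TYPE` table and the "first noun after Which/What" rule are illustrative assumptions, not a complete classifier:

```python
# Hypothetical mapping from question words to coarse answer types.
WH_TO_TYPE = {"who": "PERSON", "whom": "PERSON",
              "where": "LOCATION", "when": "DATE"}


def lexical_answer_type(tagged):
    """Guess the answer type from the first tokens of a tagged question."""
    first_word, first_tag = tagged[0]
    if first_word.lower() in WH_TO_TYPE:
        return WH_TO_TYPE[first_word.lower()]
    # "Which river ..." / "What city ...": the noun right after the
    # question word names the expected type.
    if first_tag in {"WDT", "WP"}:
        for word, tag in tagged[1:]:
            if tag in {"NN", "NNS"}:
                return word.upper()
            if tag.startswith("VB"):
                break
    return "UNKNOWN"


question = [("Who", "WP"), ("is", "VBZ"), ("the", "DT"), ("daughter", "NN"),
            ("of", "IN"), ("Bill", "NNP"), ("Clinton", "NNP"),
            ("married", "VBN"), ("to", "TO"), ("?", ".")]
print(lexical_answer_type(question))
```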

Question Analysis

Input: Natural language question.

Output: parsed question; candidate entities and associated types; candidate relations between entities; lexical answer type; candidate database operations.

Entity Search

Matches query terms to dataset entities.

Indexing and search time performance matters.

Needs to support semantic approximation, e.g. coping with different lexical expressions and abstraction levels.

Will use thesauri and distributional semantics based approaches for semantic matching.

Entity Search

Query terms: daughter of | Bill Clinton | married to

Dataset entities: child of | Bill Clinton | spouse of
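
A minimal sketch of the matching step for this example. The tiny `THESAURUS` dictionary is a hand-written stand-in for WordNet or a distributional model, and `LABELS` is a made-up handful of dataset labels; both are assumptions for illustration only:

```python
from difflib import SequenceMatcher

# Toy thesaurus standing in for WordNet / distributional similarity (hypothetical).
THESAURUS = {"daughter": {"child"}, "son": {"child"},
             "married": {"spouse"}, "wife": {"spouse"}, "husband": {"spouse"}}
STOPWORDS = {"of", "to", "the"}

# A few dataset labels for the running example (hypothetical sample).
LABELS = ["child", "spouse", "Bill Clinton", "birthPlace"]


def candidates(term):
    """The term itself, its content words, and their thesaurus expansions."""
    words = [w for w in term.lower().split() if w not in STOPWORDS]
    forms = {term.lower(), *words}
    for w in words:
        forms |= THESAURUS.get(w, set())
    return forms


def score(term, label):
    """Best string similarity between any expanded form and the label."""
    return max(SequenceMatcher(None, form, label.lower()).ratio()
               for form in candidates(term))


def best_match(term, labels=LABELS):
    return max(labels, key=lambda label: score(term, label))


for term in ["daughter of", "Bill Clinton", "married to"]:
    print(term, "->", best_match(term))
```

In the real system the candidate labels would come from the index, and the similarity function would be replaced by the thesaurus or distributional model chosen by the group.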

Entity Search

Input: query terms.

Output: corresponding database entities.

Query Generation

Transforms the natural language query into a query in a logical form.

Involves the interface between natural language and knowledge representation / logical models.

Relation identification / extraction.

Query Generation

child of Bill Clinton spouse of

SELECT ?y WHERE {
  :Bill_Clinton :child ?x .
  ?x :spouse ?y .
  ?y :type :Person .
}
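
A minimal sketch of turning a triple-like representation into a SPARQL string. The `dbr:`/`dbo:` prefixes and the property names `dbo:child` and `dbo:spouse` are assumptions about how the matched entities are named in DBpedia; `to_sparql` is an illustrative helper:

```python
# Prefixes assumed for DBpedia resources and ontology terms.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
}


def to_sparql(triples, target):
    """Serialise (subject, predicate, object) patterns as a SELECT query."""
    header = "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in PREFIXES.items())
    body = "\n".join(f"  {s} {p} {o} ." for s, p, o in triples)
    return f"{header}\nSELECT DISTINCT {target} WHERE {{\n{body}\n}}"


# Triple-like representation of the running example.
triples = [
    ("dbr:Bill_Clinton", "dbo:child", "?x"),
    ("?x", "dbo:spouse", "?y"),
    ("?y", "rdf:type", "dbo:Person"),
]
query = to_sparql(triples, "?y")
print(query)
```

Since the component outputs possible queries, the same function can be called once per candidate combination of entities and properties.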

Query Generation

Input: outputs from the question analysis and entity search.

Output: Possible SPARQL queries.

Query Execution

Input: Possible SPARQL queries.

Output: Result sets.
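
A minimal sketch of the execution step using only the standard library. It assumes the public DBpedia endpoint at `https://dbpedia.org/sparql` and that it accepts a `format` parameter for JSON results; the function names are illustrative. The parser follows the SPARQL JSON results layout (`head.vars`, `results.bindings`):

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint


def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request asking for JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    return f"{endpoint}?{params}"


def parse_bindings(payload):
    """Flatten a SPARQL JSON result into a list of {variable: value} rows."""
    data = json.loads(payload)
    variables = data["head"]["vars"]
    return [{v: row[v]["value"] for v in variables if v in row}
            for row in data["results"]["bindings"]]


def run_query(query):
    """Send the query to the endpoint (needs network access)."""
    with urllib.request.urlopen(build_request_url(query), timeout=30) as resp:
        return parse_bindings(resp.read().decode("utf-8"))


# Offline example of the result format, with a made-up single row.
SAMPLE = ('{"head": {"vars": ["y"]}, "results": {"bindings": '
          '[{"y": {"type": "uri", "value": "http://dbpedia.org/resource/Marc_Mezvinsky"}}]}}')
print(parse_bindings(SAMPLE))
```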

Answer Ranking & Generation

Ranking models and heuristic models for classifying the answers in relation to a question.

Transforms results in triple format into a natural language form.

Answer Ranking & Generation

Chelsea Clinton’s spouse is Marc Mezvinsky
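
A minimal sketch of a template-based verbalizer that produces a sentence like the one above. The templates and helper names are assumptions; real output would also need property-specific wording:

```python
def label_of(value):
    """Readable label for a DBpedia-style URI; other values pass through."""
    if value.startswith("http://"):
        return value.rsplit("/", 1)[-1].replace("_", " ")
    return value


def possessive(name):
    return name + ("'" if name.endswith("s") else "'s")


def verbalize(subject, prop, values):
    """Turn (subject, property, values) into a short English sentence."""
    labels = [label_of(v) for v in values]
    if not labels:
        return f"No {prop} found for {subject}."
    if len(labels) == 1:
        return f"{possessive(subject)} {prop} is {labels[0]}."
    joined = ", ".join(labels[:-1]) + " and " + labels[-1]
    return f"{possessive(subject)} {prop}: {joined}."


print(verbalize("Chelsea Clinton", "spouse",
                ["http://dbpedia.org/resource/Marc_Mezvinsky"]))
```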

Answer Ranking & Generation

Input: SPARQL result sets, lexical answer type.

Output: Ranked answers in a natural language format.

Graph Extraction

Extract entities and relations from Wikipedia text.

Preserving contextual information.

Persist them as RDF graphs.

Focus on fact extraction.

Graph Extraction

On July 31, 2010, Chelsea Clinton married investment banker Marc Mezvinsky in Rhinebeck, New York.

[Extracted graph: Chelsea Clinton, married to, Marc Mezvinsky; time: 31.07.2010; place: Rhinebeck, New York. Marc Mezvinsky, type, Investment Banker.]
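
A minimal sketch of how this fact and its context could be laid out as triples before persisting them as RDF. It uses the standard RDF reification terms (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) to attach time and place to the relation; the `ex:` names and the `fact_to_triples` helper are hypothetical, and other context models (e.g. named graphs) are equally possible:

```python
def fact_to_triples(fact_id, subject, relation, obj, context=None, types=None):
    """Return the plain triple, a reified copy carrying context, and type triples."""
    triples = [
        (subject, relation, obj),
        (fact_id, "rdf:type", "rdf:Statement"),
        (fact_id, "rdf:subject", subject),
        (fact_id, "rdf:predicate", relation),
        (fact_id, "rdf:object", obj),
    ]
    # Contextual information hangs off the statement node, not the entities.
    for key, value in (context or {}).items():
        triples.append((fact_id, key, value))
    for entity, entity_type in (types or {}).items():
        triples.append((entity, "rdf:type", entity_type))
    return triples


triples = fact_to_triples(
    "ex:fact1", "ex:Chelsea_Clinton", "ex:marriedTo", "ex:Marc_Mezvinsky",
    context={"ex:time": "2010-07-31", "ex:place": "Rhinebeck, New York"},
    types={"ex:Marc_Mezvinsky": "ex:InvestmentBanker"},
)
for t in triples:
    print(t)
```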

QA Pipeline & UI

Integration of the QA components.

Development of the Web interface for the QA system.

Exploration of simple user feedback mechanisms (e.g. entity disambiguation).

Evaluation

Automatic evaluation of the QA system using the Question Answering over Linked Data test collection (QALD-4).

System Components: Groups

[Diagram: QA system components assigned to groups. Question Analysis, Entity Search, Semantic Matching, Query Generation, Query Execution, Answer Ranking & Generation, Graph Extraction, Evaluation, QA Pipeline, Web Interface / REST API.]

Coding Proficiency

Entity Search (1)
UI & QA Pipeline (2)
Question Analysis (3)
Graph Extraction (4)
Query Execution / Answer Ranking & Generation (5)
Query Generation (6)
Evaluation (7)

Focused Practical Session

NLP Tools (Syntactic Parsing, SRL, NER, Relation Extraction).

Semantic Matching (WordNet, Distributional Models).

Semantic Web / Linked Data (Entity Linking, SPARQL).

Other?

Question Analysis: First task

Using rules and regular expressions over POS tags:

Detect the lexical answer type of the example questions.

Segment the question into a set of candidate terms.

Use Stanford CoreNLP or NLTK.

Entity Search: First task

Index the DBpedia graph using Lucene.
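
The task itself targets Lucene. As a language-neutral illustration of what such an index has to provide, here is a plain-Python inverted index over entity labels. It is a stand-in, not Lucene code; `LabelIndex` and the sample entries are hypothetical:

```python
from collections import defaultdict


def tokenize(text):
    return text.lower().replace("_", " ").split()


class LabelIndex:
    """In-memory inverted index: token -> set of entity identifiers."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, uri, label):
        for token in tokenize(label):
            self._postings[token].add(uri)

    def search(self, query):
        """Rank entities by the number of query tokens found in their label."""
        scores = defaultdict(int)
        for token in tokenize(query):
            for uri in self._postings.get(token, ()):
                scores[uri] += 1
        return sorted(scores, key=lambda uri: (-scores[uri], uri))


index = LabelIndex()
index.add("dbr:Bill_Clinton", "Bill Clinton")
index.add("dbr:Hillary_Clinton", "Hillary Clinton")
index.add("dbr:Bill_Gates", "Bill Gates")
print(index.search("Bill Clinton"))
```

A Lucene index would add proper analysis, scoring and on-disk storage; the mapping from label tokens to entity identifiers stays the same idea.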

Query Generation: First task

Based on entity candidates and Stanford dependencies or C-structures, build a triple-like representation of the query.

Query Execution & Answer Generation: First task

Build an interface to the public DBpedia SPARQL endpoint.

Build a simple answer verbalizer from the SPARQL result set to a more natural language format.

Graph Extraction: First task

Using OpenIE, extract relations from the Wikipedia articles Barack Obama, Paris, Jupiter.

Evaluation: First task

Using the latest QALD version, build a tool to calculate precision, recall and F1-measure for the example queries.
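
A minimal sketch of the per-question measures, computed over sets of answers. The convention for empty answer sets and the macro-averaging are assumptions; the official QALD evaluation may handle these cases differently:

```python
def precision_recall_f1(gold, predicted):
    """Set-based precision, recall and F1 for one question."""
    gold, predicted = set(gold), set(predicted)
    if not gold and not predicted:
        return 1.0, 1.0, 1.0  # assumed convention: empty vs empty counts as correct
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(scores):
    """Average each measure over all questions."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


scores = [
    precision_recall_f1({"Marc_Mezvinsky"}, {"Marc_Mezvinsky"}),
    precision_recall_f1({"a", "b"}, {"a", "c"}),
]
print(macro_average(scores))
```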

QA Pipeline & UI: First task

Build the initial pipeline and the stubs for the components of the QA system.
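
A minimal sketch of such a pipeline with stub components. The stage names mirror the components in this deck, but the shared `state` dictionary and every function body are placeholder assumptions to be replaced by each group's code:

```python
def question_analysis(state):
    # Stub: a real component adds POS tags, segments and the answer type.
    state["terms"] = state["question"].rstrip("?").split()
    return state


def entity_search(state):
    # Stub: maps query terms to dataset entities.
    state["entities"] = {}
    return state


def query_generation(state):
    # Stub: produces candidate SPARQL queries.
    state["queries"] = []
    return state


def query_execution(state):
    # Stub: runs the queries and collects result sets.
    state["results"] = []
    return state


def answer_generation(state):
    # Stub: ranks and verbalizes the answers.
    state["answers"] = []
    return state


PIPELINE = [question_analysis, entity_search, query_generation,
            query_execution, answer_generation]


def answer(question, stages=PIPELINE):
    """Pass a shared state dictionary through every stage in order."""
    state = {"question": question}
    for stage in stages:
        state = stage(state)
    return state


print(answer("Who is the daughter of Bill Clinton married to?"))
```

Keeping every stage behind the same one-argument interface lets a group swap in its real component without touching the others, and the Web interface or REST API only needs to call `answer`.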