Talking to your Data: Natural Language Interfaces for a schema-less world

André Freitas, NLIWoD at ISWC 2014, Riva del Garda

Upload: andre-freitas

Post on 02-Jul-2015

DESCRIPTION

The increase in the size, heterogeneity and complexity of contemporary Big Data environments brings major challenges for the consumption of structured and semi-structured data. Addressing these challenges requires a convergence of approaches from different communities including databases, natural language processing, and information retrieval. Research on Natural Language Interfaces (NLI) and Question Answering systems has played a prominent role in stimulating a multidisciplinary approach to the problem that has moved the field from a futuristic vision to a concrete industry-level technological trend. In this talk we distill the key principles of state-of-the-art approaches for data consumption using NLI. Particular attention is paid to the maturity and effectiveness of each approach, together with a discussion of future trends and active research questions.

TRANSCRIPT

Page 1: Talking to your Data: Natural Language Interfaces for a schema-less world (Keynote at NLIWoD, ISWC 2014)

Talking to your Data:

Natural Language Interfaces for a

schema-less world

André Freitas

NLIWoD at ISWC 2014

Riva del Garda

Page 2:

Outline

Shift in the Database Landscape

On Schema-agnosticism & Semantics

Distributional Semantics to the Rescue

Case Study: Treo QA System

Living in a Schema-less World

Take-away Message

Page 3:

Shift in the Database Landscape

Page 4:

Big Data (Data Variety)

Vision: More complete data-based picture of the world for systems and users.

Page 5:

The Long Tail of Data Variety

Page 6:

The Long Tail of Data Variety

Page 7:

The Long Tail of Data Variety

[Figure labels: Data variety (+), Data, Programs, Full data coverage, Full automation, Full knowledge]

Page 8:

The Long Tail of Data Variety

[Figure labels: Data variety (+), Data, Programs, Full data coverage, Full automation, Full knowledge, Data generation]

Page 9:

Very large and dynamic “schemas”

circa 2000: 10s-100s of attributes

circa 2014: 1,000s-1,000,000s of attributes

Page 10:

Semantic Heterogeneity

Decentralized content generation.

Multiple perspectives (conceptualizations) of reality.

Ambiguity, vagueness, inconsistency.

Page 11:

The Long Tail of Data Variety

[Figure labels: Data variety (+), Data, Programs, Full data coverage, Full automation, Full knowledge, Data generation, Data consumption]

Page 12:

Databases for a Complex World

How do you query data at this scale?

Page 13:

Schema-agnosticism

Abstraction Layer

User

Page 14:

First-level independence (Relational Model)

“… it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and representation and organization of data on the other”

Codd, 1970

Second-level independence (Schema-agnosticism)

Page 15:

On Schema-agnosticism & Semantics

Page 16:

Vocabulary Problem for Databases

Query: Who is the daughter of Bill Clinton married to?

Semantic Gap

Possible representations

Schema-agnostic query mechanisms

Abstraction level differences

Lexical variation

Structural (compositional) differences

Operational/functional differences

Page 17:

Robust Semantic Model

Semantically intelligent behaviour is highly dependent on knowledge scale (commonsense, semantic)

Semantics = Formal meaning representation model (lots of data) + inference model

Page 18:

Robust Semantic Model

Not scalable!

1st Hard problem: Acquisition

Semantics = Formal meaning representation model (lots of data) + inference model

Page 19:

Robust Semantic Model

Not scalable!

2nd Hard problem: Consistency

Semantics = Formal meaning representation model (lots of data) + inference model

Page 20:

Semantics for a Complex World

“Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.”

“If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.”

Baroni et al. 2013

Formal World / Real World

Page 21:

Distributional Semantic Models

Semantic Model with low acquisition effort (automatically built from text)

Simplification of the representation

Enables the construction of comprehensive commonsense/semantic KBs

What is the cost?

Some level of noise (semantic best-effort)

Page 22:

Distributional Hypothesis

“Words occurring in similar (linguistic) contexts tend to be semantically similar”

He filled the wampimuk with the substance, passed it around and we all drunk some

Page 23:

Distributional Semantic Models (DSMs)

“The dog barked in the park. The owner of the dog put him on the leash since he barked.”

contexts = nouns and verbs in the same sentence

Page 24:

Distributional Semantic Models (DSMs)

“The dog barked in the park. The owner of the dog put him on the leash since he barked.”

[Figure labels: bark, dog, park, leash]

contexts = nouns and verbs in the same sentence

bark : 2

park : 1

leash : 1

owner : 1
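
The counts above can be reproduced in a few lines. This is a minimal sketch rather than the speaker's implementation: it reads the counts as the context vector of "dog" (an assumption), the two sentences are reduced by hand to lemmatized nouns and verbs (standing in for a POS tagger and lemmatizer), and `context_vector` is a hypothetical helper name. Under a strict reading of "nouns and verbs in the same sentence" the verb "put" is counted as well, although the slide does not list it.

```python
from collections import Counter

# Hand-lemmatized nouns and verbs per sentence (assumption: a real pipeline
# would obtain these from a POS tagger and a lemmatizer).
SENTENCES = [
    ["dog", "bark", "park"],
    ["owner", "dog", "put", "leash", "bark"],
]


def context_vector(target, sentences):
    """Count the words that share a sentence with `target`."""
    counts = Counter()
    for words in sentences:
        if target in words:
            counts.update(w for w in words if w != target)
    return counts


dog = context_vector("dog", SENTENCES)
print(dog.most_common())
```

Each distinct context word becomes one dimension of the vector space, and the count is the coordinate of "dog" along that dimension.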

Page 25:

Distributional Semantic Models (DSMs)

[Figure: vector space; labels: car, dog, bark, run, leash, Context]

Page 26:

Semantic Similarity & Relatedness

[Figure: vector space; labels: car, dog, bark, run, leash]

Query: cat

Page 27:

Semantic Similarity & Relatedness

[Figure: vector space; labels: θ, car, dog, cat, bark, run, leash]

Query: cat
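
A common way to turn the angle θ into a relatedness score is the cosine between two context vectors. The sketch below assumes that reading of the figure; the vectors over the dimensions (bark, run, leash) are invented for illustration and do not come from the talk, and `cosine` and `rank` are hypothetical helper names.

```python
import math

# Toy context vectors over the dimensions (bark, run, leash).
# The numbers are invented for illustration; they are not from the talk.
VECTORS = {
    "dog": (8.0, 5.0, 6.0),
    "cat": (2.0, 4.0, 3.0),
    "car": (0.0, 6.0, 0.0),
}


def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, vectors):
    """Rank all other words by cosine similarity to `query`."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for word, score in rank("cat", VECTORS):
    print(f"{word}: {score:.3f}")
```

With these toy vectors "dog" ranks above "car" for the query "cat", because the two animals share more context mass; a smaller θ means a higher score.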

Page 28:

DSMs as Commonsense Reasoning

[Figure: vector space; labels: θ, car, dog, cat, bark, run, leash; annotations: “Commonsense is here”, “Semantic Approximation is here”]

Page 29:

DSMs as Commonsense Reasoning

[Figure: vector space; labels: θ, car, dog, cat, bark, run, leash]

...

vs.

Semantic best-effort

Page 30:

Case Study: Treo QA System

Page 31:

Approach Overview

[Architecture diagram; labels: Query, Query Analysis, Query Features, Query Planner, Query Plan, Ƭ-Space, Large-scale unstructured data, Commonsense knowledge, Structured Data, Distributional semantics, Core semantic approximation & composition operations]

Page 32:

Approach Overview

[Architecture diagram; labels: Query, Query Analysis, Query Features, Query Planner, Query Plan, Ƭ-Space, Wikipedia, Commonsense knowledge, RDF Data, Explicit Semantic Analysis (ESA), Core semantic approximation & composition operations]

Page 33:

Ƭ-Space

[Figure labels: e, p, r]

Page 34:

Core Operations

[Figure labels: Search & Composition Operations, Query]

Page 35:

Does it work?

Page 36:

Addressing the Vocabulary Problem for Databases (with Distributional Semantics)

Gaelic: direction

Page 37:

Solution (Video)

Page 38:

More Complex Queries (Video)

Page 39:

Treo Answers Jeopardy Queries (Video)

http://bit.ly/1hWcch9

Page 40:

Relevance

Test Collection: QALD 2011.

DBpedia.

Dataset (DBpedia + YAGO links): 45,767 predicates, 9,434,677 instances, more than 200,000 classes

Page 41:

Transform natural language queries into triple patterns.

“Who is the daughter of Bill Clinton married to?”

Query Pre-Processing

(Question Analysis)

Page 42:

Step 1: POS Tagging

- Who/WP

- is/VBZ

- the/DT

- daughter/NN

- of/IN

- Bill/NNP

- Clinton/NNP

- married/VBN

- to/TO

- ?/.

Query Pre-Processing

(Question Analysis)

Page 43:

Step 2: Core Entity Recognition

- Rules-based: POS Tag + TF/IDF

Who is the daughter of Bill Clinton married to? (PROBABLY AN INSTANCE)

Query Pre-Processing

(Question Analysis)

Page 44:

Step 3: Determine answer type

- Rules-based.

Who is the daughter of Bill Clinton married to? (PERSON)

Query Pre-Processing

(Question Analysis)

Page 45:

Step 4: Dependency parsing

- dep(married-8, Who-1)

- auxpass(married-8, is-2)

- det(daughter-4, the-3)

- nsubjpass(married-8, daughter-4)

- prep(daughter-4, of-5)

- nn(Clinton-7, Bill-6)

- pobj(of-5, Clinton-7)

- root(ROOT-0, married-8)

- xcomp(married-8, to-9)

Query Pre-Processing

(Question Analysis)

Page 46:

Step 5: Determine Partial Ordered Dependency Structure (PODS)

- Rules-based.

• Remove stop words.

• Merge words into entities.

• Reorder structure from core entity position.
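
The three rules can be sketched on the running example as follows. This is a toy illustration under stated assumptions, not the Treo implementation: the stop-word and particle lists are invented, the core entity is passed in rather than recognized, and the reordering uses linear token order as a stand-in for the dependency structure. `build_pods` is a hypothetical name.

```python
# Toy stop-word and particle lists (assumptions made for this sketch).
STOP_WORDS = {"who", "is", "the", "of"}
PARTICLES = {"to"}


def build_pods(question, core_entity):
    """Apply the three rules to a question with a known core entity."""
    tokens = question.rstrip("?").split()

    # Rule 1: remove stop words.
    tokens = [t for t in tokens if t.lower() not in STOP_WORDS]

    # Rule 2: merge words into entities (and attach particles to the
    # preceding word, so that "married" + "to" becomes "married to").
    entity_words = core_entity.split()
    merged = []
    i = 0
    while i < len(tokens):
        if tokens[i:i + len(entity_words)] == entity_words:
            merged.append(core_entity)
            i += len(entity_words)
        elif tokens[i].lower() in PARTICLES and merged:
            merged[-1] = f"{merged[-1]} {tokens[i]}"
            i += 1
        else:
            merged.append(tokens[i])
            i += 1

    # Rule 3: reorder from the core entity position: the entity first,
    # then the terms to its left (nearest first), then those to its right.
    pivot = merged.index(core_entity)
    return [core_entity] + merged[:pivot][::-1] + merged[pivot + 1:]


pods = build_pods("Who is the daughter of Bill Clinton married to?", "Bill Clinton")
print(pods)
```

Starting from the core entity matters because it is the least ambiguous term in the question, so it gives the later matching steps a fixed starting point in the data.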

Query Pre-Processing

(Question Analysis)

Bill Clinton daughter married to

(INSTANCE)

ANSWER TYPE: Person

QUESTION FOCUS

Lower level of ambiguity, vagueness, synonymy

Page 47:

Question Analysis

Transform natural language queries into triple patterns

“Who is the daughter of Bill Clinton married to?”

PODS: Bill Clinton, daughter, married to

Query Features: (INSTANCE) (PREDICATE) (PREDICATE)

Page 48:

Query Plan

Map query features into a query plan.

A query plan contains a sequence of core operations.

Query Features: (INSTANCE) (PREDICATE) (PREDICATE)

Query Plan

(1) INSTANCE SEARCH (Bill Clinton)

(2) p1 <- SEARCH PREDICATE (Bill Clinton, daughter)

(3) e1 <- NAVIGATE (Bill Clinton, p1)

(4) p2 <- SEARCH PREDICATE (e1, married to)

(5) e2 <- NAVIGATE (e1, p2)
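
The plan above can be executed against a toy graph to show how the three core operations chain together. This is a sketch under assumptions: the triple list is a hand-made stand-in for the dataset, the relatedness table replaces a real distributional model (the "daughter" scores follow the worked example in this deck, the "married to" score is invented), and all function names are hypothetical.

```python
# Toy triple store: (subject, predicate, object). Facts taken from the
# worked example in the slides.
TRIPLES = [
    ("Bill_Clinton", "child", "Chelsea_Clinton"),
    ("Bill_Clinton", "religion", "Baptists"),
    ("Bill_Clinton", "almaMater", "Yale_Law_School"),
    ("Chelsea_Clinton", "spouse", "Mark_Mezvinsky"),
]

# Stand-in relatedness scores. The "daughter" values mirror the slides;
# the "married to" value is invented for this sketch.
RELATEDNESS = {
    ("daughter", "child"): 0.054,
    ("daughter", "religion"): 0.004,
    ("daughter", "almaMater"): 0.001,
    ("married to", "spouse"): 0.05,
}


def instance_search(label):
    """Match a query term to an entity by its label (exact match here)."""
    wanted = label.replace(" ", "_")
    subjects = {s for s, _, _ in TRIPLES}
    return wanted if wanted in subjects else None


def search_predicate(entity, term):
    """Pick the predicate of `entity` most related to `term`."""
    candidates = {p for s, p, _ in TRIPLES if s == entity}
    scored = [(RELATEDNESS.get((term, p), 0.0), p) for p in candidates]
    best_score, best = max(scored, default=(0.0, None))
    return best if best_score > 0.0 else None


def navigate(entity, predicate):
    """Follow `predicate` from `entity` to the first matching object."""
    for s, p, o in TRIPLES:
        if s == entity and p == predicate:
            return o
    return None


def answer(instance_label, predicate_terms):
    """Run the plan: one instance search, then search + navigate per term."""
    entity = instance_search(instance_label)
    for term in predicate_terms:
        if entity is None:
            return None
        predicate = search_predicate(entity, term)
        entity = navigate(entity, predicate) if predicate else None
    return entity


print(answer("Bill Clinton", ["daughter", "married to"]))
```

Each NAVIGATE step produces the next pivot entity, so every predicate search only has to score the handful of predicates attached to that entity instead of the whole vocabulary.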

Page 49:

Instance Search

Query: Bill Clinton daughter married to

Linked Data: :Bill_Clinton

Page 50:

Predicate Search

Query: Bill Clinton daughter married to

Linked Data: :Bill_Clinton (PIVOT ENTITY)

(ASSOCIATED TRIPLES)

- :child :Chelsea_Clinton
- :religion :Baptists
- :almaMater :Yale_Law_School
- ...

Page 51:

Predicate Search

Query: Bill Clinton daughter married to

Linked Data: :Bill_Clinton

- :child :Chelsea_Clinton
- :religion :Baptists
- :almaMater :Yale_Law_School
- ...

sem_rel(daughter,child)=0.054

sem_rel(daughter,religion)=0.004

sem_rel(daughter,alma mater)=0.001

Which properties are semantically related to ‘daughter’?

Page 52:

Predicate Search

Query: Bill Clinton daughter married to

Linked Data: :Bill_Clinton

- :child :Chelsea_Clinton
- :religion :Baptists
- :almaMater :Yale_Law_School
- ...

sem_rel(daughter,child)=0.054

sem_rel(daughter,religion)=0.004

sem_rel(daughter,alma mater)=0.001

Which properties are semantically related to ‘daughter’?

(In the context of Bill Clinton)
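
Predicate search restricted to the pivot entity can be sketched as a ranking with a cut-off. The scores below are the ones listed on the slide, with 0.004 attributed to "religion" (an assumption based on the order of the properties); the threshold value and the names `label_of` and `rank_predicates` are hypothetical, and a real system would compute the scores from a distributional model such as ESA rather than look them up.

```python
import re


def label_of(predicate):
    """Turn a predicate name such as 'almaMater' into the label 'alma mater'."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", predicate).lower()


# Scores as shown on the slide; attributing 0.004 to 'religion' is an
# assumption based on the order of the listed properties.
SEM_REL = {
    ("daughter", "child"): 0.054,
    ("daughter", "religion"): 0.004,
    ("daughter", "alma mater"): 0.001,
}


def rank_predicates(term, predicates, threshold=0.01):
    """Rank the pivot entity's predicates by relatedness to `term`,
    dropping those below `threshold` (a hypothetical cut-off)."""
    scored = [(SEM_REL.get((term, label_of(p)), 0.0), p) for p in predicates]
    kept = [(score, p) for score, p in scored if score >= threshold]
    return [p for score, p in sorted(kept, reverse=True)]


# Only the predicates attached to the pivot entity :Bill_Clinton are
# scored, which is what "in the context of Bill Clinton" amounts to here.
print(rank_predicates("daughter", ["child", "religion", "almaMater"]))
```

Lowering the threshold to 0.0 keeps all three predicates in score order, which shows the trade-off between recall and the noise that the talk calls semantic best-effort.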

Page 53:

Navigate

Query: Bill Clinton daughter married to

Linked Data: :Bill_Clinton :child :Chelsea_Clinton

Page 54:

Navigate

Query: Bill Clinton daughter married to

Linked Data: :Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY)

Page 55:

Predicate Search

Query: Bill Clinton daughter married to

Linked Data:

- :Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY)
- :Chelsea_Clinton :spouse :Mark_Mezvinsky

Page 56:

Results

Page 57:

Core Principles

Minimize the impact of Ambiguity, Vagueness, Synonymy with semantic pivoting.

Semantic pivoting: Address the simplest matchings first (heuristics).

Semantic Relatedness as a primitive semantic approximation operation.

Distributional semantics as commonsense/semantic knowledge.

Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach, IUI 2014

Page 58:

Living in a Schema-less World

Page 59:

How do we build systems today?

Structure the domain

Page 60:

Generalize and encode some rules

How do we build systems today?

Page 61:

Allow some constrained interaction

How do we build systems today?

Query is here

Page 62:

Siloed Systems

Page 63:

[Figure labels: Data variety (+), Data, Full data coverage, Full automation, Full knowledge]

Page 64:

Linked Data: Datasets are easier to integrate and to consume (data model level). However, the semantic barrier for consumption is still there.

Page 65:

[Figure labels: Data variety (+), Data, Full data coverage, Full automation, Full knowledge]

Page 66:

Distributional DBMS

Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach, IUI 2014

Page 67:

[Figure labels: Data variety (+), Data, Full data coverage, Full automation, Full knowledge]

Page 68:

Simplification of Information Extraction

A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs, WoLE, 2012

Page 69:

Simplification of Information Extraction

General Electric Company, or GE, is an American multinational conglomerate corporation incorporated in Schenectady, New York

Page 70:

[Figure labels: Data variety (+), Data, Full data coverage, Full knowledge, Full automation]

Page 71:

Schema-agnostic programs

Towards An Approximative Ontology-Agnostic Approach for Logic Programs, FOIKS 2014

Page 72:

[Figure labels: Data variety (+), Data, Full data coverage, Full knowledge, Full automation]

Page 73:

Reasoning with Distributional Semantics

A Distributional Semantics Approach for Selective Reasoning on Commonsense Graph Knowledge Bases, NLDB 2014

Page 74:

[Figure labels: Data variety (+), Data, Full data coverage, Full automation, Full knowledge]

Page 75:

Take-away Message

Existing semantic technologies can address major data management problems today.

Multi-disciplinarity is one key (and NLI people are very good at it!):

- NLP + IR + Semantic Web + Databases

Schema-agnosticism is a central property/functionality/goal!

Distributional Semantics + semantics of structured data = schema-agnosticism

Schema-agnosticism brings major impact for information systems.

We can tame the long tail of data variety!

The wave is just starting. Be a part of it!

Page 76:

Want to play with Distributional Semantics?

http://easy-esa.org

Page 77:

Any Queries?