semantic web & natural language

Natural Language Interface: Challenges and Partial Solutions NURFADHLINA MOHD SHAREF (PhD) Postdoctoral Fellow Knowledge Technology Group Centre of Artificial Intelligence Faculty of Technology and Information Science Universiti Kebangsaan Malaysia [email protected]

Upload: nurfadhlina-mohd-sharef

Post on 05-Jul-2015


DESCRIPTION

A natural language interface (NLI) is a system that transforms a user's natural language question into a SPARQL query. Related papers: https://sites.google.com/site/fadhlinams81/publication

TRANSCRIPT

Page 1: semantic web & natural language

Natural Language Interface: Challenges and Partial Solutions

NURFADHLINA MOHD SHAREF (PhD)
Postdoctoral Fellow
Knowledge Technology Group
Centre of Artificial Intelligence
Faculty of Technology and Information Science
Universiti Kebangsaan Malaysia
[email protected]

Page 2: semantic web & natural language

Outline

• Part 1: Introduction to Semantic Web
  – RDF
  – OWL
  – SPARQL

• Part 2: Natural Language Interface
  – Semantic Web Search Engine
  – NLI Applications
  – Challenges and Partial Solutions
  – Potential Works

• Part 3: Practical Examples
  – Mooney's Geography Dataset
  – Automatic SPARQL Construction for Natural Language-based Search in Semantic Database

Page 3: semantic web & natural language

Part 1

• Introduction to Semantic Web

– RDF

– OWL

– SPARQL

Page 4: semantic web & natural language
Page 5: semantic web & natural language

Semantic Web: “a web of data that can be processed directly and indirectly by machines” (Tim Berners-Lee)

Page 6: semantic web & natural language

RDF (Resource Description Framework)

• Talk about resources
  – Resources can be pretty much anything
  – Resources are identified by Uniform Resource Identifiers (URIs)
  – Things (in a broad sense) are labelled with URIs
  – URIs act as globally valid names
  – Sets of names are organized in vocabularies
  – Vocabularies are demarcated by namespaces

• Information is encoded in triples = subject-predicate-object patterns
  – Malaysia has capital Kuala Lumpur
  – Participant has course Semantic Technology

Taken from: http://www.w3.org/2009/Talks/1030-Philadelphia-IH/Tutorial.ppt
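The triple model can be sketched in a few lines of Python. This is only an illustration, not an RDF library: the `ex:` prefixed names stand in for full URIs, and the `match` helper with `None` as a wildcard is an assumption made for this example.

```python
# Minimal sketch: RDF-style triples as (subject, predicate, object) tuples.
# The "ex:" names are hypothetical stand-ins for full URIs.
triples = [
    ("ex:Malaysia", "ex:hasCapital", "ex:KualaLumpur"),
    ("ex:Participant", "ex:hasCourse", "ex:SemanticTechnology"),
    ("ex:KualaLumpur", "ex:hasShoppingMall", "ex:KLCC"),
    ("ex:KualaLumpur", "ex:hasShoppingMall", "ex:Imbi_Plaza"),
]


def match(store, s=None, p=None, o=None):
    """Return every triple that agrees with the pattern (None = wildcard)."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "What is the capital of Malaysia?" becomes a pattern with an unknown object.
capital = match(triples, s="ex:Malaysia", p="ex:hasCapital")
malls = [o for _, _, o in match(triples, p="ex:hasShoppingMall")]
print(capital[0][2])
print(malls)
```

Leaving one position open in the pattern is the same idea that SPARQL variables express later in the deck.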

Page 7: semantic web & natural language

[Figure: RDF graph example. The resource (subject) http://.../KualaLumpur is connected by the property (predicate) :hasShoppingMall to the objects KLCC and Imbi_Plaza; the diagram also labels ShoppingMall and literals.]

Page 8: semantic web & natural language

From Feigenbaum

Page 9: semantic web & natural language

RDF Example

[Figure: RDF/XML listing annotated with callouts for the XML declaration, the namespace, and the properties of the resource.]

Properties of the resource: the elements artist, country, company, price, and year are defined in the http://www.recshop.fake/cd# namespace.
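To make the role of the namespace concrete, here is a small Python sketch that reads an RDF/XML fragment with the standard library and turns it into triples. Only the http://www.recshop.fake/cd# namespace comes from the slide; the record (`ExampleAlbum`, `Example Artist`, `1985`) is placeholder data, and a real application would more likely use a dedicated RDF parser.

```python
import xml.etree.ElementTree as ET

# Illustrative RDF/XML; the record values are placeholders, only the cd#
# namespace is taken from the slide.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.recshop.fake/cd/ExampleAlbum">
    <cd:artist>Example Artist</cd:artist>
    <cd:year>1985</cd:year>
  </rdf:Description>
</rdf:RDF>"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CD_NS = "http://www.recshop.fake/cd#"

root = ET.fromstring(RDF_XML)
triples = []
for description in root.findall(f"{{{RDF_NS}}}Description"):
    subject = description.get(f"{{{RDF_NS}}}about")
    for prop in description:
        # ElementTree expands prefixes to "{namespace}local"; rebuild the full URI.
        predicate = prop.tag.replace("{", "").replace("}", "")
        triples.append((subject, predicate, prop.text))

predicates = [p for _, p, _ in triples]
print(triples)
```

Each child element of `rdf:Description` becomes one triple whose predicate is the namespace URI followed by the element name.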

Page 10: semantic web & natural language


From: http://www.w3.org/TR/1998/WD-rdf-schema/

Page 11: semantic web & natural language

[Figure: diagram with edge labels rdf:type, rdfs:subClassOf, rdfs:subPropertyOf.]

Page 12: semantic web & natural language

Ontology in Information Science

• An ontology is an engineering artefact consisting of:
  – A vocabulary used to describe (a particular view of) some domain
  – An explicit specification of the intended meaning of the vocabulary
    • Often includes classification-based information
  – Constraints capturing background knowledge about the domain

• Ideally, an ontology should:
  – Capture a shared understanding of a domain of interest
  – Provide a formal and machine-manipulable model

Page 13: semantic web & natural language

OWL

• Built on top of RDF
• For processing information on the web
• Designed to be interpreted by computers
• Not designed to be read by people
• Commonly written in RDF/XML
• A W3C standard
• Based on predecessors (DAML+OIL)
• A web language: based on RDF(S)
• An ontology language: based on logic

Page 14: semantic web & natural language

OWL vs RDF

• OWL and RDF are closely related, but OWL is a more expressive language with greater machine interpretability than RDF.

• OWL comes with a larger vocabulary and stronger syntax than RDF.
  – Specific relations between classes, cardinality, equality, richer typing of properties, characteristics of properties, and enumerated classes.

• OWL comes in three increasingly expressive layers designed for different groups of users
  – OWL Lite, OWL DL, and OWL Full

Page 15: semantic web & natural language

OWL Ontology


Page 16: semantic web & natural language

KualaLumpurInfo.owl

[Figure: KualaLumpurInfo.owl ontology graph. Node labels: http://.../KualaLumpur, Thing, ShoppingMall, KLCC, Imbi_Plaza, ExpressGrocers, Seven Eleven, land, rail, ERL, LRT. Edge labels: rdf:type, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range, owl:equivalentOf, :hasShoppingMall, :hasPublicTransport, :railTransport.]

Page 17: semantic web & natural language


Page 18: semantic web & natural language


Page 19: semantic web & natural language

name                mbox
Johnny Lee Outlaw   <mailto:[email protected]>
Peter Goodguy       <mailto:[email protected]>

Page 20: semantic web & natural language


Page 21: semantic web & natural language

The SPARQL Query Language

Operator AND (".")

SELECT ?name ?faculty
WHERE {
  ?teacher rdf:type Teachers .
  ?teacher name ?name .
  ?teacher faculty ?faculty .
}

Result:

?name   ?faculty
Joe     "CS"
Fred    "CS"

[Figure: data graph. t1 and t2 are of type Teachers; t1 has name Joe and faculty "CS"; t2 has name Fred and faculty "CS".]
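How the AND operator combines triple patterns can be sketched in Python. This is a simplified illustration, not a SPARQL engine: the data mirrors the slide's graph (t1, t2), the bare names stand in for full URIs, and `unify` and `solve` are hypothetical helpers written for this example.

```python
# Sketch of basic graph pattern matching: each "." adds a pattern that every
# solution must also satisfy (a join on the shared variables).
DATA = [
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"), ("t1", "faculty", "CS"),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"), ("t2", "faculty", "CS"),
]


def unify(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None on conflict."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solve(patterns, data):
    """Evaluate patterns left to right, keeping bindings that satisfy all of them."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in data:
                candidate = unify(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


query = [
    ("?teacher", "rdf:type", "Teachers"),
    ("?teacher", "name", "?name"),
    ("?teacher", "faculty", "?faculty"),
]
rows = [(s["?name"], s["?faculty"]) for s in solve(query, DATA)]
print(rows)
```

Because `?teacher` appears in all three patterns, a binding survives only if the same node satisfies every pattern, which is what the "." conjunction expresses.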

Page 22: semantic web & natural language

The SPARQL Query Language

Operator FILTER

SELECT ?name ?faculty
WHERE {
  ?teacher rdf:type Teachers .
  ?teacher name ?name .
  ?teacher faculty ?faculty .
  FILTER (?name = "Joe")
}

Result:

?name   ?faculty
Joe     "CS"

[Figure: data graph. t1 and t2 are of type Teachers; t1 has name Joe and faculty "CS"; t2 has name Fred and faculty "CS".]
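The effect of FILTER can be sketched as a predicate applied to the solutions of the graph pattern. The solution list below is written out by hand to match the slide's data, and `sparql_filter` is a hypothetical helper name, not an API of any SPARQL library.

```python
# Sketch: FILTER keeps only the solutions for which a condition holds.
# These are the rows the unfiltered pattern yields on the slide's data.
solutions = [
    {"?teacher": "t1", "?name": "Joe", "?faculty": "CS"},
    {"?teacher": "t2", "?name": "Fred", "?faculty": "CS"},
]


def sparql_filter(rows, condition):
    """Drop every solution for which the condition is not true."""
    return [row for row in rows if condition(row)]


# FILTER (?name = "Joe")
kept = sparql_filter(solutions, lambda row: row["?name"] == "Joe")
result = [(row["?name"], row["?faculty"]) for row in kept]
print(result)
```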

Page 23: semantic web & natural language

The SPARQL Query Language

Operator OPTIONAL

SELECT ?name ?faculty ?title
WHERE {
  ?teacher rdf:type Teachers .
  ?teacher name ?name .
  ?teacher faculty ?faculty .
  OPTIONAL {
    ?teacher title ?title .
  }
}

Result:

?name   ?faculty   ?title
Joe     "CS"
Fred    "CS"       "Professor"

[Figure: data graph. t1 and t2 are of type Teachers; t1 has name Joe and faculty "CS"; t2 has name Fred, faculty "CS" and title "Professor".]
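OPTIONAL behaves like a left join: a solution is extended when the optional pattern matches and is kept unchanged otherwise. The sketch below assumes the solution lists shown (hand-written to match the slide's data); `compatible` and `left_join` are illustrative helper names.

```python
# Sketch: OPTIONAL as a left join over solution mappings.
solutions = [
    {"?teacher": "t1", "?name": "Joe", "?faculty": "CS"},
    {"?teacher": "t2", "?name": "Fred", "?faculty": "CS"},
]
# Matches of the optional pattern "?teacher title ?title" on the slide's data.
titles = [{"?teacher": "t2", "?title": "Professor"}]


def compatible(left, right):
    """Two solutions are compatible if their shared variables have equal values."""
    return all(left[var] == right[var] for var in left.keys() & right.keys())


def left_join(required, optional):
    """Extend each required solution where possible, otherwise keep it as is."""
    joined = []
    for row in required:
        matches = [{**row, **opt} for opt in optional if compatible(row, opt)]
        joined.extend(matches if matches else [row])
    return joined


rows = [
    (r["?name"], r["?faculty"], r.get("?title"))
    for r in left_join(solutions, titles)
]
print(rows)
```

Joe stays in the result with `?title` unbound (shown here as `None`), which corresponds to the empty cell in the slide's result table.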

Page 24: semantic web & natural language

Part 2

• Natural Language Interface

– Semantic Web Search Engine

– NLI Applications

– Challenges and Partial Solutions

– Potential Works

Page 25: semantic web & natural language

Semantic Web Search Engine

• Aims to understand the searcher's intent and to return results in the context of the query's meaning.

• Differs from a standard search engine in that the source documents are RDF, OWL and RDF-extended HTML documents.

• E.g: Swoogle, Serene, Watson

Page 26: semantic web & natural language

Natural Language Interface (NLI)

• Allows users to query in human-like sentences, without requiring them to be aware of the underlying schema, vocabulary and query language

• Best known for question answering

• Three types of NLI
  – over structured data such as databases and ontologies
  – over semi-structured or unstructured data such as text documents
  – in an interactive setting, as a conversational system

• Approaches
  – Controlled natural language for query construction
  – Visual-based query construction
  – NL query mapping to triple representation

Page 27: semantic web & natural language

NLI Example - NLPReduce

Page 28: semantic web & natural language

NLI Example – Semantic Crystal

Page 29: semantic web & natural language

NLI Example – GINO & Ginseng

Page 30: semantic web & natural language

NLI Example - Querix

Page 31: semantic web & natural language

NLI Example - AquaLog

Page 32: semantic web & natural language

NLI Example - PowerAqua

Page 33: semantic web & natural language

NLI Example - FREyA

Page 34: semantic web & natural language

Comparison

System | Year | Input type | Synonym support | Syntactic analysis | Calculate string similarity | Clarification dialogue | Learnability | Support KB heterogeneity
SemanticCrystal | 1993 | Graphical based query | NO | NO | NO | NO | NO | NO
GINO / Ginseng | 2006 | Controlled natural language based interface | WordNet | YES | NO | NO | NO | NO
Querix | 2006 | Query by example | WordNet | NO | NO | YES | NO | NO
NLPReduce | 2007 | Keywords, sentence fragments and full sentences | NO | NO | NO | NO | NO | NO
QuestIO | 2008 | Full natural language | Gazetteer | YES | YES | NO | NO | NO
ORAKEL | 2008 | Factual question | Lexicon | NO | NO | NO | NO | NO
AquaLog / PowerAqua | 2010 | Full natural language | WordNet, Lexicon | YES | YES | YES | NO | YES
FREyA | 2012 | Full natural language | WordNet | YES | NO | YES | YES | NO

Page 35: semantic web & natural language

NLI Implementation

• Query: “Who wrote The Neverending Story?”

• PowerAqua triple: <[person, organization], wrote, Neverending Story>

• Triple matching from DBpedia:
  – <Writer, IS A, Person>
  – <Writer, author, The Neverending Story>

• Answer: “Michael Ende”
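The mapping from the question triple to knowledge-base triples can be illustrated with a toy Python sketch. It is not PowerAqua's algorithm, which the slides do not describe in detail: the three-triple `KB`, the `LEXICON` entry mapping "wrote" to "author", and the substring match on the entity name are all assumptions for illustration.

```python
# Toy sketch of matching a question triple against KB triples.
# KB contents and the LEXICON mapping are illustrative assumptions.
KB = [
    ("Michael Ende", "type", "Writer"),
    ("Writer", "subClassOf", "Person"),
    ("The Neverending Story", "author", "Michael Ende"),
]
LEXICON = {"wrote": "author"}  # verb in the question -> KB property (assumed)


def types_of(entity):
    """Direct types of an entity plus all their superclasses."""
    found = {o for s, p, o in KB if s == entity and p == "type"}
    frontier = set(found)
    while frontier:
        frontier = {o for s, p, o in KB if p == "subClassOf" and s in frontier} - found
        found |= frontier
    return found


def answer(expected_types, verb, entity_text):
    """Find objects of the mapped property whose type fits the expected answer type."""
    prop = LEXICON[verb]
    wanted = {t.lower() for t in expected_types}
    results = []
    for s, p, o in KB:
        # Match the entity mention loosely: the question triple omits "The".
        if p == prop and entity_text.lower() in s.lower():
            if wanted & {t.lower() for t in types_of(o)}:
                results.append(o)
    return results


print(answer(["person", "organization"], "wrote", "Neverending Story"))
```

The expected answer types from the question ("Who" suggests a person or organization) are checked through the class hierarchy, which is why the `<Writer, IS A, Person>` triple matters for the answer.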

Page 36: semantic web & natural language
Page 37: semantic web & natural language

NLI Challenges (Unger et al., 2012)

1.
(a) Which cities have more than three universities?
(b) <[cities], more than, universities three>
(c) SELECT ?y WHERE {
      ?x rdf:type onto:University .
      ?x onto:city ?y .
    } HAVING (COUNT(?x) > 3)

2.
(a) Who produced the most films?
(b) <[person, organization], produced, most films>
(c) SELECT ?y WHERE {
      ?x rdf:type onto:Film .
      ?x onto:producer ?y .
    } ORDER BY DESC(COUNT(?x)) OFFSET 0 LIMIT 1
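Both questions need grouping and counting, which a single triple cannot express. The Python sketch below spells out the aggregation on made-up bindings (the `u1`..`u6`, `CityA`, `f1`..`f4` and `P1`/`P2` values are hypothetical). In SPARQL 1.1, aggregates such as COUNT are normally paired with a GROUP BY on the projected variable, here `?y`, and the code follows that reading.

```python
from collections import Counter

# Hypothetical (?x, ?y) bindings produced by the two WHERE clauses.
university_city = [
    ("u1", "CityA"), ("u2", "CityA"), ("u3", "CityA"), ("u4", "CityA"),
    ("u5", "CityB"), ("u6", "CityB"),
]
film_producer = [("f1", "P1"), ("f2", "P2"), ("f3", "P2"), ("f4", "P2")]

# Question 1: group by ?y and keep the groups with COUNT(?x) > 3.
per_city = Counter(city for _, city in university_city)
cities = [city for city, count in per_city.items() if count > 3]

# Question 2: group by ?y, order by COUNT(?x) descending, take the first row.
per_producer = Counter(producer for _, producer in film_producer)
top_producer = per_producer.most_common(1)[0][0]

print(cities, top_producer)
```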

Page 38: semantic web & natural language

NLI Challenges

• Layer 1: Query understanding

– E.g: Complex query: negation, subqueries, arithmetic operation, etc

• Layer 2: Query-KB granularity homogenisation

– E.g: Different format/styles

– E.g: Mismatch in concept name

• Layer 3: Result presentation

– E.g: ranking

Page 39: semantic web & natural language

Query Understanding

• Input type

– Current

• Guided query: controlled natural language, query indicator (e.g: WH-terms)

• Graphical query construction

– Problem

• Confusing

• Requires a degree of background knowledge

• Constrained search

Page 40: semantic web & natural language

Query Understanding

• Compositional Density
  – Current
    • Triples generated by PowerAqua for “Give me five albums by Pink Lloyd”:
      – <[albums, five], null, Lloyd Pink>
      – <[five], null, albums>
      – <[Pink], null, Lloyd>
  – Potential Works
    • Negation (e.g: not, outside, except)
    • Arithmetic (e.g: sum of, how many, largest)
    • Auxiliary (e.g: largest, latest, top)
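One simple way to notice such constructions before triple generation is to scan the question for cue words. The sketch below uses the cue lists from the slide; the whole-word matching rule and the `detect_features` helper are assumptions for illustration, and cue words alone will not resolve what the construction means.

```python
# Sketch: flag question features that a plain triple cannot capture.
# The cue lists come from the slide; the matching rule is an assumption.
CUES = {
    "negation": ["not", "outside", "except"],
    "arithmetic": ["sum of", "how many", "largest"],
    "auxiliary": ["largest", "latest", "top"],
}


def detect_features(question):
    """Return the categories whose cues occur as whole words in the question."""
    padded = " " + " ".join(question.lower().replace("?", " ").split()) + " "
    return sorted(
        category
        for category, cues in CUES.items()
        if any(" " + cue + " " in padded for cue in cues)
    )


print(detect_features("How many cities are outside Texas?"))
print(detect_features("Give me the largest state?"))
```

Note that "largest" appears in two categories on the slide, so a question containing it is flagged as both arithmetic and auxiliary.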

Page 41: semantic web & natural language

Query Understanding

• Ambiguity Reduction

– Triple identification

– Stanford Parser

– WordNet

– Similarity Matching

– Clarification Dialogue

– Entity Identification

Page 42: semantic web & natural language
Page 43: semantic web & natural language
Page 44: semantic web & natural language
Page 45: semantic web & natural language

Types of queries (Ferre & Hermann, 2011)

• Visualization – exploration of the facet hierarchy

• Selection – count or list items that have a particular feature

• Path – subjects had to follow a path of properties

• Disjunction – required the use of unions

• Negation – required the use of exclusions

• Inverse – required the crossing of the inverse of properties

• Cycle – required the use of co-reference variables (naming and reference navigation links)

Page 46: semantic web & natural language
Page 47: semantic web & natural language

Query-KB granularity homogenisation

• KB variation

– Format (e.g: RDF, OWL)

– Style (e.g: with/without schema)

– Concept names (e.g: length)

– Query-triple conversion

– Sources supported (single/multi sources, LOD)

– Disambiguation

• WordNet, Similarity Matching, Clarification Dialogue
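Similarity matching between a query term and KB concept names can be sketched with Python's standard `difflib` module. The property names are taken from the Geography ontology used later in the deck; the 0.5 threshold and the `best_match` helper are assumptions, and real systems typically combine string similarity with synonym lookup (e.g. WordNet).

```python
from difflib import SequenceMatcher

# Property names taken from the Geography ontology used later in the deck.
KB_PROPERTIES = ["hasCapital", "hasCity", "hasRiver", "statePopulation", "borders"]


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(term, candidates, threshold=0.5):
    """Return the most similar candidate, or None if nothing reaches the threshold."""
    score, name = max((similarity(term, c), c) for c in candidates)
    return name if score >= threshold else None


print(best_match("capital", KB_PROPERTIES))
print(best_match("xyz", KB_PROPERTIES))
```

Returning `None` below the threshold leaves room for a fallback such as a clarification dialogue with the user.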

Page 48: semantic web & natural language
Page 49: semantic web & natural language

Result Understanding

• Ranking result

– List vs. finite answer

– Degree of confidence / hit score

– Learnability

Page 50: semantic web & natural language

Part 3

• Practical Examples

– Mooney's Geography Dataset

– Automatic SPARQL Construction for Natural Language-based Search in Semantic Database

Page 51: semantic web & natural language

Geography.owl

Classes: City, Capital, State, HiPoint, LoPoint, Mountain, Lake, River, Road

DataTypeProperty | Domain | Range
cityPopulation | City | float
statePopulation | State | float
statePopDensity | State | float
abbreviation | State | string
stateArea | State | float
lakeArea | Lake | float
height | Mountain | float
hiElevation | HiPoint | float
loElevation | LoPoint | float
length | River | float
number | Road | float

ObjectProperty | Domain | Range
borders | State | State
isCityOf | City | State
hasCity | State | City
isCapitalOf | Capital | State
hasCapital | State | Capital
isMountainOf | Mountain | State
hasMountain | State | Mountain
isHighestPointOf | HiPoint | State
hasHighPoint | State | HiPoint
isLowestPointOf | LoPoint | State
hasLowPoint | State | LoPoint
isLakeOf | Lake | State
hasLake | State | Lake
runsThrough | River | State
hasRiver | State | River
passesThrough | Road | State
hasRoad | State | Road

Page 52: semantic web & natural language

• Can you tell me the capital of texas?
• How large is texas?
• Give me all the states of usa?
• How long is rio grande?
• Give me the cities in texas?
• How long is the colorado river?
• Give me the cities which are in texas?
• How long is the mississippi?
• Give me the lakes in california?
• How long is the mississippi river?
• Give me the states that border utah?
• How long is the mississippi river in miles?
• Give me the number of rivers in california?
• How many capitals does rhode island have?
• How many citizens in alabama?
• How many cities does texas have?
• How many citizens live in california?
• How many cities does the usa have?
• Give me the longest river that passes through the us?
• How many citizens does the biggest city have in the usa?
• Give me the largest state?
• How high are the highest points of all the states?
• Could you tell me what is the highest point in the state of oregon?
• How high is the highest point in america?
• Count the states which have elevations lower than what alabama has?
• How high is the highest point in montana?
• How big is texas?
• How high is the highest point in the largest state?
• How big is the city of new york?
• How large is the largest city in alaska?
• How many colorado rivers are there?
• How long is the longest river in california?
• How high is guadalupe peak?
• How long is the longest river in the usa?
• How high is mount mckinley?
• How long is the shortest river in the usa?
• How many cities named austin are there in the usa?
• How many big cities are in pennsylvania?

Page 53: semantic web & natural language

Approach

• Can you tell me the capital of texas?
  – POS: Can/MD you/PRP tell/VB me/PRP the/DT capital/NN of/IN texas/NNS ?/.
  – Triple: <capital, ?, texas>
  – SPARQL:
      "PREFIX geo:<http://www.mooney.net/geo#>"+
      "SELECT ?s "+
      "WHERE "+
      "{?s geo:isCapitalOf geo:texas . }";
  – Answer: geo:austinTx

• Give me all the states of usa?
  – POS: Give/VB me/PRP all/PDT the/DT states/NNS of/IN usa/NN ?/.
  – Triple: <states, ?, usa>
  – SPARQL:
      "PREFIX geo:<http://www.mooney.net/geo#>"+
      "SELECT ?s "+
      "WHERE "+
      "{?s a geo:State . }";
  – Answer: geo:kansas, geo:rhodeIsland, geo:montana, geo:tennessee, geo:arkansas, geo:newMexico, … (all the states)
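The question-to-triple-to-SPARQL steps for the first example can be sketched in Python. This is a minimal illustration under stated assumptions, not the method that produced the slide's queries: the `PROPERTY_FOR` lookup and the "first noun is the concept, last noun is the entity" heuristic are invented for the example, and the POS-tagged input is copied from the slide rather than produced by a tagger.

```python
# Sketch of the question -> triple -> SPARQL pipeline for simple "X of Y" questions.
# PROPERTY_FOR and the first-noun / last-noun heuristic are illustrative assumptions.
PREFIX = "PREFIX geo:<http://www.mooney.net/geo#>"
PROPERTY_FOR = {"capital": "geo:isCapitalOf"}  # question noun -> KB property


def parse_tagged(tagged):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in tagged.split()]


def to_triple(tagged):
    """Use the first noun as the asked-for concept and the last noun as the entity."""
    nouns = [word for word, tag in parse_tagged(tagged) if tag.startswith("NN")]
    return (nouns[0], "?", nouns[-1])


def to_sparql(triple):
    """Fill a single-pattern SELECT template from the triple."""
    concept, _, entity = triple
    prop = PROPERTY_FOR[concept]
    return f"{PREFIX} SELECT ?s WHERE {{ ?s {prop} geo:{entity} . }}"


tagged = "Can/MD you/PRP tell/VB me/PRP the/DT capital/NN of/IN texas/NNS ?/."
triple = to_triple(tagged)
query = to_sparql(triple)
print(triple)
print(query)
```

The second example on the slide ("all the states of usa") would need a different template, since its query selects by class (`?s a geo:State`) rather than by property.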

Page 54: semantic web & natural language

POS tagging

• Give/VB me/PRP the/DT cities/NNS which/WDT are/VBP in/IN texas/NNS ?/.

• Give/VB me/PRP the/DT lakes/NNS in/IN california/NN ?/.

• Give/VB me/PRP the/DT states/NNS that/WDT border/NN utah/NN ?/.

• Give/VB me/PRP the/DT number/NN of/IN rivers/NNS in/IN california/NN ?/.

• How/WRB many/JJ citizens/NNS in/IN alabama/NN ?/.
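The tagged sentences can be read into content words with a few lines of Python. The sketch below relies only on the Penn Treebank tag prefixes (NN for nouns, VB for verbs); `content_words` is a hypothetical helper. It also shows how a tagging error propagates: in the third sentence "border" is tagged NN although it is used as a verb, so it ends up among the nouns.

```python
# Sketch: extract nouns and verbs from a 'word/TAG' string.
def content_words(tagged):
    """Return (nouns, verbs) using Penn Treebank tag prefixes."""
    pairs = [token.rsplit("/", 1) for token in tagged.split()]
    nouns = [word for word, tag in pairs if tag.startswith("NN")]
    verbs = [word for word, tag in pairs if tag.startswith("VB")]
    return nouns, verbs


nouns, verbs = content_words(
    "Give/VB me/PRP the/DT states/NNS that/WDT border/NN utah/NN ?/."
)
print(nouns)
print(verbs)
```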

Page 55: semantic web & natural language

More to Do

• Domain dependent/independent?

• Is the heuristic of combining POS tags with KB compliance enough for SPARQL generation?

• More complex queries

– Arithmetic operation (COUNT, SUB-QUERY)

– Aggregation (requires FILTER, OPTIONAL, HAVING)

– Auxiliary (e.g: latest, earliest)

Page 56: semantic web & natural language

Conclusion

• NLI is a promising research area

• Highlight: ambiguity reduction, query understanding, query-KB matching

• Focus: SPARQL generation and optimization

• Potential sub-area: negation, arithmetic, temporal, complex queries

Page 57: semantic web & natural language

References

• Ferre, S., & Hermann, A. (2011). Semantic Search: Reconciling Expressive Querying and Exploratory Search. ISWC'11: Proceedings of the 10th International Conference on The Semantic Web (pp. 177-192).

• Unger, C., Bühmann, L., Lehmann, J., Ngonga Ngomo, A.-C., Gerber, D., & Cimiano, P. (2012). Template-based question answering over RDF data. Proceedings of the 21st International Conference on World Wide Web (WWW '12), 639. New York, New York, USA: ACM Press. doi:10.1145/2187836.2187923

Page 58: semantic web & natural language

Contact

Nurfadhlina Mohd Sharef

• Postdoctoral Fellow, Knowledge Technology Group, Centre of Artificial Intelligence, Universiti Kebangsaan Malaysia (Room 4.4, Level 4, Block H)

• Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (Room C2.08, Level 2, Block C)

[email protected]