From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

Upload: mason-lancaster

Post on 26-Mar-2015


TRANSCRIPT

Page 1: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

From sentence to sense level information retrieval

Bridging CONTENT and object with HOLTRAN Technology

Page 2: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

Mission Statement

HOLTRAN (Higher Order Logic Translation) Technology fills the gap between two fundamental methods of representing information: unstructured texts and structured data.

HOLTRAN Technology aims to serve as a universal international standard for the representation, storage and exchange of information.

Our mission is to become an industry-leading provider of Natural Language Processing (NLP) solutions for consumers and companies.

HOLTRAN Technology Ltd.

Page 3: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

What do we do?

Next generation knowledge base engine

• Extracts information from multi-lingual natural language texts

• Stores the information in a structured form

• Keeps both semantic and textual information on an equal footing

• Enables knowledge base queries in natural languages


Page 4: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

Why?

“The challenge … is to find effective solutions to unlock the value from unstructured information sources, and to leverage it in Business Intelligence deployments”

Butler Group

“Poor classification costs a 10,000 user organization $10M annually.”

Usability expert, Jakob Nielsen

“Unstructured information doubles in quantity every three months”

Gartner Group

“In modern enterprise, 85% of data is unstructured”

Butler Group

Page 5: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

NEED

A common interface framework for diverse multilingual and multiform sources of information.

OBSTACLE

Unstructured information is human-friendly BUT meaningless to computers, and heavily dependent on manual processes.


Page 6: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

The general need for…

a common interface able to bind and put in order diverse multilingual and multiform sources of information

…has also recently been sharply recognized by leading European companies and by the ITEA board,

which in its “Technology Roadmap on Software Intensive Systems” formulated the main challenge for semantic data as “the possibility for applications to understand the ‘meaning’ of each others data” and invited HOLTRAN Technology Ltd. to the annual ITEA meeting in Amsterdam (January 2003).

As a result, the company was invited to join a consortium of the 6th call of EUREKA ITEA:

DigiNews (News and Information for mobile e-paper terminals. Leader: Philips Technology)

HOLTRAN Technology addresses this need by providing a framework for building a new generation of semantic, knowledge-based information systems that can extract semantic (structured) information from multilingual textual (unstructured) form and, vice versa, express the stored structured information in textual form.

Page 7: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

Novelty

Employment of a knowledge model based on our own improved many-sorted type theory.

A HOLTRAN-based system is able to

store relationships of arbitrary order between semantic entities (including incomplete and contradictory information)

store extendable language definitions of any type and complexity

effectively evaluate full responses to any queries related to the stored information

Page 8: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

Novelty in Question Answering

Unlike search engines, the HOLTRAN Question Answering system aims to supply users with the essence of "just the right information" instead of merely providing a list of hits.

Current question answering systems are based on sentence level information retrieval.

The rate of correct answers in such systems is about 70%.

The HOLTRAN Question Answering system provides a revolutionary leap from sentence level retrieval to sense level information retrieval and answer formation.

Page 9: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

Why are we better than other QA systems?

• State-of-the-art question answering systems use a variety of linguistic resources to understand users’ queries and match sections of documents.

• The most common linguistic resources include:

– part-of-speech tagging

– parsing

– named entity extraction

– semantic relations

– dictionaries, WordNet, etc.

• HOLTRAN Technology provides a unique framework that embraces these resources inside the system and thus covers complex lexical, syntactic and semantic relationships between question and answer strings.

Page 10: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

The social impact of HOLTRAN in 2020 will cover the following situations:

Two or more people talking over the "phone", each speaking in their native language and receiving the answer in it.

While traveling in a car, one can ask "it" any relevant question and receive the system's response in one's native language.

One can receive ANY "newspaper" in one's native language (the first step is to be made through our participation in the DigiNews ITEA project).

Fully automated call centers: each user can receive the answer in their native language.

Translation from one artificial language to another, providing full database compatibility without additional software development (a solution to the PDM-ERP compatibility problem).

No manual transaction treatment: all e-mails are read and treated automatically (spam as well).

Page 11: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

[Figure: Typical modern information system architecture. Labels: Relational DBMS; Unstructured documents; System Server; System Client; Local documents; User; Meta data (SQL); Documents (text stream); Meta data (XML); Documents (text stream); Meta data + Documents (GUI)]

The system repository contains a hybrid of

• a relational database of some related meta information

• a vault of textual documents stored as unstructured "black boxes"

The System Server

• executes queries against the relational DB and provides access to the textual documents

• communicates with the system clients, usually via a GUI for the end users

Page 12: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

The basic problems of the classical architecture:

Redundancy and inconsistency between the contents of primary documents and the meta information on these documents

Limitations on inter-version and inter-application compatibility

HOLTRAN technology solves these fundamental problems through its ability to

extract the semantic content from unstructured documents

extract more semantic information from the same documents when language definitions are extended, without new software development

express the content in a textual form

Page 13: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

[Figure: HOLTRAN based information system architecture. Labels: HOLTRAN KBMS; HOLTRAN Interpreter; Content + languages definitions (HOLTRAN native); Meta data + Documents (text stream); Local documents; User; Meta data + Documents (user native); Browser]

The HOLTRAN KB stores, on an equal footing, both

• application information

• definitions of various external languages, i.e. any artificial or natural languages used to exchange information with applications and users.

The HOLTRAN interpreter translates information coming from users and documents in external languages into the internal KBMS language and vice versa. It allows users to communicate with the system in their native languages.

Page 14: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

How does it work?

• John plays table tennis well

Objects are assigned the entity type “e”; “t” stands for the truth type.

$c_0^{e}\; c_2^{eet}\; c_1^{ee}\; c_2^{e}\; c_0^{(et)et}$

• Interior axiom representation

$c_0^{(et)et}\,(c_2^{eet}\,(c_1^{ee}\,c_2^{e}))\,c_0^{e}$
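The type annotations on this slide can be checked mechanically. The following is a minimal Python sketch, not HOLTRAN code. It assumes the compact notation is read right-associatively (so "eet" takes an entity and returns an "et", and "(et)et" maps an "et" to an "et"), and that "table" has type "ee". The names `Fn`, `parse_type` and `apply_type` are hypothetical. The sketch verifies that "John plays table tennis well" composes to the truth type "t".

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Fn:
    """Function type: takes a value of type `arg`, returns a value of type `res`."""
    arg: "Type"
    res: "Type"


Type = Union[str, Fn]  # base types are the strings "e" and "t"


def parse_type(s: str) -> Type:
    """Parse compact notation such as "eet" or "(et)et" (assumed right-associative)."""
    parts, i = [], 0
    while i < len(s):
        if s[i] == "(":
            # Find the matching closing parenthesis and parse the inside recursively.
            depth, j = 1, i + 1
            while depth:
                depth += {"(": 1, ")": -1}.get(s[j], 0)
                j += 1
            parts.append(parse_type(s[i + 1 : j - 1]))
            i = j
        else:
            parts.append(s[i])
            i += 1
    result = parts[-1]
    for part in reversed(parts[:-1]):
        result = Fn(part, result)
    return result


def apply_type(fn: Type, arg: Type) -> Type:
    """Return the result type of applying `fn` to `arg`, or raise TypeError."""
    if not isinstance(fn, Fn) or fn.arg != arg:
        raise TypeError(f"cannot apply {fn} to {arg}")
    return fn.res


well, plays, table = (parse_type(t) for t in ["(et)et", "eet", "ee"])
tennis = john = "e"

table_tennis = apply_type(table, tennis)              # type e
plays_table_tennis = apply_type(plays, table_tennis)  # type et
sentence = apply_type(apply_type(well, plays_table_tennis), john)
print(sentence)
```

A mistyped combination, such as applying "well" directly to an entity, raises `TypeError` instead of producing a formula.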


Page 15: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

How does it work? (2)

Querying in HOLTRAN:

• Who plays table tennis well ?

$x^{e}\; c_2^{eet}\; c_1^{ee}\; c_2^{e}\; c_0^{(et)et}$

The inference procedure consists of finding all consistent substitutions of the free variables (x’s) in the tested formula under which the formula is provable.

• Interior axiom representation

$c_0^{(et)et}\,(c_2^{eet}\,(c_1^{ee}\,c_2^{e}))\,x^{e}$
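The substitution search described on this slide can be illustrated with a small Python sketch. It is a deliberate simplification and not the HOLTRAN inference engine: "provable" is reduced to literal agreement with a stored fact, applications are written as pairs (function, argument), and variables are strings starting with "?". The names `is_var`, `match` and `query` are hypothetical.

```python
def is_var(term):
    """Variables are strings that start with '?'; everything else is a constant."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, fact, subst):
    """Extend `subst` so that `pattern` equals `fact`; return None if impossible."""
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == fact else None
        return {**subst, pattern: fact}
    if isinstance(pattern, tuple) and isinstance(fact, tuple) and len(pattern) == len(fact):
        for sub_pattern, sub_fact in zip(pattern, fact):
            subst = match(sub_pattern, sub_fact, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == fact else None


def query(pattern, facts):
    """Collect every consistent substitution under which `pattern` matches a fact."""
    return [s for fact in facts if (s := match(pattern, fact, {})) is not None]


# Stored axioms: ((well (plays (table tennis))) John) and ((well (plays tennis)) Mary).
facts = [
    (("well", ("plays", ("table", "tennis"))), "John"),
    (("well", ("plays", "tennis")), "Mary"),
]

who = (("well", ("plays", ("table", "tennis"))), "?x")  # Who plays table tennis well?
how = (("?x", ("plays", "tennis")), "Mary")             # How does Mary play tennis?

print(query(who, facts))
print(query(how, facts))
```

The same matcher answers both questions because a variable may stand in any position of the formula, whether for an entity or for a modifier.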


Page 16: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

What is higher order logic?

First Order Logic:

Mary and Cathy play tennis.

First-order part: $(c_3^{e} \,\&\, c_4^{e})$

First-order logic expressions are those that can be written in SQL and contain constants and variables of simple types only (e, t, …).

Mary or Cathy: $(c_3^{e} \mid c_4^{e})$

Not Mary and Cathy: $\sim(c_3^{e} \,\&\, c_4^{e})$

Page 17: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

What is higher order logic? (2)

First Order Query - queries a variable of the first order:

Who plays table tennis well ?

Higher order logic expressions are those containing constants and variables of types of varying order: ee, et, eet …

Second Order Query - queries a variable of the second order:

What does John do? How does Mary play tennis ?

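What does John do? $x^{et}\,(c_0^{e})$

How does Mary play tennis ? $x^{(et)et}\,(c_2^{eet}\,(c_2^{e}))\,(c_3^{e})$

Who plays table tennis well ? $c_0^{(et)et}\,(c_2^{eet}\,(c_1^{ee}\,c_2^{e}))\,x^{e}$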

4-th order query

Page 18: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

How do we do it in the HOLTRAN Native programming language

The HOLTRAN Native programming language is specially designed to express any notions and ideas that humans can perceive, including definitions of natural languages.

This is what a piece of HOLTRAN Native code looks like for programming questions of the sort:

How does Mary play tennis ?

(("How does"=) ##& NounPhrase ##> VerbPhrase ##& ("?"=) =>> \np:e\vp:et((x:(et)et vp:et) np:e))

This is how the Interpreter translates from English to HOLTRAN Native and back to English:

< How does Mary play tennis ?

> Test (x:(et)et (_3:eet (COM _5:et)) (ID _2:e));

> Assert (_1:(et)et (_3:eet (COM _5:et)) (ID _2:e));

> Mary plays tennis well.
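The translation rule shown earlier on this slide pairs a surface pattern with a lambda term. A rough Python analogue might look like the sketch below. It assumes that the noun phrase is a single word and that everything up to the question mark is the verb phrase, because the actual NounPhrase and VerbPhrase grammars are not shown in the slides. `translate_how_question` is a hypothetical name, and the nested tuple stands in for the tested formula.

```python
import re


def translate_how_question(question):
    """Map 'How does <NP> <VP> ?' to the term ((x vp) np), or None if no match."""
    m = re.fullmatch(r"How does (\w+) (.+?)\s*\?", question.strip())
    if m is None:
        return None
    noun_phrase, verb_phrase = m.group(1), m.group(2)
    # Analogue of \np:e \vp:et ((x:(et)et vp:et) np:e): x is the queried variable.
    build = lambda np_: lambda vp_: (("x", vp_), np_)
    return build(noun_phrase)(verb_phrase)


print(translate_how_question("How does Mary play tennis ?"))
print(translate_how_question("Who is Mark ?"))
```

Questions that do not fit the pattern return `None`, which corresponds to trying the next rule in a larger grammar.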

Page 19: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

Product Overview

Our core product, the HOLTRAN Semantic Platform, is a suite of software components and tools serving as middleware for building customizable and extensible applications for semantic processing of textual information in multiple artificial and natural languages.


Page 20: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

HOLTRAN Semantic Platform comprises:

HOLTRAN Engine - the heart of the product, which reads input information, translates it into internal instructions and executes them by directly accessing HOLTRAN Knowledge Bases. HOLTRAN Engine includes a built-in Interpreter of the HOLTRAN Native Language

A set of HOLTRAN Knowledge Bases

HOLTRAN Knowledge Connector - a package of software components forming a C++ and Java API (SDK) to provide local or network access to HOLTRAN Knowledge Bases.


Page 21: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

HOLTRAN Semantic Platform also comprises:

A configurable set of HOLTRAN Standard Applications including

HOLTRAN Desktop Assistant - a standard extension to MS Windows Explorer

HOLTRAN Network Assistant providing instant information search, categorization and summarization across an enterprise network

HOLTRAN Web Server providing access to HOLTRAN Knowledge Bases via the HTTP protocol and publishing documents to the linked HOLTRAN Knowledge Bases


Page 22: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

HOLTRAN Question & Answering: Current Status

Consider the following short input:

Mark is a doctor. He has a clinic. Mark works well. Bill and Cathy love Mark.

John is a programmer. He is a good programmer. John builds a new program. He works well.

Alex is a driver. Alex builds a new house. He works hard. Olga helps him, she works hard too.

Page 23: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

HOLTRAN Question & Answering: Current Status

Now the following dialogue takes place between the user (<) and the program (>). Note the dialogue management capabilities:

< Who is Mark ?

> He is a doctor.

< Who is he ?

> He is a doctor.

< Does Mark have a clinic ?

> Yes, he has it.

< How does Mark work ?

> He works well.

< How does Olga work ?

> She works hard.

< Who loves him ?

> Bill and Cathy love him.

< Who works hard ?

> Alex and Olga work hard.

< Who is Alex ?

> He is a driver.

< What does he do?

> He works and builds a new house.

< Who helps him ?

> Olga helps him.

< What does John do ?

> He works and builds a new program.


Page 24: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

HOLTRAN Question & Answering: Current Status

Self-learning capabilities: Note that “carnivore”, “lion” and “lab” are absent from the system dictionary and we introduce them in the dialogue for the first time:

< Alex is a carnivore.

< A lion is a carnivore.

< Who is Alex ?

> He is a driver and he is a carnivore.

< Who is a lion ?

> It is a carnivore.

< Mark builds a new lab.

< What does Mark do ?

> He works, builds a new lab and has a clinic.


Page 25: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

HOLTRAN Question & Answering Test Cases - 1

There is no doubt that no existing product could pass even half of these tests in the foreseeable future. We expect to pass at least the first 7 cases within a year (the last one might require some additional effort).

1. Negation accounting

I: The wolf huffed and puffed but he could not blow down that brick house.

Q: Could the wolf blow down the brick house ?

2. Syllogisms

I: Every human is mortal. Socrates is a human.

Q: Is Socrates mortal ?

Page 26: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

HOLTRAN Question & Answering Test Cases -2

3. “Wh” questions

I: After its final passage by both houses, the bill is sent to the president.

Q: To whom is a bill sent after its final passage by both houses ?

Q: When is a bill sent to the president ?

4. Reference resolution

I: When a senator or a representative introduces a bill, he or she sends it to the clerk of his house, who gives it a number and title.

Q: Who sends a bill to the clerk of his house ?

Q: Who gives a bill a number and title ?

5. Synonyms/antonyms accounting

I: Diesel engines are heavier than gasoline engines.

Q: Which type of internal-combustion engine is lighter ?

Page 27: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

HOLTRAN Question & Answering Test Cases -3

6. Semantic categories accounting

I: The heart employs a separate vascular system to obtain blood for its own nourishment. Two major coronary arteries regulate this blood supply.

Q: What is the function of coronary arteries ?

7. Ontology accounting

I: Joseph Kennedy devoted the rest of his life to advancing the political careers of his sons, John, Robert and Edward.

Q: Is Robert Kennedy a brother of John Kennedy ?

Page 28: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

HOLTRAN Question & Answering Test Cases -4

8. Merging distributed info

I: IBM and Philips announced a joint initiative to collaborate on radio frequency identification (RFID) technology for companies using supply-chain software.

I: Eastman Kodak Co. and IBM on Tuesday announced a joint effort to offer healthcare facilities products that combine Kodak's medical imaging technology and services with IBM's storage devices.

I: GiveMePower Corporation today announced it has partnered with IBM Corporation as one of four business solutions to be showcased in Intel Corporation’s "Inside Your Digital Life: Intel" exhibit at CeBIT 2004.

Q: Which companies does IBM Corporation have joint projects or partnership with ?


Page 29: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

Context dependency

Absolute truth

< John works hard.

> Yes, John works hard.

< John does not work hard.

> No, he does work hard.

< How does John work ?

> He works hard.

(Currently implemented dialogue)

Relative truth

< Bryan says that John works hard.

< Bill says that John does not work hard.

< How does John work ?

> According to Bryan he works hard and according to Bill – not.

(Future dialogue)

Contradiction resolution
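One way to picture the "relative truth" dialogue is to store each statement together with its source instead of overwriting earlier statements. The Python sketch below is only an illustration of that idea under this assumption; the slides do not describe how HOLTRAN implements it, and the class `AttributedKB` and its methods are hypothetical.

```python
from collections import defaultdict


class AttributedKB:
    """Toy store that keeps every claim together with the source that made it."""

    def __init__(self):
        # proposition -> list of (source, holds) pairs, in the order they were told
        self._claims = defaultdict(list)

    def tell(self, source, proposition, holds=True):
        self._claims[proposition].append((source, holds))

    def ask(self, proposition):
        return list(self._claims.get(proposition, []))

    def is_contested(self, proposition):
        """True when sources disagree about the proposition."""
        return len({holds for _, holds in self.ask(proposition)}) > 1

    def report(self, proposition):
        return "; ".join(
            f"according to {source}: {'yes' if holds else 'no'}"
            for source, holds in self.ask(proposition)
        )


kb = AttributedKB()
john_works_hard = ("John", "works hard")
kb.tell("Bryan", john_works_hard, holds=True)
kb.tell("Bill", john_works_hard, holds=False)

print(kb.report(john_works_hard))
print(kb.is_contested(john_works_hard))
```

Keeping both claims lets the answer name each source, as in the future dialogue above, instead of rejecting the second statement as a contradiction.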


Page 30: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

Key Persons:

Dr. Alexander Brenner, President and CEO.

Ph.D. in Mathematics from the Technion - Israel Institute of Technology and M.Sc. from Moscow State University.

Previously: Image Processing and algorithms department leader at Imaginarix Ltd. and lecturer at the Technion - Israel Institute of Technology.

Professional experience:

Pure and applied mathematics, Image and Signal Processing. Software engineering (object oriented design, testing and maintenance). Management of R&D teams.

Applications:

Image processing, Call Centres, Artificial intelligence (pattern recognition, natural language processing), scientific programming, industrial applications, mathematical and statistical modelling.

Page 31: From sentence to sense level information retrieval Bridging CONTENT and object with HOLTRAN Technology

Key Persons:

Dr. Victor Gluzberg, VP R&D.

Ph.D. and M.Sc. in physics and applied mathematics from Novosibirsk State University. Previously: Software Manager at Parametric Technology, Israel.

Professional experience:

Applied mathematics and computer sciences, Physics, Software engineering (requirements analysis, program specification and design, testing and maintenance). Management of R&D teams.

Applications:

Data processing, System programming, CAD/CAM, Artificial intelligence (pattern recognition, inference, natural language processing), scientific programming, industrial applications, mathematical and statistical modelling.