Semantics Rule, Keywords Drool J. Brooke Aker CEO Expert System USA February 2010


Page 1: Semantics Rule, Keywords Drool J. Brooke Aker CEO Expert System USA February 2010

Semantics Rule, Keywords Drool

J. Brooke Aker, CEO, Expert System USA

February 2010

Page 2

Corporate background

• Most accurate, largest, fastest growing semantics company worldwide

• 100+ customers, including large corporations and government, in:

– business intelligence

– enterprise search & data extensibility

– market sentiment

– customer care

• 100+ dedicated engineers focused on core semantic technology, applications, tools and services:

– 200 man-years in the development of COGITO over the last 10 years

• 20 years old, private & profitable

– FY2008: $13.5M, 110+ employees, 30% growth in each of the last 3 years

– Offices in Connecticut, California, UK, Italy, & Germany

Page 3

Why Do Keywords Drool?

3 Problems with Search Technology:

1. Same Word, Different Meanings

Jaguar (animal) / Jaguar (car)

2. Different Words, Same Meaning

Disability Legislation / Equal Opportunity Law

3. Different Words, Related Meaning

Organization / Company

Organization / Charity

Organization / Trade Union
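
The three problems can be shown with a short sketch. It is illustrative only: the `SYNONYMS` and `NARROWER` tables and the function names are hypothetical, hand-built from the examples on this slide, and are not part of any product described in this deck.

```python
# Hypothetical lookup tables built from the slide's examples.
SYNONYMS = {"disability legislation": "equal opportunity law"}
NARROWER = {"organization": ["company", "charity", "trade union"]}


def keyword_match(query, text):
    """Plain keyword search: true only if the literal query string appears."""
    return query.lower() in text.lower()


def expand(query):
    """Return the query plus any synonyms and narrower terms we know about."""
    q = query.lower()
    terms = {q}
    for a, b in SYNONYMS.items():
        if q == a:
            terms.add(b)
        if q == b:
            terms.add(a)
    terms.update(NARROWER.get(q, []))
    return terms


def expanded_match(query, text):
    """Match if any expanded term appears in the text."""
    return any(keyword_match(term, text) for term in expand(query))


law_doc = "The new equal opportunity law takes effect in May."
charity_doc = "The charity raised funds for schools."

# Problem 2: different words, same meaning.
print(keyword_match("disability legislation", law_doc))   # False
print(expanded_match("disability legislation", law_doc))  # True

# Problem 3: different words, related meaning.
print(expanded_match("organization", charity_doc))        # True

# Problem 1: same word, different meanings. Keyword search cannot tell these apart.
print(keyword_match("jaguar", "The jaguar hunts at night."))   # True
print(keyword_match("jaguar", "The Jaguar has a V8 engine."))  # True
```

Query expansion of this kind addresses problems 2 and 3 only; problem 1 needs disambiguation from context.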

Page 4

Results in Declining Productivity

[Chart: Productivity of Search (vertical axis) vs. Amount of Information (horizontal axis)]

Search methods shown: Databases; Files & Folders; Directories; Keyword Search (Google); Tagging; Natural Language Search; Semantic Search

Eras shown: Desktop (PC Era); World Wide Web (Web 1.0); Social Web (Web 2.0); Semantic Web (Web 3.0)

Page 5

Information Tasks In Business

[Quadrant diagram. Axes: Query Well Formed / Query Not Well Formed, and Sources Known / Sources Not Known. Quadrant labels: Search, Discovery, Analysis, Exploration]

Page 6

Information Measures In Business

1. Precision: Retrieving a high level of accurate results relevant to your search query (a measure of exactness)

2. Recall: Retrieving a high percentage of relevant documents (a measure of completeness)

[Chart: Recall (low to high) vs. Precision (low to high), positioning PowerSet, Keywords, Statistics, and Semantics]
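
As a worked illustration of the two measures, the sketch below computes them with the standard set-based definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant). The document IDs are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved vs. relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents returned, 3 documents are truly relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d1", "d2", "d5"]

precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```

Here two of the four results are relevant (precision 0.50), and two of the three relevant documents were found (recall 0.67).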

Page 7

What Business Wants IT to Provide

Semantics plays a role in all these except perhaps the last 2.

Source: AMR Research

Page 8

So What Then is the Semantic Web?

[Diagram: Producer / Consumer]

Web 1.0: One Producer, Many Consumers

Web 2.0: Everyone Produces, Everyone Consumes

Web 3.0: Everyone Produces, Pinpoint Consumption (semantics)

Page 9

COGITO®: deep analysis

4 Approaches (approach, definition, example):

Morphological Analysis

Definition: understand word forms

Example: dog, dogs, and dog-catcher are closely related

Grammatical Analysis

Definition: understand the parts of speech

Example: "There are 40 rows in the table" uses rows as a noun, vs. "She rows 5 times a week" uses rows as a verb

Logical Analysis

Definition: understand how words relate to other words

Example: "Jeffrey Skilling, represented by Attorney Daniel Petrocelli, is married to Rebecca Carter". Rebecca is married to Jeffrey, not Daniel.

Semantic Analysis (disambiguation)

Definition: understand the context of key words

Example: "I used beef broth for my soup stock" uses stock in the context of food, vs. "The company keeps lots of stock on hand" uses stock in the context of inventory.

• Technology that understands the real meaning of the words – based on theories of human comprehension
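
The "stock" example can be sketched as a toy context-overlap disambiguator. This is a deliberately simplified stand-in (in the spirit of the Lesk approach), not a description of how COGITO works; the `SENSES` cue words are hypothetical and hand-picked to fit the two example sentences.

```python
# Hypothetical cue words for two senses of "stock".
SENSES = {
    "food": {"broth", "soup", "beef", "simmer", "cook"},
    "inventory": {"company", "warehouse", "hand", "supply", "keeps"},
}


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose cue words overlap most with the sentence."""
    words = {w.strip('.,"').lower() for w in sentence.split()}
    scores = {sense: len(words & cues) for sense, cues in senses.items()}
    return max(scores, key=scores.get)


print(disambiguate("I used beef broth for my soup stock"))      # food
print(disambiguate("The company keeps lots of stock on hand"))  # inventory
```

A real system would draw its context from a full semantic network rather than a short hand-written cue list.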

Page 10

The solution is Semantics

Using human comprehension for machine understanding of text.

Machine understanding of text needs:

A semantic network

A parser to trace each text back to its basic elements

A linguistic engine to query the semantic network

A system to eliminate ambiguity

Steps to establish meaning

[Diagram: numbered steps 1, 2, 3, with the labels Semantic Network, Parse, Eliminate Ambiguity, Order & Priority, and Linguistic Query Engine]

Page 11

COGITO® is generic and horizontal, and can transform unstructured information into structured data that can be managed with standard databases.

Page 12

What is a Semantic Network?

• The heart of semantic technology.

• Quality of results derives from the complexity and richness of the network.

• Includes all definitions of all words.

• Includes relationships among all words.

COGITO® English Semantic Network:

- 350,000 words

- 2.8M relationships

Page 13

Semantic Networks

Traditional technologies can only “guess” the meaning using keywords, shallow linguistics, & statistics.

Semantic Networks instead identify:

• Connections

• Concepts

• Terms

• Abbrev.

• Phrases

• Meanings

• Domains

“San Jose is an American city”

“San Jose is a geographic part of California”
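
The two San Jose statements can be stored as typed links between concepts. The sketch below is a minimal illustration of that data structure; the class name and the relation names `is_a` and `part_of` are assumptions made for the example, not COGITO's actual schema.

```python
from collections import defaultdict


class SemanticNetwork:
    """A tiny graph of concepts connected by named relations."""

    def __init__(self):
        # Maps (concept, relation) to the set of concepts it points to.
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        self._edges[(source, relation)].add(target)

    def related(self, source, relation):
        # Use .get so that lookups do not create empty entries.
        return set(self._edges.get((source, relation), set()))


net = SemanticNetwork()
net.add("San Jose", "is_a", "American city")
net.add("San Jose", "part_of", "California")

print(net.related("San Jose", "is_a"))     # {'American city'}
print(net.related("San Jose", "part_of"))  # {'California'}
```

Keeping the relation type on each link is what lets the same concept answer both "what is it?" and "where is it?".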

Page 14

Technology Stack

Semantic Network: English; Arabic; Italian; German; Other Middle Eastern

Linguistic Query Engine: 1. Morphology; 2. Grammatical; 3. Logic; 4. Disambiguation

Development Studio: Develop & Add Custom Rules

Precision figures shown: 80% Precision; 90%+ Precision

Page 15

Semantic Intelligence

• Linguistic rules

• Sentence analysis

• Semantic Network

Shallow text analytics

• Statistics

• Heuristic rules

• Morphological recognition

Keyword-based technologies

[Chart column labels: Disambiguation; Entity extraction; Categorization; Natural lang. UI; Semantic Search; Discovery; Sentiment]

100% Semantic Technology

Page 16

Superior Performance

Semantic text analysis processing speed (one CPU): 60 KB/sec

Typical time of access to a concept in the semantic net: <10^-6 sec

Scalability in number of CPUs: Virtually unlimited

Number of concepts in English semantic net: 350,000

Hyponyms and hypernyms: 400,000+

Hypernyms and troponyms: 55,000

Average # of attributes for each concept: 20

Number of relations in semantic net (English): 2,800,000

Software memory footprint (semantic net and engine): 50 MB

Page 17

Expert System Unique Feature #1

• Expanded Definition Sets - captures all possible ways of expressing a concept, beyond the use of a single word:

• Compound word – like “blackbird” or “cookbook”

• Collocation – like “overhead projector” or “landing field”

• Idiomatic expression – like “to fly off the handle” or “to weigh anchor”

• Locutions – group of words that express simple concepts that cannot be expressed by a single word

• Verbal lemmas – such as a verb in the infinitive form, e.g. “to write”, or verbal collocations, e.g. “to sneak away”

Keyword / Statistical and Shallow Semantic Tech Fails Here: it treats the words in “to fly off the handle” as separate words, not as a single concept.
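
One simple way to treat a multi-word expression as a single concept is greedy longest-match lookup against a phrase table. The sketch below assumes a hypothetical `EXPRESSIONS` table and made-up concept labels; it does not handle inflected forms such as "flew off the handle", which a real linguistic engine would need to cover.

```python
# Hypothetical phrase table: token sequences mapped to concept labels.
EXPRESSIONS = {
    ("fly", "off", "the", "handle"): "LOSE_TEMPER",
    ("overhead", "projector"): "OVERHEAD_PROJECTOR",
}
MAX_LEN = max(len(phrase) for phrase in EXPRESSIONS)


def tag_concepts(text):
    """Replace known multi-word expressions with one concept label each."""
    tokens = text.lower().split()
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            concept = EXPRESSIONS.get(tuple(tokens[i:i + n]))
            if concept:
                out.append(concept)
                i += n
                break
        else:
            # No phrase starts here, so keep the single token.
            out.append(tokens[i])
            i += 1
    return out


print(tag_concepts("he will fly off the handle"))
# ['he', 'will', 'LOSE_TEMPER']
```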

Page 18

Expert System Unique Feature #2

• Expanded Semantic Relations - an expanded set (65) of relations between concepts, identified by looking at their use within the text. Answers questions like “Who did what to whom?”, often called a “triple” or a subject-action-object. WordNet, for example, contains only 5 relation types.

• Verb / Subject

• Verb / Direct Object

• Adjective / Class

• Syncon / Class

• Syncon / Corpus

• Syncon / Geography

• Fine Grain / Coarse Grain

• Supernomen / Subnomen

• Omninomen / Parsnomen

Keyword / Statistical and Shallow Semantic Tech Fails Here: it treats “RIM sued Verizon” as the same thing as “Verizon sued RIM”.
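
The RIM / Verizon example can be made concrete: a bag-of-words view sees the two sentences as identical, while a subject-action-object triple keeps them apart. The `naive_triple` helper below is a hypothetical simplification that only works on three-word sentences; real triple extraction requires parsing.

```python
from collections import Counter


def bag_of_words(sentence):
    """Keyword-style view: word counts with no order or roles."""
    return Counter(sentence.lower().split())


def naive_triple(sentence):
    """Split a three-word 'Subject verb Object' sentence into a triple."""
    subject, action, obj = sentence.split()
    return (subject, action, obj)


a = "RIM sued Verizon"
b = "Verizon sued RIM"

print(bag_of_words(a) == bag_of_words(b))  # True: keywords cannot tell them apart
print(naive_triple(a) == naive_triple(b))  # False: the roles differ
print(naive_triple(a))                     # ('RIM', 'sued', 'Verizon')
```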

Page 19

Expert System Unique Feature #3

• Categories of Attributes – every concept in the semantic network also contains attributes which are organized into a hierarchy of categories. The attributes and categories are assigned to maximize similarities and differences between concepts as an aid in disambiguation.

Categories shown: object; animals; plants; people; concepts; places; time; natural phenomena; states; quantity; groups

Keyword / Statistical and Shallow Semantic Tech Fails Here: it can’t tell you which portions of a document are related categorically … e.g. as a first cut, it only points to words, not to sections within a long document.

Page 20

Thank you

Brooke Aker

CEO of Expert System US

+1 860-614-2411

[email protected]

www.expertsystem.net