Ontology-based Classification and Faceted Search Interface for APIs Knarig Arabshian, PhD [email protected]


Page 1: Ontology-based Classification and Faceted Search Interface for APIs

Ontology-based Classification and Faceted Search Interface for APIs

Knarig Arabshian, PhD [email protected]

Page 2: Ontology-based Classification and Faceted Search Interface for APIs

Overview

- Motivation
- Background
- Problem
- Solution
- Related Work
- Conclusion & Future Work

Page 3: Ontology-based Classification and Faceted Search Interface for APIs

Motivation

- Most of today’s Web content is suitable for human consumption
- Humans are left with the work of gathering information from various websites
- Web content is heterogeneous with little or no structure
- Data is not easily shared between web content providers

Page 4: Ontology-based Classification and Faceted Search Interface for APIs

Travel Example

- Use services to manually search for airfares, car rentals and hotels
- Or search with aggregating services
- Use services to help plan travel itinerary and provide information on local sites such as weather, events, or attractions
- Use services that also provide you with helpful customer reviews

Page 5: Ontology-based Classification and Faceted Search Interface for APIs

Semantic Web Vision

- Web information can be processed by computers
- Computers can integrate information from the web

“A web of data that can be processed directly and indirectly by computers” ~ Tim Berners-Lee (Inventor of WWW)

Page 6: Ontology-based Classification and Faceted Search Interface for APIs

Quest for Semantics

Three main goals of the Semantic Web:

1. Building models: describe the world in abstract terms to allow for an easier understanding of complex reality

2. Computing with knowledge: constructing reasoning machines that can draw meaningful conclusions from encoded knowledge

3. Exchanging Information: distribute, interlink, and reconcile knowledge on a global scale

Page 7: Ontology-based Classification and Faceted Search Interface for APIs

[Diagram: Travel, with Planning, Booking, and Reviews; Airline Tickets, Car Rental, and Hotels]

Using structured data, computers can aggregate information and customize it for the user

Travel ontology describes and classifies travel services

Page 8: Ontology-based Classification and Faceted Search Interface for APIs

Motivation

- We can see similar problems when it comes to API discovery on the Web
  - Discovering an API requires searching through a large number of services on the Internet
  - Reading pages of documentation to figure out how to use the ones that may match your application
- Example: ProgrammableWeb (PW)
  - De facto API directory with over 14,000 APIs
  - Contains over 50 categories of services
  - API providers register their APIs in PW
  - Each API is manually categorized in a single category by the PW team

Page 9: Ontology-based Classification and Faceted Search Interface for APIs

Exponential PW API Growth

Page 10: Ontology-based Classification and Faceted Search Interface for APIs

Current state of PW

Current classification is a flat categorization of high-level service classes without any refinement between common attributes

Needs a better method for API discovery

Page 11: Ontology-based Classification and Faceted Search Interface for APIs

Example: Search for Social Advertising APIs in PW

Page 12: Ontology-based Classification and Faceted Search Interface for APIs

Example: Search for Social Advertising APIs in PW

Page 13: Ontology-based Classification and Faceted Search Interface for APIs

Example: Search for Social Advertising APIs in the Advertising Category

Search for ‘social’ and ‘advertising’ keywords in Advertising Category

Results in 7 APIs

Page 14: Ontology-based Classification and Faceted Search Interface for APIs

Example: Search for Social Advertising APIs in the Social Category

Search for ‘social’ and ‘advertising’ keywords in Social Category

Results in 2 APIs

Page 15: Ontology-based Classification and Faceted Search Interface for APIs

What is needed?

A common data model, such as an ontology, has to be provided in order to classify terms and represent knowledge

Definition of an ontology: A formal, explicit specification of a shared conceptualization ~ Tom Gruber

Page 16: Ontology-based Classification and Faceted Search Interface for APIs

Overview

- Motivation
- Background
- Problem
- Solution
- Related Work
- Conclusion & Future Work

Page 17: Ontology-based Classification and Faceted Search Interface for APIs

Ontology

- OWL (Web Ontology Language): Approved standard by W3C
- Characteristics of ontologies
  - Classes: set of resources
  - Instances: ground level objects
  - Properties: relationships between classes
- First order logic axioms
  - Class relationships such as disjointness, equivalence, subsumption
  - Restrictions on properties such as existential, universal, cardinality
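
The three kinds of property restriction listed above can be illustrated with a small sketch. This is not OWL and not a reasoner: it treats classes as plain Python sets and evaluates each restriction against the data at hand (closed world), whereas an OWL reasoner works under the open-world assumption. Every name in it (has_cuisine, SzechuanPalace, FusionHouse) is a hypothetical illustration.

```python
# Toy model: a class is a set of individuals, a property is a set of
# (subject, object) pairs. All names are hypothetical examples.
chinese_cuisine = {"Szechuan", "Cantonese"}

has_cuisine = {
    ("SzechuanPalace", "Szechuan"),
    ("FusionHouse", "Cantonese"),
    ("FusionHouse", "Sushi"),
}


def values(prop, subject):
    """All objects related to `subject` through `prop`."""
    return {o for s, o in prop if s == subject}


def some_values_from(prop, subject, cls):
    """Existential restriction: at least one value of `prop` lies in `cls`."""
    return any(v in cls for v in values(prop, subject))


def all_values_from(prop, subject, cls):
    """Universal restriction: every value of `prop` lies in `cls`.

    Vacuously true when the subject has no values for `prop`.
    """
    return all(v in cls for v in values(prop, subject))


def min_cardinality(prop, subject, n):
    """Cardinality restriction: `subject` has at least `n` values for `prop`."""
    return len(values(prop, subject)) >= n
```

In this toy data FusionHouse satisfies the existential restriction on chinese_cuisine but not the universal one, because one of its cuisines falls outside that class.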

Page 18: Ontology-based Classification and Faceted Search Interface for APIs

Ontology Benefits

- Standard way of describing the world both in terms of language and meaning
- Easily sharable across domains
- Machine readable
- Reasoning
  - Provide complex class relationships such as disjointness, union, intersection besides pure hierarchy
  - Description logic reasoners automatically derive new information and classify data
  - Automated classification can be very useful for dynamic data that is continually updated

Page 19: Ontology-based Classification and Faceted Search Interface for APIs

Ontology vs Relational Database

- Similarities
  - Both use a model to identify common classes and properties
  - ER model can be seen as a simple hierarchical ontology
- Differences
  - Ontologies are broader in scope (rules, incomplete knowledge)
  - Ontologies provide a way for automated reasoning to occur in order to discover new relationships between entities

Page 20: Ontology-based Classification and Faceted Search Interface for APIs

Example: Reasoning with a Restaurant Ontology

Import class Cuisine

Create a restaurant classification based on cuisine by setting a restriction on the hasCuisine property

Page 21: Ontology-based Classification and Faceted Search Interface for APIs

Example: Reasoning with a Restaurant Ontology

Since ChineseCuisine has the non-disjoint siblings JapaneseCuisine and KoreanCuisine, the reasoner can also conclude that these are similar to ChineseCuisine

Page 22: Ontology-based Classification and Faceted Search Interface for APIs

Example: Reasoning with a Restaurant Ontology

Page 23: Ontology-based Classification and Faceted Search Interface for APIs

Example: Reasoning with a Restaurant Ontology

Run Reasoner for Automated Classification

Conclude that NewClass is equivalent to ChineseRestaurant
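
A rough sketch of the equivalence step, assuming each defined class is fully described by a set of (property, filler) conditions. A description logic reasoner handles far richer axioms; this only shows why two classes with the same restriction on hasCuisine end up equivalent. JapaneseRestaurant and the dictionary layout are assumptions made for the illustration.

```python
# Each defined class is modelled as a frozenset of (property, filler)
# conditions that are necessary and sufficient for membership.
# The layout and the JapaneseRestaurant entry are hypothetical.
definitions = {
    "ChineseRestaurant": frozenset({("hasCuisine", "ChineseCuisine")}),
    "JapaneseRestaurant": frozenset({("hasCuisine", "JapaneseCuisine")}),
    "NewClass": frozenset({("hasCuisine", "ChineseCuisine")}),
}


def equivalent_classes(defs):
    """Group class names whose definitions are identical."""
    groups = {}
    for name, conditions in defs.items():
        groups.setdefault(conditions, []).append(name)
    # Only groups with more than one member express an equivalence.
    return [sorted(names) for names in groups.values() if len(names) > 1]


print(equivalent_classes(definitions))
```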

Page 24: Ontology-based Classification and Faceted Search Interface for APIs

Overview

- Motivation
- Background
- Problem
- Solution
- Related Work
- Conclusion & Future Work

Page 25: Ontology-based Classification and Faceted Search Interface for APIs

Problem

- Problem:
  - Improve API discovery and classification in Programmable Web by providing a common data model such as an ontology in order to automatically classify terms and perform semantic API searches
- Main Challenges:
  - Define high-level semantic descriptions of Programmable Web services
  - Combine manual and automated data mining techniques to create an ontology description of existing Programmable Web services
  - Implement a system that makes use of the ontology, such as a front-end user interface

Page 26: Ontology-based Classification and Faceted Search Interface for APIs

What will improve?

- Given a PW ontology, the system will:
  - Automatically classify existing API instances within this ontology
  - Create an ontology-based user-interface for automatic registration and querying
    - API providers will be able to register their services via this interface
    - Users will be able to discover services with semantic queries
- Example:
  - Find me an advertising service for social networks
  - Find me a social networking service for book sharing

Page 27: Ontology-based Classification and Faceted Search Interface for APIs

What do we need?

- PW Service Classes: Advertising_Service, Social_Service
- Feature Classes: Advertising_Feature, Social_Feature
- Properties: hasFeature
- API Individuals:
  - <140Proof, hasFeature, Advertising_Feature> <140Proof, hasFeature, Social_Feature>
  - <BadgeVille, hasFeature, Advertising_Feature> <BadgeVille, hasFeature, Social_Feature>
- Automated Classification: Advertising_Service, Social_Service

Refinement properties for a given PW Category to enable automatic classification
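
The hasFeature triples on this slide can drive a very small classification sketch. The rule that each feature class maps to exactly one service class is an assumption made for illustration, as are the function and variable names. In the actual system a reasoner would infer membership from OWL axioms.

```python
# Triples copied from the slide.
triples = [
    ("140Proof", "hasFeature", "Advertising_Feature"),
    ("140Proof", "hasFeature", "Social_Feature"),
    ("BadgeVille", "hasFeature", "Advertising_Feature"),
    ("BadgeVille", "hasFeature", "Social_Feature"),
]

# Assumed rule: having a feature places the API in the matching service class.
service_for_feature = {
    "Advertising_Feature": "Advertising_Service",
    "Social_Feature": "Social_Service",
}


def classify(triples, service_for_feature):
    """Map each API individual to the set of service classes it falls under."""
    classes = {}
    for subject, predicate, obj in triples:
        if predicate == "hasFeature" and obj in service_for_feature:
            classes.setdefault(subject, set()).add(service_for_feature[obj])
    return classes


for api, services in classify(triples, service_for_feature).items():
    print(api, sorted(services))
```

Both APIs land in two service classes at once, which the single-category PW scheme cannot express.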

Page 28: Ontology-based Classification and Faceted Search Interface for APIs

Example: Ontology for Feature Class

Page 29: Ontology-based Classification and Faceted Search Interface for APIs
Page 30: Ontology-based Classification and Faceted Search Interface for APIs

Current PW Classification

PW Services: Video, Advertising, Social, Photo, Travel

Page 31: Ontology-based Classification and Faceted Search Interface for APIs

Improved PW Classification using an OWL Ontology

PW Services: Video, Advertising, Social, Photo, Travel

Combined classes: VideoSocial, PhotoSocial, TravelSocial, AdvertisingSocial

APIs whose attributes belong to more than one category will automatically be classified under the combined classes

Page 32: Ontology-based Classification and Faceted Search Interface for APIs

Current PW Search Interface

Page 33: Ontology-based Classification and Faceted Search Interface for APIs

APIBrowse: Improved Faceted Search Interface

Given the PW ontology, automatically generate a faceted search interface by integrating it with a search platform such as SOLR
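
To make the idea of faceted search concrete, here is a minimal in-memory sketch. It does not use Solr or its API. It only shows the two operations a faceted interface needs: filtering by the selected facet values and counting the facet values that remain. The index contents beyond 140Proof and BadgeVille, and all function names, are hypothetical.

```python
from collections import Counter

# Hypothetical index: API name -> set of service classes it was classified into.
api_classes = {
    "140Proof": {"Advertising", "Social"},
    "BadgeVille": {"Advertising", "Social"},
    "ExamplePhotoAPI": {"Photo", "Social"},
    "ExampleTravelAPI": {"Travel"},
}


def search(index, selected):
    """Return the APIs that belong to every selected facet value."""
    return sorted(name for name, classes in index.items() if selected <= classes)


def facet_counts(index, selected):
    """Count the remaining facet values among APIs matching the selection."""
    counts = Counter()
    for name in search(index, selected):
        counts.update(index[name] - selected)
    return dict(counts)


print(search(api_classes, {"Social", "Advertising"}))
print(facet_counts(api_classes, {"Social"}))
```

Selecting Social and then Advertising narrows the result to the APIs that sit in both classes, which is the "social advertising" query from the earlier slides.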

Page 34: Ontology-based Classification and Faceted Search Interface for APIs

APIBrowse: Improved Faceted Search Interface

Page 35: Ontology-based Classification and Faceted Search Interface for APIs

Overlapping API Instances

Page 36: Ontology-based Classification and Faceted Search Interface for APIs

Overlapping API Instances

Page 37: Ontology-based Classification and Faceted Search Interface for APIs

Overview

- Motivation
- Background
- Problem
- Solution
- Related Work
- Conclusion & Future Work

Page 38: Ontology-based Classification and Faceted Search Interface for APIs

LexOnt: A semi-automatic ontology creation tool

- A semi-automatic ontology creation tool that uses the Programmable Web as its corpus
- Suggests high-level property terms for a given service class which distinguish it from the rest of the categories
- Implemented as a plugin for Protege, the de facto ontology editor, to aid in semi-automated ontology creation
- Contributions:
  - Novel algorithm ranks terms and phrases within a PW category as candidate property assignments by comparing them to external domain knowledge within Wikipedia, Wordnet and the current state of the ontology
  - Can be used even if the ontology engineer is not necessarily an expert in a certain domain

Page 39: Ontology-based Classification and Faceted Search Interface for APIs

LexOnt Algorithms

Well-known NLP algorithms used to find terms and phrases

- TF-IDF: Term frequency-inverse document frequency
  - Score of a word in the document shows how important the word is
  - Importance of a word depends on how frequently the word has been used in the document vs. all the documents in the corpus
- Significant Phrases:
  - Chi-square test used to calculate the significance of collocated words
  - Two-phase process:
    - Determine collocations and terms that appear together
    - Filter out unique collocations from the list
- Gave a very good indication of high-level property descriptions
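
Both measures can be sketched in a few lines of plain Python. LexOnt itself uses the Lingpipe library, so the code below is only an illustration of the underlying statistics under common textbook definitions: TF-IDF as relative term frequency times log(N / document frequency), and Pearson's chi-square on the 2x2 contingency table of a bigram. The function names and the sample tokens are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def chi_square_bigram(tokens, w1, w2):
    """Pearson chi-square statistic for the adjacent word pair (w1, w2)."""
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    o11 = sum(1 for a, b in bigrams if a == w1 and b == w2)
    o12 = sum(1 for a, b in bigrams if a == w1 and b != w2)
    o21 = sum(1 for a, b in bigrams if a != w1 and b == w2)
    o22 = n - o11 - o12 - o21
    denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if denom == 0:
        return 0.0
    return n * (o11 * o22 - o12 * o21) ** 2 / denom


docs = [["social", "ad", "ad"], ["travel", "ad"]]
print(tf_idf(docs)[0])

tokens = "social stream ads social stream api travel".split()
print(chi_square_bigram(tokens, "social", "stream"))
```

A term that occurs in every document (here "ad") scores zero under this TF-IDF variant, and a pair that always occurs together scores higher on the chi-square test than a pair that never does.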

Page 40: Ontology-based Classification and Faceted Search Interface for APIs

LexOnt Algorithms

Novel algorithm uses external resources like Wikipedia, Wordnet and the constructed ontology to highlight the important terms even more

- Useful for those who are not domain experts but want to understand what the relevant terms of a domain are
- Algorithm for using the External Knowledge Base
  - Extract Wikipedia page for each category and rank top words with TF-IDF
  - If a word or phrase in the API contains any of the top Wikipedia words, label it
  - Find synonymous or related terms to the list of generated terms using Wordnet
  - If a word or phrase in the API contains any of the related terms, label them
  - If any of the generated terms lexically match terms in the ontology, label them using a color code
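
The labelling steps can be sketched as follows. This is a guess at the general shape, not LexOnt's implementation: "contains" is taken to mean substring containment, each term receives a single label, and labelled terms are ranked ahead of unlabelled ones. The term sets are small hand-picked samples, and the names RANK, label_term and rank_terms are hypothetical.

```python
# Assumed ranking order: Wikipedia matches first, then Wordnet-related terms,
# then lexical matches against the ontology, then unlabelled terms.
RANK = {"wiki": 0, "related": 1, "ontology": 2, None: 3}


def label_term(term, wiki_terms, related_terms, ontology_terms):
    """Return the first label that applies to `term`, or None."""
    t = term.lower()
    if any(w in t for w in wiki_terms):
        return "wiki"
    if any(r in t for r in related_terms):
        return "related"
    if t in ontology_terms:
        return "ontology"
    return None


def rank_terms(api_terms, wiki_terms, related_terms, ontology_terms):
    """Label every term, then sort by label rank (stable within each label)."""
    labelled = [
        (t, label_term(t, wiki_terms, related_terms, ontology_terms))
        for t in api_terms
    ]
    return sorted(labelled, key=lambda pair: RANK[pair[1]])


wiki_terms = {"advertising", "brand", "consumer", "advertise"}
related_terms = {"ad", "advertisement", "promotion"}
ontology_terms = set()
api_terms = ["proof", "persona", "ad", "brands", "consumers",
             "advertisers", "audience", "ads"]

for term, label in rank_terms(api_terms, wiki_terms, related_terms, ontology_terms):
    print(term, label)
```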

Page 41: Ontology-based Classification and Faceted Search Interface for APIs

Example of Property Selection from a Social Advertising API

- Top N TF-IDF from Wiki: Advertising, marketing, brand, television, semiotics, advertisement, billboard, radio, product, bowl, sponsor, consumer, advertise, placement, super, logo, commercial, infomercial
- Top N TF-IDF from Wordnet: Ad, advertisement, advertizement, advertising, advertizing, advert, promotion, direct-mail, preview, advertorial, mailer, newspaper-ad, commercial, circular, teaser, top-billing
- Top N TF-IDF from PW Category: Proof, persona, stream, replies, authors, say, hello, ad, brands, social, consumers, advertisers, audience, ads
- Top N TF-IDF ranked based on external KB: Advertisers (wiki), Consumers (wiki), Social (wiki), Brands (wiki), Ads (related), Ad (related), proof, persona, stream, replies, authors, say, hello, audience
- Top N Significant Phrases ranked based on external KB: Stream advertising (wiki), social stream (wiki), say hello, author, replies, google groups, ober, michaels, proof, erik, michaels, persona targeting

Page 42: Ontology-based Classification and Faceted Search Interface for APIs

LexOnt: A semi-automatic ontology creation tool

Page 43: Ontology-based Classification and Faceted Search Interface for APIs

LexOnt Implementation

- LexOnt is implemented as a Protege plugin to enhance the user experience of semi-automated ontology creation
- Four different Java APIs used for the implementation
  - Lingpipe API used for the NLP algorithms to generate TF-IDF terms and Significant Phrases
  - Lucene used for indexing and searching for terms
  - Protege API used for implementing the Protege plugin GUI
  - OWL-API used for ontology generation code

Page 44: Ontology-based Classification and Faceted Search Interface for APIs

LexOnt Results

- Used a PW corpus of ~3000 APIs, totaling about 250 MB of data
- Constructed ontology for 5 categories with the following features:
  - Domain specificity
  - A priori knowledge of domain
  - Number of APIs within the domain
- Tested for four things when evaluating LexOnt
  1. The precision/recall of the TF-IDF term and Significant Phrase generation without external KB information
  2. How helpful the external KB was when choosing terms, measured as the percentage of terms used in the ontology
  3. Whether or not the terms were used in their exact form, similar form or different forms
  4. How quickly an ontology API was constructed by the user

Page 45: Ontology-based Classification and Faceted Search Interface for APIs

LexOnt Results

1. Precision/Recall tests for terms without taking external KB into account
   - 4% precision
   - 28% recall
   Results: Using TF-IDF/Sig Phrases alone is not good enough to determine how terms should be used
2. For categories with well-defined Wikipedia pages, percentage of terms used from external KB was >50%
   Results: Well-defined external KBs made it much easier to quickly assess distinguishing features of a category
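
For reference, precision and recall over suggested terms can be computed as below. The standard set-based definitions are assumed: precision is the share of suggested terms that ended up in the ontology, recall is the share of ontology terms that were suggested. The term lists are made-up toy data and do not correspond to the 4% and 28% figures reported here.

```python
def precision_recall(suggested, used):
    """Set-based precision and recall of suggested terms against terms used."""
    suggested, used = set(suggested), set(used)
    hits = suggested & used
    precision = len(hits) / len(suggested) if suggested else 0.0
    recall = len(hits) / len(used) if used else 0.0
    return precision, recall


# Toy data, not taken from the LexOnt evaluation.
suggested_terms = ["ad", "brand", "stream", "hello"]
ontology_terms = ["ad", "brand", "audience"]
print(precision_recall(suggested_terms, ontology_terms))
```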

Page 46: Ontology-based Classification and Faceted Search Interface for APIs

LexOnt Results

Domain      | Number of APIs | Specifically Defined External KB | A priori knowledge of Domain | % terms used from External KB
Advertising | <100           | √                                | X                            | 50%
Travel      | <100           | √                                |                              | 80%
Real Estate | <100           | √                                | X                            | 100%
Social      | >100           | X                                | √                            | 20%

Page 47: Ontology-based Classification and Faceted Search Interface for APIs

LexOnt Results

3. Tested to see how these terms were actually assigned within the instances
   - Compared matches that were exact, similar or completely different
   - Example: if LexOnt produced a term “mobile” but the actual ontology assignment was “mobile advertising,” this would count as a similar match
   - Percentage of equal and similar matches for API instances averaged over 80%

Results:

- External KB terms were used over 80% of the time
- Percentage of different matches was higher when category was not well-defined such as the Utility category
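
One way to picture the three match types is the sketch below. The slides do not define "similar" precisely, so the rule used here (the two terms share at least one word) is an assumption chosen to agree with the “mobile” versus “mobile advertising” example. The function name is hypothetical.

```python
def match_type(suggested, assigned):
    """Classify a suggested term against the term actually assigned.

    Assumed rule: identical word sequences are "exact", terms sharing at
    least one word are "similar", everything else is "different".
    """
    s = suggested.lower().split()
    a = assigned.lower().split()
    if s == a:
        return "exact"
    if set(s) & set(a):
        return "similar"
    return "different"


print(match_type("mobile", "mobile advertising"))
```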

Page 48: Ontology-based Classification and Faceted Search Interface for APIs

LexOnt Results

4. Speed of ontology construction
   - Before we had the LexOnt tool and worked only with generated TF-IDF/Sig Phrase terms, it took around 15 minutes to construct an API instance and related feature
   - After the completion of LexOnt, this dropped to 2 minutes

Results:

- LexOnt’s user interface and external knowledge base ranking reduced the time for ontology construction by a factor of roughly 7 (from about 15 minutes to 2)

Page 49: Ontology-based Classification and Faceted Search Interface for APIs

Overview

- Motivation
- Background
- Problem
- Solution
- Related Work
- Conclusion & Future Work

Page 50: Ontology-based Classification and Faceted Search Interface for APIs

Related Work

- Most related work involves semi-automated ontology creation for
  - Pure hierarchical ontologies
  - Domains that already have some kind of structural description
- Machine learning and NLP techniques used
  - On text corpora
  - Alongside existing structured or annotated external knowledge base
- The work closest to LexOnt’s
  - Find property relationships between concepts
  - Use unstructured external knowledge bases

Page 51: Ontology-based Classification and Faceted Search Interface for APIs

Related Work

System    | Corpus       | Ontology Suggestions          | External Knowledge
Text2Onto | annotated    | Probabilistic Ontology Models | None
OntoLT    | rule-based   | Classes and properties        | None
OntoLearn | unstructured | Hierarchical classification   | Definitions, Synonyms
LexOnt    | unstructured | Properties                    | Wikipedia, Wordnet, Generated Ontology

Page 52: Ontology-based Classification and Faceted Search Interface for APIs

Conclusion

- LexOnt has been shown to be an effective tool for semi-automated ontology creation
- From our initial results, we have determined that using an external knowledge base to filter out generated terms and phrases
  - Increases the accuracy of the feature selection
  - Helps in understanding the common terms within a corpus

Page 53: Ontology-based Classification and Faceted Search Interface for APIs

LexOnt Publications

- Knarig Arabshian and Peter Danielsen, Ontology-based Faceted Search Interface for APIs (In Journal Submission).
- Peter Danielsen and Knarig Arabshian, User Interface Design in Semi-Automated Ontology Construction, International Conference on Web Services (ICWS 2013), Santa Clara, CA, June 2013.
- Knarig Arabshian, Peter Danielsen and Sadia Afroz, LexOnt: Semi-Automatic Ontology Creation Tool for Programmable Web, AAAI 2012 Spring Symposium on Intelligent Web Services Meet Social Computing, Palo Alto, CA, March 2012.
- Knarig Arabshian and Peter Danielsen, Semi-automated Ontology Creation for High-level Service Classification, 7th International Conference on Semantics, Knowledge and Grids (SKG 2011), Beijing, China, Oct 2011.