
Page 1: Blaz Fortuna, Marko Grobelnik, Dunja Mladenic Jozef Stefan Institute  ONTOGEN SEMI-AUTOMATIC ONTOLOGY EDITOR

Blaz Fortuna, Marko Grobelnik, Dunja Mladenic, Jozef Stefan Institute

http://ontogen.ijs.si

ONTOGEN SEMI-AUTOMATIC ONTOLOGY EDITOR

Page 2: Outline

- Motivation
- Functionality
- Conclusion

HCII2007, July 26th


Blaz Fortuna, Jozef Stefan Institute, Slovenia

Page 3: Motivation

Page 4: What is ontology?

An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts.

It generally consists of:
- Classes: sets, collections, or types of objects
- Instances: the basic or "ground level" objects
- Relations: ways in which objects can be related to one another

It can be used:
- as a schema for a knowledge management system,
- to reason about the objects within that domain, etc.
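The three building blocks of an ontology can be pictured as a small data structure. The sketch below is a hypothetical Python model (the `Ontology` class, its methods and the `subClassOf` relation name are illustrative assumptions, not OntoGen's storage format): classes are names, instances are tagged with the classes they belong to, and relations are stored as (subject, relation, object) triples. The `is_a` method shows the simplest kind of reasoning about domain objects, following `subClassOf` links transitively.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ontology:
    """Hypothetical minimal ontology: classes, instances and relations."""
    classes: set[str] = field(default_factory=set)
    # instance name -> names of the classes it directly belongs to
    instances: dict[str, set[str]] = field(default_factory=dict)
    # relations stored as (subject, relation name, object) triples
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_class(self, name: str, parent: Optional[str] = None) -> None:
        """Add a class, optionally as a sub-class of an existing or new parent."""
        self.classes.add(name)
        if parent is not None:
            self.classes.add(parent)
            self.relations.add((name, "subClassOf", parent))

    def add_instance(self, name: str, cls: str) -> None:
        """Attach an instance ("ground level" object) to a class."""
        self.classes.add(cls)
        self.instances.setdefault(name, set()).add(cls)

    def superclasses(self, cls: str) -> set[str]:
        """All ancestors of cls, found by following subClassOf transitively."""
        result: set[str] = set()
        stack = [cls]
        while stack:
            current = stack.pop()
            for subject, relation, obj in self.relations:
                if subject == current and relation == "subClassOf" and obj not in result:
                    result.add(obj)
                    stack.append(obj)
        return result

    def is_a(self, instance: str, cls: str) -> bool:
        """True if the instance belongs to cls directly or through a sub-class."""
        for direct in self.instances.get(instance, set()):
            if direct == cls or cls in self.superclasses(direct):
                return True
        return False
```

For example, after `add_class("Media company", parent="Company")` and `add_instance("The Washington Post", "Media company")`, the call `is_a("The Washington Post", "Company")` returns `True`, although that fact was never stated directly.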


Page 5: Sample Ontology

Page 6: Creating Ontology

An ontology is normally designed by knowledge engineers using ontology editors: Protégé, OntoStudio, …

- Domain experts are needed to help the knowledge engineer understand the domain.
- Ontology editors are not aware of the ontology’s domain.

Our goal is to make the ontology editor easy to use and domain-aware, so that it can be used by domain experts.
- This reduces the need for a knowledge engineer.
- This is done through the use of text mining and machine learning.

In this presentation we focus on the construction of topic ontologies.

[Figure: Creating Ontology. Labels: Ontology Editor; Domain Expert; Knowledge Engineer. Sample company descriptions:]

Xerox: Xerox Corporation is a technology and services enterprise engaged in developing, manufacturing, marketing, servicing and financing a portfolio of document equipment, software, solutions and services. It manages its business in four segments: Production, Office, Developing Markets Operations (DMO) and Other. The Production segment includes black and white products, which operate at speeds over 90 pages per minute …

Yahoo!: Yahoo! Inc. is a provider of Internet products and services to consumers and businesses through the Yahoo! Network, its worldwide network of online properties. The Company's properties and services for consumers and businesses reside in four areas: Search and Marketplace, …

The Washington Post: Company's principal business activities consist of newspaper publishing (principally The Washington Post), television broadcasting (through the ownership and operation of six television broadcast stations), the ownership and operation of cable television systems, magazine publishing (principally Newsweek magazine), and (through its Kaplan subsidiary) the provision of educational services. …

Page 7: How does it work?

- OntoGen suggests concepts.
  - Suggestions are generated automatically:
    - from the text corpus, by clustering similar documents,
    - based on a user query,
    - through a map of the text corpus.
- The user selects appropriate suggestions and adds them to the ontology.
  - OntoGen helps in deciding which suggestions to include:
    - by extracting the main keywords from the documents,
    - with ontology and concept visualizations,
    - by listing the documents behind each concept.
- Behind each concept there is a set of documents.
  - Documents are automatically assigned to concepts.
  - Document assignments can be edited manually.
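The slide does not say which clustering method produces the suggestions. As an illustration only, the sketch below assumes bag-of-words TF-IDF vectors and k-means with cosine similarity, seeded deterministically by farthest-first selection; each resulting cluster of document indices would be one suggested sub-concept. The function names (`tfidf_vectors`, `suggest_subconcepts`, etc.) are hypothetical, and this is not claimed to be OntoGen's implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """L2-normalised bag-of-words TF-IDF vectors as {term: weight} dicts."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        vec = {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse vectors (the cosine when both are normalised)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Normalised sum (same direction as the mean) of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def suggest_subconcepts(docs, k, iterations=10):
    """Cluster docs (k <= len(docs)) with cosine k-means; return index lists."""
    vectors = tfidf_vectors(docs)
    # farthest-first seeding: start with doc 0, then repeatedly add the
    # document least similar to every seed chosen so far
    seeds = [0]
    while len(seeds) < k:
        candidates = (i for i in range(len(vectors)) if i not in seeds)
        seeds.append(min(candidates,
                         key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    centroids = [vectors[s] for s in seeds]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for i, v in enumerate(vectors):
            best = max(range(k), key=lambda c: cosine(v, centroids[c]))
            clusters[best].append(i)
        # recompute centroids; an empty cluster keeps its previous centroid
        centroids = [centroid([vectors[i] for i in c]) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return clusters
```

With the four toy documents `"oil prices rise crude opec"`, `"crude oil opec output cut"`, `"football match goal league"` and `"league football season goal striker"` and `k=2`, the function returns `[[0, 1], [2, 3]]`: one energy-related and one sport-related suggestion.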


Page 8: Example

[Figure labels: Domain; Text corpus; Ontology; Concept A; Concept B; Concept C]


Page 9: Functionality

Page 10: Main Features

- Interactive user interface
  - The user can interact in real time with the integrated machine learning and text mining methods.
- Concept discovery methods:
  - Unsupervised: the system provides suggestions
  - Supervised: concept learning
  - Concept visualization
- Methods that help the user understand the discovered concepts:
  - Keyword extraction: generates a list of characteristic keywords for a given concept
  - Concept visualization: creates a map of the documents of a given concept
    - Also available as a separate tool named Document Atlas: http://docatlas.ijs.si
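Keyword extraction can be done in several ways, and the slide does not specify OntoGen's. The sketch below assumes the simplest variant: represent documents as TF-IDF vectors, sum the vectors of the documents behind a concept (which points in the same direction as their centroid) and report the highest-weighted terms. `concept_keywords` is a hypothetical helper written for illustration, not part of any OntoGen API.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """L2-normalised bag-of-words TF-IDF vectors as {term: weight} dicts."""
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter(t for tokens in tokenised for t in set(tokens))
    vectors = []
    for tokens in tokenised:
        vec = {t: c * math.log(len(docs) / df[t]) for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def concept_keywords(docs, concept_docs, n=3):
    """Top-n terms of the summed TF-IDF vectors of a concept's documents.

    docs: the whole corpus; concept_docs: indices of the documents behind
    the concept (hypothetical representation of a concept).
    """
    vectors = tfidf_vectors(docs)
    total = Counter()
    for i in concept_docs:
        total.update(vectors[i])
    # sort by descending weight; ties are broken alphabetically for stable output
    ranked = sorted(total.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]
```

For the corpus `["oil oil crude price", "oil oil crude opec", "goal goal league match", "goal goal league striker"]`, `concept_keywords(docs, [0, 1], n=1)` returns `['oil']` and `concept_keywords(docs, [2, 3], n=1)` returns `['goal']`. Terms that occur in every document get an IDF of zero and therefore never appear as keywords, which is usually the desired behaviour.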


Page 11: Main view

[Screenshot labels: Concept hierarchy; List of suggested sub-concepts; Ontology visualization; Selected concept]

Page 12: Concept suggestion

[Screenshot labels: Selected concept; Suggested sub-concepts; Add new concept; New concept]

Page 13: Personalized suggestions

[Figure: Topics view and Countries view, with sample documents:]

UK takeovers and mergers: The following are additions and deletions to the takeovers and mergers list for the week beginning August 19, as provided by the Takeover …

Lloyd’s CEO questioned in recovery suit in U.S.: Ronald Sandler, chief executive of Lloyd's of London, on Tuesday underwent a second day of court interrogation about …

Page 14: Concept learning

[Screenshot labels: Query; New Concept; Finish]

Page 15: Concept’s instances visualization

- Instances are visualized as points on a 2D map.
  - The distance between two instances on the map reflects their content similarity: the more similar two instances are, the closer together they appear.
- Characteristic keywords are shown for all parts of the map.
- The user can select groups of instances on the map to create sub-concepts.
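One common way to build such a map is multidimensional scaling (MDS): given pairwise dissimilarities δ_ij between instances (for documents, for instance 1 minus the cosine similarity of their TF-IDF vectors), find 2D points p_i whose distances approximate them by minimising the raw stress S = Σ_{i<j} (‖p_i − p_j‖ − δ_ij)². The sketch below is a hypothetical, dependency-free version that minimises S by plain gradient descent from a seeded random start; `layout_2d` and its parameters are assumptions, and this is not necessarily how OntoGen or Document Atlas compute their maps.

```python
import math
import random


def layout_2d(dissimilarity, iterations=500, learning_rate=0.05, seed=0):
    """Place n items on a 2D map so that map distances approximate the given
    n x n dissimilarity matrix (metric MDS by gradient descent on raw stress)."""
    n = len(dissimilarity)
    rng = random.Random(seed)  # seeded so the layout is reproducible
    points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(iterations):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                # derivative of (dist - target)^2 with respect to point i
                coeff = 2 * (dist - dissimilarity[i][j]) / dist
                grads[i][0] += coeff * dx
                grads[i][1] += coeff * dy
        # move every point against its gradient: too-far pairs attract,
        # too-close pairs repel
        for i in range(n):
            points[i][0] -= learning_rate * grads[i][0]
            points[i][1] -= learning_rate * grads[i][1]
    return points
```

With two tight groups of instances, e.g. dissimilarity 0.2 within each group and 1.0 between groups, the layout places members of the same group closer to each other than to members of the other group, which is the property the map needs for selecting groups of instances as sub-concepts.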


Page 16: Concept management

[Screenshot labels: Concept’s details; Concept’s instance management; Selected concept; Keywords; Selected instance]

Page 17: Adding new documents to ontology

[Figure: Adding new documents to ontology. Screenshot labels: New documents; Selected document; Classification of selected document; Content of selected document]
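When new documents are added, each one has to be placed among the existing concepts. The slide only shows that the selected document receives a classification; as an illustrative assumption, the sketch below ranks concepts with a nearest-centroid classifier, comparing the new document's bag-of-words vector with each concept's centroid by cosine similarity. `classify` and the toy concept data are hypothetical, not OntoGen's classifier.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts (assumed document representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(new_doc, concepts):
    """Rank concepts for a new document.

    concepts: {concept name: list of document texts behind the concept}.
    Returns (concept name, score) pairs, best match first.
    """
    query = bow(new_doc)
    scores = {}
    for name, docs in concepts.items():
        # summed counts point in the same direction as the mean vector,
        # so they give the same cosine as the true centroid
        centroid = Counter()
        for d in docs:
            centroid.update(bow(d))
        scores[name] = cosine(query, centroid)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With `concepts = {"Energy": ["oil prices rise", "crude oil output"], "Sport": ["football league goal", "league match goal"]}`, `classify("oil output falls", concepts)` ranks `"Energy"` first with a positive score and `"Sport"` second with a score of `0.0`, since the new document shares no terms with the sport documents. In an editor, the top-ranked concepts would be offered to the user, who can still edit the assignment manually.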

Page 18: Conclusions

Page 19: Evaluation

- The first prototype was successfully used in several commercial projects:
  - It was applied in multiple domains: business, legislation and digital libraries.
  - The users were always domain experts with limited knowledge of and experience with ontology construction / knowledge engineering.
  - Valuable data from the first trials was used as input for the interface design of the second prototype (the one presented here).
- Feedback from the users of the second prototype:
  - The main impression was that the tool saves time and is especially useful when working with large collections of documents.
  - Among the main disadvantages were abstraction and an unattractive look.
  - Many users use the program to explore the data.

Page 20: Future work

- Tools for suggesting and learning more complex relations
- Extended support for collaborative editing of ontologies
- Easier input of background knowledge
- Improvement of the user interface based on feedback from user trials and real-world users

Page 21: Questions? Comments?

Thank you for listening!


http://ontogen.ijs.si