Automatically Building Concept Structures and Displaying Concept Trails for the Use in Brainstorming Sessions and Content Management Systems

Chris Biemann, Karsten Böhm, Gerhard Heyer, Ronny Melz, University of Leipzig

I2CS 2004, Guadalajara, Mexico

06-23-2004


2

Support for Creativity

Using Knowledge: Visualization of terms and relations. Filters define views on semantically relevant contents and structures.

Processing Knowledge: Generation of semantic maps and associations in cooperative teamwork meetings.

Acquisition of Knowledge: Gathering information from structured and unstructured texts, databases, document collections, the web, etc.

[Figure: example concept map from the investment domain, organized under the top-level concepts Markt (market), Produktgattung (product type) and Branche (industry sector). Its nodes name industry sectors (e.g. cars, technology, telecommunications, banks, insurance, energy), markets and regions (e.g. Europe, Asia, Germany, EU candidate countries, emerging markets, EuroStoxx 50) and investment products (e.g. equity funds, bond funds, hedge funds, real estate funds, derivatives, savings accounts, life insurance).]

3

Goal 1: Computer-aided Associating

Software realizes:
• A protocol (minutes) function that displays the identified keywords
• Addition of associations from the database
• Display of keywords arranged to reflect their semantic similarity

Desired effects:
• Users can easily recall the session later
• During the session, associations remind users of terms they might otherwise have forgotten
• The weight and relatedness of the different topics in a session become visible

4

Goal 2: Semantic Map and Red Thread

Software realizes:
• Calculation and visualization of large document collections by means of their important terms (keywords)
• Positioning of terms that reflects their semantic closeness
• Drawing small documents into the semantic map: red thread functionality

Desired effects:
• A fixed map helps users orient themselves in the contents of the document collection
• Important terms can be surveyed at a glance
• Red thread functionality can be used for "fast reading"

5

Data Sources
• Projekt Deutscher Wortschatz: word list and co-occurrences
  - for associations
  - as a reference corpus for the semantic map
• Manual annotation: typed (coloured) edges and nodes
  - semantic primitives
  - semantic relations

6

Calculating Associations: Statistical Co-occurrences

• Co-occurrence: occurrence of two or more words within a well-defined unit of information (sentence, nearest neighbors)

• Significant co-occurrences reflect relations between words
• Significance measure (log-likelihood)

• This measure defines the association degree between all words. High degrees result in edges in the semantic map

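$$\mathrm{sig}(A,B) = \frac{x - k\log x + \log k!}{\log n}, \qquad x = \frac{ab}{n},$$

with n the number of sentences, a and b the numbers of sentences containing A and B, respectively, and k the number of sentences containing both A and B.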
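The significance formula can be evaluated directly from sentence counts. The following Python sketch assumes that sentences are already tokenized and uses the natural logarithm; the function names are hypothetical and not part of the Wortschatz software, and `math.lgamma(k + 1)` stands in for log k! so that large counts do not overflow. Keeping only pairs that co-occur more often than expected is an added assumption, because the formula also grows for pairs that co-occur much less often than chance would predict.

```python
import math
from collections import Counter
from itertools import combinations


def sig(a: int, b: int, k: int, n: int) -> float:
    """Co-occurrence significance of two words A and B.

    a, b: number of sentences containing A and B
    k:    number of sentences containing both A and B
    n:    total number of sentences (n >= 2)
    """
    x = a * b / n  # expected joint count if A and B were independent
    # math.lgamma(k + 1) equals log(k!) without computing the factorial
    return (x - k * math.log(x) + math.lgamma(k + 1)) / math.log(n)


def significant_cooccurrences(sentences, min_k=2):
    """Score every word pair that shares at least `min_k` sentences.

    Assumption: only pairs with more joint occurrences than expected
    (k > x) are kept.
    """
    n = len(sentences)
    word_freq = Counter()
    pair_freq = Counter()
    for sentence in sentences:
        words = set(sentence)  # count each word once per sentence
        word_freq.update(words)
        # sorted() gives every unordered pair one canonical key
        pair_freq.update(combinations(sorted(words), 2))
    scores = {}
    for (w1, w2), k in pair_freq.items():
        a, b = word_freq[w1], word_freq[w2]
        if k >= min_k and k > a * b / n:
            scores[(w1, w2)] = sig(a, b, k, n)
    return scores
```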

7

Example of Co-occurrences

Significant co-occurrences of Guadalajara:

Camarena (194), Mexico (104), Mexican (58), kidnapped (43), Zavala (40), ranch (40), Avelar (37), abducted (35), Alvarez (33), drug (33), Camarena's (32), pilot (32), Caro (30), Enrique (29), agent (27), Enforcement (25), Quintero (25), gynecologist (23), tortured (23), Jalisco (22), DEA (21), Drug (21), miles (21), torture (21), Alfredo (20), Machain (18), Feb (16), bodies (16), southeast (16), Monterrey (15), Rafael (15), found (15), Radelat (14), Paso (13), consulate (13), Administration (12), Salazar (12), body (12), killed (12), outside (12), Vasquez (11), Verdugo (11), bullet-riddled (11), murder (11), El (10), Humberto (10), Lopez (10), lord (10), Felix (9), Gallardo (9), Hernandez (9), Mexico's (9), arrested (9), cartel (9), Alberto (8), City (8), March (8), Zuno (8), city (8), homicide (8), indictment (8), kidnapping (8), Caro-Quintero (7), February (7), Tijuana (7), Zuno-Arce (7), buried (7), marijuana (7), racketeering (7), slayings (7), 31-year-old (6), April (6), Consulate (6), Culiacan (6), Javier (6), Machain's (6), agents (6), office (6)

Significant left Neighbours of Guadalajara:

outside (12), near (5)

Significant right Neighbours of Guadalajara:

gynecologist (27), office (8), street (8), home (6), Haggadah (5), drug (5)
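The left and right neighbour lists restrict co-occurrence to directly adjacent words. The slides do not say how these counts are normalized, so the sketch below rests on an assumption: it counts ordered word pairs (bigrams), takes n as the total number of such pairs, and applies the same Poisson-style significance formula. The function name `neighbour_significance` and the `min_k` cut-off are hypothetical.

```python
import math
from collections import Counter


def neighbour_significance(sentences, target, side="right", min_k=2):
    """Rank the direct left or right neighbours of `target`.

    Assumption: n is the number of adjacent word pairs in the corpus,
    a and b count how often each word fills its position in a pair,
    and k counts the ordered pair itself.
    """
    pairs = Counter()
    for tokens in sentences:
        pairs.update(zip(tokens, tokens[1:]))  # ordered (left, right) pairs
    n = sum(pairs.values())
    first = Counter()   # how often a word is the left member of a pair
    second = Counter()  # how often a word is the right member of a pair
    for (left, right), count in pairs.items():
        first[left] += count
        second[right] += count
    scores = {}
    for (left, right), k in pairs.items():
        if side == "right" and left == target:
            other, a, b = right, first[target], second[right]
        elif side == "left" and right == target:
            other, a, b = left, second[target], first[left]
        else:
            continue
        x = a * b / n  # expected count of the ordered pair
        if k >= min_k and k > x:
            scores[other] = (x - k * math.log(x) + math.lgamma(k + 1)) / math.log(n)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```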

8

Calculating Semantic Maps

Requires: a document collection
• Calculate co-occurrences, and determine keywords by differential frequency analysis: important words have a much higher relative frequency in the document collection than in a large reference corpus
• Take the highest-ranked words from the differential frequency analysis as nodes
• Add words that co-occur highly significantly with existing nodes as further nodes
• Remove stopwords (function words, determiners, ...)
• Insert edges between nodes that have a high association degree, measured by co-occurrence significance
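The steps above can be sketched as two small Python functions. Several details are assumptions not stated in the slides: the keyword score is the ratio of relative frequencies, words missing from the reference corpus are counted as occurring once, significances arrive as a dictionary keyed by `frozenset` word pairs, and `expand_threshold` and `edge_threshold` are illustrative parameters. Both function names are hypothetical.

```python
from collections import Counter


def differential_keywords(doc_tokens, ref_counts, ref_total, top_n=10):
    """Rank words by how much more frequent they are in the document
    collection than in the reference corpus.

    Assumption: score = ratio of relative frequencies; a word missing from
    the reference corpus is treated as if it occurred there once.
    """
    doc_counts = Counter(doc_tokens)
    doc_total = len(doc_tokens)

    def ratio(word):
        ref_rel = max(ref_counts.get(word, 0), 1) / ref_total
        return (doc_counts[word] / doc_total) / ref_rel

    return sorted(doc_counts, key=ratio, reverse=True)[:top_n]


def build_semantic_map(keywords, sig, stopwords, expand_threshold, edge_threshold):
    """Build nodes and edges from keywords and pairwise significances.

    `sig` maps frozenset({w1, w2}) to a significance value (hypothetical
    input format); `stopwords` is a set.
    """
    seeds = {w for w in keywords if w not in stopwords}
    nodes = set(seeds)
    # Add words that co-occur highly significantly with a keyword node.
    for pair, score in sig.items():
        if score >= expand_threshold and pair & seeds:
            nodes |= pair - stopwords
    # Insert edges between nodes whose association degree is high enough.
    edges = {pair for pair, score in sig.items()
             if score >= edge_threshold and pair <= nodes}
    return nodes, edges
```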

9

Positioning in Semantic Maps

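Force-directed: nodes and edges are placed on a plane at random and then driven to an equilibrium that minimizes the energy of the system; in the usual spring model, edges pull connected nodes together while all nodes repel one another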
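Force-directed layouts come in several variants, and the slides do not say which one TouchGraph uses. As an illustration, the sketch below implements the Fruchterman-Reingold scheme in plain Python: every node pair repels, every edge attracts, and a decreasing "temperature" caps each move so the layout settles. The function name, iteration count, cooling factor and plane size are arbitrary choices.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, width=1.0, seed=0):
    """Fruchterman-Reingold-style spring layout.

    Every pair of nodes repels with force k^2/d, every edge attracts its
    endpoints with force d^2/k, and each node moves along its net force
    by at most the current temperature, which cools each iteration.
    """
    nodes = list(nodes)
    rng = random.Random(seed)
    # Random initial placement on a width x width plane.
    pos = {v: [rng.uniform(0, width), rng.uniform(0, width)] for v in nodes}
    k = width / math.sqrt(len(nodes))  # ideal edge length
    temperature = width / 10
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along every edge.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node, capping the step at the current temperature.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temperature)
            pos[v][0] += dx / d * step
            pos[v][1] += dy / d * step
        temperature *= 0.97  # cooling lets the system settle
    return {v: (x, y) for v, (x, y) in pos.items()}
```

For two connected nodes, attraction d²/k and repulsion k²/d balance at d = k, so the pair settles at the ideal edge length.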

10

Domain Adjustment

Wortschatz database (very large corpus)

Community knowledge / domain knowledge: generation of a semantic map by processing domain-relevant documents and incorporating existing ontologies.

Session knowledge / project knowledge: enrichment of the database by incorporating task-specific knowledge and know-how.

[Diagram: the three knowledge sources partially overlap.]

11

Visualization

Extension of TouchGraph (www.touchgraph.com):
• Force-directed model for positioning
• Label fill colours for the runtime type (keyword, associated, red thread)
• Label edge (border) colours for semantic primitives
• Edge colours for semantic relations
• Nodes can be displayed as labels or as dots

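[Figure legend: edge colours mark the Is-A relation and the co-hyponymy relation; white label: keyword entered by the user; grey label: association from the database; label edge colours mark semantic primitives such as Noun and organisation.]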

12

Zooms
• Conceptual zoom: show nodes as word labels or as dots
• Granularity: reduce the number of visible nodes
• Optical zoom: size of the window relative to the total size of the map
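The slides do not specify how nodes are chosen when granularity is reduced. One plausible rule, used here purely as an assumption, is to keep the best-connected nodes and the edges among them; the function name `reduce_granularity` is hypothetical.

```python
def reduce_granularity(nodes, edges, max_nodes):
    """Keep the `max_nodes` nodes with the highest degree and the edges
    among them (ranking by degree is an illustrative assumption)."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # sorted() is stable, so ties keep their input order
    ranked = sorted(nodes, key=lambda v: degree[v], reverse=True)
    keep = set(ranked[:max_nodes])
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, kept_edges
```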

13

Adding Nodes in Association Mode
• User keywords are added to the graph. They fade if they do not get connected within a certain time.
• Grey words are added if they are associated with at least two user keywords

Example input (translated from German):

Let's talk about Mexico.

That is a country in Central America.

Mexicans wear sombreros; those are hats that protect you from the sun.

I'd like a hat like that too!
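The two association-mode rules can be sketched as a small Python class. The association table, the 60-second fade time and the notion of "connected" (linked to another visible keyword or grey word) are assumptions for illustration; associations are assumed to be symmetric, and the class and helper names are hypothetical.

```python
def symmetric(pairs):
    """Build a symmetric word -> set-of-associations table from word pairs."""
    table = {}
    for a, b in pairs:
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)
    return table


class AssociationMode:
    def __init__(self, associations, fade_after=60.0):
        self.associations = associations  # word -> set of associated words
        self.fade_after = fade_after      # seconds a keyword may stay unconnected
        self.keywords = {}                # user keyword -> time it was added
        self.grey = set()                 # associated words shown in grey

    def add_keyword(self, word, now):
        self.keywords[word] = now
        self._update_grey()
        self._fade(now)

    def _update_grey(self):
        # A database word becomes grey once at least two current user
        # keywords are associated with it.
        counts = {}
        for keyword in self.keywords:
            for word in self.associations.get(keyword, ()):
                if word not in self.keywords:
                    counts[word] = counts.get(word, 0) + 1
        self.grey = {word for word, c in counts.items() if c >= 2}

    def _connected(self, word):
        # Connected = associated with another visible keyword or grey word.
        visible = (set(self.keywords) | self.grey) - {word}
        return bool(self.associations.get(word, set()) & visible)

    def _fade(self, now):
        stale = [w for w, added in self.keywords.items()
                 if now - added > self.fade_after and not self._connected(w)]
        for w in stale:
            del self.keywords[w]
        self._update_grey()
```

For example, if `Mexico` and `country` are both associated with `Central America`, adding those two keywords makes `Central America` a grey word, while a keyword such as `sombrero` whose only association never becomes visible is dropped once the fade time has passed.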

14

... it knows lots of countries

15

Semantic Map Example

16

Red Thread Functionality
• Given: a semantic map and additional input
• Terms from the additional input that are found in the semantic map are coloured red and connected in the order in which they occur
  - red connection: the edge already existed in the semantic map
  - yellow connection: the edge is new
• Long-range yellow edges visualize topic shifts

[Figure: red thread example; visible labels include Georgia, Afghanistan and Iraq]
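A minimal sketch of the red-thread rule follows, assuming the map is given as a set of node labels plus a list of edges and the additional input is already tokenized. Skipping immediate repetitions of the same term is an added assumption, and the function name `red_thread` is hypothetical.

```python
def red_thread(map_nodes, map_edges, tokens):
    """Return the red nodes (in order of occurrence) and the coloured
    connections between consecutive red nodes.

    map_nodes: set of terms in the semantic map
    map_edges: iterable of (term, term) pairs already present in the map
    tokens:    the additional input as a list of terms
    """
    existing = {frozenset(edge) for edge in map_edges}
    path = []
    for term in tokens:
        # Keep only terms found in the map; skip immediate repetitions.
        if term in map_nodes and (not path or path[-1] != term):
            path.append(term)
    connections = []
    for a, b in zip(path, path[1:]):
        # red: edge already in the map; yellow: new edge
        colour = "red" if frozenset((a, b)) in existing else "yellow"
        connections.append((a, b, colour))
    return path, connections
```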

17

SemanticTalk GUI

[Screenshot labels: zoom rulers, local context window, topic survey window]

18

Embedding in the System
• Implementation as a Java servlet on the Tomcat web server
• MySQL database for graphs and associations
• Linguatec VoicePro 10: interface for speech recognition
• Several (speech recognition) clients can be connected via LAN

19

Interfaces
• Import/Export
  - various formats for text files
  - XML/RDF/RDB for maps
  - PNG for maps

The results obtained with SemanticTalk can be saved, loaded and exported to other tools for further processing.

• Retrieval:
  - words (nodes) with links to occurrences in the document collection
  - associations (edges) with links to occurrences in the document collection
  - explicit links, e.g. pictures for words
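The slides do not describe the RDF vocabulary SemanticTalk exports. The sketch below only illustrates the idea of exporting a map as RDF, using the N-Triples serialization because it is line-based and needs no library; the namespace `http://example.org/semanticmap/`, the property name `associatedWith` and the helper names are hypothetical, while `rdfs:label` is the standard RDF Schema label property.

```python
from urllib.parse import quote

BASE = "http://example.org/semanticmap/"  # hypothetical namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def term_iri(word):
    # Percent-encode the word so spaces and umlauts yield a valid IRI.
    return f"<{BASE}term/{quote(word, safe='')}>"


def literal(text):
    # Escape backslashes, quotes and newlines as N-Triples requires.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(nodes, edges, relation="associatedWith"):
    """One label triple per node, one relation triple per edge."""
    lines = [f"{term_iri(w)} <{RDFS_LABEL}> {literal(w)} ." for w in sorted(nodes)]
    lines += [f"{term_iri(a)} <{BASE}{relation}> {term_iri(b)} ." for a, b in edges]
    return "\n".join(lines)
```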

20

Further Processing of Net Topology Structures

[Diagram labels: Semantic Map; Exchange format (e.g. RDF); Transformation into application models; Resource model; Process model; Product model; Variants]

21

Questions?

THANK YOU!