1
Automatically Building Concept Structures and Displaying Concept Trails for the Use in
Brainstorming Sessions and Content Management Systems
Chris Biemann, Karsten Böhm, Gerhard Heyer, Ronny Melz University of Leipzig
I2CS 2004 – Guadalajara - Mexico
06-23-2004
2
Support for Creativity
Using Knowledge: Visualization of terms and relations. Filters define views on semantically relevant contents and structures.
Processing Knowledge: Generation of semantic maps and associations in cooperative teamwork meetings.
Acquisition of Knowledge: Gathering information from structured and unstructured texts, databases, document collections, the web, etc.
[Figure: example concept structure for a financial domain with three top-level concepts. Market (Markt): e.g. Europe, America, Asia, South America, North America, Germany, EU candidates, France, Singapore, Japan, Emerging Markets, EuroStoxx 50, Global, Poland, Argentina. Product category (Produktgattung): e.g. securities, shares, equity funds, bond funds, mixed funds, hedge funds, real estate funds, funds, futures, derivatives, warrants, savings bonds, savings books, life insurance, insurance, foreign exchange, aircraft, ship, real estate and tangible asset participations. Industry sector (Branche): e.g. cars, technology, telecommunications, software, consumer goods, transport, logistics, media, insurance, banks, construction, oil, chemicals, pharma, energy, Internet.]
3
Goal 1: Computer-aided Associating
Software realizes:
• Protocol (minute-taking) function by displaying identified keywords
• Adding associations from the database
• Displaying keywords so that they reflect semantic similarity
Desired effects:
• Users can easily recall the session later
• During the session, associations remind users of terms they might otherwise have forgotten
• The weight and the relatedness of the different topics in a session become visible
4
Goal 2: Semantic Map and Red Thread
Software realizes:
• Calculation and visualization of large document collections by means of important terms (keywords)
• Positioning of terms reflects semantic closeness
• Small documents can be drawn into the semantic map: red thread functionality
Desired effects:
• A fixed map provides orientation in the contents of the document collection
• Important terms can be surveyed quickly
• Red thread functionality can be used for "fast reading"
5
Data Sources
• Projekt Deutscher Wortschatz: word list and co-occurrences
- for associations
- as a reference corpus for the semantic map
• Manual annotation: typed (coloured) edges and nodes
- Semantic primitives
- Semantic relations
6
Calculating Associations: Statistical Co-occurrences
• Co-occurrence: occurrence of two or more words within a well-defined unit of information (sentence, nearest neighbors)
• Significant co-occurrences reflect relations between words
• Significance measure (log-likelihood)
• This measure defines the association degree between all words. High degrees result in edges in the semantic map
$$\mathrm{sig}(A, B) = x - k \log x + \log k!$$
with $n$ the number of sentences and $x = \frac{ab}{n}$.
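The significance measure above, sig(A, B) = x - k log x + log k! with x = ab/n, can be sketched in a few lines of Python. The slide does not define a, b and k; the sketch assumes that a and b are the numbers of sentences containing A and B respectively, and that k is the number of sentences containing both. The function name is hypothetical.

```python
import math


def cooccurrence_significance(a: int, b: int, k: int, n: int) -> float:
    """Return sig(A, B) = x - k*log(x) + log(k!) with x = a*b/n.

    Assumed meaning of the symbols (not stated on the slide):
    a, b -- number of sentences containing word A / word B
    k    -- number of sentences containing both A and B
    n    -- total number of sentences in the corpus
    """
    if a <= 0 or b <= 0 or n <= 0 or k < 0:
        raise ValueError("a, b, n must be positive and k non-negative")
    x = a * b / n
    # math.lgamma(k + 1) equals log(k!) and avoids computing huge factorials.
    return x - k * math.log(x) + math.lgamma(k + 1)


# Two words that each occur in 10 of 100 sentences and share 5 of them.
print(cooccurrence_significance(a=10, b=10, k=5, n=100))
```

With the counts held fixed, a larger number of joint sentences k yields a larger value, which is what lets a threshold on this score select the edges of the semantic map.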
7
Example for Co-occurrences
Significant Co-occurrences of Guadalajara:
Camarena (194), Mexico (104), Mexican (58), kidnapped (43), Zavala (40), ranch (40), Avelar (37), abducted (35), Alvarez (33), drug (33), Camarena's (32), pilot (32), Caro (30), Enrique (29), agent (27), Enforcement (25), Quintero (25), gynecologist (23), tortured (23), Jalisco (22), DEA (21), Drug (21), miles (21), torture (21), Alfredo (20), Machain (18), Feb (16), bodies (16), southeast (16), Monterrey (15), Rafael (15), found (15), Radelat (14), Paso (13), consulate (13), Administration (12), Salazar (12), body (12), killed (12), outside (12), Vasquez (11), Verdugo (11), bullet-riddled (11), murder (11), El (10), Humberto (10), Lopez (10), lord (10), Felix (9), Gallardo (9), Hernandez (9), Mexico's (9), arrested (9), cartel (9), Alberto (8), City (8), March (8), Zuno (8), city (8), homicide (8), indictment (8), kidnapping (8), Caro-Quintero (7), February (7), Tijuana (7), Zuno-Arce (7), buried (7), marijuana (7), racketeering (7), slayings (7), 31-year-old (6), April (6), Consulate (6), Culiacan (6), Javier (6), Machain's (6), agents (6), office (6)
Significant left Neighbours of Guadalajara:
outside (12), near (5)
Significant right Neighbours of Guadalajara:
gynecologist (27), office (8), street (8), home (6), Haggadah (5), drug (5)
8
Calculating Semantic Maps
Requires: document collection
• Calculate co-occurrences and keywords by differential frequency analysis: important words are much more frequent in the document collection than in a large reference corpus
• Take the highest-ranked words from the differential frequency analysis as nodes
• Take highly significant co-occurrences of existing nodes as further nodes
• Remove stopwords (function words, determiners, ...)
• Insert edges between nodes that have a high association degree by co-occurrence significance
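The construction steps can be sketched as follows. This is a minimal illustration, not the SemanticTalk implementation: the ratio of relative frequencies used as the differential frequency score, the add-one smoothing, the two thresholds and all function names are assumptions, and the toy counts are invented.

```python
from collections import Counter


def keywords_by_differential_frequency(doc_tokens, ref_freq, ref_total,
                                       stopwords, top_n=3, smoothing=1.0):
    """Rank words by (relative frequency in collection) / (relative frequency in reference)."""
    counts = Counter(t for t in doc_tokens if t not in stopwords)
    total = sum(counts.values())
    if total == 0:
        return []

    def ratio(word):
        rel_doc = counts[word] / total
        # Smoothing (an assumption) keeps words unseen in the reference corpus finite.
        rel_ref = (ref_freq.get(word, 0) + smoothing) / (ref_total + smoothing)
        return rel_doc / rel_ref

    return sorted(counts, key=lambda w: (-ratio(w), w))[:top_n]


def build_semantic_map(keywords, significance, node_threshold,
                       edge_threshold, stopwords):
    """Return (nodes, edges) from keywords and pairwise significance scores."""
    keywords = set(keywords)
    nodes = set(keywords)
    # Further nodes: words with a highly significant co-occurrence to a keyword.
    for (w1, w2), sig in significance.items():
        if sig >= node_threshold:
            if w1 in keywords and w2 not in stopwords:
                nodes.add(w2)
            if w2 in keywords and w1 not in stopwords:
                nodes.add(w1)
    # Edges: node pairs whose association degree is high enough.
    edges = {tuple(sorted((w1, w2)))
             for (w1, w2), sig in significance.items()
             if sig >= edge_threshold and w1 in nodes and w2 in nodes}
    return nodes, edges


stop = {"the", "and", "near"}
tokens = "the cartel and the agent met the cartel near the ranch".split()
reference = {"met": 500, "agent": 50, "cartel": 5, "ranch": 20}  # invented counts
keys = keywords_by_differential_frequency(tokens, reference, 100000, stop, top_n=2)
sig = {("cartel", "drug"): 30.0, ("cartel", "ranch"): 12.0,
       ("drug", "the"): 50.0, ("ranch", "met"): 2.0}  # invented scores
nodes, edges = build_semantic_map(keys, sig, 20.0, 10.0, stop)
print(keys, sorted(nodes), sorted(edges))
```

Using a stricter threshold for admitting new nodes than for drawing edges is one possible reading of "highly significant" versus "high association degree"; the slide does not give concrete values.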
9
Positioning in Semantic Maps
Force-directed: nodes and edges are thrown onto a plane and then driven to equilibrium by minimizing the energy of the layout
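A force-directed layout of this kind can be sketched as plain energy minimization. The energy function below (springs along edges, inverse-distance repulsion between all node pairs), the gradient descent with step-size control, and all names and parameter values are assumptions chosen for illustration; this is a generic spring embedder, not the TouchGraph algorithm.

```python
import math
import random


def layout_energy(pos, edges, rest_length=1.0, repulsion=1.0):
    """Spring energy on edges plus repulsion energy between all node pairs."""
    names = sorted(pos)
    total = 0.0
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            total += repulsion / max(math.dist(pos[u], pos[v]), 1e-9)
    for u, v in edges:
        total += 0.5 * (math.dist(pos[u], pos[v]) - rest_length) ** 2
    return total


def force_directed_layout(nodes, edges, iterations=200, seed=0,
                          rest_length=1.0, repulsion=1.0):
    """Throw nodes onto the unit square, then descend the energy gradient."""
    rng = random.Random(seed)
    pos = {n: (rng.random(), rng.random()) for n in nodes}
    current = layout_energy(pos, edges, rest_length, repulsion)
    step = 0.1
    names = list(nodes)
    for _ in range(iterations):
        grad = {n: [0.0, 0.0] for n in names}
        for i, u in enumerate(names):
            for v in names[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                g = -repulsion / d ** 3  # gradient factor of repulsion / d
                grad[u][0] += g * dx
                grad[u][1] += g * dy
                grad[v][0] -= g * dx
                grad[v][1] -= g * dy
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            g = (d - rest_length) / d  # gradient factor of the spring term
            grad[u][0] += g * dx
            grad[u][1] += g * dy
            grad[v][0] -= g * dx
            grad[v][1] -= g * dy
        candidate = {n: (pos[n][0] - step * grad[n][0],
                         pos[n][1] - step * grad[n][1]) for n in names}
        cand_energy = layout_energy(candidate, edges, rest_length, repulsion)
        if cand_energy < current:
            # Accept only moves that lower the energy, then try a bolder step.
            pos, current = candidate, cand_energy
            step *= 1.1
        else:
            step *= 0.5
    return pos


nodes = ["Guadalajara", "Mexico", "Jalisco", "cartel"]
edges = [("Guadalajara", "Mexico"), ("Guadalajara", "Jalisco"),
         ("Guadalajara", "cartel")]
start = force_directed_layout(nodes, edges, iterations=0, seed=1)
final = force_directed_layout(nodes, edges, iterations=200, seed=1)
print(layout_energy(start, edges), layout_energy(final, edges))
```

Because a step is accepted only when it lowers the energy, the layout energy never increases from one iteration to the next, which makes the "driven to equilibrium" behaviour easy to observe in the printed values.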
10
Domain Adjustment
Wortschatz database (very large corpus)
Community knowledge / Domain knowledge: Generation of a semantic map by processing domain-relevant documents and incorporating existing ontologies.
Session knowledge / Project knowledge: Enrichment of the database by incorporating task-specific knowledge and know-how.
[Figure: the knowledge sources are shown with partial overlap.]
11
Visualization
Extension of TouchGraph (www.touchgraph.com):
• Force-directed model for positioning
• Label fill colours for runtime type (keyword, associated, red thread)
• Label edge colours for semantic primitives
• Edge colours for semantic relations
• Nodes can be displayed as labels or dots
Is-A Relation
co-hyponymy Relation
white: keyword by user
grey: association from DB
primitive: Noun
primitive: organisation
12
Zooms
• Conceptual zoom: lexicalize nodes or display them as dots
• Granularity: reduce the number of visible nodes
• Optical zoom: size of the window compared to the total size of the map
13
Adding nodes in Association Mode
• User keywords are added to the graph. They fade if they are not connected within a certain time.
• Grey words are added if they are associated with at least two user keywords
"Let's talk about Mexico."
"That is a country in Central America."
"The Mexicans wear sombreros, which are hats for sun protection."
"I would like to have a hat like that, too!"
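The two rules of association mode can be sketched as two small functions. The association database is modelled here as a plain dictionary with invented entries, and the function names, the timestamp representation and the timeout value are assumptions made for illustration.

```python
from collections import Counter


def grey_words(user_keywords, associations, min_links=2):
    """Words from the database associated with at least `min_links` user keywords."""
    counts = Counter()
    for keyword in user_keywords:
        for word in associations.get(keyword, set()):
            if word not in user_keywords:
                counts[word] += 1
    return {word for word, c in counts.items() if c >= min_links}


def faded_keywords(added_at, connected, now, timeout):
    """User keywords that stayed unconnected for longer than `timeout`."""
    return {kw for kw, t in added_at.items()
            if kw not in connected and now - t > timeout}


# Hypothetical association database; the entries are made up.
associations = {
    "Mexico": {"tequila", "Mexican"},
    "sombrero": {"Mexican", "straw"},
    "hat": {"straw", "cap"},
}
user = {"Mexico", "sombrero", "hat"}
print(sorted(grey_words(user, associations)))

# Times in seconds since the start of the session (assumed representation).
added_at = {"Mexico": 0.0, "hat": 5.0, "sombrero": 12.0}
print(faded_keywords(added_at, connected={"Mexico"}, now=20.0, timeout=10.0))
```

Requiring two supporting user keywords keeps the map from filling up with every association of every spoken word, which is the filtering effect the second rule describes.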
16
Red Thread Functionality
• Given: semantic map, additional input
• Terms from the additional input that are found in the semantic map are coloured red and connected in the sequence of their occurrence
- red connection: the edge already existed in the semantic map
- yellow connection: the edge is new
• Long-range yellow edges visualize topic shifts
[Figure: semantic map excerpt with the node labels Georgia, Afghanistan and Iraq]
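The red thread rule can be sketched as a single pass over the input terms. The function name, the tuple representation of a connection and the toy map are assumptions; only the red/yellow distinction is taken from the slide.

```python
def red_thread(map_nodes, map_edges, input_terms):
    """Connect the input terms found in the map in order of occurrence.

    Returns (previous_term, term, colour) triples, where colour is "red" if
    the edge already exists in the semantic map and "yellow" if it is new.
    """
    existing = {frozenset(edge) for edge in map_edges}
    found = [term for term in input_terms if term in map_nodes]
    thread = []
    for previous, term in zip(found, found[1:]):
        if previous == term:
            continue  # an immediately repeated term adds no connection
        colour = "red" if frozenset((previous, term)) in existing else "yellow"
        thread.append((previous, term, colour))
    return thread


# Toy map; the edges are invented for illustration.
nodes = {"Iraq", "Afghanistan", "Georgia", "war"}
edges = [("Iraq", "war"), ("Afghanistan", "war")]
print(red_thread(nodes, edges, ["Iraq", "war", "today", "Georgia"]))
# [('Iraq', 'war', 'red'), ('war', 'Georgia', 'yellow')]
```

A yellow connection between nodes that lie far apart in the layout would then mark a topic shift in the input text.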
18
Embedding in the system
• Implementation as a Java servlet with a Tomcat web server
• MySQL database for graphs and associations
• Linguatec VoicePro 10 interface for speech recognition
• Several speech recognition clients can be connected via LAN
19
Interfaces
• Import/Export
- various formats for text files
- XML/RDF/RDB for maps
- PNG for maps
The results obtained with SemanticTalk can be saved, loaded and exported to other tools for further processing.
• Retrieval:
- words (nodes) with links to occurrences in the document collection
- associations (edges) with links to occurrences in the document collection
- explicit links, e.g. pictures for words
20
Further Processing of Net Topology Structures
Transformation into application models
Exchange format (e.g. RDF)
Semantic Map
Resource model
Process model
Variants
Product model