Visual Text Mining with SWAPit
Detection of semantic relationships among text documents and associated data sources
Andreas Becks
Fraunhofer-Institute of Applied Information Technology
Sankt Augustin & Aachen, Germany
Rome, 24 November 2005
2© Fraunhofer-FIT 2005
Lost in the Ocean of Text Documents?
Text Mining helps to explore and analyse natural-language texts
uncover relationships, recognize trends
group, condense pieces of knowledge
categorize text information
A huge amount of organisational knowledge is stored in text documents
85 to 90 percent of all corporate data according to Merrill Lynch and Gartner studies
Even when DMS and desktop search are used, a huge amount of time is necessary to find important information
80% of companies and 40% of public administrations need more than one day [Zylab survey]
SWAPit Helps You to Navigate Through Your Text Data
The tool visualises semantic relationships among text documents...
X-ray view for document archives
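The slides do not say how SWAPit measures the semantic relationship between two documents. As a rough illustration only, the sketch below uses a common stand-in technique: cosine similarity over plain word counts. The function names and the sample documents are hypothetical and are not taken from SWAPit.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(docs):
    """Pairwise similarities; entry [i][j] compares document i with document j."""
    vectors = [term_vector(d) for d in docs]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical sample documents, for illustration only.
docs = [
    "insurance claim for car accident",
    "car accident insurance report",
    "textile production schedule",
]
matrix = similarity_matrix(docs)
```

A map-style visualisation would then place documents with high pairwise similarity close to each other; that layout step is outside the scope of this sketch.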
SWAPit Integrates Text and Data Mining
... and lets you navigate, search, browse and analyse text documents and associated data and metadata
[Diagram. Sources: text documents, catalogue of text categories, related structured data. Views: Similarity View, Category View, Fact View, with tools for analysis and search. Links between views: categorization, associations.]
Application Example: Document Management
[Diagram. Labels: New text documents; Protocollazione (registration); Titolario (classification scheme); Information about type, AOO/UO, ‘Fascicoli’ (case files), etc.; Project selection; DL-based categorization]
Document similarity helps to create ‘fascicoli’ and find misclassified documents
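One simple way similarity can point to misclassified documents is a nearest-neighbour check: a document whose most similar neighbour carries a different category is worth a second look. The sketch below assumes Jaccard overlap of word sets as the similarity measure. The function names, categories and sample texts are hypothetical, and this is not a description of how SWAPit or its DL-based categorization works.

```python
import re


def tokens(text):
    """Set of lower-cased word tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of tokens the two sets have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_suspects(docs):
    """docs is a list of (text, category) pairs.

    Returns the indices of documents whose most similar other document
    has a different category. Documents with no overlap at all are skipped.
    """
    token_sets = [tokens(text) for text, _ in docs]
    suspects = []
    for i, (_, category) in enumerate(docs):
        best_j, best_sim = None, 0.0
        for j in range(len(docs)):
            if j == i:
                continue
            sim = jaccard(token_sets[i], token_sets[j])
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None and docs[best_j][1] != category:
            suspects.append(i)
    return suspects


# Hypothetical sample data; the third entry is mislabelled on purpose.
docs = [
    ("building permit request for new garage", "permits"),
    ("building permit request for new house", "permits"),
    ("building permit request for garden shed", "taxes"),
    ("local tax payment reminder", "taxes"),
    ("local tax payment receipt", "taxes"),
]
suspects = flag_suspects(docs)
```

The flagged indices are candidates for manual review, not automatic corrections; a real archive would need a more robust similarity measure and more than one neighbour.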
SWAPit as a Single Point of Access
From scattered information...
multi-schema databases, distributed & data-centred access
...to integrated information
intuitive, user-centred access
[Diagram. Labels: operational databases; text documents; DL-based categorization; DL-based integration; Virtual Integrated Database; user-specific schema & integrated access]
Monitoring Documents with SWAPit and DL
From information overflow...
3 news in 1 minute
...to information overview
1 document map per day
[Diagram. Processing chain: unfiltered and unstructured text documents; DL-based filter; conceptually filtered, relevant text documents; DL-based catalogue builder; intuitively structured text documents]
Displaying XML Documents in SWAPit
From complex, machine-readable documents...
data with technically rich structural annotation
...to a human-oriented presentation
customized, task-oriented view
[Diagram. Labels: XML documents; web ontology; metadata (selected attributes and elements); text content from specified attributes and elements; ontology-context of specified elements]
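The idea of building a task-oriented view from selected attributes and elements can be sketched with Python's standard `xml.etree.ElementTree` module. Everything specific here is an assumption made for illustration: the sample XML, the `build_view` helper and the `@name` convention for attributes are hypothetical, and the ontology context mentioned on the slide is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, for illustration only.
SAMPLE_XML = """
<document id="d-001" lang="en">
  <meta>
    <author>A. Author</author>
    <date>2005-11-24</date>
  </meta>
  <body>
    <title>Quarterly report</title>
    <para>Sales rose in the textile sector.</para>
    <para>Insurance claims were stable.</para>
  </body>
</document>
"""


def build_view(xml_text, metadata_paths, text_paths):
    """Collect selected metadata and text content from an XML document.

    metadata_paths maps a label to either '@name' (attribute of the root
    element) or a child path such as 'meta/author'.
    text_paths lists the element paths whose text forms the readable content.
    """
    root = ET.fromstring(xml_text.strip())
    metadata = {}
    for label, path in metadata_paths.items():
        if path.startswith("@"):
            metadata[label] = root.get(path[1:])
        else:
            metadata[label] = root.findtext(path)
    text_parts = []
    for path in text_paths:
        for element in root.findall(path):
            # itertext() also picks up text nested in child elements.
            text_parts.append("".join(element.itertext()).strip())
    return {"metadata": metadata, "text": "\n".join(text_parts)}


view = build_view(
    SAMPLE_XML,
    metadata_paths={"id": "@id", "author": "meta/author", "date": "meta/date"},
    text_paths=["body/title", "body/para"],
)
```

Keeping the selection of paths in plain dictionaries and lists means a different task can get a different view of the same document without changing the extraction code.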
Conclusion: Visual and Intuitive Text Mining with SWAPit
SWAPit combines views on text documents and associated data sources on a single screen
Overview instead of overflow
Improves quality of text access tasks
Leverages knowledge sources
Flexible architecture
Designed to integrate Semantic Web technology
Derives additional power from integration of DL technologies
Can be integrated easily into existing infrastructures or company portals
Can be tailored to specific needs of different market segments
Long-standing experience in research and practical applications
Document Management, Business Intelligence, Customer Relationship Management, ...
Main sectors: Insurance, Textile, Engineering, Social Science
Technology has been extended in a joint project with Maurizio Lenzerini (SEWASIE)
Thank you for your attention!