Text Mining in PoolParty Semantic Suite
Martin Kaltenböck, CFO, Semantic Web Company
Timea Turdean, Technical Consultant, SWC
POOLPARTY SEMANTIC SUITE
AIMS Webinar 21st Sept 2017
PoolParty Drupal Integration
Agenda
▸ Introduction Semantic Web Company (SWC)
▸ Introduction PoolParty Semantic Suite
▸ Using PoolParty for Text & Data Mining
▹ Text Mining for continuous knowledge graph modelling
▹ Entity linking and data integration
▹ Classification and semantic annotation / tagging
▸ DEMO(s) of text mining capability of PoolParty
▸ Customer Success Stories
▹ REEEP ClimateTagger
▹ healthdirect Australia
▹ CTCN Semantic Search
▹ EIP Water Matchmaking
▸ Q&A Session
INTRODUCTION
Semantic Web Company & PoolParty Semantic Suite
INTRODUCING SEMANTIC WEB COMPANY
Semantic Web Company (SWC)
▸ Founded in 2004
▸ Based in Vienna
▸ Privately held
▸ 40+ employees, experts in text mining & linked data
▸ ~15-20% revenue growth / year
▸ EUR 2.5 million funding for R&D
▸ SWC named to KMWorld’s 2017 ‘100 Companies That Matter in Knowledge Management’
▸ Organising SEMANTiCS conference series for 13 years
▸ https://www.semantic-web.com
INTRODUCING POOLPARTY
PoolParty Semantic Suite
▸ First release in 2009
▸ Current version 6.0
▸ W3C standards compliant
▸ Over 200 installations worldwide
▸ 50% of revenue is reinvested into PoolParty development
▸ PoolParty on-premises or used as a cloud service
▸ KMWorld listed PoolParty as Trend-Setting Product 2015, 2016 and 2017
▸ https://www.poolparty.biz/
SELECTED CUSTOMER REFERENCES AND PARTNERS
SWC headquarters
Customer References
● Credit Suisse
● Boehringer Ingelheim
● Roche
● adidas
● The Pokémon Company
● Canadian Broadcasting Corporation
● Harvard Business School
● Wolters Kluwer
● Talend
● HealthStream
● TC Media
● Techtarget
● Seek
● Alliander N.V.
● Pearson - Always Learning
● Education Services Australia
● American Physical Society
● Healthdirect Australia
● World Bank Group
● Inter-American Development Bank
● Renewable Energy Partnership
● Wood MacKenzie
● Oxford University Press
● International Atomic Energy Agency
● Norwegian Directorate of Immigration
● Ministry of Finance (AT)
● Council of the E.U.
● Australian National Data Service
Partners
● Accenture
● EPAM Systems
● Enterprise Knowledge
● Mekon Intelligent Content Solutions
● B-S-S Business Software Solutions
● MarkLogic
● Wolters Kluwer
● Digirati
● Quark
US East
US West
AUS/NZL
UK
MAKE USE OF POOLPARTY SEMANTIC SUITE
OVERVIEW
TECHNICAL CORE COMPONENTS
Bain Capital is a venture capital company based in Boston, MA. Since inception it has invested in hundreds of companies including AMC Entertainment, Brookstone, and Burger King. The company was co-founded by Mitt Romney.
Taxonomy & Ontology Server
Entity Extractor & Text Mining
Data Integration & Data Linking
Unstructured Data
Semi-structured Data
Structured Data
UnifiedViews
PoolParty GraphSearch
Identify new candidate concepts to be included in a controlled vocabulary
Controlled vocabularies as a basis for highly precise entity extraction
Entity Extractor informs all incoming data streams about its semantics and links them
Schema mapping based on ontologies
RDF Graph Database
PoolParty Semantic Suite
System Architecture Overview
360-degree views over various content repositories
‘Elevator Pitch’
▸ Built as a ‘Semantic Middleware’
▸ Outstanding user-friendliness
▸ Fully standards-compliant
▸ Highly precise entity extraction
▸ Comprehensive API
▸ Excellent maintainability of extraction models
▸ Integrated with leading search engines & graph databases
▸ Integrated with leading content management platforms
▸ Product configuration options for growing requirements
▸ Highly experienced partners / service team
Product Overview
All products are available as cloud services or for on-premises installation
> PoolParty Feature & Price Matrix
PoolParty Basic Server
PoolParty Advanced Server
PoolParty Enterprise Server
PoolParty Semantic Integrator
SKOS Taxonomy Management
Multiple Projects
Taxonomy REST API
Import/Export (incl. Excel)
Rollback and History
Ontologies and Custom Schemes
Quality Management & Reports
Advanced Corpus Management
Vocabulary Mapping, Linked Data Mapping
Linked Data Enrichment, Frontend, and SPARQL endpoint
Entity Extractor
Extractor API
Auto Populate project from DBpedia
Export to Remote Repository
Workflow Management
SKOS-XL (optional)
Integration with Graph databases
Integration with Search engines
Data linking & mapping
Data transformation pipelines with UnifiedViews
Graph Search Server
HOW DOES THIS WORK
Taking a look under the hood
BASIC PRINCIPLES
Benefiting from the Semantic Web in a Nutshell
Four-layered Content Architecture
Metadata and semantic data
The Peggy Guggenheim Collection is a modern art museum on the Grand Canal in the Dorsoduro sestiere of Venice, Italy. It is one of the most visited attractions in Venice. The collection is housed in the Palazzo Venier dei Leoni, an 18th-century palace, which was the home of the American heiress Peggy Guggenheim for three decades. She began displaying her private collection of modern artworks to the public seasonally in 1951. After her death in 1979, it passed to the Solomon R. Guggenheim Foundation, which eventually opened the collection year-round.
Metadata and semantic data
Peggy Guggenheim
Peggy Guggenheim Collection
Venice
Canale Grande
http://my.com/resource/328832
skos:prefLabel
http://my.com/docs/45367
skos:prefLabel
http://my.com/docs/52345
skos:prefLabel
http://my.com/resource/328832
skos:prefLabel
Metadata and semantic data
Peggy Guggenheim
Peggy Guggenheim Collection
Venice
museum
Canale Grande
skos:prefLabel
http://my.com/docs/45367
skos:prefLabel
http://my.com/docs/52345
skos:prefLabel
skos:prefLabel
http://my.com/resource/62545
skos:prefLabel
http://www.mycom.com/images/90546089
image
has landmark
named after
http://my.com/resource/328832
http://my.com/resource/328832
hosted in
hosted in
has
Metadata and semantic data
Peggy Guggenheim Collection
dct:title
Mike Miller
Michael Miller
skos:prefLabel
skos:altLabel
dct:creator
http://my.com/docs/328832
http://my.com/people/32
schema:Article
rdf:type
http://my.com/img/99.jpg
schema:image
skos:subject
Peggy Guggenheim Collection Venice
museum
skos:prefLabel
skos:subject
skos:altLabel
skos:broader
skos:prefLabel
schema:image
Canale Grande
skos:prefLabel
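The labels on this slide can be read as subject, predicate, object statements. A minimal sketch in plain Python follows. The pairing of URIs with labels is an assumption reconstructed from the slide residue, and the helper names `objects` and `creator_names` are hypothetical, not part of any PoolParty API.

```python
# Hypothetical triples reconstructed from the slide labels; which label
# belongs to which URI is an assumption, not a PoolParty export.
TRIPLES = [
    ("http://my.com/docs/328832", "rdf:type", "schema:Article"),
    ("http://my.com/docs/328832", "dct:title", "Peggy Guggenheim Collection"),
    ("http://my.com/docs/328832", "dct:creator", "http://my.com/people/32"),
    ("http://my.com/docs/328832", "schema:image", "http://my.com/img/99.jpg"),
    ("http://my.com/people/32", "skos:prefLabel", "Michael Miller"),
    ("http://my.com/people/32", "skos:altLabel", "Mike Miller"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def creator_names(doc_uri):
    """Follow dct:creator links and collect each creator's preferred label."""
    names = []
    for person in objects(doc_uri, "dct:creator"):
        names.extend(objects(person, "skos:prefLabel"))
    return names


print(creator_names("http://my.com/docs/328832"))
```

Because the creator is a resource with its own labels rather than a plain string, "Mike Miller" and "Michael Miller" resolve to the same person.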
Resolving Language Problems
“While most people can deal with linguistic features as synonyms, homographs, polyhierarchies, and even with far more peculiar characteristics of natural languages, machines often struggle with automatic sense-making because of the lack of a semantic knowledge model that can be used programmatically.”
Knowledge Graph
Text Mining for knowledge graph development
PoolParty Extractor
Uses several components of a knowledge model:
▸ Taxonomies based on the SKOS standard
▸ Ontologies based on RDF Schema or OWL
▸ Word form dictionaries
▸ Blacklists and stop word lists
▸ Disambiguation settings
▸ Domain-specific reference document corpus
▸ Statistical language model
PoolParty’s SKOS editor
The Audi Q3 is a compact crossover SUV made by Audi.
It is based on the PQ35 platform of Volkswagen.
A5 platform
A series
PoolParty’s ontology and custom schema management
Taxonomy
Ontology
Ontology 1 (from library)
Ontology 2 (imported)
Ontology 3 (custom-made)
Custom Schema
‘Setting the rules’ for text mining & entity extraction via thesaurus
Proper use of a funduscope requires a bit of practice and familiarity with the functions of your device.
Diagnostic Equipment
Ophthalmoscope
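A minimal sketch of how a thesaurus can set the rules for extraction: the synonym "funduscope" is matched in the text and resolved to the preferred label and its broader concept. The concept IDs, the assignment of "funduscope" as an alternative label, and the function names are assumptions for illustration. This is plain label matching, not the PoolParty Extractor itself.

```python
import re

# Hypothetical mini-thesaurus; concept IDs and synonym assignment are assumptions.
THESAURUS = {
    "c1": {"prefLabel": "Diagnostic Equipment", "altLabels": [], "broader": None},
    "c2": {"prefLabel": "Ophthalmoscope", "altLabels": ["funduscope"], "broader": "c1"},
}


def build_label_index(thesaurus):
    """Map every lower-cased preferred or alternative label to its concept ID."""
    index = {}
    for cid, concept in thesaurus.items():
        for label in [concept["prefLabel"], *concept["altLabels"]]:
            index[label.lower()] = cid
    return index


def extract_concepts(text, thesaurus):
    """Return (matched text, preferred label, broader label) for each label hit."""
    index = build_label_index(thesaurus)
    # Longest labels first so multi-word labels win over shorter ones.
    labels = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, labels)) + r")\b", re.IGNORECASE
    )
    hits = []
    for match in pattern.finditer(text):
        concept = thesaurus[index[match.group(1).lower()]]
        parent = concept["broader"]
        broader = thesaurus[parent]["prefLabel"] if parent else None
        hits.append((match.group(1), concept["prefLabel"], broader))
    return hits


text = "Proper use of a funduscope requires a bit of practice."
print(extract_concepts(text, THESAURUS))
```

The sketch ignores word forms, stop words and disambiguation, which the extractor components listed earlier are there to handle.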
Disambiguation settings
Corpus analysis results in a network of concepts and terms
I need support to continuously extend our taxonomy / controlled vocabulary!
skos:Concept
Reference Corpus
- Websites
- PDF, Word, …
- Abstracts from DBpedia
- RSS Feeds
skos:Concept
skos:Concept
Term 1
Term 3
Term 7
Term 8
Term 6
Term 4
Term 2
Term 5
- Relevant terms and phrases
- Relevancy of concepts
- Co-occurrence between concepts and terms
- Co-occurrence between terms and terms
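The co-occurrence part of such a corpus analysis can be sketched as pair counting per document. The sample documents, the vocabulary and the function name `cooccurrence` are invented for illustration. The slides do not describe PoolParty's actual statistics, and the substring test used here is deliberately naive.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents, vocabulary):
    """Count how often two vocabulary items appear in the same document."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        # Naive substring test; real systems would tokenize and lemmatize.
        present = sorted(term for term in vocabulary if term in text)
        counts.update(combinations(present, 2))
    return counts


docs = [
    "Solar power and wind power reduce emissions.",
    "Wind power needs grid storage.",
    "Solar power and wind power need grid storage.",
]
vocab = {"solar power", "wind power", "grid storage"}

counts = cooccurrence(docs, vocab)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Pairs that co-occur often with existing concepts are the kind of evidence a taxonomist could use when deciding whether a term should become a new concept.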
Semantic Annotation
Classification and Semantic Annotation / Tagging
Entity Extraction based on Knowledge Graphs
PoolParty as a supervised learning system
Content Manager
Integrator
Taxonomist/Ontologist
Thesaurus Server
Extractor
PowerTagging
uses API
is user of
is user of
is basis of
is basis of
Index
annotates
enriches
Reference Corpus
CMS
extends
is basis of
analyzes
uses API
Data Integration
Mapping and Linking of Data
PoolParty Semantic Integrator - at a glance
https://youtu.be/l_LppfS3wxk
Deep Data Analytics
Semantic Search
Semantic Integrator
Unstructured Data
Structured Data
ETL / Monitoring / Scheduling
PoolParty Semantic Integrator
High-level architecture
DEMO(s)
… let's see how it works in action
PoolParty Thesaurus Manager
● SKOS editor
● Ontology and custom scheme manager
PoolParty PowerTagging for Drupal (backend)
● Automated Tagging
● Manual Tagging
● Configuration of modules
PoolParty GraphSearch for Drupal (frontend)
● Semantic Search
● Explore Trends & Sentiments
● Facets and Similarity
DEMOS
Drupal and PoolParty at a Glance
PoolParty Drupal Integration Demo: http://drupal.poolparty.biz/
USE CASES
Success Stories about Text Mining and Linked Data using PoolParty Semantic Suite
Use Cases: Text Mining & Linked Data
▸ Climate Tagger (PDF): Streamline and catalogue data and information resources
▸ healthdirect Australia (PDF): Semantic Search based on the Australian Health Thesaurus
▸ CTCN Semantic Search: Integrating thousands of documents from several sources on climate technology
▸ European Innovation Partnership (EIP) on Water: Online marketplace including semantic matchmaking
Climate Tagger
Help organizations in the climate and development arenas catalogue, categorize, contextualize, and connect data and information resources.
Climate Tagger is backed by the expansive Climate Compatible Development Thesaurus.
http://www.climatetagger.net
How does it work
EIP Water Matchmaking
Controlled vocabularies enable accurate matchmaking between Supply and Demand for Water Innovation in Europe.
Matchmaking is based upon the EIP Water Innovation Thesaurus (GEMET based).
http://www.eip-water.eu
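One way to picture vocabulary-based matchmaking is to tag supply and demand entries with concepts from the same thesaurus and rank them by overlap. The sketch below uses Jaccard similarity as a stand-in scoring method. The slides do not say how EIP Water scores matches, and the supplier names, concept tags and function names are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two concept sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_matches(demand_tags, supply_profiles):
    """Rank supply profiles by concept overlap with a demand, best first."""
    scored = [
        (name, jaccard(demand_tags, tags))
        for name, tags in supply_profiles.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical entries, each tagged with concepts from one shared vocabulary.
supply = {
    "Supplier A": {"water reuse", "desalination"},
    "Supplier B": {"flood risk", "water reuse", "leak detection"},
}
demand = {"water reuse", "leak detection"}

for name, score in rank_matches(demand, supply):
    print(f"{name}: {score:.2f}")
```

Because both sides use the same controlled vocabulary, the comparison works on concepts rather than on free-text wording.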
CTCN Semantic Search
Help organisations in the climate technology field to explore and find relevant content from thousands of Drupal Nodes and several sources using PoolParty, PowerTagging and s0nr webmining.
CTCN is backed by the CTCN Climate Technology Thesaurus.
https://www.ctc-n.org/semantic-search
healthdirect Australia
Integrated views and semantic search over more than 100 trusted sources.
Harmonization of various metadata systems through the use of a central vocabulary hub: Australian Health Thesaurus.
http://www.healthdirect.gov.au
SUMMARY
WHY TAXONOMISTS AND INFORMATION ARCHITECTS LIKE POOLPARTY
Different project stakeholders expect specific qualities from a semantic technology platform:
I am a taxonomist. I need a tool that provides convenient functionalities and intuitive user interfaces for my daily work.
I am an information architect. Enterprise metadata management deserves scalable technologies, which provide semantic services on top of rich APIs based on standards.
PoolParty Academy
Get certified!
https://www.poolparty.biz/academy/
CONNECT
Timea Turdean, Technical Consultant, SWC
▸ https://www.linkedin.com/in/timeaturdean/
▸ https://twitter.com/poolparty_team
© Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/
Martin Kaltenböck, CFO, Semantic Web Company
▸ https://www.linkedin.com/in/martinkaltenboeck
▸ https://twitter.com/semwebcompany
▸ https://blog.semantic-web.at/