Nuxeo Semantic ECM: From Scribo and Stanbol to Valuable Applications
DESCRIPTION
Work on integrating semantic technologies developed in several R&D projects is now progressing at full speed. Expect to see creative new uses of semantic technologies in Nuxeo open source content management products in 2011!
TRANSCRIPT
Nov. 23 2010 - S. Fermigier & O. Grisel, Nuxeo
Semantic ECM @ Nuxeo
A progress report - Nov. 2010
Agenda
From ECM to Semantic ECM
Scribo & IKS
Fise & Apache Stanbol
Nuxeo Integration
Roadmap for 2011
Nuxeo: from ECM...
Nuxeo: an open source ECM vendor
Our Focus is Enterprise Content Management
ECM as a Platform for Content Applications
Open Source as Efficient Development Model
Modern architecture for 21st Century business
“Lean, mobile, social, interoperable”
A Social Marketplace in action
Innovation driven by community of customers, partners, and our core developers
Nuxeo ECM - From Platform to Products
Platform
Content Infrastructure
Nuxeo Enterprise Platform: complete set of components covering all aspects of ECM
Nuxeo Core: lightweight, scalable, embeddable content repository
Horizontal Packages
Document Management
Digital Asset Management
Case Management Framework
Structured Document Server
Content Aggregator
Business Solutions
Correspondence Management
Contracts Management
Invoice Processing
Records Management
Construction, Media, Government, Life Sciences
Major Customers
... to Semantic ECM
Picture source: http://www.flickr.com/photos/pixelydixel/
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Linked Online Data in 2007
2008
2009
2010
Good for Enterprise apps too!
Diagram source: http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/
Key Enablers
Open Data and Linked Online Data
Advances in automatic content analysis (linguistics, image processing) and machine learning
Classical logic and classical AI
Computing power (Moore’s law + MapReduce)
The technologies and data are available.
Let’s put them to use!
Semantic ECM
Content
Text
Image
Sound
Video
Semantic ECM
Content
Text
Image
Sound
Video
Meaning
Metadata
Relations
Entities
Tags
Reasoning
Goals for Semantic ECM
Repurpose existing content
Improve search and collaboration
Make information contextual
Extract and use information from your content
Make your content smarter!
Challenges
Extract meaning from content
Enrich content with knowledge
Enhance interaction with content thanks to added meaning
Content Stack vs. Knowledge Cake
Architectural Challenge
Business value from semantic ECM
Efficiency gains: 20% to 90% (ex: in search, collaboration)
Effectiveness gains: better returns from your assets (ex: news and images from AFP)
Strategic edge: growth, value capture, new services, gain unfair strategic advantage (ex: vertical ontologies for CEVAs / CCAs)
SCRIBO and IKS
SCRIBO
Project under the French FUI program, with 9 partners and a budget of 4.7 M€
Goal: to develop algorithms and collaborative tools for extracting knowledge from unstructured documents and images
Started in 2008, finishing in Dec. 2010, with results already integrated as a Nuxeo plugin
IKS
European project under FP7, with 13 partners (6 SMEs) and an 8.5 M€ budget
Goal: create a semantic software “stack” that will be used by CMS vendors to add semantic features to their products
Started in Jan. 2009, will last until Dec. 2012
First tangible result: FISE, already integrated in a Nuxeo plugin
Linking Semantic Entities
Apache Stanbol - Nuxeo integration
Demo time!
Screencast online at http://blogs.nuxeo.com/dev
How does this work?
• Open Source Semantic Engine
• HTTP Services
• For content driven applications
• OSGi: loosely coupled components
• Analysis Engines
• Knowledge RDF vocabularies
What is a semantic engine?
• Unstructured content => Knowledge
• Language guessing
• Topic classification (Business, Sports, Media, ...)
• Named Entities extraction and linking
• Relationships and properties extraction
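To make the first item in the list above concrete, here is a toy sketch of language guessing by counting stopword hits. It is only an illustration: the stopword lists and the `guess_language` name are assumptions made for this example, and nothing here reflects how FISE or Apache Stanbol actually detect languages.

```python
# Toy illustration only: guess a text's language by counting stopword hits.
# The stopword lists and the function name are assumptions for this sketch;
# they are not taken from FISE or Apache Stanbol.
import re

STOPWORDS = {
    "en": {"the", "and", "at", "in", "is", "of", "to"},
    "fr": {"le", "la", "les", "et", "chez", "est", "de", "à"},
}


def guess_language(text):
    """Return the language code whose stopword list matches the most tokens."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No stopword matched at all: report "unknown" rather than guessing.
    return best if scores[best] > 0 else None


print(guess_language("John Smith works at Smith Consulting in Paris."))
```

A real engine would rely on far richer statistics (character n-grams, trained models); the point is only that "unstructured content => knowledge" starts with small, composable analysis steps like this one.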
curl -X POST \
  -H "Accept: application/json" \
  -H "Content-type: text/plain" \
  --data "John Smith works at Smith Consulting in Paris." \
  http://fise.demo.nuxeo.com/engines
{
  "urn:enhancement-1564680b-861c-df6f-fdf9-d34a75d68dfe": {
    "http://fise.iks-project.eu/ontology/selected-text": [
      {
        "datatype": "http://www.w3.org/2001/XMLSchema#string",
        "type": "literal",
        "value": "Paris"
      }
    ],
    "http://fise.iks-project.eu/ontology/selection-context": [
      {
        "datatype": "http://www.w3.org/2001/XMLSchema#string",
        "type": "literal",
        "value": "John Smith works at Smith Consulting Paris."
      }
    ],
    "http://purl.org/dc/terms/type": [
      {
        "type": "uri",
        "value": "http://dbpedia.org/ontology/Place"
      }
    ]
  },
  …
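The same exchange can be sketched from a client program. The Python below builds a request equivalent to the curl call (without sending it, since the demo server may not be reachable) and pulls the mention text and type out of a response shaped like the excerpt above. The helper names `build_request` and `extract_mentions` are made up for this sketch, and the sample data is reduced to the two properties the code reads.

```python
import json
import urllib.request

# Endpoint and property URIs are copied from the curl call and JSON excerpt.
FISE_ENGINES_URL = "http://fise.demo.nuxeo.com/engines"
SELECTED_TEXT = "http://fise.iks-project.eu/ontology/selected-text"
DC_TYPE = "http://purl.org/dc/terms/type"


def build_request(text, url=FISE_ENGINES_URL):
    """Build (but do not send) a POST request equivalent to the curl call."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "text/plain"},
        method="POST",
    )


def extract_mentions(enhancements):
    """Return (selected text, type URI) pairs from a parsed JSON response."""
    mentions = []
    for properties in enhancements.values():
        texts = [v["value"] for v in properties.get(SELECTED_TEXT, [])]
        types = [v["value"] for v in properties.get(DC_TYPE, [])]
        for text in texts:
            # Keep mentions that carry no type by pairing them with None.
            for type_uri in types or [None]:
                mentions.append((text, type_uri))
    return mentions


# Reduced copy of the response excerpt: only the properties read above.
sample = json.loads("""
{
  "urn:enhancement-1564680b-861c-df6f-fdf9-d34a75d68dfe": {
    "http://fise.iks-project.eu/ontology/selected-text": [
      {"type": "literal", "value": "Paris"}
    ],
    "http://purl.org/dc/terms/type": [
      {"type": "uri", "value": "http://dbpedia.org/ontology/Place"}
    ]
  }
}
""")

print(extract_mentions(sample))
```

To actually call the service, the request object would be passed to `urllib.request.urlopen` and the body decoded with `json.load`.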
Apache Stanbol = FISE + fast Linked Data local index + semantic rule engine + more?
Apache Stanbol / Nuxeo integration
[Diagram: Local IT infrastructure (LAN); Nuxeo DM addon; Apache Stanbol with Engine 1, Engine 2, Engine 3; DBpedia, Freebase, Geonames, LDAP; numbered steps 1 to 3]
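One possible reading of the three numbered steps in the integration diagram is: (1) the repository add-on submits a document's text, (2) the semantic engine runs its analysis engines over it, and (3) the engines resolve mentions against entity data drawn from sources such as DBpedia. The sketch below models that flow in memory. Every class, function and index entry is hypothetical and chosen only for illustration; it does not reproduce the Nuxeo add-on or the Apache Stanbol API.

```python
# Hypothetical in-memory model of the three steps; all names are illustrative.

# Step 3: a tiny stand-in for a local index of entities from external sources.
ENTITY_INDEX = {
    "Paris": "http://dbpedia.org/resource/Paris",
}


def mention_engine(text, annotations):
    """First engine: record which indexed names occur in the text."""
    for name in ENTITY_INDEX:
        if name in text:
            annotations.append({"selected-text": name})


def linking_engine(text, annotations):
    """Second engine: attach a reference URI to each recorded mention."""
    for annotation in annotations:
        annotation["entity"] = ENTITY_INDEX.get(annotation["selected-text"])


class SemanticEngine:
    """Step 2: run every registered engine, in order, over the submitted text."""

    def __init__(self, engines):
        self.engines = engines

    def enhance(self, text):
        annotations = []
        for engine in self.engines:
            engine(text, annotations)
        return annotations


def annotate_document(document, service):
    """Step 1: what a repository add-on might do with a document's text."""
    document["annotations"] = service.enhance(document["text"])
    return document


service = SemanticEngine([mention_engine, linking_engine])
doc = annotate_document(
    {"text": "John Smith works at Smith Consulting in Paris."}, service
)
print(doc["annotations"])
```

Keeping the engines as independent callables mirrors the "loosely coupled components" idea: an engine can be added or removed without touching the add-on side.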
Roadmap 2010-2011
Nuxeo DM Improvement
Automated document categorization (language, subject, geo coverage based on fixed lists)
Semantic entities detection and linking
Available as add-ons on the Nuxeo Marketplace in December!
Nuxeo DM: Upcoming Work
Stanbol + Scribo integration
Multilingual support
Extraction of relations between entities
Topic classification and linking to external taxonomies
Nuxeo DAM
Clustering pictures by similarity
Face detection
Face recognition using contextual information
Speech to text integration for full-text search on audio and video files
Nuxeo CMF / Correspondence
Document OCR and structure extraction
Scanned document categorization (ex: invoice vs. contract vs. claim...) and routing
Structured field extraction with configurable document masks
Questions?
More info
http://www.nuxeo.com/
http://blogs.nuxeo.com/dev
http://iks-project.eu
http://fise.demo.nuxeo.com
http://scribo.ws
http://incubator.apache.org/stanbol