Microsoft Semantic Engine
Naveen Garg, Duncan Davenport
Microsoft Corporation
Session Code: SVR32
FUTURE
MICROSOFT SEMANTIC ENGINE
Unified Search, Discovery and Insight
THE SITUATION TODAY
Significant Content is Outside Structured Storage (RDBMS, OLAP, BI)
Integration of this Content is Prohibitively Expensive (Time, Money, Resources)
Extracting Insight, Analytics, and Recommendations is even harder
Situation is a Confluence of Search | Predictive Analytics | Large-Scale Collaborative Filtering
THE SOLUTION
Having all forms of digital information on a single platform allows people to blend unstructured and structured content and to drive insight and decision making
Microsoft Semantic Engine provides a combination of technologies to form a contextual understanding of all digital content
SCENARIO|MEANING DRIVEN INSIGHT
Critical Business Need: Analysts gather documents, media and web content about “Business Analytics”, “Data Integration” and “Search and Discovery”
Core Machine Learning: Unsupervised learning infers “Unified Information Access” concept cluster based on automated analysis of content
Efficient Data Aggregation: Cluster gains in relevance from mining across unstructured and structured sources added from ERP and BI systems
User Relevance Boost: Users (BDM) re-label cluster as “Unified Search, Discovery and Insight” and engine adopts it further boosting that cluster relevance
Collaborative Boost: Analysts collate this content requiring multi-resolution super-clusters with embedded sub-clusters
Business Decision Making: The CxO explores super-cluster and drafts business plan for her new division
SCENARIOS|UNIVERSAL APPEAL
Search and Collaboration | Personalized search, discovery and organization
Legal | Precedent and subject based search over large scale textual corpuses
Life Sciences | Systems biology with large volume data correlation and search
Government Services | Intelligence, real-time analytics, visualization, clustering
Social Networking | Social graph relevance mining, ranking criteria auto tuning
FEATURES|UNIFY YOUR CONTENT
Unified Search, Discovery and Insight
Automatic Clustering and Organization
Meaning-Driven Indexing, Classification and Storage
Scalable Content Processing over all Content Types
Instant On Experience for Out of Box Value
DEMO|VIEWS GALLERY
Search, Discover and Organize features exposed via sample UX gallery
Seamless installation and indexing of desktop, email and web content
Fully documented Managed APIs used in UX gallery and JavaScript / C# samples
DESIGN|MEANING-DRIVEN PROCESSING
Streams | Descriptors (Properties) | Kinds (Concepts)
Streams processed into contextualized and indexed concepts for search | discovery | organization
STREAM: KR_CLIENT_225.docx
CONCEPT: LEGAL DOCUMENT
CONCEPT: BILLABLE WORK
CONCEPT: EVIDENCE
CONCEPT: DEPOSITION
PROPERTY: EXTRACTED PROPERTIES
CONCEPT CLUSTER: LEGAL CASE [xxx]
MDP: SEARCH AND SHARE
DESIGN|ARCHITECTURE
Engine consists of self-contained set of pluggable services
Text Processing
Image Processing
Video Processing
Audio Processing
Supervised Machine Learning
Clustering
MDI (RBV)
Conceptual Search
Inference
Sequence Store (Suffix Tree)
Distributed Content Store
Ontology and Taxonomy Management
Semantic Engine
Search and Markup
Trend and Predictive Analysis
Automatic Organization
Recommendation and Discovery
DESIGN|SCALABLE ARCHITECTURE
The logical architecture partitions analysis, indexing and storage
API1 | API2 | API3
Analysis1 | Analysis2 | Analysis3
Staging | Core | Index | Stream
Scale out by adding boxes; standard “web farm” (VIP) configuration
Scale out by adding boxes; each box can run all processors or specific processors
Store(<content>) | Annotate(<kind>) | Index(<content>) | Organize(<kinds>) | Search(<query>) | …
Text | Image | Audio | Video | Video
Single Logical Partitionable
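The slide describes the store as a single logical unit that can be partitioned, but it does not say how content is routed to a partition. The following is a minimal sketch under the assumption of stable hash-based routing; `PartitionedStore`, `store`, and `fetch` are hypothetical names for illustration and are not the Semantic Engine API.

```python
import hashlib


class PartitionedStore:
    """One logical store spread over several partitions (hypothetical sketch)."""

    def __init__(self, partition_count):
        # Each partition stands in for one "box"; here it is just a dict.
        self.partitions = [dict() for _ in range(partition_count)]

    def _partition_for(self, key):
        # A stable hash, so the same key always routes to the same partition.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partitions)

    def store(self, key, content):
        self.partitions[self._partition_for(key)][key] = content

    def fetch(self, key):
        # Callers see one logical store; routing stays internal.
        return self.partitions[self._partition_for(key)].get(key)


store = PartitionedStore(partition_count=3)
for name in ("report.docx", "photo.jpg", "track.wma", "clip.wmv"):
    store.store(name, f"<content of {name}>")

print(store.fetch("track.wma"))
print([len(p) for p in store.partitions])
```

Callers address the store by key only, which is what lets boxes be added behind it. Note that plain modulo routing remaps most keys when the partition count changes; a production design would need a rebalancing strategy that this sketch does not attempt.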
DESIGN|PROGRAMMING
Designed to be hassle free out of the box
Several programming languages and frameworks supported
CLR/.NET, JavaScript, TSQL, C++
DESIGN|PROGRAMMING
Sample of storing a stream in the system
Initiates the content processing, classification, and indexing
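The code sample shown on this slide did not survive in the transcript. As a rough stand-in, the sketch below shows how a single store call might trigger processing, classification, and indexing in sequence. Everything here is an assumption for illustration: `store_stream`, `classify`, the extension-based classifier, and the sample text are invented, and none of it is the Semantic Engine managed API.

```python
import re
from collections import defaultdict

inverted_index = defaultdict(set)  # term -> names of streams containing it
descriptors = defaultdict(list)    # stream name -> (type, attribute, value) rows


def classify(name):
    # Toy classifier: guesses a content class from the file extension only.
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    classes = {"docx": "Text", "txt": "Text", "jpg": "Image",
               "wma": "Audio", "wmv": "Video"}
    return classes.get(extension, "Unknown")


def store_stream(name, text):
    """Store a stream, then run processing, classification, and indexing."""
    # Content processing: split the text into lowercase terms.
    terms = re.findall(r"[a-z0-9]+", text.lower())
    # Classification: record descriptors for the stream.
    descriptors[name].append(("Classification", classify(name), 1.0))
    descriptors[name].append(("Metadata", "Name", name))
    # Indexing: make every term searchable.
    for term in terms:
        inverted_index[term].add(name)
    return len(terms)


# The sample text is made up; only the file name comes from the deck.
count = store_stream("KR_CLIENT_225.docx",
                     "Deposition notes and billable work for the client")
print(count)
print(descriptors["KR_CLIENT_225.docx"])
print(sorted(inverted_index["deposition"]))
```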
DESIGN|PROGRAMMING
Sample of search and recommendations
Returns contextual results from the store and the web
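The search sample from this slide is also missing from the transcript. The sketch below illustrates one simple way search and recommendation could work over concept sets: ranking by term overlap, and recommending by Jaccard similarity. These techniques are swapped in for illustration, since the deck does not name its ranking method. The documents, `search`, and `recommend` are hypothetical, and the web-results half of the slide is not modelled.

```python
# Hypothetical store: each document is reduced to a set of concepts.
documents = {
    "deposition.docx": {"legal", "deposition", "evidence"},
    "invoice.docx": {"legal", "billable", "work"},
    "holiday.jpg": {"beach", "family"},
}


def search(query_terms):
    """Rank documents by how many query terms appear among their concepts."""
    query = set(query_terms)
    scored = [(len(query & concepts), name)
              for name, concepts in documents.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0]


def recommend(name):
    """Suggest other documents, ordered by Jaccard similarity of concept sets."""
    base = documents[name]
    scored = []
    for other, concepts in documents.items():
        if other == name:
            continue
        similarity = len(base & concepts) / len(base | concepts)
        if similarity > 0:
            scored.append((similarity, other))
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [other for similarity, other in ranked]


print(search(["legal", "evidence"]))
print(recommend("deposition.docx"))
```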
DEMO|WINDOWS 7 SHELL EXTENSION
Seamless Integration in Windows Desktop Federated Search
Expose Meaning-Driven Indexing and Semantic Actions
Zero Learning Curve
DESIGN|ARCHITECTURE DETAILS
System Integration Fabric (SIF)
Importers
Files
API Layer
Plug-Ins
Semantic Engine
Database
Kind | Descriptor | Stream | KindLink | ListKind
DESIGN|ANATOMY OF A KIND
Kind
KindID         SourceUri
00000000-1111  C:\My Documents\Saint Germain Des Pres Cafe (Finest electro-jazz compilation)\05 Track 5.wma

Stream
StreamID       KindID         StreamUri  Format          Stream
11111111-2222  00000000-1111             audio/x-ms-wma  0xFFD8FFE000104A4649460001…

Descriptor
DescriptorID   KindID         Type            Attribute         Value
10000000-0000  00000000-1111  Classification  Audio             1.0
20000000-0000  00000000-1111  Metadata        Name              05 Track 5.wma
30000000-0000  00000000-1111  Metadata        Item Type         Windows Media Audio File
40000000-0000  00000000-1111  Metadata        Length            00:05:22
50000000-0000  00000000-1111  Metadata        WM/ProviderStyle  Electronica
60000000-0000  00000000-1111  Audio           Tonality/Major    0.78
70000000-0000  00000000-1111  Audio           Tempo/Moderato    0.79
80000000-0000  00000000-1111  Classification  Music             0.8
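To make the descriptor rows concrete, here is a small sketch that loads the slide's example values into plain Python records and queries them. The `Descriptor` class and both helper functions are hypothetical. Reading the numeric values as confidence scores is an assumption; the slide does not say what they measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    descriptor_id: str
    kind_id: str
    type: str
    attribute: str
    value: object


KIND_ID = "00000000-1111"

# Values copied from the slide's descriptor rows.
descriptors = [
    Descriptor("10000000-0000", KIND_ID, "Classification", "Audio", 1.0),
    Descriptor("20000000-0000", KIND_ID, "Metadata", "Name", "05 Track 5.wma"),
    Descriptor("30000000-0000", KIND_ID, "Metadata", "Item Type",
               "Windows Media Audio File"),
    Descriptor("40000000-0000", KIND_ID, "Metadata", "Length", "00:05:22"),
    Descriptor("50000000-0000", KIND_ID, "Metadata", "WM/ProviderStyle",
               "Electronica"),
    Descriptor("60000000-0000", KIND_ID, "Audio", "Tonality/Major", 0.78),
    Descriptor("70000000-0000", KIND_ID, "Audio", "Tempo/Moderato", 0.79),
    Descriptor("80000000-0000", KIND_ID, "Classification", "Music", 0.8),
]


def best_classification(kind_id, rows):
    """Return (attribute, value) of the highest-valued Classification row."""
    # Assumption: the value is a score where higher means more confident.
    candidates = [d for d in rows
                  if d.kind_id == kind_id and d.type == "Classification"]
    top = max(candidates, key=lambda d: d.value)
    return top.attribute, top.value


def metadata(kind_id, rows):
    """Collect the Metadata rows of one kind into a dictionary."""
    return {d.attribute: d.value for d in rows
            if d.kind_id == kind_id and d.type == "Metadata"}


print(best_classification(KIND_ID, descriptors))
print(metadata(KIND_ID, descriptors)["Length"])
```

Because every fact about a kind is one more descriptor row, processors can keep adding knowledge about an item (metadata, audio features, classifications) without any change to the schema.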
DESIGN|MODELSPACE
DESIGN|PROPERTYSPACE
Periodically, MSE checks the User database for Changes
All Change data is returned to MSE as one XML block
MSE creates Kinds and Descriptors as needed, and Commits the activity
MSE data is exposed through custom views keyed to the Users’ Primary Keys
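The change-processing steps above could be sketched roughly as follows. The XML layout (`<changes>`, `<row>`, `<column>`), the sample rows, and `apply_changes` are all hypothetical: the deck states only that change data arrives as one XML block and that results are keyed to the user's primary keys.

```python
import xml.etree.ElementTree as ET

# Hypothetical change block; the real MSE format is not shown in the deck.
CHANGE_BLOCK = """
<changes>
  <row table="Customers" key="42" op="insert">
    <column name="Name">Contoso</column>
    <column name="City">Seattle</column>
  </row>
  <row table="Customers" key="43" op="update">
    <column name="City">Redmond</column>
  </row>
</changes>
"""


def apply_changes(xml_text, kinds):
    """Create or update one kind per changed row, keyed by (table, primary key)."""
    root = ET.fromstring(xml_text)
    for row in root.findall("row"):
        key = (row.get("table"), row.get("key"))
        # Create the kind if it does not exist yet.
        row_descriptors = kinds.setdefault(key, {})
        # Each changed column becomes (or overwrites) one descriptor.
        for column in row.findall("column"):
            row_descriptors[column.get("name")] = column.text
    return kinds


kinds = apply_changes(CHANGE_BLOCK, {})
print(kinds[("Customers", "42")])
print(len(kinds))
```

Keying each kind by the source table and primary key is what would let a view join engine data back to the user's own rows.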
DEMO|SQL PROPERTY PROMOTION
Seamless Integration of Meaning-Driven Indexing in ALL SQL Tables
Expose Meaning-Driven Indexing via T-SQL
PARTING THOUGHTS
Unified Search, Discovery and Insight over Every Digital Artifact
Extensible and Scalable Semantic Platform
Zero Learning Curve
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.