Enterprise Knowledge Graph (EKG)
Mining an Enterprise’s Systems of Engagement
Sumeet Vij, Senior Associate
Booz Allen Hamilton
Can you make your decisions on just 20% of your data?
◦ According to IDC Research, less than 20% of an enterprise’s information is structured data that can reside neatly in traditional columnar relational databases
◦ The other 80% is unstructured or semi-structured (documents, web pages, emails, images and videos) and is growing at a tremendous rate
◦ Current Enterprise Systems of Record (ERPs, CRMs) capture only a minuscule share of the information generated within an Enterprise
◦ However, the Systems of Record remain the main focus of the IT team and the main source of information for the enterprise leadership
Unstructured Data creates Enterprise Information Management Challenges
• Information is scattered and inaccessible
  ◦ Spread across documents, spreadsheets, emails
  ◦ Data is stored in multiple, often incompatible formats
• Data sources are not linked
  ◦ No documented relationships between pieces of information
  ◦ No easy way to harness data from external sources including social networks
• Information is hard to understand
  ◦ Different terminology and vocabularies
How do employees create and share information? Through Systems of Engagement
These systems are the primary way employees in an Enterprise communicate and share information, namely
◦ Email
◦ IM (Lync)
◦ Social collaboration tools like Yammer, Tibbr, Jive
Not surprisingly, these systems generate unstructured data at high velocity and variety
Systems of Engagement: loosely structured knowledge flows, conversational, dynamic and in flux
How does the industry extract information from unstructured text? Google Knowledge Graph
The Google Knowledge Graph provides “Things not just Strings”, that is, it enhances its search results with semantic information gathered from multiple sources. It provides structured information about entities and links to other related entities. Its goal is to help people
• Find the right thing: Find the right entity, understand the difference between Taj Mahal the monument and Taj Mahal the musician
• Get the best summary: Summarize relevant content related to the entity, key facts and other related entities
• Go deeper and broader: Help make unexpected discoveries and uncover unexpected relationships
How does an Enterprise extract information from these Systems of Engagement? Enter the Enterprise Knowledge Graph (EKG)
Along the lines of the Google Knowledge Graph, the EKG aims to help enterprises extract and explore information created by systems of engagement. Core EKG concepts are:
• Knowledge Capture: Extract key concepts and relationships from unstructured documents using an Enterprise Ontology. This allows concept-based indexing of content
  ◦ Example: An employee submits a trip report in the form of an email. EKG automatically extracts the Who, What, When and Where information and links it to other relevant resources.
• Knowledge Discovery: Search multiple data sources for information using a relevant Enterprise Ontology
  ◦ Example: A proposal manager can ask, “Who has background information about the Army CIO/G6?”
• Knowledge Exploration: Expose information to a host of graphical tools to visualize and further analyze relationships between data
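The capture and discovery concepts above can be illustrated with a minimal sketch, assuming extracted facts are stored as (subject, predicate, object) triples. The class name, the predicate vocabulary ("involved", "attended", "presented at") and the sample people and events are all hypothetical and are not taken from the EKG itself.

```python
from collections import defaultdict

# Predicates that connect a person to an event (illustrative vocabulary).
PERSON_TO_EVENT = {"attended", "presented at"}


class KnowledgeGraph:
    """A tiny in-memory store of (subject, predicate, object) facts."""

    def __init__(self):
        # Maps each object to the (subject, predicate) pairs pointing at it.
        self.incoming = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Knowledge Capture: record one extracted fact."""
        self.incoming[obj].add((subject, predicate))

    def who_knows_about(self, entity):
        """Knowledge Discovery: people linked to `entity` through a shared event."""
        events = {s for s, p in self.incoming.get(entity, ()) if p == "involved"}
        people = {
            s
            for event in events
            for s, p in self.incoming.get(event, ())
            if p in PERSON_TO_EVENT
        }
        return sorted(people)


# Hypothetical facts, as they might be extracted from trip reports.
kg = KnowledgeGraph()
kg.add("Demo", "involved", "Army CIO/G6")
kg.add("Alice", "presented at", "Demo")
kg.add("Bob", "attended", "Demo")
kg.add("Carol", "attended", "Team Lunch")

print(kg.who_knows_about("Army CIO/G6"))  # ['Alice', 'Bob']
```

A real deployment would replace the hand-written predicates with the Enterprise Ontology and the dictionary with a persistent store; the point here is only that the proposal manager's question becomes a two-hop graph lookup.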
How is the EKG seeded? Crowd-source the creation
The major source of information generation in an enterprise is email. The process to seed the EKG with email would be:
◦ The sender copies their email to a monitored EKG mailbox
◦ The EKG parses, analyzes and adds the extracted facts to the Knowledge Graph
◦ The EKG then sends an automated email back to the sender, describing the extracted facts and providing a link to correct them
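The three seeding steps can be sketched with Python's standard `email` package. This is only an illustration under stated assumptions: the mailbox address, the fact vocabulary ("sent", "sent on", "received") and the correction URL are hypothetical, and only header-level Who/When facts are extracted, not the body content that a real entity extractor would handle.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr, parsedate_to_datetime

EKG_MAILBOX = "ekg@example.com"  # hypothetical monitored mailbox


def extract_facts(raw_email):
    """Parse a copied email and return (sender, facts) from its headers."""
    msg = message_from_string(raw_email)
    sender = parseaddr(msg["From"])[1]
    subject = msg["Subject"]
    sent_on = parsedate_to_datetime(msg["Date"]).date().isoformat()
    # Everyone on To/Cc except the EKG mailbox itself.
    addressed = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    recipients = [addr for _, addr in addressed if addr != EKG_MAILBOX]

    facts = [(sender, "sent", subject), (subject, "sent on", sent_on)]
    facts += [(addr, "received", subject) for addr in recipients]
    return sender, facts


def confirmation_reply(sender, facts, correction_url):
    """Build the automated reply describing the facts, with a correction link."""
    lines = [f"- {s} {p} {o}" for s, p, o in facts]
    return (
        f"To: {sender}\n\n"
        "The EKG extracted these facts:\n"
        + "\n".join(lines)
        + f"\n\nCorrect them here: {correction_url}\n"
    )
```

Sending the reply (for example with `smtplib`) and writing the facts to the Knowledge Graph are left out; the sketch covers only the parse and describe steps.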
Start with a specific Ontology geared towards a high value use case and then build out the entities and their relationships
Integrate with Linked Data sources like Freebase and DBpedia to provide external context
Benefits of adding email to the EKG
◦ Bigger insights, as we can leverage the collective interactions of all the employees (not just the respondent); subsequent interactions enrich the EKG, allowing even more questions to be answered
◦ Liberate employee knowledge, expertise and interactions from the mailbox and make them available for the enterprise to leverage.
EKG Benefits
• Utilize all available knowledge sources
  ◦ Allows documents, spreadsheets and emails to serve as “top-level” information sources
• Integration
  ◦ Ties disconnected pieces of data together into meaningful wholes that provide a basis for planning and decision making
• Meaning-Centric
  ◦ Facts around an object or an entity can be easily explored
  ◦ Search phrases are better “understood” as they are based upon concepts and not literals
• Serendipity
  ◦ Related searches allow the formerly “unknown” to be discovered
How we discover information within an Enterprise today
[Figure: Knowledge Discovery. A Proposal Manager asks “Who has information about the Army CIO/G6?” and runs separate searches across Systems of Record (Resume System, Opportunity Management System, CRM) and Systems of Engagement (Trip Report, Social Network, Web). Node labels in the fact graph: Sumeet Vij, Cliff Daus, DA CB, DoD SOA & Semantic Technology Symposium, Demonstration at CIO/G6, Follow on Meeting CIO/G6, Customer, Semantic Technologies. Relationship labels: Presented at, Attended, Employee of, Topic.]
Knowledge Discovery using EKG
[Figure: A Proposal Manager asks “Who has information about the Army CIO/G6?”. Knowledge Discovery steps shown: Parse, Determine Sources for Information, Query. Sources shown: CRM, Opportunity Management System, Resume System, Knowledgebase. Knowledge Capture elements shown: Trip Reports, Meeting Minutes, etc.; Submit via Email Submission or Web Submission; Entity Extraction; Update of the Knowledgebase. Sumeet Vij also appears as a label in the figure.]
Conceptual EKG Architecture
• An open architecture composed of reusable open source components
[Figure: layered architecture diagram. Integration Layer: E-Mail Connector, Database Connector, Web Services Client. Other layers shown: Semantic Processing Layer, User Interface Layer, Persistence Layer. Other components shown: NoSQL Store, Entity Extraction, Knowledge Browser, Query UI, Document Upload, Concept Catalog, Data Source Catalog.]
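How the layers might fit together can be shown with a rough sketch. It assumes connectors belong to the Integration Layer, entity extraction to the Semantic Processing Layer, and a plain dictionary stands in for the NoSQL Store. All class and function names are hypothetical, and the substring matching is a deliberately naive placeholder for real entity extraction.

```python
from typing import Iterable, Protocol


class Connector(Protocol):
    """Integration Layer: anything that can fetch raw documents."""

    def fetch(self) -> Iterable[str]: ...


class ListConnector:
    """Stand-in for an e-mail or database connector, backed by a list."""

    def __init__(self, documents):
        self.documents = list(documents)

    def fetch(self):
        return iter(self.documents)


def extract_entities(text, concept_catalog):
    """Semantic Processing Layer: naive case-insensitive concept matching."""
    lowered = text.lower()
    return {concept for concept in concept_catalog if concept.lower() in lowered}


def ingest(connectors, concept_catalog, store):
    """Index every fetched document under each concept it mentions."""
    for connector in connectors:
        for document in connector.fetch():
            for concept in extract_entities(document, concept_catalog):
                # Persistence Layer: a dict stands in for the NoSQL Store.
                store.setdefault(concept, []).append(document)
    return store
```

Keeping connectors behind one small interface is what would let an e-mail connector, a database connector and a web services client feed the same extraction step.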
Questions?