How to combine structured and unstructured data for real business value

Source: files.meetup.com/1804355/Attivio - There is More Than Size to Big...

Proprietary & Confidential

Nachman Geva, GM Israel

Leon Ribinik, Solution Architect

[email protected] [email protected]

+972.52.743.0563 +972.54.457.5038

Post on 20-Apr-2018

Big Data Business Model: Amazon.com

• More products

• More customers

• More transactions

• More shipments

• More returns

• More…

Big Data Business Model 2000-2010

• Predictive analytics on highly structured data

1. Use structured data (or make it structured)

2. Put in a large, powerful “data warehouse”

3. Use predictive analytics software

4. Get business insight

• Examples of Business Applications

• Clickstream analysis → retail optimization

• CDR analysis → service optimization and churn

• Market basket analysis → upsell/bundling

• Risk analysis → margin and risk optimization

Disadvantages of “Old Model”

• Expensive

• Time consuming

• Rigid / Non-Agile

• Dependency & Integration

Big Data Solutions (today…)

Source: Zaponet http://www.zaponet.com/products

Problem Solved?

• Expensive?

– OSS on low-cost hardware (or cloud)

• Time consuming & Rigid?

– Broad range of NoSQL solutions

• Dependency & Integration?

– Agile

Big Data Challenges

• Current solutions still focus on a single, high-volume data type (e.g., clicks, logs)

• Latency of batch processing creates blind spots

• Questions about “what” can be answered, but the answers to “why” are contained within unstructured content

Let’s Think Bigger About Big Data…

• What Big Data source are you working with now?

– Customer behavior? (Log files? Clicks?)

– Customer transactions?

– System data? etc…

• How can you determine root causes behind Big Data trends?

• What other sources of information would help provide this insight?

– Documents?

– Web content?

– Email? etc…

Big Data: Just One Part of Extreme Information

• Big Data only addresses the challenge of volume, but not:

• Variety

• Velocity

• Complexity

• This broader enterprise picture is referred to as Extreme Information

Source: 'Big Data' Is Only the Beginning of Extreme Information Management, April 7, 2011, Gartner Group

Attivio’s Extreme Information Value Proposition

Extreme Information

Attivio completes the Big Data picture:

• Add “why” insights from unstructured content

• Access all data types for BI and decision making with one query

• Turn Big Data into real-time Active Information that initiates action

• Eliminate latency of information that causes “blind spots”

Attivio AIE: a Unified Information Access platform

Attivio’s Active Intelligence Engine (AIE)

• Enterprise software, deployed on-premises or in the cloud (AWS, Azure), for building strategic applications and solutions that consume information from multiple sources

• Integrates information of any type (structured, unstructured), in any format, from any repository, inside or outside the enterprise

• Uniquely correlates information across sources at query time, so the system is agile, not brittle

• Access information at any level – document/record, thumbnail, aggregate/trend – using search or SQL

• Build analytics that incorporate information from all silos of information

[Diagram: AIE layers (Access, Analytics, Correlation, Integration) turn content & data trapped in silos into valuable, actionable information]

[Diagram: the ATTIVIO ACTIVE INTELLIGENCE ENGINE (AIE) shown together with the applications it serves and the information sources it draws on]

Applications: Enterprise & Packaged Applications; Search & Discovery; Ad Hoc Query Tools; Active Dashboards; Business Intelligence Tools; Personalized Content & eCommerce

Structured Data: ERP/CRM/Other DB Applications/ADBMS; Data Integration (ETL, MDM etc.); Data Warehouses; Data Mart

Unstructured Content: CMS/Documents/PDFs/Email archive etc.; Unstructured Data; Data Mart

Distributed Data Mgmt.: Hadoop/Machine Data, etc.; Splunk

External Structured & Unstructured: Web/Social Media/SaaS/RSS

AIE – Architecture & Capabilities

CONNECTORS

• Databases

• Content Management Systems

• Applications

• Wrappers for command-line utilities, Web Services, etc.

INGESTION WORKFLOWS (INGEST-SIDE)

• Language Identification

• Tokenization/Segmentation

• Lemmatization/Stemming

• Entity Extraction

• Entity/Sentiment Analysis

• Classification

• Key Phrase Extraction

QUERY WORKFLOWS (QUERY-SIDE)

• Active Security™

• Facet Finder™

• JOIN Processing

• Predictive Autocomplete

• Spelling Suggestions

• Relevancy Ranking

• Result Content & Sorting

• Spotlighting

• Alerts & Syndication

• Recommendations

ANALYTIC WORKFLOWS (ANALYTICS)

• Feed information already in AIE through workflows to do additional enrichment, transform data, or perform analytic calculations

UNIVERSAL ENGINE

• Models

• Spotlights

Interfaces: SEARCH API; JDBC/ODBC

Front ends: SEARCH UI; EMBEDDED IN APPLICATIONS; BUSINESS INTELLIGENCE TOOLS; ATTIVIO ACTIVE DASHBOARDS
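The ingest-side steps named on this slide can be pictured as a chain of small stages that each enrich a document. The sketch below is only an illustration in plain Python, not Attivio's API: the stage functions, the `run_workflow` helper, the stopword list, and the crude suffix-stripping stemmer are all assumptions standing in for real tokenization, stemming, and key phrase extraction.

```python
import re
from collections import Counter

# Hypothetical stopword list, just large enough for the demo text.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}


def tokenize(doc):
    """Tokenization stage: lowercase the text and split it into word tokens."""
    doc["tokens"] = re.findall(r"[a-z0-9]+", doc["text"].lower())
    return doc


def stem(doc):
    """Stemming stage: crude suffix stripping, a stand-in for a real stemmer."""
    def strip(token):
        for suffix in ("ing", "ed", "s"):
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                return token[: -len(suffix)]
        return token

    doc["stems"] = [strip(t) for t in doc["tokens"]]
    return doc


def key_terms(doc, top_n=3):
    """Key-term stage: the most frequent non-stopword stems."""
    counts = Counter(s for s in doc["stems"] if s not in STOPWORDS)
    doc["key_terms"] = [term for term, _ in counts.most_common(top_n)]
    return doc


def run_workflow(doc, stages):
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


INGEST_WORKFLOW = [tokenize, stem, key_terms]

doc = run_workflow(
    {"id": "d1", "text": "Service desk resolved the outage. Outages cost the service millions."},
    INGEST_WORKFLOW,
)
print(doc["key_terms"])
```

Because each stage only reads and adds fields on the document, stages can be reordered or swapped for better implementations without touching the rest of the chain.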

Text Analytics

• Key phrases

• Auto-classification

• Sentiment analysis

• Entity sentiment

• Entity/concept extraction

Unified Information Access - Example

Analyze & enrich unstructured data

Retain & respect normalized structure

John Smith <[email protected]>

New engagement

I am delighted that we were able to move forward … your service desk has been wonderful and helped resolve…

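As a rough illustration of the two ideas on this slide, the sketch below enriches an unstructured e-mail with a sentiment score and then correlates it with a structured customer record at query time, leaving the record itself untouched. Everything here is assumed for the example: the `CUSTOMERS` table, the e-mail address, the tiny word lists, and the `enrich` and `correlate` helpers are hypothetical and are not how AIE implements sentiment analysis.

```python
import re

# Structured side: a hypothetical CRM table keyed by e-mail address.
CUSTOMERS = {
    "john.smith@example.com": {"customer_id": "C-100", "name": "John Smith"},
}

# Unstructured side: the e-mail from the slide (sender address is made up).
EMAIL = {
    "sender": "john.smith@example.com",
    "subject": "New engagement",
    "body": "I am delighted that we were able to move forward … "
            "your service desk has been wonderful and helped resolve…",
}

# Toy sentiment lexicon, an assumption for this sketch.
POSITIVE = {"delighted", "wonderful", "helped"}
NEGATIVE = {"outage", "slow", "frustrated"}


def enrich(message):
    """Analyze & enrich: add a lexicon-based sentiment score to the message."""
    words = re.findall(r"[a-z']+", message["body"].lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {**message, "sentiment_score": score, "sentiment": label}


def correlate(message, customers):
    """Retain structure: look up the customer row by sender without changing it."""
    return {"customer": customers.get(message["sender"]), "message": message}


result = correlate(enrich(EMAIL), CUSTOMERS)
print(result["customer"]["name"], result["message"]["sentiment"])
```

The customer row keeps its normalized shape; the sentiment lives on the message, and the two are only brought together when a query asks for both.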

AIE – Triples & Graphs

<triple id="1">
  <entityId>P01</entityId>
  <name>Joe</name>
  <is>person</is>
  ...
</triple>

All people who live in a college town:

JOIN(is:person, INNER(JOIN(is:city, INNER(is:college, on="name=locatedIn")), on="livesIn=name"))

All people who live in a college town with “happy students”:

JOIN(is:person, INNER(JOIN(is:city, INNER(JOIN(is:college, INNER(AND(table:news, NEAR(happiest, students)), ON="name=college")), ON="name=locatedIn")), ON="livesIn=name"))
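To make the nesting in the two queries easier to follow, here is a small in-memory sketch of the same logic in Python. It is an interpretation, not AIE's engine: it assumes each `on="parentField=childField"` clause keeps a parent record when at least one child record matches, and it approximates `NEAR(happiest, students)` as "both words within a few tokens of each other". Apart from Joe, all records, names, and the `inner_join` and `near` helpers are made up for the example.

```python
import re

# Hypothetical records in the spirit of the <triple> example.
RECORDS = [
    {"is": "person", "name": "Joe", "livesIn": "Springfield"},
    {"is": "person", "name": "Ann", "livesIn": "Rivertown"},
    {"is": "person", "name": "Raj", "livesIn": "Hillview"},
    {"is": "city", "name": "Springfield"},
    {"is": "city", "name": "Rivertown"},
    {"is": "city", "name": "Hillview"},
    {"is": "college", "name": "Springfield College", "locatedIn": "Springfield"},
    {"is": "college", "name": "Hillview Tech", "locatedIn": "Hillview"},
]

# Hypothetical unstructured news items, each tagged with a college name.
NEWS = [
    {"table": "news", "college": "Hillview Tech",
     "text": "Survey finds the happiest students in the region"},
    {"table": "news", "college": "Springfield College",
     "text": "Tuition rises again; students protest"},
]


def select(records, kind):
    """Equivalent of the is:<kind> filter."""
    return [r for r in records if r["is"] == kind]


def inner_join(parents, children, parent_key, child_key):
    """Keep each parent that matches at least one child on parent_key = child_key."""
    child_values = {c[child_key] for c in children}
    return [p for p in parents if p[parent_key] in child_values]


def near(text, a, b, window=5):
    """Rough proximity test: words a and b occur within `window` tokens."""
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


people, cities, colleges = (select(RECORDS, k) for k in ("person", "city", "college"))

# Query 1: all people who live in a college town.
college_towns = inner_join(cities, colleges, "name", "locatedIn")
q1 = inner_join(people, college_towns, "livesIn", "name")

# Query 2: the same, restricted to colleges with "happiest students" news.
happy_news = [n for n in NEWS if near(n["text"], "happiest", "students")]
happy_colleges = inner_join(colleges, happy_news, "name", "college")
happy_towns = inner_join(cities, happy_colleges, "name", "locatedIn")
q2 = inner_join(people, happy_towns, "livesIn", "name")

print([p["name"] for p in q1], [p["name"] for p in q2])
```

Reading the code from the innermost join outward mirrors how the nested `JOIN(..., INNER(...))` expressions compose: the second query only adds one more inner level that filters colleges by a full-text condition.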

Scalability Model: Real Life Example

• 700M documents (Office, PDF and web pages), 1B+ security objects, plus person records and metadata

• 42k records/second

• Organized in 75k Communities of Excellence (COE)

• 500k users (350k employees, 150k partners)

• Multiple auth schemes, but query is SSO

• Total servers required for HA solution: 8

Customer Example: Large Mutual Funds Firm

Problem

• System outages costing millions per year in slower cash collections, lost productivity.

• Service interruptions taking too long to resolve.

Very high: Critical, diverse information sources include application log data; documents (SharePoint, Documentum, etc.); HP Service Center data; People Central data

Very high: Troubleshooting content scattered across 60+ internal sources. Also, must identify specific log data in real time that indicate a system problem

High: Heavy velocity and massive volume of log data from over 90 internal applications

Case Study

Challenge

• Build next-gen competitive BI solution to help biopharmas find new research areas most likely to yield new blockbuster drugs

Solution: Active Intelligence Engine

• Integrates > 20 million scientific publications, 8 million patents, 150,000 diseases, and 100,000 clinical trials

• Advanced search, structured querying, faceted navigation of all biopharma documents

• At any time, the user can visualize results as time-series BI data

• AIE in-engine analytics calculate a “Relay Score” for each document

Outcomes

• Relay launched 9 months ahead of schedule at 1/3rd of the development cost

“First of our competitors to market, ahead of schedule and under budget [with] views on the data just not possible in the SQL world.” --Brigham Hyde, COO, Relay™

Complete Data Discovery Experience

Discovered insight from AIE text analytics

Complete discovery experience: filtering via full-text search

Visualization of unstructured content from non-relational sources

Root Cause Analysis with Unstructured Content

Contextual highlighting of details enriches visual analysis

Synonym and acronym expansion and automatic search by corrected spelling refine results

Demos

http://www.attivio.com/resources/demos.html

Thank You
