SmartData Fabric ® security-centric distributed virtual data, master data and graph data management, and analytics SmartData Fabric ® (SDF) aka SmartData Lake™ (SDL) aka Distributed Index-based and Conventional Data Virtualization Basic Overview Revision 12.2 Copyright 2020 WhamTech, Inc. 1 October 2020


Post on 08-Aug-2020


Page 1: aka Distributed Index-based and Conventional Data ... · Conventional Data Virtualization Basic Overview May 2020 ... Unique security-centric distributed indexed adapter-based data

SmartData Fabric® security-centric distributed virtual data, master data and graph data management, and analytics

SmartData Fabric® (SDF) aka SmartData Lake™ (SDL) aka Distributed Index-based and Conventional Data Virtualization

Basic Overview

Revision 12.2 Copyright 2020 WhamTech, Inc. 1

October 2020

Significant customers and partners

CUSTOMERS PARTNERS

Very large healthcare provider

Very large bank

Conventional data solutions compared to SmartData Fabric®

Conventional data solutions between a rock and a hard place

ROCK = DATA VIRTUALIZATION/FEDERATION

Pros

• Leaves data where it is

• Easy to add or remove data sources

• Meets on-soil data retention regulations

Cons

• 100% dependent on source data quality and systems for queries

• Access control and data security impacted by data quality

• Difficult to manage and integrate Master Data Management (MDM)

HARD PLACE = DATA WAREHOUSE

Pros

• Removes dependency on source data quality and systems for queries

• Determines best (master) data, deduplicates data and ensures referential integrity

• Single database for queries

Cons

• Copies all data, which introduces latency and a security liability

• Transforms source schemas and data to a one-size-fits-all database schema

• Takes significant time and cost

• Has an inflexible schema

• Difficult to add or remove data sources

• Difficult to trace and erase personal data for CCPA/CCPR and GDPR

• In many cases, needs additional data marts

• Does not meet on-soil data retention regulations

Data lakes are in-between solutions

IN-BETWEEN A ROCK AND A HARD PLACE = DATA LAKES

Pros

• All data in a single location or system

• Leaves schema and data “as are”

• Helps with IT issues of access control, scalability, and query processing, performance and load

• Easy to add or remove data sources

Cons

• Copies all data, which introduces latency and a security liability

• Does not help with data management: still requires ETL to a data warehouse and then data marts, or an additional data management layer

• New market solutions are data lake + ETL + data warehouse (+ data marts?)

• Difficult to erase personal data as per CCPA/CCPR and GDPR unless traceback is available from the ETL process or data management layer

• Does not meet on-soil data retention regulations

Comparison among conventional data solutions

Feature
Rock = Data virtualization/federation
Hard Place = Data warehouse
In-between = Data lake

Leaves data in sources ✓
Leaves schema and data “as are” ✓ ✓
Addresses data quality and other data-related issues ✓
Avoids queries on source systems ✓ ✓
Easy to add/remove data sources ✓ ✓
Supports on-soil data retention regulations ✓
Avoids schema and resultant complex data transformation ✓ ✓
Supports traceback and erase personal data ✓ ✓
Avoids latency and security liability ✓
Avoids additional ETL ✓ or ✓
Offers integrated Master Data Management (MDM) ✓


Almost ANY and MULTIPLE data sources on ANY and MULTIPLE platforms, including

• mainframes

• databases

• data lakes/warehouses/marts

• files

• logs

• office docs

• applications/SaaS

• email

• Web docs

• social media

• Big Data

• streaming

• clouds

• IoT

SmartData Fabric® is a complete data management layer


DATA + virtualization

discovery

identification

classification

security

cleansing

entity extraction

transformation

standardization

access control

governance

federation or index store

relationships/links

master data management (MDM)

integration

catalog

monitoring

virtual graph database

support reporting, BI and analytics

Leverage index-based federated adapters for preprocessing and query execution only on data that needs attention; much or most data may not, e.g., transactional data

Comparison between conventional data solutions and SmartData Fabric®

Feature
Rock = Data virtualization/federation
Hard Place = Data warehouse
In-between = Data lake
SmartData Fabric®

Leaves data in sources ✓ ✓
Leaves schema and data “as are” ✓ ✓ ✓
Addresses data quality and other data-related issues ✓ ✓
Avoids queries on source systems ✓ ✓ ✓
Easy to add/remove data sources ✓ ✓ ✓
Supports on-soil data retention regulations ✓ ✓
Avoids schema and resultant complex data transformation ✓ ✓ ✓
Supports traceback and erase personal data ✓ ✓ ✓
Avoids latency and security liability ✓ ✓
Avoids additional ETL ✓ or ✓ ✓
Offers integrated Master Data Management (MDM) ✓ ✓

Enterprise needs

SmartData Fabric® impacts all enterprise needs

Most Enterprises
• MEETING increasing REGULATIONS and regulatory scrutiny
• INCREASING EFFICIENCY by reducing costs and latency in operations
• INCREASING EFFECTIVENESS by gaining and leveraging complete views of clients and other entities using analytics
• LEVERAGING NEW TECHNOLOGIES such as AI/ML, Big Data, cloud, virtualization, APIs, process automation and Blockchain
• ADDING VALUE to clients and therefore the enterprise

Meeting these needs depends on access to high quality, standardized data and master data

Digital Transformation -> Projects -> Changes

Digital Transformation [DRIVER] -> Project Management -> Change Management

These processes depend on access to high quality, standardized data through standard APIs and workflows, and support for interoperability


A fundamental shift in all markets

• Data is now seen as a prime asset

• Data-driven reporting, BI and analytics are, in turn, seen as driving business

• Data plus AI/ML, cloud, hybrid cloud, data/app/network/storage virtualization, APIs, process automation, Blockchain, etc. are seen as differentiators


However, enterprises continue to struggle with data-related fundamentals, including…

• Data access control and data security
• Master data management (MDM)
• Reporting, BI and analytics
• Compliance
• Near real-time access to, and insights from, data
• Closing the loop from analytics to operations
• CCPA/CCPR and/or GDPR
• Data discovery
• Metadata
• Data quality
• Legacy data sources, including mainframes and file systems
• Unstructured data, including docs, email and social media
• Data catalog
• Data governance
• Data integration
• Data interoperability

Main data problems

Main data problems are (a) data is everywhere, and (b) data-related issues

Migrating ALL data and applications to a cloud helps with IT issues, but NOT data-related issues, and is UNREALISTIC in the short-to-medium term for most enterprises


Main data problems

Data is everywhere

• Where is it, including copies (aka discovery)?
• What is it, e.g., personal/sensitive?
• Is it secured/protected?
• Is metadata available?
• Is governance in place?
• What is it used for? This is essential for CCPA/CCPR and GDPR, with the right to erase personal data
• Can it be copied to one location, e.g., a data lake?
• Can it be ETL’d to a one-size-fits-all database, e.g., a data warehouse?
• How is it connected to other data within and across source systems?

Data issues

• Does it need cleansing, transformation and/or standardization?
• Does it need pre-processing, e.g., unstructured data?
• Does it need to be de-duplicated, keeping the latest/best?
• Does it need to be aggregated, joined and/or calculated?
• Is it part of master data?
• Is master data (i) seamlessly and automatically integrated with data access, (ii) updated in near-real-time, or (iii) centralized or distributed?
• Are different views required, e.g., operational vs. analytical?


Typical data source, access control and deployment issues

Data source issues

• Are standard drivers of ODBC and JDBC available? SQL query processing? What about APIs?

• Query performance? Availability of indexes and indexed views? Load on data sources?

• If using cache, how is it populated and maintained? Load on data sources? Latency?

• Data monitoring? Event processing? BPM workflows/process automation?

• Are key updates, e.g., emails and phone numbers, propagated to other data sources, views and master data?

Data source access control issues

• Support for advanced access control within and across domains, e.g., AD/LDAP, IAM, SSO, RBAC, ABAC/RLS and CLS, regardless of data source support for any of these?

Compute deployment issues

• Accepting that data is everywhere, it is difficult to deploy compute everywhere. Even Hybrid Cloud (1.0) needs multiple remote local deployments that need to be resourced, managed and coordinated

SmartData Fabric® and the power of indexes

Conventional data virtualization/federation vendors

1. Leave data in sources (some do not)

2. Virtualize the view of data in sources

3. Connect standard applications using standard drivers and SQL

4. Access and query data in multiple sources in parallel, usually through connectors or adapters

5. Combine results from multiple sources (some do not)

6. Cache results data for improved query performance and less data source load (some do not)

7. Build and maintain MDM (some do not)

8. Apply MDM to combined results data to provide integrated results to applications (some do not)

In step 4 above,

• ALL CONVENTIONAL DATA VIRTUALIZATION VENDORS ARE 100% DEPENDENT ON DATA SOURCES AND DATA IN SOURCES for data quality, data standardization, available indexes and indexed views, and query processing, which
  - CAN IMPOSE SIGNIFICANT QUERY LOAD ON DATA SOURCES AND LEAD TO POOR QUERY PERFORMANCE
  - HAS AN IMPACT ON DATA SOURCE ACCESS CONTROL AND DATA SECURITY and
  - FAILS TO DELIVER 100%

➔ Enterprises need to enable all/most data-related fundamentals AND address data-related issues to be successful, not just enable access to data
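As a rough illustration of steps 4 and 5 above, the following Python sketch fans a query out to several sources in parallel and combines the results. The in-memory `SOURCES` dictionary, the `query_source` and `federated_query` helpers and the sample records are hypothetical stand-ins, not part of any vendor product. Because each source evaluates the filter on its own raw data, an inconsistent spelling in one source leads to a missed match.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for two data sources.
SOURCES = {
    "crm": [{"id": 1, "name": "Acme Corp", "city": "Dallas"}],
    "billing": [{"id": 77, "name": "ACME Corporation", "city": "Dallas"}],
}

def query_source(source_name, field, value):
    """Run an exact-match filter on one source (the source does all the work)."""
    rows = SOURCES[source_name]
    return [dict(row, source=source_name) for row in rows if row[field] == value]

def federated_query(field, value):
    """Fan the query out to every source in parallel and combine the results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_source, name, field, value) for name in SOURCES]
        combined = []
        for future in futures:
            combined.extend(future.result())
    return combined

by_city = federated_query("city", "Dallas")      # both sources match
by_name = federated_query("name", "Acme Corp")   # billing spelling differs
print(len(by_city), len(by_name))
```

The query by city finds both records, while the query by name finds only the CRM record, since nothing in this conventional flow standardizes the name before matching.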


SmartData Fabric® data virtualization/federation value-add

100% NOT dependent on data sources or data in sources, and addresses related issues upfront by
a) pre-processing raw source data (and optionally storing processed data in indexes) while building and maintaining indexes and indexed views,
b) processing queries against these indexes and indexed views, and
c) post-query processing raw results data read from sources (or read directly from indexes)
but only on data in sources that needs it!

• Most data in many systems is non-human-generated and does not need pre-processing or post-query processing, e.g., transaction systems, but may still need external indexing and query processing
• Some data needs pre-processing and post-query processing, such as customer, product, organization, etc., e.g., entities and associated attributes, usually for data quality, standardization and security, and master data management (MDM)

Enables all/most data-related fundamentals that enterprises continue to struggle with, mainly by addressing all/most issues with data sources, data in sources and access control

➔ COMBINES THE BEST of conventional data virtualization, data warehousing, enterprise search and graph database, and OVERCOMES THE WORST of these approaches


Unconventional data virtualization

• Transparent virtual distributed data management layer that plugs-and-plays in existing IT infrastructures
• Complements and leverages existing IT systems, tools and applications
• Key differentiator: Federated adapters that Read, Transform and INDEX (RTI) data, wherever it resides, process queries against these indexes, and read and transform results data from sources
• Indexes enable UPFRONT semi-automated data discovery, security, quality, standards, MDM and other data-related processes, BEFORE the first query is made or application is used
• Leaves and guards data in sources

[Figure: SDF deployment across an external commercial/public data source, the cloud, and Organizations A, B and C, with Data Governance, MDM, ETL and the organization’s own systems of reference. Legend: SDF Federation Server, SDF Adapter, SDF Indexes, SDF Hybrid Adapter, direct connect. Data sources are labeled DS1, DS2, etc.]
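The Read, Transform and INDEX (RTI) flow described in the bullets above can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the sample rows, the standardization rules and the function names are hypothetical and do not describe the actual SDF adapter implementation. The sketch reads raw rows, transforms values while building indexes, resolves a query on the indexes, and only then reads and transforms the matching rows from the source.

```python
from collections import defaultdict

# Hypothetical raw source rows; the source itself is never modified.
SOURCE_ROWS = {
    101: {"name": "  ACME corp. ", "phone": "(214) 555-0100"},
    102: {"name": "Globex", "phone": "214.555.0199"},
    103: {"name": "acme Corp", "phone": "2145550100"},
}

def standardize_name(raw):
    """Transform step: trim, lowercase and drop punctuation (assumed rules)."""
    cleaned = "".join(ch for ch in raw.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

def standardize_phone(raw):
    """Transform step: keep digits only."""
    return "".join(ch for ch in raw if ch.isdigit())

def build_indexes(rows):
    """Read, Transform and Index: map standardized values to source row keys."""
    indexes = {"name": defaultdict(set), "phone": defaultdict(set)}
    for key, row in rows.items():
        indexes["name"][standardize_name(row["name"])].add(key)
        indexes["phone"][standardize_phone(row["phone"])].add(key)
    return indexes

def query(indexes, rows, field, value, transform):
    """Resolve the query on the index, then read and transform source results."""
    keys = sorted(indexes[field].get(transform(value), set()))
    return [{"key": k, "name": standardize_name(rows[k]["name"])} for k in keys]

INDEXES = build_indexes(SOURCE_ROWS)
results = query(INDEXES, SOURCE_ROWS, "phone", "214-555-0100", standardize_phone)
print(results)
```

Rows 101 and 103 are both found even though their phone numbers are formatted differently in the source, because the match is made on the standardized index values rather than on the raw data.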


New paradigm – Read, Transform and INDEX (RTI)

INDEXES increase the value, and reduce the cost and risk, of DATA:

• Raw DATA
• DATA management
• Master DATA management
• DATA integration
• DATA relationships
• DATA analytics
• DATA access control
• DATA governance
• DATA protection regulations
• DATA classification
• DATA security
• DATA discovery and profiling

Source: #WhamTech SmartData Fabric Power of Indexes


Three types of indexes

• Content Indexes
• Master Data Indexes
• Link Indexes™

Content indexes are the basis for other indexes

All indexes resolve to “record numbers”, which are internal to SDF but correlated with external/data source references, and can be combined using Boolean operations on physical and virtual bitmaps
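To make the idea of combining indexes with Boolean operations on bitmaps concrete, here is a small Python sketch. It is an assumption-laden illustration, not SDF internals: the record numbers, the `POINTERS` table and the two sample indexes are hypothetical, and Python integers stand in for bitmaps (bit n set means record number n matches).

```python
# Record numbers are internal; POINTERS correlates them with hypothetical
# external references of the form (source, primary key).
POINTERS = {
    0: ("crm", 101),
    1: ("crm", 102),
    2: ("billing", 9001),
    3: ("billing", 9002),
}

def to_bitmap(record_numbers):
    """Encode a set of record numbers as an integer bitmap (bit n = record n)."""
    bitmap = 0
    for n in record_numbers:
        bitmap |= 1 << n
    return bitmap

def from_bitmap(bitmap):
    """Decode a bitmap back into a sorted list of record numbers."""
    return [n for n in range(bitmap.bit_length()) if (bitmap >> n) & 1]

# Hypothetical content indexes: value -> bitmap of matching records.
city_index = {"dallas": to_bitmap([0, 2, 3]), "austin": to_bitmap([1])}
status_index = {"active": to_bitmap([0, 1, 3]), "closed": to_bitmap([2])}

# city = 'dallas' AND status = 'active'
both = city_index["dallas"] & status_index["active"]
# city = 'austin' OR status = 'closed'
either = city_index["austin"] | status_index["closed"]

# Resolve internal record numbers to external source references.
matches = [POINTERS[n] for n in from_bitmap(both)]
print(matches)
```

The AND and OR are single integer operations regardless of which source the records came from; only the final step maps record numbers back to source references.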


INDEXES ARE KEY to a complete understanding of data, enabling capabilities and driving value

Including being able to identify and access GDPR, HIPAA, PCI and other confidential, classified and risk data


[Figure: SmartData Fabric® Indexes (data stays in sources) and the ten capabilities they enable, listed below]

1. Use raw indexes for DATA DISCOVERY (metadata), build and maintain DATA PROFILING, DATA MATCHING within and across data sources, and DEVELOPING AND TESTING DATA TRANSFORMS
2. Support FORRESTER ZERO TRUST DATA SECURITY FRAMEWORK (discover, INDEX, classify and secure) for GDPR, PCI, PHI, PII, etc.
3. PRE-PROCESS DATA while building and maintaining production indexes to address data management fundamentals, e.g., cleansing, transformation, standardization and security; data is usually discarded
4. Use LINK INDEXES™ AS BASIS FOR MDM AND OTHER CAPABILITIES; future development is to use indexes exclusively for MDM match and merge
5. Provide COMPLETE AND MULTIPLE VIEWS OF DATA through queries on combined content, link and master data indexes
6. Provide FULL DATA TRACEABILITY, as indexes and results contain unique pointers to data in sources, for data lineage, governance and audit
7. Enable HIGH PERFORMANCE, DISTRIBUTED PARALLEL QUERY PROCESSING through standard drivers, APIs, Web/data services, SQL and other query languages
8. MONITOR DATA SOURCES for content and relationships in near real-time, and support EVENT PROCESSING
9. Enable VIRTUAL GRAPH DATABASE, link analysis and graph/link visualization
10. GENERATE RESULTS WITHOUT DATA SOURCE when source is unavailable, for query optimization, or as storage, e.g., for IoT devices, as indexes are columnar and can be inverted and combined
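Item 6 above, traceability through pointers back to data in sources, can be sketched in a few lines of Python. This is a hypothetical illustration only: the sample records, the pointer format `(source, table, primary key)` and the function names are assumptions, not the SDF data model. It shows how an index on a standardized identifier can list every source location holding a person's data, which is the starting point for a lineage report or an erase request.

```python
from collections import defaultdict

# Hypothetical records read from two sources.
RECORDS = [
    {"source": "crm", "table": "contacts", "pk": 7, "email": "Ann@Example.com"},
    {"source": "billing", "table": "invoices", "pk": 301, "email": "ann@example.com"},
    {"source": "crm", "table": "contacts", "pk": 8, "email": "bob@example.com"},
]

def build_email_index(records):
    """Index standardized emails; each entry keeps a pointer back to the source row."""
    index = defaultdict(list)
    for rec in records:
        pointer = (rec["source"], rec["table"], rec["pk"])
        index[rec["email"].strip().lower()].append(pointer)
    return index

def trace(index, email):
    """Return every source location that holds data for the given email."""
    return sorted(index.get(email.strip().lower(), []))

EMAIL_INDEX = build_email_index(RECORDS)
locations = trace(EMAIL_INDEX, "ANN@example.com")
print(locations)
```

The lookup returns one location in each source for the same person, even though the email is capitalized differently in the CRM record.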

Indexes are key to understanding data, enabling capabilities and driving value

Eight types of content indexes

Content Indexes

• Source data
• Composite (source data combined)
• Derived from source data
• Indexed views (pre-aggregated, calculated and joined data)
• Unstructured text
• Extracted entities
• Fuzzy match
• Security or access level

SmartData Fabric® configuration and deployment options

SmartData Fabric® capabilities address issues

[Figure: Hybrid Cloud 2.0 configuration spanning the cloud and systems external to the cloud (on-premise, SaaS, data center, other cloud, etc.). Labeled components: Applications, Standard Drivers/SQL*, AD/LDAP, EIQ Federation Servers (SDV with MDM extension), EIQ SuperAdapters (SDV with MDM extension) with Content Indexes, EIQ ConventionalAdapter (SDV), Hybrid Adapter, Master Customer Index, MCRs, Link Indexes, Credit Reporting Bureau (Oracle) and Credit Card Transactions (MapR Hive).]

SEAMLESS INTEGRATION OF MASTER CUSTOMER DATA WITH OPERATIONAL/TRANSACTIONAL AND ANY OTHER DATA

SDV = Standard Data View aka Standard/Common Data Model
MCR = Master Customer Record
* = All SDF Adapters have Standard Drivers/SQL

SmartData Fabric® capabilities address issues

[Figure: Hybrid Cloud 2.0 configuration spanning the cloud and systems external to the cloud (on-premise, SaaS, data center, other cloud, etc.), with Applications, Standard Drivers/SQL*, AD/LDAP, EIQ Federation Servers, EIQ SuperAdapters with Content Indexes, an EIQ ConventionalAdapter, a Hybrid Adapter, Master Customer Index, MCRs, Link Indexes, Credit Reporting Bureau (Oracle) and Credit Card Transactions (MapR Hive), annotated with the numbered capabilities below]

1. Data discovery
2. Only select data
3. Links/relationships
4. MDM
5. SDVs
6. Unstructured data
7. Data monitoring
8. Standard drivers/SQL
9. EIQ layer
10. Advanced access control
11. Hybrid Cloud 2.0 (and 1.0)

SEAMLESS INTEGRATION OF MASTER CUSTOMER DATA WITH OPERATIONAL/TRANSACTIONAL AND ANY OTHER DATA

SDV = Standard Data View aka Standard/Common Data Model
MCR = Master Customer Record
* = All SDF Adapters have Standard Drivers/SQL

Smartphone CRM app invoking new BPM-based workflows with write-back to legacy systems through standard APIs/data services

[Figure: Organizations A, B and C and a Public Cloud. Labeled components: EIQ Federation Servers, EIQ Adapters with Indexes, Patient Management, Labs, EHR Type 1, EHR Type 2, EHR Type 3, Web Servers, Applications, BPM Workflow and FHIR REST APIs. Deployment labels: Hybrid Cloud 1.0; Hybrid Cloud 2.0; “Data sources remain on premise, local SmartData Fabric deployed on premise”; “Data sources remain on premise, local SmartData Fabric deployed on Cloud”.]

• Patient-centric smartphone app interacts with legacy data sources through new workflows developed and orchestrated by BPM software
• BPM workflows interact with data source through standard FHIR REST APIs provided as data services
• BPM workflows both read and write back to legacy data sources
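The read and write-back pattern in the bullets above can be illustrated with a short Python sketch that builds, but does not send, the two kinds of FHIR REST requests a workflow step might issue. The base URL, the helper names and the sample patient are hypothetical. The URL shapes follow the standard FHIR conventions of `GET [base]/[type]?[parameters]` for search and `PUT [base]/[type]/[id]` for update.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://fhir.example.org/r4"  # hypothetical data service endpoint

def build_search(resource_type, **params):
    """Describe a FHIR search (read path) as a (method, url, body) tuple."""
    return ("GET", f"{BASE_URL}/{resource_type}?{urlencode(params)}", None)

def build_update(resource):
    """Describe a FHIR update (write-back path) for an existing resource."""
    url = f"{BASE_URL}/{resource['resourceType']}/{resource['id']}"
    return ("PUT", url, json.dumps(resource))

# Read: a workflow step looks up a patient by family name and birth date.
read_request = build_search("Patient", family="Smith", birthdate="1970-01-01")

# Write-back: the same workflow updates the patient's phone number.
patient = {
    "resourceType": "Patient",
    "id": "123",
    "telecom": [{"system": "phone", "value": "214-555-0100"}],
}
write_request = build_update(patient)

print(read_request[0], read_request[1])
print(write_request[0], write_request[1])
```

Keeping request construction separate from transport makes the sketch runnable without a server; a real workflow would hand these tuples to an HTTP client and handle authentication and errors.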


Special note on Hybrid Cloud 2.0

1. Data is everywhere, so leave it where it is: on-premise, on mainframes, in data centers, cloud(s), SaaS, third-parties, Web, social media, etc.
2. Run SmartData Fabric® unconventional data virtualization in the cloud, leveraging index-based, in addition to conventional, federated adapters for data-related pre-processing and query processing in and/or from the cloud; there is no need to install and run anything elsewhere, as is the case with Hybrid Cloud 1.0
   - Establish index update process through change data capture (CDC)
   - Multiple CDC options, including near real-time (NRT)
3. Focus on data that needs processing for quality, standardization, security, relationship mapping and master data management (MDM); various options exist for the rest of the data
   - Enable data-related fundamentals
   - Address data, data source and access control issues
4. Multiple configuration options, including (a) some data indexed and the rest stays in the source, (b) all data indexed and stored in indexes, and (c) no data indexed and all queries on data source, with other options in-between
5. Avoid incomplete or incorrect query results, query load and/or poor query performance of conventional data virtualization/federation, i.e., avoid dependence on data sources, data in sources or the data source's own access control
6. Immediate short-to-medium-term implementation
7. Optional, medium-to-longer-term transition-migration to the cloud
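The index update process through CDC mentioned under item 2 can be pictured with the following Python sketch. The event format (`op`, `key`, `before`, `after`) and the `apply_change` helper are assumptions made for illustration; actual CDC feeds and SDF index maintenance will differ. The point is that each change event touches only the affected index entries, so the index follows the source without a full rebuild.

```python
from collections import defaultdict

def apply_change(index, event):
    """Apply one hypothetical CDC event to an inverted index (value -> set of keys)."""
    key = event["key"]
    if event["op"] in ("update", "delete"):
        old = event["before"]
        index[old].discard(key)
        if not index[old]:
            del index[old]  # drop empty postings
    if event["op"] in ("insert", "update"):
        index[event["after"]].add(key)

city_index = defaultdict(set)
events = [
    {"op": "insert", "key": 1, "after": "dallas"},
    {"op": "insert", "key": 2, "after": "austin"},
    {"op": "update", "key": 2, "before": "austin", "after": "dallas"},
    {"op": "delete", "key": 1, "before": "dallas"},
]
for event in events:
    apply_change(city_index, event)

print(dict(city_index))
```

After the four events only record 2 remains, indexed under its updated value.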

Example multiple data source SDF configuration

[Figure: Application(s) connect over TCP/IP through the WhamTech ODBC/JDBC Driver, APIs and Web/data services to EIQ Federation Servers, with firewalls between tiers. Labeled adapters and sources: EIQ SuperAdapters with Indexes (Social Media Feed, Hadoop, Mainframe, RDBMS), EIQ Conventional Adapter, 3rd Party Adapter, Salesforce and ERP System.]

• Adapters and federation servers independently configurable and accessible at multiple levels
• Potential LIFO/FIFO query processing

Example shared-nothing architecture

[Figure: shared-nothing architecture. Components shown: Application(s), EIQ Federation Servers, EIQ SuperAdapters, Indexes and the Data Source.]

• Indexes can be multiple sharded segments or replicated copies

• Out-of-the-box configurable backup, failover and load balancing = high availability
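The idea of sharded index segments can be sketched as a scatter-gather lookup. This is an illustrative sketch only: `IndexShard`, `build` and `query` are hypothetical names, the CRC-based shard assignment is an assumption, and replication, failover and load balancing are not shown.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


class IndexShard:
    """One independent index segment that shares nothing with its peers."""

    def __init__(self):
        self.postings = {}  # value -> set of record numbers

    def add(self, value, recno):
        self.postings.setdefault(value, set()).add(recno)

    def lookup(self, value):
        return self.postings.get(value, set())


def build(records, shard_count):
    """Distribute (recno, value) pairs across shards by hashing the recno."""
    shards = [IndexShard() for _ in range(shard_count)]
    for recno, value in records:
        shards[zlib.crc32(recno.encode()) % shard_count].add(value, recno)
    return shards


def query(shards, value):
    """Scatter the lookup to every shard in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: shard.lookup(value), shards))
    return sorted(set().union(*partials))
```

Because each shard answers from its own postings, adding shards spreads both storage and lookup work; a replicated copy would simply be a second shard list built from the same records.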

Initial EIQ Adapter configuration, index build and data view mapping

[Figure: initial configuration and index build, from the Data Source through the EIQ Adapter* w/SDV** to the EIQ Indexes, backed by a Distributed Metadata Repository, incl. Data Governance. Side label: Automated SmartData Discovery and Classification (ASDAC) thus far.]

Steps shown:
- Network Asset and Device Discovery
- Data Source Discovery
- Metadata Discovery and Semantic Mapping
- Data Discovery and raw index-based Data Profiling
- Data Classification and Data Security
- Develop and test Data Transforms using profiles
- Data Read, Transform/clean-up (and Index)
- Indexes mapped to SDV

Notes:
- Index schema and names usually same as data source
- Twelve ways to build and maintain indexes
- Indexes usually do not store data – only queryable representations
- Alternate use of raw indexes to initially build EIQ Indexes

*EIQ SuperAdapter and EIQ TurboAdapter

**Standard Data View

EIQ Adapter index update, query and results retrieval

[Figure: index update, query and results retrieval. Components shown: Application(s); Middleware; EIQ Server (sub-middleware); EIQ Federation Server (sub-middleware) w/SDV; further EIQ Federation Servers; EIQ Adapter* w/SDV**; EIQ Indexes; Data Source; multiple other data sources; Distributed Metadata Repository, incl. Data Governance.]

Notes:
- Data Read, Transform/clean-up (and Index)
- Continual EIQ Indexes updates
- Queries resolved in the EIQ Adapter and EIQ Indexes
- Result-set pointers to data in source
- Raw results data usually transformed/cleaned-up from source
- Results provided in almost any format
- Applications/middleware connect with standard drivers or Web Services and SQL***
- User-level access

*EIQ SuperAdapter and EIQ TurboAdapter

**Standard Data View

***Future OQL, SPARQL and NoSQL options

Index updates/change data capture

[Figure: twelve index update options placed along an axis labeled DECREASING INTRUSIVENESS and keyed by the legend below.]

Data Schema Level
- Triggers
- Transaction / change / redo logs
- Existing replication / backup / change data capture processes
- Batch updates (schema file export)
- Incremental updates (schema file export)

Either Data Schema Level or Results Level
- Crawler / spider
- Message queues
- RSS feeds*

Results Level
- Batch updates (flat file export)
- Incremental updates (flat file export)
- Polling*
- Update / event notifications*

LEGEND
- Near real-time – low rate
- Near real-time – high rate
- Batch / incremental – high volume
- Batch / incremental – low volume
- Preferred option

* = User-level access

Source: #WhamTech SmartData Fabric Power of Indexes
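Of the options above, polling is the simplest to sketch. The example below is a hedged illustration, not the product's mechanism: it assumes a hypothetical `person` table with an `updated_at` change marker, uses SQLite as a stand-in data source, and keeps the index as a plain dictionary. Deletions are not detected by this kind of polling.

```python
import sqlite3


def poll_changes(conn, index, watermark):
    """Apply rows changed since `watermark` to the index; return the new watermark.

    `index` maps a first-name value to the set of record ids holding it.
    The table and column names are assumptions made for this sketch.
    """
    rows = conn.execute(
        "SELECT per_id, per_fname, updated_at FROM person "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    for per_id, fname, updated_at in rows:
        # Drop any stale posting for this record before indexing the new value.
        for recnos in index.values():
            recnos.discard(per_id)
        index.setdefault(fname, set()).add(per_id)
        watermark = max(watermark, updated_at)
    return watermark
```

Calling `poll_changes` on a timer with the previously returned watermark gives incremental updates; how often it runs decides whether the result is batch or near real-time.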

Levels of data representations and processes

[Figure: levels of data representation, labeled L1 to L7, connected by PROCESS, IMPORT and QUERY flows.]

DATA SOURCE SCHEMA
- T1: PERSON (PER_ID, PER_LNAME, PER_FNAME)
- T2: PERADD (PER_ID, ADD_ID)
- T3: ADDRESS (ADD_ID, PROPERTY_NO, ADD_1, ADD_2, ADD_CITY, ADD_STATE, ADD_ZIP)

OPTIONAL DATA LAKE SCHEMA
- Optional copy or conversion to another format, e.g., document
- Same T1: PERSON, T2: PERADD and T3: ADDRESS tables as the data source schema

INDEXES
- Typically do not store data, but, optionally, can
- Index schemas are usually the same as or very similar to data source schemas
- CONTENT INDEX, e.g., on T1:PERSON.PER_FNAME: each value (Larry, Curly, Moe) maps to a list of record numbers (Recno) such as WURN005, WURN245 and WURN912
- RECORD NO. index
- LINK INDEX

INDEXED VIEW
e.g., materialized aggregation; can be virtual and hierarchical

Recno | Last Name | First Name | No. Properties Owned
1 | Smith | Curly | 7
2 | Jones | Larry | 1
3 | Parker | Moe | 3

STANDARD DATA MODEL
Could be industry standard, e.g., ACORD, HL7 or NIEM
- PERSON (PER_ID, Last Name, First Name, Sex, DOB, SSN, Height, Weight, Eye Color)
- LICENSE (LIC_ID, License No., Class, Date Issued, Date Expires, Restrictions)
- ADDRESS (ADD_ID, Property No., St. No., St. Name, St. Type, Apt./PO/Ste. No., City, State, ZIP)
- VEHICLE (VEH_ID, VIN, Year, Manufacturer, Model, Color)
- VEH-REG-ADD (VEH_ID, ADD_ID)
- PER-REG-ADD (PER_ID, ADD_ID)
- PER-OWNS-VEH (PER_ID, ADD_ID)
- PER-OWNS-ADD (PER_ID, ADD_ID)
- PER-LIC (PER_ID, LIC_ID)
- PER-LIC-ADD (PER_ID, ADD_ID)

GRAPH DB

ALL VIRTUAL LOGICAL DATA VIEWS (what applications/end-users see)
- STANDARD DATA VIEW
- ADDITIONAL BUSINESS OBJECT(S), e.g., PERSON WHO OWNS PROPERTY (Last Name, First Name, Property No., St. No., St. Name, St. Type, Apt./PO/Ste. No., City, State, ZIP)
- DATA MART(S)

Source: #WhamTech Link Indexes and Ontologies
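A content index of the kind shown on this slide can be sketched as a mapping from column values to record numbers, with queries resolved in the index and only the resulting pointers followed back to the source. This is a minimal sketch, not the EIQ index format: the function names are hypothetical, and the pairing of names with `WURN…` record numbers is invented for illustration.

```python
from collections import defaultdict

# Stand-in data source keyed by record number (assumed sample rows).
SOURCE = {
    "WURN005": {"PER_FNAME": "Larry", "PER_LNAME": "Jones"},
    "WURN087": {"PER_FNAME": "Moe", "PER_LNAME": "Parker"},
    "WURN193": {"PER_FNAME": "Curly", "PER_LNAME": "Smith"},
    "WURN245": {"PER_FNAME": "Larry", "PER_LNAME": "Smith"},
}


def build_content_index(rows, column):
    """Map each distinct value of `column` to the record numbers holding it."""
    index = defaultdict(set)
    for recno, row in rows.items():
        index[row[column]].add(recno)
    return index


def resolve(indexes, predicates):
    """AND together equality predicates using only the indexes."""
    result = None
    for column, value in predicates.items():
        recnos = indexes[column].get(value, set())
        result = recnos if result is None else result & recnos
    return sorted(result or [])


def fetch(pointers):
    """Follow result-set pointers back to the source to retrieve the rows."""
    return [SOURCE[recno] for recno in pointers]
```

The index itself holds only values and record numbers; the full rows are read from the source once the result set is known, which is the behavior the slide describes as the typical case.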

SmartData Fabric® enables operational and analytical solutions, and bridges the gap between them

SDF bridges the gap between the enterprise, and reporting, BI, analytics and other apps

[Figure: SmartData Fabric® sits between the Enterprise (operational, transactional, other and external data sources) and reporting, BI, analytics and other apps, exchanging queries/results.]

Capabilities shown in SmartData Fabric®: Data Mapping; Data Aggregation; Access Security; Data Security; Data Quality; Data Links/Relationships; Data Masking, Tokenization and Encryption; Master Data; Data Governance

Uses shown: Integration (read only); Single customer view (read only); Interoperability (read and write); Data Provisioning; Other: Predictive rules/interactive CRM and BPM

Multiple entity centricities: Global, Individual, Regional, Local, Group, Population

SmartData Fabric® general solutions

Unique security-centric distributed indexed adapter-based data virtualization, data federation and data integration software for the following general solutions:

• Automated data discovery, profiling, quality, standardization, governance and relationships mapping

• Advanced data access and data security – seen as a security solution

• Virtual data warehouse and/or virtual data mart

• Data lake + data management + master data management = clean and usable data reservoir

• Data provisioning for highly curated, self-serve reporting, BI and analytics

• Interoperability with write-back to data sources – integrated data, not just app to app

• Seamless, automatic and near real-time updateable distributed master data management

• Virtual graph database and link analysis, and interactive graph/link visualization

• Hybrid Cloud 2.0 where data sources remain wherever they reside, but run all compute in the Cloud or data center

• Near real-time data source monitoring, event processing and Business Process Management (BPM)

• Embrace and enable STANDARDS such as ODBC, JDBC, REST APIs and SQL, and standard applications

➔ Discover, secure, access, integrate and deliver INTEGRATED structured, unstructured and semi-structured data from almost ANYWHERE to almost ANYWHERE in almost ANY FORMAT, aka Actionable Data Catalog

SmartData Fabric® basic solutions

SmartData Fabric® incremental solutions

Services-based combination for vertical market-specific solutions

[Figure: three services connected by REST APIs (labeled Support, Direct and Indirect): BPM Workflows/Automation (third-party), Actionable Data Catalog (WhamTech) and API Catalog (third-party).]

Basic architecture for services-based vertical market-specific solutions

[Figure: layered architecture. Layers shown:]

- Applications (many options)
- BPM Workflows/Automation
- API Catalog
- Actionable Data Catalog
- Index-based and conventional federated adapters
- Optional Data Lake – Distributed, Partial or Centralized
- Data Sources: Mainframes, Databases, Files, Logs, Office docs, Email, Big Data, Web docs, Social media, Cloud DB, Streaming, IoT, Applications

Basic SmartData Fabric® architecture

[Figure: layered architecture with ADMINISTRATION/CONFIGURATION and ACCESS SECURITY as vertical side labels. Layers shown:]

- Applications: virtual interactive link analysis and visualization; virtual reporting, BI and analytics; virtual MDM; virtual cybersecurity; virtual data security; virtual event processing; virtual network, data source and advanced data discovery
- Standard drivers, APIs, Web/data services, SQL and potential other query languages
- WhamTech SmartData Fabric® (SDF):
  - WhamTech EIQ Federation Servers
  - Independent structured and unstructured indexes, and indexed views, only for data that needs it
  - Data preprocessing (data is read but not usually stored, although it can be stored in indexes and indexed views)
  - WhamTech EIQ Adapters (indexed and conventional federated)
- Optional Data Lake – Centralized or Distributed
- Data Sources: Mainframes, Databases, Files, Logs, Office docs, Email, Big Data, Web docs, Social media, Cloud DB, Streaming, IoT, Applications

Enterprise conventional data life-cycle

[Figure: data moves through a Log/Transaction System, an Operational Data Store (ODS), a Data Warehouse (ETL and DW), and a Data Mart (DM)/Analytics Database/Link Analysis-Graph Database. Stages shown: Created and Stored; then Copied, Quality/improved, Stored and Indexed, repeated three times; then Related, Reported, Analyzed, Acted on and Discarded/retained. Callout: data copied multiple times.]

Big Data life-cycle similar to enterprise data life-cycle

[Figure: data moves through a Log/Transaction System, a Big Data Lake/Reservoir (similar to ODS), a Big Data Refinery (similar to ETL), and a Big Data/Analytics Database/Link Analysis/Graph Database. Stages shown: Created and Stored; Copied x 3, Quality/improved, Stored and Indexed; then Copied, Quality/improved, Stored and Indexed, repeated twice; then Related, Reported, Analyzed, Acted on and Discarded/retained. Callout: data copied multiple times.]

SDF eliminates most conventional data life-cycle stages

[Figure: the conventional life-cycle (Created and Stored; Copied, Quality/improved, Stored and Indexed, repeated three times; Related, Reported, Analyzed, Acted on, Discarded/retained) compared with the WhamTech SmartData Fabric® life-cycle (Created, Stored, Quality/improved, Indexed, Related, Master Data, Reported, Analyzed, Acted on, Discarded/retained).]

Capabilities in the SmartData Fabric® support applications

Data provisioning for Big Data and other analytics

[Figure: WhamTech SmartData Fabric® provisions data from a Log/Transaction System to a Big Data/Analytics Database/Link Analysis/Graph Database. Stages shown: Created, Stored, Quality/improved, Indexed, Related, Master data; then Copied, Quality/improved, Stored, Indexed, Reported, Analyzed, Acted on, Discarded/retained.]

Data mapping, quality, security, masking, tokenization, encryption and link mapping, and master data, addressed

• Assume data engineering role

• Eliminate up to 80% of time spent by expensive data scientists and analysts preparing data

• Tend towards real-time analytics and feedback to operational/transactional systems

The End

Appendix: Backup material

SmartData Fabric® for data virtualization/federation

[Figure: SDF processing steps. Steps shown:]

- Discovery
- Raw indexing and profiling (data/(pointers) discarded)
- Classification and categorization
- Access and data security
- Cleansing, transformation and standardization
- Masking, tokenization and encryption
- Production indexing (data discarded)
- Standard data view mapping
- Indexed views
- BI, analytics, CRM and BPM support
- Link mapping/indexing
- Master Data Management (MDM)
- Event processing [Event correlation] [Anomaly detection]
- High performance parallel query processing
- Integration (results read only)
- Interoperability (results read and write)
- Link Analysis/Graph Database

WhamTech key differentiators addressed upfront

SDF combines best and overcomes worst of alternatives

No. Feature SmartData Fabric® Data Warehouse Conventional Federated Data Search Data Lake

1 Query clean, transformed and standardized data ✓ ✓

2 Consistent and multiple indexes and types ✓ ✓ (✓)

3 Near real-time updateable pre-aggregated, pre-calculated and pre-joined views ✓ ✓

4 Results when data sources unavailable ✓ ✓ ✓ or ✓

5 Row, column and data element security ✓ ✓ (✓)

6 Data stays in original format ✓ ✓ ✓

7 Data remains in source ✓ ✓ ✓ or

8 User-level access to source data ✓ ✓

9 Latest data available ✓ ✓

10 Drill-down capability ✓ ✓ (✓)

SDF better than or as good as alternatives

No. Feature SmartData Fabric® Data Warehouse Conventional Federated Data Search Data Lake

11 Actively monitor data sources ✓ () ✓ or

12 Work with unstructured data/text analytics ✓ ✓ ✓

13 Unlimited query options and performance ✓ (✓)*

14 Data/entity relationship/link mapping ✓ ✓ or ✓ or

15 Write back to data sources ✓ ✓ or

16 Avoid schema transforms ✓ ✓ or ✓

17 Full text search ✓ ✓

*with data marts

SDF slightly disadvantaged compared to alternatives

No. Feature SmartData Fabric® Data Warehouse Conventional Federated Data Search Data Lake

18 No index or query load on data sources (✓) ✓ ✓ ✓

19 Data source owners not aware of queries (✓) ✓ ✓ ✓

20 Archive options (✓) ✓ () ✓

21 Good for application data sources (✓) ✓ ✓ ✓

22 Minimal additional system cost () ✓ ()

23 No need for data or index update process ✓

Example projects (1 of 2)

1. Optum/NAMM: Hybrid Cloud 2.0-type access to 100s and eventually 1000s of remote healthcare partner data sources using selective indexing to provide a single patient virtual logical view – data cannot be copied or moved

2. General Dynamics (GD): Tableau with Single Sign-On (SSO) enablement on multiple data sources, including PeopleSoft HR and SaaS, using both index-based and conventional federated adapters – seen as a data access and data security solution by GD

3. Northrop Grumman: Major DoD cyber program and platform – data cannot be copied or moved, and real-time access is needed – potential inclusion

4. Major healthcare payer: Matching unstructured contract content to structured claims data – involves ML-trained entity extraction – potential project

5. Major healthcare provider: Test data access to production data using data virtualization + access and data security + data masking – potential project

Example projects (2 of 2)

6. Past work with major DoD and intel government contractors – high performance and complex query processing, including up to 60 billion records/day on HBase

7. POC for single patient view for NHS Trust in the UK – 3 organizations, 7 data sources = 4 on premise and 2 in Cloud – Hybrid Cloud, using NHS MPI and own MPI, FHIR APIs, services and AWS

8. Message Bank for very large medical academic delivery system for HL7 and other messages – Cassandra target data source, real-time, parse and index, VMPI, FHIR APIs, and future support for reporting, BI and analytics, including Spark and ML

9. Bitcoin/Blockchain transaction reporting, BI and analytics for fraud detection – graph visualization

10. Virtual graph database, link analysis and graph visualization using simple SQL – OEM KeyLines visualization

Reasons why conventional data virtualization fails to deliver 100%

FOR EACH DATA SOURCE:

• DATA QUALITY ISSUES

- Data needing cleansing: typos, transpositions, missing or wrongly placed values, etc.

• DATA PRE-PROCESSING NEEDS

- Text analytics on unstructured data, e.g., entity extraction, and other analytics on structured data

• DATA STANDARDIZATION ISSUES

- Different formats, types, values, etc. – can impact range queries

• INDEXES and INDEXED VIEWS LIMITED OR NOT AVAILABLE

- Queries unable to execute, need full-table scans and/or perform poorly

• QUERY PROCESSING LIMITATIONS

- Capabilities

- Performance/scale

- Load

• RESULTS DATA INCORRECT OR INCOMPLETE

• ACCESS CONTROL and DATA SECURITY LIMITATIONS

- Assumes AD/LDAP-based IAM is in place; is SSO supported?

- Limited security levels, e.g., RBAC, ABAC/RLS and CLS

- May read/access incorrect, protected or sensitive data

SmartData Fabric® addresses issues with… (1 of 2)

DATA, by enabling:

1. Automated data discovery, profiling, identification, quality, standardization and governance that:

a. Can be acted on directly as part of SmartData Fabric® index and query processing layer vs. a one-time handoff to a metadata repository, data governance or ETL system

b. Updates complete metadata/data profiles as indexes are updated

2. Data to be cleansed, transformed, standardized, masked, tokenized and/or encrypted in indexes and indexed views, e.g., for personal, sensitive, MDM, other entity, “dirty” and incomplete data

3. Data/entity link/relationship mapping within and across multiple data sources for MDM, virtual graph database and other uses

4. Seamless, automatic and optionally distributed MDM with near real-time updates for integration within and across multiple data sources

5. Standard data views, business objects and knowledge graph across all data

6. Integration of unstructured data with structured data through text analytics, e.g., entity extraction and OCR, and search

7. Data monitoring, event processing and BPM workflows in near real-time, e.g., operational reporting, BI and analytics
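Item 2 mentions masking and tokenizing data in indexes. One common way to keep equality search working on protected values is a deterministic keyed token; the sketch below illustrates that general technique and is not a description of how SDF does it. The function names are hypothetical, the key handling is deliberately simplistic, and the 16-character truncation is only for readability.

```python
import hashlib
import hmac


def tokenize(value, key):
    """Deterministic keyed token: equal inputs give equal tokens, so an
    index built on tokens still supports equality lookups."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_ssn(ssn):
    """Mask all but the last four digits of a US Social Security number."""
    digits = [ch for ch in ssn if ch.isdigit()]
    return "***-**-" + "".join(digits[-4:])


def index_entry(ssn, key):
    """What an index might hold instead of the raw value (assumed layout)."""
    return {"token": tokenize(ssn, key), "display": mask_ssn(ssn)}
```

A query for a given SSN would tokenize the search term with the same key and match on `token`, while result sets show only `display`.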

SmartData Fabric® addresses issues with… (2 of 2)

DATA SOURCES, by enabling:

8. Standard drivers of ODBC, JDBC and APIs, and/or SQL query processing for incompatible formats, e.g., mainframe files, file systems, IoT devices, office docs, email, Web pages and other unstructured/semi-structured data sources

9. An external indexing and query processing layer that can absorb the load of external queries

DATA SOURCE ACCESS CONTROL, by enabling:

10. Advanced access control within and across domains, e.g., AD/LDAP, IAM, SSO, RBAC, ABAC/RLS and CLS, regardless of data source support for any of these – also applies to conventional federated adapters

DEPLOYMENT, by enabling:

11. Hybrid Cloud 2.0 where compute is in the Cloud or a data center, but data sources remain remote on-premise, in data centers, SaaS, third-parties, multi-Cloud, etc., in addition to Hybrid Cloud 1.0
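The access control point in item 10 amounts to applying role-based row-level (RLS) and column-level (CLS) rules in a layer outside the data source. The sketch below is an assumption-laden illustration: `Policy`, `POLICIES` and `apply_policy` are hypothetical, and roles would in practice come from an identity system such as AD/LDAP rather than a hard-coded dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    columns: set                                     # CLS: columns the role may see
    row_filter: dict = field(default_factory=dict)   # RLS: attribute -> allowed values


# Hypothetical role definitions for the sketch.
POLICIES = {
    "analyst": Policy(columns={"last_name", "state"}, row_filter={"state": {"TX"}}),
    "admin": Policy(columns={"last_name", "state", "ssn"}),
}


def apply_policy(role, rows):
    """Filter rows, then columns, according to the role's policy."""
    policy = POLICIES[role]
    visible = []
    for row in rows:
        allowed = all(
            row.get(attr) in values for attr, values in policy.row_filter.items()
        )
        if allowed:
            visible.append({c: v for c, v in row.items() if c in policy.columns})
    return visible
```

Because the rules are enforced on result sets rather than in the source, the same policy applies whether the rows came from an index or from a conventional federated adapter.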

Conventional vs. SDF adapters costs and ROI comparison

Attribute | Conventional Federated Data Access Adapters | WhamTech SDF Adapters
Costs – TCO | Up to 1000% of WhamTech | 100%
ROI, assuming TCO as basis, and revenue gains and cost savings | 0 – 10 | 10 – 100
Capabilities | Basic | Advanced – more capabilities for less cost
Perpetual License Costs – CAPEX | IBM and others > 200% of WhamTech; some freeware and Red Hat < WhamTech | 100%, starting at $10K per data source
Lease/SaaS Costs | Assume 40% of perpetual license costs per year, including maintenance and support | 40% of perpetual license costs per year, including maintenance and support
Implementation Costs | Up to 500% of WhamTech, long duration to implement | 100%, relatively simple to implement = low costs and short duration
Maintenance and Support Costs | 18% of perpetual license costs – included in lease/SaaS costs | 18% of perpetual license costs – included in lease/SaaS costs
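The lease and maintenance percentages above imply a break-even point between leasing and buying. The worked example below is a rough sketch that assumes the 18% maintenance charge is annual (the slide does not say so explicitly) and ignores implementation costs, discounting and price changes; the function names are hypothetical.

```python
def cumulative_costs(license_cost, years, lease_rate=0.40, maintenance_rate=0.18):
    """Return (lease_total, perpetual_total) after `years`.

    Lease: lease_rate of the perpetual license cost per year, maintenance included.
    Perpetual: license cost up front plus maintenance_rate per year (assumed annual).
    """
    lease_total = lease_rate * license_cost * years
    perpetual_total = license_cost + maintenance_rate * license_cost * years
    return lease_total, perpetual_total


def break_even_years(lease_rate=0.40, maintenance_rate=0.18):
    """Years after which cumulative lease cost exceeds perpetual cost."""
    return 1 / (lease_rate - maintenance_rate)
```

With the slide's starting price of $10K per data source, five years of leasing comes to $20,000 against $19,000 for a perpetual license with maintenance, and the two options cross at roughly four and a half years.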

Simplified* layer capabilities

[Figure: stacked capability layers above the data sources, with AUTOMATION as a vertical side label. Layer groups labeled: pre-index, single record processing; post-index, standard data view, multi-record processing; built-in advanced capabilities; built-in support for common applications.]

Layers shown:
- Discovery, profiling and correlation – device, source and data
- Security – classification, data masking, tokenization and encryption
- Analytics – parsing, categorization, entity extraction and other analytics
- Quality – cleansing, transformation and standardization
- Indexing – content, security, extracted entities, indexed views, unstructured and Link Indexes™
- Standard data view mapping – more than one possible
- Master data management
- Event processing; Event correlation; [Anomaly detection]
- RBAC and data loss prevention
- Link analysis/graph database
- Visualization
- Big Data/analytics data provisioning
- Support for BI/analytics
- Support for CRM
- Support for BPM/decision support
- Support for interoperability
- Standard drivers, APIs, Web/data services, SQL and other query languages
- Optional Data Lake – Centralized or Distributed
- Data Sources: Mainframes, Databases, Files, Logs, Office docs, Applications, Email, Web docs, Social media, Big Data, Streaming, Cloud DB, IoT

*Detailed layer diagram in Appendix on slide 75

SDF EIQ Adapter index and query process

[Figure: EIQ Adapter index and query process. Phases shown: DISCOVERY, INITIAL INDEX BUILD, CONTINUOUS INDEX UPDATE, QUERY PROCESSING and RESULTS RETRIEVAL.]

Components shown:
- BI / Analytics / Application(s), connecting through a STANDARD DRIVER with SQL
- EIQ Product front-end
- EIQ Federation Servers
- EIQ Adapter: Application to Standard Data View Mapping; Standard Data View Mapping to EIQ Indexes; Automatic Query Processing; Data source-specific Query Transform; Data Retrieval
- EIQ Indexes
- Data Discovery and Data Profiler
- Read Transform Index (RTI) Tool and Data Transforms/clean-ups (DEVELOP and TEST, USED BY, BUILD)
- Update Server, fed by Transaction Log or MESSAGE QUEUE
- Data Source, accessed by CONVENTIONAL DRIVER OR BULK LOAD and by USER API / DRIVER
- Result-set data source pointers
- Other data source EIQ Adapters and EIQ Federation Servers

Page 63: aka Distributed Index-based and Conventional Data ... · Conventional Data Virtualization Basic Overview May 2020 ... Unique security-centric distributed indexed adapter-based data

SmartData Fabric® security-centric distributed virtual data, master data and graph data management, and analytics

SmartData Fabric® Keyword Descriptions (1 of 2)

DISTRIBUTED aka FEDERATED (not centralized)

VIRTUAL (leaves data where it resides – uses federated adapters and/or stores data in indexes as needed)

SECURITY (data and access to it)

DATA MANAGEMENT (data discovery, classification, security, processing/analytics, cleansing, transformation, standardization, mapping to a standard data view, linking/matching within and across data sources, indexing and query processing)

MASTER DATA MANAGEMENT (hybrid [limited repository + full registry], near real-time, distributed, with seamless and automatic integration with data access)

ANALYTICS (built-in analytics, externally run analytics against the data, and provisioning of highly curated data for analytics)

INDEPENDENT (of where data resides, and of associated systems and configurations)

INTEROPERABLE (can write back to data sources)

SmartData Fabric® Keyword Descriptions (2 of 2)

INDEXES SYNCED TO DATA SOURCES IN REAL-TIME TO BATCH (twelve change data capture options)

VIRTUAL GRAPH DATABASE (semantic model)

VIRTUAL LINK ANALYSIS (find connections between entities, n degrees of separation)

GRAPH/LINK VISUALIZATION (highly interactive thin client, OEM tool)

RUNS IN CLOUD, ON PREMISE, IN DATA CENTERS OR AS HYBRID (including Hybrid Cloud 2.0)

EVENT PROCESSING (as indexes and index views are being updated)

ULTIMATE METADATA MANAGEMENT (complete on all data)

BUILDS AND SUPPORTS DATA GOVERNANCE (bottom-up/edge-in)
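As an illustration of the link analysis keyword above (finding connections between entities within n degrees of separation), the following is a minimal sketch using a breadth-first search. It is not the product's algorithm. The `LINKS` adjacency map, the entity naming scheme and the function `degrees_of_separation` are hypothetical, and the links are assumed to have been derived already, for example from shared phone numbers or addresses.

```python
from collections import deque


def degrees_of_separation(links, start, target, max_degrees):
    """Return the number of hops between two entities, or None if there is no path within max_degrees."""
    if start == target:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        entity, depth = frontier.popleft()
        if depth == max_degrees:
            continue  # do not expand beyond the requested number of degrees
        for neighbor in links.get(entity, ()):
            if neighbor == target:
                return depth + 1
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return None


# Hypothetical links between entities and the attributes that connect them.
LINKS = {
    "person:ann": ["phone:555-0100", "addr:12 Elm St"],
    "phone:555-0100": ["person:ann", "person:bob"],
    "addr:12 Elm St": ["person:ann"],
    "person:bob": ["phone:555-0100", "account:778"],
    "account:778": ["person:bob"],
}
```

Here two people who share a phone number are two degrees apart, because the shared attribute is itself a node in the graph.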

Advanced data management

• Combine the best and overcome the worst of the conventional approaches of data warehousing, federated data access and enterprise search through index-based federated adapters, creating a Hybrid SmartData Fabric/Lake without needing to copy or move data from source systems, although that is an option
• Access adapters and federation servers at any level and from any location with advanced access control
• Include all data sources – from mainframes to IoT devices, on premise, Cloud, Hybrid Cloud, external, etc.
• Use multiple types of indexes and indexed views that are distributed, 100% contiguous across data sources, columnar and file-based, and that contain pointers to source data or the data itself
• Federate/distribute data governance, built and maintained from the bottom up as systems are discovered, read and indexed, and as metadata is captured – combine with advanced access security and data security – can obtain (and store) a complete centralized data governance view at any time, and intervene and impose as needed
• Federate/distribute metadata repository, data discovery, classification, security, quality, transforms, relationships and mapping to standard data views
• Seamless, automatic and near real-time updateable integration of master data management to enable single customer/patient and other entity views across the extended organization – can also federate/distribute master data

Advanced application support

• Combine queries on structured, semi-structured and unstructured data
• Accelerate query processing on existing systems, with almost no load on data sources
• High performance parallel/edge query processing
• Enable direct access to data sources through indexes and/or use indexes to represent and use data logically as (1) objects, (2) relational, (3) hierarchical and (4) NoSQL/Big Table
• Built-in virtual graph database, link analysis and graph visualization – use simple SQL
• Event processing – monitor changes to data sources through indexes and indexed views, trigger workflows, and update applications and visualizations, e.g., operational dashboards and graphs
• True interoperability based on single customer/patient views with both read and write-back to data sources – the goal is to have almost any application working with almost any data source(s)
• Provision highly curated data to Big Data/analytics in near real-time
• Bridge the gap between enterprise operational/transactional systems and reporting, BI, analytics and other applications – tend towards closing the loop in near real-time
• Partnered with Tableau (reporting, BI and analytics tool) and Cambridge Intelligence (KeyLines highly interactive graph/link visualization tool)

Advanced data source access and data security

• Leverage centrally managed AD/LDAP, IAM, SSO with Kerberos, RBAC, ABAC/RLS and CLS
– Data source stewards can have ultimate veto
• Support advanced access security and data security for all data sources, regardless of what the data source system itself supports
• Cross/multi-domain support (a major hurdle for most solutions)
• Be a data security gatekeeper for data sources
– Follow Forrester Zero Trust Data Security Model = Discover, INDEX, classify and secure
• All results data traceable to source records
• Dynamic data masking, tokenization and encryption (third-party Format-Preserving Encryption (FPE))
• Data governance from the bottom-up and/or can support a top-down tool
• Full auditability
• Support for third-party User Behavior Analytics (UBA)
– Helps alleviate/prevent insider data thefts (25%) and external origin (hacks) data thefts (75%)
– Leverage user logs, including queries made
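The combination of row-level security (RLS), column-level security (CLS) and dynamic data masking listed above can be sketched as a filter applied to results before they reach the user. This is an illustration under stated assumptions, not the product's access model: the `PROFILES` dictionary, the role names, the `mask` rule (keep the last four characters) and the function `apply_profile` are all hypothetical.

```python
def mask(value):
    """Mask all but the last four characters of a value."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


# Hypothetical access profiles: visible columns, masked columns and a row filter.
PROFILES = {
    "clinician": {
        "columns": {"name", "ssn", "region"},
        "masked": set(),
        "row_filter": lambda row: True,
    },
    "analyst": {
        "columns": {"name", "ssn", "region"},
        "masked": {"ssn"},
        "row_filter": lambda row: row["region"] == "TX",
    },
    "billing": {
        "columns": {"name", "region"},
        "masked": set(),
        "row_filter": lambda row: True,
    },
}


def apply_profile(rows, role):
    """Apply row-level filtering, column-level filtering and masking for a role."""
    profile = PROFILES[role]
    result = []
    for row in rows:
        if not profile["row_filter"](row):
            continue  # row-level security: the role may not see this row
        visible = {}
        for column, value in row.items():
            if column not in profile["columns"]:
                continue  # column-level security: the role may not see this column
            visible[column] = mask(value) if column in profile["masked"] else value
        result.append(visible)
    return result


ROWS = [
    {"name": "Ann Lee", "ssn": "123-45-6789", "region": "TX"},
    {"name": "Bob Ray", "ssn": "987-65-4321", "region": "CA"},
]
```

Because the masking is applied at query time, the same stored value can be returned in clear, masked or not at all, depending on who is asking.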

Other advanced capabilities

NEAR REAL-TIME ARCHITECTURE

• Edge process data
• Enable a near real-time data/event-driven architecture
• Build new workflows using BPM software on top of legacy systems to support operations, CRM, smartphone apps, IoT devices, reporting, BI and analytics

STANDARDS

• Standard drivers, APIs, Web/data services, REST APIs, ANSI SQL and other query languages with conversion, Cloud, VMs, PMs, Windows, Linux and soon-to-be IBM Power Systems
• Standard data models/views, e.g., HL7 and FHIR for healthcare, NIEM for government and other areas, XBRL and others for financial services, ACORD for insurance or organization’s own

SmartData Fabric® example Bluemix deployment

• Multiple access methods
• Multiple query language options
• Multiple ways to represent data
• Standard data view, e.g., FHIR APIs and NIEM
• Cloud platform-based data services
• New BPM workflows running on legacy data sources
• Write-back to data sources
• VMPI-governed data access
• Multiple legacy data sources
• Data sources could be in multiple organizations
• Data sources could be on premise and in the Cloud – Hybrid Cloud access

Multi-level value contribution

Provide basic indexed virtual DATA discovery, access, integration, sharing and interoperability
• Access almost any data source
• Work with structured and unstructured data
• Improve data quality
• Build structured and unstructured indexes
• Integrate multiple data sources
• Leave data in sources
• Scale with distributed parallel processing
• Almost no load on data sources
• Map to a virtual standard data view
• Update in near real-time
• Data discovery
• Data profiling

Convert data to value-added INFORMATION
• Advanced text search
• Entity extraction
• Indexed views
• Link Indexes™/mapping
• Real-time alerts
• Entity resolution
• CEP
• Categorization

Convert information to KNOWLEDGE
• Decision support
• BPM
• CRM/MDM
• BI/analytics
• EHR/HIE
• Visualization
• Ontology representation
• Link analysis
• Social analytics

Convert knowledge to SUCCESS OUTCOMES
• Tend to real-time
• Improve compliance
• Gain customers
• Improve customer experience
• Upsell and cross-sell customers
• Reduce costs
• Increase revenue
• Increase profit
• Improve reporting
• Reduce waste
• Reduce liability

SmartData Fabric® is location agnostic

SmartData Fabric® is configuration agnostic

Data-driven bottom-up vs top-down approach

For each data source:

Application/Middleware

External Query in SQL

Data Source

Data Source Driver/API/Web Service

Data Quality/Parser/Entity Extraction/Other

WhamTech’s Automatic Query Processor

WhamTech’s Mapping Layer

WhamTech’s Standard Drivers/APIs/Web Services

WhamTech Link Indexes™

Transaction Log Reader, MQ or similar

EIQ SuperAdapter™

WhamTech Content Indexes

Data discovery and profiling

Detailed SmartData Fabric® layer capabilities

Diagram labels:
• Applications: Living Networks™; Real-time Business Intelligence; Distributed Analytics; Virtual Graph Database; Complex Event Processing; Link Analysis; Social Network Analysis; Enterprise and Web Search; CDI-MDM/Single Entity View; Other Applications
• SmartData Fabric® query side: Standard drivers, APIs, Web/data services, SQL and other query languages; Real-time Monitoring and Event Processing; EIQ SuperAdapter™; Automatic Query Processing; Data Security Layer – Query Side; Semantic Mapping to Standard Data View(s)
• Indexes: Structured Indexes; Text Indexes; Extracted Entity Indexes; Fuzzy Match Indexes; Pre-aggregated Indexes; Pre-calculated Indexes; Embedded Value Indexes; Join Indexes; Link Indexes™; De-normalized Indexes; Master Data Indexes
• SmartData Fabric® data side: Data Security Layer – Data Side; Data Cleansing, Transformation, Standardization, Masking, Tokenization and Encryption; Entity Extraction/NLP/Categorization/Other Text Processing; Metadata Discovery/Data Profiling; Device, Source and Data Discovery; Changed Data Capture; Intelligent Spider™
• Supporting functions: Administration and Configuration Tools; Security and Privacy Access Controls; Metadata Management and Repository, incl. Data Governance; Master Data Management
• Drivers: Standard, Proprietary and Web Service Drivers; Application Drivers; Changed Data Capture
• Data Sources: Relational Databases; Enterprise documents and email; Mainframe data; Spidered files from Web and other sources; Web Services; Applications; Files; Network Assets and Devices

SmartData Fabric® impact diagram

Diagram labels:
• Discovery: Device/Host; Source; Data Profiling; Intra and Inter-Source Data Correlation
• Index Data Preparation and Results Data Transformation: Entity Extraction; Transform Development and Testing; Transform; Masking, Tokenization and Encryption; Structured; Unstructured Parser; Semantic Identification; Categorization; Security Classification
• Content Indexes: Structured Indexes (most data discarded); Unstructured Indexes (most data discarded); Security Classification
• Link Indexes: Link Indexes™; Link Analytics and Visualization
• Standard Data View: Standard Data View (indexes semantically mapped); Distributed Metadata Repository; Distributed Metadata Indexes
• Master Data and Indexes: Master Data; Master Data Indexes
• Results: Results Data; Results Data Pointers

SmartData Fabric® impact diagram for query submission

Diagram labels (with step markers 1 to 3):
• APPLICATION
• Discovery
• Index Data Preparation and Results Data Transformation
• Data: Structured; Unstructured
• Content Indexes: Structured Indexes (most data discarded); Unstructured Indexes (most data discarded)
• Link Indexes: Link Indexes™; Link Analytics and Visualization
• Standard Data View: Standard Data View (indexes semantically mapped); Distributed Metadata Repository; Distributed Metadata Indexes
• Master Data and Indexes: Master Data; Master Data Indexes
• Results

SmartData Fabric® impact diagram for results retrieval

Diagram labels (with step markers 4 to 9):
• APPLICATION
• Discovery
• Index Data Preparation and Results Data Transformation: Transform; Masking, Tokenization and Encryption
• Data: Structured; Unstructured
• Content Indexes: Structured Indexes (most data discarded); Unstructured Indexes (most data discarded)
• Link Indexes: Link Indexes™; Link Analytics and Visualization
• Standard Data View: Standard Data View (indexes semantically mapped); Distributed Metadata Repository; Distributed Metadata Indexes
• Master Data and Indexes: Master Data; Master Data Indexes
• Results: Results Data; Results Data Pointers

SmartData Fabric® processes (1 of 5)

Automate (using BPM) as much as possible:

• Deploy on AWS, Azure, IBM Bluemix, OpenStack, VMs, physical servers – other cloud options available
• Instantiate an EIQ System Administration and Configuration Tool
• Instantiate a distributed network asset/device, data source and metadata repository
• Network asset/device discovery
• Data source discovery
− Using network asset/device discovery tool
− Using spiders for eDiscovery-type documents, files, email, etc.
• Instantiate EIQ Adapters™ on demand

SmartData Fabric® processes (2 of 5)

• Data discovery

− Optionally, with raw Link Indexes™ (internal and external pre-joins)

• Data identification

• DSL: Data risk classification

• CS: Event correlation

• Data profiling for data transforms for typos, transpositions and non-standard data, e.g., name, address, phone and email correction
− Lookup dictionaries and thesauri, USPS or other address correction, regular expressions, APIs, DLLs, transformation server, etc.
− DSL: Masking, tokenization or encryption for indexed data or dynamically depending on access controls

DSL = Data Security Layer

CS = Cybersecurity
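The kind of data transform mentioned above (regular expressions and lookup dictionaries for phone, email and address correction) can be sketched as follows. This is a minimal illustration, not the product's transform library: the function names, the accepted phone format (10-digit US numbers), the simplified email pattern and the `STREET_ABBREVIATIONS` dictionary are all assumptions.

```python
import re


def standardize_phone(raw):
    """Return a 10-digit US number as NNN-NNN-NNNN, or None if it cannot be standardized."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


# Deliberately simple pattern: something@something.something, no whitespace.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def standardize_email(raw):
    """Trim and lower-case an email address, or return None if it looks malformed."""
    email = raw.strip().lower()
    return email if EMAIL_PATTERN.match(email) else None


# Hypothetical lookup dictionary for street-type abbreviations.
STREET_ABBREVIATIONS = {"st": "Street", "ave": "Avenue", "rd": "Road"}


def standardize_street(raw):
    """Expand known street-type abbreviations using the lookup dictionary."""
    words = raw.strip().split()
    return " ".join(STREET_ABBREVIATIONS.get(w.lower().rstrip("."), w) for w in words)
```

Applying such transforms before indexing means that "(214) 555-0100" and "+1 214.555.0100" end up as the same indexed value, which is what later matching and linking steps rely on.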

SmartData Fabric® processes (3 of 5)

• Multiple indexes and types, e.g., basic content, DSL: security (classification), aggregations, calculations, fuzzy, text, extracted entities and Link Indexes
• DSL: Can encrypt entire disk volumes, individual indexes or entire sets of indexes
• MDM: Data source-specific tables containing unique indexed primary entity IDs, and master data, links and date-time
− Create using Link Index process, with multi-attribute fuzzy match for composite scoring and master data rules
• DSL: WhamTech Security and Privacy Access Profiles (SPAPs) or other Role-Based Access Control
− Current: Source organization, user, role, application, target organization and data source profiles available
− Future: Extend for application processes

MDM = Master Data Management
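The multi-attribute fuzzy match with composite scoring mentioned above can be illustrated with a short sketch. It is not WhamTech's Link Index process: the attribute weights, the match threshold and the function names are hypothetical, and Python's standard `difflib.SequenceMatcher` stands in for whatever fuzzy matching the product actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical weights per attribute and a hypothetical match threshold.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}
MATCH_THRESHOLD = 0.85


def similarity(a, b):
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def composite_score(rec_a, rec_b):
    """Combine per-attribute similarities into one weighted score."""
    return sum(w * similarity(rec_a[attr], rec_b[attr]) for attr, w in WEIGHTS.items())


def is_same_entity(rec_a, rec_b):
    """Apply a simple master data rule: link records scoring at or above the threshold."""
    return composite_score(rec_a, rec_b) >= MATCH_THRESHOLD


crm_record = {"name": "Jon Smith", "address": "12 Elm Street", "phone": "214-555-0100"}
billing_record = {"name": "John Smith", "address": "12 Elm St", "phone": "214-555-0100"}
other_record = {"name": "Maria Gonzalez", "address": "900 Oak Avenue", "phone": "512-555-0199"}
```

Scoring several attributes together lets a typo in one field be outweighed by agreement in the others, so `crm_record` and `billing_record` are linked while `other_record` is not.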

SmartData Fabric® processes (4 of 5)

• Hierarchies honored through joins and/or Link Indexes
− Inferred ontologies
− Reasons for hierarchies change depending on application, e.g., one vendor has multiple products and one product can come from multiple vendors
• MDM: Versioning with access to historic master data
• Combine with other data sources, tending towards EDW/enterprise solutions
• MDM: Pure registry option to replace either data source indexes or source data itself (automatically updates indexes) with master data
− Pure registry-based master data table is possible, but it limits options, lowers performance and is more complex
• Execute analytics, combined with other data and search/query filters, e.g., reporting, BI and link analysis/graph database
− Include aggregations, calculations, master data (if available) and other data, e.g., external

SmartData Fabric® processes (5 of 5)

• Write back selective updates/corrections to data sources with possible inverse data transforms (MDM: See previous slide)
• Continuously monitor metadata (index tree profiles) using stored procedures with triggers
− Helps identify anomalies/outliers
• Event processing enabled (federated solution for Oracle® Event Processing)
− Open source and commercial BPM software for non-Oracle solutions
• Interoperability query transformation to avoid rewriting applications
− Goal to enable almost any application(s) to work with almost any data source(s)
• Mainframe data source option – files and live systems
• Hadoop (HBase/Hive and HDFS levels) and Cassandra options

SmartData Fabric® data processes

• Device/host discovery
• Data source discovery
• Data discovery and profiling
• Data security classification
• Data quality/transformation/masking/tokenization/encryption
• Data and link indexing
• Master data management
• Query processing
• Results retrieval (Results data quality/transformation)
• Results: Reported, Analyzed, Acted on, Discarded/retained

Note: Data not copied or moved – only results retrieved

Solutions to multiple problems in one platform (1 of 2)

Applications
• Virtual interactive link analysis and visualization
• Virtual reporting, BI and analytics

Optional Modules
• Virtual MDM
• Virtual cybersecurity
• Virtual data security
• Virtual event processing
• Virtual network, data source and advanced data discovery

Basic Product
• Distributed Data Virtualization (and Federation, Integration and Interoperability) Platform, aka SmartData Fabric®

Solutions to multiple problems in one platform (2 of 2)

1. SMARTDATA FABRIC® (SDF) for basic data discovery, profiling, quality, mapping, indexing, virtualization, federation, integration and interoperability as basis for capability support modules and applications
2. EXTEND SDF WITH AUTOMATED NETWORK, DATA SOURCE AND ADVANCED DATA DISCOVERY including relationships and eventual automated mapping to a standard data view
3. EXTEND SDF WITH EVENT PROCESSING to keep track of significant changes occurring in data
4. EXTEND SDF WITH DISTRIBUTED (preferably HYBRID) MDM to seamlessly combine with operational/transactional data and maintain in near real-time
5. EXTEND SDF WITH IMPROVED CYBERSECURITY through indexed federated log and other data source access, including automated anomaly detection and automated event correlation
6. EXTEND SDF WITH VIRTUAL DATA SECURITY LAYER to defend and protect data of value (i) as it is created, (ii) at rest in the source, (iii) in transit, (iv) at the recipient and (v) after no longer needed
7. EXTEND SDF WITH BI/ANALYTICS oriented virtual and materialized real-time updateable hierarchical indexed views, text analytics including entity extraction, and locally executed algorithms
8. EXTEND SDF WITH LINK ANALYSIS (and OEM LINK VISUALIZATION) for link analysis/graph database for almost any type of analytics, including virtual MDM (master patient index), cybersecurity and data security

Discovery/raw index process

• Read: File, Record
• Parse: Header, 3rd party
• Index: Content
• Profile: Value distribution, Metadata
• Auto identify: Personal data, Other data
• Auto map to standard data view
• Auto transform
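The profile and auto identify steps of this discovery process can be sketched as below. This is an illustration only: the regular expressions in `PATTERNS`, the 80% `min_share` threshold and the functions `profile_column` and `identify_column` are hypothetical and far simpler than a production classifier would need to be.

```python
import re
from collections import Counter

# Hypothetical patterns for a few kinds of personal data.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_phone": re.compile(r"^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def profile_column(values):
    """Profile one column: row count, null count, distinct count and top values."""
    non_null = [v for v in values if v not in (None, "")]
    distribution = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(distribution),
        "top_values": distribution.most_common(3),
    }


def identify_column(values, min_share=0.8):
    """Guess a personal-data type when most non-null values match one pattern."""
    non_null = [str(v) for v in values if v not in (None, "")]
    if not non_null:
        return "other"
    for label, pattern in PATTERNS.items():
        matches = sum(1 for v in non_null if pattern.match(v))
        if matches / len(non_null) >= min_share:
            return label
    return "other"


emails = ["ann@example.com", "bob@example.org", "", "cy@example.net"]
phones = ["214-555-0100", "(512) 555-0199", "713.555.0142"]
cities = ["Dallas", "Austin", "Dallas", None]
```

The value distribution from `profile_column` and the label from `identify_column` are the kind of metadata that a later step could use to map a column to a standard data view.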

Production index process

• Read: File, Record
• Parse: Header, 3rd party
• Process: Entity extraction, Analyze
• Transform: Cleanse, Data type, Standardize, Secure
• Index: Content, Link, Master data