New Innovations in Information Management for Big Data - Smarter Business 2013
DESCRIPTION
Big data has changed the IT landscape. Learn how your existing Information Integration & Governance (IIG) investment, combined with our latest innovations in integration and governance, is a springboard to success with big data use cases that unlock valuable new insights. Presenter: David Corrigan, Big Data Specialist, IBM
TRANSCRIPT
New Innovations in Information Integration & Governance (IIG) for Big Data
David Corrigan, Director of Product Marketing, InfoSphere
Data Confidence Is Essential
If you want to find new insights from big data . . .
and ACT on those insights . . .
you need confidence in the data used for insight
Information Integration & Governance (IIG)
• Make decisions with greater certainty
• Analyze rapidly while providing necessary controls
• Increase the value of data
Building Big Data Confidence Is Essential
• Outperform Competitors (3x): organizations with IIG outperform their competitors
• Transform the Front Office Experience (77%): organizations rated their decision making as good or excellent
• Establish Trusted Information (80%): organizations establish a high or very high level of trust in data
IIG Evolves for the Era of Big Data
1. Automated Integration: How do I get access to new big data sources?
Business users need rapid data provisioning among the zones.
2. Visual Context: How do I digest all of this new information?
Categorize, index, and find big data to optimize its usage.
3. Agile Governance: How do I manage all of this new data?
Ensure appropriate actions based on the value of the data.
Six Innovations that Build Big Data Confidence
Automated Integration
• Big Match: integration of master records from big data with probabilistic matching powered by Hadoop
• Data Click: self-service data provisioning for big data repositories
Visual Context
• Big Data Catalogue: categorize metadata on all big data sources
• Information Governance Dashboard: visual context to give immediate status on governance policies
Agile Governance
• MDM for Big Data: rapid mastering of new big data sources and extension of the 360° view with unstructured big data
• Big Data Privacy & Security: monitor and mask sensitive big data in Hadoop, NoSQL, and relational systems*
* Statement of Direction
InfoSphere Data Click: Self-service Data Provisioning
Innovation
• Two-click data provisioning designed for business users
• Integration of more big data sources: JSON, NoSQL, Hadoop, JDBC
Value
• Rapid provisioning of ad-hoc repositories
• Faster time to insight
• Self-service to eliminate the IT bottleneck
Usage
• Enables rapid analysis of big data sources
Data provisioning in 1/5000th the time of the traditional approach
Automated Integration
2-click data access
* Source: IBM performance lab testing, showing JDBC inserts at 5.8% to 74% faster
Big Match: Find & Integrate Master Data in Big Data Sources
[Diagram: the Big Match engine matching millions of records between MDM and BigInsights]
Automated Integration
How It Works
• Probabilistic matching on a big data platform (BigInsights/Hadoop)
• Matching at a higher volume
• Matching of a wider variety of data sets
Client Value
• Find master data within big data sources
• Get an answer faster: enable real-time matching at big data volumes
Usage
• Provides more context by detecting master entities faster
* Source: IBM InfoSphere performance team test results
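The slides do not say which probabilistic model Big Match uses. As a hedged illustration of what "probabilistic matching" generally means, here is a minimal Fellegi-Sunter-style sketch in plain Python, with no Hadoop involved. The field list, the m/u probabilities, and the threshold of 10.0 are all hypothetical values chosen for the example, not product settings.

```python
import math

# Hypothetical per-field probabilities, chosen for illustration only:
#   m = P(field values agree | the two records describe the same entity)
#   u = P(field values agree | the two records describe different entities)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "birth_date": (0.90, 0.005),
    "postcode": (0.85, 0.05),
}


def normalize(value: str) -> str:
    """Crude standardization: lowercase and keep only letters and digits."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def match_weight(a: dict, b: dict) -> float:
    """Sum log2 likelihood ratios over the compared fields (Fellegi-Sunter style)."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        va, vb = normalize(a.get(field, "")), normalize(b.get(field, ""))
        if not va or not vb:
            continue  # a missing value is evidence neither for nor against a match
        if va == vb:
            total += math.log2(m / u)  # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def is_match(a: dict, b: dict, threshold: float = 10.0) -> bool:
    """Hypothetical decision rule: declare a match above a tuned weight threshold."""
    return match_weight(a, b) >= threshold


crm = {"name": "Jane O'Neil", "birth_date": "1980-02-14", "postcode": "M5V 2T6"}
social = {"name": "jane oneil", "birth_date": "1980-02-14", "postcode": "M5V2T6"}
other = {"name": "John Smith", "birth_date": "1975-07-01", "postcode": "M5V 2T6"}

print(is_match(crm, social))  # all three fields agree after normalization
print(is_match(crm, other))   # only the postcode agrees
```

At "millions of records," comparing every pair is infeasible. Matching systems typically restrict comparisons to candidate pairs that share a blocking key (for example, postcode) and spread those comparisons across a cluster. That is the kind of work a Hadoop platform would parallelize.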
Big Data Catalogue: Find Big Data More Easily
Visual Context
170x improvement in metadata import performance*
Innovation
• Stores metadata on every available big data source
• Provides structure to the Hadoop landing zone so data may be easily found and leveraged
• Classifies data (origin, lineage, source, value, …)
Value
• Find data more easily within a growing Hadoop landing zone and a complex zone architecture
• Rapidly leverage new big data sources
Usage
• Enables optimal usage of big data
* Source: IBM internal performance results, where three test runs with the latest version averaged 11.46 seconds vs. 1,964 seconds with the previous release (1,964 / 11.46 ≈ 171, hence the roughly 170x figure)
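To make "stores metadata on every source and classifies it by origin, lineage, source, and value" concrete, here is a minimal in-memory sketch. The `CatalogEntry` schema, the `BigDataCatalog` class, and the example paths are hypothetical illustrations, not the InfoSphere catalogue's API or data model.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one data set in the Hadoop landing zone (hypothetical schema)."""
    path: str     # location in the landing zone
    origin: str   # system or process that produced the data
    source: str   # source format or type, e.g. "JSON"
    value: str    # business value classification
    lineage: list = field(default_factory=list)  # paths this data set was derived from


class BigDataCatalog:
    """In-memory stand-in for a metadata repository keyed by path."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.path] = entry

    def find(self, **criteria: str) -> list:
        """Return entries whose attributes equal every given criterion."""
        return [e for e in self._entries.values()
                if all(getattr(e, key) == want for key, want in criteria.items())]

    def upstream(self, path: str) -> list:
        """Follow lineage transitively and list every upstream path once."""
        seen = []
        stack = list(self._entries[path].lineage)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.append(current)
            if current in self._entries:
                stack.extend(self._entries[current].lineage)
        return seen


catalog = BigDataCatalog()
catalog.register(CatalogEntry("/landing/raw/clickstream", "web servers", "JSON", "low"))
catalog.register(CatalogEntry("/landing/curated/sessions", "sessionization job", "CSV", "high",
                              lineage=["/landing/raw/clickstream"]))
catalog.register(CatalogEntry("/landing/marts/daily_visits", "aggregation job", "CSV", "high",
                              lineage=["/landing/curated/sessions"]))

print([e.path for e in catalog.find(value="high")])
print(catalog.upstream("/landing/marts/daily_visits"))
```

Walking lineage transitively is what lets a user who finds a curated data set trace it back to the raw landing-zone files it came from.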
Information Governance Dashboard: Visualize and Control Governance
Visual Context
Innovation
• Measurements for policies and KPIs
• Rapid creation of tailored dashboards
Value
• Immediate insight into governance policy status
• Interception of issues when they start, right at the source
Usage
• Raises data confidence with visual governance status
1000s of data points and policies visualized
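One way to picture "measurements for policies and KPIs" is a set of record-level checks, each rolled up into a compliance ratio and compared against a target. The sketch below is a hypothetical illustration: the two policies, their targets, and the OK/ALERT statuses are invented for the example and are not taken from the dashboard product.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Policy:
    name: str
    check: Callable[[dict], bool]  # True if a single record complies
    target: float                  # minimum compliant fraction before alerting


EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical policies and targets, for illustration only
POLICIES = [
    Policy("customer_id populated", lambda r: bool(r.get("customer_id")), 1.0),
    Policy("email well-formed", lambda r: bool(EMAIL_RE.match(r.get("email") or "")), 0.95),
]


def measure(records: list, policies: list) -> dict:
    """Return {policy name: (compliance ratio, status)} for one dashboard refresh."""
    results = {}
    for policy in policies:
        passed = sum(policy.check(r) for r in records)
        ratio = passed / len(records) if records else 1.0
        results[policy.name] = (ratio, "OK" if ratio >= policy.target else "ALERT")
    return results


records = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": "b@example.com"},
    {"customer_id": "", "email": "c@example.org"},
    {"customer_id": "C4", "email": "d@example.net"},
]
for name, (ratio, status) in measure(records, POLICIES).items():
    print(f"{name}: {ratio:.0%} {status}")
```

Running the checks as data arrives, rather than in a periodic audit, is one way to catch issues "when they start, right at the source."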
Big Data Privacy and Security: Protect a Wider Variety of Sources
[Diagram: InfoSphere Optim and InfoSphere Guardium protecting RDBMS, Hadoop, NoSQL, data warehouses, and application data and files]
Agile Governance
80% faster activity monitoring*
Innovation
• Data activity monitoring of more NoSQL, Hadoop, and relational systems
• Masking of sensitive data used in Hadoop
Value
• Protection is a prerequisite for the fundamental assumption of big data: sharing data for new insight
• Automation enables protection without inhibiting speed
Usage
• Ensures sensitive data is protected and secure
* Source: IBM internal benchmarks of InfoSphere Guardium V9 p50
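The slide does not describe how Optim masks data. The sketch below illustrates one common approach, deterministic keyed masking: the same input always masks to the same output, so masked data sets can still be joined, while separators and the last four digits are kept so the value keeps its shape. The key, the column rules, and the function names are hypothetical.

```python
import hashlib
import hmac
import string

# Hypothetical key for illustration; a real deployment would fetch it from a key manager
SECRET_KEY = b"example-masking-key"


def _keyed_digest(value: str) -> bytes:
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()


def mask_digits(value: str, keep_last: int = 4) -> str:
    """Replace all but the last `keep_last` digits with keyed pseudo-random digits.

    Separators are kept, so the masked value has the same shape as the original,
    and the same input always masks to the same output.
    """
    digest = _keyed_digest(value)
    total = sum(ch in string.digits for ch in value)
    out, seen = [], 0
    for ch in value:
        if ch in string.digits:
            out.append(str(digest[seen % len(digest)] % 10) if seen < total - keep_last else ch)
            seen += 1
        else:
            out.append(ch)
    return "".join(out)


def mask_email(value: str) -> str:
    """Swap the local part for a keyed token and keep the domain."""
    local, _, domain = value.partition("@")
    return f"user_{_keyed_digest(local).hex()[:10]}@{domain}"


# Hypothetical per-column rules; columns without a rule pass through unchanged
MASKING_RULES = {"card_number": mask_digits, "email": mask_email}


def mask_record(record: dict) -> dict:
    return {k: MASKING_RULES[k](v) if k in MASKING_RULES else v for k, v in record.items()}


row = {"customer_id": "C-1001", "card_number": "4111 1111 1111 1234", "email": "jane@example.com"}
masked = mask_record(row)
print(masked)
```

Applying `mask_record` before records land in Hadoop is one way to mask sensitive data before ingestion while keeping the records usable for analysis.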
MDM for Big Data: The Complete 360° View of Important Data
MDM Data Explorer
Agile Governance
21K customer-centric transactions per second*
How It Works
• Extend the master view with federated, unstructured big data
• Hybrid styles enable linking source records or consolidating them, based on confidence
Client Value
• Visualize every related data item in the 360° view
• Rapidly onboard new big data sources
• MDM adapts to the source
Usage
• Provides a complete understanding of the customer or master entity
* Source: InfoSphere MDM with DB2 pureScale achieves 21,000 customer-centric transactions a second, 2x the transaction rate of Oracle MDM on Exalogic/Exadata using ½ the number of cores. Approved claim in US/Canada only. Results valid as of 10/21/2012.
Note to U.S. Government Users Restricted Rights: Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
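"Hybrid styles enable linking source records or consolidating based on confidence" can be read as a three-way routing decision on a match score. The sketch below assumes a score normalized to [0, 1] and two hypothetical thresholds. High-confidence records are consolidated into the master (golden) record. Medium-confidence records are only linked, registry style. Low-confidence records are kept as separate entities. It illustrates the idea and is not InfoSphere MDM's actual logic.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    CONSOLIDATE = "consolidate"  # high confidence: merge into the master record
    LINK = "link"                # medium confidence: keep the source record and link it
    NEW_ENTITY = "new_entity"    # low confidence: treat as a separate entity


# Hypothetical thresholds on a match score normalized to [0, 1]
CONSOLIDATE_AT = 0.9
LINK_AT = 0.6


def decide(score: float) -> Action:
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= CONSOLIDATE_AT:
        return Action.CONSOLIDATE
    if score >= LINK_AT:
        return Action.LINK
    return Action.NEW_ENTITY


@dataclass
class MasterEntity:
    golden: dict                                        # consolidated, trusted attributes
    linked_sources: list = field(default_factory=list)  # registry-style links


def attach_source(entity: MasterEntity, record: dict, score: float) -> Action:
    action = decide(score)
    if action is Action.CONSOLIDATE:
        for key, value in record.items():
            entity.golden.setdefault(key, value)  # fill gaps; existing values win
    elif action is Action.LINK:
        entity.linked_sources.append(record)      # visible in the 360° view, not merged
    return action


customer = MasterEntity(golden={"name": "Jane O'Neil", "email": "jane@example.com"})
attach_source(customer, {"name": "Jane ONeil", "phone": "555-0100"}, score=0.95)
attach_source(customer, {"handle": "@janeo", "post": "Loving the new app"}, score=0.70)
attach_source(customer, {"name": "J. Oneill"}, score=0.30)
print(customer)
```

Because linking leaves the golden record untouched, uncertain big data sources can be onboarded quickly without degrading the trusted master view.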
Demonstration
InfoSphere Delivers Data Confidence for Big Data Use Cases
Use cases: Big Data Exploration; Enhanced 360° View of the Customer; Operations Analysis; Data Warehouse Augmentation; Security/Intelligence Extension
Capabilities: understand confidence; determine risk; establish master record; extend to all sources; automatic data protection; mask sensitive information; high volume data integration; agile big data archiving and retrieval
Use Case Spotlight: Enhanced 360° View
MDM and Big Data: Deliver the Complete 360° View
Capabilities Required to Be Successful
1. Combine structured MDM and unstructured big data
2. Rapidly onboard uncertain data sources in a registry style to separate low- and high-confidence data
3. Find and match master data entities within big data sources
[Diagram: MDM, Integration & Quality, and Data Explorer; single version of the truth; extended view of master data]
Use Case Spotlight: Data Warehouse Augmentation
Improve your data warehouse by improving data confidence
[Diagram: Integration & Quality, MDM, Archiving, Security & Privacy, and Test Data Management around the data warehouse; high-performance data loads, automated archiving, automated data protection, self-service testing, more accurate analysis]
Capabilities Required to Be Successful
1. Self-service integration for ad-hoc requests
2. Understand context of all available big data with a single metadata repository and business glossary
3. Mask any variety of sensitive data before ingestion
4. Automatically protect big data with activity monitoring
5. Store and analyze archive files on Hadoop
A Busy Year of Innovation within the Labs
Literally dozens of innovations that raise confidence in big data
Two highlights:
1. BLU Acceleration
2. PureData System for Hadoop
BLU Acceleration: IBM Research & Development Lab Innovations
Dynamic In-Memory: in-memory columnar processing with dynamic movement of unused data to storage
Actionable Compression: the industry's first data compression that preserves order, so that the data can be used without decompressing
Parallel Vector Processing: multi-core and SIMD (Single Instruction, Multiple Data) parallelism
Data Skipping: skips unnecessary processing of irrelevant data
Super Fast, Super Easy: Create, Load and Go! No indexes, no aggregates, no tuning, no SQL changes, no schema changes
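Two of these ideas fit together naturally: an order-preserving encoding lets a range predicate be evaluated on encoded values, and per-block min/max summaries let whole blocks be skipped. The sketch below illustrates both under simplifying assumptions. It uses a plain rank dictionary with no bit packing or frequency-based coding, and all function names are hypothetical. It is not how BLU Acceleration is implemented internally.

```python
from bisect import bisect_left, bisect_right


def build_dictionary(values):
    """Order-preserving dictionary: each code is the rank of the value among distinct values."""
    distinct = sorted(set(values))
    return distinct, {v: code for code, v in enumerate(distinct)}


def build_synopsis(codes, block_size):
    """Per-block (min code, max code) pairs used for data skipping."""
    return [(min(codes[i:i + block_size]), max(codes[i:i + block_size]))
            for i in range(0, len(codes), block_size)]


def range_scan(codes, dictionary, synopsis, block_size, low, high):
    """Return (row ids with low <= value <= high, number of blocks actually scanned)."""
    # Because codes preserve order, the value range maps to one contiguous code range.
    lo_code = bisect_left(dictionary, low)        # first code whose value >= low
    hi_code = bisect_right(dictionary, high) - 1  # last code whose value <= high
    hits, scanned = [], 0
    if lo_code > hi_code:
        return hits, scanned
    for block, (bmin, bmax) in enumerate(synopsis):
        if bmax < lo_code or bmin > hi_code:
            continue  # data skipping: no row in this block can qualify
        scanned += 1
        start = block * block_size
        for offset, code in enumerate(codes[start:start + block_size]):
            if lo_code <= code <= hi_code:  # compare encoded values; nothing is decoded
                hits.append(start + offset)
    return hits, scanned


order_dates = ["2013-01-02", "2013-01-05", "2013-01-09",
               "2013-02-01", "2013-02-14", "2013-02-20",
               "2013-03-03", "2013-03-15", "2013-03-30"]
dictionary, code_of = build_dictionary(order_dates)
codes = [code_of[v] for v in order_dates]
synopsis = build_synopsis(codes, block_size=3)
rows, scanned = range_scan(codes, dictionary, synopsis, 3, "2013-02-01", "2013-02-28")
print(rows, scanned)
```

For a February query, only the middle block is read. The January and March blocks are skipped using their min/max codes alone.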
BLU Acceleration: Customers Are Seeing Great Results
Customers quoted: Iqbal Goralwalla, Head of DB2 Managed Services, Triton; Lennart Henäng, IT Architect; Yong Zhou, Sr. Manager of Data Warehouse & Business Intelligence Dept.
“100x speed up with literally no tuning!”
“Converting this row-organized uncompressed table to a column-organized table in DB2 10.5 delivered a massive 15.4x savings!”
“With BLU Acceleration, we’ve been able to reduce the time spent on pre-aggregation by 30x—from one hour to two minutes! BLU Acceleration is truly amazing.”
PureData System for Hadoop: Bringing Big Data to the Enterprise
Simplify the delivery of unstructured data to the enterprise
Integrate Hadoop with the data warehouse
Leverage Hadoop for data archiving
Provide best-in-class security
Provide data exploration across structured and unstructured data
Accelerate insight with machine data
Accelerate insight with social data
Simplify big data for the enterprise!
Confidence Is Essential for Actionable Insight
• Make decisions with greater certainty
• Analyze rapidly while providing necessary controls
• Increase the value of data
Visual Context
Agile Governance
Automated Integration
Understanding Your Data is the Basis for Confidence