Innovations in Big Data & Analytics
DESCRIPTION
Phil Thomas, an IBM Big Data and Analytics Architect, presented recently at Internet World; this is the presentation he used. Read his supporting blog post about the event here: http://ukbigdataevents.wordpress.com/2014/08/18/innovations-in-big-data-analytics-by-philip-thomas-software-client-architect/
TRANSCRIPT
© 2014 IBM Corporation
Innovations in Big Data & Analytics
Phil Thomas
IBM Big Data & Analytics Architect
Big Data & Analytics is a journey.
Be proactive about privacy, security and governance
Build a culture that infuses analytics everywhere
Invest in a big data & analytics platform
Imagine It. Realise It. Trust It.
Big Data Is All Data
• Volume: Data at Scale
• Variety: Data in Many Forms
• Velocity: Data in Motion
• Veracity: Data Uncertainty
But without a value target and case, it is simply a waste
• Value: Growth, Efficiency, Engagement, Automation
Analytics and big data include traditional and new techniques
Data Access
Traditional analytics:
• Reconcile data sources together
• Query relational warehouses
• Individual transaction records
Big data analytics:
• Surface data directly from its source
• Query specialized systems
• Data relationships and networks
User Interaction
Traditional analytics:
• Graphs and reports
• Hierarchical navigation
• Managed and ad hoc delivery
• Manual analysis and action
Big data analytics:
• Visualize masses of data
• Context and relationship navigation
• Exploration of what’s important
• Automated action
Applied Analytics
Traditional analytics:
• Numeric data and text attributes
• Sample based models
• Data analyzed at rest
• Humans interpret patterns
Big data analytics:
• Linguistic interpretation of meaning
• More accurate models
• Analyze stream data in motion
• Algorithms uncover hidden patterns
New/Enhanced Applications
• Why did it happen? Reporting, Analysis, Content Analytics
• What did I learn, what’s best? Cognitive
• What is happening? Exploration & Discovery
• What action should I take? Decision Management
• What could happen? Predictive Analytics & Modelling
Be More Right, More Often
Realise It. Invest in a Big Data & Analytics platform.
IBM Big Data & Analytics Platform
New/Enhanced Applications
• Reporting, Analysis, Content Analytics
• Cognitive
• Exploration & Discovery
• Decision Management
• Predictive Analytics & Modeling
All Data
• Real-time Analytics Zone
• Exploration, Landing & Archive Zone
• Information Ingestion & Operational Information Zone
• Enterprise Warehouse, Data Mart & Analytic Appliance Zone
• Information Governance Zone
IBM Big Data & Analytics Infrastructure: Systems, Security, Storage
Realize It. Invest in a Big Data & Analytics platform.
Unique – fuels journey to Cognitive
Innovative – easy to consume
Complete – enterprise-ready
Fast – start anywhere and grow
Watson Foundations: the cornerstone of our IBM Big Data & Analytics Portfolio
SOLUTIONS
IBM Watson™ and Industry Solutions: Sales, Marketing, Finance, Operations, HR, Risk, IT, Fraud
WATSON FOUNDATIONS
Decision Management, Planning & Forecasting, Discovery & Exploration, Business Intelligence & Predictive Analytics, Content Analytics
Information Integration & Governance
Data Mgmt & Warehouse, Hadoop System, Stream Computing, Content Management
BIG DATA & ANALYTICS INFRASTRUCTURE
CONSULTING AND IMPLEMENTATION SERVICES
Watson Foundations uniquely…
…Helps me discover fresh insights
• Predictive and content analytics to uncover patterns not yet known
• Interactive exploration across all data
…Operates in a timely fashion
• Real-time analytics as data flows through an organisation
• Enterprise-class Hadoop that runs 4x faster
• In-memory computing for speed-of-thought analytics
…Establishes trust so I can act with confidence
• Governance across the complete data lifecycle, including Hadoop
• Security and privacy with compliance
• Transparency and context to the decision-making process
WATSON FOUNDATIONS
Decision Management, Planning & Forecasting, Discovery & Exploration, Business Intelligence & Predictive Analytics, Content Analytics
Information Integration & Governance
Data Mgmt & Warehouse, Hadoop System, Stream Computing, Content Management
Next generation architecture for delivering information and insights
Data types
• Transaction and application data
• Machine and sensor data
• Enterprise content
• Social data
• Image and video
• Third-party data
Information Integration & Governance
• Real-time processing & analytics
• Exploration, landing and archive
• Trusted data
• Reporting & interactive analysis
• Deep analytics & modeling
Decision management; Predictive analytics and modeling; Reporting, analysis, content analytics; Discovery and exploration
Operational systems; Actionable insight
What Differentiates IBM’s Hadoop Offering?
BigInsights brings the power of Hadoop to the Enterprise by providing administration, discovery, development, security, and best-in-class analytic capabilities.
Pure Open Source Code + Optional Enterprise Class Extensions + IBM Support Infrastructure = BigInsights (“Blue Suit Hadoop”)
“Our customers send roughly 35 billion emails every year, and with every email they send, we have more data that we can analyse and feed back to them to help improve their success. Our work analysing email delivery times has already given our customers a 15-25% lift in their email campaign performance – and that means more customers in their doors and increased revenue.” – Jesse Harriott, Chief Analytics Officer
BigInsights builds on open source Hadoop capabilities for Enterprise Class Deployments
InfoSphere BigInsights
• Open source based components
Enterprise capabilities:
• Workload Management
• Security
• Development Environment
• Analytics
• Extractors and APIs
• Accelerators
• General Parallel Filesystem
• Big R
• BigSheets
Integrates with: Streams, Watson Explorer, Cognos BI
Performance gains* on average over open source Hadoop
*Audited STAC® Report, Securities Technology Analysis Center
Big SQL 3.0: Native SQL Query Access for Hadoop
Application (SQL) → JDBC / ODBC Driver → JDBC / ODBC Server → Big SQL Engine → BigInsights data sources: Hive tables, HBase tables, CSV files
• Native SQL access to data stored in BigInsights
• Rich SQL support (ANSI, IBM, Oracle, Teradata)
• IBM optimiser, compiler and runtime ported to Hadoop
• Native Hadoop data formats
• High performance, highly scalable
• Federated query
• Granular row / column security
Get the technical white paper at https://ibm.biz/BdRWsK
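Big SQL itself is reached through JDBC/ODBC, so its interface is not reproduced here. To make "federated query" concrete (a single SQL statement that reads from more than one data store), the minimal sketch below uses Python's standard sqlite3 module as a stand-in: two independent in-memory databases play the role of, say, a Hive table and a CSV file, and one query joins them. The table names, columns and data are hypothetical, and this is an analogy, not Big SQL syntax or behaviour.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a file stored in the cluster.
CUSTOMERS_CSV = """customer_id,region
1,North
2,South
3,North
"""


def revenue_by_region():
    """Join two separately held tables in a single SQL statement."""
    conn = sqlite3.connect(":memory:")
    # Attach a second, independent database before any transaction starts
    # (SQLite refuses to ATTACH inside an open transaction).
    conn.execute("ATTACH DATABASE ':memory:' AS files")

    # Source 1: an "orders" table in the main database (stand-in for a Hive table).
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 120.0), (2, 80.0), (3, 50.0), (1, 30.0)])

    # Source 2: the CSV rows loaded into the attached database.
    conn.execute("CREATE TABLE files.customers (customer_id INTEGER, region TEXT)")
    reader = csv.DictReader(io.StringIO(CUSTOMERS_CSV))
    conn.executemany("INSERT INTO files.customers VALUES (?, ?)",
                     [(int(r["customer_id"]), r["region"]) for r in reader])

    # One statement spans both databases.
    rows = conn.execute("""
        SELECT c.region, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN files.customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()
    conn.close()
    return rows
```

In a federated engine the second source would stay where it is and be read at query time; copying the CSV rows into SQLite here is only a convenience of the sketch.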
Big Data Exploration: Quick time to value for big data discovery & exploration
• Locate and understand existing data sources
• Expose data for new uses, without copying the data to a central location
• Get up & running quickly; discover and tag relevant big data
• Develop new insights and hypotheses
• Connect employees with all of the data at the point of impact
• Use big data sources in new information-centric applications
Watson Explorer
Find, visualise, understand all big data to improve decision making
• Increase revenue, productivity and efficiency by facilitating navigation of Big Data (structured & unstructured)
• Discover new insights by combining and analysing various data types residing in various federated data repositories
Sources: CM, RM, DM; RDBMS; Feeds; Web 2.0; Email; Web; CRM, ERP; File Systems
Components: Connector Framework, Data Explorer, App Builder, UI / User
Works with: BigInsights, Streams, Warehouse, Integration & Governance
• Highly relevant, personalised results
• Access across many sources
• Dynamic categorisation
• Leveraging structured and unstructured content
• Tagging and collaboration
• Virtual folders for organising content
• Refinements based on structured information
• Expertise location
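The presentation does not describe how Watson Explorer indexes or refines content. As a hedged illustration of two of the ideas listed above, access across many sources and refinements based on structured information (often called faceted search), the sketch below builds a tiny inverted index over documents tagged with a source and a category, then returns matching documents with facet counts. The documents, field names and the all-terms-must-match rule are assumptions made for the example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: int
    source: str    # repository the document came from (hypothetical values)
    category: str  # structured attribute used for refinement
    text: str


# Hypothetical documents gathered from several sources.
DOCS = [
    Doc(1, "Email", "Complaint", "late delivery of order"),
    Doc(2, "CRM", "Complaint", "order delivered damaged"),
    Doc(3, "File Systems", "Report", "quarterly delivery performance"),
    Doc(4, "CRM", "Enquiry", "question about invoice"),
]


def build_index(docs):
    """Inverted index: lower-cased term -> set of doc_ids containing it."""
    index = defaultdict(set)
    for d in docs:
        for term in d.text.lower().split():
            index[term].add(d.doc_id)
    return index


def search(docs, index, query, refine=None):
    """Return (matching docs, facet counts); every query term must match."""
    terms = query.lower().split()
    if not terms:
        return [], {"source": Counter(), "category": Counter()}
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    results = [d for d in docs if d.doc_id in hits]
    # Refinement: keep only results whose structured fields match, e.g. {"source": "CRM"}.
    if refine:
        results = [d for d in results
                   if all(getattr(d, field) == value for field, value in refine.items())]
    # Facet counts show the user how the current results split by each field.
    facets = {
        "source": Counter(d.source for d in results),
        "category": Counter(d.category for d in results),
    }
    return results, facets
```

For example, searching for "delivery" returns documents 1 and 3 with one hit each under the Email and File Systems facets; clicking a facet corresponds to calling search again with refine set.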
Information Integration and Governance in times of Big Data
Mask and Redact
• De-identify sensitive data at source or within Hadoop
• Apply obfuscation techniques to both structured and unstructured data
Monitor Data Activity
• Monitor big data sources and Hadoop stack
• Real-time alerts
• Centralised reporting of audit data
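The slide does not say how the masking products implement de-identification. As a hedged illustration of the two obfuscation styles it mentions, structured fields and unstructured text, this sketch pseudonymises a structured field with a keyed hash (HMAC-SHA256 from Python's standard library) and redacts e-mail addresses and 16-digit card numbers from free text with regular expressions. The key, the field names customer_name and notes, and the patterns are hypothetical and far from exhaustive.

```python
import hashlib
import hmac
import re

# Hypothetical key for the example only; a real deployment would keep it in a key store.
SECRET_KEY = b"example-key-not-for-production"

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# 16-digit card numbers, optionally grouped in fours by spaces or hyphens.
CARD_RE = re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b")


def pseudonymise(value: str) -> str:
    """Structured field: replace the value with a keyed-hash token.

    The same input always yields the same token, so joins and counts still
    work, but the original value is not stored in the token.
    """
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "ID_" + digest[:12]


def redact_text(text: str) -> str:
    """Unstructured text: replace e-mail addresses and card numbers."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    # Keep only the last four digits of a card number.
    return CARD_RE.sub(lambda m: "[CARD ****" + m.group(0)[-4:] + "]", text)


def mask_record(record: dict) -> dict:
    """Return a masked copy of a record with hypothetical fields customer_name and notes."""
    masked = dict(record)
    masked["customer_name"] = pseudonymise(record["customer_name"])
    masked["notes"] = redact_text(record["notes"])
    return masked
```

Using a keyed hash rather than a plain one is a deliberate choice in the sketch: without the key, an attacker cannot simply hash a list of known names and compare tokens.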
Find & Integrate Master Data
• Probabilistic matching on big data platform (BigInsights/Hadoop)
• Matching at a higher volume
• Matching of a wider variety of data sets
IBM InfoSphere BigInsights, InfoSphere Optim, InfoSphere Guardium, InfoSphere Master Data Management (Big Match Engine)
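The slide does not describe the Big Match engine's algorithm. Probabilistic record matching is commonly explained with the classic Fellegi-Sunter model: each compared field adds a log-likelihood weight (positive when the field agrees, negative when it disagrees), and two thresholds split record pairs into match, non-match and clerical review. The sketch below is a generic illustration of that technique, not IBM's implementation; the field names, m/u probabilities and thresholds are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldModel:
    m: float  # P(field agrees | the two records are the same entity)
    u: float  # P(field agrees | the two records are different entities)


# Hypothetical probabilities; real systems estimate them from the data.
MODEL = {
    "surname": FieldModel(m=0.95, u=0.01),
    "birth_year": FieldModel(m=0.98, u=0.05),
    "postcode": FieldModel(m=0.90, u=0.001),
}


def normalise(value) -> str:
    """Crude standardisation so that 'SW1A 1AA' and 'sw1a1aa' compare equal."""
    return str(value).strip().lower().replace(" ", "")


def match_weight(a: dict, b: dict) -> float:
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, fm in MODEL.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # a missing value contributes no evidence either way
        if normalise(a[field]) == normalise(b[field]):
            total += math.log2(fm.m / fm.u)              # agreement: positive weight
        else:
            total += math.log2((1 - fm.m) / (1 - fm.u))  # disagreement: negative weight
    return total


def classify(weight: float, upper: float = 8.0, lower: float = 0.0) -> str:
    """Two thresholds give three outcomes, as in Fellegi-Sunter."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "clerical review"
```

Rare agreements carry more evidence than common ones: a postcode match (u = 0.001) adds far more weight than a birth-year match (u = 0.05), which is what lets probabilistic matching cope with noisy, varied data.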
InfoSphere Streams - Real-Time Analytics on Big Data
• Volume
− Gigabytes per second or more
− Terabytes per day or more
• Variety
− All kinds of data
− All kinds of analytics
• Velocity
− Insights in microseconds
• Agility
− Dynamically responsive
− Rapid application development
Millions of events per second
Microsecond latency
Sensor, video, audio, text, and relational data sources
Just-in-time decisions
Powerful analytics
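Streams applications are normally written in IBM's Streams Processing Language (SPL), which the presentation does not show. To illustrate the underlying idea of analysing data in motion (computing over a moving window as events arrive, rather than over a stored table), here is a minimal Python sketch: a time-based sliding window keeps a running sum of recent readings, and each new reading is flagged if it exceeds a multiple of the recent mean. The Event fields, window length and threshold are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds; events are assumed to arrive in time order
    value: float


class SlidingWindow:
    """Keeps the events from the last `window_seconds` seconds and their running sum."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._events: deque = deque()
        self._total = 0.0

    def evict(self, now: float) -> None:
        # Drop events that are no longer inside (now - window_seconds, now].
        cutoff = now - self.window_seconds
        while self._events and self._events[0].timestamp <= cutoff:
            self._total -= self._events.popleft().value

    def mean(self) -> Optional[float]:
        return self._total / len(self._events) if self._events else None

    def add(self, event: Event) -> None:
        self._events.append(event)
        self._total += event.value


def detect_spikes(events: Iterable[Event], window_seconds: float,
                  threshold: float) -> Iterator[Event]:
    """Yield events whose value exceeds `threshold` times the mean of the preceding window.

    Events are handled one at a time; nothing is kept beyond the current window.
    """
    window = SlidingWindow(window_seconds)
    for ev in events:
        window.evict(ev.timestamp)
        recent_mean = window.mean()
        if recent_mean is not None and ev.value > threshold * recent_mean:
            yield ev
        window.add(ev)
```

Because detect_spikes is a generator, it can consume an unbounded source (a socket, a queue) and emit alerts immediately, with memory bounded by the window rather than by the total volume of data.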
Market changes driving the need for next generation databases
Are you ready to respond?
How can you do it while leveraging existing investments?
How can you achieve the full potential without disrupting the business?
The scale and scope of big data present new opportunities for innovation and competitive advantage
• Technology allows us to consume more data and generate new insight
• Fast access to insight is a top requirement
• These insights are sparking new & rapidly evolving analytic requests
Businesses need to generate insight from information more quickly to accelerate decision making
Organisations need fast, simple and agile technology strategies for manipulating data and developing new applications
Multi-workload database software for the era of big data
DB2 10.5 with BLU Acceleration
• Everything you need for your business in ONE database
− Optimized for transactions and analytics
− Enterprise NoSQL for greater application flexibility – JSON, RDF-Graph, XML
• Always available, fast transactions
− Online rolling maintenance updates with no planned downtime [1]
− Designed for disaster recovery over distances of thousands of km [2]
• Real benefits, low risk
− In-memory speed and simplicity on existing infrastructure
− Optimized for SAP workloads
− Average 98% Oracle Database application compatibility [3]
[1] Based on IBM design for normal operation with rolling maintenance updates of DB2 server software on a pureScale cluster. Individual results will vary depending on individual workloads, configurations and conditions, network availability and bandwidth.
[2] Based on IBM design for normal operation under typical workload. Individual results will vary depending on individual workloads, configurations and conditions, network availability and bandwidth.
[3] Available with DB2 Advanced Enterprise Server Edition.
What makes BLU Acceleration different?
Unmatched innovations from IBM Research & Development labs
• Next Generation In-Memory: in-memory columnar processing with dynamic movement of data from storage
• Analyse Compressed Data: patented compression technique that preserves order, so data can be used without decompressing
• CPU Acceleration: multi-core and SIMD (Single Instruction Multiple Data) parallelism
• Data Skipping: skips unnecessary processing of irrelevant data
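BLU's actual encoding and synopsis formats are not described in this presentation. The Python sketch below only illustrates the general ideas behind two of the bullets: an order-preserving dictionary encoding, so a range predicate on values can be evaluated directly on the integer codes without decompressing, and per-block min/max synopses that let a scan skip blocks that cannot contain qualifying rows. The block size, data and function names are hypothetical, and SIMD and multi-core parallelism are not modelled.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Block:
    codes: list    # encoded values for a run of rows
    min_code: int  # synopsis: smallest code in the block
    max_code: int  # synopsis: largest code in the block


def encode_column(values, block_size=4):
    """Order-preserving dictionary encoding plus per-block min/max synopses."""
    dictionary = sorted(set(values))  # code i stands for dictionary[i]
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes = [code_of[v] for v in values]
    blocks = []
    for start in range(0, len(codes), block_size):
        chunk = codes[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return dictionary, blocks


def count_between(dictionary, blocks, low, high):
    """Count rows with low <= value <= high without decoding any row.

    Returns (row_count, blocks_scanned).
    """
    # Because the encoding preserves order, a value range maps to a code range.
    lo_code = bisect_left(dictionary, low)
    hi_code = bisect_right(dictionary, high) - 1
    if lo_code > hi_code:
        return 0, 0  # no dictionary entry falls in the range
    count = scanned = 0
    for block in blocks:
        # Data skipping: the synopsis proves this block has no qualifying rows.
        if block.max_code < lo_code or block.min_code > hi_code:
            continue
        scanned += 1
        count += sum(1 for c in block.codes if lo_code <= c <= hi_code)
    return count, scanned
```

With the values 10, 12, 11, 13, 50, 52, 51, 55, 90, 95, 91, 99 in blocks of four, the predicate 50 <= value <= 60 touches only the middle block: the synopses of the other two rule them out before any codes are read.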
The benefits of DB2 with BLU Acceleration: Analytics for the NOW business
Fast, Simple, Agile
• Answers at the speed of thought for growing revenue, reducing cost and lowering risk
• Next generation in-memory with IBM Research innovations
• 8x-25x faster analytics, with some queries running more than 1000x faster [1][2]
• In-memory performance not limited by availability of memory
• Operational simplicity with “load and go” performance
• No need for indexes, aggregates, or tuning
• Compression savings: “10x. That's how much smaller our tables are with BLU Acceleration” (Andrew Juarez, Coca-Cola Bottling Co.)
• Automatically adapts to any server, large or small
• Available on premises or via the cloud
[1] Based on internal IBM testing of sample analytic workloads comparing queries accessing row-based tables on DB2 10.1 vs. columnar tables on DB2 10.5. Performance improvement figures are cumulative of all queries in the workload. Individual results will vary depending on individual workloads, configurations and conditions.
[2] Based on internal IBM tests of pure analytic workloads comparing queries accessing row-based tables on DB2 10.1 vs. columnar tables on DB2 10.5. Results not typical. Individual results will vary depending on individual workloads, configurations and conditions, including size and content of the table, and number of elements being queried from a given table.
IBM PureData System for Analytics
Built-in Expertise
• No indexes and minimal tuning
• Data model agnostic
• Fully parallel, optimised in-database analytics
Integration by Design
• Server, storage, database in one easy-to-use package
• Automatic parallelisation and resource optimisation to scale economically
• Enterprise-class security and platform management
Simplified Experience
• Up and running in hours
• Minimal ongoing administration
• Standard interfaces to best-of-breed analytics, BI, and data integration tools
• Built-in analytics capabilities allow users to derive insight from data quickly
• Easy connectivity to other IBM Big Data Platform components
Cognos – mobile, interactive visualisation capabilities
Interactive Visualisation
• Animated charts enhance the user experience of general reporting and Cognos Active Report, and allow users to pinpoint trends faster.
• A paradigm shift for delivering value to users with the introduction of visualization extensibility with RAVE (Rapid Adaptive Visualization Engine).
New visualisations are a simple download away
Browse, find and download visualisations from the extensible visualisation community to quickly provide the best visual for your reporting needs
• Scatter
• Gantt
• Area
• Radar
• Boxplot
• Dial
• Treemap / Heatmap
• Plus a continually growing set of visualisations
analyticszone.com/visualization
IBM SPSS Modeler predictive analytics
• Hadoop, Netezza, R, DB2 … support
• Graphical interface, rich visualisations
• Real-time deployment / execution
• Analytic Catalyst – “Analyst in the software”
Watson is cognitive computing
Understands natural language
Generates and evaluates hypotheses
Adapts and learns
Watson understands me.
Watson engages me.
Watson learns and improves over time.
Watson helps me discover.
Watson establishes trust.
Watson has endless capacity for insight.
Watson operates in a timely fashion.
Watson can transform the way people interact over the lifetime of their relationship
• Know me: Leverage profile data for personalized insight into client wants and needs to contextualize experience
• Empower me: Interactive, informed natural language dialogue that enables insights at the point of action
• Engage me: Dynamic, evidence-based omni-channel experiences that adapt to client preferences
This will be Watson
Sees
Hears
Experiences
Understands natural language
Generates and evaluates hypotheses
Adapts and learns
Reasons
Explores
Visualizes
Thank You
Legal Disclaimer
• © IBM Corporation 2014. All Rights Reserved.
• The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
• References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
• Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.