IBM Big Data Platform
Ralph Behrens
Client Technical Professional Big Data
Certified Netezza Specialist
IBM Software Group Deutschland
© 2013 IBM Corporation
“Data is the New Oil”
“ Data is the new Oil.
Data is just like crude. It’s valuable,
but if unrefined it cannot really be used.”
– Clive Humby, dunnhumby
Discover
• Easily navigate and visualize all internal and external data as an entry point into the big data world.
Analyze
• Compare and analyze the information content of all relevant structured or unstructured data.
Understand
• Uncover correlations and combinations in the information in order to make better decisions.
Understanding the data is crucial
IBM Big Data & Analytics Reference Architecture
All Data Sources
Advanced Analytics/
New Insights
Cognitive: Learn Dynamically?
Prescriptive: Best Outcomes?
Predictive: What Could Happen?
Descriptive: What Has Happened?
Exploration and Discovery: What Do You Have?
Streaming Data
Text Data
Applications Data
Time Series
Geo Spatial
Relational
Social Network
Video & Image
New/Enhanced
Applications
Automated Process
Case Management
Analytic Applications
Watson
Cloud Services
ISV Solutions
Alerts Fraud
Big Data Platform Capabilities
• Information platform
• Real-time Analytics
• Warehouse & Data Marts
• Analytic Appliances
Information Integration
Landing Zone
Data Exploration
Archive
Real-time Analytics
Information Governance, Security and
Business Continuity
EDW
Data Marts
Open Architecture/
Multiple Product Entry Points
PureData Systems
• Expert integrated systems to make deep and operational analytics faster & simpler
Solutions
Analytics and Decision Management
IBM Big Data Platform
Data Warehouse
Trend #1: Appliances
Big Data Infrastructure
IBM Big Data PureData Systems
IBM PureData Systems overview
Meeting Big Data Challenges – Fast and Easy!
System for Transactions (DB2 pureScale, powered by System-P or System-X)
• For apps like E-commerce: database cluster services optimized for transactional throughput and scalability
System for Analytics (powered by Netezza technology)
• For apps like Customer Analysis: data warehouse services optimized for high-speed, peta-scale analytics and simplicity
System for Operational Analytics (DB2, powered by System-X)
• For apps like Real-time Fraud Detection: operational data warehouse services optimized to balance high performance analytics and real-time operational throughput
PureData for Analytics - Model N2001
• User Data Capacity: 192 TB*
• Data Scan Speed: 450 TB/hr*
• Load Speed (per system): 5+ TB/hr
• Power Requirements: 7.5 kW
• Cooling Requirements: 27,000 BTU/hr
• Footprint: 65x110x222 cm / 1282 kg
* Assuming 4X compression
2 IBM x3650-M3 Hosts
• 2x 6-Core Intel 3.46 GHz CPUs
• Active-Passive Mode
7 IBM HX5 S-Blades™
• 2x Intel 8 Core 2+ GHz CPUs
• New Netezza BPE4 Side Car
• 2x 8-Engine Xilinx Virtex-6 FPGAs
• 128 GB RAM + 8 GB slice buffer
12 IBM EXP3000 Disk Enclosures
• 288 x 600 GB SAS2 Drives (240 for User Data, 14 for S-Blades, 34 Spare)
• RAID 1 Mirroring
• All components are fully redundant and able to have their workload redistributed to a set of alternate components.
• Loss of a blade, any storage component, even the host system that serves as the primary interface will not prevent the system from functioning.
• Linux 64-bit Kernel
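The capacity and scan figures on this slide can be cross-checked with some quick arithmetic. The sketch below is only an illustration: it assumes that about one third of each user-data drive holds primary user data (the rest going to the mirror copy and working space), which the slide does not state, and the function names are hypothetical. Under that assumption the result lines up with the quoted 192 TB.

```python
# Back-of-the-envelope check of the N2001 figures quoted on the slide.
# ASSUMPTION (not from the slide): one third of each user-data drive holds
# primary user data; the remainder is mirror copy and working space.

USER_DATA_DRIVES = 240        # from the slide
DRIVE_SIZE_GB = 600           # from the slide
COMPRESSION = 4               # "* Assuming 4X compression"
SCAN_SPEED_TB_PER_HR = 450    # from the slide, with 4X compression
PRIMARY_DIVISOR = 3           # assumption, see above


def user_capacity_tb(drives, drive_gb, primary_divisor, compression):
    """Estimated user data capacity in TB after compression."""
    raw_tb = drives * drive_gb / 1000
    return raw_tb / primary_divisor * compression


def full_scan_minutes(capacity_tb, scan_tb_per_hr):
    """Minutes needed to scan the whole capacity at the quoted scan speed."""
    return capacity_tb / scan_tb_per_hr * 60


capacity = user_capacity_tb(USER_DATA_DRIVES, DRIVE_SIZE_GB, PRIMARY_DIVISOR, COMPRESSION)
print(f"Estimated user capacity: {capacity:.0f} TB")
print(f"Full scan at quoted speed: {full_scan_minutes(capacity, SCAN_SPEED_TB_PER_HR):.1f} minutes")
```

Read this way, the quoted scan speed means a full pass over a completely filled system would take under half an hour, which is the kind of figure the "scan rate" claim is about.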
*Unofficial customer test, **Exadata with/out SSD
Appliance = Increase Data Center Efficiency
With Faster, More Efficient Systems
PureData uses Less Power than other systems1
PureData has More Capacity than other systems2,3
PureData has “Out of the box” Faster Scan Rates than other systems
PureData for Analytics - Model N2001
IBM Platform for Big Data: BigInsights
InfoSphere BigInsights
• Enterprise-grade Hadoop system enhanced with advanced text analytics, data visualization, tools, & performance features for analyzing massive volumes of structured and unstructured data.
IBM Big Data Platform
Hadoop System
Data Warehouse
Trend #2: Analytical intelligence on cheap standard hardware
Solutions
Analytics and Decision Management
Big Data Infrastructure
• Scalable – New nodes can be added on the fly
• Affordable – Massively parallel computing on commodity servers
• Flexible – Hadoop is schema-less, and can absorb any type of data
• Fault Tolerant – Through MapReduce software framework
• Performance & reliability – Adaptive MapReduce, Compression, Indexing, Flexible Scheduler, …
• Enterprise Hardening of Hadoop
• Productivity Accelerators
  – Web-based UIs and tools
  – End-user visualization
  – Analytic Accelerators, …
• Enterprise Integration – To extend & enrich your information supply chain
• SQL Interface
IBM Enriches Hadoop
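To make the MapReduce model named on this slide more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word counting as the example. It is plain Python and does not use any Hadoop or BigInsights API; all function names are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(document):
    """Emit (word, 1) pairs for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over the input records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["data is the new oil", "data is just like crude"]
    print(word_count(docs))
```

Because the map and reduce functions have no side effects, a framework can in principle rerun a failed task on another node, which is the property the "Fault Tolerant" bullet relies on.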
Key Features and Specifications
Key Features
Hadoop Distribution: InfoSphere BigInsights V2.1
Built-in Analytics/Accelerators: IBM BigSheets; IBM Accelerator for Text Analytics; IBM Accelerator for Social Data; IBM Accelerator for Machine Data; IBM Big SQL
Development / Administration: Eclipse-based Development Environment; Exposed Node Management
Enterprise Readiness: Security; High Availability SW & HW; Hardware management & monitoring
Data Warehouse Integration: Enterprise data warehouse connectors; Archival capabilities
Specifications (Full Rack)
Management Nodes: 1 primary, 1 standby (x3550 M4)
Data Nodes: 18 (x3630 M4)
CPU Cores: 216
Memory: 96 GB per node, 1728 GB total
Raw Storage: 216 drives, 3 TB each, 648 TB total
User Space: 216 TB
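The full-rack numbers are internally consistent, which a few lines of arithmetic can show. This is a rough sketch under two assumptions that the slide does not state: that cores and drives are spread evenly across the 18 data nodes, and that the gap between raw storage and user space comes from keeping three copies of each block (a common HDFS default). The names are hypothetical.

```python
# Cross-check of the full-rack figures quoted above.
# ASSUMPTIONS (not from the slide): resources are spread evenly over the
# data nodes, and user space reflects three copies of each block.

DATA_NODES = 18           # from the slide
MEMORY_PER_NODE_GB = 96   # from the slide
DRIVES = 216              # from the slide
DRIVE_SIZE_TB = 3         # from the slide
CPU_CORES = 216           # from the slide
REPLICATION_FACTOR = 3    # assumption: common HDFS default


def rack_summary():
    """Derive totals and per-node figures from the quoted specifications."""
    raw_tb = DRIVES * DRIVE_SIZE_TB
    return {
        "total_memory_gb": DATA_NODES * MEMORY_PER_NODE_GB,
        "raw_storage_tb": raw_tb,
        "user_space_tb": raw_tb // REPLICATION_FACTOR,
        "drives_per_node": DRIVES // DATA_NODES,
        "cores_per_node": CPU_CORES // DATA_NODES,
    }


for name, value in rack_summary().items():
    print(f"{name}: {value}")
```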
Benefits of IBM PureData System for Hadoop
Accelerate Big Data Time to Value
• Deploy 8x Faster than custom-built solutions1
• Built-in Visualization to accelerate insight
• Built-in Analytic Accelerators2 unlike big data appliances on the market
Simplify Big Data Adoption & Consumption
• Single System Console for full system administration
• Rapid Maintenance Updates with automation
• No Assembly Required: data load ready in hours
Implement Enterprise-Class Big Data
• Only Integrated Hadoop System with Built-in Archiving Tools2
• Delivered with More Robust Security than open source software
• Architected for High Availability
1 Based on IBM internal testing and customer feedback. "Custom built clusters" refer to clusters that are not professionally pre-built, pre-tested and optimized. Individual results may vary.
2 Based on current commercially available Big Data appliance product data sheets from large vendors. US ONLY CLAIM.
New Approaches for the Data Warehouse
Use Case - Queryable Archive
• Immediate storage alternative for cold data
• Cost savings for cold data
• Compliance requirements
PureData System for Analytics
PureData System for Hadoop
Use Case – do more!
• Using unstructured data
• Explore new data
• “Super ETL Landing Zone”
• Analyze the data synchronously (reporting, prediction, …)
InfoSphere Streams
• Software enabling continuous analysis of massive volumes of streaming data with sub-millisecond response times
IBM Big Data Platform
Hadoop System
Stream Computing
Data Warehouse
Trend #3: Processing of (machine) data in real time
Solutions
Analytics and Decision Management
Big Data Infrastructure
IBM Platform for Big Data: Streams
Traditional DWH Computing
• Search for historic facts
• Find and analyze information stored
• “Batch” paradigm, “Pull” model
• Query-driven. Queries are placed on static data
Stream Computing
• Search for recent facts
• Analysis of the data while moving, before storage
• “Real-Time” paradigm, “Push” model
• Data-driven. Data is brought to the analysis
Real-time Analytics
Stream Computing: A Paradigm Shift
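The difference between the pull and push models can be sketched in a few lines. The example below is a generic Python illustration, not InfoSphere Streams code: `batch_average` stands in for a query over stored data, and the hypothetical `SlidingAverage` class stands in for an operator that updates its result as each event arrives.

```python
from collections import deque


def batch_average(stored_readings):
    """Pull model: a query runs over data that is already stored."""
    return sum(stored_readings) / len(stored_readings)


class SlidingAverage:
    """Push model: each arriving event updates the result immediately."""

    def __init__(self, window_size):
        # Only the most recent window_size values are kept in memory.
        self.window = deque(maxlen=window_size)

    def on_event(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)


if __name__ == "__main__":
    readings = [10, 12, 11, 50, 13]

    # Traditional: store everything first, then ask a question.
    print("batch:", batch_average(readings))

    # Streaming: react to each reading as it arrives, before storage.
    monitor = SlidingAverage(window_size=3)
    for reading in readings:
        print("stream:", monitor.on_event(reading))
```

In the streaming variant the spike in the fourth reading shows up in the output as soon as it arrives, whereas the batch query only sees it once all data has been stored and the query is run.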
Streams Analyzes All Kinds of Data
• Mining in Microseconds (included with Streams)
• Image & Video (Open Source)
• Simple & Advanced Text (included with Streams)
• Acoustic (IBM Research) (Open Source)
• Geospatial (IBM Research)
• Predictive (IBM Research)
• Advanced Mathematical Models (IBM Research)
• Statistics (included with Streams)
DB2 10.5 with In-Memory Acceleration
• The latest-generation DB2 release, which lets conventional database technology move seamlessly to in-memory analysis.
IBM Big Data Platform
Hadoop System
Stream Computing
Data Warehouse
In-Memory Database
Trend #4: In-Memory Databases
Solutions
Analytics and Decision Management
Big Data Infrastructure
IBM Platform for Big Data: DB2 10.5 BLU
Systems
Management
Application
Development
Visualization
& Discovery
Customer speedup over DB2 10.1
Large Financial Services Company: 46.8x
Global ISV Mart Workload: 37.4x
Analytics Reporting Vendor: 13.0x
Global Retailer: 6.1x
Large European Bank: 5.6x
10x-25x improvement is common
“It was amazing to see the faster query times compared to the performance results with our row-organized tables. The performance of four of our queries improved by over 100-fold! The best outcome was a query that finished 137x faster by using BLU Acceleration.”
- Kent Collins, Database Solutions Architect, BNSF Railway
DB2 10.5 with In-Memory Acceleration: Typical Results
Govern data quality and manage the information lifecycle
• InfoSphere Information Server – cleanses data, monitors quality and integrates big data with existing systems
• InfoSphere Optim – manages business information throughout its lifecycle
• InfoSphere Master Data Management – manages and maintains trusted views of master and reference data
• InfoSphere Guardium – real-time database security and monitoring
IBM Big Data Platform
Hadoop System
Stream Computing
Data Warehouse
In-Memory Database
Information Integration
& Governance
Solutions
Analytics and Decision Management
Big Data Infrastructure
IBM Platform for Big Data: Information Governance
Systems
Management
Application
Development
Visualization
& Discovery
Must Have: Integration and Security
Speed time to value with analytic and application accelerators
• Analytic Accelerators – text analytics, geospatial, time-series, data mining
• Application Accelerators – financial services, machine data, social data, Telco event data
• Industry Models – comprehensive data models based on deep expertise and industry best practice
IBM Big Data Platform
Accelerators
Hadoop System
Stream Computing
Data Warehouse
In-Memory Database
Information Integration
& Governance
Big Data Infrastructure
IBM Platform for Big Data: Accelerators
Solutions
Analytics and Decision Management
Capabilities
BLOGS
DISCUSSION FORUMS
NEWSGROUPS
Source Areas
• Dimensional analysis and filtering
• Tunable sentiment rules
• Detect and predict emerging topics and viral posting patterns
• Discover associated themes
SENTIMENT
EVOLVING TOPICS
• Ad-hoc keyword searches
• Automatic detection of changes in “consumer vocabulary”
• Relationship heat-maps to understand affinity
• Quantify strength of affinity
COMPREHENSIVE ANALYSIS
AFFINITY ANALYTICS
Business Drivers
Customer Care
Corporate Reputation
Campaign Effectiveness
Competitive Analysis
Product Insight
MULTILINGUAL
PREDICTIVE ANALYSIS
• Forward-looking detection of discussion topics
• Identify KPPs
• Predict impact of social interaction on business KPIs
• Predict ability to influence social interaction
Example Big Data Analytics Application: Social Media Analytics
IBM Big Data Platform
Accelerators
Hadoop System
Stream Computing
Data Warehouse
In-Memory Database
Information Integration
& Governance
Solutions
Analytics and Decision Management
Big Data Infrastructure
IBM Platform for Big Data: Accelerators
Systems
Management
Application
Development
Visualization
& Discovery
Discover, understand, search, and navigate federated sources of big data
• InfoSphere Data Explorer – Discovery and navigation software that provides real-time access and fusion of big data with rich and varied data from enterprise applications for greater insight
Trend #5: Search and discover
Leverage the full power of IBM’s Big Data Platform
CM, RM, DM RDBMS Feeds Web2.0 Email Web CRM, ERP File Systems
Connector Framework
IBM Data Explorer & App Builder
BigInsights
Integration & Governance
UI / User
Streams Warehouse
Data Explorer
Data access & integration
• Index structured & unstructured data in place
• Support existing security
• Federate to external sources
• Leverage MDM, governance, and taxonomies
Discovery & navigation
• Clustering & categorization
• Contextual intelligence
• Easy-to-deploy applications
• All at the scale required for today’s big data challenges
Tabbed Search (1) for source-based search.
Alerts (2) to flag changes in the content.
Expertise Location (3) to quickly find the right experts.
Enrich search results with Ratings (4), Taggings (5), or free text.
Save (6) and bookmark search results.
Find things quickly and easily with Text Clustering (7).
Structured Navigation (8), filtering, distribution of information, and collaboration.
Graphical Navigation (9) across date ranges or frequencies.
Query Expansion (10): integration of thesauri or search suggestions.
Out-of-the-Box Functionality
Data Explorer + Analytics = Complete Picture
Enterprise Unstructured Sources
Unstructured Data, Content Mgt Systems
Enterprise Systems & Content Stores
Databases, Data Warehouses, SCM, SOA, ESB, Web Service
Each system has its own but different structure
Does not have any structure
Web, RSS Feed, Social Media
World’s Total Data
20% 80%
Unstructured Structured
Data Explorer handles the qualitative on unstructured info.
Analytics handles the quantitative on structured info.
Data Explorer surfaces insights from the unstructured in context with the analytics.
Significant data cleansing occurs on data collected before being run through systems like Cognos.
Landing, Exploration & Archive
Security, Governance and Business Continuity
Information Movement, Matching & Transformation
Real-Time Analytics
IBM End-to-End Big Data & Analytics Portfolio
Analytic Appliances
Enterprise Warehouse
Data Marts
Information
& Insight
Data
Sources
Structured
Operational
Unstructured
External
Social
Sensor
Geospatial
Time Series
Streaming
BI & Performance
Management
Predictive Analytics
& Modeling
Exploration &
Discovery
+ Ensures ability to address broader requirements that may be needed now or in the future
+ Apply data security to Big Data (Guardium)
+ Enable a 360° view of all customer related Big Data (MDM)
+ Provide full information integration capabilities for Big Data (Information Server)
+ Integration enables use of existing tools and skills to start leveraging Big Data more quickly
PureData for
Analytics
InfoSphere Data Click, Information Server, MDM, G2
Guardium, Optim
InfoSphere
BigInsights
DB2 BLU,
PureData for
Analytics
PureData for
Operational
Analytics
InfoSphere Streams
SPSS
Cognos
InfoSphere
Data Explorer
Big Data Use Cases
• Big Data Exploration
• Enhanced 360° View of the Customer
• Security/Intelligence Extension
• Operations Analysis
• Data Warehouse Augmentation
Ralph Behrens IBM Deutschland GmbH
Client Technical Professional
IBM Big Data
Wilhelm-Fay-Straße 30-34
65936 Frankfurt
Phone +49 (0) 7034 / 6430680
Mobile +49 (0)172 / 6511333
Client Reference Base
Telecom
Other
Digital Media
Financial Services
Health & Life Sciences
Retail / Consumer Products