© 2013 IBM Corporation
Big Data and the Cloud
Dirk deRoos
dderoos@ca.ibm.com
@Dirk_deRoos
IBM World-Wide Technical Sales Leader, Big Data
The Economics of Growth Have Changed
• Old factors of production: Land, Labor, Capital
• New factors of production: Cloud, Analytics, Data
Need to Agree on Definitions: Cloud
• On-demand – Users can sign up for the service and use it immediately
• Self-service – Users can use the service at any time
• Scalable – Users can scale up the service at any time, without waiting for the provider to add more capacity
• Measurable – Users can access measurable data to determine the status of the service
Coined by Dave Nielsen, CloudCamp founder (Source: Dave Nielsen, CloudCamp)
Cloud Computing Service Models
• Software as a Service (SaaS) – Computing capacity, middleware, and applications
• Platform as a Service (PaaS) – Raw computing capacity and middleware
• Infrastructure as a Service (IaaS) – Raw computing capacity
Source: NIST Definition of Cloud Computing v15
"Consumerization" of IT
• IT departments are not seen as a source of innovation
• Home and web-based experiences are driving IT expectations in the enterprise
  – Self-service provisioning
  – Time-to-value measured in minutes
• Enterprise lines of business consume services, bypassing the IT department
  – IT departments respond by adopting newer technologies and evolving traditional capabilities
Deployment Models
• Private – IT capabilities are provided "as a service," over an intranet, within the enterprise and behind the firewall
• Public – IT activities/functions are provided "as a service," over the Internet
• Hybrid – Internal and external service delivery methods are integrated

The delivery spectrum runs from on-premise to third-party operated:
On-Premise (enterprise data center) → Private Cloud → Managed Private Cloud → Hosted Private Cloud → Member Cloud Services → Public Cloud Services
Many clients are already on the way to cloud with consolidation and virtualization efforts:
Traditional IT → CONSOLIDATE (physical infrastructure) → VIRTUALIZE (increase utilization) → STANDARDIZE (operational efficiency) → AUTOMATE (flexible delivery & self-service) → SHARED RESOURCES (common workload profiles) → CLOUD (dynamic provisioning for workloads)
Movement from Traditional Environments to Cloud
Leon Katsnelson (leon@ca.ibm.com)
[Quadrant chart: workloads plotted by gain from external clouds (lower → higher) and pain of cloud delivery (lower → higher). "Idealized workloads" combine higher gain with lower pain. Workloads shown: collaboration, transactional content, SMB ERP, large enterprise ERP, on-line storage, application development, DB migration projects, situational apps, web-scale analytics, departmental BI, application test, Web 2.0 data, archive, and discovery. Underlying architectures range from "DB-centric" and "content-centric" through "loosely coupled" and storage/data-integration architectures over [Enterprise Data].]
Some workloads are better suited than others for cloud.
Dev/Test Environments: Challenges/Observations
• 30% to 50% of all servers within a typical IT environment are dedicated to test
• Most test servers run at less than 10% utilization, if at all!
• IT staff report that a top challenge is finding available resources to perform tests in order to move new applications into production
• 30% of all defects are caused by badly configured test environments
• The testing backlog is often very long and is the single largest factor in delaying new application deployments
• Test environments are seen as expensive and as providing little real business value
* "Industry Developments and Models – Global Testing Services: Coming of Age," IDC, 2008, and IBM internal reports
Development/Test Environments - Perfect for Cloud
• Quick ROI
  – 30% to 50% of all servers within a typical IT environment are dedicated to test
  – Most test servers run at less than 10% utilization, if they are running at all!
• Low risk
  – Low risk in terms of business and overall IT operations
  – Security/compliance concerns are easily mitigated
• Excellent return on automation
  – Agility
  – Consistent dev/test environments mean fewer errors
  – Self-service
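The ROI figures above lend themselves to a quick back-of-envelope sketch. The fractions come from this slide (30% to 50% of servers dedicated to test, under 10% utilization); the fleet size and per-server cost below are purely hypothetical:

```python
def idle_test_cost(total_servers, test_fraction, utilization, cost_per_server):
    """Rough annual spend tied up in under-used test servers."""
    test_servers = total_servers * test_fraction       # servers dedicated to test
    idle_capacity = test_servers * (1 - utilization)   # the share doing nothing
    return idle_capacity * cost_per_server

# 1,000 servers, 40% dedicated to test, 10% utilized,
# $3,000/server/year fully loaded cost (all hypothetical inputs)
print(idle_test_cost(1000, 0.40, 0.10, 3000))  # 1080000.0
```

Even with conservative inputs, pay-per-use dev/test capacity recovers most of this idle spend, which is why the slide calls the ROI "quick."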
Need to Agree on Definitions: Big Data
Information management challenges that can't be dealt with using traditional tools and approaches:
• Volume – cost-efficiently processing the growing volume (a projected 50x growth, to 35 ZB, from 2010 to 2020)
• Velocity – responding to the increasing velocity (30 billion RFID sensors and counting)
• Variety – collectively analyzing the broadening variety (80% of the world's data is unstructured)
Further "V"s are sometimes added: viscosity, valence, value, variability, viability, veracity
The Big Data Conundrum
• Data AVAILABLE to an organization vs. data an organization can PROCESS
• The percentage of available data an enterprise can analyze is decreasing relative to the amount of data available to that enterprise
• Quite simply, this means that as enterprises, we are getting "more naive" over time
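A toy calculation makes the conundrum concrete. The growth rates below are invented for illustration; the only point is that when available data compounds faster than processing capacity, the analyzable percentage must fall:

```python
available, processed = 100.0, 50.0   # petabytes, hypothetical starting point
for year in range(5):
    pct = 100 * processed / available
    print(f"year {year}: can analyze {pct:.1f}% of available data")
    available *= 1.5    # data volume grows 50%/year (assumed)
    processed *= 1.2    # processing capacity grows 20%/year (assumed)
```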
Traditional Enterprise Data and Analytics
Data sources (structured, operational) → information movement & transformation → expanded EDW (staging area, marts, archive) → BI & performance management, predictive analytics & modeling → actionable insights

Put the staging area in the EDW:
+ In-database transformations (ELT is faster than ETL)
+ Provides some structure, enabling queries
– Adds significant cost and overhead to the EDW

Traditional data mining and exploratory analysis
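The ELT point above (load first, then transform inside the database) can be sketched with SQLite standing in for the EDW; the table names and cleanup rules are invented for illustration:

```python
import sqlite3

# Extract + Load: raw records land as-is in a staging table inside the "EDW"
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE staging (customer TEXT, amount TEXT)")
con.executemany("INSERT INTO staging VALUES (?, ?)",
                [("acme ", "100.5"), ("ACME", "49.5"), ("beta", "20")])

# Transform: done in-database with SQL (ELT), rather than in an
# external ETL tool before loading
con.execute("""
    CREATE TABLE sales AS
    SELECT UPPER(TRIM(customer)) AS customer,
           CAST(amount AS REAL)  AS amount
    FROM staging
""")

rows = list(con.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"))
print(rows)  # [('ACME', 150.0), ('BETA', 20.0)]
```

Because the raw staging rows are already queryable, this buys some structure early, at the cost of carrying staging data inside the warehouse (the overhead the slide notes).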
Big Data Analytics: Iterative & Exploratory
• Data is the structure
• The IT team delivers data on a flexible platform; business users explore and ask any question
• Analyze ALL available information – whole-population analytics connects the dots

Traditional Analytics: Structured & Repeatable
• Structure is built to store data
• Business users determine the questions; the IT team builds a system to answer those known questions
• Capacity-constrained down-sampling of available information – carefully cleanse a small subset of information before any analysis
Warehouse Modernization Has Two Themes
• Traditional analytics (structured & repeatable): structure is built to store data; only a cleansed sample of the available information is analyzed
• Big data analytics (iterative & exploratory): data is the structure; analyze information as is, cleanse as needed, alongside existing repeatable processes
Warehouse Modernization Has Two Themes
• Hypothesis-driven: start with a hypothesis and test it against selected data (question → hypothesis → data → answer → analyzed information)
• Data-driven: the data leads the way; explore all data and identify correlations (all information → exploration → correlation → actionable insight)
Analyze after landing… analyze in motion…
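The "data leads the way" style above can be sketched as a scan for correlations across all column pairs, instead of a test of one pre-chosen hypothesis; the column names and values below are made up:

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical columns from a landed data set
data = {
    "ad_spend": [1.0, 2.0, 3.0, 4.0],
    "visits":   [10, 21, 29, 41],
    "temp":     [15, 9, 14, 10],
}

# Explore every pair and flag strong correlations for follow-up
for a, b in combinations(data, 2):
    r = pearson(data[a], data[b])
    if abs(r) > 0.9:
        print(f"{a} ~ {b}: r = {r:.2f}")
```

At warehouse scale the same pattern runs over all information, which is why flexible, cheap exploration platforms matter more here than per-query performance.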
Next Generation Information Management Architecture
Data sources: structured, operational, unstructured, external, social, sensor, geospatial, time series, streaming
Big Data Platform (analytic appliances):
• Landing, exploration & archive
• Enterprise warehouse and data marts
• Real-time analytics
• Information movement, matching & transformation
• Security, governance and business continuity (spanning the platform)
Consumers: BI & performance management; predictive analytics & modeling; exploration & discovery → actionable insights
Hadoop and the Cloud: Considerations
Hadoop was designed for bare metal – Hadoop runs best with locally attached storage and
dedicated networking
– Rack awareness breaks in many cloud deployments
– Hadoop will still run in virtualized environments, but data processing
will not perform as well as on bare metal • Large amount of network traffic
Hadoop has sweet spots – Large scale batch analysis
– Data flexibility
Data governance requirements – Privacy
– Security
– Regulatory requirements
– Metadata management
– Data access interfaces
– …
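Hadoop's large-scale batch sweet spot is the map/shuffle/reduce pattern; a single-process toy version in plain Python (not Hadoop itself, and with made-up input documents) looks like this:

```python
from itertools import groupby

docs = ["big data and the cloud",
        "the cloud and big data analytics"]

# Map: emit a (word, 1) pair for every word in every document
pairs = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: bring equal keys together (Hadoop does this across the cluster)
pairs.sort(key=lambda kv: kv[0])

# Reduce: sum the counts for each distinct word
counts = {word: sum(n for _, n in group)
          for word, group in groupby(pairs, key=lambda kv: kv[0])}
print(counts)
```

The appeal for batch workloads is that map and reduce shard naturally across many machines; the cloud caveats above concern where those shards' data physically lives.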
Conclusions
• Cloud infrastructure has many benefits for Big Data analytics
  – Inexpensive storage
  – Inexpensive processing (short term)
  – Flexible (scale in/out) architecture
• Ideal workloads: ad-hoc analysis
  – Performance is of secondary concern
  – Ability to flexibly pull in many different data sets
• Longer-term applications are more costly on public clouds
  – Private clouds are an interesting option for internal Hadoop deployments
  – Ideal for short-term ad-hoc projects (flexible, inexpensive)
• Consider governance issues!!!
  – Private clouds may be necessary
  – Governance tools are available for Hadoop and the cloud (hint, hint… IBM)
THINK