big data and the cloud - dama ny · dev/test environments: challenges/observations 30% to 50% of...

© 2013 IBM Corporation Big Data and the Cloud Dirk deRoos [email protected] @Dirk_deRoos IBM World-Wide Technical Sales Leader, Big Data


Page 1: Big Data and the Cloud - DAMA NY · Dev/Test Environments: Challenges/Observations 30% to 50% of all servers within a typical IT environment are dedicated to test Most test servers

© 2013 IBM Corporation

Big Data and the Cloud

Dirk deRoos

[email protected]

@Dirk_deRoos

IBM World-Wide Technical Sales Leader, Big Data


Page 3

The Economics of Growth Have Changed

• Land

• Labor

• Capital

• Cloud

• Analytics

• Data

Page 4

Need to Agree on Definitions: Cloud

• On-demand – Users can sign up for the service and use it immediately

• Self-service – Users can use the service at any time

• Scalable – Users can scale up the service at any time, without waiting for the provider to add more capacity

• Measurable – Users can access measurable data to determine the status of the service

Source: Dave Nielsen, CloudCamp founder, who coined this definition

Page 5

Cloud Computing Service Models

Software as a Service (SaaS)

– Computing capacity

– Middleware

– Applications

Platform as a Service (PaaS)

– Middleware

– Raw computing capacity

Infrastructure as a Service (IaaS)

– Raw computing capacity

Source: NIST Definition of Cloud Computing v15
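
The layering listed above can be sketched as a small lookup table. This is only an illustration of the slide's three bullets, not NIST's wording; the names `SERVICE_MODELS` and `provider_supplies` are hypothetical and chosen for this example.

```python
# Layers each service model delivers, as listed on the slide.
# The dictionary and function names are illustrative, not from any standard.
SERVICE_MODELS = {
    "IaaS": ["computing capacity"],
    "PaaS": ["computing capacity", "middleware"],
    "SaaS": ["computing capacity", "middleware", "applications"],
}


def provider_supplies(model: str, layer: str) -> bool:
    """Return True if the given service model includes the given layer."""
    return layer in SERVICE_MODELS[model]


if __name__ == "__main__":
    for model, layers in SERVICE_MODELS.items():
        print(f"{model}: {', '.join(layers)}")
```

Read bottom-up, each model adds one layer on top of the previous one, which is why the slide repeats "computing capacity" under all three.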


Page 6

“Consumerization” of IT

• IT departments not seen as a source of innovation

• Home and web-based experiences driving IT expectations in the enterprise

– Self-service provisioning

– Time-to-value measured in minutes

• Enterprise LOB consuming services, bypassing the IT department

– IT departments respond by adopting newer technologies, evolving traditional capabilities

Page 7

Deployment Models

• Private – IT capabilities are provided “as a service,” over an intranet, within the enterprise and behind the firewall

• Hybrid – Internal and external service delivery methods are integrated

• Public – IT activities / functions are provided “as a service,” over the Internet

Figure labels: On-Premise (Enterprise data center); Private Cloud; Managed Private Cloud; Hosted Private Cloud; Member Cloud Services; Public Cloud Services; Third-party operated

Page 8

Many clients are already on the way to cloud with consolidation and virtualization efforts

Movement from Traditional Environments to Cloud

• Traditional IT

• CONSOLIDATE: Physical Infrastructure

• VIRTUALIZE: Increase Utilization

• STANDARDIZE: Operational Efficiency

• AUTOMATE: Flexible delivery & Self Service

• SHARED RESOURCES: Common workload profiles

• CLOUD: Dynamic provisioning for workloads

Leon Katsnelson ([email protected])

Page 9

Some Workloads Better than Others for Cloud

Figure: idealized workloads plotted by gain from external clouds (lower to higher) against pain to cloud delivery (lower to higher).

Workload labels: Collaboration; Transactional Content; SMB ERP; Large Enterprise ERP; On-Line Storage; Application Development; DB Migration Projects; Situational Apps; Web Scale Analytics; [Enterprise Data]; Web2.0; Data Archive; Dep’t. BI; Application Test; Discovery

Architecture groupings: “DB-Centric” Architecture; “Content-Centric” Architecture; “Loosely Coupled” Architecture; Storage and Data Integration Arch.

Page 10

Dev/Test Environments: Challenges/Observations

• 30% to 50% of all servers within a typical IT environment are dedicated to test

• Most test servers run at less than 10% utilization, if at all!

• IT staff report that a top challenge is finding available resources to perform tests in order to move new applications into production

• 30% of all defects are caused by badly configured test environments

• The testing backlog is often very long and is the single largest factor in the delay of new application deployments

• Test environments are seen as expensive and as providing little real business value

* “Industry Developments and Models – Global Testing Services: Coming of Age,” IDC, 2008 and IBM Internal Reports

Page 11

Development/Test Environment - Perfect for Cloud

Quick ROI

– 30% to 50% of all servers within a typical IT environment are dedicated to test

– Most test servers run at less than 10% utilization, if they are running at all!

Low risk

– Low risk in terms of business and overall IT operations

– Security/compliance concerns easily mitigated

Excellent return on automation

– Agility

– Consistent dev/test environments mean fewer errors

– Self-service
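
The "Quick ROI" figures above lend themselves to rough arithmetic. The sketch below combines the slide's two percentages into an estimate of idle test capacity; the fleet size of 1,000 servers and the function name `idle_test_capacity` are assumptions made up for illustration, not numbers from the talk.

```python
def idle_test_capacity(total_servers: int, test_share: float, utilization: float) -> float:
    """Server-equivalents of test capacity that sit idle.

    test_share:  fraction of the fleet dedicated to test (slide: 0.30 to 0.50)
    utilization: average utilization of those test servers (slide: under 0.10)
    """
    test_servers = total_servers * test_share
    return test_servers * (1.0 - utilization)


FLEET = 1000  # hypothetical fleet size, not from the slide

low = idle_test_capacity(FLEET, 0.30, 0.10)
high = idle_test_capacity(FLEET, 0.50, 0.10)

if __name__ == "__main__":
    print(f"Idle test capacity: {low:.0f} to {high:.0f} server-equivalents")
```

Because the slide says utilization is below 10%, these results are lower bounds on the idle capacity for the assumed fleet.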

Page 12

Need to Agree on Definitions: Big Data

Information management challenges that can’t be dealt with using traditional tools and approaches

• Collectively analyzing the broadening Variety

• Responding to the increasing Velocity

• Cost efficiently processing the growing Volume

Figure callouts: 50x from 2010 to 2020 (35 ZB); 30 billion RFID sensors and counting; 80% of the world’s data is unstructured

Other Vs: Viscosity, Valence, Value, Variability, Viability, Veracity

Page 13

The Big Data Conundrum

Figure labels: Data AVAILABLE to an organization; Data an organization can PROCESS

The percentage of available data an enterprise can analyze is decreasing in proportion to the data available to that enterprise

Quite simply, this means as enterprises, we are getting “more naive” over time
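
The conundrum can be made concrete with a toy projection. The sketch below assumes both the available data and the processing capacity grow at fixed compound rates, with data growing faster; every number in it is hypothetical, as are the function names.

```python
def analyzable_fraction(available: float, processable: float) -> float:
    """Share of available data an organization can actually analyze."""
    return min(processable, available) / available


def project(years: int, available: float, processable: float,
            data_growth: float, capacity_growth: float) -> list:
    """Analyzable fraction per year under compound growth of both quantities."""
    fractions = []
    for _ in range(years):
        fractions.append(analyzable_fraction(available, processable))
        available *= 1.0 + data_growth
        processable *= 1.0 + capacity_growth
    return fractions


if __name__ == "__main__":
    # Hypothetical inputs: 100 TB available, 40 TB processable,
    # data growing 50% per year, processing capacity growing 20% per year.
    for year, frac in enumerate(project(5, 100.0, 40.0, 0.50, 0.20)):
        print(f"year {year}: {frac:.1%} of available data analyzed")
```

Whenever the data growth rate exceeds the capacity growth rate, the analyzed share falls every year, which is the "more naive over time" effect the slide describes.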

Page 14

Traditional Enterprise Data and Analytics

Figure labels: Data Sources (Structured, Operational); Information Movement & Transformation; Expanded EDW; Staging Area; Marts; Archive; BI & Performance Management; Predictive Analytics & Modeling; Actionable Insights

Put Staging Area in the EDW

+ In-database transformations (ELT faster than ETL)

+ Provides some structure, enabling queries

- Adds significant cost and overhead to EDW
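
To illustrate what "in-database transformation" means in the ELT pattern, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a warehouse. The table names, columns, and sample rows are invented for the example; a real EDW would use its own SQL dialect and load utilities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Extract + Load: land the raw rows, untransformed, in a staging table.
cur.execute("CREATE TABLE staging_orders (order_id TEXT, amount TEXT, country TEXT)")
cur.executemany(
    "INSERT INTO staging_orders VALUES (?, ?, ?)",
    [("1", " 19.90", "us"), ("2", "5.00 ", "US"), ("3", "12.50", "ca")],
)

# Transform: performed inside the database with one set-based SQL statement,
# rather than row by row in an external ETL process.
cur.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)")
cur.execute(
    """
    INSERT INTO orders
    SELECT CAST(order_id AS INTEGER), CAST(TRIM(amount) AS REAL), UPPER(country)
    FROM staging_orders
    """
)
conn.commit()

# The staged data is now queryable in cleaned form.
totals = cur.execute(
    "SELECT country, ROUND(SUM(amount), 2) FROM orders GROUP BY country ORDER BY country"
).fetchall()
print(totals)
conn.close()
```

The trade-off on the slide shows up here too: the staging table and the transformation both consume storage and compute inside the warehouse itself.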

Page 15

Traditional Data Mining and Exploratory Analysis


Page 16

Warehouse Modernization Has Two Themes

Traditional Analytics: Structured & Repeatable (Structure built to store data)

– Business Users Determine Questions

– IT Team Builds System To Answer Known Questions

– Capacity constrained down sampling of available information

– Carefully cleanse a small information before any analysis

Big Data Analytics: Iterative & Exploratory (Data is the structure)

– IT Team Delivers Data On Flexible Platform

– Business Users Explore and Ask Any Question

– Analyze ALL Available Information

– Whole population analytics connects the dots

– Analyze information as is & cleanse as needed & existing repeatable

Figure labels: Available Information; Analyzed Information

Page 17

Warehouse Modernization Has Two Themes

Traditional Analytics: Structured & Repeatable (Structure built to store data)

– Start with hypothesis

– Test against selected data

– Figure labels: Hypothesis; Question; Data; Answer; Analyzed Information

Big Data Analytics: Iterative & Exploratory (Data is the structure)

– Data leads the way

– Explore all data, identify correlations

– Figure labels: All Information; Data; Exploration; Correlation; Actionable Insight

Analyze after landing… Analyze in motion…
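
As a toy illustration of "data leads the way," the sketch below scores every pair of columns in a small data set and ranks the pairs by correlation strength, instead of testing one pre-chosen hypothesis. The column names and values are made up, and Pearson correlation is only one possible measure; the slide does not prescribe a method.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_correlations(columns):
    """Score every pair of columns and return the pairs strongest-first."""
    scored = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, key=lambda t: abs(t[2]), reverse=True)


# Made-up sample data for illustration only.
DATA = {
    "store_visits": [10, 20, 30, 40, 50],
    "sales": [12, 24, 33, 41, 55],
    "returns": [5, 3, 6, 2, 4],
}

if __name__ == "__main__":
    for a, b, r in rank_correlations(DATA):
        print(f"{a} vs {b}: r = {r:+.2f}")
```

A strong correlation found this way is a lead to investigate, not an answer; the hypothesis-driven approach on the left-hand side of the slide remains the way to confirm it.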

Page 18

Next Generation Information Management Architecture

Figure labels:

• Data Sources: Structured; Operational; Unstructured; External; Social; Sensor; Geospatial; Time Series; Streaming

• Big Data Platform: Information Movement, Matching & Transformation; Landing, Exploration & Archive; Enterprise Warehouse; Data Marts; Analytic Appliances; Real-Time Analytics

• Security, Governance and Business Continuity

• Actionable Insights: BI & Performance Management; Predictive Analytics & Modeling; Exploration & Discovery

Page 19

Hadoop and the Cloud: Considerations

Hadoop was designed for bare metal

– Hadoop runs best with locally attached storage and dedicated networking

– Rack awareness breaks in many cloud deployments

– Hadoop will still run in virtualized environments, but data processing will not perform as well as on bare metal

• Large amount of network traffic

Hadoop has sweet spots

– Large scale batch analysis

– Data flexibility

Data governance requirements

– Privacy

– Security

– Regulatory requirements

– Metadata management

– Data access interfaces

– …
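
On the rack-awareness point above: Hadoop can learn its network topology from an administrator-supplied script, named by the `net.topology.script.file.name` property, which receives host names or IP addresses as arguments and prints one rack path per argument. The sketch below is a hypothetical example of such a script; the subnets and rack names are invented. It also hints at why the feature degrades in the cloud: when the provider does not expose physical placement, there is nothing meaningful to map, and every node ends up in the default rack.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop topology script: maps node addresses to rack paths."""
import ipaddress
import sys

# Invented subnet-to-rack mapping for a bare-metal cluster.
RACKS = {
    ipaddress.ip_network("10.0.1.0/24"): "/dc1/rack1",
    ipaddress.ip_network("10.0.2.0/24"): "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve(host: str) -> str:
    """Return the rack path for an IP address, or the default rack if unknown."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return DEFAULT_RACK  # a host name rather than an IP: no mapping here
    for network, rack in RACKS.items():
        if addr in network:
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # One rack path per argument, separated by whitespace.
    print(" ".join(resolve(h) for h in sys.argv[1:]))
```

With every node reported in a single rack, Hadoop has no basis for spreading block replicas across failure domains or preferring nearby nodes, which is one way to read "rack awareness breaks."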


Page 20

Conclusions

Cloud infrastructure has many benefits for Big Data analytics

– Inexpensive storage

– Inexpensive processing (short term)

– Flexible (scale in/out) architecture

Ideal workloads: Ad-hoc analysis

– Performance is of secondary concern

– Ability to flexibly pull in many different data sets

Longer term applications are more costly on public clouds

– Private clouds are an interesting option for internal Hadoop deployments

– Ideal for short-term ad-hoc projects

• Flexible, inexpensive

Consider governance issues!!!

– Private clouds may be necessary

– Governance tools are available for Hadoop and the cloud

• Hint, hint… IBM
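
The short-term versus long-term cost claim in these conclusions can be sketched as a break-even calculation. The model below is deliberately simple (a flat hourly rental rate against a one-time purchase plus hourly running cost), and all prices and function names are hypothetical placeholders rather than real vendor figures.

```python
def public_cloud_cost(hours: float, nodes: int, rate_per_node_hour: float) -> float:
    """Pay-as-you-go cost: nothing up front, a flat rate per node-hour."""
    return hours * nodes * rate_per_node_hour


def owned_cluster_cost(hours: float, nodes: int,
                       purchase_per_node: float, running_per_node_hour: float) -> float:
    """Owned or private cluster: up-front purchase plus hourly running cost."""
    return nodes * purchase_per_node + hours * nodes * running_per_node_hour


def break_even_hours(rate_per_node_hour: float, purchase_per_node: float,
                     running_per_node_hour: float) -> float:
    """Hours of use after which owning becomes cheaper than renting."""
    if rate_per_node_hour <= running_per_node_hour:
        raise ValueError("renting never costs more than owning under these rates")
    return purchase_per_node / (rate_per_node_hour - running_per_node_hour)


# Hypothetical prices, for illustration only.
RATE, PURCHASE, RUNNING = 0.50, 4000.0, 0.10

hours = break_even_hours(RATE, PURCHASE, RUNNING)

if __name__ == "__main__":
    print(f"Break-even after about {hours:,.0f} hours per node")
```

Under these made-up prices a project that runs for a few weeks stays far below the break-even point, while a cluster kept busy for years crosses it, which matches the slide's split between short-term ad-hoc work and longer-term applications.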


Page 21

THINK