CSC Proprietary and Confidential

Upload: truongque

Post on 28-Jun-2018


Harnessing Big and Fast Data

January 2014

3 CSC Proprietary and Confidential August 25, 2014

Global Business Drivers Are Forcing Organizations to Operate and Compete in New Ways

Blurring industry lines with new forms of competition

Heightening demand for increased profitability

Higher customer expectations and new buying behavior

Growing regulatory and risk concerns


50 BILLION DEVICES INTERNET CONNECTED BY 2020

Vehicle, Asset, Person & Pet Monitoring & Controlling

Agriculture Automation

Security & Surveillance

Building Management


Embedded Mobile

Everyday Things

Smart Homes & Cities

Telemedicine & Healthcare

Looking into the Future


CIOs See Big Data as One of the Most Significant Developments

56% of CIOs listed analytics and big data as one of the most significant developments for IT.

Source: CSC’s 2013 CIO Barometer Survey


Technology Advances Are Enabling This Data to Be Found and Acted Upon

Timeline, 1970 to 2020: System ‘R’; RDBMS & SQL; Client-Server; Data Warehousing; Federated Databases; Internet SaaS; Analytic Appliances; Cloud; Webscale Big Data; Analytic Applications; Internet of Things

The New Normal: Everyone has the power to process terabytes and petabytes of data… at a lower cost.


Across Industries, Clients See the Possibilities of Analytics…

Financial Services: Fraud detection; Risk management; 360° view of the customer

Transportation: Real-time route optimization based on traffic and weather; Maintenance optimization and asset tracking

Retail: 360° view of the customer; Click-stream analysis; Real-time promotions

Law Enforcement: Real-time multimodal surveillance; Situational awareness; Cyber security detection

Telecommunications: CDR processing; Churn prediction; Geomapping / marketing; Network monitoring

Health & Life Sciences: Epidemic early warning system; ICU monitoring; Remote healthcare monitoring

Utilities: Weather impact analysis on power generation; Transmission monitoring; Smart grid management

Manufacturing: Predictive maintenance; Real-time parts flow monitoring; Product configuration planning


…but they consistently struggle with the same four challenges

1. Setting up and operating a big data and analytics platform

2. Applying the right data science

3. Integrating insights into their business processes

4. Identifying and managing big data skills


BI to Big Data…What’s Changed: Dimensionality

Figure: dimensionality increases from BUSINESS INTELLIGENCE to BIG DATA. Channels shown: search, display, direct, email, mobile, catalog, POS, tele, TV, web.


BI to Big Data…What’s Changed: Growing Data Types and Sources

Figure: BUSINESS INTELLIGENCE (internal, structured data) versus BIG DATA (new types and sources). Data types and sources shown: web logs, social media, transactional data, smart grids, operational data, digital content, R&D data, ad impressions, files.


Big Data includes BI, which includes Data Visualizations

Big Data is essentially BI on steroids.

In addition, new techniques for data analysis are emerging, such as unsupervised modeling.

BI has evolved from static reports with a single data visualization to interactive, dynamic data discovery tools that incorporate many visualizations.

Data Viz: Google Pie Chart Demo

Business Intelligence: SAS Insurance Demo

Big Data: Customer Sentiment Analysis


Data Warehousing to Big Data…What’s Changed

DATA WAREHOUSING: Schema on Write (schema required). Data is clean, trusted, and structured.

BIG DATA: Schema on Read (schema-less). Data is stored “as is”, unknown, and unstructured. More data, more types.
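The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SCHEMA` fields, the sample records, and both helper functions are hypothetical and do not come from the presentation or from any particular product.

```python
import json

# Hypothetical schema, used only for illustration.
SCHEMA = {"customer_id": int, "amount": float}


def write_with_schema(table, record):
    """Schema on write: validate a record before it is stored."""
    if set(record) != set(SCHEMA):
        raise ValueError("record fields do not match the schema")
    for field, expected_type in SCHEMA.items():
        if not isinstance(record[field], expected_type):
            raise ValueError(f"field {field!r} has the wrong type")
    table.append(record)


def read_with_schema(raw_lines, fields):
    """Schema on read: keep raw lines as is, apply structure at query time."""
    for line in raw_lines:
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable input stays in the store but is skipped here
        if isinstance(doc, dict) and all(f in doc for f in fields):
            yield {f: doc[f] for f in fields}


# Warehouse style: only clean, structured rows get in.
warehouse = []
write_with_schema(warehouse, {"customer_id": 1, "amount": 9.5})

# Big data style: everything is stored; each query decides what it needs.
raw_store = [
    '{"customer_id": 1, "amount": 9.5}',
    '{"customer_id": 2, "amount": 3.0, "channel": "web"}',
    "not json at all",
    '{"tweet": "love it"}',
]
rows = list(read_with_schema(raw_store, ["customer_id", "amount"]))
```

In this sketch the cost of cleaning moves from load time to query time: the raw store accepts every line, and a different query could later read the `channel` or `tweet` fields without reloading anything.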


Google Evolved Search by Taking More Data into Account

Google’s PageRank didn’t revolutionize the search engine business because of the algorithm, but because it took more data into account:

1. Hyperlinks, which Google recognized as an important measure of popularity (a link to a webpage counts as a vote for it)

2. Anchor text (the text of the hyperlinks), used in the web index with a weight close to that of the page title

More Data Usually Beats Better Algorithm. Retrieved from http://anand.typepad.com/datawocky/2008/03/more-data-usual.html

Figure: PageRank values for a simple example network: A 3.3%, B 38.4%, C 34.3%, D 3.9%, E 8.1%, F 3.9%, and five further pages at 1.6% each.

Presenter
Presentation Notes
  Mathematical PageRanks for a simple network, expressed as percentages. (Google uses a logarithmic scale.) Page C has a higher PageRank than Page E, even though there are fewer links to C; the one link to C comes from an important page and hence is of high value. If web surfers who start on a random page have an 85% likelihood of choosing a random link from the page they are currently visiting, and a 15% likelihood of jumping to a page chosen at random from the entire web, they will reach Page E 8.1% of the time. (The 15% likelihood of jumping to an arbitrary page corresponds to a damping factor of 85%.) Without damping, all web surfers would eventually end up on Pages A, B, or C, and all other pages would have PageRank zero. In the presence of damping, Page A effectively links to all pages in the web, even though it has no outgoing links of its own.
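The random-surfer description in the notes above can be sketched as a short power iteration. This is a minimal sketch, not Google's implementation: the function name, the iteration count, and the four-page network are assumptions made for illustration, and the network is not the one in the slide's figure. The damping factor of 0.85 and the treatment of pages without outgoing links follow the notes.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the random-jump share (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A surfer follows one of the page's links with equal probability.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links is treated as linking to every page.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page network: A has no outgoing links, B and C link to
# each other, and D links to both A and B.
toy_links = {"A": [], "B": ["C"], "C": ["B"], "D": ["B", "A"]}
ranks = pagerank(toy_links)
```

In this toy network B and C end up with most of the rank because they keep passing it to each other, while D, which nothing links to, is reached only through random jumps and through the dangling page A.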


Google Made AdWords More Competitive by Leveraging More Data

Overture had previously proved that the model of having advertisers bid for keywords could work. Overture ranked advertisers for a given keyword based purely on their bids.

Google added some additional data: the clickthrough rate (CTR) on each advertiser's ad. Thus, to a first approximation, Google ranks advertisers by the product of their bid and their CTR (this was true in the first version of AdWords; they now use more considerations).
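The difference between the two ranking rules can be shown with a small sketch. The `Ad` class, the advertiser names, and the bid and CTR figures are made up for illustration; only the two rules themselves (rank by bid, and rank by bid times CTR as a first approximation) come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    bid: float  # amount bid per click, in dollars
    ctr: float  # observed clickthrough rate, between 0 and 1


def rank_by_bid(ads):
    """Bid-only ranking, as described for Overture."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)


def rank_by_bid_times_ctr(ads):
    """First-approximation ranking described for early AdWords."""
    return sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)


# Hypothetical advertisers and numbers.
ads = [
    Ad("alpha", bid=2.00, ctr=0.01),
    Ad("beta", bid=1.00, ctr=0.05),
    Ad("gamma", bid=1.50, ctr=0.02),
]

bid_order = [ad.advertiser for ad in rank_by_bid(ads)]
ctr_order = [ad.advertiser for ad in rank_by_bid_times_ctr(ads)]
```

With these numbers the bid-only order is alpha, gamma, beta, while the bid-times-CTR order is beta, gamma, alpha: the lowest bidder moves to the top because its ad is clicked far more often.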

More Data Usually Beats Better Algorithm. Retrieved from http://anand.typepad.com/datawocky/2008/03/more-data-usual.html


Netflix Found More Data Drove Better Recommendations

More Data Usually Beats Better Algorithm. Retrieved from http://anand.typepad.com/datawocky/2008/03/more-data-usual.html

The Netflix challenge invites competitors to create a better recommendation algorithm for Netflix’s viewers than the one in use today. Stanford professor Anand Rajaraman found that a complex algorithm using Netflix data alone is regularly beaten by a simple algorithm that incorporates additional data sources such as IMDb. The additional data source increases the dimensionality of each movie, which helps pinpoint what users are attracted to.


The Way Forward Requires Embracing Heterogeneity…

DATABASES

DATA INTEGRATION

VISUALIZATION / BUSINESS INTELLIGENCE

ADVANCED DATA ANALYTICS

DATA STREAMING


…Recognizing the Cooperation Between Relational and Non-Relational Databases

Figure: database landscape arranged along two axes, RELATIONAL versus NON-RELATIONAL and ANALYTIC versus OPERATIONAL, with groupings for NoSQL (KEY VALUE, DOCUMENT, BIG TABLES, GRAPH), NewSQL, and ‘DATA AS A SERVICE’.

Products shown: InterSystems, Progress, Objectivity, Versant, Hadoop, Hortonworks, Cloudera, MapR, Hadapt, Impala, Shark/Spark, Teradata Aster, EMC Greenplum, IBM Netezza, HP Vertica, Sybase IQ, Infobright, ParAccel, Calpont, VectorWise, Storm, Oracle, IBM DB2, JustOneDB, SQL Server, MarkLogic, McObject, Lotus Notes, CouchDB, MongoDB, RavenDB, Riak, Redis, Membrain, Voldemort, BerkeleyDB, HyperTable, HBase, Accumulo, Titan, FlockDB, InfiniteGraph, Neo4j, AllegroGraph, Cassandra, Couchbase, Cloudant, App Engine, SimpleDB, Amazon RDS, SQL Azure, Database.com, SchoonerSQL, Tokutek, Continuent, Translattice, ScaleBase, CodeFutures, VoltDB, HandlerSocket, Akiban, MySQL Cluster, Clustrix, Drizzle, GenieDB, ScalArc, NimbusDB, Xeround, FathomDB, MySQL, Ingres, PostgreSQL, EnterpriseDB, Sybase ASE


…Exploiting Batch

Cloud::Hadoop Cloudera MapR Hortonworks Spark Altiscale Metascale Continuuity Motor Data HP Bamboo Zdata Zettaset


Cloud::Queries Hbase Impala Shark MongoDB (10gen/mongoHQ) Cassandra (Datastax) Aerospike ElasticSearch Riak Redis CouchDB Gridgain Netezza ParAccel Greenplum Vertica Hadapt

…Enabling Ad-hoc / Fast Query


Cloud::Streams Storm S4 AccelOps HStreaming Streambase SQLStream InfoStreams OpenCQ NiagaraCQ TelegraphCQ Rapide Gemfire DistCEP CEDR Cayuga Raced Sase+ Amit TESLA/T-Rex Progress Apama Tibco Business Events Esper Aleri/Coral8 Oracle CEP

…And Integrating Stream Processing

PROCESSORS

STREAM 1

STREAM 2


Big Data Can Be Done in the Cloud…

Analytic Application

Platforms

New Infrastructure

Platforms

Analytic Vertical SaaS Applications

Incumbents Emerging Players


…Or On-Prem

Layers: ETL; Batch; Ad-Hoc; Real-Time; BI; Analytics; Platform

Each line below lists one vendor stack from the slide, in the layer order above:

DataStage; Big Insights; PureData; Streams; Cognos; SPSS; System X

Informatica, Ab Initio; Hortonworks; AsterData; N/A; Tableau+; SAS SW; Big Data Appliance

Oracle Data Integration; Cloudera; Oracle NoSQL, Times Ten; Oracle EP; OBIEE; Oracle R; Exalytics, Exadata

Data Integrator; Hortonworks; Hana; Sybase EP; BOBJ; SAP Predictive; IBM/Dell

Informatica; Pivotal HD; Greenplum; Gemfire; Partners; Cetas, SAS, Alpine; Greenplum DCA, Isilon


INFRASTRUCTURE SERVICES (Bare Metal, AWS IaaS, CSC Cloud, OpenStack, vSphere)

REPEATABLE BD&A CORE PLATFORM

FUNCTIONAL INTELLIGENCE & ANALYTICS: Customer | Finance | Marketing | Operations | Supply Chain

INDUSTRY SOLUTIONS

OUTCOMES

Retained insurance customers

Refined diagnoses in clinical care

Efficient operations in manufacturing

Underwriting risk based on climate predictions

CSC offers Business, Data Science, and Platform Consulting Services around packaged offerings, best-of-breed technologies, and a repeatable platform that combines traditional and open source

Data Science Consulting

Big Data Business Consulting

Big Data Platform Consulting


…and brings together proprietary and open source technologies to solve the most unique use cases

A USE CASE IN MANUFACTURING

Store structured manufacturing data, such as data on quality constraints

But, what happens when parts go out of tolerance? What is causing these defaults?

Analyze measurement system data and manufacturing data together with unstructured machine data, such as temperatures, to identify correlations

How do I manage this large analytics platform?

Stand up a managed big data platform and start finding insights in less than 30 days


CSC’s Infochimps Platform Enables Insights from your Data in Less than 30 Days

Cloud::Streams

Cloud::Hadoop

Cloud::Queries

Wukong

Cloud API

Command Center

« PROCESS DATA

QUERY DATA »

COLLECT DATA

HTTP

Logs

Data Partners

Batch Upload

Custom Connectors

INFOCHIMPS CLOUD


EMBRACE HETEROGENEITY. DIFFERENT CLOUDS SUPPORT DIFFERENT WORKLOADS.

CSC’s Cloud Offerings Give Options Across Workloads, Infrastructures, and Deployment


CSC’s Modular Approach Ensures a Focus on Time-to-Value

SHAPE TRANSFORM MANAGE / AS A SERVICE

Transform Decision Making

Enable Business Strategy

Achieve Business Process Optimization


This Approach Is Supported By Proven, Pre-Packaged Accelerators

Maturity Model Benchmarking

Current State Assessment

Competitive Benchmarking

Future State Design

Business Case

Phase 1

Phase 2

Workshops

Implementation Plan

BASIC (At the Starting Gate): No real integration capability is available. Data exists in operational silos.

FOUNDATIONAL (Building Blocks to Success): Integration at the batch and/or real-time level exists at the sub-enterprise level, such as a business unit or functional area. No integration capability exists across the enterprise.

ADVANCED (Partial Integration): Limited, targeted integration capability exists across the enterprise, some of it at the real-time level. Strategies and plans exist to expand it to completion.

DISTINCTIVE (Full Realization): Full integration at both real-time and historical timeframes exists across the enterprise.

• CSC Big Data maturity model
• Project profiles and prioritization
• Sample KPIs
• Business case
• Big Data roadmap
• Sample implementation plans

• Conceptual solution architecture
• Template data strategy
• Logical technical model
• Reference architectures

• Pre-built data extract accelerators
• Pre-built models
• Pre-built visualizations/dashboards
• Sample governance models


Through Experience with Many Projects, We’ve Learned What It Takes to Succeed…

Operationalize the resulting insights

Start from the business need/opportunity

Imagine the potential from disparate data sources

Contribute CSC’s industry understanding to create advanced analytic models

Avoid costly mistakes with attention to architecture

Present the output in an easily assimilated and visualized form


…And Created a Short List of Do’s and Don’ts

DO: Start simple (K.I.S.S. principle). Initial big data projects should be done with a minimal amount of plumbing and a well-scoped problem. Think like a marine…land and expand quickly!

DON’T: Translate a data warehouse into a Big Data system table for table. NoSQL approaches to data mining are different for a reason. Don’t try to just translate something you already have.

DO: Tweak. Today’s Big Data systems take a lot of care and feeding to keep running in a stable manner. In addition, most Big Data models need to be retrained periodically.

DON’T: Confuse a domain expert with a data scientist. Data science is a team approach that takes development, mathematics, and domain expertise.


Our Clients Choose CSC Because of Our Unique Strengths in Big Data and Analytics

SOLUTIONS

Purpose-Fit Industrialized Solutions: Leveraging CSC’s deep industry and business process knowledge

Full Life-Cycle Approach: Our uniquely modular, Shape-Transform-Manage, full life-cycle approach keeps the focus on business outcomes

DATA

Expertise in Turning Data into Recommendations: Combining the right data sets and applying the right analytic techniques to unveil new insights

Leading Data Catalog: 15,000+ data sources

TECHNOLOGY

Technology Leadership: Working with Hadoop since its creation

Faster Time to Value: Standing up a big data platform in less than 30 days for rapid business value

Technology Independence: Delivering best-of-breed solutions, from open source to global partners, optimized for your need and budget


How Can You Get Started?

Ask your CSC team how a Shape service can help you determine how Big Data can make a difference for your organization

ASK FOR A DEMO AT www.csc.com/bigdata

32 August 25, 2014 © 2013 Computer Sciences Corporation. All Rights Reserved.


Thank You