Apache Spark: The Analytics Operating System by Anjul Bhambhri

Upload: spark-summit

Post on 16-Apr-2017

Page 1: Apache Spark: The Analytics Operating System by Anjul Bhambhri

Apache Spark: The Analytics Operating System

Anjul Bhambhri, IBM Vice President, Big Data

Page 2

IBM Invests in Reinventing Computing

Linux, 1999
• 13,000,000 lines of code
• 500+ Server Solutions
• Ushered in Computer Science

System 360, 1964
• 10,000,000 lines of code
• 54 Peripheral Solutions
• Ushered in Information Science

Apache Spark, 2015
• 400,000 lines of code
• 15+ Data & Analytics Solutions
• Ushered in Data Science

Page 3

The Analytics Operating System

1 platform

Apache Spark

Page 4

IBM | Spark

Why Spark?

unified model -> expressiveness, speed

• (almost) any application
• any data: on disk, or on the wire
• high productivity
• unparalleled performance

Page 5

At IBM, We Love Spark!

• Enhance it! Spark Technology Center @ SF
• Offer it! Shipping with BigInsights / Spark as a Service
• Leverage it! Inside our products

Page 6

IBM is Building on Apache Spark

• IBM Analytics
• IBM Commerce
• IBM Watson
• IBM Research
• IBM Cloud

Page 7

Spark for scalable financial reporting

Financial data lakes are growing
• Regulatory requirements => data retention
• 30+ years of historical data (petabytes)
• 100s of business analysts
• 1000s of disparate reports requested

Overnight and real-time transactions also large
• Complex ledger “posting” processes

Tight timelines (2-3 hours before banks open)

Scalable “scan-sharing” engine to the rescue:
• SQL-inspired “financial” DSL built on Spark
• Runs common portions of queries simultaneously
• Dramatically lowers cost of producing the “next” analyst request that comes along
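The scan-sharing idea on this slide can be illustrated with a minimal plain-Python sketch. It is not IBM's engine or its financial DSL, and it does not use Spark; the `Query` class, the `LEDGER` rows, and the three example requests are all hypothetical. It only shows why evaluating several requests during one pass over the data is cheaper than scanning once per request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Query:
    """Hypothetical analyst request: sum `amount` over rows matching a predicate."""
    name: str
    predicate: Callable[[dict], bool]


def run_separately(rows: List[dict], queries: List[Query]) -> Tuple[Dict[str, int], int]:
    """Baseline: every query triggers its own full scan of the ledger."""
    results = {}
    scans = 0
    for q in queries:
        scans += 1
        results[q.name] = sum(r["amount"] for r in rows if q.predicate(r))
    return results, scans


def run_shared(rows: List[dict], queries: List[Query]) -> Tuple[Dict[str, int], int]:
    """Scan sharing: one pass over the ledger feeds all queries at once."""
    totals = {q.name: 0 for q in queries}
    for row in rows:  # the single shared scan
        for q in queries:
            if q.predicate(row):
                totals[q.name] += row["amount"]
    return totals, 1


# Hypothetical ledger rows and analyst requests, for illustration only.
LEDGER = [
    {"account": "A", "year": 2014, "amount": 100},
    {"account": "B", "year": 2014, "amount": 250},
    {"account": "A", "year": 2015, "amount": 75},
    {"account": "C", "year": 2015, "amount": 40},
]

QUERIES = [
    Query("total_2014", lambda r: r["year"] == 2014),
    Query("account_A", lambda r: r["account"] == "A"),
    Query("large_postings", lambda r: r["amount"] >= 100),
]

if __name__ == "__main__":
    print(run_separately(LEDGER, QUERIES))
    print(run_shared(LEDGER, QUERIES))
```

Both functions return the same totals, but the shared version reads the ledger once regardless of how many requests are registered, so adding the "next" request costs one more predicate check per row rather than another full scan.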

Page 8

Spark maps Customer Experience “journey”

• Multiple channels of customer interaction.
• Very large data volumes that need fast processing.
• Correlating events across channels to interactions.
• Continuous classification of interactions and mapping of the customer's journey across channels.

• Sequence mining algorithm on Spark processes terabytes of interactions in minutes
• MLlib models detect frustration in customers by length and frequency of interaction across channels
• Spark SQL and Parquet support multiple concurrent queries
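As a rough illustration of the journey mapping and frustration signals described above, the following plain-Python sketch groups interaction events per customer, orders them into a journey, and derives length and frequency features. It does not use Spark or MLlib: the event layout, the `EVENTS` data, and the threshold rule in `looks_frustrated` are hypothetical stand-ins for the trained models the slide mentions.

```python
from collections import defaultdict

# Hypothetical events: (customer_id, timestamp, channel, duration_seconds)
EVENTS = [
    ("c1", 1, "web", 120),
    ("c1", 2, "chat", 600),
    ("c1", 3, "phone", 900),
    ("c1", 4, "phone", 1200),
    ("c2", 1, "web", 60),
    ("c2", 5, "web", 90),
]


def build_journeys(events):
    """Group events by customer and order each group by timestamp."""
    journeys = defaultdict(list)
    for customer, _ts, channel, duration in sorted(events, key=lambda e: (e[0], e[1])):
        journeys[customer].append((channel, duration))
    return dict(journeys)


def features(journey):
    """Frequency and length features for one customer's journey."""
    return {
        "interactions": len(journey),
        "total_seconds": sum(duration for _, duration in journey),
        "channels": len({channel for channel, _ in journey}),
    }


def looks_frustrated(f, min_interactions=3, min_seconds=1800):
    """Assumed threshold rule standing in for a trained classifier."""
    return f["interactions"] >= min_interactions and f["total_seconds"] >= min_seconds


if __name__ == "__main__":
    for customer, journey in build_journeys(EVENTS).items():
        f = features(journey)
        print(customer, [channel for channel, _ in journey], f, looks_frustrated(f))
```

In a real deployment the feature extraction would run as a distributed job and the rule would be replaced by a model fitted to labeled interactions; the thresholds here are arbitrary and chosen only to make the example concrete.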

[Diagram: Voice & Text Data; Interaction & Journey Data; PUB / SUB (MQTT / WebSockets / Flume / Kafka); Journey Dashboards]

Page 9

IBM has made a significant investment in Spark

IBM Spark Technology Center, San Francisco

• Growing pool of contributors
• 300+ inventors
• Contributed SystemML
• Founding member of AMPLab
• Partnerships in the ecosystem

Visit www.spark.tc for more information

Page 10

Power of data. Simplicity of design. Speed of innovation.

IBM Apache Spark

For Apache Spark news and innovation from IBM’s Spark Technology Center —