IBM Data Science Experience - Mladen Jovanovski

© 2016 IBM Corporation IBM Data Science Experience Overview Mladen Jovanovski Client Technical Specialist Big Data & Databases IBM Analytics, SEE [email protected]

Upload: institute-of-contemporary-sciences

Post on 15-Apr-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: IBM Data Science Experience - Mladen Jovanovski

© 2016 IBM Corporation

IBM Data Science Experience Overview

Mladen Jovanovski
Client Technical Specialist, Big Data & Databases
IBM Analytics, SEE [email protected]

Page 2: IBM Data Science Experience - Mladen Jovanovski

Evolving the IBM DataWorks name

Self-service data preparation for data professionals

Composable data and analytics services with collaborative experiences for all data professionals

IBM DataWorks started as a new, simpler approach for providing a broader set of data professionals with self-service data preparation and integration.

We are embracing these initial values and extending them to provide these data professionals with our family of next-generation data and analytics technology.

Page 3: IBM Data Science Experience - Mladen Jovanovski

IBM DataWorks Provides Choice of Collaborative User Experiences, Solution Blueprints, and Individual Services

Find, Share, Collaborate

• Access & Ingest: IoT, Streaming, ETL
• Store: Hadoop, NoSQL/SQL, Object Store
• Analyze & Build: Descriptive, Predictive, Prescriptive, Dev environment
• Deploy: Apps/APIs, Reports, Models

Solution Blueprints: Self-Service Analytics, Internet of Things, Data Lake, Mobile Applications

User Experiences

Individual Services

Powered by

Governance

Data Access, Data Recognition, Advanced Analytics

Page 4: IBM Data Science Experience - Mladen Jovanovski

Start Today with Experiences and Individual Services

Data Professionals: Data Engineer, Business Analyst, App Developer, Data Scientist

User Experiences
• Data Connect
• Watson Analytics
• Bluemix
• Data Science Experience

Individual Services
• IBM dashDB™
• IBM Cloudant®
• IBM BigInsights® for Apache Hadoop
• IBM Graph
• Data Connect
• Streaming Analytics
• IBM Compose
• IBM Analytics for Apache Spark

Data Access, Data Recognition, Advanced Analytics

Page 5: IBM Data Science Experience - Mladen Jovanovski

Tailored Experiences For Users Collaborating Together

Data professionals and their tailored experiences:

• Data Engineer (Data Connect): architects how data is organized and ensures operability
• Data Scientist (Data Science Experience): gets deep into the data to draw hidden insights for the business
• Business Analyst (Watson Analytics): works with data to apply insights to the business strategy
• App Developer (Bluemix): plugs into data and models and writes code to build apps

Workflow stages (the diagram groups them under INPUT, ANALYSIS, and OUTPUT):

• Understand problem and domain
• Ingest data
• Explore and understand data
• Transform: clean
• Transform: shape
• Create and build model
• Evaluate
• Deliver and deploy model
• Communicate results
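The workflow stages on this slide (ingest, clean, shape, build a model, evaluate) can be illustrated with a toy Python sketch. It is not taken from the deck: the `ad_spend`/`sales` data, the function names, and the choice of a least-squares line as the "model" are all assumptions made purely for illustration, and only the standard library is used.

```python
import csv
import io

# Hypothetical raw input; the third data row has a missing value on purpose.
RAW_CSV = """ad_spend,sales
10,25
20,45
,30
30,65
40,85
"""

def ingest(text):
    """Ingest data: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Transform (clean): drop rows that contain missing values."""
    return [r for r in rows if all(v not in (None, "") for v in r.values())]

def shape(rows):
    """Transform (shape): convert rows to numeric (x, y) pairs."""
    return [(float(r["ad_spend"]), float(r["sales"])) for r in rows]

def build_model(pairs):
    """Create and build model: least-squares line y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def evaluate(model, pairs):
    """Evaluate: mean absolute error of the model on the given pairs."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in pairs) / len(pairs)

pairs = shape(clean(ingest(RAW_CSV)))
model = build_model(pairs)
error = evaluate(model, pairs)
print(f"rows={len(pairs)} slope={model[0]:.2f} intercept={model[1]:.2f} mae={error:.2f}")
```

Each function corresponds to one stage, so a stage can be swapped out (for example, a different model) without touching the others.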

Page 6: IBM Data Science Experience - Mladen Jovanovski

Primary persona – Data Scientist

Rigid toolset
- Have to choose one and only one approach
- Cannot easily connect all of the capabilities needed
- Difficult to navigate between the various tools used

Fragmented and time consuming
- Using multiple disjointed environments
- Separate on-ramp/community for each tool/environment
- Does not have metadata or data lineage

Analytical Silo
- Difficult to maintain and version control project assets
- Limited means of collaborating with team
- Results are difficult to share

Page 7: IBM Data Science Experience - Mladen Jovanovski

The perfect Data Science Team

• Normally, not all of the skills are found in a single person, but rather across a data science team

• In IBM Data Science Experience, we include the tools to make the perfect Data Science Team

• All in a collaborative cloud environment that scales on demand

Page 8: IBM Data Science Experience - Mladen Jovanovski

Introducing the Data Science Experience

Learn: Built-in learning to get started or go the distance with advanced tutorials

Create: The best of open source and IBM value-add to create state-of-the-art data products

Collaborate: Community and social features that provide meaningful collaboration

URL: http://datascience.ibm.com

Page 9: IBM Data Science Experience - Mladen Jovanovski

ALL YOUR TOOLS IN ONE PLACE

IBM Data Science Experience is an environment that brings together everything that a Data Scientist needs. It includes the most popular Open Source tools and IBM's unique value-add functionality, with community and social features integrated as first-class citizens, to make Data Scientists more successful.

datascience.ibm.com

IBM Data Science Experience

Page 10: IBM Data Science Experience - Mladen Jovanovski

IBM Data Science Experience

Powered by IBM Bluemix DataWorks analytics platform

Community
- Find tutorials and datasets
- Connect with other data scientists
- Ask questions
- Read articles and papers
- Fork and share projects

Open Source
- Code in Scala/Python/R/SQL
- Jupyter Notebooks
- RStudio IDE and Shiny apps
- Apache Spark
- Your favorite libraries

IBM Added Value
- Modeler UI / Statistics
- Prescriptive Analytics
- Auto-data preparation
- Auto-modeling
- Advanced Visualizations
- Model management and deployment

Be a better Data Scientist

IBM Data Science Experience provides an environment that brings together everything that a data scientist needs today. It includes the most popular Open Source tools and IBM's unique value-add functionality, with community and social features integrated as first-class citizens, to make data scientists more successful.

Page 11: IBM Data Science Experience - Mladen Jovanovski

IBM Data Science Experience

Core Attributes of the Data Science Experience

Page 12: IBM Data Science Experience - Mladen Jovanovski

Community Cards provide in-context learning for users

Page 13: IBM Data Science Experience - Mladen Jovanovski

Features for Sharing, Forking, and Reusing Project Assets increase your data science team’s productivity

Page 14: IBM Data Science Experience - Mladen Jovanovski

Live chat on Intercom for support from the IBM team and to provide your feedback on how we can improve DSX

Page 15: IBM Data Science Experience - Mladen Jovanovski

DSX has RStudio built into the experience thanks to our strategic partnership

Page 16: IBM Data Science Experience - Mladen Jovanovski

With RStudio you can also create Shiny web applications so that your analysis is accessible to the business

Page 17: IBM Data Science Experience - Mladen Jovanovski

Notebooks are browser-based, interactive, and collaborative development environments for data science

Page 18: IBM Data Science Experience - Mladen Jovanovski

From a Notebook you can use IBM Analytics for Apache Spark to blend multiple data types, sources, and workloads

Data Sources: IBM Cloud, Public Cloud, Cloud Apps, On-Premises
- BigInsights (HDFS)
- Cloudant (DBaaS)
- dashDB (Analytics)
- Swift (Object Storage)
- SQLDB (Managed DB2)

Spark components
- Spark SQL: Execute SQL statements
- Spark Streaming: Streaming analytics via micro-batch
- MLlib (Machine Learning): M.L. and statistical algorithms
- GraphX: Distributed graph processing framework
- Spark Core: General compute engine (basic I/O functions, task dispatching, scheduling)
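The "streaming analytics via micro-batch" idea mentioned on this slide can be sketched in plain Python. This is not the Spark Streaming API: Spark groups incoming records by time interval, whereas this sketch groups by a fixed record count so that it stays deterministic. The function names and the sample `readings` are hypothetical.

```python
from itertools import islice

def micro_batches(events, batch_size):
    """Yield consecutive lists of up to batch_size events from any iterable."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def running_totals(events, batch_size):
    """Treat each micro-batch as a small batch job and keep a running sum."""
    total = 0
    totals = []
    for batch in micro_batches(events, batch_size):
        total += sum(batch)      # ordinary batch computation on one slice
        totals.append(total)     # state carried across batches
    return totals

readings = [3, 1, 4, 1, 5, 9, 2]  # hypothetical sensor readings
batches = list(micro_batches(readings, 3))
totals = running_totals(readings, 3)
print(batches)
print(totals)
```

The point of the design is that the per-batch logic is the same code one would write for a static dataset; only the slicing of the stream is new.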

Page 19: IBM Data Science Experience - Mladen Jovanovski

IBM brings strength in enterprise, scale, and a managed offering to the Spark market

IBM Analytics for Apache Spark as a service
- Fully managed and secured Spark environment, accessible on demand or via reserved instances
- Pay-as-you-go or Dedicated deployment options

Performant Architecture
- In-memory architecture greatly reduces disk I/O
- 20-100x faster than MapReduce for common tasks

Productive Workflows
- Analytic workflows across a multitude of sources
- Simplified but powerful syntax (~5x less code than MapReduce)

Leverages Existing Investments
- Integrates with SQL, Java, Python, Scala, etc.
- No lock-in: 100% open source Spark

Continually Improving
- Spark v1.6+ since February 2016
- Continually updated apace with the Spark ecosystem

Page 20: IBM Data Science Experience - Mladen Jovanovski

The Spark Service uses Bluemix Object Storage as its preferred data store for building performant applications

Object storage provides inexpensive, scalable and self-healing retention of massive amounts of unstructured data

Every object exists at the same level in a flat address space

Bluemix Object Storage offers drag-and-drop upload and a Swift API for programmatic access

DataWorks Connectors enable users to easily move data in and out of Bluemix Object Storage
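The "flat address space" point can be made concrete with a small in-memory stand-in. This is only a sketch of the concept, not the Swift API or any IBM client library: the class `FlatObjectStore`, its methods, and the container and object names are all hypothetical. The thing to notice is that a name such as `2016/sales.csv` is a single key, and "folders" are nothing more than name prefixes.

```python
class FlatObjectStore:
    """In-memory stand-in for an object store: each container is one flat
    mapping from object name to bytes, with no nested directories."""

    def __init__(self):
        self._containers = {}

    def put(self, container, name, data):
        """Store data under a full object name within a container."""
        self._containers.setdefault(container, {})[name] = data

    def get(self, container, name):
        """Fetch an object by its full name; raises KeyError if absent."""
        return self._containers[container][name]

    def list_names(self, container, prefix=""):
        """List object names, optionally filtered by a name prefix."""
        names = self._containers.get(container, {})
        return sorted(n for n in names if n.startswith(prefix))

store = FlatObjectStore()
store.put("notebooks", "2016/sales.csv", b"a,b\n1,2\n")
store.put("notebooks", "2016/costs.csv", b"")
store.put("notebooks", "readme.txt", b"hello")

# A "folder" listing is just a prefix filter over the flat key space.
print(store.list_names("notebooks", prefix="2016/"))
```

Because there is no hierarchy to maintain, adding objects never requires restructuring, which is part of why this model scales to very large numbers of objects.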

Page 21: IBM Data Science Experience - Mladen Jovanovski

Supported Data Sources for DSX via on-premises and cloud Connectors

All of the supported targets are compatible with each source

Cloud Sources
- Amazon Redshift
- Amazon S3
- Apache Hive
- Bluemix Object Storage
- IBM BigInsights™ on Cloud *
- IBM Cloudant™
- IBM dashDB
- IBM DB2® on Cloud
- IBM SQL Database
- Microsoft Azure
- PostgreSQL on Compose
- Salesforce
- SoftLayer Object Storage

On-Premises Sources
- Apache Hive
- Cloudera Impala
- IBM DB2® LUW
- IBM Informix®
- IBM Pure Data for Analytics®
- Microsoft SQL Server
- MySQL Enterprise Edition
- Oracle
- Pivotal Greenplum
- PostgreSQL
- Sybase
- Sybase IQ
- Teradata

Cloud Targets
- Amazon S3
- Bluemix Object Storage
- IBM Cloudant™
- IBM dashDB
- IBM BigInsights™ on Cloud *
- IBM DB2® on Cloud
- IBM SQL Database
- IBM Watson™ Analytics
- PostgreSQL on Compose
- SoftLayer Object Storage

On-Premises Targets
- IBM DB2® LUW
- IBM Pure Data for Analytics®
- Teradata

Page 22: IBM Data Science Experience - Mladen Jovanovski

It is really happening! This is what is coming very soon

SPSS Algorithms in Python, R and Scala – Automatic Model Visualization

SPSS Modeler cloud client

Model deployment (batch, streaming and real-time)

Page 23: IBM Data Science Experience - Mladen Jovanovski

IBM Decision Optimization for DSX today

Decision Optimization on Cloud (DOcplexcloud) credentials used inside DSX

(1) Purchase DOcplexcloud on IBM Cloud Marketplace
(2) Receive credentials
(3) Enter credentials into DSX

Future: sign up from within DSX for automatic credentials

Plenty of samples and tutorials available within DSX

Marketing Campaign Planning demo

Page 24: IBM Data Science Experience - Mladen Jovanovski

GitHub for revision control and sharing

Page 25: IBM Data Science Experience - Mladen Jovanovski

Pricing plans of Data Science Experience – All cloud

The plans and pricing are not final

- Spark Enterprise and
- MLaaS Enterprise and
- Add-ons:
  • SPSS Modeler and/or
  • SPSS Statistics and/or
  • Decision Optimization

- Spark Freemium
- MLaaS Freemium
- SPSS Modeler

- Spark Pay-Go
- MLaaS Pay-Go

- Enterprise features:
  • Job Scheduling

Page 26: IBM Data Science Experience - Mladen Jovanovski

Presence on Bluemix – Bluemix as another entry point to DSX

Page 27: IBM Data Science Experience - Mladen Jovanovski

Our mission is to win the hearts and minds of Data Scientists

IBM Data Science Experience follows a freemium model; value-add features, pricing, and up-sell are in development

Sign up and encourage your colleagues to do so at datascience.ibm.com

Calling all Data Scientists!

Get Started with Data Science Experience Today!

Page 28: IBM Data Science Experience - Mladen Jovanovski

IBM Data Science Experience: https://www.youtube.com/watch?v=HPzXlFp4rKE

Page 30: IBM Data Science Experience - Mladen Jovanovski

Page 31: IBM Data Science Experience - Mladen Jovanovski

Legal Disclaimer

• © IBM Corporation 2016. All Rights Reserved.

• The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

• References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

• If the text contains performance statistics or references to benchmarks, insert the following language; otherwise delete: Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

• If the text includes any customer examples, please confirm we have prior written approval from such customer and insert the following language; otherwise delete: All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

• Please review text for proper trademark attribution of IBM products. At first use, each product name must be the full name and include appropriate trademark symbols (e.g., IBM Lotus® Sametime® Unyte™). Subsequent references can drop “IBM” but should include the proper branding (e.g., Lotus Sametime Gateway, or WebSphere Application Server). Please refer to http://www.ibm.com/legal/copytrade.shtml for guidance on which trademarks require the ® or ™ symbol. Do not use abbreviations for IBM product names in your presentation. All product names must be used as adjectives rather than nouns. Please list all of the trademarks that you use in your presentation as follows; delete any not included in your presentation. IBM, the IBM logo, Lotus, Lotus Notes, Notes, Domino, Quickr, Sametime, WebSphere, UC2, PartnerWorld and Lotusphere are trademarks of International Business Machines Corporation in the United States, other countries, or both. Unyte is a trademark of WebDialogs, Inc., in the United States, other countries, or both.

• If you reference Adobe® in the text, please mark the first use and include the following; otherwise delete: Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.

• If you reference Java™ in the text, please mark the first use and include the following; otherwise delete: Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

• If you reference Microsoft® and/or Windows® in the text, please mark the first use and include the following, as applicable; otherwise delete: Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.

• If you reference Intel® and/or any of the following Intel products in the text, please mark the first use and include those that you use as follows; otherwise delete: Intel, Intel Centrino, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

• If you reference UNIX® in the text, please mark the first use and include the following; otherwise delete: UNIX is a registered trademark of The Open Group in the United States and other countries.

• If you reference Linux® in your presentation, please mark the first use and include the following; otherwise delete: Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.

• If the text/graphics include screenshots, no actual IBM employee names may be used (even your own). If your screenshots include fictitious company names (e.g., Renovations, Zeta Bank, Acme), please update and insert the following; otherwise delete: All references to [insert fictitious company name] refer to a fictitious company and are used for illustration purposes only.