big & machine learning


Upload: daniel-devera

Post on 18-Jul-2015


Page 1: Big & machine learning

2/5/2015

The many sources and rapid growth of data require a new approach

Page 2: Big & machine learning


10^6 bytes = Megabyte

10^9 bytes = Gigabyte

10^12 bytes = Terabyte: 500 TB per day in Facebook

10^15 bytes = Petabyte: the CERN Large Hadron Collider generates 1 PB per second

10^18 bytes = Exabyte: 1 EB of data is created on the Internet every day = 250 million DVDs

10^21 bytes = Zettabyte: 1.3 ZB of network traffic by 2016

10^24 bytes = Yottabyte: this is our universe today = 250 trillion DVDs

10^27 bytes = Brontobyte: this will be our digital universe tomorrow with data from the IoT
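The DVD comparisons on this slide can be checked with a little arithmetic. The sketch below is illustrative only: it uses the decimal prefixes as listed on the slide, and the 4 GB-per-DVD capacity is an assumption inferred from the slide's own figure (1 EB = 250 million DVDs), not a value the slide states. The names `describe` and `dvds_needed` are hypothetical helpers.

```python
# Decimal (power-of-ten) units, in the order the slide lists them.
UNITS = [
    ("Megabyte", 10**6),
    ("Gigabyte", 10**9),
    ("Terabyte", 10**12),
    ("Petabyte", 10**15),
    ("Exabyte", 10**18),
    ("Zettabyte", 10**21),
    ("Yottabyte", 10**24),
    ("Brontobyte", 10**27),
]

# Assumption: 4 GB per DVD, the capacity implied by
# "1 EB = 250 million DVDs" on the slide.
DVD_BYTES = 4 * 10**9


def describe(n_bytes):
    """Express a byte count in the largest unit that does not exceed it."""
    for name, size in reversed(UNITS):
        if n_bytes >= size:
            return f"{n_bytes / size:g} {name}"
    return f"{n_bytes} bytes"


def dvds_needed(n_bytes, dvd_bytes=DVD_BYTES):
    """Number of DVDs needed to hold n_bytes, rounded up."""
    return -(-n_bytes // dvd_bytes)


print(describe(500 * 10**12))  # the daily Facebook volume quoted on the slide
print(dvds_needed(10**18))     # DVDs for one exabyte
print(dvds_needed(10**24))     # DVDs for one yottabyte
```

Under the 4 GB assumption, both DVD figures on the slide (250 million for 1 EB, 250 trillion for 1 YB) come out consistent with each other.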

Discover, explore, and combine any data

Right from Excel, find any data: corporate, social, machine, Hadoop, open

Easily merge, transform, and clean up data

Page 3: Big & machine learning


Explore & Visualize

Predictive Analytics

Forecasting/extrapolation: What if these trends continue?

Predictive Modeling: What will happen next?

Optimization: What’s the best that can happen?
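As a rough illustration of the forecasting/extrapolation question ("What if these trends continue?"), the sketch below fits a straight line to past values by ordinary least squares and extends it forward. The function names and the sales figures are made up for the example; a linear trend is only one of many possible forecasting choices and is not stated in the slides.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate(values, steps):
    """Extend the fitted trend `steps` periods past the last observation."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical monthly sales, growing by 10 units per month.
monthly_sales = [100.0, 110.0, 120.0, 130.0]
print(extrapolate(monthly_sales, 2))
```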

[Figure: chart with axis labels Competitive Advantage and Value; stage labels Intelligence, Discovery, Presentation; mode labels Interactive, Proactive, Passive; Exploration]

Page 4: Big & machine learning


What is Machine Learning?

Predictive computing systems become smarter with experience

We want to learn a mapping from the input to the output; the correct values are provided by a supervisor:

• Fraud Detection

• Image Recognition

• SPAM Filter

• Sales Forecast
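The idea of learning an input-to-output mapping from labeled examples can be sketched with a one-nearest-neighbor classifier, here applied to a toy spam filter. Everything in the example is an assumption made for illustration: the two features (link count, exclamation-mark count), the training messages, and the helper names. It is not the method used in the demo.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(training, features):
    """Return the label of the training example closest to `features`.

    `training` is a list of (features, label) pairs: the labels are the
    correct values provided by the supervisor.
    """
    _, label = min(training, key=lambda ex: squared_distance(ex[0], features))
    return label


# Hypothetical labeled data: (links in message, exclamation marks) -> label.
training = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((5, 4), "spam"),
    ((7, 6), "spam"),
]

print(predict_1nn(training, (6, 5)))
print(predict_1nn(training, (0, 1)))
```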

We want to find regularities in the data. The class labels of the training data are unknown.

• Customer Segmentation

• Movie Recommendation Engine
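Finding regularities without labels can be illustrated with k-means clustering, sketched below for a toy customer segmentation. The customer data, the choice of k-means, the fixed starting centroids, and the function names are all assumptions for the example; the slides do not say which technique is used.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroids[i]  # keep an empty cluster's centroid in place
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers: (purchases per month, average basket size).
customers = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
             (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

centroids, segments = kmeans(customers, [customers[0], customers[3]])
for centroid, segment in zip(centroids, segments):
    print(centroid, segment)
```

No labels are supplied: the two segments emerge only from how the points group together.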

Several scenarios across diverse industries

• Churn analysis

• Predictive maintenance

• Spam filtering

• Ad targeting

• Recommendation engines

• Fraud detection

• Image detection & classification

• Forecasting

• Anomaly detection

Page 5: Big & machine learning


Harvard Business, Thomas H. Davenport, October 2012

Business Problem -> Modeling -> Deployment -> Business Value

Azure Machine Learning

DATA

• HDInsight

• SQL Server VM

• SQL Database

• Blobs & Tables

• Desktop files

• Excel spreadsheet

• Other data files on PC

ML Studio -> Publish API -> Azure Machine Learning API -> Devices & Applications
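Once a model is published as an API, devices and applications typically consume it over HTTP. The sketch below only builds such a request with Python's standard library and does not send it. The URL, the API key, the JSON payload shape, and the bearer-token header are all hypothetical placeholders: the slides do not show the service's actual request schema, so check the published service's own documentation before relying on any of these.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the values of a real published service.
SCORING_URL = "https://example.invalid/score"
API_KEY = "<your-api-key>"


def build_scoring_request(features, url=SCORING_URL, api_key=API_KEY):
    """Build (but do not send) a JSON POST request carrying one record.

    The {"features": ...} payload shape is an assumption for illustration.
    """
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_scoring_request({"amount": 129.99, "country": "ES"})
print(request.get_method(), request.full_url)

# Against a live endpoint, the call itself would look like:
# with urllib.request.urlopen(request) as response:
#     prediction = json.loads(response.read())
```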

Page 6: Big & machine learning


Algorithms

• R Open Source Packages

• Mathematical Programming

• Online analytical processing

• Graph analytics

• Text analytics

• Support Vector Machines

• Boosted Decision Trees

• Time series processing

• Association rule mining

• Neural networks

• Regression analysis

• Clustering

• Nearest-neighbor

In the future: support for extensibility by enabling users to add their own algorithms as modules

The Microsoft Cybercrime Center

Fraud Detection

Page 7: Big & machine learning


The impact of Cybercrime

• Cybercrime costs consumers $113 billion a year

• Every second, 12 people are victims of cybercrime, nearly 400 million every year

• 50% of online adults have been victims in the past year

• 1 in 5 small and medium enterprises are targeted by cyber criminals
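The per-second and per-year victim counts on this slide can be cross-checked with simple arithmetic, sketched below. The cost-per-victim figure is derived here from the slide's two numbers for illustration only; it is not a figure the slide itself states.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

# Figures quoted on the slide.
victims_per_second = 12
annual_cost_usd = 113 * 10**9

# 12 victims per second over a 365-day year.
victims_per_year = victims_per_second * SECONDS_PER_YEAR
print(f"{victims_per_year:,} victims per year")

# Derived value (not on the slide): average cost per victim.
cost_per_victim = annual_cost_usd / victims_per_year
print(f"about ${cost_per_victim:.0f} per victim")
```

Twelve victims per second works out to roughly 378 million a year, which the slide rounds to "nearly 400 million".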

DEMO
