Machine Learning in the Age of Big Data: New Approaches and Business Applications
DESCRIPTION
Presentation at the University of Lisbon on machine learning and big data: deep learning algorithms and applications to credit risk analysis, churn detection and recommendation algorithms.
TRANSCRIPT
Machine learning in the age of big data
Armando Vieira, Closer
Armando.lidinwise.com
Predicting the flu
1. Machine Learning: finding features, patterns & representations
2. The connectionist approach: Neural Networks
3. Applications
4. The Deep Learning “revolution”: a step closer to the brain?
5. Applications
6. The Big Data deluge: better algorithms & more data
Topics
Was “Deep Blue” intelligent? How about Watson? Or Google? Have machines reached the
intelligence level of a rat? … Let’s be pragmatic: I’ll call “intelligent” any
device capable of surprising me!
What is an “intelligent” machine?
Connectionism
1943 – McCulloch & Pitts + Hebb
1958 – Rosenblatt’s perceptron; 1969 – the Minsky & Papert
argument - or why a good theory may kill an even better idea
1986 – Rumelhart: back-propagation & the multilayer perceptron
2006 – Hinton: Deep Learning (Boltzmann machines)
Networks
All together: Watson, Google et al
The connectionist way
Symbolic machines
The brain way
Input builds up on receptors (dendrites)
Cell has an input threshold
When the cell’s threshold is breached, an activation is fired down the axon.
Modeling the Human Brain?
The visual cortex
How does the brain do the trick?
The simplest neural network
A step closer to success thanks to a training algorithm: back propagation
What is a Multilayer Perceptron?
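The multilayer perceptron and its back-propagation training can be sketched in a few lines of NumPy. This is an illustrative toy (not code from the talk): one hidden layer learning XOR, the mapping a single-layer perceptron famously cannot represent.

```python
import numpy as np

# A minimal multilayer perceptron trained by back-propagation (a sketch,
# not code from the talk). It learns XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(0.0, 1.0, (2, 8))  # input -> hidden weights
b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1))  # hidden -> output weights
b2 = np.zeros(1)

lr = 1.0
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)                 # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)      # backward pass: output layer
    d_h = (d_out @ W2.T) * h * (1 - h)       # error propagated to hidden layer
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

pred = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(np.round(pred.ravel(), 2))  # close to [0, 1, 1, 0]
```

The hidden layer is what lets the network carve the input space into the non-linear regions XOR requires.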
Learning a function
Training is nothing more than fitting: regression, classification, recommendations
The problem is that we have to find a way to represent the world (extract features)
Supervised / unsupervised
Can an ANN learn this?
Sure!
A very simple problem
Money
Age
FRUSTRATION
Curse of dimensionality
Learning too much
Overfitting
A simpler hypothesis has a lower error rate
ANNs are very hard to optimize
Lots of local minima (traps for stochastic gradient descent)
Permutation invariance (no unique solution)
How to stop training?
Optimization & convergence
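One common answer to “how to stop training?” is early stopping: keep the weights from the epoch with the best validation loss and stop once it has not improved for a few epochs. A sketch; the `patience` parameter and the loss curve below are illustrative, not from the talk.

```python
# Early stopping: a sketch. `patience` and the loss curve are illustrative.
def early_stopping(val_losses, patience=3):
    """Return the epoch whose weights should be kept."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stopped improving
    return best_epoch

# Validation loss falls, then rises again as the network overfits:
losses = [1.0, 0.6, 0.4, 0.35, 0.37, 0.40, 0.45, 0.50]
print(early_stopping(losses))  # stops and keeps epoch 3
```

Monitoring a held-out set rather than the training loss is what protects against the overfitting described above.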
Neural Networks are incredibly powerful algorithms
But they are also wild beasts that should be treated with great care
It’s very easy to fall into the GIGO trap. Problems like overfitting, sub-optimization, bad
conditioning and wrong interpretation are common
Feeding & understanding the beast
Interpretation of outputs: loss function; outputs ≠ probabilities; where to draw the line? Be VERY careful in interpreting the outputs of ML
algorithms: you don’t always get what you see
Input preparation: clean & balance the data; normalize it properly; remove unneeded features, create new ones; handle missing values
Some care
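The input-preparation steps above can be sketched in NumPy: impute missing values with the column mean, then normalize each feature to zero mean and unit variance. The two columns (age, income) and their values are illustrative.

```python
import numpy as np

# Input preparation sketch: mean imputation + z-score normalization.
# Columns (age, income) and values are illustrative.
X = np.array([[25.0, 1000.0],
              [40.0, np.nan],    # a missing income value
              [55.0, 3000.0]])

col_mean = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_mean, X)     # fill missing values
X = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score normalization
print(X.round(3))
```

Without normalization a feature measured in thousands (income) would dominate one measured in tens (age) in any distance- or gradient-based learner.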
PCA, Isomap, NMF and the like
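Of these, PCA is the simplest to sketch: project the data onto the directions of largest variance, here via the SVD. The synthetic data (points lying almost on one line through 3-D space) are illustrative.

```python
import numpy as np

# PCA via the SVD, a sketch of the dimensionality-reduction idea.
# Synthetic rank-1 data with a little noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1)) @ np.array([[3.0, 1.0, 0.5]])  # rank-1 structure
X += 0.01 * rng.normal(size=X.shape)                         # a little noise
Xc = X - X.mean(axis=0)                                      # center first

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / (S**2).sum()          # variance explained per component
X_reduced = Xc @ Vt[:2].T                # keep the first two components
print(explained.round(4))                # the first component dominates
```

When one component explains nearly all the variance, the remaining dimensions are mostly noise and can be dropped, which is one way to fight the curse of dimensionality mentioned earlier.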
Rutherford Backscattering (RBS) Credit Risk & Scoring Churn prediction (CDR) Prediction of hotel demand with Google
trends Adwords Optimization
Applications
Ion beam analysis
RBS
NRA
PIXE
ERDA
Channelling
MeV/amu
Rutherford backscattering: where is the pattern?
25Å Ge d-layer under 400 nm Si
Angle of incidence
Scattering angle
Beam energy
[Figure: RBS spectra, yield (arb. units) vs. channel (0–400): (a) beam energies 1.2, 1.6 and 2 MeV; (b) scattering angles 120°, 140° and 180°; (c) angles of incidence 0°, 25° and 50°]
Ge in Si: ANN architecture
architecture                   train set error   test set error
(I, 100, O)                          6.3              11.7
(I, 250, O)                          5.2              10.1
(I, 100, 80, O)                      3.6               5.3
(I, 100, 50, 20, O)                  4.2               5.1
(I, 100, 80, 50, O)                  3.0               4.1
(I, 100, 80, 80, O)                  2.8               4.7
(I, 100, 50, 100, O)                 3.0               4.2
(I, 100, 80, 80, 50, O)              3.2               4.1
(I, 100, 80, 50, 30, 20, O)          3.8               5.3
Anything in Al2O3: test set
[Figure: ANN prediction vs. data on the test set: (a) dose, 0.1–100 ×10^15 at/cm² (log-log); (b) depth, 0–4000 ×10^15 at/cm²]
Churn prediction on a Telecom
Model validation | Lift and Profit curves
Adwords optimization
Bankruptcy prediction
The Rating System
Score (EBIT, Current ratio)
Fraud Detection
Hotel demand prediction using Google
Credit Scoring
Before | After
Where is the information?
ANN are not a “silver bullet”!
Neural networks are good when: many training data are available; variables are continuous; the relevant features are known; the mapping is unique.
Neural networks are less useful when: the problem is linear; there are few data compared to the size of the search space; the data are high-dimensional; there are long-range correlations.
They are black boxes
Characteristics of ANNs

Characteristic                 Traditional methods (Von Neumann)         Artificial neural networks
Logic                          Deductive                                 Inductive
Processing principle           Logical                                   Gestalt
Processing style               Sequential                                Distributed (parallel)
Functions realised through     Concepts, rules, calculations             Concepts, images, categories, maps
Connections between concepts   Programmed a priori                       Dynamic, evolving
Programming                    Through a limited set of rigid rules      Self-programmable (given an appropriate architecture)
Self-learning                  Through internal algorithmic parameters   Continuously adaptable
Learning                       By rules                                  By examples (analogies)
Tolerance to errors            Mostly none                               Inherent
ANNs are massive correlation & feature-extraction machines: isn’t that what intelligence is all about?
Knowledge is embedded in a messy network of weights
Capable of modelling an arbitrarily complex mapping
Is this where the intelligence is?
We need thousands of examples for training. Why?
Prior
Algorithms are simple: complexity lies in the data
Still…
Deep Learning approach
Boltzmann Machines
Training an RBM by Gibbs sampling
Hinton et al, 2006
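The Gibbs-sampling recipe above can be sketched as one-step contrastive divergence (CD-1) for a Bernoulli RBM. This is an illustrative toy, not Hinton's code: the network size, learning rate and the two-pattern data set are all invented for the example.

```python
import numpy as np

# RBM trained with one step of Gibbs sampling (contrastive divergence,
# CD-1). Sizes, learning rate and toy data are illustrative.
rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 4, 0.1
W = 0.01 * rng.normal(size=(n_visible, n_hidden))
a = np.zeros(n_visible)  # visible biases
b = np.zeros(n_hidden)   # hidden biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: two repeated binary patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 20, dtype=float)

for _ in range(1000):
    v0 = data
    ph0 = sigmoid(v0 @ W + b)                          # hidden probabilities
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # Gibbs step: sample hiddens
    pv1 = sigmoid(h0 @ W.T + a)                        # Gibbs step: reconstruct visibles
    ph1 = sigmoid(pv1 @ W + b)
    # CD-1 update: data-driven statistics minus model-driven statistics
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(data)
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

recon_error = np.mean((data - pv1) ** 2)
print(round(float(recon_error), 3))  # small after training
```

The "work both ways" property of the next slide is visible here: the same weights W are used upward (visible to hidden) and downward (hidden to visible).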
“Quasi” unsupervised machines
Extract and combine subtle features in the data
Build high-level representations (abstractions)
Capable of knowledge transfer
Can handle (very) high-dimensional data
Are deep and broad: millions of synapses
Work both ways: up and down
Nice features of Deep Learning
The dimensionality curse crushed?
Learning features that are not mutually exclusive
Top on image identification (in some cases it beats humans)
Top on video classification
Top on real-time translation
Top on gene identification
Reverse engineering: can replicate complex human behaviour, like walking
Data visualization and text disambiguation (river bank / bank bailout)
Kaggle
Applications
Before Now
Here comes everybody: Big Data, real BIG
The Big Data Revolution
In 2 years we produce more data (and garbage) than was accumulated over all previous history
Zettabytes of data, 10^21 bytes, produced every year
In data we trust
Data is the new gold… and it’s cheap
Machine learning molecules (**)
Most ML algorithms work better (sometimes much better) simply by throwing more data at them
And now we have more data. Plenty of it! Which is signal and which is noise? Let the
machines decide (they are good at it). Where do humans stand in this equation? We
are feeding the machines!
Does size matter? A lot!
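A tiny experiment makes the "more data helps" point concrete: fit the same noisy linear relation with growing sample sizes and watch the error of the fitted slope shrink. The data-generating process below is illustrative.

```python
import numpy as np

# "Does size matter?" sketch: slope-estimation error vs. sample size.
# The data-generating process is illustrative.
rng = np.random.default_rng(0)
true_slope = 2.0

def fitted_slope_error(n):
    x = rng.normal(size=n)
    y = true_slope * x + rng.normal(scale=1.0, size=n)  # noisy labels
    slope = (x @ y) / (x @ x)                           # one-parameter least squares
    return abs(slope - true_slope)

errors = [fitted_slope_error(n) for n in (10, 100, 10000)]
print([round(e, 3) for e in errors])  # error generally shrinks as n grows
```

The error of the estimate falls roughly as 1/sqrt(n), which is why "simple algorithm + lots of data" so often beats "clever algorithm + little data".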
Don’t look for causation; welcome correlations
Messiness: prepare to get your hands dirty
Don’t expect definitive answers. Only communists have them!
Stop searching for God’s equation
Keep theories at bay and let the data speak
Exactitude may not be better than “estimations”
Forget about keeping data clean and organized
Data is alive and wild. Don’t imprison it
What has changed in such a data-deluged world?
Flu prediction
Netflix movie rating contest
New York City building security
Used car
Veg food -> airport
Prediction of rare events (frauds) and why it’s important
Examples
A step closer to the brain? Yes and no
What is missing?
Predictive analytics (crime before it occurs)?
Algorithms that learn & adapt
Replace humans?
Augmented reality
Big Data & algorithms are revolutionizing the world. Fast!
6. Conclusions & reflections
Recommendations (Amazon, Netflix, Facebook)
Trading (70% of Wall Street trading is done by algorithms)
Identifying your partner, recruiting, votes
Images, video, voice, translation (real time)
Where are we heading? NSA? Black boxes?
Are (intelligent) algorithms taking control of our lives?
References
Deeplearning.net
Hinton’s Google talks
“Too Big to Know”
“Big Data: a new revolution that will transform business”
“Machine Learning in R”
Matlab (several codes – google for them)
R (CRAN repository), Rminer
Python (scikit-learn)
C++ (mainly on GitHub)
Torch
More on Deeplearning.net
Code
User based Collaborative Filters
Recommend an unseen item i to a user u based on the engagement of other users with items 1 to 8. The items recommended in this case are i2, followed by i1.
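The user-based scheme can be sketched directly: score the target user's unseen items from the ratings of similar users, weighted by cosine similarity. The rating matrix below is illustrative (0 means "not rated"), not the matrix from the slide.

```python
import numpy as np

# User-based collaborative filtering sketch. The rating matrix is
# illustrative (0 = not rated).
R = np.array([[5, 4, 0, 1],
              [4, 5, 0, 1],
              [1, 1, 5, 4],
              [5, 0, 0, 1]], dtype=float)  # last row: the target user
target = 3

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

others = np.array([u for u in range(len(R)) if u != target])
sims = np.array([cosine(R[target], R[u]) for u in others])

unseen = np.where(R[target] == 0)[0]
scores = {i: float(sims @ R[others, i] / sims.sum()) for i in unseen}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # item 1 wins: similar users rated it highly
```

The target user's ratings resemble the first two users', so their high ratings for item 1 dominate the prediction.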
Item based Collaborative Filters
Item-based recommendation for a user ua
based on a neighbourhood of k = 3. The items recommended in this case are i3, followed by i4.
(Item-based CF is superior to user-based CF, but it requires a lot of information, such as ratings or user interaction with the product.)
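The item-based variant can be sketched the same way: precompute an item-item similarity matrix from all users' ratings, then predict a user's rating for an unseen item from the k most similar items that user has already rated (k = 3, as in the slide). The rating matrix is again illustrative.

```python
import numpy as np

# Item-based collaborative filtering sketch, neighbourhood k = 3.
# The rating matrix is illustrative (0 = not rated).
R = np.array([[5, 4, 1, 0, 4],
              [4, 5, 1, 1, 5],
              [1, 2, 5, 4, 1],
              [2, 1, 4, 5, 2]], dtype=float)  # rows = users, columns = items

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

n_items = R.shape[1]
# The item-item similarity matrix is computed offline from all users.
S = np.array([[cosine(R[:, i], R[:, j]) for j in range(n_items)]
              for i in range(n_items)])

user, k = 0, 3
unseen = np.where(R[user] == 0)[0]
scores = {}
for i in unseen:
    ranked = np.argsort(S[i])[::-1]                           # most similar first
    neigh = [j for j in ranked if j != i and R[user, j] > 0][:k]
    scores[i] = float(sum(S[i, j] * R[user, j] for j in neigh)
                      / sum(S[i, j] for j in neigh))
print(scores)  # predicted rating for each unseen item
```

Because item-item similarities change more slowly than user profiles, this matrix can be precomputed offline, which is one reason the item-based variant scales better in practice.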