Machine Learning in the Age of Big Data: New Approaches and Business Applications
DESCRIPTION
Presentation at the University of Lisbon on machine learning and big data: deep learning algorithms and applications to credit risk analysis, churn detection and recommendation algorithms.
TRANSCRIPT
Machine learning in the age of big data
Armando Vieira, Closer
Armando.lidinwise.com
Predicting the flu
1. Machine Learning: finding features, patterns & representations
2. The connectionist approach: Neural Networks
3. Applications
4. The Deep Learning “revolution”: a step closer to the brain?
5. Applications
6. The Big Data deluge: better algorithms & more data
Topics
Was “Deep Blue” intelligent? How about Watson? Or Google? Have machines reached the
intelligence level of a rat? … Let’s be pragmatic: I’ll call “intelligent” any
device capable of surprising me!
What is an “intelligent” machine?
Connectionism
1943 – McCulloch & Pitts + Hebb
1958 – Rosenblatt’s perceptron; 1969 – the Minsky & Papert
argument - or why a good theory may kill an even better idea
1986 – Rumelhart: back-propagation & the multilayer perceptron
2006 – Hinton: Deep Learning (Boltzmann machines)
Networks
All together: Watson, Google et al
The connectionist way
Symbolic machines
The brain way
Input builds up on receptors (dendrites)
Cell has an input threshold
When the cell’s threshold is breached, an activation is fired down the axon.
Modeling the Human Brain?
The visual cortex
How does the brain do the trick?
The simplest neural network
A step closer to success thanks to a training algorithm: back propagation
What is a Multilayer Perceptron?
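The multilayer perceptron and its back-propagation training can be sketched in a few lines of NumPy. This is an illustrative toy (not code from the talk): one hidden layer learning XOR, the mapping a single-layer perceptron famously cannot represent.

```python
import numpy as np

# A minimal multilayer perceptron trained by back-propagation (a sketch,
# not code from the talk). It learns XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(0.0, 1.0, (2, 8))  # input -> hidden weights
b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1))  # hidden -> output weights
b2 = np.zeros(1)

lr = 1.0
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)                 # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)      # backward pass: output layer
    d_h = (d_out @ W2.T) * h * (1 - h)       # error propagated to hidden layer
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

pred = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(np.round(pred.ravel(), 2))  # close to [0, 1, 1, 0]
```

The hidden layer is what lets the network carve the input space into the non-linear regions XOR requires.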
Learning a function
Training is nothing more than fitting: regression, classification, recommendations
The problem is that we have to find a way to represent the world (extract features)
Supervised / unsupervised
Can an ANN learn this?
Sure!
A very simple problem
Money
Age
FRUSTRATION
Curse of dimensionality
Learning too much
Overfitting
A simpler hypothesis has a lower error rate
ANNs are very hard to optimize
Lots of local minima (traps for stochastic gradient descent)
Permutation invariance (no unique solution)
How to stop training?
Optimization & convergence
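One common answer to “how to stop training?” is early stopping: keep the weights from the epoch with the best validation loss and stop once it has not improved for a few epochs. A sketch; the `patience` parameter and the loss curve below are illustrative, not from the talk.

```python
# Early stopping: a sketch. `patience` and the loss curve are illustrative.
def early_stopping(val_losses, patience=3):
    """Return the epoch whose weights should be kept."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stopped improving
    return best_epoch

# Validation loss falls, then rises again as the network overfits:
losses = [1.0, 0.6, 0.4, 0.35, 0.37, 0.40, 0.45, 0.50]
print(early_stopping(losses))  # stops and keeps epoch 3
```

Monitoring a held-out set rather than the training loss is what protects against the overfitting described above.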
Neural Networks are incredibly powerful algorithms
But they are also wild beasts that should be treated with great care
It’s very easy to fall into the GIGO trap. Problems like overfitting, sub-optimization, bad
conditioning and wrong interpretation are common
Feeding & understanding the beast
Interpretation of outputs: loss function; outputs ≠ probabilities; where to draw the line? Be VERY careful in interpreting the outputs of ML
algorithms: you don’t always get what you see
Input preparation: clean & balance the data; normalize it properly; remove unneeded features, create new ones; handle missing values
Some care
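The input-preparation steps above can be sketched in NumPy: impute missing values with the column mean, then normalize each feature to zero mean and unit variance. The two columns (age, income) and their values are illustrative.

```python
import numpy as np

# Input preparation sketch: mean imputation + z-score normalization.
# Columns (age, income) and values are illustrative.
X = np.array([[25.0, 1000.0],
              [40.0, np.nan],    # a missing income value
              [55.0, 3000.0]])

col_mean = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_mean, X)     # fill missing values
X = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score normalization
print(X.round(3))
```

Without normalization a feature measured in thousands (income) would dominate one measured in tens (age) in any distance- or gradient-based learner.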
PCA, Isomap, NMF and the like
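Of these, PCA is the simplest to sketch: project the data onto the directions of largest variance, here via the SVD. The synthetic data (points lying almost on one line through 3-D space) are illustrative.

```python
import numpy as np

# PCA via the SVD, a sketch of the dimensionality-reduction idea.
# Synthetic rank-1 data with a little noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1)) @ np.array([[3.0, 1.0, 0.5]])  # rank-1 structure
X += 0.01 * rng.normal(size=X.shape)                         # a little noise
Xc = X - X.mean(axis=0)                                      # center first

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / (S**2).sum()          # variance explained per component
X_reduced = Xc @ Vt[:2].T                # keep the first two components
print(explained.round(4))                # the first component dominates
```

When one component explains nearly all the variance, the remaining dimensions are mostly noise and can be dropped, which is one way to fight the curse of dimensionality mentioned earlier.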
Rutherford Backscattering (RBS) Credit Risk & Scoring Churn prediction (CDR) Prediction of hotel demand with Google
trends Adwords Optimization
Applications
Ion beam analysis
RBS
NRA
PIXE
ERDA
Channelling
MeV/amu
Rutherford backscattering: where is the pattern?
25Å Ge d-layer under 400 nm Si
Angle of incidence
Scattering angle
Beam energy
[Figure: RBS spectra, yield (arb. units) vs. channel (0–400): (a) beam energies 1.2, 1.6 and 2 MeV; (b) scattering angles 120°, 140° and 180°; (c) angles of incidence 0°, 25° and 50°]
Ge in Si: ANN architecture
architecture                   train set error   test set error
(I, 100, O)                          6.3              11.7
(I, 250, O)                          5.2              10.1
(I, 100, 80, O)                      3.6               5.3
(I, 100, 50, 20, O)                  4.2               5.1
(I, 100, 80, 50, O)                  3.0               4.1
(I, 100, 80, 80, O)                  2.8               4.7
(I, 100, 50, 100, O)                 3.0               4.2
(I, 100, 80, 80, 50, O)              3.2               4.1
(I, 100, 80, 50, 30, 20, O)          3.8               5.3
Anything in Al2O3: test set
[Figure: ANN prediction vs. data on the test set: (a) dose, 0.1–100 ×10^15 at/cm² (log-log); (b) depth, 0–4000 ×10^15 at/cm²]
Churn prediction on a Telecom
Model validation | Lift and Profit curves
Adwords optimization
Bankruptcy prediction
The Rating System
Score (EBIT, Current ratio)
Fraud Detection
Hotel demand prediction using Google
Credit Scoring
Before | After
Where is the information?
ANN are not a “silver bullet”!
Neural networks are good when: many training data are available; variables are continuous; the relevant features are known; the mapping is unique.
Neural networks are less useful when: the problem is linear; there are few data compared to the size of the search space; the data are high-dimensional; there are long-range correlations.
They are black boxes
Characteristics of ANNs

Characteristic                 Traditional methods (Von Neumann)         Artificial neural networks
Logic                          Deductive                                 Inductive
Processing principle           Logical                                   Gestalt
Processing style               Sequential                                Distributed (parallel)
Functions realised through     Concepts, rules, calculations             Concepts, images, categories, maps
Connections between concepts   Programmed a priori                       Dynamic, evolving
Programming                    Through a limited set of rigid rules      Self-programmable (given an appropriate architecture)
Self-learning                  Through internal algorithmic parameters   Continuously adaptable
Learning                       By rules                                  By examples (analogies)
Tolerance to errors            Mostly none                               Inherent
ANNs are massive correlation & feature-extraction machines: isn’t that what intelligence is all about?
Knowledge is embedded in a messy network of weights
Capable of modelling an arbitrarily complex mapping
Is this where the intelligence is?
We need thousands of examples for training. Why?
Prior
Algorithms are simple: complexity lies in the data
Still…
Deep Learning approach
Boltzmann Machines
Training an RBM by Gibbs sampling
Hinton et al, 2006
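The Gibbs-sampling recipe above can be sketched as one-step contrastive divergence (CD-1) for a Bernoulli RBM. This is an illustrative toy, not Hinton's code: the network size, learning rate and the two-pattern data set are all invented for the example.

```python
import numpy as np

# RBM trained with one step of Gibbs sampling (contrastive divergence,
# CD-1). Sizes, learning rate and toy data are illustrative.
rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 4, 0.1
W = 0.01 * rng.normal(size=(n_visible, n_hidden))
a = np.zeros(n_visible)  # visible biases
b = np.zeros(n_hidden)   # hidden biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: two repeated binary patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 20, dtype=float)

for _ in range(1000):
    v0 = data
    ph0 = sigmoid(v0 @ W + b)                          # hidden probabilities
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # Gibbs step: sample hiddens
    pv1 = sigmoid(h0 @ W.T + a)                        # Gibbs step: reconstruct visibles
    ph1 = sigmoid(pv1 @ W + b)
    # CD-1 update: data-driven statistics minus model-driven statistics
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(data)
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

recon_error = np.mean((data - pv1) ** 2)
print(round(float(recon_error), 3))  # small after training
```

The "work both ways" property of the next slide is visible here: the same weights W are used upward (visible to hidden) and downward (hidden to visible).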
“Quasi” unsupervised machines
Extract and combine subtle features in the data
Build high-level representations (abstractions)
Capable of knowledge transfer
Can handle (very) high-dimensional data
Are deep and broad: millions of synapses
Work both ways: up and down
Nice features of Deep Learning
The dimensionality curse crushed?
Learning features that are not mutually exclusive
Top on image identification (in some cases it beats humans)
Top on video classification
Top on real-time translation
Top on gene identification
Reverse engineering: can replicate complex human behaviour, like walking
Data visualization and text disambiguation (river bank / bank bailout)
Kaggle
Applications
Before Now
Here comes everybody: Big Data, real BIG
The Big Data Revolution
In 2 years we produce more data (and garbage) than was accumulated over all previous history
Zettabytes of data, 10^21 bytes, produced every year
In data we trust
Data is the new gold… and it’s cheap
Machine learning molecules (**)
Most ML algorithms work better (sometimes much better) simply by throwing more data at them
And now we have more data. Plenty of it! Which is signal and which is noise? Let the
machines decide (they are good at it). Where do humans stand in this equation? We
are feeding the machines!
Does size matter? A lot!
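A tiny experiment makes the "more data helps" point concrete: fit the same noisy linear relation with growing sample sizes and watch the error of the fitted slope shrink. The data-generating process below is illustrative.

```python
import numpy as np

# "Does size matter?" sketch: slope-estimation error vs. sample size.
# The data-generating process is illustrative.
rng = np.random.default_rng(0)
true_slope = 2.0

def fitted_slope_error(n):
    x = rng.normal(size=n)
    y = true_slope * x + rng.normal(scale=1.0, size=n)  # noisy labels
    slope = (x @ y) / (x @ x)                           # one-parameter least squares
    return abs(slope - true_slope)

errors = [fitted_slope_error(n) for n in (10, 100, 10000)]
print([round(e, 3) for e in errors])  # error generally shrinks as n grows
```

The error of the estimate falls roughly as 1/sqrt(n), which is why "simple algorithm + lots of data" so often beats "clever algorithm + little data".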
Don’t look for causation; welcome correlations
Messiness: prepare to get your hands dirty
Don’t expect definitive answers. Only communists have them!
Stop searching for God’s equation
Keep theories at bay and let the data speak
Exactitude may not be better than “estimations”
Forget about keeping data clean and organized
Data is alive and wild. Don’t imprison it
What has changed in such a data-deluged world?
Flu prediction
Netflix movie rating contest
New York City building security
Used car
Veg food -> airport
Prediction of rare events (frauds) and why it’s important
Examples
A step closer to the brain? Yes and no
What is missing?
Predictive analytics (crime before it occurs)?
Algorithms that learn & adapt
Replace humans?
Augmented reality
Big Data & algorithms are revolutionizing the world. Fast!
6. Conclusions & reflections
Recommendations (Amazon, Netflix, Facebook)
Trading (70% of Wall Street trading is done by algorithms)
Identifying your partner, recruiting, votes
Images, video, voice, translation (real time)
Where are we heading? NSA? Black boxes?
Are (intelligent) algorithms taking control of our lives?
References
Deeplearning.net
Hinton’s Google talks
“Too Big to Know”
“Big Data: a new revolution that will transform business”
“Machine Learning in R”
Matlab (several codes – google for them)
R (CRAN repository), Rminer
Python (scikit-learn)
C++ (mainly on GitHub)
Torch
More on Deeplearning.net
Code
User based Collaborative Filters
Recommend an unseen item i to a user u based on the engagement of other users with items 1 to 8. The items recommended in this case are i2, followed by i1.
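The user-based scheme can be sketched directly: score the target user's unseen items from the ratings of similar users, weighted by cosine similarity. The rating matrix below is illustrative (0 means "not rated"), not the matrix from the slide.

```python
import numpy as np

# User-based collaborative filtering sketch. The rating matrix is
# illustrative (0 = not rated).
R = np.array([[5, 4, 0, 1],
              [4, 5, 0, 1],
              [1, 1, 5, 4],
              [5, 0, 0, 1]], dtype=float)  # last row: the target user
target = 3

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

others = np.array([u for u in range(len(R)) if u != target])
sims = np.array([cosine(R[target], R[u]) for u in others])

unseen = np.where(R[target] == 0)[0]
scores = {i: float(sims @ R[others, i] / sims.sum()) for i in unseen}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # item 1 wins: similar users rated it highly
```

The target user's ratings resemble the first two users', so their high ratings for item 1 dominate the prediction.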
Item based Collaborative Filters
Item-based recommendation for a user ua
based on a neighbourhood of k = 3. The items recommended in this case are i3, followed by i4.
(Item-based CF is superior to user-based CF, but it requires a lot of information, such as ratings or user interaction with the product.)
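The item-based variant can be sketched the same way: precompute an item-item similarity matrix from all users' ratings, then predict a user's rating for an unseen item from the k most similar items that user has already rated (k = 3, as in the slide). The rating matrix is again illustrative.

```python
import numpy as np

# Item-based collaborative filtering sketch, neighbourhood k = 3.
# The rating matrix is illustrative (0 = not rated).
R = np.array([[5, 4, 1, 0, 4],
              [4, 5, 1, 1, 5],
              [1, 2, 5, 4, 1],
              [2, 1, 4, 5, 2]], dtype=float)  # rows = users, columns = items

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

n_items = R.shape[1]
# The item-item similarity matrix is computed offline from all users.
S = np.array([[cosine(R[:, i], R[:, j]) for j in range(n_items)]
              for i in range(n_items)])

user, k = 0, 3
unseen = np.where(R[user] == 0)[0]
scores = {}
for i in unseen:
    ranked = np.argsort(S[i])[::-1]                           # most similar first
    neigh = [j for j in ranked if j != i and R[user, j] > 0][:k]
    scores[i] = float(sum(S[i, j] * R[user, j] for j in neigh)
                      / sum(S[i, j] for j in neigh))
print(scores)  # predicted rating for each unseen item
```

Because item-item similarities change more slowly than user profiles, this matrix can be precomputed offline, which is one reason the item-based variant scales better in practice.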