
Machine Learning for Big Data

Alexander Gammerman

Computer Learning Research Centre, Royal Holloway, University of London

Trends in Big Data
STFC/RUSI: Big Data for Security and Resilience

March 7th, 2014


Layout

1 Debunking the myth

2 Machine Learning (Data Analytics)

3 Trends in Machine Learning for Big Data

4 Conclusions


“Fashionable” pursuit

AI, Cybernetics, Neural Networks, Expert Systems, Big Data?

Big Data, small data, any data: what we need is Data Analysis, or Data Analytics, or Machine Learning.


Machine Learning: what is it?

ML is the intersection of Statistics and Computer Science.

Statistics deals with inference: obtaining valid conclusions from data under various models and assumptions.

Computer Science considers what is computable, develops efficient algorithms, and is concerned with data storage and manipulation.

ML takes past data, “learns” from it by finding rules and regularities, and uses them to make predictions for future examples. Efficient algorithms have to be developed to make valid predictions.
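As a minimal sketch of "learning from past data to predict future examples", the following Python fits a straight line to past observations by ordinary least squares and applies it to a new input. The toy data and the names `fit_line` and `predict` are illustrative assumptions, not something taken from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b to past examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(model, x):
    """Apply the learned rule (slope, intercept) to a future example."""
    a, b = model
    return a * x + b


# Hypothetical past data generated by the rule y = 2x + 1.
past_x = [1.0, 2.0, 3.0, 4.0]
past_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(past_x, past_y)   # "learning" step
prediction = predict(model, 5.0)   # prediction for a future example
```

The "rule" here is just a slope and an intercept; any other learning method plays the same role of turning past examples into a predictor.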


Computer Learning Research Centre (CLRC) at Royal Holloway, University of London

Established in 1998 to develop machine learning theory, including the design of efficient algorithms for data analysis.

CLRC Fellows include several prominent researchers: Vapnik and Chervonenkis (the two founders of statistical learning theory), Shafer (co-founder of the Dempster-Shafer theory), Rissanen (inventor of the Minimum Description Length principle), and Levin (one of the three founders of the theory of NP-completeness, who also made fundamental contributions to Kolmogorov complexity).


Recent years have seen an explosion of interest in machine-learning methods, in particular statistical learning theory. Statistical learning theory has goals similar to those of statistical science, but

it is nonparametric and

concerned with the problem of prediction.


Problems and Current Techniques

Classical techniques: designed for small-scale, low-dimensional data. They run into conceptual and computational difficulties on high-dimensional data. Further problems: validity of predictions, confidence measures, online prediction.

Current techniques for the dimensionality problem: Support Vector Machines (Vapnik, 1995, 1998; Vapnik and Chervonenkis, 1974); Kernel Methods. New technique for the validity problem: Conformal Predictors.


Projects

Compact Descriptors for Automatic Target Identification (with QinetiQ).

Statistical profiling of offenders (with the Home Office).

Material identification with atmosphere corrections (with Waterfall Solutions).

Unmixing spectra (with QinetiQ).

Anomaly detection (vehicles) (with Thales).

Fault Diagnosis (with Marconi Instruments).


Projects – cont’d

Abdominal Pain (with Western General Hospital, Edinburgh).

Ovarian Cancer (with Institute for Women’s Health, UCL).

Depression (with Institute of Psychiatry, King's College).

Child Leukemia (with Royal London Hospital).

Heart Diseases (with Institute for Women’s Health, UCL).

Analysis of microarrays (with Veterinary Laboratory Agency, DEFRA).

Protein-Protein Interaction (EU project).


How much data do we need to answer our questions?

Big Data: V³

Volume: Gigabyte (10^9 bytes); Terabyte (10^12); Petabyte (10^15); Exabyte (10^18); Zettabyte (10^21).

Variety: structured, semi-structured, unstructured; text, image, audio, video.

Velocity: dynamic; time-varying, etc.

Plus: high-dimensionality

But: if the answer is a Zettabyte, what is the question?

The global data supply reached 2.8 zettabytes (ZB) in 2012, or 2.8 trillion GB, but just 0.5% of this is used for analysis, according to the Digital Universe Study. Volumes of data are projected to reach 40 ZB by 2020, or 5,247 GB per person.
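The unit arithmetic behind these figures can be checked in a few lines of Python. This is only a sanity check of the quoted numbers, assuming decimal (SI) units; the implied 2020 population is derived here from the two quoted figures and is not itself a number given in the talk.

```python
GB = 10**9    # bytes in a gigabyte (decimal units assumed)
ZB = 10**21   # bytes in a zettabyte

gb_per_zb = ZB // GB                      # 10**12, i.e. one trillion GB per ZB

supply_2012_gb = 2.8 * gb_per_zb          # 2.8 ZB expressed in GB
analysed_2012_zb = 2.8 * 0.005            # 0.5% of 2.8 ZB, in ZB
projected_2020_gb = 40 * gb_per_zb        # 40 ZB expressed in GB

# Population implied by "40 ZB, or 5,247 GB per person": roughly 7.6 billion.
implied_population = projected_2020_gb / 5247
```

So the analysed share in 2012 is about 0.014 ZB (14 exabytes), and the per-person figure is consistent with a world population of roughly 7.6 billion.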


We don’t need big data per se: we need to have a problem first, and then decide how much data we need to solve it.

If a child wants to learn the concept of a car, he or she doesn’t need a million or a billion cars to learn it; 10 or 100 are enough.

If we want to predict digits, we can learn from the first 100 or 1000 digits and then identify the next one confidently and with high accuracy.


Figure: USPS data


Figure: Conformal Predictors on USPS data: online cumulative multiple predictions at different confidence levels (“Hedging Predictions in Machine Learning” by A. Gammerman and V. Vovk, The Computer Journal (2007) 50 (2): 151-163).


In fact, this is a well-known concept in machine learning. In the past, people thought that the larger the training set, the more accurate the results. But the founders of statistical learning theory, V. Vapnik and A. Chervonenkis, showed that it is not just the length of the training data that matters: another characteristic, called “capacity”, is more important.
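One common way to state this, as it usually appears in the statistical learning theory literature (the slide gives no formula, so the notation here is an assumption): for a class of indicator functions with capacity (VC dimension) $h$ and a training set of length $\ell$, the following holds with probability at least $1 - \eta$, where $R$ is the expected risk and $R_{\mathrm{emp}}$ is the empirical risk on the training set.

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2\ell}{h} + 1\right) - \ln\frac{\eta}{4}}{\ell}}
```

The second term depends on $\ell$ only in combination with $h$: a long training set does not guarantee accurate predictions if the capacity of the function class is large relative to it.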


Trends in Machine Learning for Big Data

How do we make machine learning algorithms scale to large datasets? There are two main approaches: (1) developing parallelizable ML algorithms and integrating them with large parallel systems, and (2) developing more efficient algorithms.

Data growth is driving the need for parallel and online algorithms and models that can handle this “Big Data”.

We need to explore the computational foundations of performing these analyses on parallel and cloud architectures.


Large-scale modeling techniques and algorithms include

transductive and inductive models,

online compression models (extension of conformal predictors),

graphical models,

deep learning and semi-supervised learning algorithms,

clustering algorithms,

parallel learning algorithms.

The computational techniques provide a basic foundation in large-scale programming, ranging from the basic “parfor” to parallel abstractions such as MapReduce (Hadoop) and GraphLab.
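The map/reduce pattern mentioned here can be sketched in plain Python. The example below is an illustration under stated assumptions, not Hadoop or GraphLab code: a thread pool stands in for the cluster workers (a "parfor" over shards), the data are made up, and the names `map_shard`, `reduce_stats` and `fit_from_stats` are hypothetical. Each shard is summarised independently by additive sufficient statistics, which are then combined to fit a least squares line.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_shard(shard):
    """Map step: summarise one shard of (x, y) pairs by its sufficient statistics."""
    n = len(shard)
    sx = sum(x for x, _ in shard)
    sy = sum(y for _, y in shard)
    sxx = sum(x * x for x, _ in shard)
    sxy = sum(x * y for x, y in shard)
    return (n, sx, sy, sxx, sxy)


def reduce_stats(a, b):
    """Reduce step: the statistics are additive, so shards combine element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def fit_from_stats(stats):
    """Closed-form least squares slope and intercept from the combined statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data (y = 2x + 1), split into three shards as if held on three machines.
shards = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]

# The thread pool stands in for a "parfor" over shards.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_stats = list(pool.map(map_shard, shards))

slope, intercept = fit_from_stats(reduce(reduce_stats, partial_stats))
```

The design point is that no worker ever sees the whole dataset: only five numbers per shard travel to the reducer, which is what makes this kind of algorithm parallelizable.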


Transduction

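[Diagram: inductive learning leads from data (past examples) to general knowledge; deduction leads from general knowledge to the particular (future examples); transduction leads directly from past examples to future examples.]

Figure: Induction and Transduction [V. Vapnik, 1995]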


Why use conformal predictions?

Why, after 100 years of research in statistics, do we need yet another method of prediction?

It is simple and rigorous.

Given any of a wide range of learning/statistical prediction methods, conformal prediction can be used as a wrapper to provide a measure of confidence.

It is valid under weak assumptions.

It limits the fraction of prediction mistakes from the start. (Crudely, a predictor can either make a prediction or else say “don’t know”, possibly in a graded way, such as giving a wide prediction interval.)

It works in practice.
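The points above can be illustrated with a minimal sketch of a conformal predictor for classification. It assumes one-dimensional toy data and a nearest-neighbour nonconformity measure (distance to the nearest example with the same label divided by distance to the nearest example with a different label); the function names and data are hypothetical, and this is not claimed to be the implementation behind the USPS figure.

```python
def nonconformity(i, xs, ys):
    """How strange example i looks within the bag (xs, ys).

    Ratio of the distance to the nearest same-label example to the distance
    to the nearest other-label example. Larger means stranger.
    """
    same = [abs(xs[i] - xs[j]) for j in range(len(xs)) if j != i and ys[j] == ys[i]]
    other = [abs(xs[i] - xs[j]) for j in range(len(xs)) if ys[j] != ys[i]]
    d_same = min(same) if same else float("inf")
    d_other = min(other) if other else float("inf")
    if d_other == 0.0 or d_same == float("inf"):
        return float("inf")
    return d_same / d_other  # a finite value divided by inf gives 0.0


def p_values(train_x, train_y, new_x, labels):
    """Try each candidate label for new_x and compute its p-value."""
    result = {}
    for label in labels:
        xs = train_x + [new_x]
        ys = train_y + [label]
        scores = [nonconformity(i, xs, ys) for i in range(len(xs))]
        # Fraction of examples at least as strange as the new one.
        at_least_as_strange = sum(1 for s in scores if s >= scores[-1])
        result[label] = at_least_as_strange / len(xs)
    return result


def prediction_set(pvals, significance):
    """Keep every label whose p-value exceeds the significance level."""
    return {label for label, p in pvals.items() if p > significance}


# Hypothetical past examples: two well-separated classes on a line.
train_x = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
train_y = ["a", "a", "a", "a", "b", "b", "b", "b"]

pvals = p_values(train_x, train_y, 1.5, ["a", "b"])
set_80 = prediction_set(pvals, 0.20)   # 80% confidence
set_95 = prediction_set(pvals, 0.05)   # 95% confidence
```

With only nine examples in the bag the smallest possible p-value is 1/9, so at 95% confidence no label can be excluded and the predictor returns both, which is the graded "don't know" described above; at 80% confidence it commits to the single label "a". The error-rate guarantee relies on the examples being exchangeable, which is the weak assumption referred to in the list.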


Conclusions

“It took Deep Thought 7.5 million years to answer the ultimate question. As nobody knew what the ultimate question to Life, The Universe and Everything actually was, nobody knows what to make of the answer (42).”

Nowadays, as John Poppelaars noticed, many people think that Big Data will help to find the ultimate question.

But I already know that it is not Big Data, and the answer is not 42 but Machine Learning.
