Open Source Machine Learning: Open Source Probabilistic Network Library

Gary Bradski, Program Manager, Systems Technology Labs - Intel

Post on 20-Dec-2015


Page 1

Open Source Machine Learning

Open Source Probabilistic Network Library

Gary Bradski, Program Manager

Systems Technology Labs - Intel

Page 2

Intel® Confidential

Open Source ML

What are we announcing today?

• Intel is releasing a library of open source software for machine learning
• The first library is the Probabilistic Network Library (PNL), comprising code for inference and learning using Bayesian networks
• Research and development was conducted in Intel research labs in the US, Russia and China
• The software is released as part of the Intel Open Research Program
• Tool for research in many application areas
• Open source under a BSD license
• The code is free for academic and commercial use
• More info: http://www.intel.com/research/mrl/pnl

Page 3

Why is Intel involved?

• Statistical computing and machine learning can change computing applications in a considerable way
• Machine learning requires high-powered processors
• Ties into Intel’s research in other areas such as wireless networking, sensor networks and Proactive Health

Page 4

What is Machine Learning?

• Machine learning allows computers to learn from their experiences and from gathered data
• We’ve known for more than 200 years that probability theory is the right tool to model systems, but it has always been too hard to compute. Recent advances in computing allow calculation of complex models
• Machines are good at gathering data and performing complex analysis
• Machine learning is a sea change in the development of applications, since it allows computers to be more proactive and predictive

Page 5

Applications of Machine Learning

• Interface – Audio Visual Speech Recognition (AVSR), natural language processing, etc.
• AI – robotics, computer games, entertainment, etc.
• Data Analysis – information retrieval, data mining, etc.
• Biological – gene sequencing, genomics, computational pharmacology
• Computer – run time optimization
• Industrial – fault diagnosis

Applications of machine learning cover a broad range:

• Genomics – matching of protein strands
• Collaborative Filtering – personal “Google”
• Drug Discovery – shortening of the drug discovery cycle
• Patient and elder care – wireless camera and sensor networks help monitor patients

Page 6

Open ML Components & Plan

Key: • Optimized • Implemented • Not implemented

[Chart: methods placed on a grid with axes Modeless / Model based and Unsupervised / Supervised]

• K-means

• K-NN

• Boosted decision trees

• SVM

• Agglomerative clustering

• Spectral clustering

• BayesNets: Classification

• Decision trees

• BayesNets: Parameter fitting

• Dependency Nets

• PCA

• Influence diagrams

• Bayesnet structure learning

Statistical Learning: OpenSL - 2004

Bayesian Networks: OpenPNL - 2003

OpenML

Page 7

Model Based Machine Learning

• Machine learning can be based on models (model-based) or it can be model-less
• In version 1.0 of OpenML, Intel is focusing on Bayesian networks and probabilistic networks, which fall under the model-based category
• The Bayesian approach provides a mathematical rule explaining how one should change existing beliefs in the light of new evidence
• Model-less approaches are used for clustering and classification
• Intel will release libraries using model-less approaches next year
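The "mathematical rule" for changing beliefs in the light of new evidence is Bayes' rule: the posterior is proportional to the prior times the likelihood of the evidence. The sketch below is a minimal illustration in plain Python. It does not use PNL, and the function name `bayes_update` and all the numbers are hypothetical.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(h | e) for each hypothesis h.

    prior:      dict mapping hypothesis -> P(h)
    likelihood: dict mapping hypothesis -> P(e | h) for the observed evidence e
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(e), the normalizing constant
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical numbers: a machine is faulty 10% of the time; an alarm
# fires for 90% of faulty machines and for 20% of healthy ones.
prior = {"faulty": 0.1, "healthy": 0.9}
likelihood_alarm = {"faulty": 0.9, "healthy": 0.2}
posterior = bayes_update(prior, likelihood_alarm)
```

With these made-up numbers, seeing the alarm raises the belief in "faulty" from 0.1 to 0.09 / (0.09 + 0.18), that is, one third.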

Page 8

Applications of Model-less ML

• Suitable for applications such as fault diagnosis
• The system does not have a model
• It collects data, then clusters and classifies it
• Recognition is derived from these clusters

[Figure: Machine 18, Fab 11. Tolerance goes out when temperature > 87°]
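The model-less workflow listed above (collect data, cluster it, classify new readings by cluster) can be sketched with a plain k-means. This is only an illustration, not code from any Intel library. The readings are invented, and taking the first k points as initial centroids is a simplification chosen to keep the example deterministic.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))


def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length numeric tuples."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


# Hypothetical (temperature, tolerance error) readings from one machine.
readings = [(70, 0.1), (92, 0.9), (72, 0.2), (90, 0.8), (75, 0.1), (95, 1.0)]
centroids = kmeans(readings, k=2)
new_label = nearest((91, 0.85), centroids)  # cluster index for a new reading
```

No model of the machine is written down anywhere: a new reading is recognized only by which cluster of past readings it falls closest to.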

Page 9

Applications of Model-based ML

• Our research has focused on Bayesian networks
• Hidden Markov Models (HMMs), a kind of Bayesian net, are widely used in speech recognition; coupled Hidden Markov Models are used in Audio Visual Speech Recognition (the use of visual data in speech recognition)
• Open Source PNL is an optimized infrastructure for research and development in model-based machine learning

[Figures: Audio Visual Speech Recognition; Face Recognition & Tracking]

Page 10

Example: Vision Applications

Image super resolution: use a Bayesian method to develop a clear image from a low-resolution picture

Page 11

Intel Systems Technology Lab

• Santa Clara, CA, USA: Graphics Lab, Machine Learning, Architecture Lab
• Hillsboro, OR, USA: Wireless Systems, Media, 3D Graphics, Tech. Management
• Beijing, PR China: China Research Center, Speech and Machine Learning
• Nizhny Novgorod, Russia: Architecture for Machine Learning, Media, 3D Graphics, Computer Vision

• One of three major labs of Intel Corporate Technology Group
• 300 researchers worldwide
• Focus on impact on Intel Architecture
• Drive university and industry initiatives

Page 12

Why Open Source?

• Expands our research base
• Allows Intel researchers to collaborate easily with thousands of colleagues worldwide
• Remove barriers, speed up collaboration
• Tap into a very large innovative community
• Ability to get feedback from a large number of developers to design future microprocessors
• Chance to explore innovative usage models
• Diffuse new technologies and usage models to a wide group of early adopters

Page 13

Open Research Program

Currently four open source projects: http://www.intel.com/software/products/opensource/index.htm

• OpenCV – Computer Vision Library: http://www.intel.com/research/mrl/research/opencv/
• OpenRC – Open Research Compiler: http://ipf-orc.sourceforge.net/ORC-overview.htm
• OpenLF – Open Light Fields: http://www.intel.com/research/mrl/research/lfm/
• OpenAVSR – Audio Visual Speech Recognition: http://www.intel.com/research/mrl/research/avcsr.htm

Page 14

Example: OpenCV

• Released in June 2000
• A library of 500+ computer vision algorithms, including applications such as face recognition, face tracking, stereo vision and camera calibration
• Highly tuned for IA
• Windows and Linux versions
• Over 500,000 downloads
• Broad use in academia (450) and industry (360)

Page 15

More Information

Visit the Open Source ML web page and download at:

http://www.intel.com/research/mrl/pnl

Page 16

Backup

Page 17

Modeless and Model Based ML

We’ll use an example application from our current research to describe two basic approaches to machine learning:

• Modeless: classifiers, clustering, kernel estimators
• Model based: Bayesian networks, function fitters, regression, filters

[Figures: a tree splitting labeled data (AACACB | CBABBC, down to leaves such as AAA); a graph over nodes A, B and C]

Page 18

Quick view of Bayesian networks

Page 19

What is a Bayesian Network?

A Bayesian network, or belief network, is a graph in which the following holds:

• A set of random variables makes up the nodes of the network.
• A set of directed links connects pairs of nodes to denote causality relations between variables.
• Each node has a conditional probability distribution (CPD) that quantifies the effects that the parents have on the node.

Graphical models are more general, allowing undirected links, mixed directed/undirected connections, and loops within the graph.
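To make the definition concrete, here is a minimal sketch of a three-node network in plain Python. It does not use PNL's API. The structure (A with children B and C), the binary variables and every probability are assumptions chosen for illustration. The joint probability is the product of each node's CPD given its parents, and a query is answered by summing out the unobserved variable.

```python
from itertools import product

# Hypothetical CPDs for a network A -> B, A -> C with binary variables.
p_a = {True: 0.3, False: 0.7}                    # P(A)
p_b_given_a = {True: {True: 0.8, False: 0.2},    # P(B | A), indexed [a][b]
               False: {True: 0.1, False: 0.9}}
p_c_given_a = {True: {True: 0.6, False: 0.4},    # P(C | A), indexed [a][c]
               False: {True: 0.5, False: 0.5}}


def joint(a, b, c):
    """P(A=a, B=b, C=c) as the product of each node's CPD given its parents."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]


def posterior_a_given_b(b):
    """P(A | B=b), obtained by summing the joint over the unobserved C."""
    scores = {a: sum(joint(a, b, c) for c in (True, False)) for a in (True, False)}
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}


# Sanity check: the factored joint sums to 1 over all 8 assignments.
total = sum(joint(a, b, c) for a, b, c in product((True, False), repeat=3))
```

Enumeration like this is exponential in the number of variables, which is why practical libraries rely on dedicated inference algorithms instead.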

Page 20

Computational Advantages of Bayesian Networks

• Bayesian networks graphically express conditional independence of probability distributions.
• Independencies can be exploited for large computational savings.

EXAMPLE:

Joint probability of a system of 3 discrete variables (A, B, C) with 5 possible values each:

P(A,B,C) = 5x5x5 table: 125 parameters

But a graphical model factors the probabilities, taking advantage of the independencies: 55 parameters

[Figure: graphs over nodes A, B and C]
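The 125 versus 55 count can be checked by adding up table sizes. The sketch below assumes the factored graph is A with children B and C, so the tables are P(A), P(B|A) and P(C|A); that structure is an assumption, although it is consistent with the 55 on the slide. The helper names are hypothetical.

```python
def table_size(cardinalities):
    """Number of entries in a full table over variables with these cardinalities."""
    size = 1
    for n in cardinalities:
        size *= n
    return size


def factored_size(cardinality, parents):
    """Total CPD entries for a network given as {node: [parent, ...]}."""
    return sum(
        cardinality[node] * table_size(cardinality[p] for p in node_parents)
        for node, node_parents in parents.items()
    )


cardinality = {"A": 5, "B": 5, "C": 5}
parents = {"A": [], "B": ["A"], "C": ["A"]}  # assumed structure: A -> B, A -> C

full = table_size(cardinality.values())         # 5 * 5 * 5
factored = factored_size(cardinality, parents)  # 5 + 25 + 25
```

The gap widens quickly with more variables: the full table grows exponentially, while the factored form grows with the number of nodes and the size of each node's parent set.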

Page 21

Causality and Bayesian Nets

[Figure: circuit network with nodes Mains, Transf., Diode, Diode, Capac., Ammeter and Battery, each marked Observed or Un-Observed]

Think of Bayesian Networks as a “Circuit Diagram” of Probability Models

• The links indicate causal effect, not direction of information flow.
• Just as we can predict effects of changes on the circuit diagram, we can predict consequences of “operating” on our probability model diagram.

Page 22

Quick view of Decision Trees and Statistical Boosting

Page 23

Statistical Classification

Cluster data to infer or predict properties

Example: Decision trees

• Find splits that most “purify” the labeled data: AACBAABBCBCC splits into AACACB | CBABBC
• All the way down …
• Prune the tree to minimize complexity
• The split rules are used to classify future data

[Figure: the full tree grown from AACBAABBCBCC down to small leaves such as AAA, CCB and BC, followed by the pruned tree]
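A minimal sketch of "finding the split that most purifies the data" follows. The slide does not say which purity measure is meant; Gini impurity is used here as one common choice, and the function names are hypothetical. The label strings are the ones from the slide.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, larger for more mixed labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two children of a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Try each midpoint between sorted feature values; return (threshold, impurity)."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = split_impurity(left, right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


before = gini("AACBAABBCBCC")                   # impurity of the unsplit data
after = split_impurity("AACACB", "CBABBC")      # impurity after the slide's first split
```

Growing a tree repeats `best_threshold` on each child until the leaves are pure or too small; pruning then removes splits that add complexity without enough gain.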

Page 24

Statistical Classification: Boosting

• Use a weak classifier such as a 1-level tree: AACBAABBCBCC splits into AACACB | CBABBC
• Re-weight the error cases and classify again; record weight factor “Wi” for the “ith” case.
• Repeat until you have a “forest”
• Use the error-weighted forest to vote on the classification of new data

Decision1 * W1 + Decision2 * W2 + … + DecisionN * WN = Weighted Sum Decision

[Figure: a forest of 1-level trees, each splitting AACBAABBCBCC differently, for example AAAACB | CCBBBC, AACC | CCAABBBB, AAAABBBB | CCCC, AAAA | CBCCBBBC and AAABBB | ACCCCB]
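The boosting loop above can be sketched as AdaBoost with 1-level trees (stumps) on a single numeric feature. AdaBoost is an assumption here, since the slide does not name the exact algorithm, and the two-class labels (+1 / -1) and the data are invented. The value `alpha` plays the role of the slide's weight factor W.

```python
import math


def stump_predict(x, threshold, sign):
    """1-level tree: predict `sign` if x > threshold, otherwise -sign."""
    return sign if x > threshold else -sign


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, sign) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            error = sum(w for w, x, y in zip(weights, xs, ys)
                        if stump_predict(x, threshold, sign) != y)
            if best is None or error < best[0]:
                best = (error, threshold, sign)
    return best


def adaboost(xs, ys, rounds):
    """Return the forest as a list of (alpha, threshold, sign) stumps."""
    weights = [1.0 / len(xs)] * len(xs)
    forest = []
    for _ in range(rounds):
        error, threshold, sign = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)    # keep the log finite
        alpha = 0.5 * math.log((1 - error) / error)  # vote weight of this stump
        forest.append((alpha, threshold, sign))
        # Re-weight: misclassified cases gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return forest


def predict(forest, x):
    """Weighted-sum vote of all stumps; returns +1 or -1."""
    score = sum(alpha * stump_predict(x, t, s) for alpha, t, s in forest)
    return 1 if score > 0 else -1


# Hypothetical data that no single stump can classify correctly.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
forest = adaboost(xs, ys, rounds=3)
```

On this toy data each stump on its own misclassifies at least two points, yet the weighted vote of three stumps labels all six training points correctly.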

Page 25

Application areas and libraries

Page 26

Applications of ML

Key: Actively working on; External activity; Past work; Ramping

• Interface: Biometric ID, Lips+Speech AVSR, Vision Models, Speech, Audio Models, Text Recog., Natural Lang.
• AI: Action Planning, Cognitive Modeling, Game Play, Robotics, Mapping
• Data Analysis: Information Retrieval, Datamining, Sensor Fusion, Info Filtering, Collaborative Filtering
• Industrial: Fault Diagnosis, Process Control, Disposition, Supply Chain, Models of Manufacturing
• Biologic: Proteomics, Genomics, Metabolics, Gene Sequencing, Epidemiology, Computational Pharmacology
• Computer: Trace Compression, Compiler Optimization, Binary Trans Adaptation, Run Time Optimization

TOOLS: Neural Nets, SVM; Trees, Boosting, Random forest; Reinforcement Learning; Statistical Regression, ANOVA, …; Stochastic Discrimination; Adaptive Filters; Relational Networks; Decision Theory, Influence Diagrams; Graphical Models/MRFs; Bayesian Networks; Genetic Algorithms

Page 27

Probabilistic Network Library

[Diagram: application-driven work feeds a Bayesian Network Engine; workload analysis of the engine drives into future hardware]

• Application areas: Interface, Data Mining, “AI”, Industrial
• Applications shown: Game Play, Cognitive Modeling, Lips+Speech AVSR, Information Retrieval, Trace Compression, Learned Control, Vision Models, Gene Sequencing, Epidemiology, Genomics, Robotics, Info Filtering, Speech, Audio Models, Natural Lang., Biometric ID
• Industrial: Process Control, Disposition, Supply Chain, Fault Diagnosis, Models of Manufacturing
• Bayesian Network Engine, theories & algorithms: Structure Learning; Decision & Utility theory; Dynamic BN, MRFs; Gibbs Sampling, Particle Filter; Junction Tree, Factor Graph; EM, Reinforcement; Loopy Belief, Variational; Data Handling, Cross Validation, Plates
• Workload Analysis and Architecture: Intel, Universities
• Drive into hardware: Chipset, Platform, CPU, Instructions, cache; Create New Architectures, Modify Existing Architectures

Page 28

Open Source Computer Vision (OpenCV)

Page 29

Machine Learning Library (OpenMLL)

[Figure: decision tree splitting the labeled set AACBAABBCBCC]

CLASSIFICATION / REGRESSION
• CART
• Statistical Boosting
• MART
• Random Forests
• Stochastic Discrimination
• Logistic
• SVM
• K-NN

CLUSTERING
• K-Means
• Spectral Clustering
• Agglomerative Clustering
• LDA, SVD, Fisher Discriminant

TUNING/VALIDATION
• Cross validation
• Bootstrapping
• Sampling methods

Alpha Q1’04, Beta Q4’04
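As a small illustration of the cross validation item under TUNING/VALIDATION, the sketch below splits sample indices into k folds, each used once as the test set. It is a generic plain-Python sketch, not OpenMLL code, and the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k roughly equal, contiguous folds."""
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

A model would be trained on each `train` list and scored on the matching `test` list, with the k scores averaged. In practice the data is usually shuffled first, which this sketch leaves out.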

Page 30

Optimization (Lib ?)

[Chart: taxonomy of optimization problems and methods]

• Large-scale Optimizations: Continuous; Mixed Discrete
• Continuous: Constrained (Linear, Nonlinear); Unconstrained (Nonlinear)
• Problem classes: LP, QP, NLP
• Methods: Interior Point; Active Set; Branch and Bound; Conjugate Gradient, Newton; SQP; Simplex; Sim. Annealing, Genetic Alg., Stoch. Search; Network Programming, Dynamic Programming
• Combinatorial Optimizations: Domain Reduction, Constraints Propagation

Problems looking at: circuit layout; device geometry; chemical binding synthesis