

Source: people.cs.aau.dk/~tdn/papers/EDFposter.pdf (2017-07-31)

A Java Toolbox for the Analysis of MassIve Data STreams using Probabilistic Graphical Models

Description

We present a Java toolbox for scalable probabilistic machine learning.

Probabilistic machine learning

Model your problem using a flexible probabilistic language based on graphical models. Then, fit it with data using a Bayesian approach to handle modelling uncertainty.
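As an illustration of what fitting a model with a Bayesian approach means, here is a minimal sketch of a conjugate Beta-Bernoulli update in plain Java. It is not taken from the AMIDST toolbox and does not use its API; the class name `BetaBernoulliSketch` and the example data are assumptions made for illustration. The posterior keeps a full distribution over the unknown success probability, so its variance gives a measure of the remaining uncertainty.

```java
/** Conjugate Beta-Bernoulli update: an illustrative sketch, not the AMIDST API. */
final class BetaBernoulliSketch {
    final double alpha; // pseudo-count of successes
    final double beta;  // pseudo-count of failures

    BetaBernoulliSketch(double alpha, double beta) {
        this.alpha = alpha;
        this.beta = beta;
    }

    /** Returns the posterior after observing a batch of binary outcomes. */
    BetaBernoulliSketch update(boolean[] batch) {
        int successes = 0;
        for (boolean outcome : batch) {
            if (outcome) {
                successes++;
            }
        }
        return new BetaBernoulliSketch(alpha + successes, beta + (batch.length - successes));
    }

    /** Posterior mean of the success probability. */
    double mean() {
        return alpha / (alpha + beta);
    }

    /** Posterior variance: smaller values mean less uncertainty about the parameter. */
    double variance() {
        double n = alpha + beta;
        return alpha * beta / (n * n * (n + 1.0));
    }

    public static void main(String[] args) {
        // Hypothetical data: a uniform Beta(1, 1) prior and four observations.
        BetaBernoulliSketch prior = new BetaBernoulliSketch(1.0, 1.0);
        BetaBernoulliSketch posterior = prior.update(new boolean[] {true, true, false, true});
        System.out.printf("mean=%.3f variance=%.4f%n", posterior.mean(), posterior.variance());
    }
}
```

The update returns a new object rather than mutating the prior, so the same prior can be reused with different data sets.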


Multi-core and distributed processing

AMIDST provides tailored parallel and distributed implementations of Bayesian parameter learning for batch and streaming data. This processing is based on flexible and scalable message passing algorithms.

Main Features

Probabilistic Graphical Models

Specify your model using probabilistic graphical models with latent variables and temporal dependencies.

Scalable inference

Perform inference on your probabilistic models with powerful approximate and scalable algorithms.

Data Streams

Update your models when new data is available. Appropriate for learning from data streams.
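The idea of updating a model whenever new data arrives can be sketched with a textbook example: Bayesian estimation of a Gaussian mean with known noise variance, where the posterior after one batch serves as the prior for the next. This is a simplified stand-in in plain Java, not the toolbox's own learning algorithm or API; the class name `StreamingGaussianMeanSketch` and all numbers are hypothetical.

```java
/** Streaming Bayesian estimate of a Gaussian mean (known noise variance). Illustrative only. */
final class StreamingGaussianMeanSketch {
    private double mean;                 // current posterior mean of the unknown mean
    private double precision;            // current posterior precision (1 / variance)
    private final double noiseVariance;  // assumed known observation noise variance

    StreamingGaussianMeanSketch(double priorMean, double priorVariance, double noiseVariance) {
        this.mean = priorMean;
        this.precision = 1.0 / priorVariance;
        this.noiseVariance = noiseVariance;
    }

    /** Folds one batch into the posterior; the result is the prior for the next batch. */
    void update(double[] batch) {
        double sum = 0.0;
        for (double x : batch) {
            sum += x;
        }
        double newPrecision = precision + batch.length / noiseVariance;
        mean = (precision * mean + sum / noiseVariance) / newPrecision;
        precision = newPrecision;
    }

    double mean() {
        return mean;
    }

    double variance() {
        return 1.0 / precision;
    }

    public static void main(String[] args) {
        // Hypothetical stream delivered in two batches.
        StreamingGaussianMeanSketch model = new StreamingGaussianMeanSketch(0.0, 10.0, 1.0);
        model.update(new double[] {1.0, 2.0});
        model.update(new double[] {3.0, 4.0});
        System.out.printf("mean=%.3f variance=%.3f%n", model.mean(), model.variance());
    }
}
```

Because the update only needs the current posterior and the new batch, earlier data never has to be stored or revisited, which is what makes this style of learning suitable for streams.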


Large-scale Data

Use your defined models to process massive data sets in a distributed computer cluster using Flink or Spark.

Researchers

Flexible toolbox for researchers performing their experimentation in machine learning.

Interoperability

Leverage existing functionalities and algorithms by interfacing to other software tools.

Real-world and highly complex use cases

Risk prediction in credit operations

Financial data is collected continuously and reported on a monthly basis. We explore it as an evolving classification problem and co. This work has been performed in collaboration with one of our partners, the Spanish bank BCC.

Recognition of traffic maneuvers

Prototype models for early recognition of traffic maneuver intentions. Data is continuously collected by on-board car sensors, giving rise to a large and quickly evolving data stream. This work has been performed in collaboration with one of our partners, DAIMLER.

Identifying global trends in the financial sector

Individual trends for financial indicators

We can identify seasonal drifts and gradual changes.

[Figure: individual trends. Legend: E[H_{1,t}] credit, E[H_{2,t}] income, E[H_{3,t}] expenses, E[H_{4,t}] balance, E[H_{5,t}] mortgages, E[H_{6,t}] loans.]

Global trend for financial indicators

Highly correlated with the unemployment rate (UR). Pearson's correlation coefficient is 0.96.

[Figure: global trend. Legend: UR, E[H_t]. Time axis: Apr 2007 to Dec 2013.]
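For readers who want to see how a Pearson correlation coefficient between two series, such as a global trend and an unemployment rate, is computed, the following plain-Java sketch applies the standard formula (covariance divided by the product of standard deviations). The class name `PearsonSketch` and the sample values are hypothetical; they are not the poster's data and will not reproduce the reported 0.96.

```java
/** Pearson correlation coefficient between two equally long series. Illustrative only. */
final class PearsonSketch {

    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("series must have equal length of at least 2");
        }
        int n = x.length;

        // First pass: sample means.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Second pass: centered sums of products and squares.
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // A constant series has zero variance, in which case the result is NaN.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Hypothetical monthly values, made up for this example.
        double[] trend = {0.10, 0.15, 0.22, 0.30, 0.41};
        double[] unemployment = {8.0, 9.1, 11.4, 13.9, 17.2};
        System.out.println("r = " + pearson(trend, unemployment));
    }
}
```

The coefficient ranges from -1 to 1, so a value of 0.96 indicates a strong positive linear relationship between the two series.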

Team

Andrés Masegosa, Developer, NTNU
Ana M. Martínez, Developer, Aalborg University
Darío Ramos-Lopez, Developer, University of Almeria
Rafael Cabañas, Developer, Aalborg University
Thomas D. Nielsen, Scientific Member, Aalborg University
Helge Langseth, Scientific Member, NTNU
Antonio Salmerón, Scientific Member, University of Almeria
Anders L. Madsen, Scientific Member, HUGIN EXPERT S/A

Academic and Industrial partners

[Figure: partner logos]

Contact

• Visit http://amidst.github.io/toolbox/ to sign up to join the growing AMIDST community or download the software.

The AMIDST project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 619209. More information at amidst.eu.