Debellor Data Mining Platform with Stream Architecture Marcin Wojnarski Warsaw University, Poland

Debellor: Data Mining Platform with Stream Architecture

Marcin Wojnarski

Warsaw University, Poland

Outline

Debellor – data mining platform

Motivation

Main features

Architecture: Cell, data streaming, multi-threading

Available in ver. 0.6

Future releases

Summary

Debellor

Language: Java

Licence: open source (GPL)

Download: www.debellor.org

Debello – to conquer (Latin). Debellor – conqueror of data

Debellor – data mining platform

[Diagram: Debellor as a common platform over Rseslib, Weka, TA-Lib, LibSVM, own…]

Motivation

Demand for more complex algorithms.

Necessity to combine elementary algorithms.

Motivation

1. Data Processing Network (DPN)

[Diagram: a network of connected cells labelled Load, Preprocess, Predict, Preprocess, Save, Load, Visualize]

Motivation

2. Committee of algorithms

[Diagram: Classifier A, Classifier B and Classifier C feeding a Voting cell]
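
A committee like the one above can be sketched in a few lines of plain Java. This is only an illustration of majority voting, not Debellor code: the class `CommitteeSketch`, the method `vote` and the three threshold "classifiers" are hypothetical names invented for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a voting committee; not part of Debellor.
class CommitteeSketch {

    /** Returns the label predicted by most members; on a tie, the label that reached the top count first wins. */
    static <X> String vote(List<Function<X, String>> members, X input) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Function<X, String> member : members) {
            String label = member.apply(input);
            int c = counts.merge(label, 1, Integer::sum); // increment this label's vote count
            if (c > bestCount) {
                bestCount = c;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three toy threshold rules standing in for Classifier A, B and C.
        List<Function<Double, String>> committee = List.of(
            x -> x > 0.5 ? "pos" : "neg",
            x -> x > 0.3 ? "pos" : "neg",
            x -> x > 0.8 ? "pos" : "neg");
        System.out.println(vote(committee, 0.6)); // two of three say "pos"
        System.out.println(vote(committee, 0.4)); // two of three say "neg"
    }
}
```

The point of the sketch is that the voting step only needs a uniform "predict" interface from its members, which is the kind of composition the slide asks a platform to support.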

Motivation

3. Nested algorithms

[Diagram: K-means nested inside an RBF neural network]

Requirements

Versatile

Efficient

Simple

Features of Debellor

All types of data processing algorithms

Extendible data types

Stream architecture → large data sets

Multi-threading

Immutability of data objects → safety
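
The link between immutability and safety can be illustrated with a small value class. The sketch below is an assumption about what an immutable data object might look like in Java; `ImmutableSampleSketch` and its methods are hypothetical and are not Debellor's own `Sample` class.

```java
// Hypothetical immutable data object; not Debellor's Sample class.
final class ImmutableSampleSketch {
    private final double[] features;
    private final String label;

    ImmutableSampleSketch(double[] features, String label) {
        this.features = features.clone(); // defensive copy: later changes to the caller's array have no effect
        this.label = label;
    }

    double feature(int i) { return features[i]; }
    int size() { return features.length; }
    String label() { return label; }

    /** "Modification" returns a new object; the original stays untouched. */
    ImmutableSampleSketch withLabel(String newLabel) {
        return new ImmutableSampleSketch(features, newLabel);
    }

    public static void main(String[] args) {
        double[] raw = {1.0, 2.0};
        ImmutableSampleSketch s = new ImmutableSampleSketch(raw, null);
        raw[0] = 99.0;                                    // the caller mutates its own array
        ImmutableSampleSketch t = s.withLabel("A");
        System.out.println(s.feature(0));                 // still 1.0
        System.out.println(s.label() + " " + t.label());  // null A
    }
}
```

Because no method can change an existing object, the same sample can be handed to several cells or threads without copying or locking.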

Debellor

Algorithm Cell

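[Diagram: a single cell]

Cell cell = new RseslibClassifier("C45");   // wrap the Rseslib classifier named "C45" in a cell
cell.set("pruning", "true");                // set a parameter of the cell by name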

Cell – data source

[Diagram: a cell acting as a data source]

cell.open();
Sample s1 = cell.next(),
       s2 = cell.next(),
       ...
cell.close();

Cell – data receiver

cell.setSource(anotherCell);

[Diagram: anotherCell connected as the data source of cell]
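
To make the source and receiver roles concrete, here is a minimal self-contained sketch of two chained cells. It mimics the `open()`, `next()`, `close()` and `setSource()` calls shown on the slides, but everything in it is an assumption: the `Cell` interface, `ListSource`, `Scaler` and the `served` counter are hypothetical stand-ins and do not reproduce Debellor's real classes or signatures.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for cells; not Debellor's actual API.
class CellChainSketch {

    /** Minimal cell protocol: open, pull samples one by one, close. */
    interface Cell {
        void open();
        Double next();   // returns null when the stream is exhausted
        void close();
    }

    /** A data source that serves values from an in-memory list. */
    static class ListSource implements Cell {
        private final List<Double> data;
        private Iterator<Double> it;
        int served = 0;  // how many samples were actually requested from this source

        ListSource(List<Double> data) { this.data = data; }

        public void open() { it = data.iterator(); }

        public Double next() {
            if (it == null || !it.hasNext()) return null;
            served++;
            return it.next();
        }

        public void close() { it = null; }
    }

    /** A receiver: pulls one sample from its source per request and scales it. */
    static class Scaler implements Cell {
        private final double factor;
        private Cell source;

        Scaler(double factor) { this.factor = factor; }

        void setSource(Cell source) { this.source = source; }

        public void open() { source.open(); }

        public Double next() {
            Double x = source.next();
            return x == null ? null : x * factor;
        }

        public void close() { source.close(); }
    }

    public static void main(String[] args) {
        ListSource anotherCell = new ListSource(List.of(1.0, 2.0, 3.0));
        Scaler cell = new Scaler(10.0);
        cell.setSource(anotherCell);

        cell.open();
        System.out.println(cell.next());        // 10.0
        System.out.println(cell.next());        // 20.0
        cell.close();
        System.out.println(anotherCell.served); // 2: the third sample was never requested
    }
}
```

In this sketch the receiver drives the computation: the source produces a sample only when the downstream cell asks for one.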

Trainable Cell

cell.setSource(…);
cell.learn();

[Diagram: learn() takes the cell from state EMPTY to state TRAINED]
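
The EMPTY to TRAINED transition can be sketched with a toy trainable cell. The example below is hypothetical: `MeanCenterer`, its `apply` method and the `State` enum are invented here to illustrate the idea, and only the names `setSource` and `learn` are borrowed from the slide.

```java
import java.util.List;

// Hypothetical trainable cell; not Debellor's actual API.
class TrainableCellSketch {

    enum State { EMPTY, TRAINED }

    /** Learns the mean of a numeric stream, then centers new values around it. */
    static class MeanCenterer {
        private State state = State.EMPTY;
        private Iterable<Double> source;
        private double mean;

        void setSource(Iterable<Double> source) { this.source = source; }

        /** Consumes the source once, keeping only a running sum and a count. */
        void learn() {
            if (source == null) throw new IllegalStateException("no source set");
            double sum = 0.0;
            long n = 0;
            for (double x : source) {
                sum += x;
                n++;
            }
            if (n == 0) throw new IllegalStateException("empty training stream");
            mean = sum / n;
            state = State.TRAINED;
        }

        double apply(double x) {
            if (state != State.TRAINED) {
                throw new IllegalStateException("cell is EMPTY; call learn() first");
            }
            return x - mean;
        }

        State state() { return state; }
    }

    public static void main(String[] args) {
        MeanCenterer cell = new MeanCenterer();
        System.out.println(cell.state());       // EMPTY
        cell.setSource(List.of(2.0, 4.0, 6.0));
        cell.learn();
        System.out.println(cell.state());       // TRAINED
        System.out.println(cell.apply(5.0));    // 1.0
    }
}
```

Guarding `apply` with the state check makes the two-phase life cycle explicit: a cell that has not learned anything refuses to process data.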

Data Streaming

[Diagram: cells A and B connected in two modes, BATCH and STREAM]

It is the cell itself that is responsible for asking for data.

Benefits of streaming

[Diagram: labels "X", "X" and "crash!"; caption: training of k-means]
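
The slides do not say how Debellor's stream-based k-means works, so the sketch below swaps in a well-known alternative: sequential (MacQueen-style) k-means, which updates the nearest center after every sample. It is shown only to illustrate why a streaming algorithm can train on data that does not fit in memory; `StreamingKMeansSketch`, `fit` and `nearest` are hypothetical names.

```java
import java.util.Iterator;
import java.util.List;

// Sequential (MacQueen-style) k-means as an illustration; not Debellor's implementation.
class StreamingKMeansSketch {

    /** One pass over the stream; memory use is O(k * d), independent of the stream length. */
    static double[][] fit(Iterator<double[]> stream, int k) {
        double[][] centers = new double[k][];
        long[] counts = new long[k];
        int filled = 0;
        while (stream.hasNext()) {
            double[] x = stream.next();
            if (filled < k) {                 // the first k samples seed the centers
                centers[filled] = x.clone();
                counts[filled] = 1;
                filled++;
                continue;
            }
            int j = nearest(centers, x);
            counts[j]++;
            for (int d = 0; d < x.length; d++) {
                // incremental mean update: c += (x - c) / n
                centers[j][d] += (x[d] - centers[j][d]) / counts[j];
            }
        }
        if (filled < k) throw new IllegalArgumentException("stream has fewer than k samples");
        return centers;
    }

    /** Index of the center with the smallest squared Euclidean distance to x. */
    static int nearest(double[][] centers, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centers.length; j++) {
            double dist = 0.0;
            for (int d = 0; d < x.length; d++) {
                double diff = x[d] - centers[j][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> data = List.of(
            new double[]{0.0}, new double[]{10.0},
            new double[]{1.0}, new double[]{11.0}, new double[]{2.0});
        double[][] c = fit(data.iterator(), 2);
        System.out.println(c[0][0] + " " + c[1][0]); // 1.0 10.5
    }
}
```

Each sample is discarded as soon as it has updated a center, so only the k centers and their counts are ever held in memory.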

Multi-threading

[Diagram: cells A and B, both in Thread_1]

Multi-threading

A.newThread();

[Diagram: cells A and B with two threads, Thread_1 and Thread_2]
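
One way two connected cells could run in separate threads is through a bounded buffer between them. The sketch below is an assumption about the general technique, not a description of what `A.newThread()` does internally; `ThreadedCellSketch`, `run` and the `END` marker are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical producer/consumer sketch; not Debellor's threading code.
class ThreadedCellSketch {

    private static final Integer END = Integer.MIN_VALUE; // assumed end-of-stream marker

    /** Runs producer "A" in its own thread; the calling thread plays consumer "B". */
    static List<Integer> run(int n, int bufferSize) throws InterruptedException {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(bufferSize);

        Thread a = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) {
                    buffer.put(i * i);        // blocks while the buffer is full
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "Thread_2");
        a.start();

        List<Integer> received = new ArrayList<>();
        while (true) {
            Integer x = buffer.take();        // blocks until A has produced something
            if (x.equals(END)) break;
            received.add(x);
        }
        a.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(5, 2)); // [1, 4, 9, 16, 25]
    }
}
```

The bounded queue keeps the streaming property: A can run ahead of B by at most `bufferSize` samples, so memory use stays constant however long the stream is.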

Available in version 0.6

Rseslib algorithms: classifiers (~20 algorithms)

Weka algorithms: ARFF reader, classifiers (~60), filters (47)

Debellor algorithms: Train&Test evaluation, k-means for large data (stream-based)

Data types: numeric and symbolic features, vectors of features, vectors of vectors of …

Future releases

Multi-input & multi-output cells

Composite cells (e.g. meta-learning)

Serialization and copying

Summary

Platform

Stream architecture

Extendible

Multi-threaded

Weka & Rseslib partially integrated

www.debellor.org
