Debellor: Data Mining Platform with Stream Architecture
Marcin Wojnarski
Warsaw University, Poland
2
Outline
Debellor – data mining platform
Motivation
Main features
Architecture: Cell, data streaming, multi-threading
Available in ver. 0.6
Future releases
Summary
3
Language: Java
Licence: open source (GPL)
Download: www.debellor.org
Debello – to conquer (Latin). Debellor – conqueror of data
Debellor
4
Debellor – data mining platform
[Diagram labels: Debellor; Rseslib, Weka, TA-Lib, LibSVM, own…, own…]
5
Motivation
Demand for more complex algorithms.
Necessity to combine elementary algorithms.
6
Motivation
1. Data Processing Network (DPN)
[Diagram nodes: Load, Preprocess, Predict, Preprocess, Save, Load, Visualize]
7
Motivation
2. Committee of algorithms
[Diagram: Classifier A, Classifier B, Classifier C, Voting]
8
Motivation
3. Nested algorithms
RBF neural network
K-means
9
Requirements
Versatile
Efficient
Simple
10
Features of Debellor
All types of data processing algorithms
Extendible data types
Stream architecture: handles large data sets
Multi-threading
Immutability of data objects: safety
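The immutability of data objects listed among the features can be illustrated with a small sketch. `ImmutableSample` and its methods are hypothetical names chosen for this example; Debellor's own `Sample` class may be designed differently.

```java
// Hypothetical immutable sample; not Debellor's actual Sample class.
final class ImmutableSample {
    private final double[] features;
    private final String label;

    ImmutableSample(double[] features, String label) {
        this.features = features.clone();   // defensive copy on the way in
        this.label = label;
    }

    double[] features() { return features.clone(); }  // defensive copy on the way out
    String label() { return label; }

    // "Modification" returns a new object and leaves this one untouched.
    ImmutableSample withLabel(String newLabel) {
        return new ImmutableSample(features, newLabel);
    }
}
```

Because no caller can change a sample after construction, the same object can be passed to several algorithms, or across threads, without copying it first.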
11
Debellor
12
Algorithm Cell
cell
Cell cell = new RseslibClassifier("C45");
cell.set("pruning", "true");
13
Cell – data source
cell
cell.open();
Sample s1 = cell.next(),
s2 = cell.next(),
...
cell.close();
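The open/next/close protocol above can be sketched in plain Java as follows. `SampleSource`, `ListSource` and `SourceDemo` are hypothetical stand-ins, not Debellor classes, and the convention that `next()` returns `null` at the end of the stream is an assumption of this sketch.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a stream-style data source; not Debellor's actual Cell class.
interface SampleSource {
    void open();
    double[] next();   // returns null when the stream is exhausted (assumption of this sketch)
    void close();
}

// A source that serves samples from an in-memory list, one at a time.
final class ListSource implements SampleSource {
    private final List<double[]> data;
    private Iterator<double[]> it;

    ListSource(List<double[]> data) { this.data = data; }

    @Override public void open() { it = data.iterator(); }

    @Override public double[] next() {
        if (it == null) throw new IllegalStateException("source is not open");
        return it.hasNext() ? it.next() : null;
    }

    @Override public void close() { it = null; }
}

final class SourceDemo {
    // Counts samples by pulling them one at a time between open() and close().
    static int count(SampleSource src) {
        src.open();
        int n = 0;
        try {
            while (src.next() != null) n++;
        } finally {
            src.close();
        }
        return n;
    }
}
```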
14
Cell – data receiver
cell
cell.setSource(anotherCell);
anotherCell
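Connecting one cell to another with `setSource` might look like the sketch below. `PipeCell`, `NumberLoader` and `Scaler` are hypothetical classes written for this illustration, and using `null` as the end-of-stream marker is an assumption; only the `setSource`, `open`, `next` and `close` method names come from the slides.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal cell: both a source (open/next/close) and a receiver (setSource).
abstract class PipeCell {
    protected PipeCell source;
    void setSource(PipeCell source) { this.source = source; }
    void open()  { if (source != null) source.open(); }
    void close() { if (source != null) source.close(); }
    abstract Double next();  // null marks end of stream (assumption)
}

// A cell without a source: serves numbers from a list.
final class NumberLoader extends PipeCell {
    private final List<Double> values;
    private Iterator<Double> it;
    NumberLoader(List<Double> values) { this.values = values; }
    @Override void open() { it = values.iterator(); }
    @Override Double next() { return it.hasNext() ? it.next() : null; }
    @Override void close() { it = null; }
}

// A cell that transforms whatever its source delivers.
final class Scaler extends PipeCell {
    private final double factor;
    Scaler(double factor) { this.factor = factor; }
    @Override Double next() {
        Double x = source.next();   // pull one sample from the upstream cell
        return x == null ? null : x * factor;
    }
}
```

Opening the downstream cell opens its source as well, so the caller only has to deal with the last cell in the chain.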
15
Trainable Cell
cell
cell.setSource(…);
cell.learn();
cell
EMPTY
TRAINED
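The EMPTY to TRAINED transition can be sketched with a toy component that learns a single parameter. `MeanCenterer` is a hypothetical example, not a Debellor cell, and its `learn(Iterable)` signature differs from the slide's `cell.learn()`, which reads from a previously set source.

```java
// Hypothetical sketch of a trainable component with the two states shown on the slide.
final class MeanCenterer {
    enum State { EMPTY, TRAINED }

    private State state = State.EMPTY;
    private double mean;

    State state() { return state; }

    // Consumes the training data once and stores the learned parameter.
    void learn(Iterable<Double> trainingData) {
        double sum = 0;
        long n = 0;
        for (double x : trainingData) { sum += x; n++; }
        if (n == 0) throw new IllegalArgumentException("no training data");
        mean = sum / n;
        state = State.TRAINED;
    }

    // Applying the component before training is an error in this sketch.
    double apply(double x) {
        if (state != State.TRAINED) throw new IllegalStateException("cell is not trained");
        return x - mean;
    }
}
```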
16
Data Streaming
[Diagram: cells A and B, shown once in BATCH mode and once in STREAM mode]
The cell itself is responsible for requesting data.
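A pull-based stream, where the consumer decides when data is requested, can be sketched with a standard Java `Iterator`. `CountingProducer` and `PullDemo` are hypothetical names; the sketch only shows that samples are created at request time, so a consumer that needs three samples never causes the rest to be produced.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: a producer that creates samples only when the consumer asks.
final class CountingProducer implements Iterator<Integer> {
    private final int limit;
    private int produced = 0;

    CountingProducer(int limit) { this.limit = limit; }

    @Override public boolean hasNext() { return produced < limit; }

    @Override public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return produced++;           // the sample is created at request time
    }

    int produced() { return produced; }
}

final class PullDemo {
    // Stream-style consumer: pulls only the first n samples and sums them.
    static int sumFirst(Iterator<Integer> source, int n) {
        int sum = 0;
        for (int i = 0; i < n && source.hasNext(); i++) sum += source.next();
        return sum;
    }
}
```

In a batch design the producer would have to materialize all of its output before the consumer starts, regardless of how much of it is used.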
17
Benefits of streaming
X X
crash!
training of k-means
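To illustrate why streaming helps when training k-means on large data, here is a minimal sketch of sequential (online) k-means on one-dimensional data. It is a textbook one-pass variant chosen for brevity and is not necessarily the algorithm Debellor implements; `StreamingKMeans` and its methods are hypothetical names.

```java
import java.util.Iterator;

// Hypothetical sketch: one-pass (sequential) k-means on 1-D data pulled from an iterator.
final class StreamingKMeans {
    private final double[] centers;
    private final long[] counts;

    StreamingKMeans(double[] initialCenters) {
        this.centers = initialCenters.clone();
        this.counts = new long[initialCenters.length];
    }

    // Pulls samples one at a time; memory use does not grow with the stream length.
    void learn(Iterator<Double> stream) {
        while (stream.hasNext()) {
            double x = stream.next();
            int j = nearest(x);
            counts[j]++;
            centers[j] += (x - centers[j]) / counts[j];  // running mean of assigned samples
        }
    }

    // Index of the center closest to x.
    int nearest(double x) {
        int best = 0;
        for (int j = 1; j < centers.length; j++) {
            if (Math.abs(x - centers[j]) < Math.abs(x - centers[best])) best = j;
        }
        return best;
    }

    double[] centers() { return centers.clone(); }
}
```

Only the centers and their counts are kept in memory, so the size of the data set does not affect the memory footprint.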
18
Thread_1
Multi-threading
A B
19
Thread_1
Multi-threading
A.newThread();
A B
Thread_2
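Running cell A in its own thread while cell B keeps pulling data can be sketched with a bounded queue from `java.util.concurrent`. How `A.newThread()` is actually implemented in Debellor is not stated in the slides; `ThreadedPipe`, the buffer size of 16 and the `END` marker are assumptions of this sketch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: cell A produces in its own thread, cell B consumes in the caller's thread.
final class ThreadedPipe {
    private static final int END = -1;  // end-of-stream marker (assumption: real samples are >= 0)

    static long sumInTwoThreads(int sampleCount) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(16);

        // "Cell A": generates samples 0..sampleCount-1 in a separate thread.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < sampleCount; i++) buffer.put(i);  // blocks when the buffer is full
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // "Cell B": consumes samples in the calling thread until the marker arrives.
        try {
            long sum = 0;
            for (int x = buffer.take(); x != END; x = buffer.take()) sum += x;
            producer.join();
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
    }
}
```

The bounded buffer keeps the producer from running arbitrarily far ahead of the consumer, which preserves the low memory use of the stream design.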
20
Available in version 0.6
Rseslib algorithms: classifiers (~20 algorithms)
Weka algorithms: ARFF reader, classifiers (~60), filters (47)
Debellor algorithms: Train&Test evaluation, k-means for large data (stream-based)
Data types: numeric and symbolic features; vectors of features, vectors of vectors of …
21
Future releases
Multi-input & multi-output cells
Composite cells (e.g. meta-learning)
Serialization and copying
…
22
Summary
Platform
Stream architecture
Extendible
Multi-threaded
Weka & Rseslib partially integrated
23
www.debellor.org
Home
24