What is Jubatus? How it works for you? NTT SIC Hiroki Kumazaki


Post on 22-Jun-2015

Page 1: What is jubatus? How it works for you?

What is Jubatus? How it works for you?

NTT SIC Hiroki Kumazaki

Page 2

Jubatus is…
• A Distributed Online Machine-Learning framework
• Distributed
  – Fault-Tolerance
  – Scale out
• Online
  – Fixed time computation
• Machine-Learning
  – More than “word count”!

Page 3

Architecture
• ML model is combined with feature-extractor

[Figure labels: Jubatus RPC, Jubatus Server, Feature Extractor, Machine Learning Model]

Page 4

Architecture
• Distributed Computation
  – Shared-Everything Architecture
    • It’s fast and fault-tolerant!

[Figure label: Mix]
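The slide names the mix step without detailing it. As a rough sketch only, assuming each server trains a linear model locally and a mix replaces every local model with the average of all of them (the function `mix` and the dict-of-weights model are illustrative, not Jubatus internals):

```python
def mix(models):
    """Average sparse weight dicts from several servers (illustrative only)."""
    keys = set()
    for model in models:
        keys.update(model)
    n = len(models)
    # A feature that a server has never seen counts as weight 0.0 there.
    return {k: sum(m.get(k, 0.0) for m in models) / n for k in sorted(keys)}


# Hypothetical local models held by three servers.
server_models = [
    {"im": 1.0, "de": 0.5},
    {"im": 0.0, "pu": -1.0},
    {"im": 2.0, "de": 0.5, "pu": -2.0},
]
mixed = mix(server_models)

# After a mix, every server would continue training from the same model.
server_models = [dict(mixed) for _ in server_models]
```

Because every server ends up holding the full model, any one of them can answer a query, which is one way to read "shared-everything" here.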

Page 5

Architecture
• It looks as if one server is running.

[Figure labels: Client, Jubatus RPC, Proxy]

Page 6

Architecture
• It looks as if one server is running
  – You can use a single local Jubatus server for development
  – and a cluster of multiple Jubatus servers for production

[Figure labels: Client, Jubatus RPC. Caption: The same RPC!]

Page 7

Architecture
• With heavy load…

[Figure labels: Client, Jubatus RPC, Proxy]

Page 8

Architecture
• Dynamically scale-out!

[Figure labels: Client, Jubatus RPC, Proxy]

Page 9

Architecture
• Whenever servers break down
  – Proxy conceals failures, so the service will continue.

[Figure labels: Client, Jubatus RPC, Proxy]
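How a proxy can hide a failed server from the client can be sketched as follows. This is a toy in-process model, not the Jubatus proxy: the `Proxy` class, the `ServerDown` exception and the callables standing in for servers are all hypothetical.

```python
class ServerDown(Exception):
    """Raised by a backend that cannot answer (stand-in for a network error)."""


class Proxy:
    def __init__(self, backends):
        self.backends = list(backends)  # callables standing in for servers
        self.next_index = 0

    def call(self, request):
        # Try each backend at most once, continuing round-robin.
        for _ in range(len(self.backends)):
            backend = self.backends[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.backends)
            try:
                return backend(request)
            except ServerDown:
                continue  # hide the failure and try the next server
        raise ServerDown("no backend available")


def healthy(name):
    return lambda request: f"{name} handled {request}"


def broken(request):
    raise ServerDown("crashed")


proxy = Proxy([healthy("server-a"), broken, healthy("server-c")])
replies = [proxy.call(f"req{i}") for i in range(4)]
```

The client only ever sees successful replies; the broken server is skipped without the caller noticing.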

Page 10

Architecture
• Multilanguage client library
  – gem, pip, cpan, maven ready!
  – It essentially uses MessagePack-RPC.
    • So you can use OCaml, Haskell, JavaScript, or Go at your own risk.

[Figure labels: Client, Jubatus RPC]
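Since the wire protocol is MessagePack-RPC, a client in any language only has to produce and parse two small arrays. The sketch below shows that framing with plain Python lists; the MessagePack serialization itself is left out to stay within the standard library, and the `shout` method is a made-up handler, not a Jubatus method.

```python
import itertools

REQUEST, RESPONSE = 0, 1
_msgid = itertools.count()


def make_request(method, params):
    # Request layout in MessagePack-RPC: [type, msgid, method, params]
    return [REQUEST, next(_msgid), method, list(params)]


def handle(request, handlers):
    # Response layout in MessagePack-RPC: [type, msgid, error, result]
    _type, msgid, method, params = request
    if method not in handlers:
        return [RESPONSE, msgid, f"unknown method: {method}", None]
    return [RESPONSE, msgid, None, handlers[method](*params)]


# Hypothetical handler table standing in for a server.
handlers = {"shout": lambda text: text.upper()}

req = make_request("shout", ["hello"])
res = handle(req, handlers)
```

The `msgid` in the response matches the one in the request, which is how a client pairs replies with outstanding calls.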

Page 11

Architecture
• Many ML algorithms
  – Classifier
  – Recommender
  – Anomaly Detection
  – Clustering
  – Regression
  – Graph Mining

Useful!

Page 12

Classifier
• Task: Classification of Datum

import sys

def fib(a):
    if a == 1 or a == 0:
        return 1
    else:
        return fib(a-1) + fib(a-2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))

def fib(a)
  if a == 1 or a == 0
    1
  else
    return fib(a-1) + fib(a-2)
  end
end
if __FILE__ == $0
  puts fib(ARGV[0].to_i)
end

Sample task: classify which programming language is used

[Figure: each snippet is tagged “It’s” followed by a language logo]

Page 13

Classifier
• Set configuration in the Jubatus server

[Figure labels: Classifier, Feature Extractor]

"converter": {
  "string_types": {
    "bigram": { "method": "ngram", "char_num": "2" }
  },
  "string_rules": [
    { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf" }
  ]
}

Feature Extractor

Page 14

Classifier
• Configuration JSON
  – It does “feature vector design”
  – A very important step for machine learning

"converter": {
  "string_types": {
    "bigram": { "method": "ngram", "char_num": "2" }
  },
  "string_rules": [
    { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf" }
  ]
}

Annotations on the configuration:
  – settings for extracting features from strings
  – define a function named “bigram”
  – original embedded function “ngram”
  – pass “2” to “ngram” to create “bigram”
  – for all data, apply “bigram”
  – feature weights based on tf/idf (see wikipedia/tf-idf)
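The tf/idf weighting mentioned in the annotations can be sketched as follows. This uses the textbook definition (term count times log of total documents over documents containing the term); the slides do not say which exact formula Jubatus applies for "tf" and "idf", so treat the formula and the helper names as assumptions.

```python
import math
from collections import Counter


def bigrams(text):
    """Count every pair of consecutive characters."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def tf_idf(documents):
    """Weight each bigram by term frequency times log(N / document frequency)."""
    counts = [bigrams(doc) for doc in documents]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each document counts once per bigram
    n_docs = len(documents)
    return [
        {k: tf * math.log(n_docs / doc_freq[k]) for k, tf in c.items()}
        for c in counts
    ]


docs = ["def fib(a):", "def fib(a)", "puts fib"]
weights = tf_idf(docs)
```

A bigram such as "fi" that appears in every document gets weight zero, while "):" or "pu", which appear in only one document, get the largest weights. That is the sense in which idf highlights the features that discriminate between classes.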

Page 15

Classifier
• Feature Extractor becomes “bigram extractor”

[Figure labels: Classifier, bigram extractor]

Page 16

Feature Extractor
• What does the bigram extractor do?

import sys

def fib(a):
    if a == 1 or a == 0:
        return 1
    else:
        return fib(a-1) + fib(a-2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))

[Figure: the snippet passes through the bigram extractor and comes out as the feature vector below]

key  value
im   1
mp   1
po   1
...  ...
):   1
...  ...
de   1
ef   1
...  ...

Feature Vector
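A character-bigram extractor of this kind fits in a few lines. The sketch below is a local illustration, not the Jubatus implementation; the function name `char_ngrams` is made up, and whether Jubatus counts whitespace and newlines the same way is an assumption.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


source = "import sys\n\ndef fib(a):\n    return 1\n"
features = char_ngrams(source)

for key in ("im", "mp", "po", "):", "de", "ef"):
    print(key, features[key])
```

Each key that occurs once in the snippet maps to 1, matching the table in the slide; a bigram that occurs several times would get a larger count.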

Page 17

Classifier
• Training the model with feature vectors

key  value      key  value      key  value
im   1          pu   1          @a   1
mp   1          ut   1          $_   1
po   1          ...  ...        ...  ...
...  ...        {|   ...        my   ...
):   1          |m   1          su   1
...  ...        m|   1          ub   1
de   1          {|   1          us   1
ef   1          en   1          se   1
...  ...        nd   1          ...  ...

[Figure: the three feature vectors are fed into the Classifier]

Page 18

Classifier
• Set configuration in the Jubatus server

"method": "AROW",
"parameter": {
  "regularization_weight": 1.0
}

[Figure labels: Feature Extractor (bigram extractor), Classifier]

Classifier Algorithms
• Perceptron
• Passive Aggressive
• Confidence Weighted
• Adaptive Regularization of Weights
• Normal Herd
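To give a feel for what an online update in AROW (Adaptive Regularization of Weights) looks like, here is a minimal binary sketch with a diagonal covariance, following the update rule from the original AROW paper. It is not the Jubatus code: the class name is made up, multi-class handling is omitted, and how the `regularization_weight` setting maps onto the parameter `r` below is not stated in the slides.

```python
from collections import defaultdict


class DiagonalAROW:
    """Binary AROW with a diagonal covariance; labels are +1 or -1."""

    def __init__(self, r=1.0):
        self.r = r                            # regularization term (assumed mapping)
        self.mean = defaultdict(float)        # weight per feature
        self.var = defaultdict(lambda: 1.0)   # uncertainty per feature

    def margin(self, x):
        return sum(self.mean[k] * v for k, v in x.items())

    def update(self, x, y):
        loss = 1.0 - y * self.margin(x)
        if loss <= 0.0:
            return  # already correct with enough margin: no change
        confidence = sum(self.var[k] * v * v for k, v in x.items())
        beta = 1.0 / (confidence + self.r)
        alpha = loss * beta
        for k, v in x.items():
            sigma = self.var[k]
            self.mean[k] += alpha * y * sigma * v           # move the weight
            self.var[k] = sigma - beta * sigma * sigma * v * v  # shrink uncertainty

    def predict(self, x):
        return 1 if self.margin(x) >= 0.0 else -1


python_fv = {"im": 1, "de": 1, "):": 1}   # toy feature vectors
ruby_fv = {"pu": 1, "de": 1, "en": 1}

model = DiagonalAROW(r=1.0)
model.update(python_fv, +1)   # +1 stands for "Python"
model.update(ruby_fv, -1)     # -1 stands for "Ruby"
```

Each example is seen once and then discarded, which is what makes the method online. The shared bigram "de" ends up with a weight near zero, while "im" and "pu" carry the decision.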

Page 19

Classifier
• Use the model for the classification task
  – Jubatus will find clues for classification

key  value
si   1
il   1
...  ...
{|   1
...  ...

[Figure: the feature vector goes into AROW, which answers “It’s” followed by a language logo]

Page 20

Classifier
• Use the model for the classification task
  – Jubatus will find clues for classification

key  value
re   1
):   1
...  ...
s[   1
...  ...

[Figure: the feature vector goes into AROW, which answers “It’s” followed by a language logo]

Page 21

Via RPC
• Call feature extraction and classification from the client via RPC

lang = client.classify([sourcecode])

import sys

def fib(a):
    if a == 1 or a == 0:
        return 1
    else:
        return fib(a-1) + fib(a-2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))

key  value
im   1
mp   1
po   1
...  ...
):   1
...  ...
de   1
ef   1
...  ...

[Figure: the source code passes through the bigram extractor and AROW, which answers “It may be” followed by a language logo]

Page 22

What can the classifier do?
• You can
  – estimate the topic of tweets
  – trash spam mail automatically
  – monitor server failures from syslog
  – estimate a user’s sentiment from blog posts
  – detect malicious attacks
  – find which feature is the best clue for classification

Page 23

What the classifier cannot do
• You cannot
  – train a model from data without supervised answers
  – create a class without knowledge of the class
  – get a good model without correct feature design

Page 24

How to use?
• See examples in http://github.com/jubatus/jubatus-example
  – gender
  – shogun
  – malware classification
  – language detection

Page 25

Recommender
• Task: which datum is similar to a given datum?

Name  Star Wars  Harry Potter  Star Trek  Titanic  Frozen

John 4 3 2 2

Bob 5 3

Erika 1 3 4 5

Jack 2 5

Ann 4 5

Emily 1 4 2 5 4

Which movie should we recommend to Ann?

Page 26

Recommender
• Recommendation is based on Nearest Neighbor

[Figure: users plotted in the high-dimensional movie-rating space. Region labels: Science Fiction, Love Romance, Fantasy. Points: John (Star Trek lover), Jack, Bob (Star Wars lover), Erika, Ann, Emily. Arrows: Near, Far]

Page 27

Recommender
• Ann and Emily are near
  – we should recommend Frozen to Ann

Name  Star Wars  Harry Potter  Star Trek  Titanic  Frozen

Ann 4 5 ★

Emily 1 4 2 5 4

I bet Ann would like it!
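The nearest-neighbor reasoning can be sketched with exact cosine similarity as a stand-in for whatever distance the recommender engine is configured with. Emily's row is complete in the slide; the column positions of Ann's and Bob's ratings were lost in the transcript, so their placement below is an assumption made purely for illustration.

```python
import math

MOVIES = ["Star Wars", "Harry Potter", "Star Trek", "Titanic", "Frozen"]

# 0 means "not rated". Only Emily's row is known for certain;
# the positions of Ann's and Bob's ratings are ASSUMED.
ratings = {
    "Bob":   [5, 0, 3, 0, 0],
    "Ann":   [0, 4, 0, 5, 0],
    "Emily": [1, 4, 2, 5, 4],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(user):
    mine = ratings[user]
    # Nearest neighbor: the other user with the most similar rating vector.
    neighbor = max((u for u in ratings if u != user),
                   key=lambda u: cosine(mine, ratings[u]))
    # Suggest the neighbor's best-rated movie that the user has not rated.
    unseen = [i for i, r in enumerate(mine)
              if r == 0 and ratings[neighbor][i] > 0]
    if not unseen:
        return neighbor, None
    best = max(unseen, key=lambda i: ratings[neighbor][i])
    return neighbor, MOVIES[best]


print(recommend("Ann"))
```

Under these assumed ratings, Emily is Ann's nearest neighbor and Frozen is Emily's highest-rated movie that Ann has not rated.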

Page 28

Recommender with Feature Extractor
• The Recommender server consists of a Feature Extractor and a Recommender engine.
  – Jubatus calculates the distance between feature vectors

[Figure labels: Recommender, Feature Extractor]

The Recommender engine can use
• Minhash
• Locality Sensitive Hashing
• Euclid Locality Sensitive Hashing
for defining distance.
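Of the three options, MinHash is the simplest to illustrate: it estimates the Jaccard similarity of two feature sets from short signatures instead of comparing the sets directly. The sketch below is a generic MinHash over bigram sets, not the Jubatus implementation; the seeded MD5 hashing and the signature length of 64 are arbitrary choices.

```python
import hashlib


def _hash(seed, item):
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).hexdigest()
    return int(digest, 16)


def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the smallest hash over the set."""
    return [min(_hash(seed, item) for item in items)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Toy bigram sets (non-empty; an empty set has no signature).
ruby = {"pu", "ut", "{|", "|m", "m|", "en", "nd"}
python = {"im", "mp", "po", "):", "de", "ef"}
ruby_like = {"pu", "ut", "{|", "|m", "m|", "en", "de"}

sig = {name: minhash_signature(items) for name, items in
       [("ruby", ruby), ("python", python), ("ruby_like", ruby_like)]}

print(estimated_jaccard(sig["ruby"], sig["ruby_like"]))
print(estimated_jaccard(sig["ruby"], sig["python"]))
```

The signatures have a fixed length regardless of how many features a datum has, which keeps the cost of each comparison constant.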

Page 29

Recommender with Feature Extractor
• Jubatus maps data in feature space
  – There are distances between data
    • How near or far are they?

key  value      key  value      key  value
pu   1          im   1          Ma   1
ut   1          mp   1          ap   1
...  ...        ...  ...        ...  ...
{|   ...        ...  ...        in   1
|m   1          "{   1          nt   1
m|   1          fo   1          te   1
{|   1          ...  ...        er   1

[Figure: the feature vectors pass through the Feature Extractor into the Recommender, where they appear as points labeled Ruby, Python and Java]

Page 30

What can the Recommender do?
• You can
  – create a recommendation engine for e-commerce
  – calculate the similarity of tweets
  – find NBA players with a similar playing style
  – visualize the distance between “Star Wars” and “Star Trek”

Page 31

What can the Recommender not do?
• You cannot
  – label data (use the classifier!)
  – get a decision tree
  – get a-priori-based recommendations

Page 32

Anomaly Detection
• Task: Which datum is far from the others?

Page 33

Anomaly Detection
• Task: Which datum is far from the others?

[Figure caption: This One!]

Page 34

Anomaly Detection
• Distance-based detection is not good
  – We cannot decide an appropriate distance threshold

[Figure caption: Distance is equal!]

Page 35

Anomaly Detection with Feature Extractor
• The anomaly detection server consists of a Feature Extractor and an anomaly detection engine.
  – Jubatus finds outliers from feature vectors

[Figure labels: Anomaly Detection, Feature Extractor]

The Anomaly Detection engine can use
• Minhash
• Locality Sensitive Hashing
• Euclid Locality Sensitive Hashing
for defining distance.

Page 36

Anomaly Detection
• jubaanomaly can do it!
  – It is based on the local outlier factor algorithm

key  value      key  value      key  value
pu   1          im   1          Ma   1
ut   1          mp   1          ap   1
...  ...        ...  ...        ...  ...
{|   ...        ...  ...        in   1
|m   1          "{   1          nt   1
m|   1          fo   1          te   1
{|   1          ...  ...        er   1

[Figure: the feature vectors pass through the Feature Extractor into Anomaly Detection, where one point is marked “Outlier!”]
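The local outlier factor (LOF) compares the density around a point with the density around its neighbors, so no global distance threshold is needed. The sketch below is a small exact version on 2-D points with Euclidean distance; jubaanomaly works on feature vectors with hashing-based distances, so this is only an illustration of the idea. It assumes distinct points and takes exactly `k` neighbors, ignoring ties.

```python
import math


def lof_scores(points, k=2):
    """Local outlier factor of every point, using Euclidean distance."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    def reach_dist(i, j):
        # Reachability distance of i from j: never below j's k-distance.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / k
        lrd.append(1.0 / mean_reach)

    # LOF: average neighbor density divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i])
            for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
outlier = points[scores.index(max(scores))]
```

Points inside the square score about 1 (as dense as their neighbors), while (5, 5) scores far above 1 and is reported as the outlier.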

Page 37

What can Anomaly Detection do?
• You may be able to
  – find outliers
  – grasp the trend and overview of the current data stream
  – detect or predict server failures
  – protect Web services from zero-day attacks

Page 38

What can Anomaly Detection not do?
• You cannot
  – know the cluster distribution of data
  – find every kind of outlier with 100% accuracy
  – easily understand how each outlier occurs
  – know why a datum is assigned a high outlier score

Page 39

Conclusion
• Jubatus has an embedded feature extractor alongside its algorithms.
• Users should configure both the feature extractor and the algorithm properly.
• Clients use the configured machine learning via Jubatus-RPC.
• Classifier, Recommender and Anomaly Detection may be useful for your task.

Page 40

DEMO
• I will try to run the jubatus-example.