machine learning in php

MACHINE LEARNING IN PHP The roots of education are bitter, but the fruit is sweet Verona, Italia, 2016

Upload: damien-seguy-

Post on 26-Jan-2017


TRANSCRIPT

Page 1: Machine learning in php

MACHINE LEARNING IN PHP

The roots of education are bitter, but the fruit is sweet

Verona, Italia, 2016

Page 2: Machine learning in php

AGENDA

How to teach tricks to your PHP

Application : searching for code in comments

Complex learning

Page 3: Machine learning in php

SPEAKER

Damien Seguy

Exakat CTO

Static analysis of PHP code

Page 4: Machine learning in php

MACHINE LEARNING

Teaching the machine

Supervised learning : learning then applying

The application builds its own model : training phase

It applies its model to real cases : applying phase

Page 5: Machine learning in php

APPLICATIONS

Play go, chess, tic-tac-toe and beat everyone else

Fraud detection and risk analysis

Automated translation or automated transcription

OCR and face recognition

Medical diagnostics

Walk, welcome guests at hotels, play football

Finding good PHP code

Page 6: Machine learning in php

PHP APPLICATIONS

Recommendation systems

Predicting user behavior

SPAM

Converting users into customers

ETA

Detect code in comments

Page 7: Machine learning in php

REAL USE CASE

Identify code in comments

Classic problem

Good problem for machine learning

Complex, no simple solution

A lot of data and expertise are available

Page 8: Machine learning in php

SUPERVISED TRAINING

History data → Training → Model

Real data → Model → Results

Page 9: Machine learning in php

THE FANN EXTENSION

ext/fann (https://pecl.php.net/package/fann)

Fast Artificial Neural Network

http://leenissen.dk/fann/wp/

Neural networks in PHP

Works on PHP 7, thanks to the hard work of Jakub Zelenka

https://github.com/bukka/php-fann

Page 10: Machine learning in php

NEURAL NETWORKS

Imitation of nature

Input layer

Output layer

Intermediate layers

Page 11: Machine learning in php

NEURAL NETWORK

Imitation of nature

Input layer

Output layer

Intermediate layers

Page 12: Machine learning in php

INITIALIZATION

<?php

// fann_create_standard() counts every layer: input, hidden and output
$num_layers = 3;
$num_input = 5;
$num_neurons_hidden = 3;
$num_output = 1;
$ann = fann_create_standard($num_layers, $num_input, $num_neurons_hidden, $num_output);

// Activation function
fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC);
fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC);

Page 13: Machine learning in php

PREPARING DATA

Raw data → Extract → Filter → Human review → FANN ready

Page 14: Machine learning in php

EXPERT AT WORK

// Test if the if is in a compressed format

// none need yet

// icon

// There is a parser specified in `Parser::$KEYWORD_PARSERS`

// $result should exist, regardless of $_message

// $a && $b and multidimensional

// numGlyphs + 1

// TODO : fix this; var_dump($var);

// if(ob_get_clean()){

//$annots .= ' /StructParent ';

// $cfg['Servers'][$i]['controlpass'] = 'pmapass';

Page 15: Machine learning in php

INPUT VECTOR

'length' : size of the comment

'countDollar' : number of $

'countEqual' : number of =

'countObjectOperator' : number of -> operators ($o->p)

'countSemicolon' : number of semi-colon ;
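The application slide later calls a `makeVector()` helper that the transcript never shows. Here is a minimal sketch, assuming the five characteristics are plain counts taken in the order listed above. The function body is an assumption, not the speaker's code.

```php
<?php

/**
 * Turn a comment into the five-value input vector described above.
 * Hypothetical implementation: the talk only names the characteristics.
 */
function makeVector(string $comment): array
{
    return [
        strlen($comment),              // 'length' (in bytes)
        substr_count($comment, '$'),   // 'countDollar'
        substr_count($comment, '='),   // 'countEqual' (also counts the = in == and =>)
        substr_count($comment, '->'),  // 'countObjectOperator'
        substr_count($comment, ';'),   // 'countSemicolon'
    ];
}

$comment = '//$this->errors[] = $this->language->get(\'error_permission\');';
print implode(' ', makeVector($comment)) . "\n";
```

For this comment the sketch gives 61 2 1 3 1, the same values as one of the rows on the INPUT DATA slide.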

Page 16: Machine learning in php

INPUT DATA

46 5 1
825 0 0 0 1
0
37 2 0 0 0
0
55 2 2 0 1
1
61 2 1 3 1
1
...

 * This file is part of Exakat.  *  * Exakat is free software: you can redistribute it and/or modify  * it under the terms of the GNU Affero General Public License as published by  * the Free Software Foundation, either version 3 of the License, or  * (at your option) any later version.  *  * Exakat is distributed in the hope that it will be useful,  * but WITHOUT ANY WARRANTY; without even the implied warranty of  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the  * GNU Affero General Public License for more details.  *  * You should have received a copy of the GNU Affero General Public License  * along with Exakat.  If not, see <http://www.gnu.org/licenses/>.  *  * The latest code can be found at <http://exakat.io/>.  * */

// $x[3] or $x[] and multidimensional

//if ($round == 3) { die('Round '.$round);}

//$this->errors[] = $this->language->get('error_permission');

First line : number of training cases, number of input values per case, number of output values per case
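The data above follows FANN's plain-text training format: a header line, then for each case one line of inputs followed by one line of expected outputs. Here is a sketch of generating such a file from reviewed cases. `buildTrainingData()` and the array shape are illustrative assumptions.

```php
<?php

/**
 * Build the content of a FANN training file.
 * $cases is a list of ['input' => number[], 'output' => number[]]
 * (hypothetical shape chosen for this sketch).
 */
function buildTrainingData(array $cases): string
{
    if ($cases === []) {
        throw new InvalidArgumentException('At least one training case is required');
    }

    $numInput  = count($cases[0]['input']);
    $numOutput = count($cases[0]['output']);

    // Header: number of cases, inputs per case, outputs per case
    $lines = [count($cases) . ' ' . $numInput . ' ' . $numOutput];

    foreach ($cases as $case) {
        $lines[] = implode(' ', $case['input']);   // one line of inputs
        $lines[] = implode(' ', $case['output']);  // one line of expected outputs
    }

    return implode("\n", $lines) . "\n";
}

$cases = [
    ['input' => [37, 2, 0, 0, 0], 'output' => [0]], // plain-text comment
    ['input' => [61, 2, 1, 3, 1], 'output' => [1]], // commented-out code
];

print buildTrainingData($cases);
// To feed fann_train_on_file():
// file_put_contents('incoming.data', buildTrainingData($cases));
```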

Page 17: Machine learning in php

TRAINING

$max_epochs = 500000;
$desired_error = 0.001;
// Reporting interval: not shown on the slide, 1000 is an assumed value
$epochs_between_reports = 1000;

// the actual training
if (fann_train_on_file($ann, 'incoming.data', $max_epochs, $epochs_between_reports, $desired_error)) {
    fann_save($ann, 'model.out');
}
fann_destroy($ann);
?>

Page 18: Machine learning in php
Page 19: Machine learning in php
Page 20: Machine learning in php
Page 21: Machine learning in php

TRAINING

47 cases

5 characteristics

3 hidden neurons

+ 5 input + 1 output

Duration : 5.711 s

Page 22: Machine learning in php

APPLICATION

History data → Training → Model

Real data → Model → Results

Page 23: Machine learning in php

APPLICATION

<?php

$ann = fann_create_from_file('model.out');

$comment = '//$gvars = $this->getGraphicVars();';

$input = makeVector($comment);
$results = fann_run($ann, $input);

if ($results[0] > 0.8) {
    print "\"$comment\" -> $results[0] \n";
}

?>
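To run the same check over a whole code base, the comparison against 0.8 can be wrapped in a loop. In this sketch the scoring function is passed in as a callable so that the example runs without ext/fann. `findCodeComments()` and the toy scorer are hypothetical names, and the toy scorer is not a neural network.

```php
<?php

/**
 * Keep the comments whose score is above the threshold.
 * $score maps a comment to a float, for example a wrapper around fann_run().
 * Identical comments are reported once, since the comment is the array key.
 */
function findCodeComments(array $comments, callable $score, float $threshold = 0.8): array
{
    $found = [];
    foreach ($comments as $comment) {
        $value = $score($comment);
        if ($value > $threshold) {
            $found[$comment] = $value;
        }
    }
    arsort($found); // highest scores first
    return $found;
}

// Toy scorer standing in for the trained model
$toyScore = function (string $comment): float {
    return substr_count($comment, ';') > 0 ? 0.99 : 0.10;
};

$comments = [
    '//$gvars = $this->getGraphicVars();',
    '// none need yet',
];

foreach (findCodeComments($comments, $toyScore) as $comment => $value) {
    print "\"$comment\" -> $value \n";
}
```

With ext/fann loaded, the scorer could be `function ($c) use ($ann) { return fann_run($ann, makeVector($c))[0]; }`, reusing the `makeVector()` helper the slide calls.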

Page 24: Machine learning in php

RESULTS > 0.8

Answer between 0 and 1

Values range from -14 to 0.999

The closer to 1, the safer the "code" verdict. The closer to 0, the safer the "not code" verdict.

Is this a percentage? Is this a carrot count?

It's a mix of counts…

Page 25: Machine learning in php


Page 26: Machine learning in php

REAL CASES

Tested on 14093 comments

Duration 367.01ms

Found 1960 issues (14%)

Page 27: Machine learning in php

0.99999893 // $cfg['Servers'][$i]['controlhost'] = '';    

0.99999928 //$_SESSION['Import_message'] = $message->getDisplay();    

0.99999928 /*
if (defined('SESSIONUPLOAD')) {
    // write sessionupload back into the loaded PMA session
    $sessionupload = unserialize(SESSIONUPLOAD);
    foreach ($sessionupload as $key => $value) {
        $_SESSION[$key] = $value;
    }

    // remove session upload data that are not set anymore
    foreach ($_SESSION as $key => $value) {
        if (mb_substr($key, 0, mb_strlen(UPLOAD_PREFIX))
            == UPLOAD_PREFIX
            && ! isset($sessionupload[$key])
        ) {
            unset($_SESSION[$key]);
        }
    }
}
*/

Page 28: Machine learning in php

0.98780382 //LEAD_OFFSET = (0xD800 - (0x10000 >> 10)) = 55232    

0.99361396 // We have server(s) => apply default configuration

0.98383027 // Duration = as configured

0.99999928 // original -> translation mapping    

0.97590065 // = (   59 x 84   ) mm  = (  2.32 x 3.31  ) in    

Page 29: Machine learning in php

Target vs. found by FANN : true positive, false positive, true negative, false negative

Page 30: Machine learning in php

Target vs. found by FANN

True positive : 0.99999923 // $cfg['Servers'][$i]['table_coords'] = 'pma__table_coords';

False negative : 0.73295981 //(isset($attribs['height'])?$attribs['height']: 1);

False positive : 0.99999851 // if ($key != null) did not work for index "0"

True negative : 0.2104115 // the PASSWORD() function
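The four quadrants come from comparing the expert's verdict (the target) with what the network flags above the 0.8 threshold. Here is a sketch of that comparison. `quadrant()` is a hypothetical helper, and pairing each score with a verdict assumes that scores and comments appear in the same order on the slide.

```php
<?php

/**
 * Name the quadrant for one comment.
 * $isCode is the expert's verdict (target), $score is the network's output.
 */
function quadrant(float $score, bool $isCode, float $threshold = 0.8): string
{
    $flagged = $score > $threshold;
    if ($flagged) {
        return $isCode ? 'true positive' : 'false positive';
    }
    return $isCode ? 'false negative' : 'true negative';
}

// Scores from the slide; the true/false verdicts are this sketch's own reading
$cases = [
    [0.99999923, true],   // $cfg['Servers'][$i]['table_coords'] = ...
    [0.73295981, true],   // (isset($attribs['height']) ? ...
    [0.99999851, false],  // if ($key != null) did not work for index "0"
    [0.2104115,  false],  // the PASSWORD() function
];

foreach ($cases as [$score, $isCode]) {
    print $score . ' : ' . quadrant($score, $isCode) . "\n";
}
```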

Page 31: Machine learning in php

RESULTS

1960 issues

More than 50% false positives

After an easy cleanup, 822 issues reported

14k comments, analyzed in 367 ms

Total time of coding : 27 mins.

// = (   59 x 84   ) mm  = (  2.32 x 3.31  ) in

/* vim: set expandtab sw=4 ts=4 sts=4: */

Page 32: Machine learning in php

LEARN BETTER, NOT HARDER

Better training data

Improve characteristics

Configure the neural network

Change algorithm

Automate learning

Update constantly

History data → Training → Model

Real data → Model → Results

Feedback loop

Page 33: Machine learning in php

BETTER TRAINING DATA

More data, more data, more data

Varied situations, real case situations

Include specific cases

Experience is capital

https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf

Page 34: Machine learning in php

IMPROVE CHARACTERISTICS

Add new characteristics

Remove the ones that are less interesting

Find the right set of characteristics
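In practice, adding a characteristic means appending one more count to the input vector and raising `$num_input` to match. Here is a sketch with two extra characteristics, opening parentheses and a few PHP keywords. Both are hypothetical examples, not characteristics used in the talk.

```php
<?php

/**
 * Extended input vector: the five original characteristics plus two
 * hypothetical ones, added for illustration only.
 */
function makeExtendedVector(string $comment): array
{
    // Number of whole-word matches for a short, arbitrary keyword list
    $keywords = (int) preg_match_all('/\b(if|foreach|return|function|echo)\b/', $comment);

    return [
        strlen($comment),
        substr_count($comment, '$'),
        substr_count($comment, '='),
        substr_count($comment, '->'),
        substr_count($comment, ';'),
        substr_count($comment, '('),  // hypothetical: opening parentheses
        $keywords,                    // hypothetical: PHP keywords
    ];
}

$vector = makeExtendedVector('//if ($round == 3) { die(\'Round \'.$round);}');
print implode(' ', $vector) . "\n";
print 'num_input = ' . count($vector) . "\n";
```

Removing a characteristic is the reverse operation. Either way the training file has to be regenerated and the network retrained, since the number of inputs changes.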

Page 35: Machine learning in php

NETWORK CONFIGURATION

Input vector

Intermediate neurons

Activation function

Output vector

[Chart: time of training (ms), y-axis from 0 to 20000, x-axis from 1 to 10; series: 1 layer, 2 layers, 3 layers, 4 layers]

Page 36: Machine learning in php

CHANGE ALGORITHM

First add more data before changing algorithm

Try cascade2 algorithm from FANN

0.6 => 0 found

0.5 => 2 found

Not found by the first algorithm

Page 37: Machine learning in php

FINDING THE BEST

Test with 2-4 layers, 10 neurons

Measure results

[Chart: y-axis from 0 to 9000, x-axis from 1 to 13; series: 1 layer, 2 layers, 3 layers, 4 layers]
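One way to search for the best configuration is to enumerate the candidate layouts, train each one and time it. In the sketch below the enumeration is plain PHP, and the training loop only runs when ext/fann is loaded and a training file exists. The ranges (1 to 4 hidden layers, 1 to 10 neurons each), the helper name and the training parameters are assumptions.

```php
<?php

/**
 * List candidate layer layouts: $numInput inputs, then 1 to $maxHiddenLayers
 * hidden layers of 1 to $maxNeurons neurons each, then $numOutput outputs.
 */
function candidateLayouts(int $numInput, int $numOutput, int $maxHiddenLayers, int $maxNeurons): array
{
    $layouts = [];
    for ($hidden = 1; $hidden <= $maxHiddenLayers; $hidden++) {
        for ($neurons = 1; $neurons <= $maxNeurons; $neurons++) {
            $layouts[] = array_merge([$numInput], array_fill(0, $hidden, $neurons), [$numOutput]);
        }
    }
    return $layouts;
}

$layouts = candidateLayouts(5, 1, 4, 10);
print count($layouts) . ' layouts, for example ' . implode('-', $layouts[12]) . "\n";

// Only runs when ext/fann is loaded and the training file is present
if (extension_loaded('fann') && is_file('incoming.data')) {
    foreach ($layouts as $layers) {
        $ann = fann_create_standard_array(count($layers), $layers);
        fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC);
        fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC);

        $start = microtime(true);
        fann_train_on_file($ann, 'incoming.data', 500000, 1000, 0.001);
        $elapsed = (microtime(true) - $start) * 1000;

        printf("%s : %.1f ms, MSE %.5f\n", implode('-', $layers), $elapsed, fann_get_MSE($ann));
        fann_destroy($ann);
    }
}
```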

Page 38: Machine learning in php

DEEP LEARNING

Chaining the neural networks

Auto-encoders

Unsupervised Learning

Genetic algorithm, ant

Page 39: Machine learning in php

OTHER TOOLS

PHP ext/fann

R language

https://github.com/kachkaev/php-r

Scikit-learn

https://github.com/scikit-learn/scikit-learn

Mahout

https://mahout.apache.org/

Page 40: Machine learning in php

@exakat

https://joind.in/talk/42120

GRAZIE

Page 41: Machine learning in php
Page 42: Machine learning in php

OTHER CONFIGURATIONS

Activation function

FANN_SIGMOID_SYMMETRIC

FANN_LINEAR

FANN_THRESHOLD

FANN_SIN_SYMMETRIC
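For reference, here are plain-PHP versions of these four activation functions, following the formulas in the FANN documentation with steepness `$s` (FANN's default is 0.5). The function names are made up for this sketch, and the code only illustrates the output ranges. It does not replace the extension.

```php
<?php

// FANN_SIGMOID_SYMMETRIC: y = tanh(s * x), output between -1 and 1
function sigmoidSymmetric(float $x, float $s = 0.5): float
{
    return tanh($s * $x);
}

// FANN_LINEAR: y = s * x, unbounded
function linear(float $x, float $s = 0.5): float
{
    return $s * $x;
}

// FANN_THRESHOLD: 0 below zero, 1 otherwise
function threshold(float $x): float
{
    return $x < 0 ? 0.0 : 1.0;
}

// FANN_SIN_SYMMETRIC: y = sin(s * x), output between -1 and 1
function sinSymmetric(float $x, float $s = 0.5): float
{
    return sin($s * $x);
}

foreach ([-4.0, 0.0, 4.0] as $x) {
    printf(
        "x=%5.1f sigmoid=%6.3f linear=%6.3f threshold=%3.1f sin=%6.3f\n",
        $x,
        sigmoidSymmetric($x),
        linear($x),
        threshold($x),
        sinSymmetric($x)
    );
}
```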

Page 43: Machine learning in php

Linear, Threshold

Tangent

Gaussian, Quadratic

Sigmoid

Page 44: Machine learning in php

WHICH APPLICATIONS?

Non-deterministic

Rule out everything that can be found systematically

Access to expertise and to vectors of characteristics

Final layer after the results

Classification, prioritization, fast approximation

Page 45: Machine learning in php

REINFORCEMENT LEARNING

Software

Real world

Reward, Action, Reaction

Page 46: Machine learning in php

BAYESIAN FILTERS

Page 47: Machine learning in php

GENETIC ALGORITHMS

Population

Population

Selection

Reproduction

Population, Variations
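The cycle above (population, selection, reproduction, variations, new population) can be sketched with a toy problem: evolving a string of bits toward all ones. Every name and parameter below is illustrative and none comes from the talk.

```php
<?php

// Fitness of an individual: the number of 1 bits
function fitness(array $genes): int
{
    return array_sum($genes);
}

// Sort callback: fittest individuals first
function byFitnessDesc(array $a, array $b): int
{
    return fitness($b) <=> fitness($a);
}

function evolve(int $populationSize, int $geneCount, int $generations, int $seed): array
{
    mt_srand($seed); // fixed seed, so a run can be repeated

    // Population: random individuals
    $population = [];
    for ($i = 0; $i < $populationSize; $i++) {
        $genes = [];
        for ($g = 0; $g < $geneCount; $g++) {
            $genes[] = mt_rand(0, 1);
        }
        $population[] = $genes;
    }

    for ($n = 0; $n < $generations; $n++) {
        // Selection: keep the fitter half
        usort($population, 'byFitnessDesc');
        $parents = array_slice($population, 0, intdiv($populationSize, 2));

        // Reproduction and variations: crossover, then flip one gene
        $children = [];
        while (count($parents) + count($children) < $populationSize) {
            $a = $parents[mt_rand(0, count($parents) - 1)];
            $b = $parents[mt_rand(0, count($parents) - 1)];
            $cut = mt_rand(1, $geneCount - 1);
            $child = array_merge(array_slice($a, 0, $cut), array_slice($b, $cut));
            $flip = mt_rand(0, $geneCount - 1);
            $child[$flip] = 1 - $child[$flip];
            $children[] = $child;
        }

        // New population: parents survive, so the best score never decreases
        $population = array_merge($parents, $children);
    }

    usort($population, 'byFitnessDesc');
    return $population[0];
}

$best = evolve(20, 16, 50, 42);
print implode('', $best) . ' fitness ' . fitness($best) . "\n";
```

Keeping the parents in the next generation is a design choice (elitism): it guarantees that the best individual found so far is never lost to an unlucky mutation.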