MACHINE LEARNING IN PHP The roots of education are bitter, but the fruit is sweet PHP conf asia, Singapore, 2016

Upload: damien-seguy-

Post on 26-Jan-2017


TRANSCRIPT

Page 1: Machine learning in php   singapore

MACHINE LEARNING IN PHP

The roots of education are bitter, but the fruit is sweet

PHP conf asia, Singapore, 2016

Page 2: Machine learning in php   singapore

Agenda

• How to teach tricks to your PHP

• Application : searching for code in comments

• Complex learning

Page 3: Machine learning in php   singapore

Speaker

• Damien Seguy

• Exakat CTO

• Static analysis of PHP code

Page 4: Machine learning in php   singapore

Machine Learning

• Teaching the machine

• Supervised learning : learning then applying

• The application builds its own model : training phase

• It applies its model to real cases : applying phase

Page 5: Machine learning in php   singapore

Applications

• Play go, chess, tic-tac-toe and beat everyone else

• Fraud detection and risk analysis

• Automated translation or automated transcription

• OCR and face recognition

• Medical diagnostics

• Walk, welcome guests at hotels, play football

• Finding good PHP code

Page 6: Machine learning in php   singapore

PHP Applications

• Recommendation systems

• Predicting user behavior

• Spam detection

• Converting users to customers

• Estimated time of arrival (ETA)

• Detect code in comments

Page 7: Machine learning in php   singapore

Real use case

• Identify code in comments

• Classic problem

• Good problem for machine learning

• Complex, no simple solution

• A lot of data and expertise are available

Page 8: Machine learning in php   singapore

Supervised Training

History data → Training → Model

Real data → Model → Results

Page 9: Machine learning in php   singapore

The Fann Extension

• ext/fann (https://pecl.php.net/package/fann)

• Fast Artificial Neural Network

• http://leenissen.dk/fann/wp/

• Neural networks in PHP

• Works on PHP 7, thanks to the hard work of Jakub Zelenka

• https://github.com/bukka/php-fann

Page 10: Machine learning in php   singapore

NEURAL NETWORKS

• Imitation of nature

• Input layer

• Output layer

• Intermediate layers

Page 11: Machine learning in php   singapore

Neural network

• Imitation of nature

• Input layer

• Output layer

• Intermediate layers

Page 12: Machine learning in php   singapore

Initialisation

<?php

// Three layers in total: input, one hidden layer, output
$num_layers         = 3;
$num_input          = 5;
$num_neurons_hidden = 3;
$num_output         = 1;
$ann = fann_create_standard($num_layers, $num_input, $num_neurons_hidden, $num_output);

// Activation function
fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC);
fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC);

Page 13: Machine learning in php   singapore

Preparing data

Raw data → Extract → Filter → Human review → FANN ready

Page 14: Machine learning in php   singapore

Expert at work

// Test if the if is in a compressed format

// none need yet

// There is a parser specified in `Parser::$KEYWORD_PARSERS`

// $result should exist, regardless of $_message

// $a && $b and multidimensional

// numGlyphs + 1

// TODO : fix this; var_dump($var);

// if(ob_get_clean()){

//$annots .= ' /StructParent ';

// $cfg['Servers'][$i]['controlpass'] = 'pmapass';

Page 15: Machine learning in php   singapore

Input vector

• 'length' : size of the comment

• 'countDollar' : number of $

• 'countEqual' : number of =

• 'countObjectOperator' : number of -> operators ($o->p)

• 'countSemicolon' : number of semicolons ;
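
The five characteristics above could be computed by a helper such as the makeVector() function called later on the Application slide. Its body is not shown in the talk, so the following is only a minimal sketch under that assumption, built on strlen() and substr_count(). Note that strlen() counts bytes, and that '=>' or '==' also contribute to the '=' count.

```php
<?php

/**
 * Hypothetical sketch of makeVector(): turns a comment into the five
 * input values listed on the slide, in the same order.
 */
function makeVector(string $comment): array
{
    return [
        strlen($comment),              // 'length'
        substr_count($comment, '$'),   // 'countDollar'
        substr_count($comment, '='),   // 'countEqual'
        substr_count($comment, '->'),  // 'countObjectOperator'
        substr_count($comment, ';'),   // 'countSemicolon'
    ];
}

print implode(' ', makeVector('// $x[3] or $x[] and multidimensional')) . "\n";
// prints: 37 2 0 0 0
```

The printed vector matches the "37 2 0 0 0" line of the input data shown on the next slide.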

Page 16: Machine learning in php   singapore

Input data

46 5 1
825 0 0 0 1
0
37 2 0 0 0
0
55 2 2 0 1
1
61 2 1 3 1
1
...

 * This file is part of Exakat.  *  * Exakat is free software: you can redistribute it and/or modify  * it under the terms of the GNU Affero General Public License as published by  * the Free Software Foundation, either version 3 of the License, or  * (at your option) any later version.  *  * Exakat is distributed in the hope that it will be useful,  * but WITHOUT ANY WARRANTY; without even the implied warranty of  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the  * GNU Affero General Public License for more details.  *  * You should have received a copy of the GNU Affero General Public License  * along with Exakat.  If not, see <http://www.gnu.org/licenses/>.  *  * The latest code can be found at <http://exakat.io/>.  * */

// $x[3] or $x[] and multidimensional

//if ($round == 3) { die('Round '.$round);}

//$this->errors[] = $this->language->get('error_permission');

First line of the file : number of training cases, number of input values per case, number of output values per case
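
A training file in this layout can be generated from reviewed comments. The sketch below assumes each sample is already an [input vector, expected output] pair; the function name buildTrainingData() and the sample list are illustrative, not taken from the talk.

```php
<?php

/**
 * Builds the content of a FANN training file.
 * $samples is a list of [inputVector, expectedOutput] pairs.
 */
function buildTrainingData(array $samples): string
{
    if ($samples === []) {
        throw new InvalidArgumentException('At least one sample is required');
    }

    // Header: number of cases, number of inputs, number of outputs (always 1 here)
    $numInput = count($samples[0][0]);
    $lines    = [count($samples) . ' ' . $numInput . ' 1'];

    foreach ($samples as [$vector, $expected]) {
        $lines[] = implode(' ', $vector);  // input values
        $lines[] = (string) $expected;     // 1 = code, 0 = plain comment
    }

    return implode("\n", $lines) . "\n";
}

// Two of the vectors shown on the slide, with their human-reviewed answer
$samples = [
    [[37, 2, 0, 0, 0], 0],
    [[61, 2, 1, 3, 1], 1],
];

print buildTrainingData($samples);
// To write the file used for training:
// file_put_contents('incoming.data', buildTrainingData($samples));
```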

Page 17: Machine learning in php   singapore

1 5 1
37 2 0 0 0
0

// $x[3] or $x[] and multidimensional

ext/Fann

It's a comment

Page 18: Machine learning in php   singapore

Training

$max_epochs             = 500000;
$epochs_between_reports = 1000; // not set on the slide; value assumed here
$desired_error          = 0.001;

// The actual training
if (fann_train_on_file($ann, 'incoming.data', $max_epochs, $epochs_between_reports, $desired_error)) {
    fann_save($ann, 'model.out');
}
fann_destroy($ann);
?>

Page 19: Machine learning in php   singapore
Page 20: Machine learning in php   singapore
Page 21: Machine learning in php   singapore
Page 22: Machine learning in php   singapore

TRAINING

• 47 cases

• 5 characteristics

• 3 hidden neurons

• + 5 input + 1 output

• Duration : 5.711 s

Page 23: Machine learning in php   singapore

Application

History data → Training → Model

Real data → Model → Results

Page 24: Machine learning in php   singapore

Application

<?php

$ann = fann_create_from_file('model.out');

$comment = '//$gvars = $this->getGraphicVars();';

// makeVector() builds the five-value input vector from the comment
$input   = makeVector($comment);
$results = fann_run($ann, $input);

if ($results[0] > 0.8) {
    print "\"$comment\" -> $results[0] \n";
}

?>

Page 25: Machine learning in php   singapore

Results > 0.8

• Answer between 0 and 1

• Observed values range from -14 to 0.999

• The closer to 1, the safer the "code" verdict. The closer to 0, the safer the "not code" verdict.

• Is this a percentage? Is this a carrot count?

• It's a mix of counts…

Page 26: Machine learning in php   singapore

[Chart: one axis from -16 to 0, the other from 60 to 100]

Page 27: Machine learning in php   singapore

REAL CASES

• Tested on 14093 comments

• Duration 68.01ms

• Found 1960 issues (14%)

Page 28: Machine learning in php   singapore

0.99999893 // $cfg['Servers'][$i]['controlhost'] = '';    

0.99999928 //$_SESSION['Import_message'] = $message->getDisplay();    

0.99999928 /*
if (defined('SESSIONUPLOAD')) {
    // write sessionupload back into the loaded PMA session
    $sessionupload = unserialize(SESSIONUPLOAD);
    foreach ($sessionupload as $key => $value) {
        $_SESSION[$key] = $value;
    }

    // remove session upload data that are not set anymore
    foreach ($_SESSION as $key => $value) {
        if (mb_substr($key, 0, mb_strlen(UPLOAD_PREFIX))
            == UPLOAD_PREFIX
            && ! isset($sessionupload[$key])
        ) {
            unset($_SESSION[$key]);
        }
    }
}

Page 29: Machine learning in php   singapore

0.98780382 //LEAD_OFFSET = (0xD800 - (0x10000 >> 10)) = 55232    

0.99361396 // We have server(s) => apply default configuration

0.98383027 // Duration = as configured

0.99999928 // original -> translation mapping    

0.97590065 // = (   59 x 84   ) mm  = (  2.32 x 3.31  ) in    

Page 30: Machine learning in php   singapore

[Diagram: "Found by FANN" versus "Target", split into true positive, false positive, true negative, false negative]

Page 31: Machine learning in php   singapore

Found by FANN versus target, one example per case (threshold 0.8) :

True positive : 0.99999923 // $cfg['Servers'][$i]['table_coords'] = 'pma__table_coords';

False negative : 0.73295981 //(isset($attribs['height'])?$attribs['height']: 1);

False positive : 0.99999851 // if ($key != null) did not work for index "0"

True negative : 0.2104115 // the PASSWORD() function
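
The four outcomes can be counted once a human has labelled each comment. The sketch below is illustrative only: confusionMatrix() is a hypothetical helper, and the true/false labels attached to the four scores of this slide are a human judgment made here, pairing scores and comments in the order they appear.

```php
<?php

/**
 * Counts the four outcomes for a list of [score, isReallyCode] pairs.
 * A comment is "found" when its score is above the threshold.
 */
function confusionMatrix(array $results, float $threshold = 0.8): array
{
    $matrix = ['TP' => 0, 'FP' => 0, 'TN' => 0, 'FN' => 0];

    foreach ($results as [$score, $isCode]) {
        if ($score > $threshold) {
            $matrix[$isCode ? 'TP' : 'FP']++;  // reported by the network
        } else {
            $matrix[$isCode ? 'FN' : 'TN']++;  // not reported
        }
    }

    return $matrix;
}

// Scores from the slide; the boolean is the assumed human verdict
$matrix = confusionMatrix([
    [0.99999923, true],   // $cfg[...] assignment
    [0.73295981, true],   // isset(...) expression
    [0.99999851, false],  // sentence mentioning "if ($key != null)"
    [0.2104115,  false],  // sentence mentioning PASSWORD()
]);

print_r($matrix);
```

With these labels, each of the four counters ends at 1.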

Page 32: Machine learning in php   singapore

RESULTS

• 1960 issues

• More than 50% false positives

• After an easy cleanup, 822 issues reported

• 14k comments, analyzed in 68 ms (367 ms in PHP 5)

• Total time of coding : 27 mins.

// = (   59 x 84   ) mm  = (  2.32 x 3.31  ) in

/* vim: set expandtab sw=4 ts=4 sts=4: */

Page 33: Machine learning in php   singapore

Learn better, not harder

• Better training data

• Improve characteristics

• Configure the neural network

• Change algorithm

• Automate learning

• Update constantly

History data → Training → Model

Real data → Model → Results

Feedback loop from Results back to History data

Page 34: Machine learning in php   singapore

Better training data

• More data, more data, more data

• Varied situations, real-world cases

• Include specific cases

• Experience is essential

• https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf

Page 35: Machine learning in php   singapore

Improve characteristics

• Add new characteristics

• Remove the ones that are less useful

• Find the right set of characteristics

Page 36: Machine learning in php   singapore

Network Configuration

• Input vector

• Intermediate neurons

• Activation function

• Output vector

[Chart: time of training (ms), from 0 to 20000, for values 1 to 10 on the x-axis, with one series each for 1, 2, 3 and 4 layers]

Page 37: Machine learning in php   singapore

Change algorithm

• Add more data first, before changing the algorithm

• Try the cascade2 algorithm from FANN

• 0.6 => 0 found

• 0.5 => 2 found

• Not found by the first algorithm

Page 38: Machine learning in php   singapore

Finding the BEST

• Test with 2-4 layers, 10 neurons

• Measure results

[Chart: values from 0 to 9000, for 1 to 13 on the x-axis, with one series each for 1, 2, 3 and 4 layers]
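
Testing several configurations and measuring each one amounts to a small grid search. The sketch below is an assumption about how that loop might look, not the code used in the talk: findBestConfiguration() and its $evaluate callback are hypothetical names. In real use the callback would create and train a FANN network with the given shape and return a measure to minimise, such as the training error or the training time; here a made-up function stands in for it.

```php
<?php

/**
 * Tries every combination of layer count and neurons per layer,
 * and returns the configuration with the lowest score.
 * $evaluate is a hypothetical callback: fn(int $layers, int $neurons): float
 */
function findBestConfiguration(array $layerCounts, array $neuronCounts, callable $evaluate): ?array
{
    $best = null;

    foreach ($layerCounts as $layers) {
        foreach ($neuronCounts as $neurons) {
            $score = $evaluate($layers, $neurons);
            if ($best === null || $score < $best['score']) {
                $best = ['layers' => $layers, 'neurons' => $neurons, 'score' => $score];
            }
        }
    }

    return $best;  // null when there is nothing to test
}

// Stand-in for real training: a made-up error, lowest at 3 layers and 6 neurons
$fakeError = function (int $layers, int $neurons): float {
    return abs($layers - 3) + abs($neurons - 6) / 10;
};

$best = findBestConfiguration(range(2, 4), range(1, 10), $fakeError);
print "{$best['layers']} layers, {$best['neurons']} neurons\n";
// prints: 3 layers, 6 neurons
```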

Page 39: Machine learning in php   singapore

DEEP LEARNING

• Chaining the neural networks

• Auto-encoders

• Unsupervised Learning

• Genetic algorithm, ant, random forest, naive Bayes

Page 40: Machine learning in php   singapore

Other tools

• PHP ext/fann

• R language

• https://github.com/kachkaev/php-r

• Scikit-learn

• https://github.com/scikit-learn/scikit-learn

• Mahout

• https://mahout.apache.org/

Page 41: Machine learning in php   singapore

Conclusion

• Machine learning is about data, not code

• There are tools to use it with PHP

• Fast to try : easy results or fast failure

• Use it for complex problems that tolerate errors

Page 42: Machine learning in php   singapore

THANK YOU!

@exakat