Machine Learning for the rescue

Gathering data is not a problem today. The bigger challenge is understanding that information and drawing conclusions from it. Fortunately, we can use techniques such as machine learning to "teach" a computer how to learn from our data. Fast artificial neural networks, random forests, SVMs, classification and clustering are just a few of the concepts that are ready to use. We will apply these solutions to a PHP application to deliver automatic insights and predictions and create real business value for a client. By the end of this session you will be familiar with machine learning ideas and prepared to solve unsolvable problems in PHP.

Mariusz Gil

June 24, 2016

Transcript

  1. Machine Learning for a RESCUE. Mariusz Gil

  2. None
  3. None
  4. CLIENT PROBLEM

  5. 1M BACKLINKS CLASSIFY THEM

  6. None
  7. OK NOT OK I DON’T CARE

  10. None
  11. T(URL) → [1, 2, 3, …]
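
The transformation T(URL) → [1, 2, 3, …] can be sketched as a function that maps a backlink URL to a fixed-length numeric vector. The talk does not say which features were actually used, so the function name `urlToFeatures` and every feature below (length, path depth, query flag, digit count, HTTPS flag, dots in the host) are hypothetical choices for illustration only.

```php
<?php
// Hypothetical feature extractor: the real feature set is not given in the slides.
function urlToFeatures(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false) {
        $parts = []; // malformed URL: fall back to empty components
    }
    $host = $parts['host'] ?? '';
    $path = $parts['path'] ?? '';

    // Path depth = number of non-empty path segments.
    $segments = array_filter(explode('/', $path), function ($segment) {
        return $segment !== '';
    });

    return [
        strlen($url),                                   // total URL length
        count($segments),                               // path depth
        isset($parts['query']) ? 1 : 0,                 // has a query string?
        preg_match_all('/\d/', $url),                   // number of digits
        (($parts['scheme'] ?? '') === 'https') ? 1 : 0, // served over HTTPS?
        substr_count($host, '.'),                       // dots in the host name
    ];
}

$features = urlToFeatures('https://blog.example.com/2016/06/ml-in-php?ref=tweet');
echo implode(', ', $features) . PHP_EOL;
```

Every URL yields a vector of the same length, which is what a classifier needs as input regardless of which features are finally chosen.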

  12. IF-OLOGY UGLY CODE FOR POC 1ST APPROACH

  13. I DON’T KNOW

  14. NAIVE MACHINE LEARNING 2ND APPROACH

  16. DATA ML TASK SEND TO RESULTS CALCULATE

  17. None
  18. RECIPE FOR A FAILURE DOING WITHOUT KNOWING

  19. DATA ORIENTED MACHINE LEARNING WORKFLOW 3RD APPROACH, FINAL

  20. A COMPUTER PROGRAM IS SAID TO LEARN FROM EXPERIENCE E WITH RESPECT TO SOME CLASS OF TASKS T AND PERFORMANCE MEASURE P IF ITS PERFORMANCE AT TASKS IN T, AS MEASURED BY P, IMPROVES WITH EXPERIENCE E
  21. DATA ML TASK PREPARED, INPUT FOR RESULTS WITH PERFORMANCE EXPERIENCE

    FEEDBACK LOOP LEARNING, VALIDATING
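
The learning and validating loop can be sketched as: hold back part of the labelled data, learn from the rest, then measure performance on the held-back part. The sketch below uses plain PHP with a 1-nearest-neighbour rule as a stand-in model and accuracy as the performance measure P. The toy points, labels, and function names are all assumptions made for illustration, not material from the talk.

```php
<?php
// Split samples and labels into a training part and a held-out validation part.
function trainTestSplit(array $samples, array $labels, float $testRatio, int $seed): array
{
    $indices = array_keys($samples);
    mt_srand($seed); // fixed seed so the split is reproducible
    shuffle($indices);
    $testSize = (int) round(count($indices) * $testRatio);

    $split = ['trainX' => [], 'trainY' => [], 'testX' => [], 'testY' => []];
    foreach ($indices as $position => $i) {
        $part = $position < $testSize ? 'test' : 'train';
        $split[$part . 'X'][] = $samples[$i];
        $split[$part . 'Y'][] = $labels[$i];
    }
    return $split;
}

// Stand-in model: label of the closest training sample (1-nearest neighbour).
function predictNearest(array $trainX, array $trainY, array $x): string
{
    $best = '';
    $bestDistance = INF;
    foreach ($trainX as $i => $candidate) {
        $distance = 0.0;
        foreach ($candidate as $j => $value) {
            $distance += ($value - $x[$j]) ** 2;
        }
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $trainY[$i];
        }
    }
    return $best;
}

// Performance measure P: fraction of correct predictions.
function accuracy(array $expected, array $predicted): float
{
    $hits = 0;
    foreach ($expected as $i => $label) {
        if ($label === $predicted[$i]) {
            $hits++;
        }
    }
    return $hits / count($expected);
}

// Toy data: two well-separated groups (illustrative values only).
$samples = [[1, 1], [1, 2], [2, 1], [2, 2], [1.5, 1.5], [8, 8], [8, 9], [9, 8], [9, 9], [8.5, 8.5]];
$labels  = ['OK', 'OK', 'OK', 'OK', 'OK', 'NOT OK', 'NOT OK', 'NOT OK', 'NOT OK', 'NOT OK'];

$split = trainTestSplit($samples, $labels, 0.3, 42);
$predicted = [];
foreach ($split['testX'] as $x) {
    $predicted[] = predictNearest($split['trainX'], $split['trainY'], $x);
}
$score = accuracy($split['testY'], $predicted);
echo 'Accuracy on held-out data: ' . $score . PHP_EOL;
```

The score on held-out data is the feedback: if it does not improve as more or better data is added, the model is not learning in the sense of the definition above.
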
  22. ML TASK: CLASSIFICATION, REGRESSION, CLUSTERING, DIMENSIONALITY REDUCTION, ASSOCIATION RULES

  23. EXAMPLE TIME :)

  24. FAST ARTIFICIAL NEURAL NETWORK CLASSIFICATION

  25. None
  26. 80 2 4
    2 1
    1 0 0 0
    1 9
    1 0 0 0
    1 8
    1 0 0 0
    9 8
    1 0 0 0
    4 3
    1 0 0 0
    5 8
    1 0 0 0
    5 1
    1 0 0 0
    9 10
    1 0 0 0
    4 7
    1 0 0 0
    5 9
    …
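
The data above follows the FANN training-file layout: a header line with the number of training pairs, inputs, and outputs, followed by alternating input and output lines. Below is a minimal sketch of a generator for such a file. It assumes, based only on the file name and the visible rows, that each sample is an (x, y) point and that the four outputs one-hot encode its quadrant. The order of quadrants II to IV in the output columns and both function names are guesses.

```php
<?php
// Assumption: the 4 outputs one-hot encode the quadrant of the point (x, y).
function quadrantOneHot(int $x, int $y): array
{
    if ($x > 0 && $y > 0) {
        return [1, 0, 0, 0]; // quadrant I (matches the rows visible on the slide)
    }
    if ($x < 0 && $y > 0) {
        return [0, 1, 0, 0]; // quadrant II (column order assumed)
    }
    if ($x < 0 && $y < 0) {
        return [0, 0, 1, 0]; // quadrant III (column order assumed)
    }
    return [0, 0, 0, 1];     // quadrant IV (column order assumed)
}

// Build the file contents: header line, then one input line and one output line per pair.
function buildFannTrainingData(int $pairs, int $seed): string
{
    mt_srand($seed);
    $lines = [sprintf('%d %d %d', $pairs, 2, 4)];
    for ($i = 0; $i < $pairs; $i++) {
        // Draw non-zero coordinates so every point lies strictly inside a quadrant.
        do {
            $x = mt_rand(-10, 10);
        } while ($x === 0);
        do {
            $y = mt_rand(-10, 10);
        } while ($y === 0);
        $lines[] = "$x $y";
        $lines[] = implode(' ', quadrantOneHot($x, $y));
    }
    return implode(PHP_EOL, $lines) . PHP_EOL;
}

$data = buildFannTrainingData(80, 42);
// Show the header and the first two pairs; write $data to a file to train on it.
echo implode(PHP_EOL, array_slice(explode(PHP_EOL, $data), 0, 5)) . PHP_EOL;
```
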
  27. None
  28. <?php
    // Network topology: 2 inputs, one hidden layer of 4 neurons, 4 outputs.
    $num_input = 2;
    $num_output = 4;
    $num_layers = 3;
    $num_neurons_hidden = 4;

    // Training stops at this error or after $max_epochs, whichever comes first.
    $desired_error = 0.001;
    $max_epochs = 500000;
    $epochs_between_reports = 1000;

    $ann = fann_create_standard($num_layers, $num_input, $num_neurons_hidden, $num_output);

    if ($ann) {
        fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC);
        fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC);

        $filename = dirname(__FILE__) . "/coordination_system.data";
        if (fann_train_on_file($ann, $filename, $max_epochs, $epochs_between_reports, $desired_error)) {
            // Persist the trained network so it can be reused without retraining.
            fann_save($ann, dirname(__FILE__) . "/coordination_system.net");
        }

        fann_destroy($ann);
    }
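  29. <?php
    // Load the network saved by the training script.
    $train_file = (dirname(__FILE__) . "/coordination_system.net");
    $ann = fann_create_from_file($train_file);

    if ($ann) {
        // The two input values come from the command line.
        $input = array($argv[1], $argv[2]);
        // $calc_out is an array with one value per output neuron.
        $calc_out = fann_run($ann, $input);
        fann_destroy($ann);
    }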
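
The script above stops once `$calc_out` is filled. One common way to turn the four output activations into a class is to take the index of the largest one (argmax). This is a sketch under assumptions: the label names and their order are hypothetical, and the sample activations are made up for illustration rather than taken from a real run.

```php
<?php
// Return the index of the largest value in the list (argmax).
function argmax(array $values): int
{
    $bestIndex = 0;
    foreach ($values as $i => $value) {
        if ($value > $values[$bestIndex]) {
            $bestIndex = $i;
        }
    }
    return $bestIndex;
}

// Hypothetical labels: the order must match the output columns of the training file.
$labels = ['quadrant I', 'quadrant II', 'quadrant III', 'quadrant IV'];

// Made-up activations standing in for the array returned by fann_run().
$calc_out = [0.97, -0.02, 0.01, 0.05];

echo $labels[argmax($calc_out)] . PHP_EOL;
```
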
  30. SUPERVISED LEARNING

  31. SUPPORT VECTOR MACHINES CLASSIFICATION

  32. A SUPPORT VECTOR MACHINE PERFORMS CLASSIFICATION BY FINDING THE HYPERPLANE THAT MAXIMIZES THE MARGIN BETWEEN THE GIVEN CLASSES
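
To make "maximizes the margin" concrete, the sketch below computes the geometric margin of two candidate separating lines over a tiny labelled set and shows which one a margin-maximizing method would prefer. It is not an SVM solver: it only compares given candidates. The points, the candidate lines, and the function names are assumptions chosen for illustration.

```php
<?php
// Signed distance of point $x from the hyperplane w·x + b = 0.
function signedDistance(array $w, float $b, array $x): float
{
    $dot = 0.0;
    $squaredNorm = 0.0;
    foreach ($w as $i => $wi) {
        $dot += $wi * $x[$i];
        $squaredNorm += $wi * $wi;
    }
    return ($dot + $b) / sqrt($squaredNorm);
}

// Geometric margin: the smallest label-signed distance over all samples.
// Labels are +1 or -1; a negative margin means at least one sample is misclassified.
function margin(array $w, float $b, array $samples, array $labels): float
{
    $smallest = INF;
    foreach ($samples as $i => $x) {
        $smallest = min($smallest, $labels[$i] * signedDistance($w, $b, $x));
    }
    return $smallest;
}

// Illustrative data: two points per class.
$samples = [[1, 1], [2, 1], [4, 4], [5, 4]];
$labels  = [-1, -1, 1, 1];

$marginDiagonal = margin([1, 1], -5.5, $samples, $labels); // line x + y = 5.5
$marginVertical = margin([1, 0], -2.5, $samples, $labels); // line x = 2.5

echo 'Diagonal line margin: ' . round($marginDiagonal, 3) . PHP_EOL;
echo 'Vertical line margin: ' . round($marginVertical, 3) . PHP_EOL;
```

Both lines separate the classes, but the diagonal one keeps the closest points further away, so it is the better of these two candidates by the margin criterion.
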
  33. SUPERVISED LEARNING

  34. K-MEANS CLUSTERING

  35. IRIS DATASET 1936, RONALD FISHER

  36. None
  37. None
  38. None
  39. <?php
    require 'vendor/autoload.php';

    use Phpml\Dataset\Demo\Iris;
    use Phpml\Clustering\KMeans;

    $dataset = new Iris();
    $kmeans = new KMeans(3);

    echo 'Dataset size: ' . count($dataset->getSamples()) . PHP_EOL;

    $clusters = $kmeans->cluster($dataset->getSamples());
    foreach ($clusters as $i => $cluster) {
        echo 'Cluster #' . $i . ' :' . count($cluster) . PHP_EOL;
    }
  40. $ php -f ./iris-clustering.php
    Dataset size: 150
    Cluster #0 :39
    Cluster #1 :50
    Cluster #2 :61
    $ php -f ./iris-clustering.php
    Dataset size: 150
    Cluster #0 :38
    Cluster #1 :50
    Cluster #2 :62
    $ php -f ./iris-clustering.php
    Dataset size: 150
    Cluster #0 :39
    Cluster #1 :50
    Cluster #2 :61
    $ php -f ./iris-clustering.php
    Dataset size: 150
    Cluster #0 :96
    Cluster #1 :24
    Cluster #2 :30
  41. RESULTS STABILITY
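
The runs differ because k-means starts from randomly chosen centroids and can settle in different local optima. A common mitigation is to run it several times and keep the result with the lowest within-cluster sum of squared distances (inertia). The sketch below is a plain-PHP illustration of that idea, not the php-ml implementation; the function names and the toy points are assumptions.

```php
<?php
function squaredDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }
    return $sum;
}

// One k-means run from random initial centroids; returns [assignments, inertia].
function kMeansOnce(array $points, int $k, int $maxIterations = 100): array
{
    // Random initialisation: pick k distinct points as starting centroids.
    $centroids = [];
    foreach ((array) array_rand($points, $k) as $key) {
        $centroids[] = $points[$key];
    }
    $assignments = [];

    for ($iteration = 0; $iteration < $maxIterations; $iteration++) {
        // Assignment step: attach every point to its nearest centroid.
        $newAssignments = [];
        foreach ($points as $i => $point) {
            $best = 0;
            foreach ($centroids as $c => $centroid) {
                if (squaredDistance($point, $centroid) < squaredDistance($point, $centroids[$best])) {
                    $best = $c;
                }
            }
            $newAssignments[$i] = $best;
        }
        if ($newAssignments === $assignments) {
            break; // converged: nothing moved
        }
        $assignments = $newAssignments;

        // Update step: move each centroid to the mean of its members.
        foreach ($centroids as $c => $centroid) {
            $members = [];
            foreach ($assignments as $i => $cluster) {
                if ($cluster === $c) {
                    $members[] = $points[$i];
                }
            }
            if ($members === []) {
                continue; // an empty cluster keeps its previous centroid
            }
            foreach ($centroid as $d => $unused) {
                $centroids[$c][$d] = array_sum(array_column($members, $d)) / count($members);
            }
        }
    }

    $inertia = 0.0;
    foreach ($points as $i => $point) {
        $inertia += squaredDistance($point, $centroids[$assignments[$i]]);
    }
    return [$assignments, $inertia];
}

// Run several restarts and keep the one with the lowest inertia.
function kMeansBestOf(array $points, int $k, int $restarts, int $seed): array
{
    mt_srand($seed); // seeding makes the whole procedure repeatable
    $best = null;
    for ($run = 0; $run < $restarts; $run++) {
        $result = kMeansOnce($points, $k);
        if ($best === null || $result[1] < $best[1]) {
            $best = $result;
        }
    }
    return $best;
}

// Toy points (illustrative only): three loose groups.
$points = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8], [1, 8], [1, 9], [2, 9]];
[$assignments, $inertia] = kMeansBestOf($points, 3, 10, 1);
echo 'Best inertia over 10 restarts: ' . round($inertia, 3) . PHP_EOL;
```

Restarts reduce the variance between runs but do not guarantee the global optimum; fixing the seed only makes a given result reproducible.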

  42. UNSUPERVISED LEARNING

  43. RECIPE FOR A FAILURE DON’T YOU KNOW YOUR DATA?

  44. PREDICTING VALUES REGRESSION

  45. HOW MANY BRITISH POUNDS… EURO SHOULD I EARN AS A DEVELOPER ACCORDING TO MY SKILLSET?
  46. | age | linkedin_php | salary |
    |-----|--------------|--------|
    | 20  | 0            | 2000   |
    | 26  | 8            | 3975   |
    | 30  | 10           | 4000   |
  47. YEARS → LINKEDIN PHP →

  48. <?php
    require 'vendor/autoload.php';

    use Phpml\Dataset\ArrayDataset;
    use Phpml\Regression\LeastSquares;

    // Samples are [age, linkedin_php]; targets are salaries.
    $dataset = new ArrayDataset(
        [
            [20, 0],
            [26, 8],
            [30, 10],
        ],
        [
            2000,
            3975,
            4000,
        ]
    );

    $regression = new LeastSquares();
    $regression->train($dataset->getSamples(), $dataset->getTargets());

    // Predict a salary for the age and linkedin_php values given on the command line.
    echo $regression->predict(array_slice($argv, 1)) . PHP_EOL;
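
To show what a least-squares fit computes, here is a plain-PHP sketch of the closed-form solution for a single feature, using only the age and salary columns from the table. It is a simplification for illustration: the script above fits two features through php-ml, while this one ignores `linkedin_php`. The function name `fitLine` is hypothetical.

```php
<?php
// Ordinary least squares for one feature: y ≈ slope * x + intercept.
function fitLine(array $x, array $y): array
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sxy = 0.0; // sum of (x - meanX) * (y - meanY)
    $sxx = 0.0; // sum of (x - meanX)^2
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }

    $slope = $sxy / $sxx;
    return ['slope' => $slope, 'intercept' => $meanY - $slope * $meanX];
}

// Age and salary columns from the table.
$ages     = [20, 26, 30];
$salaries = [2000, 3975, 4000];

$model = fitLine($ages, $salaries);
$predictedAt28 = $model['slope'] * 28 + $model['intercept'];

echo 'slope: ' . round($model['slope'], 2) . PHP_EOL;
echo 'intercept: ' . round($model['intercept'], 2) . PHP_EOL;
echo 'predicted salary at age 28: ' . round($predictedAt28, 2) . PHP_EOL;
```

With three rows the fit is driven by very little evidence, which is worth keeping in mind before trusting any prediction from it.
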
  49. | age | city_size | linkedin_php | salary |
    |-----|-----------|--------------|--------|
    | 20  | 900000    | 0            | 2000   |
    | 20  | 400000    | 0            | 1800   |
    | 25  | 450000    | 8            | 3700   |
    | 26  | 900000    | 8            | 3975   |
    | 30  | 100000    | 10           | 4000   |
    | 30  | 500000    | 10           | 3500   |
  50. SUPERVISED LEARNING

  51. TECHNOLOGY

  52. None
  53. None
  54. None
  55. None
  56. None
  57. …JVM, PYTHON

  58. None
  59. ML IS NOT A SINGLE RUN OF ALGORITHM

  60. IT’S A PROCESS

  61. ML PROCESS
    DEFINE A PROBLEM
    ANALYZE YOUR DATA
    UNDERSTAND YOUR DATA
    PREPARE DATA FOR ML
    SELECT & RUN ALGO(S)
    TUNE ALGO(S) PARAMETERS
    SELECT FINAL MODEL
    VALIDATE FINAL MODEL
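
As one example of the "prepare data for ML" step, features on very different scales (an age around 20 to 30 next to a city size in the hundreds of thousands) are often rescaled before training. The sketch below applies min-max scaling to the feature columns of the salary table in plain PHP. Whether the talk used this particular preparation is not stated; the function name is hypothetical.

```php
<?php
// Rescale every column to the range [0, 1] so large-valued features do not dominate.
function minMaxScale(array $rows): array
{
    $columnCount = count($rows[0]);
    $scaled = $rows;
    for ($c = 0; $c < $columnCount; $c++) {
        $values = array_column($rows, $c);
        $min = min($values);
        $range = max($values) - $min;
        foreach ($rows as $r => $row) {
            // A constant column carries no information; map it to 0.
            $scaled[$r][$c] = $range == 0 ? 0.0 : (float) (($row[$c] - $min) / $range);
        }
    }
    return $scaled;
}

// Feature columns from the salary table: age, city_size, linkedin_php.
$samples = [
    [20, 900000, 0],
    [20, 400000, 0],
    [25, 450000, 8],
    [26, 900000, 8],
    [30, 100000, 10],
    [30, 500000, 10],
];

$scaled = minMaxScale($samples);
foreach ($scaled as $row) {
    echo implode(', ', array_map(function ($v) { return round($v, 3); }, $row)) . PHP_EOL;
}
```

The minimum and range learned from the training data have to be reused unchanged when scaling new inputs at prediction time.
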
  63. | age | city_size | linkedin_php | salary |
    |-----|-----------|--------------|--------|
    | 20  | 900000    | 0            | 2000   |
    | 20  | 400000    | 0            | 1800   |
    | 25  | 450000    | 8            | 3700   |
    | 26  | 900000    | 8            | 3975   |
    | 30  | 100000    | 10           | 4000   |
    | 30  | 500000    | 10           | 3500   |
  64. | age | city_size | linkedin_php | salary | currency |
    |-----|-----------|--------------|--------|----------|
    | 20  | 900000    | 0            | 2000   | EUR      |
    | 20  | 400000    | 0            | 1800   | USD      |
    | 25  | 450000    | 8            | 3700   | USD      |
    | 26  | 900000    | 8            | 3975   | USD      |
    | 30  | 100000    | 10           | 4000   | USD      |
    | 30  | 500000    | 10           | 3500   | USD      |
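
The currency column shows why the data has to be understood before training: the salary numbers are not all in the same unit. A minimal sketch of one possible fix is to convert every salary to a single currency first. The exchange rate below is a placeholder, not a real quote, and the function name is hypothetical.

```php
<?php
// Convert [salary, currency] pairs into one common currency before training.
function toCommonCurrency(array $rows, array $ratesToEur): array
{
    $targets = [];
    foreach ($rows as $row) {
        [$salary, $currency] = $row;
        if (!array_key_exists($currency, $ratesToEur)) {
            // Fail loudly instead of silently training on mixed units.
            throw new InvalidArgumentException("No rate for currency: $currency");
        }
        $targets[] = $salary * $ratesToEur[$currency];
    }
    return $targets;
}

// Salary and currency columns from the table.
$rows = [
    [2000, 'EUR'],
    [1800, 'USD'],
    [3700, 'USD'],
    [3975, 'USD'],
    [4000, 'USD'],
    [3500, 'USD'],
];

// Placeholder rate for illustration only; use the real rate for the relevant date.
$ratesToEur = ['EUR' => 1.0, 'USD' => 0.9];

$targets = toCommonCurrency($rows, $ratesToEur);
echo implode(', ', $targets) . PHP_EOL;
```
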
  65. ONE MORE THING…

  66. PHPCON CFP WILL BE CLOSED TOMORROW! http://phpcon.pl/2016/en/cfp

  67. THANKS mariuszgil HAPPY LEARNING YOUR MACHINES!