Machine Learning for the rescue

Gathering data is not a problem today. The bigger challenge is to understand that information and draw conclusions from it. Fortunately, we can use techniques such as machine learning to “teach” a computer how to learn from our data. Fast artificial neural networks, random forests, SVMs, classification, clustering: these are just a few of the concepts that are ready to use. We will apply these solutions to a PHP application to deliver automatic insights and predictions and to create real business value for a client. By the end of this session you will be familiar with machine learning ideas and prepared to solve seemingly unsolvable problems in PHP.

Mariusz Gil

June 24, 2016

Transcript

  1. Machine Learning for a RESCUE
    Mariusz Gil

  4. CLIENT PROBLEM

  5. 1M BACKLINKS
    CLASSIFY THEM

  7. OK
    NOT OK
    I DON’T CARE

  11. T(URL) → [1, 2, 3, …]
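
The slide's T(URL) → [1, 2, 3, …] suggests turning each backlink URL into a vector of numbers before any classifier sees it. The sketch below shows one way such a transformation could look. The function name `urlToFeatures` and the four features it extracts are illustrative assumptions; the talk does not say which features were actually used.

```php
<?php
// Hypothetical feature extractor: the chosen features are assumptions,
// picked only to show the shape of a URL -> numeric vector mapping.
function urlToFeatures(string $url): array
{
    $parts = parse_url($url);
    $host = $parts['host'] ?? '';
    $path = $parts['path'] ?? '';

    return [
        strlen($url),                                       // total URL length
        substr_count($host, '.'),                           // dots in the host name
        count(array_filter(explode('/', $path), 'strlen')), // non-empty path segments
        ($parts['scheme'] ?? '') === 'https' ? 1 : 0,       // 1 if served over HTTPS
    ];
}

// Prints 42, 2, 3, 1
echo implode(', ', urlToFeatures('https://blog.example.com/2016/06/ml-in-php')) . PHP_EOL;
```

Every URL ends up as a fixed-length numeric array, which is the input shape the learning examples later in the deck expect.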

  12. 1ST APPROACH
    IF-OLOGY
    UGLY CODE FOR POC

  13. I DON’T KNOW

  14. 2ND APPROACH
    NAIVE MACHINE LEARNING

  16. DATA → SEND TO → ML TASK → CALCULATE → RESULTS

  18. RECIPE FOR A FAILURE
    DOING WITHOUT KNOWING

  19. 3RD APPROACH, FINAL
    DATA ORIENTED MACHINE LEARNING WORKFLOW

  20. A COMPUTER PROGRAM
    IS SAID TO LEARN FROM EXPERIENCE E
    WITH RESPECT TO SOME CLASS OF TASKS T
    AND PERFORMANCE MEASURE P
    IF ITS PERFORMANCE AT TASKS IN T,
    AS MEASURED BY P,
    IMPROVES WITH EXPERIENCE E

  21. DATA (PREPARED, INPUT FOR) → ML TASK (LEARNING, VALIDATING) → RESULTS (WITH PERFORMANCE)
    EXPERIENCE FEEDBACK LOOP

  22. ML
    TASK
    CLASSIFICATION
    REGRESSION
    CLUSTERING
    DIMENSIONALITY REDUCTION
    ASSOCIATION RULES

  23. EXAMPLE TIME :)

  24. FAST ARTIFICIAL
    NEURAL NETWORK
    CLASSIFICATION

  26. 80 2 4
    2 1
    1 0 0 0
    1 9
    1 0 0 0
    1 8
    1 0 0 0
    9 8
    1 0 0 0
    4 3
    1 0 0 0
    5 8
    1 0 0 0
    5 1
    1 0 0 0
    9 10
    1 0 0 0
    4 7
    1 0 0 0
    5 9
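
The numbers on this slide follow the FANN training-file layout: a header line with the number of training pairs, inputs and outputs (`80 2 4`), then alternating lines of input values and expected output values. A minimal sketch of how such a file could be generated is shown below. The function names are hypothetical, and the mapping of quadrants to one-hot positions is an assumption: the slide only shows points with positive coordinates labelled `1 0 0 0`.

```php
<?php
// Hypothetical encoding: only "both coordinates positive => 1 0 0 0" is
// visible on the slide; the order of the other three classes is assumed.
// Points lying on an axis are not handled specially in this sketch.
function quadrantOneHot(int $x, int $y): array
{
    if ($x > 0 && $y > 0) {
        $index = 0;
    } elseif ($x < 0 && $y > 0) {
        $index = 1;
    } elseif ($x < 0 && $y < 0) {
        $index = 2;
    } else {
        $index = 3;
    }
    $row = [0, 0, 0, 0];
    $row[$index] = 1;

    return $row;
}

// Build the file contents: header first, then one input line and one
// output line per training pair.
function buildTrainingData(array $points): string
{
    $lines = [count($points) . ' 2 4'];
    foreach ($points as [$x, $y]) {
        $lines[] = $x . ' ' . $y;
        $lines[] = implode(' ', quadrantOneHot($x, $y));
    }

    return implode(PHP_EOL, $lines) . PHP_EOL;
}

echo buildTrainingData([[2, 1], [-3, 4], [-5, -6], [7, -8]]);
```

The returned string can be written to a file with `file_put_contents()` and used as the training file for the network.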

  28. $num_input = 2;
    $num_output = 4;
    // Three layers in total: input, one hidden layer, output.
    $num_layers = 3;
    $num_neurons_hidden = 4;
    // Training stops at this error or after $max_epochs, whichever comes first.
    $desired_error = 0.001;
    $max_epochs = 500000;
    $epochs_between_reports = 1000;

    $ann = fann_create_standard($num_layers, $num_input, $num_neurons_hidden, $num_output);
    if ($ann) {
        fann_set_activation_function_hidden($ann, FANN_SIGMOID_SYMMETRIC);
        fann_set_activation_function_output($ann, FANN_SIGMOID_SYMMETRIC);

        $filename = __DIR__ . "/coordination_system.data";
        if (fann_train_on_file(
            $ann,
            $filename,
            $max_epochs,
            $epochs_between_reports,
            $desired_error
        )) {
            // Persist the trained network so that it can be reused later.
            fann_save($ann, __DIR__ . "/coordination_system.net");
        }
        fann_destroy($ann);
    }

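  29. // Load the network saved by the training script.
    $train_file = __DIR__ . "/coordination_system.net";
    $ann = fann_create_from_file($train_file);
    if ($ann) {
        // Command-line arguments arrive as strings; cast them to floats.
        $input = array((float) $argv[1], (float) $argv[2]);
        // $calc_out holds one activation per output neuron (four here).
        $calc_out = fann_run($ann, $input);
        print_r($calc_out);
        fann_destroy($ann);
    }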
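
`fann_run()` returns one activation per output neuron, so with four outputs trained against one-hot targets the predicted class is the position of the largest activation. The helper below is a hypothetical sketch of that last step. The activations are invented values standing in for a real `fann_run()` result, so the example runs without the FANN extension.

```php
<?php
// Hypothetical helper: return the index of the strongest output neuron.
function pickClass(array $outputs): int
{
    // array_search() returns the first key that holds the maximum value.
    return (int) array_search(max($outputs), $outputs, true);
}

// Made-up activations, used here instead of a real fann_run() result.
$calc_out = [0.91, -0.02, 0.05, -0.10];

// Prints: Predicted class index: 0
echo 'Predicted class index: ' . pickClass($calc_out) . PHP_EOL;
```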

  30. SUPERVISED
    LEARNING

  31. SUPPORT
    VECTOR MACHINES
    CLASSIFICATION

  32. A SUPPORT VECTOR MACHINE
    PERFORMS CLASSIFICATION
    BY FINDING THE HYPERPLANE
    THAT MAXIMIZES THE MARGIN
    BETWEEN THE GIVEN CLASSES
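
To make the definition a little more concrete: for a linear classifier f(x) = w·x + b, a point is assigned to a class by the sign of f(x), and under the usual scaling where the closest points satisfy |f(x)| = 1, the margin width is 2/‖w‖. The sketch below only evaluates these two formulas. The weights and bias are made-up values, and no training (the actual search for the maximizing hyperplane) is performed.

```php
<?php
// Decision value f(x) = w . x + b for a linear classifier.
function decision(array $w, float $b, array $x): float
{
    $sum = $b;
    foreach ($w as $i => $wi) {
        $sum += $wi * $x[$i];
    }

    return $sum;
}

// Margin width 2 / ||w||, valid when support vectors satisfy |f(x)| = 1.
function marginWidth(array $w): float
{
    $squares = array_map(function ($wi) {
        return $wi * $wi;
    }, $w);

    return 2 / sqrt(array_sum($squares));
}

// Hypothetical hyperplane parameters, chosen only for easy arithmetic.
$w = [3.0, 4.0];
$b = -5.0;

echo 'Class of (3, 4): ' . (decision($w, $b, [3, 4]) >= 0 ? '+1' : '-1') . PHP_EOL;
echo 'Class of (0, 0): ' . (decision($w, $b, [0, 0]) >= 0 ? '+1' : '-1') . PHP_EOL;
echo 'Margin width: ' . marginWidth($w) . PHP_EOL;
```

A smaller ‖w‖ means a wider margin, which is why maximizing the margin is usually phrased as minimizing ‖w‖ subject to every training point being classified correctly.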

  33. SUPERVISED
    LEARNING

  34. K-MEANS
    CLUSTERING

  35. IRIS DATASET
    1936, RONALD FISHER

  39. require 'vendor/autoload.php';

    use Phpml\Dataset\Demo\Iris;
    use Phpml\Clustering\KMeans;

    $dataset = new Iris();
    // Ask for three clusters, one per iris species in the dataset.
    $kmeans = new KMeans(3);

    echo 'Dataset size: ' . count($dataset->getSamples()) . PHP_EOL;

    $clusters = $kmeans->cluster($dataset->getSamples());
    foreach ($clusters as $i => $cluster) {
        echo 'Cluster #' . $i . ' :' . count($cluster) . PHP_EOL;
    }

  40. $ php -f ./iris-clustering.php
    Dataset size: 150
    Cluster #0 :39
    Cluster #1 :50
    Cluster #2 :61
    $ php -f ./iris-clustering.php
    Dataset size: 150
    Cluster #0 :38
    Cluster #1 :50
    Cluster #2 :62
    $ php -f ./iris-clustering.php
    Dataset size: 150
    Cluster #0 :39
    Cluster #1 :50
    Cluster #2 :61
    $ php -f ./iris-clustering.php
    Dataset size: 150
    Cluster #0 :96
    Cluster #1 :24
    Cluster #2 :30

  41. RESULTS STABILITY
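
The differing cluster sizes between runs are most likely explained by k-means depending on where its centroids start: the algorithm converges to a local optimum, and different starting points can lead to different ones. The sketch below is a minimal one-dimensional k-means written for illustration only. It is not the php-ml implementation; the function name and the data are invented, and the starting centroids are passed in explicitly so that the effect is reproducible.

```php
<?php
// Minimal 1-D k-means (Lloyd's algorithm) with caller-supplied starting
// centroids. Returns the number of points in each cluster.
function kMeans1d(array $points, array $centroids, int $maxIterations = 100): array
{
    $assignment = [];
    for ($iter = 0; $iter < $maxIterations; $iter++) {
        // Assignment step: attach every point to its nearest centroid.
        $newAssignment = [];
        foreach ($points as $p) {
            $best = 0;
            foreach ($centroids as $c => $centre) {
                if (abs($p - $centre) < abs($p - $centroids[$best])) {
                    $best = $c;
                }
            }
            $newAssignment[] = $best;
        }
        if ($newAssignment === $assignment) {
            break; // nothing moved, so the algorithm has converged
        }
        $assignment = $newAssignment;

        // Update step: move each centroid to the mean of its members.
        foreach ($centroids as $c => $centre) {
            $members = [];
            foreach ($points as $i => $p) {
                if ($assignment[$i] === $c) {
                    $members[] = $p;
                }
            }
            if ($members) {
                $centroids[$c] = array_sum($members) / count($members);
            }
        }
    }

    $sizes = array_fill(0, count($centroids), 0);
    foreach ($assignment as $c) {
        $sizes[$c]++;
    }

    return $sizes;
}

$points = [1, 2, 3, 11, 12, 13, 21, 22, 23];

// Well-spread starting centroids. Prints: 3, 3, 3
echo implode(', ', kMeans1d($points, [1, 11, 21])) . PHP_EOL;
// All starting centroids inside the first group. Prints: 1, 2, 6
echo implode(', ', kMeans1d($points, [1, 2, 3])) . PHP_EOL;
```

Common mitigations are to run the clustering several times and keep the best result, or to use a more careful initialization scheme such as k-means++.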

  42. UNSUPERVISED
    LEARNING

  43. RECIPE FOR A FAILURE
    DON’T YOU KNOW YOUR DATA?

  44. PREDICTING VALUES
    REGRESSION

  45. HOW MANY BRITISH POUNDS… EUROS
    SHOULD I EARN AS A DEVELOPER
    ACCORDING TO MY SKILLSET?

  46. | age | linkedin_php | salary |
    |-----|--------------|--------|
    | 20 | 0 | 2000 |
    | 26 | 8 | 3975 |
    | 30 | 10 | 4000 |

  47. [Plot; axis labels: YEARS, LINKEDIN PHP]

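  48. require 'vendor/autoload.php';

    use Phpml\Dataset\ArrayDataset;
    use Phpml\Regression\LeastSquares;

    // Samples are [age, linkedin_php]; targets are the matching salaries.
    $dataset = new ArrayDataset(
        [
            [20, 0],
            [26, 8],
            [30, 10],
        ],
        [
            2000,
            3975,
            4000,
        ]
    );

    $regression = new LeastSquares();
    $regression->train($dataset->getSamples(), $dataset->getTargets());

    // Predict the salary for the age and linkedin_php values given on the
    // command line; the arguments arrive as strings, so cast them to floats.
    echo $regression->predict(array_map('floatval', array_slice($argv, 1))) . PHP_EOL;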
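
With only three samples and three unknowns (an intercept plus one coefficient each for age and linkedin_php), a least-squares fit passes exactly through the training points, so fitting reduces to solving a 3×3 linear system. The sketch below solves that system in plain PHP to show what the model looks like. It is an illustration under that assumption, not php-ml's implementation, and `solveLinearSystem` is a hypothetical helper.

```php
<?php
// Solve A x = y by Gauss-Jordan elimination with partial pivoting.
function solveLinearSystem(array $a, array $y): array
{
    $n = count($y);
    for ($col = 0; $col < $n; $col++) {
        // Move the row with the largest pivot candidate into position.
        $pivot = $col;
        for ($row = $col + 1; $row < $n; $row++) {
            if (abs($a[$row][$col]) > abs($a[$pivot][$col])) {
                $pivot = $row;
            }
        }
        [$a[$col], $a[$pivot]] = [$a[$pivot], $a[$col]];
        [$y[$col], $y[$pivot]] = [$y[$pivot], $y[$col]];

        // Eliminate this column from every other row.
        for ($row = 0; $row < $n; $row++) {
            if ($row === $col) {
                continue;
            }
            $factor = $a[$row][$col] / $a[$col][$col];
            for ($k = $col; $k < $n; $k++) {
                $a[$row][$k] -= $factor * $a[$col][$k];
            }
            $y[$row] -= $factor * $y[$col];
        }
    }

    $x = [];
    for ($i = 0; $i < $n; $i++) {
        $x[] = $y[$i] / $a[$i][$i];
    }

    return $x;
}

// One row per sample from the slide: [1 (intercept), age, linkedin_php].
$rows = [[1, 20, 0], [1, 26, 8], [1, 30, 10]];
$salaries = [2000, 3975, 4000];

[$intercept, $perYear, $perSkillPoint] = solveLinearSystem($rows, $salaries);
echo 'intercept: ' . round($intercept, 2) . PHP_EOL;
echo 'per year of age: ' . round($perYear, 2) . PHP_EOL;
echo 'per linkedin_php point: ' . round($perSkillPoint, 2) . PHP_EOL;
```

Worked by hand, the coefficients come out as an intercept of 5750, −187.5 per year of age and 387.5 per linkedin_php point. The negative age coefficient is a reminder that a model fitted to three rows reproduces its training data perfectly while saying very little about anyone else.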

  49. | age | city_size | linkedin_php | salary |
    |-----|-----------|--------------|--------|
    | 20 | 900000 | 0 | 2000 |
    | 20 | 400000 | 0 | 1800 |
    | 25 | 450000 | 8 | 3700 |
    | 26 | 900000 | 8 | 3975 |
    | 30 | 100000 | 10 | 4000 |
    | 30 | 500000 | 10 | 3500 |

  50. SUPERVISED
    LEARNING

  51. TECHNOLOGY

  57. …JVM, PYTHON

  59. ML IS NOT
    A SINGLE RUN
    OF ALGORITHM

  60. IT’S A PROCESS

  61. ML
    PROCESS
    DEFINE A PROBLEM
    ANALYZE YOUR DATA
    UNDERSTAND YOUR DATA
    PREPARE DATA FOR ML
    SELECT & RUN ALGO(S)
    TUNE ALGO(S) PARAMETERS
    SELECT FINAL MODEL
    VALIDATE FINAL MODEL
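
The final step of this process, validating the model, is usually done on data the model never saw during training. One simple way to set that up is a hold-out split, sketched below. The function name, the default 25% test ratio and the seed are illustrative assumptions, not something taken from the talk.

```php
<?php
// Hypothetical helper: shuffle the samples reproducibly, then hold back a
// share of them for validation.
function trainTestSplit(array $samples, float $testRatio = 0.25, int $seed = 42): array
{
    mt_srand($seed);   // fixed seed so that the split is repeatable
    shuffle($samples); // works on a copy; the caller's array is untouched
    $testSize = (int) round(count($samples) * $testRatio);

    return [
        'test' => array_slice($samples, 0, $testSize),
        'train' => array_slice($samples, $testSize),
    ];
}

$split = trainTestSplit(range(1, 20));
echo 'Training samples: ' . count($split['train']) . PHP_EOL;
echo 'Test samples: ' . count($split['test']) . PHP_EOL;
```

The model is fitted on the `train` part only, and the performance measure is then computed on the `test` part.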

  63. | age | city_size | linkedin_php | salary |
    |-----|-----------|--------------|--------|
    | 20 | 900000 | 0 | 2000 |
    | 20 | 400000 | 0 | 1800 |
    | 25 | 450000 | 8 | 3700 |
    | 26 | 900000 | 8 | 3975 |
    | 30 | 100000 | 10 | 4000 |
    | 30 | 500000 | 10 | 3500 |

  64. | age | city_size | linkedin_php | salary | currency |
    |-----|-----------|--------------|--------|----------|
    | 20 | 900000 | 0 | 2000 | EUR |
    | 20 | 400000 | 0 | 1800 | USD |
    | 25 | 450000 | 8 | 3700 | USD |
    | 26 | 900000 | 8 | 3975 | USD |
    | 30 | 100000 | 10 | 4000 | USD |
    | 30 | 500000 | 10 | 3500 | USD |
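
One reading of this slide is that the salary column mixes currencies, so the raw numbers cannot be compared until they are converted to a single unit. The sketch below shows what such a preparation step might look like. The function name is hypothetical and the exchange rate is a placeholder value, not a real quote.

```php
<?php
// Placeholder conversion table: the EUR rate is invented for illustration.
const RATES_TO_USD = ['USD' => 1.0, 'EUR' => 1.1];

// Hypothetical helper: express a row's salary in USD.
function normalizeSalary(array $row, array $rates): array
{
    if (!isset($rates[$row['currency']])) {
        // Refuse to guess: an unknown currency should stop the pipeline.
        throw new InvalidArgumentException('Unknown currency: ' . $row['currency']);
    }
    $row['salary'] = $row['salary'] * $rates[$row['currency']];
    $row['currency'] = 'USD';

    return $row;
}

$rows = [
    ['age' => 20, 'city_size' => 900000, 'linkedin_php' => 0, 'salary' => 2000, 'currency' => 'EUR'],
    ['age' => 20, 'city_size' => 400000, 'linkedin_php' => 0, 'salary' => 1800, 'currency' => 'USD'],
];

foreach ($rows as $row) {
    $clean = normalizeSalary($row, RATES_TO_USD);
    echo round($clean['salary'], 2) . ' ' . $clean['currency'] . PHP_EOL;
}
```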

  65. ONE MORE THING…

  66. PHPCON CFP WILL BE
    CLOSED TOMORROW!
    http://phpcon.pl/2016/en/cfp

  67. THANKS
    mariuszgil
    HAPPY LEARNING YOUR MACHINES!
