Machine Learning with H2O.ai - Budapest Data Science Meetup - July 2016

szilard
June 25, 2016

Transcript

  1. Disclaimer: I am not representing my employer (Epoch) in this talk.

    I can neither confirm nor deny whether Epoch is using any of the methods, tools, results etc. mentioned in this talk.
  2. Supervised Learning

    - y = f(x)
    - train: “learn” f from data X (n*p), y (n)
    - score: f(x’)
    - algos: k-NN, LR, NB, RF, GBM, SVM, NN, DL…
    - goal: max accuracy measure (on new data)
  3. Supervised Learning

    - y = f(x)
    - train: “learn” f from data X (n*p), y (n)
    - score: f(x’)
    - algos: k-NN, LR, NB, RF, GBM, SVM, NN, DL…
    - goal: max accuracy measure (on new data)
    - f ∈ F(θ): min_θ ( L(y, f(x,θ)) + R(θ) ) on train set
  4. Supervised Learning

    - y = f(x)
    - train: “learn” f from data X (n*p), y (n)
    - score: f(x’)
    - algos: k-NN, LR, NB, RF, GBM, SVM, NN, DL…
    - goal: max accuracy measure (on new data)
    - f ∈ F(θ): min_θ ( L(y, f(x,θ)) + R(θ) ) on train set
    - evaluate on separate test set / cross validation
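The train / score / evaluate loop on this slide can be sketched in a few lines. This is only a toy illustration, not anything shown in the talk: it assumes a one-parameter linear model f(x, θ) = θ·x, squared-error loss L, and an L2 penalty R(θ) = λθ², for which the minimiser has a closed form. The data, the helper names `fit_ridge_1d` and `mse`, and the value of `lam` are all made up for the example.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Minimise sum((y - theta*x)^2) + lam * theta^2 over theta (closed form)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(xs, ys, theta):
    """Mean squared error of the model f(x) = theta * x."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (an assumption for this sketch): y = 3x + small Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]

# "train": learn theta on the training part only.
x_train, y_train = xs[:150], ys[:150]
x_test, y_test = xs[150:], ys[150:]
theta = fit_ridge_1d(x_train, y_train, lam=1.0)

# "score" and evaluate on data the model has not seen.
test_mse = mse(x_test, y_test, theta)
print(f"theta = {theta:.3f}, test MSE = {test_mse:.4f}")
```

The point of the held-out part is that `test_mse` estimates the error on new data, which is what the goal on the slide refers to.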
  5. Model selection

    - Need to vary λ and pick the model with the best accuracy on the validation set
    - Evaluate the final model on the test set / cross validation
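A minimal sketch of this model-selection recipe, under assumptions that are not from the talk: λ is taken to be the weight of an L2 penalty on a one-parameter linear model, the data is synthetic, and the grid of λ values and the helper names are hypothetical. Each candidate is fitted on the training set, compared on the validation set, and only the winner is scored on the test set.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Closed-form minimiser of sum((y - theta*x)^2) + lam * theta^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def mse(xs, ys, theta):
    """Mean squared error of the model f(x) = theta * x."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (assumption): y = 3x + Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(300)]
ys = [3.0 * x + rng.gauss(0, 0.5) for x in xs]

# Three disjoint parts: train / validation / test.
train = (xs[:100], ys[:100])
valid = (xs[100:200], ys[100:200])
test = (xs[200:], ys[200:])

# Vary lambda: fit on train, measure error on validation.
lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
val_scores = {}
for lam in lambdas:
    theta = fit_ridge_1d(*train, lam)
    val_scores[lam] = mse(*valid, theta)

# Keep the lambda with the lowest validation error ...
best_lam = min(val_scores, key=val_scores.get)

# ... and touch the test set exactly once, for the final model.
final_theta = fit_ridge_1d(*train, best_lam)
final_test_mse = mse(*test, final_theta)
print(f"best lambda = {best_lam}, test MSE = {final_test_mse:.3f}")
```

The validation error of the chosen model is optimistically biased because it was used for the choice, which is why the final number comes from the untouched test set.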
  6. Disclaimer: I’m not affiliated with H2O.ai.

    It’s just that in my opinion H2O is a machine learning tool with several advantages. There are many other good tools (and many more awful ones).
  7. - high-performance implementation of best algos (RF, GBM, NN etc.)

    - R, Python etc. interfaces, easy to use API
  8. - high-performance implementation of best algos (RF, GBM, NN etc.)

    - R, Python etc. interfaces, easy to use API
    - open source
    - advisors: Hastie, Tibshirani
  9. - high-performance implementation of best algos (RF, GBM, NN etc.)

    - R, Python etc. interfaces, easy to use API
    - open source
    - advisors: Hastie, Tibshirani
    - Java, but C-style memalloc, by Java gurus
    - distributed, “big data”
  10. - high-performance implementation of best algos (RF, GBM, NN etc.)

    - R, Python etc. interfaces, easy to use API
    - open source
    - advisors: Hastie, Tibshirani
    - Java, but C-style memalloc, by Java gurus
    - distributed, “big data”
    - many knobs/tuning, model evaluation, cross validation, model selection (hyperparameter search)
  11. - high-performance implementation of best algos (RF, GBM, NN etc.)

    - R, Python etc. interfaces, easy to use API
    - open source
    - advisors: Hastie, Tibshirani
    - Java, but C-style memalloc, by Java gurus
    - distributed, “big data”
    - many knobs/tuning, model evaluation, cross validation, model selection (hyperparameter search)
    - ensembles (from R)
    - model deployment (POJO export), fast scoring (<1ms)
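To give a feel for the Python interface listed above, here is a rough sketch of training and evaluating a GBM with the `h2o` package. It is not taken from the talk. It assumes the `h2o` Python package and a Java runtime are installed, that the input is a CSV file with a binary target column, and the function name `train_gbm_sketch`, its arguments, and the hyperparameter values are all hypothetical choices for illustration.

```python
def train_gbm_sketch(csv_path, response, pojo_dir=None):
    """Train an H2O GBM on a CSV file and return (model, test AUC).

    Assumes `response` names a binary classification target column.
    """
    # Imported inside the function so this sketch can be loaded
    # even on a machine where H2O is not installed.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()  # start or connect to a local H2O cluster (needs Java)

    df = h2o.import_file(csv_path)
    df[response] = df[response].asfactor()  # treat the target as categorical
    predictors = [c for c in df.columns if c != response]

    # 60% train / 20% validation / 20% test
    train, valid, test = df.split_frame(ratios=[0.6, 0.2], seed=1)

    # A few of the "knobs"; the values here are arbitrary.
    model = H2OGradientBoostingEstimator(
        ntrees=100, max_depth=5, learn_rate=0.1, seed=1
    )
    model.train(x=predictors, y=response,
                training_frame=train, validation_frame=valid)

    # Evaluate on the held-out test frame.
    auc = model.model_performance(test_data=test).auc()

    if pojo_dir is not None:
        # Export the model as plain Java source for deployment.
        model.download_pojo(path=pojo_dir)
    return model, auc
```

A call might look like `model, auc = train_gbm_sketch("data.csv", "label")`, where both the file name and the column name are placeholders.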