Better than Deep Learning: Gradient Boosting Machines (GBM) in R - eRum Conference - Budapest, May 2018

szilard
May 11, 2018

Transcript

  1. Better than Deep Learning: Gradient Boosting Machines (GBM) in R

    Szilárd Pafka, PhD, Chief Scientist, Epoch (USA). eRum Conference, Budapest, May 2018
  2. None
  3. Disclaimer: I am not representing my employer (Epoch) in this talk.
     I can neither confirm nor deny whether Epoch is using any of the methods, tools, results, etc. mentioned in this talk.
  4. Source: Andrew Ng

  5. Source: Andrew Ng

  6. Source: Andrew Ng

  7. None
  8. None
  9. None
  10. None
  11. Source: https://twitter.com/iamdevloper/

  12. None
  13. ...

  14. None
  15. None
  16. http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf http://lowrank.net/nikos/pubs/empirical.pdf

  17. http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf http://lowrank.net/nikos/pubs/empirical.pdf

  18. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL

  19. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. It depends

  20. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. It depends / try them all

  21. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. It depends / try them all / hyperparam tuning

  22. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. It depends / try them all / hyperparam tuning / ensembles

  23. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. It depends / try them all / hyperparam tuning / ensembles; feature engineering

  24. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. It depends / try them all / hyperparam tuning / ensembles; feature engineering / other goals, e.g. interpretability

  25. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. It depends / try them all / hyperparam tuning / ensembles; feature engineering / other goals, e.g. interpretability. The title of this talk was misguided

  26. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. It depends / try them all / hyperparam tuning / ensembles; feature engineering / other goals, e.g. interpretability. The title of this talk was misguided, but so is almost every recent use of the term AI
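
     Slides 18-26 recommend GBM (or RF) as the default for structured/tabular data. Below is a minimal R sketch of what that looks like in practice, assuming the xgboost package and its bundled agaricus demo data; the dataset and parameter values are illustrative, not taken from the talk.

```r
# Minimal sketch: training a GBM for binary classification on tabular data
# with the xgboost R package. The agaricus demo data that ships with xgboost
# stands in for a real dataset; parameter values are illustrative assumptions.
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test,  package = "xgboost")

dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest  <- xgb.DMatrix(agaricus.test$data,  label = agaricus.test$label)

params <- list(
  objective   = "binary:logistic",
  eval_metric = "auc",
  eta         = 0.1,   # learning rate
  max_depth   = 6      # depth of the individual trees
)

md <- xgb.train(params, dtrain, nrounds = 300,
                watchlist = list(valid = dtest),
                early_stopping_rounds = 20, verbose = 0)

phat <- predict(md, dtest)   # predicted probabilities on the held-out set
```
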
  27. Source: Hastie et al., ESL 2nd ed.

  28. Source: Hastie et al., ESL 2nd ed.

  29. Source: Hastie et al., ESL 2nd ed.

  30. Source: Hastie et al., ESL 2nd ed.

  31. None
  32. None
  33. 10x

  34. None
  35. None
  36. 10x

  37. None
  38. None
  39. None
  40. None
  41. None
  42. None
  43. None
  44. None
  45. None
  46. None
  47. None
  48. http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf

  49. http://www.argmin.net/2016/06/20/hypertuning/
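
     Slides 48-49 point to random search for hyperparameter tuning (Bergstra & Bengio 2012). A hedged R sketch of random search over GBM hyperparameters with xgboost follows; it reuses the dtrain/dtest objects from the earlier sketch, and the parameter ranges and number of draws are assumptions for illustration only.

```r
# Hedged sketch: random search over GBM hyperparameters, in the spirit of
# Bergstra & Bengio 2012. Reuses dtrain/dtest from the earlier xgboost sketch;
# the ranges and the number of draws below are illustrative assumptions.
library(xgboost)

set.seed(123)
n_draws <- 20

runs <- lapply(seq_len(n_draws), function(i) {
  params <- list(
    objective        = "binary:logistic",
    eval_metric      = "auc",
    eta              = runif(1, 0.01, 0.3),   # learning rate
    max_depth        = sample(2:10, 1),
    min_child_weight = 2^runif(1, 0, 5),
    subsample        = runif(1, 0.5, 1),
    colsample_bytree = runif(1, 0.5, 1)
  )
  md <- xgb.train(params, dtrain, nrounds = 1000,
                  watchlist = list(valid = dtest),
                  early_stopping_rounds = 20, verbose = 0)
  # record the draw together with the best validation AUC it reached
  data.frame(params[c("eta", "max_depth", "min_child_weight",
                      "subsample", "colsample_bytree")],
             best_auc = max(md$evaluation_log$valid_auc))
})
runs <- do.call(rbind, runs)
head(runs[order(-runs$best_auc), ])   # best hyperparameter draws first
```
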

  50. None
  51. None
  52. None
  53. None
  54. None
  55. More:

  56. None
  57. None
  58. Backup Slides

  59. All benchmarks are wrong, but some are useful

  60. None
  61. None
  62. None
  63. None
  64. None
  65. None
  66. None