Better than Deep Learning: Gradient Boosting Machines (GBM) - Crunch Conference - Budapest, Oct 2018

szilard
October 16, 2018


Transcript

  1. Better than Deep Learning: Gradient Boosting Machines (GBM). Szilárd Pafka, PhD, Chief Scientist, Epoch USA. Crunch Conference, Budapest, Oct 2018
  2. None
  3. Disclaimer: I am not representing my employer (Epoch) in this talk. I can neither confirm nor deny whether Epoch is using any of the methods, tools, results, etc. mentioned in this talk.
  4. Source: Andrew Ng

  5. Source: Andrew Ng

  6. Source: Andrew Ng

  7. None
  8. None
  9. None
  10. None
  11. Source: https://twitter.com/iamdevloper/

  12. None
  13. ...

  14. None
  15. None
  16. http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf http://lowrank.net/nikos/pubs/empirical.pdf

  17. http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf http://lowrank.net/nikos/pubs/empirical.pdf

  18. None
  19. None
  20. None
  21. None
  22. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL
  23. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. it depends
  24. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. it depends / try them all
  25. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. it depends / try them all / hyperparam tuning
  26. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. it depends / try them all / hyperparam tuning / ensembles
  27. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. it depends / try them all / hyperparam tuning / ensembles / feature engineering
  28. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. it depends / try them all / hyperparam tuning / ensembles / feature engineering / other goals, e.g. interpretability
  29. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. it depends / try them all / hyperparam tuning / ensembles / feature engineering / other goals, e.g. interpretability. the title of this talk was misguided
  30. structured/tabular data: GBM (or RF); very small data: LR; very large sparse data: LR with SGD (+L1/L2); images/videos, speech: DL. it depends / try them all / hyperparam tuning / ensembles / feature engineering / other goals, e.g. interpretability. the title of this talk was misguided, but so is almost every recent use of the term AI
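The recurring recommendation in slides 22–30 (GBM, or RF, as the first choice for structured/tabular data) can be sketched with scikit-learn. This is a minimal illustrative example, not the speaker's setup: the synthetic dataset and hyperparameter values are assumptions chosen only to make the snippet self-contained.

```python
# Hedged sketch: a GBM baseline for tabular data, per the talk's recommendation.
# Dataset and hyperparameters are illustrative placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a structured/tabular dataset
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Typical starting hyperparameters (learning rate, depth, number of trees)
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=42)
gbm.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1])
```

In practice the talk's benchmarks use faster implementations (e.g. xgboost, LightGBM, h2o) with the same overall workflow: fit on the training split, evaluate AUC on a held-out split.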
  31. Source: Hastie et al., ESL, 2nd ed.

  32. Source: Hastie et al., ESL, 2nd ed.

  33. Source: Hastie et al., ESL, 2nd ed.

  34. Source: Hastie et al., ESL, 2nd ed.

  35. None
  36. "I usually use other people's code [...] I can find open source code for what I want to do, and my time is much better spent doing research and feature engineering." -- Owen Zhang http://blog.kaggle.com/2015/06/22/profiling-top-kagglers-owen-zhang-currently-1-in-the-world/
  37. None
  38. None
  39. None
  40. None
  41. 10x

  42. None
  43. None
  44. 10x

  45. None
  46. None
  47. None
  48. None
  49. None
  50. None
  51. None
  52. None
  53. None
  54. None
  55. None
  56. None
  57. None
  58. None
  59. None
  60. None
  61. http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf

  62. http://www.argmin.net/2016/06/20/hypertuning/
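Slides 61–62 cite Bergstra and Bengio's argument that random search beats grid search for hyperparameter tuning. A minimal sketch with scikit-learn's RandomizedSearchCV, assuming a synthetic dataset and illustrative search ranges (neither comes from the talk):

```python
# Hedged sketch: random search over GBM hyperparameters, in the spirit of
# Bergstra & Bengio (2012). Dataset and ranges are illustrative assumptions.
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Sample each hyperparameter independently from a distribution,
# rather than enumerating a fixed grid
param_dist = {
    "n_estimators": randint(50, 300),
    "max_depth": randint(2, 8),
    "learning_rate": uniform(0.01, 0.3),
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=10,          # number of random configurations to try
    cv=3,
    scoring="roc_auc",
    random_state=0,
)
search.fit(X, y)
```

The key point from the cited paper: with the same budget of trials, random search explores each individual hyperparameter's range more thoroughly than a grid does, which matters when only a few hyperparameters dominate performance.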

  63. None
  64. None
  65. None
  66. None
  67. None
  68. None
  69. no one is using this crap

  70. None
  71. None
  72. None
  73. More:

  74. None