Better than My Meetup/Conference Talks: Going Deeper in Various GBM Topics - GBM Advanced Workshop - Budapest, Nov 2019

szilard
November 09, 2019

Transcript

  1. Better than My Meetup/Conference Talks:
    Going Deeper in Various GBM Topics
    Szilard Pafka, PhD
    Chief Scientist, Epoch (USA)
    GBM Advanced Workshop Budapest
    Nov 2019

  2. Why GBMs

  4. meetup/conference talks
    going deeper
    section dividers

  6. Disclaimer:
    I am not representing my employer (Epoch) in this talk
    I can neither confirm nor deny whether Epoch is using any of the methods,
    tools, results, etc. mentioned in this talk

  7. Source: Andrew Ng

  8. Source: Andrew Ng

  9. Source: Andrew Ng

  16. ...

  25. http://lowrank.net/nikos/pubs/empirical.pdf
    http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf

  26. http://lowrank.net/nikos/pubs/empirical.pdf
    http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf

  31. top algos (RF, boosting), all features
    2007

  32. top algos (RF, boosting), all features
    most algos (lin, tree, nnet)
    worst algos (knn, NB)
    2007

  33. top algos (RF, boosting), all features
    most algos (lin, tree, nnet)
    worst algos (knn, NB)
    top algos, removed top feature(s)
    2007

  37. Source: Hastie et al., ESL, 2nd ed.

  38. Source: Hastie et al., ESL, 2nd ed.

  39. GBM libs

  44. 10x

  45. 10x

  67. Scoring

  73. * very first request (>500ms, JVM “warmup”) not shown

  84. GBM-perf github repo

  92. multi-core/socket

  99. CPU 1

  100. CPU 1 CPU 2

  101. CPU 1 CPU 2

  102. CPU 1 CPU 2

  103. CPU 1 CPU 2

  105. 5x
    3.5x

  120. zero

  123. Spark

  152. GPU

  157. catboost

  168. API / tuning

  174. http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf

  175. http://www.argmin.net/2016/06/20/hypertuning/

  176. http://www.argmin.net/2016/06/20/hypertuning/

  180. [Diagram: time ordered data, shown as two blocks]

  181. [Diagram: time ordered data, shown as two blocks; the first block is
    train, with a sample drawn from it]

  182. [Diagram: time ordered data split into train and test blocks, with a
    sample drawn from each (slightly different distribution)]

  183. [Diagram: time ordered data split into train and test blocks, with a
    sample drawn from each (slightly different distribution);
    resampled 80-10-10 (~CV): proper train, early stopping, model selection]

  184. [Diagram: time ordered data split into train and test blocks, with a
    sample drawn from each (slightly different distribution);
    resampled 80-10-10 (~CV): proper train, early stopping, model selection;
    random search over lightgbm]
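
The setup on this slide can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the speaker's code: the function names (`time_split`, `resample_80_10_10`, `sample_params`, `random_search`), the 50/50 time cut, the search ranges, and the caller-supplied `fit_and_score` callback are all hypothetical. Re-drawing the 80-10-10 split on every trial is one reading of "resampled 80-10-10 (~CV)". The keys `num_leaves`, `learning_rate`, `min_child_samples` and `feature_fraction` are existing lightgbm parameter names; the value ranges are arbitrary.

```python
import random

def time_split(rows, cut=0.5):
    """Split time-ordered rows into an earlier (train) and a later (test) block."""
    k = int(len(rows) * cut)
    return rows[:k], rows[k:]

def resample_80_10_10(rows, rng):
    """Randomly partition rows into proper train / early stopping / model selection."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    a, b = n * 8 // 10, n * 9 // 10  # integer arithmetic avoids float rounding
    return shuffled[:a], shuffled[a:b], shuffled[b:]

def sample_params(rng):
    """Draw one random hyperparameter setting (ranges are illustrative only)."""
    return {
        "num_leaves": rng.choice([15, 31, 63, 127, 255]),
        "learning_rate": 10 ** rng.uniform(-2, -0.5),  # log-uniform draw
        "min_child_samples": rng.choice([5, 20, 50, 100]),
        "feature_fraction": rng.uniform(0.5, 1.0),
    }

def random_search(train_rows, fit_and_score, n_trials=20, seed=0):
    """Random search; every trial re-draws both the split and the parameters.

    fit_and_score(params, proper_train, early_stop, model_sel) is supplied by
    the caller (hypothetical hook): it should train a model, for example a
    lightgbm model with early stopping on early_stop, and return a score on
    model_sel where higher is better.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = sample_params(rng)
        proper_train, early_stop, model_sel = resample_80_10_10(train_rows, rng)
        score = fit_and_score(params, proper_train, early_stop, model_sel)
        if best is None or score > best[0]:
            best = (score, params)
    return best  # (best score, best params)
```

The later time block (test) is never touched during the search; it would be used once at the end to check the selected model on data from a slightly different distribution.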

  193. Closing

  196. Source: https://www.linkedin.com/pulse/winning-solution-kaggledays-2019-competition-san-francisco-mark-peng/

  197. Source: https://www.linkedin.com/pulse/winning-solution-kaggledays-2019-competition-san-francisco-mark-peng/

  198. Source: https://www.linkedin.com/pulse/winning-solution-kaggledays-2019-competition-san-francisco-mark-peng/

  199. Source: https://www.linkedin.com/pulse/winning-solution-kaggledays-2019-competition-san-francisco-mark-peng/

  200. Source: https://www.linkedin.com/pulse/winning-solution-kaggledays-2019-competition-san-francisco-mark-peng/

  201. Source: https://www.linkedin.com/pulse/winning-solution-kaggledays-2019-competition-san-francisco-mark-peng/

  203. More:
