Practical Usage of Spark GBDT at ele.me

David Chen
August 06, 2018

Transcript

  1. Practical Usage of Spark GBDT at ele.me

    David Chen, senior data engineer, http://mvj3.com, Aug 2018
  2. Ensemble Learning

    - Bagging
      - Random Forest
    - Boosting
      - Adaptive Boosting
      - Gradient Boosting
        - GBDT

    (diagram copied from one of my colleague's slides) http://qr.ae/TUIJi8
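Slide 2 places GBDT on the boosting branch: trees are added one at a time, and each new tree is fitted to the negative gradient of the loss at the current ensemble prediction. As a rough illustration only (not Spark's implementation and not the code used at ele.me), here is a minimal pure-Python sketch that assumes squared loss, depth-1 trees (stumps) and a fixed learning rate. The names `fit_stump`, `fit_gbdt` and `GBDTModel` are hypothetical. Spark's `numIterations` corresponds roughly to `num_iterations` here, while `maxDepth` would control tree size, which this sketch fixes at 1.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stump representation:
# (feature index, threshold, value if x[j] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Fit a depth-1 regression tree to targets r by exhaustive threshold search."""
    mean = sum(r) / len(r)
    best: Stump = (0, float("inf"), mean, mean)  # constant fallback: no split
    best_err = sum((v - mean) ** 2 for v in r)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if err < best_err:  # keep the split with the lowest squared error
                best, best_err = (j, t, lv, rv), err
    return best


def predict_stump(s: Stump, x: List[float]) -> float:
    j, t, lv, rv = s
    return lv if x[j] <= t else rv


@dataclass
class GBDTModel:
    base_score: float     # constant initial prediction (mean of y)
    learning_rate: float  # shrinkage applied to every tree's output
    trees: List[Stump]

    def predict(self, x: List[float]) -> float:
        return self.base_score + self.learning_rate * sum(
            predict_stump(s, x) for s in self.trees
        )


def fit_gbdt(X, y, num_iterations=200, learning_rate=0.1) -> GBDTModel:
    """For the loss 1/2 * (y - F(x))**2 the negative gradient is the residual y - F(x)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees: List[Stump] = []
    for _ in range(num_iterations):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # pseudo-residuals
        stump = fit_stump(X, residuals)
        trees.append(stump)
        # move every prediction a small step in the direction the new tree suggests
        pred = [pi + learning_rate * predict_stump(stump, x) for pi, x in zip(pred, X)]
    return GBDTModel(base, learning_rate, trees)


# Usage on a toy step function: label 0 for x < 10, label 5 otherwise.
X = [[float(i)] for i in range(20)]
y = [0.0 if i < 10 else 5.0 for i in range(20)]
model = fit_gbdt(X, y, num_iterations=50)
print(model.predict([2.0]), model.predict([15.0]))
```

With squared loss the pseudo-residuals are simply y - F(x), so each round removes a fixed fraction of the remaining error and the two plateaus move towards 0 and 5. Other losses (for example log loss for classification) change how the pseudo-residuals are computed.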
  3. Some Data

    - Training time: 20 min to 2 hours
    - Training samples: 3 to 30 million
    - DAG tasks: 20-50+
    - Model configuration: {"numIterations": 200, "maxDepth": 5, "maxBins": 28}
    - Daily requests: 10 million to 30+ million
    - Single prediction time / response time: 2 ms / 5-12 ms
    - Serialised model size: 50 KB to 1.5 MB
    - Number of features: 40-100
    - Spark version: 2.1.0
    - Alternative frameworks: XGBoost, TensorFlow DNN, Facebook GBDT+LR
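The configuration on slide 3 puts upper bounds on how large the ensemble can get. The sketch below works through that arithmetic. It assumes Spark's documented depth convention (depth 0 is a single leaf, so a full tree of depth d has 2^d leaves and 2^d - 1 internal nodes). It also assumes that all features are continuous and that each one contributes at most maxBins - 1 candidate thresholds per node. The function name `gbdt_capacity` and the choice of 100 features (the upper end of the slide's 40-100 range) are illustrative. Real trees often stop before reaching full depth, so these numbers are ceilings, not measurements.

```python
def gbdt_capacity(num_iterations: int, max_depth: int, max_bins: int, num_features: int) -> dict:
    """Upper bounds implied by a boosted-tree config (hypothetical helper).

    Assumes depth 0 = a single leaf, so a full tree of depth d has 2**d leaves.
    """
    leaves_per_tree = 2 ** max_depth
    nodes_per_tree = 2 * leaves_per_tree - 1  # leaves + internal nodes
    return {
        "max_leaves": num_iterations * leaves_per_tree,
        "max_nodes": num_iterations * nodes_per_tree,
        # one threshold test per level when walking a tree from root to leaf
        "max_comparisons_per_prediction": num_iterations * max_depth,
        # assumes continuous features: at most max_bins - 1 thresholds each
        "max_candidate_splits_per_node": num_features * (max_bins - 1),
    }


config = {"numIterations": 200, "maxDepth": 5, "maxBins": 28}
cap = gbdt_capacity(config["numIterations"], config["maxDepth"], config["maxBins"], num_features=100)
print(cap)
# max_leaves=6400, max_nodes=12600,
# max_comparisons_per_prediction=1000, max_candidate_splits_per_node=2700
```

At most 1,000 threshold tests per prediction (200 trees of depth 5) is very little work, which fits the millisecond-level single-prediction time quoted on the slide. The node count is an upper bound on what a serialised model has to store. Its actual size on disk also depends on the serialisation format.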