Practical Usage of Spark GBDT at ele.me


David Chen

August 06, 2018


  1. Practical Usage of Spark GBDT at ele.me. David Chen, senior data engineer, http://mvj3.com, Aug 2018
  2. GBDT: Gradient Boosting Decision Tree. (1) Classification tree; (2) Regression

  3. Ensemble Learning
       Bagging: Random Forest
       Boosting: Adaptive Boosting, Gradient Boosting (GBDT)
       * Copied from a colleague's slides
       * http://qr.ae/TUIJi8
  4. Gradient Boosting
       * http://explained.ai/gradient-boosting/L2-loss.html#sec:2.3
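The slide only names gradient boosting and links to an explanation of the L2-loss case. As a rough illustration of that idea, here is a minimal pure-Python sketch, not the Spark implementation: each round fits a one-split tree (a stump) to the current residuals and adds a shrunken copy of it to the model. The function names, the single-feature restriction, and the toy data are all assumptions made for this example.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression tree on a single feature by minimising squared error."""
    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so that both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # Only one distinct feature value: fall back to a constant prediction.
        constant = mean(residuals)
        return lambda x: constant
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_gbdt(xs, ys, num_iterations=50, learning_rate=0.1):
    """Gradient boosting with L2 loss: each stump is fitted to the residuals y - F(x)."""
    f0 = mean(ys)                      # initial model: the mean of the targets
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(num_iterations):
        # For L2 loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(s(x) for s in stumps)


# Toy data (made up for illustration): a step function in one feature.
toy_x = [1, 2, 3, 4, 5, 6]
toy_y = [10, 10, 10, 20, 20, 20]
toy_model = fit_gbdt(toy_x, toy_y, num_iterations=100, learning_rate=0.1)
```

On the toy data, every round shrinks the remaining residual by the learning rate, which is why many small steps are needed before the predictions settle near the targets.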

  5. Feature Importance

  6. Predicting Models: Delivery Time, Route Plan
       MAE <= 3 min
       MAE <= 10 min
       Accuracy 83.5%
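The targets on this slide are stated as MAE (mean absolute error) in minutes. For readers unfamiliar with the metric, here is a small sketch of how it is computed. The sample values are invented for illustration and are not ele.me data.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical delivery times in minutes (made-up numbers).
actual_minutes = [30.0, 42.0, 25.0, 38.0]
predicted_minutes = [28.0, 45.0, 26.0, 36.0]

mae = mean_absolute_error(actual_minutes, predicted_minutes)
meets_target = mae <= 3.0  # compare against a threshold such as "MAE <= 3 min"
```

Because MAE is expressed in the same unit as the target, a threshold such as "3 min" can be read directly as the typical size of a prediction error.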
  7. Some Data
       training time: 20 min to 2 hours
       training samples: 3 to 30 million
       DAG tasks: 20 to 50+
       model configuration: {"numIterations": 200, "maxDepth": 5, "maxBins": 28}
       daily requests: 10 million to 30+ million
       single prediction time / response time: 2 ms / 5-12 ms
       serialised model size: 50 KB to 1.5 MB
       feature count: 40 to 100
       Spark version: 2.1.0
       alternative frameworks: XGBoost, TensorFlow DNN, Facebook GBDT+LR
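The configuration keys on this slide match parameter names of Spark's RDD-based `GradientBoostedTrees` API, which the linked MLlib guides document. The sketch below shows one way such a JSON configuration could be loaded and passed to PySpark. The deck does not say whether the team used Scala or Python, or a regressor or a classifier, so the choice of `trainRegressor`, the helper names, and the assumption that all features are continuous are illustrative guesses only.

```python
import json

# Configuration string as shown on the slide.
CONFIG_JSON = '{"numIterations": 200, "maxDepth": 5, "maxBins": 28}'

# Keyword arguments accepted by pyspark.mllib.tree.GradientBoostedTrees.trainRegressor
# (besides the data and categoricalFeaturesInfo).
ALLOWED_KEYS = {"loss", "numIterations", "learningRate", "maxDepth", "maxBins"}


def load_gbdt_params(config_json):
    """Parse the JSON configuration and reject keys the trainer does not accept."""
    params = json.loads(config_json)
    unknown = set(params) - ALLOWED_KEYS
    if unknown:
        raise ValueError("unsupported configuration keys: %s" % sorted(unknown))
    return params


def train_regressor(training_rdd, params):
    """Train a GBDT regressor; training_rdd is expected to be an RDD of LabeledPoint."""
    # Imported lazily so the configuration helpers work without Spark installed.
    from pyspark.mllib.tree import GradientBoostedTrees

    return GradientBoostedTrees.trainRegressor(
        training_rdd,
        categoricalFeaturesInfo={},  # assumption: every feature is continuous
        **params
    )


params = load_gbdt_params(CONFIG_JSON)
```

Validating the keys before training means a typo in the configuration fails immediately rather than after a long-running Spark job has started.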
  8. Useful Links
       1. https://spark.apache.org/docs/latest/mllib-decision-tree.html
       2. https://spark.apache.org/docs/latest/mllib-ensembles.html