Practical Usage of Spark GBDT at ele.me

David Chen

August 06, 2018

Transcript

  1. Practical Usage of Spark GBDT at ele.me
    David Chen
    senior data engineer
    http://mvj3.com
    Aug 2018

  2. GBDT: Gradient Boosting Decision Tree
    1. Classification tree
    2. Regression tree

  3. Ensemble Learning
    Bagging
      Random Forest
    Boosting
      Adaptive Boosting
      Gradient Boosting
        GBDT
    * copied from a colleague's slides
    * http://qr.ae/TUIJi8

  4. Gradient Boosting
    * http://explained.ai/gradient-boosting/L2-loss.html#sec:2.3
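
    As a rough illustration of the idea on this slide, the sketch below boosts one-split regression stumps under squared-error (L2) loss: each round fits a stump to the current residuals and adds a shrunken copy of it to the running prediction. It is a toy in plain Python on a single feature, not Spark's implementation; the names `fit_stump`, `fit_gbdt`, `predict` and the default learning rate are assumptions made for the example.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Pick the threshold on a 1-D feature that minimises squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def fit_gbdt(xs, ys, num_iterations=50, learning_rate=0.5):
    """Boost stumps on residuals, which are the negative gradient of L2 loss."""
    base = mean(ys)  # initial model: predict the mean target everywhere
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(num_iterations):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Move each prediction a fraction of the way toward its residual mean.
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and every stump's shrunken contribution."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )
```

    With a learning rate in (0, 1], each round can only lower the training squared error, which is why adding more iterations keeps improving the fit on the training set.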

  5. Feature Importance

  6. Prediction Models
    Delivery Time
    Route Plan
    MAE <= 3 min
    MAE <= 10 min
    Accuracy 83.5%
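
    For readers unfamiliar with the metrics on this slide, here is a minimal sketch of how MAE and accuracy are commonly computed. The slide does not say how accuracy was defined, so the fraction-of-exact-matches definition is an assumption, and the sample numbers are made up for illustration rather than taken from ele.me data.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def accuracy(actual, predicted):
    """Fraction of predictions that match exactly (assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical delivery times in minutes, invented for this example.
actual_minutes = [30, 42, 25, 38]
predicted_minutes = [28, 45, 27, 37]
mae = mean_absolute_error(actual_minutes, predicted_minutes)

# Hypothetical categorical outcomes, invented for this example.
acc = accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"])
```

    A target such as "MAE <= 3 min" would then be a check of the form `mae <= 3` on a held-out evaluation set.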

  7. Some Data
    training time: 20 min to 2 hours
    training samples: 3-30 million
    DAG tasks: 20-50+
    model configuration: {"numIterations": 200, "maxDepth": 5, "maxBins": 28}
    daily requests: 10 million to 30+ million
    single prediction / response time: 2 ms / 5-12 ms
    serialised model size: 50 KB to 1.5 MB
    feature count: 40-100
    Spark version: 2.1.0
    alternative frameworks: XGBoost, TensorFlow DNN, Facebook GBDT+LR
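
    The model configuration above uses the parameter names of Spark's RDD-based MLlib gradient-boosted trees API. The sketch below shows one way such a JSON config could be passed to `GradientBoostedTrees.trainRegressor` in PySpark. The talk does not say which language binding or which wrapper code was used, so the helper names `load_gbdt_params` and `train_regressor`, the choice of Python, and the empty `categoricalFeaturesInfo` are all assumptions.

```python
import json

# Configuration as shown on the slide, written with straight quotes.
MODEL_CONFIG_JSON = '{"numIterations": 200, "maxDepth": 5, "maxBins": 28}'

# Keys this sketch accepts; each is a keyword argument of trainRegressor.
ALLOWED_KEYS = ("numIterations", "maxDepth", "maxBins")


def load_gbdt_params(config_json):
    """Parse the JSON config and reject keys this sketch does not know."""
    config = json.loads(config_json)
    unknown = set(config) - set(ALLOWED_KEYS)
    if unknown:
        raise ValueError("unknown config keys: %s" % sorted(unknown))
    return {key: int(config[key]) for key in ALLOWED_KEYS if key in config}


def train_regressor(training_rdd, config_json=MODEL_CONFIG_JSON):
    """Train a GBDT regressor on an RDD of LabeledPoint (hypothetical helper)."""
    # Imported lazily so load_gbdt_params works without a Spark installation.
    from pyspark.mllib.tree import GradientBoostedTrees

    params = load_gbdt_params(config_json)
    # Assumption: all features are treated as continuous.
    return GradientBoostedTrees.trainRegressor(
        training_rdd, categoricalFeaturesInfo={}, **params
    )
```

    Keeping the hyperparameters in a small JSON document like this makes it easy to version them alongside the serialised model.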

  8. Useful Links
    1. https://spark.apache.org/docs/latest/mllib-decision-tree.html
    2. https://spark.apache.org/docs/latest/mllib-ensembles.html
