Modern gold mining: Leveraging Deep Learning to predict GMV for 100k+ shops

Shopify is home to more than 600,000 merchants around the world. Shopify Capital aims to provide merchants with accessible cash to help them grow their businesses in a timely manner. With our high volume of data, we can provide better funding options for our merchants by examining many features to predict how well they will perform, and thus offer them affordable, consistent cash advances.

In this talk we will present the challenges faced along the way, with technical detail on how we explore and fine-tune our models: from a fully isolated gradient boosting tree model to an attentional, decoupled deep learning model sending predictions over Kafka. We also explore how we model and organize our scalable ML pipelines at Shopify.

Breno Freitas

May 06, 2019

Transcript

  1. Hi, I’m Breno!
     • Data Scientist @Shopify
     • M.Sc. in ML
     • Web Dev for many years
     • Beanie collector (15 so far)
     • @brenolf_ • @brenolf • //breno.io
  2. [Chart: monthly sales and cumulative sales, Apr–Jan, US$0–US$9,000, with a Capital offer marked.]
  3. “Rule #1: Don’t be afraid to launch a product without machine learning.” — Google’s Rules of Machine Learning
  4. [Chart: cumulative sales, Apr–Jan, US$0–US$9,000.]
  5. How do we solve it linearly, again?
     $y \sim mx + b$
     $E(y \mid X) = mx + b$
     $\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - (mx_i + b) \right)^2$
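     To ground the refresher, a minimal sketch of fitting the line by least squares and computing the RSS defined above (data and values are illustrative, not from the deck):

```python
import numpy as np

# Toy cumulative sales: x = month index, y = sales (illustrative values only).
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 900.0 * x + 200.0 + rng.normal(0.0, 300.0, size=10)

# Least-squares fit of y = m*x + b, i.e. the line minimizing the RSS.
m, b = np.polyfit(x, y, deg=1)
rss = np.sum((y - (m * x + b)) ** 2)
print(f"m={m:.1f}, b={b:.1f}, RSS={rss:.1f}")
```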
  6. [Chart: cumulative sales, Apr–Jan, US$0–US$16,000.]
  7. [Chart: cumulative sales, Apr–Jan, US$0–US$16,000, with a brace highlighting part of the curve.]
  8. [Chart: cumulative sales, Apr–Jan, US$0–US$300,000.]
  9. [Two charts side by side: cumulative sales, Apr–Jan, US$0–US$300,000.]
  10. We need confidence levels: $y \sim mx + b$
      $Q_{y \mid X}(\tau) = \inf \{\, y : F_{y \mid X}(y) \ge \tau \,\}$
  11.–12. We need confidence levels: the quantile (pinball) loss
      $QL = \sum_{i=1}^{n} \left( y_i - (mx_i + b) \right) \cdot \left( \tau - \mathbb{1}_{\{ y_i - (mx_i + b) < 0 \}} \right)$
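      The quantile loss is short enough to implement directly; a minimal sketch (not the deck's code):

```python
import numpy as np

def quantile_loss(y_true, y_pred, tau):
    """Pinball loss: residuals below zero are weighted by (tau - 1),
    residuals above zero by tau, matching the QL formula on the slide."""
    residual = y_true - y_pred
    return np.sum(residual * (tau - (residual < 0).astype(float)))
```

      With τ = 0.9, under-prediction costs 0.9 per unit while over-prediction costs only 0.1, so the minimizer is pushed up toward the 90th percentile; τ = 0.1 gives the lower band.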
  13. [Chart: cumulative sales, Apr–Jan, US$0–US$310,000, with 10% and 90% quantile bands.]
  14. [Pipeline diagram: Feature Base Slice → Feature Slice #1 / Feature Slice #2 → Feature Set → Targets → Test/Train Split, isolating shops’ histories.]
  15. [Pipeline diagram, extended: the feature slices, feature set, targets, and test/train split now feed Model Definition → Training → Backtesting → Inference → Metrics.]
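      A hedged sketch of what one feature slice from these diagrams might look like; the column names, windows, and helper below are hypothetical, but the key property from the slide holds: each slice only sees history before its cutoff, so a shop's future never leaks into its own training features:

```python
import pandas as pd

def build_slice(df: pd.DataFrame, cutoff: int) -> pd.DataFrame:
    """One feature slice: history strictly before `cutoff` becomes features,
    the following 12 weeks become the target (all names are illustrative)."""
    history = df[df["week"] < cutoff]
    features = history.groupby("shop_id").agg(
        total_gmv=("gmv", "sum"),
        weeks_active=("week", "nunique"),
    )
    future = df[(df["week"] >= cutoff) & (df["week"] < cutoff + 12)]
    target = future.groupby("shop_id")["gmv"].sum().rename("target_gmv")
    return features.join(target, how="inner")

# Stack slices at several cutoffs to grow the training set, then split
# train/test by shop_id so each shop's history stays isolated.
```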
  16.–17. First Iteration
      • Implemented in R
      • Quantile Random Forest
      • List of offers manually generated and sent to Bourgeois
  18. How intricate was it?
      • 21 models per confidence level, pickled individually
      • 2,000 trees, 8 levels deep
      • 42 containers with 2 cores each
      • 19 GB of Python RAM + 8 GB of Java
      • 50+ GB driver heap
      • Feature set of ~440M rows, using ~35 features
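      The first iteration's quantile random forest lived in R; as a rough Python analogue (an approximation of the idea, not their code), one can read empirical quantiles off the per-tree predictions of an ordinary random forest:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 35))                 # ~35 features, as on the slide
y = 3.0 * X[:, 0] + rng.normal(size=500)

# Same shape as the slide's forests: 2,000 trees, 8 levels deep.
forest = RandomForestRegressor(n_estimators=2000, max_depth=8, n_jobs=-1, random_state=0)
forest.fit(X, y)

# Each tree votes; empirical quantiles across trees give a confidence band.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
lo, hi = np.quantile(per_tree, [0.10, 0.90], axis=0)
```

      A proper quantile regression forest keeps the raw training samples in each leaf rather than per-tree means, so treat this as a simplification.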
  19.–20. [Chart: monthly sales and cumulative sales, Apr–Jan, US$0–US$9,000, with a Capital offer marked (repeated across two build slides).]
  21. How did we move on from the Quantile Forest?
      • Set product goals to achieve
      • Can we reduce the fast payers?
      • Can we make better offers?
      • XGBoost analysis
      • Started playing with monthly prediction using a simpler neural net
      • All in Jupyter Notebooks
      • Moved into a different structure for weekly prediction
  22. Setup for the problem
      • We want to capture the fluctuations
      • We have a series of sequential features: GMV, orders, support data, etc.
      • We have static features: cumulatives, admin information, etc.
      • We want to predict their cumulative GMV in the future
      Sounds a lot like sequence-to-sequence.
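      Concretely, the inputs line up roughly like this (all shapes are illustrative assumptions, not production values):

```python
import numpy as np

batch, T = 32, 52                                   # e.g. a year of weekly observations
seq_numeric = np.zeros((batch, T, 8))               # GMV, orders, support data, ...
seq_tokens = np.zeros((batch, T), dtype=np.int64)   # categorical codes, to be embedded
static_feats = np.zeros((batch, 12))                # cumulatives, admin information, ...
target = np.zeros((batch, 12))                      # future cumulative GMV per output step
```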
  23. “Much as a convolutional network […] is specialized for processing […] an image, a recurrent network […] is specialized for processing a sequence of values […].” — Deep Learning, Goodfellow et al., 2016
  24. [Pipeline diagram: Feature Base Slice → Feature Slice #1 / Feature Slice #2 → Feature Set → Targets → Test/Train Split → Train tfrecords / Test tfrecords.]
  25. [Pipeline diagram, extended: the train/test tfrecords now feed Model Definition → Training → Backtesting → Inference → Metrics.]
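      The slices are materialized as train/test tfrecords; a minimal sketch of serializing one shop's row through TensorFlow's standard tf.train.Example API (field names are hypothetical):

```python
import tensorflow as tf

def shop_example(seq_gmv, static_feats, target):
    """Pack one shop's row into a tf.train.Example (field names are illustrative)."""
    feature = {
        "seq_gmv": tf.train.Feature(float_list=tf.train.FloatList(value=seq_gmv)),
        "static": tf.train.Feature(float_list=tf.train.FloatList(value=static_feats)),
        "target": tf.train.Feature(float_list=tf.train.FloatList(value=target)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

with tf.io.TFRecordWriter("train.tfrecord") as writer:
    writer.write(shop_example([1.0, 2.0], [0.5], [3.0]).SerializeToString())
```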
  26. [Diagram: sequential features. Numeric values N0…NT pass through as-is; non-numeric tokens are looked up as embedding vectors E0…ET and concatenated (⊕) with the numerics at each timestep.]
  27. [Diagram: static features, treated the same way — numeric values pass through, non-numeric tokens are embedded and concatenated (⊕).]
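      In Keras terms, the numeric/non-numeric split in these diagrams might look like the sketch below (sizes and names are assumptions; the following sketches build on this one):

```python
import tensorflow as tf
from tensorflow.keras import layers

T, n_numeric, vocab, emb_dim = 52, 8, 1000, 16      # illustrative sizes

numeric = layers.Input(shape=(T, n_numeric), name="numeric")
tokens = layers.Input(shape=(T,), dtype="int64", name="tokens")

# Non-numeric tokens become dense embedding vectors (E0..ET), then are
# concatenated (the diagram's ⊕) with the numeric features at each timestep.
embedded = layers.Embedding(input_dim=vocab, output_dim=emb_dim)(tokens)
merged = layers.Concatenate(axis=-1)([numeric, embedded])   # (batch, T, n_numeric + emb_dim)
```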
  28. [Diagram: dimensionality reduction. A convolution compresses the numeric sequence from T to T/q steps; the embeddings are reshaped into groups of q steps (E0…E(T−q), …, Eq…ET) and concatenated (⊕).]
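      Continuing that sketch, one reading of the convolution-plus-reshape step (an assumption about the diagram, not confirmed code): a strided convolution compresses the numeric sequence by a factor q, and the embeddings are regrouped to match:

```python
q = 4  # compression factor (illustrative)

# Strided convolution: T numeric steps in, T/q steps out.
conv = layers.Conv1D(filters=32, kernel_size=q, strides=q, padding="valid")(numeric)

# Regroup the embedded sequence so q consecutive embeddings share one step,
# then concatenate (⊕) with the convolved numerics.
grouped = layers.Reshape((T // q, q * emb_dim))(embedded)
reduced = layers.Concatenate(axis=-1)([conv, grouped])      # (batch, T/q, 32 + q*emb_dim)
```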
  29. [Diagram: encoder. The concatenated inputs X0…XT feed two stacked GRU layers.]
  30. [Diagram: encoder, extended. The GRU stack emits hidden states H0…HT.]
  31. [Diagram: encoder, extended. The static features S go through a Dense layer and join the hidden states to form the context C.]
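      Continuing the sketch, a two-layer GRU encoder with the static features folded in through a Dense layer, following the diagram's S → Dense → C path (layer sizes are taken from slide 33; the wiring is an assumption):

```python
# Two stacked GRUs emit per-step hidden states H0..HT and a final state.
x = layers.GRU(256, return_sequences=True, dropout=0.4)(reduced)
H, last_state = layers.GRU(256, return_sequences=True, return_state=True, dropout=0.4)(x)

# Static features S pass through a Dense layer and join the final state
# to form the context C handed to the decoder.
static_in = layers.Input(shape=(12,), name="static")
S = layers.Dense(256, activation="relu")(static_in)
C = layers.Concatenate()([last_state, S])
```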
  32. [Diagram: decoder. Two GRU layers (GRU1, GRU2) attend over the encoder states H0…HT as a memory.]
      * The same effect could also be achieved by using a sufficiently large RNN trained for long enough — Cho et al. (2014), Sutskever et al. (2014)
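      Continuing the sketch, a decoder in the same spirit: two GRU layers with dot-product attention over the encoder states standing in for the diagram's “Memory” (the exact wiring is an assumption):

```python
steps_out = 12                                      # output steps (illustrative)

# Seed every output step with the context C, then run the two decoder GRUs.
dec = layers.RepeatVector(steps_out)(C)
dec = layers.GRU(256, return_sequences=True, dropout=0.4)(dec)   # GRU1
dec = layers.GRU(256, return_sequences=True, dropout=0.4)(dec)   # GRU2

# Dot-product attention: decoder steps query the encoder states H ("Memory").
attended = layers.Attention()([dec, H])
out = layers.Dense(4)(layers.Concatenate()([dec, attended]))     # 4 quantile outputs per step
```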
  33. How intricate is it now?
      • 1 model trained × 4 confidence levels
      • 2 layers for encoder/decoder, 256 units, 40% dropout
      • Trained on 8 NVIDIA Tesla K80 GPUs
      • Feature set of ~440M rows, using ~35 features
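      Finally, a hedged sketch of how a single network can cover all four confidence levels at once: the four-unit head above predicts one value per quantile, trained with a combined pinball loss (this is one reading of “1 model × 4 confidence levels”; the τ values are illustrative):

```python
TAUS = tf.constant([0.1, 0.25, 0.75, 0.9])          # illustrative quantiles

def multi_quantile_loss(y_true, y_pred):
    # y_true broadcasts across the 4 quantile outputs, e.g. shape (batch, steps, 1).
    residual = y_true - y_pred
    return tf.reduce_mean(tf.maximum(TAUS * residual, (TAUS - 1.0) * residual))

model = tf.keras.Model([numeric, tokens, static_in], out)
model.compile(optimizer="adam", loss=multi_quantile_loss)
```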