Modern gold mining: Leveraging Deep Learning to predict GMV for 100k+ shops

Modern Gold Mining: How we use Deep Learning to predict sales of 100k+ shops

Hi, I’m Breno!
• Data Scientist @Shopify
• M.Sc. in ML
• Web Dev for many years
• Beanie collector (15 so far)
• @brenolf_ • @brenolf • //breno.io

Currency helped scale human production.

Tech shaped the way we interact with products.

Shopify in numbers
• 175 countries
• 800k+ merchants
• $1.5B over BFCM (Black Friday/Cyber Monday) 2018
• $100B total sales

Starting a dream is not always easy.

Shopify Capital provides merchants timely access to affordable funding to
help them grow.

What’s the problem we’re solving?

[Figure: one shop's monthly sales, cumulative sales and offer, April to January (US$0 to US$9,000)]

[Figure, October to January: "This is an offer"]

“Rule #1: Don’t be afraid to launch a product without machine learning.”
- Google’s Rules of Machine Learning

[Figure: cumulative sales, August to January]

How do we solve it linearly, again?

$$y \sim mx + b$$

$$\mathbb{E}(y \mid X) = mx + b$$

$$\mathrm{RSS} = \sum_{i=1}^{n} \big(y_i - (m x_i + b)\big)^2$$
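As a minimal sketch of the linear baseline, assuming one shop's cumulative sales are indexed by month (the numbers below are made up for illustration), the RSS above is minimized in closed form: the slope is the covariance of x and y over the variance of x, and the fitted line passes through the point of means.

```python
import numpy as np


def fit_line(x, y):
    """Return (m, b) minimizing RSS = sum_i (y_i - (m * x_i + b)) ** 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x.
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the least-squares line passes through (x_mean, y_mean).
    b = y_mean - m * x_mean
    return float(m), float(b)


def rss(x, y, m, b):
    """Residual sum of squares of the line y = m * x + b."""
    return float(np.sum((np.asarray(y) - (m * np.asarray(x) + b)) ** 2))


# Made-up cumulative sales for one shop over ten months.
months = np.arange(10, dtype=float)
cumulative_sales = np.array([0, 850, 1700, 2600, 3400, 4300, 5100, 5900, 6800, 7700], dtype=float)

m, b = fit_line(months, cumulative_sales)
forecast = m * np.arange(10, 13) + b  # point forecast for the next three months
```

The line gives a single expected value per month, with no notion of how confident that value is.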

[Figure: cumulative sales, August to January]

[Figure: cumulative sales, August to January and April to January]

We need confidence levels

$$y \sim mx + b$$

$$Q_{y \mid X}(\tau) = \inf\{\, y : F_{y \mid X}(y) \ge \tau \,\}$$

$$\mathrm{QL} = \sum_{i=1}^{n} \big(y_i - (m x_i + b)\big) \cdot \big(\tau - \mathbb{I}(y_i - (m x_i + b) < 0)\big)$$
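The quantile loss above is the pinball loss applied to the residuals. A small check of why it matches the definition of Q: for a constant prediction, the minimizer of the pinball loss is the smallest sample value whose empirical CDF reaches τ. The sample is synthetic and the helper names are hypothetical.

```python
import math

import numpy as np


def pinball_loss(residuals, tau):
    """Sum of r_i * (tau - 1[r_i < 0]), where r_i = y_i - prediction_i."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(r * (tau - (r < 0))))


def empirical_quantile(y, tau):
    """Q(tau) = inf{y : F(y) >= tau} for the empirical CDF F of the sample y."""
    ys = np.sort(np.asarray(y, dtype=float))
    k = math.ceil(tau * len(ys))  # smallest k with k / n >= tau
    return float(ys[k - 1])


rng = np.random.default_rng(0)
sales = rng.lognormal(mean=8.0, sigma=0.5, size=999)  # synthetic shop sales
tau = 0.9

# Scan every sample value as a constant prediction b and keep the best one.
candidates = np.sort(sales)
losses = [pinball_loss(sales - b, tau) for b in candidates]
best_b = float(candidates[int(np.argmin(losses))])
```

Positive residuals (under-prediction) cost τ per unit and negative ones cost 1 - τ, so with τ = 0.9 the optimum sits high enough that about 90% of the sample falls at or below it.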

[Figure: cumulative sales, August to January, with 10% and 90% quantile predictions]
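The 10% and 90% lines can be sketched as two linear quantile regressions. This is one way to fit them, assuming plain subgradient descent on the pinball loss with a decaying step size and a least-squares warm start; it is not necessarily how the original model was fit, and the data is synthetic.

```python
import numpy as np


def fit_quantile_line(x, y, tau, steps=5000, lr0=0.5):
    """Fit the tau-quantile line y ~ m * x + b by subgradient descent on the pinball loss."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mu, x_sd = x.mean(), x.std()
    z = (x - x_mu) / x_sd                      # standardized x keeps m and b on similar scales
    m, b = np.polyfit(z, y, 1)                 # warm start from the least-squares line
    scale = np.std(y - (m * z + b)) + 1e-12    # residual spread sets the step size
    for t in range(steps):
        r = y - (m * z + b)
        g = -(tau - (r < 0))                   # subgradient of the pinball loss w.r.t. the prediction
        lr = lr0 * scale / np.sqrt(t + 1.0)    # decaying step size
        m -= lr * np.mean(g * z)
        b -= lr * np.mean(g)
    # Undo the standardization: m * (x - x_mu) / x_sd + b == (m / x_sd) * x + (b - m * x_mu / x_sd)
    return float(m / x_sd), float(b - m * x_mu / x_sd)


rng = np.random.default_rng(0)
months = np.repeat(np.arange(12, dtype=float), 100)  # 100 synthetic shops x 12 months
sales = 400.0 * months + rng.normal(0.0, 300.0, size=months.size)

bands = {tau: fit_quantile_line(months, sales, tau) for tau in (0.1, 0.9)}
```

Quantile regression is a linear program, so an LP solver would return the exact optimum; subgradient descent is used here only because it fits in a few lines of NumPy.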

Inference + Business Logic = Capital

How did we implement it?

[Diagram: data platform. Record-Based Data (Databases) → Data Acquisition → Data Integration (Starscream) → Delivery → Analytics]

[Diagram: modelling pipeline. Feature Base Slice, Feature Slice #1 and Feature Slice #2 → Features → Set Targets → Test/Train Split (isolating shops’ histories) → Model Definition → Training → Backtesting → Inference → Metrics]
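“Isolating shops’ histories” can be read as splitting by shop rather than by row, so that no shop contributes months to both sets and the test score is not inflated by leakage. A minimal sketch, with hypothetical shop ids and a random 20% of shops held out:

```python
import numpy as np


def split_by_shop(shop_ids, test_fraction=0.2, seed=0):
    """Return boolean masks (train, test); each shop's full history lands on one side only."""
    shop_ids = np.asarray(shop_ids)
    unique_shops = np.unique(shop_ids)
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * len(unique_shops)))
    # Sample whole shops, not rows, for the test set.
    test_shops = rng.choice(unique_shops, size=n_test, replace=False)
    test_mask = np.isin(shop_ids, test_shops)
    return ~test_mask, test_mask


# Hypothetical feature rows: 50 shops x 12 months, one row per (shop, month).
shop_ids = np.repeat(np.arange(50), 12)
train_mask, test_mask = split_by_shop(shop_ids, test_fraction=0.2)
```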

First Iteration
• Implemented in R
• Quantile Random Forest
• List of offers manually generated and sent to Bourgeois

Moving into Starscream
[Diagram: Starscream pipeline with an HTTP component]

How intricate was it?
• 21 models per confidence level
• Pickled individually
• 2000 trees, 8 levels deep
• 42 containers with 2 cores each
• 19 GB Python RAM + 8 GB Java
• Plus 50 GB of driver heap
• Feature set has ~440M rows
• Using ~35 features

Into the Neural Nets

[Figure: monthly sales, cumulative sales and offer, August to January]

How did we move on from the Quantile Forest?
• Set product goals to achieve
   • Can we reduce the fast payers?
   • Can we make better offers?
• XGBoost analysis
• Started to play with monthly prediction using a simpler neural net
• All in Jupyter Notebooks
• Moved into a different structure for weekly prediction

Setup for the problem
• We want to capture the fluctuations
• We have a series of sequential features
   • GMV, orders, support data, etc.
• We have static features
   • Cumulatives, admin information, etc.
• We want to predict their cumulative GMV in the future
Sounds a lot like sequence to sequence!
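Framed as sequence to sequence, each training example pairs a window of past weekly values with the cumulative GMV over the following weeks. A minimal sketch for a single numeric series; the window lengths are hypothetical, and a real setup would stack GMV, orders, support data and so on as extra channels.

```python
import numpy as np


def make_windows(weekly_gmv, input_len, horizon):
    """Slice one shop's weekly GMV into (encoder input, decoder target) pairs.

    The target is the cumulative GMV over the next `horizon` weeks,
    counted from zero at the end of the input window.
    """
    gmv = np.asarray(weekly_gmv, dtype=float)
    inputs, targets = [], []
    for start in range(len(gmv) - input_len - horizon + 1):
        past = gmv[start:start + input_len]
        future = gmv[start + input_len:start + input_len + horizon]
        inputs.append(past)
        targets.append(np.cumsum(future))  # cumulative, not per-week, targets
    return np.array(inputs), np.array(targets)


X, Y = make_windows(np.arange(1, 11), input_len=4, horizon=3)
```

Here `X[0]` is `[1, 2, 3, 4]` and `Y[0]` is `[5, 11, 18]`: weeks 5, 6 and 7 accumulated.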

“Much as a convolutional network […] is specialized for processing […] an image, a recurrent network […] is specialized for processing a sequence of values […].”
- Deep Learning, Goodfellow et al., 2016

[Diagram: modelling pipeline with TFRecords. Feature Base Slice, Feature Slice #1 and Feature Slice #2 → Features → Set Targets → Test/Train Split → Train tfrecords / Test tfrecords → Model Definition → Training → Backtesting → Inference → Metrics]

Features
[Diagram: sequential features per time step: numeric values N0…NT (e.g. 992.1, 46.2, …) and non-numeric values (e.g. “abc”, “edf”, …) mapped to embedding vectors E0…ET (e.g. 0.5, -0.5, …), joined by concatenation (⊕). Static features pass through a Dense layer to produce S.]
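A minimal NumPy sketch of the static branch, assuming a single categorical feature with a hypothetical vocabulary, a ReLU on the Dense layer and random weights standing in for trained ones: the category is looked up in an embedding table, concatenated (⊕) with the numeric values, and projected to S.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary for one non-numeric static feature.
vocab = {"abc": 0, "edf": 1, "lak": 2}
embed_dim, static_dim = 4, 8
embedding_table = rng.normal(size=(len(vocab), embed_dim))  # one trainable row per category


def dense_relu(x, w, b):
    """Fully connected layer followed by a ReLU."""
    return np.maximum(0.0, x @ w + b)


def static_representation(numeric, category, w, b):
    """S = Dense(numeric ⊕ embedding(category))."""
    e = embedding_table[vocab[category]]   # embedding lookup
    x = np.concatenate([numeric, e])       # ⊕ is concatenation
    return dense_relu(x, w, b)


numeric = np.array([992.1, 46.2, 45.0])   # e.g. cumulatives and admin counts (unscaled here)
w = rng.normal(scale=0.01, size=(numeric.size + embed_dim, static_dim))
b = np.zeros(static_dim)
S = static_representation(numeric, "edf", w, b)
```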

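Dimensionality Reduction
[Diagram: the numeric sequence N0…NT goes through a convolution that shortens it to N0…N(T/q); the embeddings E0…ET are reshaped into groups of q steps (E0…Eq, …, E(T-q)…ET); the two results are concatenated (⊕).]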
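A sketch of the reduction step, assuming the convolution uses kernel size and stride both equal to q (so T steps become T/q) and that “reshape” means stacking q consecutive embeddings into one wider vector. As in deep-learning frameworks, the convolution here is computed as a cross-correlation; shapes and channel counts are hypothetical.

```python
import numpy as np


def strided_conv1d(x, kernel, q):
    """Valid 1-D convolution with kernel size q and stride q.

    x: (T, c_in), kernel: (q, c_in, c_out) -> output: (T // q, c_out).
    """
    T = x.shape[0] - x.shape[0] % q                 # drop any incomplete tail block
    blocks = x[:T].reshape(T // q, q, x.shape[1])   # non-overlapping windows of q steps
    return np.einsum("nqc,qcd->nd", blocks, kernel)


def group_embeddings(e, q):
    """Reshape (T, d) embeddings into (T // q, q * d); each reduced step keeps all q vectors."""
    T = e.shape[0] - e.shape[0] % q
    return e[:T].reshape(T // q, q * e.shape[1])


rng = np.random.default_rng(0)
T, q = 52, 4                              # e.g. 52 weekly steps reduced by q = 4
numeric = rng.normal(size=(T, 3))         # N0..NT: 3 numeric features per step
embeddings = rng.normal(size=(T, 5))      # E0..ET: 5-dim embedding per step
kernel = rng.normal(size=(q, 3, 6))       # hypothetical: 6 output channels

# ⊕ per reduced step -> shape (T // q, 6 + q * 5)
reduced = np.concatenate(
    [strided_conv1d(numeric, kernel, q), group_embeddings(embeddings, q)], axis=1
)
```

Shortening the sequence by a factor of q means the recurrent layers that follow unroll over 13 steps instead of 52.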

Encoder
[Diagram: the concatenated inputs (N ⊕ E) form X0…XT, which a GRU reads step by step to produce hidden states H0…HT; the encoder's hidden state and the static representation S feed a Dense layer that produces the context C.]
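A minimal GRU encoder in NumPy, assuming the Cho et al. (2014) gate equations without biases, random weights, and that the last hidden state H_T is concatenated with S before a tanh Dense layer produces C. All shapes are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal GRU cell (biases omitted) with random placeholder weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        # One weight matrix per gate, each acting on [x ⊕ h].
        shape = (input_dim + hidden_dim, hidden_dim)
        self.w_z, self.w_r, self.w_h = (rng.normal(scale=0.1, size=shape) for _ in range(3))
        self.hidden_dim = hidden_dim

    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(xh @ self.w_z)                                 # update gate
        r = sigmoid(xh @ self.w_r)                                 # reset gate
        h_tilde = np.tanh(np.concatenate([x, r * h]) @ self.w_h)  # candidate state
        return (1.0 - z) * h + z * h_tilde


def encode(xs, static, cell, w_c):
    """Run the GRU over X0..XT, then C = tanh(Dense(H_T ⊕ S))."""
    h = np.zeros(cell.hidden_dim)
    states = []
    for x in xs:
        h = cell(x, h)
        states.append(h)
    context = np.tanh(np.concatenate([h, static]) @ w_c)
    return np.stack(states), context


rng = np.random.default_rng(0)
xs = rng.normal(size=(13, 26))   # X0..XT: 13 steps of 26 features
S = rng.normal(size=8)           # static representation
cell = GRUCell(26, 16, rng)
w_c = rng.normal(scale=0.1, size=(16 + 8, 16))
H, C = encode(xs, S, cell, w_c)
```

Frameworks differ slightly on the update equation (PyTorch, for example, keeps z ⊙ h and mixes in 1 - z of the candidate); the two forms are equivalent up to relabelling the gate.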

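Decoder
[Diagram: starting from the context C and the last input XT, a GRU step followed by a Dense layer emits O1; each output is fed into the next GRU step (O2 … OT-1) until the final Dense layer emits OT.]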
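An autoregressive version of that decoder, sketched in NumPy under stated assumptions: the GRU state starts from C, the first input is one scalar taken from X_T (hypothetically, the last observed value), and each Dense output O_t is fed back as the next input. Weights are random placeholders.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h, w_z, w_r, w_h):
    """One GRU update (biases omitted for brevity)."""
    xh = np.concatenate([x, h])
    z = sigmoid(xh @ w_z)                                  # update gate
    r = sigmoid(xh @ w_r)                                  # reset gate
    h_tilde = np.tanh(np.concatenate([x, r * h]) @ w_h)    # candidate state
    return (1.0 - z) * h + z * h_tilde


def decode_autoregressive(context, first_input, weights, w_out, steps):
    """Start from h = C, emit O_t = Dense(h_t), and feed O_t back as the next input."""
    h, x = context, first_input
    outputs = []
    for _ in range(steps):
        h = gru_step(x, h, *weights)
        o = h @ w_out      # Dense: hidden state -> one prediction
        outputs.append(o)
        x = o              # the prediction becomes the next step's input
    return np.array(outputs)


rng = np.random.default_rng(0)
hidden = 16
C = rng.normal(size=hidden)   # context from the encoder
x_T = np.array([0.7])         # hypothetical: last observed value in X_T
weights = [rng.normal(scale=0.1, size=(1 + hidden, hidden)) for _ in range(3)]
w_out = rng.normal(scale=0.1, size=(hidden, 1))
O = decode_autoregressive(C, x_T, weights, w_out, steps=12)  # O1..O12
```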

Decoder
[Diagram: a GRU unrolled from the context C produces hidden states H’1…H’T, and a Dense layer maps each H’t to a prediction P1…PT.]
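This variant does not feed predictions back. A sketch assuming C is the GRU input at every step with a zero initial state (passing C as the initial state is an equally plausible reading of the diagram), and one Dense layer shared across steps maps each H’t to P_t. Weights are random placeholders.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h, w_z, w_r, w_h):
    """One GRU update (biases omitted for brevity)."""
    xh = np.concatenate([x, h])
    z = sigmoid(xh @ w_z)                                  # update gate
    r = sigmoid(xh @ w_r)                                  # reset gate
    h_tilde = np.tanh(np.concatenate([x, r * h]) @ w_h)    # candidate state
    return (1.0 - z) * h + z * h_tilde


def decode_direct(context, weights, w_out, steps):
    """Unroll a GRU with C as the input at every step; P_t = Dense(H'_t)."""
    h = np.zeros(w_out.shape[0])
    states = []
    for _ in range(steps):
        h = gru_step(context, h, *weights)
        states.append(h)
    h_prime = np.stack(states)        # H'1..H'T, shape (steps, hidden)
    return h_prime, h_prime @ w_out   # same Dense weights at every step -> P1..PT


rng = np.random.default_rng(0)
hidden = 16
C = rng.normal(size=hidden)
weights = [rng.normal(scale=0.1, size=(hidden + hidden, hidden)) for _ in range(3)]
w_out = rng.normal(scale=0.1, size=(hidden, 1))
H_prime, P = decode_direct(C, weights, w_out, steps=12)
```

Because predictions are not fed back, an error at one step is not passed as input to the next.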

Decoder
[Diagram: two stacked GRUs, GRU1 and GRU2, over hidden states H0, H1, H2, H3, …, HT, labelled “Memory”.]
* The same effect could also be achieved by using a sufficiently large RNN trained for long enough (Cho et al., 2014; Sutskever et al., 2014).

[Figure: results by month: October, November, January, February, March]
30 metrics!

How intricate is it now?
• 1 model trained × 4 confidence levels
• 2 layers for encoder/decoder
• 256 units
• 40% dropout
• Trained on 8 NVIDIA Tesla K80s
• Feature set has ~440M rows
• Using ~35 features

How much better was it?
• Duration: +15%
• Offer amounts: +17%
• Fast payers: -30%
• Loss rate: ~0%

Thanks! @brenolf_ //breno.io bit.ly/qcon-shopify


More Decks by Breno Freitas
