Data science driving scalability
Throttling RTB bid requests
Jean-Baptiste Pettit
Teads & me
• Online advertising, mainly video
• The inRead format is the flagship product
• Quality inventory and premium branding
• Data scientist/ML engineer at Teads for 3 years
• PhD in statistics, Master's in computer science
Online advertising
How does it work?
Protocol: OpenRTB
Wires, network, machines, infrastructure… => $$$
The issue
[Diagram: Teads (SSP) sends the ad request to external DSPs 1 to 4, which return RTB bid responses]
=> Sending bid requests to non-responding DSPs is wasteful and expensive for everybody
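To model "did this DSP respond?", every bid request needs a binary label. OpenRTB 2.x lets a bidder signal "no bid" either with an HTTP 204 No Content response or with a response that contains no bids. The minimal sketch below assumes those conventions and additionally treats timeouts and malformed answers as "no bid" (an assumption, not something the talk states); `is_bid_response` is a hypothetical helper name.

```python
import json
from typing import Optional


def is_bid_response(status_code: Optional[int], body: str) -> bool:
    """Hypothetical labeling helper: True only if the DSP actually bid.

    status_code is None when the request timed out (assumed to count as no bid).
    """
    if status_code is None or status_code == 204:
        return False  # timeout or explicit OpenRTB "no content" no-bid
    if status_code != 200 or not body.strip():
        return False  # errors and empty bodies are treated as no bid
    try:
        response = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(response, dict):
        return False
    # A real bid needs at least one seatbid object holding at least one bid.
    seatbids = response.get("seatbid") or []
    return any(isinstance(seat, dict) and seat.get("bid") for seat in seatbids)
```

The labels produced this way are the target variable for the prediction model.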
The solution
=> Just, like, don’t send them, you know?
The solution
• Prediction model to estimate
    • P(bid response from the DSP | context of the ad request)
• If this probability < threshold
    • don't send the request
• Questions:
    • What's the relevant context => features?
    • What statistical model could we use?
    • How do we determine the threshold?
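The decision rule above is a filter applied to every DSP for every ad request. A minimal sketch, assuming each DSP has its own threshold and the model is passed in as a plain function; `Dsp`, `dsps_to_call` and `predict_bid_probability` are hypothetical names, not Teads code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Dsp:
    name: str
    threshold: float  # assumed per-DSP threshold, refreshed by an offline job


def dsps_to_call(
    context: Dict[str, str],
    dsps: List[Dsp],
    predict_bid_probability: Callable[[str, Dict[str, str]], float],
) -> List[str]:
    """Return the names of the DSPs worth sending a bid request to."""
    selected = []
    for dsp in dsps:
        p = predict_bid_probability(dsp.name, context)
        # Rule from the slide: if P(bid response | context) < threshold, don't send.
        if p >= dsp.threshold:
            selected.append(dsp.name)
    return selected
```

For example, with a toy model returning 0.8 for `dsp1` and 0.01 for `dsp2`, and both thresholds at 0.05, only `dsp1` is called.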
Features
• User
    • Country
    • Device
    • Browser
    • OS
    • User mapping
• Publisher
    • Website
    • Article content
• Contextual
    • Time of day
    • Day of week…
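The talk does not say how these categorical features are encoded. A common choice for logistic regression is one-hot encoding through the hashing trick, which the sketch below assumes; `NUM_BUCKETS`, `hash_feature`, `encode` and the field names are hypothetical. MD5 is used instead of Python's built-in `hash()` because the latter is randomized per process, and training and serving must agree on indices.

```python
import hashlib
from typing import Dict, List

NUM_BUCKETS = 2 ** 20  # hypothetical size of the hashed feature space


def hash_feature(name: str, value: str) -> int:
    """Map a categorical (name, value) pair to a stable bucket index."""
    digest = hashlib.md5(f"{name}={value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % NUM_BUCKETS


def encode(context: Dict[str, str]) -> List[int]:
    """One-hot encode the ad request context; returns only the active indices."""
    return sorted({hash_feature(name, value) for name, value in context.items()})
```

A context such as `{"country": "FR", "os": "Android", "hour_of_day": "14", "day_of_week": "Tue"}` becomes a handful of active indices, which keeps both training and online scoring sparse.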
Model
• Logistic regression
• Not very sexy but…
• Fast and reliable in online use, easy to optimize offline
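Part of why logistic regression is fast online: with sparse one-hot features, scoring a request is a sum of a few weights followed by a sigmoid, P = 1 / (1 + exp(-(b + Σ w_i))) over the active features i. A minimal sketch, assuming the weights are stored as a dict from feature index to weight; `predict_proba` is a hypothetical name.

```python
import math
from typing import Dict, Iterable


def predict_proba(weights: Dict[int, float], bias: float, active: Iterable[int]) -> float:
    """Logistic regression on one-hot features: sigmoid(bias + sum of active weights)."""
    z = bias + sum(weights.get(i, 0.0) for i in active)
    # Numerically stable sigmoid: never calls exp() on a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Unknown feature indices simply contribute nothing, so a model trained an hour ago can still score requests containing new values.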
Model
• Trained every hour on 6 hours of data with Spark on EMR
• Fully separated pipelines EU/US/APAC
• 10% exploration
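The slide only says "10% exploration". One plausible reading, sketched below as an assumption: for a random 10% of ad requests the throttle is bypassed and every DSP is called, so the hourly jobs still see outcomes for requests the model would have dropped; otherwise the model would only learn from traffic it already approved. `EXPLORATION_RATE` and `route_request` are hypothetical names, and a single shared threshold is used for brevity.

```python
import random
from typing import Dict, List, Tuple

EXPLORATION_RATE = 0.10  # fraction of ad requests sent to all DSPs regardless of score


def route_request(
    scores: Dict[str, float],
    threshold: float,
    rng: random.Random,
) -> Tuple[List[str], bool]:
    """Return (DSPs to call, exploration flag).

    The flag is meant to be logged with the request so offline jobs can
    separate unthrottled exploration traffic from throttled traffic.
    """
    if rng.random() < EXPLORATION_RATE:
        return sorted(scores), True  # explore: call every DSP
    return sorted(d for d, p in scores.items() if p >= threshold), False
```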
Choosing the threshold
• The model is good, but how can we determine and adjust the threshold?
• Fix a true positive rate target, stored in a database
• An hourly job computes the optimal threshold over the last x hours to reach the target.
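A minimal sketch of the hourly threshold job, assuming "true positive rate" means the share of requests that did get a bid response which the model still lets through (score >= threshold). Sorting the scores of those responding requests and reading one order statistic gives the highest threshold that meets the target. `threshold_for_tpr` and its arguments are hypothetical; fetching the last x hours of data and writing the result back are left out.

```python
import math
from typing import Sequence


def threshold_for_tpr(scores: Sequence[float], responded: Sequence[bool], target_tpr: float) -> float:
    """Highest threshold that still keeps at least target_tpr of actual bid responses.

    scores: model probabilities for recent requests (ideally exploration traffic).
    responded: whether the DSP actually returned a bid for each request.
    """
    positives = sorted(s for s, y in zip(scores, responded) if y)
    n = len(positives)
    if n == 0:
        return 0.0  # no observed bids: fall back to throttling nothing
    # Minimum number of responding requests to keep (epsilon guards float error).
    keep = min(n, max(1, math.ceil(target_tpr * n - 1e-9)))
    # Every request scoring >= positives[n - keep] is kept, so >= keep positives survive.
    return positives[n - keep]
```

With ten responding requests scored 0.1, 0.2, …, 0.9, 0.95 and a 90% target, the job returns 0.2: nine of the ten bids are still requested. The traffic saved is then the share of recent scores below that threshold.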
Online results
Impact on infrastructure
[Charts: DB load; Req/sec]
$$$
Impact on business
A/B testing
Conclusions
• Simple model at the heart of the business
• Win-win-win situation
• Low-hanging fruit is everywhere in high-volume businesses
• Not all machine learning needs deep neural networks
Thank you
I’ll take questions if I managed my time well,
otherwise sorry :(