Slide 1

Slide 1 text

Data science driving scalability: throttling RTB bid requests
Jean-Baptiste Pettit

Slide 2

Slide 2 text

Teads & me
• Online advertising, mainly video
• inRead format is the flagship product
• Quality inventory and premium branding
• Data scientist/ML engineer at Teads for 3 years
• PhD in statistics, Master's in computer science

Slide 3

Slide 3 text

Online advertising
How does it work?

Slide 4

Slide 4 text

Online advertising
How does it work?
Protocol: OpenRTB
Wires, network, machines, infrastructure… => $$$

Slide 5

Slide 5 text

The issue
[Diagram: Ad request, Teads SSP, RTB bid requests, External DSP 1, External DSP 2, External DSP 3, External DSP 4]

Slide 6

Slide 6 text

The issue
[Diagram: Ad request, Teads SSP, RTB bid responses, External DSP 1, External DSP 2, External DSP 3, External DSP 4]
=> Sending bid requests to non-responding DSPs is wasteful and expensive for everybody

Slide 7

Slide 7 text

The solution => Just, like, don’t send them, you know?

Slide 8

Slide 8 text

The solution
• Prediction model to estimate
  • P(Bid response of DSP | Context of the ad request)
• If this probability < threshold
  • don’t send the request
• Questions:
  • What’s the relevant context => features?
  • What statistical model could we use?
  • How to determine the threshold?
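The decision rule above can be sketched in a few lines. This is only an illustration: the function name `should_send`, the predicted probabilities and the threshold value are all hypothetical and do not come from the talk.

```python
def should_send(p_bid, threshold):
    """Send the bid request only if the predicted bid probability reaches the threshold."""
    return p_bid >= threshold

# Hypothetical predictions of P(bid response | context) for one ad request.
predictions = {"dsp_1": 0.62, "dsp_2": 0.03, "dsp_3": 0.18, "dsp_4": 0.001}
THRESHOLD = 0.05  # illustrative value, not taken from the talk

# Keep only the DSPs that are likely enough to answer.
recipients = [dsp for dsp, p in predictions.items() if should_send(p, THRESHOLD)]
print(recipients)
```

With these made-up numbers, only two of the four DSPs receive the bid request.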

Slide 9

Slide 9 text

Features
• User
  • Country
  • Device
  • Browser
  • OS
  • User mapping
• Publisher
  • Website
  • Article content
• Contextual
  • Time of day
  • Day of week…
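The talk does not say how these categorical features are encoded. One common option for high-cardinality fields such as website or browser is the hashing trick, sketched below as an assumption. The field names, the bucket count and the helper `hash_features` are hypothetical.

```python
import zlib

N_BUCKETS = 2 ** 18  # assumed size of the hashed feature space

def hash_features(context, n_buckets=N_BUCKETS):
    """Map each 'name=value' pair to the index of an active binary feature."""
    indices = set()
    for name, value in context.items():
        token = f"{name}={value}".encode("utf-8")
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        indices.add(zlib.crc32(token) % n_buckets)
    return sorted(indices)

# Hypothetical context of one ad request.
ad_request = {
    "country": "FR",
    "device": "mobile",
    "browser": "chrome",
    "os": "android",
    "website": "example-news.com",
    "hour_of_day": "14",
    "day_of_week": "tuesday",
}
active = hash_features(ad_request)
print(len(active), "active features out of", N_BUCKETS)
```

Each request then becomes a short list of active indices, which keeps online scoring cheap.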

Slide 10

Slide 10 text

Model
• Logistic regression
• Not very sexy but…
  • Fast and reliable in online use, easy to optimize offline
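To illustrate why logistic regression is fast online: with binary features, scoring is a sum of a few weights followed by a sigmoid. The sketch below assumes a sparse weight dictionary; the weights, the bias and the function name are made up for the example.

```python
import math

def predict_bid_probability(active_features, weights, bias):
    """Logistic regression score: sigmoid(bias + sum of weights of the active features)."""
    z = bias + sum(weights.get(f, 0.0) for f in active_features)
    # Numerically stable sigmoid that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical learned parameters.
weights = {"country=FR": 1.5, "device=mobile": 0.5, "browser=ie": -2.0}
bias = -2.0

p = predict_bid_probability(["country=FR", "device=mobile"], weights, bias)
print(p)  # bias and weights cancel out here, so z = 0
```

Features absent from the weight dictionary contribute nothing, so unseen values degrade gracefully to the bias.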

Slide 11

Slide 11 text

Model
• Trained every hour on 6 hours of data with Spark on EMR
• Fully separated pipelines EU/US/APAC
• 10% exploration
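The slide only states "10% exploration". One plausible reading, assumed here, is that a random 10% of requests bypass the filter so that the model keeps observing outcomes for traffic it would otherwise never send. The sketch below illustrates that reading; the function and constant names are hypothetical.

```python
import random

EXPLORATION_RATE = 0.10  # the 10% figure from the slide

def should_send(p_bid, threshold, rng=random):
    """Decide whether to send a bid request, bypassing the filter on a random share of traffic."""
    if rng.random() < EXPLORATION_RATE:
        return True  # exploration: send regardless of the prediction
    return p_bid >= threshold

rng = random.Random(42)  # seeded only to make the sketch reproducible
sent = sum(should_send(0.0, 0.05, rng) for _ in range(10_000))
print(f"{sent} of 10000 low-probability requests were sent anyway")
```

Under this assumption, roughly one in ten filtered requests still goes out and can feed the next hourly training run.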

Slide 12

Slide 12 text

Offline results
[Plot: true positive rate vs. false positive rate (= 1 - true negative rate)]
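For reference, the two axes of this plot can be computed for a given threshold as follows. The scores and labels are made-up data, and `rates_at_threshold` is a hypothetical helper, not code from the talk.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (true positive rate, false positive rate) when sending iff score >= threshold."""
    tp = fp = fn = tn = 0
    for score, did_bid in zip(scores, labels):
        sent = score >= threshold
        if sent and did_bid:
            tp += 1  # request sent, DSP answered
        elif sent and not did_bid:
            fp += 1  # request sent, DSP stayed silent (wasted)
        elif did_bid:
            fn += 1  # request filtered, DSP would have answered (lost bid)
        else:
            tn += 1  # request filtered, DSP would not have answered (saved)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

# Made-up predictions and outcomes (1 = the DSP responded with a bid).
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.05]
labels = [1, 1, 0, 1, 0, 0]
tpr, fpr = rates_at_threshold(scores, labels, threshold=0.35)
print(tpr, fpr)
```

Sweeping the threshold from 1 down to 0 traces the full curve.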

Slide 13

Slide 13 text

Offline results

Slide 14

Slide 14 text

Choosing the threshold
• The model is good, but how can we determine and adjust the threshold?
• Fix a true positive rate target, set in the database
• An hourly job computes the optimal threshold over the last x hours to reach the target
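A minimal sketch of such a threshold computation, assuming "optimal" means the highest threshold that still keeps the target share of actual bids. The talk does not give the exact procedure, and the data and function name below are hypothetical.

```python
import math

def threshold_for_tpr(scores, labels, target_tpr):
    """Highest threshold such that at least target_tpr of the positives score >= threshold."""
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("no positive examples in the window")
    # Number of positives that must be kept to reach the target.
    keep = max(1, math.ceil(target_tpr * len(positives)))
    return positives[keep - 1]

# Made-up scores and outcomes from the last x hours (1 = the DSP responded).
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.05]
labels = [1, 1, 0, 1, 0, 0]

print(threshold_for_tpr(scores, labels, target_tpr=0.95))
print(threshold_for_tpr(scores, labels, target_tpr=0.60))
```

A higher target keeps more bids but lowers the threshold, so fewer requests are filtered; the target is the single business knob.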

Slide 15

Slide 15 text

Online results

Slide 16

Slide 16 text

Impact on infrastructure
[Plots: DB load, Req/sec]

Slide 17

Slide 17 text

$$$

Slide 18

Slide 18 text

Impact on business
A/B testing

Slide 19

Slide 19 text

Impact on business
A/B testing

Slide 20

Slide 20 text

Impact on business
A/B testing

Slide 21

Slide 21 text

Conclusions
• Simple model at the heart of the business
• Win-win-win situation
• Low-hanging fruit is everywhere in high-volume businesses
• Not all machine learning needs deep neural networks

Slide 22

Slide 22 text

Thank you
I’ll take questions if I managed my time well, otherwise sorry :(