Data science driving scalability

In this presentation I detail an example of how data science has made its way into the heart of the business at Teads: it helps us cope with the scale we are experiencing by saving up to 60% of our network costs while increasing the quality of the traffic we send to our clients. The project is called RTB bid request "throttling". It is a classification model used in real time to filter the bid requests that our marketplace (SSP) would otherwise send to our clients (DSPs): when the predicted probability that a DSP responds is too low, we do not send the request.

jbpettitTeads

June 28, 2018

Transcript

  1. Teads & me • Online advertisement, mainly video • inRead format is the flagship product • Quality inventory and premium branding • Data scientist/ML engineer at Teads for 3 years • PhD in statistics, Masters in computer science
  2. The issue • [Diagram: Teads SSP and External DSPs 1 to 4, with the ad request and the RTB bid requests]
  3. The issue • [Diagram: Teads SSP and External DSPs 1 to 4, with the ad request and the RTB bid responses] => Sending bid requests to non-responding DSPs is wasteful and expensive for everybody
  4. The solution • Prediction model to estimate P(Bid response of DSP | Context of the ad request) • If this probability < threshold, don’t send the request • Questions: • What’s the relevant context => features? • What statistical model could we use? • How to determine the threshold?
  5. Features • User: Country, Device, Browser, OS, User mapping • Publisher: Website, Article content • Contextual: Time of day, Day of week…
  6. Model • Logistic regression • Not very sexy but… • Fast and reliable in online use, easy to optimize offline
  7. Model • Trained every hour on 6 hours of data with Spark on EMR • Fully separated pipelines EU/US/APAC • 10% exploration
  8. Choosing the threshold • The model is good but how can we determine and adjust the threshold? • Fix a true positive rate target, set in database • Hourly job to compute the optimal threshold over the last x hours to reach the target
  9. $$$

  10. Conclusions • Simple model at the heart of the business • Win-win-win situation • Low-hanging fruit is everywhere in high-volume businesses • Not all machine learning needs Deep Neural Networks
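
The steps outlined in slides 4 to 8 (context features, logistic regression, exploration, threshold derived from a true positive rate target) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the implementation used at Teads: the slides do not say how features are encoded, so the hashing of categorical features, the MD5 hash, the bucket count and every function name here are hypothetical. "10% exploration" is read here as "a random 10% of requests bypass the filter", which is also an assumption.

```python
import hashlib
import math
import random

NUM_BUCKETS = 2 ** 18  # assumed size of the hashed feature space


def feature_index(name, value):
    """Map a categorical feature such as ("country", "FR") to a weight index."""
    digest = hashlib.md5(f"{name}={value}".encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def response_probability(context, weights, bias):
    """Logistic regression score: sigmoid of bias plus one weight per active feature."""
    z = bias + sum(weights.get(feature_index(k, v), 0.0) for k, v in context.items())
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def threshold_for_tpr(scores, responded, target_tpr):
    """Highest threshold that still keeps at least target_tpr of the
    requests that actually received a bid response."""
    positives = sorted((s for s, r in zip(scores, responded) if r), reverse=True)
    if not positives:
        return 0.0  # no responses observed: do not filter anything
    keep = max(1, math.ceil(target_tpr * len(positives)))
    return positives[keep - 1]


def should_send(probability, threshold, exploration_rate=0.10, rng=random):
    """Send when the score reaches the threshold, or when the request
    falls into the (assumed) random exploration share."""
    if rng.random() < exploration_rate:
        return True
    return probability >= threshold
```

In this sketch an hourly job would call `threshold_for_tpr` on recent scored requests and store the result, while the real-time path would only evaluate `response_probability` and `should_send`. Keeping the online path down to a dictionary lookup per feature and one sigmoid is the kind of property slide 6 points to when it calls the model fast and reliable in online use.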