Ad Serving System

daimatz
February 26, 2017

Transcript

  1. Ad Serving System on Finagle and Thrift • Daisuke Matsumoto (@daimatz) • 2017-02-26 (Sun) • ScalaMatsuri 2017 Unconference
  2. About me • Daisuke Matsumoto (@daimatz) • Co-founder, VP of Engineering at FIVE Inc. • Largest Mobile Video Advertising Platform in Japan • My engineering roles are: Front-end Servers, Back-end Servers, Dashboards, Log Analysis and Reporting Batches, Operation Tools, Android SDK, iOS SDK, etc…
  3. Agenda • Finagle • Thrift • Our server-side architecture and how we use Finagle/Thrift • Storing serialized data in DB • Reloading in-process cache • Scoring • Communicating with external service • Sharing schema in dashboard’s JavaScript and server • Conclusion
  5. Finagle • Twitter’s open-source RPC system • Written in Scala • Twitter was an early adopter of Scala • Future, Try, Duration, …: many Scala utilities (mostly related to concurrency) were inspired by Twitter’s common library.
  6. “Your Server as a Function” • Marius Eriksen at Twitter • A lead developer of Finagle • https://monkey.org/~marius/funsrv.pdf • The paper introduces how to use Twitter’s Future and Finagle’s Service
  7. Futures [1/3] • A great abstraction over callback-based programming • Comparable to JavaScript’s Promise, Java 8’s CompletableFuture, and C++11’s std::future • A value of type Future[A] is a placeholder that holds the result of an asynchronous operation • Typically that operation issues some IO that may fail • What sets Futures apart from the traditional callback style is that they are composable
  8. Futures [2/3] • f: registering a callback function transforms a Future[A] into another Future[B] • g: in Scala, we can use a for-comprehension
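
The two styles on this slide can be sketched as follows. This is a minimal illustration that uses the standard library's scala.concurrent.Future as a stand-in for Twitter's Future (which offers similar `map` and `flatMap` methods); `fetchUserId` and `fetchScore` are hypothetical lookups invented for the example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureTransformExample {
  // Hypothetical asynchronous lookups; real code would issue IO here.
  def fetchUserId(name: String): Future[Int] = Future(name.length)
  def fetchScore(userId: Int): Future[Double] = Future(userId * 1.5)

  // f: registering callbacks with flatMap/map turns a Future[A] into a Future[B].
  def viaCallbacks(name: String): Future[String] =
    fetchUserId(name).flatMap { id =>
      fetchScore(id).map(score => s"$name:$id:$score")
    }

  // g: the same pipeline written as a for-comprehension.
  def viaFor(name: String): Future[String] =
    for {
      id    <- fetchUserId(name)
      score <- fetchScore(id)
    } yield s"$name:$id:$score"

  def main(args: Array[String]): Unit = {
    println(Await.result(viaCallbacks("ad"), 1.second))
    println(Await.result(viaFor("ad"), 1.second))
  }
}
```

Both functions produce the same result; the for-comprehension is only syntactic sugar for the nested `flatMap`/`map` calls.
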
  9. Futures [3/3] • Futures are composable • There are many utilities to compose Futures:
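
The utilities shown on the original slide are not preserved in this transcript. As a rough sketch of the kind of combinators meant, here are three from the standard library's scala.concurrent.Future; Twitter's Future has analogous operations such as `Future.collect`, `Future.join`, and `handle`. The object and method names are made up for the example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureCombinators {
  // Run several independent computations and collect all results in order.
  def all(ids: Seq[Int]): Future[Seq[Int]] =
    Future.sequence(ids.map(id => Future(id * 2)))

  // Pair the results of two independent futures.
  def both: Future[(Int, String)] = Future(1).zip(Future("one"))

  // Replace a failure with a default value.
  def safe: Future[Int] =
    Future[Int](throw new RuntimeException("boom")).recover {
      case _: RuntimeException => -1
    }

  def main(args: Array[String]): Unit = {
    println(Await.result(all(Seq(1, 2, 3)), 1.second))
    println(Await.result(both, 1.second))
    println(Await.result(safe, 1.second))
  }
}
```
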
  10. Finagle’s Service • A Service is an asynchronous function • Service represents both server and client • A server is a function that implements the Service; Finagle dispatches incoming requests to it • A client is a function that uses the Service; Finagle dispatches requests to the service and handles the responses
  11. Finagle’s Service
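
The "service as an asynchronous function" idea can be sketched without Finagle at all. The snippet below is a simplified model, not Finagle's API: `Service` here is a plain type alias over scala.concurrent.Future, and `counting` is a hypothetical wrapper in the spirit of what Finagle calls a filter.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ServiceSketch {
  // A service is just an asynchronous function from request to response.
  type Service[Req, Rep] = Req => Future[Rep]

  // Server side: implement the function.
  val upper: Service[String, String] = req => Future(req.toUpperCase)

  // A wrapper that adds behavior around any service (here: counting requests).
  def counting[Req, Rep](counter: AtomicInteger)(next: Service[Req, Rep]): Service[Req, Rep] =
    req => { counter.incrementAndGet(); next(req) }

  def main(args: Array[String]): Unit = {
    val counter = new AtomicInteger(0)
    // Client side: use the function; the caller cannot tell whether it is local or remote.
    val client = counting(counter)(upper)
    println(Await.result(client("hello"), 1.second))
    println(counter.get())
  }
}
```

Because server and client share one function type, wrappers like `counting` can be applied on either side.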

  13. Thrift • A data serialization format and RPC interface • Users write data definitions and RPC interfaces in Thrift IDL • The Thrift compiler generates code to serialize/deserialize data, as well as the RPC client and server • Originally developed by Facebook; now an Apache project
  14. Thrift elements • Primitive types: bool, byte, i16, i32, i64, double, binary, string • User-defined types: enum, struct, union • Container types: list<T>, set<T>, map<K,V>; fields can also be marked optional • Unlike protobuf, a map key can be any type, including user-defined structs • But I don’t recommend it • The `service` keyword defines an RPC interface • Each struct field and RPC parameter has a unique id: 1, 2, 3, …
  15. None
  16. Thrift generator • Thrift supports Java code generation, of course, and the generated code can be used from Scala • “But the generated code uses Java collections and mutable “bean” classes, causing some annoying boilerplate conversions to be hand-written.” • Twitter developed its own Thrift parser/generator, called Scrooge • https://twitter.github.io/scrooge/
  17. Scrooge • A Thrift parser and code generator that generates a Scala-friendly API • list, set, map are represented by scala.collection.{Seq, Set, Map} • struct by an immutable case class • enum and union by a sealed trait • Easy interface to send/receive RPCs • sbt support • Example code is shown later!
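
To make the mapping above concrete, here is a hand-written sketch of the Scala shapes one would expect for a small schema. The Thrift IDL in the comment and all names (`Campaign`, `Status`, `dailyBudget`) are hypothetical, and the Scala code is written by hand for illustration; it is not actual Scrooge output, which contains considerably more machinery.

```scala
object CampaignSchema {
  /* Hypothetical Thrift IDL this corresponds to:
   *
   *   enum Status { ACTIVE = 1, PAUSED = 2 }
   *
   *   struct Campaign {
   *     1: required i64 id
   *     2: required string name
   *     3: optional i32 dailyBudget
   *     4: list<string> tags
   *     5: Status status
   *   }
   */

  // enum -> sealed trait with one case object per value
  sealed trait Status
  object Status {
    case object Active extends Status
    case object Paused extends Status
  }

  // struct -> immutable case class; optional -> Option, list -> Seq
  final case class Campaign(
    id: Long,
    name: String,
    dailyBudget: Option[Int] = None,
    tags: Seq[String] = Seq.empty,
    status: Status = Status.Active
  )

  def main(args: Array[String]): Unit = {
    val c = Campaign(1L, "spring")
    // Immutable update via copy instead of bean-style setters.
    println(c.copy(status = Status.Paused, dailyBudget = Some(1000)))
  }
}
```
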
  19. None
  20. How we use Thrift • Data serialization • Store in DB • Logging • RPC • Splitting many services by Thrift RPC • Dashboard’s JSON-API schema
  22. None
  23. Store Thrift struct in DB • We use MySQL as an indexed KVS • All serialized data is stored in a `bytes` column, and only the index keys are defined as separate columns • Joins and aggregate functions are computed in the application layer
  24. Data construction and serialization, save to DB
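
The code on this slide is not preserved in the transcript. The following is a rough sketch of the "indexed KVS" idea only, under loud assumptions: a hand-written `DataOutputStream` codec stands in for Thrift binary serialization, an in-memory map stands in for the MySQL table, and `Campaign`, `Row`, and `advertiserId` are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import scala.collection.mutable

object IndexedKvsSketch {
  final case class Campaign(id: Long, advertiserId: Long, name: String)

  // Stand-in for Thrift serialization (hypothetical hand-written codec).
  def encode(c: Campaign): Array[Byte] = {
    val buf = new ByteArrayOutputStream()
    val out = new DataOutputStream(buf)
    out.writeLong(c.id)
    out.writeLong(c.advertiserId)
    out.writeUTF(c.name)
    out.flush()
    buf.toByteArray
  }

  def decode(bytes: Array[Byte]): Campaign = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    Campaign(in.readLong(), in.readLong(), in.readUTF())
  }

  // Row layout: index columns plus one opaque blob column.
  final case class Row(id: Long, advertiserId: Long, bytes: Array[Byte])

  // In-memory stand-in for the MySQL table, keyed by primary key.
  private val table = mutable.Map.empty[Long, Row]

  def save(c: Campaign): Unit =
    table.update(c.id, Row(c.id, c.advertiserId, encode(c)))

  // Filtering uses only the index column; the blob is decoded in the application.
  def findByAdvertiser(advertiserId: Long): Seq[Campaign] =
    table.values.filter(_.advertiserId == advertiserId).map(r => decode(r.bytes)).toSeq.sortBy(_.id)
}
```

Adding a field to `Campaign` changes only the codec, not the table layout, which is why the MySQL schema can stay fixed while the Thrift schema evolves.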

  25. How to manipulate our data • The MySQL schema doesn’t change • But the Thrift schema often changes drastically, which requires data migration • No DB migration tool or “UPDATE” statement can be applied to serialized Thrift data • To update data, you need to create a new sbt project, write a program that accesses the DB, changes the data, and saves it, then build a jar file, deploy it, … • Or you can do it from the sbt console in the production environment… but that is painful
  26. Writing operation scripts casually in Scala • We write operation scripts in Scala, in a project called “ScalaScript” • Originally implemented with Twitter’s util-eval, which is no longer needed • 1. Prepare a fat jar that includes all common libraries. 2. Run the jar with our script as the argument. 3. The script is dynamically loaded, compiled, and executed with the full classpath available!
  27. ScalaScript example [1/2] • Prepare a fat-jar that contains all common libraries
  28. ScalaScript example [2/2] • We can reuse our in-house Scala library, such as DB access, Thrift serialization, Thrift RPC, running BigQuery, uploading to SpreadSheet, posting to Slack, … • What’s more, it’s type safe! • Write a script and run it
  29. Daily data editing • Of course there is a “rich” dashboard for daily operation • But experimental features are often added to the schema faster than the dashboard can keep up • We need a simple data viewer/editor • There is no “phpMyAdmin” equivalent
  30. Scrooge exports parser API • Users can easily get the ASTs of all defined structs • No need to write a parser by hand • We can traverse the AST and generate an HTML form automatically
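
The traversal idea can be sketched with a toy AST. The types below (`Struct`, `Field`, `FieldType`) are hypothetical and much simpler than Scrooge's real AST classes; the sketch only shows how walking struct definitions can produce a form, and it does no HTML escaping.

```scala
object FormGenerator {
  // Toy AST (hypothetical, not Scrooge's actual AST types).
  sealed trait FieldType
  case object StringType extends FieldType
  case object I32Type extends FieldType
  case object BoolType extends FieldType
  final case class EnumType(values: Seq[String]) extends FieldType

  final case class Field(id: Int, name: String, tpe: FieldType)
  final case class Struct(name: String, fields: Seq[Field])

  // Choose an input widget from the field's type.
  def input(f: Field): String = f.tpe match {
    case StringType => s"""<input type="text" name="${f.name}">"""
    case I32Type    => s"""<input type="number" name="${f.name}">"""
    case BoolType   => s"""<input type="checkbox" name="${f.name}">"""
    case EnumType(vs) =>
      vs.map(v => s"<option>$v</option>")
        .mkString(s"""<select name="${f.name}">""", "", "</select>")
  }

  // One labelled widget per field, ordered by Thrift field id.
  def form(s: Struct): String =
    s.fields.sortBy(_.id)
      .map(f => s"<label>${f.name} ${input(f)}</label>")
      .mkString(s"""<form data-struct="${s.name}">""", "", "</form>")

  def main(args: Array[String]): Unit =
    println(form(Struct("Campaign", Seq(
      Field(2, "name", StringType),
      Field(1, "enabled", BoolType),
      Field(3, "status", EnumType(Seq("ACTIVE", "PAUSED")))
    ))))
}
```

New fields added to the schema show up in the generated form without any dashboard work, which is the point of the slide.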
  31. Auto-generated HTML form

  33. None
  34. In-process Cache • Cold data, such as Campaign, Media, and InternalUser, is cached in each server’s process • This reduces Redis access • But when the master copy of such cold data is updated, servers need to reload it • The dashboard copies MySQL data to Redis, then sends an RPC to each server to reload its cache
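
One way to structure such a reloadable cache is an atomically swapped immutable snapshot. This is a minimal sketch, not the authors' implementation: `CampaignCache` is a hypothetical class, and the `load` function stands in for reading the latest data from Redis.

```scala
import java.util.concurrent.atomic.AtomicReference

object InProcessCache {
  final case class Campaign(id: Long, name: String)

  // `load` is a hypothetical stand-in for fetching the latest data from Redis.
  final class CampaignCache(load: () => Map[Long, Campaign]) {
    // Readers always see one complete, immutable snapshot.
    private val snapshot = new AtomicReference[Map[Long, Campaign]](load())

    // Hot path: no Redis access, just an in-memory lookup.
    def get(id: Long): Option[Campaign] = snapshot.get().get(id)

    // Intended to be called from the handler of the "reload" RPC.
    def reload(): Unit = snapshot.set(load())
  }

  def main(args: Array[String]): Unit = {
    var store = Map(1L -> Campaign(1L, "old"))
    val cache = new CampaignCache(() => store)
    store = Map(1L -> Campaign(1L, "new"))
    println(cache.get(1L)) // still the old snapshot
    cache.reload()
    println(cache.get(1L)) // now the new one
  }
}
```
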
  36. None
  37. Scalable scorer • Scoring is the essence of ad serving • If it takes more time than expected, we partition the campaigns and calculate their scores on different nodes • The paper “Your Server as a Function” shows an example of running a search query across different instances
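
The partitioning idea is a scatter-gather over Futures. The sketch below assumes a hypothetical `scorePartition` function in place of the Thrift RPC call to a scorer node, a made-up scoring formula, and scala.concurrent.Future in place of Twitter's Future.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ScatterGatherScorer {
  final case class Campaign(id: Long, bid: Double)
  final case class Scored(campaignId: Long, score: Double)

  // Hypothetical scorer; in production this would be an RPC to a scorer node.
  def scorePartition(part: Seq[Campaign]): Future[Seq[Scored]] =
    Future(part.map(c => Scored(c.id, c.bid * 2)))

  // Scatter campaigns over `nodes` partitions, gather the scores, pick the best.
  def best(campaigns: Seq[Campaign], nodes: Int): Future[Option[Scored]] = {
    val partitions = campaigns.groupBy(c => (c.id % nodes).toInt).values.toSeq
    Future.sequence(partitions.map(scorePartition))
      .map(_.flatten)
      .map(all => if (all.isEmpty) None else Some(all.maxBy(_.score)))
  }

  def main(args: Array[String]): Unit = {
    val campaigns = Seq(Campaign(1L, 1.0), Campaign(2L, 5.0), Campaign(3L, 3.0))
    println(Await.result(best(campaigns, nodes = 2), 1.second))
  }
}
```

The partitions are scored concurrently, so total latency is bounded by the slowest partition rather than by the sum of all of them.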
  38. None
  39. RPC server on Scorer instances
  40. RPC client on Ad frontend instances
  42. None
  43. Communicating with external servers • Sometimes our servers need real-time communication with external servers • Here “external” means another company’s servers, reached across the internet • If we don’t have any ad to show but a partner company does, we want to deliver theirs • We want to separate the servers that receive high-traffic requests from the ones that send outgoing requests
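
The fallback flow described on this slide might look like the following from the frontend's point of view. This is only a sketch under assumptions: `inHouse` and `partner` are hypothetical service functions (the latter standing in for an RPC to the separate outgoing-request servers), and treating a partner failure as "no ad" is a design choice made for the example, not something the slides state.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PartnerFallback {
  final case class Ad(source: String, creativeId: String)

  // Ask the partner only when there is no in-house ad for the slot.
  def serve(
    inHouse: String => Future[Option[Ad]],
    partner: String => Future[Option[Ad]]
  )(slot: String): Future[Option[Ad]] =
    inHouse(slot).flatMap {
      case Some(ad) => Future.successful(Some(ad))
      // A failing partner call degrades to "no ad" instead of failing the request.
      case None     => partner(slot).recover { case _: Exception => None }
    }

  def main(args: Array[String]): Unit = {
    val noAd: String => Future[Option[Ad]] = _ => Future.successful(None)
    val partnerAd: String => Future[Option[Ad]] =
      _ => Future.successful(Some(Ad("partner", "p-1")))
    println(Await.result(serve(noAd, partnerAd)("slot-a"), 1.second))
  }
}
```
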
  45. None
  46. Dashboard’s JSON-API • The dashboard’s JSON requests and responses are also defined as Thrift schemas
  47. JavaScript code • Actually it is not RPC, just HTTP request/response • But we can share the schema between client and server
  48. Where we don’t use Thrift • SDK-to-server communication • We provide iOS/Android SDKs to partner developers • Our SDK doesn’t depend on any other libraries, so that they can easily integrate it into their apps
  49. Conclusion • Introduced how we use Finagle and peripheral tools