Ad Serving System

daimatz
February 26, 2017

Transcript

  1. Ad Serving System on Finagle and Thrift
    Daisuke Matsumoto (@daimatz)
    2017-02-26 (Sun)
    ScalaMatsuri 2017 Unconference
  2. About me
    • Daisuke Matsumoto (@daimatz)
    • Co-founder, VP of Engineering at FIVE Inc.
    • Largest Mobile Video Advertising Platform in Japan
    • My engineering roles are: Front-end Servers, Back-end Servers, Dashboards, Log Analysis and Reporting Batches, Operation Tools, Android SDK, iOS SDK, etc.
  3. Agenda
    • Finagle
    • Thrift
    • Our server-side architecture and how we use Finagle/Thrift
    • Storing serialized data in DB
    • Reloading in-process cache
    • Scoring
    • Communicating with external service
    • Sharing schema in dashboard’s JavaScript and server
    • Conclusion
  5. Finagle
    • Twitter’s open-source RPC system
    • Built on Scala
    • Twitter is an early adopter of Scala
    • Future, Try, Duration, …
      Many Scala utilities (mostly related to concurrency) are inspired by Twitter’s common library.
  6. “Your Server as a Function”
    • Marius Eriksen at Twitter
    • A lead developer of Finagle
    • https://monkey.org/~marius/funsrv.pdf
    • The paper introduces how to use Twitter’s Future and Finagle’s Service
  7. Futures [1/3]
    • A great abstraction over callback-based programming
    • Comparable to JavaScript’s Promise, Java 8’s CompletableFuture, C++11’s std::future
    • A value of type Future[A] is a placeholder for the result of an asynchronous operation
    • Typically the operation issues some IO that may fail.
    • What sets it apart from the traditional callback style is that it is “composable”
  8. Futures [2/3]
    • f: Registering a callback function transforms a Future[A] into another Future[B]
    • g: In Scala, we can use a for-comprehension
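
The code for f and g on this slide did not survive in the transcript. A minimal sketch of the two styles follows. It uses the standard library's scala.concurrent.Future as a stand-in for Twitter's com.twitter.util.Future (both provide map and flatMap), and fetchUserId and fetchScore are hypothetical operations invented for illustration:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  // Hypothetical asynchronous operations (not from the talk).
  def fetchUserId(name: String): Future[Int] = Future(name.length)
  def fetchScore(userId: Int): Future[Double] = Future(userId * 1.5)

  // f: registering a callback with map turns a Future[Int] into a Future[String].
  def f: Future[String] = fetchUserId("alice").map(id => s"user-$id")

  // g: a for-comprehension sequences dependent futures (sugar for flatMap/map).
  def g: Future[Double] = for {
    id    <- fetchUserId("alice")
    score <- fetchScore(id)
  } yield score

  def main(args: Array[String]): Unit = {
    // Blocking with Await is for demonstration only.
    println(Await.result(f, 1.second))
    println(Await.result(g, 1.second))
  }
}
```

If either step fails, the failure propagates to the resulting Future without any explicit error-handling callback, which is what makes this style composable.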
  9. Finagle’s Service
    • A Service is an asynchronous function
    • Service represents both server and client
    • A Server is a function that implements the Service; Finagle dispatches incoming requests to it
    • A Client is a function that uses the Service; Finagle dispatches requests to the service and handles responses
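
As a rough illustration of "a Service is an asynchronous function", the sketch below models Service as a plain function type, following the definition in “Your Server as a Function”. It is an assumption-laden toy: the alias is local to this example (Finagle's real type is com.twitter.finagle.Service), it uses scala.concurrent.Future instead of Twitter's Future, and AdRequest, AdResponse and the URL are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ServiceSketch {
  // Local alias for illustration only; not Finagle's actual class.
  type Service[Req, Rep] = Req => Future[Rep]

  // Hypothetical request/response types.
  final case class AdRequest(slotId: String)
  final case class AdResponse(creativeUrl: String)

  // Server side: implement the function.
  val adServer: Service[AdRequest, AdResponse] =
    req => Future.successful(AdResponse(s"https://example.invalid/creative/${req.slotId}"))

  // Client side: use a value of the same type, wherever it actually runs.
  def render(client: Service[AdRequest, AdResponse], slotId: String): Future[String] =
    client(AdRequest(slotId)).map(rep => s"""<video src="${rep.creativeUrl}">""")
}
```

Because the server and the client share one type, code such as render can be exercised in-process with adServer and later pointed at a remote client without changes.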
  11. Thrift
    • A data serialization format and RPC interface
    • Users write data definitions and RPC interfaces in Thrift IDL
    • The Thrift compiler generates code to serialize/deserialize data and RPC client/server code.
    • Originally developed by Facebook; now an Apache project
  12. Thrift elements
    • Primitive types: bool, byte, i16, i32, i64, double, binary, string
    • User-defined types: enum, struct, union
    • Container types: optional, list<T>, set<T>, map<K,V>
    • Unlike protobuf, a map key can be any type, including user-defined structs
    • But I don’t recommend it
    • The `service` keyword defines an RPC interface
    • Each struct field and RPC parameter has a unique id: 1, 2, 3, …
  13. Thrift generator
    • Thrift supports Java code generation, of course, and the generated code can be used from Scala
    • “But the generated code uses Java collections and mutable “bean” classes, causing some annoying boilerplate conversions to be hand-written.”
    • Twitter developed their own Thrift parser/generator, called Scrooge
    • https://twitter.github.io/scrooge/
  14. Scrooge
    • A Thrift parser and code generator that generates a Scala-friendly API
    • list, set, map become scala.collection.{Seq, Set, Map}
    • struct becomes an immutable case class
    • enum, union become sealed traits
    • Easy interface to send/receive RPC
    • sbt support
    • Example code is shown later!
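
To make the mapping above concrete, here is a hand-written approximation of the shape such an API takes. It is not actual Scrooge output (the real generated code carries extra machinery such as codecs and companion objects), and Campaign, CampaignStatus and Creative are hypothetical types:

```scala
object ScroogeShapeSketch {
  // enum -> sealed trait with case objects
  sealed trait CampaignStatus
  object CampaignStatus {
    case object Active extends CampaignStatus
    case object Paused extends CampaignStatus
  }

  // union -> sealed trait with one case class per alternative
  sealed trait Creative
  object Creative {
    final case class Video(url: String) extends Creative
    final case class Banner(imageUrl: String) extends Creative
  }

  // struct -> immutable, case-class-like value; containers are Scala
  // collections and optional fields are Option
  final case class Campaign(
    id: Long,
    name: String,
    status: CampaignStatus,
    creatives: Seq[Creative],
    dailyBudget: Option[Long] = None
  )

  // Updates return a new value; the original is untouched.
  def pause(c: Campaign): Campaign = c.copy(status = CampaignStatus.Paused)

  // Pattern matching picks out one union alternative.
  def videoUrls(c: Campaign): Seq[String] =
    c.creatives.collect { case Creative.Video(u) => u }
}
```

Compared with Java beans, there are no setters and no java.util collections to convert, which is the boilerplate the quote on the previous slide complains about.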
  16. How we use Thrift
    • Data serialization
    • Store in DB
    • Logging
    • RPC
    • Splitting many services by Thrift RPC
    • Dashboard’s JSON-API schema
  18. Store Thrift struct in DB
    • We use MySQL as an indexed KVS
    • All serialized data is stored in a `bytes` column; only index keys are defined as separate columns.
    • Joins and aggregate functions are computed in the application layer
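
The layout described above can be sketched with an in-memory table. Everything here is an assumption for illustration: the Row shape, the Campaign and Creative types, and the tab-separated codec, which merely stands in for Thrift binary serialization so the example runs without extra libraries.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object KvsSketch {
  // Hypothetical domain types standing in for Thrift structs.
  final case class Campaign(id: Long, name: String)
  final case class Creative(id: Long, campaignId: Long, url: String)

  // One table row: index columns plus an opaque serialized payload.
  final case class Row(id: Long, campaignId: Long, bytes: Array[Byte])

  // Stand-in codec; a real system would serialize the Thrift struct here.
  def encode(c: Creative): Row =
    Row(c.id, c.campaignId, s"${c.id}\t${c.campaignId}\t${c.url}".getBytes(UTF_8))

  def decode(r: Row): Creative = {
    val parts = new String(r.bytes, UTF_8).split("\t", 3)
    Creative(parts(0).toLong, parts(1).toLong, parts(2))
  }

  // Models "SELECT bytes FROM creative WHERE campaign_id = ?":
  // the database only filters on index columns, the payload stays opaque.
  def byCampaign(table: Seq[Row], campaignId: Long): Seq[Creative] =
    table.filter(_.campaignId == campaignId).map(decode)

  // Join and aggregate in the application layer: creatives per campaign name.
  def creativeCounts(campaigns: Seq[Campaign], table: Seq[Row]): Map[String, Int] =
    campaigns.map(c => c.name -> byCampaign(table, c.id).size).toMap
}
```

The trade-off is that the database can no longer inspect or update individual fields, which is the problem the following slides address.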
  19. How to manipulate our data
    • The MySQL schema doesn’t change
    • But the Thrift schema often changes drastically, which requires data migration
    • No DB migration tool or “UPDATE” statement is applicable to serialized Thrift data
    • If you want to update data, you need to create a new sbt project, write a program that accesses the DB, changes the data and saves it, create a jar file, deploy, …
    • Or you can do it in the sbt console in the production environment… but it’s so painful
  20. Write operation scripts in Scala casually
    • We write operation scripts in Scala, in a project called “ScalaScript”
    • Originally implemented with Twitter’s util-eval, but that is no longer needed
    • 1. Prepare a fat-jar file that includes all common libraries.
      2. Run the jar file with our script as the argument
      3. The script is dynamically loaded, compiled and executed with all classpaths enabled!
  21. ScalaScript example [2/2]
    • We can reuse our in-house Scala libraries, such as DB access, Thrift serialization, Thrift RPC, running BigQuery, uploading to SpreadSheet, posting to Slack, …
    • What’s more, it’s type safe!
    • Write a script and run it
  22. Daily data editing
    • Of course there is a “rich” dashboard for daily operation
    • But experimental features are often added to the schema, and the dashboard can’t keep up with them
    • We need a simple data viewer/editor
    • There is no “phpMyAdmin”
  23. Scrooge exports parser API
    • Users can easily get the ASTs of all defined structs
    • Without writing a parser by hand
    • We can traverse the AST and create an HTML form automatically
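
The idea of deriving an editor form from struct definitions can be sketched as below. The AST types (FieldType, Field, Struct) are a hypothetical miniature invented for this example and do not match Scrooge's real AST classes; the sketch also skips HTML escaping and nested structs.

```scala
object FormSketch {
  // Hypothetical mini-AST; Scrooge's real AST classes differ.
  sealed trait FieldType
  case object TBool extends FieldType
  case object TI64 extends FieldType
  case object TString extends FieldType
  final case class TEnum(values: Seq[String]) extends FieldType

  final case class Field(id: Int, name: String, tpe: FieldType)
  final case class Struct(name: String, fields: Seq[Field])

  // Choose an input widget from the field's type.
  def input(f: Field): String = f.tpe match {
    case TBool   => s"""<input type="checkbox" name="${f.name}">"""
    case TI64    => s"""<input type="number" name="${f.name}">"""
    case TString => s"""<input type="text" name="${f.name}">"""
    case TEnum(vs) =>
      vs.map(v => s"<option>$v</option>")
        .mkString(s"""<select name="${f.name}">""", "", "</select>")
  }

  // Traverse the struct's fields in id order and wrap them in a form.
  def form(s: Struct): String =
    s.fields.sortBy(_.id)
      .map(f => s"<label>${f.name} ${input(f)}</label>")
      .mkString(s"""<form data-struct="${s.name}">""", "", "</form>")
}
```

Because the form is derived from the schema, a newly added experimental field shows up in the editor without any dashboard work.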
  25. In-process Cache
    • Cold data, such as Campaign, Media, InternalUser, is cached in each server’s process.
    • To reduce Redis access
    • But if you update the master of such cold data, servers need to reload it
    • The dashboard copies MySQL data to Redis, then sends RPCs to each server to reload its cache.
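
One way such a reloadable cache could look is sketched below. This is an assumption, not the implementation from the talk: the load function is a hypothetical hook that would read from the shared store, and reload is what a "reload cache" RPC handler would call.

```scala
import java.util.concurrent.atomic.AtomicReference

object CacheSketch {
  // Hypothetical cold-data type.
  final case class Campaign(id: Long, name: String)

  // Holds an immutable snapshot. Readers never block and never observe
  // a half-loaded map, because the whole map is swapped atomically.
  final class InProcessCache(load: () => Map[Long, Campaign]) {
    private val snapshot = new AtomicReference[Map[Long, Campaign]](load())

    def get(id: Long): Option[Campaign] = snapshot.get().get(id)

    // Would be invoked by the reload RPC handler after the dashboard
    // has copied fresh data to the shared store.
    def reload(): Unit = snapshot.set(load())
  }
}
```

Until reload is called, lookups keep returning the old snapshot, so the dashboard has to notify every server after it updates the master data.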
  27. Scalable scorer
    • Scoring is the essence of ad serving
    • If it takes more time than expected, we will partition campaigns and calculate their scores on different nodes.
    • The paper “Your Server as a Function” shows an example of running a search query across different instances
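
The partitioning idea can be sketched as a scatter-gather over scorer nodes, each modelled as an asynchronous function. The scoring rule, the Campaign and Scored types, and the partitioning by id are all assumptions for illustration; scala.concurrent.Future.sequence plays the role that Future.collect plays for Twitter's Future.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ScorerSketch {
  final case class Campaign(id: Long, bid: Double)
  final case class Scored(campaignId: Long, score: Double)

  // A scorer node; in production this would be an RPC client.
  type Scorer = Seq[Campaign] => Future[Seq[Scored]]

  // Hypothetical scoring rule, for demonstration only.
  val localScorer: Scorer = cs => Future(cs.map(c => Scored(c.id, c.bid * 2)))

  // Scatter campaigns across nodes, gather the partial results,
  // and keep the highest-scoring campaign (if any).
  def best(campaigns: Seq[Campaign], nodes: Seq[Scorer]): Future[Option[Scored]] = {
    require(nodes.nonEmpty, "at least one scorer node is needed")
    val partitions = campaigns.groupBy(c => Math.floorMod(c.id, nodes.size.toLong).toInt)
    val partials = partitions.toSeq.map { case (i, part) => nodes(i)(part) }
    Future.sequence(partials).map { results =>
      results.flatten.reduceOption((a, b) => if (b.score > a.score) b else a)
    }
  }
}
```

Adding capacity then means adding entries to nodes; the caller of best does not change.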
  29. Communicate with external servers
    • Sometimes our servers need real-time communication with external servers.
    • Here “external” means another company whose servers are across the internet
    • If we don’t have any ad to show but a partner company has one, we want to deliver it
    • We want to separate the servers that receive high-traffic requests from the ones that send outgoing requests.
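
A minimal sketch of this fallback follows, under the assumption that the front-end server reaches the partner through a separate outgoing-request service. The Service alias, the AdRequest and Ad types, and the name partnerGateway are hypothetical, and scala.concurrent.Future stands in for Twitter's Future.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.control.NonFatal

object FallbackSketch {
  // Local alias for illustration only; not Finagle's actual class.
  type Service[Req, Rep] = Req => Future[Rep]

  final case class AdRequest(slotId: String)
  final case class Ad(source: String, creativeUrl: String)

  // Serve our own ad if there is one. Otherwise ask the gateway service,
  // which runs on separate servers and talks to the partner; if that
  // call fails, respond with no ad instead of failing the request.
  def serve(
    own: Service[AdRequest, Option[Ad]],
    partnerGateway: Service[AdRequest, Option[Ad]]
  ): Service[AdRequest, Option[Ad]] =
    req => own(req).flatMap {
      case Some(ad) => Future.successful(Some(ad))
      case None     => partnerGateway(req).recover { case NonFatal(_) => None }
    }
}
```

Keeping the partner call behind its own service means a slow or failing partner ties up the gateway servers rather than the servers handling high-traffic incoming requests.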
  31. JavaScript code
    • Actually it is not RPC, just HTTP request/response
    • But we can share the schema between client and server
  32. Where we don’t use Thrift
    • SDK-to-server communication
    • We provide iOS/Android SDKs to partner developers
    • Our SDK doesn’t depend on any other libraries, so that they can easily integrate it into their apps