Slide 1

Ad Serving System on Finagle and Thrift
Daisuke Matsumoto (@daimatz)
2017-02-26 (Sun)
ScalaMatsuri 2017 Unconference

Slide 2

About me
• Daisuke Matsumoto (@daimatz)
• Co-founder and VP of Engineering at FIVE Inc., the largest mobile video advertising platform in Japan
• My engineering roles include: front-end servers, back-end servers, dashboards, log analysis and reporting batches, operation tools, Android SDK, iOS SDK, etc.

Slide 3

Agenda
• Finagle
• Thrift
• Our server-side architecture and how we use Finagle/Thrift
• Storing serialized data in DB
• Reloading in-process cache
• Scoring
• Communicating with external services
• Sharing schema between the dashboard’s JavaScript and the server
• Conclusion

Slide 5

Finagle
• Twitter’s open-source RPC system
• Built on Scala
• Twitter was an early adopter of Scala
• Future, Try, Duration, …
Many Scala utilities (mostly related to concurrency) are inspired by Twitter’s common library.

Slide 6

“Your Server as a Function”
• By Marius Eriksen at Twitter, a lead developer of Finagle
• https://monkey.org/~marius/funsrv.pdf
• The paper describes how to use Twitter’s Future and Finagle’s Service

Slide 7

Futures [1/3]
• A great abstraction over callback-based programming
• Counterparts in other languages: JavaScript’s Promise, Java 8’s CompletableFuture, C++11’s std::future
• A value of type Future[A] is a placeholder for the result of an asynchronous operation
• Typically the operation issues some I/O that may fail
• What makes Futures different from the traditional callback style is that they are “composable”
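A minimal sketch of a Future as a placeholder for an I/O result that may fail. To run without extra dependencies it uses the standard library's `scala.concurrent.Future` rather than Twitter's `com.twitter.util.Future`; `fetchCampaignName` is a hypothetical stand-in for a DB or RPC call.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

// Hypothetical I/O call: succeeds for campaign 1, fails for anything else.
def fetchCampaignName(id: Int): Future[String] = Future {
  if (id == 1) "spring-sale" else throw new NoSuchElementException(s"campaign $id")
}

// Both calls return immediately; each Future is filled in later.
val found: Future[String] = fetchCampaignName(1)
val missing: Future[String] = fetchCampaignName(2)

// Blocking only for this demo: the outcome is a Try, i.e. Success or Failure.
val foundResult: Try[String] = Await.ready(found, 1.second).value.get
val missingResult: Try[String] = Await.ready(missing, 1.second).value.get

println(foundResult)             // Success(spring-sale)
println(missingResult.isFailure) // true
```

Real server code would compose the Future instead of blocking on it. Twitter's Future offers `Future.value` and `Future.exception` for already-completed values.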

Slide 8

Futures [2/3]
• f: Registering a callback function (e.g. with map or flatMap) transforms a Future[A] into another Future[B]
• g: In Scala, we can chain such transformations with a for-comprehension
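The slide's f and g code is not reproduced in this text, so here is an illustrative sketch of both styles (not the original slide code). It uses stdlib futures, and `fetchUserId` and `fetchAge` are hypothetical async lookups.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical async lookups standing in for DB/RPC calls.
def fetchUserId(token: String): Future[Long] = Future(token.length.toLong)
def fetchAge(userId: Long): Future[Int] = Future((userId * 10).toInt)

// f: the callback given to map turns a Future[Long] into a Future[String].
val label: Future[String] = fetchUserId("abc").map(id => s"user-$id")

// g: a for-comprehension desugars into flatMap/map, chaining dependent calls.
val nextAge: Future[Int] = for {
  id  <- fetchUserId("abcd")
  age <- fetchAge(id)
} yield age + 1

println(Await.result(label, 1.second))   // user-3
println(Await.result(nextAge, 1.second)) // 41
```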

Slide 9

Futures [3/3]
• Futures are composable
• There are many utilities to compose Futures:
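A few such utilities, sketched with the standard library's Future. Twitter's Future has its own set (for example `Future.collect`, its counterpart of `Future.sequence`). `score` is a hypothetical RPC stand-in.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical per-campaign score lookup standing in for an RPC.
def score(campaignId: Int): Future[Double] = Future(campaignId * 0.5)

// traverse: start one Future per element and gather all results in order;
// the combined Future fails if any element fails.
val all: Future[List[Double]] = Future.traverse(List(1, 2, 3))(score)

// zip: combine two independent Futures into a pair.
val pair: Future[(Double, Double)] = score(4).zip(score(6))

// recover: replace a failure with a fallback value.
val safe: Future[Double] =
  Future.failed[Double](new RuntimeException("timeout")).recover { case _ => 0.0 }

println(Await.result(all, 1.second)) // List(0.5, 1.0, 1.5)
```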

Slide 10

Finagle’s Service
• A Service is an asynchronous function: Req => Future[Rep]
• Service represents both servers and clients
• A server implements a Service; Finagle dispatches incoming requests to it
• A client uses a Service; Finagle dispatches requests to the remote service and handles the responses
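A sketch of the "server as a function" idea using simplified type aliases in the spirit of the paper, not Finagle's actual `Service` and `Filter` classes (which add lifecycle methods such as `close` and composition helpers such as `andThen`). `AdRequest`, `AdResponse`, `adService` and `validate` are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Simplified aliases: a service is an async function; a filter wraps a service.
type Service[Req, Rep] = Req => Future[Rep]
type Filter[Req, Rep] = (Req, Service[Req, Rep]) => Future[Rep]

// Hypothetical ad request/response types.
final case class AdRequest(slotId: String)
final case class AdResponse(adId: Option[String])

// Server side: the business logic is just a function.
val adService: Service[AdRequest, AdResponse] =
  req => Future.successful(AdResponse(Some(s"ad-for-${req.slotId}")))

// A filter that rejects requests without a slot id before they reach the service.
val validate: Filter[AdRequest, AdResponse] = (req, service) =>
  if (req.slotId.isEmpty) Future.failed(new IllegalArgumentException("empty slotId"))
  else service(req)

// Applying a filter to a service yields another service.
def andThen[Req, Rep](filter: Filter[Req, Rep], service: Service[Req, Rep]): Service[Req, Rep] =
  req => filter(req, service)

val guarded: Service[AdRequest, AdResponse] = andThen(validate, adService)
println(Await.result(guarded(AdRequest("top")), 1.second)) // AdResponse(Some(ad-for-top))
```

Because a client has the same type, a caller cannot tell whether `guarded` runs locally or behind the network, which is what lets Finagle treat servers and clients uniformly.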

Slide 11

Finagle’s Service

Slide 13

Thrift
• A data serialization format and RPC interface
• Users write data definitions and RPC interfaces in the Thrift IDL
• The Thrift compiler generates code to serialize/deserialize data and RPC clients/servers
• Originally developed by Facebook; now an Apache project

Slide 14

Thrift elements
• Primitive types: bool, byte, i16, i32, i64, double, binary, string
• User-defined types: enum, struct, union
• Container types: list, set, map
• Fields can be marked optional (or required)
• Unlike protobuf, map keys can be of any type, including user-defined structs (though I don’t recommend it)
• The `service` keyword defines an RPC interface
• Each struct field and RPC parameter has a unique id: 1, 2, 3, …

Slide 16

Thrift generator
• Thrift supports Java code generation, of course, and the generated code can be used from Scala
• “But the generated code uses Java collections and mutable ‘bean’ classes, causing some annoying boilerplate conversions to be hand-written.”
• Twitter developed their own Thrift parser/generator, called Scrooge
• https://twitter.github.io/scrooge/

Slide 17

Scrooge
• A Thrift parser and code generator that produces a Scala-friendly API
• list, set, map become scala.collection.{Seq, Set, Map}
• struct becomes an immutable case class
• enum and union become sealed traits
• Easy interface to send/receive RPCs
• sbt support
• Example code follows later!
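To make the mapping above concrete, here is a hand-written approximation of the kind of Scala types it yields for a small hypothetical schema (the IDL is shown as a comment). This is not actual Scrooge output, which also contains codecs, metadata and other generated helpers.

```scala
// Hypothetical Thrift schema:
//   enum Platform { IOS = 1, ANDROID = 2 }
//   struct Campaign { 1: required i64 id; 2: required string name;
//                     3: optional list<Platform> platforms }
//   union Creative { 1: string videoUrl; 2: string imageUrl }

// enum -> sealed trait with one case object per value
sealed trait Platform { def value: Int }
object Platform {
  case object Ios extends Platform { val value = 1 }
  case object Android extends Platform { val value = 2 }
}

// struct -> immutable case class; optional field -> Option
final case class Campaign(id: Long, name: String, platforms: Option[Seq[Platform]] = None)

// union -> sealed trait with one case class per alternative
sealed trait Creative
object Creative {
  final case class VideoUrl(videoUrl: String) extends Creative
  final case class ImageUrl(imageUrl: String) extends Creative
}

val c = Campaign(1L, "spring-sale", Some(Seq(Platform.Ios)))
// Updates produce new values via copy; the original stays unchanged.
val renamed = c.copy(name = "summer-sale")

// Sealed traits let the compiler check matches for exhaustiveness.
def describe(cr: Creative): String = cr match {
  case Creative.VideoUrl(u) => s"video:$u"
  case Creative.ImageUrl(u) => s"image:$u"
}
```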

Slide 20

How we use Thrift
• Data serialization
• Storing in DB
• Logging
• RPC
• Splitting the system into many services with Thrift RPC
• Dashboard’s JSON-API schema

Slide 23

Store Thrift structs in DB
• We use MySQL as an indexed KVS
• All serialized data is stored in a `bytes` column; only the index keys are defined as separate columns
• Joins and aggregate functions are computed in the application layer
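A sketch of what computing joins and aggregates in the application layer can look like. The table layout in the comment and the `Advertiser`/`Campaign` fields are hypothetical; the case classes stand for already-deserialized payloads.

```scala
// Hypothetical layout: index keys as columns, payload as opaque bytes.
//   CREATE TABLE campaigns (id BIGINT PRIMARY KEY, advertiser_id BIGINT,
//                           bytes BLOB, INDEX (advertiser_id));

// Decoded payloads (stand-ins for deserialized Thrift structs).
final case class Advertiser(id: Long, name: String)
final case class Campaign(id: Long, advertiserId: Long, budget: Long)

// "JOIN": fetch both sides by key, then match them in memory.
def joinByAdvertiser(campaigns: Seq[Campaign], advertisers: Seq[Advertiser]): Seq[(Campaign, Advertiser)] = {
  val byId: Map[Long, Advertiser] = advertisers.map(a => a.id -> a).toMap
  campaigns.flatMap(c => byId.get(c.advertiserId).map(a => (c, a)))
}

// "SUM(budget) GROUP BY advertiser_id": aggregate in application code.
def budgetPerAdvertiser(campaigns: Seq[Campaign]): Map[Long, Long] =
  campaigns.groupBy(_.advertiserId).map { case (adv, cs) => adv -> cs.map(_.budget).sum }

val advertisers = Seq(Advertiser(10L, "A"), Advertiser(20L, "B"))
val campaigns = Seq(Campaign(1L, 10L, 100L), Campaign(2L, 10L, 200L), Campaign(3L, 20L, 50L))
println(budgetPerAdvertiser(campaigns))
```

The trade-off is that MySQL only serves indexed lookups, while anything relational happens in Scala, where it is type-checked against the same schema.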

Slide 24

Data construction and serialization, save to DB
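A minimal end-to-end sketch of constructing a record, serializing it and saving it as a row. To stay dependency-free it substitutes a hand-written `DataOutputStream` codec for a real Thrift protocol (such as `TBinaryProtocol`) and an in-memory map for MySQL. `CampaignCodec`, `Row` and `save` are hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import scala.collection.mutable

final case class Campaign(id: Long, advertiserId: Long, name: String)

// Stand-in codec; in production this would be Thrift serialization.
object CampaignCodec {
  def encode(c: Campaign): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeLong(c.id); out.writeLong(c.advertiserId); out.writeUTF(c.name)
    out.flush()
    bytes.toByteArray
  }
  def decode(b: Array[Byte]): Campaign = {
    val in = new DataInputStream(new ByteArrayInputStream(b))
    Campaign(in.readLong(), in.readLong(), in.readUTF()) // same order as encode
  }
}

// Only the index keys are real columns; everything else is opaque bytes.
final case class Row(id: Long, advertiserId: Long, bytes: Array[Byte])

// In-memory stand-in for an upsert into the MySQL table.
val table = mutable.Map.empty[Long, Row]
def save(c: Campaign): Unit =
  table(c.id) = Row(c.id, c.advertiserId, CampaignCodec.encode(c))

save(Campaign(1L, 10L, "spring-sale"))
val loaded: Campaign = CampaignCodec.decode(table(1L).bytes)
println(loaded)
```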

Slide 25

How to manipulate our data
• The MySQL schema doesn’t change
• But the Thrift schema often changes drastically, which requires data migration
• There is no DB migration tool, nor any “UPDATE” statement, applicable to serialized Thrift data
• To update data, you need to create a new sbt project, write a program that reads the data from DB, changes it and saves it, build a jar file, deploy it, …
• Or you can do it in the sbt console in the production environment… but it’s so painful
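Without an `UPDATE` for serialized blobs, a migration becomes a read, decode, transform, re-encode, write loop. The sketch shows that shape with hypothetical `CampaignV1`/`CampaignV2` structs and toy text codecs standing in for Thrift serialization; reading from and writing back to the DB is left out.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Old and new shapes of a hypothetical struct whose schema changed.
final case class CampaignV1(id: Long, budgetYen: Long)
final case class CampaignV2(id: Long, budget: Long, currency: String)

// Toy text codecs standing in for Thrift (de)serialization.
def decodeV1(b: Array[Byte]): CampaignV1 = {
  val parts = new String(b, UTF_8).split(",")
  CampaignV1(parts(0).toLong, parts(1).toLong)
}
def encodeV2(c: CampaignV2): Array[Byte] =
  s"${c.id},${c.budget},${c.currency}".getBytes(UTF_8)

// The "UPDATE statement" for blob rows: decode, transform, re-encode each value.
def migrate[A, B](rows: Map[Long, Array[Byte]])(
    decode: Array[Byte] => A, transform: A => B, encode: B => Array[Byte]): Map[Long, Array[Byte]] =
  rows.map { case (key, bytes) => key -> encode(transform(decode(bytes))) }

val before = Map(1L -> "1,5000".getBytes(UTF_8))
val after = migrate[CampaignV1, CampaignV2](before)(
  decodeV1, c => CampaignV2(c.id, c.budgetYen, "JPY"), encodeV2)
println(new String(after(1L), UTF_8)) // 1,5000,JPY
```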

Slide 26

Write operation scripts in Scala casually
• We write operation scripts in Scala, in a project called “ScalaScript”
• It was originally implemented with Twitter’s util-eval, but that is no longer needed
• How it works:
1. Prepare a fat-jar file that includes all common libraries
2. Run the jar file with our script as its argument
3. The script is dynamically loaded, compiled and executed with the whole classpath available!

Slide 27

ScalaScript example [1/2]
• Prepare a fat-jar that contains all common libraries

Slide 28

ScalaScript example [2/2]
• We can reuse our in-house Scala libraries, such as DB access, Thrift serialization, Thrift RPC, running BigQuery, uploading to a spreadsheet, posting to Slack, …
• What’s more, it’s type safe!
• Write a script and run it

Slide 29

Daily data editing
• Of course there is a “rich” dashboard for daily operations
• But experimental features are often added to the schema, and the dashboard can’t keep up with them
• We need a simple data viewer/editor
• There is no “phpMyAdmin” for serialized Thrift data

Slide 30

Scrooge exports a parser API
• Users can easily get the ASTs of all defined structs
• No need to write a parser by hand
• We can traverse the AST and generate HTML forms automatically
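A sketch of generating a form by walking a struct definition. It uses a hypothetical, simplified AST (`StructDef`, `Field`, `FieldType`) rather than Scrooge's actual AST classes, whose names and shapes differ.

```scala
// Hypothetical, simplified AST; Scrooge's real AST classes differ.
sealed trait FieldType
case object BoolType extends FieldType
case object I64Type extends FieldType
case object StringType extends FieldType
final case class EnumType(name: String, values: Seq[String]) extends FieldType

final case class Field(id: Int, name: String, tpe: FieldType, optional: Boolean)
final case class StructDef(name: String, fields: Seq[Field])

// One form control per field type.
def input(f: Field): String = f.tpe match {
  case BoolType   => s"""<input type="checkbox" name="${f.name}">"""
  case I64Type    => s"""<input type="number" name="${f.name}">"""
  case StringType => s"""<input type="text" name="${f.name}">"""
  case EnumType(_, vs) =>
    vs.map(v => s"<option>$v</option>").mkString(s"""<select name="${f.name}">""", "", "</select>")
}

// Walk the fields in id order; mark required fields with "*".
def form(s: StructDef): String =
  s.fields.sortBy(_.id).map { f =>
    val req = if (f.optional) "" else " *"
    s"<label>${f.name}$req ${input(f)}</label>"
  }.mkString(s"""<form data-struct="${s.name}">""", "\n", "</form>")

val campaignDef = StructDef("Campaign", Seq(
  Field(2, "name", StringType, optional = false),
  Field(1, "id", I64Type, optional = false),
  Field(3, "status", EnumType("Status", Seq("ACTIVE", "PAUSED")), optional = true)))
println(form(campaignDef))
```

Because the form is derived from the schema itself, a newly added experimental field shows up in the editor without touching the dashboard code.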

Slide 31

Auto-generated HTML form

Slide 34

In-process Cache
• Cold data, such as Campaign, Media and InternalUser, is cached in each server’s process
• This reduces Redis access
• But when the master copy of such cold data is updated, servers need to reload it
• The dashboard copies MySQL data to Redis, then sends RPCs to each server to reload its cache
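One way to make such a cache reloadable is to keep an immutable snapshot in an `AtomicReference` and swap it wholesale on reload, so request threads never observe a half-updated map. This is a sketch: `load` stands in for reading from Redis, and the reload RPC handler that would call `reload()` is not shown.

```scala
import java.util.concurrent.atomic.AtomicReference

final case class Campaign(id: Long, name: String)

// Holds an immutable snapshot; readers always see a complete map.
final class InProcessCache[K, V](load: () => Map[K, V]) {
  private val snapshot = new AtomicReference[Map[K, V]](load())

  def get(key: K): Option[V] = snapshot.get().get(key)

  // Called by the (hypothetical) reload RPC handler: rebuild, then swap atomically.
  def reload(): Unit = snapshot.set(load())
}

// Stand-in for the data the dashboard copied into Redis.
var store = Map(1L -> Campaign(1L, "spring-sale"))
val campaigns = new InProcessCache[Long, Campaign](() => store)
println(campaigns.get(1L))
```

Until `reload()` is called, readers keep getting the old snapshot, which is acceptable for cold data that changes only through the dashboard.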

Slide 37

Scalable scorer
• Scoring is the essence of ad serving
• If scoring takes longer than expected, we will partition campaigns so that their scores are calculated on different nodes
• The paper “Your Server as a Function” shows an example of a search query spread over different instances
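A sketch of static partitioning, assuming a hypothetical scheme in which campaigns are assigned to scorer shards by `id % numShards` and each shard returns only its local top-k. The slides do not say how campaigns are actually partitioned.

```scala
final case class Campaign(id: Long, bid: Double)

// Hypothetical static partitioning (assumes non-negative ids).
def shardOf(c: Campaign, numShards: Int): Int = (c.id % numShards).toInt

def partition(campaigns: Seq[Campaign], numShards: Int): Map[Int, Seq[Campaign]] =
  campaigns.groupBy(shardOf(_, numShards))

// Each scorer instance scores only its own partition and keeps its local top-k.
def localTopK(shard: Seq[Campaign], k: Int, score: Campaign => Double): Seq[(Campaign, Double)] =
  shard.map(c => c -> score(c)).sortBy(-_._2).take(k)

val cs = (1 to 6).map(i => Campaign(i.toLong, i.toDouble))
val shards = partition(cs, 3)
println(localTopK(shards(1), 1, _.bid))
```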

Slide 39

RPC server on scorer instances

Slide 40

RPC client on ad frontend instances
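On the frontend side, the client can scatter one request to every scorer shard and gather the partial results, in the spirit of the paper's search example. In this sketch each shard is modeled as a function `ScoreRequest => Future[Seq[Scored]]` (in production it would be a Finagle Thrift client), and stdlib `Future.traverse` plays the role that `Future.collect` plays with Twitter futures. All names are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical request/response types; in production these would be Thrift structs.
final case class ScoreRequest(slotId: String, k: Int)
final case class Scored(campaignId: Long, score: Double)

type Service[Req, Rep] = Req => Future[Rep]

// Scatter to all shards concurrently, gather the partial lists, keep the global top-k.
def scatterGather(shards: Seq[Service[ScoreRequest, Seq[Scored]]])(req: ScoreRequest): Future[Seq[Scored]] =
  Future.traverse(shards)(shard => shard(req)).map(_.flatten.sortBy(-_.score).take(req.k))

// Fake shards returning fixed local top lists.
def fakeShard(results: Scored*): Service[ScoreRequest, Seq[Scored]] =
  _ => Future.successful(results)

val shards = Seq(
  fakeShard(Scored(1, 0.9), Scored(4, 0.2)),
  fakeShard(Scored(2, 0.7)),
  fakeShard(Scored(3, 0.8)))
val top = Await.result(scatterGather(shards)(ScoreRequest("slot", 2)), 1.second)
println(top.map(_.campaignId)) // List(1, 3)
```

Because every shard returns its own top-k, the global top-k is always contained in the union of the partial results, so the merge step stays cheap.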

Slide 43

Communicate with external servers
• Sometimes our servers need real-time communication with external servers
• Here, “external” means another company’s servers across the internet
• If we don’t have any ad to show but a partner company does, we want to deliver theirs
• We want to separate the servers that receive high-traffic requests from the ones that send outgoing requests
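A sketch of the fallback described above: serve our own ad if we have one, otherwise ask a partner, but bound the wait so a slow partner cannot stall our responses. `within` is a small hand-written timeout helper over stdlib futures (Twitter's Future provides `within` for this). `Ad`, `own` and `partner` are hypothetical.

```scala
import java.util.concurrent.{Executors, TimeUnit, TimeoutException}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

final case class Ad(id: String)

// Daemon timer thread so it never keeps the JVM alive.
val timer = Executors.newSingleThreadScheduledExecutor { (r: Runnable) =>
  val t = new Thread(r, "timeout-timer"); t.setDaemon(true); t
}

// Fail the result if `f` has not completed within `d`; whichever happens first wins.
def within[A](f: Future[A], d: FiniteDuration): Future[A] = {
  val p = Promise[A]()
  timer.schedule(new Runnable {
    def run(): Unit = p.tryFailure(new TimeoutException(s"after $d"))
  }, d.toMillis, TimeUnit.MILLISECONDS)
  f.onComplete(p.tryComplete)
  p.future
}

// Own ad first; otherwise a time-bounded call to the partner, treating errors as "no ad".
def serve(own: () => Future[Option[Ad]], partner: () => Future[Option[Ad]],
          budget: FiniteDuration): Future[Option[Ad]] =
  own().flatMap {
    case some @ Some(_) => Future.successful(some)
    case None           => within(partner(), budget).recover { case _ => None }
  }
```

Running such outgoing calls on separate servers, as the slide suggests, also keeps partner latency from consuming threads and connections on the high-traffic frontends.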

Slide 46

Dashboard’s JSON-API
• The dashboard’s JSON requests/responses are also defined in the Thrift schema

Slide 47

JavaScript code
• Strictly speaking it is not RPC, just HTTP requests/responses
• But the client and server can share the schema

Slide 48

Where we don’t use Thrift
• SDK-to-server communication
• We provide iOS/Android SDKs to partner developers
• Our SDKs don’t depend on any other libraries, so that developers can easily integrate them into their apps

Slide 49

Conclusion
• We introduced how we use Finagle and its peripheral tools