


Presented at RootConf 2015.
Video at


Abhishek Kona

May 14, 2015


  1. 5/26/2015 Rewriting 1/30 Rewriting

    RootConf, Bangalore, 15 May 2015
  2. Who is this Guy?

    Abhishek Kona. Software Engineer at Parse, Facebook. Ex-Flipkart. @sheki
  3. What is the talk about?

    What are the (scaling) problems we had at Parse? How did we solve them? Did we learn anything?
  4. What is Parse?

    Developer platform to build mobile apps. Backend-as-a-Service: build an app, not a backend. Works for iOS, Android, JS, React, React-Native, Windows, PHP ... Acquired by Facebook in 2013.
  5. Parse - Circa 2013.

    ~60K apps. 10 engineers. Ruby on Rails app (like every company out of YCombinator).
  6. Parse - Right now.

    500K apps built on Parse. 100% year-on-year traffic growth. Primarily a Go stack.
  7. Parse - Issues (2013)

    Uptime ~90%. A single popular app could take down Parse. Unmanageable codebase.
  8. Listing down our Problems - "Beast List"

    Create a checklist of all issues preventing us from having an uptime of 99.9+%. Came up with software / tools we could build. Some concrete issues: Unicorn, our Ruby HTTP server, was a resource hog. Large deploy times.
  9. We decided to Rewrite in Go

  10. Why Rewrite?

    Could not understand the Ruby codebase. Estimated performance win: huge. New codebase will be statically typed.
  11. Why Go?

    Statically typed programming language with good concurrency support. Outperforms Ruby in both build and execution time. Our second choice was C#.
  12. Status of the Rewrite

    Took 3-4 engineers 1.5 years to complete. It works.
  13. How did the rewrite help?

    Got rid of Unicorn. We can add capacity quickly; deploy speeds went up. Readable codebase (for now).
  14. What did we learn?

    No silver bullet. Mostly about managing the pain.
  15. Monoliths are all right.

  16. Monoliths

    Micro-services are all the rage, but it is quicker to build/test a single binary. We built mostly as a monolith, inspired by . Micro-services work if there are multiple teams managing different services.
  17. Proxies - you probably need them.

    Connections consume precious memory on the DB. Proxies help manage connections effectively across app servers. Side effect: you can monitor your database performance from a central place. We wrote our own proxy for Mongo in Go.
  18. Metrics

    When in doubt, measure everything. We started with Ganglia -> UI froze after 100 metrics. We use Facebook's Scuba and ODS. Find a metrics service; hopefully you don't have to write one. scuba/10150599692628920
  19. Shadow Live Traffic

    Real bugs show under live traffic. We had a mechanism to run live traffic on a test cluster. Tools to compare results from the test and prod clusters were invaluable. We shadowed traffic for months for some endpoints before we released to 100% of users. Our setup -> a custom Go HTTP proxy that sends requests to both the test and prod clusters. Works great for read APIs. Write APIs need a complicated setup with a database snapshot and a DB compare.
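A minimal sketch of the shadowing idea, not the proxy described in the talk: the `shadow` type, `forward`, `demo`, and the fake clusters are all hypothetical names for illustration. It answers every request from prod, replays GETs against a test cluster in the background, and records URIs whose response bodies differ. Writes are deliberately not shadowed here.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// shadow is a hypothetical proxy: it answers from prod and replays GETs
// against a test cluster, reporting any response-body mismatch.
type shadow struct {
	prod, test string           // base URLs of the two clusters
	onDiff     func(uri string) // called when the bodies differ
	wg         sync.WaitGroup   // lets callers wait for in-flight shadow requests
}

// forward replays a request against base and returns the status and body.
func forward(base, method, uri string, body []byte) (int, []byte, error) {
	req, err := http.NewRequest(method, base+uri, bytes.NewReader(body))
	if err != nil {
		return 0, nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return resp.StatusCode, b, err
}

func (s *shadow) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	body, _ := io.ReadAll(r.Body)
	method, uri := r.Method, r.URL.RequestURI()
	status, prodBody, err := forward(s.prod, method, uri, body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	if method == http.MethodGet { // only reads are shadowed in this sketch
		s.wg.Add(1)
		go func() { // off the request path: users never wait on the test cluster
			defer s.wg.Done()
			_, testBody, err := forward(s.test, method, uri, nil)
			if err == nil && !bytes.Equal(prodBody, testBody) {
				s.onDiff(uri)
			}
		}()
	}
	w.WriteHeader(status)
	w.Write(prodBody)
}

// demo runs two fake clusters that disagree on /b and returns the mismatching URIs.
func demo() []string {
	prod := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "v1 "+r.URL.Path)
	}))
	defer prod.Close()
	test := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/b" {
			fmt.Fprint(w, "v2 /b") // simulated regression in the new code
			return
		}
		fmt.Fprint(w, "v1 "+r.URL.Path)
	}))
	defer test.Close()

	var mu sync.Mutex
	var diffs []string
	s := &shadow{prod: prod.URL, test: test.URL}
	s.onDiff = func(uri string) { mu.Lock(); diffs = append(diffs, uri); mu.Unlock() }
	front := httptest.NewServer(s)
	defer front.Close()

	for _, p := range []string{"/a", "/b"} {
		if resp, err := http.Get(front.URL + p); err == nil {
			resp.Body.Close()
		}
	}
	s.wg.Wait() // make sure every shadow comparison has finished
	return diffs
}

func main() { fmt.Println("mismatches:", demo()) }
```

The shadow request is started before the prod response is written, so the client can never observe a response whose comparison has not yet been registered. A production setup would likely also sample traffic and bound the number of in-flight shadow requests.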
  20. Throttles

    First line of defense: the capability to block any backend or client. Our throttling was simple Memcache-based counters. Currently evolving into auto-throttling.
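Counter-based throttling can be sketched as a fixed-window limit per app. This is an illustration under assumptions, not Parse's implementation: `counterStore` is an in-memory stand-in for the atomic increment a cache such as Memcache provides, and `throttle`, `newThrottle`, and `allow` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// counterStore mimics the single cache operation the throttle needs:
// an atomic increment on a key. It is an in-memory stand-in for Memcache.
type counterStore struct {
	mu     sync.Mutex
	counts map[string]int
}

func (c *counterStore) incr(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[key]++
	return c.counts[key]
}

// throttle allows at most limit requests per app in each window.
type throttle struct {
	store      *counterStore
	limit      int
	windowSecs int64
}

func newThrottle(limit int, windowSecs int64) *throttle {
	return &throttle{
		store:      &counterStore{counts: map[string]int{}},
		limit:      limit,
		windowSecs: windowSecs,
	}
}

// allow buckets time into fixed windows so each window gets a fresh key.
// With a real cache the old keys would carry a TTL and simply expire.
func (t *throttle) allow(appID string, nowUnix int64) bool {
	key := fmt.Sprintf("%s:%d", appID, nowUnix/t.windowSecs)
	return t.store.incr(key) <= t.limit
}

func main() {
	th := newThrottle(2, 60) // 2 requests per app per 60-second window
	for i := 0; i < 3; i++ {
		fmt.Println("app1 at t=0:", th.allow("app1", 0))
	}
	fmt.Println("app1 at t=60:", th.allow("app1", 60))
}
```

Passing the timestamp in explicitly keeps the limiter deterministic in tests. Fixed windows permit short bursts at window boundaries, which is usually acceptable for a first line of defense.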
  21. Gatekeeper / Decider

    Feature flags / production hooks to control the rollout of new code to a fraction of users/traffic. Good way to get confidence. Our in-house Go system is called Decider -> based on Redis. Important to clean up old code after the rollout to avoid code smells.
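One common way to roll a feature out to a fraction of users is to hash the user into a stable bucket and compare it with a percentage. The sketch below assumes that approach; it does not show how Decider itself works. The `decider` type and its methods are hypothetical, and the map is an in-memory stand-in for a Redis-backed store.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// decider holds a rollout percentage per feature. The map stands in for
// a shared store such as Redis.
type decider struct {
	mu      sync.RWMutex
	percent map[string]uint32 // feature -> share of users enabled, 0 to 100
}

func newDecider() *decider { return &decider{percent: map[string]uint32{}} }

func (d *decider) set(feature string, pct uint32) {
	d.mu.Lock()
	d.percent[feature] = pct
	d.mu.Unlock()
}

// enabled hashes feature+user into a bucket in [0,100) and compares it with
// the rollout percentage. A given user gets a stable answer and stays
// enabled as the percentage grows.
func (d *decider) enabled(feature, userID string) bool {
	d.mu.RLock()
	pct := d.percent[feature]
	d.mu.RUnlock()
	h := fnv.New32a()
	h.Write([]byte(feature + ":" + userID))
	return h.Sum32()%100 < pct
}

func main() {
	d := newDecider()
	d.set("new_query_path", 25)
	on := 0
	for i := 0; i < 1000; i++ {
		if d.enabled("new_query_path", fmt.Sprintf("user-%d", i)) {
			on++
		}
	}
	fmt.Printf("enabled for %d of 1000 users\n", on)
}
```

Including the feature name in the hash input means different features pick different subsets of users, so the same users are not always the first to see new code.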
  22. Deploys

    Our philosophy: every engineer should deploy when needed. We moved away from a fixed Monday release to releasing all the time. Deploy many small changes as often as possible. Deployctl: in-house tool written in Python to deploy Go (Zookeeper based). Deploy locking and canarying.
  23. Cockpit

    Admin HTTP service on every binary for debugging. Exposes health checks / Git version / build time / uptime. Can connect pprof over it (thanks, Go). Can activate verbose logging on a particular server, which logs every request/response pair.
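An admin endpoint of this kind can be sketched with the standard library alone. The paths, the `adminMux` and `probe` names, and the JSON field names below are assumptions for illustration, not Cockpit's actual interface; the pprof handlers are the ones exported by `net/http/pprof`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
	"time"
)

// In a real binary these would be set at build time, for example with
// -ldflags "-X main.gitVersion=... -X main.buildTime=...".
var (
	gitVersion = "unknown"
	buildTime  = "unknown"
	startTime  = time.Now()
)

// adminMux returns a hypothetical admin handler exposing a health check,
// build information, and the standard pprof endpoints.
func adminMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	})
	mux.HandleFunc("/info", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(map[string]string{
			"git_version": gitVersion,
			"build_time":  buildTime,
			"uptime":      time.Since(startTime).String(),
		})
	})
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	return mux
}

// probe issues an in-process GET against the admin mux and returns the
// status code and body, which is handy for a quick self-check.
func probe(path string) (int, string) {
	rec := httptest.NewRecorder()
	adminMux().ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probe("/health")
	fmt.Println(code, body)
	// In a service this would run on its own port, bound to a private interface:
	//   go http.ListenAndServe("localhost:6060", adminMux())
}
```

Serving the admin mux on a separate listener keeps debugging endpoints off the public port.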
  24. Context

    Pass a global context object throughout our codebase. Context can be used to tag along ReqID, AppID. We use context to pass in a ReqID that is added to the query comment on Mongo; this lets us trace a slow query in the log back to the request that issued it. Support context objects when writing a new library. Go has a great context package.
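A minimal sketch of carrying a request ID through a call chain. It uses the standard library `context` package, which postdates the talk; `withReqID`, `reqIDFrom`, `findUsers`, and `handle` are hypothetical names, and the comment string format is an assumption. No database driver is involved; the code only builds the comment that would be attached to a query.

```go
package main

import (
	"context"
	"fmt"
)

// reqIDKey is an unexported key type, which avoids collisions with
// context values set by other packages.
type reqIDKey struct{}

func withReqID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, reqIDKey{}, id)
}

func reqIDFrom(ctx context.Context) string {
	id, _ := ctx.Value(reqIDKey{}).(string) // empty string if no ID was set
	return id
}

// findUsers stands in for a data-access function. It returns the comment
// that would be attached to the database query.
func findUsers(ctx context.Context) string {
	return fmt.Sprintf("op=findUsers reqID=%s", reqIDFrom(ctx))
}

// handle plays the role of an HTTP handler: it tags the context once at
// the edge and passes it down unchanged.
func handle(id string) string {
	ctx := withReqID(context.Background(), id)
	return findUsers(ctx)
}

func main() {
	fmt.Println(handle("req-42")) // prints "op=findUsers reqID=req-42"
}
```

Because the ID travels in the context rather than in function-specific parameters, any library that accepts a context can log or forward it without knowing about the application's request type.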
  25. Own your Database

    Sooner or later the DB will be the bottleneck. Understand the internals of your database from the start: query planner, DB caches (row/block cache), indexing trade-offs, major locks. Start hacking on the DB codebase; you can add custom metrics, and it is usually easier than it seems. A team at Facebook built a new storage engine for Mongo: Mongo-Rocks. (rocksdb-parse/)
  26. About our Codebase

    Dependency injection, only at boot time. Lots of small libraries. Try not to fork: we submit patches upstream.
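Boot-time dependency injection can be sketched with plain constructors: every dependency is built once in `main` and handed to the components that need it, with no lookups at request time. This is hand-wired for illustration and is not the injection library the slide links to; all type and function names are hypothetical.

```go
package main

import "fmt"

// appStore and logger are the dependencies a service needs. Depending on
// interfaces lets tests swap in fakes.
type appStore interface {
	appName(appID string) (string, bool)
}

type logger interface {
	logf(format string, args ...interface{})
}

// mapStore is a trivial in-memory appStore.
type mapStore map[string]string

func (m mapStore) appName(id string) (string, bool) { n, ok := m[id]; return n, ok }

// memLogger keeps log lines in memory so they can be inspected.
type memLogger struct{ lines []string }

func (l *memLogger) logf(format string, args ...interface{}) {
	l.lines = append(l.lines, fmt.Sprintf(format, args...))
}

// appService receives its dependencies through the constructor and never
// reaches for globals.
type appService struct {
	store appStore
	log   logger
}

func newAppService(s appStore, l logger) *appService {
	return &appService{store: s, log: l}
}

func (a *appService) describe(appID string) string {
	name, ok := a.store.appName(appID)
	if !ok {
		a.log.logf("unknown app %s", appID)
		return "unknown"
	}
	return name
}

func main() {
	// All wiring happens here, once, at boot.
	lg := &memLogger{}
	svc := newAppService(mapStore{"app1": "Todo"}, lg)
	fmt.Println(svc.describe("app1"), svc.describe("nope"), lg.lines)
}
```

Keeping the wiring in one place makes the object graph easy to read and fails fast at startup if a dependency is missing.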
  27. Tests

    Integration tests > unit tests. Our go test suite takes less than 2 minutes to run. Parallel test runs are beautiful. We boot multiple mongo/memcache instances in memory in our test binary.
  28. Closing Thoughts

    A rewrite is not the worst idea. Go is great. Use it for your next app.
  29. Thank you
