Building a high throughput REST API with Scala

Slides of my talk at the Scala DC meetup held on Jan 15th 2014.

Bhaskar V. Karambelkar

February 05, 2014

Transcript

  1. Building a high throughput REST API with Scala + Play + Akka
    Bhaskar V. Karambelkar
    https://www.linkedin.com/in/bhaskarvk
    https://twitter.com/bhaskar_vk
    Scala DC-MD-NOVA meetup Jan-15-2014
  2. Status quo
    • APIs used to be built with various protocols such as JDBC (Stored Procs), JMS, SOAP/HTTP, XML-RPC, file transfer.
    • Issues
      – No uniformity.
      – Not firewall friendly.
      – Programming language dependency (JMS).
      – Not easy to test / document.
      – Not easy to scale, load-balance, fail-over.
  3. Why Scala + Play + Akka
    • Needed an API that could successfully tackle the 4 Vs of Big Data, viz. Volume, Velocity, Variety, Veracity.
    • Needed the API to be horizontally as well as vertically scalable.
    • Needed an “event driven” architecture / programming model.
    • Needed easy “HA”, “fail-over”, “concurrency”, “load balancing” constructs.
  4. Stack
    • Scala 2.10.3, Play 2.2.1, Akka 2.2.3.
    • Eclipse + ScalaIDE (4.0.0 M1).
    • MongoDB as a config data store + queue.
    • metrics-scala library for metrics.
    • Webjars library to manage JavaScript/CSS dependencies.
    • sbt for building, Jenkins for CI.
  5. Architecture Cont.
    • Apache reverse proxy (HA, load balancing, fail-over, TLS termination).
    • The API farm receives JSON POSTs, parses the JSON, normalizes it to Scala objects, and uploads them to MongoDB, which acts as a queue.
    • The same API farm de-queues from MongoDB and sends the data to the next hop in the pipeline.
    • A basic admin console written in AngularJS.
    • Eventual destinations: HDFS & Elasticsearch.
  6. Performance in Production on first run
    • Slow JSON parsing, frequent OOMs, or even worse JVM hangs (kill -9).
    • No transactions in MongoDB, so data loss in case of crash/hang.
    • Not scalable beyond a certain load.
    • CPUs pegged at 60 to 70% utilization, non-uniform core usage.
    • Heap usage high.
    • I/O bottlenecks.
    • Heavy en-queuing slowed down de-queuing, so queues fill up over time.
  7. Architecture 2.0 Cont.
    • Dedicated pipelines for clients.
    • Separate heavy traffic from light traffic.
    • Separate enqueue and de-queue into dedicated API server instances.
    • Compression all the way, even in Mongo.
    • Incremental JSON parsing.
    • Avoid unnecessary JSON->Object->BSON->Object->Stream conversions.
    • Changed logic so as to not lose data even in the event of an instance crash/hang.
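
    The "compression all the way" point can be sketched as a gzip round trip for a JSON payload. This is only an illustration: the deck does not say which codec was used, so gzip (via `java.util.zip`) and the names `PayloadCompression`, `compress` and `decompress` are assumptions.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

// Hypothetical helper; gzip is an assumed codec, not one named in the talk.
object PayloadCompression {

  /** Gzip a JSON string into bytes that could be stored or forwarded as-is. */
  def compress(json: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val gzip = new GZIPOutputStream(buffer)
    try gzip.write(json.getBytes(StandardCharsets.UTF_8))
    finally gzip.close() // close() writes the gzip trailer into the buffer
    buffer.toByteArray
  }

  /** Inflate gzip bytes back into the original JSON string. */
  def decompress(bytes: Array[Byte]): String = {
    val gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))
    try {
      val out = new ByteArrayOutputStream()
      val chunk = new Array[Byte](8192)
      var read = gzip.read(chunk)
      while (read != -1) {
        out.write(chunk, 0, read)
        read = gzip.read(chunk)
      }
      new String(out.toByteArray, StandardCharsets.UTF_8)
    } finally gzip.close()
  }
}
```

    Keeping the payload as opaque compressed bytes between hops is one way to skip intermediate object conversions; whether the production system did exactly this is not stated in the slides.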
  8. Results
    • Platform stable.
    • CPU usage steady @ 30 to 40%, with uniform distribution across cores.
    • Memory consumption under control, no more OOM / hanging.
    • Increased throughput and scalability.
    • Very easy to increase scaling, create more data paths.
  9. Buzzwords/Recommendations
    • Scala
      – Immutability everywhere: use case classes / immutable collections.
      – Monadic patterns everywhere (Collections, Try, Option).
    • Akka
      – Prefer ! (tell) over ? (ask).
      – Tune dispatcher parameters, don’t rely on the default dispatcher.
      – Give the Scheduler its own dispatcher.
      – Routers with their own dispatcher for load-balancing actors writing to destinations.
      – CircuitBreaker to prevent cascading failures.
      – Throttler Actor for throttling when required.
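
    The Scala recommendations above (case classes, immutable collections, `Try`, `Option`) can be illustrated with a minimal sketch. The `Event` type and the `source|severity|message` line format are invented for this example and do not come from the talk.

```scala
import scala.util.{Success, Try}

object EventParsing {

  // Immutable value object; the type and its fields are hypothetical.
  final case class Event(source: String, severity: Int, message: Option[String])

  /** Parse "source|severity|message" without throwing; errors are captured in the Try. */
  def parse(line: String): Try[Event] = {
    val fields = line.split("\\|", -1) // -1 keeps a trailing empty message field
    for {
      _        <- Try(require(fields.length == 3, s"expected 3 fields, got ${fields.length}"))
      severity <- Try(fields(1).trim.toInt)
    } yield Event(fields(0).trim, severity, Some(fields(2).trim).filter(_.nonEmpty))
  }

  /** Keep the events that parsed and count the lines that did not. */
  def parseAll(lines: Seq[String]): (Seq[Event], Int) = {
    val results = lines.map(parse)
    val events = results.collect { case Success(event) => event }
    (events, results.count(_.isFailure))
  }
}
```

    Because `Event` is a case class, a changed copy is made with `event.copy(severity = 5)` and the original stays untouched, which makes it safe to hand the same value to several actors.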
  10. Buzzwords/Recommendations
    • Play
      – Prefer non-blocking/async calls whenever possible.
      – Use webjars for managing JavaScript/CSS dependencies.
      – For huge JSONs use an incremental JSON parser + Play’s Iteratee framework.
    • JVM
      – Use Java 7.
      – Profile and tune GC and memory params.
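
    A rough sketch of the "prefer non-blocking/async calls" advice, using only `scala.concurrent.Future` from the standard library rather than Play itself. The functions `lookupClient` and `enqueue`, the API key and the response strings are hypothetical stand-ins for real I/O calls; in a Play controller the composed `Future` would typically be returned from an asynchronous action instead of being awaited.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.control.NonFatal

object AsyncHandling {

  // Hypothetical stand-in for a config lookup (e.g. a database read).
  def lookupClient(apiKey: String)(implicit ec: ExecutionContext): Future[Option[String]] =
    Future(if (apiKey == "k-123") Some("client-a") else None)

  // Hypothetical stand-in for writing the payload to a queue; returns the payload size in chars.
  def enqueue(client: String, payload: String)(implicit ec: ExecutionContext): Future[Int] =
    Future(payload.length)

  /** Compose the two calls with flatMap/map; no thread waits on a result here. */
  def handle(apiKey: String, payload: String)(implicit ec: ExecutionContext): Future[String] =
    lookupClient(apiKey).flatMap {
      case Some(client) => enqueue(client, payload).map(n => s"202 accepted $n chars for $client")
      case None         => Future.successful("401 unknown api key")
    }.recover { case NonFatal(e) => s"500 ${e.getMessage}" }

  /** For tests or a main method only: block at the outermost edge to read the result. */
  def blockingResult(apiKey: String, payload: String): String =
    Await.result(handle(apiKey, payload)(ExecutionContext.global), 5.seconds)
}
```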
  11. Some Numbers
    • Current Load
      – 2.5 Billion events / day ( > 30 K/sec sustained).
      – 2 to 3 TB / day.
      – Expected to grow by 5x to 10x.
    • Current h/w count
      – 2 Data Paths with 4 enqueue and 4 de-queue API servers in each path.
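
    As a back-of-the-envelope check on these figures, the arithmetic below derives the 24-hour average rate, the share per enqueue server, and the average event size. It assumes load is spread evenly over the day and over all enqueue servers, and that TB means 10^12 bytes; neither assumption is stated on the slide.

```scala
object LoadNumbers {
  val eventsPerDay: Double = 2.5e9
  val secondsPerDay: Int = 24 * 60 * 60 // 86,400
  val enqueueServers: Int = 2 * 4       // 2 data paths x 4 enqueue servers

  // Average over a full day: roughly 28.9 K events/sec.
  val avgEventsPerSec: Double = eventsPerDay / secondsPerDay

  // Assuming an even spread: roughly 3.6 K events/sec per enqueue server.
  val perEnqueueServer: Double = avgEventsPerSec / enqueueServers

  // Average event size for a given daily volume, with TB taken as 10^12 bytes.
  def bytesPerEvent(tbPerDay: Double): Double = tbPerDay * 1e12 / eventsPerDay
}
```

    With 2 to 3 TB per day, `bytesPerEvent` gives about 800 to 1,200 bytes per event. The 24-hour average of roughly 29 K/sec is in the same range as the sustained rate quoted on the slide.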
  12. Future …
    • Waiting for the Typesafe platform to stabilize a bit (akka-io, spray, akka-cluster).
    • More reactive than the current implementation (Play Futures, Iteratees).
    • Reactive Mongo (currently we use Casbah).
    • Evaluating Scala for use in the analytics pipeline (Spark framework, Cascading).