protocols such as JDBC (Stored Procs), JMS, SOAP/HTTP, XML-RPC, file transfer.
• Issues
– No uniformity.
– Not firewall friendly.
– Programming language dependency (JMS).
– Not easy to test / document.
– Not easy to scale, load-balance, fail-over.
Scala DC-MD-NOVA meetup Jan-15-2014

that could successfully tackle the 4 Vs of Big Data, viz. Volume, Velocity, Variety, Veracity.
• Needed the API to be horizontally as well as vertically scalable.
• Needed an “event driven” architecture / programming model.
• Needed easy “HA”, “fail-over”, “concurrency”, “load balancing” constructs.

+ Scala IDE (4.0.0 M1)
• MongoDB as a config data store + queue.
• metrics-scala library for metrics.
• WebJars library to manage JavaScript/CSS dependencies.
• sbt for building, Jenkins for CI.

fail-over, TLS termination).
• API farm receives JSON POSTs, parses the JSON, normalizes it to Scala objects, and uploads them to MongoDB acting as a queue.
• Same API farm de-queues from MongoDB and sends the data to the next hop in the pipeline.
• A basic admin console written in AngularJS.
• Eventual destinations: HDFS and Elasticsearch.

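
The enqueue / de-queue flow on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: `Event`, `InMemoryQueue` and `forward` are hypothetical names, the real event schema is not given in the slides, and a plain in-memory FIFO stands in for the MongoDB-backed queue.

```scala
import scala.collection.mutable

object PipelineSketch {
  /** Hypothetical normalized event; the real schema is not given in the slides. */
  final case class Event(source: String, payload: String)

  /** Stand-in for the MongoDB-backed queue: a plain in-memory FIFO (assumption). */
  final class InMemoryQueue {
    private val items = mutable.Queue.empty[Event]

    def enqueue(e: Event): Unit = items.enqueue(e)

    /** Remove and return up to `max` events, oldest first. */
    def dequeueBatch(max: Int): List[Event] =
      List.fill(math.min(max, items.size))(items.dequeue())

    def size: Int = items.size
  }

  /** De-queue side: drain up to `max` events and hand each one to the next hop. */
  def forward(queue: InMemoryQueue, max: Int)(nextHop: Event => Unit): Int = {
    val batch = queue.dequeueBatch(max)
    batch.foreach(nextHop)
    batch.size
  }
}
```

The point of the shape is that the receiving side only calls `enqueue`, while forwarding to the next hop is a separate step that reads from the queue at its own pace.
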
frequent OOMs, or even worse, JVM hangs (kill -9).
• No transactions in MongoDB, so data loss in case of crash/hang.
• Not scalable beyond a certain load.
• CPUs pegged at 60 to 70% utilization, non-uniform core usage.
• Heap usage high.
• I/O bottlenecks.
• Heavy en-queuing slowed down de-queuing, so queues fill up over time.

heavy traffic from light traffic.
• Separate enqueue and de-queue into dedicated API server instances.
• Compression all the way, even in MongoDB.
• Incremental JSON parsing.
• Avoid unnecessary JSON->Object->BSON->Object->Stream conversions.
• Changed logic so as to not lose data even in the event of an instance crash/hang.

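
As one illustration of "compression all the way", a JSON payload could be gzipped once on arrival and kept as bytes until it is actually needed. The sketch below uses only the JDK's `java.util.zip` classes; the object and method names are hypothetical, and the slides do not say which codec or storage field the real system used.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object PayloadCompression {
  /** Gzip a JSON string into bytes suitable for storing as a binary field. */
  def compress(json: String): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val gzip = new GZIPOutputStream(buffer)
    try gzip.write(json.getBytes(UTF_8))
    finally gzip.close() // close() writes the gzip trailer into the buffer
    buffer.toByteArray
  }

  /** Inverse of compress: gunzip bytes back into the original JSON string. */
  def decompress(bytes: Array[Byte]): String = {
    val gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))
    try {
      val out = new ByteArrayOutputStream()
      val chunk = new Array[Byte](4096)
      var n = gzip.read(chunk)
      while (n != -1) {
        out.write(chunk, 0, n)
        n = gzip.read(chunk)
      }
      new String(out.toByteArray, UTF_8)
    } finally gzip.close()
  }
}
```

Keeping the payload as compressed bytes between hops also fits the bullet about avoiding repeated JSON/object/BSON conversions, since nothing has to be re-parsed just to be passed along.
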
to 40%, with uniform distribution across cores.
• Memory consumption under control, no more OOMs / hangs.
• Increased throughput and scalability.
• Very easy to scale further and add more data paths.

/ immutable collections.
– Monadic patterns everywhere (Collections, Try, Option).
• Akka
– Prefer ! (tell) over ? (ask).
– Tune dispatcher parameters; don’t rely on the default dispatcher.
– Give the Scheduler its own dispatcher.
– Routers with their own dispatcher for load-balancing actors writing to destinations.
– CircuitBreaker to prevent cascading failures.
– Throttler actor for throttling when required.

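
To make the circuit-breaker bullet concrete, here is a minimal plain-Scala sketch of the idea, returning `Try` in the spirit of the monadic-patterns bullet. It is not Akka's `akka.pattern.CircuitBreaker` API: `SimpleBreaker` and its parameters are hypothetical, and the class is deliberately not thread-safe.

```scala
import scala.util.{Failure, Success, Try}

object BreakerSketch {
  final class CircuitOpenException extends RuntimeException("circuit is open")

  /** Minimal, non-thread-safe breaker; `now` is injectable so time can be controlled. */
  final class SimpleBreaker(
      maxFailures: Int,
      resetMillis: Long,
      now: () => Long = () => System.currentTimeMillis()
  ) {
    private var failures = 0
    private var openedAt: Option[Long] = None

    /** Open while the reset timeout since the last trip has not yet elapsed. */
    def isOpen: Boolean = openedAt.exists(t => now() - t < resetMillis)

    def call[A](body: => A): Try[A] =
      if (isOpen) Failure(new CircuitOpenException) // fail fast, destination untouched
      else Try(body) match {
        case ok @ Success(_) =>
          failures = 0 // a success closes the breaker again
          openedAt = None
          ok
        case err @ Failure(_) =>
          failures += 1
          if (failures >= maxFailures) openedAt = Some(now()) // trip (or re-trip)
          err
      }
  }
}
```

Once the breaker has tripped, callers get an immediate `Failure` instead of piling more work onto a struggling destination; after `resetMillis` a single trial call is let through, and a success closes the breaker again.
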
day (> 30 K/sec sustained).
– 2 to 3 TB / day.
– Expected to grow by 5x to 10x.
• Current hardware count
– 2 data paths with 4 enqueue and 4 de-queue API servers in each path.

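
A quick back-of-the-envelope check of these figures, assuming decimal terabytes (10^12 bytes) and treating 30 K/sec as the sustained rate for the whole day. The helper names are hypothetical.

```scala
object ThroughputEstimate {
  val SecondsPerDay: Long = 24L * 60 * 60 // 86,400

  /** Events per day implied by a sustained per-second rate. */
  def eventsPerDay(eventsPerSecond: Long): Long = eventsPerSecond * SecondsPerDay

  /** Average bytes per event for a daily volume, assuming decimal terabytes. */
  def avgBytesPerEvent(terabytesPerDay: Double, eventsPerSecond: Long): Double =
    terabytesPerDay * 1e12 / eventsPerDay(eventsPerSecond)
}
```

At 30,000 events/sec this gives about 2.59 billion events per day, so 2 to 3 TB/day works out to roughly 770 to 1,160 bytes per event on average. Since the slide states the rate as a lower bound, the true average would be at or below that range.
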
bit (akka-io, spray, akka-cluster).
• More reactive than the current implementation (Play Futures, Iteratees).
• ReactiveMongo (currently we use Casbah).
• Evaluating Scala for use in the analytics pipeline (Spark framework, Cascading).