relevant / irrelevant history & background)
• Why do we do streaming & real time
• Problems we faced running Apache Storm in production
• The motivation for writing Antimony
• Some code examples
• Conclusion & “antimony”???
DNS logs in real time for suspicious network activity for a majority of government organizations. We analyze billions of records every day (over 10 TB a day) originating from 100+ organizations. Organizational maturity varies greatly (from small offices with a handful of nodes to mega-corporations with 1000+ nodes).
500 million to 1 billion USD
The malware they used frequently communicated with these domains:
update-java.net -- systemsvc.net -- adobe-update.net (can be spotted by keywords)
Dyre / Dyreza [POS] malware used a DGA (domain generation algorithm):
afadsfasdfsafwerwqerqqiye.cn -- jlaooireqieoruqoireqpwoiue.to -- ...
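The two detection ideas on this slide (keyword matches and random-looking DGA names) can be sketched roughly as follows. This is a toy heuristic, not the detection logic used in production: the keyword list, the length and entropy thresholds, and all function names are assumptions picked only to fit the example domains above.

```python
import math
from collections import Counter

# Hypothetical keyword list, chosen to match the example domains on the slide.
SUSPICIOUS_KEYWORDS = ("update", "java", "adobe", "svc")

# Hypothetical thresholds; real values would need tuning on real traffic.
MIN_DGA_LENGTH = 20
MIN_DGA_ENTROPY = 3.0


def main_label(domain):
    """Return the label just before the TLD (ignores multi-part suffixes like co.uk)."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def matches_keyword(domain):
    """True if the main label contains one of the suspicious keywords."""
    label = main_label(domain)
    return any(keyword in label for keyword in SUSPICIOUS_KEYWORDS)


def looks_generated(domain):
    """True if the main label is both long and high-entropy (crude DGA signal)."""
    label = main_label(domain)
    return len(label) >= MIN_DGA_LENGTH and shannon_entropy(label) >= MIN_DGA_ENTROPY
```

The length check matters: short labels such as update-java also have high per-character entropy, so entropy alone would flag them as generated.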
Pyleus from Yelp to avoid dealing with Java when writing a topology
• Developed by Nathan Marz
• Open sourced by Twitter in 2011
• Now an Apache Software Foundation project
• {Map/Reduce}-like semantics for stream processing
• Supports a multi-language protocol (JSON over STDIN/STDOUT)
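To make "JSON over STDIN/STDOUT" concrete: in Storm's multi-language protocol each message is a JSON object followed by a line containing only `end`. Below is a minimal sketch of that framing for a bolt-like process. The helper names (`read_message`, `write_message`, `handle_tuple`) are made up for illustration, and the initial handshake and heartbeats are left out.

```python
import io
import json


def read_message(stream):
    """Read one JSON message terminated by a line containing only 'end'."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before 'end' marker")
        if line.rstrip("\r\n") == "end":
            break
        lines.append(line)
    return json.loads("".join(lines))


def write_message(stream, message):
    """Write one JSON message followed by the 'end' marker line."""
    stream.write(json.dumps(message))
    stream.write("\nend\n")
    stream.flush()


def handle_tuple(tup, out):
    """Toy bolt logic: emit the lower-cased domain, anchored to the input, then ack."""
    domain = tup["tuple"][0]
    write_message(out, {"command": "emit", "anchors": [tup["id"]], "tuple": [domain.lower()]})
    write_message(out, {"command": "ack", "id": tup["id"]})


if __name__ == "__main__":
    # In-memory streams stand in for the real STDIN/STDOUT pipes.
    incoming = io.StringIO('{"id": "1", "comp": "spout", "stream": "default", '
                           '"task": 3, "tuple": ["Example.ORG"]}\nend\n')
    outgoing = io.StringIO()
    handle_tuple(read_message(incoming), outgoing)
    print(outgoing.getvalue())
```

Every tuple crosses the process boundary as serialized JSON in both directions, which is where much of the overhead of a non-JVM component comes from.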
of data tuples
Spout: source of data tuples for the topology (Kafka / NSQ / MySQL / network stream)
Bolt: processing of incoming data tuples (joins / filters / aggregations / any arbitrary computation)
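The spout and bolt roles can be illustrated with plain Python generators. This is not Storm's API, only a single-process sketch of the data flow: a spout yields tuples, one bolt filters them, another aggregates. All names here (`dns_spout`, `filter_bolt`, `count_bolt`, `run_topology`) are hypothetical.

```python
from collections import Counter


def dns_spout(records):
    """Spout: turn raw (client, domain) records into a stream of tuples."""
    for client, domain in records:
        yield (client, domain)


def filter_bolt(tuples, suffix):
    """Bolt: pass through only tuples whose domain ends with the given suffix."""
    for client, domain in tuples:
        if domain.endswith(suffix):
            yield (client, domain)


def count_bolt(tuples):
    """Bolt: aggregate the stream into a per-domain query count."""
    counts = Counter()
    for _client, domain in tuples:
        counts[domain] += 1
    return counts


def run_topology(records, suffix):
    """Wire spout -> filter bolt -> count bolt, like edges in a topology."""
    return count_bolt(filter_bolt(dns_spout(records), suffix))
```

In a real topology each stage runs as many parallel tasks across machines; here the stages are just chained in one process.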
time writing Rust.
• Our way of using Storm (Python on top of the JVM) was inefficient
• We had enough evidence that replacing the JVM with Rust would improve performance
functionality (coordination / scheduling / monitoring)
• Nimbus has no resource reservation / isolation capabilities
• Nimbus is a single point of failure