
Antimony: a real time stream processing

Presentation at RustFest Zurich 2017

Mohammed Makhlouf

October 01, 2017

Transcript

  1. Antimony
    A rustacean’s approach to real time stream processing
    A real time distributed stream processing system
  2. The plan of this talk
    • The use case (equally relevant / irrelevant history & background)
    • Why do we do streaming & real time
    • Problems we faced running Apache Storm in production
    • The motivation for writing Antimony
    • Some code example
    • Conclusion & “antimony”???
  3. The use case for real time streaming
    We analyze DNS logs in real time for suspicious activity in the network, for a majority of government organizations.
    We analyze billions of records every day (over 10 TB a day) originating from 100+ organizations.
    The organizations' maturity varies greatly (from small offices with a handful of nodes to mega corps with 1000+ nodes).
  4. What are we looking at / for
    8/12/2017 9:35:32 AM 0490 PACKET 000000216EFFC910 UDP Snd 212.77.192.37 80a8 R Q [1080 NOERROR] A (3)ns1(3)ctc(3)gov(2)qa(0)
    {timestamp} … {dir} … {ip} … {status} … {fqdn}
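The field extraction hinted at by the placeholders can be sketched in Rust as follows. This is a minimal sketch, not Antimony code: the `DnsRecord` struct, the function names and the fixed token positions are assumptions derived only from the single sample line on the slide, and real logs may contain variants this does not handle.

```rust
/// Fields pulled out of one DNS log line. The field names mirror the
/// placeholders on the slide; the struct itself is hypothetical.
#[derive(Debug, PartialEq)]
pub struct DnsRecord {
    pub timestamp: String,
    pub dir: String,
    pub ip: String,
    pub status: String,
    pub fqdn: String,
}

/// Turn the length-prefixed form "(3)ns1(3)ctc(3)gov(2)qa(0)" into "ns1.ctc.gov.qa".
pub fn decode_fqdn(raw: &str) -> String {
    raw.split(')')
        // Each piece looks like "ns1(3": keep the text before the next "(".
        .filter_map(|piece| piece.split('(').next())
        .filter(|label| !label.is_empty())
        .collect::<Vec<_>>()
        .join(".")
}

/// Parse one line, assuming the token layout of the sample on the slide:
/// date, time, AM/PM, thread id, "PACKET", address, protocol, direction, IP, ...
pub fn parse_line(line: &str) -> Option<DnsRecord> {
    let tokens: Vec<&str> = line.split_whitespace().collect();
    if tokens.len() < 10 || tokens[4] != "PACKET" {
        return None;
    }
    // The status is taken as the last word inside the square brackets.
    let open = line.find('[')?;
    let close = open + line[open..].find(']')?;
    let status = line[open + 1..close].split_whitespace().last()?.to_string();

    Some(DnsRecord {
        timestamp: tokens[..3].join(" "),
        dir: tokens[7].to_string(),
        ip: tokens[8].to_string(),
        status,
        fqdn: decode_fqdn(tokens.last()?),
    })
}
```

Calling `parse_line` on the sample line yields the direction `Snd`, the IP `212.77.192.37`, the status `NOERROR` and the FQDN `ns1.ctc.gov.qa`; lines that do not match the assumed layout return `None`.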
  5. Considering the {fqdn}s
    The Carbanak cyber gang stole somewhere between 500 million and 1 billion USD. The malware they used frequently communicated with these domains (they can be spotted by keywords):
    update-java.net -- systemsvc.net -- adobe-update.net
    The Dyre / Dyreza [POS] malware used a DGA (domain generation algorithm):
    afadsfasdfsafwerwqerqqiye.cn -- jlaooireqieoruqoireqpwoiue.to -- ...
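The two kinds of check mentioned here can be sketched roughly as follows. The keyword list, the length threshold and the entropy threshold are all assumptions picked to fit the example domains on the slide; the entropy test is a common swapped-in heuristic for machine-generated names, not necessarily what the speaker's pipeline used.

```rust
use std::collections::HashMap;

/// Hypothetical keyword list, chosen only to match the example domains above.
const KEYWORDS: [&str; 4] = ["update", "adobe", "java", "svc"];

/// Keyword check: true if the name contains any listed keyword.
pub fn has_suspicious_keyword(fqdn: &str) -> bool {
    let lower = fqdn.to_lowercase();
    KEYWORDS.iter().any(|k| lower.contains(*k))
}

/// Shannon entropy of a string, in bits per character.
pub fn shannon_entropy(s: &str) -> f64 {
    let len = s.chars().count();
    if len == 0 {
        return 0.0;
    }
    let mut counts: HashMap<char, usize> = HashMap::new();
    for c in s.chars() {
        *counts.entry(c).or_insert(0) += 1;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / len as f64;
            -p * p.log2()
        })
        .sum()
}

/// Crude DGA heuristic (assumed thresholds): a long first label whose
/// characters are spread out enough to look random.
pub fn looks_generated(fqdn: &str) -> bool {
    let label = fqdn.split('.').next().unwrap_or("");
    label.chars().count() >= 20 && shannon_entropy(label) >= 3.0
}
```

With these thresholds the three Carbanak domains trip the keyword check and the two DGA examples trip `looks_generated`, while `ns1.ctc.gov.qa` trips neither. A production detector would need far more careful tuning to keep false positives down.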
  6. Apache Storm
    We built a pipeline in Apache Storm using Pyleus from Yelp, so that we did not have to deal with Java when writing a topology.
    • Developed by Nathan Marz
    • Open sourced by Twitter in 2011
    • Now an Apache Software Foundation project
    • {Map/Reduce}-like semantics for stream processing
    • Supports a multi-language protocol (JSON over STDIN/STDOUT)
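To make the last bullet concrete: in Storm's multi-language protocol each JSON message is followed by a line containing only `end`. Below is a minimal sketch of that framing in Rust using only the standard library. The function names are hypothetical, the JSON body is treated as an opaque string rather than parsed, and this is not taken from Pyleus or Antimony.

```rust
use std::io::{self, BufRead, Write};

/// Read one framed message: every line up to, but not including, a line
/// that is exactly "end". Returns Ok(None) when the input ends first
/// (a partially read message is discarded in that case).
pub fn read_message<R: BufRead>(input: &mut R) -> io::Result<Option<String>> {
    let mut body = String::new();
    let mut line = String::new();
    loop {
        line.clear();
        if input.read_line(&mut line)? == 0 {
            return Ok(None);
        }
        let trimmed = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
        if trimmed == "end" {
            return Ok(Some(body));
        }
        if !body.is_empty() {
            body.push('\n');
        }
        body.push_str(trimmed);
    }
}

/// Write one framed message: the JSON text, then a line with "end".
pub fn write_message<W: Write>(output: &mut W, json: &str) -> io::Result<()> {
    writeln!(output, "{}", json)?;
    writeln!(output, "end")?;
    output.flush()
}
```

A component would call `read_message` on a locked `stdin` and `write_message` on `stdout`. Every tuple crossing this boundary is serialized to JSON and piped between processes, which is part of the overhead of running Python on top of the JVM.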
  7. Some Terminology
    Topology: a DAG where the vertices are computations and the edges are streams of data tuples
    Spout: a source of data tuples for the topology (Kafka / NSQ / MySQL / network stream)
    Bolt: processes incoming data tuples (joins / filters / aggregations / any arbitrary computation)
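These terms map naturally onto Rust traits. The sketch below is purely illustrative: the `Spout` and `Bolt` traits, the `Tuple` alias and the `run_linear` helper are hypothetical names, not Antimony's actual API, and a linear chain is only the simplest special case of a DAG.

```rust
/// A data tuple, modelled here as a list of string fields (an assumption for brevity).
pub type Tuple = Vec<String>;

/// Hypothetical trait: a source of data tuples for the topology.
pub trait Spout {
    fn next_tuple(&mut self) -> Option<Tuple>;
}

/// Hypothetical trait: a computation over incoming tuples.
pub trait Bolt {
    fn process(&mut self, input: Tuple) -> Vec<Tuple>;
}

/// Spout backed by an in-memory list, standing in for Kafka, NSQ, etc.
pub struct VecSpout {
    items: std::vec::IntoIter<Tuple>,
}

impl VecSpout {
    pub fn new(items: Vec<Tuple>) -> Self {
        VecSpout { items: items.into_iter() }
    }
}

impl Spout for VecSpout {
    fn next_tuple(&mut self) -> Option<Tuple> {
        self.items.next()
    }
}

/// Filter bolt: drops tuples whose first field is missing or empty.
pub struct NonEmptyFilter;

impl Bolt for NonEmptyFilter {
    fn process(&mut self, input: Tuple) -> Vec<Tuple> {
        let keep = input.first().map_or(false, |f| !f.is_empty());
        if keep { vec![input] } else { Vec::new() }
    }
}

/// Transform bolt: lowercases every field.
pub struct Lowercase;

impl Bolt for Lowercase {
    fn process(&mut self, input: Tuple) -> Vec<Tuple> {
        vec![input.into_iter().map(|f| f.to_lowercase()).collect()]
    }
}

/// Drain the spout through a linear chain of bolts, in order.
pub fn run_linear(spout: &mut dyn Spout, bolts: &mut [Box<dyn Bolt>]) -> Vec<Tuple> {
    let mut out = Vec::new();
    while let Some(tuple) = spout.next_tuple() {
        let mut batch = vec![tuple];
        for bolt in bolts.iter_mut() {
            batch = batch.into_iter().flat_map(|t| bolt.process(t)).collect();
        }
        out.extend(batch);
    }
    out
}
```

A bolt returning a `Vec<Tuple>` covers filters (zero outputs), maps (one output) and splitters (many outputs) with a single method; joins and aggregations would keep state inside the bolt struct.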
  8. Physical Plan of topology
    [Diagram: Log Entries Spout, Extract FQDN Bolt, Count FQDN Bolt; edges labeled Shuffle grouping and Field grouping]
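The two groupings named on this slide decide which downstream task receives each tuple. Here is a rough sketch of both, under stated assumptions: shuffle grouping is approximated with round-robin (Storm distributes randomly but evenly), field grouping hashes the key with the standard library's `DefaultHasher`, and all names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Shuffle grouping, approximated as round-robin over the downstream tasks.
pub struct ShuffleGrouping {
    next: usize,
    tasks: usize,
}

impl ShuffleGrouping {
    pub fn new(tasks: usize) -> Self {
        assert!(tasks > 0, "need at least one downstream task");
        ShuffleGrouping { next: 0, tasks }
    }

    /// Pick the task index for the next tuple.
    pub fn choose(&mut self) -> usize {
        let task = self.next;
        self.next = (self.next + 1) % self.tasks;
        task
    }
}

/// Field grouping: tuples with the same key always go to the same task.
pub fn field_grouping(key: &str, tasks: usize) -> usize {
    assert!(tasks > 0, "need at least one downstream task");
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % tasks as u64) as usize
}
```

Field grouping matters for a counting bolt: if every occurrence of a given FQDN reaches the same task, that task can keep a complete local count for it without coordinating with the others.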
  9. Motivation for Antimony
    • Just an excuse to spend more time writing Rust.
    • Our way of using Storm (Python on top of the JVM) was inefficient.
    • We had enough evidence that replacing the JVM with Rust would gain us more performance.
  10. [Diagram: Nimbus (master node); two Supervisors (slave nodes), each with workers W1 to W4; ZK. Labels: Topology Submission, Assignment Map, Code Sync]
  11. Some problems with that architecture
    • Nimbus is overloaded with functionality (coordination / scheduling / monitoring)
    • Nimbus has no resource reservation / isolation capabilities
    • Nimbus is a single point of failure
  12. Storm worker
    • Multiplexed scheduling algorithms.
    • Hard to debug / difficult to tune.
    [Diagram: two Executors, each containing five Tasks]
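  13. Inside the Storm Worker
    [Diagram: Worker Receive Thread (RX) and Worker Send Thread (TX) over the networking layer; an Executor (e of N) with a User Logic Thread (Spout / Bolt) and a Send Thread, connected by an incoming queue (Inc Q) and an outgoing queue (Out Q); output goes to other executors]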
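The thread-and-queue layout of a single executor can be imitated with standard library channels. This is only a sketch of the shape described on the slide, not Storm's implementation (Storm is JVM code with its own queue machinery): `run_executor` is a hypothetical name, and a third channel stands in for the networking layer.

```rust
use std::sync::mpsc;
use std::thread;

/// Push `inputs` through one executor: incoming queue -> user logic thread ->
/// outgoing queue -> send thread -> stand-in "network" channel.
pub fn run_executor<F>(inputs: Vec<String>, mut logic: F) -> Vec<String>
where
    F: FnMut(String) -> Option<String> + Send + 'static,
{
    let (in_tx, in_rx) = mpsc::channel::<String>(); // incoming queue
    let (out_tx, out_rx) = mpsc::channel::<String>(); // outgoing queue
    let (net_tx, net_rx) = mpsc::channel::<String>(); // stand-in for the network

    // User logic thread: runs the spout / bolt code on each incoming tuple.
    let user = thread::spawn(move || {
        for tuple in in_rx {
            if let Some(result) = logic(tuple) {
                if out_tx.send(result).is_err() {
                    break;
                }
            }
        }
    });

    // Send thread: drains the outgoing queue towards other executors.
    let sender = thread::spawn(move || {
        for tuple in out_rx {
            if net_tx.send(tuple).is_err() {
                break;
            }
        }
    });

    for tuple in inputs {
        in_tx.send(tuple).expect("user logic thread should be running");
    }
    drop(in_tx); // closing the incoming queue lets both threads finish

    user.join().expect("user logic thread panicked");
    sender.join().expect("send thread panicked");
    net_rx.into_iter().collect()
}
```

Even in this toy version every tuple crosses two queues and two thread boundaries inside one executor, before the worker-level receive and send threads are counted.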
  14. Some homegrown wisdom
    Run each topology on a separate Storm cluster with a separate ZooKeeper cluster.
    Over-provision everything.
  15. Two things happened
    • MIO / Tokio seemed like a good choice for async IO
    • Twitter published the Heron paper
  16. Antimony Architecture
    [Diagram: Topology Master; ZK cluster; two Containers, each with a Stream Manager, a Metrics Manager and instances I1 to I4. Labels: Topology Submission, Logical / Physical plan, State, Physical Plan sync]
  17. Topology Master
    • Advertises itself in ZooKeeper to be discovered by other processes.
    • Prevents multiple Topology Masters from assuming the role.
    • Communicates with the Scheduler.
  18. Stream Manager
    [Diagram: four Stream Managers, each connected to LE, FE and FC]
  19. Define a topology
    .
    ├── Cargo.lock
    ├── Cargo.toml
    ├── src
    │   ├── bolts
    │   │   ├── mod.rs
    │   │   ├── efbolt.rs
    │   │   └── febolts.rs
    │   ├── lib.rs
    │   └── spouts
    │       ├── mod.rs
    │       └── lespout.rs
    └── topology.json

    antimony = "0.0.1"
  20. antimony-cli
    Simply run it in your topology lib dir and it will submit the topology to the Antimony cluster (using Apache Mesos).
  21. The Name
    Iron, Cobalt, Nickel, Titanium, Metal IO. The periodic table is all taken or at least all shiny metals that can rust.
    We are here