Antimony: a real time stream processing

Presentation at RustFest Zurich 2017

Mohammed Makhlouf

October 01, 2017

Transcript

  1. Antimony: a rustacean’s approach to real time stream processing. A real time distributed stream processing system.
  2. Who are we? Mohammed Makhlouf, QCERT (@msmakhlouf); Mohammad Samir, QCERT (@_msamir_)
  3. The plan of this talk
    • The use case (equally relevant / irrelevant history & background)
    • Why do we do streaming & real time
    • Problems we faced running Apache Storm in production
    • The motivation for writing Antimony
    • Some code example
    • Conclusion & “antimony”???
  4. The use case for real time streaming. We analyze DNS logs in real time for suspicious activity in the network for a majority of government organizations. We analyze billions of records every day (over 10 TB a day) originating from 100+ organizations. Organizational maturity varies greatly (from small offices with a handful of nodes to mega corps with 1000+ nodes).
  5. What are we looking at / for
    8/12/2017 9:35:32 AM 0490 PACKET 000000216EFFC910 UDP Snd 212.77.192.37 80a8 R Q [1080 NOERROR] A (3)ns1(3)ctc(3)gov(2)qa(0)
    {timestamp} … {dir} … {ip} … {status} … {fqdn}
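The sample line on slide 5 can be split into the five highlighted fields with plain string handling. What follows is a minimal sketch, assuming every line has the same whitespace-separated layout as that one sample; `DnsRecord`, `decode_fqdn`, and `parse_line` are hypothetical names for illustration, not part of Antimony.

```rust
/// The five fields highlighted on the slide (hypothetical struct).
#[derive(Debug, PartialEq)]
struct DnsRecord {
    timestamp: String,
    dir: String,
    ip: String,
    status: String,
    fqdn: String,
}

/// Turn the length-prefixed form "(3)ns1(3)ctc(3)gov(2)qa(0)"
/// into the dotted form "ns1.ctc.gov.qa".
fn decode_fqdn(raw: &str) -> Option<String> {
    let mut labels = Vec::new();
    // Splitting on '(' yields pieces shaped like "3)ns1"; the first piece is empty.
    for piece in raw.split('(').skip(1) {
        let (len, label) = piece.split_once(')')?;
        let len: usize = len.parse().ok()?;
        if len != label.len() {
            return None; // length prefix does not match the label
        }
        if len > 0 {
            labels.push(label);
        }
    }
    if labels.is_empty() {
        None
    } else {
        Some(labels.join("."))
    }
}

/// Parse one log line. Assumed token order (taken from the single sample):
/// date time AM/PM thread PACKET id proto dir ip ... [flags STATUS] type name
fn parse_line(line: &str) -> Option<DnsRecord> {
    let tokens: Vec<&str> = line.split_whitespace().collect();
    if tokens.len() < 10 || tokens[4] != "PACKET" {
        return None;
    }
    // The response code is the token that closes the [...] group.
    let status = tokens
        .iter()
        .find(|t| t.ends_with(']'))?
        .trim_end_matches(']');
    let fqdn = decode_fqdn(tokens.last()?)?;
    Some(DnsRecord {
        timestamp: tokens[..3].join(" "),
        dir: tokens[7].to_string(),
        ip: tokens[8].to_string(),
        status: status.to_string(),
        fqdn,
    })
}
```

Calling `parse_line` on the slide's sample line yields a record whose `fqdn` is `ns1.ctc.gov.qa` and whose `status` is `NOERROR`. Lines with a different layout return `None` rather than a partial record.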
  6. Considering the {fqdn}s
    Carbanak cyber gang stole somewhere between 500 million and 1 billion USD. The malware they used frequently communicated with these domains: update-java.net -- systemsvc.net -- adobe-update.net (can be spotted by keywords).
    Dyre / Dyreza [POS] malware used a DGA (domain generation algorithm): afadsfasdfsafwerwqerqqiye.cn -- jlaooireqieoruqoireqpwoiue.to -- ...
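The two kinds of domains on slide 6 suggest two different checks: a keyword match for names like update-java.net, and a randomness test for DGA names. The sketch below is an illustration only. The keyword list and both thresholds are made-up assumptions, and Shannon entropy over the first label is a stand-in heuristic, not the detection method the talk describes.

```rust
use std::collections::HashMap;

/// Hypothetical keyword list, loosely based on the Carbanak domains on the slide.
const KEYWORDS: [&str; 4] = ["update", "java", "adobe", "svc"];

/// True if the domain contains any of the watched keywords.
fn has_suspicious_keyword(fqdn: &str) -> bool {
    let lower = fqdn.to_ascii_lowercase();
    KEYWORDS.iter().any(|k| lower.contains(*k))
}

/// Shannon entropy, in bits per character, of a single label.
fn shannon_entropy(label: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    for c in label.chars() {
        *counts.entry(c).or_insert(0) += 1;
    }
    let n = label.chars().count() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Crude DGA heuristic: a long first label with high character entropy.
/// Both thresholds (20 characters, 3.0 bits) are assumptions for illustration.
fn looks_generated(fqdn: &str) -> bool {
    let label = fqdn.split('.').next().unwrap_or("");
    label.len() >= 20 && shannon_entropy(label) > 3.0
}
```

With these assumed thresholds, `looks_generated("afadsfasdfsafwerwqerqqiye.cn")` is true while `looks_generated("update-java.net")` is false, so the keyword check and the entropy check cover different cases. A production detector would need tuning against real traffic.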
  7. Attackers' / defenders' best friend: http://christian-rossow.de/publications/downloaders-dimva12.pdf

  8. Apache Storm. We built a pipeline in Apache Storm using Pyleus from Yelp so that we would not have to deal with Java when writing a topology.
    • Developed by Nathan Marz
    • Open sourced by Twitter in 2011
    • Now an Apache Software Foundation project
    • {Map/Reduce}-like semantics for stream processing
    • Supports a multi-language protocol (JSON over STDIN/STDOUT)
  9. A Streaming Job

  10. Some Terminology
    Topology: a DAG where vertices are computations and edges are streams of data tuples
    Spout: source of data tuples to the topology (Kafka / NSQ / MySQL / network stream)
    Bolt: processing of incoming data tuples (joins / filters / aggregations / any arbitrary computation)
  11. Example topology, also known as the logical plan: Log Entries Spout → Extract FQDN Bolt → Count FQDN Bolt
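The example topology (a log entries spout feeding an extract bolt feeding a count bolt) can be mimicked in a single process to make the terminology concrete. This is a rough sketch under stated assumptions: the `Spout` and `Bolt` traits and `run_topology` are hypothetical and are not Antimony's or Storm's API, and tuples are plain strings whose last token is the FQDN.

```rust
use std::collections::HashMap;

/// Source of tuples (hypothetical trait, not Antimony's API).
trait Spout {
    fn next_tuple(&mut self) -> Option<String>;
}

/// A processing vertex; returns the tuple to emit downstream, if any.
trait Bolt {
    fn process(&mut self, tuple: String) -> Option<String>;
}

/// Replays a fixed list of log lines.
struct LogEntriesSpout {
    lines: std::vec::IntoIter<String>,
}

impl LogEntriesSpout {
    fn new(lines: Vec<String>) -> Self {
        Self { lines: lines.into_iter() }
    }
}

impl Spout for LogEntriesSpout {
    fn next_tuple(&mut self) -> Option<String> {
        self.lines.next()
    }
}

/// Emits the FQDN, assumed here to be the last whitespace-separated token.
struct ExtractFqdnBolt;

impl Bolt for ExtractFqdnBolt {
    fn process(&mut self, tuple: String) -> Option<String> {
        tuple.split_whitespace().last().map(str::to_string)
    }
}

/// Terminal vertex: counts how often each FQDN was seen.
#[derive(Default)]
struct CountFqdnBolt {
    counts: HashMap<String, u64>,
}

impl Bolt for CountFqdnBolt {
    fn process(&mut self, tuple: String) -> Option<String> {
        *self.counts.entry(tuple).or_insert(0) += 1;
        None // nothing is emitted downstream
    }
}

/// Wire the three vertices together: spout -> extract -> count.
fn run_topology(spout: &mut impl Spout, extract: &mut impl Bolt, count: &mut impl Bolt) {
    while let Some(line) = spout.next_tuple() {
        if let Some(fqdn) = extract.process(line) {
            count.process(fqdn);
        }
    }
}
```

In a real deployment each vertex would run as many parallel instances on different machines. The single loop here only shows the direction in which tuples flow along the edges.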
  12. Physical Plan of topology: Log Entries Spout, Extract FQDN Bolt, Count FQDN Bolt; shuffle grouping, field grouping
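The two groupings named in the physical plan decide which parallel task receives a tuple. A minimal sketch of the idea, with assumptions: `field_grouping` and `ShuffleGrouping` are hypothetical names, and round-robin is used as a deterministic stand-in for the random distribution a shuffle grouping normally performs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Field grouping: tuples with the same key always go to the same task,
/// which is what a per-FQDN counter needs. `num_tasks` must be non-zero.
fn field_grouping(key: &str, num_tasks: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_tasks as u64) as usize
}

/// Shuffle grouping: spread tuples evenly, ignoring their content.
/// Round-robin is an assumption made to keep the sketch deterministic.
struct ShuffleGrouping {
    next: usize,
    num_tasks: usize,
}

impl ShuffleGrouping {
    fn new(num_tasks: usize) -> Self {
        Self { next: 0, num_tasks }
    }

    /// Return the task index for the next tuple.
    fn pick(&mut self) -> usize {
        let task = self.next;
        self.next = (self.next + 1) % self.num_tasks;
        task
    }
}
```

Stateless steps such as extraction can take tuples from a shuffle grouping, because any task can handle any tuple. A counting step needs a field grouping on the FQDN so that each name is tallied in exactly one place.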
  13. Motivation for Antimony
    • Just an excuse to spend more time writing Rust.
    • Our way of using Storm (Python on top of the JVM) was inefficient.
    • We had enough evidence that replacing the JVM with Rust would gain us more performance.
  14. [Diagram: Storm architecture. Nimbus (master node), ZK, and two Supervisors (slave nodes), each running workers W1 to W4. Labels: Topology Submission, Assignment Map, Code Sync]
  15. Some problems with that architecture
    • Nimbus is overloaded with functionality (coordination / scheduling / monitoring)
    • Nimbus has no resource reservation / isolation capabilities
    • Nimbus is a single point of failure
  16. Storm worker
    • Multiplexed scheduling algorithms.
    • Hard to debug / difficult to tune.
    [Diagram: two Executors, each running five Tasks]
  17. Inside the Storm Worker. [Diagram labels: Worker Receive Thread, RX, Worker Send Thread, TX, User Logic Thread (Spout / Bolt), An Executor e of N, Send Thread, Inc Q, Out Q, To other executors, Networking layer]
  18. Zookeeper Overload. [Diagram labels: W1, W2, W2, ZK, Storm, S1, S1, Other Services]
  19. Some homegrown wisdom: run each topology on a separate Storm cluster with a separate Zookeeper cluster. Over-provision everything.
  20. Two things happened
    • MIO / Tokio seemed like a good choice for async IO
    • Twitter published the Heron paper
  21. Much better

  22. Antimony Architecture. [Diagram labels: Scheduler, Topology 1, Topology 2, Topology 3, Topology Submission]
  23. Antimony Architecture. [Diagram labels: Topology Master, ZK cluster, Topology Submission, Logical / Physical plan, State, Physical Plan sync; two Containers, each with a Stream Manager, a Metrics Manager, and instances I1 to I4]
  24. Topology Master: advertises itself in Zookeeper to be discovered by other processes. Prevents multiple Topology Masters from assuming the role. Communicates with the Scheduler.
  25. Stream Manager: routes the tuples. Performs back pressure.
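Back pressure means that when a downstream consumer falls behind, the router stops accepting tuples instead of buffering without limit, so the source can slow down. One way to sketch this is with a bounded channel from the Rust standard library. This is an assumption-laden illustration: the `StreamManager` and `Routed` types below are hypothetical and say nothing about how Antimony implements its Stream Manager.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Outcome of trying to route one tuple (hypothetical type).
#[derive(Debug, PartialEq)]
enum Routed {
    /// The tuple was queued for the downstream instance.
    Delivered,
    /// The queue is full; the tuple is handed back so the caller can pause and retry.
    BackPressure(String),
    /// The downstream instance is gone.
    Closed,
}

/// Routes tuples into a bounded queue read by one downstream instance.
struct StreamManager {
    tx: SyncSender<String>,
}

impl StreamManager {
    /// `capacity` is the number of tuples that may wait in the queue.
    fn new(capacity: usize) -> (Self, Receiver<String>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx }, rx)
    }

    /// Non-blocking send: a full queue is reported instead of buffered.
    fn route(&self, tuple: String) -> Routed {
        match self.tx.try_send(tuple) {
            Ok(()) => Routed::Delivered,
            Err(TrySendError::Full(t)) => Routed::BackPressure(t),
            Err(TrySendError::Disconnected(_)) => Routed::Closed,
        }
    }
}
```

A spout driving this router would stop emitting when it sees `Routed::BackPressure` and resume once the consumer has drained the queue. Memory use stays bounded by `capacity`, at the cost of throttling the source.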

  26. Stream Manager: Log Entries Spout, Extract FQDN Bolt, Count FQDN Bolt
  27. Stream Manager. [Diagram: four Stream Managers, each with LE, FE, and FC]
  28. Define a topology

    .
    ├── Cargo.lock
    ├── Cargo.toml
    ├── src
    │   ├── bolts
    │   │   ├── mod.rs
    │   │   ├── efbolt.rs
    │   │   └── febolts.rs
    │   ├── lib.rs
    │   └── spouts
    │       ├── mod.rs
    │       └── lespout.rs
    └── topology.json

    antimony = "0.0.1"
  29. None
  30. None
  31. antimony-cli: simply run it in your topology lib dir and it will submit the topology to the Antimony cluster / using Apache Mesos.
  32. The Name. Iron, Cobalt, Nickel, Titanium, Metal IO. The periodic table is all taken, or at least all the shiny metals that can rust. We are here
  33. Close Enough!

  34. Thank You @msmakhlouf @_msamir_ https://antimony.rs