Processing events at scale

Processing (almost) real-time data streams usually turns out to be an extremely difficult task. Events arrive fast: hundreds, thousands, or tens of thousands per second. The logic behind each event can be complex and/or time-consuming, so executing it inside the HTTP request-response flow is often not the best approach. Fortunately, there are several ways to support event processing in our applications.

During this talk, I would like to introduce some basic concepts behind distributing event processing across server clusters. I will briefly cover queue systems, using RabbitMQ as the example, where you can store and route messages between producers and consumers, and distributed real-time computation systems like Apache Storm, where you can build complex topologies and process up to a million tuples per second per node. Technology is important, but what seems even more important is moving the center of gravity of event processing from the HTTP request-response flow to a separate layer that can be scaled on its own.
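
To make the idea of a separate processing layer concrete, here is a minimal in-process sketch in plain Java. It is not RabbitMQ code: a `BlockingQueue` stands in for the broker, and the class name, the `STOP` marker and the upper-casing "work" are all assumptions made for illustration. The request side only enqueues events, while worker threads drain the queue at their own pace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueDecouplingSketch {
    // Hypothetical marker that tells a worker to stop (not a RabbitMQ concept).
    private static final String STOP = "__stop__";

    /** Enqueues all events, lets the workers process them, returns the results. */
    static List<String> run(List<String> events, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = Collections.synchronizedList(new ArrayList<String>());
        List<Thread> threads = new ArrayList<>();

        // Consumer side: each worker blocks on the queue until an event arrives.
        for (int i = 0; i < workers; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String event = queue.take();
                        if (event.equals(STOP)) {
                            return;
                        }
                        // Stand-in for the heavy per-event logic.
                        processed.add(event.toUpperCase());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            threads.add(worker);
        }

        try {
            // Producer side: what a request handler would do, enqueue and return.
            for (String event : events) {
                queue.put(event);
            }
            // One stop marker per worker, queued after all real events.
            for (int i = 0; i < workers; i++) {
                queue.put(STOP);
            }
            for (Thread worker : threads) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> out = run(Arrays.asList("signup", "tweet", "like"), 2);
        System.out.println(out.size() + " events processed");
    }
}
```

The order of the results is not deterministic with more than one worker. Scaling this layer means changing the `workers` argument, without touching the code that enqueues events.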

Additionally, we will talk about storing data stream events. Sharded SQL databases or basic Hadoop-powered tools are good, but there are dedicated tools on the market, like Druid, that can store and aggregate billions of events.
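
As a rough illustration of what aggregating events at storage time can look like, the sketch below rolls raw events up into per-minute counts. It does not use Druid's API; the `Event` class, the one-minute bucket size and the `bucket|type` key format are assumptions chosen only to show the idea of trading raw rows for pre-aggregated ones.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RollupSketch {
    /** Hypothetical event: an epoch-millisecond timestamp plus an event type. */
    static final class Event {
        final long timestampMillis;
        final String type;

        Event(long timestampMillis, String type) {
            this.timestampMillis = timestampMillis;
            this.type = type;
        }
    }

    private static final long MINUTE_MILLIS = 60_000L;

    /** Counts events per (minute bucket, type); keys look like "60000|click". */
    static Map<String, Long> rollupPerMinute(Iterable<Event> events) {
        Map<String, Long> counts = new TreeMap<>();
        for (Event event : events) {
            // Truncate the timestamp to the start of its minute.
            long bucket = (event.timestampMillis / MINUTE_MILLIS) * MINUTE_MILLIS;
            counts.merge(bucket + "|" + event.type, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(1_000L, "click"),
                new Event(59_999L, "click"),
                new Event(60_000L, "click"),
                new Event(61_000L, "view"));
        // Four raw events collapse into three aggregated rows.
        System.out.println(rollupPerMinute(events));
    }
}
```

The more events share a bucket and a type, the larger the saving, which is why this kind of pre-aggregation pays off at billions of events.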

Mariusz Gil

January 30, 2015

Transcript

  1. processing events at SCALE
    Mariusz Gil

  2. WROCŁAW, POLAND

  3. View Slide

  4. View Slide

  5. processing events at SCALE
    Who is working on an application which is
    running on more than X servers?

  6. SCALE
    SCALE
    SCALE
    SCALE
    at

  7. View Slide

  8. user & browser

  9. user & browser
    data processing, rendering
    request, routing

  10. faster is better
    slower is unusable

  11. requests
    sometimes
    are heavy…
    …too heavy

  12. requests
    sometimes
    are heavy…
    …too heavy

  13. event processing logic
    should be moved out
    from request-response loop

  14. View Slide

  15. RabbitMQ is a
    platform to send and receive
    messages

  16. producer consumer
    Firstly

  17. producer
    consumer1
    consumer2
    Basic QoS settings

  18. producer
    consumer1
    consumer2

  19. producer
    consumer1
    consumer2

  20. View Slide

  21. namespace Acme\DemoBundle\Controller;

    use Symfony\Bundle\FrameworkBundle\Controller\Controller;
    use Symfony\Component\HttpFoundation\Request;

    class TweetController extends Controller
    {
        public function newTweetAction(Request $request)
        {
            // ... (building $form and loading $user is elided on the slide)

            // EXAMPLE AND VERY NAIVE IMPLEMENTATION
            $form->handleRequest($request);

            if ($form->isValid()) {
                // Publish the event instead of processing it inside the request.
                $this->get('tweet_feed_producer')->publish(array(
                    'user' => $user,
                    'tweet' => 'Lorem ipsum dolor sit amet...'
                ));
            }

            // ...
        }
    }

  22. namespace Acme\DemoBundle\Consumer;

    use OldSound\RabbitMqBundle\RabbitMq\ConsumerInterface;
    use PhpAmqpLib\Message\AMQPMessage;

    class TweetFeedsConsumer implements ConsumerInterface
    {
        public function execute(AMQPMessage $msg)
        {
            // ... (decoding $msg into $user and $tweet is elided on the slide)

            // EXAMPLE AND VERY NAIVE IMPLEMENTATION
            // Fan the tweet out to the feed of every friend.
            $friends = $user->getFriends();
            foreach ($friends as $friend) {
                $friend->getFeed()->push($tweet);
            }

            return true;
        }
    }

  23. How to know
    if our consumers layer
    is efficient or not?

  24. producer
    consumer
    producer
    consumer
    producer
    consumer
    Directed Acyclic Graphs

  25. View Slide

  26. Apache Storm is a
    distributed realtime
    computation system

  27. doing for realtime processing
    what Hadoop did for batch processing

  28. written in Clojure, but
    language agnostic

  29. use cases
    realtime analytics
    online machine learning
    continuous computations
    distributed RPC
    Storm's small set of primitives satisfy a
    stunning number of use cases.

  30. stream
    unbounded sequence of tuples

  31. spout
    source of streams
    Joke about TCP and UDP connections

  32. bolt
    process input stream and produce new one

  33. bolt
    process input stream and produce new one

  34. Topology is a network
    of spouts and bolts
    Directed Acyclic Multi-Graphs

  35. Infrastructure level
    nimbus, supervisors, workers
    Apache ZooKeeper

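  36. // Imports use the Storm 0.9.x package names (org.apache.storm.* since Storm 1.0).
    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import backtype.storm.utils.Utils;
    import java.util.Map;
    import java.util.Random;

    public class RandomSentenceSpout extends BaseRichSpout {
        SpoutOutputCollector _collector;
        Random _rand;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            _collector = collector;
            _rand = new Random();
        }

        @Override
        public void nextTuple() {
            // Throttle the spout, then emit one random sentence per call.
            Utils.sleep(100);
            String[] sentences = new String[] {
                "the cow jumped over the moon",
                "an apple a day keeps the doctor away",
                "four score and seven years ago",
                "snow white and the seven dwarfs",
                "i am at two with nature"};
            String sentence = sentences[_rand.nextInt(sentences.length)];
            _collector.emit(new Values(sentence));
        }

        @Override
        public void ack(Object id) {
        }

        @Override
        public void fail(Object id) {
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // The single output field carries the whole sentence.
            declarer.declare(new Fields("word"));
        }
    }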

  37. public static class WordCount extends BaseBasicBolt {
        // Running count per word, local to this bolt instance.
        Map<String, Integer> counts = new HashMap<String, Integer>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getString(0);
            Integer count = counts.get(word);
            if (count == null) count = 0;
            count++;
            counts.put(word, count);
            collector.emit(new Values(word, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

  38. public class WordCountTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // The last argument of setSpout/setBolt is the parallelism hint.
            builder.setSpout("spout", new RandomSentenceSpout(), 5);
            // SplitSentence is not shown on the slides.
            builder.setBolt("split", new SplitSentence(), 8)
                .shuffleGrouping("spout");
            // Fields grouping sends equal words to the same "count" task.
            builder.setBolt("count", new WordCount(), 12)
                .fieldsGrouping("split", new Fields("word"));

            Config conf = new Config();
            conf.setDebug(true);

            if (args != null && args.length > 0) {
                // A topology name was given: submit to a real cluster.
                conf.setNumWorkers(3);
                StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
            } else {
                // No arguments: run in an in-process local cluster for 10 seconds.
                conf.setMaxTaskParallelism(3);
                LocalCluster cluster = new LocalCluster();
                cluster.submitTopology("word-count", conf, builder.createTopology());
                Thread.sleep(10000);
                cluster.shutdown();
            }
        }
    }

  39. high-level abstraction
    Trident for realtime processing

  40. FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
        new Values("the cow jumped over the moon"),
        new Values("the man went to the store and bought some candy"),
        new Values("four score and seven years ago"),
        new Values("how many apples can you eat"));
    spout.setCycle(true);

    TridentTopology topology = new TridentTopology();
    TridentState wordCounts =
        topology.newStream("spout1", spout)
            .each(new Fields("sentence"), new Split(), new Fields("word"))
            .groupBy(new Fields("word"))
            .persistentAggregate(
                new MemoryMapState.Factory(), new Count(), new Fields("count")
            ).parallelismHint(6);

  41. Where are my
    events after all?

  42. NOWHERE
    unfortunately…

  43. View Slide

  44. +

  45. +

  46. View Slide

  47. DO
    o

  48. DON'T overengineer
    Redis pub/sub

  49. @mariuszgil

  50. THANKS

  51. ( it depends )

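
A note on slide 38: the word-count topology relies on `fieldsGrouping("split", new Fields("word"))` so that every occurrence of a word reaches the same `WordCount` task, which is what lets each task keep its counts in a private map. The plain-Java sketch below imitates that routing without Storm. Picking the task with `hashCode()` modulo the task count is an assumption made for illustration (Storm's actual hashing is an internal detail), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FieldsGroupingSketch {
    /** Picks a task for a word; equal words always map to the same task. */
    static int taskFor(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    /** Splits sentences into words and counts each word in its own task's map. */
    static List<Map<String, Integer>> countPerTask(String[] sentences, int numTasks) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new HashMap<String, Integer>());
        }
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {
                // Route by the grouping field, then update that task's local count.
                tasks.get(taskFor(word, numTasks)).merge(word, 1, Integer::sum);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        String[] sentences = {
            "the cow jumped over the moon",
            "snow white and the seven dwarfs"};
        List<Map<String, Integer>> tasks = countPerTask(sentences, 12);
        int task = taskFor("the", 12);
        System.out.println("count for 'the': " + tasks.get(task).get("the"));
    }
}
```

With shuffle grouping instead, occurrences of one word would be spread over several tasks and no single task would hold the complete count.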