Processing events at scale

Processing (near) real-time data streams usually turns out to be a difficult task. Events arrive fast: hundreds, thousands, or tens of thousands per second. The logic behind each event can be complex, time-consuming, or both, so executing it inside the HTTP request-response flow is often not the best approach. Fortunately, there are several ways to support event processing in our applications.

During this talk, I would like to introduce some basic concepts behind distributing event processing across server clusters. I am going to briefly cover queue systems based on RabbitMQ, where one can store and route messages between producers and consumers, and distributed real-time computation systems like Apache Storm, where you can build complex topologies and process even a million tuples per second per node. Technology is important, but what seems even more important is moving the center of gravity of event processing from the HTTP request-response flow to a separate layer that can be scaled on its own.
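The idea of moving work out of the request-response flow can be sketched in plain Java. This is only an illustration under stated assumptions: an in-memory `BlockingQueue` stands in for a broker such as RabbitMQ, and the class name, the `POISON_PILL` marker and the upper-casing "work" are all hypothetical placeholders rather than anything from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueDecouplingSketch {

    // Hypothetical marker message that tells a worker to stop.
    private static final String POISON_PILL = "__STOP__";

    /**
     * Publishes every event to an in-memory queue (a stand-in for a real
     * broker) and lets workerCount background workers consume them.
     * Returns the processed results in completion order.
     */
    public static List<String> process(List<String> events, int workerCount)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);

        // Consumer side: each worker blocks on the queue until work arrives.
        for (int i = 0; i < workerCount; i++) {
            workers.submit(() -> {
                try {
                    while (true) {
                        String event = queue.take();
                        if (POISON_PILL.equals(event)) {
                            return;
                        }
                        // Placeholder for slow or complex event logic.
                        results.add(event.toUpperCase());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producer side: the "request handler" only enqueues and moves on.
        for (String event : events) {
            queue.put(event);
        }
        // One stop marker per worker so that every worker terminates.
        for (int i = 0; i < workerCount; i++) {
            queue.put(POISON_PILL);
        }

        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> out = process(Arrays.asList("tweet-1", "tweet-2", "tweet-3"), 2);
        System.out.println("processed " + out.size() + " events");
    }
}
```

The producer never waits for the processing itself, so the consumer side can be scaled by changing the worker count without touching the code that accepts events.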

Additionally, we will talk about storing data stream events. Sharded SQL databases or basic Hadoop-based tools are good, but there are dedicated tools on the market, like Druid, where we can store and aggregate billions of events without any problem.
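To make "store and aggregate" more concrete, here is a minimal sketch of time-bucketed roll-up, the general kind of pre-aggregation that event stores are built around. It is not Druid's API; the `Event` type, the `page` dimension and the `minute|page` key format are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RollupSketch {

    /** Hypothetical event: epoch milliseconds plus one dimension value. */
    public static final class Event {
        final long timestampMillis;
        final String page;

        public Event(long timestampMillis, String page) {
            this.timestampMillis = timestampMillis;
            this.page = page;
        }
    }

    /** Counts events per (minute bucket, page); keys look like "minute|page". */
    public static Map<String, Long> rollupByMinute(List<Event> events) {
        Map<String, Long> counts = new TreeMap<>();
        for (Event e : events) {
            long minute = e.timestampMillis / 60_000L; // truncate to the minute
            counts.merge(minute + "|" + e.page, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(0L, "/home"),
                new Event(30_000L, "/home"),
                new Event(61_000L, "/home"),
                new Event(62_000L, "/about"));
        // prints {0|/home=2, 1|/about=1, 1|/home=1}
        System.out.println(rollupByMinute(events));
    }
}
```

Four raw events collapse into three aggregated rows here; with real traffic the reduction is what keeps billions of events queryable.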

Mariusz Gil

January 30, 2015

Transcript

  1. 3.
  2. 4.
  3. 7.
  4. 14.
  5. 20.
  6. 21.

    <?php

    namespace Acme\DemoBundle\Controller;

    use Symfony\Bundle\FrameworkBundle\Controller\Controller;

    class TweetController extends Controller
    {
        public function newTweetAction()
        {
            // ...

            // EXAMPLE AND VERY NAIVE IMPLEMENTATION
            $form->handleRequest($request);

            if ($form->isValid()) {
                $this->get('tweet_feed_producer')->publish(array(
                    'user' => $user,
                    'tweet' => 'Lorem ipsum dolor sit amet...'
                ));
            }

            // ...
        }
    }
  7. 22.

    <?php

    namespace Acme\DemoBundle\Consumer;

    use OldSound\RabbitMqBundle\RabbitMq\ConsumerInterface;
    use PhpAmqpLib\Message\AMQPMessage;

    class TweetFeedsConsumer implements ConsumerInterface
    {
        public function execute(AMQPMessage $msg)
        {
            // ...

            // EXAMPLE AND VERY NAIVE IMPLEMENTATION
            $friends = $user->getFriends();

            foreach ($friends as $friend) {
                $friend->getFeed()->push($tweet);
            }

            return true;
        }
    }
  8. 25.
  9. 29.

    Use cases:

    - realtime analytics
    - online machine learning
    - continuous computations
    - distributed RPC

    Storm's small set of primitives satisfy a stunning number of use cases.
  10. 36.

    public class RandomSentenceSpout extends BaseRichSpout {
        SpoutOutputCollector _collector;
        Random _rand;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            _collector = collector;
            _rand = new Random();
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100);
            String[] sentences = new String[] {
                "the cow jumped over the moon",
                "an apple a day keeps the doctor away",
                "four score and seven years ago",
                "snow white and the seven dwarfs",
                "i am at two with nature"};
            String sentence = sentences[_rand.nextInt(sentences.length)];
            _collector.emit(new Values(sentence));
        }

        @Override
        public void ack(Object id) {
        }

        @Override
        public void fail(Object id) {
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }
  11. 37.

    public static class WordCount extends BaseBasicBolt {
        Map<String, Integer> counts = new HashMap<String, Integer>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getString(0);
            Integer count = counts.get(word);
            if (count == null)
                count = 0;
            count++;
            counts.put(word, count);
            collector.emit(new Values(word, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }
  12. 38.

    public class WordCountTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            builder.setSpout("spout", new RandomSentenceSpout(), 5);
            builder.setBolt("split", new SplitSentence(), 8)
                   .shuffleGrouping("spout");
            builder.setBolt("count", new WordCount(), 12)
                   .fieldsGrouping("split", new Fields("word"));

            Config conf = new Config();
            conf.setDebug(true);

            if (args != null && args.length > 0) {
                conf.setNumWorkers(3);
                StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
            } else {
                conf.setMaxTaskParallelism(3);
                LocalCluster cluster = new LocalCluster();
                cluster.submitTopology("word-count", conf, builder.createTopology());
                Thread.sleep(10000);
                cluster.shutdown();
            }
        }
    }
  13. 40.

    FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
        new Values("the cow jumped over the moon"),
        new Values("the man went to the store and bought some candy"),
        new Values("four score and seven years ago"),
        new Values("how many apples can you eat"));
    spout.setCycle(true);

    TridentTopology topology = new TridentTopology();
    TridentState wordCounts = topology.newStream("spout1", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(
            new MemoryMapState.Factory(),
            new Count(),
            new Fields("count")
        ).parallelismHint(6);
  14. 43.
  15. 44.

    +

  16. 45.

    +

  17. 46.
  18. 47.
  19. 50.
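
The word-count topology in the transcript connects the "count" bolt with `fieldsGrouping("split", new Fields("word"))`. The reason this matters can be sketched without Storm: if tuples are routed by a hash of the grouping field, every occurrence of a word reaches the same task, so each task's local counter is complete for its words. The routing function below is an assumption chosen for illustration, not Storm's actual implementation, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldsGroupingSketch {

    /** Picks a task index for a word; the same word always maps to the same task. */
    public static int taskFor(String word, int taskCount) {
        return Math.floorMod(word.hashCode(), taskCount);
    }

    /** Splits sentences into words and counts them in per-task maps. */
    public static List<Map<String, Integer>> countPerTask(List<String> sentences, int taskCount) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(new HashMap<>());
        }
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {
                // Route by the grouping field, then update that task's local state.
                tasks.get(taskFor(word, taskCount)).merge(word, 1, Integer::sum);
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> tasks = countPerTask(
                Arrays.asList("the cow jumped over the moon", "four score and seven years ago"), 3);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("task " + i + ": " + tasks.get(i));
        }
    }
}
```

With a shuffle-style (random) routing, the same word could land on several tasks and each would hold only a partial count, which is why the transcript uses shuffle grouping for the stateless "split" step and fields grouping for the stateful "count" step.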