Streams processing with Storm

REVISITED version

Mariusz Gil

July 06, 2013


Transcript

  1. Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
  2. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
  3. Stream: an unbounded sequence of tuples, e.g. (val1, val2), (val3, val4), (val5, val6), ...
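The stream abstraction above can be sketched in plain Java with no Storm dependency; this is an illustrative model (the `StreamSketch` class and its sample values are mine, not Storm API), representing an unbounded stream as an infinite iterator of tuples:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Minimal sketch of Storm's stream abstraction: an unbounded
// sequence of tuples, where a tuple is just a list of values.
public class StreamSketch {
    // An infinite iterator standing in for an unbounded stream.
    static Iterator<List<Object>> stream() {
        return new Iterator<List<Object>>() {
            long i = 0;
            public boolean hasNext() { return true; } // never ends
            public List<Object> next() {
                i++;
                // Produces (val1, val2), (val3, val4), ... as on the slide.
                return Arrays.asList((Object) ("val" + (2 * i - 1)), "val" + (2 * i));
            }
        };
    }

    public static void main(String[] args) {
        Iterator<List<Object>> s = stream();
        // A consumer can only ever process a finite prefix of the stream.
        for (int n = 0; n < 3; n++) {
            System.out.println(s.next());
        }
    }
}
```

Spouts and bolts, introduced next, are producers and transformers of exactly this kind of sequence.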
  4. Spouts: sources of streams.
  5. Reliable and unreliable spouts: replay or forget about a tuple.
  6. Spouts (source of streams): Storm-Kafka
  7. Spouts (source of streams): Storm-Kestrel
  8. Spouts (source of streams): Storm-AMQP-Spout
  9. Spouts (source of streams): Storm-JMS
  10. Spouts (source of streams): Storm-PubSub*
  11. Spouts (source of streams): Storm-Beanstalkd-Spout
  12. Bolts process input streams and produce new streams.
  13. Bolts process input streams and produce new streams.
  14. Topologies: networks of spouts and bolts. Example: TextSpout emits [sentence] to SplitSentenceBolt, which emits [word] to WordCountBolt, which emits [word, count]; the same TextSpout [sentence] stream also feeds xyzBolt.
  15. Spouts

    public class RandomSentenceSpout extends BaseRichSpout {
        SpoutOutputCollector _collector;
        Random _rand;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            _collector = collector;
            _rand = new Random();
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100);
            String[] sentences = new String[] {
                "the cow jumped over the moon",
                "an apple a day keeps the doctor away",
                "four score and seven years ago",
                "snow white and the seven dwarfs",
                "i am at two with nature"
            };
            String sentence = sentences[_rand.nextInt(sentences.length)];
            _collector.emit(new Values(sentence));
        }

        @Override
        public void ack(Object id) { }

        @Override
        public void fail(Object id) { }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }
  16. Bolts

    public static class WordCount extends BaseBasicBolt {
        Map<String, Integer> counts = new HashMap<String, Integer>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getString(0);
            Integer count = counts.get(word);
            if (count == null) count = 0;
            count++;
            counts.put(word, count);
            collector.emit(new Values(word, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }
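The counting logic inside execute() can be exercised without a Storm cluster. This is a plain-Java sketch of the same bookkeeping (the `WordCountLogic` class and `process` method names are mine; the lookup/default/increment/store steps mirror the bolt above):

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the WordCount bolt's bookkeeping: a rolling
// count per word, returning the updated count for each input word,
// where the real bolt would emit (word, count) downstream.
public class WordCountLogic {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    // Mirrors execute(): look up, default to 0, increment, store.
    public int process(String word) {
        Integer count = counts.get(word);
        if (count == null) count = 0;
        count++;
        counts.put(word, count);
        return count;
    }

    public static void main(String[] args) {
        WordCountLogic wc = new WordCountLogic();
        for (String w : new String[] {"the", "cow", "the"}) {
            System.out.println(w + " -> " + wc.process(w));
        }
        // prints: the -> 1, cow -> 1, the -> 2
    }
}
```

Note that the map is per-bolt-task state; the fields grouping used in the topology (slide 18) is what makes per-task counting correct.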
  17. Bolts

    public static class ExclamationBolt implements IRichBolt {
        OutputCollector _collector;

        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            _collector = collector;
        }

        public void execute(Tuple tuple) {
            _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
            _collector.ack(tuple);
        }

        public void cleanup() { }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }

        public Map getComponentConfiguration() {
            return null;
        }
    }
  18. Topology

    public class WordCountTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("spout", new RandomSentenceSpout(), 5);
            builder.setBolt("split", new SplitSentence(), 8)
                   .shuffleGrouping("spout");
            builder.setBolt("count", new WordCount(), 12)
                   .fieldsGrouping("split", new Fields("word"));

            Config conf = new Config();
            conf.setDebug(true);

            if (args != null && args.length > 0) {
                conf.setNumWorkers(3);
                StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
            } else {
                conf.setMaxTaskParallelism(3);
                LocalCluster cluster = new LocalCluster();
                cluster.submitTopology("word-count", conf, builder.createTopology());
                Thread.sleep(10000);
                cluster.shutdown();
            }
        }
    }
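The fieldsGrouping("split", new Fields("word")) call routes tuples so that a given word always reaches the same counter task. A hedged sketch of that partitioning idea in plain Java follows; the hash-modulo scheme is the standard way to illustrate fields grouping, not Storm's exact internal routing code, and the class name is mine:

```java
// Sketch of fields grouping: tuples are partitioned among bolt tasks
// by hashing the grouping field, so equal words land on equal tasks.
public class FieldsGroupingSketch {
    // Pick a task index for a word among numTasks parallel bolt tasks.
    static int taskFor(String word, int numTasks) {
        int h = word.hashCode();
        // Normalize the remainder so the index is always in [0, numTasks),
        // even when hashCode() is negative.
        return ((h % numTasks) + numTasks) % numTasks;
    }

    public static void main(String[] args) {
        int tasks = 12; // parallelism of the "count" bolt in the topology
        // The same word is always routed to the same task, so that
        // task's per-word HashMap count stays consistent.
        System.out.println("storm -> task " + taskFor("storm", tasks));
        System.out.println("storm -> task " + taskFor("storm", tasks));
        System.out.println("hadoop -> task " + taskFor("hadoop", tasks));
    }
}
```

By contrast, shuffleGrouping (used between spout and split) distributes tuples randomly, which is fine there because sentence splitting is stateless.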
  19. Bolts

    public static class SplitSentence extends ShellBolt implements IRichBolt {
        public SplitSentence() {
            super("python", "splitsentence.py");
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    # splitsentence.py
    import storm

    class SplitSentenceBolt(storm.BasicBolt):
        def process(self, tup):
            words = tup.values[0].split(" ")
            for word in words:
                storm.emit([word])

    SplitSentenceBolt().run()
  20. Topology: the same WordCountTopology as slide 18, now with SplitSentence implemented as the ShellBolt from slide 19.
  21. RPC, distributed: [request-id, arguments] in, [request-id, results] out.

    public static class ExclaimBolt extends BaseBasicBolt {
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String input = tuple.getString(1);
            collector.emit(new Values(tuple.getValue(0), input + "!"));
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("id", "result"));
        }
    }

    public static void main(String[] args) throws Exception {
        LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("exclamation");
        builder.addBolt(new ExclaimBolt(), 3);
        Config conf = new Config(); // missing from the original listing
        LocalDRPC drpc = new LocalDRPC();
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("drpc-demo", conf, builder.createLocalTopology(drpc));
        System.out.println("Results for 'hello': " + drpc.execute("exclamation", "hello"));
        cluster.shutdown();
        drpc.shutdown();
    }
  22. Storm-YARN enables Storm applications to utilize the computational resources of a Hadoop cluster, along with access to Hadoop storage resources such as HBase and HDFS.