Slide 1

Slide 1 text

PROCESSING the php way of... STORM DATA STREAMS Mariusz Gil

Slide 2

Slide 2 text

about me

Slide 3

Slide 3 text

#php #scalability #nosql #performance #hadoop #hive #pig #bigdata #mahout #datamining #storm

Slide 4

Slide 4 text

the Problem: batch #1, batch #2, batch #3

Slide 5

Slide 5 text

the Story

Slide 6

Slide 6 text

STORM DISTRIBUTED REALTIME COMPUTATION SYSTEM

Slide 7

Slide 7 text

scalable, no data loss, fault tolerant, extremely robust, language agnostic, efficient messaging, local or distributed

Slide 8

Slide 8 text

terms and architecture

Slide 9

Slide 9 text

Spouts, Bolts, Stream, Topologies
Stream: an unbounded sequence of tuples, e.g. (val1, val2), (val3, val4), (val5, val6)

Slide 10

Slide 10 text

Spouts: the sources of streams, emitting tuples such as (val1, val2), (val3, val4), (val5, val6)

Slide 11

Slide 11 text

Bolts: process input streams and produce new streams

Slide 12

Slide 12 text

Topologies: a network of spouts and bolts, e.g. TextSpout [sentence] -> SplitSentenceBolt [word] -> WordCountBolt [word, count]
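The TextSpout, SplitSentenceBolt and WordCountBolt chain on this slide can be pictured without a cluster at all. The following is a minimal plain-Java sketch of that data flow, not Storm code: the class and method names are hypothetical, and a finite list stands in for what would be an unbounded stream.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the [sentence] -> [word] -> [word, count] flow.
// None of these names are Storm API; they only mirror the labels on the slide.
class WordCountFlowSketch {

    // Plays the role of SplitSentenceBolt: one [sentence] in, many [word] out.
    static List<String> splitSentence(String sentence) {
        List<String> words = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Plays the role of WordCountBolt: keeps a running count per word.
    static Map<String, Integer> countWords(List<String> sentences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // The list is a finite stand-in for the tuples a spout would emit.
        for (String sentence : sentences) {
            for (String word : splitSentence(sentence)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of(
            "the cow jumped over the moon",
            "snow white and the seven dwarfs");
        System.out.println(countWords(sentences));
    }
}
```

In a real topology each stage would run as many parallel tasks, which is why the way tuples are routed between stages matters.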

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

storm-kestrel storm-kafka storm-amqp-spout storm-jms storm-pubsub storm-beanstalkd mapr-spout

Slide 15

Slide 15 text

shuffle grouping, fields grouping, all grouping, global grouping, direct grouping, local or shuffle grouping
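A grouping decides which task of a bolt receives each tuple. The sketch below models four of the groupings listed on this slide as plain Java functions that return target task indices. It is a simplified illustration, not Storm's implementation: the class name, method names and the hash-modulo rule are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Simplified model of how a grouping picks target task indices.
// Not Storm's implementation; all names here are hypothetical.
class GroupingSketch {

    // Shuffle grouping: a random task, so load spreads roughly evenly.
    static int shuffle(int numTasks, Random random) {
        return random.nextInt(numTasks);
    }

    // Fields grouping: equal field values always map to the same task.
    // Hash-modulo is an assumption used to illustrate the idea.
    static int fields(Object fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    // All grouping: every task receives a copy of the tuple.
    static List<Integer> all(int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            targets.add(task);
        }
        return targets;
    }

    // Global grouping: the whole stream goes to one task (modelled as index 0).
    static int global() {
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("shuffle -> task " + shuffle(4, new Random()));
        System.out.println("fields  -> task " + fields("storm", 4));
        System.out.println("all     -> tasks " + all(4));
        System.out.println("global  -> task " + global());
    }
}
```

Direct grouping (the producer names the receiving task) and local or shuffle grouping (tasks in the same worker process are preferred) are left out of the sketch. Fields grouping is what makes a word count correct: every occurrence of a given word reaches the same counter.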

Slide 16

Slide 16 text

ZooKeepers, Supervisors, Nimbus

Slide 17

Slide 17 text

fail fast: CLUSTER STATE IS STORED LOCALLY OR IN ZOOKEEPER

Slide 18

Slide 18 text

code examples

Slide 19

Slide 19 text

https://github.com/nathanmarz/storm

Slide 20

Slide 20 text

https://github.com/maltoe/storm-install

Slide 21

Slide 21 text

https://github.com/nathanmarz/storm-starter/

Slide 22

Slide 22 text

https://github.com/lazyshot/storm-php

Slide 23

Slide 23 text

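Java example / bolt

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class DoubleAndTripleBolt extends BaseRichBolt {
    // BaseRichBolt's prepare() receives an OutputCollector, not OutputCollectorBase
    private OutputCollector _collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        int val = input.getInteger(0);
        // anchor the new tuple to the input, then acknowledge the input
        _collector.emit(input, new Values(val * 2, val * 3));
        _collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("double", "triple"));
    }
}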

Slide 24

Slide 24 text

Java example / bolt

public static class ExclamationBolt implements IRichBolt {
    OutputCollector _collector;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    public void execute(Tuple tuple) {
        // append "!!!" to the first field, anchored to the input tuple
        _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
        _collector.ack(tuple);
    }

    public void cleanup() {
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    public Map getComponentConfiguration() {
        return null;
    }
}

Slide 25

Slide 25 text

Java example / topology (words -> exclaim1 -> exclaim2)

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("words", new TestWordSpout(), 10);
builder.setBolt("exclaim1", new ExclamationBolt(), 3)
       .shuffleGrouping("words");
builder.setBolt("exclaim2", new ExclamationBolt(), 2)
       .shuffleGrouping("exclaim1");
...

Slide 26

Slide 26 text

Java example / run

zkServer.sh start
bin/storm nimbus
bin/storm supervisor
bin/storm ui    # optional
storm jar all-my-code.jar backtype.storm.MyTopology arg1 arg2

Slide 27

Slide 27 text

PHP example / spout

<?php
require_once('storm.php');

class RandomSentenceSpout extends ShellSpout
{
    protected $sentences = array(
        "the cow jumped over the moon",
        "an apple a day keeps the doctor away",
        "four score and seven years ago",
        "snow white and the seven dwarfs",
    );

    protected function nextTuple()
    {
        // sleep() takes whole seconds, so sleep(.1) would not pause; usleep() takes microseconds
        usleep(100000);
        $sentence = $this->sentences[rand(0, count($this->sentences) - 1)];
        $this->emit(array($sentence));
    }

    protected function ack($tuple_id)
    {
        return;
    }

    protected function fail($tuple_id)
    {
        return;
    }
}

$SentenceSpout = new RandomSentenceSpout();
$SentenceSpout->run();

Slide 28

Slide 28 text

PHP example / bolt

<?php
require_once('storm.php');

class SplitSentenceBolt extends BasicBolt
{
    public function process(Tuple $tuple)
    {
        // the first field of the incoming tuple is the sentence
        $words = explode(" ", $tuple->values[0]);
        foreach ($words as $word) {
            $this->emit(array($word));
        }
    }
}

$splitsentence = new SplitSentenceBolt();
$splitsentence->run();

Slide 29

Slide 29 text

MultiLang example / Topology, Bolt

/**
 * This topology demonstrates Storm's stream groupings and multilang capabilities.
 */
public class WordCountPHPTopology {
    public static class SplitSentence extends ShellBolt implements IRichBolt {
        public SplitSentence() {
            super("php", "splitsentence.php");
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }

        @Override
        public Map getComponentConfiguration() {
            return null;
        }
    }
    // ...
}

Slide 30

Slide 30 text

NonJVMSpout

commands sent to the spout:
{"command": "next"}
{"command": "ack", "id": "1231231"}
{"command": "fail", "id": "1231231"}

tuple emitted by the spout:
{
    "command": "emit",
    "id": "1231231",
    "stream": "1",
    "task": 9,
    "tuple": ["field1", 2, 3]
}

sent by the spout when it has finished handling a command:
{"command": "sync"}

NonJVMBolt

tuple received by the bolt:
{
    "id": "-6955786537413359385",
    "comp": "1",
    "stream": "1",
    "task": 9,
    "tuple": ["snow white and dwarfs", "field2", 3]
}

tuple emitted by the bolt:
{
    "command": "emit",
    "anchors": ["1231231", "-234234234"],
    "stream": "1",
    "task": 9,
    "tuple": ["field1", 2, 3]
}

https://github.com/nathanmarz/storm/wiki/Multilang-protocol
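These JSON messages travel between Storm and the non-JVM process over stdin and stdout. As far as the Multilang protocol page describes it, each message is followed by a line containing only `end`; treat that delimiter as an assumption to check against the wiki. The sketch below shows one way to frame and read such messages in plain Java. The class and method names are hypothetical and no JSON parsing is attempted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Sketch of message framing for a multilang-style pipe.
// Assumption: each JSON message is terminated by a line containing only "end".
class MultilangFramingSketch {

    // Wraps one JSON message for writing to the pipe.
    static String frame(String json) {
        return json + "\nend\n";
    }

    // Reads lines until the "end" marker and returns the message text,
    // or null if the input closes before a complete message arrives.
    static String readMessage(BufferedReader in) {
        StringBuilder message = new StringBuilder();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.equals("end")) {
                    return message.toString();
                }
                if (message.length() > 0) {
                    message.append('\n');
                }
                message.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        String wire = frame("{\"command\": \"next\"}")
                    + frame("{\"command\": \"ack\", \"id\": \"1231231\"}");
        BufferedReader in = new BufferedReader(new StringReader(wire));
        String message;
        while ((message = readMessage(in)) != null) {
            System.out.println("received: " + message);
        }
    }
}
```

A library such as storm-php hides this framing behind methods like `emit()`, so a spout or bolt script only deals with tuples.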

Slide 31

Slide 31 text

demo

Slide 32

Slide 32 text

use cases

Slide 33

Slide 33 text

stream processing

Slide 34

Slide 34 text

continuous query computation

Slide 35

Slide 35 text

distributed RPC: arguments -> [request-id, arguments] -> [request-id, results] -> results
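The request id is what lets a result be matched back to the caller that is waiting for it. The following plain-Java sketch shows only that bookkeeping: arguments are tagged with an id, a function stands in for the topology, and the result is looked up by the same id. It is not Storm's DRPC API, and every name in it is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of the request-id bookkeeping behind a distributed RPC call.
// Not Storm's DRPC API; a Function stands in for the whole topology.
class DrpcSketch {
    private final Map<Long, String> results = new HashMap<>();
    private long nextRequestId = 0;

    String call(String arguments, Function<String, String> topology) {
        long requestId = nextRequestId++;

        // [request-id, arguments] goes into the topology
        Object[] request = {requestId, arguments};

        // [request-id, results] comes out, carrying the same id
        Object[] response = {request[0], topology.apply((String) request[1])};
        results.put((Long) response[0], (String) response[1]);

        // the caller picks up the result filed under its own request id
        return results.remove(requestId);
    }

    public static void main(String[] args) {
        DrpcSketch drpc = new DrpcSketch();
        System.out.println(drpc.call("storm", word -> word + "!!!"));
    }
}
```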

Slide 36

Slide 36 text

realtime analytics, personalization, search, revenue optimization, monitoring

Slide 37

Slide 37 text

content search, realtime analytics, generating feeds; integrated with Elasticsearch, HBase, Hadoop and HDFS

Slide 38

Slide 38 text

realtime scoring, moments generation, integration with Kafka queues and HDFS storage

Slide 39

Slide 39 text

thanks! feel free to contact me. email: [email protected] twitter: @mariuszgil