Slide 1

Slide 1 text

+Ian Barber - [email protected] - @ianbarber https://github.com/ianbarber/Firehose-PHP-Talk BUILDING A FIREHOSE

Slide 2

Slide 2 text

FILTERABLE REAL TIME STREAMING DATA

Slide 3

Slide 3 text

SELLING DATA ANALYSIS & DECISIONS USER TOOLS $£¥ ☑☒

Slide 4

Slide 4 text

DATA SOURCES COMPOSE latency AUGMENT STORE FILTER STREAM

Slide 5

Slide 5 text

EVENT SAMPLE order tweet temperature snapshot

Slide 6

Slide 6 text

Data Source Data Source Data Source Output

Slide 7

Slide 7 text

Data Source Data Source Data Source Output Output

Slide 8

Slide 8 text

Data Source Data Source Data Source Output Messaging Batch HTTP Logs HTTP Chunked Websockets Batched POST

Slide 9

Slide 9 text

APACHE PHP APACHE PHP NODE.JS PUSH ZEROMQ PULL HTTP POST WEBSOCKETS

Slide 10

Slide 10 text

APACHE PHP APACHE PHP HTTP POST

location.php:

    function sendPos() {
      navigator.geolocation.getCurrentPosition(
        function(pos) {
          $.ajax({
            type: 'POST',
            url: 'http://firehose.com/input.php',
            data: {lat: pos.coords.latitude,
                   lon: pos.coords.longitude}});
        });
      setTimeout(sendPos, 60000);
    }
    sendPos();

Slide 11

Slide 11 text

APACHE PHP APACHE PHP PUSH ZEROMQ WEBSOCKETS

input.php:

    $ctx = new ZMQContext();
    $sock = $ctx->getSocket(ZMQ::SOCKET_PUSH);
    $sock->connect("tcp://localhost:5566");
    $data = array(
      'id'  => get_next_msg_id(),
      'uid' => $_COOKIE['uid'],
      'lat' => $_POST['lat'],
      'lon' => $_POST['lon']
    );
    $sock->send(json_encode($data));

Slide 12

Slide 12 text

APACHE PHP APACHE PHP NODE.JS ZEROMQ PULL WEBSOCKETS

output.js:

    // handler is the HTTP request handler (not shown on the slide)
    var app = require('http').createServer(handler),
        io = require('socket.io').listen(app),
        zmq = require('zmq'),
        sock = zmq.socket('pull');
    app.listen(8080);
    sock.bind('tcp://*:5566');
    sock.on('message', function (msg) {
      var data = JSON.parse(msg);
      // send to all clients
      io.sockets.emit("position", data);
    });

Slide 13

Slide 13 text

PHP DAEMON PHP DAEMON NODE.JS PUSH ZEROMQ PULL HTTP STREAM WEBSOCKETS

twitter.php:

    $fh = fopen("https://" . $user . ":" . $pass .
      "@stream.twitter.com/1/statuses/filter.json?track=" . $search, "r");
    while(!feof($fh)) {
      $d = fgets($fh);
      if(strlen($d) > 4) {
        $sock->send($d);
      }
    }
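The loop in twitter.php forwards each line of the HTTP stream and drops very short lines, which on a line-delimited streaming API are typically keep-alive newlines. A minimal sketch of the same loop against an in-memory stream, so it runs without Twitter credentials or ZeroMQ. The `forward_lines` helper and the `$send` callback are hypothetical stand-ins for the slide's `$sock->send()`, not part of the talk's code.

```php
<?php
// Hypothetical helper: read a line-delimited stream and pass every
// non-trivial line to $send (stand-in for $sock->send() on the slide).
function forward_lines($fh, callable $send, int $minLen = 4): int
{
    $count = 0;
    while (!feof($fh)) {
        $line = fgets($fh);
        // fgets() returns false at EOF; keep-alives are bare "\r\n" lines.
        if ($line !== false && strlen($line) > $minLen) {
            $send($line);
            $count++;
        }
    }
    return $count;
}

// Usage with an in-memory stream instead of the live HTTP connection.
$fh = fopen('php://memory', 'r+');
fwrite($fh, "{\"text\":\"hello\"}\r\n\r\n{\"text\":\"world\"}\r\n");
rewind($fh);

$sent = array();
$n = forward_lines($fh, function ($line) use (&$sent) {
    $sent[] = rtrim($line);
});
fclose($fh);
```

Passing the sender in as a callback keeps the read loop independent of the transport, so the same function could feed a ZeroMQ PUSH socket in the daemon.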

Slide 14

Slide 14 text

Data Source Data Source Output Assemble Process Process

Slide 15

Slide 15 text

SOURCE ASSEMBLE PHP PHP ZEROMQ PUB SUB SUB SUB REDIS ZEROMQ PUSH

Slide 16

Slide 16 text

SOURCE PHP ZEROMQ PUB SUB REDIS

augmentor.php:

    $ctx = new ZMQContext();
    $sub = $ctx->getSocket(ZMQ::SOCKET_SUB);
    $sub->setSockOpt(ZMQ::SOCKOPT_SUBSCRIBE, "");
    $sub->connect("tcp://localhost:5577");
    // $redis and $obj (the augmentor's resource) are set up elsewhere
    while( $dat = $sub->recv() ) {
      $msg = json_decode($dat, true);
      $aug = augment($msg, $obj);
      // results are queued under the message id for the reassembler
      $redis->lpush($msg['id'], json_encode($aug));
    }

Slide 17

Slide 17 text

SOURCE PHP REDIS DB

starbucks.php:

    $mongo = new Mongo();
    $collection = $mongo->starbucks->locations;

    function augment($data, $collection) {
      $loc = array((float) $data['lon'], (float) $data['lat']);
      // nearest stored location to the event's position
      $res = $collection->findOne(array(
        'loc' => array('$near' => $loc)));
      return array('name' => 'starbucks',
                   'val' => $res['street']);
    }
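augment() on the slide asks MongoDB for the stored location nearest to the event. A sketch of what that lookup amounts to, using a plain PHP array and great-circle distance in place of the geospatial index. The `nearest_location` helper and the sample locations are made up for illustration, and this is not how MongoDB evaluates `$near` internally.

```php
<?php
// Great-circle distance in kilometres between two lat/lon points.
function haversine_km(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $r = 6371.0; // mean Earth radius in km
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 2 * $r * asin(sqrt($a));
}

// Hypothetical stand-in for findOne(array('loc' => array('$near' => ...))).
function nearest_location(array $data, array $locations): ?array
{
    $best = null;
    $bestDist = INF;
    foreach ($locations as $loc) {
        list($lon, $lat) = $loc['loc']; // stored as [lon, lat], as on the slide
        $d = haversine_km((float) $data['lat'], (float) $data['lon'], $lat, $lon);
        if ($d < $bestDist) {
            $bestDist = $d;
            $best = $loc;
        }
    }
    return $best;
}

function augment(array $data, array $locations): array
{
    $res = nearest_location($data, $locations);
    return array('name' => 'starbucks', 'val' => $res ? $res['street'] : null);
}

// Made-up sample data.
$locations = array(
    array('street' => 'High Street', 'loc' => array(-0.10, 51.50)),
    array('street' => 'Station Road', 'loc' => array(-1.90, 52.48)),
);
$aug = augment(array('lat' => '51.51', 'lon' => '-0.12'), $locations);
```

A linear scan like this only suits a handful of points; the database index is what makes the lookup cheap enough to run per event.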

Slide 18

Slide 18 text

SOURCE PHP REDIS

langdetect.php:

    $ld = new Text_LanguageDetect();
    $ld->setNameMode(2);

    function augment($data, $ld) {
      /* ["en"]=> float(0.24702222222222) */
      $names = $ld->detect($data['text'], 1);
      return array('name' => 'lang',
                   'val' => key($names));
    }

Slide 19

Slide 19 text

SOURCE ASSEMBLE PHP PHP REDIS PHP ZOOKEEPER COUNT OF SERVICES

    $zk = new Zookeeper();
    $zk->connect("localhost:2181");

Slide 20

Slide 20 text

PHP ZOOKEEPER

augmentor.php:

    $zk->create(
      $path . "/" . uniqid(),
      NULL,
      array( array(
        "perms" => Zookeeper::PERM_ALL,
        "scheme" => "world",
        "id" => "anyone")),
      Zookeeper::EPHEMERAL);

Slide 21

Slide 21 text

REASSEMBLE SOURCE REDIS ZOOKEEPER COUNT

    define("TIMEOUT", 5);
    $ch = $zk->getChildren("/services");
    $servs = count($ch);

Slide 22

Slide 22 text

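REASSEMBLE COUNT

reassemble.php:

    while($raw = $sub->recv()) {
      $dat = json_decode($raw, true);
      $dat['aug'] = array();
      $time = TIMEOUT; // per-message budget in seconds
      do {
        $start = microtime(true);
        // BRPOP takes whole seconds on older Redis servers, and 0
        // means block forever, so round the remaining time up
        $aug = $redis->brpop($dat['id'], (int) ceil($time));
        if(!empty($aug)) $dat['aug'][] = $aug;
        $time -= microtime(true) - $start;
      } while($time > 0 && count($dat['aug']) != $servs);
      $out->send(json_encode($dat)); // forward
    }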
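The reassembly loop waits for one result per registered service but gives up once a shared time budget is spent, so a slow or dead augmentor delays a message by at most the timeout. A minimal sketch of that budget logic in plain PHP. `collect` and the `$pop` callback are hypothetical names, with `$pop` standing in for a blocking pop such as Redis BRPOP.

```php
<?php
// Collect up to $expected results, spending at most $budget seconds in total.
// $pop is a hypothetical stand-in for a blocking pop such as Redis BRPOP:
// it receives the remaining budget and returns a result or null on timeout.
function collect(callable $pop, int $expected, float $budget): array
{
    $results = array();
    while ($budget > 0 && count($results) < $expected) {
        $start = microtime(true);
        $item = $pop($budget);
        if ($item !== null) {
            $results[] = $item;
        }
        // Charge the time spent waiting against the shared budget.
        $budget -= microtime(true) - $start;
    }
    return $results;
}

// Usage: two of three services answer; the third never does.
$queue = array(
    '{"name":"lang","val":"en"}',
    '{"name":"starbucks","val":"High Street"}',
);
$pop = function (float $remaining) use (&$queue) {
    if ($queue) {
        return array_shift($queue);
    }
    // Simulate blocking until the timeout expires.
    usleep((int) (min($remaining, 0.05) * 1e6));
    return null;
};
$got = collect($pop, 3, 0.1);
```

The message is forwarded with whatever arrived in time, which trades completeness for bounded latency.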

Slide 23

Slide 23 text

Data Source Data Source Output Assemble Process Process Filter Filter Filter

Slide 24

Slide 24 text

FILTER ELASTIC SEARCH QUERY - NAME QUERY - NAME MSG ? ? MSG ZEROMQ SUB ZEROMQ PUB TOPIC MSG TOPIC MSG TOPIC MSG ZEROMQ PULL QUERY - NAME MSG HTTP / REST

Slide 25

Slide 25 text

ELASTIC SEARCH QUERY - NAME QUERY - NAME MSG MSG HTTP / REST

elasticsearch.php:

    // $param holds the HTTP context options (e.g. 'content')
    function escall($server, $path, $param) {
      $context = stream_context_create(
        array('http' => $param));
      $result = file_get_contents(
        $server . '/' . ltrim($path, '/'), NULL, $context);
      // decode to an associative array for the callers
      return json_decode($result, true);
    }

Slide 26

Slide 26 text

ELASTIC SEARCH QUERY - NAME QUERY - NAME MSG MSG HTTP / REST

elasticsearch.php:

    // returns the names of the registered queries that match the tweet;
    // escall() must return an associative array
    function percolate($host, $tweet) {
      $path = "/twitter/tweet/_percolate";
      $doc = array('doc' => array(
        'tweet' => $tweet['text']));
      $match = escall($host, $path,
        array('content' => json_encode($doc)));
      return $match['matches'];
    }

Slide 27

Slide 27 text

FILTER MSG ZEROMQ PULL QUERY - NAME ZEROMQ SUB

elasticsearch.php:

    // snip... creating in, ctl, out ZMQ socks
    $poll = new ZMQPoll();
    $poll->add($in, ZMQ::POLL_IN);
    $poll->add($ctl, ZMQ::POLL_IN);
    $read = $write = array();

Slide 28

Slide 28 text

elasticsearch.php:

    while(true) {
      $ev = $poll->poll($read, $write, -1);
      // both sockets may be readable after one poll
      foreach($read as $r) {
        if($r === $in) {
          $raw = $in->recv();
          $msg = json_decode($raw, true);
          $matches = percolate($host, $msg);
          foreach($matches as $match) {
            // first frame is the query name, used as the topic
            $out->sendMulti(array($match, $raw));
          }
        } else if($r === $ctl) {
          $q = json_decode($ctl->recv(), true);
          add_query($host, $q['name'], $q['query']);
        }
      }
    }
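The filter stage inverts a normal search: queries are registered up front under a name, and each incoming message is checked against all of them, with every matching query name becoming the topic frame of an outgoing message. A toy sketch of that flow in plain PHP. `ToyPercolator` and its keyword matching are assumptions made for illustration and say nothing about how Elasticsearch evaluates percolator queries.

```php
<?php
// Toy percolator: each registered query is a name plus a list of keywords,
// and a message matches when its text contains every keyword.
// This is an illustrative stand-in, not Elasticsearch's query evaluation.
class ToyPercolator
{
    private $queries = array();

    public function addQuery(string $name, array $keywords): void
    {
        $this->queries[$name] = array_map('strtolower', $keywords);
    }

    // Returns the names of all registered queries that match the message.
    public function percolate(array $msg): array
    {
        $text = strtolower($msg['text']);
        $matches = array();
        foreach ($this->queries as $name => $keywords) {
            foreach ($keywords as $kw) {
                if (strpos($text, $kw) === false) {
                    continue 2; // a keyword is missing: try the next query
                }
            }
            $matches[] = $name;
        }
        return $matches;
    }
}

// Usage: build the [topic, payload] frames the filter would publish.
$p = new ToyPercolator();
$p->addQuery('php-talks', array('php', 'talk'));
$p->addQuery('coffee', array('coffee'));

$raw = '{"id":1,"text":"Great PHP talk, now for coffee"}';
$frames = array();
foreach ($p->percolate(json_decode($raw, true)) as $name) {
    $frames[] = array($name, $raw);
}
```

Because the query name leads each frame pair, a downstream ZeroMQ SUB socket can subscribe by prefix to just the queries it cares about.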

Slide 29

Slide 29 text

Data Source Output Queue Process Filter Data Store Data Store

Slide 30

Slide 30 text

STORE PHP KAFKA TOPIC TOPIC 1 2 3 4 1 2 3 4 SUB APACHE PHP CLIENT HTTP GET TOPIC - OFFSET

Slide 31

Slide 31 text

PHP KAFKA TOPIC TOPIC 1 2 3 4 1 2 3 4 SUB

kafkastore.php:

    $k = new Kafka_Producer("localhost", 9092);
    while ($data = $in->recvMulti()) {
      $topic = $data[0];
      $msg = $data[1];
      $bytes = $k->send(array($msg), $topic);
    }

Slide 32

Slide 32 text

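KAFKA TOPIC TOPIC 1 2 3 4 1 2 3 4 APACHE PHP CLIENT GET

kafkaconsume.php:

    // $top, $offset and $max come from the client's request (not shown)
    $consumer = new Kafka_SimpleConsumer(
      'localhost', 9092, 1, $max);
    do {
      // fetch from the current offset so each pass advances
      $msgs = $consumer->fetch(
        new Kafka_FetchRequest($top, 0, $offset, $max)
      );
      foreach($msgs as $msg)
        echo $msg->payload(), "\n";
      $offset += $msgs->validBytes();
    } while($msgs->validBytes() > 0);
    // return the new offset so the client can resume from it
    echo json_encode(array("offset" => $offset));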
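The consumer is stateless on the server side: the client supplies a topic and a byte offset, receives every message after that point, and gets back the offset to use next time. A toy sketch of that contract with an in-memory log. `ToyLog` and its newline-delimited record format are assumptions for illustration and do not reflect Kafka's wire or storage format.

```php
<?php
// Toy append-only log addressed by byte offset, as a stand-in for a Kafka
// topic partition. Each record is stored as "<payload>\n".
class ToyLog
{
    private $buf = '';

    public function append(string $payload): void
    {
        $this->buf .= $payload . "\n";
    }

    // Return whole records starting at $offset, reading at most $max bytes,
    // plus the offset the client should ask for next.
    public function fetch(int $offset, int $max): array
    {
        $chunk = (string) substr($this->buf, $offset, $max);
        $end = strrpos($chunk, "\n");
        if ($end === false) {
            return array('messages' => array(), 'offset' => $offset);
        }
        $valid = substr($chunk, 0, $end); // drop any trailing partial record
        return array(
            'messages' => explode("\n", $valid),
            'offset' => $offset + $end + 1,
        );
    }
}

$log = new ToyLog();
foreach (array('one', 'two', 'three') as $m) {
    $log->append($m);
}
$first = $log->fetch(0, 9);                 // client starts from offset 0
$second = $log->fetch($first['offset'], 9); // and resumes where it left off
```

Keeping the offset with the client means a reconnecting consumer can replay from any point without the server tracking who has read what.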

Slide 33

Slide 33 text

OPS

Slide 34

Slide 34 text

JSON & MSGPACK

    $data = array('id'=>1,'a'=>'a','b'=>'xyz',
      'c' => array(1, 2, "abcdefg",
        array(5, 7, 8)));

    $enc = json_encode($data);
    var_dump( json_decode($enc) );

    $enc = msgpack_pack($data);
    var_dump( msgpack_unpack($enc) );

JSON MSGPACK MSGPACK JSON

Slide 35

Slide 35 text

Data Source Output Queue Process Filter Data Store Tap Trace Trace Trace

Slide 36

Slide 36 text

Data Source Output See Also: http://slidesha.re/JaWE78

Slide 37

Slide 37 text

+Ian Barber - [email protected] - @ianbarber https://github.com/ianbarber/Firehose-PHP-Talk THANKS!