Building a Firehose
Ian Barber
May 24, 2012
My talk about dealing with streaming data in PHP, from PHP Tek 12.
Transcript
ian barber - [email protected] - @ianbarber
https://github.com/ianbarber/Firehose-PHP-Talk
BUILDING A FIREHOSE
FILTERABLE REAL TIME STREAMING DATA
SELLING DATA ANALYSIS & DECISIONS USER TOOLS $£¥ ☑☒
DATA SOURCES COMPOSE latency AUGMENT STORE FILTER STREAM
EVENT SAMPLE order tweet temperature snapshot
Data Source Data Source Data Source Output
Data Source Data Source Data Source Output Output
Data Source Data Source Data Source Output Messaging Batch HTTP Logs HTTP Chunked Websockets Batched POST
APACHE PHP APACHE PHP NODE.JS PUSH ZEROMQ PULL HTTP POST WEBSOCKETS
APACHE PHP APACHE PHP HTTP POST
location.php
    function sendPos() {
        navigator.geolocation.getCurrentPosition(function(pos) {
            $.ajax({
                type: 'POST',
                url: 'http://firehose.com/input.php',
                data: {lat: pos.coords.latitude, lon: pos.coords.longitude}
            });
        });
        setTimeout(sendPos, 60000);
    }
    sendPos();
APACHE PHP APACHE PHP PUSH ZEROMQ WEBSOCKETS
input.php
    $ctx = new ZMQContext();
    $sock = $ctx->getSocket(ZMQ::SOCKET_PUSH);
    $sock->connect("tcp://localhost:5566");
    $data = array(
        'id'  => get_next_msg_id(),
        'uid' => $_COOKIE['uid'],
        'lat' => $_POST['lat'],
        'lon' => $_POST['lon']
    );
    $sock->send(json_encode($data));
APACHE PHP APACHE PHP NODE.JS ZEROMQ PULL WEBSOCKETS
output.js
    // handler (the HTTP request callback) is not shown on the slide
    var app = require('http').createServer(handler),
        io = require('socket.io').listen(app),
        zmq = require('zmq'),
        sock = zmq.socket('pull');
    app.listen(8080);
    sock.bind('tcp://*:5566');
    sock.on('message', function (msg) {
        var data = JSON.parse(msg);
        // send to all clients
        io.sockets.emit("position", data);
    });
PHP DAEMON PHP DAEMON NODE.JS PUSH ZEROMQ PULL HTTP STREAM WEBSOCKETS
twitter.php
    // $user, $pass, $search and the ZeroMQ PUSH socket $sock are set up elsewhere
    $fh = fopen("https://" . $user . ":" . $pass .
        "@stream.twitter.com/1/statuses/filter.json?track=" . $search, "r");
    while (!feof($fh)) {
        $d = fgets($fh);
        // skip lines too short to be a message
        if (strlen($d) > 4) {
            $sock->send($d);
        }
    }
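The loop in twitter.php depends on the stream delivering one JSON document per line. The sketch below runs the same read loop against an in-memory stream, assuming that very short lines are keep-alives rather than messages. The `read_messages` helper and the sample payloads are hypothetical, and `php://memory` stands in for the HTTP connection.

```php
<?php
// Hypothetical helper: reads a line-delimited stream and returns the lines
// that look like messages. The strlen() > 4 check mirrors the slide.
function read_messages($fh) {
    $messages = array();
    while (!feof($fh)) {
        $line = fgets($fh);
        if ($line === false) {
            break; // end of stream or read error
        }
        // Assumption: blank keep-alive lines are this short; real messages are not.
        if (strlen($line) > 4) {
            $messages[] = rtrim($line, "\r\n");
        }
    }
    return $messages;
}

// An in-memory stream stands in for the streaming HTTP response.
$fh = fopen('php://memory', 'r+');
fwrite($fh, "{\"text\":\"hello\"}\r\n\r\n{\"text\":\"world\"}\r\n");
rewind($fh);
$messages = read_messages($fh);
fclose($fh);
print_r($messages);
```

In the talk's pipeline each collected line would be pushed onto the ZeroMQ socket rather than gathered into an array.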
Data Source Data Source Output Assemble Process Process
SOURCE ASSEMBLE PHP PHP ZEROMQ PUB SUB SUB SUB REDIS ZEROMQ PUSH
SOURCE PHP ZEROMQ PUB SUB REDIS
augmentor.php
    $ctx = new ZMQContext();
    $sub = $ctx->getSocket(ZMQ::SOCKET_SUB);
    $sub->setSockOpt(ZMQ::SOCKOPT_SUBSCRIBE, "");
    $sub->connect("tcp://localhost:5577");
    // $redis and $obj (the resource passed to augment()) are set up elsewhere
    while ($dat = $sub->recv()) {
        $msg = json_decode($dat, true);
        $aug = augment($msg, $obj);
        // results are queued under the message id for the reassembler
        $redis->lpush($msg['id'], json_encode($aug));
    }
SOURCE PHP REDIS DB
starbucks.php
    $mongo = new Mongo();
    $collection = $mongo->starbucks->locations;
    function augment($data, $collection) {
        $loc = array((float) $data['lon'], (float) $data['lat']);
        // nearest stored location to the reported position
        $res = $collection->findOne(array(
            'loc' => array('$near' => $loc)));
        return array('name' => 'starbucks', 'val' => $res['street']);
    }
SOURCE PHP REDIS
langdetect.php
    $ld = new Text_LanguageDetect();
    $ld->setNameMode(2);
    function augment($data, $ld) {
        /* ["en"]=> float(0.24702222222222) */
        $names = $ld->detect($data['text'], 1);
        return array('name' => 'lang', 'val' => key($names));
    }
SOURCE ASSEMBLE PHP PHP REDIS PHP ZOOKEEPER COUNT OF SERVICES
    $zk = new Zookeeper();
    $zk->connect("localhost:2181");
PHP ZOOKEEPER
augmentor.php
    $zk->create(
        $path . "/" . uniqid(),
        NULL,
        array(array(
            "perms"  => Zookeeper::PERM_ALL,
            "scheme" => "world",
            "id"     => "anyone")),
        Zookeeper::EPHEMERAL);
REASSEMBLE SOURCE REDIS ZOOKEEPER COUNT
    define("TIMEOUT", 5);
    $ch = $zk->getChildren("/services");
    $servs = count($ch);
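REASSEMBLE COUNT
reassemble.php
    while ($raw = $sub->recv()) {
        $dat = json_decode($raw, true);
        $dat['aug'] = array();
        $time = TIMEOUT; // time budget per message, in seconds
        do {
            $start = microtime(true);
            // BRPOP takes whole seconds; a timeout of 0 would block indefinitely
            $aug = $redis->brpop($dat['id'], (int) ceil($time));
            if (count($aug)) $dat['aug'][] = $aug;
            $time -= microtime(true) - $start;
        } while ($time > 0 && count($dat['aug']) != $servs);
        $out->send(json_encode($dat)); // forward
    }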
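The reassembly loop is a collect-until-complete-or-timeout pattern: keep popping results until every registered service has answered or the time budget runs out, then forward whatever arrived. A minimal sketch of that pattern follows, assuming a blocking pop can be modelled as a callable. The `collect` function, the `$pop` closure, and the sample values are hypothetical stand-ins for the Redis call and the augmentor output.

```php
<?php
// Collect up to $expected results, giving up once $budget seconds have elapsed.
// $pop is a hypothetical stand-in for a blocking pop such as BRPOP: it receives
// the remaining time and returns a value, or null if nothing arrived.
function collect(callable $pop, $expected, $budget) {
    $results = array();
    $remaining = $budget;
    do {
        $start = microtime(true);
        $item = $pop($remaining);
        if ($item !== null) {
            $results[] = $item;
        }
        // Charge the time spent waiting against the budget.
        $remaining -= microtime(true) - $start;
    } while ($remaining > 0 && count($results) < $expected);
    return $results;
}

// In-memory queue standing in for the Redis list of augmentations.
$queue = array('lang=en', 'starbucks=Main St');
$pop = function ($timeout) use (&$queue) {
    if (count($queue) === 0) {
        usleep(10000); // pretend to block briefly, then report nothing
        return null;
    }
    return array_shift($queue);
};

$complete = collect($pop, 2, 1.0);  // both results are already queued
$partial  = collect($pop, 1, 0.05); // queue is now empty, so the budget runs out
```

The second call shows the degraded case: a slow or dead augmentor delays the message by at most the budget, and the message is still forwarded with the augmentations that did arrive.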
Data Source Data Source Output Assemble Process Process Filter Filter Filter
FILTER ELASTIC SEARCH QUERY - NAME QUERY - NAME MSG ? ? MSG ZEROMQ SUB ZEROMQ PUB TOPIC MSG TOPIC MSG TOPIC MSG ZEROMQ PULL QUERY - NAME MSG HTTP / REST
ELASTIC SEARCH QUERY - NAME QUERY - NAME MSG MSG HTTP / REST
elasticsearch.php
    function escall($server, $path, $param) {
        // $param carries the HTTP context options (e.g. 'content')
        $context = stream_context_create(
            array('http' => $param));
        $result = file_get_contents(
            $server . '/' . $path, NULL, $context);
        // decode to arrays so callers can index the result
        return json_decode($result, true);
    }
ELASTIC SEARCH QUERY - NAME QUERY - NAME MSG MSG HTTP / REST
elasticsearch.php
    function percolate($host, $tweet) {
        $path = "twitter/tweet/_percolate";
        $tweet = array('doc' => array(
            'tweet' => $tweet['text']));
        $match = escall($host, $path,
            array('content' => json_encode($tweet)));
        // names of the stored queries this message matched
        return $match['matches'];
    }
FILTER MSG ZEROMQ PULL QUERY - NAME ZEROMQ SUB
elasticsearch.php
    // snip... creating in, ctl, out ZMQ socks
    $poll = new ZMQPoll();
    $poll->add($in, ZMQ::POLL_IN);
    $poll->add($ctl, ZMQ::POLL_IN);
    $read = $write = array();
elasticsearch.php
    while (true) {
        $ev = $poll->poll($read, $write, -1);
        // handle every readable socket, not just the first
        foreach ($read as $sock) {
            if ($sock === $in) {
                $raw = $in->recv();
                $msg = json_decode($raw, true);
                $matches = percolate($host, $msg);
                foreach ($matches as $match) {
                    // first frame is the topic, second the original message
                    $out->sendMulti(array($match, $raw));
                }
            } else if ($sock === $ctl) {
                $q = json_decode($ctl->recv(), true);
                $name = $q['name'];
                $query = $q['query'];
                add_query($host, $name, $query);
            }
        }
    }
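Percolation inverts the usual search flow: the queries are stored ahead of time, and each incoming message is checked against all of them to find which named queries it satisfies. The sketch below illustrates only that idea with an in-memory keyword match. It is not how Elasticsearch evaluates queries, and `register_query`, `match_queries`, and the sample queries are hypothetical.

```php
<?php
// Stored queries, keyed by name. Each "query" here is just a keyword; this is
// an illustrative stand-in, not Elasticsearch's query language.
$queries = array();

function register_query(array &$queries, $name, $keyword) {
    $queries[$name] = strtolower($keyword);
}

// Return the names of every stored query that the text matches.
function match_queries(array $queries, $text) {
    $matches = array();
    $text = strtolower($text);
    foreach ($queries as $name => $keyword) {
        if (strpos($text, $keyword) !== false) {
            $matches[] = $name;
        }
    }
    return $matches;
}

register_query($queries, 'php-fans', 'php');
register_query($queries, 'coffee', 'starbucks');

$topics = match_queries($queries, 'Talking about PHP at tek12');
// Each matching name becomes the topic the message is published under.
foreach ($topics as $topic) {
    echo $topic, "\n";
}
```

Using the query name as the publish topic is what lets each subscriber receive only the messages that matched its own filter.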
Data Source Output Queue Process Filter Data Store Data Store
STORE PHP KAFKA TOPIC TOPIC 1 2 3 4 1 2 3 4 SUB APACHE PHP CLIENT HTTP GET TOPIC - OFFSET
PHP KAFKA TOPIC TOPIC 1 2 3 4 1 2 3 4 SUB
kafkastore.php
    $k = new Kafka_Producer("localhost", 9092);
    while ($data = $in->recvMulti()) {
        $topic = $data[0];
        $msg = $data[1];
        $bytes = $k->send(array($msg), $topic);
    }
KAFKA TOPIC TOPIC 1 2 3 4 1 2 3 4 APACHE PHP CLIENT GET
kafkaconsume.php
    // $top (topic), $os (client-supplied start offset) and $max are set elsewhere
    $consumer = new Kafka_SimpleConsumer(
        'localhost', 9092, 1, $max);
    $offset = $os;
    do {
        $msgs = $consumer->fetch(
            new Kafka_FetchRequest($top, 0, $offset, $max));
        foreach ($msgs as $msg) echo $msg->payload(), "\n";
        // advance past the bytes just read so the next fetch continues from there
        $offset += $msgs->validBytes();
    } while ($msgs->validBytes() > 0);
    echo json_encode(array("offset" => $offset));
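The consumer slide hands the final offset back to the client, so the client can resume from that point on its next GET and the server keeps no per-client state. A minimal sketch of offset-based reading follows, assuming a log of newline-delimited messages addressed by byte position. The `fetch_from` function and the in-memory `$log` are hypothetical and do not use the Kafka client API.

```php
<?php
// Hypothetical in-memory log: newline-delimited messages in one string, so
// offsets are byte positions, in the spirit of the validBytes() loop.
function fetch_from($log, $offset, $maxBytes) {
    $chunk = (string) substr($log, $offset, $maxBytes);
    // Only return whole messages: cut at the last newline in the chunk.
    $end = strrpos($chunk, "\n");
    if ($end === false) {
        return array('messages' => array(), 'offset' => $offset);
    }
    $valid = substr($chunk, 0, $end + 1);
    return array(
        'messages' => explode("\n", rtrim($valid, "\n")),
        'offset'   => $offset + strlen($valid),
    );
}

$log = "one\ntwo\nthree\n";
$first  = fetch_from($log, 0, 9);                // a partial third message is left for later
$second = fetch_from($log, $first['offset'], 9); // resumes where the first fetch stopped
$third  = fetch_from($log, $second['offset'], 9);
```

A fetch that returns no messages leaves the offset unchanged, which corresponds to the `validBytes() > 0` exit condition in the consumer loop.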
OPS
JSON & MSGPACK
    $data = array('id' => 1, 'a' => 'a', 'b' => 'xyz',
                  'c' => array(1, 2, "abcdefg", array(5, 7, 8)));
    $enc = json_encode($data);
    var_dump(json_decode($enc));
    $enc = msgpack_pack($data);
    var_dump(msgpack_unpack($enc));
JSON MSGPACK MSGPACK JSON
Data Source Output Queue Process Filter Data Store Tap Trace Trace Trace
Data Source Output See Also: http://slidesha.re/JaWE78
ian barber - [email protected] - @ianbarber
https://github.com/ianbarber/Firehose-PHP-Talk
THANKS!