Building A Firehose - PHPNW
Ian Barber
October 06, 2012
#phpnw12 version of my talk on building firehose style streaming data systems
Transcript
+Ian Barber - @ianbarber - https://github.com/ianbarber/Firehose-PHP-Talk
BUILDING A FIREHOSE
FILTERABLE REAL TIME STREAMING DATA
SELLING DATA ANALYSIS & DECISIONS USER TOOLS $£¥ ☑☒
DATA SOURCES COMPOSE latency AUGMENT STORE FILTER STREAM
EVENT SAMPLE order tweet temperature snapshot
Data Source Data Source Data Source Output
Data Source Data Source Data Source Output Output
Data Source Data Source Data Source Output Messaging Batch HTTP
Logs HTTP Chunked Websockets Batched POST
APACHE PHP APACHE PHP NODE.JS PUSH ZEROMQ PULL HTTP POST
WEBSOCKETS
APACHE PHP APACHE PHP HTTP POST
location.php:
    function sendPos() {
      navigator.geolocation.getCurrentPosition(function(pos) {
        $.ajax({
          type: 'POST',
          url: 'http://firehose.com/input.php',
          data: {lat: pos.coords.latitude, lon: pos.coords.longitude}
        });
      });
      setTimeout(sendPos, 60000);
    }
    sendPos();
APACHE PHP APACHE PHP PUSH ZEROMQ WEBSOCKETS
input.php:
    $ctx = new ZMQContext();
    $sock = $ctx->getSocket(ZMQ::SOCKET_PUSH);
    $sock->connect("tcp://localhost:5566");
    $data = array(
        'id'  => get_next_msg_id(),
        'uid' => $_COOKIE['uid'],
        'lat' => $_POST['lat'],
        'lon' => $_POST['lon']
    );
    $sock->send(json_encode($data));
APACHE PHP APACHE PHP NODE.JS ZEROMQ PULL WEBSOCKETS
output.js:
    var app = require('http').createServer(handler),
        io = require('socket.io').listen(app),
        zmq = require('zmq'),
        sock = zmq.socket('pull');
    app.listen(8080);
    sock.bind('tcp://*:5566');
    sock.on('message', function (msg) {
      var data = JSON.parse(msg);
      // send to all clients
      io.sockets.emit("position", data);
    });
PHP DAEMON PHP DAEMON NODE.JS PUSH ZEROMQ PULL HTTP STREAM WEBSOCKETS
twitter.php:
    $fh = fopen("https://" . $user . ":" . $pass
        . "@stream.twitter.com/1/statuses/filter.json?track=" . $search, "r");
    while (!feof($fh)) {
        $d = fgets($fh);
        if (strlen($d) > 4) {
            $sock->send($d);
        }
    }
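The twitter.php loop relies on fgets() to hand back one line at a time. On the Node.js side of the pipeline, a streaming HTTP response arrives as arbitrary chunks, so the same idea needs a small line buffer. The following is a minimal sketch of that, not code from the talk: createLineSplitter, onLine and minLength are hypothetical names, and the length check mirrors the strlen($d) > 4 test above (presumably there to skip near-empty keep-alive lines), applied here after the line ending is stripped.

```javascript
// Hypothetical helper: turns arbitrary chunks of a streamed response into
// whole lines and forwards only lines longer than minLength characters.
function createLineSplitter(onLine, minLength = 4) {
  let buffer = '';
  return function push(chunk) {
    buffer += chunk;
    let idx;
    // Emit every complete line currently held in the buffer.
    while ((idx = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, idx).replace(/\r$/, '');
      buffer = buffer.slice(idx + 1);
      if (line.length > minLength) onLine(line);
    }
    // Whatever remains is a partial line; keep it for the next chunk.
  };
}

// Usage: chunks split mid-message, with a blank line in between.
const forwarded = [];
const push = createLineSplitter((line) => forwarded.push(line));
push('{"text":"hel');
push('lo"}\r\n\r\n{"text":');
push('"world"}\r\n');
```

In a real daemon the onLine callback would be the place to call the ZeroMQ send, as $sock->send($d) does in the PHP version.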
Data Source Data Source Output Assemble Process Process
SOURCE ASSEMBLE PHP PHP ZEROMQ PUB SUB SUB SUB REDIS
ZEROMQ PUSH
SOURCE PHP ZEROMQ PUB SUB REDIS
augmentor.php:
    $ctx = new ZMQContext();
    $sub = $ctx->getSocket(ZMQ::SOCKET_SUB);
    $sub->setSockOpt(ZMQ::SOCKOPT_SUBSCRIBE, "");
    $sub->connect("tcp://localhost:5577");
    while ($dat = $sub->recv()) {
        // Decode once so the message id can be used as the Redis key.
        $msg = json_decode($dat, true);
        // $obj is the service's own helper (e.g. a collection or detector).
        $aug = augment($msg, $obj);
        $redis->lpush($msg['id'], json_encode($aug));
    }
SOURCE PHP REDIS DB
starbucks.php:
    $mongo = new Mongo();
    $collection = $mongo->starbucks->locations;

    function augment($data, $collection) {
        $loc = array((float) $data['lon'], (float) $data['lat']);
        // Nearest location to the reported position.
        $res = $collection->findOne(array(
            'loc' => array('$near' => $loc)));
        return array('name' => 'starbucks', 'val' => $res['street']);
    }
SOURCE PHP REDIS
langdetect.php:
    $ld = new Text_LanguageDetect();
    $ld->setNameMode(2);

    function augment($data, $ld) {
        /* ["en"]=> float(0.24702222222222) */
        $names = $ld->detect($data['text'], 1);
        return array('name' => 'lang', 'val' => key($names));
    }
SOURCE ASSEMBLE PHP PHP REDIS PHP ZOOKEEPER COUNT OF SERVICES
    $zk = new Zookeeper();
    $zk->connect("localhost:2181");
PHP ZOOKEEPER
augmentor.php:
    $zk->create(
        $path . "/" . uniqid(),
        NULL,
        array(array(
            "perms"  => Zookeeper::PERM_ALL,
            "scheme" => "world",
            "id"     => "anyone")),
        Zookeeper::EPHEMERAL);
REASSEMBLE SOURCE REDIS ZOOKEEPER COUNT
    define("TIMEOUT", 5);
    $ch = $zk->getChildren("/services");
    $servs = count($ch);
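REASSEMBLE COUNT
reassemble.php:
    while ($raw = $sub->recv()) {
        $dat = json_decode($raw, true);
        $dat['aug'] = array();
        $time = TIMEOUT; // seconds left to wait for this message
        do {
            $start = microtime(true);
            // Round up: a BRPOP timeout of 0 would block indefinitely.
            $aug = $redis->brpop($dat['id'], (int) ceil($time));
            if (!empty($aug)) $dat['aug'][] = $aug;
            $time -= microtime(true) - $start;
        } while ($time > 0 && count($dat['aug']) != $servs);
        $out->send(json_encode($dat)); // forward
    }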
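The reassemble step stops waiting for a message's augmentations on whichever comes first: every registered service has answered (the ZooKeeper child count), or the time budget is spent. The sketch below isolates that decision in JavaScript so it can be run without Redis or ZooKeeper. It is an illustration under stated assumptions, not the talk's code: reassemble, arrivals and the sample values are hypothetical, and simulated arrival times stand in for the blocking pop.

```javascript
// Hypothetical stand-in for the Redis list: each arrival is
// { at: secondsAfterStart, value: augmentation }, sorted by arrival time.
function reassemble(msg, arrivals, expectedCount, timeoutSeconds) {
  const out = { ...msg, aug: [] };
  for (const arrival of arrivals) {
    if (out.aug.length === expectedCount) break; // every service has answered
    if (arrival.at > timeoutSeconds) break;      // budget spent, forward as-is
    out.aug.push(arrival.value);
  }
  return out;
}

// Usage with made-up data: two fast services and one that answers too late.
const msg = { id: 42, lat: 53.48, lon: -2.24 };
const arrivals = [
  { at: 0.2, value: { name: 'lang', val: 'en' } },
  { at: 1.5, value: { name: 'starbucks', val: 'High Street' } },
  { at: 9.0, value: { name: 'slow', val: 'late' } },
];
const complete = reassemble(msg, arrivals, 2, 5); // stops once 2 services answered
const partial = reassemble(msg, arrivals, 3, 5);  // third misses the 5 s budget
```

Either way the message is forwarded, so one slow augmentor delays the stream by at most the timeout rather than stalling it.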
Data Source Data Source Output Assemble Process Process Filter Filter
Filter
FILTER ELASTIC SEARCH QUERY - NAME QUERY - NAME MSG
? ? MSG ZEROMQ SUB ZEROMQ PUB TOPIC MSG TOPIC MSG TOPIC MSG ZEROMQ PULL QUERY - NAME MSG HTTP / REST
ELASTIC SEARCH QUERY - NAME QUERY - NAME MSG MSG
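HTTP / REST
elasticsearch.php:
    function escall($server, $path, $param) {
        // Assumes $param carries the HTTP context options (e.g. 'content').
        $context = stream_context_create(
            array('http' => $param));
        $result = file_get_contents(
            rtrim($server, '/') . '/' . ltrim($path, '/'), NULL, $context);
        // Decode to arrays so callers can index the result.
        return json_decode($result, true);
    }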
ELASTIC SEARCH QUERY - NAME QUERY - NAME MSG MSG HTTP / REST
elasticsearch.php:
    function percolate($host, $tweet) {
        $path = "/twitter/tweet/_percolate";
        $doc = array('doc' => array(
            'tweet' => $tweet['text']));
        $match = escall($host, $path,
            array('content' => json_encode($doc)));
        return $match['matches'];
    }
FILTER MSG ZEROMQ PULL QUERY - NAME ZEROMQ SUB
elasticsearch.php:
    // snip... creating in, ctl, out ZMQ socks
    $poll = new ZMQPoll();
    $poll->add($in, ZMQ::POLL_IN);
    $poll->add($ctl, ZMQ::POLL_IN);
    $read = $write = array();
elasticsearch.php:
    while (true) {
        $ev = $poll->poll($read, $write, -1);
        // Both sockets can be readable after one poll, so check each.
        foreach ($read as $r) {
            if ($r === $in) {
                $raw = $in->recv();
                $msg = json_decode($raw, true);
                $matches = percolate($host, $msg);
                foreach ($matches as $match) {
                    // Topic frame first, then the original JSON message.
                    $out->sendMulti(array($match, $raw));
                }
            } else if ($r === $ctl) {
                $q = json_decode($ctl->recv(), true);
                $name = $q['name'];
                $query = $q['query'];
                add_query($host, $name, $query);
            }
        }
    }
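The filter stage inverts the usual search: queries are registered ahead of time under a name, each incoming message is tested against all of them, and the message is republished once per matching query with the query name as the topic frame. The following self-contained sketch shows that flow with an in-memory registry in place of Elasticsearch's percolator. The names addQuery, percolate and route, and the use of plain predicate functions as queries, are assumptions made for illustration.

```javascript
// Hypothetical in-memory registry of named queries (name -> predicate).
const queries = new Map();

function addQuery(name, predicate) {
  queries.set(name, predicate);
}

// Returns the names of every registered query the document satisfies.
function percolate(doc) {
  const matches = [];
  for (const [name, predicate] of queries) {
    if (predicate(doc)) matches.push(name);
  }
  return matches;
}

// Publishes the document once per match as [topic, payload] frames,
// the same shape as the sendMulti() call in the PHP loop.
function route(doc, send) {
  const payload = JSON.stringify(doc);
  for (const name of percolate(doc)) send([name, payload]);
}

// Usage: two registered queries, one matching message and one that matches nothing.
addQuery('php', (d) => /\bphp\b/i.test(d.text));
addQuery('coffee', (d) => /coffee/i.test(d.text));
const sent = [];
route({ text: 'PHP and coffee at the conference' }, (frames) => sent.push(frames));
route({ text: 'nothing relevant' }, (frames) => sent.push(frames));
```

Subscribers then only need a topic subscription on the query name, which keeps per-client filtering out of the hot path.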
Data Source Output Queue Process Filter Data Store Data Store
STORE PHP KAFKA TOPIC TOPIC 1 2 3 4 1
2 3 4 SUB APACHE PHP CLIENT HTTP GET TOPIC - OFFSET
PHP KAFKA TOPIC TOPIC 1 2 3 4 1 2 3 4 SUB
kafkastore.php:
    $k = new Kafka_Producer("localhost", 9092);
    while ($data = $in->recvMulti()) {
        $topic = $data[0];
        $msg = $data[1];
        $bytes = $k->send(array($msg), $topic);
    }
KAFKA TOPIC TOPIC 1 2 3 4 1 2 3 4 APACHE PHP CLIENT GET
kafkaconsume.php:
    $consumer = new Kafka_SimpleConsumer(
        'localhost', 9092, 1, $max);
    do {
        // Fetch from the client's current offset in the topic.
        $msgs = $consumer->fetch(
            new Kafka_FetchRequest($top, 0, $offset, $max)
        );
        foreach ($msgs as $msg) echo $msg->payload(), "\n";
        $offset += $msgs->validBytes();
    } while ($msgs->validBytes() > 0);
    // Return the new offset so the client can resume from it.
    echo json_encode(array("offset" => $offset));
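The store stage works because the client, not the server, remembers its position: each GET carries a topic and an offset, and the response ends with the offset to use next time. The sketch below models that contract with an in-memory append-only log. It is a simplification under stated assumptions: TopicLog and its methods are hypothetical, and offsets here count messages, whereas the PHP consumer above advances by validBytes().

```javascript
// Hypothetical in-memory append-only log for a single topic.
class TopicLog {
  constructor() {
    this.messages = [];
  }

  // Appends a message and returns the new end offset.
  append(message) {
    this.messages.push(message);
    return this.messages.length;
  }

  // Returns up to maxMessages starting at offset, plus the offset to resume from.
  fetch(offset, maxMessages) {
    const batch = this.messages.slice(offset, offset + maxMessages);
    return { messages: batch, offset: offset + batch.length };
  }
}

// Usage: a client pages through five messages, three at a time.
const log = new TopicLog();
for (const text of ['a', 'b', 'c', 'd', 'e']) log.append(text);
const first = log.fetch(0, 3);             // messages a, b, c
const second = log.fetch(first.offset, 3); // messages d, e
const third = log.fetch(second.offset, 3); // empty: client is caught up
```

Because the server keeps no per-client state, a client that disconnects can replay from any earlier offset it still holds.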
OPS
JSON & MSGPACK
    $data = array('id' => 1, 'a' => 'a', 'b' => 'xyz',
        'c' => array(1, 2, "abcdefg", array(5, 7, 8)));
    $enc = json_encode($data);
    var_dump(json_decode($enc));
    $enc = msgpack_pack($data);
    var_dump(msgpack_unpack($enc));
JSON MSGPACK MSGPACK JSON
Data Source Output Queue Process Filter Data Store Tap Trace
Trace Trace
Data Source Output See Also: http://slidesha.re/JaWE78
THANKS!
+Ian Barber - @ianbarber - https://github.com/ianbarber/Firehose-PHP-Talk