Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Building A Firehose - PHPNW
Search
Ian Barber
October 06, 2012
Technology
2
950
Building A Firehose - PHPNW
#phpnw12 version of my talk on building firehose style streaming data systems
Ian Barber
October 06, 2012
Tweet
Share
More Decks by Ian Barber
See All by Ian Barber
Crossing Platforms With Google+ Sign-In
ianbarber
0
160
How Google Builds Webservices
ianbarber
3
340
Mobile & Social
ianbarber
2
180
Event Stream Processing In PHP
ianbarber
7
2.4k
Clojure for PHP Developers
ianbarber
6
1.9k
Building a Firehose
ianbarber
5
1.4k
Taking Sites Mobile
ianbarber
1
580
The Cookie Law
ianbarber
1
940
Teaching Your Machine To Find Fraudsters
ianbarber
3
1k
Other Decks in Technology
See All in Technology
kintone開発組織のDevOpsへの移り変わりと実践
ueokande
1
280
大失敗しないための Web API 開発レシピ / A recipe for not making a big failure on WebAPI development
yokawasa
1
200
Introduction to Bill One Development Engineer
sansan33
PRO
0
240
dbt Cloudの新機能を紹介!データエンジニアリングの民主化:GUIで操作、SQLで管理する新時代のdbt Cloud
sagara
0
130
バクラクのモノレポにおける AI Coding のための環境整備と {Roo,Claude} Code活用事例 / AI Coding in Bakuraku's Monorepo: Environment Setup & Case Studies with {Roo, Claude} Code
upamune
7
3k
会社紹介資料 / Sansan Company Profile
sansan33
PRO
6
370k
JavaのMCPサーバーで体験するAIエージェントの世界
tatsuya1bm
1
220
Tenstorrent HW/SW 概要説明
tenstorrent_japan
0
260
Data Hubグループ 紹介資料
sansan33
PRO
0
1.8k
AIエージェント実践集中コース LT
okaru
1
190
うちの会社の評判は?SNSの投稿分析にAIを使ってみた
doumae
0
620
データ戦略部門 紹介資料
sansan33
PRO
1
3.2k
Featured
See All Featured
Product Roadmaps are Hard
iamctodd
PRO
53
11k
Six Lessons from altMBA
skipperchong
28
3.8k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
Making the Leap to Tech Lead
cromwellryan
134
9.3k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
30
2.1k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
25
2.8k
Designing for Performance
lara
609
69k
Practical Orchestrator
shlominoach
188
11k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
26k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
180
53k
Why You Should Never Use an ORM
jnunemaker
PRO
56
9.4k
Transcript
+Ian Barber - ianb@php.net - @ianbarber https://github.com/ianbarber/Firehose-PHP-Talk BUILDING A FIREHOSE
FILTERABLE REAL TIME STREAMING DATA
SELLING DATA ANALYSIS & DECISIONS USER TOOLS $£¥ ☑☒
DATA SOURCES COMPOSE latency AUGMENT STORE FILTER STREAM
EVENT SAMPLE order tweet temperature snapshot
Data Source Data Source Data Source Output
Data Source Data Source Data Source Output Output
Data Source Data Source Data Source Output Messaging Batch HTTP
Logs HTTP Chunked Websockets Batched POST
APACHE PHP APACHE PHP NODE.JS PUSH ZEROMQ PULL HTTP POST
WEBSOCKETS
APACHE PHP APACHE PHP HTTP POST function sendPos() { navigator.geolocation.getCurrentPosition(
function(pos) { $.ajax({ type: 'POST', url:'http://firehose.com/input.php', data: {lat: pos.coords.latitude, lon: pos.coords.longitude}}); }); setTimeout(sendPos, 60000); } sendPos(); location.php
APACHE PHP APACHE PHP PUSH ZEROMQ WEBSOCKETS $ctx = new
ZMQContext(); $sock = $ctx->getSocket(ZMQ::SOCKET_PUSH); $sock->connect("tcp://localhost:5566"); $data = array( 'id' => get_next_msg_id(), 'uid' => $_COOKIE['uid'], 'lat' => $_POST['lat'], 'lon' => $_POST['lon'] ); $sock->send(json_encode($data)); input.php
APACHE PHP APACHE PHP NODE.JS ZEROMQ PULL WEBSOCKETS app=require('http').createServer(handler), io
= require('socket.io').listen(app), zmq = require('zmq'), sock = zmq.socket('pull'); app.listen(8080); sock.bind('tcp://*:5566'); sock.on('message', function (msg) { var data = JSON.parse(msg); // send to all clients io.sockets.emit("position", event); }); output.js
PHP DAEMON PHP DAEMON NODE.JS PUSH ZEROMQ PULL HTTP STREAM
WEBSOCKETS $fh = fopen("https://".$user.":". $pass."@stream.twitter.com/1/statuses/ filter.json?track=".$search, "r"); while(!feof($fh)) { $d = fgets($fh); if(strlen($d) > 4) { $sock->send($d); } } twitter.php
Data Source Data Source Output Assemble Process Process
SOURCE ASSEMBLE PHP PHP ZEROMQ PUB SUB SUB SUB REDIS
ZEROMQ PUSH
SOURCE PHP ZEROMQ PUB SUB REDIS $ctx = new ZMQContext();
$sub = $ctx->getSocket(ZMQ::SOCKET_SUB); $sub->setSockOpt(ZMQ::SOCKOPT_SUBSCRIBE,""); $sub->connect("tcp://localhost:5577"); while( $dat = $sub->recv() ) { $aug = augment(json_decode($dat,true),$obj); $redis->lpush($dat['id'],json_encode($aug)); } augmentor.php
$mongo = new Mongo(); $collection = $m->starbucks->locations; function augment($data, $collection)
{ $loc = array((float) $data['lon'], (float) $data['lat']); $res = $collection->findOne(array( 'loc' => array('$near' => $loc))); return array('name' => 'starbucks', 'val' => $res['street']); } SOURCE PHP REDIS starbucks.php DB
$ld = new Text_LanguageDetect(); $ld->setNameMode(2); function augment($data, $ld) { /*
["en"]=> float(0.24702222222222) */ $names = $ld->detect($data['text'], 1); return array('name' => 'lang', 'val' => key($names)); } SOURCE PHP REDIS langdetect.php
$zk = new Zookeeper(); $zk->connect("localhost:2181"); SOURCE ASSEMBLE PHP PHP REDIS
PHP ZOOKEEPER COUNT OF SERVICES
$zk->create( $path . "/" . uniqid(), NULL, array( array( "perms"
=> Zookeeper::PERM_ALL, "scheme" => "world", "id" => "anyone")), Zookeeper::EPHEMERAL); PHP ZOOKEEPER augmentor.php
REASSEMBLE SOURCE REDIS ZOOKEEPER define("TIMEOUT", 5); $ch = $zk->getChildren("/services"); $servs
= count($ch); COUNT
REASSEMBLE while($dat = $sub->recv()){ do { $start = microtime(true); $aug
= $redis->brpop($dat['id'],$time)); if(count($aug)) $dat['aug'][] = $aug; $time -= microtime(true) - $start; } while($time > 0 && count($dat['aug']) != $servs); $out->send(json_encode($dat)); //forward } COUNT reassemble.php
Data Source Data Source Output Assemble Process Process Filter Filter
Filter
FILTER ELASTIC SEARCH QUERY - NAME QUERY - NAME MSG
? ? MSG ZEROMQ SUB ZEROMQ PUB TOPIC MSG TOPIC MSG TOPIC MSG ZEROMQ PULL QUERY - NAME MSG HTTP / REST
ELASTIC SEARCH QUERY - NAME QUERY - NAME MSG MSG
HTTP / REST function escall($server, $path, $param) { $context = stream_context_create( array('http' => $http)); $result = file_get_contents( $serv.'/'.$path, NULL, $context); return json_decode( $result ); } elasticsearch.php
ELASTIC SEARCH QUERY - NAME QUERY - NAME MSG MSG
HTTP / REST function percolate($host, $path, $tweet) { $path = "/twitter/tweet/_percolate"; $tweet = array('doc' => array( 'tweet' => $tweet['text'])); $match = escall($host, $path, array('content' => json_encode($tweet))); return $match['matches']; } elasticsearch.php
// snip... creating in, ctl, out ZMQ socks $poll =
new ZMQPoll(); $poll->add($in, ZMQ::POLL_IN); $poll->add($ctl, ZMQ::POLL_IN); $read = $write = array(); FILTER MSG ZEROMQ PULL QUERY - NAME ZEROMQ SUB elasticsearch.php
while(true) { $ev = $poll->poll($read, $write, -1); if($read[0] === $in)
{ $msg = json_decode( $in->recv() ); $matches = percolate($host, $msg); foreach($matches as $match) { $out->sendMulti(array($match, $msg)); } } else if($read[0] === $ctl) { $q = json_decode($ctl->recv()); $name = $q['name']; $query = $q['query']; add_query($host, $name,$query); } } elasticsearch.php
Data Source Output Queue Process Filter Data Store Data Store
STORE PHP KAFKA TOPIC TOPIC 1 2 3 4 1
2 3 4 SUB APACHE PHP CLIENT HTTP GET TOPIC - OFFSET
PHP KAFKA TOPIC TOPIC 1 2 3 4 1 2
3 4 SUB $k = new Kafka_Producer("localhost", 9092); while ($data = $in->recvMulti()) { $topic = $data[0]; $msg = $data[1]; $bytes = $k->send(array($msg), $topic); } kafkastore.php
$consumer = new Kafka_SimpleConsumer( 'localhost', 9092, 1, $max); do {
$msgs = $consumer->fetch( new Kafka_FetchRequest($top,0,$os,$max) ); foreach($msgs as $msg) echo $msg->payload(), "\n"; $offset += $msgs->validBytes(); } while($msgs->validBytes() > 0); echo json_encode(array("offset"=>$offset)); kafkaconsume.php KAFKA TOPIC TOPIC 1 2 3 4 1 2 3 4 APACHE PHP CLIENT GET
OPS
JSON & MSGPACK $data = array('id'=>1,'a'=>'a','b'=>'xyz', 'c' => array(1, 2,
"abcdefg", array(5, 7, 8))); $enc = json_encode($data); var_dump( json_decode($enc) ); $enc = msgpack_pack($data); var_dump( msgpack_unpack($enc) ); JSON MSGPACK MSGPACK JSON
Data Source Output Queue Process Filter Data Store Tap Trace
Trace Trace
Data Source Output See Also: http://slidesha.re/JaWE78
+Ian Barber - ianb@php.net - @ianbarber https://github.com/ianbarber/Firehose-PHP-Talk THANKS!