Event Stream Processing In PHP

Ian Barber
February 22, 2013

From #phpuk13


Transcript

  1. PREDICTION

    We have many tools for analysing data and making predictions in buckets (RDBMSs, Hadoop, BigQuery).
  2. EVENT: ( A B C )

    We can process data in the form of events: programmatic representations of occurrences in the real world or in other systems. For our purposes an event is just a tuple, or a list.
  3. EVENT STREAM: ( A B C ) ( A B C ) ( A B C ), e.g. XOM 88.52, XOM 88.56, XOM 88.57

    An event stream is a series of events of (usually) the same type, for example a series of tick prices for a stock.
  4. EVENT CLOUD: ( A B ) ( A B ) ( A B ) and ( C D ) ( C D ) ( C D ), e.g. FTSE 6318, FTSE 6320, FTSE 6321

    An event cloud is a set of related streams that do not necessarily interact directly. There is usually a non-trivial relationship between them, e.g. a stock market index.
  5. EEP.js: embedding event processing (https://github.com/darach/eep-js)

    Darach Ennis took the evented, callback-based Node.js and built the basic structures needed to do event stream processing. By focusing on the simplest, most common tools, it can be very, very fast!
  6. STREAM OPERATIONS

    We just need a handful of tools to do some really interesting things...
  7. EVENT FUNCTIONS: map, pipe

    In PHP, we can use React to give us flexible, Unix-style pipes, and we have all the power of a general-purpose language to map values from one form to another.
  8. react.php

        <?php
        require "vendor/autoload.php";

        $loop = React\EventLoop\Factory::create();
        $socket = new React\Socket\Server($loop);

        class UpperStream extends React\Stream\ThroughStream
        {
            public function filter($data)
            {
                return strtoupper($data);
            }
        }

        $filter = new UpperStream();

        $socket->on('connection', function($conn) use ($filter) {
            $conn->pipe($filter)->pipe($conn);
        });

        $socket->listen(4000);
        $loop->run();

    With React we can create an event loop, and pipe the connection just like a Unix pipe. We can use ThroughStream pipes to build mapping or transform functions.
  9. WINDOWS: 1 2 3 4 5 6 7 8 9 8 7 6

    Windows allow us to frame a calculation across a (potentially infinite!) stream.
  10. TUMBLING WINDOW: 1 2 3 4 5 6 7 8 9 8 7 6, with an emit as each window closes (three emits shown)

    Tumbling windows accept items up to their size, then emit the calculation and open a new window.
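
    The tumbling behaviour can be sketched in plain PHP without any library. This is only an illustration: the class name `TumblingSketch`, the window size of 4, and the choice of the mean as the emitted calculation are assumptions made for the example, not the React\EEP implementation.

    ```php
    <?php
    // Hypothetical stand-in for a tumbling window; not the React\EEP class.
    class TumblingSketch
    {
        private $size;
        private $onEmit;
        private $items = array();

        public function __construct($size, callable $onEmit)
        {
            $this->size = $size;
            $this->onEmit = $onEmit;
        }

        public function enqueue($value)
        {
            $this->items[] = $value;
            if (count($this->items) === $this->size) {
                // Window is full: emit the mean, then open a fresh window.
                call_user_func($this->onEmit, array_sum($this->items) / $this->size);
                $this->items = array();
            }
        }
    }

    $means = array();
    $win = new TumblingSketch(4, function ($mean) use (&$means) {
        $means[] = $mean;
    });

    foreach (array(1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6) as $value) {
        $win->enqueue($value);
    }
    // $means now holds one mean per closed window.
    ```

    Because the buffer is emptied on every emit, no item ever contributes to more than one window.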
  11. servermon.php

        <?php
        require __DIR__.'/../vendor/autoload.php';

        echo "Monitor combined variables\n";

        $all_fn = new React\EEP\Stats\All;
        $all_win = new React\EEP\Window\Tumbling($all_fn, 100);

        $all_win->on('emit', function($vals) {
            if($vals['stdevs']>92 && $vals['mean']>280 && $vals['max']>400) {
                printf("Stddev %.2fms Average %dms Max %dms - ALERT\n",
                    $vals['stdevs'], $vals['mean'], $vals['max']);
            }
        });

        $start = microtime(true);
        for($i = 0; $i < 50000; $i++) {
            $var = 275 + rand(-150, 150);
            $all_win->enqueue($var);
        }

    ‘All’ is our aggregate stats function. When the tumbling window closes, it emits a value which we listen for.
  12. PERIODIC (5 MINUTE WALL CLOCK)

    Values:   1 2 3 4 5 6 7 8 9 8 7 6
    Arrivals: 3:40 3:40 3:41 3:42 3:45 3:48 3:49 3:50 3:52 3:53 3:53 3:54 (two emits shown)

    Periodic windows have a variable number of events per window (maybe 0), because they open and close based on time.
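
    A time-based window can be sketched in plain PHP as follows. The class name `PeriodicSketch` is hypothetical, the aggregate is a simple count, and the clock is passed in explicitly (an assumption made so the example does not depend on the wall clock); it is not the React\EEP implementation.

    ```php
    <?php
    // Hypothetical sketch of a periodic window that counts events per period.
    class PeriodicSketch
    {
        private $period;
        private $windowStart;
        private $onEmit;
        private $count = 0;

        public function __construct($period, $startTime, callable $onEmit)
        {
            $this->period = $period;
            $this->windowStart = $startTime;
            $this->onEmit = $onEmit;
        }

        public function enqueue($value)
        {
            // This sketch only counts events; the value itself is ignored.
            $this->count++;
        }

        public function tick($now)
        {
            // Close every window whose period has fully elapsed, even empty ones.
            while ($now - $this->windowStart >= $this->period) {
                call_user_func($this->onEmit, $this->count);
                $this->count = 0;
                $this->windowStart += $this->period;
            }
        }
    }

    $counts = array();
    $win = new PeriodicSketch(5, 0, function ($count) use (&$counts) {
        $counts[] = $count;
    });

    // Arrival times in arbitrary units; note the quiet gap between 3 and 12.
    foreach (array(1, 2, 3, 12, 13) as $time) {
        $win->tick($time);
        $win->enqueue($time);
    }
    $win->tick(15);
    // $counts holds one entry per elapsed period, including the empty one.
    ```

    The quiet period still produces an emit (with a count of 0), which is what makes this kind of window useful for detecting a missing or slow stream.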
  13. lowrate.php

        <?php
        require __DIR__.'/../vendor/autoload.php';

        echo "Simple low rate detector\n";

        $count_fn = new React\EEP\Stats\Count;
        $win = new React\EEP\Window\Periodic($count_fn, 1000 * 5);

        $win->on('emit', function($count) {
            if($count < 50) {
                echo "Alert - Low Rate! - $count\n";
            } else {
                echo "$count :)\n";
            }
        });

        while(true) {
            $win->enqueue(array(300, 4, 5, 10));
            $win->tick();
            usleep(100000 + rand(-20000, 20000));
        }

    This example triggers an alert if the count per 5 seconds is less than 50. This could be a count of orders or sign-ups on a site.
  14. SLIDING WINDOW

    Stream: 1 2 3 4 5 6 7 8 9 8 7 6

    ( 1 2 3 4 ) emit
    ( 2 3 4 5 ) emit
    ( 3 4 5 6 ) emit
    ( 4 5 6 7 ) emit
    ( 5 6 7 8 ) emit
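
    The slide carries no caption, so here is a minimal plain-PHP sketch of the behaviour the diagram suggests: once the window is full, every new item pushes out the oldest one and triggers an emit. The class name `SlidingSketch` and the choice to emit the raw window contents are assumptions for illustration, not the React\EEP implementation.

    ```php
    <?php
    // Hypothetical sketch of a sliding window of fixed size.
    class SlidingSketch
    {
        private $size;
        private $onEmit;
        private $items = array();

        public function __construct($size, callable $onEmit)
        {
            $this->size = $size;
            $this->onEmit = $onEmit;
        }

        public function enqueue($value)
        {
            $this->items[] = $value;
            if (count($this->items) > $this->size) {
                // Drop the oldest item so the window keeps its size.
                array_shift($this->items);
            }
            if (count($this->items) === $this->size) {
                // Emit on every item once the window has filled.
                call_user_func($this->onEmit, $this->items);
            }
        }
    }

    $windows = array();
    $win = new SlidingSketch(4, function ($items) use (&$windows) {
        $windows[] = $items;
    });

    foreach (array(1, 2, 3, 4, 5, 6, 7, 8) as $value) {
        $win->enqueue($value);
    }
    // $windows now holds each successive window, oldest first.
    ```

    Unlike a tumbling window, consecutive windows here overlap in all but one item.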
  15. STREAM FUNCTIONS: mem, aggregate

    We don’t just have to use the basic statistical functions, but can create custom functions to handle other cases.
  16. tick.php

        class TickDetector implements Aggregator
        {
            private $state, $data, $time;

            public function init()
            {
                $this->state = 0;
                $this->data = array();
                $this->time = null;
            }

            public function accumulate($v)
            {
                switch($this->state) {
                    case 0:
                        // Initial state. Always accept the value.
                        $this->data[0] = $v->value;
                        $this->time = $v->at;
                        $this->state = 1;
                        break;
                    case 1:
                        // Look for a drop.
                        if($v->value < $this->data[0]) {
                            $this->data[1] = $v->value;
                            $this->state = 2;
                        } else {
                            $this->reset($v);
                        }
                        ...

    Patterns can be matched using a finite state machine. This one uses a switch statement and a state tracker variable.
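
    Since the slide's listing is cut off, here is a separate, self-contained sketch of the same switch-based state machine idea. The pattern it matches (a value, then a drop below it, then a recovery above it) and the class name `DipDetectorSketch` are assumptions chosen for illustration; they are not the remainder of tick.php.

    ```php
    <?php
    // Hypothetical pattern: a value, then a lower value, then a value above the first.
    class DipDetectorSketch
    {
        private $state = 0;
        private $first = null;
        public $matches = 0;

        public function accumulate($value)
        {
            switch ($this->state) {
                case 0:
                    // Accept any value as the start of a candidate pattern.
                    $this->first = $value;
                    $this->state = 1;
                    break;
                case 1:
                    // Look for a drop below the starting value.
                    if ($value < $this->first) {
                        $this->state = 2;
                    } else {
                        // No drop: restart the pattern from this value.
                        $this->first = $value;
                    }
                    break;
                case 2:
                    // Look for a recovery above the starting value.
                    if ($value > $this->first) {
                        $this->matches++;
                        $this->first = $value;
                        $this->state = 1;
                    }
                    break;
            }
        }
    }

    $detector = new DipDetectorSketch();
    foreach (array(10, 12, 9, 8, 13, 14, 11, 15) as $price) {
        $detector->accumulate($price);
    }
    // $detector->matches counts completed drop-and-recover patterns.
    ```

    Each event costs a single comparison and the detector keeps only one remembered value, so this style of matching stays cheap on a fast stream.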
  17. widefinder.php

        class WideFinderFunction implements Aggregator
        {
            private $re, $keys;

            public function __construct($regex)
            {
                $this->re = $regex;
            }

            public function init()
            {
                $this->keys = array();
            }

            public function accumulate($line)
            {
                preg_match($this->re, $line, $matches);
                if(count($matches) == 0) {
                    return;
                }
                $key = $matches[1];
                if(!isset($this->keys[$key])) {
                    $this->keys[$key] = 1;
                } else {
                    $this->keys[$key]++;
                }
            }
        }

    We can process text data. WideFinder updates a hash table of URLs as log events arrive and emits the count for each page.
  18. tuplejoin.php (1)

        $fn = new Semijoin();
        $a_win = new React\EEP\Window\Sliding($fn, 100);
        $b_win = new React\EEP\Window\Sliding($fn, 100);

        $counter = 0;
        $match = function($match) use (&$counter) {
            if($match) {
                $counter++;
                printf("%s \n", $match[1]);
            }
        };

        $a_win->on("emit", $match);
        $b_win->on("emit", $match);

    We can look for events between streams using a semijoin, which just checks that there is a matching value in the window for the other stream. This is a custom aggregate function shared between two sliding windows.
  19. tuplejoin.php (2)

        class Semijoin implements Aggregator
        {
            private $stream, $last_value;

            public function __construct()
            {
                $this->stream = array();
            }

            public function init()
            {
                $this->stream[0] = array();
                $this->stream[1] = array();
                $this->last_value = null;
            }

            public function accumulate($v)
            {
                $this->last_value = null;
                $key = $v->value[0];
                if(!isset($this->stream[$v->stream][$key])) {
                    $this->stream[$v->stream][$key] = 0;
                }
                $this->stream[$v->stream][$key] += 1;
                if(isset($this->stream[$v->stream ^ 1][$key])) {
                    $this->last_value = $v->value;
                }
            }
        }

    The join aggregate uses a stream ID on each event, and a hashtable of keys.
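
    To see the stream ID and hashtable idea in isolation, here is a minimal sketch that runs over a plain array instead of windows. The function name `semijoinSketch`, the `array(streamId, key)` event shape, and the sample keys are assumptions for illustration; unlike the windowed version, nothing ever expires from the tables.

    ```php
    <?php
    // Hypothetical helper: returns the events whose key was already seen on the other stream.
    function semijoinSketch(array $events)
    {
        $seen = array(0 => array(), 1 => array());
        $matched = array();

        foreach ($events as $event) {
            list($stream, $key) = $event;
            $seen[$stream][$key] = true;

            // XOR with 1 flips stream 0 to 1 and stream 1 to 0.
            if (isset($seen[$stream ^ 1][$key])) {
                $matched[] = $event;
            }
        }

        return $matched;
    }

    $events = array(
        array(0, 'alice'),
        array(1, 'bob'),
        array(1, 'alice'), // 'alice' was already seen on stream 0
        array(0, 'carol'),
        array(0, 'bob'),   // 'bob' was already seen on stream 1
    );

    $matched = semijoinSketch($events);
    ```

    The result contains only the events that found a partner, which is what distinguishes a semijoin from a full join that would emit the combined pair.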
  20. leader.php (1)

        private function correlate($xo, $x_stats, $yo, $y_stats)
        {
            $acc = 0;
            for($i = 0; $i < $this->sub; $i++) {
                $acc += ($this->buffers[0][($xo + $i) % $this->size] - $x_stats['mean'])
                      * ($this->buffers[1][($yo + $i) % $this->size] - $y_stats['mean']);
            }
            $acc /= $this->sub;
            $stddevs = ($x_stats['stdevs'] * $y_stats['stdevs']);
            return ($stddevs == 0 ? 0 : $acc /$stddevs) ;
        }

    We can even do complex interactions. This example attempts to correlate between several windows over a pair of streams. We run this function against each pair to see if one stream predicts the movement of the other.
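
    The lead/lag idea can be tried outside the windowing machinery with a plain-PHP sketch. It computes an ordinary Pearson correlation between a fixed stretch of one series and progressively later stretches of the other, and keeps the lag with the highest score. The function names, the sub-window length of 12, and the maximum lag of 6 are assumptions for illustration; the two arrays are the ones fed to the window in leader.php (2). This is not the deck's windowed algorithm.

    ```php
    <?php
    // Pearson correlation of two equal-length arrays.
    function pearsonSketch(array $x, array $y)
    {
        $n = count($x);
        $meanX = array_sum($x) / $n;
        $meanY = array_sum($y) / $n;

        $cov = 0;
        $varX = 0;
        $varY = 0;
        for ($i = 0; $i < $n; $i++) {
            $dx = $x[$i] - $meanX;
            $dy = $y[$i] - $meanY;
            $cov += $dx * $dy;
            $varX += $dx * $dx;
            $varY += $dy * $dy;
        }

        $denom = sqrt($varX * $varY);
        return $denom == 0 ? 0 : $cov / $denom;
    }

    // Find the lag at which $follower best matches the start of $leader.
    function bestLagSketch(array $leader, array $follower, $sub, $maxLag)
    {
        $best = array('lag' => 0, 'r' => -INF);
        for ($lag = 0; $lag <= $maxLag; $lag++) {
            $r = pearsonSketch(
                array_slice($leader, 0, $sub),
                array_slice($follower, $lag, $sub)
            );
            if ($r > $best['r']) {
                $best = array('lag' => $lag, 'r' => $r);
            }
        }
        return $best;
    }

    $a = array(1, 1, 1, 3, 3, 3, 5, 5, 5, 7, 7, 7, 9, 9, 9, 11, 11, 11);
    $b = array(1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 5, 5, 5, 7, 7, 7);

    $best = bestLagSketch($a, $b, 12, 6);
    printf("Best lag: %d ticks (r = %.2f)\n", $best['lag'], $best['r']);
    ```

    Because the score is divided by the spread of both stretches, it always falls between -1 and 1, so scores from different lags can be compared directly.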
  21. leader.php (2)

        $a = array(1, 1, 1, 3, 3, 3, 5, 5, 5, 7, 7, 7, 9, 9, 9, 11, 11, 11);
        $b = array(1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 5, 5, 5, 7, 7, 7);

        for($i = 0; $i < count($a); $i++) {
            $win->enqueue(new React\EEP\Event\Muxed($a[$i], 0));
            $win->enqueue(new React\EEP\Event\Muxed($b[$i], 1));
        }

        $ php examples/leader.php
        Stream 1 leads stream 0 by 9 ticks with a -6.68 correlation
        Stream 0 leads stream 1 by 3 ticks with a 8.82 correlation
        Stream 0 leads stream 1 by 3 ticks with a 7.21 correlation
        Stream 0 leads stream 1 by 3 ticks with a 5.84 correlation
        Stream 0 leads stream 1 by 3 ticks with a 4.57 correlation
        Stream 0 leads stream 1 by 3 ticks with a 4.28 correlation
        Stream 0 leads stream 1 by 3 ticks with a 3.75 correlation
        Stream 0 leads stream 1 by 3 ticks with a 4.32 correlation
  22. STREAM OPERATIONS

    With just some simple examples, we can handle a wide range of use cases. These types of windows and functions can be implemented in any language or framework!