Event Stream Processing In PHP

Ian Barber
February 22, 2013

From #phpuk13


Transcript

  1. EVENT STREAM PROCESSING IN PHP

  2. IAN BARBER https://github.com/ianbarber/eep-php https://joind.in/8040 ianb@php.net

  3. http://goo.gl/7IHBZ

    Blatant Plug: If you’re interested in this topic, try the (rather short) free ebook!
  4. PREDICTION

    We have many tools for analysing data and making predictions when the data sits in buckets (RDBMSs, Hadoop, BigQuery).
  5. PREDICTION

    But can we do the same thing with data-in-motion, analysing data as it is created?
  6. EVENT ( A B C )

    We can process data in the form of events: programmatic representations of occurrences in the real world or other systems. For our purposes they’re just a tuple, or a list.
  7. EVENT STREAM

    ( A B C ) ( A B C ) ( A B C )
    XOM 88.52, XOM 88.56, XOM 88.57

    An event stream is a series of events of (usually) the same type, for example a series of tick prices for a stock.
  8. EVENT CLOUD

    ( A B ) ( A B ) ( A B )
    FTSE 6318, FTSE 6320, FTSE 6321
    ( C D ) ( C D ) ( C D )

    An event cloud is a set of related, but not necessarily directly interacting, streams. The relationship between them is usually non-trivial, e.g. a stock market index.
  9. EEP.js https://github.com/darach/eep-js embedding event processing

    Darach Ennis took the evented, callback-based Node.js and built the basic structures needed to do event stream processing. By focusing on the simplest, most common tools, it can be very, very fast!
  10. X STREAM OPERATIONS

    We just need a handful of tools to do some really interesting things...
  11. X STREAM OPERATIONS: mem, map, pipe, window, aggregate

  12. http://reactphp.org

  13. X EVENT FUNCTIONS: map, pipe

    In PHP, we can use React to give us flexible, Unix-style pipes, and we have all the power of a general-purpose language to map values from one form to another.
  14. react.php

        <?php
        require "vendor/autoload.php";

        $loop = React\EventLoop\Factory::create();
        $socket = new React\Socket\Server($loop);

        // A through stream that upper-cases whatever is written to it.
        class UpperStream extends React\Stream\ThroughStream
        {
            public function filter($data)
            {
                return strtoupper($data);
            }
        }

        $filter = new UpperStream();

        // Pipe each connection through the filter and back to the client.
        $socket->on('connection', function($conn) use ($filter) {
            $conn->pipe($filter)->pipe($conn);
        });

        $socket->listen(4000);
        $loop->run();

    With React we can create an event loop, and pipe the connection just like a Unix pipe. We can use ThroughStream pipes to build mapping or transform functions.
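
The map and pipe ideas can also be sketched without any library, which may help when reading the React example. This is a minimal illustration using plain PHP generators, assuming PHP 7.4 or later; `map_stream` and `pipe` are hypothetical helpers, not part of React or eep-php.

```php
<?php
// Hypothetical helpers for illustration only (not React's or eep-php's API).

/** Lazily apply $fn to every event in $events. */
function map_stream(iterable $events, callable $fn): Generator
{
    foreach ($events as $event) {
        yield $fn($event);
    }
}

/** Chain stages left to right, like `a | b | c` in a shell. */
function pipe(iterable $source, callable ...$stages): iterable
{
    foreach ($stages as $stage) {
        $source = $stage($source);
    }
    return $source;
}

// Events are just tuples: (symbol, price).
$ticks = [['xom', 88.52], ['xom', 88.56], ['xom', 88.57]];

$out = pipe(
    $ticks,
    // Stage 1: upper-case the symbol.
    fn($s) => map_stream($s, fn($e) => [strtoupper($e[0]), $e[1]]),
    // Stage 2: render each tuple as a line of text.
    fn($s) => map_stream($s, fn($e) => $e[0] . ' ' . number_format($e[1], 2))
);

$lines = iterator_to_array($out, false);
echo implode("\n", $lines), "\n";
```

Each stage only sees one event at a time, so the same pipeline would work on a stream that never ends.
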
  15. WINDOWS

    1 2 3 4 5 6 7 8 9 8 7 6

    Windows allow us to frame a calculation across a (potentially infinite!) stream.
  16. TUMBLING WINDOW

    [Diagram: the stream 1 2 3 4 5 6 7 8 9 8 7 6 is split into consecutive windows, each followed by an emit.]

    Tumbling windows accept items up to their size, then emit the calculation and open a new window.
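
To make the tumbling behaviour concrete, here is a minimal sketch, assuming PHP 7.4 or later. `TumblingSum` is a hypothetical class written for this illustration; it is not the `React\EEP\Window\Tumbling` implementation, and the window size of 4 and the sum aggregate are assumptions.

```php
<?php
// Hypothetical class for illustration; eep-php's own Tumbling window differs.
final class TumblingSum
{
    private int $size;
    private $onEmit;
    private array $buffer = [];

    public function __construct(int $size, callable $onEmit)
    {
        $this->size = $size;
        $this->onEmit = $onEmit;
    }

    public function enqueue($value): void
    {
        $this->buffer[] = $value;
        if (count($this->buffer) === $this->size) {
            ($this->onEmit)(array_sum($this->buffer)); // window is full: emit
            $this->buffer = [];                         // and open a new window
        }
    }
}

$emitted = [];
$win = new TumblingSum(4, function ($sum) use (&$emitted) {
    $emitted[] = $sum;
});

foreach ([1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6] as $v) {
    $win->enqueue($v);
}

print_r($emitted); // sums of the three windows: 10, 26, 30
```

No event ever belongs to two windows, which is what separates a tumbling window from a sliding one.
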
  17. servermon.php

        <?php
        require __DIR__.'/../vendor/autoload.php';

        echo "Monitor combined variables\n";

        // Aggregate all the basic stats over tumbling windows of 100 events.
        $all_fn = new React\EEP\Stats\All;
        $all_win = new React\EEP\Window\Tumbling($all_fn, 100);

        // Alert only when the deviation, mean and max thresholds are all exceeded.
        $all_win->on('emit', function($vals) {
            if($vals['stdevs']>92 && $vals['mean']>280 && $vals['max']>400) {
                printf("Stddev %.2fms Average %dms Max %dms - ALERT\n",
                    $vals['stdevs'], $vals['mean'], $vals['max']);
            }
        });

        $start = microtime(true);
        // Feed in simulated timings of 275 plus or minus 150.
        for($i = 0; $i < 50000; $i++) {
            $var = 275 + rand(-150, 150);
            $all_win->enqueue($var);
        }

    ‘All’ is our aggregate stats function. When the tumbling window closes, it emits a value which we listen for.
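
An aggregate like this only needs to keep a few running totals, not the events themselves. The sketch below shows one way that could work, assuming PHP 7.4 or later and using Welford's update for the variance. `RunningStats` is hypothetical; `React\EEP\Stats\All` may well compute its values differently, and only the key names `mean`, `stdevs` and `max` are taken from the slide.

```php
<?php
// Hypothetical aggregate for illustration; not the eep-php implementation.
final class RunningStats
{
    private int $n = 0;
    private float $mean = 0.0;
    private float $m2 = 0.0;    // running sum of squared distances from the mean
    private float $max = -INF;

    public function accumulate(float $v): void
    {
        $this->n++;
        $delta = $v - $this->mean;
        $this->mean += $delta / $this->n;
        $this->m2 += $delta * ($v - $this->mean); // Welford's update
        $this->max = max($this->max, $v);
    }

    public function emit(): array
    {
        // Population variance; an empty window reports 0.
        $var = $this->n > 0 ? $this->m2 / $this->n : 0.0;
        return [
            'count'  => $this->n,
            'mean'   => $this->mean,
            'stdevs' => sqrt($var),
            'max'    => $this->max,
        ];
    }
}

$stats = new RunningStats();
foreach ([2, 4, 4, 4, 5, 5, 7, 9] as $v) {
    $stats->accumulate($v);
}
$s = $stats->emit();
printf("count %d mean %.2f stddev %.2f max %.2f\n",
    $s['count'], $s['mean'], $s['stdevs'], $s['max']);
```

Because the state is constant in size, the same object could sit behind a window of 100 events or 100,000.
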
  18. PERIODIC

    [Diagram: the events 1 2 3 4 5 6 7 8 9 8 7 6 arrive at 3:40 3:40 3:41 3:42 3:45 3:48 3:49 3:50 3:52 3:53 3:53 3:54, with an emit at the end of each 5 minute wall clock window.]

    Periodic windows have a variable number of events per window (maybe 0), because they open and close based on time.
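
A periodic window can be sketched in a few lines, assuming PHP 7.4 or later. `PeriodicCount` is a hypothetical class, not `React\EEP\Window\Periodic`. Two simplifications are assumed: timestamps are passed in as plain integers (minutes after 3:40) so the run is deterministic, and windows are aligned to the first timestamp seen instead of to the wall clock.

```php
<?php
// Hypothetical class for illustration; times are integers supplied by the caller.
final class PeriodicCount
{
    private int $period;
    private $onEmit;
    private ?int $windowStart = null;
    private int $count = 0;

    public function __construct(int $period, callable $onEmit)
    {
        $this->period = $period;
        $this->onEmit = $onEmit;
    }

    /** Close every window that has ended by $now, emitting its count. */
    public function tick(int $now): void
    {
        if ($this->windowStart === null) {
            $this->windowStart = $now; // first call opens the first window
            return;
        }
        while ($now - $this->windowStart >= $this->period) {
            ($this->onEmit)($this->count);
            $this->count = 0;
            $this->windowStart += $this->period;
        }
    }

    public function enqueue(int $now): void
    {
        $this->tick($now);
        $this->count++;
    }
}

$counts = [];
$win = new PeriodicCount(5, function ($n) use (&$counts) {
    $counts[] = $n;
});

// Arrival times from the slide, as minutes after 3:40.
foreach ([0, 0, 1, 2, 5, 8, 9, 10, 12, 13, 13, 14] as $t) {
    $win->enqueue($t);
}
$win->tick(15); // closes the third window
$win->tick(25); // ten quiet minutes: two empty windows are emitted

print_r($counts); // 4, 3, 5, 0, 0
```

The two zeros at the end are the "maybe 0" case: time passing is enough to close a window, even if nothing arrived.
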
  19. lowrate.php

        <?php
        require __DIR__.'/../vendor/autoload.php';

        echo "Simple low rate detector\n";

        // Count the events seen in each 5 second (5000ms) window.
        $count_fn = new React\EEP\Stats\Count;
        $win = new React\EEP\Window\Periodic($count_fn, 1000 * 5);

        $win->on('emit', function($count) {
            if($count < 50) {
                echo "Alert - Low Rate! - $count\n";
            } else {
                echo "$count :)\n";
            }
        });

        // Simulate an event roughly every 100ms, ticking the window each time.
        while(true) {
            $win->enqueue(array(300, 4, 5, 10));
            $win->tick();
            usleep(100000 + rand(-20000, 20000));
        }

    This example triggers an alert if the count per 5 seconds is less than 50. This could be a count of orders or sign ups on a site.
  20. SLIDING WINDOW

    1 2 3 4 5 6 7 8 9 8 7 6

    1 2 3 4 emit
    2 3 4 5 emit
    3 4 5 6 emit
    4 5 6 7 emit
    5 6 7 8 emit
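
The sliding window slide has no caption, so here is a minimal sketch of the behaviour the diagram suggests, assuming PHP 7.4 or later. `SlidingWindow` is a hypothetical class, not `React\EEP\Window\Sliding`; it assumes a window of 4 that emits its contents on every event once it is full.

```php
<?php
// Hypothetical class for illustration; eep-php's own Sliding window differs.
final class SlidingWindow
{
    private int $size;
    private $onEmit;
    private array $buffer = [];

    public function __construct(int $size, callable $onEmit)
    {
        $this->size = $size;
        $this->onEmit = $onEmit;
    }

    public function enqueue($value): void
    {
        $this->buffer[] = $value;
        if (count($this->buffer) > $this->size) {
            array_shift($this->buffer); // slide: drop the oldest event
        }
        if (count($this->buffer) === $this->size) {
            ($this->onEmit)($this->buffer); // emit on every event once full
        }
    }
}

$windows = [];
$win = new SlidingWindow(4, function (array $w) use (&$windows) {
    $windows[] = implode(' ', $w);
});

foreach ([1, 2, 3, 4, 5, 6, 7, 8] as $v) {
    $win->enqueue($v);
}

echo implode("\n", $windows), "\n";
```

After the first three events, every new event produces an emit, so a sliding window is much chattier than a tumbling window of the same size.
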
  21. STREAM FUNCTIONS: mem, aggregate

    We don’t just have to use the basic statistical functions, but can create custom functions to handle other cases.
  22. tick.php

        class TickDetector implements Aggregator {
            private $state, $data, $time;

            public function init() {
                $this->state = 0;
                $this->data = array();
                $this->time = null;
            }

            public function accumulate($v) {
                switch($this->state) {
                    case 0: // Initial state. Always accept the value.
                        $this->data[0] = $v->value;
                        $this->time = $v->at;
                        $this->state = 1;
                        break;
                    case 1: // Look for a drop.
                        if($v->value < $this->data[0]) {
                            $this->data[1] = $v->value;
                            $this->state = 2;
                        } else {
                            $this->reset($v);
                        }
                    ...

    Patterns can be matched using a finite state machine. This one uses a switch statement and a state tracker variable.
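
Since the slide only shows the start of the detector, here is a complete, self-contained state machine in the same style, assuming PHP 7.4 or later. It is not a reconstruction of the elided part of tick.php: `DoubleDropDetector` and the pattern it matches (a value that drops twice in a row) are made up for illustration.

```php
<?php
// Hypothetical detector: reports when a value drops twice in a row.
// This does NOT reproduce the rest of tick.php, which the slide cuts off.
final class DoubleDropDetector
{
    private int $state = 0;
    private array $data = [];

    /** Feed one value; returns the matched triple, or null if no match yet. */
    public function accumulate(float $v): ?array
    {
        switch ($this->state) {
            case 0: // Initial state. Always accept the value.
                $this->data = [$v];
                $this->state = 1;
                return null;
            case 1: // Look for a first drop.
                if ($v < $this->data[0]) {
                    $this->data[1] = $v;
                    $this->state = 2;
                } else {
                    $this->data = [$v]; // no drop: restart from this value
                }
                return null;
            case 2: // Look for a second drop.
                if ($v < $this->data[1]) {
                    $match = [$this->data[0], $this->data[1], $v];
                    $this->state = 0;
                    $this->data = [];
                    return $match;
                }
                $this->data = [$v]; // pattern broken: restart from this value
                $this->state = 1;
                return null;
        }
        return null;
    }
}

$detector = new DoubleDropDetector();
$matches = [];
foreach ([88.52, 88.56, 88.50, 88.57, 88.40, 88.31, 88.60] as $price) {
    $m = $detector->accumulate($price);
    if ($m !== null) {
        $matches[] = $m;
    }
}
print_r($matches); // one match: 88.57, 88.40, 88.31
```

The detector only ever holds the current state and the values matched so far, so it can run over a stream of any length.
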
  23. widefinder.php

        class WideFinderFunction implements Aggregator {
            private $re, $keys;

            public function __construct($regex) {
                $this->re = $regex;
            }

            public function init() {
                $this->keys = array();
            }

            public function accumulate($line) {
                preg_match($this->re, $line, $matches);
                if(count($matches) == 0) {
                    return;
                }
                $key = $matches[1];
                if(!isset($this->keys[$key])) {
                    $this->keys[$key] = 1;
                } else {
                    $this->keys[$key]++;
                }
            }
        }

    We can process text data. WideFinder updates a hash table of URLs as log events arrive and emits the count for each page.
  24. tuplejoin.php (1)

        $fn = new Semijoin();
        $a_win = new React\EEP\Window\Sliding($fn, 100);
        $b_win = new React\EEP\Window\Sliding($fn, 100);

        $counter = 0;
        $match = function($match) use (&$counter) {
            if($match) {
                $counter++;
                printf("%s \n", $match[1]);
            }
        };

        $a_win->on("emit", $match);
        $b_win->on("emit", $match);

    We can look for events between streams using a semijoin, which just checks there is a matching value in the window for the other stream. This is a custom aggregate function shared between two sliding windows.
  25. tuplejoin.php (2)

        class Semijoin implements Aggregator {
            private $stream, $last_value;

            public function __construct() {
                $this->stream = array();
            }

            public function init() {
                $this->stream[0] = array();
                $this->stream[1] = array();
                $this->last_value = null;
            }

            public function accumulate($v) {
                $this->last_value = null;
                $key = $v->value[0];
                if(!isset($this->stream[$v->stream][$key])) {
                    $this->stream[$v->stream][$key] = 0;
                }
                $this->stream[$v->stream][$key] += 1;
                // $v->stream ^ 1 flips 0 to 1 and 1 to 0: check the other stream.
                if(isset($this->stream[$v->stream ^ 1][$key])) {
                    $this->last_value = $v->value;
                }
            }
        }

    The join aggregate uses a stream ID on each event, and a hashtable of keys.
  26. leader.php (1)

        private function correlate($xo, $x_stats, $yo, $y_stats) {
            $acc = 0;
            for($i = 0; $i < $this->sub; $i++) {
                $acc += ($this->buffers[0][($xo + $i) % $this->size] - $x_stats['mean'])
                      * ($this->buffers[1][($yo + $i) % $this->size] - $y_stats['mean']);
            }
            $acc /= $this->sub;
            $stddevs = ($x_stats['stdevs'] * $y_stats['stdevs']);
            return ($stddevs == 0 ? 0 : $acc / $stddevs);
        }

    We can even do complex interactions. This example attempts to correlate between several windows over a pair of streams. We run the function above against each pair to see if one stream predicts the movement of the other.
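
The idea of one stream leading another can be tried out on two plain arrays. The sketch below, assuming PHP 7.4 or later, computes a Pearson correlation between `$x[i]` and `$y[i + $lag]` and picks the lag with the highest value. `lagged_correlation` is a hypothetical helper, not the `correlate()` method from leader.php, and the data is made up. Here the means and deviations are computed over the same slices being compared, so the result stays within [-1, 1].

```php
<?php
// Hypothetical helper for illustration; not the eep-php implementation.

/** Pearson correlation between $x[i] and $y[i + $lag] over their overlap ($lag >= 0). */
function lagged_correlation(array $x, array $y, int $lag): float
{
    $n = min(count($x), count($y) - $lag);
    if ($n < 2) {
        return 0.0;
    }
    $xs = array_slice($x, 0, $n);
    $ys = array_slice($y, $lag, $n);
    $meanX = array_sum($xs) / $n;
    $meanY = array_sum($ys) / $n;

    $cov = $varX = $varY = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dx = $xs[$i] - $meanX;
        $dy = $ys[$i] - $meanY;
        $cov  += $dx * $dy;
        $varX += $dx * $dx;
        $varY += $dy * $dy;
    }
    $denom = sqrt($varX * $varY);
    return $denom == 0.0 ? 0.0 : $cov / $denom;
}

// Made-up data: $y repeats $x two ticks later.
$x = [1, 3, 2, 5, 4, 6, 3, 7];
$y = [0, 0, 1, 3, 2, 5, 4, 6];

$best = null;
foreach (range(0, 3) as $lag) {
    $r = lagged_correlation($x, $y, $lag);
    printf("lag %d: r = %.3f\n", $lag, $r);
    if ($best === null || $r > $best['r']) {
        $best = ['lag' => $lag, 'r' => $r];
    }
}
printf("x leads y by %d ticks\n", $best['lag']);
```

A streaming version would keep the two series in ring buffers and rerun this comparison as each window closes, which is roughly the shape of the method on the slide.
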
  27. leader.php (2)

        $a = array(1, 1, 1, 3, 3, 3, 5, 5, 5, 7, 7, 7, 9, 9, 9, 11, 11, 11);
        $b = array(1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 5, 5, 5, 7, 7, 7);
        for($i = 0; $i < count($a); $i++) {
            $win->enqueue(new React\EEP\Event\Muxed($a[$i], 0));
            $win->enqueue(new React\EEP\Event\Muxed($b[$i], 1));
        }

        $ php examples/leader.php
        Stream 1 leads stream 0 by 9 ticks with a -6.68 correlation
        Stream 0 leads stream 1 by 3 ticks with a 8.82 correlation
        Stream 0 leads stream 1 by 3 ticks with a 7.21 correlation
        Stream 0 leads stream 1 by 3 ticks with a 5.84 correlation
        Stream 0 leads stream 1 by 3 ticks with a 4.57 correlation
        Stream 0 leads stream 1 by 3 ticks with a 4.28 correlation
        Stream 0 leads stream 1 by 3 ticks with a 3.75 correlation
        Stream 0 leads stream 1 by 3 ticks with a 4.32 correlation
  28. X STREAM OPERATIONS

    With just some simple examples, we can handle a wide range of use cases. These types of windows and functions can be implemented in any language or framework!
  29. THANK YOU

    https://github.com/ianbarber/eep-php embedding event processing https://github.com/darach/eep-js
    https://joind.in/8040 ianb@php.net http://profiles.google.com/ianbarber http://twitter.com/ianbarber