Slide 1

Slide 1 text

EVENT STREAM PROCESSING IN PHP

Slide 2

Slide 2 text

IAN BARBER https://github.com/ianbarber/eep-php https://joind.in/8040

Slide 3

Slide 3 text

http://goo.gl/7IHBZ Blatant Plug: If you’re interested in this topic, try the (rather short) free ebook!

Slide 4

Slide 4 text

PREDICTION We have many tools for analysing data and making predictions in buckets (RDBMS, Hadoop, BigQuery)

Slide 5

Slide 5 text

PREDICTION But can we do the same thing with data-in-motion, analysing data as it is created?

Slide 6

Slide 6 text

EVENT ( A B C ) We can process data in the form of events: programmatic representations of occurrences in the real world or other systems. For our purposes they’re just a tuple, or a list.

Slide 7

Slide 7 text

EVENT STREAM ( A B C ) ( A B C ) ( A B C ) XOM 88.52 XOM 88.56 XOM 88.57 An event stream is a series of events of (usually) the same type - for example a series of tick prices for a stock.
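As a rough illustration of the two definitions above, an event can be a plain PHP list and a stream can be anything we can iterate over. The `tickStream` helper below is a hypothetical name, not part of eep-php, and the prices are the ones from the slide.

```php
<?php
// Hypothetical helper, not part of eep-php: yields (symbol, price) tuples
// one at a time, the way events would arrive on a stream.
function tickStream(array $ticks): Generator
{
    foreach ($ticks as $tick) {
        yield $tick; // each event is just a list: [symbol, price]
    }
}

$events = [['XOM', 88.52], ['XOM', 88.56], ['XOM', 88.57]];

$prices = [];
foreach (tickStream($events) as [$symbol, $price]) {
    $prices[] = $price; // handle each event as it is produced
}
```

A generator is used here only because it hands over one event at a time without holding the whole stream in memory.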

Slide 8

Slide 8 text

EVENT CLOUD ( A B ) ( A B ) ( A B ) FTSE 6318 FTSE 6320 FTSE 6321 ( C D ) ( C D ) ( C D ) An event cloud is a set of related, but not necessarily directly interactive streams. There is usually a non-trivially complex relationship between them - e.g. a stock market index.

Slide 9

Slide 9 text

EEP.js https://github.com/darach/eep-js embedding event processing Darach Ennis took the evented, callback based Node.js and built the basic structures needed to do event stream processing. By focusing on the simplest, most common tools, it can be very, very fast!

Slide 10

Slide 10 text

X STREAM OPERATIONS We just need a handful of tools to do some really interesting things...

Slide 11

Slide 11 text

X STREAM OPERATIONS mem map pipe window aggregate

Slide 12

Slide 12 text

http://reactphp.org

Slide 13

Slide 13 text

X map pipe EVENT FUNCTIONS In PHP, we can use React to give us flexible, unix style pipes, and we have all the power of a general purpose language to map values from one form to another.

Slide 14

Slide 14 text

react.php

    // ... ($loop, $socket and $filter are set up before this point)
    $socket->on('connection', function($conn) use ($filter) {
        $conn->pipe($filter)->pipe($conn);
    });
    $socket->listen(4000);
    $loop->run();

With React we can create an event loop, and pipe the connection just like a unix pipe. We can use ThroughStream pipes to build mapping or transform functions.
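The map and pipe idea can also be sketched without React at all. The example below is plain PHP and assumes nothing about React's API: `pipeline`, `$parse` and `$toCents` are hypothetical names, and each stage is simply a function from one event form to another.

```php
<?php
// Hypothetical helper: chain stages left to right, like `a | b | c` in a shell.
function pipeline(callable ...$stages): callable
{
    return function ($event) use ($stages) {
        foreach ($stages as $stage) {
            $event = $stage($event); // output of one stage feeds the next
        }
        return $event;
    };
}

// Map a raw text line to a (symbol, price) tuple.
$parse = function (string $line): array {
    [$symbol, $price] = explode(' ', trim($line));
    return [$symbol, (float) $price];
};

// Map the price from dollars to whole cents.
$toCents = function (array $tick): array {
    return [$tick[0], (int) round($tick[1] * 100)];
};

$process = pipeline($parse, $toCents);
$result = $process("XOM 88.52\n");
```

In a React program each stage would more likely sit inside a stream object, but the mapping functions themselves would look much the same.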

Slide 15

Slide 15 text

1 2 3 4 5 6 7 8 9 8 7 6 WINDOWS Windows allow us to frame a calculation across a (potentially infinite!) stream.

Slide 16

Slide 16 text

1 2 3 4 5 6 7 8 9 8 7 6 1 2 3 4 5 6 7 8 9 8 7 6 emit emit emit TUMBLING WINDOW Tumbling windows accept items up to their size, then emit the calculation and open a new window.
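A tumbling window is small enough to sketch in a few lines. The `TumblingMean` class below is hypothetical and only illustrates the mechanism; eep-php's own window classes may be structured differently. It uses the twelve values from the slide with a window size of 4, which gives three emits.

```php
<?php
// Hypothetical class for illustration; eep-php's own window API may differ.
class TumblingMean
{
    private $size;
    private $values = [];
    private $onEmit;

    public function __construct(int $size, callable $onEmit)
    {
        $this->size = $size;
        $this->onEmit = $onEmit;
    }

    public function enqueue(float $value): void
    {
        $this->values[] = $value;
        if (count($this->values) === $this->size) {
            // Window is full: emit the aggregate, then start a fresh window.
            ($this->onEmit)(array_sum($this->values) / $this->size);
            $this->values = [];
        }
    }
}

$means = [];
$win = new TumblingMean(4, function (float $mean) use (&$means) {
    $means[] = $mean;
});
foreach ([1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6] as $v) {
    $win->enqueue($v);
}
// $means is now [2.5, 6.5, 7.5]
```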

Slide 17

Slide 17 text

servermon.php

    // ... ($all_win, the tumbling window, is created before this point)
    $all_win->on('emit', function($vals) {
        if($vals['stdevs'] > 92 && $vals['mean'] > 280 && $vals['max'] > 400) {
            printf("Stddev %.2fms Average %dms Max %dms - ALERT\n",
                $vals['stdevs'], $vals['mean'], $vals['max']);
        }
    });

    $start = microtime(true);
    for($i = 0; $i < 50000; $i++) {
        $var = 275 + rand(-150, 150);
        $all_win->enqueue($var);
    }

‘All’ is our aggregate stats function. When the tumbling window closes, it emits a value which we listen for.

Slide 18

Slide 18 text

1 2 3 4 5 6 7 8 9 8 7 6 emit 3:40 3:40 3:41 3:42 3:45 3:48 3:49 3:50 3:52 3:53 3:53 3:54 emit PERIODIC 5 MINUTE WALL CLOCK Periodic windows have a variable number of events per window (maybe 0) - because they open and close based on time.

Slide 19

Slide 19 text

lowrate.php

    // ... ($win, the periodic window, is created before this point)
    $win->on('emit', function($count) {
        if($count < 50) {
            echo "Alert - Low Rate! - $count\n";
        } else {
            echo "$count :)\n";
        }
    });

    while(true) {
        $win->enqueue(array(300, 4, 5, 10));
        $win->tick();
        usleep(100000 + rand(-20000, 20000));
    }

This example triggers an alert if the count per 5 seconds is less than 50 - this could be a count of orders or sign ups on a site.
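To show how a periodic window might decide when to close, here is a minimal sketch. The `PeriodicCount` class is hypothetical, and unlike the `tick()` call in the example above it takes the current time as an argument so that the behaviour is repeatable; a real window would presumably read the clock itself.

```php
<?php
// Hypothetical class; timestamps are passed in explicitly so the sketch
// is deterministic rather than dependent on the wall clock.
class PeriodicCount
{
    private $period;
    private $windowStart;
    private $count = 0;
    private $onEmit;

    public function __construct(float $period, float $now, callable $onEmit)
    {
        $this->period = $period;
        $this->windowStart = $now;
        $this->onEmit = $onEmit;
    }

    public function enqueue($event): void
    {
        $this->count++; // only the count matters for this aggregate
    }

    public function tick(float $now): void
    {
        // Close every window whose period has elapsed, even empty ones.
        while ($now - $this->windowStart >= $this->period) {
            ($this->onEmit)($this->count);
            $this->count = 0;
            $this->windowStart += $this->period;
        }
    }
}

$counts = [];
$win = new PeriodicCount(5.0, 0.0, function (int $c) use (&$counts) {
    $counts[] = $c;
});
// Events arrive at these times (in seconds); note the gap between 6 and 16.
foreach ([1.0, 2.0, 4.0, 6.0, 16.0] as $t) {
    $win->tick($t);
    $win->enqueue(['order']);
}
$win->tick(20.0);
// $counts is now [3, 1, 0, 1]: the window from 10s to 15s saw no events
```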

Slide 20

Slide 20 text

1 2 3 4 5 6 7 8 9 8 7 6 1 2 3 4 emit 2 3 4 5 3 4 5 6 4 5 6 7 emit emit emit 5 6 7 8 emit SLIDING WINDOW
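Reading the diagram above, a sliding window of size 4 emits once it is full and then again for every new event, dropping the oldest value each time. A minimal sketch of that behaviour follows; `SlidingSum` is a hypothetical class and eep-php's sliding window may work differently internally.

```php
<?php
// Hypothetical class for illustration; eep-php's sliding window may differ.
class SlidingSum
{
    private $size;
    private $values = [];
    private $onEmit;

    public function __construct(int $size, callable $onEmit)
    {
        $this->size = $size;
        $this->onEmit = $onEmit;
    }

    public function enqueue(int $value): void
    {
        $this->values[] = $value;
        if (count($this->values) > $this->size) {
            array_shift($this->values); // drop the oldest event
        }
        if (count($this->values) === $this->size) {
            // Emit on every event once the window has filled.
            ($this->onEmit)(array_sum($this->values));
        }
    }
}

$sums = [];
$win = new SlidingSum(4, function (int $sum) use (&$sums) {
    $sums[] = $sum;
});
foreach ([1, 2, 3, 4, 5, 6, 7, 8] as $v) {
    $win->enqueue($v);
}
// $sums is now [10, 14, 18, 22, 26]
```

`array_shift` is linear in the window size, so a larger window would probably want a ring buffer instead.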

Slide 21

Slide 21 text

STREAM FUNCTIONS mem aggregate We don’t just have to use the basic statistical functions, but can create custom functions to handle other cases.

Slide 22

Slide 22 text

tick.php

    class TickDetector implements Aggregator {
        private $state, $data, $time;

        public function init() {
            $this->state = 0;
            $this->data = array();
            $this->time = null;
        }

        public function accumulate($v) {
            switch($this->state) {
                case 0: // Initial state. Always accept the value.
                    $this->data[0] = $v->value;
                    $this->time = $v->at;
                    $this->state = 1;
                    break;
                case 1: // Look for a drop.
                    if($v->value < $this->data[0]) {
                        $this->data[1] = $v->value;
                        $this->state = 2;
                    } else {
                        $this->reset($v);
                    }
                    ...

Patterns can be matched using a finite state machine - this one uses a switch statement and state tracker variable.

Slide 23

Slide 23 text

widefinder.php

    class WideFinderFunction implements Aggregator {
        private $re, $keys;

        public function __construct($regex) {
            $this->re = $regex;
        }

        public function init() {
            $this->keys = array();
        }

        public function accumulate($line) {
            preg_match($this->re, $line, $matches);
            if(count($matches) == 0) {
                return;
            }
            $key = $matches[1];
            if(!isset($this->keys[$key])) {
                $this->keys[$key] = 1;
            } else {
                $this->keys[$key]++;
            }
        }
    }

We can process text data - WideFinder updates a hash table of URLs as log events arrive and emits the count for each page.
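The same counting step can be tried on its own, outside a window. In the sketch below the log lines and the regular expression are made up for illustration; they are not taken from the WideFinder example.

```php
<?php
// Hypothetical log lines and pattern, for illustration only.
$lines = [
    '10.0.0.1 - - "GET /index.html HTTP/1.1" 200',
    '10.0.0.2 - - "GET /about.html HTTP/1.1" 200',
    '10.0.0.3 - - "GET /index.html HTTP/1.1" 200',
    '10.0.0.4 - - "POST /login HTTP/1.1" 302',
];
$re = '/"GET (\S+) /';

$counts = [];
foreach ($lines as $line) {
    if (preg_match($re, $line, $matches) !== 1) {
        continue; // not a GET request: ignore the event
    }
    // First capture group is the URL; create or bump its counter.
    $counts[$matches[1]] = ($counts[$matches[1]] ?? 0) + 1;
}
arsort($counts); // most requested page first
```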

Slide 24

Slide 24 text

tuplejoin.php (1)

    $fn = new Semijoin();
    $a_win = new React\EEP\Window\Sliding($fn, 100);
    $b_win = new React\EEP\Window\Sliding($fn, 100);

    $counter = 0;
    $match = function($match) use (&$counter) {
        if($match) {
            $counter++;
            printf("%s \n", $match[1]);
        }
    };

    $a_win->on("emit", $match);
    $b_win->on("emit", $match);

We can look for events between streams using a semijoin - which just checks there is a matching value in the window for the other stream. This is a custom aggregate function shared between two sliding windows.

Slide 25

Slide 25 text

tuplejoin.php (2)

    class Semijoin implements Aggregator {
        private $stream, $last_value;

        public function __construct() {
            $this->stream = array();
        }

        public function init() {
            $this->stream[0] = array();
            $this->stream[1] = array();
            $this->last_value = null;
        }

        public function accumulate($v) {
            $this->last_value = null;
            $key = $v->value[0];
            if(!isset($this->stream[$v->stream][$key])) {
                $this->stream[$v->stream][$key] = 0;
            }
            $this->stream[$v->stream][$key] += 1;
            if(isset($this->stream[$v->stream ^ 1][$key])) {
                $this->last_value = $v->value;
            }
        }
    }

The join aggregate uses a stream ID on each event, and a hashtable of keys.
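The matching rule is easier to follow on a handful of events. The sketch below is a standalone function, not the Semijoin class above: `semijoin` and the sample events are hypothetical, and unlike a sliding window it never expires old keys.

```php
<?php
// Hypothetical function, not the eep-php Semijoin class: $seen holds one
// key table per stream; an event matches when the other stream has its key.
// Keys are never expired here, which a real sliding window would do.
function semijoin(array &$seen, int $stream, array $value): ?array
{
    $key = $value[0];
    $seen[$stream][$key] = ($seen[$stream][$key] ?? 0) + 1;
    // XOR with 1 flips stream 0 to 1 and stream 1 to 0.
    return isset($seen[$stream ^ 1][$key]) ? $value : null;
}

$seen = [[], []];
$matches = [];
$events = [
    [0, ['alice', 'login']],
    [1, ['bob', 'order']],
    [1, ['alice', 'order']], // stream 0 has already seen 'alice'
    [0, ['bob', 'login']],   // stream 1 has already seen 'bob'
];
foreach ($events as [$stream, $value]) {
    $match = semijoin($seen, $stream, $value);
    if ($match !== null) {
        $matches[] = $match;
    }
}
```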

Slide 26

Slide 26 text

leader.php (1)

    private function correlate($xo, $x_stats, $yo, $y_stats) {
        $acc = 0;
        for($i = 0; $i < $this->sub; $i++) {
            $acc += ($this->buffers[0][($xo + $i) % $this->size] - $x_stats['mean'])
                  * ($this->buffers[1][($yo + $i) % $this->size] - $y_stats['mean']);
        }
        $acc /= $this->sub;
        $stddevs = ($x_stats['stdevs'] * $y_stats['stdevs']);
        return ($stddevs == 0 ? 0 : $acc / $stddevs);
    }

We can even do complex interactions. This example attempts to correlate between several windows over a pair of streams. We run this function against each pair to see if one stream predicts the movement of the other.
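The idea of testing one stream against a delayed copy of the other can be sketched on plain arrays. The helpers below are hypothetical and are not taken from leader.php: `stats` computes a population mean and standard deviation, and `laggedCorrelation` compares the first stream with the second stream shifted by a number of ticks. The result is a normalised coefficient between -1 and 1, so its scale may not match the scores the real example prints.

```php
<?php
// Population mean and standard deviation of a list of numbers.
function stats(array $xs): array
{
    $n = count($xs);
    $mean = array_sum($xs) / $n;
    $var = 0.0;
    foreach ($xs as $x) {
        $var += ($x - $mean) ** 2;
    }
    return ['mean' => $mean, 'stdevs' => sqrt($var / $n)];
}

// Hypothetical helper: correlate $x with $y delayed by $lag ticks.
function laggedCorrelation(array $x, array $y, int $lag): float
{
    $xs = array_slice($x, 0, count($x) - $lag); // earlier part of x
    $ys = array_slice($y, $lag);                // later part of y
    $sx = stats($xs);
    $sy = stats($ys);
    $acc = 0.0;
    foreach ($xs as $i => $v) {
        $acc += ($v - $sx['mean']) * ($ys[$i] - $sy['mean']);
    }
    $acc /= count($xs);
    $sd = $sx['stdevs'] * $sy['stdevs'];
    return $sd == 0 ? 0.0 : $acc / $sd;
}

// $b repeats the rise of $a two ticks later.
$a = [1, 3, 5, 7, 9, 11, 9, 7];
$b = [0, 0, 1, 3, 5, 7, 9, 11];
$r = laggedCorrelation($a, $b, 2);
```

Trying several lags and keeping the one with the strongest correlation is one way to estimate by how many ticks a stream leads.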

Slide 27

Slide 27 text

leader.php (2)

    $a = array(1, 1, 1, 3, 3, 3, 5, 5, 5, 7, 7, 7, 9, 9, 9, 11, 11, 11);
    $b = array(1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 5, 5, 5, 7, 7, 7);
    for($i = 0; $i < count($a); $i++) {
        $win->enqueue(new React\EEP\Event\Muxed($a[$i], 0));
        $win->enqueue(new React\EEP\Event\Muxed($b[$i], 1));
    }

    $ php examples/leader.php
    Stream 1 leads stream 0 by 9 ticks with a -6.68 correlation
    Stream 0 leads stream 1 by 3 ticks with a 8.82 correlation
    Stream 0 leads stream 1 by 3 ticks with a 7.21 correlation
    Stream 0 leads stream 1 by 3 ticks with a 5.84 correlation
    Stream 0 leads stream 1 by 3 ticks with a 4.57 correlation
    Stream 0 leads stream 1 by 3 ticks with a 4.28 correlation
    Stream 0 leads stream 1 by 3 ticks with a 3.75 correlation
    Stream 0 leads stream 1 by 3 ticks with a 4.32 correlation

Slide 28

Slide 28 text

X STREAM OPERATIONS With just some simple examples, we can handle a wide range of use cases. These types of windows and functions can be implemented in any language or framework!

Slide 29

Slide 29 text

THANK YOU https://github.com/ianbarber/eep-php embedding event processing https://github.com/darach/eep-js https://joind.in/8040 http://profiles.google.com/ianbarber http://twitter.com/ianbarber