Drinking from the Firehose (True North PHP)

Samantha Quiñones

November 07, 2015

Transcript

  1. DRINKING FROM THE FIREHOSE
    Real-Time Metrics
    SAMANTHA QUIÑONES
    True North PHP 2015

  2. SAMANTHA QUIÑONES
    ABOUT ME
    ▸ Software Engineer & Data Nerd since 1997
    ▸ Doing “media stuff” since 2012
    ▸ Principal @ AOL since 2014
    ▸ @ieatkillerbees
    ▸ http://samanthaquinones.com

  3. THAT AOL?

  4. “HOW WOULD YOU LET EDITORS TEST HOW WELL DIFFERENT HEADLINES PERFORM FOR THE
    SAME PIECE OF CONTENT?”
    Shashi Reddy, Senior Engineer, AOL

  5. MEASURE ONCE CUT TWICE
    TRADITIONAL METRICS
    ▸ Request response time - are we responding fast enough?
    ▸ Cache hit rate - are we making our backend work too hard?
    ▸ Resource utilization - do we have enough “hardware”?

  8. Delay Perception
    <100ms Instantaneous
    <300ms Perceptible Delay
    <1000ms System “Working”
    <10000ms System “Slow”
    >10000ms System “Down”
    Source: O’Reilly Media

  9. AMAZON SALES DATA
    EFFECTS OF LATENCY
    Amazon Sales: -1% sales per 100ms increased latency
    [Chart: Sales (USD), 0 to 2000, plotted against Seconds of Latency, 0 to 15]
    S = S - (msL*1%)
    Linden, G. (2006, December 3). Make Data Useful. Data Mining (CS345). Lecture conducted from Stanford University, Stanford, CA.
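
As a rough worked example of the linear model quoted on this slide (1% of sales lost per 100 ms of added latency), the sketch below projects sales for a given delay. The function name `projectedSales` and the baseline figure of 2000 are illustrative assumptions, not values from the talk.

```javascript
// Linear latency model: every 100 ms of added latency costs 1% of
// baseline sales. The baseline passed in below is hypothetical.
function projectedSales(baselineSales, addedLatencyMs) {
  var lossFraction = (addedLatencyMs / 100) * 0.01;
  // Clamp at zero: a purely linear model bottoms out at 10 s of latency.
  return Math.max(0, baselineSales * (1 - lossFraction));
}

console.log(projectedSales(2000, 500));   // half a second of extra latency
console.log(projectedSales(2000, 10000)); // ten seconds of extra latency
```

Under this model half a second of extra latency already costs 5% of sales, which is why the deck treats response time as a business metric and not only an operational one.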

  10. GOOGLE SEARCH EXPERIMENT
    EFFECTS OF LATENCY
    Google Search Experiment (4-6 Weeks)
    [Chart: % Fewer Searches per Day, 0 to -1, plotted against Milliseconds of Additional Latency: 50, 100, 200, 400]
    Schurman, E., Brutlag, J. (2009, June 23). The User and Business Impact of Server Delays, Additional Bytes, and HTTP Chunking in Web Search. O'Reilly Velocity. Lecture conducted from O'Reilly Media, San Jose, CA.

  11. BEHAVIORAL METRICS
    MULTIVARIATE (A/B) TESTING
    ▸ Sort all users into groups
    ▸ 1 control group receives unaltered content
    ▸ 1 or more groups receive altered content
    ▸ Measure behavioral statistics (CTR, abandon rate, time on page, scroll depth) for each group
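
One possible way to sort users into stable groups is to hash an anonymous visitor ID, so the same visitor always lands in the same group without any server-side state. This is only a sketch: the hashing approach, the `assignGroup` helper, and the group names are assumptions for illustration, and the talk does not say how AOL's grouping is computed.

```javascript
var crypto = require('crypto');

// Deterministically map a visitor to one of the test groups.
// Salting with the test ID keeps assignments independent across tests.
function assignGroup(visitorId, testId, groups) {
  var digest = crypto.createHash('sha256')
    .update(testId + ':' + visitorId)
    .digest();
  // Interpret the first 4 bytes as an unsigned integer bucket key.
  return groups[digest.readUInt32BE(0) % groups.length];
}

var groups = ['control', 'headline_b', 'headline_c']; // hypothetical names
console.log(assignGroup('e33d53be-7b7e-11e5-8bcf-feff819cdc9f', 'mv_test1', groups));
```

The first entry plays the role of the control group that receives unaltered content.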

  12. OTHER METRICS
    STATE MONITORING
    ▸ Exception logging
    ▸ Load monitoring
    ▸ System performance
    ▸ Application performance
    ▸ Cache performance

  13. UNDERSTANDING YOUR AUDIENCE
    TRAFFIC METRICS

  14. BEHAVIORAL METRICS
    MEASURING USER BEHAVIOR & EXPERIENCE
    ▸ Application path - What does the user click on?
    ▸ Usage patterns - When does the user visit? Where do they come from?
    ▸ Mouse & attention tracking - What draws the user’s attention?
    ▸ RUM (real user monitoring)

  16. TRAFFIC METRICS
    DEMOGRAPHIC INFORMATION COLLECTION
    ▸ Geographic location and region
    ▸ ISP
    ▸ Device information
    ▸ Anonymized user identification

  17. CASE STUDY
    AOL MEDIA PLATFORM
    ▸ Content management
    ▸ Distributed rendering farm
    ▸ Integrated development environment using custom DSL
    ▸ Content aggregation platform
    ▸ Machine learning platform
    ▸ Multi-tenant system

  18. CASE STUDY
    MEASURING THE AOL MEDIA PLATFORM

  19. CASE STUDY
    METRICS & ANALYTICS
    ▸ Omniture (revenue analytics)
    ▸ New Relic (APM)
    ▸ ELK (APM)
    ▸ AOL proprietary data platform (RUM & Demographics)

  20. CASE STUDY
    AOL DATA LAYER
    ▸ Massively distributed data collection
    ▸ Hadoop
    ▸ Access via Hive & Pig
    ▸ Time-shared
    ▸ Cassandra
    ▸ Vertica (ingested Omniture data)
    ▸ Streaming Interface (raw data)

  21. [Diagram components: “BEACON” SERVER (×5), RABBITMQ FARM, DATA LAYER SERVICES,
    RABBITMQ STREAMER FARM, DATA LAYER STREAMER (×3); stores: Couchbase, Hadoop, Vertica,
    Cassandra]

  22. BEACON PAYLOAD
    {
      "anonymous_id": "e33d53be-7b7e-11e5-8bcf-feff819cdc9f",
      "channel": "aol.us",
      "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36",
      "referer": "www.aol.com",
      "location": "country=us,region=va,city=alexandria,latitude=38.819940,longitude=-77.145418",
      "mv_tests": "mv_test1:mv_test_pop_id;mv_test_metadata"
    }
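
The `location` field packs several values into one comma-separated string, so a consumer has to unpack it before it can be queried. A minimal sketch of that step follows; `parseLocation` is a hypothetical helper, and treating `latitude` and `longitude` as the only numeric keys is an assumption based on the sample payload.

```javascript
// Turn "country=us,region=va,..." into an object, converting the
// coordinate fields to numbers.
function parseLocation(location) {
  var result = {};
  location.split(',').forEach(function (pair) {
    var idx = pair.indexOf('=');
    if (idx === -1) { return; } // skip malformed segments
    result[pair.slice(0, idx)] = pair.slice(idx + 1);
  });
  ['latitude', 'longitude'].forEach(function (key) {
    if (result[key] !== undefined) { result[key] = parseFloat(result[key]); }
  });
  return result;
}

console.log(parseLocation(
  'country=us,region=va,city=alexandria,latitude=38.819940,longitude=-77.145418'
));
```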

  23. ~40 metrics
    ~1.6 KB
    per
    EVENT

  24. ~15,000 events
    ~25 MB
    per
    SECOND

  25. 1,300,000,000 events
    ~2 TB
    per
    DAY
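
The per-event, per-second, and per-day figures on these three slides follow from each other. The sketch below reproduces the arithmetic, assuming decimal units (1 KB = 1000 bytes) and a constant event rate; `dailyVolume` is an illustrative name.

```javascript
// Derive per-second and per-day volume from the event rate and size.
function dailyVolume(eventsPerSecond, bytesPerEvent) {
  var secondsPerDay = 24 * 60 * 60;
  return {
    bytesPerSecond: eventsPerSecond * bytesPerEvent,
    eventsPerDay: eventsPerSecond * secondsPerDay,
    bytesPerDay: eventsPerSecond * bytesPerEvent * secondsPerDay
  };
}

var volume = dailyVolume(15000, 1600); // ~15,000 events/s at ~1.6 KB each
console.log((volume.bytesPerSecond / 1e6) + ' MB per second');
console.log((volume.eventsPerDay / 1e9).toFixed(2) + ' billion events per day');
console.log((volume.bytesPerDay / 1e12).toFixed(2) + ' TB per day');
```

The results land close to the rounded numbers on the slides: roughly 24 MB per second, 1.3 billion events, and about 2 TB per day.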

  26. CASE STUDY
    CONTENT CREATORS WANT TO KNOW
    ▸ Today’s traffic by author & vertical
    ▸ Top performing articles for the past hour
    ▸ Recent social engagement trends

  27. CASE STUDY
    CONTENT SITE DEVELOPERS WANT TO KNOW
    ▸ API Query Performance
    ▸ Details of handled exceptions
    ▸ How to maximize cache hit rate

  28. THEY NEED TO KNOW
    NOW.
    THE 24-HOUR MEDIA CYCLE

  30. “HOW WOULD YOU LET EDITORS TEST HOW WELL DIFFERENT HEADLINES PERFORM FOR THE
    SAME PIECE OF CONTENT?”
    Shashi Reddy, Senior Engineer, AOL

  31. IN THREE EASY FAILURES
    BUILDING A REAL-TIME DATA PIPELINE

  32. THE PROOF OF CONCEPT
    [Diagram components: STREAMER, RECEIVER, COLLECTOR (×9), STATSD CLUSTER, ELASTICSEARCH]

  34. TINY, ENCAPSULATED NANOSERVICES
    this.visit = function(record) {
      if (record.userAgent) {
        var parser = new UAParser();
        parser.setUA(record.userAgent);
        var user_agent = parser.getResult();
        return { user_agent: user_agent };
      }
      return {};
    };

  36. CASE STUDY
    PROOF OF CONCEPT PERFORMANCE & RESULTS
    ▸ Message Rate: 300 per second
    ▸ Receivers Needed: ~70+
    ▸ StatsD imposes a number of limitations
    ▸ Breaks rich payloads down into discrete metrics
    ▸ Anything but in-flight aggregation means querying Elasticsearch
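
To make the StatsD limitation concrete, the sketch below flattens one beacon payload into StatsD counter lines (`<bucket>:<value>|c`). The bucket naming scheme and the `toStatsdLines` helper are assumptions for illustration; the point is that once the payload is split into independent counters, the relationship between fields of the same event is gone.

```javascript
// Flatten a rich event into independent StatsD counters.
function toStatsdLines(event) {
  // StatsD bucket names use dots as separators, so sanitize values.
  function bucket(value) {
    return String(value).replace(/[^a-zA-Z0-9_-]/g, '_');
  }
  var lines = ['beacon.events:1|c'];
  if (event.channel) {
    lines.push('beacon.channel.' + bucket(event.channel) + ':1|c');
  }
  if (event.referer) {
    lines.push('beacon.referer.' + bucket(event.referer) + ':1|c');
  }
  return lines;
}

console.log(toStatsdLines({ channel: 'aol.us', referer: 'www.aol.com' }));
```

After this step you can count events per channel and events per referer, but not events per channel for a given referer, unless a separate counter is emitted for every combination.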

  37. CASE STUDY
    An efficient real-time data pathway consists of a network of transits and terminals,
    where no single node acts as both a transit and a terminal at the same time.

  38. CASE STUDY
    TRANSITS
    ▸ Short-term
    ▸ In-memory
    ▸ Volatile storage
    ▸ Data with life-spans up to a few seconds

  39. CASE STUDY
    TERMINALS
    ▸ Destinations that store,
    ▸ Destroy,
    ▸ or Retransmit data

  40. KAFKA VS RABBITMQ
    TOOL EVALUATION

  41. TOOL EVALUATION - KAFKA VS RABBITMQ
    APACHE KAFKA
    ▸ Pub/Sub Message Broker
    ▸ Born @LinkedIn around 2011
    ▸ Apache top-level project since 2012
    ▸ Key focuses
    ▸ Message integrity (persistence-first model)
    ▸ Message order
    ▸ Fault tolerance

  42. TOOL EVALUATION
    RABBITMQ
    ▸ AMQP implementation
    ▸ Born in 2007
    ▸ Acquired by SpringSource (VMware) in 2010; part of Pivotal Software since 2013
    ▸ Key focuses:
    ▸ General-purpose messaging
    ▸ Routing
    ▸ HA through Federation

  43. TOOL EVALUATION
    REQUIREMENTS & CONSIDERATIONS
    ▸ Payloads may arrive in any order.
    ▸ Some data loss is acceptable.
    ▸ Consumers may only want small subsets of data
    ▸ Need to route data to consumers in multiple datacentres / in AWS
    ▸ Broad support for languages

  44. TOOL EVALUATION
    TRANSIT: RABBITMQ
    ▸ RabbitMQ’s priorities are similar to ours
    ▸ Federation over at-least-once delivery
    ▸ Supports complex routing
    ▸ Allows federation over network boundaries (even when it’s dumb)
    ▸ Mature clients for our Big Three Stacks (Java, Node.js, PHP)
    ▸ Big enterprises like stuff with companies behind it

  45. VERSION 1
    [Diagram components: STREAMER, RECEIVER, COLLECTOR (×9), RABBITMQ, ELASTICSEARCH]

  46. CASE STUDY
    MORE THAN JUST RABBITMQ
    ▸ Moved away from the Observer pattern for data processing to a single “in” event and a single “out” event.
    ▸ Node.js event handling is VERY fast, but the sheer number of events being created caused memory problems.
    ▸ Rather than tuning within the app or engine, let a back-pressure mechanism regulate the input rate.
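
The talk does not say which back-pressure mechanism was used. One built-in option in Node.js is the stream API: `write()` returns `false` once a writable's internal buffer reaches its `highWaterMark`, and the `'drain'` event signals when it is safe to resume. The sketch below shows that signal with a deliberately slow sink; `createSlowSink` and the tiny buffer size are illustrative assumptions.

```javascript
var stream = require('stream');

// A writable that simulates a slow downstream consumer.
function createSlowSink(delayMs) {
  return new stream.Writable({
    objectMode: true,
    highWaterMark: 2, // tiny buffer, counted in objects
    write: function (event, encoding, done) {
      setTimeout(done, delayMs);
    }
  });
}

var sink = createSlowSink(10);
var first = sink.write({ id: 1 });  // true: still below the high-water mark
var second = sink.write({ id: 2 }); // false: buffer is full, stop reading input
console.log(first, second);

if (!second) {
  sink.once('drain', function () {
    console.log('drained, resume reading from the source');
  });
}
```

When the source is itself a stream, `source.pipe(sink)` applies the same pause-and-resume logic automatically.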

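  48. // Variant 1: consume the buffer destructively with shift()
    while (buffer.length > 0) {
      var char = buffer.shift();
      if ('\n' === char) {
        queue.push(new Buffer(outbuf.join('')));
        continue;
      }
      outbuf.push(char);
    }

    // Variant 2: walk a copy of the buffer by index
    var i = 0;
    var tBuf = buffer.slice();
    while (i < buffer.length) {
      var char = tBuf[i++];
      if ('\n' === char) {
        queue.push(new Buffer(outbuf.join('')));
      }
      // Note: unlike variant 1 there is no `continue` above, so the
      // newline itself is also appended to outbuf here.
      outbuf.push(char);
    }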
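
For comparison, newline framing can also be done per chunk instead of per character. This is a sketch of a common alternative, not the code used at AOL: it assumes chunks arrive as strings and that one event equals one line, and `createLineSplitter` is a hypothetical helper.

```javascript
// Split a chunked text stream into complete lines, carrying any
// trailing partial line over to the next chunk.
function createLineSplitter(onLine) {
  var remainder = '';
  return function feed(chunk) {
    var parts = (remainder + chunk).split('\n');
    remainder = parts.pop(); // last piece may be an incomplete line
    parts.forEach(function (line) {
      if (line.length > 0) { onLine(line); }
    });
  };
}

var lines = [];
var feed = createLineSplitter(function (line) { lines.push(line); });
feed('{"a":1}\n{"b"');
feed(':2}\n');
console.log(lines);
```

Because the remainder is replaced on every call, no state accumulates between lines.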

  49. CASE STUDY
    VERSION 1 PERFORMANCE & RESULTS
    ▸ Message Rate: 600/s
    ▸ Receivers Needed: ~35+
    ▸ Adding code to handle weird edge cases in data degrades performance.
    ▸ Micro-optimization of code leads to hard-to-fix crashes and memory leaks.

  50. NOT KNOWING HOW TO USE A TOOL DOESN’T MEAN IT’S BROKEN.

  51. CASE STUDY
    GETTING SERIOUS
    ▸ Receiving data, editing it, and routing it in the same step violates my transit/terminal separation policy.
    ▸ Receiver needs to be a simple transit that consumes data and pushes it onto RabbitMQ
    ▸ Nice-to-haves:
    ▸ Static & dynamic optimization
    ▸ Clean multithreading/multiprocessing
    ▸ Good memory management for large, volatile in-memory data sets

  52. TOOL EVALUATION
    PICKING A STACK FOR THE DATA RECEIVER - THE PROS
    ▸ Node.js - Simple, easy-to-distribute, fast.
    ▸ Go - Native concurrency & memory management, fast compiler.
    ▸ Rust - C++ with modern tooling.
    ▸ Java - Static & dynamic optimization, good memory management & multithreading.
    ▸ C/C++ - Speed, good libraries for handling concurrency & memory.

  53. TOOL EVALUATION
    PICKING A STACK FOR THE DATA RECEIVER - THE CONS
    ▸ Node.js - Too many instances needed to manage production flow.
    ▸ Go - No one on my team has any familiarity.
    ▸ Rust - No one on my team has any desire to have any familiarity.
    ▸ Java - All the cool kids will pick on me.
    ▸ C/C++ - I like myself too much.

  54. AN ARCHITECT MUST UNDERSTAND OTHERS’ VISIONS BEFORE EXPRESSING THEIR OWN.
    ARE YOU REALLY GOING TO QUOTE YOURSELF?

  55. mfw java :(

  56. VERSION 2 (JAVA BOOGALOO)
    [Diagram components: STREAMER, RECEIVER, RABBITMQ, PROCESSOR/ROUTER, COLLECTOR (×9), ELASTICSEARCH]

  57. public class StreamReader {
        private static final Logger logger = Logger.getLogger(StreamReader.class.getName());
        private StreamerQueue queue = new StreamerQueue();
        private StreamProcessor processor;
        private List workerThreads = new ArrayList();
        private RtStreamerClient client;

        public StreamReader(String streamerURI, AmqpClient amqpClient, String appID,
                            String tpcFltrs, String rfFltrs, String bt) {
          ArrayList queueList = new ArrayList();
          this.processor = new StreamProcessor(amqpClient);
          byte numThreads = 8;
          // Creating multiple threads with standalone connections to RabbitMQ
          for (int i = 0; i < numThreads; ++i) {
            StreamReader.BeaconWorkerThread worker = new StreamReader.BeaconWorkerThread();
            this.workerThreads.add(worker);
            worker.start();
          }
          queueList.add(this.queue);
          // Simple wrapper around native Java line streamer
          this.client = new RtStreamerClient(streamerURI, appID, tpcFltrs, rfFltrs, bt, queueList);
        }
    }

  58. public class StreamProcessor {
        private static final Logger logger = Logger.getLogger(StreamProcessor.class.getName());
        private AmqpClient amqpClient;

        public StreamProcessor(AmqpClient amqpClient) {
          this.amqpClient = amqpClient;
        }

        // Simple pass-thru
        public void send(String data) throws Exception {
          this.amqpClient.send(data.getBytes());
          logger.debug("Sent event " + data + " to AMQP");
        }
    }

  59. [Diagram: NETWORK INPUT, Linked List Queues (QUEUE ×11), NETWORK OUTPUT]

  61. CASE STUDY
    VERSION 2 PERFORMANCE & RESULTS
    ▸ Message Rate: 2600/s
    ▸ Receivers Needed: ~10
    ▸ Validity filtering is almost free in the Java receiver (if it can’t be parsed as JSON, drop it)
    ▸ Processor / Router Service selects only the messages it wants. Everything else is left for another service to collect, or to be dropped on the floor.
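
The receiver in the deck is written in Java; the sketch below shows the same "can't parse it, drop it" idea in JavaScript, to match the deck's other snippets. `parseOrDrop` is a hypothetical helper, and rejecting non-object JSON values such as bare numbers is an added assumption.

```javascript
// Return the parsed event, or null when the line is not a JSON object.
function parseOrDrop(line) {
  try {
    var event = JSON.parse(line);
    return (event !== null && typeof event === 'object') ? event : null;
  } catch (err) {
    return null; // malformed JSON: drop it
  }
}

var incoming = ['{"channel":"aol.us"}', '{"chan', '42'];
var valid = incoming.map(parseOrDrop).filter(function (e) { return e !== null; });
console.log(valid.length + ' of ' + incoming.length + ' lines kept');
```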

  63. WITHOUT CONSUMERS, A PIPELINE IS USELESS.
    PLEASE STOP QUOTING YOURSELF, SAMANTHA, IT’S PATHETIC

  64. LET’S DO MATH AT IT!
    REAL-TIME ANALYTICS SERVICE

  65. REAL-TIME ANALYTICS SERVICE
    GOALS
    ▸ Provide (near) real-time statistics, metrics, and analytics for editorial staff
    ▸ Allow statistical evaluation of arbitrary variables
    ▸ Provide a simple interface for developers working in the publishing stack (PHP)

  66. REAL-TIME ANALYTICS SERVICE
    WHAT IS ELASTICSEARCH?
    ▸ A full-text search database
    ▸ A high performance NoSQL document store that features
    ▸ High-availability via clustering
    ▸ Rack/Datacentre-aware sharding
    ▸ Expressive & dynamic query DSL
    ▸ Some powerful full-text search, I guess, whatever?

  67. AOL US East Datacentre
    AOL France Datacentre AWS us-east-1 Region
    AOL US West Datacentre
    ELASTICSEARCH MASTER
    ELASTICSEARCH NODE
    ELASTICSEARCH NODE
    ELASTICSEARCH NODE

  68. ELASTICSEARCH CLUSTERING
    ONE INDEX, TWO REPLICAS
    MASTER NODE NODE
    NODE
    R0 P1 P2 P0
    R1 R2 R3 R2 R3

  69. {
      "query": {
        "filtered": {
          "query": {
            "multi_match": {
              "query": "miley cyrus",
              "fields": [
                "byline",
                "title",
                "contents"
              ],
              "type": "cross_fields"
            }
          },
          "filter": {
            "terms": {
              "site_id": [
                698
              ]
            }
          }
        }
      },
      "size": 25
    }

  70. {
    "size": 0,
    "query": {
    "filtered": {
    "query": {
    "terms": {
    "content.source.cms.post_id": [
    12347,
    22314,
    242123,
    342414
    ]
    }
    },
    "filter": {
    "bool": {
    "must": [
    {
    "term": {
    "click_type": "ping"
    }
    },
    {
    "range": {
    "timestamp": {
    "gte": 1445854380000,
    "lte": 1445940780000
    }
    }
    }
    ]
    }
    }
    }
    },
    "aggregations": {
    "post_id": {
    "terms": {
    "field": "content.source.cms.post_id",
    "size": 4,
    "order": {
    "_count": "desc"
    }
    },
    "aggregations": {
    "search_terms": {
    "terms": {
    "field": "referer.search_term.raw"
    }
    },
    "source": {
    "terms": {
    "field": "referer.medium"
    },
    "aggregations": {
    "referer": {
    "terms": {
    "field": "referer.referer"
    },
    "aggregations": {
    "search_terms": {
    "terms": {
    "field": "referer.search_term.raw"
    }
    }
    }
    }
    }
    }
    }
    }
    }
    }

  72. var elasticsearch = require('elasticsearch');
    var client = new elasticsearch.Client({hosts: ['http://localhost:9200']});
    var buffer = [];
    // for...in yields keys, so look each document up by its key.
    for (var key in documents) {
      // Each document needs an action line followed by its source line.
      buffer.push({ index: { _index: "some_index", _type: "some_type" }});
      buffer.push(documents[key]);
    }
    client.bulk({body: buffer});

  73. $params = [];
    $params['type'] = 'stat';
    $params['index'] = isset($args->search_index)
    ? $args->search_index
    : $elasticSearch->getDatedIndexList($start, $end);
    $params['ignore_unavailable'] = true;
    $params['body'] = $this->getQuery($args->post_ids, $start, $end);
    $results = $client->search($params);
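
The slide calls `getDatedIndexList($start, $end)` without showing it. A plausible reading is that events are stored in one index per day and the helper expands a time range into a comma-separated index list, which Elasticsearch accepts in place of a single index name. The sketch below illustrates that idea in JavaScript; the `stats-YYYY.MM.DD` naming pattern and the `datedIndexList` function are assumptions, not AOL's implementation.

```javascript
// Expand a date range into one index name per UTC day.
function datedIndexList(prefix, start, end) {
  var names = [];
  // Start at midnight UTC of the first day in the range.
  var day = new Date(Date.UTC(
    start.getUTCFullYear(), start.getUTCMonth(), start.getUTCDate()
  ));
  while (day.getTime() <= end.getTime()) {
    var stamp = day.toISOString().slice(0, 10).replace(/-/g, '.');
    names.push(prefix + '-' + stamp);
    day.setUTCDate(day.getUTCDate() + 1);
  }
  return names.join(',');
}

console.log(datedIndexList(
  'stats',
  new Date(Date.UTC(2015, 9, 26, 10)),
  new Date(Date.UTC(2015, 9, 27, 10))
));
```

Under that reading, the `ignore_unavailable` flag set on the slide would let the search skip any day whose index is missing instead of failing.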

  74. CASE STUDY
    MULTIVARIATE TESTING - REQUIREMENTS
    ▸ Allow editors to test the performance of any discrete content element
    ▸ Content elements being: headlines, deks, ledes, subledes, hero images, river images, etc.
    ▸ Editors should be able to create, start, stop, and evaluate tests without spending developer time.

  75. CASE STUDY
    MULTIVARIATE TESTING - IMPLEMENTATION
    ▸ Assign new visitors to a test group via cookie
    ▸ Inject test markers into the beacon payload
    ▸ Compare CTR for PVs with test markers to calculate performance
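
The comparison step reduces to a click-through rate per variant: clicks divided by page views ("pings") that carried the variant's test marker. A minimal sketch, assuming the counts have already been aggregated per variant hash; the input shape and the `clickThroughRates` name are illustrative.

```javascript
// Compute CTR (as a percentage) for each variant hash.
function clickThroughRates(countsByHash) {
  var rates = {};
  Object.keys(countsByHash).forEach(function (hash) {
    var counts = countsByHash[hash];
    // A variant with no page views yet has no meaningful CTR; report 0.
    rates[hash] = counts.pings > 0 ? (counts.clicks / counts.pings) * 100 : 0;
  });
  return rates;
}

console.log(clickThroughRates({
  headline_a: { clicks: 25, pings: 100 },  // hypothetical counts
  headline_b: { clicks: 40, pings: 100 },
  headline_c: { clicks: 0, pings: 0 }
}));
```

Raw percentages only say which variant is ahead so far; deciding whether the gap is meaningful still depends on how many page views each variant has received.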

  76. {
      "mv_stats": {
        "type": "nested",
        "include_in_parent": true,
        "properties": {
          "hash": {
            "type": "string"
          },
          "test_id": {
            "type": "integer"
          }
        }
      }
    }
    hash: TEST POPULATION IDENTIFIER
    test_id: TEST ID

  77. {
    "size": 0,
    "query": {
    "filtered": {
    "query": {
    "terms": {
    "mv_stats.test_id": [
    42
    ]
    }
    },
    "filter": {
    "bool": {
    "must": [
    {
    "term": {
    "click_type": "ping"
    }
    },
    {
    "range": {
    "timestamp": {
    "gte": 1445854380000,
    "lte": 1445940780000
    }
    }
    }
    ]
    }
    }
    }
    },
    "aggs": {
    "event_type": {
    "terms": {
    "field": "click_type"
    },
    "aggs": {
    "multivariate": {
    "nested": {
    "path": "mv_stats"
    },
    "aggs": {
    "test_ids": {
    "terms": {
    "field": "mv_stats.test_id"
    },
    "aggs": {
    "hashes": {
    "terms": {
    "field": "mv_stats.hash"
    },
    "aggs": {
    "event_types": {
    "terms": {
    "field": "click_type"
    }
    }
    }
    }
    }
    }
    }
    }
    }
    }
    }
    }
    REGULAR PAGEVIEW
    AGGREGATIONS NEST AND TAKE A CONTEXT OF THE PARENT

  78. $results = $this->analytics()->multivariate()->get([
      'test_id' => $id,
      'event_type' => 'all',
      'start' => $test['started']
    ])->data();
    if (!empty($results['hashes'])) {
      foreach (array_keys($test['items']) as $hash) {
        $clicks = 0;
        if (!empty($results['hashes'][$hash]['clicks'])) {
          $clicks = $results['hashes'][$hash]['clicks'];
        }
        $pings = 0;
        if (!empty($results['hashes'][$hash]['pings'])) {
          $pings = $results['hashes'][$hash]['pings'];
        }
        $test['items'][$hash]['clicks'] = $clicks;
        $test['items'][$hash]['pings'] = $pings;
        // Guard against division by zero when a variant has no pings yet.
        $test['items'][$hash]['percent'] = $pings > 0 ? ($clicks / $pings) * 100 : 0;
      }
    }

  80. MEANINGLESS SHINIES TO OOH AND AHH AT
    WALL MAPS

  83. [Diagram: RABBITMQ, INPUT, OUT TO ANALYTICS SERVICE, OUT TO VISUALIZATION SERVICE]

  84. function plot(point) {
      var points = svg.selectAll("circle")
        .data([point], function(d) {
          return d.id;
        });
      points.enter()
        .append("circle")
        // parseFloat keeps the fractional degrees that parseInt would truncate
        .attr("cx", function (d) { return projection([parseFloat(d.location.geopoint.lon), parseFloat(d.location.geopoint.lat)])[0]; })
        .attr("cy", function (d) { return projection([parseFloat(d.location.geopoint.lon), parseFloat(d.location.geopoint.lat)])[1]; })
        .attr("r", function (d) { return 1; })
        .style('fill', 'red')
        .style('fill-opacity', 1)
        .style('stroke', 'red')
        .style('stroke-width', '0.5px')
        .style('stroke-opacity', 1)
        .transition()
        .duration(10000)
        .style('fill-opacity', 0)
        .style('stroke-opacity', 0)
        .attr('r', '32px').remove();
    }
    var buffer = [];
    var socket = io();
    socket.on('geopoint', function(point) {
      if (point.location.geopoint) {
        plot(point);
      }
    });

  87. EMBEDDABLE VISUALIZATIONS
    IN-DEVELOPMENT

  89. var views = 0;
    var socket = io();
    socket.on('pageview', function(point) {
      views++;
    });
    function tick() {
      data.push(views);
      views = 0;
      path
        .attr("d", line)
        .attr("transform", null)
        .transition()
        .duration(500)
        .ease("linear")
        .attr("transform", "translate(" + x(0) + ",0)")
        .each("end", tick);
      data.shift();
    }
    tick();

  90. LIVE PROFILING
    DEVELOPERS NEED LOVE TOO

  91. LIVE PROFILING
    DEVELOPING ON THE AOL MEDIA PLATFORM
    ▸ Use our API and build what you like on servers you manage.
    ▸ Use our managed hosting platform which handles scaling, caching, etc.
    ▸ But… requires you to work in a custom DSL

  92. HOLY CRAP THE GUY WHO BUILT ALL OF THIS IS A GENIUS
    DEVELOPING FOR THE AOL MEDIA PLATFORM
    ▸ Create a repository in your source control system of choice
    ▸ Write code in our twig-based language (CodeBlocks)
    ▸ Code on your local machine is synced to a live sandbox with access to test data and resources that mirror production
    ▸ Promote sandboxes to live production
    ▸ This was seriously all built by a guy named Ralph.

  93. {%
      set posts = api.posts.query({
        page: req.params.page|default(1),
        limit: 3,
        categories: req.params.category
          ? [{parent:req.params.category}, req.params.category]
          : null,
        categories_match: 'any',
        tags: req.params.tag ? [req.params.tag] : null
      })
    %}

  94. DEV STARTS A PROFILER SESSION
    DEV VISITS PRODUCTION SITE WITH QUERY PARAM
    RENDER SERVER ACTIVATES PROFILING
    EVENT MESSAGES ARE TAGGED WITH SESSION ID
    RABBITMQ ROUTES TAGGED MESSAGES TO PROFILER SERVICE
    DEV’S PROFILER CONSOLE CONNECTS TO PROFILER SERVICE
    PROFILER SERVICE WAITS FOR MESSAGES
    MESSAGES ARE RECEIVED AND RENDERED IN THE CONSOLE
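
The slide does not say how RabbitMQ picks out the tagged messages. One way to do it is a topic exchange, where a binding pattern is matched against a dot-separated routing key: `*` stands for exactly one word and `#` for zero or more words. The sketch below implements that matching rule to show the idea; the routing-key layout `profiler.<sessionId>.<event>` and the `topicMatches` helper are assumptions.

```javascript
// Match an AMQP-style topic pattern against a routing key.
function topicMatches(pattern, routingKey) {
  var p = pattern.split('.');
  var k = routingKey.split('.');
  function match(i, j) {
    if (i === p.length) { return j === k.length; }
    if (p[i] === '#') {
      // '#' may absorb zero or more words.
      for (var n = j; n <= k.length; n++) {
        if (match(i + 1, n)) { return true; }
      }
      return false;
    }
    if (j === k.length) { return false; }
    return (p[i] === '*' || p[i] === k[j]) && match(i + 1, j + 1);
  }
  return match(0, 0);
}

// A profiler session bound to its own session ID (hypothetical keys).
console.log(topicMatches('profiler.abc123.#', 'profiler.abc123.render.start'));
console.log(topicMatches('profiler.abc123.#', 'profiler.zzz999.render.start'));
```

With a binding like this, only messages tagged with the developer's session ID would reach that developer's profiler queue; the broker does the matching, so the function above is purely explanatory.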

  96. CROSS-PLATFORM EVENTING
    WHEN A PACKET HITS A SOCKET ON A POCKET ON A PORT

  97. CROSS-PLATFORM EVENTING
    A LITTLE SOMETHING FOR THAT NICE ENGINEER OVER THERE
    ▸ Allow devs to dispatch “native” events in one stack and observe them in another
    ▸ The PHP CMS uses the Symfony EventDispatcher to trigger an event in Node.js
    ▸ Distributed event handling without PHP workers
    ▸ Event-driven search indexing (no rivers or crons)

  98. [Diagram: RABBITMQ (shown twice), INPUT, OUT TO ANALYTICS SERVICE, OUT TO VISUALIZATION SERVICE, OUT TO EVENT HANDLER SERVICE]

  99. // Overriding the default behavior of the PHP event dispatcher
    public function dispatch($event_name, Event $event = null)
    {
      $dispatchedEvent = parent::dispatch($event_name, $event);
      // Devs mark events as "forwardable" by implementing an interface
      if ($dispatchedEvent instanceof ForwardableEvent) {
        $data = $dispatchedEvent->getEventData();
        try {
          // Events are forwarded on to an AMQP exchange
          $this->amqp->publish(
            self::AMQP_CONNECTION,
            self::AMQP_EXCHANGE,
            self::AMQP_ROUTING_KEY,
            json_encode(['name' => $event_name, 'data' => $data])
          );
        } catch (\Exception $exc) {
          $this->logger->error(
            self::class . ' failed to publish event to AMQP.',
            [ 'exception' => $exc ]
          );
        }
      }
      return $dispatchedEvent;
    }

  100. module.exports = {
      register: function (config) {
        client = new es.Client({
          hosts: config.hosts,
          log: Logger
        });
        logger.info('AMP Elasticsearch Indexer module loaded!');
      },
      listeners: {
        // JS function executed when PHP dispatches events
        'amp.post.save': function (event, callback) {
          var index = 'posts';
          var type = 'post';
          var id = event['id'];
          if (!id) {
            return callback('Invalid post object received');
          }
          indexRecord(index, type, id, event, callback);
        }
      }
    };
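
The piece not shown on the slides is how a forwarded message reaches a module's `listeners` map. The sketch below is one way the Node.js side could do it, assuming each AMQP message body is the JSON document published by the PHP dispatcher (`{"name": ..., "data": ...}`) and that modules have the shape shown on the slide. `createDispatcher` is a hypothetical name, and the AMQP consumer that would call it is left out.

```javascript
// Build a function that routes a decoded message to every module
// with a listener registered for the event name.
function createDispatcher(modules) {
  return function dispatch(messageBody, done) {
    var message = JSON.parse(messageBody);
    var handlers = [];
    modules.forEach(function (mod) {
      var handler = mod.listeners && mod.listeners[message.name];
      if (typeof handler === 'function') { handlers.push(handler); }
    });
    var remaining = handlers.length;
    var errors = [];
    if (remaining === 0) { return done(errors, 0); }
    handlers.forEach(function (handler) {
      handler(message.data, function (err) {
        if (err) { errors.push(err); }
        remaining -= 1;
        if (remaining === 0) { done(errors, handlers.length); }
      });
    });
  };
}

// Usage with a stand-in module; a real one would index into Elasticsearch.
var dispatch = createDispatcher([{
  listeners: {
    'amp.post.save': function (event, callback) {
      callback(event.id ? null : 'Invalid post object received');
    }
  }
}]);
dispatch('{"name":"amp.post.save","data":{"id":7}}', function (errors, count) {
  console.log(count + ' listener(s) ran, ' + errors.length + ' error(s)');
});
```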

  101. WRAPPING IT UP
    AOL’S DATA PIPELINE - BY THE NUMBERS
    ▸ 1.3 billion events per day
    ▸ Routed by RabbitMQ to microservice consumers
    ▸ Driving real-time analytics over 250 GB of raw data per day
    ▸ Visualizing 1.3 million events per day
    ▸ Generating live profiles for developers of ~50 properties
    ▸ Handling 10,000 Elasticsearch search index updates per day

  102. WRAPPING IT UP
    AOL’S DATA PIPELINE - STACKS AND TECH
    ▸ Programming Languages: Java, Node.js, PHP, Python (HA load-balancing and routing)
    ▸ Hadoop, RabbitMQ, Elasticsearch, Vertica

  103. WRAPPING IT UP
    AOL’S DATA PIPELINE - 2016 & BEYOND
    ▸ Embeddable visualizations
    ▸ On-demand stream filters with Redis time-series bucketing
    ▸ Real-time predictive performance analysis
    ▸ Real-time social sentiment analysis
    ▸ Moving all of this infrastructure to AWS (oy!)
    ▸ Integrating Apache Spark

  104. PIPELINE MAP
    [Diagram components: AOL DATA LAYER, RABBITMQ, REAL-TIME ANALYTICS SERVICE, VISUALIZATIONS
    SERVICE, AOL MEDIA PLATFORM, PROFILER SERVICE, CROSS-PLATFORM EVENT PROPAGATION SERVICE,
    RELEGENCE]

  105. Tweet: @ieatkillerbees
    Blog: samanthaquinones.com
    AOL: engineering.aol.com
    https://joind.in/15752
