
Making sense of stream processing

Talk given at /dev/winter 2015, Cambridge, UK. http://devcycles.net/2015/winter/sessions/index.php?session=8

Abstract:

Some people call it stream processing. Others call it Event Sourcing or CQRS. Some even call it Complex Event Processing. Sometimes, such self-important buzzwords are just smoke and mirrors, invented by consultants to sell you stuff. But sometimes, they contain a kernel of wisdom which can really help us design better systems.

In this talk, we will go in search of the wisdom behind the buzzwords. We will discuss how event streams can help make your application more scalable, more reliable and more maintainable. Grounded in the experience of building large-scale data systems at LinkedIn, and implemented in open source projects like Apache Samza, stream processing is finally coming of age.

Martin Kleppmann

January 24, 2015

Transcript

  6. {
      eventType: PageViewEvent,
      timestamp: 1413215518,
      ipAddress: 12.34.56.78,
      sessionId: 106d2a521d3c6abcf36,
      pageUrl: /talks.html,
      referrer: google.com/search?q=…,
      browser: Chrome 39
    }
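As a rough illustration of what can be done with a stream of raw events like the one on this slide, the sketch below folds page view events into per-URL counts. The sample events, the `count_page_views` helper and the choice of aggregate are assumptions made for illustration; they are not taken from the slides.

```python
from collections import Counter

# Hypothetical sample events, modelled on the PageViewEvent shown on the slide.
events = [
    {"eventType": "PageViewEvent", "timestamp": 1413215518, "pageUrl": "/talks.html"},
    {"eventType": "PageViewEvent", "timestamp": 1413215530, "pageUrl": "/index.html"},
    {"eventType": "PageViewEvent", "timestamp": 1413215544, "pageUrl": "/talks.html"},
]


def count_page_views(event_stream):
    """Fold a stream of raw events into per-URL page view counts."""
    counts = Counter()
    for event in event_stream:
        # Ignore any event types other than page views.
        if event.get("eventType") == "PageViewEvent":
            counts[event["pageUrl"]] += 1
    return counts


page_views = count_page_views(events)
print(page_views["/talks.html"])
```

Because the raw events are kept, a different aggregate (for example, views per session) could later be derived from the same stream without changing how the events are recorded.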

  14. {
      eventType: PageViewEvent,
      timestamp: 1413215518,
      ipAddress: 12.34.56.78,
      sessionId: 106d2a521d3c6abcf36,
      pageUrl: /talks.html,
      referrer: google.com/search?q=…,
      browser: Chrome 39
    }

  31. Input (write)
    Output (read)


  32. Input (write)

    {
      "user_id": 17055506,
      "timestamp": 1421777578,
      "status": "Hello world!"
    }

    Output (read)

    {
      "timeline_id": 17055506,
      "tweets": [
        {
          "tweet_id": 557595969962127360,
          "username": "hotnumbers",
          "name": "Hot Numbers Coffee",
          "timestamp": 1421777123,
          "status": "Open till 9pm tonight",
          "picture_url": "http://twimg.com/…"
        }, {
          "tweet_id": 557515007622414337,
          …
        }, …
      ]
    }

  33. SELECT tweets.*, users.*
    FROM tweets
    JOIN users
    ON users.id = tweets.sender_id
    JOIN follows
    ON follows.followee_id = users.id
    WHERE follows.follower_id = $user
    ORDER BY tweets.time DESC
    LIMIT 100;

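The query on this slide assembles a timeline at read time by joining tweets, users and follows. One alternative is to do that work on the write path and keep a ready-made timeline per user. The following is a minimal in-memory sketch of that idea, not the approach of any particular service: `followers_of`, `timelines`, `follow`, `post_tweet` and `read_timeline` are hypothetical names, and it assumes tweets arrive in timestamp order.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the follows table and a timeline cache.
followers_of = defaultdict(set)   # followee_id -> set of follower_ids
timelines = defaultdict(list)     # follower_id -> tweets, newest first


def follow(follower_id, followee_id):
    followers_of[followee_id].add(follower_id)


def post_tweet(tweet, timeline_limit=100):
    """Write path: push the new tweet onto every follower's timeline."""
    for follower_id in followers_of[tweet["user_id"]]:
        timeline = timelines[follower_id]
        timeline.insert(0, tweet)        # assumes tweets arrive in time order
        del timeline[timeline_limit:]    # keep the newest entries, like LIMIT 100


def read_timeline(user_id):
    """Read path: a plain lookup, with no joins."""
    return timelines[user_id]


follow(follower_id=1, followee_id=17055506)
post_tweet({"user_id": 17055506, "timestamp": 1421777578, "status": "Hello world!"})
print(read_timeline(1))
```

The trade-off in this sketch is that each write touches one timeline per follower, in exchange for reads that need no join.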

  34. Input (write)
    Output (read)


  35. Input (write)

    {
      "user_id": 10152654725303061,
      "action": "like",
      "item_id": 10101851078206231
    }

    Output (read)

    {
      "post_id": 10101851078206231,
      "author": {
        "name": "Mark Zuckerberg",
        "username": "zuck",
        "photo_url": "http://fbcdn.akamai…"
      },
      "post_text": "You can’t kill an idea.\n…",
      "timestamp": 1421025628,
      "total_likes": 160213,
      "total_shares": 6027,
      "top_comments": [
        {"name": "Saida Maaoui", …},
        …
      ], …
    }
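The gap between the small "like" event on the write side and the aggregated `total_likes` field on the read side can be sketched as a fold over events. This is an illustrative sketch only: the `posts` dictionary, the `likes_seen` set and the `apply_event` function are hypothetical, and ignoring repeated likes from the same user is an assumption rather than something the slide states.

```python
# Hypothetical materialized view of posts, keyed by post ID.
posts = {
    10101851078206231: {"post_id": 10101851078206231, "total_likes": 160213},
}

# Remember who liked what, so a repeated event is not counted twice.
likes_seen = set()


def apply_event(event):
    """Update the read-side view in response to one write-side event."""
    key = (event["user_id"], event["item_id"])
    if event["action"] == "like" and key not in likes_seen:
        likes_seen.add(key)
        posts[event["item_id"]]["total_likes"] += 1


like = {"user_id": 10152654725303061, "action": "like", "item_id": 10101851078206231}
apply_event(like)
apply_event(like)  # a duplicate delivery of the same event is ignored
print(posts[10101851078206231]["total_likes"])
```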

  39. Input (write)
    Output (read)


  40. Input (write)

    {
      "user": "Foo",
      "edit_timestamp": 1421777578,
      "text": "Elliptic curve cryptography (ECC) is an approach to [[public-key cryptography]] based on the algebraic structure of [[elliptic curve]]s over [[finite field]]s. …"
    }

    Output (read)

    {
      "text": "Elliptic curve cryptography (ECC) is an approach to [[public-key cryptography]] based on the algebraic structure of [[elliptic curve]]s over [[finite field]]s. …"
    }

  41. Input (write)
    Output (read)


  42. Input (write)

    {
      "user_id": 12526586,
      "action": "add-job-to-profile",
      "job_info": {
        "job_title": "Author",
        "company": "O’Reilly",
        "start_date": "2013-08",
        "end_date": null,
        "description": "…"
      }
    }

    Output (read)

    accountant → 168929, 929431, …
    administrative → 481143, 937298, …
    actor → 656468, 807204, 894765, …
    advertising → 702221, 715066, …
    airline → 71955, 215020, 545045, …
    animal → 107553, 478445, 720498, …
    auction → 770989, 833569, …
    author → 218037, 755543, …
    banker → 408729, 758862, …
    biotechnology → 106272, 228421, …
    business → 22388, 539165, …
    chef → 94341, 363176, 365579, …
    college → 459788, 830339, …
    computer → 598379, 693195, …
    …
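The read side of this slide looks like an inverted index from words to user IDs. A minimal sketch of how profile-edit events could be folded into such an index follows. The `search_index` structure, the `index_event` function, the choice of indexed fields and the simple lower-case word tokenizer are all assumptions for illustration; the elided description is left as "…".

```python
import re
from collections import defaultdict

# Hypothetical inverted index: lower-cased word -> set of user IDs.
search_index = defaultdict(set)


def index_event(event):
    """Add the words of a newly added job to the inverted index."""
    if event["action"] != "add-job-to-profile":
        return
    job = event["job_info"]
    # Assumption: only these three text fields are searchable.
    text = " ".join(
        str(job.get(field) or "")
        for field in ("job_title", "company", "description")
    )
    for word in re.findall(r"[a-z]+", text.lower()):
        search_index[word].add(event["user_id"])


index_event({
    "user_id": 12526586,
    "action": "add-job-to-profile",
    "job_info": {
        "job_title": "Author",
        "company": "O'Reilly",
        "start_date": "2013-08",
        "end_date": None,
        "description": "…",
    },
})
print(sorted(search_index["author"]))
```

A search for a word then becomes a single dictionary lookup, with the cost of tokenizing paid once when the profile is edited.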
