Making sense of stream processing

Talk given at /dev/winter 2015, Cambridge, UK. http://devcycles.net/2015/winter/sessions/index.php?session=8

Abstract:

Some people call it stream processing. Others call it Event Sourcing or CQRS. Some even call it Complex Event Processing. Sometimes, such self-important buzzwords are just smoke and mirrors, invented by consultants to sell you stuff. But sometimes, they contain a kernel of wisdom which can really help us design better systems.

In this talk, we will go in search of the wisdom behind the buzzwords. We will discuss how event streams can help make your application more scalable, more reliable and more maintainable. Grounded in the experience of building large-scale data systems at LinkedIn, and implemented in open source projects like Apache Samza, stream processing is finally coming of age.

Martin Kleppmann

January 24, 2015

Transcript

  6. { eventType: PageViewEvent, timestamp: 1413215518, ipAddress: 12.34.56.78, sessionId: 106d2a521d3c6abcf36, pageUrl: /talks.html, referrer: google.com/search?q=…, browser: Chrome 39 }
  14. { eventType: PageViewEvent, timestamp: 1413215518, ipAddress: 12.34.56.78, sessionId: 106d2a521d3c6abcf36, pageUrl: /talks.html, referrer: google.com/search?q=…, browser: Chrome 39 }
  31. Input (write) Output (read)

  32. Input (write):

    { "user_id": 17055506, "timestamp": 1421777578, "status": "Hello world!" }

    Output (read):

    { "timeline_id": 17055506, "tweets": [ { "tweet_id": 557595969962127360, "username": "hotnumbers", "name": "Hot Numbers Coffee", "timestamp": 1421777123, "status": "Open till 9pm tonight", "picture_url": "http://twimg.com/…" }, { "tweet_id": 557515007622414337, … }, … ] }
  33. SELECT tweets.*, users.* FROM tweets
        JOIN users ON users.id = tweets.sender_id
        JOIN follows ON follows.followee_id = users.id
        WHERE follows.follower_id = $user
        ORDER BY tweets.time DESC LIMIT 100;
  34. Input (write) Output (read)

  35. Input (write):

    { "user_id": 10152654725303061, "action": "like", "item_id": 10101851078206231 }

    Output (read):

    { "post_id": 10101851078206231, "author": { "name": "Mark Zuckerberg", "username": "zuck", "photo_url": "http://fbcdn.akamai…" }, "post_text": "You can’t kill an idea.\n…", "timestamp": 1421025628, "total_likes": 160213, "total_shares": 6027, "top_comments": [ {"name": "Saida Maaoui", …}, … ], … }
  39. Input (write) Output (read)

  40. Input (write):

    { "user": "Foo", "edit_timestamp": 1421777578, "text": "Elliptic curve cryptography (ECC) is an approach to [[public-key cryptography]] based on the algebraic structure of [[elliptic curve]]s over [[finite field]]s. …" }

    Output (read):

    { "text": "Elliptic curve cryptography (ECC) is an approach to [[public-key cryptography]] based on the algebraic structure of [[elliptic curve]]s over [[finite field]]s. …" }
  41. Input (write) Output (read)

  42. Input (write):

    { "user_id": 12526586, "action": "add-job-to-profile", "job_info": { "job_title": "Author", "company": "O’Reilly", "start_date": "2013-08", "end_date": null, "description": "…" } }

    Output (read):

    accountant → 168929, 929431, …
    administrative → 481143, 937298, …
    actor → 656468, 807204, 894765, …
    advertising → 702221, 715066, …
    airline → 71955, 215020, 545045, …
    animal → 107553, 478445, 720498, …
    auction → 770989, 833569, …
    author → 218037, 755543, …
    banker → 408729, 758862, …
    biotechnology → 106272, 228421, …
    business → 22388, 539165, …
    chef → 94341, 363176, 365579, …
    college → 459788, 830339, …
    computer → 598379, 693195, …
    …
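
The "Input (write)" and "Output (read)" slides pair a small write event with a larger, read-optimized structure. One way to picture that relationship is as a fold over the event stream: each incoming event updates a derived view that reads can then hit directly. The following is a minimal Python sketch of that idea, not the implementation used by any of the systems shown in the slides. The function names (`materialize`, `apply_like`, `apply_job`), the in-memory event list, and the third event (user 42) are assumptions made for illustration; the other two events are taken from slides 35 and 42, with the elided "description" field left out.

```python
from collections import defaultdict


def apply_like(view, event):
    """Fold one event into a per-item like counter (a read-side view)."""
    if event["action"] == "like":
        view[event["item_id"]] += 1
    return view


def apply_job(index, event):
    """Fold one event into a keyword -> set of user_ids search index."""
    if event["action"] == "add-job-to-profile":
        # Hypothetical tokenization: lowercase the job title, split on spaces.
        for word in event["job_info"]["job_title"].lower().split():
            index[word].add(event["user_id"])
    return index


def materialize(events, apply, initial):
    """Replay an event stream in order to build a derived view."""
    view = initial
    for event in events:
        view = apply(view, event)
    return view


# Append-only log of write events, in arrival order.
events = [
    {"user_id": 10152654725303061, "action": "like",
     "item_id": 10101851078206231},
    {"user_id": 12526586, "action": "add-job-to-profile",
     "job_info": {"job_title": "Author", "company": "O'Reilly",
                  "start_date": "2013-08", "end_date": None}},
    # Hypothetical second like, added only to show the counter growing.
    {"user_id": 42, "action": "like", "item_id": 10101851078206231},
]

like_counts = materialize(events, apply_like, defaultdict(int))
job_index = materialize(events, apply_job, defaultdict(set))

print(like_counts[10101851078206231])  # 2
print(sorted(job_index["author"]))     # [12526586]
```

Because both views are computed from the same log, either one could be discarded and rebuilt by replaying the events, and a new view could be added later without changing how events are written.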