Kafka, Samza, and the Unix philosophy of distributed data

Talk given at the UK Hadoop Meetup (HUGUK), London, UK on 5 August 2015. http://martin.kleppmann.com/2015/08/05/samza-unix-philosophy-at-huguk.html

Transcript: http://martinkl.com/unix

Abstract:

One of the big ideas in Unix was to allow small, simple command-line tools to be chained together with pipes. Each of those tools would do one thing and do it well. Even now, more than 40 years later, Unix tools are one of the most powerful ways of getting things done: a one-liner of grep | awk | sort | uniq is still one of the fastest ways of processing data and analysing logs.

Many modern data systems are monolithic, the very opposite of the Unix philosophy. But Apache Samza is different: it is, in some sense, an attempt to bring the Unix philosophy into 21st-century distributed systems. In this talk, we will explore the design decisions behind Samza, and see how the Unix philosophy can help us build modern systems that are robust, scalable and maintainable.

Martin Kleppmann

August 05, 2015

Transcript

  2. 216.58.210.78 - - [27/Feb/2015:17:55:11 +0000] "GET /css/typography.css HTTP/1.1" 200 3377 "http://martin.kleppmann.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36"
  12. cat access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 5
  13. cat access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 5

    216.58.210.78 - - [27/Feb/2015:17:55:11 +0000] "GET /css/typography.css HTTP/1.1" 200 3377 "http://martin.kleppmann.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36"
  14. cat access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 5

    /css/typography.css
    /index.html
    /favicon.ico
    /talks.html
    /favicon.ico
    /index.html
    /css/typography.css
    /favicon.ico
  15. cat access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 5

    /css/typography.css
    /css/typography.css
    /favicon.ico
    /favicon.ico
    /favicon.ico
    /index.html
    /index.html
    /talks.html
  16. cat access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 5

    2 /css/typography.css
    3 /favicon.ico
    2 /index.html
    1 /talks.html
  17. cat access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 5

    3 /favicon.ico
    2 /css/typography.css
    2 /index.html
    1 /talks.html
  18. cat access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 5

    3 /favicon.ico
    2 /css/typography.css
    2 /index.html
    1 /talks.html
  71. References
    • M D McIlroy, E N Pinson, and B A Tague: “UNIX Time-Sharing System: Foreword,” The Bell System Technical Journal, volume 57, number 6, pages 1899–1904, July 1978. https://archive.org/details/bstj57-6-1899
    • Rob Pike and Brian W Kernighan: “Program design in the UNIX environment,” AT&T Bell Laboratories Technical Journal, volume 63, number 8, pages 1595–1605, October 1984. doi: 10.1002/j.1538-7305.1984.tb00055.x, http://harmful.cat-v.org/cat-v/unix_prog_design.pdf
    • Dennis Ritchie: “Advice from Doug McIlroy.” http://cm.bell-labs.co/who/dmr/mdmpipe.html
    • Jay Kreps: “Putting Apache Kafka to use: A practical guide to building a stream data platform (part 2).” 24 February 2015. http://www.confluent.io/blog/stream-data-platform-2/
    • Jay Kreps: “I ♥︎ Logs.” O’Reilly Media, September 2014. http://shop.oreilly.com/product/0636920034339.do
    • Martin Kleppmann: “Bottled Water: Real-time integration of PostgreSQL and Kafka.” 23 April 2015. http://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/
    • Martin Kleppmann: “Designing Data-Intensive Applications.” O’Reilly Media, to appear in 2015. http://dataintensive.net
  73. Bonus slides

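The log-analysis pipeline walked through on slides 12 to 18 can be tried locally. What follows is a minimal sketch, assuming a POSIX shell with awk, sort, uniq and head available. The eight-line access log is hypothetical: it reuses the request from slide 2 as a template (with the user agent shortened) and substitutes the sequence of paths shown on slide 14. The function name `top_paths` is an illustrative choice, not something from the talk, and reading the file with a redirect instead of `cat` is equivalent here.

```shell
#!/bin/sh
set -e

# Hypothetical sample log: eight requests whose paths follow the order shown
# on slide 14. All other fields are copied from the request on slide 2,
# with the user agent shortened to "Mozilla/5.0".
log=$(mktemp)
trap 'rm -f "$log"' EXIT

for path in /css/typography.css /index.html /favicon.ico /talks.html \
            /favicon.ico /index.html /css/typography.css /favicon.ico; do
  printf '216.58.210.78 - - [27/Feb/2015:17:55:11 +0000] "GET %s HTTP/1.1" 200 3377 "http://martin.kleppmann.com/" "Mozilla/5.0"\n' "$path"
done > "$log"

# top_paths (hypothetical name): the pipeline from slide 12, reading a log
# on stdin and printing the N most requested paths (default 5).
#   awk '{print $7}'  -> extract the request path (7th whitespace-separated field)
#   sort | uniq -c    -> bring identical paths together and count each run
#   sort -rn          -> order by count, highest first
#   head -n N         -> keep only the first N lines
top_paths() {
  awk '{print $7}' | sort | uniq -c | sort -rn | head -n "${1:-5}"
}

top_paths 5 < "$log"
```

Running it should print /favicon.ico with a count of 3 first and /talks.html with a count of 1 last. The two paths requested twice tie on the numeric key, so their relative order depends on how your sort implementation breaks ties, and the width of the count column differs between implementations of uniq -c. Because every stage only reads standard input and writes standard output, you can inspect an intermediate result, as slides 14 to 17 do, by cutting the pipeline short, for example `awk '{print $7}' < "$log" | sort | uniq -c`.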