Kafka, Samza, and the Unix philosophy of distributed data

by Martin Kleppmann

Published August 5, 2015 in Programming

Talk given at the UK Hadoop Meetup (HUGUK), London, UK on 5 August 2015. http://martin.kleppmann.com/2015/08/05/samza-unix-philosophy-at-huguk.html

Transcript: http://martinkl.com/unix

Abstract:

One of the big ideas in Unix was to allow small, simple command-line tools to be chained together with pipes. Each of those tools would do one thing and do it well. Even now, over 40 years later, Unix tools are one of the most powerful ways of getting things done: a one-liner of grep | awk | sort | uniq is still one of the fastest ways of processing data and analysing logs.
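As a rough illustration of that chaining idea (not taken from the talk), here is a minimal Python sketch in which each stage does one small job and passes a stream of lines to the next, loosely mirroring grep | awk | sort | uniq -c. The log format, the sample lines, and the function names (`grep`, `field`, `sort_uniq_count`, `clients_with_404s`) are all assumptions made up for this example.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Pass through only the lines containing `needle` (like a fixed-string grep)."""
    return (line for line in lines if needle in line)


def field(lines: Iterable[str], index: int) -> Iterator[str]:
    """Emit one whitespace-separated field per line (like awk '{print $N}', 0-based here)."""
    for line in lines:
        parts = line.split()
        if index < len(parts):
            yield parts[index]


def sort_uniq_count(items: Iterable[str]) -> List[Tuple[str, int]]:
    """Count duplicates and order by descending count (like sort | uniq -c | sort -rn)."""
    counts = Counter(items)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical access log: client address, method, path, status code.
SAMPLE_LOG = [
    "203.0.113.5 GET /index.html 200",
    "203.0.113.9 GET /missing 404",
    "203.0.113.5 GET /about.html 200",
    "203.0.113.5 GET /gone 404",
    "203.0.113.9 GET /nothing 404",
    "203.0.113.9 GET /index.html 200",
    "203.0.113.7 GET /nope 404",
]


def clients_with_404s(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Compose the stages: filter 404 responses, extract the client, count occurrences."""
    return sort_uniq_count(field(grep(lines, " 404"), 0))


if __name__ == "__main__":
    for client, count in clients_with_404s(SAMPLE_LOG):
        print(count, client)
```

Each stage knows nothing about the others beyond the shared interface (an iterable of strings), which is the property that makes the shell one-liner so easy to rearrange.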

Many modern data systems are monolithic, the very opposite of the Unix philosophy. But Apache Samza is different: it is, in some sense, an attempt to bring the Unix philosophy into 21st-century distributed systems. In this talk, we will explore the design decisions behind Samza, and see how the Unix philosophy can help us build modern systems that are robust, scalable and maintainable.