Kafka and Samza: Distributed stream processing in practice

Kafka at LinkedIn
• 350+ commodity machines
• 8,000+ topics
• 140,000+ partitions
• 278 billion messages/day
• 49 TB/day in
• 176 TB/day out
• Peak load
  – 4.4 million messages per second
  – 6 gigabits/sec inbound
  – 21 gigabits/sec outbound
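The daily totals on this slide can be turned into per-second and per-message figures, which makes them easier to compare with the quoted peak load. The following is a rough sketch rather than anything from the talk itself: it assumes decimal terabytes (1 TB = 10^12 bytes), an even spread of traffic over a 24-hour day, and that "TB/day in" counts each produced message once. The function and key names are made up for illustration.

```python
# Figures copied from the "Kafka at LinkedIn" slide.
MESSAGES_PER_DAY = 278e9
BYTES_IN_PER_DAY = 49e12       # assumption: 1 TB = 10**12 bytes
BYTES_OUT_PER_DAY = 176e12     # assumption: 1 TB = 10**12 bytes
PEAK_MESSAGES_PER_SEC = 4.4e6

SECONDS_PER_DAY = 24 * 60 * 60


def derived_figures():
    """Back-of-the-envelope numbers derived from the slide's daily totals."""
    avg_rate = MESSAGES_PER_DAY / SECONDS_PER_DAY
    return {
        # Average message rate if traffic were spread evenly over the day.
        "avg_messages_per_sec": avg_rate,
        # Average inbound bytes per message.
        "avg_bytes_per_message": BYTES_IN_PER_DAY / MESSAGES_PER_DAY,
        # How many times each inbound byte is read back out, on average.
        "read_fan_out": BYTES_OUT_PER_DAY / BYTES_IN_PER_DAY,
        # How far the quoted peak sits above the daily average rate.
        "peak_to_avg_ratio": PEAK_MESSAGES_PER_SEC / avg_rate,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the average comes out at roughly 3.2 million messages per second and about 176 bytes per message, with each byte written being read about 3.6 times. The read fan-out is consistent with several consumers subscribing to the same topics.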


Martin Kleppmann
