Event Sourcing and Stream Processing at Scale

{ eventType: PageViewEvent, timestamp: 1413215518, viewerId: 1234, sessionId: 646cf6694c550a24, pageKey:
profile-view, viewedProfileId: 4321, trackingKey: invitation-email, ... etc. metadata about what content was displayed... }
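
A stream of events like the PageViewEvent above can be consumed one record at a time to maintain a derived aggregate. The following is a minimal sketch, not taken from the slides: the sample events, the `count_profile_views` function, and the choice of aggregate (views per profile) are illustrative assumptions. Only the field names come from the event shown above.

```python
from collections import Counter

# Hypothetical sample events; only the field names follow the slide.
events = [
    {"eventType": "PageViewEvent", "timestamp": 1413215518, "viewerId": 1234,
     "pageKey": "profile-view", "viewedProfileId": 4321},
    {"eventType": "PageViewEvent", "timestamp": 1413215530, "viewerId": 5678,
     "pageKey": "profile-view", "viewedProfileId": 4321},
    {"eventType": "PageViewEvent", "timestamp": 1413215544, "viewerId": 1234,
     "pageKey": "profile-view", "viewedProfileId": 9999},
]


def count_profile_views(stream):
    """Consume events in order and keep a running view count per profile."""
    counts = Counter()
    for event in stream:
        is_profile_view = (
            event["eventType"] == "PageViewEvent"
            and event["pageKey"] == "profile-view"
        )
        if is_profile_view:
            counts[event["viewedProfileId"]] += 1
    return counts


views = count_profile_views(events)
```

Because the function only iterates over its input, the same logic would apply to a list in memory or to a generator that yields events as they arrive.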

{ eventType: ProfileEditEvent, timestamp: 1413215518, profileId: 1234, old: { location:
"London, UK", industry: "Financial Services"}, new: { location: "Brussels, Belgium", industry: "Software"} }

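An edit event of this shape records what changed rather than overwriting the profile in place, so the current profile can be rebuilt by replaying the log from the start. The sketch below illustrates that idea under stated assumptions: `apply_event` and `replay` are hypothetical helper names, and the first event in `log` is invented so the replay has a starting point. The second event mirrors the ProfileEditEvent above.

```python
def apply_event(state, event):
    """Return a new state dict with one ProfileEditEvent applied."""
    if event["eventType"] != "ProfileEditEvent":
        return state  # other event types do not affect profile state
    profiles = dict(state)  # copy, so earlier states are never mutated
    current = dict(profiles.get(event["profileId"], {}))
    current.update(event["new"])  # the "new" fields overwrite the old values
    profiles[event["profileId"]] = current
    return profiles


def replay(events):
    """Fold the whole event log into the current state, oldest event first."""
    state = {}
    for event in events:
        state = apply_event(state, event)
    return state


log = [
    # Hypothetical initial event, not on the slide.
    {"eventType": "ProfileEditEvent", "timestamp": 1413200000, "profileId": 1234,
     "old": {},
     "new": {"location": "London, UK", "industry": "Financial Services"}},
    # The edit event shown on the slide.
    {"eventType": "ProfileEditEvent", "timestamp": 1413215518, "profileId": 1234,
     "old": {"location": "London, UK", "industry": "Financial Services"},
     "new": {"location": "Brussels, Belgium", "industry": "Software"}},
]

profiles = replay(log)
```

Replaying only a prefix of `log` yields the profile as it was at that point in time, which is one reason to keep the events rather than just the latest state.
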
Kafka at scale
•  LinkedIn: 1.1 trillion (1.1×10^12) events per day; peak: 18 M events/sec, 3.8 GB/sec
•  Netflix: 400 billion (4×10^11) events per day; peak: 8 M events/sec, 17 GB/sec
•  Uber, Twitter, Yahoo, Spotify, etc.
•  http://www.confluent.io/blog/apache-kafka-hits-1.1-trillion-messages-per-day-joins-the-4-comma-club (Sept 2015)
   https://engineering.linkedin.com/kafka/running-kafka-scale (March 2015)
   https://engineering.linkedin.com/blog/2016/01/whats-new-samza (January 2016)
   http://www.slideshare.net/wangxia5/netflix-kafka (March 2015)
   https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
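
To put the daily totals next to the peak rates, the figures above can be converted to average events per second and a rough bytes-per-event estimate. This is back-of-envelope arithmetic only. It assumes 1 GB = 10^9 bytes and that the peak event rate and peak bandwidth were measured at the same moment, neither of which the slide states. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(events_per_day):
    """Average events per second implied by a daily total."""
    return events_per_day / SECONDS_PER_DAY


def bytes_per_event_at_peak(peak_bytes_per_sec, peak_events_per_sec):
    """Rough event size, assuming both peaks coincide (an assumption)."""
    return peak_bytes_per_sec / peak_events_per_sec


linkedin_avg = average_rate(1.1e12)                    # about 12.7 million events/sec
netflix_avg = average_rate(4e11)                       # about 4.6 million events/sec
linkedin_size = bytes_per_event_at_peak(3.8e9, 18e6)   # about 211 bytes per event
netflix_size = bytes_per_event_at_peak(17e9, 8e6)      # about 2,125 bytes per event
```

Under these assumptions, LinkedIn's stated peak of 18 M events/sec is roughly 1.4 times its daily average, and Netflix's events come out around ten times larger than LinkedIn's.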


Martin Kleppmann
