Systems that enable data agility

Talk given at Strata + Hadoop World London, 6 May 2015. http://strataconf.com/big-data-conference-uk-2015/public/schedule/detail/39689

Abstract:

Congratulations, you’ve got a lot of data! Now what? How do you enable your organisation to create value from that data? What tools do your data scientists need in order to create data-driven products? How do you empower your teams to experiment, to innovate, and to be agile in response to customer needs?

In this session we will discuss LinkedIn’s approach to solving these problems, and the open source tools that were created at LinkedIn to support data agility in a large organisation. The approach boils down to a few simple ideas:

1. Make all data available centrally, in real time. If it’s too difficult to access data across silos, you can’t derive value from it. For this purpose, LinkedIn created Apache Kafka, which can be the data exchange backbone of your organisation (a short producer sketch follows after this list).

2. Make it easy to analyse and process that data. You’ve hired smart people, now empower them to easily try out new ideas for data-driven products, and rapidly get them into production if they are good. To support this, LinkedIn created Apache Samza, a stream processing framework that provides powerful, reliable tools for working with data in Kafka (a task sketch follows below).
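
To make the first point concrete, here is a minimal, hypothetical sketch of publishing an event to Kafka with the standard Java producer client. It is not code from the talk; the broker address, the topic name "page-views", the key, and the JSON payload are all illustrative assumptions.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PageViewPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Assumed local broker; in practice this points at your Kafka cluster
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Key by member/session so that events with the same key stay in order
                // within a partition; topic, key and payload are placeholders
                producer.send(new ProducerRecord<>("page-views", "member-42",
                        "{\"url\": \"/css/typography.css\", \"status\": 200}"));
            }
        }
    }

Once events flow through a topic like this, any team can subscribe to it independently, which is what makes Kafka work as an organisation-wide exchange point rather than another silo.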

Since Kafka and Samza are open source, you can apply these lessons and start implementing your own agile data pipeline today.
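
For the second point, here is a similarly minimal, hypothetical sketch of a Samza job using the StreamTask API: it consumes messages from a Kafka topic (configured in the job’s properties) and emits a running count to another topic. The output stream name "page-view-counts" is a placeholder, and a real job would use Samza’s state management rather than an in-memory counter.

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;

    // Counts incoming messages and publishes the running total back to Kafka.
    public class PageViewCounterTask implements StreamTask {
        private static final SystemStream OUTPUT =
                new SystemStream("kafka", "page-view-counts");

        private int count = 0;

        @Override
        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector,
                            TaskCoordinator coordinator) {
            count++;  // one more message seen by this task instance
            collector.send(new OutgoingMessageEnvelope(OUTPUT, count));
        }
    }

Samza runs one instance of such a task per Kafka partition, which is how jobs written in this style keep up with high message rates.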

In this talk you’ll learn about:

- How Kafka and Samza reliably scale to millions of messages per second
- What kinds of real-time data problems you can solve with Samza
- How Samza compares to other stream processing frameworks
- How data streams support collaboration between different data science, product and engineering teams within an organisation
- Lessons learnt on how to move fast without breaking things

Martin Kleppmann

May 06, 2015

Transcript

  21. 216.58.210.78 - - [27/Feb/2015:17:55:11 +0000] "GET /css/typography.css HTTP/1.1" 200 3377 "http://martin.kleppmann.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36"
  64. References
      1. Jay Kreps: “Putting Apache Kafka to use: A practical guide to building a stream data platform (part 1).” 25 February 2015. http://blog.confluent.io/2015/02/25/stream-data-platform-1/
      2. Jay Kreps: “I ♥ Logs.” O’Reilly Media, September 2014. http://shop.oreilly.com/product/0636920034339.do
      3. Martin Kleppmann: “Designing Data-Intensive Applications.” O’Reilly Media, to appear in 2015. http://dataintensive.net
      4. Martin Kleppmann: “Bottled Water: Real-time integration of PostgreSQL and Kafka.” 23 April 2015. http://blog.confluent.io/2015/04/23/bottled-water-real-time-integration-of-postgresql-and-kafka/
      5. Shirshanka Das, Chavdar Botev, Kapil Surlaker, et al.: “All Aboard the Databus!,” at ACM Symposium on Cloud Computing (SoCC), October 2012. http://www.socc2012.org/s18-das.pdf