Searching over streams with Luwak and Samza

Martin Kleppmann

January 31, 2015


Talk co-presented with Alan Woodward at FOSDEM, Brussels, Belgium, on 31 January 2015. http://martin.kleppmann.com/2015/01/31/searching-over-streams-at-fosdem.html

Abstract:

Real-time searching over streams is useful in a number of contexts. For example, companies may want to detect whenever they are mentioned in a news feed; or a Twitter user might want to see a continuous stream of tweets for a particular hashtag.

Luwak (https://github.com/flaxsearch/luwak) provides a mechanism for running many thousands of queries over a single document in a highly efficient manner, by filtering out queries that it can detect will not match. Luwak is designed to run on a single node, holding all registered queries in RAM. Scaling to higher document throughput, or to more queries, requires parallelization across multiple machines.
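The filtering idea can be illustrated with a small sketch. This is not Luwak's API or its actual algorithm. It is a toy model that assumes every query is a conjunction of terms and indexes each query under one of its required terms, so that queries whose indexed term is absent from the document are skipped without being evaluated. The class and method names (`QueryPrefilter`, `register`, `candidates`, `match`) are hypothetical.

```python
from collections import defaultdict


class QueryPrefilter:
    """Toy matcher for AND-of-terms queries with a term-based prefilter.

    Illustrative only: names and structure are assumptions, not Luwak's API.
    """

    def __init__(self):
        self.queries = {}                 # query_id -> set of required terms
        self.by_term = defaultdict(set)   # indexed term -> ids of queries filed under it

    def register(self, query_id, terms):
        required = {t.lower() for t in terms}
        if not required:
            raise ValueError("a query needs at least one term")
        self.queries[query_id] = required
        # File the query under a single required term. If that term is missing
        # from a document, the query cannot match and is never evaluated.
        self.by_term[min(required)].add(query_id)

    def candidates(self, doc_terms):
        """Return ids of queries that might match; all others are filtered out."""
        found = set()
        for term in doc_terms:
            found |= self.by_term.get(term, set())
        return found

    def match(self, text):
        """Run only the candidate queries against the document text."""
        doc_terms = set(text.lower().split())
        return sorted(
            qid for qid in self.candidates(doc_terms)
            if self.queries[qid] <= doc_terms
        )


matcher = QueryPrefilter()
matcher.register("q1", ["samza"])
matcher.register("q2", ["luwak", "search"])
matcher.register("q3", ["kafka", "stream"])
matcher.register("q4", ["stream", "twitter"])
matches = matcher.match("Luwak makes stream search fast")
```

In this sketch only `q2` and `q4` survive the prefilter for the sample document, and only `q2` passes the full check. The saving grows with the number of registered queries, since most of them are never evaluated.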

Samza (http://samza.apache.org/) provides a framework for such parallelization by partitioning and recombining both the document streams and the query set (which can be treated as just another stream). It also provides fault-tolerance mechanisms that allow swift recovery from machine failure without losing documents or queries.
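One possible partitioning layout can be simulated in a single process. This sketch does not use Samza's API and does not model fault tolerance. It assumes queries are partitioned by a stable hash of their id, each document is sent to every partition, and the per-partition matches are recombined into one result. All names (`partition_for`, `MatcherPartition`, `Cluster`) are hypothetical, and other layouts, such as partitioning the documents instead, are equally plausible.

```python
import zlib


def partition_for(key, num_partitions):
    """Stable partition assignment (Python's built-in hash() is salted per process)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class MatcherPartition:
    """Stands in for one worker holding a subset of the registered queries in memory."""

    def __init__(self):
        self.queries = {}  # query_id -> set of required terms

    def on_query(self, query_id, terms):
        # Queries arrive as messages on a stream, just like documents do.
        self.queries[query_id] = {t.lower() for t in terms}

    def on_document(self, doc_id, text):
        words = set(text.lower().split())
        return [(doc_id, qid) for qid, terms in self.queries.items() if terms <= words]


class Cluster:
    """Routes queries to one partition each and broadcasts documents to all."""

    def __init__(self, num_partitions):
        self.partitions = [MatcherPartition() for _ in range(num_partitions)]

    def register_query(self, query_id, terms):
        index = partition_for(query_id, len(self.partitions))
        self.partitions[index].on_query(query_id, terms)

    def publish_document(self, doc_id, text):
        # Recombine: merge the matches reported by every partition.
        matches = []
        for partition in self.partitions:
            matches.extend(partition.on_document(doc_id, text))
        return sorted(matches)


cluster = Cluster(num_partitions=3)
cluster.register_query("q1", ["samza"])
cluster.register_query("q2", ["luwak", "search"])
cluster.register_query("q3", ["stream"])
result = cluster.publish_document("doc-1", "Luwak brings search to the stream")
```

Each partition holds only part of the query set, so adding partitions raises the number of queries the system can hold, at the cost of sending every document to every partition.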

Transcript