Slide 1

Slide 1 text

The TopN extension: Maintaining top 10 lists at scale • Furkan Şahin • Software Engineer • Citus Data @Microsoft • Author of postgresql-topn extension @sahinffurkan @furkansahin

Slide 2

Slide 2 text

Presentation Flow • TopN lists and their usages • How to calculate TopN lists in PostgreSQL? • Why do we need an extension? • How does postgresql-topn work? • How do we use postgresql-topn? • Demo • Summary @sahinffurkan @furkansahin

Slide 3

Slide 3 text

TopN Lists and Their Usages How to calculate TopN lists in PostgreSQL? Why do we need an extension? How does postgresql-topn work? How do we use postgresql-topn? Demo Summary

Slide 4

Slide 4 text

TopN Lists and Their Usages Spotify: Your Top Songs 2018 IMDB: Top Rated Movies E-commerce: Most selling items @sahinffurkan @furkansahin

Slide 5

Slide 5 text

How to calculate TopN lists in PostgreSQL SELECT item, count(*) FROM table GROUP BY 1 ORDER BY 2 DESC LIMIT 10; SELECT topn(topn_add_agg(item),10) FROM table; @sahinffurkan @furkansahin

Slide 6

Slide 6 text

Why do we need an extension? How does postgresql-topn work? How do we use postgresql-topn? Demo Summary TopN Lists and Their Usages How to calculate TopN lists in PostgreSQL

Slide 7

Slide 7 text

Why do we need an extension? Continuous calculations Memory usage Scanning CPU usage Sorting @sahinffurkan @furkansahin

Slide 8

Slide 8 text

Why needed a TopN extension? High load times @sahinffurkan @furkansahin

Slide 9

Slide 9 text

Solution? An extension that uses count-min sketch Low memory footprint Low CPU usage Facilitate pre- processing @sahinffurkan @furkansahin

Slide 10

Slide 10 text

How does postgresql-topn work? How do we use postgresql-topn? Demo Summary TopN Lists and Their Usages How to calculate TopN lists in PostgreSQL Why do we need an extension?

Slide 11

Slide 11 text

How does postgresql-topn work? Count-min sketch Limited number of counters @sahinffurkan @furkansahin

Slide 12

Slide 12 text

Example: Calculating the Top 1 song Date Music 10-10-2019 Heavy Fuel 11-10-2019 Back in Black 12-10-2019 Back in Black 13-10-2019 Back in Black 13-10-2019 Englishman in New York 14-10-2019 Heavy Fuel 14-10-2019 Breakfast in America topn.number_of_counters = 1 Item Freq Heavy fuel 1 SELECT topn(topn_add_agg(Music),1) FROM my_music_history; @sahinffurkan @furkansahin

Slide 13

Slide 13 text

Example: Calculating the Top 1 song Date Music 10-10-2019 Heavy Fuel 11-10-2019 Back in Black 12-10-2019 Back in Black 13-10-2019 Back in Black 13-10-2019 Englishman in New York 14-10-2019 Heavy Fuel 14-10-2019 Breakfast in America topn.number_of_counters = 1 Item Freq Heavy fuel 1 Back in Black 1 SELECT topn(topn_add_agg(Music),1) FROM my_music_history; @sahinffurkan @furkansahin

Slide 14

Slide 14 text

Example: Calculating the Top 1 song Date Music 10-10-2019 Heavy Fuel 11-10-2019 Back in Black 12-10-2019 Back in Black 13-10-2019 Back in Black 13-10-2019 Englishman in New York 14-10-2019 Heavy Fuel 14-10-2019 Breakfast in America topn.number_of_counters = 1 Item Freq Back in Black 2 Heavy fuel 1 SELECT topn(topn_add_agg(Music),1) FROM my_music_history; @sahinffurkan @furkansahin

Slide 15

Slide 15 text

Example: Calculating the Top 1 song Date Music 10-10-2019 Heavy Fuel 11-10-2019 Back in Black 12-10-2019 Back in Black 13-10-2019 Back in Black 13-10-2019 Englishman in New York 14-10-2019 Heavy Fuel 14-10-2019 Breakfast in America topn.number_of_counters = 1 Item Freq Back in Black 3 Heavy fuel 1 SELECT topn(topn_add_agg(Music),1) FROM my_music_history; @sahinffurkan @furkansahin

Slide 16

Slide 16 text

Example: Calculating the Top 1 song Date Music 10-10-2019 Heavy Fuel 11-10-2019 Back in Black 12-10-2019 Back in Black 13-10-2019 Back in Black 13-10-2019 Englishman in New York 14-10-2019 Heavy Fuel 14-10-2019 Breakfast in America topn.number_of_counters = 1 Item Freq Back in Black 3 Heavy fuel 1 Englishman in New York 1 SELECT topn(topn_add_agg(Music),1) FROM my_music_history; @sahinffurkan @furkansahin

Slide 17

Slide 17 text

Example: Calculating the Top 1 song Date Music 10-10-2019 Heavy Fuel 11-10-2019 Back in Black 12-10-2019 Back in Black 13-10-2019 Back in Black 13-10-2019 Englishman in New York 14-10-2019 Heavy Fuel 14-10-2019 Breakfast in America topn.number_of_counters = 1 Item Freq Back in Black 3 Heavy fuel 2 Englishman in New York 1 SELECT topn(topn_add_agg(Music),1) FROM my_music_history; @sahinffurkan @furkansahin

Slide 18

Slide 18 text

Example: Calculating the Top 1 song Date Music 10-10-2019 Heavy Fuel 11-10-2019 Back in Black 12-10-2019 Back in Black 13-10-2019 Back in Black 13-10-2019 Englishman in New York 14-10-2019 Heavy Fuel 14-10-2019 Breakfast in America topn.number_of_counters = 1 Item Freq Back in Black 3 Heavy fuel 2 Englishman in New York 1 Breakfast in America 1 Wipe the bottom half SELECT topn(topn_add_agg(Music),1) FROM my_music_history; @sahinffurkan @furkansahin

Slide 19

Slide 19 text

Example: Calculating the Top 1 song Date Music 10-10-2019 Heavy Fuel 11-10-2019 Back in Black 12-10-2019 Back in Black 13-10-2019 Back in Black 13-10-2019 Englishman in New York 14-10-2019 Heavy Fuel 14-10-2019 Breakfast in America topn.number_of_counters = 1 No more data SELECT topn(topn_add_agg(Music),1) FROM my_music_history; @sahinffurkan @furkansahin Item Freq Back in Black 3 Heavy fuel 2

Slide 20

Slide 20 text

Example: Calculating the Top 1 song Date Music 10-10-2019 Heavy Fuel 11-10-2019 Back in Black 12-10-2019 Back in Black 13-10-2019 Back in Black 13-10-2019 Englishman in New York 14-10-2019 Heavy Fuel 14-10-2019 Breakfast in America topn.number_of_counters = 1 SELECT topn(topn_add_agg(Music),1) FROM my_music_history; @sahinffurkan @furkansahin Item Freq Back in Black 3

Slide 21

Slide 21 text

How does postgresql-topn work? topn(column::JSONB, n::int) => returns Tuple Set(item, frequency) topn_add_agg(column::text) => returns JSONB topn_union_agg(column::JSONB) => returns JSONB topn_add(column::JSONB, data::text) => returns JSONB topn_union(column_1::JSONB, column_2::JSONB) => returns JSONB @sahinffurkan @furkansahin

Slide 22

Slide 22 text

Why JSONB? Human readable Portable Easy to process @sahinffurkan @furkansahin

Slide 23

Slide 23 text

How do we use postgresql-topn? Demo Summary TopN Lists and Their Usages How to calculate TopN lists in PostgreSQL Why do we need an extension? How does postgresql-topn work?

Slide 24

Slide 24 text

How do we use postgresql-topn? Pre-aggregation Materialization of TopNs

Slide 25

Slide 25 text

Demo Summary TopN Lists and Their Usages How to calculate TopN lists in PostgreSQL Why do we need an extension? How does postgresql-topn work? How do we use postgresql-topn?

Slide 26

Slide 26 text

Summary Requires pre-processing for performance gain Low memory footprint Approximates the results Open source Production ready

Slide 27

Slide 27 text

Resources • https://github.com/citusdata/postgresql-topn • TopN for your Postgres database https://www.citusdata.com/blog/2018/03/27/topn-for-your-postgres-database/ @sahinffurkan @furkansahin

Slide 28

Slide 28 text

Thank you @sahinffurkan @furkansahin

Slide 29

Slide 29 text

No content