Slide 1

Slide 1 text

Blooming Trending Topics

Slide 2

Slide 2 text

A little bit of context

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Timeline 17 August
• According to Musk, Alexandre de Moraes threatened to arrest Twitter's legal representative in Brazil.
• The threat was a response to Twitter not banning, in secrecy, a number of accounts as requested by the Supreme Court of Brazil.
• As a result, to protect its staff, Twitter closed its operations in Brazil.

Slide 6

Slide 6 text

Timeline 28 August
• The Supreme Court, through its own profile on Twitter, summoned Elon Musk and ordered him to appoint a legal representative for the company within 24 hours.
• If Twitter failed to comply with the decision, it would be suspended.

Slide 7

Slide 7 text

Timeline 29 August • Alexandre de Moraes ordered the freezing of Starlink's financial resources in Brazil to ensure payment of the fines imposed on Twitter.

Slide 8

Slide 8 text

Timeline 30 August
• Alexandre de Moraes ordered the suspension of access to the service in the country.
• He also set a fine of ~9 thousand dollars for any person or company using a VPN to circumvent the block.
• He further ordered the removal of VPN applications from the App Store and Play Store, but this decision was later reversed.

Slide 9

Slide 9 text

Timeline 31 August • Internet Providers started blocking Twitter

Slide 10

Slide 10 text

Timeline 1 September • Bluesky gets 1 million new users (more than 3 million already)

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

One of many things BSKY doesn’t have…

Slide 14

Slide 14 text

Blooming Trending Topics on Bluesky

Slide 15

Slide 15 text

Building my own Trending Topics
• Listen to Bluesky’s firehose
• Parse each message and extract hashtags
• Count these hashtags
• Sort them
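A minimal sketch of the parse, count and sort steps, for illustration only. It assumes the posts arrive as plain strings: the `sample_posts` list stands in for the firehose, and an in-memory `Counter` stands in for whatever store the real project uses. The function names and the hashtag regex are hypothetical.

```python
import re
from collections import Counter

# Simplified hashtag pattern (an assumption): "#" followed by word characters.
HASHTAG = re.compile(r"#\w+")

def extract_hashtags(text):
    """Return the hashtags found in one post, lowercased."""
    return [tag.lower() for tag in HASHTAG.findall(text)]

def trending(posts, top_n=3):
    """Count hashtags across all posts and return the top_n as (tag, count)."""
    counts = Counter()
    for text in posts:
        counts.update(extract_hashtags(text))
    return counts.most_common(top_n)

# Stand-in for messages read from the firehose.
sample_posts = [
    "Trying out #Redis for counting",
    "My #cats love #redis too",
    "#dogs > #cats",
]

print(trending(sample_posts))
```

Exact counting like this keeps one entry per distinct hashtag, so memory grows with the number of hashtags seen.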

Slide 16

Slide 16 text

Sorted Set
• I also want to keep historical data so that I can track the evolution of trending topics
• I decided that a window of 15 minutes would be good enough
• For ~2500 hashtags (15 minutes), it would consume around ~275 KB of memory
• This translates to (4 * 24 * 275) KB per day (~26 MB) or ~10 GB per year [if the number of users, messages and frequency of hashtags don’t increase]
• I wasn’t only thinking of tracking hashtags. I was also thinking of tracking user data such as followers, post counts, blocked data, etc.
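The storage estimate above can be checked with a few lines of arithmetic. The sketch below uses only the figures from this slide (275 KB per 15-minute window) and assumes decimal units (1 GB = 1,000,000 KB); the variable names are made up for illustration.

```python
WINDOW_KB = 275            # memory per 15-minute window, from the slide
WINDOWS_PER_DAY = 4 * 24   # four 15-minute windows per hour, 24 hours

per_day_kb = WINDOWS_PER_DAY * WINDOW_KB    # 26,400 KB, roughly 26 MB
per_year_gb = per_day_kb * 365 / 1_000_000  # roughly 9.6 GB, i.e. ~10 GB

print(f"{per_day_kb / 1000:.1f} MB per day, {per_year_gb:.1f} GB per year")
```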

Slide 17

Slide 17 text

That’s when I heard of Count-Min Sketch
Do you know what Count-Min Sketch is?
No

Slide 18

Slide 18 text

Count-Min Sketch
• A probabilistic data structure included in RedisBloom
• Used to estimate the frequency of elements in a data stream
• Space-efficient: uses a fixed amount of memory regardless of data scale
• Its advantage is that it can use far less memory, at the cost of some accuracy
How many times have I been mentioned?

Slide 19

Slide 19 text

How it works…

Slide 20

Slide 20 text

Count-Min Sketch
• Internally it’s a grid (sketch) of width w and depth d
• The rows (d) correspond to the hash functions, one row per function. The columns (w) are the counters available to each hash function
CMS.INITBYDIM key width depth
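To make the grid concrete, here is a small pure-Python sketch of the same idea. It is not how RedisBloom implements it: the class name, the method names and the salted SHA-256 hashing are assumptions chosen to keep the example self-contained.

```python
import hashlib

class CountMinSketch:
    """Toy Count-Min Sketch: `depth` rows of `width` counters each."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        # Fixed-size grid, all counters start at zero.
        self.rows = [[0] * width for _ in range(depth)]

    def _column(self, row, item):
        # One hash function per row, simulated by salting with the row index.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def incr_by(self, item, amount=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._column(row, item)] += amount

    def query(self, item):
        # The estimate is the smallest of the item's counters.
        return min(
            self.rows[row][self._column(row, item)]
            for row in range(self.depth)
        )

sketch = CountMinSketch(width=5, depth=3)
for tag in ["#redis", "#pets", "#cats", "#dogs", "#redis"]:
    sketch.incr_by(tag)
print(sketch.query("#redis"))
```

Because collisions can only add to a counter, the estimate is never below the true count; it can be above it.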

Slide 21

Slide 21 text

Count-Min Sketch: Initializing
CMS.INITBYDIM hashtags 5 3
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
fixed size

Slide 22

Slide 22 text

Count-Min Sketch: Incrementing
CMS.INCRBY hashtags #redis 1
Hash1(“#redis”) % 5 = 2
Hash2(“#redis”) % 5 = 4
Hash3(“#redis”) % 5 = 1
0 0 1 0 0
0 0 0 0 1
0 1 0 0 0

Slide 23

Slide 23 text

Count-Min Sketch: Incrementing
CMS.INCRBY hashtags #pets 1
Hash1(“#pets”) % 5 = 0
Hash2(“#pets”) % 5 = 3
Hash3(“#pets”) % 5 = 1
1 0 1 0 0
0 0 0 1 1
0 2 0 0 0

Slide 24

Slide 24 text

Count-Min Sketch: Incrementing
CMS.INCRBY hashtags #cats 1
Hash1(“#cats”) % 5 = 3
Hash2(“#cats”) % 5 = 4
Hash3(“#cats”) % 5 = 0
1 0 1 1 0
0 0 0 1 2
1 2 0 0 0

Slide 25

Slide 25 text

Count-Min Sketch: Incrementing
CMS.INCRBY hashtags #dogs 1
Hash1(“#dogs”) % 5 = 2
Hash2(“#dogs”) % 5 = 1
Hash3(“#dogs”) % 5 = 3
1 0 2 1 0
0 1 0 1 2
1 2 0 1 0

Slide 26

Slide 26 text

Count-Min Sketch: Querying
CMS.QUERY hashtags #dogs
Hash1(“#dogs”) % 5 = 2
Hash2(“#dogs”) % 5 = 1
Hash3(“#dogs”) % 5 = 3
1 0 2 1 0
0 1 0 1 2
1 2 0 1 0
Result: min(2, 1, 1) = 1

Slide 27

Slide 27 text

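Count-Min Sketch: Querying
CMS.QUERY hashtags #redis
Hash1(“#redis”) % 5 = 2
Hash2(“#redis”) % 5 = 4
Hash3(“#redis”) % 5 = 1
1 0 2 1 0
0 1 0 1 2
1 2 0 1 0
Result: min(2, 2, 2) = 2
(#redis was incremented only once; its three counters are shared with #dogs, #cats and #pets, so the estimate is inflated)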
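The walkthrough on the previous slides can be replayed in a few lines. This sketch hard-codes the column positions shown on the slides instead of computing real hashes; the `COLUMNS` table and the helper names are illustrative only.

```python
# Column chosen by Hash1, Hash2, Hash3 (mod 5) for each hashtag, as on the slides.
COLUMNS = {
    "#redis": (2, 4, 1),
    "#pets": (0, 3, 1),
    "#cats": (3, 4, 0),
    "#dogs": (2, 1, 3),
}

# CMS.INITBYDIM hashtags 5 3 -> 3 rows of 5 counters
grid = [[0] * 5 for _ in range(3)]

def incr_by(item, amount=1):
    # Add `amount` to one counter per row.
    for row, col in enumerate(COLUMNS[item]):
        grid[row][col] += amount

def query(item):
    # Take the minimum over the item's counters.
    return min(grid[row][col] for row, col in enumerate(COLUMNS[item]))

for tag in ("#redis", "#pets", "#cats", "#dogs"):
    incr_by(tag)

for row in grid:
    print(*row)
print(query("#dogs"), query("#redis"))
```

Each hashtag was incremented once, yet the query for #redis returns 2 because every one of its counters collides with another hashtag.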

Slide 28

Slide 28 text

Count-Min Sketch: Probability
• The width determines the error rate: e = 2/w
A larger width means more counters to distribute the counts over, so collisions are less likely to inflate them and the error rate is lower. If we are conservative and say that a counter can get twice the average amount, then the formula to calculate it is: e = 2/w
• The depth determines the confidence in this error rate: 1 - (½)^d
A greater depth means more rows, reducing the likelihood that all rows overestimate at the same time due to collisions. With the conservative bound above, the chance that a single row overestimates by more than the error rate is at most 50%. Each extra row halves the chance that every row does so: (½)^d
For a Sketch of 5/3:
• Error rate: 40%
• Confidence in this error rate: 87.5% (1 - (½)^3)
At least 87.5% of the time, the counter will overestimate by no more than 40% of the total count of all increments
For a Sketch of 2000/10:
• Error rate: 0.1%
• Confidence in this error rate: ~99.9% (1 - (½)^10)
At least ~99.9% of the time, the counter will overestimate by no more than 0.1% of the total count of all increments
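The two formulas on this slide, e = 2/w and (½)^d, can be evaluated directly for any sketch size. The helper names below are hypothetical and the code only does the arithmetic; it does not talk to Redis.

```python
def error_rate(width):
    """Error rate from the slide's conservative formula e = 2/w."""
    return 2 / width

def confidence(depth):
    """1 - (1/2)^d: chance that at least one row stays within the error rate."""
    return 1 - 0.5 ** depth

for width, depth in [(5, 3), (2000, 10)]:
    print(
        f"{width}/{depth}: error rate {error_rate(width):.1%}, "
        f"confidence {confidence(depth):.2%}"
    )
```

Reading the result: doubling the width halves the error rate, while each extra row halves the remaining chance of exceeding it.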

Slide 29

Slide 29 text

Demo time!