Slide 1

Slide 1 text

Building a Bluesky Bot powered by AI
Raphael De Lio

Slide 2

Slide 2 text

Who is on Bluesky?

Slide 3

Slide 3 text

Decentralized Network

Slide 4

Slide 4 text

Bluesky’s Jetstream

Slide 5

Slide 5 text

How many posts are created in Bluesky per minute?

Slide 6

Slide 6 text

Approximately 2200

Slide 7

Slide 7 text

Approximately 95 million per month
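
As a rough check, the monthly figure follows from the per-minute rate of about 2200 posts. The 30-day month below is an assumption made only for this back-of-the-envelope calculation:

```python
POSTS_PER_MINUTE = 2200  # approximate rate quoted in the slides

MINUTES_PER_DAY = 60 * 24
DAYS_PER_MONTH = 30  # assumption: a 30-day month

posts_per_month = POSTS_PER_MINUTE * MINUTES_PER_DAY * DAYS_PER_MONTH
print(f"{posts_per_month:,}")  # 95,040,000
```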

Slide 8

Slide 8 text

How can we efficiently analyze this data?

Slide 9

Slide 9 text

What are we building today?
• Listen to Bluesky’s Jetstream
• Filter for posts related to AI
• Enrich the data from these posts with subtopics
• Keep track of the frequency of topics
• Allow Bluesky users to interact with our bot and ask questions

Slide 10

Slide 10 text

How are we gonna do that?
• Redis Streams
• Semantic Classification
• Semantic Routing
• Semantic Caching
• Probabilistic Data Structures

Slide 11

Slide 11 text

Challenge #1: Storing the messages

Slide 12

Slide 12 text

Why?
• Messages are ephemeral
• We want to process these messages with multiple services

Slide 13

Slide 13 text

Redis Streams

Slide 14

Slide 14 text

• In-memory first (fast)
• Easy setup
• Support for Consumer Groups
• Perfect fit for realtime pipelines
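
To make the consumer-group idea concrete, here is a minimal in-memory sketch. It is not the Redis API: the `MiniStream` class and the group and consumer names are hypothetical, and the methods only loosely mirror what the Redis commands XADD, XREADGROUP, and XACK do.

```python
from collections import defaultdict


class MiniStream:
    """In-memory stand-in for a stream with consumer groups (not the Redis API)."""

    def __init__(self):
        self.entries = []                   # append-only log of (id, payload)
        self.next_index = defaultdict(int)  # group -> index of next undelivered entry
        self.pending = defaultdict(dict)    # group -> {entry id: consumer name}

    def add(self, payload):
        """Append an entry and return its id (loosely like XADD)."""
        entry_id = len(self.entries) + 1
        self.entries.append((entry_id, payload))
        return entry_id

    def read_group(self, group, consumer, count=1):
        """Deliver up to `count` new entries to one consumer (loosely like XREADGROUP)."""
        start = self.next_index[group]
        batch = self.entries[start:start + count]
        self.next_index[group] = start + len(batch)
        for entry_id, _ in batch:
            self.pending[group][entry_id] = consumer
        return batch

    def ack(self, group, entry_id):
        """Mark an entry as processed for a group (loosely like XACK)."""
        return self.pending[group].pop(entry_id, None) is not None


stream = MiniStream()
for text in ["post a", "post b", "post c"]:
    stream.add(text)

# Two consumers in the same group split the entries between them.
first = stream.read_group("filtering", "consumer-1", count=2)
second = stream.read_group("filtering", "consumer-2", count=2)

# A different group keeps its own cursor and sees every entry again.
other = stream.read_group("archiver", "consumer-1", count=10)

# Acknowledged entries leave the group's pending list.
for entry_id, _ in first:
    stream.ack("filtering", entry_id)
```

In this sketch, each application would be its own group, and scaling one application means adding consumers to its group.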

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

Application 1 Application 2 Application 3

Slide 17

Slide 17 text

Application 2 Application 3 Application 1 Consumer 1 Consumer 2

Slide 18

Slide 18 text

Demo time!

Slide 19

Slide 19 text

Challenge #2: Filtering the messages

Slide 20

Slide 20 text

Classifying Messages about Artificial Intelligence
Why?

Slide 21

Slide 21 text

All Bluesky Posts (Challenge #1) → Filtering Service (Challenge #2) → Filtered Bluesky Posts

Slide 22

Slide 22 text

Filtering Service

Slide 23

Slide 23 text

Approach #1: Using an LLM

Slide 24

Slide 24 text

Filtering Service
• Is this post about AI?
• Filtered Bluesky Posts
• Store the post in Redis as a Hash or a JSON document

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Approach #2: Using a Vector Database (like Redis)

Slide 27

Slide 27 text

• Generate 500 Bluesky Posts about AI
• Turn these samples into Vectors
• Store the vectors in a Vector Database

Slide 28

Slide 28 text

Filtering Service
• Turn the post into a vector
• Compare the vector of the post with the vectors of the samples in the Vector Database
• Write the post to the filtered-posts stream
• Store the post in Redis as a Hash or a JSON document
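
The filtering steps can be sketched in a few lines. Everything below is an illustration under assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, the sample texts and the threshold are made up, and a plain list stands in for both the vector database and the filtered-posts stream.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical reference samples; the slides generate about 500 of them.
SAMPLES = [
    "new large language model released today",
    "training a neural network on my gpu",
    "machine learning models keep getting better",
]
SAMPLE_VECTORS = [embed(sample) for sample in SAMPLES]
THRESHOLD = 0.4  # assumption: picked by hand for this toy embedding


def is_about_ai(post):
    """True when the post is close enough to at least one reference sample."""
    vector = embed(post)
    best = max(cosine(vector, sample) for sample in SAMPLE_VECTORS)
    return best >= THRESHOLD


filtered_posts = []  # stand-in for the filtered-posts stream
for post in ["just released a new language model", "my cat slept all day"]:
    if is_about_ai(post):
        filtered_posts.append(post)

print(filtered_posts)  # ['just released a new language model']
```

With a real embedding model the comparison would be a nearest-neighbor query against the vector database, and the threshold would need tuning on real posts.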

Slide 29

Slide 29 text

Demo time!

Slide 30

Slide 30 text

Challenge #3: Enriching the data

Slide 31

Slide 31 text

Topic Extraction

Slide 32

Slide 32 text

Why?
• Increase the granularity of what’s being talked about
• Keep track of how popular these topics are

Slide 33

Slide 33 text

Filtering Service (Challenge #2) → Filtered Bluesky Posts → Topic Extractor Service (Challenge #3)

Slide 34

Slide 34 text

Topic Extractor Service

Slide 35

Slide 35 text

Topic Extractor Service
• What topics can be implied from this post?
• Update the post in Redis (Hash or JSON)
• Increment the topic counters in the TopK
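
A minimal sketch of the bookkeeping after topic extraction, assuming the topics have already been returned by the extraction step. The post IDs and topics are invented, a dict stands in for the Hash or JSON documents, and `collections.Counter` keeps exact counts, whereas a TopK structure keeps approximate counts in bounded memory.

```python
from collections import Counter

# Hypothetical output of the topic-extraction step: post id -> topics.
extracted = {
    "post-1": ["llm", "agents"],
    "post-2": ["llm", "vector search"],
    "post-3": ["llm", "agents", "gpu"],
}

posts = {post_id: {"topics": []} for post_id in extracted}  # stand-in for Hash/JSON docs
topic_counts = Counter()  # exact counts; a stand-in for the approximate TopK

for post_id, topics in extracted.items():
    posts[post_id]["topics"] = topics  # update the stored post
    topic_counts.update(topics)        # increment one counter per topic

trending = topic_counts.most_common(2)
print(trending)  # [('llm', 3), ('agents', 2)]
```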

Slide 36

Slide 36 text

Demo time!

Slide 37

Slide 37 text

Demo time!

Slide 38

Slide 38 text

Challenge #4: Analyze the data & take action

Slide 39

Slide 39 text

Understand what functions to call

Slide 40

Slide 40 text

I have two functions:
• Listing trending topics
• Summarizing posts

Slide 41

Slide 41 text

Approach #1: Using an LLM

Slide 42

Slide 42 text

• Bot reads post that mentions the bot
• Bot: These are the possible functions: […] What function should I call based on the post?
• Bot invokes the specific function
• Bot generates final response with information from the function
• Bot posts response

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

Approach #2: Using a Vector Database (like Redis) (Semantic Routing)

Slide 45

Slide 45 text

Generate references for a certain route Turn these references into Vectors alongside their route Store the vectors in a Vector Database

Slide 46

Slide 46 text

• Bot turns the post into a vector
• Compare the vector of the post with the vectors of the routes in the Vector Database
• Invoke the returned route (function) if similarity is high enough
• Bot generates final response with information from the function
• Bot posts response
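
Semantic routing can be sketched the same way as classification, except that each reference carries the name of a route. This is an illustration only: `embed` is a toy bag-of-words stand-in for an embedding model, and the reference phrases, route names, handler functions, and threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical references per route (the two functions named in the slides).
ROUTES = {
    "trending_topics": ["what topics are trending", "what is everyone talking about"],
    "summarize_posts": ["summarize the posts about this topic", "give me a summary of recent posts"],
}
REFERENCES = [(embed(text), name) for name, texts in ROUTES.items() for text in texts]
THRESHOLD = 0.5  # assumption: picked by hand for this toy embedding


def pick_route(post):
    """Return the best-matching route name, or None if nothing is similar enough."""
    vector = embed(post)
    score, name = max((cosine(vector, ref), name) for ref, name in REFERENCES)
    return name if score >= THRESHOLD else None


def list_trending_topics():
    return "placeholder: trending topics"


def summarize_posts():
    return "placeholder: summary of posts"


HANDLERS = {"trending_topics": list_trending_topics, "summarize_posts": summarize_posts}


def respond(post):
    name = pick_route(post)
    if name is None:
        return None  # no route is similar enough; this sketch stays silent
    return HANDLERS[name]()


print(pick_route("which topics are trending right now"))    # trending_topics
print(pick_route("please summarize the posts about rust"))  # summarize_posts
print(pick_route("good morning everyone"))                  # None
```

The point of the threshold is that unrelated mentions fall through instead of triggering a function.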

Slide 47

Slide 47 text

Demo time!

Slide 48

Slide 48 text

Challenge #5: Repeated Questions

Slide 49

Slide 49 text

Users may ask the same question phrased differently

Slide 50

Slide 50 text

Semantic Cache
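
One way to picture a semantic cache: look up a question by similarity rather than by exact text, and only call the expensive step on a miss. The sketch below makes several assumptions: `embed` is a toy bag-of-words stand-in for an embedding model, `expensive_answer` is a hypothetical placeholder for the LLM call, and the threshold is arbitrary.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []  # list of (question vector, cached answer)

    def get(self, question):
        """Return the answer of the most similar cached question, or None on a miss."""
        vector = embed(question)
        best_score, best_answer = 0.0, None
        for cached_vector, answer in self.entries:
            score = cosine(vector, cached_vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, question, answer):
        self.entries.append((embed(question), answer))


cache = SemanticCache(threshold=0.7)  # assumption: arbitrary threshold
stats = {"expensive_calls": 0}


def expensive_answer(question):
    """Hypothetical placeholder for the slow, costly step (for example an LLM call)."""
    stats["expensive_calls"] += 1
    return f"answer #{stats['expensive_calls']}"


def ask(question):
    cached = cache.get(question)
    if cached is not None:
        return cached
    answer = expensive_answer(question)
    cache.put(question, answer)
    return answer


a1 = ask("what are the trending topics today")
a2 = ask("what are the trending topics right now")  # rephrased: served from the cache
a3 = ask("summarize posts about agents")            # different question: cache miss
```

Here the second question reuses the first answer, so only two expensive calls are made for three questions.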

Slide 51

Slide 51 text

Demo time!

Slide 52

Slide 52 text

Challenge #6: Deduplication

Slide 53

Slide 53 text

Messages are delivered At Least Once

Slide 54

Slide 54 text

Approach #1: Using a SET

Slide 55

Slide 55 text

• Consumer reads message from the Stream
• Check if ID exists in SET
• Process message
• Consumer stores ID in SET
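
The check-then-store flow can be sketched with a plain Python set standing in for the Redis SET. The entry IDs and payloads are made up, and the order of the steps (check, process, store) is an assumption about the flow on the slide.

```python
processed_ids = set()  # stand-in for a Redis SET shared by the consumers
handled = []           # payloads that were actually processed


def handle(entry_id, payload):
    """Process a message once; return False when it is a duplicate delivery."""
    if entry_id in processed_ids:  # check if the ID exists in the set
        return False               # already seen: skip
    handled.append(payload)        # process the message
    processed_ids.add(entry_id)    # store the ID
    return True


# At-least-once delivery: entry "2-0" arrives twice.
deliveries = [("1-0", "post a"), ("2-0", "post b"), ("2-0", "post b"), ("3-0", "post c")]
results = [handle(entry_id, payload) for entry_id, payload in deliveries]

print(results)  # [True, True, False, True]
print(handled)  # ['post a', 'post b', 'post c']
```

The set gives exact answers, but it stores every ID it has ever seen, so its memory grows with the stream.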

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

Approach #2: Using a Bloom Filter

Slide 58

Slide 58 text

• Consumer reads message from the Stream
• Check if ID exists in Bloom Filter
• Process message
• Consumer adds ID to the Bloom Filter
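
For intuition, here is a small hand-rolled Bloom filter used for the same deduplication flow. It is a sketch, not the Redis implementation (Redis exposes Bloom filters through commands such as BF.ADD and BF.EXISTS): the class, its sizes, and the hashing scheme are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash positions per item."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not added"; True means "probably added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bloom = BloomFilter()
handled = []


def handle(entry_id, payload):
    """Skip messages whose ID is probably already in the filter."""
    if bloom.might_contain(entry_id):
        return False
    handled.append(payload)
    bloom.add(entry_id)
    return True


# At-least-once delivery: entry "2-0" arrives twice.
deliveries = [("1-0", "post a"), ("2-0", "post b"), ("2-0", "post b"), ("3-0", "post c")]
for entry_id, payload in deliveries:
    handle(entry_id, payload)
```

The trade-off against the SET approach: memory stays fixed no matter how many IDs are added, duplicates are never let through, but a false positive can occasionally cause a new message to be skipped.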

Slide 59

Slide 59 text

Demo time!

Slide 60

Slide 60 text

Recap
• How to use Redis Streams for consuming realtime data
• How to use Semantic Classification for filtering data
• How to use LLMs to extract data
• How to use Semantic Routing for calling functions
• How to use Semantic Caching for saving money & time
• How to efficiently analyze huge data streams with TopK and Bloom Filter

Slide 61

Slide 61 text

Raphael De Lio, Developer Advocate