The untapped power of vector embeddings

by Frank van Dijk

Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

You’ll automate redirects, handle AI hallucinations more effectively, and understand what embeddings are

Slide 3

Slide 3 text

We don’t speak the same language. Humans think in concepts, computers process data.

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Source: Google Trends (query: vector embeddings)

Slide 6

Slide 6 text

@frankvndijk What are embeddings? “Embeddings are numerical representations of data (like words, images, or audio) in a multi-dimensional space.” Diagram: Images, Audio, Text → Embedding model → [0.9, 0.7, 0.2, 0.6]

Slide 7

Slide 7 text

Diagram: Images, Audio, Text → Embedding model → [0.9, 0.7, 0.2, 0.6]

Slide 8

Slide 8 text

cat → Embedding model → [0.9, 0.7, -0.3, 0.6]

Slide 9

Slide 9 text

cat → Embedding model → [0.9, 0.7, -0.3, 0.6]
dog → Embedding model → [0.9, 0.6, -0.2, 0.8]
pet → Embedding model → [0.9, 0.7, -0.2, 0.9]
lion → Embedding model → [0.9, 0.2, -0.2, 0.8]

Slide 10

Slide 10 text

Figure: cat, pet, lion and dog plotted in the embedding space

Slide 11

Slide 11 text

cat: [0.9, 0.7, -0.3, 0.6]; dog: [0.9, 0.6, -0.2, 0.8]

Slide 12

Slide 12 text

What is cosine similarity? “Cosine similarity measures the angle between two embeddings in a multi-dimensional space to determine how similar they are.” Figure: the angle between the cat and dog vectors

Slide 13

Slide 13 text

Cosine similarity: always a score between 0 (or -1) and 1. Scale: 0 = no similarity, 1 = identical. Label on the scale: “Similarities really arise”
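The score can be computed in a few lines. A minimal sketch in Python using only the standard library; the `cat` and `dog` vectors are the four-dimensional toy values from the slides, whereas real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors taken from the slides
cat = [0.9, 0.7, -0.3, 0.6]
dog = [0.9, 0.6, -0.2, 0.8]

print(round(cosine_similarity(cat, dog), 3))  # 0.984
```

Vectors pointing in the same direction score 1, perpendicular vectors score 0, and vectors pointing in opposite directions score -1.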

Slide 14

Slide 14 text

Figure: dog, cat, lion, shark, fox and meerkat plotted in the embedding space

Slide 15

Slide 15 text

Practical examples: chatbots, recommendation systems, search engines

Slide 16

Slide 16 text

Search query ‘spring’

Slide 17

Slide 17 text

Search query ‘spring season’

Slide 18

Slide 18 text

“We no longer live in a keyword era, but in an era of search intent” since 2013…

Slide 19

Slide 19 text

What is a vector database? "A vector database stores data as high-dimensional vectors for efficient similarity searches and AI applications." Example vectors: [0.4, 0.8, 0.3, 0.7], [0.3, 0.7, 0.4, 0.6], [0.5, 0.6, 0.5, 0.5]
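To make the idea concrete, here is a toy in-memory stand-in for a vector database. `TinyVectorStore` is a hypothetical class written for this illustration, not a real product; production vector databases add persistence and approximate nearest-neighbour indexes so that searches stay fast on millions of vectors.

```python
import math

class TinyVectorStore:
    """Toy in-memory stand-in for a vector database (illustrative only)."""

    def __init__(self):
        self._items = []  # list of (item_id, unit_vector) tuples

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("cannot store or query a zero vector")
        return [x / norm for x in vector]

    def add(self, item_id, vector):
        # Storing unit vectors means a dot product equals cosine similarity.
        self._items.append((item_id, self._normalize(vector)))

    def search(self, query, top_k=2):
        """Return the top_k (item_id, similarity) pairs, best match first."""
        q = self._normalize(query)
        scored = [
            (item_id, sum(a * b for a, b in zip(q, vec)))
            for item_id, vec in self._items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = TinyVectorStore()
store.add("cat", [0.9, 0.7, -0.3, 0.6])
store.add("dog", [0.9, 0.6, -0.2, 0.8])
store.add("pet", [0.9, 0.7, -0.2, 0.9])

# Query with the "cat" vector: the two nearest stored items come back.
results = store.search([0.9, 0.7, -0.3, 0.6], top_k=2)
print([item_id for item_id, _ in results])  # ['cat', 'dog']
```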

Slide 20

Slide 20 text

Diagram: Dataset → Embedding model; User question → Embedding model → Relevant data → LLM → Answer
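The flow on this slide can be sketched end to end. Everything below is an assumption made for illustration: `toy_embed` is a bag-of-words stand-in for a real embedding model, the dataset sentences are made up, and the final call to the LLM is left out so the example runs offline.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, dataset, top_k=1):
    """Return the top_k dataset passages most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(dataset, key=lambda p: cosine(q, toy_embed(p)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, passages):
    """Put the retrieved passages in front of the question for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

# Made-up example data
dataset = [
    "Our return policy allows returns within 30 days of delivery.",
    "Shipping to Belgium takes two to three working days.",
    "Gift cards are valid for two years after purchase.",
]
question = "How long does shipping to Belgium take?"
prompt = build_prompt(question, retrieve(question, dataset))
print(prompt)
```

Only the relevant passage reaches the LLM, which is what keeps the prompt short and the answer grounded in your own data.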

Slide 21

Slide 21 text

Limits of ChatGPT: 128,000-token context window (approx. 96,000 words); 4,096-token limit (approx. 3,000 words)
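The word counts follow from a ratio of roughly 0.75 words per token, which is what the slide's own numbers imply. A small sketch of that arithmetic; the ratio is a rule of thumb for English text and varies by language and tokenizer.

```python
WORDS_PER_TOKEN = 0.75  # ratio implied by the slide: 96,000 words / 128,000 tokens

def approx_words(tokens, words_per_token=WORDS_PER_TOKEN):
    """Rough number of words that fit in a given number of tokens."""
    return round(tokens * words_per_token)

print(approx_words(128_000))  # 96000
print(approx_words(4_096))    # 3072
```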

Slide 22

Slide 22 text

Connect ChatGPT to our database: User question → Embedding model → [0.4, 0.8, 0.3, 0.7]

Slide 23

Slide 23 text

Connect ChatGPT to our database

Slide 24

Slide 24 text

This has major advantages: it helps to prevent hallucinations; you have control over what data you use; you can use real-time or new data.

Slide 25

Slide 25 text

Let’s get practical!

Slide 26

Slide 26 text

Easy ways to start with embeddings: Python, Screaming Frog

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Different models for creating embeddings

Slide 29

Slide 29 text

Embedding models from OpenAI:
text-embedding-ada-002: released December 2022, 1536 dimensions
text-embedding-3-small: released January 2024, 1536 dimensions
text-embedding-3-large: released January 2024, 3072 dimensions
*Source: benchmark from datacamp.com
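For the Python route, requesting embeddings from one of these models could look roughly like this. A sketch, assuming the official `openai` Python package (version 1.0 or later); `embed_texts` is a hypothetical helper name, and the client is passed in so the function can be tried without network access.

```python
def embed_texts(client, texts, model="text-embedding-3-small"):
    """Return one embedding (a list of floats) per input text.

    `client` is expected to be an `openai.OpenAI()` instance, which reads
    the API key from the OPENAI_API_KEY environment variable by default.
    """
    response = client.embeddings.create(model=model, input=texts)
    # Each returned item carries the index of the input it belongs to.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]


# Usage (requires `pip install openai` and a valid API key):
#
#   from openai import OpenAI
#   vectors = embed_texts(OpenAI(), ["cat", "dog"])
#   print(len(vectors), len(vectors[0]))
```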

Slide 30

Slide 30 text

Diagram: Screaming Frog → Request → OpenAI API (Embedding model) → Response → Screaming Frog

Slide 31

Slide 31 text

Correct settings in SF (Screaming Frog). Make sure that your crawl configuration is set properly:
● Extraction => Store Rendered HTML
● Rendering => JavaScript

Slide 32

Slide 32 text

Connect with OpenAI. Make the connection with OpenAI in your crawl:
● Add your API key from OpenAI
● Choose the “Extract embeddings from page content” template

Slide 33

Slide 33 text

Visible in your crawl. Next, the embeddings will be visible in your crawl:
● Go to the AI tab
● Scroll to the embeddings

Slide 34

Slide 34 text

Gemini or Ollama

Slide 35

Slide 35 text

[0.4, 0.8, 0.3, 0.7], [0.8, 0.4, 0.3, 0.2], [0.2, 0.7, 0.9, 0.1], [0.1, 0.5, 0.3, 0.9]

Slide 36

Slide 36 text

Three scripts for embeddings: internal link opportunities, redirect mapping, duplicate content analysis. I will give them away.
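As an illustration of the redirect mapping idea, each old URL can be matched to the new URL whose embedding is most similar. This is a sketch, not the script from the talk: the URLs and vectors are made up, and the 0.85 cut-off for sending weak matches to manual review is an assumption.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def map_redirects(old_pages, new_pages, min_similarity=0.85):
    """Map each old URL to its most similar new URL.

    old_pages / new_pages: dicts of URL -> embedding.
    Old URLs whose best match scores below min_similarity map to None
    so they can be reviewed by hand.
    """
    mapping = {}
    for old_url, old_vec in old_pages.items():
        best_url, best_score = None, -1.0
        for new_url, new_vec in new_pages.items():
            score = cosine(old_vec, new_vec)
            if score > best_score:
                best_url, best_score = new_url, score
        mapping[old_url] = best_url if best_score >= min_similarity else None
    return mapping

# Made-up URLs and toy four-dimensional vectors, for illustration only
old_pages = {
    "/old/cat-food": [0.9, 0.7, -0.3, 0.6],
    "/old/contact": [-0.5, 0.1, 0.9, -0.2],
}
new_pages = {
    "/shop/cat-food": [0.9, 0.6, -0.2, 0.8],
    "/shop/dog-toys": [0.1, 0.9, 0.4, -0.3],
}
print(map_redirects(old_pages, new_pages))
```

Here `/old/cat-food` maps to `/shop/cat-food`, while `/old/contact` has no sufficiently similar counterpart and maps to `None`.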

Slide 37

Slide 37 text

Internal link opportunities. Steps: Gathering pages, Checking similarity, Checking relevancy. Diagram: SF crawl with embeddings → Webpage embedding and Rest of the embeddings → Checking cosine similarity → Checking existing link in HTML → Internal link recommendations

Slide 38

Slide 38 text

Gather the pages we want to optimize

Slide 39

Slide 39 text

Check cosine similarity: checking if the similarity is at least 0.85

Slide 40

Slide 40 text

Check for in-content link: checking for a potential link in the HTML of the page

Slide 41

Slide 41 text

Looping through all pages: looping through these steps to find all recommendations
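The steps from the last few slides can be sketched as one loop. This is an illustration, not the script from the talk: the `pages` structure, the URLs and the vectors are made up, the 0.85 threshold comes from the slide, and the link check only does an exact `href` match, so real crawl data would need URL normalization first.

```python
import math
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.add(href)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def link_opportunities(pages, min_similarity=0.85):
    """pages: dict of URL -> {"embedding": [...], "html": "..."}.

    Returns (source, target, score) tuples where the two pages are similar
    enough but the source does not yet link to the target.
    """
    recommendations = []
    for source, source_page in pages.items():
        collector = LinkCollector()
        collector.feed(source_page["html"])
        for target, target_page in pages.items():
            if target == source:
                continue
            score = cosine(source_page["embedding"], target_page["embedding"])
            if score >= min_similarity and target not in collector.hrefs:
                recommendations.append((source, target, round(score, 3)))
    return sorted(recommendations, key=lambda rec: rec[2], reverse=True)

# Made-up pages with toy embeddings
pages = {
    "/cats": {"embedding": [0.9, 0.7, -0.3, 0.6], "html": "<p>About cats.</p>"},
    "/dogs": {
        "embedding": [0.9, 0.6, -0.2, 0.8],
        "html": '<p>See also <a href="/cats">cats</a>.</p>',
    },
    "/sharks": {"embedding": [-0.6, 0.1, 0.9, -0.4], "html": "<p>About sharks.</p>"},
}
print(link_opportunities(pages))  # [('/cats', '/dogs', 0.984)]
```

`/dogs` already links to `/cats`, so only the missing link from `/cats` to `/dogs` is recommended; `/sharks` is too dissimilar to either page.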

Slide 42

Slide 42 text

Use my Google Colabs to run them: give it your input, run the script, download the results.

Slide 43

Slide 43 text

What about databases?

Slide 44

Slide 44 text

Case: A client was not present in the informational and orientation phases of the customer journey. Solution: Creating content based on a semantic search in a vector database.

Slide 45

Slide 45 text

Diagram labels: Blog subject, DB search, Extract data, Writing process, Blog content. Finding the right blog subjects: searching for our products in combination with ‘best’

Slide 46

Slide 46 text

Semantic searches for products that match: getting the products that match the blog subject

Slide 47

Slide 47 text

Getting the product information: getting the products that match the blog subject

Slide 48

Slide 48 text

Write content: with AI, we produced a draft of the blog content

Slide 49

Slide 49 text

Case: Another client’s website didn’t appear in the AI Overviews for key keywords, while competitors did. Solution: Analyze and predict, with embeddings and other relevant data, when content could be displayed.

Slide 50

Slide 50 text

Diagram labels: Keywords & URLs, Write, Rewrite, Update content, Validation, Optimized content, Prediction. Scrape the AI Overviews: scraping content shown in the AI Overview

Slide 51

Slide 51 text

Scraping competitors: scraping the content of the competitor shown

Slide 52

Slide 52 text

More data for the prediction: gathering other relevant data that is important

Slide 53

Slide 53 text

Prediction partially based on embeddings: we predict whether the content is capable of being shown
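The slides do not spell out the prediction model. One possible embedding-based input, shown purely as an assumption, is how much closer the cited competitor's content sits to the AI Overview text than your own page does. The function name and the toy vectors below are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def overview_similarity_gap(overview_vec, own_vec, competitor_vec):
    """How far our page trails the cited competitor in similarity to the AI Overview.

    A positive gap means the competitor's content is closer to the overview
    text than ours; zero or negative means ours is at least as close.
    """
    return cosine(overview_vec, competitor_vec) - cosine(overview_vec, own_vec)

# Toy four-dimensional vectors, for illustration only
overview = [0.9, 0.7, -0.2, 0.9]
competitor = [0.9, 0.6, -0.2, 0.8]
own = [0.9, 0.2, -0.2, 0.8]

gap = overview_similarity_gap(overview, own, competitor)
print(gap > 0)  # True: the competitor is semantically closer to the overview
```

A gap like this would be one feature among the other data gathered for the prediction, not a prediction on its own.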

Slide 54

Slide 54 text

Optimization advice: we (re)write content so that it is capable of being shown

Slide 55

Slide 55 text

Results. After implementing:
● Before: 32.63% of AI Overviews contained a link
● After: 54.48% of AI Overviews contained a link
+67% / +49%: increase in clicks / increase in display of links
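The before and after shares can be turned into a change in percentage points and a relative change. A short sketch of that arithmetic, using only the two percentages from the slide:

```python
before = 32.63  # % of AI Overviews that contained a link before
after = 54.48   # % of AI Overviews that contained a link after

point_change = after - before
relative_change = (after - before) / before * 100

print(f"{point_change:.2f} percentage points")      # 21.85 percentage points
print(f"{relative_change:.1f}% relative increase")  # 67.0% relative increase
```

The relative change rounds to 67%, which matches the +67% figure on the slide.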

Slide 56

Slide 56 text

Join the embeddings movement. Start experimenting with embeddings and discover what’s possible. This is where the future begins.

Slide 57

Slide 57 text

Key takeaways
01. What embeddings are and how they help us as SEO specialists
02. How to automate an internal link audit and redirect mapping
03. Practical pointers for building a vector database and linking it to an LLM

Slide 58

Slide 58 text

THANK YOU! LET’S CONNECT. linkedin.com/in/frankvndijk x.com/frankvndijk